Create Embeddings
with 1 Click

Cleora PRO helps Data Science and Analytics teams create top-quality
embeddings without access to expensive hardware.
Cleora is used in AI projects for the biggest and most innovative companies
Asics, IKEA, Orange, Zabka, Nike
About Cleora

A machine learning tool for fast and easy production of graph embeddings for big graphs

Cleora embeds entities in n-dimensional spherical spaces using extremely fast, stable, iterative random projections, which allows for unparalleled performance and scalability. Types of data that can be embedded include, for example:
purchase events from e-commerce companies, banks, telco companies, and other businesses
click, page view, and other page navigation event data
card transaction and bank transfer events
textual data
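
To make the idea of iterative projections on a sphere more concrete, here is a minimal sketch in Python. It is a simplified illustration and not Cleora's actual implementation (Cleora itself is written in Rust). The function names, the toy purchase data, the use of Python's seeded `random.Random` for start vectors, and the plain neighbor averaging are all assumptions made for the example.

```python
import math
import random


def seeded_vector(entity, dim):
    """Start vector in [-1, 1], seeded by the entity name so it is repeatable."""
    rng = random.Random(entity)  # string seeds are deterministic across runs
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def normalize(vec):
    """Project a vector onto the unit sphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def embed(edges, dim=8, iterations=3):
    """Repeatedly replace each vector with the normalized mean of its neighbors."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    vectors = {e: normalize(seeded_vector(e, dim)) for e in neighbors}
    for _ in range(iterations):
        updated = {}
        for entity, nbrs in neighbors.items():
            mean = [sum(vectors[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
            updated[entity] = normalize(mean)
        vectors = updated
    return vectors


# Hypothetical purchase events: (client, product) pairs.
purchases = [
    ("client_1", "product_a"),
    ("client_1", "product_b"),
    ("client_2", "product_b"),
    ("client_2", "product_c"),
]
embeddings = embed(purchases)
```

Each pass only averages and normalizes, so there is no gradient-based training in this sketch. Running it twice on the same data gives identical vectors.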

200x faster than DeepWalk
4x-8x faster than PyTorch-BigGraph by Facebook

Cleora computes embeddings of your relational data. Entities such as clients, products, stores, accounts, and others can be represented with embeddings, just like Word2Vec or BERT for text or CLIP for images. Cleora embeddings are behavioral - they represent entities by their behavior history, which has the form of large graphs.

What can you build with Cleora Embeddings?

Recommender Systems
Client Segmentation
Propensity Prediction
Lifetime Value Modeling
Churn Prediction
and many other types of enterprise models
Cleora PRO (Enterprise) vs Cleora Open Source

Self-service Cleora 2.0 is now available for everyone

Cleora Open Source is publicly available on GitHub and used by many industry leaders.

Key improvements in Cleora 2.0 over the open source 1.0 version:
automatic scaling: no expensive hardware required
ease of use: only 3 columns extracted from your DB are required. Graphs are detected automatically in the data
performance optimizations: 10x faster embedding times
latest research: significantly improved embedding quality
new feature: item attributes are supported

Embedding quality

The task is to predict the existence of edges in the graph. For example, predicting whether a certain product will be bought by a certain customer. Higher score is better.
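
As a rough illustration of this task, the sketch below ranks candidate products for one customer by how similar their embeddings are. The page does not name the scoring function or the metric behind the chart, so cosine similarity, the function names, and the toy vectors are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(customer_vec, product_vecs):
    """Order product names from most to least likely edge, by similarity."""
    return sorted(
        product_vecs,
        key=lambda name: cosine(customer_vec, product_vecs[name]),
        reverse=True,
    )


# Hypothetical embeddings, for illustration only.
customer = [0.9, 0.1, 0.0]
products = {
    "running_shoes": [0.8, 0.2, 0.1],
    "office_chair": [0.0, 0.1, 0.9],
    "sports_socks": [0.7, 0.3, 0.0],
}
ranking = rank_candidates(customer, products)
```

An edge such as "this customer buys running_shoes" is predicted when the product ranks near the top of the list.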

Embedding speed

Total time of computing the embeddings.

Key technical features of Cleora embeddings

The embeddings produced by Cleora differ from those produced by Node2Vec, Word2Vec, DeepWalk, or other systems in this class in a number of key properties:

Efficiency

Cleora is two orders of magnitude faster than Node2Vec or DeepWalk. We’ve embedded graphs with hundreds of billions of edges on a single machine without GPUs. It is likely the fastest approach possible.

Inductivity

As Cleora embeddings of an entity are defined only by interactions with other entities, vectors for new entities can be computed on-the-fly.
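
A minimal sketch of what computing a vector on the fly could look like, assuming the new entity's vector is the normalized mean of the vectors of the entities it interacted with. The function name and the toy vectors are hypothetical, and the exact aggregation Cleora uses may differ.

```python
import math


def embed_new_entity(neighbor_vectors):
    """Average the vectors of interacted-with entities, then L2-normalize."""
    dim = len(neighbor_vectors[0])
    count = len(neighbor_vectors)
    mean = [sum(v[d] for v in neighbor_vectors) / count for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean


# Hypothetical, already-computed product embeddings.
product_vectors = {
    "product_a": [1.0, 0.0],
    "product_b": [0.0, 1.0],
}

# A client who was not in the original graph but bought both products.
new_client = embed_new_entity(
    [product_vectors["product_a"], product_vectors["product_b"]]
)
```

No retraining is involved: the new vector depends only on existing embeddings.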

Cross-dataset compositionality

Thanks to the stability of Cleora embeddings, embeddings of the same entity on multiple datasets can be combined by averaging, yielding meaningful vectors.

Stability

All starting vectors for entities are deterministic, which means that Cleora embeddings on similar datasets will end up being similar. Methods like Word2Vec, Node2Vec, or DeepWalk return different results with every run.
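
One way to obtain deterministic starting vectors is to derive them from a hash of the entity identifier, as sketched below. The choice of SHA-256, the function name, and the toy datasets are assumptions for illustration. The hashing scheme Cleora actually uses is not described here.

```python
import hashlib


def start_vector(entity, dim=4):
    """Map an entity name to a fixed vector with components in [-1, 1)."""
    vec = []
    for i in range(dim):
        digest = hashlib.sha256(f"{entity}|{i}".encode("utf-8")).digest()
        # Take the first 8 bytes as an integer in [0, 2**64), rescale to [-1, 1).
        vec.append(int.from_bytes(digest[:8], "big") / 2**63 - 1.0)
    return vec


# Two hypothetical datasets that share one entity, listed in different order.
dataset_one = ["client_7", "product_a", "product_b"]
dataset_two = ["product_c", "client_7", "client_9"]

run_one = {entity: start_vector(entity) for entity in dataset_one}
run_two = {entity: start_vector(entity) for entity in dataset_two}

# The shared entity starts from the same point in both runs.
same_start = run_one["client_7"] == run_two["client_7"]
```

Because the start vector depends only on the entity's name, it does not change with run order, dataset composition, or a random seed.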

Extreme parallelism and performance

Cleora is written in Rust, utilizing thread-level parallelism for all calculations except input file loading. In practice, this means that the embedding process is often faster than loading the input data.

Dim-wise independence

Thanks to the process producing Cleora embeddings, every dimension is independent of the others. This property allows for an efficient, low-parameter method of combining multi-view embeddings with Conv1d layers.
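
The sketch below shows one possible reading of this idea in plain Python: treat each view as a channel and apply the same learned weights at every dimension, which is what a Conv1d layer with kernel size 1 and a single output channel computes. The function name, the weights, and the toy vectors are assumptions, and the exact layer configuration used with Cleora may differ.

```python
def combine_views(views, weights, bias=0.0):
    """Per-dimension weighted sum of views, with weights shared across dimensions.

    Equivalent in effect to a kernel-size-1 Conv1d with len(views) input
    channels and one output channel, applied along the embedding dimension.
    """
    dim = len(views[0])
    return [
        sum(w * view[d] for w, view in zip(weights, views)) + bias
        for d in range(dim)
    ]


# Hypothetical embeddings of one entity from two views of its behavior.
purchase_view = [0.2, -0.4, 0.6]
click_view = [0.4, 0.0, -0.2]

combined = combine_views([purchase_view, click_view], weights=[0.75, 0.25])

# Parameter count is one weight per view plus a bias, regardless of dimension.
parameter_count = len([0.75, 0.25]) + 1
```

Since dimensions do not interact, the combiner needs no dense mixing matrix, which keeps the parameter count independent of the embedding size.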

We used Cleora for customer-restaurant graph data in the National Capital Region (NCR) area. And to our delight, the embedding generation was superfast (i.e., <5 minutes). For context, do remember that GraphSAGE took ~20 hours for the same data in the NCR region.

Science

Lab

Sair is a lab focused on behavioral modeling, recommendations, and large-scale data and graph processing. We share our ideas, models, and experimental results, and present our take on important breakthroughs and interesting technologies. We hope to build a better and more thorough understanding of the field. We believe in the importance of this research not only from a business perspective but, most importantly, as a study of human decision-making processes.
Research
8 min read

BaseModel vs TIGER for sequential recommendations

The comparison between BaseModel and TIGER reveals substantial differences in their architectural choices and performance.
Research
8 min read

BaseModel vs HSTU for sequential recommendations

To evaluate BaseModel against HSTU, we replicated the exact data preparation, training, validation, and testing protocols described in the HSTU paper.
Research
8 min read

Fourier Feature Encoding of numerical features

Pre-processing raw input data is a very important part of any machine learning pipeline, often crucial for end model performance.
Future
6 min read

Why We Need Inhuman Artificial Intelligence

We continuously wonder how much longer it will take until AI reaches human skill level in these tasks - or, when does AI become "truly" intelligent.
Engineering
12 min read

EMDE vs Multiresolution Hash Encoding

When we created our EMDE algorithm we primarily had in mind the domain of behavioral profiling.
Tools
8 min read

Efficient integer pair hashing

Mental models are simple expressions of complex processes or relationships.
Research
9 min read

Cleora: how we handle billion-scale graph data

We have recently open sourced Cleora — an ultra fast vertex embedding tool for graphs & hypergraphs.
Research
8 min read

Towards a multi-purpose behavioral model

In various subfields of AI research, there is a tendency to create models which can serve many different tasks with minimal fine-tuning effort.
Research
10 min read

EMDE Illustrated

In this article, we provide some intuitive explanations of our objectives and the theoretical background of the Efficient Manifold Density Estimator (EMDE).
Research
7 min read

How we challenge the Transformer

Having achieved remarkable successes in natural language and image processing, Transformers have finally found their way into the area of recommendation.