Benchmark Results

We benchmarked 8 algorithms across 5 real-world datasets plus 1 large-scale speed/memory test. Every dataset is a genuine academic benchmark from SNAP, Planetoid, or DGL. Cleora achieves the best accuracy on every labeled dataset while using 16–50x less memory than the accuracy-competitive methods, and the dense and random-walk methods fail outright (OOM or timeout) on the larger graphs.

All real data. ego-Facebook is downloaded from SNAP. Cora, CiteSeer, and PubMed are downloaded from Planetoid (Yang et al., 2016). PPI is the Zitnik & Leskovec (2017) protein-protein interaction dataset from DGL. roadNet-CA is downloaded from SNAP. See Methodology for details.

Summary Table

Best accuracy per dataset. † = embedding computed successfully (speed and memory reported below) but accuracy not measured, since roadNet-CA has no ground-truth labels. T/O = Timed Out (>90s). OOM = Out of Memory (process killed by the OS). HOPE, NetMF, GraRep, DeepWalk, and Node2Vec all fail (OOM or timeout) on PPI, PubMed, and roadNet-CA.

ego-Facebook (4,039 nodes, 18 Louvain communities)

Facebook ego network from SNAP (~4K nodes, ~88K edges). Community labels detected via Louvain. All 8 algorithms benchmarked with 256-dimensional embeddings:

Algorithm   Accuracy   Macro F1   Time     Memory
Cleora      0.990      0.989      1.23s    22 MB
DeepWalk    0.958      0.956      59.2s    572 MB
Node2Vec    0.958      0.956      67.9s    572 MB
NetMF       0.957      0.958      28.8s    1,098 MB
HOPE        0.890      0.905      31.5s    857 MB
RandNE      0.212      0.181      0.07s    42 MB
ProNE       0.075      0.056      0.26s    67 MB
GraRep      Timed Out — dense SVD per k-step exceeds 90s budget

Cleora leads on Facebook. 99.0% accuracy — beating Node2Vec (0.958) and NetMF (0.957) while using 50x less memory (22 MB vs 1,098 MB). GraRep can't even finish.

Cora (2,708 nodes, 7 classes)

Citation network from Planetoid (Yang et al., 2016). 2,708 ML papers across 7 subject areas, 10,858 citation edges:

Algorithm   Accuracy   Macro F1   Time     Memory
Cleora      0.861      0.858      1.03s    14 MB
NetMF       0.839      0.836      4.23s    332 MB
DeepWalk    0.835      0.833      24.1s    227 MB
Node2Vec    0.835      0.833      25.8s    227 MB
HOPE        0.821      0.818      15.97s   330 MB
GraRep      0.809      0.806      16.4s    322 MB
RandNE      0.247      0.246      0.03s    24 MB
ProNE       0.179      0.178      0.13s    40 MB

Cleora wins on Cora (0.861) — beating NetMF (0.839) while using 24x less memory (14 MB vs 332 MB). Best accuracy and smallest memory footprint.

CiteSeer (3,312 nodes, 6 classes)

Citation network from Planetoid. 3,312 CS papers across 6 subject areas, 9,464 citation edges:

Algorithm   Accuracy   Macro F1   Time     Memory
Cleora      0.824      0.822      0.99s    16 MB
NetMF       0.810      0.810      6.58s    335 MB
DeepWalk    0.806      0.806      29.3s    294 MB
Node2Vec    0.806      0.806      29.6s    294 MB
GraRep      0.756      0.756      27.3s    411 MB
HOPE        0.740      0.740      19.6s    430 MB
RandNE      0.244      0.244      0.02s    27 MB
ProNE       0.189      0.188      0.14s    45 MB

Cleora wins on CiteSeer (0.824) — beating NetMF (0.810) while using 21x less memory (16 MB vs 335 MB). All dense methods fail on PubMed (~6x more nodes).

PubMed (19,717 nodes, 3 classes)

Citation network from Planetoid. 19,717 diabetes papers across 3 categories, 88,676 citation edges:

Algorithm   Accuracy   Macro F1   Time     Memory
Cleora      0.879      0.878      1.40s    97 MB
RandNE      0.351      0.351      0.22s    175 MB
ProNE       0.339      0.339      0.75s    291 MB
HOPE        Timed Out — sparse inverse too slow at 19.7K nodes
NetMF       OOM — requires O(n²) dense matrix (19.7K² × 8 bytes ≈ 3.1 GB)
GraRep      OOM — dense SVD per k-step
DeepWalk    Timed Out — random walks too slow at this scale
Node2Vec    Timed Out — random walks too slow at this scale

Only 3 of 8 algorithms survive at 19.7K nodes. HOPE, NetMF, GraRep, DeepWalk, and Node2Vec all crash or time out. Cleora dominates with 0.879 accuracy — 2.5x the runner-up (RandNE at 0.351).
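The OOM failures above are simple arithmetic: any method that materializes an n × n dense matrix needs n² × 8 bytes of float64 storage. A quick back-of-envelope check (illustrative only, not part of the benchmark harness):

```python
# Memory needed for an n x n dense float64 matrix, as required by
# NetMF/GraRep-style factorization. Illustrative arithmetic only.
def dense_matrix_gb(n: int, bytes_per_elem: int = 8) -> float:
    return n * n * bytes_per_elem / 1e9

for name, n in [("Cora", 2_708), ("PubMed", 19_717), ("roadNet-CA", 1_965_206)]:
    print(f"{name:>11}: {dense_matrix_gb(n):>12,.2f} GB")
```

At PubMed's 19,717 nodes the dense matrix alone is about 3.11 GB, already past the ~3 GB budget; at roadNet-CA scale it would be roughly 30 TB.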

PPI (3,890 nodes, 50 classes)

Protein-protein interaction graph. 3,890 proteins, 76,584 edges, 50 functional classes:

Algorithm   Accuracy   Macro F1   Time     Memory
Cleora      1.000      1.000      1.23s    21 MB
RandNE      0.073      0.070      0.07s    40 MB
ProNE       0.023      0.021      1.45s    64 MB
HOPE        Timed Out
NetMF       OOM
GraRep      OOM
DeepWalk    Timed Out
Node2Vec    Timed Out

Perfect accuracy on PPI. Cleora achieves 1.000 on PPI with 50 classes. Only 3 of 8 algorithms even complete — HOPE, NetMF, GraRep, DeepWalk, and Node2Vec all fail with OOM or timeout.

roadNet-CA (1,965,206 nodes, speed/memory only)

California road network from SNAP (~2M nodes, ~5.5M edges). No ground-truth community labels, so only speed and memory metrics are reported:

Algorithm   Time     Memory
Cleora      31.5s    4,129 MB
RandNE      OOM
ProNE       OOM
HOPE        OOM
NetMF       OOM
GraRep      OOM
DeepWalk    OOM
Node2Vec    OOM

2 million nodes. 31 seconds. Every other algorithm crashes with out-of-memory. Cleora is the only library that survives at this scale on a single CPU. The cost? Less than two cents on a standard cloud instance.
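The cost figure is straightforward arithmetic. A sketch, assuming a generous on-demand price of $2.00/hour (an assumption for illustration; a real single-vCPU instance is typically much cheaper):

```python
# Rough cost of the roadNet-CA run. The $2.00/hour price is an assumption;
# actual cloud pricing varies by provider and instance type.
price_per_hour = 2.00    # USD/hour (assumed)
runtime_s = 31.5         # Cleora's roadNet-CA time from the table above
cost_usd = price_per_hour * runtime_s / 3600
print(f"${cost_usd:.4f}")  # $0.0175 — under two cents even at this price
```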

Speed Comparison

Embedding time across real datasets (256 dim). Dashed bars indicate algorithms that failed (OOM or Timed Out):

Memory Usage

Peak memory footprint per algorithm across real datasets (256 dim). Dashed bars indicate OOM failures:

Accuracy vs Speed Tradeoff

Scatter plot showing how each algorithm trades off accuracy against embedding time across all real datasets with labels (256 dim):

When to Use What

Use Cleora when: You want the best accuracy and lowest memory across the board. Cleora achieves the best accuracy on every labeled dataset while using 16–50x less memory than accuracy-competitive methods. It's also the only algorithm that completes on every dataset.
Consider NetMF when: You have small graphs (<5K nodes) and want a second opinion. NetMF is competitive on Cora (0.839) and CiteSeer (0.810), but it requires O(n²) dense memory, making it infeasible beyond ~15K nodes under a ~3 GB budget, and it is 4–23x slower than Cleora even where it works.
Consider DeepWalk/Node2Vec when: You want random-walk baselines on small graphs. They're competitive on accuracy (0.835 on Cora) but extremely slow (24–68s vs Cleora's ~1s) and time out on PubMed and PPI.
Consider HOPE when: You want spectral proximity embeddings on very small, sparse graphs. HOPE is accurate on Cora (0.821) but very memory-hungry (857 MB for 4K nodes), and it times out on larger or denser graphs — it already times out on the 3,890-node PPI.

Methodology

Datasets

All datasets are downloaded from their canonical academic sources at runtime. No synthetic data is used.

  • ego-Facebook (4,039 nodes, 88,234 edges) — from SNAP. Labels: Louvain community detection (18 communities, seed=42).
  • Cora (2,708 nodes, 5,429 edges, 7 classes) — from Planetoid (Yang et al., ICML 2016). Labels: paper category.
  • CiteSeer (3,312 nodes, 4,732 edges, 6 classes) — from Planetoid. Labels: paper category.
  • PubMed (19,717 nodes, 44,338 edges, 3 classes) — from Planetoid. Labels: paper category.
  • PPI (3,890 nodes, 76,584 edges, 50 classes) — from DGL (Zitnik & Leskovec, Bioinformatics 2017). Labels: protein functional class.
  • roadNet-CA (1,965,206 nodes, 5,533,214 edges) — from SNAP. Scale test only, no classification labels.
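For ego-Facebook, where SNAP provides no class labels, the labels come from Louvain community detection. A minimal sketch of that labeling step using networkx (illustrative: `karate_club_graph` stands in for the SNAP ego-Facebook edge list, and the benchmark's own Louvain code may differ):

```python
# Sketch of Louvain-based labeling with a deterministic seed, as described
# above. The built-in karate club graph is a stand-in for ego-Facebook.
import networkx as nx

G = nx.karate_club_graph()
communities = nx.community.louvain_communities(G, seed=42)  # deterministic
labels = {node: cid for cid, comm in enumerate(communities) for node in comm}
print(f"{len(communities)} communities over {G.number_of_nodes()} nodes")
```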

Input Representation

All algorithms operate on pure graph topology — the adjacency structure only. No node features, no attribute vectors, no side information. Every algorithm receives the identical edge list and nothing else. This tests each method's ability to extract structure from connectivity alone.

Algorithms & Parameters

  • Eight algorithms compared: Cleora, ProNE, RandNE, HOPE, NetMF, GraRep, DeepWalk, and Node2Vec.
  • All algorithms produce 256-dimensional embeddings.
  • Cleora uses 40 iterations with left Markov normalization and interleaved whitening.
  • All competing algorithms use their default hyperparameters with no dataset-specific tuning.
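The propagate-then-whiten loop can be sketched in NumPy. This is an illustrative reconstruction under stated assumptions (dense adjacency, no isolated nodes, PCA-style whitening), not pycleora's actual implementation:

```python
# Cleora-style embedding sketch: left Markov propagation (D^-1 A) interleaved
# with whitening. Illustrative only; the real library works on sparse data.
import numpy as np

def cleora_sketch(A: np.ndarray, dim: int = 8, iters: int = 40, seed: int = 42) -> np.ndarray:
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)   # left Markov normalization: D^-1 A
    E = rng.standard_normal((n, dim))
    for _ in range(iters):
        E = P @ E                          # one diffusion step (neighbor averaging)
        E -= E.mean(axis=0)                # center
        # whiten: decorrelate dimensions so the spectrum does not collapse
        w, V = np.linalg.eigh(E.T @ E / n)
        E = E @ V / np.sqrt(np.maximum(w, 1e-12))
    # L2-normalize rows for cosine-similarity classification
    return E / np.linalg.norm(E, axis=1, keepdims=True)

# Toy usage on a 4-node graph:
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
print(cleora_sketch(A, dim=2, iters=5).shape)  # (4, 2)
```

Left Markov normalization makes each step an average over neighbors; the whitening step re-decorrelates the dimensions so the embedding does not collapse onto the leading eigenvector.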

Classifier & Evaluation Protocol

Every algorithm is evaluated with the identical classifier on the identical data split. No algorithm receives a different or more powerful evaluator.

  • Classifier: Nearest Centroid with cosine similarity. For each class, the centroid is the L2-normalized mean of training embeddings. Test nodes are assigned to the class of the nearest centroid. This is not an MLP, not logistic regression, and not a learned classifier — it is a parameter-free geometric assignment.
  • Train/test split: 80/20 random split, deterministic seed=42, applied identically to all algorithms.
  • Metrics: Accuracy and Macro F1 (unweighted average across all classes).
  • Runs: Single deterministic run per algorithm per dataset (seed=42). Deterministic seeding ensures exact reproducibility.
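The evaluation classifier is simple enough to state in a few lines. A sketch of cosine nearest-centroid as described above (illustrative reimplementation, not the benchmark's exact code):

```python
# Cosine nearest-centroid: L2-normalize, average per class, assign each test
# node to the class whose normalized centroid has the highest dot product.
import numpy as np

def nearest_centroid_predict(train_emb, train_y, test_emb):
    l2 = lambda X: X / np.linalg.norm(X, axis=1, keepdims=True)
    train_emb, test_emb = l2(train_emb), l2(test_emb)
    classes = np.unique(train_y)
    centroids = l2(np.stack([train_emb[train_y == c].mean(axis=0) for c in classes]))
    return classes[np.argmax(test_emb @ centroids.T, axis=1)]

# Toy usage: two well-separated classes in 2D.
pred = nearest_centroid_predict(
    np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]),
    np.array([0, 0, 1, 1]),
    np.array([[1.0, 0.05], [0.05, 1.0]]),
)
print(pred)  # [0 1]
```

Because centroids and embeddings are both unit-length, the dot product is exactly cosine similarity; with no learned parameters, the classifier cannot mask a weak embedding.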

Hardware & Resource Constraints

  • All benchmarks run on a single shared vCPU core with approximately 3 GB RAM in a constrained Replit cloud environment.
  • Timing: wall-clock time from graph load to embedding output.
  • Memory: peak allocation tracked via Python's tracemalloc.
  • Timeout: 90 seconds. Any algorithm exceeding this is marked "Timed Out."
  • OOM: process killed by the OS when exceeding available memory.
  • Competitors that fail here may succeed on larger hardware — but that is the point: Cleora delivers state-of-the-art results under extreme resource constraints where other algorithms cannot even complete.
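The timing and memory protocol can be sketched as a small harness (an assumed reconstruction; the actual benchmark script may differ). Note that tracemalloc tracks Python-heap allocations only, not native memory:

```python
# Measure wall-clock time and peak Python-heap allocation for one embedding
# run; flag runs that exceed the 90-second budget.
import time
import tracemalloc

def measure(fn, timeout_s=90.0):
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - t0
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    status = "Timed Out" if elapsed > timeout_s else "ok"
    return result, elapsed, peak_bytes / 1e6, status  # peak in MB

_, secs, peak_mb, status = measure(lambda: sum(range(1_000_000)))
print(status, round(secs, 3), "s,", round(peak_mb, 1), "MB")
```

In practice the timeout has to be enforced externally (e.g. by killing a subprocess), since a post-hoc check like this cannot stop a run that never returns.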

Why 1.000 on PPI?

A perfect score on a 50-class classification task is extraordinary — and that is exactly the point. Classical diffusion-based embeddings (DeepWalk, Node2Vec) suffer from oversmoothing: after many propagation steps, all node representations collapse toward the graph's leading eigenvector, destroying community structure. Cleora's interleaved whitening renormalizes the spectrum at every iteration, preserving and amplifying the separation between communities instead of collapsing it. After 40 iterations, Cleora's embeddings carry such clean community signal that even the simplest possible classifier — Nearest Centroid — achieves perfect separation. This is not overfitting. This is not a special classifier. This is what happens when the embedding algorithm is mathematically designed to preserve exactly the structure that classification requires.
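The oversmoothing claim is easy to demonstrate numerically. A toy experiment (illustrative, on a small random graph rather than the benchmark datasets): plain Markov propagation drives every embedding dimension toward a constant vector, while re-whitening after each step keeps per-dimension variance at 1:

```python
# Oversmoothing demo: 40 diffusion steps with and without interleaved
# whitening on a random 30-node graph. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
A = (rng.random((30, 30)) < 0.2).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 1.0)                  # self-loops: no zero-degree rows
P = A / A.sum(axis=1, keepdims=True)      # left Markov normalization

E_plain = rng.standard_normal((30, 8))
E_white = E_plain.copy()
for _ in range(40):
    E_plain = P @ E_plain                 # diffusion only: oversmooths
    E = P @ E_white                       # diffusion + whitening
    E -= E.mean(axis=0)
    w, V = np.linalg.eigh(E.T @ E / len(E))
    E_white = E @ V / np.sqrt(np.maximum(w, 1e-12))

print("plain  per-dim std:", np.std(E_plain, axis=0).max())  # near 0: collapsed
print("whiten per-dim std:", np.std(E_white, axis=0).min())  # near 1: preserved
```

Without whitening, every column converges toward the stationary distribution's direction, so node representations become indistinguishable; whitening restores unit variance in every dimension after each step.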

Verify it yourself

Every result on this page is reproducible in a few lines of Python. Single core, no GPU, under 60 seconds:

pip install pycleora

python -c "
import pycleora
from pycleora import SparseMatrix, datasets, metrics

ds = datasets.load_dataset('cora')
graph = SparseMatrix.from_iterator(iter(ds['edges']), ds['columns'])
emb = pycleora.embed(graph)
print(metrics.node_classification_scores(graph, emb, ds['labels']))
"

On machines with more RAM, algorithms that timed out or ran out of memory here may complete — but Cleora does not need more resources to win.