The Graphalytics benchmark is an industrial-grade benchmark for graph analysis platforms such as Giraph, Spark GraphX, and GraphBLAS. It consists of six core algorithms, standard datasets, and reference outputs, enabling the objective comparison of graph analysis platforms.
The benchmark harness consists of a core component, which is extended by a driver for each platform implementation. The benchmark includes the following algorithms:
- breadth-first search (BFS)
- PageRank (PR)
- weakly connected components (WCC)
- community detection using label propagation (CDLP)
- local clustering coefficient (LCC)
- single-source shortest paths (SSSP)
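To make the first algorithm in this list concrete, here is a minimal BFS sketch that computes, for every vertex, its depth (hop count) from a source vertex. It assumes a plain in-memory vertex list and edge list, follows outgoing edges on directed graphs, and reports unreachable vertices with the largest signed 64-bit integer. That sentinel, the function name `bfs_depths`, and the input layout are assumptions for illustration, not the Graphalytics driver API.

```python
from collections import deque

# Assumption: unreachable vertices are reported with the largest signed
# 64-bit integer; check the benchmark specification before relying on it.
UNREACHABLE = 2**63 - 1


def bfs_depths(vertices, edges, source, directed=True):
    """Return {vertex: hop count from source} for a simple edge list."""
    adjacency = {v: [] for v in vertices}
    for u, v in edges:
        adjacency[u].append(v)  # directed graphs: follow outgoing edges only
        if not directed:
            adjacency[v].append(u)

    depth = {v: UNREACHABLE for v in vertices}
    depth[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adjacency[u]:
            if depth[w] == UNREACHABLE:  # first visit fixes the shortest hop count
                depth[w] = depth[u] + 1
                queue.append(w)
    return depth


print(bfs_depths([1, 2, 3, 4, 5], [(1, 2), (2, 3), (1, 4)], source=1))
# {1: 0, 2: 1, 3: 2, 4: 1, 5: 9223372036854775807}
```

A FIFO queue guarantees that vertices are discovered in order of increasing distance, so the first depth assigned to a vertex is its hop count from the source.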
The algorithms were selected based on input from the LDBC Technical User Community (TUC) and extensive literature surveys, to ensure good coverage of scenarios. The standard datasets include both real and synthetic datasets, which are classified into intuitive “T-shirt” sizes (S, M, L, etc.).
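The dataset statistics table below is consistent with a simple sizing rule: compute the scale as log10(|V| + |E|) and assign T-shirt sizes in steps of 0.5 (2XS from 6.5, XS from 7.0, S from 7.5, M from 8.0, and so on up to 3XL from 10.0). The sketch encodes that rule. The boundaries are inferred from the table rather than quoted from the specification, so the exact rounding may differ for graphs close to a boundary; the names `scale` and `tshirt_size` are illustrative.

```python
import math

# Assumed size classes, keyed by floor(2 * scale), i.e. half-units of scale.
# Inferred from the published dataset table, not quoted from the specification.
SIZE_CLASSES = {13: "2XS", 14: "XS", 15: "S", 16: "M",
                17: "L", 18: "XL", 19: "2XL", 20: "3XL"}


def scale(num_vertices, num_edges):
    """Scale of a graph, taken here as log10 of its vertex plus edge count."""
    return math.log10(num_vertices + num_edges)


def tshirt_size(num_vertices, num_edges):
    """Map a graph to a T-shirt size; tiny test graphs are 'unclassified'."""
    key = math.floor(2 * scale(num_vertices, num_edges))
    return SIZE_CLASSES.get(key, "unclassified")


print(tshirt_size(3_774_768, 16_518_947))  # cit-Patents -> XS
```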
Each experiment set in Graphalytics consists of multiple platform runs, where a platform run is the execution of one algorithm on one dataset by a platform. A diverse set of experiments is carried out to evaluate different performance characteristics of a system under test.
All completed benchmarks must go through a strict validation process to ensure the integrity of the performance results.
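Validation compares a platform's output with the reference outputs distributed with each dataset. The sketch below shows three kinds of comparison that these algorithms need: exact matching for integer outputs such as BFS depths, equivalence matching for WCC (component labels may differ as long as vertices are grouped identically), and epsilon matching for floating-point outputs such as PageRank scores. Which check applies to which algorithm, and the relative tolerance of `1e-4`, are assumptions here; take the authoritative rules from the benchmark specification. Outputs are modeled as hypothetical `{vertex: value}` dictionaries.

```python
import math


def exact_match(output, reference):
    """Every vertex must have exactly the reference value."""
    return output == reference


def equivalence_match(output, reference):
    """True if both labelings induce the same partition of the vertices."""
    if output.keys() != reference.keys():
        return False
    out_to_ref, ref_to_out = {}, {}
    for v in reference:
        a, b = output[v], reference[v]
        # The label mapping must be a bijection: each output label always
        # corresponds to the same reference label, and vice versa.
        if out_to_ref.setdefault(a, b) != b or ref_to_out.setdefault(b, a) != a:
            return False
    return True


def epsilon_match(output, reference, rel_tol=1e-4):
    """Floating-point values must agree within a relative tolerance (assumed)."""
    if output.keys() != reference.keys():
        return False
    return all(math.isclose(output[v], reference[v], rel_tol=rel_tol)
               for v in reference)


# Same grouping, different label values: accepted for WCC-style outputs.
print(equivalence_match({1: 7, 2: 7, 3: 9}, {1: 1, 2: 1, 3: 3}))  # True
```

Note that `math.isclose` uses a symmetric relative tolerance, which may differ in detail from the comparison the official validator performs.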
The development of Graphalytics is supported by many active vendors in the field of large-scale graph analytics. Graphalytics already supports benchmarking a large number of graph analytics platforms, such as GraphBLAS, Giraph, GraphX, and PGX.D, which allows the performance of state-of-the-art community-driven and industry-driven platforms to be compared. To help you get started, the Graphalytics documentation and software components are listed below.
Documents and repositories
- Benchmark specification. The source code is stored in the ldbc_graphalytics_docs repository.
- VLDB paper
- ldbc_graphalytics: Generic driver
- ldbc_graphalytics_platforms_umbra: Umbra implementation
- ldbc_graphalytics_platforms_graphblas: GraphBLAS implementation
Graphalytics competition 2023
In 2023, we will hold a new round of the Graphalytics competition. See the LDBC Graphalytics Benchmark presentation for an introduction to the benchmark framework and the competition’s rules.
Artifacts:
- benchmark framework
- reference implementations
- datasets (datasets and expected results) are available on GitHub
Rules
- Participation is free.
- There are no monetary prizes.
- Single-node and distributed implementations are allowed.
- Partial implementations (e.g. just small to mid-sized datasets and only a few algorithms) are allowed.
- Submissions should execute each algorithm-dataset combination three times. From these, the arithmetic mean of the processing times is used for ranking.
- The results of the competition will be published on the LDBC website in the form of leaderboards, which rank submissions by performance and by price-performance (performance adjusted for the system price).
- There is a global leaderboard that covers all algorithms and dataset scales. Additionally, there are separate leaderboards for each scale (S, M, L, XL, 2XL+), algorithm, and system category (CPU-based vs. GPU-based, single-node vs. distributed) for fine-grained comparison.
- Submissions are subject to code review and reproducibility attempts from the organizers.
- System prices should be reported following the TPC Pricing specification.
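The performance ranking rule above (three executions per algorithm-dataset combination, ranked by the arithmetic mean of the processing times) can be sketched as follows. The function name and the input layout, a dictionary from a hypothetical system name to its measured processing times in seconds for one algorithm-dataset combination, are illustrative assumptions rather than part of the Graphalytics harness; price-performance adjustment is not modeled.

```python
from statistics import mean


def rank_by_mean_time(results):
    """Rank systems for one algorithm-dataset combination.

    `results` maps a (hypothetical) system name to its list of processing
    times in seconds. A lower mean processing time ranks higher.
    """
    scored = []
    for system, times in results.items():
        # The rules ask for three executions per algorithm-dataset combination.
        if len(times) != 3:
            raise ValueError(f"{system}: expected 3 runs, got {len(times)}")
        scored.append((system, mean(times)))
    return sorted(scored, key=lambda item: item[1])


ranking = rank_by_mean_time({"system-a": [10.0, 12.0, 11.0],
                             "system-b": [9.0, 9.5, 10.0]})
print(ranking)  # [('system-b', 9.5), ('system-a', 11.0)]
```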
Recommendations for submissions
- Submissions using modern hardware are welcome (GPUs, FPGAs, etc.).
- We encourage the use of cloud compute instances for running the benchmark (if possible).
Datasets
The Graphalytics datasets are compressed using zstd. The total size of the compressed archives is approximately 350 GB. When decompressed, the datasets require approximately 1.5 TB of disk space.
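Because every archive is a zstd-compressed tar file (`.tar.zst`), it can be unpacked without first writing the full decompressed tar file to disk. The sketch below assumes the `zstd` command-line tool is installed and on the `PATH`; it pipes `zstd -d -c` into Python's `tarfile` in forward-only stream mode. The function name `extract_tar_zst` is illustrative.

```python
import subprocess
import tarfile
from pathlib import Path


def extract_tar_zst(archive, dest="."):
    """Stream-decompress a .tar.zst archive with the zstd CLI and unpack it."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # `zstd -d -c` writes the decompressed tar stream to stdout.
    proc = subprocess.Popen(["zstd", "-d", "-c", str(archive)],
                            stdout=subprocess.PIPE)
    try:
        # Mode "r|" reads an uncompressed tar as a forward-only stream, so the
        # intermediate .tar file never has to exist on disk.
        with tarfile.open(fileobj=proc.stdout, mode="r|") as tar:
            # Python versions with extraction filters: reject unsafe member paths.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
        proc.communicate()  # drain any trailing padding and wait for zstd
    except BaseException:
        proc.kill()
        proc.wait()
        raise
    if proc.returncode != 0:
        raise RuntimeError(f"zstd exited with status {proc.returncode}")
    return dest
```

For example, `extract_tar_zst("cit-Patents.tar.zst", "graphs")` would unpack that archive into a `graphs` directory. Keep the 1.5 TB figure above in mind before unpacking the full collection.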
For detailed information on the datasets, see the table with their statistics.
The datasets are available in a public Cloudflare R2 bucket.
- Shell script to download the datasets from Cloudflare R2
- Download scripts for individual sizes: test graphs, sizes up to S, size M, size L, size XL, sizes 2XL+
- CWI/SURFsara data repository
Note that some of the Graphalytics datasets were fixed in March 2023. If you downloaded the datasets before then, some of them may have missing or incorrect reference outputs for certain algorithms. We therefore recommend downloading the datasets again.
dataset | #nodes | #edges | scale | link | archive size |
---|---|---|---|---|---|
cit-Patents | 3,774,768 | 16,518,947 | XS | cit-Patents.tar.zst | 119.1 MB |
com-friendster | 65,608,366 | 1,806,067,135 | XL | com-friendster.tar.zst | 6.7 GB |
datagen-7_5-fb | 633,432 | 34,185,747 | S | datagen-7_5-fb.tar.zst | 162.3 MB |
datagen-7_6-fb | 754,147 | 42,162,988 | S | datagen-7_6-fb.tar.zst | 200.0 MB |
datagen-7_7-zf | 13,180,508 | 32,791,267 | S | datagen-7_7-zf.tar.zst | 434.5 MB |
datagen-7_8-zf | 16,521,886 | 41,025,255 | S | datagen-7_8-zf.tar.zst | 544.3 MB |
datagen-7_9-fb | 1,387,587 | 85,670,523 | S | datagen-7_9-fb.tar.zst | 401.2 MB |
datagen-8_0-fb | 1,706,561 | 107,507,376 | M | datagen-8_0-fb.tar.zst | 502.5 MB |
datagen-8_1-fb | 2,072,117 | 134,267,822 | M | datagen-8_1-fb.tar.zst | 625.4 MB |
datagen-8_2-zf | 43,734,497 | 106,440,188 | M | datagen-8_2-zf.tar.zst | 1.4 GB |
datagen-8_3-zf | 53,525,014 | 130,579,909 | M | datagen-8_3-zf.tar.zst | 1.7 GB |
datagen-8_4-fb | 3,809,084 | 269,479,177 | M | datagen-8_4-fb.tar.zst | 1.2 GB |
datagen-8_5-fb | 4,599,739 | 332,026,902 | L | datagen-8_5-fb.tar.zst | 1.5 GB |
datagen-8_6-fb | 5,667,674 | 421,988,619 | L | datagen-8_6-fb.tar.zst | 1.9 GB |
datagen-8_7-zf | 145,050,709 | 340,157,363 | L | datagen-8_7-zf.tar.zst | 4.6 GB |
datagen-8_8-zf | 168,308,893 | 413,354,288 | L | datagen-8_8-zf.tar.zst | 5.3 GB |
datagen-8_9-fb | 10,572,901 | 848,681,908 | L | datagen-8_9-fb.tar.zst | 3.7 GB |
datagen-9_0-fb | 12,857,671 | 1,049,527,225 | XL | datagen-9_0-fb.tar.zst | 4.6 GB |
datagen-9_1-fb | 16,087,483 | 1,342,158,397 | XL | datagen-9_1-fb.tar.zst | 5.8 GB |
datagen-9_2-zf | 434,943,376 | 1,042,340,732 | XL | datagen-9_2-zf.tar.zst | 13.7 GB |
datagen-9_3-zf | 555,270,053 | 1,309,998,551 | XL | datagen-9_3-zf.tar.zst | 17.4 GB |
datagen-9_4-fb | 29,310,565 | 2,588,948,669 | XL | datagen-9_4-fb.tar.zst | 14.0 GB |
datagen-sf3k-fb | 33,484,375 | 2,912,009,743 | XL | datagen-sf3k-fb.tar.zst | 12.7 GB |
datagen-sf10k-fb | 100,218,750 | 9,404,822,538 | 2XL | datagen-sf10k-fb.tar.zst | 40.5 GB |
dota-league | 61,170 | 50,870,313 | S | dota-league.tar.zst | 114.3 MB |
graph500-22 | 2,396,657 | 64,155,735 | S | graph500-22.tar.zst | 202.4 MB |
graph500-23 | 4,610,222 | 129,333,677 | M | graph500-23.tar.zst | 410.6 MB |
graph500-24 | 8,870,942 | 260,379,520 | M | graph500-24.tar.zst | 847.7 MB |
graph500-25 | 17,062,472 | 523,602,831 | L | graph500-25.tar.zst | 1.7 GB |
graph500-26 | 32,804,978 | 1,051,922,853 | XL | graph500-26.tar.zst | 3.4 GB |
graph500-27 | 63,081,040 | 2,111,642,032 | XL | graph500-27.tar.zst | 7.1 GB |
graph500-28 | 121,242,388 | 4,236,163,958 | 2XL | graph500-28.tar.zst | 14.4 GB |
graph500-29 | 232,999,630 | 8,493,569,115 | 2XL | graph500-29.tar.zst | 29.6 GB |
graph500-30 | 447,797,986 | 17,022,117,362 | 3XL | graph500-30.tar.zst | 60.8 GB |
kgs | 832,247 | 17,891,698 | XS | kgs.tar.zst | 65.7 MB |
twitter_mpi | 52,579,678 | 1,963,263,508 | XL | twitter_mpi.tar.zst | 5.7 GB |
wiki-Talk | 2,394,385 | 5,021,410 | 2XS | wiki-Talk.tar.zst | 34.9 MB |
example-directed | 10 | 17 | - | example-directed.tar.zst | 1.0 KB |
example-undirected | 9 | 12 | - | example-undirected.tar.zst | 1.0 KB |
test-bfs-directed | <100 | <100 | - | test-bfs-directed.tar.zst | <2.0 KB |
test-bfs-undirected | <100 | <100 | - | test-bfs-undirected.tar.zst | <2.0 KB |
test-cdlp-directed | <100 | <100 | - | test-cdlp-directed.tar.zst | <2.0 KB |
test-cdlp-undirected | <100 | <100 | - | test-cdlp-undirected.tar.zst | <2.0 KB |
test-pr-directed | <100 | <100 | - | test-pr-directed.tar.zst | <2.0 KB |
test-pr-undirected | <100 | <100 | - | test-pr-undirected.tar.zst | <2.0 KB |
test-lcc-directed | <100 | <100 | - | test-lcc-directed.tar.zst | <2.0 KB |
test-lcc-undirected | <100 | <100 | - | test-lcc-undirected.tar.zst | <2.0 KB |
test-wcc-directed | <100 | <100 | - | test-wcc-directed.tar.zst | <2.0 KB |
test-wcc-undirected | <100 | <100 | - | test-wcc-undirected.tar.zst | <2.0 KB |
test-sssp-directed | <100 | <100 | - | test-sssp-directed.tar.zst | <2.0 KB |
test-sssp-undirected | <100 | <100 | - | test-sssp-undirected.tar.zst | <2.0 KB |