LDBC Graphalytics Benchmark (LDBC Graphalytics)

The Graphalytics benchmark is an industrial-grade benchmark for graph analysis platforms such as Giraph, Spark GraphX, and GraphBLAS. It consists of six core algorithms, standard data sets, and reference outputs, enabling the objective comparison of graph analysis platforms.

The benchmark harness consists of a core component, which can be extended with a driver for each platform implementation. The benchmark includes the following algorithms (a minimal sketch of one of them follows the list):

  1. breadth-first search (BFS)
  2. PageRank (PR)
  3. weakly connected components (WCC)
  4. community detection using label propagation (CDLP)
  5. local clustering coefficient (LCC)
  6. single-source shortest paths (SSSP)
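
To illustrate the kind of computation the benchmark specifies, below is a minimal, single-threaded Python sketch of BFS, which (as in the Graphalytics specification) reports for each vertex its distance in hops from a given source vertex. This is illustrative only; it is not part of the benchmark harness, and the graph, source vertex, and function name are placeholders.

    from collections import deque

    def bfs_depths(adjacency, source):
        """Return the hop distance from `source` to every reachable vertex.

        `adjacency` maps each vertex to an iterable of its out-neighbours.
        Unreachable vertices are simply absent from the result.
        """
        depths = {source: 0}
        queue = deque([source])
        while queue:
            vertex = queue.popleft()
            for neighbour in adjacency.get(vertex, ()):
                if neighbour not in depths:
                    depths[neighbour] = depths[vertex] + 1
                    queue.append(neighbour)
        return depths

    # Toy example (not a Graphalytics data set):
    graph = {1: [2, 3], 2: [4], 3: [4], 4: []}
    print(bfs_depths(graph, source=1))   # {1: 0, 2: 1, 3: 1, 4: 2}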

The choice of these algorithms was carefully motivated, drawing on input from the LDBC Technical User Community (TUC) and extensive literature surveys to ensure good coverage of real-world scenarios. The standard data sets include both real and synthetic data sets, which are classified into intuitive “T-shirt” sizes (S, M, L, etc.).
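
As a rough sketch of how these sizes relate to the data: the scale of a graph is derived from the base-10 logarithm of its total number of vertices and edges, and the T-shirt classes correspond to half-unit intervals of that value. The interval boundaries below are inferred from the data set table later on this page and should be checked against the Graphalytics specification.

    import math

    # Assumed mapping from scale intervals to T-shirt classes (inferred from
    # the data set table below; verify against the Graphalytics specification).
    CLASSES = [(6.5, "2XS"), (7.0, "XS"), (7.5, "S"), (8.0, "M"),
               (8.5, "L"), (9.0, "XL"), (9.5, "2XL"), (10.0, "3XL")]

    def tshirt_size(num_vertices, num_edges):
        scale = math.log10(num_vertices + num_edges)
        label = "?"
        for lower_bound, name in CLASSES:
            if scale >= lower_bound:
                label = name
        return scale, label

    # datagen-8_4-fb from the table below: 3,809,084 vertices, 269,479,177 edges
    print(tshirt_size(3_809_084, 269_479_177))   # (~8.44, 'M')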

Each experiment set in Graphalytics consists of multiple platform runs (in each run, a platform executes an algorithm on a data set), and a diverse set of experiments is carried out to evaluate different performance characteristics of a system-under-test.

All completed benchmarks must go through a strict validation process to ensure the integrity of the performance results.

The development of Graphalytics is supported by many active vendors in the field of large-scale graph analytics. Graphalytics already provides benchmark drivers for a large number of graph analytics platforms, such as GraphBLAS, Giraph, GraphX, and PGX.D, allowing comparison of the state-of-the-art performance of both community-driven and industry-driven platforms. To get started, the Graphalytics documentation and its software components are described below.

Documents and repositories

Graphalytics competition 2023

In 2023, we will hold a new round of the Graphalytics competition. See the LDBC Graphalytics Benchmark presentation for an introduction to the benchmark framework and the competition’s rules.

Artifacts:

Rules

  • Participation is free.
  • There are no monetary prizes.
  • Single-node and distributed implementations are allowed.
  • Partial implementations (e.g. just small to mid-sized data sets and only a few algorithms) are allowed.
  • Submissions should execute each algorithm-data set combination three times. From these, the arithmetic mean of the processing times is used for ranking (see the sketch after this list).
  • The results of the competition will be published on the LDBC website in the form of leaderboards, which rank submissions based on performance and on price-performance (i.e. performance adjusted for the system price).
  • There is a global leaderboard that includes all algorithms and scale factors. Additionally, there is a separate leaderboard for each scale (S, M, L, XL, 2XL+), each algorithm, and each system category (CPU-based vs. GPU-based, single-node vs. distributed) for fine-grained comparison.
  • Submissions are subject to code review and reproducibility attempts from the organizers.
  • System prices should be reported following the TPC Pricing specification.
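
For clarity, the ranking metric for a single algorithm-data set combination is simply the arithmetic mean of the three measured processing times. A minimal sketch, with made-up timings and a function name of our own choosing:

    def ranking_time(processing_times_seconds):
        """Arithmetic mean of the (three) processing times measured for one
        algorithm-data set combination."""
        return sum(processing_times_seconds) / len(processing_times_seconds)

    # Hypothetical example: three BFS runs on one data set
    print(ranking_time([12.3, 11.9, 12.6]))   # 12.266...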

Recommendations for submissions

  • Submissions using modern hardware are welcome (GPUs, FPGAs, etc.).
  • We encourage the use of cloud compute instances for running the benchmark (if possible).

Important dates

  • March 17: Competition is announced
  • April 25: Confirmation of intent
  • May 1: Submissions open
  • June 25: Submissions close

Data sets

The Graphalytics data sets are compressed using zstd. The total size of the compressed archives is approximately 350 GB. When decompressed, the data sets require approximately 1.5 TB of disk space.
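
As a convenience, here is a sketch of decompressing one archive programmatically, assuming the third-party zstandard Python package is installed (the plain zstd and tar command-line tools work equally well); the archive and target directory names are placeholders.

    import tarfile
    import zstandard  # third-party package: pip install zstandard

    def extract_dataset(archive_path, target_dir):
        """Stream-decompress a .tar.zst archive and unpack it into target_dir."""
        with open(archive_path, "rb") as fh:
            reader = zstandard.ZstdDecompressor().stream_reader(fh)
            with tarfile.open(fileobj=reader, mode="r|") as tar:
                tar.extractall(target_dir)

    extract_dataset("cit-Patents.tar.zst", "datasets/")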

For detailed information on the data sets, see the table with their statistics.

The data sets are available in two locations:

Note that some of the Graphalytics data sets were fixed in March 2023. Prior to this, they were incorrectly packaged or had missing/incorrect reference outputs for certain algorithms. If you are uncertain whether you have the correct versions, cross-check them against these MD5 checksums: datagen-9_4-fb, datagen-sf3k-fb, datagen-sf10k-fb, graph500-27, graph500-28, graph500-29, graph500-30.
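
A small sketch for computing an archive's MD5 checksum locally, to compare against the published values (the file name is a placeholder):

    import hashlib

    def md5_of_file(path, chunk_size=1 << 20):
        """Compute the MD5 hex digest of a (potentially large) file in chunks."""
        digest = hashlib.md5()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    print(md5_of_file("datagen-9_4-fb.tar.zst"))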

data set #nodes #edges scale link size
cit-Patents 3,774,768 16,518,947 XS cit-Patents.tar.zst 119.1 MB
com-friendster 65,608,366 1,806,067,135 XL com-friendster.tar.zst 6.7 GB
datagen-7_5-fb 633,432 34,185,747 S datagen-7_5-fb.tar.zst 162.3 MB
datagen-7_6-fb 754,147 42,162,988 S datagen-7_6-fb.tar.zst 200.0 MB
datagen-7_7-zf 13,180,508 32,791,267 S datagen-7_7-zf.tar.zst 434.5 MB
datagen-7_8-zf 16,521,886 41,025,255 S datagen-7_8-zf.tar.zst 544.3 MB
datagen-7_9-fb 1,387,587 85,670,523 S datagen-7_9-fb.tar.zst 401.2 MB
datagen-8_0-fb 1,706,561 107,507,376 M datagen-8_0-fb.tar.zst 502.5 MB
datagen-8_1-fb 2,072,117 134,267,822 M datagen-8_1-fb.tar.zst 625.4 MB
datagen-8_2-zf 43,734,497 106,440,188 M datagen-8_2-zf.tar.zst 1.4 GB
datagen-8_3-zf 53,525,014 130,579,909 M datagen-8_3-zf.tar.zst 1.7 GB
datagen-8_4-fb 3,809,084 269,479,177 M datagen-8_4-fb.tar.zst 1.2 GB
datagen-8_5-fb 4,599,739 332,026,902 L datagen-8_5-fb.tar.zst 1.5 GB
datagen-8_6-fb 5,667,674 421,988,619 L datagen-8_6-fb.tar.zst 1.9 GB
datagen-8_7-zf 145,050,709 340,157,363 L datagen-8_7-zf.tar.zst 4.6 GB
datagen-8_8-zf 168,308,893 413,354,288 L datagen-8_8-zf.tar.zst 5.3 GB
datagen-8_9-fb 10,572,901 848,681,908 L datagen-8_9-fb.tar.zst 3.7 GB
datagen-9_0-fb 12,857,671 1,049,527,225 XL datagen-9_0-fb.tar.zst 4.6 GB
datagen-9_1-fb 16,087,483 1,342,158,397 XL datagen-9_1-fb.tar.zst 5.8 GB
datagen-9_2-zf 434,943,376 1,042,340,732 XL datagen-9_2-zf.tar.zst 13.7 GB
datagen-9_3-zf 555,270,053 1,309,998,551 XL datagen-9_3-zf.tar.zst 17.4 GB
datagen-9_4-fb 29,310,565 2,588,948,669 XL datagen-9_4-fb.tar.zst 14.0 GB
datagen-sf3k-fb 33,484,375 2,912,009,743 XL datagen-sf3k-fb.tar.zst 12.7 GB
datagen-sf10k-fb 100,218,750 9,404,822,538 2XL datagen-sf10k-fb.tar.zst 40.5 GB
dota-league 61,170 50,870,313 S dota-league.tar.zst 114.3 MB
graph500-22 2,396,657 64,155,735 S graph500-22.tar.zst 202.4 MB
graph500-23 4,610,222 129,333,677 M graph500-23.tar.zst 410.6 MB
graph500-24 8,870,942 260,379,520 M graph500-24.tar.zst 847.7 MB
graph500-25 17,062,472 523,602,831 L graph500-25.tar.zst 1.7 GB
graph500-26 32,804,978 1,051,922,853 XL graph500-26.tar.zst 3.4 GB
graph500-27 63,081,040 2,111,642,032 XL graph500-27.tar.zst 7.1 GB
graph500-28 121,242,388 4,236,163,958 2XL graph500-28.tar.zst 14.4 GB
graph500-29 232,999,630 8,493,569,115 2XL graph500-29.tar.zst 29.6 GB
graph500-30 447,797,986 17,022,117,362 3XL graph500-30.tar.zst 60.8 GB
kgs 832,247 17,891,698 XS kgs.tar.zst 65.7 MB
twitter_mpi 52,579,678 1,963,263,508 XL twitter_mpi.tar.zst 5.7 GB
wiki-Talk 2,394,385 5,021,410 2XS wiki-Talk.tar.zst 34.9 MB
example-directed 10 17 - example-directed.tar.zst 1.0 KB
example-undirected 9 12 - example-undirected.tar.zst 1.0 KB
test-bfs-directed <100 <100 - test-bfs-directed.tar.zst <2.0 KB
test-bfs-undirected <100 <100 - test-bfs-undirected.tar.zst <2.0 KB
test-cdlp-directed <100 <100 - test-cdlp-directed.tar.zst <2.0 KB
test-cdlp-undirected <100 <100 - test-cdlp-undirected.tar.zst <2.0 KB
test-pr-directed <100 <100 - test-pr-directed.tar.zst <2.0 KB
test-pr-undirected <100 <100 - test-pr-undirected.tar.zst <2.0 KB
test-lcc-directed <100 <100 - test-lcc-directed.tar.zst <2.0 KB
test-lcc-undirected <100 <100 - test-lcc-undirected.tar.zst <2.0 KB
test-wcc-directed <100 <100 - test-wcc-directed.tar.zst <2.0 KB
test-wcc-undirected <100 <100 - test-wcc-undirected.tar.zst <2.0 KB
test-sssp-directed <100 <100 - test-sssp-directed.tar.zst <2.0 KB
test-sssp-undirected <100 <100 - test-sssp-undirected.tar.zst <2.0 KB