Graphalytics 2023 Competition Report

by LDBC Graphalytics Competition Organizers, Alibaba, Ant Group, Intel / on 01 Mar 2026

This report was produced by the following authors (affiliations given as of the time of the competition, 2023):

LDBC: Gábor Szárnyas, David Püroja
Alibaba: Xiaojian Luo, Ke Meng, Wenyuan Yu
Ant Group: Likang Chen, Heng Lin, Shipeng Qi
Intel: Sriram Aananthakrishnan, Pascal Costanza, Henry A. Gabb, Yves Vandriessche

Introduction

Graph analytics and why it matters

Over the last few years, Gartner has consistently ranked graph analytics, graph processing, knowledge graphs, and context-enriched analysis (e.g. graph neural networks) among its top data science trends. This is because connected data, or graphs, are everywhere. Biochemical pathways, electrical grids, roadmaps, and communications networks are common examples of graphs. Because graphs are everywhere, it stands to reason that graph processing is also everywhere. Operations on such connected data invoke various graph analytics algorithms to extract useful information embedded in the connections. For example, web searches examine the connections among webpages to find the most relevant information. Social networks use friend connections to identify communities of people with similar interests. Mapping software looks for optimal routes in road networks.

Similarly, the graphs themselves are highly varied. Some are large and some are small. Some are relatively dense while others are very sparse. Some graphs have highly skewed degree distributions. For example, the X (formerly Twitter) network is skewed because some people have millions of followers while most have only a few. Web graphs, on the other hand, tend to have a high average degree. Road networks often have a high diameter (the length of the longest shortest path between any two nodes). All of these characteristics affect the performance of graph algorithms.
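To make these characteristics concrete, the following minimal Python sketch computes the average degree, the maximum degree, and the diameter of two toy graphs. The graphs, the adjacency-list representation, and the function names are illustrative assumptions and are not part of the Graphalytics datasets or tooling.

```python
from collections import deque

def degree_stats(adj):
    """Return (average degree, maximum degree) of an undirected graph."""
    degrees = [len(neighbors) for neighbors in adj.values()]
    return sum(degrees) / len(degrees), max(degrees)

def eccentricity(adj, source):
    """Longest shortest-path distance from source, found by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Diameter of a connected graph: the largest eccentricity over all vertices."""
    return max(eccentricity(adj, v) for v in adj)

# Hypothetical toy graphs: the star has a skewed degree distribution,
# the path has a large diameter.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

print(degree_stats(star), diameter(star))  # (1.6, 4) 2
print(degree_stats(path), diameter(path))  # (1.6, 2) 4
```

Both toy graphs have the same average degree, yet they differ in maximum degree and diameter, which is why a single summary statistic is not enough to characterize a graph.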

The computational landscape of graph analytics is very diverse, so no single graph and algorithm pair can represent the entire computational space. Several combinations are needed to give a comprehensive, objective, and reproducible representation of graph analytics performance because optimizations that work well for one graph topology might not work well for others.

GAPBS vs. Graphalytics

Good benchmarks are available to measure the performance of graph analytics software and hardware. The GAP Benchmark Suite from the University of California, Berkeley runs six common algorithms on five graphs with different characteristics to give good coverage of the graph analytics landscape (Beamer et al., 2015). Another advantage is that GAP is easy to run, so it was chosen for a recent comparison of several academic software packages for graph analysis (Azad et al., 2020). However, it does not have automatic correctness checking and its graphs are fixed-size and small by modern standards.

LDBC Graphalytics is more of an industrial-strength benchmark consisting of “six deterministic algorithms, standard datasets, synthetic dataset generators, and reference output, that enable objective comparison of graph analysis platforms” (Iosup et al., 2016). It is not as easy to run as GAP, but it covers similar regions of the graph analytics landscape and offers a few key advantages. First, it is strictly deterministic and provides reference output to make correctness checking easier. Second, its input graphs are classified by size, some of which are considerably larger than those provided by GAP. Third, the Graphalytics specification is maintained and updated by the LDBC. In 2023, the LDBC sponsored an open competition for graph practitioners to compare their hardware and/or software using Graphalytics. The results of this competition are described in the next section.

It is worth noting that both GAP and Graphalytics measure static graph analytics. SAGA-Bench is a better option for use-cases that require streaming graphs (Basak et al., 2020).

LDBC Graphalytics 2023 Competition

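In 2023, we ran a competition on all Graphalytics algorithms (breadth-first search [BFS], community detection using label propagation [CDLP], local clustering coefficient [LCC], PageRank [PR], single-source shortest paths [SSSP], and weakly connected components [WCC]) and datasets of all sizes (S, M, L, XL, 2XL, 3XL). We received five submissions from three different vendors.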

Submissions

First, we provide a brief description of each team’s hardware and software environments.

Ant Group submissions

The Ant Group team ran a Graphalytics implementation on the GeaCompute data processing platform. The benchmark was run on a cluster of ecs.c8i.24xlarge and ecs.c8a.48xlarge cloud instances:

- Each ecs.c8i.24xlarge cloud instance has a 96-core Intel Xeon Platinum 8475B with 192 GB of main memory and 8 TB external storage on Alibaba Cloud Capacity NAS.
- Each ecs.c8a.48xlarge cloud instance has a 192-core AMD EPYC Genoa 9T24 with 384 GB of main memory and 8 TB external storage on Alibaba Cloud Capacity NAS.

The benchmark used a different number of instances for each data scale: 1, 2, 4, and 16 instances for S-, M-, L-, and XL-scale data, respectively, and 24 instances for the 2XL and 3XL scales. The three-year total costs of ownership are the following:

- 1 instance: 217,609.99 USD,
- 2 instances: 309,472.61 USD,
- 4 instances: 457,058.18 USD,
- 16 instances: 1,548,135.04 USD,
- 24 instances: 2,204,702.88 USD.
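The mapping from dataset scale to instance count and cost can be written as a small lookup. This is a sketch only: the dictionary and function names are our own, and the values are transcribed from the description above.

```python
# Instance counts per dataset scale and three-year TCO per instance count,
# transcribed from the Ant Group description above. Names are illustrative.
INSTANCES_BY_SCALE = {"S": 1, "M": 2, "L": 4, "XL": 16, "2XL": 24, "3XL": 24}
TCO_USD_BY_INSTANCES = {
    1: 217_609.99,
    2: 309_472.61,
    4: 457_058.18,
    16: 1_548_135.04,
    24: 2_204_702.88,
}

def tco_for_scale(scale):
    """Return (instance count, three-year TCO in USD) for a dataset scale."""
    instances = INSTANCES_BY_SCALE[scale]
    return instances, TCO_USD_BY_INSTANCES[instances]

for scale in INSTANCES_BY_SCALE:
    instances, tco = tco_for_scale(scale)
    print(f"{scale:>3}: {instances:2d} instances, {tco:,.2f} USD")
```

The looked-up values match the Pricing column reported for GeaCompute in the results table. Note that the listed costs are not simply proportional to the instance count.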

Alibaba submission

The Alibaba team ran a Graphalytics implementation on the libgrape-lite library. The library provides both GPU and CPU implementations of Graphalytics, so the team submitted execution time results for both types of hardware.

- The GPU version ran on a single ebmgn7e.32xlarge cloud instance. The instance features two Intel(R) Xeon(R) Platinum 8396 @ 2.70 GHz CPUs, each with 32 cores. It has 1 TB of main memory and is equipped with 8 Nvidia A100 GPUs. The external storage used is ESSD PL1, and the operating system is Ubuntu 20.04.5 LTS. The three-year total cost of ownership for this version was 456,697.00 USD.
- The CPU version ran on a cluster of r7.16xlarge cloud instances. For S-scale data, 2 instances were used; for M-scale, 4 instances; and for L, XL, 2XL and 3XL scales, 8 instances. Each cloud instance has a 32-core Intel(R) Xeon(R) Platinum 8480B CPU with 512 GB of main memory. The external storage used is Alibaba Cloud Capacity NAS, and the operating system is Ubuntu 20.04.5 LTS. The three-year total costs of ownership for these configurations were the following:
  - 2 instances: 72,328.41 USD,
  - 4 instances: 114,656.82 USD,
  - 8 instances: 199,308.88 USD.

Intel submissions

The Intel team ran the unmodified GraphBLAS reference implementation of Graphalytics. The benchmark was run on two on-premises servers:

- A custom server with a two-socket Intel Xeon Platinum 8480 (224 hardware threads) with 1 TB 4800 MHz DDR5 RDIMM RAM and 7.6 TB Solidigm D7-P5510 NVMe. The operating system was Ubuntu 22.04.2 LTS. The three-year total cost of ownership was calculated based on public vendor purchase prices for an equivalent system, resulting in 51,826 USD.
- For dataset sizes S to L, a PowerEdge R650 server with an Intel Xeon Gold 6342, 16 GB DDR4 RDIMM RAM, and 960 GB SSD, running Ubuntu 22.04. The three-year total cost of ownership was 15,354.81 USD.

Benchmark phases

The Graphalytics benchmark’s workload includes the following phases:

[Figure: Phases of the Graphalytics workload]

The key performance metrics used in the competition are the following:

- Makespan (in seconds): The time between the Graphalytics driver issuing the command to execute an algorithm on a (previously uploaded) graph and the output of the algorithm being made available to the driver. The makespan can be further divided into processing time and overhead. The makespan metric corresponds to the operation of a cold graph-processing system: the system is started up, processes a single dataset using a single algorithm, and is then shut down.
- Processing time (in seconds): The time required to execute the actual algorithm. This excludes platform-specific overhead, such as allocating resources, loading the graph from the file system, or graph partitioning. The processing time metric corresponds to the operation of an in-production, warmed-up graph-processing system. In particular, it disregards graph loading and graph partitioning, both of which are typically done only once and are algorithm-independent.
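Since the makespan divides into processing time and overhead, the overhead can be derived by subtraction. The sketch below illustrates this with two size-S entries from the makespan table in this report. The function names are our own, and subtracting two reported means is only an approximation of the per-run overhead.

```python
def overhead_seconds(makespan_s, processing_s):
    """Overhead is the part of the makespan not spent executing the algorithm."""
    if processing_s > makespan_s:
        raise ValueError("processing time cannot exceed makespan")
    return makespan_s - processing_s

def overhead_share(makespan_s, processing_s):
    """Fraction of the makespan spent on overhead (between 0.0 and 1.0)."""
    return overhead_seconds(makespan_s, processing_s) / makespan_s

# Mean makespan and mean processing time for size S, from the makespan table.
for platform, makespan, processing in [
    ("GeaCompute", 2.25, 0.20),
    ("libgrape-gpu", 27.42, 0.04),
]:
    seconds = overhead_seconds(makespan, processing)
    share = overhead_share(makespan, processing)
    print(f"{platform}: {seconds:.2f} s overhead ({share:.1%} of makespan)")
```

A system can therefore rank first on processing time and last on makespan when almost all of its makespan is overhead.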

Benchmark results and analysis

The following two tables show the results per dataset size. The first is ranked by price-adjusted processing throughput, the second by mean makespan.

Price-adjusted throughput and pricing results

Legend for the table header:

MS thru./$: Price-adjusted makespan throughput (per USD).

Proc. thru./$: Price-adjusted processing throughput (per USD). By default, the results are sorted on this column (descending) per dataset size.

Pricing ($): Total cost of ownership in USD.

#	Size	Platform	Environment	MS thru./$	Proc. thru./$	Pricing ($)
1.	S	libgrape-gpu	ebmgn7e.32xlarge	0.080	58.349	456,697.00
2.	S	libgrape-lite	r7.16xlarge	1.454	37.403	72,328.41
3.	S	GraphBLAS	bare metal, Xeon Gold 6342	3.051	11.452	15,354.81
4.	S	GraphBLAS	bare metal, Xeon Platinum 8480	6.262	17.988	51,826.00
5.	S	GeaCompute	ecs.c8i.24xlarge / ecs.c8a.48xlarge	7.838	15.727	217,609.99
1.	M	libgrape-gpu	ebmgn7e.32xlarge	0.025	37.132	456,697.00
2.	M	libgrape-lite	r7.16xlarge	0.492	25.441	114,656.82
3.	M	GraphBLAS	bare metal, Xeon Gold 6342	1.120	5.737	15,354.81
4.	M	GeaCompute	ecs.c8i.24xlarge / ecs.c8a.48xlarge	1.938	5.880	309,472.61
5.	M	GraphBLAS	bare metal, Xeon Platinum 8480	2.658	5.319	51,826.00
1.	L	libgrape-gpu	ebmgn7e.32xlarge	0.033	16.575	456,697.00
2.	L	libgrape-lite	r7.16xlarge	0.145	8.987	199,308.88
3.	L	GeaCompute	ecs.c8i.24xlarge / ecs.c8a.48xlarge	0.428	2.275	457,058.18
4.	L	GraphBLAS	bare metal, Xeon Gold 6342	0.621	2.131	15,354.81
5.	L	GraphBLAS	bare metal, Xeon Platinum 8480	0.923	1.945	51,826.00
1.	XL	libgrape-gpu	ebmgn7e.32xlarge	0.015	3.981	456,697.00
2.	XL	libgrape-lite	r7.16xlarge	0.065	2.072	199,308.88
3.	XL	GraphBLAS	bare metal, Xeon Platinum 8480	0.242	0.388	51,826.00
4.	XL	GeaCompute	ecs.c8i.24xlarge / ecs.c8a.48xlarge	0.077	0.275	1,548,135.04
1.	2XL	libgrape-gpu	ebmgn7e.32xlarge	0.008	0.593	456,697.00
2.	2XL	libgrape-lite	r7.16xlarge	0.047	0.329	199,308.88
3.	2XL	GeaCompute	ecs.c8i.24xlarge / ecs.c8a.48xlarge	0.024	0.050	2,204,702.88
4.	2XL	GraphBLAS	bare metal, Xeon Platinum 8480	0.034	0.039	51,826.00
1.	3XL	libgrape-lite	r7.16xlarge	0.013	0.066	199,308.88
2.	3XL	GeaCompute	ecs.c8i.24xlarge / ecs.c8a.48xlarge	0.005	0.008	2,204,702.88

For sizes S to L, a clear pattern emerges. The libgrape implementations rank highest on price-adjusted processing throughput, with libgrape-gpu in first place, while GraphBLAS implementations are strongest on price-adjusted makespan throughput (the table shows GeaCompute ahead for size S) and GeaCompute generally places in the middle. For XL and 2XL, the processing-throughput trend continues, with libgrape-gpu leading, followed by libgrape-lite. On price-adjusted makespan throughput, GraphBLAS is best for XL and libgrape-lite for 2XL. On the 3XL datasets, libgrape-lite has an edge over GeaCompute on both price-adjusted metrics.
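The per-size rankings can be recomputed from the table for either metric. The following sketch does so for the L and XL rows. The tuple layout and the `rank_by` helper are illustrative assumptions, and the values are copied from the table above.

```python
from collections import defaultdict

# (size, platform, environment, MS thru./$, Proc. thru./$), copied from the table.
ROWS = [
    ("L", "libgrape-gpu", "ebmgn7e.32xlarge", 0.033, 16.575),
    ("L", "libgrape-lite", "r7.16xlarge", 0.145, 8.987),
    ("L", "GeaCompute", "ecs.c8i.24xlarge / ecs.c8a.48xlarge", 0.428, 2.275),
    ("L", "GraphBLAS", "bare metal, Xeon Gold 6342", 0.621, 2.131),
    ("L", "GraphBLAS", "bare metal, Xeon Platinum 8480", 0.923, 1.945),
    ("XL", "libgrape-gpu", "ebmgn7e.32xlarge", 0.015, 3.981),
    ("XL", "libgrape-lite", "r7.16xlarge", 0.065, 2.072),
    ("XL", "GraphBLAS", "bare metal, Xeon Platinum 8480", 0.242, 0.388),
    ("XL", "GeaCompute", "ecs.c8i.24xlarge / ecs.c8a.48xlarge", 0.077, 0.275),
]
MS_THRU, PROC_THRU = 3, 4  # tuple positions of the two metrics

def rank_by(rows, metric_index):
    """Group rows by dataset size and sort each group by the metric, descending."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return {
        size: sorted(group, key=lambda r: r[metric_index], reverse=True)
        for size, group in groups.items()
    }

for name, index in [("MS thru./$", MS_THRU), ("Proc. thru./$", PROC_THRU)]:
    for size, ranked in rank_by(ROWS, index).items():
        best = ranked[0]
        print(f"{size}: best {name} is {best[1]} ({best[2]}) at {best[index]}")
```

Sorting on the makespan column instead of the processing column changes the winner for both sizes, which is the split the paragraph above describes.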

Makespan and processing times

Legend for the table header:

Mean MS: Mean makespan (in seconds). By default, the results are sorted on these values (ascending) per dataset size.

Mean proc. time: Mean processing time (in seconds).

#Runs: The minimum number of runs conducted per entry.

#	Size	Platform	Environment	Mean MS (s)	Mean proc. time (s)	#Runs
1.	S	GeaCompute	ecs.c8i.24xlarge / ecs.c8a.48xlarge	2.25	0.20	3
2.	S	GraphBLAS	bare metal, Xeon Platinum 8480	2.46	1.23	3
3.	S	libgrape-lite	r7.16xlarge	9.51	0.37	3
4.	S	GraphBLAS	bare metal, Xeon Gold 6342	10.40	3.62	1
5.	S	libgrape-gpu	ebmgn7e.32xlarge	27.42	0.04	3
1.	M	GeaCompute	ecs.c8i.24xlarge / ecs.c8a.48xlarge	3.72	0.50	3
2.	M	GraphBLAS	bare metal, Xeon Platinum 8480	7.26	3.63	3
3.	M	libgrape-lite	r7.16xlarge	17.73	0.34	3
4.	M	GraphBLAS	bare metal, Xeon Gold 6342	33.60	11.08	1
5.	M	libgrape-gpu	ebmgn7e.32xlarge	85.91	0.06	3
1.	L	GeaCompute	ecs.c8i.24xlarge / ecs.c8a.48xlarge	6.30	0.79	3
2.	L	GraphBLAS	bare metal, Xeon Platinum 8480	20.90	9.92	3
3.	L	libgrape-lite	r7.16xlarge	34.65	0.56	3
4.	L	libgrape-gpu	ebmgn7e.32xlarge	66.55	0.13	3
5.	L	GraphBLAS	bare metal, Xeon Gold 6342	104.82	30.56	1
1.	XL	GeaCompute	ecs.c8i.24xlarge / ecs.c8a.48xlarge	11.83	2.31	3
2.	XL	libgrape-lite	r7.16xlarge	77.63	2.42	3
3.	XL	GraphBLAS	bare metal, Xeon Platinum 8480	79.75	49.71	3
4.	XL	libgrape-gpu	ebmgn7e.32xlarge	150.75	0.55	3
1.	2XL	GeaCompute	ecs.c8i.24xlarge / ecs.c8a.48xlarge	19.56	9.25	3
2.	2XL	libgrape-lite	r7.16xlarge	105.83	15.26	3
3.	2XL	libgrape-gpu	ebmgn7e.32xlarge	259.01	3.69	3
4.	2XL	GraphBLAS	bare metal, Xeon Platinum 8480	570.77	492.87	3
1.	3XL	GeaCompute	ecs.c8i.24xlarge / ecs.c8a.48xlarge	90.17	60.39	3
2.	3XL	libgrape-lite	r7.16xlarge	383.52	75.48	3

Across all dataset sizes, GeaCompute consistently has the best mean makespan, i.e. it provides the best end-to-end runtimes. For mean processing times, the libgrape-gpu implementation leads up to size 2XL. The largest dataset size, 3XL, was only completed by the two distributed implementations, libgrape-lite and GeaCompute, with GeaCompute producing better runtimes. GraphBLAS implementations have the slowest mean processing times.

Concluding remarks

This report presented the results of the Graphalytics Competition 2023. The results show surprisingly large differences between rankings by end-to-end performance, by processing time in a warmed-up state, and by total cost of ownership. GeaCompute won in the end-to-end category, the GPU-based libgrape-gpu library was the fastest in most cases for warmed-up processing, and GraphBLAS was the least expensive option.

We thank all participants who entered the competition in 2023. If you are interested in future iterations of the competition, please reach out to [email protected].

References

Azad et al. (2020), Evaluation of Graph Analytics Frameworks Using the GAP Benchmark Suite, 2020 IEEE International Symposium on Workload Characterization (IISWC).
Basak et al. (2020), SAGA-Bench: Software and Hardware Characterization of Streaming Graph Analytics Workloads, 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
Beamer et al. (2015), The GAP Benchmark Suite.
Iosup et al. (2016), LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms, Proceedings of the VLDB Endowment, 9(13), 1317-1328.
Mattson et al. (2013), Standards for Graph Algorithm Primitives, 2013 IEEE High Performance Extreme Computing Conference (HPEC).
