Peter Boncz, Research Scientist at the Centrum Wiskunde & Informatica (CWI) in the Netherlands, gave an update on the Graph Query Language Task Force, which has now been active for a year. The Task Force was created to address an issue detected during the benchmark meetings: all benchmark workloads have to be specified in English text because there is no common graph query language.
Lijun Chang, DECRA Fellow at the University of New South Wales, talked about making subgraph matching more efficient by postponing Cartesian products. The key problem he explained was the extraction of subgraph-isomorphic embeddings, which has applications ranging from protein interaction research to social network analysis and chemical compound investigation. Testing for subgraph isomorphism is NP-complete; however, his team focuses on enumerating all subgraph embeddings, which, he explains, is even harder.
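To make the enumeration problem concrete, here is a minimal baseline sketch that lists embeddings by plain backtracking over a fixed matching order. It is not the technique from the talk: it does not postpone Cartesian products, and the function name `enumerate_embeddings`, the descending-degree matching order and the label/degree candidate filter are illustrative assumptions. Embeddings are treated as non-induced, labeled and injective: every query edge must map to a data edge between vertices with matching labels.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def enumerate_embeddings(query: Graph, data: Graph,
                         q_labels: Dict[int, str],
                         d_labels: Dict[int, str]) -> List[Dict[int, int]]:
    """Return every injective, label-preserving mapping of query vertices to
    data vertices that maps each query edge onto a data edge (baseline sketch)."""
    # Assumed heuristic: match high-degree query vertices first.
    order = sorted(query, key=lambda u: -len(query[u]))
    # Assumed filter: same label and at least the same degree.
    candidates = {
        u: [v for v in data
            if d_labels[v] == q_labels[u] and len(data[v]) >= len(query[u])]
        for u in query
    }
    results: List[Dict[int, int]] = []
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def extend(i: int) -> None:
        if i == len(order):
            results.append(dict(mapping))  # one complete embedding
            return
        u = order[i]
        for v in candidates[u]:
            if v in used:  # keep the mapping injective
                continue
            # Every already-mapped query neighbor of u must map to a neighbor of v.
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                extend(i + 1)
                del mapping[u]
                used.discard(v)

    extend(0)
    return results
```

Because every complete mapping is reported, the output alone can be very large: a triangle query already has 24 embeddings in a four-vertex clique, one per ordered choice of three distinct vertices. This is why enumerating all embeddings is harder than testing whether one exists.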
During the 8th TUC Meeting, Eugene Chong from Oracle USA explained what he and his team had done to improve RDF query processing in their database.
Jerven Bolleman, Lead Software Developer at Swiss-Prot Group, explained why they offer UniProt as a free SPARQL and RDF endpoint for the world to use, and why that endpoint is hard to optimize. The data biologists use tends to be extremely ambiguous and dirty; additionally, scientists are always looking for new questions to ask. This is why optimizing UniProt is difficult: tuning it for particular query patterns would not serve their users well. Furthermore, since UniProt is publicly funded, all the data needs to be public.
Martin Zand, Professor of Medicine and Public Health Sciences at the Rochester Center for Health Informatics, shifted the focus of the presentations by speaking as a user of graph databases. Zand highlighted the relevance of graphs in healthcare by relating three characteristics of healthcare to their graph counterparts:
- Healthcare is delivered by networks.
- Patients traverse those networks.
- The topology of the networks influences outcomes.
Dr. Zand's talk was structured around three use cases:
Tim Hegeman from TU Delft presented an interesting talk about Social Network Benchmark analytics. Graphalytics is a benchmark developed by TU Delft for graph analytics, that is, complex and holistic graph computations.
Today, over 100 graph analytics systems exist, Hegeman explains, but they are not comprehensive, and that is where Graphalytics excels. It consists of algorithms and datasets (the workload) that were selected using a two-stage process to ensure that the workload is representative. The stages of the process were:
Arnau Prat, Lead Researcher at DAMA-UPC from the Technical University of Catalonia, presented a talk on the Interactive Workload of the Social Network Benchmark. One of the key aspects of his talk was the introduction of the SNB Data Generator, a tool that generates a social network with a Facebook-like degree distribution (groups, posts, likes…). This synthetic social network follows the principle of homophily (similar people are more likely to be connected), is not uniform, and enables fair comparison and reproducible benchmark executions, while also scaling out through Apache Hadoop.
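The homophily principle can be illustrated with a small sketch. This is not the SNB Data Generator's code: it assumes each person has a single integer attribute (say, a hypothetical university id), sorts people by that attribute, compares each person only with the next `window` people in that order, and lets the connection probability decay geometrically with the distance in the sorted order. The function name `homophily_edges` and its parameters are hypothetical. The fixed seed mirrors the reproducibility requirement mentioned in the talk.

```python
import random
from typing import List, Set, Tuple


def homophily_edges(attributes: List[int], window: int, base_prob: float,
                    seed: int = 42) -> Set[Tuple[int, int]]:
    """Generate undirected 'knows' edges that favour people with similar
    attribute values (illustrative sketch, not the real generator)."""
    rng = random.Random(seed)  # seeded: the same inputs always give the same graph
    # Sort person ids by attribute so that similar people end up close together.
    order = sorted(range(len(attributes)), key=lambda p: attributes[p])
    edges: Set[Tuple[int, int]] = set()
    for i, p in enumerate(order):
        # Only look ahead within a bounded sliding window.
        for offset in range(1, window + 1):
            j = i + offset
            if j >= len(order):
                break
            q = order[j]
            # The further apart in sorted order, the less likely an edge.
            if rng.random() < base_prob ** offset:
                edges.add((min(p, q), max(p, q)))
    return edges
```

For example, `homophily_edges([3, 1, 2, 0], window=1, base_prob=1.0)` connects exactly the neighbours in attribute order, giving `{(1, 3), (1, 2), (0, 2)}`. In this sketch the work per person is bounded by `window`, so the cost grows linearly with the number of people.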
On 22 and 23 June, the 8th edition of the Technical User Community (TUC) Meeting took place at Oracle headquarters in Redwood Shores, California.
During these two days, LDBC hosted more than 20 presentations from key industry members such as Oracle, Facebook, Neo4j, SAP and Huawei, as well as from researchers, covering updates on the work within the council and applications of graphs and RDF. We will share all of them as individual blog posts over the following weeks.
Thanks to Oracle for hosting this event!
LDBC is proud to announce the new LDBC Graphalytics Benchmark draft specification. LDBC Graphalytics is the first industry-grade graph data management benchmark.