Marcus Paradies, Software Developer at SAP, extended the talk Arnau Prat gave about the SNB, in this case covering the Business Intelligence workload. In contrast with the 17+4 queries of the Interactive workload, the Business Intelligence (BI) workload consists of 24 queries that can be seen as OLAP-style, as opposed to the OLTP-style queries of the Interactive workload. The BI workload focuses on analytic queries that touch the whole graph.
Sergey Edunov, Software Engineer at Facebook, gave a great talk on how and why his company is generating large-scale social graphs. The underlying reasons for starting such an ambitious project are capacity planning, to make sure that their system will be able to handle a graph that keeps growing year after year, and fair evaluation of their system against the ones being implemented by other companies.
Weining Qian, professor at East China Normal University, presented his talk on Statistical Characteristics of Real-Life Knowledge Graphs during the 8th TUC Meeting held at Oracle’s facilities in Redwood City, California.
Qian explained that the term knowledge graph was introduced by Google in 2012 and that it has been an evolution of the semantic web. Professor Qian then introduced the main question of his talk: how can we efficiently manage knowledge graphs? Are the existing benchmarks sufficient to test them, given that most of these benchmarks focus only on social networks?
Peter Boncz, Research Scientist at the Centrum Wiskunde & Informatica in the Netherlands, talked about the updates on the Graph Query Language Task Force a year after its creation. This Task Force was created to address an issue detected during the benchmark meetings: all the workloads are specified in English text because there is no common graph query language.
Lijun Chang, DECRA Fellow at the University of New South Wales, talked about how to make subgraph matching more efficient by postponing Cartesian products. The key problem he explained was the extraction of subgraph isomorphic embeddings. The applications of this process are wide enough to cover protein interaction research, social network analysis and even chemical compound investigation. Testing for subgraph isomorphism is an NP-complete problem; however, his team is focusing on enumerating all subgraph embeddings which, he explained, is even harder.
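To make the enumeration problem concrete, here is a minimal sketch of a naive backtracking enumerator for subgraph embeddings. It is not the technique presented in the talk (it does nothing to postpone Cartesian products) and serves only as a baseline illustration. The function name `enumerate_embeddings`, the adjacency-set graph representation, and the fixed matching order are all assumptions made for this example.

```python
from typing import Dict, Iterator, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumed representation)


def enumerate_embeddings(query: Graph, data: Graph) -> Iterator[Dict[int, int]]:
    """Yield every injective mapping of query vertices to data vertices
    that maps each query edge onto a data edge (naive backtracking)."""
    order: List[int] = sorted(query)  # fixed matching order, chosen arbitrarily

    def extend(mapping: Dict[int, int], depth: int) -> Iterator[Dict[int, int]]:
        if depth == len(order):
            yield dict(mapping)  # copy, because mapping is mutated while backtracking
            return
        u = order[depth]
        used = set(mapping.values())
        for v in data:
            if v in used:
                continue  # keep the mapping injective
            # Every already-matched neighbour of u must map to a neighbour of v.
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                yield from extend(mapping, depth + 1)
                del mapping[u]

    yield from extend({}, 0)


# Query: a triangle. Data graph: a complete graph on 4 vertices.
triangle: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
clique4: Graph = {v: {w for w in range(4) if w != v} for v in range(4)}
embeddings = list(enumerate_embeddings(triangle, clique4))
print(len(embeddings))  # 24: every injective assignment of 3 vertices into the clique
```

Even in this tiny case the number of embeddings (24) exceeds the size of either graph, which hints at why enumerating all embeddings is more demanding than merely deciding whether one exists.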
During the 8th TUC Meeting, Eugene Chong from Oracle USA explained what he and his team had done to improve RDF query processing in their database.
Jerven Bolleman, Lead Software Developer at Swiss-Prot Group, explained why they are offering a free SPARQL and RDF endpoint for the world to use and why it is hard to optimize. The data biologists use tends to be extremely ambiguous and dirty. Additionally, scientists are always trying to find new questions to ask, which is why UniProt is difficult to optimize: they wouldn’t be offering the right service to their users by optimizing for particular query patterns. Furthermore, since UniProt is publicly funded, all the data needs to be public.
Martin Zand, Professor of Medicine and Public Health Sciences at the Rochester Center for Health Informatics, switched the focus of the presentations by speaking as a user of graph databases. Zand highlighted the relevance of graphs in healthcare by matching three characteristics of healthcare to their graph counterparts:
- Healthcare is delivered by networks.
- Patients traverse those networks.
- The topology of the networks influences outcomes.
Dr. Zand's talk was structured around the presentation of three use cases:
Tim Hegeman from TU Delft presented a very interesting talk about Social Network Benchmark analytics. Graphalytics is a benchmark developed by TU Delft for graph analytics, that is, complex and holistic graph computations.
As of today, over 100 graph analytics systems exist, Hegeman explained, but they are not comprehensive, and that is where Graphalytics excels. It consists of algorithms and datasets (the workload) that have been selected using a 2-stage process to ensure the representativeness of the workload. The stages of the process were:
Arnau Prat, Lead Researcher at DAMA-UPC from the Technological University of Catalonia, presented a talk on the Interactive Workload of the Social Network Benchmark. One of the key aspects of his talk was the introduction of the SNB Data Generator, a tool that generates a social network (groups, posts, likes…) with a Facebook-like degree distribution. This synthetic social network follows the principle of homophily, is not uniform, and allows fair comparison and reproducibility of benchmark executions, while also being scalable through the use of Apache Hadoop.