Third TUC Meeting

The LDBC consortium is pleased to announce the third Technical User Community (TUC) meeting!

This will be a one day event in London on the 19 November 2013 running in collaboration with the GraphConnect event (18/19 November). Registered TUC participants that would like a free pass to all of GraphConnect should register for GraphConnect using this following coupon code: LDBCTUC.

The TUC event will include:

Introduction to the objectives and progress of the LDBC project
Description of the progress of the benchmarks being evolved through Task Forces
Users explaining their use-cases and describing the limitations they have found in current technology
Industry discussions on the contents of the benchmarks

We will also be launching the LDBC non-profit organization, so anyone outside the EU project will be able to join as a member.

We will kick off new benchmark development task forces in the coming year, and talks at this coming TUC will play an important role in deciding the use case scenarios that will drive those benchmarks.

All users of RDF and graph databases are welcome to attend.

Agenda
Logistics
LDBC/TUC Background
- Social Network Benchmark
- Semantic Publishing Benchmark

Agenda

November 19th - Public TUC Meeting

8:00 Breakfast and registration will open for Graph Connect/TUC at 8:00 am (Dexter House)

short LDBC presentation (Peter Boncz) during GraphConnect keynote by Emil Eifrem (09:00-09:30 Dexter House)

NOTE: the TUC meeting is at the Tower Hotel, nearby Dexter House.

10:00 TUC Meeting Opening (Peter Boncz)

10:10 TUC Presentations (RDF Application Descriptions)

Johan Hjerling (BBC): BBC Linked Data and the Semantic Publishing Benchmark
Andreas Both (Unister): Ontology-driven applications in an e-commerce context
Nuno Carvalho (Fujitsu Laboratories Europe): Fujitsu RDF use cases and benchmarking requirements
Robina Clayphan (Europeana): Europeana and Open Data

11:30 Semantic Publishing Benchmark (SPB)

Venelin Kotsev (Ontotext - LDBC): Semantic Publishing Benchmark Task Force Update and report

12:00-13:00 Lunch at the Graph Connect venue

Talks During Lunch:

Pedro Furtado, Jorge Bernardino (Univ. Coimbra): KEYSTONE Cost Action

13:00 TUC Presentations (Graph Application Descriptions)

Minqi Zhou / Weining Qian (East China Normal University): Elastic and realistic social media data generation
Andrew Sherlock (Shapespace): Shapespace Use Case
Sebastian Verheughe (Telenor): Real-time Resource Authorization

14:00 Social Network Benchmark (SNB)

Norbert Martinez (UPC - LDBC): Social Network Benchmark Task Force Update and Report

14:30 Break

14:45 TUC Presentations (Graph Analytics)

Keith Houck (IBM): Benchmarking experiences with [System G Native Store (tentative title)]
Abraham Bernstein (University of Zurich): Streams and Advanced Processing: Benchmarking RDF querying beyond the Standard SPARQL Triple Store
Luis Ceze (University of Washington): Grappa and GraphBench Status Update

15:45 Break

16:00 TUC Presentations* (Possible Future RDF Benchmarking Topics)*

Christian-Emil Ore (Unit for Digital Documentation, University of Oslo, Norway): CIDOC-CRM
Atanas Kiryakov (Ontotext): Large-scale Reasoning with a Complex Cultural Heritage Ontology (CIDOC CRM)
Kostis Kyzirakos (National and Kapodistrian University of Athens / CWI): Geographica: A Benchmark for Geospatial RDF Stores
Xavier Lopez (Oracle): W3C Property Graph progress
Thomas Scharrenbach (University Zurich) PCKS: Benchmarking Semantic Flow Processing Systems

17:20 Meeting Conclusion (Josep Larriba Pey)

17:30 End of TUC meeting

19:00 Social dinner

November 20th - Internal LDBC Meeting

10:00 Start

12:30 End of meeting

coffee and lunch provided

Logistics

Date

19th November 2013

Location

The TUC meeting will be held in The Tower hotel (Google Maps link) approximately 4 minutes walk from the GraphConnect conference in London.

Getting there

From City Airport is the easiest: short ride on the DLR to Tower Gateway. Easy.
From London Heathrow: first need to take the Heathrow Express to Paddington. Then take the Circle line to Tower Hill. See attached.

LDBC/TUC Background

Looking back, we have been working on two benchmarks for the past year: a Social Network Benchmark (SNB) and a Semantic Publishing Benchmark (SPB). While below we provide a short summary, all the details of the work on these benchmark development efforts can be found in the first yearly progress reports:

A summary of these efforts can be read below or, for a more detailed account, please refer to: The Linked Data Benchmark Council: a Graph and RDF industry benchmarking effort. Annual reports about the progress, results, and future work of these two efforts will soon be available for download here, and will be discussed in depth at the TUC.

The Social Network Benchmark (SNB) is designed for evaluating a broad range of technologies for tackling graph data management workloads. The systems targeted are quite broad: from graph, RDF, and relational database systems to Pregel-like graph compute frameworks. The social network scenario was chosen with the following goals in mind:

it should be understandable, and the relevance of managing such data should be understandable
it should cover the complete range of interesting challenges, according to the benchmark scope
the queries should be realistic, i.e., similar data and workloads are encountered in practice

SNB includes a data generator for creation of synthetic social network data with the following characteristics:

data schema is representative of real social networks
data generated includes properties occurring in real data, e.g. irregular structure, structure/value correlations, power-law distributions
the software generator is easy-to-use, configurable and scalable

SNB is intended to cover a broad range of aspects of social network data management, and therefore includes three distinct workloads:

Interactive
- Tests system throughput with relatively simple queries and concurrent updates, it is designed to test ACID features and scalability in an online operational setting.
- The targeted systems are expected to be those that offer transactional functionality.
Business Intelligence
- Consists of complex structured queries for analyzing online behavior of users for marketing purposes, it is designed to stress query execution and optimization.
- The targeted systems are expected to be those that offer an abstract query language.
Graph Analytics
- Tests the functionality and scalability of systems for graph analytics, which typically cannot be expressed in a query language.
- Analytics is performed on most/all of the data in the graph as a single operation and produces large intermediate results, and it is not expected to be transactional or need isolation.
- The targeted systems are graph compute frameworks though database systems may compete, for example by using iterative implementations that repeatedly execute queries and keep intermediate results in temporary data structures.

Semantic Publishing Benchmark

The Semantic Publishing Benchmark (SPB) simulates the management and consumption of RDF metadata that describes media assets, or creative works.

The scenario is a media organization that maintains RDF descriptions of its catalogue of creative works – input was provided by actual media organizations which make heavy use of RDF, including the BBC. The benchmark is designed to reflect a scenario where a large number of aggregation agents provide the heavy query workload, while at the same time a steady stream of creative work description management operations are in progress. This benchmark only targets RDF databases, which support at least basic forms of semantic inference. A tagging ontology is used to connect individual creative work descriptions to instances from reference datasets, e.g. sports, geographical, or political information. The data used will fall under the following categories: reference data, which is a combination of several Linked Open Data datasets, e.g. GeoNames and DBpedia; domain ontologies, that are specialist ontologies used to describe certain areas of expertise of the publishing, e.g., sport and education; publication asset ontologies, that describe the structure and form of the assets that are published, e.g., news stories, photos, video, audio, etc.; and tagging ontologies and the metadata, that links assets with reference/domain ontologies.

The data generator is initialized by using several ontologies and datasets. The instance data collected from these datasets are then used at several points during the execution of the benchmark. Data generation is performed by generating SPARQL fragments for create operations on creative works and executing them against the RDF database system.

Two separate workloads are modeled in SPB:

Editorial: Simulates creating, updating and deleting creative work metadata descriptions. Media companies use both manual and semi-automated processes for efficiently and correctly managing asset descriptions, as well as annotating them with relevant instances from reference ontologies.
Aggregation: Simulates the dynamic aggregation of content for consumption by the distribution pipelines (e.g. a web-site). The publishing activity is described as “dynamic”, because the content is not manually selected and arranged on, say, a web page. Instead, templates for pages are defined and the content is selected when a consumer accesses the page.

Status of the Semantic Publishing Benchmark