Posts

OWL-Empowered SPARQL Query Optimization

Tags:

DEVELOPER , INDUSTRY

The Linked Data paradigm has become the principal enabler for sharing huge volumes of data using Semantic Web technologies. It has also created novel challenges for non-relational data management systems, such as RDF and graph engines. Efficient data access through queries is perhaps the most important data management task. It is enabled by query optimization techniques, which amount to the discovery of optimal or close-to-optimal execution …

Person Activity Subgraph Features in LDBC DATAGEN

Tags:

SNB , DATAGEN

When talking about DATAGEN and other graph generators with social network characteristics, our attention is typically drawn to the friendship subgraph and its structure. However, a social graph is more than a set of people connected by friendship relations: it contains many other things worth looking at. Taking a quick look at commercial social networks like Facebook, Twitter or Google+, one can easily identify a lot of other …

SNB Driver - Part 2: Tracking Dependencies Between Queries

Tags:

SNB , DRIVER , INTERACTIVE

The SNB Driver Part 1 post broadly introduced the challenges faced when developing a workload driver for the LDBC SNB benchmark. In this post we'll drill down into what it means to execute "dependent queries" during benchmark execution, and how the driver handles them. First of all, since many driver-specific terms will be used, their definitions are listed below. There is no need to read them in …

SNB Driver - Part 3: Workload Execution - Putting It All Together

Tags:

SNB , DRIVER , INTERACTIVE

Up until now we have introduced the challenges faced when executing the LDBC SNB benchmark and explained how some of them are overcome. With the foundations laid, we can now explain precisely how operations are executed.

Based on the dependencies certain operations have, and on the granularity of parallelism we wish to achieve while executing them, we assign a Dependency Mode and an Execution Mode to every operation type. Using these …

Running the Semantic Publishing Benchmark on Sesame, a Step by Step Guide

Tags:

SPB , SESAME , RDF , TUTORIAL , GUIDE

Until now we have discussed several aspects of the Semantic Publishing Benchmark (SPB), such as the difference in performance between virtual and real server configurations, how to choose an appropriate query mix for a benchmark run, and our experience using SPB to find performance issues during the development of GraphDB.

In this post we provide a step-by-step guide on how to run SPB using the Sesame RDF data store on a fresh install …

Semantic Publishing Instance Matching Benchmark

Tags:

INSTANCE MATCHING , BENCHMARK

The Semantic Publishing Instance Matching Benchmark (SPIMBench) is a novel benchmark for the assessment of instance matching techniques for RDF data with an associated schema. SPIMBench extends the state-of-the-art instance matching benchmarks for RDF data in three main aspects: it allows for systematic scalability testing, supports a wider range of test cases including semantics-aware ones, and provides an enriched gold standard.

The SPIMBench …

Further Developments in SNB BI Workload

Tags:

SNB , BI

We are presently working on the SNB BI workload. Andrey Gubichev of TU München and I are going through the queries and experimenting with two SQL-based implementations, one on Virtuoso and the other on HyPer.

As discussed before, the BI workload uses the TPC-H choke points as a base but pushes further in terms of graphiness and query complexity.

There are obvious marketing applications for an SNB-like dataset. There are also security …

Sizing AWS Instances for the Semantic Publishing Benchmark

Tags:

SPB , AMAZON , EC2 , AWS , RDF

LDBC’s Semantic Publishing Benchmark (SPB) measures the performance of an RDF database under a load typical of metadata-based content publishing, such as the well-known BBC Dynamic Semantic Publishing scenario. Such a load combines tens of updates per second (e.g. adding metadata about new articles) with an even higher volume of read requests (SPARQL queries collecting recent content and data to generate a web page on a specific subject, e.g. Frank …

DATAGEN: a Realistic Social Network Data Generator

Tags:

DEVELOPER , INDUSTRY

In previous posts (Getting started with snb, DATAGEN: data generation for the Social Network Benchmark), Arnau Prat discussed the main features and characteristics of DATAGEN: realism, scalability, determinism and usability. DATAGEN is the social network data generator used by the three LDBC-SNB workloads. It produces data simulating the activity in a social network site over a period of time. In this post, we conduct a series of experiments …

SNB Driver - Part 1

Tags:

SNB , DRIVER , TPC-C , INTERACTIVE

In this multi-part blog series we consider the challenge of running the LDBC Social Network Interactive Benchmark (LDBC SNB) workload in parallel, i.e. the design of the workload driver that issues the queries against the System Under Test (SUT). We go through the design principles implemented in the LDBC SNB workload generator/load tester (referred to simply as the driver). Software and documentation for this driver are available here: …
