8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database.

  • Posted on: 5 January 2017
  • By: Adrian Diaz

During the 8th TUC Meeting, Eugene Chong from Oracle USA explained what he and his team had done to improve RDF query processing in their database.

His talk revolved around three main topics: order-by and filter processing, in-memory processing, and in-memory virtual columns. SPARQL has its own order-by semantics, and to honor them the team had been generating a massive SQL statement that checked the value type, numeric value, date and string values. To address the problems this statement caused, they materialized the value types and values in the RDF_VALUE$ table as the columns ORDER_TYPE, ORDER_NUM and ORDER_DATE, which are populated at load time. Order-by processing then generates SQL against these columns, and they did the same for filter processing. The solution seems to have worked: queries ran up to 9 times faster than before.
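The idea of materializing ordering columns at load time can be sketched in a few lines of Python. This is only an illustration of the technique, not Oracle's implementation: the `TYPE_RANK` values, the `ValueRow` layout and the helper names are assumptions made for this example, and the real encoding of ORDER_TYPE is not described in this summary.

```python
from datetime import date
from typing import NamedTuple, Optional

# Hypothetical ranks per value type; the actual ORDER_TYPE encoding is an assumption.
TYPE_RANK = {"blank": 1, "iri": 2, "numeric": 3, "date": 4, "string": 5}


class ValueRow(NamedTuple):
    value_id: int
    lexical: str                  # the value as stored
    order_type: int               # plays the role of ORDER_TYPE
    order_num: Optional[float]    # plays the role of ORDER_NUM
    order_date: Optional[date]    # plays the role of ORDER_DATE


def materialize(value_id: int, lexical: str, kind: str) -> ValueRow:
    """Compute the ordering columns once, when the value is loaded."""
    num = float(lexical) if kind == "numeric" else None
    day = date.fromisoformat(lexical) if kind == "date" else None
    return ValueRow(value_id, lexical, TYPE_RANK[kind], num, day)


def order_key(row: ValueRow):
    """Sort key built only from precomputed columns, with no parsing at query time."""
    return (
        row.order_type,
        row.order_num if row.order_num is not None else 0.0,
        row.order_date if row.order_date is not None else date.min,
        row.lexical,
    )


rows = [
    materialize(1, "10", "numeric"),
    materialize(2, "9", "numeric"),
    materialize(3, "http://example.org/a", "iri"),
    materialize(4, "2017-01-05", "date"),
    materialize(5, "abc", "string"),
]
ordered_ids = [row.value_id for row in sorted(rows, key=order_key)]
```

Because the numeric value is stored separately, "9" sorts before "10" without any type check or conversion during the sort itself, which is the work the original per-query statement had to repeat.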

Regarding in-memory processing, they used Oracle's in-memory column store feature to load frequently accessed columns into memory. Chong and his team stored the subject, predicate and object IDs and, from the value table, the ID and the value itself. They also relied on hash joins and fast full scans of the tables. To test the system they used a machine with 32GB of memory and 2TB of disk space. The benchmark of choice was LUBM, with a dataset of more than 8M rows. They varied the in-memory size from 1GB to 6GB, the latter being enough to hold all the data. The improvement with this setup ranged from 4x to 6x, and the in-memory full scan was 190 times faster than before.
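As a rough illustration of why a hash join pairs well with full scans of in-memory columns, the sketch below builds a hash table over a small value table once and then probes it while scanning the triples. The table contents and the `hash_join` helper are hypothetical and stand in for the ID and value columns mentioned above.

```python
from typing import Dict, Iterable, Iterator, Tuple

Link = Tuple[int, int, int]   # (subject ID, predicate ID, object ID)
Value = Tuple[int, str]       # (value ID, value)


def hash_join(links: Iterable[Link], values: Iterable[Value]) -> Iterator[Tuple[str, str, str]]:
    """Resolve the IDs in each triple to their values with one pass over each input."""
    lookup: Dict[int, str] = dict(values)        # build phase: full scan of the value table
    for s_id, p_id, o_id in links:               # probe phase: full scan of the triples
        yield lookup[s_id], lookup[p_id], lookup[o_id]


values = [(1, "ex:alice"), (2, "ex:worksFor"), (3, "ex:oracle"), (4, "ex:bob")]
links = [(1, 2, 3), (4, 2, 3)]
resolved = list(hash_join(links, values))
```

Each table is read exactly once from start to end, which is the access pattern that benefits most when the columns already sit in memory.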

Finally, the objective of in-memory virtual columns was to remove the joins by fully de-normalizing the data in memory. This is particularly useful when the data is fully populated in memory. Queries are processed on the RDF_LINK$ table only, and the joins with the RDF_VALUE$ table are removed. With this approach, queries ran eight times faster than before.
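A minimal sketch of the de-normalization idea, assuming a simplified layout: the values are copied next to the IDs once, so a later query reads a single structure and performs no lookup in a separate value table. The `WideLink` type and the helper names are invented for this example and do not reflect the actual virtual column definitions.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class WideLink(NamedTuple):
    """One triple with its values inlined, playing the role of the virtual columns."""
    s_id: int
    p_id: int
    o_id: int
    s_val: str
    p_val: str
    o_val: str


def denormalize(links: Iterable[Tuple[int, int, int]], values: Dict[int, str]) -> List[WideLink]:
    """Pay the join cost once, when the in-memory structure is populated."""
    return [WideLink(s, p, o, values[s], values[p], values[o]) for s, p, o in links]


def subjects_and_objects(wide: List[WideLink], predicate: str) -> List[Tuple[str, str]]:
    """Answer a query from the wide rows alone; the value table is never touched."""
    return [(row.s_val, row.o_val) for row in wide if row.p_val == predicate]


values = {1: "ex:alice", 2: "ex:worksFor", 3: "ex:oracle", 4: "ex:knows", 5: "ex:bob"}
links = [(1, 2, 3), (1, 4, 5), (5, 2, 3)]
wide = denormalize(links, values)
employees = subjects_and_objects(wide, "ex:worksFor")
```

The trade-off is extra memory for the duplicated values in exchange for join-free queries, which fits the case where all the data is already held in memory.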

Remember to plan your attendance at the upcoming 9th TUC Meeting at SAP's HQ in Walldorf, Germany, on 9-10 February!
