The SNB Driver part 1 post
introduced, broadly, the challenges faced when developing a workload
driver for the LDBC SNB benchmark. In this blog we’ll drill down deeper
into the details of what it means to execute “dependent queries” during
benchmark execution, and how this is handled in the driver. First of
all, as many driver-specific terms will be used, below is a listing of
their definitions. There is no need to read them in detail, it is just
there to serve as a point of reference.
Definitions
- Simulation Time (ST): notion of time created by data generator. All
time stamps in the generated data set are in simulation time - Real Time (RT): wall clock time
- Time Compression Ratio: function that maps simulation time to real
time, e.g., an offset in combination with a compression ratio. It is a
static value, set in driver configuration. Real Time Ratio is reported
along with benchmark results, allowing others to recreate the same
benchmark - Operation: read and/or write
- Dependencies: operations in this set introduce dependencies in the
workload. That is, for every operation in this set there exists at least
one other operation (in Dependents) that can not be executed until this
operation has been processed - Dependents: operations in this set are dependent on at least one
other operation (in Dependencies) in the workload - Due Time (DueT): point in simulation time at which the execution of
an operation should be initiated. - Dependent Time (DepT): in addition to Due Time, every operation in
Dependents also has a Dependent Time, which corresponds to the Due Time
of the operation that it depends on. Dependent Time is always before Due
Time. For operations with multiple dependencies Dependent Time is the
maximum Due Time of all the operations it depends on. - Safe Time (SafeT): time duration.
** when two operations have a necessary order in time (i.e., dependency)
there is at least a SafeT interval between them
** SafeT is the minimum duration between the Dependency Time and Due
Time of any operations in Dependents - Operation Stream: sequence of operations ordered by Due Time
(dependent operations must separated by at least SafeT) - Initiated Operations: operations that have started executing but not
yet finished - Local Completion Time (per driver): point in simulation time behind
which there are no uncompleted operationsLocal Completion Time =
min(min(Initiated Operations), max(Completed Operations)) - Global Completion Time (GCT): minimum completion time of all
drivers. Once GCT has advanced to the Dependent Time of some operation
that operation is safe to execute, i.e., the operations it depends
on have all completed executing. Global Completion Time = min(Local
Completion Time) - Execution Window (Window): a timespan within which all operations
can be safely executed
** All operations satisfying window.startTime <= operation.DueT <
window.endTime may be executed
** Within a window no restrictions on operation ordering or operation
execution time are enforced, driver has a freedom of choosing an
arbitrary scheduling strategy inside the window
** To ensure that execution order respects dependencies between
operations, window size is bounded by SafeT, such that: 0 <
window.duration <= SafeT
** Window duration is fixed, per operation stream; this is to simplify
scheduling and make benchmark runs repeatable
** Before any operations within a window can start executing it is
required that: GCT >= window.startTime - (SafeT - window.duration)
** All operations within a window must initiate and complete between
window start and end times: window.startTime <= operation.initiate <
window.endTime and window.startTime <= operation.complete <
window.endTime - Dependency Mode: defines dependencies, constraints on operation
execution order - Execution Mode: defines how the runtime should execute operations of
a given type
Tracking Dependencies
Now, the fun part, making sure dependent operations are executed in the
correct order.
Consider that every operation in a workload belongs to none, one, or
both of the following sets: Dependencies and Dependents.
As mentioned, the driver uses operation time stamps (Due Times) to
ensure that dependencies are maintained. It keeps track of the latest
point in time behind which every operation has completed. That is, every
operation (i.e., dependency) with a Due Time lower or equal to this time
is guaranteed to have completed execution. It does this by maintaining a
monotonically increasing variable called Global Completion Time (GCT).
Logically, every time the driver (via a database connector) begins
execution of an operation from Dependencies that operation is added to
Initiated Operations:
- the set of operations that have started executing but not yet
finished.
Then, upon completion, the operation is removed from Initiated
Operations and added to Completed Operations:
- the set of operations that have started and finished executing.
Using these sets, each driver process maintains its own view of GCT in
the following way. Local progress is monitored and managed using a
variable called Local Completion Time (LCT):
- the point in time behind which there are no uncompleted operations.
No operation in Initiated Operations has a lower or equal Due Time and
no operation in Completed Operations has an equal or higher Due Time.
LCT is periodically sent to all other driver processes, which all then
(locally) set their view of GCT to the minimum LCT of all driver
processes.
At this point the driver has two, of the necessary three (third covered
shortly), pieces of information required for knowing when to execute an
operation:
- Due Time: point in time at which an operation should be executed,
assuming all preconditions (e.g., dependencies) have been fulfilled - GCT: every operation (from Dependencies) with a Due Time before this
point in time has completed execution
However, with only GCT to track dependencies the driver has no way of
knowing when it is safe to execute any particular dependent operation.
What GCT communicates is that all dependencies up to some point in time
have completed, but whether or not the dependencies for any particular
operation are within these completed operations is unknown. The driver
would have to wait until GCT has passed the Due Time (because Dependency
Time is always lower) of an operation before that operation could be
safely executed, which would result in the undesirable outcome of every
operation missing its Due Time.
The required information is which particular operation in Dependencies
does any operation in Dependents depend on. More specifically, the Due
Time of this operation. This is referred to as Dependent Time:
- in addition to Due Time, every operation in Dependents also has
(read: must have) a Dependent Time, which corresponds to the latest Due
Time of all the operations it depends on. Once GCT has advanced beyond
the Dependent Time of an operation that operation is safe to execute.
Using these three mechanisms (Due Time, GCT, and Dependent Time) the
driver is able to execute operations, while ensuring their dependencies
are satisfied beforehand.
Scalable execution in the Presence of Dependencies
The mechanisms introduced in part 1 guarantee that dependency
constraints are not violated, but in doing so they unavoidably introduce
overhead of communication/synchronization between driver
threads/processes. To minimize the negative effects that synchronization
has on scalability an additional Execution Mode was introduced (more
about Execution Modes will be discussed shortly): Windowed
Execution. Windowed Execution has two design goals:
a) make the generated load less ‘bursty’
b) allow the driver to ‘scale’, so when the driver is given more
resources (CPUs, servers, etc.) it is able to generate more load.
In the context of Windowed Execution, operations are executed in groups
(Windows), where operations are grouped according to their Due Time.
Every Window has a Start Time, a Duration, and an End Time, and Windows
contain only those operations that have a Due Time between
Window.startTime and Window.endTime. Logically, all operations within a
Window are executed at the same time, some time within the Window. No
guaranty is made regarding exactly when, or in what order, an operation
will execute within its Window.
The reasons this approach is correct are as follows:
- Operations belonging to the Dependencies set are never executed in
this manner - the Due Times of Dependencies operations are never
modified as this would affect how dependencies are tracked - The minimum duration between the Dependency Time and Due Time of any
operation in Dependents is known (can be calculated by scanning through
workload once), this duration is referred to as Safe Time (SafeT) - A window does not start executing until the dependencies of all its
operations have been fulfilled. This is ensured by enforcing that window
execution does not start until
GCT >= window.startTime - (SafeT - window.duration) = window.endTime -
SafeT; that is, the duration between GCT and the end of the window is no
longer than SafeT
The advantages of such an execution mode are as follows:
- As no guarantees are made regarding time or order of operation
execution within a Window, GCT no longer needs to be read before the
execution of every operation, only before the execution of every window - Then, as GCT is read less frequently, it follows that it does not need
to be communicated between driver processes as frequently. There is no
need or benefit to communicating GCT protocol message more frequently
than approximately Window.duration, the side effect of which is reduced
network traffic - Further, by making no guarantees regarding the order of execution the
driver is free to reschedule operations (within Window bounds). The
advantage being that operations can be rearranged in such a way as to
reduce unwanted bursts of load during execution, which could otherwise
occur while synchronizing GCT during demanding workloads. For example, a
uniform scheduler may modify operation Due Times to be uniformly
distributed across the Window timespan, to ‘smoothen’ the load within a
Window.
As with any system, there are trade-offs to this design, particularly
regarding Window.duration. The main trade-off is that between ‘workload
resolution’ and scalability. Increasing Window.duration reduces
synchronization but also reduces the resolution at which the workload
definition is followed. That is, the generated workload becomes less
like the workload definition. However, as this is both bounded and
configurable, it is not a major concern. This issue is illustrated in
Figure 1, where the same stream of events is split into two different
workloads based on different size of the Window. The workload with
Window size 5 (on the right) has better resolution, especially for the
‘bursty’ part of the event stream.
Figure 1. Window scheduling
This design also trades a small amount of repeatability for scalability:
as there are no timing or ordering guarantees within a window, two
executions of the same window are not guaranteed to be equivalent -
‘what happens in the window stays in the window’. Despite sacrificing
this repeatability, the results of operations do not change. No
dependency-altering operations occur during the execution of a Window,
therefore results for all queries should be equivalent between two
executions of the same workload, there is no effect on the expected
result for any given operation.