Event Driven Post Generation in Datagen

by Arnau Prat / on 10 Apr 2015

As discussed in previous posts, one of the features that makes Datagen
more realistic is that the activity volume of the simulated Persons is
not uniform but forms spikes. In this blog entry I want to explain in
more depth how this is implemented in Datagen.

First of all, let me review a few basics of how Datagen works
internally. In Datagen, once the person graph (persons and their
relationships) has been created, activity generation starts. Persons are
divided into blocks of 10k, in the same way as during the friendship
edge generation process. Then, for each person in the block, three types
of forums are created:

  • The wall of the person
  • The albums of the person
  • The groups where the person is a moderator
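To make the block structure concrete, here is a minimal Python sketch.
It is not Datagen's actual code: it assumes persons are identified by
integer ids, and the names `Forum`, `blocks` and `forums_for_person`, as
well as the per-person album and group counts, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator

BLOCK_SIZE = 10_000  # block size mentioned in the text


@dataclass
class Forum:
    kind: str        # hypothetical labels: "wall", "album" or "group"
    moderator: int   # id of the person who owns or moderates the forum
    members: list[int] = field(default_factory=list)


def blocks(person_ids: list[int], size: int = BLOCK_SIZE) -> Iterator[list[int]]:
    """Yield consecutive blocks of at most `size` persons."""
    for start in range(0, len(person_ids), size):
        yield person_ids[start:start + size]


def forums_for_person(pid: int, n_albums: int, n_groups: int) -> list[Forum]:
    """Create the wall, the albums and the moderated groups of one person.

    n_albums and n_groups are hypothetical inputs; how Datagen chooses
    them is not covered here.
    """
    forums = [Forum("wall", pid)]
    forums += [Forum("album", pid) for _ in range(n_albums)]
    forums += [Forum("group", pid) for _ in range(n_groups)]
    return forums
```

With 25,000 persons, `blocks` yields three blocks of 10,000, 10,000 and
5,000 persons, and activity is then generated block by block.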

We will focus on group generation, but the same concepts apply to the
other types of forums. Once a group is created, its members are
selected, either from the friends of the moderator or from random
persons within the same block.
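The member selection step can be sketched as follows. The text only
says members come from the moderator's friends or from random persons of
the same block; the mixing probability `p_friend`, the retry cap and the
function name `select_members` are assumptions made for illustration.

```python
import random


def select_members(moderator: int, friends: list[int], block: list[int],
                   n_members: int, p_friend: float,
                   rng: random.Random) -> list[int]:
    """Pick up to n_members distinct members for a group (hypothetical sketch).

    Each draw takes a friend of the moderator with probability p_friend
    (an assumed parameter), otherwise a random person of the same block.
    """
    friend_pool = [f for f in friends if f != moderator]
    block_pool = [p for p in block if p != moderator]
    members: set[int] = set()
    attempts = 0
    # Cap the number of draws so small pools cannot loop forever.
    while len(members) < n_members and attempts < 10 * n_members:
        attempts += 1
        if friend_pool and rng.random() < p_friend:
            pool = friend_pool
        else:
            pool = block_pool
        if pool:
            members.add(rng.choice(pool))
    return sorted(members)
```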

After the members have been assigned to the group, post generation
starts. There are two types of post generators: the uniform post
generator and the event-based post generator. Given a forum, each post
generator is responsible for generating a set of posts whose authors are
taken from the members of the forum. The uniform post generator
distributes the dates of the generated posts uniformly along the
timeline (from the date of the membership until the end of the
simulation time). The event-based post generator, on the other hand,
assigns dates to posts based on what we call “flashmob events”.
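For comparison with the event-based generator, the date assignment of
the uniform post generator fits in a few lines. This is a hypothetical
sketch: the function name and the use of Python's `datetime` are
illustrative choices, not Datagen's API.

```python
import random
from datetime import datetime, timedelta


def uniform_post_dates(membership: datetime, sim_end: datetime,
                       n_posts: int, rng: random.Random) -> list[datetime]:
    """Draw n_posts dates uniformly between the membership date and the
    end of the simulation, returned in chronological order."""
    span = (sim_end - membership).total_seconds()
    return sorted(membership + timedelta(seconds=rng.uniform(0, span))
                  for _ in range(n_posts))
```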

Flashmob events are generated at the beginning of the execution. Their
number is given by a configuration parameter, set to 30 events per month
of simulation, and their dates are distributed uniformly along the whole
timeline. Each event is also assigned a volume level between 1 and 20,
drawn from a power-law distribution, which determines how relevant or
important the event is, and a tag representing the concept or topic of
the event. Two different events can have the same tag. For example, one
of the flashmob events created for SF1 is related to the “Enrique
Iglesias” tag, has level 11, and occurs on 29 May 2012 at 09:33:47.
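A minimal sketch of this event generation step, assuming a discrete
power law P(level = k) ∝ k^(-alpha) with a hypothetical exponent
`alpha = 2.0`, tags drawn uniformly from a given list, and the number of
simulated months passed in explicitly. The 30 events per month and the
1 to 20 level range come from the text; everything else is illustrative.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta

EVENTS_PER_MONTH = 30   # configuration value quoted in the text
MAX_LEVEL = 20          # levels range from 1 to 20


@dataclass(frozen=True)
class FlashmobEvent:
    date: datetime
    level: int
    tag: str


def generate_events(start: datetime, end: datetime, n_months: int,
                    tags: list[str], rng: random.Random,
                    alpha: float = 2.0) -> list[FlashmobEvent]:
    """Create the flashmob events of the whole simulation, sorted by date.

    alpha is an assumed power-law exponent; tags are assumed to be
    drawn uniformly, so several events may share the same tag.
    """
    n_events = EVENTS_PER_MONTH * n_months
    span = (end - start).total_seconds()
    levels = list(range(1, MAX_LEVEL + 1))
    weights = [k ** -alpha for k in levels]   # P(k) proportional to k^-alpha
    events = []
    for _ in range(n_events):
        date = start + timedelta(seconds=rng.uniform(0, span))  # uniform date
        level = rng.choices(levels, weights=weights)[0]
        events.append(FlashmobEvent(date, level, rng.choice(tags)))
    return sorted(events, key=lambda e: e.date)
```

Because of the power law, most events get a low level and only a few
reach the high levels that produce the largest spikes.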

When event-based post generation starts for a given group, a subset of
the generated flashmob events is extracted. These events must be
correlated with the tag/topic of the group, and they are restricted by
the creation date of the group (a group cannot talk about an event that
happened before the group was created). Given this subset of events and
their volume levels, a cumulative probability distribution is computed
over the events sorted by date and weighted by their level; it is later
used to determine which event a given post is associated with.
Therefore, events with a higher level have a higher probability of
receiving posts, which makes their volume larger. Then, post generation
starts, which can be summarized as follows:

  • Determine the number of posts to generate
  • Select a random member of the group as the author of the post
  • Determine the event the post relates to, using the cumulative
    distribution described above
  • Assign the date of the post based on the event date
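The event selection and the four steps above can be sketched as below.
This is an illustration, not Datagen's implementation: "correlated with
the tag of the group" is assumed to mean an exact tag match, the number
of posts is simply passed in by the caller, and the date offset around
the event is a caller-supplied function. All names are hypothetical.

```python
import bisect
import random
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass(frozen=True)
class FlashmobEvent:
    date: datetime
    level: int
    tag: str


def relevant_events(events: list[FlashmobEvent], group_tags: set[str],
                    group_created: datetime) -> list[FlashmobEvent]:
    """Events matching a group tag and not before the group's creation, by date."""
    return sorted((e for e in events
                   if e.tag in group_tags and e.date >= group_created),
                  key=lambda e: e.date)


def cumulative_levels(events: list[FlashmobEvent]) -> list[float]:
    """Cumulative distribution over the date-sorted events, weighted by level."""
    total = sum(e.level for e in events)
    acc, cdf = 0, []
    for e in events:
        acc += e.level
        cdf.append(acc / total)   # the last entry is exactly 1.0
    return cdf


def pick_event(events: list[FlashmobEvent], cdf: list[float],
               rng: random.Random) -> FlashmobEvent:
    """Inverse-CDF sampling: event i is chosen with probability level_i / total."""
    return events[bisect.bisect_right(cdf, rng.random())]


def generate_posts(members: list[int], events: list[FlashmobEvent],
                   cdf: list[float], n_posts: int,
                   offset: Callable[[random.Random], timedelta],
                   rng: random.Random) -> list[tuple[int, str, datetime]]:
    """Return (author, tag, date) for each generated post."""
    if not events or not members:
        return []
    posts = []
    for _ in range(n_posts):                    # number of posts chosen by the caller
        author = rng.choice(members)            # random member of the group
        event = pick_event(events, cdf, rng)    # event drawn from the distribution
        posts.append((author, event.tag, event.date + offset(rng)))  # date near event
    return posts
```

Using `bisect` on the precomputed cumulative distribution makes each
event lookup logarithmic in the number of relevant events.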

To assign a date to the post, based on the date of the event it is
associated with, we use a probability density taken from [1]. The
density combines an exponential function in the 8-hour interval around
the peak with a logarithmic function for the volume outside this
interval. The following figure shows the actual shape of the volume,
centered at the date of the event.
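A sampler for such a two-part density can be sketched by tabulating it
at one-minute resolution and drawing from the table. Only the 8-hour
interval (4 hours on each side of the peak) comes from the text; the
decay constant `TAU_H`, the ±72-hour window and the exact logarithmic
form (continuous at the 4-hour boundary, reaching zero at the window
edge) are assumptions chosen for illustration, not the values Datagen
uses.

```python
import itertools
import math
import random
from datetime import timedelta

PEAK_HALF_WIDTH_H = 4.0   # half of the 8-hour interval around the peak
TAU_H = 1.0               # hypothetical decay constant of the exponential part
WINDOW_H = 72.0           # hypothetical: posts fall within +/- 3 days of the event


def volume(dt_h: float) -> float:
    """Unnormalised post volume dt_h hours from the event (assumed shape)."""
    x = abs(dt_h)
    if x <= PEAK_HALF_WIDTH_H:
        return math.exp(-x / TAU_H)   # exponential part around the peak
    # Logarithmic decay outside the interval: equal to the exponential
    # at the 4-hour boundary and falling to zero at the window edge.
    edge = math.exp(-PEAK_HALF_WIDTH_H / TAU_H)
    k = 1.0 / math.log(WINDOW_H / PEAK_HALF_WIDTH_H)
    return max(0.0, edge * (1.0 - k * math.log(x / PEAK_HALF_WIDTH_H)))


# Tabulate the density once, one entry per minute in the window.
_MINUTES = list(range(-int(WINDOW_H * 60), int(WINDOW_H * 60) + 1))
_CUM_WEIGHTS = list(itertools.accumulate(volume(m / 60.0) for m in _MINUTES))


def sample_offset(rng: random.Random) -> timedelta:
    """Draw the offset of a post from its event date."""
    minute = rng.choices(_MINUTES, cum_weights=_CUM_WEIGHTS)[0]
    return timedelta(minutes=minute)
```

Precomputing the cumulative weights lets `random.choices` locate each
sample by bisection instead of re-summing the table on every call.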

Returning to the “Enrique Iglesias” example, the following figure shows
the activity volume of posts around the event as generated by Datagen.

In this blog entry we have seen how Datagen creates event-driven user
activity. This allows us to reproduce the heterogeneous post creation
density found in real social networks, where post creation is driven by
real-world events.


[1] Jure Leskovec, Lars Backstrom, Jon M. Kleinberg: Meme-tracking and
the dynamics of the news cycle. KDD 2009: 497-506