SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity

SEISMIC estimates the infectiousness of an information cascade and predicts its popularity given the observed history. A typical application is predicting, in real time, the number of shares of a Facebook post or retweets of a tweet.

About SEISMIC

SEISMIC models an information cascade as a self-exciting point process. In such a process, each reshare not only increases the cumulative count by one but also exposes new followers who may reshare the post in turn. This property makes the model well suited to the "rich get richer" phenomenon in information spreading.
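The self-exciting behaviour described above can be illustrated with a small simulation. This is a minimal sketch in base R, not the SEISMIC implementation: the function name `simulate_cascade`, the constant infectiousness `p`, the exponential reaction delay, and the geometric follower distribution are all assumptions chosen for brevity.

```r
# Simulate one cascade as a self-exciting (branching) point process.
# All names, parameters, and distributions are illustrative assumptions.
simulate_cascade <- function(n0 = 1000, p = 0.005, mean_followers = 100,
                             mean_delay = 600, max_events = 10000) {
  times <- 0          # time of the original post, in seconds
  followers <- n0     # follower count of the original poster
  i <- 1              # index of the post currently being processed
  while (i <= length(times) && length(times) < max_events) {
    # Post i exposes its followers; each exposure leads to a reshare with
    # probability p, so the number of reshares is approximately Poisson.
    n_children <- rpois(1, p * followers[i])
    if (n_children > 0) {
      # Reshares arrive after a random reaction delay.
      times <- c(times, times[i] + rexp(n_children, rate = 1 / mean_delay))
      # Each resharer brings a new audience that may reshare in turn.
      followers <- c(followers,
                     rgeom(n_children, prob = 1 / (mean_followers + 1)))
    }
    i <- i + 1
  }
  ord <- order(times)
  data.frame(time = times[ord], followers = followers[ord])
}

set.seed(1)
cascade <- simulate_cascade()
n_reshares <- nrow(cascade) - 1   # every row except the original post
```

With these defaults, `p * mean_followers` is 0.5, so each reshare triggers fewer than one further reshare on average and the cascade dies out. Raising `p` toward `1 / mean_followers` makes large cascades much more likely.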

SEISMIC implements a fast kernel-weighted method to estimate the temporally evolving infectiousness, which fully characterizes an information cascade. Roughly speaking, the infectiousness at a given time measures how likely the post is to be reshared at that time. If the infectiousness is below a threshold, SEISMIC can accurately predict the final popularity of the post.
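The idea of estimating infectiousness and then predicting final popularity can be sketched in base R. This is a deliberately simplified illustration, not the package's code: it drops the kernel weighting, assumes an exponential memory kernel, and uses `1 / n_star` as the threshold, where `n_star` is an assumed average follower count. The function names are hypothetical.

```r
# Memory kernel CDF: share of eventual reactions that have occurred within
# s seconds of exposure. The exponential form is an assumption.
memory_cdf <- function(s, mean_delay = 600) pexp(s, rate = 1 / mean_delay)

# Unweighted infectiousness estimate at time t: observed reshares divided
# by the effective number of followers exposed so far.
estimate_infectiousness <- function(share_time, degree, t) {
  seen <- share_time <= t
  reshares <- sum(seen) - 1   # exclude the original post
  exposed <- sum(degree[seen] * memory_cdf(t - share_time[seen]))
  if (exposed > 0) reshares / exposed else 0
}

# Final popularity: reshares so far, plus reshares still expected from
# followers already exposed, amplified by later generations.
predict_final <- function(share_time, degree, t, n_star = 100) {
  seen <- share_time <= t
  p <- estimate_infectiousness(share_time, degree, t)
  if (p * n_star >= 1) return(Inf)   # above threshold: no finite prediction
  reshares <- sum(seen) - 1
  pending <- sum(degree[seen] * (1 - memory_cdf(t - share_time[seen])))
  reshares + p * pending / (1 - p * n_star)
}

# Toy cascade: post and reshare times (seconds) with follower counts.
share_time <- c(0, 60, 120, 600, 1800)
degree <- c(1000, 50, 200, 30, 80)

p_hat <- estimate_infectiousness(share_time, degree, t = 3600)
final <- predict_final(share_time, degree, t = 3600)
```

The threshold check reflects a branching-process argument: if each reshare is expected to trigger at least one further reshare (`p * n_star >= 1`), the geometric series of future generations does not converge, so the sketch returns `Inf` instead of a number.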

Paper

For more details, you can download our paper

Code

The SEISMIC algorithm is implemented in R and is available on CRAN. In R, you can install it with

install.packages("seismic")

Alternatively, you can download the latest package source here.

You can run an example of SEISMIC with

library(seismic)
example(pred.cascade)

Download the data

We evaluate SEISMIC on a full month of Twitter data, which can be found below. The original data set contains over 3.2 billion tweets and retweets posted on Twitter from October 7 to November 7, 2011. We kept only tweets that have at least 50 retweets, whose text does not contain a pound sign # (hashtag), and whose original poster's language is English. In the end, 166,076 tweets satisfy these criteria.
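The three filtering criteria can be expressed as a single row selection. The sketch below runs on a made-up data frame: the column names `text`, `retweet_count`, and `lang` are hypothetical and do not describe the format of the released data.

```r
# Hypothetical table of original tweets; column names are assumptions.
tweets <- data.frame(
  text = c("big news today", "check this #breaking",
           "hola a todos", "nice photo"),
  retweet_count = c(120, 300, 75, 12),
  lang = c("en", "en", "es", "en"),
  stringsAsFactors = FALSE
)

kept <- tweets[
  tweets$retweet_count >= 50 &                 # at least 50 retweets
    !grepl("#", tweets$text, fixed = TRUE) &   # no pound sign in the text
    tweets$lang == "en",                       # poster's language is English
]
```

Here only the first row passes all three checks. The second contains a hashtag, the third is not English, and the fourth has too few retweets.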

Download:

Data format: