# SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity

**SEISMIC** estimates the infectiousness of an information
cascade and predicts its popularity from the observed history. A
typical application is predicting, in real time, the number of shares
of a Facebook post or retweets of a tweet.

## About SEISMIC

SEISMIC models the information cascade as a self-exciting point
process. In a self-exciting point process, each reshare not only
increases the cumulative count by one, it also exposes new followers
who may further reshare the post. This property makes the model well
suited to capturing the "rich get richer" phenomenon in information spreading.

SEISMIC implements a fast kernel-weighted method to estimate
the temporally evolving infectiousness, which fully characterizes an
information cascade. Roughly speaking, the infectiousness at a given
time measures how likely the post is to be reshared at that time. Then,
if the infectiousness is below a threshold, SEISMIC can accurately
predict the final popularity of the post.
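
To see why a threshold on the infectiousness matters, consider a deliberately simplified branching-process picture. This is an illustration, not SEISMIC's actual estimator. Assume every reshare exposes `n_star` followers on average and each exposed follower reshares with a constant probability `p`. The function name and both parameters are hypothetical and chosen only for this sketch.

```r
# Hypothetical helper: expected total number of posts in a simplified
# branching model where each post triggers p * n_star reshares on average.
expected_total_reshares <- function(p, n_star, seeds = 1) {
  rate <- p * n_star          # mean number of reshares caused by one post
  if (rate >= 1) {
    return(Inf)               # above the threshold the expected size diverges
  }
  seeds / (1 - rate)          # geometric series: 1 + rate + rate^2 + ...
}

expected_total_reshares(p = 0.005, n_star = 100)  # below the threshold: finite
expected_total_reshares(p = 0.02, n_star = 100)   # above the threshold: Inf
```

In this toy model the expected size is finite only while `p` stays below `1 / n_star`, which mirrors the idea that a final popularity can be predicted once the infectiousness drops under a threshold.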

## Paper

For more details, you can download our paper

## Code

The SEISMIC algorithm is implemented in R and available
on CRAN. In
R, you can install it with

    install.packages("seismic")

Alternatively, you can download the latest package source here.

An example of SEISMIC can be run with

    library(seismic)
    example(pred.cascade)
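
For a more explicit walk-through, the sketch below estimates the infectiousness of one cascade and then predicts its final retweet count at several time points. It assumes the CRAN package is installed and that it provides the example data set `tweet` and the function `get.infectiousness` alongside `pred.cascade`, with the argument order shown. Treat these as assumptions and check `?pred.cascade` and `?get.infectiousness` for the authoritative signatures. The code is skipped if the package is missing.

```r
# Prediction times: every minute over the first six hours (in seconds).
pred_time <- seq(0, 6 * 60 * 60, by = 60)

# Assumption: 'seismic' ships a data set 'tweet' whose first column is the
# relative retweet time (seconds) and whose second column is follower counts.
if (requireNamespace("seismic", quietly = TRUE)) {
  library(seismic)
  data(tweet)

  # Estimate the time-varying infectiousness at each prediction time.
  infect <- get.infectiousness(tweet[, 1], tweet[, 2], pred_time)

  # Predict the final popularity at each prediction time.
  pred <- pred.cascade(pred_time, infect$infectiousness,
                       tweet[, 1], tweet[, 2], n.star = 100)
  print(summary(pred))
}
```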

## Download the data

We evaluate SEISMIC on a full month of Twitter data, which can
be found below. The original data set contains over 3.2 billion
tweets and retweets posted on Twitter from October 7 to November 7,
2011. We kept only tweets that have at least 50 retweets, whose text
does not contain a hashtag (the pound sign #), and whose original
poster's language is English. In the end, 166,076 tweets satisfy
these criteria.

**Download:**

- **data.csv** (34,784,489 lines of tweets/retweets, 285 MB)
- **index.csv** (166,077 lines of tweets, 7.9 MB)

**Data format:**

- data.csv (with header)
  **<relative_time_second>,<number_of_followers>**
  - <relative_time_second>: relative post time of the tweet/retweet (in seconds)
  - <number_of_followers>: number of followers of the user who tweets/retweets

- index.csv (with header)
  **<tweet_id>,<post_time_day>,<start_ind>,<end_ind>**
  - <tweet_id>: ID of the original tweet
  - <post_time_day>: post time (UTC) of the original tweet (in days)
  - <start_ind>: the first row in **data.csv** of this tweet
  - <end_ind>: the last row in **data.csv** of this tweet
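
As a rough sketch of how the two files fit together, the code below pulls one cascade out of data.csv using the row range stored in index.csv. It builds tiny stand-in files with made-up values so that it runs without the real download. The helper `get_cascade` is hypothetical, and the sketch assumes that `start_ind` and `end_ind` are 1-based row numbers counted after the header, which should be verified against the actual files.

```r
# Create small stand-in files with the documented columns (values are made up).
data_path <- file.path(tempdir(), "data.csv")
index_path <- file.path(tempdir(), "index.csv")
write.csv(data.frame(relative_time_second = c(0, 12, 40, 0, 7),
                     number_of_followers = c(500, 30, 120, 80, 15)),
          data_path, row.names = FALSE)
write.csv(data.frame(tweet_id = c("101", "102"),
                     post_time_day = c(0.5, 1.25),
                     start_ind = c(1, 4),
                     end_ind = c(3, 5)),
          index_path, row.names = FALSE)

cascades <- read.csv(data_path)
# Read tweet IDs as text so that long IDs are not rounded to doubles.
index <- read.csv(index_path, colClasses = c(tweet_id = "character"))

# Hypothetical helper: rows of data.csv that belong to the i-th tweet.
# Assumes start_ind/end_ind are 1-based data rows (header not counted).
get_cascade <- function(cascades, index, i) {
  cascades[index$start_ind[i]:index$end_ind[i], , drop = FALSE]
}

second_cascade <- get_cascade(cascades, index, 2)
nrow(second_cascade)  # number of posts (original tweet plus retweets)
```

With the real files, replacing the two paths with the downloaded data.csv and index.csv should be enough, provided the indexing assumption holds.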