**LIM** learns the global influence of a node on the rate of information diffusion.

Below, you can find some extra information:

We assume the influence function of an individual node, the number of follow-up infections that the node's infection will bring.
**LIM** models the rate of information diffusion as the sum of the influence function of individual nodes, and can be trained very efficiently with large data.
For experiment, we track cascades of information diffusion on a set of 500 million Twitter posts and a set of 170 million news articles and blog posts, and then estimate the influence function of nodes using **LIM**

- J. Yang, J. Leskovec. Modeling Information Diffusion in Implicit Networks. IEEE International Conference On Data Mining (ICDM), 2010.

We examine the diffusion of short textual phrase over the online media space using the Memetracker data set.We show several plots about the influence of various type of media. Using the extension of LIM, we show the recency factor and the imitation factor that govern the adoption of phrase along with the influence of media.

You can click over the figures to see them bigger.

**Average influence function of various type of media on different topics**(22 media grouped into 5 types and 1,000 cascades grouped into 6 topics):**Recency factor and latent volume (caused by imitation factor)**

We provide a simple MATLAB implementation of **LIM** and the Memetracker data that we used in our paper.

**Input data file:** The input file to **LIM** is a MATLAB binary data file(data_lim.mat). It contains a matrix `data` where each row vector correponds to a specific time index of a cascade. (If there are K cascades and each cascade has T time indices, then
`data` has KT rows.) `data` has N+2 columns where N is the number of nodes that we model their influence functions.
The first N columns denote how many times each of N nodes become infected to the cascade at the time index.
The N+1-th column specifies the volume of the cascade at the time, and the last column (N+2-th) specifies which cascade the row corresponds to.

data=[<M_{1,1}(1)>,<M_{2,1}(1)>,...,<M_{N,1}(1)>,<V_1(1)>,<1>;
<M_{1,1}(2)>,<M_{2,1}(2)>,...,<M_{N,1}(2)>,<V_1(2)>,<1>;
<M_{1,1}(3)>,<M_{2,1}(3)>,...,<M_{N,1}(3)>,<V_1(3)>,<1>;
...
<M_{1,1}(T)>,<M_{2,1}(T)>,...,<M_{N,1}(T)>,<V_1(T)>,<1>;
<M_{1,2}(1)>,<M_{2,2}(1)>,...,<M_{N,2}(1)>,<V_2(1)>,<2>;
...
<M_{1,K}(T)>,<M_{2,K}(T)>,...,<M_{N,K}(T)>,<V_K(T)>,<T>;]

M_{u,k}(t) denotes how many times node u got infected to cascade k at time t. In the data set, we use K=1000 and N=100. The time length T of a cascade varies depending on how long the cascade lasted.

Also, the data file contains two auxilary vectors. The (K,1) cell `phrname` contains the phrase of K cascades. The (N,1) cell `sitename` contains the names of websites (nodes).

**Running LIM:** Script `test_lim.m` shows how to train LIM using the data file.
Each row vector of the matrix `Influence` contains the influence function of each node.
The script displays the average influence function of all the nodes.
In addition, the script shows the comparison between the ground-truth volume of a test phrase and the predicted volume by LIM for the phrase.
You can change which phrase to test inside the script.