Inferring Networks of Diffusion and Influence

NETINF infers a who-copies-from-whom or who-repeats-after-whom network of news media sites and blogs using the MemeTracker dataset.

Below, you can find some extra information:

About NETINF
Graphs
Download the code
Download the data
Papers
Other methods

About NETINF

We track cascades of information diffusion among more than news media sites and blogs over one year. NETINF efficiently reconstructs a who-copies-from-whom network from these cascades. This makes it possible to see how different web sites copy from each other, and how a few central web sites have specific circles of influence. For more read our paper:

M. Gomez-Rodriguez, J. Leskovec, A. Krause. Inferring Networks of Diffusion and Influence.The 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2010.

Graphs

We show several graphs that provide some insight about the structure of who-copies-from-whom networks as inferred by NETINF.

Red nodes represent news media sites and blue nodes represent blogs. The size of every node is proportional to the number of cascades in which it takes part. The width of each edge between two nodes is proportional to its strength, i.e. how likely the destination node can copy or repeat information from the source node.

You can click over the graphs to see them bigger!

Largest Connected Component (LCC) (5,000 largest phrase clusters, 1,000 most active sites, 300 iterations, cascades traced from memes):

Full Network (5,000 largest phrase clusters, 1,000 most active sites, 800 iterations, cascades traced from memes):

Full Network (10,000 largest phrase clusters, 5,000 most active sites, 1,600 iterations, cascades traced from memes):

Full Network (10,000 largest phrase clusters, 5,000 most active sites, 3,200 iterations, cascades traced from memes):

Largest Connected Component (LCC): (2,000 largest phrase clusters, 100 most active sites, 100 iterations, cascades traced from hyperlinks)

For more, read our paper

Download the code

We provide a simple implementation of NETINF in a comprehensive package with the necessary code from SNAP library.

Cascade Input format: The input file to NETINF, with information about the cascades, should have two blocks separated by a blank line. Each line in the first block contains the id and name of a site:

Each line in the second block contains information about one cascade:

<id>,<timestamp>;<id>,<timestamp>;<id>,<timestamp>...

Example of a valid input file:

0,0 1,1 2,2 3,3 4,4 0,0;1,1;2,2;3,3 4,0;3,1 1,0;3,1;2,4 0,0;1,10;2,20;3,11 0,0;1,10;2,11

Within the package, you can find additional information in ReadMe.txt, including how to compile and run NETINF. We also provide some sample input data as a toy.

Download the data

Data contains information about the connectivity of the who-copies-from-whom or who-repeats-after-whom network of news media sites and blogs inferred by NETINF.

Download:

InfoNet5000Q1000NEXP.txt (5,000 phrase clusters, 1,000 sites, 5,000 edges, cascades traced from memes)
InfoNet2000Q1000NEXPL.txt (2,000 phrase clusters, 1,000 sites, 5,000 edges, cascades traced from hyperlinks)

Data format: Each line in the file contains the following information about each edge of the network (sorted as given by NETINF):

<index> <src> <dst> <number_trees> <marginal_gain> <median_timediff> <av_timediff>
<index>: iteration in which the edge was selected by NETINF <src>: news media site or blog that was copied <dst>: news media site or blog that copies <number_trees>: number of trees in which <dst> copied from <src> <marginal_gain>: marginal gain given by the edge <median_timediff>: median time difference of the copies from <src> to <dst> <average_timediff>: average time difference of the copies from <src> to <dst>

Example of a line record: line below maps to the fields above.

99 startribune.com huffingtonpost.com 5 114.122088 1.000000 2.600000

Additionally, you can also find the MemeTracker phrase cluster data and the raw MemeTracker phrase data that was used by NETINF at the MemeTracker website.

Papers

To learn more about NETINF, you can download our paper:

M. Gomez-Rodriguez, J. Leskovec, A. Krause. Inferring Networks of Diffusion and Influence. The 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2010.

Moreover, NETINF builds on our previous work:

J. Leskovec, L. Backstrom, J. Kleinberg. Meme-tracking and the Dynamics of the News Cycle. ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (ACM KDD), 2009. [Data!]
J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, M. Hurst. Cascading Behavior in Large Blog Graphs. SIAM SDM Intl. Conf. on Data Mining (SIAM SDM), 2007.
J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, N. Glance. Cost-effective Outbreak Detection in Networks. ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (ACM KDD), 2007.
J. Leskovec, C. Faloutsos. Scalable Modeling of Real Graphs using Kronecker Multiplication. International Conference on Machine Learning (ICML), 2007.

Other methods

There have been new attempts to infer the edges, transmission rates and prior probabilities of infection of global diffusion networks since NETINF. You may like to check out: