Algorithm: INFOPATH

We model diffusion processes as discrete networks of fully continuous temporal processes occurring at different rates, as in NETRATE. The model allows information to propagate at different rates across different edges and it does not attempt to model the mechanisms underlying individual infections. Instead, it adopts a data-driven approach, which uses only the recorded spatiotemporal traces of diffusion.
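As a rough illustration of what "propagating at different rates across different edges" can mean, the sketch below assumes an exponential pairwise transmission model, one of the parametric forms used in NETRATE. The function names and the rate parameter `alpha` are illustrative choices, not part of the released code.

```python
import math

def transmission_density(alpha, t_src, t_dst):
    """Density of node dst being infected at t_dst by node src infected at t_src,
    under an ASSUMED exponential model with edge rate alpha."""
    dt = t_dst - t_src
    if dt <= 0 or alpha <= 0:
        return 0.0  # no transmission backwards in time or over a zero-rate edge
    return alpha * math.exp(-alpha * dt)

def survival(alpha, t_src, t_dst):
    """Probability that src has NOT yet infected dst by time t_dst."""
    dt = t_dst - t_src
    if dt <= 0:
        return 1.0
    return math.exp(-alpha * dt)

# A higher rate puts more probability mass on short delays.
fast = transmission_density(2.0, 0.0, 1.0)
slow = transmission_density(0.5, 0.0, 1.0)
```

Under this assumption, a larger `alpha` on an edge corresponds to information crossing that edge sooner on average (the mean delay is `1 / alpha`).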

We then develop INFOPATH, a time-varying network inference algorithm that uses stochastic gradient to provide on-line estimates of the structure and temporal dynamics of a network that changes over time. This framework enables us to study the temporal evolution of information pathways in the online media space. We identify pathways that emerge and vanish over time, and find out when specific mainstream media sites and blogs are key players and produce highly viral content.
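To make the idea of on-line estimation with stochastic gradient more concrete, here is a minimal sketch of one projected gradient step on a single cascade. It assumes the exponential transmission model and a NETRATE-style cascade likelihood; the function `sgd_step`, its parameters, and the default values are hypothetical, and it leaves out details of the actual algorithm (for example, how cascades are sampled or weighted over time).

```python
def sgd_step(rates, cascade, nodes, horizon, step=0.1, init=1.0, eps=1e-12):
    """One projected stochastic gradient step on a single cascade.

    rates   : dict mapping (source, target) -> current transmission rate
    cascade : dict mapping node -> infection time (infected nodes only)
    nodes   : iterable of all node ids
    horizon : end of the observation window for this cascade
    """
    grads = {}
    # Infected nodes: gradient of the negative log-likelihood w.r.t. alpha_ji.
    for i, t_i in cascade.items():
        parents = [j for j, t_j in cascade.items() if t_j < t_i]
        if not parents:
            continue  # first node of the cascade has no possible parent
        total = sum(rates.get((j, i), init) for j in parents)
        for j in parents:
            grads[(j, i)] = (t_i - cascade[j]) - 1.0 / max(total, eps)
    # Uninfected nodes: only the survival term contributes.
    for m in nodes:
        if m in cascade:
            continue
        for i, t_i in cascade.items():
            grads[(i, m)] = horizon - t_i
    # Gradient step followed by projection onto non-negative rates.
    for edge, g in grads.items():
        rates[edge] = max(0.0, rates.get(edge, init) - step * g)
    return rates

# Toy cascade: a is infected at t=0, b at t=1, c is never infected.
rates = sgd_step({}, {"a": 0.0, "b": 1.0}, ["a", "b", "c"], horizon=5.0)
```

In this toy run, the rates of the edges pointing to the uninfected node `c` decrease, while the rate of the edge from `a` to `b` is left unchanged because its gradient happens to be zero at the initial value.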

For more details, read our paper:

M. Gomez-Rodriguez, J. Leskovec, B. Schölkopf. Structure and Dynamics of Information Pathways in On-line Media. The 6th ACM International Conference on Web Search and Data Mining (WSDM), 2013.

Below are a few examples of what our algorithm finds in all posts from 5,000 sites between March 2011 and February 2012:

Recurrent topics vs ongoing news

Connectivity changes tend to reflect the amount of attention that news events or topics trigger over time. Unexpected news events result in a more dramatic increase in the number of edges over a short period of time. More general topics result in a network with more stable connectivity over time. In the figures below, the number of edges in the information network for NBA remains relatively stable over time. In contrast, the number of edges in the information network for Fukushima changes dramatically.

Figures: number of edges over time in the NBA and Fukushima information networks.
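A curve like the ones described above can be obtained by counting, at each time step, the edges whose estimated rate is positive. The sketch below assumes the estimates are available as one dictionary of edge rates per day; the data layout, the `threshold` parameter, and the site names are made up for illustration.

```python
def count_edges(rates, threshold=0.0):
    """Number of edges whose estimated transmission rate exceeds the threshold."""
    return sum(1 for r in rates.values() if r > threshold)

# Hypothetical daily snapshots: (source, target) -> estimated rate.
snapshots = {
    "day 1": {("siteA", "siteB"): 0.8, ("siteB", "siteC"): 0.0},
    "day 2": {("siteA", "siteB"): 0.9, ("siteB", "siteC"): 0.4, ("siteC", "siteA"): 0.2},
}

edges_over_time = {day: count_edges(r) for day, r in snapshots.items()}
```

A sharp jump in `edges_over_time` between consecutive days would correspond to the kind of connectivity change attributed to unexpected news events.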

Time-varying clusters

Clusters of mainstream news sites and blogs often emerge and vanish in a matter of days, and our on-line algorithm is able to uncover them. The videos below illustrate the information network for Amy Winehouse from Jan 20, 2012 to Feb 28, 2012 and the information network for Gaddafi from Oct 1, 2011 to Feb 28, 2012. Blue nodes are blogs and red nodes are mainstream media.

   

Videos: information networks for Amy Winehouse and Gaddafi.

Blog-to-blog links increase due to civil unrest

News sometimes spreads earlier among blogs than among mainstream media. Interestingly, we find that news events involving escalating civil unrest, such as the Libyan civil war or the Syrian uprising, result in an earlier and greater increase in connectivity among blogs than among mainstream media. The figures below show the number of links between different types of sites over time in two information networks, for Gaddafi and Syria.

Figures: number of edges by type of site over time in the Gaddafi and Syria information networks.

Top influential blogs vs mainstream media

Perhaps surprisingly, the numbers of mainstream media sites and blogs among the most influential nodes are comparable for most topics or news events. However, we find that growing numbers of influential blogs on some topics or news events are often temporally correlated with increasing social unrest. The figures below show the number of mainstream media sites vs blogs among the top 100 most influential sites in the information networks for Occupy Wall Street and LinkedIn.

Figures: top influencers over time (mainstream media sites vs blogs) in the Occupy and LinkedIn information networks.

You can investigate further how information pathways change by checking out some graphs and videos, downloading and running our algorithm on other datasets, or exploring our cascade dataset.