Social Network and Social Media Analysis: Methods, Models and Applications

NIPS 2012 Workshop

December 7, 2012

Lake Tahoe, Nevada

Room: Glenbrook & Emerald Bay, Harrah’s

Organizers:

Motivation and Goals

The primary goal of the workshop is to become an inflection point in the maturation of social network and social media analysis, promoting greater technical sophistication and practical relevance. To accomplish this, this workshop seeks to bring together researchers from applied disciplines such as sociology, economics, medicine and biology, together with researchers from more theoretical disciplines such as mathematics and physics, within our community of statisticians and computer scientists.

The technical focus of the workshop is the statistical, methodological and computational issues that arise when modeling and analyzing large collections of data that are largely represented as static and dynamic graphs. As the different communities use diverse ideas and mathematical tools, we seek to foster cross-disciplinary collaborations and intellectual exchange. The communities identified above all have a long-standing interest in modeling networks, and while they approach the problem from different directions, their ultimate goals are very similar.

The full Call for Papers can be found here.

Schedule

Time Topic
08:00 - 08:15

Opening Remarks

08:15 - 09:00
Invited Speaker:
Jennifer Chayes (Microsoft Research)
09:00 - 09:30

Coffee Break

09:30 - 10:30

Poster Spotlights

10:30 - 15:30

Poster Session/Ski Break

15:30 - 16:15
Invited Speaker:
Zoubin Ghahramani (University of Cambridge)
16:15 - 17:00
Invited Speaker:
Andrew Tomkins (Google Research)
17:00 - 17:30

Coffee Break

17:30 - 18:15
Invited Speaker:
Sharad Goel (Microsoft Research)
18:15 - 19:00

Closing Panel

Confirmed Speakers

Jennifer Chayes

Jennifer Chayes (Microsoft Research)

The Power of Locality for Network Algorithms

Abstract:

Given the massive size of most networks of current interest, conventional algorithms, which scale as polynomials in the network size, are woefully inadequate. This talk focuses on how to use locality to deliver much more efficient algorithms (quasilinear or even sublinear in the network size), even if this requires that we accept some degradation in the approximation relative to optimal algorithms. In first part of the talk, I consider systems that have only local data access. Many large networks are encoded locally, e.g., with adjacency lists. How does the nature of the data access impact the efficiency of algorithms? Surprisingly, we show that small differences in data access can lead to very large differences in efficiency and approximability of network algorithms. As an example, we consider a covering problem which arises naturally for recruiters on social networks, and show how small differences between the information on neighbors in LinkedIn and Facebook lead to large differences in their utility to recruiters. In the second part of the talk, I consider problems where there is no a priori limit on data access. Nevertheless, we show how for a host of questions - pagerank, coverage, diffusion, determining the most influential nodes in the independent cascade model - we can exploit locality to design highly efficient algorithms, often with quasilinear or sublinear runtimes. This talk represents several projects with Christian Borgs, Mickey Brautbar, Brendan Lucier and Shanghua Teng.

Bio:

Jennifer Tour Chayes is Distinguished Scientist and Managing Director of Microsoft Research New England in Cambridge, Massachusetts, which she co-founded in 2008, and Microsoft Research New York City, which she co-founded in 2012. Chayes was Research Area Manager for Mathematics, Theoretical Computer Science and Cryptography at Microsoft Research Redmond until 2008. Chayes joined Microsoft Research in 1997, when she co-founded the Theory Group. Her research areas include phase transitions in discrete mathematics and computer science, structural and dynamical properties of self-engineered networks, and algorithmic game theory. She is the co-author of over 110 scientific papers and the co-inventor of more than 25 patents.


Zoubin Ghahramani

Zoubin Ghahramani (University of Cambridge)

Probabilistic Models for Social Networks and Other Relational Data

Abstract:

Network data and more generally relational data encoding the pairwise relations between objects appear in many fields. For instance in biology, a protein network connects interacting partners, while in a social network, links among people indicate social relationships. The problems of analysing, understanding and modelling such networks have attracted interest from many research communities. I will briefly review some probabilistic approaches to modelling networks. The key idea behind many such models is that each object has certain latent features, and that observed links in the network depend on these latent features. Probabilistic inference allows one to discover the potentially unbounded number of latent features (including discovering communities as a special case), predict missing links, and generally learn about the statistical properties of the networks. Many of these models can be cast within the theoretical framework of exchangeable arrays established by Aldous, Hoover and Kallenberg. I will describe our work on a general network model (the Random Function Model) that instantiates this theory using Gaussian processes, and relate it to existing models. I will also discuss our work on the Infinite Latent Attribute (ILA) model which allows for a highly structured nonparametric latent variable representation of nodes in a network. Finally, I will describe our Latent Feature Propagation model for dynamic networks. What ties these models together is the idea that rich latent representations underlie the structure of networks, and that these can be discovered via Bayesian inference.

Joint work with Creighton Heaukulani, David A. Knowles, James Lloyd, Peter Orbanz, Konstantina Palla, and Dan Roy.

Bio:

Zoubin Ghahramani is Professor of Information Engineering at the University of Cambridge, UK, and is also Associate Research Professor of Machine Learning at Carnegie Mellon University, USA. He obtained BA and BSE degrees from University of Pennsylvania, and a PhD in 1995 from MIT working with Prof Mike Jordan. He did a postdoc in Computer Science at University of Toronto working with Prof Geoff Hinton. His work has included research on human sensorimotor control, cognitive science, statistics, and machine learning. His current focus is on Bayesian approaches to statistical machine learning, and has recently started a research programme on machine learning applications to information retrieval. He has published over 100 peer reviewed papers, and serves on the editorial boards of several leading journals in the field, including JMLR, JAIR, Annals of Statistics, Machine Learning, and Bayesian Analysis. He is Associate Editor in Chief of IEEE Transactions on Pattern Analysis and Machine Intelligence, currently the IEEE's highest impact journal. He also serves on the Board of the International Machine Learning Society, and was Program Chair of the 2007 International Machine Learning Conference.


Sharad Goel

Sharad Goel (Microsoft Research)

"Going Viral": The Structure of Online Diffusion

Abstract:

Viral products and ideas are intuitively understood to grow through a person-to-person diffusion process analogous to that of an infectious disease. Until recently, however, it has been prohibitively difficult to directly observe such viral events, and thus to rigorously quantify or characterize their structural properties. Here we propose a formal measure of what we label ``structural virality'' that interpolates between two extremes: content that gains its popularity through a single, large broadcast, and that which grows through multiple generations with any one individual directly responsible for only a fraction of total adoption. We use this notion of structural virality to analyze a unique dataset of several billion diffusion events---including the propagation of news stories, videos, images, and petitions---on a large online social network. We find that the very largest observed events nearly always exhibit high structural virality, providing some of the first direct evidence that many of the most popular products and ideas grow through person-to-person diffusion. However, medium-sized events---having thousands of adopters---exhibit surprising structural diversity, and are as likely to grow via broadcast as through viral means. These empirical findings are consistent with a contagion having subcritical infection rate spreading on a network with heavy-tailed degree distribution, which, notably, is a non-viral mechanism in the traditional, theoretical sense.

Bio:

Sharad Goel holds a PhD in Applied Mathematics and a Masters in Computer Science from Cornell, and a BS in Mathematics from the University of Chicago. Following postdoctoral positions in the math departments at Stanford and the University of Southern California, he joined the Microeconomics and Social Systems group at Yahoo! Research. Sharad is currently a Senior Researcher at Microsoft Research – New York City.


Andrew Tomkins

Andrew Tomkins (Google Research)

Social Network Modeling and Task Completion

Abstract

In this talk I'll describe two classical statistical estimation problems facing social network providers: estimation of edge weight, and prediction of engagement. I'll then offer some predictions about problems of interest to the next generation of social networks, related to the provision of assistance in completing tasks. Finally, I'll close with some questions about how network modeling algorithms could or should interact with the significant literature around network characterization.

Andrés Corrada-Emmanuel (Nanigans)



Accepted Papers