Motivation and Goals
The primary goal of the workshop is to become an inflection
point in the maturation of social network and social media
analysis, promoting greater technical sophistication and practical
relevance. To accomplish this, this workshop seeks to bring together researchers
from applied disciplines such as sociology, economics, medicine and
biology, together with researchers from more theoretical disciplines
such as mathematics and physics, within our community of statisticians
and computer scientists.
The technical focus of the workshop is the statistical, methodological
and computational issues that arise when modeling and analyzing large
collections of data that are largely represented as static and dynamic
graphs. As the different communities use diverse ideas and
mathematical tools, we seek to foster cross-disciplinary
collaborations and intellectual exchange. The communities identified
above all have a long-standing interest in modeling networks, and
while they approach the problem from different directions, their
ultimate goals are very similar.
The full Call for Papers can be found here.
The Power of Locality for Network Algorithms
Given the massive size of most networks of current interest, conventional algorithms, which scale as polynomials in the network size, are woefully inadequate. This talk focuses on how to use locality to deliver much more efficient algorithms (quasilinear or even sublinear in the network size), even if this requires that we accept some degradation in the approximation relative to optimal algorithms. In first part of the talk, I consider systems that have only local data access. Many large networks are encoded locally, e.g., with adjacency lists. How does the nature of the data access impact the efficiency of algorithms? Surprisingly, we show that small differences in data access can lead to very large differences in efficiency and approximability of network algorithms. As an example, we consider a covering problem which arises naturally for recruiters on social networks, and show how small differences between the information on neighbors in LinkedIn and Facebook lead to large differences in their utility to recruiters. In the second part of the talk, I consider problems where there is no a priori limit on data access. Nevertheless, we show how for a host of questions - pagerank, coverage, diffusion, determining the most influential nodes in the independent cascade model - we can exploit locality to design highly efficient algorithms, often with quasilinear or sublinear runtimes. This talk represents several projects with Christian Borgs, Mickey Brautbar, Brendan Lucier and Shanghua Teng.
Jennifer Tour Chayes is Distinguished Scientist and Managing Director of Microsoft Research New England in Cambridge, Massachusetts, which she co-founded in 2008, and Microsoft Research New York City, which she co-founded in 2012. Chayes was Research Area Manager for Mathematics, Theoretical Computer Science and Cryptography at Microsoft Research Redmond until 2008. Chayes joined Microsoft Research in 1997, when she co-founded the Theory Group. Her research areas include phase transitions in discrete mathematics and computer science, structural and dynamical properties of self-engineered networks, and algorithmic game theory. She is the co-author of over 110 scientific papers and the co-inventor of more than 25 patents.
Probabilistic Models for Social Networks and Other Relational Data
Network data and more generally relational data encoding the pairwise relations between objects appear in many fields. For instance in biology, a protein network connects interacting partners, while in a social network, links among people indicate social relationships. The problems of analysing, understanding and modelling such networks have attracted interest from many research communities. I will briefly review some probabilistic approaches to modelling networks. The key idea behind many such models is that each object has certain latent features, and that observed links in the network depend on these latent features. Probabilistic inference allows one to discover the potentially unbounded number of latent features (including discovering communities as a special case), predict missing links, and generally learn about the statistical properties of the networks. Many of these models can be cast within the theoretical framework of exchangeable arrays established by Aldous, Hoover and Kallenberg. I will describe our work on a general network model (the Random Function Model) that instantiates this theory using Gaussian processes, and relate it to existing models. I will also discuss our work on the Infinite Latent Attribute (ILA) model which allows for a highly structured nonparametric latent variable representation of nodes in a network. Finally, I will describe our Latent Feature Propagation model for dynamic networks. What ties these models together is the idea that rich latent representations underlie the structure of networks, and that these can be discovered via Bayesian inference.
Joint work with Creighton Heaukulani, David A. Knowles, James Lloyd, Peter Orbanz, Konstantina Palla, and Dan Roy.
Zoubin Ghahramani is Professor of Information Engineering at the
University of Cambridge, UK, and is also Associate Research
Professor of Machine Learning at Carnegie Mellon University, USA. He
obtained BA and BSE degrees from University of Pennsylvania, and a
PhD in 1995 from MIT working with Prof Mike Jordan. He did a postdoc
in Computer Science at University of Toronto working with Prof Geoff
Hinton. His work has included research on human sensorimotor
control, cognitive science, statistics, and machine learning. His
current focus is on Bayesian approaches to statistical machine
learning, and has recently started a research programme on machine
learning applications to information retrieval. He has published
over 100 peer reviewed papers, and serves on the editorial boards of
several leading journals in the field, including JMLR, JAIR, Annals
of Statistics, Machine Learning, and Bayesian Analysis. He is
Associate Editor in Chief of IEEE Transactions on Pattern Analysis
and Machine Intelligence, currently the IEEE's highest impact
journal. He also serves on the Board of the International Machine
Learning Society, and was Program Chair of the 2007 International
Machine Learning Conference.
"Going Viral": The Structure of Online Diffusion
Viral products and ideas are intuitively understood to grow
through a person-to-person diffusion process analogous to that of an
infectious disease. Until recently, however, it has been prohibitively
difficult to directly observe such viral events, and thus to
rigorously quantify or characterize their structural properties. Here
we propose a formal measure of what we label ``structural virality''
that interpolates between two extremes: content that gains its
popularity through a single, large broadcast, and that which grows
through multiple generations with any one individual directly
responsible for only a fraction of total adoption. We use this notion
of structural virality to analyze a unique dataset of several billion
diffusion events---including the propagation of news stories, videos,
images, and petitions---on a large online social network. We find that
the very largest observed events nearly always exhibit high structural
virality, providing some of the first direct evidence that many of the
most popular products and ideas grow through person-to-person
diffusion. However, medium-sized events---having thousands of
adopters---exhibit surprising structural diversity, and are as likely
to grow via broadcast as through viral means. These empirical findings
are consistent with a contagion having subcritical infection rate
spreading on a network with heavy-tailed degree distribution, which,
notably, is a non-viral mechanism in the traditional, theoretical
Sharad Goel holds a PhD in Applied Mathematics and a Masters in Computer Science from Cornell, and a BS in Mathematics from the University of Chicago. Following postdoctoral positions in the math departments at Stanford and the University of Southern California, he joined the Microeconomics and Social Systems group at Yahoo! Research. Sharad is currently a Senior Researcher at Microsoft Research – New York City.
Social Network Modeling and Task Completion
In this talk I'll describe two classical statistical estimation problems facing social network providers: estimation of edge weight, and prediction of engagement. I'll then offer some predictions about problems of interest to the next generation of social networks, related to the provision of assistance in completing tasks. Finally, I'll close with some questions about how network modeling algorithms could or should interact with the significant literature around network characterization.
Andrés Corrada-Emmanuel (Nanigans)