Motivation and Goals
Modern technology, including the World Wide Web, telecommunication devices and services, and large-scale data storage, has completely transformed the scale and concept of data in the sciences. Modern data sets are often enormous in size, detail, and heterogeneity, and are often best represented as highly annotated sequences of graphs. Although much progress has been made on developing rigorous tools for analyzing and modeling some types of large, complex, real-world networks, much work still remains and a principled, coherent framework remains elusive, in part because the analysis of networks is a growing and highly cross-disciplinary field.
This workshop aims to bring together a diverse and cross-disciplinary set of researchers, in order to describe recent advances and to discuss future directions for developing new network methods in statistics and machine learning. By network methods, we broadly include those models and algorithms whose goal is to learn the patterns of interaction, flow of information, or propagation of effects in social, biological, and economic systems. We will also welcome empirical studies in applied domains such as the social sciences, biology, medicine, neuroscience, physics, finance, social media, and economics.
The full Call for Papers can be found here.
Lise Getoor (University of California, Santa Cruz)
Entity-based Data Science
There is a growing interest in integrating, analyzing, visualizing and making sense of large collections of structured, semi-structured and unstructured data. In the world of big data, data science provides tools to help with this process – tools for cleaning the data, tools for integrating and aligning the data, tools for finding patterns in the data and making predictions, and tools for visualizing and interacting with the data. In this talk, I will focus on entity-based data science: data science techniques that support network analysis. Key tasks in network analysis include: entity resolution (determining when two references refer to the same entity), collective classification (predicting missing entity labels in the network), and link prediction (predicting relationships). I will give an overview of our recent work on probabilistic soft logic (PSL), a framework for collective, probabilistic reasoning in relational domains. PSL is able to reason holistically about both entity attributes and relationships among the entities. Our recent results show that by using state-of-the-art optimization methods in a distributed implementation, we can solve large-scale problems with millions of random variables orders of magnitude more quickly than existing approaches.
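To make the entity-resolution task concrete: the goal is to cluster mentions that refer to the same real-world entity. The sketch below is a deliberately minimal stand-in, not PSL itself – it uses crude string similarity and transitive closure via union-find, where a real system would reason jointly over attributes and relationships; the similarity threshold and example mentions are illustrative assumptions.

```python
# Toy entity resolution: cluster name mentions that likely refer to the
# same entity. This is NOT PSL -- just a pairwise-similarity +
# transitive-closure (union-find) sketch for illustration.
from difflib import SequenceMatcher
from typing import List

def similar(a: str, b: str, threshold: float = 0.75) -> bool:
    """Crude string similarity; real systems use much richer features."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def resolve(mentions: List[str], threshold: float = 0.75) -> List[List[str]]:
    """Union-find over mention pairs whose similarity clears the threshold."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            if similar(mentions[i], mentions[j], threshold):
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, m in enumerate(mentions):
        clusters.setdefault(find(i), []).append(m)
    return list(clusters.values())

mentions = ["Lise Getoor", "L. Getoor", "Mark Newman", "M. Newman",
            "Jennifer Neville"]
print(resolve(mentions))
```

Transitive closure matters here because "is the same entity as" is an equivalence relation: if A matches B and B matches C, all three belong to one cluster even when A and C do not match directly.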
Lise Getoor recently joined the Computer Science Department at the University of California, Santa Cruz, and was formerly at the University of Maryland, College Park. Her primary research interests are in machine learning and reasoning with uncertainty, applied to graphs and structured data. She also works in data integration, social network analysis and visual analytics. She has six best paper awards, an NSF Career Award, and is an Association for the Advancement of Artificial Intelligence (AAAI) Fellow. She has served as action editor for the Machine Learning Journal, JAIR associate editor, and TKDD associate editor. She is a board member of the International Machine Learning Society, has been a member of AAAI Executive council, was PC co-chair of ICML 2011, and has served as senior PC member for conferences including AAAI, ICML, IJCAI, ISWC, KDD, SIGMOD, UAI, VLDB, WSDM and WWW. She received her Ph.D. from Stanford University, her M.S. from UC Berkeley, and her B.S. from UC Santa Barbara. For more information, see http://www.cs.umd.edu/~getoor
Jennifer Neville (Purdue University)
Supporting Statistical Hypothesis Testing over Graphs
There has been a growing interest in analyzing the network structure of complex systems to understand key patterns and dependencies in the underlying system. This has fueled a large body of research on both models of network structure and algorithms to automatically discover patterns in the structure. However, robust statistical models, which can accurately represent distributions over graph populations, are critical to accurately assess the significance of discovered patterns or to distinguish between alternative models. Specifically, since sampling distributions (either analytical or empirical) can be used to determine the likelihood of a given sample, statistical models facilitate hypothesis testing and anomaly detection (e.g., graphs with low likelihood can be flagged as anomalous). However, unlike metric spaces, the space of graphs exhibits a combinatorial structure that poses significant theoretical and practical challenges to accurate estimation and efficient inference. In this talk, I will discuss recent work in which we investigate the distributional properties of state-of-the-art generative models for graphs, discuss the implications of the findings for network anomaly detection, and consider the models from a new viewpoint in order to develop novel methods to support hypothesis testing.
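The abstract's point about sampling distributions can be illustrated with a toy Monte Carlo test: compare an observed graph statistic against its empirical null distribution under a generative model, and flag the graph as anomalous when the statistic is unlikely under the null. The choice of statistic (triangle count) and null model (Erdős–Rényi with matched density) below are illustrative assumptions, not the speaker's method.

```python
# Toy Monte Carlo hypothesis test on a graph statistic: is the observed
# triangle count surprising under an Erdos-Renyi null of matched density?
# The statistic and null model here are illustrative assumptions.
import random
from itertools import combinations

def er_graph(n, p, rng):
    """Sample an Erdos-Renyi G(n, p) graph as a set of sorted edge tuples."""
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}

def triangles(n, edges):
    """Count triangles by brute force over vertex triples (fine for small n)."""
    return sum(1 for a, b, c in combinations(range(n), 3)
               if (a, b) in edges and (b, c) in edges and (a, c) in edges)

def p_value(observed_edges, n, trials=500, seed=0):
    """Empirical upper-tail p-value of the observed triangle count."""
    rng = random.Random(seed)
    p = len(observed_edges) / (n * (n - 1) / 2)   # match the edge density
    t_obs = triangles(n, observed_edges)
    null = [triangles(n, er_graph(n, p, rng)) for _ in range(trials)]
    return sum(t >= t_obs for t in null) / trials

# A 10-node graph containing a 4-clique (4 triangles) is triangle-rich
# for its density, so its empirical p-value should be small.
g = set(combinations(range(4), 2))
pv = p_value(g, 10)
print(pv)
```

A small p-value here means few null samples had as many triangles as the observed graph, so the graph would be flagged as anomalous under this (deliberately simplistic) null model; richer generative models slot into the same recipe.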
Jennifer Neville is an associate professor at Purdue University with a joint appointment in the Departments of Computer Science and Statistics. She received her PhD from the University of Massachusetts Amherst in 2006. In 2012, she was awarded an NSF Career Award, in 2008 she was chosen by IEEE as one of "AI's 10 to watch", and in 2007 was selected as a member of the DARPA Computer Science Study Group. Her research focuses on developing data mining and machine learning techniques for relational and network domains, including citation analysis, fraud detection, and social network analysis.
Mark Newman (University of Michigan)
Spectral operators and inference methods for community detection
Community structure is perhaps the best studied example of large-scale structure in networks. Two popular but apparently unrelated approaches for detecting network communities are those based on spectral analysis on the one hand and on Bayesian inference on the other. It turns out, however, that there is a deep connection between these approaches whose discovery has led to a number of new results, including the existence of a detectability phase transition in networks, and the finding that many spectral algorithms, such as those based on modularity optimization, can saturate the bound set by the phase transition. This talk will offer an overview of some recent advances in this area.
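As a concrete instance of the spectral approach, the classic leading-eigenvector method bisects a network using the signs of the top eigenvector of the modularity matrix B = A − kkᵀ/2m. The sketch below implements that standard construction; the tiny example graph (two triangles joined by a bridge) is invented for illustration.

```python
# Leading-eigenvector community detection: split a graph in two using the
# signs of the top eigenvector of the modularity matrix B = A - k k^T / 2m.
# The small example graph is invented for illustration.
import numpy as np

def spectral_bisection(A):
    """Return a +1/-1 community label per node from the modularity matrix."""
    k = A.sum(axis=1)               # degree vector
    two_m = k.sum()                 # 2m = total degree = twice the edge count
    B = A - np.outer(k, k) / two_m  # modularity matrix
    vals, vecs = np.linalg.eigh(B)  # B is symmetric, so eigh applies
    leading = vecs[:, np.argmax(vals)]
    return np.where(leading >= 0, 1, -1)

# Two triangles (nodes 0-2 and 3-5) joined by a single bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

labels = spectral_bisection(A)
print(labels)
```

On this graph the sign pattern separates the two triangles, which is also the split maximizing modularity; the overall sign of an eigenvector is arbitrary, so only the grouping is meaningful, not which side is labeled +1.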
Mark Newman received a Ph.D. in theoretical physics from the University of Oxford in 1991 and conducted postdoctoral research at Cornell University before joining the staff of the Santa Fe Institute, a think-tank in New Mexico devoted to the study of complex systems. In 2002 he left Santa Fe for the University of Michigan, where he is currently the Paul Dirac Collegiate Professor of Physics and a professor in the university's Center for the Study of Complex Systems. Professor Newman is a Fellow of the American Physical Society and the author of seven books, including "Networks", an introduction to the field of network theory, and "The Atlas of the Real World", a popular book on cartography. His research centers on graph theory and social networks.