Networks are ubiquitous in biology, where they encode connectivity patterns at all scales of organization, from the molecular level to the biome. This tutorial surveys key advances in representation learning for networks over the last few years, with an emphasis on the fundamentally new opportunities in network biology enabled by these advances.
Biological networks are powerful resources for the discovery of interactions and emergent properties in biological systems, from the single-cell to the population level. Network approaches combine and amplify signals from individual genes and have led to remarkable advances in biology, including drug discovery, protein function prediction, disease diagnosis, and precision medicine. More broadly, these approaches have proven useful for uncovering new biology and have guided discoveries later confirmed in wet laboratory experiments.
The mathematical machinery central to these approaches is machine learning on networks. The main challenge is to extract information about interactions between nodes and to incorporate that information into a machine learning model. To extract this information, classic machine learning approaches often rely on summary statistics (e.g., degrees or clustering coefficients) or on carefully engineered features that measure local neighborhood structure (e.g., network motifs). These classic approaches are limited: hand-engineered features are inflexible, they often do not generalize to networks derived from other organisms, tissues, or experimental technologies, and they can fail on datasets with low experimental coverage.
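For illustration, the following minimal sketch computes a few such hand-engineered features with networkx on a small built-in toy graph; the particular statistics and the graph are illustrative assumptions rather than material from the tutorial.

```python
# A minimal sketch of classic hand-engineered node features, computed with
# networkx on a small built-in toy graph standing in for a biological network.
import networkx as nx
import numpy as np

G = nx.karate_club_graph()  # illustrative stand-in for an interaction network

degrees = dict(G.degree())       # local connectivity
clustering = nx.clustering(G)    # density of each node's neighborhood
triangles = nx.triangles(G)      # count of the simplest network motif

# Stack the statistics into a per-node feature matrix for a downstream
# classifier (e.g., protein function prediction).
nodes = sorted(G.nodes())
X = np.array([[degrees[n], clustering[n], triangles[n]] for n in nodes])
print(X.shape)  # (number of nodes, 3 hand-engineered features)
```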
Recent years have seen a surge of approaches that automatically learn to encode network structure into low-dimensional representations, using transformation techniques based on deep learning and nonlinear dimensionality reduction. The idea behind these representation learning approaches is to learn a data transformation that maps each node to a point in a low-dimensional vector space, termed an embedding. Representation learning methods have revolutionized the state of the art in network science, and the goal of this tutorial is to open the door for these methods in computational biology and bioinformatics.
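As a rough sketch of this idea (in the spirit of DeepWalk/node2vec, not necessarily the exact methods covered in the tutorial), one common recipe is to simulate random walks over the graph and train word2vec on them so that nearby nodes receive similar embeddings; the toy graph, walk parameters, and embedding dimension below are illustrative assumptions, and gensim (>= 4) is assumed to be installed.

```python
# A minimal DeepWalk-style sketch: simulate short random walks over the graph
# and feed them to word2vec so that nearby nodes receive similar embeddings.
import random
import networkx as nx
from gensim.models import Word2Vec

G = nx.karate_club_graph()  # illustrative toy graph

def random_walk(graph, start, length=10):
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(n) for n in walk]  # word2vec expects string "tokens"

walks = [random_walk(G, n) for n in G.nodes() for _ in range(20)]
model = Word2Vec(walks, vector_size=16, window=5, min_count=1, sg=1, epochs=5)

embedding = model.wv[str(0)]  # 16-dimensional embedding of node 0
print(embedding.shape)
```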
The tutorial investigates techniques for biological network modeling, analytics, and optimization, together with case studies for analyzing biological networks and extracting actionable insights. In doing so, it provides attendees with a toolbox of next-generation algorithms for network biology.
Our tutorial will cover the key conceptual foundations of representation learning, from more traditional approaches relying on matrix factorization and network propagation to very recent advances in deep representation learning for networks.
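To give a flavor of the more traditional end of this spectrum, the sketch below implements network propagation as a random walk with restart; the toy graph, seed nodes, and restart probability are illustrative assumptions, not parameters used in the tutorial.

```python
# A sketch of network propagation via random walk with restart: signal from a
# few seed nodes (e.g., known disease genes) is diffused over the network and
# every node is ranked by how much signal it receives.
import networkx as nx
import numpy as np

G = nx.karate_club_graph()  # illustrative toy graph
nodes = sorted(G.nodes())
A = nx.to_numpy_array(G, nodelist=nodes)
W = A / A.sum(axis=0, keepdims=True)  # column-stochastic transition matrix

restart = 0.3
p0 = np.zeros(len(nodes))
p0[[0, 33]] = 0.5                     # seed nodes carrying the initial signal

p = p0.copy()
for _ in range(100):                  # power iteration to (near) convergence
    p = (1 - restart) * W @ p + restart * p0

ranking = np.argsort(-p)              # nodes ranked by propagated signal
print([nodes[i] for i in ranking[:5]])
```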
In addition to a broad high-level overview, we will spend a considerable amount of time describing the details of recent advances in deep representation learning and discussing both algorithmic and implementation aspects.
We just released a new dataset collection, BioSNAP Datasets, containing many large biomedical network datasets.
The tutorial will be held at the ISMB conference in Chicago, IL, USA, on Friday, July 6, 2018.
The tutorial will be of broad interest to researchers who work with network data from biology, medicine, and the life sciences. Graph-structured data arise in many areas of data mining and predictive analytics, so the tutorial should be of theoretical and practical interest to a large part of the data mining and network science community.
The tutorial does not require prior knowledge beyond fundamental concepts covered in introductory machine learning and network science classes. Attendees will come away with the knowledge necessary to understand state-of-the-art representation learning methods and to use them to solve central problems in network biology.
No special software or package installation is needed to follow this tutorial. However, the tutorial includes several demos in Python and TensorFlow that may be of interest to participants.
Marinka Zitnik is a postdoctoral fellow in Computer Science at Stanford University. Her research focuses on network science and representation learning methods for biomedicine. She received her PhD in Computer Science from the University of Ljubljana in 2015 while also conducting research at Imperial College London, the University of Toronto, Baylor College of Medicine, and Stanford University. She has received outstanding research awards at the ISMB, CAMDA, RECOMB, and BC2 conferences and is involved in projects at the Chan Zuckerberg Biohub.
Jure Leskovec is an Associate Professor of Computer Science at Stanford University and a Chan Zuckerberg Biohub Investigator. His research has recently focused on biological and biomedical problems and on applications of network science to problems in biomedicine and health. Jure received his PhD in Machine Learning from Carnegie Mellon University in 2008 and spent a year at Cornell University. His work has received five best paper awards, won the ACM KDD Cup, and topped the Battle of the Sensor Networks competition.