OhmNet is an approach for learning node features in multi-layer networks. Given any multi-layer network and a hierarchy describing relationships between the layers, it can learn a rich multi-scale mapping of nodes to a low-dimensional feature space.
In biology, OhmNet moves from flat networks to multiscale models able to predict a range of phenotypes spanning cellular subsystems.
Current computational methods for extracting functional information from protein interaction networks are highly influential, but they lack tissue specificity because they assume that cellular function is constant across organs and tissues. In other words, cellular functions in the heart are assumed to be the same as those in the skin. As a result, these methods can be less successful in constructing accurate maps of both where and how proteins act.
To this end, we develop a computational framework that can relate tissues to each other, learn rich feature representations for proteins in each tissue-specific network, and then use the extracted features for tissue-specific cellular function prediction.
Predicting multicellular function through multi-layer tissue networks.
Marinka Zitnik and Jure Leskovec.
To appear at ISMB/ECCB 2017.
Our goal is to learn features of proteins in different tissues. We represent each tissue as a network, where nodes represent proteins. Individual tissue networks act as layers in a multi-layer network, where we use a hierarchy to model dependencies between the layers (i.e., tissues).
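As a minimal sketch of this setup, a multi-layer network can be held as one edge list per tissue plus a child-to-parent map for the hierarchy. The tissue names, protein identifiers, and helper functions below are hypothetical and chosen only for illustration; they are not the file format or code used by OhmNet.

```python
from collections import defaultdict

# Hypothetical toy data: one edge list per tissue-specific layer.
layers = {
    "cerebellum": [("P1", "P2"), ("P2", "P3")],
    "frontal_lobe": [("P1", "P3"), ("P3", "P4")],
}

# Hypothetical hierarchy, stored as child -> parent.
hierarchy = {"cerebellum": "brain", "frontal_lobe": "brain"}


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set-of-neighbors map."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def leaves_under(node, hierarchy):
    """Return the layers (leaf objects) that descend from `node`."""
    children = [c for c, p in hierarchy.items() if p == node]
    if not children:
        return [node]
    found = []
    for child in children:
        found.extend(leaves_under(child, hierarchy))
    return sorted(found)


adjacency = {tissue: build_adjacency(edges) for tissue, edges in layers.items()}
```

The same protein (for example `P1`) may appear in several layers with different neighbors, which is what lets its features differ from tissue to tissue.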
We develop a computational framework that learns features of each node (i.e., protein) by taking into consideration connections between the nodes within each layer, together with inter-layer relationships between proteins active on different layers. More precisely, our approach embeds each protein in each tissue in a d-dimensional feature space such that proteins with similar network neighborhoods in similar tissues are embedded closely together.
In OhmNet, we define an objective function that is independent of the downstream prediction task, meaning that the feature representations are learned in a purely unsupervised way. Since learned features are not designed for a specific downstream prediction task, they generalize across a wide variety of tasks and tissues. For example, we use the learned features to study protein functions across different cellular systems (e.g., cell types, tissues, organs, and organ systems).
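Because the features are task-independent, any standard classifier can be trained on top of them. The sketch below uses a nearest-centroid rule purely as a stand-in; this page does not say which classifier is used, and the feature values, labels, and function names are made up for illustration.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(features, labels):
    """Group labeled proteins by label and average their feature vectors."""
    grouped = {}
    for protein, label in labels.items():
        grouped.setdefault(label, []).append(features[protein])
    return {label: centroid(vecs) for label, vecs in grouped.items()}


def predict(vector, centroids):
    """Assign the label whose centroid is closest to `vector`."""
    return min(centroids, key=lambda label: squared_distance(vector, centroids[label]))


# Hypothetical learned features for proteins in one tissue.
features = {"P1": [0.9, 0.1], "P2": [1.0, 0.0], "P3": [0.0, 1.0], "P4": [0.1, 0.8]}
# Hypothetical labels: does the protein have a given function in this tissue?
labels = {"P1": True, "P2": True, "P3": False}

centroids = fit_centroids(features, labels)
prediction = predict(features["P4"], centroids)
```

The features are learned once and reused; only the small classifier on top changes from one function or tissue to the next.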
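Our algorithmic framework for unsupervised feature learning in multi-layer networks has two components: one that captures connections between nodes within each layer, and one that captures inter-layer relationships through the hierarchy.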
The problem of feature learning in a multi-layer network is to learn functions f_1, f_2, ..., f_T, such that each function f_i : V_i → ℝ^d maps nodes in V_i to feature representations in d-dimensional space.
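Concretely, each mapping can be thought of as a lookup table from the nodes of one layer to d-dimensional vectors. The sketch below only sets up randomly initialized tables of the right shape; it performs no learning, and the layer names, node sets, and the `init_layer_features` helper are hypothetical.

```python
import random


def init_layer_features(nodes, d, seed=0):
    """Create one d-dimensional vector per node, randomly initialized."""
    rng = random.Random(seed)
    return {u: [rng.uniform(-0.5, 0.5) for _ in range(d)] for u in sorted(nodes)}


d = 4
# Hypothetical node sets V_i, one per layer.
V = {
    "layer_1": {"P1", "P2", "P3"},
    "layer_2": {"P1", "P3", "P4"},
}

# f[layer][node] plays the role of f_i(u): the feature vector of node u in layer i.
f = {
    name: init_layer_features(nodes, d, seed=index)
    for index, (name, nodes) in enumerate(sorted(V.items()))
}
```

A node that appears in two layers gets two separate vectors, one per layer, so `f["layer_1"]["P1"]` and `f["layer_2"]["P1"]` are free to differ.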
OhmNet learns functions f_1, f_2, ..., f_T located in the leaf objects of the hierarchy (i.e., layers of a given multi-layer network), as well as estimates for functions f_{T+1}, f_{T+2}, ..., f_{|M|} located in the internal objects of the hierarchy. For example, consider a multi-layer network shown above, consisting of four layers that are interrelated by a two-level hierarchy. OhmNet learns the mappings f_i, f_j, f_k, and f_l that map nodes in each layer into a d-dimensional feature space. OhmNet also learns f_2, representing features for nodes at an intermediate scale, and f_1, representing features for nodes at the highest scale.
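One way to picture how leaf and internal mappings relate is sketched below. It assumes that an internal object's features are the average of its children's features, and that a squared Euclidean penalty measures how far a child's features sit from its parent's. Both choices are assumptions made for illustration; this page does not spell out the exact estimator or objective, and the function names are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def parent_features(child_feature_maps):
    """Average each node's vectors over the children in which it appears."""
    collected = {}
    for fmap in child_feature_maps:
        for node, vec in fmap.items():
            collected.setdefault(node, []).append(vec)
    return {node: mean_vector(vecs) for node, vecs in collected.items()}


def hierarchy_penalty(child_map, parent_map):
    """Sum over nodes of 0.5 * squared distance between child and parent features."""
    total = 0.0
    for node, vec in child_map.items():
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(vec, parent_map[node]))
    return total


# Hypothetical features for two sibling layers i and j.
f_i = {"P1": [1.0, 0.0], "P2": [0.0, 2.0]}
f_j = {"P1": [3.0, 0.0]}

# Estimated features at their shared internal object.
f_2 = parent_features([f_i, f_j])
penalty_i = hierarchy_penalty(f_i, f_2)
penalty_j = hierarchy_penalty(f_j, f_2)
```

Under these assumptions, a small penalty means that a protein's features in a tissue stay close to its features in the parent of that tissue, so nearby tissues in the hierarchy end up with similar features.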
In this example, we first construct a multi-layer brain network by integrating nine brain-specific protein interaction networks (e.g., the cerebellum, frontal lobe, brainstem, and other brain tissues). Each of the nine brain-specific networks is one layer in the multi-layer network. The layers are organized according to a two-level brain hierarchy.
We run OhmNet on this multi-layer network to find node features in a purely unsupervised way. We then map the nodes to 2-D space, assigning every node in every layer to a point based solely on the node's learned features. Finally, we visualize the points and color them by the layer they belong to, pictured here. We see how OhmNet learns protein features that expose a multi-scale organization of tissues in the human body.
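The mapping to 2-D can be sketched as follows. The projection method is not named on this page, so the sketch uses principal component analysis, computed with power iteration, as a stand-in. The input vectors, layer labels, and function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the per-dimension mean from every row."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / n for b in range(d)] for a in range(d)]


def top_eigenvector(matrix, iterations=200):
    """Power iteration from a fixed, non-uniform starting vector."""
    vec = [1.0 + 0.1 * k for k in range(len(matrix))]
    for _ in range(iterations):
        nxt = [dot(row, vec) for row in matrix]
        norm = math.sqrt(dot(nxt, nxt))
        if norm < 1e-12:  # no variance left in any direction
            break
        vec = [x / norm for x in nxt]
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def project_2d(rows):
    """Project rows onto the two leading principal directions."""
    X = center(rows)
    C = covariance(X)
    d = len(C)
    e1 = top_eigenvector(C)
    lam1 = dot(e1, [dot(row, e1) for row in C])
    # Remove the first direction, then find the next one.
    deflated = [[C[a][b] - lam1 * e1[a] * e1[b] for b in range(d)] for a in range(d)]
    e2 = top_eigenvector(deflated)
    return [(dot(x, e1), dot(x, e2)) for x in X]


# Hypothetical learned features, one row per (layer, node) pair.
layer_of_row = ["cerebellum", "cerebellum", "frontal_lobe", "frontal_lobe"]
rows = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]

coords = project_2d(rows)
# Pair each 2-D point with its layer so a plotting tool can color by layer.
colored = list(zip(layer_of_row, coords))
```

Any other dimensionality reduction method could replace `project_2d`; only the final pairing of points with layer labels matters for the coloring.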
We constructed a human protein-protein interaction (PPI) network, tissue-specific network layers, a tissue hierarchy, and tissue-specific gene-function relationships. To this end, we took the latest protein, tissue, and function information from public data sources.
For example, we represented similarities between tissues with a hierarchy defined over 219 tissues (e.g., muscle, adrenal cortex, bone marrow), pictured here. We then constructed a multi-layer network with 107 layers, each representing one tissue-specific protein interaction network, shown as a blue leaf in this picture.
| File | Description |
|---|---|
| bio-tissue-networks.tar.gz | Tissue-specific protein interaction networks, one network per human tissue |
| bio-tissue-hierarchy.tar.gz | A hierarchy of human tissues |
| bio-tissue-labels.tar.gz | Tissue-specific gene-function associations from the Gene Ontology |
| bio-tisue-readme.txt | Description of files |
A Python implementation of OhmNet is available on GitHub.