Feature Learning in Multi-Layer Networks

OhmNet is an unsupervised feature learning approach for multi-layer networks. Given any multi-layer network and a hierarchy describing relationships between the layers, it embeds nodes in each layer and at each scale into a d-dimensional space.

In biology, OhmNet represents a shift from flat networks to multiscale models able to predict a range of phenotypes spanning cellular systems.

Although highly influential, current methods for protein function prediction lack tissue specificity: they assume that protein functions are constant across organs and tissues. In other words, a protein's functions in the heart are assumed to be the same as its functions in the skin. These methods are hence less successful at constructing accurate maps of both where and how proteins act.

We develop a computational framework that can relate tissues to each other and learn rich embeddings for proteins in each tissue and at every scale. The learned embeddings can be used for any downstream prediction task, including link prediction, clustering, and node classification.

Predicting multicellular function through multi-layer tissue networks.
Marinka Zitnik and Jure Leskovec.
Bioinformatics, 33(14):i190-i198, 2017.
Presented at ISMB/ECCB 2017.

Feature learning in multi-layer tissue networks

Our goal is to learn features of proteins in different tissues. We represent each tissue as a network, where nodes represent proteins. Individual tissue networks act as layers in a multi-layer network, where we use a hierarchy to model dependencies between the layers (i.e., tissues).
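The representation described above can be sketched as a small data structure. This is a minimal illustration, not the OhmNet implementation: the class name, its methods, the protein identifiers, and the hierarchy objects "brain" and "nervous_system" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MultiLayerNetwork:
    """Tissue layers plus a hierarchy over them (all names are illustrative)."""

    # layer name -> set of undirected edges, each stored as a frozenset of proteins
    layers: dict = field(default_factory=dict)
    # child object -> parent object in the tissue hierarchy
    parent: dict = field(default_factory=dict)

    def add_edge(self, layer, u, v):
        """Record an interaction between proteins u and v in one tissue layer."""
        self.layers.setdefault(layer, set()).add(frozenset((u, v)))

    def nodes(self, layer):
        """Proteins that take part in at least one interaction in the layer."""
        return {protein for edge in self.layers[layer] for protein in edge}

    def ancestors(self, obj):
        """Objects on the path from obj up to the root of the hierarchy."""
        path = []
        while obj in self.parent:
            obj = self.parent[obj]
            path.append(obj)
        return path


# Toy example with two tissue layers and a two-level hierarchy (made-up data).
net = MultiLayerNetwork()
net.add_edge("cerebellum", "P1", "P2")
net.add_edge("cerebellum", "P2", "P3")
net.add_edge("frontal_lobe", "P1", "P3")
net.parent = {
    "cerebellum": "brain",
    "frontal_lobe": "brain",
    "brain": "nervous_system",
}
```

The same protein can appear in several layers with different neighbors, which is what allows its learned features to differ from tissue to tissue.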

We develop a computational framework that learns features of each node (i.e., protein) by taking into consideration connections between the nodes within each layer, together with inter-layer relationships between proteins active on different layers. More precisely, our approach embeds each protein in each tissue in a d-dimensional feature space such that proteins with similar network neighborhoods in similar tissues are embedded closely together.

In OhmNet, we define an objective function that is independent of the downstream prediction task, meaning that the feature representations are learned in a purely unsupervised way. Since learned features are not designed for a specific downstream prediction task, they generalize across a wide variety of tasks and tissues. For example, we use the learned features to study protein functions across different cellular systems (e.g., cell types, tissues, organs, and organ systems).
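A task-independent objective of this kind can be written schematically as follows. This is a sketch under stated assumptions rather than the exact published formulation. Write f_i for the feature map of object i in the hierarchy, where objects 1, ..., T are the layers and the hierarchy has |M| objects in total. The within-layer term Ω_i, the trade-off weight λ ≥ 0, the parent map π, and the node set V_j of an internal object (assumed here to be the nodes of the layers beneath it) are notation introduced for this illustration.

```latex
\[
\max_{f_1, \ldots, f_{|M|}} \;
  \sum_{i=1}^{T} \Omega_i(f_i)
  \;-\; \lambda \sum_{j \neq \mathrm{root}} \; \sum_{u \in V_j}
  \tfrac{1}{2} \left\lVert f_j(u) - f_{\pi(j)}(u) \right\rVert_2^2
\]
```

The first sum rewards embeddings in which nodes with similar neighborhoods within a layer lie close together. The second sum penalizes differences between the features of a node at an object j and at its parent π(j), so that closely related tissues end up with similar features. Neither term refers to labels, which is why the learned features are not tied to a particular prediction task.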

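Our algorithmic framework for unsupervised feature learning in multi-layer networks has two components: an objective for each layer that captures the connections between the nodes within that layer, and a hierarchical objective that captures the inter-layer relationships described by the hierarchy.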

The problem of feature learning in a multi-layer network is to learn functions f_1, f_2, ..., f_T, such that each function f_i : V_i → ℝ^d maps nodes in V_i to feature representations in d-dimensional space.

OhmNet learns the functions f_1, f_2, ..., f_T located in the leaf objects of the hierarchy (i.e., the layers of a given multi-layer network), as well as estimates for the functions f_{T+1}, f_{T+2}, ..., f_{|M|} located in the internal objects of the hierarchy. For example, consider the multi-layer network shown above, consisting of four layers that are interrelated by a two-level hierarchy. OhmNet learns the mappings f_i, f_j, f_k, and f_l that map nodes in each layer into a d-dimensional feature space. Additionally, OhmNet learns the mappings f_2, representing features for nodes at an intermediate scale, and the mapping f_1, representing features for nodes at the highest scale.
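To make the idea of features at internal objects concrete, the sketch below stores each mapping as a dictionary from node to feature vector and estimates an internal object's features by averaging over its children. The averaging rule is an assumption chosen for simplicity, not the estimator OhmNet uses, and the function names, object names, and toy values are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def internal_estimate(children, features):
    """Estimate features at an internal hierarchy object.

    children: names of the child objects (layers or other internal objects)
    features: object name -> {node -> d-dimensional list of floats}

    Averaging over children is an assumption of this sketch, not the
    estimator OhmNet itself uses.
    """
    nodes = set()
    for child in children:
        nodes.update(features[child])
    return {
        node: mean_vector(
            [features[child][node] for child in children if node in features[child]]
        )
        for node in nodes
    }


# Toy leaf mappings for two layers with d = 2 (values are made up).
features = {
    "layer_i": {"P1": [1.0, 0.0], "P2": [0.0, 2.0]},
    "layer_j": {"P1": [3.0, 4.0]},
}
features["intermediate"] = internal_estimate(["layer_i", "layer_j"], features)
```

A node that is present in only one child layer, such as P2 here, simply inherits that layer's vector at the internal object.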

Case study: A multiscale model of brain tissues

In this example, we first construct a multi-layer brain network by integrating nine brain-specific protein interaction networks (e.g., the cerebellum, frontal lobe, brainstem, and other brain tissues). Each of the nine brain-specific networks is one layer in the multi-layer network. The layers are organized according to a two-level brain hierarchy.

We run OhmNet on this multi-layer network to find node features in a purely unsupervised way. We then map every node in every layer to a point in two-dimensional space based solely on the node's learned features. Finally, we visualize the points and color them by the layer they belong to, pictured here. The visualization shows that OhmNet learns protein features that expose a multi-scale organization of tissues in the human body.
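The text does not say which method was used to map features to two dimensions. As one possible stand-in, the sketch below projects feature vectors onto their top two principal directions, found by power iteration with deflation. The function names and the toy feature values are hypothetical, and a nonlinear method could be substituted for the projection step.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the per-dimension mean from every feature vector."""
    d = len(rows[0])
    means = [sum(r[k] for r in rows) / len(rows) for k in range(d)]
    return [[r[k] - means[k] for k in range(d)] for r in rows]


def top_direction(rows, iterations=200, seed=0):
    """Leading principal direction of centered rows, by power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in rows[0]]
    norm = dot(v, v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iterations):
        scores = [dot(r, v) for r in rows]
        # w = X^T (X v), computed without forming the covariance matrix
        w = [sum(s * r[k] for s, r in zip(scores, rows)) for k in range(len(v))]
        norm = dot(w, w) ** 0.5
        if norm == 0.0:  # no variance left along any direction
            break
        v = [x / norm for x in w]
    return v


def project_2d(rows):
    """Map d-dimensional feature vectors to 2-D points (top two directions)."""
    x = center(rows)
    v1 = top_direction(x, seed=0)
    s1 = [dot(r, v1) for r in x]
    # Deflate: remove the first direction before searching for the second.
    residual = [[a - s * b for a, b in zip(r, v1)] for r, s in zip(x, s1)]
    v2 = top_direction(residual, seed=1)
    s2 = [dot(r, v2) for r in residual]
    return list(zip(s1, s2))


# Toy learned features for four (layer, protein) pairs; the values are made up.
features = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [3.0, 3.0, 0.0]]
points = project_2d(features)
```

Each resulting point can then be plotted and colored by the layer its node belongs to. The sign of each coordinate is arbitrary, as is usual for principal directions.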

Tissue-specific protein interaction data

We constructed a human protein-protein interaction (PPI) network, tissue-specific network layers, a tissue hierarchy, and tissue-specific gene-function relationships. To this end, we took the latest protein, tissue, and function information from public data sources.

For example, we represented similarities between tissues with a hierarchy defined over 219 tissues (e.g., muscle, adrenal cortex, bone marrow), pictured here. We then constructed a multi-layer network with 107 layers, each representing one tissue-specific protein interaction network, shown as a blue leaf in this picture.

File                          Description
bio-tissue-networks.tar.gz    Tissue-specific protein interaction networks, one network per human tissue
bio-tissue-hierarchy.tar.gz   A hierarchy of human tissues
bio-tissue-labels.tar.gz      Tissue-specific gene-function associations from the Gene Ontology
bio-tissue-readme.txt         Description of files
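A minimal sketch for reading the network archive follows. It assumes that each file in the archive is a plain-text edge list with two whitespace-separated protein identifiers per line. That format is an assumption of this example, so check bio-tissue-readme.txt for the actual layout. The function names are hypothetical.

```python
import io
import tarfile


def read_edge_list(lines):
    """Parse 'u v [extra columns]' lines into a set of undirected edges.

    The two-column, whitespace-separated format is an assumption of this
    sketch. Blank lines and lines starting with '#' are skipped.
    """
    edges = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        u, v = parts[0], parts[1]
        # Store each undirected edge once, in sorted order.
        edges.add((u, v) if u <= v else (v, u))
    return edges


def iter_tissue_networks(archive_path):
    """Yield (member name, edge set) for every regular file in a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            with tar.extractfile(member) as handle:
                text = io.TextIOWrapper(handle, encoding="utf-8")
                yield member.name, read_edge_list(text)


# Parsing a toy edge list held in memory (made-up protein identifiers).
toy_edges = read_edge_list(["# comment", "P2 P1", "P1 P2", "", "P2 P3 0.9"])
```

With the archive downloaded, `dict(iter_tissue_networks("bio-tissue-networks.tar.gz"))` would give one edge set per file, provided the format assumption holds.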

Code

A Python implementation of OhmNet is available on GitHub.