Mambo is a tool for construction, representation and analysis of large multimodal networks in biomedicine.
Given a set of entities together with information about those entities, and a set of relationships between the entities together with information about those relationships, Mambo constructs a multimodal network representation of the data and provides tools for its analysis.
Recent advances in biotechnology have resulted in large and heterogeneous biomedical data generated by various experimental biotechnologies and collected in knowledge databases.
We present Mambo, a framework and a set of computational tools for construction, representation, and analysis of large-scale multimodal networks in biomedicine. Mambo scales to millions of nodes and billions of edges. Mambo supports thousands of modes (entity types), and tens of thousands of links (relationship types).
Mambo develops a representation of biological data that is compact and can be used to analyze large complex biomedical data and generate new domain-specific hypotheses. Multimodal networks extend the classic graph/network structure from homogeneous to heterogeneous networks. A multimodal network is composed of several set of nodes, called modes, where each mode represents a distinct entity type, connected by edges betweeen nodes within a mode and across modes.
Mambo defines a multimodal network by specifying four components:
The figure below shows an example of a multimoodal network. The multimodal network consists of six modes, each denoted by a different color.
Multimodal networks are typically constructed based a variety of data sources. We provide the processed data and the code for two multimodal networks. These examples are intended to show how to use Mambo, provide examples how to analyze the constructed networks, and to demonstrate how Mambo scales to large networks.
This network focuses on protein-coding genes that are frequently mutated in cancer patients according to the International Cancer Genome Consortium (ICGC)1. We begin with 500 most frequently mutated genes. We then include all nodes in other modes (excluding genes) that are within one-hop of these genes to construct the network based on databases listed in Data section.
We selected 500 genes with mutations in the largest number of cancer patients because these genes likely have important roles in cancer disease development and progression. The network contains information on what proteins are encoded by these genes, what drugs or chemicals interact with the genes and proteins, what functions these genes and proteins have in the human body, and what other diseases these genes and proteins are associated with.
The multimodal cancer network has 5 modes: Chemical, Disease, Function, Gene, and Protein. It has 21 link types: Chemical-Chemical, Chemical-Protein, Disease-Chemical, Disease-Disease, Disease-Function, Disease-Gene, Function-Function, Gene-Gene (split into 6 link types by interaction type), Gene-Protein, Protein-Function, and Protein-Protein (split into 6 link types by interaction type). There are a total of 20 K nodes and 3.4 M edges.
All necessary data to construct this network is in
Mambo Repository.
The repository also contains tutorial showing how to construct and analyze this network.
The tutorial begins in notebook 03 Workflow for Constructing a Multimodal Network
and ends in notebook 08 Performing Analytics on a Multimodal Network
.
Mambo easily scales to very large datasets without any additional work required from the user. We demonstrate this property by constructing a giga-scale multimodal biological network, one of the largest networks ever constructed in biology. The network has 10 M nodes and 2.3 B edges. In this network, we integrate protein and genetic interaction data from more than two thousand species.
Further information can be found in Mambo Repository.
jupyter notebook
. The web browser will open the Notebook Dashboard. Check Jupyter Notebook website for further details on how to run the notebooks.01 Introduction to Multimodal Networks
, and follow the tutorial.Mambo uses Jupyter Notebooks to provide a clear and easy-to-use interface with a simple set of commands that can be used to construct, represent and analyze multimodal networks. We provide a tutorial detailing how to use Mambo on several examples of real-world multimodal networks in biomedicine. Code in the tutorial can be easily adapted to new multimodal networks.
Tutorial has three parts. First, we provide background on multimodal networks and how they are represented in Mambo. This can be found in the two notebooks: 01 Introduction to Multimodal Networks
and in 02 Data Representation in Mambo
.
Second, we provide code and data for construction and analysis of a multimodal cancer network. This part of the tutorial provides a detailed understanding of how Mambo works. This part of the tutorial begins in notebook 03 Workflow for Constructing a Multimodal Network
.
Third, we provide code for construction and analysis of a giga-scale multimodal biological network. This part of the tutorial can be found in notebook 09 Giga-Scale Multimodal Biological Network
.
Detailed tutorial including background on multimodal networls, code, and data are available in Mambo Repository.
The following table lists datasets used in multimodal cancer network and giga-scale multimodal biological network examples.
Interaction Type | File | Database |
---|---|---|
Chemical-gene | CTD_chem_gene_ixns.tsv.gz | CTD2 |
Disease-chemical | CTD_chemicals_diseases.tsv.gz | CTD2 |
Disease-disease | doid.obo | Disease Ontology3 |
Disease-function | CTD_Disease-GO_biological_process_associations.tsv.gz
CTD_Disease-GO_cellular_component_associations.tsv.gz CTD_Disease-GO_molecular_function_associations.tsv.gz |
CTD2 |
Disease-gene | CTD_genes_diseases.tsv.gz | CTD2 |
Drug-target | drugbank.xml.zip | DrugBank4 |
Function-function | go.obo | Gene Ontology5 |
Gene-gene | GeneMANIA Data | GeneMANIA6 |
Gene-protein | ENSEMBL Data | ENSEMBL7 |
Protein-function | goa_human.gaf.gz | Gene Ontology5 |
Protein-protein | protein.links.detailed.v10.5.txt.gz | STRING8 |
The following people contributed to the Mambo project (appear in alphabetical order):
Jure Leskovec