deepsnap package

Submodules

deepsnap.batch module

class deepsnap.batch.Batch(batch=None, **kwargs)[source]

Bases: deepsnap.graph.Graph

A plain old Python object modeling a batch of deepsnap.graph.Graph objects as one big (disconnected) graph. Because deepsnap.graph.Graph is the base class, all of its methods can also be used here. In addition, single graphs can be reconstructed via the assignment vector batch, which maps each node to its respective graph identifier.
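The role of the assignment vector can be pictured with a small standard-library sketch. It does not use DeepSNAP itself; the helper names build_batch_vector and nodes_of_graph are hypothetical and only illustrate how a per-node graph identifier lets single graphs be recovered from the merged graph.

```python
from typing import List


def build_batch_vector(nodes_per_graph: List[int]) -> List[int]:
    """Map every node of the merged graph to the index of its source graph."""
    batch = []
    for graph_id, num_nodes in enumerate(nodes_per_graph):
        batch.extend([graph_id] * num_nodes)
    return batch


def nodes_of_graph(batch: List[int], graph_id: int) -> List[int]:
    """Recover the node positions in the merged graph that belong to one graph."""
    return [node for node, gid in enumerate(batch) if gid == graph_id]


# Three graphs with 3, 2 and 4 nodes merged into one disconnected graph.
batch_vector = build_batch_vector([3, 2, 4])
print(batch_vector)                     # [0, 0, 0, 1, 1, 2, 2, 2, 2]
print(nodes_of_graph(batch_vector, 1))  # [3, 4]
```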

apply_transform(transform, update_tensor: bool = True, update_graph: bool = False, deep_copy: bool = False, **kwargs)[source]

Applies a transformation to each graph object in parallel by first calling to_data_list, applying the transform, and then re-batching the results into a Batch. A transform should edit the graph object, which may include changing the graph structure and adding node/edge/graph attributes. The rest is handled automatically by the deepsnap.graph.Graph object, including every attribute whose name ends with index.

Parameters
  • transform – Transformation function applied to each graph object.

  • update_tensor – Whether to use the nx graph to update tensor attributes.

  • update_graph – Whether to use tensor attributes to update the nx graphs.

  • deep_copy – True if a new deep copy of the batch is returned. This option allows modifying the batch of graphs without changing the graphs in the original dataset.

  • kwargs – Parameters used in transform function in deepsnap.graph.Graph objects.
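The unbatch, transform, re-batch pattern can be sketched without DeepSNAP. In this illustration a graph is a plain dictionary with hypothetical keys "num_nodes" and "edges", and the functions rebatch, transform_then_rebatch and add_self_loops are assumptions made for the example, not part of the library.

```python
from typing import Callable, Dict, List


def rebatch(graphs: List[Dict]) -> Dict:
    """Merge graphs into one disconnected graph by offsetting node ids."""
    merged_edges, batch, offset = [], [], 0
    for graph_id, graph in enumerate(graphs):
        merged_edges += [(u + offset, v + offset) for u, v in graph["edges"]]
        batch += [graph_id] * graph["num_nodes"]
        offset += graph["num_nodes"]
    return {"num_nodes": offset, "edges": merged_edges, "batch": batch}


def transform_then_rebatch(graphs: List[Dict], transform: Callable, **kwargs) -> Dict:
    """Apply the transform to every single graph, then batch the results again."""
    return rebatch([transform(graph, **kwargs) for graph in graphs])


def add_self_loops(graph: Dict) -> Dict:
    """Example transform: returns a new graph and leaves the input untouched."""
    loops = [(node, node) for node in range(graph["num_nodes"])]
    return {"num_nodes": graph["num_nodes"], "edges": graph["edges"] + loops}


graphs = [{"num_nodes": 2, "edges": [(0, 1)]}, {"num_nodes": 1, "edges": []}]
batched = transform_then_rebatch(graphs, add_self_loops)
print(batched["edges"])  # [(0, 1), (0, 0), (1, 1), (2, 2)]
print(batched["batch"])  # [0, 0, 1]
```

Because the example transform builds new dictionaries, the original list of graphs is unchanged after the call.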

apply_transform_batched(transform)[source]

A transform that directly operates on batched graphs. This is a user-customized apply for batched graphs (expert-only).

Parameters

transform – Transformation function applied to each graph object.

static collate(follow_batch=[], transform=None, **kwargs)[source]
static from_data_list(data_list: List[deepsnap.graph.Graph], follow_batch: List = None, transform: Callable = None, **kwargs)[source]

Constructs a deepsnap.batch.Batch object from a Python list holding deepsnap.graph.Graph objects. The assignment vector batch is created on the fly. Additionally, creates assignment batch vectors for each key in follow_batch.

Parameters
  • data_list (list) – A list of deepsnap.graph.Graph objects.

  • follow_batch (list, optional) – Creates assignment batch vectors for each key.

  • transform – Transform to apply when batching, if given.

  • **kwargs – Other parameters.

property num_graphs

Returns the number of graphs in the batch.

Returns

The number of graphs in the batch.

Return type

int

to_data_list()[source]

Reconstructs the list of deepsnap.graph.Graph objects from the batch object. The batch object must have been created via from_data_list() in order to be able to reconstruct the initial objects.

deepsnap.dataset module

class deepsnap.dataset.EnsembleGenerator(generators, gen_prob=None, dataset_len=0)[source]

Bases: deepsnap.dataset.Generator

generate(**kwargs)[source]

Generate a list of graphs.

Returns

Generated list of deepsnap.graph.Graph objects.

Return type

list

property num_edge_labels

Returns the number of edge labels in the generated graphs.

Returns

The number of edge labels.

Return type

int

property num_edges

Returns the number of edges in each generated graph.

Returns

List of the number of edges.

Return type

list

property num_graph_labels

Returns the number of graph labels in the generated graphs.

Returns

The number of graph labels.

Return type

int

property num_node_labels

Returns the number of node labels in the generated graphs.

Returns

The number of node labels.

Return type

int

property num_nodes

Returns the number of nodes in each generated graph.

Returns

List of the number of nodes.

Return type

list

class deepsnap.dataset.Generator(sizes, size_prob=None, dataset_len=0)[source]

Bases: object

Abstract class for the on-the-fly generator used in a dataset. It generates graphs on the fly to be fed into the model.

generate()[source]

Override in subclasses. Generates and returns a Graph object.
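The subclassing pattern, together with an ensemble that picks one generator according to gen_prob, can be sketched with the standard library only. The classes SketchGenerator, PathGenerator and CycleGenerator and the function generate_from_ensemble are hypothetical stand-ins, and graphs are represented as plain edge lists rather than Graph objects.

```python
import random
from abc import ABC, abstractmethod


class SketchGenerator(ABC):
    """Hypothetical stand-in for an abstract on-the-fly generator."""

    @abstractmethod
    def generate(self):
        """Subclasses override this and return one freshly built graph."""


class PathGenerator(SketchGenerator):
    def __init__(self, num_nodes):
        self.num_nodes = num_nodes

    def generate(self):
        # A path graph as an edge list: 0-1, 1-2, ...
        return [(i, i + 1) for i in range(self.num_nodes - 1)]


class CycleGenerator(PathGenerator):
    def generate(self):
        # A path plus the edge that closes the cycle.
        return super().generate() + [(self.num_nodes - 1, 0)]


def generate_from_ensemble(generators, gen_prob=None, rng=random):
    """Pick one generator (uniformly if gen_prob is None) and let it generate."""
    chosen = rng.choices(generators, weights=gen_prob, k=1)[0]
    return chosen.generate()


members = [PathGenerator(3), CycleGenerator(3)]
print(generate_from_ensemble(members, gen_prob=[1.0, 0.0]))  # [(0, 1), (1, 2)]
print(generate_from_ensemble(members, gen_prob=[0.0, 1.0]))  # [(0, 1), (1, 2), (2, 0)]
```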

property num_edge_labels
property num_edges
property num_graph_labels
property num_node_labels
property num_nodes
set_len(dataset_len)[source]
class deepsnap.dataset.GraphDataset(graphs, task: str = 'node', edge_negative_sampling_ratio: float = 1, edge_message_ratio: float = 0.8, edge_train_mode: str = 'all', edge_split_mode: str = 'exact', minimum_node_per_graph: int = 5, generator=None)[source]

Bases: object

A plain Python object modeling a list of Graph objects with various (optional) attributes.

Parameters
  • graphs (list) – A list of Graph objects.

  • task (str) – Task this GraphDataset is used for (task = ‘node’ or ‘edge’ or ‘link_pred’ or ‘graph’).

  • edge_negative_sampling_ratio (float) – The ratio of the number of negative samples to the number of positive samples.

  • edge_message_ratio (float) – The number of training edge objectives compared to that of message-passing edges.

  • edge_train_mode (str) – Training mode for edge objectives. ‘all’: training edge objectives are the same as the message-passing edges; ‘disjoint’: training edge objectives are different from message-passing edges; ‘train_only’: training edge objectives are always the training set edges.

  • minimum_node_per_graph (int) – If the number of nodes of a graph is smaller than this, that graph will be filtered out.

  • generator (deepsnap.dataset.Generator) – The dataset can be generated on the fly. When using an on-the-fly generator, graphs is [] or None, and a generator (Generator) with an overridden generate() method is provided.

apply_transform(transform, update_tensor: bool = True, update_graph: bool = False, deep_copy: bool = False, **kwargs)[source]

Applies a transformation to each graph object in parallel by first calling to_data_list, applying the transform, and then re-batching the results into a GraphDataset.

Parameters
  • transform – user-defined transformation function.

  • update_tensor – whether to use the nx graph to update tensor attributes.

  • kwargs – parameters used in transform function in Graph object.

filter(filter_fn, deep_copy: bool = False, **kwargs)[source]

Filter the dataset, discarding graph data G where filter_fn(G) is False.

GraphDataset.apply_transform is the graph-dataset analog of Python's map, while GraphDataset.filter is the analog of Python's filter.

Parameters
  • filter_fn – user-defined filter function that returns True (keep) or False (discard) for each graph object in this dataset.

  • deep_copy – whether to deep copy all graph objects in the returned list.

  • kwargs – parameters used in the filter function.

Returns

A new dataset where graphs are filtered by the given filter function.
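The filter semantics can be illustrated with plain Python lists. This is a sketch only: graphs are dictionaries with a hypothetical "num_nodes" key, and filter_graphs and has_min_nodes are illustrative names, not DeepSNAP functions.

```python
def filter_graphs(graphs, filter_fn, **kwargs):
    """Keep the graphs for which filter_fn(graph, **kwargs) is True."""
    return [graph for graph in graphs if filter_fn(graph, **kwargs)]


def has_min_nodes(graph, min_nodes=5):
    """Example filter function: True means keep, False means discard."""
    return graph["num_nodes"] >= min_nodes


dataset = [{"num_nodes": 3}, {"num_nodes": 8}, {"num_nodes": 5}]
kept = filter_graphs(dataset, has_min_nodes, min_nodes=5)
print(kept)          # [{'num_nodes': 8}, {'num_nodes': 5}]
print(len(dataset))  # 3, the input list itself is not modified
```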

static list_to_graphs(G_list) → List[deepsnap.graph.Graph][source]

Transform a list of networkx data objects to a list of Graph objects.

Parameters

G_list – a list of networkx data objects.

Returns

A list of deepsnap.graph.Graph objects.

Return type

list

num_dims_dict()[source]

Dimensions for all fields.

Returns

A dictionary mapping each property name to its dimension, e.g. ‘node_feature’ -> feature dimension; ‘graph_label’ -> label dimension.

Return type

dict

property num_edge_features

Returns edge feature dimension in the graph.

Returns

The number of features per edge in the dataset.

Return type

int

property num_edge_labels

Returns the number of edge labels in the graph.

Returns

The number of labels per edge in the dataset.

Return type

int

property num_edges

Returns the number of edges for each graph in the graph list.

Returns

A list of the number of edges for each graph in the graph list.

Return type

list

property num_graph_features

Returns graph feature dimension in the graph.

Returns

The number of features per graph in the dataset.

Return type

int

property num_graph_labels

Returns the number of graph labels in the graph.

Returns

The number of labels per graph in the dataset.

Return type

int

property num_labels

General wrapper that returns the number of labels depending on the task.

Returns

The number of labels, depending on the task

Return type

int

property num_node_features

Returns node feature dimension in the graph.

Returns

The number of features per node in the dataset.

Return type

int

property num_node_labels

Returns the number of node labels in the graph.

Returns

The number of labels per node in the dataset.

Return type

int

property num_nodes

Returns the number of nodes for each graph in the graph list.

Returns

A list of the number of nodes for each graph in the graph list.

Return type

list

static pyg_to_graphs(dataset, verbose: bool = False, fixed_split: bool = False) → List[deepsnap.graph.Graph][source]

Transform a torch_geometric.data.Dataset object to a list of Graph objects.

Parameters
  • dataset – a torch_geometric.data.Dataset object.

  • verbose – whether to print verbose warnings.

  • fixed_split – whether to load the fixed data split from the PyG dataset.

Returns

A list of deepsnap.graph.Graph objects.

Return type

list

resample_disjoint()[source]

Resample disjoint edge split of message passing and objective links.

Note that if apply_transform (on the message passing graph) was used before this resampling, it needs to be re-applied, after resampling, to update some of the edges that were in objectives.

split(transductive: bool = True, split_ratio: List[float] = None, split_types: Union[str, List[str]] = None) → Union[List[deepsnap.graph.Graph], List[deepsnap.hetero_graph.HeteroGraph]][source]

Split the dataset into train, validation (and test) sets.

Parameters
  • transductive – whether the training process is transductive or inductive. Inductive split is always used for graph-level tasks ( self.task == ‘graph’).

  • split_ratio – ratios of the data split into the train, validation (and test) sets.

Returns

A list of 3 (or 2) lists of deepsnap.graph.Graph objects corresponding to the train, validation (and test) sets.

Return type

list

to(device)[source]

Transfer the Graph objects in graphs to the specified device.

Parameters

device – Specified device name

deepsnap.graph module

class deepsnap.graph.Graph(G=None, **kwargs)[source]

Bases: object

A plain python object modeling a single graph with various (optional) attributes:

Parameters
  • G (networkx.classes.graph) – The NetworkX graph object which contains features and labels for the tasks.

  • **kwargs – keyword arguments with keys such as "node_feature", "node_label" and corresponding attributes.

static add_edge_attr(G, attr_name: str, edge_attr)[source]

Add edge attribute into a NetworkX graph.

Parameters
  • G (NetworkX Graph) – a NetworkX graph.

  • attr_name (string) – Name of the edge attribute to set.

  • edge_attr (array_like) – edge attributes.

static add_graph_attr(G, attr_name: str, graph_attr)[source]

Add graph attribute into a NetworkX graph.

Parameters
  • G (NetworkX Graph) – a NetworkX graph.

  • attr_name (string) – Name of the graph attribute to set.

  • graph_attr (scalar or array_like) – graph attributes.

static add_node_attr(G, attr_name: str, node_attr)[source]

Add node attribute into a NetworkX graph. Assumes that the node_attr ordering is the same as the node ordering in G.

Parameters
  • G (NetworkX Graph) – a NetworkX graph.

  • attr_name (string) – Name of the node attribute to set.

  • node_attr (array_like) – node attributes.

apply_tensor(func, *keys)[source]

Applies the function func to all tensor attributes *keys. If *keys is not given, func is applied to all present attributes.

Parameters
  • func (function) – a function that can be applied to a PyTorch tensor.

  • *keys (string, optional) – names of the tensor attributes to which func will be applied.

Returns

The deepsnap.graph.Graph object itself.

Return type

deepsnap.graph.Graph
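The key-selection behavior (apply to the named attributes, or to all present attributes when no keys are given) and the return of the object itself can be sketched with a small stand-in class. AttrBag and its apply method are hypothetical, and Python lists take the place of tensors.

```python
class AttrBag:
    """Hypothetical stand-in for an object that stores named attributes."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)

    @property
    def keys(self):
        return list(self.__dict__)

    def apply(self, func, *keys):
        """Apply func to the given attributes, or to all of them if none are named."""
        for key in (keys or self.keys):
            setattr(self, key, func(getattr(self, key)))
        return self  # returning self allows chained calls


bag = AttrBag(node_feature=[1, 2], edge_feature=[3])
bag.apply(lambda values: [2 * v for v in values], "node_feature")
print(bag.node_feature, bag.edge_feature)  # [2, 4] [3]

bag.apply(lambda values: values + [0])     # no keys: every attribute is touched
print(bag.node_feature, bag.edge_feature)  # [2, 4, 0] [3, 0]
```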

apply_transform(transform, update_tensor: bool = True, update_graph: bool = False, deep_copy: bool = False, **kwargs)[source]

Apply transform function to current graph object.

Note that when the backend graph object (e.g. networkx object) is changed in the transform function, the argument update_tensor is recommended, to keep the tensor representation in sync with the transformed graph. Similarly, update_graph is recommended when the transform function makes changes to the tensor objects.

However, the transform function should not make changes to both the backend graph object and the tensors simultaneously. Otherwise there might exist inconsistency between the transformed graph and tensors. Also note that update_tensor and update_graph cannot be true at the same time.

Parameters
  • transform (function) – in the format of transform(deepsnap.graph.Graph, **kwargs). The function needs to return either a deepsnap.graph.Graph (the transformed graph object) or the transformed internal .G object (networkx). If the .G object is returned, all corresponding tensors will be updated.

  • update_tensor (boolean) – if the nx graph has changed, use the nx graph to update tensor attributes.

  • update_graph (boolean) – if tensor attributes have changed, use them to update the nx graph.

  • deep_copy (boolean) – True if a new copy of graph_object is needed. In this case, the transform function needs to return either a graph object or the transformed internal .G object, as described for transform. Important: when the transform function returns a Graph object, the user should decide whether the tensor values of the graph are to be copied (deep copy).

  • **kwargs (any) – additional args for the transform function.

Note

This function is different from the function apply_tensor.

clone()[source]

Deepcopy the graph object.

Returns

A cloned deepsnap.graph.Graph object with all features deep-copied.

Return type

deepsnap.graph.Graph

contiguous(*keys)[source]

Ensures a contiguous memory layout for the attributes specified by *keys. If *keys is not given, all present attributes are ensured to have a contiguous memory layout.

Parameters

*keys (string, optional) – tensor attributes which will be in contiguous memory layout.

Returns

deepsnap.graph.Graph object with specified tensor attributes in contiguous memory layout.

Return type

deepsnap.graph.Graph

get_num_dims(key, as_label=False) → int[source]

Returns the number of dimensions for one graph/node/edge property.

Parameters

as_label – if as_label, treat the tensor as labels.

is_directed() → bool[source]

Whether the graph is directed.

Returns

True if the graph is directed.

Return type

bool

is_undirected() → bool[source]

Whether the graph is undirected.

Returns

True if the graph is undirected.

Return type

bool

property keys

Returns all names of the graph attributes.

Returns

List of deepsnap.graph.Graph attributes.

Return type

list

static negative_sampling(edge_index, num_nodes=None, num_neg_samples=None)[source]

Samples random negative edges of a graph given by edge_index.

Parameters
  • edge_index (torch.LongTensor) – The edge indices.

  • num_nodes (int, optional) – The number of nodes, i.e. max_val + 1 of edge_index. (default: None)

  • num_neg_samples (int, optional) – The number of negative samples to return. If set to None, will try to return a negative edge for every positive edge. (default: None)

  • force_undirected (bool, optional) – If set to True, sampled negative edges will be undirected. (default: False)

Return type

torch.LongTensor
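Negative sampling can be sketched with the standard library on a plain edge list. This is an illustration under stated assumptions, not the library's implementation: sample_negative_edges is a hypothetical name, self-loops are excluded by choice, the seed argument exists only for reproducibility, and enumerating every candidate pair is only reasonable for small graphs.

```python
import random


def sample_negative_edges(edges, num_nodes=None, num_neg_samples=None, seed=None):
    """Sample node pairs that do not appear as edges in the given edge list."""
    rng = random.Random(seed)
    if num_nodes is None:
        # Default: largest node id in the edge list, plus one.
        num_nodes = max(max(u, v) for u, v in edges) + 1
    if num_neg_samples is None:
        # Default: aim for one negative edge per positive edge.
        num_neg_samples = len(edges)
    existing = set(edges)
    candidates = [
        (u, v)
        for u in range(num_nodes)
        for v in range(num_nodes)
        if u != v and (u, v) not in existing
    ]
    # Fewer negatives are returned when the graph has too few non-edges.
    return rng.sample(candidates, min(num_neg_samples, len(candidates)))


positive = [(0, 1), (1, 2)]
negative = sample_negative_edges(positive, seed=0)
print(len(negative))                         # 2
print(all(e not in positive for e in negative))  # True
```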

property num_edge_features

Returns edge feature dimension in the graph.

Returns

Edge feature dimension, or 0 if there is no edge_feature.

Return type

int

property num_edge_labels

Returns the number of edge labels in the graph.

Returns

Number of edge labels and 0 if there is no edge_label.

Return type

int

property num_edges

Returns number of edges in the graph.

Returns

Number of edges.

Return type

int

property num_graph_features

Returns graph feature dimension in the graph.

Returns

Graph feature dimension and 0 if there is no graph_feature.

Return type

int

property num_graph_labels

Returns the number of graph labels in the graph.

Returns

Number of graph labels and 0 if there is no graph_label.

Return type

int

property num_node_features

Returns node feature dimension in the graph.

Returns

Node feature dimension and 0 if there is no node_feature.

Return type

int

property num_node_labels

Returns the number of node labels in the graph.

Returns

Number of node labels and 0 if there is no node_label.

Return type

int

property num_nodes

Returns the number of nodes in the graph.

Returns

Number of nodes in the graph.

Return type

int

static pyg_to_graph(data, verbose: bool = False, fixed_split: bool = False)[source]

Converts PyTorch Geometric data to a Graph object.

Parameters
  • data (torch_geometric.data) – a PyTorch Geometric data object.

  • verbose – whether to print verbose warnings.

  • fixed_split – whether to load the fixed data split from the PyG dataset.

Returns

A new DeepSNAP deepsnap.graph.Graph object.

Return type

deepsnap.graph.Graph

static raw_to_graph(data)[source]

Placeholder for user-written methods that import a custom data format; such methods must make sure all attributes of G are scalars or torch.tensor objects. Not implemented.

resample_disjoint(message_ratio)[source]

Resample disjoint edge split of message passing and objective links.

Note that if apply_transform (on the message passing graph) was used before this resampling, it needs to be re-applied, after resampling, to update some of the edges that were in objectives.
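Resampling a disjoint split can be pictured as drawing a fresh random partition of the edges. The sketch below uses only the standard library; resample_disjoint_edges is a hypothetical name, and treating message_ratio as the fraction of edges kept for message passing is an assumption made for the example.

```python
import random


def resample_disjoint_edges(edges, message_ratio, seed=None):
    """Randomly partition edges into message-passing and objective edges."""
    rng = random.Random(seed)
    shuffled = list(edges)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    num_message = int(message_ratio * len(shuffled))
    return shuffled[:num_message], shuffled[num_message:]


edges = [(i, i + 1) for i in range(10)]
message_edges, objective_edges = resample_disjoint_edges(edges, 0.8, seed=0)
print(len(message_edges), len(objective_edges))  # 8 2
```

Calling the function again with a different seed yields a different partition, which is why anything previously computed on the message-passing edges has to be recomputed afterwards.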

split(task: str = 'node', split_ratio: List[float] = None)[source]

Split the current graph object into a list of graph objects.

Parameters
  • task (string) – one of node, edge or link_pred.

  • split_ratio (array_like) – array_like ratios [train_ratio, validation_ratio, test_ratio].

Returns

A Python list of deepsnap.graph.Graph objects with specified task.

Return type

list

Split the graph into len(split_ratio) graphs for link prediction. Internally this splits the edge indices, and the model will only compute loss for the embeddings of nodes in each split graph. This is only used for the transductive link prediction task, in which different parts of the graph are observed in train/val/test. Note: this function will be called twice if, during training, the training graph is further split so that message edges and objective edges are different.
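Splitting edge indices by ratio can be sketched as follows. The function split_edges is hypothetical and works on a plain edge list; rounding cumulative ratios and giving the remainder to the last split is one possible convention, chosen here so that every edge lands in exactly one split.

```python
import random


def split_edges(edges, split_ratio, seed=None):
    """Shuffle edges and cut them into len(split_ratio) disjoint parts."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    splits, start, cumulative = [], 0, 0.0
    for index, ratio in enumerate(split_ratio):
        cumulative += ratio
        if index == len(split_ratio) - 1:
            end = len(shuffled)  # the last split takes whatever remains
        else:
            end = int(round(cumulative * len(shuffled)))
        splits.append(shuffled[start:end])
        start = end
    return splits


edges = [(i, i + 1) for i in range(10)]
train, val, test = split_edges(edges, [0.8, 0.1, 0.1], seed=0)
print(len(train), len(val), len(test))  # 8 1 1
```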

to(device, *keys)[source]

Performs tensor dtype and/or device conversion to all attributes *keys. If *keys is not given, the conversion is applied to all present attributes.

Parameters
  • device – Specified device name.

  • *keys (string, optional) – Tensor attributes that will be transferred to the specified device.

deepsnap.hetero_gnn module

class deepsnap.hetero_gnn.HeteroConv(convs, aggr='add', parallelize=False)[source]

Bases: torch.nn.modules.module.Module

A “wrapper” layer designed for heterogeneous graph layers. It takes a heterogeneous graph layer, such as deepsnap.hetero_gnn.HeteroSAGEConv, at the initializing stage.

aggregate(xs)[source]

The aggregation for each node type. Currently supports concat, add, mean, max and mul.
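The five aggregation modes can be illustrated on plain Python lists. This is a sketch rather than the layer's code: aggregate_sketch is a hypothetical name, each element of xs stands for the feature vector one message type contributes to a single node, and joining along the feature dimension for concat is an assumption.

```python
import operator
from functools import reduce


def aggregate_sketch(xs, aggr="add"):
    """Combine equally long feature vectors, one per incoming message type."""
    if aggr == "concat":
        return [value for x in xs for value in x]
    columns = list(zip(*xs))  # group the i-th entries of all vectors together
    if aggr == "add":
        return [sum(col) for col in columns]
    if aggr == "mean":
        return [sum(col) / len(col) for col in columns]
    if aggr == "max":
        return [max(col) for col in columns]
    if aggr == "mul":
        return [reduce(operator.mul, col) for col in columns]
    raise ValueError(f"unsupported aggregation: {aggr}")


xs = [[1.0, 2.0], [3.0, 4.0]]
print(aggregate_sketch(xs, "add"))     # [4.0, 6.0]
print(aggregate_sketch(xs, "mean"))    # [2.0, 3.0]
print(aggregate_sketch(xs, "max"))     # [3.0, 4.0]
print(aggregate_sketch(xs, "mul"))     # [3.0, 8.0]
print(aggregate_sketch(xs, "concat"))  # [1.0, 2.0, 3.0, 4.0]
```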

forward(node_features, edge_indices, edge_features=None)[source]

The forward function for HeteroConv.

Parameters
  • node_features (dict) – A dictionary in which each key is a node type and the corresponding value is a node feature tensor.

  • edge_indices (dict) – A dictionary in which each key is a message type and the corresponding value is an edge index tensor.

  • edge_features (dict) – A dictionary in which each key is an edge type and the corresponding value is an edge feature tensor. Default is None.

reset_parameters()[source]
class deepsnap.hetero_gnn.HeteroSAGEConv(in_channels_neigh, out_channels, in_channels_self=None)[source]

Bases: torch_geometric.nn.conv.message_passing.MessagePassing

The heterogeneous-compatible GraphSAGE operator, derived from the “Inductive Representation Learning on Large Graphs”, “Modeling polypharmacy side effects with graph convolutional networks” and “Modeling Relational Data with Graph Convolutional Networks” papers.

Parameters
  • in_channels_neigh (int) – The input dimension of the end node type.

  • out_channels (int) – The dimension of the output.

  • in_channels_self (int) – The input dimension of the start node type. Default is None where the in_channels_self is equal to in_channels_neigh.

forward(node_feature_neigh, node_feature_self, edge_index, edge_weight=None, size=None, res_n_id=None)[source]
message(node_feature_neigh_j, node_feature_self_i, edge_weight)[source]
update(aggr_out, node_feature_self, res_n_id)[source]
deepsnap.hetero_gnn.forward_op(x, func, **kwargs)[source]

A helper function for the heterogeneous operations. Given a dictionary input, it returns a dictionary with the same keys, where each value is the result of applying func, with the specified parameters, to the corresponding input value.

Parameters
  • x (dict) – A dictionary; func will be applied to the value of each item.

  • func (function) – The function that will be applied to each value in the dictionary.

  • **kwargs – Parameters that will be passed into the func.
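The described behavior amounts to mapping a function over the values of a dictionary while keeping the keys. A minimal sketch of that idea, using hypothetical names forward_op_sketch and scale and lists in place of tensors:

```python
def forward_op_sketch(x, func, **kwargs):
    """Apply func (with kwargs) to every value and keep the original keys."""
    return {key: func(value, **kwargs) for key, value in x.items()}


def scale(values, factor=1.0):
    """Example per-type operation: multiply each entry by a factor."""
    return [factor * v for v in values]


features = {"author": [1, 2], "paper": [3]}
print(forward_op_sketch(features, scale, factor=2))
# {'author': [2, 4], 'paper': [6]}
```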

deepsnap.hetero_gnn.loss_op(pred, y, label_index, loss_func, **kwargs)[source]

A helper function for the heterogeneous loss operations.

Parameters
  • pred (dict) – A dictionary of predictions.

  • y (dict) – A dictionary of labels.

  • label_index (dict) – A dictionary of indices that the loss will be computed on. Each value should be a PyTorch long tensor.

  • loss_func (function) – The loss function.

  • **kwargs – Parameters that will be passed into the loss_func.

deepsnap.hetero_graph module

class deepsnap.hetero_graph.HeteroGraph(G=None, **kwargs)[source]

Bases: deepsnap.graph.Graph

A plain python object modeling a heterogeneous graph with various attributes (String node type is required for the HeteroGraph).

Parameters
  • G (networkx.classes.graph) – The NetworkX graph object which contains features and labels for each node type or edge type.

  • **kwargs – keyword arguments with keys such as "node_feature", "node_label" and corresponding attributes.

property edge_types

Return list of edge types in the heterogeneous graph.

get_num_dims(key, obj_type, as_label: bool = False) → int[source]

Returns the number of dimensions for one graph/node/edge property for specified types.

Parameters
  • key (str) – The chosen property.

  • obj_type – Node or edge type.

  • as_label (bool) – If as_label, treat the tensor as labels.

get_num_edge_features(edge_type: str) → int[source]

Return the edge feature dimension of specified edge type.

Returns

The edge feature dimension for specified edge type.

Return type

int

get_num_edge_labels(edge_type: str) → int[source]

Return the number of edge labels.

Returns

Number of edge labels for specified edge type.

Return type

int

get_num_edges(message_type: Union[tuple, List[tuple]] = None) → int[source]

Return the number of edges for an edge type or a list of edge types.

Parameters

edge_type (str or list) – Specified edge type(s).

Returns

The number of edges for an edge type or a list of edge types.

Return type

int or list

get_num_node_features(node_type: str) → int[source]

Return the node feature dimension of specified node type.

Returns

The node feature dimension for specified node type.

Return type

int

get_num_node_labels(node_type: str) → int[source]

Return the number of node labels.

Returns

Number of node labels for specified node type.

Return type

int

get_num_nodes(node_type: Union[str, List[str]] = None)[source]

Return number of nodes for a node type or list of node types.

Parameters

node_type (str or list) – Specified node type(s).

Returns

The number of nodes for a node type or list of node types.

Return type

int or list

property message_types

Return the list of message types (src_node_type, edge_type, end_node_type) in the heterogeneous graph.
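The tuple structure of a message type can be illustrated by collecting the distinct (src_node_type, edge_type, end_node_type) combinations from a typed edge list. The function collect_message_types and the input layout are hypothetical and only illustrate the idea.

```python
def collect_message_types(node_type, typed_edges):
    """Return the distinct (src type, edge type, end type) tuples in edge order.

    node_type maps a node id to its type; typed_edges holds
    (src, dst, edge_type) triples.
    """
    seen = []
    for src, dst, edge_type in typed_edges:
        message_type = (node_type[src], edge_type, node_type[dst])
        if message_type not in seen:
            seen.append(message_type)
    return seen


node_type = {0: "author", 1: "paper", 2: "paper"}
typed_edges = [(0, 1, "writes"), (0, 2, "writes"), (1, 2, "cites")]
print(collect_message_types(node_type, typed_edges))
# [('author', 'writes', 'paper'), ('paper', 'cites', 'paper')]
```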

static negative_sampling(edge_index: Dict[str, torch.tensor], num_nodes=None, num_neg_samples: Dict[str, int] = None)[source]

Samples random negative edges of a heterogeneous graph given by edge_index.

Parameters
  • edge_index (LongTensor) – The edge indices.

  • num_nodes (int, optional) – The number of nodes, i.e. max_val + 1 of edge_index. (default: None)

  • num_neg_samples (int, optional) – The number of negative samples to return. If set to None, will try to return a negative edge for every positive edge. (default: None)

  • force_undirected (bool, optional) – If set to True, sampled negative edges will be undirected. (default: False)

Return type

torch.LongTensor

property node_types

Return list of node types in the heterogeneous graph.

split(task: str = 'node', split_types: Union[str, List[str], tuple, List[tuple]] = None, split_ratio: List[float] = None, edge_split_mode: str = 'exact')[source]

Split the current graph object into a list of graph objects.

Parameters
  • task (string) – One of node, edge or link_pred.

  • split_types (list) – Types to split on. Default is None, which splits all the types in the specified task.

  • split_ratio (array_like) – Array_like ratios [train_ratio, validation_ratio, test_ratio].

Returns

A Python list of Graph objects with specified task.

Return type

list

Split the graph into len(split_ratio) graphs for link prediction. Internally this splits the edge indices, and the model will only compute loss for the embeddings of nodes in each split graph. This is only used for the transductive link prediction task, in which different parts of the graph are observed in train/val/test. Note: this function will be called twice if, during training, the training graph is further split so that message edges and objective edges are different.

Module contents