deepsnap package
Submodules
deepsnap.batch module
-
class
deepsnap.batch.
Batch
(batch=None, **kwargs)[source]¶ Bases:
deepsnap.graph.Graph
A plain old Python object modeling a batch of
deepsnap.graph.Graph
objects as one big (disconnected) graph. With torch_geometric.data.Data
being the base class, all its methods can also be used here. In addition, single graphs can be reconstructed via the assignment vector batch, which maps each node to its respective graph identifier.
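The role of the assignment vector can be sketched in plain Python. This is a minimal, hypothetical illustration (the helper collate and the list-based graphs are assumptions, not deepsnap or PyTorch Geometric code): the node indices of each graph are shifted by the number of nodes that precede it, and batch records which graph each node came from.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def collate(node_counts: List[int], edge_lists: List[List[Edge]]):
    """Hypothetical helper: merge several graphs into one disconnected graph.

    Returns the assignment vector (graph id of every node) and the merged
    edge list, with node indices shifted so that the graphs do not overlap.
    """
    batch: List[int] = []
    merged_edges: List[Edge] = []
    offset = 0
    for graph_id, (n, edges) in enumerate(zip(node_counts, edge_lists)):
        batch.extend([graph_id] * n)  # every node of this graph gets its id
        merged_edges.extend((u + offset, v + offset) for u, v in edges)
        offset += n  # the next graph's nodes start after this graph's nodes
    return batch, merged_edges


# A 3-node path graph followed by a single 2-node edge.
batch, edges = collate([3, 2], [[(0, 1), (1, 2)], [(0, 1)]])
print(batch)  # [0, 0, 0, 1, 1]
print(edges)  # [(0, 1), (1, 2), (3, 4)]
```

Because no edge connects nodes of different graphs after the shift, the merged graph is disconnected, which is what lets one message-passing pass process all graphs at once.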
apply_transform
(transform, update_tensor: bool = True, update_graph: bool = False, deep_copy: bool = False, **kwargs)[source]¶ Applies a transformation to each graph object in parallel by first calling to_data_list, applying the transform, and then re-batching the results into a Batch. A transform should edit the graph object, for example by changing the graph structure or adding node/edge/graph attributes. The rest is handled automatically by the deepsnap.graph.Graph object, including all attributes whose names end with index.
- Parameters
transform – Transformation function applied to each graph object.
update_tensor – Whether to use the NetworkX graph to update tensor attributes.
update_graph – Whether to use tensor attributes to update the NetworkX graphs.
deep_copy – True if a new deep copy of the batch is returned. This option allows modifying the batch of graphs without changing the graphs in the original dataset.
kwargs – Parameters passed to the transform function of the deepsnap.graph.Graph objects.
-
apply_transform_batched
(transform)[source]¶ Applies a transform that operates directly on the batched graphs. This is a user-customized apply for batched graphs (expert-only).
- Parameters
transform – Transformation function applied to each graph object.
-
static
from_data_list
(data_list: List[deepsnap.graph.Graph], follow_batch: List = None, transform: Callable = None, **kwargs)[source]¶ Constructs a deepsnap.batch.Batch object from a Python list holding torch_geometric.data.Data objects. The assignment vector batch is created on the fly. Additionally, assignment batch vectors are created for each key in follow_batch.
-
property
num_graphs
¶ Returns the number of graphs in the batch.
- Returns
The number of graphs in the batch.
- Return type
int
-
to_data_list
()[source]¶ Reconstructs the list of torch_geometric.data.Data objects from the batch object. The batch object must have been created via from_data_list() to be able to reconstruct the initial objects.
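Reconstruction is possible because batching records both the graph identifier of every node and where each graph's nodes begin. The plain-Python sketch below undoes the node-index offset; unbatch, the list-based features, and the assumption that each graph's nodes are stored contiguously are illustrative choices, not the library's to_data_list implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def unbatch(batch: List[int], node_features: List[List[float]], edges: List[Edge]):
    """Hypothetical inverse of batching; assumes each graph's nodes are contiguous."""
    start: Dict[int, int] = {}  # first merged node index of each graph
    features: Dict[int, List[List[float]]] = defaultdict(list)
    for node, graph_id in enumerate(batch):
        start.setdefault(graph_id, node)
        features[graph_id].append(node_features[node])
    graph_edges: Dict[int, List[Edge]] = defaultdict(list)
    for u, v in edges:
        g = batch[u]  # an edge never crosses graphs, so u decides the graph
        graph_edges[g].append((u - start[g], v - start[g]))  # undo the offset
    return [(features[g], graph_edges[g]) for g in sorted(start)]


batch = [0, 0, 0, 1, 1]
x = [[1.0], [2.0], [3.0], [4.0], [5.0]]
graphs = unbatch(batch, x, [(0, 1), (1, 2), (3, 4)])
print(graphs)
# [([[1.0], [2.0], [3.0]], [(0, 1), (1, 2)]), ([[4.0], [5.0]], [(0, 1)])]
```

Without the recorded offsets (here recovered from batch), the merged edge indices could not be mapped back to per-graph indices, which is why a batch built some other way cannot be split reliably.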
-
deepsnap.dataset module
-
class
deepsnap.dataset.
EnsembleGenerator
(generators, gen_prob=None, dataset_len=0)[source]¶ Bases:
deepsnap.dataset.Generator
-
generate
(**kwargs)[source]¶ Generate a list of graphs.
- Returns
Generated list of
deepsnap.graph.Graph
objects.
- Return type
list
-
property
num_edge_labels
¶ Returns the number of edge labels in the generated graphs.
- Returns
The number of edge labels.
- Return type
int
-
property
num_edges
¶ Returns the number of edges in each generated graph.
- Returns
List of the number of edges.
- Return type
list
-
property
num_graph_labels
¶ Returns the number of graph labels in the generated graphs.
- Returns
The number of graph labels.
- Return type
int
-
property
num_node_labels
¶ Returns the number of node labels in the generated graphs.
- Returns
The number of node labels.
- Return type
int
-
-
class
deepsnap.dataset.
Generator
(sizes, size_prob=None, dataset_len=0)[source]¶ Bases:
object
Abstract class for on-the-fly generators used in a dataset. It generates graphs on the fly to be fed into the model.
-
property
num_edge_labels
¶
-
property
num_edges
¶
-
property
num_graph_labels
¶
-
property
num_node_labels
¶
-
property
num_nodes
¶
-
property
-
class
deepsnap.dataset.
GraphDataset
(graphs, task: str = 'node', edge_negative_sampling_ratio: float = 1, edge_message_ratio: float = 0.8, edge_train_mode: str = 'all', edge_split_mode: str = 'exact', minimum_node_per_graph: int = 5, generator=None)[source]¶ Bases:
object
A plain Python object modeling a list of Graph objects with various (optional) attributes.
- Parameters
graphs (list) – A list of Graph.
task (str) – Task this GraphDataset is used for (task = ‘node’ or ‘edge’ or ‘link_pred’ or ‘graph’).
edge_negative_sampling_ratio (float) – The ratio of the number of negative samples to the number of positive samples.
edge_message_ratio (float) – The number of training edge objectives compared to that of message-passing edges.
edge_train_mode (str) – How training edge objectives relate to message-passing edges: ‘all’, the training edge objectives are the same as the message-passing edges; ‘disjoint’, the training edge objectives are different from the message-passing edges; or ‘train_only’, the training edge objectives are always the training set edges.
minimum_node_per_graph (int) – If the number of nodes of a graph is smaller than this, that graph will be filtered out.
generator (deepsnap.dataset.Generator) – The dataset can be generated on the fly. In that case, pass graphs = [] or None and provide a Generator whose generate() method is overridden.
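The difference between edge_train_mode = ‘all’ and ‘disjoint’ can be illustrated with a small, hypothetical sketch. The function message_objective_split and its message_fraction parameter (the share of training edges kept for message passing) are assumptions for illustration; how this fraction corresponds to edge_message_ratio is not confirmed here, and ‘train_only’ is not modeled.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]


def message_objective_split(train_edges: List[Edge], mode: str,
                            message_fraction: float = 0.8, seed: int = 0):
    """Hypothetical sketch: return (message_edges, objective_edges)."""
    if mode == "all":
        # The same edges pass messages and serve as training objectives.
        return list(train_edges), list(train_edges)
    if mode == "disjoint":
        # Shuffle, then cut: the two returned lists never share an edge.
        shuffled = list(train_edges)
        random.Random(seed).shuffle(shuffled)
        num_message = int(len(shuffled) * message_fraction)
        return shuffled[:num_message], shuffled[num_message:]
    raise ValueError(f"unsupported mode: {mode}")


edges = [(i, i + 1) for i in range(10)]
msg, obj = message_objective_split(edges, "disjoint")
print(len(msg), len(obj))  # 8 2
```

A common motivation for a disjoint split is that the model is then never supervised on an edge it has already used to pass messages.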
-
apply_transform
(transform, update_tensor: bool = True, update_graph: bool = False, deep_copy: bool = False, **kwargs)[source]¶ Applies a transformation to each graph object in parallel by first calling to_data_list, applying the transform, and then re-batching the results into a GraphDataset.
- Parameters
transform – user-defined transformation function.
update_tensor – whether to use the NetworkX graph to update tensor attributes.
kwargs – parameters passed to the transform function of each Graph object.
-
filter
(filter_fn, deep_copy: bool = False, **kwargs)[source]¶ Filter the dataset, discarding graph data G where filter_fn(G) is False.
GraphDataset.apply_transform is the graph-dataset analog of Python's map, while GraphDataset.filter is the analog of Python's filter.
- Parameters
filter_fn – user-defined filter function that returns True (keep) or False (discard) for each graph object in this dataset.
deep_copy – whether to deep copy all graph objects in the returned list.
kwargs – parameters used in the filter function.
- Returns
A new dataset where graphs are filtered by the given filter function.
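Because the two methods mirror Python's built-in map and filter, the same pattern can be shown on an ordinary list. Here toy dictionaries stand in for Graph objects (an assumption for illustration only); the predicate keeps graphs with at least five nodes, similar in spirit to minimum_node_per_graph.

```python
# Toy stand-ins for graphs: only a node count is stored.
graphs = [{"num_nodes": 3}, {"num_nodes": 8}, {"num_nodes": 12}]

# filter analog: keep a graph only when the predicate returns True.
kept = list(filter(lambda g: g["num_nodes"] >= 5, graphs))

# map analog: build a transformed copy of every graph; the inputs are untouched.
doubled = list(map(lambda g: {**g, "num_nodes": 2 * g["num_nodes"]}, graphs))

print(kept)     # [{'num_nodes': 8}, {'num_nodes': 12}]
print(doubled)  # [{'num_nodes': 6}, {'num_nodes': 16}, {'num_nodes': 24}]
```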
-
static
list_to_graphs
(G_list) → List[deepsnap.graph.Graph][source]¶ Transforms a list of NetworkX graph objects into a list of Graph objects.
- Parameters
G_list – a list of NetworkX graph objects.
- Returns
A list of deepsnap.graph.Graph objects.
- Return type
List[deepsnap.graph.Graph]
-
num_dims_dict
()[source]¶ Dimensions for all fields.
- Returns
A dictionary mapping the name of each property to its dimension, e.g. ‘node_feature’ -> feature dimension; ‘graph_label’ -> label dimension.
- Return type
dict
-
property
num_edge_features
¶ Returns the edge feature dimension in the dataset.
- Returns
The number of features per edge in the dataset.
- Return type
int
-
property
num_edge_labels
¶ Returns the number of edge labels in the dataset.
- Returns
The number of labels per edge in the dataset.
- Return type
int
-
property
num_edges
¶ Returns the number of edges for each graph in the dataset.
- Returns
A list with the number of edges of each graph in the dataset.
- Return type
list
-
property
num_graph_features
¶ Returns the graph feature dimension in the dataset.
- Returns
The number of features per graph in the dataset.
- Return type
int
-
property
num_graph_labels
¶ Returns the number of graph labels in the dataset.
- Returns
The number of labels per graph in the dataset.
- Return type
int
-
property
num_labels
¶ General wrapper that returns the number of labels depending on the task.
- Returns
The number of labels, depending on the task.
- Return type
int
-
property
num_node_features
¶ Returns the node feature dimension in the dataset.
- Returns
The number of features per node in the dataset.
- Return type
int
-
property
num_node_labels
¶ Returns the number of node labels in the dataset.
- Returns
The number of labels per node in the dataset.
- Return type
int
-
property
num_nodes
¶ Returns the number of nodes for each graph in the dataset.
- Returns
A list with the number of nodes of each graph in the dataset.
- Return type
list
-
static
pyg_to_graphs
(dataset, verbose: bool = False, fixed_split: bool = False) → List[deepsnap.graph.Graph][source]¶ Transforms a torch_geometric.data.Dataset object into a list of Graph objects.
- Parameters
dataset – a torch_geometric.data.Dataset object.
verbose – whether to print verbose warnings.
fixed_split – whether to load the fixed data split from the PyG dataset.
- Returns
A list of deepsnap.graph.Graph objects.
- Return type
List[deepsnap.graph.Graph]
-
resample_disjoint
()[source]¶ Resample disjoint edge split of message passing and objective links.
Note that if apply_transform (on the message passing graph) was used before this resampling, it needs to be re-applied after resampling to update the edges that were previously objective edges.
-
split
(transductive: bool = True, split_ratio: List[float] = None, split_types: Union[str, List[str]] = None) → Union[List[deepsnap.graph.Graph], List[deepsnap.hetero_graph.HeteroGraph]][source]¶ Splits the dataset into train, validation (and test) sets.
- Parameters
transductive – whether the training process is transductive or inductive. Inductive split is always used for graph-level tasks (self.task == ‘graph’).
split_ratio – ratios of the data split into train, validation (and test) sets.
- Returns
A list of 3 (or 2) lists of deepsnap.graph.Graph objects corresponding to the train, validation (and test) sets.
- Return type
list
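For the inductive case, where whole graphs are assigned to train, validation and test, the effect of split_ratio can be sketched as a shuffle followed by cuts. The helper split_by_ratio and its rounding rule (truncate each part and give the remainder to the last one) are assumptions; deepsnap's own rounding may differ.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_by_ratio(items: Sequence[T], split_ratio: Sequence[float],
                   seed: int = 0) -> List[List[T]]:
    """Hypothetical sketch: shuffle, then cut into len(split_ratio) parts."""
    if abs(sum(split_ratio) - 1.0) > 1e-6:
        raise ValueError("split_ratio must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    parts: List[List[T]] = []
    start = 0
    for i, ratio in enumerate(split_ratio):
        if i == len(split_ratio) - 1:
            end = len(shuffled)  # the last part absorbs any rounding remainder
        else:
            end = start + int(ratio * len(shuffled))
        parts.append(shuffled[start:end])
        start = end
    return parts


graphs = list(range(10))  # stand-ins for ten Graph objects
train, val, test = split_by_ratio(graphs, [0.8, 0.1, 0.1])
print(len(train), len(val), len(test))  # 8 1 1
```

In the transductive case the items being split would instead be nodes or edges of a shared graph rather than whole graphs.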
deepsnap.graph module
-
class
deepsnap.graph.
Graph
(G=None, **kwargs)[source]¶ Bases:
object
A plain python object modeling a single graph with various (optional) attributes:
- Parameters
G (networkx.classes.graph) – The NetworkX graph object which contains features and labels for the tasks.
**kwargs – keyword argument list with keys such as "node_feature", "node_label" and corresponding attributes.
-
static
add_edge_attr
(G, attr_name: str, edge_attr)[source]¶ Add edge attribute into a NetworkX graph.
- Parameters
G (NetworkX Graph) – a NetworkX graph.
attr_name (string) – Name of the edge attribute to set.
edge_attr (array_like) – edge attributes.
-
static
add_graph_attr
(G, attr_name: str, graph_attr)[source]¶ Add graph attribute into a NetworkX graph.
- Parameters
G (NetworkX Graph) – a NetworkX graph.
attr_name (string) – Name of the graph attribute to set.
graph_attr (scalar or array_like) – graph attributes.
-
static
add_node_attr
(G, attr_name: str, node_attr)[source]¶ Add node attribute into a NetworkX graph. Assumes that the node_attr ordering is the same as the node ordering in G.
- Parameters
G (NetworkX Graph) – a NetworkX graph.
attr_name (string) – Name of the node attribute to set.
node_attr (array_like) – node attributes.
-
apply_tensor
(func, *keys)[source]¶ Applies the function func to all tensor attributes *keys. If *keys is not given, func is applied to all present attributes.
- Parameters
func (function) – a function that can be applied to a PyTorch tensor.
*keys (string, optional) – names of the tensor attributes to which func is applied.
- Returns
The graph object itself (self).
- Return type
deepsnap.graph.Graph
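The keys convention (apply to the named attributes, or to all of them when none are named, and return the object itself) can be illustrated with a hypothetical stand-in class. TinyGraph stores plain lists instead of tensors and is not part of deepsnap.

```python
from typing import Any, Callable, Dict


class TinyGraph:
    """Hypothetical stand-in that keeps named attributes in a dict of lists."""

    def __init__(self, **attrs: Any) -> None:
        self.attrs: Dict[str, Any] = dict(attrs)

    def apply_tensor(self, func: Callable[[Any], Any], *keys: str) -> "TinyGraph":
        # With no keys given, every stored attribute is transformed.
        for key in keys or tuple(self.attrs):
            self.attrs[key] = func(self.attrs[key])
        return self  # returning self allows chained calls


g = TinyGraph(node_feature=[1.0, 2.0], edge_feature=[3.0])
g.apply_tensor(lambda xs: [2 * x for x in xs], "node_feature")
print(g.attrs)  # {'node_feature': [2.0, 4.0], 'edge_feature': [3.0]}
```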
-
apply_transform
(transform, update_tensor: bool = True, update_graph: bool = False, deep_copy: bool = False, **kwargs)[source]¶ Apply transform function to current graph object.
Note that when the backend graph object (e.g. a NetworkX object) is changed in the transform function, setting update_tensor is recommended so that the tensor representation stays in sync with the transformed graph. Similarly, setting update_graph is recommended when the transform function changes the tensor objects.
However, the transform function should not change both the backend graph object and the tensors simultaneously. Otherwise, the transformed graph and the tensors may become inconsistent. Also note that update_tensor and update_graph cannot both be True.
- Parameters
transform (function) – in the format of transform(deepsnap.graph.Graph, **kwargs). The function needs to return either a deepsnap.graph.Graph (the transformed graph object) or the transformed internal .G object (networkx). If the .G object is returned, all corresponding tensors will be updated.
update_tensor (boolean) – if the NetworkX graph has changed, use it to update the tensor attributes.
update_graph (boolean) – if the tensor attributes have changed, use them to update the NetworkX graph.
deep_copy (boolean) – True if a new copy of graph_object is needed. In this case, the transform function needs to return a graph object. Important: when returning a Graph object from the transform function, the user should decide whether the tensor values of the graph are to be copied (deep copy).
**kwargs (any) – additional args for the transform function.
Note
This function is different from the function apply_tensor.
-
clone
()[source]¶ Deep-copies the graph object.
- Returns
A cloned deepsnap.graph.Graph object with all features deep-copied.
- Return type
deepsnap.graph.Graph
-
contiguous
(*keys)[source]¶ Ensures a contiguous memory layout for the attributes specified by *keys. If *keys is not given, all present attributes are ensured to have a contiguous memory layout.
- Parameters
*keys (string, optional) – tensor attributes which will be in contiguous memory layout.
- Returns
deepsnap.graph.Graph object with the specified tensor attributes in contiguous memory layout.
- Return type
deepsnap.graph.Graph
-
get_num_dims
(key, as_label=False) → int[source]¶ Returns the number of dimensions for one graph/node/edge property.
- Parameters
as_label – if as_label, treat the tensor as labels (
-
property
keys
¶ Returns all names of the graph attributes.
- Returns
List of deepsnap.graph.Graph attribute names.
- Return type
list
-
static
negative_sampling
(edge_index, num_nodes=None, num_neg_samples=None)[source]¶ Samples random negative edges of a graph given by edge_index.
- Parameters
edge_index (torch.LongTensor) – The edge indices.
num_nodes (int, optional) – The number of nodes, i.e. max_val + 1 of edge_index. (default: None)
num_neg_samples (int, optional) – The number of negative samples to return. If set to None, will try to return a negative edge for every positive edge. (default: None)
force_undirected (bool, optional) – If set to True, sampled negative edges will be undirected. (default: False)
- Return type
torch.LongTensor
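A pure-Python sketch of what negative sampling does, following the documented defaults (num_nodes taken as max_val + 1, one negative per positive edge when num_neg_samples is None). It samples directed pairs uniformly by rejection and excludes self-loops; the helper name, the self-loop rule, the seed argument and the cap on the sample count are assumptions, force_undirected is not modeled, and the real method works on torch.LongTensor inputs.

```python
import random
from typing import List, Optional, Set, Tuple

Edge = Tuple[int, int]


def sample_negative_edges(edges: List[Edge], num_nodes: Optional[int] = None,
                          num_neg_samples: Optional[int] = None,
                          seed: int = 0) -> List[Edge]:
    """Rejection-sample node pairs that are not positive edges."""
    if num_nodes is None:
        num_nodes = max(max(u, v) for u, v in edges) + 1  # max_val + 1
    if num_neg_samples is None:
        num_neg_samples = len(edges)  # aim for one negative per positive edge
    positives: Set[Edge] = set(edges)
    # Number of non-self-loop pairs that are not positive edges.
    available = num_nodes * (num_nodes - 1) - sum(1 for u, v in positives if u != v)
    target = min(num_neg_samples, available)
    rng = random.Random(seed)
    negatives: Set[Edge] = set()
    while len(negatives) < target:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v and (u, v) not in positives:
            negatives.add((u, v))
    return sorted(negatives)


positive = [(0, 1), (1, 2), (2, 3)]
negative = sample_negative_edges(positive)
print(len(negative))  # 3
```

Rejection sampling is simple and fast for sparse graphs; on dense graphs most draws are rejected, which is why the sketch caps the target at the number of pairs that are actually available.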
-
property
num_edge_features
¶ Returns the edge feature dimension in the graph.
- Returns
Edge feature dimension, or 0 if there is no edge_feature.
- Return type
int
-
property
num_edge_labels
¶ Returns the number of edge labels in the graph.
- Returns
Number of edge labels, or 0 if there is no edge_label.
- Return type
int
-
property
num_graph_features
¶ Returns the graph feature dimension in the graph.
- Returns
Graph feature dimension, or 0 if there is no graph_feature.
- Return type
int
-
property
num_graph_labels
¶ Returns the number of graph labels in the graph.
- Returns
Number of graph labels, or 0 if there is no graph_label.
- Return type
int
-
property
num_node_features
¶ Returns the node feature dimension in the graph.
- Returns
Node feature dimension, or 0 if there is no node_feature.
- Return type
int
-
property
num_node_labels
¶ Returns the number of node labels in the graph.
- Returns
Number of node labels, or 0 if there is no node_label.
- Return type
int
-
property
num_nodes
¶ Returns the number of nodes in the graph.
- Returns
Number of nodes in the graph.
- Return type
int
-
static
pyg_to_graph
(data, verbose: bool = False, fixed_split: bool = False)[source]¶ Converts PyTorch Geometric data to a Graph object.
- Parameters
data (torch_geometric.data) – PyTorch Geometric data.
verbose – whether to print verbose warnings.
fixed_split – whether to load the fixed data split from the PyG dataset.
- Returns
A new DeepSNAP deepsnap.graph.Graph object.
- Return type
deepsnap.graph.Graph
-
static
raw_to_graph
(data)[source]¶ Placeholder for user-written methods that import other data formats; such methods should make sure all attributes of G are scalars or torch.tensor objects.
Not implemented.
-
resample_disjoint
(message_ratio)[source]¶ Resample disjoint edge split of message passing and objective links.
Note that if apply_transform (on the message passing graph) was used before this resampling, it needs to be re-applied after resampling to update the edges that were previously objective edges.
-
split
(task: str = 'node', split_ratio: List[float] = None)[source]¶ Splits the current graph object into a list of graph objects.
- Parameters
task (string) – one of node, edge or link_pred.
split_ratio (array_like) – array_like ratios [train_ratio, validation_ratio, test_ratio].
- Returns
A Python list of deepsnap.graph.Graph objects with the specified task.
- Return type
list
-
split_link_pred
(split_ratio: Union[float, List[float]])[source]¶ Splits the graph into len(split_ratio) graphs for link prediction. Internally this splits the edge indices, and the model only computes the loss for the embeddings of nodes in each split graph. This is only used for the transductive link prediction task, in which a different part of the graph is observed in train/val/test. Note: this function is called twice if, during training, the training graph is further split so that message edges and objective edges are different.
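The edge-splitting idea, including the second call on the training edges, can be sketched in plain Python. split_link_edges is a hypothetical helper; treating a float split_ratio r as [r, 1 - r] and rounding at cumulative boundaries are assumptions, not confirmed deepsnap behavior. Every split keeps the full node set; only the supervised edges differ.

```python
import random
from typing import List, Sequence, Tuple, Union

Edge = Tuple[int, int]


def split_link_edges(edges: Sequence[Edge],
                     split_ratio: Union[float, Sequence[float]],
                     seed: int = 0) -> List[List[Edge]]:
    """Hypothetical sketch: partition edges into len(split_ratio) disjoint groups."""
    if isinstance(split_ratio, float):
        split_ratio = [split_ratio, 1.0 - split_ratio]  # assumed meaning of a float
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    groups: List[List[Edge]] = []
    start, cumulative = 0, 0.0
    for i, ratio in enumerate(split_ratio):
        cumulative += ratio
        # Cut at the rounded cumulative boundary; the last group takes the rest.
        if i == len(split_ratio) - 1:
            end = len(shuffled)
        else:
            end = round(cumulative * len(shuffled))
        groups.append(shuffled[start:end])
        start = end
    return groups


edges = [(i, (i + 1) % 20) for i in range(20)]  # a 20-node cycle
train, val, test = split_link_edges(edges, [0.8, 0.1, 0.1])  # first call
message, objective = split_link_edges(train, 0.75)  # second call, on train edges only
print(len(train), len(val), len(test), len(message), len(objective))  # 16 2 2 12 4
```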
-
to
(device, *keys)[source]¶ Performs tensor dtype and/or device conversion to all attributes *keys. If *keys is not given, the conversion is applied to all present attributes.
- Parameters
device – Specified device name.
*keys (string, optional) – Tensor attributes which will be transferred to the specified device.
deepsnap.hetero_gnn module
-
class
deepsnap.hetero_gnn.
HeteroConv
(convs, aggr='add', parallelize=False)[source]¶ Bases:
torch.nn.modules.module.Module
A “wrapper” layer designed for heterogeneous graph layers. It takes a heterogeneous graph layer, such as deepsnap.hetero_gnn.HeteroSAGEConv, at the initialization stage.
-
aggregate
(xs)[source]¶ The aggregation for each node type. Currently supports concat, add, mean, max and mul.
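The five aggregation options can be sketched on plain lists. Here xs holds one embedding vector per message type for a single node; this layout and the function aggregate_sketch are illustrative assumptions rather than the layer's actual tensor code.

```python
import math
from typing import List

Vector = List[float]


def aggregate_sketch(xs: List[Vector], aggr: str) -> Vector:
    """Combine the vectors a node receives from different message types."""
    if aggr == "concat":
        # Concatenate along the feature dimension.
        return [value for x in xs for value in x]
    columns = list(zip(*xs))  # one tuple per feature dimension
    if aggr == "add":
        return [sum(c) for c in columns]
    if aggr == "mean":
        return [sum(c) / len(c) for c in columns]
    if aggr == "max":
        return [max(c) for c in columns]
    if aggr == "mul":
        return [math.prod(c) for c in columns]
    raise ValueError(f"unsupported aggregation: {aggr}")


xs = [[1.0, 2.0], [3.0, 4.0]]
for mode in ("concat", "add", "mean", "max", "mul"):
    print(mode, aggregate_sketch(xs, mode))
```

Note the design difference: concat makes the output width grow with the number of message types, while the other four keep the input width.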
-
forward
(node_features, edge_indices, edge_features=None)[source]¶ The forward function for HeteroConv.
- Parameters
node_features (dict) – A dictionary in which each key is a node type and the corresponding value is a node feature tensor.
edge_indices (dict) – A dictionary in which each key is a message type and the corresponding value is an edge index tensor.
edge_features (dict) – A dictionary in which each key is an edge type and the corresponding value is an edge feature tensor. Default is None.
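How these dictionaries fit together can be sketched with toy data. In this hypothetical illustration, each message type (src_node_type, edge_type, dst_node_type) is processed by a layer, the layer output is routed to the destination node type, and outputs that target the same node type are summed (matching the default aggr='add'). toy_conv and hetero_forward are stand-ins invented for the example: a single toy layer replaces the per-message-type layers, scalar node features replace tensors, and edge_features are omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MessageType = Tuple[str, str, str]  # (src_node_type, edge_type, dst_node_type)
Edges = List[Tuple[int, int]]  # (source index, destination index) pairs


def toy_conv(x_src: List[float], num_dst: int, edges: Edges) -> List[float]:
    """Hypothetical layer: each destination node sums its source neighbours."""
    out = [0.0] * num_dst
    for s, d in edges:
        out[d] += x_src[s]
    return out


def hetero_forward(node_features: Dict[str, List[float]],
                   edge_indices: Dict[MessageType, Edges]) -> Dict[str, List[float]]:
    per_dst: Dict[str, List[List[float]]] = defaultdict(list)
    for (src, _edge_type, dst), edges in edge_indices.items():
        # One layer call per message type; its output belongs to the destination type.
        per_dst[dst].append(toy_conv(node_features[src], len(node_features[dst]), edges))
    # "add" aggregation over all message types that target the same node type.
    return {dst: [sum(vals) for vals in zip(*outs)] for dst, outs in per_dst.items()}


x = {"user": [1.0, 2.0], "item": [10.0, 20.0, 30.0]}
edge_indices = {
    ("user", "buys", "item"): [(0, 0), (1, 0), (1, 2)],
    ("item", "similar", "item"): [(1, 0)],
    ("item", "bought_by", "user"): [(2, 1)],
}
print(hetero_forward(x, edge_indices))
# {'item': [23.0, 0.0, 2.0], 'user': [0.0, 30.0]}
```

In this sketch, a node type that is not the destination of any message type gets no output at all.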
-
-
class
deepsnap.hetero_gnn.
HeteroSAGEConv
(in_channels_neigh, out_channels, in_channels_self=None)[source]¶ Bases:
torch_geometric.nn.conv.message_passing.MessagePassing
The heterogeneity-compatible GraphSAGE operator is derived from the “Inductive Representation Learning on Large Graphs”, “Modeling polypharmacy side effects with graph convolutional networks” and “Modeling Relational Data with Graph Convolutional Networks” papers.
- Parameters
-
deepsnap.hetero_gnn.
forward_op
(x, func, **kwargs)[source]¶ A helper function for the heterogeneous operations. Given a dictionary input, it returns a dictionary with the same keys, in which each value is the result of applying func (with the specified parameters) to the corresponding input value.
- Parameters
x (dict) – A dictionary whose values will each be passed to func.
func (function) – The function that will be applied to each value in the dictionary.
**kwargs – Parameters that will be passed into func.
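Since forward_op only maps a function over dictionary values, its behavior can be restated as a dictionary comprehension. forward_op_sketch below is an illustrative re-implementation under that reading, not the library source; in practice func would typically be a PyTorch operation such as an activation function.

```python
from typing import Any, Callable, Dict


def forward_op_sketch(x: Dict[Any, Any], func: Callable[..., Any], **kwargs: Any) -> Dict[Any, Any]:
    """Same keys as x; every value is replaced by func(value, **kwargs)."""
    return {key: func(value, **kwargs) for key, value in x.items()}


scaled = forward_op_sketch({"user": [1.0, 2.0], "item": [3.0]},
                           lambda xs, factor: [factor * v for v in xs], factor=10.0)
print(scaled)  # {'user': [10.0, 20.0], 'item': [30.0]}
```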
-
deepsnap.hetero_gnn.
loss_op
(pred, y, label_index, loss_func, **kwargs)[source]¶ A helper function for the heterogeneous loss operations.
- Parameters
pred (dict) – A dictionary of predictions.
y (dict) – A dictionary of labels.
label_index (dict) – A dictionary of indices that the loss will be computed on. Each value should be a PyTorch long tensor.
loss_func (function) – The loss function.
**kwargs – Parameters that will be passed into loss_func.
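A plain-Python sketch of the per-type loss pattern, assuming the per-type losses are summed into one scalar (an assumption; the reduction is not stated above). Lists stand in for tensors, and indexing with label_index selects the labeled entries of each type. loss_op_sketch and mse are hypothetical helpers.

```python
from typing import Any, Callable, Dict, List


def loss_op_sketch(pred: Dict[str, List[float]], y: Dict[str, List[float]],
                   label_index: Dict[str, List[int]],
                   loss_func: Callable[..., float], **kwargs: Any) -> float:
    """Sum loss_func over every type, using only the labelled positions."""
    total = 0.0
    for key, idx in label_index.items():
        selected_pred = [pred[key][i] for i in idx]
        selected_true = [y[key][i] for i in idx]
        total += loss_func(selected_pred, selected_true, **kwargs)
    return total


def mse(p: List[float], t: List[float]) -> float:
    """Mean squared error over two equal-length lists."""
    return sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)


pred = {"user": [0.5, 1.0, 0.0], "item": [2.0, 0.0]}
y = {"user": [1.0, 1.0, 1.0], "item": [0.0, 0.0]}
label_index = {"user": [0, 1], "item": [0]}
print(loss_op_sketch(pred, y, label_index, mse))  # 4.125
```

The unlabeled third user entry does not contribute, which is the point of passing label_index.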
deepsnap.hetero_graph module
-
class
deepsnap.hetero_graph.
HeteroGraph
(G=None, **kwargs)[source]¶ Bases:
deepsnap.graph.Graph
A plain Python object modeling a heterogeneous graph with various attributes (a string node type is required for the HeteroGraph).
- Parameters
G (networkx.classes.graph) – The NetworkX graph object which contains features and labels for each node type or edge type.
**kwargs – keyword argument list with keys such as "node_feature", "node_label" and corresponding attributes.
-
property
edge_types
¶ Returns the list of edge types in the heterogeneous graph.
-
get_num_dims
(key, obj_type, as_label: bool = False) → int[source]¶ Returns the number of dimensions for one graph/node/edge property for specified types.
-
get_num_edge_features
(edge_type: str) → int[source]¶ Returns the edge feature dimension of the specified edge type.
- Returns
The edge feature dimension for the specified edge type.
- Return type
int
-
get_num_edge_labels
(edge_type: str) → int[source]¶ Returns the number of edge labels.
- Returns
Number of edge labels for the specified edge type.
- Return type
int
-
get_num_edges
(message_type: Union[tuple, List[tuple]] = None) → int[source]¶ Returns the number of edges for an edge type or a list of edge types.
-
get_num_node_features
(node_type: str) → int[source]¶ Returns the node feature dimension of the specified node type.
- Returns
The node feature dimension for the specified node type.
- Return type
int
-
get_num_node_labels
(node_type: str) → int[source]¶ Returns the number of node labels.
- Returns
Number of node labels for the specified node type.
- Return type
int
-
get_num_nodes
(node_type: Union[str, List[str]] = None)[source]¶ Returns the number of nodes for a node type or a list of node types.
-
property
message_types
¶ Return the list of message types (src_node_type, edge_type, end_node_type) in the heterogeneous graph.
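Because each message type names its source node type, edge type and end node type, the node types and edge types can be read off a list of message types. The sketch below (a hypothetical helper working on plain tuples, not HeteroGraph's implementation) shows that relationship.

```python
from typing import List, Tuple

MessageType = Tuple[str, str, str]  # (src_node_type, edge_type, end_node_type)


def types_from_messages(message_types: List[MessageType]):
    """Collect the node types and edge types mentioned by the message types."""
    node_types = sorted({t for src, _, dst in message_types for t in (src, dst)})
    edge_types = sorted({edge for _, edge, _ in message_types})
    return node_types, edge_types


messages = [("user", "buys", "item"), ("item", "bought_by", "user"), ("user", "follows", "user")]
print(types_from_messages(messages))
# (['item', 'user'], ['bought_by', 'buys', 'follows'])
```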
-
static
negative_sampling
(edge_index: Dict[str, torch.tensor], num_nodes=None, num_neg_samples: Dict[str, int] = None)[source]¶ Samples random negative edges of a heterogeneous graph given by edge_index.
- Parameters
edge_index (LongTensor) – The edge indices.
num_nodes (int, optional) – The number of nodes, i.e. max_val + 1 of edge_index. (default: None)
num_neg_samples (int, optional) – The number of negative samples to return. If set to None, will try to return a negative edge for every positive edge. (default: None)
force_undirected (bool, optional) – If set to True, sampled negative edges will be undirected. (default: False)
- Return type
torch.LongTensor
-
property
node_types
¶ Returns the list of node types in the heterogeneous graph.
-
split
(task: str = 'node', split_types: Union[str, List[str], tuple, List[tuple]] = None, split_ratio: List[float] = None, edge_split_mode: str = 'exact')[source]¶ Splits the current graph object into a list of graph objects.
- Parameters
task (string) – One of node, edge or link_pred.
split_types (list) – Types to split on. Default is None, which splits all the types for the specified task.
split_ratio (array_like) – Array_like ratios [train_ratio, validation_ratio, test_ratio].
- Returns
A Python list of Graph objects with the specified task.
- Return type
list
-
split_link_pred
(split_types: List[tuple], split_ratio: Union[float, List[float]], edge_split_mode: str = 'exact')[source]¶ Splits the graph into len(split_ratio) graphs for link prediction. Internally this splits the edge indices, and the model only computes the loss for the embeddings of nodes in each split graph. This is only used for the transductive link prediction task, in which a different part of the graph is observed in train/val/test. Note: this function is called twice if, during training, the training graph is further split so that message edges and objective edges are different.