deepsnap.dataset

DeepSNAP GraphDataset

class GraphDataset(graphs: Optional[List[deepsnap.graph.Graph]] = None, task: str = 'node', custom_split_graphs: Optional[List[deepsnap.graph.Graph]] = None, edge_negative_sampling_ratio: float = 1, edge_message_ratio: float = 0.8, edge_train_mode: str = 'all', edge_split_mode: str = 'exact', minimum_node_per_graph: int = 5, generator=None, resample_negatives: bool = False, resample_disjoint: bool = False, resample_disjoint_period: int = 1, negative_label_val: Optional[int] = None, netlib=None)[source]

Bases: object

A plain Python object modeling a list of deepsnap.graph.Graph objects with various (optional) attributes.

Parameters
  • graphs (list, optional) – A list of deepsnap.graph.Graph.

  • task (str) – The task that this GraphDataset is used for. One of node, edge, link_pred or graph.

  • custom_split_graphs (list) – A list of 2 (train and val) or 3 (train, val and test) lists of split graphs, used for a custom split in the graph task.

  • edge_negative_sampling_ratio (float) – The ratio of the number of negative samples to the number of positive edges. Default value is 1.

  • edge_message_ratio (float) – The number of message passing edges compared to that of training supervision edges. Default value is 0.8.

  • edge_train_mode (str) – Use all or disjoint. In all mode, the training supervision edges are the same as the message passing edges. In disjoint mode, the training supervision objectives are different from the message passing edges. For the difference between these two modes, see the DeepSNAP link prediction Colab.

  • edge_split_mode (str) – Use exact or approximate. This mode is designed for heterogeneous graphs. If the mode is exact, the heterogeneous graph is split according to both the ratio and the split type. If the mode is approximate, the heterogeneous graph is split regardless of the split type.

  • minimum_node_per_graph (int) – If the number of nodes in a graph is smaller than minimum_node_per_graph, that graph will be filtered out.

  • generator (Generator) – A generator for creating the dataset on the fly. The on-the-fly generator is used if self.graphs is empty or None and the provided generator (Generator) overrides the generate() method.

  • resample_negatives (bool) – Whether to resample negative edges in each iteration of the link_pred task. The user needs to set this variable when using the tensor backend with a custom split.

  • resample_disjoint (bool) – Whether to resample disjoint training edges in the disjoint link_pred task.

  • resample_disjoint_period (int) – The number of iterations after which the training edges in the disjoint mode are resampled.

  • negative_label_val (int, optional) – The label value of negative edges generated in the link_pred task. The user needs to set this variable when using the tensor backend with a custom split.

  • netlib (types.ModuleType, optional) – The graph backend module. DeepSNAP currently supports NetworkX and SnapX as the graph backend (SnapX only for undirected homogeneous graphs). The default graph backend is NetworkX.
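
The edge-related parameters above can be illustrated with a small plain-Python sketch. It is not DeepSNAP's implementation: the helper names split_training_edges and sample_negative_edges are hypothetical, edges are plain tuples, and it assumes that edge_message_ratio is the fraction of training edges kept for message passing in disjoint mode.

```python
import random


def split_training_edges(edges, edge_train_mode="all", edge_message_ratio=0.8, seed=0):
    """Return (message_passing_edges, supervision_edges) for one training graph."""
    if edge_train_mode == "all":
        # Every training edge serves both for message passing and for supervision.
        return list(edges), list(edges)
    if edge_train_mode == "disjoint":
        # Assumption: the ratio is the share of edges kept for message passing.
        shuffled = list(edges)
        random.Random(seed).shuffle(shuffled)
        cut = int(edge_message_ratio * len(shuffled))
        return shuffled[:cut], shuffled[cut:]
    raise ValueError(f"unknown edge_train_mode: {edge_train_mode!r}")


def sample_negative_edges(num_nodes, positive_edges, edge_negative_sampling_ratio=1.0, seed=0):
    """Sample undirected node pairs that are not positive edges."""
    positives = {frozenset(edge) for edge in positive_edges}
    target = int(edge_negative_sampling_ratio * len(positive_edges))
    available = num_nodes * (num_nodes - 1) // 2 - len(positives)
    if target > available:
        raise ValueError("not enough non-edges to sample from")
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < target:
        pair = frozenset(rng.sample(range(num_nodes), 2))
        if pair not in positives:
            negatives.add(pair)
    return sorted(tuple(sorted(pair)) for pair in negatives)


edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
message_edges, supervision_edges = split_training_edges(edges, "disjoint", 0.8)
negative_edges = sample_negative_edges(6, edges, edge_negative_sampling_ratio=1.0)
print(len(message_edges), len(supervision_edges), len(negative_edges))
```

With a ratio of 1, the sketch draws as many negative pairs as there are positive edges. In all mode the two returned edge lists are identical.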

apply_transform(transform, update_tensor: bool = True, update_graph: bool = False, deep_copy: bool = False, **kwargs)[source]

Applies a transformation to all graph objects. All graphs in self.graphs are passed through the specified transform() function, and a new GraphDataset object is returned.

Parameters
  • transform (callable) – User-defined transformation function.

  • update_tensor (bool) – If the graphs have changed, use the graph to update the stored tensor attributes.

  • update_graph (bool) – If the tensor attributes have changed, use the attributes to update the graphs.

  • deep_copy (bool) – If True, all graphs will be deep-copied and then fed into the transform() function. In this case, the transform() function may also need to return a Graph object.

  • **kwargs (optional) – Parameters used in the transform() function for each Graph object.

Returns

A new GraphDataset object with transformed graphs.

Return type

GraphDataset
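
The map-like behaviour described above can be sketched without DeepSNAP. In the following stand-in, TinyDataset and add_degree are hypothetical names, graphs are plain dictionaries, and the handling of deep_copy and **kwargs is an assumption based on the parameter descriptions rather than the library's actual code.

```python
import copy


class TinyDataset:
    """Hypothetical stand-in for a dataset that holds a list of graph-like dicts."""

    def __init__(self, graphs):
        self.graphs = graphs

    def apply_transform(self, transform, deep_copy=False, **kwargs):
        new_graphs = []
        for graph in self.graphs:
            if deep_copy:
                # Work on a copy so the original dataset stays untouched.
                graph = copy.deepcopy(graph)
            result = transform(graph, **kwargs)
            # Accept both in-place transforms (returning None) and returning ones.
            new_graphs.append(graph if result is None else result)
        return TinyDataset(new_graphs)


def add_degree(graph, scale=1):
    """Attach a per-node degree list, multiplied by a hypothetical scale factor."""
    degree = [0] * graph["num_nodes"]
    for u, v in graph["edges"]:
        degree[u] += scale
        degree[v] += scale
    graph["degree"] = degree
    return graph


dataset = TinyDataset([{"num_nodes": 3, "edges": [(0, 1), (1, 2)]}])
transformed = dataset.apply_transform(add_degree, deep_copy=True, scale=2)
print(transformed.graphs[0]["degree"])
```

Because deep_copy is True in this usage, the original dictionaries in dataset.graphs do not gain a degree entry.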

filter(filter_fn, deep_copy: bool = False, **kwargs)[source]

Filters the graphs in the dataset, discarding a graph G when filter_fn(G) is False. apply_transform() is an analog of the Python map function, while filter() is an analog of the Python filter function.

Parameters
  • filter_fn – User-defined filter function that returns True to keep a graph object in the dataset or False to discard it.

  • deep_copy – If True, all graphs will be deepcopied and then fed into the filter() function.

  • **kwargs – Parameters used in the filter() function.

Returns

A new GraphDataset object with graphs filtered.

Return type

GraphDataset
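
As a rough illustration of the filter semantics, the sketch below keeps only the graphs for which a predicate returns True. The names filter_graphs and has_enough_nodes are hypothetical, graphs are plain dictionaries, and forwarding **kwargs to the predicate is an assumption.

```python
def has_enough_nodes(graph, min_nodes=5):
    """Predicate: True keeps the graph, False discards it."""
    return graph["num_nodes"] >= min_nodes


def filter_graphs(graphs, filter_fn, **kwargs):
    """Return a new list with the graphs for which filter_fn is True."""
    return [graph for graph in graphs if filter_fn(graph, **kwargs)]


graphs = [
    {"name": "triangle", "num_nodes": 3},
    {"name": "ring", "num_nodes": 6},
    {"name": "grid", "num_nodes": 9},
]
kept = filter_graphs(graphs, has_enough_nodes, min_nodes=5)
print([graph["name"] for graph in kept])
```

The input list is left unchanged, which mirrors the documented behaviour of returning a new dataset instead of modifying the existing one.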

num_dims_dict() → Dict[str, int][source]

Dimensions of all fields.

Returns

Dimensions of all fields. For example, if the graphs have two attributes, node_feature and graph_label, the returned dictionary will have two keys, node_feature and graph_label, whose values are the node feature dimension and the graph label dimension.

Return type

dict

property num_edge_features

Returns the edge feature dimension.

Returns

The edge feature dimension for the graphs in the dataset.

Return type

int

property num_edge_labels

Returns the number of edge labels.

Returns

The number of edge labels for the graphs in the dataset.

Return type

int

property num_edges

Returns the number of edges for the graphs in the dataset.

Returns

A list of the numbers of edges for the graphs in the dataset.

Return type

list

property num_graph_features

Returns the graph feature dimension.

Returns

The graph feature dimension for the graphs in the dataset.

Return type

int

property num_graph_labels

Returns the number of graph labels.

Returns

The number of graph labels for the graphs in the dataset.

Return type

int

property num_labels

A general wrapper that returns the number of labels depending on the task.

Returns

The number of labels, depending on the task.

Return type

int

property num_node_features

Returns the node feature dimension.

Returns

The node feature dimension for the graphs in the dataset.

Return type

int

property num_node_labels

Returns the number of node labels.

Returns

The number of node labels for the graphs in the dataset.

Return type

int

property num_nodes

Returns the number of nodes for the graphs in the dataset.

Returns

A list of the numbers of nodes for the graphs in the dataset.

Return type

list

static pyg_to_graphs(dataset, verbose: bool = False, fixed_split: bool = False, tensor_backend: bool = False, netlib=None) → List[deepsnap.graph.Graph][source]

Transforms a torch_geometric.data.Dataset object to a list of deepsnap.graph.Graph objects.

Parameters
  • dataset (torch_geometric.data.Dataset) – A torch_geometric.data.Dataset object that will be transformed to a list of deepsnap.graph.Graph objects.

  • verbose (bool) – Whether to print information such as warnings.

  • fixed_split (bool) – Whether to load the fixed data split from the original PyTorch Geometric dataset.

  • tensor_backend (bool) – If True, pure tensors are used for the graphs.

  • netlib (types.ModuleType, optional) – The graph backend module. DeepSNAP currently supports NetworkX and SnapX as the graph backend (SnapX only for undirected homogeneous graphs). The default graph backend is NetworkX.

Returns

A list of deepsnap.graph.Graph objects.

Return type

list

resample_disjoint()[source]

Resample splits of the message passing and supervision edges in the disjoint mode.

Note

If apply_transform() (on the message passing graph) was used before this resampling, it needs to be re-applied after resampling, to update some of the (supervision) edges that were in the objectives.
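
The interplay between periodic resampling and re-applying a transform can be sketched as follows. This is a plain-Python illustration and not the library's logic: the class DisjointResampler, its step() method and the add_reverse_edges transform are hypothetical, and the exact iteration at which DeepSNAP triggers a resample is an assumption.

```python
import random


class DisjointResampler:
    """Hypothetical holder of a disjoint message passing / supervision edge split."""

    def __init__(self, edges, message_ratio=0.8, period=1, transform=None, seed=0):
        self.edges = list(edges)
        self.message_ratio = message_ratio
        self.period = period
        self.transform = transform
        self.rng = random.Random(seed)
        self.iteration = 0
        self.num_resamples = 0
        self.resample()

    def resample(self):
        # Draw a fresh disjoint split of the training edges.
        shuffled = list(self.edges)
        self.rng.shuffle(shuffled)
        cut = int(self.message_ratio * len(shuffled))
        self.message_edges = shuffled[:cut]
        self.supervision_edges = shuffled[cut:]
        # Re-apply the transform, since the message passing edges have changed.
        if self.transform is not None:
            self.message_edges = self.transform(self.message_edges)
        self.num_resamples += 1

    def step(self):
        # Assumption: resample once every `period` iterations.
        self.iteration += 1
        if self.iteration % self.period == 0:
            self.resample()


def add_reverse_edges(edge_list):
    """Example transform on the message passing edges."""
    return edge_list + [(v, u) for u, v in edge_list]


sampler = DisjointResampler(
    [(i, i + 1) for i in range(10)], period=3, transform=add_reverse_edges
)
for _ in range(7):
    sampler.step()
print(sampler.num_resamples, len(sampler.message_edges), len(sampler.supervision_edges))
```

Calling the transform inside resample() is one way to make sure a transformed message passing graph never goes stale after the split changes.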

split(transductive: bool = True, split_ratio: Optional[List[float]] = None, split_types: Optional[Union[str, List[str]]] = None, shuffle: bool = True) → List[deepsnap.graph.Graph][source]

Split the dataset into train, validation (and test) sets.

Parameters
  • transductive (bool) – Whether the learning is transductive (True) or inductive (False). An inductive split is always used for the graph-level task, that is, when self.task equals graph.

  • split_ratio (list) – A list of ratios such as [train_ratio, validation_ratio, test_ratio].

  • split_types (str or list) – The types to split on. Default is None.

  • shuffle (bool) – Whether to shuffle data for the splitting.

Returns

A list of 3 (2) deepsnap.dataset.GraphDataset objects corresponding to the train, validation (and test) sets.

Return type

list
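
For the inductive (graph-level) case, splitting by ratio amounts to partitioning the list of graphs. The sketch below shows one way to do that in plain Python. The function split_graphs is hypothetical, the fallback ratio [0.8, 0.1, 0.1] is an assumption made for the example, and the transductive case (splitting nodes or edges inside a graph) is not covered.

```python
import random


def split_graphs(graphs, split_ratio=None, shuffle=True, seed=0):
    """Partition a list of graphs into 2 or 3 consecutive chunks by ratio."""
    if split_ratio is None:
        split_ratio = [0.8, 0.1, 0.1]  # assumed fallback for this sketch
    if len(split_ratio) not in (2, 3):
        raise ValueError("split_ratio must contain 2 or 3 ratios")
    if abs(sum(split_ratio) - 1.0) > 1e-6:
        raise ValueError("split_ratio must sum to 1")
    items = list(graphs)
    if shuffle:
        random.Random(seed).shuffle(items)
    splits, start = [], 0
    for index, ratio in enumerate(split_ratio):
        if index == len(split_ratio) - 1:
            end = len(items)  # the last split takes the remainder
        else:
            end = start + int(ratio * len(items))
        splits.append(items[start:end])
        start = end
    return splits


graph_ids = list(range(10))  # stand-ins for graph objects
train, val, test = split_graphs(graph_ids, [0.8, 0.1, 0.1])
print(len(train), len(val), len(test))
```

Letting the last split absorb the remainder guarantees that every graph lands in exactly one split even when the ratios do not divide the dataset size evenly.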

to(device)[source]

Transfers the graphs in the dataset to the specified device.

Parameters

device (str) – Specified device name, such as cpu or cuda.

DeepSNAP Dataset Generator

class Generator(sizes, size_prob=None, dataset_len=0)[source]

Bases: object

Abstract class for the on-the-fly generator used in the dataset. It generates graphs on the fly, which are then fed into the model.

generate()[source]

Override in a subclass. Generates and returns a deepsnap.graph.Graph object.

Returns

A DeepSNAP graph object.

Return type

deepsnap.graph.Graph
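
The subclassing pattern can be sketched as below. To keep the example self-contained, Generator here is a local stand-in that only mirrors the documented constructor signature, and PathGraphGenerator is a hypothetical subclass. The sketch assumes that sizes lists candidate graph sizes and size_prob gives their sampling probabilities. A real subclass would return a deepsnap.graph.Graph instead of a dictionary.

```python
import random


class Generator:
    """Local stand-in mirroring the documented signature (not the DeepSNAP class)."""

    def __init__(self, sizes, size_prob=None, dataset_len=0):
        self.sizes = list(sizes)
        self.size_prob = size_prob  # None means uniform in this sketch
        self.dataset_len = dataset_len

    def generate(self):
        raise NotImplementedError("override generate() in a subclass")


class PathGraphGenerator(Generator):
    """Hypothetical generator that produces path graphs of a sampled size."""

    def __init__(self, sizes, size_prob=None, dataset_len=0, seed=0):
        super().__init__(sizes, size_prob, dataset_len)
        self.rng = random.Random(seed)

    def generate(self):
        # Sample a size, then build a path 0 - 1 - ... - (n - 1).
        num_nodes = self.rng.choices(self.sizes, weights=self.size_prob, k=1)[0]
        edges = [(i, i + 1) for i in range(num_nodes - 1)]
        return {"num_nodes": num_nodes, "edges": edges}


generator = PathGraphGenerator(sizes=[5, 8], size_prob=[0.5, 0.5])
graph = generator.generate()
print(graph["num_nodes"], len(graph["edges"]))
```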

DeepSNAP Dataset EnsembleGenerator

class EnsembleGenerator(generators, gen_prob=None, dataset_len=0)[source]

Bases: deepsnap.dataset.Generator

generate(**kwargs)[source]

Generate a list of graphs.

Returns

A list of generated deepsnap.graph.Graph objects.

Return type

list
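
One plausible reading of an ensemble of generators is a weighted choice among sub-generators. The sketch below illustrates only that idea: EnsembleSketch and ConstantGenerator are hypothetical names, the meaning of gen_prob as per-generator selection probabilities is an assumption, and the sketch simply forwards whatever the chosen sub-generator returns instead of reproducing the documented return type.

```python
import random


class ConstantGenerator:
    """Hypothetical sub-generator that always yields a graph of a fixed size."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes

    def generate(self, **kwargs):
        return {"num_nodes": self.num_nodes, "edges": []}


class EnsembleSketch:
    """Picks one sub-generator per call, weighted by gen_prob (uniform if None)."""

    def __init__(self, generators, gen_prob=None, dataset_len=0, seed=0):
        self.generators = list(generators)
        self.gen_prob = gen_prob
        self.dataset_len = dataset_len
        self.rng = random.Random(seed)

    def generate(self, **kwargs):
        chosen = self.rng.choices(self.generators, weights=self.gen_prob, k=1)[0]
        return chosen.generate(**kwargs)


ensemble = EnsembleSketch(
    [ConstantGenerator(4), ConstantGenerator(7)], gen_prob=[0.0, 1.0]
)
sample = ensemble.generate()
print(sample["num_nodes"])
```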

property num_edge_labels

Returns the number of edge labels in the generated graphs.

Returns

The number of edge labels.

Return type

int

property num_edges

Returns the number of edges in each generated graph.

Returns

List of the number of edges.

Return type

list

property num_graph_labels

Returns the number of graph labels in the generated graphs.

Returns

The number of graph labels.

Return type

int

property num_node_labels

Returns the number of node labels in the generated graphs.

Returns

The number of node labels.

Return type

int

property num_nodes

Returns the number of nodes in each generated graph.

Returns

List of the number of nodes.

Return type

list