deepsnap.dataset

Contents

DeepSNAP GraphDataset
DeepSNAP Dataset Generator
DeepSNAP Dataset EnsembleGenerator

DeepSNAP GraphDataset

class GraphDataset(graphs: Optional[List[deepsnap.graph.Graph]] = None, task: str = 'node', custom_split_graphs: Optional[List[deepsnap.graph.Graph]] = None, edge_negative_sampling_ratio: float = 1, edge_message_ratio: float = 0.8, edge_train_mode: str = 'all', edge_split_mode: str = 'exact', minimum_node_per_graph: int = 5, generator=None, resample_negatives: bool = False, resample_disjoint: bool = False, resample_disjoint_period: int = 1, negative_label_val: Optional[int] = None, netlib=None)

Bases: object

A plain Python object modeling a list of deepsnap.graph.Graph objects with various (optional) attributes.

Parameters

graphs (list, optional) – A list of deepsnap.graph.Graph.
task (str) – The task this GraphDataset is used for: node, edge, link_pred, or graph.
custom_split_graphs (list) – A list of 2 (train and val) or 3 (train, val and test) lists of split graphs, used for a custom split in the graph task.
edge_negative_sampling_ratio (float) – The number of negative samples relative to the number of positive edges. Default value is 1.
edge_message_ratio (float) – The number of message passing edges relative to the number of training supervision edges. Default value is 0.8.
edge_train_mode (str) – Either all or disjoint. In all mode, the training supervision edges are the same as the message passing edges. In disjoint mode, the training supervision edges are different from the message passing edges. For the difference between these two modes, see the DeepSNAP link prediction Colab.
edge_split_mode (str) – Either exact or approximate. This mode is designed for heterogeneous graphs. In exact mode, the heterogeneous graph is split according to both the split ratio and the split types. In approximate mode, the heterogeneous graph is split according to the ratio, regardless of the split types.
minimum_node_per_graph (int) – Graphs with fewer nodes than minimum_node_per_graph are filtered out.
generator (Generator) – A generator that creates the dataset on the fly. It is used if self.graphs is empty or None and the provided Generator overrides the generate() method.
resample_negatives (bool) – Whether to resample negative edges in each iteration of the link_pred task. Users need to set this variable when using the tensor backend with a custom split.
resample_disjoint (bool) – Whether to resample the disjoint training edges in the disjoint link_pred task.
resample_disjoint_period (int) – The number of iterations after which the training edges in the disjoint mode are resampled.
negative_label_val (int, optional) – The label value of the negative edges generated in the link_pred task. Users need to set this variable when using the tensor backend with a custom split.
netlib (types.ModuleType, optional) – The graph backend module. Currently DeepSNAP supports NetworkX and SnapX (SnapX supports only undirected homogeneous graphs) as graph backends. The default graph backend is NetworkX.
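To make edge_negative_sampling_ratio and negative_label_val more concrete, here is a minimal pure-Python sketch of uniform negative edge sampling for link prediction. It is not DeepSNAP's implementation: the helper names sample_negative_edges and label_edges, the uniform sampling over undirected non-edges, the rounding of the sample count, and the convention of labeling positive edges 1 are all assumptions made for illustration.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def sample_negative_edges(
    num_nodes: int,
    pos_edges: List[Edge],
    negative_sampling_ratio: float = 1.0,
    seed: int = 0,
) -> List[Edge]:
    """Hypothetical helper: uniformly sample undirected non-edges.

    The number of negatives is round(negative_sampling_ratio * len(pos_edges));
    this rounding rule is an assumption of the sketch.
    """
    rng = random.Random(seed)
    # Normalize positives to (min, max) so (u, v) and (v, u) are the same edge.
    positives: Set[Edge] = {(min(u, v), max(u, v)) for u, v in pos_edges}
    num_neg = int(round(negative_sampling_ratio * len(pos_edges)))
    # Candidate pool: all unordered pairs without self-loops, minus positives.
    available = num_nodes * (num_nodes - 1) // 2 - len(positives)
    if num_neg > available:
        raise ValueError("not enough non-edges to sample from")
    negatives: Set[Edge] = set()
    while len(negatives) < num_neg:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        pair = (min(u, v), max(u, v))
        # Reject self-loops and existing positive edges; the set drops repeats.
        if u != v and pair not in positives:
            negatives.add(pair)
    return sorted(negatives)


def label_edges(pos_edges, neg_edges, negative_label_val=0):
    """Label positives 1 and negatives negative_label_val (assumed convention)."""
    return [(e, 1) for e in pos_edges] + [(e, negative_label_val) for e in neg_edges]
```

With 4 positive edges and a ratio of 1, the sketch draws 4 negatives; a ratio of 1.5 draws 6, which on a 5-node path graph exhausts every remaining non-edge.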

apply_transform(transform, update_tensor: bool = True, update_graph: bool = False, deep_copy: bool = False, **kwargs)

Applies a transformation to all graph objects. Each graph in self.graphs is passed to the specified transform() function, and a new GraphDataset object is returned.

Parameters

transform (callable) – User-defined transformation function.
update_tensor (bool) – If True and the graphs have changed, use the graphs to update the stored tensor attributes.
update_graph (bool) – If True and the tensor attributes have changed, use the attributes to update the graphs.
deep_copy (bool) – If True, all graphs will be deep-copied before being fed into the transform() function. In this case, the transform() function might also need to return a Graph object.
**kwargs (optional) – Keyword arguments passed to the transform() function for each Graph object.

Returns

A new GraphDataset object with transformed graphs.

Return type

GraphDataset
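The map-like behavior of apply_transform(), including the effect of deep_copy and of **kwargs, can be sketched in plain Python as follows. ToyGraph, apply_transform and scale_features here are hypothetical stand-ins rather than DeepSNAP classes, and keeping the input graph when transform() returns None is an assumption of this sketch.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyGraph:
    """Hypothetical stand-in for deepsnap.graph.Graph: just a node feature list."""
    node_feature: List[float] = field(default_factory=list)


def apply_transform(
    graphs: List[ToyGraph], transform: Callable, deep_copy: bool = False, **kwargs
) -> List[ToyGraph]:
    """Map transform over graphs, forwarding **kwargs to every call.

    With deep_copy=True each graph is copied first, so the originals are
    untouched. If transform returns None, the (possibly mutated) input graph
    is kept; this fallback is an assumption of the sketch.
    """
    result = []
    for graph in graphs:
        target = copy.deepcopy(graph) if deep_copy else graph
        out = transform(target, **kwargs)
        result.append(out if out is not None else target)
    return result


def scale_features(graph: ToyGraph, factor: float = 1.0) -> ToyGraph:
    """Example transform: multiply every node feature by factor."""
    graph.node_feature = [x * factor for x in graph.node_feature]
    return graph
```

For example, apply_transform(graphs, scale_features, deep_copy=True, factor=2.0) returns scaled copies and leaves graphs unchanged, whereas the same call without deep_copy modifies graphs in place.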

filter(filter_fn, deep_copy: bool = False, **kwargs)

Filters the graphs in the dataset, discarding a graph G when filter_fn(G) is False. apply_transform() is analogous to Python's map function, while filter() is analogous to Python's filter function.

Parameters

filter_fn – User-defined filter function that returns True (keep) or False (discard) for each graph object in the dataset.
deep_copy – If True, all graphs will be deep-copied before being fed into filter_fn.
**kwargs – Keyword arguments passed to filter_fn.

Returns

A new GraphDataset object with graphs filtered.

Return type

GraphDataset
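As a sketch of the filter() analogy, the snippet below keeps a graph only when the predicate returns True and forwards **kwargs to it. Graphs are represented by hypothetical dicts with a num_nodes key, and filter_graphs and has_min_nodes are illustrative names, not DeepSNAP API; the predicate mirrors the minimum_node_per_graph rule described for the constructor.

```python
from typing import Callable, Dict, List

# Hypothetical graph representation for this sketch: a dict with a "num_nodes" key.
Graph = Dict[str, int]


def filter_graphs(graphs: List[Graph], filter_fn: Callable[..., bool], **kwargs) -> List[Graph]:
    """Keep graph G only when filter_fn(G, **kwargs) is True, like Python's filter."""
    return [g for g in graphs if filter_fn(g, **kwargs)]


def has_min_nodes(graph: Graph, min_nodes: int = 5) -> bool:
    """Example predicate: discard graphs with fewer than min_nodes nodes."""
    return graph["num_nodes"] >= min_nodes
```

Calling filter_graphs(graphs, has_min_nodes, min_nodes=5) on graphs with 3, 5 and 8 nodes keeps only the last two, the same result as the built-in filter with an equivalent lambda.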

num_dims_dict() → Dict[str, int]

Returns the dimensions of all fields.

Returns: A dictionary of the dimensions of all fields. For example, if the graphs have two attributes, node_feature and graph_label, the returned dictionary has two keys, node_feature and graph_label, whose values are the node feature dimension and the graph label dimension.
Return type: dict
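A rough sketch of how such a dictionary could be assembled is shown below. The dimension rule is an assumption, not DeepSNAP's: a field stored as rows of per-item vectors has the row length as its dimension, and a field stored as one value per item (such as class indices) has dimension 1. Graphs are plain dicts of nested lists, and field_dim and num_dims_dict are hypothetical helpers.

```python
from typing import Any, Dict, List


def field_dim(value: Any) -> int:
    """Assumed rule: rows of vectors -> vector length; flat values -> 1."""
    if isinstance(value, list) and value and isinstance(value[0], list):
        return len(value[0])
    return 1


def num_dims_dict(graphs: List[Dict[str, Any]]) -> Dict[str, int]:
    """Collect the dimension of every field, checking that all graphs agree."""
    dims: Dict[str, int] = {}
    for graph in graphs:
        for key, value in graph.items():
            d = field_dim(value)
            # setdefault records the first dimension seen for this field.
            if dims.setdefault(key, d) != d:
                raise ValueError(f"inconsistent dimension for field {key!r}")
    return dims
```

Under these assumptions, two graphs with 3-dimensional node features and a single class-index graph label yield {'node_feature': 3, 'graph_label': 1}.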

property num_edge_features

Returns the edge feature dimension.

Returns: The edge feature dimension for the graphs in the dataset.
Return type: int

property num_edge_labels

Returns the number of edge labels.

Returns: The number of edge labels for the graphs in the dataset.
Return type: int

property num_edges

Returns the number of edges of each graph in the dataset.

Returns: A list containing the number of edges of each graph in the dataset.
Return type: list

property num_graph_features

Returns the graph feature dimension.

Returns: The graph feature dimension for the graphs in the dataset.
Return type: int

property num_graph_labels

Returns the number of graph labels.

Returns: The number of graph labels for the graphs in the dataset.
Return type: int

property num_labels

A general wrapper that returns the number of labels depending on the task.

Returns: The number of labels, depending on the task.
Return type: int
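The task-dependent dispatch of num_labels can be sketched as below. Two points are assumptions rather than confirmed DeepSNAP behavior: the number of labels is taken to be the number of distinct label values across all graphs, and both the edge and link_pred tasks are mapped to edge labels. Graphs are plain dicts, and count_labels and num_labels are hypothetical helpers.

```python
from typing import Dict, List


def count_labels(graphs: List[Dict[str, List[int]]], key: str) -> int:
    """Number of distinct values of the label field `key` across all graphs."""
    values = set()
    for graph in graphs:
        values.update(graph.get(key, []))
    return len(values)


def num_labels(graphs: List[Dict[str, List[int]]], task: str) -> int:
    """Dispatch on the task, mirroring the num_*_labels properties."""
    if task == "node":
        return count_labels(graphs, "node_label")
    if task in ("edge", "link_pred"):
        return count_labels(graphs, "edge_label")
    if task == "graph":
        return count_labels(graphs, "graph_label")
    raise ValueError(f"unknown task: {task!r}")
```

For instance, node labels {0, 1, 2} spread over two graphs give 3 for the node task, while the same graphs with graph labels 1 and 0 give 2 for the graph task.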

property num_node_features

Returns the node feature dimension.

Returns: The node feature dimension for the graphs in the dataset.
Return type: int

property num_node_labels

Returns the number of node labels.

Returns: The number of node labels for the graphs in the dataset.
Return type: int

property num_nodes

Returns the number of nodes of each graph in the dataset.

Returns: A list containing the number of nodes of each graph in the dataset.
Return type: list

static pyg_to_graphs(dataset, verbose: bool = False, fixed_split: bool = False, tensor_backend: bool = False, netlib=None) → List[deepsnap.graph.Graph]

Transforms a torch_geometric.data.Dataset object into a list of deepsnap.graph.Graph objects.

Parameters

dataset (torch_geometric.data.Dataset) – A torch_geometric.data.Dataset object that will be transformed into a list of deepsnap.graph.Graph objects.
verbose (bool) – Whether to print information such as warnings.
fixed_split (bool) – Whether to load the fixed data split from the original PyTorch Geometric dataset.
tensor_backend (bool) – If True, the graphs are represented with pure tensors.
netlib (types.ModuleType, optional) – The graph backend module. Currently DeepSNAP supports NetworkX and SnapX (SnapX supports only undirected homogeneous graphs) as graph backends. The default graph backend is NetworkX.

Returns

A list of deepsnap.graph.Graph objects.

Return type

list
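PyTorch Geometric stores connectivity as a COO edge_index with a row of source nodes and a row of target nodes, and undirected graphs usually list each edge in both directions. The sketch below shows one step a conversion like pyg_to_graphs has to perform: turning that edge_index into an edge list for a graph backend. It is not DeepSNAP's conversion code; edge_index is given as nested Python lists instead of a tensor, and edge_index_to_edge_list is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def edge_index_to_edge_list(
    edge_index: Sequence[Sequence[int]], directed: bool = False
) -> List[Tuple[int, int]]:
    """Convert a COO edge_index [[src...], [dst...]] into a list of edge tuples.

    For undirected graphs, (u, v) and (v, u) are collapsed into a single
    (min, max) pair, much as adding both directions to an undirected graph
    object would. Order of first appearance is preserved.
    """
    sources, targets = edge_index
    if len(sources) != len(targets):
        raise ValueError("edge_index rows must have equal length")
    seen = set()
    edges = []
    for u, v in zip(sources, targets):
        key = (u, v) if directed else (min(u, v), max(u, v))
        if key not in seen:
            seen.add(key)
            edges.append(key)
    return edges
```

For the edge_index [[0, 1, 1, 2], [1, 0, 2, 1]], the undirected conversion yields [(0, 1), (1, 2)], while directed=True keeps all four arcs.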

resample_disjoint()

Resamples the split between message passing and supervision edges in the disjoint mode.

Note

If apply_transform() was applied (on the message passing graph) before this resampling, it needs to be re-applied after resampling so that the edges that were previously supervision (objective) edges are also updated.
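The interaction between the disjoint mode, edge_message_ratio and resample_disjoint_period can be sketched as a training loop that reshuffles the split every period iterations. This is an illustration under stated assumptions, not DeepSNAP's implementation: edge_message_ratio is read here as the fraction of training edges kept for message passing, the split size is rounded, and disjoint_split and training_schedule are hypothetical names.

```python
import random
from typing import Iterator, List, Tuple

Edge = Tuple[int, int]


def disjoint_split(
    train_edges: List[Edge], edge_message_ratio: float, rng: random.Random
) -> Tuple[List[Edge], List[Edge]]:
    """Split training edges into disjoint message passing and supervision sets.

    Assumption: edge_message_ratio is the fraction of training edges used for
    message passing; the remaining edges become supervision edges.
    """
    shuffled = list(train_edges)
    rng.shuffle(shuffled)
    num_message = int(round(edge_message_ratio * len(shuffled)))
    return shuffled[:num_message], shuffled[num_message:]


def training_schedule(
    train_edges: List[Edge],
    edge_message_ratio: float = 0.8,
    resample_disjoint_period: int = 1,
    num_iterations: int = 4,
    seed: int = 0,
) -> Iterator[Tuple[int, List[Edge], List[Edge]]]:
    """Yield (iteration, message_edges, supervision_edges), resampling every period."""
    rng = random.Random(seed)
    message, supervision = disjoint_split(train_edges, edge_message_ratio, rng)
    for it in range(num_iterations):
        if it > 0 and it % resample_disjoint_period == 0:
            message, supervision = disjoint_split(train_edges, edge_message_ratio, rng)
            # Per the note above, a transform applied to the message passing
            # graph would have to be re-applied at this point.
        yield it, message, supervision
```

With 10 training edges, a ratio of 0.8 and a period of 2, every iteration sees 8 message passing edges and 2 disjoint supervision edges, and the split only changes at iterations 2, 4, and so on.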

split(transductive: bool = True, split_ratio: Optional[List[float]] = None, split_types: Optional[Union[str, List[str]]] = None, shuffle: bool = True) → List[deepsnap.graph.Graph]

Split the dataset into train, validation (and test) sets.

Parameters

transductive (bool) – Whether the learning is transductive (True) or inductive (False). The inductive split is always used for the graph-level task (when self.task is graph).
split_ratio (list) – A list of ratios such as [train_ratio, validation_ratio, test_ratio].
split_types (str or list) – The types to split on. Default is None.
shuffle (bool) – Whether to shuffle data for the splitting.

Returns

A list of 2 (train and validation) or 3 (train, validation and test) deepsnap.dataset.GraphDataset objects corresponding to the respective sets.

Return type

list
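How split_ratio and shuffle partition a collection can be sketched as follows; the same idea applies to a list of graphs (inductive) or to the node or edge ids inside one graph (transductive). The default ratio of [0.8, 0.1, 0.1], the use of int(ratio * n) for every part except the last, and the helper name split_by_ratio are assumptions of this sketch rather than DeepSNAP's exact rules.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def split_by_ratio(
    items: Sequence[T],
    split_ratio: Optional[List[float]] = None,
    shuffle: bool = True,
    seed: int = 0,
) -> List[List[T]]:
    """Split items into 2 or 3 consecutive parts according to split_ratio.

    Each part except the last gets int(ratio * n) items; the last part takes
    whatever remains, so no item is dropped by rounding.
    """
    if split_ratio is None:
        split_ratio = [0.8, 0.1, 0.1]  # assumed default
    if len(split_ratio) not in (2, 3) or abs(sum(split_ratio) - 1.0) > 1e-6:
        raise ValueError("split_ratio must have 2 or 3 entries summing to 1")
    order = list(items)
    if shuffle:
        random.Random(seed).shuffle(order)
    n = len(order)
    parts, start = [], 0
    for ratio in split_ratio[:-1]:
        end = start + int(ratio * n)
        parts.append(order[start:end])
        start = end
    parts.append(order[start:])
    return parts
```

Without shuffling, ten items split with the default ratio give parts of sizes 8, 1 and 1; giving the remainder to the last part keeps every item assigned to exactly one set.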

to(device)

Transfers the graphs in the dataset to the specified device.

Parameters: device (str) – The name of the target device, such as cpu or cuda.