deepsnap.dataset
DeepSNAP GraphDataset
-
class GraphDataset(graphs: Optional[List[deepsnap.graph.Graph]] = None, task: str = 'node', custom_split_graphs: Optional[List[deepsnap.graph.Graph]] = None, edge_negative_sampling_ratio: float = 1, edge_message_ratio: float = 0.8, edge_train_mode: str = 'all', edge_split_mode: str = 'exact', minimum_node_per_graph: int = 5, generator=None, resample_negatives: bool = False, resample_disjoint: bool = False, resample_disjoint_period: int = 1, negative_label_val: Optional[int] = None, netlib=None)
Bases: object
A plain Python object modeling a list of deepsnap.graph.Graph objects with various (optional) attributes.
- Parameters
graphs (list, optional) – A list of deepsnap.graph.Graph objects.
task (str) – The task that this GraphDataset is used for (node, edge, link_pred or graph).
custom_split_graphs (list) – A list of 2 (train and val) or 3 (train, val and test) lists of split graphs, used in a custom split of the graph task.
edge_negative_sampling_ratio (float) – The number of negative samples compared to that of positive edges. Default value is 1.
edge_message_ratio (float) – The number of message passing edges compared to that of training supervision edges. Default value is 0.8.
edge_train_mode (str) – Use all or disjoint. In all mode, the training supervision edges are the same as the message passing edges. In disjoint mode, the training supervision edges are different from the message passing edges. For the difference between these two modes, see the DeepSNAP link prediction Colab.
edge_split_mode (str) – Use exact or approximate. This mode is designed for the heterogeneous graph. If the mode is exact, split the heterogeneous graph according to both the ratio and the split type. If the mode is approximate, split the heterogeneous graph regardless of the split type.
minimum_node_per_graph (int) – If the number of nodes of a graph is smaller than the minimum node per graph, that graph will be filtered out.
generator (Generator) – The dataset will be generated on the fly. The on-the-fly generator is used if self.graphs is empty or None and a generator (Generator) with an overwritten generate() method is provided.
resample_negatives (bool) – Whether to resample negative edges in each iteration of the link_pred task. The user needs to set this variable when using the tensor backend with a custom split.
resample_disjoint (bool) – Whether to resample disjoint training edges in the disjoint link_pred task.
resample_disjoint_period (int) – The number of iterations after which the training edges in the disjoint mode are resampled.
negative_label_val (int, optional) – The label value of the negative edges generated in the link_pred task. The user needs to set this variable when using the tensor backend with a custom split.
netlib (types.ModuleType, optional) – The graph backend module. DeepSNAP currently supports NetworkX and SnapX as graph backends (SnapX only for undirected homogeneous graphs). The default graph backend is NetworkX.
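The interplay of edge_message_ratio and edge_negative_sampling_ratio in the disjoint mode can be illustrated with a small standalone sketch. It does not use DeepSNAP; the function name is hypothetical, and it assumes that edge_message_ratio is the fraction of training edges kept for message passing and that negatives are counted relative to the positive supervision edges. The rounding DeepSNAP applies may differ.

```python
import random


def sketch_disjoint_edge_split(edges, edge_message_ratio=0.8,
                               edge_negative_sampling_ratio=1.0, seed=0):
    """Illustrative only: split training edges into message passing and
    supervision edges, and count how many negatives would be sampled."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    # Assumption: the ratio is the fraction of training edges used for message passing.
    num_message = int(len(shuffled) * edge_message_ratio)
    message_edges = shuffled[:num_message]
    supervision_edges = shuffled[num_message:]
    # Assumption: negatives are counted relative to the positive supervision edges.
    num_negatives = int(len(supervision_edges) * edge_negative_sampling_ratio)
    return message_edges, supervision_edges, num_negatives


edges = [(i, i + 1) for i in range(10)]
message_edges, supervision_edges, num_negatives = sketch_disjoint_edge_split(edges)
print(len(message_edges), len(supervision_edges), num_negatives)
```

With the default values, 10 training edges yield 8 message passing edges, 2 supervision edges and 2 negative samples under these assumptions.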
-
apply_transform(transform, update_tensor: bool = True, update_graph: bool = False, deep_copy: bool = False, **kwargs)
Applies a transformation to all graph objects. All graphs in self.graphs will be processed by the specified transform() function, and then a new GraphDataset object will be returned.
- Parameters
transform (callable) – User-defined transformation function.
update_tensor (bool) – If the graphs have changed, use the graph to update the stored tensor attributes.
update_graph (bool) – If the tensor attributes have changed, use the attributes to update the graphs.
deep_copy (bool) – If True, all graphs will be deep-copied and then fed into the transform() function. In this case, the transform() function might also need to return a Graph object.
**kwargs (optional) – Parameters used in the transform() function for each Graph object.
- Returns
A new GraphDataset object with transformed graphs.
- Return type
GraphDataset
-
filter(filter_fn, deep_copy: bool = False, **kwargs)
Filters the graphs in the dataset, discarding a graph G when filter_fn(G) is False. apply_transform() is an analog of the Python map function, while filter() is an analog of the Python filter function.
- Parameters
- Returns
A new GraphDataset object with the graphs filtered.
- Return type
GraphDataset
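The map and filter analogy can be sketched without DeepSNAP. In the snippet below, plain dictionaries stand in for deepsnap.graph.Graph objects, and the helper names add_size_flag and has_enough_nodes are hypothetical; it only illustrates the semantics, not the library's implementation.

```python
# Plain dictionaries stand in for deepsnap.graph.Graph objects (assumption).
graphs = [
    {"name": "g0", "num_nodes": 3},
    {"name": "g1", "num_nodes": 8},
    {"name": "g2", "num_nodes": 12},
]


def add_size_flag(graph, threshold):
    """Transform: return a new graph record with one extra field."""
    return {**graph, "is_large": graph["num_nodes"] >= threshold}


def has_enough_nodes(graph, minimum):
    """Filter function: keep the graph only when this returns True."""
    return graph["num_nodes"] >= minimum


# map-like: every graph is transformed, the count stays the same.
transformed = [add_size_flag(g, threshold=10) for g in graphs]
# filter-like: graphs for which the function returns False are discarded.
kept = [g for g in graphs if has_enough_nodes(g, minimum=5)]

print([g["is_large"] for g in transformed])
print([g["name"] for g in kept])
```

Extra keyword arguments (threshold, minimum) play the role of **kwargs here: they are forwarded unchanged to the user-defined function for each graph.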
-
num_dims_dict() → Dict[str, int]
Dimensions of all fields.
- Returns
Dimensions of all fields. For example, if the graphs have two attributes, node_feature and graph_label, the returned dictionary will have two keys, node_feature and graph_label, whose values are the node feature dimension and the graph label dimension.
- Return type
Dict[str, int]
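The shape of such a dictionary can be shown with a standalone sketch. Nested lists stand in for tensors, the helper sketch_num_dims is hypothetical, and "dimension" is assumed to mean the size of the last axis of each field; the values DeepSNAP reports for a given dataset may be computed differently.

```python
from typing import Dict, List

# Stand-in for the tensor fields of a graph: each field is a list of rows.
graph_fields = {
    "node_feature": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # 2 nodes, 3 features each
    "graph_label": [[1]],                                 # a single label of width 1
}


def sketch_num_dims(fields: Dict[str, List[List[float]]]) -> Dict[str, int]:
    """Map each field name to the length of its rows (its last axis)."""
    return {name: len(rows[0]) for name, rows in fields.items()}


dims = sketch_num_dims(graph_fields)
print(dims)
```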
-
property num_edge_features
Returns the edge feature dimension.
- Returns
The edge feature dimension for the graphs in the dataset.
- Return type
int
-
property num_edge_labels
Returns the number of edge labels.
- Returns
The number of edge labels for the graphs in the dataset.
- Return type
int
-
property num_edges
Returns the number of edges for the graphs in the dataset.
- Returns
A list of the numbers of edges for the graphs in the dataset.
- Return type
list
-
property num_graph_features
Returns the graph feature dimension.
- Returns
The graph feature dimension for the graphs in the dataset.
- Return type
int
-
property num_graph_labels
Returns the number of graph labels.
- Returns
The number of graph labels for the graphs in the dataset.
- Return type
int
-
property num_labels
A general wrapper that returns the number of labels depending on the task.
- Returns
The number of labels, depending on the task.
- Return type
int
-
property num_node_features
Returns the node feature dimension.
- Returns
The node feature dimension for the graphs in the dataset.
- Return type
int
-
property num_node_labels
Returns the number of node labels.
- Returns
The number of node labels for the graphs in the dataset.
- Return type
int
-
property num_nodes
Returns the number of nodes for the graphs in the dataset.
- Returns
A list of the numbers of nodes for the graphs in the dataset.
- Return type
list
-
static pyg_to_graphs(dataset, verbose: bool = False, fixed_split: bool = False, tensor_backend: bool = False, netlib=None) → List[deepsnap.graph.Graph]
Transforms a torch_geometric.data.Dataset object into a list of deepsnap.graph.Graph objects.
- Parameters
dataset (torch_geometric.data.Dataset) – A torch_geometric.data.Dataset object that will be transformed into a list of deepsnap.graph.Graph objects.
verbose (bool) – Whether to print information such as warnings.
fixed_split (bool) – Whether to load the fixed data split from the original PyTorch Geometric dataset.
tensor_backend (bool) – If True, pure tensors are used for the graphs.
netlib (types.ModuleType, optional) – The graph backend module. DeepSNAP currently supports NetworkX and SnapX as graph backends (SnapX only for undirected homogeneous graphs). The default graph backend is NetworkX.
- Returns
A list of deepsnap.graph.Graph objects.
- Return type
List[deepsnap.graph.Graph]
-
resample_disjoint()
Resamples the split between the message passing and supervision edges in the disjoint mode.
Note
If apply_transform() (on the message passing graph) was used before this resampling, it needs to be re-applied after resampling, to update some of the (supervision) edges that were in the objectives.
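The note above can be sketched as a training loop that resamples every resample_disjoint_period iterations and re-applies a transform each time. The snippet is standalone and does not call DeepSNAP; the helpers resample and mark_message_edges are hypothetical, and resampling on iterations that are multiples of the period is an assumption about the schedule.

```python
import random


def resample(edges, edge_message_ratio, rng):
    """Draw a fresh split into message passing and supervision edges."""
    shuffled = list(edges)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * edge_message_ratio)
    return shuffled[:cut], shuffled[cut:]


def mark_message_edges(message_edges):
    """Hypothetical transform applied to the message passing graph."""
    return [(u, v, "message") for u, v in message_edges]


rng = random.Random(0)
edges = [(i, i + 1) for i in range(10)]
resample_disjoint_period = 3
resampled_at = []

for iteration in range(6):
    # Assumption: resample on iterations that are multiples of the period.
    if iteration % resample_disjoint_period == 0:
        message_edges, supervision_edges = resample(edges, 0.8, rng)
        # The transform output is stale after resampling, so apply it again.
        tagged_message_edges = mark_message_edges(message_edges)
        resampled_at.append(iteration)

print(resampled_at)
```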
-
split(transductive: bool = True, split_ratio: Optional[List[float]] = None, split_types: Optional[Union[str, List[str]]] = None, shuffle: bool = True) → List[deepsnap.graph.Graph]
Splits the dataset into train, validation (and test) sets.
- Parameters
transductive (bool) – Whether the learning is transductive (True) or inductive (False). An inductive split is always used for the graph-level task, i.e. when self.task equals graph.
split_ratio (list) – A list of ratios such as [train_ratio, validation_ratio, test_ratio].
split_types (str or list) – The types to split on. Default is None.
shuffle (bool) – Whether to shuffle the data before splitting.
- Returns
A list of 3 (or 2) deepsnap.dataset.GraphDataset objects corresponding to the train, validation (and test) sets.
- Return type
list
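How a split_ratio list turns into 2 or 3 subsets can be sketched on a plain list of items, as in an inductive split over whole graphs. The function sketch_split is hypothetical and standalone; rounding at cumulative boundaries and giving the remainder to the last subset are assumptions, not necessarily what DeepSNAP does.

```python
import random


def sketch_split(items, split_ratio, shuffle=True, seed=0):
    """Split items into len(split_ratio) consecutive subsets."""
    if abs(sum(split_ratio) - 1.0) > 1e-9:
        raise ValueError("split_ratio must sum to 1")
    items = list(items)
    if shuffle:
        random.Random(seed).shuffle(items)
    splits, start, cumulative = [], 0, 0.0
    for ratio in split_ratio[:-1]:
        cumulative += ratio
        # Assumption: boundaries are rounded cumulative fractions of the length.
        end = int(round(cumulative * len(items)))
        splits.append(items[start:end])
        start = end
    # The last subset takes whatever remains, so no item is dropped.
    splits.append(items[start:])
    return splits


train, val, test = sketch_split(range(10), [0.8, 0.1, 0.1])
train_only, val_only = sketch_split(range(10), [0.8, 0.2])
print(len(train), len(val), len(test))
```

Passing two ratios instead of three gives only train and validation subsets, mirroring the "3 (or 2)" return value described above.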
DeepSNAP Dataset Generator
-
class Generator(sizes, size_prob=None, dataset_len=0)
Bases: object
Abstract class for the on-the-fly generator used in the dataset. It generates graphs on the fly, which are then fed into the model.
-
generate()
Overwrite in subclass. Generates and returns a deepsnap.graph.Graph object.
- Returns
A DeepSNAP graph object.
- Return type
deepsnap.graph.Graph
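The subclassing pattern can be sketched as follows. To keep the snippet runnable without DeepSNAP, a local stand-in Generator class mirrors the documented constructor signature, and a dictionary stands in for deepsnap.graph.Graph. PathGraphGenerator is hypothetical, and treating sizes as candidate node counts with size_prob as their sampling probabilities is an assumption.

```python
import random


class Generator:
    """Local stand-in that mirrors the documented constructor signature."""

    def __init__(self, sizes, size_prob=None, dataset_len=0):
        self.sizes = sizes
        self.size_prob = size_prob
        self.dataset_len = dataset_len

    def generate(self):
        raise NotImplementedError("Overwrite in subclass.")


class PathGraphGenerator(Generator):
    """Hypothetical subclass: builds a path graph of a sampled size."""

    def __init__(self, sizes, size_prob=None, dataset_len=0, seed=0):
        super().__init__(sizes, size_prob, dataset_len)
        self._rng = random.Random(seed)

    def generate(self):
        # random.choices samples uniformly when weights is None.
        num_nodes = self._rng.choices(self.sizes, weights=self.size_prob, k=1)[0]
        edges = [(i, i + 1) for i in range(num_nodes - 1)]
        # A dictionary stands in for a deepsnap.graph.Graph object.
        return {"num_nodes": num_nodes, "edges": edges}


generator = PathGraphGenerator(sizes=[4, 6, 8], size_prob=[0.5, 0.25, 0.25])
graph = generator.generate()
print(graph["num_nodes"], len(graph["edges"]))
```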
DeepSNAP Dataset EnsembleGenerator
-
class EnsembleGenerator(generators, gen_prob=None, dataset_len=0)
Bases: deepsnap.dataset.Generator
-
generate(**kwargs)
Generates a list of graphs.
- Returns
A list of generated deepsnap.graph.Graph objects.
- Return type
list
-
property num_edge_labels
Returns the number of edge labels in the generated graphs.
- Returns
The number of edge labels.
- Return type
int
-
property num_edges
Returns the number of edges in each generated graph.
- Returns
A list of the numbers of edges.
- Return type
list
-
property num_graph_labels
Returns the number of graph labels in the generated graphs.
- Returns
The number of graph labels.
- Return type
int
-
property num_node_labels
Returns the number of node labels in the generated graphs.
- Returns
The number of node labels.
- Return type
int