ViRel: Unsupervised Visual Relations Discovery with Graph-level Analogy
We introduce ViRel, a method for unsupervised discovery and learning of Visual Relations with graph-level analogy. On a grid-world based dataset that tests visual relation reasoning, ViRel achieves above 95% accuracy in unsupervised relation classification, discovers the relation graph structure for most tasks, and further generalizes to unseen tasks with more complicated relational structures.
Method
Visual relations form the basis of understanding our compositional world, as relationships between visual objects capture key information in a scene. It is therefore advantageous to learn relations automatically from data, since learning with predefined labels cannot capture all possible relations. However, current relation learning methods typically require supervision and are not designed to generalize to scenes with more complicated relational structures than those seen during training.
Here, we introduce ViRel, a method for unsupervised discovery and learning of Visual Relations with graph-level analogy. The following figure illustrates the task setting. Each task $t$ contains images with a shared relation subgraph $E(G_t)$.
The inputs are the image examples shown under "Example Tasks". Both the relation types and the relation graph of each task are unknown. The goal is to infer the relation graph $E(G_i)$ of each image, the corresponding relations, and the shared relation graph $E(G_t)$ for each task $t$. The task is challenging because we must infer both the underlying global visual relations (e.g., "inside", "same-shape") and the shared relation graph from the raw pixels of only a few images.
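To make the task format concrete, here is a minimal sketch; the class name, field names, and tensor shapes are illustrative assumptions, not the ViRel codebase's actual data format:

```python
from dataclasses import dataclass
import torch

@dataclass
class Task:
    """One task: a few raw images that share an unknown relation subgraph."""
    images: torch.Tensor  # (num_examples, channels, height, width), raw pixels
    # No relation labels and no graphs are provided; the model must infer
    # both the relation types and the shared graph E(G_t).

example_task = Task(images=torch.rand(4, 3, 32, 32))  # e.g. four RGB examples
```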
To address this challenge, ViRel contrasts isomorphic and non-isomorphic graphs to discover relations across tasks in an unsupervised manner, using the architecture and contrastive loss shown below.
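Below is a minimal PyTorch sketch of the graph-level contrastive idea. It is not the authors' exact architecture: the module names, dimensions, and the InfoNCE-style objective are illustrative assumptions. Each image's relation graph is embedded by pooling pairwise relation embeddings, and images from the same task, whose graphs share a relation subgraph, are treated as positives.

```python
import torch
import torch.nn.functional as F
from torch import nn

class RelationGraphEncoder(nn.Module):
    """Embeds an image's relation graph from its object features."""
    def __init__(self, obj_dim=64, rel_dim=32):
        super().__init__()
        # Maps a (subject, object) feature pair to a relation embedding.
        self.rel_mlp = nn.Sequential(
            nn.Linear(2 * obj_dim, 128), nn.ReLU(), nn.Linear(128, rel_dim)
        )

    def forward(self, objs):  # objs: (num_objects, obj_dim)
        n = objs.size(0)
        pairs = torch.cat([objs.unsqueeze(1).expand(n, n, -1),
                           objs.unsqueeze(0).expand(n, n, -1)], dim=-1)
        rels = self.rel_mlp(pairs)     # (n, n, rel_dim): one embedding per edge
        return rels.mean(dim=(0, 1))   # permutation-invariant graph embedding

def graph_contrastive_loss(graph_embs, task_ids, temperature=0.1):
    """InfoNCE-style loss: images from the same task are positives."""
    z = F.normalize(graph_embs, dim=-1)
    sim = z @ z.t() / temperature                    # cosine similarities
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos = (task_ids.unsqueeze(0) == task_ids.unsqueeze(1)) & ~eye
    logits = sim.masked_fill(eye, float('-inf'))     # exclude self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(eye, 0.0)        # avoid -inf * 0 = NaN
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()

# Toy usage: 6 images from 3 tasks, 4 objects each. Object features are
# assumed to come from an upstream object encoder, omitted here.
encoder = RelationGraphEncoder()
embs = torch.stack([encoder(torch.randn(4, 64)) for _ in range(6)])
loss = graph_contrastive_loss(embs, torch.tensor([0, 0, 1, 1, 2, 2]))
loss.backward()
```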
Once the relations are learned, ViRel retrieves the shared relation graph structure for each task by parsing the predicted relational structure (a sketch of this step follows below). Using a dataset based on grid-world and the Abstraction and Reasoning Corpus (ARC), we show that our method achieves above 95% accuracy in relation classification, discovers the relation graph structure for most tasks, and further generalizes to unseen tasks with more complicated relational structures. An example result is shown below:
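As one plausible reading of the parsing step above, not necessarily the paper's exact procedure: once every edge in every image has a predicted relation class, a task's shared graph can be recovered as the edges whose predicted relation agrees across all of the task's example images. The hypothetical helper below sketches this; it also assumes object indices are aligned across images, whereas in general the shared graph is only defined up to graph isomorphism.

```python
def shared_relation_graph(per_image_edges):
    """per_image_edges: list of dicts mapping (obj_i, obj_j) -> relation id.

    Keeps only the edges whose predicted relation class agrees across
    every example image of the task.
    """
    shared = dict(per_image_edges[0])
    for edges in per_image_edges[1:]:
        shared = {e: r for e, r in shared.items() if edges.get(e) == r}
    return shared

# Two images agree on edge (0, 1) having relation 2, but not on (1, 2):
print(shared_relation_graph([{(0, 1): 2, (1, 2): 0},
                             {(0, 1): 2, (1, 2): 1}]))  # -> {(0, 1): 2}
```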
For more information and results, please see the paper.
Code
A reference implementation of ViRel in PyTorch will be available on GitHub.
Datasets
The datasets used by ViRel are included in the code repository, along with scripts to generate them.
Contributors
The following people contributed to ViRel:
Daniel Zeng*
Tailin Wu*
Jure Leskovec
References
ViRel: Unsupervised Visual Relations Discovery with Graph-level Analogy. D. Zeng*, T. Wu*, J. Leskovec. In ICML 2022 Beyond Bayes: Paths Towards Universal Reasoning Systems Workshop.