Flickr image relationships

Dataset information

This dataset is built by forming links between Flickr images that share common metadata. An edge connects two images when they were taken at the same location; were submitted to the same gallery, group, or set; share common tags; were taken by users who are friends; and so on. The original images were collected from PASCAL, ImageCLEF, MIR, and NUS-WIDE.
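
To illustrate the idea of linking images through shared metadata, here is a minimal sketch on invented toy data. It is not the procedure used to build this dataset: the function name `link_by_shared_values`, the `(field, value)` representation, and the image IDs are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def link_by_shared_values(metadata):
    """Return undirected edges between images sharing at least one metadata value.

    `metadata` maps an image ID to a set of (field, value) pairs,
    e.g. ("tag", "beach"). This representation is an assumption made
    for illustration only.
    """
    # Invert the mapping: metadata value -> set of images carrying it.
    holders = defaultdict(set)
    for image_id, values in metadata.items():
        for value in values:
            holders[value].add(image_id)

    # Every pair of images under the same value gets one edge.
    edges = set()
    for images in holders.values():
        for a, b in combinations(sorted(images), 2):
            edges.add((a, b))
    return edges


# Hypothetical toy metadata, not taken from the dataset.
toy = {
    "img1": {("tag", "beach"), ("group", "sunsets")},
    "img2": {("tag", "beach")},
    "img3": {("group", "sunsets"), ("location", "lisbon")},
    "img4": {("location", "oslo")},
}
print(sorted(link_by_shared_values(toy)))
```

Here img1 is linked to img2 through a shared tag and to img3 through a shared group, while img4 shares nothing and stays isolated. Enumerating all pairs under a popular value grows quadratically, so a real pipeline would likely need to cap or sample very common values.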

Dataset statistics
Nodes 105938
Edges 2316948
Nodes in largest WCC 105722 (0.998)
Edges in largest WCC 2316668 (1.000)
Nodes in largest SCC 105722 (0.998)
Edges in largest SCC 2316668 (1.000)
Average clustering coefficient 0.0891
Number of triangles 107987357
Fraction of closed triangles 0.1828
Diameter (longest shortest path) 9
90-percentile effective diameter 4.8
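
For readers unfamiliar with the clustering figures above, the following sketch computes a triangle count, an average clustering coefficient, and a closed-triple fraction on a tiny undirected graph. It uses common textbook definitions; whether the statistics listed above were computed with exactly these definitions is an assumption, and the function name `clustering_stats` is hypothetical.

```python
from itertools import combinations


def clustering_stats(edges):
    """Return (triangles, average clustering coefficient, closed-triple fraction)."""
    # Build an undirected adjacency map, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    closed = 0    # neighbor pairs that are themselves linked (3 per triangle)
    triples = 0   # all neighbor pairs, i.e. paths of length two
    local = []    # per-node clustering coefficients
    for nbrs in adj.values():
        k = len(nbrs)
        pairs = k * (k - 1) // 2
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        triples += pairs
        closed += linked
        # Nodes with fewer than two neighbors contribute 0 by convention.
        local.append(linked / pairs if pairs else 0.0)

    triangles = closed // 3
    avg_cc = sum(local) / len(local) if local else 0.0
    closed_fraction = closed / triples if triples else 0.0
    return triangles, avg_cc, closed_fraction


# A triangle 1-2-3 with one extra edge 3-4 hanging off it.
print(clustering_stats([(1, 2), (2, 3), (1, 3), (3, 4)]))
```

This brute-force version is only meant to make the definitions concrete; for a graph with millions of edges, a graph library would be the practical choice.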

Source (citation)


File Description
flickrEdges.txt.gz Image relationships on Flickr (edges only)
nodeFeaturesFlickr.tar.gz Node features
edgeFeaturesFlickr.tar.gz Edge features
flickrXml.tar.gz Xml of all Flickr metadata
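
As a starting point for working with the edge file, here is a minimal sketch of a reader for a gzipped edge list. It assumes one edge per line as two whitespace-separated integer node IDs, with optional comment lines starting with `#`; that layout is not confirmed by this page, so check the file before relying on it. The function name `read_edges` is hypothetical.

```python
import gzip


def read_edges(path):
    """Yield (source, target) integer pairs from a gzipped edge list.

    Assumed layout (not confirmed by the dataset page): one edge per line,
    two whitespace-separated integer IDs, '#' marks a comment line.
    """
    with gzip.open(path, "rt") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comment lines.
            if not line or line.startswith("#"):
                continue
            src, dst = line.split()[:2]
            yield int(src), int(dst)
```

With the file downloaded, `edges = list(read_edges("flickrEdges.txt.gz"))` would load every edge into memory; iterating over the generator directly avoids holding them all at once.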

How to parse (in Python)

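import sys
import xml.etree.ElementTree as ET

def parsePhotos(path):
    """Yield one parsed XML element per photo record in the file."""
    with open(path, 'r') as f:
        f.readline()  # skip the first line of the file
        content = ""
        for l in f:
            content += l
            # The closing tag was lost from the original listing; "</photo>"
            # is an assumption based on the function name. Adjust as needed.
            if l.startswith("</photo>"):
                yield ET.fromstring(content)
                content = ""

for x in parsePhotos(sys.argv[1]):
    print(ET.tostring(x, encoding="unicode"))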