Research Positions in the SNAP Group
Winter Quarter 2023-24

Welcome to the application page for research positions in the SNAP group under Prof. Jure Leskovec, Winter Quarter 2023-24!

Our group has open positions for students interested in independent study and research (CS191, CS195, CS199, CS399). These positions are available to Stanford University students only. Below are some of the possible research projects. All projects are high-impact: participants conduct research on real-world problems and data, leading to research publications or open-source software. Positions are often extended over several quarters and may lead to a Research Assistant position. We are looking for highly motivated students with any combination of skills in machine learning, data mining, network analysis, algorithms, and computer systems.

Please apply by filling out and submitting the form below. Apply quickly since the positions usually get filled early in the quarter. Thanks for your interest!

If you have any questions please contact Lata Nair at lnairp24@stanford.edu.

Application Form

First and Last Name

SUNetID

SUNetID is your Stanford CS login name and contact email address, <your_SUNetID>@cs.stanford.edu. If you don't have a SUNetID, use <your_last_name>_<your_first_name>, so if your last name is Smith and your first name is John, use smith_john.

Email

Department

Student Status

Project(s)

Please select all the projects that you are interested in. You can find the project descriptions below.

Science Graph Benchmark [description]
Keywords: AI for science, graph machine learning
A Foundation Model for Feature-rich Relational Data [description]
Keywords: graph neural networks pretraining, self-supervised learning, foundation models, relational data
A Framework for Integrating Black-box Temporal Predictors with Hypergraph GNNs in Relational Databases [description]
Keywords: relational hypergraphs, temporal graph neural networks, causal link prediction
RelBench: A Benchmark for Graph Representation Learning on Relational Databases [description]
Keywords: relational data, graph machine learning, benchmark

Position

Please select all positions that you are interested in.

25% RA
50% RA
Independent study (CS399, CS199, CS191, CS195)

Statement of Purpose

Briefly explain why you would like to participate in this project, why you think you are qualified to work on it, and how you would like to contribute.

Your Resume

Your Transcript

Click the button below to submit.


Projects

Science Graph Benchmark

Keywords: AI for science, graph machine learning

Our project spearheads scientific innovation using graph machine learning, targeting the development of advanced methods such as novel graph transformers and multimodal integration techniques. Our aim is to set new benchmarks and frameworks for data-driven research across disciplines such as biology, materials science, and physics. We seek students to help with creating innovative tasks, developing automated discovery methods, and constructing benchmark datasets. This project advances the vital role of graph machine learning in diverse scientific fields and involves extensive collaboration within the Stanford community.

We are looking for highly motivated students who have experience in graph machine learning and deep learning (e.g., CS224W, CS224N, CS231N, CS230). Proficiency in PyTorch and benchmarking infrastructure is essential. Experience with any physical science is a strong plus.

Go to the application form.

A Foundation Model for Feature-rich Relational Data

Keywords: graph neural networks pretraining, self-supervised learning, foundation models, relational data

Translating the concept of a foundation model to graphs is an open challenge, particularly for rich structured data such as relational data. Existing foundation-model strategies focus only on the graph’s topology, stripping the graph of its node, edge, and hyperedge features. This project explores how to include a limited set of node and edge features in a graph foundation model that is transferable across graph domains. The key challenge is defining neural network invariances beyond the permutation invariance of GNNs, such that the model can provably transfer knowledge between distinct domains. Our goal is to build a system that can handle large-scale pretraining such that, when introduced to a new domain or dataset, the model produces high-quality results with no or minimal training.

We are looking for highly motivated students who have experience in machine learning and deep learning (e.g., CS224W, CS224N, CS231N, CS229), and are proficient with PyTorch. A basic understanding of pretraining is preferred.

Go to the application form.

A Framework for Integrating Black-box Temporal Predictors with Hypergraph GNNs in Relational Databases

Keywords: relational hypergraphs, temporal graph neural networks, causal link prediction

This project will develop a general framework that merges temporal black-box predictors, hypergraph neural networks, and causal inference in relational databases. The common approach to recommendations over relational databases relies on pretrained black-box temporal models, such as heavily feature-engineered boosted decision trees. This project will seamlessly combine these pretrained models with state-of-the-art GNNs in a time-then-graph framework, enabling better node-label, link, and hyperedge predictors. Crucially, this novel framework will employ counterfactual graph modeling to generate embeddings derived from the pretrained temporal black-box models. Additionally, it will incorporate causal adjustments to enhance its applicability in causal prediction tasks.

We are looking for highly motivated students who have experience in machine learning and deep learning (e.g., CS224W, CS224N, CS231N, CS229), and are proficient with PyTorch. A basic understanding of GNNs and time series modeling is preferred.

Go to the application form.

RelBench: A Benchmark for Graph Representation Learning on Relational Databases

Keywords: relational data, graph machine learning, benchmark

Much of the world's data is stored in relational databases, which contain multiple tables connected by primary-foreign key links. Consequently, many interesting forecasting problems can be thought of as predictions on relational data (Which customers are at risk of churning? Will patient A respond to treatment X?). We recently released a blueprint for a new graph representation learning paradigm on relational databases, accompanied by an initial release of RelBench, a benchmark suite to support further research. We are beginning the next phase of development of RelBench in preparation for an eventual full release. The goal of this release is to provide: a) several databases and several well-tested tasks per database, and b) GNN baseline results for all tasks.
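The idea of viewing a relational database as a graph can be illustrated with a toy sketch (the table names, columns, and churn framing below are hypothetical examples, not part of RelBench):

```python
# Hypothetical two-table database: customers and their orders.
# Rows become nodes; each primary-foreign key pair becomes an edge.
customers = [
    {"customer_id": 1, "signup_year": 2021},
    {"customer_id": 2, "signup_year": 2023},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 15.0},
]

# One node per row, typed by its source table.
nodes = [("customer", c["customer_id"]) for c in customers] + \
        [("order", o["order_id"]) for o in orders]

# One edge per foreign-key reference (order -> customer).
edges = [(("order", o["order_id"]), ("customer", o["customer_id"]))
         for o in orders]

# A forecasting task such as "which customers will churn?" then becomes
# a node-level prediction problem on the "customer" nodes of this graph.
print(len(nodes), len(edges))  # 5 nodes, 3 edges
```

In practice such heterogeneous graphs are built at scale with libraries like PyTorch Geometric rather than by hand, but the mapping from tables and keys to nodes and edges is the same.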

We are looking for highly motivated students who have experience with GNNs and deep learning (e.g., CS224W, CS224N, CS231N, CS229), and are proficient with PyTorch. Experience with data-centric machine learning is a bonus, but not essential.

Go to the application form.