Research Positions in the SNAP Group
Winter Quarter 2019-20

Welcome to the application page for research positions in the SNAP group, Winter Quarter 2019-20!

Our group has open positions for Research Assistants and students interested in independent studies and research (CS191, CS195, CS199, CS399). These positions are available for Stanford University students only. Below are some of the possible research projects. All projects are high-impact, allowing participants to perform research and work on real-world problems and data, and leading to research publications or working systems. Positions are often extended over several quarters. We are looking for highly motivated students with any combination of skills: data mining, machine learning, algorithms, social network analysis, and computer systems.

Please apply by filling out and submitting the form below. Apply quickly since the positions usually get filled early in the quarter. Thanks for your interest!

If you have any questions, please contact Rok Sosic at rok@cs.stanford.edu.

Application form

First and Last Name

SUNetID

SUNetID is your Stanford CS login name and contact email address, <your_SUNetID>@cs.stanford.edu. If you don't have a SUNetID, use <your_last_name>_<your_first_name>, so if your last name is Smith and your first name is John, use smith_john.

Email

Department

Student Status

Project(s)

Please select all the projects that you are interested in. You can find the project descriptions below.

Representation Learning with Graph Neural Networks [description]
Keywords: deep learning, representation learning, network analysis
Common Sense Reasoning with Knowledge Graphs [description]
Keywords: knowledge graphs, graph neural networks
Resource Allocation for Hybrid Human-Machine Forecasting Models [description]
Keywords: machine learning, resource allocation, event forecasting
Faster Training of Large Machine Learning Models and Learning from Self Driving Data [description]
Keywords: convex optimization, stochastic gradient
Question Answering with Knowledge Graphs [description]
Keywords: knowledge graphs, question answering, graph representation learning, machine reading comprehension
SNAP: Stanford Network Analysis Platform [description]
Keywords: open-source software, network analysis, data mining, graph algorithms
Deep Learning for Source Code [description]
Keywords: source code analysis, representation learning

Position

Please select the position(s) you are interested in. Select all that apply.

25% RA
50% RA
Independent study (CS399, CS199, CS191)

Statement of Purpose

Briefly explain why you would like to participate in this project, why you think you are qualified to work on it, and how you would like to contribute.

Your Resume

Your Transcript

Click on the button below to Submit


Projects

Representation Learning with Graph Neural Networks

Keywords: deep learning, representation learning, network analysis

Representation learning through Graph Neural Networks is emerging as a major new methodology that allows us to advance our understanding of complex systems, such as social, biological, molecular, and financial networks. Using representation learning, we can generate embeddings that compress high-dimensional network information into low-dimensional feature vectors. These embeddings encode network structures - such as edges between nodes - as geometric relationships, and are crucial for many tasks. For example, after generating embeddings of proteins, genes, and chemicals in a biological interaction network, we can use distances in the learned embedding space to predict novel interactions and assist in drug design.
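As a minimal sketch of the distance-based prediction idea described above, the snippet below ranks candidate interaction partners by their distance to a query node in embedding space. The node names and 2-D vectors are hypothetical, hand-picked values for illustration only; in practice the embeddings would be produced by a trained Graph Neural Network and would have many more dimensions.

```python
import math

# Hypothetical 2-D embeddings (illustrative values, not learned from data).
embeddings = {
    "chemical_X": (0.85, 0.15),
    "protein_A": (0.90, 0.10),
    "protein_B": (0.60, 0.30),
    "gene_G": (-0.70, 0.60),
}


def rank_candidates(query, embeddings):
    """Return (name, distance) pairs for all other nodes, closest first."""
    q = embeddings[query]
    distances = [
        (name, math.dist(q, vec))  # Euclidean distance in embedding space
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(distances, key=lambda pair: pair[1])


# Nodes closest to chemical_X are the most plausible interaction candidates.
ranking = rank_candidates("chemical_X", embeddings)
for name, dist in ranking:
    print(f"{name}: {dist:.3f}")
```

Euclidean distance is only one possible choice here; dot products or cosine similarity are common alternatives, depending on how the embeddings were trained.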

Our group has multiple projects that seek to push the state of the art in representation learning with Graph Neural Networks on complex networks. The projects range from molecular graph generation, temporal graphs, and graph similarity to domain-specific challenges, with data from a range of domains, including biomedical, social, molecular, computer systems, and NLP.

We are looking for highly motivated students who have experience with network analysis, machine learning, and deep learning (e.g., CS224W, CS229, CS224N, CS231N). Working knowledge of Python is required, and applicants should have some experience with a deep learning framework (e.g., TensorFlow, PyTorch).

Go to the application form.

Common Sense Reasoning with Knowledge Graphs

Keywords: knowledge graphs, graph neural networks

The goal of this project is to develop a common sense reasoning engine based on knowledge graphs. First, we will construct a commonsense knowledge graph that will support a rich set of intuitive everyday phenomena such as abduction, analogy, causality, agency, and physics in a unified reasoning engine. Next, we will embed the commonsense knowledge graph nodes (using state-of-the-art Graph Convolutional Networks) in a low-dimensional vectorized knowledge space and represent logical operators as learned geometric operations such as translation in this continuous space. We will then develop a question understanding module which will use a novel story understanding system based on ConceptNet and a multi-modal retrieval algorithm, to parse the question into a structured query, ground the query in the knowledge space, and apply the learned operators to deduce the most probable answers.
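To illustrate what "logical operators as learned geometric operations such as translation" could look like, here is a minimal sketch in the style of translation-based knowledge graph embeddings: a relation is a vector added to the head entity, and the answer is the entity nearest to the translated point. This is an assumption about the general approach, not the project's actual model. The entity names, the "causes" relation, and all vectors are hypothetical and hand-set; in a real system they would be learned.

```python
import math

# Hypothetical entity embeddings (hand-set for illustration, not learned).
entities = {
    "rain": (1.0, 0.0),
    "wet_ground": (1.0, 1.0),
    "sun": (-1.0, 0.0),
    "dry_ground": (-1.0, 1.0),
}

# Hypothetical relation represented as a translation vector.
relations = {"causes": (0.0, 1.0)}


def answer(head, relation, entities, relations):
    """Translate the head embedding by the relation vector and
    return the nearest other entity as the most probable answer."""
    h = entities[head]
    r = relations[relation]
    target = tuple(hi + ri for hi, ri in zip(h, r))
    candidates = {
        name: math.dist(target, vec)
        for name, vec in entities.items()
        if name != head
    }
    return min(candidates, key=candidates.get)


print(answer("rain", "causes", entities, relations))
print(answer("sun", "causes", entities, relations))
```

A query such as "what does rain cause?" is thus grounded as a point in the embedding space, and answering reduces to a nearest-neighbor lookup.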

We are looking for highly motivated students who have experience with machine learning, graph neural networks, network analysis, and deep learning (e.g., CS224W, CS229, CS224N, CS231N). Working knowledge of Python is required, and applicants should have some experience with a deep learning framework (e.g., TensorFlow, PyTorch).

Go to the application form.

Resource Allocation for Hybrid Human-Machine Forecasting Models

Keywords: machine learning, resource allocation, event forecasting

Producing accurate forecasts for geopolitical events is an important and challenging problem. We are developing a hybrid system that integrates human and machine forecasting components to create maximally accurate, flexible, and scalable forecasting capabilities. As part of that system, we are working on a novel family of resource allocation strategies that maximize the effectiveness of humans (more specifically, crowd workers) when interacting with machine learning models. The goal of this project is to further develop such strategies, both by improving the user behavioral model of the crowd workers and by framing our experiments as a multi-armed bandit problem. Your expected contribution will be both theoretical (e.g., improving models, devising experiments, and helping to write papers) and practical (analyzing data, writing Python code, and overall helping us to win the Hybrid Forecasting Challenge).
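For readers unfamiliar with the multi-armed bandit framing mentioned above, the following is a minimal epsilon-greedy sketch. It is a generic textbook strategy, not the allocation method used in the project. Each "arm" stands for a hypothetical allocation strategy whose payoff is simulated as a Bernoulli reward; the success probabilities, the number of rounds, and the exploration rate are all made-up illustration values.

```python
import random


def epsilon_greedy(success_probs, rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over arms with Bernoulli rewards.

    Returns the number of pulls per arm and the estimated mean reward per arm.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    pulls = [0] * n_arms
    estimates = [0.0] * n_arms

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: try a uniformly random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulated reward (1 = the allocation paid off, 0 = it did not).
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0

        # Incremental update of the running mean for the chosen arm.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]

    return pulls, estimates


# Three hypothetical strategies with different (unknown to the learner) payoffs.
pulls, estimates = epsilon_greedy([0.2, 0.5, 0.8])
print(pulls)
print([round(e, 2) for e in estimates])
```

The exploration rate trades off trying under-sampled strategies against exploiting the one that currently looks best, which is the core tension in allocating limited crowd-worker effort.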

We are looking for highly motivated students with experience in machine learning and data science (e.g., CS221, CS224W, CS229, CS224N, CS246) and excellent Python skills.

Go to the application form.

Faster Training of Large Machine Learning Models and Learning from Self Driving Data

Keywords: convex optimization, stochastic gradient

The unprecedented growth in modern datasets (coming from different sources and modalities such as images, videos, sensor data, and social networks) has created immense opportunities for learning from data at scale. Deep neural networks in particular have been very successful at learning from large amounts of data, especially in computer vision and natural language processing. However, in many cases the resulting optimization problem is complex, and training large networks therefore has a high computational cost. In this project we aim to reduce the time needed to train large, complex models. To do so, we want to find an ordering of the training data points based on their contribution to the model accuracy. Such an ordering allows us to train a model on a subset of the data points that provides the same accuracy as a model trained on the whole training set. It also lets us trade off training time against the accuracy of the learned model. We plan to apply our algorithms to a very large dataset of lidar and camera data collected by self-driving cars.

For this project we are looking for students with strong machine learning and programming skills, an interest in building large-scale systems, and a desire to work on real-world applications. Experience with Spark/Hadoop, object detection and tracking algorithms, and self-driving data is a big plus.

Go to the application form.

Question Answering with Knowledge Graphs

Keywords: knowledge graphs, question answering, graph representation learning, machine reading comprehension

The goal of this project is to develop a question answering engine based on knowledge graphs. In particular, we are interested in two QA scenarios: (a) answering multiple-choice questions from the United States Medical Licensing Examination (USMLE) using public medical knowledge graphs such as diseasedatabase.com, malacards.org, or Wikipedia, where a typical question asks for the most likely cause of a given patient's symptoms; and (b) answering questions posted by users on e-commerce websites such as Amazon or JD.com using a product knowledge graph, where questions mostly concern item attributes, for example, "How much memory does this TV have?" In both QA scenarios, a large amount of semi-structured or unstructured textual information is available, so machine reading comprehension techniques may also be useful for this project.

We are looking for highly motivated students who have experience with machine learning, knowledge graphs, and natural language processing (e.g., CS224W, CS229, CS224N). Working knowledge of Python is required, and applicants should have some experience with a deep learning framework (e.g., TensorFlow, PyTorch).

Go to the application form.

SNAP: Stanford Network Analysis Platform

Keywords: open-source software, network analysis, data mining, graph algorithms

Stanford Network Analysis Platform (SNAP) is a general-purpose, high-performance network analysis and graph mining library that easily scales to massive networks with billions of nodes and edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. It is downloaded over 1,000 times per month. SNAP is constantly being expanded with new functionality, including support for big-memory multi-core machines with 12TB RAM and 288 CPU cores.

We are looking for students interested in contributing to the SNAP open-source codebase. SNAP is written mostly in C++, so experience with this language is a plus.

Go to the application form.

Deep Learning for Source Code

Keywords: source code analysis, representation learning

Machine learning on code is revolutionizing the field of software development. However, its potential depends heavily on the expressiveness of the model chosen to represent source code. As such, we combined the expressiveness of Graph Neural Networks (GNNs) and the sheer representational power of Transformers (as seen in BERT and XLNet) to develop a novel model which obtains state-of-the-art results on common machine learning tasks for source code. The goal of this project is to go beyond those common tasks, and showcase the real potential of our architecture. We will train the model on TBs of source code, and then develop a decoder to generate contextual recommendations -- you can think of it as the "next generation" of code auto-completers, where instead of completing a single line of code, it will recommend highly relevant snippets of code. Furthermore, we will be able to identify common patterns (and anti-patterns) of how developers write code. This trove of data will allow our model to recommend different alternatives, e.g., a replacement snippet that is (i) more efficient, (ii) less bug-prone, (iii) based on a specific library, etc.

We are looking for highly motivated students with experience in deep learning (e.g., CS224W, CS229, CS224N, CS231N) and excellent programming skills. Applicants should be very familiar with at least one deep learning framework (e.g., TensorFlow, PyTorch) and have a desire to work on real-world applications.

Go to the application form.