Research Positions in the SNAP Group
Fall Quarter 2018-19

Welcome to the application page for research positions in the SNAP group, Fall Quarter 2018-19!

Our group has open positions for Research Assistants and for students interested in independent studies (CS191, CS199, CS399). These positions are available to Stanford University students only. Below are some of the possible research projects. All projects are high-impact: participants perform research on real-world problems and data, and the work leads to research publications or working systems. Positions are often extended over several quarters. We are looking for highly motivated students with any combination of the following skills: data mining, machine learning, algorithms, social network analysis, and computer systems.

Please apply by filling out and submitting the form below. Apply soon, since positions usually fill up early in the quarter. Thanks for your interest!

If you have any questions, please contact Yesenia Gallegos at ygallegos@cs.stanford.edu.

Application form

First and Last Name

SUNetID

SUNetID is your Stanford CS login name and contact email address, <your_SUNetID>@cs.stanford.edu. If you don't have a SUNetID, use <your_last_name>_<your_first_name>; for example, if your last name is Smith and your first name is John, use smith_john.

Email

Department

Student Status

Project(s)

Please select all the projects that you are interested in. You can find the project descriptions below.

Evolution of Multimodal Graphs
Keywords: temporal networks, network embeddings, recommender systems
Models of User Online Behavior
Keywords: deep learning, network embeddings, temporal networks, user modeling
Temporal Models of Complex Applications
Keywords: deep learning, network embeddings, computer system modeling, complex events
Deep Learning for Network Medicine
Keywords: biomedical data science, genomics, graph neural networks
Faster Training of Large Machine Learning Models
Keywords: machine learning, deep neural networks, fast training methods
Deep Learning for Dynamic Interactions Networks
Keywords: deep learning, representation learning
Predicting Network Evolution Patterns
Keywords: probabilistic models, dynamic networks
Deep Landscape of Disease Multimorbidity
Keywords: biomedical data science, genomics, graph neural networks
Path to Success: Analyzing the Life Cycle of Companies in the Economy
Keywords: financial contagion, graph mining

Position

Please select all the positions you are interested in.

25% RA
50% RA
Independent study (CS399, CS199, CS191)

Statement of Purpose

Briefly explain why you would like to participate in this project, why you think you are qualified to work on it, and how you would like to contribute.

Your Resume

Your Transcript



Projects

Evolution of Multimodal Graphs

Keywords: temporal networks, network embeddings, recommender systems

Knowledge graphs are multimodal networks that encode the relations between various entities and their attributes. In this project, we intend to learn how knowledge graphs evolve in order to capture the emergence of concepts and their relations over time. We target the task of learning online representations of nodes in multimodal networks and using them for recommendation.

Traditional recommender algorithms periodically rebuild their models, so they cannot adjust to quick changes in trends caused by timely information. In contrast, online learning models can adapt to temporal effects and may therefore overcome the effects of concept drift. Our final objective is to generate high-quality top-k recommendations using online node embeddings learned over rich network structures.
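To make online top-k recommendation more concrete, here is a minimal Python sketch. It assumes that user and item node embeddings are already available as plain lists of floats (how they are learned over the multimodal graph is the research question and is not shown), ranks items by dot product, and applies a purely illustrative online update that nudges a user's vector toward an item they just interacted with. The function names and the learning rate `lr` are hypothetical.

```python
import heapq
from typing import Dict, FrozenSet, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k_recommendations(
    user_vec: Sequence[float],
    item_vecs: Dict[str, Sequence[float]],
    k: int,
    exclude: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Return the ids of the k items whose embeddings score highest against user_vec."""
    scored = ((dot(user_vec, vec), item) for item, vec in item_vecs.items() if item not in exclude)
    return [item for _, item in heapq.nlargest(k, scored)]


def online_update(user_vec: Sequence[float], item_vec: Sequence[float], lr: float = 0.1) -> List[float]:
    """Hypothetical online step: move the user embedding toward an item just interacted with."""
    return [u + lr * (v - u) for u, v in zip(user_vec, item_vec)]


# Toy 2-dimensional embeddings (made-up values for illustration only).
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
user = [1.0, 0.0]
print(top_k_recommendations(user, items, k=2))  # ['a', 'c']
user = online_update(user, items["b"], lr=0.9)
print(top_k_recommendations(user, items, k=2))  # ['b', 'c']
```

Because the update touches only one user vector, the ranking can change after a single interaction without rebuilding the whole model, which is the contrast with periodic retraining drawn above.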

We are looking for students with experience in deep learning, statistics, and network science (e.g., CS230, CS224W, CS246, STATS200 and others), and an interest in working on recommender systems and network embeddings. Basic understanding of recommender systems is a plus, but not required. Working knowledge of Python is required.

Go to the application form.

Models of User Online Behavior

Keywords: deep learning, network embeddings, temporal networks, user modeling

Most cyber attacks, such as phishing, are initiated by exploiting humans rather than security flaws in computer or networking infrastructure. These attacks use social engineering to manipulate users into performing desired actions, such as clicking on malicious links or divulging passwords. To defend against social engineering attacks, it is critical to understand users and their online behavior. The aim of this project is to develop a new approach to modeling online behavior based on embedding multimodal social interactions (email, social media, web access) over time. The goal is to recognize unusual communication behaviors, which could signify an attack. The model will capture complex interactions that happen over time between many users or groups of users and through different communication channels. Its key objective is to learn a low-dimensional embedding of network entities, which will allow us to group nodes and edges with similar temporal and structural behavior and thus detect attacks as unusual behavior.
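As a rough illustration of detecting attacks as unusual behavior in an embedding space, the Python sketch below flags a new activity embedding as unusual when it lies much farther from the centroid of a user's past embeddings than those past embeddings typically do. The embeddings are assumed to come from an already-trained model, and the mean-plus-`z`-standard-deviations threshold is an illustrative choice, not the project's method.

```python
import math
import statistics
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def is_unusual(history: List[Sequence[float]], new: Sequence[float], z: float = 3.0) -> bool:
    """Flag `new` if its distance to the history centroid exceeds mean + z * stdev of past distances."""
    center = centroid(history)
    past = [euclidean(v, center) for v in history]
    threshold = statistics.mean(past) + z * statistics.pstdev(past)
    return euclidean(new, center) > threshold


# Made-up embeddings of one user's past sessions, clustered near the origin.
history = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
print(is_unusual(history, [0.05, 0.05]))  # False
print(is_unusual(history, [5.0, 5.0]))    # True
```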

We are looking for students with experience in deep learning, statistics, and network science (e.g., CS230, CS224W, CS246, STATS200 and others), and an interest in working on user modeling and security-related projects. Working knowledge of Python is required, and applicants should have some experience with a deep learning framework (e.g., TensorFlow, PyTorch).

Go to the application form.

Temporal Models of Complex Applications

Keywords: deep learning, network embeddings, computer system modeling, complex events

Microservices provide a common approach to building large-scale, complex applications, where an application is composed of a large number of loosely coupled, distributed services. While microservices improve modularity and scalability and enable continuous deployment, they make it notoriously challenging to detect and diagnose complex application performance problems, such as slow responses to user requests or bottlenecks in backend components. The purpose of this project is to develop new approaches to modeling the behavior of microservice-based applications and then to use these models to analyze system behavior, for example, to identify performance issues or to predict critical components before the application breaks down. The goal is to apply deep learning and other machine learning techniques to machine-generated data (metrics, logs, and tracing events). The result will be low-dimensional representations of the application, which will then be used for analyzing and predicting system behavior.

We are looking for students with experience in deep learning, statistics, and network science (e.g., CS230, CS224W, CS246, STATS200 and others), and an interest in applying machine learning to problems in application monitoring and performance. Working knowledge of Python is required; experience with C/C++, Go, and systems programming is a plus. You should have some experience with a deep learning framework (e.g., TensorFlow, PyTorch).

Go to the application form.

Deep Learning for Network Medicine

Keywords: biomedical data science, genomics, graph neural networks

Faced with skyrocketing costs for developing new drugs, the pharmaceutical industry is looking at ways to repurpose existing drugs to treat new diseases. Getting a drug to market currently takes 13-15 years and between US$2 billion and $3 billion on average, and the costs are going up. Drug repurposing has the potential to bring much-needed treatments to market quickly. The goal of this project is to develop a data-driven drug repurposing approach. The project will focus on automatically learning deep embeddings for heterogeneous biomedical networks. We will define novel graph convolutional deep networks and use them to embed biomedical networks into compact vector spaces. The approach will use the learned deep representations to predict which diseases a new drug could treat, and domain experts will evaluate the predictions and provide feedback to the machine learning loop. We are especially interested in approaches that can handle complex, heterogeneous networks with multiple types of nodes and edges (e.g., proteins, chemicals, diseases, symptoms) and can operate on all drugs and diseases, even in the hardest, yet extremely important, cases where a drug has no indicated disease or a disease does not yet have any drug treatment.

We are looking for students with experience in deep learning, statistics, and network science (e.g., CS230, CS224W, CS246, STATS200 and others), and an interest in working on the frontlines of medicine. Basic understanding of biology is a plus, but not required. Working knowledge of Python is required, and applicants should have some experience with a deep learning framework (e.g., TensorFlow, PyTorch).

Go to the application form.

Faster Training of Large Machine Learning Models

Keywords: machine learning, deep neural networks, fast training methods

The unprecedented growth of modern datasets, coming from different sources and modalities such as images, videos, sensor data, and social networks, has created immense opportunities to put these massive datasets to use. Deep neural networks in particular have been very successful at learning from large amounts of data, especially in computer vision and natural language processing. However, in many problems the resulting optimization problem is complex, and hence training large networks often has a high computational cost. In this project, we aim to reduce the time needed to train large, complex models. To do so, we want to find an ordering of the training data points based on their contribution to the model accuracy. Such an ordering allows us to train a model on the subset of data points that come first in the ordering and obtain almost the same accuracy as a model trained on the whole training set. It also lets us trade off training time against the accuracy of the learned model.
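The ordering idea can be sketched in a few lines of Python. Measuring each data point's contribution to model accuracy is exactly what the project sets out to do, so this sketch simply assumes a list of per-example scores is given (the scores and function names are hypothetical) and shows how an ordering turns a training budget into a subset.

```python
from typing import List, Sequence


def order_by_contribution(scores: Sequence[float]) -> List[int]:
    """Indices of training points from highest to lowest assumed contribution score.

    Python's sort is stable (also with reverse=True), so equal scores keep their original order.
    """
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def training_subset(scores: Sequence[float], budget: int) -> List[int]:
    """Indices of the `budget` highest-ranked points; a smaller budget means faster training."""
    if not 0 < budget <= len(scores):
        raise ValueError("budget must be between 1 and the number of training points")
    return order_by_contribution(scores)[:budget]


# Hypothetical contribution scores for five training points.
scores = [0.1, 0.9, 0.5, 0.3, 0.5]
print(order_by_contribution(scores))  # [1, 2, 4, 3, 0]
print(training_subset(scores, 2))     # [1, 2]
```

Training one model per budget, from small to large, would trace out the trade-off between training time and accuracy described above.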

For this project, we are looking for students with strong machine learning and programming skills, an interest in building large-scale systems, and a desire to work on real-world applications.

Go to the application form.

Deep Learning for Dynamic Interactions Networks

Keywords: deep learning, representation learning

Users interact with items over time in many domains, such as social networks, healthcare, education, and communication. As users and items interact, their properties evolve and co-evolve with one another. Representation learning is a powerful approach for learning these evolving user and item feature vectors. These representations are crucial for various tasks, such as predicting future interactions, identifying anomalies, clustering users and items, and more.

In this project, we will create novel deep learning models to learn dynamic representations of users and items from these sequences of interactions. While existing work focuses on learning these representations for individual users and items, we will focus on creating scalable models for analyzing trajectories, representing groups, and building generative models using dynamic embeddings.

We are looking for highly motivated students who have experience with network analysis, machine learning, and deep learning (e.g., CS224W, CS229, CS224N, CS231N). Working knowledge of Python is required, and applicants should have experience with PyTorch/TensorFlow.

Go to the application form.

Predicting Network Evolution Patterns

Keywords: probabilistic models, dynamic networks

Networks of interconnected entities are widely used to model pairwise relations between objects in many important problems in sociology, finance, computer science, and operations research. Often, these networks are dynamic: nodes or edges appear or disappear, and the underlying network structure evolves over time. As a result, there is growing interest in developing dynamic network models that allow us to study evolving networks. Non-parametric models are especially useful when there is no prior knowledge of, or assumption about, the shape or size of the network, as they can automatically address the model selection problem. In this project, we aim to use non-parametric Bayesian probabilistic models to capture and predict the evolution of dynamic networks. More precisely, at each time step we model the network with a non-parametric edge-exchangeable model. We then infer the latent parameters at each time step and build a model to track how the values of the latent parameters shift over time. Such a model enables us to predict the values of the latent parameters at the next time step, and hence to predict the future structure of the underlying network.
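As a hedged illustration of the last two steps (tracking latent parameters across time steps and predicting their next values), the Python sketch below fits an ordinary least-squares line to each parameter's trajectory and extrapolates one step ahead. The linear trend is only a stand-in for whatever tracking model the project develops, and the per-step inference of the edge-exchangeable model's parameters is assumed to have happened already.

```python
from typing import List, Sequence


def predict_next(history: Sequence[float]) -> float:
    """Extrapolate one step ahead with a least-squares line through the points (t, value)."""
    n = len(history)
    if n == 0:
        raise ValueError("history must not be empty")
    if n == 1:
        return history[0]  # no trend can be estimated from a single value
    t_mean = (n - 1) / 2
    v_mean = sum(history) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in zip(range(n), history))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = v_mean - slope * t_mean
    return intercept + slope * n  # value of the fitted line at the next time step, t = n


def predict_next_parameters(param_history: List[Sequence[float]]) -> List[float]:
    """param_history[t][j] is latent parameter j inferred at time step t."""
    num_params = len(param_history[0])
    return [predict_next([step[j] for step in param_history]) for j in range(num_params)]


print(predict_next([1.0, 2.0, 3.0]))                                       # 4.0
print(predict_next_parameters([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]))       # [4.0, 5.0]
```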

We are looking for students with a background in Bayesian probabilistic modeling and programming skills, as well as an interest in working with real-world networks.

Go to the application form.

Deep Landscape of Disease Multimorbidity

Keywords: biomedical data science, genomics, graph neural networks

Multimorbidity is the co-occurrence of multiple chronic or acute diseases within one person. It is a common phenomenon, especially in older people, and is associated with higher mortality, poorer quality of life, and higher rates of health service use. Better management of people with multimorbidity is a major challenge in healthcare; however, we know little about how diseases co-occur and interact within an individual and how these interactions change over time. The goal of this project is to develop a rigorous data science methodology for representing and modeling multimorbidity. We will seek to push the state of the art in representation learning on complex networks by addressing two challenges: (1) how to systematically characterize disease co-occurrences and investigate the interplay between multiple diseases at the level of individuals and of the entire population; and (2) how to model disease relationships that change over time. We will use the new methodology to study multimorbidity in a large population of real-life patients who have multiple diseases of varying severity at any given time. This project has the potential to improve the healthcare management of people with multimorbidity and will enable us, for the first time, to generate hypotheses about multimorbidity.

We are looking for students with experience in deep learning, statistics, and network science (e.g., CS230, CS224W, CS246, STATS200 and others), and an interest in working on the frontlines of biology and medicine. Basic understanding of biology is a plus, but not required. Working knowledge of Python is required, and applicants should have some experience with a deep learning framework (e.g., TensorFlow, PyTorch).

Go to the application form.

Path to Success: Analyzing the Life Cycle of Companies in the Economy

Keywords: financial contagion, graph mining

Data science and machine learning are taking the guesswork out of financial decisions. Fintech algorithms may learn the behavior of companies, and data-driven methods can optimize and personalize decisions in areas such as lending, insurance, fraud detection, and investing in general. In this project, we analyze a dataset containing over 100,000 companies and their transaction histories over 10 years. By blending transaction records with metadata such as balance sheet information, we aim to understand what drives the success and failure of companies. The dataset provides an exceptional opportunity to study the evolution of supplier-customer relations and the financial flows in the economy.

We are looking for students with a strong programming background in Python and an interest in data science.

Go to the application form.