Research Positions in the SNAP Group
Fall Quarter 2016-17

Welcome to the application page for research positions in the SNAP group, Fall Quarter 2016-17!

Our group has open positions for Research Assistants and students interested in independent studies (CS191, CS199, CS399). These positions are available for Stanford University students only. Below are some of the possible research projects. All the projects will lead to research publications or working systems. We are looking for highly motivated students with any combination of skills: data mining, machine learning, algorithms, social network analysis, and computer systems.

Please apply by filling out and submitting the form below. Apply quickly since the positions usually get filled early in the quarter. Thanks for your interest!

If you have any questions, please contact Prof. Leskovec at jure@cs.stanford.edu.

Application form

First and Last Name

SUNetID

SUNetID is your Stanford CS login name and contact email address, <your_SUNetID>@cs.stanford.edu. If you don't have a SUNetID, use <your_last_name>_<your_first_name>, so if your last name is Smith and your first name is John, use smith_john.

Email

Department

Student Status

Project(s)

Please select all the projects that you are interested in. You can find the project descriptions below.

Inferring Structure from Unstructured Biological or Sensor Data [description]
Keywords: convex optimization, data analysis, computational biology, time series
Understanding Diet and Nutrition through Large-scale Food Logs [description]
Keywords: data mining, big data analysis, machine learning, data visualization
Conversational Agents for Crisis Counseling [description]
Keywords: data science for social good, chatbot agents, natural language processing, deep learning
Dogmatism, Echo-Chambers, and Antisocial Behavior in Online Communities [description]
Keywords: social network analysis, natural language processing, data mining
SNAP: Stanford Network Analysis Platform [description]
Keywords: graph algorithms, parallel algorithms
Ringo: In-memory Graph Exploration Engine [description]
Keywords: systems programming, network analysis, parallel programming, distributed computing
Computational Social Psychology: Personality Effects in Online Behavior [description]
Keywords: natural language processing, personality, social network analysis, data mining, machine learning
Massive Interactomics: Using Network Analysis to Study Life's Diversity [description]
Keywords: computational biology, machine learning, network analysis

Position

Please select the position(s) you are interested in. Select all that apply.

25% RA
50% RA
Independent study (CS399, CS199, CS191)

Statement of Purpose

Briefly explain why you would like to participate in this project and why you think you are qualified to work on it.

Your Resume

Your Transcript

Click on the button below to Submit


Projects

Inferring Structure from Unstructured Biological or Sensor Data

Keywords: computational biology, convex optimization, data analysis, time series

Many different applications contain massive sequences of timestamped observations. These observations often come from "sensors", which can be anything from gene expression readings to weather measurements, stock prices, neural fMRI scans, etc. For applications such as these, there is a need to develop optimization algorithms for time series data which are robust (practical/useful in many different settings) and scalable (since these datasets can get very large). With such algorithms, we can build systems that detect anomalies, spot trends, classify events, identify clusters, forecast future behavior, and solve many other problems at the intersection of time series analysis and machine learning. We hope to develop and implement these optimization algorithms, and to try them out on real-world sensor data provided to us by research collaborators from several different industries.

We are looking for students with a strong programming background and an interest in analyzing biological and/or sensor data. Experience with very large datasets is a plus.

Go to the application form.

Understanding Diet and Nutrition through Large-scale Food Logs

Keywords: data mining, big data analysis, machine learning, data visualization

The growing popularity of apps such as MyFitnessPal provides us with an unprecedented view of diet and nutritional choices and weight changes across millions of individuals. The data collected from such apps provides a unique opportunity to understand how to eat healthily and how to achieve or maintain a healthy weight and lifestyle. While understanding the interplay of different factors in diet decision-making and user behavior in the app is very important, the biggest challenge lies in supporting users in reaching their goals and helping them make good decisions when they need assistance. To understand how to best support users, we need to understand the decision-making process around when to eat, where to eat, and what to eat. This will enable us to identify realistic opportunities for small, targeted, and contextual recommendations that could help users stay true to their original goals in the moment. For example, knowing that a user viewed multiple menu items through the app and eventually logged one of them could help us understand where and when to encourage that particular user and remind them of how this decision might impact their short-term and long-term goals.

We are searching for students interested in exploring these questions using a food logging dataset covering several million people. The work includes data exploration, data cleaning, data visualization, feature extraction, and machine learning. Experience with Python (e.g., pandas, matplotlib/seaborn, sklearn), large datasets, data visualization, data mining, and machine learning is a plus.

Go to the application form.

Conversational Agents for Crisis Counseling

Keywords: data science for social good, chatbot agents, natural language processing, deep learning

Crisis hotlines have been around for years, but until recently there has been very little data on which counseling strategies are most effective at helping people cope. The recent emergence of text-based crisis help lines and the accumulation of large-scale datasets are changing that. We will build on our group's existing work on NLP for crisis counseling (http://timalthoff.com/docs/althoff-2016-mental_health.pdf, http://news.stanford.edu/2016/08/10/stanford-research-improve-counseling-crisis-help-lines/) and build a chatbot/conversational agent (generative model) that can generate suggested responses and simulate both the counselor and the patient. The goal is for this tool to be used by a large crisis text line to improve its counseling services.

We are searching for students interested in applying their technical skills to help people in crisis. The project will involve natural language processing and machine learning. Experience with NLP, and in particular deep learning techniques, is useful (e.g., CS224D and others), and prior experience with conversational agents/chatbots is a plus.

Go to the application form.

Dogmatism, Echo-Chambers, and Antisocial Behavior in Online Communities

Keywords: social network analysis, natural language processing, data mining

Online communities and comment forums are increasingly important venues for everyday social interaction, but they have also been associated with many antisocial behaviors. There is considerable anecdotal evidence that certain online environments foster dogmatic thinking and that the "echo-chamber" effect -- where individuals only interact with others who have similar viewpoints -- can be severely damaging to public discourse. Understanding the factors that lead some online communities to become healthy forums for reasonable discourse and others to become dogmatic "echo-chambers" is a crucial social question for the 21st century. The goal of this project is to analyze these issues through a quantitative lens. Using several large datasets, including all public Reddit comments from 2009 through 2015, we aim to understand what sorts of online environments foster specific antisocial behaviors (e.g., out-group derogation, "brigading"). A key innovation in this work is a focus on the community level. We do not simply want to detect individual acts of antisocial behavior -- we seek to understand how the collective properties of a community feed back into and shape individual users' behavior.

We are searching for students interested in exploring these questions using a combination of natural language processing and social network analysis. Working knowledge of Python is required. Experience dealing with large datasets, data visualization, data cleaning, R, social network analysis, or natural language processing is a plus.

Go to the application form.

SNAP: Stanford Network Analysis Platform

Keywords: graph algorithms, parallel algorithms

Stanford Network Analysis Platform (SNAP) is a general-purpose, high-performance network analysis and graph mining library that easily scales to massive networks with hundreds of millions of nodes and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. SNAP is constantly being expanded with new graph and network algorithms for big-memory multi-core machines with 12TB RAM and 288 CPU cores.

We are looking for students with an interest in developing sequential or parallel graph algorithms. SNAP is written in C++, so extensive experience with this language is a plus.

Go to the application form.

Ringo: In-memory Graph Exploration Engine

Keywords: systems programming, network analysis, parallel programming, distributed computing

What data analysis engine would you build if your computer had unlimited RAM and CPU? Large-scale data analysis is transforming science and industry. However, tools and solutions for data analysts are bulky and cumbersome to use. The goal of the Ringo research project is to build an interactive system for analysis of large datasets with billions of items. The system provides strong primitives to handle relational tables as well as multimodal networks -- huge graphs with node and edge attributes. Ringo will be based on the SNAP platform. We will run Ringo on machines with 12TB RAM and 288 CPU cores.

We are looking for students with strong programming skills and a desire to build computer systems. Ringo is written in Python, so extensive experience with this language is a plus.

Go to the application form.

Computational Social Psychology: Personality Effects in Online Behavior

Keywords: natural language processing, personality, social network analysis, data mining, machine learning

Our personalities help shape our interactions with the world and how others perceive and interact with us. While modern psychological theories have helped us understand and quantify differences in people's personalities, we know little about how personality affects us socially -- for example, do extroverts really have more friends, or do people tend to befriend others with the same personality? Using a dataset of several hundred thousand people's personalities, we aim to answer two questions. First, we ask to what extent personality can be inferred from how a person behaves online (e.g., what they write). Second, using a social network of tens of millions of users, we ask how personality affects social behavior, such as friendships, social interactions, and even community formation. This project will look at personality comprehensively using multiple theories from psychology (e.g., Big 5, MBTI) and use techniques from natural language processing and social network analysis to answer both questions.

We are searching for students interested in applying their technical skills to help understand society at scale. The project will involve natural language processing and machine learning. Experience with NLP techniques (e.g., CS224D and others), dealing with large datasets, data visualization, social network analysis, data mining, and/or psychology is a plus.

Go to the application form.

Massive Interactomics: Using Network Analysis to Study Life's Diversity

Keywords: computational biology, machine learning, network analysis

The tree of life is one of the most important organizing principles in biology. It provides a snapshot of the diversity within each major evolutionary lineage to answer queries such as: How closely are gorillas related to us? What is the ancestry of zebrafish? What is the most recent common ancestor of mouse and bacteria? Molecular and technological advancements have dramatically broadened the type of data that we can use to answer such queries. The massive interactomics project aims to mine evolutionary relationships by analyzing protein interaction networks from more than a thousand different species. The challenge of massive interactomics lies in developing new methods that can profile network structure while also accounting for biases and potential confounding in protein interaction data. Interesting research questions in this project include identifying network patterns that are specific to closely related species (e.g., mammals), describing evolutionary trajectories as transitions of proteins between species, and developing models that help us test hypotheses and answer as-yet-unresolved questions about the structure and extent of life's diversity.

We are looking for students who have experience with network analysis, machine learning, and statistical methods (e.g., CS224W, CS246, STATS200). Working knowledge of Python is required. Experience with C++ is a plus. Knowledge of basic biology is a plus.

Go to the application form.