Research Positions in the SNAP Group
Fall Quarter 2017-18

Note: The candidates for these positions have been selected. If you have not been contacted by someone from our group, then you were not selected for a position. We will have more openings in the future, and we encourage you to apply at that time.

Welcome to the application page for research positions in the SNAP group, Fall Quarter 2017-18!

Our group has open positions for Research Assistants and students interested in independent studies (CS191, CS199, CS399). These positions are available for Stanford University students only. Below are some of the possible research projects. All projects are high-impact, allowing participants to perform research and work on real-world problems and data, and leading to research publications or working systems. Positions are often extended over several quarters. We are looking for highly motivated students with any combination of skills: data mining, machine learning, algorithms, social network analysis, and computer systems.

Please apply by filling out and submitting the form below. Apply quickly since the positions usually get filled early in the quarter. Thanks for your interest!

If you have any questions, please contact Prof. Leskovec at jure@cs.stanford.edu.

Application form

First and Last Name

SUNetID

SUNetID is your Stanford CS login name and contact email address, <your_SUNetID>@cs.stanford.edu. If you don't have a SUNetID, use <your_last_name>_<your_first_name>, so if your last name is Smith and your first name is John, use smith_john.

Email

Department

Student Status

Project(s)

Please select all the projects that you are interested in. You can find the project descriptions below.

Machine Learning for Sensor Data [description]
Keywords: data analytics, convex optimization, deep learning, computer systems
Modeling Nation's Economy through Bank Transaction Data [description]
Keywords: graph mining, big data analysis, economics
Open-domain Social Media Analysis [description]
Keywords: text mining, natural language processing, social media analysis
Inter-community Interaction and Conflict in Social Networks [description]
Keywords: social network analysis, computational social science
Mining Data Science Patterns [description]
Keywords: big data analysis, data science, recommender systems
Representation Learning on Complex Networks: Beyond Node Embeddings [description]
Keywords: deep learning, representation learning
From Bytes to Cells: Understanding Genotype-Phenotype Relationships through Large Biomedical Data [description]
Keywords: computational biology, network analytics, machine learning
Combating Sedentary Behaviour Using a Smartwatch Intervention [description]
Keywords: mobile and Web applications, data science for social good
SNAP: Stanford Network Analysis Platform [description]
Keywords: network analysis, open-source software, graph algorithms, parallel algorithms

Position

Please select the position(s) you are interested in. Select all that apply.

25% RA
50% RA
Independent study (CS399, CS199, CS191)

Statement of Purpose

Briefly explain why you would like to participate in this project, why you think you are qualified to work on it, and how you would like to contribute.

Your Resume

Your Transcript

Click on the button below to Submit


Projects

Machine Learning for Sensor Data

Keywords: data analytics, convex optimization, deep learning, computer systems

Many applications contain massive sequences of multidimensional timestamped observations. These observations often come from "sensors", which can be anything from gene expression readings to weather measurements, stock prices, neural fMRI scans, etc. For applications such as these, there is a need for a single platform that can run state-of-the-art optimization and machine learning algorithms that are robust (practical and useful in many different settings) and scalable (since these datasets can get very large). The algorithms allow us to detect anomalies, spot trends, classify events, identify clusters, forecast future behavior, and more. For this project, we plan on developing an analytics platform, implementing state-of-the-art optimization algorithms, and then using our system to learn from real-world sensor data provided to us by research collaborators from several different industries (automotive, IoT, manufacturing, and more!).

We are looking for students with strong machine learning and programming skills, an interest in building large-scale systems, and a desire to work on real-world applications. Experience dealing with very large datasets is a plus.

Go to the application form.

Modeling Nation's Economy through Bank Transaction Data

Keywords: graph mining, big data analysis, economics

Bank transactions are among the most fundamental signals we can observe in an economy. They can reflect dynamics in the economy, indicate supply and value chains between companies, and indicate the prosperity or troubles of a company or an industry sector. The goal of this project is to analyze a unique dataset containing domestic and international transactions of an entire country for over a decade. We aim to use network analysis and graph mining methods to understand the structure of a nation's economy from the perspective of transaction data, model how money flows through the economy, identify and quantify value chains, and understand what factors contribute to successful and failing companies.

We are searching for students interested in exploring these questions using unique bank transaction data. The work includes data exploration, data cleaning, data visualization, and machine learning. Experience in C++ or Python and in dealing with network data is a plus.

Go to the application form.

Open-domain Social Media Analysis

Keywords: text mining, natural language processing, social media analysis

Traditionally, opinion analysis was done through polls and questionnaires, which made it costly, limited it to a small sample of the population, and ruled out real-time updates. The popularity of social media allows automatic methods to process online discussions in a cost-effective manner, covering a much larger population and providing real-time updates as new discussions are published. The goal of this project is to develop a system for opinion analysis that allows ad-hoc queries on a specific target (e.g., a product, book, or movie) by comparing its mentions to the mentions of its peers (e.g., similar products, books, or movies) in order to identify relevant aspects and the target's position along those aspects. We will develop deep learning approaches for extraction and modeling of aspects and opinions.

We are looking for students with an interest in working with textual data and developing algorithms. Experience with natural language processing, text mining, and deep learning frameworks is a plus.

Go to the application form.

Inter-community Interaction and Conflict in Social Networks

Keywords: social network analysis, computational social science

Social networks and online discussion platforms enable users to form interest-based communities, express opinions, and interact with others. Interactions between different communities are complex. Some interactions are positive, leading to increased user engagement and loyalty, while others are negative, such as conflicts and raids. The goal of this project is to study various aspects of inter-community interactions and develop computational models and algorithms to model them. Using several large datasets, including comments and posts from Reddit, Disqus, and Wikipedia, we will study cross-community mobilization, conflict, and growth and decline of communities. Both user-level and community-level interactions between users and content are integral aspects of this project.

We are searching for students interested in exploring these questions using a combination of social network analysis and natural language processing. Working knowledge of Python and Hadoop is required. Prior experience dealing with large datasets, data mining, social network analysis, and natural language processing is a plus.

Go to the application form.

Mining Data Science Patterns

Keywords: big data analysis, data science, recommender systems

Data scientists often develop a standard set of software patterns to analyze data and gather insights from it. However, these patterns are often repetitive, and best practices are scattered across Stack Overflow, GitHub, and IPython notebooks. The goal of this project is to identify frequent data science patterns and build a recommendation engine that will help a data scientist automatically analyze a given dataset. The challenge will be in identifying and extracting common patterns and then developing a recommendation engine that can match them.

We are searching for students with a strong programming background (especially in Python) and an interest in data science. Familiarity with code parsing, interpreters, and compilers would be a great plus.

Go to the application form.

Representation Learning on Complex Networks: Beyond Node Embeddings

Keywords: deep learning, representation learning

Representation learning on complex networks is an important avenue for advancing our understanding of social and biological dynamics. Using representation learning, we can generate embeddings that compress high-dimensional network information into low-dimensional feature vectors. These embeddings encode network structures (such as edges between nodes) as geometric relationships, and are crucial for many tasks. For example, after generating embeddings of proteins, genes, and chemicals in a biological interaction network, we can use distances in the learned embedding space to predict novel interactions and assist in drug design. In this project, we will seek to push the state of the art in representation learning on complex networks, using recent advancements in deep learning. While much of the previous work has focused on embedding individual nodes in relatively simple networks, we will seek to design systems that can: (i) handle complex, heterogeneous networks with multiple types of nodes and edges (e.g., proteins, genes, and chemicals), (ii) embed entire subgraphs (e.g., social groups), and (iii) scale to networks with millions of nodes and billions of edges.
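To make the idea of predicting interactions from embedding geometry concrete, here is a minimal sketch. It is not the group's method: the node names, the 3-dimensional vectors, and the helper `rank_candidate_links` are all hypothetical, and cosine similarity is used as just one possible notion of closeness in the embedding space.

```python
import math
from itertools import combinations

# Hypothetical 3-dimensional embeddings for a toy biological network.
# Node names and vector values are made up for illustration only.
embeddings = {
    "protein_A": [0.9, 0.1, 0.0],
    "protein_B": [0.8, 0.2, 0.1],
    "gene_X": [0.1, 0.9, 0.2],
    "chemical_Q": [0.0, 0.2, 0.9],
}

# Edges already observed in the toy network, stored as unordered pairs.
known_edges = {frozenset(("protein_A", "gene_X"))}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidate_links(embeddings, known_edges):
    """Score every node pair that has no known edge, most similar first."""
    scored = []
    for a, b in combinations(sorted(embeddings), 2):
        if frozenset((a, b)) in known_edges:
            continue  # skip pairs that are already connected
        score = cosine_similarity(embeddings[a], embeddings[b])
        scored.append((score, a, b))
    return sorted(scored, reverse=True)


ranked = rank_candidate_links(embeddings, known_edges)
for score, a, b in ranked:
    print(f"{a} - {b}: {score:.3f}")
```

In this toy setting, pairs whose vectors point in similar directions rank highest and would be the first candidates to examine as possible new interactions. Real systems learn the embeddings from the network rather than writing them by hand.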

We are looking for highly motivated students who have experience with network analysis, machine learning, and deep learning (e.g., CS224W, CS229, CS224N, CS231N). Working knowledge of Python is required, and applicants should have some experience with a deep learning framework (e.g., TensorFlow, Torch, or Theano).

Go to the application form.

From Bytes to Cells: Understanding Genotype-Phenotype Relationships through Large Biomedical Data

Keywords: computational biology, network analytics, machine learning

What analytics methods would you develop to harness data at all scales for biomedical progress? Molecular data have already shown their value. The presence of a mutation in a gene can put people in high- or low-risk groups for various diseases. Even more would be possible if a person's molecular data were placed in the context of that person's health and behavior. Disease registries, health-insurance and hospital records, as well as research publications could be useful here. The big question is how to take the next step and convert this potential into actionable knowledge. Datasets derived from health records, the numbers of which are skyrocketing, will be messy but critical for medical progress. Large datasets, systematic or not, inevitably throw up spurious correlations, and so recognizing meaningful patterns remains challenging. The project lies at the intersection of machine learning, network science, and computational biology. Its focus will be to develop new analytics approaches for data-intensive challenges that will guide biomedical sciences to the next frontier.

We are looking for students with experience in networks, machine learning and statistics (e.g., CS224W, CS246, STATS200 and others), and an interest in working with large biological and clinical datasets. Working knowledge of Python is required and experience with C++ is a plus.

Go to the application form.

Combating Sedentary Behaviour Using a Smartwatch Intervention

Keywords: mobile and Web applications, data science for social good

Sedentary behaviour, including sitting and lying, is detrimental to one's health. It has been associated with obesity, diabetes, cardio-metabolic syndrome, and morbidity, independent of physical activity levels. Wearable devices, including smartwatches, are fundamentally changing how activity and health can be measured and managed, offering a new opportunity to combat sedentary behaviour. Many commercial wearable applications are currently tracking sedentary bouts and using alerts to motivate users to get up and move. Yet the efficacy of these applications, and how best to design them, remain largely unknown. The purpose of this study is three-fold. First, to test if real-time (i.e., just-in-time) messages from a smartwatch intervention are more effective at reducing sedentary behaviour. Second, to determine how message content affects an individual's sedentary behaviour and rate of motivational fatigue. Third, to test if an adaptive intervention, which tailors message content to account for individual preference and motivational fatigue, can produce greater and more lasting reductions in sedentary behaviour.

We are searching for students interested in continuing to develop our smartphone/smartwatch mobile app platform, which currently consists of a Pebble watch app and a Heroku backend. Experience in C and JavaScript is required, and experience with MongoDB, Heroku, and Python is a plus.

Go to the application form.

SNAP: Stanford Network Analysis Platform

Keywords: network analysis, open-source software, graph algorithms, parallel algorithms

Stanford Network Analysis Platform (SNAP) is a general-purpose, high-performance network analysis and graph mining library that easily scales to massive networks with billions of nodes and edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. SNAP is being constantly expanded with new graph and network algorithms for big-memory multi-core machines with 12 TB of RAM and 288 CPU cores.

We are looking for students with an interest in contributing to the SNAP codebase or in developing sequential or parallel graph algorithms. SNAP is written mostly in C++, so experience with this language is a plus. There are no RA positions available for this project; only students interested in independent studies (CS191, CS199, CS399) will be considered here.

Go to the application form.