Research Positions in the SNAP Group

Postdoctoral Fellowship

As part of the Mobilize Center at Stanford University, a newly established National Institutes of Health (NIH) Big Data to Knowledge (BD2K) Center of Excellence, the SNAP Group has openings for several Distinguished Postdoctoral Fellows.

Please visit Call for Applications: Distinguished Postdoctoral Fellows for more information.


SNAP Research Assistants

Welcome to the application page for research positions in the SNAP group!

Our group has one open position for a Research Assistant. This position is available for Stanford University students only.

Large-scale news mining

If you had access to all news articles on the Web, what could you learn about how people are talked about by the media and how coverage changes in the face of special events, such as marriage, professional success, accidents, or death? Our group has access to a nearly complete record of the U.S. online media landscape over the last 6 years (120 TB of text), and your job would be to help us explore these questions.

We are searching for students interested in large-scale data analysis, natural language processing, and machine learning. Since the dataset consists of 120 TB of text, candidates should have experience with Hadoop (e.g., CS246) and large-scale data processing.

Please apply by filling out and submitting the form below. Apply soon, since positions usually fill quickly. Thanks for your interest!

If you have any questions, please contact Prof. Leskovec at jure@cs.stanford.edu.


Application form

First and Last Name

SUNetID

SUNetID is your Stanford CS login name and contact email address, <your_SUNetID>@cs.stanford.edu. If you don't have a SUNetID, use <your_last_name>_<your_first_name>, so if your last name is Smith and your first name is John, use smith_john.

Email

Department

Student Status

Project(s)

Please select all the projects that you are interested in.

Large-scale news mining [description]

Position

Please select the position(s) you are interested in. Select all that apply.

25% RA
50% RA
Independent study (CS399, CS199, CS191)

Statement of Purpose

Briefly explain why you would like to participate in this project and why you think you are qualified to work on it.

Your Resume

Your Transcript

Click on the button below to Submit


Previous Projects

This is a list of previous projects with short descriptions, to give you a better understanding of what our group usually looks for.

Machine Learning for Social Media Recommender Systems

Our social networks overload us with information, bombarding us with thousands of tweets, blog posts, and status updates every day. To cope with this "information overload", there is a need to identify content that users will find relevant, interesting, and important. This requires us to develop statistical models of user behavior in order to discover users' preferences. The problem also involves large-scale machine learning and optimization tools in order to recommend meaningful content.

We are searching for students interested in machine learning, optimization, and statistical modeling, with strong algorithmic backgrounds. Strong coding experience in Python/C++ is a plus.

Go to the application form.

Missing Link Prediction on Wikipedia

Wikispeedia is an online human-computation game, where the goal is to find a short path between two given Wikipedia articles by clicking existing Wikipedia links. However, important links are often missing, and identifying them is known as the network completion, or link prediction, problem. The goal of this project is to predict missing links on Wikipedia, using data collected through Wikispeedia. If many users went through article A when looking for target T, but A has no links to T, then the method will suggest a new link from A to T. This project will use Wikispeedia data to predict missing links on Wikipedia and then develop a framework for gamifying website navigation beyond Wikipedia.
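The suggestion rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual method or data format: the function name suggest_missing_links, the representation of each game as a (path, target) pair, and the min_games threshold (standing in for "many users") are all hypothetical.

```python
from collections import Counter


def suggest_missing_links(games, existing_links, min_games=2):
    """Suggest links A -> T for articles A often visited on the way to T.

    games: iterable of (path, target) pairs, where path is the list of
        articles a player clicked through while looking for target
        (assumed format).
    existing_links: set of (source, target) pairs that already exist.
    min_games: assumed threshold standing in for "many users".
    """
    counts = Counter()
    for path, target in games:
        # Count each article at most once per game; skip the target itself.
        for article in set(path) - {target}:
            if (article, target) not in existing_links:
                counts[(article, target)] += 1
    # Keep candidates seen in enough games, most frequent first.
    return [(pair, n) for pair, n in counts.most_common() if n >= min_games]


# Toy data, invented purely for illustration.
games = [
    (["Cat", "Mammal", "Animal", "Zoo"], "Zoo"),
    (["Dog", "Mammal", "Animal", "Zoo"], "Zoo"),
    (["Lion", "Animal", "Zoo"], "Zoo"),
]
existing_links = {
    ("Cat", "Mammal"),
    ("Dog", "Mammal"),
    ("Lion", "Animal"),
    ("Mammal", "Animal"),
    ("Animal", "Zoo"),
}

suggestions = suggest_missing_links(games, existing_links, min_games=2)
print(suggestions)
```

In this toy example, "Mammal" appears in two games that ended at "Zoo" but has no direct link to it, so it is the only pair that reaches the threshold. A real system would likely need a more careful score than a raw count, for example one that accounts for how popular an article is overall.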

Programming experience in Java is desirable, since some existing code is written in Java. As the project will involve the gamification of Web-browsing, creative thinking and Web-programming experience (HTML, JavaScript, PHP, SQL) are a big plus.

Go to the application form.

Do Birds of a Feather Flock Together: Exploring the Similarities and Differences in Online Behaviour Between Facebook Friends

Social scientists claim that friendship groups are usually homogeneous, that is, their members share similar traits and behaviours. This is usually explained by people being attracted to similar others and encouraging others to conform to the norms accepted in a given group. In reality, however, the extent to which people select and influence their friends is unknown. Social media makes it possible to observe people at an unprecedented scale, which allows us to explore this issue further. Which behaviours and preferences are shared between friends and which are not? Are some people 'incompatible' with each other?

We are searching for students interested in exploring these questions using a Facebook-based dataset of 6 million people. Programming experience in Python, C++ or R is necessary. Experience with databases and large sparse matrices is a big advantage. We expect you to write up and publish your results.

Go to the application form.

SNAP: Stanford Network Analysis Platform

Stanford Network Analysis Platform (SNAP) is a general-purpose, high-performance network analysis and graph mining library that easily scales to massive networks with hundreds of millions of nodes and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. SNAP is constantly being expanded with new graph and network algorithms for big-memory multi-core machines with 1 TB of RAM and 80 CPU cores.

We are looking for students interested in developing sequential or parallel graph algorithms. SNAP is written in C++, so extensive experience with this language is a plus.

Go to the application form.

Ringo: In-memory Graph Exploration Engine

What data analysis engine would you build if your computer had unlimited RAM and CPU? Large-scale data analysis is transforming science and industry. However, tools and solutions for data analysts are bulky and cumbersome to use. The goal of the Ringo research project is to build an interactive system for analysis of large datasets with billions of items. The system will implement strong primitives to handle relational tables as well as networks -- huge graphs with node and edge attributes. Ringo will be based on the SNAP platform. We will run Ringo on machines with 1 TB of RAM and 80 CPU cores.

We are looking for students with strong programming skills and a desire to build computer systems. Ringo is written in C++ and Python, so extensive experience with those languages is a plus.

Go to the application form.

Snapworld: A System for Processing Tera-Scale Graphs

Large graphs are fundamental to big data science and analytics. Processing such graphs is challenging, since it pushes the limits of current computing systems. Snapworld is a distributed framework for executing large computations on a compute cluster with over 1000 cores, based on the BSP (Bulk Synchronous Parallel) model. The goal of the project is to advance Snapworld and develop graph algorithms that can handle tera-scale graphs, that is, graphs with trillions of edges.

We are looking for students with strong programming skills and a desire to build distributed computer systems and algorithms. Most of Snapworld is written in Python, with some time-sensitive modules in C++ using SNAP, so extensive experience with those languages is a plus.

Go to the application form.