Research Positions in the SNAP Group

Welcome to the application page for research positions in the SNAP group!

Our group does not have any open positions at the moment. Please have a look at some previous projects below and check back with us soon.


SnapVX: A Network-Based Convex Optimization Solver

Convex optimization has become a widely used approach to modeling and solving problems in many different fields. However, as applications get larger and more intricate, classical methods of convex analysis begin to fail due to a lack of scalability. The challenge of large-scale optimization lies in developing methods general enough to work well independent of the input and capable of scaling to the immense datasets that today's applications require. We are building SnapVX, a general solver for large-scale convex problems defined on networks, which can be applied to a variety of examples in machine learning, graph analysis, and more!
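To make "a convex problem defined on a network" concrete, here is a minimal sketch in plain Python. It does not use SnapVX or its API. The toy objective, the function names, and the parameter `lam` are all assumptions chosen for illustration: each node i holds a noisy value a[i], and the edges penalize disagreement between neighboring nodes.

```python
# Hypothetical toy problem, not SnapVX's API. We minimize
#   sum_i (x_i - a_i)^2  +  lam * sum_{(j,k) in edges} (x_j - x_k)^2
# which is convex, with one cost term per node and one per edge.

def solve_network_problem(a, edges, lam=1.0, sweeps=200):
    """Coordinate descent: minimize the objective exactly in one x_i at a time."""
    n = len(a)
    neighbors = [[] for _ in range(n)]
    for j, k in edges:
        neighbors[j].append(k)
        neighbors[k].append(j)

    x = list(a)  # start from the minimizer of the node costs alone
    for _ in range(sweeps):
        for i in range(n):
            # Setting the partial derivative in x_i to zero gives this closed form.
            total = a[i] + lam * sum(x[k] for k in neighbors[i])
            x[i] = total / (1.0 + lam * len(neighbors[i]))
    return x


def objective(x, a, edges, lam=1.0):
    """Evaluate the node costs plus the edge costs at the point x."""
    node_cost = sum((xi - ai) ** 2 for xi, ai in zip(x, a))
    edge_cost = lam * sum((x[j] - x[k]) ** 2 for j, k in edges)
    return node_cost + edge_cost


if __name__ == "__main__":
    a = [0.0, 1.0, 0.0, 1.0]
    edges = [(0, 1), (1, 2), (2, 3)]  # a small path graph
    x = solve_network_problem(a, edges)
    print(x, objective(x, a, edges))
```

A simple sweep like this is only practical for small graphs and this one quadratic objective; the scalability and generality described above are exactly what such a toy approach lacks.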

Combining Online Social Networks with Offline Physical Activity

The growing popularity of smartphones and wearable sensors provides us with an unprecedented view of physical activity and health outcomes across millions of individuals. The data collected from such devices (e.g., exercise, food intake, hydration, stress, sleep, heart rate, weight) provides a unique opportunity to understand how to encourage healthy physical activity and weight management to stem the inactivity epidemic that more and more countries are facing.

We are social creatures and enjoy sharing our lives with our peers, who also influence our behavior through encouragement, support, competition, or peer pressure. Online social networks capture human interactions on a grand scale and could enable us to better understand the behavioral and social factors that influence individuals' decisions to engage in regular physical activity. We are combining activity tracking data with online social interaction data to improve our understanding of how to make users and communities more successful and healthy.

Using Machine Learning to Analyze and Complement Human Decision Making

Understanding human decision making is a very challenging and exciting endeavor. Several diverse domains, such as health care, the judiciary, and insurance, provide invaluable insights into human decision-making capabilities. The high-level goals of this project involve identifying interesting patterns in collective as well as individual decision-making behavior, analyzing how these patterns evolve with time, and building frameworks that help us evaluate the quality of decisions. We also aim to build interpretable machine learning models that allow us to understand as well as predict the future decisions of experts such as doctors, judges, and insurance underwriters.

More specifically, for the Autumn quarter of 2015, we will be looking at data from a large insurance company that insures small to mid-scale businesses. The dataset consists of reports written to analyze the risk associated with a business, the decisions on whether a particular business was insured, and the outcomes and aftermath of an insurance policy after its approval.

We will be analyzing this dataset and answering several interesting questions, such as: How do the risk analysis teams of an insurance company make decisions, and what features do they consider? Can machine learning algorithms do a better job than humans in predicting the risks associated with insuring a business? How do we evaluate the decision-making ability of the risk analysis teams? How accurately can machine learning algorithms predict the future decisions of risk analysis teams?

SNAP: Stanford Network Analysis Platform

Stanford Network Analysis Platform (SNAP) is a general-purpose, high-performance network analysis and graph mining library that easily scales to massive networks with hundreds of millions of nodes and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. SNAP is being constantly expanded with new graph and network algorithms for big-memory multi-core machines with 1TB RAM and 80 CPU cores.
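As a small illustration of what "generates random graphs" and "calculates structural properties" can mean, the sketch below samples a random undirected graph and computes its degree histogram. It uses only the Python standard library and deliberately does not use SNAP's own API. The function names `random_graph` and `degree_histogram` are hypothetical, and this approach would not scale to the graph sizes mentioned above.

```python
import random
from collections import Counter


def random_graph(num_nodes, num_edges, seed=0):
    """Sample an undirected simple graph with exactly num_edges distinct edges."""
    max_edges = num_nodes * (num_nodes - 1) // 2
    if num_edges > max_edges:
        raise ValueError("too many edges for a simple graph")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < num_edges:
        u, v = rng.sample(range(num_nodes), 2)  # two distinct endpoints, so no self-loops
        edges.add((min(u, v), max(u, v)))       # store each edge once as (low, high)
    return edges


def degree_histogram(num_nodes, edges):
    """Map each degree value to the number of nodes that have that degree."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree)


if __name__ == "__main__":
    edges = random_graph(num_nodes=1000, num_edges=5000)
    for deg, count in sorted(degree_histogram(1000, edges).items()):
        print(deg, count)
```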

Ringo: In-memory Graph Exploration Engine

What data analysis engine would you build if your computer had unlimited RAM and CPU? Large-scale data analysis is transforming science and industry. However, tools and solutions for data analysts are bulky and cumbersome to use. The goal of the Ringo research project is to build an interactive system for analysis of large datasets with billions of items. The system will implement strong primitives to handle relational tables as well as networks -- huge graphs with node and edge attributes. Ringo will be based on the SNAP platform. We will run Ringo on machines with 1TB RAM and 80 CPU cores.

Demographic Inference in Social Media

The social side of the web sees huge amounts of data posted to it daily. However, the authors behind this content are often partly shrouded in mystery: Who are they? How old are they? What kind of job do they have? What part of the world are they from? What languages do they speak? Characterizing online populations can provide critical insight into social phenomena like rumor spreading and influence or real-world events like flu outbreaks. While we can identify some users' attributes when they explicitly provide the information, we are left to infer many of these latent attributes for the vast majority of users. We aim to create new methods that combine social network analysis and text processing to infer many kinds of demographic attributes for social media users.