Research Positions in the SNAP Group
Summer Quarter 2023-24

Welcome to the application page for research positions in the SNAP group under Prof. Jure Leskovec, Summer Quarter 2023-24!

Our group has open positions for Research Assistants and for students interested in independent study and research (CS191, CS195, CS199, CS399) on an exciting research project. These positions are open to Stanford University students only. The project is high-impact: participants do research on real-world problems and data, and the work leads to research publications or open source software. More details about the project are given below. Positions are often extended over several quarters. We are looking for highly motivated students with any combination of the following skills: machine learning, data mining, network analysis, algorithms, and computer systems.

Please apply by filling out and submitting the form below. Apply soon, since the positions usually fill quickly. Thanks for your interest!

If you have any questions, please contact Lata Nair at lnairp24@stanford.edu.

Application Form

First and Last Name

SUNetID

SUNetID is your Stanford CS login name and contact email address, <your_SUNetID>@cs.stanford.edu. If you don't have a SUNetID, use <your_last_name>_<your_first_name>, so if your last name is Smith and your first name is John, use smith_john.

Email

Department

Student Status

Project(s)

More details about the project are available below.

A 100 Billion Parameter Genomic Language Model [description]
Keywords: LLM Training, Foundation Model, AI for Science

Position

Please select the position(s) you are interested in. Select all that apply.

25% RA
50% RA
Independent study (CS191, CS195, CS199, CS399)

Statement of Purpose

Briefly explain why you would like to participate in this project, why you think you are qualified to work on it, and how you would like to contribute.

Your Resume

Your Transcript

Click the button below to submit.


Projects

A 100 Billion Parameter Genomic Language Model

Keywords: LLM Training, Foundation Model, AI for Science

We are developing the largest open source language model for science: a large language model for genomic sequences (DNA, RNA, and proteins), which we are training on a cluster of 1024 H100 GPUs (1/8th of the compute budget of GPT-4). We will open source not only the model but also, for the first time, the full training implementation and details. We seek students to help develop various aspects of our large-scale system, such as managing data, distributed communication (DeepSpeed ZeRO etc.), and building a distributed architecture, as well as running and managing experiments on the cluster.

We are looking for highly motivated students who have experience in machine learning, natural language processing, ML systems, and engineering (courses such as CS224W, CS224N, CS231N, and CS229 are helpful). A strong background in PyTorch is recommended. Experience with CUDA and distributed computing is a big plus.

Go to the application form.