CS224W:

Social and Information Network Analysis

Autumn 2011

Tuesday & Thursday 9:30AM - 10:45AM in Gates B3

In the first two weeks of the class we will also hold 4 recitation sessions:

- Review of basic probability (Gates B01, 4~6pm on Thu 9/29)
- Review of basic linear algebra (Gates B03, 4~6pm on Fri 9/30)
- Introduction to SNAP (Gates B01, 4~6pm on Thu 10/6)
- Introduction to NetworkX (Gates B03, 4:15~6:15pm on Fri 10/7)

- 1: Introduction
- 2: Six degrees of separation
- 3: Models of the small world, Decentralized search
- 4: Small world phenomena, Search in P2P networks, Strength of weak ties
- 5: Graph structure of the web
- 6: Power-laws and Preferential attachment
- 7: Models of network evolution (1)
- 8: Models of network evolution (2)
- 9: Cascading behavior in networks
- 10: Models of network cascades
- 11: Cascades in viral marketing and the blogosphere
- 12: Influence maximization in networks
- 13: Detecting cascades in networks
- 14: Finding communities and clusters in networks
- 15: Spectral clustering and large scale community structure in networks
- 16: Modularity and large scale community structure in networks
- 17: Kronecker graphs
- 18: Link analysis for Web search
- 19: Networks with positive and negative edges

NO LATE DAYS) |
||

See FAQ for information on how to submit assignments and other work.

World Wide Web, blogging platforms, instant messaging and Facebook can be characterized by the interplay between rich information content, the millions of individuals and organizations who create and use it, and the technology that supports it.

The course will cover recent research on the structure and analysis of such large social and information networks and on models and algorithms that abstract their basic properties. Class will explore how to practically analyze large scale network data and how to reason about it through models for network structure and evolution.

Topics include methods for link analysis and network community detection, diffusion and information propagation on the web, virus outbreak detection in networks, and connections with work in the social sciences and economics.

- Knowledge of basic computer science principles at a level sufficient to write a reasonably non-trivial computer program. (e.g., CS107 or CS145 or equivalent are recommended)
- Familiarity with the basic probability theory. (CS109 or Stat116 is sufficient but not necessary.)
- Familiarity with the basic linear algebra (any one of Math 51, Math 103, Math 113, or CS 205 would be much more than necessary.)

There is no official text for this course. Notes and reading assignments will be posted periodically on the course web site. The following books are recommended as optional reading:

- Networks, Crowds, and Markets: Reasoning About a Highly Connected World by David Easley and Jon Kleinberg (
**FREE!**). - Networks: An introduction by Mark Newman.

Course handouts and other reading materials can be downloaded here.

The coursework for the course will consist of:

- Two homeworks
- Reaction paper
- Project proposal
- Open ended competition
- Substantial course project

The idea for the problem sets is to practice some skills that will be required for the project. The homeworks will contain written questions and questions that require some programming. Specifically, we will be working both on mathematical models of networks and analyzing real network data, for example, how to find stable sets in structural balance theory or how to optimally seed the network to maximize the influence. Second, we will also work with network datasets to get a flavor of types of questions one asks in network analysis. For example, using citation data create a small citation network, compute degree distributions, clustering coefficients, node centralities.

** Questions:** We try very hard to make questions unambiguous, but some ambiguities may remain. Ask (i.e., post a question on Piazza) if confused or state your assumptions explicitly. Reasonable assumptions will be accepted in case of ambiguous questions.

** Honor code:** We strongly encourage students to form study groups. Students may discuss and work on homework problems in groups. However, each student must write down the solutions independently, and without referring to written notes from the joint session. In other words, each student must understand the solution well enough in order to reconstruct it by him/herself. In addition, each student should write on the problem set the set of people with whom s/he collaborated.

Further, since we occasionally reuse problem set questions from previous years, we expect students not to copy, refer to, or look at the solutions in preparing their answers. It is an **honor code violation to intentionally refer to a previous year's solutions**. This applies both to the official solutions and to solutions that you or someone else may have written up in a previous year.

** Late assignments:** Each student will have a total of

** Assignment submission:** To hand in an assignment, write
down the date and time of submission, and leave it in the submission
cabinet on 1st floor Gates building, near the
east entrance. You can find a photo of
it here. It is an honor code violation to write down the wrong time.

Regular (non-SCPD) students should submit hardcopies of assignments. Additional materials like source code should be submitted via assignment submission site. Please do not email your homework solutions to us.

If you are an SCPD student, you should submit all your files via
assignment submission site.Then also send us an e-mail
to the submissions e-mail: cs224w.fall11@gmail.com
(Note: this is different from the normal staff e-mail @stanford.edu). Write "Assignment XXX Submission: SUNetID" on the Subject
of the email, where XXX is one of the following: {HW1, HW2, Reaction,
Proposal, Milestone, Final}. And SUNetID is your Stanford University
Network ID. **Do not attach the homework to the email.**

The course is based on material from the last few years. This means that most of it in form of research papers, which raise lot of interesting issues that have yet to be explored. The goal of the reaction paper is that students familiarize themselves more in depth with the material covered in class, do reading beyond what was covered in class.

Students can work in groups of up to 3 people on the reaction paper.

Students will pick at least two or three related papers where at least one has been mentioned in class or the book (see this and previuos year's course handouts) or any other paper clearly related to course topics (if in doubt check with curse staff about the papers you aim to read). Students should carefully read the papers and write a short approximately 3-5 pages reaction paper about the content of the chosen papers. You should be thinking beyond what you read, and not just take other people's work for granted. The reaction paper should address the following questions:

**Summary**- What is main technical content of the papers?
- How do papers relate to the topics presented in the course?
- What is the connection between the papers you are discussing?

**Critique**- What are strengths and weaknesses of the papers and how they be addressed?
- What were the authors missing?
- Was anything particularly unrealistic?

**Brainstorming**- What are promising further research questions in the direction of the papers?
- How could they be pursued?
- An idea of a better model for something? A better algorithm? A test of a model or algorithm on a dataset or simulated data?

Reaction papers should not just be summaries of the papers you read. The last two bullets should form the most substantial part of the document. Answering these questions can be a very good way to explore a potential project topic. The reaction paper should be concluded with a section with a description of some promising further research directions and questions, and how could they be pursued. The reaction paper has to include at least some amount of each of the following two types of content:

- A test of a model or algorithm (that you have read about or your own) on a dataset or on simulated data
- A proposal for a model or algorithm that potentially extends or improves the topics discussed in the papers you've read.

In prior version of the course, the reaction paper has been a very good way to explore a potential project topic.

The idea of the reaction papers is: Examples of reaction papers:- Regarding the Flow of Influence and Social Meaning Across Social Media Network by Mahalia Miller & Daniel Wiesenthal.
- Beyond Citation Network: Analyzing Publication Data Set as a Multi-Layer Hypergraph by Jingyu Cui & Fan Wang & Jinjian Zhai.
- Self-similar Networks by Adithya Rao & Gautam Kumar Parai & Sandeep Sripada.
- Applying Decentralized Search Algorithms to High School Data by Alexander Loewi & Jan Overgoor & Evan Rosen.
- Language Identification by Peter Pham & Karen Shiell.
- Center-Piece Subgraphs in Social Networks by Abhik Lahiri & Rifat Reza Joyee & Pranav Khaitan.
- Network Analysis and Modeling by Beyang Liu.

There can be three kinds of class projects:

- Experimental evaluation of algorithms and models on an interesting dataset.
- A theoretical project that considers a model, an algorithm or a network property (measure) and derives a rigorous result about it.
- An in-depth critical survey of one of the course topics relating models, experimental results and underlying social theories and offering a novel perspective on the area.

Ideally, projects will be a mix of the three types of projects outlined above. For some useful **project themes**, go here. As with the reaction paper, the project should contain at least some amount of mathematical analysis, and some experimentation on real or synthetic **data**. You can also check 2009 student project reports and
2010 student project reports.

There are four deliverables (Click the respective deliverable to know more):

- Project proposal (15% of the project grade)
- Project milestone report (15% of the project grade)
- Final project writeup (60% of the project grade)
- Poster presenting your work for a special poster session (10% of the project grade)

You can work in groups of 3 people on the project.

Student project reports from Fall 2009 class.

Student project reports from Fall 2010 class.

The project proposal should build on the reaction paper. The proposal should survey the related work and identify what are strengths and weaknesses of the papers and how they may be addressed. The proposal should then focus on what are some promising further research directions and questions. How precisely do you plan to pursue them? What methods/data do you plan to use? The proposal should contain at least some amount of each of the following two types of content:

- A test of a model or algorithm (that you have read about or your own) on a dataset or on simulated data
- A proposal for a model or algorithm that potentially extends or improves the topics discussed in the papers you've read.

- What is the problem you are solving?
- What data will you use (how will you get it)?
- How will work do you plan to do the project?
- Which algorithms/techniques/models you plan to use/develop? Be as specific as you can!
- Who will you evaluate your method? How will you test it? How will you measure success?
- What do you expect to submit/accomplish by the end of the quarter?

Some other points to note:

- The project should contain at least some amount of mathematical analysis, and some experimentation on real or synthetic data
- The result of the project will typically be a 10 page paper, describing the approach, the results, and the related work.

You can work in groups of up to 3 people on the project.

Here are some examples of past project proposals:- Networks & Sentiment by Jan Overgoor & Evan Rosen.
- Self-similar Networks by Adithya Rao & Gautam Kumar Parai & Sandeep Sripada.
- Regarding the Flow of Sentiment Across Social Media Networks by Mahalia Miller &Conal Sathi & Daniel Wiesenthal.
- Community in Networks by Giannis Neokleous & Amir Ghazvinian & Pei-yu Wang.
- Information Flows on Twitter by Huang-Wei Chang & Te-Yuan Huang.
- Exploring Disciplinary in Departments by Susan Biancani.

- Think of this as a draft of your final report but without your major results.
- We expect that you have completed 30% of the project
- Provide a complete picture of your project even if certain key parts have not yet been implemented/solved.
- Include the parts of your project which have been completed so far, such as:
- Thorough introduction of your problem
- Review of the relevant prior work
- Description of the data collection process
- Description of any initial findings or summary statistics from your dataset
- Description of any mathematical background necessary for your problem
- Formal description of any important algorithms used
- Description of general difficulties with your problem which bear elaboration
- Make sure to at least outline the parts which have not yet been completed so that it is clear specifically what you plan to do for the final version.
- Recommended length 3-5 pages

**Introduction/Motivation/Problem Definition (15%)**

What is it that you are trying to solve/achieve and why does it matter.**Prior Work (10%)**

How does your project relate to previous work. Please give a short summary on each paper you cite and include how it is relevant.**Model/Algorithm/Method (30%)**

This is where you give a detailed description of your primary contribution. It is especially important that this part be clear and well written so that we can fully understand what you did.**Results and findings (35%)**

How do you evaluate your solution to whatever empirical, algorithmic or theoretical question you have addressed and what do these evaulation methods tell you about your solution. It is not so important how well your method performs but rather how interesting and clever your experiments and analysis are. We are interested in seeing a clear and conclusive set of experiments which successfully evaluate the problem you set out to solve. Make sure to interpret the results and talk about what can we conclude and learn from your evaluations. Even if you have a theoretical project you should have something here to demonstrate the validity or value of your project (for example, proofs or runtime analysis).**Style and writing (10%)**

Overall writing, grammar, organization and neatness.

The grading will be based:

- 30% on problem sets
- 10% on reaction paper
- 60% on the final project

General course questions should be posted Piazza.

Piazza requires @stanford.edu emaill address to register. If you do not have @stanford.edu address, send us email with your email address and we will register you.

If you need to reach the course staff, you can reach us at cs224w-aut1112-staff@lists.stanford.edu (consists of the TAs and the professor).