In conjunction with the International Conference on Knowledge Discovery & Data Mining (KDD 2010)

Program - July 25, 2010
8:45 - 9:00am

Opening remarks

9:00 - 9:50am

Social Media Analytics: the Value Proposition
Invited talk: Rohini Srihari

9:50 - 10:30am

Coffee break + Poster session

10:30 - 10:50am

Towards detecting influenza epidemics by analyzing Twitter messages
Aron Culotta

10:50 - 11:10am

Statistical Measure of Quality in Wikipedia
Sara Javanmardi and Cristina Lopes

11:10 - 12:00pm

Identifying Peer Influence in Massive Social Networks
Invited talk: Sinan Aral

12:00 - 2:00pm

Lunch + Poster session

2:00 - 2:20pm

Twitter Under Crisis: Can we trust what we RT?
Marcelo Mendoza, Barbara Poblete and Carlos Castillo

2:20 - 2:40pm

Identifying Breakpoints in Public Opinion
Cuneyt Gurcan Akcora, Murat Ali Bayir, Murat Demirbas and Hakan Ferhatosmanoglu

2:40 - 3:00pm

Empirical Study of Topic Modeling in Twitter
Liangjie Hong and Brian D. Davison

3:00 - 3:30pm

Coffee break

3:30 - 4:20pm

Online Shopping and Social Media
Invited talk: Natalie Glance

4:20 - 4:40pm

Causal Discovery in Social Media Using Quasi-Experimental Designs
Huseyin Oktay, Brian J. Taylor and David D. Jensen

4:40 - 5:00pm

Discussion + Closing remarks

Invited Speakers

Natalie S. Glance is a software engineer and manager at Google working on online shopping search. A physicist by training, she has research interests at the intersection of social networks, information extraction and discovering global behavior from local actions. In recent years, her research has focused on mining consumer sentiment and buzz from online social media. Before joining Google, Natalie held research scientist appointments at BuzzMetrics (formerly Intelliseek; 2002-2007), WhizBang! Labs (2000-2002) and Xerox Research Centre Europe (1994-2000).

Rohini Srihari is an educator, scientist and entrepreneur. She has founded two language technology companies: Cymfony, and more recently, Janya - where she currently serves as CEO. Janya is a leading provider of text analytics solutions and services to commercial and government customers, including several within the DoD. Dr. Srihari is also a faculty member in the Dept. of Computer Science and Engineering at the University at Buffalo. She has published extensively in the areas of information extraction, text mining, social media mining, and multilingual text analysis. She received a B. Math. from the University of Waterloo, Canada and a PhD in Computer Science from the University at Buffalo.

Lillian Lee is a professor of computer science at Cornell University. Her research interests include natural language processing, information retrieval, and machine learning. She is the recipient of the inaugural Best Paper Award at HLT-NAACL 2004 (joint with Regina Barzilay), a citation in "Top Picks: Technology Research Advances of 2004" by Technology Research News (also joint with Regina Barzilay), and an Alfred P. Sloan Research Fellowship, and her group's work has been featured in the New York Times.

Sinan Aral obtained his PhD in Information Systems from MIT. He is a faculty member in the Information, Operations and Management Sciences department of the NYU Stern School of Business and affiliated faculty at MIT. His research interests include how information diffusion in massive online social networks influences demand patterns, consumer e-commerce behaviors and word of mouth marketing. Sinan's research has been awarded an NSF CAREER Award (2009), the ICIS Best Overall Paper Award (2006 and 2008), the ICIS Best Paper in IT Economics Award (2006), the ICIS Best Paper in IT Business Value Research Award (2006), the ACM SIGMIS Best Dissertation Award (2007), and an IBM Faculty Award. Sinan is a Phi Beta Kappa graduate of Northwestern University and holds master's degrees from the London School of Economics and Harvard University. He has been a Fulbright Scholar and is currently on the Academic Advisory Board of the Institute for Innovation and Information Productivity.

Accepted Papers

Huseyin Oktay, Brian J. Taylor and David D. Jensen. Causal Discovery in Social Media Using Quasi-Experimental Designs.

Guangyu Wu, Derek Greene, Barry Smyth and Padraig Cunningham. Distortion as a Validation Criterion in the Identification of Suspicious Reviews.

Lei Tang, Xufei Wang, Huan Liu and Lei Wang. A Multi-Resolution Approach to Learning with Overlapping Communities.

Jonathan Elsas and Natalie Glance. Shopping for Top Forums: Discovering Online Discussion for Product Research.

Jose Luis Devezas, Cristina Ribeiro and Sergio Nunes. Studying Blog Features over Link Popularity.

Kirill Dyagilev, Shie Mannor and Elad Yom-Tov. Generative Models for Rapid Information Propagation.

Dong Nguyen, Elijah Mayfield and Carolyn Rose. An analysis of perspectives in interactive settings.

Peter Hui and Michelle Gregory. Quantifying Sentiment and Influence in Blogspaces.

Cuneyt Gurcan Akcora, Murat Ali Bayir, Murat Demirbas and Hakan Ferhatosmanoglu. Identifying Breakpoints in Public Opinion.

Jeonhyung Kang, Kristina Lerman and Anon Plangprasopchok. Analyzing Microblogs with Affinity Propagation.

Marcelo Mendoza, Barbara Poblete and Carlos Castillo. Twitter Under Crisis: Can we trust what we RT?

Liangjie Hong and Brian D. Davison. Empirical Study of Topic Modeling in Twitter.

Ryan Rossi and Jennifer Neville. Modeling the Evolution of Discussion Topics and Communication to Improve Relational Classification.

Stacy Patterson and Bassam Bamieh. Interaction-Driven Opinion Dynamics in Online Social Networks.

Ceren Budak, Divyakant Agrawal and Amr El Abbadi. Where The Blogs Tip: Connectors, Mavens, Salesmen and Translators of the Blogosphere.

Aron Culotta. Towards detecting influenza epidemics by analyzing Twitter messages.

Zicong Zhou, Roja Bandari, Joseph Kong, Hai Qian and Vwani Roychowdhury. Information Resonance on Twitter: Watching Iran.

Sara Javanmardi and Cristina Lopes. Statistical Measure of Quality in Wikipedia.

Invited Talks

Social Media Analytics: the Value Proposition - Rohini Srihari
There has been a meteoric rise in the amount of content on the web generated by ordinary users, particularly through mobile devices. This includes social media sites such as Facebook, Twitter, and YouTube, as well as blogs, discussion forums, and reader responses to articles on traditional news sites. Such data can be mined for many purposes including business-related competitive insight, e-commerce, as well as citizen response to current issues. This talk will survey commercial applications exploiting social media data, the business models driving these, and vendors providing the solutions. Computational techniques being used for extracting such information and assimilating it into actionable intelligence will also be briefly discussed. The talk will also touch on applications being pursued by the DoD and intelligence community, where the value proposition (benefits, costs and value) is different, but equally compelling.

A tempest: Or, On the flood of interest in sentiment analysis, opinion mining, and the computational treatment of subjective language. - Lillian Lee
"What do other people think?" has always been an important consideration to most of us when making decisions. Long before the World Wide Web, we asked our friends who they were planning to vote for and consulted Consumer Reports to decide which dishwasher to buy. But the Internet has (among other things) made it possible to learn about the opinions and experiences of those in the vast pool of people that are neither our personal acquaintances nor well-known professional critics --- that is, people we have never heard of. Enter sentiment analysis, a flourishing research area devoted to the computational treatment of subjective and opinion-oriented language. Sample phenomena to contend with range from sarcasm in blog postings to the interpretation of political speeches. This talk will cover some of the motivations, challenges, and approaches in this broad and exciting field.

Online Shopping and Social Media - Natalie S. Glance
Community generated content, or social media, has become increasingly important over the past several years. Social media sites such as blogs, Twitter and online discussion boards have been recognized as valuable sources of market intelligence for companies wishing to keep abreast of their customers' attitudes expressed online. There has been little focus, however, on providing a similar service to shoppers themselves. In fact, shoppers perform research prior to making a purchase and tap into many kinds of online information; in particular they may seek out editorial or user reviews of specific products, buying guides for categories of products or informal conversational product discussions such as those found in message boards.

In this talk, I will discuss our recent results in aiding consumers with their shopping research by providing access to community generated content, focusing on reviews and online forums. Reviews of products and merchants can have a large impact on how well a product sells. Given a set of reviews of products or merchants from a wide range of authors and several review websites, how can we measure the true quality of the product or merchant? Likewise, discussion forums are an especially good place to find product comparisons within a category of items, to find expert opinions, and to find first-hand product experiences. I'll present a solution for pulling online forum results from the web into the user interaction flow of the shopping site.

Identifying Peer Influence in Massive Social Networks - Sinan Aral
The talk will report on and discuss three papers:
Distinguishing Influence Based Contagion from Homophily Driven Diffusion in Dynamic Networks (Published in the Proceedings of the National Academy of Sciences, 2009, vol. 106, no.51.)
Node characteristics and behaviors are often correlated with the structure of social networks over time. While evidence of this type of assortative mixing and temporal clustering of behaviors amongst linked nodes is used to support claims of peer influence and social contagion in networks, homophily may also explain such evidence. Here we develop a dynamic matched sample estimation framework to distinguish influence and homophily effects in dynamic networks, and apply this framework to a global instant messaging network of 27.4 million users, using data on the day-by-day adoption of a mobile service application and users' longitudinal behavioral, demographic and geographic data. We find that previous methods overestimate peer influence in product adoption decisions in this network by up to 700% and that homophily explains over 50% of the perceived behavioral contagion. These findings and methods are essential to both our understanding of the mechanisms that drive contagions in networks and our knowledge of how to propagate or combat them in domains as diverse as epidemiology, marketing, development economics and public health.

Creating Social Contagion through Viral Product Design: A Randomized Trial of Peer Influence in Networks
We examine how firms can create word of mouth peer influence and social contagion by incorporating viral features into the design of their products. Evaluating the effects of such product design decisions on social contagion is difficult because econometric identification of peer influence is non-trivial. Although several approaches have been proposed, it is widely believed that the most effective way to obtain unbiased estimates of peer effects is to conduct large-scale randomized trials of peer-to-peer communications intended to influence particular economic decisions, such as the decision to adopt a product. We therefore designed and conducted a randomized field experiment testing the effectiveness of passive-broadcast and active-personalized viral messaging capabilities in creating peer influence and social contagion among the 1.4 million friends of 9,687 experimental users. The experiment utilizes a customized commercial Facebook application to observe user behavior, communications traffic and the peer influence effects of randomly enabled viral messaging capabilities on application diffusion in the local networks of experimental and control population users. Results show that viral product design features generate econometrically identifiable peer influence and social contagion effects. Features that require more activity on the part of the user and are more personalized to recipients create greater marginal increases in the likelihood of adoption per message, but generate fewer total messages, creating countervailing effects on peer influence. On average, passive-broadcast viral messaging capabilities, which are less personalized but also require less user effort, generate a 246% increase in local peer influence and contagion effects over a baseline model in which viral messaging is disabled.
Adding active-personalized viral messaging capabilities, which are more personalized but require more user effort, generates an additional 98% increase in local peer influence and contagion effects over the passive-broadcast model. Analysis shows that initial peer adoptions in users' local networks drive a viral feedback loop that accelerates contagion. These results shed light on how viral products can be designed to generate social contagion and how randomized trials can be used to identify peer influence effects in social networks.

Identifying Social Influence: A Comment on Opinion Leadership and Social Contagion in New Product Diffusion (Forthcoming in Marketing Science)
I sketch five broad questions that could, if appropriately addressed, dramatically improve how we conceptualize and manage social contagions in a variety of domains: 1) What exactly is (causal) social influence? 2) How do product characteristics affect peer influence and contagion? 3) What is the role of sustained product use in creating sustainable contagions? 4) How do the distributions of individual characteristics over network nodes affect contagion? 5) Are there 'systems' of complementary contagion management strategies?