CS345A:
Data Mining
Winter 2010
Class project
Overview:
The class project is a software project that discovers or exploits interesting relationships within a significant amount of data. Ideally, the project applies what we have learned in class.
Logistics about the projects:
- A project proposal must be sent in no later than midnight, Feb. 1.
- Access to Aster Data's nCluster will be available soon.
- Some project ideas (these are only suggestions and should by no means restrict your imagination):
- Implement an anti-spam algorithm (e.g., TrustRank) on a collection of webpages
- Implement a better version of topic-sensitive PageRank on a collection of webpages (by "better," we mean "incorporating your own ideas")
- Implement a collaborative filtering technique on basket/item data (from eBay or Amazon, for instance)
- Tell something useful about a collection of documents, e.g., Web pages, news articles, reviews, or blogs. Possible goals include identifying sentiment (is a review positive or negative?), telling wise blogs from foolish ones, and telling real news from publicity releases
- Be sure to look under Resources to see what data sets are available.
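As a starting point for the TrustRank and topic-sensitive PageRank ideas above, both can be viewed as PageRank in which the random jump lands only on a chosen set of pages (trusted seeds in one case, on-topic pages in the other). The following is a minimal sketch under that view, not a reference implementation: the function name, the toy graph, the seed set, and the damping value of 0.85 are all illustrative assumptions, and a real project would need a sparse or distributed representation for a large crawl.

```python
def personalized_pagerank(links, teleport_set, beta=0.85, iterations=50):
    """Power iteration for PageRank with a restricted teleport set.

    links: dict mapping each page to the list of pages it links to.
    teleport_set: non-empty set of pages in the graph that receive the
        random-jump probability (trusted seeds for TrustRank, on-topic
        pages for topic-sensitive PageRank).
    beta: probability of following a link (assumed value, not from the course page).
    """
    # Collect every page that appears as a source or a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Uniform teleport distribution over the chosen set, zero elsewhere.
    teleport = {p: (1.0 / len(teleport_set) if p in teleport_set else 0.0)
                for p in pages}
    rank = dict(teleport)

    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = beta * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        # Rank lost to the random jump and to dead ends goes back to the teleport set.
        leaked = 1.0 - sum(new_rank.values())
        for p in pages:
            new_rank[p] += leaked * teleport[p]
        rank = new_rank
    return rank


# Hypothetical toy graph: pages a, b, c link among themselves; two spam pages
# link to each other and to a, but no trusted page links to them.
toy_links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "spam1": ["spam2"],
    "spam2": ["spam1", "a"],
}
scores = personalized_pagerank(toy_links, teleport_set={"a"})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this toy graph the spam pages end up with no score because rank only flows outward from the seed set, which is the intuition behind using trusted seeds. Choosing the seeds, and going beyond this baseline, is where your own ideas come in.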
Deliverables:
- Final project writeup (5-10 pages) due Sunday, March 14 (PDF by email to the staff mailing list). This is a comprehensive description of your project. You should include the following:
- Project idea
- Your specific implementation
- Key results and metrics of your system
- What worked, what did not work, what surprised you, and why
- Poster Session
Each team should prepare a poster and be ready to give a short explanation in front of it. Everyone will also get an opportunity to see what other people have done for their projects. We will supply poster boards and easels for displaying the posters. The poster session is on March 16th, 3:30 to 6:30 pm, in the Gates lobby.
Project titles (Winter 2010):
- Frequency-Domain Characterization of Trending Topics
- A Music Recommendation System based on Yahoo! Data Corpus
- Identifying Trending Topics on Twitter
- Wikipedia vandalism
- Product Offer Comparison across Different Merchants
- Extracting Information from Yelp Reviews
- Exploring Methods of De-Novo Short Read Assembly Using MapReduce
- Topic Chaining and Phrase Linking
- Understanding Correlations between Product Reviews and Ratings
- Finding the Social Roots of Controversy in Wikipedia
- Techniques to improve detection of trending topics on Twitter
- Mining Hospital Records for Predicting Patient Drop-off
- Social Information Engine: Data Mining Twitter for Product Recommendations
- Comparing the impact of cross-disciplinary and cross-institutional academic research: An exploration of the ISI Web of Science database
- Woodstock: Using Twitter tweets' sentiments to predict stock price change
- Book Recommendation System
- Seven years of Wikipedia's Revision History as a Time dependent Graph: A Love Story
- Adaptive Locality Sensitive Hashing for Recommending Twitter Followers
- Combining Content Filtering and Collaborative Filtering for the Netflix Prize
- Twitter #Hashtags
- Collaborative Filtering on Netflix Challenge
- A Music Recommendation System
- Content Based Auto-tagging of Flickr Images using ImageWebs
- A Data Mining Based Approach to Determining Causal Associations Between Drugs and Condition
- Twitter Personal Newspaper
- WikiSuggest: A Suggestion Engine for Editors on Wikipedia
- Hashtags on Twitter
- OMOP Cup