Mining web social data with latent aspect models on distributed computers

Posted on:2010-11-29

Degree:Ph.D

Type:Dissertation

University:University of California, Santa Barbara

Candidate:Chen, Wen-Yen

GTID:1448390002487928

Subject:Computer Science

Abstract/Summary:

Social-network products are flourishing. Sites such as MySpace, Facebook, and Orkut attract millions of visitors a day, approaching the traffic of Web search sites. These social-network sites provide tools for individuals to establish communities, to upload and share user-generated content, and to interact with each other. Specifically, users can connect to each other explicitly by adding friends, or implicitly by joining communities. However, the rapid growth of data on social-network sites (e.g., the number of communities) has made finding information increasingly challenging. To help users overcome information overload and sift through huge amounts of information efficiently, recommender systems have been developed to generate suggestions based on user preferences.

In this dissertation, we focus on information recommendation tasks in social networks, including personalized community recommendation, image and community clustering, and automatic photo annotation. We propose a collaborative filtering method that performs effective personalized community recommendation by fusing multiple information sources. We then compare algorithms from two very different domains, latent Dirichlet allocation (LDA) and association rule mining (ARM), to evaluate their effectiveness for the community recommendation task. To provide timely recommendations, we parallelize the proposed algorithms on distributed machines to handle large-scale data sets efficiently. In parallelizing spectral clustering, we investigate two representative ways of approximating the dense similarity matrix: sparsifying the matrix, and the Nyström method. We then select the strategy of sparsifying the matrix by retaining nearest neighbors and investigate its parallelization. Through an empirical study on a large document data set and a large photo data set, we demonstrate that our parallel algorithm can effectively alleviate the scalability problem. Additionally, we outline the functionality of the Fotofiti (FF) website, a research platform for automating semantic annotation of digital photographs and for integrating various components of ongoing research, such as landmark recognition.
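
To make the nearest-neighbor sparsification mentioned in the abstract concrete, the following is a minimal single-machine sketch in Python with NumPy. It is not the dissertation's implementation: the function name `knn_sparsify`, the parameter `t`, and the rule of keeping an entry when either endpoint ranks the other among its `t` nearest neighbors are all assumptions chosen for illustration, and the sketch assumes `t` is smaller than the number of points.

```python
import numpy as np


def knn_sparsify(similarity, t):
    """Zero out all but each row's t largest off-diagonal similarities.

    Hypothetical helper for illustration; assumes a square, symmetric
    similarity matrix and 1 <= t < number of rows.
    """
    s = np.array(similarity, dtype=float)  # copy, so the input is untouched
    n = s.shape[0]
    np.fill_diagonal(s, -np.inf)  # exclude self-similarity from the ranking

    # Column indices of the t largest entries in each row (unordered).
    nearest = np.argpartition(-s, t - 1, axis=1)[:, :t]

    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], nearest] = True
    mask |= mask.T  # assumption: keep an edge if either endpoint selected it

    np.fill_diagonal(s, 0.0)
    return np.where(mask, s, 0.0)


# Toy example: two tight pairs of points, (0, 1) and (2, 3).
S = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])
A = knn_sparsify(S, t=1)
```

With `t=1`, only the entries linking the pairs (0, 1) and (2, 3) survive, so the dense 4-by-4 matrix is reduced to four nonzero entries. In a real setting the result would typically be stored in a sparse format before the eigenvector computation of spectral clustering.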

Keywords/Search Tags:

Data, Sites

Related items

1	Replica placement and selection strategies in data grids
2	Stepping off the sidewalk: An examination of the data collection techniques of Web sites visited by children
3	Multimedia Data Mining In Social Network Sites
4	Key Technology Research On The Synchronous Development Of Web Sites And WAP Sites Based On CMS
5	Analysing Mirna Interaction And Conservative Sites Based On Secondary Structure
6	The impact of source type, source offset, and receiver spacing on experimental MASW data at soft-over-stiff sites
7	Dissemination Innovative Research Of Video Sites Self-Produced Programs
8	Preliminary evaluation of soil vapor intrusion at manufactured gas plant sites
9	VOC emission monitoring at Eagle Ford Shale Oil and Gas production sites using a wireless sensor network (WSM)
10	Characteristics of rest sites used by raccoons (Procyon lotor) in Richmond, Kentucky