Mining web social data with latent aspect models on distributed computers

Posted on: 2010-11-29
Degree: Ph.D.
Type: Dissertation
University: University of California, Santa Barbara
Candidate: Chen, Wen-Yen
Full Text: PDF
GTID: 1448390002487928
Subject: Computer Science
Abstract/Summary:
Social-network products are flourishing. Sites such as MySpace, Facebook, and Orkut attract millions of visitors a day, approaching the traffic of Web search sites. These sites provide tools for individuals to establish communities, to upload and share user-generated content, and to interact with each other. Users can connect to each other explicitly by adding friends, or implicitly by joining communities. However, the rapid growth of data on social-network sites (e.g., the number of communities) has made finding information increasingly challenging. To help users overcome this information overload and sift through huge amounts of information efficiently, recommender systems have been developed to generate suggestions based on user preferences.
In this dissertation, we focus on information recommendation tasks in social networks, including personalized community recommendation, image and community clustering, and automatic photo annotation. We propose a collaborative filtering method that performs personalized community recommendation by fusing multiple information sources. We then compare two algorithms from very different domains, Latent Dirichlet Allocation (LDA) and association rule mining (ARM), to evaluate their effectiveness for the community recommendation task. To provide timely recommendations, we parallelize the proposed algorithms on distributed machines so that they can handle large-scale data sets efficiently.
In parallelizing spectral clustering, we investigate two representative ways of approximating the dense similarity matrix: sparsifying the matrix, and the Nyström method. We compare the two, then adopt the strategy of sparsifying the matrix by retaining nearest neighbors and investigate its parallelization. Through an empirical study on a large document data set and a large photo data set, we demonstrate that our parallel algorithm can effectively alleviate the scalability problem.
Additionally, we outline the functionality of the Fotofiti (FF) website, a research platform for automating the semantic annotation of digital photographs and for integrating components of ongoing research such as landmark recognition.
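The nearest-neighbor sparsification mentioned in the abstract can be illustrated with a minimal single-machine sketch. This is not the dissertation's parallel implementation, which the abstract does not describe; the function name `sparsify_knn`, the parameter `t`, and the choice to symmetrize by keeping an entry when either endpoint selects it are assumptions made for illustration only.

```python
import numpy as np

def sparsify_knn(S, t):
    """Keep, for each row, the t largest off-diagonal similarities.

    S is a dense symmetric similarity matrix (n x n) and t < n.
    The result is symmetrized: an entry survives if either endpoint
    lists the other among its t nearest neighbors (an assumption of
    this sketch, not a detail taken from the abstract).
    """
    S = np.array(S, dtype=float)      # work on a copy
    n = S.shape[0]
    np.fill_diagonal(S, 0.0)          # ignore self-similarity
    # Column indices of the t largest similarities in each row.
    idx = np.argpartition(-S, t - 1, axis=1)[:, :t]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], idx] = True
    mask |= mask.T                    # symmetrize the neighbor graph
    return np.where(mask, S, 0.0)
```

In practice the dense matrix would not be materialized for large data sets; the sketch builds it only to make the retained entries easy to inspect.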
Keywords/Search Tags:Data, Sites