
Community Detecting Based On Latent Semantic Mining

Posted on: 2014-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Huang
Full Text: PDF
GTID: 2268330401967103
Subject: Computer software and theory
Abstract/Summary:
In the Web 2.0 era, social network services are becoming more and more popular worldwide. The user structure, information structure and community structure of social networks are significant subjects of complex network research. In recent years, Sina microblogging has developed rapidly and provides an open data platform for developers and researchers, which has made it one of the hottest topics in industry and academia.

Based on the open data platform of Sina microblogging, this thesis presents detailed statistics and analysis of the user structure and the information structure. On the subject of community structure, taking into account both the social and the media characteristics of the platform, the author proposes a method to separate a user's "social dimension" from the "interest dimension". LDA, which was originally designed for a "term-document" corpus, is improved to make it more suitable for social community detection: the original corpus is replaced with "user-friends" and "user-interest points" corpora for LDA training and community detection.

This thesis presents detailed statistics and analysis of user categories and of the statistical phenomena observed among users. It also analyzes the information structure of Sina microblogging in depth and explains it in detail. Improved LDA models are proposed for detecting "social and interest overlapping communities".

1. The SI-LDA model is proposed for social community detection. The basic assumption is that every user can be described by his or her friends, and that famous users can also describe a user's interests. The key to a social community is the relationship between users. Based on the RA (Resource Allocation) method, we optimized the LDA corpus and used the PageRank algorithm to find the famous users that can describe interests. The author performs community detection on a sample of 200 thousand users and compares the model with the Louvain method.

2. Naming and indexing methods based on the SI-LDA model are proposed. SI-LDA is divided into two LDA models and simplified, so that a new user can be labeled and assigned to a community. 100 million users are tagged with SI-LDA, and the experimental results of this model are analyzed in depth.
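
The abstract names three building blocks without giving details: a "user-friends" corpus in place of "term-document", the RA (Resource Allocation) index, and PageRank for finding famous users. The Python sketch below illustrates each on a toy follow graph. It is a minimal illustration under stated assumptions, not the thesis's implementation: the edge list, the function names, the damping factor of 0.85 and the iteration count are all hypothetical, and how the thesis actually uses RA to optimize the corpus is not stated in the abstract.

```python
from collections import defaultdict

def build_user_friend_corpus(follow_edges):
    """Treat each user as a 'document' whose 'words' are the IDs of followed users."""
    corpus = defaultdict(list)
    for user, friend in follow_edges:
        corpus[user].append(friend)
    return dict(corpus)

def build_undirected_neighbors(follow_edges):
    """Neighbor sets that ignore edge direction (an assumption made for the RA index)."""
    neighbors = defaultdict(set)
    for user, friend in follow_edges:
        neighbors[user].add(friend)
        neighbors[friend].add(user)
    return neighbors

def resource_allocation(neighbors, x, y):
    """Standard RA index: sum of 1/degree(z) over the common neighbors z of x and y."""
    common = neighbors[x] & neighbors[y]
    return sum(1.0 / len(neighbors[z]) for z in common)

def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank; rank held by dangling nodes is spread evenly."""
    nodes = set(out_links) | {v for vs in out_links.values() for v in vs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u, targets in out_links.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank

# Hypothetical toy data: (follower, followee) pairs.
follow_edges = [
    ("a", "star"), ("b", "star"), ("c", "star"),
    ("a", "b"), ("b", "a"), ("c", "a"),
]

corpus = build_user_friend_corpus(follow_edges)
neighbors = build_undirected_neighbors(follow_edges)
ra_b_c = resource_allocation(neighbors, "b", "c")
ranks = pagerank(corpus)
most_famous = max(ranks, key=ranks.get)

print(corpus)
print(ra_b_c)
print(most_famous)
```

In a fuller pipeline, `corpus` would be passed to an LDA implementation so that each learned topic plays the role of a community. The RA scores could, as one possible reading of the abstract, be used to weight or filter friend tokens before training.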
Keywords/Search Tags: LDA, community detecting, RA algorithm, PageRank algorithm