Applied Research on Data Mining for Social Network Sites

Posted on: 2015-04-03  Degree: Master  Type: Thesis
Country: China  Candidate: Y N Zhang
GTID: 2298330467455835  Subject: Software engineering
Abstract/Summary:
As the terminals used to access social network sites (SNS) have improved, people have become more and more dependent on SNS, especially the microblog, a social medium based on short text. Its commercial value is therefore clear. Analyzing microblog content can provide practical value to commercial tenants, and it can also make microblogs more convenient for their users through interest-based content delivery and user classification. All of this depends on establishing a topic analysis model that can effectively analyze and process microblog text and mine users' interests and focus areas from huge amounts of microblog data.

Because a microblog post is limited to about 140 characters, each post carries relatively little information. The data therefore suffer from a serious sparsity problem, and traditional text mining algorithms do not perform well on them. Topic modeling, by contrast, can classify documents and words without supervision and mine the latent meaning of the text.

The thesis uses Sina microblog data from different fields as the test corpus; the data were obtained through the Sina open API. Because microblog text is noisy, the thesis first preprocesses it with regular expressions. It then discusses Chinese word segmentation of microblog text based on the Conditional Random Fields (CRF) model. After segmentation, the system removes stop words from the segmented text. Finally, the thesis designs and implements a topic analysis system based on the Latent Dirichlet Allocation (LDA) model, in which the relationship between documents and words is modeled as a Bayesian network. By modeling the given text and performing Gibbs sampling, the system can extract the topics of microblog posts more accurately. The system has high practical value.
Keywords/Search Tags: Social Network Site, Microblog, Chinese word segmentation, LDA, Topic Model, Topic Analysis
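
The pipeline described in the abstract (regular-expression cleaning, stop-word removal, then LDA fitted by Gibbs sampling) can be illustrated with a minimal sketch. This is not the thesis's implementation: the noise patterns, the stop-word list, the function names, and the hyperparameters alpha and beta are all assumptions made for illustration, and whitespace splitting stands in for the CRF-based Chinese word segmentation, which is not reproduced here.

```python
import random
import re

# Hypothetical noise patterns (URLs, @mentions, [emoticon] codes);
# the thesis does not list the expressions it actually used.
NOISE = re.compile(r"https?://\S+|@\w+|\[[^\]]+\]")
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "is", "of", "and", "to"}


def clean(text):
    """Remove noise matches, then collapse runs of whitespace."""
    return re.sub(r"\s+", " ", NOISE.sub(" ", text)).strip()


def tokenize(text):
    """Stand-in for CRF segmentation: split on spaces, drop stop words."""
    return [w for w in clean(text).lower().split() if w not in STOP_WORDS]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # tokens per (doc, topic)
    topic_word = [[0] * V for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                      # tokens per topic
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                # Take the current token out of the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


if __name__ == "__main__":
    posts = [
        "The team won the match today @fan http://t.cn/x [cheer]",
        "Stock market and the bank rates",
        "Match day: the team is ready",
    ]
    docs = [tokenize(p) for p in posts]
    vocab, doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
    print(top_words(vocab, topic_word))
```

On a toy corpus this small the sampled topics are not meaningful; the sketch only shows how the per-token topic assignments and count tables are updated. For Chinese text, the whitespace split would need to be replaced by a real word segmenter.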