Research on the Analysis of Microblog Information Based on Text Clustering

Posted on: 2015-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: X Hu
Full Text: PDF
GTID: 2298330452450787
Subject: Computer application technology
Abstract/Summary:
As an emerging social media and information exchange platform, the microblog has seen rapid development and wide adoption in recent years. By contrast, the analysis and mining of microblog information is still at a preliminary stage. Because microblog information is massive, short, informal, and highly repetitive, traditional methods struggle to meet the requirements of microblog information analysis. Against this background, the thesis introduces text clustering and carries out research and experiments on microblog information with its characteristics in mind, aiming to aggregate microblog texts with similar content and to detect microblog topics. This not only organizes the information effectively and saves users from browsing numerous microblog texts, but also helps with early warning of public opinion on microblogs.

The major work presented in the thesis is as follows. Firstly, the thesis analyses the features of microblog text information, studies frequently used microblog information analysis methods, compares the advantages and disadvantages of each method, and establishes a research scheme based on text clustering. Secondly, the thesis designs a processing flow based on text clustering that takes into account the characteristics and processing efficiency of microblog information; the flow comprises microblog text preprocessing, microblog text representation, and the clustering process. Then, it analyses text representation and text clustering algorithms in depth, chooses the vector space model to represent microblog texts and the k-means algorithm to cluster them, and discusses the concrete implementation of the processing flow. Finally, it performs experiments on preprocessed microblog data and discusses in depth the influence of feature dimension and distance measure on the clustering results.

The study shows that adopting the min-max principle effectively addresses the initial-point sensitivity of the k-means algorithm, and that for computing text similarity, cosine distance is better suited to microblog text clustering than Euclidean distance, achieving high accuracy and recall. The results demonstrate in practice that the k-means text clustering algorithm is a feasible and reasonable way to analyse microblog information, and they provide a basis for in-depth analysis of microblog information and for the development of application systems. The research work of the thesis provides a valuable reference for developing applications for public opinion monitoring on microblogs.
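The pipeline named in the abstract (vector space model, k-means, min-max seeding, cosine distance) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and it rests on several assumptions: texts are already tokenized (word segmentation and stop-word removal are taken as done), plain term frequencies stand in for whatever term weighting the thesis uses, and the "min-max principle" is interpreted as choosing each new seed to be the text whose minimum distance to the seeds chosen so far is largest. All function names are hypothetical.

```python
import math
from collections import Counter


def tf_vectors(token_lists):
    """Vector space model: one term-frequency vector per tokenized text."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for tokens in token_lists:
        v = [0.0] * len(vocab)
        for term, count in Counter(tokens).items():
            v[index[term]] = float(count)
        vectors.append(v)
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity; an all-zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def maximin_seeds(vectors, k, dist):
    """Assumed reading of the min-max principle: start from the first text,
    then repeatedly add the text whose minimum distance to the current
    seeds is largest. Requires k <= len(vectors)."""
    seeds = [0]
    while len(seeds) < k:
        candidates = (i for i in range(len(vectors)) if i not in seeds)
        best = max(
            candidates,
            key=lambda i: min(dist(vectors[i], vectors[s]) for s in seeds),
        )
        seeds.append(best)
    return seeds


def kmeans(vectors, k, dist, max_iter=100):
    """Standard k-means loop with deterministic maximin seeding."""
    centroids = [list(vectors[i]) for i in maximin_seeds(vectors, k, dist)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy usage with made-up tokens (not data from the thesis).
texts = [
    ["rain", "flood", "city"],
    ["flood", "rain", "warning"],
    ["match", "goal", "team"],
    ["team", "goal", "win"],
]
labels = kmeans(tf_vectors(texts), k=2, dist=cosine_distance)
```

Because the seeds are chosen deterministically rather than at random, repeated runs on the same data give the same clustering, which is one way to sidestep sensitivity to initial points. Passing a Euclidean distance function as `dist` instead would allow the kind of distance comparison the abstract describes.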
Keywords/Search Tags: the Analysis of Microblog Information, Microblog Topic Detection, Text Clustering