Research On The Application Of Text Clustering Based On LDA In The Analysis Of Network Public Opinion In Colleges And Universities

Posted on: 2015-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: S P Wang
Full Text: PDF
GTID: 2208330428981147
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology and the rapid growth in the number of users, more and more people express their views through the Internet. College students in particular use the Internet frequently, and it has become their main channel for expressing opinions.

This paper gives a comprehensive description of the theory of network public opinion, the collection and preprocessing of network public opinion data, the analysis of network public opinion, and common text clustering algorithms. It first studies information acquisition and data preprocessing for network public opinion based on university BBS sites. After analyzing how data can be acquired from college BBS sites, a dynamic web page technique based on AJAX is proposed. In addition, a web page cleaning method based on DOM technology is designed according to the structural information of university BBS websites. The API provided by the Pangu word segmentation system is used to segment the captured Chinese text.

Traditional clustering algorithms build the vector space from word frequencies, so the dimensionality is too high and the results are not accurate enough. To address these defects, this paper presents a clustering algorithm that combines the LDA topic model with the vector space model to calculate text similarity. The LDA topic model is a probabilistic generative model of the latent topics of text; it can capture the semantic relationships between texts. The LDA topic model also has a strong dimension reduction capability, which can improve the accuracy of clustering results.

The proposed algorithm not only solves the loss of deep semantic information in traditional text clustering, but also solves the problem that the LDA topic model reduces the dimensionality so much that texts can no longer be distinguished.

Based on these results, this paper designs the overall architecture and the functional modules of a prototype system for network public opinion analysis. The prototype is implemented with VS2010 and verifies the research results of this paper.
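
As an illustration only, the idea of combining the LDA topic model with the vector space model to calculate text similarity might be sketched as below. The abstract does not state how the two models are combined, so the weighted sum, the `weight` parameter, and every function name here are assumptions. The topic distributions in the example are made-up values, not the output of a trained LDA model.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


def term_vector(tokens, vocabulary):
    """Word-frequency vector of a segmented document over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def combined_similarity(vsm_a, vsm_b, topics_a, topics_b, weight=0.5):
    """Hypothetical combination: a weighted sum of topic-level similarity
    (low-dimensional, semantic) and word-level similarity (high-dimensional).
    The linear form and the default weight are assumptions."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    topic_sim = cosine(topics_a, topics_b)
    word_sim = cosine(vsm_a, vsm_b)
    return weight * topic_sim + (1.0 - weight) * word_sim


if __name__ == "__main__":
    # Toy segmented posts; a real pipeline would take these from a word segmenter.
    doc_a = ["exam", "schedule", "library", "exam"]
    doc_b = ["exam", "library", "canteen"]
    vocabulary = sorted(set(doc_a) | set(doc_b))

    # Illustrative per-document topic distributions (assumed, 3 topics).
    topics_a = [0.7, 0.2, 0.1]
    topics_b = [0.6, 0.1, 0.3]

    score = combined_similarity(
        term_vector(doc_a, vocabulary),
        term_vector(doc_b, vocabulary),
        topics_a,
        topics_b,
        weight=0.5,
    )
    print(f"combined similarity: {score:.3f}")
```

In this sketch the word-level term keeps documents distinguishable when their topic distributions are nearly identical, while the topic-level term contributes the semantic signal that word frequencies alone lack. The resulting pairwise scores could then be passed to any standard clustering algorithm.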
Keywords/Search Tags: university network public opinion, data cleaning, LDA topic model, vector space model, cluster analysis