
Research Of Short Text Classification And Clustering In Public Opinion Analysis

Posted on: 2014-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: L L Di
GTID: 2268330425494518
Subject: Computer application technology
Abstract/Summary:
With the spread of the Internet, it has become a new medium that influences people's work and lives. More and more people are willing to express their feelings, attitudes and viewpoints on the international situation, government policies and social events through the many platforms the Internet provides. The Internet has therefore become the main medium through which public opinion propagates. To protect social order from the negative impact of public opinion, governments want to track the dynamics of public opinion promptly and control it in good time. Text categorization and text clustering are two important natural language processing technologies for analyzing public opinion on the Internet. Because short texts make up a very large share of Internet text data, research on short text categorization and clustering is necessary. At present there are many studies of long text categorization but few of short text classification, and algorithms genuinely suited to short text classification have not yet appeared.
To address these problems, this paper first reviews the current state of research on text categorization, text clustering and short text classification, analyzes the processes of text classification and clustering, identifies their key technologies, and presents tests of clustering algorithms. Second, building on the study of long text classification, it proposes a method that generates a "dictionary" from a training set consisting of a long text corpus, and improves the TF-IDF scheme used to calculate feature weights. Third, a new algorithm that combines the improved simple vector distance (Rocchio) algorithm with K nearest neighbor (KNN) is used to implement short text classification. Finally, the tests conducted to put the new algorithm into practice are presented.
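The classification pipeline summarized above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the thesis's implementation: it uses standard TF-IDF weighting (the abstract does not say how the thesis improves TF-IDF), treats the "dictionary" as the vocabulary and document frequencies of a tokenized long-text training set, and assumes one common way of combining the two classifiers, in which Rocchio class centroids first shortlist the `top_classes` most similar categories and KNN then votes among the `k` nearest training texts from that shortlist. The names `RocchioKNN`, `build_dictionary`, `top_classes` and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_dictionary(docs):
    """Assumed "dictionary": each term's document frequency over the long-text training set."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def tfidf(tokens, df, n_docs):
    """Standard TF-IDF weights (not the thesis's improved variant); unknown terms are ignored."""
    tf = Counter(t for t in tokens if t in df)
    return {t: c * (math.log(n_docs / df[t]) + 1.0) for t, c in tf.items()}


def normalize(vec):
    """Scale a sparse vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, iterating over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


class RocchioKNN:
    """Hypothetical Rocchio + KNN hybrid: centroids shortlist classes, KNN decides among them."""

    def __init__(self, k=3, top_classes=2):
        self.k = k
        self.top_classes = top_classes

    def _vec(self, tokens):
        return normalize(tfidf(tokens, self.df, self.n))

    def fit(self, docs, labels):
        self.df = build_dictionary(docs)
        self.n = len(docs)
        self.train = [(self._vec(d), y) for d, y in zip(docs, labels)]
        # Rocchio: a class centroid is the normalized sum of its training vectors.
        sums = defaultdict(lambda: defaultdict(float))
        for vec, y in self.train:
            for t, w in vec.items():
                sums[y][t] += w
        self.centroids = {y: normalize(dict(s)) for y, s in sums.items()}
        return self

    def predict(self, tokens):
        q = self._vec(tokens)
        # Stage 1 (Rocchio): rank classes by centroid similarity and keep the best few.
        ranked = sorted(self.centroids, key=lambda y: cosine(q, self.centroids[y]), reverse=True)
        shortlist = set(ranked[: self.top_classes])
        # Stage 2 (KNN): similarity-weighted vote among the k nearest shortlisted training texts.
        neighbours = sorted(
            ((cosine(q, v), y) for v, y in self.train if y in shortlist),
            reverse=True,
        )[: self.k]
        votes = defaultdict(float)
        for sim, y in neighbours:
            votes[y] += sim
        return max(votes, key=votes.get)


# Toy usage with pre-tokenized English text (Chinese text would need word segmentation first).
train_docs = [
    "government tax policy reform".split(),
    "tax policy budget government".split(),
    "football match goal team".split(),
    "team wins football league".split(),
    "stock market price investors".split(),
    "market investors bank stock".split(),
]
labels = ["politics", "politics", "sports", "sports", "economy", "economy"]
clf = RocchioKNN(k=3, top_classes=2).fit(train_docs, labels)
print(clf.predict("new tax policy".split()))  # politics
print(clf.predict("football team".split()))   # sports
```

Shortlisting with Rocchio first confines the neighbor search to a few plausible classes, while the KNN vote lets individual training texts correct a centroid that summarizes a diverse class poorly; whether the thesis combines the two in exactly this way is not stated in the abstract.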
The paper also presents the design of a public opinion analysis system, including diagrams of its information collection and of its short text classification and text clustering processes.
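The abstract does not name the clustering algorithm that was tested. As an assumed example only, the sketch below uses single-pass clustering, a simple incremental method that processes texts in arrival order: each text joins the most similar existing cluster if its cosine similarity to that cluster's centroid reaches `threshold`, and otherwise starts a new cluster. The function `single_pass`, the `threshold` value and the plain term-frequency vectors are hypothetical choices; TF-IDF vectors could be substituted.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a sparse vector to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def single_pass(vectors, threshold=0.3):
    """Hypothetical single-pass clustering: returns clusters as lists of input indices."""
    sums, members = [], []  # per-cluster summed vectors and member indices
    for i, vec in enumerate(vectors):
        # Compare the text with every existing cluster centroid (normalized sum).
        sims = [cosine(vec, normalize(s)) for s in sums]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            members[best].append(i)
            for t, w in vec.items():
                sums[best][t] = sums[best].get(t, 0.0) + w
        else:
            # No cluster is similar enough: this text seeds a new cluster.
            sums.append(dict(vec))
            members.append([i])
    return members


docs = ["tax policy reform", "new tax policy debate", "football team wins", "team wins football cup"]
vectors = [normalize(dict(Counter(d.split()))) for d in docs]
print(single_pass(vectors, threshold=0.3))  # [[0, 1], [2, 3]]
```

Because each text is compared only with the clusters that already exist and is never reassigned, the result depends on input order and on `threshold`; in exchange, the method needs only one pass over continuously arriving texts.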
Keywords/Search Tags: Network Public Opinion Analysis, Short Text Classification, Text Clustering, Simple Vector Distance, K Nearest Neighbor