Research On Topic Detection And Tracking In Internet Public Opinion

Posted on: 2015-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: N Lin
GTID: 2308330461974987
Subject: Computer technology
Abstract/Summary:
With the development of the Internet, people are increasingly willing to express their thoughts, attitudes, and feelings online about serious or sensitive events, and the Internet has become the most important carrier of public opinion. Analyzing and monitoring Internet public opinion has therefore become an important and urgent problem for countries, governments, enterprises, and organizations. This thesis studies Topic Detection and Tracking (TDT), one of the technologies used in Internet public opinion analysis. TDT is a relatively new field concerned with identifying, within the massive volume of information on the network, the items that describe the same topic, and with tracking follow-up items on that topic. It builds on information retrieval, text mining, and natural language processing.

The thesis focuses on TDT for web news. It introduces the concept and research status of TDT and the technologies used for information collection and preprocessing.

For topic detection, the thesis addresses both the model and the algorithm. For the model, targeting the disadvantages of the Vector Space Model (VSM) and the characteristics of web news, it proposes an improved model, TD-VSM, which uses information entropy and the structural characteristics of web news to improve TF-IDF, and uses the time characteristics of web news to improve cosine similarity. For the algorithm, targeting the sensitivity of Single-Pass to input order and threshold selection, as well as the characteristics of topic detection, it proposes an improved algorithm, MSTLC, which uses two-layer clustering to improve detection performance. In the first layer, an improved batch clustering algorithm named DBS-BIC-K-Means aggregates the samples into micro-classes. In the second layer, an improved incremental clustering algorithm, Multi-Centroids-Single-Pass, aggregates the micro-classes into the final classes.

For topic tracking, the thesis again addresses both the model and the algorithm. For the model, based on the characteristics of topic tracking, it proposes an improved model, TT-VSM, suited to topic tracking. For the algorithm, it studies KNN and SVM and, based on the characteristics of topic tracking, proposes an improved algorithm, I-B-SVM-KNN, to improve tracking performance. The algorithm uses a sample's distance to the optimal hyperplane to decide which classifier to apply, uses class-count compensation to address class imbalance, and replaces the full data set with boundary hull vectors for incremental learning.

Finally, experiments show the effectiveness of the improvements.
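
For reference, the baseline that the abstract describes as sensitive to input order and threshold selection, Single-Pass clustering over TF-IDF vectors with cosine similarity, can be sketched roughly as follows. This is only a minimal illustration of the standard baseline, not the thesis's TD-VSM or MSTLC; the function names (`tfidf_vectors`, `cosine`, `single_pass`), the smoothed IDF formula, and the threshold values are assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF dicts from tokenized documents.

    Plain TF-IDF with a smoothed IDF (an assumption for this sketch);
    it does not include the entropy or structure weighting of TD-VSM.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = max(len(doc), 1)
        vectors.append({
            term: (count / length) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(vectors, threshold):
    """Baseline Single-Pass clustering.

    Each document is compared, in arrival order, with the centroid of every
    existing cluster. It joins the most similar cluster if the similarity
    reaches the threshold; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    centroids = []  # summed member vectors (cosine ignores the scale)
    members = []
    for index, vec in enumerate(vectors):
        best, best_sim = None, -1.0
        for cluster_id, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = cluster_id, sim
        if best is not None and best_sim >= threshold:
            members[best].append(index)
            for term, weight in vec.items():
                centroids[best][term] = centroids[best].get(term, 0.0) + weight
        else:
            centroids.append(dict(vec))
            members.append([index])
    return members


if __name__ == "__main__":
    # Toy tokenized "news" items, invented for illustration only.
    docs = [
        ["earthquake", "rescue", "city"],
        ["earthquake", "rescue", "team"],
        ["election", "vote", "result"],
        ["election", "vote", "count"],
    ]
    print(single_pass(tfidf_vectors(docs), threshold=0.3))
```

Because every assignment depends on the clusters formed so far and on a single fixed threshold, the result can change with the order of the input or the chosen threshold, which is the weakness the abstract says MSTLC is designed to address.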
Keywords/Search Tags: Internet Public Opinion, Public Opinion Analysis, Topic Detection and Tracking, TDT, Vector Space Model, VSM, Text Clustering, Two-Layer Clustering, Text Classification