
Research On Correlative Techniques Of Hot-topic Discovery About Internet Public Opinion

Posted on: 2011-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Qin
Full Text: PDF
GTID: 2178330332960368
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, online media has become an important source of information, and the influence of complex online information on the public has grown considerably. Traditionally, analyzing news corpora and extracting hot topics has been done manually by specialists, which is difficult. How to discover hot topics quickly and automatically in a huge news stream has therefore become an important research direction. Based on the current state of research on hot-topic discovery, this thesis analyzes existing methods and, because subject drift arises easily during topic discovery, designs an Internet hot-topic discovery model based on an analysis of topic characteristics. The main contributions are as follows:

First, this thesis proposes a hot-topic discovery process based on report subject partition. It analyzes how the multi-aspect nature of topics and reports causes two kinds of subject drift when a whole report is represented by a single vector, proposes partitioning each report by subject, and confirms experimentally that news reports can be partitioned by subject with the TextTiling algorithm. It presents a subject recognition method based on double-level clustering and compares it experimentally with Single-pass clustering; the results indicate that double-level clustering separates subjects better while keeping acceptable clustering performance. It also proposes a way of selecting a topic's seed report and updating the topic's subject, and uses the subjects shared among reports to recognize topics.

Second, this thesis proposes a multi-strategy optimization method for processing an incremental corpus. A filtering strategy and similarity weakening on the grouped corpus avoid comparisons between unrelated topics, which improves system performance, and prevent two similar topics from acting as noise for each other, which improves system accuracy.
In summary, based on an analysis of topic characteristics, this thesis focuses on hot-topic discovery and describes in detail a hot-topic recognition process based on subject partition and multi-strategy optimization for incremental corpora. Experiments confirm the feasibility of the approach: it is an effective way to mitigate subject drift and improves hot-topic discovery performance. Finally, the thesis discusses the aspects that still need improvement.
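For readers unfamiliar with the Single-pass clustering baseline named in the abstract, the following is a minimal sketch of the general technique, not the thesis's implementation. The abstract does not specify features, similarity measure, or threshold; the bag-of-words vectors, cosine similarity, the `threshold` value, and the function names `cosine` and `single_pass` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(docs, threshold=0.3):
    """Assign each document, in arrival order, to the most similar
    existing cluster, or start a new cluster if none is similar enough.

    The threshold is a hypothetical value chosen for this sketch.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = []  # each cluster: {"centroid": Counter, "members": [int]}
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Summed counts act as the centroid; cosine ignores scale.
            best["centroid"].update(vec)
            best["members"].append(i)
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return [c["members"] for c in clusters]


reports = [
    "earthquake hits city rescue teams",
    "rescue teams reach earthquake city",
    "stock market falls sharply",
    "stock market falls again today",
]
print(single_pass(reports))
```

Because each report is compared as one whole vector, a report that covers several subjects can pull a cluster centroid toward unrelated terms, which is the kind of subject drift the thesis aims to reduce through subject partition.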
Keywords/Search Tags:Hot-topic, Subject partition, Text clustering, Seed report