Research On Dynamic Topic Evolution Of Chinese News And Its Key Techniques

Posted on: 2013-02-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X J Zhao
GTID: 1228330377951802
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, the growing popularity of online news has become an irresistible trend. Compared with traditional news media, online news offers greater timeliness and interactivity, which gives it a more far-reaching influence in reporting major social events and has made it people's main source of news. Readers usually want the online stories related to one news event to be integrated into a single news topic so that they can grasp the event as a whole. In response to this practical demand, topic-based news research, combined with other fields of study, has produced many Internet-oriented research directions, such as news search, news clustering, news classification and hot news discovery, which address the news topic detection problem fairly well. However, traditional news topic detection, whose main target is topic extraction, does not discover or investigate the dynamic evolution characteristics of news topics. As users place higher demands on automated news organization, how to organize news stream data reasonably and in order through research on dynamic topic evolution has become a hot problem in web information processing.

Research on the dynamic topic evolution of news (dynamic topic evolution for short) is a form of temporal semantic mining of text topic evolution. It builds on traditional topic model theory, which represents topic content as information that can be computed and compared. A topic sequence is then constructed in chronological order through a thorough study of topic time, which is closely tied to the topic information in different evolution episodes. Finally, clustering methods are used to uncover the dynamic trajectory of a topic in both intensity and content.
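A topic's intensity trajectory can be pictured as the share of stories in each time window that belong to the topic. The sketch below is a minimal illustration under assumptions not taken from the dissertation: stories arrive already labeled with a topic (for example by a clustering step), windows are calendar days, and intensity is a simple story share. The function name `topic_intensity` is hypothetical.

```python
from collections import Counter
from datetime import date


def topic_intensity(stories, topic_id):
    """Return {day: share of that day's stories assigned to topic_id}.

    stories: iterable of (publication_date, assigned_topic_id) pairs.
    Daily windows and story share are illustrative choices only.
    """
    totals = Counter()
    hits = Counter()
    for pub_date, tid in stories:
        totals[pub_date] += 1
        if tid == topic_id:
            hits[pub_date] += 1
    # Iterating days in chronological order yields the intensity trajectory.
    return {d: hits[d] / totals[d] for d in sorted(totals)}


if __name__ == "__main__":
    stories = [
        (date(2012, 7, 21), "flood"),
        (date(2012, 7, 21), "sports"),
        (date(2012, 7, 22), "flood"),
        (date(2012, 7, 22), "flood"),
        (date(2012, 7, 23), "sports"),
    ]
    for d, share in topic_intensity(stories, "flood").items():
        print(d, round(share, 2))
```

With the sample data the trajectory reads 0.5, 1.0 and 0.0 over the three days. A content trajectory would additionally track how the topic's word distribution changes from one window to the next.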
Research on dynamic topic evolution can capture topic information accurately and completely across all topic episodes, helping users understand the causes and effects of a news topic as well as the correlations and differences within it. It therefore plays an important role in web news search, network public opinion monitoring, Internet incident detection, emergency management and similar applications.

Against this background, and targeting the theoretical problems and technical challenges of news topic evolution, this dissertation conducts a systematic, in-depth study of several key issues in the dynamic topic evolution of Chinese news. It first introduces the background and significance of the research, then examines the principal framework and main research objects of dynamic topic evolution, next reviews related work in the field, and finally summarizes the basic research idea of the study.

Focusing on three aspects, namely the topic model, temporal information processing and evolution pattern mining of news topics, the dissertation then introduces a topic information extraction approach for Chinese Internet news and proposes a temporal expression normalization algorithm for real news text. On this basis, it constructs a topic time parser for Chinese Internet news that extracts topic time from text automatically. Finally, combining these results, we propose an evolution mining method for news topics within a unified framework.

The main contributions of this dissertation can be summarized as follows:

(1) To address the performance degradation of the topic model caused by inaccurate topic information extraction from Chinese news, we propose a new approach based on linguistic knowledge for extracting topic information from Chinese Internet news.
The approach builds on an analysis of the characteristics of Internet news and corrects mislabeled candidate topic words with heuristic rules constructed from Chinese part-of-speech and location features, thereby expanding the set of topic words. Experimental results show that it effectively improves the semantic correctness and topic integrity of the extracted topic information.

(2) To resolve the conflict between the reference-time selection mechanism of current related work and the real context of news texts, we present a dynamic reference time selection algorithm based on temporal expression classification. The algorithm first divides the temporal expressions in a document into two classes using temporal reference features derived from the modifiers and temporal nouns that make up the expressions, and then links each expression to its correct reference time according to its class. A Scenario-time Shifting model is also introduced to address the defuzzification problem. The algorithm noticeably improves the accuracy and generality of the normalization system.

(3) To address the low precision of topic time extraction, we propose a topic time extraction method based on topic-time dependency. Through an in-depth study of story characteristics and structural features, the method examines the location dependency and semantic dependency between topic and time to establish a topic-time mapping model. Based on this model, two strategies, topic weighting and unsupervised learning, are introduced to extract topic time. The method achieves higher accuracy than the compared methods and greatly strengthens the correlation between a topic and its topic time.

(4) To remedy the shortcomings in feature computation and incremental updating of the topic model for dynamic topic evolution mining, we design an evolution pattern mining algorithm for news topics based on feature evolution.
The algorithm first builds an incremental feature computation model by introducing the dynamic characteristics of words under topic evolution, and then performs forward fusion and reverse filtering over the existing topic-related stories and the newly arrived stories. It markedly improves the precision of the topic model and the overall performance of association calculation, and consequently the correctness and completeness of the results.

In this dissertation, we address the insufficient study of news topic evolution characteristics in current information techniques, design a theoretical framework for discovering news topic evolution based on temporal information, and propose a systematic method for research on dynamic topic evolution. Overall, the dissertation lays a theoretical foundation for special-topic news integration and network public opinion warning guided by dynamic topic evolution, and offers a new path for the further development of network public safety theory and emergency decision-making technology.
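Contribution (2) turns on choosing the reference time dynamically instead of always anchoring to the publication date. The sketch below illustrates only that idea, under hypothetical assumptions: a toy lexicon splits expressions into deictic ones (resolved against the publication date, such as 昨天 "yesterday") and anaphoric ones (resolved against the most recently mentioned time, such as 次日 "the next day"), and absolute dates appear in ISO form. The names `DEICTIC`, `ANAPHORIC` and `normalize` are illustrative; this is not the dissertation's classifier or its Scenario-time Shifting model.

```python
from datetime import date, timedelta

# Hypothetical toy lexicon: day offsets for two classes of expression.
# Deictic expressions are resolved against the publication date.
DEICTIC = {"今天": 0, "昨天": -1, "前天": -2, "明天": 1}
# Anaphoric expressions are resolved against the last resolved time.
ANAPHORIC = {"当天": 0, "次日": 1, "前一天": -1}


def normalize(expressions, pub_date):
    """Map temporal expressions (in textual order) to calendar dates.

    Anything outside the toy lexicon is treated as an absolute ISO date
    'YYYY-MM-DD'; date.fromisoformat raises ValueError otherwise.
    """
    reference = pub_date
    resolved = []
    for expr in expressions:
        if expr in DEICTIC:
            value = pub_date + timedelta(days=DEICTIC[expr])
        elif expr in ANAPHORIC:
            value = reference + timedelta(days=ANAPHORIC[expr])
        else:
            value = date.fromisoformat(expr)
        # The reference for later anaphoric expressions moves to this time.
        reference = value
        resolved.append(value)
    return resolved


if __name__ == "__main__":
    print(normalize(["2012-07-21", "次日", "昨天"], date(2012, 7, 23)))
```

For a story published on 2012-07-23 that mentions 2012-07-21, then 次日, then 昨天, the sketch yields 2012-07-21, 2012-07-22 and 2012-07-22, whereas always resolving against the publication date would place 次日 on 2012-07-24.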
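The topic weight strategy in contribution (3) can be approximated as scoring each time expression by the weight of the topic words that co-occur with it and keeping the best one. This is a hypothetical simplification: sentence-level co-occurrence stands in for the location and semantic dependencies, and earlier position breaks ties. The function `extract_topic_time` and its input format are assumptions, not the dissertation's topic-time mapping model.

```python
def extract_topic_time(sentences, topic_weights):
    """Pick the time expression most strongly associated with the topic.

    sentences: list of (tokens, times) pairs in document order, where
        tokens are segmented words and times are normalized time values
        found in that sentence.
    topic_weights: {topic word: weight}, e.g. taken from a topic model.
    Returns None when no time co-occurs with any weighted topic word.
    """
    best_time, best_score = None, 0.0
    for tokens, times in sentences:
        # Association score: total weight of distinct topic words in the sentence.
        score = sum(topic_weights.get(tok, 0.0) for tok in set(tokens))
        for t in times:
            # Strict '>' keeps the earliest candidate on ties.
            if score > best_score:
                best_time, best_score = t, score
    return best_time


if __name__ == "__main__":
    sentences = [
        (["市民", "参观", "展览"], ["2012-07-22"]),
        (["暴雨", "造成", "城区", "积水"], ["2012-07-21"]),
    ]
    weights = {"暴雨": 0.6, "积水": 0.3, "市民": 0.05}
    print(extract_topic_time(sentences, weights))  # prints 2012-07-21
```

Here the second sentence carries the heavily weighted topic words, so its date is chosen.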
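One possible reading of the forward fusion and reverse filter steps in contribution (4) is sketched below, under assumptions of our own: a topic is a sparse term-weight dictionary, new stories are token lists, a story joins the topic when its cosine similarity reaches a threshold, old weights decay before absorbed stories are merged in (fusion), and weak terms are pruned afterwards (filtering). The function names and the parameters `sim_threshold`, `decay` and `min_weight` are hypothetical, with illustrative default values.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def update_topic(topic, new_stories, sim_threshold=0.2, decay=0.8, min_weight=0.05):
    """Run one incremental update of a topic's term-weight vector.

    Fusion: decay existing weights, then add the normalized term
    frequencies of every sufficiently similar new story.
    Filtering: drop terms whose weight falls below min_weight.
    Returns (updated_topic, absorbed_stories).
    """
    absorbed = [s for s in new_stories if cosine(topic, Counter(s)) >= sim_threshold]
    fused = {t: w * decay for t, w in topic.items()}
    for story in absorbed:
        counts = Counter(story)
        total = sum(counts.values())
        for term, c in counts.items():
            fused[term] = fused.get(term, 0.0) + c / total
    filtered = {t: w for t, w in fused.items() if w >= min_weight}
    return filtered, absorbed


if __name__ == "__main__":
    topic = {"暴雨": 0.5, "积水": 0.3, "预警": 0.04}
    stories = [["暴雨", "救援", "暴雨"], ["球赛", "比分"]]
    updated, absorbed = update_topic(topic, stories)
    print(sorted(updated), len(absorbed))
```

Decay lets recent vocabulary (救援 "rescue") enter while fading terms (预警 "warning", which drops to 0.032) are pruned; the off-topic story is left for another topic or a new one.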
Keywords/Search Tags: Topic Evolution, Topic Model, Temporal Information, Web Information Processing, Machine Learning