
Event Temporal Summarization System Based On Text Clustering

Posted on: 2016-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhao
GTID: 2308330503950634
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of network data, society has inevitably entered the era of big data and now faces the dual challenge of “data explosion” and “knowledge deprivation”. News reporting is one example: when an emergency breaks out, Internet users are overwhelmed by the sheer volume of news reports, and it is difficult for them to obtain useful information about the event in a timely, effective, and comprehensive way. In the era of big data, therefore, the problem to be solved is no longer how to access information, but how to help users obtain the most useful information in the shortest possible time.

To this end, this thesis studies and implements an event temporal summarization system based on text clustering. The goal of the system is to track the evolution of an emergency efficiently, mine the main useful information about the incident from the vast number of news reports, and present it to the user in the form of an event temporal summary, so that users can intuitively grasp how the event evolved. The main work and contributions of this thesis are as follows.

First, to address the low recall and precision of information retrieval, this thesis studies and implements a query expansion algorithm based on generalized semantic distance. The algorithm treats the word as its unit and uses the open interfaces of search engines to calculate the generalized semantic distance between words from their generalized network co-occurrence frequency. Compared with traditional query expansion methods based on a static ontology, this method can effectively improve the reliability of the algorithm.

Second, this thesis proposes and implements a new method for calculating text similarity based on the structural features of emergency news text.
The method first extracts the elements of breaking news events by casting element extraction as a named entity recognition task, and then builds a news event element representation model in combination with the results of query expansion. Then, because timeliness is a key factor for news reports, this thesis introduces the concept of a time window and establishes an event time representation model for calculating the similarity between different reports. Finally, the similarity of news texts is computed by combining the event element representation model and the event time representation model. The experimental results show that the algorithm outperforms the traditional algorithm, with a significant improvement in performance.

Third, this thesis analyzes the big data setting and the limitations of various clustering algorithms and then, taking the characteristics of emergencies into account, uses a dynamic hierarchical clustering algorithm to cluster news texts by the sub-themes of the news reports. The dynamic hierarchical clustering algorithm considers not only the connectivity between clusters but also the degree of closeness between them. The experimental results show that the proposed algorithm can significantly improve clustering performance.

In addition, the prototype system designed in this thesis participated in the Temporal Summarization task of the Text REtrieval Conference (TREC). Its evaluation results ranked second among all eligible teams, which demonstrates that the method designed in this thesis can achieve the desired effect.
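
The abstract does not give the similarity formula, so the following is only a minimal sketch of how an element-based score and a time-window score might be combined. Everything in it is an assumption made for illustration: the `Report` class, the use of Jaccard overlap for event elements, the linear decay inside the time window, and the weight `alpha` are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Report:
    """A news report reduced to its event elements and publication time."""
    elements: frozenset   # e.g. named entities plus expanded query terms
    published: datetime


def element_similarity(a: Report, b: Report) -> float:
    """Jaccard overlap of event elements (an assumed choice of measure)."""
    union = a.elements | b.elements
    if not union:
        return 0.0
    return len(a.elements & b.elements) / len(union)


def time_similarity(a: Report, b: Report, window: timedelta) -> float:
    """Assumed linear decay: 1.0 at the same instant, 0.0 once the gap reaches the window."""
    gap = abs((a.published - b.published).total_seconds())
    return max(0.0, 1.0 - gap / window.total_seconds())


def report_similarity(a: Report, b: Report,
                      window: timedelta = timedelta(hours=24),
                      alpha: float = 0.7) -> float:
    """Weighted sum of the two scores; alpha is a hypothetical tuning weight."""
    return (alpha * element_similarity(a, b)
            + (1.0 - alpha) * time_similarity(a, b, window))


if __name__ == "__main__":
    r1 = Report(frozenset({"earthquake", "Nepal", "Kathmandu"}),
                datetime(2015, 4, 25, 8, 0))
    r2 = Report(frozenset({"earthquake", "Nepal", "aid"}),
                datetime(2015, 4, 25, 20, 0))
    print(report_similarity(r1, r2))
```

In this sketch, two reports published further apart than the window can still be similar through shared elements alone; a multiplicative combination would instead force their similarity to zero. Which behaviour the thesis uses is not stated in the abstract.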
Keywords/Search Tags: event temporal summarization, text clustering, query expansion, similarity calculation