
Research On Hot Event Detection And Automatic Event Content Features Extraction

Posted on: 2010-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: X X Liu
GTID: 2178360275479607
Subject: Computer application technology
Abstract/Summary:
With the advent of the Internet as a new medium, we have moved from information scarcity into an era of information abundance. Under information overload, however, information on the Internet shows two significant characteristics: (1) dramatic growth in scale; (2) confused and disordered structure. These make it increasingly difficult to discover and manage the information one needs, so a tool that can quickly retrieve needed information from the Internet has become an urgent need.

Currently, search engines meet this need to some extent. However, because they locate information by keyword matching, they tend to return highly redundant results: any resource containing the query keywords is returned, so the results inevitably include much irrelevant information. Moreover, search engines simply list the results without organizing them effectively, which does not help people gain an overall understanding of a given news event. Some authoritative organizations do publish annual rankings of hot events in particular areas, but many of these are compiled manually, which greatly limits both the objectivity and the timeliness of the results.

This paper designs a Hot Event Detection and Description model, implements an experimental system, and tries to mitigate the above problems to some extent. The system focuses on the stream of news reports on the Internet: it automatically discovers hot events during any period selected by the user, extracts the content features of each hot event, and presents comprehensive event-related information to the user. The main research in this paper is as follows:

1. The design of a hot event detection algorithm based on two-layer clustering.
Because the system processes a large-scale Internet data stream, it divides the news corpus into hundreds of groups by date in order to reduce complexity and improve the real-time efficiency and accuracy of event detection. The first-layer clustering further divides each group into macro-clusters; the second-layer clustering then groups the macro-clusters of each day in the user-selected period into event lists.

2. The derivation of a formula for calculating an event's Hot Degree. The formula consists of characteristic quantities that measure the Hot Degree of an event, obtained by analyzing the characteristics of hot events in past years. After calculating the Hot Degree, the system sorts the event lists and filters out events that do not fit the characteristics of a hot event.

3. Research on methods for automatically extracting the content features of hot events from different aspects. The system presents the overall information of a hot event to users through the event title, event summary, event-relevant words, event-relevant documents, and an event development curve chart.

Finally, this paper runs experiments on a news corpus and conducts the relevant evaluation. The results show that the experimental system achieves fairly good results.
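
The two-layer procedure and the ranking step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not name the clustering algorithm, the similarity measure, or the Hot Degree formula, so the greedy single-pass clustering, the Jaccard similarity over word sets, the thresholds `t1` and `t2`, and the `hot_score` placeholder are all hypothetical choices made here for concreteness.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two token sets (assumed measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def single_pass(items, threshold):
    """Greedy single-pass clustering (assumed algorithm).

    items: list of (token_set, payload). Each item joins the first existing
    cluster whose accumulated token set is similar enough; otherwise it
    starts a new cluster.
    """
    clusters = []
    for tokens, payload in items:
        for cluster in clusters:
            if jaccard(tokens, cluster["tokens"]) >= threshold:
                cluster["tokens"] |= tokens
                cluster["members"].append(payload)
                break
        else:
            clusters.append({"tokens": set(tokens), "members": [payload]})
    return clusters


def detect_events(docs, start, end, t1=0.3, t2=0.3):
    """docs: iterable of (iso_date, text). Returns a list of events,
    each event being a list of (iso_date, text) reports."""
    # Group the corpus by date, keeping only the user-selected period.
    by_day = defaultdict(list)
    for day, text in docs:
        if start <= day <= end:  # ISO dates compare correctly as strings
            by_day[day].append((set(text.lower().split()), (day, text)))

    # First layer: cluster each day's reports into macro-clusters.
    macro = []
    for day in sorted(by_day):
        for cluster in single_pass(by_day[day], t1):
            macro.append((cluster["tokens"], cluster["members"]))

    # Second layer: cluster the macro-clusters across days into events.
    events = single_pass(macro, t2)
    return [[doc for group in e["members"] for doc in group] for e in events]


def hot_score(event):
    """Placeholder Hot Degree (NOT the thesis's formula): number of
    reports multiplied by the number of distinct days covered."""
    return len(event) * len({day for day, _ in event})
```

Sorting the detected events with `sorted(events, key=hot_score, reverse=True)` would then give a ranked event list. Clustering per day first keeps each first-layer pass small, and the second layer only compares macro-clusters rather than individual reports.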
Keywords/Search Tags: Topic Detection and Tracking, Event Detection, Hot Event Detection, Content Features Extraction