Detection Of Hot Events In Streaming News Using Hierarchical Clustering And N-gram Models

Posted on: 2012-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: W L Wang
Full Text: PDF
GTID: 2218330362457518
Subject: Software engineering
Abstract/Summary:
With the rapid development of Internet applications, traditional web search, which returns results for the keyword queries that users enter on a search website, has become a bottleneck in meeting rapidly growing search demand. Detecting hot events on the Internet and showing users queries related to the hot events that interest them is significant for improving the user experience. Hot event detection is a new research area in web search that mines information from Internet resources and presents the results to users. Unlike topic detection and tracking or event detection, hot event detection focuses on temporal features and frequency within a period of time; its output must also be easy to understand, not just a handful of semantically related words. It is therefore meaningful to find a simple, accurate method for detecting hot events from Internet data.

This thesis presents a new approach that combines hierarchical clustering with a semantic model to mine hot terms from a news stream. The model first clusters words into candidate hot events and then serializes each candidate so that users can understand what the hot event means. Applied to Really Simple Syndication (RSS) news, the method showed good accuracy. We also define a new concept, the pseudo-event, to evaluate the hot event detection model.

We prepared the streaming news by crawling seven RSS news feeds from the MSN and BBC news websites. We applied the hierarchical clustering method and the semantic model to the best feature set to detect hot events, and used pseudo-events to evaluate the results. The experiments show good results, indicating that the method is simple and effective.
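
The two-stage idea in the abstract (cluster words into candidate hot events, then serialize each candidate into a readable phrase) can be illustrated with a minimal sketch. This is not the thesis's implementation: the document-frequency cutoff used to pick hot terms, the Jaccard similarity, the single-link merge rule, the greedy bigram ordering, and all function names and thresholds below are assumptions chosen only to keep the example short and self-contained.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


def hot_terms(headlines, min_count=2):
    """Assumed stand-in for a burst score: terms in >= min_count headlines."""
    df = Counter()
    for h in headlines:
        df.update(set(tokenize(h)))
    return {t for t, c in df.items() if c >= min_count}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_terms(headlines, terms, threshold=0.5):
    """Single-link agglomerative clustering of terms by shared headlines."""
    docs = {t: {i for i, h in enumerate(headlines) if t in tokenize(h)}
            for t in terms}
    clusters = [{t} for t in sorted(terms)]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(clusters, 2):
            # Merge two clusters if any pair of their terms is similar enough.
            if any(jaccard(docs[x], docs[y]) >= threshold
                   for x in a for y in b):
                clusters.remove(a)
                clusters.remove(b)
                clusters.append(a | b)
                merged = True
                break
    return clusters


def serialize(cluster, headlines):
    """Order a cluster's words greedily using bigram counts from headlines."""
    bigrams = Counter()
    for h in headlines:
        kept = [w for w in tokenize(h) if w in cluster]
        bigrams.update(zip(kept, kept[1:]))
    preceded = Counter()
    for (_, second), count in bigrams.items():
        preceded[second] += count
    remaining = sorted(cluster)
    # Start with the word that is least often preceded by another cluster word.
    order = [min(remaining, key=lambda w: preceded[w])]
    remaining.remove(order[0])
    while remaining:
        # Append the word that most often follows the last chosen word.
        nxt = max(remaining, key=lambda w: bigrams[(order[-1], w)])
        order.append(nxt)
        remaining.remove(nxt)
    return " ".join(order)


def detect_hot_events(headlines):
    """Run the sketch end to end and return serialized candidate events."""
    terms = hot_terms(headlines)
    clusters = cluster_terms(headlines, terms)
    return sorted(serialize(c, headlines) for c in clusters)
```

Calling `detect_hot_events` on a list of headline strings returns one short phrase per candidate event. A real system would replace the frequency cutoff with a time-aware burst measure and filter stop words before clustering.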
Keywords/Search Tags: Hot event detection, Hierarchical clustering, Event serialization