
Exploring Temporal Text Mining For News Content Anatomy And Recommendation

Posted on: 2011-04-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Chen
GTID: 1118330332978381
Subject: Computer Science and Technology
Abstract/Summary:
The rapid growth of the Internet has greatly accelerated information propagation. Web news plays a very important role on the Internet and has already become one of the most widely used Web applications. Web news is the reporting of recent events published on the Web. Compared with traditional news media, Web news has many advantages, such as freshness, capability, richness, interactivity, and searchability. It makes it much easier for users to get information from the outside world. However, the massive amount of Web news also brings information overload problems.

News content anatomy and recommendation can go a long way toward meeting users' needs for Web news. News content anatomy is the process of extracting previously unknown, understandable, and usable patterns from news content. Based on an analysis of users' usage patterns of Web news, a recommendation system automatically pushes preferred news to users. Both news content anatomy and recommendation deal with temporal text, and the key to both is temporal text mining. By exploring temporal text mining, we study multiple problems of news content anatomy and recommendation, as follows:

We first propose a bursty event detection method that analyzes bursty features in a temporal news corpus. The features in the corpus are represented as feature trails and then transformed to the wavelet domain. We introduce an elastic burst detection algorithm to identify multi-scale bursty features and model them as vectors. With each feature's power (bursty level) set as its preference, the affinity propagation clustering algorithm groups together bursty features that have high document overlap and identical distributions in bursty time windows. Events are then returned to users in order of their power.

We then study a particular news stream monitoring task: timely detection of bursty events that have happened recently and discovery of their evolutionary patterns along the timeline.
We use a multi-resolution sliding window to monitor the feature trail and apply an online multi-resolution burst detection method to identify bursty features with different bursty durations within the recent time window. We cluster bursty features to form bursty events and associate each event with a power value that reflects its bursty level. An information retrieval method based on cosine similarity is used to discover each event's evolution along the timeline.

We further introduce an online event detection algorithm for news streams. First, we represent a feature stream as a random process and apply a goodness-of-fit test to find the features with obvious changes in the distribution of term frequency in a news document. A left-sided significance test is then used to validate bursty features. Then, an evolutionary spectral clustering algorithm is applied to group highly correlated bursty features into bursty events.

To help users understand various aspects of a temporal news stream, we study topic decomposition and summarization for a temporally sequenced text corpus on a specific topic. We derive sub-topics by applying Non-negative Matrix Factorization (NMF) to the terms-by-sentences matrix of the temporal news stream. We then detect the incidents of each sub-topic and generate summaries for both the sub-topic and its incidents by examining the constitution of its encoding vector generated by NMF. Finally, we rank the sentences based on the encoding matrix and select the top-ranked sentences of each sub-topic as the summary of the temporal news corpus.

Last, we present an architecture for providing personalized phonic Web news on Internet-connected consumer electronics. It provides two types of personalization. An adaptive channel navigation method is introduced to help users reach relevant channels quickly. In addition, a news recommendation strategy is proposed to track multiple threads of users' interests and provide users with preferred news.
We implement this system under the name EagleRadio. EagleRadio not only provides personalized phonic news but also integrates some news content anatomy functions, such as bursty event detection, user interest modeling, and visualization.
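
The sub-topic decomposition and sentence-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the toy sentences, the function names nmf and top_sentences, the choice of k=2 sub-topics, the raw term-count weighting, and the use of Lee-Seung multiplicative updates as the NMF solver are all assumptions made for illustration. The abstract only states that NMF is applied to a terms-by-sentences matrix and that sentences are ranked using the encoding matrix.

```python
import numpy as np

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (terms x sentences) as W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius loss
    (an assumed solver; the abstract does not name one).
    W is terms x k (sub-topic basis), H is k x sentences (encoding matrix).
    """
    rng = np.random.default_rng(seed)
    n_terms, n_sents = V.shape
    W = rng.random((n_terms, k)) + eps
    H = rng.random((k, n_sents)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def top_sentences(H, per_topic=1):
    """For each sub-topic (row of H), return the indices of the
    sentences with the largest encoding weights."""
    return [[int(i) for i in np.argsort(-row)[:per_topic]] for row in H]

# Hypothetical toy corpus on a single news topic, one sentence per entry.
sentences = [
    "rescue teams search collapsed buildings",
    "rescue teams reach trapped survivors",
    "markets fall after the quake",
    "markets recover as trading resumes",
]

# Build the terms-by-sentences matrix from raw term counts.
vocab = sorted({w for s in sentences for w in s.split()})
V = np.array([[s.split().count(t) for s in sentences] for t in vocab],
             dtype=float)

W, H = nmf(V, k=2)
for topic, idx in enumerate(top_sentences(H)):
    print(f"sub-topic {topic}:", [sentences[i] for i in idx])
```

In this sketch each row of H plays the role of a sub-topic's encoding vector, so the highest-weight sentences in a row serve as that sub-topic's summary. A production system would more likely use weighted term features and a tested NMF library rather than this hand-written solver.
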
Keywords/Search Tags: temporal text mining, news content anatomy, news recommendation, bursty feature, bursty event, affinity propagation clustering, evolutionary clustering, topic decomposition, adaptive channel navigation, multiple topic user modeling