Research on and Implementation of News Reader Based on Text Analysis

Posted on: 2012-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Mao
GTID: 2178330332978559
Subject: Computer application technology
Abstract/Summary:
The explosive growth of the Web gives users access to far more news, but it also makes it harder for them to find the news they are interested in. News reading platforms have emerged to address this problem. A news reading platform uses a focused crawler to collect news from different websites, then processes and analyzes the data to extract the information most valuable to the user, making reading faster and more convenient. This thesis studies the news reading platform and its key technologies, and makes the following contributions:

1. By analyzing the properties of news, we introduce a time decay model and a virtual graph model, and then propose a news ranking algorithm based on mutual reinforcement of information. We use bottom-up clustering to compute news topics, and rank news sources, topics, and articles simultaneously.

2. Building on the traditional spectral clustering algorithm, we introduce the concept of temporal smoothness to express the similarity between clustering results computed at successive times. We further add a related-link constraint model and a tag constraint model, and propose a constrained, temporally smooth spectral clustering algorithm. A relaxed optimal solution is obtained by eigenvalue decomposition of the matrix that appears in the modified spectral clustering cost function.

3. We design and implement Eagle NewsReader, a news reading system. It has crawled more than 1.4 million news articles from 28 source websites and, after processing and analyzing the data, provides users with a unified news reading service. Both the ranking and the clustering algorithms proposed in this thesis have been deployed in the system and achieve good results.
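The idea of ranking sources, topics, and articles through mutual reinforcement can be illustrated with a minimal Python sketch. The abstract does not give the thesis's formulas, so everything here is an assumption: the exponential half-life decay, the hypothetical parameters `half_life_hours` and `alpha`, and the HITS-style update rule (a source or topic scores highly when its articles do; an article scores highly when it is fresh and belongs to a strong source and topic). The virtual graph model is not represented at all.

```python
from collections import defaultdict


def time_decay(age_hours, half_life_hours=24.0):
    """Assumed exponential decay: the weight halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


def _normalize(scores):
    """Scale scores so they sum to 1 (left unchanged if they sum to 0)."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total else dict(scores)


def mutual_rank(articles, iterations=20, alpha=0.5):
    """Hypothetical mutual-reinforcement ranking, not the thesis's exact method.

    articles: list of (article_id, source, topic, age_hours) tuples.
    Returns normalized score dicts for articles, sources, and topics.
    """
    decay = {a: time_decay(age) for a, _, _, age in articles}
    art = dict(decay)  # initial article scores: freshness only
    src, top = {}, {}
    for _ in range(iterations):
        # Sources and topics collect the scores of their articles.
        src_acc, top_acc = defaultdict(float), defaultdict(float)
        for a, s, t, _ in articles:
            src_acc[s] += art[a]
            top_acc[t] += art[a]
        src, top = _normalize(src_acc), _normalize(top_acc)
        # Articles are reinforced by their source and topic, damped by age.
        art = _normalize({
            a: decay[a] * ((1 - alpha) * src[s] + alpha * top[t])
            for a, s, t, _ in articles
        })
    return art, src, top
```

Multiplying by the decay term on every iteration keeps an old article from being lifted indefinitely by a strong source or a large topic.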
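Computing news topics by bottom-up clustering can be sketched as average-link agglomerative clustering over bag-of-words cosine similarity. The abstract only says "bottom-up clustering", so the linkage criterion, the whitespace tokenizer, and the stopping `threshold` are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[w] * v.get(w, 0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def bottom_up_topics(docs, threshold=0.3):
    """Assumed average-link agglomerative clustering of documents into topics.

    Starts with one cluster per document and repeatedly merges the pair of
    clusters with the highest average pairwise cosine similarity, stopping
    once that best similarity falls below `threshold`.
    Returns a list of clusters, each a list of document indices.
    """
    vecs = [Counter(d.lower().split()) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def avg_sim(c1, c2):
        total = sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair  # y > x, so deleting y leaves index x valid
        clusters[x].extend(clusters[y])
        del clusters[y]
    return clusters
```

For example, `bottom_up_topics(["earthquake hits city", "city earthquake damage", "stock market falls", "stock market falls sharply"])` groups the two earthquake reports into one topic and the two market reports into another.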
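The claim that the relaxed optimum comes from an eigenvalue decomposition can be illustrated with one published formulation of temporally smooth spectral clustering, the "preserving cluster quality" variant of evolutionary spectral clustering (Chi et al., KDD 2007). It is not necessarily the thesis's exact cost function. Assume the same n articles are present at times t-1 and t, with affinity matrices W_{t-1} and W_t, degree matrices D_{t-1} and D_t, k clusters, and a smoothness weight α in [0, 1]. The link and tag constraint matrices L and G and their weights γ and δ are hypothetical, because the abstract does not define the constraint models.

```latex
\begin{aligned}
\mathrm{NC}_\tau(Z) &= k - \operatorname{tr}\!\left(Z^\top D_\tau^{-1/2} W_\tau D_\tau^{-1/2} Z\right),
  \qquad Z \in \mathbb{R}^{n \times k},\ Z^\top Z = I,\ \tau \in \{t-1, t\} \\
\mathrm{Cost}(Z) &= \alpha\,\mathrm{NC}_t(Z) + (1-\alpha)\,\mathrm{NC}_{t-1}(Z)
  = k - \operatorname{tr}\!\left(Z^\top M Z\right) \\
M &= \alpha\, D_t^{-1/2} W_t D_t^{-1/2} + (1-\alpha)\, D_{t-1}^{-1/2} W_{t-1} D_{t-1}^{-1/2} \\
\tilde{W}_t &= W_t + \gamma L + \delta G \qquad \text{(hypothetical link and tag constraints)}
\end{aligned}
```

In this formulation, minimizing the cost is the same as maximizing tr(Z^T M Z) subject to Z^T Z = I, and that maximum is reached by taking the k eigenvectors of M with the largest eigenvalues as the columns of Z; a discretization step such as k-means on the rows of Z then yields the actual clusters. If constraints are folded in as in the last line, the augmented matrix and its recomputed degree matrix replace W_t and D_t in M.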
Keywords/Search Tags: News Reading, News Ranking, Spectral Clustering, Text Analysis, News Topic