
The Study on Automatic Summarization Approach of Blog Documents

Posted on: 2012-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: J Miao
GTID: 2218330338463714
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of the Internet and the ongoing information explosion, people have new requirements for obtaining information. They are no longer satisfied with the multimedia content and articles published by news media and portals, and are increasingly inclined toward communicating, sharing, and interacting with other Internet users. As these demands have grown, the blog, a poster child of Web 2.0, has attracted wide attention and developed further. Because blog articles and comments vary widely in quality, extracting the main content of an article and giving readers a summary that takes the comments into account has become a difficult problem for many blog applications. Research on automatic summarization of blog documents offers a way to solve this problem.

At present, most research on automatic summarization focuses on single documents or on multiple documents under the same topic, and few summarization methods target the interactive entities of the Internet. Established general-purpose summarization methods are biased in how they understand and analyze blog content and structure, and the summaries they generate are mostly of poor quality. Suitable methods for summarizing blog articles are therefore lacking, and existing methods perform poorly. This thesis analyzes the characteristics of blog elements and the relations among them, and proposes an automatic summarization method for blog articles. It covers the following aspects:

1. Features that quantify the importance of blog elements. Based on an analysis of the characteristics of blog elements, this thesis proposes blog statistical features, content complexity, and opinion uniqueness to quantify the importance of blog elements. Experiments show that these features effectively improve the quality of blog summaries.
2. A method for ranking comments and filtering noise. Using relevant features of the text and comments, this thesis computes comment scores and sets the filtering threshold through a regression approach. Experiments show that the method ranks comments effectively and filters noise markedly well.
3. A HITS-based method for ranking sentences. Sentences in the blog text and comments are treated as vertices of a HITS graph; the link graph is built by analyzing the relations between the blog text and its comments, and the sentence ranking is then obtained with the HITS algorithm.
4. Building on the research above, a summarization method for blog articles. The method consists of four steps: comment weighting, noise filtering, sentence weighting, and summary generation. Experiments on the ifeng Blog dataset show that it achieves higher ROUGE scores than several known methods.
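The abstract states only that comment scores and the filtering threshold are obtained "through a regression approach." The sketch below shows one plausible reading, under explicit assumptions: a linear least-squares model maps hypothetical per-comment features (here `similarity_to_post` and `normalized_length`, not taken from the thesis) to human-assigned quality scores, and the threshold is assumed to be the midpoint between the mean predicted scores of training comments labeled noise and those labeled non-noise. All function names and data are illustrative.

```python
from typing import List, Sequence, Tuple


def fit_linear_regression(X: Sequence[Sequence[float]], y: Sequence[float],
                          ridge: float = 1e-9) -> List[float]:
    """Least-squares fit of y ~ w[0] + w[1:] . x via the normal equations.

    A tiny ridge term keeps the system solvable if features are nearly collinear.
    """
    rows = [[1.0, *x] for x in X]  # prepend a bias column
    d = len(rows[0])
    # A = X^T X + ridge * I,  b = X^T y
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    # Gaussian elimination with partial pivoting
    for col in range(d):
        piv = max(range(col, d), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def choose_threshold(w: Sequence[float], X: Sequence[Sequence[float]],
                     is_noise: Sequence[bool]) -> float:
    """Assumed rule: midpoint of mean predicted score for noise vs. non-noise comments."""
    noise = [predict(w, x) for x, n in zip(X, is_noise) if n]
    clean = [predict(w, x) for x, n in zip(X, is_noise) if not n]
    return (sum(noise) / len(noise) + sum(clean) / len(clean)) / 2


def rank_and_filter(comments: Sequence[str], features: Sequence[Sequence[float]],
                    w: Sequence[float], threshold: float) -> List[Tuple[str, float]]:
    """Score comments, drop those below the threshold, and sort the rest by score."""
    scored = [(c, predict(w, f)) for c, f in zip(comments, features)]
    kept = [cs for cs in scored if cs[1] >= threshold]
    return sorted(kept, key=lambda cs: cs[1], reverse=True)


# Hypothetical training data: [similarity_to_post, normalized_length] per comment.
train_X = [[0.9, 0.8], [0.7, 0.6], [0.1, 0.1], [0.2, 0.05], [0.8, 0.3], [0.05, 0.2]]
train_y = [0.88, 0.70, 0.19, 0.235, 0.67, 0.19]
train_noise = [False, False, True, True, False, True]

w = fit_linear_regression(train_X, train_y)
threshold = choose_threshold(w, train_X, train_noise)

new_comments = ["Great analysis of the policy", "first!", "I disagree with point two"]
new_features = [[0.85, 0.5], [0.0, 0.02], [0.6, 0.4]]
ranked = rank_and_filter(new_comments, new_features, w, threshold)
```

With this toy data the short "first!" comment falls below the learned threshold and is filtered out, while the two substantive comments are kept in score order. A real system would use the thesis's own features and labeled ifeng Blog data, which are not reproduced here.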
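The HITS-based sentence ranking can be sketched as follows. The abstract does not say exactly how links between text and comment sentences are derived, so this sketch assumes a bipartite graph: each comment sentence is a hub that links to the blog-text sentences it overlaps with (Jaccard word overlap at or above a hypothetical `min_sim`), each edge is scaled by a per-comment weight such as the score from the comment-ranking step, and the summary is the top-`k` text sentences by authority score, kept in their original order. The helper names and example sentences are illustrative only.

```python
import math
import re
from typing import List, Sequence, Set, Tuple


def tokens(sentence: str) -> Set[str]:
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def build_links(text_sents: Sequence[str], comment_sents: Sequence[str],
                comment_weights: Sequence[float], min_sim: float = 0.1) -> List[List[float]]:
    """W[i][j]: weight of the link from comment sentence i (hub) to text sentence j (authority)."""
    text_tok = [tokens(s) for s in text_sents]
    W = []
    for c, cw in zip(comment_sents, comment_weights):
        c_tok = tokens(c)
        row = []
        for tt in text_tok:
            sim = jaccard(c_tok, tt)
            row.append(cw * sim if sim >= min_sim else 0.0)
        W.append(row)
    return W


def hits(W: List[List[float]], iters: int = 50,
         tol: float = 1e-9) -> Tuple[List[float], List[float]]:
    """Standard HITS power iteration on a weighted hub -> authority matrix."""
    n_hub, n_auth = len(W), (len(W[0]) if W else 0)
    hub, auth = [1.0] * n_hub, [1.0] * n_auth
    for _ in range(iters):
        # authority(j) = sum of hub scores of sentences linking to j
        new_auth = [sum(W[i][j] * hub[i] for i in range(n_hub)) for j in range(n_auth)]
        norm = math.sqrt(sum(x * x for x in new_auth)) or 1.0
        new_auth = [x / norm for x in new_auth]
        # hub(i) = sum of authority scores of sentences i links to
        new_hub = [sum(W[i][j] * new_auth[j] for j in range(n_auth)) for i in range(n_hub)]
        norm = math.sqrt(sum(x * x for x in new_hub)) or 1.0
        new_hub = [x / norm for x in new_hub]
        delta = (sum(abs(a - b) for a, b in zip(new_auth, auth))
                 + sum(abs(a - b) for a, b in zip(new_hub, hub)))
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth


def summarize(text_sents: Sequence[str], comment_sents: Sequence[str],
              comment_weights: Sequence[float], k: int = 2) -> List[str]:
    _, auth = hits(build_links(text_sents, comment_sents, comment_weights))
    top = sorted(range(len(text_sents)), key=lambda j: auth[j], reverse=True)[:k]
    return [text_sents[j] for j in sorted(top)]  # restore original sentence order


text = ["The new subway line opens next month.",
        "Ticket prices will rise by ten percent.",
        "The mayor attended a ribbon ceremony."]
comments = ["Ten percent ticket prices rise is too much.",
            "Finally the subway line opens!",
            "nice pic"]
weights = [1.0, 0.8, 0.1]  # e.g. comment scores from a ranking step

hub, auth = hits(build_links(text, comments, weights))
summary = summarize(text, comments, weights, k=2)
```

Here the sentence about ticket prices receives the highest authority score because the most heavily weighted comment discusses it, and the ceremony sentence, which comments barely touch, is left out of the two-sentence summary.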
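The evaluation compares methods by ROUGE score. The abstract does not name the variant used, so as a reference point here is a minimal sketch of ROUGE-N (n-gram overlap with clipped counts) between one candidate summary and one reference. It assumes simple lowercase whitespace tokenization; the official ROUGE toolkit adds options such as stemming, stopword removal, and multiple references that are not reproduced here.

```python
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> Dict[str, float]:
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped matches: min(count in cand, count in ref)
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


reference = "the subway line opens next month"
candidate = "the new subway line opens"
r1 = rouge_n(candidate, reference, n=1)
r2 = rouge_n(candidate, reference, n=2)
```

For this pair, 4 of the 6 reference unigrams appear in the candidate (ROUGE-1 recall 4/6), and 2 of the 5 reference bigrams match (ROUGE-2 recall 0.4). Summarization papers usually report recall-oriented ROUGE, averaged over a test set.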
Keywords/Search Tags: Automatic document summarization, Blog, comments, HITS