Research And Implementation Of Blog Document Automatic Summarization Based On Ontology

Posted on: 2010-10-12 | Degree: Master | Type: Thesis
Country: China | Candidate: S Sun
GTID: 2178360272991634 | Subject: Computer application technology
Abstract/Summary:
With the development of Web 2.0, many Web 2.0-based applications have emerged, and the BLOG is one of the most representative of them. However, the number of BLOG documents has increased sharply, and accessing and using these resources quickly and efficiently has become an urgent problem. Automatic summarization is the key to solving it.

Based on a study of BLOG document automatic summarization and of Ontology, I proposed a solution for Ontology-based BLOG document automatic summarization. The main contributions of this thesis are summarized below.

First, I proposed an Ontology-based BLOG document automatic summarization system architecture (O-BSSA) to guide the generation of BLOG document summaries. O-BSSA exploits the autonomy and cooperation capabilities of agents and combines Ontology technology with automatic summarization technology. It adopts a multi-agent structure that carries out BLOG document collection, pre-processing, modeling and summarization, and it offers high parallelism, high reliability and high extensibility.

Second, under the guidance of O-BSSA, I conducted an in-depth study of BLOG document modeling, topic analysis and automatic summarization. In the modeling phase, I proposed a new approach to weighting keywords using BLOG features. It builds on the TF*IDF method and additionally takes into account the document's structure features, tag features and comment features, which makes it better suited to BLOG documents. I then used the Vector Space Model (VSM) to represent the information in BLOG documents and extracted feature items from them with Latent Semantic Analysis (LSA). In the topic analysis phase, I used the concepts and relationships defined in a BLOG Ontology to build a concept hierarchy tree, and counted concepts instead of keywords to analyse the topics of BLOG documents.
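The keyword-weighting and concept-counting steps of the modeling and topic-analysis phases can be illustrated with a small sketch. It is not the thesis's formula, which the abstract does not give: it assumes a smoothed TF*IDF that is multiplied by fixed boosts when a word also occurs in the post's title (structure feature), tags or comments, and the names `TITLE_BOOST`, `TAG_BOOST`, `COMMENT_BOOST` and their values are hypothetical. The toy ontology `PARENT` and the keyword-to-concept `LEXICON` are stand-ins for a real BLOG Ontology. Concepts are "counted" by crediting each keyword's weight to its concept and to every ancestor in the hierarchy tree, and the concepts at one level of the tree are ranked as candidate topics. The VSM/LSA feature-extraction step is omitted.

```python
import math
from collections import Counter

# Assumed boost factors for BLOG features; the thesis does not publish its
# actual formula or coefficients.
TITLE_BOOST = 2.0    # word also appears in the post title (structure feature)
TAG_BOOST = 1.5      # word is one of the author's tags (tag feature)
COMMENT_BOOST = 1.2  # word also appears in reader comments (comment feature)

# Hypothetical toy ontology: child concept -> parent concept (root maps to None).
PARENT = {
    "Thing": None,
    "Photography": "Thing",
    "Camera": "Photography",
    "Lens": "Photography",
    "Commerce": "Thing",
    "Price": "Commerce",
}
# Hypothetical lexicon mapping surface keywords to ontology concepts.
LEXICON = {"camera": "Camera", "battery": "Camera", "lens": "Lens", "price": "Price"}


def blog_keyword_weights(post, corpus):
    """Smoothed TF*IDF over the post body, boosted by BLOG features."""
    tf = Counter(post["body"])
    n_docs = len(corpus)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for doc in corpus if word in doc["body"])
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        w = count / len(post["body"]) * idf
        if word in post["title"]:
            w *= TITLE_BOOST
        if word in post["tags"]:
            w *= TAG_BOOST
        if word in post["comments"]:
            w *= COMMENT_BOOST
        weights[word] = w
    return weights


def concept_scores(keyword_weights):
    """Count concepts instead of keywords: each keyword's weight is credited
    to its concept and to every ancestor in the concept hierarchy tree."""
    scores = Counter()
    for word, w in keyword_weights.items():
        concept = LEXICON.get(word)  # words outside the ontology are ignored
        while concept is not None:
            scores[concept] += w
            concept = PARENT[concept]
    return scores


def depth(concept):
    """Distance from the concept to the root of the hierarchy."""
    d = 0
    while PARENT[concept] is not None:
        concept = PARENT[concept]
        d += 1
    return d


def topics(scores, level=1):
    """Rank the concepts at one level of the tree as candidate topics."""
    return [c for c, _ in scores.most_common() if depth(c) == level]


post = {
    "title": ["camera", "review"],
    "tags": ["photography", "lens"],
    "body": ["camera", "lens", "battery", "camera", "price"],
    "comments": ["lens"],
}
corpus = [
    post,
    {"body": ["price", "shop", "sale"]},
    {"body": ["lens", "price", "camera"]},
]
weights = blog_keyword_weights(post, corpus)
print(topics(concept_scores(weights)))  # ['Photography', 'Commerce']
```

Because ancestors accumulate the weight of all their descendants, "camera", "battery" and "lens" jointly push Photography ahead of Commerce even though no single keyword names that concept, which is the intuition behind counting concepts rather than keywords.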
This topic-analysis method draws on both similarity-based structure analysis and the descriptive power of the semantic concepts in the Ontology. In the summarization phase, I combined the importance of feature items within sentences, of sentences within paragraphs, and of topics within paragraph clusters to calculate a score for each sentence, and then chose representative sentences from the BLOG document according to the summary compression ratio. Within the summary length limit, this method effectively avoids selecting semantically similar sentences and distributes the representative sentences across the topics, which makes the final summary more concise.

Finally, based on the above theory, I implemented a prototype, the Ontology-based BLOG document automatic summarization system, and used it to conduct experiments. The results showed reduced summary redundancy and improved coverage and accuracy.
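The sentence scoring and selection of the summarization phase can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the three importance factors are simply multiplied, in-paragraph importance is assumed to decay with position as 1/(1 + position), redundancy is measured with bag-of-words cosine similarity against a hypothetical threshold `max_sim`, and topics (paragraph clusters) are visited round-robin in order of a hypothetical `topic_weight`. The names `sentence_score`, `summarize`, `ratio` and `max_sim` are illustrative only.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two token lists (bag-of-words vectors)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def sentence_score(sent, term_weight, topic_weight):
    # 1) importance of the feature items inside the sentence
    feature = sum(term_weight.get(t, 0.0) for t in sent["tokens"]) / max(1, len(sent["tokens"]))
    # 2) importance of the sentence inside its paragraph (assumed: earlier is better)
    position = 1.0 / (1 + sent["pos_in_para"])
    # 3) importance of the sentence's topic, i.e. its paragraph cluster
    return feature * position * topic_weight[sent["topic"]]


def summarize(sentences, term_weight, topic_weight, ratio=0.3, max_sim=0.5):
    budget = max(1, math.ceil(ratio * len(sentences)))  # compression ratio -> length limit
    ranked = sorted(sentences, key=lambda s: sentence_score(s, term_weight, topic_weight), reverse=True)
    by_topic = {}  # topic -> its sentences, best first
    for s in ranked:
        by_topic.setdefault(s["topic"], []).append(s)
    order = sorted(by_topic, key=lambda t: topic_weight[t], reverse=True)
    chosen = []
    # Round-robin over topics so each topic gets a representative sentence,
    # skipping candidates too similar to a sentence already chosen.
    while len(chosen) < budget and any(by_topic.values()):
        for topic in order:
            queue = by_topic[topic]
            while queue:
                cand = queue.pop(0)
                if all(cosine(cand["tokens"], c["tokens"]) <= max_sim for c in chosen):
                    chosen.append(cand)
                    break
            if len(chosen) >= budget:
                break
    # Emit the summary in original document order.
    return [s["text"] for s in sorted(chosen, key=lambda s: s["idx"])]


rows = [  # (text, tokens, position in paragraph, topic)
    ("The new camera has a sharp lens.", ["camera", "sharp", "lens"], 0, 0),
    ("Its lens is very sharp.", ["lens", "sharp"], 1, 0),
    ("Battery life is long.", ["battery", "life", "long"], 2, 0),
    ("The price dropped last week.", ["price", "dropped", "week"], 0, 1),
    ("Shops now sell it cheaply.", ["shops", "sell", "cheaply"], 1, 1),
]
sentences = [
    {"idx": i, "text": text, "tokens": toks, "pos_in_para": pos, "topic": topic}
    for i, (text, toks, pos, topic) in enumerate(rows)
]
term_weight = {"camera": 1.0, "lens": 0.8, "sharp": 0.5, "battery": 0.6, "price": 0.7, "sell": 0.3}
topic_weight = {0: 0.7, 1: 0.3}
summary = summarize(sentences, term_weight, topic_weight, ratio=0.5)
# ['The new camera has a sharp lens.', 'Battery life is long.', 'The price dropped last week.']
print(summary)
```

In this toy run, "Its lens is very sharp." scores higher than the battery sentence but is skipped because it overlaps heavily with the first sentence, and the price sentence is included so that the second topic is represented, mirroring the two properties the abstract claims for the selection step.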
Keywords: BLOG Document, Automatic Summarization, Subject Analysis, Latent Semantic, Ontology