
Research On The Domain Ontology-based Automatic Content Summarization Of Web Documents And Its Implementation

Posted on: 2008-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 2178360212490601
Subject: Computer application technology
Abstract/Summary:
As a carrier of huge amounts of information, the Internet is increasingly popular. While enjoying the advantages of the network, people are overwhelmed by the volume of web page information. How to locate useful web information quickly and accurately has become a research hotspot. Web information exists in the form of documents. Automatic summarization of web documents, in which a computer analyzes a web page and extracts its core content, is a powerful tool for solving this problem. It speeds up the acquisition of useful information by letting the reader judge the value of a web page from its core content.

However, applying automatic summarization of web documents to this problem is not yet satisfactory. On the one hand, non-standard HTML tags and noise in web pages interfere with precise extraction of the web document. On the other hand, existing automatic summarization technology is based on statistical methods, which ignore analysis of the content and subject of the document and so lower summary quality. To address these shortcomings, this thesis proposes an approach that extracts web documents with a bottom-up algorithm based on page segmentation, and an algorithm for automatic document summarization based on domain ontology. The former improves extraction accuracy effectively; the latter merges semantic analysis into automatic summarization based on the Latent Semantic Analysis Model (LSAM), improving summary quality. MIA, a domain ontology-based system, is also described to validate the algorithms.
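The abstract does not give the details of the bottom-up extraction algorithm, so the following is only a minimal sketch of the general idea: start from leaf text blocks, then discard blocks that look like noise. The names `BlockCollector` and `extract_main_text`, the tag sets, and the thresholds `min_chars` and `max_link_density` are all assumptions made for illustration, not the thesis's method.

```python
from html.parser import HTMLParser

# Assumed tag sets; a real segmenter would use a richer notion of "block".
BLOCK_TAGS = {"p", "div", "td", "li", "h1", "h2", "h3", "article", "section"}
SKIP_TAGS = {"script", "style"}


def _visible_len(text):
    """Number of non-whitespace characters in text."""
    return len("".join(text.split()))


class BlockCollector(HTMLParser):
    """Hypothetical helper: collects leaf text blocks and their link density."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # list of (text, link_density)
        self._text = []        # text fragments of the block being built
        self._link_chars = 0   # non-whitespace characters seen inside <a>
        self._in_link = 0
        self._skip = 0

    def flush(self):
        """Close the current block and record it if it contains any text."""
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars / _visible_len(text)))
        self._text, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style content
        self._text.append(data)
        if self._in_link:
            self._link_chars += _visible_len(data)


def extract_main_text(html, min_chars=40, max_link_density=0.3):
    """Keep leaf blocks that are long enough and not dominated by links."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()
    return [text for text, density in parser.blocks
            if len(text) >= min_chars and density <= max_link_density]
```

On a page with a navigation bar, one article paragraph, and a short footer, this filter keeps only the paragraph: the navigation bar fails the link-density test and the footer fails the length test.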
The main contributions of this thesis can be summarized as follows:
1) A bottom-up web document extraction algorithm based on page segmentation is proposed, building on existing web document extraction methods.
2) A domain ontology-based automatic summarization algorithm is given, which introduces semantic analysis into Latent Semantic Analysis.
3) Using these new algorithms, the key components of the Mobile Intelligent Assistant (MIA) system are designed, and a prototype is implemented to demonstrate the validity of the new algorithms...
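To make the second contribution more concrete, here is a hedged sketch of LSA-style sentence ranking with a simple ontology boost. It is not the algorithm from the thesis, whose details the abstract does not give. It assumes that domain ontology concepts are available as a plain set of terms, that ontology terms simply receive a larger weight (`boost`, a made-up parameter), and that only the dominant right singular vector of the term-by-sentence matrix is used, approximated by power iteration instead of a full singular value decomposition. All function names are hypothetical.

```python
import math
import re


def tokenize(sentence):
    """Lowercase alphabetic tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z]+", sentence.lower())


def build_matrix(sentences, ontology_terms, boost=2.0):
    """Term-by-sentence matrix: rows are terms, columns are sentences.

    Each occurrence of a term counts 1.0, or `boost` if the term is in the
    (assumed) set of domain ontology terms.
    """
    vocab = sorted({t for s in sentences for t in tokenize(s)})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = [[0.0] * len(sentences) for _ in vocab]
    for j, sentence in enumerate(sentences):
        for t in tokenize(sentence):
            matrix[index[t]][j] += boost if t in ontology_terms else 1.0
    return matrix


def top_right_singular_vector(matrix, n, iterations=100):
    """Power iteration on A^T A, approximating the dominant right singular
    vector of A (one entry per sentence)."""
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        u = [sum(row[j] * v[j] for j in range(n)) for row in matrix]   # A v
        w = [sum(matrix[i][j] * u[i] for i in range(len(matrix)))      # A^T u
             for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def summarize(sentences, ontology_terms, k=1):
    """Return the k sentences with the largest weight in the dominant
    latent topic, in their original order."""
    if not sentences:
        return []
    matrix = build_matrix(sentences, ontology_terms)
    v = top_right_singular_vector(matrix, len(sentences))
    ranked = sorted(range(len(sentences)), key=lambda j: -abs(v[j]))
    return [sentences[j] for j in sorted(ranked[:k])]
```

A fuller LSA summarizer would use several singular vectors, one per latent topic, and a real ontology would contribute concept relations and not only a flat term list. The sketch keeps one topic and one weight so the interaction between ontology weighting and the decomposition stays visible.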
Keywords/Search Tags:automatic summarization, domain ontology, web document, singular value decomposition, text block