Design And Realization For Automatic Summarization System Of Search Engine On Chinese Web Document Of Science And Technology

Posted on: 2009-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: H E Zhu
GTID: 2178360242982076
Subject: Information Science
Abstract/Summary:
With the rapid development of the Internet, the network has become a warehouse of data and a source of knowledge. How to obtain and use these resources faster and more effectively is a problem that urgently needs solving. Information retrieval and automatic summarization are the key technologies for this problem. Automatic summarization can express the content of a document compactly, which makes it a direction of development for information retrieval. However, search engines, the main instrument of retrieval, currently return only some sentences or paragraphs containing the query keywords as the abstract, and it is hard for users to grasp the content of a Web document from this kind of result. Traditional text summarization based on frequency statistics focuses only on the external characteristics of the text and lacks deep semantic analysis, so it is not entirely suitable for summarizing Web documents.

The goal of this research is to propose, on the basis of further investigation of automatic summarization technology, an automatic summarization method for Chinese science and technology Web documents. Furthermore, a Web document automatic summarization system is developed in this paper. As an auxiliary tool for a search engine, the summary should be complete, general, and coherent.

Accordingly, in this paper the text information of a Web page is first extracted on the basis of an analysis of the information characteristics of the Web. Second, a combination of statistical methods and heuristic rules is used to obtain the keywords and key sentences. Finally, the eligible summary sentences are selected according to the summary proportion. In the course of this work, the related problems and techniques of text summarization are discussed in detail. An algorithm for extracting the textual abstract and subtitles from a Web document is presented.
A method for summarizing Web documents that combines statistical methods with text structure analysis is also proposed. Finally, this paper synthesizes the above research results to design and implement a system model for automatic summarization of Chinese science and technology Web documents, which has been tested on real Web pages.

The test results show that the summaries meet the requirements of completeness and generality and cover the main content of the document. This shows that the summarization method in this paper is feasible. However, the continuity (coherence) of some summaries is limited. How to use natural language understanding and generation technology to improve the quality of the summary, and especially its continuity, will be a focus of future study.
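
The pipeline described in the abstract (keyword statistics, heuristic rules, then sentence selection by summary proportion) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. Everything in it is an assumption made for illustration: the function names, the use of character bigrams as a stand-in for Chinese word segmentation, the lead-sentence and title bonuses as the "heuristic rules", and the specific weights.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation mark."""
    parts = re.findall(r"[^。！？!?]+[。！？!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def terms(sentence):
    """Character bigrams within each run of CJK characters.

    Assumption: bigrams stand in for a real word segmenter.
    """
    out = []
    for run in re.findall(r"[\u4e00-\u9fff]+", sentence):
        out.extend(run[i:i + 2] for i in range(len(run) - 1))
    return out


def summarize(text, title="", ratio=0.3, top_k=10):
    """Return the highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []

    # Statistical step: the most frequent terms act as keywords.
    freq = Counter(t for s in sentences for t in terms(s))
    keywords = {t for t, _ in freq.most_common(top_k)}
    title_terms = set(terms(title))

    scores = []
    for i, s in enumerate(sentences):
        ts = terms(s)
        if not ts:
            scores.append(0.0)
            continue
        # Average keyword weight per term, so long sentences are not favoured.
        score = sum(freq[t] for t in ts if t in keywords) / len(ts)
        # Heuristic rules (hypothetical weights): lead sentence, title overlap.
        if i == 0:
            score *= 1.5
        if title_terms & set(ts):
            score *= 1.2
        scores.append(score)

    # Keep a fixed proportion of sentences, at least one.
    n = max(1, int(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return [sentences[i] for i in sorted(ranked[:n])]


if __name__ == "__main__":
    doc = "自动摘要技术可以压缩文档内容。今天天气很好。自动摘要技术依赖关键词统计。我们去公园散步。"
    for sentence in summarize(doc, ratio=0.5, top_k=5):
        print(sentence)
```

Returning the chosen sentences in document order, rather than score order, is one simple way to limit the loss of continuity that the abstract reports as the method's main weakness.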
Keywords/Search Tags: automatic summarization, search engine, Chinese science and technology documents, Web cleaning, summarization extraction