Research On Automatic Abstract System Of Chinese Web Page

Posted on: 2006-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: X D Xu
GTID: 2168360155472114
Subject: Software engineering
Abstract/Summary:
With the spread of the Internet, the network has become an enormous source of information. Users now face a pressing problem: how to find the information they need, grasp its main content, and keep up with the new material that appears every day. Automatic summarization is one of the strongest tools for this problem. A reader can first obtain a computer-generated summary of a text and then decide whether to read the full text carefully, which raises the efficiency of obtaining information from electronic texts.

Research and development in text summarization has become valuable in both research and commercial settings. Western countries have made noticeable progress in this area, whereas domestic research is still at an early stage, and work on web page processing is less developed still. The goal of this research is to propose an automatic summarization method for web pages, based on further investigation of automatic summarization technology, and to develop a working web page summarization system. The system can serve as an auxiliary tool for search and is not restricted to any particular field. The summaries it produces should be complete and readable, and should be generated at reasonable speed.

Accordingly, the thesis first extracts the text of a web page on the basis of an analysis of the page's information characteristics. Second, it combines statistical methods with heuristic rules to obtain keywords and key sentences. Finally, it selects the qualifying summary sentences according to the required summary proportion. Along the way, the problems and techniques related to text summarization are discussed in detail. An algorithm for extracting text blocks and subtitles from web pages is presented, and a web page summarization method that combines statistical methods with text structure analysis is proposed. The web page corpus is also analyzed. Finally, the thesis brings these results together to design and implement a system model for automatic summarization of Chinese web pages, which has been tested on real web pages.

The test results show that the summaries of most files meet the requirements of completeness and generality and reflect the main content of the file, which shows that the proposed summarization method is feasible. However, summary quality is affected by the type of article, and the readability of some summaries needs to be improved. How to use natural language understanding and generation techniques to improve summary quality, especially readability, without reducing processing speed will be a focus of future work.
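
The pipeline described in the abstract (score keywords and key sentences with statistics plus heuristic rules, then keep sentences up to a summary proportion) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the algorithm from the thesis: the stopword list, the bonus weights, the sentence delimiters, and every function name are hypothetical. Chinese text is handled one character at a time as a stand-in for real word segmentation, and the page text is assumed to have been extracted from the HTML already.

```python
import math
import re
from collections import Counter

# Hypothetical values chosen for the example; the thesis does not give them.
STOPWORDS = {"the", "of", "a", "was", "from", "and", "to", "is", "in"}
LEAD_BONUS = 1.2   # heuristic rule: the first sentence often states the topic
TITLE_BONUS = 1.5  # heuristic rule: sentences sharing a term with the title


def split_sentences(text):
    """Split on Chinese and Western sentence-ending punctuation."""
    parts = re.findall(r"[^。！？.!?]+[。！？.!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Latin words or single CJK characters (a stand-in for segmentation)."""
    terms = re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", sentence.lower())
    return [t for t in terms if t not in STOPWORDS]


def top_keywords(text, k=5):
    """Statistical step: the k most frequent terms in the text."""
    freq = Counter(t for s in split_sentences(text) for t in tokenize(s))
    return [term for term, _ in freq.most_common(k)]


def summarize(text, title="", ratio=0.3):
    """Return the highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    freq = Counter(t for s in sentences for t in tokenize(s))
    title_terms = set(tokenize(title))

    scored = []
    for index, sentence in enumerate(sentences):
        terms = tokenize(sentence)
        # Statistical score: average frequency of the sentence's terms.
        score = sum(freq[t] for t in terms) / len(terms) if terms else 0.0
        # Heuristic adjustments for position and title overlap.
        if index == 0:
            score *= LEAD_BONUS
        if title_terms & set(terms):
            score *= TITLE_BONUS
        scored.append((score, index))

    # Keep a fixed proportion of the sentences, at least one.
    keep = max(1, math.ceil(ratio * len(sentences)))
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:keep]
    return [sentences[i] for i in sorted(i for _, i in best)]
```

Calling `summarize(page_text, title=page_title, ratio=0.2)` would keep roughly one sentence in five. Returning the chosen sentences in document order, rather than score order, is a simple way to preserve some readability in an extractive summary.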
Keywords/Search Tags: automatic summarization, Chinese web pages, subtitle, information extraction, discourse analysis, keywords