The Design And Implementation Of Automatic Summarization System On Chinese Web Pages

Posted on: 2012-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: Z Xiong
GTID: 2218330371952196
Subject: Computer technology
Abstract/Summary:
When a user submits a query to a search engine through a browser, the search engine returns the page title, URL, and page summary of each hit. Users usually judge relevance by reading only the page summary rather than the whole content of the web page. The page summary therefore plays a very important role in helping users quickly understand what a page is about, and a good summary can improve search efficiency. To reflect the user's query needs, this thesis implements a dynamic, query-based extractive summarization system. The main work is as follows:

1. I designed and implemented a complete summary generation system, which involves two processes: web page pre-processing and summary extraction. Web page pre-processing includes parsing the HTML document, eliminating noise, and segmenting the Chinese text into sentences. Because web pages take many different forms, contain noisy information, and have complex layouts, they must first be pre-processed to obtain the text content. Sentence segmentation makes the generated sentences more complete and coherent. The extraction process first divides each sentence into words, and then calculates the weight of each sentence from the query feature, the TF-IDF feature, the cue-word feature, and the location feature. The few highest-weighted sentences are then selected and arranged in their original order to form the final summary.

2. I built a web page summarization evaluation system. This system collects summary data from three commercial search engines: Baidu, Sogou, and Yahoo. Using this data, I apply an improved pyramid method that adopts the longest common subsequences of characters as its SCUs (Summary Content Units). This method can automatically evaluate the system on a large number of summaries. The experiment demonstrates that the feature selection and weight calculation are reasonable.
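
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the feature weights, the cue-word list, the TF-IDF variant, the location formula, and all function names are assumptions made for illustration, and the input is assumed to be already segmented into sentences and words. The `lcs_length` function shows only the character-level longest common subsequence that the evaluation method builds on, not the full improved pyramid method.

```python
import math
from collections import Counter

# Hypothetical feature weights; the abstract does not state the actual values.
WEIGHTS = {"query": 0.4, "tfidf": 0.3, "cue": 0.2, "location": 0.1}
# Illustrative cue words only; the thesis's cue-word list is not given.
CUE_WORDS = {"总之", "因此"}


def score_sentences(sentences, query_terms, weights=WEIGHTS, cue_words=CUE_WORDS):
    """Return one weight per sentence; each sentence is a list of words."""
    n = len(sentences)
    # Document frequency: in how many sentences each word occurs.
    df = Counter(w for words in sentences for w in set(words))
    scores = []
    for i, words in enumerate(sentences):
        if not words:
            scores.append(0.0)
            continue
        tf = Counter(words)
        # Query feature: fraction of query terms present in the sentence.
        query = sum(1 for q in query_terms if q in tf) / max(len(query_terms), 1)
        # TF-IDF feature (one simple variant, assumed here).
        tfidf = sum(c / len(words) * math.log(1 + n / df[w]) for w, c in tf.items())
        # Cue-word feature: 1 if any cue word appears.
        cue = 1.0 if any(w in cue_words for w in words) else 0.0
        # Location feature: earlier sentences score higher (an assumption).
        location = 1.0 - i / n
        scores.append(
            weights["query"] * query
            + weights["tfidf"] * tfidf
            + weights["cue"] * cue
            + weights["location"] * location
        )
    return scores


def extract_summary(sentences, query_terms, k=2):
    """Pick the k highest-weighted sentences and return them in original order."""
    scores = score_sentences(sentences, query_terms)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def lcs_length(a, b):
    """Length of the longest common subsequence of two character strings."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, 1):
            if ch == other:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

Calling `extract_summary(sentences, ["摘要"], k=2)` on a list of word-segmented sentences returns the two highest-weighted sentences in their original order, and `lcs_length(system_summary, reference_summary)` gives the character overlap between a generated summary and a search-engine summary.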
Keywords/Search Tags: Web Page Summarization, Summary Extraction, Summary Evaluation