
Document Summary Theme-oriented Research

Posted on: 2012-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Liu
GTID: 2208330332992449
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the economy, society, and the Internet, the volume of online information is growing explosively, and quickly obtaining useful information from it has become a pressing problem. Current search engine technology mainly targets general-purpose information processing; for specific domains, mature systems are still lacking. This thesis studies summarization in search engine technology, focusing on summaries that are computed efficiently, are relevant to the user's query, and reflect the main content of the document.

Automatic summarization is the core of this thesis, and the work concentrates on three areas: improving summarization efficiency, introducing keyword extraction into summarization, and removing redundancy between summary sentences. To improve efficiency, an inverted list structure is used to compute the features of words and sentences, and a Double-Array Trie is used to store the segmentation dictionary and user thesauri, which speeds up word lookup. Keyword extraction is combined with summary extraction: accessor variety and the positional locality of words are introduced to improve the extraction of high-frequency and low-frequency words, respectively. To reduce redundancy between sentences, the thesis proposes the concept of inclusion between sentences. Using this relation lowers the probability that two sentences, one of which includes the other, are both extracted into the summary, which improves summary quality.

In addition, the thesis implements a vertical search prototype system and applies query-focused summarization to vertical search. Coding compression, memory swapping, and memory caching are applied successfully in the word segmentation system. The system, applied to standards retrieval and organization search, has been deployed online for a trial period at the Institute of Standardization of Hebei Province and the Name and Address Center of China Post. It integrates automatic summarization, vertical search, and database connectivity, providing a unified solution for standards management that brings summarization and search technology to end users.
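The abstract says an inverted list is used to compute word and sentence features. A minimal sketch of that idea follows, assuming sentences are already segmented into token lists. The scoring function `query_scores` (a tf times inverse-sentence-frequency sum over query terms) is a hypothetical stand-in, since the thesis's actual feature formulas are not given here; the point is that only the postings of the query terms are visited, not every sentence.

```python
import math
from collections import Counter, defaultdict


def build_inverted_list(sentences):
    """Map each term to postings [(sentence_id, count_in_sentence), ...]."""
    index = defaultdict(list)
    for sid, tokens in enumerate(sentences):
        for term, tf in Counter(tokens).items():
            index[term].append((sid, tf))
    return index


def query_scores(index, n_sentences, query_terms):
    """Hypothetical query-focused sentence score: sum of tf * isf over query terms,
    with isf = log(N / number_of_sentences_containing_term).
    Only postings of the query terms are touched."""
    scores = [0.0] * n_sentences
    for term in set(query_terms):
        postings = index.get(term, [])  # .get does not insert into the defaultdict
        if not postings:
            continue
        isf = math.log(n_sentences / len(postings))
        for sid, tf in postings:
            scores[sid] += tf * isf
    return scores
```

A term that occurs in every sentence gets isf = 0 and contributes nothing, which is the usual reason for weighting by sentence frequency.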
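The abstract mentions storing the segmentation dictionary in a Double-Array Trie. In a double-array trie, the transition from state `s` on character code `c` goes to `t = base[s] + c` and is valid only if `check[t] == s`. The sketch below is a static, first-fit construction under these assumptions (not the thesis's implementation): code 0 is reserved as an end-of-word marker, characters are numbered 1..K in order of first appearance, and `prefix_ends` is a hypothetical helper for dictionary-based segmentation.

```python
from collections import deque

ROOT_MARK = -2  # check value for the root slot: never free, has no parent


class DoubleArrayTrie:
    """Static double-array trie sketch (code 0 = end-of-word, chars -> 1..K)."""

    def __init__(self, words):
        self.code = {}
        for w in words:
            for ch in w:
                self.code.setdefault(ch, len(self.code) + 1)
        # Build an ordinary nested-dict trie first; key 0 marks a word end.
        root = {}
        for w in words:
            node = root
            for ch in w:
                node = node.setdefault(self.code[ch], {})
            node[0] = {}
        self.base = [0]
        self.check = [ROOT_MARK]
        queue = deque([(0, root)])
        while queue:
            s, node = queue.popleft()
            if not node:  # only the root of an empty word list
                continue
            codes = sorted(node)
            b = 1  # first-fit search for a base where all child slots are free
            while not all(self._free(b + c) for c in codes):
                b += 1
            self.base[s] = b
            for c in codes:
                self._ensure(b + c)
                self.check[b + c] = s
            for c in codes:
                if c != 0:  # end-of-word slots have no children
                    queue.append((b + c, node[c]))

    def _free(self, i):
        return i >= len(self.check) or self.check[i] == -1

    def _ensure(self, i):
        while len(self.check) <= i:
            self.base.append(0)
            self.check.append(-1)

    def _step(self, s, ch):
        """Return the child state of s on character ch, or None."""
        c = self.code.get(ch)
        if c is None:
            return None
        t = self.base[s] + c
        if t < len(self.check) and self.check[t] == s:
            return t
        return None

    def _is_word_end(self, s):
        t = self.base[s]  # slot reached by code 0
        return t < len(self.check) and self.check[t] == s

    def contains(self, word):
        s = 0
        for ch in word:
            s = self._step(s, ch)
            if s is None:
                return False
        return self._is_word_end(s)

    def prefix_ends(self, text, start=0):
        """Yield each end index j such that text[start:j] is a dictionary word."""
        s = 0
        for j in range(start, len(text)):
            s = self._step(s, text[j])
            if s is None:
                return
            if self._is_word_end(s):
                yield j + 1
```

For example, with the words 中, 中国, 中国人 and 人民, `prefix_ends("中国人民")` yields 1, 2, 3 in a single left-to-right pass, which is the candidate set a maximum-matching segmenter needs at each position. Each step is two array reads, which is where the lookup speed comes from.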
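Accessor variety, which the abstract uses for keyword extraction, is commonly defined for a string as the minimum of the number of distinct characters that precede it and the number of distinct characters that follow it in a corpus; a high value suggests the string behaves like an independent word. A minimal sketch, assuming raw (unsegmented) sentences, a single hypothetical boundary marker for sentence start and end, and a hypothetical maximum candidate length `max_len`:

```python
from collections import defaultdict

BOUNDARY = "<B>"  # hypothetical marker for sentence start/end


def accessor_variety(sentences, max_len=4):
    """Return {substring: AV} for every substring of up to max_len characters,
    where AV = min(#distinct left neighbours, #distinct right neighbours)."""
    left = defaultdict(set)
    right = defaultdict(set)
    for s in sentences:
        n = len(s)
        for i in range(n):
            for j in range(i + 1, min(n, i + max_len) + 1):
                sub = s[i:j]
                left[sub].add(s[i - 1] if i > 0 else BOUNDARY)
                right[sub].add(s[j] if j < n else BOUNDARY)
    return {sub: min(len(left[sub]), len(right[sub])) for sub in left}
```

Taking the minimum means a string must be freely combinable on both sides; a fragment such as "b" that is always followed by "c" scores low even if it is frequent.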
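The abstract does not spell out how inclusion between sentences is defined or applied. The sketch below is one hypothetical reading: sentence `a` includes sentence `b` when at least a fraction `theta` of `b`'s distinct words also occur in `a`, and during greedy selection a candidate's score is multiplied by `penalty` for every already-selected sentence it includes or is included by. The names `includes`, `select_summary`, `theta` and `penalty` are all illustrative assumptions; the soft penalty mirrors the abstract's wording that inclusion lowers the probability of extracting both sentences rather than forbidding it.

```python
def includes(a, b, theta=0.8):
    """Hypothetical test: token list a includes token list b if at least
    theta of b's distinct words also appear in a."""
    wb = set(b)
    if not wb:
        return True
    return len(wb & set(a)) / len(wb) >= theta


def select_summary(sentences, scores, k, theta=0.8, penalty=0.2):
    """Greedily pick k sentence indices, down-weighting candidates that stand
    in an inclusion relation with an already chosen sentence."""
    chosen = []
    remaining = list(range(len(sentences)))
    while remaining and len(chosen) < k:
        def adjusted(i):
            s = scores[i]
            for j in chosen:
                if includes(sentences[i], sentences[j], theta) or \
                        includes(sentences[j], sentences[i], theta):
                    s *= penalty
            return s
        best = max(remaining, key=adjusted)
        chosen.append(best)
        remaining.remove(best)
    return sorted(chosen)  # restore document order for the summary
```

With sentences `[a b c d]`, `[a b c]`, `[e f]` and base scores 1.0, 0.9, 0.5, the second sentence is included in the first, so its score drops to 0.18 and the third sentence is selected instead.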
Keywords/Search Tags: Summarization, Query, Keyword Extraction, Inclusion Relationship between Sentences, Vertical Search