Research Of Automatic Chinese Text Summarization Based On Feature Information Extract

Posted on: 2008-12-13  Degree: Master  Type: Thesis
Country: China  Candidate: X H Ye  Full Text: PDF
GTID: 2178360215456498  Subject: Computer software and theory
Abstract/Summary:
With the rapid development of modern technology, information carried in language, such as newspapers, books, and scientific literature, has grown enormously. With the rapid growth of the Internet in particular, a mass of new information appears every day, and finding the information a user needs quickly and accurately within it has become a pressing problem. Automatic document summarization has therefore attracted increasing research attention.

As a major application area of natural language processing, automatic abstracting involves a great deal of theory and applied technology. It has been studied for more than 20 years, but because the related technology has developed slowly, automatic abstracting systems still cannot perform fully automatic syntactic, semantic, and contextual analysis, and their summaries remain largely indicative. Given the current state of research, using statistical methods to detect the features of an article and extract its feature information, and building an automatic abstracting system on that basis, can produce a concise, accurate, and comprehensive summary, so that users can quickly obtain useful information from the mass of information.

Building on current research, this thesis applies statistical methods, takes the genre and type of an article into account, scores words according to the article's characteristics, and designs a Chinese automatic abstracting system based on feature information.
The work includes the following:
(1) It compares current research methods for automatic abstracting and analyzes the feasibility and scope of application of statistics-based methods.
(2) It presents a Chinese automatic abstracting method based on feature information extraction: the feature words and characteristics of the original text are analyzed and a vector space model is constructed; the summary is then obtained by calculating the importance of each sentence in the article and removing redundancy.
(3) It designs a Chinese automatic abstracting system based on feature information.
(4) To test the feasibility and effectiveness of Chinese automatic abstracting, this thesis develops an internal evaluation system to evaluate the CAS system. Two groups of evaluation texts are taken from a national committee corpus, one covering different genres and one covering a single genre. The evaluation results are analyzed to understand the strengths and weaknesses of the statistics-based method. The reasons why the statistics-based method produces different summary results for articles of different genres are also analyzed, which offers a meaningful direction for further improving the system's summaries.
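
The pipeline described in item (2) can be illustrated with a minimal sketch. This is not the thesis's system: it assumes the text has already been segmented into words, uses raw term frequency with a title-word boost as a stand-in for the feature scoring, and the names `summarize`, `title_boost`, and `max_sim` are hypothetical. Sentences are represented as vectors in a vector space model, ranked by cosine similarity to the whole document, and a sentence is skipped when it is too similar to one already selected.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, title_words=(), k=2, title_boost=2.0, max_sim=0.7):
    """Pick up to k sentence indices, returned in document order.

    sentences: list of token lists (word segmentation is assumed done).
    title_words: words treated as features and given extra weight
                 (an assumed feature; the thesis may use others).
    """
    # Word weight: document-level term frequency, boosted for title words.
    tf = Counter(t for s in sentences for t in s)
    weight = {t: c * (title_boost if t in title_words else 1.0)
              for t, c in tf.items()}

    # Vector space model: one weighted vector per sentence.
    vectors = [{t: weight[t] * c for t, c in Counter(s).items()}
               for s in sentences]

    # Sentence importance: similarity to the document vector.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vectors[i], weight),
                    reverse=True)

    # Redundancy removal: skip sentences too close to a chosen one.
    chosen = []
    for i in ranked:
        if len(chosen) == k:
            break
        if all(cosine(vectors[i], vectors[j]) <= max_sim for j in chosen):
            chosen.append(i)
    return sorted(chosen)


if __name__ == "__main__":
    doc = [
        ["自动", "文摘", "系统"],
        ["自动", "文摘", "系统"],
        ["天气", "很", "好"],
    ]
    print(summarize(doc, title_words={"自动", "文摘"}, k=2))
```

The similarity threshold and the boost factor are arbitrary here; in practice they would be tuned against an evaluation corpus such as the one described in item (4).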
Keywords/Search Tags: Automatic Summarization, Features Information, VSM, Natural Language Understanding