Automatic Summarization System Based On Natural Language Processing

Posted on: 2007-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: F Zhang
Full Text: PDF
GTID: 2208360185456434
Subject: Computer applications
Abstract/Summary:
This thesis first surveys recent developments in automatic summarization systems in China and abroad, which reveals gaps in existing research on the topic. It then introduces the basic concepts of abstracts and automatic summarization systems, and compares and analyzes the main formal models and methods, including statistics-based, meaning-based, concept-based, and knowledge-based approaches. After summarizing their characteristics, it proposes a comprehensive automatic summarization system based on latent semantic analysis and text multilevel dependency structure.

Latent Semantic Analysis (LSA) is a fully automatic theory and method for acquiring and representing knowledge. It extracts the contextual-usage meaning of words through statistical computations applied to a large corpus of text. Like the Vector Space Model (VSM), LSA represents textual material as vectors in a space. By applying a truncated Singular Value Decomposition (SVD), LSA reduces the influence of synonymy and can thereby improve the accuracy of subsequent processing. The thesis introduces the basic ideas, characteristics, and implementation of LSA, and discusses applications based on it.

Text Multilevel Dependency Structure (TMDS) is a method for automatically extracting and representing knowledge. If each part of a text is regarded as a node, and a line is drawn between any two parts that are semantically related, the result is a connection network. This network clearly expresses the overall structure of the article. Because text structure goes one step deeper than the surface structure of the language, the central content of an article can be identified accurately from its discourse structure. Summarization based on text structure can therefore avoid many of the shortcomings of mechanical extraction and helps ensure summary quality.

A new text summarization method is proposed. It processes documents using both latent semantic analysis and text multilevel dependency structure. The method first analyzes the latent semantic structure of the text, applies singular value decomposition to the text matrix, and reconstructs the semantic matrix. It then applies a method based on text multilevel dependency structure to analyze the content of the semantic matrix in depth and extract the important sentences to form the abstract, compensating for the weaknesses of latent semantic analysis with respect to structure and syntax.
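
The abstract does not say how the text matrix is built, how many singular values are kept, or how the dependency structure is computed. The following is only a minimal sketch of the two stages in Python with NumPy, under these assumptions: a term-by-sentence count matrix, a hypothetical rank `k`, and a cosine-similarity threshold standing in for "semantically related". All function names and parameters are illustrative and not taken from the thesis. The degree-based ranking is a simple stand-in for TMDS, not the thesis's method.

```python
import re
import numpy as np


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def term_sentence_matrix(sentences):
    """Build a term-by-sentence count matrix (rows: terms, columns: sentences)."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w], j] += 1.0
    return A, vocab


def lsa_reconstruct(A, k):
    """Truncated SVD: keep the k largest singular values and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = max(1, min(k, len(s)))
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def connection_network(A_k, threshold):
    """Link two sentences when the cosine similarity of their columns
    in the reconstructed matrix reaches the threshold (assumed criterion)."""
    norms = np.linalg.norm(A_k, axis=0)
    norms[norms == 0] = 1.0          # avoid division by zero for empty columns
    unit = A_k / norms
    sim = unit.T @ unit
    np.fill_diagonal(sim, 0.0)       # no self-links
    return sim >= threshold          # boolean adjacency matrix


def summarize(text, k=2, threshold=0.3, n_sentences=2):
    """Pick the sentences with the most links, returned in document order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    A, _ = term_sentence_matrix(sentences)
    if A.size == 0:                  # no recognizable terms at all
        return sentences[:n_sentences]
    A_k = lsa_reconstruct(A, k)
    degree = connection_network(A_k, threshold).sum(axis=0)
    # Stable sort: highest degree first, ties keep original order.
    ranked = sorted(range(len(sentences)), key=lambda j: -degree[j])
    chosen = sorted(ranked[:n_sentences])
    return [sentences[j] for j in chosen]
```

Calling `summarize(document_text, k=2, threshold=0.3, n_sentences=2)` returns up to two sentences in their original order. In practice `k` and `threshold` would need tuning per corpus, and a weighting scheme such as TF-IDF could replace raw counts.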
Keywords/Search Tags: natural language processing, text summarization, latent semantic analysis, text multilevel dependency structure