| With the rapid growth of the network, manually written abstracts can no longer keep pace with the volume of information being produced, and a faster, more comprehensive form of summarization is required. Automatic summarization has emerged to meet this need: it uses a computer to extract an abstract from the original text. First, the unstructured natural-language text is transformed into a structured machine representation that a computer can recognize; the computer then analyzes the text and extracts the abstract content; finally, it automatically generates the text abstract. The theme of the article is thus presented to the user in summary form, so that users can find the articles they need without reading the full text, which saves the time needed to obtain meaningful content and improves working efficiency. Latent semantic analysis (LSA) is a new information retrieval model that applies statistical computation to a large collection of texts in order to extract the latent semantic structure among words. It uses this latent structure to represent words and texts, which removes the correlation between words and simplifies the text vectors, thereby achieving dimensionality reduction. Applying LSA to an automatic summarization system greatly improves the quality of the system. Clustering divides a set of objects into several groups or categories, so that similar elements fall into the same group and dissimilar elements fall into different groups. This paper studies paragraph clustering within a single text, that is, partitioning a sequence of paragraphs into a number of subsets, or clusters, with the aim of building a few clusters that are tightly grouped internally and well separated from one another. This paper presents an automatic summarization system based on latent semantic analysis (LSA) and paragraph clustering. First, the system groups 
several semantically similar paragraphs into a few clusters; it then extracts from each paragraph cluster several sentences that express the theme of the text to form an initial abstract; finally, it polishes the initial sentences and generates the final abstract. The distinguishing features of this work are as follows: it uses LSA to compute sentence similarity and combines hierarchical clustering with the K-means algorithm for paragraph clustering, so that both the sentence similarity computation and the paragraph clustering are more accurate; it then optimizes the selection of candidate sentences and processes and polishes them. Experimental verification shows that the generated abstracts are more accurate, more comprehensive, and more concise than abstracts generated by purely statistical methods. |
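
The pipeline summarized above (LSA vectors, paragraph clustering, selection of representative text) can be illustrated with a minimal Python sketch. It is not the paper's implementation. Everything specific here is an assumption made for illustration: raw term counts as matrix entries, average-linkage agglomerative clustering used only to seed K-means, Euclidean K-means on length-normalized LSA vectors, and the hypothetical function names `term_matrix`, `lsa_vectors`, `agglomerative_seeds`, `kmeans`, and `cluster_and_select`. The polishing step is omitted.

```python
import re
import numpy as np


def term_matrix(units):
    """Build a raw term-count matrix (terms x text units). Assumed weighting."""
    tokens = [re.findall(r"[a-z]+", u.lower()) for u in units]
    vocab = {w: i for i, w in enumerate(sorted({w for t in tokens for w in t}))}
    A = np.zeros((len(vocab), len(units)))
    for j, toks in enumerate(tokens):
        for w in toks:
            A[vocab[w], j] += 1.0
    return A


def lsa_vectors(A, k):
    """Map each text unit to a unit-length vector in a k-dim latent space."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    vecs = (s[:k, None] * Vt[:k]).T  # one row per text unit
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.where(norms == 0, 1.0, norms)


def agglomerative_seeds(X, n_clusters):
    """Average-linkage merging on cosine similarity; returns cluster centroids."""
    clusters = [[i] for i in range(len(X))]
    sim = X @ X.T
    while len(clusters) > n_clusters:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = sim[np.ix_(clusters[a], clusters[b])].mean()
                if score > best:
                    best, pair = score, (a, b)
        a, b = pair
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    return np.array([X[c].mean(axis=0) for c in clusters])


def kmeans(X, centroids, iters=50):
    """Plain K-means (Euclidean) started from the given centroids."""
    centroids = centroids.copy()
    labels = None
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for c in range(len(centroids)):
            if np.any(labels == c):  # keep the old centroid if a cluster empties
                centroids[c] = X[labels == c].mean(axis=0)
    return labels, centroids


def cluster_and_select(units, n_clusters, k):
    """Cluster text units and pick the unit nearest each cluster centroid."""
    X = lsa_vectors(term_matrix(units), k)
    labels, centroids = kmeans(X, agglomerative_seeds(X, n_clusters))
    reps = []
    for c in range(len(centroids)):
        members = np.flatnonzero(labels == c)
        if len(members):
            reps.append(int(members[np.argmax(X[members] @ centroids[c])]))
    return labels.tolist(), sorted(reps)


if __name__ == "__main__":
    demo = [
        "cats and dogs are pets",
        "dogs and cats like pets food",
        "stocks and bonds are investments",
        "bonds and stocks yield investments returns",
    ]
    print(cluster_and_select(demo, n_clusters=2, k=2))
```

In this sketch the hierarchical step only supplies starting centroids, and K-means then refines the assignment. That is one plausible reading of "combining" the two algorithms, chosen because it removes K-means' sensitivity to random initialization. A fuller system would apply the same similarity computation at the sentence level inside each cluster to pick the summary sentences.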