| Multi-document summarization is one of the most important problems in data mining, information retrieval, and related fields. Considerable progress has been made and many algorithms have been proposed for it. However, because application areas and data differ and the task itself is complex, many problems remain open. Targeting the science and technology domain, we improve a traditional clustering algorithm to perform multi-document summarization. We then improve the information extraction methods, using an approach that combines rules and statistics. We improve the traditional hierarchical clustering algorithm by transforming the multi-document summarization task into a document clustering task and by improving the feature selection method. Traditional feature selection makes no distinction among features. We propose a feature selection method based on entities and terminology that combines the features with different weights, and we evaluate it in comparative experiments. In addition, multi-document methods based on traditional clustering use only cosine similarity and likewise make no distinctions. We propose a multi-dimensional similarity calculation method that combines the individual similarities with different weights. The improved clustering algorithm achieves better performance in the science and technology domain. Researchers, however, are not satisfied with knowing only the academic categories produced by clustering; they also want to know each category's research area, research methods, and other information. To meet this demand, we improve the statistical information extraction method to extract research category information, and we propose an improved method for calculating feature weights. 
Given the particular characteristics of the science and technology domain, we propose an improved method that combines rules and statistics to extract information about academic methods. In this approach, we combine parse trees with rules to improve information extraction performance. Finally, a report is generated automatically from the experimental results. |
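
The weighted multi-dimensional similarity and the clustering step described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the feature dimensions (`words`, `entities`, `terms`), the weights in `WEIGHTS`, the merge threshold, the use of average linkage, and the toy documents are all assumptions chosen for illustration only.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical weights: entities and terminology count more than plain words.
WEIGHTS = {"words": 0.2, "entities": 0.3, "terms": 0.5}

def combined_similarity(d1, d2, weights=WEIGHTS):
    """Weighted sum of per-dimension cosine similarities."""
    return sum(w * cosine(Counter(d1[dim]), Counter(d2[dim]))
               for dim, w in weights.items())

def cluster(docs, threshold=0.5):
    """Agglomerative clustering with average linkage (an assumed choice).

    Repeatedly merges the most similar pair of clusters and stops once
    the best pair falls below `threshold`.
    """
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        total = sum(combined_similarity(docs[i], docs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, (a, b) = max(
            (linkage(clusters[a], clusters[b]), (a, b))
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters

# Toy documents, invented for illustration.
docs = [
    {"words": ["clustering", "summary", "text"],
     "entities": ["DUC"], "terms": ["hierarchical clustering"]},
    {"words": ["clustering", "text", "documents"],
     "entities": ["DUC"], "terms": ["hierarchical clustering"]},
    {"words": ["protein", "folding"],
     "entities": ["PDB"], "terms": ["molecular dynamics"]},
]

print(cluster(docs))
```

Each cluster is returned as a list of document indices. Giving every dimension its own weight lets shared entities or terminology dominate the score even when the plain word overlap between two documents is small, which is the distinction that a single cosine measure over all features cannot make.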