Multi-Document Summarization And Automatic Generation Of Synthesis Reports In The Field Of Science And Technology

Posted on:2013-01-12

Degree:Master

Type:Thesis

Country:China

Candidate:S C Wang

GTID:2298330467476173

Subject:Computer software and theory

Abstract/Summary:

Multi-document summarization is an important problem in fields such as data mining and information retrieval. Many algorithms have been proposed and notable progress has been made. However, because application areas and data differ and the task itself is complex, many problems remain open. Targeting the science and technology domain, we improve a traditional clustering algorithm to perform multi-document summarization, and we improve information extraction by combining rules with statistics.

We improve the traditional hierarchical clustering algorithm for multi-document summarization by recasting the summarization task as a document clustering task and improving the feature selection method. Traditional feature selection treats all features alike. We propose a feature selection method based on entities and terminology that combines the features with different weights, and we run comparative experiments. Likewise, multi-document methods based on traditional clustering use only cosine similarity and make no distinction between kinds of evidence. We propose a multi-dimensional similarity calculation that combines the per-dimension similarities with different weights. The improved clustering algorithm achieves better performance in the science and technology domain.

Researchers, however, are not satisfied with knowing only the academic categories produced by clustering; they also want the research category, research methods, and other information. To meet this demand, we improve the statistical information extraction method for category information extraction and propose an improved method for calculating feature weights.

Given the particular characteristics of the science and technology domain, we propose an improved method that combines rules and statistics to extract research-method information. It combines parse trees with rules to improve extraction performance. Finally, a report is generated automatically from the experimental results.
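
The weighted multi-dimensional similarity and hierarchical clustering described in the abstract might be sketched as follows. This is a minimal illustration, not the thesis's implementation: the dimension names (`words`, `entities`, `terms`), the weight values, the average-link merge rule, and the stopping threshold are all assumptions chosen for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_similarity(doc_a, doc_b, weights):
    """Weighted average of per-dimension cosine similarities.

    Each document is a dict mapping a dimension name to a Counter.
    The dimension names and weights are hypothetical.
    """
    total = sum(weights.values())
    score = sum(w * cosine(doc_a[dim], doc_b[dim]) for dim, w in weights.items())
    return score / total


def average_link(c1, c2, docs, weights):
    """Mean pairwise similarity between two clusters of document indices."""
    sims = [combined_similarity(docs[i], docs[j], weights) for i in c1 for j in c2]
    return sum(sims) / len(sims)


def agglomerative(docs, weights, threshold):
    """Bottom-up clustering: merge the most similar pair of clusters
    until the best similarity falls below the (assumed) threshold."""
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = average_link(clusters[x], clusters[y], docs, weights)
                if best is None or s > best[0]:
                    best = (s, x, y)
        if best[0] < threshold:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def make_doc(words, entities, terms):
    """Build a document with one count vector per similarity dimension."""
    return {"words": Counter(words), "entities": Counter(entities), "terms": Counter(terms)}


# Toy documents; the contents are invented for illustration only.
docs = [
    make_doc(["clustering", "summary", "text"], ["K-means"], ["hierarchical clustering"]),
    make_doc(["clustering", "summary", "document"], ["K-means"], ["hierarchical clustering"]),
    make_doc(["parsing", "tree", "rule"], ["CRF"], ["information extraction"]),
]
# Entities and terminology are weighted above plain words (assumed values).
weights = {"words": 0.2, "entities": 0.4, "terms": 0.4}
clusters = agglomerative(docs, weights, threshold=0.5)
print(clusters)
```

Here the first two toy documents share their entity and terminology and so end up in one cluster, while the third stays separate. Giving plain words a lower weight than entities and terminology reflects the abstract's point that features should not all be treated alike; the actual weights would have to be tuned experimentally.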

Keywords/Search Tags:

natural language processing, multi-document summary, text clustering, information extraction, report generation

Related items

1	Statistic-based Automatic Keyphrase Extraction And Summarization From Multi-document
2	The Research On Multi-document Summarization Generation Method Based On Text Relation Graph
3	Research On Multimodal Algorithm For Structured Document Information Extraction
4	Research On Key Techniques Of Query-focused Multi-document Summarization
5	A Transferable Approach To Generating Abstractive Text Summary Based On Pre-trained Language Model
6	Research On The Automatic Generation Of Customer Reviews Report Facing Network
7	Text Summarization Generation Research Based On Multimodal Data
8	Research On Semantic Text Exchange Method Based On Pre-trained BART Language Model
9	Research On Document-level Relationship Extraction With Reasoning Information
10	Natural Language Processing Aiming To The Core Texts Of Scientific Literature