Design And Implementation Of Multi Document Automatic Summarization System In Biomedical

Posted on:2011-10-12

Degree:Master

Type:Thesis

Country:China

Candidate:Q F Meng

Full Text:PDF

GTID:2178330338980954

Subject:Computer Science and Technology

Abstract/Summary:

Biomedical research is one of the most closely watched research areas of the twenty-first century. With the development of biological research at the molecular level and the completion of the Human Genome Project, a mass of medical data has been produced, giving rise to hundreds of biological databases. A great deal of biological knowledge can be extracted from these data. How to query and retrieve knowledge from such complex data conveniently, quickly, and accurately, so as to find information useful for their work, is therefore of essential importance to researchers and medical workers.

Starting from the practical problems that medical researchers and medical workers currently face, this paper applied multi-document automatic summarization technology to the biomedical field, focusing on the design and implementation of a biomedical multi-document automatic summarization system.

First, according to the characteristics of PubMed query results, we used a crawler to save the query results to the local computer. Based on the characteristics of this raw material, we proposed a method for building a corpus and built the corpus.

Second, based on the established corpus, we preprocessed the data to standardize its format, focusing on part-of-speech tagging and named entity recognition. We ultimately selected the bidirectional inference algorithm with the easiest-first strategy, which tags better and faster, in order to ensure the accuracy of sentence tagging and to overcome the shortcomings of traditional algorithms.

Finally, this paper detected topics in the standardized data; topic detection is the key part of this paper. Because the medical literature involves a large quantity of data, this paper used the K-means clustering algorithm to cluster topics, and improved the traditional K-means algorithm so that the number of clusters can increase dynamically. This overcomes the difficulty the traditional K-means algorithm has in identifying potential topics. This paper also, for the first time, proposed and applied an alternating enhancement strategy to summary sentence extraction. The extracted sentences were then sorted and the summary was generated. The summaries were evaluated using two internal evaluation methods, and the results indicate that the system performs well.
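
The abstract does not say how the improved K-means lets the number of clusters grow. As a minimal sketch of one plausible reading, the Python below opens a new cluster whenever a document vector lies farther than a threshold from every existing centroid. The function name `dynamic_kmeans`, the parameter `new_cluster_dist`, and the use of Euclidean distance are illustrative assumptions, not the method described in the thesis.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def dynamic_kmeans(points, k_init, new_cluster_dist, max_iter=50, seed=0):
    """K-means variant whose cluster count can grow (hypothetical rule).

    A point farther than `new_cluster_dist` from every current centroid
    seeds a new cluster instead of being forced into an existing one.
    Returns (centroids, labels).
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k_init)]
    labels = [None] * len(points)

    for _ in range(max_iter):
        new_labels = []
        for p in points:
            dists = [euclidean(p, c) for c in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            if dists[best] > new_cluster_dist:
                # Assumed growth rule: open a new cluster at this point.
                centroids.append(tuple(p))
                best = len(centroids) - 1
            new_labels.append(best)

        # Standard K-means update: move each centroid to its members' mean.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = mean(members)

        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels

    return centroids, labels
```

Started with `k_init=1` on vectors that form several well-separated groups, this sketch adds one centroid per group during the first pass. In a real system the vectors would be document or sentence representations (for example TF-IDF), and the threshold would have to be tuned on the corpus.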

Keywords/Search Tags:

Biomedical, Multiple Documents Automatic Summarization, Named Entity Recognition, K-means clustering

Related items

1	Research Of Automatic Summarization Based On Named Entity
2	Recognizing Named Entities In Biomedical Literatures
3	A Study On The Recognition Of Biomedical Named Entity Based On Statistic
4	Research Of Word Representations On Biomedical Named Entity Recognition
5	Research On Named Entity Recognition Technology In Biomedical Field
6	Research On Key Techniques Of Multiple Documents Automatic Summarization
7	Research On Chinese Named Entity Recognition For Legal Documents
8	Research On Biomedical Named Entity Recognition Based On Hybrid Model
9	Research On Biomedical Named Entity Recognition Based On Deep Learning
10	Research On Biomedical Named Entity Recognition Based On Integrated Model