Design and Implementation of a Multi-Document Automatic Summarization System for the Biomedical Field

Posted on: 2011-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: Q F Meng
GTID: 2178330338980954
Subject: Computer Science and Technology
Abstract/Summary:
Biomedical research is among the most closely watched research areas of the twenty-first century. With the development of biological research at the molecular level and the completion of the Human Genome Project in particular, a mass of medical data has been produced, giving rise to hundreds of biological databases. A great deal of biological knowledge can be extracted from this data. For researchers and medical workers, it is essential to be able to query and retrieve knowledge from this complex data conveniently, quickly, and accurately, so that they can find information useful to their work.

Starting from the practical problems that medical researchers and medical workers currently face, this thesis applies multi-document automatic summarization technology to the biomedical field, focusing on the design and implementation of a biomedical multi-document automatic summarization system.

First, based on the characteristics of PubMed query results, we used a crawler to save the query results to a local computer. Based on the characteristics of this raw material, we proposed a method for building a corpus and built the corpus with it.

Second, to standardize the data format, we preprocessed the corpus, focusing on part-of-speech tagging and named entity recognition. We selected a bidirectional inference algorithm with the easiest-first strategy, which tags better and faster, in order to ensure the accuracy of sentence tagging and to overcome the shortcomings of traditional algorithms.

Finally, we performed topic detection on the standardized data. Topic detection is the key part of this thesis. Because the medical literature contains a large quantity of data, we used the K-means clustering algorithm to cluster topics, and improved the traditional algorithm so that the number of clusters can increase dynamically. This overcomes a shortcoming of traditional K-means, which has difficulty identifying potential topics. This thesis also proposes and applies, for the first time, an alternating enhancement strategy for extracting summary sentences. The extracted sentences were then ordered and the summary was generated. We evaluated the summaries using two internal evaluation methods, both of which indicated good results.
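The abstract does not say how the number of clusters grows in the improved K-means algorithm. As a rough illustration only, the sketch below assumes one common rule: a point that lies farther than a threshold from every existing centroid opens a new cluster. The function name `dynamic_kmeans`, the parameter `new_cluster_distance`, and the use of plain numeric vectors (for example, sentence feature vectors) are all assumptions, not details taken from the thesis.

```python
import math


def dynamic_kmeans(points, initial_k=1, new_cluster_distance=2.0, max_iter=20):
    """K-means variant whose cluster count can grow during assignment.

    Assumed rule (not from the thesis): if a point is farther than
    `new_cluster_distance` from every centroid, it seeds a new cluster.
    """
    # Seed centroids with the first `initial_k` points.
    centroids = [list(p) for p in points[:initial_k]]
    labels = [0] * len(points)

    for _ in range(max_iter):
        # Assignment step: nearest centroid, or a new cluster if too far.
        new_labels = []
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            if dists[best] > new_cluster_distance:
                centroids.append(list(p))
                best = len(centroids) - 1
            new_labels.append(best)

        # Update step: move each non-empty centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(x) / len(members) for x in zip(*members)]

        # Stop once assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels

    return labels, centroids


if __name__ == "__main__":
    # Two well-separated toy groups; the run starts with a single cluster.
    toy = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
    labels, centroids = dynamic_kmeans(toy)
    print(labels, len(centroids))
```

The threshold trades off topic granularity against fragmentation: a small value opens many narrow clusters, while a large value behaves like ordinary K-means with a fixed number of clusters.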
Keywords/Search Tags: Biomedical, Multi-Document Automatic Summarization, Named Entity Recognition, K-means Clustering