Biomedical Named Entities Recognition Based On Classifiers Ensemble

Posted on:2011-12-14

Degree:Master

Type:Thesis

Country:China

Candidate:J Sun

GTID:2120330332960921

Subject:Computer application technology

Abstract/Summary:

Biomedical Named Entity Recognition (Bio-NER) is a fundamental task and a critical step in biomedical text mining: only when bio-entities are correctly identified can more complex tasks, such as gene normalization, biomedical event extraction and protein-protein interaction extraction, be performed effectively. Biomedical named entities include mentions of proteins, genes, DNA, RNA, etc., which often have complex structures, so identifying and classifying them is challenging. Machine learning methods such as CRF, MEMM and SVM have been widely used to learn to recognize such entities from an annotated corpus.

Because Bio-NER systems do not yet perform as well as NER systems in the news domain, classifier ensemble methods, which combine the results of multiple classifiers, have been proposed to improve their performance. This thesis mainly studies biomedical named entity recognition based on classifier ensembles. Experiments are conducted on the BioCreAtIvE II GM corpus, and the contributions can be summarized as follows:

(1) Construction of different single classifiers: Six divergent models are implemented with different machine learning algorithms, different multi-class methods and different feature sets. The feature set, feature extraction method and training process of each model are described in detail. To further improve the recognition performance of the maximum entropy (ME) method, the TBL method is adopted to correct some tagging errors of the ME model. Experimental results show that this error correction greatly improves the recognition performance of the maximum entropy method.

(2) Biomedical named entity recognition based on classifier ensembles: Simple set operations (union and intersection), voting and two-layer stacking are used to combine the tagging results of the six single classifiers. Experimental results show that combining multiple classifiers consistently outperforms single classifiers; that the performance of a classifier ensemble depends on the performance or number of the single classifiers and on the diversity among the classifiers taking part in the combination; and that the two-layer stacking algorithm is more effective than voting and the union and intersection operations. The best method achieves an F-measure of 88.14%, which is higher than that of the top-ranked Bio-NER systems in the BioCreAtIvE II GM challenge.
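
The simpler combination strategies named in the abstract (union, intersection and voting) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: it assumes each classifier's output has already been reduced to a set of entity spans, and the names `Span`, `union`, `intersection` and `vote`, as well as the `min_votes` threshold, are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple

# Assumption: an entity mention is a (start, end) character-offset pair.
Span = Tuple[int, int]


def union(predictions: List[Set[Span]]) -> Set[Span]:
    """Keep every span proposed by at least one classifier (favors recall)."""
    return set().union(*predictions)


def intersection(predictions: List[Set[Span]]) -> Set[Span]:
    """Keep only spans proposed by every classifier (favors precision)."""
    if not predictions:
        return set()
    return set.intersection(*predictions)


def vote(predictions: List[Set[Span]], min_votes: int) -> Set[Span]:
    """Keep spans proposed by at least `min_votes` classifiers."""
    counts = Counter(span for pred in predictions for span in pred)
    return {span for span, n in counts.items() if n >= min_votes}


if __name__ == "__main__":
    # Toy outputs from three hypothetical classifiers on one sentence.
    outputs = [
        {(0, 4), (10, 15)},
        {(0, 4)},
        {(0, 4), (10, 15), (20, 28)},
    ]
    print(sorted(union(outputs)))
    print(sorted(intersection(outputs)))
    print(sorted(vote(outputs, min_votes=2)))
```

In this sketch, union and intersection are the two extremes of voting: `vote` with `min_votes=1` is the union, and with `min_votes` equal to the number of classifiers it is the intersection. Overlapping or nested spans are not reconciled here, and two-layer stacking, which trains a second classifier on the outputs of the first-layer classifiers, is not shown.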

Keywords/Search Tags:

Text Mining, Biomedical Named Entity Recognition, Machine Learning, Classifiers Ensemble

Related items

1	Research On Biomedical Named Entity Recognition Algorithm Based On Multi-Task Learning
2	Research On Named Entity Recognition And Normalization For Biomedical Text
3	Research On Biomedical Named Entity Recognition Based On Deep Learning
4	Research And Application Of Biomedical Named Entity Recognition Based On Reinforcement Learning
5	Research And Implementation Of A Biomedical Named Entity Recognition Method Based On Deep Learning
6	Research On Biomedical Named Entity Recognition Method Based On Word Meaning Enhancement
7	Research On Biomedical Named Entity Recognition Method Based On Deep Neural Network
8	Research On Biomedical Named Entity Recognition Based On Weak Supervision
9	Research On Improving Biomedical Named Entity Recognition Models By Incorporating Multi-source Information
10	Research On Identification Of Bacteria Named Entity Based On Deep Learning And Language Model