
Biomedical Named Entities Recognition Based On Classifiers Ensemble

Posted on: 2011-12-14 | Degree: Master | Type: Thesis
Country: China | Candidate: J Sun
GTID: 2120330332960921 | Subject: Computer application technology
Abstract/Summary:
Biomedical named entity recognition (Bio-NER) is a fundamental task in biomedical text mining and a critical step within it: only when bio-entities are correctly identified can more complex tasks, such as gene normalization, biomedical event extraction, and protein-protein interaction extraction, be performed effectively. Biomedical named entities include mentions of proteins, genes, DNA, RNA, and so on, which often have complex structures that make them challenging to identify and classify. Machine learning methods such as conditional random fields (CRF), maximum entropy Markov models (MEMM), and support vector machines (SVM) have been widely used to learn to recognize such entities from an annotated corpus. Because Bio-NER systems still perform worse than named entity recognition systems in the news domain, classifier ensemble methods, which combine the results of multiple classifiers, have been proposed to improve performance.

This thesis mainly studies biomedical named entity recognition methods based on classifier ensembles. Experiments are conducted on the BioCreAtIvE II GM corpus, and the contributions can be summarized as follows:

(1) Construction of different single classifiers: six diverse models are implemented using different machine learning algorithms, different multi-class methods, and different feature sets. The feature set, feature extraction method, and training process of each model are described in detail. To further improve the recognition performance of the maximum entropy (ME) method, transformation-based learning (TBL) is adopted to correct some tagging errors made by the ME model. Experimental results show that this error correction greatly improves the recognition performance of the maximum entropy method.

(2) Biomedical named entity recognition based on classifier ensembles: simple set operations (union and intersection), voting, and two-layer stacking are used to combine the tagging results of the six single classifiers.
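The set-operation and voting combiners named in contribution (2) can be illustrated with a minimal sketch. It assumes each classifier emits a BIO tag sequence and that combination is done on whole entity spans, represented as hypothetical `(start, end, type)` tuples; the thesis may instead vote on token-level tags, and the function names and toy data here are illustrative only.

```python
from collections import Counter

# Hypothetical span representation: (start_token, end_token_exclusive, entity_type)

def bio_to_spans(tags):
    """Convert a BIO tag sequence into a set of entity spans."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final span
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.add((start, i, label))
            # A B- tag (or a stray I- tag) opens a new span; "O" closes it.
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def union(span_sets):
    """Keep every entity proposed by at least one classifier (favours recall)."""
    return set().union(*span_sets)


def intersection(span_sets):
    """Keep only entities proposed by all classifiers (favours precision)."""
    return set(span_sets[0]).intersection(*span_sets[1:])


def majority_vote(span_sets, min_votes=None):
    """Keep entities proposed by at least min_votes classifiers (default: strict majority)."""
    if min_votes is None:
        min_votes = len(span_sets) // 2 + 1
    counts = Counter(span for spans in span_sets for span in spans)
    return {span for span, n in counts.items() if n >= min_votes}


# Toy outputs of three classifiers on one four-token sentence (made-up data).
outputs = [
    ["B-GENE", "I-GENE", "O", "O"],
    ["B-GENE", "I-GENE", "O", "B-GENE"],
    ["B-GENE", "O", "O", "B-GENE"],
]
span_sets = [bio_to_spans(tags) for tags in outputs]
print(sorted(union(span_sets)))
print(sorted(intersection(span_sets)))
print(sorted(majority_vote(span_sets)))
```

On the toy data, union keeps all three distinct spans, intersection keeps none (the third classifier truncates the first entity), and majority voting keeps `(0, 2, 'GENE')` and `(3, 4, 'GENE')`. Span-level voting can produce overlapping entities, which a real system would still need to resolve.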
Experimental results show that combining multiple classifiers consistently outperforms any single classifier; that the performance of a classifier ensemble depends on the performance and number of the participating single classifiers and on the diversity among them; and that the two-layer stacking algorithm is more effective than voting and the union and intersection operations. The best method achieves an F-measure of 88.14%, which is higher than that of the top-ranked Bio-NER systems in the BioCreAtIvE II GM challenge.
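The 88.14% figure is an F-measure, the harmonic mean F = 2PR / (P + R) of precision P and recall R. The sketch below computes entity-level scores under a simple exact-boundary matching assumption, representing each entity as a hypothetical `(start, end, type)` tuple. It is not the official BioCreAtIvE II GM evaluation script, which scores gene mentions differently (for example, it also accepts listed alternative boundaries).

```python
def entity_prf(gold, predicted):
    """Exact-match entity-level precision, recall and F-measure."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # entities whose boundaries and type both match
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f


# Made-up gold and predicted entities for illustration.
gold = {(0, 2, "GENE"), (3, 4, "GENE"), (6, 9, "GENE")}
predicted = {(0, 2, "GENE"), (3, 4, "GENE"), (5, 6, "GENE"), (7, 9, "GENE")}
p, r, f = entity_prf(gold, predicted)
print(f"P={p:.4f} R={r:.4f} F={f:.4f}")
```

This prints `P=0.5000 R=0.6667 F=0.5714`. Note that the boundary error `(7, 9)` versus gold `(6, 9)` is counted as both a false positive and a false negative, which is why strict matching penalizes the boundary mistakes typical of gene mentions.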
Keywords/Search Tags: Text Mining, Biomedical Named Entity Recognition, Machine Learning, Classifiers Ensemble