The Research Of Biomedical Name Entity Recognition By Combining Dictionary Based And Machine Learning Based Method

Posted on:2010-01-14

Degree:Master

Type:Thesis

Country:China

Candidate:Q Wang

Full Text:PDF

GTID:2178360302460349

Subject:Computer software and theory

Abstract/Summary:


Biomedical named entity recognition (Bio-NER) is the task of recognizing technical terminology in molecular biology and medicine. This terminology covers biomedical named entities and the sites where they act, such as proteins, DNA, RNA, and cell lines. A large body of biomedical literature is now available for knowledge mining. To discover links among biomedical entities, genes, proteins, and other entities must first be identified in the literature. Bio-NER is therefore the basis of other text mining technologies, such as relation extraction, hypothesis generation, and text classification.

Current research on Bio-NER follows three kinds of methods: dictionary-based, rule-based, and statistical machine learning. The dictionary-based approach is relatively simple and practical, but its performance is limited by the size and quality of the dictionaries. The rule-based method depends on the completeness and soundness of the rules, and it adapts poorly to new data. The statistical machine learning method trains a model on a manually annotated corpus and then uses that model to label unannotated text. Its advantage is robustness, and it is the most widely used of the three.

No lexicon can cover all biomedical entities, and new entities emerge continually. To compensate for the weaknesses of the dictionary-based method while keeping the advantages of statistical machine learning, this thesis proposes a new combination of the two. First, we download dictionary information about biomedical named entities from relevant biomedical websites and combine it with a Conditional Random Fields (CRFs) model to assign Part Of Speech-Entity (POS-Entity) tags to the corpus.
We adopt a distributed strategy that divides entities into different groups and generates a separate tagging model for each group. In addition, we select more effective features based on the characteristics of biomedical named entities and use the CRFs model to complete the recognition task.

The experimental results show the effect of combining the dictionary-based and machine learning-based approaches. On the JNLPBA2004 corpus, adding POS-Entity tags to the CRFs model after applying the distributed strategy attains an F-score of 72.83% without any post-processing. Post-processing raises the F-score further to 73.39%.
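
The idea of feeding dictionary evidence into a sequence model can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy lexicon, the greedy longest-match lookup, the BIO-style tag names, and the feature names are all assumptions made for illustration, and the POS-Entity tagging scheme itself is not reproduced here. The sketch only shows how dictionary matches can become per-token features that a CRF toolkit could consume.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (an assumption for this sketch):
# lower-cased token tuples mapped to an entity class.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("il-2",): "protein",
    ("il-2", "gene"): "DNA",
    ("t", "cells"): "cell_type",
}


def dictionary_tags(tokens: List[str],
                    lexicon: Dict[Tuple[str, ...], str],
                    max_len: int = 4) -> List[str]:
    """Tag tokens by greedy longest match against the lexicon (BIO style)."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = lexicon.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


def token_features(tokens: List[str], dict_tags: List[str]) -> List[dict]:
    """Build one feature dict per token, including the dictionary tag."""
    feats = []
    for i, (tok, tag) in enumerate(zip(tokens, dict_tags)):
        feats.append({
            "word.lower": tok.lower(),
            "has_digit": any(c.isdigit() for c in tok),
            "has_hyphen": "-" in tok,
            "suffix3": tok[-3:],
            "dict_tag": tag,  # dictionary evidence for this token
            "prev_dict_tag": dict_tags[i - 1] if i > 0 else "BOS",
        })
    return feats


tokens = ["IL-2", "gene", "expression", "in", "T", "cells"]
tags = dictionary_tags(tokens, LEXICON)
features = token_features(tokens, tags)
```

Because the dictionary tag enters only as a feature, the trained model can still label entities that are missing from the lexicon, which is the motivation the abstract gives for combining the two methods.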

Keywords/Search Tags:

Biomedical Name Entity Recognition, Distributed Strategy, Features, Entity Dictionary, Conditional Random Fields


Related items

1	Biomedical Named Entity Recognition And Classification Of Biomedical Literature
2	Research On Algorithm And System Implementation On Named Entity Recognition For Chinese Electronic Medical Records
3	Recognition Of Named Entity In Electronic Medical Records Based On Cascaded Conditional Random Fields
4	Biomedical Named Entity Recognition And Entity Relation Extraction Based On Deep Learning Method
5	A Cambodian-named Entity Recognition Study Based On Constrained Random Fields
6	Research On Biomedical Dictionary-Based Entity Representation And Its Application
7	Conditional Random Fields Based English Name Entity Recognition
8	Chinese Named Entity Recognition Based On Conditional Random Fields
9	Named Entity Recognition Based On Conditional Random Fields
10	Named Entity Recognition Based On Conditional Random Fields Chinese Research