The Algorithm Research On Recognition Of Biomedical Named Entity Based On Text Mining

Posted on: 2019-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: B T Gao
GTID: 2428330569477269
Subject: Software engineering
Abstract/Summary:
In recent years, text mining has become an active research topic in the biomedical domain as a way to obtain the required biomedical knowledge quickly and effectively from the large volume of biomedical literature. Because named entities are the basic elements of biomedical text, biomedical named entity recognition (NER) is a prerequisite for information extraction, information retrieval, machine translation, and other natural language processing tasks on biomedical text. Using text mining methods to identify named entities in biomedical text effectively is therefore of great significance for subsequent work. In particular, the role and function of proteins are an important research subject in the life sciences, so recognizing protein named entities is of great importance to biomedical research. However, existing methods are mostly supervised learning methods, which often need a large amount of labeled data in the target domain to train a model with adequate performance. In the biomedical domain, manual annotation is expensive in both labor and time. To reduce the amount of labeled target-domain data that the classification model requires, and so save resources, this thesis studies the following two aspects:

(1) Biomedical protein named entity recognition based on transfer learning. To reduce the amount of labeled target-domain data needed for NER, the NER problem in biomedical text is cast as a hidden Markov model (HMM) based on transfer learning, and this study proposes the BioTrHMM algorithm. With transfer learning, the target domain does not need a large amount of labeled data to learn a model for the task. The approach draws on labeled data from source data sets in a different but related domain, uses the data gravitation method to evaluate how much each sample in the auxiliary data sets contributes to learning a model for the target domain, and calculates the weights of the source-domain and target-domain data. The transfer-learning-based hidden Markov model algorithm (BioTrHMM) is then constructed. Experimental results on the GENIA corpus show that BioTrHMM outperforms the traditional hidden Markov model algorithm while using only a small amount of labeled target-domain data, and that the method greatly reduces the cost of manual annotation.

(2) Biomedical protein named entity recognition based on PU learning. In practice, when little labeled data is available, traditional supervised learning methods cannot build an effective classification model for named entity recognition in biomedical text. To address this case, this study recasts the task as a biomedical named entity recognition problem under positive-unlabeled (PU) learning. Following a two-step PU learning approach, the first step uses the 1-DNF method, the Spy technique, the Naive Bayes classifier, and the Rocchio method, respectively, to identify strong negative samples in the unlabeled data; the second step builds an HMM classification model from the positive data and the strong negative data to recognize named entities in biomedical text. Experimental results show that, when annotated data is scarce, the classification model built with the two-step PU learning method performs significantly better than a model built directly from the small amount of annotated data. In this setting, the PU learning approach both achieves good classification performance and saves resources, since it requires no additional manual annotation.
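
As an illustration of the first step of two-step PU learning, the following is a minimal sketch of the Spy technique with a small Naive Bayes classifier. It is not the thesis's implementation: the function names (`train_nb`, `spy_reliable_negatives`), the parameters `spy_ratio` and `noise`, and the toy token contexts are all assumptions made for this example. Published descriptions of the Spy technique usually refine the classifier with EM iterations; this sketch uses a single Naive Bayes pass for brevity.

```python
import math
import random
from collections import Counter


def train_nb(docs, labels):
    """Multinomial Naive Bayes with add-one smoothing.

    Returns a function mapping a token list to P(positive | tokens).
    Assumes both classes (0 and 1) occur in `labels`.
    """
    vocab = {w for d in docs for w in d}
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for d, y in zip(docs, labels):
        word_counts[y].update(d)
    totals = {y: sum(word_counts[y].values()) for y in (0, 1)}

    def log_joint(doc, y):
        lp = math.log(class_counts[y] / len(docs))
        for w in doc:
            if w in vocab:  # ignore words never seen in training
                lp += math.log((word_counts[y][w] + 1) / (totals[y] + len(vocab)))
        return lp

    def prob_positive(doc):
        l1, l0 = log_joint(doc, 1), log_joint(doc, 0)
        m = max(l1, l0)  # subtract the max for numerical stability
        e1, e0 = math.exp(l1 - m), math.exp(l0 - m)
        return e1 / (e1 + e0)

    return prob_positive


def spy_reliable_negatives(positive, unlabeled, spy_ratio=0.15, noise=0.0, seed=0):
    """Step 1 of two-step PU learning (hypothetical helper).

    Hides a few positives ("spies") in the unlabeled set, trains a classifier
    that treats the unlabeled set as negative, and keeps as reliable negatives
    the unlabeled items that score below the spies. Needs at least two positives.
    """
    rng = random.Random(seed)
    n_spies = max(1, int(len(positive) * spy_ratio))
    spy_idx = set(rng.sample(range(len(positive)), n_spies))
    spies = [d for i, d in enumerate(positive) if i in spy_idx]
    rest = [d for i, d in enumerate(positive) if i not in spy_idx]

    docs = rest + unlabeled + spies
    labels = [1] * len(rest) + [0] * (len(unlabeled) + len(spies))
    prob_positive = train_nb(docs, labels)

    # Threshold: the lowest spy score, after discarding a `noise` fraction of spies.
    spy_scores = sorted(prob_positive(d) for d in spies)
    threshold = spy_scores[int(noise * len(spy_scores))]
    return [d for d in unlabeled if prob_positive(d) < threshold]


# Toy data (invented): context words around candidate mentions.
positive = [
    ["protein", "kinase", "binds"],
    ["kinase", "receptor", "protein"],
    ["receptor", "binds", "protein"],
    ["protein", "kinase", "receptor"],
]
unlabeled = [
    ["protein", "kinase", "receptor", "binds"],  # an unlabeled positive
    ["the", "patient", "was"],
    ["was", "treated", "the"],
    ["patient", "treated", "was"],
]
reliable = spy_reliable_negatives(positive, unlabeled)
```

On this toy data, `reliable` contains the three non-protein contexts and leaves out the unlabeled positive. In the second step, the labeled positives and these reliable negatives would serve as training data for the final classifier, which the abstract describes as an HMM.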
Keywords/Search Tags:named entity recognition, transfer learning, positive unlabeled learning, biomedical, text mining