Drug Name Recognition Based On Partial Labeling And Reinforcement Learning

Posted on: 2022-02-23, Degree: Master, Type: Thesis
Country: China, Candidate: M T Qu
GTID: 2480306350453384, Subject: Computer Science and Technology
Abstract/Summary:
Drug name recognition is the groundwork for relation extraction and event extraction in drug-related tasks and is of considerable research significance in biomedical science. Most existing drug name recognition methods rely on supervised machine learning, which typically requires a large amount of manually annotated training data. However, manually labeled data are limited and new drugs keep emerging, which restricts the performance of drug name recognition models. This thesis analyzes the composition characteristics of drug names and proposes a neural network model based on character embeddings and drug name prefix and suffix embeddings to improve the semantic representation of drug names. In addition, distant supervision, partial labeling learning, and reinforcement learning are used to expand the training data and improve recognition performance. The main research contents are as follows.

Firstly, this thesis examines the composition characteristics of drug names, compiles a dictionary of drug name prefixes and suffixes, and adds prefix and suffix embeddings and character embeddings to the word embedding layer. Drug names show clear regularities in word formation; for example, many drug names share the same prefix or suffix. Adding these embeddings to the embedding layer of the neural network model captures these word-formation characteristics and improves the semantic representation of drug names, thereby improving recognition performance.

Secondly, a hybrid training method based on manually annotated data and distantly supervised data is adopted to improve the robustness and performance of the model. To suppress over-fitting during training, part of the distantly supervised data is added to the manually labeled data used to train the recognition model, which improves its robustness. Likewise, when training on the expanded data, part of the manually annotated data is added to the distantly supervised data to guide the model parameters to converge in the correct direction.

The experimental results show that both the character embedding and the mixed-data training methods described in this thesis effectively improve model performance. The model can also identify some new drug names not yet included in the dictionary, which indicates good generalization ability.
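The prefix, suffix, and character features described in the abstract can be illustrated with a minimal sketch of the lookup step that would feed an embedding layer. The abstract does not give the thesis's affix dictionary, vocabulary, or padding scheme, so the affix lists, the `token_features` function, the `max_chars` parameter, and the convention of reserving ID 0 for "no match" are all assumptions made for illustration, not the author's implementation.

```python
# Hypothetical affix lists for illustration only; the dictionary compiled
# in the thesis is not reproduced in the abstract.
PREFIXES = ["cef", "sulfa"]
SUFFIXES = ["cillin", "statin", "mab", "pril", "azole"]

PAD_ID = 0  # assumed convention: 0 means "no affix matched", unknown char, or padding


def build_index(items):
    """Map each item to a positive integer ID, leaving 0 free for PAD_ID."""
    return {item: i for i, item in enumerate(items, start=1)}


PREFIX_IDS = build_index(PREFIXES)
SUFFIX_IDS = build_index(SUFFIXES)
CHAR_IDS = build_index("abcdefghijklmnopqrstuvwxyz0123456789-")


def longest_match(token, affix_ids, at_start):
    """Return the ID of the longest listed affix found on the token, or PAD_ID."""
    best_id, best_len = PAD_ID, 0
    for affix, idx in affix_ids.items():
        hit = token.startswith(affix) if at_start else token.endswith(affix)
        if hit and len(affix) > best_len:
            best_id, best_len = idx, len(affix)
    return best_id


def token_features(token, max_chars=20):
    """Compute the integer IDs that would index the prefix, suffix,
    and character embedding tables for one token."""
    t = token.lower()
    chars = [CHAR_IDS.get(c, PAD_ID) for c in t[:max_chars]]
    chars += [PAD_ID] * (max_chars - len(chars))  # right-pad to a fixed length
    return {
        "prefix_id": longest_match(t, PREFIX_IDS, at_start=True),
        "suffix_id": longest_match(t, SUFFIX_IDS, at_start=False),
        "char_ids": chars,
    }


features = token_features("Amoxicillin")
print(features["prefix_id"], features["suffix_id"])
```

In a model along these lines, each ID would select a row of a trainable embedding table, and the resulting prefix, suffix, and character vectors would be combined with the word embedding for the same token. How the thesis combines them is not stated in the abstract.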
Keywords/Search Tags: Drug name recognition, Partial labeling learning, Distant supervision, Reinforcement learning