A Method For Information Extraction From Medical Patents

Posted on:2021-10-29

Degree:Master

Type:Thesis

Country:China

Candidate:J Q Duan

Full Text:PDF

GTID:2504306476953149

Subject:Software engineering

Abstract/Summary:


Information extraction from biomedical documents can increase the degree of automation in building domain knowledge bases, which in turn supports computer applications in biomedicine such as document retrieval, diagnostic decision making, innovation examination, and predictive analysis. Biomedical patents are innovative and information-rich and include sophisticated experimental verification, which gives them high academic and commercial value. Anti-tumor drugs have received much attention recently, and a large number of related patents are published every year. However, these patents are still analyzed mainly by hand, which is time-consuming and expensive. It is therefore important to study automated methods for extracting key information from anti-tumor drug patents.

This thesis targets information extraction from Chinese anti-tumor drug patents. A preliminary analysis of their content showed that the key information consists mainly of entities such as chemical compounds, disease names, and drug targets, so the thesis concentrates on recognizing these entities. The main work and results are as follows:

1) An entity recognition dataset for anti-tumor drug patents (ERATDP) is constructed. After surveying public datasets and analyzing existing annotation guidelines, a project-driven annotation guideline is formulated, and the ERATDP dataset is then built to support training and evaluating recognition models.

2) Because the dataset contains many entity categories, a combined entity recognition method is studied. Based on an analysis of the characteristics of each entity type, the method combines dictionary-based, pattern-based, and machine-learning-based recognition. In addition, to mitigate the overfitting caused by the small amount of labeled data, text augmentation methods are applied to further improve performance.

3) Based on the combined entity recognition model, a prototype system for information extraction from Chinese anti-tumor drug patents is designed and implemented.

Experiments on the ERATDP dataset show that the combined method achieves better overall performance than commonly used methods, and that the text augmentation strategy improves performance on sparse classes. On the one hand, this work meets the requirements of a real-world project; on the other hand, it offers a reference for related work on domain-specific entity recognition.
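The abstract does not name the text augmentation methods used. One common option for sparse entity classes is mention replacement, in which an entity mention in a labeled sentence is swapped for another mention of the same class. The Python sketch below assumes character-level BIO tags, which are typical for Chinese NER. The example sentence, the mention pool, and the function names are hypothetical illustrations, not the thesis's data or code.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert character-level BIO tags into (start, end, label) spans."""
    spans: List[Span] = []
    start, label = None, ""
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        if start is not None and tag != "I-" + label:
            spans.append((start, i, label))
            start = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def replace_mentions(
    chars: List[str],
    tags: List[str],
    pools: Dict[str, Sequence[str]],
    target_labels: Set[str],
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Build a new sentence in which every mention whose label is in
    target_labels is swapped for a random mention of the same label."""
    new_chars: List[str] = []
    new_tags: List[str] = []
    prev = 0
    for start, end, label in bio_to_spans(tags):
        # Copy the untouched context between the previous entity and this one.
        new_chars += chars[prev:start]
        new_tags += tags[prev:start]
        if label in target_labels and pools.get(label):
            mention = rng.choice(pools[label])
        else:
            mention = "".join(chars[start:end])
        # Re-tag the (possibly longer or shorter) mention with fresh BIO labels.
        new_chars += list(mention)
        new_tags += ["B-" + label] + ["I-" + label] * (len(mention) - 1)
        prev = end
    new_chars += chars[prev:]
    new_tags += tags[prev:]
    return new_chars, new_tags


# Hypothetical training sentence and mention pool for a sparse DISEASE class.
chars = list("吉非替尼治疗肺癌")
tags = ["B-COMPOUND"] + ["I-COMPOUND"] * 3 + ["O", "O", "B-DISEASE", "I-DISEASE"]
pools = {"DISEASE": ["非小细胞肺癌", "乳腺癌", "肝癌"]}
new_chars, new_tags = replace_mentions(chars, tags, pools, {"DISEASE"}, random.Random(0))
print("".join(new_chars), new_tags)
```

The tags are regenerated from the replacement mention, so the label sequence stays aligned with the characters even when the new mention has a different length. Restricting `target_labels` to under-represented classes adds training examples where they are scarcest without changing the other classes.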

Keywords/Search Tags:

Anti-tumor Drug Patents, Entity Recognition, BERT Model, Text Augmentation


Related items

1	GAN-based Named Entity Recognition For TCM Text
2	Study On Named Entity Recognition Model Of Cancer Patient Online Questioning Text Based On Transfer Learning
3	Semi-Supervised-Based Named Entity Recognition And System Application For Drug Patents
4	Research On Named Entity Recognition In TCM Medical Records Based On BERT Pre-training Mode
5	Medical Text Named Entity Recognition Based On Improved Sequence Labeling Model
6	Research And Implementation Of Medical Entity Recognition System Based On Double BiLSTM
7	Research On Named Entity Recognition Of Xinjiang Local Medicine Based On Pre-training Model
8	Research On Medical Text Named Entity Recognition And Entity Relation Extraction Based On Machine Reading Comprehension Framework
9	Research On Method Of Medical Named Entity Recognition Based On Pre-trained Model
10	Research On Named Entity Recognition And Relation Extraction For Medical Texts