Chinese Biomedical Text Information Extraction Based On Deep Learning

Posted on: 2021-04-04 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Y Ding
GTID: 2428330626460397 | Subject: Computer technology
Abstract/Summary:
Because biomedicine is closely tied to human health, the field has attracted wide attention, and the volume of biomedical literature has grown exponentially. These documents contain a large amount of knowledge and are a valuable resource for researchers. However, extracting knowledge from them by hand consumes a great deal of time and effort and cannot keep pace with researchers' needs, which is why text mining techniques emerged. Biomedical entity recognition is one of the basic tasks in text mining. A Chinese biomedical entity relation corpus was constructed from publicly available annotated English biomedical corpora by means of translation and manual annotation. A stroke-based ELMo model was then trained on a large collection of biomedical documents. Finally, a stroke ELMo + BiLSTM + CRF model was built for entity recognition; this model addresses the problems of polysemy and poor protein recognition.

In biomedicine, clinical short text classification is an important step in building an auxiliary medical diagnosis system and has broad application prospects and clinical value. This thesis presents a BERT-based neural network ensemble method. For the pre-trained language model, the method uses fine-tuning techniques such as continued training and gradual unfreezing. For model training, it uses techniques such as pseudo-labeling and five-fold cross-training. For feature representation, it also designs a set of general features that improve short text classification by alleviating the lack of information in short texts. Compared with other BERT-based models, this model achieves higher accuracy.

Relation extraction is of great significance in biological information extraction. On the biomedical corpus, the method builds a model that integrates an attention-based BiLSTM with a multi-granularity lattice, which incorporates word-level information into the character sequence and thereby avoids the influence of word segmentation errors. An external linguistic knowledge base is used to address ambiguity in Chinese. Finally, a set of Chinese-specific features further improves the model's results. Experimental results show that the model extracts relations between entities effectively.
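The entity recognition model above ends in a CRF layer, which picks the best tag sequence as a whole rather than the best tag for each character independently. The sketch below shows standard Viterbi decoding for a linear-chain CRF under stated assumptions: the tag set (`O`, `B-PRO`, `I-PRO`), the transition scores and the emission scores are all hypothetical, and in the thesis's model the emissions would come from the stroke ELMo + BiLSTM encoder, which is not reproduced here.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence of a linear-chain CRF.

    emissions[t][j]:   score of tag j at position t (e.g. BiLSTM output)
    transitions[i][j]: score of moving from tag i to tag j
    """
    n_tags = len(tags)
    best = list(emissions[0])  # best path score ending in each tag so far
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i for arriving at tag j at step t.
            scores = [best[i] + transitions[i][j] for i in range(n_tags)]
            i_best = max(range(n_tags), key=scores.__getitem__)
            new_best.append(scores[i_best] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    last = max(range(n_tags), key=best.__getitem__)
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return [tags[j] for j in path]


TAGS = ["O", "B-PRO", "I-PRO"]  # hypothetical BIO tags for protein entities
NEG = -1e4  # a large negative score that effectively forbids a transition
# Hypothetical transitions: O -> I-PRO is not allowed in BIO tagging.
TRANS = [[0.0, 0.0, NEG], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# Hypothetical per-character scores; a greedy argmax would give O, I-PRO, I-PRO.
EMIT = [[1.0, 0.5, 0.0], [0.0, 0.9, 1.0], [0.0, 0.0, 1.0]]
print(viterbi_decode(EMIT, TRANS, TAGS))
```

With these scores the decoder returns `O, B-PRO, I-PRO`: the transition penalty overrides the slightly higher emission score of `I-PRO` at the second position, which is the kind of structural consistency a CRF layer adds on top of a BiLSTM.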
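Gradual unfreezing, listed above among the fine-tuning techniques, is commonly done as in ULMFiT: train only the top encoder layer first, then unfreeze one more layer below it at each epoch. The following is a minimal sketch of that schedule only, assuming one unfreezing step per epoch and a classifier head that is always trainable; the abstract does not give the exact schedule the thesis used, and the function name `unfrozen_layers` is hypothetical.

```python
from typing import List


def unfrozen_layers(epoch: int, n_layers: int) -> List[int]:
    """Encoder layers that are trainable at a given epoch under gradual unfreezing.

    Layer n_layers - 1 is the top layer (closest to the classifier head, which is
    assumed to be trainable throughout). Epoch 0 trains only the top layer, and
    each later epoch unfreezes the next layer below until all are trainable.
    """
    if epoch < 0 or n_layers < 1:
        raise ValueError("epoch must be >= 0 and n_layers >= 1")
    k = min(epoch + 1, n_layers)
    return list(range(n_layers - k, n_layers))


# Hypothetical 12-layer encoder (the depth of BERT-base).
for epoch in range(3):
    print(epoch, unfrozen_layers(epoch, 12))
```

In a framework such as PyTorch, the returned indices would decide which encoder layers have their parameters' `requires_grad` flag set to `True` for that epoch; keeping lower layers frozen early is meant to protect the general-purpose representations learned during pre-training.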
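Five-fold cross-training and pseudo-labeling, both mentioned above for the classification model, fit together naturally: each of five models is trained on four folds, their predictions on unlabeled texts are averaged, and only confident predictions are added back as pseudo-labeled training data. The sketch below shows that bookkeeping in plain Python. The averaging step, the 0.95 confidence threshold and all function names are assumptions made for illustration; the abstract does not state how the thesis selected pseudo-labels, and the BERT models themselves are omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple


def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle range(n) and return k (train_indices, val_indices) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    splits = []
    for f in range(k):
        # Train on every fold except f; validate on fold f.
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        splits.append((train, sorted(folds[f])))
    return splits


def average_probs(per_model: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average class probabilities over the fold models (indexed per_model[m][i][c])."""
    m = len(per_model)
    return [
        [sum(model[i][c] for model in per_model) / m for c in range(len(per_model[0][i]))]
        for i in range(len(per_model[0]))
    ]


def select_pseudo_labels(probs: Sequence[Sequence[float]], threshold: float = 0.95) -> Dict[int, int]:
    """Map unlabeled example index -> predicted class, keeping confident predictions only."""
    chosen = {}
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=lambda c: p[c])
        if p[label] >= threshold:
            chosen[i] = label
    return chosen


splits = k_fold_splits(10, k=5)
print([len(val) for _, val in splits])
# Hypothetical probabilities from two fold models on two unlabeled texts.
probs = average_probs([[[0.98, 0.02], [0.6, 0.4]], [[0.96, 0.04], [0.7, 0.3]]])
print(select_pseudo_labels(probs))
```

Here only the first unlabeled text clears the threshold and would be pseudo-labeled; the second, with an averaged confidence of about 0.65, is left out so that uncertain predictions do not add noisy labels.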
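The lattice in the relation extraction model avoids committing to a single word segmentation: every lexicon word found in the character sequence becomes an extra path from its first to its last character, and the network learns which paths to trust. The sketch below covers only that lattice-construction step, assuming a small hypothetical lexicon and a maximum word length of 4; the attention-based BiLSTM, the external knowledge base and the Chinese-specific features described in the abstract are not reproduced.

```python
from typing import Iterable, List, Tuple


def lattice_words(
    sentence: str, lexicon: Iterable[str], max_len: int = 4
) -> List[Tuple[int, int, str]]:
    """Return (start, end, word) for every lexicon word of length >= 2 in sentence.

    start and end are inclusive character indices, so each tuple is one
    word-level edge that a lattice model attaches to the character sequence.
    """
    vocab = set(lexicon)
    matches = []
    for start in range(len(sentence)):
        for length in range(2, max_len + 1):
            end = start + length
            if end > len(sentence):
                break
            word = sentence[start:end]
            if word in vocab:
                matches.append((start, end - 1, word))
    return matches


# Hypothetical lexicon; the sentence is a classic segmentation-ambiguity example.
LEXICON = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
for edge in lattice_words("南京市长江大桥", LEXICON):
    print(edge)
```

Both the overlapping readings "南京市 / 长江大桥" and "南京 / 市长 / 江..." stay available as candidate edges, so a wrong segmentation does not erase the correct word before the model sees it.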
Keywords/Search Tags: Entity recognition, Short text classification, Relation extraction, Pre-trained language model