Research On Biomedical Text Information Extraction Method Based On Semantic Enhancement

Posted on: 2022-11-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X L Ma
Full Text: PDF
GTID: 1480306758979109
Subject: Bioinformatics
Abstract/Summary:
Biomedical literature is the major source of knowledge in biomedicine, and the knowledge it contains is important for health promotion, treatment concepts, preventive healthcare, medical management, and related areas. The volume of medical literature is growing exponentially and far exceeds what people can absorb by traditional means, so accurately acquiring valuable, key medical knowledge from massive electronic literature has become a research hotspot. This thesis addresses three problems in information extraction: the loss of semantic information, inadequate use of external resources, and low utilization of sample information. Taking biomedical text representation, the use of semantic knowledge, and deep model construction as entry points, it explores information extraction methods based on semantic enhancement and their application to the recognition of adverse drug reactions and the discovery of the mechanisms of action of medicines. The main work is as follows:

(1) To address the insufficient semantic information carried by biomedical entities, this thesis integrates biomedical knowledge and proposes a relation extraction method for biomedical entities based on a normalized network. First, a pre-trained language model was adopted to generate high-quality, deep contextual word representations for each word in the sentence. Second, an attention mechanism was used to explore the contribution of different words to entity relations, together with their latent semantic information, to obtain rich contextual information. Third, a bidirectional gated recurrent unit network with normalization was adopted to obtain a deep global semantic representation, addressing network degradation while capturing high-order dependencies, and residual connections were used to strengthen the semantic representation. Finally, structural features of biomedical entities were introduced to further strengthen the representation. By combining different normalization methods and testing on data sets of different scales, it was verified that the method generalizes well on small data sets and can serve as a useful supplement for biomedical tasks such as drug discovery and finding new uses for old drugs.

(2) To address the low utilization of information from erroneous samples and the insufficient semantic representation in traditional biomedical event extraction methods, this thesis proposes a biomedical event extraction method based on semantic enhancement and an error detection mechanism. Self-training was adopted to cope with the small number of samples in the training set. As the sample set was expanded during iterative learning, erroneous samples were identified by semantic similarity detection and filtered out to obtain accurate training samples. An SVM was trained on the small sample data set and used to predict biomedical events, while the semantic representation of erroneous samples was a high-order representation of extended short sentences, extracted by frequent pattern mining and a C-LSTM model. With the evaluation provided by the BioNLP GENIA shared task as the measure, extensive experiments showed that the proposed method extracts biomedical events from biomedical literature well, provides an auxiliary tool for downstream tasks, and can serve as a supplement for pathway enrichment, the discovery of gene ontology signaling pathways, metabolism, etc.

(3) To address the discontinuity of semantic understanding in the recognition of adverse drug reactions, entity recognition was applied to the problem, and a two-stage neural network model based on span-based sequence fragments was proposed. In the first stage, the text sequence was fed into the pre-trained language model, an extended representation of semantically similar words was obtained by combining word vectors pre-trained with word2vec, and sequence labeling was applied to recognize the mention words of adverse reactions. In the second stage, after the pre-trained language model output the sequence containing the identifiers of the mention words, a convolutional neural network was adopted to reconstruct the context, and span representations built from continuous and discontinuous combinations of the mention words were used to capture the dependencies between them, improving the discriminative performance of the model. Experimental results showed that the proposed method achieves good results on corpora of different forms and is of value for drug safety detection and drug discovery.

(4) Because the mechanism by which Mongolian medicine treats diseases is unknown, and the knowledge bases on which mechanism studies depend cannot be updated in a timely manner, a study of the action mechanism of medicines based on biomedical information extraction is proposed. The Mongolian medicine Tufuling Qiwei Tangsan in the treatment of psoriasis was taken as an example, following the network pharmacology analysis method. First, the neural network relation extraction model was used to recognize interactions between medicine components and targets, the component targets in the existing knowledge base were expanded, and the psoriasis targets were recognized. Then the protein interaction networks of the medicine component targets and the disease targets were constructed separately; after the networks were merged, enrichment analysis was conducted to discover key target genes that may treat the disease. Finally, the biomedical event extraction method was adopted to accurately retrieve literature related to the key target genes, and molecular docking was used as a second, auxiliary verification to explore in depth the action mechanism of Tufuling Qiwei Tangsan in the treatment of psoriasis. The results offer inspiration and guidance for the clinical application of Mongolian medicine in the treatment of diseases.
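
The self-training with error detection described in item (2) can be illustrated with a minimal sketch. This is not the thesis implementation: a nearest-centroid classifier stands in for the SVM, plain cosine similarity between feature vectors stands in for the C-LSTM-based semantic representation, and the names `self_train`, `fit_centroids`, `predict`, and the `threshold` parameter are hypothetical. It only shows the loop structure: pseudo-label unlabeled samples, reject those whose semantic similarity to known samples of the predicted class is too low, and retrain on the enlarged set.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]
Labeled = Tuple[Vector, str]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fit_centroids(train: List[Labeled]) -> Dict[str, Vector]:
    """Stand-in classifier (assumption): one mean vector per label."""
    groups: Dict[str, List[Vector]] = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    return {
        y: tuple(sum(col) / len(xs) for col in zip(*xs))
        for y, xs in groups.items()
    }


def predict(centroids: Dict[str, Vector], x: Vector) -> str:
    """Assign the label of the nearest centroid (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))


def self_train(
    labeled: List[Labeled],
    unlabeled: List[Vector],
    threshold: float = 0.9,  # hypothetical similarity cut-off
    max_rounds: int = 5,
) -> Tuple[List[Labeled], List[Vector]]:
    """Iteratively add pseudo-labeled samples that pass the similarity check."""
    train = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(train)
        accepted: List[Labeled] = []
        remaining: List[Vector] = []
        for x in pool:
            label = predict(centroids, x)
            # Error detection: a pseudo-label is trusted only if the sample
            # is similar enough to some known sample carrying that label.
            support = max(cosine(x, v) for v, y in train if y == label)
            if support >= threshold:
                accepted.append((x, label))
            else:
                remaining.append(x)
        if not accepted:
            break  # nothing new was learned; stop early
        train.extend(accepted)
        pool = remaining
    return train, pool
```

Calling `self_train(labeled, unlabeled)` returns the enlarged training set together with the samples that were never accepted. A stricter `threshold` keeps the training set cleaner at the cost of adding fewer samples per round.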
Keywords/Search Tags: Semantic Enhancement, Deep Learning, Entity Recognition, Relation Extraction, Biomedical Event