Research On Key Technologies Of Text Retrieval And Relation Extraction For Precision Medicine

Posted on: 2023-03-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X F Liu
GTID: 1524306830482154
Subject: Computer Science and Technology
Abstract/Summary:
With the development of biomedical technology, the amount of biomedical literature is growing explosively. Biomedical text mining aims to automatically retrieve relevant biomedical literature from medical databases, extract meaningful biomedical knowledge, construct biomedical knowledge graphs, and provide services for precision medicine. Deep learning methods from natural language processing have been widely used in the biomedical field. However, biomedical texts for precision medicine, including biomedical literature and electronic medical records, have diverse expressions of biomedical entities, complex syntactic structures, long fact descriptions and symbolic representations. These characteristics pose enormous challenges for text mining tasks such as biomedical text retrieval and relation extraction. Addressing the challenges and key problems of biomedical text mining for precision medicine, this dissertation focuses on biomedical text retrieval and biomedical relation extraction based on deep learning. The main problems, research contents and innovations of the dissertation are as follows:

(1) Biomedical entities in biomedical text have many different and complex term expressions, and the sentences of biomedical descriptions have non-continuous and non-local semantic associations. These two problems make it hard for existing neural network models to capture the semantic associations between queries and documents. To solve them, the dissertation proposes a knowledge enhanced edge-driven graph neural ranking model (KEGNR). The method first constructs a heterogeneous graph, called a query-doc graph, that connects biomedical terms between the query and the document by defining three kinds of nodes and five kinds of edges. Then, knowledge related to the query is integrated into the query-doc graph to obtain a knowledge-query-doc graph. Finally, an edge-driven graph ranking model is applied to capture the semantic matching signals between the query and the document and to obtain a relevance score for document ranking. Experimental results show that KEGNR can not only effectively alleviate the semantic gap between queries and documents caused by different expressions of biomedical entities, but also capture non-continuous and non-local semantic matching signals between queries and documents. Experimental analysis shows that the knowledge integration method of KEGNR can effectively reduce the influence of noise in external knowledge on the text representation ability of the model. In addition, the edge-driven self-attention pooling layer of KEGNR provides some interpretability for how a document is matched to a query.

(2) Different complex representations of the same biomedical entity appear across multiple sentences of a biomedical text, which makes it difficult for models to extract the semantics of document-level entity relations. To solve this problem and to explore whether a pre-trained self-attention mechanism can understand the semantics of document-level entity relations, the dissertation proposes an entity mask method based on pre-trained self-attention (EM-PSAT). To avoid the loss of information and the introduction of noise caused by document segmentation, the dissertation proposes a document preprocessing method. The method retains the sentences in which the target entities are located, and also retains the sentences between them that do not contain the target entities. This preprocessing preserves as much text information as possible, keeps semantic coherence, and discards some useless information. Then, to make the model pay more attention to the context between entities and understand the semantics of document-level entity relations, EM-PSAT masks the target biomedical entities with uniform identifiers. Finally, a pre-trained self-attention mechanism is used to extract document-level entity relations. Experiments show that entity masking enables the model to pay more attention to the context between entities and to understand the semantics of document-level relationships. Visualizing the weights of the pre-trained self-attention shows that the self-attention mechanism can capture document-level entities and their associated contexts.

(3) To solve the problem of long-distance context dependencies and complex semantics caused by numerous medical entities and cross-sentence relationships in biomedical text, the dissertation proposes a multi-granularity sequential network (MGSN). The method uses different sequence encoders to hierarchically encode multi-granularity information, such as entity global information, entity local information and document-level information, and then integrates them for document-level relation extraction. Experimental results show that encoding information at different granularities helps the model handle long-distance context dependencies in cross-sentence relation extraction and understand complex document-level semantics. In addition, experimental analysis shows that the CNN-based bi-affine structure of MGSN can locate the sentences that reflect the relationship of the target entity pair, thus providing some interpretability for document-level biomedical relation extraction.

(4) To solve the problem that incomplete knowledge bases cannot provide the medical knowledge required for biomedical text retrieval, the dissertation proposes an entity relation aware graph neural ranking model (ERAGNR). It aims to use entity relation extraction to improve text retrieval and to alleviate the problems caused by incomplete knowledge bases. ERAGNR mines the relationships between some of the biomedical entities in the document and combines them with the incomplete external knowledge, which increases the semantic association and reduces the semantic gap between the query and the document. The method first constructs a knowledge-query graph and a document-entity graph, and then fuses the two graphs to obtain a knowledge-query-document-entity graph. An additional biomedical relation extraction task is then introduced. On top of a shared text encoder and graph neural network, two separate network structures are used to complete the text retrieval task and the relation extraction task. Because the two tasks share the same graph neural network, the graph model can learn the semantic matching pattern between the query and the document and also learn to recognize the relationships between the entities in the document. This allows the model to automatically mine biomedical entity relationships within the document during text retrieval, and to capture semantic matching signals between the context of those entity relationships and the query. The experimental results show that, when knowledge bases are incomplete, ERAGNR greatly enriches the semantics of the knowledge-query-doc graph by mining entities in the document to expand the query-document graph. Through the biomedical relation extraction task, the model learns to capture the context of the entity relations in the document, so that it can match the semantics between the query and the document more accurately.
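
As an illustration of the EM-PSAT preprocessing described in item (2), the following is a minimal sketch and not the dissertation's implementation. It assumes the document is already split into sentences, uses plain case-insensitive substring matching in place of a real entity recognizer, and uses hypothetical placeholder tokens (`@CHEMICAL$`, `@DISEASE$`) as the "uniform identifiers"; the function names are likewise invented for this example. The pre-trained self-attention model that consumes the masked text is not shown.

```python
import re
from typing import Dict, List


def keep_entity_span(sentences: List[str], mentions: List[str]) -> List[str]:
    """Keep the run of sentences from the first to the last one that mentions
    a target entity, including the entity-free sentences in between."""
    lowered = [m.lower() for m in mentions]
    hits = [i for i, s in enumerate(sentences)
            if any(m in s.lower() for m in lowered)]
    if not hits:
        return []
    return sentences[hits[0]:hits[-1] + 1]


def mask_entities(text: str, entity_mentions: Dict[str, List[str]]) -> str:
    """Replace every surface form of a target entity with its placeholder.

    `entity_mentions` maps a placeholder token (hypothetical naming) to the
    surface forms of one target entity.
    """
    lookup = {m.lower(): mask
              for mask, forms in entity_mentions.items() for m in forms}
    if not lookup:
        return text
    # Longest forms first so a short form never pre-empts a longer one.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in alternatives),
                         re.IGNORECASE)
    # A single pass avoids re-matching inside already inserted placeholders.
    return pattern.sub(lambda mt: lookup[mt.group(0).lower()], text)


# Toy document (invented for illustration only).
sentences = [
    "Background on drug safety.",
    "Aspirin was given to 40 patients.",
    "Doses were adjusted weekly.",
    "Several patients developed gastric bleeding.",
    "Funding was provided by a grant.",
]
entity_mentions = {
    "@CHEMICAL$": ["aspirin"],
    "@DISEASE$": ["gastric bleeding"],
}
all_forms = [m for forms in entity_mentions.values() for m in forms]

kept = keep_entity_span(sentences, all_forms)
masked = mask_entities(" ".join(kept), entity_mentions)
print(masked)
```

In this toy input the first and last sentences fall outside the entity span and are dropped, while the entity-free middle sentence is kept to preserve coherence, which mirrors the trade-off the abstract describes.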
Keywords/Search Tags:biomedical text retrieval, query-document graph, graph neural ranking, document-level biomedical relation extraction, multi-granularity sequential model