
Extraction Of Biological Entity Relation Based On Literature Mining And Its Application

Posted on: 2021-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: W Chen
Full Text: PDF
GTID: 2370330602498994
Subject: Computer software and theory
Abstract/Summary:
Living organisms contain many kinds of interactions between biological entities (such as chemicals and proteins). Studying these interactions is essential to understanding the mechanisms of life. With the rapid growth of biomedical literature, literature mining makes it possible to extract biological entity relations efficiently and to build structured biological databases from them, which has broad application in basic biomedical research and drug discovery.

As far as we know, existing machine learning-based systems require scientists to design features manually in order to extract the relations between biological entities, and these systems have difficulty characterizing the dependency information between words. Most deep learning-based systems ignore the hierarchical relations of biological entities and do not establish associations between the relations. Moreover, these systems are trained on specific data sets and perform poorly when transferred to other data sets, which makes it difficult to extract multiple types of biological entity relations. In addition, most biological entity relation databases are built manually, which is very resource-intensive and cannot easily keep up with the pace of publication. For this reason, this thesis designs a universal deep learning model that can extract biological entity relations at different levels and uses it to mine biological entity relations from massive literature. Finally, a structured database is established for researchers to use. The main research and contributions of this thesis are as follows:

1. Design of an extraction method for hierarchical relations of biological entities

This thesis proposes a multi-channel convolutional neural network model (MC-CNN) for extracting multiple biological entity relations. The model divides sentences into multiple phrases and learns their semantics through convolution operations. It then captures, from the phrases, the underlying relation words that express an association, and finally categorizes the underlying relations into high-level relations from bottom to top. Instead of constructing features by hand, this thesis uses a language model (BERT) to learn the word distribution of the biological field from a biological corpus, so as to produce more accurate word vectors. By combining the attention mechanism with a residual layer, the model can learn the full semantics of sentences. Finally, the multi-channel convolution layer is used to predict the relation. In addition, to enhance the robustness of the model on multiple data sets, this thesis designs a Ranking loss function, which uses sample distribution information to adaptively adjust parameter updates. Tests on drug-drug and chemical-protein relation data sets show that the proposed method outperforms existing methods, indicating that it is effective for extracting multiple biological entity relations.

2. Establishment and application of a biological entity interaction database

In this thesis, the pre-trained model is used to mine biological entity relations from massive public literature abstracts, and an entity interaction database is then established. First, we download a large number of biomedical literature abstracts from the PubMed retrieval system and extract the entity relation data. To ensure the quality of the data, we designed filtering indicators and scoring strategies for the extraction results, and kept only the data that meet the indicators in the database. In the end, about 300,000 biological entity relations are stored in our database. In addition, we have built a web-based retrieval system. In this system, the scattered biological entities constitute a large biological relation network. Users can query the direct or indirect relations between biological entities, and can also sort and visualize the results.
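The abstract does not say how the database filters extraction results or how the retrieval system answers direct and indirect relation queries. The following is only a minimal sketch of one plausible approach, assuming each extracted relation carries a single confidence score and that an indirect relation is the shortest chain of stored relations between two entities. The entity names, relation labels, scores, the `min_score` threshold, and the functions `build_network` and `find_relation_path` are all hypothetical and are not taken from the thesis.

```python
from collections import deque

# Hypothetical extraction output: (entity_a, relation_label, entity_b, score).
# All names, labels and scores are placeholders, not real biological data.
EXTRACTED = [
    ("drug_A", "inhibitor", "protein_X", 0.97),
    ("protein_X", "interacts_with", "protein_Y", 0.88),
    ("drug_B", "drug_interaction", "drug_A", 0.91),
    ("drug_A", "activator", "protein_Z", 0.32),  # below threshold, dropped
]


def build_network(relations, min_score=0.8):
    """Keep relations scoring at least min_score; index them as an undirected graph."""
    graph = {}
    for a, label, b, score in relations:
        if score < min_score:
            continue  # assumed filtering rule: a single score threshold
        graph.setdefault(a, {})[b] = label
        graph.setdefault(b, {})[a] = label
    return graph


def find_relation_path(graph, source, target):
    """Breadth-first search; return the shortest entity chain, or None if unconnected."""
    if source not in graph or target not in graph:
        return None
    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


network = build_network(EXTRACTED)
print(find_relation_path(network, "drug_A", "protein_X"))  # direct relation
print(find_relation_path(network, "drug_B", "protein_X"))  # indirect, via drug_A
print(find_relation_path(network, "drug_A", "protein_Z"))  # filtered out, so None
```

A path of two entities corresponds to a direct relation, and a longer path to an indirect one. A production system would more likely run such queries inside a database or graph store than over an in-memory dictionary.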
Keywords/Search Tags: Biomedical Literature, Relation Extraction, Multi-channel CNN, Database