Knowledge Extraction In Chemical Literature

Posted on: 2019-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: H D Zhang
Full Text: PDF
GTID: 2371330548469575
Subject: Software engineering
Abstract/Summary:
In recent years, the volume of chemistry-related literature has grown steadily: there are more than 30 chemistry-related publishers and hundreds of chemistry-related periodicals. This abundance is a convenience for researchers, but it also makes it harder for them to find the information they need. Expectations for search results are rising as well. Retrieval based on traditional string matching can no longer meet researchers' needs; what they need more urgently is to discover the chemical knowledge hidden in the literature, that is, the relationships between entities.

This paper first introduces the background of research on knowledge extraction at home and abroad, then analyzes the current state of research on knowledge extraction in the field of chemistry, and finally presents the work done in the field of chemistry. The paper has two main tasks. The first is to discover potential relationships between different types of chemical entities in the literature. The second is to study a document retrieval algorithm based on potential entity relationships: given an entity X, find all related entities Y, where X and Y belong to entity categories in chemistry.

Mining the complex latent relationships between entities in the literature is difficult. The entities in the chemical literature must first be identified, and then the relationships among proteins, DNA, and diseases, and even between proteins and proteins, must be extracted. In this paper, we first identify the entities in the literature using conditional random fields (CRFs) based on contextual clues. We then propose an extraction method based on an improved FP-growth association algorithm, which generates a relation matrix that stores the relationships between all entities. Knowledge extraction from chemical literature also faces problems such as one name shared by different entities, different names for the same entity, non-standard abbreviations, and spelling errors. To handle fuzzy and inexact lookup, this paper proposes a method based on an improved Levenshtein distance and an expanded thesaurus. Index retrieval is then carried out using multi-mode retrieval and a score adjustment strategy based on a penalty-and-reward mechanism for association scores. Experiments show that the method achieves higher accuracy and recall than traditional retrieval methods, and that its results are rated as more satisfactory.
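
To illustrate the idea of combining edit distance with a thesaurus for fuzzy lookup, here is a minimal sketch. It is not the thesis's method: the abstract does not describe how the Levenshtein distance was improved, so the sketch uses the standard distance. The thesaurus entries, the function names `levenshtein` and `fuzzy_lookup`, and the `max_ratio` threshold are all illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard edit distance: insert, delete, substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical thesaurus: canonical entity name -> known aliases.
THESAURUS = {
    "acetylsalicylic acid": ["aspirin", "ASA"],
    "deoxyribonucleic acid": ["DNA"],
}


def fuzzy_lookup(query, thesaurus=THESAURUS, max_ratio=0.25):
    """Return the canonical name whose name or alias is closest to the
    query, or None if even the closest surface form is too far away.

    max_ratio is an assumed cutoff on distance / longer-string length.
    """
    q = query.strip().lower()
    best_name, best_ratio = None, None
    for canonical, aliases in thesaurus.items():
        for surface in [canonical, *aliases]:
            s = surface.lower()
            ratio = levenshtein(q, s) / max(len(q), len(s), 1)
            if best_ratio is None or ratio < best_ratio:
                best_name, best_ratio = canonical, ratio
    if best_ratio is not None and best_ratio <= max_ratio:
        return best_name
    return None
```

With these assumed entries, a misspelled query such as `fuzzy_lookup("asprin")` resolves through the alias "aspirin" to its canonical name, while an unrelated query such as `fuzzy_lookup("benzene")` returns `None`. Normalizing the distance by string length is one possible design choice; it keeps short abbreviations from matching too loosely.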
Keywords/Search Tags: chemical domain, named entity recognition, entity relationship extraction, knowledge extraction