Knowledge Extraction In Chemical Literature

Posted on: 2019-02-05

Degree: Master

Type: Thesis

Country: China

Candidate: H D Zhang

Full Text: PDF

GTID: 2371330548469575

Subject: Software engineering

Abstract/Summary:

In recent years, the volume of chemistry literature has grown rapidly: there are more than 30 chemistry publishers and hundreds of chemistry periodicals. This abundance benefits researchers, but it also makes it harder for them to find the information they need. Moreover, expectations for search results keep rising, and retrieval based on traditional string matching can no longer meet them. What researchers need more urgently is to uncover the chemical knowledge hidden in the literature, that is, the relationships between entities.

This thesis first introduces the background of knowledge extraction research in China and abroad, then analyzes the current state of knowledge extraction in the field of chemistry, and finally presents the work done in the field of chemistry. It has two main tasks. The first is to discover latent relationships between different types of chemical entities in the literature. The second is to study a document retrieval algorithm based on these latent relationships: given an entity X, find all related entities Y, where X and Y are entity categories in chemistry.

Mining the complex latent relationships between entities in the literature is difficult. It requires identifying the entities in the chemical literature and then extracting the relationships among them, for example between proteins, DNA, and diseases, and even between proteins and other proteins. We first identify the entities using conditional random fields (CRFs) based on contextual clues, and then propose an extraction method based on an improved version of the FP-growth association algorithm to generate a relation matrix that stores the relationships between all entities. Knowledge extraction from chemical literature also has to cope with naming ambiguity (one name used for different substances, and different names used for the same substance), non-standard abbreviations, spelling errors, and similar problems. To handle the resulting fuzzy and inexact lookups, this thesis proposes a method based on an improved Levenshtein distance and an expanded thesaurus. Index retrieval is then carried out using multi-mode retrieval and a score adjustment strategy based on a penalty and reward mechanism for association scores. Experiments show that the method achieves higher accuracy and recall than traditional retrieval methods and yields more satisfactory results.
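The abstract does not say how the Levenshtein distance is improved or how the thesaurus is expanded, so the following is only a minimal sketch of the general idea, not the thesis's method. It uses the plain (unmodified) Levenshtein distance, a length-normalized similarity, and a small thesaurus whose entries, along with the names `similarity`, `fuzzy_lookup`, `THESAURUS`, and the 0.8 threshold, are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insert, delete, and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical thesaurus: canonical name -> known synonyms and abbreviations.
THESAURUS = {
    "acetylsalicylic acid": ["aspirin", "asa"],
    "deoxyribonucleic acid": ["dna"],
    "ethanol": ["ethyl alcohol", "etoh"],
}


def fuzzy_lookup(query, thesaurus=THESAURUS, threshold=0.8):
    """Return (canonical name or None, best score) for a possibly misspelled query."""
    q = query.strip().lower()
    best_name, best_score = None, 0.0
    for canonical, synonyms in thesaurus.items():
        # Compare against the canonical name and every listed variant.
        for term in [canonical, *synonyms]:
            score = similarity(q, term)
            if score > best_score:
                best_name, best_score = canonical, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score
```

With this sketch, a misspelled query such as `fuzzy_lookup("asprin")` resolves to the canonical entry for aspirin, while a term with no close variant in the thesaurus returns `None`. Matching against synonyms as well as canonical names is what lets an abbreviation and a full name map to the same entity.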

Keywords/Search Tags:

chemical domain, named entity recognition, entity relationship extraction, knowledge extraction

Related items

1	Research On Named Entity Recognition And Relation Extraction For Chemical Industry Safety
2	Incremental Learning-based Knowledge Graph Construction Technology For Petrochemical Safety Domain
3	Research On Multi Type Named Entity Extraction Based On The Complex Characteristics Of The Knowledge Of Food Safety Events
4	Research And Implementation Of Key Technologies For The Construction Of Knowledge Graph Of Airline Unsafe Events
5	Research On Text Big Data Analysis Method Of High-speed Railway Safety
6	Research On Named Entity Recognition For Architectural Texts
7	Research On Entity Extraction For Animal Food Safety Hazards
8	Research On The Construction Of CNC Machine Tool Fault Knowledge Graph Based On Deep Learning
9	Research And Application For Named Entity Recognition Of Coal Mine Accident Field Based On Deep Learning
10	Research On Named Entity Recognition For Hazardous Chemical Storage Technology