Research on Entity Relationship Extraction of Tea Diseases and Insect Pests Based on Chinese Glyph Information

Posted on: 2023-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: L T Liao
Full Text: PDF
GTID: 2543307088968769
Subject: Computer application technology
Abstract/Summary:
Relation extraction is an important subtask of information extraction and has attracted attention from both academia and industry. As the place of origin of tea, China is rich in tea species resources. However, diseases and insect pests have hampered the healthy development of the tea industry to varying degrees. A large amount of unstructured and semi-structured text about tea diseases and insect pests is available online. Building a knowledge graph of tea diseases and insect pests from this information would support the development of intelligent agriculture. Relation extraction is one of the key technologies for constructing knowledge graphs and has been studied by many researchers. This thesis studies relation extraction for tea diseases and insect pests, as follows:

(1) Building a relation extraction dataset of tea diseases and insect pests containing 33,000 corpus entries. First, we crawled semi-structured information from Baidu Encyclopedia to obtain 1,310 tea-related entities and 2,451 original triples of tea diseases and insect pests, and determined 9 entity types and 9 relationship types. Then, we crawled 1.1 GB of free text from tea-related websites and filtered and cleaned it. Finally, we borrowed the idea of distant supervision to label the text, completing the annotation of the corpus.

(2) Building a relation extraction model of tea diseases and insect pests based on Chinese glyphs. Most relation extraction research targets English, but English is a phonetic writing system whereas Chinese is logographic. Mainstream relation extraction methods often ignore the pictographic nature of Chinese characters. Chinese glyphs carry information in both form and meaning, the tea diseases and insect pests dataset is in Chinese, and the entities in the dataset share similar character structures. We therefore propose a relation extraction model of tea diseases and insect pests based on Chinese glyphs. To obtain more accurate entity embeddings, we add Chinese glyph-level embeddings to the entity information, and we use three strategies to obtain the glyph embeddings: BERT, BERT_CNN, and BERT_BI-GRU. Experimental results on the self-built dataset and on public character relationship datasets show that integrating Chinese glyphs into the entity information effectively improves relation extraction performance.

(3) Building a joint entity-relation extraction model of tea diseases and insect pests based on multi-information fusion. The traditional pipeline model suffers from error propagation, entity redundancy, and a lack of information interaction, so it cannot be applied well in practical scenarios. Compared with the pipeline model, a joint model lets the subtasks interact with each other and makes full use of the information through unified modeling. Moreover, analysis of the self-built dataset shows that it contains overlapping relations: a sentence may contain multiple triples, and the triples may overlap with one another. To better suit practical use and to solve the relation overlap problem, we adopt a more advanced cascade-tagging joint extraction model, which first tags the candidate head entities in a sentence and then, for each specific relation, identifies and tags the corresponding tail entities. Current mainstream cascade-tagging joint extraction models neither consider the form and meaning of Chinese glyphs nor make full use of the tag information of the head entity. We therefore propose a joint entity-relation extraction model based on multi-information fusion, which integrates glyph information and tag information into the word embedding; the glyph information is processed in a way similar to that in (2). Experimental results show that the joint entity-relation extraction model based on multi-information fusion of Chinese glyph information and tag information achieves better relation extraction performance.
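The distant-supervision labeling idea mentioned in (1) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual labeling rules, so everything here is an assumption: the function name `label_sentences`, the record fields, the relation names, and the example triples and sentences are hypothetical and only show the general technique (a sentence that mentions both entities of a known triple is labeled with that triple's relation).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def label_sentences(sentences, triples):
    """Label sentences by distant supervision.

    Assumption: a sentence containing both the head and the tail entity
    of a known triple is taken to express that triple's relation.
    """
    labeled = []
    for sent in sentences:
        for t in triples:
            head_pos = sent.find(t.head)  # -1 if the entity is absent
            tail_pos = sent.find(t.tail)
            if head_pos != -1 and tail_pos != -1:
                labeled.append({
                    "text": sent,
                    "head": t.head,
                    "head_pos": head_pos,
                    "tail": t.tail,
                    "tail_pos": tail_pos,
                    "relation": t.relation,
                })
    return labeled


# Hypothetical knowledge-base triples and free-text sentences (illustrative only).
TRIPLES = [
    Triple("茶饼病", "危害部位", "嫩叶"),
    Triple("茶小绿叶蝉", "防治药剂", "吡虫啉"),
]
SENTENCES = [
    "茶饼病主要危害嫩叶和新梢。",
    "茶小绿叶蝉可用吡虫啉防治。",
    "茶树喜温暖湿润气候。",  # mentions no known entity pair, so it stays unlabeled
]

labeled = label_sentences(SENTENCES, TRIPLES)
for row in labeled:
    print(row["relation"], row["head"], row["tail"], row["text"])
```

Labels produced this way are noisy, because two entities appearing together does not guarantee that the sentence expresses their relation; this is why the crawled text would normally be filtered and cleaned before it is used for training.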
Keywords/Search Tags: relation extraction, tea diseases and insect pests, distant supervision, Chinese glyph, joint extraction