
Fine-Grained Entity And Relation Extraction

Posted on: 2021-04-02 | Degree: Master | Type: Thesis
Country: China | Candidate: B C Wang | Full Text: PDF
GTID: 2428330611499986 | Subject: Computer Science and Technology
Abstract/Summary:
Big Cilin is an automatically constructed, large-scale, open-domain Chinese entity knowledge base. It automatically obtains the categories of entities from multiple information sources in search engines and mines the hyponymy relations among those categories to build a hierarchical category system. At present, Big Cilin contains more than 10 million entities and more than 180,000 hypernyms. As the data scale grows, Big Cilin needs better solutions for mining synonymy and hyponymy relations in fine-grained scenarios. Against this background, this thesis studies three sub-problems.

1. Entity synonym mining. As the number of entities grows, the number of redundant entities inside the knowledge base grows accordingly. This thesis proposes a technique for fine-tuning pre-trained word embeddings based on Tongyi Cilin. Compared with using pre-trained word embeddings directly to judge whether entities are synonymous, this method makes full use of an external synonym knowledge base to pull synonyms closer together in the semantic space while pushing non-synonyms apart. As a result, it can determine more accurately whether two entity names refer to the same entity.

2. Concept path fusion of hypernyms. The entities and hypernyms in Big Cilin are obtained mainly through automatic mining, so some errors are inevitable. Many other existing knowledge graphs, by contrast, have manually constructed hypernym systems. This thesis uses the concept systems of Big Cilin and of other knowledge graphs to construct a concept path matching dataset, applies a variety of matching models to mine the alignment between concepts in different graphs, and uses that alignment to correct the hyponymy relations in Big Cilin.

3. Fine-grained entity typing. Big Cilin already has a good hyponymy relation extraction module, but as the number of hypernyms grows, richer information sources are needed to determine hyponymy relations. Building on existing fine-grained entity typing schemes, this thesis evaluates different models and enhancement strategies on two datasets. The knowledge learned from the Chinese dataset is then transferred to Big Cilin, where it serves as richer supporting information for the hyponymy relation mining module.
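The synonym-mining idea described in the abstract, pulling synonyms together and pushing non-synonyms apart in the embedding space, can be illustrated with a minimal sketch. The abstract does not state the actual loss function, model, or data format, so everything here is an assumption made for illustration: the function names `distance` and `fine_tune`, the squared-distance attraction term, the margin-based repulsion term, the hyperparameters, and the toy English words standing in for entries of a synonym lexicon such as Tongyi Cilin.

```python
import numpy as np


def distance(u, v):
    """Euclidean distance between two embedding vectors."""
    return float(np.linalg.norm(u - v))


def fine_tune(emb, synonym_pairs, non_synonym_pairs, lr=0.1, margin=1.0, steps=50):
    """Return adjusted copies of the vectors in `emb` (hypothetical helper).

    Synonym pairs are attracted by a gradient step on their squared distance.
    Non-synonym pairs are repelled, but only while they are closer than `margin`.
    """
    # Work on copies so the pre-trained vectors stay untouched.
    tuned = {word: vec.astype(float).copy() for word, vec in emb.items()}
    for _ in range(steps):
        for a, b in synonym_pairs:
            diff = tuned[a] - tuned[b]
            tuned[a] -= lr * diff  # move a toward b
            tuned[b] += lr * diff  # move b toward a
        for a, b in non_synonym_pairs:
            diff = tuned[a] - tuned[b]
            dist = np.linalg.norm(diff)
            if 0.0 < dist < margin:  # repel only pairs inside the margin
                step = lr * diff / dist
                tuned[a] += step
                tuned[b] -= step
    return tuned


# Toy 2-d "pre-trained" vectors; the words and values are made up.
pretrained = {
    "potato": np.array([1.0, 0.0]),
    "spud": np.array([0.0, 1.0]),
    "tomato": np.array([0.8, 0.3]),
}
tuned = fine_tune(
    pretrained,
    synonym_pairs=[("potato", "spud")],
    non_synonym_pairs=[("potato", "tomato")],
)
```

After tuning, `distance(tuned["potato"], tuned["spud"])` is smaller than before, while `distance(tuned["potato"], tuned["tomato"])` is larger. A threshold on that distance could then decide whether two entity names denote the same entity. In a realistic setting the pairs would come from the synonym lexicon and the update would normally be performed by a training framework rather than hand-written steps.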
Keywords/Search Tags: synonym mining, concept path fusion, entity typing