
Research Of Domain-Oriented Chinese Entity Linking Technology

Posted on: 2020-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: H Wu
Full Text: PDF
GTID: 2428330596971774
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of Internet technology, an increasing amount of textual information is presented in natural language. How to extract valuable information from massive text resources and make full use of it is an important research topic for intelligent Natural Language Processing (NLP). As an important task in text mining, entity linking identifies the mentions of entities in a text and links each mention to the corresponding named entity. Research on this technology for English is relatively mature. However, the research foundation for Chinese entity linking is relatively weak because of the lack of related datasets and knowledge bases, the limitations of word segmentation technology, and the standardization of Chinese character writing, and this is especially true for domain-oriented Chinese entity linking. Solving the problem of data sparseness is therefore an important part of research on this task. In this thesis, we study entity linking technology oriented to the Party building domain. Specifically, we first build an exogenous knowledge base, a synonym dictionary, an ambiguity dictionary, and a Party building entity classification dictionary from offline Chinese Wikipedia, in order to provide data support for the named entity recognition and entity linking tasks. For named entity recognition, the proposed TrBiLSTM-CRF model computes the correlation between data and uses it to select source-domain data that expand the target-domain data. This transfers knowledge from the source domain to the target domain and addresses the lack of data in domain-oriented Chinese named entity recognition. For entity linking, we first select candidate entities by jointly using the text and the dictionaries, and then combine multiple semantic similarity measures to analyze the semantic similarity between entity mentions and candidate named entities, from which we construct a graph model of semantic relations. The final entity linking results are then obtained with the PageRank algorithm. For the experiments, we construct a Party building entity linking dataset that covers events, people, the history of the Party, and red revolution sites. Named entity recognition experiments are carried out on this dataset first. The results show that the proposed TrBiLSTM-CRF model performs significantly better than the baseline system and the other methods compared, which indicates that the data transfer method proposed in this thesis can effectively improve domain-oriented named entity recognition. Based on the recognition results, we then carry out the entity linking experiments. The results show that Chinese entity linking based on the graph model of semantic relations achieves satisfactory results for each type of entity in the specific domain.
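The abstract does not give the details of the graph construction, so the following is only a minimal sketch of the general idea of ranking candidate entities with PageRank over a similarity-weighted graph. The node names, the edge weights, and the helper functions `pagerank` and `link_mentions` are hypothetical and are not taken from the thesis; the thesis combines several semantic similarity measures, whereas the weights here are simply made-up numbers.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Weighted PageRank on an undirected graph given as {(u, v): weight}.

    Assumption: weights are similarity scores; non-positive weights are skipped.
    """
    neighbors = defaultdict(dict)
    for (u, v), w in edges.items():
        if w > 0:
            neighbors[u][v] = w
            neighbors[v][u] = w
    nodes = list(neighbors)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    # Total edge weight per node, used to normalise outgoing transitions.
    strength = {node: sum(neighbors[node].values()) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            incoming = sum(
                rank[nb] * w / strength[nb] for nb, w in neighbors[node].items()
            )
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


def link_mentions(candidates, edges):
    """For each mention, pick the candidate entity with the highest rank."""
    rank = pagerank(edges)
    return {
        mention: max(cands, key=lambda c: rank.get(c, 0.0))
        for mention, cands in candidates.items()
    }


# Hypothetical toy data: two mentions, each with two candidate entities.
candidates = {
    "m:Zunyi": ["e:Zunyi Conference", "e:Zunyi (city)"],
    "m:Long March": ["e:Long March", "e:Long March (rocket)"],
}
edges = {
    # mention-candidate similarity (made-up scores)
    ("m:Zunyi", "e:Zunyi Conference"): 0.6,
    ("m:Zunyi", "e:Zunyi (city)"): 0.5,
    ("m:Long March", "e:Long March"): 0.7,
    ("m:Long March", "e:Long March (rocket)"): 0.4,
    # candidate-candidate semantic relatedness (made-up score)
    ("e:Zunyi Conference", "e:Long March"): 0.9,
}

result = link_mentions(candidates, edges)
print(result)
```

In this toy graph the two historically related candidates reinforce each other through their shared edge, so they outrank the unrelated senses. This is the intuition behind using a graph of semantic relations rather than scoring each mention in isolation.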
Keywords/Search Tags:Chinese Entity Linking Technology, Chinese Named Entity Recognition, Transfer Learning, Deep Learning, the Graph Model of Semantic Relations