Named Entity Recognition For Communication Terminology

Posted on: 2020-01-31  Degree: Master  Type: Thesis
Country: China  Candidate: C Zhang
GTID: 2428330575956749  Subject: Computer technology
Abstract/Summary:
Named entity recognition (NER) is a basic technology in natural language processing that provides important foundational information for other tasks. The field of communication studied in this thesis is characterized by highly specialized knowledge, abundant terminology and diverse subfields, but it lacks the high-quality entity lists and corpora that NER requires, which seriously restricts the development of NER in this field. Moreover, although NER has achieved good results in general domains, applying it to a field as specialized as communication often raises adaptability problems, and NER has rarely been studied in the communication field. With the rapid development of communication technology, accurate and efficient extraction of named entities from professional literature is the basis for deeper application of natural language processing in this field, and it also offers a useful reference for other professional fields.

This thesis focuses on Chinese named entity extraction in the communication field. It builds its data from the terminology defined by the China Communications Standardization Association in the "Communication Dictionary Retrieval System", supplemented by a corpus of communication literature abstracts crawled from CNKI. The thesis analyzes the characteristics of named entities in the communication field and puts forward the basic hypothesis that named entities in this field are nested. Based on this hypothesis, it focuses on lexical domain discrimination during dictionary construction and on character/word granularity adaptation during feature selection, proposing a lexical domain discrimination method based on the Latent Dirichlet Allocation (LDA) model and a communication feature extraction method based on the Conditional Random Field (CRF) model. Several communication-domain features are then selected. Based on the Long Short-Term Memory (LSTM) model, domain knowledge is mapped into input features through the pre-CRF layer, and a named entity recognition model built on communication terminology features is constructed. Finally, the designed features are added cumulatively and tested, and the recognition results are compared to verify the validity of the selected domain features and the applicability of the model to the communication field.

The results of this thesis have been applied to the construction of an enterprise's communication knowledge graph, where they provide basic technical support for the accurate extraction of ontology objects. They also offer a reference for NER research in other similar professional fields.
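The abstract states that domain knowledge is mapped into input features for the LSTM-CRF model but does not say how. The following is a minimal sketch of one common approach, not necessarily the thesis's own: each character of a Chinese sentence receives a B/I/O tag according to the longest domain-dictionary term that starts at it (forward maximum matching), so that under the nested-entity hypothesis the outermost term wins. The function name `dictionary_features`, the `B-TERM`/`I-TERM` tag names, the `max_len` limit and the example terms are all hypothetical.

```python
from typing import List, Set


def dictionary_features(sentence: str, terms: Set[str], max_len: int = 10) -> List[str]:
    """Hypothetical dictionary-match feature: tag each character B/I/O using
    forward maximum matching against a set of domain terms.

    When terms are nested (e.g. 光纤 "optical fiber" inside
    光纤通信 "optical fiber communication"), the longest match wins,
    so only the outermost span is tagged.
    """
    tags = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        match_len = 0
        # Try the longest candidate first so shorter nested terms are skipped.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + length] in terms:
                match_len = length
                break
        if match_len:
            tags[i] = "B-TERM"
            for j in range(i + 1, i + match_len):
                tags[j] = "I-TERM"
            i += match_len  # continue after the matched term
        else:
            i += 1  # no term starts here; character keeps tag "O"
    return tags


if __name__ == "__main__":
    # Hypothetical mini-dictionary: optical fiber, optical fiber communication, communication.
    lexicon = {"光纤", "光纤通信", "通信"}
    print(dictionary_features("光纤通信系统", lexicon))
```

In an LSTM-CRF tagger, such per-character tags would typically be embedded and concatenated with the character embeddings before the LSTM layer; the dictionary itself could be the filtered term list produced by a domain-discrimination step.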
Keywords/Search Tags: Named Entity Recognition, Named Entity, Communication Terminology, Latent Dirichlet Allocation, Conditional Random Field, Long Short-Term Memory