Research On Named Entity Recognition For Science And Technology Terms Based On Dependent Entity Word Vector

Posted on: 2021-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lan
Full Text: PDF
GTID: 2518306569494994
Subject: Information and Communication Engineering
Abstract/Summary:
With the advent of the information age, the amount of information on the network has grown dramatically, and it is extremely difficult for users to locate the parts they need. How to use computers to help users filter out noisy data and mine useful information has become a research hotspot. Named entity recognition is a basic task of natural language processing that aims to automatically extract named entities from natural language text, laying the foundation for more advanced downstream tasks. Scientific and technical personnel often need to find information in a large body of literature, but research on Chinese named entity recognition is currently concentrated mostly in the general domain, that is, news text. There is little research on named entity recognition in the science and technology domain. To identify valid entities or terms in science and technology texts, this thesis proposes a method for generating dependent entity word vectors, builds an entity recognition model for the science and technology domain, and verifies the approach experimentally.

Because technical terms in science and technology texts are mostly compound words, mutual information, TF-IDF and other methods are used to construct a dictionary of scientific and technical terms. The dictionary ensures that terms are kept as complete as possible when the corpus is segmented, in preparation for the subsequent word vector training. In addition, because science and technology texts have clear, concise sentence structure and unambiguous reference, dependency features between words in sentences are introduced into word vector training, and a generation model for dependent entity word vectors is constructed. Word vector training methods are compared at the level of semantic similarity, which verifies the effect of dependency features on the semantics of word vectors. The trained dependent entity vectors are then used as the pre-trained word vectors for the subsequent entity recognition model.

Current mainstream Chinese named entity recognition models use only character vectors for sequence modeling and therefore lose word and word-sequence information. To address this, a lattice input layer for entity recognition in the science and technology domain is established, in which character vectors and word vectors are concatenated. The layer uses information at both character and word granularity, which makes it possible to use the dependent entity word vectors. To exploit the global information of the sentence, a BiLSTM-Attention sequence modeling layer is established.

Experiments were designed to study the effects of the lattice input layer, the dependent word vectors and the Attention mechanism on entity recognition. The results show that the dependent entity word vectors significantly improve entity recognition performance in the science and technology domain, and that the Attention mechanism improves performance in both the general domain and the science and technology domain. In summary, this thesis studies entity recognition methods based on the characteristics of the science and technology domain and improves entity recognition performance over existing methods.
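The abstract does not give the formulas used to build the term dictionary, so the following is only a minimal sketch of one common way mutual information can flag candidate compound terms: scoring adjacent token pairs by pointwise mutual information (PMI). The function name `pmi_bigrams`, the `min_count` threshold, and the toy English corpus (a stand-in for segmented Chinese text) are all assumptions for illustration, not details taken from the thesis.

```python
import math
from collections import Counter


def pmi_bigrams(sentences, min_count=2):
    """Score adjacent token pairs by pointwise mutual information (PMI).

    Hypothetical helper: `sentences` is a list of token lists, and
    `min_count` is an assumed frequency cutoff that drops rare pairs.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue
        p_ab = count / n_bi          # probability of the pair
        p_a = unigrams[a] / n_uni    # probability of the left token
        p_b = unigrams[b] / n_uni    # probability of the right token
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Toy corpus (assumed data): "neural network" is the only repeated pair.
corpus = [
    ["neural", "network", "model"],
    ["deep", "neural", "network"],
    ["the", "model", "is", "deep"],
    ["the", "network", "is", "large"],
]
scores = pmi_bigrams(corpus)
```

Pairs with a high PMI co-occur more often than their individual frequencies would predict, which makes them candidates for a term dictionary. A real pipeline would presumably combine such a score with TF-IDF and other filters, as the abstract mentions.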
Keywords/Search Tags: Chinese named entity recognition, dependency analysis, word vector pre-training, Attention mechanism