Research On Chinese Named Entity Recognition Algorithm Based On Remote Semantics And LSTM-CRF

Posted on: 2022-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: M Yu
GTID: 2518306494470994
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of Internet technology, the amount of information and data is growing exponentially. How to accurately obtain useful information from massive text and transform unstructured data into structured data has become both a research hotspot and a difficult problem. Named entity recognition is one of the basic research topics in this area. It aims to identify and extract specific named entities from natural language text for subsequent natural language understanding and generation tasks. Traditional Chinese named entity recognition is built on Chinese word segmentation, so word segmentation errors propagate through the whole task. This error propagation degrades recognition performance, and such models also have difficulty fully extracting the latent semantic information in long sentence sequences. The emergence of deep learning provides a new tool for research on named entity recognition. Based on long short-term memory (LSTM), this thesis studies Chinese named entity recognition for long-distance text and optimizes it on data from the nuclear power field. The main contents and innovations of this thesis are as follows.

To address the poor performance of information extraction from long-distance text, this thesis proposes a named entity recognition algorithm based on an attention mechanism and lattice LSTM. The algorithm uses a lattice LSTM network to introduce word information, which effectively mitigates the propagation of word boundary errors caused by Chinese word segmentation errors. Because the chain structure of an LSTM network cannot make full use of the global information of a sentence sequence, and its feature extraction for long-distance text information weakens as the distance increases, this thesis introduces an attention mechanism into the lattice LSTM network to capture the internal correlation between word information and long-distance semantic information. In addition, the parameters of the model are optimized. Experimental results on several datasets show that the proposed algorithm improves recognition performance by 0.48%-1.11% compared with the reference algorithm.

Furthermore, this thesis studies named entity recognition in a specific domain and designs a named entity recognition algorithm for the nuclear power domain based on LSTM-CRF (CRF: conditional random field). Because corpus resources in the nuclear power field are scarce, a nuclear power dataset is constructed, including data cleaning and annotation of the nuclear power corpus. To address the severe nesting of named entities in the nuclear power dataset, this thesis designs a single-character candidate word path, which provides effective word information for LSTM network training and speeds up training. Compared with the reference algorithm, this algorithm improves the F1 value by 0.37%.
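The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind how a lattice LSTM introduces word information: every character subsequence that matches a lexicon entry becomes a candidate word, and each candidate is attached to the character position where it ends. The function names (`match_lexicon_words`, `group_by_end`), the `max_len` parameter, the example sentence, and the toy lexicon are all illustrative assumptions and are not taken from the thesis.

```python
from typing import Dict, List, Set, Tuple


def match_lexicon_words(
    sentence: str, lexicon: Set[str], max_len: int = 4
) -> List[Tuple[int, int, str]]:
    """Return (start, end, word) spans for lexicon words found in `sentence`.

    `end` is exclusive. Only multi-character candidates (length >= 2) are
    considered. `max_len` is a hypothetical cap on candidate word length.
    """
    spans: List[Tuple[int, int, str]] = []
    for start in range(len(sentence)):
        for length in range(2, max_len + 1):
            end = start + length
            if end > len(sentence):
                break  # no longer candidate can fit from this start
            word = sentence[start:end]
            if word in lexicon:
                spans.append((start, end, word))
    return spans


def group_by_end(spans: List[Tuple[int, int, str]]) -> Dict[int, List[str]]:
    """Group candidate words by the index of their last character.

    In a lattice-style model, the words ending at a character are the ones
    whose information can be merged into that character's state.
    """
    grouped: Dict[int, List[str]] = {}
    for _start, end, word in spans:
        grouped.setdefault(end - 1, []).append(word)
    return grouped


# Toy lexicon and sentence (assumptions for illustration only).
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
sentence = "南京市长江大桥"

spans = match_lexicon_words(sentence, lexicon)
words_ending_at = group_by_end(spans)
for index, char in enumerate(sentence):
    print(index, char, words_ending_at.get(index, []))
```

Because all overlapping candidates are kept (for example both "市长" and "长江"), the model is not forced to commit to one segmentation up front, which is why this kind of input is less sensitive to word segmentation errors than a pipeline that segments first.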
Keywords/Search Tags: Named entity recognition, remote semantics, lattice Long Short-Term Memory, attention mechanism, nuclear power field