A Research Toward Chinese Named Entity Recognition Based On Transfer Learning

Posted on: 2022-01-04, Degree: Master, Type: Thesis
Country: China, Candidate: X Zhao, Full Text: PDF
GTID: 2518306332957949, Subject: Software engineering
Abstract/Summary:
In the information age of the Internet, a very large share of information is stored in structured and unstructured form and used for language processing. Before neural networks were widely used for natural language processing tasks, researchers in named entity recognition usually focused on lexical and syntactic knowledge to improve the performance of their methods. As low-resource named entity recognition tasks have become mainstream, transfer learning has become a popular research direction for coping with them. Cross-domain transfer learning transfers knowledge from a high-resource domain to a low-resource domain to compensate for the lack of data, and it is effective for resource-poor named entity recognition tasks. To improve the performance of a neural network-based named entity recognition model when well-annotated entity data is scarce, this thesis proposes a Chinese named entity recognition model based on transfer learning. The specific work is as follows.

First, a data transfer method based on entity features is proposed in the third chapter. A pre-trained BERT model is used to generate word vectors. By calculating the similarity of the feature distributions of the low-resource data and the high-resource data, the most representative entity features are selected for feature transfer mapping, and the distance between the entity distributions of the two domains is calculated to close the gap between the data of the two domains. The neural network model is then trained on the high-resource data.

Next, an entity boundary detection method based on vocabulary information is proposed in the fourth chapter. This method uses BiLSTM+CRF as the main structure of the model and integrates character boundary information to assist the attention network, which improves the model's ability to recognize entity boundaries and thereby further improves its overall entity recognition performance. The test results show that, compared with the ordinary model, adding the boundary detection method improves the F1 value by 3%, and the accuracy and recall by 2% and 3% respectively.

Finally, several named entity recognition methods based on transfer learning are selected as baselines for comparison, and experiments are conducted on several datasets from different domains. The results show that, in low-resource domains, the proposed model improves named entity recognition accuracy by 1%, recall by 2%, and the F1 value by 2% on average.
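The reported gains are stated in terms of accuracy, recall, and F1. As a minimal sketch of how such scores are commonly computed for named entity recognition, the code below evaluates predictions at the entity level over BIO tags. It assumes that the "accuracy rate" reported alongside recall and F1 corresponds to precision, and that an entity counts as correct only when both its boundaries and its type match exactly; the abstract does not state the evaluation protocol, and the function names `bio_to_spans` and `prf1` are hypothetical.

```python
from typing import List, Set, Tuple

# (start index, exclusive end index, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def prf1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 with exact span matching."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative tags only (not data from the thesis): the predicted LOC
# entity stops one character early, so it does not count as a match.
gold = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
pred = ["B-PER", "I-PER", "O", "B-LOC", "O"]
precision, recall, f1 = prf1(gold, pred)
```

Under exact matching, a single boundary error costs both a false positive and a false negative, which is one reason better boundary detection can raise all three scores at once.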
Keywords/Search Tags: Named entity recognition, Transfer learning, LSTM, CRF