
Complex Chinese Named Entity Recognition In Finance

Posted on: 2021-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Zhu
Full Text: PDF
GTID: 2428330614970087
Subject: Software engineering
Abstract/Summary:
Named entity recognition (NER) is a basic task in natural language processing and plays a pivotal role in tasks such as information extraction, machine translation, and knowledge graph construction. It has also received widespread attention in the financial, biological, and pharmaceutical industries. In general, before a model can be trained, a large amount of text must be labeled manually to ensure a sufficiently rich sample, and the labeled data are then used to train the tagger. At present, most NER research targets short entities, and on well-annotated corpora fully supervised methods achieve high performance. Because labeling data is time-consuming and labor-intensive, however, in most cases only partially labeled data are available. When labeled data are insufficient, weakly supervised iterative learning is usually used to train the model gradually.

This thesis addresses two problems in financial text: complex entities and insufficient labeled data. Commonly used NER schemes cannot effectively identify complex entities under these conditions. We propose a weakly supervised method for recognizing complex named entities (CNEs) in the corpus. A CNE is commonly composed of a sequence of several smaller entities, which makes its boundaries difficult to determine. To improve recognition accuracy, the method separates the determination of contextual semantic relationships from the confirmation of entity boundaries. The specific work is as follows:

1) We propose a semantic model based on CNE masking. Before training, the CNEs in the corpus are masked, and the masked corpus is then used to train the semantic model with BiLSTM-CRF.
2) We propose a weakly supervised CNE boundary confirmation model based on sequential patterns. On a small-sample data set, a candidate set of target CNEs is found by a sliding window combined with sequence pattern matching, and the candidates are then screened and judged by the semantic model obtained in 1).
3) Complex entities in the text also affect the effectiveness of weakly supervised training to some extent. To address this, we propose an Optimized-Bootstrapping algorithm based on a sample similarity scoring mechanism, which improves the reliability of the incremental samples selected during weakly supervised iterative learning.

Financial-domain data are used as the experimental data set to compare currently popular NER models with the proposed scheme. The results show that, compared with directly applying the BiLSTM-CRF-based NER method, the proposed method greatly improves performance on small training sets and has a certain generalization ability.
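
The masking step in 1) and the sliding-window candidate search in 2) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names (`mask_cnes`, `candidate_spans`), the `[CNE]` mask token, the coarse tag names, and the example patterns are all assumptions made for illustration. The BiLSTM-CRF semantic model that screens the candidates is omitted.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)


def mask_cnes(tokens: List[str], spans: List[Span], mask: str = "[CNE]") -> List[str]:
    """Replace each labeled CNE span with a single mask token.

    Assumes the spans do not overlap. The "[CNE]" token is a hypothetical
    choice, not one taken from the thesis.
    """
    out: List[str] = []
    i = 0
    for start, end in sorted(spans):
        out.extend(tokens[i:start])  # copy the context before the CNE
        out.append(mask)             # collapse the whole CNE to one token
        i = end
    out.extend(tokens[i:])           # copy the remaining context
    return out


def candidate_spans(
    tags: List[str], patterns: List[Tuple[str, ...]], max_len: int
) -> List[Span]:
    """Slide windows of length 1..max_len over a coarse tag sequence and
    keep every window whose tag sequence exactly matches a known pattern."""
    known = set(patterns)
    found: List[Span] = []
    for start in range(len(tags)):
        last_end = min(start + max_len, len(tags))
        for end in range(start + 1, last_end + 1):
            if tuple(tags[start:end]) in known:
                found.append((start, end))
    return found


# Hypothetical example: tokens with coarse tags for their small entities.
tokens = ["ABC", "Bank", "Shanghai", "Branch", "issued", "bonds"]
tags = ["ORG", "ORG_SUFFIX", "LOC", "ORG_SUFFIX", "O", "O"]
patterns = [
    ("ORG", "ORG_SUFFIX"),
    ("ORG", "ORG_SUFFIX", "LOC", "ORG_SUFFIX"),
]

candidates = candidate_spans(tags, patterns, max_len=4)
masked = mask_cnes(tokens, [(0, 4)])
print(candidates)  # [(0, 2), (0, 4)]
print(masked)      # ['[CNE]', 'issued', 'bonds']
```

Both the short span `(0, 2)` and the long span `(0, 4)` are returned as candidates. Choosing between such nested candidates is the boundary decision that the abstract assigns to the semantic model trained on the masked corpus.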
Keywords/Search Tags:named entity recognition, weakly supervised learning, deep learning, pattern matching, high dimensional index