
Application of BERT in Chinese Company Name Recognition

Posted on: 2022-11-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Liu
GTID: 2518306749471854 | Subject: Enterprise Economy
Abstract/Summary:
Chinese company name recognition is an important subtask of named entity recognition (NER). It underpins automated analysis of a company's exposure, popularity, operating status, and similar information, and is therefore valuable for analyzing news and financial reports. The task nevertheless remains difficult. To address these difficulties, this thesis optimizes the traditional model and builds a text-matching model based on a thesaurus of company name features. The main research is as follows:

(1) To handle the lack of word separators in Chinese text and the additional meanings carried by Chinese company names, this thesis takes single Chinese characters as input, generates their vectors with both the BERT model and the traditional Word2vec model, and feeds those vectors into a BiLSTM-CRF model to make predictions. By making full use of the textual context, this model reduces the impact of word segmentation errors and polysemy and improves both recall and accuracy.

(2) To handle the varied forms of company names and the many contexts in which they appear, this thesis uses a fine-tuned RoBERTa model with the whole word masking (wwm) strategy to generate more accurate vectors that reflect the word-formation characteristics of Chinese. This addresses the difficulty of recognizing company names in complex contexts.

(3) Because company names take a wide variety of forms, this thesis evaluates the corpora themselves and proposes a corpus evaluation method based on two measures: the integrity of company names and the percentage of company name entities among all named entities. This lets the experiments simulate a variety of application scenarios and test the model's performance thoroughly.

The experiments use classified financial report data from Sina News from recent years and the Boson corpus. They compare word vector models such as Word2vec, BERT, and RoBERTa on these corpora and apply the wwm strategy to the BERT and RoBERTa pre-trained models. The results show that, compared with the commonly used Word2vec model, the BERT model identifies company name entities better by relating them to their context. Compared with BERT, the wwm strategy raises the F1 score by 1.75% with equal training time on the financial report data. On the Boson corpus, RoBERTa improves the F1 score by 3.11% and RoBERTa-wwm by 4.02%. These results support the effectiveness of the RoBERTa model and the wwm strategy.
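
The abstract reports F1 figures for a model that tags single Chinese characters, but it does not spell out the tagging scheme or the scoring rule. The following minimal Python sketch shows one common way such scores are computed, under assumptions that are not taken from the thesis: a character-level BIO scheme with an `ORG` label for company names, exact span matching, and F1 as the harmonic mean of precision and recall. The function names `extract_spans` and `entity_f1`, the example sentence, and the company names in it are all hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def extract_spans(tags: List[str], label: str = "ORG") -> Set[Span]:
    """Collect the spans of `label` from a character-level BIO tag sequence."""
    spans: Set[Span] = set()
    start = None  # start offset of the entity currently being read, if any
    for i, tag in enumerate(tags):
        continues = tag == f"I-{label}" and start is not None
        if not continues and start is not None:
            spans.add((start, i))  # the open entity ends just before position i
            start = None
        if tag == f"B-{label}":
            start = i  # a new entity begins here
    if start is not None:
        spans.add((start, len(tags)))  # entity runs to the end of the sentence
    return spans


def entity_f1(gold: List[str], pred: List[str], label: str = "ORG"):
    """Entity-level precision, recall and F1 with exact span matching (assumed)."""
    g, p = extract_spans(gold, label), extract_spans(pred, label)
    tp = len(g & p)  # predicted spans that match a gold span exactly
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up 11-character sentence, one tag per character: 新浪公司收购了博森科技
gold = ["B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "O", "O",
        "B-ORG", "I-ORG", "I-ORG", "I-ORG"]
# Hypothetical prediction that clips the first character of the second name
pred = ["B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "O", "O",
        "O", "B-ORG", "I-ORG", "I-ORG"]

print(entity_f1(gold, pred))
```

Under exact matching, the clipped second name counts as both a missed gold entity and a spurious prediction, so a single boundary error lowers precision and recall together. This is one reason the completeness of company names in a corpus can matter for the reported scores.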
Keywords/Search Tags:Chinese company name entity recognition, Artificial neural networks, BERT, BiLSTM-CRF, RoBERTa, wwm