
Application Of Several Types Of Statistics And Deep Learning Methods In Enterprise Named Entity Recognition

Posted on: 2022-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Zhu
GTID: 2518306737953429
Subject: Applied Statistics
Abstract/Summary:
Named entity recognition (NER) of enterprise names is a basic step in processing enterprise-name text data and an upstream task in natural language processing. Enterprise names are usually formed by combining words according to personal preference, so recognizing their entities differs from NER on ordinary Chinese sentences: the names are short, semantically rich, and often contain parallel entities. Recognition accuracy directly affects the accuracy of business-name retrieval, business-information extraction, and business-name matching. As carriers of enterprise information, enterprise names also play a role in inter-enterprise cooperation and financial supervision. Against this background, this paper studies named entity recognition for enterprise names.

Existing approaches have two main problems. First, traditional recognition relies on manually built dictionaries and rules, which costs considerable labor and time and is inflexible. Second, when neural network models are used to recognize enterprise names, the semantic information carried by the character vectors is too limited. To address these problems, this paper builds its own corpus of real Chinese enterprise names and uses the YEDDA annotation tool to preprocess it, including manual labeling and word segmentation. Given the structural characteristics of enterprise names (short text, rich semantic expression, and complex word composition), two experiments are designed: one based on traditional statistical methods and one based on deep learning methods.

Based on an analysis of the existing literature, HMM and CRF were selected as typical statistical methods. Under the same conditions and on the same data set, the CRF model outperforms the HMM, reaching a comprehensive F1 score of 84.33%. In the deep learning experiment, a BERT-BLSTM-CRF model is designed: the BERT pre-trained language model and a BLSTM network extract the context information of each word to enrich its semantics, and a CRF layer then models the features of the tag sequence. Compared with the deep learning methods in the existing literature, the model used in this paper shows certain advantages: its accuracy, recall, and F1 score reach 96.78%, 94.88%, and 95.22%, respectively, and its F1 score is more than 10% higher than those of the other models. This indicates that the model effectively improves the recognition rate for named entities in enterprise names.
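The abstract names the HMM as one of the two statistical baselines but gives no implementation details. As an illustration only, the sketch below is a minimal first-order HMM character tagger, not the thesis's code: start, transition, and emission probabilities are estimated by counting with add-one smoothing, and the most likely tag sequence is found by Viterbi decoding. The BIO tag set (LOC for region, NAME for trade name, IND for industry, ORG for organization form) and the three toy enterprise names are hypothetical, not taken from the thesis corpus.

```python
import math
from collections import Counter, defaultdict


class HMMTagger:
    """First-order HMM tagger: count-based estimates with add-one smoothing, Viterbi decoding."""

    def fit(self, data):
        # data: list of (characters, tags) pairs of equal length
        self.tags = sorted({t for _, ts in data for t in ts})
        self.vocab_size = len({c for cs, _ in data for c in cs}) + 1  # +1 slot for unseen chars
        self.n_seqs = len(data)
        self.start = Counter(ts[0] for _, ts in data)
        self.trans = defaultdict(Counter)
        self.emit = defaultdict(Counter)
        for chars, ts in data:
            for prev, cur in zip(ts, ts[1:]):
                self.trans[prev][cur] += 1
            for c, t in zip(chars, ts):
                self.emit[t][c] += 1
        return self

    def _log_start(self, t):
        return math.log((self.start[t] + 1) / (self.n_seqs + len(self.tags)))

    def _log_trans(self, p, t):
        total = sum(self.trans[p].values())
        return math.log((self.trans[p][t] + 1) / (total + len(self.tags)))

    def _log_emit(self, t, c):
        total = sum(self.emit[t].values())
        return math.log((self.emit[t][c] + 1) / (total + self.vocab_size))

    def predict(self, chars):
        # best[t]: highest log-probability of any tag path ending in tag t at the current position
        best = {t: self._log_start(t) + self._log_emit(t, chars[0]) for t in self.tags}
        back = []
        for c in chars[1:]:
            ptr = {t: max(self.tags, key=lambda p: best[p] + self._log_trans(p, t)) for t in self.tags}
            best = {t: best[ptr[t]] + self._log_trans(ptr[t], t) + self._log_emit(t, c) for t in self.tags}
            back.append(ptr)
        # Follow the back-pointers from the best final tag
        path = [max(self.tags, key=lambda t: best[t])]
        for ptr in reversed(back):
            path.append(ptr[path[-1]])
        return path[::-1]


# Hypothetical tags: LOC = region, NAME = trade name, IND = industry, ORG = organization form
TAGS = "B-LOC I-LOC B-NAME I-NAME B-IND I-IND B-ORG I-ORG I-ORG I-ORG".split()
train = [
    ("北京华信科技有限公司", TAGS),
    ("上海华信贸易有限公司", TAGS),
    ("北京明远贸易有限公司", TAGS),
]
model = HMMTagger().fit(train)
print(model.predict("上海明远科技有限公司"))  # same sequence as TAGS
```

Because every character in this toy data appears under only one tag, the model correctly tags a name it never saw, built from parts of the training names. The HMM is generative and conditions each character only on its own tag, whereas a CRF is trained discriminatively on the whole tag sequence given the input.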
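In the BERT-BLSTM-CRF model, the CRF layer scores whole tag sequences rather than individual positions. The following is a minimal sketch of a linear-chain CRF layer, not the thesis's implementation. It assumes the per-position tag scores (emissions) come from the BLSTM and that a tag-to-tag transition matrix is learned. It omits the start/end transition scores that many implementations add, and its numbers are made up. The training loss for one sentence is the negative log-likelihood log Z(x) - score(x, y), where the forward algorithm computes log Z(x); decoding uses the Viterbi algorithm.

```python
import itertools
import math


def logsumexp(xs):
    # Numerically stable log(sum(exp(x) for x in xs))
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def path_score(emissions, transitions, tags):
    # emissions[i][y]: score of tag y at position i (assumed here to be the BLSTM output)
    # transitions[a][b]: learned score for tag a followed by tag b
    score = emissions[0][tags[0]]
    for i in range(1, len(tags)):
        score += transitions[tags[i - 1]][tags[i]] + emissions[i][tags[i]]
    return score


def log_partition(emissions, transitions):
    # Forward algorithm: alpha[y] = log-sum-exp of scores of all prefixes ending in tag y
    n = len(emissions[0])
    alpha = list(emissions[0])
    for em in emissions[1:]:
        alpha = [logsumexp([alpha[a] + transitions[a][y] for a in range(n)]) + em[y] for y in range(n)]
    return logsumexp(alpha)


def neg_log_likelihood(emissions, transitions, tags):
    # Training loss for one sentence: -log P(tags | x) = log Z(x) - score(x, tags)
    return log_partition(emissions, transitions) - path_score(emissions, transitions, tags)


def viterbi(emissions, transitions):
    # Same recursion as the forward algorithm, with max in place of log-sum-exp
    n = len(emissions[0])
    best, back = list(emissions[0]), []
    for em in emissions[1:]:
        ptr = [max(range(n), key=lambda a: best[a] + transitions[a][y]) for y in range(n)]
        best = [best[ptr[y]] + transitions[ptr[y]][y] + em[y] for y in range(n)]
        back.append(ptr)
    path = [max(range(n), key=lambda y: best[y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy scores for 3 positions x 3 tags (made-up numbers, not model outputs)
emissions = [[1.0, 0.5, -0.2], [0.3, 1.2, 0.0], [-0.5, 0.4, 0.9]]
transitions = [[0.2, -0.1, 0.0], [0.5, 0.3, -0.4], [-0.2, 0.1, 0.6]]

# Sanity check: the forward algorithm matches brute-force enumeration of all 3**3 paths
all_paths = list(itertools.product(range(3), repeat=3))
brute = logsumexp([path_score(emissions, transitions, p) for p in all_paths])
print(abs(log_partition(emissions, transitions) - brute) < 1e-9)  # True
print(viterbi(emissions, transitions))
```

The forward algorithm costs O(n·K²) for n positions and K tags, instead of the Kⁿ paths enumerated in the sanity check. In a real model, the emissions and transitions would be tensors trained by backpropagating this loss.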
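The abstract reports accuracy, recall, and F1 but does not say how they are computed. A common convention for NER, assumed here rather than confirmed by the thesis, is entity-level scoring: an entity counts as correct only if both its span and its type match the gold annotation exactly, with counts micro-averaged over all sentences. F1 is the harmonic mean 2PR / (P + R) of precision P and recall R. The tag sequences below are hypothetical.

```python
def extract_entities(tags):
    """Collect (type, start, end) spans from a BIO tag sequence; end is exclusive."""
    entities, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes an entity at the end
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if etype is not None:
                entities.add((etype, start, i))
            # A B- tag, or an I- tag with no matching open entity, starts a new span
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
    return entities


def entity_prf1(gold_seqs, pred_seqs):
    """Micro-averaged entity-level precision, recall and F1 (exact span and type match)."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: the prediction truncates the trade-name entity
gold = [["B-LOC", "I-LOC", "B-NAME", "I-NAME", "B-ORG", "I-ORG"]]
pred = [["B-LOC", "I-LOC", "B-NAME", "O", "B-ORG", "I-ORG"]]
print(entity_prf1(gold, pred))  # 2 of 3 entities match: P = R = F1 = 2/3
```

Under this exact-match convention, a prediction that gets only part of an entity right counts as both a false positive and a false negative. Token-level scoring would give it partial credit instead.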
Keywords/Search Tags: Enterprise Name, Named Entity Recognition, BERT, Bi-directional Long Short-Term Memory, Conditional Random Field