Application Of Several Types Of Statistics And Deep Learning Methods In Enterprise Named Entity Recognition

Posted on:2022-06-30

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Zhu

Full Text:PDF

GTID:2518306737953429

Subject:Applied Statistics

Abstract/Summary:

Named entity recognition (NER) for enterprise names is a basic step in processing enterprise name text and an upstream task in natural language processing. Enterprise names are usually coined by combining words at the namer's discretion, so recognizing their entities differs from recognizing entities in ordinary Chinese sentences: the text is short, semantically rich, and contains parallel entities. Recognition accuracy directly affects the accuracy of business name retrieval, business information extraction, and business name matching. As a carrier of enterprise information, the enterprise name also plays a role in inter-enterprise cooperation and financial supervision. Against this background, this thesis studies named entity recognition for enterprise names. Existing approaches have two problems. First, traditional recognition techniques rely on manually built dictionaries and manual rules, which require a great deal of manpower and time and are inflexible. Second, when neural network models are used to recognize enterprise names, the semantics carried by each character vector are too limited.

To address these problems, this thesis uses a self-built corpus of real Chinese enterprise names. The YEDDA annotation tool is used to preprocess the corpus, including manual labeling and word segmentation. Given the structural characteristics of enterprise names (short text, rich semantic expression, and complex word composition), two experiments are designed: one based on traditional statistical methods and one based on deep learning methods. Based on an analysis of the existing literature, the hidden Markov model (HMM) and the conditional random field (CRF) were selected as typical statistical methods. Under the same conditions and on the same data set, the CRF model outperforms the HMM, and the comprehensive evaluation index, the F1 value, reaches 84.33%. A BERT-BLSTM-CRF model experiment is also designed. It uses the BERT pre-trained language model and a BLSTM neural network to extract the context information of each word and enrich its semantics, and then uses a CRF model to extract tag-sequence features. The model is compared with deep learning methods from the existing literature. The experimental results show that the model used in this thesis has certain advantages: its accuracy, recall, and F1 value reach 96.78%, 94.88%, and 95.22%, respectively, and its F1 value is more than 10% higher than that of the other models. This indicates that the model effectively improves the recognition rate of named entities in enterprise names.
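As a rough illustration of what a CRF layer contributes on top of per-token scores, the sketch below runs Viterbi decoding over a toy example. It is not the thesis's implementation: the BIO tag set, the emission scores (which in a BERT-BLSTM-CRF model would come from the BLSTM output), and the transition scores (which a CRF would learn) are all made-up assumptions chosen only to show how a transition penalty steers decoding away from an invalid tag sequence.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for one token sequence."""
    # best[t][tag] = score of the best path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag] = previous tag on that best path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to the start.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical tag set and scores, for illustration only.
tags = ["B", "I", "O"]
emissions = [
    {"B": 0.9, "I": 0.1, "O": 1.0},  # token 1: "O" scores slightly higher than "B"
    {"B": 0.2, "I": 1.0, "O": 0.3},  # token 2: "I" scores highest
]
transitions = {p: {c: 0.0 for c in tags} for p in tags}
transitions["O"]["I"] = -10.0  # "I" directly after "O" is invalid in BIO tagging

# Picking the best tag per token independently yields the invalid pair O, I.
greedy = [max(tags, key=lambda tag: e[tag]) for e in emissions]
decoded = viterbi_decode(emissions, transitions, tags)
print(greedy)   # ['O', 'I']
print(decoded)  # ['B', 'I']
```

Scoring whole tag sequences, instead of each token on its own, is what lets the decoder trade a slightly lower score on the first token for a valid entity span.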

Keywords/Search Tags:

Enterprise Name, Named Entity Recognition, BERT, Bi-directional Long Short-Term Memory, Conditional Random Field

Related items

1	Named Entity Recognition Based On BiLSTM-CRF
2	Research On Text Data Sequence Labeling
3	Named Entity Recognition For Communication Terminology
4	Named Entity Recognition In Medical Field
5	Address Entity Identification Based On Bidirectional LSTM And CRF Models
6	Research And Implementation Of Multi-field And Multi-range Entity Recognition Based On Bi-LSTM
7	Chinese Named Entity Recognition Based On Semantic Vectors Integration
8	Research Of Named Entity Recognition Based On Probabilistic Dependency
9	Research Of Chinese Named Entity Recognition Based On Deep Neural Network
10	Chinese Named Entity Recognition Based On Bidirectional LSTM-CRF Model