
Research On The Classification Method Of Enterprise National Economy Industry Based On BERT Model

Posted on: 2024-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yu
GTID: 2568307100462314
Subject: Computer technology
Abstract/Summary:
As support policies for small and medium-sized enterprises (SMEs) have expanded, the number of SMEs has grown year by year, which poses considerable challenges for the country's economic management. Classifying enterprises into national economy industries helps governments and regulators better understand the industries and characteristics of enterprises, so that they can manage SMEs more effectively and promote the healthy development of the national economy. This thesis applies deep learning techniques to classifying enterprise information text into national economy industries and examines the problems these techniques face in this task: single neural network architectures achieve limited accuracy, all words receive the same level of attention, and polysemous words (words with multiple meanings) cannot be handled. To address these problems, this thesis makes the following contributions to improve on existing deep learning approaches:

(1) To address the shortage of data for industry classification tasks, this thesis builds an Enterprise Information Text (EIT) dataset. Basic enterprise information, business scope descriptions, and similar data are collected with web crawlers and then preprocessed. The dataset covers SMEs of different sizes in different fields and serves as the baseline dataset for this thesis and for model training.

(2) To improve the efficiency and accuracy of enterprise information text classification, this thesis proposes a BERT-based Convolutional Bi-directional Long Short-Term Memory network model (BERT-CBL). The model uses BERT to produce word vectors and a CNN-BiLSTM network to extract features from enterprise information text. This addresses word-vector polysemy, improves the expressiveness of the text features, and lets the model capture local information in the text while retaining contextual information, which gives it an advantage when processing long text. In experiments, BERT-CBL achieves strong results on this task, with an accuracy of 86.89%.

(3) To address feature sparsity in enterprise information text, this thesis optimizes BERT-CBL and proposes a BERT-based dual-feature fusion model (BERT-DFF). The model uses a bi-directional gated recurrent unit (BiGRU) built on a multi-head self-attention mechanism to extract global semantic features of enterprise information text, and a multi-scale convolutional neural network to extract local information at different granularities, yielding local key features. With this design, the model achieves a 2.31% performance improvement in experiments, significantly outperforming other advanced baseline methods and meeting the needs of practical industry classification.

In summary, this thesis constructs an EIT dataset for SMEs and proposes a model for national economy industry classification. Experiments show that the model achieves good classification results and can effectively save time and labor costs.
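Both proposed models extract local features by running convolutions over the token vectors (a CNN in BERT-CBL, a multi-scale CNN in BERT-DFF). The following is a minimal pure-Python sketch of the multi-scale idea. It assumes the common TextCNN layout: several filter widths, a ReLU activation, and max-over-time pooling, with the pooled outputs concatenated into one local-feature vector. The abstract does not give these details, so the filter widths, filter counts, function names, and the random vectors standing in for BERT token embeddings are all hypothetical. A practical implementation would use a deep learning framework rather than nested lists.

```python
import random
from typing import Dict, List

# A sequence of token vectors: one row per token, one column per embedding dimension.
Matrix = List[List[float]]


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def conv1d_max_pool(tokens: Matrix, kernel: Matrix, bias: float = 0.0) -> float:
    """Slide one filter (len(kernel) tokens wide) over the sequence, apply ReLU,
    and keep the largest activation (max-over-time pooling, an assumed design)."""
    width = len(kernel)
    if len(tokens) < width:
        raise ValueError("sequence is shorter than the filter width")
    best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe starting maximum
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        activation = bias + sum(
            w * x
            for kernel_row, token_row in zip(kernel, window)
            for w, x in zip(kernel_row, token_row)
        )
        best = max(best, relu(activation))
    return best


def multi_scale_cnn(tokens: Matrix, filters_by_width: Dict[int, List[Matrix]]) -> List[float]:
    """Apply every filter of every width and concatenate the pooled outputs
    into one local-feature vector (hypothetical helper)."""
    return [
        conv1d_max_pool(tokens, kernel)
        for width in sorted(filters_by_width)
        for kernel in filters_by_width[width]
    ]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(42)
    embed_dim, seq_len = 8, 10
    # Random vectors stand in for BERT token embeddings of one enterprise description.
    tokens = random_matrix(seq_len, embed_dim, rng)
    # Hypothetical configuration: two filters for each of the widths 2, 3 and 4.
    filters = {w: [random_matrix(w, embed_dim, rng) for _ in range(2)] for w in (2, 3, 4)}
    local_features = multi_scale_cnn(tokens, filters)
    print(len(local_features), local_features)
```

Each filter width looks at a different n-gram span of the text, which is one way to read "different fine-grained local information"; max pooling keeps only the strongest match per filter, so the output length depends on the number of filters rather than on the text length.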
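BERT-DFF combines a BiGRU with multi-head self-attention to obtain global semantic features and then fuses them with local features. The pure-Python sketch below shows only the attention and fusion steps. It assumes standard scaled dot-product attention split across heads (as in the Transformer), mean pooling over tokens to obtain one global vector, and fusion by simple concatenation. The abstract specifies none of these details, and the BiGRU itself is omitted: random vectors stand in for its hidden states. The function names, head count, and dimensions are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # one row per token


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: (n x m) @ (m x p) -> (n x p)."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)


def multi_head_self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix,
                              w_o: Matrix, num_heads: int) -> Matrix:
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)  # this head's slice of the projections
        heads.append(scaled_dot_product_attention(
            [row[cols] for row in q], [row[cols] for row in k], [row[cols] for row in v]))
    # Concatenate the heads back to d_model columns, then apply the output projection.
    concat = [[val for head in heads for val in head[i]] for i in range(len(x))]
    return matmul(concat, w_o)


def mean_pool(m: Matrix) -> List[float]:
    """Average over tokens to get one global feature vector (assumed pooling)."""
    return [sum(col) / len(m) for col in zip(*m)]


def fuse(global_features: List[float], local_features: List[float]) -> List[float]:
    """Dual-feature fusion by concatenation (an assumption, not the thesis's stated operator)."""
    return global_features + local_features


if __name__ == "__main__":
    rng = random.Random(7)
    seq_len, d_model, num_heads = 6, 8, 2

    def rand(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    hidden = rand(seq_len, d_model)  # stand-in for BiGRU hidden states
    attended = multi_head_self_attention(
        hidden, rand(d_model, d_model), rand(d_model, d_model),
        rand(d_model, d_model), rand(d_model, d_model), num_heads)
    global_vec = mean_pool(attended)
    local_vec = [0.3, 0.0, 1.2]  # placeholder for a multi-scale CNN output
    fused = fuse(global_vec, local_vec)
    print(len(fused))  # 8 global + 3 local = 11
```

Attention lets each token weight the other tokens differently instead of treating all words alike, which matches the "same level of attention to all words" problem named in the abstract. Concatenation is the simplest fusion choice; gated or attention-weighted fusion would be plausible alternatives.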
Keywords/Search Tags: Enterprise Information, Text Classification, BERT, Neural Network, Multi-head Self-attention Mechanism