Research And Application Of Parallelization Of Text Classification Based On Improved Convolutional Neural Network

Posted on: 2022-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Ma
GTID: 2518306575966459
Subject: Computer technology

Abstract/Summary:
In the Internet era, almost all data is generated and stored online, which poses a great challenge for text classification. Compared with traditional machine learning algorithms, deep learning models are better suited to large-scale data processing, so deep learning has gradually become an important research direction in text classification. Convolutional neural networks (CNNs) play an important role in this field because of their strong feature extraction performance. However, handling the variety of text classification problems usually requires more complex model variants, which brings challenges of its own. The main challenges for CNNs in text classification are: 1) how to improve the quality of the features extracted by the CNN, and thereby the classification performance; 2) how to reduce the heavy computation and long running time of a structurally complex CNN on large-scale data, and thereby improve classification efficiency; 3) how to deal effectively with data imbalance in text classification. In view of these problems, this thesis studies how to improve the efficiency and accuracy of CNN-based text classification on large-scale data. The main research contents are as follows:

1. A CNN is computationally expensive, focuses on local features, and ignores global connections. To obtain deep multi-scale features, a CNN must stack multiple layers of convolution kernels of different sizes, which increases both the number of parameters and the amount of computation. This thesis proposes a CNN-SVM model based on an improved Inception structure. The model changes how the network grows in depth by combining depthwise separable convolution and dilated convolution with dense submatrices, so that it obtains deep multi-scale features while reducing the computational load. A multi-head self-attention mechanism is also adopted to capture long-distance global connections, which alleviates the CNN's weakness in modeling global connections.

2. In text classification on large-scale data, processing large amounts of data with complex models takes a long time and consumes a large amount of computing resources. To solve this problem, a parallel version of the CNN-SVM model based on improved Inception is designed. Data parallelism and a global parameter update strategy are adopted to further reduce the computational burden of the algorithm. The algorithm is implemented on the Spark computing framework, which makes it better suited to processing large-scale data.

3. Data imbalance often exists in real data and seriously degrades classification performance, and most biomedical texts are severely imbalanced. To alleviate this problem, this thesis uses focal loss as the loss function of the model when classifying biomedical texts.

The experimental results show that the proposed model achieves good classification performance on large-scale text classification and improves classification efficiency.
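The computational saving claimed in item 1 can be illustrated with simple parameter arithmetic. The sketch below is not taken from the thesis: the channel counts, kernel width, and dilation rate are hypothetical values chosen for illustration, and the function names are made up for this example. It compares the weight count of a standard 1D convolution with a depthwise separable one, and shows how dilation widens the receptive field without adding weights.

```python
def standard_conv1d_params(kernel_size, in_channels, out_channels):
    """Weights in a standard 1D convolution (bias omitted)."""
    return kernel_size * in_channels * out_channels


def separable_conv1d_params(kernel_size, in_channels, out_channels):
    """Depthwise step (one filter per input channel) plus a 1x1 pointwise step."""
    depthwise = kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


def dilated_receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Hypothetical sizes: 128 input channels, 128 output channels, kernel width 5.
print(standard_conv1d_params(5, 128, 128))   # 81920
print(separable_conv1d_params(5, 128, 128))  # 17024
print(dilated_receptive_field(5, 2))         # 9
```

With these assumed sizes the separable layer needs roughly a fifth of the weights, and a width-5 kernel with dilation 2 covers 9 positions using the same 5 weights per channel.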
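The abstract does not specify the global parameter update strategy used in item 2. As a rough sketch under that caveat, the example below assumes synchronous gradient averaging, one common form of data parallelism: each worker computes a gradient on its own data shard, and a driver averages the gradients and updates a single global copy of the parameters. Plain Python lists stand in for Spark partitions, a two-parameter linear model stands in for the network, and all names are hypothetical.

```python
def gradient(w, b, shard):
    """Mean-squared-error gradient of y = w*x + b over one data shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb


def parallel_step(w, b, shards, lr):
    """One synchronous step: local gradients per shard, then a global update."""
    grads = [gradient(w, b, shard) for shard in shards]  # would run on workers
    gw = sum(g[0] for g in grads) / len(grads)  # driver averages the gradients
    gb = sum(g[1] for g in grads) / len(grads)
    return w - lr * gw, b - lr * gb


# Toy data drawn from y = 2x + 1, split into two equal shards.
data = [(float(x), 2.0 * x + 1.0) for x in range(8)]
shards = [data[:4], data[4:]]

w, b = 0.0, 0.0
for _ in range(2000):
    w, b = parallel_step(w, b, shards, lr=0.01)
print(round(w, 3), round(b, 3))
```

Because the shards are the same size, the averaged gradient equals the full-batch gradient, so the parallel update follows the same trajectory as single-machine training while each worker touches only its own shard.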
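The loss function named in item 3 is commonly known as focal loss, FL(p) = -alpha * (1 - p)^gamma * log(p), where p is the predicted probability of the true class. The abstract gives no hyperparameters, so the gamma and alpha values below are assumptions chosen for illustration. The sketch shows how the (1 - p)^gamma factor shrinks the loss of easy, well-classified examples far more than that of hard ones, which is what shifts training toward rare or difficult classes.

```python
import math


def cross_entropy(p_true):
    """Standard cross-entropy for one example, given the true-class probability."""
    return -math.log(p_true)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one example; gamma=0 and alpha=1 reduce it to cross-entropy."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


# An easy example (true class predicted at 0.95) and a hard one (0.10).
for p in (0.95, 0.10):
    ratio = focal_loss(p) / cross_entropy(p)
    print(f"p={p:.2f}  CE={cross_entropy(p):.4f}  FL={focal_loss(p):.4f}  FL/CE={ratio:.4f}")
```

With gamma = 2, the easy example keeps 0.25% of its cross-entropy loss and the hard example keeps 81%, so the majority of the gradient signal comes from examples the model still gets wrong.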
Keywords/Search Tags:Text classification, CNN, Attention mechanism, Parallelization, Biomedical