Research On Text Classification Algorithm Based On Multi-Channel Parallel Classifier

Posted on: 2024-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: B L Lu
GTID: 2568307064484674
Subject: Circuits and Systems
Abstract/Summary:
In recent years, with the rapid development of the Internet, the value of data has been increasingly recognized. As an important carrier of information, text occupies an increasingly important position among network resources. Text sentiment analysis has gradually become a research hotspot in natural language processing and artificial intelligence, and text classification algorithms have become a core research topic. Current machine-learning-based text classification methods need a large emotional corpus and suffer from low model efficiency, dimension explosion, and an inability to distinguish polysemous words. Single-classifier models based on deep learning are not accurate enough for text classification: they cannot simultaneously account for vocabulary weights and word frequency in the corpus, feature extraction capability, and the ability to extract semantic information from the context of the text. To address the low efficiency, dimension explosion, and insufficient feature extraction of traditional text classification methods, this thesis proposes a text classification method based on a multi-channel parallel classifier. By constructing a multi-channel fusion model, text sequence features are used for recognition and classification, achieving better results on sentiment classification problems. This thesis completes the following three aspects of research work:

1. Model structures based on a convolutional neural network (TextCNN), a long short-term memory network (LSTM), and a Transformer are constructed separately to explore the ability of a single-classifier model to classify text. At the same time, a data set is built by using crawler technology to extract text with rich emotional characteristics from various industries and fields. To make up for the shortcomings of a single-kernel classifier, and to give full play to the ability of the convolutional neural network kernel to extract local text features, of the LSTM kernel to extract contextual sequence features, and of the Transformer kernel to assign greater weights to important words, a multi-channel parallel classifier is built that embeds the separately constructed kernels, and comparative experiments are conducted against multiple single-kernel models. The models are validated on evaluation indicators such as accuracy, precision, recall, and F1-score; the ROC curve is drawn and the AUC value is measured. The experimental results show that the F1-score and AUC of the fusion model reach 0.8938 and 0.9592, both higher than those of the single-kernel models, and that the fusion model has stronger emotion classification ability.

2. TF-IDF weighting is introduced to address the text classification model's insufficient ability to extract word frequency. TF-IDF is a weighting factor commonly used in text information retrieval that reflects word frequency. In this thesis, TF-IDF weighting is first introduced into a bidirectional long short-term memory network (BiLSTM) text classification model. At the same time, the abilities of the BiLSTM and the LSTM to extract contextual semantic information are explored, and comparative experiments are carried out to verify the effect of TF-IDF weighting on text classification. The experimental results show that the BiLSTM model with TF-IDF weighting classifies better than the unweighted network model. Because it can extract reverse features, the BiLSTM network also classifies better than the LSTM.

3. On the basis of the above research, and to address the uneven distribution of information in the text corpus and insufficient semantic feature extraction in text classification, a multi-channel parallel text classifier model weighted by TF-IDF is constructed. The final model takes the convolutional neural network, the BiLSTM, and the Transformer as its core. It replaces the LSTM with a BiLSTM to add the extraction of reverse text features, and it introduces TF-IDF weighting into the improved model to further improve the ability to extract word frequency. The F1-score and AUC of the fusion model TBCNT reach 0.9011 and 0.9633. The experimental results show that the multi-channel parallel text classifier based on TF-IDF weighting constructed in this thesis classifies better than a single-classifier model.
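
As an illustration of the TF-IDF weighting referred to in the abstract, the following is a minimal sketch in Python. It assumes the textbook unsmoothed form, weight = (term count / document length) * log(N / document frequency). The abstract does not state which TF-IDF variant the thesis uses or how the weights are fed into the BiLSTM and fusion models, so the function name tf_idf and the toy corpus are hypothetical and serve only to show how the weights are computed.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed textbook form (not confirmed by the thesis):
    weight = (count / doc_length) * log(n_docs / doc_frequency)
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus of tokenized review snippets.
docs = [
    ["great", "phone", "great", "battery"],
    ["bad", "phone", "bad", "screen"],
    ["great", "screen"],
]
weights = tf_idf(docs)
for doc_weights in weights:
    print({term: round(w, 4) for term, w in doc_weights.items()})
```

In this form, a term that occurs in every document receives a weight of zero, while a term concentrated in few documents is weighted more heavily. In the first toy document, "battery" (found in one document) therefore outweighs "phone" (found in two) even though both occur once.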
Keywords: text classification, TextCNN, BiLSTM, self-attention mechanism, TF-IDF