
Research On Chinese Text Multi-label Classification Based On Deep Learning

Posted on: 2021-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Jiang
GTID: 2428330614454983
Subject: Computer technology
Abstract/Summary:
With the rapid development and spread of the Internet and computer technology, text information online has grown explosively, and information overload has become serious. Managing text content efficiently, locating and filtering text information accurately, and processing text data in real time all depend on text classification technology. Multi-label text classification based on deep learning assigns tags to text content automatically, so that text information can be used and managed effectively.

This study addresses the multi-label text classification task and first collects tagged text data to support the experiments. The data comes from the Wukong Q&A and Baidu Zhidao (Baidu Knows) websites. Because users on these sites ask questions and assign labels according to their own needs, the data is diverse and noisy. To ensure that the data obtained by web crawling is usable, it is first cleaned, including sensitive word filtering, length-to-national, zero-width character filtering, meaningless text filtering, and semantic integrity judgment. The cleaned data is then segmented with the NiuTrans word segmentation tool. Finally, the word2vec tool converts the segmented text into word vectors that can be fed to the model for training.

TextRNN and TextCNN, the main model architectures for multi-label text classification, each have advantages and limitations. Because the TextRNN model uses a BiLSTM structure, the output at each time step depends on the output of the previous time step, so it cannot be computed in parallel and the model runs slowly overall. TextCNN relies mainly on filter windows to extract features, which limits its ability to model long-distance dependencies and makes it insensitive to word order. Its sliding window size is also hard to choose: too small a window easily loses important information, while too large a window produces a huge parameter space. This paper therefore proposes a TextRCNN multi-label text classification method based on an attention mechanism. The method combines the strength of recurrent neural networks in processing sequence data with the strength of convolutional neural networks in extracting local features, and introduces an attention mechanism so that the model focuses on the features that contribute most to the classification result. Experimental results show that this model gives the best classification performance, with an F value of 0.9612.
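As a rough illustration of the cleaning stage, the sketch below filters zero-width characters and drops "meaningless" samples. It is not the thesis's actual pipeline: the function names, the `MIN_CHARS` threshold, and the rule that a sample must contain at least one letter or CJK character are all assumptions made for the example, and the semantic integrity check is omitted.

```python
import re
import unicodedata

# Zero-width and byte-order-mark code points that often survive web scraping.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")

# Hypothetical minimum length; the source does not state its thresholds.
MIN_CHARS = 5


def clean_text(text: str) -> str:
    """Strip zero-width characters and collapse runs of whitespace."""
    text = ZERO_WIDTH.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def is_meaningful(text: str) -> bool:
    """Crude stand-in for meaningless-text filtering (an assumption):
    require a minimum length and at least one letter or CJK character."""
    if len(text) < MIN_CHARS:
        return False
    return any(unicodedata.category(ch).startswith("L") for ch in text)


def clean_corpus(samples, sensitive_words=()):
    """Yield (text, labels) pairs that pass every filter."""
    for text, labels in samples:
        text = clean_text(text)
        if not is_meaningful(text):
            continue
        if any(word in text for word in sensitive_words):
            continue
        yield text, labels
```

Each filter is kept as a separate function so that rules can be added or tightened without touching the others. Word segmentation and word2vec training would follow on the cleaned text.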
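The attention and multi-label output steps can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the thesis's model: the abstract does not specify the attention form, so dot-product scoring against a context vector is assumed here, and `attention_pool`, `predict_labels`, and the 0.5 threshold are hypothetical names and values. In a real model the hidden states would come from the recurrent and convolutional layers, and the vectors and weights would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(hidden_states, query):
    """Weight each time step by softmax(h_t . query) and sum the result.

    hidden_states: list of T feature vectors, one per time step.
    query: context vector (learned in practice, supplied by the caller here).
    """
    weights = softmax([dot(h, query) for h in hidden_states])
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[i] for w, h in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return pooled, weights


def predict_labels(pooled, label_weights, label_biases, threshold=0.5):
    """Score each label with an independent sigmoid and keep every label
    whose probability reaches the threshold."""
    probs = [
        1.0 / (1.0 + math.exp(-(dot(w, pooled) + b)))
        for w, b in zip(label_weights, label_biases)
    ]
    chosen = [i for i, p in enumerate(probs) if p >= threshold]
    return chosen, probs
```

Using one sigmoid per label rather than a single softmax over labels is what allows a text to receive several tags at once, which is the defining property of the multi-label setting.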
Keywords/Search Tags:Multi-label text classification, Deep Learning, TextRNN, TextCNN, Attention mechanism, TextRCNN