
Multi-label Text Classification Based On Neural Network

Posted on: 2021-02-24; Degree: Master; Type: Thesis
Country: China; Candidate: X X Xu; Full Text: PDF
GTID: 2428330623967957; Subject: Mathematics
Abstract/Summary:
Since 2013, deep learning based on neural networks has made significant progress. It has been widely applied in image processing and natural language processing, and has spawned many research and application directions. Text classification is one of the most important tasks in natural language processing, with many real-world applications such as public opinion monitoring, label recommendation, and information search. Traditional single-label text classification struggles to handle the diversity of text in real-life scenarios, so multi-label text classification has become a popular research direction within text classification.

This thesis builds on existing text classification research and uses the TextCNN neural network as the classification framework. It proposes the tf-word word vector algorithm to improve the text word vectors, deepens the network by adding convolutional layers to TextCNN, and improves the threshold used to assign labels, yielding the TL-TextCNN model. Experiments show that the model substantially improves the evaluation metrics for multi-label text classification and that TL-TextCNN is well suited to the task. The main work of this thesis covers the following aspects:

(1) The tf-word algorithm is proposed to address the text word vector problem. Features extracted by tf-idf are highly sparse and reflect neither the position of a word in the text nor the relationships between words. word2vec is a commonly used word vector model, but it cannot resolve word ambiguity or be optimized for a specific problem. The tf-word algorithm combines the advantages of tf-idf and word2vec to generate dense word vectors that carry word frequency features.

(2) The final output of the traditional TextCNN is a softmax-based probability distribution over labels, and the labels of a text are determined by a threshold. This thesis uses text similarity to design a separate threshold for each label; the resulting label-threshold algorithm effectively improves the accuracy of multi-label text classification.

(3) We design and implement a deep-learning-based multi-label text classification model on top of TextCNN, and improve it to obtain the TL-TextCNN network. Starting from the traditional TextCNN model, we combine the tf-word word vectors with the label-threshold algorithm and increase the number of convolutional layers to deepen the convolution. We also analyze the effect of the number of convolutional layers.

In summary, this thesis designs the TL-TextCNN model and verifies it experimentally on the AAPD and RCV1-V2 data sets. The experiments show that classification accuracy increases by 4.7% on AAPD and by 1.8% on RCV1-V2. The TL-TextCNN model can therefore effectively improve the accuracy of multi-label text classification.
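The abstract does not give the formulas behind tf-word or the label-threshold algorithm, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes that tf-word scales each word's dense vector by its tf-idf weight, and it takes the per-label thresholds as given rather than deriving them from text similarity as the thesis does. All function names, the toy embeddings, and the label names are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {token: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Smoothed idf; one common variant, not necessarily the thesis's.
            tok: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[tok])) + 1.0)
            for tok, count in tf.items()
        })
    return weights


def weighted_word_vectors(doc, weights, embeddings):
    """Scale each token's dense vector by its tf-idf weight (assumed combination)."""
    return [
        [weights[tok] * x for x in embeddings[tok]]
        for tok in doc
        if tok in embeddings
    ]


def predict_labels(probs, thresholds):
    """Assign every label whose probability reaches that label's own threshold."""
    return [label for label, p in probs.items() if p >= thresholds[label]]


# Toy data: two tokenized documents and 2-dimensional stand-in embeddings.
docs = [["neural", "network", "text"], ["text", "classification"]]
embeddings = {
    "neural": [1.0, 0.0],
    "network": [0.0, 1.0],
    "text": [1.0, 1.0],
    "classification": [0.5, 0.5],
}
weights = tfidf_weights(docs)
vectors = weighted_word_vectors(docs[0], weights[0], embeddings)

# Hypothetical model outputs and hand-picked per-label thresholds.
probs = {"cs.CL": 0.62, "cs.LG": 0.41, "stat.ML": 0.30}
thresholds = {"cs.CL": 0.50, "cs.LG": 0.40, "stat.ML": 0.45}
labels = predict_labels(probs, thresholds)
```

In this toy run, "text" appears in both documents and so receives a smaller weight than "neural", and a label is kept only when its probability clears its own threshold instead of one global cutoff.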
Keywords/Search Tags: Neural Network, Multi-label Text Classification, Word Vector, TextCNN