
Deep Text Classification Model Research Oriented On Category Consistency

Posted on: 2021-04-15    Degree: Master    Type: Thesis
Country: China    Candidate: Z Q Zeng    Full Text: PDF
GTID: 2428330611467010    Subject: Software engineering
Abstract/Summary:
Text classification is a fundamental task in natural language processing. Depending on whether the training samples and test samples belong to the same category set, it can be divided into closed-world text classification and open-world text classification. Current research on both tasks centres on deep learning models, so this thesis takes category consistency as its entry point and applies deep learning models to these two types of text classification.

The Neural Bag-of-Words model is a simple yet effective approach to closed-world text classification. Our analysis shows, however, that existing models ignore the differing discriminative power of individual words when generating text vectors, and also ignore word order. We therefore propose a Weighted Word Embedding Model to address the first limitation, and combine it with N-grams to alleviate the second. We evaluate the proposed model on five datasets and verify its effectiveness. In addition, we design two case studies and visualize the learned weights to further illustrate our ideas.

Convolutional neural network (CNN) models are another effective approach to closed-world text classification. However, because most existing CNN models extract N-gram features with convolution filters of fixed window size, their parameter count grows with the length of the N-gram features, which raises hardware cost. They also ignore the differing discriminative power of words when generating local feature vectors. We therefore propose a model called the Weighted N-grams Convolutional Neural Network to alleviate both limitations. We evaluate the proposed model on five datasets and verify its effectiveness. In addition, we analyse the relationship between the model's parameter count and the length of the captured local features.

The Deep Open Classification model (DOC) is a classic model for open-world text classification and achieves the best classification performance among non-incremental models. Experimental observation shows, however, that its assumption of a Gaussian distribution over output probabilities is not easily satisfied. We therefore design a framework called the Open Classification Unified Framework, which introduces three strategies, a mixed loss function, batch normalization, and data augmentation, to reduce the distribution difference between the training set and the test set, so that the framework better satisfies the Gaussian-distribution assumption. We evaluate the framework on two datasets and verify its effectiveness, and we also analyse the individual contribution of each of the three strategies.
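The core idea of the Weighted Word Embedding Model, weighting words by their discriminative power instead of averaging them uniformly, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name `weighted_text_vector` and the use of softmax-normalised scores are assumptions for the example.

```python
import numpy as np

def weighted_text_vector(word_vecs, scores):
    """Combine word vectors into a text vector using per-word
    discrimination scores (softmax-normalised), instead of the
    uniform average a plain Neural Bag-of-Words model would use."""
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over the words
    return weights @ np.asarray(word_vecs)    # weighted average, shape (d,)

# Toy example: 3 words with 4-dim one-hot embeddings; the 2nd word
# is scored as most discriminative, so it dominates the text vector.
vecs = np.array([[1., 0., 0., 0.],
                 [0., 1., 0., 0.],
                 [0., 0., 1., 0.]])
text_vec = weighted_text_vector(vecs, scores=[0.1, 2.0, 0.1])
print(text_vec.shape)   # (4,)
```

In a trained model the scores would themselves be produced by learnable parameters, which is what makes the weights visualisable in the case studies mentioned above.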
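The parameter-count argument for the Weighted N-grams CNN can also be illustrated: if each length-n window is collapsed by per-word weights rather than by an n×d convolution filter, no parameters depend on n. The sketch below is a hypothetical, non-learned version of that idea; the function name and interface are assumptions.

```python
import numpy as np

def weighted_ngram_features(word_vecs, scores, n):
    """Build local n-gram features by weighted-averaging the word
    vectors inside each length-n sliding window. Because each window
    is collapsed by softmax weights over its words rather than by an
    n*d convolution filter, the parameter count of a learned variant
    would not grow with n."""
    V = np.asarray(word_vecs, dtype=float)
    s = np.asarray(scores, dtype=float)
    feats = []
    for i in range(len(V) - n + 1):
        w = np.exp(s[i:i + n] - s[i:i + n].max())
        w /= w.sum()                      # softmax within the window
        feats.append(w @ V[i:i + n])      # one local feature vector
    return np.stack(feats)                # shape: (num_windows, d)

# 4 words, bigram windows -> 3 local feature vectors.
feats = weighted_ngram_features(np.eye(4), scores=[0., 0., 0., 0.], n=2)
print(feats.shape)   # (3, 4)
```

A standard CNN would instead need a separate filter of size n×d for each window length, which is exactly the growth in parameters the thesis aims to avoid.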
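DOC-style open-world classification rejects an input as belonging to an unseen class when no seen class's probability clears a per-class threshold fitted from a Gaussian over training probabilities. A minimal sketch of that mechanism is below; the function names, the mirroring-about-1 estimate, and the `alpha` default are assumptions for illustration, not the exact procedure evaluated in the thesis.

```python
import numpy as np

def class_threshold(pos_probs, alpha=3.0):
    """Fit a rejection threshold for one seen class: mirror the
    class's training-example probabilities about 1.0 so a Gaussian
    centred at 1 can be estimated, then pull the threshold alpha
    standard deviations below 1, floored at 0.5."""
    p = np.asarray(pos_probs, dtype=float)
    mirrored = np.concatenate([p, 2.0 - p])   # reflect about 1.0
    return max(0.5, 1.0 - alpha * mirrored.std())

def predict(probs, thresholds):
    """Accept the best-scoring seen class only if its probability
    clears that class's threshold; otherwise label the input as
    unseen (returned here as -1)."""
    best = int(np.argmax(probs))
    return best if probs[best] >= thresholds[best] else -1

thresholds = [0.9, 0.9, 0.9]
print(predict(np.array([0.95, 0.40, 0.30]), thresholds))  # 0
print(predict(np.array([0.60, 0.55, 0.50]), thresholds))  # -1
```

The framework's three strategies (mixed loss, batch normalization, data augmentation) aim to make the training-set probabilities on which `class_threshold` is fitted more representative of the test set, so the Gaussian assumption holds better.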
Keywords/Search Tags: closed-world text classification, open-world text classification, deep learning