Text Classification Based On TF-IDF Matrix And Caps Net

Posted on:2019-12-20

Degree:Master

Type:Thesis

Country:China

Candidate:G Zhang

GTID:2428330626452414

Subject:Software engineering

Abstract/Summary:

Text classification is a classical topic in Natural Language Processing. It is also a complicated process that requires substantial domain knowledge about content filtering and feature extraction. With the rapid development of networks and multimedia, abundant data is transmitted over the network. This valuable textual data is rich in information about users' reading behavior. In practice, textual data is usually classified into many data sets according to different needs, which makes it easier to extract deeper information in subsequent steps. Studying an effective text classification method therefore has both theoretical importance and practical applications.

A text contains many words, some of which are weakly relevant and play little role in text categorization. TF-IDF (Term Frequency-Inverse Document Frequency) is an effective algorithm for analyzing word frequency in the information retrieval field. In information retrieval and text mining, the TF-IDF weight, a statistical measure, is often used to calculate how important a word is to a document in a collection or corpus. The CNN (Convolutional Neural Network), a typical neural network structure, is very common in text classification. However, limitations of BP (Back Propagation) in CNNs affect the classification results to a certain extent. To address these drawbacks, this thesis uses dynamic routing between capsules in CapsNet (Capsule Network).

The main work and innovations are as follows.

(1) According to the characteristics of textual data, an algorithm based on the TF-IDF matrix is proposed to remove weakly relevant words. The algorithm removes words that have little effect on text categorization in order to cut down the number of features. This helps reduce the size of the text embedding vector and improves algorithm efficiency.

(2) CapsNet is used for classification after the weakly relevant words have been removed from the text. Dynamic routing helps avoid the limitations of BP and thereby improves the accuracy of text classification.

(3) Experiments verify the effectiveness of the removal algorithm based on Term Frequency-Inverse Document Frequency. In addition, this thesis summarizes and analyzes the deficiencies of CNN-based text classification algorithms and discusses future prospects.
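
The idea of removing weakly relevant words with a TF-IDF matrix, described in (1), can be illustrated with a minimal Python sketch. The abstract does not give the exact pruning rule, so everything specific here is an assumption made for illustration: the helper names `tfidf_matrix` and `remove_weak_words`, the use of the unsmoothed weight tf * log(N / df), the rule "drop a word if its highest TF-IDF weight in any document does not exceed a threshold", and the toy corpus. It is not the algorithm from the thesis.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build a dense TF-IDF matrix for a list of tokenized documents.

    Returns (vocab, matrix), where matrix[i][j] is the weight of vocab[j]
    in docs[i], computed as (count / doc length) * log(N / document frequency).
    """
    vocab = sorted({word for doc in docs for word in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    idf = {word: math.log(n_docs / doc_freq[word]) for word in vocab}

    matrix = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        matrix.append([counts[word] / length * idf[word] for word in vocab])
    return vocab, matrix


def remove_weak_words(docs, threshold):
    """Drop words whose best TF-IDF weight over all documents is <= threshold.

    Assumed pruning rule for illustration only; the thesis may use another.
    """
    vocab, matrix = tfidf_matrix(docs)
    best = {word: max(row[j] for row in matrix) for j, word in enumerate(vocab)}
    keep = {word for word, score in best.items() if score > threshold}
    return [[word for word in doc if word in keep] for doc in docs]


# Toy corpus (hypothetical): "the" occurs in every document, so its IDF is 0.
docs = [
    ["the", "match", "ended", "in", "a", "draw"],
    ["the", "market", "ended", "lower"],
    ["the", "team", "won", "the", "match"],
]
pruned = remove_weak_words(docs, threshold=0.0)
print(pruned)
```

With a threshold of 0, only words that occur in every document are removed, because their IDF is log(1) = 0. A larger threshold prunes more aggressively and shrinks the vocabulary, and with it the input passed to the classifier, at the risk of discarding words that still carry some class information.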

Keywords/Search Tags:

Weak Key Words, TF-IDF, Capsule Network, Text Classification

Related items

1	Research On Text Classification Algorithm Fusion Label Information And Capsule Network
2	Research On Capsule Network Text Classification Algorithm Based On Label Embedding
3	Fine-grained Text Classification
4	Research On Manifold Learning Based On The Text Classification
5	Research On News Text Classification Based On Deep Learning
6	Research On Text Classification By Combined Global And Local Features
7	Application Of Weak Supervised Learning On Text Classification
8	Research On Stop Words And Feature Selection For Text Classification
9	Research On Dataless Text Classification With Seed Words: A Supervised Topic Modeling Approach
10	Research On SAR Image Classification Algorithm Based On Capsule Network