Research On Text Classification Method Based On Feature Vector Construction

Posted on:2020-10-01

Degree:Master

Type:Thesis

Country:China

Candidate:Q Gu

GTID:2428330596479671

Subject:Computer software and theory

Abstract/Summary:

Text is a rich source of information, but because it is unstructured, extracting insight from it is costly and relatively difficult. Text classification, a classic topic in natural language processing, is the process of assigning predefined labels or categories to text based on its content. Neural networks, a research trend in the era of massive data, offer an automated method of predictive analysis. However, text representations in neural-network-based models tend to be highly sparse, and classification models often perform poorly in specific situations. To address these problems, this thesis carries out the following research:

(1) Text representation. The GloVe model takes in a large number of irrelevant words when training word vectors. To address this, the thesis proposes a word vector weighting model based on WT-GloVe. First, feature words are extracted with a feature weighting algorithm based on word spacing and inter-class contribution degree. Second, to compensate for this shortcoming of GloVe, a method for filtering irrelevant words is proposed to improve the quality of word vector training. Finally, the feature weighting algorithm based on word spacing and inter-class distribution is combined with the GloVe model trained after irrelevant-word filtering to produce a weighted word vector model that captures both the importance and the semantic information of feature words. Compared with other models in the same environment, the WT-GloVe-based word vector weighting model effectively improves classification performance.

(2) Text classification. When the fastText model is applied to Chinese text, the word information obtained through sub-word embedding brings little benefit, and a large number of redundant terms are generated. To address this, the thesis proposes a text classification model based on STL-fastText. First, building on the TF-IDF algorithm, a correlation-based weighting algorithm for low-frequency words is proposed. Second, the whole corpus is fed to an LDA model, which performs topic analysis on the text content to learn the distribution of its topic words; the resulting topic words are supplemented with the low-frequency, highly discriminative features. Finally, the dictionary at the input layer of the fastText model is reconstructed, and the new dictionary built from these features is used as the input to the hidden layer, completing the STL-fastText model. Compared with other models in the same environment, the experimental results show that the STL-fastText-based model effectively improves the classification of Chinese short texts.
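
The weighted word vector idea in part (1) can be illustrated with a minimal sketch. The abstract does not give the formula for the word-spacing and inter-class weighting, so plain TF-IDF stands in for it here, and a small hand-written table stands in for trained GloVe vectors. The function names `tf_idf_weights` and `weighted_doc_vector` and all data are hypothetical, chosen only for illustration; this is not the thesis's WT-GloVe implementation.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {word: tf-idf weight} dict per tokenized document.

    Assumption: standard TF-IDF is used as a stand-in for the thesis's
    feature weighting algorithm, which the abstract does not specify.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            word: (tf[word] / len(doc)) * math.log(n_docs / df[word])
            for word in tf
        })
    return weights


def weighted_doc_vector(doc_weights, embeddings, dim):
    """Weighted average of word vectors for one document.

    Words with zero weight or without a vector are skipped, which mimics
    (very loosely) filtering out irrelevant words.
    """
    total = [0.0] * dim
    weight_sum = 0.0
    for word, weight in doc_weights.items():
        vec = embeddings.get(word)
        if vec is None or weight == 0.0:
            continue
        for i in range(dim):
            total[i] += weight * vec[i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


# Toy corpus and toy 2-dimensional vectors (hypothetical, not trained GloVe).
docs = [["stock", "market", "the"], ["football", "match", "the"]]
toy_embeddings = {
    "stock": [1.0, 0.0],
    "market": [0.8, 0.2],
    "football": [0.0, 1.0],
    "match": [0.2, 0.8],
    "the": [0.5, 0.5],
}

weights = tf_idf_weights(docs)
doc_vectors = [weighted_doc_vector(w, toy_embeddings, 2) for w in weights]
```

In this toy corpus the word "the" occurs in every document, so its weight is zero and it does not affect either document vector. The two document vectors therefore stay close to their topic-specific words, which is the effect a weighting scheme of this kind aims for.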

Keywords/Search Tags:

Neural network, Text classification, TF-IDF, WT-GloVe, STL-fastText

Related items

1	Research On Chinese Text Classification Based On Improved FastText
2	Research On FastText Text Classification Algorithm Based On TF-IDF
3	Research On The Method And Its Application Of Short Text Classification Based On FastText
4	Research Of Text Orientation Classification Based On Neural Network
5	Research On Chinese Short Text Classification Based On Improved FastText
6	Research On Fast And Precise Classification Algorithm Of Long Text Based On FastText
7	Research On Text Classification Based On Deep Learning
8	Research On FastText-based Classification Of News Texts And Its Application In Agricultural News
9	Research On Text Classification Based On Improved TF-IDF And FastText Algorithm
10	Research On Text Classification Method Based On Graph Convolutional Neural Network