Research On Automatic Text Classification Based On Machine Learning

Posted on:2022-09-03

Degree:Master

Type:Thesis

Country:China

Candidate:W P Sheng

Full Text:PDF

GTID:2518306545955399

Subject:Software engineering

Abstract/Summary:

Text classification is one of the core technologies of natural language processing. Many research areas involve text classification, such as news topic classification, question classification for question answering systems, and film review classification. Manual text classification is time-consuming and inefficient, so automatic text classification by computer has become an active research direction. Building on natural language processing technology and machine learning theory, this thesis studies automatic text classification methods based on machine learning in depth. The main work and results are as follows.

First, a TF-IDF-MP keyword extraction algorithm based on weight preprocessing is proposed. After analyzing the limitations of the TF-IDF algorithm in feature word selection and text classification, the algorithm is extended with two parameters: average word frequency and a position weighting factor for feature words. Specifically, the number of times a feature word appears in a single document is compared with its average number of occurrences across all documents in the corpus, and an improved Sigmoid function is used to adjust the weight of the feature word. In addition, based on the part-of-speech tags of the feature words, nouns that appear in the first and last paragraphs of an article are given a position weighting factor of 1.2. The improved TF-IDF algorithm is then used to extract document keywords.

Second, a text classification model based on a BiLSTM-Att-CNN network is proposed. The model operates on the experimental data obtained after the word segmentation and stop-word removal described in Chapter 3. It uses a BiLSTM to obtain global features of the text and to better capture the semantic dependencies of each word's context, and it uses a convolutional neural network to extract deeper local features. An attention mechanism is added to the hidden layer; it assigns different weights to feature words according to the semantic information they carry and their degree of influence on the classification result, which improves classification accuracy.

Finally, keyword extraction and text classification experiments with these two methods are carried out on the Sogou news data set, and both achieve relatively good results.
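The TF-IDF-MP weighting described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the form of the "improved Sigmoid function", so a plain logistic function of the difference between the in-document count and the corpus-average count stands in for it. The names `sigmoid_adjust`, `keyword_scores`, and `boosted_terms` are hypothetical, and the caller is assumed to have already identified (for example with a part-of-speech tagger) which nouns occur in the first or last paragraph. Only the factor 1.2 comes from the abstract.

```python
import math
from collections import Counter

# Position weighting factor for nouns in the first/last paragraph (from the abstract).
POSITION_FACTOR = 1.2


def sigmoid_adjust(count_in_doc, avg_count):
    """Assumed stand-in for the thesis's 'improved Sigmoid': a plain logistic
    function of (in-document count - corpus-average count)."""
    return 1.0 / (1.0 + math.exp(-(count_in_doc - avg_count)))


def keyword_scores(docs, doc_index, boosted_terms=frozenset()):
    """Score every term of docs[doc_index].

    docs          -- list of documents, each a list of tokens (already
                     segmented, stop words removed)
    boosted_terms -- hypothetical input: nouns found in the first or last
                     paragraph of the target document
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    target = counts[doc_index]
    total_terms = sum(target.values())

    scores = {}
    for term, count in target.items():
        tf = count / total_terms
        df = sum(1 for c in counts if term in c)  # at least 1: term is in target
        idf = math.log(n_docs / df)
        avg_count = sum(c[term] for c in counts) / n_docs
        weight = tf * idf * sigmoid_adjust(count, avg_count)
        if term in boosted_terms:
            weight *= POSITION_FACTOR
        scores[term] = weight
    return scores


if __name__ == "__main__":
    corpus = [
        ["stock", "market", "stock", "rise"],
        ["market", "football", "match"],
        ["football", "team", "win"],
    ]
    ranked = sorted(keyword_scores(corpus, 0, {"rise"}).items(),
                    key=lambda item: item[1], reverse=True)
    for term, score in ranked:
        print(f"{term}\t{score:.4f}")
```

In this sketch a term that occurs more often in the target document than it does on average across the corpus gets a Sigmoid factor above 0.5, and a boosted noun has its score multiplied by 1.2; the top-scoring terms would then be taken as the document's keywords.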

Keywords/Search Tags:

Text classification, convolutional neural network, TF-IDF, LSTM, attention mechanism

Related items

1	Research On Chinese News Text Classification Based On Nested LSTM
2	Research On Automatic Text Classification Based On Machine Learning
3	Research On Text Classification Method Based On Machine Learning
4	Research On Convolutional Neural Network Text Classification Model Based On Attention Mechanism
5	Research On Text Classification Model Based On BGRU And Self-Attention Mechanism
6	Text Representation And Classification Based On Deep Learning With Improved Attention Mechanism
7	Research On Classification Of News Text Based On Deep Learning
8	Research On Text Classification Algorithm Based On Mixed Convolution
9	Research On Long Text Classification Algorithm Via Multi-model Fusion With Attention Mechanism
10	Research On Key Technologies Of Convolutional Neural Network-Based Short Text Classification