Research On Automatic Text Classification Based On Machine Learning

Posted on: 2022-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: W P Sheng
GTID: 2518306545955399
Subject: Software engineering
Abstract/Summary:
Text classification is a core task in natural language processing. Many applications depend on it, such as news topic classification, question classification in question answering systems, and film review classification. Manual text classification is time-consuming and inefficient, so automatic text classification by computer has become an active research direction. Building on natural language processing technology and machine learning theory, this thesis studies automatic text classification methods based on machine learning. The main work and achievements are as follows.

First, a TF-IDF-MP keyword extraction algorithm based on weight preprocessing is proposed. After analyzing the limitations of the TF-IDF algorithm in feature word selection and text classification, the algorithm is extended with parameters such as average word frequency and a position weighting factor for feature words. Specifically, the number of times a feature word appears in a single document is compared with its average number of occurrences across all documents in the corpus, and an improved Sigmoid function adjusts the weight of the feature word accordingly. In addition, based on the part-of-speech tags of the feature words, nouns in the first and last paragraphs of an article receive a position weighting factor of 1.2. The improved TF-IDF algorithm is then used to extract document keywords.

Second, a text classification model based on a BiLSTM-Att-CNN network is proposed. The model operates on the experimental data obtained after word segmentation and stop-word removal, as described in the third chapter. It uses a BiLSTM to obtain global features of the text and to better capture the semantic dependencies between a word and its context, and it uses a convolutional neural network to extract deeper local features. An attention mechanism is added to the hidden layer; it assigns different weights to feature words according to the semantic information they carry and their degree of influence on the classification result, which improves classification accuracy.

Finally, keyword extraction and text classification experiments were carried out with these two methods on the Sogou news data set, and both achieved relatively good results.
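The abstract does not give the exact TF-IDF-MP formula, so the following Python sketch only illustrates the general idea under stated assumptions. A plain logistic function stands in for the thesis's "improved Sigmoid", the way the factors are multiplied together is a guess, and the names `tfidf_mp` and `boosted_terms` are hypothetical. Only the position factor of 1.2 comes from the abstract.

```python
import math
from collections import Counter

POSITION_FACTOR = 1.2  # from the abstract: nouns in the first/last paragraph


def sigmoid(x):
    # Plain logistic function; the thesis's "improved" variant is not specified.
    return 1.0 / (1.0 + math.exp(-x))


def tfidf_mp(docs, doc_index, boosted_terms=frozenset()):
    """Score the terms of docs[doc_index] (hypothetical helper).

    docs          -- list of token lists (already segmented, stop words removed)
    boosted_terms -- terms assumed to be nouns found in the first or last
                     paragraph; identifying them needs POS tagging, not shown
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    target = counts[doc_index]
    total = sum(target.values())
    scores = {}
    for term, count in target.items():
        tf = count / total
        df = sum(1 for c in counts if term in c)
        idf = math.log(n_docs / df) + 1.0  # +1 keeps ubiquitous terms above zero
        # Compare the in-document count with the corpus-wide average count.
        avg_count = sum(c[term] for c in counts) / n_docs
        adjust = sigmoid(count - avg_count)  # assumed form of the adjustment
        position = POSITION_FACTOR if term in boosted_terms else 1.0
        scores[term] = tf * idf * adjust * position
    return scores


# Toy corpus for illustration only; not data from the thesis.
docs = [
    ["market", "stocks", "market", "rise"],
    ["football", "match", "market"],
    ["football", "goal", "match"],
]
plain = tfidf_mp(docs, 0)
boosted = tfidf_mp(docs, 0, boosted_terms={"stocks"})
keywords = sorted(boosted, key=boosted.get, reverse=True)
```

In this sketch a term that occurs more often in the document than it does on average in the corpus gets an adjustment factor above 0.5, and a boosted noun is scaled by exactly 1.2 relative to its unboosted score.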
Keywords/Search Tags: Text classification, convolutional neural network, TF-IDF, LSTM, attention mechanism