Research On Text Classification Technology Based On Machine Learning

Posted on:2020-12-04

Degree:Master

Type:Thesis

Country:China

Candidate:Y Y Xing

Full Text:PDF

GTID:2428330590453153

Subject:Computer technology

Abstract/Summary:

With the rapid development of information technology, network data and resources are growing rapidly. How to classify, manage, and use this information effectively has become a hot research topic. As an important foundation of information retrieval and data mining, text classification has been widely applied to content filtering, natural language processing and understanding, and news classification. Machine-learning-based text classification, which is grounded in statistical theory, first applies an algorithm to known training data to learn rules, and then uses those rules to predict and analyze unseen data.

This thesis studies text classification based on machine learning. It first introduces the general text classification process: text preprocessing, text representation, dimensionality reduction of the feature space, classification methods, and classification performance evaluation. On this basis, the random forest algorithm is selected for in-depth study. We analyze its problems and room for optimization, and optimize it in two respects. First, because traditional random forest voting ignores the difference between strong and weak classifiers, the voting mechanism is optimized: each decision tree is assigned a weight based on its classification performance, and this weight is combined with the class probabilities the tree outputs for each sample to perform a weighted probability vote. Second, the hyperparameter values of the random forest algorithm strongly influence its performance, and when the algorithm is applied to text classification there are many hyperparameters, each with a wide range of values. A hyperparameter optimization algorithm combining random search and grid search is proposed to address this problem.

A Python-based text classification experiment is designed for the proposed algorithm. The random forest with the optimized voting mechanism is compared with the traditional random forest, verifying the effectiveness of the random forest with optimized voting and hyperparameter selection. The random forest optimizations proposed in this thesis can improve the classification performance of the algorithm. The hyperparameter optimization algorithm also offers a useful reference for hyperparameter optimization in machine learning algorithms, especially in text classification.
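The weighted probability vote described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact weighting formula, so this sketch assumes that each tree's weight is simply a performance score such as its validation accuracy, and that the final score for a class is the weighted sum of the per-tree class probabilities. The function names and the example numbers are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def majority_vote(tree_probas: Sequence[Sequence[float]]) -> int:
    """Traditional vote: every tree casts one unweighted vote for its top class."""
    n_classes = len(tree_probas[0])
    counts = [0] * n_classes
    for probs in tree_probas:
        top_class = max(range(n_classes), key=probs.__getitem__)
        counts[top_class] += 1
    return max(range(n_classes), key=counts.__getitem__)


def weighted_probability_vote(
    tree_probas: Sequence[Sequence[float]],
    tree_weights: Sequence[float],
) -> int:
    """Return the class with the largest weighted sum of per-tree probabilities.

    Assumption: tree_weights holds one performance-based weight per tree
    (for example, validation accuracy); the thesis may define it differently.
    """
    if len(tree_probas) != len(tree_weights):
        raise ValueError("need exactly one weight per tree")
    n_classes = len(tree_probas[0])
    scores = [0.0] * n_classes
    for probs, weight in zip(tree_probas, tree_weights):
        for k, p in enumerate(probs):
            scores[k] += weight * p
    return max(range(n_classes), key=scores.__getitem__)


# Hypothetical example: three trees, two classes.
probas = [
    [0.90, 0.10],  # strong tree, confident in class 0
    [0.40, 0.60],  # weak tree, leans towards class 1
    [0.45, 0.55],  # weak tree, leans towards class 1
]
weights = [0.95, 0.55, 0.60]  # assumed per-tree validation accuracy

print(majority_vote(probas))                       # class picked by plain voting
print(weighted_probability_vote(probas, weights))  # class picked by weighted voting
```

In this made-up example the two weak trees outvote the strong tree under plain majority voting, while the weighted probability vote follows the strong, confident tree. This is the kind of case the optimized voting mechanism is meant to address.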

Keywords/Search Tags:

machine learning, text classification, random forest, weighting mechanism, hyperparameter optimization

Related items

1	Research On Optimization Of Random Forest Algorithm And Its Application In Text Parallel Classification
2	Text Classification Algorithm Based On Mahalanobis Hyper Ellipsoidal Learning Machine
3	Research On ELM Image Classification Combining HOG And Random Forest
4	A Classification System For Network Violation Information Based On Machine Learning
5	Completing News Classification By Related Machine Learning Algorithms
6	Several Research On Random Forest Improvement
7	Random Forest In Application Of Text Classification
8	Research On Parallel Text Categorization Of Random Forest
9	Class-Imbalanced Data Stream Classification Method Based On Adaptive Random Forest
10	Research On Parallel Text Classification Algorithm Base On Random Forest And Spark