Research On Text Representation And Classification Based On Machine Learning Algorithm

Posted on: 2019-06-04

Degree: Master

Type: Thesis

Country: China

Candidate: Z B Yuan

GTID: 2518306473453404

Subject: Control theory and control engineering

Abstract/Summary:

As Internet technology has advanced, the volume of online information carried by text, audio, video, and images has grown steadily in scale, type, and content. Given this mass of data, how to mine the required content efficiently, manage data in a standardized way, and provide a reference for classification decisions has become a focus of attention. Automatic text classification can locate information quickly and accurately in large, complex data sets, which greatly assists information processing. With the widespread application of machine learning and deep learning, the main problem at present is how to use these methods to improve text classification and, in particular, the performance of the classifier.

First, the thesis gives a systematic description of the text classification process and its techniques, explaining the main steps and key elements. Preprocessing, representation models, classifier design, performance evaluation, and related techniques are analyzed and summarized. Second, an improved RWABC method is proposed to address the curse of dimensionality caused by redundant features in the representation model. A random walk algorithm is used to optimize a comprehensive-measure feature selection method so that redundant features are filtered out of the feature space, and the artificial bee colony algorithm is used to search for the global optimum, which effectively reduces the dimensionality of the feature space. Third, a text classification method based on adaptive weighted K-nearest neighbor is proposed to alleviate the skew caused by imbalanced text distribution. The standard deviation of the texts is used to adjust the weights in the algorithm, and a shrink factor is used to control text class density, which effectively improves the classification performance of the K-nearest neighbor method on samples near class boundaries.
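
The abstract does not give the weighting formulas, so the following is only a minimal sketch of the general idea behind class-weighted K-nearest neighbor voting on imbalanced text data, not the thesis's method. It swaps in a simple size-based class weight in place of the standard-deviation weighting described above. The function names, the `shrink` exponent, and the term-frequency representation are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Bag-of-words term-frequency vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def class_weights(labels, shrink=1.0):
    """Hypothetical imbalance correction (an assumption, not the thesis formula):
    weight = (mean class size / class size) ** shrink.

    shrink=0 gives every class weight 1 (plain similarity voting);
    shrink=1 fully compensates for class size.
    """
    sizes = Counter(labels)
    mean_size = sum(sizes.values()) / len(sizes)
    return {label: (mean_size / n) ** shrink for label, n in sizes.items()}


def weighted_knn_predict(train, query, k=3, shrink=1.0):
    """Classify `query` by similarity-weighted, class-weighted voting
    among its k most similar training texts.

    `train` is a list of (text, label) pairs.
    """
    weights = class_weights([label for _, label in train], shrink)
    q = tf_vector(query)
    scored = sorted(
        ((cosine(q, tf_vector(text)), label) for text, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim * weights[label]
    return max(votes, key=votes.get)
```

With a small training list of `(text, label)` pairs, `weighted_knn_predict(train, "stock market bank rates", k=3)` returns the label whose neighbors accumulate the largest weighted vote. Raising `shrink` gives minority classes more influence per neighbor, which is one simple way to counter the skew that a large class introduces into the neighborhood.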

Keywords/Search Tags:

Text classification, machine learning, feature selection, K nearest neighbor

Related items

1	Evolutionary Extreme Learning Machine Based Feature Weighted Nearest Neighbor Classification Algorithm
2	Text Sentiment Analysis Based On Text Classification
3	Research On Robust Large Margin Classification Learning
4	Research Of Nearest Neighbor Classification Algorithm Based On Sample Selection
5	Study On Generalized Nearest Neighbor Pattern Classification
6	Research On Text Classification Algorithms Based On Machine Learning
7	Automatic Classification Research On Chinese Web Document Orientation
8	Research On Instance Selection Algorithms For Machine Learning
9	Research On Instance Selection Method For K Neighborhood Classification
10	Research On Machine Learning Methods For Intelligent Decision-Making