
Research On Text Representation And Classification Based On Machine Learning Algorithm

Posted on: 2019-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Yuan
GTID: 2518306473453404
Subject: Control theory and control engineering
Abstract/Summary:
With the steady progress of Internet technology, online information carried by text, audio, video, and images has grown continuously in scale, type, and content. Given this mass of information and data, how to mine the required content efficiently, standardize data management, and provide a reference for classification decisions has become a focus of attention. Automatic text classification can locate information quickly and accurately in large, complex data sets, which greatly helps information processing. With the widespread application of machine learning and deep learning, how to use these methods to improve the classification performance of text classifiers has become the main problem at present.

First, the text classification process and its techniques are described systematically, with an explanation of the main steps and key elements. Related techniques such as preprocessing, representation models, classifier design, and performance evaluation are analyzed and summarized. Second, an improved RWABC method is proposed to address the curse of dimensionality caused by redundant features in the representation model. The random walk algorithm is used to optimize a comprehensive-measure feature selection method that filters redundant features out of the feature space, and the artificial bee colony algorithm is used to search for the global optimal solution, which effectively reduces the dimensionality of the feature space. Then, a text classification method based on adaptive weighted K-nearest neighbor is proposed to alleviate the skew caused by an unbalanced text distribution. The standard deviation of the texts is used to adjust the weights of the algorithm, and a shrink factor is used to control the class density of the texts, which effectively improves the classification performance of the K-nearest neighbor method on the sample boundary problem.
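
The abstract does not give the formulas behind the adaptive weighted K-nearest neighbor method, so the following is only a generic sketch of the underlying idea: weighting neighbor votes so that a large class does not dominate a small one. It assumes cosine similarity over dense feature vectors and divides each vote by the size of the neighbor's class. The function names `cosine` and `weighted_knn_predict` and the class-size normalization are illustrative assumptions. The standard-deviation weighting and shrink factor described in the thesis are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weighted_knn_predict(train_vectors, train_labels, query, k=3):
    """Predict a label for `query` with a class-size-normalized weighted vote.

    Assumption (not taken from the thesis): each of the k nearest neighbors
    votes with its similarity divided by the number of training samples in
    its class, which damps the advantage of over-represented classes.
    """
    class_sizes = Counter(train_labels)
    # Rank all training samples by similarity to the query, highest first.
    scored = sorted(
        ((cosine(vec, query), label)
         for vec, label in zip(train_vectors, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for similarity, label in scored[:k]:
        votes[label] += similarity / class_sizes[label]
    return max(votes, key=votes.get)
```

On a skewed training set, a plain majority vote among the k nearest neighbors tends to favor the larger class simply because more of its samples fall near any query. Normalizing by class size is one simple way to counter that bias.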
Keywords/Search Tags: text classification, machine learning, feature selection, K-nearest neighbor