Feature Weighting Method For Binary Classification In Machine Learning

Posted on: 2021-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: T F Wang
GTID: 2517306113453484
Subject: Statistics
Abstract/Summary:
In the study of classification models in machine learning, most existing research on the treatment of input variables has focused on variable selection. As the basis of high-dimensional statistical modeling, variable selection is clearly important and necessary when processing large-scale, high-dimensional data. For low-dimensional data, however, where the total number of variables available for analysis is small, variable selection can discard information that the classifier needs, which reduces classification accuracy. At the same time, existing binary classification methods in machine learning usually assume that every feature has the same impact on the class variable and build the model without considering that features may affect the class variable to different degrees, an assumption that does not hold in most cases.

To address these issues, this paper focuses on feature weighting methods, mainly variable weighting in binary classification models. That is, each feature of the model is assigned a weight in order to improve classification accuracy. The main research contents and conclusions of this paper are as follows. Firstly, this paper proposes a variable weighting method based on mutual information and applies it to classic machine learning classification algorithms such as Naive Bayes, decision trees, K-nearest neighbors and random forests. Secondly, the performance of each weighted classifier is tested in experiments on the Wisconsin Breast Cancer dataset and the Blood Transfusion Service Center dataset from the UCI Machine Learning Repository. The experimental results show that, for binary classification tasks, the weighted machine learning methods proposed in this paper tend to outperform the corresponding traditional methods in classification accuracy. Finally, this paper verifies the effectiveness of the mutual-information-based weighting method for machine learning models.

The method has the following advantages: first, it is based on information theory, so the resulting weights are reliable; second, it does not degrade classifiers that are already robust, so it can be applied to multiple classification models; finally, it can improve the classification accuracy of several traditional classifiers, which makes it useful in practical applications.
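
The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea rather than the thesis's actual method. It assumes discrete feature values (continuous features would first need binning), assumes each weight is the feature's empirical mutual information with the class label normalized to sum to 1, and assumes the weights enter a K-nearest-neighbors classifier by scaling the squared differences in the distance. The function names `mutual_information`, `mi_weights` and `weighted_knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in nats, of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def mi_weights(rows, labels):
    """One weight per feature: MI with the label, normalized to sum to 1.

    The normalization is an assumption made for this sketch.
    """
    n_features = len(rows[0])
    scores = [
        mutual_information([r[j] for r in rows], labels)
        for j in range(n_features)
    ]
    total = sum(scores)
    if total == 0:
        # No feature is informative: fall back to equal weights.
        return [1.0 / n_features] * n_features
    return [s / total for s in scores]


def weighted_knn_predict(rows, labels, query, weights, k=3):
    """Majority vote among the k rows nearest to `query` under a
    feature-weighted squared Euclidean distance."""
    dists = sorted(
        (sum(w * (a - b) ** 2 for w, a, b in zip(weights, r, query)), y)
        for r, y in zip(rows, labels)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


# Toy data: feature 0 determines the label, feature 1 is independent of it.
rows = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 1, 1, 0, 0, 1, 1]

weights = mi_weights(rows, labels)
prediction = weighted_knn_predict(rows, labels, (1, 0), weights, k=3)
print(weights, prediction)
```

On this toy data the uninformative feature receives zero weight, so it no longer contributes to the distance. An unweighted classifier corresponds to passing equal weights, which makes it straightforward to compare the two variants on the same data.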
Keywords/Search Tags: Feature Weighting Method, Naive Bayes Classifier, Decision Tree, K-Nearest Neighbors, Random Fore