Research On Imbalanced Data Classification Algorithms Based On Factorization Machine

Posted on:2020-01-27

Degree:Master

Type:Thesis

Country:China

Candidate:X K Zhang

Full Text:PDF

GTID:2518306305998449

Subject:Computer technology

Abstract/Summary:

Classification is an important task in machine learning: a classification algorithm assigns a category to each input through a decision function. Depending on the number of categories to predict, classification problems are divided into multi-class and binary problems. Classical classification algorithms usually assume that the data is balanced. In practical applications, however, many data sets are unbalanced, and the minority class is often the more important one, so that misclassifying it has more serious consequences, as in medical data classification or bank customer credit evaluation. Traditional algorithms that take overall accuracy as the learning goal do not perform well on unbalanced data sets, so it is important to improve classifier performance on such data.

Factorization Machine (FM) is an algorithm based on matrix decomposition that is mainly applied to the problem of feature combination in sparse data. Its defining feature is the addition of second-order polynomial terms to a linear model. Because of the matrix decomposition, FM can learn the relationships between latent feature vectors even from sparse data, which gives it good learning ability on sparse data. FM also has linear computational complexity, which allows it to train quickly. This thesis extends FM and applies it to binary classification of unbalanced data sets. The achievements are as follows:

(1) An FM based on an unbalanced classification interval is proposed. The core idea of the algorithm is to train the FM on unbalanced data sets with the hinge loss function and the ramp loss function, and to introduce two new hyperparameters, the gap point and the slope point. The gap point assigns different penalty coefficients to misclassified positive and negative samples, which reduces the shift of the classification hyperplane and gives the model a controllable classification space. Noise points in the data set can affect training, and the slope point truncates the loss to reduce their effect.

(2) A hyperparameter self-optimization algorithm for the ramp loss function with controllable classification space is proposed. Introducing the unbalanced classification interval adds new hyperparameters to the model, so more parameter values must be specified manually during tuning, which makes tuning difficult. To address this, we propose a hyperparameter self-optimization algorithm for the ramp loss function. With it, the model optimizes the newly added hyperparameters automatically, greatly reducing the time required for tuning.

(3) Experimental verification and analysis. Six unbalanced data sets from UCI are selected for experiments. The results show that the FM based on the unbalanced classification interval performs better than traditional classification models, and that the ramp loss function, by truncating the noise points, classifies better than the FM trained with the hinge loss function. The ramp loss hyperparameter self-optimization algorithm for controllable classification space has a clear advantage on unbalanced data sets: it reduces the work of tuning the new hyperparameters, obtains more accurate hyperparameter values, and improves classification performance.
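The abstract does not give formulas, so the following is only a minimal sketch of the ideas it names, written under stated assumptions. It uses the standard second-order FM score (with the usual identity that makes the pairwise term linear in the number of features), a hinge loss with a class-dependent margin, and the standard ramp loss obtained by capping the hinge loss. The names `fm_score`, `margin_pos`, `margin_neg`, and `s` are hypothetical; in particular, the class-dependent margin and the truncation level `s` are stand-ins for the thesis's gap point and slope point, whose exact definitions are not stated here.

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Standard second-order FM score for one sample x of shape (n,).

    V has shape (n, k). The pairwise term uses the identity
      sum_{i<j} <V[i], V[j]> x_i x_j
        = 0.5 * sum_f [(sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2],
    which costs O(k*n) instead of O(k*n^2).
    """
    s = V.T @ x                      # shape (k,)
    s_sq = (V ** 2).T @ (x ** 2)     # shape (k,)
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s_sq)

def fm_score_naive(x, w0, w, V):
    """Same score with an explicit double loop, for checking only."""
    total = w0 + w @ x
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            total += (V[i] @ V[j]) * x[i] * x[j]
    return total

def asymmetric_hinge(y, score, margin_pos=1.0, margin_neg=1.0):
    """Hinge loss with a class-dependent margin (assumed form, y in {-1, +1})."""
    margin = margin_pos if y > 0 else margin_neg
    return max(0.0, margin - y * score)

def asymmetric_ramp(y, score, margin_pos=1.0, margin_neg=1.0, s=-1.0):
    """Ramp loss: the hinge loss above capped at (margin - s), with s < margin.

    Samples with y * score < s (likely noise) contribute a constant loss,
    so they stop pulling on the decision boundary.
    """
    margin = margin_pos if y > 0 else margin_neg
    return min(margin - s, max(0.0, margin - y * score))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, w, V = rng.normal(size=6), rng.normal(size=6), rng.normal(size=(6, 3))
    print(fm_score(x, 0.5, w, V), fm_score_naive(x, 0.5, w, V))
    # A larger margin for the (assumed) minority positive class penalizes
    # its near-misses more than those of the negative class.
    print(asymmetric_hinge(+1, 0.5, margin_pos=2.0), asymmetric_hinge(-1, -0.5))
    print(asymmetric_ramp(+1, -10.0, margin_pos=2.0, s=-1.0))
```

In this sketch, giving the minority class a larger margin is one simple way to counteract the hyperplane shift that the abstract describes, and the cap in `asymmetric_ramp` bounds the influence of any single noisy sample. How the thesis actually parameterizes and self-optimizes these quantities is not recoverable from the abstract.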

Keywords/Search Tags:

Unbalanced data, Factorization machine, Unbalanced interval, Classification hyperplane, Hyperparameter self-optimization

Related items

1	Unbalanced Data Classification Algorithm Based On SVM For Research And Application
2	Research On Non-negative Matrix Factorization And Its Application To Unbalanced Data Classification
3	Research On Federated Learning Methods For Unbalanced Data
4	Research On Employee Turnover Prediction Based On SMOTE-SVM Under Unbalanced Data
5	Research And Application Of Active Learning Method For Unbalanced Data Set Based On One Class SVM
6	Research On Unbalanced Text Data Set Classification Algorithm
7	Research On Credit Scoring Method For Unbalanced Data
8	Unbalanced Data Classification Under-sampling Algorithm Based On SVM For Research And Application
9	Research And Application Of Integrated Algorithms For Unbalanced Data Sets
10	Categories Of Unbalanced Data Integration Classification Research