Research On Imbalanced Data Classification Algorithms Based On Factorization Machine

Posted on: 2020-01-27  Degree: Master  Type: Thesis
Country: China  Candidate: X K Zhang
GTID: 2518306305998449  Subject: Computer technology
Abstract/Summary:
Classification is an important task in machine learning: a classification algorithm assigns a category to each input through a learned decision function. Depending on the number of categories to be predicted, classification problems are divided into multi-class and binary problems. Many classical classification algorithms assume that the data are balanced, yet many real-world data sets are imbalanced, and the minority class is often the more important one because misclassifying it has more serious consequences; examples include medical data classification and bank customer credit evaluation. Traditional algorithms that take overall accuracy as their learning objective perform poorly on imbalanced data sets, so improving classifier performance on such data is very important.

The Factorization Machine (FM) is an algorithm based on matrix factorization, mainly used to model feature combinations in sparse data. Its defining feature is the addition of second-order (pairwise) polynomial terms to a linear model. Because the pairwise weights are factorized, FM can learn relationships between latent feature vectors even from sparse data, which gives it good learning ability on such data. FM also has linear time complexity, so it trains quickly. Building on FM, this thesis extends it and applies it to the binary classification of imbalanced data sets. The main contributions are as follows:

(1) An FM based on an imbalanced classification interval (margin) is proposed. The core idea is to train FM on imbalanced data sets with the hinge loss and the ramp loss, and to introduce two new hyperparameters, the gap point and the slope point. The gap point assigns different penalty coefficients to misclassified positive and negative samples, reducing the shift of the classification hyperplane and thus giving the model a controllable classification space. Noise points in the data set can disturb training; the slope point truncates the loss to reduce their influence.

(2) A hyperparameter self-optimization algorithm for the ramp loss with a controllable classification space is proposed. Introducing the imbalanced classification interval adds new hyperparameters, which would otherwise have to be set by hand during tuning and make tuning harder. With the proposed algorithm, the model optimizes the newly added hyperparameters automatically, greatly reducing the time required for tuning.

(3) Experimental verification and analysis. Experiments are conducted on six imbalanced data sets from the UCI repository. The results show that FM trained with the imbalanced classification interval outperforms traditional classification models, and that the ramp loss, by truncating noise points, achieves better classification than FM with the hinge loss. The ramp-loss hyperparameter self-optimization algorithm shows clear training advantages on imbalanced data sets: it not only removes the manual tuning of the new hyperparameters but also finds more accurate hyperparameter values and improves classification performance.
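The abstract does not write out the FM model, so the sketch below assumes Rendle's standard second-order formulation, ŷ(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, where each feature i has a k-dimensional latent vector v_i. The linear-complexity claim comes from rewriting the pairwise sum as ½ Σ_f [(Σ_i v_{i,f} x_i)² − Σ_i v_{i,f}² x_i²], which costs O(k·n) instead of O(k·n²). The function names fm_predict and fm_predict_naive are hypothetical; the sketch covers scoring only, not the thesis's training procedure.

```python
import numpy as np


def fm_predict(x, w0, w, V):
    """Second-order FM score for one sample in O(k*n) time.

    x:  feature vector, shape (n,)
    w0: global bias (scalar)
    w:  linear weights, shape (n,)
    V:  latent factor matrix, shape (n, k); row i is v_i
    """
    linear = w0 + w @ x
    xv = x @ V                    # shape (k,): sum_i v_{i,f} * x_i for each factor f
    x2v2 = (x ** 2) @ (V ** 2)    # shape (k,): sum_i v_{i,f}^2 * x_i^2 for each factor f
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Reference O(k*n^2) version: explicitly sums <v_i, v_j> x_i x_j over i < j."""
    n = len(x)
    total = w0 + w @ x
    for i in range(n):
        for j in range(i + 1, n):
            total += (V[i] @ V[j]) * x[i] * x[j]
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 8, 4
    x = rng.random(n)
    w0, w, V = 0.1, rng.normal(size=n), rng.normal(size=(n, k))
    # The two values agree up to floating-point rounding.
    print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

For a sparse input such as a one-hot pair of active features i and j (all linear weights zero), the score reduces to ⟨v_i, v_j⟩, which is how FM can estimate an interaction even when that exact pair is rare in the training data: each latent vector is learned from every other pair the feature appears in.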
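The abstract does not give the exact loss formulas, so the following is only one plausible reading, not the thesis's definition. It assumes the usual margin z = y·f(x) with labels y ∈ {−1, +1}, the hinge loss max(0, 1 − z), and the ramp loss written as a difference of two hinges, max(0, 1 − z) − max(0, s − z), which caps the loss at 1 − s for any point whose margin falls below the truncation point s. Here s stands in for the "slope point", and the gap point's effect is modeled as a pair of class-dependent penalty weights c_pos and c_neg. These names, and this mapping onto the thesis's hyperparameters, are assumptions.

```python
import numpy as np


def hinge_loss(margin):
    """Hinge loss max(0, 1 - z); grows without bound for badly misclassified points."""
    return np.maximum(0.0, 1.0 - margin)


def ramp_loss(margin, s=-1.0):
    """Ramp loss H_1(z) - H_s(z), where H_a(z) = max(0, a - z).

    Equals the hinge loss for z >= s and the constant 1 - s for z < s,
    so a single outlier can contribute at most 1 - s.
    `s` is a hypothetical stand-in for the thesis's "slope point".
    """
    if s >= 1.0:
        raise ValueError("truncation point s must be below 1")
    return hinge_loss(margin) - np.maximum(0.0, s - margin)


def weighted_loss(y, scores, loss_fn, c_pos=1.0, c_neg=1.0):
    """Mean loss with separate penalty weights for y = +1 and y = -1 samples.

    c_pos / c_neg are hypothetical stand-ins for the asymmetric penalties
    the abstract attributes to the "gap point".
    """
    margin = y * scores
    weights = np.where(y > 0, c_pos, c_neg)
    return np.mean(weights * loss_fn(margin))


if __name__ == "__main__":
    y = np.array([1, 1, -1, -1])
    scores = np.array([2.0, -5.0, -0.5, 0.3])   # second sample behaves like a noisy point
    print(hinge_loss(y * scores))               # outlier contributes 6
    print(ramp_loss(y * scores, s=-1.0))        # outlier capped at 1 - s = 2
    print(weighted_loss(y, scores, ramp_loss, c_pos=3.0, c_neg=1.0))
```

In this example the positive sample with margin −5 dominates the hinge loss but contributes only 2 to the ramp loss, which illustrates why truncation limits the influence of noise points. If the positive class is the minority, setting c_pos above c_neg penalizes its errors more heavily, which counteracts the tendency of an accuracy-driven objective to sacrifice the minority class.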
Keywords/Search Tags: Unbalanced data, Factorization machine, Unbalanced interval, Classification hyperplane, Hyperparameter self-optimization