
Research On The High-Order Factorization Machine Based On Combined Features

Posted on: 2020-08-30 | Degree: Master | Type: Thesis
Country: China | Candidate: C Z Liu
GTID: 2518306305995979 | Subject: Computer Science and Technology
Abstract/Summary:
The Factorization Machine (FM), proposed in recent years, is mainly used for feature combination on large-scale sparse data. It is a machine learning algorithm that combines matrix factorization with the support vector machine. FM factorizes the coefficients of the cross terms, which allows it to learn the interactions between variables even when the data are sparse. A combined feature is a high-order feature formed by combining single features; it helps represent nonlinear relationships in the data and can express more of the data's underlying semantics than a single feature. Based on custom feature combinations, this thesis studies the factorization machine for classification and ordinal regression tasks. The specific results are as follows:

(1) A classification-oriented feature extraction method based on frequent patterns is proposed. First, the frequent patterns of each relevant class in the data are mined as the basis of the combined features. Second, so that the extracted combined features help classification, the K-L divergence is used to measure the class-discriminating ability of each frequent pattern. Finally, features are combined using the top-m most discriminative frequent patterns. The experimental results show that the combined features extracted by this method improve the performance of most classification models.

(2) A combined feature extraction method for ordinal regression is proposed. So that the extracted combined features carry ordinal label information, an ordered binary decomposition method is proposed that decomposes the ordinal regression problem into multiple binary sub-problems. On each binary sub-problem, the frequent patterns of the relevant classes are mined and their K-L divergence is calculated. Because the K-L divergence of frequent patterns is imbalanced across sub-problems, a method of cyclically selecting frequent patterns is proposed, which selects patterns that discriminate between the different levels in a balanced way; the finally selected frequent patterns are used for feature combination. Experiments were carried out with a variety of ordinal regression models on public and private datasets. The results show that combined features built from the most discriminative frequent patterns effectively improve the performance of most ordinal regression models.

(3) A custom high-order factorization machine model is proposed. The factorization machine is a second-order polynomial model and can only learn second-order relationships between features. The high-order factorization machine exhaustively enumerates all feature combinations, which makes the model too complex and difficult to solve. This thesis proposes a custom high-order factorization machine (CHOFM), which uses a set of custom high-order feature combination rules in place of the full set of high-order combinations. This approach reduces invalid feature combinations while preserving the expressive power of high-order combined features. A training method for CHOFM models based on stochastic gradient descent (SGD) is presented. The experimental results show that the CHOFM model outperforms FM and also converges better.
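As a rough illustration of the second-order model that the abstract takes as its starting point, the sketch below scores one sample with a standard FM, in which the pairwise coefficient of features i and j is the dot product of their latent vectors. It is not the thesis's CHOFM, whose exact formulation the abstract does not give. The function names, the toy data, and the choice to represent a combination rule as a tuple of feature indices whose product is appended as an extra input column are all assumptions made for illustration.

```python
import random


def fm_predict(x, w0, w, V):
    """Second-order FM score, pairwise term computed in O(k * n).

    Uses the identity
      sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    """
    n, k = len(x), len(V[0])
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


def fm_predict_naive(x, w0, w, V):
    """Same score with the explicit double loop over feature pairs."""
    n = len(x)
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score


def add_combined_features(x, rules):
    """Append one product feature per rule (hypothetical rule format).

    Each rule is a tuple of feature indices; for binary features the
    product is 1.0 only when every feature in the rule is present.
    """
    combined = []
    for rule in rules:
        prod = 1.0
        for idx in rule:
            prod *= x[idx]
        combined.append(prod)
    return list(x) + combined


random.seed(0)
x = [1.0, 0.0, 1.0, 1.0]          # toy binary sample (made up)
rules = [(0, 2, 3), (1, 2)]       # toy combination rules (made up)
x_ext = add_combined_features(x, rules)

n, k = len(x_ext), 3
w0 = 0.1
w = [random.gauss(0, 1) for _ in range(n)]
V = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n)]

fast = fm_predict(x_ext, w0, w, V)
slow = fm_predict_naive(x_ext, w0, w, V)
```

The two scoring functions compute the same value; the factorized form matters because it avoids enumerating all feature pairs, and it is the analogous enumeration of higher-order combinations that the abstract says a restricted rule set is meant to avoid.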
Keywords/Search Tags:Factorization machine, Feature combination, Feature selection, Ordinal regression, Frequent pattern