A major challenge for data mining problems in e-commerce applications, such as recommender systems and advertisement click-through rate prediction, is how to learn feature combination information from a large number of sparse categorical features. On the one hand, the information provided by feature combinations can significantly improve prediction accuracy; on the other hand, its interpretability is helpful for data analysis. In recent years, research on related problems has focused mainly on designing deep learning components that capture feature interactions, which leads to poor interpretability of both models and features. This paper studies the representation learning of categorical features. We use automatic feature engineering techniques to design solutions that express the information of multi-order feature combinations, and cascade these frameworks with traditional machine learning models or deep learning models, so that learning models combined with this feature representation can achieve prediction accuracy and interpretability at the same time.

The first work proposed in this paper is Automatic Embedded Feature Engineering (AEFE), an automatic feature engineering framework that constructs complex second-order combinatorial features from categorical features. Its main steps are categorical feature pair search, custom-paradigm feature construction, and multiple feature selection. Through these steps, AEFE can automatically construct sliding-window features with good interpretability and map categorical feature pairs into dense feature representations. To make AEFE more efficient, acceleration techniques such as Information Gain Factorization based search and data sampling are proposed. Experiments show that AEFE cascaded with Gradient Boosting Decision Tree (GBDT) achieves better prediction accuracy on three datasets than comparative deep learning models; further analysis experiments not only reveal the strong interpretability of AEFE at the feature level, but also verify the effects of the proposed acceleration techniques.

To enhance AEFE and further improve the versatility and effectiveness of the framework, this paper proposes Boosting-based Automatic feature Combination Encoding (BACE), an automatic feature engineering framework for representation learning of higher-order feature groups. The core idea of BACE is to construct representations of feature groups by gradient boosting and to accelerate the process with Proxy Model Search. The feature combination encoding phase includes two schemes: Complex Target Encoding (CTE) and Embedding Encoding (EE). Compared with AEFE, BACE not only raises the order of feature combination from second order to higher order, but also supports a wider range of learning models: the CTE scheme for GBDT and the EE scheme for deep learning models. The EE scheme improves the structure of the deep learning model and enables it to learn feature interactions selectively. Experimental results show that, compared with baseline methods, BACE brings varying degrees of improvement in prediction accuracy and demonstrates the effectiveness of the high-order features it constructs.
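To illustrate the kind of sliding-window feature on categorical feature pairs that AEFE constructs, the following is a minimal sketch (not AEFE's actual paradigm set): for each event, it computes the mean label of its categorical pair over the previous few events sharing that pair, e.g. a short-horizon historical click-through rate. All field names (`user_group`, `ad_cat`, `click`) are hypothetical.

```python
from collections import defaultdict, deque

def sliding_window_pair_ctr(events, key1, key2, label_key, window=3):
    """For each event, emit the mean label of its categorical feature pair
    over the previous `window` events with the same pair. Events must be
    in chronological order; pairs with no history get 0.0."""
    history = defaultdict(lambda: deque(maxlen=window))
    out = []
    for e in events:
        pair = (e[key1], e[key2])
        past = history[pair]
        out.append(sum(past) / len(past) if past else 0.0)
        past.append(e[label_key])  # update history after emitting the feature
    return out

events = [
    {"user_group": "A", "ad_cat": "x", "click": 1},
    {"user_group": "A", "ad_cat": "x", "click": 0},
    {"user_group": "B", "ad_cat": "x", "click": 1},
    {"user_group": "A", "ad_cat": "x", "click": 1},
]
print(sliding_window_pair_ctr(events, "user_group", "ad_cat", "click"))
# → [0.0, 1.0, 0.0, 0.5]
```

The resulting dense column can be fed directly to a GBDT, and its meaning ("recent CTR of this user group on this ad category") stays human-readable, which is the interpretability property the abstract emphasizes.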
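The abstract does not give CTE's exact formula, but its target-statistic flavor can be sketched with a generic smoothed target encoding of a higher-order feature group: the group mean is shrunk toward the global mean, with `alpha` controlling how strongly rare groups are regularized. The function name and fields below are illustrative assumptions, not BACE's implementation.

```python
from collections import defaultdict

def group_target_encode(rows, group_keys, label_key, alpha=10.0):
    """Encode a higher-order categorical feature group as a smoothed target
    statistic: (sum + alpha * global_mean) / (count + alpha)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    total = 0.0
    for r in rows:
        g = tuple(r[k] for k in group_keys)  # the feature group is a value tuple
        sums[g] += r[label_key]
        counts[g] += 1
        total += r[label_key]
    global_mean = total / len(rows)
    return {g: (sums[g] + alpha * global_mean) / (counts[g] + alpha)
            for g in counts}

rows = [
    {"a": 1, "b": "p", "y": 1},
    {"a": 1, "b": "p", "y": 0},
    {"a": 2, "b": "q", "y": 1},
]
enc = group_target_encode(rows, ["a", "b"], "y", alpha=1.0)
print(enc)
# group (1, "p"): (1 + 2/3) / (2 + 1) = 5/9; group (2, "q"): (1 + 2/3) / 2 = 5/6
```

Such an encoding turns an arbitrary-order group of sparse categorical values into a single dense numeric feature, which is why a scheme of this kind pairs naturally with GBDT, as the abstract notes for CTE.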