Automatic Multi Table Expansion Algorithm Based On Directed Graph

Posted on: 2021-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Jin
Full Text: PDF
GTID: 2428330647950689
Subject: Integrated circuit engineering
Abstract/Summary:
This thesis gives a brief introduction to AutoML, focusing on algorithms for automatic feature engineering. For relational databases, there is a lack of algorithms that reduce human intervention by directly summarizing the data and information of multiple tables into a single table that can be fed to a machine learning model. Based on common examples of feature combination in real-world scenarios, the thesis studies and summarizes the limitations of existing automatic feature generation methods, in particular the Deep Feature Synthesis algorithm. These methods have the following shortcomings: they cannot effectively mine the information in categorical features; decomposing cyclic graphs loses information; feature generation leads to an explosion of feature dimensions; time-window features cannot be mined effectively; labels are not used; and Cartesian product features cannot be generated. In view of these shortcomings, the thesis analyzes the relationships between tables in the multi-table scenario, describes them with a directed graph, and proposes a multi-table expansion algorithm based on that graph. The algorithm transforms the depth-first feature synthesis path of the tree structure into a level-order feature synthesis path over the directed graph. On this basis, Cartesian product synthesis features are constructed according to the directed graph, and time-dependent and category-dependent features are added. To address the explosion of feature dimensions, a heuristic cluster search is added to feature selection. In three Kaggle competition scenarios, the method yields significantly improved results.
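The abstract does not give the algorithm's details, so the following is only a minimal Python sketch of two ideas it names: visiting related tables level by level over a directed graph (rather than depth-first over a tree), and building a Cartesian product feature from two categorical columns. The table names, the graph, and the helper names `level_order` and `cross_feature` are hypothetical and are not taken from the thesis.

```python
from collections import deque


def level_order(graph, target):
    """Group tables by their level-order (breadth-first) distance from target.

    graph maps a table name to the tables it links to. A visited set lets the
    traversal handle cycles without decomposing the graph into a tree first.
    """
    seen = {target}
    levels = []
    frontier = deque([target])
    while frontier:
        levels.append(list(frontier))
        next_frontier = deque()
        for table in frontier:
            for neighbor in graph.get(table, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return levels


def cross_feature(col_a, col_b):
    """Combine two categorical columns row by row into one crossed category."""
    return [f"{a}|{b}" for a, b in zip(col_a, col_b)]


# Hypothetical schema with a cycle: regions links back to orders.
tables = {
    "orders": ["customers", "products"],
    "customers": ["regions"],
    "products": ["regions"],
    "regions": ["orders"],
}

levels = level_order(tables, "orders")
crossed = cross_feature(["gold", "basic"], ["north", "south"])
```

Here `levels` lists the target table first, then the tables one link away, then two links away, which is the order in which features from each table would be considered in a level-order scheme. `crossed` shows the simplest form of a Cartesian product feature, where each row receives the pair of its two category values as a new category.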
Keywords/Search Tags:Feature Engineering, Directed Graph, Feature Synthesis, Feature Selection, Cartesian Product Synthesis Feature, Deep Feature Synthesis