
Research of Factorized Learning Algorithms Based on Random Forest and Stochastic Gradient Descent

Posted on: 2022-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: C Wang
Full Text: PDF
GTID: 2518306572951059
Subject: Cyberspace security
Abstract/Summary:
Machine learning applications are becoming increasingly important in enterprises. Most machine learning algorithms require a single table as input, while relational data in enterprise applications are typically normalized to satisfy the third normal form. Machine learning therefore faces a data-representation mismatch, and practitioners denormalize the data through primary/foreign-key joins, materializing the tables into a temporary single table that serves as the algorithm's input; this strategy is called "learning after join". It causes several problems: the wide table reintroduces the redundancy that normalization had removed, increasing storage requirements, and the training process incurs unnecessary computation.

To address this, "factorized learning" is attracting growing attention among researchers. Factorized learning restructures and improves classical machine learning models to take normalized multi-table datasets as input, avoiding the redundancy without sacrificing the quality of the original model. However, most studies of factorized learning are based on linear models, and only a few address nonlinear models. Moreover, most existing algorithms depend heavily on matrix linear-algebra operations, which makes them difficult to apply to powerful machine learning algorithms that involve randomness, such as random forests and stochastic gradient descent. In addition, steps that are trivial in traditional machine learning, such as splitting the data into training and test sets, become challenging in factorized learning, because a multi-table dataset cannot be easily partitioned. Previous studies simply sidestepped these problems by using datasets with a fixed join schema, which greatly limits the generality and scalability of the models.

This study focuses on factorized random forests and factorized stochastic gradient descent over normalized multi-table datasets. For random forests, we introduce a sample-over-join algorithm and the incremental decision tree EFDT to construct a factorized random forest that takes a multi-table dataset as input. For stochastic gradient descent, we design a factorized index algorithm that extracts a single tuple at any specified position of the normalized multi-table dataset, and use it to construct factorized stochastic gradient descent with a multi-table dataset as input. Extensive experiments on a synthetic dataset and multiple real-world datasets show the improvement of the two factorized algorithms over their traditional counterparts. The study demonstrates that, on large-scale datasets, the factorized learning algorithms avoid the space and time redundancy caused by the join operation without any loss of accuracy or scalability.
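The "learning after join" problem described above can be illustrated with a minimal sketch. All table and column names here are hypothetical, chosen only to show how materializing a key/foreign-key join duplicates the dimension-table attributes that normalization had stored once:

```python
# Hypothetical normalized tables: a fact table holding a foreign key into
# a dimension table (illustrative data, not from the thesis).
fact = [  # (order_id, customer_fk, amount)
    (1, "c1", 10.0), (2, "c1", 20.0), (3, "c2", 5.0), (4, "c1", 7.5),
]
dim = {  # customer_fk -> (age, city_code) feature vector, stored once
    "c1": (34, 0), "c2": (28, 1),
}

# "Learning after join": materialize one wide table via the foreign key,
# so it can be fed to a single-table learning algorithm.
wide = [(oid, amount, *dim[fk]) for oid, fk, amount in fact]

# The features of customer "c1" now appear three times in the wide table:
# the redundancy removed by normalization reappears after the join.
copies_of_c1 = sum(1 for _, fk, _ in fact if fk == "c1")
print(len(wide), copies_of_c1)  # 4 joined rows, 3 copies of c1's features
```

On realistic schemas, where a dimension table carries many attributes and each key is referenced by thousands of fact rows, this duplication dominates the size of the materialized table.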
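The idea behind a factorized index can also be sketched. The following is an assumed, simplified design (not the thesis's exact algorithm), covering only the easy key/foreign-key case where each fact row yields exactly one joined row, so a position in the virtual join maps directly to a fact row plus a dictionary lookup:

```python
import random

# Same hypothetical normalized tables as above.
fact = [(1, "c1", 10.0), (2, "c1", 20.0), (3, "c2", 5.0), (4, "c1", 7.5)]
dim = {"c1": (34, 0), "c2": (28, 1)}

def joined_tuple(i):
    """Return row i of the virtual join of fact with dim.

    For a key/foreign-key join, position i in the join is just fact[i]
    composed with one dictionary lookup: O(1) work per access and no
    materialized wide table."""
    oid, fk, amount = fact[i]
    return (oid, amount, *dim[fk])

# Stochastic gradient descent can now draw uniform random positions from
# the virtual join without ever building it.
random.seed(0)
print(joined_tuple(random.randrange(len(fact))))
```

Because any position is accessible in constant time, shuffling, mini-batching, and train/test partitioning can all operate on position indices rather than on a denormalized table.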
Keywords/Search Tags: Machine learning, Factorized learning, Random forest, Stochastic gradient descent