
Multi-valued Random Forest Algorithm And Its Application In Machine Learning

Posted on: 2022-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: K A Zhang
GTID: 2518306485475074
Subject: Management Science and Engineering
Abstract/Summary:
Machine learning is an interdisciplinary subject that draws on mathematics, probability theory, statistics, and other fields. Since its introduction in the 20th century, it has remained a focus of scholarly attention. With the development of the Internet, artificial intelligence is emerging rapidly, and most research on artificial intelligence is based on machine learning. Machine learning is therefore the core of artificial intelligence and a means of making computers more intelligent. Its applications have spread to many fields, such as face recognition, autonomous driving, fault diagnosis, and stock selection, and it can be used for classification, prediction, and regression problems. However, current research on machine learning deals almost exclusively with single values: each input attribute takes a single value, and the output category is also a single value. In reality, many situations are multi-valued, yet little machine learning research addresses them. It is therefore necessary to bring the multi-valued environment into machine learning research, which will help solve many real-life problems.
There are many machine learning algorithms, including linear regression, logistic regression, linear discriminant analysis, naive Bayes, KNN, K-means, and random forest. Random forest has attracted the attention of many scholars because of its strengths: mature applications, fast training, high accuracy, the ability to handle a large number of input variables and high-dimensional data, no need for feature selection, and a good ability to balance errors on imbalanced data. This thesis therefore selects random-forest-based algorithms to study classification and prediction in a multi-valued environment.
In this thesis, the multi-valued environment is divided into single-class multi-valued and two-class multi-valued situations. The single-class multi-valued situation is considered first. It is subdivided into the multi-valued attribute case and the multi-valued category case, and a different random forest algorithm is constructed for classification and prediction in each. The multi-valued attribute case is further subdivided into average-probability and non-average-probability attribute values, according to whether the individual values within an attribute value are equally probable. In the random forest algorithm, classification and judgment are first based on a single decision tree and then extended to many trees for a combined decision. When selecting each node for a multi-valued attribute, this thesis uses an improved information gain method. When selecting multi-valued nodes, it uses an improved similarity method. Both methods classify and predict the data well.
In the two-class multi-valued situation, the attributes and the categories are both multi-valued; a random forest algorithm is constructed for this situation and its application in machine learning is explored. This thesis analyzes the two-class multi-valued situation with the similarity method and with the maximum ratio algorithm, and concludes that both can classify and predict multi-valued data well. With the similarity algorithm, the tree has fewer layers and can be computed faster. The maximum ratio algorithm uses binary-tree classification; compared with a multi-branch tree, a binary tree can completely classify each branch, so the result is more accurate. The similarity algorithm is therefore recommended when the amount of data is huge or the solution time is limited, and the maximum ratio algorithm is preferable when there is no time limit or when high accuracy is the priority.
In the analysis of both the single-class and the two-class multi-valued environments, this thesis uses worked examples for demonstration. It concludes that a series of multi-valued algorithms based on random forest can effectively classify and predict in the various situations of a multi-valued environment, which is conducive to further research in multi-valued fields.
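The abstract does not give the improved information gain formula, so the following is only an illustrative sketch of how a multi-valued attribute might enter a standard Shannon information gain calculation. It assumes (this is not stated in the thesis) that a sample whose attribute holds several values contributes fractionally to each branch, with equal shares in the average-probability case and user-supplied shares in the non-average-probability case. The names `entropy`, `information_gain`, and `equal_split` are hypothetical.

```python
from collections import defaultdict
from math import log2


def entropy(weighted_labels):
    """Shannon entropy (in bits) of a {label: weight} mapping."""
    total = sum(weighted_labels.values())
    if total == 0:
        return 0.0
    return -sum((w / total) * log2(w / total)
                for w in weighted_labels.values() if w > 0)


def equal_split(values):
    """Average-probability case (assumed): every listed value is equally likely."""
    return {v: 1.0 / len(values) for v in values}


def information_gain(rows, labels):
    """Standard information gain with fractional branch membership.

    rows   -- one dict per sample mapping attribute value -> probability;
              the probabilities in each dict are assumed to sum to 1
    labels -- one single-valued class label per sample
    """
    parent = defaultdict(float)
    branches = defaultdict(lambda: defaultdict(float))
    for cell, label in zip(rows, labels):
        parent[label] += 1.0
        for value, prob in cell.items():
            # Assumption: a multi-valued sample is shared among its branches.
            branches[value][label] += prob
    n = len(labels)
    remainder = sum(sum(b.values()) / n * entropy(b)
                    for b in branches.values())
    return entropy(parent) - remainder


# Usage: the second sample carries two equally likely values.
rows = [equal_split(["red"]), equal_split(["red", "blue"]),
        equal_split(["blue"]), equal_split(["blue"])]
labels = ["yes", "yes", "no", "no"]
print(information_gain(rows, labels))
```

For the non-average-probability case under the same assumption, a row would be written directly as a dict of unequal shares, for example `{"red": 0.7, "blue": 0.3}`. In a forest, a criterion of this kind would be evaluated at each node of each tree, and the trees' predictions would then be combined.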
Keywords/Search Tags: Multi-valued Environment, Single-class Multi-valued Situation, Two-class Multi-valued Situation, Random Forest, Machine Learning