Random Forest Algorithm Based On Optimized Auto-encoder

Posted on: 2021-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wu
Full Text: PDF
GTID: 2517306113453444
Subject: Statistics
Abstract/Summary:
With the rapid development of science and technology worldwide, electronic information technology has gone through several generations in just a few decades. From the beginning of the 21st century, when computers were the tools of a few people, to their widespread use today, information technology and the related hardware industries have advanced rapidly. Supported by this hardware, the computing power of computers has increased exponentially, and neural network algorithms, which used to be held back by insufficient computing power, have come to the fore again and show characteristics that differ from those of other machine learning algorithms. Today, neural networks have become an indispensable part of machine learning thanks to their excellent performance.

The auto-encoder is a branch of neural networks that differs considerably from other networks. It is a neural network algorithm that uses the raw data itself as the supervision target: a series of complex nonlinear transformations turns the raw data into new data that is similar but not identical, and multiple iterations make the difference between the new data and the original data as small as possible. In this way, the hidden layer can be regarded as a special representation of the original data, and the relationships within the original data can be explored in depth. In the process of self-encoding, the auto-encoder reconstructs the original data to a certain extent. If there is a deep nonlinear relationship within the original data, the auto-encoder can reconstruct the data in the hidden layer by exploiting the correlations in the original data.

Among machine learning algorithms, the random forest algorithm is a relatively new one. It combines the basic ideas of the decision tree algorithm and the Bagging algorithm from ensemble learning. Building on the characteristics of Bagging, it additionally uses random attribute selection to choose the variables for each individual decision tree, and finally obtains better performance than other machine learning algorithms. The advantage of the random forest algorithm over an ordinary decision tree algorithm is that it can prevent overfitting without pruning, and it parallelizes well when processing high-dimensional data, so it is a good choice for such data. The random forest algorithm is mainly used in data classification and non-parametric regression. Owing to its excellent performance, it has very broad prospects in medicine, biology, statistics, economics, and many other disciplines.

Over time, the auto-encoder has become more and more popular in the field of data compression. At the same time, more and more data sets are no longer suitable for the random forest algorithm, which lacks the ability to process the original data and therefore cannot select data features effectively. Aiming at the shortcomings of the auto-encoder and the random forest algorithm, this paper starts from extracting the features of the original data and improves the auto-encoder to a certain extent, so that the hidden-layer features acquired by the auto-encoder can be used well in the random forest algorithm, combining the advantages of the two. The specific research contents are as follows: first, the optimized auto-encoder is used to carry out a nonlinear reconstruction of the original data; second, the reconstructed data features are fed to the random forest algorithm, so as to improve its performance to a certain extent; finally, a variety of data sets are used to verify the proposed algorithm.

In the intricate online information environment of contemporary China, the unchecked accumulation of useless information on the Internet not only reduces the efficiency with which people use the Internet but also gets in the way of meeting people's cultural needs. This is especially true of sports commentary articles, which are numerous and of mixed quality, so that Internet users cannot effectively extract meaningful information from them. Therefore, how to build a safe network environment that protects people's daily use of the Internet, and how to classify the objectivity of sports commentary articles, are issues that people pay close attention to and that need to be solved. Based on this situation, this paper uses the random forest algorithm based on the optimized auto-encoder to classify the objectivity of sports commentary articles. It evaluates the algorithm comprehensively from several angles, using evaluation indices such as accuracy, recall, and OOB score, to verify the practical value of the algorithm, and thereby offers a new line of thinking for the further development of the random forest algorithm and the auto-encoder.
Keywords/Search Tags: Auto-encoder, Random forest, Neural networks, Sports review articles
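
The two-stage pipeline described in the abstract (auto-encoder hidden-layer features fed to a random forest, evaluated by accuracy, recall, and OOB score) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it uses a plain one-hidden-layer auto-encoder built from scikit-learn's MLPRegressor rather than the optimized auto-encoder, whose details the abstract does not give; the data is synthetic, standing in for the unspecified data sets; and the layer size, tree count, and the helper name encode are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): the thesis data sets are not available here.
X, y = make_classification(n_samples=600, n_features=40, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Standardize the inputs so the reconstruction loss treats all features alike.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Plain auto-encoder (assumption): an MLP trained to reproduce its own input,
# i.e. the raw data serves as the supervision target.
autoencoder = MLPRegressor(hidden_layer_sizes=(16,), activation="tanh",
                           max_iter=500, random_state=0)
autoencoder.fit(X_train_s, X_train_s)


def encode(model, data):
    """Hypothetical helper: hidden-layer activations of a one-hidden-layer MLP."""
    return np.tanh(data @ model.coefs_[0] + model.intercepts_[0])


# The hidden layer acts as a compressed representation of the original data.
H_train = encode(autoencoder, X_train_s)
H_test = encode(autoencoder, X_test_s)

# Random forest on the hidden-layer features; oob_score=True gives the
# out-of-bag estimate computed from the bootstrap samples used by Bagging.
forest = RandomForestClassifier(n_estimators=200, oob_score=True,
                                random_state=0)
forest.fit(H_train, y_train)
pred = forest.predict(H_test)

accuracy = accuracy_score(y_test, pred)
recall = recall_score(y_test, pred)
oob = forest.oob_score_
print(f"accuracy={accuracy:.3f} recall={recall:.3f} oob={oob:.3f}")
```

Fitting the scaler and the auto-encoder on the training split only keeps the test split unseen until evaluation. The OOB score comes from the training data alone, so it can be compared with the held-out accuracy as a sanity check.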