Research And Implementation Of Prediction Analysis Algorithm Based On Cloud Platform

Posted on: 2017-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: L Huang
GTID: 2348330518996232
Subject: Computer technology
Abstract/Summary:
Prediction is the science of estimating unknown events from related historical data. In the era of big data, applying machine learning to prediction can greatly improve both prediction precision and performance. Many machine learning algorithms are now used for prediction, typically the C4.5 decision tree and the BP (back-propagation) neural network. C4.5 chooses the split attribute by information gain ratio and builds the branches of the decision tree recursively until the tree is complete. The BP neural network is a nonlinear learning algorithm that adjusts its weights by back-propagating the error until they reach optimal values. Although many improvements to C4.5 and BP networks have been proposed, certain limitations remain.

The C4.5 algorithm depends heavily on memory: a tree cannot be built for a large-scale sample without some preprocessing, especially when the data size exceeds the available memory. This thesis proposes an improved algorithm that reduces the data size. The sample data are first clustered with the K-means algorithm; the cluster centers, together with the samples whose distance to the nearest center is greater than a given value, are then selected as the final training set.

For the BP neural network, computational complexity grows with the complexity of the network structure. This thesis therefore proposes a method to determine the number of hidden-layer neurons. The method applies linear correlation analysis to delete redundant hidden nodes and reassign their weights to the remaining correlated nodes. In addition, a genetic algorithm is used to optimize the weights and thresholds before the linear correlation analysis.

The improved algorithms are validated on public bike rental demand data. The results show that the proposed improvements clearly increase execution efficiency while keeping the impact on accuracy within an acceptable range. The improved algorithms are also developed as components and integrated into an enterprise's massive data analysis platform. Finally, the algorithm components are combined with scenario data to develop and integrate two applications: public bicycle rental demand forecasting and ad click-through rate (CTR) forecasting.
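C4.5 selects the split attribute with the highest information gain ratio, that is, the information gain of the split divided by the split information of the attribute (the entropy of the branch sizes), which penalizes attributes with many distinct values. The following is a minimal Python sketch of that criterion for discrete attributes only; the function names `entropy`, `gain_ratio`, and `best_attribute` are illustrative, and the sketch omits parts of full C4.5 such as thresholds for continuous attributes, missing-value handling, the above-average-gain filter, and pruning.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a non-empty sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain ratio of splitting `labels` on one discrete attribute.

    values[i] is the attribute value of sample i and labels[i] its class.
    """
    n = len(labels)
    # One branch per distinct attribute value.
    branches = {}
    for v, y in zip(values, labels):
        branches.setdefault(v, []).append(y)
    # Information gain: parent entropy minus weighted entropy of the branches.
    cond = sum(len(b) / n * entropy(b) for b in branches.values())
    gain = entropy(labels) - cond
    # Split information: entropy of the branch proportions themselves.
    split_info = -sum(len(b) / n * math.log2(len(b) / n) for b in branches.values())
    # A single-valued attribute cannot split the data, so score it 0.
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels):
    """Index of the attribute (column) with the highest gain ratio."""
    n_attrs = len(rows[0])
    return max(range(n_attrs), key=lambda j: gain_ratio([r[j] for r in rows], labels))
```

For example, with rows `[("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]` and labels `[0, 0, 1, 1]`, the first attribute separates the classes perfectly (gain ratio 1.0) while the second carries no information (0.0), so `best_attribute` returns 0. Growing the tree then repeats this choice recursively on each branch.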
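The K-means data-reduction step for C4.5 can be sketched as follows, under several assumptions that the abstract does not settle: Euclidean distance, plain Lloyd iterations with a deterministic farthest-point initialization (k-means++ is the more common choice), and a hypothetical helper `reduce_training_set` whose `min_dist` parameter stands in for the "given value". The abstract also does not say how cluster centers obtain class labels; one plausible option is to run the reduction separately on the samples of each class so every center inherits that class. This is a single-machine illustration, not the distributed implementation the thesis targets.

```python
import math


def dist(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, iters=100):
    """Lloyd's K-means; returns the list of k cluster centers."""
    # Assumed initialization: start from the first point, then repeatedly
    # add the point farthest from all centers chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    assign = None
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        new_assign = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_assign == assign:
            break  # converged: assignments no longer change
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centers


def reduce_training_set(points, k, min_dist):
    """Hypothetical helper: K-means centers plus samples far from every center."""
    centers = kmeans(points, k)
    # Keep a sample only if its distance to the nearest center exceeds min_dist;
    # closer samples are treated as well represented by that center.
    far = [p for p in points if min(dist(p, c) for c in centers) > min_dist]
    return [tuple(c) for c in centers] + [tuple(p) for p in far]
```

On two tight groups of four points around (0.5, 0.5) and (10.5, 10.5), each point lies about 0.71 from its center, so `min_dist=1.0` keeps only the two centers, while `min_dist=0.5` keeps the centers plus all eight points. The threshold therefore controls the trade-off between training-set size and how faithfully boundary samples are preserved.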
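The hidden-neuron reduction can be illustrated with one common formulation of correlation-based pruning, which may differ in detail from the thesis's method. If the outputs of two hidden neurons over the training samples have an absolute Pearson correlation at or above a threshold, the second output is approximated by a least-squares line `h_b = alpha * h_a + beta`, the neuron is removed, and its hidden-to-output weight is folded into neuron `a`'s weight and the output bias. The sketch assumes a single output neuron; `prune_hidden`, `linear_fit`, and the 0.95 default threshold are hypothetical, and the genetic-algorithm pre-optimization of weights and thresholds is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares slope and intercept so that y is approx. slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def prune_hidden(H, v, bias, threshold=0.95):
    """Hypothetical helper: merge strongly correlated hidden neurons.

    H[s][j] is the output of hidden neuron j on training sample s, v[j] the
    weight from neuron j to the single output neuron, bias the output bias.
    Returns (indices of kept neurons, their output weights, new bias).
    """
    cols = [list(c) for c in zip(*H)]  # one activation vector per neuron
    v = list(v)
    kept = list(range(len(cols)))
    i = 0
    while i < len(kept):
        a = kept[i]
        j = i + 1
        while j < len(kept):
            b = kept[j]
            if abs(pearson(cols[a], cols[b])) >= threshold:
                # Substitute h_b = alpha * h_a + beta, so that
                # v_a*h_a + v_b*h_b becomes (v_a + alpha*v_b)*h_a + beta*v_b.
                alpha, beta = linear_fit(cols[a], cols[b])
                v[a] += alpha * v[b]
                bias += beta * v[b]
                del kept[j]  # neuron b and its input weights are dropped
            else:
                j += 1
        i += 1
    return kept, [v[k] for k in kept], bias
```

When the correlation is exactly 1 or -1, for instance when one hidden neuron's outputs equal twice another's plus one, the merged network reproduces the original outputs exactly. For correlations below 1 the merge is an approximation, which matches the abstract's trade-off of a small, acceptable accuracy loss for a smaller and faster network.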
Keywords/Search Tags: C4.5 decision tree, BP neural network, Spark, prediction