Research On Data Mining Algorithm In The Electric Power Cloud Data Analysis Platform

Posted on: 2015-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: H L Li
GTID: 2298330434457340
Subject: Software engineering
Abstract/Summary:
With the acceleration of power grid construction and the development of the smart grid, intelligent terminals are rapidly accumulating large volumes of electricity data. The amount of data stored per year will grow from the current gigabyte level to terabytes and even petabytes, while the dimensionality of power load data is rising from dozens to hundreds. Data mining algorithms can be used to process such massive data, but traditional data mining algorithms run into bottlenecks in data storage and processing performance, so they cannot handle data at this scale effectively. Cloud computing offers high reliability, virtualization, distributed storage, powerful parallel computing, and good scalability; combining these characteristics with data mining can address the problems faced by traditional data mining. Against this background, this thesis studies the issues above.

First, to support load forecasting analysis and user classification on massive power data, two classic data mining algorithms are selected: the Apriori association rule algorithm and the naive Bayes algorithm. Both are studied in depth, especially their underlying ideas and calculation steps. Second, based on this understanding, the shortcomings of the traditional algorithms are analyzed and improvements are proposed according to the characteristics of each algorithm: the generation of frequent itemsets in Apriori and the training stage of the naive Bayes model both require repeated calculations, so these two parts are the targets of parallelization. Finally, using the MapReduce programming framework and the HBase distributed database from cloud computing technology, three parallelization improvements are proposed and the corresponding Map and Reduce functions are designed to improve the algorithms' ability to handle large volumes of high-dimensional data. After parallelization, the data mining algorithms are registered with the electric power cloud data analysis platform to process massive electricity data.

With the Apriori and naive Bayes algorithms parallelized on the MapReduce framework, the Apriori algorithm is applied in practice to analyze the influence of temperature on power load, the naive Bayes algorithm is used to classify users, and the efficiency of the improved algorithms is compared. Experiments show that the parallelized algorithms are significantly more efficient. However, this thesis only parallelizes some steps of the algorithms with MapReduce; it does not optimize or improve the algorithms themselves.
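
The abstract does not give the thesis's actual Map and Reduce functions, so the following is only a minimal sketch of the general pattern it describes: the repeated counting in Apriori frequent-itemset generation and in naive Bayes training, each expressed as a map step that emits (key, 1) pairs and a reduce step that sums them. It assumes an in-memory stand-in for MapReduce rather than Hadoop or HBase, and all names (`run_mapreduce`, `make_apriori_mapper`, `naive_bayes_mapper`, and so on) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for a MapReduce job: map, group by key, reduce.
    (Assumption: a real deployment would run this on a cluster instead.)"""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def sum_reducer(key, values):
    """Reduce step shared by both algorithms: total the emitted counts."""
    return sum(values)


def make_apriori_mapper(k, frequent_prev):
    """Build a mapper that emits (k-itemset, 1) for each candidate found in
    one transaction. A candidate is kept only if all of its (k-1)-subsets
    were frequent in the previous pass (the Apriori pruning rule)."""
    def mapper(transaction):
        items = sorted(set(transaction))
        for candidate in combinations(items, k):
            if k == 1 or all(
                subset in frequent_prev
                for subset in combinations(candidate, k - 1)
            ):
                yield candidate, 1
    return mapper


def frequent_itemsets(transactions, min_support):
    """Run one counting job per itemset size until no itemset is frequent."""
    result = {}
    frequent_prev = set()
    k = 1
    while True:
        counts = run_mapreduce(
            transactions, make_apriori_mapper(k, frequent_prev), sum_reducer
        )
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        if not frequent:
            break
        result.update(frequent)
        frequent_prev = set(frequent)
        k += 1
    return result


def naive_bayes_mapper(record):
    """Emit the counts needed for naive Bayes training: one per class label
    and one per (feature index, feature value, class label) combination."""
    features, label = record
    yield ("class", label), 1
    for index, value in enumerate(features):
        yield ("feature", index, value, label), 1


def train_naive_bayes(records):
    """Training reduces to a single counting job over the labeled records."""
    return run_mapreduce(records, naive_bayes_mapper, sum_reducer)
```

In this sketch each transaction could be a list of discretized observations (for example a temperature band and a load band), and each naive Bayes record a tuple of user features plus a class label; the probabilities themselves would be derived from the returned counts in a later step.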
Keywords/Search Tags: Cloud Computing, Machine Learning, Parallel Algorithms, Load Forecasting