
Research And Application Of Data Analysis System For Large Scale Real Time Power Data

Posted on: 2018-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: S R Guo
Full Text: PDF
GTID: 2428330542990116
Subject: Computer technology
Abstract/Summary:
In recent years, the rapid development of cloud computing and the Internet of Things has accelerated data generation in many fields, and the world has entered the era of big data. The power industry is data-intensive: it operates a large number of acquisition and monitoring devices, and the spread of smart meters together with higher acquisition frequencies produces a large volume of electricity data at every moment. Unlike the static data handled by traditional data mining methods, this data arrives as real-time streams that are time-sensitive, dynamic, unbounded and instantaneous, so it calls for analysis and mining techniques designed for data streams. Designing a system that can process and analyse large-scale power data effectively, efficiently and simply, and ultimately convert it into commercial value, is therefore an important research topic.

To analyse large-scale real-time electricity data, this thesis constructs a parallel distributed data analysis system and studies a data stream clustering method and an imbalanced data stream classification algorithm. The main research contents are summarized as follows:

(1) We design a large-scale real-time electricity data analysis system based on the parallel distributed frameworks Hadoop and Spark. The system is modular and is divided into a data storage layer, a data processing layer, a data analysis layer and a data visualization layer. Low coupling and good scalability, combined with the distributed processing of Hadoop and Spark, allow it to analyse large-scale real-time electricity data efficiently.

(2) Because traditional clustering algorithms for static data adapt poorly to high-speed real-time data, we improve the CluStream algorithm and propose DACluStream, an adaptive data stream clustering algorithm based on temporal density characteristics and built on the distributed parallel Spark Streaming framework, which improves the online clustering of CluStream. Experiments on artificial and real data sets show that the proposed algorithm achieves a better real-time clustering effect. An implementation of DACluStream is provided in the data analysis system described above.

(3) To handle the imbalanced distribution of electricity data, we combine clustering fusion with ensemble learning, and design and implement CE-DStream, a classification algorithm based on clustering fusion, on the distributed parallel Spark Streaming framework to classify imbalanced data streams. Experiments on real-time power consumption data show that the algorithm is effective and scales well, and that it is suited to large-scale real-time electricity data analysis. An implementation of CE-DStream is provided in the power data analysis system.
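To make the stream clustering idea in (2) more concrete, the following is a minimal single-machine sketch of the online micro-cluster maintenance that CluStream-style algorithms rely on. It is not the DACluStream algorithm of the thesis, whose temporal density adaptation and Spark Streaming implementation are not described in this abstract. The names MicroCluster and update, the parameters max_clusters, boundary_factor and min_radius, and the rule of evicting the stalest cluster when the budget is full are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    """Additive summary of the points absorbed so far (illustrative)."""
    n: int = 0
    ls: list = field(default_factory=list)  # per-dimension linear sum
    ss: list = field(default_factory=list)  # per-dimension squared sum
    t_sum: float = 0.0                      # sum of arrival timestamps

    def add(self, x, t):
        if self.n == 0:
            self.ls = [0.0] * len(x)
            self.ss = [0.0] * len(x)
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v
        self.n += 1
        self.t_sum += t

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root-mean-square deviation of the absorbed points from the centroid.
        var = sum(ss / self.n - (ls / self.n) ** 2
                  for ls, ss in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))


def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def update(clusters, x, t, max_clusters=3, boundary_factor=2.0, min_radius=1.0):
    """Absorb x into the nearest micro-cluster if it lies within that
    cluster's boundary; otherwise open a new micro-cluster."""
    if clusters:
        nearest = min(clusters, key=lambda c: distance(c.centroid(), x))
        boundary = max(boundary_factor * nearest.radius(), min_radius)
        if distance(nearest.centroid(), x) <= boundary:
            nearest.add(x, t)
            return clusters
    if len(clusters) >= max_clusters:
        # Assumed simplification: evict the cluster with the oldest
        # mean timestamp to stay within the memory budget.
        stalest = min(clusters, key=lambda c: c.t_sum / c.n)
        clusters.remove(stalest)
    fresh = MicroCluster()
    fresh.add(x, t)
    clusters.append(fresh)
    return clusters


# Usage: four hypothetical two-dimensional readings arriving one per time step.
clusters = []
for t, x in enumerate([[0.0, 0.0], [0.5, 0.5], [10.0, 10.0], [10.2, 9.8]]):
    update(clusters, x, float(t))
```

Because each summary is additive, a point is absorbed in time proportional to its dimension and the raw stream never has to be stored, which is what makes this kind of summary suitable for unbounded data.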
Keywords/Search Tags: big data, imbalanced data, data stream, clustering, classification