Research On Classification Algorithm Of Electric Load Data Based On Hadoop Platform

Posted on: 2019-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: M He
GTID: 2382330566491426
Subject: Computer technology
Abstract/Summary:
With the arrival of the intelligent era, the volume of data in every industry is growing rapidly. At the same time, the power network is expanding in scale, being built faster, and reaching more areas, so the volume of power data keeps increasing. This raises the first problem: how to process large-scale power data quickly. China's power sector has also begun to pay attention to power users, analyzing their behavior by classifying their power load data, and how to classify that data has become a second difficult problem. Aiming at these two problems, this paper studies the classification of power load data and the efficiency of processing it.

First, this paper studies the classification problem, which requires finding an algorithm suited to power data. Candidate methods fall mainly into clustering algorithms and classification algorithms. This paper enumerates the common algorithms of both kinds, analyzes the characteristics and functions of each, and runs simulation experiments with each on power load data. Based on the experimental results and the characteristics of power load data, the K-means clustering algorithm is selected. An analysis of K-means then reveals two weaknesses. The algorithm provides no way to determine an appropriate value of K. In addition, its way of calculating the spatial distance between data points is not rigorous, so it does not reflect well the relationship between a data point and its cluster center. To address the first weakness, a threshold-based method is proposed that judges whether the K value is appropriate by comparing the distance between each data point and its cluster center with a threshold. To address the second, user behavior indexes are analyzed to add weight values to the spatial distance formula, so that the distance reflects the influence between data points and cluster centers. The result is an improved K-means algorithm for power load data.

Finally, this paper explores the efficiency of power data processing and finds that the centralized processing methods used by the power sector can no longer meet current needs. It proposes processing large-scale power data on a parallel Hadoop cluster with multiple nodes to bring out the advantages of the cluster. The improved algorithm is then adapted to the MapReduce programming framework so that it integrates better with the Hadoop platform. The experimental results show that the optimized algorithm classifies power load data effectively, and that each resulting class is representative and reflects the power consumption characteristics of its users. Data processing on the parallel Hadoop cluster is also much more efficient than centralized processing. The classification results can be used to analyze user behavior effectively and to provide suggestions to the power department.
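
The abstract does not give the exact threshold rule, the weight formula, or the MapReduce job layout, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes a weighted Euclidean distance with one weight per feature, and it assumes that K is increased by one (seeding the new center at the farthest point) whenever some point lies farther than the threshold from its cluster center. The names `weighted_distance`, `map_assign`, `reduce_mean`, `threshold_kmeans`, and the `max_k` cap are hypothetical, and the map and reduce functions are plain Python stand-ins that mirror the MapReduce split rather than Hadoop API calls.

```python
import math
from collections import defaultdict

def weighted_distance(x, c, w):
    """Weighted Euclidean distance; w holds one assumed weight per feature."""
    return math.sqrt(sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w)))

def map_assign(point, centers, w):
    """Map step: emit (index of the nearest center, point)."""
    idx = min(range(len(centers)),
              key=lambda i: weighted_distance(point, centers[i], w))
    return idx, point

def reduce_mean(points):
    """Reduce step: the new center is the feature-wise mean of one cluster."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans(data, centers, w, max_iter=100):
    """Standard K-means iterations expressed as map and reduce steps."""
    for _ in range(max_iter):
        groups = defaultdict(list)
        for p in data:
            idx, _ = map_assign(p, centers, w)
            groups[idx].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [reduce_mean(groups[i]) if groups[i] else centers[i]
                       for i in range(len(centers))]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def nearest_distance(p, centers, w):
    """Distance from p to its closest center."""
    return min(weighted_distance(p, c, w) for c in centers)

def threshold_kmeans(data, w, threshold, max_k=10):
    """Grow K until every point is within `threshold` of its center
    (assumed rule; max_k is a hypothetical safety cap)."""
    centers = [tuple(data[0])]
    while True:
        centers = kmeans(data, centers, w)
        worst = max(data, key=lambda p: nearest_distance(p, centers, w))
        if nearest_distance(worst, centers, w) <= threshold or len(centers) >= max_k:
            return centers
        centers.append(tuple(worst))  # seed a new cluster at the farthest point

# Toy usage: two well-separated groups of made-up two-feature load profiles.
data = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
centers = threshold_kmeans(data, w=(1.0, 1.0), threshold=1.0)
```

In this sketch a smaller threshold yields more clusters, and a larger weight makes the corresponding feature count more in every assignment. On a real cluster, the assignment step would run in mappers and the averaging step in reducers, with the current centers shared across nodes between iterations.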
Keywords/Search Tags: Large-scale power data, Hadoop Platform, Data classification, User behavior