Research And Implementation Of Smart City Energy Consumption Data Based On Spark

Posted on: 2018-09-15 | Degree: Master | Type: Thesis
Country: China | Candidate: M Yi | Full Text: PDF
GTID: 2348330518998945 | Subject: Software engineering
Abstract/Summary:
The core idea of the smart city is to use computer technology to improve a city's competitiveness. The rapid development of Internet technology, and of the mobile Internet in particular in recent years, provides the data support needed to build smart cities. The power consumption data collection system uses smart meters to record household electricity consumption and gathers up to 20 GB of data per day. Storing this volume in a traditional relational database runs into many bottlenecks, so a storage system for large-scale electricity data is the basic requirement of this thesis. Power plants cannot store electrical energy; they can only use historical data to predict future demand and generate a matching amount of electricity. Accurate consumption forecasts based on historical consumption data are therefore needed to guide the power plants. The system must also offer statistical query functions, which supply data support for building an energy-saving, emission-reducing smart city.

To meet these requirements, this thesis first analyzes the system requirements in detail and decomposes them into two groups: electricity data collection and storage, and electricity data analysis. The analysis covers SQL statistical queries and electricity consumption forecasting. The storage system uses the Hadoop Distributed File System (HDFS) as its storage layer. HDFS is configured for high availability based on ZooKeeper, which gives the system high reliability and high fault tolerance. For data collection, the open-source tool Sqoop imports the data held in the original relational database into HDFS.

The analysis system has two parts: statistical analysis of electricity consumption and electricity forecasting. Statistical analysis takes an SQL statement entered on the front-end page and runs it as a query over the stored data, with Spark SQL as the query engine. For forecasting, a decision tree regression model predicts each user's electricity consumption for one day from that user's historical consumption data. The Pearson correlation coefficient is used to measure how strongly each candidate factor is related to consumption, the main factors are chosen as the feature vector of the decision tree regression model, and the model is then solved with Spark. The error of the results is analyzed by 10-fold cross-validation. The model parameters are adjusted repeatedly, and the final parameters are those that best balance the mean absolute error of the model against the time needed to solve it.

Finally, the functional and non-functional requirements of the system are tested in a test environment, which verifies that the system meets both. The results show that, with well-chosen decision tree parameters, the absolute error between the forecast value and the true value can fall below 5%, and the solution time of the model stays within an acceptable range, indicating that the forecast model is feasible. The test chapter also compares query performance: Spark SQL is faster than Hive SQL, and Spark SQL is faster than running the same SQL queries directly in the database. Taken together, the test results show that the design presented in this thesis is reasonable.
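The feature selection and error evaluation steps described in the abstract can be illustrated with a minimal sketch. This is not the thesis's Spark implementation: it is plain Python, the column names, the sample values, and the 0.5 correlation threshold are all hypothetical, and the decision tree regression model itself is left out. It only shows how Pearson coefficients might be used to pick influential factors, how 10 folds could be formed, and how a mean absolute error is computed.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.5):
    """Keep factors whose |r| with the target reaches the threshold.

    The threshold is an assumed value, not one taken from the thesis.
    Returns the kept names ordered from strongest to weakest correlation.
    """
    scores = {name: pearson(values, target) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda name: -abs(scores[name]))


def k_fold_indices(n, k=10, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical sample data: daily usage depends on temperature only.
temperature = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
meter_digit = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]  # unrelated noise column
usage_kwh = [2 * t + 1 for t in temperature]

selected = select_features(
    {"temperature": temperature, "meter_digit": meter_digit}, usage_kwh
)
folds = k_fold_indices(len(usage_kwh) * 2, k=10)
```

In a full pipeline each fold would serve once as the held-out set while a model is fitted on the other nine, and the ten mean absolute errors would be averaged; the selected factor names would define the feature vector passed to the regression model.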
Keywords/Search Tags:Smart City, Power Consumption Data, HDFS, Spark, Decision Tree
Related items