Application Of Data Cleaning Method In Data Center Of Electric Company

Posted on: 2012-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: X H Zhang
GTID: 2178330335467007
Subject: Computer application technology
Abstract/Summary:
The data center is one part of the national grid's "SG186 project", carried out during the Eleventh Five-Year Plan. Its purpose is to use data warehouse technology to integrate the fragmented business data within the electric power enterprise. It gives users in different departments, with different needs and at different levels, convenient and effective access to the information they require, and, by integrating the data distributed across the industry's network, it supplies decision-makers with various kinds of data analysis. Its main feature is a standard, consistent platform for data sharing and access, built on unified data definitions and naming conventions, which ensures the uniqueness, accuracy, completeness, standardization and timeliness of the data.

This thesis focuses on the data cleaning stage of the data center's ETL (Extract, Transform, Load) process. Following the planning requirements of the national grid's "SG186 project" for data center construction and the functional architecture of the data center's ETL, the ETL process is divided into four main parts: extraction, cleansing, transformation and loading. To meet the actual production needs of the electric power business, the "cleansing process" applied to extracted data is further divided into two sub-processes: first, abnormal data are detected and their values set to "NULL"; second, the resulting vacancies are filled with predicted valid values.

The main object of study is the data of the electric energy billing system used in power marketing. The thesis also introduces the object model of the electrical data and the causes of abnormal data, proposes a genetic neural network prediction model for filling the vacant values, and verifies the effectiveness of this method.
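The two cleansing sub-processes described above (mark abnormal values as NULL, then fill the vacancies) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not state the detection rule, so the median-absolute-deviation test, the threshold `k`, the sample readings and the `neighbour_mean` predictor are all assumptions. The predictor is only a stand-in for the genetic neural network model.

```python
from statistics import median

def mark_anomalies(readings, k=5.0):
    """Step 1: replace suspicious readings with None (the "NULL" marker).

    Assumed rule: a reading is abnormal if it is negative or lies more than
    k median-absolute-deviations away from the median of the series.
    """
    valid = [r for r in readings if r is not None and r >= 0]
    med = median(valid)
    mad = median(abs(r - med) for r in valid) or 1.0  # avoid a zero threshold
    return [
        None if (r is None or r < 0 or abs(r - med) > k * mad) else r
        for r in readings
    ]

def fill_vacancies(readings, predict):
    """Step 2: fill every None with the value returned by predict(readings, i)."""
    return [predict(readings, i) if r is None else r
            for i, r in enumerate(readings)]

def neighbour_mean(readings, i):
    """Placeholder predictor: mean of the nearest valid value on each side."""
    left = next((readings[j] for j in range(i - 1, -1, -1)
                 if readings[j] is not None), None)
    right = next((readings[j] for j in range(i + 1, len(readings))
                  if readings[j] is not None), None)
    known = [v for v in (left, right) if v is not None]
    if not known:
        raise ValueError("no valid neighbour to predict from")
    return sum(known) / len(known)

# Hypothetical daily energy readings (kWh) with two obviously bad values.
daily_kwh = [102.0, 98.5, 101.2, -4.0, 99.8, 5000.0, 100.4]
marked = mark_anomalies(daily_kwh)
filled = fill_vacancies(marked, neighbour_mean)
print(marked)
print(filled)
```

Keeping detection and filling as separate functions mirrors the two sub-processes, and lets any predictor with the same `predict(readings, i)` signature replace the placeholder.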
The data cleaning method described in the thesis was applied to the construction of an electric power enterprise data center. It improves data quality, addresses the data problems left by past information silos, and provides effective and appropriate decision support through data services for management decision-making.

In short, as computer information technology develops and decision-analysis and decision-support systems place ever higher demands on data quality, handling the noisy or dirty data encountered during extraction has become an important part of data extraction. To further improve data quality, a method based on a genetic neural network is applied to handle missing values. It combines the global search ability of the genetic algorithm with the nonlinear mapping ability of the neural network, which greatly improves the prediction accuracy for the data. Experiments show that the method is feasible and effective in improving prediction precision.
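The idea of pairing a genetic algorithm's global search with a neural network's nonlinear mapping can be illustrated with a small sketch. Everything concrete here is an assumption, since the abstract gives no architecture or parameters: the 2-3-1 network, the population size, the mutation settings and the synthetic series are made up, and the genetic algorithm alone searches the weights, whereas the thesis may combine it with gradient-based training.

```python
import math
import random

N_IN, N_HID = 2, 3
# Hidden weights and biases, then output weights and one output bias.
N_WEIGHTS = N_HID * (N_IN + 1) + (N_HID + 1)

def predict(w, x):
    """Forward pass of a 2-3-1 network with tanh hidden units."""
    out = w[-1]  # output bias
    for h in range(N_HID):
        base = h * (N_IN + 1)
        z = w[base + N_IN] + sum(w[base + i] * x[i] for i in range(N_IN))
        out += w[N_HID * (N_IN + 1) + h] * math.tanh(z)
    return out

def mse(w, samples):
    """Fitness: mean squared prediction error (lower is better)."""
    return sum((predict(w, x) - y) ** 2 for x, y in samples) / len(samples)

def evolve(samples, pop_size=30, generations=40, seed=0):
    """Search the weight vector with elitism, tournaments, crossover, mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(N_WEIGHTS)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda w: mse(w, samples))
        history.append(mse(pop[0], samples))
        nxt = pop[:2]  # elitism: the two best individuals survive unchanged
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 3), key=lambda w: mse(w, samples))
            b = min(rng.sample(pop, 3), key=lambda w: mse(w, samples))
            # Uniform crossover followed by sparse Gaussian mutation.
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            child = [g + rng.gauss(0, 0.1) if rng.random() < 0.2 else g
                     for g in child]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=lambda w: mse(w, samples))
    history.append(mse(best, samples))
    return best, history

# Synthetic, already-scaled readings; each sample predicts a value
# from the two readings before it.
series = [0.50, 0.52, 0.55, 0.53, 0.51, 0.54, 0.56, 0.55, 0.52, 0.53]
samples = [((series[i - 2], series[i - 1]), series[i])
           for i in range(2, len(series))]
best, history = evolve(samples)
print("first and last best MSE:", history[0], history[-1])
print("predicted fill value:", predict(best, (0.52, 0.53)))
```

Because the best individuals are carried over unchanged, the best error in `history` never increases from one generation to the next. A vacancy would be filled by calling `predict(best, ...)` on the two valid readings that precede it.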
Keywords/Search Tags: ETL, Data Cleaning, Data Center, Genetic Neural Network Algorithm, Power Data, Vacancy Values