Lossless Compression And Cloud Warehouse Storage Research Of Measurement Large Sets Information

Posted on: 2015-08-06  Degree: Master  Type: Thesis
Country: China  Candidate: L Guo  Full Text: PDF
GTID: 2272330422484551  Subject: Traffic Information Engineering & Control
Abstract/Summary:
With the expansion of network scale and the adoption of smart devices, the smart grid is developing toward the interaction of energy and information, massive information processing, and intelligent dispatching. Distribution network characteristics such as a growing number of measurement points, fast-changing analog quantities, and high fluctuation are particularly pronounced. Such massive information must be stored continuously in the dispatching and monitoring system, which produces large information data sets that not only obstruct information transmission but also affect query processing for applications. Directly accessing massive data sets easily leads to information delay and slow access, and in serious cases causes key information to be delayed or omitted, which directly threatens operational safety and real-time control. So far, electric power data centers have not reached the level required for smart grid computation, including massive data storage, automated management, and high availability, and processing measurement data is one of the key problems in intelligent distribution network technology. Effective storage and compression technology for large data sets is needed.

This paper studies real-time storage and compression of massive railway dispatching information flows, using the Hadoop cloud computing framework and the Hive data warehouse framework to solve the storage problem of electric power dispatching information flows and to ensure safe grid operation and reliable power supply. To solve the large-data compression and storage problem in intelligent scheduling systems, a new distributed cluster lossless compression method based on a cloud framework is proposed, using the Hadoop framework and the Map/Reduce distributed parallel programming model combined with the Hive framework. First, using public information relationships, objects for dispatching-monitoring public information and key data business information flows are established, which solves the integration problem of massive information. Then lossless compression methods based on dictionary coding and statistical coding are classified and compared, and the scheduling host and the monitoring server are deployed using cloud computing nodes and a cluster network configuration. With Deflate, Gzip, BZip2 and Lzo lossless compression coding integrated into the cluster nodes, a lossless compression experiment environment for scheduling monitoring information is established.

Taking dispatching section measurement logs as an example, tests of different compression formats on the same section log sets show that the BZip2 cluster compression ratio is higher than that of the other three. When the section log sets exceed three million, the compression ratio surpasses 81%. As the section log sets grow further and the Hive data warehouse framework is used, the BZip2 compression ratio surpasses 85%, which makes it more suitable for compressing historical monitoring information flows. The Lzo cluster compression method is faster and more suitable for real-time information processing in dynamic measurement and control. The results meet the 2 s dynamic refresh requirement of engineering applications for large railway network data sets.
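
The codec comparison described in the abstract can be illustrated on a single machine with a minimal Python sketch. This is not the thesis's Hadoop/Hive cluster experiment: it uses only the Python standard library codecs (`zlib` for Deflate, `gzip`, `bz2`), omits Lzo because the standard library has no Lzo codec, and runs on a synthetic log whose format is hypothetical. It also assumes that "compression ratio" in the abstract means space saving, 1 - compressed size / original size; the abstract does not define the term.

```python
import bz2
import gzip
import random
import zlib


def make_sample_log(n_lines: int, seed: int = 0) -> bytes:
    """Build a synthetic measurement log (hypothetical format, illustration only)."""
    rng = random.Random(seed)
    lines = []
    for i in range(n_lines):
        section = rng.randint(1, 50)
        voltage = round(rng.uniform(215.0, 225.0), 2)
        lines.append(
            f"2015-01-01T00:00:{i % 60:02d},section=S{section:03d},voltage={voltage}"
        )
    return "\n".join(lines).encode("utf-8")


def space_saving(original: bytes, compressed: bytes) -> float:
    """Assumed metric: fraction of bytes removed, 1 - compressed/original."""
    return 1.0 - len(compressed) / len(original)


def compare_codecs(data: bytes) -> dict:
    """Compress the same data with each codec and report the space saving."""
    codecs = {
        "deflate": zlib.compress,
        "gzip": gzip.compress,
        "bzip2": bz2.compress,
    }
    return {name: space_saving(data, fn(data)) for name, fn in codecs.items()}


def lossless_roundtrip(data: bytes) -> bool:
    """Check that every codec restores the original bytes exactly."""
    return (
        zlib.decompress(zlib.compress(data)) == data
        and gzip.decompress(gzip.compress(data)) == data
        and bz2.decompress(bz2.compress(data)) == data
    )


if __name__ == "__main__":
    log = make_sample_log(20000)
    assert lossless_roundtrip(log)
    for name, saving in compare_codecs(log).items():
        print(f"{name}: space saving {saving:.1%}")
```

The figures printed by this sketch depend entirely on the synthetic data and should not be compared with the 81% and 85% results reported for the cluster experiments.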
Keywords/Search Tags: smart grid scheduling, large data sets, Hadoop cloud computing, Hive data warehouse, Map/Reduce, lossless compression, distributed cluster, public information