Analysis And Processing Of Power System Log Data Based On Spark

Posted on: 2018-07-30   Degree: Master   Type: Thesis
Country: China   Candidate: J L Tu   Full Text: PDF
GTID: 2322330542951810   Subject: Engineering
Abstract/Summary:
As power dispatching automation systems continue to grow in scale, the volume of log data that the power system must process in real time increases dramatically. Limited by disk performance, the log data cannot be processed in time, which causes delays and fails to meet real-time requirements. In-memory processing is therefore a good choice. In recent years, with the rapid development of cloud computing and big data technology, the use of big data technology to extract potentially useful information from massive amounts of real-time log data has received widespread attention, and Spark stands out among open source computing frameworks. Built on top of Spark, Spark Streaming provides batch-based real-time processing, which can meet the dispatching automation system's demand for processing real-time data within a given period. However, when the data stream reception rate and other aspects of the operating environment change, a static batch interval and block interval lead to higher end-to-end latency and processing time, and can even make the system unstable.

To address this problem in Spark Streaming, this thesis studies the impact of these intervals in depth. It analyzes the processing flow of Spark Streaming in detail. Taking into account the effect of bursty log traffic on the dispatching automation system and the characteristics of log data, it shows that data filtering and format layout can reduce system processing time. To handle the effect of the batch interval on the end-to-end latency of a single query task, a dynamic adjustment algorithm based on fixed-point iteration is proposed. It converges quickly to the optimal batch interval without requiring a step size to be chosen, so that the system's end-to-end latency is as low as possible. To handle the effect of the block interval on the resource utilization and processing time of multi-query tasks, a dynamic adjustment strategy based on a greedy algorithm is proposed. It finds the optimal block interval promptly and reduces the processing time of multi-query tasks, meeting the need for rapid analysis of real-time log data streams in order to monitor, analyze, and predict the behavior of the power dispatching automation system comprehensively and from multiple angles.

Building on the existing Spark Streaming, this thesis develops the improved DASpark Streaming system to implement the above functions and builds an experimental platform. Its performance is compared with that of the existing Spark Streaming using real-time log test data from a prefecture-level dispatch center. The results show that the improved DASpark Streaming effectively reduces the system's end-to-end latency and processing time and improves resource utilization.
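
The abstract does not give the details of the fixed-point-iteration algorithm, so the following is only a minimal sketch of the general idea, not the thesis's DASpark Streaming implementation. It assumes that processing time can be measured as a function of the batch interval, and that the target is the interval at which processing time equals a fraction `rho` of the interval. The names `find_batch_interval`, `toy_processing_time`, and `rho`, as well as the linear workload model, are hypothetical and chosen for illustration.

```python
def find_batch_interval(measure_processing_time, initial_interval,
                        rho=0.8, tol=1e-6, max_iter=50):
    """Fixed-point iteration x_{n+1} = f(x_n) / rho.

    measure_processing_time: callable returning the processing time (seconds)
        observed for a given batch interval (seconds). Hypothetical hook.
    rho: assumed safety factor; at the fixed point, processing time is
        rho * interval, so each batch finishes before the next one arrives.
    No step size is needed: each update uses the measured value directly.
    """
    interval = initial_interval
    for _ in range(max_iter):
        new_interval = measure_processing_time(interval) / rho
        if abs(new_interval - interval) < tol:
            return new_interval
        interval = new_interval
    return interval  # best estimate if the tolerance was not reached


def toy_processing_time(interval):
    # Assumed toy workload: 0.2 s fixed overhead per batch
    # plus 0.5 s of work per second of buffered log data.
    return 0.2 + 0.5 * interval


interval = find_batch_interval(toy_processing_time, initial_interval=2.0)
print(f"batch interval: {interval:.4f} s, "
      f"processing time: {toy_processing_time(interval):.4f} s")
```

For this toy model the iteration converges because the update has slope 0.5 / 0.8 < 1. With a real workload, convergence depends on how processing time actually varies with the batch interval.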
Keywords/Search Tags:dispatching automation system, log data, Spark Streaming, interval, latency