
Research On Methods Of Anomaly Detection And Resource Estimation For Big Data Systems

Posted on: 2021-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gao
GTID: 2428330611498185
Subject: Computer technology
Abstract/Summary:
With technological innovation and the gradual transformation of data production methods, big data systems face many new challenges: their operating status is becoming more variable, and operation and maintenance work is becoming more difficult. Efficient detection of system anomalies is the cornerstone of prevention and governance, and rational allocation and scheduling of system resources is the primary and most common means of handling anomalies. In real application scenarios, relying only on manual maintenance to keep a system running normally is both extremely costly and inefficient. Intelligent operation and maintenance is therefore an inevitable direction for the technologies surrounding big data systems. Based on the data stream scenarios that big data systems face, this thesis studies anomaly detection and resource estimation, and designs corresponding algorithms that build on current research. A data stream anomaly detection algorithm must find outliers in the data within a relatively short time and with a small amount of memory, so that potential risks to the big data system are reported to the user promptly. Resource estimation requires that, after observing the system's operating data for a period of time, a reasonably accurate estimate can be made of the next resource threshold the system will need, so that the resource manager can allocate system resources sensibly from a global perspective and maximize the system's computing power and data throughput.

This thesis first designs a data stream-oriented anomaly detection algorithm. Before detection, the data is divided into sets. Instead of judging point by point whether each item is an anomaly, the algorithm judges groups of data collectively according to the properties of the set they belong to, which greatly improves efficiency. In addition, in contrast to the single definition of anomalies used in previous studies, this thesis proposes the concept of an auxiliary anomaly standard on top of the definition of global anomalies. The global anomaly standard is set by users according to the actual scenario. The auxiliary anomaly standard adapts to the characteristics of the data, so that the algorithm can speed up its computation without adding extra computational overhead. Experiments show that, in the best case, the proposed algorithm reduces CPU time by 60% compared with other algorithms from comparable studies.

The second part of this thesis addresses resource estimation, a more fine-grained task within resource scheduling and allocation. Accurate estimates turn resource allocation into a computable problem and help resource managers make good decisions. This thesis divides the tasks in a big data system into periodic tasks and burst tasks, and identifies resource estimation for burst tasks as the current technical pain point. It therefore focuses on how to estimate resources for burst tasks whose operating characteristics and data distribution are uncertain, and applies extreme value theory to system resource estimation. Finally, a complete resource estimation framework is proposed. Compared with the traditional solution, the proposed algorithm increases the resource utilization rate by at least 7.6 percentage points.
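The abstract does not say which extreme value technique the thesis uses, so the following is only an illustrative sketch of one common choice, the peaks-over-threshold approach: exceedances over a high threshold are modelled with a generalized Pareto distribution (GPD), and the fitted tail is extrapolated to a resource level that should be exceeded only with a small probability. The function names, the 0.9 threshold quantile, the method-of-moments fit, and the synthetic exponential "usage" data are all assumptions made for this example, not details taken from the thesis.

```python
import math
import random
import statistics


def fit_gpd_moments(excesses):
    """Method-of-moments fit of a GPD to excesses over a threshold.

    Returns (xi, sigma): shape and scale. Only meaningful when the
    true shape is below 0.5, where the variance is finite.
    """
    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)
    if v == 0:
        raise ValueError("excesses have zero variance; cannot fit a GPD")
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (ratio + 1.0)
    return xi, sigma


def estimate_resource_threshold(usage, tail_prob, threshold_quantile=0.9):
    """Estimate the usage level exceeded with probability `tail_prob`.

    `usage` is a hypothetical list of observed resource measurements
    (for example, peak memory per burst task). `tail_prob` should be
    smaller than the fraction of points above the chosen threshold.
    """
    data = sorted(usage)
    n = len(data)
    # High empirical quantile used as the peaks-over-threshold cut-off.
    u = data[min(int(threshold_quantile * n), n - 1)]
    excesses = [x - u for x in data if x > u]
    if len(excesses) < 2:
        raise ValueError("not enough exceedances to fit the tail")
    xi, sigma = fit_gpd_moments(excesses)
    # Target tail probability rescaled by the exceedance rate N_u / n.
    scaled = tail_prob * n / len(excesses)
    if abs(xi) < 1e-9:
        # Limiting case xi -> 0: the tail is exponential.
        return u - sigma * math.log(scaled)
    return u + (sigma / xi) * (scaled ** (-xi) - 1.0)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for measured resource usage of burst tasks.
    samples = [random.expovariate(1.0) for _ in range(20000)]
    print(estimate_resource_threshold(samples, tail_prob=0.001))
```

In this sketch the returned value would play the role of the "next resource threshold" handed to a resource manager: reserving that much is expected to be insufficient only about `tail_prob` of the time, under the assumption that the tail really follows a GPD. A production estimator would more likely use maximum likelihood fitting and a data-driven threshold choice.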
Keywords/Search Tags:big data system, data flow, anomaly detection, resource estimation, extreme value theory