Research On Methods Of Anomaly Detection And Resource Estimation For Big Data Systems

Posted on:2021-01-02

Degree:Master

Type:Thesis

Country:China

Candidate:Y Gao

Full Text:PDF

GTID:2428330611498185

Subject:Computer technology

Abstract/Summary:

As technology advances and the ways data is produced gradually change, big data systems face many new challenges: their operating state becomes more variable, and operation and maintenance become more difficult. Efficient detection of system anomalies is the cornerstone of prevention and governance, and rational scheduling and allocation of system resources is the primary and most common means of handling anomalies. In practice, relying only on manual maintenance to keep a system running normally is both extremely costly and inefficient, so intelligent operation and maintenance is the necessary direction for big data system technologies. Based on the data flow scenarios that big data systems face, this thesis studies anomaly detection and resource estimation and designs corresponding algorithms on the basis of current research. A data stream anomaly detection algorithm must find outliers in a relatively short time while using little memory, so that potential risks in the big data system are reported to the user promptly. Resource estimation requires that, after observing the system's operating data for a period of time, the next resource threshold the system needs can be estimated fairly accurately. This lets the resource manager allocate system resources sensibly from a global view and maximize the system's computing power and data throughput.

The thesis first designs a data flow-oriented anomaly detection algorithm. Before detection, the data is divided into sets. Instead of judging point by point whether each item is an anomaly, the algorithm judges groups of data collectively by the properties of their set, which greatly improves efficiency. In addition, in contrast to previous studies, which use a single definition of anomalies, this thesis proposes the concept of an auxiliary anomaly standard on top of the definition of global anomalies. Global anomalies are set by users according to the actual scenario. The auxiliary anomaly standard adjusts adaptively to the characteristics of the data, so the algorithm speeds up its calculation without incurring additional computational overhead. Experiments show that, in the best case, the proposed algorithm reduces CPU time by 60% compared with other algorithms of the same kind.

The second part of the thesis addresses resource estimation, a more fine-grained task within resource scheduling and allocation. Accurate estimates turn resource allocation into a computable problem and help resource managers make good decisions. The thesis divides the tasks in a big data system into periodic tasks and burst tasks and identifies resource estimation for burst tasks as a current technical pain point. It therefore focuses on how to estimate resources for burst tasks with uncertain operating characteristics and data distribution, and applies extreme value theory to system resource estimation. Finally, a complete resource estimation framework is proposed. Compared with the traditional solution, the proposed algorithm increases the resource utilization rate by at least 7.6 percentage points.
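
The abstract does not say which extreme value estimator the thesis uses. As a rough illustration only, the following Python sketch shows one common way to apply extreme value theory to a resource threshold: the peaks-over-threshold method, with a generalized Pareto tail fitted by the method of moments. The function name `pot_threshold` and the parameters `init_quantile` and `risk` are hypothetical and do not come from the thesis.

```python
import math
import statistics


def pot_threshold(samples, init_quantile=0.95, risk=1e-3):
    """Estimate the usage level exceeded with probability `risk`.

    Generic peaks-over-threshold sketch (an assumption, not the thesis's
    algorithm): fit a generalized Pareto distribution (GPD) to the excesses
    over a high empirical quantile, then extrapolate into the tail.
    """
    data = sorted(samples)
    n = len(data)
    u = data[int(init_quantile * n)]           # initial high threshold
    excesses = [x - u for x in data if x > u]  # peaks over the threshold
    if len(excesses) < 2:
        raise ValueError("not enough exceedances to fit a tail")

    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)
    if v == 0:
        raise ValueError("excesses have zero variance")

    # Method-of-moments estimates of the GPD shape (xi) and scale (sigma).
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (ratio + 1.0)

    # risk * n / N_u: target tail probability relative to the exceedance rate.
    tail_ratio = risk * n / len(excesses)
    if abs(xi) < 1e-9:                         # exponential-tail limit
        return u - sigma * math.log(tail_ratio)
    return u + (sigma / xi) * (tail_ratio ** (-xi) - 1.0)
```

Given a window of observed usage values, for example peak memory per burst task, `pot_threshold(window, risk=1e-3)` returns a level that the fitted tail suggests is exceeded in roughly one observation per thousand. A smaller `risk` gives a more conservative (higher) reservation. The method of moments is used here only to keep the sketch dependency-free; maximum likelihood fitting is a common alternative.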

Keywords/Search Tags:

big data system, data flow, anomaly detection, resource estimation, extreme value theory

Related items

1	Research Of The Real-time Network Data Flow Anomaly Detection Based On Storm
2	Research On Efficient Abnormal Detection And Flow Moment Estimation Based On Data Center Network
3	A Research Of Patient Flow Anomaly Detection Method Based On Data Mining
4	Change Detection Algorithm For Data Flow In Real-time Exchange Rate Data Flow Anomaly Detection
5	Research On Anomaly Detection And Prediction Algorithm For Multidimensional Time Series Data Based On Deep Learning
6	The Research On Flow-based Network Abnormal Traffic Detection Method
7	Research On Anomaly Detection Based Attack Source Identification Technologies In Wireless Sensor Networks
8	Anomaly Detection System For Big Data Of Mobile Printed Circuit Board Industry
9	Research Of Data Fusion And Analysis In Multiple Data Sources Based On Flow Matrix
10	Intelligent Anomaly Detection And Its Applications