
Automatic Diagnostic Results Analysis Tool For Cloud Network Products

Posted on: 2022-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Ou
Full Text: PDF
GTID: 2518306335966809
Subject: Control Engineering
Abstract/Summary:
The use of public clouds is steadily increasing. Enterprises and consumers are moving their workloads from local servers to the cloud to cut IT costs and improve service quality. Cloud network products deployed in the cloud usually rely on virtualization, so multiple cloud network product instances belonging to different tenants can run on the same physical machine. However, sharing a physical device makes the failure of a single instance more harmful (for example, CPU exhaustion caused by one abnormal instance makes it difficult for the remaining instances to provide high-quality service to their tenants). This complicates troubleshooting and greatly increases the time it requires. Fault diagnosis research on cloud network product instances is therefore of great significance for ensuring the reliability and stability of cloud networks.

Common fault diagnosis methods for cloud network product instances fall into two types: passive fault diagnosis and active fault diagnosis. Of the two, passive fault diagnosis is widely used in industry because of its low cost and because it is non-intrusive to tenants. At the large cloud service provider considered in this dissertation, a series of instance-level diagnostic rules (such as safety thresholds on CPU monitoring indicators, or the capture of event logs related to software changes) are preset to passively extract instance information and output the verdicts of the diagnostic rules, thereby achieving instance-level fault diagnosis. However, even once instance-level diagnostic results are available, finding the final root cause of a fault still relies on manual troubleshooting by experts, which is often time-consuming and labor-intensive. Further improving the efficiency of cloud network product instance diagnosis is the central concern of this dissertation.

Many researchers have studied network fault diagnosis in depth, but existing results are mostly aimed at large-scale network systems. There is little research on analyzing the diagnostic results of cloud network product instances, and little experimental verification on real cloud network diagnostic data. Based on real-world cloud network diagnostic result data, this dissertation proposes an automated analysis tool for the diagnostic results of cloud network product instances. The tool performs fault detection and fault classification for these instances by exploiting the temporal correlation of the diagnostic results, thereby reducing repetitive inference work and helping technicians complete troubleshooting tasks more efficiently. The main contributions of this dissertation can be summarized in three points:

(1) Considering the temporal correlation of the diagnostic result data of cloud network product instances, a multi-instance learning recurrent neural network algorithm based on windowed time series data is designed for fault detection. It overcomes two difficulties: single-point time series data cannot capture temporal associations in the data, and the fault labels are too coarse-grained. The algorithm achieves accurate fault detection for cloud network product instances, that is, it accurately identifies the time window in which an instance fails.

(2) Considering the mapping between the fault types of cloud network product instances and their diagnostic results, the fault classification problem is abstracted as a codebook decoding problem. Because the decoding algorithm has high complexity, a fault classification algorithm is proposed that combines a codebook matrix optimization algorithm with a contribution-based fault search algorithm. This algorithm reduces the dimension of the codebook matrix and the cardinality of the candidate solution set without losing accuracy, and greatly improves efficiency.

(3) Real diagnostic result data from cloud network product instances is used for experimental verification and performance evaluation of the proposed algorithms. Based on diagnostic result data from a large cloud service provider, the performance of the fault detection model in temporal information extraction and classification accuracy is verified and analyzed, reaching an accuracy of 93.85% and an F-score of 92.07% on the test set. Based on the true positive fault data produced by the fault detection algorithm, the performance of the fault classification model in running efficiency and classification efficiency is verified and analyzed, reaching a diagnosis rate of 92.59% and a diagnosis efficiency rate of 85.70%.
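
To make the codebook framing in contribution (2) concrete, here is a minimal sketch of generic codebook decoding. It is not the dissertation's algorithm: the rule names, fault types, and codebook entries are hypothetical, the column pruning is only a simple stand-in for the codebook matrix optimization, and the nearest-code search by Hamming distance is a textbook substitute for the contribution-based fault search, whose details the abstract does not give.

```python
from typing import Dict, List, Tuple

# Hypothetical diagnostic rules (columns) and fault types (rows).
# 1 means the rule is assumed to fire when that fault is present.
RULES = ["cpu_over_threshold", "mem_over_threshold",
         "software_change_event", "packet_loss", "heartbeat_ok"]
CODEBOOK: Dict[str, List[int]] = {
    "cpu_exhaustion": [1, 0, 0, 1, 1],
    "memory_leak":    [0, 1, 0, 0, 1],
    "bad_release":    [1, 0, 1, 1, 1],
}


def prune_columns(codebook: Dict[str, List[int]]) -> List[int]:
    """Return the indices of the rule columns worth keeping.

    A column is dropped if it has the same value for every fault type
    (it cannot distinguish faults) or if it duplicates an earlier column.
    """
    rows = list(codebook.values())
    seen = set()
    keep = []
    for j in range(len(rows[0])):
        column = tuple(row[j] for row in rows)
        if len(set(column)) == 1 or column in seen:
            continue
        seen.add(column)
        keep.append(j)
    return keep


def decode(observed: List[int], codebook: Dict[str, List[int]],
           keep: List[int]) -> Tuple[str, int]:
    """Return the fault whose code is nearest to the observed rule results.

    Distance is the Hamming distance restricted to the kept columns.
    """
    def distance(code: List[int]) -> int:
        return sum(observed[j] != code[j] for j in keep)

    best = min(codebook, key=lambda fault: distance(codebook[fault]))
    return best, distance(codebook[best])


keep = prune_columns(CODEBOOK)
exact = decode([1, 0, 1, 1, 1], CODEBOOK, keep)
noisy = decode([0, 1, 1, 0, 1], CODEBOOK, keep)
```

In this toy codebook the pruning keeps three of the five rule columns, so each decode compares fewer entries per fault type. Dropping duplicate columns also removes redundancy that would otherwise help tolerate a misfiring rule, so a real system would have to weigh matrix size against robustness.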
Keywords/Search Tags: cloud network product, fault diagnosis, multiple instance learning, neural network, codebook