
The Design And Implementation Of Alarm Management Subsystem Of Network Operation And Maintenance System

Posted on: 2022-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Huang
Full Text: PDF
GTID: 2518306725984139
Subject: Master of Engineering
Abstract/Summary:
Network alarm management is an important task in network operation and maintenance. By managing the alarms that occur in the network comprehensively and effectively, it plays a significant role in improving network service quality. However, as networks grow, the number of network devices increases and so does the number of alarms they generate, which poses new challenges to the alarm management capability of network operation and maintenance systems. Traditional network operation and maintenance systems provide only simple alarm management, which is adequate only for small alarm volumes. In the current network environment with massive alarm volumes, there are three main problems.

First, alarm collection capability is insufficient. The existing method collects alarm information by polling, which does not scale to large alarm volumes and causes collection congestion in large-scale networks. Second, fault location based on alarm information is insufficient. The existing method relies on an alarm association rule library and on the experience of operation and maintenance personnel. It performs poorly under massive alarms and cannot meet the need to locate and resolve faults quickly. Third, alarm analysis is insufficient. The existing system only displays alarm information along the time dimension, supplemented by alarm correlation analysis, and performs no statistical analysis of global alarm data, so operation and maintenance personnel cannot learn more about the network environment from the alarm information.

In this context, this thesis designs and implements an alarm management subsystem of a network operation and maintenance system. The main work is as follows:

1. To address insufficient alarm collection capability, the subsystem designs and implements an alarm information collector based on the Kafka message queue. The collector avoids database queries against the alarm platform and completes alarm collection in time even during an alarm storm. It offers low latency and high throughput, and effectively solves the congestion that traditional alarm collection suffers under massive alarms.

2. To address insufficient fault location capability, the subsystem proposes a fault location method based on network alarm clustering. The method takes the alarm information generated by network devices as input, determines alarm correlation from the similarity of alarm information in three dimensions (time, space, and text), and then clusters the alarms. Finally, it generates faults together with a root cause analysis. The method builds its clustering model from historical alarms and does not rely on a rule library. It has strong fault location capability and high accuracy, and effectively improves the system's fault location capability under massive alarms.

3. To address insufficient alarm analysis capability, the subsystem provides a multi-dimensional alarm analysis method that presents a global alarm summary through indicators such as alarm level, content, and frequency. It helps operation and maintenance personnel quickly understand the network environment, which is conducive to discovering potential network risks.

4. To let operation and maintenance personnel understand fault details more intuitively, the subsystem also designs and implements a fault management module, which persists the faults generated by the system and supports fault queries under various conditions. To help personnel understand a fault quickly, the module also uses a fault propagation diagram to reproduce how the fault propagated through the network environment.

The system is now online and in use in many branches of network operators in several provinces across the country. It provides effective help in improving the accuracy of fault perception and fault location, reducing the duration of fault impact, and improving network quality.
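
The clustering step in item 2 of the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the thesis's actual method: the `Alarm` fields, the 300-second time window, the `links` set standing in for network topology, Jaccard word overlap as the text similarity, the weights, the threshold, and single-link grouping via union-find. The abstract states only that similarity in time, space, and text determines alarm correlation and that the model is built from historical alarms, which this sketch does not attempt.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Alarm:
    ts: float     # alarm time in seconds (hypothetical field)
    device: str   # name of the device that raised the alarm (hypothetical field)
    text: str     # alarm message (hypothetical field)


def time_sim(a, b, window=300.0):
    # 1.0 for simultaneous alarms, falling linearly to 0.0 at `window` seconds apart.
    return max(0.0, 1.0 - abs(a.ts - b.ts) / window)


def space_sim(a, b, links):
    # `links` is an assumed set of frozenset({dev1, dev2}) pairs for directly connected devices.
    if a.device == b.device:
        return 1.0
    if frozenset((a.device, b.device)) in links:
        return 0.5
    return 0.0


def text_sim(a, b):
    # Jaccard overlap of lower-cased word sets, used here as a simple stand-in.
    ta, tb = set(a.text.lower().split()), set(b.text.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def similarity(a, b, links, weights=(0.4, 0.3, 0.3)):
    # Weighted sum over the three dimensions; the weights are arbitrary.
    w_time, w_space, w_text = weights
    return (w_time * time_sim(a, b)
            + w_space * space_sim(a, b, links)
            + w_text * text_sim(a, b))


def cluster_alarms(alarms, links, threshold=0.5):
    # Single-link clustering: merge any two alarms whose similarity reaches the threshold.
    parent = list(range(len(alarms)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(alarms)), 2):
        if similarity(alarms[i], alarms[j], links) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, alarm in enumerate(alarms):
        groups.setdefault(find(i), []).append(alarm)
    # Each cluster is returned sorted by time, earliest alarm first.
    return [sorted(group, key=lambda a: a.ts) for group in groups.values()]
```

In this sketch, two "link down" alarms raised 10 seconds apart on directly connected switches would fall into one cluster, while an unrelated alarm raised much later on another device would form its own. Treating the earliest alarm of a cluster as a root cause candidate is one simple heuristic; the abstract does not say how the thesis performs root cause analysis.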
Keywords/Search Tags: Network Operation and Maintenance, Alarm Management, Alarm Collection, Alarm Clustering