| In recent years, the rapid development of information technology has driven a significant increase in service traffic and a rapid escalation of service demands. As the cornerstone of broadband national information infrastructure, optical fiber communication networks are evolving towards digitalization and intelligence. However, as networks grow ever larger and hardware resources become more complex and diverse, existing fault operation and maintenance methods can no longer meet the development needs of optical networks. Once an optical network fails, service data in key sectors such as the Internet, 5G, electric power, and finance may be lost or damaged, causing incalculable economic losses. Fault management technologies for optical networks, such as rapid monitoring, accurate identification, and early prediction, are therefore of great significance for ensuring national information security. This paper studies optical network fault management technology, analyzes the problems in current optical network fault operation and maintenance at the data and model levels, and proposes more effective solutions for optical network data collection, data balancing, and fault management model construction. The main work and innovations of this paper are as follows: (1) To address the high latency and coarse granularity of existing methods for collecting optical network monitoring data, a data collection scheme for optical networks based on in-band telemetry is proposed. The core idea of the scheme is to collect performance data as data packets pass through the nodes along a link, thereby achieving fine-grained collection. Simulation results show that the scheme collects network monitoring data at the speed of data packet transmission, achieving second-level collection of performance monitoring data. In addition, when traffic surges, the collected data show obvious spikes on Grafana, so micro-burst events are detected effectively. This provides more detailed data for analyzing the status of optical networks and lays a solid foundation for the subsequent fault management models. (2) To address the imbalance of fault data in optical network data, a time-series data balancing technique based on the Time-series Generative Adversarial Network (TimeGAN) is proposed. By learning the temporal correlations in real optical network data, TimeGAN captures the real data distribution more accurately and generates augmented data that conform to it. Experimental verification is carried out on actual data from a live network. The results show that the data generated by TimeGAN capture the trends of the various features over the monitoring period of the real data, which increases the number of available fault samples and balances the optical network data. This provides a more balanced dataset for building the subsequent fault management models. (3) To address the insufficient use of the correlation between different tasks in current optical network fault management, a multi-task learning model based on Progressive Layered Extraction (PLE) is proposed. By establishing multiple feature extraction networks and sharing expert networks among different tasks, the PLE-based multi-task model learns the information provided by different tasks and improves the performance of each task. Test results show that the multi-task model built in this paper achieves an accuracy of 99.47% in the fault identification task, and that the mean squared error in the multi-feature prediction regression tasks is no higher than 0.0073. Compared with a single-task model of similar structure, the PLE-based multi-task learning model achieves better performance by learning the correlation between multiple tasks. |
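
The in-band telemetry idea in contribution (1), in which a data packet gathers performance data from each node it passes, can be illustrated with a minimal sketch. This is not the scheme implemented in the paper: the class names, the per-hop fields (`queue_depth`, `timestamp_us`), the fixed forwarding delay, and the burst threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HopRecord:
    """Telemetry metadata appended by one node (hypothetical fields)."""
    node_id: str
    queue_depth: int    # packets queued at the node when this packet arrived
    timestamp_us: int   # arrival time at the node, in microseconds


@dataclass
class Packet:
    payload: bytes
    telemetry: List[HopRecord] = field(default_factory=list)


class Node:
    def __init__(self, node_id: str, delay_us: int) -> None:
        self.node_id = node_id
        self.delay_us = delay_us  # assumed fixed per-hop forwarding delay
        self.queue_depth = 0      # set by the caller to mimic load

    def forward(self, packet: Packet, now_us: int) -> int:
        """Append this hop's record to the packet and return the departure time."""
        packet.telemetry.append(
            HopRecord(self.node_id, self.queue_depth, now_us)
        )
        return now_us + self.delay_us


def send_along_path(path: List[Node], packet: Packet, start_us: int = 0) -> Packet:
    """Carry the packet through every node in order, collecting per-hop data."""
    now_us = start_us
    for node in path:
        now_us = node.forward(packet, now_us)
    return packet


def detect_bursts(packet: Packet, threshold: int) -> List[str]:
    """Return the nodes whose reported queue depth exceeds the threshold."""
    return [r.node_id for r in packet.telemetry if r.queue_depth > threshold]


if __name__ == "__main__":
    path = [Node("A", 5), Node("B", 7), Node("C", 3)]
    path[1].queue_depth = 40  # simulate a traffic surge at node B
    pkt = send_along_path(path, Packet(b"data"))
    print([(r.node_id, r.timestamp_us) for r in pkt.telemetry])
    # [('A', 0), ('B', 5), ('C', 12)]
    print(detect_bursts(pkt, threshold=32))
    # ['B']
```

Because the records travel with the packet itself, the collection granularity in this sketch is one sample per packet per hop, which is the property the abstract relies on for spotting micro-bursts.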