As central aspects of IP network management, performance evaluation and fault diagnosis have been active research topics in the network management area for years. To estimate network performance metrics and collect fault events, a great deal of research has been carried out in network measurement, and many measurement techniques have been developed. However, some issues remain unresolved. For example, evaluating the performance of many end-to-end paths in parallel in a large-scale network is difficult, and network fault diagnosis is imprecise when alarms are lost or spurious alarms occur.

In this thesis, a new network performance and fault management mechanism based on passive measurement is proposed to solve two key issues: (1) the online evaluation of network performance status, and (2) the real-time correlation and analysis of fault events. With this mechanism, end-to-end performance metrics are obtained through passive measurement, and a model-based reasoning system is developed to diagnose network faults on the basis of the performance evaluation. The mechanism involves several technical issues, including the tracking of large-scale IP flows, passive measurement techniques, and the reasoning model for fault diagnosis. These issues are discussed in this thesis, resulting in the following achievements:

(1) Proposing a high-speed flow tracking algorithm. Based on the locality of IP traffic, a scalable hash tree (SHT) algorithm is proposed for tracking IP flows in large-scale networks. The SHT algorithm shortens IP flow lookup time and is more efficient than the IPSX algorithm.

(2) Proposing measurement methods for estimating QoS/QoE metrics. Based on three basic QoS metrics, several "half-path" metrics are proposed and the corresponding passive measurement techniques are developed. A QoE evaluation model is also proposed to combine different performance metrics into a single rating score.

(3) Proposing a fault diagnosis method based on performance measurement. Building on the existing model-based reasoning (MBR) method, a network routing model (RM) is proposed, and a fault reasoning model called RMBR is designed to locate network faults on the basis of performance evaluation. Two key algorithms of RMBR, the routing path simulation (RPS) algorithm and the link metrics tomography (LMT) algorithm, are also proposed.

(4) Developing a network measurement and diagnosis system. The algorithms and models in this thesis are implemented in a prototype system called IDCFlow, which consists of several subsystems (RPPM, FaultMan, and NetView). IDCFlow has been deployed in CERNET and in several commercial networks.
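To make the flow-tracking problem in achievement (1) concrete, here is a minimal sketch of the baseline a flow tracker starts from: a hash table keyed on the 5-tuple and updated once per packet. It is not the SHT algorithm, whose internal structure this abstract does not describe. The names `FlowTable`, `FlowRecord`, the 5-tuple key, and the idle-timeout eviction policy are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical flow key: the classic 5-tuple
# (src IP, dst IP, src port, dst port, IP protocol number).
FlowKey = Tuple[str, str, int, int, int]


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    packets: int = 0
    byte_count: int = 0


class FlowTable:
    """Plain hash-table flow tracker (an illustrative baseline, not SHT)."""

    def __init__(self, idle_timeout: float = 30.0) -> None:
        self.idle_timeout = idle_timeout  # assumed eviction policy
        self.flows: Dict[FlowKey, FlowRecord] = {}

    def update(self, key: FlowKey, ts: float, size: int) -> FlowRecord:
        # One average-case O(1) hash lookup per packet; create the record on first sight.
        rec = self.flows.get(key)
        if rec is None:
            rec = FlowRecord(first_seen=ts, last_seen=ts)
            self.flows[key] = rec
        rec.last_seen = ts
        rec.packets += 1
        rec.byte_count += size
        return rec

    def expire(self, now: float) -> int:
        # Evict flows idle longer than the timeout; return how many were removed.
        stale = [k for k, r in self.flows.items() if now - r.last_seen > self.idle_timeout]
        for k in stale:
            del self.flows[k]
        return len(stale)


if __name__ == "__main__":
    table = FlowTable(idle_timeout=30.0)
    k = ("10.0.0.1", "10.0.0.2", 40000, 80, 6)
    table.update(k, 0.0, 60)
    table.update(k, 1.5, 1500)
    print(table.flows[k].packets, table.flows[k].byte_count)  # 2 1560
    print(table.expire(now=100.0))  # 1
```

A flat table like this costs one lookup per packet but its memory and lookup cost grow with the number of concurrent flows; the abstract presents SHT as a way to shorten this lookup at large scale by exploiting traffic locality.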