Data Stream Clustering And Telecom Data Management

Posted on: 2009-06-10, Degree: Doctor, Type: Dissertation
Country: China, Candidate: J L Chang
GTID: 1118360272958844, Subject: Computer software and theory
Abstract/Summary:
With the rapid development of integrated circuit, Internet, and video technologies, the computing and transmission capabilities of IT systems have improved greatly. At the same time, data volumes have become very large, and storage capacity that was once unimaginable is now available even on desktop systems. Such huge data volumes raise a new problem: how to process the data. The data are often generated as continuous streams that are unpredictable, bursty, and unbounded, and traditional data processing technologies cannot handle this new data model. Recently, many new models and analysis methods have attracted wide attention, and some results have been applied successfully in telecommunications, finance, and other fields.

Broadband services contribute substantially to profit growth for every telecom operator, but continuously growing network traffic raises a new problem: how to balance bandwidth against traffic so that the IP network stays stable and healthy. This is a major challenge that every operator must face. Several traffic monitoring tools are commonly deployed in IP networks; they fall into three types according to how network traffic is collected: traffic mirroring, SNMP, and Netflow. Together with these traffic monitoring techniques, data stream algorithms and management systems have attracted wide attention from academia and industry.

In this thesis, we study the problem of clustering data streams over sliding windows, the problem of Top-N ranking, and the problem of anomaly detection over IP traffic. We propose novel algorithms and an analysis system that has been deployed in the IP network maintenance system of Shanghai Telecom. The main contributions of this thesis include the following three aspects:

1. Two types of exponential histogram of cluster features, false positive and false negative, are introduced. With these structures, a clustering algorithm based on sliding windows is proposed. The algorithm can accurately capture the distribution of recent records with limited memory, and can therefore produce clustering results over sliding windows. Furthermore, it can be extended to handle clustering over the N-n window (an extended model of the sliding window). Theoretical analysis and comprehensive experimental results demonstrate that the proposed method achieves high clustering quality, low memory usage, and a fast processing rate. A real system based on this algorithm has been designed.

2. SMART, a real-time network traffic monitoring system, is introduced. Based on data stream methods and aimed at the Top-N problem, the system converts raw Netflow data in different formats (Netflow V5 or V9) into user-defined control flows through combination and filtering. It can compute the Top-N frequent flows over a sliding window, detect bursts on arbitrary attributes, and present results visually to users. The system could replace the traditional offline monitoring system used at Shanghai Telecom. In daily around-the-clock (7*24) operation, its processing speed reaches 30,000 flows/s. Advanced streaming algorithms and a robust system architecture enable SMART to achieve good performance.

3. Based on PCA (Principal Component Analysis), we propose RealMon, a real stream monitoring system. RealMon can monitor the huge volume of SNMP (Simple Network Management Protocol) messages and, by observing the base vector L, detect anomalies without interference from network jitter. Based on correlation analysis between different ports of network devices, RealMon can monitor thousands of network links and help network administrators find anomalies. To enhance stability and reduce false alarms, RealMon performs data cleaning over the data stream.

To sum up, combining streaming techniques with telecom network traffic analysis has promising prospects. We study three problems in telecom network traffic analysis and propose a complete solution that includes algorithms, synopsis data structures, and a system implementation. Theoretical analysis and experimental results show that our algorithms are well suited to real-time data streams and outperform other methods in space requirements, processing rate, and result quality. These theories and systems originate from real applications and can ultimately be used to solve practical problems.
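
The exponential histogram of cluster features named in the first contribution can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give the structure's details, so the bucket layout, the merge threshold of ceil(1/eps) + 1 buckets per size (borrowed from the standard exponential histogram), and all names (ClusterFeature, SlidingWindowSummary, eps) are assumptions made for illustration. The sketch may count a bounded number of expired records from its oldest bucket; how that maps to the false positive and false negative variants is not specified here.

```python
from dataclasses import dataclass
from math import ceil
from typing import List, Sequence


@dataclass
class ClusterFeature:
    """Hypothetical summary of a run of consecutive stream records."""
    n: int            # number of records summarized
    ls: List[float]   # per-dimension linear sum
    ss: float         # sum of squared norms
    t: int            # timestamp of the newest record summarized

    def merge(self, newer: "ClusterFeature") -> "ClusterFeature":
        # Cluster features are additive, so merging is a field-wise sum.
        return ClusterFeature(
            self.n + newer.n,
            [a + b for a, b in zip(self.ls, newer.ls)],
            self.ss + newer.ss,
            max(self.t, newer.t),
        )


class SlidingWindowSummary:
    """Assumed exponential-histogram layout: buckets are kept oldest first,
    with sizes that are powers of two and never grow toward the newest end."""

    def __init__(self, window: int, eps: float) -> None:
        self.window = window
        self.limit = ceil(1 / eps) + 1   # assumed cap on buckets per size
        self.buckets: List[ClusterFeature] = []

    def add(self, point: Sequence[float], t: int) -> None:
        # Timestamps are assumed to be non-decreasing.
        # Drop buckets whose newest record has left the window (t - window, t].
        while self.buckets and self.buckets[0].t <= t - self.window:
            self.buckets.pop(0)
        self.buckets.append(
            ClusterFeature(1, list(point), sum(x * x for x in point), t)
        )
        # Cascade: whenever one size has too many buckets, merge its two oldest.
        size = 1
        while True:
            same = [i for i, b in enumerate(self.buckets) if b.n == size]
            if len(same) <= self.limit:
                break
            i = same[0]  # same-size buckets are contiguous, so i + 1 matches
            self.buckets[i:i + 2] = [self.buckets[i].merge(self.buckets[i + 1])]
            size *= 2

    def count(self) -> int:
        return sum(b.n for b in self.buckets)

    def centroid(self) -> List[float]:
        n = self.count()
        dims = len(self.buckets[0].ls)
        return [sum(b.ls[d] for b in self.buckets) / n for d in range(dims)]


# Feed 5,000 one-dimensional records through a window of the last 1,000.
summary = SlidingWindowSummary(window=1000, eps=0.1)
for t in range(1, 5001):
    summary.add((float(t),), t)
print(summary.count(), len(summary.buckets), summary.centroid())
```

Only the oldest bucket can contain records that have already left the window, so the summarized count overshoots the true window size by a small relative amount controlled by eps, while the number of buckets grows roughly logarithmically with the window length. A clustering step would then run over the merged cluster features rather than over the raw records.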
Keywords/Search Tags:data stream, clustering, network traffic monitoring, data cleaning, anomaly detection, Netflow