Clustering Algorithms and Their Application in Log Data Processing

Posted on: 2012-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Zhang
GTID: 2178330332489975
Subject: Computer software and theory
Abstract/Summary:
With the development of science and technology, computer networking has grown rapidly in China and has greatly influenced how we work, live, and learn, bringing great convenience. However, alongside these advantages, computer networks also give rise to a variety of network security problems. To address these problems, log-based methods and techniques have become a widely accepted research direction. Among them, taking log data as the research object, applying clustering algorithms to compress the scale of logs is a particularly suitable approach.

Because traditional clustering algorithms cannot be applied directly to log data, this thesis first studies clustering algorithms in depth. It discusses the definition of clustering, the origin and development of clustering algorithms, and the data types they handle. It then gives a general description of the main branches of traditional clustering: partitional, hierarchical, density-based, grid-based, and model-based clustering. It summarizes and analyzes the existing problems of clustering algorithms and the areas for improvement. To address these problems, and taking into account the characteristics of network logs and system logs, the major research work is as follows:

1. A grid-based two-step clustering algorithm for network logs is designed and proposed. Multi-protocol log data is divided into grids, and clustering is performed twice, once inside the grids and once outside them, to generate the final clusters. The algorithm does not require the parameter k to be set; it determines the number of clusters itself. It handles dynamic data: it supports incremental clustering, removal of clustered data, and processing of newly arrived logs. Experiments show that the algorithm compresses the scale of the logs significantly without damaging their integrity or reliability and without affecting users' network access.

2. An event-mapping-based clustering algorithm is designed and proposed. For operating system logs, security logs, and application logs, a unified clustering algorithm is designed to produce a unified, generalized description of user behavior. The algorithm is built on a study of the mapping relationship between logs and events. Drawing on ideas from event correlation, and based on a summary of experiments, the mapping relationship between logs and events is established. The algorithm makes full use of prior knowledge about operating system, security, and application logs, which reduces the complexity of clustering. It is easy to implement, fast, and has low time complexity. The event descriptions it generates are accurate, complete, and easy to understand and identify, making them a high-quality data source for later security research.
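The abstract does not give the details of the grid-based two-step algorithm in item 1, so the following is only a minimal sketch of the general idea, not the thesis's actual method. It assumes each log record has already been reduced to two numeric features, and the names `grid_two_step_cluster`, `cell_size`, and `min_pts` are hypothetical. Step 1 groups records inside each grid cell and keeps the dense cells; step 2 merges dense cells that touch each other. The number of clusters falls out of the merge, so no k is supplied. Incremental updates and deletion, which the abstract also mentions, are not covered here.

```python
from collections import defaultdict, deque


def grid_two_step_cluster(points, cell_size=1.0, min_pts=2):
    """Cluster 2-D points in two steps without a preset cluster count k.

    Step 1 groups points by grid cell and keeps cells holding at least
    min_pts points. Step 2 merges dense cells that touch each other
    (including diagonally) into final clusters.
    Returns a list of clusters, each a sorted list of point indices.
    Points that fall in sparse cells are left out as noise.
    """
    # Step 1: inside-grid grouping (assumed feature space is 2-D).
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)
    dense = {c for c, members in cells.items() if len(members) >= min_pts}

    # Step 2: merge neighbouring dense cells with a breadth-first search.
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            cx, cy = queue.popleft()
            members.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(sorted(members))
    return clusters


# Hypothetical numeric features standing in for preprocessed log records.
sample = [(0.1, 0.2), (0.4, 0.9), (1.2, 0.5), (1.8, 0.1),
          (7.0, 7.1), (7.5, 7.9), (4.0, 4.0)]
result = grid_two_step_cluster(sample)
print(result)
```

In this sketch the first four records end up in one cluster because their two dense cells are adjacent, the next two form a second cluster, and the isolated last record is treated as noise. Choosing `cell_size` and `min_pts` is a tuning decision that the abstract does not address.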
Keywords: Clustering Algorithm, Log, Data Mining, Grid-based Clustering, Event Correlation