
The Research And Application Of Security Log Clustering Mining Algorithm Based On Hadoop Platform

Posted on: 2016-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: R Su
Full Text: PDF
GTID: 2308330482477559
Subject: Software engineering
Abstract/Summary:
We are living in an era of information explosion. With the rapid development of information technology, online log data is growing at an unprecedented scale and is characterized by large volume, great variety, high velocity, and low value density. Relational databases are designed mainly for storing and processing structured data, but real-world data comes in huge amounts and in a variety of formats, and it places different demands on computation. A single-host system that centralizes log data for storage and computation therefore cannot meet the requirements of data analysis at this scale, and combining cluster-based distributed storage with a parallel computing architecture becomes the inevitable choice for big data processing.

To address this problem, this thesis focuses on applying a Hadoop-based security log clustering algorithm to data mining. It analyzes the deficiencies of traditional relational databases in storing and managing large-scale data with heterogeneous structures, examines how Hadoop and traditional relational databases can cooperate in storage and management, and proposes a new Hadoop-based clustering analysis framework for security logs. It then describes the security log clustering process built on this framework in detail and analyzes the key technologies involved. The main contributions are as follows.

1. Based on an analysis of the characteristics and shortcomings of relational databases, a new cooperative framework combining the Hadoop platform with a relational database is proposed. By integrating the relational database with Hadoop, the framework deploys data storage and computation across all nodes of the cluster and establishes a unified data storage and processing architecture. It uses the cluster's parallel computing power and storage capacity to analyze log data, which solves the problem of fast storage and analysis of large-scale logs.

2. To uncover hidden relationships among log records, and after comparison with commonly used data analysis methods, a K-means clustering method based on MapReduce is put forward. Implementing the K-means clustering algorithm on the MapReduce distributed computing framework makes it possible to analyze the latent connections and rules in the log data, and then to evaluate the security level of the log data and provide early warning (EWS).

3. Building on the two points above, a clustering analysis system for network security log data was established on the Hadoop platform. The system has been applied to the security monitoring service platform of a supermarket in Shaanxi Province, where it analyzes the log records of all the security equipment on the platform. It thereby supports the management and monitoring of the platform and improves the efficiency of data storage and analysis.

In short, in-depth research on and application of the key technologies of the Hadoop-based security log clustering mining algorithm allows the relational database to be used appropriately and allows large-scale network security logs to be stored, managed, mined, and analyzed efficiently.
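
The abstract does not give the details of the thesis's implementation, so the following is only a minimal sketch of how one K-means iteration is commonly split into map and reduce steps. It runs in plain Python with no Hadoop dependency; the function names (map_point, reduce_cluster, kmeans_iteration, kmeans) and the idea that each log record has already been converted to a numeric feature vector are assumptions made for illustration, not the author's code.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(values):
    """Reduce step: average all points that were assigned to one centroid."""
    dims = len(values[0][0])
    total = [0.0] * dims
    count = 0
    for point, n in values:
        for d in range(dims):
            total[d] += point[d]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """One MapReduce-style pass: map every point, group by key, reduce."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)  # stands in for the shuffle phase
    # A centroid that attracted no points keeps its old position.
    return [reduce_cluster(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))]


def kmeans(points, centroids, max_iter=20, tol=1e-6):
    """Repeat passes until no centroid moves more than tol (hypothetical defaults)."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift < tol:
            break
    return centroids
```

On a real cluster each pass would typically run as a separate MapReduce job, with the current centroids distributed to every mapper and the driver program deciding when to stop; the loop in kmeans plays that driver role here.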
Keywords/Search Tags: Hadoop, Hive, MapReduce, K-means clustering, log analysis