Big Log Data Analysis And Security Check Based On Frame

Posted on: 2016-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Zhou
Full Text: PDF
GTID: 2308330461478283
Subject: Software engineering
Abstract/Summary:
In today’s era of big data and cloud computing, devices such as mobile phones, smart wristbands, smart glasses and even trash cans connect to the network and have given rise to many applications that make people’s lives more convenient. At the same time, these devices generate a large number of system log files, which contain users’ login locations, device information, current status, blood pressure, heart rate and so on. Based on this information, one can analyze a user’s behavior, health status and more. How to process such a large volume of logs effectively and extract useful information from it has become one of the most active topics in academic research worldwide.

This thesis studies current log mining in depth in a big data environment. It first reviews classic papers on parallel log processing and examines the large-scale data processing and analysis methods currently available, and then designs a solution for inspecting user data and system security.

The data source is a set of log files from IBM’s smart logistics system for smart cities. On a mainframe, Hadoop serves as the parallel computing framework for large-scale parallel processing of the logs. The statistics produced by Hadoop are then fed to an SVM (support vector machine), which classifies the existing data on users’ habits and summarizes the habit patterns of different users. If abnormal behavior occurs, the system promptly warns the user by email or SMS to protect the user’s account and the system.

In addition, this thesis uses an improved active learning algorithm called G-AL (Gaussian active learning). It is based on a Gaussian kernel function, which is used to analyze the distribution of the training set, so that the selection engine in active learning is more inclined to choose new unlabeled samples from regions not yet covered by training samples. As a result, the training set becomes more widely scattered, which yields higher classification accuracy. To evaluate the performance of the proposed algorithm, experiments are conducted on four data sets. The experimental results show that G-AL outperforms the comparison algorithms, including MS and EQB. The system uses this algorithm to analyze unauthorized access requests in depth, so the classifier can learn more sample information at a lower cost, thereby improving the performance and security of the entire system.
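The selection idea behind G-AL, preferring unlabeled samples from regions the training set does not yet cover, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the `sigma` bandwidth, and the choice to score coverage as a plain sum of Gaussian kernel values are all assumptions made for illustration, and the real algorithm may combine this with other criteria.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def coverage(candidate, labeled, sigma=1.0):
    """Hypothetical coverage score: how densely the labeled (training)
    samples surround the candidate. Low values mean an uncovered region."""
    return sum(gaussian_kernel(candidate, p, sigma) for p in labeled)

def select_least_covered(unlabeled, labeled, k=1, sigma=1.0):
    """Return indices of the k unlabeled samples with the lowest coverage."""
    scores = [coverage(u, labeled, sigma) for u in unlabeled]
    order = sorted(range(len(unlabeled)), key=lambda i: scores[i])
    return order[:k]

# Toy data (invented for this example): the labeled points cluster near the origin.
labeled = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
unlabeled = [(0.05, 0.05), (5.0, 5.0), (1.0, 1.0)]

picked = select_least_covered(unlabeled, labeled, k=1)
```

In this toy setup the point far from the labeled cluster is the one selected for labeling, which is how a selection rule of this kind spreads the training set over the feature space.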
Keywords/Search Tags: Big Data, Hadoop, SVM, Active Learning