
The Research Of Automatically Simplifying Large-scale Forensic Logs For Intrusion Forensics

Posted on: 2015-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
Full Text: PDF
GTID: 2308330461457924
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous advance of information technology, computer crimes (e.g., hackers’ intrusions) have become a destabilizing factor that cannot be ignored, directly affecting national security, the economy, culture and other areas of society. Research on intrusion forensics is therefore important for fighting computer crime and improving the security of computer networks. Logs record the behavior of users, applications and systems, so they are an important source of candidate evidence for intrusion forensic analysis.

However, current logs still have problems, the most crucial of which is the size of the log data set. The weekly volume can reach thousands or even millions of events. As a result, much useful information (e.g., events related to attacks) is submerged in redundant events, which makes analysis in intrusion forensics more difficult.

This thesis presents a novel method that automatically simplifies forensic logs in parallel, based on information theory and feature weights. The method divides the log data vertically by attribute using MapReduce on Hadoop, an open-source framework. For each attribute subset, we study the relationship between individual attributes through mutual information and entropy. An attribute is treated as independent when it has high entropy and low mutual information with the other attributes. Each selected attribute is then given its entropy as a weight, and scores are computed from these weights. The scores are sorted in descending order, and thresholds are set to remove redundant log records, which yields the intermediate results. Finally, a second reduction is applied to the remaining logs with a custom-designed function. We ran experiments on several typical data sets from Windows and Linux. The experiments indicate that the method is fast and efficient, requires no prior knowledge and little manual intervention, and is suitable for large-scale data.
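
The entropy and mutual-information step described in the abstract can be illustrated with a small single-machine sketch. It is not the thesis's implementation: the abstract gives neither the selection rule, the scoring formula, nor the thresholds, so the functions `select_attributes` and `score_records`, the parameter `max_mi`, the rarity-based record score, and the sample records are all assumptions made for illustration. The per-attribute `columns` dictionary stands in for the vertical division that the thesis performs with MapReduce.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def select_attributes(records, max_mi):
    """Hypothetical rule: visit attributes by decreasing entropy and keep one
    only if its mutual information with every kept attribute is <= max_mi."""
    # One column per attribute: a stand-in for the vertical division.
    columns = {name: [r[name] for r in records] for name in records[0]}
    weights = {name: entropy(col) for name, col in columns.items()}
    kept = []
    for name in sorted(weights, key=weights.get, reverse=True):
        if all(mutual_information(columns[name], columns[k]) <= max_mi
               for k in kept):
            kept.append(name)
    return kept, weights

def score_records(records, kept, weights):
    """Assumed score: entropy-weighted rarity (-log2 of the relative frequency)
    of each record's value, summed over the kept attributes."""
    n = len(records)
    freq = {a: Counter(r[a] for r in records) for a in kept}
    return [sum(weights[a] * -math.log2(freq[a][r[a]] / n) for a in kept)
            for r in records]

# Made-up sample records; the field names are illustrative only.
records = [
    {"event_id": "4624", "user": "alice", "source": "Security"},
    {"event_id": "4624", "user": "alice", "source": "Security"},
    {"event_id": "4624", "user": "alice", "source": "Security"},
    {"event_id": "4625", "user": "mallory", "source": "Security"},
]
kept, weights = select_attributes(records, max_mi=0.1)
scores = score_records(records, kept, weights)
# Sort record indices by descending score; a threshold would cut this list.
ranked = sorted(range(len(records)), key=lambda i: scores[i], reverse=True)
```

In this sketch the `user` column is dropped because it carries the same information as `event_id`, and the rare record ranks first. A real deployment would compute the per-attribute counts in MapReduce jobs rather than in memory.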
Keywords/Search Tags:Intrusion forensics, Log data set, information theory, feature weight, MapReduce