The Methods Of Data Preprocessing And Behavior Analysis Prediction For Multi-source Log Fusion

Posted on: 2018-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: J W Zhu
Full Text: PDF
GTID: 2348330542490802
Subject: Computer Science and Technology
Abstract/Summary:
Multi-source log data is the basis of user behavior analysis and prediction in the network security field, and fusion analysis has proved to be an important means of carrying out that analysis and prediction. Traditional data fusion analysis is divided into three levels, namely the data level, the feature level and the decision level, and each level meets different user needs. Given the space-time heterogeneity at the decision level and the similarity between feature-level and data-level processing methods, this paper merges the data level and the feature level into one, so that the log analysis structure has two levels and the complexity of fusion analysis is reduced. Taking into account the characteristics of multi-source log data sources and of data analysis in a big data environment, this paper divides the log fusion analysis process into the following two stages.

The task of the primary stage is mainly to preprocess the data source. Security log data sources are ubiquitous and readily available, but they have some shortcomings, such as diverse types, a high growth rate and heavy data noise. To reduce spatio-temporal heterogeneity, data size and data noise, this paper presents a preprocessing method based on the similarity join. First, the similarity join operations perform flexible data cleaning on the data source by adjusting the threshold dynamically, thereby improving data quality. Second, the efficiency of preprocessing is improved by 43.91% with the improved IAE-MapReduce parallelization.

The task of the advanced stage is mainly to carry out behavior analysis and prediction on the preprocessed data. The decision tree learning algorithm is widely used because its model is broadly applicable and easy to construct. However, the traditional decision tree learning algorithm suffers from an unreasonable choice of measured attributes and low construction efficiency. To select reasonable measured attributes and reveal the essential characteristics of the data, this paper proposes a feature extraction method based on manifold learning, so that the decision tree is constructed from the essential attributes of the data; this reduces its false alarm rate and false negative rate by 42.16% and 46.26%, respectively, and avoids over-fitting. To improve the efficiency of decision tree construction, this paper proposes a measured-attribute caching method that uses a caching tree to cache the attribute nodes, which improves both the speed of decision tree construction and the stability of the decision tree.

In summary, this paper presents a method of data preprocessing and behavior prediction for multi-source logs, and realizes the analysis and prediction of the security situation in a cloud environment. Simulation results show that, compared with existing MapReduce preprocessing methods and with ID3-based behavior analysis and prediction, the proposed method achieves higher efficiency and accuracy.
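As a rough illustration of the threshold-based cleaning step in the primary stage, the following minimal sketch drops log lines that are too similar to a line already kept. The abstract does not name the similarity measure, the tokenization, or how the threshold is adjusted, so Jaccard similarity over whitespace tokens, the function names and the sample log lines are all assumptions made for this example; it is also a single-machine sketch, not the parallel IAE-MapReduce procedure.

```python
from typing import FrozenSet, Iterable, List, Tuple


def tokens(line: str) -> FrozenSet[str]:
    """Lower-case a log line and split it into a set of whitespace tokens."""
    return frozenset(line.lower().split())


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| (assumed measure, not from the thesis)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe_by_similarity(lines: Iterable[str], threshold: float) -> List[str]:
    """Keep a line only if its similarity to every kept line is below threshold.

    Lowering the threshold cleans more aggressively; raising it keeps more lines.
    """
    kept: List[Tuple[str, FrozenSet[str]]] = []
    for line in lines:
        t = tokens(line)
        if all(jaccard(t, kept_tokens) < threshold for _, kept_tokens in kept):
            kept.append((line, t))
    return [line for line, _ in kept]


# Hypothetical sample records, invented for illustration only.
SAMPLE_LOGS = [
    "Failed login for user alice from 10.0.0.1",
    "failed login for user alice from 10.0.0.1",
    "Failed login for user alice from 10.0.0.2",
    "Disk quota exceeded on /var",
]

strict = dedupe_by_similarity(SAMPLE_LOGS, threshold=0.9)
loose = dedupe_by_similarity(SAMPLE_LOGS, threshold=0.7)
```

With the higher threshold only the exact duplicate (after lower-casing) is removed, while the lower threshold also merges the two login failures that differ only in the source address. A pairwise scan like this is quadratic in the number of kept lines, which is one reason a parallel similarity join becomes attractive at log scale.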
Keywords/Search Tags: Multi-source Log Preprocessing, Behavior Analysis Prediction, Parallel Similarity Join, Decision Tree, Manifold Learning