Multi-source Heterogeneous Secure Data Processing Analysis Based On Multi-manifold Learning

Posted on:2019-01-08

Degree:Master

Type:Thesis

Country:China

Candidate:S Xiao

Full Text:PDF

GTID:2428330548487410

Subject:Engineering

Abstract/Summary:

The analysis of multi-source security data is the foundation of network security analysis and prediction, and fusion analysis of multi-source data is an important method for processing security data. Log data record changes in system status. Manifold learning has been a widely used method for dimensionality reduction and feature extraction over the last decade. It draws on computer science, mathematics, intelligence science, and cognitive science, and has become a major research focus in machine learning. Using manifold learning, this thesis divides the fusion analysis of multi-source heterogeneous security data into three parts: multi-source data preprocessing, feature extraction, and security analysis.

The first part is the preprocessing stage, which handles the preprocessing of multi-source security data. Security data generally reside in network security devices. To reduce the heterogeneity of multi-source data in semantics, time, and space, and to remove dirty data, this thesis proposes a data preprocessing method based on a manifold learning algorithm. First, the data are filtered to identify noisy records, along with other data cleaning operations. Then the manifold learning algorithm is used to reduce the number of data sources and the volume of data, along with other data reduction operations, to obtain high-quality data.

The second part is the security feature extraction stage, which extracts features from the preprocessed data. To analyze multi-source heterogeneous data sources and select reasonable features that reveal the essential characteristics of the data, a feature extraction method based on a multi-manifold learning algorithm is proposed, which takes into account the category attributes and distance information of the multi-source data.

The third part is the security analysis stage, which analyzes the security of the extracted data features. The random forest algorithm is widely used because it is easy to construct, broadly applicable, and convenient to combine with other algorithms. However, the traditional random forest learning algorithm is time-consuming, tends to produce similar decision trees, and has low construction efficiency. Therefore, this thesis proposes a random forest construction method based on multi-manifold learning. It selects the essential attributes of the data to build decision trees and generate the random forest, which improves the accuracy of the random forest and effectively avoids the effects of noise and over-fitting.
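The abstract does not spell out the preprocessing procedure, so the following is only a minimal sketch of the kind of cleaning step it describes: mapping records from different sources onto one schema, aligning timestamps, and dropping dirty or duplicate entries. The two sources, their field names, and the sample records are all hypothetical, and the manifold-learning-based reduction itself is not reproduced here.

```python
from datetime import datetime, timezone

# Hypothetical raw records from two sources with different field names
# and time formats (illustrative only, not data from the thesis).
FIREWALL_LOGS = [
    {"ts": "2018-03-01T12:00:05+00:00", "src": "10.0.0.5", "action": "DENY"},
    {"ts": "2018-03-01T12:00:05+00:00", "src": "10.0.0.5", "action": "DENY"},  # duplicate
    {"ts": "not-a-time", "src": "10.0.0.9", "action": "ALLOW"},  # dirty timestamp
]
IDS_LOGS = [
    {"epoch": 1519905610, "source_ip": "10.0.0.5", "event": "alert"},
    {"epoch": 1519905700, "source_ip": "", "event": "alert"},  # missing source address
]


def normalize_firewall(rec):
    """Map a firewall record onto the shared schema; return None if unparsable."""
    try:
        when = datetime.fromisoformat(rec["ts"])
    except ValueError:
        return None
    if when.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        when = when.replace(tzinfo=timezone.utc)
    return {
        "time": when.astimezone(timezone.utc),
        "src_ip": rec["src"],
        "event": rec["action"].lower(),
        "source": "firewall",
    }


def normalize_ids(rec):
    """Map an IDS record (Unix epoch seconds) onto the shared schema."""
    return {
        "time": datetime.fromtimestamp(rec["epoch"], tz=timezone.utc),
        "src_ip": rec["source_ip"],
        "event": rec["event"].lower(),
        "source": "ids",
    }


def clean(records):
    """Drop unparsable, incomplete, and duplicate records; sort by time."""
    seen = set()
    kept = []
    for rec in records:
        if rec is None or not rec["src_ip"]:
            continue
        key = (rec["time"], rec["src_ip"], rec["event"], rec["source"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return sorted(kept, key=lambda r: r["time"])


unified = clean(
    [normalize_firewall(r) for r in FIREWALL_LOGS]
    + [normalize_ids(r) for r in IDS_LOGS]
)

for rec in unified:
    print(rec["time"].isoformat(), rec["source"], rec["src_ip"], rec["event"])
```

Converting every timestamp to UTC and every source to the same four fields addresses the temporal and semantic heterogeneity mentioned above in the simplest possible way; a real pipeline would likely need per-device parsers and a richer schema.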

Keywords/Search Tags:

Security analysis, Multi-source heterogeneous data, Data preprocessing, Multi-manifold learning, Random forest

Related items

1	The Methods Of Data Preprocessing And Behavior Analysis Prediction For Multi-source Log Fusion
2	Preprocessing Technology And Merge Of Multi-source Heterogeneous Logs
3	Combining Multi-source And Heterogeneous Data In Recommender Models And Systems
4	Multi-Manifold Learning Algorithm For Multi-Source Data Aggregation
5	Multi-source Heterogeneous Security Data Aggregation Based On Ontology
6	Campus Security Monitoring System For Multi-source Heterogeneous Data
7	Research On Key Technologies Of Targeted Cyber Attacks Detection Based On Multi-Source Heterogeneous Data
8	Research On Multi-manifold Learning Algorithm For High Dimensional Data
9	Research And Application Of Matrix Factorization Algorithms For Multi-source Heterogeneous Data
10	Multi Scale Analysis And Prediction Of Hybrid Frequency Data Based On Random Forest