Study Of Machine Learning Application On Web-Based Network Security

Posted on: 2012-05-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X F Yang
GTID: 1118330371460555
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In recent years, the Web (World Wide Web) has boomed, in part because Web traffic is not restricted by common firewalls, and this has encouraged the porting of most legacy applications to Web applications. The popularization of the Web also has a dark side: Web-based attacks have become the number one threat on the Internet. Intrusion detection is the main countermeasure against attacks. However, the widely adopted misuse detection approach, which encodes the features of every known attack into signatures, has failed to keep up with the sharp increase in new attacks. Anomaly detection, which builds patterns of normal behavior and flags samples that deviate significantly from those patterns, is becoming a promising alternative. It usually adopts models and metrics from machine learning and data mining to build its detection model and procedure, and it is known for being able to detect new, previously unknown attacks.

A hidden Markov model (HMM) based grammar model is presented in this dissertation. HMMs have been applied successfully to problems such as voice recognition and handwriting recognition, and they have also proved to be good candidates for representing a regular grammar. The HMM-based grammar model effectively encodes the grammar of normal requests, and the similarity of a sample to the model is an appropriate measure of how anomalous the sample is. A principle based on maximizing the Bayesian posterior probability controls the generalization process, so that the grammar is neither over-generalized nor under-generalized.

Because the HMM-based grammar model suffers from high structural complexity and high computational complexity during learning, a DFA (Deterministic Finite Automaton) model is proposed to replace the HMM as the grammar representation. A DFA is much simpler than an HMM in both structural and computational complexity. Moreover, a DFA is itself a highly efficient classifier, so no additional classification mechanism is needed. The DFA is shown not only to simplify learning and detection, which is paramount in practical use, but also to retain detection performance almost as good as that of the HMM.

This dissertation also summarizes and compares the most frequently cited grammar-based models. The inner connections between the models are analyzed systematically, and a comparative experiment examines their advantages and disadvantages in terms of complexity, performance and special features.

Most supervised learning methods are burdened by the design of the training phase and by the laborious labeling of training samples, so their detection performance relies heavily on the quality of training. An unsupervised, clustering-based method is proposed. It works under the premise that, in a practical network stream in which normal and abnormal samples are mixed, normal samples are highly similar to each other and dominate in number. A bottom-up agglomerative clustering process separates the largest cluster, which represents the normal samples, from the remaining clusters, which represent anomalies. A minimum-error principle is adopted to determine the optimal stopping criterion.

A single detection model captures only one aspect of attacks and can hardly cope with practical network streams that contain varied attack types. A multi-model detection framework is proposed that maps the anomaly probabilities produced by multiple models into a unified high-dimensional feature space and performs detection with a kernel-based SVM classifier. This framework not only improves detection performance but also exhibits impressive flexibility.
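
To make the DFA-based grammar idea from the abstract more concrete, the following is a minimal sketch in Python. It is not the dissertation's algorithm, which the abstract does not specify. It assumes a simple character-class tokenization of query parameter values and builds a prefix tree acceptor, a trivial DFA with no state merging, for each parameter name. The names `tokenize`, `PrefixTreeDFA`, `train` and `is_anomalous` are hypothetical and chosen only for illustration.

```python
from urllib.parse import parse_qsl


def tokenize(value):
    """Map a parameter value to a token sequence (assumed tokenization).

    Runs of letters collapse to "A", runs of digits to "D";
    every other character is kept literally as its own token.
    """
    tokens = []
    for ch in value:
        if ch.isalpha():
            tok = "A"
        elif ch.isdigit():
            tok = "D"
        else:
            tok = ch
        # Collapse consecutive letter or digit tokens into one.
        if tok in ("A", "D") and tokens and tokens[-1] == tok:
            continue
        tokens.append(tok)
    return tokens


class PrefixTreeDFA:
    """A prefix tree acceptor: a DFA that accepts exactly the
    token sequences it was trained on (no state merging)."""

    def __init__(self):
        self.transitions = [{}]  # state index -> {token: next state}
        self.accepting = set()

    def add(self, tokens):
        state = 0
        for tok in tokens:
            nxt = self.transitions[state].get(tok)
            if nxt is None:
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][tok] = nxt
            state = nxt
        self.accepting.add(state)

    def accepts(self, tokens):
        state = 0
        for tok in tokens:
            state = self.transitions[state].get(tok)
            if state is None:
                return False
        return state in self.accepting


def train(normal_queries):
    """Build one DFA per parameter name from normal query strings."""
    models = {}
    for query in normal_queries:
        for name, value in parse_qsl(query, keep_blank_values=True):
            models.setdefault(name, PrefixTreeDFA()).add(tokenize(value))
    return models


def is_anomalous(models, query):
    """Flag a query if any parameter is unknown or rejected by its DFA."""
    for name, value in parse_qsl(query, keep_blank_values=True):
        dfa = models.get(name)
        if dfa is None or not dfa.accepts(tokenize(value)):
            return True
    return False


normal = ["id=42&name=alice", "id=7&name=bob", "id=1001&name=carol"]
models = train(normal)
benign = is_anomalous(models, "id=55&name=dave")
injection = is_anomalous(models, "id=1%27%20OR%201%3D1&name=alice")
unknown_param = is_anomalous(models, "cmd=ls")
```

In this sketch the DFA acts directly as the classifier: a request is normal only if every parameter value is accepted. All generalization comes from the tokenization step. A grammar inference method of the kind the abstract describes would presumably also merge states to generalize beyond the training sequences, which this sketch does not attempt.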
Keywords/Search Tags: intrusion detection, anomaly detection, Web attacks, HMM, grammar inference, text clustering, feature mapping, SVM classifier, kernel function