Hybrid Intrusion Detection With Clustering-Outlier Technique And Incremental SVM Classification

Posted on: 2016-11-06; Degree: Doctor; Type: Dissertation
Country: China; Candidate: Chitrakar Roshan
GTID: 1108330461953063; Subject: Information security
Abstract/Summary:
With the rapid and widespread growth of internet technology, security risks and threats are increasing day by day. Newer versions of attacks and intrusions evolve continuously, posing extra challenges to the field of intrusion detection. In this context, this thesis proposes a hybrid approach to intrusion detection along with a hybrid architecture for an intrusion detection system (IDS). The proposed architecture is flexible enough to perform intrusion detection tasks using either a single hybrid module or multiple hybrid modules. "Clustering-Outlier detection followed by SVM classification" is proposed as the first hybrid IDS module to be used in the architecture, and "Incremental SVM with Half-partition method" as the second.

Clustering-Outlier detection is an algorithm developed in this research that combines k-Medoids clustering and outlier analysis such that both operations are carried out simultaneously. k-Medoids was selected on the basis of a simulation experiment which found that k-Medoids clustering outperforms k-Means clustering when used for detecting anomalies in large databases or network traffic data. The research shows that k-Medoids consistently yields higher accuracy and detection rates together with lower false positive rates, whether it is followed by a Naive Bayes (NB) classification or an SVM classification.

This thesis also suggests that SVM classification is more suitable than NB classification for an IDS. To establish this, a simulation experiment compares Clustering-Outlier detection followed by NB classification with Clustering-Outlier detection followed by SVM classification. The combination with SVM is shown to be better in terms of accuracy, detection rate and false alarm rate.
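The idea of running k-Medoids clustering and outlier analysis in the same loop can be illustrated with a minimal sketch. It is not the thesis's algorithm, whose details are not given in this abstract. The function names, the Euclidean distance, and the outlier rule (a point is flagged when its distance to the nearest medoid exceeds `factor` times the mean of those distances) are all assumptions made for illustration.

```python
import math

def assign(points, medoids, factor):
    """Assign each point to its nearest medoid and flag distant points."""
    nearest = [min(range(len(medoids)), key=lambda j: math.dist(p, medoids[j]))
               for p in points]
    d = [math.dist(p, medoids[c]) for p, c in zip(points, nearest)]
    # Assumed outlier rule: distance above factor x mean nearest-medoid distance.
    threshold = factor * sum(d) / len(d)
    outlier = [di > threshold for di in d]
    return nearest, outlier

def cluster_with_outliers(points, medoids, factor=3.0, max_iter=20):
    """Hypothetical sketch: k-Medoids where outliers are flagged in every
    assignment step and excluded from the medoid update."""
    medoids = list(medoids)
    for _ in range(max_iter):
        nearest, outlier = assign(points, medoids, factor)
        updated = []
        for j, old in enumerate(medoids):
            members = [p for p, c, o in zip(points, nearest, outlier)
                       if c == j and not o]
            # New medoid: the inlier with the smallest total distance
            # to the other inliers of its cluster.
            updated.append(
                min(members, key=lambda m: sum(math.dist(m, q) for q in members))
                if members else old)
        if updated == medoids:
            break
        medoids = updated
    # Recompute labels so they match the final medoids.
    nearest, outlier = assign(points, medoids, factor)
    return medoids, nearest, outlier

# Toy data (made up): two tight groups and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
medoids, labels, flags = cluster_with_outliers(points, [(0, 0), (10, 10)])
print(medoids, flags)
```

On this toy data only the distant point (50, 50) is flagged, and it does not pull the medoid of the second group towards itself, because flagged points are left out of the update step.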
Clustering-Outlier detection is therefore followed by an SVM classification to form the first IDS module.

In the second module of the proposed hybrid IDS architecture, the Half-partition strategy is adopted to reduce the time and space complexity of incremental SVM classification. In the Half-partition strategy, the support vectors identified in the current iteration of incremental SVM are selected and retained in a smarter way for the next iteration. Using this method, an algorithm named the Candidate Support Vector (CSV) selection algorithm is developed, which runs twice as fast and needs half the storage space compared to other incremental support vector machine (ISVM) algorithms. The CSV-ISVM algorithm is thus proposed as the final piece of this thesis.

This and the following few paragraphs explain the problems and motivations behind this thesis. The intrusion detection system (IDS) has been established as an essential and indispensable component of the overall network security and defense system. At present, a wide range of attacks and threats is increasing day by day along with rapidly growing network technologies and the Internet. Uncontrolled databases and web servers are constantly targeted by intruders. This thesis therefore chooses IDS as its major research topic.

Clustering techniques such as k-Means and k-Medoids need to be applied in IDS to handle big data and multimedia. Various IDS and IPS have long been deployed to protect and secure information, especially in network environments. Most of them work well with known attacks and with small volumes of data or network traffic. With the growth of big and multimedia databases, an IDS is required that can detect attacks in huge data samples in an acceptably short time.

There is a need for new techniques that detect anomalies more efficiently.
In recent years, data mining approaches have been proposed and used as detection techniques for discovering anomalies and unknown attacks. These approaches have achieved high accuracy and good detection rates, but with moderate false alarm rates on novel attacks. In addition, some attacks and normal connections are not detected correctly. Hence, there is a need to identify such normal instances and attacks accurately in an interconnected network.

Most IDS-related work focuses on increasing accuracy and detection rates. As a consequence, many approaches give rise to false positives and sometimes fail to detect new threats such as zero-day attacks. Outlier analysis is therefore considered necessary in order to detect new anomalies efficiently and to help reduce false alarms when classifying attacks. This thesis accordingly develops an algorithm that combines clustering with outlier detection.

Time complexity has always been a major concern in IDS. With big and multimedia data traffic, online detection generally takes longer and thus degrades network speed. The classification of normal and abnormal traffic should therefore be done in as little time as possible, which makes a fast classification method such as Naive Bayes (NB) or SVM necessary. This thesis addresses this necessity with better SVM approaches.

NB classifiers are based on a very strong independence assumption and are fairly simple to construct. They work well when the data are well distributed. When NB is combined with k-Medoids clustering, time complexity increases as the size of the data grows. To address the time complexity, Naive Bayes classification could therefore be replaced with a better supervised learning method, e.g. a Support Vector Machine (SVM), which can produce a high detection rate from a small data sample.
Moreover, the time consumed by SVM should also be reduced to improve its performance, which justifies designing a new algorithm. This thesis therefore reduces the time consumption of incremental SVM classification by introducing new methods.

As new threats and attacks appear and new security scenarios develop day by day, it has become necessary for an IDS to learn continuously from every new network scenario. SVM classification accordingly needs an incremental technique to be incorporated before it is used in an IDS. A new Half-partition method is therefore suggested to reduce the time taken by incremental SVM classification.

The implementation aspect of IDS should also be taken into consideration. Since network attacks are quite unpredictable, the security infrastructure in which the IDS is implemented should be flexible enough to incorporate the necessary techniques in the required ways. Such an IDS architecture should therefore also be sought, in order to provide a number of options and combinations of detection techniques or components.

With the aforementioned motivation, this research is carried out with the aim of providing a flexible IDS infrastructure and proposing a hybrid IDS with the k-Medoids-Outlier method and an incremental SVM classification scheme. Other objectives of this work are: (1) to detect intrusion in real time, (2) to guarantee the predictability of the model, (3) to handle infrequent patterns, and (4) to reduce false alarm rates.

To meet these objectives, special attention is paid to ensuring that the time taken to build the model and detect anomalies does not create extra overhead on the web servers. The predictability of the model is tested to make sure that it can always produce the desired accuracy in detecting attacks. The proposed model is also able to handle infrequent normal patterns or anomalies and to learn from them in order to classify correctly.
Moreover, an iterative detection technique is used in the proposed model to minimize false alarm rates.

The methodology adopted by the thesis is explained below. The thesis first carries out a comparative study of the k-Means and k-Medoids clustering techniques in order to find out which is more suitable for a real-time IDS. For this, each clustering method is followed by Naive Bayes classification, and the results are analysed on the basis of intrusion detection parameters.

It also designs an algorithm called the "Clustering-Outlier Detection algorithm" that unifies k-Medoids clustering and outlier detection while keeping the clustering quality of k-Medoids intact and without increasing the time complexity of the algorithm. This algorithm is then combined with a classification method for use in an IDS.

This work also compares NB and SVM classification in a simple simulation experiment to see whether SVM can perform more quickly than NB (using as few data samples as possible).

This work modifies and improves incremental SVM classification: the newly proposed "Half-partition strategy" selects and retains "Candidate Support Vector (CSV)" sets. A new algorithm named "Candidate Support Vector based Incremental SVM", or CSV-ISVM, implements the proposed strategy.

Separate experiments are carried out for the different approaches and pieces of research, and combined experiments are done wherever necessary. The types of experiments include, but are not limited to, data pre-processing and extraction, clustering, outlier detection, classification and cross-validation. The Kyoto2006+ data set is used in all the experiments. The experiments related to clustering and outlier detection are evaluated on the basis of clustering quality and execution time. They are compared with other similar methods, and the methods proposed by this research are found to be better.
For the experiments related to classification methods, the evaluation criteria are the performance, accuracy, detection rate and false positive rate of the classification scheme and, in some cases, the execution time.

The results of the research show that:

(1) On large data sets, k-Medoids clustering followed by Naive Bayes classification is shown to be more effective than k-Means clustering in terms of accuracy and detection rate. The method also reduces the false alarm rate at the same time.

(2) Combining SVM classification with the k-Medoids-Outlier detection method produces better accuracy, detection rate and false alarm rate. This approach is shown to be better than the combination of k-Medoids with Naive Bayes classification.

(3) The new CSV-ISVM algorithm, which implements the proposed Half-partition strategy, is shown to run twice as fast with just half the data samples (support vectors) compared to other similar incremental SVM methods.

All the proposed approaches have enhanced the detection rate while keeping false positive rates to a minimum. The proposed algorithms, i.e. the Clustering-Outlier Detection algorithm and CSV-ISVM, have also been tested and compared experimentally with other similar methods and found to be better suited for use by an IDS in a real-time environment. These methods can be used for real-time network intrusion detection because of their higher detection rate, improved false alarm rate and acceptably short learning time.
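The evaluation criteria named in this abstract (accuracy, detection rate, false positive rate) are usually computed from confusion-matrix counts. The sketch below uses the common definitions, which are assumed here because the abstract does not spell them out. The function name `ids_metrics` and the example counts are hypothetical and are not results from the thesis.

```python
def ids_metrics(tp, tn, fp, fn):
    """Common IDS metrics from confusion-matrix counts (assumed definitions).

    tp: attacks flagged as attacks      fn: attacks missed
    tn: normal traffic passed           fp: normal traffic flagged as attack
    """
    total = tp + tn + fp + fn
    return {
        # Share of all connections classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of real attacks that were detected (recall).
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of normal connections wrongly raised as alarms.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }

# Made-up counts for illustration only.
metrics = ids_metrics(tp=90, tn=880, fp=20, fn=10)
print(metrics)
```

With these made-up counts the accuracy is 0.97, the detection rate is 0.9 and the false positive rate is 20/900, roughly 0.022. A high accuracy can coexist with a noticeable false alarm rate when normal traffic dominates the data, which is why the three figures are reported together.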
Keywords/Search Tags: Hybrid Intrusion Detection, Clustering-Outlier Detection, Incremental SVM, Candidate Support Vector, Half-Partition Method