Effective data mining for intrusion detection and WWW prediction applications

Posted on: 2006-08-05
Degree: Ph.D
Type: Dissertation
University: The University of Texas at Dallas
Candidate: Awad, Mamoun Adel
Full Text: PDF
GTID: 1458390005997404
Subject: Computer Science
Abstract/Summary:
Data mining is an analytical process for exploring and summarizing large amounts of data in order to uncover new patterns and to discover new relationships between variables. Predictive data mining is the most common type of data mining and has the most important business applications. This dissertation focuses on the classification/prediction problem, using efficient models in two important applications of data mining: intrusion detection and WWW prediction.

Intrusion detection attempts to detect computer attacks by examining data records observed in processes on the network. Intrusion detection systems fall into two groups: anomaly detection systems and misuse detection systems. Our interest here is in anomaly detection, and our proposed method is a scalable solution for detecting network-based anomalies. We use Support Vector Machines (SVM) for classification. The SVM is one of the most successful classification algorithms in the data mining area, but its long training time limits its use. This dissertation presents a study on reducing the training time of SVM, specifically when dealing with large data sets, using hierarchical clustering analysis. We use the Dynamically Growing Self-Organizing Tree (DGSOT) algorithm for clustering. Clustering analysis helps find the boundary points between two classes, which are the data points best suited to training the SVM. We present a new approach that combines SVM and DGSOT: it starts with an initial training set and expands it gradually using the clustering structure produced by the DGSOT algorithm. We compare our approach with the Rocchio Bundling technique and with random selection in terms of accuracy loss and training time gain, using a single real benchmark data set. We show that our proposed method significantly improves the training process of SVM while retaining high generalization accuracy, and that it outperforms the Rocchio Bundling technique.

WWW prediction is the problem of predicting the next page(s) a user might visit after surfing a web site. The improvement of many applications depends on surfing prediction. In this dissertation, we propose a hybrid model that combines three classification techniques, namely Support Vector Machines, the Markov model, and Artificial Neural Networks, and fuses their predictions using Dempster's Rule. Such fusion overcomes the Markov model's inability to predict unseen data, as well as the complexity of the multi-class problem for Artificial Neural Networks and Support Vector Machines, especially when dealing with a large number of classes. We also employ a reduction technique that uses domain knowledge to reduce the number of classifiers and improve predictive accuracy. We demonstrate the effectiveness of our hybrid model by comparing our results with widely used techniques, namely the Markov model and Association Rule Mining, on a benchmark dataset.
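
The fusion step named in the abstract, Dempster's Rule, can be illustrated with a minimal sketch. The abstract does not say how each classifier's output is turned into a mass function, so the page names, the mass values, and the function name `dempster_combine` below are all hypothetical and chosen only for illustration. Each classifier assigns belief mass to sets of candidate next pages. Mass placed on the full set of pages expresses "don't know", which is one way a model that cannot judge an unseen case could defer to the others.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1],
    and its masses sum to 1. (Hypothetical helper, not the author's code.)
    """
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            # Agreeing evidence supports the intersection of the two sets.
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            # Disjoint sets contribute to the conflict term K.
            conflict += mass_b * mass_c
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Normalize by 1 - K so the combined masses sum to 1 again.
    return {hyp: mass / (1.0 - conflict) for hyp, mass in combined.items()}


# Assumed example: three candidate next pages and two classifiers.
PAGES = frozenset({"A", "B", "C"})
m_svm = {frozenset({"A"}): 0.6, frozenset({"B"}): 0.1, PAGES: 0.3}
m_markov = {frozenset({"A"}): 0.5, frozenset({"B"}): 0.3, PAGES: 0.2}

fused = dempster_combine(m_svm, m_markov)
best = max(fused, key=fused.get)
print(sorted(best), round(fused[best], 4))
```

A third classifier can be folded in by calling `dempster_combine(fused, m_third)`. The rule is associative and commutative, so the order in which the sources are combined does not matter.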
Keywords/Search Tags: Data mining, WWW prediction, Intrusion detection, Markov model, SVM, Support vector machines, Applications