With the rapid development and wide application of computer network technology, and especially the worldwide spread of the Internet, computer and Internet technologies are continuously being innovated and upgraded. The infrastructure and resources of computer networks have become increasingly important to nations, enterprises and individuals. They are steadily changing the traditional ways in which people live, work and study, and they also bring new problems and challenges. As society becomes more informatized and more dependent on computer networks, keeping an information society running normally, safely and steadily has become a central issue. Computer network security is one part of that issue that must be continually strengthened and improved. Interconnected networks are now used ever more widely and are increasingly open, so more and more network systems are exposed to the threat of attacks and intrusions.

Against this background, this dissertation studies network intrusion detection based on data mining. To improve the detection of unknown intrusions, several network intrusion detection algorithms based mainly on clustering analysis are proposed. They are evaluated by detection rate and false positive rate in computer simulations. In addition, an improved algorithm for the data preprocessing stage of data mining is proposed.

The main contributions of this dissertation are summarized as follows:

1. Starting from the research background and the evolution of intrusion detection, the concepts and basic elements of intrusion, intrusion detection and intrusion detection systems are introduced. The background, starting points, feasibility, current state of research and open problems of applying data mining to intrusion detection are then discussed.

2. After analyzing the problems of feature subset selection in data mining, a new feature subset selection algorithm based on an improved genetic algorithm is proposed. This part explains why feature subset selection is needed and reviews its common methods. To address the shortcomings of the LVF algorithm, an improved algorithm is proposed to optimize the results of feature subset selection. In simulation experiments, it finds feature subsets that are better and more stable than those found by the LVF algorithm.

3. To improve the effectiveness and efficiency of network intrusion detection, the Network Intrusion Detection Based on Genetic Clustering (NIDBGC) algorithm is proposed after an analysis of the algorithm proposed by Portnoy et al. NIDBGC consists of a Leader clustering stage and a genetic optimization stage. It automatically builds an initial set of clusters, optimizes them by combining clusters, and labels intrusion activities. Experimental results show that NIDBGC keeps the average detection rate and the average false positive rate at a relatively good level even when the intrusion ratio in the data is much larger than the one handled by the algorithm of Portnoy et al. This indicates that NIDBGC is feasible for detecting unknown intrusions and obtains relatively good results.

4. Clustering analysis based on cluster centroids is discussed through examples. In network intrusion detection, the distribution of activities is unknown, and the clusters may not be hyper-spherical.
Therefore, a network intrusion detection algorithm for non-spherical clusters, Network Intrusion Detection Based on Nearest Neighbor Genetic Clustering (NIDBNNGC), is proposed. NIDBNNGC consists of a nearest neighbor clustering stage and a genetic optimization stage. Like NIDBGC, it automatically builds an initial set of clusters, optimizes them by combining clusters, and labels intrusion activities. In the experiments, NIDBNNGC performs better than NIDBGC. Meanwhile, for the NIDBNNGC algorithm, the stochastic single-point mutation operator as the local search operator of genet...
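The algorithms above are compared by detection rate and false positive rate. Under the usual definitions, the detection rate is the fraction of intrusion records that are flagged, and the false positive rate is the fraction of normal records that are wrongly flagged. The following is a minimal sketch assuming boolean labels where `True` marks an intrusion; the function name `detection_metrics` is hypothetical.

```python
from typing import Sequence, Tuple


def detection_metrics(actual: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (detection_rate, false_positive_rate).

    True marks an intrusion, False a normal record (an assumption of this sketch).
    detection rate      = detected intrusions / all intrusions
    false positive rate = normal records flagged as intrusions / all normal records
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    dr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return dr, fpr


# 4 intrusions (3 detected) and 6 normal records (1 wrongly flagged).
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
dr, fpr = detection_metrics(actual, predicted)  # DR = 3/4, FPR = 1/6
```

Both rates must be read together: flagging every record gives a detection rate of 1 but also a false positive rate of 1.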
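LVF scores candidate feature subsets by their inconsistency rate: records that agree on every selected feature form one pattern, and each pattern contributes its size minus the size of its majority class. To illustrate replacing LVF's random search with a genetic search, here is a minimal sketch assuming a plain bit-string genetic algorithm (binary tournament selection, single-point crossover, bit-flip mutation, elitism). The fitness, which favors small subsets whose inconsistency rate does not exceed a threshold `gamma`, is a hypothetical choice. This is not the dissertation's improved genetic algorithm, whose details are not given here, and all names and parameter values are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence


def inconsistency_rate(rows: Sequence[Sequence[Hashable]], labels: Sequence[Hashable],
                       subset: Sequence[int]) -> float:
    """Inconsistency rate of a feature subset (the LVF evaluation criterion)."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row[j] for j in subset)][y] += 1  # pattern -> class counts
    clashes = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return clashes / len(rows)


def ga_feature_selection(rows: Sequence[Sequence[Hashable]], labels: Sequence[Hashable],
                         gamma: float = 0.0, pop_size: int = 20, generations: int = 30,
                         p_mut: float = 0.1, seed: int = 0) -> List[int]:
    """Hypothetical GA search for a small subset with inconsistency rate <= gamma."""
    rng = random.Random(seed)
    n = len(rows[0])

    def fitness(mask: List[int]) -> float:
        subset = [j for j in range(n) if mask[j]]
        rate = inconsistency_rate(rows, labels, subset)
        if rate <= gamma:
            return 1.0 + (n - len(subset)) / n  # acceptable: in [1, 2], fewer features is better
        return 1.0 - rate  # unacceptable: strictly below 1

    def pick() -> List[int]:  # binary tournament selection (needs pop_size >= 2)
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    # Seed with the full feature set so an acceptable subset is present
    # whenever the full set itself meets gamma.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best subset so far survives
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randint(1, n - 1) if n > 1 else 0  # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return [j for j in range(n) if best[j]]


# Toy data: the class equals feature 0; features 1 and 2 are constant.
rows = [(0, 7, 1), (1, 7, 1), (0, 7, 1), (1, 7, 1), (1, 7, 1), (0, 7, 1)]
labels = [r[0] for r in rows]
selected = ga_feature_selection(rows, labels)
```

Because the best chromosome is carried forward, its fitness never decreases, so the returned subset meets `gamma` whenever the full feature set does.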
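The Leader clustering stage of NIDBGC can be pictured as a single pass over the data: a record joins the first cluster whose leader is close enough, or else starts a new cluster. The labeling step below follows the cluster-size heuristic associated with Portnoy et al., in which the largest clusters are assumed to be normal. This is a minimal sketch under stated assumptions: Euclidean distance on already-normalized numeric records, and hypothetical parameters `radius` and `normal_fraction`. The genetic optimization stage that combines clusters is not shown.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def leader_clustering(points: Sequence[Point], radius: float) -> List[List[int]]:
    """One-pass Leader clustering; returns clusters as lists of point indices.

    Each point joins the first cluster whose leader (its first member) lies
    within `radius`; otherwise it starts a new cluster and becomes its leader.
    """
    leaders: List[Point] = []
    clusters: List[List[int]] = []
    for i, p in enumerate(points):
        for k, leader in enumerate(leaders):
            if math.dist(p, leader) <= radius:
                clusters[k].append(i)
                break
        else:  # no leader close enough
            leaders.append(p)
            clusters.append([i])
    return clusters


def label_by_size(clusters: List[List[int]], n_points: int, normal_fraction: float) -> List[bool]:
    """Label points in the largest `normal_fraction` of clusters as normal (False);
    points in all remaining clusters are flagged as intrusions (True)."""
    n_normal = max(1, int(normal_fraction * len(clusters)))
    ranked = sorted(clusters, key=len, reverse=True)
    flagged = [True] * n_points
    for members in ranked[:n_normal]:
        for i in members:
            flagged[i] = False
    return flagged


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
clusters = leader_clustering(points, radius=0.5)  # [[0, 1, 2, 3], [4]]
flags = label_by_size(clusters, len(points), normal_fraction=0.5)
```

The result depends on the input order and on `radius`, which is one reason a later optimization stage that merges clusters can help.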
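Centroid- or leader-based clusters are roughly spherical, whereas nearest neighbor clustering can follow elongated or curved shapes. This is a minimal sketch of the classic incremental nearest neighbor clustering rule: a new point joins the cluster of its closest already-processed point if that point lies within a distance threshold. It is assumed here only as an illustration of the first stage of NIDBNNGC. The function name and the `threshold` value are hypothetical, and the genetic optimization stage is not shown.

```python
import math
from typing import List, Sequence


def nearest_neighbor_clustering(points: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Incremental nearest neighbor clustering; returns one cluster id per point.

    Each point is compared with every previously processed point. If the
    closest one lies within `threshold`, the new point joins its cluster;
    otherwise the point starts a new cluster.
    """
    labels: List[int] = []
    next_id = 0
    for i, p in enumerate(points):
        best_j, best_d = -1, math.inf
        for j in range(i):
            d = math.dist(p, points[j])
            if d < best_d:
                best_j, best_d = j, d
        if best_j >= 0 and best_d <= threshold:
            labels.append(labels[best_j])
        else:
            labels.append(next_id)
            next_id += 1
    return labels


# A curved, non-spherical arc (consecutive points about 1.0 apart) plus one outlier.
arc = [(5.0 * math.cos(0.2 * k), 5.0 * math.sin(0.2 * k)) for k in range(10)]
points = arc + [(20.0, 20.0)]
labels = nearest_neighbor_clustering(points, threshold=1.5)  # whole arc -> cluster 0, outlier -> 1
```

With the same threshold used as a leader radius, the arc would instead be split into several clusters, because points two steps along the arc are already about 2.0 apart.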