Data classification is the process of partitioning data objects according to the features of a group of data objects, and it has been widely studied in statistics, machine learning, neural networks, and expert systems. Recently, it has become one of the important research areas of data mining. Classification is a two-step process. In the first step, a model is built that describes a predetermined set of data classes or concepts. In the second step, the predictive accuracy of the model is estimated; if the accuracy is considered acceptable, the model can be used to classify future data objects. Generally, the learned model can be represented by classification rules, decision trees, or mathematical formulae. At present, commonly used classification methods include genetic algorithms, decision trees, and neural networks.

Classification rule mining methods based on the traditional genetic algorithm have four main drawbacks: (1) only one classification rule is produced per class, (2) the quality of the rules is low, (3) the optimized population contains too many redundant rules, and (4) the classification accuracy is low. This work presents a classification rule mining method based on hybrid genetic algorithms, which overcomes these four drawbacks and improves the accuracy of classification rule mining.

Firstly, this paper introduces the background, definition, and function of data mining; identifies predictive accuracy, computational complexity, and simplicity of the model description as three criteria for evaluating a classification model; and analyzes and compares several commonly used classification rule mining methods.

Secondly, this paper introduces the basic principles of the genetic algorithm and the local search algorithm, and analyzes their merits and defects. The genetic algorithm has a strong capacity for global search but a weak capacity for local search, whereas the local search algorithm has a strong capacity for local search.
Therefore, the two algorithms can be combined to form a hybrid genetic algorithm.

Thirdly, this paper analyzes the principle of classification rule mining and points out that the standard genetic algorithm is not well suited to the classification problem. A classification rule mining method using a hybrid genetic algorithm is therefore proposed. The proposed hybrid genetic algorithm adopts the Michigan approach, so every chromosome represents one rule. For the classification problem, the individual encoding, fitness function, individual generation function, genetic operators, and local search operator are designed to produce multiple high-quality rules, and a simplicity factor is added to the fitness function. In addition, because some redundant rules remain in the optimized population, this paper presents a rule extraction method to keep the final rule set concise. Experiments show that the classification rule mining method using the hybrid genetic algorithm can find a set of succinct, accurate, and comprehensible classification rules.

Finally, this paper analyzes the parallelism of the proposed classification algorithm and establishes a PVM parallel computing platform on Windows 2000. The parallel classification algorithm adopts a coarse-grained master/slave model, so it is especially suited to running on a PC cluster. Experiments show that the proposed parallel classification algorithm achieves a good speed-up ratio.
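
The Michigan-style encoding, the fitness function with a simplicity factor, and the local search operator described above can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the method used in this work: the abstract gives no formulas, so the fitness form (sensitivity times specificity plus a weighted simplicity term), the weight `w`, the greedy condition-dropping local search, and all names (`Rule`, `fitness`, `local_search`, the toy data) are hypothetical. Selection, crossover, and mutation are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Record = Tuple[Dict[str, str], str]  # (attribute values, class label)


@dataclass(frozen=True)
class Rule:
    """One chromosome encodes one IF-THEN rule (Michigan-style encoding)."""
    conditions: Tuple[Tuple[str, str], ...]  # (attribute, required value) pairs
    label: str                               # class predicted by the rule

    def covers(self, attrs: Dict[str, str]) -> bool:
        # A rule covers a record when every condition matches.
        return all(attrs.get(a) == v for a, v in self.conditions)


def fitness(rule: Rule, data: List[Record], n_attrs: int, w: float = 0.1) -> float:
    """Assumed fitness: sensitivity * specificity + w * simplicity."""
    tp = sum(1 for attrs, y in data if rule.covers(attrs) and y == rule.label)
    fp = sum(1 for attrs, y in data if rule.covers(attrs) and y != rule.label)
    fn = sum(1 for attrs, y in data if not rule.covers(attrs) and y == rule.label)
    tn = len(data) - tp - fp - fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # Simplicity factor: rules with fewer conditions score higher.
    simplicity = 1.0 - len(rule.conditions) / n_attrs
    return sensitivity * specificity + w * simplicity


def local_search(rule: Rule, data: List[Record], n_attrs: int) -> Rule:
    """Greedy hill climbing: drop one condition at a time while fitness improves."""
    best, best_fit = rule, fitness(rule, data, n_attrs)
    improved = True
    while improved:
        improved = False
        for i in range(len(best.conditions)):
            candidate = Rule(best.conditions[:i] + best.conditions[i + 1:], best.label)
            cand_fit = fitness(candidate, data, n_attrs)
            if cand_fit > best_fit:
                best, best_fit, improved = candidate, cand_fit, True
                break
    return best


# Toy data set (hypothetical) with two attributes and two classes.
data: List[Record] = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]
initial = Rule((("outlook", "sunny"), ("windy", "no")), "play")
refined = local_search(initial, data, n_attrs=2)
```

In this toy case the local search removes the unnecessary `windy` condition, which shows how a simplicity term combined with local refinement can push a population toward shorter rules without losing accuracy.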