
Research On Large Database Classification Models Based On RS And GEP

Posted on: 2013-03-29, Degree: Master, Type: Thesis
Country: China, Candidate: H Y Hu, Full Text: PDF
GTID: 2248330374452616, Subject: Computer application technology
Abstract/Summary:
Classification, one of the main methods of data analysis, extracts from a large amount of data a model that describes the objects in it. Because it uses a model learned from known data to predict new data, classification is a supervised learning process. A good classification rule helps us both understand a class better and use the data more effectively. Classification is an important task in data mining: it builds a model by analyzing the known attributes of a training set, and the model then maps each record to be classified to its corresponding class according to the classification rules. Classification has been widely applied in machine learning, neural networks and performance prediction. In most cases the training data are continuous, noisy and incomplete, which reduces classification accuracy. To improve accuracy, this paper first uses a wide-range threshold discretization method to discretize continuous data. Second, it combines rough set theory, which can handle incomplete, redundant and partial knowledge, with the evolutionary strategy of gene expression programming (GEP). We focus on how to remove redundant, continuous and partial data at the data preprocessing layer, and propose a rough set attribute reduction algorithm based on GEP. Finally, to address the complexity of existing classification rules, this paper proposes a new classification model that comprises data acquisition, preprocessing, discretization and reduction.
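The two preprocessing ideas named above, discretization and rough set attribute reduction, can be illustrated with a minimal Python sketch. It is not the thesis's method: the abstract does not define its wide-range threshold discretization, so equal-width binning is swapped in as a stand-in, and the function names `equal_width_bins` and `dependency` are hypothetical. The dependency degree is the standard rough set measure: the fraction of objects whose condition-attribute values determine the decision unambiguously.

```python
from collections import defaultdict


def equal_width_bins(values, n_bins):
    """Map continuous values to bin indices using equal-width thresholds.

    Assumption: a simple stand-in for the thesis's discretization method,
    which the abstract does not specify.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def dependency(rows, decisions, attrs):
    """Rough set dependency degree of the decision on the attributes `attrs`.

    Objects are grouped into equivalence classes by their values on `attrs`;
    a class belongs to the positive region if all its objects share one decision.
    """
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    consistent = sum(
        len(block)
        for block in blocks.values()
        if len({decisions[i] for i in block}) == 1
    )
    return consistent / len(rows)


# Toy decision table: three discrete condition attributes, one decision.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = ["no", "yes", "no", "yes"]

print(equal_width_bins([1.0, 2.0, 3.0, 4.0], 2))  # [0, 0, 1, 1]
print(dependency(rows, decisions, [0, 1, 2]))     # 1.0
print(dependency(rows, decisions, [1]))           # 1.0
print(dependency(rows, decisions, [0, 2]))        # 0.0
```

In this toy table attribute 1 alone preserves the full dependency degree, so attributes 0 and 2 are redundant; a reduct is a minimal attribute subset with this property.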
The main work of this paper is as follows:
(1) We systematically review the literature on classification, GEP and rough set theory; discuss in detail reduction, the core of rough set theory; point out the defects of reduction based on genetic algorithms; and identify the differences between genetic algorithms and gene expression programming.
(2) Building on a theoretical analysis of GEP, this paper studies how to improve attribute reduction, proposes a GEP-based reduction algorithm, ARRS_GEP, and compares it with other reduction methods to verify its validity.
(3) Many algorithms used in classification, rough sets among them, require discrete data. To address this, the paper uses the wide-range threshold discretization method to discretize continuous features. Having analyzed the problem of noisy data during rule extraction, it proposes applying operators such as crossover, mutation, recombination and string insertion at the data link layer. After the condition attributes have been reduced, a classification algorithm is used to extract the reduced rules.
(4) To test and verify the proposed model, it is applied to a prediction task for a trading enterprise. The results show that the model reduces the complexity of the classification rules: the rules derived by the proposed method involve fewer attributes and are comparatively simple. This indicates that the model is effective for knowledge reduction and rule extraction.
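As a rough illustration of evolutionary attribute reduction, the sketch below searches for a small attribute subset that preserves the rough set dependency degree. It is explicitly not ARRS_GEP, whose details the abstract does not give: it uses a plain bit-string chromosome with one-point crossover, bit-flip mutation and elitism rather than GEP's gene encoding and operators. All names (`fitness`, `evolve_reduct`) and parameter values are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def dependency(rows, decisions, attrs):
    """Fraction of objects whose values on `attrs` determine the decision."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    consistent = sum(
        len(b) for b in blocks.values()
        if len({decisions[i] for i in b}) == 1
    )
    return consistent / len(rows)


def fitness(mask, rows, decisions):
    """Prefer higher dependency first, then fewer selected attributes."""
    attrs = [i for i, bit in enumerate(mask) if bit]
    return (dependency(rows, decisions, attrs), -len(attrs))


def evolve_reduct(rows, decisions, pop_size=20, generations=30,
                  p_mut=0.1, seed=0):
    """Simplified evolutionary search for an attribute subset (hypothetical)."""
    rng = random.Random(seed)
    n = len(rows[0])
    # Seed the population with the full attribute set so the best
    # dependency found can never fall below that of all attributes.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, rows, decisions), reverse=True)
        parents = pop[:pop_size // 2]
        children = [parents[0][:]]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]                  # one-point crossover
            child = [1 - bit if rng.random() < p_mut else bit
                     for bit in child]                 # bit-flip mutation
            children.append(child)
        pop = children
    best = max(pop, key=lambda m: fitness(m, rows, decisions))
    return [i for i, bit in enumerate(best) if bit]


rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = ["no", "yes", "no", "yes"]
subset = evolve_reduct(rows, decisions)
print(subset, dependency(rows, decisions, subset))
```

Ranking by dependency first and subset size second means the search never trades away classification consistency for a shorter rule; the returned subset is small but is not guaranteed to be a minimal reduct.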
Keywords/Search Tags: Rough Set, Gene Expression Programming, Reduction, Rule, Classification