Research On EP-based Classification For High Dimensional Data

Posted on: 2013-09-30  Degree: Master  Type: Thesis
Country: China  Candidate: W F Shi  Full Text: PDF
GTID: 2248330377460567  Subject: Computer system architecture
Abstract/Summary:
Over the last decade, as the capacity to collect information and process data has improved, many fields, including scientific research, biomedicine, network communications, and commerce, have accumulated large volumes of high-dimensional data, so classifying high-dimensional data has become an active research area in data mining. Because of the "dimension effect", classification methods that perform well in low-dimensional data spaces fail to produce good results in high-dimensional spaces, owing to excessive computational complexity and other factors. Establishing effective classification algorithms for high-dimensional data has therefore become a challenging problem in data mining, and studying classifiers for high-dimensional data and applying them is of considerable significance.

The main contents of this thesis are as follows:

(1) High-dimensional data and traditional classification methods are discussed, and the shortcomings of these methods when applied to high-dimensional data are analyzed.

(2) The EP-based classification algorithm for high-dimensional data is introduced. EP patterns, the EP mining method, and the EP-based classifier are described in detail. Applying EP-based classification to high-dimensional data produces too many EP patterns, which affects classification accuracy.

(3) High-dimensional data contain many redundant and irrelevant features, which lead to many redundant and irrelevant EP patterns during EP mining. We describe two feature selection methods: a lasso selection method based on linear regression and a causal relationship selection method. We integrate feature selection into EP-based classification, removing redundant and irrelevant features and thereby the redundant and irrelevant EP patterns they generate. We propose two EP-based classification methods for high-dimensional data: lasso-based EP classification for continuous data and causal relationship-based EP classification for discrete data.

(4) When applied to ultra-high-dimensional data or to high-dimensional small-sample data, the lasso feature selection method has two problems: excessive computational complexity and over-fitting. We propose two improved lasso feature selection methods: iterative lasso feature selection and equalization lasso feature selection.

The experimental results show the effectiveness of all the proposed algorithms.
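
The abstract does not define an EP pattern. As a rough illustration only, the sketch below assumes that EP stands for "emerging pattern" in the usual data mining sense: an itemset whose support in one class is at least a chosen multiple of its support in the other class, with that ratio called the growth rate. The function names, the threshold `min_growth`, and the toy data are all hypothetical and are not taken from the thesis. The brute-force enumeration is not the thesis's mining method.

```python
from itertools import combinations
from math import inf


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def growth_rate(itemset, background, target):
    """Ratio of target support to background support.

    Conventions assumed here: 0 when the itemset never occurs in the
    target class, infinity when it occurs only in the target class.
    """
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_tg == 0:
        return 0.0
    if s_bg == 0:
        return inf
    return s_tg / s_bg


def mine_eps(background, target, min_growth=2.0, max_len=2):
    """Enumerate every itemset of up to `max_len` items by brute force
    and keep those whose growth rate reaches `min_growth`."""
    items = sorted(set().union(*target))
    found = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            rate = growth_rate(candidate, background, target)
            if rate >= min_growth:
                found[candidate] = rate
    return found


# Toy data invented for illustration: each record is a set of feature=value items.
class_a = [frozenset(t) for t in (
    {"f1=low", "f2=yes"}, {"f1=low", "f2=no"},
    {"f1=high", "f2=no"}, {"f1=low", "f2=no"})]
class_b = [frozenset(t) for t in (
    {"f1=high", "f2=yes"}, {"f1=high", "f2=yes"},
    {"f1=low", "f2=yes"}, {"f1=high", "f2=no"})]

# Patterns that characterise class_b relative to class_a.
eps_for_b = mine_eps(background=class_a, target=class_b)
for pattern, rate in eps_for_b.items():
    print(sorted(pattern), rate)
```

The number of candidate itemsets grows combinatorially with the number of features, which is consistent with the abstract's point that high-dimensional data yield too many EP patterns and that removing redundant and irrelevant features before mining reduces them.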
Keywords/Search Tags: high-dimensional data, EP patterns, feature selection, lasso, causal relationship