
A Study On An Optimal Feature Selection Algorithm Using Minimum Joint Mutual Information Loss Criterion

Posted on: 2012-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Zhang
Full Text: PDF
GTID: 2218330362958143
Subject: Software engineering
Abstract/Summary:
With the rapid development of the social economy and computer technology, the study of filter feature selection algorithms is of great importance both in theory and in practice, since a good feature selection algorithm has broad application prospects.

In this thesis, an optimal feature selection algorithm using a minimal joint mutual information loss criterion is proposed, based on the concept of an invariant feature subset. Using joint mutual information as the criterion, the algorithm decomposes the feature selection problem into two sub-problems: it first discovers an invariant feature subset by a maximal conditional mutual information principle, and then eliminates potentially redundant features according to the minimal joint mutual information loss criterion. From the reliability point of view, this criterion also abates the disturbance caused by sample insufficiency in conditional mutual information estimation. The algorithm ultimately discovers a feature subset that best compresses the discriminative information contained in the original feature set.

A fast implementation of conditional mutual information estimation is also proposed to tackle the computational intractability of that estimation. The notion of local mutual information is introduced, and the relation between conditional mutual information and local mutual information is established, which transforms the estimation task into several local mutual information subtasks.

Ten benchmark and challenge datasets are selected to verify the performance of the proposed algorithm and the efficiency of the fast conditional mutual information estimation. The selected datasets differ greatly in sample size (2,000~72,626) and feature dimension (22~139,351), providing comprehensive testing under different conditions.
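To make the two-stage idea concrete, the following is a minimal sketch, not the thesis's exact algorithm: plug-in (frequency-count) estimates of mutual information for discrete features, conditional mutual information computed via the local decomposition I(X;Y|Z) = Σ_z p(z)·I(X;Y|Z=z), and a greedy forward selection that at each step adds the feature with the largest conditional mutual information with the label given the features already chosen. The function names and the joint tuple encoding of the selected subset are illustrative assumptions.

```python
from collections import Counter
import math

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        # p_xy * log2( p_xy / (p_x * p_y) )
        mi += p_xy * math.log2(p_xy * n * n / (px[x] * py[y]))
    return mi

def conditional_mutual_information(xs, ys, zs):
    """I(X;Y|Z) via the local decomposition: sum over z of p(z) * I(X;Y|Z=z)."""
    n = len(zs)
    cmi = 0.0
    for z, c in Counter(zs).items():
        idx = [i for i in range(n) if zs[i] == z]
        cmi += (c / n) * mutual_information([xs[i] for i in idx],
                                            [ys[i] for i in idx])
    return cmi

def greedy_select(features, labels, k):
    """Forward selection: repeatedly add the feature maximizing
    I(f; Y | selected), with the selected subset encoded jointly as tuples.
    A sketch only -- exhaustive joint encoding scales poorly with |selected|."""
    selected, remaining = [], list(range(len(features)))
    for _ in range(k):
        # joint value of already-selected features per sample (constant if none)
        joint = (list(zip(*(features[j] for j in selected)))
                 if selected else [0] * len(labels))
        best = max(remaining, key=lambda j: conditional_mutual_information(
            features[j], labels, joint))
        selected.append(best)
        remaining.remove(best)
    return selected
```

On a toy task where feature 0 determines the label, feature 1 duplicates it, and feature 2 is noise, the sketch picks feature 0 first and then finds that the remaining features add no conditional information, illustrating the redundancy-elimination step.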
Empirical results show that our algorithm outperforms several representative feature selection algorithms. In addition, experimental results on the execution task show that the proposed fast conditional mutual information estimation achieves better performance in both time efficiency and memory utilization, especially on the high-dimensional Thrombin dataset containing 139,351 features.
Keywords/Search Tags: Classification, Feature selection, Invariant feature subset, Minimum joint mutual information loss, Fast conditional mutual information estimation