| Classification models and feature selection methods for gene expression microarray data in bioinformatics are now a focus of attention in the machine learning and data mining communities. Because genes tend to be co-regulated when they function, feature groups exist in microarray data, and domain experts would like to exploit them. This paper therefore studies group feature selection algorithms, which aim to find feature groups that give domain experts more insight into the underlying relationships in the data, while also reducing dimensionality and improving classification accuracy. Group feature selection algorithms can generally be divided into two categories according to how they find feature groups: explicit and implicit. An explicit group feature selection algorithm first finds groups under some criterion, so that features in the same group are highly correlated; once the groups are identified, feature selection is performed on representative features from each group to obtain the final selected feature groups. By contrast, implicit group feature selection algorithms do not find groups directly; instead, the groups can be identified from the results that the feature selection algorithm produces. For these two kinds of group feature selection algorithms, the main contributions of this paper are as follows: 1. We propose a feature clustering based explicit group feature selection algorithm, FC-gRFE (Feature Clustering based Group SVM-RFE). This algorithm clusters the features of the training set to find the feature groups, and then selects features at the group level via SVM-RFE. In addition, we propose SW-gRFE (Sample Weighting FC-gRFE), which extends FC-gRFE with sample weighting: it first calculates sample weights according to sample importance and then runs FC-gRFE on the weighted sample set to obtain the group feature selection results.
Experimental results on microarray data show that the proposed algorithms can find the feature groups in the data without sacrificing classification accuracy. 2. We propose an implicit group feature selection algorithm with modified weights, CW-groupS (Coefficient Weight group feature Selection). The algorithm first encodes each feature in the original feature set with a sparse model, Elastic Net; it then calculates the correlations among features from the coding coefficients, which are more discriminative than the original features; finally, it solves a Fused Lasso model weighted by these feature correlations to obtain sparse feature coefficients with a grouping effect, which constitute the group feature selection results. An efficient solver for CW-groupS based on FISTA is implemented. Experiments on synthetic and microarray data sets validate the effectiveness of the proposed algorithm. 3. We propose an ensemble explicit group feature selection algorithm, EN-gRFE (ENsemble FC-gRFE). This algorithm first merges the feature selection results that FC-gRFE produces on randomly sub-sampled data sets into a feature set with repeated features, and then clusters the most frequently repeated features to obtain the final selected feature groups. Experiments on microarray data sets validate the effectiveness of the proposed algorithm. |
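The explicit pipeline behind FC-gRFE (cluster the features, then run SVM-RFE at the group level) can be illustrated with the minimal sketch below. It is not the thesis implementation: the helper name `fc_grfe`, the use of scikit-learn `KMeans` for feature clustering, the choice of the mean standardized profile as each group's representative, and `LinearSVC` with `C=1.0` are all assumptions made for illustration. Passing `sample_weight` mimics the SW-gRFE variant, but how sample importance is computed is left open here, as the abstract does not specify it.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC


def fc_grfe(X, y, n_groups=10, n_keep=3, sample_weight=None, seed=0):
    """Hypothetical sketch of group-level SVM-RFE on feature clusters.

    X: (n_samples, n_features) expression matrix; y: class labels.
    sample_weight: optional per-sample weights (SW-gRFE-style variant).
    Returns the index arrays of the surviving feature groups.
    """
    # Standardize each feature; for standardized columns the squared Euclidean
    # distance between two features is (up to the epsilon) 2 * n * (1 - r),
    # where r is their Pearson correlation.
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # Cluster the features (columns of Z), not the samples.
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=seed).fit_predict(Z.T)
    groups = [np.flatnonzero(labels == g) for g in range(n_groups)]
    # Assumed group representative: mean of the group's standardized members.
    R = np.column_stack([Z[:, idx].mean(axis=1) for idx in groups])
    active = list(range(n_groups))
    while len(active) > n_keep:
        svm = LinearSVC(C=1.0, max_iter=10000)
        svm.fit(R[:, active], y, sample_weight=sample_weight)
        # SVM-RFE ranking criterion: squared weight of each representative
        # (summed over classes so the multiclass case also works).
        scores = (svm.coef_ ** 2).sum(axis=0)
        # Eliminate the least important group, then refit on the rest.
        del active[int(np.argmin(scores))]
    return [groups[g] for g in active]
```

Eliminating whole groups, rather than single genes, is what makes the output interpretable as co-regulated sets; the representative choice is the main design knob and could equally be a medoid feature.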
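The first two steps of CW-groupS (sparse coding of each feature with Elastic Net, then feature correlations computed from the coding coefficients) might look roughly like the sketch below. The abstract does not say exactly how correlations are derived from the codes, so two choices here are hypothetical: each feature's code gets a self-coefficient of 1 before comparison, and the weight between two features is the absolute cosine similarity of their codes. The helper name `coefficient_weights` and the `alpha`/`l1_ratio` defaults are likewise illustrative.

```python
import numpy as np
from sklearn.linear_model import ElasticNet


def coefficient_weights(X, alpha=0.05, l1_ratio=0.5):
    """Hypothetical feature-feature weights from Elastic Net coding coefficients."""
    _, p = X.shape
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # Column j holds the code of feature j: its Elastic Net coefficients on
    # all other features, plus an assumed self-coefficient of 1 (from np.eye).
    A = np.eye(p)
    for j in range(p):
        others = np.delete(np.arange(p), j)
        enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False)
        enet.fit(Z[:, others], Z[:, j])
        A[others, j] = enet.coef_
    # Assumed similarity: absolute cosine similarity between code vectors.
    # Column norms are never zero thanks to the self-coefficient.
    U = A / np.linalg.norm(A, axis=0)
    W = np.abs(U.T @ U)
    np.fill_diagonal(W, 0.0)
    return W
```

Because Elastic Net tends to spread weight across correlated predictors, features from the same co-regulated block end up with overlapping codes and therefore large weights, while unrelated features get near-zero weights.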
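For the final step, a feature-correlation weighted Fused Lasso can be written as minimizing (1/2n)||y - Xβ||² + λ1||β||₁ + λ2 Σ_{i<j} w_ij |β_i - β_j|. The sketch below solves this with FISTA, assuming that model form. Because the weighted fused term has no cheap proximal operator, it is replaced by its Nesterov-smoothed version with parameter `mu` (gradient Cᵀ clip(Cβ/μ, -1, 1), where C is the weighted edge-difference matrix), leaving only the ℓ1 term for the soft-threshold step. This smoothing-plus-FISTA scheme is a swapped-in technique for illustration; the thesis's own FISTA-based solver may handle the fused term differently. The names and default values (`lam1`, `lam2`, `mu`, `n_iter`) are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def weighted_fused_lasso_fista(X, y, W, lam1=0.01, lam2=0.05, mu=1e-2, n_iter=1000):
    """Hypothetical FISTA sketch for a smoothed, weight-fused Lasso."""
    n, p = X.shape
    # One row of C per weighted edge (i, j): w_ij * (e_i - e_j).
    i, j = np.nonzero(np.triu(W, k=1))
    m = len(i)
    C = np.zeros((m, p))
    rows = np.arange(m)
    C[rows, i] = W[i, j]
    C[rows, j] = -W[i, j]
    c_norm2 = np.linalg.norm(C, 2) ** 2 if m else 0.0
    # Lipschitz constant of the smooth part (loss + smoothed fused term).
    L = np.linalg.norm(X, 2) ** 2 / n + lam2 * c_norm2 / mu
    beta = np.zeros(p)
    z = beta.copy()
    t = 1.0
    for _ in range(n_iter):
        # Gradient of the smoothed fused penalty via the optimal dual u.
        u = np.clip(C @ z / mu, -1.0, 1.0)
        grad = X.T @ (X @ z - y) / n + lam2 * (C.T @ u)
        # Proximal (soft-threshold) step for the l1 term.
        beta_new = soft_threshold(z - grad / L, lam1 / L)
        # FISTA momentum update.
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = beta_new + ((t - 1.0) / t_new) * (beta_new - beta)
        beta, t = beta_new, t_new
    return beta
```

Smaller `mu` tracks the exact fused penalty more closely but raises the Lipschitz constant and slows convergence, so it trades accuracy of the model against speed.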
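The ensemble idea in EN-gRFE (pool the selections from FC-gRFE runs on random sub-samples, keep the most frequently repeated features, then cluster them) is outlined in the sketch below. Everything specific here is an assumption: `select_fn` is a hypothetical stand-in for one FC-gRFE run that returns selected feature indices, and the sub-sampling fraction, run count, `top_k`, and the greedy correlation-threshold clustering are illustrative choices, since the abstract does not name the clustering algorithm used.

```python
import numpy as np
from collections import Counter


def en_grfe(X, y, select_fn, n_runs=20, frac=0.8, top_k=10, corr_threshold=0.8, seed=0):
    """Hypothetical ensemble sketch: pool sub-sample selections, then cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    pooled = []  # feature set with repeated features across runs
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        pooled.extend(int(f) for f in select_fn(X[idx], y[idx]))
    # Most frequently repeated features (ties keep first-encountered order).
    frequent = [f for f, _ in Counter(pooled).most_common(top_k)]
    R = np.abs(np.corrcoef(X[:, frequent], rowvar=False))
    # Assumed clustering: greedily group features whose absolute correlation
    # with the seed feature of the group reaches corr_threshold.
    groups, unassigned = [], list(range(len(frequent)))
    while unassigned:
        seed_k = unassigned.pop(0)
        members = [seed_k] + [k for k in unassigned if R[seed_k, k] >= corr_threshold]
        unassigned = [k for k in unassigned if k not in members]
        groups.append(sorted(frequent[k] for k in members))
    return groups
```

Counting repeats across sub-samples favors features that are selected stably rather than by chance on one split, which is the usual motivation for this kind of ensemble.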