
Research On Hybrid Feature Selection Of Pattern Classification

Posted on: 2016-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
Full Text: PDF
GTID: 2308330461467280
Subject: Computer software and theory
Abstract/Summary:
With the rapid development and wide application of Internet technology, high-dimensional data are emerging and growing on a massive scale. Such data contain much redundant and irrelevant information, which leads to the curse of dimensionality and places higher demands on machine learning algorithms. In recent years, feature selection has become an active research topic in machine learning, statistics, data mining, and pattern recognition. It reduces redundant and irrelevant information to improve the accuracy and efficiency of pattern classification, and it has been widely applied in fields such as text classification, intrusion detection, genomic analysis, and image retrieval.

Based on the distribution characteristics of the samples, feature selection chooses a good feature subset from the original feature space according to some evaluation criterion. Its main objective is to find a minimal feature subset that represents the original features well; the classification performance of the selected subset must be close to or better than that of the original feature space. As a data analysis and pre-processing step, feature selection keeps the relevant features, eliminates irrelevant and redundant ones, reduces the dimension of the training samples and the data noise, and improves the accuracy and efficiency of pattern classification. Many feature selection algorithms have therefore been developed for pattern classification in past years. This paper studies in depth the theory and practical application of feature selection algorithms. Its main contributions and innovations are summarized as follows:

1. Hybrid and combined feature selection algorithms for improving classification accuracy are investigated, and a two-stage feature selection method for text categorization is proposed that uses category correlation degree (CCD) and latent semantic indexing (LSI). In the first stage, a novel CCD method selects the most effective features for text classification; it is more effective than the traditional feature selection method. In the second stage, LSI is used to discover the important correlations between features and to reduce the dimension of the feature space. The experimental results show that the method effectively reduces the dimension of the text vector and improves the performance of text categorization.

2. Because basic particle swarm optimization suffers from low accuracy and premature convergence when used for feature selection, this paper presents an improved particle swarm optimization algorithm (IPSO) for feature selection. Chaos theory is adopted to optimize the inertia weight parameters, and genetic mutation operations are used to increase the diversity of the population, thereby improving the solution quality and convergence speed of particle swarm optimization.

3. To overcome the low convergence accuracy and slow convergence of the standard artificial fish swarm algorithm in its later stages, this paper proposes a novel artificial fish swarm algorithm (AFSA) with dynamic parameter adjustment for feature selection. The visual field and step length are adjusted dynamically, and a strategy of keeping the best individual is introduced, thereby improving the search ability and convergence speed of the fish swarm algorithm.

4. The global best solutions and search capabilities of IPSO and ant colony optimization (ACO) are investigated, and a hybrid feature selection algorithm based on IPSO and ACO, called HIPA, is proposed. In HIPA, the global best solutions obtained by IPSO and ACO are recombined, and the new solutions produced by this recombination are given to the IPSO and ACO populations, respectively, as the global best for the next iteration. This information flow between IPSO and ACO helps improve the global and local search abilities of HIPA. Compared with IPSO and ACO alone, the experimental results show that HIPA gives very promising performance with a very small feature subset dimension.

5. The global best solutions and search capabilities of IPSO and the adaptive fish swarm algorithm (AFSA) are investigated, and a hybrid feature selection algorithm based on IPSO and AFSA, called HIPF, is presented. In HIPF, the global best solutions obtained by IPSO and AFSA are recombined. The new solution produced by this recombination is given to the IPSO population as its global best, and the global best solution of the current system is given to the AFSA population as its global best for the next iteration. This information flow between IPSO and AFSA helps improve the global and local search abilities of HIPF. Compared with IPSO and AFSA alone, the experimental results show that HIPF gives very promising performance with a very small feature subset dimension.

6. The hybrid application of the filter model and the wrapper model is investigated, and this paper proposes two novel swarm intelligence-based feature selection approaches, called mr2HIPA and mr2HIPF, each consisting of a filter model and a wrapper model. mr2HIPA combines the minimum redundancy maximum relevance (mRMR) method with the HIPA algorithm, and mr2HIPF combines the mRMR method with the HIPF algorithm. In the first stage, the filter model uses mRMR to select top-ranked feature subsets and thereby reduce the number of features. In the second stage, one wrapper model, HIPA, recombines IPSO and ACO to strengthen the search over feature subsets; the other wrapper model, HIPF, recombines IPSO and AFSA for the same purpose. The results show that mr2HIPA and mr2HIPF outperform other well-known feature selection approaches in terms of feature reduction efficiency and classification accuracy.

To evaluate the two-stage feature selection method based on CCD and LSI, we carried out text classification experiments on datasets from Fudan University. The results show that this method achieves better classification performance. To evaluate the hybrid feature selection methods based on swarm intelligence, we carried out experiments on four datasets from the UCI machine learning repository with medical image processing. The results show that the proposed algorithms effectively improve classifier performance in terms of classification accuracy and feature dimension. However, they are time-consuming and cannot yet meet the requirements of real-time systems. Future work will improve the operational efficiency of the pattern classification algorithms, for example through hardware implementation, so that the proposed algorithms can meet the requirements of some real-time systems.
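
The improved particle swarm idea from contribution 2 (a chaos-driven inertia weight plus a mutation step) can be illustrated with a minimal sketch. The abstract does not give the update rules, so everything specific here is an assumption made for illustration: the logistic map as the chaotic sequence, the sigmoid transfer function for binary positions, bit-flip mutation, all parameter values, and the hypothetical `toy_fitness`, which stands in for the classifier accuracy a wrapper method would actually measure. It is not the thesis's implementation.

```python
import numpy as np


def logistic_map(z: float) -> float:
    """One step of the logistic map z -> 4z(1 - z), which is chaotic on (0, 1)."""
    return 4.0 * z * (1.0 - z)


def chaotic_inertia(z: float, w_min: float = 0.4, w_max: float = 0.9) -> float:
    """Scale a chaotic value z in [0, 1] to an inertia weight in [w_min, w_max]."""
    return w_min + (w_max - w_min) * z


def toy_fitness(mask: np.ndarray, relevant: np.ndarray, penalty: float = 0.1) -> float:
    """Hypothetical stand-in for classifier accuracy (assumption):
    reward selected relevant features and penalise subset size."""
    return float(np.sum(mask * relevant) - penalty * np.sum(mask))


def ipso_select(relevant, n_particles=20, n_iter=60, p_mut=0.02, seed=0):
    """Binary PSO over 0/1 feature masks with chaotic inertia and bit-flip mutation."""
    rng = np.random.default_rng(seed)
    n = relevant.size
    pos = (rng.random((n_particles, n)) < 0.5).astype(int)  # 1 = feature selected
    vel = rng.uniform(-1.0, 1.0, (n_particles, n))
    pbest = pos.copy()
    pbest_fit = np.array([toy_fitness(p, relevant) for p in pos])
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
    z = 0.37  # chaotic seed (assumption); avoids the map's fixed points
    for _ in range(n_iter):
        z = logistic_map(z)
        w = chaotic_inertia(z)  # inertia weight varies chaotically each iteration
        r1, r2 = rng.random((2, n_particles, n))
        vel = w * vel + 2.0 * r1 * (pbest - pos) + 2.0 * r2 * (gbest - pos)
        vel = np.clip(vel, -4.0, 4.0)
        prob = 1.0 / (1.0 + np.exp(-vel))  # sigmoid transfer to selection probability
        pos = (rng.random((n_particles, n)) < prob).astype(int)
        flip = rng.random((n_particles, n)) < p_mut  # mutation keeps the swarm diverse
        pos = np.where(flip, 1 - pos, pos)
        fit = np.array([toy_fitness(p, relevant) for p in pos])
        better = fit > pbest_fit
        pbest[better] = pos[better]
        pbest_fit[better] = fit[better]
        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
    return gbest, gbest_fit


if __name__ == "__main__":
    relevant = np.array([1, 0, 1, 0, 0, 1, 0, 0])  # toy ground truth (assumption)
    mask, score = ipso_select(relevant)
    print("selected mask:", mask, "fitness:", score)
```

In a real wrapper setting, `toy_fitness` would be replaced by a cross-validated classifier score on the features selected by the mask. A hybrid such as HIPA would additionally exchange recombined global best solutions between this swarm and a second optimizer, which the sketch does not attempt.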
Keywords/Search Tags:pattern recognition, machine learning, feature selection, text classification, swarm intelligence