
Research On Feature Selection Algorithm And Its Application In Content-Based Image Retrieval

Posted on: 2006-08-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 1118360155972589
Subject: Computer software and theory
Abstract/Summary:
As computer applications expand in scope and the Internet develops rapidly, application systems and the Internet produce very large amounts of data, leading to the phenomenon of "data explosion and knowledge scarcity". Data mining is the most effective method for tackling this problem. Data preprocessing is essential to effective data mining, and feature selection is one of the most important preprocessing methods. Feature selection is also a necessary step in machine learning and pattern recognition. Research on feature selection began in the 1960s and has produced many results. However, as new application domains and objects appear, many problems in feature selection still urgently need to be solved. This dissertation introduces these problems in detail and studies the current research focus in depth, especially feature selection algorithms, and reports concrete results.

The author divides the research on feature selection algorithms into three stages. First, the author puts forward a general model of feature selection algorithms and categorizes the algorithms from the perspectives of both researchers and users. This helps users choose an appropriate algorithm, promotes the application of feature selection, and lays a solid foundation for further research.

Second, the author presents several specific feature selection algorithms that are the focus of current research: feature selection in a fuzzy feature space, feature selection in a high-dimensional feature space (supervised and unsupervised), and feature selection with few training samples. For feature selection in a fuzzy feature space, the author uses the extension matrix as the search strategy and the fuzzy similarity between classes as the evaluation criterion. Theoretical analysis and experimental results show that the algorithm has better performance and lower time cost. The algorithm is designed specifically for fuzzy feature selection; it takes full advantage of the fuzziness of the features and can be used with a fuzzy classifier.

For supervised high-dimensional feature selection, the author presents a multi-level filter method based on feature correlation. It efficiently removes irrelevant and redundant features from the original set, and experimental results show that it drastically reduces the dimension of the selected feature set. The author also analyzes other algorithms based on feature correlation and describes how feature correlation is defined and computed, which should help future research.

For unsupervised feature selection, the author gives a layered filter algorithm based on feature ranking. The ranking criterion is exponential entropy, and the evaluation criterion is the fuzzy feature evaluation index. The algorithm eliminates irrelevant and redundant features, handles high-dimensional and noisy data, and has a lower computation cost.

All the algorithms above need ample training samples. How should features be selected when the available training samples are few relative to the dimension of the feature set? The author elaborates on feature selection algorithms that use support vector machines (SVM), which are based on statistical learning theory and aim at structural risk minimization. These algorithms mainly consider the influence of the feature subset on the performance of the SVM. Current research relies on labeled samples; however, as SVM theory develops, unsupervised feature selection using SVM will become feasible.

In addition, the author takes content-based image retrieval as an example of applying feature selection, presenting the necessity and methods of feature selection in content-based image retrieval and studying particular methods such as relevance feedback in depth. The supervised high-dimensional feature selection algorithm presented here is also applied to image classification, with good results. Feature selection is also widely applied in text classification, intrusion detection, gene analysis, and other areas, and its scope of application will grow as the domains of data mining and pattern recognition expand. The dissertation concludes by summarizing the research and indicating future work.
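
To make the idea of a multi-level, correlation-based filter for supervised feature selection more concrete, here is a minimal Python sketch. It is not the dissertation's algorithm: the abstract does not give its definition of feature correlation, so the sketch assumes the absolute Pearson coefficient as the correlation measure, and the function names and both thresholds (`relevance_min`, `redundancy_max`) are hypothetical. The first level drops features weakly correlated with the class label (irrelevant); the second drops features strongly correlated with an already kept feature (redundant).

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_filter(
    columns: List[Sequence[float]],
    labels: Sequence[float],
    relevance_min: float = 0.3,   # assumed threshold, for illustration only
    redundancy_max: float = 0.9,  # assumed threshold, for illustration only
) -> List[int]:
    """Return the sorted indices of the features (columns) that survive both levels."""
    # Level 1: remove irrelevant features (weak correlation with the label).
    relevance = {j: abs(pearson(col, labels)) for j, col in enumerate(columns)}
    relevant = [j for j in relevance if relevance[j] >= relevance_min]

    # Level 2: visit features from most to least relevant (ties broken by index)
    # and remove those that are highly correlated with a feature already kept.
    relevant.sort(key=lambda j: (-relevance[j], j))
    selected: List[int] = []
    for j in relevant:
        if all(abs(pearson(columns[j], columns[k])) < redundancy_max for k in selected):
            selected.append(j)
    return sorted(selected)


# Toy data: six samples, two classes.
labels = [0, 0, 0, 1, 1, 1]
f0 = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]   # relevant
f1 = [2 * v for v in f0]               # exact rescaling of f0, hence redundant
f2 = [1, 2, 3, 1, 2, 3]                # uncorrelated with the label, hence irrelevant
f3 = [0, 1, 0, 1, 2, 1]                # relevant and not redundant with f0

print(correlation_filter([f0, f1, f2, f3], labels))  # [0, 3]
```

Visiting features in order of decreasing relevance means that, within a group of redundant features, the one most correlated with the label is the one kept. A linear measure such as Pearson correlation misses nonlinear dependence, so an information-theoretic measure could be substituted for `pearson` without changing the two-level structure.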
Keywords/Search Tags: Feature Selection Algorithm, Fuzzy Set, Support Vector Machines, Content-Based Image Retrieval