Attribute selection in machine learning based on information theory (Spanish text)

Posted on: 2002-06-20
Degree: Dr
Type: Thesis
University: Universidad de Las Palmas de Gran Canaria (Spain)
Candidate: Lorenzo Navarro, Jose Javier
Full Text: PDF
GTID: 2468390011493984
Subject: Computer Science
Abstract/Summary:
This thesis belongs to the area of Machine Learning, specifically Supervised Inductive Learning. In this area, the quality of the induced knowledge depends heavily on the quality of the measures used in the learning process to represent the concepts. Selecting the most relevant attributes from which to induce knowledge therefore remains an open research subject; it is the problem this thesis studies and for which it proposes a solution. To address it, concepts from Information Theory are used to establish an analogy between an information channel and a classifier, which is the central idea of the thesis.

Within the conceptual framework of the thesis, attribute relevance is defined using the concepts of mutual information and entropy-based distance. The GD measure is a practical method developed to carry out attribute selection according to the relevance of the attributes with respect to the concept or class of interest. The GD measure exhibits a desirable property not found in other approaches: it takes into account the interdependence among attributes without estimating the multivariate probability density functions that appear in the definition of relevance.

To assess the quality of the GD measure, a set of experiments was designed to study different aspects of it. One experiment evaluates the bias the measure shows with respect to the number of attribute values. Another studies the quality of the selected attributes in two categories of datasets: artificial datasets, where attribute relevance is known a priori, and real datasets, where it is unknown. In the latter case, the results obtained with the GD measure are compared with those obtained with other well-known attribute selection methods.

At the end of the thesis, an architecture for inducing classifiers in Computer Vision problems is proposed. Its two main elements are the GD measure and the inducer module, because the quality of the induced classifiers, and hence of the classification, depends on them. The architecture is tested on two different problems: a Knowledge-Based Vision system and an Active Vision system for detecting and identifying faces.
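
The abstract names mutual information and an entropy-based distance as the ingredients of its relevance definition but does not give the formula of the GD measure itself. The following is therefore only a minimal sketch of those two ingredients, not the thesis's method. It assumes discrete attributes, plug-in frequency estimates, and one common form of entropy-based distance, d(X,Y) = H(X|Y) + H(Y|X); the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H(X) in bits, estimated from observed frequencies."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def entropy_distance(xs, ys):
    """Assumed distance form: d(X,Y) = H(X|Y) + H(Y|X) = H(X,Y) - I(X;Y)."""
    return entropy(list(zip(xs, ys))) - mutual_information(xs, ys)


# Hypothetical toy data: one attribute copies the class, one ignores it.
klass = [0, 0, 1, 1, 0, 0, 1, 1]
copy_attr = [0, 0, 1, 1, 0, 0, 1, 1]
noise_attr = [0, 1, 0, 1, 0, 1, 0, 1]

for name, attr in [("copy_attr", copy_attr), ("noise_attr", noise_attr)]:
    mi = mutual_information(attr, klass)
    dist = entropy_distance(attr, klass)
    print(f"{name}: I(attr;class)={mi:.3f} bits, d(attr,class)={dist:.3f} bits")
```

On this toy data the attribute that copies the class shares all of its information with the class and lies at distance zero from it, while the independent attribute shares none. Both quantities here are pairwise; the property the abstract claims for the GD measure, accounting for interdependence among attributes, is not captured by this sketch.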
Keywords/Search Tags: Attribute selection, GD measure, Information, Quality, Thesis