Attribute selection in machine learning based on information theory (Spanish text)

Posted on:2002-06-20

Degree:Dr

Type:Thesis

University:Universidad de Las Palmas de Gran Canaria (Spain)

Candidate:Lorenzo Navarro, Jose Javier

GTID:2468390011493984

Subject:Computer Science

Abstract/Summary:

This thesis belongs to the field of Machine Learning, specifically Supervised Inductive Learning. In this field, the quality of the induced knowledge depends heavily on the quality of the measures used in the learning process to represent the concepts. Selecting the most relevant attributes for inducing knowledge therefore remains an open research problem; this thesis studies that problem and proposes a solution. The approach draws on concepts from Information Theory to establish an analogy between an information channel and a classifier, which is the central idea of the thesis.

Within this conceptual framework, attribute relevance is defined in terms of mutual information and an entropy-based distance. The GD measure is a practical method developed to select attributes according to their relevance to the concept or class of interest. It has a desirable property not found in other approaches: it takes the interdependence among attributes into account without estimating the multivariate probability density functions that appear in the definition of relevance.

To assess the quality of the GD measure, a set of experiments was designed to study different aspects of it. One experiment evaluates the bias of the measure with respect to the number of attribute values. Another studies the quality of the selected attributes on two categories of datasets: artificial datasets, where attribute relevance is known a priori, and real datasets, where it is unknown. In the latter case, the results obtained with the GD measure are compared with those of other well-known attribute selection methods.

Finally, the thesis proposes an architecture for inducing classifiers in Computer Vision problems. Its two main elements are the GD measure and the inducer module, since the quality of the induced classifiers, and hence of the classification, depends on them. The architecture is tested on two problems: a Knowledge-Based Vision system and an Active Vision system for detecting and identifying faces.
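
The two information-theoretic quantities the abstract builds on, mutual information and an entropy-based distance, can be illustrated with a minimal sketch. This is not the GD measure itself, whose definition the abstract does not give; it only scores each attribute against the class one at a time, so it ignores the interdependence among attributes that the GD measure is said to handle. The function names, the toy data, and the particular normalized distance d(X, Y) = (H(X|Y) + H(Y|X)) / H(X, Y) are assumptions chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H(X) in bits, estimated from observed frequencies."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def joint_entropy(xs, ys):
    """Joint entropy H(X, Y) of two equally long discrete sequences."""
    return entropy(list(zip(xs, ys)))


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)


def normalized_entropy_distance(xs, ys):
    """Assumed distance: (H(X|Y) + H(Y|X)) / H(X, Y) = 1 - I(X; Y) / H(X, Y).

    Returns 0.0 when X and Y determine each other, 1.0 when independent.
    """
    h_xy = joint_entropy(xs, ys)
    if h_xy == 0:  # both variables are constant
        return 0.0
    return 1.0 - mutual_information(xs, ys) / h_xy


def rank_attributes(attributes, target):
    """Sort attribute names by increasing distance to the class."""
    scores = {
        name: normalized_entropy_distance(values, target)
        for name, values in attributes.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical toy data: attr_a copies the class, attr_b is independent of it.
cls = [0, 0, 1, 1, 0, 0, 1, 1]
attr_a = [0, 0, 1, 1, 0, 0, 1, 1]
attr_b = [0, 1, 0, 1, 0, 1, 0, 1]

ranking = rank_attributes({"attr_a": attr_a, "attr_b": attr_b}, cls)
for name, distance in ranking:
    print(f"{name}: distance to class = {distance:.3f}")
```

On this toy data, attr_a shares 1 bit of mutual information with the class and sits at distance 0, while attr_b shares none and sits at distance 1, so attr_a is ranked first. Estimating probabilities from raw frequencies, as done here, is only reasonable for discrete attributes with enough samples per value.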

Keywords/Search Tags:

Attribute selection, GD measure, Information, Quality, Thesis

Related items

1	Research And Application Of Feature Selection Algorithm Based On Information Measure
2	Improvement And Application Of Naive Bayes Algorithm Based On Attribute Selection Weighting
3	The Research On Vocational College Graduate Thesis Quality Guarantee System Under Information & Technology Circumstance
4	Research On Attribute Importance Measure Theory And Method Based On Data Coordination
5	Research On Attribute Selection Algorithm Based On Analysis Of Correlation Between Attributes
6	An analysis of the ability of an instrument to measure quality of library service and library success
7	New Research On Several Problems For Attribute Reduction Of Information Table
8	Research On Some Issues Of Knowledge Acquiring And Uncertainty Measure In Information System
9	Feature extraction and selection for speech recognition
10	Research On Online Shop Selection Method Based On User Portraits And Hot Sales