
Research On Two-Stage Feature Selection Methods In Machine Learning

Posted on: 2016-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: G M Liu
Full Text: PDF
GTID: 2348330470969457
Subject: Computer application technology
Abstract/Summary:
With the advent of the era of big data, how to process data quickly and extract useful information from it has become an urgent problem. As a preprocessing step in machine learning and data mining, feature selection has become a research hotspot, since machine learning algorithms themselves are no longer the bottleneck in big data processing. In recent years, many studies have shown that irrelevant and redundant features greatly harm the accuracy and efficiency of machine learning algorithms. It is therefore necessary to choose an appropriate feature selection algorithm that picks effective features out of massive amounts of data, so that machine learning algorithms can be served efficiently.

This thesis studies feature selection methods in machine learning. Our aim is to choose the most effective features from a high-dimensional feature space, in order to improve the efficiency of algorithms and reduce their running time. The main content is divided into the following parts.

Firstly, starting from the classification of feature selection methods, and based on the relationship between feature selection and the learning algorithm, feature selection methods can be divided into the filter model and the wrapper model. The filter model is efficient and widely applicable, and can detect and delete irrelevant features; the wrapper model achieves high accuracy and can form an optimal feature subset free of redundant features. Combining the two models, we propose a two-stage feature selection method.

Secondly, for high-dimensional binary data whose features take only the values 0 and 1, we define the diff-criterion to measure the relationship between features.
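The abstract does not give the formal definition of the diff-criterion, so the following is only a hypothetical stand-in to illustrate the idea of scoring a 0/1 feature against binary class labels: it measures how differently the feature is distributed across the two classes. The function name `diff_score` and the toy data are our own, not taken from the thesis.

```python
def diff_score(feature, labels):
    """Toy relevance score for a binary (0/1) feature against binary labels.

    Hypothetical stand-in for the thesis's diff-criterion: score the
    feature by the absolute difference between its frequency of 1s in
    the positive class and in the negative class. A feature distributed
    identically across the classes scores 0; one that perfectly tracks
    the labels scores 1.
    """
    pos = [f for f, y in zip(feature, labels) if y == 1]
    neg = [f for f, y in zip(feature, labels) if y == 0]
    freq_pos = sum(pos) / len(pos) if pos else 0.0
    freq_neg = sum(neg) / len(neg) if neg else 0.0
    return abs(freq_pos - freq_neg)

labels  = [1, 1, 1, 0, 0, 0]
f_good  = [1, 1, 1, 0, 0, 0]   # tracks the label exactly
f_const = [1, 1, 1, 1, 1, 1]   # constant, carries no class information
print(diff_score(f_good, labels))   # 1.0
print(diff_score(f_const, labels))  # 0.0
```

Because the score needs only two class-conditional frequencies per feature, it can be computed in a single pass over the data, which is consistent with the abstract's claim of efficient correlation analysis on high-dimensional binary data.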
Compared with traditional methods, the diff-criterion greatly improves the efficiency of correlation analysis.

Thirdly, to detect redundant features from the perspective of correlation analysis, we propose a non-linear correlation analysis based on the maximum information coefficient (MIC). This method computes the degree of non-linear correlation between features, which further reduces the dimensionality of the feature subset.

Finally, based on the maximum-relevance minimum-redundancy principle, we propose two feature selection methods. One targets binary data and combines the diff-criterion with Markov blankets; the other is based on symmetrical uncertainty and the maximum information coefficient, in order to obtain the optimal feature subset.
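The combination of symmetrical uncertainty with maximum-relevance minimum-redundancy selection can be illustrated with a minimal greedy sketch over discrete features. This is not the thesis's algorithm: it uses symmetrical uncertainty as the only correlation measure (the MIC computation for continuous non-linear dependence is omitted), and the function names and toy data are our own assumptions.

```python
from math import log2
from collections import Counter

def entropy(xs):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def mutual_info(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def symmetrical_uncertainty(xs, ys):
    """SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), normalised to [0, 1]."""
    h = entropy(xs) + entropy(ys)
    return 2 * mutual_info(xs, ys) / h if h else 0.0

def mrmr_select(features, labels, k):
    """Greedy max-relevance min-redundancy selection of k feature indices.

    At each step, pick the feature maximising SU with the labels
    (relevance) minus its mean SU with already-selected features
    (redundancy).
    """
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        def score(i):
            rel = symmetrical_uncertainty(features[i], labels)
            red = (sum(symmetrical_uncertainty(features[i], features[j])
                       for j in selected) / len(selected)) if selected else 0.0
            return rel - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: f1 duplicates f0, so picking both would be pure redundancy.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 0, 0, 1, 1, 1]   # strongly relevant
f1 = [0, 0, 0, 0, 0, 1, 1, 1]   # exact copy of f0 (redundant)
f2 = [0, 0, 0, 1, 1, 0, 1, 1]   # weakly relevant, nearly independent of f0
print(mrmr_select([f0, f1, f2], labels, 2))  # [0, 2]
```

The redundancy penalty is what makes the selection skip the duplicate `f1` in favour of the weaker but complementary `f2`; a pure relevance ranking would have chosen the two copies.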
Keywords/Search Tags: feature selection, machine learning, diff-criterion, maximum information coefficient