
Research on Feature Selection and Classification for Interactive Features of High-Dimensional Data

Posted on: 2016-01-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Li
GTID: 1108330479950975
Subject: Instrument Science and Technology
Abstract/Summary:
With the arrival of the big-data era, the curse of dimensionality arises in many fields, so feature selection methods that are both effective and efficient have become a research hotspot. Existing approaches include statistical methods, global optimization methods, and penalty-function methods. However, traditional feature selection operates on the original high-dimensional feature space and rarely considers the complex interactions between features, which limits progress in pattern recognition, machine learning, and data mining. Feature interactions make the classification of high-dimensional data, such as medical and bioinformatics data, especially complex. How to retain the simplicity and interpretability of linear methods while accounting for complex interactions between features has therefore become a challenging research problem. Because high-dimensional regression and classification problems are often sparse, linear methods such as the lasso have been highly successful. Models and methods for interactive feature selection based on the mathematical theory of penalty functions and convex optimization still need further study, and big-data analysis urgently needs methods that give interpretable results for classification and regression problems.
To address the low efficiency and high complexity of feature selection, this dissertation gives a mathematical definition of feature interaction and a model for generating interaction features. It builds feature selection models based on penalty functions and convex optimization for classification and regression problems, proposes coordinate descent algorithms grounded in convex optimization, and carries out experiments and evaluations.
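The abstract does not spell out how interaction features are generated or how coordinate descent is applied to them. A minimal sketch, assuming plain pairwise products x_j*x_k as the interaction features (not the dissertation's barycenter-based construction) and a squared-error lasso objective (1/2n)||y - Xb||^2 + lam*||b||_1 with no intercept, minimized by cyclic coordinate descent with soft-thresholding. The function names, `lam`, and the toy data are hypothetical.

```python
import random
from itertools import combinations


def add_pairwise_interactions(X):
    """Append every product x_j * x_k (j < k) to each row of X (assumed construction)."""
    p = len(X[0])
    pairs = list(combinations(range(p), 2))
    names = [f"x{j}" for j in range(p)] + [f"x{j}*x{k}" for j, k in pairs]
    rows = [list(r) + [r[j] * r[k] for j, k in pairs] for r in X]
    return rows, names


def soft_threshold(z, gamma):
    """Proximal operator of gamma * |b|."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # (1/n) x_j^T (partial residual that excludes feature j)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Hypothetical toy data: the response depends on one interaction and one main effect.
rng = random.Random(0)
X_raw = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 * r[0] * r[1] + 0.5 * r[2] for r in X_raw]
X, names = add_pairwise_interactions(X_raw)
beta = lasso_cd(X, y, lam=0.01)
coef = dict(zip(names, beta))
```

Because the columns are not standardized here, the same `lam` shrinks low-variance columns (in this toy data, the products) more strongly; features are usually standardized before fitting in practice.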
This work may be of value both for theoretical research on feature interaction and feature selection and for the practical application of penalty functions and convex optimization.
First, building on the graphical representation of multivariate data and on barycenter interaction features, the dissertation studies interaction features and their selection through global-optimization feature selection. Using barycenter interaction features together with improved genetic or evolutionary operators, it proposes feature selection methods based on an improved genetic algorithm, particle swarm optimization, and differential evolution, with traditional classifiers performing the classification. These methods are effective but inefficient, and their experimental results lay the groundwork for the penalty-function feature selection methods that follow.
Second, building on the study of interaction features, penalty functions, convex optimization, and the lasso, the dissertation proposes elastic net methods with interaction features and applies them to feature selection and classification. It studies lasso-penalized binomial and multinomial logistic regression models with their coordinate descent algorithms, as well as elastic-net-penalized binomial and multinomial logistic regression models with their coordinate descent algorithms. Both traditional classifiers and lasso classifiers are used. Experimental results show that the proposed methods are interpretable, effective, and efficient.
Finally, building on elastic net feature selection and taking both feature interactions and penalty functions into account, the dissertation uses the hierarchy between original features and interaction features to propose hierarchical lasso methods with interaction features, and applies them to feature selection and classification.
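The second stage above pairs elastic-net-penalized logistic regression with coordinate descent. A minimal sketch of the binomial case, assuming the common glmnet-style scheme: an outer quadratic (IRLS-type) approximation of the log-likelihood and an inner cyclic coordinate descent with soft-thresholding, minimizing -(1/n) loglik + lam*((1 - alpha)/2 * ||b||^2 + alpha * ||b||_1) with an unpenalized intercept. The dissertation's own update rules, its multinomial variant, and how it chooses `lam` and `alpha` are not given in the abstract; the function names and toy data are hypothetical, and there is no step-size safeguard.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(z, gamma):
    return math.copysign(max(abs(z) - gamma, 0.0), z)


def enet_logistic_cd(X, y, lam, alpha=0.5, n_outer=50, n_inner=100, tol=1e-7):
    """Outer loop: IRLS approximation; inner loop: coordinate descent on weighted LS."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_outer):
        eta = [b0 + sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        prob = [sigmoid(e) for e in eta]
        w = [max(pr * (1.0 - pr), 1e-5) for pr in prob]  # IRLS weights, floored
        r = [(y[i] - prob[i]) / w[i] for i in range(n)]  # working residual z - eta
        old = [b0] + beta
        for _ in range(n_inner):
            delta = sum(w[i] * r[i] for i in range(n)) / sum(w)  # unpenalized intercept
            b0 += delta
            r = [r[i] - delta for i in range(n)]
            max_change = abs(delta)
            for j in range(p):
                denom = sum(w[i] * X[i][j] ** 2 for i in range(n)) / n
                rho = sum(w[i] * X[i][j] * r[i] for i in range(n)) / n + denom * beta[j]
                # L1 part -> soft-threshold; L2 part -> enlarges the denominator
                new = soft_threshold(rho, lam * alpha) / (denom + lam * (1.0 - alpha))
                d = new - beta[j]
                if d != 0.0:
                    r = [r[i] - X[i][j] * d for i in range(n)]
                    beta[j] = new
                    max_change = max(max_change, abs(d))
            if max_change < tol:
                break
        if max(abs(a - b) for a, b in zip([b0] + beta, old)) < tol:
            break
    return b0, beta


# Hypothetical toy data: only x0 and x1 drive the class label.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(300)]
y = [1 if rng.random() < sigmoid(1.5 * r[0] - 1.5 * r[1]) else 0 for r in X]
b0, beta = enet_logistic_cd(X, y, lam=0.02, alpha=0.5)
```

Setting `alpha=1.0` reduces the same loop to the lasso-penalized logistic model; interaction columns could be appended to `X` beforehand, as with pairwise products.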
The hierarchical lasso research covers: the definition of a hierarchical-lasso-penalized logistic regression model with interaction features; a convex relaxation strategy; computation of the hierarchical model parameters by coordinate descent; computation of the hierarchical model parameters by generalized gradient descent; and a strategy for selecting the optimal regularization parameter. The dissertation proposes hierarchical-lasso-penalized logistic regression models with a corresponding non-convex optimization algorithm, as well as a hierarchical lasso model with coordinate descent for barycenter interaction features. Experimental results show that the proposed methods achieve good classification performance.
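The hierarchical lasso ties interaction terms to their parent main effects, but the abstract does not give the model. The sketch below therefore swaps in a simpler, well-known hierarchy-inducing penalty: a tree-structured group lasso in which each interaction theta_jk is attached only to its lower-indexed parent j. Its proximal operator is obtained by soft-thresholding the interactions first and then group-shrinking {beta_j} together with {theta_jk : k > j}, and it is fitted by proximal (generalized) gradient descent on the mean logistic loss with a fixed step 1/L. This is not the dissertation's hierarchical lasso or its convex relaxation; all names, the toy data, and `lam` are hypothetical.

```python
import math
import random
from itertools import combinations


def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def hier_prox(main, inter, p, thr):
    """Prox of the tree-structured penalty, composed from leaves to root:
    soft-threshold each theta_jk, then group-shrink G_j = {beta_j} + {theta_jk : k > j}."""
    inter = {jk: math.copysign(max(abs(v) - thr, 0.0), v) for jk, v in inter.items()}
    main = list(main)
    for j in range(p):
        keys = [(j, k) for k in range(j + 1, p)]
        norm = math.sqrt(main[j] ** 2 + sum(inter[jk] ** 2 for jk in keys))
        scale = max(1.0 - thr / norm, 0.0) if norm > 0.0 else 0.0
        main[j] *= scale  # zeroing G_j zeroes beta_j and all of its interactions
        for jk in keys:
            inter[jk] *= scale
    return main, inter


def fit_hier_logistic(X, y, lam, n_iter=500):
    """Proximal gradient descent for mean logistic loss + lam * hierarchical penalty."""
    n, p = len(X), len(X[0])
    pairs = list(combinations(range(p), 2))
    Z = [[r[j] * r[k] for j, k in pairs] for r in X]  # interaction columns
    # L = (1/(4n)) * sum_i ||(1, x_i, z_i)||^2 bounds the Hessian's top eigenvalue.
    L = sum(1.0 + sum(v * v for v in X[i]) + sum(v * v for v in Z[i]) for i in range(n)) / (4.0 * n)
    t = 1.0 / L
    b0, main, inter = 0.0, [0.0] * p, {jk: 0.0 for jk in pairs}
    for _ in range(n_iter):
        g0, gm, gi = 0.0, [0.0] * p, [0.0] * len(pairs)
        for i in range(n):
            eta = b0 + sum(main[j] * X[i][j] for j in range(p))
            eta += sum(inter[jk] * Z[i][m] for m, jk in enumerate(pairs))
            res = (sigmoid(eta) - y[i]) / n  # derivative of mean loss w.r.t. eta_i
            g0 += res
            for j in range(p):
                gm[j] += res * X[i][j]
            for m in range(len(pairs)):
                gi[m] += res * Z[i][m]
        b0 -= t * g0  # unpenalized intercept: plain gradient step
        main = [main[j] - t * gm[j] for j in range(p)]
        inter = {jk: inter[jk] - t * gi[m] for m, jk in enumerate(pairs)}
        main, inter = hier_prox(main, inter, p, t * lam)
    return b0, main, inter


# Hypothetical toy data: label driven by x0 and the interaction x0*x1.
rng = random.Random(2)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(1.5 * r[0] + 2.0 * r[0] * r[1]) else 0 for r in X]
b0, main, inter = fit_hier_logistic(X, y, lam=0.02)
```

Under this penalty a nonzero theta_jk comes with a nonzero beta_j for its assigned parent j only, a one-sided (weak) hierarchy; symmetric or strong hierarchy needs a different formulation.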
Keywords/Search Tags: feature interaction, feature selection, classification, differential evolution, hierarchical lasso, coordinate descent