
Feature Selection With Cost Constraint

Posted on: 2016-03-18 | Degree: Master | Type: Thesis
Country: China | Candidate: J K Li | Full Text: PDF
GTID: 2308330464958463 | Subject: Applied Mathematics
Abstract/Summary:
Many kinds of costs arise in practice, including test costs, misclassification costs, computation costs, and delay costs. Test costs are the time, money, or other resources spent on obtaining the data items of an object. Misclassification costs are the penalties incurred when an object whose real class is B is assigned to class A; they are among the most important metrics for evaluating classifier performance. Delay costs refer to time spent waiting during which no useful work can be done. Cost-sensitive learning has therefore attracted much attention from the machine learning and data mining communities.

The constraint satisfaction problem is a common and significant problem in data mining and machine learning, and it has been widely applied in fields such as business, combinatorics, computational complexity theory, cryptography, and applied mathematics. In many practical applications, for economic reasons among others, only part of the available tests can be undertaken. At present, many researchers focus on the partial constraint satisfaction problem and solve cost-sensitive constraint satisfaction problems by defining different constraint conditions.

Cost-sensitive learning is an active research topic in data mining, and the constraint satisfaction problem is one of the most famous problems in artificial intelligence and machine learning. We therefore study these two problems with rough set theory and propose the feature selection problem with cost constraint in this thesis. The goals of the research are to handle different types of data models for the cost-sensitive constraint satisfaction problem and to design efficient algorithms under different constraint conditions.

This thesis consists of two parts. In the first part, we study feature selection based on cost-sensitive rough sets for nominal data. First, we develop a fast randomized algorithm for the minimal test cost feature selection problem under a time cost constraint; reducing runtime is the main goal of the algorithm design. Second, to further reduce the runtime, we add a restart strategy to the fast randomized algorithm, yielding a fast randomized algorithm with restart; the restart strategy significantly improves the performance and stability of the algorithm, especially on large datasets. Finally, we present a quadratic heuristic algorithm for the minimal feature selection problem with a multi-cost constraint; this new problem takes both test costs and misclassification costs into account in the cost constraint for the first time. Experimental results indicate that the proposed algorithm finds optimal results in most cases and is much faster than the backtracking algorithm. An illustrative sketch of this style of cost-constrained selection is given after the abstract.

In the second part, we study cost-constrained feature selection on numeric data. We introduce an adaptive neighborhood model for numeric data in which the number of inconsistent objects in a neighborhood decreases monotonically with the feature subset: adding a new feature to the subset never increases the number of inconsistent objects. Based on this model, we propose a sine heuristic algorithm for cost-constrained feature selection on numeric data. Experimental results show that the new algorithm achieves good effectiveness and efficiency.
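The abstract describes the algorithms only at a high level and gives no pseudocode. The sketch below is an illustrative, simplified stand-in rather than the thesis's actual fast randomized algorithm: it performs randomized greedy feature selection over a nominal decision table under a single test-cost budget, using the rough-set positive region as the quality measure. The function names, the toy data, and the single-budget simplification are assumptions made here for illustration only.

import random

def positive_region_size(data, labels, features):
    # Count objects whose equivalence class (w.r.t. the chosen features)
    # is consistent, i.e. all objects in the class share one decision label.
    groups = {}
    for row, label in zip(data, labels):
        key = tuple(row[f] for f in features)
        groups.setdefault(key, set()).add(label)
    consistent = {k for k, v in groups.items() if len(v) == 1}
    return sum(1 for row in data
               if tuple(row[f] for f in features) in consistent)

def randomized_cost_constrained_selection(data, labels, test_cost, budget,
                                          trials=200, seed=0):
    # Randomized search: sample attribute orders, greedily add attributes
    # that enlarge the positive region without exceeding the cost budget,
    # and keep the cheapest subset with the largest positive region.
    rng = random.Random(seed)
    n_features = len(data[0])
    best_subset, best_pos, best_cost = None, -1, float("inf")
    for _ in range(trials):
        order = list(range(n_features))
        rng.shuffle(order)
        subset, cost = [], 0.0
        pos = positive_region_size(data, labels, subset)
        for f in order:
            if cost + test_cost[f] > budget:
                continue
            new_pos = positive_region_size(data, labels, subset + [f])
            if new_pos > pos:          # keep f only if it helps
                subset.append(f)
                cost += test_cost[f]
                pos = new_pos
        if pos > best_pos or (pos == best_pos and cost < best_cost):
            best_subset, best_pos, best_cost = sorted(subset), pos, cost
    return best_subset, best_pos, best_cost

if __name__ == "__main__":
    # Toy nominal decision table: 3 features, binary decision.
    data = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1), (0, 0, 1)]
    labels = [0, 0, 1, 1, 0]
    costs = [3.0, 1.0, 2.0]
    print(randomized_cost_constrained_selection(data, labels, costs, budget=4.0))

A restart strategy of the kind mentioned in the first part of the thesis would correspond, in this simplified setting, to abandoning an unpromising sampled ordering early and starting a fresh one, which tends to stabilize results on large datasets.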
Keywords/Search Tags: cost-sensitive learning, rough sets, cost constraint, feature selection, randomized algorithm