
A Study Of The Merging Algorithm For Cost-sensitive Attribute Reduction

Posted on: 2017-11-26    Degree: Master    Type: Thesis
Country: China    Candidate: A T Zhang    Full Text: PDF
GTID: 2358330482999343    Subject: Engineering
Abstract/Summary:
Data mining, also known as knowledge discovery in databases (KDD), is currently a hot topic in artificial intelligence and database research. In the real world, the number of attributes in a dataset can reach dozens, hundreds, or even thousands. Many of these attributes are redundant; they can interfere with the data mining process or reduce the efficiency of the algorithms. Attribute reduction is therefore employed as a preprocessing technique. On the other hand, actions and objects in the real world carry costs, such as test costs, misclassification costs, and delay costs, which involve money, time, labor, and other resources. Cost-sensitive learning is devoted to mining problems that involve such costs.

Several test-cost-sensitive attribute reduction problems have been studied, including the minimal-test-cost reduct problem, the minimal-test-cost reduct problem with simple common costs, and the minimal-time-cost reduct problem; different algorithm frameworks have been applied to them. Heuristic algorithms are fast, but because they often converge to a local optimum, their success rate is low. Backtracking algorithms are guaranteed to find the optimal solution, but their running time is unacceptable. Bionic (bio-inspired) algorithms are often able to find the optimal solution, but at a high time cost. Recently, a semi-greedy algorithm was applied to this problem with good results.

In this paper, building on backtracking and divide-and-conquer, we present a simple yet effective algorithm for this problem, called merge reduction. It combines three key techniques: divide and conquer, backtracking, and a competition mechanism. First, a number of subtables are obtained by randomly dividing the attributes into groups. The group size g has a great impact on the performance of the algorithm; let n denote the number of attributes. In the extreme case where g = n, merge reduction degenerates into the backtracking algorithm. For each subtable, a reduct is obtained through backtracking. Then the reducts of each pair of adjacent groups are merged to form a new subtable. This process repeats until only one subtable remains and the final reduct is computed. Because the attributes are partitioned into groups, the computed reduct may be only locally optimal; since a local optimum need not be the global optimum, the optimal reduct is sometimes missed. Because a single run may be unsatisfactory, we employ a competition strategy: the process is repeated p times and the best reduct is chosen.

We apply merge reduction to the minimal-test-cost reduct problem, the minimal-test-cost reduct problem with simple common costs, and the minimal-time-cost reduct problem, and test it extensively on four real-world datasets from the UCI (University of California, Irvine) repository under three different test-cost distributions. The experiments show that once p exceeds 6, the results of merge reduction become stable. The optimal g differs slightly between problems: it is 6 for the minimal-test-cost reduct problem, and 7 for both the minimal-test-cost reduct problem with simple common costs and the minimal-time-cost reduct problem. Compared with the existing heuristic algorithm, ant colony algorithm, and backtracking algorithm, merge reduction shortens the running time while maintaining an acceptable success rate. Therefore, merge reduction is efficient and stable.
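The divide-merge-compete procedure described above can be sketched in Python. This is an illustrative toy, not the thesis implementation: the discernibility-based reduct test and the exhaustive search in `min_cost_reduct` (standing in for the backtracking step) are our own simplifications, and all names and the sample decision table are hypothetical.

```python
from itertools import combinations
import random

def discerns(rows, attrs):
    """Pairs of rows with different decisions that attrs can tell apart.
    rows: list of (attribute-value dict, decision)."""
    pairs = set()
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            (xi, di), (xj, dj) = rows[i], rows[j]
            if di != dj and any(xi[a] != xj[a] for a in attrs):
                pairs.add((i, j))
    return pairs

def min_cost_reduct(rows, attrs, cost):
    """Cheapest subset of attrs preserving the discernibility of attrs itself.
    Exhaustive search here; the thesis uses backtracking for this step."""
    target = discerns(rows, attrs)
    best, best_cost = list(attrs), sum(cost[a] for a in attrs)
    for r in range(1, len(attrs) + 1):
        for sub in combinations(attrs, r):
            c = sum(cost[a] for a in sub)
            if c < best_cost and discerns(rows, sub) == target:
                best, best_cost = list(sub), c
    return best

def merge_reduction(rows, attrs, cost, g, p, seed=0):
    """Merge reduction: random groups of size g, reduce each group,
    merge reducts of adjacent groups, repeat; competition over p runs."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(p):                        # competition mechanism
        order = list(attrs)
        rng.shuffle(order)                    # random division into groups
        groups = [order[i:i + g] for i in range(0, len(order), g)]
        groups = [min_cost_reduct(rows, grp, cost) for grp in groups]
        while len(groups) > 1:                # merge adjacent reducts
            merged = []
            for k in range(0, len(groups), 2):
                pair = groups[k] + (groups[k + 1] if k + 1 < len(groups) else [])
                merged.append(min_cost_reduct(rows, pair, cost))
            groups = merged
        reduct = groups[0]
        c = sum(cost[a] for a in reduct)
        if c < best_cost:
            best, best_cost = reduct, c
    return best, best_cost

# Toy decision table: c alone already separates the two decision classes,
# d is constant (redundant), and c is also the cheapest test.
rows = [
    ({'a': 0, 'b': 0, 'c': 0, 'd': 0}, 0),
    ({'a': 0, 'b': 1, 'c': 1, 'd': 0}, 1),
    ({'a': 1, 'b': 0, 'c': 1, 'd': 0}, 1),
    ({'a': 1, 'b': 1, 'c': 0, 'd': 0}, 0),
]
cost = {'a': 2, 'b': 2, 'c': 1, 'd': 5}
reduct, total = merge_reduction(rows, ['a', 'b', 'c', 'd'], cost, g=2, p=3)
```

On this toy table every run recovers the single-attribute reduct {c} with test cost 1; on real data, the competition over p runs is what compensates for groupings that discard globally useful attributes early.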
Keywords/Search Tags: cost-sensitive learning, rough sets, attribute reduction, merge, backtracking, competition