Font Size: a A A

Research On Optimization Algorithm For Dataset Covering Problem

Posted on:2021-01-21Degree:MasterType:Thesis
Country:ChinaCandidate:R X LiuFull Text:PDF
GTID:2428330611999981Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
In the era of data science,it is common to train learning algorithms based on certain datasets.Through surveys or scientific experiments,we can collect data sets prospectively.Recently,it has been recognized that the training data set is only representative,it is not enough.If the trained system is to handle some less popular categories well,it must include enough examples from these categories.This is the dataset coverage problem.The dataset coverage problem is different from the imbalanced learning problem,and the method of directly using the imbalanced learning may not be able to deal with such problems well.So someone proposed the concept of Maximal Uncovered Pattern(MUP).On this basis,there are already three algorithms for obtaining MUP,which are Pattern-Breaker algorithm,Pattern-Combiner algorithm and Deep Diver algorithm.Then,it proposes how to search as few data tuples as possible according to the MUP to be considered to achieve an approximate algorithm based on greedy strategy to solve the problem of lack of coverage.However,the existing methods still have room for optimization and improvement.Based on the aforementioned research,this paper mainly studies the following content.(1)We researched and analyzed the advantages and disadvantages of the three existing algorithms for acquiring MUP and the applicable scenarios,and studied the differences and connections between acquiring MUP algorithms and frequent itemset mining.(2)Combining the relevant research in association rule mining and the idea of search algorithm,we research and propose an improved algorithm Fast Deep Diver algorithm for Deep Diver algorithm,which can identify uncovered patterns faster and obtain MUP faster,thereby filtering out more node.(3)We propose a solution to the problem of computing coverage in the face of data sparse problems and insufficient bitmaps and insufficient memory.(4)Faced with the need to obtain the data tuples obtained by using the existing approximation algorithm based on greedy ideas,but such data tuples are difficult to collect in real life,we propose a pattern bitmap-based Matching algorithm to handle this situation.In this paper,through experiments on three data sets,we verified the impact of the lack of coverage problems.At the same time,we compared the performance of each acquisition MUP algorithm in various scenarios through experiments,analyzed their advantages and disadvantages and applicable scenarios,and verified The superiority of Fast Deep Diver algorithm compared to Deep Diver algorithm,Fast Deep Diver algorithm has higher running performance than Deep Diver algorithm in most cases,and has better stability.
Keywords/Search Tags:Dataset coverage problem, MUP, Imbalanced learning, Frequent itemset mining, Data sparseness
PDF Full Text Request
Related items