A Method To Evaluate The Coverage Of The Data Set To The Particular Data

Posted on: 2022-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: J W Miao
Full Text: PDF
GTID: 2518306572460074
Subject: Computer technology
Abstract/Summary:
In the era of data science, data sets are routinely used to train learning algorithms, and the training data often has to be collected proactively. If a model is to perform well on infrequent data, the training data set must contain enough examples similar to that data. Insufficient coverage of the data to be predicted by the training data set often leads to inaccurate predictions. To foresee such inaccuracies in advance, this paper proposes a method for evaluating how well a data set with multi-dimensional categorical attributes covers the data to be predicted. The coverage is taken to depend on the amount of similar data shared between the data set and the data to be predicted, and a pattern is used to represent one kind of similar data. Multiple patterns are then used to observe, from multiple perspectives, the amount of similar data in the data set and in the data to be predicted. On this basis, the coverage of the training data set over the data to be predicted is evaluated in four steps. (1) An appropriate pattern set is extracted from the data to be predicted, with each pattern serving as a different perspective from which to observe that data. (2) A deep autoregressive model is used instead of an expensive full table scan to quickly predict the coverage of a pattern, and its accuracy and its running-time advantage are verified on three data sets. (3) A heuristic method is proposed for determining the coverage threshold of a pattern, which is used to judge whether a single pattern is adequately covered; the overall coverage of the data to be predicted is then obtained by voting over the results of the multiple patterns. (4) Finally, for data to be predicted with insufficient coverage, a tree search method is used to find the cause of the insufficient coverage and to suggest targeted supplementation and augmentation of the data set. Coverage evaluation experiments on three real data sets of different sizes achieve best accuracies of 0.8, 0.78 and 0.57, respectively, with an average running time of only 8 ms.
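The notions of pattern coverage, coverage threshold and voting described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and everything in it is an assumption made for illustration. A pattern is represented as a tuple in which `None` acts as a wildcard. Coverage is counted with a plain full table scan, the baseline that the thesis replaces with a deep autoregressive model. The threshold is a fixed number rather than the heuristic of step (3), and the vote is a simple majority. The function names and the toy data are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Assumed representation: one entry per categorical attribute, where a value
# fixes that attribute and None is a wildcard that matches anything.
Pattern = Tuple[Optional[str], ...]
Row = Sequence[str]


def matches(row: Row, pattern: Pattern) -> bool:
    """Return True if the row agrees with every non-wildcard entry of the pattern."""
    return all(p is None or p == v for p, v in zip(pattern, row))


def pattern_coverage(rows: Sequence[Row], pattern: Pattern) -> int:
    """Count the rows matching the pattern with a full scan (the slow baseline)."""
    return sum(matches(row, pattern) for row in rows)


def is_covered(rows: Sequence[Row], patterns: Sequence[Pattern], threshold: int) -> bool:
    """Majority vote over patterns; each pattern votes 'covered' if its count
    reaches the threshold. Both the fixed threshold and the majority rule are
    assumptions, not the thesis's heuristic."""
    votes = [pattern_coverage(rows, p) >= threshold for p in patterns]
    return 2 * sum(votes) > len(votes)


# Hypothetical training data with three categorical attributes.
train = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]

# Hypothetical patterns extracted from the data to be predicted.
patterns = [
    ("red", None, None),
    (None, "small", None),
    ("blue", "large", None),
]
```

With this toy data, calling `is_covered(train, patterns, threshold=2)` counts the matches for each pattern and checks whether more than half of the patterns reach the threshold. A full scan of this kind costs time proportional to the table size for every pattern, which is the cost the abstract's learned estimator is meant to avoid.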
Keywords/Search Tags: Coverage, Pattern, Deep Autoregressive Models, Machine Learning