A Method To Evaluate The Coverage Of The Data Set To The Particular Data

Posted on:2022-03-27

Degree:Master

Type:Thesis

Country:China

Candidate:J W Miao

Full Text:PDF

GTID:2518306572460074

Subject:Computer technology

Abstract/Summary:

In the era of data science, learning algorithms are usually trained on data sets, and these data sets often have to be collected proactively. If a model is to perform well on infrequent data, the training data set must contain enough examples similar to that data. Insufficient coverage of the data to be predicted by the training data set often leads to inaccurate predictions. To foresee these inaccuracies, this paper proposes a method for evaluating how well a data set with multi-dimensional categorical attributes covers the data to be predicted. The central idea is that coverage depends on how many records in the data set are similar to the data to be predicted, where a pattern represents one kind of similar data, and that this number should be observed from multiple perspectives by using multiple patterns. On this basis, the evaluation proceeds in four steps. (1) This paper first describes how to extract an appropriate pattern set from the data to be predicted, with each pattern serving as a different perspective on that data. (2) It then proposes using a deep autoregressive model, instead of an expensive full table scan, to quickly predict the coverage of a pattern, and verifies the accuracy of the model and its advantage in running time on three data sets. (3) Next, it proposes a heuristic method for determining the coverage threshold of a pattern, which is used to judge whether a single pattern is adequately covered. The overall coverage of the data to be predicted is obtained by voting over the results of the multiple patterns. (4) Finally, for data to be predicted whose coverage is insufficient, it proposes a tree search method to find the cause of the insufficient coverage and suggests targeted supplementation and enhancement of the data set. Coverage evaluation experiments on three real data sets of different sizes obtain best accuracies of 0.8, 0.78 and 0.57, respectively, with an average running time of only 8 ms.
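The pattern-based view of coverage described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation, and it rests on several assumptions that the abstract does not confirm: a pattern is modelled as a tuple with wildcards, patterns are extracted by fixing every combination of `level` attributes of the record, coverage is counted by an exact full scan (the thesis replaces this step with a deep autoregressive model), the threshold is a fixed constant rather than the thesis's heuristic, and the vote is a simple majority. All function and parameter names are hypothetical.

```python
from itertools import combinations

WILDCARD = None  # assumption: a wildcard position matches any attribute value


def patterns_for(record, level):
    """Hypothetical pattern extraction: fix every combination of `level` attributes."""
    n = len(record)
    return [
        tuple(record[i] if i in kept else WILDCARD for i in range(n))
        for kept in combinations(range(n), level)
    ]


def matches(pattern, row):
    """A row matches when it agrees with the pattern on every non-wildcard position."""
    return all(p is WILDCARD or p == v for p, v in zip(pattern, row))


def coverage(pattern, dataset):
    """Exact coverage by full scan; the thesis estimates this count with a model."""
    return sum(1 for row in dataset if matches(pattern, row))


def is_covered(record, dataset, level=2, threshold=2):
    """Majority vote over per-pattern decisions (assumed voting rule and threshold)."""
    patterns = patterns_for(record, level)
    votes = {p: coverage(p, dataset) >= threshold for p in patterns}
    covered = 2 * sum(votes.values()) > len(votes)
    return covered, votes


train = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]

covered, votes = is_covered(("red", "small", "round"), train)
print(covered, votes)

covered, votes = is_covered(("blue", "large", "square"), train)
print(covered, votes)
```

The per-pattern votes returned by `is_covered` show which perspectives lack support in the training data, which is the kind of information a follow-up search for the cause of insufficient coverage would start from.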

Keywords/Search Tags:

Coverage, Pattern, Deep Autoregressive Models, Machine learning

Related items

1	Deep Learning Models And Applications Based On The Restricted Boltzmann Machine
2	Stock Price Forecasting Based On Support Vector Machine And GARCH Models
3	Induction Motor Fault Diagnosis Based On Deep Learning Models
4	Research On Novel Deep Forest Models
5	Coverage-guided Efficient Fuzzing Technology Based On Deep Learning
6	Task-oriented Machine Reading Comprehension Via Deep Learning
7	The Research On Deep Coverage Technology And Its Application Of Tai'an 4G Network
8	Deep Learning Models for RNA-Protein Bindin
9	Nearly 15 Years In Beijing Urban Area Remote Sensing Monitoring Of Vegetation Cover And Its Thermal Environment
10	Research On Deep Learning Models And Algorithms For Speaker Recognition