Research On The Method Of Evaluation Of Data Reduction Effect

Posted on: 2013-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: W K Zhong
Full Text: PDF
GTID: 2248330392457799
Subject: Computer system architecture
Abstract/Summary:
With the development of computer technology, massive amounts of data are being generated. To reduce the time and storage needed to process such data, data reduction is performed before further analysis. The quality of the reduction directly affects whether the subsequent analysis is correct. Research on reasonable methods for evaluating the effect of data reduction is therefore of both theoretical and practical significance.

Because changes to the instances and features of a data set affect classification, the thesis gives formulas for a new macro-F1 value for two-class and multi-class data sets. By analyzing how class radius, inter-class distance, and the number of instances in the data set affect classification accuracy, it proposes a new classification-based method for evaluating data reduction, intended for data sets with an apparent class structure.

For data reduction based on instance selection, the effect of reduction on similarity is analyzed from two sides: data editing and data compression. Based on an analysis of frequency distributions, quantiles, and distances between instances, three similarity-based evaluation methods are proposed: one based on the Mahalanobis distance, one based on the Q-Q plot, and one based on the statistical histogram. These methods are suitable for any data set.

The effect of feature selection and instance selection on the spatial autocorrelation of a data set before and after reduction is also described. By analyzing Moran's I, a statistic of spatial autocorrelation, an evaluation method based on spatial autocorrelation is given. This method is suitable for data sets with high spatial autocorrelation.

The research on evaluation methods for data reduction based on feature selection and instance selection has theoretical and practical value, and the results also help improve the efficiency of processing massive data.
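
Two of the statistics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's new macro-F1 formula, so the code below assumes the standard macro-averaged F1 (the unweighted mean of per-class F1 scores) and the standard global Moran's I. The function names `macro_f1` and `morans_i` and the list-of-lists weight matrix are illustrative assumptions, not taken from the thesis.

```python
def macro_f1(y_true, y_pred):
    """Standard macro-averaged F1 (assumed here; the thesis's own
    'new' formula is not stated in the abstract)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Per-class F1 = 2TP / (2TP + FP + FN); 0 if the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij.

    I = (n / sum_ij w_ij) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2
    Assumes the values are not all identical (non-zero variance).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Hypothetical toy data: four points on a line, neighbours share weight 1.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
f1 = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
i_smooth = morans_i([1, 2, 3, 4], chain)
i_alternating = morans_i([1, -1, 1, -1], chain)
```

One way to use such statistics for evaluation, in the spirit of the abstract, is to compute them on the data set before and after reduction and compare the two values: a reduction that preserves the class structure or the spatial pattern should change them only slightly.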
Keywords/Search Tags: instance, features, spatial autocorrelation, classification accuracy, reduction evaluation