
An Empirical Study on the Quality Evaluation Method of China's Forestry Statistical Data Based on Data Mining

Posted on: 2022-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: W X Zhu
GTID: 2493306737475984
Subject: Master of Applied Statistics
Abstract/Summary:
In recent years, as China has continued to build a transparent ("sunshine") and service-oriented government, the quality of government statistical data has improved across the board. Forestry statistics, however, an important part of government statistical data, still suffer from poor quality, low accuracy, and even data distortion, which weakens their statistical function and especially their role in supporting decision-making. Research on how to evaluate the quality of forestry statistical data therefore has important theoretical and practical value.

Against this background, this thesis studies data quality assessment methods. First, drawing on a literature review, it summarizes the state of research on data quality theory and on anomaly detection in forestry data, examines anomaly recognition methods, and compares anomaly detection algorithms. Second, taking forestry statistical time series as its object, it evaluates the quality of existing forestry statistical data by combining data mining techniques with manual expert judgment. The data are divided by content and nature into forestry ecological statistics, forestry investment statistics, and forestry industry statistics. For each group, the distribution of outliers is first assessed in a preliminary way and detection methods are chosen according to the distributional characteristics of the data; once anomalies are detected, both local pattern changes and overall changes are analyzed to uncover the information behind them. Outliers are identified, and data quality assessed, with unsupervised KNN, LOF, CBLOF, and isolation forest algorithms for the forestry ecological data, unsupervised LOF and CBLOF for the forestry investment data, and unsupervised LOF for the forestry industry statistics.

The results show that all four unsupervised anomaly detectors (KNN, LOF, CBLOF, and isolation forest) accurately identify the distribution of the initial outliers and are well suited to anomaly recognition in forestry statistical data. KNN and isolation forest perform better when the data differ greatly across dimensions. Isolation forest, in addition, requires no assumptions about parameters and is easier to apply, while LOF takes local variation into account and finds points whose changes do not follow the local pattern. When the data fall into clear categories, CBLOF is efficient and performs well. From a data quality perspective, the analysis of the selected forestry ecological, investment, and industry statistics accurately reveals the emergence of new phenomena or new mechanisms, and forestry statistical data as a whole are of good quality. In the forestry ecological statistics, however, China's artificial afforestation area from 1984 to 1985 is abnormally high and is judged to be a noise point.

The main contribution of this thesis is a data-mining-based anomaly detection approach for assessing the quality of forestry statistical data, combining inconsistent-data analysis with anomaly detection methods. The empirical study shows that this approach can effectively identify anomalies and evaluate them.
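The outlier scores behind two of the detectors named above, KNN and LOF, can be sketched in a few lines. The block below is a minimal pure-Python illustration under stated assumptions, not the thesis's code or parameters: it assumes Euclidean distance, a hypothetical neighborhood size k = 2, exactly k neighbors per point (ties broken by index, a simplification of standard LOF), and a made-up toy series rather than real forestry data. The helper names `k_nearest`, `knn_scores`, and `lof_scores` are hypothetical. CBLOF and isolation forest are not shown.

```python
import math

# Hypothetical helpers for illustration only; not the thesis's implementation.

def k_nearest(points, k):
    """Indices of each point's k nearest other points (Euclidean distance).

    Simplification: exactly k neighbors are kept, ties broken by index,
    whereas textbook LOF includes every point tied at the k-distance.
    """
    result = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in ranked[:k]])
    return result


def knn_scores(points, k):
    """KNN outlier score: distance from each point to its k-th nearest neighbor."""
    nbrs = k_nearest(points, k)
    return [math.dist(p, points[nbrs[i][-1]]) for i, p in enumerate(points)]


def lof_scores(points, k):
    """Local Outlier Factor: close to 1 inside a cluster, well above 1 for outliers."""
    nbrs = k_nearest(points, k)
    k_dist = knn_scores(points, k)  # k-distance of every point

    def reach_dist(i, o):
        # reachability distance of point i with respect to its neighbor o
        return max(k_dist[o], math.dist(points[i], points[o]))

    # local reachability density = inverse of the mean reachability distance
    lrd = []
    for i in range(len(points)):
        mean_reach = sum(reach_dist(i, o) for o in nbrs[i]) / k
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard against duplicate points

    # LOF = mean ratio of the neighbors' densities to the point's own density
    return [sum(lrd[o] for o in nbrs[i]) / (k * lrd[i]) for i in range(len(points))]


# Hypothetical toy series (one value per year), NOT data from the thesis.
series = [5.0, 5.2, 4.9, 5.1, 9.8, 5.3, 5.4]
points = [(v,) for v in series]  # 1-D points; math.dist needs sequences

knn = knn_scores(points, k=2)
lof = lof_scores(points, k=2)
most_anomalous = max(range(len(series)), key=lambda i: lof[i])
print(most_anomalous, series[most_anomalous])  # expected: 4 9.8
```

The KNN score is simply large for a value far from its neighbors, whereas LOF compares each point's local density with that of its neighbors, so ordinary points score near 1 and isolated points score much higher. This local comparison is what lets LOF flag a value that breaks the local pattern even when it is not extreme globally.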
Keywords/Search Tags: Forestry Statistics, Quality Assessment, Anomaly Detection