Study On Outlier Detection Algorithm And Its Application In Grain Situation Analysis

Posted on: 2022-03-16 | Degree: Master | Type: Thesis
Country: China | Candidate: T F Liu
GTID: 2481306605468864 | Subject: Computer technology
Abstract:
In grain storage, the factors that affect storage safety, such as temperature, humidity, moisture, and pests, are collectively called grain conditions. The quantities that can be measured directly in a granary are air temperature, silo temperature, grain temperature, silo humidity, air humidity, grain dampness, and grain pile moisture. Grain in China's depots is stored for long periods, during which it is exposed to non-biological factors such as temperature, humidity, and moisture, as well as biological factors such as pests and molds. These factors readily cause the grain to heat and mildew, which degrades the quality of the stored grain and reduces its quantity. As grain depots have been digitized, a large amount of grain condition data has been collected. Using this data to build a grain condition early warning system suited to modern depot management is of strategic significance for safeguarding China's grain reserves. Outlier detection helps to find abnormal grain conditions quickly, raise the hit rate of early warnings, reduce storage losses, and keep stored grain safe.

Outlier detection is an important research direction in data mining: identifying outliers exposes information that supports better decisions. At present, most grain condition data mining either studies how temperature and humidity change under normal conditions or relies on statistical methods alone to judge abnormality. This "one size fits all" approach has serious shortcomings for judging abnormal grain conditions, because it ignores the information hidden in the genuinely abnormal data. To mine that information effectively, this thesis applies outlier detection to grain condition data analysis. Taking into account that grain data is large and complex, has many attributes, and is unbalanced, it proposes two optimized outlier detection algorithms and, based on their different characteristics, designs and implements an early warning system for abnormal grain data. The main research contents are as follows:

(1) For grain condition data that is large in volume and has many attributes, an outlier detection algorithm based on spectral clustering is proposed. Traditional spectral clustering has two problems: it uses a global scale parameter, so it cannot accurately reflect the true distribution of the data set, and it clusters the eigenvectors with K-means, which makes it sensitive to the initial parameters. The proposed algorithm adopts the idea of the Self-Tuning algorithm. When the similarity matrix is constructed, a local scale parameter is set for each sample, so that the algorithm follows the true distribution of the data more closely and becomes more accurate. When the feature space is clustered, bisecting K-means replaces standard K-means to reduce the time cost. An outlier index is then computed for each sample and used to detect outliers. Experimental results show that the optimized algorithm effectively detects outliers in grain condition data, with higher accuracy and recall than the traditional algorithm.

(2) To detect abnormal grain conditions promptly and reduce the time cost, given the high time complexity of the algorithm above and the redundant, extremely unbalanced nature of grain condition data, an outlier detection algorithm based on isolation forest is proposed. The basic idea is to build the isolation forest only from isolation trees that are both diverse and accurate, which avoids uneven performance among the trees. After the isolation trees are constructed, a diversity value computed with the Q statistic and an accuracy value computed by cross-validation are combined into a fitness function, and the trees with better fitness values are selected to form the forest. In the parameter optimization stage, the ROC curve and AUC are used to select the most effective parameters. In the performance evaluation stage, the optimized isolation forest algorithm detects outliers in grain data better than the traditional algorithm, with higher accuracy and recall and with linear time complexity. Finally, the outliers detected by the two algorithms are combined, and a Pareto analysis of the causes of the outliers is carried out.

(3) A comparison of the two proposed algorithms shows that the algorithm based on spectral clustering has higher accuracy and recall but a high time cost, while the algorithm based on isolation forest achieves good accuracy and recall with linear time complexity. These complementary characteristics are used to design and implement an early warning system for abnormal grain conditions. The system includes modules for system information management, grain depot management, grain condition pre-processing, and abnormal early warning.

The two optimized outlier detection algorithms proposed in this thesis, built around the characteristics of grain condition data, can effectively detect outliers, which is of great significance for grain condition data mining, for research on patterns in grain conditions, and for grain storage safety.
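The local scale parameter described in (1) can be illustrated with a small sketch. It assumes the usual Self-Tuning formulation, in which the scale of sample i is its distance to its k-th nearest neighbour and the similarity is exp(-d(i, j)^2 / (sigma_i * sigma_j)). The abstract does not give the thesis's exact formula, its choice of k, or its outlier index, so the function name `local_scale_affinity`, the default `k`, and the sample readings below are hypothetical and for illustration only.

```python
import math


def local_scale_affinity(points, k=2):
    """Build a similarity matrix using one local scale per sample.

    Assumed form (Self-Tuning style, not confirmed by the abstract):
        A[i][j] = exp(-d(i, j)^2 / (sigma_i * sigma_j)),  A[i][i] = 0
    where sigma_i is the distance from sample i to its k-th nearest neighbour.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")

    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Sorted row: index 0 is the sample itself (distance 0),
    # so index k is the k-th nearest neighbour.
    # The floor guards against a zero scale when samples are duplicated.
    sigma = [max(sorted(row)[k], 1e-12) for row in dist]

    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                affinity[i][j] = math.exp(-dist[i][j] ** 2 / (sigma[i] * sigma[j]))
    return affinity


# Hypothetical (grain temperature, silo humidity) readings; the last one is far from the rest.
readings = [(20.0, 60.0), (20.5, 61.0), (21.0, 59.5), (35.0, 80.0)]
A = local_scale_affinity(readings, k=2)
for row in A:
    print(["%.3g" % value for value in row])
```

Because each pair is scaled by both samples' own neighbourhood sizes, dense and sparse regions of the data are treated on comparable terms, which a single global scale cannot do. The resulting matrix would then feed the spectral embedding and clustering steps, which are not shown here.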
Keywords: Outlier Detection, Grain Situation Data, Spectral Clustering, Isolation Forest, Abnormal Warning