Research On Big Data Cleaning Algorithm Of Sand Mining Based On K-Means-CNN

Posted on: 2021-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
Full Text: PDF
GTID: 2392330611468171
Subject: Computer application technology
Abstract/Summary:
Data quality is central to data mining. High-quality data provides accurate and comprehensive information, which helps people make correct judgments and decisions, so it is very important to clean data well and improve its quality. Researchers have proposed many data cleaning methods, which have solved the data cleaning problems of most fields. However, these methods still have many deficiencies in handling missing values, abnormal values, and duplicate values, mainly because of flaws in the algorithms and inaccurate classification of data problems. Moreover, there is currently no effective cleaning method for sand mining big data.

River sand mining activities are becoming more and more frequent, and the sand mining process generates a variety of closely related data, including business data, equipment data, sensor data, and artificial (manually entered) data. Among these, the river information collected by sensors is very helpful for analyzing the sand mining business, while some backup and incomplete data are redundant or garbage data. Analyzing and applying such data uncleaned leads to misleading decisions. Cleaning the data therefore allows the mining data to be analyzed fully and guides people toward the right decisions. Big data and deep learning technologies have become the mainstream of data processing. River sand mining bears on people's livelihood, and its problems urgently need to be solved. This thesis improves on existing data cleaning methods and proposes a data cleaning method for sand mining big data based on clustering and convolutional neural networks. The main work is as follows:

First, the data sources are investigated to understand sand mining big data and to study the main quality problems of the source data. Based on these problems, specific algorithm models are designed, structured, and optimized to ensure that the data can be classified accurately.

Second, the classification of large data sets is studied. A K-Means clustering algorithm that uses the least squares method is applied to cluster the sand content data set, with a threshold set by least squares to reduce the influence of outliers on the clustering result. The clustered data set then serves as the training sample for a convolutional neural network, and simulated training is performed to obtain a network model. The model is continuously optimized and tuned to obtain the best training result.

Third, experimental data are fed into the tuned data cleaning model. After the model classifies the data, its output is compared with the input experimental data and the error is calculated. A value whose error falls outside the allowed range is considered abnormal and is then corrected.

Fourth, the experimental results are compared and analyzed. Applying the model to the river intelligent sand mining supervision platform shows that the data cleaning algorithm combining K-Means clustering and a convolutional neural network can effectively handle sand mining big data. The cleaning improves the quality of sand mining data and yields effective suggestions for sand mining work.
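
The clustering step described in the second point can be illustrated with a minimal sketch. The abstract does not say how the least-squares threshold is computed, so the rule used here is an assumption: a reading is flagged when its distance from its cluster centroid exceeds `factor` times the cluster's root-mean-square residual. The function names `kmeans_1d` and `flag_outliers`, the deterministic initialization, and the sample readings are likewise hypothetical, and the CNN stage is not shown.

```python
from statistics import mean


def kmeans_1d(values, k, max_iters=100):
    """Lloyd's K-Means on scalar readings; returns (centroids, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: k evenly spaced order statistics (illustrative choice).
    centroids = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    labels = [0] * n
    for _ in range(max_iters):
        # Assignment step: each reading joins its nearest centroid.
        labels = [min(range(k), key=lambda j: (v - centroids[j]) ** 2)
                  for v in values]
        # Update step: the mean minimises the cluster's sum of squared errors.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(mean(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def flag_outliers(values, centroids, labels, factor=2.0):
    """Indices of readings farther than factor * RMS residual from their centroid.

    Assumed thresholding rule; the thesis abstract gives no formula.
    """
    flagged = []
    for j, c in enumerate(centroids):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if not idx:
            continue
        rms = (sum((values[i] - c) ** 2 for i in idx) / len(idx)) ** 0.5
        flagged.extend(i for i in idx if abs(values[i] - c) > factor * rms)
    return sorted(flagged)


# Made-up sand content readings: two regimes plus one suspicious spike.
readings = [1.0, 1.1, 0.9, 1.05, 0.95, 1.02, 0.98, 1.0,
            3.0, 3.1, 2.9, 3.05, 2.95, 3.02, 2.98, 3.0, 9.0]
centroids, labels = kmeans_1d(readings, k=2)
outlier_idx = flag_outliers(readings, centroids, labels)
cleaned = [v for i, v in enumerate(readings) if i not in outlier_idx]
```

In this sketch the flagged readings are simply dropped before any further use. Under the pipeline the abstract describes, the clustered data would instead go on to train the convolutional neural network.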
Keywords/Search Tags: Sand mining big data, K-Means clustering, CNN, Data cleaning