| With ongoing urbanization in China, large-scale urban construction has followed, and the output of construction waste is increasing year by year. To better achieve the goals of reduction, recycling, and harmless treatment of construction waste, the Ministry of Housing and Urban-Rural Development launched a national pilot project for construction waste management. Entrusted by the Ministry, the project team developed an information reporting system for the national construction waste management platform to support information collection and exchange between the Ministry and the pilot cities. During operation of the system, however, we found that the quality of the data reported by the pilot cities was uneven, with inaccuracies, duplications, and omissions that affected the Ministry's statistical analysis of the pilot work. Against this background, this study investigates cleaning strategies and quality control methods for the spatiotemporal construction waste data reported by pilot cities. The main research contents and results are as follows: (1) Based on the attribute characteristics of the construction waste spatiotemporal data, a multi-constraint combination model is constructed, and the TOPSIS algorithm is used to calculate the degree of matching between each type of data and each cleaning model, from which the optimal cleaning model is selected. (2) For the types of "dirty data" in the reporting system, a natural-language duplicate cleaning model, an abnormal data cleaning model, and a missing data filling model are built to perform data consistency checks and to handle invalid and missing values. (a) The natural-language duplicate cleaning model is built on the N-Gram algorithm and compared experimentally with the edit distance algorithm and the Smith-Waterman algorithm. The results show that the precision of the proposed cleaning model ranges from 87.5% to 96.1%, higher than that of the other two algorithms; a second reported rate ranges from 87.42% to 93.2%, and overall accuracy is much higher than that of the other two algorithms. (b) The abnormal data cleaning model is built on the Laida (3σ) criterion. Outlier detection was carried out on the inventory inspection data of multiple pilot cities, the number of outliers occurring in each city was analyzed, and the outliers were eliminated by the ignore-tuple method. (c) The missing data filling model is built on an improved LSTM cleaning algorithm and compared with the traditional LSTM algorithm. The average RMSE of the proposed filling model is 11.708 versus 22.653 for the traditional LSTM algorithm, and its average MAPE is 9.064% versus 16.942%, so the improved LSTM algorithm proposed in this paper is more accurate. (3) To test the overall quality of the data reported by each pilot city, a construction waste data quality assessment model is built. For first-level indicators such as accuracy, uniqueness, consistency, and integrity, the AHP-entropy combined weighting method is used to calculate the indicator weights, and data quality levels are assigned using the fuzzy comprehensive evaluation method. Finally, a reasonably sound processing scheme for reported data is designed. A data quality assessment experiment was carried out on the data reported by 35 pilot cities in September 2019, and the results were fed back to the pilot cities. No pilot city objected to the data quality level it was assigned, which supports the reliability of the model's evaluation results. (4) To consolidate the data cleaning models and the data quality assessment model, a construction waste reporting data cleaning platform was developed, automating the cleaning and quality assessment of reported construction waste data. Through this platform, the Ministry can obtain quality information on reported data in a timely manner, feed back data quality grading results to the pilot cities, and urge them to rectify their data. Application of the system shows that it can effectively improve the overall accuracy of reported construction waste data. In summary, this study designs a processing framework and a software system that combine data cleaning and quality assessment for the spatiotemporal data reported by construction waste pilot cities, achieving accurate matching of cleaning models, automatic filling of problem data, and scientific evaluation of data quality. Improved methods are proposed for sensitive word detection and data pre-filling, providing a reliable guarantee for improving the quality of reported data. Practical application shows that the processing framework and software system are efficient and usable and effectively support the pilot work on construction waste management. |
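
The abnormal data cleaning step named in the abstract (outlier detection by the Laida criterion, followed by removal with the ignore-tuple method) can be illustrated with a minimal sketch. The Laida criterion is the 3σ rule: a value is flagged when it lies more than three standard deviations from the mean. Everything specific in the sketch is an assumption for illustration only: the function name `laida_filter`, the record fields `city` and `tonnes`, the use of the sample standard deviation, and the sample values are not taken from the study.

```python
from statistics import mean, stdev


def laida_filter(records, key, k=3.0):
    """Split records into (kept, outliers) with the Laida (3-sigma) criterion.

    A record is an outlier when |value - mean| > k * sample standard deviation.
    Dropping the whole record corresponds to the ignore-tuple method.
    """
    values = [r[key] for r in records]
    if len(values) < 2:
        # A standard deviation cannot be estimated; keep everything.
        return list(records), []
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumption)
    kept, outliers = [], []
    for r in records:
        if sigma > 0 and abs(r[key] - mu) > k * sigma:
            outliers.append(r)
        else:
            kept.append(r)
    return kept, outliers


# Hypothetical reported quantities: 19 plausible values and one gross error.
records = [{"city": f"city_{i}", "tonnes": 95.0 + i} for i in range(19)]
records.append({"city": "city_19", "tonnes": 1000.0})

kept, outliers = laida_filter(records, "tonnes")
print(len(kept), [r["city"] for r in outliers])
```

One design point worth noting: with the sample standard deviation, a single value can lie at most (n-1)/√n standard deviations from the mean, so with 10 or fewer records no value can ever exceed the 3σ threshold. A real deployment would need enough records per group, or a different threshold, for the criterion to be meaningful.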