Industries that manufacture products through physical or chemical changes are known as process industries. In recent years, the process industry has been moving toward intelligent and optimized manufacturing. Its production processes are highly real-time, and as enterprises develop and the number of manufacturing sites grows, the volume of data generated in production rises sharply. Because no unified data standard has been established across enterprises, the real-time data are heterogeneous. The process of acquiring raw materials and producing, using, and disposing of a product is called the product's full life cycle, and integrating data across the full life cycle of the process industry is one focus of this article. At present, data integration methods and research related to the full life cycle of the process industry remain insufficient. Clean and reliable data are a prerequisite for efficient and accurate data analysis, so data cleaning is an essential step; however, the process industry has not yet formed a complete data cleaning system, and cleaning methods for process industry production data remain to be studied.

To address these problems, this paper adopts a data warehouse approach to integrate the full-life-cycle data of the process industry and proposes an improved ETL (Extract-Transform-Load) framework for this integration. The framework ensures the efficiency and stability of data integration by adding an intermediate database between data extraction and transformation. First, the intermediate database stores the cleaned data: when the transformation rules change, the system does not need to re-execute the entire ETL process but only reads the data from the intermediate database and transforms it directly, which improves integration efficiency. Second, the underlying data sources and the intermediate database do not interfere with each other, so a failure in an underlying data source does not affect the intermediate database, which ensures the stability of data integration.

In addition, this article combines the domain knowledge of experts in related fields and proposes an improved Long Short-Term Memory (LSTM) algorithm for anomaly detection and missing-value filling in process industry production data. It first combines the technical experience of domain experts with an unsupervised learning algorithm to screen abnormal data, and then restructures the standard LSTM algorithm: a dual-input structure strengthens the correlation between data features and makes the model more expressive. The trained model's Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) meet the detection requirements; that is, the values the model predicts for missing entries are close to the true values.

Finally, this paper designs and implements a data integration and cleaning system for the process industry. Cement mill quality data from a company are used as the test set; by cleaning this test set, the value of the above work is verified and the data quality is effectively improved.
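The intermediate-database idea can be illustrated with a minimal sketch. This is not the paper's implementation: the table name, cleaning rule, and transformation rules below are hypothetical, and SQLite stands in for whatever database the framework actually uses. The point it demonstrates is that once cleaned data is staged, a change of transformation rules only re-runs the transform-and-load step, never the extraction.

```python
import sqlite3

def extract_and_clean(raw_rows):
    """Extract stage: drop obviously invalid records (hypothetical rule:
    discard missing or negative readings)."""
    return [(ts, val) for ts, val in raw_rows if val is not None and val >= 0]

def stage_to_intermediate(conn, rows):
    """Persist cleaned rows in the intermediate database."""
    conn.execute("CREATE TABLE IF NOT EXISTS staging (ts TEXT, value REAL)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    conn.commit()

def transform_and_load(conn, transform):
    """Read from the intermediate database and apply the *current* transform
    rule. Re-running with a new rule never touches the raw sources."""
    rows = conn.execute("SELECT ts, value FROM staging").fetchall()
    return [(ts, transform(v)) for ts, v in rows]

raw = [("t1", 10.0), ("t2", None), ("t3", -5.0), ("t4", 20.0)]
conn = sqlite3.connect(":memory:")
stage_to_intermediate(conn, extract_and_clean(raw))

# Rule v1: pass-through; rule v2: a unit conversion introduced later.
# Only transform_and_load re-runs when the rule changes.
v1 = transform_and_load(conn, lambda v: v)
v2 = transform_and_load(conn, lambda v: v / 1000.0)
print(v1)  # [('t1', 10.0), ('t4', 20.0)]
print(v2)  # [('t1', 0.01), ('t4', 0.02)]
```

Because the staging table is isolated from the raw sources, an outage of a source system after staging would likewise leave `transform_and_load` unaffected, which is the stability property the framework claims.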
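The screening step that combines expert experience with an unsupervised criterion can be sketched as follows. The abstract does not name the unsupervised algorithm, so a simple z-score test stands in for it here, and the physical limits `lo`/`hi` are hypothetical expert-supplied thresholds.

```python
import statistics

def screen_anomalies(values, lo, hi, z_thresh=3.0):
    """Flag a value as abnormal if it violates an expert-supplied physical
    range, or if it is a statistical outlier by z-score (stand-in for the
    unsupervised learning step)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    flags = []
    for v in values:
        rule_violation = not (lo <= v <= hi)          # expert domain knowledge
        z = abs(v - mean) / std if std > 0 else 0.0   # unsupervised criterion
        flags.append(rule_violation or z > z_thresh)
    return flags

vals = [50.2, 49.8, 50.1, 50.0, 120.0, 49.9]
print(screen_anomalies(vals, lo=40.0, hi=60.0))
# [False, False, False, False, True, False]
```

Combining the two criteria matters in practice: the expert rule catches physically impossible readings even in small samples, while the statistical test catches values that are in-range but inconsistent with the recent data distribution.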
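For reference, the two evaluation metrics mentioned above have standard definitions; the numbers below are made-up illustrations, not results from the paper.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average absolute deviation from the true values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error: average relative error, in percent.
    Assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 420.0]
print(mae(y_true, y_pred))   # 13.333... (absolute units)
print(mape(y_true, y_pred))  # 6.666... (percent)
```

A model whose predictions for deliberately held-out values keep both metrics low is what the abstract means by "the predicted missing values are close to the true values."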