Industrial data from complex manufacturing processes pose many challenges for anomaly analysis and modeling because the processes are multi-stage and the data are collected in a variety of ways. A common type of data in complex manufacturing processes is the multivariate time series. Because anomalies occur rarely in real-world manufacturing processes, the proportion of labeled abnormal data is very small, which results in imbalanced multivariate time series data. This kind of data restricts the application of artificial intelligence technology because of its imbalance, numerous variables, and complex relationships. Artificial intelligence models applied in manufacturing processes mainly consist of machine learning and deep learning methods. However, models that perform well in the lab often fail or perform worse in practical applications, mainly because they adapt or generalize poorly to real data. Supervised machine learning models require a large amount of labeled data for training, and a shortage of labeled data for a given class introduces bias. Deep unsupervised learning models are often chosen when labeled data are lacking, but differences in the training data affect the generalization variance. In this thesis, imbalanced multivariate time series data from complex manufacturing processes are studied and analyzed, and new methodologies for data quality modeling are developed. Together they form a holistic solution for improving the performance of anomaly detection and prediction with supervised and unsupervised learning models in manufacturing processes. The innovation mainly includes the following three aspects:

1. Data acquisition system analysis and data quality assessment. (1) Before taking action to improve data quality, the first step is to establish whether the data collected by the measuring device or data acquisition system can be trusted. An experimental design and analysis of the measuring equipment or data acquisition system should be carried
out; data fitting and repeatability and reproducibility (Gage R&R) analysis can be used to validate the stability and inherent errors of the measuring system. (2) Complex manufacturing processes produce a data product as well as a physical product. A complex manufacturing process can be treated as a kind of data production process in which data flow through the process stages. The quality of the data product can be evaluated, like that of a physical product, using a multi-dimensional assessment system. This system combines qualitative and quantitative methods to comprehensively evaluate the quality of imbalanced multivariate time series data. The results of the data acquisition system analysis and the data quality assessment are the basis and precondition for the subsequent data quality improvement.

2. Data quality improvement. The proposed methodology for data quality improvement mainly includes training data selection and data augmentation. (1) The Covariance based Distance of Multivariate Time Series (CDMTS) is proposed to calculate the “distance” between multivariate time series, which makes it suitable for training data selection. Different training datasets are monitored in real time with a data quality control chart, and automatic warnings are given for poor-quality datasets that exceed the control limit. The proposed method can also be used to classify multivariate time series of different lengths. (2) A Similarity based Multivariate Time Series Data Augmentation (IMTSA) approach is proposed, which produces data with attributes similar to those of the original time series. The data imbalance is reduced and the data volume is increased, which constitutes an improvement in data quality.

3. Anomaly detection and prediction based on data quality improvement. (1) When the data augmentation results are combined with two widely used supervised machine learning models, Random Forest and Logistic Regression, the
anomaly detection performance improves significantly. (2) According to the results of training data selection, the deep unsupervised learning model Autoencoder is decomposed into multiple isomorphic autoencoders (AEs) with different thresholds, while the data quality of the training process is controlled by controlling the variance of the training datasets, so as to improve anomaly prediction performance. By comparison, the AEs based on data quality improvement, a data-centric methodology, achieved better prediction results than model-centric methods such as AE, LSTM-AE, and GRU-AE.

Data quality modeling and improvement help to understand the data more deeply and improve the anomaly detection and prediction performance of artificial intelligence models in complex manufacturing processes. In the future, the generality of the data quality modeling methodology and improvement theory proposed in this thesis should be validated on more practical manufacturing processes to achieve its true potential. We envision that the theory and methodologies of imbalanced multivariate time series data quality modeling and data-centric intelligent manufacturing can be improved and augmented continuously with the increasing sophistication of manufacturing demand and the proliferation of different varieties of industrial data.
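
The abstract does not state the CDMTS formula, so the following is only an illustrative sketch of the general idea of a covariance-based distance between multivariate time series, not the thesis's definition. It assumes the distance is the Frobenius norm of the difference between the two sample covariance matrices, which lets series of different time lengths be compared as long as they share the same variables. The function names `covariance_distance` and `upper_control_limit`, and the mean-plus-k-sigma limit, are hypothetical choices made for this example.

```python
import numpy as np


def covariance_distance(x, y):
    """Assumed distance: Frobenius norm of the covariance-matrix difference.

    x, y: arrays of shape (time_steps, n_variables). The time lengths may
    differ, but the number of variables must match.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or y.ndim != 2 or x.shape[1] != y.shape[1]:
        raise ValueError("expected 2-D series with the same number of variables")
    # rowvar=False: each column is a variable, each row a time step.
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))
    return float(np.linalg.norm(cov_x - cov_y, ord="fro"))


def upper_control_limit(baseline_distances, k=3.0):
    """Assumed control limit: mean + k * sample standard deviation."""
    d = np.asarray(baseline_distances, dtype=float)
    return float(d.mean() + k * d.std(ddof=1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(200, 3))      # reference training window
    similar = rng.normal(size=(150, 3))        # same process, shorter window
    shifted = 5.0 * rng.normal(size=(120, 3))  # much larger variance

    print("similar:", covariance_distance(reference, similar))
    print("shifted:", covariance_distance(reference, shifted))
```

Under these assumptions, a candidate training dataset whose distance to the reference exceeds the limit computed from known-good datasets would trigger a warning, mirroring the control-chart monitoring described above.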