
Research On Storage And Query Of Quality Data Of Steel Plates Based On Spark

Posted on: 2019-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: K C Mao
Full Text: PDF
GTID: 2428330548978818
Subject: Computer Science and Technology
Abstract/Summary:
Strip quality data is an important type of data collected during strip steel production. It consists mainly of time series that are high-dimensional, noisy, and non-stationary, so similarity queries over this data have always been difficult. Related research based on the distributed computing framework Spark is especially scarce. Most methods that use Spark for similarity queries over time series data still operate on native RDDs, and their performance drops rapidly once the data volume exceeds what a partition can handle. To address the problems encountered in similarity queries over strip quality data on Spark, this thesis carries out the following research:

(1) First, we study the basic issues in time series similarity queries, such as similarity measures, time series representation methods, and indexing methods. In view of the characteristics of strip quality data, we study the stationarity of the time series data. The Empirical Mode Decomposition (EMD) method is introduced, and a piecewise linear representation (PLR) method based on important-point segmentation is introduced to reduce the dimensionality of the time series data.

(2) We study the efficiency of R-tree and MVP-tree indexing on the distributed in-memory computing platform. For similarity queries over massive time series data, the index is persisted on the storage nodes, enabling the system to perform temporal query operations with high throughput and low latency. Evaluation experiments on temporal query efficiency show that query performance on Spark varies with the index used and with the dimensionality of the time series.

(3) Based on the in-memory computing framework Spark and its sub-framework SparkSQL, an extended system, S-TSQS (Spark Time-series Similarity Query System), is designed to support similarity queries over time series data. The system uses Spark's extensibility to add new Dataset API methods and introduces an index management mechanism for Spark. Compared with the original Spark Dataset query processing under different influencing parameters, this solution performs better.
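
The dimension-reduction step in (1) can be illustrated with a minimal sketch of piecewise linear representation by important points. The abstract does not define how the thesis selects important points, so this sketch assumes a simple rule: keep both endpoints plus every local extremum that differs from the previously kept point by at least `min_change`. The function names `important_points` and `plr_reconstruct` and the `min_change` parameter are hypothetical, chosen only for illustration.

```python
from typing import List


def important_points(series: List[float], min_change: float) -> List[int]:
    """Return indices of the endpoints plus significant local extrema.

    Assumed rule (not taken from the thesis): a local peak or valley is
    kept only if it differs from the previously kept point by at least
    min_change.
    """
    n = len(series)
    if n <= 2:
        return list(range(n))
    kept = [0]
    for i in range(1, n - 1):
        prev_v, v, next_v = series[i - 1], series[i], series[i + 1]
        is_peak = v >= prev_v and v > next_v
        is_valley = v <= prev_v and v < next_v
        if (is_peak or is_valley) and abs(v - series[kept[-1]]) >= min_change:
            kept.append(i)
    kept.append(n - 1)
    return kept


def plr_reconstruct(series: List[float], kept: List[int]) -> List[float]:
    """Rebuild a full-length series by linear interpolation between kept points."""
    if not kept:
        return []
    out = []
    for a, b in zip(kept, kept[1:]):
        for i in range(a, b):
            t = (i - a) / (b - a)
            out.append(series[a] + t * (series[b] - series[a]))
    out.append(series[kept[-1]])
    return out


if __name__ == "__main__":
    signal = [0.0, 1.0, 5.0, 4.8, 5.1, 1.0, 0.0, 3.0, 3.0]
    idx = important_points(signal, min_change=2.0)
    print(idx)
    print(plr_reconstruct(signal, idx))
```

Only the kept indices and their values need to be stored or indexed, so a larger `min_change` gives a coarser but more compact representation. The small wiggle around the peak in the sample signal is dropped because it changes by less than the threshold.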
Keywords/Search Tags: Strip steel quality data, Time series, Similarity query, Two stage processing, Time series index