Research On Storage And Query Of Quality Data Of Steel Plates Based On Spark

Posted on:2019-10-06

Degree:Master

Type:Thesis

Country:China

Candidate:K C Mao

Full Text:PDF

GTID:2428330548978818

Subject:Computer Science and Technology

Abstract/Summary:

Strip quality data is an important type of data collected during strip steel production. It consists mainly of time series that are high-dimensional, noisy, and non-stationary. Similarity queries over such data have therefore always been difficult, and little related research builds on the distributed computing framework Spark. Most existing methods that use Spark for time series similarity queries remain at the level of native RDDs, and their performance drops rapidly once the data volume exceeds what a partition can handle. To address the problems encountered in similarity queries over strip quality data on Spark, this thesis conducts the following research:

(1) First, we study the basic issues underlying time series similarity queries, such as similarity measures, time series representation methods, and indexing methods. Given the characteristics of strip quality data, we studied the stationarity of the time series and introduced the Empirical Mode Decomposition (EMD) method. For dimension reduction, we introduced a piecewise linear representation (PLR) method based on important-point segmentation.

(2) We studied the efficiency of R-tree and MVP-tree indexing on the distributed in-memory computing platform. For similarity queries over massive time series data, the index is persisted on the storage nodes, enabling the system to perform time series query operations with high throughput and low latency. Evaluation experiments on query efficiency show that, for time series of different dimensionality, query results on Spark differ depending on the index used.

(3) Based on the in-memory computing framework Spark and its sub-framework Spark SQL, we designed S-TSQS (Spark Time-series Similarity Query System), an extended system that supports similarity queries over time series data. The system uses Spark's extensibility to add new Dataset API methods and introduces an index management mechanism for Spark. Under a range of influencing parameters, this solution performs better than the original Spark Dataset query processing.
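As an illustration of the dimension reduction step described in item (1), the following is a minimal sketch of a piecewise linear representation built from important points. The abstract does not say how the thesis defines an important point, so the rule used here (a local extremum that differs from the last kept point by at least a threshold) is an assumption, as are the function names `important_points`, `plr_reconstruct`, and `euclidean`. It is plain Python and does not use Spark.

```python
from math import sqrt


def important_points(series, threshold):
    """Return the indices kept by the PLR: both endpoints plus every local
    extremum whose value differs from the last kept point by >= threshold.
    This selection rule is an assumption made for illustration."""
    n = len(series)
    if n <= 2:
        return list(range(n))
    kept = [0]
    for i in range(1, n - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        is_extremum = (cur > prev and cur >= nxt) or (cur < prev and cur <= nxt)
        if is_extremum and abs(cur - series[kept[-1]]) >= threshold:
            kept.append(i)
    kept.append(n - 1)
    return kept


def plr_reconstruct(series, kept):
    """Rebuild a full-length series by linear interpolation between the
    kept points. Expects a non-empty series and its kept indices."""
    out = []
    for a, b in zip(kept, kept[1:]):
        for i in range(a, b):
            t = (i - a) / (b - a)
            out.append(series[a] + t * (series[b] - series[a]))
    out.append(series[kept[-1]])
    return out


def euclidean(x, y):
    """Euclidean distance between two equal-length series."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical sample: small oscillations around one large spike.
series = [0, 1, 0, 1, 0, 5, 0, 1, 0]
kept = important_points(series, threshold=2)
approx = plr_reconstruct(series, kept)
error = euclidean(series, approx)
```

Here the nine samples are reduced to four kept points, with the small oscillations smoothed away and the spike preserved. A larger threshold keeps fewer points and increases the reconstruction error, which is the usual trade-off when a reduced representation is used for similarity queries.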

Keywords/Search Tags:

Strip steel quality data, Time series, Similarity query, Two-stage processing, Time series index

Related items

1	Query Processing Techniques Based On Time Series Analysis
2	Study On Water Quality Time Series Data Mining And Application Integration
3	Study On Similarity Query Over Time Series Data
4	Research On Uncertain Time Series Similarity Matching
5	Research On Test System Of Data Management Platform For The Time Series Database
6	The Approximate Query Research Of Time Series Based On Linear Hash Index
7	Time Series Data Mining Technology And Its Applied Research In The Prediction Of Water Quality
8	Time Series Similarity, Aggregate Top-k Query Algorithms And Applications
9	Research On Data Mining And Forecasting Methods Over Time Series Data With Complex Structure
10	Research On Real-time Identification Method Of Data Stream Time Series Events