
Query Optimization For Multi-Source Steel Logistics Data

Posted on: 2024-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: T Zou
GTID: 2568307067493594
Subject: Software Engineering
Abstract/Summary:
With the wide adoption of online freight platforms in steel logistics, large volumes of logistics data from different sources are being generated, and these data play an important role in the digital transformation of the steel industry. To further reduce costs and improve efficiency in steel logistics, efficient multi-source data management and query optimization methods are urgently needed to support the core functions of steel freight platforms, such as vehicle-cargo matching, route planning, and transportation monitoring. At present, steel freight platforms lack a unified method for managing multi-source logistics data: most of them pre-join the tables involved in queries and analysis into wide tables, which causes data redundancy and poor flexibility. In recent years, many studies in academia and industry have focused on the efficient management of spatio-temporal data for spatio-temporal query tasks, but these methods cannot be applied directly to steel freight platforms, whose multi-source logistics data include both spatio-temporal data and other operational data.

Steel logistics data are massive, multi-source, dynamic, and structurally complex, which poses a series of challenges for efficient storage and querying:

(1) Operational data such as cargo orders, waybills, vehicles, and inventory are stored in different tables. Analysis tasks such as vehicle-cargo matching and vehicle scheduling often require multi-table join queries, whose computation and transmission costs are high. The first problem is therefore how to design a data filtering strategy that removes redundant columns and redundant tuples from the joined tables, and how to optimize the join order, so as to reduce network transmission and memory consumption.

(2) Applications such as route planning and transportation monitoring must query, at high frequency, the historical and real-time trajectories of the vehicles executing each transportation task. The second problem is how to integrate massive, high-speed trajectory data with smaller-scale data such as vehicles and waybills to meet these query needs.

(3) Steel freight mainly involves long-distance transportation, in which vehicles make multiple, lengthy stops at different locations (such as restaurants, rest areas, gas stations, and maintenance points). Route planning for transport vehicles therefore often involves trajectory similarity queries based on stay points. The third problem is how to define trajectory similarity in terms of stay points and how to design efficient stay-point-based trajectory similarity query methods.

To address these problems, this thesis designs a method for efficiently managing multi-source data such as transportation orders, waybills, vehicles, and trajectories. On this basis, for three typical tasks (multi-table join queries, freight trajectory queries, and stay-point-based trajectory similarity queries), it proposes multi-table join query optimization based on column traceability, query optimization for multi-attribute freight trajectories, and trajectory similarity query optimization based on stay points. The main work comprises the following:

1. Multi-table join query optimization based on column traceability. To address inefficient multi-table join queries over steel logistics operational data, a traceability algorithm for data columns is proposed and implemented, and redundant columns are filtered out according to the traceability results to reduce the size of intermediate projected data. In addition, an algorithm for filtering redundant tuples is proposed and implemented: statistics on the join columns are collected from each table and used to filter out tuples that cannot satisfy the join conditions. Finally, a dynamic programming algorithm guided by heuristic rules is designed to optimize the join order of multi-table join queries, further improving their efficiency. Experiments show that this design reduces query time by 50% and memory usage by 30%.

2. Range queries over multi-attribute freight trajectories. To meet the requirements of trajectory range queries in transportation monitoring and similar applications, this thesis defines the multi-attribute freight trajectory and designs and implements a strategy for fusing trajectory data with logistics operational data. Then, to address skewed data access, a tiered storage strategy is designed using the NoSQL databases HBase and Redis: historical data are stored in HBase, while real-time data are stored in Redis. On top of this tiered storage model, a spatio-temporal index and an attribute index are built to improve the efficiency of range queries over multi-attribute freight trajectories. Experiments show that the proposed solution reduces storage space by 50% and query time by 40%.

3. Trajectory similarity queries based on stay points. First, a trajectory similarity measure based on a sequence representation of stay points is proposed. Then, based on this measure, a grid-based filter-and-verify mechanism is proposed for similarity queries over long-distance freight trajectories on a single machine. To obtain the final results more efficiently, a signature-based algorithm is used in the verification phase to filter out clearly dissimilar trajectories. Finally, to meet the similarity query requirements of large-scale long-distance trajectory data, a distributed similarity query framework based on Spark is implemented, which distributes the trajectory data evenly across partitions and builds a local index on each partition. Experiments show that this method improves query speed by more than 20% over the baseline methods.

In summary, this thesis studies the management of multi-source steel logistics data and proposes a method for efficiently managing multi-source data such as transportation orders, waybills, vehicles, and trajectories. First, for multi-table join queries over logistics operational data, a column traceability algorithm is proposed; through preprocessing, redundant columns and tuples are eliminated and the join order is optimized, improving multi-table join query performance. Second, for multi-attribute freight trajectory queries, a columnar storage model suited to freight trajectories is designed, and historical and real-time data are managed and indexed in separate tiers, improving query efficiency for frequently accessed hotspot data. Finally, given the large number of stops made during steel freight transportation, and with the goal of providing trajectory similarity queries that meet the needs of steel logistics scenarios, a trajectory similarity measure that accounts for the spatio-temporal characteristics of stay points is designed, together with indexing and pruning strategies that improve query efficiency.
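The abstract does not spell out the column traceability algorithm. A minimal sketch, assuming that each derived column in a query records the columns it is computed from (the `lineage` map and the names `trace_columns` and `prune_columns` are hypothetical), traces the columns a query needs back to base-table columns and projects each table onto only those columns before the join:

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def trace_columns(required: List[str], lineage: Dict[str, List[str]]) -> Set[str]:
    """Follow derivation edges from the columns a query needs back to base columns.

    `lineage` maps a derived column to the columns it is computed from (hypothetical
    format); any name without an entry is treated as a base column 'table.column'.
    """
    base: Set[str] = set()
    seen: Set[str] = set()
    stack = list(required)
    while stack:
        col = stack.pop()
        if col in seen:          # guards against revisiting shared sources
            continue
        seen.add(col)
        sources = lineage.get(col, [])
        if sources:
            stack.extend(sources)
        else:
            base.add(col)
    return base


def prune_columns(rows: List[Row], table: str, needed: Set[str]) -> List[Row]:
    """Project a table's rows onto the base columns the trace marked as needed."""
    keep = {c.split(".", 1)[1] for c in needed if c.startswith(table + ".")}
    return [{k: v for k, v in r.items() if k in keep} for r in rows]


lineage = {
    "load_pct": ["load_ratio"],
    "load_ratio": ["orders.weight", "vehicles.capacity"],
}
# Join keys are listed explicitly so the projection never drops them.
required = ["load_pct", "waybills.waybill_id", "orders.order_id",
            "waybills.order_id", "waybills.vehicle_id", "vehicles.vehicle_id"]
needed = trace_columns(required, lineage)
orders = [{"order_id": 1, "weight": 30.0, "cargo": "coil", "remark": "urgent"}]
print(prune_columns(orders, "orders", needed))
```

Join columns have to be listed in `required` explicitly; otherwise the projection would drop the keys that the join itself needs.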
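The redundant-tuple filter can be illustrated as a semi-join reduction. In this sketch the "statistic" kept for a join column is simply its exact set of distinct values; that is an assumption (a production system might keep a Bloom filter or a min/max range instead), and the function names and sample rows are hypothetical. A tuple is dropped only when its join value has no partner in the table it joins with, so the final join result is unchanged:

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Edge = Tuple[str, str, str, str]  # (left_table, left_col, right_table, right_col)


def key_set(rows: List[Row], col: str) -> set:
    """Exact statistic for a join column: its set of distinct values."""
    return {r[col] for r in rows}


def filter_redundant_tuples(tables: Dict[str, List[Row]],
                            edges: List[Edge]) -> Dict[str, List[Row]]:
    """Semi-join reduction: drop tuples whose join value has no partner, until stable."""
    reduced = {name: list(rows) for name, rows in tables.items()}
    changed = True
    while changed:
        changed = False
        for lt, lc, rt, rc in edges:
            # Filter in both directions along each equality predicate.
            for a, ac, b, bc in ((lt, lc, rt, rc), (rt, rc, lt, lc)):
                keys = key_set(reduced[b], bc)
                kept = [r for r in reduced[a] if r[ac] in keys]
                if len(kept) < len(reduced[a]):
                    reduced[a], changed = kept, True
    return reduced


orders = [{"order_id": 1, "cargo": "coil"}, {"order_id": 2, "cargo": "rebar"},
          {"order_id": 3, "cargo": "plate"}]
waybills = [{"waybill_id": 10, "order_id": 1, "vehicle_id": "V1"},
            {"waybill_id": 11, "order_id": 3, "vehicle_id": "V9"}]
vehicles = [{"vehicle_id": "V1", "plate": "A123"}, {"vehicle_id": "V2", "plate": "B456"}]
edges = [("orders", "order_id", "waybills", "order_id"),
         ("waybills", "vehicle_id", "vehicles", "vehicle_id")]
out = filter_redundant_tuples(
    {"orders": orders, "waybills": waybills, "vehicles": vehicles}, edges)
```

On an acyclic join graph such as orders, waybills, vehicles, repeating the passes until nothing changes removes every tuple that cannot appear in the join result; on a cyclic graph the reduction is still safe but may leave some dangling tuples.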
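The heuristic rules used by the thesis are not listed in the abstract. The following sketch assumes a textbook left-deep dynamic program over table subsets, with a single heuristic (never join a table to a sub-plan it shares no join predicate with, i.e. no Cartesian products) and a cost equal to the sum of estimated intermediate result sizes. The function name, cost model, cardinalities, and selectivities are all illustrative assumptions:

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def dp_join_order(card: Dict[str, float],
                  sel: Dict[FrozenSet[str], float]) -> Tuple[List[str], float]:
    """Left-deep join ordering by dynamic programming over table subsets.

    card: estimated row count per table (after filtering).
    sel:  join-predicate selectivity between two tables, keyed by frozenset({a, b}).
    Cost = sum of estimated intermediate result sizes.
    """
    tables = sorted(card)
    # best[S] = (cost, estimated cardinality, join order) for table subset S
    best: Dict[FrozenSet[str], Tuple[float, float, List[str]]] = {
        frozenset([t]): (0.0, card[t], [t]) for t in tables
    }
    for size in range(2, len(tables) + 1):
        for combo in combinations(tables, size):
            s = frozenset(combo)
            for t in combo:                      # t is the table joined last
                rest = s - {t}
                if rest not in best:
                    continue
                preds = [sel[frozenset((t, u))] for u in rest if frozenset((t, u)) in sel]
                if not preds:                    # heuristic: no Cartesian products
                    continue
                rest_cost, rest_card, rest_order = best[rest]
                out_card = rest_card * card[t]
                for p in preds:
                    out_card *= p
                cost = rest_cost + out_card
                if s not in best or cost < best[s][0]:
                    best[s] = (cost, out_card, rest_order + [t])
    full = frozenset(tables)
    if full not in best:
        raise ValueError("join graph is disconnected")
    return best[full][2], best[full][0]


card = {"orders": 100, "waybills": 4000, "vehicles": 100}
sel = {frozenset(("orders", "waybills")): 0.001,
       frozenset(("waybills", "vehicles")): 0.01}
order, cost = dp_join_order(card, sel)
```

With these made-up estimates the selective orders-waybills join is placed first, so the vehicles table is joined against about 400 intermediate rows instead of about 4000.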
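The tiered layout can be sketched with in-memory stand-ins: a list plays the role of the Redis tier for recent points, and a sorted list of row keys plays the role of an HBase table whose row key is (grid cell, timestamp, vehicle id), so that a spatio-temporal range query becomes one contiguous scan per grid cell. A waybill-to-points map stands in for the attribute index. The cell size, the one-hour hot window, the key layout, and all class and function names are assumptions rather than the thesis's schema; a real deployment would go through the Redis and HBase clients:

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Tuple

CELL_DEG = 0.5      # hypothetical grid cell size in degrees
HOT_WINDOW = 3600   # hypothetical: the last hour of points stays in the hot tier

Cell = Tuple[int, int]
RowKey = Tuple[Cell, int, str]   # (grid cell, timestamp, vehicle id), HBase-style ordering


@dataclass(frozen=True)
class TrackPoint:
    vehicle_id: str
    t: int            # Unix seconds
    lon: float
    lat: float
    waybill_id: str   # operational attribute fused into each trajectory point


def cell_of(lon: float, lat: float) -> Cell:
    return (int(lon // CELL_DEG), int(lat // CELL_DEG))


class TieredTrajectoryStore:
    def __init__(self) -> None:
        self.hot: List[TrackPoint] = []                    # stand-in for the Redis tier
        self.cold_keys: List[RowKey] = []                  # sorted keys, stand-in for HBase
        self.cold_rows: Dict[RowKey, TrackPoint] = {}
        self.by_waybill: Dict[str, List[TrackPoint]] = {}  # attribute index

    def _put_cold(self, p: TrackPoint) -> None:
        key = (cell_of(p.lon, p.lat), p.t, p.vehicle_id)
        if key not in self.cold_rows:
            bisect.insort(self.cold_keys, key)
            self.cold_rows[key] = p

    def insert(self, p: TrackPoint, now: int) -> None:
        if p.t >= now - HOT_WINDOW:
            self.hot.append(p)
        else:
            self._put_cold(p)
        self.by_waybill.setdefault(p.waybill_id, []).append(p)

    def age_out(self, now: int) -> None:
        """Move points that have left the hot window into the cold tier."""
        keep = []
        for p in self.hot:
            if p.t >= now - HOT_WINDOW:
                keep.append(p)
            else:
                self._put_cold(p)
        self.hot = keep

    def range_query(self, lon0: float, lat0: float, lon1: float, lat1: float,
                    t0: int, t1: int) -> List[TrackPoint]:
        def inside(p: TrackPoint) -> bool:
            return lon0 <= p.lon <= lon1 and lat0 <= p.lat <= lat1 and t0 <= p.t <= t1

        hits = [p for p in self.hot if inside(p)]  # hot tier is small: scanned directly
        (cx0, cy0), (cx1, cy1) = cell_of(lon0, lat0), cell_of(lon1, lat1)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # Keys of one cell are contiguous and time-ordered: scan [t0, t1] only.
                lo = bisect.bisect_left(self.cold_keys, ((cx, cy), t0, ""))
                hi = bisect.bisect_left(self.cold_keys, ((cx, cy), t1 + 1, ""))
                for k in self.cold_keys[lo:hi]:
                    if inside(self.cold_rows[k]):  # cell may extend past the box edge
                        hits.append(self.cold_rows[k])
        return sorted(hits, key=lambda p: (p.t, p.vehicle_id))


store = TieredTrajectoryStore()
now = 10_000
store.insert(TrackPoint("V1", 5_000, 116.3, 39.9, "W1"), now)
store.insert(TrackPoint("V1", 9_500, 116.4, 39.95, "W1"), now)
store.insert(TrackPoint("V2", 9_600, 120.0, 30.0, "W2"), now)
res = store.range_query(116.0, 39.5, 117.0, 40.5, 0, 10_000)
```

Putting the cell first in the key keeps each cell's points adjacent; putting time second lets one bounded scan per cell answer the time predicate.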
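Stay-point extraction and a similarity score over stay-point sequences can be sketched as follows. Detection uses a common distance/time-threshold rule (a stay is a run of GPS fixes that remain within `dist_km` of the first fix for at least `min_dur_s` seconds), and similarity is an LCSS-style score over the two stay-point sequences, normalised by the shorter sequence. The thresholds, the tuple layouts, and the choice of LCSS are assumptions; the thesis's own measure also accounts for the temporal characteristics of stay points, which this sketch leaves out:

```python
import math
from typing import List, Tuple

Fix = Tuple[float, float, float]          # (t seconds, lat, lon)
Stay = Tuple[float, float, float, float]  # (lat, lon, arrive_t, leave_t)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, a)))


def detect_stay_points(fixes: List[Fix], dist_km: float = 0.2,
                       min_dur_s: float = 1200) -> List[Stay]:
    """Distance/time-threshold stay-point detection over time-ordered fixes."""
    stays: List[Stay] = []
    i, n = 0, len(fixes)
    while i < n:
        j = i + 1
        while j < n and haversine_km(fixes[i][1], fixes[i][2],
                                     fixes[j][1], fixes[j][2]) <= dist_km:
            j += 1
        # fixes[i..j-1] all lie within dist_km of fixes[i]
        if fixes[j - 1][0] - fixes[i][0] >= min_dur_s:
            group = fixes[i:j]
            lat = sum(f[1] for f in group) / len(group)
            lon = sum(f[2] for f in group) / len(group)
            stays.append((lat, lon, fixes[i][0], fixes[j - 1][0]))
            i = j
        else:
            i += 1
    return stays


def stay_similarity(a: List[Stay], b: List[Stay], eps_km: float = 1.0) -> float:
    """LCSS over stay sequences; two stays match if they are within eps_km."""
    if not a or not b:
        return 0.0
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if haversine_km(a[i - 1][0], a[i - 1][1], b[j - 1][0], b[j - 1][1]) <= eps_km:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n] / min(m, n)


fixes = [(0, 30.0, 114.0), (600, 30.0, 114.0), (1200, 30.0, 114.0), (1800, 30.0, 114.0),
         (2400, 30.5, 114.5),
         (3000, 31.0, 115.0), (3600, 31.0, 115.0), (4200, 31.0, 115.0), (4800, 31.0, 115.0)]
stays = detect_stay_points(fixes)
other = [(30.001, 114.001, 0, 1800), (32.0, 116.0, 5000, 7000)]
```

Comparing trips by their ordered stops rather than by every GPS fix keeps the sequences short for long-haul trips and focuses the score on where the truck actually stopped.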
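A grid-based filter-and-verify query over stay-point sequences can be sketched as follows. Assume stay points are already projected to planar kilometre coordinates and the grid cell side is at least the match radius. Then two matching stay points always fall in the same or adjacent cells, which gives three steps. (1) An inverted index from cell to trajectory ids yields the candidates. (2) A signature (the set of cells a trajectory occupies) gives an upper bound on the LCSS score, so candidates can be pruned without running the dynamic program. (3) Only the survivors are verified exactly. `partitioned_query` mimics the distributed version in plain Python by dealing trajectories round-robin into `k` partitions, each with its own local index. It is not Spark code, the similarity is the same LCSS-style assumption as above, and every name and parameter is hypothetical:

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pt = Tuple[float, float]       # stay point in planar km coordinates (assumption)
EPS_KM = 1.0                   # hypothetical match radius
CELL_KM = EPS_KM               # cell side >= EPS_KM, so matches lie in adjacent cells


def cell(p: Pt) -> Tuple[int, int]:
    return (math.floor(p[0] / CELL_KM), math.floor(p[1] / CELL_KM))


def neighbours(c: Tuple[int, int]) -> Set[Tuple[int, int]]:
    return {(c[0] + dx, c[1] + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}


def similarity(a: List[Pt], b: List[Pt]) -> float:
    """LCSS over stay-point sequences, normalised by the shorter sequence."""
    if not a or not b:
        return 0.0
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if math.dist(a[i - 1], b[j - 1]) <= EPS_KM:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1] / min(len(a), len(b))


def upper_bound(q: List[Pt], q_sig: Set, cand: List[Pt], cand_sig: Set) -> float:
    """A point can be matched only if the other side occupies a neighbouring cell."""
    q_ok = sum(1 for p in q if neighbours(cell(p)) & cand_sig)
    c_ok = sum(1 for p in cand if neighbours(cell(p)) & q_sig)
    return min(q_ok, c_ok) / min(len(q), len(cand))


class LocalIndex:
    """Per-partition inverted index: grid cell -> ids of trajectories with a stay there."""

    def __init__(self, trajs: Dict[str, List[Pt]]) -> None:
        self.trajs = trajs
        self.sigs = {tid: {cell(p) for p in pts} for tid, pts in trajs.items()}
        self.inv: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
        for tid, sig in self.sigs.items():
            for c in sig:
                self.inv[c].add(tid)

    def query(self, q: List[Pt], tau: float) -> List[Tuple[str, float]]:
        """Return (id, score) pairs with score >= tau; assumes tau > 0."""
        q_sig = {cell(p) for p in q}
        cands: Set[str] = set()
        for c in q_sig:                                   # filter via the grid
            for nc in neighbours(c):
                cands |= self.inv.get(nc, set())
        out = []
        for tid in cands:
            pts = self.trajs[tid]
            if upper_bound(q, q_sig, pts, self.sigs[tid]) < tau:
                continue                                  # signature pruning
            s = similarity(q, pts)                        # exact verification
            if s >= tau:
                out.append((tid, s))
        return out


def partitioned_query(trajs: Dict[str, List[Pt]], q: List[Pt], tau: float,
                      k: int = 4) -> List[Tuple[str, float]]:
    ids = sorted(trajs)
    parts = [LocalIndex({t: trajs[t] for t in ids[i::k]}) for i in range(k)]
    hits = [h for part in parts for h in part.query(q, tau)]
    return sorted(hits, key=lambda h: (-h[1], h[0]))


trajs = {"T1": [(0.2, 0.2), (50.3, 10.1), (120.0, 40.0)],
         "T2": [(0.5, 0.1), (50.0, 10.0), (300.0, 300.0)],
         "T3": [(500.0, 500.0), (600.0, 600.0)]}
q = [(0.0, 0.0), (50.0, 10.5), (120.2, 40.1)]
res = partitioned_query(trajs, q, tau=0.6, k=2)
```

In an actual Spark job, the per-partition build-and-query step would run inside each partition (for example through an RDD's `mapPartitions`), and only the small per-partition hit lists would be collected.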
Keywords/Search Tags: Steel Logistics, Spatio-temporal Index, Column Traceability, Multi-attribute Trajectory, Hierarchical Storage, Stay Point