| Shipping occupies the most important position in global futures trade: more than 80% of commodity trade is carried by sea. The Automatic Identification System (AIS) generates hundreds of millions of voyage records every day. By analyzing this voyage data, it is possible to track the volume and direction of various cargo flows in real time, as well as changes at the berths of individual ports. Applying such information to actual commodity trading can bring large economic benefits to users. This paper designs and implements a ship behavior analysis system based on big data and solves the research problems that arise in the system. The system performs a series of analyses on shipping information for cargoes such as iron ore, coal, and LNG, and presents the results in an intuitive, intelligent, and convenient way, so that users can grasp trade and shipping information in a timely, objective, and effective manner. This paper mainly uses the following methods to analyze ship behavior: 1. Design and use a preprocessing algorithm for AIS data. The volume of ship position data used in this paper is very large, involving hundreds of thousands of ships. The AIS data set spans more than five years and contains more than 100 billion entries. By consulting shipping-related data and examining the raw AIS data for missing and invalid fields, this paper designs a distributed data preprocessing algorithm based on Spark. A unique identifier field named ShipNumber is assigned to each ship and, taking the historical changes of each ship's MMSI into account, a partition algorithm is designed. Spark then runs the computation in parallel, partitioned by ShipNumber, which greatly improves the efficiency and accuracy of data preprocessing. 2. Design and optimize the clustering and confirmation algorithms for berths. Because the volume of AIS data is so large, even if the ships that transport
iron ore, coal, and liquefied natural gas (LNG) are extracted and analyzed separately, there are still more than one billion records. In addition, clustering is itself computationally expensive. If the berths were clustered in a single process, the computation would take an unacceptable amount of time (on the order of years). To address this problem, this paper designs a distributed, weighted DBSCAN algorithm based on Spark for clustering the berths of different futures commodities. DBSCAN is a density-based clustering algorithm, but the density of AIS data is not uniform across data sources or geographic regions. For example, the signal density of base-station data is much greater than that of satellite data, and the signal along China's coast is much denser than that along India's coast. To solve this problem, this paper adopts different DBSCAN parameter thresholds for AIS signals in different geographic locations. In addition, because stationary data is also generated when ships shelter at sea, this paper takes the worldwide distribution of ports into account and designs and optimizes a berth confirmation algorithm that identifies berths with high reliability. 3. Design and improve algorithms for obtaining voyages. A voyage is the period during which a ship completes one transportation and production task. In practice, a full (empty) voyage generally runs from the time the ship is loaded (unloaded) at the departure port to the time it is unloaded (loaded) at the arrival port. The analysis of futures commodity trade depends on ship capacity, and capacity statistics are closely tied to how voyages are divided. In this paper, the preprocessed AIS data is used to design and improve the ship voyage extraction algorithm. First, the parking points are extracted from the ship's track; the parking points are then combined into a parking log, from which the voyages are obtained. 4. Design and optimize
algorithms for ship voyage prediction and confirmation. Most existing ship trajectory prediction algorithms are based on voyages near ports and cannot accurately predict voyages at sea. Building on the voyage extraction algorithm above, the Douglas-Peucker algorithm is used to extract sailing segments, and the current AIS data and historical navigation trajectories are used to design a ship voyage prediction algorithm. A voyage confirmation algorithm is also designed to improve the accuracy of voyage prediction and to predict the trade volume of futures commodities. Based on the algorithms in these four parts, the following functions of the ship behavior analysis system are designed and implemented: 1. Data storage design: based on the data volume and usage scenarios, an HBase database is designed to store the raw AIS data, and a MySQL database is designed to store the crawler results and analysis results. 2. Data development architecture design: based on the requirements of the shipping business, functional modules for AIS data preprocessing, berth clustering and confirmation, ship voyage extraction, and real-time voyage prediction are designed and implemented. |