Analysing port and shipping operations using big data

The maritime freight industry is of critical importance to the economic output of the UK, with almost half a billion tonnes of freight being handled by UK ports in 2016. The Freight Transportation Association estimates that delays on both sides of the Channel cost the UK logistics industry £750,000 a day. As the demands upon shipping freight are likely to increase in the future, a more in-depth understanding of the UK maritime shipping industry becomes increasingly important.

Our report outlines the work undertaken by the Data Science Campus to explore the operation and use of UK ports and the relationships between them at a macro level, and the behaviour and operational characteristics of ships at a micro level, specifically:

  • national and international relationships
  • traffic at ports and related factors
  • inbound delays
  • capacity utilisation

Two sources of data are used:

  • Automatic Identification System (AIS). AIS data records the position, speed, heading, bearing and rate of turn for each ship, at frequent time intervals throughout its voyage
  • Consolidated European Reporting System (CERS). CERS data is collected at a higher level and records details such as destination port and expected time of arrival for the voyage of each ship

A means of storing, decoding and processing AIS data is proposed. A means by which AIS and CERS data can be merged is presented, allowing a more comprehensive analysis to be undertaken when compared with exploring each dataset in isolation. Exploratory analysis of both datasets uncovers several insights for ships using the largest UK ports and Felixstowe in particular. These insights include:

  • port traffic and utilisation
  • shipping movements
  • port network analysis
  • movement of hazardous materials
  • delays at port
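
The merge of AIS and CERS data mentioned above can be illustrated with a minimal sketch. This summary does not describe how the merge is actually performed, so the approach here is an assumption: an as-of join that attaches to each AIS position report the most recent CERS notification for the same ship. All column names (`ship_id`, `timestamp`, `reported_at`, `destination`, `eta`) and all values are hypothetical.

```python
import pandas as pd

# Hypothetical AIS position reports (column names and values are assumptions).
ais = pd.DataFrame({
    "ship_id": [1, 1, 1, 2],
    "timestamp": pd.to_datetime([
        "2016-07-01 00:00", "2016-07-01 06:00",
        "2016-07-02 00:00", "2016-07-01 03:00",
    ]),
    "speed_knots": [12.0, 11.5, 0.2, 14.1],
})

# Hypothetical CERS voyage notifications (column names and values are assumptions).
cers = pd.DataFrame({
    "ship_id": [1, 2],
    "reported_at": pd.to_datetime(["2016-06-30 18:00", "2016-07-01 01:00"]),
    "destination": ["Felixstowe", "Southampton"],
    "eta": pd.to_datetime(["2016-07-01 22:00", "2016-07-02 09:00"]),
})


def merge_ais_cers(ais: pd.DataFrame, cers: pd.DataFrame) -> pd.DataFrame:
    """Attach the latest earlier CERS notification to each AIS report."""
    # merge_asof requires both frames to be sorted on their time keys.
    ais_sorted = ais.sort_values("timestamp")
    cers_sorted = cers.sort_values("reported_at")
    return pd.merge_asof(
        ais_sorted,
        cers_sorted,
        left_on="timestamp",
        right_on="reported_at",
        by="ship_id",          # only match records for the same ship
        direction="backward",  # use the most recent notification at or before the report
    )


merged = merge_ais_cers(ais, cers)
```

Each row of `merged` then carries both the fine-grained AIS measurements and the voyage-level CERS fields, which is the kind of combined view that analysing either dataset in isolation cannot provide.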

A novel unsupervised machine learning approach using K-means clustering is applied to AIS data aggregated over a time-based window. This is used to classify the behaviour of a ship into one of six distinct groups at every point throughout its voyage. These classifications give a more meaningful and interpretable representation of ship behaviour and intention over time when compared with raw positional AIS data.
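
As a rough illustration of this idea, the sketch below aggregates AIS messages over fixed time windows and clusters the resulting window-level features into six groups with scikit-learn's `KMeans`. Only the use of K-means, the time-based aggregation and the six groups come from the text above. The data is synthetic, and the window length, the feature choices (`speed_mean`, `speed_std`, `turn_abs_mean`) and the scaling step are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for decoded AIS messages: one ship, one report per minute.
n_reports = 600
ais = pd.DataFrame({
    "timestamp": pd.date_range("2016-07-01", periods=n_reports, freq="min"),
    "speed_knots": np.abs(rng.normal(8.0, 5.0, n_reports)),
    "rate_of_turn": rng.normal(0.0, 3.0, n_reports),
})


def window_features(ais: pd.DataFrame, window: str = "10min") -> pd.DataFrame:
    """Aggregate raw AIS reports into one feature row per time window."""
    indexed = ais.assign(turn_abs=ais["rate_of_turn"].abs()).set_index("timestamp")
    grouped = indexed.resample(window)
    features = pd.DataFrame({
        "speed_mean": grouped["speed_knots"].mean(),
        "speed_std": grouped["speed_knots"].std(),
        "turn_abs_mean": grouped["turn_abs"].mean(),
    })
    # Drop windows with too few reports to compute every feature.
    return features.dropna()


def classify_behaviour(features: pd.DataFrame, n_groups: int = 6, seed: int = 0) -> pd.Series:
    """Assign each window to one of n_groups clusters using K-means."""
    # Scale the features so that no single unit dominates the distance metric.
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_groups, n_init=10, random_state=seed)
    labels = model.fit_predict(scaled)
    return pd.Series(labels, index=features.index, name="behaviour_group")


features = window_features(ais)
labels = classify_behaviour(features)
```

The cluster labels are arbitrary integers. Turning them into interpretable behaviours would require inspecting the cluster centres, a step this sketch does not attempt.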

This classification, along with a series of additional non-AIS-related features, is used to explore the feasibility of using supervised machine learning techniques to predict the likelihood that a ship will be delayed arriving at port. Random Forests, AdaBoost, Gradient Boosting and XGBoost algorithms are applied to shipping data taken from in and around the port of Felixstowe. Results are promising, with the XGBoost algorithm able to correctly identify a ship delay in nearly 70% of test cases.
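
The general shape of such a delay model can be sketched as follows, using scikit-learn's `GradientBoostingClassifier` as a stand-in for the gradient boosting family named above (it is not XGBoost itself). Everything about the data is an assumption: the features (`behaviour_group`, `hours_to_eta`, `port_congestion`), the synthetic rule that generates delays and the train/test split are invented for illustration and do not reproduce the report's features or results. The summary does not say which metric the 70% figure refers to, so the sketch reports recall on the delayed class as one possible reading.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_voyages = 1000

# Hypothetical features: a behaviour group from the clustering step plus
# two invented non-AIS features.
behaviour_group = rng.integers(0, 6, n_voyages)
hours_to_eta = rng.uniform(0, 48, n_voyages)
port_congestion = rng.uniform(0, 1, n_voyages)

# Synthetic rule for generating delay labels; purely illustrative.
logit = 3.0 * port_congestion - 0.03 * hours_to_eta + 0.4 * (behaviour_group == 2) - 1.0
delayed = (rng.uniform(size=n_voyages) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# One-hot encode the behaviour group so it is not treated as an ordered number.
group_one_hot = np.eye(6)[behaviour_group]
X = np.column_stack([group_one_hot, hours_to_eta, port_congestion])

X_train, X_test, y_train, y_test = train_test_split(
    X, delayed, test_size=0.25, random_state=0, stratify=delayed
)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

predicted = model.predict(X_test)
# Share of truly delayed test voyages that the model flags as delayed.
delay_recall = recall_score(y_test, predicted)
```

Swapping in a random forest or AdaBoost model only requires changing the estimator, which makes this layout convenient for comparing several algorithms on the same split.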

These initial results suggest that additional focus should be placed on further development of both the classification and delay models. A means by which these predictions can be used to explore, simulate and optimise the operational efficiency of ports throughout the UK is also discussed. The report concludes by discussing how the tools and techniques used in the project may be applied to a broader set of applications outside the maritime field.

The Data Science Campus wishes to thank the Maritime and Coastguard Agency, the Centre for Big Data Statistics at Statistics Netherlands, the UK Hydrographic Office and the Department for International Trade for their support and assistance throughout the course of this project.

The code is available in this repository.

If you’re interested in learning more about this project or would like to get involved then you can get in touch via email or Twitter.