Data science for sustainable development

The Sustainable Development Goals (SDGs) cover the most pressing issues of our time. With 244 indicators across 17 goals, measuring the SDGs is a huge challenge for statistical agencies. Because the goals need to be globally relevant, many of the indicators have not previously been reported nationally, so the data to measure them is often not readily available.

The report published today by the Sustainable Development Goals team at the Office for National Statistics shows that tremendous progress has been made in the UK, with headline statistics now published against three-quarters of indicators. However, for a number of the remaining indicators, new sources of data still need to be determined or methods found to provide data at a sufficient level of quality and granularity. This presents a worthy challenge for data science which we, at the Data Science Campus, have taken up with our colleagues in the Sustainable Development Goals team.

One of the first Campus projects was to work with the SDG team to develop a reusable platform to visualise the reporting status of the Sustainable Development Goals in the UK. Now known as Open SDG, the platform is the result of collaboration between the ONS, the US government and the non-profit Centre for Open Data Enterprise (CODE).

Two years later, the project has been scaled up and there are now more than 10 countries using the Open SDG solution for reporting their SDG data, with interest from others. These countries include:

  • Armenia
  • Germany
  • Ghana
  • Jamaica
  • Kazakhstan
  • Kyrgyzstan
  • Moldova
  • Namibia
  • Poland
  • Rwanda
  • UK

The project is being further developed into a fully-fledged website by the Open SDG community.

The Sustainable Development Goals team presenting the reporting platform to a delegation from Argentina.

Since then, we’ve been helping to tackle the gaps in reporting in three main ways:

  • exploring novel data sources
  • developing new methods and tools for measuring indicators
  • improving statistics that support SDG-relevant policies

Exploring the use of novel data sources to measure specific indicators

The SDGs are a global challenge and are, in some cases, not aligned with current national data collections which are based on their relevance to UK policy issues. Reusing existing statistics or conducting new data collection campaigns can be difficult so innovative solutions using big data and alternative data sources are being explored.

Our exploration of global, open, geospatial datasets is an example of this.

Indicator 9.1.1 measures the proportion of the rural population in a country that can conveniently access the road network in all seasons. While we have good data for measuring this indicator within Great Britain, making international comparisons is more difficult. Working with our colleagues in ONS Geography, we have investigated using global datasets such as OpenStreetMap, the Global Roads Inventory Project, and the Global Human Settlement Layer for this purpose.
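To make the calculation behind an indicator of this kind concrete, here is a minimal sketch in Python. It is not the method used by the Campus or ONS Geography. It assumes settlements are points with a population, roads are straight segments, coordinates are already in kilometres on a flat plane, and "convenient access" means living within 2 km of a road. The function names and the toy data are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]        # (x, y) in km, assumed planar coordinates
Segment = Tuple[Point, Point]      # a straight stretch of all-season road


def distance_to_segment(p: Point, seg: Segment) -> float:
    """Shortest planar distance from point p to a road segment."""
    (x, y) = p
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: both ends are the same point.
        return math.hypot(x - x1, y - y1)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((x - x1) * dx + (y - y1) * dy) / length_sq))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))


def rural_access_share(
    settlements: List[Tuple[Point, float]],
    roads: List[Segment],
    threshold_km: float = 2.0,     # assumed access threshold
) -> float:
    """Share of rural population living within threshold_km of any road."""
    total = sum(pop for _, pop in settlements)
    if total == 0:
        raise ValueError("total rural population must be positive")
    served = sum(
        pop
        for location, pop in settlements
        if any(distance_to_segment(location, r) <= threshold_km for r in roads)
    )
    return served / total


# Toy data: one road along the x-axis and three rural settlements.
roads = [((0.0, 0.0), (10.0, 0.0))]
settlements = [((1.0, 1.0), 300.0), ((5.0, 3.0), 100.0), ((12.0, 0.5), 100.0)]
share = rural_access_share(settlements, roads)
print(share)  # 0.6
```

In practice the road segments would come from a source such as OpenStreetMap and the population from a gridded layer, and distances would need a suitable map projection rather than the flat plane assumed here.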

Indicator 6.6.1 tracks changes in the spatial extent of water-related ecosystems such as inland open waters. The need to understand how such systems have changed over time makes this a difficult indicator to measure with conventional mapping data. We have been investigating the use of the Global Surface Water dataset for this purpose.
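As an illustration of what measuring change in spatial extent involves, the sketch below compares two binary water masks for the same area in different years. It is a simplified example, not the approach under investigation: the masks are tiny hand-written grids, the cell area of 0.0009 km² (a 30 m pixel) is an assumption, and the function names are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # 1 = open water, 0 = not water


def water_extent_km2(mask: Mask, cell_area_km2: float) -> float:
    """Total water area: number of water cells times the area of one cell."""
    return sum(sum(row) for row in mask) * cell_area_km2


def percent_change(baseline: float, current: float) -> float:
    """Change in extent relative to the baseline period, in percent."""
    if baseline == 0:
        raise ValueError("baseline extent must be non-zero")
    return (current - baseline) / baseline * 100.0


# Toy masks for the same area in a baseline year and a later year.
baseline_mask = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
later_mask = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

CELL_AREA_KM2 = 0.0009  # assumed: 30 m x 30 m pixels

baseline_extent = water_extent_km2(baseline_mask, CELL_AREA_KM2)
later_extent = water_extent_km2(later_mask, CELL_AREA_KM2)
change = percent_change(baseline_extent, later_extent)
print(f"{change:.1f}% change in water extent")
```

Here one of the four baseline water cells has dried out, so the extent falls by a quarter. A real calculation would also have to handle cloud cover, seasonal water, and cells with no observations.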

Critical in each of these cases has been assessing the quality of the datasets for the purpose of producing statistics. To do this we are investigating how quality assurance frameworks, such as Quality Assurance for Administrative Data (QAAD) and the European Statistical System (ESS) dimensions of quality, can be applied to account for the novel and geospatial nature of these sources.

Developing cross-cutting data science methods and tools to support measurement and disaggregation

A principle of the SDGs is “leave no one behind.” Data science aims to help develop disaggregated indicators to ensure that those at risk of disadvantage because of their characteristics, location, or socio-economic status are recognised. This is a significant challenge as many statistics are currently only available at a headline national level.

Data science can look to address this through modelling against alternative data sources. Accessibility, for instance, is an issue that appears in several SDGs and which varies substantially among different groups: indicators 1.4.1 and 3.8.1 relate to those with access to basic and health services, and indicators 9.1.1 and 11.2.1 relate to those with access to transport infrastructure.

The Campus has developed a reusable tool that uses open transport data and tools to determine how accessible service locations are for dispersed communities. We are working with the SDG team to explore how this tool could be applied to these indicators.
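The general idea of measuring accessibility from transport data can be sketched as a shortest-path problem. The example below is not the Campus tool. It assumes a small, hypothetical network whose links carry travel times in minutes, that travel times are the same in both directions, and that a service counts as accessible if it can be reached within 60 minutes. All names and figures are illustrative.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbour, minutes)]


def travel_time_to_nearest_service(
    graph: Graph, services: Iterable[str]
) -> Dict[str, float]:
    """Multi-source Dijkstra: minutes from each node to its nearest service.

    Travel times are assumed symmetric, so searching outwards from the
    services gives the time from each community to a service.
    """
    best = {node: float("inf") for node in graph}
    queue: List[Tuple[float, str]] = []
    for service in services:
        best[service] = 0.0
        heapq.heappush(queue, (0.0, service))
    while queue:
        minutes, node = heapq.heappop(queue)
        if minutes > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbour, link_minutes in graph.get(node, []):
            candidate = minutes + link_minutes
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return best


def accessible_share(
    times: Dict[str, float], population: Dict[str, float], max_minutes: float
) -> float:
    """Share of population that can reach a service within max_minutes."""
    total = sum(population.values())
    if total == 0:
        raise ValueError("total population must be positive")
    reached = sum(
        pop
        for place, pop in population.items()
        if times.get(place, float("inf")) <= max_minutes
    )
    return reached / total


# Hypothetical network: three villages, a town, and one clinic.
graph: Graph = {
    "clinic": [("town", 10.0)],
    "town": [("clinic", 10.0), ("village_a", 20.0), ("village_b", 50.0)],
    "village_a": [("town", 20.0)],
    "village_b": [("town", 50.0), ("village_c", 30.0)],
    "village_c": [("village_b", 30.0)],
}
population = {"village_a": 400.0, "village_b": 250.0, "village_c": 350.0}

times = travel_time_to_nearest_service(graph, ["clinic"])
share = accessible_share(times, population, max_minutes=60.0)
print(share)  # 0.65
```

Running the same calculation separately for different population groups or areas is one way such a measure could be disaggregated, provided population data is available at that level.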

Providing data and insight to support policies aimed at achieving the goals

Data science can help support complex policy issues through the development of innovative new statistics. For example, improving our valuation of green space in urban residential gardens can inform policies aimed at progressing SDGs such as Goal 11, to make cities and human settlements inclusive, safe, resilient and sustainable, and Goal 13, to take urgent action to combat climate change and its impacts. However, the age of the data and the frequency with which indicators are published may mean that they cannot be used to drive early policy interventions or make operational decisions. This is an issue recognised by the recent establishment of the Data For Now initiative.

The Global Surface Water dataset allows indicator 6.6.1 to be measured with high accuracy across the world but, because of the complexity of producing it, changes are only reported annually. This means it cannot provide early identification of emerging issues in the distribution and quantity of available water caused by events such as climate change, drought, flooding, or human activities.

We are working with the UN Environment Programme to explore whether high-resolution, near real-time satellite imagery can address these limitations to provide rapid assessments of emerging issues and localised assessments in various countries.

Continuing to build capability in data science

The principle to “leave no one behind” could equally refer to the technological gap that limits the ability of developing countries to measure and report against the SDGs. Recognising this, we have looked to help build capability in data science, and to favour solutions that can be reused through the UN Global Platform for Official Statistics, for example by providing worldwide access to global geospatial and earth observation datasets and by sharing the resulting algorithms and methodology via the methods service.

In recent months, we’ve also welcomed the first recruits to our AI for International Development hub. The hub will enable us to scale up our work across the SDGs.

The SDGs represent one of the greatest data challenges of our age. In addition to reporting, data must drive the policy changes and early action needed to see the SDGs achieved. Data science is essential to this effort, drawing new insight from novel data sources and improving methods that can be shared and reused so that no one is left behind. Our work with international collaborations will help ensure solutions to such complex issues remain relevant and fit for purpose for a global audience.