Leading international collaboration in machine learning for official statistics

With a growing demand for more timely, accurate, relevant and trustworthy data sources to inform society and decision makers, National Statistics Organisations (NSOs) have been investigating innovative ways to produce official statistics more efficiently. The ONS-UNECE Machine Learning Group 2021 (ML 2021) is one initiative addressing this need and builds on the momentum of the 2019-2020 UNECE Machine Learning Project.

ML 2021, launched in January 2021, is led by the Office for National Statistics (ONS) Data Science Campus in partnership with the UN Economic Commission for Europe High-Level Group for the Modernisation of Official Statistics (UNECE HLG-MOS). ML 2021 provides a friendly platform for the global statistical community to develop research, build skills and share common challenges and solutions on machine learning developments and applications in the official statistics space.

Here we share our recent progress, including the ML 2021 structure (PPT, 52KB), governance, and the research workstreams the group will explore. More information on recent developments and resources is available on our public page.

Our first quarter

As 2020 came to a close, we ran an outreach exercise to promote the new group and refresh its membership. The international community responded strongly: membership doubled, and ML 2021 now has 250 engaged members from 33 countries and 4 international organisations. Members were then asked to submit machine learning (ML) activity proposals outlining their main objectives and outputs. These proposals informed the group’s governance structure and community-driven work schedule for 2021, and were then grouped into workstreams with members confirmed for each. ML 2021 now has 18 research projects in progress across five workstreams.

Workstream 1: Pilot studies: from idea to valid solution

These activities are still in the proof-of-concept phase. Split into themes, they will benefit from the existing good practice, knowledge and experience of other group members. The themes include:

  • coding and classification;
  • imputation;
  • imagery;
  • modelling for estimation;
  • transferring knowledge and experience;
  • route optimisation.

Workstream 2: From valid solution to production

This workstream explores how proven ML applications can be integrated into production models. Its objective is to determine a best practice workflow for the applications examined as part of the workstream.

Workstream 3: Data ethics and governance

Data ethics and governance have been discussed regularly in recent years, but never fully explored. This workstream gives us the opportunity to consider existing ethical frameworks and determine ethical principles which can be applied to ML within the official statistics space.

Workstream 4: Quality of machine learning training data

It is important to have good quality training data when developing ML models. This workstream will identify and describe circumstances where retraining of models is needed, how retraining needs can be detected, and whether objective criteria can be developed to enable retraining to be triggered automatically.
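As a purely illustrative sketch of what an objective retraining criterion could look like, the example below compares the distribution of one numeric feature in the original training data with the same feature in newly arriving data, using the Population Stability Index (PSI). The choice of PSI, the function names `psi` and `needs_retraining`, the ten equal-width bins and the 0.2 threshold (a common rule of thumb) are all assumptions made for this example; they are not criteria adopted by the workstream.

```python
import math
from typing import Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if it is constant.
    width = (hi - lo) / bins or 1.0

    def shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the end bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def needs_retraining(reference: Sequence[float], current: Sequence[float],
                     threshold: float = 0.2) -> bool:
    """Flag retraining when drift exceeds the (assumed) threshold."""
    return psi(reference, current) > threshold


if __name__ == "__main__":
    training = [i / 100 for i in range(100)]        # feature values seen in training
    incoming = [0.5 + i / 200 for i in range(100)]  # new data shifted upwards
    print(needs_retraining(training, training))
    print(needs_retraining(training, incoming))
```

In practice a check like this would run on a schedule for each monitored feature, and a flagged result would prompt a review of the training data rather than an unattended retrain.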

Workstream 5: Quality Framework for Statistical Algorithms

This final workstream will explore dimensions of the Quality Framework for Statistical Algorithms (QF4SA) within a consolidated project and analyse the output based on a set of standard metrics and procedures.

Business as usual

The group meets monthly to hear about the latest developments from workstream leads. We use live polls and surveys to consult members on ways forward, and invite guest speakers to keep the membership informed on global ML developments. Previous presentations are available on the ML 2021 public wiki.

Workstream members are also encouraged to meet regularly to ensure progress is made, and draft interim reports detailing their findings to share with ML 2021 members.

What is next?

Now that the group’s core workstreams are established, we are providing the ML community with events and resources that can support the progress of their workstreams and further share developments and expertise globally. Most events are open to non-members and feature ongoing work from the group itself.

For example, we are hosting a workshop exploring the practical application of developing ethical principles, coffee & coding type events on the lifecycle of ML projects, and demo sessions on relevant tools for the application of ML. The group is also developing an ML training course catalogue to respond to the demand for a more structured and easily accessible training curriculum. An open 1 to 2 day webinar is also planned for December to showcase the projects carried out by the group.

We are also making sure we build and maintain connections with other global initiatives such as the UN Committee of Experts on Big Data and Data Science for Official Statistics (UN-CEBD) and the HLG-MOS Synthetic Data Project.

Find out more

To support the ML 2021 Group’s progress, take part in events, and gain access to our working documents, contact us.

An official UNECE publication detailing the findings from the ML Project 2019 to 2020 and some ML 2021 activities is also planned for later this year – stay tuned!

Our friends in UNECE Statistics have also recently published a news article detailing their work exploring the relevance of ML within the official statistical space.

Oliver Mahoney, International Relations and Data Science
