How international collaboration is advancing machine learning in official statistics
New technologies and data sources have tremendous potential to improve statistical production. They offer a way to generate statistics in a more timely, accurate and cost-efficient manner. Yet, keeping up with the pace of change is challenging, especially for National Statistical Organisations (NSOs) that must innovate with care to maintain a “gold standard” in their outputs. International cooperation between NSOs and other official statistical bodies is one way to help accelerate change in a responsible way.
In 2021, the Machine Learning Group (ML 2021), run jointly by the Office for National Statistics (ONS) and the United Nations Economic Commission for Europe (UNECE), demonstrated the benefits of international cooperation for technological advancement. ML 2021 is coordinated by the ONS’s Data Science Campus and the UNECE High-Level Group on Modernisation of Official Statistics (HLG-MOS). It is a platform for international research collaboration, knowledge exchange, resource sharing and capacity building in the use of machine learning (ML) for official statistics.
Established by UNECE in 2019, the group has quickly grown into a valuable platform for the global statistical community. The ONS took over coordination in 2021 and worked with UNECE to expand the group’s activities, adding new research workstreams and introducing “Coffee and Coding” tutorials. Membership has more than doubled since January 2021, to 248 members from 33 countries, demonstrating high demand for its activities. As a result of this success, the group is now looking for new members in 2022.
The Machine Learning Group aims to:
- facilitate research projects and skill-building activities
- build a community for sharing resources and good practice
- exchange ideas and experiences
- keep up to date with the latest developments in the field
Research collaboration – investigating the added value of machine learning
Building on the group’s earlier work, members conducted 18 separate projects in 2021. These projects demonstrated the added value of ML in coding and classification, and editing and imputation. The projects also explored how to tackle the challenges of taking ML solutions into production.
Workstream 1: From idea to valid solution
“Coding and classification” was the most popular application area in this year’s research into ML applications. New application areas that participants investigated included modelling for estimation and route optimisation. One study highlighted the benefit that NSOs gain from replicating ML projects carried out by other NSOs.
Workstream 2: From valid solution to production
The workstream explored how to make the operationalisation of ML solutions smooth and efficient. It explored how to develop a user-friendly interface and how to build a data lake that data scientists can efficiently draw data from. It also produced a paper outlining typical steps that statistical organisations take from ML experimentation to deployment.
Workstream 3: Ethics
The workstream produced high-level guidance on the ethical considerations that arise in ML projects. The guidance is intended to support analysts, researchers, data scientists and statisticians, and has been published by the UK Statistics Authority.
Workstream 4: Model retraining
The workstream carried out a simulation study to explore how to identify the circumstances in which an ML model should be retrained to maintain its predictive power and quality.
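The retraining question can be pictured with a simple monitoring rule. The sketch below is an illustration only and is not the method used in the group’s simulation study: it assumes that true labels eventually become available for a sample of predictions, and it flags retraining when accuracy over a rolling window falls more than a chosen tolerance below the accuracy measured at deployment. The class name `RetrainingMonitor` and the parameters `baseline_accuracy`, `tolerance` and `window` are hypothetical.

```python
from collections import deque


class RetrainingMonitor:
    """Flag retraining when rolling accuracy drops too far below a baseline.

    Hypothetical illustration: the threshold rule and the parameter names
    are assumptions, not taken from the ML 2021 simulation study.
    """

    def __init__(self, baseline_accuracy, tolerance=0.05, window=100):
        self.baseline_accuracy = baseline_accuracy  # accuracy at deployment
        self.tolerance = tolerance                  # acceptable drop
        self.window = window                        # number of recent cases
        self.outcomes = deque(maxlen=window)        # 1 = correct, 0 = wrong

    def record(self, predicted, actual):
        """Store whether one checked prediction matched its true label."""
        self.outcomes.append(1 if predicted == actual else 0)

    def rolling_accuracy(self):
        """Return accuracy over the window, or None until it is full."""
        if len(self.outcomes) < self.window:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """True when rolling accuracy is below baseline minus tolerance."""
        accuracy = self.rolling_accuracy()
        if accuracy is None:
            return False  # not enough evidence yet
        return accuracy < self.baseline_accuracy - self.tolerance


if __name__ == "__main__":
    monitor = RetrainingMonitor(baseline_accuracy=0.90, tolerance=0.05, window=10)
    # Made-up stream of (predicted, actual) occupation codes for illustration.
    checked = [("A", "A")] * 6 + [("A", "B")] * 4
    for predicted, actual in checked:
        monitor.record(predicted, actual)
    print(monitor.rolling_accuracy(), monitor.needs_retraining())
```

In practice, the window size and tolerance would need tuning to the statistical output in question, and a production system might also monitor shifts in the input data, since true labels are often slow or costly to obtain.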
Workstream 5: Quality framework for statistical algorithms
In the 2019 to 2020 project, a quality framework was developed to compare different methods, including ML. This year, a colleague from the National Institute of Statistics and Geography (INEGI) in Mexico tested the framework on a real use case that used natural language processing to predict occupation and economic activity. The framework was used to assess the output of this project against a standard set of five criteria:
- explainability (understanding what causes a model to make particular decisions)
- accuracy
- reproducibility
- timeliness
- cost effectiveness
The test reaffirmed the importance of taking a holistic view, because the priority given to each quality dimension varies between stakeholders and between stages of the production cycle.
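The point about stakeholder priorities can be pictured with a small weighted-scoring sketch. This is an illustration under stated assumptions, not part of the framework itself: the 0 to 1 scores, the stakeholder weights and the function names are all hypothetical, and the numbers are made up to show how the same two methods can rank differently depending on who weighs the five criteria.

```python
CRITERIA = (
    "explainability",
    "accuracy",
    "reproducibility",
    "timeliness",
    "cost_effectiveness",
)


def weighted_quality_score(scores, weights):
    """Return the weighted mean of the five criterion scores (each 0 to 1)."""
    missing = [c for c in CRITERIA if c not in scores or c not in weights]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    total_weight = sum(weights[c] for c in CRITERIA)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[c] * weights[c] for c in CRITERIA) / total_weight


def preferred_method(methods, weights):
    """Return the name of the method with the highest weighted score."""
    return max(methods, key=lambda name: weighted_quality_score(methods[name], weights))


# Hypothetical scores for two ways of coding text responses.
METHODS = {
    "ml_classifier": {
        "explainability": 0.4, "accuracy": 0.9, "reproducibility": 0.8,
        "timeliness": 0.9, "cost_effectiveness": 0.7,
    },
    "rule_based": {
        "explainability": 0.9, "accuracy": 0.7, "reproducibility": 0.8,
        "timeliness": 0.5, "cost_effectiveness": 0.5,
    },
}

# Hypothetical priorities for two stakeholders.
METHODOLOGIST = {
    "explainability": 3, "accuracy": 3, "reproducibility": 2,
    "timeliness": 1, "cost_effectiveness": 1,
}
OPERATIONS = {
    "explainability": 1, "accuracy": 2, "reproducibility": 1,
    "timeliness": 3, "cost_effectiveness": 3,
}

if __name__ == "__main__":
    for label, weights in (("methodologist", METHODOLOGIST), ("operations", OPERATIONS)):
        print(label, "prefers", preferred_method(METHODS, weights))
```

A single weighted score hides trade-offs, so a sketch like this is better suited to starting a discussion between stakeholders than to making the final choice of method.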
This year, the group has discussed and experimented with new application areas for the first time, such as route optimisation and modelling for estimation. The workstreams have also made progress on several new and under-explored production issues. These include how to obtain high-quality training data sets, how to monitor model decay once a model is deployed, and how to develop user interfaces.
The group’s work has shown that ML will be essential for integrating big data into production in an efficient and accurate manner. Privacy and ethical concerns will grow as public awareness of artificial intelligence (AI) increases, so statistical organisations will also need to establish robust systems to address them.
Join us for the next stage of our machine learning journey
Machine learning (ML) is evolving quickly. In 2021, significant progress was made, but there are many more issues to explore before the full potential of ML is achieved within official statistics. The group will continue its work in 2022 (ML 2022), focusing on:
- taking ML models into production
- international research collaboration
- capacity building in data science
We want to hear from anyone in the official statistics and data science communities with an interest in ML. You will receive invites to our meetings, regular updates on developments and opportunities, and access to the members-only area of our website. More information, including details of last year’s activities and the 2021 report, is available on our website.
If you would like to join, please contact Alison Baily at the Office for National Statistics (ONS) and InKyung Choi at the United Nations Economic Commission for Europe (UNECE) by emailing ML2022@ons.gov.uk.
We also want to hear new ideas for activities in research, knowledge exchange and capacity building. Whether you are an ML expert or a relative beginner, we welcome proposals for activities that you could lead or organise with support from other members of the community in 2022. This can be anything from running your own research project to organising a study group or inviting an expert to give a one-off talk or tutorial.
If you would like to submit an activity proposal, please read the invitation letter and fill out the accompanying form by Wednesday 19 January 2022. We aim to announce the programme and begin work at the end of January.