Evaluating synthetic data using SynthGauge

The availability of synthetic data is becoming increasingly important as organisations look to collaborate on research and analyses without disclosing any private or sensitive information contained in their data.

Data synthesis is an active area of research for many organisations, including the Office for National Statistics (ONS). It is the process of replacing a private dataset with one that looks and behaves the same but divulges no sensitive data.

When generating synthetic data, there is a trade-off between keeping sensitive information private and ensuring the dataset is still fit for purpose and meets users’ needs. Making synthetic data more private generally makes them less useful, and vice versa.

Evaluating synthetic datasets within the context of their downstream purposes is therefore an important stage in the adoption of these methods. There are various metrics that can measure aspects of privacy and utility within datasets, but there is no one-size-fits-all approach. Instead, to make the most informed decisions, it is important to evaluate a range of metrics, and this is where SynthGauge can help.

SynthGauge is a Python library that provides a framework for evaluating synthetic data using a range of metrics and visualisations. It includes a suite of metrics covering privacy and utility, and work is continuing to explore new metrics that can be added.

Through its Evaluator, SynthGauge provides an intuitive and consistent interface to evaluate synthetic datasets. Many datasets can be compared consistently and quickly to provide vital insight.
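
To illustrate the general idea of registering metrics once and applying them consistently to several synthetic datasets, here is a minimal sketch in plain Python. It does not use SynthGauge and does not reproduce its API: the class SimpleEvaluator, the two metric functions and the toy records are all hypothetical, chosen only to show one utility-style and one privacy-style measure side by side. The real interface is described in the API Reference documentation.

```python
from collections import Counter


def total_variation_distance(real, synth, feature):
    """Utility-style metric (illustrative): 0 means the category
    proportions of `feature` match exactly, 1 means no overlap."""
    real_counts = Counter(row[feature] for row in real)
    synth_counts = Counter(row[feature] for row in synth)
    categories = set(real_counts) | set(synth_counts)
    return 0.5 * sum(
        abs(real_counts[c] / len(real) - synth_counts[c] / len(synth))
        for c in categories
    )


def exact_match_rate(real, synth):
    """Privacy-style metric (illustrative): share of synthetic rows
    that reproduce a real row exactly."""
    real_rows = {tuple(sorted(row.items())) for row in real}
    copies = sum(tuple(sorted(row.items())) in real_rows for row in synth)
    return copies / len(synth)


class SimpleEvaluator:
    """Hypothetical helper for this sketch; NOT the SynthGauge Evaluator."""

    def __init__(self, real):
        self.real = real
        self.metrics = {}

    def add_metric(self, name, func, **kwargs):
        # Store the metric once so every dataset is scored the same way.
        self.metrics[name] = (func, kwargs)

    def evaluate(self, synth):
        return {
            name: func(self.real, synth, **kwargs)
            for name, (func, kwargs) in self.metrics.items()
        }


# Toy data, invented for illustration only.
real = [
    {"age_band": "18-34", "region": "North"},
    {"age_band": "18-34", "region": "South"},
    {"age_band": "35-64", "region": "North"},
    {"age_band": "65+", "region": "South"},
]
synth_a = [dict(row) for row in real]  # a straight copy of the real data
synth_b = [
    {"age_band": "18-34", "region": "East"},
    {"age_band": "35-64", "region": "East"},
    {"age_band": "35-64", "region": "West"},
    {"age_band": "65+", "region": "West"},
]

evaluator = SimpleEvaluator(real)
evaluator.add_metric("age_tvd", total_variation_distance, feature="age_band")
evaluator.add_metric("copied_rows", exact_match_rate)

results = {
    "synth_a": evaluator.evaluate(synth_a),
    "synth_b": evaluator.evaluate(synth_b),
}
for name, scores in results.items():
    print(name, scores)
```

In this toy case the copied dataset matches the real age distribution perfectly but reproduces every real record, while the second dataset copies no records at the cost of a less faithful distribution. Which of the two is preferable depends on the intended use, so the sketch reports the scores and leaves the judgement to the user.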

SynthGauge will not make any decisions on behalf of the user or specify whether one synthetic dataset is better than another. That decision depends on the dataset and its purpose, so it can vary widely from user to user. Instead, SynthGauge is intended to support decision makers.

With engagement from the open-source community, we hope the suite of metrics can be expanded and refined. There is out-of-the-box support for custom metrics, and users are encouraged to explore and push SynthGauge's capabilities. It is an evolving product.

To help you get started, the SynthGauge repository and API Reference documentation are available on GitHub.

If you have any questions about SynthGauge or can help us to improve it, please contact us by email.

Additional authors: