Evaluation of Metrics for Assessing Synthetic Tabular Data Quality

Name: Evaluation of Metrics for Assessing Synthetic Tabular Data Quality
Start: 2025-06-10T15:30:00Z
End: 2025-06-10T16:00:00Z
Location: La Llotja palace

Jun 10, 2025

Nora Amama-BenHassun

seio 2025

Abstract

The need to comply with privacy rules and reluctance to share original datasets have fueled synthetic data (SD) adoption. Current generation methods can mimic original data, but assessing the quality of their output relies on validation metrics whose reliability is uncertain, which makes the evaluation of these metrics a key research focus. This research introduces key validation metrics, focusing on tabular SD. After an extensive review of the state of the art, we describe a broad set of measures, some of them adapted from other contexts, and propose a framework to guide the selection of appropriate metrics based on specific use cases. We conduct a comprehensive simulation study to assess the reliability of resemblance metrics, including the propensity score mean squared error (pMSE) and the Kolmogorov-Smirnov statistic, among others, when applied to different generation methods. The goals are twofold: first, to enhance the evaluation of synthetic data quality, and second, to address current deficiencies in its assessment and facilitate the use of synthetic data in privacy-sensitive domains.
Keywords: Synthetic Data, Tabular Data, Data Privacy, Data Utility, Validation Metrics
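
For readers unfamiliar with the two resemblance metrics named in the abstract, the following is a minimal sketch of their common textbook definitions, not the implementation used in the study. The function names are illustrative, and `pmse` assumes the propensity scores (predicted probabilities that a record is synthetic) have already been produced by some classifier trained on the stacked original and synthetic data; that classifier is not shown here.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic for one numeric column:
    the largest absolute gap between the two empirical CDFs."""
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    real_sorted = sorted(real)
    syn_sorted = sorted(synthetic)
    n_real, n_syn = len(real_sorted), len(syn_sorted)
    gap = 0.0
    # Empirical CDFs only change at observed values, so it is enough
    # to compare them at every distinct value in either sample.
    for x in set(real_sorted) | set(syn_sorted):
        f_real = bisect_right(real_sorted, x) / n_real
        f_syn = bisect_right(syn_sorted, x) / n_syn
        gap = max(gap, abs(f_real - f_syn))
    return gap


def pmse(propensity_scores, n_synthetic):
    """Propensity score mean squared error.

    propensity_scores: predicted probability of being synthetic for every
    record in the stacked (original + synthetic) dataset. Assumed to come
    from an external classifier.
    n_synthetic: number of synthetic records in the stacked dataset.
    """
    n_total = len(propensity_scores)
    if n_total == 0:
        raise ValueError("propensity_scores must be non-empty")
    c = n_synthetic / n_total  # share of synthetic records
    return sum((p - c) ** 2 for p in propensity_scores) / n_total
```

Under these definitions, both metrics equal 0 when the synthetic data cannot be told apart from the original. The Kolmogorov-Smirnov statistic reaches 1 when the two samples do not overlap at all, and the pMSE reaches 0.25 when a classifier separates two equally sized datasets perfectly.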

Date

Jun 10, 2025 3:30 PM — 4:00 PM

Event

XLI National Congress of Statistics and Operations Research (SEIO 2025)

Location

La Llotja palace

Avinguda de Tortosa, 6, Lleida, Lleida 25005

Last updated on Jun 10, 2025