Synthetic Data Leakage
A risk in which synthetic data leaks membership or other privacy-sensitive information because it retains too many traces of the real data it was generated from.
Synthetic data leakage is a critical risk that can undermine the privacy benefits expected from synthetic data generation. If the generator reproduces certain real records too closely, membership or personal information may be exposed through the synthetic dataset. For that reason, synthetic data must be tested not only for statistical quality but also for resistance to privacy attacks. Two kinds of test are especially important here: membership inference tests, which check whether an attacker can tell that a given record was part of the real training data, and nearest-neighbor-based checks, which look for synthetic records that lie suspiciously close to real ones. Privacy protection is therefore an integral part of successful synthetic data generation, not an optional extra.
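As a rough illustration of the nearest-neighbor-based checks and membership inference mentioned above, the sketch below computes each synthetic record's distance to its closest real record (often called distance to closest record, or DCR), flags near-copies, and applies a naive distance-based membership guess. It is a minimal sketch, assuming numeric, already-scaled feature vectors and Euclidean distance. The function names and the `threshold` parameter are hypothetical, chosen for illustration only. A real audit would calibrate the threshold, for example against the distances between real training records and a held-out set of real records, and would use an efficient nearest-neighbor index instead of brute force.

```python
import math
from typing import List, Sequence

# A record is assumed to be a numeric, already-scaled feature vector.
Record = Sequence[float]


def euclidean(a: Record, b: Record) -> float:
    """Euclidean distance between two records of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_closest(record: Record, pool: List[Record]) -> float:
    """Distance from `record` to its nearest neighbor in `pool` (brute force)."""
    if not pool:
        raise ValueError("pool must contain at least one record")
    return min(euclidean(record, other) for other in pool)


def dcr_scores(synthetic: List[Record], real: List[Record]) -> List[float]:
    """Distance to closest real record for every synthetic record.

    Very small values suggest the generator may have copied real records.
    """
    return [distance_to_closest(s, real) for s in synthetic]


def flag_near_copies(synthetic: List[Record], real: List[Record],
                     threshold: float) -> List[int]:
    """Indices of synthetic records within `threshold` of some real record.

    `threshold` is a hypothetical cut-off that must be calibrated per dataset.
    """
    return [i for i, d in enumerate(dcr_scores(synthetic, real)) if d <= threshold]


def membership_guess(candidate: Record, synthetic: List[Record],
                     threshold: float) -> bool:
    """Naive distance-based membership inference.

    Guess "member of the training data" if the candidate lies within
    `threshold` of some synthetic record.
    """
    return distance_to_closest(candidate, synthetic) <= threshold


if __name__ == "__main__":
    real = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    synthetic = [[1.0, 2.0], [3.5, 4.5], [9.0, 9.0]]  # first row is an exact copy

    print(flag_near_copies(synthetic, real, threshold=0.1))          # [0]
    print(membership_guess([1.0, 2.0], synthetic, threshold=0.1))    # True
    print(membership_guess([7.0, 0.0], synthetic, threshold=0.1))    # False
```

In this toy example the first synthetic record is an exact copy of a real one, so it is flagged, and an attacker holding that real record would correctly guess that it was used for training. A leakage-resistant generator should produce DCR values for synthetic records that are not systematically smaller than those of genuinely unseen real records.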
