Covariance's Loss is Privacy's Gain: Computationally Efficient, Private and Accurate Synthetic Data
OPEN ACCESS
Loading...
Author / Producer
Date
2024-02
Publication Type
Journal Article
ETH Bibliography
yes
Citations
Altmetric
OPEN ACCESS
Data
Rights / License
Abstract
The protection of private information is of vital importance in data-driven research, business and government. The conflict between privacy and utility has triggered intensive research in the computer science and statistics communities, who have developed a variety of methods for privacy-preserving data release. Among the main concepts that have emerged are anonymity and differential privacy. Today, another solution is gaining traction, synthetic data. However, the road to privacy is paved with NP-hard problems. In this paper, we focus on the NP-hard challenge to develop a synthetic data generation method that is computationally efficient, comes with provable privacy guarantees and rigorously quantifies data utility. We solve a relaxed version of this problem by studying a fundamental, but a first glance completely unrelated, problem in probability concerning the concept of covariance loss. Namely, we find a nearly optimal and constructive answer to the question how much information is lost when we take conditional expectation. Surprisingly, this excursion into theoretical probability produces mathematical techniques that allow us to derive constructive, approximately optimal solutions to difficult applied problems concerning microaggregation, privacy and synthetic data.
Permanent link
Publication status
published
External links
Editor
Book title
Journal / series
Volume
24 (1)
Pages / Article No.
179 - 226
Publisher
Springer
Event
Edition / version
Methods
Software
Geographic location
Date collected
Date created
Subject
Differential privacy; Synthetic data; Covariance loss
Organisational unit
09679 - Bandeira, Afonso / Bandeira, Afonso