The Definition of Differential Privacy - Cynthia Dwork

Institute for Advanced Study
14 Nov 2016 · 18:21

Summary

TL;DR: The speaker discusses differential privacy, focusing on its definition and evolution over the past decade. They explore privacy-preserving data analysis, using examples like the U.S. Census and statistical research, and explain how differential privacy protects individual data while still allowing population-level insights. The talk highlights the limitations of de-identified data, the challenges of balancing privacy with accurate statistical analysis, and the concept of privacy loss. It also touches on the composition property, which enables complex privacy-preserving algorithms, and the potential societal implications of learning from population data.

Takeaways

  • 📅 The speaker is giving a talk on differential privacy, a concept that has evolved over the past decade.
  • 🔐 Differential privacy aims to allow data analysis while preserving individual privacy, with applications like the Census Bureau and epidemic detection.
  • 🤔 De-identifying data is a flawed safeguard, because supposedly de-identified data can often still be re-identified; the speaker came to this view through discussions with Helen Nissenbaum.
  • 📊 Releasing statistics instead of raw data is a method that seems to protect privacy, but it can be compromised if too many accurate statistics are released.
  • 👤 The speaker's interest in privacy stems from conversations with Helen Nissenbaum, focusing on privacy-preserving data analysis.
  • 🔑 Differential privacy provides a mathematical definition of privacy ensuring that the outcome of any analysis is essentially unchanged whether any individual joins or refrains from joining the dataset.
  • 🔗 The concept of 'adjacent datasets' is introduced, which are datasets that differ by only one person's data, crucial for defining differential privacy.
  • 🎰 Differentially private algorithms use randomness (flipping coins) together with a privacy parameter epsilon; the randomized-response sketch after this list makes this concrete.
  • 🔒 The formal definition of differential privacy involves the probability of any output event being almost the same for adjacent datasets, controlled by epsilon.
  • 🌐 Differential privacy has properties like future-proofing, applicability to groups, and the ability to handle multiple analyses (composition).
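
As a concrete illustration of the coin-flipping idea above, here is a minimal sketch of randomized response, a classic survey mechanism that satisfies epsilon-differential privacy. This is not code from the talk; the function names and the example numbers are illustrative.

```python
import math
import random

def randomized_response(truth: bool) -> bool:
    """Answer a sensitive yes/no question with plausible deniability.

    First coin flip: heads -> report the true answer.
    Tails -> flip again and report yes/no uniformly at random.
    A true 'yes' is reported with probability 3/4 and a true 'no'
    with probability 1/4, so the likelihood ratio between adjacent
    inputs is 3, giving epsilon = ln(3) ~ 1.10.
    """
    if random.random() < 0.5:      # first flip came up heads
        return truth               # report truthfully
    return random.random() < 0.5   # second flip decides the report

def estimate_fraction(reports: list[bool]) -> float:
    """Unbiased estimate of the true fraction p of 'yes' answers.

    The expected fraction of reported 'yes' is 1/4 + p/2,
    so invert that affine map to recover p.
    """
    reported = sum(reports) / len(reports)
    return 2.0 * reported - 0.5

# Illustrative experiment: 10,000 respondents, 30% of whom are truly 'yes'.
population = [random.random() < 0.30 for _ in range(10_000)]
reports = [randomized_response(t) for t in population]
print(f"epsilon = ln 3 = {math.log(3):.3f}")
print(f"estimated 'yes' fraction: {estimate_fraction(reports):.3f}")
```

Because every respondent can claim their report was the result of a coin flip, each individual's report shifts the likelihood of any output by at most a factor of 3, yet the aggregate still yields an accurate population-level estimate.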

Q & A

  • What is the main topic of the presentation?

    -The main topic of the presentation is differential privacy, specifically its definition and application in privacy-preserving data analysis.

  • Why was the speaker interested in privacy-preserving data analysis?

    -The speaker became interested in privacy-preserving data analysis due to conversations with Helen Nissenbaum, seeking a way to address privacy concerns with mathematical rigor.

  • What is an example application of differential privacy mentioned in the script?

    -An example application of differential privacy mentioned is the Census Bureau's data analysis for resource allocation while maintaining privacy.

  • What is the issue with de-identified data according to the speaker?

    -The speaker's quip is that de-identified data is "either not data or not de-identified": either so much information is removed that the data loses its utility, or individuals can still be re-identified. Either way, de-identification does not eliminate privacy risk.

  • What is the 'fundamental law of information reconstruction' referred to in the script?

    -The 'fundamental law of information reconstruction' refers to a collection of results showing that overly accurate estimates of too many statistics can completely destroy privacy.

  • What is the formal definition of differential privacy provided in the script?

    -Differential privacy is defined such that, for every pair of datasets differing in one person's data, the probability of any output event under one dataset is at most e to the power of epsilon times the probability of the same event under the other (see the formal statement after this Q&A).

  • What does epsilon represent in the context of differential privacy?

    -Epsilon is a parameter that measures privacy loss in differential privacy; a smaller epsilon indicates stronger privacy protection.

  • What are the properties of differentially private algorithms mentioned in the script?

    -The properties mentioned include future-proof privacy guarantees, protection that extends to groups, quantifiable privacy loss under composition, and the ability to build complex analyses from simple building blocks.

  • Why does the speaker argue that learning about the population does not necessarily compromise privacy?

    -The speaker argues that learning about the population does not necessarily compromise any individual's privacy, because the same conclusions would have been reached even if that individual were replaced by another random member of the population.

  • What is the significance of the composition property in differential privacy?

    -The composition property is significant because it allows the cumulative privacy loss across multiple analyses to be understood and bounded, which is crucial for complex data analyses (see the composition bound after this Q&A).

  • How does differential privacy ensure the generalizability of data analyses?

    -Differential privacy ensures generalizability by blurring answers just enough to protect individual privacy, allowing insights drawn from a dataset to accurately represent the broader population.
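
The formal definition and the composition property from the answers above can be written compactly. The statement below is the standard formalization matching the talk's description; the symbols M, D, D', and S are our notation, not the speaker's.

```latex
% A randomized mechanism M is epsilon-differentially private if, for all
% pairs of adjacent datasets D and D' (differing in one person's data)
% and for every set S of possible outputs:
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S]

% Basic composition: if each mechanism M_i is epsilon_i-differentially
% private, releasing the outputs of M_1, ..., M_k together is
% differentially private with parameter
\varepsilon_{\text{total}} \;=\; \sum_{i=1}^{k} \varepsilon_i
```

Because epsilon bounds the worst-case privacy loss of a single analysis, summing the epsilons gives a worst-case privacy budget for an entire sequence of analyses; this is what allows complex privacy-preserving pipelines to be assembled from simple building blocks (tighter "advanced composition" bounds also exist).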


Related Tags
Differential Privacy, Data Analysis, Privacy Preservation, Census Bureau, Epidemic Detection, Statistical Rigor, Data De-identification, Information Reconstruction, Privacy Guarantees, Algorithmic Privacy