Using Open Source Tools to Build Privacy-Conscious Data Systems
Summary
TL;DR: In this talk, a senior software engineer from Ethyca discusses the complexities of data privacy compliance, highlighting the increasing number of global regulations. The speaker outlines the seven foundational principles of GDPR and emphasizes the importance of data subject requests, data mapping, and consent tracking. They introduce Fides, an open-source privacy platform designed to automate compliance processes, cater to different personas, and maintain privacy throughout the software development lifecycle. Fides is presented as a comprehensive approach that covers DSR processing, data mapping, user privacy interfaces, and compliance enforcement.
Takeaways
- Data privacy is becoming increasingly complex: over 30 countries and six U.S. states have passed privacy legislation, affecting how companies handle personal data.
- The GDPR's seven foundational principles serve as a model for many privacy laws, which makes compliance across different regions broadly similar in what it requires.
- The 'triumvirate of compliance' for technology companies includes Data Subject Request (DSR) processing, Record of Processing Activities (RoPA), and Consent Tracking.
- Manual processes for compliance are not scalable and can be costly, leading many organizations to seek automated solutions.
- Fides, an open-source privacy engineering platform, aims to address privacy compliance challenges by offering tools for automated DSR processing, data mapping, and consent management.
- Fides is designed to cater to different personas within an organization, including software engineers, privacy engineers, compliance professionals, and security professionals.
- The platform includes a CLI for developers, an API for configuration and execution, and a UI for privacy administration, providing a comprehensive approach to privacy management.
- Fides uses a language called fideslang to express privacy policies as code, allowing for evaluations against systems and data sets to ensure compliance.
- The Python Software Foundation has contributed to Fides and is implementing it as part of its infrastructure.
- Fides is not just a compliance tool but a holistic solution that covers the entire data lifecycle, from development to runtime, and includes user-facing privacy centers.
Q & A
What is the significance of the GDPR's seven foundational tenets in data privacy?
-The GDPR's seven foundational tenets serve as a comprehensive framework for data protection and privacy. They outline the main requirements that organizations must adhere to in order to comply with data privacy laws, including principles like data minimization, purpose limitation, and storage limitation. The speaker does not walk through each one, but notes that most other privacy laws closely resemble them.
How does the speaker describe the current state of data privacy regulations globally?
-The speaker describes the current state of data privacy regulations as complex and growing, with over 30 countries having data protection laws, including the EU as a single entity, and six U.S. states having passed privacy legislation, with more in progress.
What does the speaker refer to as the 'triumvirate of compliance'?
-The 'triumvirate of compliance' refers to three critical components that technology companies must address to ensure data privacy compliance: data subject request processing, record of processing activities (ROPA), and consent tracking.
Why are manual processes for handling data privacy not considered scalable according to the speaker?
-Manual processes for handling data privacy are not scalable because they often rely on interns or data engineers performing repetitive tasks like running SQL queries, logging into APIs, and sending emails, which is both time-consuming and expensive.
What is the role of the 'Privacy Center' in the context of the Fides platform?
-The 'Privacy Center' in the Fides platform is a user-facing interface that allows individuals to manage their privacy preferences, such as data access, data erasure, and consent management. It is a key component in how users interact with the platform from a privacy perspective.
How does Fides aim to help with the processing of Data Subject Requests (DSRs)?
-Fides aims to help with the processing of Data Subject Requests (DSRs) by providing automated DSR processing capabilities, which can reduce the manual workload and improve the efficiency of handling such requests.
What is the significance of the 'fideslang' in the Fides platform?
-The 'fideslang' is a YAML-based language used in the Fides platform to express privacy policies and metadata. It allows for the codification of privacy policies in a way that can be evaluated against the code and systems to ensure compliance.
How does the speaker suggest using Fides during the software development lifecycle?
-The speaker suggests using Fides during the software development lifecycle by integrating it into the CI process, using it as a git hook, and employing its CLI for maintaining privacy at development time, which can help catch privacy failures before they reach production.
What is the importance of the Python Software Foundation's contribution to Fides mentioned in the script?
-The Python Software Foundation's contribution to Fides is significant because it indicates the recognition of Fides by a major organization in the Python community. It also implies that Fides will be integrated into the Python Software Foundation's infrastructure, potentially increasing its adoption and use.
What are the different personas that Fides aims to cater to?
-Fides aims to cater to different personas including software engineers, privacy engineers, compliance professionals, and potentially security professionals. It provides a CLI for development time privacy maintenance, an API for configuration and execution, and a UI for privacy administration during runtime.
Outlines
Introduction to Data Privacy and Compliance
The speaker, a senior software engineer at Ethyca, kicks off the final session of the day by acknowledging the audience's likely exhaustion while insisting that data privacy compliance is an exciting topic. The talk focuses on modern privacy regulations, highlighting the complexity of data privacy laws across more than 30 countries and six U.S. states and the growing need to stay compliant. The speaker points to the seven foundational tenets of GDPR, which serve as a basis for many privacy laws, and introduces the 'triumvirate of compliance' for technology companies: data subject request processing, record of processing activities (RoPA), and consent tracking. The talk aims to explore how open source tools can assist in maintaining compliance with these regulations.
The Role of Engineers in Privacy Compliance
The speaker emphasizes the collective effort required for privacy compliance within a company, involving not just a compliance or privacy engineering team but all teams, including software engineers. Privacy compliance, like security, is not a task a single group can solve alone. The speaker introduces Fides, an open-source privacy-as-code platform that aims to address these challenges. Fides is named after the Roman goddess of trust and is designed to be a comprehensive platform for building and maintaining privacy-respecting software throughout the software development lifecycle. The platform is open source and backed by a private company with a team of engineers and compliance experts. Fides caters to different personas within an organization, including software engineers, privacy engineers, compliance professionals, and security professionals, by providing a CLI for development time, an API for configuration and execution, and a UI for privacy administration during runtime.
Demonstrating Fides with a Live Example
The speaker proceeds to a live demonstration of Fides, starting with the command line interface (CLI). The CLI allows for the deployment of a sample application that showcases the evaluation of privacy policies against code and the handling of user privacy requests. The speaker introduces 'Fides Lang', a language used to express privacy as code, which includes systems, data sets, and policies. These are the foundational elements for discussing privacy in a code-relevant manner. The demonstration includes pushing metadata files to a server, evaluating privacy policies, and showing how privacy breaches can be caught during development, preventing privacy issues from reaching production. The speaker also shows the user-facing privacy center, where users can manage their data access, erasure, and consent, emphasizing the importance of a dark pattern-free consent flow.
Holistic Privacy Compliance with Fides
The speaker summarizes the capabilities of Fides in maintaining privacy throughout the software development lifecycle and user interactions. Fides includes connectors to data sources, including third-party APIs, to handle data requests and ensure compliance. The speaker discusses the importance of handling data subject requests, consent tracking, and data mapping, which are often complex and require a holistic approach. Fides is presented as a solution that addresses these challenges by offering tools for DSR processing, data mapping, a user-facing privacy center, and enforcing compliance during development. The speaker acknowledges the difficulty of data compliance and the reality that many companies are either non-compliant or only partially compliant. Fides is positioned as a comprehensive, open-source solution supported by privacy experts, with recent contributions from the Python Software Foundation. The talk concludes with resources for learning more about Fides, including documentation, GitHub repositories, and a podcast episode featuring the speaker.
Keywords
Data Privacy
Compliance
Data Subject Request (DSR)
Data Mapping
Consent Tracking
GDPR
Open Source Tools
Fides
Privacy as Code
Data Lifecycle
Global Privacy Control (GPC)
Highlights
Introduction to the importance of data privacy and compliance in modern software engineering.
Discussion on the increasing complexity of data privacy with over 30 countries having data protection laws.
Mention of six U.S states with privacy legislation and the trend towards more states adopting similar laws.
Overview of the seven foundational tenets of GDPR as a basis for many privacy laws.
The 'triumvirate of compliance' comprising data subject request processing, data mapping, and consent tracking.
Challenges faced by companies in complying with privacy laws, including reliance on manual processes and the cost of non-compliance.
The need for a holistic approach to compliance that covers the entire data lifecycle.
Introduction to Fides, an open-source privacy engineering platform designed to address compliance challenges.
Fides' support for different personas including software engineers, privacy engineers, and compliance professionals.
Demonstration of Fides' CLI for pushing privacy metadata files to a centralized server.
Explanation of Fides' privacy policy evaluation process to ensure code compliance with privacy policies.
Showcase of Fides' user-facing privacy center for managing data access, erasure, and consent.
Discussion on how Fides helps maintain privacy during the software development lifecycle.
The integration of Fides with the Python Software Foundation and its use in pip infrastructure.
Summary of Fides' comprehensive solution for DSR processing, data mapping, user privacy centers, and development-time compliance enforcement.
Promotion of resources for learning more about Fides, including documentation, GitHub repositories, and a podcast episode.
Transcripts
[Music]
I'm a senior software engineer at Ethyca
thank you for being here on the final
session of the first day I'm sure you're
all exhausted
but we're going to talk about the most
exciting thing which is data privacy
compliance
so really quick we're going to speed run
modern privacy regulations that's going
to give us kind of a nice set stage for
when we talk about
how open source tools can help you stay
compliant
so really quick there are 30 plus
countries across the world and that is
actually including the EU as a single
entity so we're looking at what almost
60 countries with data protection laws
six U.S states have already passed
privacy legislation you'll probably know
CCPA in California Florida has their own
Texas passed their own and there are 20
more states on the way so data privacy
is getting more complex by the day there
are all these different rules for all
these different places and it's becoming
more and more critical to stay on top of
things
so generally what do these laws require
these are kind of the seven foundational
tenets of gdpr I'm not going to go into
each one but this is what's listed on
their website is kind of the main things
to remember and even though not all
privacy laws are exactly this
um they're pretty similar it was kind of
a uh we're going to copy your homework
but like change a little bit situation
um where they're all generally similar
to this but with a few changes here and
there
practically for engineers for technology
companies this comes down to what I call
the triumvirate of compliance so how do
we stay compliant with generally what
those seven tenets require so number
one is going to be data subject request
processing it's going to be DSR from
here on out so you've probably heard of
like the right to be forgotten the right
to Erasure you've got the right to
access so being able to see what data a
company has about you second is a RoPA
that's like the legalese record of
processing activities some people call
it the data map you need to be able to
tell the regulators every year
what are you doing with whose data and
what is your legal grounds
for doing so you have to be able to
explain this to regulators and then
finally we have consent tracking I'm
sure you all remember a few years ago
the internet got even more annoying
because every website you go to now has
a pop-up and you have to click a thing
that says okay you can use my data no I
don't use my data
um and you need to actually track that
so when when a user comes to your
website
they're now generating data you're
probably storing and you actually need
to
respect their wishes in terms of consent
all throughout the life cycle of the
data
so organizations in some cases are
trying to to comply with all these
things and from what we've seen a lot of
companies are relying on manual
processes so this basically just ends up
being a bunch of interns or a bunch of
data Engineers running SQL queries they
are logging into apis they're sending
emails manually and it's not scalable it
gets really expensive
and there's no one to make coffee
because the the interns are all busy
handling data subject requests
next you have some companies are just
kind of
accepting that it's too complex and it's
too expensive and they're just going to
say okay we're going to be out of
compliance we know that it's a risk but
it's a risk we're going to take because
the
the alternative is just too difficult
and then finally there are vendors
there are people out there buying their
Solutions and adopting them but again
to be truly compliant you have to cover
data throughout the entire
life cycle so it's not just our data
warehouse is covered so we're good there
or it's not just okay we have a cookie
consent Banner on our web page
everything needs to be working together
the data subject requests need to get
processed against the data warehouse and
the application database so it's a lot
more complex than what a single
solution can usually provide
so to sum up
an ideal way to solve this problem is to
have a single tool that can do automated
DSR processing right so
give the interns a break there's not
enough of them on Earth to handle all
these requests
automated data mapping so some kind of
tool that can go and figure out where
your data lives what it's doing there
why it's being used
Etc and then finally a dark pattern free
that's very specific because people
are actually getting caught on this a
dark pattern free consent flow so people
need to be able to visit your website
they need to be able to give consent and
that needs to be stored and used later
when processing dsrs
Etc
so this is the ideal state right so
we've got software engineers and privacy
professionals working together
to achieve privacy compliance because
just like security it's not something
that a single team or a single group of
people can solve within a company you
have potentially a compliance team or
privacy engineering team and they're
there to set guidelines and they're
there to make recommendations and work
with people but they need the Synergy of
all the other teams to really make this
work and that's why it's really
important that a tool also includes
software Engineers as part of that
because ultimately we're the ones
building the software or data Engineers
we're the ones building the software
that's using this data and we know it
better than a privacy engineer is going
to know it
okay so we're going to enter fides which
is the open source privacy as code
platform that we're going to talk about
today that's aiming to solve some of
these problems
all these problems in fact
so what is it first off it's pronounced
fides and it's from the goddess of trust
in Roman mythology our company name as
well Ethyca
earlier I called it a triumvirate a lot
of Greco-Roman themes going on here
and it's a platform an entire platform
not just like a single application it's
a platform for building and maintaining
privacy respecting software
um that means it's going to cover
the software development life
cycle itself so from the engineering
stage to the CI stage all the way
into runtime applications
it's built by privacy experts so it is
fully open source but it's being backed
by a private company full of Engineers
and compliance experts who understand
this kind of stuff because it is really
complicated and then finally
as I had mentioned earlier you have to
be able to cater to different personas
you need to have software Engineers
working with privacy Engineers working
with compliance professionals
potentially even then working with
Security Professionals
so we have a CLI for maintaining privacy
at development time we have an API for
configuration and execution and then we
have a UI for privacy Administration
during run time so we're catering to all
these different personas in a single
application single platform
now we're going to hop into a demo
so here we've got the Fides command line
spun up
um
it is in Python so it's pip installable
if you want to as well you can use it
via Docker container
so
anytime you pip install Fides you're
going to get this handy command that'll
give you a good idea
of what Fides is capable of so fides
deploy up it's going to require that you
have Docker Compose installed as well
and then what it's going to do is it's
going to spin up
an entire sample application that's
going to show you the flow of evaluating
your privacy policies against your code
as well as what happens when a user
wants to submit a privacy request
real quick I'll show you
what that looks like
so in these yaml files this is kind of a
thing we call fideslang
and this is the foundation of Fides
this is
expressing
privacy as code
so we have systems we have things like
data sets we've created all of these
kind of prerequisite
Primitives that you need to be able to
talk about privacy in a way that's
applicable to code
and then scalable as metadata
so here we have systems that can be a
micro service that can be a specific
functionality within a microservice we
have data sets
so okay our Redis cache we need to know
what data is stored in there because
there might be pii in our application
database
we need to know everything that's in
there
we need to know what types of data is in
there
and then we have our policies so this is
basically a privacy policy codified in
fideslang that we're going to be able
to run evaluations against
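As a rough illustration of the primitives just described (systems, data sets, and the labels attached to them), here is a minimal Python sketch. It is not the fideslang schema: the class names, field names, and dotted category labels are assumptions chosen for readability, and the real metadata in the demo lives in YAML files.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetField:
    name: str
    data_categories: list[str]  # illustrative labels, e.g. "user.contact.email"


@dataclass
class Dataset:
    key: str
    fields: list[DatasetField]


@dataclass
class PrivacyDeclaration:
    name: str
    data_categories: list[str]
    data_use: str
    data_subjects: list[str]


@dataclass
class System:
    key: str
    privacy_declarations: list[PrivacyDeclaration] = field(default_factory=list)


def categories_in(dataset: Dataset) -> set[str]:
    """Collect every data category annotated anywhere in the dataset."""
    return {c for f in dataset.fields for c in f.data_categories}


# Hypothetical annotation of a cache that turns out to hold an email address.
redis_cache = Dataset(
    key="redis_cache",
    fields=[
        DatasetField("session_id", ["system.operations"]),
        DatasetField("email", ["user.contact.email"]),
    ],
)

checkout = System(
    key="checkout_service",
    privacy_declarations=[
        PrivacyDeclaration(
            name="process orders",
            data_categories=["user.contact.email"],
            data_use="essential.service",
            data_subjects=["customer"],
        )
    ],
)

print(sorted(categories_in(redis_cache)))
```

Once every data store is annotated this way, a question like "where does personal data live?" becomes a query over metadata rather than a manual audit.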
so when you run fides deploy up you're
going to get greeted by this nice web
page that opens locally
so there are a few different things we can play
around with first I'm going to jump into
the CLI demo
so when you run fides push which is a
CLI command it's going to take all of
those yaml metadata files that I showed
you before and it's going to push them
up to the server this is pretty closely
based off of like kubectl kube-cuddle kube-CTL
whatever you might want to call it it's
meant to be a very familiar interface
where you've
created everything as yaml files locally
and then it's getting pushed and stored
in a centralized server
so now if I do a fides evaluate
it's going to take that codified privacy
policy that I showed earlier
and it's going to
check everything that I've declared in
my systems and my data sets and make
sure that I'm not breaching that policy
in any way
I can come back over here
and
I've added this privacy declaration in
here so that's saying I've added some
kind of functionality to my system
I've come in here and I've added it as
an additional privacy declaration saying okay
I'm also using this data for this reason
with this data subject and this qualifier
so if I do that
we now have something that breaches our
privacy policy and the evaluation is
going to come up as a fail and you're
going to see what that looks like
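The pass-then-fail behaviour shown in the demo can be sketched in a few lines of Python. This is a simplified stand-in, not how Fides evaluates policies: the `Declaration` and `PolicyRule` classes, the prefix-matching rule, and all labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Declaration:
    system: str
    data_category: str
    data_use: str
    data_subject: str


@dataclass(frozen=True)
class PolicyRule:
    """Forbid a data category from being used for a given purpose."""
    name: str
    forbidden_category: str
    forbidden_use: str


def matches(label: str, prefix: str) -> bool:
    """Hierarchical match: 'user.contact' covers 'user.contact.email'."""
    return label == prefix or label.startswith(prefix + ".")


def evaluate(declarations, rules):
    """Return (rule name, declaration) pairs that violate the policy."""
    return [
        (rule.name, decl)
        for decl in declarations
        for rule in rules
        if matches(decl.data_category, rule.forbidden_category)
        and matches(decl.data_use, rule.forbidden_use)
    ]


rules = [PolicyRule("no_contact_data_for_ads", "user.contact", "advertising")]
declared = [
    Declaration("checkout", "user.contact.email", "essential.service", "customer")
]
assert evaluate(declared, rules) == []  # the original declarations pass

# A new feature starts using email addresses for advertising.
declared.append(
    Declaration("checkout", "user.contact.email", "advertising.third_party", "customer")
)
violations = evaluate(declared, rules)
for rule_name, decl in violations:
    print(f"FAIL {rule_name}: {decl.system} uses {decl.data_category} for {decl.data_use}")
```

The point of the design is that the policy and the declarations are both plain data, so the check is cheap enough to run on every commit.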
so this is something that would run in
CI or it could even run as a git hook
before the developer pushes and this
basically means that you can catch
privacy failures before they even
get to production right because again
just like with security if the security
flaw makes it into production it's still
an incident right you haven't you
haven't truly stopped it from happening
so the first kind of protection here is
Fides will help you at development time
avoid some of these some of these issues
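One way to wire the evaluation into a git hook, sketched under assumptions: the script below would be saved as an executable `.git/hooks/pre-push`, it assumes the `fides` CLI is on the PATH, and it assumes `fides evaluate` exits with a non-zero status when the evaluation fails. The function names are hypothetical.

```python
import subprocess
import sys


def run_privacy_check(command):
    """Run the check command; return True when it exits with status 0."""
    result = subprocess.run(command)
    return result.returncode == 0


def main(command=("fides", "evaluate")):
    """Return the exit status the hook should report to git."""
    if not run_privacy_check(list(command)):
        print("Privacy evaluation failed; aborting push.", file=sys.stderr)
        return 1
    return 0


# When used as a hook, end the script with:
#     raise SystemExit(main())
# so that a failing evaluation makes git abort the push.
```

The same `main()` works as a CI step, since most CI systems also treat a non-zero exit status as a failed job.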
what happens after you deploy your
application though right so I'm going to
come we have a a nice
we have a nice sample application here
cookie house so say I'm a user uh it's
my cheat day we're not counting calories
so I'm just going to come over here I'm
going to buy the triple pack
okay so now we're gonna buy some cookies
awesome
so we've got this privacy Center down
here this is something that gets
deployed along with your code and this
is the key way in which your users are
going to interact with fides from a
privacy perspective
so they come here even though this is
themed like the website we were just
using this is actually something you
would deploy from
from fides
so we can do our data access we can do
our data Erasure we can also manage our
consent
so I can come over here to manage
consent
I can give it my email right so that's
my primary identifier that's how the
system knows who I am because I made my
order with that email that ID and now
okay because I'm using this browser it's
detecting Global privacy control
it's automatically opted me out of data
sales and sharing email
marketing and product analytics is okay
that's perfect because that's what I've
set my browser to allow
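Global Privacy Control is sent by the browser as a `Sec-GPC: 1` request header. A minimal server-side sketch of honouring it follows; the preference keys and the choice of which purposes GPC covers are assumptions for illustration, not the Fides privacy center's implementation.

```python
def gpc_enabled(headers):
    """True when the request carries the Global Privacy Control signal."""
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    return normalized.get("sec-gpc") == "1"


def apply_gpc(preferences, headers, gpc_covered=("data_sales_and_sharing",)):
    """Return a copy of the preferences with GPC-covered purposes opted out."""
    updated = dict(preferences)
    if gpc_enabled(headers):
        for purpose in gpc_covered:
            updated[purpose] = False
    return updated


# Hypothetical defaults for a visitor who has not yet made any choice.
defaults = {
    "data_sales_and_sharing": True,
    "email_marketing": True,
    "product_analytics": True,
}
prefs = apply_gpc(defaults, {"Sec-GPC": "1"})
print(prefs)
```

Only the purposes listed in `gpc_covered` are switched off, which mirrors what the demo shows: sales and sharing are opted out while the other preferences are left alone.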
now I'll come in here and do a an access
request so I ordered my cookies they
were delicious but now I'm kind of
curious what this company has on me
okay so this is what we call the admin
UI this is where the compliance team the
Privacy professionals potentially the
Privacy Engineers would come to interact
with the system
so what's nice is that the the engineers
are interacting with Fides in a very
engineering type way right they're using
the CLI they're writing yaml files
that's something that Engineers just are
inherently used to doing these days
and so they don't really feel any extra
friction when they do those things yes
it's extra steps but it's not like
they're having to go learn a new tool
that they're completely unfamiliar with
likewise we're not forcing compliance
professionals lawyers or privacy
Engineers to go and learn to use a CLI
or to go and learn to write yaml files
they're able to come in here and use the
UI something they should already be
familiar with
to then ingest that information
so I'm going to just toggle this okay so
I can see as you just saw I submitted an
access request
it's going to show up here in the
Privacy Center
I can approve it
and then in production depending on what
you've set up usually it's S3 that would
actually go to an S3 bucket the user
would be sent to link and then be able
to download it
additionally over here in data mapping
we have a view of all of our systems
because again we've defined everything
as metadata and now we're able to say
hey
we know these are our systems we know
what the uses are
and we can then drill in and see exactly
exactly what's going on there
so this is really important for anyone
that's had to do any kind of auditing
around privacy we have everything in
here that you need to be compliant
okay so that's Fides in a nutshell
that's covering
how we maintain privacy during the
software development life cycle and then
that's maintaining
how users interact with Fides and how
that eventually gets back to your admin
UI where you can then do those erasures
do those access requests and maintain
that user's privacy throughout the
entire application including the data
warehouse so we have connectors
that can talk to your data source and
that includes third-party apis as well
because that also falls under it
um so like under Connection Manager you
can see so this sample application has a
database and has a postgres
database and we're connected to both of
those and we're able to go find that
thomas@ethyca.com and remove it or grab
it from all those places and we
understand
the execution graph of how that should
be run
because for instance if you're if you're
running
an erasure you don't want to
mess up a foreign key relationship or
anything of that nature so again we're
actually building a graph and
understanding what order we should
execute everything in to make sure we
don't mess things up
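The ordering problem described here is a topological sort. As a sketch (not Fides's actual traversal), Python's standard-library `graphlib` can compute an erasure order in which rows holding a foreign key are handled before the rows they point to; the table names are made up.

```python
from graphlib import TopologicalSorter

# Each table maps to the tables that must be erased before it,
# i.e. the tables holding foreign keys that point at it.
erase_after = {
    "customers": {"orders"},
    "orders": {"order_items"},
    "order_items": set(),
}

# static_order() yields every table after all of its prerequisites.
order = list(TopologicalSorter(erase_after).static_order())
print(order)
```

`TopologicalSorter` raises `CycleError` if the relationships are circular, which is a useful early warning that an erasure cannot be ordered safely without extra handling.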
so quick summary
data compliance is really really hard
and it's not a simple or a solved
problem
maybe some people sitting in the audience
right now have just come to the
realization that they're probably not
compliant uh don't worry that's that's
normal
it's really difficult
um because you need to handle the DSR
processing you need to handle the
consent tracking you need to handle data
mapping and potentially more things
right as as more regulation is added and
is there even slightly different
depending on each spot something I
didn't even get into was
you also need to look at
geography when doing some of these
things so for instance if you are in
California and you go to that privacy uh
you know consent Center that I just
showed you you're going to have a
different experience than if you're in a
different state or you're in Europe
because we actually tailor it to each
geographic location that's important
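Tailoring the consent experience by location can be pictured as a lookup from region to consent model. The region codes, notice names, and defaults below are illustrative assumptions that loosely follow the opt-out style of California's law and the opt-in style of GDPR; they are not Fides configuration.

```python
# Hypothetical mapping from region to the consent experience shown there.
CONSENT_EXPERIENCES = {
    "us_ca": {"model": "opt_out", "notices": ["data_sales_and_sharing"]},
    "eu": {"model": "opt_in", "notices": ["marketing", "analytics"]},
}

# Fall back to the stricter opt-in model when the region is unknown.
DEFAULT_EXPERIENCE = {"model": "opt_in", "notices": ["marketing", "analytics"]}


def experience_for(region):
    """Look up the consent experience for a region code, case-insensitively."""
    return CONSENT_EXPERIENCES.get(region.lower(), DEFAULT_EXPERIENCE)


def initial_preferences(region):
    """Opt-out regions start as allowed; opt-in regions start as not allowed."""
    experience = experience_for(region)
    allowed_by_default = experience["model"] == "opt_out"
    return {notice: allowed_by_default for notice in experience["notices"]}


print(initial_preferences("us_ca"))
print(initial_preferences("eu"))
```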
and again I think most companies
actually
are solving this by simply being
non-compliant and just hoping they can
wing it or get away with it or not get
caught so a lot of people that come to
us and want help implementing Fides are
companies that are just not compliant
we're not replacing some kind of
existing solution they're just not
compliant in the first place
and additionally the ones that do come
to us that already have some kind of
data compliance tool it's not
holistic right because there are some
really fantastic tools for say okay
let's do data governance and lineage
throughout our data warehouse but okay
what about the application database or
what about when a user visits your
website how are you recording what their
consent is and then
tying that through your
entire data life cycle
which is more than just a data warehouse
well lucky for you all Fides does
it all and you're welcome no
we're we're always working on it we're
always trying to evolve it uh
but again that's that's kind of the
benefit of Fides is you get privacy
experts right we do this day in and day
out building this fully open source
project
um recently actually the Python
Software Foundation uh contributed to
Fides and is now
implementing Fides as part of all of the
infrastructure at the Python Software
Foundation so in the near future even
using something like pip will actually
be
talking to Fides in the background which
has been really exciting for us
and in summation
why Fides does it all is because we have
a solution for the dsrs the data mapping
user-facing privacy Center
and enforcing compliance at development
time
quick appendix
for any links so first docs.ethyca.com
you can go there to learn all about
Fides anything else you might want to
know we have a lot of tutorials on there
if you want to play around with the demo
that I did the fides deploy up it will
have instructions on that as well it's
just a pip install ethyca-fides we also
have links to fides and fideslang on
GitHub both of those completely open
source contributions welcome we interact
with the community quite a bit and then
finally for some
I guess self-promotion I was recently on
an episode of the Talk Python To Me
podcast episode 409 and if you just want
to hear me talk about data I'd be
flattered you can go listen to that we
get into you know a lot more than just
Fides we talk about data privacy as a
whole compliance and how it affects us
as software engineers and innovators and
entrepreneurs
so that's all I have for everyone thank
you
[Music]