As new digital platforms and services emerge, the challenge of keeping users’ information safe online is growing more complex – novel technologies require novel privacy solutions. At Google, we continue to invest in privacy-enhancing technologies (PETs), a family of cutting-edge tools that make it possible to process data while giving people guarantees that their personal information is kept private and secure.
Over the past decade, we’ve integrated PETs throughout our product suite, used them to help tackle societal challenges and made many of our own freely available to developers and researchers around the world via open source projects.
Today we’re excited to share updates on our work with differential privacy, a mathematical framework for analyzing datasets in a privacy-preserving way, helping ensure that no individual’s information is revealed.
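At its core, differential privacy works by adding carefully calibrated statistical noise to aggregate results, so that any single person’s data has only a negligible effect on what gets released. Here is a minimal illustrative sketch (hypothetical code, not code from our libraries) of the classic Laplace mechanism applied to a simple count:

```java
import java.util.Random;

// Illustrative sketch only (not code from our libraries): the Laplace
// mechanism releases a count after adding noise calibrated to epsilon.
public class LaplaceCountSketch {
  private static final Random RNG = new Random();

  // Sample from Laplace(0, b) using the inverse-CDF method.
  static double laplaceNoise(double b) {
    double u = RNG.nextDouble() - 0.5;
    return -b * Math.signum(u) * Math.log(1 - 2 * Math.abs(u));
  }

  // A counting query has sensitivity 1, so the noise scale is 1 / epsilon.
  static double privateCount(long trueCount, double epsilon) {
    return trueCount + laplaceNoise(1.0 / epsilon);
  }

  public static void main(String[] args) {
    System.out.println(privateCount(1042, 1.0));  // e.g. prints roughly 1041.3
  }
}
```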
Differential privacy is a PET few users have heard of, yet it is one of the unsung heroes behind some of the most widely used tech features today. But as with many PETs, industry adoption of differential privacy can be challenging for many reasons: complex technical integrations, limited scalability for large applications, high costs for computing resources and more.
We’re pleased to announce we have achieved what we believe to be the largest application of differential privacy in the world, spanning close to three billion devices over the past year and helping us improve the overall user experience in products like Google Home, Google Search on Android and Messages.
For example, we were able to identify the root causes of crashes for Matter devices in Google Home, helping increase customer satisfaction. Matter is an industry standard that simplifies the setup and control of smart home devices across smart home ecosystems. As Google Home continued to add support for new device types, our team uncovered and quickly patched connectivity issues in the Home app using insights unlocked by our differential privacy tool.
This three billion device deployment was made possible by more than six years of research on our “shuffler” model, which sits between the “local” and “central” models of differential privacy: device reports are anonymized and shuffled before they are aggregated, enabling more accurate analysis of larger datasets while still maintaining strong privacy guarantees.
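As a rough sketch of the idea (hypothetical code, not our production implementation): each device applies light local randomization to its report, a shuffler strips identifiers and permutes the reports, and only then does the central aggregator compute statistics.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the shuffle model (not our production code):
// devices apply weak local randomization, a shuffler breaks the link
// between reports and devices, and the server aggregates the result.
public class ShuffleModelSketch {
  private static final Random RNG = new Random();

  // Local randomizer: randomized response on one bit (e.g. "did a crash occur?").
  static int randomize(int bit, double flipProbability) {
    return RNG.nextDouble() < flipProbability ? 1 - bit : bit;
  }

  public static void main(String[] args) {
    int[] deviceBits = {1, 0, 1, 1, 0};
    List<Integer> reports = new ArrayList<>();
    for (int bit : deviceBits) {
      reports.add(randomize(bit, 0.25));   // each device adds a little noise
    }
    Collections.shuffle(reports, RNG);      // the shuffler removes ordering and identity
    long noisyCount = reports.stream().filter(r -> r == 1).count();
    System.out.println("Aggregated noisy count: " + noisyCount);
  }
}
```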
Over five years ago, we set out on a mission to democratize access to our PETs by releasing the first open source version of our foundational differential privacy libraries. Our goal is to make many of the same technologies we use internally freely available to anyone, in turn lowering the barrier to entry for developers and researchers worldwide.
As part of this commitment, we open sourced a first-of-its-kind Fully Homomorphic Encryption (FHE) transpiler two years ago and have continued to remove barriers to entry along the way. We have done the same with our work on Federated Learning and other privacy technologies like secure multi-party computation, which allows two or more parties (e.g., two research institutions) to combine their data and analyze it jointly without ever revealing the underlying information.
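As a simplified illustration of the idea behind secure multi-party computation (a toy sketch with hypothetical values, not one of our libraries), two parties can compute a joint sum by exchanging random-looking shares of their inputs:

```java
import java.util.Random;

// Toy sketch of additive secret sharing (not one of our libraries): two
// parties learn the sum of their private values without revealing them.
public class SecretSharingSketch {
  public static void main(String[] args) {
    Random rng = new Random();
    long modulus = 1L << 32;

    long partyAValue = 120;   // hypothetical private count held by party A
    long partyBValue = 340;   // hypothetical private count held by party B

    // Each party splits its value into two shares that look random on their own.
    long shareA1 = Math.floorMod(rng.nextLong(), modulus);
    long shareA2 = Math.floorMod(partyAValue - shareA1, modulus);
    long shareB1 = Math.floorMod(rng.nextLong(), modulus);
    long shareB2 = Math.floorMod(partyBValue - shareB1, modulus);

    // Shares are exchanged and added; neither side sees the other's raw value.
    long partialSum1 = Math.floorMod(shareA1 + shareB1, modulus);
    long partialSum2 = Math.floorMod(shareA2 + shareB2, modulus);

    long jointSum = Math.floorMod(partialSum1 + partialSum2, modulus);
    System.out.println("Joint sum: " + jointSum);  // 460, without revealing 120 or 340
  }
}
```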
Since 2019, we’ve expanded access to these libraries by publishing them in new programming languages to reach as many developers as possible. Today, we are announcing PipelineDP4j, a release of PipelineDP for the Java Virtual Machine (JVM) that evolved from our joint work with OpenMined. PipelineDP4j allows developers to execute highly parallelizable computations with Java as the baseline language, and it opens the door to new applications of differential privacy by lowering the barrier to entry for developers already working in Java. With this JVM release, we now cover some of the most popular developer languages – Python, Java, Go, and C++ – potentially reaching more than half of all developers worldwide.
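To give a flavor of what such a pipeline computes (a hypothetical standalone sketch, not the PipelineDP4j API), a differentially private aggregation bounds each user’s contribution, groups records by key, and adds calibrated noise before anything is released:

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.stream.Collectors;

// Hypothetical standalone sketch (not the PipelineDP4j API) of what a
// differentially private aggregation pipeline does: bound per-user
// contributions, aggregate per key, then add calibrated noise.
public class DpAggregationSketch {
  record Visit(String userId, String place) {}

  private static final Random RNG = new Random();

  static double laplaceNoise(double b) {
    double u = RNG.nextDouble() - 0.5;
    return -b * Math.signum(u) * Math.log(1 - 2 * Math.abs(u));
  }

  public static void main(String[] args) {
    List<Visit> visits = List.of(
        new Visit("u1", "cafe"), new Visit("u1", "cafe"),
        new Visit("u2", "cafe"), new Visit("u3", "park"));

    // Contribution bounding: keep at most one record per (user, place)
    // so a single user cannot shift any count by more than 1.
    Map<String, Long> counts = visits.stream()
        .distinct()
        .collect(Collectors.groupingBy(Visit::place, Collectors.counting()));

    double epsilon = 1.0;
    counts.forEach((place, count) ->
        System.out.println(place + ": " + (count + laplaceNoise(1.0 / epsilon))));
  }
}
```

In practice, these steps would run over a distributed data processing framework rather than an in-memory list, which is where the highly parallelizable nature of the computation matters.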
Additionally, some of our latest differential privacy algorithms are now helping power unique tools like Google Trends. One of these algorithmic developments allows Google Trends to provide greater insights into low-volume locales. With differential privacy – and most privacy guarantees in general – datasets need to meet a minimum threshold to help ensure individuals’ data isn’t revealed. Our new offering can help professionals like researchers and local journalists obtain more insights about smaller cities or regions, and thus shine a light on top-of-mind topics. For example, a journalist in Luxembourg querying Portuguese-language results can now access insights that were not available before.
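Here is a simplified sketch of the kind of thresholding involved (hypothetical code, not the algorithm used in Google Trends): a noisy count is released only if it clears a minimum bar, so very small groups are suppressed rather than exposed.

```java
import java.util.OptionalDouble;
import java.util.Random;

// Hypothetical sketch of noisy thresholding (not the Google Trends
// algorithm): a result is released only if its noisy count clears a
// minimum threshold, suppressing groups that are too small.
public class NoisyThresholdSketch {
  private static final Random RNG = new Random();

  static double laplaceNoise(double b) {
    double u = RNG.nextDouble() - 0.5;
    return -b * Math.signum(u) * Math.log(1 - 2 * Math.abs(u));
  }

  static OptionalDouble releaseIfAboveThreshold(long trueCount, double epsilon, double threshold) {
    double noisyCount = trueCount + laplaceNoise(1.0 / epsilon);
    return noisyCount >= threshold ? OptionalDouble.of(noisyCount) : OptionalDouble.empty();
  }

  public static void main(String[] args) {
    System.out.println(releaseIfAboveThreshold(3, 1.0, 20.0));    // likely suppressed
    System.out.println(releaseIfAboveThreshold(150, 1.0, 20.0));  // likely released
  }
}
```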
The increased adoption of differential privacy by both industry and governments is a major advancement in handling user data privately. Nevertheless, this widespread adoption also brings an increased risk of faulty mechanism design and implementation. The sheer number of algorithms developed in this field makes manual inspection of their implementations impractical – and there is a lack of flexible tools capable of testing the diverse range of techniques without significant assumptions.
To allow practitioners to test whether a given mechanism violates a differential privacy guarantee, we are releasing DP-Auditorium, a library that relies only on samples drawn from the mechanism itself, without requiring access to any internal properties of the application.
Effective testing of a privacy guarantee entails two key steps: evaluating the privacy guarantee over a fixed dataset, and exploring datasets to find the "worst-case" privacy guarantee. DP-Auditorium introduces versatile interfaces for both components, facilitating efficient testing and consistently outperforming existing black-box testers. Most importantly, these interfaces are designed to be flexible, enabling contributions and extensions from the research community that continually expand the tool's testing capabilities.
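As a rough illustration of the black-box approach (a hypothetical sketch, not the DP-Auditorium interfaces): a tester repeatedly samples a mechanism on two neighboring datasets and checks whether the empirical probability of some output event exceeds what the claimed epsilon allows.

```java
import java.util.Random;
import java.util.function.Function;

// Hypothetical sketch of black-box DP testing (not the DP-Auditorium
// interfaces): sample a mechanism on two neighboring datasets and check
// whether an output event is too much more likely on one than the other.
public class DpViolationTestSketch {
  private static final Random RNG = new Random();

  static double laplaceNoise(double b) {
    double u = RNG.nextDouble() - 0.5;
    return -b * Math.signum(u) * Math.log(1 - 2 * Math.abs(u));
  }

  public static void main(String[] args) {
    double claimedEpsilon = 1.0;

    // A buggy counting mechanism whose noise scale is far too small for epsilon = 1.
    Function<long[], Double> mechanism = data -> data.length + laplaceNoise(0.1);

    long[] dataset = new long[10];
    long[] neighbor = new long[11];   // differs from dataset by one record

    int trials = 100_000;
    int hitsOnDataset = 0, hitsOnNeighbor = 0;
    for (int i = 0; i < trials; i++) {
      // Output event S: the released count is at least 10.5.
      if (mechanism.apply(dataset) >= 10.5) hitsOnDataset++;
      if (mechanism.apply(neighbor) >= 10.5) hitsOnNeighbor++;
    }
    double p = (double) hitsOnDataset / trials;
    double q = (double) hitsOnNeighbor / trials;

    // Pure epsilon-DP requires q <= exp(epsilon) * p; a small slack absorbs sampling error.
    boolean violationFound = q > Math.exp(claimedEpsilon) * p + 0.01;
    System.out.println("Violation found: " + violationFound);
  }
}
```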
We’ll continue to build on our long-standing investment in PETs and commitment to helping developers and researchers securely process and protect user data and privacy.