US scientists create PAC Privacy algorithm to protect training data from leaks

A new technique aims to keep sensitive training data from being extracted from machine learning models.

A group of scientists from the Massachusetts Institute of Technology (MIT) has developed a technique that minimizes the amount of noise added to machine learning models while still protecting the personal data they were trained on. The study will be presented on August 24 at the International Cryptology Conference (Crypto 2023).

Machine learning (ML) models are trained on large datasets that may contain sensitive information such as medical images or biometric data. If such a model is made public, there is a risk that someone could extract this data from it. To prevent leakage, engineers add noise (random perturbations) to the model to mask the original data.

However, adding noise reduces the model's accuracy, so it is desirable to add as little as possible while still guaranteeing data protection.
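This tradeoff can be seen in a toy experiment. The sketch below is purely illustrative (it is not the MIT method): it fits a simple linear classifier, then perturbs its weights with Gaussian noise of increasing scale and measures the accuracy drop. All names (`true_w`, `accuracy`, the noise scales) are invented for the example.

```python
import numpy as np

# Illustrative sketch: perturbing a trained model's parameters with
# Gaussian noise masks the training data but costs accuracy.
rng = np.random.default_rng(0)

# Toy "training data": points labeled by the sign of a linear function.
X = rng.normal(size=(200, 2))
true_w = np.array([1.5, -2.0])
y = np.sign(X @ true_w)

# A "trained" model: least-squares weights fitted to the labels.
w = np.linalg.lstsq(X, y, rcond=None)[0]

def accuracy(weights):
    # Fraction of points whose predicted sign matches the label.
    return np.mean(np.sign(X @ weights) == y)

# More noise -> stronger masking of w, but lower expected accuracy.
for sigma in (0.0, 0.5, 2.0):
    noisy_w = w + rng.normal(scale=sigma, size=w.shape)
    print(f"sigma={sigma}: accuracy={accuracy(noisy_w):.2f}")
```

With no noise the classifier is near perfect; as the noise scale grows, the published weights reveal less about the fitted model but the predictions degrade.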

The scientists introduced a new privacy metric, which they call Probably Approximately Correct (PAC) Privacy, and built a framework around it that automatically determines the minimum amount of noise needed to protect data. A key advantage of this framework is that it requires no knowledge of the model's internal structure or training process, making it applicable to many types of models and applications.

PAC Privacy addresses data protection differently than other approaches. Instead of focusing only on distinguishability, PAC Privacy measures how hard it is for an adversary to reconstruct any part of randomly sampled or generated sensitive data after noise has been added.

The authors developed an algorithm that automatically tells the user how much noise to add to a model to prevent attackers from reconstructing even an approximate version of the sensitive data. The algorithm guarantees privacy even against an adversary with unlimited computing resources.

In several cases, the scientists showed that the amount of noise needed to protect sensitive data from adversaries is much smaller with PAC Privacy than with other approaches. This can help engineers build machine learning models that reliably hide training data while maintaining accuracy in real-world conditions.

What further distinguishes PAC Privacy from other privacy approaches is that the algorithm needs no access to the model's internal mechanisms or training process. The user can set the desired confidence level up front, and the algorithm then reports the optimal amount of noise to add to the output model before it is released to the public.
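The black-box workflow described above can be sketched in simplified form. This is a hedged approximation of the idea, not MIT's actual algorithm: run the training pipeline on many random subsets of the data, measure how much its output varies across runs, and calibrate Gaussian noise to that empirical variation. The `train` function, the subset sizes, and the `confidence_factor` parameter are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)

def train(data):
    # Stand-in "training pipeline": the mechanism's output is just the
    # per-column mean. The procedure never looks inside this function,
    # which is what makes the approach black-box.
    return data.mean(axis=0)

data = rng.normal(loc=3.0, size=(1000, 4))

# Train repeatedly on random subsets of the data (the repeated training
# that the article notes can be computationally expensive).
outputs = np.array([
    train(data[rng.choice(len(data), size=500, replace=False)])
    for _ in range(100)
])

# Empirical per-coordinate spread of the output across subsets.
spread = outputs.std(axis=0)

# Calibrate noise to the measured spread; a user-chosen factor plays the
# role of the desired confidence level (hypothetical parameter).
confidence_factor = 2.0
noise = rng.normal(scale=confidence_factor * spread)
private_output = train(data) + noise
print("noise scale per coordinate:", np.round(confidence_factor * spread, 3))
```

The design intuition: if an output coordinate barely changes across training subsets, it leaks little about any individual record and needs little noise; coordinates that swing widely need proportionally more.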

However, PAC Privacy also has limitations:

  • the technology does not tell the user how much the model's accuracy will drop after noise is added;
  • PAC Privacy requires training the ML model many times on different subsets of the data, so the computations can be expensive.

Scientists are going to continue improving the method in the coming years.




