Tom Goldstein - Dataset security issues in generative AI


Machine learning systems are built using large troves of training data that may contain private or copyrighted content. In this talk, I’ll survey a number of security issues that arise when sensitive data is used for training. I’ll begin by talking about attack methods that extract private training data from federated learning protocols. Then, I’ll discuss data privacy issues that arise when using generative models. These models are often trained with an objective that explicitly promotes their ability to regenerate their training data, which creates a host of privacy and copyright issues.

SEC LL2.224