If you open a GIF in a text editor instead of in your browser, it will look meaningless to you. That's because a GIF, just like a vector embedding, is stored in a machine representation that must be interpreted before a human can understand it.
I mentioned embedding inversion attacks above and how much of the original text they can recover, but the same is true of embeddings that represent other things, like faces and voices. Inversion attacks on these embeddings can generate photos and audio that look and sound like the original source.
Not only that, but there are other attacks on embeddings, including membership inference attacks (determining whether a particular record is in the data) and attribute inference attacks (deducing sensitive properties of the source, such as a person's age, from its embedding). And even if you ignore these attacks, embeddings are valuable in and of themselves because they power potent capabilities like semantic search. Imagine a hacker who can simply query your stolen data for its most sensitive bits. Embeddings help the attacker do just that.
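To make the semantic search risk concrete, here is a minimal sketch of how someone holding a copy of stored embeddings could rank them against a query. It assumes plain cosine similarity and brute-force search, and the record IDs, three-dimensional vectors, and query vector are all hypothetical; real embedding models produce hundreds or thousands of dimensions, and an attacker would embed a query like "medical conditions" with the same model that produced the stolen vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, stored, k=2):
    """Return the IDs of the k stored embeddings most similar to the query."""
    scored = [(cosine_similarity(query, vec), record_id) for record_id, vec in stored.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [record_id for _, record_id in scored[:k]]


# Hypothetical stolen embeddings keyed by record ID (toy 3-dimensional vectors).
# Comments describe what each record is about; the attacker only sees the vectors.
stolen_embeddings = {
    "doc-1": [0.9, 0.1, 0.0],  # e.g. a cafeteria menu
    "doc-2": [0.1, 0.9, 0.2],  # e.g. a patient diagnosis
    "doc-3": [0.2, 0.3, 0.9],  # e.g. a salary review
}

# Hypothetical embedding of the attacker's query, e.g. "medical conditions".
query = [0.0, 1.0, 0.1]

print(top_k(query, stolen_embeddings, k=1))  # prints ['doc-2']
```

The attacker never needs to read every record: a handful of well-chosen queries surfaces the most sensitive ones first, which is exactly what makes unprotected embeddings so useful to them.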
Protect Your AI Data
If all of this alarms you, it should. That's why we're building Cloaked AI. Vector embeddings are incredibly powerful, and their benefits are numerous, including constraining hallucinations. However, the risks are high when the underlying data is sensitive. If you're storing embeddings in a vector database, you should be protecting them with Cloaked AI.
And as of last week, you can get your hands on the Cloaked AI beta. We're looking for feedback and input, and we'd love to hear about your use cases. Fill out the form on the Cloaked AI page, and you'll be emailed the full instructions to access the beta. The team is busy adding features and functionality to Cloaked AI, and in the next beta release, you'll see an option to deploy it as a Pinecone proxy for a zero-friction experience. Reply to this email and let me know what else you'd like to see.