What happens to your data when you use AI? I’ve had countless conversations with tech enthusiasts asking that question. It’s a valid concern, given the rapid expansion of AI technologies and their integration into our daily lives. So, I decided to shed some light on the topic by breaking down the different levels of data usage by AI companies. Let’s dive in.
Understanding Four Levels of Data Use
I created these levels as a rough guide to where a system stands in relation to your data. Not all systems will fit cleanly into these categories, and some will change levels depending on how they are used. Even so, many of the most popular tools do fit nicely into this categorization.
Level 0 (ZDR – Zero Data Retention)
With a Zero Data Retention policy, companies do not keep your data for any length of time, nor do they train on it. As far as they are concerned, your data never existed.
Level 1 (30-Day Retention, No Training)
Here, companies retain your data for 30 days but never use it for training. They will not access it unless a submission violates policy, for example by containing something offensive. If a submission causes a system failure or produces generations that violate guidelines, they may view your content to understand why. Apart from that, your data remains untouched.
Level 2 (30-Day Retention, User Analytics, No Training)
Think of this as Level 1 with a sprinkle of analytics. Systems like Copilot for Business operate at this level. Your data won’t be retained beyond 30 days or be used for training. However, anonymous usage data, such as interaction duration and the ratio of accepted to rejected answers, is retained for analytical purposes.
Level 3 (Retention to Train)
At this level, not only is your data retained, but it’s also actively used for training purposes. Analytics may or may not be part of that.
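To make the four levels concrete, here is a minimal Python sketch of how I think about sorting a policy into a level. The names `DataUseLevel`, `DataPolicy`, and `classify` are my own illustrative inventions, not anything a vendor publishes, and the rules are a simplification: real policies often have exceptions that will not fit this cleanly.

```python
from dataclasses import dataclass
from enum import IntEnum


class DataUseLevel(IntEnum):
    """The four levels described above (hypothetical names)."""
    ZERO_RETENTION = 0            # nothing kept, nothing trained on
    RETENTION_NO_TRAINING = 1     # kept for a limited time, never trained on
    RETENTION_WITH_ANALYTICS = 2  # Level 1 plus anonymous usage analytics
    RETENTION_TO_TRAIN = 3        # kept and used for training


@dataclass(frozen=True)
class DataPolicy:
    """A simplified, assumed summary of a vendor's data policy."""
    retention_days: int          # 0 means submissions are not stored
    keeps_usage_analytics: bool  # e.g. interaction duration, accept/reject ratio
    trains_on_data: bool         # submissions are used to train models


def classify(policy: DataPolicy) -> DataUseLevel:
    """Map a policy to a level, checking the most invasive use first."""
    if policy.trains_on_data:
        return DataUseLevel.RETENTION_TO_TRAIN
    if policy.keeps_usage_analytics:
        # Assumption: any retained analytics counts as Level 2,
        # even if the submissions themselves are not stored.
        return DataUseLevel.RETENTION_WITH_ANALYTICS
    if policy.retention_days > 0:
        return DataUseLevel.RETENTION_NO_TRAINING
    return DataUseLevel.ZERO_RETENTION


if __name__ == "__main__":
    example = DataPolicy(retention_days=30, keeps_usage_analytics=True,
                         trains_on_data=False)
    print(classify(example).name)
```

Checking training first is a deliberate choice: once your data is used for training, the retention window and analytics matter much less to where the system sits on this scale.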
Beyond Level 3
Almost all current systems still abstract away the user when training. You yourself are not known to the AI systems being trained on your submissions. In the future, this may change. OpenAI has already introduced custom instructions, and their examples are fairly personal.
A Snapshot of Current AI Systems and Their Retention Levels
ChatGPT: By default, it’s a Level 3 system. You can turn it into a Level 1 system just by turning off chat history.
Bing Chat: While the regular version sits at Level 3, Bing Chat for Enterprise operates at Level 0, making it a rare gem for professionals seeking top-notch privacy.
Bard, Midjourney, and Anthropic’s Claude: All three operate at Level 3.
GitHub Copilot: Typically, it’s a Level 3 system. But with a little tweaking, it can be set to Level 2. And for those in business, Copilot defaults to Level 2.
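If you want to check a shortlist of tools against a privacy requirement, the snapshot above can be written as a small lookup table. This is only a sketch of my own classification at the time of writing, not vendor data. The `SNAPSHOT` table and the `tools_at_or_below` helper are hypothetical names, and where I did not mention a setting that lowers a tool's level, I have assumed its best level equals its default.

```python
# Each entry: tool name -> (default level, best level reachable via settings).
# Values mirror the snapshot above; policies change, so verify before relying on them.
SNAPSHOT = {
    "ChatGPT": (3, 1),                   # Level 1 with chat history turned off
    "Bing Chat": (3, 3),                 # assumed: no lower setting mentioned
    "Bing Chat for Enterprise": (0, 0),
    "Bard": (3, 3),                      # assumed: no lower setting mentioned
    "Midjourney": (3, 3),                # assumed: no lower setting mentioned
    "Claude (Anthropic)": (3, 3),        # assumed: no lower setting mentioned
    "GitHub Copilot": (3, 2),            # Level 2 after adjusting settings
    "Copilot for Business": (2, 2),
}


def tools_at_or_below(max_level: int, use_best_setting: bool = False) -> list[str]:
    """Return tools whose level is at most max_level, sorted by name.

    With use_best_setting=True, each tool is judged by the most private
    level it can be configured to rather than by its default.
    """
    index = 1 if use_best_setting else 0
    return sorted(name for name, levels in SNAPSHOT.items()
                  if levels[index] <= max_level)


if __name__ == "__main__":
    print(tools_at_or_below(2))
    print(tools_at_or_below(2, use_best_setting=True))
```

Separating the default level from the best reachable level reflects the point made earlier: some systems change levels depending on how you use them.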
The Bigger Picture
The data conundrum is a genuine concern. Companies and personal users want to know that their data is safe and that nothing they do with AI will get them sued. Personally, I think the situation is akin to cloud computing. Companies like Amazon and Microsoft already offer massive server farms where we store crucial data, from code to emails. We’ve trusted these giants with our data, believing in their protocols and safeguards.
Similarly, companies like OpenAI and Microsoft have stringent policies in place for AI usage, parallel to the standards we’ve come to accept for cloud computing. To put things in perspective, if we were to classify cloud computing on the Level 0 to 3 scale, it would likely fall between Levels 1 and 2. And with contracts and assurances in place for cloud services, there’s little reason to lose sleep over potential data mishaps. I see AI in the same light.
Final Thoughts
I am hopeful as we move forward into this new world. This is a revolution on a scale that may never have been seen before. I’m excited to see how it will continue to affect our lives and, in my opinion, improve them.
Disclaimer: A significant portion of this blog was crafted with the assistance of AI! If you’re as captivated by AI as I am, dive deeper into my other blogs where I explore its wonders and potential.