How can you keep your data safe?
So we now come to the most important point: How can you get the most out of LLMs while keeping your data safe?
First and most importantly, you should understand the terms of use of the LLMs you rely on. Proprietary LLMs can be used either through personal accounts, such as the one you create when signing up for ChatGPT directly, or through third-party providers, which access the models via the vendor’s APIs and build them into their own applications.
Personal accounts for popular LLMs generally allow the company to use your data for training (3). However, many applications built on these models have negotiated much stricter terms of use that forbid the use of customer data. So while using a personal ChatGPT account for sensitive work is never a good idea, a third-party application running on ChatGPT may be perfectly secure. Datalore AI, for example, only uses third-party LLMs whose terms of use guarantee the protection of our customers’ data, making it a secure way to pass your information to these models.
Secondly, you should be careful about which parts of your system you grant LLMs access to and what safeguards you put in place. LLMs are vulnerable to prompt injection attacks, in which malicious actors hijack the model for their own purposes. For example, if you give an LLM access to a database containing sensitive data, an attacker may be able to trick the model into extracting that data and sending it to them.
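To make this concrete, here is a minimal sketch of one such safeguard in Python: only letting LLM-generated SQL run as read-only queries against an explicit allowlist of tables. The database file, table names, and helper function are hypothetical, and a real system would parse the SQL properly rather than relying on simple pattern matching.

```python
import re
import sqlite3

# Hypothetical allowlist: the only tables LLM-generated SQL may reference.
ALLOWED_TABLES = {"products", "public_reviews"}

def run_llm_query(sql: str, db_path: str = "analytics.db"):
    """Run LLM-generated SQL only if it is a single, read-only SELECT
    that references nothing outside the allowlisted tables."""
    statement = sql.strip().rstrip(";")

    # Reject anything that is not a single SELECT statement.
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("Only single SELECT statements are allowed.")

    # Rough extraction of referenced tables; production code would use a SQL parser.
    referenced = {
        name.lower()
        for name in re.findall(r"\b(?:from|join)\s+([A-Za-z_]\w*)", statement, re.IGNORECASE)
    }
    blocked = referenced - ALLOWED_TABLES
    if blocked:
        raise ValueError(f"Query references non-allowlisted tables: {blocked}")

    # Open the database read-only, so even a missed check cannot modify data.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(statement).fetchall()
    finally:
        conn.close()
```

The design point here is defence in depth: even if a cleverly injected prompt slips past the allowlist check, the read-only connection prevents the model from altering data, and truly sensitive tables are simply never exposed to it in the first place.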
Finally, if you decide to fine-tune your own model, you should be very careful about the data you use. Given the tendency of models to memorize their training data, there is a chance that sensitive information from your training set could end up in the model outputs.
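As a rough illustration, the sketch below scrubs obvious personal data from hypothetical prompt/completion records before writing a fine-tuning file. The regex rules and field names are assumptions made for the example; in practice you would pair rules like these with dedicated PII-detection tooling and manual review.

```python
import json
import re

# Hypothetical patterns for common kinds of sensitive data.
REDACTION_PATTERNS = {
    "[EMAIL]": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "[PHONE]": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace obvious PII with placeholder tokens."""
    for placeholder, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text

def build_training_file(records, path="finetune_data.jsonl"):
    """Write prompt/completion pairs with PII scrubbed from both fields."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps({
                "prompt": redact(rec["prompt"]),
                "completion": redact(rec["completion"]),
            }) + "\n")

# Example usage with a single hypothetical record.
build_training_file([{
    "prompt": "Summarize the ticket from jane.doe@example.com",
    "completion": "Customer called from +1 555 123 4567 about a refund.",
}])
```

Even with redaction in place, it is safer to leave genuinely sensitive records out of the training set entirely, since the model only needs to memorize a detail once for it to leak.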