Updated 2024-08-18
Introduction
Generative AI (artificial intelligence) systems can produce various types of content, such as text, images, graphics, music, and video, based on human-generated input called a prompt. The prompt is usually text but can, for instance in the case of music, be audio or another type of media. The results are constructed from probabilities and draw on huge datasets compiled from various sources, depending on the purpose of the system.
For instance, the text-producing GPT (Generative Pre-trained Transformer) and the image-producing DALL-E are based on scraping the Internet for billions of image-text pairs, and the music-producing MusicGen is based on 20,000 hours of music, including 10,000 “high-quality” licensed music tracks and 390,000 instrument-only tracks.
The services Runbox provides and our communication with customers are mainly text-based. Accordingly, our main concern is the use of generative AI systems designed for written conversation, such as OpenAI’s GPT-based ChatGPT, Microsoft’s Bing Chat, Google’s Bard, and Meta’s LLaMA.
Common to these systems is the use of a large language model (LLM) and the processing of vast amounts of data, mainly scraped from the Internet. In its “raw form”, the dataset contains all kinds of information, without regard to privacy, copyrights, and intellectual property rights.
After formatting and cleaning (to get rid of incorrect or inappropriate material), the process results in a corpus that contains billions of words with labels that connect a syllable, a word, or a phrase (a token) to other tokens based on probabilities. In short, the corpus is pre-trained.
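The idea of connecting tokens to other tokens by probability can be illustrated with a toy example. The following is a minimal sketch, not how any of the systems named above are actually built: it assumes whole words as tokens and simple pair counts, whereas real LLMs use neural networks over sub-word tokens and much longer contexts. The function name and the sample sentence are made up for illustration.

```python
from collections import Counter, defaultdict


def next_token_probabilities(text):
    """Estimate P(next token | current token) from raw pair counts.

    Assumption for illustration: a token is a lowercase, whitespace-separated word.
    """
    tokens = text.lower().split()
    counts = defaultdict(Counter)
    # Count how often each token is directly followed by each other token.
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    # Turn the counts into relative frequencies (probabilities).
    return {
        current: {tok: n / sum(followers.values()) for tok, n in followers.items()}
        for current, followers in counts.items()
    }


# Hypothetical miniature "corpus"
corpus = "the cat sat on the mat and the cat slept"
probs = next_token_probabilities(corpus)
print(probs["the"])  # "cat" gets about 0.67 and "mat" about 0.33
```

Even in this tiny example the output depends entirely on what the corpus happens to contain, which is one reason the sources and the cleaning of the corpus matter so much.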
In the first training round, the corpus is analyzed for patterns, clusters, and more, using algorithms based on different statistical methods. This is followed by fine-tuning by training engineers who “curate” the corpus with relevant input.
The “training” of the corpus may thereafter continue based on conversations initiated by users’ prompts.
Can AI-generated text be trusted?
The way AI-generated text is produced has several inherent sources of errors and inaccuracies:
- The sources used for building the corpus;
- The methods by which the corpus is cleaned;
- The methods for analyzing the corpus and the structuring of tokens;
- The statistical methods and algorithms used for pre-training, and the criteria for parameter setting, representing weights and biases;
- The skill of the curating engineers, and the tools they use;
- What you ask for and how you phrase your question;
- Finally, what conversations other users have had.
Consequently, the short answer to the question above is “no”.
See our blog post Be privacy concerned when using ChatGPT (and other AI chatbots) for more information on this topic.
Generative AI Policy
One of Runbox’ main selling points is personal support. This, in combination with the risk of factual inaccuracies, is the background for policy statement #1:
1. We will not use text generated by AI systems in our communication with customers unless we can verify its accuracy.
We acknowledge that text-generating AI systems can improve our knowledge. This is the background for policy statement #2:
2. Text-generating AI systems can be utilized when performing research for internal documents, in publishing, and in automated communication with human supervision and validation.
The following will then apply:
- AI-generated content should be viewed as a starting point, not the finished product.
- All AI-generated content must be proofread and checked for accuracy, including ensuring that all information is up to date. Be aware of outputs that could contain subtle but meaningful hallucinations, factual errors, and biased or inappropriate statements.
- Don’t input any personally identifiable information or information that identifies a company.
- Don’t input any sensitive information, even if it cannot be linked to a person or company.
- Don’t input any of the company’s intellectual property.
- Do disable history if using external tools (like ChatGPT) that enable that choice.
- Even in an internal document, state clearly that generative AI systems have been used during the preparation.
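As an illustration of the rule against entering personally identifiable information, a prompt could be screened before it is sent to an external tool. The following is a minimal sketch under stated assumptions, not a description of Runbox tooling: the function names, placeholder labels, and patterns are hypothetical, and the patterns only catch email addresses and phone-like digit sequences. Names, postal addresses, and company identifiers would pass through unnoticed, so a check like this can support human review but not replace it.

```python
import re

# Hypothetical, deliberately simple patterns; they are not exhaustive.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(prompt):
    """Replace email addresses and phone-like numbers with placeholders."""
    redacted = EMAIL_PATTERN.sub("[EMAIL]", prompt)
    redacted = PHONE_PATTERN.sub("[PHONE]", redacted)
    return redacted


def contains_pii(prompt):
    """Return True if either pattern matched anywhere in the prompt."""
    return redact(prompt) != prompt


print(redact("Contact jane.doe@example.com about the invoice"))
print(contains_pii("Summarize the history of email"))
```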
This policy also applies to AI-generated media other than text and should be followed where appropriate.