Breakthrough Claimed in Fight Against AI Security Flaw

Introduction to CaMeL

CaMeL is a system designed to securely execute user requests using language models. It works by splitting responsibilities between two language models: a "privileged LLM" (P-LLM) and a "quarantined LLM" (Q-LLM). The P-LLM generates code that defines the steps to take, while the Q-LLM parses unstructured data into structured outputs.

How CaMeL Works

The P-LLM acts as a "planner module" that only processes direct user instructions. It generates code that operates on values, but never sees the content of emails or documents. The Q-LLM, on the other hand, is a temporary, isolated helper AI that extracts information from unstructured data. It has no access to tools or memory and cannot take any actions, preventing it from being directly exploited.

Separation of Responsibilities

The separation of responsibilities between the P-LLM and Q-LLM ensures that malicious text can’t influence which actions the AI decides to take. The P-LLM only sees that a value exists, such as "email = get_last_email()", and then writes code that operates on it. This approach prevents information leakage and ensures the security of the system.

From Prompt to Secure Execution

CaMeL converts the user’s prompt into a sequence of steps that are described using code. For example, the prompt "Find Bob’s email in my last email and send him a reminder about tomorrow’s meeting" would convert into code that uses a locked-down subset of Python. This code is then executed using a special, secure interpreter that monitors it closely and tracks where each piece of data comes from.

Secure Execution

The secure interpreter uses a "data trail" to track the origin of each piece of data. It notes that the address variable was created using information from the potentially untrusted email variable and applies security policies based on this data trail. This process involves analyzing the structure of the generated Python code and running it systematically.

Conclusion

CaMeL is an innovative system that securely executes user requests using language models. Its dual-LLM approach and secure interpreter ensure that malicious text can’t influence the actions of the AI. By tracking the origin of each piece of data and applying security policies, CaMeL provides a secure and reliable way to execute user requests.

FAQs

What is CaMeL?
CaMeL is a system that securely executes user requests using language models.
How does CaMeL work?
CaMeL works by splitting responsibilities between two language models: a "privileged LLM" (P-LLM) and a "quarantined LLM" (Q-LLM).
What is the purpose of the P-LLM and Q-LLM?
The P-LLM generates code that defines the steps to take, while the Q-LLM parses unstructured data into structured outputs.
How does CaMeL ensure security?
CaMeL ensures security by tracking the origin of each piece of data and applying security policies based on this data trail.
What programming language does CaMeL use?
CaMeL uses a locked-down subset of Python.