Introduction to AI Security
Your AI code assistant can be coaxed into proposing risky code, and your IDE can start running things the second you open a folder. Here is how to stay safe. What if the most dangerous line you see today is the "Are you sure?" prompt?
My Story
A few weeks ago, a team shipped a small microservice that had passed an AI-assisted security review. I watched the build notes scroll by in my editor and felt that familiar itch: nothing had a hard edge, everything read as pleasant, yet none of it was actually helpful. A quick experiment on the same flow with a toy repo confirmed the suspicion: the assistant could be nudged into asking permission to do something in the background, wrapped in a long scroll of helpful-looking text, with the risky bit kept well out of frame.
I had seen the same pattern on another project last year, where a developer's IDE would run a task file automatically whenever a folder was opened. No malicious wizardry, just carelessness and misplaced trust. The lesson was quick, but the moral hasn't changed: the AI era doesn't retire old security problems; it just folds them under a friendlier surface.
Why This Matters
Today's headline is plain enough: researchers demonstrated a so-called lies-in-the-loop (LITL) attack that convinces an AI coding agent it is acting on your genuine instructions, gets it to propose a harmful command, and counts on you to press Enter, with supply-chain compromise as the payoff. A second thread shows that your IDE can be part of the problem. According to The Hacker News, Cursor, an AI-powered fork of VS Code, ships with Workspace Trust switched off; any repository containing a .vscode/tasks.json can execute tasks by default the moment you open the folder, which means code running under your account.
If you are thinking "this is just prompt injection," you are partly right. OWASP's 2025 LLM Top 10 leads with LLM01 Prompt Injection and LLM02 Insecure Output Handling, and the old rule still holds: untrusted input should never be allowed to drive a sensitive action without strong safeguards. The twist is that a human is still in the loop, and the loop is exactly where the lie lands.
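To make the folder-open risk concrete, here is a minimal pre-open check, sketched in Python under the assumption that tasks.json is plain JSON (real files may carry comments, which a strict parser rejects). It flags any task marked to run on folder open so you can read it before your IDE does; this is a sketch, not a complete audit tool.

```python
#!/usr/bin/env python3
"""Pre-open check: flag auto-run tasks in a freshly cloned repo.

A minimal sketch, not a complete audit tool. It only looks for the
documented runOptions.runOn == "folderOpen" flag in .vscode/tasks.json.
"""
import json
import sys
from pathlib import Path


def find_auto_run_tasks(repo_root: str) -> list:
    """Return task entries configured to run automatically on folder open."""
    tasks_file = Path(repo_root) / ".vscode" / "tasks.json"
    if not tasks_file.exists():
        return []
    # Assumes plain JSON; tasks.json files with comments need a lenient parser.
    data = json.loads(tasks_file.read_text(encoding="utf-8"))
    return [
        task for task in data.get("tasks", [])
        if task.get("runOptions", {}).get("runOn") == "folderOpen"
    ]


if __name__ == "__main__":
    repo = sys.argv[1] if len(sys.argv) > 1 else "."
    hits = find_auto_run_tasks(repo)
    for task in hits:
        print(f"AUTO-RUN TASK: {task.get('label', '<unnamed>')} -> {task.get('command', '')}")
    sys.exit(1 if hits else 0)
```

Run it against a fresh clone before you open the folder; a non-zero exit code means stop and read first.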
Your Fix in Steps
What follows is a short path a team can finish this week. It reads like an essay because the fix is not a feature but a habit.
- Turn trust back on (and pin it). Enable Workspace Trust (or its equivalent) in any AI-enhanced IDE, and default untrusted folders to Restricted Mode. Better still, audit repos before opening them, for example with the pre-open check sketched above. Hint: .vscode/tasks.json looks like configuration, and it runs like code.
- Gate agent actions, not vibes. Human-in-the-loop (HITL) is no control at all when the human cannot see the risky delta. Make the agent produce a short, fixed-format Action Plan with the exact command and target in a monospace box, and refuse approvals when the box exceeds a fixed size; a minimal gate is sketched after this list. That kills the trick of burying the dangerous line below the fold.
- Split responsibilities: render vs. run. Keep the agent in a render sandbox (plan, diff, test outline) and execute through a separate runner with strict allow-lists; see the runner sketch below. The agent proposes; the runner enforces. OWASP frames this as LLM02 / LLM05: constrain outputs and secure the supply chain.
- Pin your models and artifacts. Pulling by author/model name alone from public hubs is a bad idea. Pin to immutable SHAs and mirror into your own registry; a digest check is sketched after this list. Palo Alto's Model Namespace Reuse research shows why names cannot be trusted.
- Approve against a small diff. Before anything runs, show only the minimal diff or the single command: no story, no scroll-back, no emoji, nothing more and nothing less. If the agent cannot render a diff, it is not runnable. Tip: short review surfaces cut decision fatigue and the success rate of social engineering.
- Instrument the splash zone. Give the agent a low-privilege, disposable environment with throwaway API keys, scoped project secrets, and a kill switch, and log every outgoing call and file touch; a small environment scrub is sketched below. If something goes wrong, you nuke a sandbox, not your laptop.
- Run adversarial exercises. Purple-team drills should hide injected instructions in tickets, READMEs, and issues, just as in the LITL research, and measure time-to-notice and time-to-kill. Reward slow, careful approval habits. The first thing our own drill caught would have cost us thousands if we had missed it.
- Treat "Are you sure?" as an interface, not a checkbox. Build your own approval prompt: high contrast, single screen, fixed-width font. The best prompt reads like a surgical consent form, not a pep talk. Which would you give up first, your logs or your access?
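To make the Action Plan gate from the second step concrete, here is a minimal Python sketch. The ActionPlan fields and the size limits are illustrative assumptions, not an API from the LITL research or any particular agent framework; the point is that approval is refused mechanically whenever the plan is too long to read at a glance.

```python
from dataclasses import dataclass

# Illustrative limits: small enough to fit on one screen without scrolling.
MAX_PLAN_CHARS = 600
MAX_COMMAND_CHARS = 200


@dataclass
class ActionPlan:
    """What the agent must hand over before anything is allowed to run."""
    command: str    # the exact command to execute, verbatim
    target: str     # the file, host, or service the command touches
    rationale: str  # one short sentence, not an essay


def render_for_approval(plan: ActionPlan) -> str:
    """Render a fixed-format, monospace-friendly approval box, or refuse."""
    body = f"COMMAND : {plan.command}\nTARGET  : {plan.target}\nWHY     : {plan.rationale}"
    if len(plan.command) > MAX_COMMAND_CHARS or len(body) > MAX_PLAN_CHARS:
        raise ValueError("Action Plan too large to review at a glance; refusing approval.")
    border = "+" + "-" * 70 + "+"
    return f"{border}\n{body}\n{border}"


# Usage: print(render_for_approval(ActionPlan("pytest -q", "repo tests", "verify the fix")))
```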
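The render vs. run split can be just as small. In the sketch below, the allow-list contents are placeholders for whatever your project actually permits; the agent only ever hands over a string, and the runner refuses any binary that is not explicitly listed.

```python
import shlex
import subprocess

# Hypothetical allow-list: only these binaries may run on the agent's behalf.
ALLOWED_BINARIES = {"git", "pytest", "ruff"}


def run_proposed_command(proposal: str) -> subprocess.CompletedProcess:
    """Execute an agent-proposed command only if its binary is allow-listed."""
    argv = shlex.split(proposal)
    if not argv or argv[0] not in ALLOWED_BINARIES:
        raise PermissionError(f"Refusing to run non-allow-listed command: {proposal!r}")
    # No shell=True: the proposal never reaches a shell, so pipes, redirects,
    # and chained commands hidden in the string simply fail.
    return subprocess.run(argv, capture_output=True, text=True, timeout=120, check=False)
```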
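Pinning by hash is a few lines once the digest lives in your own config rather than on a hub page. The file name and digest in the usage comment are placeholders; the pattern is simply compute, compare, refuse on mismatch.

```python
import hashlib
from pathlib import Path


def verify_artifact(path: str, pinned_sha256: str) -> None:
    """Refuse a downloaded model or artifact whose digest doesn't match the pin."""
    sha = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks
            sha.update(chunk)
    if sha.hexdigest() != pinned_sha256.lower():
        raise RuntimeError(
            f"Digest mismatch for {path}: got {sha.hexdigest()}, pinned {pinned_sha256}"
        )


# Usage (placeholder path and digest):
# verify_artifact("models/encoder.onnx", "<pinned-sha256-hex>")
```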
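Finally, the splash zone. The simplest version is an environment scrub when the agent process starts: nothing crosses into the agent's environment unless you put it there on purpose. The variable names and the throwaway key below are assumptions for illustration, not anything a particular agent requires.

```python
import os
import subprocess

# Only these host variables cross into the agent's environment; cloud
# credentials, tokens, and project secrets stay behind by default.
PASSTHROUGH = {"PATH", "HOME", "LANG"}
# Placeholder name and value: a scoped, disposable key minted just for the agent.
THROWAWAY = {"AGENT_API_KEY": "disposable-scoped-key"}


def launch_agent(cmd: list) -> subprocess.Popen:
    """Start the agent process with a minimal, disposable environment."""
    env = {k: v for k, v in os.environ.items() if k in PASSTHROUGH}
    env.update(THROWAWAY)
    return subprocess.Popen(cmd, env=env)
```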
Quick Myths
“Humans are the safety net.” They are, right up until friendly text throws a blanket over the fall. “Sandboxing is enough.” Only if the sandbox holds none of your secrets or tokens, and only if you keep the logs to prove it. “Trusted sources mean safe models.” Names are hijackable; pin by hash and mirror.
Checklist
Before you merge today, scan this:
- Workspace Trust on
- Agent “Action Plan” diff renders cleanly
- Commands are short and pinned
- Model pulls by SHA
- Throwaway keys only
- Logs centralized and reviewed
What to Do This Week
Make Tuesday your AI security day. In your IDEs, turn on trust modes and add the single-screen approval panel; mirror your top five external models with pinned hashes. Then run a 30-minute deception drill: hide an instruction in an issue and see whether your team catches it. If approvals feel rushed, slow the loop down on purpose.
Further Reading
- Dark Reading (Sep 15, 2025): report on the "Lies-in-the-Loop" attack beating AI coding agents.
- Checkmarx (Sep 15, 2025): primary research; proof of concept for LITL with HITL bypass patterns.
- The Hacker News (Sep 12, 2025): Cursor IDE's default trust setting allows tasks to execute silently on folder open.
- OWASP GenAI Top 10 (2025): LLM01/LLM02 grounding for prompt injection and output handling controls.
Conclusion
AI-assisted development does not retire old security problems; it repackages them behind friendlier surfaces. The steps above are deliberately mundane: trust modes on, small-diff approvals, allow-list runners, pinned artifacts, regular drills. Put them in place this week, and the next "Are you sure?" prompt will actually mean something.
FAQs
Q: What is a lies-in-the-loop (LITL) attack?
A: A LITL (lies-in-the-loop) attack feeds an AI coding agent deceptive context so that it proposes a harmful command, then counts on the human in the loop to approve it because the risky part is buried in friendly, plausible-looking output.
Q: How can I prevent LITL attacks?
A: You can reduce the risk by enabling Workspace Trust, forcing approvals through a short fixed-format Action Plan, keeping the agent in a render sandbox, and splitting responsibilities so the agent only proposes while a strict allow-list runner executes.
Q: Why is it important to pin models and artifacts?
A: Pinning to immutable SHAs and mirroring into your own registry prevents namespace hijacking and ensures the artifact you run is the one you reviewed.
Q: How can I instrument the splash zone?
A: Give the agent a low-privilege, disposable environment with throwaway API keys, scoped secrets, and a kill switch, and log every outgoing call and file touch, so a failure means wiping a sandbox rather than your laptop.