AI-generated code could be a disaster for the software supply chain

Introduction to AI Hallucinations

In AI, hallucinations occur when an LLM (large language model) produces outputs that are factually incorrect, nonsensical, or completely unrelated to the task it was assigned. Hallucinations have long dogged LLMs because they degrade the models’ usefulness and trustworthiness and have proven vexingly difficult to predict and remedy. A study scheduled to be presented at the 2025 USENIX Security Symposium identifies one form of the problem, a phenomenon known as “package hallucination.”

What are Package Hallucinations?

Package hallucinations occur when an LLM generates code that references non-existent packages. For the study, the researchers ran 30 tests, 16 in the Python programming language and 14 in JavaScript, that generated 19,200 code samples per test, for a total of 576,000 code samples. Of the 2.23 million package references contained in those samples, 440,445, or 19.7 percent, pointed to packages that didn’t exist. Among these 440,445 package hallucinations, 205,474 had unique package names.
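The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported in this article. Because "2.23 million" is a rounded total, the script computes the range of rates that the rounding allows rather than a single exact value; the variable names are illustrative and not taken from the study.

```python
# Figures as reported in the article.
tests = 30
samples_per_test = 19_200
total_samples = tests * samples_per_test

hallucinated_refs = 440_445
unique_names = 205_474

# "2.23 million" is rounded, so the true total lies somewhere in this interval.
total_refs_min = 2_225_000
total_refs_max = 2_235_000

# A larger denominator gives a lower rate, and vice versa.
rate_low = hallucinated_refs / total_refs_max
rate_high = hallucinated_refs / total_refs_min

# Share of hallucinated references that carried a distinct package name.
unique_share = unique_names / hallucinated_refs

print(f"total code samples: {total_samples:,}")
print(f"hallucination rate: between {rate_low:.2%} and {rate_high:.2%}")
print(f"unique names: {unique_share:.1%} of hallucinated references")
```

The sample total matches the 576,000 stated above, and the reported 19.7 percent falls within the range implied by the rounded reference count. Roughly 47 percent of the hallucinated references used a distinct name, which means the remainder were repeats of names already seen.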

The Threat of Package Hallucinations

One thing that makes package hallucinations potentially useful in supply-chain attacks is that 43 percent of them were repeated over 10 queries. In other words, many hallucinations are not random one-off errors: specific names of non-existent packages recur again and again, which makes them predictable and potentially exploitable. Attackers could seize on the pattern by identifying non-existent packages that are repeatedly hallucinated, publishing malware under those names, and waiting for large numbers of developers who trust the LLM’s output to install them.
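To make the idea of a repeated hallucination concrete, here is a minimal sketch of how one might tally package names that recur across several generations of the same prompt. It is not the researchers' methodology. The function name, the `min_repeats` threshold, and the sample package names are all hypothetical, and the set of known packages is assumed to come from a source the reader already trusts.

```python
from collections import Counter


def repeated_unknown_packages(runs, known_packages, min_repeats=2):
    """Return unknown package names that appear in at least `min_repeats` runs.

    `runs` is a list of lists: the package names referenced in each
    generated code sample for one prompt. `known_packages` is a set of
    names that are known to exist (assumed to be supplied by the caller).
    """
    known = {name.lower() for name in known_packages}
    counts = Counter()
    for packages in runs:
        # Normalize case and count each name once per run, so a single
        # response that mentions a name many times cannot inflate the tally.
        for name in {p.lower() for p in packages}:
            if name not in known:
                counts[name] += 1
    return {name: n for name, n in counts.items() if n >= min_repeats}


# Hypothetical data: made-up names for illustration only.
runs = [
    ["requests", "fastjsonx"],
    ["FastJsonX", "numpy"],
    ["requests", "quickparse"],
]
known = {"requests", "numpy"}

print(repeated_unknown_packages(runs, known))
```

In this toy input, one unknown name shows up in two of the three runs and is reported, while the name that appears only once is not. A name that keeps recurring like this is the kind an attacker could register in advance, so it is also the kind a defender would want to flag first.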

Patterns and Disparities in Package Hallucinations

The study uncovered disparities among the LLMs and programming languages that produced the most package hallucinations. The average percentage of package hallucinations produced by open source LLMs such as CodeLlama and DeepSeek was nearly 22 percent, compared with a little more than 5 percent for commercial models. Python code produced fewer hallucinations than JavaScript code, averaging almost 16 percent compared with a little over 21 percent.

Conclusion

Package hallucinations pose a significant threat to the security of software development, particularly in the context of supply-chain attacks. The predictable and repeatable nature of these hallucinations makes them an attractive opening for malicious actors. As the use of LLMs in software development continues to grow, it is essential to address this vulnerability and develop strategies to prevent and mitigate the effects of package hallucinations.
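As one illustration of what a mitigation could look like, the sketch below flags dependency names that are not on a team-maintained allowlist before anything is installed. This is an assumption about one possible safeguard, not a recommendation from the study. The function names and the sample requirement lines are hypothetical, and the parsing handles only simple `name` or `name==version` style lines.

```python
import re


def normalize(name):
    """Normalize a package name the way PEP 503 describes:
    lowercase, with runs of '-', '_' and '.' collapsed to a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unvetted_requirements(requirement_lines, allowlist):
    """Return package names from `requirement_lines` that are not on `allowlist`."""
    vetted = {normalize(name) for name in allowlist}
    flagged = []
    for line in requirement_lines:
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Take the leading package name; option lines such as "-r other.txt"
        # do not match and are skipped by this simplified parser.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match and normalize(match.group(0)) not in vetted:
            flagged.append(match.group(0))
    return flagged


# Hypothetical requirement lines, e.g. copied from LLM-generated instructions.
lines = ["requests==2.31.0", "# helper libs", "", "Fast_JsonX>=1.0", "numpy"]
print(unvetted_requirements(lines, allowlist=["requests", "NumPy"]))
```

An allowlist is used here instead of a live registry lookup because a name that exists in a registry is not necessarily safe: the scenario described above is precisely one in which an attacker has already published a package under a hallucinated name.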

FAQs

Q: What are package hallucinations?
A: Package hallucinations occur when an LLM generates code that references non-existent packages.

Q: How common are package hallucinations?
A: According to the study, 19.7 percent of package references pointed to packages that didn’t exist.

Q: Can package hallucinations be exploited by attackers?
A: Yes, package hallucinations can be used in supply-chain attacks, particularly if the same non-existent package names are repeated over and over.

Q: Are some LLMs or programming languages more prone to package hallucinations?
A: Yes, the study found that open source LLMs and JavaScript code were more likely to produce package hallucinations than commercial models and Python code.