There’s No Putting Toothpaste Back in the Tube
Developers frequently embed security tokens, private encryption keys, and other sensitive credentials directly in their code, despite best practices that have long called for such data to be supplied through more secure means, such as environment variables or a dedicated secrets manager. The potential damage worsens when that code is pushed to public repositories, another common security failing. The phenomenon has recurred over and over for more than a decade.
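A minimal sketch of the recommended practice: read the credential from the environment at runtime rather than embedding it in source. The variable name `API_TOKEN` and the helper below are hypothetical, not from any particular tool.

```python
import os

def get_api_token() -> str:
    """Read an API token from the environment instead of hardcoding it.

    The token never appears in source code, so committing the file
    (even to a public repository) does not expose the secret.
    """
    token = os.environ.get("API_TOKEN")
    if not token:
        # Fail loudly rather than falling back to a hardcoded default.
        raise RuntimeError("API_TOKEN is not set in the environment")
    return token
```

The same pattern extends to secrets managers: the code holds only a reference to the secret, and the sensitive value is injected at deploy time.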
A Partial Fix, Not a Permanent Solution
Lasso, a security research firm, recently investigated Microsoft's fix for a critical issue in its Copilot AI. The fix cut off public access to a special Bing user interface, once available at cc.bingj.com. However, it didn't appear to clear the private pages from the cache itself. As a result, the private information remained accessible to Copilot, which would surface it for any user who asked.
Although Bing's cached-link feature was disabled, cached pages continued to appear in search results. This indicated that the fix was a temporary patch: public access was blocked, but the underlying data had not been removed.
When we revisited our investigation of Microsoft Copilot, our suspicions were confirmed: Copilot still had access to cached data that was no longer available to human users. In short, the fix was only partial; human users were prevented from retrieving the cached data, but Copilot could still reach it.
Private Repositories, Public Problems
The Lasso researchers found that simply making a repository private isn't enough. Once exposed, credentials are irreparably compromised; the only recourse is to rotate them all. Even that advice doesn't address the problems that arise when other sensitive data, which can't simply be rotated, is included in repositories switched from public to private.
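Rotation only helps if you know which credentials leaked. A rough sketch of how leaked secrets are typically detected, using regex patterns for a few well-known token formats (the pattern set below is illustrative; production scanners such as gitleaks or truffleHog ship far larger, maintained rule sets):

```python
import re

# Hypothetical, abbreviated rule set for a few well-known credential formats.
SECRET_PATTERNS = {
    "github_token": re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_for_secrets(text: str) -> list[tuple[str, str]]:
    """Return (rule_name, matched_string) pairs for likely hardcoded secrets."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.findall(text):
            hits.append((name, match))
    return hits
```

Running a scan like this in a pre-commit hook catches secrets before they ever reach a remote; once a match is found in history, the credential should be treated as burned and rotated regardless of whether the repository is later made private.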
Microsoft incurred legal expenses to have tools removed from GitHub after alleging they violated a raft of laws, including the Computer Fraud and Abuse Act, the Digital Millennium Copyright Act, the Lanham Act, and the Racketeer Influenced and Corrupt Organizations Act. Company lawyers prevailed in getting the tools removed. To date, Copilot continues to undermine this work by making the tools available anyway.
Conclusion
It’s clear that simply making a repository private isn’t enough to protect sensitive data. Developers must take steps to input sensitive information securely and avoid including it in public repositories. Additionally, large language models like Copilot must be designed to respect privacy and security, rather than undermining it.
FAQs
- What is Copilot? Copilot is an AI assistant developed by Microsoft, built on large language models, that can generate text and code and help complete tasks.
- What is a repository? A repository is a storage location for a project's code, files, and revision history, often hosted on a service such as GitHub.
- Why are private repositories important? Private repositories restrict access to code and data, such as proprietary source or internal configuration, that their owners do not want publicly visible.
- What can I do to protect my data? Supply sensitive information through secure channels rather than hardcoding it, keep it out of public repositories, and rotate any credential that is ever exposed. Additionally, treat AI tools like Copilot as a potential exposure vector: assume that anything that was ever public may have been cached or ingested.