Introduction to Coding Agents
The world of coding agents has been expanding rapidly, with new models arriving regularly to assist humans with coding tasks. In this article, we discuss the results of a test of four coding agents: OpenAI Codex, Claude Code, Mistral Vibe, and Google Gemini CLI. The test evaluated how well each model could generate a simple game of Minesweeper.
Testing the Agents
Each agent was given a short prompt to create a game of Minesweeper and was judged on implementation, presentation, and overall coding experience. OpenAI Codex performed best, producing a working game that even included chording as a gameplay option (see the sketch below). Claude Code also performed well, with strong presentational flourishes and quick generation time.
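For readers unfamiliar with the term, chording is the Minesweeper convention where clicking a revealed number whose neighboring flags already account for that number opens all remaining unflagged neighbors at once. None of the agents' code is reproduced here; the TypeScript below is our own minimal sketch of the rule, assuming a hypothetical board of `Cell` objects.

```typescript
// Minimal illustration of Minesweeper "chording" (not code from any tested agent).
// Assumes a hypothetical grid of Cell objects indexed as board[row][col].
interface Cell {
  isMine: boolean;
  isRevealed: boolean;
  isFlagged: boolean;
  adjacentMines: number; // the number shown on a revealed cell
}

type Board = Cell[][];

// Collect the up-to-eight neighbors of a cell, respecting board edges.
function neighbors(board: Board, row: number, col: number): [number, number][] {
  const result: [number, number][] = [];
  for (let dr = -1; dr <= 1; dr++) {
    for (let dc = -1; dc <= 1; dc++) {
      if (dr === 0 && dc === 0) continue;
      const r = row + dr;
      const c = col + dc;
      if (r >= 0 && r < board.length && c >= 0 && c < board[r].length) {
        result.push([r, c]);
      }
    }
  }
  return result;
}

// Chord on a revealed number: if the count of flagged neighbors equals the
// cell's number, reveal every remaining unflagged, unrevealed neighbor.
// Returns true if a mine was revealed (i.e., the chord lost the game).
function chord(board: Board, row: number, col: number): boolean {
  const cell = board[row][col];
  if (!cell.isRevealed || cell.adjacentMines === 0) return false;

  const adjacent = neighbors(board, row, col);
  const flagged = adjacent.filter(([r, c]) => board[r][c].isFlagged).length;
  if (flagged !== cell.adjacentMines) return false; // chord only when flags satisfy the number

  let hitMine = false;
  for (const [r, c] of adjacent) {
    const n = board[r][c];
    if (!n.isFlagged && !n.isRevealed) {
      n.isRevealed = true;
      if (n.isMine) hitMine = true; // a misplaced flag makes chording fatal
    }
  }
  return hitMine;
}
```

A complete game would also flood-fill cells with zero adjacent mines, but this is the bit of logic that separates a bare-bones clone from one with proper chording support.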
Agent 4: Google Gemini CLI
The Google Gemini CLI was lacking in its implementation. Despite its strong presentation and Power Mode options, the model failed to generate a playable game: it produced a few grey boxes that could be clicked, but the playfields themselves were missing. Interactive troubleshooting with the agent might have fixed the issue, but as a "one-shot" test, the model completely failed.
Implementation and Presentation
The Gemini CLI did not perform well on implementation. The model got hung up on attempting to manually create WAV files for sound effects (a far simpler built-in alternative is sketched below) and insisted on pulling in React and other external dependencies, overcomplicating the project. The result simply did not work.
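For context on why the WAV detour was unnecessary: a browser-based Minesweeper can produce simple click or explosion sounds with the built-in Web Audio API, no hand-assembled WAV bytes or external libraries required. The snippet below is our own sketch of that simpler route, not anything Gemini CLI generated.

```typescript
// A minimal browser "beep" using the standard Web Audio API (our sketch, not
// Gemini CLI output). No WAV files or external libraries are involved.
function playBeep(frequencyHz: number = 440, durationMs: number = 120): void {
  const ctx = new AudioContext();
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();

  osc.frequency.value = frequencyHz; // pitch of the tone
  osc.connect(gain);
  gain.connect(ctx.destination);

  // Start at a modest volume and fade out to avoid an audible click.
  gain.gain.setValueAtTime(0.2, ctx.currentTime);
  gain.gain.exponentialRampToValueAtTime(0.001, ctx.currentTime + durationMs / 1000);

  osc.start();
  osc.stop(ctx.currentTime + durationMs / 1000);
  osc.onended = () => { void ctx.close(); }; // release audio resources when done
}

// Example usage: a short, higher-pitched tone when a cell is revealed.
// playBeep(880, 80);
```

The point is simply that lightweight, built-in options existed; hand-building WAV data was overkill for the task.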
Coding Experience
The coding experience with Gemini CLI was also troublesome. The model was very slow, taking about an hour per attempt, and it seemed to get stuck on creating sound effects. Even when given a second chance, it failed to produce a working game.
Comparison of Agents
The results showed that OpenAI Codex and Claude Code performed significantly better than Mistral Vibe and Google Gemini CLI. While experienced coders can get better results through an interactive, back-and-forth code-editing conversation with an agent, some models proved capable even with a very short prompt on a relatively straightforward task.
Final Verdict
OpenAI Codex wins this test on points, thanks to its inclusion of chording as a gameplay option. Claude Code also distinguished itself with strong presentational flourishes and quick generation time. Mistral Vibe was a significant step down, and Google's Gemini CLI (based on Gemini 2.5) was a complete failure on our one-shot test.
Conclusion
The test of the four coding agents showed that while they can augment human skill, they are not yet ready to replace human coders. The results also highlighted the value of interactive troubleshooting and the need for agents to generate usable code quickly.
FAQs
Q: What was the test conducted on the coding agents?
A: The test involved providing each agent with a prompt to create a game of Minesweeper.
Q: Which agent performed the best in the test?
A: OpenAI Codex performed the best, successfully creating a game with chording as a gameplay option.
Q: What was the main issue with the Google Gemini CLI?
A: The model failed to generate a playable game and got hung up on attempting to manually create WAV files for sound effects.
Q: Can coding agents replace human coders?
A: The test showed that coding agents are not yet ready to replace human coders, but they can augment human skill.