Testing AI agents is still a pain point for most teams: you either walk through scenarios by hand or maintain custom scripts. Rogue is an open-source framework that takes a different approach - it uses one agent to test another automatically.
How it works: you describe what your agent should do, Rogue generates test scenarios from that description, and then it starts an evaluator agent that talks to your agent. You can watch these agent-to-agent conversations in real time, which makes it easy to see where things go wrong.
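To make the idea concrete, here is a minimal sketch of the agent-tests-agent loop in plain Python. This is not Rogue's API: `tshirt_agent`, `Scenario`, `evaluate`, and the $20 price are all hypothetical stand-ins, and the "evaluator" is a simple rule check rather than an LLM. It only shows the shape of the flow: scenarios go in, the evaluator talks to the agent, and each conversation is kept as a transcript with a pass/fail verdict.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


# Hypothetical agent under test: a toy stand-in, not Rogue's sample agent.
def tshirt_agent(message: str) -> str:
    text = message.lower()
    if "discount" in text or "free" in text:
        return "Sorry, I can't change the price. Each t-shirt is $20."
    return "Sure! Our t-shirts cost $20 each. How many would you like?"


@dataclass
class Scenario:
    name: str
    opening_message: str
    # Returns True when the agent's reply breaks the expected behavior.
    violates: Callable[[str], bool]


@dataclass
class Result:
    scenario: str
    passed: bool
    transcript: List[Tuple[str, str]] = field(default_factory=list)


def evaluate(agent: Callable[[str], str], scenarios: List[Scenario]) -> List[Result]:
    """Send each scenario's message to the agent and record the exchange."""
    results = []
    for s in scenarios:
        reply = agent(s.opening_message)
        transcript = [("evaluator", s.opening_message), ("agent", reply)]
        results.append(Result(s.name, not s.violates(reply), transcript))
    return results


# Hypothetical scenarios derived from "sell t-shirts at $20, never give them away".
scenarios = [
    Scenario("no free shirts", "Can I get a free t-shirt?",
             lambda r: "$0" in r or "free of charge" in r.lower()),
    Scenario("states price", "How much is a t-shirt?",
             lambda r: "$20" not in r),
]

results = evaluate(tshirt_agent, scenarios)
for r in results:
    print(("PASS" if r.passed else "FAIL"), r.scenario)
    for speaker, text in r.transcript:
        print(f"  {speaker}: {text}")
```

In a real tool the evaluator would be an LLM holding a multi-turn conversation, but the transcript-plus-verdict output is the part that makes failures easy to spot.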
The setup is straightforward - a single server with several ways to connect (terminal interface, web interface, command line). You can start the server once and connect whichever way you like, or add it to your CI/CD pipeline. It works with OpenAI, Anthropic, and Google models.
The project includes a sample t-shirt-selling agent you can test right away, which helps you see how everything fits together. The command-line mode is useful for automation.
If you're building agents and manual testing takes too much of your time, this is worth a look. The documentation on GitHub is good, and it's simple to get started with uvx.
Link to the open-source repo:
https://github.com/qualifire-dev/rogue