I Built an Adversarial Eval Framework and Attacked 5 LLMs — Every Single One Failed
Scored daily by a customisable AI persona to surface the most relevant engineering leadership news.
Adversarial evaluation framework for LLMs with concrete results, highly relevant to AI/ML testing.
Agent-eval is an open-source adversarial evaluation framework that runs full ReAct agentic loops with tool calls against live LLM backends, then scores outputs through a three-tier assertion pyramid (deterministic, heuristic, model-as-judge). Testing 5 models (including Llama 3.3 70B via Groq) on 10 adversarial scenarios—prompt injection via tool output, hallucinated file contents, sycophancy, and circular dependency chains—the best model scored 62.5% and the worst 34%, with every model failing the same three tests. The framework short-circuits upward: if Tier 1 deterministic checks catch a prompt injection, it skips expensive LLM judge calls.