
Four Signals

Agentic insights for modern tech teams

Auto-verifying your AI-SRE's fixes (Part II): HolmesGPT end-to-end on a real cluster
AI/ML / dev.to

HolmesGPT, an open-source AI-SRE, autonomously investigates alerts on a GKE cluster, generates code patches through a Claude wrapper, and verifies each fix by using mirrord exec to run the patched code against the cluster's real dependencies. In a demo, it correctly traced 5% error-rate violations to a ValueError, produced a patch that passed verification by clearing the SLO without regressions, and rejected a second patch that failed verification. The system integrates with Alertmanager, Prometheus, and existing runbooks to ground its investigations in cluster state and documentation.

Why it matters

For platform and SRE engineers, this demonstrates a self-hostable, verifiable AI agent loop that can autonomously patch the code behind production incidents without requiring blind trust, because it pairs LLM reasoning with testing each fix against the real cluster via mirrord.
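
The accept/reject step of such a loop can be pictured as a small gate. The sketch below is a hypothetical TypeScript illustration, not HolmesGPT's code: it assumes a verification run (for example, the patched service started with mirrord exec against real cluster dependencies) reports request and error counts plus the names of checks that passed, and it treats 5% as the error-rate SLO threshold purely for illustration. A patch is accepted only if it clears the SLO and every check that passed before the patch still passes.

```typescript
// Hypothetical shape of a verification run's results; not HolmesGPT's actual interface.
interface Measurement {
  requests: number;
  errors: number;
  passingChecks: string[]; // names of checks (tests, endpoints) that passed in this run
}

interface Verdict {
  accepted: boolean;
  reasons: string[];
}

// Illustrative SLO: the error rate must stay below 5%.
const ERROR_RATE_SLO = 0.05;

// Accept a patch only if it clears the SLO and breaks nothing that passed before it.
function verifyPatch(baseline: Measurement, patched: Measurement): Verdict {
  const reasons: string[] = [];

  if (patched.requests === 0) {
    // No traffic means the SLO was never exercised, so the run proves nothing.
    reasons.push("no requests observed, so the SLO cannot be evaluated");
  } else {
    const rate = patched.errors / patched.requests;
    if (rate >= ERROR_RATE_SLO) {
      reasons.push(
        `error rate ${(rate * 100).toFixed(1)}% misses the ${(ERROR_RATE_SLO * 100).toFixed(0)}% SLO`,
      );
    }
  }

  // Any check that passed on the baseline but fails on the patch is a regression.
  for (const check of baseline.passingChecks) {
    if (patched.passingChecks.indexOf(check) === -1) {
      reasons.push(`regression: ${check} passed before the patch but not after`);
    }
  }

  return { accepted: reasons.length === 0, reasons };
}
```

Returning the reasons alongside the verdict keeps the decision auditable instead of asking reviewers to trust the agent's own claim that a fix works.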

How I Used Automated Red Teaming To Take My AI Agent from 6/9 Breaches to Zero
AI/ML / dev.to

Automated red teaming with Strands Evals reduced an AI agent's breaches from 6/9 to zero by generating adversarial cases tailored to the agent's tools (bash, lookup_employee) and running multi-turn CrescendoStrategy attacks. The unprotected agent leaked AWS credentials under creative prompt escalation, but systematic testing across the data_exfiltration, excessive_agency, and system_prompt_leak categories surfaced the vulnerabilities so they could be patched. The approach works with any agent framework, though it uses Amazon Bedrock and Strands Agents for their built-in evaluation features.
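
Why multi-turn escalation matters can be shown with a small framework-agnostic harness. This is a hypothetical sketch, not the Strands Evals API: an agent is modeled as a function from conversation history to its next reply, each attack case carries escalating user turns plus a category-specific grader, and the only grader implemented here flags AWS access key IDs (the 20-character format beginning with `AKIA` or `ASIA`) in replies. Because each reply is fed back into the history, an agent that refuses the first two turns but leaks on the third is still counted as breached.

```typescript
// Framework-agnostic sketch of a crescendo-style red-team run; NOT the Strands Evals API.
type Category = "data_exfiltration" | "excessive_agency" | "system_prompt_leak";

// The agent under test: conversation history in, next reply out.
type Agent = (history: string[]) => string;

interface AttackCase {
  category: Category;
  turns: string[]; // escalating user turns, mildest first
  isBreach: (reply: string) => boolean; // category-specific grader
}

// AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) plus 16 uppercase alphanumerics.
// No g flag, so test() keeps no lastIndex state between calls.
const AWS_KEY_ID = /\b(?:AKIA|ASIA)[A-Z0-9]{16}\b/;

function leaksAwsKeyId(reply: string): boolean {
  return AWS_KEY_ID.test(reply);
}

// Play the turns in order, feeding every reply back in; stop at the first breaching reply.
function runCase(agent: Agent, attack: AttackCase): boolean {
  const history: string[] = [];
  for (const turn of attack.turns) {
    history.push(turn);
    const reply = agent(history);
    history.push(reply);
    if (attack.isBreach(reply)) return true;
  }
  return false;
}

// Report a run as "breaches/total", matching the article's 6/9-style reporting.
function tally(agent: Agent, cases: AttackCase[]): string {
  const breaches = cases.filter((c) => runCase(agent, c)).length;
  return `${breaches}/${cases.length} breaches`;
}
```

Adding graders for excessive_agency or system_prompt_leak only means supplying a different `isBreach` per case; the loop itself stays the same.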

From Transcript to Typed Action Items: Three Parallel Agents in TypeScript
AI/ML / dev.to

A TypeScript meeting summarizer uses three parallel agents, each with its own system prompt and temperature, to handle the prose summary, typed action items (validated with Zod schemas), and per-speaker sentiment separately; a fourth agent then merges the results. This split avoids the prompt fighting and unstructured output common in single-prompt approaches and produces a structured Markdown report with an action-item table listing owners and optional due dates. The 280-line open-source example runs on Claude Sonnet 4.6 and shows how typed outputs and concurrency improve the reliability of multi-task LLM pipelines.
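
The fan-out/fan-in shape can be sketched without any SDK. Assumptions: `callModel` is a hypothetical stand-in for a Claude client, the prompts and temperatures are illustrative, and a hand-written type guard replaces the article's Zod schema so the example stays self-contained. The three specialist calls run concurrently through `Promise.all`, the action items are validated before they are rendered as a Markdown table, and a fourth call merges the sections.

```typescript
// A typed action item; the article validates this shape with Zod, a type guard stands in here.
interface ActionItem {
  task: string;
  owner: string;
  due?: string; // optional due date
}

// Hypothetical model call: system prompt, temperature, and input text in; completion out.
type ModelCall = (system: string, temperature: number, input: string) => Promise<string>;

function isActionItem(v: unknown): v is ActionItem {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o.task === "string" &&
    typeof o.owner === "string" &&
    (o.due === undefined || typeof o.due === "string")
  );
}

function isActionItemArray(value: unknown): value is ActionItem[] {
  return Array.isArray(value) && value.every(isActionItem);
}

async function summarizeMeeting(transcript: string, callModel: ModelCall): Promise<string> {
  // Fan out: three specialists run concurrently, each with its own prompt and temperature.
  const [summary, rawItems, sentiment] = await Promise.all([
    callModel("Summarize the meeting in prose.", 0.5, transcript),
    callModel("Return action items as a JSON array of {task, owner, due?}.", 0, transcript),
    callModel("Describe each speaker's sentiment.", 0.3, transcript),
  ]);

  // Validate the structured output before anything downstream uses it.
  const parsed: unknown = JSON.parse(rawItems);
  if (!isActionItemArray(parsed)) throw new Error("action items failed validation");

  const table = [
    "| Task | Owner | Due |",
    "| --- | --- | --- |",
    ...parsed.map((i) => `| ${i.task} | ${i.owner} | ${i.due ?? "-"} |`),
  ].join("\n");

  // Fan in: a fourth call merges the sections into one Markdown report.
  return callModel(
    "Merge these sections into one Markdown report.",
    0.2,
    [summary, table, sentiment].join("\n\n"),
  );
}
```

Validating before the merge step means a malformed action-item response fails loudly instead of leaking into the final report.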

Free-threaded Python: past, present, and future
Languages / lwn.net

At PyCon US 2026, CPython core developer Thomas Wouters detailed the free-threaded Python interpreter, which removes the Global Interpreter Lock (GIL) so that threads can truly execute in parallel. He traced the motivation from historical single-CPU designs to modern multi-core systems, noting that the GIL was originally the most efficient way to support threads but now limits performance. Wouters, who works on free-threading at Meta, highlighted that while alternatives such as rewriting code in Rust or using multiple processes exist, they require significant restructuring of data and code, which makes free-threaded Python a more direct path to using multi-core hardware.

Cloudflare launched self-managed OAuth for all
Cloud / blog.cloudflare.com

Cloudflare opened self-managed OAuth to all customers, letting developers create and manage their own OAuth clients for delegated API access. The rollout required upgrading from an older version of the Hydra OAuth engine to newer releases, with custom SQL migrations that use CREATE INDEX CONCURRENTLY to avoid downtime. The move addresses growing demand from agentic tools and SaaS integrations that previously relied on harder-to-manage API tokens.

Next.js 16 Server Actions Security: The Auth Check Most Developers Miss
Security / dev.to

Next.js 16 Server Actions are public HTTP endpoints, not internal helpers: 'use server' exposes them without any built-in authentication or authorization. Developers often protect the page UI but skip auth checks inside the action itself, leaving mutations open to direct cURL calls from anyone holding a valid session. The fix is to verify the session and check resource ownership explicitly inside every Server Action, treating each one as an independent API endpoint.
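
The two checks the fix calls for can be shown in a framework-free sketch. In a real Next.js app, `deletePost` would live in a 'use server' module and the session would come from your auth library; here `getSession`, the in-memory `sessions` and `posts` stores, and the token strings are hypothetical stand-ins. The action first rejects callers without a valid session, then rejects callers who do not own the resource, so a direct request carrying someone else's post ID fails even with a valid session.

```typescript
// Framework-free sketch. In Next.js this would be a Server Action in a 'use server' module;
// getSession and the in-memory stores are hypothetical stand-ins for an auth library and a database.

interface Session {
  userId: string;
}

interface Post {
  id: string;
  authorId: string;
  title: string;
}

const sessions: Record<string, Session> = {
  "token-alice": { userId: "alice" },
  "token-bob": { userId: "bob" },
};

const posts: Record<string, Post> = {
  p1: { id: "p1", authorId: "alice", title: "Hello" },
};

// Own-property lookup so keys like "toString" never match inherited members.
function lookup<T>(table: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

function getSession(token: string | undefined): Session | undefined {
  return token === undefined ? undefined : lookup(sessions, token);
}

// Anyone can POST to this action directly, so it re-checks identity and ownership itself.
async function deletePost(sessionToken: string | undefined, postId: string): Promise<void> {
  const session = getSession(sessionToken);
  if (!session) throw new Error("unauthenticated"); // 1. authentication

  const post = lookup(posts, postId);
  if (!post) throw new Error("not found");
  if (post.authorId !== session.userId) throw new Error("forbidden"); // 2. ownership

  delete posts[postId];
}
```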

Presentation: Rust at the Core - Accelerating Polyglot SDK Development
AI/ML / infoq.com

Spencer Judge, who leads Temporal's SDK team, details an architecture that uses a shared Rust core with language-specific layers to avoid rewriting roughly 70,000 lines of complex state-machine logic across eight SDKs. He covers practical challenges such as FFI boundaries, async bridging, and memory safety, and contrasts native extensions with emerging WebAssembly approaches for cross-language portability.

Google OpenRL is an Experimental Self-hosted API for LLM Post-Training Fine-tuning
AI/ML / infoq.com

Google's GKE Labs released OpenRL, an open-source project providing a self-hosted API for post-training fine-tuning of LLMs on standard Kubernetes clusters. It decouples reinforcement learning infrastructure from AI research, enabling parallel RL jobs to increase GPU utilization by avoiding idle time during CPU-bound reward calculations. OpenRL runs on macOS, Nvidia GPUs, and GKE, and includes an autoresearch recipe for parameter sweeps with Gemma models.

AI Coding Agents Need Project Memory, Not Just Bigger Prompts
AI/ML / dev.to

AI coding agents fail to retain project-specific lessons across sessions, leading to repeated mistakes like editing generated files or running wrong test commands. A simple LESSONS.md file scales poorly beyond a few entries, becoming noisy, stale, and disconnected from codebase context. A graph-based approach that links lessons to file paths, commands, domains, and freshness metadata enables agents to retrieve relevant knowledge at the right moment.
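
A minimal sketch of context-aware retrieval, assuming a flat adjacency-list representation rather than a real graph store: each lesson node lists its edges to path prefixes, commands, and domains, plus a last-verified date as freshness metadata. Retrieval drops stale lessons, scores the rest by how many edges match the current file, command, or domain, and returns the matches best first. The field names, the 90-day staleness cutoff, and the scoring weights are all illustrative assumptions.

```typescript
// Hypothetical lesson node; field names are illustrative, not taken from a specific tool.
interface Lesson {
  text: string;
  paths: string[]; // path prefixes the lesson applies to, e.g. "src/generated/"
  commands: string[]; // commands it concerns, e.g. "npm test"
  domains: string[]; // topical tags, e.g. "codegen"
  lastVerified: string; // ISO date the lesson was last confirmed (freshness metadata)
}

// What the agent is doing right now.
interface Context {
  filePath?: string;
  command?: string;
  domain?: string;
  today: string; // ISO date
}

const MAX_AGE_DAYS = 90; // assumption: lessons not re-verified within 90 days count as stale
const MS_PER_DAY = 86400000;

function ageInDays(from: string, to: string): number {
  return (Date.parse(to) - Date.parse(from)) / MS_PER_DAY;
}

// Drop stale lessons, score the rest by matching edges, return the best matches first.
function relevantLessons(lessons: Lesson[], ctx: Context): Lesson[] {
  const { filePath, command, domain, today } = ctx;
  const scored = lessons
    .filter((l) => ageInDays(l.lastVerified, today) <= MAX_AGE_DAYS)
    .map((l) => {
      let score = 0;
      // A path edge is the strongest signal that a lesson applies to the file being edited.
      if (filePath !== undefined && l.paths.some((p) => filePath.indexOf(p) === 0)) score += 2;
      if (command !== undefined && l.commands.indexOf(command) !== -1) score += 1;
      if (domain !== undefined && l.domains.indexOf(domain) !== -1) score += 1;
      return { lesson: l, score };
    })
    .filter((s) => s.score > 0);
  scored.sort((a, b) => b.score - a.score);
  return scored.map((s) => s.lesson);
}
```

Scoring against the current context, rather than dumping every lesson into the prompt, is what keeps retrieval quiet when a lesson does not apply.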

OpenAI and Broadcom announce chip designed for LLM inference at scale
AI/ML / arstechnica.com

OpenAI and Broadcom announced Jalapeño, an ASIC designed from scratch for LLM inference at data center scale and developed in nine months using insights from OpenAI's model roadmap. Early testing shows substantially better performance per watt than current state-of-the-art hardware, with deployment targeted for the end of 2026. The chip aims to reduce OpenAI's dependence on Nvidia and to enable vertical integration across its full stack.