Skip to content

Four Signals

Agentic insights for modern tech teams

The hard part of my AI agent wasn't doing the work, it was planning it
AI/ML / dev.to

The hard part of my AI agent wasn't doing the work, it was planning it

Building an AI agent CLI that executes actions across hundreds of apps revealed that the planning mode was far harder than direct execution. The initial approach of using a single agent for both planning and execution failed because the two modes pull in opposite directions, leading to interference. The solution required a separate planner agent with read-only tools to research actual system state before generating plans, preventing it from fabricating steps based on assumptions.

Why it matters

For engineers building multi-agent systems or tool-using LLMs, this highlights a critical architectural pattern: separating planning from execution into distinct agents with constrained capabilities to prevent mode interference and hallucinated plans.

AI/ML / cncf.io

Building a Cluster-Aware AI Agent with Kubernetes, Argo CD, and GitOps

This article provides a practical guide for building a self-hosted, read-only AI agent within a Kubernetes cluster, using GitHub Actions and Argo CD Image Updater for the CI/CD pipeline. It emphasizes a fully on-premises setup where no data leaves the cluster, avoiding reliance on cloud AI services.

Your Local LLM Is Not as Private as You Think
AI/ML / dev.to

Your Local LLM Is Not as Private as You Think

Cyera Research disclosed CVE-2026-7482 (Bleeding Llama), a critical 9.1-rated heap out-of-bounds read in Ollama versions before 0.17.1, exploitable via three unauthenticated API calls to exfiltrate process memory containing prompts, API keys, and tool outputs. The vulnerability challenges the assumption that local LLM execution guarantees privacy, as Ollama servers often evolve from local experiments into shared infrastructure with exposed endpoints and egress paths. The disclosure timeline also revealed a security visibility gap between patch availability and clear release notes.

One Agent or Many? Orchestrating AI Agents Without the Mess
AI/ML / dev.to

One Agent or Many? Orchestrating AI Agents Without the Mess

Orchestrating AI agents effectively starts by maximizing a single agent with more tools before splitting into multiple agents. When a single agent buckles under complex logic or tool overload, two patterns emerge: a manager agent that calls specialists as tools, or decentralized handoffs where peers transfer conversations. The key is to keep evaluation and maintenance simple by avoiding premature splitting.

DevTools / cncf.io

Securing CI/CD for an open source project, part 3: Credentials, verification, and what’s next

This article is the final installment of a series detailing how the Cilium open-source project secures its CI/CD pipeline, focusing on credential management, verification mechanisms, and future security enhancements. It likely covers best practices for handling secrets, signing artifacts, and ensuring pipeline integrity.

General / jeffgeerling.com

Framework's 10G Ethernet module exposes USB-C's complexity

Jeff Geerling's testing of the WisdPi 10G Ethernet Expansion Card for Framework laptops reveals that USB-C's bandwidth complexity and the Realtek RTL8159 controller's requirement for USB 3.2 Gen 2x2 (20 Gbps) often bottleneck performance to well under 8 Gbps on many Framework models, including the Framework 13 with AMD Ryzen AI 5 340. Even on the Framework 12, which officially supports Gen 2x2, Linux on Ubuntu 26.04 with kernel 7.x failed to achieve more than 7 Gbps due to driver issues, while Windows 11 with the Realtek driver reached 9.4+ Gbps but introduced thermal concerns with the module hitting nearly 70°C. Geerling recommends the $99 card only for users who need speeds beyond the 2.5 Gbps offered by Framework's $40 Ethernet Expansion Card and who can avoid lap use due to heat.

Announcing Silk: a silky smooth fiber runtime for ClickHouse
DevTools / clickhouse.com

Announcing Silk: a silky smooth fiber runtime for ClickHouse

ClickHouse announced Silk, a stackful-fiber C++ library with a NUMA-aware work-stealing scheduler and io_uring as the I/O ground truth, designed to eliminate heap allocation in the steady-state hot path. Unlike OS threads or C++20 stackless coroutines, Silk yields in tens of nanoseconds and avoids cache aliasing issues that plagued prior fiber implementations like Alibaba's Photon. The first integration target is ClickHouse's distributed cache, targeting I/O-bound workloads where tail latency at the 99th and 99.9th percentile dominates query performance.

Building a European Cloud Orchestration Platform within an Enterprise
Cloud / infoq.com

Building a European Cloud Orchestration Platform within an Enterprise

Maximilian Techritz and Johannes Ott detailed at KubeCon Europe how their enterprise built a European cloud orchestration platform using a Kubernetes Control Plane approach with Crossplane, External Secrets Operator, Kyverno, and Flux. Rather than building another tool, they unified management of GCP, AWS, and Azure resources via declarative configuration and GitOps. Adoption was driven through monthly tech talks and inner-source collaboration, reducing the cognitive load on engineers managing disparate toolchains.

Nona isn't your open source project #151,523
Open Source / dev.to

Nona isn't your open source project #151,523

Nona is an open-source, self-hosted remote config and feature flag service designed as a lean alternative to Firebase Remote Config. It runs as a single Docker container with multi-zone availability via SQLD, prioritizing real-time responses with no cache and obsessing over latency and memory consumption. The team of experienced architects focuses on doing one thing excellently rather than cramming in thousands of features.

Run a vLLM Server on HF Jobs in One Command
AI/ML / huggingface.co

Run a vLLM Server on HF Jobs in One Command

Hugging Face now lets you spin up a private, OpenAI-compatible vLLM server with a single `hf jobs run` command, using the official `vllm/vllm-openai` image and pay-per-second billing on A10g GPUs ($1.50/hr). The endpoint is gated by HF tokens, supports curl or the OpenAI Python client, and auto-stops via a configurable `--timeout` flag. This eliminates manual server provisioning and Kubernetes overhead for ephemeral LLM workloads like testing, evals, or batch generation.