The Wire — 2026-06-04

Gemma 4 12B: A unified, encoder-free multimodal model

Google DeepMind's Gemma 4 12B is an open-source (Apache 2.0) multimodal model that runs locally on laptops with 16 GB of VRAM. Its encoder-free architecture processes vision and audio natively, without separate encoders. It uses multi-token prediction drafters to cut latency and benchmarks close to the larger 26B MoE model, enabling agentic workflows on consumer hardware.

Why it matters

For platform engineers and architects building local AI agents, Gemma 4 12B offers a high-performing, laptop-runnable multimodal model with a novel unified architecture that eliminates encoder bottlenecks, making it ideal for privacy-sensitive and offline agentic applications.

AI/ML / kasra.blog

I built a vulnerable app and spent $1,500 seeing if LLMs could hack it

A researcher spent $1,500 testing 10 LLMs on a deliberately vulnerable React Native/Expo app with a FastAPI backend and an open Firebase Firestore database. GPT-5.5 achieved a 70% solve rate ($9.46/solve), DeepSeek V4 Pro solved 3/10 at $0.62/solve, and Claude Sonnet 4.6 solved only 2/10, with many of its runs cut off by budget limits. Most failures came from models never discovering the Firebase bypass, hitting security guardrails (Gemini refused nearly every run), or exhausting tokens before producing the exploit.

DevTools / blog.ammaraskar.com

Full Disclosure: 1-Click GitHub Token Stealing via a VSCode Bug

A critical VSCode bug in github.dev's webview security model enables attackers to steal a GitHub OAuth token with full repo access via a single click. The token, POSTed from github.com to github.dev for browser-based editing, is not scoped to a single repository. The exploit leverages VSCode's postMessage-based cross-origin communication between the main window and webview iframes, allowing an attacker's page to exfiltrate the token.

General / elixir-lang.org

Elixir v1.20: Now a gradually typed language

Elixir v1.20 introduces a gradual type system using set-theoretic types, achieving type inference without annotations. It finds verified bugs and dead code with low false positives by leveraging a dynamic() type that narrows types based on runtime checks. The system passes 12/13 categories in the 'If T: Benchmark for Type Narrowing', showing precise type recovery from ordinary code.

DevTools / infoq.com

Inside Google’s System for Coordinated A/B Testing Across Its Global Service Fleet

Google's fleet-wide A/B experimentation system uses a centralized framework with a unified assignment layer supporting hierarchical allocation and deterministic bucketing to minimize interference across interconnected services. It emphasizes exposure logging to distinguish assigned from truly exposed populations, configuration propagation to serving systems for low-latency decisions, and guardrails to enforce traffic limits. The system integrates with analytics pipelines to evaluate impact across end-to-end user journeys, reducing operational overhead for product teams.
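
As a rough illustration of deterministic bucketing and exposure logging, here is a minimal Python sketch. It is not Google's implementation: the function names, the bucket count, the layer salt, and the allocation ranges are all assumptions made for the example. The idea is that hashing a unit ID together with a per-layer salt gives every service the same assignment without any coordination at request time.

```python
import hashlib
from typing import Optional

NUM_BUCKETS = 1000  # assumed bucket count, not taken from the article


def bucket_for(unit_id: str, layer_salt: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a unit (e.g. a user ID) to a stable bucket within one layer."""
    digest = hashlib.sha256(f"{layer_salt}:{unit_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def assign(unit_id: str, layer_salt: str, allocations: dict) -> Optional[str]:
    """Return the arm whose bucket range contains this unit, or None."""
    bucket = bucket_for(unit_id, layer_salt)
    for arm, buckets in allocations.items():
        if bucket in buckets:
            return arm
    return None  # unit is not part of any experiment in this layer


# Hypothetical layer: 10% control, 10% treatment, the rest unallocated.
checkout_layer = {"control": range(0, 100), "treatment": range(100, 200)}

arm = assign("user-12345", "checkout-layer-v1", checkout_layer)

exposure_log = []
if arm is not None:
    # Record exposure only when the unit actually reaches the experimental
    # code path, so analysis can separate "assigned" from "exposed".
    exposure_log.append({"unit": "user-12345", "arm": arm})
```

Using a different salt per layer keeps assignments in one layer statistically independent of assignments in another, which is one common way to limit interference between overlapping experiments.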

General / microsoft.com

mimalloc: A new, high-performance, scalable memory allocator for the modern era

Microsoft Research's mimalloc is a drop-in malloc replacement that uses per-thread heaps and atomic operations to scale to hundreds of threads and memory footprints exceeding 500 GiB, making it suitable for LLM workloads. Its compact codebase of roughly 12K lines delivers bounded worst-case times and low fragmentation, driving adoption in free-threaded (no-GIL) CPython 3.13+, Unreal Engine, and Bing. Its Rust wrapper exceeds 100K daily downloads.

Cloud / infoq.com

AWS Replaces Fat-Tree Data Center Networks with Random Graph Theory, Cutting Routers by 69%

AWS has deployed Resilient Network Graphs (RNG), based on quasi-random graph theory, as the default network topology for new non-GPU data centers, marking the first large-scale production use of expander-based fabrics. Dropping the fat-tree hierarchy cuts the router count by 69%, boosts throughput by up to 33%, and reduces network power consumption by 40%. The architecture relies on ShuffleBox, a passive optical device that creates random logical connectivity without adding latency or drawing power, and Spraypoint, a custom distributed protocol that sprays traffic across multiple paths to compensate for the lack of hierarchy.
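
To give a feel for why a random, non-hierarchical fabric can still keep paths short, here is a small Python sketch that builds a random regular graph and measures hop counts with a breadth-first search. It is not AWS's construction: the pairing-model generator, the node count, and the port count are assumptions chosen for illustration, and the retry loop is only practical for small port counts.

```python
import random
from collections import deque


def random_regular_graph(n: int, d: int, seed: int = 0) -> dict:
    """Pairing-model sketch: give each node d 'ports', shuffle all ports,
    pair them up, and retry if a self-loop or duplicate link appears."""
    if (n * d) % 2 or d >= n:
        raise ValueError("need n * d even and d < n")
    rng = random.Random(seed)
    while True:
        ports = [node for node in range(n) for _ in range(d)]
        rng.shuffle(ports)
        adj = {node: set() for node in range(n)}
        ok = True
        for a, b in zip(ports[::2], ports[1::2]):
            if a == b or b in adj[a]:
                ok = False  # not a simple graph, start over
                break
            adj[a].add(b)
            adj[b].add(a)
        if ok:
            return adj


def hops_from(adj: dict, source: int) -> dict:
    """Breadth-first search returning the hop count to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Hypothetical fabric: 64 routers with 4 inter-router links each.
fabric = random_regular_graph(64, 4, seed=1)
dist = hops_from(fabric, 0)
print("reachable routers:", len(dist), "max hops from router 0:", max(dist.values()))
```

Every router has the same number of links and there is no spine or core tier, so traffic has many roughly equal-length paths to choose from, which is the property a path-spraying protocol can exploit.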

AI/ML / dev.to

Extending a MCP/A2A Currency Agent with A2UI

A tutorial extends a currency agent built on Google's A2A (Agent2Agent) and MCP protocols with A2UI, enabling real-time streaming of interactive UI components such as charts and approval forms. The agent uses the ADK framework (which supports Python, Go, and Java) and Antigravity CLI, the Gemini CLI successor. It requires a Google Cloud project with gemini-2.5-flash and ADK 2.1.0. The tutorial demonstrates multi-agent interoperability with dynamic UIs, using A2A for agent communication and A2UI for custom frontends.

AI/ML / cncf.io

Securing CI/CD for an open source project: Controlling who runs what

This article likely discusses recent supply chain attacks (e.g., Axios npm compromise, LiteLLM PyPI hijack) and presents strategies for securing CI/CD pipelines in open source projects, specifically focusing on access control and authorization of pipeline executions.

AI/ML / dev.to

Run AI Coding Agents Safely with Docker Sandboxes

Docker Sandboxes provide microVM environments for AI coding agents like Claude Code, Codex, and Cursor, isolating them from the host system to prevent arbitrary command execution and untrusted file modifications. The sandboxes offer network policy controls (Open, Balanced, Locked Down) via the `sbx` CLI, with Balanced allowing predefined domains such as AI provider APIs and package managers. Credentials are stored on the host with sandboxes seeing only sentinel values, and secrets can be set globally or per project.
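
The two ideas in this summary, a domain allowlist and sentinel credentials, can be sketched in a few lines of Python. This is a conceptual illustration only and does not reflect how Docker Sandboxes or the `sbx` CLI are implemented: the function names, the example domains, the sentinel string, and the behavior assumed for each policy level are all hypothetical.

```python
# Illustrative allowlist for a "Balanced"-style policy (assumed domains).
BALANCED_DOMAINS = {"api.anthropic.com", "pypi.org", "registry.npmjs.org"}

# Real secrets live only on the host side; the sandbox sees the sentinel key.
HOST_SECRETS = {"SENTINEL_API_KEY": "real-key-kept-on-host"}


def is_allowed(policy: str, host: str) -> bool:
    """Decide whether an outbound request may leave the sandbox."""
    if policy == "open":
        return True
    if policy == "locked_down":
        return False  # assumption: this level blocks all outbound traffic
    if policy == "balanced":
        return host in BALANCED_DOMAINS or any(
            host.endswith("." + domain) for domain in BALANCED_DOMAINS
        )
    raise ValueError(f"unknown policy: {policy}")


def inject_credentials(headers: dict) -> dict:
    """Host-side step: swap sentinel values for the real secrets."""
    resolved = {}
    for name, value in headers.items():
        for sentinel, secret in HOST_SECRETS.items():
            value = value.replace(sentinel, secret)
        resolved[name] = value
    return resolved


# The agent inside the sandbox only ever handles the sentinel.
sandbox_headers = {"Authorization": "Bearer SENTINEL_API_KEY"}
if is_allowed("balanced", "api.anthropic.com"):
    outbound_headers = inject_credentials(sandbox_headers)
```

Because the substitution happens outside the sandbox, a compromised agent that dumps its environment or request headers would only leak the placeholder, not the credential itself.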