The Wire — 2026-05-24

AWS MCP Server Reaches GA with Full API Coverage and IAM-Based Governance

AWS's managed MCP server is now GA, providing AI coding agents with IAM-governed access to all AWS APIs, documentation, and sandboxed Python execution via a standard interface. Part of the open-source Agent Toolkit, it integrates with Claude Code and Cursor, using OAuth 2.1 with a local proxy for IAM credentials, though practitioners note the lack of action-restricting gateways.

Why it matters

For a Solutions Architect focused on agent orchestration and cloud infrastructure, this offers a standardized, auditable path to connect AI agents to AWS services—addressing governance and security gaps that have hindered production deployments.

General / dev.to

I stress-tested Gemma 4 E4B's 128K context on a laptop GPU — recall is great, prefill is not

Stress-testing Gemma 4 E4B (Q4_K_M, ~9.6 GB) on an RTX 5050 laptop with 8 GB VRAM showed perfect recall across 5K–100K context in a needle-in-a-haystack test, but time to first token (prefill) scaled nearly linearly from 4s at 5K to 72s at 100K, while generation throughput dropped only 26% (9.2→6.8 tok/s). The author defines three practical zones—interactive (<20K), research-assistant (20–60K), batch (60–100K)—and provides a ~30-line Python rig on Ollama 0.24.0 to reproduce the results.
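The zones and the throughput figure above reduce to simple arithmetic, sketched here in Python. This is not the author's rig: the function names are hypothetical, and the handling of the 20K and 60K boundaries is an assumption, since the summary gives overlapping ranges. The response fields `prompt_eval_duration`, `eval_count`, and `eval_duration` are reported by Ollama's generate API in nanoseconds. Prompt evaluation time is used as a stand-in for time to first token and excludes model load time.

```python
def zone(context_tokens: int) -> str:
    """Map a context length to the three zones named in the summary."""
    if context_tokens < 20_000:
        return "interactive"
    if context_tokens <= 60_000:   # assumption: 60K itself counts as research-assistant
        return "research-assistant"
    if context_tokens <= 100_000:
        return "batch"
    return "untested"              # the summary reports nothing above 100K


def throughput_drop_percent(start_tok_s: float, end_tok_s: float) -> float:
    """Relative drop in generation speed, in percent."""
    return (start_tok_s - end_tok_s) / start_tok_s * 100


def metrics_from_response(resp: dict) -> dict:
    """Derive prefill time and decode speed from an Ollama generate response.

    Durations in the response are in nanoseconds.
    """
    prefill_s = resp["prompt_eval_duration"] / 1e9
    decode_tok_s = resp["eval_count"] / (resp["eval_duration"] / 1e9)
    return {"prefill_s": prefill_s, "decode_tok_s": decode_tok_s}


if __name__ == "__main__":
    for n in (5_000, 40_000, 100_000):
        print(n, zone(n))
    print(f"{throughput_drop_percent(9.2, 6.8):.0f}% drop")
```

Feeding each measured response through `metrics_from_response` and grouping by `zone` would give a table comparable to the one the article describes, assuming the same model and context sizes.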

AI/ML / dev.to

What Is WebMCP? The Google I/O 2026 Web Standard That Changes AI Agent Tool Use

WebMCP is a proposed open web standard from Google that lets developers annotate JS functions and HTML forms so AI agents can call them as typed tools, replacing brittle DOM scraping or custom API integrations. The origin trial starts in Chrome 149, enabling agents like Gemini to interact with websites reliably via a manifest. If adopted, this would turn any website into a tool surface for agent orchestration, addressing the long-tail integration problem for browser-based AI agents.

AI/ML / thenewstack.io

OpenClaw passed 300,000 GitHub stars. Then Google launched Spark.

OpenClaw, Peter Steinberger's open-source personal agent, surpassed 300,000 GitHub stars by April, offering self-hosted control on a Mac mini drawing 7 watts. Google countered at I/O with Gemini Spark, a 24/7 agent built on Gemini 3.5 Flash and the Antigravity stack, running on Google Cloud VMs with deep Gmail/Docs/Sheets integration. Both converge on MCP for tool connectivity, but the substrate—self-hosted metal vs. managed cloud—determines who holds credentials and context, with Chinese regulators already flagging OpenClaw's local security risks.

AI/ML / dev.to

Gemma 4 on Android: Tricks for Faster On-Device Inference

Optimizing on-device inference with Gemma 4 E2B on Android using LiteRT-LM 0.12.0 requires careful backend handling: GPU via OpenCL can deliver 52 tok/s on high-end devices like the S26 Ultra, but silently falls back to CPU (2–5 tok/s) on mid-range hardware, and NPU initialization risks native crashes due to driver fragmentation. Prefill latency (time to first token) is often a bigger bottleneck than decode speed, especially with long inputs, so streaming tokens and capping output length are critical UX mitigations. The model uses the .litertlm format from Hugging Face (gated, requires read token) and must not be confused with GGUF.
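To see why capping output length matters once the GPU backend falls back to CPU, here is a rough Python sketch of the decode-time arithmetic using the throughput figures quoted above. The function names, the 256-token cap, and the 10-second budget are illustrative assumptions, not values from the article, and prefill time is ignored entirely.

```python
def decode_seconds(output_tokens: int, tok_per_s: float) -> float:
    """Time spent generating output_tokens at a steady decode rate."""
    return output_tokens / tok_per_s


def max_tokens_for_budget(budget_s: float, tok_per_s: float) -> int:
    """Largest output cap that fits a decode-time budget."""
    return int(budget_s * tok_per_s)


if __name__ == "__main__":
    # Decode rates from the summary: GPU on a high-end device, CPU fallback range.
    rates = {"gpu": 52.0, "cpu_best": 5.0, "cpu_worst": 2.0}
    for name, rate in rates.items():
        secs = decode_seconds(256, rate)          # hypothetical 256-token cap
        cap = max_tokens_for_budget(10.0, rate)   # hypothetical 10 s budget
        print(f"{name}: 256 tokens in {secs:.1f}s, 10s budget allows {cap} tokens")
```

Under these assumptions, a cap tuned for the GPU path would leave a CPU-fallback user waiting far longer, which is one reason to pick the cap after detecting which backend actually initialized.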