31B — Gemma 4 Deployment with NVIDIA L4, MCP, Cloud Run, and Antigravity CLI

8.2 relevance

Similar to #4 but for 31B model, still highly actionable and relevant.

2026-06-02 DevTools dev.to

31B — Gemma 4 Deployment with NVIDIA L4, MCP, Cloud Run, and Antigravity CLI

Summary

Deploying Gemma 4 on Google Cloud Run with NVIDIA L4 GPUs and vLLM, this project uses Python MCP tools and Antigravity CLI (successor to Gemini CLI) to provision containers, manage the model, and run observability/performance tests. The MCP server communicates via stdio transport within the same local environment, with environment setup scripts for GCP authentication and variable management.

Key Takeaways

Leverage Antigravity CLI and the Python MCP SDK to build a self-hosted vLLM infrastructure agent that automates provisioning, deployment, and monitoring of Gemma 4 on Cloud Run with L4 GPUs.

Why it matters

For platform engineers deploying large language models in serverless GPU environments, this combines Cloud Run's scalability with MCP-based tooling to create a reproducible, agent-driven DevOps workflow for self-hosted vLLM deployments.

Author

xbill