Cloudflare Identifies Query Planning Bottleneck in ClickHouse
Cloudflare traced a slowdown in its billing pipeline to lock contention in ClickHouse's query planning stage, where 45% of CPU time was spent in the filterPartsByPartition function waiting on a single mutex. The root cause emerged after a migration to a per-tenant partitioning scheme, which increased the number of data parts without changing query access patterns. The team patched ClickHouse by replacing the exclusive lock with a shared lock, removing per-query copies of the parts list, and improving part filtering. Together these changes cut query durations by 50% and decoupled latency from part count growth.
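The three changes can be illustrated with a small sketch. This is not ClickHouse's code (which is C++) and not Cloudflare's patch. `SharedLock`, `PartsIndex`, and the partition-keyed layout are hypothetical names and design choices, used only to show the general idea: readers take a shared lock so they no longer serialize, they receive a reference to an immutable snapshot instead of a per-query copy, and lookup cost depends on the parts in one partition rather than on the total part count.

```python
import threading
from contextlib import contextmanager


class SharedLock:
    """Minimal reader-writer lock: many concurrent readers or one writer.

    Illustrative only; it does not protect writers from starvation.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def shared(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


class PartsIndex:
    """Hypothetical parts catalogue keyed by partition (for example, a tenant)."""

    def __init__(self):
        self._lock = SharedLock()
        self._by_partition = {}  # partition -> immutable tuple of part names

    def add_part(self, partition, part):
        # Writers are rare: build a new tuple and swap it in under the exclusive lock.
        with self._lock.exclusive():
            current = self._by_partition.get(partition, ())
            self._by_partition[partition] = current + (part,)

    def parts_for(self, partition):
        # Planners share the lock and get a reference to an immutable tuple,
        # so no per-query copy of the parts list is made.
        with self._lock.shared():
            return self._by_partition.get(partition, ())


index = PartsIndex()
for tenant in range(1000):
    index.add_part(f"tenant-{tenant}", f"part-{tenant}-0")
index.add_part("tenant-42", "part-42-1")
print(index.parts_for("tenant-42"))  # ('part-42-0', 'part-42-1')
```

Because each stored tuple is never mutated in place, a planner can keep using the snapshot after releasing the lock, and adding parts for other tenants does not lengthen the lookup for `tenant-42`.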