[GitHub Trending] run-llama/liteparse

Relevance: 7.7
Score breakdown: technical depth 7, novelty 4, actionability 9, community 8, strategic 4, personal 8

An open-source document parser that is highly actionable for data engineering tasks.

2026-05-29 Open Source github.com
A fast, helpful, and open-source document parser.
Summary

LiteParse is an open-source PDF parser from the LlamaIndex team that runs locally. It extracts text with spatial bounding boxes via PDFium, supports pluggable OCR (Tesseract or a custom HTTP server), and generates PNG screenshots for LLM agents. Output is either structured JSON or layout-preserved text. It runs across platforms, with bindings for Rust, Node.js, Python, and WASM, and serves as a lightweight alternative to cloud-based document parsers for simple documents, deferring complex cases to its sibling, LlamaParse.
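
To make "spatial text with bounding boxes" concrete, here is a minimal Python sketch of how a downstream step might turn such output into layout-ordered lines. The JSON shape (`items` with `text`, `x`, `y`, `width`, `height`) and the names `TextItem`, `items_from_json`, and `to_layout_lines` are assumptions made for illustration; they are not taken from LiteParse's documented schema or API.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class TextItem:
    # Hypothetical field names; check the real output schema before relying on them.
    text: str
    x: float       # left edge of the bounding box
    y: float       # top edge of the bounding box
    width: float
    height: float


def items_from_json(payload: str) -> List[TextItem]:
    """Parse an assumed {"items": [...]} payload into TextItem records."""
    data = json.loads(payload)
    return [TextItem(**raw) for raw in data["items"]]


def to_layout_lines(items: List[TextItem], y_tolerance: float = 2.0) -> List[str]:
    """Group items into lines by top edge, then order each line left to right."""
    lines: List[List[TextItem]] = []
    for item in sorted(items, key=lambda i: (i.y, i.x)):
        # An item joins the current line if its top edge is close to that
        # line's first item; otherwise it starts a new line.
        if lines and abs(lines[-1][0].y - item.y) <= y_tolerance:
            lines[-1].append(item)
        else:
            lines.append([item])
    return [
        " ".join(i.text for i in sorted(line, key=lambda i: i.x))
        for line in lines
    ]


sample = json.dumps({
    "items": [
        {"text": "World", "x": 60, "y": 10.5, "width": 30, "height": 10},
        {"text": "Hello", "x": 10, "y": 10, "width": 40, "height": 10},
        {"text": "Second", "x": 10, "y": 30, "width": 40, "height": 10},
    ]
})
layout_lines = to_layout_lines(items_from_json(sample))
```

The tolerance-based grouping is a deliberately simple heuristic; multi-column pages or rotated text would need something more careful.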

Key Takeaways
  • Integrate LiteParse as the default local PDF parser in your LLM agent stacks, using its spatial bounding boxes and screenshot capabilities for context, and escalate to LlamaParse only when documents exceed local parsing limits.
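
The local-first, escalate-when-needed pattern in the takeaway above can be sketched as a small routing function. This is an illustration under stated assumptions: `local_parse` and `cloud_parse` are hypothetical callables that you would wrap around whichever parsers you use (they are not LiteParse or LlamaParse API calls), and the "too little text per page" check is only one possible stand-in for "exceeds local parsing limits".

```python
from typing import Callable, List, Tuple

# Hypothetical parser signature: file path in, one text string per page out.
Parser = Callable[[str], List[str]]


def needs_escalation(pages: List[str], min_chars_per_page: int = 50) -> bool:
    """Assumed heuristic: escalate when the local parse yields almost no text."""
    if not pages:
        return True
    average = sum(len(page.strip()) for page in pages) / len(pages)
    return average < min_chars_per_page


def parse_with_fallback(
    path: str,
    local_parse: Parser,
    cloud_parse: Parser,
    min_chars_per_page: int = 50,
) -> Tuple[str, List[str]]:
    """Try the local parser first; fall back to the cloud parser on failure
    or on a suspiciously thin result. Returns (route, pages)."""
    try:
        pages = local_parse(path)
    except Exception:
        return "cloud", cloud_parse(path)
    if needs_escalation(pages, min_chars_per_page):
        return "cloud", cloud_parse(path)
    return "local", pages


# Stub parsers standing in for real integrations.
def good_local(path: str) -> List[str]:
    return ["A page with plenty of extracted text. " * 5]


def thin_local(path: str) -> List[str]:
    return ["", " "]


def failing_local(path: str) -> List[str]:
    raise RuntimeError("local parser could not read the file")


def stub_cloud(path: str) -> List[str]:
    return ["cloud result"]


route_good = parse_with_fallback("simple.pdf", good_local, stub_cloud)[0]
route_thin = parse_with_fallback("scanned.pdf", thin_local, stub_cloud)[0]
route_fail = parse_with_fallback("broken.pdf", failing_local, stub_cloud)[0]
```

Returning the route alongside the pages makes it easy to log how often documents leave the local path, which is useful when tuning the threshold.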
Why it matters

For AI agent orchestration and data engineering pipelines, LiteParse provides a fast, local, open-source tool to feed structured PDF data into LLM workflows without cloud dependencies or API costs, fitting directly into RAG and data extraction chains.

Author

run-llama