[GitHub Trending] opendatalab/MinerU

7.3 relevance
Score Breakdown
technical depth: 7
novelty: 7
actionability: 8
community: 7
strategic: 6
personal: 9

Scored daily by a customisable AI persona to surface the most relevant engineering leadership news.

Transforms PDFs to LLM-ready formats, directly enabling agentic data pipelines.

AI/ML github.com
Transforms complex documents like PDFs and Office docs into LLM-ready Markdown/JSON for your agentic workflows.
Summary

MinerU is an open-source document parser for LLM, RAG, and agent workflows. It converts PDF and Office formats to structured Markdown or JSON using a dual VLM+OCR engine that covers 109 languages. It integrates natively with LangChain, Dify, and MCP Server, and supports three inference backends (pipeline, vlm-engine, hybrid-engine) for CPU/GPU and domestic (Chinese-made) AI chips. The 3.4 release upgraded the pipeline OCR to PP-OCRv6, which the project reports as 11% more accurate and 2x faster.
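
As a rough illustration of choosing one of the three backends named above, the sketch below only builds a command line and does not run anything. The `mineru` executable name and the `-p`, `-o`, and `-b` flags are assumptions based on earlier MinerU releases and may not match the version described here. The backend strings are copied from the summary rather than from the tool's help output, and `build_mineru_command` is a hypothetical helper, not part of the project.

```python
import shlex
from pathlib import Path

# Backend names exactly as listed in the summary; the strings the real CLI
# accepts may be spelled differently (assumption).
BACKENDS = ("pipeline", "vlm-engine", "hybrid-engine")


def build_mineru_command(input_path, output_dir, backend="pipeline"):
    """Return an argv list for a hypothetical MinerU CLI call (nothing is executed)."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; expected one of {BACKENDS}")
    # Assumption: executable `mineru` with -p (input), -o (output dir), -b (backend).
    return [
        "mineru",
        "-p", str(Path(input_path)),
        "-o", str(Path(output_dir)),
        "-b", backend,
    ]


if __name__ == "__main__":
    # Print the command so it can be checked against `mineru --help` before use.
    print(shlex.join(build_mineru_command("report.pdf", "out")))
```

Returning an argument list rather than a shell string keeps file names with spaces safe if the command is later passed to `subprocess.run`. Checking the printed command against the installed version's help text is the safer first step.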

Author

opendatalab