
opendataloader-pdf

GitHub Repo · Pretty sure · shipping beats vaporware
https://github.com/opendataloader-project/opendataloader-pdf

Legit PDF parser with real benchmarks and an accessibility angle, not rebadged wrapper nonsense, but the enterprise roadmap is doing the heavy lifting until Q2 2026.

Slop 35% · Signal 40% · Science 25%

Shipping today: deterministic PDF→Markdown/JSON extraction, a hybrid AI mode, and benchmarks (#1 overall at 0.90, table 0.93) that look credible against competitors. Real code in three SDKs (Python/Node/Java). The accessibility auto-tagging pitch is smart, but (and this matters) it's *not shipping until Q2 2026*. That's the entire differentiation vs Docling/Marker, and it's a roadmap promise. The enterprise upsell (PDF/UA export, accessibility studio) is standard playbook, but the free tier is genuinely useful fo...

5491 stars · Java · 2026-03-19 · 310 days old
