How does it compare to docling?


Docling primarily uses AI models to extract PDF content. This project looks like it uses a custom parser written in Java, built atop veraPDF.


Correct me if I am wrong, but Docling can do both. Among other strategies, it also has a non-AI pipeline to determine the layout (based on qpdf, I believe). So these projects are not that different.


Docling does have a PDF parser, but my understanding is that the parser is mainly used to break a PDF document into chunks, which are then handed off to various specialized models. From its docs: "The main purpose of Docling is to run local models which are not sharing any user data with remote services."



