I haven't seen anything better. It started as a PoC, so I decided not to include automatic table detection on the page and instead require the user to draw a box around the table.
I use Tabula under the hood for the cell/row detection, and it is really good provided the correct mode is selected for the type of table. The modes are stream (find cells by spacing) and lattice (find cells by ruling lines).
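To give a feel for what "find cells by spacing" means, here is a rough Python sketch of the stream idea. It is not Tabula's actual code, just a toy under my own assumptions: words come in as hypothetical `(x0, x1, text)` tuples, and `min_gap` is a made-up threshold for how wide a whitespace gap must be to count as a column break. Lattice mode would look for drawn ruling lines instead and is not shown here.

```python
from typing import List, Tuple

# Hypothetical word record: (x0, x1, text), the horizontal extent of a word on the page.
Word = Tuple[float, float, str]


def column_boundaries(words: List[Word], min_gap: float = 5.0) -> List[float]:
    """Return x positions that separate columns, found from whitespace gaps."""
    # Merge the horizontal extents of all words into covered intervals.
    spans = sorted((x0, x1) for x0, x1, _ in words)
    merged: List[List[float]] = []
    for x0, x1 in spans:
        if merged and x0 <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], x1)
        else:
            merged.append([x0, x1])
    # A gap of at least min_gap between covered intervals is treated as a column break.
    return [
        (left[1] + right[0]) / 2
        for left, right in zip(merged, merged[1:])
        if right[0] - left[1] >= min_gap
    ]


def split_row(row: List[Word], boundaries: List[float]) -> List[str]:
    """Assign each word in a row to a column by the midpoint of its extent."""
    cells: List[List[str]] = [[] for _ in range(len(boundaries) + 1)]
    for x0, x1, text in sorted(row):
        mid = (x0 + x1) / 2
        col = sum(mid > b for b in boundaries)  # number of boundaries left of the word
        cells[col].append(text)
    return [" ".join(c) for c in cells]


header = [(10, 40, "Name"), (100, 130, "Qty"), (200, 240, "Price")]
row = [(10, 30, "Red"), (33, 60, "apple"), (105, 115, "3"), (205, 235, "1.20")]
bounds = column_boundaries(header + row)
print(split_row(row, bounds))
```

This also hints at why picking the wrong mode hurts: a table with ruling lines but tight spacing gives a spacing-based approach very little whitespace to work with.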
The OCR/OpenCV path seemed to work fine too, as long as the text isn't too blurry. Here is a GIF of the OCR/OpenCV running on an example image PDF: https://lh3.googleusercontent.com/-OobUBBtnydg/X6Vn_Ls3juI/A...