#13 Just extract text?

Milestone: v1.0 (example)
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2016-07-19
Created: 2016-07-19
Creator: hmijail
Private: No

It would be very useful to have an option to dump only the text contained in the PDF. It looks like one of the files created by the Tesseract processing (as seen when using -debug, at least with Tesseract 3.04.00) is a .txt dump. Of course, that dump is per page, so the per-page files would have to be collected, concatenated in page order and saved as a single file.
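
As a rough illustration of the "collect, concatenate and save" step, here is a minimal Python sketch. It is not part of the tool. It assumes the per-page .txt files are left in a single working directory and that their names contain the page number; the function names, the "*.txt" pattern and the form-feed separator are all hypothetical choices made for this example.

```python
import re
from pathlib import Path


def natural_key(path):
    # Split the file name into digit and non-digit runs so that
    # "page_10.txt" sorts after "page_2.txt" (plain string sort would not).
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", path.name)]


def concatenate_page_texts(work_dir, output_file, pattern="*.txt", separator="\f"):
    """Join per-page text files from work_dir into one output file.

    Assumption: every file matching `pattern` is one page of OCR text and
    the page number is part of the file name. Returns the number of pages.
    """
    work_dir = Path(work_dir)
    output_file = Path(output_file)
    pages = sorted(
        # Skip the output file itself in case it lives in the same directory.
        (p for p in work_dir.glob(pattern) if p.resolve() != output_file.resolve()),
        key=natural_key,
    )
    texts = [p.read_text(encoding="utf-8", errors="replace") for p in pages]
    output_file.write_text(separator.join(texts), encoding="utf-8")
    return len(pages)
```

For example, concatenate_page_texts("some_debug_dir", "document.txt") would write every page to document.txt, separated by form feeds; the directory name here is a placeholder, since the actual location of the files kept by -debug may differ.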
