Just extract text?
It would be very useful to have an option to dump only the text contained in the PDF. It looks like one of the files created by the Tesseract processing (visible with -debug, at least with Tesseract 3.04.00) is a .txt dump, but of course this is only page-by-page, so the per-page files would have to be recovered, concatenated and saved.
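
As a stopgap, the recover-and-concatenate step could be done by hand with something like the sketch below. It is only an illustration: it assumes the per-page .txt files have already been copied into one directory and that their names contain the page number. The function names, the `*.txt` pattern and the directory layout are my own placeholders, not anything the tool or Tesseract guarantees.

```python
import re
from pathlib import Path


def natural_key(path):
    # Split the file name into digit and non-digit runs so that
    # "page_10.txt" sorts after "page_2.txt" rather than before it.
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", path.name)]


def concat_page_texts(page_dir, out_file, pattern="*.txt"):
    """Join per-page text files from page_dir into out_file, in page order.

    Assumption: every file matching `pattern` in page_dir is one page of
    OCR output and its name carries the page number.
    """
    out_path = Path(out_file).resolve()
    # Skip the output file itself in case it lives in the same directory.
    pages = sorted(
        (p for p in Path(page_dir).glob(pattern) if p.resolve() != out_path),
        key=natural_key,
    )
    with open(out_path, "w", encoding="utf-8") as out:
        for page in pages:
            text = page.read_text(encoding="utf-8", errors="replace")
            # Normalise trailing newlines so pages are separated by one line break.
            out.write(text.rstrip("\n") + "\n")
    # Return the file names in the order they were written, for checking.
    return [p.name for p in pages]
```

Calling `concat_page_texts("pages", "document.txt")` would then write one combined text file and return the page file names in the order used, which makes it easy to check that the pages were not shuffled.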