Could not determine number of pages
Brought to you by:
tobias-elze
Hi,
I've just updated the version of pdfsandwich from 0.1.3 to 0.1.4 on Mac OS X 10.7 and Fedora 23. That is, I ran svn update and then make and make install.
Now I get this error on both machines and pdfsandwich does nothing:
Fatal error: exception Failure("Error: Could not determine number of pages of file /var/folders/v3/gypkpph90tv1vr56dn_q_xc40000gr/T/pdfsandwich_inputfiled3c818.pdf")
The file reported in the error message exists and seems to be a link to the PDF file that should be OCR-ed. But I can't open it, and the link may actually be wrong: it points to the .pdf file as if it were in the tmp folder, which it isn't.
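A quick way to confirm whether such a temporary link is dangling is sketched below. This is only a minimal Python sketch: the helper name `describe_link` is made up for illustration and is not part of pdfsandwich.

```python
import os
import sys


def describe_link(path):
    """Report whether `path` is a symlink and whether its target exists."""
    is_link = os.path.islink(path)
    # os.readlink returns the raw target string stored in the link.
    target = os.readlink(path) if is_link else None
    # os.path.exists follows symlinks, so it is False for a dangling link.
    resolves = os.path.exists(path)
    return {"is_link": is_link, "target": target, "resolves": resolves}


# Inspect every path given on the command line (hypothetical usage).
for p in sys.argv[1:]:
    print(p, describe_link(p))
```

If `is_link` is true and `resolves` is false, the link target does not exist, which would match the suspicion that the link points into the tmp folder instead of the original location.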
Oh, sorry to hear that. If you urgently need 0.1.4, you may want to download the sources from SourceForge instead of checking out via svn, as I'm currently fixing some other bugs, and this bug seems to be a side effect of one of my bug-fixing attempts. But I'm optimistic that we can fix this quickly.
Could you please tell me the exact command line call which led to this error?
Thanks,
Tobias
Hi, thanks for the quick reply. I do not need version 0.1.4 urgently, so I just checked out revision r49 via svn and it works again. (I'm just too lazy to download things manually ;))
I did not use any options to get the error, just this:
pdfsandwich He\ \&\ Kowler\ 1991.pdf
(I also tried with a PDF file without spaces in its name, but it didn't work either.)
Okay, it should be fixed now. Could you try it out?
Thanks,
Tobias
I tried on a different PC (also Fedora 23), and the OCR seems to work: it takes some time, and the corresponding progress messages are written to the console.
But, the last step fails with this error:
and I do not get the output file at the location where I started pdfsandwich.
The complete output file however is in the /tmp folder.
(Off-topic: The main reason I recently updated pdfsandwich was that one specific file resulted in poorly readable text in the OCRed file (it looks like low resolution). The original file looks good, but has no OCR in it. What options could I use to get good output? I already tried -resolution 500 and -noimage with no success.)
Thanks for noting this, I fixed that now. Feel free to try it out.
-noimage will work only together with hocr2pdf and will definitely not solve your problem. The first thing to try is to skip pre-processing by unpaper (option: -nopreproc), because sometimes unpaper messes things up. Does that help at all? Feel free to send one of these pages directly to me so that I can have a look.
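For anyone scripting this, the suggestion above can be sketched as a small Python helper that assembles the command line. It uses only options that appear in this thread (-nopreproc, -resolution, -o); the function name `build_command` and the file names are placeholders, not part of pdfsandwich.

```python
import shlex


def build_command(infile, outfile, preprocess=True, resolution=None):
    """Assemble a pdfsandwich call as an argument list (nothing is executed)."""
    cmd = ["pdfsandwich"]
    if not preprocess:
        # Skip the unpaper pre-processing step.
        cmd.append("-nopreproc")
    if resolution is not None:
        cmd += ["-resolution", str(resolution)]
    # Same argument order as the invocation shown in this thread:
    # options, input file, then -o with the output file.
    cmd += [infile, "-o", outfile]
    return cmd


# Placeholder file names; print the shell-quoted command for inspection.
print(shlex.join(build_command("input.pdf", "output_ocr.pdf", preprocess=False)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids manual escaping of file names that contain spaces or `&`.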
Tobias
Alright, it works now. Thanks for the quick fix.
Regarding the offtopic problematic PDF:
Here is the PDF file:
https://kartoffelsalat.ddns.net:8001/f/571989e24d/?raw=1
And this is after pdfsandwich with the -nopreproc option, sadly still with worse quality:
https://kartoffelsalat.ddns.net:8001/f/da6dad419a/?raw=1
I have installed pdfsandwich on Ubuntu and I'm trying to run the command below to convert a multi-page .tif file to a .pdf file, but it fails with the error message shown at the end of the output.
Can you please help me with this?
$ /usr/bin/pdfsandwich -verbose -lang spa+eng+fra Sample_3_Multi_page.tif -o Sample_3_Multi_page.pdf
pdfsandwich version 0.1.4
Checking for convert:
convert -version
Version: ImageMagick 6.8.9-9 Q16 x86_64 2018-07-10 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2014 ImageMagick Studio LLC
Features: DPC Modules OpenMP
Delegates: bzlib cairo djvu fftw fontconfig freetype jbig jng jpeg lcms lqr ltdl lzma openexr pangocairo png rsvg tiff wmf x xml zlib
Checking for unpaper:
unpaper -version
6.1
Checking for tesseract:
tesseract -v
tesseract 3.04.01
leptonica-1.73
libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0
Checking for gs:
gs -v
GPL Ghostscript 9.18 (2015-10-05)
Copyright (C) 2015 Artifex Software, Inc. All rights reserved.
Input file: "Sample_3_Multi_page.tif"
Output file: "Sample_3_Multi_page.pdf"
Fatal error: exception Failure("Error: Could not determine number of pages of file Sample_3_Multi_page.tif")
Thanks.
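In the log above, the failure occurs right after the input file is reported, and the input is a TIFF rather than a PDF. One possibility, which is an assumption and not confirmed anywhere in this thread, is that the page-count step expects PDF input. A minimal Python sketch for checking what kind of file is actually being passed in (the helper name `sniff_type` is hypothetical):

```python
def sniff_type(path):
    """Guess whether a file is a PDF or a TIFF from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(5)
    # PDF files normally begin with the header "%PDF-".
    if head.startswith(b"%PDF-"):
        return "pdf"
    # Classic TIFF: "II" + 42 (little-endian) or "MM" + 42 (big-endian).
    # BigTIFF uses a different version number and is not handled here.
    if head[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    return "unknown"
```

If this returns "tiff" for the input, converting it to PDF first (for example with the ImageMagick `convert` tool that the log shows is installed) and running pdfsandwich on the result may be worth trying.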