PDF Parser Command Line
xpdf file.pdf
xpdf file.pdf :18
OCR Commands
PDF Pages to Images
magick -density 150 file.pdf[1-6] -quality 90 -resize 75% pdfPage-%d.png
Image to Text
rem tesseract pdfPage-2.png pdfText/pdfPage-2.txt
rem tesseract pdfPage-[3-6].png pdfText/pdfPage-%d.md
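rem At the interactive cmd prompt, use a single percent sign:
FOR /L %y IN (3, 1, 6) do tesseract "pdfPage-%y.png" pdfText/pdfPage-%y
rem Inside a .bat file, double the percent signs:
rem FOR /L %%y IN (3, 1, 6) do tesseract "pdfPage-%%y.png" pdfText/pdfPage-%%y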
tesseract pdfPage-6.png pdfText/pdfPage-6 -l mal
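The two OCR steps above can also be driven from a single script. The following is a minimal Python sketch, not a definitive implementation: the function names (magick_command, tesseract_commands, run_pipeline) are hypothetical, and it assumes magick and tesseract are on PATH. It creates the pdfText folder first, on the assumption that tesseract will not create it. The bracketed range passed to magick is a zero-based frame index, so [1-6] selects the second through seventh pages of the PDF.

```python
import subprocess
from pathlib import Path


def magick_command(pdf, first, last, density=150, quality=90, resize="75%"):
    """Build the ImageMagick call from the notes.

    first/last are zero-based frame indices, as in file.pdf[1-6].
    """
    return [
        "magick", "-density", str(density), f"{pdf}[{first}-{last}]",
        "-quality", str(quality), "-resize", resize, "pdfPage-%d.png",
    ]


def tesseract_commands(first, last, out_dir="pdfText", lang=None):
    """Build one tesseract call per page image, like FOR /L (first, 1, last)."""
    commands = []
    for n in range(first, last + 1):  # FOR /L includes the end value
        cmd = ["tesseract", f"pdfPage-{n}.png", f"{out_dir}/pdfPage-{n}"]
        if lang:
            cmd += ["-l", lang]  # e.g. "mal", as in the last command above
        commands.append(cmd)
    return commands


def run_pipeline(pdf="file.pdf"):
    """Render the pages, then OCR images 3 through 6 (hypothetical driver)."""
    Path("pdfText").mkdir(exist_ok=True)  # assumed: the folder must exist
    subprocess.run(magick_command(pdf, 1, 6), check=True)
    for cmd in tesseract_commands(3, 6):
        subprocess.run(cmd, check=True)
```

Calling run_pipeline() runs the same commands as the notes. Passing lang="mal" to tesseract_commands adds the -l mal option, which needs the Malayalam language data installed.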