OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.

Features

  • Generates a searchable PDF/A file from a regular PDF
  • Places OCR text accurately below the image to ease copy / paste
  • Keeps the exact resolution of the original embedded images
  • When possible, inserts OCR information as a "lossless" operation without disrupting any other content
  • Optimizes PDF images, often producing files smaller than the input file
  • If requested, deskews and/or cleans the image before performing OCR
  • Distributes work across all available CPU cores

Project Samples

Project Activity

See All Activity >

Categories

PDF, OCR

License

Mozilla Public License 1.0 (MPL)

Follow OCRmyPDF

OCRmyPDF Web Site

Other Useful Business Software
Gemini 3 and 200+ AI Models on One Platform Icon
Gemini 3 and 200+ AI Models on One Platform

Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build generative AI apps with Vertex AI. Switch between models without switching platforms.
Start Free
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of OCRmyPDF!

Additional Project Details

Programming Language

Python

Related Categories

Python PDF Software, Python OCR Software

Registered

2023-11-17