Tesseract is an open source optical character recognition (OCR) engine and command line program. OCR is a technology that recognizes text characters within a digital image. The latest version of Tesseract focuses more heavily on line recognition, but it still supports the legacy Tesseract OCR engine, which recognizes character patterns.
Tesseract can recognize over 100 languages out of the box and can be trained to recognize others. It supports various output formats, including plain text, HTML, and PDF. It also supports Unicode (UTF-8).
Features
- OCR engine and command line program
- Line recognition and character pattern recognition
- Unicode (UTF-8) support
- Recognizes more than 100 languages, and can be trained to recognize others
- Supports various output formats
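The features above map onto a handful of command line options. The following is a minimal sketch in Python, assuming the `tesseract` binary is installed and on the PATH. The helper names `build_tesseract_command` and `run_ocr` and the file names are hypothetical, chosen only for illustration. The sketch relies on the `-l` option (language codes joined by `+`), the `--oem` option (0 for the legacy engine, 1 for the line-based LSTM engine), and trailing config names such as `txt`, `hocr` (HTML-based output), and `pdf` that select the output formats. The legacy engine also needs traineddata files that include the legacy model, which is an assumption about the local installation.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, languages=("eng",),
                            output_formats=("txt",), legacy_engine=False):
    """Return the argument list for one tesseract invocation (hypothetical helper)."""
    command = ["tesseract", image_path, output_base]
    # -l takes one or more language codes joined by '+', e.g. eng+deu
    command += ["-l", "+".join(languages)]
    # --oem 0 selects the legacy character-pattern engine,
    # --oem 1 selects the LSTM line recognizer
    command += ["--oem", "0" if legacy_engine else "1"]
    # Trailing config names (txt, hocr, pdf) choose the output formats
    command += list(output_formats)
    return command


def run_ocr(image_path, output_base, **options):
    """Run tesseract if it is on PATH; return False when it is not installed."""
    if shutil.which("tesseract") is None:
        return False
    command = build_tesseract_command(image_path, output_base, **options)
    # check=True raises CalledProcessError if tesseract exits with an error
    subprocess.run(command, check=True)
    return True
```

For example, `run_ocr("scan.png", "scan", languages=("eng", "deu"), output_formats=("txt", "pdf"))` would ask Tesseract to write `scan.txt` and `scan.pdf` for an English and German document, provided both language packs are installed.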
License
Apache License V2.0
User Reviews
- Enjoy this project for my mission
- Brilliant. Worked properly first time. Great code.
- Very good OCR project!
- Wow, good OCR. The release files are much older than those at http://code.google.com/p/tesseract-ocr/. I packaged Tesseract with gImageReader: http://sourceforge.net/projects/gimagereader/
- How to install on Windows XP?