How do I choose between Tesseract and OpenCV? [closed]

  • Tesseract is an OCR engine. It is used, developed and funded by Google specifically to read text from images, perform basic document segmentation and handle specific kinds of image input (a single word, line, paragraph or page, a limited dictionary, etc.).

  • OpenCV, on the other hand, is a computer vision library whose features include feature extraction and data classification. You can use it to build a simple letter segmenter and classifier that performs basic OCR, but the result is not a very good OCR engine. (I’ve built one from scratch in Python before; it was really inaccurate on input that deviated from the training data.)
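
To give a feel for what a "simple letter segmenter and classifier" involves, here is a minimal sketch of the idea. It is not my original code and it does not call OpenCV at all: to stay self-contained it works on tiny ASCII-art images in plain Python. With OpenCV you would typically do the same two steps with thresholding plus contour detection for the segmentation and a trained classifier for the recognition. Every name below (TEMPLATES, segment, distance, classify, read_text) is made up for illustration.

```python
# Toy OCR: cut the image into glyphs at blank columns, then label each glyph
# with the closest known template. All names here are hypothetical.

# "Training data": one 3x5 bitmap per letter ('#' = ink, '.' = background).
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def segment(rows):
    """Split an image (list of equal-length strings) into glyphs at blank columns."""
    width = len(rows[0])
    glyphs, start = [], None
    for x in range(width + 1):
        # Treat the position just past the right edge as a blank column,
        # so a glyph touching the edge is still closed off.
        inked = x < width and any(row[x] == "#" for row in rows)
        if inked and start is None:
            start = x
        elif not inked and start is not None:
            glyphs.append([row[start:x] for row in rows])
            start = None
    return glyphs


def distance(a, b):
    """Count differing pixels; bitmaps of different sizes never match."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return float("inf")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(glyph):
    """Return the nearest template's letter, or '?' if nothing is comparable."""
    best = min(TEMPLATES, key=lambda ch: distance(glyph, TEMPLATES[ch]))
    return best if distance(glyph, TEMPLATES[best]) != float("inf") else "?"


def read_text(rows):
    return "".join(classify(glyph) for glyph in segment(rows))


if __name__ == "__main__":
    clean = [
        "#...###.###",
        "#....#...#.",
        "#....#...#.",
        "#....#...#.",
        "###.###..#.",
    ]
    print(read_text(clean))
    # Same image with the gap between the first two letters removed.
    touching = [row[:3] + row[4:] for row in clean]
    print(read_text(touching))
```

The clean sample reads back as LIT, but once two letters touch, the column-based segmenter hands the classifier a single wide blob that matches no template and it comes out as "?". That is the kind of brittleness I mean: anything that deviates from what the classifier has seen (touching letters, a different size, a different font) breaks it.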

If you want to get a basic understanding of how hard OCR is, try OpenCV. Tesseract is for real OCR.
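
For comparison, here is a rough sketch of driving Tesseract for the kinds of input listed above (word, line, paragraph, page, limited character set). It assumes the `tesseract` command-line tool, version 4 or later, is installed and on your PATH. The helper names (`tesseract_command`, `run_ocr`) and the file name in the usage note are hypothetical, and mapping "paragraph" to page segmentation mode 6 (a single uniform block of text) is my own approximation.

```python
import shutil
import subprocess

# Page segmentation modes, as listed by `tesseract --help-psm`:
# 3 = fully automatic page, 6 = single uniform block, 7 = single line, 8 = single word.
PSM = {"page": 3, "paragraph": 6, "line": 7, "word": 8}


def tesseract_command(image_path, layout="page", lang="eng", whitelist=None):
    """Build (but do not run) a tesseract command that prints the text to stdout."""
    cmd = ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(PSM[layout])]
    if whitelist:
        # Restrict recognition to the given characters, e.g. digits only.
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def run_ocr(image_path, **options):
    """Run tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        tesseract_command(image_path, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Something like `run_ocr("price_tag.png", layout="line", whitelist="0123456789.")` would then treat the image as one line of digits. All the segmentation and classification work from the toy approach is handled inside the engine.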
