How do I segment a document using Tesseract and then output the resulting bounding boxes and labels?

Success. Many thanks to the people at the Pattern Recognition and Image Analysis Research Lab (PRImA) for producing tools to handle this. You can obtain them freely from their website or GitHub. Below I give the full solution for a Mac running OS X 10.10 with the Homebrew package manager. I use Wine to run Windows … Read more

Extracting code from photograph of T-shirt via OCR

You can probably type faster than you can clean up images and install OCR engines: #!/usr/bin/perl (my$d=q[AA GTCAGTTCCT CGCTATGTA ACACACACCA TTTGTGAGT ATGTAACATA CTCGCTGGC TATGTCAGAC AGATTGATC GATCGATAGA ATGATAGATC GAACGAGTGA TAGATAGAGT GATAGATAGA GAGAGA GATAGAACGA TC GATAGAGAGA TAGATAGACA G ATCGAGAGAC AGATA GAACGACAGA TAGATAGAT TGAGTGATAG ACTGAGAGAT AGATAGATTG ATAGATAGAT AGATAGATAG ACTGATAGAT AGAGTGATAG ATAGAATGAG AGATAGACAG ACAGACAGAT AGATAGACAG AGAGACAGAT TGATAGATAG ATAGATAGAT TGATAGATAG … Read more

Converting YUV -> RGB (image processing) -> YUV during onPreviewFrame in Android?

Although the documentation suggests that you can choose the format in which image data arrives from the camera, in practice you often have only one option: NV21, a YUV format. For lots of information on this format see http://www.fourcc.org/yuv.php#NV21 and for information on the theory behind converting it to RGB see http://www.fourcc.org/fccyvrgb.php. There … Read more
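As a rough illustration of the NV21 layout mentioned above, here is a minimal Python sketch of the YUV -> RGB step. It is not the answer's own code (which is cut off here) and is written in Python rather than Android Java purely to show the indexing. It assumes the usual NV21 arrangement: a full-resolution Y plane of width x height bytes, followed by interleaved V,U byte pairs, one pair per 2x2 block of pixels. The integer coefficients are the commonly used BT.601 video-range approximation, which is an assumption; a given camera may need different constants. The function name `nv21_to_rgb` is hypothetical.

```python
from typing import List, Tuple


def nv21_to_rgb(data: bytes, width: int, height: int) -> List[Tuple[int, int, int]]:
    """Convert one NV21 frame to a row-major list of (R, G, B) tuples."""
    if width % 2 or height % 2:
        raise ValueError("NV21 requires even width and height")
    frame_size = width * height
    if len(data) < frame_size * 3 // 2:
        raise ValueError("buffer too small for an NV21 frame")

    def clamp(value: int) -> int:
        # Keep each channel inside the 0..255 byte range.
        return max(0, min(255, value))

    pixels = []
    for row in range(height):
        for col in range(width):
            y = data[row * width + col]
            # The chroma plane follows the Y plane. Each 2x2 block of
            # pixels shares one V,U pair, stored with V first.
            vu_index = frame_size + (row // 2) * width + (col // 2) * 2
            v = data[vu_index]
            u = data[vu_index + 1]

            # Assumed BT.601 video-range integer approximation.
            c, d, e = y - 16, u - 128, v - 128
            r = clamp((298 * c + 409 * e + 128) >> 8)
            g = clamp((298 * c - 100 * d - 208 * e + 128) >> 8)
            b = clamp((298 * c + 516 * d + 128) >> 8)
            pixels.append((r, g, b))
    return pixels
```

In an `onPreviewFrame` callback the same indexing would be applied to the `byte[]` the camera delivers, with each byte masked to an unsigned value first, since Java bytes are signed. A per-pixel loop like this is slow for full preview frames, so treat it as a reference for the layout rather than a production path.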

Using Tesseract for handwriting recognition

In short, you would have to train the Tesseract engine to recognize the handwriting. Take a look at this link: Tesseract handwriting with dictionary training. This is what the linked post says: It’s possible to train Tesseract to recognize handwriting. Here are the instructions: https://tesseract-ocr.github.io/tessdoc/Training-Tesseract But don’t expect very good results. Academics have typically gotten … Read more