TESSERACT ACCURACY 1/6/23
Since 2018, I have been testing how the accuracy of Tesseract's OCR engine depends on the resolution of the text. I wrote a script to auto-generate a test PDF file (here is an example using Helvetica Narrow font) with text at different resolutions in six different fonts (Helvetica, Times-Roman, Courier, Palatino, Bookman, and Helvetica-Narrow). I then run Tesseract on the different PDFs and determine the accuracy of the OCR. I characterize the resolution by the height of a typical capital letter in pixels. It turns out that Tesseract has a sweet spot at a capital-letter height of about 30 pixels (it seems strange to me that accuracy would not continue to improve at higher and higher resolutions, but okay). See the plot below. My software k2pdfopt uses this result and tries to scale text sent to OCR so that its size falls in this "sweet spot."
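
To make the "sweet spot" idea concrete, here is a minimal Python sketch of the arithmetic involved. It is not the k2pdfopt code or my actual test script. The function names, the 0.718 cap-height ratio (roughly right for Helvetica, different for other fonts), and the use of difflib as an accuracy measure are all assumptions for illustration.

```python
from difflib import SequenceMatcher

# Approximate sweet spot from the tests described above (capital-letter height).
TARGET_CAP_HEIGHT_PX = 30.0


def cap_height_px(font_size_pt, dpi, cap_height_ratio=0.718):
    """Estimate capital-letter height in pixels for rendered text.

    cap_height_ratio is cap height divided by font size. The default 0.718
    is an assumed value (approximately Helvetica); other fonts differ.
    """
    return font_size_pt * cap_height_ratio * dpi / 72.0


def scale_to_sweet_spot(measured_cap_height_px, target_px=TARGET_CAP_HEIGHT_PX):
    """Return the factor by which to resize a bitmap so caps are target_px tall."""
    if measured_cap_height_px <= 0:
        raise ValueError("cap height must be positive")
    return target_px / measured_cap_height_px


def char_accuracy(expected, recognized):
    """Similarity of OCR output to the known text, between 0.0 and 1.0.

    Uses difflib's matching-character ratio, one of several possible
    accuracy measures (hypothetical choice, not necessarily the one
    behind the plot).
    """
    return SequenceMatcher(None, expected, recognized, autojunk=False).ratio()


if __name__ == "__main__":
    h = cap_height_px(font_size_pt=12, dpi=300)
    print("cap height in pixels:", round(h, 1))
    print("scale factor to reach the sweet spot:", round(scale_to_sweet_spot(h), 3))
    print("accuracy:", char_accuracy("abcd", "abxd"))
```

For example, 12-point text rendered at 300 dpi under these assumptions has capitals about 36 pixels tall, so it would be shrunk slightly before OCR, while text with 15-pixel capitals would be enlarged by a factor of 2.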