I'm very grateful to this thread -- I was getting ready to order a CZUR scanner in the expectation that it would work reasonably well under Linux.  So much for those expectations.

One of the ways in which OCR contributes real value is if you have a large number of documents that are idiosyncratic in the same way -- then you can teach it how to recognize the characters.  I have lots of old books with such oddities, and OCR that is optimized for mass use -- read: business paperwork -- just doesn't cut it.  If anyone knows of anything open-source that works reasonably well, I'd love to hear about it.

On 11/14/22 10:54, Alvin Starr via talk wrote:
On 2022-11-14 08:40, Stewart C. Russell via talk wrote:
On 10/11/2022 02.21, D. Hugh Redelmeier via talk wrote:

Apparently the scan under macOS (and probably under Windows)
has better OCR than under Linux.  Grr.

We're probably stuck with Tesseract, which, while much better than it used to be, is now optimized for mass "good enough" recognition of simple pages. OmniPage dropped its Linux support years ago, and ABBYY FineReader's Linux support is only for ($$$) enterprise. Adobe's now the monster of OCR, but of course it's only built into its rented Acrobat Pro platform.

It's a shame that Linux users don't get the nice things that come with hardware that we buy. The page remapping and finger editing-out sound very handy.
There are a number of cloud OCR solutions.

I have not tested them, but I would bet they are of good quality.
Of course, the trade-off is that you're making your data available for the cloud provider to monetize, along with analysis by the world's various security services.

A few years ago I tested various text-to-speech solutions, and in the end the only ones of quality that were not insanely expensive were the cloud providers.
Initially I was using the Google TTS that was bundled into Chrome, but that got closed down, so I ended up with the fee-based service.
Still, the quality was way better than anything we could buy.

My guess is that OCR will go that way.
The hardware manufacturers will bundle some white-labeled cloud service that is somehow limited or hobbled and subject to upsell.
-- 
Peter King			 	peter.king@utoronto.ca
Department of Philosophy
170 St. George Street #521
The University of Toronto		   (416)-946-3170 ofc
Toronto, ON  M5R 2M8
       CANADA

http://individual.utoronto.ca/pking/

=========================================================================
GPG keyID 0x7587EC42 (2B14 A355 46BC 2A16 D0BC  36F5 1FE6 D32A 7587 EC42)
gpg --keyserver pgp.mit.edu --recv-keys 7587EC42