
We bought a CZUR ET24 Pro. Quite a bit more expensive than other models. It claimed that it would support Linux.
<https://shop.czur.com/products/etscanner>

I installed the supplied-to-me RPM using "rpm --force". It seems to work. (I only scanned the junk on the desktop where the scanning mat will go.)

| From: Kevin Cozens via talk <talk@gtalug.org>

| If it helps I found a page from two years ago where some asked about using a
| CZUR scanner. That page said they had installed and ran the Windows software
| using Wine but the driver was still an issue.

Right. If it worked, that might be the best way.

Apparently the scan under MacOS (and probably under Windows) has better OCR than under Linux. Grr. At least if Stephen Wilkinson's review on the product page is correct.

| That page pointed to a web page
| from four years earlier that said it is a UVC device.

We have earlier CZUR scanners which are UVC at a lower resolution than the proprietary scanning software sees.

| The scanner has no support in XSane?

Not that I've seen.

| Is there any support for it in VueScan?

I don't know VueScan.

| From: Lennart Sorensen via talk <talk@gtalug.org>
| Subject: Re: [GTALUG] Borked Python setup, please help

| I see people claiming many CZUR scanners are just UVC devices, and that
| there exists programs that can capture from that, like guvcview and a
| few others. It strangely does sound like it is actually more of a video
| camera than a scanner.

They are visible as UVC, but I think that the full resolution isn't what a UVC driver sees. I have not tested this particular model.

The other thing is that CZUR has put a lot of effort into optimizing the scanning process. Perhaps some of that also exists in mobile phone camera scanner apps.

- puts red reference lines (from LEDs) on the page to figure out and correct for page curvature
- (optionally) combines scanning and OCR
- a workflow to speed up multipage scanning
- little tools to put on your finger to hold down page edges. The software edits these out of the picture.
- quite good resolution for the task
- various lights to try to avoid glare.

| Maybe your model is something else?

This model claimed Linux support.

We had an earlier one that we liked but we didn't like having to boot Windows and later having to ship the result to the Linux machine. Every step in the workflow is a tax.

We'll see if the Linux software is good enough. Eventually.
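One way to check the UVC question on a given unit is to ask the kernel which frame sizes the device advertises, then compare the largest one against what the vendor software produces. The sketch below is a minimal example under stated assumptions: it parses the text printed by `v4l2-ctl --list-formats-ext` (from v4l-utils), the device path `/dev/video0` is a guess, and the sample listing is made up for illustration, not captured from a CZUR scanner.

```python
from __future__ import annotations

import re
import subprocess

# Matches lines such as "Size: Discrete 1920x1080" in v4l2-ctl output.
SIZE_RE = re.compile(r"Size:\s+Discrete\s+(\d+)x(\d+)")


def parse_discrete_sizes(listing: str) -> list[tuple[int, int]]:
    """Return unique (width, height) pairs, smallest pixel area first."""
    sizes = {(int(w), int(h)) for w, h in SIZE_RE.findall(listing)}
    return sorted(sizes, key=lambda s: (s[0] * s[1], s[0], s[1]))


def largest_size(listing: str) -> tuple[int, int] | None:
    """Return the largest advertised frame size, or None if none was found."""
    sizes = parse_discrete_sizes(listing)
    return sizes[-1] if sizes else None


def list_formats(device: str = "/dev/video0") -> str:
    """Run v4l2-ctl on a device (assumed path); needs v4l-utils installed."""
    result = subprocess.run(
        ["v4l2-ctl", "-d", device, "--list-formats-ext"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Illustrative listing only; real output depends on the device.
SAMPLE_LISTING = """\
ioctl: VIDIOC_ENUM_FMT
    Type: Video Capture

    [0]: 'MJPG' (Motion-JPEG, compressed)
        Size: Discrete 1920x1080
            Interval: Discrete 0.033s (30.000 fps)
        Size: Discrete 640x480
            Interval: Discrete 0.033s (30.000 fps)
    [1]: 'YUYV' (YUYV 4:2:2)
        Size: Discrete 640x480
            Interval: Discrete 0.033s (30.000 fps)
"""

if __name__ == "__main__":
    print(largest_size(SAMPLE_LISTING))
```

On a real machine, `largest_size(list_formats())` would report the biggest frame the UVC interface offers. If that is well below the scanner's advertised resolution, generic capture tools such as guvcview will presumably be limited to it as well.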

On 10/11/2022 02.21, D. Hugh Redelmeier via talk wrote:
> Apparently the scan under MacOS (and probably under Windows) has better OCR than under Linux. Grr.

We're probably stuck with Tesseract, which, while it's much better than it used to be, is now optimized for mass "good enough" recognition of simple pages. OmniPage dropped its Linux support years ago, and ABBYY FineReader's Linux support is only for ($$$) enterprise. Adobe's now the monster of OCR, but of course it's only built into its rented Acrobat Pro platform.

It's a shame that Linux users don't get the nice things that come with hardware that we buy. The page remapping and finger editing-out sound very handy.

Stewart

On 2022-11-14 08:40, Stewart C. Russell via talk wrote:
> On 10/11/2022 02.21, D. Hugh Redelmeier via talk wrote:
>> Apparently the scan under MacOS (and probably under Windows) has better OCR than under Linux. Grr.
>
> We're probably stuck with Tesseract, which, while it's much better than it used to be, is now optimized for mass "good enough" recognition of simple pages. OmniPage dropped its Linux support years ago, and ABBYY FineReader's Linux support is only for ($$$) enterprise. Adobe's now the monster of OCR, but of course it's only built into its rented Acrobat Pro platform.
>
> It's a shame that Linux users don't get the nice things that come with hardware that we buy. The page remapping and finger editing-out sound very handy.

There are a number of cloud OCR solutions. I have not tested them but I would bet they are of good quality. Of course the trade-off is that you're making your data available for the cloud provider to monetize, along with analysis by the world's various security services.

A few years ago I tested various text-to-speech solutions, and in the end the only ones of quality that were not insanely expensive were the cloud providers. Initially I was using the Google TTS that was bundled into Chrome, but that got closed down, so I ended up with the fee-based service. Still, the quality was way better than anything we could buy.

My guess is that OCR will go that way. The hardware manufacturers will bundle some white-labeled cloud service that is somehow limited or hobbled and subject to upsell.

--
Alvin Starr || land: (647)478-6285
Netvel Inc. || Cell: (416)806-0133
alvin@netvel.net ||

I'm very grateful to this thread -- I was getting ready to order a CZUR scanner in the expectation that it would work reasonably well under Linux. So much for those expectations.

One of the ways in which OCR contributes real value is if you have a large number of documents that are idiosyncratic in the same way -- then you can teach it how to recognize the characters. I have lots of old books with such oddities, and OCR that is optimized for mass use -- read: business paperwork -- just doesn't cut it. If anyone knows of anything open-source that works reasonably well, I'd love to hear about it.

On 11/14/22 10:54, Alvin Starr via talk wrote:
> On 2022-11-14 08:40, Stewart C. Russell via talk wrote:
>> On 10/11/2022 02.21, D. Hugh Redelmeier via talk wrote:
>>> Apparently the scan under MacOS (and probably under Windows) has better OCR than under Linux. Grr.
>>
>> We're probably stuck with Tesseract, which, while it's much better than it used to be, is now optimized for mass "good enough" recognition of simple pages. OmniPage dropped its Linux support years ago, and ABBYY FineReader's Linux support is only for ($$$) enterprise. Adobe's now the monster of OCR, but of course it's only built into its rented Acrobat Pro platform.
>>
>> It's a shame that Linux users don't get the nice things that come with hardware that we buy. The page remapping and finger editing-out sound very handy.
>
> There are a number of cloud OCR solutions. I have not tested them but I would bet they are of good quality. Of course the trade-off is that you're making your data available for the cloud provider to monetize, along with analysis by the world's various security services.
>
> A few years ago I tested various text-to-speech solutions, and in the end the only ones of quality that were not insanely expensive were the cloud providers. Initially I was using the Google TTS that was bundled into Chrome, but that got closed down, so I ended up with the fee-based service. Still, the quality was way better than anything we could buy.
>
> My guess is that OCR will go that way. The hardware manufacturers will bundle some white-labeled cloud service that is somehow limited or hobbled and subject to upsell.
--
Peter King                         peter.king@utoronto.ca
Department of Philosophy           170 St. George Street #521
The University of Toronto          (416)-946-3170 ofc
Toronto, ON M5R 2M8                CANADA

http://individual.utoronto.ca/pking/

=========================================================================
GPG keyID 0x7587EC42 (2B14 A355 46BC 2A16 D0BC 36F5 1FE6 D32A 7587 EC42)
gpg --keyserver pgp.mit.edu --recv-keys 7587EC42

On Mon, Nov 14, 2022 at 12:25 PM Peter King via talk <talk@gtalug.org> wrote:
> One of the ways in which OCR contributes real value is if you have a large number of documents that are idiosyncratic in the same way ... If anyone knows of anything open-source that works reasonably well, I'd love to hear about it.

For all that Tesseract is a mass-ingestion OCR tool, it can be fine-tuned. Whether there are tools for training it that are user-friendly, I don't know. I'd really like a tool that would stop Tesseract on matches lower than a certain confidence threshold, and allow manual control of what was stored in the text.

A few years ago Tesseract was used to create a searchable archive of all available documentation from the Free City of Danzig, the short-lived city state that existed from 1920-1939 in what is now Gdańsk, Poland. Most of the paperwork (and there was a *lot*: very big on public participation in deciding on how they were going to be run) was printed in Fraktur (aka blackletter, gothic or textura). Tesseract was trained to read this script, and now the parameters live in the 'tesseract-ocr-frk' package for all to use. I wish they could have done the same for the then-contemporary written script of Sütterlin, one of the great "go home you're drunk" cursives.

For very automatic OCR on Linux, the ocrmypdf tool is quite amazing. Great way of stress-testing your hardware, too.

Stewart
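The confidence-threshold idea can be approximated after the fact. Tesseract can write per-word confidence values in its TSV output, so a small script can list the words that fall below a cutoff for manual review. This does not pause Tesseract mid-run; it only flags words once recognition is done. A minimal sketch, assuming the TSV came from something like `tesseract page.png out -l frk tsv`; the function name, the cutoff of 60, and the sample rows are all made up for illustration.

```python
import csv
import io

# Column names used by Tesseract's TSV output.
HEADER = ["level", "page_num", "block_num", "par_num", "line_num",
          "word_num", "left", "top", "width", "height", "conf", "text"]


def low_confidence_words(tsv_text, threshold=60.0):
    """Return words whose confidence is below `threshold` (0-100 scale).

    Rows without text, or with a negative confidence (page, block and
    line rows), are skipped.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    flagged = []
    for row in reader:
        text = (row.get("text") or "").strip()
        try:
            conf = float(row.get("conf") or -1)
        except ValueError:
            continue  # malformed confidence value, ignore the row
        if not text or conf < 0:
            continue
        if conf < threshold:
            flagged.append({
                "text": text,
                "conf": conf,
                "line": int(row["line_num"]),
                "box": tuple(int(row[k])
                             for k in ("left", "top", "width", "height")),
            })
    return flagged


# Hypothetical rows: one page-level row and three word-level rows.
ROWS = [
    ["1", "1", "0", "0", "0", "0", "0", "0", "800", "600", "-1", ""],
    ["5", "1", "1", "1", "1", "1", "10", "12", "80", "20", "96.5", "Freie"],
    ["5", "1", "1", "1", "1", "2", "95", "12", "70", "20", "41.2", "Stabt"],
    ["5", "1", "1", "1", "1", "3", "170", "12", "90", "20", "88.0", "Danzig"],
]
SAMPLE_TSV = "\n".join("\t".join(r) for r in [HEADER] + ROWS) + "\n"

if __name__ == "__main__":
    for word in low_confidence_words(SAMPLE_TSV):
        print(word["text"], word["conf"], word["box"])
```

The bounding box is kept with each flagged word so a reviewer can find the spot on the page image and correct the text by hand.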

On 2022-11-10 02:21, D. Hugh Redelmeier via talk wrote:
> | Is there any support for it in VueScan?
>
> I don't know VueScan.

It is similar in idea to XSane. It supports a lot of (old/obsolete) scanners. I can't use XSane to scan slides on my HP G4010 because it doesn't turn on the light in the lid. VueScan does.

The downside to VueScan is that it is more of a commercial product. There are versions you can use for free but it may add a watermark to the scanned images. To use it for scanning slides without watermarks I would need to pay $149.
--
Cheers!

Kevin.

http://www.ve3syb.ca/               | "Nerds make the shiny things that
https://www.patreon.com/KevinCozens | distract the mouth-breathers, and
                                    | that's why we're powerful"
Owner of Elecraft K2 #2172          |
#include <disclaimer/favourite>     |             --Chris Hardwick
participants (6)
- Alvin Starr
- D. Hugh Redelmeier
- Kevin Cozens
- Peter King
- Stewart C. Russell
- Stewart Russell