Subject                             Author
* Re: Tesseract alternative         Davey
+- Re: Tesseract alternative        AnthonyL
+- Re: Tesseract alternative        AnthonyL
`* Re: Tesseract alternative        Pancho
 `* Re: Tesseract alternative       Davey
  `* Re: Tesseract alternative      Pancho
   `* Re: Tesseract alternative     Pancho
    `* Re: Tesseract alternative    Davey
     `- Re: Tesseract alternative   Pancho

Re: Tesseract alternative

<3304777101@f1.n221.z2.fidonet.fi>

https://www.novabbs.com/aus+uk/article-flat.php?id=486&group=uk.comp.os.linux#486
Path: i2pn2.org!i2pn.org!aioe.org!jHXdSDKPKExtdJMCjskdCQ.user.46.165.242.75.POSTED!not-for-mail
From: Dav...@f1.n221.z2.fidonet.fi (Davey)
Newsgroups: uk.comp.os.linux
Subject: Re: Tesseract alternative
Date: Wed, 18 Aug 2021 09:41:08 +0200
Organization: rbb soupgate
Message-ID: <3304777101@f1.n221.z2.fidonet.fi>
References: <1332937368@f0.n0.z0.fidonet.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="63679"; posting-host="jHXdSDKPKExtdJMCjskdCQ.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
X-MailConverter: SoupGate-OS/2 v1.20
X-Notice: Filtered by postfilter v. 0.9.2
X-Comment-To: All
 by: Davey - Wed, 18 Aug 2021 07:41 UTC

On Sun, 15 Aug 2021 10:18:26 +0100
Davey <davey@example.invalid> wrote:

> I use Ubuntu 18.04, and I want to find an OCR programme to use with my
> scanner. Research showed me that Tesseract was the most usual choice,
> so I found and followed several different installation procedures,
> none of which gave me a working OCR program. Further research showed
> that there are several known problems with this, and no two people
> seemed to have the same problem or fix. Eventually I gave up, having
> wasted hours getting nowhere.
> So I am looking for either a definitive way to get this working, or
> more likely, an alternative OCR program.
> The destination will usually be LibreOffice Write.
>
> Advice or helpful suggestions welcome, please.

I can only assume that nobody uses OCR on Ubuntu. Oh well.
--
Davey.

Re: Tesseract alternative

<611e3438.10981734@news.eternal-september.org>

https://www.novabbs.com/aus+uk/article-flat.php?id=495&group=uk.comp.os.linux#495
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: nos...@please.invalid (AnthonyL)
Newsgroups: uk.comp.os.linux
Subject: Re: Tesseract alternative
Date: Thu, 19 Aug 2021 10:42:48 GMT
Organization: A noiseless patient Spider
Lines: 35
Message-ID: <611e3438.10981734@news.eternal-september.org>
References: <1332937368@f0.n0.z0.fidonet.org> <3304777101@f1.n221.z2.fidonet.fi>
Injection-Info: reader02.eternal-september.org; posting-host="ef5c69da66d2a23e7d202aae95c2d5cd";
logging-data="16472"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/wGd/aGFbCnyhH5YY82zyh"
Cancel-Lock: sha1:tTWOBfFAM2pPYymtv2tAIjfN16c=
X-Newsreader: Forte Free Agent 1.21/32.243
 by: AnthonyL - Thu, 19 Aug 2021 10:42 UTC

On Wed, 18 Aug 2021 09:41:08 +0200, Davey
<Davey@f1.n221.z2.fidonet.fi> wrote:

>On Sun, 15 Aug 2021 10:18:26 +0100
>Davey <davey@example.invalid> wrote:
>
>> I use Ubuntu 18.04, and I want to find an OCR programme to use with my
>> scanner. Research showed me that Tesseract was the most usual choice,
>> so I found and followed several different installation procedures,
>> none of which gave me a working OCR program. Further research showed
>> that there are several known problems with this, and no two people
>> seemed to have the same problem or fix. Eventually I gave up, having
>> wasted hours getting nowhere.
>> So I am looking for either a definitive way to get this working, or
>> more likely, an alternative OCR program.
>> The destination will usually be LibreOffice Write.
>>
>> Advice or helpful suggestions welcome, please.
>
>I can only assume that nobody uses OCR on Ubuntu. Oh well.

I have, but not for a year or so. That was with Tesseract, and it was a
bit underwhelming. I think I was using the same (k)Ubuntu as you.

The last decent free OCR program I had came with a Canon all-in-one
printer on Windows. I think the more recent jobs I've done have been
via Google Drive, converting from one format (e.g. PDF) to another
(e.g. Google Docs).

--
AnthonyL

Why ever wait to finish a job before starting the next?

Re: Tesseract alternative

<1828586284@f1.n221.z2.fidonet.fi>

https://www.novabbs.com/aus+uk/article-flat.php?id=496&group=uk.comp.os.linux#496
Path: i2pn2.org!i2pn.org!aioe.org!jHXdSDKPKExtdJMCjskdCQ.user.46.165.242.75.POSTED!not-for-mail
From: Antho...@f1.n221.z2.fidonet.fi (AnthonyL)
Newsgroups: uk.comp.os.linux
Subject: Re: Tesseract alternative
Date: Thu, 19 Aug 2021 11:42:48 +0200
Organization: rbb soupgate
Message-ID: <1828586284@f1.n221.z2.fidonet.fi>
References: <3304777101@f1.n221.z2.fidonet.fi>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="32159"; posting-host="jHXdSDKPKExtdJMCjskdCQ.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
X-Notice: Filtered by postfilter v. 0.9.2
X-Comment-To: All
X-MailConverter: SoupGate-OS/2 v1.20
 by: AnthonyL - Thu, 19 Aug 2021 09:42 UTC

On Wed, 18 Aug 2021 09:41:08 +0200, Davey
<Davey@f1.n221.z2.fidonet.fi> wrote:

>On Sun, 15 Aug 2021 10:18:26 +0100
>Davey <davey@example.invalid> wrote:
>
>> I use Ubuntu 18.04, and I want to find an OCR programme to use with my
>> scanner. Research showed me that Tesseract was the most usual choice,
>> so I found and followed several different installation procedures,
>> none of which gave me a working OCR program. Further research showed
>> that there are several known problems with this, and no two people
>> seemed to have the same problem or fix. Eventually I gave up, having
>> wasted hours getting nowhere.
>> So I am looking for either a definitive way to get this working, or
>> more likely, an alternative OCR program.
>> The destination will usually be LibreOffice Write.
>>
>> Advice or helpful suggestions welcome, please.
>
>I can only assume that nobody uses OCR on Ubuntu. Oh well.

I have, but not for a year or so. That was with Tesseract, and it was a
bit underwhelming. I think I was using the same (k)Ubuntu as you.

The last decent free OCR program I had came with a Canon all-in-one
printer on Windows. I think the more recent jobs I've done have been
via Google Drive, converting from one format (e.g. PDF) to another
(e.g. Google Docs).

--
AnthonyL

Why ever wait to finish a job before starting the next?

Re: Tesseract alternative

<sut0v9$386$1@dont-email.me>

https://www.novabbs.com/aus+uk/article-flat.php?id=637&group=uk.comp.os.linux#637
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: Pancho.D...@outlook.com (Pancho)
Newsgroups: uk.comp.os.linux
Subject: Re: Tesseract alternative
Date: Sun, 20 Feb 2022 09:17:28 +0000
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <sut0v9$386$1@dont-email.me>
References: <1332937368@f0.n0.z0.fidonet.org>
<3304777101@f1.n221.z2.fidonet.fi>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 20 Feb 2022 09:17:29 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="0187e5e2e17ff3e0cc2327af784c31bf";
logging-data="3334"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19euQQolnfVvXs+Br7rU8G93d0MqpwmfWY="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.3.0
Cancel-Lock: sha1:4Zth/ogHiSEQdY268bTtGz8OYQY=
In-Reply-To: <3304777101@f1.n221.z2.fidonet.fi>
Content-Language: en-GB
 by: Pancho - Sun, 20 Feb 2022 09:17 UTC

On 18/08/2021 08:41, Davey wrote:
> On Sun, 15 Aug 2021 10:18:26 +0100
> Davey <davey@example.invalid> wrote:
>
>> I use Ubuntu 18.04, and I want to find an OCR programme to use with my
>> scanner. Research showed me that Tesseract was the most usual choice,
>> so I found and followed several different installation procedures,
>> none of which gave me a working OCR program. Further research showed
>> that there are several known problems with this, and no two people
>> seemed to have the same problem or fix. Eventually I gave up, having
>> wasted hours getting nowhere.
>> So I am looking for either a definitive way to get this working, or
>> more likely, an alternative OCR program.
>> The destination will usually be LibreOffice Write.
>>
>> Advice or helpful suggestions welcome, please.
>
> I can only assume that nobody uses OCR on Ubuntu. Oh well.
>

OCRmyPDF works and comes with a working Docker Hub image. It is based on
Tesseract.

<https://hub.docker.com/r/jbarlow83/ocrmypdf/>

I've been meaning to set it up as a service on a shared folder, where
my scanner deposits its scans. The OCR worked OK, but I had a bit of
trouble finding an event that reliably signals when the scanner has
actually finished writing the PDF file. I hit a few problems a few
months ago and gave up, but some of them seem to have resolved
themselves, so I'll give it another go.
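
For the "has the scanner finished writing?" problem, one common workaround (not necessarily what was used here) is to skip filesystem events altogether and simply wait until the file size stops changing. The sketch below assumes the scanner writes the PDF in place and that a couple of quiet seconds means it is done; the function name and the timings are made up for illustration.

```python
import os
import time


def wait_until_stable(path, quiet_seconds=2.0, poll_interval=0.5, timeout=60.0):
    """Return True once `path` is non-empty and its size has not changed
    for `quiet_seconds`. Return False if that never happens within `timeout`.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    changed_at = time.monotonic()
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            # File not there yet, or briefly inaccessible: treat as "not ready".
            size = -1
        now = time.monotonic()
        if size <= 0 or size != last_size:
            # Missing, empty, or still growing: restart the quiet-period clock.
            last_size = size
            changed_at = now
        elif now - changed_at >= quiet_seconds:
            return True
        time.sleep(poll_interval)
    return False
```

A watcher would call `wait_until_stable(pdf_path)` when a new file appears and only hand the file to the OCR step if it returns True. A scanner that pauses for longer than `quiet_seconds` mid-write would defeat this, so the timings need tuning per device.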

Re: Tesseract alternative

<sut3hq$gaj$1@dont-email.me>

https://www.novabbs.com/aus+uk/article-flat.php?id=638&group=uk.comp.os.linux#638
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: dav...@example.invalid (Davey)
Newsgroups: uk.comp.os.linux
Subject: Re: Tesseract alternative
Date: Sun, 20 Feb 2022 10:01:30 +0000
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <sut3hq$gaj$1@dont-email.me>
References: <1332937368@f0.n0.z0.fidonet.org>
<3304777101@f1.n221.z2.fidonet.fi>
<sut0v9$386$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 20 Feb 2022 10:01:30 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="3ea986c1673db861a4e90c4a5ecd8052";
logging-data="16723"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19deNk02HrrYG715BAGD0TW"
Cancel-Lock: sha1:dRGdWdm4ictSVvcMuwjsb6UqNDY=
X-Newsreader: Claws Mail 3.16.0 (GTK+ 2.24.32; x86_64-pc-linux-gnu)
 by: Davey - Sun, 20 Feb 2022 10:01 UTC

On Sun, 20 Feb 2022 09:17:28 +0000
Pancho <Pancho.Dontmaileme@outlook.com> wrote:

> On 18/08/2021 08:41, Davey wrote:
> > On Sun, 15 Aug 2021 10:18:26 +0100
> > Davey <davey@example.invalid> wrote:
> >
> >> I use Ubuntu 18.04, and I want to find an OCR programme to use
> >> with my scanner. Research showed me that Tesseract was the most
> >> usual choice, so I found and followed several different
> >> installation procedures, none of which gave me a working OCR
> >> program. Further research showed that there are several known
> >> problems with this, and no two people seemed to have the same
> >> problem or fix. Eventually I gave up, having wasted hours getting
> >> nowhere. So I am looking for either a definitive way to get this
> >> working, or more likely, an alternative OCR program.
> >> The destination will usually be LibreOffice Write.
> >>
> >> Advice or helpful suggestions welcome, please.
> >
> > I can only assume that nobody uses OCR on Ubuntu. Oh well.
> >
>
> OCRmypdf works and comes with a working Dockerhub image. It is based
> on Tesseract.
>
> <https://hub.docker.com/r/jbarlow83/ocrmypdf/>
>
> I've been meaning to set it up as a service on a shared folder,
> where my scanner deposits scans. The OCR worked ok, but I had a bit
> of trouble determining an event handler for when the scanner PDF file
> writing was actually completed. I had a few problems a few months
> ago, and gave up, but some of them seem to have resolved themselves,
> so I'll give it another go.
>

I'll take a look, but Tesseract was the sticking point when I tried to
get OCR working. Thanks.
--
Davey.

Re: Tesseract alternative

<suu95j$hvn$1@dont-email.me>

https://www.novabbs.com/aus+uk/article-flat.php?id=641&group=uk.comp.os.linux#641
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: Pancho.D...@outlook.com (Pancho)
Newsgroups: uk.comp.os.linux
Subject: Re: Tesseract alternative
Date: Sun, 20 Feb 2022 20:43:31 +0000
Organization: A noiseless patient Spider
Lines: 68
Message-ID: <suu95j$hvn$1@dont-email.me>
References: <1332937368@f0.n0.z0.fidonet.org>
<3304777101@f1.n221.z2.fidonet.fi> <sut0v9$386$1@dont-email.me>
<sut3hq$gaj$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 20 Feb 2022 20:43:31 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="0187e5e2e17ff3e0cc2327af784c31bf";
logging-data="18423"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19xQM8g5/oNmv1kuKHTV6JkmSu0I4AngmQ="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.3.0
Cancel-Lock: sha1:AWiEsKpTEHwA9XdvAjesn3nY4kQ=
In-Reply-To: <sut3hq$gaj$1@dont-email.me>
Content-Language: en-GB
 by: Pancho - Sun, 20 Feb 2022 20:43 UTC

On 20/02/2022 10:01, Davey wrote:
> On Sun, 20 Feb 2022 09:17:28 +0000
> Pancho <Pancho.Dontmaileme@outlook.com> wrote:
>
>> On 18/08/2021 08:41, Davey wrote:
>>> On Sun, 15 Aug 2021 10:18:26 +0100
>>> Davey <davey@example.invalid> wrote:
>>>
>>>> I use Ubuntu 18.04, and I want to find an OCR programme to use
>>>> with my scanner. Research showed me that Tesseract was the most
>>>> usual choice, so I found and followed several different
>>>> installation procedures, none of which gave me a working OCR
>>>> program. Further research showed that there are several known
>>>> problems with this, and no two people seemed to have the same
>>>> problem or fix. Eventually I gave up, having wasted hours getting
>>>> nowhere. So I am looking for either a definitive way to get this
>>>> working, or more likely, an alternative OCR program.
>>>> The destination will usually be LibreOffice Write.
>>>>
>>>> Advice or helpful suggestions welcome, please.
>>>
>>> I can only assume that nobody uses OCR on Ubuntu. Oh well.
>>>
>>
>> OCRmypdf works and comes with a working Dockerhub image. It is based
>> on Tesseract.
>>
>> <https://hub.docker.com/r/jbarlow83/ocrmypdf/>
>>
>> I've been meaning to set it up as a service on a shared folder,
>> where my scanner deposits scans. The OCR worked ok, but I had a bit
>> of trouble determining an event handler for when the scanner PDF file
>> writing was actually completed. I had a few problems a few months
>> ago, and gave up, but some of them seem to have resolved themselves,
>> so I'll give it another go.
>>
>
> I'll take a look, but Tesseract was the sticking point when I tried to
> get OCR working. Thanks.
>

OK, I just looked. I'm actually building my own Docker image (presumably
because I wanted to use it on a rPi). It's a very simple Dockerfile:
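---
# ubuntu:latest pointed at 20.04 when this was written; pinned here so
# the build stays reproducible.
FROM ubuntu:20.04

# ocrmypdf pulls in Tesseract as a dependency.
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 \
    python3-pip \
    ocrmypdf

# watchdog is a Python file-watching library, presumably for watcher.py.
RUN pip3 install --no-cache-dir \
    watchdog==0.10.2

COPY watcher.py .

ENTRYPOINT ["/usr/bin/ocrmypdf"]
---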

I can't speak to your setup, but "apt install ocrmypdf" pulls in
Tesseract as a dependency. The Dockerfile is basically doing the ocrmypdf
install on a clean Ubuntu 20.04 image. It works; I just tested it.
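
Once "apt install ocrmypdf" has succeeded (inside a container or directly on the host), a single OCR run is just `ocrmypdf input.pdf output.pdf`. As a rough sketch, this is how a script might drive it from Python. The helper names are invented for illustration; `-l` and `--skip-text` are standard ocrmypdf options, but whether you want them depends on your scans.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src, dst, language="eng"):
    """Build the argument list for one ocrmypdf run."""
    return [
        "ocrmypdf",
        "-l", language,   # Tesseract language pack to use
        "--skip-text",    # leave pages that already have a text layer alone
        str(src),
        str(dst),
    ]


def ocr_pdf(src, dst, language="eng"):
    """Run ocrmypdf on `src` and write a searchable PDF to `dst`."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found on PATH; try 'apt install ocrmypdf'")
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(build_ocr_command(src, dst, language), check=True)
    return Path(dst)
```

Calling `ocr_pdf("scan.pdf", "out/scan.pdf")` is then a quick way to confirm that the install, Tesseract included, actually works before building anything around it.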

Re: Tesseract alternative

<sv07ml$t44$1@dont-email.me>

https://www.novabbs.com/aus+uk/article-flat.php?id=642&group=uk.comp.os.linux#642
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: Pancho.D...@outlook.com (Pancho)
Newsgroups: uk.comp.os.linux
Subject: Re: Tesseract alternative
Date: Mon, 21 Feb 2022 14:30:45 +0000
Organization: A noiseless patient Spider
Lines: 85
Message-ID: <sv07ml$t44$1@dont-email.me>
References: <1332937368@f0.n0.z0.fidonet.org>
<3304777101@f1.n221.z2.fidonet.fi> <sut0v9$386$1@dont-email.me>
<sut3hq$gaj$1@dont-email.me> <suu95j$hvn$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 21 Feb 2022 14:30:46 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="455d3535cd444193f0f4b1c000c33bee";
logging-data="29828"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Q8IT/ktJ6fs5rELrlSdzNKq0JlmuytrU="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.3.0
Cancel-Lock: sha1:k9f4gobXwrVE3SE1q2JUhi0VLeI=
In-Reply-To: <suu95j$hvn$1@dont-email.me>
Content-Language: en-GB
 by: Pancho - Mon, 21 Feb 2022 14:30 UTC

On 20/02/2022 20:43, Pancho wrote:
> On 20/02/2022 10:01, Davey wrote:
>> On Sun, 20 Feb 2022 09:17:28 +0000
>> Pancho <Pancho.Dontmaileme@outlook.com> wrote:
>>
>>> On 18/08/2021 08:41, Davey wrote:
>>>> On Sun, 15 Aug 2021 10:18:26 +0100
>>>> Davey <davey@example.invalid> wrote:
>>>>> I use Ubuntu 18.04, and I want to find an OCR programme to use
>>>>> with my scanner. Research showed me that Tesseract was the most
>>>>> usual choice, so I found and followed several different
>>>>> installation procedures, none of which gave me a working OCR
>>>>> program. Further research showed that there are several known
>>>>> problems with this, and no two people seemed to have the same
>>>>> problem or fix. Eventually I gave up, having wasted hours getting
>>>>> nowhere. So I am looking for either a definitive way to get this
>>>>> working, or more likely, an alternative OCR program.
>>>>> The destination will usually be LibreOffice Write.
>>>>>
>>>>> Advice or helpful suggestions welcome, please.
>>>>
>>>> I can only assume that nobody uses OCR on Ubuntu. Oh well.
>>>
>>> OCRmypdf works and comes with a working Dockerhub image. It is based
>>> on Tesseract.
>>>
>>> <https://hub.docker.com/r/jbarlow83/ocrmypdf/>
>>>
>>> I've been meaning to set it up as a service on a shared folder,
>>> where my scanner deposits scans. The OCR worked ok, but I had a bit
>>> of trouble determining an event handler for when the scanner PDF file
>>> writing was actually completed. I had a few problems a few months
>>> ago, and gave up, but some of them seem to have resolved themselves,
>>> so I'll give it another go.
>>>
>>
>> I'll take a look, but Tesseract was the sticking point when I tried to
>> get OCR working. Thanks.
>>
>
> OK. I just looked. I'm actually building my own Docker Image (presumably
> because I wanted to use it on a rPi). It's a very simple Dockerfile
> ---
> FROM ubuntu:latest
>
>
> RUN apt-get update && apt-get install -y --no-install-recommends \
>   python3 \
>   python3-pip \
>   ocrmypdf
>
> RUN pip3 install --no-cache-dir \
> watchdog==0.10.2
>
> COPY watcher.py .
>
> ENTRYPOINT ["/usr/bin/ocrmypdf"]
> ---
>
> I can't speak to your set up, but "apt install ocrmypdf" includes
> Tesseract. The Dockerfile is basically doing the ocrmypdf install on
> virgin Ubuntu:20.04. It works, I just tested it.
>

It is now working on a Raspberry Pi.

I can store a PDF scan directly to a network share from the scanner's
console, no PC needed.

I have OCRmyPDF running in a Docker container on the rPi, watching the
folder. This automatically adds OCR text to the PDF and moves it to an
appropriate folder.
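
The actual watcher.py is not shown in this thread. As a rough idea of the watch, OCR and move cycle described above, here is a polling sketch that uses only the standard library rather than watchdog. The folder layout, the function name and the five-second settle time are all assumptions; `ocr` can be any callable that takes a source and an output path.

```python
import shutil
import time
from pathlib import Path


def process_inbox(inbox, done_dir, ocr, settle_seconds=5.0):
    """OCR every settled PDF in `inbox`, then move the original out of the way.

    `ocr` is any callable taking (source_path, output_path).
    Returns the list of output paths written on this pass.
    """
    inbox, done_dir = Path(inbox), Path(done_dir)
    originals = done_dir / "originals"
    originals.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(inbox.glob("*.pdf")):
        # Skip files modified very recently: the scanner may still be writing.
        if time.time() - src.stat().st_mtime < settle_seconds:
            continue
        out = done_dir / src.name
        ocr(src, out)
        # Keep the untouched scan, but get it out of the inbox.
        shutil.move(str(src), str(originals / src.name))
        written.append(out)
    return written
```

Run in a loop with a `time.sleep(10)` between passes, this does much the same job as an event-driven watcher, at the cost of a short delay. Note that `glob("*.pdf")` is case-sensitive on Linux, so a scanner that writes `.PDF` files would need a second pattern.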

Now all I need to do is write something to give the scans appropriate
filenames, inferred from the OCR text.
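
One simple way to approach the naming step, offered only as a sketch: take the first line of OCR text that looks like words as a title, look for a day/month/year date, and combine the two. Everything here (the function name, the date pattern, the 40-character limit) is an assumption for illustration. The text itself could come from ocrmypdf's `--sidecar` option, which writes the recognised text to a separate file.

```python
import re

# Matches dates such as 21/02/2022, 21.02.2022 or 21-02-2022.
DATE_PATTERN = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})\b")


def suggest_filename(ocr_text, fallback="scan"):
    """Suggest a name like '2022-02-21_electricity-bill.pdf' from OCR text."""
    # Title: the first line containing at least three letters.
    title = fallback
    for line in ocr_text.splitlines():
        if len(re.findall(r"[A-Za-z]", line)) >= 3:
            title = line
            break
    # Slug: lowercase, runs of other characters become single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")[:40].strip("-")
    slug = slug or fallback

    # Date: first match in the text, read in UK day/month/year order.
    match = DATE_PATTERN.search(ocr_text)
    if match:
        day, month, year = (int(g) for g in match.groups())
        if 1 <= day <= 31 and 1 <= month <= 12:
            return f"{year:04d}-{month:02d}-{day:02d}_{slug}.pdf"
    return f"{slug}.pdf"
```

For example, `suggest_filename("Electricity Bill\nDate: 21/02/2022\nTotal 42.00")` gives `2022-02-21_electricity-bill.pdf`. Real scans are messier (letterheads, OCR noise in the first line), so this would only be a starting point.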

Little things please little minds :-)


Re: Tesseract alternative

<sv0r2h$mte$1@dont-email.me>

https://www.novabbs.com/aus+uk/article-flat.php?id=643&group=uk.comp.os.linux#643
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: dav...@example.invalid (Davey)
Newsgroups: uk.comp.os.linux
Subject: Re: Tesseract alternative
Date: Mon, 21 Feb 2022 20:01:21 +0000
Organization: A noiseless patient Spider
Lines: 94
Message-ID: <sv0r2h$mte$1@dont-email.me>
References: <1332937368@f0.n0.z0.fidonet.org>
<3304777101@f1.n221.z2.fidonet.fi>
<sut0v9$386$1@dont-email.me>
<sut3hq$gaj$1@dont-email.me>
<suu95j$hvn$1@dont-email.me>
<sv07ml$t44$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Injection-Date: Mon, 21 Feb 2022 20:01:21 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="dfcbc61043ceb4b9b15866a81b192b08";
logging-data="23470"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18zEUtu+WS8iWnv/yqS7LNh"
Cancel-Lock: sha1:neNPxTeIvn32yRKOh2J1vZ4ELgc=
X-Newsreader: Claws Mail 3.16.0 (GTK+ 2.24.32; x86_64-pc-linux-gnu)
 by: Davey - Mon, 21 Feb 2022 20:01 UTC

On Mon, 21 Feb 2022 14:30:45 +0000
Pancho <Pancho.Dontmaileme@outlook.com> wrote:

> On 20/02/2022 20:43, Pancho wrote:
> > On 20/02/2022 10:01, Davey wrote:
> >> On Sun, 20 Feb 2022 09:17:28 +0000
> >> Pancho <Pancho.Dontmaileme@outlook.com> wrote:
> >>
> >>> On 18/08/2021 08:41, Davey wrote:
> >>>> On Sun, 15 Aug 2021 10:18:26 +0100
> >>>> Davey <davey@example.invalid> wrote:
> >>>>> I use Ubuntu 18.04, and I want to find an OCR programme to use
> >>>>> with my scanner. Research showed me that Tesseract was the most
> >>>>> usual choice, so I found and followed several different
> >>>>> installation procedures, none of which gave me a working OCR
> >>>>> program. Further research showed that there are several known
> >>>>> problems with this, and no two people seemed to have the same
> >>>>> problem or fix. Eventually I gave up, having wasted hours
> >>>>> getting nowhere. So I am looking for either a definitive way to
> >>>>> get this working, or more likely, an alternative OCR program.
> >>>>> The destination will usually be LibreOffice Write.
> >>>>>
> >>>>> Advice or helpful suggestions welcome, please.
> >>>>
> >>>> I can only assume that nobody uses OCR on Ubuntu. Oh well.
> >>>
> >>> OCRmypdf works and comes with a working Dockerhub image. It is
> >>> based on Tesseract.
> >>>
> >>> <https://hub.docker.com/r/jbarlow83/ocrmypdf/>
> >>>
> >>> I've been meaning to set it up as a service on a shared folder,
> >>> where my scanner deposits scans. The OCR worked ok, but I had a
> >>> bit of trouble determining an event handler for when the scanner
> >>> PDF file writing was actually completed. I had a few problems a
> >>> few months ago, and gave up, but some of them seem to have
> >>> resolved themselves, so I'll give it another go.
> >>>
> >>
> >> I'll take a look, but Tesseract was the sticking point when I tried
> >> to get OCR working. Thanks.
> >>
> >
> > OK. I just looked. I'm actually building my own Docker Image
> > (presumably because I wanted to use it on a rPi). It's a very simple
> > Dockerfile
> > ---
> > FROM ubuntu:latest
> >
> >
> > RUN apt-get update && apt-get install -y --no-install-recommends \
> >   python3 \
> >   python3-pip \
> >   ocrmypdf
> >
> > RUN pip3 install --no-cache-dir \
> > watchdog==0.10.2
> >
> > COPY watcher.py .
> >
> > ENTRYPOINT ["/usr/bin/ocrmypdf"]
> > ---
> >
> > I can't speak to your set up, but "apt install ocrmypdf" includes
> > Tesseract. The Dockerfile is basically doing the ocrmypdf install
> > on virgin Ubuntu:20.04. It works, I just tested it.
> >
>
> Now working on a Raspberry Pi.
>
> I can store a PDF scan direct to a network share from the Scanner
> console, no PC needed.
>
> I have OcrMyPDF running in a Docker container on the rPi, watching
> the folder. This automatically adds OCR text to the PDF and moves it
> to an appropriate folder.
>
> Now all I need to do is write something to give the scans appropriate
> filenames, inferred from the OCR text.
>
> Little things please little minds :-)
>
>
>
> >
> >
> >
> >
>

You are working at the extreme edge of my programming knowledge, so if
I try this, I might get lost!
--
Davey.

Re: Tesseract alternative

<sv19fb$kpt$1@dont-email.me>

https://www.novabbs.com/aus+uk/article-flat.php?id=644&group=uk.comp.os.linux#644
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: Pancho.D...@outlook.com (Pancho)
Newsgroups: uk.comp.os.linux
Subject: Re: Tesseract alternative
Date: Tue, 22 Feb 2022 00:07:07 +0000
Organization: A noiseless patient Spider
Lines: 15
Message-ID: <sv19fb$kpt$1@dont-email.me>
References: <1332937368@f0.n0.z0.fidonet.org>
<3304777101@f1.n221.z2.fidonet.fi> <sut0v9$386$1@dont-email.me>
<sut3hq$gaj$1@dont-email.me> <suu95j$hvn$1@dont-email.me>
<sv07ml$t44$1@dont-email.me> <sv0r2h$mte$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 22 Feb 2022 00:07:07 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="54e6021f9cd8acf865d95b02233040f9";
logging-data="21309"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+IU1eq/GRVachKCv66wPiaUSZI/y6OlOA="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.0
Cancel-Lock: sha1:9oQJxhrEmkcMuGf3SUlJiu08NvY=
In-Reply-To: <sv0r2h$mte$1@dont-email.me>
Content-Language: en-GB
 by: Pancho - Tue, 22 Feb 2022 00:07 UTC

On 21/02/2022 20:01, Davey wrote:

>
> You are working at the extreme edge of my programming knowledge, so if
> I try this, I might get lost!

I'm not sure which bit. Maybe Docker? It is easier to learn Docker than
it is to learn how to diagnose installation problems in Linux, unless
you want to be like Sisyphus.

Your difficulties with Tesseract could be due to all sorts of things
specific to your installation of Ubuntu. It is very hard to make a
software package work on every installation of Ubuntu, regardless of
what else is installed.
