Subject (Author)
* De-duplicating a Maildir directory (d...@brannerchinese.com)
+* Re: De-duplicating a Maildir directory (d...@brannerchinese.com)
|`* Re: De-duplicating a Maildir directory (J.O. Aho)
| +- Re: De-duplicating a Maildir directory (d...@brannerchinese.com)
| `* Re: De-duplicating a Maildir directory (Carlos E.R.)
|  `* Re: De-duplicating a Maildir directory (Henning Hucke)
|   `- Re: De-duplicating a Maildir directory (Carlos E.R.)
`- Re: De-duplicating a Maildir directory (Eduardo Chappa)

De-duplicating a Maildir directory

<a007b527-c2f8-405a-86f4-eaee0cbcca1cn@googlegroups.com>

https://www.novabbs.com/computers/article-flat.php?id=266&group=comp.mail.pine#266
X-Received: by 2002:ac8:5c50:: with SMTP id j16mr1535823qtj.255.1639733383922;
Fri, 17 Dec 2021 01:29:43 -0800 (PST)
X-Received: by 2002:a25:dcc2:: with SMTP id y185mr3355534ybe.611.1639733383619;
Fri, 17 Dec 2021 01:29:43 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.mail.pine
Date: Fri, 17 Dec 2021 01:29:43 -0800 (PST)
Injection-Info: google-groups.googlegroups.com; posting-host=220.135.22.246; posting-account=Qw_BcgoAAAA92_AD2P838HK9JO60EUh_
NNTP-Posting-Host: 220.135.22.246
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a007b527-c2f8-405a-86f4-eaee0cbcca1cn@googlegroups.com>
Subject: De-duplicating a Maildir directory
From: dpb...@brannerchinese.com (d...@brannerchinese.com)
Injection-Date: Fri, 17 Dec 2021 09:29:43 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 6
 by: d...@brannerchinese. - Fri, 17 Dec 2021 09:29 UTC

Does Alpine contain functionality for de-duplicating a Maildir directory?

It sometimes happens that a single message gets saved more than once to an archiving directory, and I'd like to know if there is already functionality for removing such duplicates.

Thanks!

- dpb

Re: De-duplicating a Maildir directory

<73b32abf-1502-4494-8b13-fa50e0483106n@googlegroups.com>

https://www.novabbs.com/computers/article-flat.php?id=267&group=comp.mail.pine#267
X-Received: by 2002:a05:622a:1112:: with SMTP id e18mr1556143qty.226.1639733834835;
Fri, 17 Dec 2021 01:37:14 -0800 (PST)
X-Received: by 2002:a25:28a:: with SMTP id 132mr3281040ybc.681.1639733834620;
Fri, 17 Dec 2021 01:37:14 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.mail.pine
Date: Fri, 17 Dec 2021 01:37:14 -0800 (PST)
In-Reply-To: <a007b527-c2f8-405a-86f4-eaee0cbcca1cn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=220.135.22.246; posting-account=Qw_BcgoAAAA92_AD2P838HK9JO60EUh_
NNTP-Posting-Host: 220.135.22.246
References: <a007b527-c2f8-405a-86f4-eaee0cbcca1cn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <73b32abf-1502-4494-8b13-fa50e0483106n@googlegroups.com>
Subject: Re: De-duplicating a Maildir directory
From: dpb...@brannerchinese.com (d...@brannerchinese.com)
Injection-Date: Fri, 17 Dec 2021 09:37:14 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 4
 by: d...@brannerchinese. - Fri, 17 Dec 2021 09:37 UTC

I'm aware of this free-standing application: https://github.com/kdeldycke/mail-deduplicate

But I'm wondering if there is anything comparable built into Alpine itself.

- dpb

Re: De-duplicating a Maildir directory

<j23fquFn6amU1@mid.individual.net>

https://www.novabbs.com/computers/article-flat.php?id=268&group=comp.mail.pine#268
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: use...@example.net (J.O. Aho)
Newsgroups: comp.mail.pine
Subject: Re: De-duplicating a Maildir directory
Date: Fri, 17 Dec 2021 13:58:06 +0100
Lines: 16
Message-ID: <j23fquFn6amU1@mid.individual.net>
References: <a007b527-c2f8-405a-86f4-eaee0cbcca1cn@googlegroups.com>
<73b32abf-1502-4494-8b13-fa50e0483106n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net CHKt4R3JhieqrLFf4CvZyATv2h3mITkZBBqVhld35kco86p4CK
Cancel-Lock: sha1:PaSYAY98JF7DcFwIdo015VWNJaA=
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.4.0
Content-Language: en-US-large
In-Reply-To: <73b32abf-1502-4494-8b13-fa50e0483106n@googlegroups.com>
 by: J.O. Aho - Fri, 17 Dec 2021 12:58 UTC

On 17/12/2021 10.37, d...@brannerchinese.com wrote:
> I'm aware of this free-standing application: https://github.com/kdeldycke/mail-deduplicate
>
> But I'm wondering if there is anything comparable built into Alpine itself.

I think de-duplication is a file system feature. ZFS has such
functionality: it stores only one copy of a block with the same data and
points every file at that copy. When you delete the last file pointing to
that block, the block content is deleted too.
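
To illustrate the block-sharing idea, here is a minimal Python sketch of a content-addressed store with reference counting. It is a toy model, not how ZFS is actually implemented: the `DedupStore` class and its method names are invented for illustration, and each "file" is treated as a single block.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store with reference counting (hypothetical)."""

    def __init__(self):
        self.blocks = {}    # digest -> block bytes, stored once
        self.refcount = {}  # digest -> number of files pointing at the block
        self.files = {}     # file name -> digest of its single block

    def write(self, name, data):
        # Overwriting a file first drops its reference to the old block.
        if name in self.files:
            self.delete(name)
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        self.files[name] = digest

    def delete(self, name):
        digest = self.files.pop(name)
        self.refcount[digest] -= 1
        # The block is freed only when the last file pointing to it is gone.
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.blocks[digest]
```

Note that this kind of de-duplication only saves space; both files still show up in the directory, so it does not remove duplicate mails from a folder listing.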

No, Alpine doesn't have a function for deleting duplicate mails; you
should look at tools made for this, for example
https://github.com/kdeldycke/mail-deduplicate

--
//Aho

Re: De-duplicating a Maildir directory

<c1e2a50a-e111-446d-97de-2c2d0f2e9302n@googlegroups.com>

https://www.novabbs.com/computers/article-flat.php?id=275&group=comp.mail.pine#275
X-Received: by 2002:a37:bac2:: with SMTP id k185mr4373152qkf.685.1639827653512;
Sat, 18 Dec 2021 03:40:53 -0800 (PST)
X-Received: by 2002:a25:1004:: with SMTP id 4mr10293751ybq.669.1639827653233;
Sat, 18 Dec 2021 03:40:53 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.mail.pine
Date: Sat, 18 Dec 2021 03:40:52 -0800 (PST)
In-Reply-To: <j23fquFn6amU1@mid.individual.net>
Injection-Info: google-groups.googlegroups.com; posting-host=220.135.22.246; posting-account=Qw_BcgoAAAA92_AD2P838HK9JO60EUh_
NNTP-Posting-Host: 220.135.22.246
References: <a007b527-c2f8-405a-86f4-eaee0cbcca1cn@googlegroups.com>
<73b32abf-1502-4494-8b13-fa50e0483106n@googlegroups.com> <j23fquFn6amU1@mid.individual.net>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c1e2a50a-e111-446d-97de-2c2d0f2e9302n@googlegroups.com>
Subject: Re: De-duplicating a Maildir directory
From: dpb...@brannerchinese.com (d...@brannerchinese.com)
Injection-Date: Sat, 18 Dec 2021 11:40:53 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 4
 by: d...@brannerchinese. - Sat, 18 Dec 2021 11:40 UTC

I find mail-deduplicate inadequately documented, and some of the functionality doesn't work as expected. Output, for instance, always seems to be in mbox format, even when I specify Maildir input.

However, I find fdupes (available through many package managers) helpful.
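
For readers who want to see what content-based matching amounts to, here is a rough Python sketch that groups byte-for-byte identical files under a directory. It is not fdupes and does not reproduce its algorithm; the function name `find_identical_files` is made up for this example, and it simply hashes every file with SHA-256.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_identical_files(root):
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

This only catches copies whose files are exactly identical. Two copies of the same message that picked up different headers on the way will not match.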

- dpb

Re: De-duplicating a Maildir directory

<112b2697-5a95-91cf-7e4a-ec073f5626b4@washington.edu>

https://www.novabbs.com/computers/article-flat.php?id=279&group=comp.mail.pine#279
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cha...@washington.edu (Eduardo Chappa)
Newsgroups: comp.mail.pine
Subject: Re: De-duplicating a Maildir directory
Date: Sat, 18 Dec 2021 10:41:54 -0700
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <112b2697-5a95-91cf-7e4a-ec073f5626b4@washington.edu>
References: <a007b527-c2f8-405a-86f4-eaee0cbcca1cn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed
Injection-Info: reader02.eternal-september.org; posting-host="7da90bf58cccedf12006e80b2a193ce3";
logging-data="29977"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/RnV2z0CN1BM2RMxnIiUKb"
Cancel-Lock: sha1:f46KNqWbR4xB44+LnxcA/9L8oxk=
In-Reply-To: <a007b527-c2f8-405a-86f4-eaee0cbcca1cn@googlegroups.com>
 by: Eduardo Chappa - Sat, 18 Dec 2021 17:41 UTC

On Fri, 17 Dec 2021, d...@brannerchinese.com wrote:

> Does Alpine contain functionality for de-duplicating a Maildir directory?
>
> It sometimes happens that a single message gets saved more than once to
> an archiving directory, and I'd like to know if there is already
> functionality for removing such duplicates.

Dear dpb,

If you build Alpine with Maildir support, the mailutil program
bundled with Alpine can read a Maildir folder and remove
duplicates. Run the mailutil program as

mailutil dedup MAILBOX_NAME

If you omit the mailbox name, mailutil removes duplicates from your
INBOX. For this purpose, two messages count as duplicates when they
have the same Message-ID.
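
As an illustration of the Message-ID criterion (not of mailutil's actual code), the following Python sketch lists which messages in a Maildir share a Message-ID, using the standard library `mailbox` module. The function name `duplicates_by_message_id` is hypothetical, and the sketch only reports duplicates; it deletes nothing.

```python
import mailbox
from collections import defaultdict


def duplicates_by_message_id(maildir_path):
    """Map each Message-ID seen more than once to the Maildir keys carrying it."""
    box = mailbox.Maildir(maildir_path, create=False)
    seen = defaultdict(list)
    for key, msg in box.iteritems():
        # str() guards against header objects returned for non-ASCII values.
        msg_id = str(msg.get("Message-ID", "")).strip()
        if msg_id:  # messages without a Message-ID are skipped
            seen[msg_id].append(key)
    return {mid: keys for mid, keys in seen.items() if len(keys) > 1}
```

Running this before a real de-duplication pass gives a preview of which messages would be treated as copies of each other under that definition.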

I hope this helps.

--
Eduardo
https://tinyurl.com/yc377wlh (web)
http://repo.or.cz/alpine.git (Git)

Re: De-duplicating a Maildir directory

<0ps89i-gpi.ln1@Telcontar.valinor>

https://www.novabbs.com/computers/article-flat.php?id=281&group=comp.mail.pine#281
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: robin_li...@es.invalid (Carlos E.R.)
Newsgroups: comp.mail.pine
Subject: Re: De-duplicating a Maildir directory
Date: Tue, 21 Dec 2021 13:09:36 +0100
Lines: 23
Message-ID: <0ps89i-gpi.ln1@Telcontar.valinor>
References: <a007b527-c2f8-405a-86f4-eaee0cbcca1cn@googlegroups.com>
<73b32abf-1502-4494-8b13-fa50e0483106n@googlegroups.com>
<j23fquFn6amU1@mid.individual.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net uRaJ74IfIhUX2c+fnfhAKACBBl/lgoVi2hh9TABp6OK9rPElPY
X-Orig-Path: Telcontar.valinor!not-for-mail
Cancel-Lock: sha1:bKiyPGqzqJcuyjhkHYc6EaykA7Q=
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
In-Reply-To: <j23fquFn6amU1@mid.individual.net>
Content-Language: en-CA
 by: Carlos E.R. - Tue, 21 Dec 2021 12:09 UTC

On 17/12/2021 13.58, J.O. Aho wrote:
> On 17/12/2021 10.37, d...@brannerchinese.com wrote:
>> I'm aware of this free-standing application:
>> https://github.com/kdeldycke/mail-deduplicate
>>
>> But I'm wondering if there is anything comparable built into Alpine
>> itself.
>
> I think de-duplication is a file system feature. ZFS has such
> functionality: it stores only one copy of a block with the same data and
> points every file at that copy. When you delete the last file pointing to
> that block, the block content is deleted too.
>
> No, Alpine doesn't have a function for deleting duplicate mails; you
> should look at tools made for this, for example
> https://github.com/kdeldycke/mail-deduplicate

Thunderbird has an addon to do this. It searches a folder, and produces
a window listing duplicates (it displays several fields), offering to
delete them. I find it a useful function.

--
Cheers, Carlos.

Re: De-duplicating a Maildir directory

<slrnss8bhf.v5g.h_hucke+spam.news@romulus.aeon.icebear.cloud>

https://www.novabbs.com/computers/article-flat.php?id=282&group=comp.mail.pine#282
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: h_hucke+...@newsmail.aeon.icebear.org (Henning Hucke)
Newsgroups: comp.mail.pine
Subject: Re: De-duplicating a Maildir directory
Date: Thu, 23 Dec 2021 08:07:11 -0000 (UTC)
Organization: aeon: think longer than you thought before
Lines: 26
Message-ID: <slrnss8bhf.v5g.h_hucke+spam.news@romulus.aeon.icebear.cloud>
References: <a007b527-c2f8-405a-86f4-eaee0cbcca1cn@googlegroups.com>
<73b32abf-1502-4494-8b13-fa50e0483106n@googlegroups.com>
<j23fquFn6amU1@mid.individual.net> <0ps89i-gpi.ln1@Telcontar.valinor>
Reply-To: Henning Hucke <h_hucke+news.reply(trick)@newsmail.aeon.icebear.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8-Bit
X-Trace: individual.net CHaB+LUfRXtfDdeCY4zhIwCKfz7fQtiX9klaIWUm4rTn1Wp2Kb
X-Orig-Path: news.aeon.icebear.cloud!news1.aeon.icebear.cloud!.POSTED.romulus.aeon.icebear.cloud!not-for-mail
Cancel-Lock: sha1:v6XU18aveQSNJ/VlsmtroKQhvCw= sha1:+ZJsRaIsIldawePHckxvTOtO1Ds=
Injection-Date: Thu, 23 Dec 2021 08:07:11 -0000 (UTC)
Injection-Info: sirius.aeon.icebear.cloud; posting-host="romulus.aeon.icebear.cloud:fd09:afca:b044:1:4ecc:6aff:fecf:5c8f";
logging-data="24263"; mail-complaints-to="abuse+news@aeon.icebear.cloud"
User-Agent: slrn/1.0.3 (Linux)
 by: Henning Hucke - Thu, 23 Dec 2021 08:07 UTC

On 2021-12-21, Carlos E.R. <robin_listas@es.invalid> wrote:

> [...]
>
> Thunderbird has an addon to do this. It searches a folder, and produces
> a window listing duplicates (it displays several fields), offering to
> delete them. I find it a useful function.

Strange thing this is! I never had (real) duplicates except intentional ones.
The last part of that sentence means that it does happen that I save
one mail to another folder without deleting the "original".
Aside from this, duplicates show up from sources which obviously don't
understand the purpose of a Message-ID and the need to avoid duplicates, or
which don't know how to generate unique identifiers.

Atlassian and Jira are a bad example of that...

Nonetheless, those are not real duplicates in the sense of being
identical in Message-ID as well as mail body.

Best regards,
Henning
--
In the first place, God made idiots;
this was for practice; then he made school boards.
-- Mark Twain

Re: De-duplicating a Maildir directory

<4s6e9i-9im.ln1@Telcontar.valinor>

https://www.novabbs.com/computers/article-flat.php?id=283&group=comp.mail.pine#283
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: robin_li...@es.invalid (Carlos E.R.)
Newsgroups: comp.mail.pine
Subject: Re: De-duplicating a Maildir directory
Date: Thu, 23 Dec 2021 13:32:36 +0100
Lines: 52
Message-ID: <4s6e9i-9im.ln1@Telcontar.valinor>
References: <a007b527-c2f8-405a-86f4-eaee0cbcca1cn@googlegroups.com>
<73b32abf-1502-4494-8b13-fa50e0483106n@googlegroups.com>
<j23fquFn6amU1@mid.individual.net> <0ps89i-gpi.ln1@Telcontar.valinor>
<slrnss8bhf.v5g.h_hucke+spam.news@romulus.aeon.icebear.cloud>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net qEPGIScksapS8wro2L/hrgzddofm2Z4YTWGFFqoFZ/so6YjRlF
X-Orig-Path: Telcontar.valinor!not-for-mail
Cancel-Lock: sha1:6BGeVIACTySpBDmbspKoXI0HYO8=
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
In-Reply-To: <slrnss8bhf.v5g.h_hucke+spam.news@romulus.aeon.icebear.cloud>
Content-Language: en-CA
 by: Carlos E.R. - Thu, 23 Dec 2021 12:32 UTC

On 23/12/2021 09.07, Henning Hucke wrote:
> On 2021-12-21, Carlos E.R. <robin_listas@es.invalid> wrote:
>
>> [...]
>>
>> Thunderbird has an addon to do this. It searches a folder, and
>> produces a window listing duplicates (it displays several fields),
>> offering to delete them. I find it a useful function.
>
> Strange thing this is! I never had (real) duplicates except intentional
> ones.
> The last part of that sentence means that it does happen that I save
> one mail to another folder without deleting the "original".
> Aside from this, duplicates show up from sources which obviously don't
> understand the purpose of a Message-ID and the need to avoid
> duplicates, or which don't know how to generate unique identifiers.
>
> Atlassian and Jira are a bad example of that...
>
> Nonetheless, those are not real duplicates in the sense of being
> identical in Message-ID as well as mail body.

They happen easily when having two or more computers with local folders,
when trying to keep things in sync between them.

Say, on computer A you save mails about SciFi to folder SciFi, and later
you do the same on computer B, but at that time there is a different
selection for whatever reason, and later you try to sync the two SciFi
folders.

Or you move some mails to a temporary folder, then a year later you find
that temporary and forgotten folder, and being afraid of deleting mails
you move them to a final folder, not remembering they are already there.

Things like that.

True duplicates.

So a pass with a duplicate finder locates them, and you can remove them
relatively easily.

Judging a dupe just by the Message-ID is a mistake. For instance, the
sent folder and the inbox from a mailing list would have your email in both
places with the same Message-ID, but if you look carefully you see
different headers, and sometimes different bodies.

Gmail makes exactly this mistake.
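
A stricter test along those lines can be sketched in Python: treat two messages as duplicates only when both the Message-ID and the body match. This is a minimal sketch under assumptions, not any tool's actual behaviour. The function name `strict_duplicates` is hypothetical, the body is taken as everything after the first blank line, and header differences are deliberately ignored.

```python
import hashlib
import mailbox
from collections import defaultdict


def strict_duplicates(maildir_path):
    """Group Maildir keys whose messages match in Message-ID and in body bytes."""
    box = mailbox.Maildir(maildir_path, create=False)
    groups = defaultdict(list)
    for key, msg in box.iteritems():
        msg_id = str(msg.get("Message-ID", "")).strip()
        # Everything after the first blank line is treated as the body.
        _, _, body = box.get_bytes(key).partition(b"\n\n")
        body_digest = hashlib.sha256(body).hexdigest()
        # Messages lacking a Message-ID share the empty ID and match on body only.
        groups[(msg_id, body_digest)].append(key)
    return [keys for keys in groups.values() if len(keys) > 1]
```

With this key, the sent-folder copy and the list copy of the same mail stay separate whenever the list altered the body, for example by appending a footer.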

--
Cheers, Carlos.
