devel / comp.lang.python / Re: Comparing sequences with range objects

Subject | Author
* Re: Comparing sequences with range objects | Antoon Pardon
+* Re: Comparing sequences with range objects | duncan smith
|+* Re: Comparing sequences with range objects | Antoon Pardon
||`* Re: Comparing sequences with range objects | duncan smith
|| +* Re: Comparing sequences with range objects | Antoon Pardon
|| |`- Re: Comparing sequences with range objects | duncan smith
|| `- Re: Comparing sequences with range objects | 2QdxY4RzWzUUiLuE
|`- Re: Comparing sequences with range objects | Antoon Pardon
`* Re: Comparing sequences with range objects | Christian Gollwitzer
 `- Re: Comparing sequences with range objects | Ian Hobson

Re: Comparing sequences with range objects

<mailman.61.1649402503.20749.python-list@python.org>

 by: Antoon Pardon - Fri, 8 Apr 2022 07:21 UTC

On 8/04/2022 at 08:24, Peter J. Holzer wrote:
> On 2022-04-07 17:16:41 +0200, Antoon Pardon wrote:
>> On 7/04/2022 at 16:08, Joel Goldstick wrote:
>>> On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon<antoon.pardon@vub.be> wrote:
>>>> I am working with a list of data from which I have to weed out duplicates.
>>>> At the moment I keep for each entry a container with the other entries
>>>> that are still possible duplicates.
> [...]
>> Sorry I wasn't clear. The data contains information about persons. But not
>> all records need to be complete. So a person can occur multiple times in
>> the list, while the records are all different because they are missing
>> different bits.
>>
>> So all records with the same firstname can be duplicates. But if I have
>> a record in which the firstname is missing, it can at that point be
>> a duplicate of all other records.
> There are two problems. The first one is how do you establish identity.
> The second is how do you weed out identical objects. In your first mail
> you only asked about the second, but that's easy.
>
> The first is really hard. Not only may information be missing, no single
> piece of information is unique or immutable. Two people may have
> the same name (I know about several other "Peter Holzer"s), a single
> person might change their name (when I was younger I went by my middle
> name - how would you know that "Peter Holzer" and "Hansi Holzer" are the
> same person?), they will move (= change their address), change jobs,
> etc. Unless you have a unique immutable identifier that's enforced by
> some authority (like a social security number[1]), I don't think there
> is a chance to do that reliably in a program (although with enough data,
> a heuristic may be good enough).

Yes I know all that. That is why I keep a bucket of possible duplicates
per "identifying" field that is examined and use some heuristics at the
end of all the comparing instead of starting to weed out the duplicates
at the moment something differs.

The problem is, that when an identifying field is judged to be unusable,
the bucket to be associated with it should conceptually contain all other
records (which in this case are the indexes into the population list).
But that will eat a lot of memory. So I want some object that behaves as
if it is a (immutable) list of all these indexes without actually containing
them. A range object almost works, with the only problem it is not
comparable with a list.
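
One way to get such an object might be a thin wrapper around range that adds the missing comparison. This is only a minimal sketch: the class name AllIndexes is hypothetical, and it assumes the bucket is always the full run of indexes 0..n-1. Comparing against a real list still takes O(n) time, but no index is ever stored.

```python
from collections.abc import Sequence


class AllIndexes(Sequence):
    """Hypothetical read-only view of the indexes 0..n-1, backed by a range."""

    def __init__(self, n):
        self._r = range(n)

    def __len__(self):
        return len(self._r)

    def __getitem__(self, i):
        # Delegates to range; note that a slice comes back as a plain range.
        return self._r[i]

    def __iter__(self):
        return iter(self._r)

    def __contains__(self, x):
        return x in self._r

    def __eq__(self, other):
        if isinstance(other, AllIndexes):
            return self._r == other._r
        if isinstance(other, (range, list, tuple)):
            # Element-wise comparison, without materialising the indexes.
            return len(self) == len(other) and all(
                a == b for a, b in zip(self, other)
            )
        return NotImplemented


bucket = AllIndexes(5)
print(bucket == [0, 1, 2, 3, 4])
print([0, 1, 2] == bucket)
```

Because list.__eq__ returns NotImplemented for a non-list operand, Python falls back to the reflected AllIndexes.__eq__, so the comparison works with the list on either side.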

--
Antoon Pardon.

Re: Comparing sequences with range objects

<kEX3K.402086$iK66.240336@fx46.iad>

 by: duncan smith - Fri, 8 Apr 2022 14:28 UTC

On 08/04/2022 08:21, Antoon Pardon wrote:
>
>
> On 8/04/2022 at 08:24, Peter J. Holzer wrote:
>> On 2022-04-07 17:16:41 +0200, Antoon Pardon wrote:
>>> On 7/04/2022 at 16:08, Joel Goldstick wrote:
>>>> On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon<antoon.pardon@vub.be>
>>>> wrote:
>>>>> I am working with a list of data from which I have to weed out
>>>>> duplicates.
>>>>> At the moment I keep for each entry a container with the other entries
>>>>> that are still possible duplicates.
>> [...]
>>> Sorry I wasn't clear. The data contains information about persons.
>>> But not
>>> all records need to be complete. So a person can occur multiple times in
>>> the list, while the records are all different because they are missing
>>> different bits.
>>>
>>> So all records with the same firstname can be duplicates. But if I have
>>> a record in which the firstname is missing, it can at that point be
>>> a duplicate of all other records.
>> There are two problems. The first one is how do you establish identity.
>> The second is how do you weed out identical objects. In your first mail
>> you only asked about the second, but that's easy.
>>
>> The first is really hard. Not only may information be missing, no single
>> piece of information is unique or immutable. Two people may have
>> the same name (I know about several other "Peter Holzer"s), a single
>> person might change their name (when I was younger I went by my middle
>> name - how would you know that "Peter Holzer" and "Hansi Holzer" are the
>> same person?), they will move (= change their address), change jobs,
>> etc. Unless you have a unique immutable identifier that's enforced by
>> some authority (like a social security number[1]), I don't think there
>> is a chance to do that reliably in a program (although with enough data,
>> a heuristic may be good enough).
>
> Yes I know all that. That is why I keep a bucket of possible duplicates
> per "identifying" field that is examined and use some heuristics at the
> end of all the comparing instead of starting to weed out the duplicates
> at the moment something differs.
>
> The problem is, that when an identifying field is judged to be unusable,
> the bucket to be associated with it should conceptually contain all other
> records (which in this case are the indexes into the population list).
> But that will eat a lot of memory. So I want some object that behaves as
> if it is a (immutable) list of all these indexes without actually
> containing
> them. A range object almost works, with the only problem it is not
> comparable with a list.
>

Is there any reason why you can't use ints? Just set the relevant bits.
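
A rough sketch of what that could look like, assuming each record is identified by its index into the population list. The names n_records, all_candidates, same_firstname and contains are made up for illustration. The "all other records" bucket costs on the order of one bit per record, and two buckets compare with a plain ==.

```python
n_records = 8  # hypothetical population size

# "Every record is still a candidate": the n_records lowest bits are all set.
all_candidates = (1 << n_records) - 1

# A bucket built up record by record, by setting one bit per index.
same_firstname = 0
for idx in (1, 4, 6):
    same_firstname |= 1 << idx

# Intersecting two buckets is a single bitwise AND.
still_possible = all_candidates & same_firstname


def contains(bits, idx):
    """True if index idx is a member of the bitset bits."""
    return bool((bits >> idx) & 1)


print(bin(still_possible))
print(contains(still_possible, 4), contains(still_possible, 0))
```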

Duncan

Re: Comparing sequences with range objects

<mailman.67.1649452095.20749.python-list@python.org>

 by: Antoon Pardon - Fri, 8 Apr 2022 21:08 UTC

On 8/04/2022 at 16:28, duncan smith wrote:
> On 08/04/2022 08:21, Antoon Pardon wrote:
>>
>> Yes I know all that. That is why I keep a bucket of possible duplicates
>> per "identifying" field that is examined and use some heuristics at the
>> end of all the comparing instead of starting to weed out the duplicates
>> at the moment something differs.
>>
>> The problem is, that when an identifying field is judged to be unusable,
>> the bucket to be associated with it should conceptually contain all
>> other
>> records (which in this case are the indexes into the population list).
>> But that will eat a lot of memory. So I want some object that behaves as
>> if it is a (immutable) list of all these indexes without actually
>> containing
>> them. A range object almost works, with the only problem it is not
>> comparable with a list.
>>
>
> Is there any reason why you can't use ints? Just set the relevant bits.

Well my first thought is that a bitset makes it less obvious to calculate
the size of the set or to iterate over its elements. But it is an idea
worth exploring.

--
Antoon.

Re: Comparing sequences with range objects

<Q144K.71885$001.60018@fx34.iad>

 by: duncan smith - Sat, 9 Apr 2022 00:01 UTC

On 08/04/2022 22:08, Antoon Pardon wrote:
>
> On 8/04/2022 at 16:28, duncan smith wrote:
>> On 08/04/2022 08:21, Antoon Pardon wrote:
>>>
>>> Yes I know all that. That is why I keep a bucket of possible duplicates
>>> per "identifying" field that is examined and use some heuristics at the
>>> end of all the comparing instead of starting to weed out the duplicates
>>> at the moment something differs.
>>>
>>> The problem is, that when an identifying field is judged to be unusable,
>>> the bucket to be associated with it should conceptually contain all
>>> other
>>> records (which in this case are the indexes into the population list).
>>> But that will eat a lot of memory. So I want some object that behaves as
>>> if it is a (immutable) list of all these indexes without actually
>>> containing
>>> them. A range object almost works, with the only problem it is not
>>> comparable with a list.
>>>
>>
>> Is there any reason why you can't use ints? Just set the relevant bits.
>
> Well my first thought is that a bitset makes it less obvious to calculate
> the size of the set or to iterate over its elements. But it is an idea
> worth exploring.
>

def popcount(n):
    """
    Returns the number of set bits in n (assumes n >= 0)
    """
    cnt = 0
    while n:
        n &= n - 1  # clears the lowest set bit
        cnt += 1
    return cnt

and not tested,

def iterinds(n):
    """
    Returns a generator of the indices of the set bits of n (assumes n >= 0)
    """
    i = 0
    while n:
        if n & 1:
            yield i
        n = n >> 1  # move on to the next bit
        i += 1
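
For comparison, here is a hedged variant of the same two helpers; the name iter_set_bits is made up, and both functions assume a non-negative int. Counting goes through the binary string, and the iterator jumps straight from one set bit to the next instead of shifting through every zero. On Python 3.10 and later, int.bit_count() should give the same count directly.

```python
def popcount(n):
    """Number of set bits in a non-negative int."""
    return bin(n).count("1")


def iter_set_bits(n):
    """Yield the indices of the set bits of n, lowest first."""
    while n:
        low = n & -n                  # isolates the lowest set bit
        yield low.bit_length() - 1    # position of that bit
        n ^= low                      # clear it and continue


bits = 0b10110
print(popcount(bits))
print(list(iter_set_bits(bits)))
```

For a sparse bitset the loop runs once per member rather than once per bit position, although each step still does big-integer arithmetic on the whole value.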

Duncan

Re: Comparing sequences with range objects

<t2r892$4gc$1@dont-email.me>

 by: Christian Gollwitzer - Sat, 9 Apr 2022 06:14 UTC

On 08.04.22 at 09:21, Antoon Pardon wrote:
>> The first is really hard. Not only may information be missing, no
>> single piece of information is unique or immutable. Two people may have
>> the same name (I know about several other "Peter Holzer"s), a single
>> person might change their name (when I was younger I went by my middle
>> name - how would you know that "Peter Holzer" and "Hansi Holzer" are the
>> same person?), they will move (= change their address), change jobs,
>> etc. Unless you have a unique immutable identifier that's enforced by
>> some authority (like a social security number[1]), I don't think there
>> is a chance to do that reliably in a program (although with enough data,
>> a heuristic may be good enough).
>
> Yes I know all that. That is why I keep a bucket of possible duplicates
> per "identifying" field that is examined and use some heuristics at the
> end of all the comparing instead of starting to weed out the duplicates
> at the moment something differs.
>
> The problem is, that when an identifying field is judged to be unusable,
> the bucket to be associated with it should conceptually contain all other
> records (which in this case are the indexes into the population list).
> But that will eat a lot of memory. So I want some object that behaves as
> if it is a (immutable) list of all these indexes without actually
> containing
> them. A range object almost works, with the only problem it is not
> comparable with a list.

Then write your own comparator function?

Also, if the only case where this actually works is the index of all
other records, then a simple boolean flag ("all" vs. "these items in the
index list") would suffice, wouldn't it?
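
Both suggestions can be sketched briefly. This is only an illustration under assumptions: same_indexes and Bucket are hypothetical names, and the real bucket structure is not shown in the thread. A range and a list never compare equal with ==, so the comparison has to go element by element.

```python
def same_indexes(a, b):
    """True when two index sequences hold equal values in the same order.

    Works for any mix of range, list and tuple without copying either one.
    """
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


class Bucket:
    """Hypothetical bucket: either 'all records' or an explicit index list."""

    def __init__(self, population_size, indexes=None):
        self.population_size = population_size
        self.indexes = indexes          # None is the "all" flag

    @property
    def is_all(self):
        return self.indexes is None

    def __len__(self):
        return self.population_size if self.is_all else len(self.indexes)

    def __iter__(self):
        # A range costs constant memory, whatever the population size.
        source = range(self.population_size) if self.is_all else self.indexes
        return iter(source)


print(range(3) == [0, 1, 2])               # False: different types
print(same_indexes(range(3), [0, 1, 2]))   # True

everyone = Bucket(1_000_000)               # no million-element list is built
some = Bucket(1_000_000, [3, 17, 42])
print(len(everyone), len(some))            # 1000000 3
```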

Christian

Re: Comparing sequences with range objects

<mailman.69.1649565478.20749.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=17768&group=comp.lang.python#17768
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!not-for-mail
From: hobso...@gmail.com (Ian Hobson)
Newsgroups: comp.lang.python
Subject: Re: Comparing sequences with range objects
Date: Sun, 10 Apr 2022 11:37:51 +0700
Lines: 70
Message-ID: <mailman.69.1649565478.20749.python-list@python.org>
References: <98f69f0d-3909-ca13-a440-1d226164b9a5@vub.be>
<CAPM-O+wZU1KQRy-JE_Ez6aDBaj=4fGN93H7v6gBHaVsp5+3JEQ@mail.gmail.com>
<af73f623-078a-6cfe-0761-16c62e378568@vub.be>
<20220408062448.ny3nufpeyajwkcqp@hjp.at>
<7f32456e-0022-1230-f5b4-b86ad057cb68@vub.be>
<mailman.61.1649402503.20749.python-list@python.org>
<t2r892$4gc$1@dont-email.me>
<c716c471-da3d-f44e-3862-1876f6affd3c@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: news.uni-berlin.de 9R5KKPXQAG9IMVLRcd6ezAJR6l9CUQkT2yGin5MjrYEQ==
Return-Path: <hobson42@gmail.com>
X-Original-To: python-list@python.org
Delivered-To: python-list@mail.python.org
Authentication-Results: mail.python.org; dkim=pass
reason="2048-bit key; unprotected key"
header.d=gmail.com header.i=@gmail.com header.b=hlUcj221;
dkim-adsp=pass; dkim-atps=neutral
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.7.0
Content-Language: en-GB
In-Reply-To: <t2r892$4gc$1@dont-email.me>
 by: Ian Hobson - Sun, 10 Apr 2022 04:37 UTC

On 09/04/2022 13:14, Christian Gollwitzer wrote:
> On 08.04.22 at 09:21, Antoon Pardon wrote:
>>> The first is really hard. Not only may information be missing, no
>>> single piece of information is unique or immutable. Two people may have
>>> the same name (I know about several other "Peter Holzer"s), a single
>>> person might change their name (when I was younger I went by my middle
>>> name - how would you know that "Peter Holzer" and "Hansi Holzer" are the
>>> same person?), they will move (= change their address), change jobs,
>>> etc. Unless you have a unique immutable identifier that's enforced by
>>> some authority (like a social security number[1]), I don't think there
>>> is a chance to do that reliably in a program (although with enough data,
>>> a heuristic may be good enough).
>>
>> Yes I know all that. That is why I keep a bucket of possible duplicates
>> per "identifying" field that is examined and use some heuristics at the
>> end of all the comparing instead of starting to weed out the duplicates
>> at the moment something differs.
>>
>> The problem is, that when an identifying field is judged to be unusable,
>> the bucket to be associated with it should conceptually contain all other
>> records (which in this case are the indexes into the population list).
>> But that will eat a lot of memory. So I want some object that behaves as
>> if it is a (immutable) list of all these indexes without actually
>> containing
>> them. A range object almost works, with the only problem it is not
>> comparable with a list.
>
>
> Then write your own comparator function?
>
> Also, if the only case where this actually works is the index of all
> other records, then a simple boolean flag ("all" vs. "these items in the
> index list") would suffice, wouldn't it?
>
>     Christian
>
Writing a comparator function is only possible for a given key. So my
approach would be:

1) Write a comparator function that takes params X and Y, such that:
    if key data is missing from X, return 1
    if key data is missing from Y, return -1
    if X > Y, return 1
    if X < Y, return -1
    return 0  # they are equal and key data for both is present

2) Sort the data using the comparator function.

3) Run through the data with a trailing enumeration loop, merging
matching records together.

4) If there are no records copied out with missing
key data, then you are done, so exit.

5) Choose a new key and repeat from step 1).
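
One pass of steps 1) to 4) might look like the sketch below. It rests on assumptions the post does not state: records are dicts, a missing key is stored as None, and two records that both lack the key compare as equal. The names make_comparator and one_pass, and the sample data, are invented for illustration.

```python
from functools import cmp_to_key
from itertools import groupby


def make_comparator(key):
    """Build the step-1 comparator for one key; missing data sorts last."""
    def compare(x, y):
        kx, ky = x.get(key), y.get(key)
        if kx is None and ky is None:
            return 0        # assumption: the post leaves this case open
        if kx is None:
            return 1
        if ky is None:
            return -1
        if kx > ky:
            return 1
        if kx < ky:
            return -1
        return 0            # equal, and key data for both is present
    return compare


def one_pass(records, key):
    """Sort by key, group equal keys, and set aside records lacking the key."""
    ordered = sorted(records, key=cmp_to_key(make_comparator(key)))
    present = [r for r in ordered if r.get(key) is not None]
    missing = [r for r in ordered if r.get(key) is None]
    groups = [list(g) for _, g in groupby(present, key=lambda r: r[key])]
    return groups, missing


records = [
    {"id": 0, "email": "p@example.org", "phone": "555-0100"},
    {"id": 1, "email": None, "phone": "555-0100"},
    {"id": 2, "email": "p@example.org", "phone": None},
    {"id": 3, "email": "a@example.org", "phone": "555-0199"},
]

groups, missing = one_pass(records, "email")
print([[r["id"] for r in g] for g in groups])   # [[3], [0, 2]]
print([r["id"] for r in missing])               # [1]
```

If missing is non-empty, step 5) would call one_pass again with another key, for example "phone" in this made-up data.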

Regards

Ian

--
Ian Hobson
Tel (+66) 626 544 695

--
This email has been checked for viruses by AVG.
https://www.avg.com

Re: Comparing sequences with range objects

<mailman.70.1649622049.20749.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=17770&group=comp.lang.python#17770
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!not-for-mail
From: antoon.p...@vub.be (Antoon Pardon)
Newsgroups: comp.lang.python
Subject: Re: Comparing sequences with range objects
Date: Sun, 10 Apr 2022 22:20:33 +0200
Lines: 41
Message-ID: <mailman.70.1649622049.20749.python-list@python.org>
References: <98f69f0d-3909-ca13-a440-1d226164b9a5@vub.be>
<CAPM-O+wZU1KQRy-JE_Ez6aDBaj=4fGN93H7v6gBHaVsp5+3JEQ@mail.gmail.com>
<af73f623-078a-6cfe-0761-16c62e378568@vub.be>
<20220408062448.ny3nufpeyajwkcqp@hjp.at>
<7f32456e-0022-1230-f5b4-b86ad057cb68@vub.be>
<mailman.61.1649402503.20749.python-list@python.org>
<kEX3K.402086$iK66.240336@fx46.iad>
<92c3d7c7-4876-b4f4-bcd1-f428eb6865c0@vub.be>
<mailman.67.1649452095.20749.python-list@python.org>
<Q144K.71885$001.60018@fx34.iad>
<0cdbc5b1-0fd9-7a39-e7a9-c80a4566c087@vub.be>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: news.uni-berlin.de U5WQUcgsI1ldAlZ2ZGK4aQ5k+KAsSo8IDda7xMdm8PbQ==
Return-Path: <Antoon.Pardon@vub.be>
X-Original-To: python-list@python.org
Delivered-To: python-list@mail.python.org
Authentication-Results: mail.python.org; dkim=pass
reason="1024-bit key; unprotected key"
header.d=vub.be header.i=@vub.be header.b=mcI2TZiI; dkim-adsp=pass;
dkim-atps=neutral
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.7.0
Content-Language: nl-BE
In-Reply-To: <Q144K.71885$001.60018@fx34.iad>
 by: Antoon Pardon - Sun, 10 Apr 2022 20:20 UTC

On 9/04/2022 at 02:01, duncan smith wrote:
> On 08/04/2022 22:08, Antoon Pardon wrote:
>>
>> Well my first thought is that a bitset makes it less obvious to calculate
>> the size of the set or to iterate over its elements. But it is an idea
>> worth exploring.
>>
>
>
>
> def popcount(n):
>     """
>     Returns the number of set bits in n
>     """
>     cnt = 0
>     while n:
>         n &= n - 1
>         cnt += 1
>     return cnt
>
> and not tested,
>
> def iterinds(n):
>     """
>     Returns a generator of the indices of the set bits of n
>     """
>     i = 0
>     while n:
>         if n & 1:
>             yield i
>         n = n >> 1
>         i += 1
>
Sure, but these seem rather naive implementations, with a time complexity of
O(n) where n is the maximum number of possible elements. Using these would
turn my O(n) algorithm into an O(n^2) algorithm.

--
Antoon Pardon.

Re: Comparing sequences with range objects

<YdK4K.571661$7F2.149038@fx12.iad>

https://www.novabbs.com/devel/article-flat.php?id=17771&group=comp.lang.python#17771
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!news-out.netnews.com!news.alt.net!fdc2.netnews.com!feeder1.feed.usenet.farm!feed.usenet.farm!peer01.ams4!peer.am4.highwinds-media.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx12.iad.POSTED!not-for-mail
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Subject: Re: Comparing sequences with range objects
Content-Language: en-GB
Newsgroups: comp.lang.python
References: <98f69f0d-3909-ca13-a440-1d226164b9a5@vub.be>
<CAPM-O+wZU1KQRy-JE_Ez6aDBaj=4fGN93H7v6gBHaVsp5+3JEQ@mail.gmail.com>
<af73f623-078a-6cfe-0761-16c62e378568@vub.be>
<20220408062448.ny3nufpeyajwkcqp@hjp.at>
<7f32456e-0022-1230-f5b4-b86ad057cb68@vub.be>
<mailman.61.1649402503.20749.python-list@python.org>
<kEX3K.402086$iK66.240336@fx46.iad>
<92c3d7c7-4876-b4f4-bcd1-f428eb6865c0@vub.be>
<mailman.67.1649452095.20749.python-list@python.org>
<Q144K.71885$001.60018@fx34.iad>
<0cdbc5b1-0fd9-7a39-e7a9-c80a4566c087@vub.be>
<mailman.70.1649622049.20749.python-list@python.org>
From: dun...@invalid.invalid (duncan smith)
In-Reply-To: <mailman.70.1649622049.20749.python-list@python.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 49
Message-ID: <YdK4K.571661$7F2.149038@fx12.iad>
X-Complaints-To: abuse@blocknews.net
NNTP-Posting-Date: Mon, 11 Apr 2022 00:02:00 UTC
Organization: blocknews - www.blocknews.net
Date: Mon, 11 Apr 2022 01:01:58 +0100
X-Received-Bytes: 2896
 by: duncan smith - Mon, 11 Apr 2022 00:01 UTC

On 10/04/2022 21:20, Antoon Pardon wrote:
>
>
> On 9/04/2022 at 02:01, duncan smith wrote:
>> On 08/04/2022 22:08, Antoon Pardon wrote:
>>>
>>> Well my first thought is that a bitset makes it less obvious to calculate
>>> the size of the set or to iterate over its elements. But it is an idea
>>> worth exploring.
>>>
>>
>>
>>
>> def popcount(n):
>>     """
>>     Returns the number of set bits in n
>>     """
>>     cnt = 0
>>     while n:
>>         n &= n - 1
>>         cnt += 1
>>     return cnt
>>
>> and not tested,
>>
>> def iterinds(n):
>>     """
>>     Returns a generator of the indices of the set bits of n
>>     """
>>     i = 0
>>     while n:
>>         if n & 1:
>>             yield i
>>         n = n >> 1
>>         i += 1
>>
> Sure, but these seem rather naive implementations, with a time complexity of
> O(n) where n is the maximum number of possible elements. Using these would
> turn my O(n) algorithm into an O(n^2) algorithm.
>

I thought your main concern was memory. Of course, dependent on various
factors, you might be able to do much better than the above. But I don't
know what your O(n) algorithm is, how using a bitset would make it
O(n^2), or if the O(n^2) algorithm would actually be slower for typical
n. The overall thing sounds broadly like some of the blocking and
clustering methods I've come across in record linkage.
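
For readers unfamiliar with blocking, a rough sketch follows, assuming records are dicts and that one bucket is kept per field value. This is not the poster's algorithm, which the thread never shows; block_by and the sample population are made up.

```python
from collections import defaultdict


def block_by(population, field):
    """Group record indexes by the value of one field.

    Records lacking the field are skipped here. Only buckets holding at
    least two indexes are returned, since a lone record has no candidate
    duplicate for this field.
    """
    buckets = defaultdict(list)
    for index, record in enumerate(population):
        value = record.get(field)
        if value is not None:
            buckets[value].append(index)
    return {value: idxs for value, idxs in buckets.items() if len(idxs) > 1}


population = [
    {"name": "A. Example", "email": "a@example.org"},
    {"name": "Ann Example", "email": "a@example.org"},
    {"name": "B. Sample", "email": "b@example.org"},
    {"name": "A. Example", "email": None},
]

print(block_by(population, "email"))   # {'a@example.org': [0, 1]}
print(block_by(population, "name"))    # {'A. Example': [0, 3]}
```

Only indexes that share a bucket are ever compared, so a field with few repeated values yields small buckets and few comparisons.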

Duncan

Re: Comparing sequences with range objects

<mailman.72.1649639570.20749.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=17773&group=comp.lang.python#17773
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!not-for-mail
From: 2QdxY4Rz...@potatochowder.com
Newsgroups: comp.lang.python
Subject: Re: Comparing sequences with range objects
Date: Sun, 10 Apr 2022 19:44:44 -0500
Lines: 48
Message-ID: <mailman.72.1649639570.20749.python-list@python.org>
References: <CAPM-O+wZU1KQRy-JE_Ez6aDBaj=4fGN93H7v6gBHaVsp5+3JEQ@mail.gmail.com>
<af73f623-078a-6cfe-0761-16c62e378568@vub.be>
<20220408062448.ny3nufpeyajwkcqp@hjp.at>
<7f32456e-0022-1230-f5b4-b86ad057cb68@vub.be>
<mailman.61.1649402503.20749.python-list@python.org>
<kEX3K.402086$iK66.240336@fx46.iad>
<92c3d7c7-4876-b4f4-bcd1-f428eb6865c0@vub.be>
<mailman.67.1649452095.20749.python-list@python.org>
<Q144K.71885$001.60018@fx34.iad>
<0cdbc5b1-0fd9-7a39-e7a9-c80a4566c087@vub.be>
<YlN5/H/wc2EFlcUf@scrozzle>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: news.uni-berlin.de SPSf2OmcckUIUjZFI6ajIgl3zv6aonf2VQQOnm36RTkA==
Return-Path: <2QdxY4RzWzUUiLuE@potatochowder.com>
X-Original-To: python-list@python.org
Delivered-To: python-list@mail.python.org
Authentication-Results: mail.python.org; dkim=none reason="no signature";
dkim-adsp=none (unprotected policy); dkim-atps=neutral
Mail-Followup-To: python-list@python.org
Content-Disposition: inline
In-Reply-To: <0cdbc5b1-0fd9-7a39-e7a9-c80a4566c087@vub.be>
 by: 2QdxY4Rz...@potatochowder.com - Mon, 11 Apr 2022 00:44 UTC

On 2022-04-10 at 22:20:33 +0200,
Antoon Pardon <antoon.pardon@vub.be> wrote:

>
>
> On 9/04/2022 at 02:01, duncan smith wrote:
> > On 08/04/2022 22:08, Antoon Pardon wrote:
> > >
> > > Well my first thought is that a bitset makes it less obvious to calculate
> > > the size of the set or to iterate over its elements. But it is an idea
> > > worth exploring.
> > >
> >
> >
> >
> > def popcount(n):
> >     """
> >     Returns the number of set bits in n
> >     """
> >     cnt = 0
> >     while n:
> >         n &= n - 1
> >         cnt += 1
> >     return cnt
> >
> > and not tested,
> >
> > def iterinds(n):
> >     """
> >     Returns a generator of the indices of the set bits of n
> >     """
> >     i = 0
> >     while n:
> >         if n & 1:
> >             yield i
> >         n = n >> 1
> >         i += 1
> >
> Sure, but these seem rather naive implementations, with a time complexity of
> O(n) where n is the maximum number of possible elements. Using these would
> turn my O(n) algorithm into an O(n^2) algorithm.

O(n) where n is the expected number of elements. The loops iterate once
for each bit actually contained in the set, which is usually [much] less
than the size of the universe. If you have lots and lots of elements in
your sets, or your universe is large and n is a long integer, then these
may not be as efficient as other methods. You know your data better
than we do.
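
A caveat on the loop counts: popcount as posted does loop once per set bit, but iterinds shifts one position at a time, so its loop runs as far as the highest set bit. The sketch below is a variant that visits only the set bits by isolating the lowest one with n & -n. The name iter_set_bits is illustrative, and each step still operates on the whole integer, so very wide bitsets are not free either.

```python
def iter_set_bits(n):
    """Yield the indices of the set bits of a non-negative int, lowest first."""
    if n < 0:
        raise ValueError("bitsets are modelled as non-negative ints here")
    while n:
        low = n & -n                  # isolate the lowest set bit
        yield low.bit_length() - 1    # its position
        n ^= low                      # clear it and continue


def shift_loop_steps(n):
    """Loop iterations the one-bit-at-a-time version needs for n."""
    return n.bit_length()


sparse = 1 << 1000                    # one element in a large universe

print(list(iter_set_bits(0b1010010)))   # [1, 4, 6]
print(list(iter_set_bits(sparse)))      # [1000], found in a single step
print(shift_loop_steps(sparse))         # 1001
```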

Re: Comparing sequences with range objects

<mailman.76.1649746200.20749.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=17785&group=comp.lang.python#17785
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!not-for-mail
From: antoon.p...@vub.be (Antoon Pardon)
Newsgroups: comp.lang.python
Subject: Re: Comparing sequences with range objects
Date: Tue, 12 Apr 2022 08:49:50 +0200
Lines: 61
Message-ID: <mailman.76.1649746200.20749.python-list@python.org>
References: <98f69f0d-3909-ca13-a440-1d226164b9a5@vub.be>
<CAPM-O+wZU1KQRy-JE_Ez6aDBaj=4fGN93H7v6gBHaVsp5+3JEQ@mail.gmail.com>
<af73f623-078a-6cfe-0761-16c62e378568@vub.be>
<20220408062448.ny3nufpeyajwkcqp@hjp.at>
<7f32456e-0022-1230-f5b4-b86ad057cb68@vub.be>
<mailman.61.1649402503.20749.python-list@python.org>
<kEX3K.402086$iK66.240336@fx46.iad>
<92c3d7c7-4876-b4f4-bcd1-f428eb6865c0@vub.be>
<mailman.67.1649452095.20749.python-list@python.org>
<Q144K.71885$001.60018@fx34.iad>
<0cdbc5b1-0fd9-7a39-e7a9-c80a4566c087@vub.be>
<mailman.70.1649622049.20749.python-list@python.org>
<YdK4K.571661$7F2.149038@fx12.iad>
<e56fb2e5-7daa-24ef-4f3a-467adff200c6@vub.be>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: news.uni-berlin.de LWbpKeVaH034Zvv12ZhNlAa0i3r70yiTdr/dNBR/Scow==
Return-Path: <Antoon.Pardon@vub.be>
X-Original-To: python-list@python.org
Delivered-To: python-list@mail.python.org
Authentication-Results: mail.python.org; dkim=pass
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.7.0
Content-Language: nl-BE
In-Reply-To: <YdK4K.571661$7F2.149038@fx12.iad>
 by: Antoon Pardon - Tue, 12 Apr 2022 06:49 UTC

On 11/04/2022 at 02:01, duncan smith wrote:
> On 10/04/2022 21:20, Antoon Pardon wrote:
>>
>>
>> On 9/04/2022 at 02:01, duncan smith wrote:
>>> On 08/04/2022 22:08, Antoon Pardon wrote:
>>>>
>>>> Well my first thought is that a bitset makes it less obvious to calculate
>>>> the size of the set or to iterate over its elements. But it is an idea
>>>> worth exploring.
>>>>
>>>
>>>
>>>
>>> def popcount(n):
>>>     """
>>>     Returns the number of set bits in n
>>>     """
>>>     cnt = 0
>>>     while n:
>>>         n &= n - 1
>>>         cnt += 1
>>>     return cnt
>>>
>>> and not tested,
>>>
>>> def iterinds(n):
>>>     """
>>>     Returns a generator of the indices of the set bits of n
>>>     """
>>>     i = 0
>>>     while n:
>>>         if n & 1:
>>>             yield i
>>>         n = n >> 1
>>>         i += 1
>>>
>> Sure but these seem rather naive implementations with a time complexity of
>> O(n) where n is the maximum number of possible elements. Using these would
>> turn my O(n) algorithm into an O(n^2) algorithm.
>>
>
> I thought your main concern was memory. Of course, dependent on
> various factors, you might be able to do much better than the above.
> But I don't know what your O(n) algorithm is, how using a bitset would
> make it O(n^2), or if the O(n^2) algorithm would actually be slower
> for typical n. The overall thing sounds broadly like some of the
> blocking and clustering methods I've come across in record linkage.

Using bitsets would change my algorithm from O(n) to O(n^2) because
the bitset operations are O(n) instead of O(1). If I have 20000 people,
a bitset will take a vector of about 300 64-bit words. With 40000 people
the bitset will take a vector of about 600. So doubling the population
will also double the time for each bitset operation, meaning that doubling
the population will increase execution time fourfold.
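
The arithmetic above can be checked with a small sketch. It assumes, purely for illustration, that the algorithm performs one whole-bitset operation per person and that each such operation touches every 64-bit word; the thread does not spell out the actual algorithm, and the function names here are hypothetical:

```python
WORD_BITS = 64


def words_needed(population):
    """Number of 64-bit words needed to hold one bit per person."""
    return (population + WORD_BITS - 1) // WORD_BITS   # ceiling division


def total_word_steps(population):
    """Assumed cost model: one full-bitset operation per person."""
    return population * words_needed(population)


small = words_needed(20000)      # 313 words, "about 300"
large = words_needed(40000)      # 625 words, "about 600"
ratio = total_word_steps(40000) / total_word_steps(20000)   # close to 4
```

Under this cost model, doubling the population doubles both the number of operations and the cost of each one, which is where the factor of four comes from.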
