Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

In these matters the only certainty is that there is nothing certain. -- Pliny the Elder


devel / comp.lang.python / Re: Best practice for caching hash

SubjectAuthor
* Re: Best practice for caching hashChris Angelico
`- Re: Best practice for caching hashGreg Ewing

1
Re: Best practice for caching hash

<mailman.314.1647410314.2329.python-list@python.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=17529&group=comp.lang.python#17529

  copy link   Newsgroups: comp.lang.python
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!not-for-mail
From: ros...@gmail.com (Chris Angelico)
Newsgroups: comp.lang.python
Subject: Re: Best practice for caching hash
Date: Wed, 16 Mar 2022 16:58:22 +1100
Lines: 40
Message-ID: <mailman.314.1647410314.2329.python-list@python.org>
References: <CAPTjJmreW-t5ggSg+Re_Rvw7yf9RRLp_mav_8pGej-zrfwoAhw@mail.gmail.com>
<YjFSbUEVy4rdttsp@cskk.homeip.net>
<CAPTjJmo9K9odB1PCKdUc3f4F_noWQb43dRHFhPOE1Rq=3ckXqg@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
X-Trace: news.uni-berlin.de yMXyM84Nu8PDZmTzimKfgADs5CsPidWDKy+YKrdE31SA==
Return-Path: <rosuav@gmail.com>
X-Original-To: python-list@python.org
Delivered-To: python-list@mail.python.org
Authentication-Results: mail.python.org; dkim=pass
reason="2048-bit key; unprotected key"
header.d=gmail.com header.i=@gmail.com header.b=lD0XYWyN;
dkim-adsp=pass; dkim-atps=neutral
X-Spam-Status: OK 0.000
X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; '2022': 0.05; "python's":
0.05; 'demands': 0.07; 'mar': 0.07; 'angelico': 0.09; 'compute':
0.09; 'likewise': 0.09; 'objects,': 0.09; 'practice.': 0.09;
'tags': 0.09; 'unlike': 0.09; 'that.': 0.15; 'both.': 0.16;
'cameron': 0.16; 'chrisa': 0.16; 'dict': 0.16; 'dicts': 0.16;
'flexibility.': 0.16; 'from:addr:rosuav': 0.16; 'from:name:chris
angelico': 0.16; 'hash': 0.16; 'objective': 0.16; 'sensible':
0.16; 'simpson': 0.16; 'tuple': 0.16; 'tuples': 0.16; 'varied':
0.16; 'wrote:': 0.16; 'python': 0.16; 'values': 0.17; 'to:addr
:python-list': 0.20; 'maybe': 0.22; "what's": 0.22; "i'd": 0.24;
'anything': 0.25; 'object': 0.26; "isn't": 0.27; 'function': 0.27;
'chris': 0.28; 'wrong': 0.28; 'issues.': 0.32; 'objects': 0.32;
'simple,': 0.32; 'message-id:@mail.gmail.com': 0.32; 'but': 0.32;
"i'm": 0.33; 'subject:for': 0.33; 'same': 0.34; 'header:In-Reply-
To:1': 0.34; 'received:google.com': 0.34; 'one.': 0.35;
'from:addr:gmail.com': 0.35; 'work,': 0.36; 'received:209.85':
0.37; 'received:209': 0.39; 'quite': 0.39; 'use': 0.39; 'wed,':
0.39; 'internal': 0.63; 'widely': 0.64; 'similar': 0.65; 'well':
0.65; 'entire': 0.67; 'latter': 0.69; 'relevant': 0.73;
'effective': 0.78; 'significant': 0.78; 'database': 0.80;
'stability': 0.84; 'badly': 0.91; 'former': 0.93
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
h=mime-version:references:in-reply-to:from:date:message-id:subject:to;
bh=RouG2vXNDyBEeDg8JXs990OmhFdpHekcHwyhOqM5mBE=;
b=lD0XYWyNNBXDfsPNmYjzHzFZ0On/4z/I+ucqtlTkDxdhzUDtjtr+MNW0H1hreLBTRC
VIDT7HJ9S2vyoH6GoWvJYN4JiOGEfuDj7DnEthfjZ8sT3dJUamEUH0zszB0TbyEyaqRo
h2zcB+62sK4PIwbOHpsOJBi4Vsc2HUtPgbMZ1whNs3NM8NIO761VKlHjbfD3bxshnDTo
p5sMKUobUmzwnECSFWdJouiaEr8kSj7OowTNEAtB04oXj+6nigsHAP6KuYi1f2q2v+OR
ynpNlsu5YjjXyPJoAKPevl/X0Yelb1bVi+4BSMpvtDxqYPOsXdFMnp0deTGthLHwhOe8
R15Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=1e100.net; s=20210112;
h=x-gm-message-state:mime-version:references:in-reply-to:from:date
:message-id:subject:to;
bh=RouG2vXNDyBEeDg8JXs990OmhFdpHekcHwyhOqM5mBE=;
b=YvWAR8jGlDzrmfAgJBR6ecNQaYgd7sdyHoUYA8PMyECO/rgaxnCTC3nKVHwukh21+i
mDisdkr+Mn6LrDrY92o3QTc01YHzfXDHtrtFsUIo9tfaymWelri5B3Nk5KeqgLDFboCA
KOAw3N0Ddcu/4e+eFy+r9zx/SS8mhFvJlP4heDfoYbTeOneKVnXJWzyqd/9WkTiR/Hed
7MmGBmF55a3yWp6HVeukfQecgIQNRsvGG+7PktgK9phsphih7vdQ3w/J6BG1yE7e1lIi
ETWAspKq5C7KzDKfBjpERav5b9i7C60ux8ldO0pPSybMaRseCsHYRtc2hkbFIHJPDvVQ
b7rQ==
X-Gm-Message-State: AOAM533NOOk3LDjyM9hD3FP+FcB6Rk03iINF4GyEKQ1/rKxJsd8z9gJi
QcyDp4ovrrpnQqKgnAcYIFDLD09NG5mCrz2EwbpRbc4R
X-Google-Smtp-Source: ABdhPJxauOGQiv6HwxxSvzwsughOFnO3n5sUrv+5KQJkD9XTo9BJOqxtHvNA+OlFPLTfnM1i7p+YdW1pPaHZhBKopTc=
X-Received: by 2002:adf:f801:0:b0:1f0:7675:f5e6 with SMTP id
s1-20020adff801000000b001f07675f5e6mr22051744wrp.564.1647410313448; Tue, 15
Mar 2022 22:58:33 -0700 (PDT)
In-Reply-To: <YjFSbUEVy4rdttsp@cskk.homeip.net>
X-BeenThere: python-list@python.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: General discussion list for the Python programming language
<python-list.python.org>
List-Unsubscribe: <https://mail.python.org/mailman/options/python-list>,
<mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive: <https://mail.python.org/pipermail/python-list/>
List-Post: <mailto:python-list@python.org>
List-Help: <mailto:python-list-request@python.org?subject=help>
List-Subscribe: <https://mail.python.org/mailman/listinfo/python-list>,
<mailto:python-list-request@python.org?subject=subscribe>
X-Mailman-Original-Message-ID: <CAPTjJmo9K9odB1PCKdUc3f4F_noWQb43dRHFhPOE1Rq=3ckXqg@mail.gmail.com>
X-Mailman-Original-References: <CAPTjJmreW-t5ggSg+Re_Rvw7yf9RRLp_mav_8pGej-zrfwoAhw@mail.gmail.com>
<YjFSbUEVy4rdttsp@cskk.homeip.net>
 by: Chris Angelico - Wed, 16 Mar 2022 05:58 UTC

On Wed, 16 Mar 2022 at 14:00, Cameron Simpson <cs@cskk.id.au> wrote:
>
> On 16Mar2022 10:57, Chris Angelico <rosuav@gmail.com> wrote:
> >> Is it sensible to compute the hash only from the immutable parts?
> >> Bearing in mind that usually you need an equality function as well and
> >> it may have the same stability issues.
> >
> >My understanding - and I'm sure Marco will correct me if I'm wrong
> >here - is that this behaves like a tuple: if it contains nothing but
> >hashable objects, it is itself hashable, but if it contains anything
> >unhashable, the entire tuple isn't hashable.
>
> A significant difference is that tuples have no keys, unlike a dict.
>
> A hash does not have to hash all the internal state, ony the relevant
> state, and not even all of that. The objective of the hash is twofold to
> my mind:
> - that "equal" objects (in the `==` sense) have the same hash, so that
> they hash to the same backet in dicts and can therefore be found
> - that hash values are widely distributed, to maximise the evenness of
> the object distribution in buckets
>
> For dicts to work, the former has to hold.

Python only demands the first one. And Python's own integers hash to
themselves, which isn't what I'd call "widely distributed", but which
works well in practice.

> The latter has more flexibility. A dict has keys. If the dicts are quite
> varied (sets of tags for example), it may be effective to just hash the
> keys. But if the dict keys are similar (labelled CSV-rows-as-dicts, or
> likewise with database rows) this will go badly because the hashes will
> all (or maybe mostly) collide.

The general recommendation is to use all the same data for hashing as
you use for equality checking. What's the point of just hashing the
keys, if the values also contribute to equality? I'd just keep it
simple, and hash both.

ChrisA

Re: Best practice for caching hash

<j9der2F2okkU1@mid.individual.net>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=17530&group=comp.lang.python#17530

  copy link   Newsgroups: comp.lang.python
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: greg.ew...@canterbury.ac.nz (Greg Ewing)
Newsgroups: comp.lang.python
Subject: Re: Best practice for caching hash
Date: Wed, 16 Mar 2022 19:36:16 +1300
Lines: 21
Message-ID: <j9der2F2okkU1@mid.individual.net>
References: <CAPTjJmreW-t5ggSg+Re_Rvw7yf9RRLp_mav_8pGej-zrfwoAhw@mail.gmail.com>
<YjFSbUEVy4rdttsp@cskk.homeip.net>
<CAPTjJmo9K9odB1PCKdUc3f4F_noWQb43dRHFhPOE1Rq=3ckXqg@mail.gmail.com>
<mailman.314.1647410314.2329.python-list@python.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net txdY1BD6l9nSUMlQd3l9BgxbxIhkKIPw4snc1dvyJadVd09PO1
Cancel-Lock: sha1:+JyFiwB6ntsMDrK/XEcBconmD/c=
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:91.0)
Gecko/20100101 Thunderbird/91.3.2
Content-Language: en-US
In-Reply-To: <mailman.314.1647410314.2329.python-list@python.org>
 by: Greg Ewing - Wed, 16 Mar 2022 06:36 UTC

On 16/03/22 6:58 pm, Chris Angelico wrote:
> And Python's own integers hash to themselves,
> which isn't what I'd call "widely distributed", but which
> works well in practice.

Not exactly, they seem to be limited to 60 bits:

>>> hex(hash(0xfffffffffffffff))
'0xfffffffffffffff'
>>> hex(hash(0x1fffffffffffffff))
'0x0'

And up to that limit they're as widely distributed as you
can get -- each integer hashes to a unique value!

But keep in mind that the hash value itself is not directly
used to locate a dict slot -- there is extra scrambling that
goes on in the dict code.

--
Greg

1
server_pubkey.txt

rocksolid light 0.9.8
clearnet tor