Re: string storage [was: Re: imaplib: is this really so unwieldy?]

From: python.l...@tim.thechases.com (Tim Chase)
Newsgroups: comp.lang.python
Subject: Re: string storage [was: Re: imaplib: is this really so unwieldy?]
Date: Wed, 26 May 2021 08:09:01 -0500
Message-ID: <mailman.361.1622037308.3087.python-list@python.org>
References: <21fb6c5f-97a4-654b-887f-2c31a549bcbe@adminart.net>
<hd6qag98c37mvqurlu3mfcvie38o63kn6n@4ax.com>
<d0e29810-858a-8a32-fda6-a68c63224606@mrabarnett.plus.com>
<s8jtd7$e0d$1@ciao.gmane.io> <s8ksoo$10pm$1@ciao.gmane.io>
<20210526080901.0fd1f042@bigbox.attlocal.net>
In-Reply-To: <s8ksoo$10pm$1@ciao.gmane.io>

On 2021-05-26 08:18, Alan Gauld via Python-list wrote:
> Does that mean that if I give Python a UTF8 string that is mostly
> single byte characters but contains one 4-byte character that
> Python will store the string as all 4-byte characters?

As best I understand it, yes: the cost of each "character" (code point)
in a string is the same for the entire string, so even one lone 4-byte
character in an otherwise 1-byte-character string is enough to push
the whole string to 4 bytes per character. It doesn't affect other
strings, though (so if you had a pure 7-bit ASCII string and a string
containing a 4-byte character, the former would still be 1 byte per
character; it's not a global setting).
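In CPython 3.3 and later (PEP 393), the width is picked per string from its widest code point: 1 byte when everything fits in Latin-1 (U+0000 to U+00FF), 2 bytes within the Basic Multilingual Plane, and 4 bytes otherwise. A rough way to watch that happen is sketched below. It assumes CPython, since `sys.getsizeof` reports implementation details, and `bytes_per_code_point` is a hypothetical helper written just for this illustration.

```python
import sys

def bytes_per_code_point(s: str, n: int = 1000) -> int:
    """Estimate how many bytes CPython stores per code point for s (hypothetical helper).

    Appending ASCII filler never widens a string, so s + "a"*n and s + "a"*2n
    share s's storage width. Their size difference is n code points' worth of
    character data; the fixed object header cancels out.
    """
    shorter = s + "a" * n
    longer = s + "a" * (2 * n)
    return (sys.getsizeof(longer) - sys.getsizeof(shorter)) // n

ascii_text = "mostly single-byte text " * 40
widened = ascii_text + "\U0001F600"  # one code point above U+FFFF

print(bytes_per_code_point(ascii_text))    # 1
print(bytes_per_code_point("caf\u00e9"))   # 1 (Latin-1 still fits in one byte)
print(bytes_per_code_point("\u20ac"))      # 2 (euro sign, inside the BMP)
print(bytes_per_code_point(widened))       # 4
# The original string is untouched: widening is per string, not global.
print(bytes_per_code_point(ascii_text))    # 1
```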

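If you encode these to a UTF-8 byte string, you'll get the space
savings you seek, but at the cost of sensible O(1) indexing: UTF-8
uses 1 to 4 bytes per character, so a byte offset no longer tells you
which character you're at.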
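To make that trade-off concrete, here is a sketch (`utf8_char_at` is a hypothetical helper, not part of any library). The encoded form spends only as many bytes as each character needs, but reaching the k-th character means scanning from the start and counting lead bytes, which is O(n) instead of O(1).

```python
import sys

def utf8_char_at(data: bytes, index: int) -> str:
    """Return the index-th code point of UTF-8 encoded data (hypothetical helper).

    UTF-8 spends 1 to 4 bytes per code point, so there is no arithmetic
    shortcut to a position: walk the bytes counting lead bytes, which is O(n).
    """
    if index < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    count = -1
    start = None
    for pos, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # lead byte, i.e. not a continuation byte (10xxxxxx)
            count += 1
            if count == index:
                start = pos
            elif count == index + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("code point index out of range")
    return data[start:].decode("utf-8")

text = "a" * 999 + "\U0001F600"
encoded = text.encode("utf-8")

print(len(encoded))                              # 1003
print(sys.getsizeof(text))                       # about 4 bytes per code point plus a header (CPython)
print(utf8_char_at(encoded, 999) == text[999])   # True
```

The `str` indexing `text[999]` is a constant-time lookup because every code point has the same width; the bytes version pays for its compactness with a linear scan.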

Either choice is a trade-off, and if your data consists mostly of 7-bit
ASCII characters, or of lots of small strings, the overhead is less
pronounced than if you have one single large blob of text as a string.
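A back-of-the-envelope model of why lots of small strings fare better than one blob, sketched under the assumption that CPython's PEP 393 width rule applies. `pep393_width` and `payload_bytes` are hypothetical helpers that model the character data only and ignore each object's fixed header.

```python
def pep393_width(s: str) -> int:
    """Bytes per code point CPython would pick for s (a model of PEP 393)."""
    widest = max(map(ord, s), default=0)
    if widest <= 0xFF:
        return 1
    if widest <= 0xFFFF:
        return 2
    return 4

def payload_bytes(s: str) -> int:
    """Character-data bytes only; ignores the fixed per-object header."""
    return len(s) * pep393_width(s)

lines = ["plain ascii log line %04d" % i for i in range(1000)]
lines[500] += " \U0001F600"  # a single non-BMP character somewhere in the data

blob = "\n".join(lines)
print(payload_bytes(blob))                           # 104004: every code point costs 4 bytes
print(sum(payload_bytes(line) for line in lines))    # 25083: only lines[500] is widened
```

Keep in mind that each small string also carries its own object header (a few dozen bytes in CPython), so splitting text into very many tiny pieces has an overhead of its own.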

> If so, doesn't that introduce a pretty big storage overhead for
> large strings?

Yes, though such large strings tend to be rarer, largely because
they become unwieldy for other reasons.

-tkc
