Re: string storage [was: Re: imaplib: is this really so unwieldy?]

<mailman.368.1622066521.3087.python-list@python.org>


https://www.novabbs.com/devel/article-flat.php?id=13368&group=comp.lang.python#13368

From: python.l...@tim.thechases.com (Tim Chase)
Newsgroups: comp.lang.python
Subject: Re: string storage [was: Re: imaplib: is this really so unwieldy?]
Date: Wed, 26 May 2021 16:15:38 -0500
References: <21fb6c5f-97a4-654b-887f-2c31a549bcbe@adminart.net>
<hd6qag98c37mvqurlu3mfcvie38o63kn6n@4ax.com>
<d0e29810-858a-8a32-fda6-a68c63224606@mrabarnett.plus.com>
<s8jtd7$e0d$1@ciao.gmane.io> <s8ksoo$10pm$1@ciao.gmane.io>
<20210526080901.0fd1f042@bigbox.attlocal.net>
<s8m1c2$10r0$1@ciao.gmane.io>
<20210526161538.3e6f018c@bigbox.attlocal.net>
 by: Tim Chase - Wed, 26 May 2021 21:15 UTC

On 2021-05-26 18:43, Alan Gauld via Python-list wrote:
> On 26/05/2021 14:09, Tim Chase wrote:
>>> If so, doesn't that introduce a pretty big storage overhead for
>>> large strings?
>>
>> Yes. Though such large strings tend to be more rare, largely
>> because they become unwieldy for other reasons.
>
> I do have some scripts that work on large strings - mainly produced
> by reading an entire text file into a string using file.read().
> Some of these are several MB long so potentially now 4x bigger than
> I thought. But you are right, even a 100MB string should still be
> OK on a modern PC with 8GB+ RAM!...

If you don't decode it upon reading it in (for example, by opening
the file in binary mode, "rb"), it should still be 100MB because it's
a stream of encoded bytes. It would only grow 2x or 4x in size if you
decoded it (either as a parameter of how you opened it, or if you
later took that byte-string and decoded it explicitly, though then
you have the original 100MB byte-string **plus** the 100/200/400MB
decoded unicode string).
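A minimal sketch of the size difference, assuming CPython 3.3 or later, where the flexible string representation (PEP 393) stores each str at 1, 2 or 4 bytes per character depending on its widest character. The helper `bytes_per_char` and the scratch file are purely illustrative, and `sys.getsizeof` figures are CPython implementation details rather than guarantees:

```python
import os
import sys
import tempfile


def bytes_per_char(ch: str, n: int = 100_000) -> float:
    """Hypothetical helper: per-character storage for `ch` repeated.

    Subtracting the sizes of two strings of different lengths cancels the
    fixed object header, leaving only the per-character cost.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n


# ASCII-only labels so printing works on any terminal encoding.
for label, ch in [("ASCII U+0061", "a"), ("Latin-1 U+00E9", "\u00e9"),
                  ("BMP U+0101", "\u0101"), ("astral U+1F600", "\U0001F600")]:
    print(f"{label}: {bytes_per_char(ch):.0f} byte(s) per char")

# Write ~1MB of ASCII plus one 4-byte UTF-8 character to a scratch file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(("a" * 1_000_000 + "\U0001F600").encode("utf-8"))
    path = tmp.name

try:
    with open(path, "rb") as f:               # binary mode: bytes, no decoding
        raw = f.read()
    with open(path, encoding="utf-8") as f:   # text mode: decoded on read
        decoded = f.read()
finally:
    os.remove(path)

print(f"bytes object: {sys.getsizeof(raw):,} bytes")      # about 1 MB
print(f"decoded str:  {sys.getsizeof(decoded):,} bytes")  # about 4 MB: one wide char widens it all
```

For an ASCII or Latin-1 file the decoded string stays close to the byte count; only characters above U+00FF push the whole string to 2 or 4 bytes per character.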

You don't specify what you then do with this humongous string, but
for most of my large files like this, I end up iterating over them
piecewise rather than f.read()'ing them all in at once. Or even if
the whole file does end up in memory, it's usually chunked and split
into useful pieces. That could mean that each line is its own
string, almost all of which are one byte per char, with a couple of
strings at sporadic positions in the list-of-strings that are 2 or 4
bytes per char (in CPython, each str picks its own width based on the
widest character it contains).
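A rough sketch of why splitting helps, again assuming CPython's PEP 393 storage (and Python 3.7+ for `str.isascii`); the line contents are made up for illustration. Iterating an open text file with `for line in f:` yields the same kind of per-line strings, one at a time:

```python
import sys

# Made-up data: 10,000 ASCII lines, with one line holding a non-BMP char.
lines = [f"record {i:06d}: " + "plain ASCII payload " * 4 + "\n"
         for i in range(10_000)]
lines[5_000] = "record 005000: contains \U0001F600\n"

# One big string: the single wide character forces 4 bytes per char throughout.
whole = "".join(lines)

# List of lines: each str object picks its own width independently.
# sys.getsizeof is shallow, so add the list's own size to the per-line sizes.
per_line = sys.getsizeof(lines) + sum(sys.getsizeof(line) for line in lines)

# Only the lines that really contain wide characters pay for them.
wide = [n for n, line in enumerate(lines) if not line.isascii()]

print(f"one big string: {sys.getsizeof(whole):,} bytes")
print(f"list of lines:  {per_line:,} bytes")
print(f"wide lines:     {wide}")
```

The saving shrinks if wide characters are spread across many lines, since each affected line is widened on its own.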

-tkc
