Code improvement question

<mailman.246.1700003665.3828.python-list@python.org>

From: mik...@dewhirst.com.au (Mike Dewhirst)
Newsgroups: comp.lang.python
Subject: Code improvement question
Date: Wed, 15 Nov 2023 10:14:10 +1100
 by: Mike Dewhirst - Tue, 14 Nov 2023 23:14 UTC

I'd like to improve the code below, which works. It feels clunky to me.

I need to clean up user-uploaded files whose size I don't know in
advance.

After cleaning they might be as big as 1 MB, but that would be super rare,
perhaps only in testing.

I'm extracting CAS numbers. The pattern runs from xx-xx-x up to
xxxxxxx-xx-x, e.g. 1012300-77-4
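
The pattern described above can be written as a single regular expression. This is only a sketch: the names CAS_PATTERN and find_cas_numbers are made up for illustration, and the word-boundary anchors are an assumption about how the numbers sit in the uploaded text.

```python
import re

# Two to seven digits, then two digits, then one digit, hyphen-separated.
# The \b anchors (an assumption) stop a match from starting or ending
# in the middle of a longer run of digits.
CAS_PATTERN = re.compile(r'\b\d{2,7}-\d{2}-\d\b')


def find_cas_numbers(txt):
    """Return every substring of txt that is shaped like a CAS number."""
    return CAS_PATTERN.findall(txt)


sample = "Water 7732-18-5 and 1012300-77-4, phone 555-1234"
print(find_cas_numbers(sample))  # ['7732-18-5', '1012300-77-4']
```

This checks the shape only. It does not verify that a match is a registered CAS number.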

import re

def remove_alpha(txt):
    r"""  r'[^0-9\- ]':
    [^...]: Match any character that is not in the specified set.
    0-9: Match any digit.
    \: Escape character.
    -: Match a hyphen.
    Space: Match a space.
    """
    # strip everything except digits, hyphens and spaces
    cleaned_txt = re.sub(r'[^0-9\- ]', '', txt)
    bits = cleaned_txt.split()
    pieces = []
    for bit in bits:
        # minimum size of a CAS number is 7 so drop smaller clumps of digits
        pieces.append(bit if len(bit) > 6 else "")
    return " ".join(pieces)
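
One possible tightening, offered as a sketch and not as the definitive answer: the loop can be folded into a generator expression. The name remove_alpha_compact is hypothetical. Note one behavioural difference from the function above: short clumps are skipped outright, where the original appends an empty string for each one and so leaves extra spaces in the joined result.

```python
import re


def remove_alpha_compact(txt):
    """Keep only digit/hyphen clumps of at least 7 characters."""
    # Same character filter as the original: digits, hyphens and spaces survive.
    cleaned_txt = re.sub(r'[^0-9\- ]', '', txt)
    # Skip short clumps entirely; the original keeps "" placeholders for them.
    return " ".join(bit for bit in cleaned_txt.split() if len(bit) > 6)


print(remove_alpha_compact("Benzene 71-43-2, water 7732-18-5 (see p. 12)"))
# 71-43-2 7732-18-5
```

If the extra spaces from the original matter to whatever reads the output, the placeholder version should be kept as is.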

Many thanks for any hints

Cheers

Mike
