devel / comp.lang.python / Re: Code improvement question

Subject                               Author
* Re: Code improvement question       Mike Dewhirst
+* Re: Code improvement question      Rimu Atkinson
|+- Re: Code improvement question     Mike Dewhirst
|+* Re: Code improvement question     Peter J. Holzer
||`- Re: Code improvement question    Rimu Atkinson
|+- Re: Code improvement question     Thomas Passin
|+- Re: Code improvement question     Peter J. Holzer
|+* Re: Code improvement question     Thomas Passin
||`* Re: Code improvement question    Stefan Ram
|| `* Re: Code improvement question   Stefan Ram
||  `- Re: Code improvement question  Paul Rubin
|`- RE: Code improvement question     <avi.e.gross
`* Re: Code improvement question      jak
 `* Re: Code improvement question     MRAB
  `- Re: Code improvement question    jak

Re: Code improvement question

<mailman.249.1700019686.3828.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=24469&group=comp.lang.python#24469

Newsgroups: comp.lang.python
From: mik...@dewhirst.com.au (Mike Dewhirst)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: Wed, 15 Nov 2023 14:41:20 +1100
Lines: 66
Message-ID: <mailman.249.1700019686.3828.python-list@python.org>
References: <b6e81def-3db8-4f05-8459-9a967c774020@dewhirst.com.au>
<088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com>
<32bbd365-a2fb-471f-b19e-3a3ec4457124@dewhirst.com.au>
In-Reply-To: <088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com>
 by: Mike Dewhirst - Wed, 15 Nov 2023 03:41 UTC

On 15/11/2023 10:25 am, MRAB via Python-list wrote:
> On 2023-11-14 23:14, Mike Dewhirst via Python-list wrote:
>> I'd like to improve the code below, which works. It feels clunky to me.
>>
>> I need to clean up user-uploaded files the size of which I don't know in
>> advance.
>>
>> After cleaning they might be as big as 1Mb but that would be super rare.
>> Perhaps only for testing.
>>
>> I'm extracting CAS numbers and here is the pattern xx-xx-x up to
>> xxxxxxx-xx-x e.g., 1012300-77-4
>>
>> import re
>>
>> def remove_alpha(txt):
>>     r"""  r'[^0-9\- ]':
>>
>>     [^...]: Match any character that is not in the specified set.
>>     0-9: Match any digit.
>>     \: Escape character.
>>     -: Match a hyphen.
>>     Space: Match a space.
>>     """
>>     cleaned_txt = re.sub(r'[^0-9\- ]', '', txt)
>>     bits = cleaned_txt.split()
>>     pieces = []
>>     for bit in bits:
>>         # minimum size of a CAS number is 7 so drop smaller clumps of digits
>>         pieces.append(bit if len(bit) > 6 else "")
>>     return " ".join(pieces)
>>
>>
>> Many thanks for any hints
>>
> Why don't you use re.findall?
>
> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)

I think I can see what you did there but it won't make sense to me - or
whoever looks at the code - in future.
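Here is a minimal sketch of the re.findall approach with a descriptive name attached. It assumes, based on the xx-xx-x layout described above, that a CAS number ends in a single check digit. The quoted pattern ends in [0-9]{2}, so as written it would not match the example 1012300-77-4. The names CAS_RE and extract_cas are hypothetical.

```python
import re

# Hypothetical names. Assumed CAS layout, from the thread: 2 to 7 digits,
# a hyphen, exactly 2 digits, a hyphen, then a single check digit.
CAS_RE = re.compile(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]\b')

def extract_cas(txt):
    """Return every CAS-shaped number in txt, in order of appearance."""
    return CAS_RE.findall(txt)

sample = "Water 7732-18-5; example 1012300-77-4; too short 1-23-4"
print(extract_cas(sample))  # ['7732-18-5', '1012300-77-4']

# The quoted pattern expects two final digits, so it finds nothing here.
print(re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', sample))  # []
```

Giving the compiled pattern a name and a one-line docstring goes some way toward the "won't make sense in future" concern, without changing how it works.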

That answers your specific question. However, I am in awe of people who
can just "do" regular expressions and I thank you very much for what
would have been a monumental effort had I tried it.

That little re.sub() came from ChatGPT, and I can understand it without
too much effort because it came documented.

I suppose ChatGPT is the answer to this thread. Or everything. Or will be.

Thanks

Mike

Re: Code improvement question

<uj3h1b$1u2b5$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24486&group=comp.lang.python#24486

Newsgroups: comp.lang.python
From: rimuatki...@gmail.com (Rimu Atkinson)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: Thu, 16 Nov 2023 11:34:16 +1300
Organization: A noiseless patient Spider
Lines: 49
Message-ID: <uj3h1b$1u2b5$1@dont-email.me>
References: <b6e81def-3db8-4f05-8459-9a967c774020@dewhirst.com.au>
<088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com>
<32bbd365-a2fb-471f-b19e-3a3ec4457124@dewhirst.com.au>
<mailman.249.1700019686.3828.python-list@python.org>
In-Reply-To: <mailman.249.1700019686.3828.python-list@python.org>
 by: Rimu Atkinson - Wed, 15 Nov 2023 22:34 UTC

>>>
>> Why don't you use re.findall?
>>
>> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
>
> I think I can see what you did there but it won't make sense to me - or
> whoever looks at the code - in future.
>
> That answers your specific question. However, I am in awe of people who
> can just "do" regular expressions and I thank you very much for what
> would have been a monumental effort had I tried it.

I feel the same way about regex. If I can find a way to write something
without regex, I very much prefer to, as regex usually adds complexity
and hurts readability.

You might find https://regex101.com/ useful for testing your regex. You
can enter sample data and see whether it matches.

If I understood what your regex was trying to do, I might be able to
suggest some Python to do the same thing. Is it just removing numbers
from text?

The for loop, "for bit in bits" etc, could be written as a list
comprehension.

pieces = [bit if len(bit) > 6 else "" for bit in bits]

For devs familiar with other languages but new to Python this will look
like gibberish so arguably the original for loop is clearer, depending
on your team.

It's worth making the effort to get into list comprehensions though
because they're awesome.
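A comprehension can also take an if clause. That filters the short clumps out entirely, instead of replacing them with empty strings that later turn into runs of spaces in the joined result. This is a sketch that keeps the re.sub clean-up from the original function; the name keep_cas_candidates is hypothetical.

```python
import re

def keep_cas_candidates(txt):
    """Strip everything except digits, hyphens and spaces, then keep
    only clumps of 7+ characters (the shortest CAS form, xx-xx-x)."""
    cleaned_txt = re.sub(r'[^0-9\- ]', '', txt)
    # The trailing "if" drops short clumps entirely, rather than
    # appending "" for them as the original loop does.
    pieces = [bit for bit in cleaned_txt.split() if len(bit) > 6]
    return " ".join(pieces)

print(keep_cas_candidates("CAS 7732-18-5 (water), lot 42"))  # 7732-18-5
```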

>
> That little re.sub() came from ChatGPT and I can understand it without
> too much effort because it came documented
>
> I suppose ChatGPT is the answer to this thread. Or everything. Or will be.

I am doubtful. We'll see!

R

Re: Code improvement question

<mailman.278.1700196987.3828.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=24499&group=comp.lang.python#24499

Newsgroups: comp.lang.python
From: mik...@dewhirst.com.au (Mike Dewhirst)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: Fri, 17 Nov 2023 15:56:19 +1100
Lines: 77
Message-ID: <mailman.278.1700196987.3828.python-list@python.org>
References: <b6e81def-3db8-4f05-8459-9a967c774020@dewhirst.com.au>
<088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com>
<32bbd365-a2fb-471f-b19e-3a3ec4457124@dewhirst.com.au>
<mailman.249.1700019686.3828.python-list@python.org>
<uj3h1b$1u2b5$1@dont-email.me>
<7591aa86-d484-4fd9-abd4-ae0875170dee@dewhirst.com.au>
In-Reply-To: <uj3h1b$1u2b5$1@dont-email.me>
 by: Mike Dewhirst - Fri, 17 Nov 2023 04:56 UTC
Attachments: "OpenPGP_signature.asc" (application/pgp-signature)

On 16/11/2023 9:34 am, Rimu Atkinson via Python-list wrote:
>
>>>>
>>> Why don't you use re.findall?
>>>
>>> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
>>
>> I think I can see what you did there but it won't make sense to me -
>> or whoever looks at the code - in future.
>>
>> That answers your specific question. However, I am in awe of people
>> who can just "do" regular expressions and I thank you very much for
>> what would have been a monumental effort had I tried it.
>
> I feel the same way about regex. If I can find a way to write
> something without regex I very much prefer to as regex usually adds
> complexity and hurts readability.
>
> You might find https://regex101.com/ to be useful for testing your
> regex. You can enter in sample data and see if it matches.
>
> If I understood what your regex was trying to do I might be able to
> suggest some python to do the same thing. Is it just removing numbers
> from text?
>
> The for loop, "for bit in bits" etc, could be written as a list
> comprehension.
>
> pieces = [bit if len(bit) > 6 else "" for bit in bits]
>
> For devs familiar with other languages but new to Python this will
> look like gibberish so arguably the original for loop is clearer,
> depending on your team.
>
> It's worth making the effort to get into list comprehensions though
> because they're awesome.

I agree qualitatively 100%, but quantitatively perhaps 80%, where
readability is easy. I think that's what you are saying anyway.

>
>
>
>>
>> That little re.sub() came from ChatGPT and I can understand it
>> without too much effort because it came documented
>>
>> I suppose ChatGPT is the answer to this thread. Or everything. Or
>> will be.
>
> I am doubtful. We'll see!
>
> R
>
>

--
Signed email is an absolute defence against phishing. This email has
been signed with my private key. If you import my public key you can
automatically verify my signature and be sure it came from me. Your
email software can handle signing.

Re: Code improvement question

<uj7ca7$2ocq7$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24500&group=comp.lang.python#24500

Newsgroups: comp.lang.python
From: nos...@please.ty (jak)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: Fri, 17 Nov 2023 10:38:14 +0100
Organization: A noiseless patient Spider
Lines: 82
Message-ID: <uj7ca7$2ocq7$1@dont-email.me>
References: <b6e81def-3db8-4f05-8459-9a967c774020@dewhirst.com.au>
<088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com>
<32bbd365-a2fb-471f-b19e-3a3ec4457124@dewhirst.com.au>
<mailman.249.1700019686.3828.python-list@python.org>
In-Reply-To: <mailman.249.1700019686.3828.python-list@python.org>
 by: jak - Fri, 17 Nov 2023 09:38 UTC

Mike Dewhirst wrote:
> On 15/11/2023 10:25 am, MRAB via Python-list wrote:
>> On 2023-11-14 23:14, Mike Dewhirst via Python-list wrote:
>>> I'd like to improve the code below, which works. It feels clunky to me.
>>>
>>> I need to clean up user-uploaded files the size of which I don't know in
>>> advance.
>>>
>>> After cleaning they might be as big as 1Mb but that would be super rare.
>>> Perhaps only for testing.
>>>
>>> I'm extracting CAS numbers and here is the pattern xx-xx-x up to
>>> xxxxxxx-xx-x e.g., 1012300-77-4
>>>
>>> import re
>>>
>>> def remove_alpha(txt):
>>>     r"""  r'[^0-9\- ]':
>>>
>>>     [^...]: Match any character that is not in the specified set.
>>>     0-9: Match any digit.
>>>     \: Escape character.
>>>     -: Match a hyphen.
>>>     Space: Match a space.
>>>     """
>>>     cleaned_txt = re.sub(r'[^0-9\- ]', '', txt)
>>>     bits = cleaned_txt.split()
>>>     pieces = []
>>>     for bit in bits:
>>>         # minimum size of a CAS number is 7 so drop smaller clumps of digits
>>>         pieces.append(bit if len(bit) > 6 else "")
>>>     return " ".join(pieces)
>>>
>>>
>>> Many thanks for any hints
>>>
>> Why don't you use re.findall?
>>
>> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
>
> I think I can see what you did there but it won't make sense to me - or
> whoever looks at the code - in future.
>
> That answers your specific question. However, I am in awe of people who
> can just "do" regular expressions and I thank you very much for what
> would have been a monumental effort had I tried it.
>
> That little re.sub() came from ChatGPT and I can understand it without
> too much effort because it came documented
>
> I suppose ChatGPT is the answer to this thread. Or everything. Or will be.
>
> Thanks
>
> Mike

I respect your opinion, but from the point of view of many Usenet users,
asking ChatGPT to solve your problem is truly overkill. The computer
world overflows with people who know regex. If you had not already had
the answer using 're', I would have sent you my suggestion, which, as
you can see, is practically identical. I am quite sure the same solution
came to the minds of many people in this newsgroup.

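import re
with open(file) as fp:
    # only I/O and decoding errors are expected here, so catch just those
    try: ret = re.findall(r'\b\d{2,7}\-\d{2}\-\d{1}\b', fp.read())
    except (OSError, UnicodeDecodeError): ret = []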

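The differences are '\d' instead of '[0-9]' and a single check digit at
the end ('\d{1}'). For ASCII text '\d' and '[0-9]' are equivalent, but
in a str pattern '\d' also matches other Unicode decimal digits unless
the re.ASCII flag is given.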
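This small check shows where '\d' and '[0-9]' part ways in Python 3, which matters for user-uploaded text that might contain non-ASCII digits. ARABIC-INDIC DIGIT THREE (U+0663) is used here only as an example of a Unicode decimal digit.

```python
import re

# U+0663 ARABIC-INDIC DIGIT THREE is a Unicode decimal digit (category Nd).
arabic_three = '\u0663'

print(bool(re.fullmatch(r'\d', arabic_three)))            # True
print(bool(re.fullmatch(r'[0-9]', arabic_three)))         # False
print(bool(re.fullmatch(r'\d', arabic_three, re.ASCII)))  # False
```

If only ASCII digits should count, either '[0-9]' or '\d' combined with re.ASCII will do.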

Re: Code improvement question

<mailman.279.1700219873.3828.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=24501&group=comp.lang.python#24501

Newsgroups: comp.lang.python
From: hjp-pyt...@hjp.at (Peter J. Holzer)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: Fri, 17 Nov 2023 12:17:44 +0100
Lines: 93
Message-ID: <mailman.279.1700219873.3828.python-list@python.org>
References: <b6e81def-3db8-4f05-8459-9a967c774020@dewhirst.com.au>
<088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com>
<32bbd365-a2fb-471f-b19e-3a3ec4457124@dewhirst.com.au>
<mailman.249.1700019686.3828.python-list@python.org>
<uj3h1b$1u2b5$1@dont-email.me>
<20231117111744.oocpwdjryvcty5ol@hjp.at>
In-Reply-To: <uj3h1b$1u2b5$1@dont-email.me>
 by: Peter J. Holzer - Fri, 17 Nov 2023 11:17 UTC
Attachments: signature.asc (application/pgp-signature)

On 2023-11-16 11:34:16 +1300, Rimu Atkinson via Python-list wrote:
> > > Why don't you use re.findall?
> > >
> > > re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
> >
> > I think I can see what you did there but it won't make sense to me - or
> > whoever looks at the code - in future.
> >
> > That answers your specific question. However, I am in awe of people who
> > can just "do" regular expressions and I thank you very much for what
> > would have been a monumental effort had I tried it.
>
> I feel the same way about regex. If I can find a way to write something
> without regex I very much prefer to as regex usually adds complexity and
> hurts readability.

I find "straight" regexps very easy to write. There are only a handful
of constructs which are all very simple and you just string them
together. But then I've used regexps for 30+ years, so of course they
feel natural to me.

(Reading regexps may be a bit harder, exactly because they are too
simple: there is no abstraction, so a complicated pattern results in a
long regexp.)
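In Python, one partial workaround for the missing abstraction is to build the pattern from named string pieces. The sketch below is not something proposed in the thread. The piece names are hypothetical, and the single final check digit is assumed from the xx-xx-x layout given earlier in the thread.

```python
import re

# Hypothetical building blocks, named after the parts of a CAS number.
REGISTRY = r'[0-9]{2,7}'   # leading group: 2 to 7 digits
MIDDLE = r'[0-9]{2}'       # middle group: exactly 2 digits
CHECK = r'[0-9]'           # check digit: exactly 1 digit

CAS_PATTERN = rf'\b{REGISTRY}-{MIDDLE}-{CHECK}\b'

print(CAS_PATTERN)  # \b[0-9]{2,7}-[0-9]{2}-[0-9]\b
print(re.findall(CAS_PATTERN, "see 50-00-0 and 1012300-77-4"))
# ['50-00-0', '1012300-77-4']
```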

There are some extensions to regexps which are conceptually harder, like
lookahead and lookbehind or nested contexts in Perl. I may need the
manual for those (especially because they are new(ish) and every
language uses a different syntax for them) or avoid them altogether.

Oh, and Python (just like Perl) allows you to embed whitespace and
comments into regexps (the re.VERBOSE flag in Python), which helps
readability a lot if you have to write long regexps.
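For example, here is the pattern from this thread written with re.VERBOSE, so each piece carries its own comment. In this sketch the final group is assumed to be a single check digit, matching the xx-xx-x layout described earlier.

```python
import re

# re.VERBOSE ignores unescaped whitespace in the pattern and treats
# "#" through end of line as a comment.
CAS_RE = re.compile(r"""
    \b          # word boundary
    [0-9]{2,7}  # 2 to 7 digits
    -           # hyphen-minus
    [0-9]{2}    # exactly 2 digits
    -           # hyphen-minus
    [0-9]       # one check digit
    \b          # word boundary
""", re.VERBOSE)

print(CAS_RE.findall("Ethanol 64-17-5, water 7732-18-5"))
# ['64-17-5', '7732-18-5']
```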

> You might find https://regex101.com/ to be useful for testing your regex.
> You can enter in sample data and see if it matches.
>
> If I understood what your regex was trying to do I might be able to suggest
> some python to do the same thing. Is it just removing numbers from text?

Not "removing" them (as I understood it), but extracting them (i.e. find
and collect them).

> > > re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)

\b - a word boundary.
[0-9]{2,7} - 2 to 7 digits
- - a hyphen-minus
[0-9]{2} - exactly 2 digits
- - a hyphen-minus
[0-9]{2} - exactly 2 digits
\b - a word boundary.

Seems quite straightforward to me. I'll be impressed if you can write
that in Python in a way which is easier to read.
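For comparison, a regex-free sketch so readers can judge for themselves. The function names are hypothetical. It is not exactly equivalent to the '\b' regex: it assumes CAS numbers are separated from other text by whitespace, possibly with surrounding punctuation, so a number glued to letters (e.g. "CAS:7732-18-5") would be missed. The single final check digit is again assumed from the xx-xx-x layout.

```python
import string

ASCII_DIGITS = set(string.digits)

def is_cas_shaped(token):
    """True if token looks like NNNNNNN-NN-N (2-7, 2 and 1 ASCII digits)."""
    parts = token.split("-")
    if len(parts) != 3:
        return False
    lengths_ok = (2 <= len(parts[0]) <= 7
                  and len(parts[1]) == 2
                  and len(parts[2]) == 1)
    # str.isdigit() also accepts non-ASCII digits, so compare to ASCII digits.
    digits_ok = all(set(part) <= ASCII_DIGITS for part in parts)
    return lengths_ok and digits_ok

def extract_cas_no_regex(txt):
    """Split on whitespace, trim surrounding punctuation, keep CAS-shaped tokens."""
    tokens = (tok.strip(string.punctuation) for tok in txt.split())
    return [tok for tok in tokens if is_cas_shaped(tok)]

print(extract_cas_no_regex("Water (7732-18-5); example 1012300-77-4."))
# ['7732-18-5', '1012300-77-4']
```

It comes out several times longer than the one-line regex. That arguably supports the point above.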

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

Re: Code improvement question

<mailman.280.1700225332.3828.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=24502&group=comp.lang.python#24502

Newsgroups: comp.lang.python
From: lis...@tompassin.net (Thomas Passin)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: Fri, 17 Nov 2023 07:48:41 -0500
Lines: 62
Message-ID: <mailman.280.1700225332.3828.python-list@python.org>
References: <b6e81def-3db8-4f05-8459-9a967c774020@dewhirst.com.au>
<088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com>
<32bbd365-a2fb-471f-b19e-3a3ec4457124@dewhirst.com.au>
<mailman.249.1700019686.3828.python-list@python.org>
<uj3h1b$1u2b5$1@dont-email.me> <20231117111744.oocpwdjryvcty5ol@hjp.at>
<7072d3e8-317c-4953-9b7e-5a1750d957aa@tompassin.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
 by: Thomas Passin - Fri, 17 Nov 2023 12:48 UTC

On 11/17/2023 6:17 AM, Peter J. Holzer via Python-list wrote:
> On 2023-11-16 11:34:16 +1300, Rimu Atkinson via Python-list wrote:
>>>> Why don't you use re.findall?
>>>>
>>>> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
>>>
>>> I think I can see what you did there but it won't make sense to me - or
>>> whoever looks at the code - in future.
>>>
>>> That answers your specific question. However, I am in awe of people who
>>> can just "do" regular expressions and I thank you very much for what
>>> would have been a monumental effort had I tried it.
>>
>> I feel the same way about regex. If I can find a way to write something
>> without regex I very much prefer to as regex usually adds complexity and
>> hurts readability.
>
> I find "straight" regexps very easy to write. There are only a handful
> of constructs which are all very simple and you just string them
> together. But then I've used regexps for 30+ years, so of course they
> feel natural to me.
>
> (Reading regexps may be a bit harder, exactly because they are so
> simple: There is no abstraction, so a complicated pattern results in a
> long regexp.)
>
> There are some extensions to regexps which are conceptually harder, like
> lookahead and lookbehind or nested contexts in Perl. I may need the
> manual for those (especially because they are new(ish) and every
> language uses a different syntax for them) or avoid them altogether.
>
> Oh, and Python (just like Perl) allows you to embed whitespace and
> comments into Regexps, which helps readability a lot if you have to
> write long regexps.
>
>
>> You might find https://regex101.com/ to be useful for testing your regex.
>> You can enter in sample data and see if it matches.
>>
>> If I understood what your regex was trying to do I might be able to suggest
>> some python to do the same thing. Is it just removing numbers from text?
>
> Not "removing" them (as I understood it), but extracting them (i.e. find
> and collect them).
>
>>>> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
>
> \b - a word boundary.
> [0-9]{2,7} - 2 to 7 digits
> - - a hyphen-minus
> [0-9]{2} - exactly 2 digits
> - - a hyphen-minus
> [0-9]{2} - exactly 2 digits
> \b - a word boundary.
>
> Seems quite straightforward to me. I'll be impressed if you can write
> that in Python in a way which is easier to read.

And the re.VERBOSE (also re.X) flag can always be used, so the entire
expression can be written line by line with comments, nearly the same as
the example above.
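As a sketch of what Thomas describes, here is the same pattern compiled with re.VERBOSE, one piece per line with Peter's descriptions as comments. One deliberate change, flagged in the code: the last group is a single digit rather than two, because Mike's own description of the format (xx-xx-x up to xxxxxxx-xx-x, e.g. 1012300-77-4) ends in one digit. The names CAS_PATTERN and find_cas_numbers are illustrative, not from the thread.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and treats '#' as a comment,
# so each piece can sit on its own line with an explanation.
CAS_PATTERN = re.compile(r"""
    \b            # a word boundary
    [0-9]{2,7}    # 2 to 7 digits
    -             # a hyphen-minus
    [0-9]{2}      # exactly 2 digits
    -             # a hyphen-minus
    [0-9]         # exactly 1 check digit (the thread's version used {2} here)
    \b            # a word boundary
""", re.VERBOSE)

def find_cas_numbers(txt):
    """Return every CAS-number-shaped substring of txt, in order."""
    return CAS_PATTERN.findall(txt)

print(find_cas_numbers("Water 7732-18-5 and 1012300-77-4, not 12-34"))
# ['7732-18-5', '1012300-77-4']
```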

Re: Code improvement question

<mailman.281.1700232369.3828.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=24503&group=comp.lang.python#24503

Path: i2pn2.org!rocksolid2!news.neodome.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: hjp-pyt...@hjp.at (Peter J. Holzer)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: Fri, 17 Nov 2023 15:46:06 +0100
Lines: 61
Message-ID: <mailman.281.1700232369.3828.python-list@python.org>
References: <b6e81def-3db8-4f05-8459-9a967c774020@dewhirst.com.au>
<088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com>
<32bbd365-a2fb-471f-b19e-3a3ec4457124@dewhirst.com.au>
<mailman.249.1700019686.3828.python-list@python.org>
<uj3h1b$1u2b5$1@dont-email.me>
<20231117111744.oocpwdjryvcty5ol@hjp.at>
<7072d3e8-317c-4953-9b7e-5a1750d957aa@tompassin.net>
<20231117144606.ssezd234lj753bp2@hjp.at>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
protocol="application/pgp-signature"; boundary="25oltib3f4hqssmy"
 by: Peter J. Holzer - Fri, 17 Nov 2023 14:46 UTC
Attachments: signature.asc (application/pgp-signature)

On 2023-11-17 07:48:41 -0500, Thomas Passin via Python-list wrote:
> On 11/17/2023 6:17 AM, Peter J. Holzer via Python-list wrote:
> > Oh, and Python (just like Perl) allows you to embed whitespace and
> > comments into Regexps, which helps readability a lot if you have to
> > write long regexps.
> >
[...]
> > > > > re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
> >
> > \b - a word boundary.
> > [0-9]{2,7} - 2 to 7 digits
> > - - a hyphen-minus
> > [0-9]{2} - exactly 2 digits
> > - - a hyphen-minus
> > [0-9]{2} - exactly 2 digits
> > \b - a word boundary.
> >
> > Seems quite straightforward to me. I'll be impressed if you can write
> > that in Python in a way which is easier to read.
>
> And the re.VERBOSE (also re.X) flag can always be used so the entire
> expression can be written line-by-line with comments nearly the same
> as the example above

Yes. That's what I alluded to above.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

Re: Code improvement question

<mailman.282.1700234667.3828.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=24504&group=comp.lang.python#24504

Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!not-for-mail
From: lis...@tompassin.net (Thomas Passin)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: Fri, 17 Nov 2023 10:17:37 -0500
Lines: 30
Message-ID: <mailman.282.1700234667.3828.python-list@python.org>
References: <b6e81def-3db8-4f05-8459-9a967c774020@dewhirst.com.au>
<088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com>
<32bbd365-a2fb-471f-b19e-3a3ec4457124@dewhirst.com.au>
<mailman.249.1700019686.3828.python-list@python.org>
<uj3h1b$1u2b5$1@dont-email.me> <20231117111744.oocpwdjryvcty5ol@hjp.at>
<7072d3e8-317c-4953-9b7e-5a1750d957aa@tompassin.net>
<20231117144606.ssezd234lj753bp2@hjp.at>
<303c6738-4c51-4dbd-9c3c-1fe659b2ff6e@tompassin.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
 by: Thomas Passin - Fri, 17 Nov 2023 15:17 UTC

On 11/17/2023 9:46 AM, Peter J. Holzer via Python-list wrote:
> On 2023-11-17 07:48:41 -0500, Thomas Passin via Python-list wrote:
>> On 11/17/2023 6:17 AM, Peter J. Holzer via Python-list wrote:
>>> Oh, and Python (just like Perl) allows you to embed whitespace and
>>> comments into Regexps, which helps readability a lot if you have to
>>> write long regexps.
>>>
> [...]
>>>>>> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
>>>
>>> \b - a word boundary.
>>> [0-9]{2,7} - 2 to 7 digits
>>> - - a hyphen-minus
>>> [0-9]{2} - exactly 2 digits
>>> - - a hyphen-minus
>>> [0-9]{2} - exactly 2 digits
>>> \b - a word boundary.
>>>
>>> Seems quite straightforward to me. I'll be impressed if you can write
>>> that in Python in a way which is easier to read.
>>
>> And the re.VERBOSE (also re.X) flag can always be used so the entire
>> expression can be written line-by-line with comments nearly the same
>> as the example above
>
> Yes. That's what I alluded to above.

I know, and I just wanted to make it explicit for people who didn't know
much about Python regexes.

Re: Code improvement question

<Or-20231117191916@ram.dialup.fu-berlin.de>

https://www.novabbs.com/devel/article-flat.php?id=24506&group=comp.lang.python#24506

Path: i2pn2.org!rocksolid2!news.neodome.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram...@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: 17 Nov 2023 18:20:14 GMT
Organization: Stefan Ram
Lines: 24
Expires: 1 Dec 2024 11:59:58 GMT
Message-ID: <Or-20231117191916@ram.dialup.fu-berlin.de>
References: <b6e81def-3db8-4f05-8459-9a967c774020@dewhirst.com.au> <088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com> <32bbd365-a2fb-471f-b19e-3a3ec4457124@dewhirst.com.au> <mailman.249.1700019686.3828.python-list@python.org> <uj3h1b$1u2b5$1@dont-email.me> <20231117111744.oocpwdjryvcty5ol@hjp.at> <7072d3e8-317c-4953-9b7e-5a1750d957aa@tompassin.net> <20231117144606.ssezd234lj753bp2@hjp.at> <303c6738-4c51-4dbd-9c3c-1fe659b2ff6e@tompassin.net> <mailman.282.1700234667.3828.python-list@python.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Copyright: (C) Copyright 2023 Stefan Ram. All rights reserved.
Distribution through any means other than regular usenet
channels is forbidden. It is forbidden to publish this
article in the Web, to change URIs of this article into links,
and to transfer the body without this notice, but quotations
of parts in other Usenet posts are allowed.
X-No-Archive: Yes
Archive: no
X-No-Archive-Readme: "X-No-Archive" is set, because this prevents some
services to mirror the article in the web. But the article may
be kept on a Usenet archive server with only NNTP access.
X-No-Html: yes
Content-Language: en-US
Accept-Language: de-DE-1901, en-US, it, fr-FR
 by: Stefan Ram - Fri, 17 Nov 2023 18:20 UTC

Thomas Passin <list1@tompassin.net> writes:
>>>>>>> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
....
>I know, and I just wanted to make it explicit for people who didn't know
>much about Python regexes.

Or,

def repeat_preceding( min=None, max=None, count=None ):
    return '{' + str( count )+ '}' if count else \
           '{' + str( min )+ ',' + str( max )+ '}'

digit = '[0-9]'
word_boundary = r'\b'
hyphen = '-'

my_regexp = word_boundary + \
    digit + repeat_preceding( min=2, max=7 ) + hyphen + \
    digit + repeat_preceding( count=2 ) + hyphen + \
    digit + repeat_preceding( count=2 ) + word_boundary

.
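A quick way to confirm that Stefan's composed string is character-for-character identical to the one-line pattern it replaces is to compare the two directly. The sketch below restates his helper with the parameters renamed to min_count and max_count (a hypothetical change, made only so they don't shadow the built-ins min and max) and then checks the result; the sample text is made up.

```python
import re

def repeat_preceding(min_count=None, max_count=None, count=None):
    # Quantifier for the preceding item: {n} if an exact count is given,
    # otherwise {min,max}.
    if count:
        return "{" + str(count) + "}"
    return "{" + str(min_count) + "," + str(max_count) + "}"

digit = "[0-9]"
word_boundary = r"\b"
hyphen = "-"

my_regexp = (word_boundary
             + digit + repeat_preceding(min_count=2, max_count=7) + hyphen
             + digit + repeat_preceding(count=2) + hyphen
             + digit + repeat_preceding(count=2) + word_boundary)

# The composed string is exactly the original one-liner.
assert my_regexp == r"\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b"
print(re.findall(my_regexp, "ids 12-34-56 and 1234567-89-01"))
# ['12-34-56', '1234567-89-01']
```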

Re: Code improvement question

<mailman.283.1700247424.3828.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=24507&group=comp.lang.python#24507

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: pyt...@mrabarnett.plus.com (MRAB)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: Fri, 17 Nov 2023 18:56:54 +0000
Lines: 84
Message-ID: <mailman.283.1700247424.3828.python-list@python.org>
References: <b6e81def-3db8-4f05-8459-9a967c774020@dewhirst.com.au>
<088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com>
<32bbd365-a2fb-471f-b19e-3a3ec4457124@dewhirst.com.au>
<mailman.249.1700019686.3828.python-list@python.org>
<uj7ca7$2ocq7$1@dont-email.me>
<520925c5-bb86-4473-a27a-9c6aa74d73b9@mrabarnett.plus.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
 by: MRAB - Fri, 17 Nov 2023 18:56 UTC

On 2023-11-17 09:38, jak via Python-list wrote:
> Mike Dewhirst ha scritto:
>> On 15/11/2023 10:25 am, MRAB via Python-list wrote:
>>> On 2023-11-14 23:14, Mike Dewhirst via Python-list wrote:
>>>> I'd like to improve the code below, which works. It feels clunky to me.
>>>>
>>>> I need to clean up user-uploaded files the size of which I don't know in
>>>> advance.
>>>>
>>>> After cleaning they might be as big as 1Mb but that would be super rare.
>>>> Perhaps only for testing.
>>>>
>>>> I'm extracting CAS numbers and here is the pattern xx-xx-x up to
>>>> xxxxxxx-xx-x eg., 1012300-77-4
>>>>
>>>> def remove_alpha(txt):
>>>>     """  r'[^0-9\- ]':
>>>>
>>>>     [^...]: Match any character that is not in the specified set.
>>>>     0-9: Match any digit.
>>>>     \: Escape character.
>>>>     -: Match a hyphen.
>>>>     Space: Match a space.
>>>>     """
>>>>     cleaned_txt = re.sub(r'[^0-9\- ]', '', txt)
>>>>     bits = cleaned_txt.split()
>>>>     pieces = []
>>>>     for bit in bits:
>>>>         # minimum size of a CAS number is 7 so drop smaller clumps of digits
>>>>         pieces.append(bit if len(bit) > 6 else "")
>>>>     return " ".join(pieces)
>>>>
>>>>
>>>> Many thanks for any hints
>>>>
>>> Why don't you use re.findall?
>>>
>>> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
>>
>> I think I can see what you did there but it won't make sense to me - or
>> whoever looks at the code - in future.
>>
>> That answers your specific question. However, I am in awe of people who
>> can just "do" regular expressions and I thank you very much for what
>> would have been a monumental effort had I tried it.
>>
>> That little re.sub() came from ChatGPT and I can understand it without
>> too much effort because it came documented
>>
>> I suppose ChatGPT is the answer to this thread. Or everything. Or will be.
>>
>> Thanks
>>
>> Mike
>
> I respect your opinion but from the point of view of many usenet users
> asking a question to chatgpt to solve your problem is truly an overkill.
> The computer world overflows with people who know regex. If you had not
> already had the answer with the use of 're' I would have sent you my
> suggestion that as you can see it is practically identical. I am quite
> sure that in this usenet the same solution came to the mind of many
> people.
>
> with open(file) as fp:
>     try: ret = re.findall(r'\b\d{2,7}\-\d{2}\-\d{1}\b', fp.read())
>     except: ret = []
>
> The only difference is '\d' instead of '[0-9]' but they are equivalent.
>
Bare excepts are a very bad idea.
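MRAB doesn't spell out why, but one common reason is that a bare `except:` catches BaseException, so it also swallows KeyboardInterrupt, SystemExit and ordinary bugs. Below is a minimal sketch of jak's snippet restructured with a narrowed handler, assuming the only failures worth tolerating are an unreadable file (OSError) or text that isn't valid UTF-8 (UnicodeDecodeError); the function name is hypothetical. It also spells the digit class as `[0-9]`: in a str pattern, `\d` matches any Unicode decimal digit (Arabic-Indic digits, for example) unless re.ASCII is set, so the two are only equivalent on ASCII input. The last group is one digit, as in jak's pattern.

```python
import re

# [0-9] rather than \d: in a str pattern \d also matches non-ASCII digits.
CAS_RE = re.compile(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]\b')

def cas_numbers_from_file(path):
    """Return the CAS-shaped strings in the file at `path`, or [] if unreadable.

    Only the expected failures are caught. KeyboardInterrupt, SystemExit and
    genuine bugs still propagate, whereas a bare `except:` would swallow them.
    """
    try:
        with open(path, encoding="utf-8") as fp:
            return CAS_RE.findall(fp.read())
    except (OSError, UnicodeDecodeError):
        # OSError: missing file, no permission, ...
        # UnicodeDecodeError: the file is not valid UTF-8
        return []
```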

Re: Code improvement question

<uj8udg$30lcg$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24508&group=comp.lang.python#24508

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nos...@please.ty (jak)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: Sat, 18 Nov 2023 00:53:21 +0100
Organization: A noiseless patient Spider
Lines: 9
Message-ID: <uj8udg$30lcg$1@dont-email.me>
References: <b6e81def-3db8-4f05-8459-9a967c774020@dewhirst.com.au>
<088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com>
<32bbd365-a2fb-471f-b19e-3a3ec4457124@dewhirst.com.au>
<mailman.249.1700019686.3828.python-list@python.org>
<uj7ca7$2ocq7$1@dont-email.me>
<520925c5-bb86-4473-a27a-9c6aa74d73b9@mrabarnett.plus.com>
<mailman.283.1700247424.3828.python-list@python.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
 by: jak - Fri, 17 Nov 2023 23:53 UTC

MRAB wrote:
> Bare excepts are a very bad idea.

I know, you're right, but for testing, the CAS numbers were inside a string
(txt), and instead of 'open(file)' there was 'io.StringIO(txt)', so the
risk was almost nil. When I copied it here I didn't think about it.
Sorry.
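Following MRAB's point about bare excepts, here is a minimal sketch of the same extraction that catches only OSError (missing file, permission problems and the like). The function names find_cas_like and find_cas_like_in_file, the UTF-8 encoding and the sample text are assumptions for illustration. Taking a file-like object lets the core function be tested with io.StringIO, as jak describes.

```python
import io
import re
from typing import TextIO

# jak's pattern; the redundant '\-' escapes and '{1}' are dropped.
CAS_LIKE = re.compile(r'\b\d{2,7}-\d{2}-\d\b')

def find_cas_like(fp: TextIO) -> list[str]:
    """Hypothetical helper: return every CAS-style number in a file-like object."""
    return CAS_LIKE.findall(fp.read())

def find_cas_like_in_file(path: str) -> list[str]:
    """Hypothetical helper: like find_cas_like, but an unreadable file yields []."""
    try:
        with open(path, encoding='utf-8') as fp:  # UTF-8 is an assumption
            return find_cas_like(fp)
    except OSError:
        # Only I/O failures are swallowed; a typo (NameError) or Ctrl-C
        # (KeyboardInterrupt) still propagates, unlike with a bare except.
        return []

# Testing with an in-memory "file", as jak did.
sample = io.StringIO('Water is 7732-18-5; ethanol is 64-17-5.')
print(find_cas_like(sample))  # ['7732-18-5', '64-17-5']
```

Keeping the try around only the open-and-read step means that any other error shows up immediately instead of silently producing an empty list.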

RE: Code improvement question

<mailman.284.1700290518.3828.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=24509&group=comp.lang.python#24509

Newsgroups: comp.lang.python
From:
Newsgroups: comp.lang.python
Subject: RE: Code improvement question
Date: Sat, 18 Nov 2023 01:55:13 -0500
Lines: 110
Message-ID: <mailman.284.1700290518.3828.python-list@python.org>
References: <b6e81def-3db8-4f05-8459-9a967c774020@dewhirst.com.au>
<088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com>
<32bbd365-a2fb-471f-b19e-3a3ec4457124@dewhirst.com.au>
<mailman.249.1700019686.3828.python-list@python.org>
<uj3h1b$1u2b5$1@dont-email.me> <20231117111744.oocpwdjryvcty5ol@hjp.at>
<002f01da19ec$2e90d1b0$8bb27510$@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain;
charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Return-Path: <avi.e.gross@gmail.com>
In-Reply-To: <20231117111744.oocpwdjryvcty5ol@hjp.at>
X-Mailer: Microsoft Outlook 16.0
Content-Language: en-us
 by: - Sat, 18 Nov 2023 06:55 UTC

Features like regular expressions are essentially mini-languages, designed to be very powerful but also a tad cryptic to anyone not familiar with them.

But consider the alternative in some languages: a complex set of nested function calls with names like match_white_space(2, 5). Even when those are set up to be somewhat readable, they can be a pain. Quite a few problems can be solved nicely with a single regular expression, or with several in a row, each fairly simple. Sometimes you can handle parts of the job with the usual built-in text-manipulation functions (or ones from a module), either for speed or so that the RE part is simpler and easier to follow.
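As an illustration of doing part of the work with ordinary string methods, here is a hedged sketch that finds the same hyphenated digit groups (2 to 7 digits, 2 digits, 2 digits) without a regex. The helper names and sample text are hypothetical, and stripping punctuation from whitespace-separated tokens only approximates `\b`: for example, '12-34-56-78' is rejected here, while the regex still finds '12-34-56' inside it.

```python
import re
import string

PATTERN = re.compile(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b')

def is_ascii_digits(s: str, lo: int, hi: int) -> bool:
    """True if s is lo..hi characters long and consists only of 0-9."""
    return lo <= len(s) <= hi and all(c in string.digits for c in s)

def extract_without_regex(text: str) -> list[str]:
    """Hypothetical helper: collect tokens shaped like NN..NNNNNNN-NN-NN."""
    found = []
    for token in text.split():
        token = token.strip(string.punctuation)  # crude stand-in for \b
        parts = token.split('-')
        if (len(parts) == 3
                and is_ascii_digits(parts[0], 2, 7)
                and is_ascii_digits(parts[1], 2, 2)
                and is_ascii_digits(parts[2], 2, 2)):
            found.append(token)
    return found

text = 'Ids: 12345-67-89, (98-76-54) and 1-23-45.'
print(extract_without_regex(text))  # ['12345-67-89', '98-76-54']
print(PATTERN.findall(text))        # ['12345-67-89', '98-76-54']
```

The string-method version is longer and still needs comments to explain the lengths, which is roughly the trade-off described above.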

And, as noted, Python lets you include comments in an RE and supports Perl-style extensions. Adding enough comments above or within the code can remind people what is going on or point them to a reference, and simply explaining in English (or whatever language you expect later readers to understand) helps. You can spell out, in as much detail as you like, what you expect your data to look like and what you want to match or extract; the RE then becomes easier to follow.
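Python's re.VERBOSE flag is one way to put those comments inside the pattern itself: unescaped whitespace is ignored and `#` starts a comment. A sketch using the pattern discussed in this thread (the name NUMBER_GROUPS and the sample text are made up):

```python
import re

# Same pattern as r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', one piece per line.
NUMBER_GROUPS = re.compile(r"""
    \b            # word boundary before the first digit
    [0-9]{2,7}    # first group: 2 to 7 digits
    -             # literal hyphen-minus
    [0-9]{2}      # second group: exactly 2 digits
    -             # literal hyphen-minus
    [0-9]{2}      # third group: exactly 2 digits
    \b            # word boundary after the last digit
""", re.VERBOSE)

print(NUMBER_GROUPS.findall('see 1234-56-78 and 12-345-67'))  # ['1234-56-78']
```

One caveat of verbose mode: a literal space or `#` in the pattern must be escaped (or put in a character class), since both are otherwise treated as layout.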

Of course, the endless extensions added for things like Unicode support have made some REs much harder to create or understand, and sometimes the result may not be what you expected when something strange shows up, like the symbols ①❹⓸.

Those might be treated as digits somewhere along the way and perhaps even interpreted as 144 (twelve dozen), which may even be appropriate but would probably be a surprise.
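In Python 3, for str patterns, `\d` matches any Unicode decimal digit (category Nd), so it is not strictly equivalent to `[0-9]`; the re.ASCII flag restores the ASCII-only meaning. Circled digits such as ① are category No, so `\d` does not match them, though other tools may treat them differently. A short sketch with made-up sample text, written with escapes to keep the source ASCII-only:

```python
import re

# '\uff11\uff12' is fullwidth "12", '\u0663' is ARABIC-INDIC DIGIT THREE,
# '\u2460' is CIRCLED DIGIT ONE.
text = 'a 34 b \uff11\uff12 c \u0663 d \u2460'

print(re.findall(r'\d+', text))                  # 3 matches: '34', fullwidth 12, Arabic-Indic 3
print(re.findall(r'[0-9]+', text))               # ['34']
print(re.findall(r'\d+', text, flags=re.ASCII))  # ['34']
print(re.findall(r'\d', '\u2460'))               # []
print(int('\uff11\uff12'))                       # 12
```

The last line shows why such matches can matter downstream: int() accepts any Unicode decimal digits, so a value found by `\d` may quietly convert to a number.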

-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of Peter J. Holzer via Python-list
Sent: Friday, November 17, 2023 6:18 AM
To: python-list@python.org
Subject: Re: Code improvement question

On 2023-11-16 11:34:16 +1300, Rimu Atkinson via Python-list wrote:
> > > Why don't you use re.findall?
> > >
> > > re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
> >
> > I think I can see what you did there but it won't make sense to me - or
> > whoever looks at the code - in future.
> >
> > That answers your specific question. However, I am in awe of people who
> > can just "do" regular expressions and I thank you very much for what
> > would have been a monumental effort had I tried it.
>
> I feel the same way about regex. If I can find a way to write something
> without regex I very much prefer to as regex usually adds complexity and
> hurts readability.

I find "straight" regexps very easy to write. There are only a handful
of constructs which are all very simple and you just string them
together. But then I've used regexps for 30+ years, so of course they
feel natural to me.

(Reading regexps may be a bit harder, exactly because they are too
simple: There is no abstraction, so a complicated pattern results in a
long regexp.)

There are some extensions to regexps which are conceptually harder, like
lookahead and lookbehind or nested contexts in Perl. I may need the
manual for those (especially because they are new(ish) and every
language uses a different syntax for them) or avoid them altogether.

Oh, and Python (just like Perl) allows you to embed whitespace and
comments into regexps, which helps readability a lot if you have to
write long regexps.

> You might find https://regex101.com/ to be useful for testing your regex.
> You can enter in sample data and see if it matches.
>
> If I understood what your regex was trying to do I might be able to suggest
> some python to do the same thing. Is it just removing numbers from text?

Not "removing" them (as I understood it), but extracting them (i.e. find
and collect them).

> > > re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)

\b - a word boundary.
[0-9]{2,7} - 2 to 7 digits
- - a hyphen-minus
[0-9]{2} - exactly 2 digits
- - a hyphen-minus
[0-9]{2} - exactly 2 digits
\b - a word boundary.

Seems quite straightforward to me. I'll be impressed if you can write
that in Python in a way which is easier to read.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

Re: Code improvement question

<regexp-20231119110325@ram.dialup.fu-berlin.de>

https://www.novabbs.com/devel/article-flat.php?id=24510&group=comp.lang.python#24510

Newsgroups: comp.lang.python
From: ram...@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: 19 Nov 2023 10:04:06 GMT
Organization: Stefan Ram
Lines: 32
Expires: 1 Dec 2024 11:59:58 GMT
Message-ID: <regexp-20231119110325@ram.dialup.fu-berlin.de>
References: <b6e81def-3db8-4f05-8459-9a967c774020@dewhirst.com.au> <088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com> <32bbd365-a2fb-471f-b19e-3a3ec4457124@dewhirst.com.au> <mailman.249.1700019686.3828.python-list@python.org> <uj3h1b$1u2b5$1@dont-email.me> <20231117111744.oocpwdjryvcty5ol@hjp.at> <7072d3e8-317c-4953-9b7e-5a1750d957aa@tompassin.net> <20231117144606.ssezd234lj753bp2@hjp.at> <303c6738-4c51-4dbd-9c3c-1fe659b2ff6e@tompassin.net> <mailman.282.1700234667.3828.python-list@python.org> <Or-20231117191916@ram.dialup.fu-berlin.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Copyright: (C) Copyright 2023 Stefan Ram. All rights reserved.
Distribution through any means other than regular usenet
channels is forbidden. It is forbidden to publish this
article in the Web, to change URIs of this article into links,
and to transfer the body without this notice, but quotations
of parts in other Usenet posts are allowed.
X-No-Archive: Yes
Archive: no
X-No-Archive-Readme: "X-No-Archive" is set, because this prevents some
services to mirror the article in the web. But the article may
be kept on a Usenet archive server with only NNTP access.
X-No-Html: yes
Content-Language: en-US
Accept-Language: de-DE-1901, en-US, it, fr-FR
 by: Stefan Ram - Sun, 19 Nov 2023 10:04 UTC

ram@zedat.fu-berlin.de (Stefan Ram) writes:
>Thomas Passin <list1@tompassin.net> writes:
>>>>>>>> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
>Or,

def repeat_preceding( min=None, max=None, count=None ):
    ''' require that the preceding regexp is repeated
    a certain number of times, use either min and max
    or count '''
    return '{' + str( count )+ '}' if count else \
           '{' + str( min )+ ',' + str( max )+ '}'

digit = '[0-9]' # match a decimal digit
word_boundary = r'\b' # match a word boundary
a_hyphen = '-' # match a literal hyphen character

def digits( **kwargs ):
    ''' A certain number of digits. See 'repeat_preceding' for
    the possible kwargs. '''
    return digit + repeat_preceding( **kwargs )

def word( regexp: str ):
    ''' something that starts and ends with a word boundary '''
    return word_boundary + regexp + word_boundary

my_regexp = \
word \
( digits( min=2, max=7 ) + a_hyphen +
  digits( count=2 ) + a_hyphen +
  digits( count=2 ))

Re: Code improvement question

<87wmuexaxd.fsf@nightsong.com>

https://www.novabbs.com/devel/article-flat.php?id=24511&group=comp.lang.python#24511

Newsgroups: comp.lang.python
From: no.em...@nospam.invalid (Paul Rubin)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: Sun, 19 Nov 2023 03:03:10 -0800
Organization: A noiseless patient Spider
Lines: 6
Message-ID: <87wmuexaxd.fsf@nightsong.com>
References: <b6e81def-3db8-4f05-8459-9a967c774020@dewhirst.com.au>
<088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com>
<32bbd365-a2fb-471f-b19e-3a3ec4457124@dewhirst.com.au>
<mailman.249.1700019686.3828.python-list@python.org>
<uj3h1b$1u2b5$1@dont-email.me>
<20231117111744.oocpwdjryvcty5ol@hjp.at>
<7072d3e8-317c-4953-9b7e-5a1750d957aa@tompassin.net>
<20231117144606.ssezd234lj753bp2@hjp.at>
<303c6738-4c51-4dbd-9c3c-1fe659b2ff6e@tompassin.net>
<mailman.282.1700234667.3828.python-list@python.org>
<Or-20231117191916@ram.dialup.fu-berlin.de>
<regexp-20231119110325@ram.dialup.fu-berlin.de>
MIME-Version: 1.0
Content-Type: text/plain
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
 by: Paul Rubin - Sun, 19 Nov 2023 11:03 UTC

ram@zedat.fu-berlin.de (Stefan Ram) writes:
> return '{' + str( count )+ '}' if count else \
> '{' + str( min )+ ',' + str( max )+ '}'

return f'{{{count}}}' if count else f'{{{min},{max}}}'
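A self-contained sketch combining Stefan's builder with Paul's f-string version, checking that it reproduces the original pattern. Two tweaks are my own assumptions, not from either post: it tests `count is not None` rather than truthiness (with `if count`, `count=0` falls through to the min/max branch and yields '{None,None}'), and it renames min/max to lo/hi so the built-ins are not shadowed.

```python
import re

def repeat_preceding(lo=None, hi=None, count=None) -> str:
    """Quantifier for the preceding item: exactly `count` times, else lo..hi times."""
    # In an f-string, '{{' and '}}' are literal braces; '{count}' is interpolated.
    return f'{{{count}}}' if count is not None else f'{{{lo},{hi}}}'

def digits(**kwargs) -> str:
    """A run of ASCII digits; see repeat_preceding for the kwargs."""
    return '[0-9]' + repeat_preceding(**kwargs)

def word(regexp: str) -> str:
    """Wrap regexp in word boundaries."""
    return r'\b' + regexp + r'\b'

my_regexp = word(digits(lo=2, hi=7) + '-' + digits(count=2) + '-' + digits(count=2))

print(my_regexp)                                            # \b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b
print(my_regexp == r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b')     # True
print(re.findall(my_regexp, 'ids 1234-56-78 and 9-87-65'))  # ['1234-56-78']
print(repeat_preceding(count=0))                            # {0}
```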

Re: Code improvement question

<ujggnh$f9p7$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24513&group=comp.lang.python#24513

Newsgroups: comp.lang.python
From: rimuatki...@gmail.com (Rimu Atkinson)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: Tue, 21 Nov 2023 09:48:49 +1300
Organization: A noiseless patient Spider
Lines: 21
Message-ID: <ujggnh$f9p7$1@dont-email.me>
References: <b6e81def-3db8-4f05-8459-9a967c774020@dewhirst.com.au>
<088586a6-79c2-4114-8d62-5e1a1061b841@mrabarnett.plus.com>
<32bbd365-a2fb-471f-b19e-3a3ec4457124@dewhirst.com.au>
<mailman.249.1700019686.3828.python-list@python.org>
<uj3h1b$1u2b5$1@dont-email.me> <20231117111744.oocpwdjryvcty5ol@hjp.at>
<mailman.279.1700219873.3828.python-list@python.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 20 Nov 2023 20:48:50 -0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.6.1
Content-Language: en-NZ
In-Reply-To: <mailman.279.1700219873.3828.python-list@python.org>
 by: Rimu Atkinson - Mon, 20 Nov 2023 20:48 UTC

>
>>>> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
>
> \b - a word boundary.
> [0-9]{2,7} - 2 to 7 digits
> - - a hyphen-minus
> [0-9]{2} - exactly 2 digits
> - - a hyphen-minus
> [0-9]{2} - exactly 2 digits
> \b - a word boundary.
>
> Seems quite straightforward to me. I'll be impressed if you can write
> that in Python in a way which is easier to read.
>

Now that I know what {} does, you're right, that IS straightforward!
Maybe 2023 will be the year I finally get off my arse and learn regex.

Thanks :)
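A quick way to get a feel for the {} quantifiers is to try them with re.fullmatch on a few strings. The helper name matches and the sample strings are made up for illustration:

```python
import re

def matches(pattern: str, s: str) -> bool:
    """Hypothetical helper: True if the whole of s matches pattern."""
    return re.fullmatch(pattern, s) is not None

# {n}: exactly n times
print(matches(r'[0-9]{2}', '42'), matches(r'[0-9]{2}', '425'))         # True False
# {m,n}: between m and n times, inclusive
print(matches(r'[0-9]{2,7}', '1234567'), matches(r'[0-9]{2,7}', '1'))  # True False
# {m,}: at least m times
print(matches(r'[0-9]{3,}', '123456789'), matches(r'[0-9]{3,}', '12'))  # True False
```

fullmatch is used instead of search so that '425' is rejected outright rather than matching its first two digits.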
