novaBBS - comp.lang.python - Re: Ask for help on using re

Ask for help on using re

<904b3a0e-9e0f-401a-8bb0-cf14de5a8a85n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=14443&group=comp.lang.python#14443

X-Received: by 2002:a37:9445:: with SMTP id w66mr3890690qkd.410.1628156431208;
Thu, 05 Aug 2021 02:40:31 -0700 (PDT)
X-Received: by 2002:a37:b002:: with SMTP id z2mr3757502qke.440.1628156431058;
Thu, 05 Aug 2021 02:40:31 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.python
Date: Thu, 5 Aug 2021 02:40:30 -0700 (PDT)
Injection-Info: google-groups.googlegroups.com; posting-host=2001:b011:e603:7535:505a:e50a:f0d6:639b;
posting-account=G2sM6AoAAADOlDdo9rWD6sFkj3T5ULsz
NNTP-Posting-Host: 2001:b011:e603:7535:505a:e50a:f0d6:639b
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <904b3a0e-9e0f-401a-8bb0-cf14de5a8a85n@googlegroups.com>
Subject: Ask for help on using re
From: jfo...@ms4.hinet.net (Jach Feng)
Injection-Date: Thu, 05 Aug 2021 09:40:31 +0000
Content-Type: text/plain; charset="UTF-8"

by: Jach Feng - Thu, 5 Aug 2021 09:40 UTC

I want to distinguish between numbers with/without a dot attached:

>>> import re
>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>> re.compile(r'ch \d{1,}[.]').findall(text)
['ch 1.', 'ch 23.']
>>> re.compile(r'ch \d{1,}[^.]').findall(text)
['ch 23', 'ch 4 ', 'ch 56 ']

I can guess why 'ch 23' appears in the second list, but how do I get rid of it?

--Jach

Re: Ask for help on using re

<610bbf35$0$695$14726298@news.sunsite.dk>

by: Neil - Thu, 5 Aug 2021 10:36 UTC

Jach Feng <jfong@ms4.hinet.net> wrote:
> I want to distinguish between numbers with/without a dot attached:
>
>>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>>> re.compile(r'ch \d{1,}[.]').findall(text)
> ['ch 1.', 'ch 23.']
>>>> re.compile(r'ch \d{1,}[^.]').findall(text)
> ['ch 23', 'ch 4 ', 'ch 56 ']
>
> I can guess why the 'ch 23' appears in the second list. But how to get rid of it?
>
> --Jach

Does

>>> re.findall(r'ch\s+\d+(?![.\d])',text)

do what you want? This matches "ch", then one or more whitespace
characters, then one or more digits, provided the digits are not
followed by a dot or another digit.
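
A minimal sketch of the lookahead described above, using only the standard `re` module and the sample text from the original post. The variable names `no_dot` and `too_loose` are illustrative. The second pattern is included only to show why the lookahead also has to exclude digits.

```python
import re

text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# (?![.\d]) is a negative lookahead: it inspects the next character
# without consuming it, so each match ends right after the digits.
no_dot = re.findall(r'ch\s+\d+(?![.\d])', text)

# If the lookahead only excludes the dot, \d+ can backtrack from '23'
# to '2', because '2' is followed by '3', which is not a dot.
too_loose = re.findall(r'ch\s+\d+(?![.])', text)

print(no_dot)     # ['ch 4', 'ch 56']
print(too_loose)  # ['ch 2', 'ch 4', 'ch 56']
```

Because the lookahead consumes nothing, the results carry no trailing space, unlike the `[^.]` attempts in the original post.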

Re: Ask for help on using re

<838626e9-8354-4ce3-8c79-6fdeeb3c91f6n@googlegroups.com>

by: Jach Feng - Thu, 5 Aug 2021 11:13 UTC

On Thursday, August 5, 2021 at 6:36:58 PM [UTC+8], Neil wrote:
> Jach Feng <jf...@ms4.hinet.net> wrote:
> > I want to distinguish between numbers with/without a dot attached:
> >
> >>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
> >>>> re.compile(r'ch \d{1,}[.]').findall(text)
> > ['ch 1.', 'ch 23.']
> >>>> re.compile(r'ch \d{1,}[^.]').findall(text)
> > ['ch 23', 'ch 4 ', 'ch 56 ']
> >
> > I can guess why the 'ch 23' appears in the second list. But how to get rid of it?
> >
> > --Jach
> Does
>
> >>> re.findall(r'ch\s+\d+(?![.\d])',text)
>
> do what you want? This matches "ch", then any nonzero number of
> whitespaces, then any nonzero number of digits, provided this is not
> followed by a dot or another digit.

Yes, the result is what I want. Thank you, Neil!

The solution is more complex than I expected. I'll have to digest it :-)

--Jach

Re: Ask for help on using re

<in2cnvFesqhU1@mid.individual.net>

by: Peter Pearson - Thu, 5 Aug 2021 15:00 UTC

On Thu, 5 Aug 2021 02:40:30 -0700 (PDT), Jach Feng <jfong@ms4.hinet.net> wrote:
> I want to distinguish between numbers with/without a dot attached:
>
> >>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
> >>> re.compile(r'ch \d{1,}[.]').findall(text)
> ['ch 1.', 'ch 23.']
> >>> re.compile(r'ch \d{1,}[^.]').findall(text)
> ['ch 23', 'ch 4 ', 'ch 56 ']
>
> I can guess why the 'ch 23' appears in the second list. But how to get
> rid of it?

>>> re.findall(r'ch \d+[^.0-9]', "ch 1. is ch 23. is ch 4 is ch 56 is ")
['ch 4 ', 'ch 56 ']

--
To email me, substitute nowhere->runbox, invalid->com.
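
One difference between the character-class pattern above and a lookahead is worth a quick sketch: `[^.0-9]` has to consume a real character, so it cannot match a number that ends the string. The `sample` string below is a made-up variation on the thread's text, not data from the original post.

```python
import re

# Hypothetical sample in which the last chapter number ends the string.
sample = 'ch 1. is ch 4 is ch 7'

# [^.0-9] must consume one character after the digits.
consuming = re.findall(r'ch \d+[^.0-9]', sample)

# (?![.0-9]) only asserts what does NOT follow, so end of string is fine.
lookahead = re.findall(r'ch \d+(?![.0-9])', sample)

print(consuming)  # ['ch 4 ']
print(lookahead)  # ['ch 4', 'ch 7']
```

Which behaviour is preferable depends on the input. If every number is guaranteed to be followed by some character, the two patterns find the same numbers and differ only in the trailing character.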

Re: Ask for help on using re

<610bffbe$0$6185$426a34cc@news.free.fr>

by: ast - Thu, 5 Aug 2021 15:11 UTC

On 05/08/2021 at 11:40, Jach Feng wrote:
> I want to distinguish between numbers with/without a dot attached:
>
>>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>>> re.compile(r'ch \d{1,}[.]').findall(text)
> ['ch 1.', 'ch 23.']
>>>> re.compile(r'ch \d{1,}[^.]').findall(text)
> ['ch 23', 'ch 4 ', 'ch 56 ']
>
> I can guess why the 'ch 23' appears in the second list. But how to get rid of it?
>
> --Jach
>

>>> import re

>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

>>> re.findall(r'ch \d+\.', text)
['ch 1.', 'ch 23.']

>>> re.findall(r'ch \d+(?!\.)', text) # (?!\.) for negated look ahead
['ch 2', 'ch 4', 'ch 56']

Re: Ask for help on using re

<610c0206$0$23960$426a34cc@news.free.fr>

by: ast - Thu, 5 Aug 2021 15:21 UTC

On 05/08/2021 at 17:11, ast wrote:
> On 05/08/2021 at 11:40, Jach Feng wrote:
>> I want to distinguish between numbers with/without a dot attached:
>>
>>>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>>>> re.compile(r'ch \d{1,}[.]').findall(text)
>> ['ch 1.', 'ch 23.']
>>>>> re.compile(r'ch \d{1,}[^.]').findall(text)
>> ['ch 23', 'ch 4 ', 'ch 56 ']
>>
>> I can guess why the 'ch 23' appears in the second list. But how to get
>> rid of it?
>>
>> --Jach
>>
>
> >>> import re
>
> >>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>
> >>> re.findall(r'ch \d+\.', text)
> ['ch 1.', 'ch 23.']
>
> >>> re.findall(r'ch \d+(?!\.)', text) # (?!\.) for negated look ahead
> ['ch 2', 'ch 4', 'ch 56']

Oops, 'ch 2' is found, which is wrong.

Re: Ask for help on using re

<610c03b9$0$6455$426a34cc@news.free.fr>

by: ast - Thu, 5 Aug 2021 15:28 UTC

import regex

# regex (a third-party module) is more powerful than re

>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

>>> regex.findall(r'ch \d++(?!\.)', text)

['ch 4', 'ch 56']

## ++ means "possessive": no backtracking is allowed
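
For readers without the third-party `regex` module, here is a hedged sketch of one way to get a similar no-backtracking effect with the standard `re` module. It uses a lookahead with a capture group followed by a backreference, which is a workaround and not the possessive quantifier itself. As far as I know, `re` in Python 3.11 and later also accepts `\d++` directly, but the sketch avoids relying on that.

```python
import re

text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# (?=(\d+)) captures the longest digit run inside a lookahead, and \1
# then consumes exactly that run. Once the lookahead has succeeded, re
# does not go back into it, so the digits cannot be given back.
pattern = re.compile(r'ch (?=(\d+))\1(?!\.)')

# finditer + group(0) returns the whole match instead of the captured group.
matches = [m.group(0) for m in pattern.finditer(text)]

print(matches)  # ['ch 4', 'ch 56']
```

The simpler `(?![.0-9])` lookahead gives the same result here. The sketch is mainly useful for seeing what "possessive" changes.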

Re: Ask for help on using re

<ca42a740-3381-401f-a8c4-3faf4425a15bn@googlegroups.com>

by: Jach Feng - Fri, 6 Aug 2021 00:57 UTC

On Thursday, August 5, 2021 at 11:29:15 PM [UTC+8], ast wrote:
> On 05/08/2021 at 17:11, ast wrote:
> > On 05/08/2021 at 11:40, Jach Feng wrote:
> >> I want to distinguish between numbers with/without a dot attached:
> >>
> >>>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
> >>>>> re.compile(r'ch \d{1,}[.]').findall(text)
> >> ['ch 1.', 'ch 23.']
> >>>>> re.compile(r'ch \d{1,}[^.]').findall(text)
> >> ['ch 23', 'ch 4 ', 'ch 56 ']
> >>
> >> I can guess why the 'ch 23' appears in the second list. But how to get
> >> rid of it?
> >>
> >> --Jach
> >>
> >
> > >>> import re
> >
> > >>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
> >
> > >>> re.findall(r'ch \d+\.', text)
> > ['ch 1.', 'ch 23.']
> >
> > >>> re.findall(r'ch \d+(?!\.)', text) # (?!\.) for negated look ahead
> > ['ch 2', 'ch 4', 'ch 56']
> import regex
>
> # regex is more powerful that re
> >>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
> >>> regex.findall(r'ch \d++(?!\.)', text)
>
> ['ch 4', 'ch 56']
>
> ## ++ means "possessive", no backtrack is allowed
Can someone explain how the difference arises? I just can't figure it out :-(

>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>> re.compile(r'ch \d+[^.]').findall(text)
['ch 23', 'ch 4 ', 'ch 56 ']
>>> re.compile(r'ch \d+[^.0-9]').findall(text)
['ch 4 ', 'ch 56 ']

--Jach
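
One way to see where the difference comes from is to capture the digits and the trailing character separately. This is only a diagnostic sketch, and the added parentheses and the names `loose` and `strict` are not part of the original patterns.

```python
import re

text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# Each tuple is (digits matched by \d+, character matched by the class).
loose = re.findall(r'ch (\d+)([^.])', text)
strict = re.findall(r'ch (\d+)([^.0-9])', text)

print(loose)   # [('2', '3'), ('4', ' '), ('56', ' ')]
print(strict)  # [('4', ' '), ('56', ' ')]
```

In the first result, `\d+` gave up the '3' of '23' so that `[^.]` could match it, since '3' is "not a dot". Excluding digits from the class removes that escape route.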

Re: Ask for help on using re

<seiqoa$tg7$1@gioia.aioe.org>

by: jak - Fri, 6 Aug 2021 08:09 UTC

On 05/08/2021 11:40, Jach Feng wrote:
> I want to distinguish between numbers with/without a dot attached:
>
>>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>>> re.compile(r'ch \d{1,}[.]').findall(text)
> ['ch 1.', 'ch 23.']
>>>> re.compile(r'ch \d{1,}[^.]').findall(text)
> ['ch 23', 'ch 4 ', 'ch 56 ']
>
> I can guess why the 'ch 23' appears in the second list. But how to get rid of it?
>
> --Jach
>

import re

t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
r = re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M)

res = r.findall(t)

dot = [x[0] for x in res if x[0] != '']
udot = [x[1] for x in res if x[1] != '']

print(f"dot: {dot}")
print(f"undot: {udot}")

out:

dot: ['ch 1.', 'ch 23.']
undot: ['ch 4', 'ch 56']
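
A possible variation on the alternation approach above, sketched with named groups so that the code does not depend on tuple positions. The group names `dotted` and `plain` and the `found` dictionary are illustrative choices, not part of the original post.

```python
import re

text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# The dotted alternative comes first so it wins when both could match.
pattern = re.compile(r'(?P<dotted>ch +\d+\.)|(?P<plain>ch +\d+)')

found = {'dotted': [], 'plain': []}
for m in pattern.finditer(text):
    # m.lastgroup is the name of the group that took part in the match.
    found[m.lastgroup].append(m.group(0))

print(found)  # {'dotted': ['ch 1.', 'ch 23.'], 'plain': ['ch 4', 'ch 56']}
```

This collects both lists in a single pass over the text, which is the main attraction of the alternation approach.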

Re: Ask for help on using re

<610cfe63$0$32528$426a74cc@news.free.fr>

by: ast - Fri, 6 Aug 2021 09:18 UTC

On 06/08/2021 at 02:57, Jach Feng wrote:
> On Thursday, August 5, 2021 at 11:29:15 PM [UTC+8], ast wrote:
>> On 05/08/2021 at 17:11, ast wrote:
>>> On 05/08/2021 at 11:40, Jach Feng wrote:

>> import regex
>>
>> # regex is more powerful that re
>>>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>>>> regex.findall(r'ch \d++(?!\.)', text)
>>
>> ['ch 4', 'ch 56']
>>
>> ## ++ means "possessive", no backtrack is allowed

> Can someone explain how the difference appear? I just can't figure it out:-(
>

+, *, ? are greedy, meaning they try to match as many characters
as possible. But if the whole match then fails, they give back
characters one at a time and the whole match is tried again.
That's backtracking.
With ++, backtracking is not allowed. This works with the regex module
and, at the time of writing, is not implemented in the re module.

With string = "ch 23." and pattern = r"ch \d+(?!\.)"

On the first try, \d+ matches 23,
but the whole match fails because the next character is . and . is not
allowed there by (?!\.)

A backtrack happens:

\d+ matches only 2,
and the whole match succeeds because the next character, 3, is not a .
But this is not what we want.

With ++ there is no backtracking, so there is no match:
"ch 23." is rejected,
which is what we wanted.

Using re only, the best way is probably

re.findall(r"ch \d+(?![.0-9])", text)
['ch 4', 'ch 56']
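
The backtracking steps described above can be imitated by hand. The sketch below replaces `\d+` with a fixed digit count and tries the counts longest first, which is roughly the order a greedy quantifier explores. It illustrates the idea and does not trace the engine's real internals. The names `trace` and `fixed` are illustrative.

```python
import re

s = 'ch 23.'

# Try each digit count \d+ could settle on, longest first.
trace = []
for n in (2, 1):
    m = re.match(r'ch \d{%d}(?!\.)' % n, s)
    trace.append((n, m.group(0) if m else None))

# Forbidding a following digit as well closes the loophole.
fixed = re.match(r'ch \d+(?![.0-9])', s)

print(trace)  # [(2, None), (1, 'ch 2')]
print(fixed)  # None
```

With two digits the lookahead sees the dot and fails. With one digit it sees '3' and succeeds, which is the unwanted 'ch 2' match.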

Re: Ask for help on using re

<296739fe-87a5-457d-b541-46c92620cc57n@googlegroups.com>

by: Jach Feng - Fri, 6 Aug 2021 10:57 UTC

On Friday, August 6, 2021 at 4:10:05 PM [UTC+8], jak wrote:
> On 05/08/2021 11:40, Jach Feng wrote:
> > I want to distinguish between numbers with/without a dot attached:
> >
> >>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
> >>>> re.compile(r'ch \d{1,}[.]').findall(text)
> > ['ch 1.', 'ch 23.']
> >>>> re.compile(r'ch \d{1,}[^.]').findall(text)
> > ['ch 23', 'ch 4 ', 'ch 56 ']
> >
> > I can guess why the 'ch 23' appears in the second list. But how to get rid of it?
> >
> > --Jach
> >
> import re
> t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
> r = re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M)
>
> res = r.findall(t)
>
> dot = [x[1] for x in res if x[1] != '']
> udot = [x[0] for x in res if x[0] != '']
>
> print(f"dot: {dot}")
> print(f"undot: {udot}")
>
> out:
>
> dot: ['ch 4', 'ch 56']
> undot: ['ch 1.', 'ch 23.']
> r = re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M)
That's an interesting solution! Where is the '|' operator in re.compile() documented?

--Jach

Re: Ask for help on using re

<sejgar$1aic$1@gioia.aioe.org>

by: jak - Fri, 6 Aug 2021 14:17 UTC

On 06/08/2021 12:57, Jach Feng wrote:
> On Friday, August 6, 2021 at 4:10:05 PM [UTC+8], jak wrote:
>> On 05/08/2021 11:40, Jach Feng wrote:
>>> I want to distinguish between numbers with/without a dot attached:
>>>
>>>>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>>>>> re.compile(r'ch \d{1,}[.]').findall(text)
>>> ['ch 1.', 'ch 23.']
>>>>>> re.compile(r'ch \d{1,}[^.]').findall(text)
>>> ['ch 23', 'ch 4 ', 'ch 56 ']
>>>
>>> I can guess why the 'ch 23' appears in the second list. But how to get rid of it?
>>>
>>> --Jach
>>>
>> import re
>> t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>> r = re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M)
>>
>> res = r.findall(t)
>>
>> dot = [x[1] for x in res if x[1] != '']
>> udot = [x[0] for x in res if x[0] != '']
>>
>> print(f"dot: {dot}")
>> print(f"undot: {udot}")
>>
>> out:
>>
>> dot: ['ch 4', 'ch 56']
>> undot: ['ch 1.', 'ch 23.']
>> r = re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M)
> That's an interest solution! Where the '|' operator in re.compile() was documented?
>
> --Jach
>

I honestly can't tell you, I've been using it for over 30 years. In any
case you can find some traces of it in the "regular expressions quick
reference" on the site https://regex101.com (bottom right side).

Re: Ask for help on using re

<sejhfq$1tg0$1@gioia.aioe.org>

by: jak - Fri, 6 Aug 2021 14:37 UTC

On 06/08/2021 16:17, jak wrote:
> On 06/08/2021 12:57, Jach Feng wrote:
>> On Friday, August 6, 2021 at 4:10:05 PM [UTC+8], jak wrote:
>>> On 05/08/2021 11:40, Jach Feng wrote:
>>>> I want to distinguish between numbers with/without a dot attached:
>>>>
>>>>>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>>>>>> re.compile(r'ch \d{1,}[.]').findall(text)
>>>> ['ch 1.', 'ch 23.']
>>>>>>> re.compile(r'ch \d{1,}[^.]').findall(text)
>>>> ['ch 23', 'ch 4 ', 'ch 56 ']
>>>>
>>>> I can guess why the 'ch 23' appears in the second list. But how to
>>>> get rid of it?
>>>>
>>>> --Jach
>>>>
>>> import re
>>> t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>> r = re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M)
>>>
>>> res = r.findall(t)
>>>
>>> dot = [x[1] for x in res if x[1] != '']
>>> udot = [x[0] for x in res if x[0] != '']
>>>
>>> print(f"dot: {dot}")
>>> print(f"undot: {udot}")
>>>
>>> out:
>>>
>>> dot: ['ch 4', 'ch 56']
>>> undot: ['ch 1.', 'ch 23.']
>>> r = re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M)
>> That's an interest solution! Where the '|' operator in re.compile()
>> was documented?
>>
>> --Jach
>>
>
> I honestly can't tell you, I've been using it for over 30 years. In any
> case you can find some traces of it in the "regular expressions quick
> reference" on the site https://regex101.com (bottom right side).
>
...if I'm not mistaken, '|' is part of standard regular expression
syntax, so it is not a Python-specific extension.
Perhaps that is why you couldn't find any documentation on it.

Re: Ask for help on using re

<slrnsgqim3.49p.jon+usenet@raven.unequivocal.eu>

by: Jon Ribbens - Fri, 6 Aug 2021 14:44 UTC

On 2021-08-06, jak <nospam@please.ty> wrote:
> On 06/08/2021 16:17, jak wrote:
>> On 06/08/2021 12:57, Jach Feng wrote:
>>> That's an interest solution! Where the '|' operator in re.compile()
>>> was documented?
>>
>> I honestly can't tell you, I've been using it for over 30 years. In any
>> case you can find some traces of it in the "regular expressions quick
>> reference" on the site https://regex101.com (bottom right side).
>
> ...if I'm not mistaken, the '|' it is part of normal regular
> expressions, so it is not a specific extension of the python libraries.
> Perhaps this is why you don't find any documentation on it.

The Python documentation fully describes the regular expression syntax
that the 're' module supports, including features that are widely
supported by different regular expression systems and also Python
extensions. '|' is documented here:
https://docs.python.org/3/library/re.html#index-13

Re: Ask for help on using re

<f15a5776-c609-4b41-8394-2c5d78dc4dd9n@googlegroups.com>

by: Jach Feng - Sat, 7 Aug 2021 02:23 UTC

On Friday, August 6, 2021 at 4:10:05 PM [UTC+8], jak wrote:
> On 05/08/2021 11:40, Jach Feng wrote:
> > I want to distinguish between numbers with/without a dot attached:
> >
> >>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
> >>>> re.compile(r'ch \d{1,}[.]').findall(text)
> > ['ch 1.', 'ch 23.']
> >>>> re.compile(r'ch \d{1,}[^.]').findall(text)
> > ['ch 23', 'ch 4 ', 'ch 56 ']
> >
> > I can guess why the 'ch 23' appears in the second list. But how to get rid of it?
> >
> > --Jach
> >
> import re
> t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
> r = re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M)
>
> res = r.findall(t)
>
> dot = [x[1] for x in res if x[1] != '']
> udot = [x[0] for x in res if x[0] != '']
>
> print(f"dot: {dot}")
> print(f"undot: {udot}")
>
> out:
>
> dot: ['ch 4', 'ch 56']
> undot: ['ch 1.', 'ch 23.']
Can the result be influenced by the order of the re patterns?

>>> import re
>>> t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>> re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M).findall(t)
[('ch 1.', ''), ('ch 23.', ''), ('', 'ch 4'), ('', 'ch 56')]

>>> re.compile(r'(ch +\d+)|(ch +\d+\.)', re.M).findall(t)
[('ch 1', ''), ('ch 23', ''), ('ch 4', ''), ('ch 56', '')]

--Jach

Re: Ask for help on using re

<selj64$4sh$1@gioia.aioe.org>

by: jak - Sat, 7 Aug 2021 09:18 UTC

On 07/08/2021 04:23, Jach Feng wrote:
> jak wrote on Friday, August 6, 2021 at 4:10:05 PM [UTC+8]:
>> On 05/08/2021 11:40, Jach Feng wrote:
>>> I want to distinguish between numbers with/without a dot attached:
>>>
>>>>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>>>>> re.compile(r'ch \d{1,}[.]').findall(text)
>>> ['ch 1.', 'ch 23.']
>>>>>> re.compile(r'ch \d{1,}[^.]').findall(text)
>>> ['ch 23', 'ch 4 ', 'ch 56 ']
>>>
>>> I can guess why the 'ch 23' appears in the second list. But how to get rid of it?
>>>
>>> --Jach
>>>
>> import re
>> t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>> r = re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M)
>>
>> res = r.findall(t)
>>
>> dot = [x[1] for x in res if x[1] != '']
>> udot = [x[0] for x in res if x[0] != '']
>>
>> print(f"dot: {dot}")
>> print(f"undot: {udot}")
>>
>> out:
>>
>> dot: ['ch 4', 'ch 56']
>> undot: ['ch 1.', 'ch 23.']
> Can the result be influenced by the order of the re patterns?
>
>>>> import re
>>>> t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>>> re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M).findall(t)
> [('ch 1.', ''), ('ch 23.', ''), ('', 'ch 4'), ('', 'ch 56')]
>
>>>> re.compile(r'(ch +\d+)|(ch +\d+\.)', re.M).findall(t)
> [('ch 1', ''), ('ch 23', ''), ('ch 4', ''), ('ch 56', '')]
>
> --Jach
>
Yes, when the patterns overlap, as they do in your case: the only
difference between the two patterns is the trailing ".". Alternation
tries the alternatives from left to right and stops at the first one
that matches, so in these cases it is a good idea to put the most
complete (longest) pattern before the others.
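
A minimal sketch of the ordering effect described above, using only the
standard re module. The helper name split_groups and the lookahead
pattern at the end are assumptions added for illustration; neither
appears in the thread.

```python
import re

text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# Alternatives are tried left to right; the first one that matches wins.
longest_first = re.compile(r'(ch +\d+\.)|(ch +\d+)')
shortest_first = re.compile(r'(ch +\d+)|(ch +\d+\.)')

def split_groups(pattern, s):
    """Return (group 1 matches, group 2 matches) for a two-group pattern."""
    pairs = pattern.findall(s)
    return [a for a, _ in pairs if a], [b for _, b in pairs if b]

with_dot, without_dot = split_groups(longest_first, text)
print(with_dot)     # ['ch 1.', 'ch 23.']
print(without_dot)  # ['ch 4', 'ch 56']

# With the shorter alternative first, group 2 never gets a chance to match.
print(split_groups(shortest_first, text))
# (['ch 1', 'ch 23', 'ch 4', 'ch 56'], [])

# Order-independent variant (an assumption, not from the thread): a negative
# lookahead rejects a number that is followed by another digit or by a dot.
no_dot = re.compile(r'ch +\d+(?![\d.])')
print(no_dot.findall(text))  # ['ch 4', 'ch 56']
```

The lookahead also excludes a following digit because otherwise the
engine could backtrack from "23" to "2" and still report a match, which
is how 'ch 23' ended up in the list in the original question.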

Re: Ask for help on using re

<selk1u$g6e$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=14489&group=comp.lang.python#14489

Path: i2pn2.org!i2pn.org!aioe.org!7XDtAKef6tDHBbst9plIbw.user.46.165.242.91.POSTED!not-for-mail
From: nos...@please.ty (jak)
Newsgroups: comp.lang.python
Subject: Re: Ask for help on using re
Date: Sat, 7 Aug 2021 11:33:47 +0200
Organization: Aioe.org NNTP Server
Message-ID: <selk1u$g6e$1@gioia.aioe.org>
References: <904b3a0e-9e0f-401a-8bb0-cf14de5a8a85n@googlegroups.com>
<seiqoa$tg7$1@gioia.aioe.org>
<f15a5776-c609-4b41-8394-2c5d78dc4dd9n@googlegroups.com>
<selj64$4sh$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="16590"; posting-host="7XDtAKef6tDHBbst9plIbw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: it

by: jak - Sat, 7 Aug 2021 09:33 UTC

On 07/08/2021 11:18, jak wrote:
> On 07/08/2021 04:23, Jach Feng wrote:
>> jak wrote on Friday, August 6, 2021 at 4:10:05 PM [UTC+8]:
>>> On 05/08/2021 11:40, Jach Feng wrote:
>>>> I want to distinguish between numbers with/without a dot attached:
>>>>
>>>>>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>>>>>> re.compile(r'ch \d{1,}[.]').findall(text)
>>>> ['ch 1.', 'ch 23.']
>>>>>>> re.compile(r'ch \d{1,}[^.]').findall(text)
>>>> ['ch 23', 'ch 4 ', 'ch 56 ']
>>>>
>>>> I can guess why the 'ch 23' appears in the second list. But how to
>>>> get rid of it?
>>>>
>>>> --Jach
>>>>
>>> import re
>>> t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>> r = re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M)
>>>
>>> res = r.findall(t)
>>>
>>> dot = [x[1] for x in res if x[1] != '']
>>> udot = [x[0] for x in res if x[0] != '']
>>>
>>> print(f"dot: {dot}")
>>> print(f"undot: {udot}")
>>>
>>> out:
>>>
>>> dot: ['ch 4', 'ch 56']
>>> undot: ['ch 1.', 'ch 23.']
>> Can the result be influenced by the order of the re patterns?
>>
>>>>> import re
>>>>> t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>>>> re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M).findall(t)
>> [('ch 1.', ''), ('ch 23.', ''), ('', 'ch 4'), ('', 'ch 56')]
>>
>>>>> re.compile(r'(ch +\d+)|(ch +\d+\.)', re.M).findall(t)
>> [('ch 1', ''), ('ch 23', ''), ('ch 4', ''), ('ch 56', '')]
>>
>> --Jach
>>
> Yes, when the patterns overlap, as they do in your case: the only
> difference between the two patterns is the trailing ".". Alternation
> tries the alternatives from left to right and stops at the first one
> that matches, so in these cases it is a good idea to put the most
> complete (longest) pattern before the others.
>
>
PS
... the behavior of the logical or that I have described is not specific
to regular expressions; it is common to most programming languages.
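
A small sketch of that parallel in Python, assuming the point is
short-circuit evaluation. The helper check and the list calls are
hypothetical names used only to record which operands get evaluated.

```python
import re

calls = []

def check(name, result):
    """Record that this operand was evaluated, then return its result."""
    calls.append(name)
    return result

# Python's `or` stops at the first truthy operand; 'second' is never evaluated.
outcome = check('first', True) or check('second', True)
print(calls)  # ['first']

# Regex alternation behaves the same way at a given position:
# the leftmost alternative that matches is taken, even if a later one is longer.
short_first = re.match(r'ab|abc', 'abc').group()
long_first = re.match(r'abc|ab', 'abc').group()
print(short_first)  # ab
print(long_first)   # abc
```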

If God had a beard, he'd be a UNIX programmer.
