Subject                                              Author
* freebsd arm64 (rpi4) problem with regex?           Mike Scott
+* Re: freebsd arm64 (rpi4) problem with regex?      Lew Pitcher
|`* Re: freebsd arm64 (rpi4) problem with regex?     Mike Scott
| `- Re: freebsd arm64 (rpi4) problem with regex?    Christian Weisgerber
`* Re: freebsd arm64 (rpi4) problem with regex?      Christian Weisgerber
 +* Re: freebsd arm64 (rpi4) problem with regex?     druck
 |+* Re: freebsd arm64 (rpi4) problem with regex?    Lew Pitcher
 ||`* Re: freebsd arm64 (rpi4) problem with regex?   A. Dumas
 || `- Re: freebsd arm64 (rpi4) problem with regex?  Lew Pitcher
 |`- Re: freebsd arm64 (rpi4) problem with regex?    Ahem A Rivet's Shot
 `- Re: freebsd arm64 (rpi4) problem with regex?     Mike Scott

freebsd arm64 (rpi4) problem with regex?

<stgtog$27m$1@dont-email.me>

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: usenet...@scottsonline.org.uk.invalid (Mike Scott)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.sys.raspberry-pi
Subject: freebsd arm64 (rpi4) problem with regex?
Date: Thu, 3 Feb 2022 15:52:47 +0000
Organization: Scott family
Lines: 41
Message-ID: <stgtog$27m$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 3 Feb 2022 15:52:48 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="fdee02df1e6a40b65be0a63da2d46075";
logging-data="2294"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+2w0g4NMaX5J5koqm5V6QdIs0LClWLDUs="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Cancel-Lock: sha1:qcn7MwZbxLYMOJXPkS1dYrMDloI=
Content-Language: en-GB
 by: Mike Scott - Thu, 3 Feb 2022 15:52 UTC

This is on freebsd13.0/arm64/rpi4

A problem arising from milter-regex. This fails to accept known-good
regular expressions, directly taken from a working i386 system.

I believe the problem lies in the regex library, as a test program fails
to compile regular expressions that contain backslashed special characters:

The salient chunk of my test program is
regex_t re;
if( regcomp( &re, argv[1], REG_ICASE ) ) {
printf("bad re\n");

which works on "simple" things:
# ./a '123' 'abc123def'
re <<123>> string <<abc123def>>
matching:- <<123>>

but fails on \s and \t etc:
# ./a '\s' 'abc def'
re <<\s>> string <<abc def>>
bad re

although this also works
# ./a '\\' 'abc\def'
re <<\\>> string <<abc\def>>
matching:- <<\>>

(test program takes the re and a test string as its two args)

I'd not be surprised if this is another char <==> int problem, but the
regex stuff is a tad more complex than spfmilter was.

Can anyone check this out please?

--
Mike Scott
Harlow, England

Re: freebsd arm64 (rpi4) problem with regex?

<sth0fm$69j$1@dont-email.me>

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: lew.pitc...@digitalfreehold.ca (Lew Pitcher)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.sys.raspberry-pi
Subject: Re: freebsd arm64 (rpi4) problem with regex?
Date: Thu, 3 Feb 2022 16:39:18 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 80
Message-ID: <sth0fm$69j$1@dont-email.me>
References: <stgtog$27m$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 3 Feb 2022 16:39:18 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="560024ea8f4a2260a76c8650eec08086";
logging-data="6451"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+0X1H+5Sc+DfHAvvv8eIcZpUjYYyBZE3M="
User-Agent: Pan/0.139 (Sexual Chocolate; GIT bf56508
git://git.gnome.org/pan2)
Cancel-Lock: sha1:B5NVDf2b/Q6x0y0/D+CO1do3GJY=
 by: Lew Pitcher - Thu, 3 Feb 2022 16:39 UTC

On Thu, 03 Feb 2022 15:52:47 +0000, Mike Scott wrote:

> This is on freebsd13.0/arm64/rpi4
>
> A problem arising from milter-regex. This fails to accept known-good
> regular expressions, directly taken from a working i386 system.
>
> I believe the problem lies in the regex library, as a test program fails
> to compile regular expressions that contain backslashed special
> characters:
>
> The salient chunk of my test program is
> regex_t re;
> if( regcomp( &re, argv[1], REG_ICASE ) ) {
> printf("bad re\n");

In general, it would be helpful to know /why/ regcomp(3) disliked a given
regex. Try using regerror(3) [1]. Something like this (caution: code
neither syntax checked nor tested) ...
regex_t re;
int regcomp_rc;

if ((regcomp_rc = regcomp(&re, argv[1], REG_ICASE)) != 0)
{
    char regcomp_error[256]; /* or some other large-enough size */

    /* regerror() wants the regex_t that regcomp() filled in, not the pattern */
    regerror(regcomp_rc, &re, regcomp_error, sizeof regcomp_error);
    printf("bad re: regcomp() = %d (%s)\n", regcomp_rc, regcomp_error);
    /*
    ... other error handling as required
    */
}
could give you a better idea of why regcomp() didn't like a given regex.
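
If a fixed 256-byte buffer feels arbitrary, regerror(3) can also be asked how much room it needs. The following is only a sketch of that idea; the helper name regcomp_diagnose is made up for illustration and is not part of any library.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: compile `pattern` with `cflags`. On failure, return a
 * malloc()ed copy of the regerror(3) message (the caller frees it). Returns
 * NULL when the pattern compiles cleanly, or when malloc() fails.
 */
char *regcomp_diagnose(const char *pattern, int cflags)
{
    regex_t re;
    int rc;
    size_t need;
    char *msg;

    assert(pattern != NULL);

    rc = regcomp(&re, pattern, cflags);
    if (rc == 0) {
        regfree(&re);
        return NULL;
    }

    /* With a zero-sized buffer, regerror() only reports the size it needs. */
    need = regerror(rc, &re, NULL, 0);
    msg = malloc(need);
    if (msg != NULL)
        regerror(rc, &re, msg, need);
    return msg;
}
```

Calling regcomp_diagnose(argv[1], REG_ICASE) and printing the result (when it is not NULL) gives the library's own wording for the failure; the exact message text differs between libc implementations.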

> which works on "simple" things:
> # ./a '123' 'abc123def'
> re <<123>> string <<abc123def>>
> matching:- <<123>>
>
>
> but fails on \s and \t etc:
> # ./a '\s' 'abc def'
> re <<\s>> string <<abc def>>
> bad re

re_format(7) [2] gives a list of handled backslash-escaped sequences,
and '\t' isn't one of the handled sequences. Given that, the fallback
rule should apply: re_format(7) says that an atom may be
"...
a '\' followed by any other character (matching that character taken
as an ordinary character, as if the '\' had not been present)
..."

So, it looks like regcomp() /should/ handle your test case here.

> although this also works
> # ./a '\\' 'abc\def'
> re <<\\>> string <<abc\def>>
> matching:- <<\>>

It looks to me like the regcomp(3) backslash-handling logic may be
rejecting anything that doesn't match its list of handled characters
(although it /should/ handle your '\t' as 't', according to the docs).

> (test program takes the re and a test string as its two args)
>
>
> I'd not be surprised if this is another char <==> int problem, but the
> regex stuff is a tad more complex than spfmilter was.
>
> Can anyone check this out please?

[1] https://www.freebsd.org/cgi/man.cgi?query=regex&sektion=3
[2] https://www.freebsd.org/cgi/man.cgi?query=re_format&sektion=7

HTH
--
Lew Pitcher
"In Skills, We Trust"

Re: freebsd arm64 (rpi4) problem with regex?

<slrnsvo7cb.2b73.naddy@lorvorc.mips.inka.de>

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.szaf.org!inka.de!mips.inka.de!.POSTED.localhost!not-for-mail
From: nad...@mips.inka.de (Christian Weisgerber)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.sys.raspberry-pi
Subject: Re: freebsd arm64 (rpi4) problem with regex?
Date: Thu, 3 Feb 2022 18:23:07 -0000 (UTC)
Message-ID: <slrnsvo7cb.2b73.naddy@lorvorc.mips.inka.de>
References: <stgtog$27m$1@dont-email.me>
Injection-Date: Thu, 3 Feb 2022 18:23:07 -0000 (UTC)
Injection-Info: lorvorc.mips.inka.de; posting-host="localhost:::1";
logging-data="77028"; mail-complaints-to="usenet@mips.inka.de"
User-Agent: slrn/1.0.3 (FreeBSD)
 by: Christian Weisgerber - Thu, 3 Feb 2022 18:23 UTC

On 2022-02-03, Mike Scott <usenet.16@scottsonline.org.uk.invalid> wrote:

> This is on freebsd13.0/arm64/rpi4
>
> A problem arising from milter-regex. This fails to accept known-good
> regular expressions, directly taken from a working i386 system.

I think your arm64 system is at a different revision of FreeBSD
than your i386 one.

> I believe the problem lies in the regex library, as a test program fails
> to compile regular expressions that contain backslashed special characters:
>
> but fails on \s and \t etc:
> # ./a '\s' 'abc def'
> re <<\s>> string <<abc def>>
> bad re

What are "\s" and "\t" supposed to mean? In traditional regular
expressions, they have no meaning. In that case, the '\' used to
be ignored, i.e., they were equivalent to plain "s" and "t".

However, that was changed in this commit...

regex(3): Interpret many escaped ordinary characters as EESCAPE
https://cgit.freebsd.org/src/commit/lib/libc/regex?id=adeebf4cd47c3e85155d92f386bda5e519b75ab2

... so such sequences would now result in an error.

Subsequently, some GNU extensions have been added that give new
meaning to "\s" but not to "\t".
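
A pattern that avoids the undefined escapes altogether should compile on both old and new libc versions. Here is a minimal sketch using the POSIX class [[:space:]]; the helper name is_fwd_subject is invented for illustration, and REG_EXTENDED is an assumption (in a BRE the '?' would be a literal character).

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `subject` is a bare "Fwd" line with
 * optional surrounding whitespace, 0 if not, -1 if the pattern fails to
 * compile.
 */
int is_fwd_subject(const char *subject)
{
    /* [[:space:]] is a standard POSIX class, so no backslash escape is needed. */
    static const char pattern[] = "^[[:space:]]*Fwd.?[[:space:]]*$";
    regex_t re;
    int rc;

    assert(subject != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Because the pattern uses only constructs that POSIX defines, its meaning does not depend on how a given libc treats escaped ordinary characters.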

--
Christian "naddy" Weisgerber naddy@mips.inka.de

Re: freebsd arm64 (rpi4) problem with regex?

<stjbrh$tk5$1@dont-email.me>

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!news.freedyn.de!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: usenet...@scottsonline.org.uk.invalid (Mike Scott)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.sys.raspberry-pi
Subject: Re: freebsd arm64 (rpi4) problem with regex?
Date: Fri, 4 Feb 2022 14:05:36 +0000
Organization: Scott family
Lines: 108
Message-ID: <stjbrh$tk5$1@dont-email.me>
References: <stgtog$27m$1@dont-email.me> <sth0fm$69j$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 4 Feb 2022 14:05:37 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="cb00fa646a898e093f323b69d633726d";
logging-data="30341"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19+/walH08PqNERov7RvEBrmeJMWvyL5A0="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Cancel-Lock: sha1:QvOkP7srXIBKi/yXc6dAACwchnc=
In-Reply-To: <sth0fm$69j$1@dont-email.me>
Content-Language: en-GB
 by: Mike Scott - Fri, 4 Feb 2022 14:05 UTC

On 03/02/2022 16:39, Lew Pitcher wrote:
> On Thu, 03 Feb 2022 15:52:47 +0000, Mike Scott wrote:
>
>> This is on freebsd13.0/arm64/rpi4
>>
>> A problem arising from milter-regex. This fails to accept known-good
>> regular expressions, directly taken from a working i386 system.
>>
>> I believe the problem lies in the regex library, as a test program fails
>> to compile regular expressions that contain backslashed special
>> characters:
>>
>> The salient chunk of my test program is
>> regex_t re;
>> if( regcomp( &re, argv[1], REG_ICASE ) ) {
>> printf("bad re\n");
......

Thanks for the responses.

Firstly, I have to confess to some history: way, way back, I modified
milter-regex to use pcre rather than libc's regex routines. That's
probably why my patterns still have \s strings and the like: these are
valid in pcre as \s -> whitespace and \t -> tab etc.
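
For patterns that have to be carried over from pcre, one option is to rewrite the common escapes before handing the string to regcomp(3). What follows is a rough sketch only: the function name pcre_escapes_to_posix is hypothetical, it covers just \s, \d and \t, and it assumes those escapes never appear inside a bracket expression (where the substitutions would be wrong).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite a few PCRE-style escapes into POSIX
 * equivalents. Returns 0 on success, -1 if `out` is too small.
 * Assumption: the escapes do not occur inside a bracket expression.
 */
int pcre_escapes_to_posix(const char *in, char *out, size_t outsz)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);

    while (*in != '\0') {
        const char *rep;
        char buf[3] = { 0 };
        size_t len;

        if (in[0] == '\\' && in[1] != '\0') {
            switch (in[1]) {
            case 's': rep = "[[:space:]]"; break;
            case 'd': rep = "[[:digit:]]"; break;
            case 't': rep = "\t"; break;   /* a literal tab character */
            default:                       /* leave every other escape untouched */
                buf[0] = in[0];
                buf[1] = in[1];
                rep = buf;
                break;
            }
            in += 2;
        } else {
            buf[0] = *in++;
            rep = buf;
        }

        len = strlen(rep);
        if (used + len + 1 > outsz)        /* keep room for the terminator */
            return -1;
        memcpy(out + used, rep, len);
        used += len;
    }

    if (used + 1 > outsz)
        return -1;
    out[used] = '\0';
    return 0;
}
```

Fed the pattern from the milter config, ^\s*Fwd.?\s*$, this produces ^[[:space:]]*Fwd.?[[:space:]]*$, which uses only POSIX-defined constructs.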

That said, the "proper" package code from freebsd was reinstated several
years ago, and both systems (i386 and arm64) are running the same
packaged version 2.7.2. (It means my re's certainly haven't worked for a
while, but that's a separate issue: ooops!) I have the exact same
milter-regex config file on both machines.

On the arm64 box (fbsd 13.0), I get logged:
parse_ruleset: /usr/local/etc/milter-regex.conf:196: regcomp:
^\s*Fwd.?\s*$: trailing backslash (\)

As has been pointed out, \s may not mean what I wanted it to: but is
nevertheless valid, and that re should be accepted as equivalent to
^s*Fwd.?s*$
(man re_format is unambiguous on this)

On the i386 (running fbsd 11.4), the file compiles happily in full.
Hence my supposition about errors in the regex library.

The error returned in my test code on the arm64 from regcomp() is 5
(REG_EESCAPE). On the i386 I get

% ./a '\b' '123abc4 56'
re <<\b>> string <<123abc4 56>>
matching:- <<b>>

and on the arm64:
root@kirk:/usr/plumtree/config/milter-regex # ./a '\b' '123abc4 56'
re <<\b>> string <<123abc4 56>>
bad re 5
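
A bare number like 5 is easier to read when mapped back to its symbolic name. POSIX fixes the names of the regcomp(3) error codes but not their numeric values, so a small lookup along these lines may help; regcomp_code_name is a made-up helper, not a library function.

```c
#include <regex.h>

/* Hypothetical helper: symbolic name for a regcomp()/regexec() return code. */
const char *regcomp_code_name(int code)
{
    switch (code) {
    case 0:            return "success";
    case REG_NOMATCH:  return "REG_NOMATCH";
    case REG_BADPAT:   return "REG_BADPAT";
    case REG_ECOLLATE: return "REG_ECOLLATE";
    case REG_ECTYPE:   return "REG_ECTYPE";
    case REG_EESCAPE:  return "REG_EESCAPE";
    case REG_ESUBREG:  return "REG_ESUBREG";
    case REG_EBRACK:   return "REG_EBRACK";
    case REG_EPAREN:   return "REG_EPAREN";
    case REG_EBRACE:   return "REG_EBRACE";
    case REG_BADBR:    return "REG_BADBR";
    case REG_ERANGE:   return "REG_ERANGE";
    case REG_ESPACE:   return "REG_ESPACE";
    case REG_BADRPT:   return "REG_BADRPT";
    default:           return "unknown";
    }
}
```

Printing regcomp_code_name(errc) next to the number avoids having to look the value up in regex.h on each platform.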

Hmmm. I'm wondering about char's and int's. It's been a long, long while
since I looked into the depths of Henry Spencer's original code (that on
a Sun): I have a vague recollection of liberties being taken with them
but IMWBW.

FWIW the test code, hacked from elsewhere, is

#include <stdio.h>
#include <regex.h>
#include <stdlib.h>

#define MAXMATCH 100

int main(int argc, char *argv[]) {
    regex_t re;
    regmatch_t matches[MAXMATCH];

    if( argc != 3 ) exit(1);

    printf("re <<%s>> string <<%s>>\n", argv[1], argv[2]);

    /* if( regcomp( &re, argv[1], REG_EXTENDED | REG_ICASE ) ) { */
    int errc = regcomp( &re, argv[1], REG_ICASE );
    if( errc ) {
        printf("bad re %d\n", errc);
        exit(1);
    }

    int err = regexec( &re, argv[2], MAXMATCH, matches, 0);
    if( err ) {
        printf("match failed %d\n", err);
        regfree( &re );
        exit(1);
    }

    /* print the text of the overall match, matches[0] */
    printf("matching:- <<");
    int p;
    for( p = matches[0].rm_so; p < matches[0].rm_eo; ++p )
        printf("%c", argv[2][p]);
    printf(">>\n");

    regfree( &re );
    return 0;
}
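
The character-by-character print loop can also be written with a "%.*s" conversion, since rm_so and rm_eo are just byte offsets into the subject string. This is a sketch of that variant as a reusable function; the name first_match and its return convention are invented for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper: copy the first match of `pattern` (compiled as a
 * case-insensitive BRE, as in the test program) into `out`.
 * Returns 0 on a match, 1 on no match, -1 on a bad pattern.
 */
int first_match(const char *pattern, const char *subject,
                char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    assert(pattern != NULL && subject != NULL && out != NULL && outsz > 0);

    if (regcomp(&re, pattern, REG_ICASE) != 0)
        return -1;
    rc = regexec(&re, subject, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return 1;

    /* "%.*s" prints exactly the slice between rm_so and rm_eo. */
    snprintf(out, outsz, "%.*s",
             (int)(m[0].rm_eo - m[0].rm_so), subject + m[0].rm_so);
    return 0;
}
```

With the arguments from the first test run, first_match("123", "abc123def", buf, sizeof buf) leaves "123" in buf.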

--
Mike Scott
Harlow, England

Re: freebsd arm64 (rpi4) problem with regex?

<slrnsvqghf.1rq.naddy@lorvorc.mips.inka.de>

Path: i2pn2.org!i2pn.org!news.niel.me!aioe.org!news.freedyn.de!weretis.net!feeder8.news.weretis.net!news.szaf.org!inka.de!mips.inka.de!.POSTED.localhost!not-for-mail
From: nad...@mips.inka.de (Christian Weisgerber)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.sys.raspberry-pi
Subject: Re: freebsd arm64 (rpi4) problem with regex?
Date: Fri, 4 Feb 2022 15:11:43 -0000 (UTC)
Message-ID: <slrnsvqghf.1rq.naddy@lorvorc.mips.inka.de>
References: <stgtog$27m$1@dont-email.me> <sth0fm$69j$1@dont-email.me>
<stjbrh$tk5$1@dont-email.me>
Injection-Date: Fri, 4 Feb 2022 15:11:43 -0000 (UTC)
Injection-Info: lorvorc.mips.inka.de; posting-host="localhost:::1";
logging-data="1915"; mail-complaints-to="usenet@mips.inka.de"
User-Agent: slrn/1.0.3 (FreeBSD)
 by: Christian Weisgerber - Fri, 4 Feb 2022 15:11 UTC

On 2022-02-04, Mike Scott <usenet.16@scottsonline.org.uk.invalid> wrote:

> On the arm64 box (fbsd 13.0), I get logged:
> parse_ruleset: /usr/local/etc/milter-regex.conf:196: regcomp:
> ^\s*Fwd.?\s*$: trailing backslash (\)

> On the i386 (running fbsd 11.4), the file compiles happily in full.
> Hence my supposition about errors in the regex library.

Again: This is an intentional change in behavior that was at some
point introduced in FreeBSD's libc regex code.

Specifically this commit, which is in 13.x but not in 11.x:
https://cgit.freebsd.org/src/commit/lib/libc/regex?id=adeebf4cd47c3e85155d92f386bda5e519b75ab2

Here's the full commit message:

------------------->
regex(3): Interpret many escaped ordinary characters as EESCAPE

In IEEE 1003.1-2008 [1] and earlier revisions, BRE/ERE grammar allows for
any character to be escaped, but "ORD_CHAR preceded by an unescaped
<backslash> character [gives undefined results]".

Historically, we've interpreted an escaped ordinary character as the
ordinary character itself. This becomes problematic when some extensions
give special meanings to an otherwise ordinary character
(e.g. GNU's \b, \s, \w), meaning we may have two different valid
interpretations of the same sequence.

To make this easier to deal with and given that the standard calls this
undefined, we should throw an error (EESCAPE) if we run into this scenario
to ease transition into a state where some escaped ordinaries are blessed
with a special meaning -- it will either error out or have extended
behavior, rather than have two entirely different versions of undefined
behavior that leave the consumer of regex(3) guessing as to what behavior
will be used or leaving them with false impressions.

This change bumps the symbol version of regcomp to FBSD_1.6 and provides the
old escape semantics for legacy applications, just in case one has an older
application that would immediately turn into a pumpkin because of an
extraneous escape that's embedded or otherwise critical to its
operation.

This is the final piece needed before enhancing libregex with GNU extensions
and flipping the switch on bsdgrep.

[1] http://pubs.opengroup.org/onlinepubs/9699919799.2016edition/

PR: 229925 (exp-run, courtesy of antoine)
Differential Revision: https://reviews.freebsd.org/D10510
<-------------------

--
Christian "naddy" Weisgerber naddy@mips.inka.de

Re: freebsd arm64 (rpi4) problem with regex?

<stk3rt$b9n$2@dont-email.me>

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: new...@druck.org.uk (druck)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.sys.raspberry-pi
Subject: Re: freebsd arm64 (rpi4) problem with regex?
Date: Fri, 4 Feb 2022 20:55:23 +0000
Organization: A noiseless patient Spider
Lines: 9
Message-ID: <stk3rt$b9n$2@dont-email.me>
References: <stgtog$27m$1@dont-email.me>
<slrnsvo7cb.2b73.naddy@lorvorc.mips.inka.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 4 Feb 2022 20:55:28 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="ad66360aa23bb7c6212227adfdd70aec";
logging-data="11575"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+zYKrW0TYRZcJjQSPjLdxY"
User-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Cancel-Lock: sha1:2665nWTFIVETahF5lfwpG4WNA6c=
In-Reply-To: <slrnsvo7cb.2b73.naddy@lorvorc.mips.inka.de>
X-Antivirus-Status: Clean
Content-Language: en-GB
X-Antivirus: Avast (VPS 220204-4, 4/2/2022), Outbound message
 by: druck - Fri, 4 Feb 2022 20:55 UTC

On 03/02/2022 18:23, Christian Weisgerber wrote:
> What are "\s" and "\t" supposed to mean? In traditional regular
> expressions, they have no meaning.

I'm not sure how many decades ago you are claiming for traditional
regex, but \s and \t have been any white space and tab for as long as I
can remember.

---druck

Re: freebsd arm64 (rpi4) problem with regex?

<stk61a$3gd$1@dont-email.me>

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: lew.pitc...@digitalfreehold.ca (Lew Pitcher)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.sys.raspberry-pi
Subject: Re: freebsd arm64 (rpi4) problem with regex?
Date: Fri, 4 Feb 2022 21:32:26 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 53
Message-ID: <stk61a$3gd$1@dont-email.me>
References: <stgtog$27m$1@dont-email.me>
<slrnsvo7cb.2b73.naddy@lorvorc.mips.inka.de> <stk3rt$b9n$2@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 4 Feb 2022 21:32:26 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c09004439912f8b45c0d412c617490b3";
logging-data="3597"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ESzj5nPWcReWm0ozXzf5+lr2pIVWVuYo="
User-Agent: Pan/0.139 (Sexual Chocolate; GIT bf56508
git://git.gnome.org/pan2)
Cancel-Lock: sha1:3m91S+xxleqaDCTkXOtoPw1d4Ng=
 by: Lew Pitcher - Fri, 4 Feb 2022 21:32 UTC

On Fri, 04 Feb 2022 20:55:23 +0000, druck wrote:

> On 03/02/2022 18:23, Christian Weisgerber wrote:
>> What are "\s" and "\t" supposed to mean? In traditional regular
>> expressions, they have no meaning.
>
> I'm not sure how many decades ago you are claiming for traditional
> regex, but \s and \t have been any white space and tab for as long as I
> can remember.

Yah.... no.

In POSIX regular expressions, neither \s nor \t have any documented
"special" meaning; for BREs,
"The interpretation of an ordinary character preceded by a backslash
( '\' ) is undefined, except for:
* The characters ')', '(', '{', and '}'
* The digits 1 to 9 inclusive (see BREs Matching Multiple Characters)
* A character inside a bracket expression"
and for EREs,
"An ordinary character is any character in the supported character set,
except for the ERE special characters listed in ERE Special
Characters. The interpretation of an ordinary character preceded by a
backslash ( '\' ) is undefined."
where, ERE Special Characters consists of a handful of punctuation
characters, and no alphabetics [1].

A common implementation of the POSIX regular expression parser defines a
regular expression atom, in part, as
"..., a '\' followed by one of the characters "^.[$()|*+?{\" (matching
that character taken as an ordinary character), a '\' followed by any
other character (matching that character taken as an ordinary
character, as if the '\' had not been present), ..." [2]

In neither case do either \s or \t have any "special" meaning.
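
The escapes that POSIX does define behave differently in the two grammars, which is easy to check with a tiny harness. This is a sketch only; re_matches is a hypothetical helper name, and the comments describe what POSIX specifies rather than any particular libc.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: 1 if `pattern` matches anywhere in `subject`,
 * 0 if it does not, -1 if the pattern fails to compile.
 * Pass 0 in `cflags` for a BRE or REG_EXTENDED for an ERE.
 */
int re_matches(const char *pattern, int cflags, const char *subject)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL);

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For instance, re_matches("a\\{2\\}", 0, "caab") and re_matches("a{2}", REG_EXTENDED, "caab") both ask for two consecutive 'a' characters, while the unescaped "a{2}" compiled as a BRE looks for the four literal characters a{2}.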

OTOH, Perl-compatible regular expressions recognize \s and \t as having
special meanings, with \s meaning "any white space character", and \t
meaning "tab (hex 09)".

It is worth noting that the OP was asking about POSIX regular
expressions, as handled by the POSIX regcomp(3) interface, and /not/ pcre
regular expressions.

HTH

[1] https://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap09.html

[2] https://man7.org/linux/man-pages/man7/regex.7.html
--
Lew Pitcher
"In Skills, We Trust"

Re: freebsd arm64 (rpi4) problem with regex?

<20220204215316.54b1d782d15e2ba46017482f@eircom.net>

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ste...@eircom.net (Ahem A Rivet's Shot)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.sys.raspberry-pi
Subject: Re: freebsd arm64 (rpi4) problem with regex?
Date: Fri, 4 Feb 2022 21:53:16 +0000
Organization: A noiseless patient Spider
Lines: 16
Message-ID: <20220204215316.54b1d782d15e2ba46017482f@eircom.net>
References: <stgtog$27m$1@dont-email.me>
<slrnsvo7cb.2b73.naddy@lorvorc.mips.inka.de>
<stk3rt$b9n$2@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: reader02.eternal-september.org; posting-host="d75739521146797333d35310e328cf02";
logging-data="7782"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19UorydkA4aX12YETsW9az/D6/FxZMHxio="
Cancel-Lock: sha1:GcQ2L38+tHB11kmzfVfEk67dzCQ=
X-Newsreader: Sylpheed 3.7.0 (GTK+ 2.24.33; amd64-portbld-freebsd13.0)
X-Clacks-Overhead: "GNU Terry Pratchett"
 by: Ahem A Rivet's - Fri, 4 Feb 2022 21:53 UTC

On Fri, 4 Feb 2022 20:55:23 +0000
druck <news@druck.org.uk> wrote:

> On 03/02/2022 18:23, Christian Weisgerber wrote:
> > What are "\s" and "\t" supposed to mean? In traditional regular
> > expressions, they have no meaning.
>
> I'm not sure how many decades ago you are claiming for traditional
> regex, but \s and \t have been any white space and tab for as long as I
> can remember.

They are in many places (including pcre) but not re_format(7).

--
Steve O'Hara-Smith
Odds and Ends at http://www.sohara.org/

Re: freebsd arm64 (rpi4) problem with regex?

<stk86s$ab8$1@dont-email.me>

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: alexan...@dumas.fr.invalid (A. Dumas)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.sys.raspberry-pi
Subject: Re: freebsd arm64 (rpi4) problem with regex?
Date: Fri, 4 Feb 2022 23:09:32 +0100
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <stk86s$ab8$1@dont-email.me>
References: <stgtog$27m$1@dont-email.me>
<slrnsvo7cb.2b73.naddy@lorvorc.mips.inka.de> <stk3rt$b9n$2@dont-email.me>
<stk61a$3gd$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 4 Feb 2022 22:09:32 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="bc6672c37381b902b0ef5c980c0d661e";
logging-data="10600"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+QII4UHxthr53qhNHbeTscUXcSWydHDmI="
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0)
Gecko/20100101 Thunderbird/91.5.0
Cancel-Lock: sha1:JhUhL+WTvmPFmXb2QW8JsVpoimw=
In-Reply-To: <stk61a$3gd$1@dont-email.me>
Content-Language: nl
 by: A. Dumas - Fri, 4 Feb 2022 22:09 UTC

On 04-02-2022 22:32, Lew Pitcher wrote:
> OTOH, Perl-compatible regular expressions recognize \s and \t as having
> special meanings,

Not only pcre, also enhanced or extended. I don't have FreeBSD here but
this is from the MacOS man page which is based on BSD: (conclusion below
that)

-----
ENHANCED FEATURES
When the REG_ENHANCED flag is passed to one of the regcomp() variants,
additional features are activated. Like the enhanced regex
implementations in scripting languages such as perl(1) and python(1),
these additional features may conflict with the IEEE Std 1003.2
(``POSIX.2'') standards in some ways. Use this with care in situations
which require portability (including to past versions of the Mac OS X
using the previous regex implementation).

For enhanced basic REs, `+', `?' and `|' remain regular characters, but
`\+', `\?' and `\|' have the same special meaning as the unescaped
characters do for extended REs, i.e., one or more matches, zero or one
matches and alteration, respectively. For enhanced extended REs, back
references are available. Additional enhanced features are listed below.

Within a bracket expression, most characters lose their magic. This
also applies to the additional enhanced features, which don't operate
inside a bracket expression.

Assertions (available for both enhanced basic and enhanced extended REs)
In addition to `^' and `$' (the assertions that match the null string at
the beginning and end of line, respectively), the following assertions
become available:

[...]

Shortcuts (available for both enhanced basic and enhanced extended REs)
The following shortcuts can be used to replace more complicated bracket
expressions.

[...]
\s Matches a space character. This is equivalent to `[[:space:]]'.
[...]

Literal Sequences (available for both enhanced basic and enhanced
extended REs)
Literals are normally just ordinary characters that are matched
directly. Under enhanced mode, certain character sequences are
converted to specific literals.

[...]
\t The ``horizontal-tab'' character (ASCII code 9).
[...]
-----

So in practice it turns out that, using the built-in BSD-based grep on
MacOS without any flags, it does support both \s and \t.
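
Since REG_ENHANCED only exists on some platforms, code that wants it generally has to guard on the macro, and whether \s then means "whitespace" can be probed at run time. The sketch below shows one way to do both; the function names are invented for illustration, and the probe makes no claim about what any specific libc will answer.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: add REG_ENHANCED where the platform defines it. */
int with_enhanced_if_available(int cflags)
{
#ifdef REG_ENHANCED
    return cflags | REG_ENHANCED;
#else
    return cflags;
#endif
}

/*
 * Hypothetical probe: 1 if this libc compiles "\s" and treats it as
 * whitespace (matches " " but not "s"), otherwise 0.
 */
int backslash_s_is_whitespace(void)
{
    regex_t re;
    int ok;

    if (regcomp(&re, "^\\s$",
                with_enhanced_if_available(REG_EXTENDED | REG_NOSUB)) != 0)
        return 0;   /* rejected outright, e.g. with REG_EESCAPE */
    ok = regexec(&re, " ", 0, NULL, 0) == 0 &&
         regexec(&re, "s", 0, NULL, 0) != 0;
    regfree(&re);
    return ok;
}
```

A program could call backslash_s_is_whitespace() once at startup and fall back to [[:space:]] when it returns 0.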

Re: freebsd arm64 (rpi4) problem with regex?

<stkbj1$3gd$2@dont-email.me>

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: lew.pitc...@digitalfreehold.ca (Lew Pitcher)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.sys.raspberry-pi
Subject: Re: freebsd arm64 (rpi4) problem with regex?
Date: Fri, 4 Feb 2022 23:07:13 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 40
Message-ID: <stkbj1$3gd$2@dont-email.me>
References: <stgtog$27m$1@dont-email.me>
<slrnsvo7cb.2b73.naddy@lorvorc.mips.inka.de> <stk3rt$b9n$2@dont-email.me>
<stk61a$3gd$1@dont-email.me> <stk86s$ab8$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 4 Feb 2022 23:07:13 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c5fdc5b9984ac18d7e5d21e3a4f852bf";
logging-data="3597"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19i4SZepb4MiZXKYueCwDfwJ/IkGDu0+sk="
User-Agent: Pan/0.139 (Sexual Chocolate; GIT bf56508
git://git.gnome.org/pan2)
Cancel-Lock: sha1:V7WD4Uvi74tfedtoWe0gLZjLcoo=
 by: Lew Pitcher - Fri, 4 Feb 2022 23:07 UTC

On Fri, 04 Feb 2022 23:09:32 +0100, A. Dumas wrote:

> On 04-02-2022 22:32, Lew Pitcher wrote:
>> OTOH, Perl-compatible regular expressions recognize \s and \t as having
>> special meanings,
>
> Not only pcre, also enhanced or extended. I don't have FreeBSD here but
> this is from the MacOS man page which is based on BSD: (conclusion below
> that)
>
> -----
> ENHANCED FEATURES
> When the REG_ENHANCED flag is passed to one of the regcomp()
> variants, additional features are activated.
[snip]
> So in practice it turns out that, using the built-in BSD-based grep on
> MacOS without any flags, it does support both \s and \t.

From the OP
OP> The salient chunk of my test program is
OP> regex_t re;
OP> if( regcomp( &re, argv[1], REG_ICASE ) ) {
OP> printf("bad re\n");

If I correctly understand the documentation you posted, to get the BSD
regex "Enhanced features" that include enhanced escape parsing, the OP
would have had to specify the REG_ENHANCED flag to regcomp().

I note that the OP /did not/ include this flag, nor has this flag
been discussed elsethread. I also note that the OP /did not/ include the
REG_EXTENDED flag, so his regcomp() will interpret the regex as a BRE.

Still, it is up to the implementation as to how it will handle those
escaped ordinary characters that the POSIX standard leaves undefined
(which, per the text quoted upthread, is the case for both BREs and
EREs).

--
Lew Pitcher
"In Skills, We Trust"

Re: freebsd arm64 (rpi4) problem with regex?

<su0mct$k5o$1@dont-email.me>

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: usenet...@scottsonline.org.uk.invalid (Mike Scott)
Newsgroups: comp.unix.bsd.freebsd.misc,comp.sys.raspberry-pi
Subject: Re: freebsd arm64 (rpi4) problem with regex?
Date: Wed, 9 Feb 2022 15:25:16 +0000
Organization: Scott family
Lines: 50
Message-ID: <su0mct$k5o$1@dont-email.me>
References: <stgtog$27m$1@dont-email.me>
<slrnsvo7cb.2b73.naddy@lorvorc.mips.inka.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 9 Feb 2022 15:25:17 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="9cc950fcf610dc2d669e63dfa86f135a";
logging-data="20664"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/nwJpWrrP3djdt7UWa6HgrFKtzrzclLZs="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Cancel-Lock: sha1:y3VEP/sgR4Wyw1L8WOtRLi5ZAC0=
In-Reply-To: <slrnsvo7cb.2b73.naddy@lorvorc.mips.inka.de>
Content-Language: en-GB
 by: Mike Scott - Wed, 9 Feb 2022 15:25 UTC

On 03/02/2022 18:23, Christian Weisgerber wrote:
> On 2022-02-03, Mike Scott <usenet.16@scottsonline.org.uk.invalid> wrote:
>
>> This is on freebsd13.0/arm64/rpi4
>>
>> A problem arising from milter-regex. This fails to accept known-good
>> regular expressions, directly taken from a working i386 system.
>
> I think your arm64 system is at a different revision of FreeBSD
> than your i386 one.
>
>> I believe the problem lies in the regex library, as a test program fails
>> to compile regular expressions that contain backslashed special characters:
>>
>> but fails on \s and \t etc:
>> # ./a '\s' 'abc def'
>> re <<\s>> string <<abc def>>
>> bad re
>
> What are "\s" and "\t" supposed to mean? In traditional regular
> expressions, they have no meaning. In that case, the '\' used to
> be ignored, i.e., they were equivalent to plain "s" and "t".
>
> However, that was changed in this commit...
>
> regex(3): Interpret many escaped ordinary characters as EESCAPE
> https://cgit.freebsd.org/src/commit/lib/libc/regex?id=adeebf4cd47c3e85155d92f386bda5e519b75ab2
>
> ... so such sequences would now result in an error.
>
> Subsequently, some GNU extensions have been added that give new
> meaning to "\s" but not to "\t".
>

Thanks to everyone for comments.

I'll note my own error in accidentally trying to use pcre-style
expressions in regex; I also take on board the changes being made: but
just perhaps, the man pages should also be kept in step with the code --
and incompatible changes like that noted above perhaps merit a warning
in a large flashing fluorescent font for some years.

Meanwhile I've removed the offending RE's from the milter's config file,
and it runs happily. Whether it runs /correctly/ remains to be seen.

Thanks again.

--
Mike Scott
Harlow, England
