
Subject (Author)
* Wrapper for glob() that implements /**/ sub-pattern. (Kaz Kylheku)
`* Wrapper for glob() that implements /**/ sub-pattern. (Kaz Kylheku)
 `* Wrapper for glob() that implements /**/ sub-pattern. (Kaz Kylheku)
  `* Wrapper for glob() that implements /**/ sub-pattern. (William Ahern)
   +* Wrapper for glob() that implements /**/ sub-pattern. (William Ahern)
   |`* Wrapper for glob() that implements /**/ sub-pattern. (Nuno Silva)
   | +- Wrapper for glob() that implements /**/ sub-pattern. (William Ahern)
   | `- Wrapper for glob() that implements /**/ sub-pattern. (Adam Sampson)
   `* Wrapper for glob() that implements /**/ sub-pattern. (Kaz Kylheku)
    `- Wrapper for glob() that implements /**/ sub-pattern. (Kaz Kylheku)

Wrapper for glob() that implements /**/ sub-pattern.

<20230911202309.171@kylheku.com>

https://www.novabbs.com/devel/article-flat.php?id=10649&group=comp.unix.programmer#10649
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 864-117-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.unix.programmer
Subject: Wrapper for glob() that implements /**/ sub-pattern.
Date: Tue, 12 Sep 2023 03:49:16 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 122
Message-ID: <20230911202309.171@kylheku.com>
Injection-Date: Tue, 12 Sep 2023 03:49:16 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c4d2f9d891c0a83c444defc9ec9b8b50";
logging-data="1492993"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/EqeSnAbQOAd+hTStmoNkli1gRRQLRyb0="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:NzYTfDJp92ULW0R7z6dXPCEFVZU=
 by: Kaz Kylheku - Tue, 12 Sep 2023 03:49 UTC

Hi all,

I'm experimenting with a wrapper function that is a drop-in replacement
for the POSIX glob, but gives it the /**/ superpower.

The /**/ pattern matches zero or more path components.
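To make the intended semantics concrete, here is a small reference matcher that checks one path string against a pattern directly, component by component, using POSIX fnmatch(). It is not the prototype's approach (the prototype rewrites the pattern and calls glob), and the names xstar_match and comp_match are hypothetical. I have assumed that, like *, the double star never matches a component beginning with a dot, and backslash escapes are ignored.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Match one pattern component against one path component (neither
   contains a slash). FNM_PERIOD makes a leading dot need an explicit
   match, as in glob(). Components of 256 bytes or more are rejected. */
static bool comp_match(const char *pat, size_t plen,
                       const char *name, size_t nlen)
{
    char p[256], n[256];

    if (plen >= sizeof p || nlen >= sizeof n)
        return false;
    memcpy(p, pat, plen);
    p[plen] = '\0';
    memcpy(n, name, nlen);
    n[nlen] = '\0';
    return fnmatch(p, n, FNM_PERIOD) == 0;
}

/* Hypothetical reference matcher: true if path matches pat, where a
   pattern component that is exactly two stars stands for zero or more
   whole path components that do not begin with a dot. */
static bool xstar_match(const char *pat, const char *path)
{
    const char *pslash = strchr(pat, '/');
    size_t plen = pslash ? (size_t) (pslash - pat) : strlen(pat);

    if (plen == 2 && pat[0] == '*' && pat[1] == '*') {
        const char *rest = pslash ? pslash + 1 : NULL;

        for (;;) {
            /* Try the rest of the pattern here; each pass skips one
               more path component. */
            if (rest != NULL && xstar_match(rest, path))
                return true;
            if (path[0] == '.')          /* never skip a hidden component */
                return false;
            const char *s = strchr(path, '/');
            if (s == NULL)               /* path used up */
                return rest == NULL;     /* a trailing double star matches */
            path = s + 1;
        }
    }

    const char *sslash = strchr(path, '/');
    size_t slen = sslash ? (size_t) (sslash - path) : strlen(path);

    if (!comp_match(pat, plen, path, slen))
        return false;
    if (pslash == NULL || sslash == NULL)  /* both must end together */
        return pslash == NULL && sslash == NULL;
    return xstar_match(pslash + 1, sslash + 1);
}
```

A matcher like this could serve as an oracle when testing a glob-based expansion: every path the wrapper returns should satisfy xstar_match against the original pattern.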

I have a prototype here which works like this.

If the /**/ sub-pattern does not occur in the pattern, then
it just calls glob, passing it all its parameters.

If the /**/ sub-pattern occurs in the pattern, then
it iterates on it, successively replacing it with
/, /*/, /*/*/, /*/*/*/, ... and calling itself recursively.

After the first recursive call, it adds GLOB_APPEND
to the flags.

There are issues to do with termination (when do we stop?)
and performance.
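One possible stopping rule, sketched here as an assumption rather than anything the prototype does: when the double star is preceded by a plain directory prefix (no wildcards of its own), keep adding star components only while the directory pattern at that depth still matches something. star_depth_bound is a hypothetical helper. It relies on a trailing slash restricting glob to directories; if an implementation also returned non-directories, the bound would only become one level looser, not wrong. Because * skips names beginning with a dot, just as the expanded patterns do, the bound is consistent with what the expansion could ever match.

```c
#include <glob.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the deepest level n (at most cap) at which
   prefix followed by n repetitions of a star and a slash still matches
   something. A level with no matches rules out every deeper level.
   prefix must be a literal directory path ending in a slash. */
static size_t star_depth_bound(const char *prefix, size_t cap)
{
    size_t plen = strlen(prefix);

    for (size_t n = 1; n <= cap; n++) {
        char *pat = malloc(plen + 2 * n + 1);
        glob_t g = { 0 };
        int res;

        if (pat == NULL)
            return cap;                 /* out of memory: fall back to the cap */
        memcpy(pat, prefix, plen);
        for (size_t j = 0; j < n; j++) {
            pat[plen + 2 * j] = '*';
            pat[plen + 2 * j + 1] = '/';
        }
        pat[plen + 2 * n] = '\0';

        res = glob(pat, 0, NULL, &g);
        free(pat);
        globfree(&g);
        if (res != 0)                   /* GLOB_NOMATCH (or an error) */
            return n - 1;
    }
    return cap;
}
```

For a pattern like dir/**/x, replacing the double star with more stars than star_depth_bound("dir/", 48) returns cannot produce a match, provided the returned value is below the cap.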

In the prototype, the recursion generates a maximum of 48 /*/ star
wildcards across the entire path, and each /**/ pattern can individually
expand into no more than 10 alternatives (zero through nine stars).

Multiple occurrences of /**/ badly drag down the performance of the
prototype. Up to three is what I would call practical.
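The cost of several occurrences can be estimated by modelling the recursion. The sketch below is a hypothetical helper that only counts the glob() calls the prototype would make, using its limits of 48 stars in total and 10 alternatives per occurrence; it performs no globbing itself.

```c
#include <stddef.h>

/* Hypothetical model of the prototype's recursion: count the glob()
   calls made for a pattern containing k double-star components when
   star_limit stars remain in the budget (48 at the top level). Each
   occurrence tries at most 10 alternatives, zero through nine stars. */
static unsigned long glob_calls(unsigned k, size_t star_limit)
{
    size_t limit = star_limit > 10 ? 10 : star_limit;
    unsigned long total = 0;

    if (k == 0)
        return 1;                 /* nothing left to expand: one glob() call */
    for (size_t i = 0; i < limit; i++)
        total += glob_calls(k - 1, star_limit - i);
    return total;
}
```

Each extra /**/ multiplies the number of glob() calls by ten (10, 100, 1000, ...), and every call walks the directory tree again; the 48-star budget only starts to cut this down at the sixth occurrence.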

The real function should handle patterns starting with "**/" and also
ending in "/**", as well as when "**" is the entire pattern.

Plus there are issues of sorting. We might want to collect results with
GLOB_NOSORT and sort the paths ourselves.
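A sketch of the sort-it-ourselves approach: after appending one pattern's results with GLOB_NOSORT | GLOB_APPEND, sort just the newly appended slice of gl_pathv, so whatever was collected earlier keeps its order. sort_appended and cmp_path are hypothetical names, and strcmp() is only a placeholder ordering.

```c
#include <glob.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of char pointers. */
static int cmp_path(const void *l, const void *r)
{
    const char *const *ls = l;
    const char *const *rs = r;

    return strcmp(*ls, *rs);
}

/* Hypothetical helper: sort only the paths added to pglob since it held
   'from' entries, e.g. those appended for one pattern by calls made with
   GLOB_NOSORT | GLOB_APPEND. Earlier entries keep their order. The paths
   start at index gl_offs of gl_pathv. */
static void sort_appended(glob_t *pglob, size_t from)
{
    if (pglob->gl_pathc > from)
        qsort(pglob->gl_pathv + pglob->gl_offs + from,
              pglob->gl_pathc - from, sizeof (char *), cmp_path);
}
```

The caller records gl_pathc before globbing a pattern (zero for the first one) and passes that count as from afterwards.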

I'm already thinking forward to a different algorithm, but
here is the prototype.

#include <glob.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

static int super_glob_rec(const char *pattern, int flags,
                          int (*errfunc) (const char *epath, int eerrno),
                          glob_t *pglob, size_t star_limit)
{ const char *dblstar = strstr(pattern, "/**/");

  if (dblstar == 0) {
    /* No double-star component left: hand the pattern to glob as is. */
    return glob(pattern, flags, errfunc, pglob);
  } else {
    size_t i, base_len = strlen(pattern);
    size_t ds_off = dblstar - pattern + 1;  /* offset of the first star */
    size_t tail_off = ds_off + 3;           /* offset just past the closing slash */
    size_t limit = star_limit > 10 ? 10 : star_limit;

    /* Replace the double star by zero, one, two, ... star components. */
    for (i = 0; i < limit; i++) {
      size_t space = base_len - 3 + i * 2;
      char *pat_copy = malloc(space + 1);
      size_t j;
      char *out;
      int res;

      if (pat_copy == 0)
        return GLOB_NOSPACE;

      /* Copy the prefix, including the slash before the double star. */
      memcpy(pat_copy, pattern, ds_off);
      out = pat_copy + ds_off;

      for (j = 0; j < i; j++) {
        *out++ = '*';
        *out++ = '/';
      }

      strcpy(out, pattern + tail_off);

      /* Accumulate the results of every call after the first. */
      if (i > 0)
        flags |= GLOB_APPEND;

      res = super_glob_rec(pat_copy, flags, errfunc, pglob, star_limit - i);

      free(pat_copy);

      if (res && res != GLOB_NOMATCH)
        return res;
    }

    return 0;
  }
}

static int super_glob(const char *pattern, int flags,
                      int (*errfunc) (const char *epath, int eerrno),
                      glob_t *pglob)
{ return super_glob_rec(pattern, flags, errfunc, pglob, 48);
}

int main(int argc, char **argv)
{ int status = EXIT_FAILURE;

  if (argc == 2) {
    glob_t glb = { 0 };  /* zeroed in case super_glob fails before calling glob */
    int res = super_glob(argv[1], 0, NULL, &glb);
    if (res && res != GLOB_NOMATCH) {
      fprintf(stderr, "%s: glob failed with %d\n", argv[0], res);
    } else {
      for (size_t i = 0; i < glb.gl_pathc; i++)
        puts(glb.gl_pathv[i]);
      status = EXIT_SUCCESS;
    }
    globfree(&glb);
  } else {
    fprintf(stderr, "%s: specify one glob pattern argument\n", argv[0]);
  }

  return status;
}

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.

Re: Wrapper for glob() that implements /**/ sub-pattern.

<20230911234401.851@kylheku.com>

https://www.novabbs.com/devel/article-flat.php?id=10650&group=comp.unix.programmer#10650
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 864-117-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.unix.programmer
Subject: Re: Wrapper for glob() that implements /**/ sub-pattern.
Date: Tue, 12 Sep 2023 06:47:05 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 95
Message-ID: <20230911234401.851@kylheku.com>
References: <20230911202309.171@kylheku.com>
Injection-Date: Tue, 12 Sep 2023 06:47:05 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c4d2f9d891c0a83c444defc9ec9b8b50";
logging-data="1532139"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+UBJEIXsh2OKxCnzzwvtZSh3F2/nvixVQ="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:SwMy77OvLeLTetW47haIRjaRuks=
 by: Kaz Kylheku - Tue, 12 Sep 2023 06:47 UTC

On 2023-09-12, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
> The real function should handle patterns starting with "**/" and also
> ending in "/**", as well as when "**" is the entire pattern.

I fixed this in the prototype.

#include <glob.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

static int super_glob_rec(const char *pattern, int flags,
                          int (*errfunc) (const char *epath, int eerrno),
                          glob_t *pglob, size_t star_limit)
{ const char *dblstar = 0;

  /* Point dblstar at the first of the two stars: whole-pattern,
     leading, interior or trailing occurrence. */
  if (strncmp(pattern, "**/", 3) == 0 || strcmp(pattern, "**") == 0) {
    dblstar = pattern;
  } else if ((dblstar = strstr(pattern, "/**/")) != 0) {
    dblstar++;
  } else if (strlen(pattern) >= 3) {
    const char *end = pattern + strlen(pattern);
    if (strcmp(end - 3, "/**") == 0)
      dblstar = end - 2;
  }

  if (dblstar == 0) {
    return glob(pattern, flags, errfunc, pglob);
  } else {
    size_t i, base_len = strlen(pattern);
    size_t ds_off = dblstar - pattern;   /* offset of the first star */
    size_t tail_off = ds_off + 2;        /* offset just past the two stars */
    size_t limit = star_limit > 10 ? 10 : star_limit;

    for (i = 0; i < limit; i++) {
      const char *tail = pattern + tail_off;
      /* Generous: the rewritten pattern is never longer than base_len + 2i. */
      char *pat_copy = malloc(base_len + i * 2 + 1);
      size_t j;
      char *out;
      int res;

      if (pat_copy == 0)
        return GLOB_NOSPACE;

      /* With zero stars, also drop the slash after the double star, so an
         interior one collapses to a single slash and a leading one does
         not turn the pattern into an absolute path. */
      if (i == 0 && *tail == '/')
        tail++;

      memcpy(pat_copy, pattern, ds_off);
      out = pat_copy + ds_off;

      /* i stars separated by slashes; the tail supplies any slash after them. */
      for (j = 0; j < i; j++) {
        *out++ = '*';
        if (j < i - 1)
          *out++ = '/';
      }

      strcpy(out, tail);

      if (i > 0)
        flags |= GLOB_APPEND;

      res = super_glob_rec(pat_copy, flags, errfunc, pglob, star_limit - i);

      free(pat_copy);

      if (res && res != GLOB_NOMATCH)
        return res;
    }

    return 0;
  }
}

static int super_glob(const char *pattern, int flags,
                      int (*errfunc) (const char *epath, int eerrno),
                      glob_t *pglob)
{ return super_glob_rec(pattern, flags, errfunc, pglob, 48);
}

int main(int argc, char **argv)
{ int status = EXIT_FAILURE;

  if (argc == 2) {
    glob_t glb = { 0 };  /* zeroed in case super_glob fails before calling glob */
    int res = super_glob(argv[1], 0, NULL, &glb);
    if (res && res != GLOB_NOMATCH) {
      fprintf(stderr, "%s: glob failed with %d\n", argv[0], res);
    } else {
      for (size_t i = 0; i < glb.gl_pathc; i++)
        puts(glb.gl_pathv[i]);
      status = EXIT_SUCCESS;
    }
    globfree(&glb);
  } else {
    fprintf(stderr, "%s: specify one glob pattern argument\n", argv[0]);
  }

  return status;
}

Re: Wrapper for glob() that implements /**/ sub-pattern.

<20230912101332.94@kylheku.com>

https://www.novabbs.com/devel/article-flat.php?id=10651&group=comp.unix.programmer#10651
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 864-117-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.unix.programmer
Subject: Re: Wrapper for glob() that implements /**/ sub-pattern.
Date: Tue, 12 Sep 2023 17:17:17 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 35
Message-ID: <20230912101332.94@kylheku.com>
References: <20230911202309.171@kylheku.com> <20230911234401.851@kylheku.com>
Injection-Date: Tue, 12 Sep 2023 17:17:17 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c4d2f9d891c0a83c444defc9ec9b8b50";
logging-data="1748388"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19bdJFF82dOrZqnZHdZ90qcT/KDguxrruI="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:NLmM8e3mY1rtXfoyOkpzisMYo58=
 by: Kaz Kylheku - Tue, 12 Sep 2023 17:17 UTC

On 2023-09-12, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
> On 2023-09-12, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
>> The real function should handle patterns starting with "**/" and also
>> ending in "/**", as well as when "**" is the entire pattern.
>
> I fixed this in the prototype.

Issues:

1. Sorting with brace expansions:

Brace expansion is supported in glibc's glob via GLOB_BRACE.

The order of the brace substitutions has to be preserved
through the globbing.

Given a glob pattern "alpha{beta,gamma}omega", it has to be
expanded into "alphabetaomega" and "alphagammaomega" which
are separately matched, sorted, and then combined together in
that order.

It has to be that way; otherwise it will wreck the semantics of
command lines generated with brace expansion.

For example, *.{foo,bar} has to behave like *.foo *.bar, where all the
.foo files are listed before the .bar files.

That means if you want to implement some new glob semantics on top
of glob, you have to intercept GLOB_BRACE and do it yourself;
you can't just apply a sorting pass to the total expansion.
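A toy illustration of the ordering requirement: expand the braces first, in order, and glob each alternative separately. brace_expand is a hypothetical helper that handles only one non-nested group and no escapes; it makes no claim about how glibc's GLOB_BRACE works internally.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, deliberately minimal brace expander: one non-nested
   {a,b,...} group, no escapes. Alternatives come out left to right, so
   each can be globbed and sorted on its own and the results concatenated
   in that order. Stores at most max malloc'd strings in out and returns
   how many it stored; the caller frees them. */
static size_t brace_expand(const char *pat, char **out, size_t max)
{
    const char *open = strchr(pat, '{');
    const char *close = open ? strchr(open, '}') : NULL;
    const char *alt;
    size_t n = 0;

    if (max == 0)
        return 0;
    if (close == NULL) {                /* no complete group: copy as is */
        out[0] = malloc(strlen(pat) + 1);
        if (out[0] == NULL)
            return 0;
        strcpy(out[0], pat);
        return 1;
    }

    for (alt = open + 1; n < max; ) {
        const char *comma = memchr(alt, ',', (size_t) (close - alt));
        const char *end = comma ? comma : close;
        size_t head = (size_t) (open - pat);
        size_t mid = (size_t) (end - alt);
        size_t tail = strlen(close + 1);
        char *s = malloc(head + mid + tail + 1);

        if (s == NULL)
            break;                      /* return what was built so far */
        memcpy(s, pat, head);
        memcpy(s + head, alt, mid);
        memcpy(s + head + mid, close + 1, tail + 1);  /* tail plus NUL */
        out[n++] = s;
        if (comma == NULL)
            break;
        alt = comma + 1;
    }
    return n;
}
```

Globbing "*.foo" and then "*.bar", sorting each alternative's results on their own and concatenating them in this order, yields all .foo names before all .bar names.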

2. Escaping

The interior /**/ pattern could occur in a class like [abc/**/def]
in which case it must not be recognized.
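A sketch of a scanner that respects escapes and bracket expressions when looking for an interior /**/, assuming glob's default backslash escaping (no GLOB_NOESCAPE) and POSIX bracket syntax ("[!", "[^", a leading "]", and "[:class:]"). find_dblstar and bracket_end are hypothetical names; an unterminated "[" is treated as an ordinary character, and backslashes inside brackets are not interpreted.

```c
#include <stddef.h>
#include <string.h>

/* If pat[i] == '[' starts a bracket expression, return the index just
   past its closing ']'; return 0 if it is unterminated, in which case the
   '[' is an ordinary character. */
static size_t bracket_end(const char *pat, size_t i)
{
    size_t j = i + 1;

    if (pat[j] == '!' || pat[j] == '^')
        j++;
    if (pat[j] == ']')                  /* a leading ']' is a member */
        j++;
    while (pat[j] != '\0' && pat[j] != ']') {
        if (pat[j] == '[' && pat[j + 1] == ':') {
            const char *cl = strstr(pat + j + 2, ":]");
            if (cl == NULL)
                return 0;
            j = (size_t) (cl - pat) + 2;   /* skip a [:name:] class */
        } else {
            j++;
        }
    }
    return pat[j] == ']' ? j + 1 : 0;
}

/* Hypothetical scanner: return a pointer to the slash that starts an
   unescaped slash-star-star-slash sequence lying outside every bracket
   expression, or NULL if there is none. */
static const char *find_dblstar(const char *pat)
{
    size_t i = 0;

    while (pat[i] != '\0') {
        if (pat[i] == '\\') {
            i += pat[i + 1] != '\0' ? 2 : 1;   /* skip the escaped char */
        } else if (pat[i] == '[') {
            size_t end = bracket_end(pat, i);
            i = end != 0 ? end : i + 1;
        } else if (strncmp(pat + i, "/**/", 4) == 0) {
            return pat + i;
        } else {
            i++;
        }
    }
    return NULL;
}
```

With this, "[abc/**/def]" is not treated as containing a double star, while "a/**/b" still is.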

Re: Wrapper for glob() that implements /**/ sub-pattern.

<rq5atj-sh92.ln1@wilbur.25thandClement.com>

https://www.novabbs.com/devel/article-flat.php?id=10652&group=comp.unix.programmer#10652
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!Xl.tags.giganews.com!local-1.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Thu, 14 Sep 2023 02:00:02 +0000
Message-ID: <rq5atj-sh92.ln1@wilbur.25thandClement.com>
From: will...@25thandClement.com (William Ahern)
Subject: Re: Wrapper for glob() that implements /**/ sub-pattern.
Newsgroups: comp.unix.programmer
References: <20230911202309.171@kylheku.com> <20230911234401.851@kylheku.com> <20230912101332.94@kylheku.com>
User-Agent: tin/2.4.4-20191224 ("Millburn") (OpenBSD/7.3 (amd64))
Date: Wed, 13 Sep 2023 18:56:11 -0700
Lines: 56
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-telFjaePB3zpj/e1Lkj+Dgt6J1sm1IWEZuylIxghhKNio4LWK/b55nffFWexDm5Frvv3MHrL3fs85MI!p/mjFxya+O/g38qf+CDRloTotR6ORT1jMfm7EIkLhVerH0LJLep6uRdMjNHPPVt7XI0=
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: William Ahern - Thu, 14 Sep 2023 01:56 UTC

Kaz Kylheku <864-117-4973@kylheku.com> wrote:
> On 2023-09-12, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
>> On 2023-09-12, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
>>> The real function should handle patterns starting with "**/" and also
>>> ending in "/**", as well as when "**" is the entire pattern.
>>
>> I fixed this in the prototype.
>
> Issues:
<snip>
>
> 2. Escaping
>
> The interior /**/ pattern could occur in a class like [abc/**/def]
> in which case it must not be recognized.

FWIW, OpenBSD sh seems not to tolerate slashes in bracket expressions:

$ ls *
bar foo foo*bar
$ ls *[*]*
foo*bar
$ ls *[*z]*
foo*bar
$ ls *[*/]*
ls: *[*/]*: No such file or directory

It seems to be [recursively] splitting on slash before pattern matching on
filenames. See line 1086 in globit at
https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/bin/ksh/eval.c?annotate=1.67

And this behavior seems to be POSIX compliant:

2.13.3 Patterns Used for Filename Expansion

The rules described so far in Patterns Matching a Single Character and
Patterns Matching Multiple Characters are qualified by the following rules
that apply when pattern matching notation is used for filename expansion:

1. The <slash> character in a pathname shall be explicitly matched by
using one or more <slash> characters in the pattern; it shall neither
be matched by the <asterisk> or <question-mark> special characters nor
by a bracket expression. <slash> characters in the pattern shall be
identified before bracket expressions; thus, a <slash> cannot be
included in a pattern bracket expression used for filename expansion.

At first I was wondering why you thought you could get away with merely
scanning for slash+double-star and double-star+slash--bracket expressions
obviously require stateful parsing. But clearly POSIX anticipates (or even
expects) that shells would process slashes before parsing component
patterns. I believe this allowance/limitation would likewise apply to
glob(3), which specifically mentions rule #3 under 2.13.3; unlike case and
fnmatch(3), which are normally used for generic string matching.
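Under the POSIX rule quoted above, a preprocessor could split on every slash first and only then look at the components, with no bracket parsing at all. A sketch of that reading, using a hypothetical xstar_component helper that ignores escapes:

```c
#include <string.h>

/* Hypothetical sketch of the POSIX 2.13.3 reading: split on every slash
   first, then look for a component that is exactly two stars. Brackets
   are never examined, since a slash cannot occur inside one. Escapes are
   ignored. Returns the component index, or -1 if there is none. */
static int xstar_component(const char *pat)
{
    int idx = 0;

    for (;;) {
        const char *slash = strchr(pat, '/');
        size_t len = slash ? (size_t) (slash - pat) : strlen(pat);

        if (len == 2 && pat[0] == '*' && pat[1] == '*')
            return idx;
        if (slash == NULL)
            return -1;
        pat = slash + 1;
        idx++;
    }
}
```

Under this reading "[abc/**/def]" has a genuine double-star component at index 1, because "[abc" and "def]" are not bracket expressions at all.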

But I guess none of that is helpful if you're trying to match some
sophisticated Bash behavior.

Re: Wrapper for glob() that implements /**/ sub-pattern.

<c08atj-0322.ln1@wilbur.25thandClement.com>

https://www.novabbs.com/devel/article-flat.php?id=10653&group=comp.unix.programmer#10653
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!Xl.tags.giganews.com!local-1.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Thu, 14 Sep 2023 02:33:26 +0000
Message-ID: <c08atj-0322.ln1@wilbur.25thandClement.com>
From: will...@25thandClement.com (William Ahern)
Subject: Re: Wrapper for glob() that implements /**/ sub-pattern.
Newsgroups: comp.unix.programmer
References: <20230911202309.171@kylheku.com> <20230911234401.851@kylheku.com> <20230912101332.94@kylheku.com> <rq5atj-sh92.ln1@wilbur.25thandClement.com>
User-Agent: tin/2.4.4-20191224 ("Millburn") (OpenBSD/7.3 (amd64))
Date: Wed, 13 Sep 2023 19:33:16 -0700
Lines: 46
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-Nko0RkoFR+TelFnsuq/otUzPtOs8BMW4zfMvytnPnbjkOFGodkwzbCwqMXBMEZmQ4kmqsLR963VIogO!RgAGYRbfbC7upbntieEeMqPFMVSkUYEo0okmWn9HCN4JZSQ56eIe+2XzHaWx4jAk2bk=
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: William Ahern - Thu, 14 Sep 2023 02:33 UTC

William Ahern <william@25thandclement.com> wrote:
> Kaz Kylheku <864-117-4973@kylheku.com> wrote:
>> On 2023-09-12, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
>>> On 2023-09-12, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
>>>> The real function should handle patterns starting with "**/" and also
>>>> ending in "/**", as well as when "**" is the entire pattern.
>>>
>>> I fixed this in the prototype.
>>
>> Issues:
> <snip>
>>
>> 2. Escaping
>>
>> The interior /**/ pattern could occur in a class like [abc/**/def]
>> in which case it must not be recognized.
>
<snip>
> At first I was wondering why you thought you could get away with merely
> scanning for slash+double-star and double-star+slash--bracket expressions
> obviously require stateful parsing. But clearly POSIX anticipates (or even
> expects) that shells would process slashes before parsing component
> patterns. I believe this allowance/limitation would likewise apply to
> glob(3), which specifically mentions rule #3 under 2.13.3; unlike case and
> fnmatch(3), which are normally used for generic string matching.

I'm glad I dug into this as I may have missed this caveat several years ago
when experimenting with a recursive glob command in POSIX shell:

https://25thandClement.com/~william/2023/glob.sh

The recursion is implemented by recursively appending '*/' to the directory
prefix, rather than with an in-pattern construct, either because I forgot
about the '/**/' extension or assumed it wasn't possible without more
cumbersome, stateful parsing of the pattern string. Though most of the
implementation is preoccupied with how to safely communicate filenames with
special characters--particularly whitespace--through pipelines using only
fast shell commands like printf (as opposed to od).

I wish I had pushed this up to GitHub as I rely on the (sh) printf %b format
specifier for decoding encoded filenames. POSIX sh printf %b clashes with
the newly defined %b in C2x. There's a discussion on the Open Group
mailing-list about whether POSIX 202x/SUSv5 should deprecate %b, and someone
did a query across GitHub code only to find that almost all the usages of %b
are the same copy-pasted code which could be easily replaced if POSIX
deprecated/removed the old %b specifier.

Re: Wrapper for glob() that implements /**/ sub-pattern.

<20230913201834.46@kylheku.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=10654&group=comp.unix.programmer#10654

  copy link   Newsgroups: comp.unix.programmer
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 864-117-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.unix.programmer
Subject: Re: Wrapper for glob() that implements /**/ sub-pattern.
Date: Thu, 14 Sep 2023 03:34:12 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 71
Message-ID: <20230913201834.46@kylheku.com>
References: <20230911202309.171@kylheku.com>
<20230911234401.851@kylheku.com> <20230912101332.94@kylheku.com>
<rq5atj-sh92.ln1@wilbur.25thandClement.com>
Injection-Date: Thu, 14 Sep 2023 03:34:12 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="3cc25e62ab9734f1842fb79a3bbf3440";
logging-data="2631169"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+FTsvhUQGDNkRNXaIEiaaMjtf5xnKwwAI="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:eSmmax7LBK/FeYJ9/5PL+kEUT6I=
 by: Kaz Kylheku - Thu, 14 Sep 2023 03:34 UTC

On 2023-09-14, William Ahern <william@25thandClement.com> wrote:
> Kaz Kylheku <864-117-4973@kylheku.com> wrote:
>> On 2023-09-12, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
>>> On 2023-09-12, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
>>>> The real function should handle patterns starting with "**/" and also
>>>> ending in "/**", as well as when "**" is the entire pattern.
>>>
>>> I fixed this in the prototype.
>>
>> Issues:
><snip>
>>
>> 2. Escaping
>>
>> The interior /**/ pattern could occur in a class like [abc/**/def]
>> in which case it must not be recognized.
>
> FWIW, OpenBSD sh seems not to tolerate slashes in bracket expressions:

Yes; and it doesn't make sense. Or not in glob, anyway.

Nevertheless, a double-star preprocessor layered above glob should heed
the class syntax and not treat a /**/ inside a class specially; just pass
it through to glob and let it fail.

Matching slashes with class syntax makes sense in situations where
we are not matching paths, or are matching paths more freely.
Obviously it's allowed in a POSIX shell case statement, and in fnmatch()
(in the absence of FNM_PATHNAME), and so on.

> At first I was wondering why you thought you could get away with merely
> scanning for slash+double-star and double-star+slash--bracket expressions
> obviously require stateful parsing.

I was initially after the behavior: proof-of-concept. When things have
driven around the block, then we tighten the lugnuts.

In the current version I have it look for ** being the whole thing,
starting with **/, ending with /** or containing /**/ where the
leading / is not escaped or in a character class.

I think I may have a bug: when detecting the trailing /**, we should
avoid interpreting it if it follows a backslash; let glob deal
with it.
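A sketch of that trailing-case check, assuming glob's default backslash escaping and ignoring bracket expressions: the slash before the final ** is escaped exactly when an odd number of backslashes immediately precede it, since each pair is just an escaped backslash. has_trailing_xstar is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: does pat end in slash-star-star with an unescaped
   slash? Assumes backslash escaping is on (no GLOB_NOESCAPE) and ignores
   bracket expressions. */
static bool has_trailing_xstar(const char *pat)
{
    size_t len = strlen(pat);
    size_t slash, bs = 0;

    if (len < 3 || strcmp(pat + len - 3, "/**") != 0)
        return false;
    slash = len - 3;
    /* Count the run of backslashes immediately before the slash. */
    while (bs < slash && pat[slash - 1 - bs] == '\\')
        bs++;
    return bs % 2 == 0;             /* odd: the slash itself is escaped */
}
```

When the check fails, the pattern would simply be passed to glob unchanged.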

> But I guess none of that is helpful if you're trying to match some
> sophisticated Bash behavior.

I used this to cobble together a function called glob* that is now
integrated in TXR Lisp.

The current glob function is a wrapper for glob written in C,
which now calls superglob if the extension flag GLOB_XSTAR is present.

glob accepts a list of patterns, not just a single pattern; the results
from multiple patterns are catenated.

glob* is written in Lisp, and performs its own brace expansion to
generate a list of patterns passed to glob with GLOB_XSTAR.

(I had once submitted a TXR Lisp brace expansion solution to Rosetta
Code; I integrated that ready-made solution.)

I've tested on Linux, MacOS and Cygwin so far; it's looking good.
Will check Solaris 10.

Re: Wrapper for glob() that implements /**/ sub-pattern.

<20230913231055.828@kylheku.com>

https://www.novabbs.com/devel/article-flat.php?id=10655&group=comp.unix.programmer#10655
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 864-117-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.unix.programmer
Subject: Re: Wrapper for glob() that implements /**/ sub-pattern.
Date: Thu, 14 Sep 2023 06:17:37 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 122
Message-ID: <20230913231055.828@kylheku.com>
References: <20230911202309.171@kylheku.com>
<20230911234401.851@kylheku.com> <20230912101332.94@kylheku.com>
<rq5atj-sh92.ln1@wilbur.25thandClement.com> <20230913201834.46@kylheku.com>
Injection-Date: Thu, 14 Sep 2023 06:17:37 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="3cc25e62ab9734f1842fb79a3bbf3440";
logging-data="2664044"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+U6aLLE2EqCiuP64zt484a7tyjIXTXomk="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:a/DWOy0Nt36Y1ehUDOMJo9wmeMc=
 by: Kaz Kylheku - Thu, 14 Sep 2023 06:17 UTC

On 2023-09-14, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
> On 2023-09-14, William Ahern <william@25thandClement.com> wrote:
>> Kaz Kylheku <864-117-4973@kylheku.com> wrote:
>>> On 2023-09-12, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
>>>> On 2023-09-12, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
>>>>> The real function should handle patterns starting with "**/" and also
>>>>> ending in "/**", as well as when "**" is the entire pattern.
>>>>
>>>> I fixed this in the prototype.
>>>
>>> Issues:
>><snip>
>>>
>>> 2. Escaping
>>>
>>> The interior /**/ pattern could occur in a class like [abc/**/def]
>>> in which case it must not be recognized.
>>
>> FWIW, OpenBSD sh seems not to tolerate slashes in bracket expressions:
>
> Yes; and it doesn't make sense. Or not in glob, anyway.
>
> Nevertheless, a double-star preprocessor layered above glob should heed
> the class syntax and not treat a /**/ inside a class specially; just pass
> it through to glob and let it fail.
>
> Matching slashes with class syntax makes sense in situations where
> we are not matching paths, or are matching paths more freely.
> Obviously it's allowed in a POSIX shell case statement, and in fnmatch()
> (in the absence of FNM_PATHNAME), and so on.
>
>> At first I was wondering why you thought you could get away with merely
>> scanning for slash+double-star and double-star+slash--bracket expressions
>> obviously require stateful parsing.
>
> I was initially after the behavior: proof-of-concept. When things have
> driven around the block, then we tighten the lugnuts.
>
> In the current version I have it look for ** being the whole thing,
> starting with **/, ending with /** or containing /**/ where the
> leading / is not escaped or in a character class.
>
> I think I may have a bug: when detecting the trailing /**, we should
> avoid interpreting it if it follows a backslash; let glob deal
> with it.
>
>> But I guess none of that is helpful if you're trying to match some
>> sophisticated Bash behavior.
>
> I used this to cobble together a function called glob* that is now
> integrated in TXR Lisp.
>
> The current glob function is a wrapper for glob written in C,
> which now calls superglob if the extension flag GLOB_XSTAR is present.
>
> glob accepts a list of patterns, not just a single pattern; the results
> from multiple patterns are catenated.
>
> glob* is written in Lisp, and performs its own brace expansion to
> generate a list of patterns passed to glob with GLOB_XSTAR.

I should mention that I fixed the brace expansion sorting issue,
and the general sorting issue.

The individual super_glob calls beneath the glob wrapper do the
sorting: GLOB_NOSORT is passed to glob and then the array is sorted
using qsort.

The brace expansion is processed in glob* which appends the individual
results together in order, so there is no global sort messing things up.

For qsort, I am using the comparison function below.

In contrast, Bash's internal glob function does stupid things,
like using strcoll or strcmp under different circumstances.
strcoll falls victim to locale; strcmp is silly for paths.

The following function just compares bytes, without regard for the
character set, except that it collates the / character before any other.

So for instance these two entries are in sorted order:

test/
test-dir/

whereas under strcmp(), test-dir would come first because -
is before / in ASCII.

convert is a casting macro; just pretend convert(T, E) is ((T) (E)).

/* Stand-in for TXR's casting macro, as described above. */
#define convert(T, E) ((T) (E))

static int glob_path_cmp(const void *ls, const void *rs)
{ const unsigned char *lstr = *convert(const unsigned char * const *, ls);
  const unsigned char *rstr = *convert(const unsigned char * const *, rs);

  for (; *lstr && *rstr; lstr++, rstr++)
  {
    if (*lstr == *rstr)
      continue;
    if (*lstr == '/')       /* slash collates before any other byte */
      return -1;
    if (*rstr == '/')
      return 1;
    if (*lstr < *rstr)
      return -1;
    if (*lstr > *rstr)
      return 1;
  }

  /* One string ended: equal if both did, else the shorter one sorts first. */
  if (*lstr == *rstr)
    return 0;
  return *lstr ? 1 : -1;
}

Re: Wrapper for glob() that implements /**/ sub-pattern.

<uduhtl$2ijjc$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=10656&group=comp.unix.programmer#10656
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nunojsi...@invalid.invalid (Nuno Silva)
Newsgroups: comp.unix.programmer
Subject: Re: Wrapper for glob() that implements /**/ sub-pattern.
Date: Thu, 14 Sep 2023 09:59:32 +0100
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <uduhtl$2ijjc$1@dont-email.me>
References: <20230911202309.171@kylheku.com> <20230911234401.851@kylheku.com>
<20230912101332.94@kylheku.com>
<rq5atj-sh92.ln1@wilbur.25thandClement.com>
<c08atj-0322.ln1@wilbur.25thandClement.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="6b9751d841d2c195d7fd46e63f058073";
logging-data="2707052"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/wrmWnSHjekYFYFky2VT0h"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
Cancel-Lock: sha1:+DmKHvH6FDhvAv80xGgqEDqf1mc=
 by: Nuno Silva - Thu, 14 Sep 2023 08:59 UTC

On 2023-09-14, William Ahern wrote:

[...]
> I wish I had pushed this up to GitHub as I rely on the (sh) printf %b format
> specifier for decoding encoded filenames. POSIX sh printf %b clashes with
> the newly defined %b in C2x. There's a discussion on the Open Group
> mailing-list about whether POSIX 202x/SUSv5 should deprecate %b, and someone
> did a query across GitHub code only to find that almost all the usages of %b
> are the same copy-pasted code which could be easily replaced if POSIX
> deprecated/removed the old %b specifier.

How complete would such a query be? I mean, are there other
"centralized" web repositories of code with a significant user base?

(There is also code at other locations grouping several projects, FSF's
savannah comes to mind, but I don't think it has a centralized code
search, does it? Same goes for SourceForge?)

--
Nuno Silva

Re: Wrapper for glob() that implements /**/ sub-pattern.

<6fmftj-9ad.ln1@wilbur.25thandClement.com>

https://www.novabbs.com/devel/article-flat.php?id=10660&group=comp.unix.programmer#10660
Path: i2pn2.org!rocksolid2!news.neodome.net!news.mixmin.net!news2.arglkargh.de!news.karotte.org!news.szaf.org!3.eu.feeder.erje.net!3.us.feeder.erje.net!feeder.erje.net!border-1.nntp.ord.giganews.com!nntp.giganews.com!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Sat, 16 Sep 2023 04:15:02 +0000
Message-ID: <6fmftj-9ad.ln1@wilbur.25thandClement.com>
From: will...@25thandClement.com (William Ahern)
Subject: Re: Wrapper for glob() that implements /**/ sub-pattern.
Newsgroups: comp.unix.programmer
References: <20230911202309.171@kylheku.com> <20230911234401.851@kylheku.com> <20230912101332.94@kylheku.com> <rq5atj-sh92.ln1@wilbur.25thandClement.com> <c08atj-0322.ln1@wilbur.25thandClement.com> <uduhtl$2ijjc$1@dont-email.me>
User-Agent: tin/2.4.4-20191224 ("Millburn") (OpenBSD/7.3 (amd64))
Date: Fri, 15 Sep 2023 21:10:46 -0700
Lines: 32
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-7ZCIXVtxZrpmTvXdr3w7cofRUScu1lEVySVafa7KGh4SdePzndNhUxzKvRQmBEw6xuuiz/F+51szPNk!0bmWhkfE8kTdf7PO4XGnADOHwSsH3kgxedJZWgIKETCi4sCws6LkkNlkgUQ1LSaacg==
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: William Ahern - Sat, 16 Sep 2023 04:10 UTC

Nuno Silva <nunojsilva@invalid.invalid> wrote:
> On 2023-09-14, William Ahern wrote:
>
> [...]
>> I wish I had pushed this up to GitHub as I rely on the (sh) printf %b format
>> specifier for decoding encoded filenames. POSIX sh printf %b clashes with
>> the newly defined %b in C2x. There's a discussion on the Open Group
>> mailing-list about whether POSIX 202x/SUSv5 should deprecate %b, and someone
>> did a query across GitHub code only to find that almost all the usages of %b
>> are the same copy-pasted code which could be easily replaced if POSIX
>> deprecated/removed the old %b specifier.
>
> How complete would such a query be? I mean, are there other
> "centralized" web repositories of code with a significant user base?
>
> (There is also code at other locations grouping several projects, FSF's
> savannah comes to mind, but I don't think it has a centralized code
> search, does it? Same goes for SourceForge?)
>

I don't know how much credence anyone on the mailing-list gives to that
GitHub search, nor do I know where the balance of opinion stands on
deprecating %b. And unfortunately most of the discussion occurred on a
conference call (which I wasn't privy to) and the mailing-list. Though it
seems to have spilled over to an Enhancement Request regarding %q,
https://www.austingroupbugs.net/view.php?id=1771, and the GNU coreutils
mailing-list, https://lists.gnu.org/r/bug-coreutils/2023-09/msg00015.html

I don't believe the mailing-list archive is public, but subscribing is open.
You first need to create an account on opengroup.org. The mailing-list is
austin-group-l@opengroup.org, but the portal website is byzantine and I
can't even find the page showing my subscription to the mailing-list.

Re: Wrapper for glob() that implements /**/ sub-pattern.

<y2asf7eclbw.fsf@offog.org>

https://www.novabbs.com/devel/article-flat.php?id=10664&group=comp.unix.programmer#10664
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: ats...@offog.org (Adam Sampson)
Newsgroups: comp.unix.programmer
Subject: Re: Wrapper for glob() that implements /**/ sub-pattern.
Date: Sat, 16 Sep 2023 14:13:39 +0100
Lines: 13
Message-ID: <y2asf7eclbw.fsf@offog.org>
References: <20230911202309.171@kylheku.com> <20230911234401.851@kylheku.com>
<20230912101332.94@kylheku.com>
<rq5atj-sh92.ln1@wilbur.25thandClement.com>
<c08atj-0322.ln1@wilbur.25thandClement.com>
<uduhtl$2ijjc$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net EtFUt89+oOgcY3BC60RpyQOAAcpNygrDIEbGvNW5aqWQaO2AB4
X-Orig-Path: cartman.offog.org!not-for-mail
Cancel-Lock: sha1:mxnzsJmNKLJwWd0G48/gVLaDngg= sha1:CZ1cI8FHw3K4/egnp3NA4BJ5nzI= sha256:wfHn4H2flEcEc4WAOxIoi7hRHrDprCAt2rABtpHseSo=
User-Agent: Gnus/5.13 (Gnus v5.13)
 by: Adam Sampson - Sat, 16 Sep 2023 13:13 UTC

"Nuno Silva" <nunojsilva@invalid.invalid> writes:

> How complete would such a query be? I mean, are there other
> "centralized" web repositories of code with a significant user base?

I find codesearch.debian.net very useful for this kind of thing -- it
allows regex searches across all of Debian's source packages. This
includes both widely-used and obscure software, although it only covers
packages in the current development version of Debian so older code
won't be included.

--
Adam Sampson <ats@offog.org> <http://offog.org/>
