
Subject [Author]
* Convert CR LF to CR  [Harry]
+* Re: Convert CR LF to CR  [Ed Morton]
|`* Re: Convert CR LF to CR  [Harry]
| `* Re: Convert CR LF to CR  [Ed Morton]
|  `* Re: Convert CR LF to CR  [Harry]
|   `* Re: Convert CR LF to CR  [Ed Morton]
|    `* Re: Convert CR LF to CR  [Harry]
|     +- Re: Convert CR LF to CR  [Harry]
|     `* Re: Convert CR LF to CR  [David W. Hodgins]
|      `* Re: Convert CR LF to CR  [Harry]
|       `* Re: Convert CR LF to CR  [David W. Hodgins]
|        `- Re: Convert CR LF to CR  [Harry]
+* Re: Convert CR LF to CR  [Kenny McCormack]
|`- Re: Convert CR LF to CR  [Harry]
+- Re: Convert CR LF to CR  [Harry]
+* Re: Convert CR LF to CR  [Keith Thompson]
|`- Re: Convert CR LF to CR  [Harry]
+* Re: Convert CR LF to CR  [Kaz Kylheku]
|`* Re: Convert CR LF to CR  [Harry]
| +- Re: Convert CR LF to CR  [Kenny McCormack]
| +- Re: Convert CR LF to CR  [Kaz Kylheku]
| `* Re: Convert CR LF to CR  [Spiros Bousbouras]
|  +* Re: Convert CR LF to CR  [Janis Papanagnou]
|  |`* Re: Convert CR LF to CR  [Keith Thompson]
|  | `* Re: Convert CR LF to CR  [Janis Papanagnou]
|  |  `* Re: Convert CR LF to CR  [Keith Thompson]
|  |   +* Re: Convert CR LF to CR  [Janis Papanagnou]
|  |   |+* Reading a file all in one go in GAWK (Was: Convert CR LF to CR)  [Kenny McCormack]
|  |   ||+- Re: Reading a file all in one go in GAWK (Was: Convert CR LF to CR)  [Janis Papanagnou]
|  |   ||`- Re: Reading a file all in one go in GAWK  [Ben Bacarisse]
|  |   |`* Re: Convert CR LF to CR  [Ed Morton]
|  |   | +- Re: Convert CR LF to CR  [Kenny McCormack]
|  |   | `* Re: Convert CR LF to CR  [Janis Papanagnou]
|  |   |  +* Re: Convert CR LF to CR  [Ed Morton]
|  |   |  |+- Re: Convert CR LF to CR  [Kenny McCormack]
|  |   |  |`* Re: Convert CR LF to CR  [Janis Papanagnou]
|  |   |  | +* Re: Convert CR LF to CR  [Ed Morton]
|  |   |  | |`* Re: Convert CR LF to CR  [Janis Papanagnou]
|  |   |  | | `* Re: Convert CR LF to CR  [Ed Morton]
|  |   |  | |  `* [OT] GNU Awk's regexp RS behavior (was Re: Convert CR LF to CR)  [Janis Papanagnou]
|  |   |  | |   `* Re: [OT] GNU Awk's regexp RS behavior (was Re: Convert CR LF to CR)  [Janis Papanagnou]
|  |   |  | |    `- Re: [OT] GNU Awk's regexp RS behavior (was Re: Convert CR LF to CR)  [Ed Morton]
|  |   |  | `* Re: Convert CR LF to CR  [William Unruh]
|  |   |  |  +- Re: Convert CR LF to CR  [Janis Papanagnou]
|  |   |  |  `* Re: Convert CR LF to CR  [Ed Morton]
|  |   |  |   `- Re: Convert CR LF to CR  [Ed Morton]
|  |   |  +* Re: Convert CR LF to CR  [Ed Morton]
|  |   |  |`- Re: Convert CR LF to CR  [Janis Papanagnou]
|  |   |  `- Re: Convert CR LF to CR  [Spiros Bousbouras]
|  |   +* GNU Awk bulk load of file (manual references) (was Re: Convert CR LF  [Janis Papanagnou]
|  |   |`* Re: GNU Awk bulk load of file (manual references) (was Re: Convert CR LF  [Kenny McCormack]
|  |   | `- Re: GNU Awk bulk load of file (manual references) (was Re: Convert CR  [Janis Papanagnou]
|  |   `- Re: Convert CR LF to CR  [Ben Bacarisse]
|  `- Re: Convert CR LF to CR  [Harry]
`* Re: Convert CR LF to CR  [Keith Thompson]
 `- Re: Convert CR LF to CR  [Harry]

Re: Convert CR LF to CR

<scido5$l0v$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=4111&group=comp.unix.shell#4111

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: mortons...@gmail.com (Ed Morton)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 16:55:16 -0500
Organization: A noiseless patient Spider
Lines: 76
Message-ID: <scido5$l0v$1@dont-email.me>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
<schean$k7q$1@dont-email.me> <schjcq$3ck$1@news-1.m-online.net>
<schmss$lpb$1@dont-email.me> <sci9sh$9lr$1@news-1.m-online.net>
<sciag4$1fv$1@dont-email.me> <scib4b$a13$1@news-1.m-online.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 12 Jul 2021 21:55:17 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="6be9b774b912cadfdb38e017142779d5";
logging-data="21535"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Y6UNh2i4LFk+A/mHQBh9h"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:+N5twc3XBMmLFfi97JgSHzfzHk8=
In-Reply-To: <scib4b$a13$1@news-1.m-online.net>
X-Antivirus-Status: Clean
Content-Language: en-US
X-Antivirus: Avast (VPS 210712-4, 7/12/2021), Outbound message
 by: Ed Morton - Mon, 12 Jul 2021 21:55 UTC

On 7/12/2021 4:10 PM, Janis Papanagnou wrote:
> On 12.07.2021 22:59, Ed Morton wrote:
>> On 7/12/2021 3:49 PM, Janis Papanagnou wrote:
>>> On 12.07.2021 17:25, Ed Morton wrote:
>>>> On 7/12/2021 9:25 AM, Janis Papanagnou wrote:
>>>>> It's more cryptic, no one seems to be perfectly sure how it works
>>>>
>>>> I don't know why it wouldn't just work like any other RS.
>>>
>>> The responses in that other thread left a different impression to me,
>>> that it wasn't obvious or as clear as it should be.
>>
>> I don't recall the previous conversation on it you're referring to but
>> it seems very clear and simple to me.
>>
>>>> gawk looks for
>>>> where a string matching that regexp occurs in the file and uses that to
>>>> identify the end of a record. If no such string exists in the file the
>>>> whole file is stored in $0.
>>>
>>> If all we want is a pattern that is principally non-existing wouldn't
>>> it be clearer to use something like "$^" (i.e. "^$" reversed), which
>>> is a meta-character sequence that does obviously not make any sense.
>>
>> `$^` means the literal chars `$` then `^` since `$` is only an anchor
>> metachar at the end of a regexp or subexpression and `^` only at the
>> beginning. Look:
>>
>> $ echo 'a$^b' | grep '$^'
>> a$^b
>
> We were talking about GNU Awk, don't we?

Yeah, but there's no magic about GNU awk's regexp handling. My mistake
was assuming BREs and EREs were the same with regard to anchors.

>
> Consequently I have tested GNU Awk:
>
> $ echo $'a$^b\na$^b' | awk 'BEGIN{RS="$^"}{print NR, $0}'
> 1 a$^b
> a$^b

I stand corrected; apparently that's a difference between BREs and EREs:

$ echo 'a$^b' | grep '$^'
a$^b

$ echo 'a$^b' | grep -E '$^'
$

$ echo 'a$^b' | sed 's/$^/X/'
aXb

$ echo 'a$^b' | sed -E 's/$^/X/'
a$^b

See
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_09
which I read as saying you could put any character before the `^` and
it wouldn't match, and this test seemed to support that:

$ printf 'ax^b\nax^b\n' | awk 'BEGIN{RS="x^"}{print NR, $0}'
1 ax^b
ax^b

but then I can't explain this, which apparently just ignores the RS
setting:

$ printf 'a.^b\na.^b\n' | awk 'BEGIN{RS=".^"}{print NR, $0}'
1 a.^b
2 a.^b

Regards,

Ed.

Re: Convert CR LF to CR

<sciedo$r5l$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=4112&group=comp.unix.shell#4112

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: mortons...@gmail.com (Ed Morton)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 17:06:47 -0500
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <sciedo$r5l$1@dont-email.me>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
<schean$k7q$1@dont-email.me> <schjcq$3ck$1@news-1.m-online.net>
<schmss$lpb$1@dont-email.me> <sci9sh$9lr$1@news-1.m-online.net>
<sciasb$20j$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 12 Jul 2021 22:06:48 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="0cda7cfa51dfb5595f80509ce25a50d0";
logging-data="27829"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1//84gH8BmRiK7ljC520TrP"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:c07VGcCr8L2STzaX3Kz1QIubUYg=
In-Reply-To: <sciasb$20j$1@dont-email.me>
X-Antivirus-Status: Clean
Content-Language: en-US
X-Antivirus: Avast (VPS 210712-4, 7/12/2021), Outbound message
 by: Ed Morton - Mon, 12 Jul 2021 22:06 UTC

On 7/12/2021 4:06 PM, William Unruh wrote:
> On 2021-07-12, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
>> On 12.07.2021 17:25, Ed Morton wrote:
>>> On 7/12/2021 9:25 AM, Janis Papanagnou wrote:
>>>> It's more cryptic, no one seems to be perfectly sure how it works
>>>
>>> I don't know why it wouldn't just work like any other RS.
>>
>> The responses in that other thread left a different impression to me,
>> that it wasn't obvious or as clear as it should be.
>>
>>> gawk looks for
>>> where a string matching that regexp occurs in the file and uses that to
>>> identify the end of a record. If no such string exists in the file the
>>> whole file is stored in $0.
>>
>> If all we want is a pattern that is principally non-existing wouldn't
>> it be clearer to use something like "$^" (i.e. "^$" reversed), which
>> is a meta-character sequence that does obviously not make any sense.
>
> Isn't $^ something that occurs at the end of every line? (End of this
> line, beginning of the next)

You're mixing up strings, lines, and records. `^` means the start of a
string and `$` means the end of a string.

In a line-oriented tool like grep or sed (without the GNU -z option and
without using a hold space), the string in question is the line that was
just read into memory and so `^` and `$` can be used to find the
start/end of the line because the line in question is the whole string.

In a record-oriented tool like awk, when used with the default RS of
`\n`, `^` and `$` can be used the same way as in sed or grep, but
when used with a different RS the record can contain newlines, so `^`
and `$` no longer match the start and end of lines; they match the
start and end of the record. If you use an RS that can't exist in the
input, then the whole input file is the record, so `^` matches the
start of the input file while `$` matches the end of the input file.

So no, `$^` does not occur anywhere in any input if we're assuming `$`
and `^` to be anchor metachars in that expression (which they apparently
are when using an ERE such as awk uses).
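The point that `^` anchors the start of the record, not each line inside it, can be checked with a small script. This is a minimal sketch, assuming a POSIX awk (it uses only paragraph mode, `RS=''`, which is standard); the helper name `anchor_demo` is hypothetical and chosen only for illustration:

```shell
# anchor_demo (hypothetical helper): reads records in paragraph mode
# (RS=''), so one record can contain embedded newlines.
anchor_demo() {
    awk -v RS='' '{
        # ^ anchors at the start of the whole record, not after each "\n".
        if ($0 ~ /^y/) print "^y matches"; else print "^y does not match"
        # Matching the embedded newline explicitly does find the 2nd line.
        if ($0 ~ /\ny/) print "\\ny matches"; else print "\\ny does not match"
    }'
}

# The two input lines become the single record "x\ny".
printf 'x\ny\n' | anchor_demo
```

It should print `^y does not match` followed by `\ny matches`: the second line is only reachable through the newline that precedes it.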

>
> Actually, I also have problems with ^$ since that would seem to mean an
> empty line,

No, it's an empty string, which could be a line given some RS values,
an empty record that's part of a file given other RS values, or an
empty file given yet other RS values.

> which is certainly possible (LFLF) would seem to have an
> empty line in it. But clearly I would have to know EXACTLY how awk
> determines the start and end of a line.

A line starts with the character following `^` or the previous `\n`. A
line ends with the character before `$` or the next `\n`.
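Whether `^$` means "an empty line" depends on what a record is. A minimal sketch, assuming a POSIX awk; `empty_records` is a hypothetical helper that prints the number of every record matching `/^$/` under a given RS. The empty-file case (`RS='^$'`) is left out because multi-character RS values are not portable across awk implementations:

```shell
# empty_records (hypothetical helper): print NR for every record that is
# the empty string, i.e. where /^$/ matches, using RS from the 1st argument.
empty_records() {
    awk -v RS="$1" '/^$/ { print NR }'
}

# RS="\n" (the default): the blank line is an empty record, record 2.
printf 'a\n\nb\n' | empty_records '\n'

# RS="" (paragraph mode): blank lines only separate records, so none is empty.
printf 'a\n\nb\n' | empty_records ''
```

The first call should print `2`; the second prints nothing.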

Ed.

Re: Convert CR LF to CR

<scigmk$7hl$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=4113&group=comp.unix.shell#4113

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: mortons...@gmail.com (Ed Morton)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 17:45:40 -0500
Organization: A noiseless patient Spider
Lines: 131
Message-ID: <scigmk$7hl$1@dont-email.me>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
<schean$k7q$1@dont-email.me> <schjcq$3ck$1@news-1.m-online.net>
<schmss$lpb$1@dont-email.me> <sci9sh$9lr$1@news-1.m-online.net>
<sciasb$20j$1@dont-email.me> <sciedo$r5l$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 12 Jul 2021 22:45:40 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="0cda7cfa51dfb5595f80509ce25a50d0";
logging-data="7733"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19PqstG2KfloK8Arn95pHVu"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:wKQRUi54JRWAmc6kdanHlJQ6Ffg=
In-Reply-To: <sciedo$r5l$1@dont-email.me>
X-Antivirus-Status: Clean
Content-Language: en-US
X-Antivirus: Avast (VPS 210712-4, 7/12/2021), Outbound message
 by: Ed Morton - Mon, 12 Jul 2021 22:45 UTC

On 7/12/2021 5:06 PM, Ed Morton wrote:
> On 7/12/2021 4:06 PM, William Unruh wrote:
>> On 2021-07-12, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
>>> On 12.07.2021 17:25, Ed Morton wrote:
>>>> On 7/12/2021 9:25 AM, Janis Papanagnou wrote:
>>>>> It's more cryptic, no one seems to be perfectly sure how it works
>>>>
>>>> I don't know why it wouldn't just work like any other RS.
>>>
>>> The responses in that other thread left a different impression to me,
>>> that it wasn't obvious or as clear as it should be.
>>>
>>>> gawk looks for
>>>> where a string matching that regexp occurs in the file and uses that to
>>>> identify the end of a record. If no such string exists in the file the
>>>> whole file is stored in $0.
>>>
>>> If all we want is a pattern that is principally non-existing wouldn't
>>> it be clearer to use something like "$^" (i.e. "^$" reversed), which
>>> is a meta-character sequence that does obviously not make any sense.
>>
>> Isn't $^ something that occurs at the end of every line? (End of this
>> line, beginning of the next)
>
> You're mixing up string and lines and records. `^` means start of a
> string and `$` means the end of string.
>
> In a line-oriented tool like grep or sed (without the GNU -z option and
> without using a hold space), the string in question is the line that was
> just read into memory and so `^` and `$` can be used to find the
> start/end of the line because the line in question is the whole string.
>
> In a record-oriented tool like awk, when used with the default RS of
> `\n`, the `^` and `$` can be used the same way as in sed or grep, but
> when used with a different RS the record can contain newlines so the `^`
> and `$` do not match the start and end of lines any more, they match the
> start and end of the record. If you use an RS that can't exist in the
> input then the whole input file is the record and so `^` matches the
> start of the input file while `$` matches the end of the input file.
>
> So no, `$^` does not occur anywhere in any input if we're assuming `$`
> and `^` to be anchor metachars in that expression (which they apparently
> are when using an ERE such as awk uses).
>
>>
>> Actually, I also have problems with ^$ since that would seem to mean an
>> empty line,
>
> No, it's an empty string which could be a line given some RS values or
> it could be an empty record that's part of a file given other RS values
> or it could be an empty file given yet other RS values.
>
>> which is certainly possible (LFLF) would seem to have an
>> empty line in it. But clearly I would have to know EXACTLY how awk
>> determines the start and end of a line.
>
> A line starts with the character following `^` or the previous `\n`. A
> line ends with the character before `$` or the next `\n`.
>
>     Ed.

Here are some examples showing what's matched between `^` and `$` when
the string in memory (the current awk record) is a line, a multi-line
paragraph, or a whole file, depending on the value of `RS`:

The sample input (courtesy of Robert Burns' "Tam O'Shanter", written
and set near my home town):

$ cat file
Ah, gentle dames! it gars me greet,
To think how mony counsels sweet,

How mony lengthen'd, sage advices,
The husband frae the wife despises!

Read 1 line at a time:

$ awk 'match($0,/^.*$/) { print "<" substr($0,RSTART,RLENGTH) ">" }' file
<Ah, gentle dames! it gars me greet,>
<To think how mony counsels sweet,>
<>
<How mony lengthen'd, sage advices,>
<The husband frae the wife despises!>

Read 1 paragraph at a time:

$ awk -v RS='' 'match($0,/^.*$/) { print "<" substr($0,RSTART,RLENGTH) ">" }' file
<Ah, gentle dames! it gars me greet,
To think how mony counsels sweet,>
<How mony lengthen'd, sage advices,
The husband frae the wife despises!>

Read the whole file at once (`RS='^$'` could be `RS='anything nonexistent'`):

$ awk -v RS='^$' 'match($0,/^.*$/) { print "<" substr($0,RSTART,RLENGTH) ">" }' file
<Ah, gentle dames! it gars me greet,
To think how mony counsels sweet,

How mony lengthen'd, sage advices,
The husband frae the wife despises!
>

As you can see, `^` and `$` always match the start and end of the record
that awk is currently processing, whether that's a line, a paragraph,
or a whole file. The only time `^` and `$` also identify the start/end
of a line is when the whole record is a single line.

To find lines in the multi-line paragraph case, you need to test for
`^|\n` at the start of a line (`^` to find the start of the first
line, `\n` to find subsequent ones) and/or `\n|$` at the end of a line
(`$` to find the end of the last line, `\n` to find earlier ones),
depending on what you want to do with it. You also need to account for
the fact that a matched `\n` is part of the matching string, unlike
`^` or `$`, e.g.

$ awk -v RS='' 'match($0,/^[^\n]*/) { print "<" substr($0,RSTART,RLENGTH) ">" }' file
<Ah, gentle dames! it gars me greet,>
<How mony lengthen'd, sage advices,>

$ awk -v RS='' 'match($0,/\n[^\n]*/) { print "<" substr($0,RSTART+1,RLENGTH-1) ">" }' file
<To think how mony counsels sweet,>
<The husband frae the wife despises!>
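When you want every line of a multi-line record rather than just the first or second, splitting the record on `\n` avoids the `^|\n` and `\n|$` bookkeeping altogether. A minimal sketch, assuming a POSIX awk; `paragraph_lines` is a hypothetical helper name and the input is made up:

```shell
# paragraph_lines (hypothetical helper): print every line of every
# paragraph, tagged with the paragraph number (NR) and line position.
paragraph_lines() {
    awk -v RS='' '{
        # The record holds the whole paragraph; splitting on "\n"
        # recovers its individual lines without any ^ or $ anchors.
        n = split($0, line, "\n")
        for (i = 1; i <= n; i++)
            print NR "." i ": <" line[i] ">"
    }'
}

printf 'a b\nc d\n\ne f\ng h' | paragraph_lines
```

This should print `1.1: <a b>`, `1.2: <c d>`, `2.1: <e f>` and `2.2: <g h>`, one per line. The input deliberately has no trailing newline, so the result does not depend on how a given awk treats a final newline in paragraph mode.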

Regards,

Ed.

[OT] GNU Awk's regexp RS behavior (was Re: Convert CR LF to CR)

<scjafq$itu$1@news-1.m-online.net>


https://www.novabbs.com/devel/article-flat.php?id=4114&group=comp.unix.shell#4114

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.mb-net.net!open-news-network.org!news.bgeserver.de!bgepartei.de!news.m-online.net!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: [OT] GNU Awk's regexp RS behavior (was Re: Convert CR LF to CR)
Date: Tue, 13 Jul 2021 08:05:46 +0200
Organization: (posted via) M-net Telekommunikations GmbH
Lines: 49
Message-ID: <scjafq$itu$1@news-1.m-online.net>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
<schean$k7q$1@dont-email.me> <schjcq$3ck$1@news-1.m-online.net>
<schmss$lpb$1@dont-email.me> <sci9sh$9lr$1@news-1.m-online.net>
<sciag4$1fv$1@dont-email.me> <scib4b$a13$1@news-1.m-online.net>
<scido5$l0v$1@dont-email.me>
NNTP-Posting-Host: 2001:a61:241e:cc01:c8cd:9774:cf36:bff0
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Trace: news-1.m-online.net 1626156346 19390 2001:a61:241e:cc01:c8cd:9774:cf36:bff0 (13 Jul 2021 06:05:46 GMT)
X-Complaints-To: news@news-1.m-online.net
NNTP-Posting-Date: Tue, 13 Jul 2021 06:05:46 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
In-Reply-To: <scido5$l0v$1@dont-email.me>
 by: Janis Papanagnou - Tue, 13 Jul 2021 06:05 UTC

On 12.07.2021 23:55, Ed Morton wrote:
>>
>> $ echo $'a$^b\na$^b' | awk 'BEGIN{RS="$^"}{print NR, $0}'
>> 1 a$^b
>> a$^b
>
> I stand corrected, apparently that's a difference between BREs and EREs:
>
> $ echo 'a$^b' | grep '$^'
> a$^b
>
> $ echo 'a$^b' | grep -E '$^'
> $
>
> $ echo 'a$^b' | sed 's/$^/X/'
> aXb
>
> $ echo 'a$^b' | sed -E 's/$^/X/'
> a$^b
>
> See
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_09
> which I thought was saying you could use any character before the `^`
> and it wouldn't match which was supported by this test:
>
> $ printf 'ax^b\nax^b\n' | awk 'BEGIN{RS="x^"}{print NR, $0}'
> 1 ax^b
> ax^b
>
> but then I can't explain this which is apparently just ignoring the RS
> setting:
>
> $ printf 'a.^b\na.^b\n' | awk 'BEGIN{RS=".^"}{print NR, $0}'
> 1 a.^b
> 2 a.^b

$ printf 'a.^b\na.^b\n' | awk 'BEGIN{RS="[.]^"}{print NR, $0}'
1 a.^b
a.^b

It seems ".^" was not "ignored" as RS but interpreted as a string?

Janis

>
> Regards,
>
> Ed.

Re: [OT] GNU Awk's regexp RS behavior (was Re: Convert CR LF to CR)

<scjceo$jdh$1@news-1.m-online.net>


https://www.novabbs.com/devel/article-flat.php?id=4115&group=comp.unix.shell#4115

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.mb-net.net!open-news-network.org!news.bgeserver.de!bgepartei.de!news.m-online.net!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: [OT] GNU Awk's regexp RS behavior (was Re: Convert CR LF to CR)
Date: Tue, 13 Jul 2021 08:39:20 +0200
Organization: (posted via) M-net Telekommunikations GmbH
Lines: 28
Message-ID: <scjceo$jdh$1@news-1.m-online.net>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
<schean$k7q$1@dont-email.me> <schjcq$3ck$1@news-1.m-online.net>
<schmss$lpb$1@dont-email.me> <sci9sh$9lr$1@news-1.m-online.net>
<sciag4$1fv$1@dont-email.me> <scib4b$a13$1@news-1.m-online.net>
<scido5$l0v$1@dont-email.me> <scjafq$itu$1@news-1.m-online.net>
NNTP-Posting-Host: 2001:a61:241e:cc01:c8cd:9774:cf36:bff0
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Trace: news-1.m-online.net 1626158360 19889 2001:a61:241e:cc01:c8cd:9774:cf36:bff0 (13 Jul 2021 06:39:20 GMT)
X-Complaints-To: news@news-1.m-online.net
NNTP-Posting-Date: Tue, 13 Jul 2021 06:39:20 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
In-Reply-To: <scjafq$itu$1@news-1.m-online.net>
 by: Janis Papanagnou - Tue, 13 Jul 2021 06:39 UTC

On 13.07.2021 08:05, Janis Papanagnou wrote:
> On 12.07.2021 23:55, Ed Morton wrote:
>>
>> but then I can't explain this which is apparently just ignoring the RS
>> setting:
>>
>> $ printf 'a.^b\na.^b\n' | awk 'BEGIN{RS=".^"}{print NR, $0}'
>> 1 a.^b
>> 2 a.^b
>
> $ printf 'a.^b\na.^b\n' | awk 'BEGIN{RS="[.]^"}{print NR, $0}'
> 1 a.^b
> a.^b
>
> It seems ".^" was not "ignored" as RS but interpreted as a string?

...which still wouldn't explain the outcome of your test case, though.

Hmm..

> Janis
>
>>
>> Regards,
>>
>> Ed.
>

Re: [OT] GNU Awk's regexp RS behavior (was Re: Convert CR LF to CR)

<sck3n6$7qe$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=4116&group=comp.unix.shell#4116

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: mortons...@gmail.com (Ed Morton)
Newsgroups: comp.unix.shell
Subject: Re: [OT] GNU Awk's regexp RS behavior (was Re: Convert CR LF to CR)
Date: Tue, 13 Jul 2021 08:16:22 -0500
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <sck3n6$7qe$1@dont-email.me>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
<schean$k7q$1@dont-email.me> <schjcq$3ck$1@news-1.m-online.net>
<schmss$lpb$1@dont-email.me> <sci9sh$9lr$1@news-1.m-online.net>
<sciag4$1fv$1@dont-email.me> <scib4b$a13$1@news-1.m-online.net>
<scido5$l0v$1@dont-email.me> <scjafq$itu$1@news-1.m-online.net>
<scjceo$jdh$1@news-1.m-online.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 13 Jul 2021 13:16:22 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="7253244bb523020778fe88aec70683be";
logging-data="8014"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+P9n9JYU9H1X78P8J9wOwP"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:lO/yJ45BhwKM0kidTXm8EVlpmuU=
In-Reply-To: <scjceo$jdh$1@news-1.m-online.net>
X-Antivirus-Status: Clean
Content-Language: en-US
X-Antivirus: Avast (VPS 210712-4, 7/12/2021), Outbound message
 by: Ed Morton - Tue, 13 Jul 2021 13:16 UTC

On 7/13/2021 1:39 AM, Janis Papanagnou wrote:
> On 13.07.2021 08:05, Janis Papanagnou wrote:
>> On 12.07.2021 23:55, Ed Morton wrote:
>>>
>>> but then I can't explain this which is apparently just ignoring the RS
>>> setting:
>>>
>>> $ printf 'a.^b\na.^b\n' | awk 'BEGIN{RS=".^"}{print NR, $0}'
>>> 1 a.^b
>>> 2 a.^b
>>
>> $ printf 'a.^b\na.^b\n' | awk 'BEGIN{RS="[.]^"}{print NR, $0}'
>> 1 a.^b
>> a.^b
>>
>> It seems ".^" was not "ignored" as RS but interpreted as a string?
>
> ...which still wouldn't explain the outcome of your test case, though.
>
> Hmm..
>

I talked to Arnold (Robbins, the gawk maintainer) and he thinks it's a
bug, which he's investigating.
See https://lists.gnu.org/archive/html/bug-gawk/2021-07/msg00026.html.

Ed.
