Re: Convert CR LF to CR

<15f5f2d2-1da2-48d2-bf32-4f138e0a5eb4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=4081&group=comp.unix.shell#4081
X-Received: by 2002:a05:620a:817:: with SMTP id s23mr1491335qks.296.1625927958057;
Sat, 10 Jul 2021 07:39:18 -0700 (PDT)
X-Received: by 2002:a0c:9a03:: with SMTP id p3mr42044520qvd.40.1625927957962;
Sat, 10 Jul 2021 07:39:17 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!usenet.pasdenom.info!usenet-fr.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.unix.shell
Date: Sat, 10 Jul 2021 07:39:17 -0700 (PDT)
In-Reply-To: <87k0lz6my1.fsf@nosuchdomain.example.com>
Injection-Info: google-groups.googlegroups.com; posting-host=96.49.148.18; posting-account=0cwH1wkAAACyHC8RuqgGpRum9kOwb46T
NNTP-Posting-Host: 96.49.148.18
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com> <87k0lz6my1.fsf@nosuchdomain.example.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <15f5f2d2-1da2-48d2-bf32-4f138e0a5eb4n@googlegroups.com>
Subject: Re: Convert CR LF to CR
From: harryooo...@hotmail.com (Harry)
Injection-Date: Sat, 10 Jul 2021 14:39:18 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Harry - Sat, 10 Jul 2021 14:39 UTC

On Friday, July 9, 2021 at 5:18:36 PM UTC-7, Keith Thompson wrote:

> #include <stdio.h>
> int main(void) {
>     int prev = 0;
>     int c;
>     while ((c = getchar()) != EOF) {
>         if (! (prev == '\r' && c == '\n')) {
>             putchar(c);
>         }
>         prev = c;
>     }
> }

It works very well, thanks.

Re: Convert CR LF to CR

<8735sl7mhw.fsf@nosuchdomain.example.com>

https://www.novabbs.com/devel/article-flat.php?id=4084&group=comp.unix.shell#4084
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Sat, 10 Jul 2021 16:55:07 -0700
Organization: None to speak of
Lines: 83
Message-ID: <8735sl7mhw.fsf@nosuchdomain.example.com>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="68bb88e2a94ee9aa4c90ad022c4869a2";
logging-data="27022"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19tPb6+ewlsGB6Yw5taMUie"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:LE2eKK08xsbn2uhuDQKENZdvwgM=
sha1:ZwWq11ct9IsRfbOWMOhgIcl43CM=
 by: Keith Thompson - Sat, 10 Jul 2021 23:55 UTC

Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
> On 10.07.2021 13:02, Spiros Bousbouras wrote:
>> By the way , for this kind of thing the first thing to cross my mind would be
>> to use C because for this problem it actually provides the clearest code.
>
> And likely also the fastest, which would matter if - as I would expect
> with health data - a lot of data has to be handled.
>
>> So
>> something like Keith's code in <87k0lz6my1.fsf@nosuchdomain.example.com>
>> would be my preference.
>
> For small data sets a quick awk prototype solution might be preferable,
> and also if there's no C development environment available in the OP's
> working context.

I thought of using C because I found it clearer to think of the input
and output as character-oriented, not line-oriented. awk processing is
line-oriented.

On the other hand, there's more than one way to describe the problem.

One way, which led to my C solution, was:

    Copy input *characters* to output, except that every occurrence of
    CRLF is replaced by CR.

An equivalent way to describe it is:

    Copy input *lines* to output, except that any line ending in CR is
    printed without a trailing newline (think "echo -n" vs. "echo").

(This assumes a "line" is terminated by a single LF character.)

The input is line-oriented. The output is not.

That led me to this awk solution, which is similar to the one Kaz
posted, except that mine is a bit more verbose. It might be easier for
someone who's not an awk expert to follow. (My awk is a bit rusty.)

#!/usr/bin/awk -f

{
    if (/\r$/) {
        printf("%s", $0);
    }
    else {
        print
    }
}

Awk's input is performed a line at a time, discarding the terminating
newline character. The "print" statement adds a newline character (and
prints $0, the input line, if you don't give it an argument). "printf"
prints exactly what you tell it to, printing a newline only if you
specify it. (`awk '{print}'` copies input to output *except* that if
the input doesn't end in a newline, it will add one.)

And here's a Perl one-liner solution:

perl -pe 'chomp if /\r$/'

Perl, like awk, is line-oriented, but it doesn't discard newlines on
input. "chomp" deletes a trailing newline. "-p" tells Perl to do an
awk-like loop copying input lines to output. An equivalent Perl script
that doesn't use special command-line options:

#!/usr/bin/perl

# The following two lines don't matter in this case, but are good Perl
# practice in general.
use strict;
use warnings;

while (<>) {
    chomp if /\r$/;
    print;
}
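
For comparison, the line-oriented description of the problem can also be sketched in C. This is only an illustration working on an in-memory string; the function name `copy_lines_drop_lf_after_cr` and the caller-supplied output buffer are assumptions made for the example, not part of any solution posted in this thread.

```c
#include <string.h>

/*
 * Line-oriented sketch: copy `in` to `out` (a caller-supplied buffer of at
 * least strlen(in) + 1 bytes). A line that ends in CR is written without its
 * trailing LF; every other line is copied together with its LF.
 * Returns the number of bytes written, not counting the terminating NUL.
 */
size_t copy_lines_drop_lf_after_cr(const char *in, char *out)
{
    size_t n = 0;

    while (*in != '\0') {
        /* Find the end of the current line (LF or end of input). */
        const char *lf = strchr(in, '\n');
        size_t len = lf ? (size_t)(lf - in) : strlen(in);

        memcpy(out + n, in, len);       /* the line without its LF */
        n += len;

        if (lf == NULL) {
            break;                      /* last line had no LF at all */
        }
        if (len == 0 || in[len - 1] != '\r') {
            out[n++] = '\n';            /* comparable to awk's "print" */
        }
        /* otherwise comparable to printf("%s", $0): no newline is added */
        in = lf + 1;
    }
    out[n] = '\0';
    return n;
}
```

Unlike `awk '{print}'`, this sketch does not add a newline to a final line that lacks one.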

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

Re: Convert CR LF to CR

<scdj64$ud3$1@news-1.m-online.net>

https://www.novabbs.com/devel/article-flat.php?id=4086&group=comp.unix.shell#4086
Path: i2pn2.org!i2pn.org!aioe.org!news.mixmin.net!news2.arglkargh.de!news.karotte.org!news.space.net!news.m-online.net!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Sun, 11 Jul 2021 03:57:24 +0200
Organization: (posted via) M-net Telekommunikations GmbH
Lines: 18
Message-ID: <scdj64$ud3$1@news-1.m-online.net>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com>
NNTP-Posting-Host: 2001:a61:241e:cc01:4048:ff35:82b7:bf20
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Trace: news-1.m-online.net 1625968644 31139 2001:a61:241e:cc01:4048:ff35:82b7:bf20 (11 Jul 2021 01:57:24 GMT)
X-Complaints-To: news@news-1.m-online.net
NNTP-Posting-Date: Sun, 11 Jul 2021 01:57:24 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
In-Reply-To: <8735sl7mhw.fsf@nosuchdomain.example.com>
 by: Janis Papanagnou - Sun, 11 Jul 2021 01:57 UTC

On 11.07.2021 01:55, Keith Thompson wrote:
>
> I thought of using C because I found it clearer to think of the input
> and output as character-oriented, not line-oriented.

Definitely.

> awk processing is line-oriented.

Unless you use GNU Awk with a "stream processing" setup.

>
> On the other hand, there's more than one way to describe the problem.

Sure.

Janis

Re: Convert CR LF to CR

<87r1g55xgs.fsf@nosuchdomain.example.com>

https://www.novabbs.com/devel/article-flat.php?id=4087&group=comp.unix.shell#4087
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Sat, 10 Jul 2021 20:41:07 -0700
Organization: None to speak of
Lines: 28
Message-ID: <87r1g55xgs.fsf@nosuchdomain.example.com>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com>
<scdj64$ud3$1@news-1.m-online.net>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="68bb88e2a94ee9aa4c90ad022c4869a2";
logging-data="20648"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/oOT/nGYX4mX0TNiT5n9Oq"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:wSxOLmsYNMfpMEvmS/nohHeIJFM=
sha1:Q6pG1AfD93IIOXYMcA/uxCP4uKc=
 by: Keith Thompson - Sun, 11 Jul 2021 03:41 UTC

Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
> On 11.07.2021 01:55, Keith Thompson wrote:
>> I thought of using C because I found it clearer to think of the input
>> and output as character-oriented, not line-oriented.
>
> Definitely.
>
>> awk processing is line-oriented.
>
> Unless you use GNU Awk with a "stream processing" setup.

I don't see anything about that in the gawk manual. Do you have a
reference?

For myself, if a problem isn't strictly line-oriented I generally treat
that as a sign that I should use something other than awk, but I'm sure
it's more powerful than what I'm aware of.

(There's a comp.lang.awk newsgroup, so perhaps this isn't quite topical.)

>> On the other hand, there's more than one way to describe the problem.
>
> Sure.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

Re: Convert CR LF to CR

<scehh5$76d$1@news-1.m-online.net>

https://www.novabbs.com/devel/article-flat.php?id=4089&group=comp.unix.shell#4089
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.mixmin.net!news2.arglkargh.de!news.karotte.org!news.space.net!news.m-online.net!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Sun, 11 Jul 2021 12:35:17 +0200
Organization: (posted via) M-net Telekommunikations GmbH
Lines: 52
Message-ID: <scehh5$76d$1@news-1.m-online.net>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com>
NNTP-Posting-Host: 2001:a61:241e:cc01:4048:ff35:82b7:bf20
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Trace: news-1.m-online.net 1625999717 7373 2001:a61:241e:cc01:4048:ff35:82b7:bf20 (11 Jul 2021 10:35:17 GMT)
X-Complaints-To: news@news-1.m-online.net
NNTP-Posting-Date: Sun, 11 Jul 2021 10:35:17 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
X-Enigmail-Draft-Status: N1110
In-Reply-To: <87r1g55xgs.fsf@nosuchdomain.example.com>
 by: Janis Papanagnou - Sun, 11 Jul 2021 10:35 UTC

On 11.07.2021 05:41, Keith Thompson wrote:
> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>> On 11.07.2021 01:55, Keith Thompson wrote:
>>> I thought of using C because I found it clearer to think of the input
>>> and output as character-oriented, not line-oriented.
>>
>> Definitely.
>>
>>> awk processing is line-oriented.
>>
>> Unless you use GNU Awk with a "stream processing" setup.
>
> I don't see anything about that in the gawk manual. Do you have a
> reference?

I don't recall whether it was mentioned in Arnold's Awk book or
discussed in comp.lang.awk (or both); I have rarely used it myself, and
that was decades ago. A quick test (from memory) shows how to activate
that feature: set RS to NUL. The following script

awk 'BEGIN{RS="\0"} END{print NR}'

will print 1 (for non-empty files), meaning that it has processed one
record.

> For myself, if a problem isn't strictly line-oriented I generally treat
> that as a sign that I should use something other than awk, but I'm sure
> it's more powerful than what I'm aware of.

Actually, you can expect GNU Awk to support quite a few useful features
that the Awk standard doesn't, such as allowing RS to be a regular
expression (to name one prominent and important example).

GNU Awk's widespread availability (on many platforms), its being open
source, and its still being actively supported and developed make it
effectively the quasi-standard tool for awk.

Mind that here in comp.unix.shell code is regularly posted that
contains bash-specifics, often without this fact being mentioned. I
think it should always be mentioned when non-standard constructs are
(necessarily or unnecessarily) used, so folks can decide whether the
solution is suited or not, and the same (IMO to a lesser degree,
because of its being a quasi-standard) is true for GNU Awk.

> (There's a comp.lang.awk newsgroup, so perhaps this isn't quite topical.)

Awk solutions are fine here. For discussions specific to the Awk
language or its tools, the awk newsgroup might be better suited. Though
I don't see that we went into any gory awk details in our discussion here.

Janis

Reading a file all in one go in GAWK (Was: Convert CR LF to CR)

<scem7t$3c1g9$1@news.xmission.com>

https://www.novabbs.com/devel/article-flat.php?id=4091&group=comp.unix.shell#4091
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!xmission!nnrp.xmission!.POSTED.shell.xmission.com!not-for-mail
From: gaze...@shell.xmission.com (Kenny McCormack)
Newsgroups: comp.unix.shell
Subject: Reading a file all in one go in GAWK (Was: Convert CR LF to CR)
Date: Sun, 11 Jul 2021 11:55:41 -0000 (UTC)
Organization: The official candy of the new Millennium
Message-ID: <scem7t$3c1g9$1@news.xmission.com>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com> <scdj64$ud3$1@news-1.m-online.net> <87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
Injection-Date: Sun, 11 Jul 2021 11:55:41 -0000 (UTC)
Injection-Info: news.xmission.com; posting-host="shell.xmission.com:166.70.8.4";
logging-data="3540489"; mail-complaints-to="abuse@xmission.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: gazelle@shell.xmission.com (Kenny McCormack)
 by: Kenny McCormack - Sun, 11 Jul 2021 11:55 UTC

In article <scehh5$76d$1@news-1.m-online.net>,
Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
....
>>> Unless you use GNU Awk with a "stream processing" setup.
>>
>> I don't see anything about that in the gawk manual. Do you have a
>> reference?
>
>I don't recall whether it was mentioned in Arnold's Awk book and/or
>also discussed in comp.lang.awk, and myself I rarely used it and it
>was decades ago.

I don't think "stream processing" is a GAWK "thing". I think you are
mis-remembering that (to put it kindly).

From what you say below, I am going to guess that what you are attempting
to evoke is the idea of reading in a whole file at once, and then
processing through the buffer. Sort of like what you might do in a
low-level language like C.

>A quick test (from memory) shows how to activate
>that feature, by setting RS to NUL. The following script
>
> awk 'BEGIN{RS="\0"} END{print NR}'

This doesn't do what you think it does. In fact, setting RS to a NUL
character is perfectly legitimate and will do exactly that. This can be
useful when processing the /proc/*/environ files in Linux. E.g.,

gawk 'BEGIN { RS = "\0" } { print NR,$0 }' /proc/self/environ | less

>will print 1 (for non-empty files), meaning that it has processed one
>record.

Only if there are no actual nulls in the file.
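
To make the record-counting point concrete, here is a small C sketch of how a record count depends on which byte is the separator. It is an illustration of the idea only, not how gawk is implemented; `count_records` is a hypothetical name.

```c
#include <stddef.h>

/*
 * Count records in buf[0..len) when records are terminated by the single
 * byte `sep`. Trailing data without a final separator still counts as a
 * record. An empty buffer holds no records.
 */
size_t count_records(const unsigned char *buf, size_t len, unsigned char sep)
{
    size_t records = 0;
    size_t start = 0;   /* index where the current record begins */

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == sep) {
            records++;
            start = i + 1;
        }
    }
    if (start < len) {
        records++;      /* trailing data after the last separator */
    }
    return records;
}
```

With a NUL separator, a text buffer containing no NUL bytes comes out as a single record, while NUL-delimited data such as an environ-style buffer comes out as one record per entry.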

Now, what I *think* you are going for is that if you set RS="^$", then it
*is* guaranteed to never match (as long as the file is not empty). This
has been discussed in the newsgroup
(comp.lang.awk), and has been codified as the following include file
(readfile.awk), which I have in my "awksrc" directory (accessed via the
AWKPATH environment variable) on all of my systems:

--- Cut Here ---
# readfile.awk --- read an entire file at once
#
# Original idea by Denis Shirokov, cosmogen@gmail.com, April 2013
#

function readfile(file,     tmp, save_rs)
{
    save_rs = RS
    RS = "^$"
    getline tmp < file
    close(file)
    RS = save_rs

    return tmp
}
--- Cut Here ---

This could be used to solve OP's problem (assuming we ever actually figure
out what OP's problem is), assuming the file isn't too large.
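
The "read it all, then work through the buffer" approach might look roughly like this in C. It is a sketch under assumptions: `slurp` and `drop_lf_after_cr` are made-up names, and the whole input is assumed to fit in memory.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read all of `fp` into a malloc'd buffer; the caller frees it.
 * Returns NULL on allocation or read failure. */
char *slurp(FILE *fp, size_t *len_out)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) return NULL;

    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap) break;           /* short read: EOF or error */
        cap *= 2;                       /* buffer full: grow and go on */
        char *tmp = realloc(buf, cap);
        if (tmp == NULL) { free(buf); return NULL; }
        buf = tmp;
    }
    if (ferror(fp)) { free(buf); return NULL; }
    *len_out = len;
    return buf;
}

/* In place: drop every LF that directly follows a CR.
 * Returns the new length of the data in buf. */
size_t drop_lf_after_cr(char *buf, size_t len)
{
    size_t w = 0;                       /* write index, never ahead of r */
    for (size_t r = 0; r < len; r++) {
        if (buf[r] == '\n' && r > 0 && buf[r - 1] == '\r') continue;
        buf[w++] = buf[r];
    }
    return w;
}
```

A caller would typically slurp stdin, run `drop_lf_after_cr` over the buffer, and `fwrite` the first returned-length bytes to stdout.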

P.S. I posted on this thread a few days ago. To date, mine is the only
actual solution to OP's problem as originally described. Everything else
(i.e., all the other posts on this thread) has been noise.

--
BigBusiness types (aka, Republicans/Conservatives/Independents/Liberatarians/whatevers)
don't hate big government. They *love* big government as a means for them to get
rich, sucking off the public teat. What they don't like is *democracy* - you know,
like people actually having the right to vote and stuff like that.

Re: Reading a file all in one go in GAWK (Was: Convert CR LF to CR)

<scenni$8ts$1@news-1.m-online.net>

https://www.novabbs.com/devel/article-flat.php?id=4092&group=comp.unix.shell#4092
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.mixmin.net!news2.arglkargh.de!news.karotte.org!news.space.net!news.m-online.net!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: Reading a file all in one go in GAWK (Was: Convert CR LF to CR)
Date: Sun, 11 Jul 2021 14:21:06 +0200
Organization: (posted via) M-net Telekommunikations GmbH
Lines: 71
Message-ID: <scenni$8ts$1@news-1.m-online.net>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<scdj64$ud3$1@news-1.m-online.net> <87r1g55xgs.fsf@nosuchdomain.example.com>
<scehh5$76d$1@news-1.m-online.net> <scem7t$3c1g9$1@news.xmission.com>
NNTP-Posting-Host: 2001:a61:241e:cc01:4048:ff35:82b7:bf20
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Trace: news-1.m-online.net 1626006066 9148 2001:a61:241e:cc01:4048:ff35:82b7:bf20 (11 Jul 2021 12:21:06 GMT)
X-Complaints-To: news@news-1.m-online.net
NNTP-Posting-Date: Sun, 11 Jul 2021 12:21:06 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
X-Enigmail-Draft-Status: N1110
In-Reply-To: <scem7t$3c1g9$1@news.xmission.com>
 by: Janis Papanagnou - Sun, 11 Jul 2021 12:21 UTC

On 11.07.2021 13:55, Kenny McCormack wrote:
> In article <scehh5$76d$1@news-1.m-online.net>,
> Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
> ...
>>>> Unless you use GNU Awk with a "stream processing" setup.
>>>
>>> I don't see anything about that in the gawk manual. Do you have a
>>> reference?
>>
>> I don't recall whether it was mentioned in Arnold's Awk book and/or
>> also discussed in comp.lang.awk, and myself I rarely used it and it
>> was decades ago.
>
> I don't think "stream processing" is a GAWK "thing". I think you are
> mis-remembering that (to put it kindly).

It is, strictly, not stream processing and that's why I wrote "stream
processing" (in quotes). But I don't think I am misremembering because
it serves the intended purpose, which was to regard the various line
separator characters in the text file as part of the data (as opposed
to being record/line separators).

>
> From what you say below, I am going to guess that what you are attempting
> to evoke is the idea of reading in a whole file at once, and then
> processing through the buffer.

Effectively, yes.

> Sort of like what you might do in a low-level language like C.

Not quite. In C I usually do real stream processing rather than
loading the whole file into a record buffer (an array).
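
As an illustration of what stream processing can mean here, the following C sketch filters the data chunk by chunk and carries only one byte of state between chunks, so memory use does not grow with the file size. The struct and function names are hypothetical, chosen for this example.

```c
#include <stddef.h>

/* Carries the one byte of context needed across chunk boundaries. */
struct crlf_filter {
    int prev;   /* previous input byte, or 0 before any input */
};

/*
 * Filter one chunk: copy in[0..n) to out, dropping each LF that directly
 * follows a CR, even when that CR was the last byte of the previous chunk.
 * `out` must have room for n bytes. Returns the number of bytes written.
 */
size_t crlf_filter_chunk(struct crlf_filter *f,
                         const char *in, size_t n, char *out)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        int c = (unsigned char)in[i];
        if (!(f->prev == '\r' && c == '\n')) {
            out[w++] = in[i];
        }
        f->prev = c;    /* remember the input byte, kept or dropped */
    }
    return w;
}
```

A driver loop would zero-initialize the struct, then repeatedly `fread` a fixed-size chunk, call `crlf_filter_chunk`, and `fwrite` the result.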

>
>> A quick test (from memory) shows how to activate
>> that feature, by setting RS to NUL. The following script
>>
>> awk 'BEGIN{RS="\0"} END{print NR}'
>
> This doesn't do what you think it does.

It always did what I thought it would do. (What do you think I have
thought it would do?)

> In fact, setting RS to a NUL
> character is perfectly legitimate and will do exactly that. This can be
> useful when processing the /proc/*/environ files in Linux. E.g.,
>
> gawk 'BEGIN { RS = "\0" } { print NR,$0 }' /proc/self/environ | less
>
>> will print 1 (for non-empty files), meaning that it has processed one
>> record.
>
> Only if there are no actual nulls in the file.

We have been considering text files in this thread (not binary). I have
never considered awk to be a binary file processor myself, but if one
wants to use it that way I'm of course fine with it.

>
> Now, what I *think* you are going for is that if you set RS="^$", then it
> *is* guaranteed to never match.

This is an alternative (and works for your binary file case as well).

Janis

> [...]

Re: Reading a file all in one go in GAWK

<87fswlvw0e.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=4093&group=comp.unix.shell#4093
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.unix.shell
Subject: Re: Reading a file all in one go in GAWK
Date: Sun, 11 Jul 2021 14:08:01 +0100
Organization: A noiseless patient Spider
Lines: 12
Message-ID: <87fswlvw0e.fsf@bsb.me.uk>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com>
<scehh5$76d$1@news-1.m-online.net> <scem7t$3c1g9$1@news.xmission.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="5b0a93f32d51033d1e588d0a2f6a1fd7";
logging-data="6283"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+3uxuipu4KaiYDhOLDasyE1j8jLYmyCyE="
Cancel-Lock: sha1:YZxgoJ5+mjXK2exR8/CYsw5bki0=
sha1:tHeAYXf6h0ThRK6TWAFgKrH5Uh4=
X-BSB-Auth: 1.9658edd0f6b37dde8714.20210711140801BST.87fswlvw0e.fsf@bsb.me.uk
 by: Ben Bacarisse - Sun, 11 Jul 2021 13:08 UTC

gazelle@shell.xmission.com (Kenny McCormack) writes:

> P.S. I posted on this thread a few days ago. To date, mine is the only
> actual solution to OP's problem as originally described. Everything else
> (i.e., all the other posts on this thread) has been noise.

Including the three the OP said worked well? Odd. I thought your
suggestion was intended as a joke (at the OP's expense) but maybe you
were being serious.

--
Ben.

GNU Awk bulk load of file (manual references) (was Re: Convert CR LF to CR)

<sch1vd$u3m$1@news-1.m-online.net>

https://www.novabbs.com/devel/article-flat.php?id=4094&group=comp.unix.shell#4094
Path: i2pn2.org!rocksolid2!i2pn.org!weretis.net!feeder8.news.weretis.net!news.mixmin.net!news2.arglkargh.de!news.karotte.org!news.space.net!news.m-online.net!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: GNU Awk bulk load of file (manual references) (was Re: Convert CR LF to CR)
Date: Mon, 12 Jul 2021 11:28:13 +0200
Organization: (posted via) M-net Telekommunikations GmbH
Lines: 20
Message-ID: <sch1vd$u3m$1@news-1.m-online.net>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com>
NNTP-Posting-Host: 2001:a61:241e:cc01:f8fa:19e:e572:d210
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Trace: news-1.m-online.net 1626082093 30838 2001:a61:241e:cc01:f8fa:19e:e572:d210 (12 Jul 2021 09:28:13 GMT)
X-Complaints-To: news@news-1.m-online.net
NNTP-Posting-Date: Mon, 12 Jul 2021 09:28:13 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
X-Enigmail-Draft-Status: N1110
In-Reply-To: <87r1g55xgs.fsf@nosuchdomain.example.com>
 by: Janis Papanagnou - Mon, 12 Jul 2021 09:28 UTC

On 11.07.2021 05:41, Keith Thompson wrote:
>> Unless you use GNU Awk with a [sort of] "stream processing" setup.
>
> I don't see anything about that in the gawk manual. Do you have a
> reference?

It's indeed mentioned in Arnold Robbins's book and also in the online
GNU Awk manual (see the section "Record Splitting with gawk" for "\0"):
https://www.gnu.org/software/gawk/manual/gawk.html

The current version of the manual also mentions "^$" in the context of
a getline-based readfile function:
https://www.gnu.org/software/gawk/manual/gawk.html#Readfile-Function

And finally there's mention of an extension library function to read
an entire file at once:
https://www.gnu.org/software/gawk/manual/gawk.html#Extension-Sample-Readfile

Janis

Re: GNU Awk bulk load of file (manual references) (was Re: Convert CR LF to CR)

<sch571$3dba4$1@news.xmission.com>

https://www.novabbs.com/devel/article-flat.php?id=4095&group=comp.unix.shell#4095
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!xmission!nnrp.xmission!.POSTED.shell.xmission.com!not-for-mail
From: gaze...@shell.xmission.com (Kenny McCormack)
Newsgroups: comp.unix.shell
Subject: Re: GNU Awk bulk load of file (manual references) (was Re: Convert CR LF to CR)
Date: Mon, 12 Jul 2021 10:23:29 -0000 (UTC)
Organization: The official candy of the new Millennium
Message-ID: <sch571$3dba4$1@news.xmission.com>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com> <scdj64$ud3$1@news-1.m-online.net> <87r1g55xgs.fsf@nosuchdomain.example.com> <sch1vd$u3m$1@news-1.m-online.net>
Injection-Date: Mon, 12 Jul 2021 10:23:29 -0000 (UTC)
Injection-Info: news.xmission.com; posting-host="shell.xmission.com:166.70.8.4";
logging-data="3583300"; mail-complaints-to="abuse@xmission.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: gazelle@shell.xmission.com (Kenny McCormack)
 by: Kenny McCormack - Mon, 12 Jul 2021 10:23 UTC

In article <sch1vd$u3m$1@news-1.m-online.net>,
Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
>On 11.07.2021 05:41, Keith Thompson wrote:
>>> Unless you use GNU Awk with a [sort of] "stream processing" setup.
>>
>> I don't see anything about that in the gawk manual. Do you have a
>> reference?
>
>It's indeed mentioned in Arnold Robbins's book and also in the GNU Awk
>manual online (see chapter "Record Splitting with gawk" for "\0").
>https://www.gnu.org/software/gawk/manual/gawk.html

As I've already explained to you, there is absolutely nothing special or
unique about setting RS = "\0".

It works entirely as expected. I even gave you an example of using it to
parse Linux proc "environ" files. You can also use it to parse the
"cmdline" file. There are other files in /proc as well that are delimited
by null characters.

--
If you ask a Trumper who is to blame for the debacle of Jan 6, they will almost certainly say
something about Antifa/BLM/something/whatever. This shows just how screwed up they are; they can't
even get their narrative straight. What they *should* say is "Eugene Goodman". If not for him, the plot
would probably have succeeded, so he (Eugene) is clearly to blame for the failure.

Re: GNU Awk bulk load of file (manual references) (was Re: Convert CR LF to CR)

<schat7$t8$1@news-1.m-online.net>

https://www.novabbs.com/devel/article-flat.php?id=4096&group=comp.unix.shell#4096
Path: i2pn2.org!rocksolid2!i2pn.org!aioe.org!news.mb-net.net!open-news-network.org!news.bgeserver.de!bgepartei.de!news.m-online.net!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: GNU Awk bulk load of file (manual references) (was Re: Convert CR LF to CR)
Date: Mon, 12 Jul 2021 14:00:39 +0200
Organization: (posted via) M-net Telekommunikations GmbH
Lines: 43
Message-ID: <schat7$t8$1@news-1.m-online.net>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<scdj64$ud3$1@news-1.m-online.net> <87r1g55xgs.fsf@nosuchdomain.example.com>
<sch1vd$u3m$1@news-1.m-online.net> <sch571$3dba4$1@news.xmission.com>
NNTP-Posting-Host: 2001:a61:241e:cc01:f8fa:19e:e572:d210
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Trace: news-1.m-online.net 1626091239 936 2001:a61:241e:cc01:f8fa:19e:e572:d210 (12 Jul 2021 12:00:39 GMT)
X-Complaints-To: news@news-1.m-online.net
NNTP-Posting-Date: Mon, 12 Jul 2021 12:00:39 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
X-Enigmail-Draft-Status: N1110
In-Reply-To: <sch571$3dba4$1@news.xmission.com>
 by: Janis Papanagnou - Mon, 12 Jul 2021 12:00 UTC

On 12.07.2021 12:23, Kenny McCormack wrote:
> In article <sch1vd$u3m$1@news-1.m-online.net>,
> Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
>> On 11.07.2021 05:41, Keith Thompson wrote:
>>>> Unless you use GNU Awk with a [sort of] "stream processing" setup.
>>>
>>> I don't see anything about that in the gawk manual. Do you have a
>>> reference?
>>
>> It's indeed mentioned in Arnold Robbins's book and also in the GNU Awk
>> manual online (see chapter "Record Splitting with gawk" for "\0").
>> https://www.gnu.org/software/gawk/manual/gawk.html
>
> As I've already explained to you, there is absolutely nothing special or
> unique about setting RS = "\0".

Depends on your mental image of "special". What is special is that it
doesn't work portably; that's even explained in the GNU Awk manual, IIRC,
since you cannot expect awks that implement Awk strings as C strings
to handle it. In GNU Awk, OTOH, it is _technically_ not special, but it
has been mentioned in the past that it can be used to achieve what
we're talking about here in this thread. That's all. Is that so hard
to accept?

I'm not sure what your problem is. Are you trying to teach me how to
breathe? Do you feel misunderstood by me?

>
> It works entirely as expected.

Yes. I also wrote upthread "It always did what I thought it would do."
(in response to your "This doesn't do what you think it does." hubris).

> I even gave you an example of using it to
> parse Linux proc "environ" files. You can also use it to parse the
> "cmdline" file. There are other files in /proc as well that are delimited
> by null characters.

So what? I know all that. Repeating your trivial statements is nothing
but unnecessary noise (to use your habitual style of word choices).

Janis
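
A minimal sketch of the NUL-separated parsing discussed above, assuming GNU awk is installed as `gawk`. The function name `list_nul_records` and the sample data are made up for illustration; the data only mimics the layout of a /proc "environ" file. As noted in the post, RS="\0" should not be expected to work in awks that store strings C-style.

```shell
# Hypothetical helper: print each NUL-terminated entry on its own numbered line.
# Assumes GNU awk; RS="\0" is not portable to every awk.
list_nul_records() {
    gawk 'BEGIN { RS = "\0" } { print NR ": " $0 }'
}

# Made-up sample that mimics the layout of a /proc "environ" file.
printf 'HOME=/root\0SHELL=/bin/sh\0' | list_nul_records
```

On a Linux system the same function could be fed a real file, for example `list_nul_records < /proc/self/environ`.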

Re: Convert CR LF to CR

<schean$k7q$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=4097&group=comp.unix.shell#4097

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: mortons...@gmail.com (Ed Morton)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 07:59:03 -0500
Organization: A noiseless patient Spider
Lines: 15
Message-ID: <schean$k7q$1@dont-email.me>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 12 Jul 2021 12:59:03 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="6be9b774b912cadfdb38e017142779d5";
logging-data="20730"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18ZyKeAMPK05LbUAUrOSSdd"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:3UqY9jfe3Syt1v//INsEBFkcUVg=
In-Reply-To: <scehh5$76d$1@news-1.m-online.net>
X-Antivirus-Status: Clean
Content-Language: en-US
X-Antivirus: Avast (VPS 210712-2, 7/12/2021), Outbound message
 by: Ed Morton - Mon, 12 Jul 2021 12:59 UTC

On 7/11/2021 5:35 AM, Janis Papanagnou wrote:
<snip>
> The following script
>
> awk 'BEGIN{RS="\0"} END{print NR}'
>
> will print 1 (for non-empty files), meaning that it has processed one
> record.

Assuming gawk, use RS="^$" instead of RS="\0". An input file could
contain NULs and still be processed by gawk (but not some other awks as
it's then not a valid POSIX text file), but "^$" only matches an empty
file and therefore cannot match any string in a non-empty file.

Ed.
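
A small sketch of the difference described above, assuming GNU awk is installed as `gawk`. The helper name `count_records` and the input are hypothetical. The input contains one embedded NUL, so RS="\0" splits it while RS="^$" leaves it as a single record.

```shell
# Hypothetical helper: count the records gawk sees on stdin for a given RS.
# gawk processes escape sequences in -v assignments, so '\0' becomes a NUL.
count_records() {
    gawk -v RS="$1" 'END { print NR }'
}

# Made-up input with an embedded NUL between two lines.
printf 'one\n\0two\n' | count_records '\0'   # the NUL splits the input
printf 'one\n\0two\n' | count_records '^$'   # nothing to match in non-empty input
```

With this input the first call should report two records and the second call one.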

Re: Convert CR LF to CR

<schhqs$3dfug$1@news.xmission.com>

https://www.novabbs.com/devel/article-flat.php?id=4098&group=comp.unix.shell#4098

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!xmission!nnrp.xmission!.POSTED.shell.xmission.com!not-for-mail
From: gaze...@shell.xmission.com (Kenny McCormack)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 13:58:52 -0000 (UTC)
Organization: The official candy of the new Millennium
Message-ID: <schhqs$3dfug$1@news.xmission.com>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com> <87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net> <schean$k7q$1@dont-email.me>
Injection-Date: Mon, 12 Jul 2021 13:58:52 -0000 (UTC)
Injection-Info: news.xmission.com; posting-host="shell.xmission.com:166.70.8.4";
logging-data="3588048"; mail-complaints-to="abuse@xmission.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: gazelle@shell.xmission.com (Kenny McCormack)
 by: Kenny McCormack - Mon, 12 Jul 2021 13:58 UTC

In article <schean$k7q$1@dont-email.me>,
Ed Morton <mortonspam@gmail.com> wrote:
>On 7/11/2021 5:35 AM, Janis Papanagnou wrote:
><snip>
>> The following script
>>
>> awk 'BEGIN{RS="\0"} END{print NR}'
>>
>> will print 1 (for non-empty files), meaning that it has processed one
>> record.
>
>Assuming gawk, use RS="^$" instead of RS="\0". An input file could
>contain NULs and still be processed by gawk (but not some other awks as
>it's then not a valid POSIX text file), but "^$" only matches an empty
>file and therefore cannot match any string in a non-empty file.

Like a stopped clock...

--
People who want to share their religious views with you
almost never want you to share yours with them. -- Dave Barry

Re: Convert CR LF to CR

<schjcq$3ck$1@news-1.m-online.net>

https://www.novabbs.com/devel/article-flat.php?id=4099&group=comp.unix.shell#4099

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!weretis.net!feeder8.news.weretis.net!news.szaf.org!news.karotte.org!news.space.net!news.m-online.net!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 16:25:30 +0200
Organization: (posted via) M-net Telekommunikations GmbH
Lines: 73
Message-ID: <schjcq$3ck$1@news-1.m-online.net>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
<schean$k7q$1@dont-email.me>
NNTP-Posting-Host: 2001:a61:241e:cc01:f8fa:19e:e572:d210
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Trace: news-1.m-online.net 1626099930 3476 2001:a61:241e:cc01:f8fa:19e:e572:d210 (12 Jul 2021 14:25:30 GMT)
X-Complaints-To: news@news-1.m-online.net
NNTP-Posting-Date: Mon, 12 Jul 2021 14:25:30 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
X-Enigmail-Draft-Status: N1110
In-Reply-To: <schean$k7q$1@dont-email.me>
 by: Janis Papanagnou - Mon, 12 Jul 2021 14:25 UTC

Hi Ed,

since you killfiled a prominent member you obviously haven't seen all
replies.

On 12.07.2021 14:59, Ed Morton wrote:
> On 7/11/2021 5:35 AM, Janis Papanagnou wrote:
> <snip>
>> The following script
>>
>> awk 'BEGIN{RS="\0"} END{print NR}'
>>
>> will print 1 (for non-empty files), meaning that it has processed one
>> record.
>
> Assuming gawk, use RS="^$" instead of RS="\0".

That has already been suggested...

> An input file could
> contain NULs and still be processed by gawk (but not some other awks as
> it's then not a valid POSIX text file), but "^$" only matches an empty
> file and therefore cannot match any string in a non-empty file.

...and I replied that we're considering text files here (no NULs).

Seen from a wider perspective all the options are arguable and depend
on the constraints and tools used.

This has actually all been debated in this newsgroup already, 8 years
ago (and the discussion spanned many months).

To quote one of my years-old replies (just as an example, but it may
also answer another poster's vigorous attack) from that old thread:

A '\0' character had never been a reliable separator in case of
binary files. And WRT text files using RS=SUBSEP would be even
better than the suggested RS="\0" in cases when your programs
shall run on other (and older) awks as well.

There are yet more aspects to consider. At that time you posted (also
just for another example) to use "\n$" in certain application contexts.
Another poster suggested an equivalent of RS=".*" (IIRC). And so on.
In short; what fits best depends.

The "^$" is very appealing because you can use it for binary files as
well (although I'm using awk as _text_-processor). There's a few things
I don't like much with it, though. Similar to "\0" it is non-portable.
It's more cryptic, no one seems to be perfectly sure how it works - at
least that was the impression I got from the discussion 8 years ago,
and even now I immediately start a test case to see whether it matches
an empty line (just to have some confidence). Because of that I think
it should be explicitly documented and supported by a rationale that
explains the _mechanism_. (Of course there could also just be a
statement that it's sort of a special pattern that works "magically",
so that no [technical] explanation needs to be formulated.) The GNU
Awk manual has the topic spread incoherently across three chapters.
It would certainly be helpful to have a more coherent picture and some
guidance or a suggestion (with all the caveats, e.g. about what happens
with an RS="^$" statement in other awks). Performance issues have also
been addressed by Arnold in the past; I think that the @load option is
the fastest because it bypasses the regexp processing. Documenting that
would help the user make an informed choice about what to use when.

Until we have such a "directive" I fear we'll repeat our discussions
every couple of years, and they don't seem to terminate quickly on
any iteration. ;-)

Janis

>
> Ed.
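
A minimal sketch of the RS=SUBSEP alternative quoted above, using plain `awk` rather than gawk. The helper name `slurp_stats` and the input are hypothetical, and the approach assumes the text never contains the SUBSEP character (the "\034" control character by default).

```shell
# Hypothetical helper: read all of stdin as one record, then report the
# record count and the length of the last record.
# Assumes the input never contains SUBSEP ("\034" by default).
slurp_stats() {
    awk 'BEGIN { RS = SUBSEP } { n = length($0) } END { print NR, n }'
}

# Made-up two-line text input.
printf 'first line\nsecond line\n' | slurp_stats
```

The first number printed is the record count, which should be 1 for any non-empty text input that lacks the separator.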

Re: Convert CR LF to CR

<87y2abvc9t.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=4100&group=comp.unix.shell#4100

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 15:26:38 +0100
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <87y2abvc9t.fsf@bsb.me.uk>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com>
<scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="cba6ae1bc6b8c42f7eddd5f629c5b53c";
logging-data="15914"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18/xZw6BUltU3X+xWtfVRNXN/iG7IkJvJA="
Cancel-Lock: sha1:gDqTBId26vAyp6m8/NEDDUsWi2U=
sha1:W90nDYkEdSNF/5oqKLMJPFkMDts=
X-BSB-Auth: 1.bf2aa56376bc18715a06.20210712152638BST.87y2abvc9t.fsf@bsb.me.uk
 by: Ben Bacarisse - Mon, 12 Jul 2021 14:26 UTC

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>> On 11.07.2021 01:55, Keith Thompson wrote:
>>> I thought of using C because I found it clearer to think of the input
>>> and output as character-oriented, not line-oriented.
>>
>> Definitely.
>>
>>> awk processing is line-oriented.
>>
>> Unless you use GNU Awk with a "stream processing" setup.
>
> I don't see anything about that in the gawk manual. Do you have a
> reference?
>
> For myself, if a problem isn't strictly line-oriented I generally treat
> that as a sign that I should use something other than awk, but I'm sure
> it's more powerful than what I'm aware of.

You can write something like your C solution using gawk if you set
RS="().". The ()s are needed to make RS longer than one character, but
having done that, every character is then available, one by one, in RT.

Something like (IIRC)

awk 'BEGIN{RS="()."} p!="\r" || RT!="\n" {printf "%s", RT} {p=RT}'

That's how I'd interpret doing "stream processing" in GAWK.

--
Ben.
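
A minimal sketch of the character-by-character idea above, wrapped in a hypothetical shell function `crlf_to_cr`. It assumes GNU awk is installed as `gawk` (RT is a gawk extension), spells the regexp as "(.)" instead of "().", and prints through an explicit "%s" format so that a literal % in the input cannot be taken as a format directive.

```shell
# Hypothetical wrapper: copy stdin to stdout, dropping every LF that
# directly follows a CR. Assumes GNU awk (RT is gawk-specific).
crlf_to_cr() {
    gawk '
        BEGIN { RS = "(.)" }                         # every character ends up in RT
        p != "\r" || RT != "\n" { printf "%s", RT }  # drop LF only directly after CR
        { p = RT }                                   # remember the previous character
    '
}

# Made-up input: one CR LF line ending and one plain LF line ending.
printf 'a\r\nb\n' | crlf_to_cr | od -c
```

The `od -c` at the end is only there to make the remaining CR and LF characters visible.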

Re: Convert CR LF to CR

<schmss$lpb$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=4101&group=comp.unix.shell#4101

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: mortons...@gmail.com (Ed Morton)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 10:25:16 -0500
Organization: A noiseless patient Spider
Lines: 17
Message-ID: <schmss$lpb$1@dont-email.me>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
<schean$k7q$1@dont-email.me> <schjcq$3ck$1@news-1.m-online.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 12 Jul 2021 15:25:16 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="6be9b774b912cadfdb38e017142779d5";
logging-data="22315"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18SskVs9mMeLxjc1TAy4EOy"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:cV7R7cseOCyeZ/6CX3Q8Xv1cU/Q=
In-Reply-To: <schjcq$3ck$1@news-1.m-online.net>
X-Antivirus-Status: Clean
Content-Language: en-US
X-Antivirus: Avast (VPS 210712-2, 7/12/2021), Outbound message
 by: Ed Morton - Mon, 12 Jul 2021 15:25 UTC

On 7/12/2021 9:25 AM, Janis Papanagnou wrote:
<snip>
> The "^$" is very appealing because you can use it for binary files as
> well (although I'm using awk as _text_-processor). There's a few things
> I don't like much with it, though. Similar to "\0" it is non-portable.
> It's more cryptic, no one seems to be perfectly sure how it works

I don't know why it wouldn't just work like any other RS. gawk looks for
where a string matching that regexp occurs in the file and uses that to
identify the end of a record. If no such string exists in the file the
whole file is stored in $0. Since "^" means "start-of-string" and "$"
means "end-of-string" the only possible way that "^$" could match a
string in a file would be if there was nothing between the start and end
of the file, i.e. the file is empty. Using RS='^$' for any file is no
different than using RS='foo' when foo doesn't exist in the file.

Ed.
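
The explanation above can be checked with a short sketch, assuming GNU awk is installed as `gawk`. The helper name `record_stats` and the inputs are hypothetical. It compares RS='^$' with an RS that simply never occurs in the input, and then tries an empty input.

```shell
# Hypothetical helper: report the record count and the length of the
# last record for a given RS (input on stdin).
record_stats() {
    gawk -v RS="$1" '{ n = length($0) } END { print NR, n + 0 }'
}

printf 'alpha\nbeta\n' | record_stats '^$'    # whole input is one record
printf 'alpha\nbeta\n' | record_stats 'foo'   # "foo" never occurs either
printf ''              | record_stats '^$'    # empty input: no records at all
```

The first two calls should print identical results, since neither separator is found and the entire input lands in $0.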

Re: Convert CR LF to CR

<schobj$3djq4$1@news.xmission.com>

https://www.novabbs.com/devel/article-flat.php?id=4102&group=comp.unix.shell#4102

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!xmission!nnrp.xmission!.POSTED.shell.xmission.com!not-for-mail
From: gaze...@shell.xmission.com (Kenny McCormack)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 15:50:11 -0000 (UTC)
Organization: The official candy of the new Millennium
Message-ID: <schobj$3djq4$1@news.xmission.com>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com> <schean$k7q$1@dont-email.me> <schjcq$3ck$1@news-1.m-online.net> <schmss$lpb$1@dont-email.me>
Injection-Date: Mon, 12 Jul 2021 15:50:11 -0000 (UTC)
Injection-Info: news.xmission.com; posting-host="shell.xmission.com:166.70.8.4";
logging-data="3592004"; mail-complaints-to="abuse@xmission.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: gazelle@shell.xmission.com (Kenny McCormack)
 by: Kenny McCormack - Mon, 12 Jul 2021 15:50 UTC

In article <schmss$lpb$1@dont-email.me>,
Ed Morton <mortonspam@gmail.com> wrote:
>On 7/12/2021 9:25 AM, Janis Papanagnou wrote:
><snip>
>> The "^$" is very appealing because you can use it for binary files as
>> well (although I'm using awk as _text_-processor). There's a few things
>> I don't like much with it, though. Similar to "\0" it is non-portable.
>> It's more cryptic, no one seems to be perfectly sure how it works
>
>I don't know why it wouldn't just work like any other RS. gawk looks for
>where a string matching that regexp occurs in the file and uses that to
>identify the end of a record. If no such string exists in the file the
>whole file is stored in $0. Since "^" means "start-of-string" and "$"
>means "end-of-string" the only possible way that "^$" could match a
>string in a file would be if there was nothing between the start and end
>of the file, i.e. the file is empty. Using RS='^$' for any file is no
>different than using RS='foo' when foo doesn't exist in the file.

This is not your "stopped clock" moment, I'm afraid.

--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/Security

Re: Convert CR LF to CR

<sci9sh$9lr$1@news-1.m-online.net>

https://www.novabbs.com/devel/article-flat.php?id=4103&group=comp.unix.shell#4103

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.szaf.org!news.karotte.org!news.space.net!news.m-online.net!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 22:49:21 +0200
Organization: (posted via) M-net Telekommunikations GmbH
Lines: 38
Message-ID: <sci9sh$9lr$1@news-1.m-online.net>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
<schean$k7q$1@dont-email.me> <schjcq$3ck$1@news-1.m-online.net>
<schmss$lpb$1@dont-email.me>
NNTP-Posting-Host: 2001:a61:241e:cc01:f8fa:19e:e572:d210
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Trace: news-1.m-online.net 1626122961 9915 2001:a61:241e:cc01:f8fa:19e:e572:d210 (12 Jul 2021 20:49:21 GMT)
X-Complaints-To: news@news-1.m-online.net
NNTP-Posting-Date: Mon, 12 Jul 2021 20:49:21 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
X-Enigmail-Draft-Status: N1110
In-Reply-To: <schmss$lpb$1@dont-email.me>
 by: Janis Papanagnou - Mon, 12 Jul 2021 20:49 UTC

On 12.07.2021 17:25, Ed Morton wrote:
> On 7/12/2021 9:25 AM, Janis Papanagnou wrote:
>> It's more cryptic, no one seems to be perfectly sure how it works
>
> I don't know why it wouldn't just work like any other RS.

The responses in that other thread left a different impression to me,
that it wasn't obvious or as clear as it should be.

> gawk looks for
> where a string matching that regexp occurs in the file and uses that to
> identify the end of a record. If no such string exists in the file the
> whole file is stored in $0.

If all we want is a pattern that is principally non-existing wouldn't
it be clearer to use something like "$^" (i.e. "^$" reversed), which
is a meta-character sequence that does obviously not make any sense.

A meta-character sequence that may make sense, and that makes you think
about what constitutes a string in a context where we are just in the
process of defining the parsing unit by setting RS, opens room for
confusion. YMMV.

It seems to me that an explanation of "$^" (one that cannot exist per
definition of the meaning of '^' and '$') is clearer than the version
that requires a more verbose explanation like the one here (that had
in this or similar form also already been given 8 years ago, IIRC):

> Since "^" means "start-of-string" and "$"
> means "end-of-string" the only possible way that "^$" could match a
> string in a file would be if there was nothing between the start and end
> of the file, i.e. the file is empty. Using RS='^$' for any file is no
> different than using RS='foo' when foo doesn't exist in the file.
>
> Ed.

Janis

Re: Convert CR LF to CR

<scia1q$udk$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=4104&group=comp.unix.shell#4104

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: mortons...@gmail.com (Ed Morton)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 15:52:09 -0500
Organization: A noiseless patient Spider
Lines: 16
Message-ID: <scia1q$udk$1@dont-email.me>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
<schean$k7q$1@dont-email.me> <schjcq$3ck$1@news-1.m-online.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 12 Jul 2021 20:52:10 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="6be9b774b912cadfdb38e017142779d5";
logging-data="31156"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19pag4o9eAVHIjeCASz4kdG"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:GrtWeSeVhNKwyAXiuvcHajunoU8=
In-Reply-To: <schjcq$3ck$1@news-1.m-online.net>
X-Antivirus-Status: Clean
Content-Language: en-US
X-Antivirus: Avast (VPS 210712-4, 7/12/2021), Outbound message
 by: Ed Morton - Mon, 12 Jul 2021 20:52 UTC

On 7/12/2021 9:25 AM, Janis Papanagnou wrote:
> Hi Ed,
>
> since you killfiled a prominent member you obviously haven't seen all
> replies.

The only person I have killfiled is Kenny and that's well-deserved. If
he's yammering away somewhere in this thread I'm sure that, as always,
it'll be wrong and/or redundant and/or infantile name-calling so I know
I'm not missing anything there.

Actually I may still have that other well-known netkook Alan Conor aka
Tom Newton killfiled too but I haven't seen him post anything in years
so I doubt if he's who you're referring to.

Ed.

Re: Convert CR LF to CR

<sciag4$1fv$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=4105&group=comp.unix.shell#4105

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: mortons...@gmail.com (Ed Morton)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 15:59:47 -0500
Organization: A noiseless patient Spider
Lines: 36
Message-ID: <sciag4$1fv$1@dont-email.me>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
<schean$k7q$1@dont-email.me> <schjcq$3ck$1@news-1.m-online.net>
<schmss$lpb$1@dont-email.me> <sci9sh$9lr$1@news-1.m-online.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 12 Jul 2021 20:59:48 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="6be9b774b912cadfdb38e017142779d5";
logging-data="1535"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+HpgC+t1+DV0ZCpGLbP06B"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:Z9f2PH5DvaWau/uj683Fa5izz0U=
In-Reply-To: <sci9sh$9lr$1@news-1.m-online.net>
X-Antivirus-Status: Clean
Content-Language: en-US
X-Antivirus: Avast (VPS 210712-4, 7/12/2021), Outbound message
 by: Ed Morton - Mon, 12 Jul 2021 20:59 UTC

On 7/12/2021 3:49 PM, Janis Papanagnou wrote:
> On 12.07.2021 17:25, Ed Morton wrote:
>> On 7/12/2021 9:25 AM, Janis Papanagnou wrote:
>>> It's more cryptic, no one seems to be perfectly sure how it works
>>
>> I don't know why it wouldn't just work like any other RS.
>
> The responses in that other thread left a different impression to me,
> that it wasn't obvious or as clear as it should be.

I don't recall the previous conversation on it you're referring to but
it seems very clear and simple to me.

>> gawk looks for
>> where a string matching that regexp occurs in the file and uses that to
>> identify the end of a record. If no such string exists in the file the
>> whole file is stored in $0.
>
> If all we want is a pattern that is principally non-existing wouldn't
> it be clearer to use something like "$^" (i.e. "^$" reversed), which
> is a meta-character sequence that does obviously not make any sense.

`$^` means the literal chars `$` then `^` since `$` is only an anchor
metachar at the end of a regexp or subexpression and `^` only at the
beginning. Look:

$ echo 'a$^b' | grep '$^'
a$^b

See
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_08
for more info.

Regards,

Ed.
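
A slightly extended sketch of the grep demonstration above; the helper name `count_bre_matches` and the three-line input are made up. Note that grep's default syntax is basic regular expressions, whereas awk's RS is treated as an extended regular expression, so this only illustrates the grep behaviour and does not settle what gawk does with RS="$^".

```shell
# Hypothetical helper: count the stdin lines matched by a basic
# regular expression (grep's default syntax).
count_bre_matches() {
    grep -c -- "$1"
}

# Made-up input: a line holding the literal characters "$^", an empty
# line, and an ordinary line.
printf 'a$^b\n\nplain\n' | count_bre_matches '$^'   # literal "$" then "^"
printf 'a$^b\n\nplain\n' | count_bre_matches '^$'   # anchors: the empty line
```

Each call should count exactly one line: the first pattern matches only the line that contains the two literal characters, the second only the empty line.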

Re: Convert CR LF to CR

<sciaq9$9s6$1@news-1.m-online.net>

https://www.novabbs.com/devel/article-flat.php?id=4106&group=comp.unix.shell#4106

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!4.us.feeder.erje.net!feeder.erje.net!news2.arglkargh.de!news.karotte.org!news.space.net!news.m-online.net!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 23:05:13 +0200
Organization: (posted via) M-net Telekommunikations GmbH
Lines: 13
Message-ID: <sciaq9$9s6$1@news-1.m-online.net>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
<schean$k7q$1@dont-email.me> <schjcq$3ck$1@news-1.m-online.net>
<scia1q$udk$1@dont-email.me>
NNTP-Posting-Host: 2001:a61:241e:cc01:f8fa:19e:e572:d210
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Trace: news-1.m-online.net 1626123913 10118 2001:a61:241e:cc01:f8fa:19e:e572:d210 (12 Jul 2021 21:05:13 GMT)
X-Complaints-To: news@news-1.m-online.net
NNTP-Posting-Date: Mon, 12 Jul 2021 21:05:13 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
In-Reply-To: <scia1q$udk$1@dont-email.me>
 by: Janis Papanagnou - Mon, 12 Jul 2021 21:05 UTC

On 12.07.2021 22:52, Ed Morton wrote:
> On 7/12/2021 9:25 AM, Janis Papanagnou wrote:
>> Hi Ed,
>>
>> since you killfiled a prominent member you obviously haven't seen all
>> replies.
>
> The only person I have killfiled is Kenny and [...]

And, IIRC, he's the one that suggested to use "^$".
(Or was it that he suggested to not use "\0"?)
Anyway...

Re: Convert CR LF to CR

<sciasb$20j$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=4107&group=comp.unix.shell#4107

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: unr...@invalid.ca (William Unruh)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 21:06:19 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 48
Message-ID: <sciasb$20j$1@dont-email.me>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com>
<scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com>
<scehh5$76d$1@news-1.m-online.net> <schean$k7q$1@dont-email.me>
<schjcq$3ck$1@news-1.m-online.net> <schmss$lpb$1@dont-email.me>
<sci9sh$9lr$1@news-1.m-online.net>
Injection-Date: Mon, 12 Jul 2021 21:06:19 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="9dc66bf3a3c9ff153b3e0000f7b83e06";
logging-data="2067"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19On/jHcVo/p5hvftIE0maq"
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:63ue9LTbD8of7IUifrUir9UtOZc=
 by: William Unruh - Mon, 12 Jul 2021 21:06 UTC

On 2021-07-12, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
> On 12.07.2021 17:25, Ed Morton wrote:
>> On 7/12/2021 9:25 AM, Janis Papanagnou wrote:
>>> It's more cryptic, no one seems to be perfectly sure how it works
>>
>> I don't know why it wouldn't just work like any other RS.
>
> The responses in that other thread left a different impression to me,
> that it wasn't obvious or as clear as it should be.
>
>> gawk looks for
>> where a string matching that regexp occurs in the file and uses that to
>> identify the end of a record. If no such string exists in the file the
>> whole file is stored in $0.
>
> If all we want is a pattern that is principally non-existing wouldn't
> it be clearer to use something like "$^" (i.e. "^$" reversed), which
> is a meta-character sequence that does obviously not make any sense.

Isn't $^ something that occurs at the end of every line? (End of this
line, beginning of the next)

Actually, I also have problems with ^$ since that would seem to mean an
empty line, which is certainly possible: a file containing LFLF would
seem to have an empty line in it. But clearly I would have to know
EXACTLY how awk determines the start and end of a line.

>
> A meta-character sequence that may make sense, and that makes you think
> about what constitutes a string in a context where we are just in the
> process of defining the parsing unit by setting RS, opens room for
> confusion. YMMV.
>
> It seems to me that an explanation of "$^" (one that cannot exist per
> definition of the meaning of '^' and '$') is clearer than the version
> that requires a more verbose explanation like the one here (that had
> in this or similar form also already been given 8 years ago, IIRC):
>
>> Since "^" means "start-of-string" and "$"
>> means "end-of-string" the only possible way that "^$" could match a
>> string in a file would be if there was nothing between the start and end
>> of the file, i.e. the file is empty. Using RS='^$' for any file is no
>> different than using RS='foo' when foo doesn't exist in the file.
>>
>> Ed.
>
> Janis
>
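
A minimal sketch of the point under discussion, assuming gawk is installed (the sample input and the variable names are made up for illustration). It suggests that a blank line (LF LF) in the input does not end the record when RS='^$', since awk's ^ and $ anchor to the start and end of the string being scanned rather than to individual lines:

```shell
# Default RS (newline): each line, including the empty one, is a record.
default_records=$(printf 'a\n\nb\n' | awk 'END { print NR }')

# Using a regexp such as '^$' as RS is a gawk extension; other awks may
# behave differently, so only run this when gawk is available.
if command -v gawk >/dev/null 2>&1; then
    whole_file_records=$(printf 'a\n\nb\n' | gawk -v RS='^$' 'END { print NR }')
else
    whole_file_records='(gawk not installed)'
fi

echo "records with default RS: $default_records"
echo "records with RS='^\$':    $whole_file_records"
```

With gawk this should report three records for the default RS and a single record for RS='^$', despite the empty line in the middle of the input.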

Re: Convert CR LF to CR

<scib4b$a13$1@news-1.m-online.net>

https://www.novabbs.com/devel/article-flat.php?id=4108&group=comp.unix.shell#4108

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.szaf.org!news.karotte.org!news.space.net!news.m-online.net!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 23:10:35 +0200
Organization: (posted via) M-net Telekommunikations GmbH
Lines: 50
Message-ID: <scib4b$a13$1@news-1.m-online.net>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
<schean$k7q$1@dont-email.me> <schjcq$3ck$1@news-1.m-online.net>
<schmss$lpb$1@dont-email.me> <sci9sh$9lr$1@news-1.m-online.net>
<sciag4$1fv$1@dont-email.me>
NNTP-Posting-Host: 2001:a61:241e:cc01:f8fa:19e:e572:d210
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Trace: news-1.m-online.net 1626124235 10275 2001:a61:241e:cc01:f8fa:19e:e572:d210 (12 Jul 2021 21:10:35 GMT)
X-Complaints-To: news@news-1.m-online.net
NNTP-Posting-Date: Mon, 12 Jul 2021 21:10:35 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
In-Reply-To: <sciag4$1fv$1@dont-email.me>
 by: Janis Papanagnou - Mon, 12 Jul 2021 21:10 UTC

On 12.07.2021 22:59, Ed Morton wrote:
> On 7/12/2021 3:49 PM, Janis Papanagnou wrote:
>> On 12.07.2021 17:25, Ed Morton wrote:
>>> On 7/12/2021 9:25 AM, Janis Papanagnou wrote:
>>>> It's more cryptic, no one seems to be perfectly sure how it works
>>>
>>> I don't know why it wouldn't just work like any other RS.
>>
>> The responses in that other thread left a different impression to me,
>> that it wasn't obvious or as clear as it should be.
>
> I don't recall the previous conversation on it you're referring to but
> it seems very clear and simple to me.
>
>>> gawk looks for
>>> where a string matching that regexp occurs in the file and uses that to
>>> identify the end of a record. If no such string exists in the file the
>>> whole file is stored in $0.
>>
>> If all we want is a pattern that is principally non-existing wouldn't
>> it be clearer to use something like "$^" (i.e. "^$" reversed), which
>> is a meta-character sequence that does obviously not make any sense.
>
> `$^` means the literal chars `$` then `^` since `$` is only an anchor
> metachar at the end of a regexp or subexpression and `^` only at the
> beginning. Look:
>
> $ echo 'a$^b' | grep '$^'
> a$^b

We were talking about GNU Awk, weren't we?

Consequently I have tested GNU Awk:

$ echo $'a$^b\na$^b' | awk 'BEGIN{RS="$^"}{print NR, $0}'
1 a$^b
a$^b

Janis

>
> See
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_08
> for more info.
>
> Regards,
>
> Ed.
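
The two tests above do not contradict each other, and the following sketch puts them side by side. It assumes a POSIX grep and, optionally, gawk; the variable names are hypothetical. Plain grep uses basic regular expressions, where `$` and `^` in the middle of a pattern are ordinary characters, whereas the gawk result quoted above suggests that gawk keeps treating both as anchors, so `$^` never matches inside the data:

```shell
# BRE (plain grep): '$^' is the two literal characters '$' and '^'.
bre_matches=$(printf '%s\n' 'a$^b' | grep -c '$^')

# Fixed-string search for comparison: no regexp interpretation at all.
fixed_matches=$(printf '%s\n' 'a$^b' | grep -cF '$^')

# gawk with RS set to '$^': if the separator never matches, both input
# lines end up in a single record. Skipped when gawk is not available.
if command -v gawk >/dev/null 2>&1; then
    gawk_records=$(printf '%s\n' 'a$^b' 'a$^b' | gawk -v RS='$^' 'END { print NR }')
else
    gawk_records='(gawk not installed)'
fi

echo "grep BRE matching lines: $bre_matches"
echo "grep -F matching lines:  $fixed_matches"
echo "gawk records:            $gawk_records"
```

So `$^` is only a safe "never matches" separator in tools that treat both characters as anchors; in a BRE context it would split on a literal `$^`.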

Re: Convert CR LF to CR

<scibcs$a2o$1@news-1.m-online.net>

https://www.novabbs.com/devel/article-flat.php?id=4109&group=comp.unix.shell#4109

Path: i2pn2.org!i2pn.org!aioe.org!news.mixmin.net!news2.arglkargh.de!news.karotte.org!news.space.net!news.m-online.net!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 23:15:08 +0200
Organization: (posted via) M-net Telekommunikations GmbH
Lines: 37
Message-ID: <scibcs$a2o$1@news-1.m-online.net>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com>
<20210709142658.883@kylheku.com>
<b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net>
<8735sl7mhw.fsf@nosuchdomain.example.com> <scdj64$ud3$1@news-1.m-online.net>
<87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
<schean$k7q$1@dont-email.me> <schjcq$3ck$1@news-1.m-online.net>
<schmss$lpb$1@dont-email.me> <sci9sh$9lr$1@news-1.m-online.net>
<sciasb$20j$1@dont-email.me>
NNTP-Posting-Host: 2001:a61:241e:cc01:f8fa:19e:e572:d210
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Trace: news-1.m-online.net 1626124508 10328 2001:a61:241e:cc01:f8fa:19e:e572:d210 (12 Jul 2021 21:15:08 GMT)
X-Complaints-To: news@news-1.m-online.net
NNTP-Posting-Date: Mon, 12 Jul 2021 21:15:08 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
In-Reply-To: <sciasb$20j$1@dont-email.me>
 by: Janis Papanagnou - Mon, 12 Jul 2021 21:15 UTC

On 12.07.2021 23:06, William Unruh wrote:
> On 2021-07-12, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
>> On 12.07.2021 17:25, Ed Morton wrote:
>>> On 7/12/2021 9:25 AM, Janis Papanagnou wrote:
>>>> It's more cryptic, no one seems to be perfectly sure how it works
>>>
>>> I don't know why it wouldn't just work like any other RS.
>>
>> The responses in that other thread left a different impression to me,
>> that it wasn't obvious or as clear as it should be.
>>
>>> gawk looks for
>>> where a string matching that regexp occurs in the file and uses that to
>>> identify the end of a record. If no such string exists in the file the
>>> whole file is stored in $0.
>>
>> If all we want is a pattern that is principally non-existing wouldn't
>> it be clearer to use something like "$^" (i.e. "^$" reversed), which
>> is a meta-character sequence that does obviously not make any sense.
>
> Isn't $^ something that occurs at the end of every line? (End of this
> line, beginning of the next)

Well, on closer inspection it may have a similar interpretation problem
(but probably to a lesser degree?).

>
> Actually, I also have problems with ^$ since that would seem to mean an
> empty line, which is certainly possible: a file containing LFLF would
> seem to have an empty line in it. But clearly I would have to know
> EXACTLY how awk determines the start and end of a line.

I think it boils down to having good documentation and/or guidance in
the manual. (Then these threads could be significantly shortened. ;-)

Janis

Re: Convert CR LF to CR

<gH6tZXOFD=W8IcVzL@bongo-ra.co>

https://www.novabbs.com/devel/article-flat.php?id=4110&group=comp.unix.shell#4110

Path: i2pn2.org!i2pn.org!aioe.org!Lsq9Ulyii8Zln50ye03obQ.user.gioia.aioe.org.POSTED!not-for-mail
From: spi...@gmail.com (Spiros Bousbouras)
Newsgroups: comp.unix.shell,comp.lang.awk
Subject: Re: Convert CR LF to CR
Date: Mon, 12 Jul 2021 21:22:52 +0000 (UTC)
Organization: Aioe.org NNTP Server
Lines: 23
Message-ID: <gH6tZXOFD=W8IcVzL@bongo-ra.co>
References: <3ec51006-7355-44b5-a871-71c467c8605en@googlegroups.com> <20210709142658.883@kylheku.com> <b053012b-041b-4882-9fb9-8b83663ffe31n@googlegroups.com>
<zsdP8S5dxCsfgn03B@bongo-ra.co> <scbv49$fi0$1@news-1.m-online.net> <8735sl7mhw.fsf@nosuchdomain.example.com>
<scdj64$ud3$1@news-1.m-online.net> <87r1g55xgs.fsf@nosuchdomain.example.com> <scehh5$76d$1@news-1.m-online.net>
<schean$k7q$1@dont-email.me> <schjcq$3ck$1@news-1.m-online.net>
NNTP-Posting-Host: Lsq9Ulyii8Zln50ye03obQ.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Complaints-To: abuse@aioe.org
X-Notice: Filtered by postfilter v. 0.9.2
X-Organisation: Weyland-Yutani
X-Server-Commands: nowebcancel
 by: Spiros Bousbouras - Mon, 12 Jul 2021 21:22 UTC

On Mon, 12 Jul 2021 16:25:30 +0200
Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:

[On AWK dark corners.]
> The GNU
> Awk manual has the topic incoherently spread across three chapters.
> It would certainly be helpful to have a more coherent picture and
> guidance or a suggestion (with all the caveats, e.g. about what happens
> with an RS="^$" statement in other awks). Arnold has also addressed
> performance issues in the past; I think that the @load option is the
> fastest because it bypasses the regexp processing. Knowing that would
> help the user make an informed choice about what to use when.
>
> Until we have such a "directive" I fear we'll repeat our discussions
> every couple years again, and they seem to not be quickly terminated
> discussions on every re-iteration. ;-)

The only person who can provide authoritative answers is Arnold Robbins. I
think he reads comp.lang.awk but I'm not sure if he reads comp.unix.shell
so you should have crossposted to comp.lang.awk (which I've done). As I'm
typing this, I can see that several more posts have been made in the thread
discussing esoteric issues regarding AWK. It would have served everyone best
if these had also appeared on comp.lang.awk.
