Handling unsupported line-endings

<skjhir$jd9$1@gioia.aioe.org>

Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Handling unsupported line-endings
Date: Mon, 18 Oct 2021 21:16:59 +1100
Organization: Aioe.org NNTP Server
Message-ID: <skjhir$jd9$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="19881"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
Content-Language: en-GB
X-Mozilla-News-Host: news://nntp.aioe.org:119
X-Notice: Filtered by postfilter v. 0.9.2
 by: dxforth - Mon, 18 Oct 2021 10:16 UTC

I used this in a recent app and figure others may find it useful.

My READ-LINE automatically handles LF or CRLF end-of-line (as well as
the CP/M end-of-file character). I rarely process files with CR line
endings, but should it happen, here's a substitute routine I've used
for that. It automatically handles CR, LF or CRLF endings.

The examples use posinfile, readdata, seekinfile for simplicity.
They are high-level versions of FILE-POSITION, READ-FILE,
REPOSITION-FILE respectively.
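
A minimal sketch of how such helpers could be defined on top of the standard file-access words. The stack effects are inferred from how GETLINE below uses them, and INFILE is a hypothetical variable holding the fileid of the open input file; the actual definitions are not shown in this post.

```forth
\ Assumed helpers, inferred from their use in GETLINE.
\ These are not the original definitions.
variable infile   \ hypothetical: fileid of the open input file

\ Current position in the input file as a double-cell number
: posinfile  ( -- ud )        infile @ file-position throw ;

\ Read up to u chars to a; keep a and return the count actually read
: readdata   ( a u -- a u' )  over swap infile @ read-file throw ;

\ Move the input file to position ud
: seekinfile ( ud -- )        infile @ reposition-file throw ;
```

Errors are turned into exceptions with THROW here for brevity; a real library might report them differently.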

: end postpone exit postpone then ; immediate
: of postpone over postpone = postpone if postpone drop ; immediate
: split ( a u char -- a2 u2 a3 u3 ) >r 2dup r> scan 2swap 2 pick - ;

\ Scan for CR, LF or CRLF; return line length and offset to the next line
: /EOL ( a u -- u' offs )
over swap begin dup while
over c@ $0D of drop tuck swap - swap
1+ c@ $0A <> over + 2+ end
$0A of drop swap - dup 1+ end
drop 1 /string
repeat drop swap - dup ;

\ Sample 'get-line' function.
\ Note: buffer must be 1 char greater than u !!!
\ CP/M EOF $1A must be handled externally
: GETLINE ( a u -- a u' -1 | 0 )
posinfile 2>r 2dup 1+ readdata rot min dup if
( a a u') /eol 0 2r> d+ seekinfile -1
end rdrop rdrop nip nip ;

\ As above but includes 'eol' (true if eol found)
: GETLINE ( a u -- a u' eol -1 | 0 )
posinfile 2>r 2dup 1+ readdata rot min dup if ( a a u)
( a a u') /eol dup 0 2r> d+ seekinfile over <> -1
end rdrop rdrop nip nip ;

\ Example CP/M EOF handling
begin
pad 256 getline ( a u -1 | 0 )
while
$1A split ( a' u' a u) do-stuff
( a' u') nip ( cpmeof)
until then

Re: Handling unsupported line-endings

<8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>

 by: Heinrich Hohl - Mon, 18 Oct 2021 10:41 UTC

On Monday, October 18, 2021 at 12:17:03 PM UTC+2, dxforth wrote:
> I used this in a recent app and figure others may find it useful.

It is definitely useful that READ-LINE can handle all possible end-of-line
characters. Strange that this is not mandated by the Forth Standard.

Years ago I had the problem that I could not process postscript files
because these may contain a mixture of EOL characters:
CR, LF, or CRLF.

I informed Forth Inc. about this problem, and since then SwiftForth contains
a sophisticated EOL-SCANNER which handles all cases properly.

This EOL scanner is used in READ-LINE, but it can also be used separately
if you decide to use buffered input (i.e. loading large chunks of the text
file into a buffer and then scan for lines in memory).

Have you made sure that your /EOL routine finds each EOL sequence with
the same speed in case your buffer is large (e.g. 10 MB)?

Henry

Re: Handling unsupported line-endings

<skjlhf$k4f$1@gioia.aioe.org>

 by: dxforth - Mon, 18 Oct 2021 11:24 UTC

On 18/10/2021 21:41, Heinrich Hohl wrote:
> On Monday, October 18, 2021 at 12:17:03 PM UTC+2, dxforth wrote:
>> I used this in a recent app and figure others may find it useful.
>
> It is definitely useful that READ-LINE can handle all possible end-of-line
> characters. Strange that this is not mandated by the Forth Standard.
>
> Years ago I had the problem that I could not process postscript files
> because these may contain a mixture of EOL characters:
> CR, LF, or CRLF.
>
> I informed Forth Inc. about this problem, and since then SwiftForth contains
> a sophisticated EOL-SCANNER which handles all cases properly.
>
> This EOL scanner is used in READ-LINE, but it can also be used separately
> if you decide to use buffered input (i.e. loading large chunks of the text
> file into a buffer and then scan for lines in memory).
>
> Have you made sure that your /EOL routine finds each EOL sequence with
> the same speed in case your buffer is large (e.g. 10 MB)?

Not something I considered as the goal was a READ-LINE equivalent and
the latter is typically inefficient anyway. If speed is important,
mandating READ-LINE to handle all EOL may not be the best way to go?

Re: Handling unsupported line-endings

<649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>

 by: Heinrich Hohl - Mon, 18 Oct 2021 22:25 UTC

Yes, READ-LINE is slow and only useful for small files. It spends a lot of time
opening and closing a file to read just one line of text. However, I think that
READ-LINE should be able to handle any EOL sequence no matter how slow it is.

Once in a while I have to process postscript files with a size of up to 2 GB.
My first attempt at this was based on READ-LINE, but it was rather slow.
For better speed, I now read the postscript files into a buffer in chunks of 10 MB
and use EOL-SCANNER to read lines from this buffer. This is much faster because
the number of file access operations is greatly reduced.

After looking at your /EOL routine more closely I can see that you chose the best
possible algorithm: check the range for all three possible EOL sequences in one pass.
Test each character for CR first and, if found, also check for CRLF; otherwise test for LF.
Doing all tests in one pass means that the search time is almost independent of the EOL type.
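
The single-pass idea can be sketched in standard Forth as follows. SCAN-EOL is a hypothetical name; this is neither dxforth's /EOL nor SwiftForth's EOL-SCANNER, only an illustration of the technique. Unlike /EOL it never looks past the end of the buffer, so a CR in the last position is reported as a lone CR even if an LF would follow in the file.

```forth
\ SCAN-EOL is a hypothetical word, written for illustration only.
\ len  = length of the line without its terminator
\ next = offset of the start of the next line
\ If no terminator is found, len = next = u.
: scan-eol ( a u -- len next )
  dup 0 ?do                          ( a u )
    over i + c@                      ( a u c )
    dup $0A = if                     \ LF
      drop 2drop  i  i 1+  unloop exit
    then
    $0D = if                         \ CR, possibly followed by LF
      i 1+ over < if                 \ is there a char after the CR?
        over i 1+ + c@ $0A = if      \ CRLF
          2drop  i  i 2 +  unloop exit
        then
      then
      2drop  i  i 1+  unloop exit    \ lone CR
    then
  loop
  nip dup ;                          \ no terminator in the buffer
```

Each character is examined once, and the CR and LF tests happen at the same position, so a file that contains only LF never triggers a separate full-buffer search for CR.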

For a short time, SwiftForth's EOL-SCANNER used a different algorithm
which performed two passes. In the first pass it checked for CR, and if found,
for CRLF. In the second pass it checked for LF. Then it compared the results
and took the EOL character that came first.

This approach is fine if the scanned area is small, e.g. if you expect strings of
only 255 characters. It is also OK if your files contain a random mix of EOL characters,
as in the case of postscript files.

Problems arise if your files contain only LF as EOL character and you read in large chunks
of data. In this case, the first pass of the EOL-SCANNER will not be successful.
It wastes time scanning the entire buffer, e.g. 10 MB of characters, for CR (or CRLF)
without success before it starts the second pass and eventually finds an LF.

This deficiency has been rectified by now. SwiftForth version 3.10.2 and later use
the optimal algorithm, similar to your approach in /EOL, but written as code definitions.

Re: Handling unsupported line-endings

<skla3u$ec8$1@gioia.aioe.org>

 by: dxforth - Tue, 19 Oct 2021 02:21 UTC

On 19/10/2021 09:25, Heinrich Hohl wrote:
> ...
> This deficiency has been rectified by now. SwiftForth version 3.10.2 and later use
> the optimal algorithm, similar to your approach in /EOL, but written as code definitions.

Yes, a code definition would be the way to go as it's quite horrible in
high-level forth. I had researched the algorithm - it's been mentioned
previously on c.l.f. and, as you say, used by SwiftForth.

Here it is in 8080 DTC code :)

\ /EOL
\ Scan for CR, LF or CRLF; return line length and offset to the next line
code /EOL ( a u -- u' offs )
d pop h pop d push h push
1 $: e a mov d ora 4 $ jz m a mov $0D cpi
2 $ jz $0A cpi 3 $ jz h inx d dcx 1 $ jmp
2 $: d pop h push dsub call d pop d inx d ldax
d pop h push h inx $0A cpi hpush jnz h inx 1push
3 $: d pop dsub call d pop h push h inx 1push
4 $: h pop ' dup jmp
end-code

Re: Handling unsupported line-endings

<38f6b64e-e7ce-4697-a397-0404a3e53386n@googlegroups.com>

 by: minf...@arcor.de - Tue, 19 Oct 2021 08:24 UTC

Heinrich Hohl schrieb am Dienstag, 19. Oktober 2021 um 00:25:28 UTC+2:
> Yes, READ-LINE is slow and only useful for small files. It spends a lot of time
> opening and closing a file to read just one line of text.

Memory-mapped files are your friend here.
Do you open/close files between reading lines??

Re: Handling unsupported line-endings

<2021Oct19.095538@mips.complang.tuwien.ac.at>

 by: Anton Ertl - Tue, 19 Oct 2021 07:55 UTC

Heinrich Hohl <hheinrich.hohl@gmail.com> writes:
>Yes, READ-LINE is slow and only useful for small files. It spends a lot of time
>opening and closing a file to read just one line of text.

Which files would READ-LINE open and close and why? Gforth's
READ-FILE certainly does not do that.

As for slowness, I have just tried this:

perf stat -e cycles:u -e instructions:u ~/gforth/gforth -e '"count-unique.in" r/o open-file throw constant f create buf 256 allot : foo 0 begin buf 256 f read-line throw nip while 1+ repeat ; foo . cr bye'

count-unique.in is the input for Ben Hoyt's count-unique example and
consists of 10 bibles, 998170 lines, or 43MB. On a 4GHz Skylake
gforth-fast takes 802M cycles or 0.201s user time for this (and 31M
cycles or 0.011s without the call to FOO).

For comparison, "wc -l" takes 31M cycles or 0.011s for counting the
lines, so Gforth's READ-LINE could indeed be quite a bit faster (and
its startup could also be faster), but for now the speed of READ-LINE
has not been a problem for us.

>However, I think that
>READ-LINE should be able to handle any EOL sequence no matter how slow it is.

Yes, and Gforth's READ-LINE handles CR, LF, and CRLF.

>Once in a while I have to process postscript files with a size of up to 2 GB.

A 2GB file would take about 10s to input with READ-LINE on a 4GHz
Skylake.

>SwiftForth version 3.10.2 and later use
>the optimal algorithm, similar to your approach in /EOL, but written as code definitions.

Since you mention it, I have also measured SwiftForth 3.11.0 with the
same benchmark:

LC_NUMERIC=en_US.utf8 perf stat -e cycles -e cycles:u -e cycles:k -e instructions:u sf 's" count-unique.in" r/o open-file throw constant f create buf 256 allot : foo 0 begin buf 256 f read-line throw nip while 1+ repeat ; foo . cr bye'

The result is curious:

5,945,307,655 cycles
768,834,475 cycles:u
5,131,957,800 cycles:k
733,948,005 instructions:u # 0.95 insn per cycle

1.488727585 seconds time elapsed

0.916473000 seconds user
0.572295000 seconds sys

What is clear is that it takes quite a bit of time, and a significant
part of that time is spent in the kernel (cycles:k and seconds sys; by
contrast, for gforth-fast the same command reports 43M cycles and
0.004s sys). What is curious is that the cycles count and the process
accounting disagree by so much about the proportion of time spent in
the kernel.

In any case the difference between Gforth's performance and
SwiftForth's performance here is probably because Gforth uses buffered
I/O, and SwiftForth seems to use unbuffered I/O.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2021: https://euro.theforth.net/2021

Re: Handling unsupported line-endings

<skm8jv$dla$1@gioia.aioe.org>

 by: dxforth - Tue, 19 Oct 2021 11:02 UTC

On 19/10/2021 19:24, minf...@arcor.de wrote:
> Heinrich Hohl schrieb am Dienstag, 19. Oktober 2021 um 00:25:28 UTC+2:
>> Yes, READ-LINE is slow and only useful for small files. It spends a lot of time
>> opening and closing a file to read just one line of text.
>
> Memory-mapped files are your friend here.
> Do you open/close files between reading lines??
>

No, but since each read overshoots the end of the line, the file
has to be repositioned after each line.

Re: Handling unsupported line-endings

<caa26119-443b-4fc8-b338-4924448c0005n@googlegroups.com>

 by: Heinrich Hohl - Tue, 19 Oct 2021 14:26 UTC

On Tuesday, October 19, 2021 at 10:40:49 AM UTC+2, Anton Ertl wrote:
> Which files would READ-LINE open and close and why? Gforth's
> READ-FILE certainly does not do that.

Sorry, my mistake. You are right. Files are opened only once, then many
READ-LINE operations are executed, and finally the file is closed again.
But the file position must be updated after reading each line.

Actually, speed was not that bad when I read the postscript file from a local hard disk.
I assume that the hard disk cache kept performance from suffering too much.

However, the file converter was extremely slow when I tried to read the
huge postscript file from a NAS via the network. Not usable at all.
Maybe file repositioning is slow if a network is involved.

The updated version of my converter which uses buffered input/output
(chunks of 10 MB) does not have any speed problems when used in
combination with a NAS (Buffalo TeraStation).
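
Reading in chunks means that an EOL sequence can straddle a chunk boundary (CR at the end of one chunk, LF at the start of the next). Here is a minimal sketch that counts line terminators while remembering whether the previous character was a CR. The names /CHUNK, CHUNK, PREV-CR, COUNT-EOLS and COUNT-FILE-EOLS are hypothetical, and the 4 KB chunk size is only for illustration; this is not the converter's actual code.

```forth
\ All names here are hypothetical; this is an illustration only.
4096 constant /chunk            \ chunk size in chars (the post uses 10 MB)
create chunk /chunk allot       \ the chunk buffer
variable prev-cr                \ true if the previous char was a CR

\ Add the number of line terminators in a u to n1.
\ A CRLF pair counts once, even when split across two calls.
: count-eols ( n1 a u -- n2 )
  over + swap ?do                    ( n )
    i c@ case
      $0D of  1+  true prev-cr !  endof
      $0A of  prev-cr @ 0= if 1+ then  false prev-cr !  endof
      false prev-cr !                \ any other char ends a pending CR
    endcase
  loop ;

\ Count the line terminators in an open file, one chunk at a time.
: count-file-eols ( fid -- n )
  >r  false prev-cr !  0             ( n )
  begin
    chunk /chunk r@ read-file throw  ( n u )
    dup
  while
    chunk swap count-eols            ( n' )
  repeat
  drop  r> drop ;
```

With this layout there is one READ-FILE call per chunk and no repositioning at all, which is where the saving over a line-by-line reader with repositioning comes from.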

And good to know that Gforth handles all EOL cases properly.
You never know what kind of text file you will need to process.

Re: Handling unsupported line-endings

<0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>

 by: Nickolay Kolchin - Tue, 19 Oct 2021 19:26 UTC

On Tuesday, October 19, 2021 at 11:40:49 AM UTC+3, Anton Ertl wrote:
> Heinrich Hohl <hheinri...@gmail.com> writes:
> >Yes, READ-LINE is slow and only useful for small files. It spends a lot of time
> >opening and closing a file to read just one line of text.
> Which files would READ-LINE open and close and why? Gforth's
> READ-FILE certainly does not do that.
>
> As for slowness, I have just tried this:
>
> perf stat -e cycles:u -e instructions:u ~/gforth/gforth -e '"count-unique.in" r/o open-file throw constant f create buf 256 allot : foo 0 begin buf 256 f read-line throw nip while 1+ repeat ; foo . cr bye'
>
> count-unique.in is the input for Ben Hoyt's count-unique example and
> consists of 10 bibles, 998170 lines, or 43MB. On a 4GHz Skylake
> gforth-fast takes 802M cycles or 0.201s user time for this (and 31M
> cycles or 0.011s without the call to FOO).
>
> For comparison, "wc -l" takes 31M cycles or 0.011s for counting the
> lines, so Gforth's READ-LINE could indeed be quite a bit faster (and
> its startup could also be faster), but for now the speed of READ-LINE
> has not been a problem for us.
> >However, I think that
> >READ-LINE should be able to handle any EOL sequence no matter how slow it is.
> Yes, and Gforth's READ-LINE handles CR, LF, and CRLF.
> >Once in a while I have to process postscript files with a size of up to 2 GB.
> A 2GB file would take about 10s to input with READ-LINE on a 4GHz
> Skylake.
> >SwiftForth version 3.10.2 and later use
> >the optimal algorithm, similar to your approach in /EOL, but written as code definitions.
> Since you mention it, I have also measured SwiftForth 3.11.0 with the
> same benchmark:
>
> LC_NUMERIC=en_US.utf8 perf stat -e cycles -e cycles:u -e cycles:k -e instructions:u sf 's" count-unique.in" r/o open-file throw constant f create buf 256 allot : foo 0 begin buf 256 f read-line throw nip while 1+ repeat ; foo . cr bye'
>
> The result is curious:
>
> 5,945,307,655 cycles
> 768,834,475 cycles:u
> 5,131,957,800 cycles:k
> 733,948,005 instructions:u # 0.95 insn per cycle
>
> 1.488727585 seconds time elapsed
>
> 0.916473000 seconds user
> 0.572295000 seconds sys
>
> What is clear is that it takes quite a bit of time, and a significant
> part of that time is spent in the kernel (cycles:k and seconds sys; by
> contrast, for gforth-fast the same command reports 43M cycles and
> 0.004s sys). What is curious is that the cycles count and the process
> accounting disagree by so much about the proportion of time spent in
> the kernel.
>
> In any case the difference between Gforth's performance and
> SwiftForth's performance here is probably because Gforth uses buffered
> I/O, and SwiftForth seems to use unbuffered I/O.
>

Actually, all forths implement READ-LINE poorly.

Benchmark results:

- wc -l: 0.074s
- gcc: 0.290s
- LXF (Version 1.6-982-823 Compiled on 2017-12-03): 10.736s
- LXD (Version 1.6-984-825 Compiled on 2017-12-04): 10.673s
- SwiftForth (SwiftForth i386-Linux 3.11.2 22-Jun-2021): 9.082s
- VfxForth_x64 ( 5.20 RC2 [build 0114] 2021-05-27): 6.760s
- VfxForth_x86 ( 5.20 [build 0749] 2021-05-27): 8.636s
- KForth-64 (0.2.3): 3m47.700s
- SPF4 (4.21): 8.087s
- gforth-0.7.3: 3.730s
- gforth-fast-0.7.3: 3.296s
- gforth 0.7.9_20211007: 5.228s
- gforth-fast 0.7.9_20211007: 5.037s

MinForth (V3.4.8 - 64) refused to run the benchmark for an unknown
reason.

All tests were performed under Arch Linux 64-bit (5.14.12-arch1-1) on
an AMD 5950X.

test.txt: 1016666745 bytes (970 MB). The file was generated by the FASTA
program from the "language shootout" benchmarks, run with argument 100000000.

Forth source code:

\ uncomment for SPF
\ REQUIRE INCLUDE lib/include/ansi.f

\ uncomment for KForth
\ INCLUDE ans-words.4th
\ INCLUDE strings.4th
\ INCLUDE files.4th

[UNDEFINED] BUFFER: [IF]
: BUFFER: CREATE ALLOT ;
[THEN]

[UNDEFINED] 0<= [IF]
: 0<= 0> INVERT ;
[THEN]

1024 CONSTANT maxline
maxline BUFFER: buf

0 VALUE file

: test-read-line
   S" test.txt" R/O BIN OPEN-FILE ABORT" open failed" TO file
   BEGIN
      buf maxline file READ-LINE   \ n flag ior
      2DROP                        \ n
      0<=
   UNTIL
;

test-read-line BYE

C source code:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    char buf[1024];
    char* s = buf;
    FILE* f = fopen("test.txt", "rb");
    if(f == NULL) { perror("oops"); exit(-1); }
    size_t n = sizeof(buf);
    while(fgets(s, n, f) != NULL);
    fclose(f);
}

P.S. Please treat this as an official performance regression report for
gforth.

P.P.S. And yes, READ-LINE poor performance is a big problem for
"language shootout" benchmarks...

Re: Handling unsupported line-endings

<sknsb6$1jc7$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=14995&group=comp.lang.forth#14995

Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Wed, 20 Oct 2021 12:45:10 +1100
Organization: Aioe.org NNTP Server
Message-ID: <sknsb6$1jc7$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org>
<8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org>
<649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at>
<0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="52615"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
Content-Language: en-GB
X-Notice: Filtered by postfilter v. 0.9.2
 by: dxforth - Wed, 20 Oct 2021 01:45 UTC

On 20/10/2021 06:26, Nickolay Kolchin wrote:
>
> 1024 CONSTANT maxline
> maxline BUFFER: buf

Shouldn't that be:

maxline 2 + BUFFER: buf

>
> 0 VALUE file
>
> : test-read-line
> S" test.txt" R/O BIN OPEN-FILE ABORT" open failed" TO file
> BEGIN
> buf maxline file READ-LINE \ n flag ior
> 2DROP \ n
> 0<=
> UNTIL
> ;
>
>
> int main()
> {
> char buf[1024];
> char* s = buf;
> FILE* f = fopen("test.txt", "rb");
> if(f == NULL) { perror("oops"); exit(-1); }
> size_t n = sizeof(buf);
> while(fgets(s, n, f) != NULL);
> fclose(f);
> }

It begs the question what could be the difference. Potential ones
that spring to mind:

- AFAIK fgets reads exactly 1024 bytes. It's unclear what read-line
actually reads.
- what line endings are being handled which determines implementation
- internal buffering
- was read-line implemented 'on top of' fgets
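
On the fgets point, for reference: C's fgets stores at most n-1 characters
and stops after a newline, so with a 1024-byte buffer it returns one line per
call only while lines are shorter than 1023 characters. A minimal sketch of a
line counter that also copes with longer lines follows. count_lines is a
made-up helper name, not part of the benchmark above, and the sketch assumes
the file contains no NUL bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count lines in f using fgets with a caller-supplied buffer.
   fgets stores at most cap-1 characters and stops after a '\n',
   so a long line arrives in several pieces; only a piece that
   ends in '\n' completes a line. A final line without a trailing
   newline is counted as well. Assumes no NUL bytes in the input. */
static long count_lines(FILE *f, char *buf, size_t cap)
{
    assert(cap >= 2);
    long lines = 0;
    int partial = 0;   /* nonzero while inside an unterminated line */

    while (fgets(buf, (int)cap, f) != NULL) {
        size_t len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
            lines++;
            partial = 0;
        } else {
            partial = 1;
        }
    }
    return lines + partial;
}
```

Called as count_lines(f, buf, sizeof buf) on a stream opened with "rb", this
counts LF-terminated lines only; a CR before the LF is simply left in the
buffer.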

> P.P.S. And yes, READ-LINE poor performance is a big problem for
> "language shootout" benchmarks...

Mixing apples and oranges?

Re: Handling unsupported line-endings

<sko3h4$1p8s$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=14996&group=comp.lang.forth#14996

Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Wed, 20 Oct 2021 14:47:48 +1100
Organization: Aioe.org NNTP Server
Message-ID: <sko3h4$1p8s$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org>
<8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org>
<649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at>
<0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<sknsb6$1jc7$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="58652"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
Content-Language: en-GB
X-Notice: Filtered by postfilter v. 0.9.2
 by: dxforth - Wed, 20 Oct 2021 03:47 UTC

On 20/10/2021 12:45, dxforth wrote:
> On 20/10/2021 06:26, Nickolay Kolchin wrote:
>>
>> 1024 CONSTANT maxline
>> maxline BUFFER: buf
>
> Shouldn't that be:
>
> maxline 2 + BUFFER: buf

I notice ANS says:

"The line buffer provided by c-addr should be at least u1+2 characters long."

"should" equates to "recommend". Why not "shall"? AFAIR a standard program
has no knowledge of READ-LINE's workings and must assume worst case.

Re: Handling unsupported line-endings

<fde1b1eb-e256-470a-86df-f55d224e3accn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=14997&group=comp.lang.forth#14997

X-Received: by 2002:a05:6214:4108:: with SMTP id kc8mr4407606qvb.54.1634711000748; Tue, 19 Oct 2021 23:23:20 -0700 (PDT)
X-Received: by 2002:a37:a401:: with SMTP id n1mr3659770qke.390.1634711000295; Tue, 19 Oct 2021 23:23:20 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Tue, 19 Oct 2021 23:23:20 -0700 (PDT)
In-Reply-To: <sknsb6$1jc7$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=213.21.29.203; posting-account=DoM31goAAADuzlbg5XKrMFannjkYS2Lr
NNTP-Posting-Host: 213.21.29.203
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com> <skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com> <2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com> <sknsb6$1jc7$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <fde1b1eb-e256-470a-86df-f55d224e3accn@googlegroups.com>
Subject: Re: Handling unsupported line-endings
From: nbkolc...@gmail.com (Nickolay Kolchin)
Injection-Date: Wed, 20 Oct 2021 06:23:20 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 29
 by: Nickolay Kolchin - Wed, 20 Oct 2021 06:23 UTC

On Wednesday, October 20, 2021 at 4:45:13 AM UTC+3, dxforth wrote:
> On 20/10/2021 06:26, Nickolay Kolchin wrote:
> >
> > 1024 CONSTANT maxline
> > maxline BUFFER: buf
> Shouldn't that be:
>
> maxline 2 + BUFFER: buf

Missed that. Thanks. MinForth still doesn't work.

> It begs the question what could be the difference. Potential ones
> that spring to mind:
>

> - what line endings are being handled which determines implementation

The whole "line-ending thing" is CP/M legacy. I.e. Unix hosted
implementations shouldn't care about that.

> - was read-line implemented 'on top of' fgets

The real question: why doesn't gforth use fgets()?

> > P.P.S. And yes, READ-LINE poor performance is a big problem for
> > "language shootout" benchmarks...
> Mixing apples and oranges?

No. Some shootout programs read large files line by line, and the I/O
overhead takes more time than the actual benchmark.

Re: Handling unsupported line-endings

<skof9j$1g94$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=14998&group=comp.lang.forth#14998

Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Wed, 20 Oct 2021 18:08:34 +1100
Organization: Aioe.org NNTP Server
Message-ID: <skof9j$1g94$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org>
<8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org>
<649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at>
<0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<sknsb6$1jc7$1@gioia.aioe.org>
<fde1b1eb-e256-470a-86df-f55d224e3accn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="49444"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
Content-Language: en-GB
X-Notice: Filtered by postfilter v. 0.9.2
 by: dxforth - Wed, 20 Oct 2021 07:08 UTC

On 20/10/2021 17:23, Nickolay Kolchin wrote:
> On Wednesday, October 20, 2021 at 4:45:13 AM UTC+3, dxforth wrote:
>
>> - what line endings are being handled which determines implementation
>
> The whole "line-ending thing" is CP/M legacy. I.e. Unix hosted
> implementations shouldn't care about that.
>
>> - was read-line implemented 'on top of' fgets
>
> The real question: why doesn't gforth use fgets()?

Given other line endings aren't going away anytime soon, one must
accommodate them. The only question is how.

Re: Handling unsupported line-endings

<1009e30b-dd8d-4a78-b24e-bc9cb8770a37n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=14999&group=comp.lang.forth#14999

X-Received: by 2002:ac8:4155:: with SMTP id e21mr5183516qtm.312.1634716259897;
Wed, 20 Oct 2021 00:50:59 -0700 (PDT)
X-Received: by 2002:a05:6214:20ac:: with SMTP id 12mr4664291qvd.13.1634716259645;
Wed, 20 Oct 2021 00:50:59 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Wed, 20 Oct 2021 00:50:59 -0700 (PDT)
In-Reply-To: <skof9j$1g94$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=213.21.29.203; posting-account=DoM31goAAADuzlbg5XKrMFannjkYS2Lr
NNTP-Posting-Host: 213.21.29.203
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<sknsb6$1jc7$1@gioia.aioe.org> <fde1b1eb-e256-470a-86df-f55d224e3accn@googlegroups.com>
<skof9j$1g94$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1009e30b-dd8d-4a78-b24e-bc9cb8770a37n@googlegroups.com>
Subject: Re: Handling unsupported line-endings
From: nbkolc...@gmail.com (Nickolay Kolchin)
Injection-Date: Wed, 20 Oct 2021 07:50:59 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Nickolay Kolchin - Wed, 20 Oct 2021 07:50 UTC

On Wednesday, October 20, 2021 at 10:08:38 AM UTC+3, dxforth wrote:

> Given other line endings aren't going away anytime soon, one must
> accommodate them. The only question is how.

No reason to do this when running under Unix. This contradicts
all other system tools.

Re: Handling unsupported line-endings

<skoj3g$16ui$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15000&group=comp.lang.forth#15000

Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Wed, 20 Oct 2021 19:13:35 +1100
Organization: Aioe.org NNTP Server
Message-ID: <skoj3g$16ui$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org>
<8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org>
<649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at>
<0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<sknsb6$1jc7$1@gioia.aioe.org>
<fde1b1eb-e256-470a-86df-f55d224e3accn@googlegroups.com>
<skof9j$1g94$1@gioia.aioe.org>
<1009e30b-dd8d-4a78-b24e-bc9cb8770a37n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="39890"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
Content-Language: en-GB
X-Notice: Filtered by postfilter v. 0.9.2
 by: dxforth - Wed, 20 Oct 2021 08:13 UTC

On 20/10/2021 18:50, Nickolay Kolchin wrote:
> On Wednesday, October 20, 2021 at 10:08:38 AM UTC+3, dxforth wrote:
>
>> Given other line endings aren't going away anytime soon, one must
>> accommodate them. The only question is how.
>
> No reason to do this when running under Unix. This contradicts
> all other system tools.

You want to load Intel HEX data files. The data files have CRLF line
endings. What will you do?

Re: Handling unsupported line-endings

<c1d93024-7e7f-4e2d-8b8b-f1c2c1cba00an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=15001&group=comp.lang.forth#15001

X-Received: by 2002:a0c:aa97:: with SMTP id f23mr4652953qvb.49.1634718182783; Wed, 20 Oct 2021 01:23:02 -0700 (PDT)
X-Received: by 2002:a37:c208:: with SMTP id i8mr3948402qkm.207.1634718182555; Wed, 20 Oct 2021 01:23:02 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!tr2.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Wed, 20 Oct 2021 01:23:02 -0700 (PDT)
In-Reply-To: <skoj3g$16ui$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=213.21.29.203; posting-account=DoM31goAAADuzlbg5XKrMFannjkYS2Lr
NNTP-Posting-Host: 213.21.29.203
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com> <skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com> <2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com> <sknsb6$1jc7$1@gioia.aioe.org> <fde1b1eb-e256-470a-86df-f55d224e3accn@googlegroups.com> <skof9j$1g94$1@gioia.aioe.org> <1009e30b-dd8d-4a78-b24e-bc9cb8770a37n@googlegroups.com> <skoj3g$16ui$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c1d93024-7e7f-4e2d-8b8b-f1c2c1cba00an@googlegroups.com>
Subject: Re: Handling unsupported line-endings
From: nbkolc...@gmail.com (Nickolay Kolchin)
Injection-Date: Wed, 20 Oct 2021 08:23:02 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 13
 by: Nickolay Kolchin - Wed, 20 Oct 2021 08:23 UTC

On Wednesday, October 20, 2021 at 11:13:42 AM UTC+3, dxforth wrote:
> On 20/10/2021 18:50, Nickolay Kolchin wrote:
> > On Wednesday, October 20, 2021 at 10:08:38 AM UTC+3, dxforth wrote:
> >
> >> Given other line endings aren't going away anytime soon, one must
> >> accommodate them. The only question is how.
> >
> > No reason to do this when running under Unix. This contradicts
> > all other system tools.
> You want to load Intel HEX data files. The data files have CRLF line
> endings. What will you do?

Write a custom load function. Look, the line ending of a text file is defined
by the underlying operating system.
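
A custom loader in this spirit can stay small. A minimal sketch, assuming each
line is first read with something like fgets and then trimmed (chomp_eol is a
made-up name, not a library function):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove one trailing line terminator (LF, CRLF or a lone CR)
   from the NUL-terminated string s. Returns the new length. */
static size_t chomp_eol(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);

    if (len > 0 && s[len - 1] == '\n')
        len--;                 /* drop the LF */
    if (len > 0 && s[len - 1] == '\r')
        len--;                 /* drop the CR of a CRLF pair, or a lone CR */
    s[len] = '\0';
    return len;
}
```

With this, an Intel HEX record read on a Unix system as ":00000001FF\r\n"
comes out as ":00000001FF" regardless of which convention the file used.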

Re: Handling unsupported line-endings

<2021Oct20.110354@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15002&group=comp.lang.forth#15002

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Wed, 20 Oct 2021 09:03:54 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 36
Message-ID: <2021Oct20.110354@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com> <skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com> <2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com> <sknsb6$1jc7$1@gioia.aioe.org> <fde1b1eb-e256-470a-86df-f55d224e3accn@googlegroups.com> <skof9j$1g94$1@gioia.aioe.org> <1009e30b-dd8d-4a78-b24e-bc9cb8770a37n@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="1dab82ece26fd7bfccda9dc95b4dae2b";
logging-data="24643"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18UKUhZoOhcaJPnTKXjo0gk"
Cancel-Lock: sha1:/IOGId3OhjUhXy7IDAOWbKuFbng=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Wed, 20 Oct 2021 09:03 UTC

Nickolay Kolchin <nbkolchin@gmail.com> writes:
>On Wednesday, October 20, 2021 at 10:08:38 AM UTC+3, dxforth wrote:
>
>> Given other line endings aren't going away anytime soon, one must
>> accommodate them. The only question is how.
>
>No reason to do this when running under Unix.

Unix is not living in isolation. I do a git pull, and some of the
files I get have CRLF newlines (not sure if CR-only is still a thing,
but better safe than sorry). You can complicate your workflow by
always converting files after every git pull, or you use a tool that
observes Postel's law, and accepts all kinds of newlines. Gforth's
READ-LINE is designed for the latter usage.

>This contradicts
>all other system tools.

I have yet to encounter or hear about problems with the way Gforth's
READ-LINE handles newlines.

By contrast, some other programs have caused problems when they were
fed files with CRLF (or without trailing newline), so I had to find a
workaround for that. I did so by writing a cat replacement in Gforth
(but ironically without using READ-LINE) that conditioned the input.

It's funny that cat does not have an option to insert trailing
newlines among the many options it has. It's really uncool if you cat
a bunch of .csv files, and some of them have no trailing newlines.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2021: https://euro.theforth.net/2021

Re: Handling unsupported line-endings

<685a27aa-fcf8-4021-9c4d-d6cfd2c99230n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=15003&group=comp.lang.forth#15003

X-Received: by 2002:a37:8ec6:: with SMTP id q189mr4364784qkd.145.1634722847437; Wed, 20 Oct 2021 02:40:47 -0700 (PDT)
X-Received: by 2002:ac8:5dd1:: with SMTP id e17mr5495277qtx.313.1634722847210; Wed, 20 Oct 2021 02:40:47 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Wed, 20 Oct 2021 02:40:47 -0700 (PDT)
In-Reply-To: <2021Oct20.110354@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=213.21.29.203; posting-account=DoM31goAAADuzlbg5XKrMFannjkYS2Lr
NNTP-Posting-Host: 213.21.29.203
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com> <skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com> <2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com> <sknsb6$1jc7$1@gioia.aioe.org> <fde1b1eb-e256-470a-86df-f55d224e3accn@googlegroups.com> <skof9j$1g94$1@gioia.aioe.org> <1009e30b-dd8d-4a78-b24e-bc9cb8770a37n@googlegroups.com> <2021Oct20.110354@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <685a27aa-fcf8-4021-9c4d-d6cfd2c99230n@googlegroups.com>
Subject: Re: Handling unsupported line-endings
From: nbkolc...@gmail.com (Nickolay Kolchin)
Injection-Date: Wed, 20 Oct 2021 09:40:47 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 19
 by: Nickolay Kolchin - Wed, 20 Oct 2021 09:40 UTC

On Wednesday, October 20, 2021 at 12:30:42 PM UTC+3, Anton Ertl wrote:
> Nickolay Kolchin <nbko...@gmail.com> writes:
> >On Wednesday, October 20, 2021 at 10:08:38 AM UTC+3, dxforth wrote:
> >
> >> Given other line endings aren't going away anytime soon, one must
> >> accommodate them. The only question is how.
> >
> >No reason to do this when running under Unix.
> Unix is not living in isolation. I do a git pull, and some of the
> files I get have CRLF newlines (not sure if CR-only is still a thing,
> but better safe than sorry). You can complicate your workflow by
> always converting files after every git pull, or you use a tool that
> observes Postel's law, and accepts all kinds of newlines. Gforth's
> READ-LINE is designed for the latter usage.

This is a poor example. Git has everything needed to deal with line endings
automatically. If it doesn't work, you have a broken configuration. For
other things, "dos2unix" exists.

What about the gforth 0.7.3 to 0.7.9 performance regression? Is it intended?

Re: Handling unsupported line-endings

<428f260a-702f-44d8-bfae-3a246396fc54n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=15004&group=comp.lang.forth#15004

X-Received: by 2002:a05:622a:4cf:: with SMTP id q15mr5888420qtx.265.1634725514527;
Wed, 20 Oct 2021 03:25:14 -0700 (PDT)
X-Received: by 2002:a37:a046:: with SMTP id j67mr4305265qke.127.1634725514346;
Wed, 20 Oct 2021 03:25:14 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Wed, 20 Oct 2021 03:25:14 -0700 (PDT)
In-Reply-To: <1009e30b-dd8d-4a78-b24e-bc9cb8770a37n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2003:ed:a723:ba01:2132:baa5:ae0:cfc8;
posting-account=mrP5kgoAAADXISqI3e5f4EXLUinHClBq
NNTP-Posting-Host: 2003:ed:a723:ba01:2132:baa5:ae0:cfc8
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<sknsb6$1jc7$1@gioia.aioe.org> <fde1b1eb-e256-470a-86df-f55d224e3accn@googlegroups.com>
<skof9j$1g94$1@gioia.aioe.org> <1009e30b-dd8d-4a78-b24e-bc9cb8770a37n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <428f260a-702f-44d8-bfae-3a246396fc54n@googlegroups.com>
Subject: Re: Handling unsupported line-endings
From: hheinric...@gmail.com (Heinrich Hohl)
Injection-Date: Wed, 20 Oct 2021 10:25:14 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 25
 by: Heinrich Hohl - Wed, 20 Oct 2021 10:25 UTC

On Wednesday, October 20, 2021 at 9:51:00 AM UTC+2, Nickolay Kolchin wrote:
> On Wednesday, October 20, 2021 at 10:08:38 AM UTC+3, dxforth wrote:
>
> > Given other line endings aren't going away anytime soon, one must
> > accommodate them. The only question is how.
> No reason to do this when running under Unix. This contradicts
> all other system tools.

If you write text files on a PC running under Unix/Linux, you should of course
use LF as an EOL character. This is the default EOL character under this OS.

Reading text files is a different matter.

When reading text files under any OS, you cannot know under which OS these
files have been generated. Windows? Mac? Linux?

I have Windows and Linux PCs running in the same network. And all PCs in the
world are somehow connected with each other via Internet. It makes sense that
READ-LINE can handle text files that are using any EOL sequence.

Even worse: Open a postscript file in a text editor. The used EOL sequence
may change several times within a postscript file. You will find CRLF, CR as
well as LF used as EOL sequences in the same file. In order to process
postscript files you need a READ-LINE routine that can handle all cases.

Henry
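
A READ-LINE-style reader that accepts all three conventions is not hard to
sketch. The following C function only illustrates the idea; it is not how
Gforth or SwiftForth implement READ-LINE, and read_any_line is a hypothetical
name. It relies on stdio buffering through getc and uses ungetc to look one
character past a CR.

```c
#include <assert.h>
#include <stdio.h>

/* Read one line from f into buf (capacity cap), accepting LF, CRLF
   or a lone CR as the terminator. The terminator is consumed but not
   stored, and buf is always NUL-terminated.
   Returns the number of characters stored, or -1 if end of file is
   reached before anything could be read. Lines longer than cap-1
   characters are returned in pieces. */
static long read_any_line(FILE *f, char *buf, size_t cap)
{
    assert(cap >= 2);
    size_t n = 0;
    int c = EOF;

    while (n + 1 < cap) {
        c = getc(f);
        if (c == EOF || c == '\n')
            break;
        if (c == '\r') {
            int next = getc(f);        /* swallow the LF of a CRLF pair */
            if (next != '\n' && next != EOF)
                ungetc(next, f);       /* lone CR: put the next char back */
            break;
        }
        buf[n++] = (char)c;
    }
    buf[n] = '\0';
    if (c == EOF && n == 0)
        return -1;
    return (long)n;
}
```

The file should be opened in binary mode ("rb") so that the C library does not
translate line endings itself. Because the EOL check is done per character,
a file that mixes CRLF, CR and LF is handled the same way as a uniform one.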

Re: Handling unsupported line-endings

<2021Oct20.123903@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15005&group=comp.lang.forth#15005

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Wed, 20 Oct 2021 10:39:03 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 37
Message-ID: <2021Oct20.123903@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com> <2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com> <sknsb6$1jc7$1@gioia.aioe.org> <fde1b1eb-e256-470a-86df-f55d224e3accn@googlegroups.com> <skof9j$1g94$1@gioia.aioe.org> <1009e30b-dd8d-4a78-b24e-bc9cb8770a37n@googlegroups.com> <2021Oct20.110354@mips.complang.tuwien.ac.at> <685a27aa-fcf8-4021-9c4d-d6cfd2c99230n@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="1dab82ece26fd7bfccda9dc95b4dae2b";
logging-data="25191"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/8DY9Sy76A6L8FDX3076+X"
Cancel-Lock: sha1:56G/BDbyfkCv+hs8a9quySXtJzU=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Wed, 20 Oct 2021 10:39 UTC

Nickolay Kolchin <nbkolchin@gmail.com> writes:
>On Wednesday, October 20, 2021 at 12:30:42 PM UTC+3, Anton Ertl wrote:
>> Nickolay Kolchin <nbko...@gmail.com> writes:
>> >On Wednesday, October 20, 2021 at 10:08:38 AM UTC+3, dxforth wrote:
>> >
>> >> Given other line endings aren't going away anytime soon, one must
>> >> accommodate them. The only question is how.
>> >
>> >No reason to do this when running under Unix.
>> Unix is not living in isolation. I do a git pull, and some of the
>> files I get have CRLF newlines (not sure if CR-only is still a thing,
>> but better safe than sorry). You can complicate your workflow by
>> always converting files after every git pull, or you use a tool that
>> observes Postel's law, and accepts all kinds of newlines. Gforth's
>> READ-LINE is designed for the latter usage.
>
>This is a poor example. Git has everything needed to deal with line endings
>automatically. If it doesn't work, you have a broken configuration.

Maybe we have; are we the only ones? In any case, blaming the
configuration does not fix the problem.

>For
>other things "dos2unix" exists.
>
>What about the gforth 0.7.3 to 0.7.9 performance regression? Is it intended?

It's a consequence of an intended change (READ-LINE now continues if
the lower level produces an EINTR rather than delivering a non-zero
ior), but I think it can be implemented more efficiently.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2021: https://euro.theforth.net/2021
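
The retry-on-EINTR behaviour described above is a common POSIX pattern. As a
rough sketch only (this is not Gforth's actual code; read_retry is a
hypothetical helper and a POSIX system is assumed):

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Like read(2), but restarts the call when it is interrupted by a
   signal before any data was transferred (errno == EINTR), instead
   of reporting that as an error to the caller. */
static ssize_t read_retry(int fd, void *buf, size_t count)
{
    assert(buf != NULL);
    ssize_t n;

    do {
        n = read(fd, buf, count);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

If a wrapper like this were called once per character or per line rather than
once per buffer refill, the extra system calls would be a plausible source of
slowdown, but the thread does not say where the cost in Gforth actually comes
from.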

Re: Handling unsupported line-endings

<ddcec95d-31aa-42c1-bc6b-35424e7b3864n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=15006&group=comp.lang.forth#15006

X-Received: by 2002:ac8:5794:: with SMTP id v20mr6277378qta.243.1634730444657;
Wed, 20 Oct 2021 04:47:24 -0700 (PDT)
X-Received: by 2002:ac8:4553:: with SMTP id z19mr6311525qtn.187.1634730444471;
Wed, 20 Oct 2021 04:47:24 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Wed, 20 Oct 2021 04:47:24 -0700 (PDT)
In-Reply-To: <428f260a-702f-44d8-bfae-3a246396fc54n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=213.21.29.203; posting-account=DoM31goAAADuzlbg5XKrMFannjkYS2Lr
NNTP-Posting-Host: 213.21.29.203
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<sknsb6$1jc7$1@gioia.aioe.org> <fde1b1eb-e256-470a-86df-f55d224e3accn@googlegroups.com>
<skof9j$1g94$1@gioia.aioe.org> <1009e30b-dd8d-4a78-b24e-bc9cb8770a37n@googlegroups.com>
<428f260a-702f-44d8-bfae-3a246396fc54n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ddcec95d-31aa-42c1-bc6b-35424e7b3864n@googlegroups.com>
Subject: Re: Handling unsupported line-endings
From: nbkolc...@gmail.com (Nickolay Kolchin)
Injection-Date: Wed, 20 Oct 2021 11:47:24 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 29
 by: Nickolay Kolchin - Wed, 20 Oct 2021 11:47 UTC

On Wednesday, October 20, 2021 at 1:25:15 PM UTC+3, Heinrich Hohl wrote:
> On Wednesday, October 20, 2021 at 9:51:00 AM UTC+2, Nickolay Kolchin wrote:
> > On Wednesday, October 20, 2021 at 10:08:38 AM UTC+3, dxforth wrote:
> >
> > > Given other line endings aren't going away anytime soon, one must
> > > accommodate them. The only question is how.
> > No reason to do this when running under Unix. This contradicts with
> > all other system tools.
> If you write text files on a PC running under Unix/Linux, you should of course
> use LF as an EOL character. This is the default EOL character under this OS.
>
> Reading text files is a different matter.
>
> When reading text files under any OS, you cannot know under which OS these
> files have been generated. Windows? Mac? Linux?
>
> I have Windows and Linux PCs running in the same network. And all PCs in the
> world are somehow connected with each other via Internet. It makes sense that
> READ-LINE can handle text files that are using any EOL sequence.
>
> Even worse: Open a postscript file in a text editor. The used EOL sequence
> may change several times within a postscript file. You will find CRLF, CR as
> well as LF used as EOL sequences in the same file. In order to process
> postscript files you need a READ-LINE routine that can handle all cases.
>

I was going to write that no other language has such "readline" behaviour by
default, but discovered that Julia does.

Nevertheless, it is hardly worth the 15-45 times performance regression.

Re: Handling unsupported line-endings

<skp023$1kto$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15007&group=comp.lang.forth#15007

 by: dxforth - Wed, 20 Oct 2021 11:54 UTC

On 20/10/2021 21:39, Anton Ertl wrote:
> Nickolay Kolchin <nbkolchin@gmail.com> writes:
>>On Wednesday, October 20, 2021 at 12:30:42 PM UTC+3, Anton Ertl wrote:
>>> Nickolay Kolchin <nbko...@gmail.com> writes:
>>> >On Wednesday, October 20, 2021 at 10:08:38 AM UTC+3, dxforth wrote:
>>> >
>>> >> Given other line endings aren't going away anytime soon, one must
>>> >> accommodate them. The only question is how.
>>> >
>>> >No reason to do this when running under Unix.
>>> Unix is not living in isolation. I do a git pull, and some of the
>>> files I get have CRLF newlines (not sure if CR-only is still a thing,
>>> but better safe than sorry). You can complicate your workflow by
>>> always converting files after every git pull, or you use a tool that
>>> observes Postel's law, and accepts all kinds of newlines. Gforth's
>>> READ-LINE is designed for the latter usage.
>>
>>This is a poor example. Git has everything needed to deal with line endings
>>automatically. If it doesn't work, you have a broken configuration.
>
> Maybe we have; are we the only ones? In any case, blaming the
> configuration does not fix the problem.

I had a similar experience with GitHub transforming DOS source files to
Unix line endings on download. While they were not my files, it became my
problem. Due to the number of files involved I needed a conversion utility
that handled wildcards. I could not find one and ended up writing my own.
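
For illustration only, a minimal Python sketch of such a wildcard conversion utility. This is not the utility mentioned above; the names to_crlf and convert_files are hypothetical, and it assumes the goal is to restore DOS (CRLF) line endings in place.

```python
import glob
import os


def to_crlf(data: bytes) -> bytes:
    """Return data with every LF, CR or CRLF line ending rewritten as CRLF."""
    # Normalise everything to LF first so existing CRLFs are not doubled.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


def convert_files(pattern: str) -> int:
    """Convert all regular files matching a wildcard pattern; return how many changed."""
    changed = 0
    for path in glob.glob(pattern):
        if not os.path.isfile(path):
            continue  # skip directories that happen to match the pattern
        with open(path, "rb") as f:
            original = f.read()
        converted = to_crlf(original)
        if converted != original:
            with open(path, "wb") as f:
                f.write(converted)
            changed += 1
    return changed
```

Calling convert_files("src/*.f") would rewrite every matching file; the pattern is only an example. Files are read and written in binary mode so that the platform's own newline translation does not interfere.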

Re: Handling unsupported line-endings

<skp08u$ge5$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=15008&group=comp.lang.forth#15008

 by: Ron AARON - Wed, 20 Oct 2021 11:58 UTC

On 20/10/2021 14:47, Nickolay Kolchin wrote:
> On Wednesday, October 20, 2021 at 1:25:15 PM UTC+3, Heinrich Hohl wrote:
>> On Wednesday, October 20, 2021 at 9:51:00 AM UTC+2, Nickolay Kolchin wrote:
>>> On Wednesday, October 20, 2021 at 10:08:38 AM UTC+3, dxforth wrote:
>>>
>>>> Given other line endings aren't going away anytime soon, one must
>>>> accommodate them. The only question is how.
>>> No reason to do this when running under Unix. This contradicts with
>>> all other system tools.
>> If you write text files on a PC running under Unix/Linux, you should of course
>> use LF as an EOL character. This is the default EOL character under this OS.
>>
>> Reading text files is a different matter.
>>
>> When reading text files under any OS, you cannot know under which OS these
>> files have been generated. Windows? Mac? Linux?
>>
>> I have Windows and Linux PCs running in the same network. And all PCs in the
>> world are somehow connected with each other via Internet. It makes sense that
>> READ-LINE can handle text files that are using any EOL sequence.
>>
>> Even worse: Open a postscript file in a text editor. The used EOL sequence
>> may change several times within a postscript file. You will find CRLF, CR as
>> well as LF used as EOL sequences in the same file. In order to process
>> postscript files you need a READ-LINE routine that can handle all cases.
>>
>
> I was going to write that no other language has such "readline" behaviour by
> default, but discovered that Julia does.
>
> Nevertheless, it is hardly worth the 15-45 times performance regression.

8th's "f:getline" also handles all the CRLF,CR,LF variants (mixed in the
same file or not). Precisely because we can't control what format data
arrives in.

The performance penalty is minuscule, because the only time a check is
performed is if the character read was a CR or LF.

In my experience, the time spent in the entire rest of your program is
almost always greater than the time spent parsing lines of text from a
text file.
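
A minimal Python sketch of the idea, for illustration: the extra look-ahead happens only when the byte just read is a CR or LF. This is not 8th's actual f:getline implementation; the function name split_any_eol is hypothetical, and it assumes the whole file is already in memory as bytes.

```python
def split_any_eol(data: bytes) -> list:
    """Split data into lines, accepting LF, CR and CRLF (mixed freely)."""
    lines = []
    start = 0  # index where the current line begins
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b == 0x0A or b == 0x0D:
            lines.append(data[start:i])
            # Only after a CR do we peek ahead, to swallow the LF of a CRLF pair.
            if b == 0x0D and i + 1 < n and data[i + 1] == 0x0A:
                i += 1
            start = i + 1
        i += 1
    if start < n:
        lines.append(data[start:])  # final line without a terminator
    return lines
```

For ordinary characters the loop does one comparison pair and moves on; the CRLF check costs something only once per line. Python's built-in bytes.splitlines() accepts the same three terminators and can serve as a cross-check.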

Re: Handling unsupported line-endings

<b099beb0-a80e-4796-b54b-6fb7cb5d61b6n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=15009&group=comp.lang.forth#15009

 by: Nickolay Kolchin - Wed, 20 Oct 2021 12:53 UTC

On Wednesday, October 20, 2021 at 2:58:24 PM UTC+3, Ron AARON wrote:
> On 20/10/2021 14:47, Nickolay Kolchin wrote:
> > On Wednesday, October 20, 2021 at 1:25:15 PM UTC+3, Heinrich Hohl wrote:
> >> On Wednesday, October 20, 2021 at 9:51:00 AM UTC+2, Nickolay Kolchin wrote:
> >>> On Wednesday, October 20, 2021 at 10:08:38 AM UTC+3, dxforth wrote:
> >>>
> >>>> Given other line endings aren't going away anytime soon, one must
> >>>> accommodate them. The only question is how.
> >>> No reason to do this when running under Unix. This contradicts with
> >>> all other system tools.
> >> If you write text files on a PC running under Unix/Linux, you should of course
> >> use LF as an EOL character. This is the default EOL character under this OS.
> >>
> >> Reading text files is a different matter.
> >>
> >> When reading text files under any OS, you cannot know under which OS these
> >> files have been generated. Windows? Mac? Linux?
> >>
> >> I have Windows and Linux PCs running in the same network. And all PCs in the
> >> world are somehow connected with each other via Internet. It makes sense that
> >> READ-LINE can handle text files that are using any EOL sequence.
> >>
> >> Even worse: Open a postscript file in a text editor. The used EOL sequence
> >> may change several times within a postscript file. You will find CRLF, CR as
> >> well as LF used as EOL sequences in the same file. In order to process
> >> postscript files you need a READ-LINE routine that can handle all cases.
> >>
> >
> > I was going to write that no other language has such "readline" behaviour by
> > default, but discovered that Julia does.
> >
> > Nevertheless, it is hardly worth the 15-45 times performance regression.
> 8th's "f:getline" also handles all the CRLF,CR,LF variants (mixed in the
> same file or not). Precisely because we can't control what format data
> arrives in.
>
> The performance penalty is minuscule, because the only time a check is
> performed is if the character read was a CR or LF.
>
> In my experience, the time spent in the entire rest of your program is
> almost always greater than the time spent parsing lines of text from a
> text file.

Only 15 times slower than C.

: test
"test.txt" f:open
repeat
f:getline drop
f:eof? not
while!
drop
;

test bye

- 4.344s

P.S. I was wrong about Julia. It doesn't handle CR-only (0x0D, classic Mac OS)
line endings, which makes perfect sense...
