devel / comp.lang.forth / Re: Handling unsupported line-endings

Re: Handling unsupported line-endings

<924924b6-2820-48d9-bd52-3f4e79e13ae0n@googlegroups.com>

 by: S Jack - Wed, 20 Oct 2021 13:28 UTC

On Monday, October 18, 2021 at 5:17:03 AM UTC-5, dxforth wrote:
> I used this in a recent app and figure others may find it useful.
>

For the rare occasion:
:) cat foo
cr .( Hello )
cr .( World )
cr bye
:) xxd foo
00000000: 6372 202e 2820 4865 6c6c 6f20 290d 0a63  cr .( Hello )..c
00000010: 7220 2e28 2057 6f72 6c64 2029 0d0a 6372  r .( World )..cr
00000020: 2062 7965 0d0a                            bye..
:) sed -e 's/\x0d/\n/' -e 's/\x0a$//' foo| xxd
00000000: 6372 202e 2820 4865 6c6c 6f20 290a 6372  cr .( Hello ).cr
00000010: 202e 2820 576f 726c 6420 290a 6372 2062   .( World ).cr b
00000020: 7965 0a                                  ye.
:) sed -e 's/\x0d/\n/' -e 's/\x0a$//' foo > bar ; frogd '"bar" fload'

Hello
World
:)

--
me
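
A minimal C sketch of the same conversion done in memory rather than through sed. It goes a little further than the sed command above, which only needs to cope with CRLF: here a lone CR is also mapped to LF. The function name normalize_eol is hypothetical and does not come from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Rewrite buf in place so that every CR, LF or CRLF becomes a single LF.
   Returns the new length, which is never larger than len.
   normalize_eol is an illustrative name, not an existing API. */
size_t normalize_eol(char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            buf[out++] = '\n';
            if (i + 1 < len && buf[i + 1] == '\n')
                i++;                /* swallow the LF of a CRLF pair */
        } else {
            buf[out++] = buf[i];
        }
    }
    return out;
}
```

The write index never overtakes the read index, so no second buffer is needed. The caller is assumed to have the whole file in memory already.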

Re: Handling unsupported line-endings

<9b026806-4455-4767-b35c-2c87458596c2n@googlegroups.com>

 by: pahihu - Wed, 20 Oct 2021 13:53 UTC

Nickolay Kolchin wrote (Wednesday, October 20, 2021, 14:53:46 UTC+2):
> Only 15 times slower than C.
>
> : test
> "test.txt" f:open
> repeat
> f:getline drop
> f:eof? not
> while!
> drop
> ;
>

Hi,

In 8th you can mmap the file, it is faster.

: process-line drop ;

: test
"test.txt" true f:mmap >s
' process-line s:eachline
;

pahihu
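
For comparison, a rough C sketch of the mmap approach on a POSIX system. How 8th implements f:mmap and s:eachline internally is not stated in the thread, so this only illustrates the general technique: map the file once and scan the bytes directly, with no per-line read call. The function name count_lines_mmap is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count the lines of a file by mapping it read-only and scanning it.
   CR, LF and CRLF each end one line; a last line without a terminator
   is counted too. Returns -1 on error. Illustrative name, POSIX only. */
long count_lines_mmap(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    if (len == 0) {                 /* mmap rejects a zero length */
        close(fd);
        return 0;
    }
    const char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    long lines = 0;
    for (size_t i = 0; i < len; i++) {
        if (p[i] == '\n') {
            lines++;
        } else if (p[i] == '\r') {
            lines++;
            if (i + 1 < len && p[i + 1] == '\n')
                i++;                /* CRLF counts once */
        }
    }
    if (p[len - 1] != '\n' && p[len - 1] != '\r')
        lines++;                    /* unterminated last line */
    munmap((void *)p, len);
    return lines;
}
```

Like the test words above, this discards each line, so it measures only the cost of finding line boundaries.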

Re: Handling unsupported line-endings

<24c92fcd-3e51-4fa4-97f1-dbf0a87b0652n@googlegroups.com>

 by: Nickolay Kolchin - Wed, 20 Oct 2021 14:13 UTC

On Wednesday, October 20, 2021 at 4:53:21 PM UTC+3, pahihu wrote:
> Nickolay Kolchin wrote (Wednesday, October 20, 2021, 14:53:46 UTC+2):
> > Only 15 times slower than C.
> >
> > : test
> > "test.txt" f:open
> > repeat
> > f:getline drop
> > f:eof? not
> > while!
> > drop
> > ;
> >
> Hi,
>
> In 8th you can mmap the file, it is faster.
>
> : process-line drop ;
>
> : test
> "test.txt" true f:mmap >s
> ' process-line s:eachline
> ;
>

Yes. A bit faster -- 2.982s.

Re: Handling unsupported line-endings

<skpedu$kn2$1@dont-email.me>

 by: Ron AARON - Wed, 20 Oct 2021 15:59 UTC

On 20/10/2021 15:53, Nickolay Kolchin wrote:
> On Wednesday, October 20, 2021 at 2:58:24 PM UTC+3, Ron AARON wrote:
>> On 20/10/2021 14:47, Nickolay Kolchin wrote:
>>> Nevertheless, it should not be worth the 15-45 times performance regression.
>> 8th's "f:getline" also handles all the CRLF,CR,LF variants (mixed in the
>> same file or not). Precisely because we can't control what format data
>> arrives in.
>>
>> The performance penalty is minuscule, because the only time a check is
>> performed is if the character read was a CR or LF.
>>
>> In my experience, the time spent in the entire rest of your program is
>> almost always greater than the time spent parsing lines of text from a
>> text file.
>
> Only 15 times slower than C.
>
> : test
> "test.txt" f:open
> repeat
> f:getline drop
> f:eof? not
> while!
> drop
> ;
>
> test bye
>
> - 4.344s

Yes, it is slower than C. But C doesn't handle arbitrary EOL
terminators, nor does it allocate a dynamic string to pass on. So not
exactly a head-to-head comparison.

Furthermore, my point is that the time spent in reading a line is most
often much less than whatever time you spend processing that line.
You're just dropping it, which is not a particularly useful test case.

In any event, while I do spend some effort in making primitives faster
in 8th, I don't think I've actually ever tried to make f:getline any
faster. Nobody has complained so far... and there's lots of other stuff
to do.
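
The "check only on CR or LF" idea can be sketched in C roughly as follows. This is not 8th's f:getline, whose source is not shown in the thread. It is a hypothetical read_line_any_eol that uses a fixed caller-supplied buffer instead of a dynamic string and silently truncates over-long lines.

```c
#include <assert.h>
#include <stdio.h>

/* Read one line into buf (capacity cap, including the final NUL).
   Accepts LF, CR or CRLF as terminator, mixed freely within one file.
   Returns the line length without terminator, or -1 at end of file.
   Characters that do not fit into buf are dropped in this sketch. */
long read_line_any_eol(FILE *fp, char *buf, size_t cap)
{
    assert(fp != NULL && buf != NULL && cap > 0);
    size_t n = 0;
    int c;
    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            break;
        if (c == '\r') {
            /* The only extra work: look one character ahead for LF. */
            int next = getc(fp);
            if (next != '\n' && next != EOF)
                ungetc(next, fp);
            break;
        }
        if (n + 1 < cap)
            buf[n++] = (char)c;
    }
    if (c == EOF && n == 0)
        return -1;                  /* nothing left to read */
    buf[n] = '\0';
    return (long)n;
}
```

Ordinary characters take the same path as in a reader that knows only LF. The lookahead runs once per CR, which is the basis of the claim that the penalty for handling all three endings is small.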

Re: Handling unsupported line-endings

<skpevp$p86$1@dont-email.me>

 by: Ron AARON - Wed, 20 Oct 2021 16:09 UTC

On 20/10/2021 16:53, pahihu wrote:
> Nickolay Kolchin wrote (Wednesday, October 20, 2021, 14:53:46 UTC+2):
>> Only 15 times slower than C.
>>
>> : test
>> "test.txt" f:open
>> repeat
>> f:getline drop
>> f:eof? not
>> while!
>> drop
>> ;
>>
>
> Hi,
>
> In 8th you can mmap the file, it is faster.
>
> : process-line drop ;
>
> : test
> "test.txt" true f:mmap >s
> ' process-line s:eachline
> ;

True, that is faster (depending on the system configuration as well).

I still maintain that in normal use, what you do with the line is going
to take more time than just snipping it from the file.

f:getline is built on top of the C library's fgetc().

I just did a 'perf' analysis, and 86% of the time of 'getline' is
actually spent in getc(). So probably a refactor would be a good idea...
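
One possible direction for such a refactor, offered as an assumption and not as what 8th does or later did: on POSIX systems getc() and fgetc() lock the stream on every call, and the per-character cost can often be reduced by taking the lock once and using getc_unlocked() inside the loop. The function name count_lines_unlocked is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Count lines with one stream lock for the whole scan instead of one
   lock per character. CR, LF and CRLF each end one line; a last line
   without a terminator is counted as well. POSIX only. */
long count_lines_unlocked(FILE *fp)
{
    assert(fp != NULL);
    long lines = 0;
    int c;
    int prev = EOF;                 /* previous character, EOF = none yet */

    flockfile(fp);
    while ((c = getc_unlocked(fp)) != EOF) {
        /* An LF directly after a CR belongs to the same CRLF terminator. */
        if (c == '\r' || (c == '\n' && prev != '\r'))
            lines++;
        prev = c;
    }
    funlockfile(fp);

    if (prev != EOF && prev != '\n' && prev != '\r')
        lines++;                    /* unterminated last line */
    return lines;
}
```

Whether locking is really where the 86% goes would have to be confirmed with the same kind of perf run. The sketch only shows the shape of the change.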

Re: Handling unsupported line-endings

<ce6dfc9e-5be3-40e4-bb61-2e82a6e47d52n@googlegroups.com>

 by: Nickolay Kolchin - Wed, 20 Oct 2021 16:36 UTC

On Wednesday, October 20, 2021 at 7:00:00 PM UTC+3, Ron AARON wrote:
> On 20/10/2021 15:53, Nickolay Kolchin wrote:
> > On Wednesday, October 20, 2021 at 2:58:24 PM UTC+3, Ron AARON wrote:
> >> On 20/10/2021 14:47, Nickolay Kolchin wrote:
> >>> Nevertheless, it should not be worth the 15-45 times performance regression.
> >> 8th's "f:getline" also handles all the CRLF,CR,LF variants (mixed in the
> >> same file or not). Precisely because we can't control what format data
> >> arrives in.
> >>
> >> The performance penalty is minuscule, because the only time a check is
> >> performed is if the character read was a CR or LF.
> >>
> >> In my experience, the time spent in the entire rest of your program is
> >> almost always greater than the time spent parsing lines of text from a
> >> text file.
> >
> > Only 15 times slower than C.
> >
> > : test
> > "test.txt" f:open
> > repeat
> > f:getline drop
> > f:eof? not
> > while!
> > drop
> > ;
> >
> > test bye
> >
> > - 4.344s
> Yes, it is slower than C. But C doesn't handle arbitrary EOL
> terminators, nor does it allocate a dynamic string to pass on. So not
> exactly a head-to-head comparison.
>
> Furthermore, my point is that the time spent in reading a line is most
> often much less than whatever time you spend processing that line.
> You're just dropping it, which is not a particularly useful test case.
>

oforth is probably a fair comparison:

: test
"test.txt" File new dup open(File.READ)
begin
dup readLine drop dup end?
until
;

test bye

Result -- 1.757s

> In any event, while I do spend some effort in making primitives faster
> in 8th, I don't think I've actually ever tried to make f:getline any
> faster. Nobody has complained so far... and there's lots of other stuff
> to do.

That's true. 8th is definitely not a performance workhorse.

Re: Handling unsupported line-endings

<skpjb8$ndn$1@dont-email.me>

 by: Ron AARON - Wed, 20 Oct 2021 17:23 UTC

On 20/10/2021 19:36, Nickolay Kolchin wrote:
> On Wednesday, October 20, 2021 at 7:00:00 PM UTC+3, Ron AARON wrote:
>> On 20/10/2021 15:53, Nickolay Kolchin wrote:
>>> On Wednesday, October 20, 2021 at 2:58:24 PM UTC+3, Ron AARON wrote:
>>>> On 20/10/2021 14:47, Nickolay Kolchin wrote:
>>>>> Nevertheless, it should not be worth the 15-45 times performance regression.
>>>> 8th's "f:getline" also handles all the CRLF,CR,LF variants (mixed in the
>>>> same file or not). Precisely because we can't control what format data
>>>> arrives in.
>>>>
>>>> The performance penalty is minuscule, because the only time a check is
>>>> performed is if the character read was a CR or LF.
>>>>
>>>> In my experience, the time spent in the entire rest of your program is
>>>> almost always greater than the time spent parsing lines of text from a
>>>> text file.
>>>
>>> Only 15 times slower than C.
>>>
>>> : test
>>> "test.txt" f:open
>>> repeat
>>> f:getline drop
>>> f:eof? not
>>> while!
>>> drop
>>> ;
>>>
>>> test bye
>>>
>>> - 4.344s
>> Yes, it is slower than C. But C doesn't handle arbitrary EOL
>> terminators, nor does it allocate a dynamic string to pass on. So not
>> exactly a head-to-head comparison.
>>
>> Furthermore, my point is that the time spent in reading a line is most
>> often much less than whatever time you spend processing that line.
>> You're just dropping it, which is not a particularly useful test case.
>>
>
> oforth will be probably a fair comparison:
>
> : test
> "test.txt" File new dup open(File.READ)
> begin
> dup readLine drop dup end?
> until
> ;
>
> test bye
>
> Result -- 1.757s
>
>> In any event, while I do spend some effort in making primitives faster
>> in 8th, I don't think I've actually ever tried to make f:getline any
>> faster. Nobody has complained so far... and there's lots of other stuff
>> to do.
>
> That's true. 8th is definitely not a performance workhorse.

No arguments from me: performance per se is not one of its goals.
*Adequate* performance for the tasks it's used for is.

Re: Handling unsupported line-endings

<2021Oct21.001107@mips.complang.tuwien.ac.at>

 by: Anton Ertl - Wed, 20 Oct 2021 22:11 UTC

Nickolay Kolchin <nbkolchin@gmail.com> writes:
>- gforth-0.7.3: 3.730s
>- gforth-fast-0.7.3: 3.296s
>- gforth 0.7.9_20211007: 5.228s
>- gforth-fast 0.7.9_20211007: 5.037s
....
>P.S. Please treat this as an official performance regression report for
>gforth.

Thank you. Mostly fixed by commit
1d7f2373f67acea44d8a2e084b29f1ac2e375022.

For my smaller benchmark:

LC_NUMERIC=en_US.utf8 perf stat -e cycles -e cycles:u -e cycles:k -e instructions:u gforth-fast -e 's" ../forth/count-unique.in" r/o open-file throw constant f create buf 256 allot : foo 0 begin buf 256 f read-line throw nip while 1+ repeat ; foo . cr bye'
    cycles:u
        802M   0.7.9 before patch
 503,446,908   0.7.9 with patch
 485,725,830   0.7.3

It's possible to be a lot faster by implementing buffering for
READ-FILE and READ-LINE ourselves (one can then use SIMD instructions
for scanning for CR and LF), but for now, being faster does not appear
important enough to go to these lengths.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2021: https://euro.theforth.net/2021
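
A small C sketch of what doing the buffering ourselves could look like. It is not Gforth code. The caller supplies the block buffer, and a plain byte loop stands where a SIMD scan for CR and LF could go. The one detail that needs care is a CRLF pair split across two blocks, which is handled by carrying the last byte seen from one block to the next. The function name count_lines_blocks is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Count lines by reading the file in blocks of cap bytes and scanning
   each block for CR and LF. CR, LF and CRLF each end one line; a last
   line without a terminator is counted as well. */
long count_lines_blocks(FILE *fp, char *block, size_t cap)
{
    assert(fp != NULL && block != NULL && cap > 0);
    long lines = 0;
    int prev = EOF;                 /* last byte seen, kept across blocks */
    size_t got;

    while ((got = fread(block, 1, cap, fp)) > 0) {
        for (size_t i = 0; i < got; i++) {
            char c = block[i];
            /* An LF right after a CR is the second half of CRLF, even
               when the CR was the last byte of the previous block. */
            if (c == '\r' || (c == '\n' && prev != '\r'))
                lines++;
            prev = (unsigned char)c;
        }
    }
    if (prev != EOF && prev != '\n' && prev != '\r')
        lines++;                    /* unterminated last line */
    return lines;
}
```

A real READ-LINE would also have to copy each line out to the caller and report its length, which this counter leaves out.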

Re: Handling unsupported line-endings

<2021Oct21.002727@mips.complang.tuwien.ac.at>

 by: Anton Ertl - Wed, 20 Oct 2021 22:27 UTC

Nickolay Kolchin <nbkolchin@gmail.com> writes:
>The real question -- why gforth doesn't use fgets()?

That's a good question. I looked at gforth-0.3.0, which only deals
with one kind of newline, and it does not use fgets() either. So it's
not just that fgets() does not fit our requirements for newlines;
apparently we had already found a mismatch between the READ-LINE
requirements and what fgets() provides back then. I don't know
what that mismatch was, though.

- anton

Re: Handling unsupported line-endings

<FY2cJ.2541$oo4.1029@fx02.iad>

https://www.novabbs.com/devel/article-flat.php?id=15022&group=comp.lang.forth#15022

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx02.iad.POSTED!not-for-mail
Newsgroups: comp.lang.forth
From: branimir...@icloud.com (Branimir Maksimovic)
Subject: Re: Handling unsupported line-endings
References: <skjhir$jd9$1@gioia.aioe.org>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 15
Message-ID: <FY2cJ.2541$oo4.1029@fx02.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Thu, 21 Oct 2021 01:00:21 UTC
Organization: usenet-news.net
Date: Thu, 21 Oct 2021 01:00:21 GMT
X-Received-Bytes: 1150
 by: Branimir Maksimovic - Thu, 21 Oct 2021 01:00 UTC

On 2021-10-18, dxforth <dxforth@gmail.com> wrote:
> I used this in a recent app and figure others may find it useful.
>
> My READ-LINE automatically handles LF or CRLF end-of-line (as well as
> CP/M end-of-file character). Rarely do I process files with CR line
> endings but should it happen, here's a substitute routine I've used
> for that. It automatically handles CR, LF or CRLF endings.
>
What are supported line endings when standard line endings are NOT?

--

7-77-777
Evil Sinner!
with software, you repeat same experiment, expecting different results...

Re: Handling unsupported line-endings

<skqel8$s65$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15023&group=comp.lang.forth#15023

Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Thu, 21 Oct 2021 12:09:58 +1100
Organization: Aioe.org NNTP Server
Message-ID: <skqel8$s65$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org>
<8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org>
<649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at>
<0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<sknsb6$1jc7$1@gioia.aioe.org>
<fde1b1eb-e256-470a-86df-f55d224e3accn@googlegroups.com>
<skof9j$1g94$1@gioia.aioe.org>
<1009e30b-dd8d-4a78-b24e-bc9cb8770a37n@googlegroups.com>
<428f260a-702f-44d8-bfae-3a246396fc54n@googlegroups.com>
<ddcec95d-31aa-42c1-bc6b-35424e7b3864n@googlegroups.com>
<skp08u$ge5$1@dont-email.me>
<b099beb0-a80e-4796-b54b-6fb7cb5d61b6n@googlegroups.com>
<skpedu$kn2$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="28869"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-GB
 by: dxforth - Thu, 21 Oct 2021 01:09 UTC

On 21/10/2021 02:59, Ron AARON wrote:
>
> Furthermore, my point is that the time spent in reading a line is most
> often much less than whatever time you spend processing that line.
> You're just dropping it, which is not a particularly useful test case.

Everyone loves a benchmark comparison - never mind how useful it is :)

Back in the day, how fast your Forth compiled source was seen as a
measure of its quality. Despite the hard drives and fast desktops of
today, old habits die hard. SwiftForth's multi-thread dictionary and
its reading in of whole source files before compiling are examples. The
argument will be that memory is cheap, so why not.

Re: Handling unsupported line-endings

<skqfdd$12r3$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15024&group=comp.lang.forth#15024

Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Thu, 21 Oct 2021 12:22:53 +1100
Organization: Aioe.org NNTP Server
Message-ID: <skqfdd$12r3$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org> <FY2cJ.2541$oo4.1029@fx02.iad>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="35683"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-GB
 by: dxforth - Thu, 21 Oct 2021 01:22 UTC

On 21/10/2021 12:00, Branimir Maksimovic wrote:
> On 2021-10-18, dxforth <dxforth@gmail.com> wrote:
>> I used this in a recent app and figure others may find it useful.
>>
>> My READ-LINE automatically handles LF or CRLF end-of-line (as well as
>> CP/M end-of-file character). Rarely do I process files with CR line
>> endings but should it happen, here's a substitute routine I've used
>> for that. It automatically handles CR, LF or CRLF endings.
>>
> What are supported line endings when standard line endings are NOT?
>

The line endings supported by READ-LINE are implementation-defined.

Re: Handling unsupported line-endings

<skqkf8$ndl$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15026&group=comp.lang.forth#15026

Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Thu, 21 Oct 2021 13:49:11 +1100
Organization: Aioe.org NNTP Server
Message-ID: <skqkf8$ndl$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org>
<924924b6-2820-48d9-bd52-3f4e79e13ae0n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="23989"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-GB
 by: dxforth - Thu, 21 Oct 2021 02:49 UTC

On 21/10/2021 00:28, S Jack wrote:
> On Monday, October 18, 2021 at 5:17:03 AM UTC-5, dxforth wrote:
>> I used this in a recent app and figure others may find it useful.
>>
>
> For the rare occasion:
> :) cat foo
> cr .( Hello )
> cr .( World )
> cr bye
> :) xxd foo
> 00000000: 6372 202e 2820 4865 6c6c 6f20 290d 0a63 cr .( Hello )..c
> 00000010: 7220 2e28 2057 6f72 6c64 2029 0d0a 6372 r .( World )..cr
> 00000020: 2062 7965 0d0a bye..
> :) sed -e 's/\x0d/\n/' -e 's/\x0a$//' foo| xxd
> 00000000: 6372 202e 2820 4865 6c6c 6f20 290a 6372 cr .( Hello ).cr
> 00000010: 202e 2820 576f 726c 6420 290a 6372 2062 .( World ).cr b
> 00000020: 7965 0a ye.
> :) sed -e 's/\x0d/\n/' -e 's/\x0a$//' foo > bar ; frogd '"bar" fload'
>
> Hello
> World
> :)

This is equivalent to using an external utility to convert a text file
to the desired line endings before passing it to the application?

That's what I did for the first version of the app. However, because
conversion would almost certainly be necessary, I decided the next
version of the app should handle the conversion internally and not
burden the user.

Re: Handling unsupported line-endings

<2af61807-5e4d-4190-a6d5-3437a2724584n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=15028&group=comp.lang.forth#15028

X-Received: by 2002:ac8:4155:: with SMTP id e21mr3811959qtm.312.1634794971169;
Wed, 20 Oct 2021 22:42:51 -0700 (PDT)
X-Received: by 2002:a05:6214:154d:: with SMTP id t13mr2977171qvw.40.1634794970938;
Wed, 20 Oct 2021 22:42:50 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Wed, 20 Oct 2021 22:42:50 -0700 (PDT)
In-Reply-To: <2021Oct21.001107@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=213.21.29.203; posting-account=DoM31goAAADuzlbg5XKrMFannjkYS2Lr
NNTP-Posting-Host: 213.21.29.203
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<2021Oct21.001107@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2af61807-5e4d-4190-a6d5-3437a2724584n@googlegroups.com>
Subject: Re: Handling unsupported line-endings
From: nbkolc...@gmail.com (Nickolay Kolchin)
Injection-Date: Thu, 21 Oct 2021 05:42:51 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 16
 by: Nickolay Kolchin - Thu, 21 Oct 2021 05:42 UTC

On Thursday, October 21, 2021 at 1:27:23 AM UTC+3, Anton Ertl wrote:
> Nickolay Kolchin <nbko...@gmail.com> writes:
> >- gforth-0.7.3: 3.730s
> >- gforth-fast-0.7.3: 3.296s
> >- gforth 0.7.9_20211007: 5.228s
> >- gforth-fast 0.7.9_20211007: 5.037s
> ...
> >P.S. Please treat this as an official performance regression report for
> >gforth.
> Thank you. Mostly Fixed by commit
> 1d7f2373f67acea44d8a2e084b29f1ac2e375022.
>

Confirmed.

- gforth-0.7.9_20211014: 2.783s
- gforth-fast-0.7.9_20211014: 2.686s

Re: Handling unsupported line-endings

<skr7s6$18ic$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15029&group=comp.lang.forth#15029

Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Thu, 21 Oct 2021 19:20:22 +1100
Organization: Aioe.org NNTP Server
Message-ID: <skr7s6$18ic$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org>
<8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org>
<649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<skla3u$ec8$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="41548"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
Content-Language: en-GB
X-Notice: Filtered by postfilter v. 0.9.2
 by: dxforth - Thu, 21 Oct 2021 08:20 UTC

Re-written for a tighter main loop:

\ Scan for CR, LF or CRLF; return line length and offset to the next line
: /EOL ( a u -- u' offs )
  over swap
  begin dup while                 \ characters remain
    over c@ dup $0D - while       \ not CR
    $0A - while                   \ not LF
    1 /string
  repeat                          \ got LF
    drop swap - dup 1+ exit
  then                            \ got CR; peek at the next char for LF
    2drop tuck swap - swap 1+ c@ $0A <> over + 2+ exit
  then                            \ neither: end of buffer reached
  drop swap - dup ;
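
A minimal usage sketch for /EOL, assuming a Forth-2012 system: the helper COUNT-LINES is hypothetical (it is not part of the post) and walks a memory buffer one line at a time. /EOL is repeated so the block loads on its own, with the non-standard 2+ written as 2 + . The `over min` clamp is an added precaution: the CR case peeks at the character after the CR, so when CR is the last character of the buffer the reported offset can be one past the end.

```forth
\ /EOL as posted, with 2+ spelled 2 + for systems that lack 2+
: /EOL ( a u -- u' offs )
  over swap
  begin dup while
    over c@ dup $0D - while
    $0A - while
    1 /string
  repeat                          \ got LF
    drop swap - dup 1+ exit
  then                            \ got CR
    2drop tuck swap - swap 1+ c@ $0A <> over + 2 + exit
  then                            \ neither
  drop swap - dup ;

\ Hypothetical helper: count the lines held in a memory buffer.
\ A final line without a terminator counts as a line.
: COUNT-LINES ( a u -- n )
  0 >r                            \ line counter lives on the return stack
  begin dup while
    2dup /EOL nip                 ( a u offs )
    over min /string              ( a' u' )  \ step to the next line, clamped
    r> 1+ >r
  repeat
  2drop r> ;

\ LF, CRLF and CR endings mixed in one buffer (\l = LF, \r = CR)
s\" one\ltwo\r\lthree\rfour" COUNT-LINES .   \ prints 4
```

The same loop shape works for processing each line: use the u' result (dropped here by `nip`) as the line length before stepping by offs.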

Re: Handling unsupported line-endings

<2021Oct21.175211@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15032&group=comp.lang.forth#15032

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Thu, 21 Oct 2021 15:52:11 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 36
Message-ID: <2021Oct21.175211@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com> <skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com> <2021Oct19.095538@mips.complang.tuwien.ac.at> <caa26119-443b-4fc8-b338-4924448c0005n@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="cb6b39d610a1492c953bbd599323a98e";
logging-data="21480"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18d9XwzFKr3GM93YX/bO19p"
Cancel-Lock: sha1:l34aztMzreYIqjlXfDiMEdsyWfM=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Thu, 21 Oct 2021 15:52 UTC

Heinrich Hohl <hheinrich.hohl@gmail.com> writes:
>On Tuesday, October 19, 2021 at 10:40:49 AM UTC+2, Anton Ertl wrote:
>> Which files would READ-LINE open and close and why? Gforth's
>> READ-FILE certainly does not do that.
>
>Sorry, my mistake. You are right. Files are opened only once, then many
>READ-LINE operations are executed, and finally the file is closed again.
>But the file position must be updated after reading each line.

Yes, that's what SwiftForth is doing.

>Actually, speed was not that bad when I read the postscript file from a local hard disk.
>I assume that the hard disk cache made sure that performance does not suffer too much.

On a competent OS the OS cache does that, but it still costs system
call overhead (and the overhead of reading more than needed).

The alternative is to use user-level buffered I/O. Gforth uses C's
buffered I/O for that. Unfortunately the C interface is not so great
for implementing READ-LINE, but still better than using raw file
access.
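
To illustrate the idea of buffering at user level, here is a minimal sketch in standard Forth that fetches a file in large chunks with READ-FILE and does the line scanning in memory. The names /chunk, chunk, count-lf and file-lines are hypothetical, the chunk size is an arbitrary assumption, and this is not how Gforth implements READ-LINE. Whether each READ-FILE call corresponds to exactly one system call depends on the Forth system.

```forth
4096 constant /chunk              \ assumed chunk size
create chunk /chunk chars allot

\ Count the LF characters in a memory region.
: count-lf ( a u -- n )
  0 rot rot  over + swap ?do
    i c@ $0A = if 1+ then
  loop ;

\ Count the LF-terminated lines of an open file,
\ using one READ-FILE call per chunk rather than one per line.
: file-lines ( fileid -- n )
  >r 0
  begin
    chunk /chunk r@ read-file throw   ( n u )
    dup
  while
    chunk swap count-lf +
  repeat
  drop r> drop ;
```

A possible use, with a hypothetical file name: `s" test.txt" r/o open-file throw dup file-lines . close-file throw`.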

>However, the file converter was extremely slow when I tried to read the
>huge postscript file from a NAS via the network. Not usable at all.
>Maybe file repositioning is slow if a network is involved.

What network file system did you use? NFS tends to cache the read
file for some seconds, so with NFS I would not expect to see such
behaviour.

- anton

Re: Handling unsupported line-endings

<2021Oct21.180930@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15033&group=comp.lang.forth#15033

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Thu, 21 Oct 2021 16:09:30 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 28
Message-ID: <2021Oct21.180930@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com> <skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com> <2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com> <sknsb6$1jc7$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org>
Injection-Info: reader02.eternal-september.org; posting-host="cb6b39d610a1492c953bbd599323a98e";
logging-data="21480"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Qpx1JWs5oRan3045dym3Q"
Cancel-Lock: sha1:vIlVTt5DnddWn22Kp571i+Dewlg=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Thu, 21 Oct 2021 16:09 UTC

dxforth <dxforth@gmail.com> writes:
>On 20/10/2021 12:45, dxforth wrote:
>> On 20/10/2021 06:26, Nickolay Kolchin wrote:
>>>
>>> 1024 CONSTANT maxline
>>> maxline BUFFER: buf
>>
>> Shouldn't that be:
>>
>> maxline 2 + BUFFER: buf
>
>I notice ANS says:
>
>"The line buffer provided by c-addr should be at least u1+2 characters long."
>
>"should" equates to "recommend". Why not "shall"? AFAIR a standard program
>has no knowledge of READ-LINE's workings and must assume worst case.

Yes. That's pretty clear from the text, so the use of "should"
apparently has not caused confusion (I don't remember anybody being
confused by it). It's an error-prone interface, though.
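
A minimal sketch of the buffer sizing under discussion, assuming a standard system: the buffer is two characters larger than the u1 passed to READ-LINE. CREATE ... ALLOT is used instead of BUFFER: only so that the block also loads on systems without BUFFER: . The word show-lines is a hypothetical name.

```forth
1024 constant maxline
create buf  maxline 2 + chars allot   \ u1 characters plus 2 for the terminator

\ Print every line of an open file.
: show-lines ( fileid -- )
  >r
  begin
    buf maxline r@ read-line throw    ( u2 flag )
  while                               \ flag is false at end of file
    buf swap type cr
  repeat
  drop r> drop ;
```

Note that the count passed to READ-LINE is maxline, not the allocated size; passing the allocated size would reintroduce the overrun.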

- anton

Re: Handling unsupported line-endings

<skt1fp$1826$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15035&group=comp.lang.forth#15035

Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Fri, 22 Oct 2021 11:43:37 +1100
Organization: Aioe.org NNTP Server
Message-ID: <skt1fp$1826$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org>
<8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org>
<649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at>
<0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<sknsb6$1jc7$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org>
<2021Oct21.180930@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="41030"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-GB
 by: dxforth - Fri, 22 Oct 2021 00:43 UTC

On 22/10/2021 03:09, Anton Ertl wrote:
> dxforth <dxforth@gmail.com> writes:
>>On 20/10/2021 12:45, dxforth wrote:
>>> On 20/10/2021 06:26, Nickolay Kolchin wrote:
>>>>
>>>> 1024 CONSTANT maxline
>>>> maxline BUFFER: buf
>>>
>>> Shouldn't that be:
>>>
>>> maxline 2 + BUFFER: buf
>>
>>I notice ANS says:
>>
>>"The line buffer provided by c-addr should be at least u1+2 characters long."
>>
>>"should" equates to "recommend". Why not "shall"? AFAIR a standard program
>>has no knowledge of READ-LINE's workings and must assume worst case.
>
> Yes. That's pretty clear from the text, so the use of "should"
> apparently has not caused confusion (I don't remember anybody being
> confused by it).

Because they weren't bitten by it? Interesting to know how many
implementations actually crash if the extra buffer space is not
provided.

> It's an error-prone interface, though.

Remembering to add 2 to the buffer size? Agree with that but
what's the alternative if one insists on handling any line ending?

Re: Handling unsupported line-endings

<skt3bc$1pn4$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15036&group=comp.lang.forth#15036

Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Fri, 22 Oct 2021 12:15:24 +1100
Organization: Aioe.org NNTP Server
Message-ID: <skt3bc$1pn4$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org>
<8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org>
<649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at>
<0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<sknsb6$1jc7$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org>
<2021Oct21.180930@mips.complang.tuwien.ac.at> <skt1fp$1826$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="59108"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-GB
 by: dxforth - Fri, 22 Oct 2021 01:15 UTC

On 22/10/2021 11:43, dxforth wrote:
> On 22/10/2021 03:09, Anton Ertl wrote:
>> dxforth <dxforth@gmail.com> writes:
>>>On 20/10/2021 12:45, dxforth wrote:
>>>> On 20/10/2021 06:26, Nickolay Kolchin wrote:
>>>>>
>>>>> 1024 CONSTANT maxline
>>>>> maxline BUFFER: buf
>>>>
>>>> Shouldn't that be:
>>>>
>>>> maxline 2 + BUFFER: buf
>>>
>>>I notice ANS says:
>>>
>>>"The line buffer provided by c-addr should be at least u1+2 characters long."
>>>
>>>"should" equates to "recommend". Why not "shall"? AFAIR a standard program
>>>has no knowledge of READ-LINE's workings and must assume worst case.
>>
>> Yes. That's pretty clear from the text, so the use of "should"
>> apparently has not caused confusion (I don't remember anybody being
>> confused by it).
>
> Because they weren't bitten by it? Interesting to know how many
> implementations actually crash if the extra buffer space is not
> provided.
>
>> It's an error-prone interface, though.
>
> Remembering to add 2 to the buffer size? Agree with that but
> what's the alternative if one insists on handling any line ending?

I suppose the spec could have been written:

"Read the next line from the file specified by fileid into memory given
by address /c-addr u1/. Up to two implementation-defined line-terminating
characters may be read into memory at the end of the line, but are not
included in the count u2. The line buffer provided by c-addr /shall/ be
at least 2 characters long."

Re: Handling unsupported line-endings

<33a47861-b846-4556-b93e-e556d3a4a27cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=15038&group=comp.lang.forth#15038

X-Received: by 2002:ac8:7c46:: with SMTP id o6mr11240407qtv.197.1634886647616;
Fri, 22 Oct 2021 00:10:47 -0700 (PDT)
X-Received: by 2002:a05:622a:2cd:: with SMTP id a13mr7504715qtx.328.1634886647461;
Fri, 22 Oct 2021 00:10:47 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Fri, 22 Oct 2021 00:10:47 -0700 (PDT)
In-Reply-To: <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:1c05:2f14:600:8886:e67:72f1:efb9;
posting-account=-JQ2RQoAAAB6B5tcBTSdvOqrD1HpT_Rk
NNTP-Posting-Host: 2001:1c05:2f14:600:8886:e67:72f1:efb9
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <33a47861-b846-4556-b93e-e556d3a4a27cn@googlegroups.com>
Subject: Re: Handling unsupported line-endings
From: mhx...@iae.nl (Marcel Hendrix)
Injection-Date: Fri, 22 Oct 2021 07:10:47 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 8
 by: Marcel Hendrix - Fri, 22 Oct 2021 07:10 UTC

On Tuesday, October 19, 2021 at 9:26:56 PM UTC+2, Nickolay Kolchin wrote:
[..]
> test.txt: 1016666745 bytes (970Mb). The file was generated from FASTA
> "language shootout benchmark" with argument 100000000.

Can you provide a simpler test file than test.txt? I don't want to implement
the fasta program just for this purpose.

-marcel

Re: Handling unsupported line-endings

<1a85dbbe-fed3-4d2f-8f09-460775becd5en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=15041&group=comp.lang.forth#15041

X-Received: by 2002:a05:622a:1195:: with SMTP id m21mr11693047qtk.96.1634891171766;
Fri, 22 Oct 2021 01:26:11 -0700 (PDT)
X-Received: by 2002:ac8:7fc2:: with SMTP id b2mr11710283qtk.122.1634891171536;
Fri, 22 Oct 2021 01:26:11 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Fri, 22 Oct 2021 01:26:11 -0700 (PDT)
In-Reply-To: <33a47861-b846-4556-b93e-e556d3a4a27cn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=213.21.29.203; posting-account=DoM31goAAADuzlbg5XKrMFannjkYS2Lr
NNTP-Posting-Host: 213.21.29.203
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<33a47861-b846-4556-b93e-e556d3a4a27cn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1a85dbbe-fed3-4d2f-8f09-460775becd5en@googlegroups.com>
Subject: Re: Handling unsupported line-endings
From: nbkolc...@gmail.com (Nickolay Kolchin)
Injection-Date: Fri, 22 Oct 2021 08:26:11 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 25
 by: Nickolay Kolchin - Fri, 22 Oct 2021 08:26 UTC

On Friday, October 22, 2021 at 10:10:48 AM UTC+3, Marcel Hendrix wrote:
> On Tuesday, October 19, 2021 at 9:26:56 PM UTC+2, Nickolay Kolchin wrote:
> [..]
> > test.txt: 1016666745 bytes (970Mb). The file was generated from FASTA
> > "language shootout benchmark" with argument 100000000.
>
> Can you provide a simpler test file than test.txt? I don't want to implement
> the fasta program just for this purpose.
>
> -marcel

Here is a link to a Python program that generates the required file:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/fasta-python3-2.html

execute with:

python3 app.py 100000000

At the next link you can find programs in other languages that generate the same output:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/fasta.html

Here is a link to a pregenerated file: https://nbkolchin.com/forth/test.txt.gz

Re: Handling unsupported line-endings

<2021Oct22.103014@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15042&group=comp.lang.forth#15042

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Fri, 22 Oct 2021 08:30:14 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 44
Message-ID: <2021Oct22.103014@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com> <skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com> <2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com> <sknsb6$1jc7$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org> <2021Oct21.180930@mips.complang.tuwien.ac.at> <skt1fp$1826$1@gioia.aioe.org>
Injection-Info: reader02.eternal-september.org; posting-host="ad868696593786e09a875ff1960832b8";
logging-data="32156"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/boTTLJb/xtyqVgGDNmn1m"
Cancel-Lock: sha1:3jTRK59gWY827u7scdkwmOw+Dmo=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Fri, 22 Oct 2021 08:30 UTC

dxforth <dxforth@gmail.com> writes:
>Because they weren't bitten by it? Interesting to know how many
>implementations actually crash if the extra buffer space is not
>provided.

Typically the Forth system may overwrite these two characters, so if
that memory is accessible (the usual case), the result is not a crash
of READ-LINE, but memory corruption.

>> It's an error-prone interface, though.
>
>Remembering to add 2 to the buffer size? Agree with that but
>what's the alternative if one insists on handling any line ending?

1) If we stick with the read-line stack effect, the ideal is to just
write the characters into the buffer without the line ending. That
requires either an interface that gives you one character in a
register (not something like Unix' read()), or you need to read stuff
into a buffer elsewhere (not so nice for very small systems).

2) The given length is the total buffer size, u2 is the size excluding
any newline characters, and whether the line end has yet to be reached
is indicated in some other way than by checking whether u1=u2; maybe
with a specific flag value.

If we let our minds wander further: READ-LINE's nastiness comes from
designing an interface that deals with arbitrarily long lines, but with
a pre-allocated buffer that may be too short for the line. If we allow
the buffer to be allocated dynamically, we can have a much nicer
interface, as discussed in
<2018Jun12.105031@mips.complang.tuwien.ac.at>:

|GET-LINE ( fid -- c-addr u flag ior )
| |flag is false at the end of file

But of course, that's not an interface for tiny embedded systems.
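
A rough sketch of how such a GET-LINE could be layered on READ-LINE with ALLOCATE and RESIZE, assuming a system that provides the Memory-Allocation and Exception word sets. Only the stack effect comes from the proposal; the helper names (gl-buf, gl-cap, gl-len, gl-flag, gl-grow, (get-line)), the initial capacity and the ownership rule are assumptions made for illustration. In this sketch the caller FREEs c-addr whenever ior is zero, including at end of file, and the word is not reentrant because it uses variables.

```forth
128 constant /initial               \ assumed starting capacity
variable gl-buf  variable gl-cap  variable gl-len  variable gl-flag

: gl-grow ( -- )                    \ double the capacity; THROWs on failure
  gl-buf @  gl-cap @ 2* dup gl-cap !  2 + chars  resize throw  gl-buf ! ;

: (get-line) ( fid -- )             \ fills gl-buf gl-len gl-flag; THROWs on error
  >r
  begin
    gl-cap @ gl-len @ -                      ( room )
    gl-buf @ gl-len @ chars +  over  r@
    read-line throw                          ( room u2 flag )
    gl-flag !  dup gl-len +!                 ( room u2 )
    =                               \ request filled: the line may continue
  while
    gl-grow
  repeat
  r> drop ;

: get-line ( fid -- c-addr u flag ior )
  /initial gl-cap !  0 gl-len !
  gl-cap @ 2 + chars allocate                ( fid a ior )
  ?dup if  >r 2drop  0 0 false r> exit  then
  gl-buf !                                   ( fid )
  ['] (get-line) catch
  ?dup if                                    ( x ior )
    nip  gl-buf @ free drop
    >r  0 0 false r> exit
  then
  gl-buf @ gl-len @
  gl-flag @ over 0<> or  0 ;        \ flag is false only at end of file

\ Usage: print every line of an open file.
: show-file ( fid -- )
  begin
    dup get-line throw                       ( fid c-addr u flag )
  while
    over swap type cr  free throw
  repeat
  drop free throw drop ;
```

The buffer always holds two characters more than the count handed to READ-LINE, so the u1+2 rule is handled in one place instead of by every caller.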

- anton

Re: Handling unsupported line-endings

<2021Oct22.105318@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15043&group=comp.lang.forth#15043

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Fri, 22 Oct 2021 08:53:18 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 27
Message-ID: <2021Oct22.105318@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com> <skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com> <2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com> <sknsb6$1jc7$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org> <2021Oct21.180930@mips.complang.tuwien.ac.at> <skt1fp$1826$1@gioia.aioe.org> <skt3bc$1pn4$1@gioia.aioe.org>
Injection-Info: reader02.eternal-september.org; posting-host="ad868696593786e09a875ff1960832b8";
logging-data="32156"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19CisQR9hOklPCi9yLzSRd2"
Cancel-Lock: sha1:7EZ6l172J0PDXs4Y1GzBTovS9x4=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Fri, 22 Oct 2021 08:53 UTC

dxforth <dxforth@gmail.com> writes:
>On 22/10/2021 11:43, dxforth wrote:
>> On 22/10/2021 03:09, Anton Ertl wrote:
>>> It's an error-prone interface, though.
>>
>> Remembering to add 2 to the buffer size? Agree with that but
>> what's the alternative if one insists on handling any line ending?
>
>I suppose the spec could have been written:
>
> "Read the next line from the file specified by fileid into memory given
> by address /c-addr u1/. Up to two implementation-defined line-terminating
> characters may be read into memory at the end of the line, but are not
> included in the count u2. The line buffer provided by c-addr /shall/ be
> at least 2 characters long."

Yes, that's my option 2), but you also need to specify how to
recognize whether the line end has been reached or not.
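
One way to make the end-of-line condition explicit can be sketched in C rather than Forth: return a status separate from the count, so that a line which exactly fills the buffer is distinguishable from one that was cut short. The names read_line_into, rl_status and the RL_* values are hypothetical and come from neither the standard nor any Forth system. As a simplification, the sketch reads one character at a time through stdio, so terminators never reach the caller's buffer, and it reports read errors as RL_EOF.

```c
#include <stdio.h>

/* Hypothetical status values: the line-end condition is reported
   separately from the character count. */
typedef enum { RL_EOF, RL_LINE_END, RL_BUFFER_FULL } rl_status;

/* Store up to size characters of the next line in buf (no terminator,
   no NUL) and set *len to the number stored.
   RL_LINE_END:    a terminator (LF, CR LF or lone CR) or the end of an
                   unterminated last line was reached.
   RL_BUFFER_FULL: buf is full and the line continues.
   RL_EOF:         nothing left to read (read errors also end up here). */
rl_status read_line_into(FILE *fp, char *buf, size_t size, size_t *len)
{
    size_t n = 0;
    int c;

    for (;;) {
        c = getc(fp);
        if (c == EOF) {
            *len = n;
            return n > 0 ? RL_LINE_END : RL_EOF;
        }
        if (c == '\n') {
            *len = n;
            return RL_LINE_END;
        }
        if (c == '\r') {
            int next = getc(fp);          /* swallow the LF of a CR LF pair */
            if (next != '\n' && next != EOF)
                ungetc(next, fp);
            *len = n;
            return RL_LINE_END;
        }
        if (n == size) {                  /* no room: keep c for the next call */
            ungetc(c, fp);
            *len = n;
            return RL_BUFFER_FULL;
        }
        buf[n++] = (char)c;
    }
}
```

With a 4-character buffer, the line "abcd" followed by CR LF yields a count of 4 with RL_LINE_END, while the first call on "efghij" yields a count of 4 with RL_BUFFER_FULL, so the caller does not need to compare u1 with u2.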

Anyway, it's water down the river.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2021: https://euro.theforth.net/2021

Re: Handling unsupported line-endings

From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Fri, 22 Oct 2021 21:26:09 +1100
Organization: Aioe.org NNTP Server
Message-ID: <sku3k2$10tf$1@gioia.aioe.org>

On 22/10/2021 19:30, Anton Ertl wrote:
> dxforth <dxforth@gmail.com> writes:
>>Because they weren't bitten by it? Interesting to know how many
>>implementations actually crash if the extra buffer space is not
>>provided.
>
> Typically the Forth system may overwrite these two characters, so if
> that memory is accessible (the usual case), the result is not a crash
> of READ-LINE, but memory corruption.

I'm thinking of the /EOL algorithm, which relies on a preceding file
read of one character more than the user requested. If the user
assigned a buffer of u characters and asked READ-LINE for u characters,
then memory beyond the buffer will be overwritten. Does fgets have this
overwrite problem, or does it respect the buffer size it has been given?
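
On the fgets question: the C standard specifies that fgets(s, n, stream) reads at most n-1 characters and then stores a terminating NUL, so it stays within the n bytes it is given. A minimal sketch to check this places guard bytes on both sides of the buffer. The helper name fgets_stays_in_bounds, the buffer sizes and the use of tmpfile() as a scratch file are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if fgets() left the guard bytes on both
   sides of an N-byte buffer untouched while reading a line longer than N,
   and 0 otherwise (including when the scratch file cannot be created). */
int fgets_stays_in_bounds(void)
{
    enum { N = 8, GUARD = 4 };
    unsigned char area[GUARD + N + GUARD];
    char *buf = (char *)area + GUARD;
    FILE *fp = tmpfile();                 /* scratch file, removed on close */
    int ok = 0;

    if (fp == NULL)
        return 0;
    fputs("a line much longer than eight bytes\r\n", fp);
    rewind(fp);

    memset(area, 0xAA, sizeof area);      /* guard pattern everywhere */
    if (fgets(buf, N, fp) != NULL) {
        /* N-1 characters are stored, followed by the NUL */
        ok = strlen(buf) == (size_t)(N - 1);
        for (int i = 0; i < GUARD; i++) {
            if (area[i] != 0xAA || area[GUARD + N + i] != 0xAA)
                ok = 0;                   /* a guard byte was overwritten */
        }
    }
    fclose(fp);
    return ok;
}
```

The price of that guarantee is that a long line comes back in pieces, and the caller has to look for the newline in the buffer to tell whether the line end was reached.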

>
>>> It's an error-prone interface, though.
>>
>>Remembering to add 2 to the buffer size? Agree with that but
>>what's the alternative if one insists on handling any line ending?
>
> 1) If we stick with the read-line stack effect, the ideal is to just
> write the characters into the buffer without the line ending. That
> requires either an interface that gives you one character in a
> register (not something like Unix' read()), or you need to read stuff
> into a buffer elsewhere (not so nice for very small systems).
>
> 2) The given length is the total buffer size, u2 is the size excluding
> any newline characters, and whether the line end has yet to be reached
> is indicated in some other way than by checking whether u1=u2; maybe
> with a specific flag value.
>
> If we let our minds wander further: READ-LINE's nastiness comes from
> designing an interface that deals with arbitrarily long lines, but a
> pre-allocated buffer that may be too short for the line. If we allow
> the buffer to be allocated dynamically, we can have a much nicer
> interface, as discussed in
> <2018Jun12.105031@mips.complang.tuwien.ac.at>:
>
> |GET-LINE ( fid -- c-addr u flag ior )
> |
> |flag is false at the end of file
>
> But of course, that's not an interface for tiny embedded systems.

That's something I'd have in an app or library. Implementation still
requires a block read function and a buffer.
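
A rough C analogue of the GET-LINE idea, as a sketch only: the buffer is allocated and grown on demand, and stdio's own buffering stands in for the block read function and buffer mentioned above. The name get_line, its return convention and the initial capacity of 64 are assumptions made for illustration; they are not taken from the cited posting.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical analogue of GET-LINE ( fid -- c-addr u flag ior ).
   Returns 1 when a line was read, 0 at end of file, -1 on error.
   On 1, *line is a malloc'd, NUL-terminated buffer that the caller frees,
   and *len is its length without the line terminator.
   LF, CR LF and a lone CR are all accepted as terminators. */
int get_line(FILE *fp, char **line, size_t *len)
{
    size_t cap = 64, n = 0;
    char *buf = malloc(cap);
    int c;

    *line = NULL;
    *len = 0;
    if (buf == NULL)
        return -1;

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            break;
        if (c == '\r') {
            int next = getc(fp);          /* swallow the LF of a CR LF pair */
            if (next != '\n' && next != EOF)
                ungetc(next, fp);
            break;
        }
        if (n + 1 >= cap) {               /* keep one byte free for the NUL */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return -1;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[n++] = (char)c;
    }
    if (ferror(fp) || (c == EOF && n == 0)) {
        int failed = ferror(fp);
        free(buf);
        return failed ? -1 : 0;
    }
    buf[n] = '\0';
    *line = buf;
    *len = n;
    return 1;
}
```

A caller loops with while (get_line(fp, &s, &n) == 1) { ... free(s); } and never has to size a buffer or remember the extra two characters.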

Re: Handling unsupported line-endings

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Fri, 22 Oct 2021 10:20:45 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 53
Message-ID: <2021Oct22.122045@mips.complang.tuwien.ac.at>

Marcel Hendrix <mhx@iae.nl> writes:
>On Tuesday, October 19, 2021 at 9:26:56 PM UTC+2, Nickolay Kolchin wrote:
>[..]
>> test.txt: 1016666745 bytes (970Mb). The file was generated from FASTA
>> "language shootout benchmark" with argument 100000000.
>
>Can you provide a simpler test file than test.txt? I don't want to implement
>the fasta program just for this purpose.

I ran iforth on the ten bibles from Ben Hoyt's count-unique task:

LC_NUMERIC=en_US.utf8 perf stat -e cycles -e cycles:u -e cycles:k -e instructions:u iforth 's" /home/anton/forth/count-unique.in" r/o open-file throw constant f create buf 256 allot : foo 0 begin buf 256 f read-line throw nip while 1+ repeat ; foo . cr bye'
998170

Performance counter stats for 'iforth s" /home/anton/forth/count-unique.in" r/o open-file throw constant f create buf 256 allot : foo 0 begin buf 256 f read-line throw nip while 1+ repeat ; foo . cr bye':

       971,836,455      cycles
       788,403,506      cycles:u
       183,080,326      cycles:k
     1,827,909,289      instructions:u    # 2.32 insn per cycle

       0.250143388 seconds time elapsed

       0.208843000 seconds user
       0.036146000 seconds sys

For comparison:

LC_NUMERIC=en_US.utf8 perf stat -e cycles -e cycles:u -e cycles:k -e instructions:u gforth-fast -e 's" /home/anton/forth/count-unique.in" r/o open-file throw constant f create buf 256 allot : foo 0 begin buf 256 f read-line throw nip while 1+ repeat ; foo . cr bye'
998170

Performance counter stats for 'gforth-fast -e s" /home/anton/forth/count-unique.in" r/o open-file throw constant f create buf 256 allot : foo 0 begin buf 256 f read-line throw nip while 1+ repeat ; foo . cr bye':

       549,667,728      cycles
       506,120,832      cycles:u
        43,316,660      cycles:k
     1,606,393,446      instructions:u    # 3.17 insn per cycle

       0.144901109 seconds time elapsed

       0.139777000 seconds user
       0.003993000 seconds sys

So iForth is slightly slower than Gforth and quite a bit faster than
other Forth systems. Quite a lot of the speed difference is due to
iForth's bigger startup overhead (346M cycles = 166M cycles:u + 180M
cycles:k).

- anton
