Re: Handling unsupported line-endings

From: ruvim.pi...@gmail.com (Ruvim)
Newsgroups: comp.lang.forth
Date: Wed, 27 Oct 2021 02:10:46 +0300
Message-ID: <sla1tn$hjf$1@dont-email.me>
 by: Ruvim - Tue, 26 Oct 2021 23:10 UTC

On 2021-10-26 05:28, dxforth wrote:
> On 25/10/2021 17:56, Anton Ertl wrote:
>> dxforth <dxforth@gmail.com> writes:
[...]
>>> ANS does not entitle a standard program to test for a completed line
>>> using 'u2<>u1'.
>>
>> It does, as discussed above.
>>
>>> It would deny an implementer the right to have a completed line of
>>> u2=u1 chars - an entitlement given in the first paragraph of the
>>> specification.
>>
>> I find no such entitlement there.
>
> "The line buffer provided by c-addr should be at least u1+2 characters
> long."
>
> Entitles it.

No. This statement provides no support for your claim, since even
without this restriction it's possible to implement this function in
such a way that it can read a completed line of u2=u1 chars (but then
the function simply cannot report whether a completed line was read or
just a part of the line).

Therefore, this restriction on programs only gives a system additional
ways to implement this function. Namely, it provides a way to implement
the function more efficiently and with less effort in some cases (and
the Rationale A.11.6.1.2090 explains that).

Also, take into account the following statements:

| At most u1 characters are read.
[...]
| When u1 = u2 the line terminator has yet to be reached.

It means that even if more than u1 characters were read inside the
function, those beyond u1 should be discarded by repositioning the
file, so that they are available to the next read operation.

So a completed line of u2 chars (excluding line terminators), where
u2>=u1, cannot be read as a single line by the standard READ-LINE word.
Such a line is treated as not fitting the buffer, and it takes at least
two READ-LINE calls to read it. A program may test for this case by
comparing u1 and u2.
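
A minimal sketch of the behaviour described above, written in Python rather than Forth. It assumes a seekable binary file and that LF and CRLF are the only line terminators. The name `read_line` and the `(data, flag)` return shape are choices made for this sketch (`len(data)` plays the role of u2, and the ior is left out). It is a model of this reading of the spec, not code from any Forth system.

```python
import io

CR = 0x0D

def read_line(f, u1):
    """Model of READ-LINE: return (data, flag); len(data) stands for u2."""
    start = f.tell()
    chunk = f.read(u1 + 1)          # one char of look-ahead beyond u1
    if not chunk:
        return b"", False           # end of file: u2 = 0, flag false
    idx = chunk.find(b"\n")
    if idx != -1:
        # Length of the line body, without a CR that directly precedes the LF.
        n = idx - 1 if idx > 0 and chunk[idx - 1] == CR else idx
        if n < u1:
            f.seek(start + idx + 1)  # consume the terminator as well
            return chunk[:n], True
    # Terminator not reached within u1 chars (or end of file):
    # hand back at most u1 chars and reposition so the rest is read next time.
    data = chunk[:u1]
    f.seek(start + len(data))
    return data, True

f = io.BytesIO(b"abcd\r\nxy\n")
results = [read_line(f, 4) for _ in range(4)]
print(results)
```

With u1 = 4 the four-character line "abcd" comes back with u2 = u1, and only the following call (u2 = 0) consumes its terminator, so the caller can tell the two cases apart by comparing u2 with u1.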

The only thing that seems incorrect in the specification is
(0 <= u2 <= u1)
which should be:
(0 <= u2 < u1)
since it relates to the case when "a line terminator was received
before u1 characters were read", and then u2 < u1.

>>> "As Gforth shows, you can also do it completely without extra chars."
>>>
>>> is a complication nobody needs.  You want Forth to be slower than C ?
>>
>> Forth's READ-LINE can be as fast as C's fgets(), if Forth does its own
>> buffering.  In that case you just don't copy the line terminator
>> characters.
>>
>> While Gforth does not do this (it uses C's buffering through getc, and
>> performance suffers from that), its READ-LINE is still the fastest
>> one among the Forth systems posted here.
>
> Good for you but let's not condemn everyone to using getc.

"getc" is not necessary, READ-FILE and REPOSITION-FILE are enough.

>>
>>>> Your current READ-LINE does not handle CR without following LF
>>>> correctly.
>>>
>>> Since when does ANS require it?  ANS permits implementers to choose
>>> what line terminators to support and CR was low priority for me.
>>
>> It's acceptable to treat CR as a non-line-terminator.  It is not
>> acceptable to treat "CR 1" as a line terminator, and your READ-LINE
>> does that.
>
> A lone CR in a text file that uses CRLF line endings is not convention.
> There is no 'correct response' to such a situation, nor AFAIK did ANS
> suggest one.

ANS suggests that a system should define line terminators.

If you (as a system implementer) haven't defined a lone CR as a line
terminator, you should treat it as a regular part of a line, and that
will be correct.

If your system treats "CR 1" as a line terminator, you have to define
(i.e. document) this sequence as a line terminator. Otherwise you don't
have an excuse to treat it as a line terminator. I can only note that it
would be a very strange and unexpected choice of a line terminator.
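
To make the point concrete, here is a small Python sketch (not Forth) of a splitter for a system that documents only LF and CRLF as line terminators. The function name `split_lines` is hypothetical. A CR that is not directly followed by LF stays in the line as an ordinary character.

```python
def split_lines(data):
    """Split bytes into lines, treating only LF and CRLF as terminators."""
    pieces = data.split(b"\n")
    last = pieces.pop()              # text after the final LF, if any
    # Drop a CR only when it directly precedes the LF we split on.
    lines = [p[:-1] if p.endswith(b"\r") else p for p in pieces]
    if last:
        lines.append(last)           # unterminated final line, kept verbatim
    return lines

print(split_lines(b"a\r1\r\nb\n"))
```

Here the input "a CR 1 CR LF b LF" yields two lines, and the first one still contains the lone CR followed by "1".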

--
Ruvim

Re: Handling unsupported line-endings

From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Date: Wed, 27 Oct 2021 12:55:15 +1100
Message-ID: <slabi4$6h5$1@gioia.aioe.org>
 by: dxforth - Wed, 27 Oct 2021 01:55 UTC

On 27/10/2021 10:10, Ruvim wrote:
> On 2021-10-26 05:28, dxforth wrote:
>>
>> "The line buffer provided by c-addr should be at least u1+2 characters
>> long."
>>
>> Entitles it.
>
> No. This statement provides no support for your claim.

It entitles implementers to make use of the u1+2 character space provided.

> Since even
> without this restriction it's possible to implement this function in
> such a way that it can read a completed line of u2=u1 chars (but then
> this function just will not provide information whether a completed line
> was read or just a part of the line).

That the u1+2 character entitlement creates a problem for you determining
whether a line was completed is acknowledged. Nevertheless that is what
ANS specified. If you don't like it, you need to change the spec.

> Also, take into account the following statements:
>
> | At most u1 characters are read.
> [...]
> | When u1 = u2 the line terminator has yet to be reached.

That may be interpreted as:

a) a statement of what happens in the case of a long line, or
b) a mandate that a completed line must never assume u1 characters

I opt for (a) as it is consistent with the rest of the specification e.g. ...

> The only thing that seems incorrect in the specification is
> (0 <= u2 <= u1)
> that should be:
> (0 <= u2 < u1)
> Since it is related to the case when "a line terminator was received
> before u1 characters were read", and then u2 < u1.

Here you agree the spec doesn't fit what you want and to make it so, you
claim ANS made a mistake. I don't buy it because it would mean ANS made
two mistakes - the other being a completed line cannot be u1 characters
despite ANS making provision for it.

> ANS suggests that a system should define line terminators.

I did:

READ-LINE ( c-addr u1 fid -- u2 flag ior )
...
The disk file may use either CRLF (CP/M and MS-DOS) or LF (UNIX) as
line terminator.

> If you (as a system implementer) haven't defined a lone CR as a line
> terminator, you should treat it as a regular part of a line, and it will
> be correct.

No, I don't have to support that. It's text for which there are
conventions - not binary. Anton put garbage in - he got garbage out.

Re: Handling unsupported line-endings

From: ruvim.pi...@gmail.com (Ruvim)
Newsgroups: comp.lang.forth
Date: Wed, 27 Oct 2021 08:03:01 +0300
Message-ID: <slami6$st4$1@dont-email.me>
 by: Ruvim - Wed, 27 Oct 2021 05:03 UTC

On 2021-10-27 04:55, dxforth wrote:
> On 27/10/2021 10:10, Ruvim wrote:
>> On 2021-10-26 05:28, dxforth wrote:
>>>
>>> "The line buffer provided by c-addr should be at least u1+2
>>> characters long."
>>>
>>> Entitles it.
>>
>> No. This statement provides no support for your claim.
>
> It entitles implementers to make use of the u1+2 character space provided.

Yes.

>> Since even
>> without this restriction it's possible to implement this function in
>> such a way that it can read a completed line of u2=u1 chars (but then
>> this function just will not provide information whether a completed line
>> was read or just a part of the line).
>
> That the u1+2 character entitlement creates a problem for you determining
> whether a line was completed is acknowledged.  Nevertheless that is what
> ANS specified.

In my view, you draw an incorrect conclusion from what ANS specified.

> If you don't like it, you need to change the spec.
>
>> Also, take into account the following statements:
>>
>> | At most u1 characters are read.
>> [...]
>> | When u1 = u2 the line terminator has yet to be reached.
>
> That may be interpreted as:
>
> a) a statement of what happens in the case of a long line, or

> b) a mandate that a completed line must never assume u1 characters

It's slightly confusing. Here is my attempt at clarifying it.
By a line I mean the characters of the line, excluding line terminators.

Can a line in a file be of u1+1 characters? Or u1+2 characters?
Of course it can, as well as of u1 characters, or less than u1.

Can READ-LINE read a completed line of u1+1 chars? It cannot. It can
read only part of this line. But why not? Because it would violate the
specification. And ditto for a completed line of u1 chars: it would
simply violate the specification.

So a line of u1 characters is not the only exception. Any line of u1 or
*more* characters cannot be read completely at once.

>
> I opt for (a) as it is consistent with the rest of the specification
> e.g. ...

Then you don't provide a user with a way to distinguish a complete line
from an incomplete one. That's critical if a user is reading source
code: an incomplete line can break a lexeme into parts and cause a
compilation error.

And also you violate the condition "When u1 = u2 the line terminator has
yet to be reached".

It's unclear to me why you so restrict the scope of this condition.
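
A sketch of the caller's side, in Python rather than Forth, under the reading that u2 = u1 always means "terminator not yet reached". The stand-in `read_line` below handles LF only and, like `read_whole_line`, is a hypothetical helper written for this illustration. The caller keeps joining pieces until a call returns fewer than u1 characters.

```python
import io

def read_line(f, u1):
    """LF-only stand-in for READ-LINE: returns (data, flag); len(data) is u2."""
    start = f.tell()
    chunk = f.read(u1 + 1)
    if not chunk:
        return b"", False
    idx = chunk.find(b"\n")
    if 0 <= idx < u1:                # terminator reached before u1 chars were read
        f.seek(start + idx + 1)
        return chunk[:idx], True
    data = chunk[:u1]                # terminator not reached yet (or end of file)
    f.seek(start + len(data))
    return data, True

def read_whole_line(f, u1):
    """Join pieces until u2 < u1. Returns None at end of file."""
    parts = []
    while True:
        data, flag = read_line(f, u1)
        if not flag:
            return b"".join(parts) if parts else None
        parts.append(data)
        if len(data) < u1:           # u2 < u1: the line is complete
            return b"".join(parts)

src = io.BytesIO(b": square dup * ;\n1 2 +\n")
print(read_line(src, 6))             # a piece that ends inside the word "square"
src.seek(0)
print(read_whole_line(src, 6))
```

With a 6-character buffer the first piece ends in the middle of `square`. Without the u2 < u1 test the caller could not know that the lexeme continues in the next piece.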

>
>> The only thing that seems incorrect in the specification is
>>     (0 <= u2 <= u1)
>> that should be:
>>     (0 <= u2 < u1)
>> Since it is related to the case when "a line terminator was received
>> before u1 characters were read", and then u2 < u1.
>
> Here you agree the spec doesn't fit what you want and to make it so, you
> claim ANS made a mistake.

Formally, it's not a mistake, since this condition is still true anyway.
But a stronger variant is also true, so the more relaxed variant given
is confusing.

There is another small mistake.

"At most u1 characters are read" should be read as
"At most u1 characters except line terminators are read". Otherwise,
with the CRLF line terminator, this condition sometimes cannot be met.

> I don't buy it because it would mean ANS made
> two mistakes - the other being a completed line cannot be u1 characters
> despite ANS making provision for it.

The provision it makes is not for that; see A.11.6.1.2090.
Hence, it doesn't show a mistake.

>
>> ANS suggests that a system should define line terminators.
>
> I did:
>
>  READ-LINE  ( c-addr u1 fid -- u2 flag ior )
>  ...
>  The disk file may use either CRLF (CP/M and MS-DOS) or LF (UNIX) as
>  line terminator.
>
>> If you (as a system implementer) haven't defined a lone CR as a line
>> terminator, you should treat it as a regular part of a line, and it will
>> be correct.
>
> No, I don't have to support that.  It's text for which there are
> conventions - not binary.  Anton put garbage in - he got garbage out.

Well, then just declare that a sequence of CR + any character is also a
terminator. Then at least it will preach what it practices.

--
Ruvim

Re: Handling unsupported line-endings

From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Date: Wed, 27 Oct 2021 17:14:15 +1100
Message-ID: <slaqno$eha$1@gioia.aioe.org>
 by: dxforth - Wed, 27 Oct 2021 06:14 UTC

On 27/10/2021 16:03, Ruvim wrote:
>
> Can a line in a file be of u1+1 characters? Or u1+2 characters?
> Of course it can, as well as of u1 characters, or less than u1.
>
> Can READ-LINE read a completed line of u1+1 chars? It cannot. It can
> read only part of this line. But why not? Because it will violate the
> specification. And ditto for a completed line of u1 chars; it will just
> violate the specification.

ISTM READ-LINE was expressly designed to read u1 chars plus eol.

A.11.6.1.2090 READ-LINE

"Implementations are allowed to store the line terminator in the memory buffer
in order to allow the use of line reading functions provided by host operating
systems, some of which store the terminator. Without this provision, a temporary
buffer might be needed. The two-character limitation is sufficient for the vast
majority of existing operating systems."

Re: Handling unsupported line-endings

From: ruvim.pi...@gmail.com (Ruvim)
Newsgroups: comp.lang.forth
Date: Wed, 27 Oct 2021 14:52:58 +0300
Message-ID: <slbeis$1q6$1@dont-email.me>
 by: Ruvim - Wed, 27 Oct 2021 11:52 UTC

On 2021-10-27 09:14, dxforth wrote:
> On 27/10/2021 16:03, Ruvim wrote:
>>
>> Can a line in a file be of u1+1 characters? Or u1+2 characters?
>> Of course it can, as well as of u1 characters, or less than u1.
>>
>> Can READ-LINE read a completed line of u1+1 chars? It cannot. It can
>> read only part of this line. But why not? Because it will violate the
>> specification. And ditto for a completed line of u1 chars; it will just
>> violate the specification.
>
> ISTM READ-LINE was expressly designed to read u1 chars plus eol.
>
> A.11.6.1.2090 READ-LINE
>
> "Implementations are allowed to store the line terminator in the memory
> buffer in order to allow the use of line reading functions provided by
> host operating systems, some of which store the terminator. Without this
> provision, a temporary buffer might be needed. The two-character
> limitation is sufficient for the vast majority of existing operating
> systems."
>

That's correct: it can read more under the hood, but in some cases it
then has to move the file position back, also under the hood, to meet
the API requirements.

The only reason to require a reserve of 2 characters beyond u1 is to
avoid overhead, both in the implementation and at run time, in some
cases.

Suppose you specify no reserve beyond u1, the line terminator is CRLF,
and you just pass the buffer as u1-2 chars (with 2 chars reserved) to an
OS function that requires a 2-char reserve. Then, when the last char
read is CR, you cannot meet the requirement for the u1=u2 case without
overhead: you need to perform one *more* read into *another* buffer of
length 1+2, and reposition back if the first character read is not LF.
And you cannot use the given buffer at all if a user wants to read 1
character (i.e. u1=1).

If you specify a reserve of 1 character, and you pass u1-1 to the OS
function, you are also forced to use *another* buffer internally if u1=1
(i.e., a user wants to read 1 character), and then copy a character to
the buffer that the user passed to you.

So a 2-character reservation is a good choice that makes implementations
simpler and more efficient in some cases.
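
A rough Python model of that arrangement, offered as a sketch only. Python's `readline(size)` stands in for an OS line-reading function that stores the terminator, and the caller supplies a buffer of u1+2 bytes. The name `read_line`, the `(u2, flag)` return shape, and the choice of LF and CRLF as terminators are assumptions of the sketch.

```python
import io

def read_line(f, buf, u1):
    """Model of READ-LINE on top of an OS call that stores the terminator.

    buf must hold at least u1 + 2 bytes. Returns (u2, flag).
    """
    assert len(buf) >= u1 + 2
    start = f.tell()
    raw = f.readline(u1 + 2)         # OS-style call: up to u1 chars plus CRLF
    if not raw:
        return 0, False              # end of file
    # Python returns a new object; this copy stands in for the OS writing
    # straight into the caller's buffer, with no temporary buffer needed.
    buf[:len(raw)] = raw
    if raw.endswith(b"\r\n"):
        body_len = len(raw) - 2
    elif raw.endswith(b"\n"):
        body_len = len(raw) - 1
    else:
        body_len = None              # no terminator within u1 + 2 bytes
    if body_len is not None and body_len < u1:
        return body_len, True        # common case: nothing to undo
    # Line body has u1 or more chars: report u1 chars and reposition so that
    # the remainder (or just the terminator) is seen by the next call.
    n = min(len(raw), u1)
    f.seek(start + n)
    return n, True

f = io.BytesIO(b"abcd\r\nabc\r\n")
buf = bytearray(4 + 2)
print([read_line(f, buf, 4) for _ in range(4)])
```

In the common case of a line shorter than u1 the terminator lands in the two reserved bytes and no repositioning is needed. Only a line of u1 or more characters takes the slower path.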

--
Ruvim

Re: Handling unsupported line-endings

From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Date: Thu, 28 Oct 2021 03:07:02 +1100
Message-ID: <slbtf8$1i23$1@gioia.aioe.org>
 by: dxforth - Wed, 27 Oct 2021 16:07 UTC

On 26/10/2021 16:37, dxforth wrote:
> On 25/10/2021 14:49, dxforth wrote:
>> ...
>> You want a READ-LINE that handles all common line
>> terminators and works within the confines of a 'u1' sized buffer?
>> ISTM SwiftForth's READ-LINE will do that with the following source
>> tweaks. Remove the '1+' and insert '1-' before EOL-SCANNER.
>
> Correction. Amend the above to read:
>
> Remove the '1+' and insert 'SWAP 1-' before MIN.
>

I got around to testing this. While the patch above works and READ-LINE
correctly handles each EOL type, the downside is the test to determine
whether an EOL occurred is changed from 'u2 < u1' to 'u2 < u1-1'. The
only way to handle that portably would be the addition of an EOL flag.
I had hoped for a simple amendment to the READ-LINE spec to overcome
the annoying 'u1+2' buffer requirement; however, that's now scuttled.

Re: Handling unsupported line-endings

From: nbkolc...@gmail.com (Nickolay Kolchin)
Newsgroups: comp.lang.forth
Date: Wed, 27 Oct 2021 11:02:38 -0700 (PDT)
Message-ID: <89d22260-09f8-4baa-95f0-84d25853803fn@googlegroups.com>
 by: Nickolay Kolchin - Wed, 27 Oct 2021 18:02 UTC

On Wednesday, October 27, 2021 at 2:10:49 AM UTC+3, Ruvim wrote:
> On 2021-10-26 05:28, dxforth wrote:
> > On 25/10/2021 17:56, Anton Ertl wrote:
> >> dxforth <dxf...@gmail.com> writes:
> [...]
> "getc" is not necessary, READ-FILE and REPOSITION-FILE are enough.

This is really bad advice:

1. REPOSITION-FILE won't work on character devices, for example.
2. READ-FILE ... REPOSITION-FILE will be really slow compared to buffered
I/O. We must reduce system calls at all costs.

Re: Handling unsupported line-endings

<sld80h$5u2$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=15094&group=comp.lang.forth#15094
 by: Ron AARON - Thu, 28 Oct 2021 04:13 UTC

On 27/10/2021 21:02, Nickolay Kolchin wrote:
> On Wednesday, October 27, 2021 at 2:10:49 AM UTC+3, Ruvim wrote:
>> On 2021-10-26 05:28, dxforth wrote:
>>> On 25/10/2021 17:56, Anton Ertl wrote:
>>>> dxforth <dxf...@gmail.com> writes:
>> [...]
>> "getc" is not necessary, READ-FILE and REPOSITION-FILE are enough.
>
> This is really bad advice:
>
> 1. REPOSITION-FILE won't work on character devices, for example.
> 2. READ-FILE ... REPOSITION-FILE will be really slow compared to buffered
> I/O. We must reduce system calls at all costs.

Yes. I was using 'fgetc()' to just read character by character, relying
on the libc implementation to buffer the reads. But it turns out that
that is quite slow. Using fread() to read a buffer in and scan for
characters is many times faster.

But then, since you're just 'getting a line', you only use part of the
buffer you've read. So I had to 'fseek()' after reaching EOL so the next
call would start at the beginning of a line.

An optimization I hit upon was to use a dynamic buffer size, based on
whatever the real line-length read turned out to be, starting with an
initial size of 80 characters. After a couple line reads, the buffer
size read in was adapted to the reality of the file being read. This
turns out to be a pretty big win in terms of performance, somewhat
surprisingly.

A further optimization was to use the 'memchr()' function to scan for a
character instead of doing the normal loop over the buffer. It turns out
that glibc's implementation of memchr() is *extremely* fast, using
heavily optimized assembly-language to really scan quickly. It's
something like 10x faster than an optimal C implementation (reading e.g.
4 bytes at a time and other tricks). I'm suitably impressed, because
scanning the buffer for CR and LF (e.g. scanning twice) with memchr() is
much faster than the equivalent C loop. Caveat programmer.
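
As a rough illustration of the approach described above (one fread() into a buffer, two memchr() scans, then fseek() back to the start of the next line), here is a minimal sketch. It is not 8th's actual code: the function name chunk_getline, its return convention, and the eol out-parameter are assumptions made for the example, and it assumes a seekable stream opened in binary mode.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not 8th's actual code.
 * Reads one line from a seekable binary-mode stream into buf (capacity cap).
 * Returns the number of line characters stored in buf, excluding the
 * terminator, or -1 at end-of-file when nothing was read.
 * *eol is set to 1 if an LF, CR or CR LF terminator was found in this chunk. */
long chunk_getline(FILE *fp, char *buf, size_t cap, int *eol)
{
    long start = ftell(fp);
    size_t n = fread(buf, 1, cap, fp);   /* one bulk read */

    *eol = 0;
    if (n == 0)
        return -1;

    /* Two memchr() passes: first LF, then any CR that precedes it. */
    char *lf = memchr(buf, '\n', n);
    char *cr = memchr(buf, '\r', lf ? (size_t)(lf - buf) : n);
    char *end = cr ? cr : lf;
    if (end == NULL)
        return (long)n;                  /* no terminator yet; caller reads on */

    size_t len = (size_t)(end - buf);
    size_t skip = 1;
    if (*end == '\r') {
        /* Treat CR LF as one terminator, even if the LF lies just past the chunk. */
        int next = (len + 1 < n) ? (unsigned char)end[1] : fgetc(fp);
        if (next == '\n')
            skip = 2;
    }

    /* Reposition so the next call starts at the beginning of the next line. */
    fseek(fp, start + (long)(len + skip), SEEK_SET);
    *eol = 1;
    return (long)len;
}
```

When the function returns with eol still 0, the chunk held no terminator and the caller would append it and call again, which is where a dynamically sized string helps. The ftell()/fseek() pair is also why this only works on seekable files, not on character devices.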

I think a better option for future development would be to reimplement
buffered IO using read() instead, and handle repositioning with a simple
index change. But I don't have the time right now to do that.

BTW: I would like to thank you for pointing out the sub-optimal (!)
performance of 8th's f:getline et al. You spurred me on to take a
closer look, and I've radically improved that family of words.

Re: Handling unsupported line-endings

<sldi6v$1gle$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15097&group=comp.lang.forth#15097
 by: dxforth - Thu, 28 Oct 2021 07:07 UTC

On 28/10/2021 15:13, Ron AARON wrote:
>
> But then, since you're just 'getting a line', you only use part of the
> buffer you've read. So I had to 'fseek()' after reaching EOL so the next
> call would start at the beginning of a line.
>
> An optimization I hit upon was to use a dynamic buffer size, based on
> whatever the real line-length read turned out to be, starting with an
> initial size of 80 characters. After a couple line reads, the buffer
> size read in was adapted to the reality of the file being read. This
> turns out to be a pretty big win in terms of performance, somewhat
> surprisingly.

Is that adaptation done once, or does it continue throughout the read?
Presumably you 'join' lines when the guess is short?

Re: Handling unsupported line-endings

<sldieo$qf8$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=15099&group=comp.lang.forth#15099
 by: Ron AARON - Thu, 28 Oct 2021 07:11 UTC

On 28/10/2021 10:07, dxforth wrote:
> On 28/10/2021 15:13, Ron AARON wrote:
>>
>> But then, since you're just 'getting a line', you only use part of the
>> buffer you've read. So I had to 'fseek()' after reaching EOL so the next
>> call would start at the beginning of a line.
>>
>> An optimization I hit upon was to use a dynamic buffer size, based on
>> whatever the real line-length read turned out to be, starting with an
>> initial size of 80 characters. After a couple line reads, the buffer
>> size read in was adapted to the reality of the file being read. This
>> turns out to be a pretty big win in terms of performance, somewhat
>> surprisingly.
>
> That adaption is done once or continues throughout the read?  Presumably
> you 'join' lines when the guess is short?

It's done throughout the read, since line lengths may change
unpredictably (depends of course on the kind of file being read).

Since this is 8th, I'm reading directly into a string item, which is a
dynamically-sized item already. So it's not a problem to expand the
string as necessary (subject of course to memory constraints).

Yes, if the guess is short (e.g. no line-terminator character found in
the read buffer), more reads are performed until EOL or EOF.

Re: Handling unsupported line-endings

<sldjh2$1a4$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15100&group=comp.lang.forth#15100
 by: dxforth - Thu, 28 Oct 2021 07:29 UTC

On 28/10/2021 18:11, Ron AARON wrote:
> On 28/10/2021 10:07, dxforth wrote:
>>
>> That adaption is done once or continues throughout the read?  Presumably
>> you 'join' lines when the guess is short?
>
> It's done throughout the read, since line lengths may change
> unpredictably (depends of course on the kind of file being read).

So what criteria had to be met for a buffer size change?

Re: Handling unsupported line-endings

<sldkaa$563$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=15101&group=comp.lang.forth#15101
 by: Ron AARON - Thu, 28 Oct 2021 07:43 UTC

On 28/10/2021 10:29, dxforth wrote:
> On 28/10/2021 18:11, Ron AARON wrote:
>> On 28/10/2021 10:07, dxforth wrote:
>>>
>>> That adaption is done once or continues throughout the read?  Presumably
>>> you 'join' lines when the guess is short?
>>
>> It's done throughout the read, since line lengths may change
>> unpredictably (depends of course on the kind of file being read).
>
> So what criteria had to be met for a buffer size change?

Pretty simple-minded actually.

I look at the returned string size. If it's more than 1/2 the previous
size, I set the new size at the average between the old and new. This
keeps extremely short lines from disturbing things too much.

"It works for me", though it may just be an artifact of the test data I
used.
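
The sizing rule described above could be sketched roughly as follows. The function name next_read_size is hypothetical, and the choice to leave the size unchanged for short lines is an assumption drawn from the remark that short lines should not disturb things.

```c
#include <stddef.h>

/* Hypothetical sketch of the heuristic: if the line just read is longer
 * than half the previous read size, move the read size to the midpoint
 * of the old size and the line length; otherwise keep the old size. */
size_t next_read_size(size_t prev_size, size_t line_len)
{
    if (line_len <= prev_size / 2)
        return prev_size;               /* very short line: ignore it */

    /* Midpoint computed without overflowing prev_size + line_len;
     * odd differences round toward prev_size. */
    if (line_len >= prev_size)
        return prev_size + (line_len - prev_size) / 2;
    return prev_size - (prev_size - line_len) / 2;
}
```

Starting from 80, a 120-character line moves the size to 100, a 60-character line moves it to 70, and a 30-character line leaves it at 80.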

Re: Handling unsupported line-endings

<sljml0$3qg$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15112&group=comp.lang.forth#15112
 by: dxforth - Sat, 30 Oct 2021 14:59 UTC

What happens when READ-LINE is asked for u1 = 0 characters?

Several Forths I tried report 0 true 0

According to ANS RFI 001 that may be interpreted as a blank line having
been read:

u2      flag    ior     Meaning
--      ----    ---     -------
0       false   zero    End-of-file; no characters were read

0       true    zero    A blank line was read

which is misleading and potentially problematic (your program enters an
infinite loop trying to get the next line). The alternative is to report
0 false 0 which signals end-of-file and stops further processing.

Re: Handling unsupported line-endings

<sljv8f$6l2$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=15114&group=comp.lang.forth#15114
 by: Ruvim - Sat, 30 Oct 2021 17:26 UTC

On 2021-10-30 17:59, dxforth wrote:
> What happens when READ-LINE is asked for u1 = 0 characters?
>
> Several forths tried report  0 true 0
>
> According to ANS RFI 001 that may be interpreted as a blank line having
> been read:
>
> u2      flag    ior     Meaning
> --      ----    ---     -------
> 0       false   zero    End-of-file; no characters were read
>
> 0       true    zero    A blank line was read
>
> which is misleading and potentially problematic (your program enters an
> infinite loop trying to get the next line).  The alternative is to report
> 0 false 0  which signals end-of-file and stops further processing.
>

A reliable option is to return a nonzero ior. Among the standard throw
codes, it could be one of:

-11 "result out of range"
-24 "invalid numeric argument"
-9 "invalid memory address"
-71 "unspecified READ-LINE error"
-37 "file I/O exception"
-57 "exception in sending or receiving a character"

Or some system specific throw code.

Actually, it doesn't make any sense for a program to pass a zero u1. It
could only be due to a mistake.

A problem is that at the moment the behavior of READ-LINE (and READ-FILE
too) is underspecified for the case u1=0.

There are two options to make the standard better in this regard:

a. Declare that an ambiguous condition exists in the case of u1=0.

b. Declare a throw code for the case of u1=0.

I would prefer option (b), with some general-purpose throw code for
all the cases when the provided buffer is insufficient.

--
Ruvim

Re: Handling unsupported line-endings

<2021Oct30.192843@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15115&group=comp.lang.forth#15115
 by: Anton Ertl - Sat, 30 Oct 2021 17:28 UTC

dxforth <dxforth@gmail.com> writes:
>What happens when READ-LINE is asked for u1 = 0 characters?
>
>Several forths tried report 0 true 0
>
>According to ANS RFI 001 that may be interpreted as a blank line having
>been read:
>
>u2      flag    ior     Meaning
>--      ----    ---     -------
>0       false   zero    End-of-file; no characters were read
>
>0       true    zero    A blank line was read

If you don't ask for more than 0 characters, don't be surprised that
you only get 0.

>which is misleading and potentially problematic (your program enters an
>infinite loop trying to get the next line).

If a programmer writes a loop that performs read-line with u1=0 every
time and terminates on EOF, that's just one of many ways to write an
infinite loop. Forth does not protect the programmer from other ways
to write an infinite loop, and it does not protect from this way,
either.

>The alternative is to report
>0 false 0 which signals end-of-file and stops further processing.

If the READ-LINE happens at the end-of-file, return false, otherwise
return true. If you return false when not at the end-of-file, a
program that uses READ-LINE with u1=0 to check for EOF will not work
as intended (I have not written such programs, though).
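
To make the rule above concrete, here is a small C model of it. It is only a sketch under stated assumptions: the names rl_result and model_read_line are hypothetical, only LF terminators are handled, I/O errors are not modeled (ior is always 0), and it is not how any particular Forth system implements READ-LINE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result triple mirroring READ-LINE's ( u2 flag ior ). */
struct rl_result {
    size_t u2;     /* characters stored in the buffer */
    bool   flag;   /* false only at end-of-file */
    int    ior;    /* always 0 here; errors are not modeled */
};

/* Simplified model: reads at most u1 characters of an LF-terminated line. */
struct rl_result model_read_line(FILE *fp, char *buf, size_t u1)
{
    struct rl_result r = { 0, false, 0 };

    /* Peek one character to decide the flag independently of u1. */
    int c = fgetc(fp);
    if (c == EOF)
        return r;                       /* 0 false 0 */
    ungetc(c, fp);
    r.flag = true;

    while (r.u2 < u1) {
        c = fgetc(fp);
        if (c == EOF || c == '\n')
            break;                      /* terminator consumed, not stored */
        buf[r.u2++] = (char)c;
    }
    return r;                           /* with u1 = 0: 0 true 0, nothing consumed */
}
```

In this model a call with u1=0 consumes nothing, so it works as an end-of-file probe, and a loop that always passes u1=0 on a non-empty file never reaches end-of-file.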

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2021: https://euro.theforth.net/2021

Re: Handling unsupported line-endings

<slkg0f$sph$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=15116&group=comp.lang.forth#15116
 by: Ruvim - Sat, 30 Oct 2021 22:12 UTC

On 2021-10-30 20:28, Anton Ertl wrote:
> dxforth <dxforth@gmail.com> writes:
>> What happens when READ-LINE is asked for u1 = 0 characters?
>>
>> Several forths tried report 0 true 0
>>
>> According to ANS RFI 001 that may be interpreted as a blank line having
>> been read:
>>
>> u2      flag    ior     Meaning
>> --      ----    ---     -------
>> 0       false   zero    End-of-file; no characters were read
>>
>> 0       true    zero    A blank line was read
>
> If you don't ask for more than 0 characters, don't be surprised that
> you only get 0.
>
>> which is misleading and potentially problematic (your program enters an
>> infinite loop trying to get the next line).
>
> If a programmer writes a loop that performs read-line with u1=0 every
> time and terminates on EOF, that's just one of many ways to write an
> infinite loop. Forth does not protect the programmer from other ways
> to write an infinite loop, and it does not protect from this way,
> either.
>
>> The alternative is to report
>> 0 false 0 which signals end-of-file and stops further processing.
>
> If the READ-LINE happens at the end-of-file, return false, otherwise
> return true.

Aside from the flag, it also specifies some meaning for u2. When the
flag is true, u2=u1 means that a partial line was read, u2<u1 means that a
complete line was read, and u2=0 means that an empty complete line was read.

So, when flag is true, u2=u1=0 means two mutually exclusive things: that
an empty line was read and that only a partial line of length 0 was read
(i.e. nothing was read).

Hence, if u1=0 (and the EOF is not reached), the file may be neither
read nor not read.
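
The reading above can be written out as a small classifier in C, purely as an illustration. The enum and function names are hypothetical, and the mapping follows the interpretation given in this post rather than the normative text of the standard.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome labels for a READ-LINE result ( u2 flag ior ). */
enum rl_outcome {
    RL_ERROR,           /* ior is nonzero */
    RL_EOF,             /* flag is false */
    RL_COMPLETE_LINE,   /* flag true, u2 < u1 (u2 = 0 is an empty line) */
    RL_PARTIAL_LINE,    /* flag true, u2 = u1 > 0 */
    RL_AMBIGUOUS        /* flag true, u2 = u1 = 0 */
};

enum rl_outcome classify_read_line(size_t u1, size_t u2, bool flag, int ior)
{
    if (ior != 0)
        return RL_ERROR;
    if (!flag)
        return RL_EOF;
    if (u1 == 0)
        return RL_AMBIGUOUS;   /* "empty line" and "partial line" both fit */
    return (u2 < u1) ? RL_COMPLETE_LINE : RL_PARTIAL_LINE;
}
```

Only the u1=0 row has no single label, which is the case a nonzero ior would remove.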

The only possible correct solution is to return a nonzero ior, I think.

> If you return false when not at the end-of-file, a
> program that uses READ-LINE with u1=0 to check for EOF will not work
> as intended (I have not written such programs, though).

It's correct.

--
Ruvim

Re: Handling unsupported line-endings

<sll1jk$mjo$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15117&group=comp.lang.forth#15117
 by: dxforth - Sun, 31 Oct 2021 03:12 UTC

On 31/10/2021 04:28, Anton Ertl wrote:
> dxforth <dxforth@gmail.com> writes:
>>What happens when READ-LINE is asked for u1 = 0 characters?
>>
>>Several forths tried report 0 true 0
>>
>>According to ANS RFI 001 that may be interpreted as a blank line having
>>been read:
>>
>>u2      flag    ior     Meaning
>>--      ----    ---     -------
>>0       false   zero    End-of-file; no characters were read
>>
>>0       true    zero    A blank line was read
>
> If you don't ask for more than 0 characters, don't be surprised that
> you only get 0.
>
>>which is misleading and potentially problematic (your program enters an
>>infinite loop trying to get the next line).
>
> If a programmer writes a loop that performs read-line with u1=0 every
> time and terminates on EOF, that's just one of many ways to write an
> infinite loop. Forth does not protect the programmer from other ways
> to write an infinite loop, and it does not protect from this way,
> either.

We find implementers going out of their way to produce a result that may
end in an infinite loop. We know the risks - just not the advantages.

>
>>The alternative is to report
>>0 false 0 which signals end-of-file and stops further processing.
>
> If the READ-LINE happens at the end-of-file, return false, otherwise
> return true. If you return false when not at the end-of-file, a
> program that uses READ-LINE with u1=0 to check for EOF will not work
> as intended (I have not written such programs, though).

No, because programs check 'flag' for EOF as described by the standard.

Re: Handling unsupported line-endings

<2021Oct31.131437@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15118&group=comp.lang.forth#15118
 by: Anton Ertl - Sun, 31 Oct 2021 12:14 UTC

Ruvim <ruvim.pinka@gmail.com> writes:
>On 2021-10-30 20:28, Anton Ertl wrote:
>> dxforth <dxforth@gmail.com> writes:
>>> What happens when READ-LINE is asked for u1 = 0 characters?
....
>Aside from the flag, it also specifies some meaning for u2. When the
>flag is true, u2=u1 means that a partial line was read, u2<u1 means that a
>complete line was read, and u2=0 means that an empty complete line was read.

I don't see the u2=0 case in
<https://forth-standard.org/standard/file/READ-LINE>.

To me it is obvious that the line in RFI 1
<http://www.complang.tuwien.ac.at/forth/dpans-html/a0001.htm> you
refer to is a mistake (based on assuming that u1>0) on the side of the
person who drafted the response (and was not caught by the committee).
It also appears that this response was not intended to change the
meaning of the standard.

And in any case, Forth-2012 kept the Forth-94 wording, so even if the
intent of the response was to change the standard, nobody turned it
into a successful (or unsuccessful for that matter) Forth200x
proposal.

>So, when flag is true, u2=u1=0 means two mutually exclusive things: that
>an empty line was read and that only a partial line of length 0 was read
>(i.e. nothing was read).

Which shows the mistake in the response. But this mistake is not
present in Forth-94 nor in Forth-2012.

- anton

Re: Handling unsupported line-endings

<2021Oct31.134336@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15119&group=comp.lang.forth#15119
 by: Anton Ertl - Sun, 31 Oct 2021 12:43 UTC

dxforth <dxforth@gmail.com> writes:
>On 31/10/2021 04:28, Anton Ertl wrote:
>> If a programmer writes a loop that performs read-line with u1=0 every
>> time and terminates on EOF, that's just one of many ways to write an
>> infinite loop. Forth does not protect the programmer from other ways
>> to write an infinite loop, and it does not protect from this way,
>> either.
>
>We find implementers going out of their way to produce a result that may
>end in an infinite loop.

Why do you think that somebody went out of their way?

- anton

Re: Handling unsupported line-endings

<slm8e8$18u5$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15120&group=comp.lang.forth#15120

From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Mon, 1 Nov 2021 01:15:34 +1100
Organization: Aioe.org NNTP Server
Message-ID: <slm8e8$18u5$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org> <sljml0$3qg$1@gioia.aioe.org>
<2021Oct30.192843@mips.complang.tuwien.ac.at> <sll1jk$mjo$1@gioia.aioe.org>
<2021Oct31.134336@mips.complang.tuwien.ac.at>
 by: dxforth - Sun, 31 Oct 2021 14:15 UTC

On 31/10/2021 23:43, Anton Ertl wrote:
> dxforth <dxforth@gmail.com> writes:
>>On 31/10/2021 04:28, Anton Ertl wrote:
>>> If a programmer writes a loop that performs read-line with u1=0 every
>>> time and terminates on EOF, that's just one of many ways to write an
>>> infinite loop. Forth does not protect the programmer from other ways
>>> to write an infinite loop, and it does not protect from this way,
>>> either.
>>
>>We find implementers going out of their way to produce a result that may
>>end in an infinite loop.
>
> Why do you think that somebody went out of their way?

Performing a raw read of the source and getting 0 bytes means EOF
to me. What does it mean to you?

Re: Handling unsupported line-endings

<slogli$rmb$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=15121&group=comp.lang.forth#15121

From: ruvim.pi...@gmail.com (Ruvim)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Mon, 1 Nov 2021 13:48:16 +0300
Organization: A noiseless patient Spider
Lines: 50
Message-ID: <slogli$rmb$1@dont-email.me>
References: <skjhir$jd9$1@gioia.aioe.org> <sljml0$3qg$1@gioia.aioe.org>
<2021Oct30.192843@mips.complang.tuwien.ac.at> <slkg0f$sph$1@dont-email.me>
<2021Oct31.131437@mips.complang.tuwien.ac.at>
 by: Ruvim - Mon, 1 Nov 2021 10:48 UTC

On 2021-10-31 15:14, Anton Ertl wrote:
> Ruvim <ruvim.pinka@gmail.com> writes:
>> On 2021-10-30 20:28, Anton Ertl wrote:
>>> dxforth <dxforth@gmail.com> writes:
>>>> What happens when READ-LINE is asked for u1 = 0 characters?
> ...
>> Aside from the flag, it also specifies some meaning for u2. When the
>> flag is true, u2=u1 means that a partial line was read, u2<u1 means that a
>> complete line was read, and u2=0 means that an empty complete line was read.
>
> I don't see the u2=0 case in
> <https://forth-standard.org/standard/file/READ-LINE>.

This case is implied by "(0 <= u2 <= u1)"
It's obvious that if u2=0 then the line is empty (when flag is true).

> To me it is obvious that the line in RFI 1
> <http://www.complang.tuwien.ac.at/forth/dpans-html/a0001.htm> you
> refer to is a mistake (based on assuming that u1>0) on the side of the
> person who drafted the response (and was not caught by the committee).
> It also appears that this response was not intended to change the
> meaning of the standard.

I didn't refer to RFI 1, but it doesn't matter.
Obviously, it's correct when u1 > 0. And it's ambiguous when u1=0.

So u2=u1 (regardless of u1) should always mean that a line terminator is
not reached.

> And in any case, Forth-2012 kept the Forth-94 wording, so even if the
> intent of the response was to change the standard, nobody turned it
> into a successful (or unsuccessful for that matter) Forth200x
> proposal.
>
>> So, when flag is true, u2=u1=0 means two mutually exclusive things: that
>> an empty line was read and that only a partial line of length 0 was read
>> (i.e. nothing was read).
>
> Which shows the mistake in the response. But this mistake is not
> present in Forth-94 nor in Forth-2012.

Agreed. But some other mistakes are still present in this glossary entry
anyway (e.g. in "before u1").

--
Ruvim

Re: Handling unsupported line-endings

<2021Nov1.151607@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15122&group=comp.lang.forth#15122

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Mon, 01 Nov 2021 14:16:07 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 31
Message-ID: <2021Nov1.151607@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <sljml0$3qg$1@gioia.aioe.org> <2021Oct30.192843@mips.complang.tuwien.ac.at> <slkg0f$sph$1@dont-email.me> <2021Oct31.131437@mips.complang.tuwien.ac.at> <slogli$rmb$1@dont-email.me>
 by: Anton Ertl - Mon, 1 Nov 2021 14:16 UTC

Ruvim <ruvim.pinka@gmail.com> writes:
>It's obvious that if u2=0 then the line is empty (when flag is true).

What makes you think so?

The standard says:

|When u1 = u2 the line terminator has yet to be reached.

That also holds if u1=u2=0.

>So u2=u1 (regardless of u1) should always mean that a line terminator is
>not reached.

It does. So if the programmer calls READ-LINE with u1=0, READ-LINE
should not consume any characters from the file. Conversely, if the
program should make any progress in the file, it should call READ-LINE
with u1>0.
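
As a minimal sketch of that point (not from the thread; COUNT-LINES, LINE-BUF and MAX-LINE are hypothetical names), a reading loop that always passes u1>0 makes progress on every call and ends when the flag becomes false. It counts a line only when u2<u1, i.e. when the terminator was actually reached or the file ended inside the line:

```forth
\ Sketch only: COUNT-LINES, LINE-BUF and MAX-LINE are hypothetical names.
256 CONSTANT MAX-LINE                  \ u1 > 0, so each call can make progress
CREATE LINE-BUF MAX-LINE 2 + CHARS ALLOT   \ buffer should hold u1+2 characters

: COUNT-LINES ( fileid -- n )
  >R 0                                 \ running count; fileid kept on return stack
  BEGIN
    LINE-BUF MAX-LINE R@ READ-LINE THROW   ( n u2 flag )
  WHILE                                \ flag true: something could be read
    MAX-LINE U< IF 1+ THEN             \ u2<u1: this call finished a line
  REPEAT
  DROP R> DROP ;                       \ drop final u2 and the fileid
```

Assuming a file named input.txt exists (the name is made up), it could be used as `S" input.txt" R/O OPEN-FILE THROW DUP COUNT-LINES . CLOSE-FILE THROW`. Replacing MAX-LINE with 0 in the READ-LINE call would give the infinite loop discussed above on systems that follow this reading of the standard.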

>Agreed. But some other mistakes are still present in this glossary entry
>anyway (e.g. in "before u1").

Apart from the misleading "0 <= u2 <= u1" (where "0 <= u2 <u1" would
be better), anything else?

- anton

Re: Handling unsupported line-endings

<slp3u2$ebf$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=15123&group=comp.lang.forth#15123

From: ruvim.pi...@gmail.com (Ruvim)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Mon, 1 Nov 2021 19:17:04 +0300
Organization: A noiseless patient Spider
Lines: 58
Message-ID: <slp3u2$ebf$1@dont-email.me>
References: <skjhir$jd9$1@gioia.aioe.org> <sljml0$3qg$1@gioia.aioe.org>
<2021Oct30.192843@mips.complang.tuwien.ac.at> <slkg0f$sph$1@dont-email.me>
<2021Oct31.131437@mips.complang.tuwien.ac.at> <slogli$rmb$1@dont-email.me>
<2021Nov1.151607@mips.complang.tuwien.ac.at>
 by: Ruvim - Mon, 1 Nov 2021 16:17 UTC

On 2021-11-01 17:16, Anton Ertl wrote:
> Ruvim <ruvim.pinka@gmail.com> writes:
>> It's obvious that if u2=0 then the line is empty (when flag is true).
>
> What makes you think so?

I mean when u1 > 0.

A small problem for a user is that checking for u2=0 cannot be performed
regardless of the value of u1. At first glance I considered this
check regardless of u1 and concluded there was an ambiguity. I was wrong.
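
To illustrate the order of the checks (a sketch, not from the standard; CLASSIFY-READ and the four constants are hypothetical names), the u2=u1 test can be made before the u2=0 test, so that u1=u2=0 is reported as "terminator not yet reached" rather than as an empty line:

```forth
\ Sketch only: these names are hypothetical, not standard words.
0 CONSTANT AT-EOF          \ flag false: end of file, nothing read
1 CONSTANT PARTIAL-LINE    \ u2=u1: line terminator not yet reached
2 CONSTANT EMPTY-LINE      \ u2=0 with u1>0: complete empty line
3 CONSTANT FULL-LINE       \ 0<u2<u1: complete non-empty line

: CLASSIFY-READ ( u1 u2 flag -- n )
  0= IF 2DROP AT-EOF EXIT THEN            \ check the flag first
  2DUP = IF 2DROP PARTIAL-LINE EXIT THEN  \ u2=u1 wins, even when both are 0
  NIP 0= IF EMPTY-LINE ELSE FULL-LINE THEN ;
```

Under this reading, a call with u1=0 can only yield AT-EOF or PARTIAL-LINE, so the u2=0 test is meaningful only when u1>0.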

>
> The standard says:
>
> |When u1 = u2 the line terminator has yet to be reached.
>
> That also holds if u1=u2=0.
>
>> So u2=u1 (regardless of u1) should always mean that a line terminator is
>> not reached.
>
> It does. So if the programmer calls READ-LINE with u1=0, READ-LINE
> should not consume any characters from the file.

Yes, I agree.

> Conversely, if the
> program should make any progress in the file, it should call READ-LINE
> with u1>0.
>
>> Agreed. But some other mistakes are still present in this glossary entry
>> anyway (e.g. in "before u1").
>
> Apart from the misleading "0 <= u2 <= u1" (where "0 <= u2 <u1" would
> be better), anything else?

The parts:

A) "At most u1 characters are read"

B) "If a line terminator was received before u1 characters were read"

should be correspondingly:

a) "At most u1 characters (except line terminators) are read"

b) "If the first character of a complete line terminator was received
before u1+1 characters were read"

(Maybe a better wording can be found)

--
Ruvim

Re: Handling unsupported line-endings

<2021Nov1.201920@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15124&group=comp.lang.forth#15124

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Mon, 01 Nov 2021 19:19:20 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 31
Message-ID: <2021Nov1.201920@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <sljml0$3qg$1@gioia.aioe.org> <2021Oct30.192843@mips.complang.tuwien.ac.at> <slkg0f$sph$1@dont-email.me> <2021Oct31.131437@mips.complang.tuwien.ac.at> <slogli$rmb$1@dont-email.me> <2021Nov1.151607@mips.complang.tuwien.ac.at> <slp3u2$ebf$1@dont-email.me>
 by: Anton Ertl - Mon, 1 Nov 2021 19:19 UTC

Ruvim <ruvim.pinka@gmail.com> writes:
>On 2021-11-01 17:16, Anton Ertl wrote:
>> Apart from the misleading "0 <= u2 <= u1" (where "0 <= u2 <u1" would
>> be better), anything else?
>
>
>The parts:
>
>A) "At most u1 characters are read"
>
>B) "If a line terminator was received before u1 characters were read"
>
>should be correspondingly:
>
>a) "At most u1 characters (except line terminators) are read"
>
>b) "If the first character of a complete line terminator was received
>before u1+1 characters were read"
>
>(Maybe a better wording can be found)

I wrote a comment
<https://forth-standard.org/standard/file/READ-LINE#contribution-216>
where I put down the results of our discussions.

- anton

Re: Handling unsupported line-endings

<slq260$1l9v$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15125&group=comp.lang.forth#15125

From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Tue, 2 Nov 2021 11:53:19 +1100
Organization: Aioe.org NNTP Server
Message-ID: <slq260$1l9v$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org> <sljml0$3qg$1@gioia.aioe.org>
<2021Oct30.192843@mips.complang.tuwien.ac.at> <slkg0f$sph$1@dont-email.me>
<2021Oct31.131437@mips.complang.tuwien.ac.at> <slogli$rmb$1@dont-email.me>
<2021Nov1.151607@mips.complang.tuwien.ac.at>
 by: dxforth - Tue, 2 Nov 2021 00:53 UTC

On 2/11/2021 01:16, Anton Ertl wrote:
> Ruvim <ruvim.pinka@gmail.com> writes:
>>It's obvious that if u2=0 then the line is empty (when flag is true).
>
> What makes you think so?
>
> The standard says:
>
> |When u1 = u2 the line terminator has yet to be reached.
>
> That also holds if u1=u2=0.
>
>>So u2=u1 (regardless of u1) should always mean that a line terminator is
>>not reached.
>
> It does. So if the programmer calls READ-LINE with u1=0, READ-LINE
> should not consume any characters from the file. Conversely, if the
> program should make any progress in the file, it should call READ-LINE
> with u1>0.
>
>>Agreed. But some other mistakes are still present in this glossary entry
>>anyway (e.g. in "before u1").
>
> Apart from the misleading "0 <= u2 <= u1" (where "0 <= u2 <u1" would
> be better), anything else?

So the ANS spec isn't really dead - it has "misleading" statements (you);
or implementers should rewind the file in the case of u1=u2 and line
terminator received (Ruvim).

https://www.youtube.com/watch?v=vZw35VUBdzo
