
Re: Handling unsupported line-endings

<sku5be$1t2m$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15046&group=comp.lang.forth#15046
Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Fri, 22 Oct 2021 21:55:41 +1100
Organization: Aioe.org NNTP Server
Message-ID: <sku5be$1t2m$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org>
<8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org>
<649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at>
<0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<sknsb6$1jc7$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org>
<2021Oct21.180930@mips.complang.tuwien.ac.at> <skt1fp$1826$1@gioia.aioe.org>
<skt3bc$1pn4$1@gioia.aioe.org> <2021Oct22.105318@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="62550"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
Content-Language: en-GB
X-Notice: Filtered by postfilter v. 0.9.2
 by: dxforth - Fri, 22 Oct 2021 10:55 UTC

On 22/10/2021 19:53, Anton Ertl wrote:
> dxforth <dxforth@gmail.com> writes:
>>On 22/10/2021 11:43, dxforth wrote:
>>> On 22/10/2021 03:09, Anton Ertl wrote:
>>>> It's an error-prone interface, though.
>>>
>>> Remembering to add 2 to the buffer size? Agree with that but
>>> what's the alternative if one insists on handling any line ending?
>>
>>I suppose the spec could have been written:
>>
>> "Read the next line from the file specified by fileid into memory given
>> by address /c-addr u1/. Up to two implementation-defined line-terminating
>> characters may be read into memory at the end of the line, but are not
>> included in the count u2. The line buffer provided by c-addr /shall/ be
>> at least 2 characters long."
>
> Yes, that's my option 2),

> ...
> Anyway, it's water down the river.

AFAICS it's minimal change and breakage to what exists now.

> but you also need to specify how to
> recognize whether the line end has been reached or not.

Such need is rare. Should an EOL flag be needed, it can be generated, e.g.

\ As above but includes 'eol' (true if eol found)
: GETLINE ( a u -- a u' eol -1 | 0 )
posinfile 2>r 2dup 1+ readdata rot min dup if ( a a u)
( a a u') /eol dup 0 2r> d+ seekinfile over <> -1
end rdrop rdrop nip nip ;

I used this in my CRLF app to process lines of truly arbitrary length.

Re: Handling unsupported line-endings

<skvm1i$qlj$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15047&group=comp.lang.forth#15047
Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Sat, 23 Oct 2021 11:46:42 +1100
Organization: Aioe.org NNTP Server
Message-ID: <skvm1i$qlj$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org>
<8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org>
<649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at>
<0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<33a47861-b846-4556-b93e-e556d3a4a27cn@googlegroups.com>
<2021Oct22.122045@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="27315"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
Content-Language: en-GB
X-Notice: Filtered by postfilter v. 0.9.2
 by: dxforth - Sat, 23 Oct 2021 00:46 UTC

On 22/10/2021 21:20, Anton Ertl wrote:
> ...
> I ran iforth on the ten bibles from Ben Hoyt's count-unique task:
>
> LC_NUMERIC=en_US.utf8 perf stat -e cycles -e cycles:u -e cycles:k -e instructions:u iforth 's" /home/anton/forth/count-unique.in" r/o open-file throw constant f create buf 256 allot : foo 0 begin buf 256 f read-line throw nip while 1+ repeat ; foo . cr bye'
> 998170
>
> Performance counter stats for 'iforth s" /home/anton/forth/count-unique.in" r/o open-file throw constant f create buf 256 allot : foo 0 begin buf 256 f read-line throw nip while 1+ repeat ; foo . cr bye':

You seem to be falling into the same trap as novices - ALLOTing less buffer
space than ANS requires for READ-LINE and other implementations may need.
Given the propensity of users to misjudge buffer size, perhaps READ-LINE spec
needs adjusting after all!

Below is a test which checks how many characters are asked of READ-LINE vs. how
many it actually reads into the buffer. I picked the 'corner case' of a
CRLF-terminated line in which the user asks for the line length + 1 chars.

\ start

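create buf 14 2 + allot   \ 14 requested characters + 2 for the terminator

\ Write two identical lines: 13 'a' characters followed by CR LF.
: makeln
  buf 13 [char] a fill  $0d buf 13 + c!  $0a buf 14 + c!
  s" foo" r/w create-file throw >r
  buf 15 r@ write-file throw
  buf 15 r@ write-file throw
  r> close-file drop ;

\ Pre-fill the buffer with 'x', read one line asking for 14 characters,
\ then dump all 16 bytes to show what READ-LINE stored.
: readln
  buf 14 2 + [char] x fill
  s" foo" r/w open-file throw >r
  cr ." asking: "
  buf 14 dup . r@ read-line throw
  r> close-file drop
  cr ." got: flag = " . ." u2 = " .
  cr buf 14 2 + dump ;

: run makeln readln ;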

run

\ end

Results from 4 popular systems are below. We find SwiftForth and Win32Forth
would have overrun the buffer had it been allotted only the requested
14 characters. Curiously Gforth did not - while I expected it to. Perhaps
it's doing more work 'under the hood' than SF and Win32F? And it appears
VFX has a bug: it reports u2 = 14, counting the CR as part of the line.

SwiftForth i386-Win32 3.11.2 22-Jun-2021
run
asking: 14
got: flag = -1 u2 = 13

487424 61 61 61 61 61 61 61 61 61 61 61 61 61 0D 0A 78 aaaaaaaaaaaaa..x ok

VFX Forth for Windows x86
© MicroProcessor Engineering Ltd, 1998-2021

Version: 5.20 [build 3797]
Build date: 27 May 2021
run
asking: 14
got: flag = -1 u2 = 14 <-------- That looks like a bug!

004F:08A0 61 61 61 61 61 61 61 61 61 61 61 61 61 0D 78 78 aaaaaaaaaaaaa.xx

Win32Forth: a 32 Bit Forth for Windows 95/98/ME/NT4/W2K/XP/VISTA/W7/W8/W10
Version: 6.15.05 Build: 2
Compiled: Friday, October 19 2018, 10:20PM
run
asking: 14
got: flag = -1 u2 = 13

449360 | 61 61 61 61 61 61 61 61 61 61 61 61 61 0D 0A 78 |aaaaaaaaaaaaa..x| ok

Gforth 0.7.9_20200709
run
asking: 14
got: flag = -1 u2 = 13

6FFFFF8771F0: 61 61 61 61 61 61 61 61 - 61 61 61 61 61 78 78 78 aaaaaaaaaaaaaxxx

Re: Handling unsupported line-endings

<sl0343$lu7$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15051&group=comp.lang.forth#15051
Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Sat, 23 Oct 2021 15:29:56 +1100
Organization: Aioe.org NNTP Server
Message-ID: <sl0343$lu7$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org>
<8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com>
<skjlhf$k4f$1@gioia.aioe.org>
<649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at>
<0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<sknsb6$1jc7$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org>
<2021Oct21.180930@mips.complang.tuwien.ac.at> <skt1fp$1826$1@gioia.aioe.org>
<skt3bc$1pn4$1@gioia.aioe.org> <2021Oct22.105318@mips.complang.tuwien.ac.at>
<sku5be$1t2m$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="22471"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-GB
 by: dxforth - Sat, 23 Oct 2021 04:29 UTC

On 22/10/2021 21:55, dxforth wrote:
> On 22/10/2021 19:53, Anton Ertl wrote:
>> dxforth <dxforth@gmail.com> writes:
>>>On 22/10/2021 11:43, dxforth wrote:
>>>> On 22/10/2021 03:09, Anton Ertl wrote:
>>>>> It's an error-prone interface, though.
>>>>
>>>> Remembering to add 2 to the buffer size? Agree with that but
>>>> what's the alternative if one insists on handling any line ending?
>>>
>>>I suppose the spec could have been written:
>>>
>>> "Read the next line from the file specified by fileid into memory given
>>> by address /c-addr u1/. Up to two implementation-defined line-terminating
>>> characters may be read into memory at the end of the line, but are not
>>> included in the count u2. The line buffer provided by c-addr /shall/ be
>>> at least 2 characters long."
>>
>> Yes, that's my option 2),
>
>> ...
>> Anyway, it's water down the river.
>
> AFAICS it's minimal change and breakage to what exists now.
>
>> but you also need to specify how to
>> recognize whether the line end has been reached or not.
>
> Such need is rare. Should an EOL flag be needed, it can be generated, e.g.
>
> \ As above but includes 'eol' (true if eol found)
> : GETLINE ( a u -- a u' eol -1 | 0 )
> posinfile 2>r 2dup 1+ readdata rot min dup if ( a a u)
> ( a a u') /eol dup 0 2r> d+ seekinfile over <> -1
> end rdrop rdrop nip nip ;
>
> I used this in my CRLF app to process lines of truly arbitrary length.

The latter needs to be similarly amended to bring it into line with
what fgets does. I've updated all my sources accordingly.

: GETLINE ( a u -- a u' eol -1 | 0 )
posinfile 2>r 2dup readdata rot min dup if ( a a u)
( a a u') 1- /eol dup 0 2r> d+ seekinfile over <> -1
end rdrop rdrop nip nip ;

Re: Handling unsupported line-endings

<2021Oct23.101502@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15053&group=comp.lang.forth#15053
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Sat, 23 Oct 2021 08:15:02 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 27
Message-ID: <2021Oct23.101502@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com> <skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com> <2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com> <sknsb6$1jc7$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org> <2021Oct21.180930@mips.complang.tuwien.ac.at> <skt1fp$1826$1@gioia.aioe.org> <2021Oct22.103014@mips.complang.tuwien.ac.at> <sku3k2$10tf$1@gioia.aioe.org>
Injection-Info: reader02.eternal-september.org; posting-host="646da25659bb81e3912ea511d4515175";
logging-data="23454"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19l+9dtKX2O4rugttvzoWab"
Cancel-Lock: sha1:FOp8pZtGeUF3qkKOteMcEB+KTPg=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sat, 23 Oct 2021 08:15 UTC

dxforth <dxforth@gmail.com> writes:
>Does fgets have this problem
>of overwrite - or does it conform to the buffer size it has been given?

fgets() only writes at most as many chars as the caller specified.
This includes the terminating zero, and possibly '\n' (the newline
character). If there is no '\n' in the buffer afterwards, you need at
least one additional fgets() for the rest of the line.

One thing to note is that C assumes only one newline character. When
C was adapted to OSs with CRLF newlines, it got text (default) and
binary modes of opening files, and text mode means that the C library
translates CRLF into LF, so the application only sees LF.

This is an interesting contrast to what happened in MacOS X: classic
MacOS used CR newlines, and MacOS X uses LF newlines. It would have been
relatively simple to let '\n' mean CR (and the few uses of \r would be
out of luck), but they preferred the pain of transition. That's
probably because they preferred the MacOS -> MacOS X transition pain
to a hypothetical NeXTSTEP -> CR MacOS X transition.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2021: https://euro.theforth.net/2021

Re: Handling unsupported line-endings

<2021Oct23.103251@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15054&group=comp.lang.forth#15054
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Sat, 23 Oct 2021 08:32:51 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 32
Message-ID: <2021Oct23.103251@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com> <2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com> <sknsb6$1jc7$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org> <2021Oct21.180930@mips.complang.tuwien.ac.at> <skt1fp$1826$1@gioia.aioe.org> <skt3bc$1pn4$1@gioia.aioe.org> <2021Oct22.105318@mips.complang.tuwien.ac.at> <sku5be$1t2m$1@gioia.aioe.org>
Injection-Info: reader02.eternal-september.org; posting-host="646da25659bb81e3912ea511d4515175";
logging-data="23454"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19xVrrWBnojrLysrBG1V2q7"
Cancel-Lock: sha1:I9dk13Y5vv29K9DaYYGbY9WIP7U=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sat, 23 Oct 2021 08:32 UTC

dxforth <dxforth@gmail.com> writes:
[change READ-LINE to include the two extra characters]
>AFAICS it's minimal change and breakage to what exists now.

It will not work as intended with any existing code that uses
READ-LINE and checks for incomplete lines.

>> but you also need to specify how to
>> recognize whether the line end has been reached or not.
>
>Such need is rare.

This need exists and the programmers have to deal with it.

> Should an EOL flag be needed, it can be generated, e.g.
>
> \ As above but includes 'eol' (true if eol found)
> : GETLINE ( a u -- a u' eol -1 | 0 )
> posinfile 2>r 2dup 1+ readdata rot min dup if ( a a u)
> ( a a u') /eol dup 0 2r> d+ seekinfile over <> -1
> end rdrop rdrop nip nip ;

I see lots of non-standard words here and no use of READ-LINE. This
does not demonstrate how to use your suggested changed READ-LINE to
deal with lines longer than the buffer.

- anton

Re: Handling unsupported line-endings

<2021Oct23.104040@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15056&group=comp.lang.forth#15056
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Sat, 23 Oct 2021 08:40:40 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 29
Message-ID: <2021Oct23.104040@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <8bff3f97-d0c0-4477-8bb0-269f41c10b27n@googlegroups.com> <skjlhf$k4f$1@gioia.aioe.org> <649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com> <2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com> <33a47861-b846-4556-b93e-e556d3a4a27cn@googlegroups.com> <2021Oct22.122045@mips.complang.tuwien.ac.at> <skvm1i$qlj$1@gioia.aioe.org>
Injection-Info: reader02.eternal-september.org; posting-host="646da25659bb81e3912ea511d4515175";
logging-data="23454"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ROyna76iy6M4/0QKBnP8X"
Cancel-Lock: sha1:o8VXg09aLIdpDWtPuBGN8H6xDUg=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sat, 23 Oct 2021 08:40 UTC

dxforth <dxforth@gmail.com> writes:
>On 22/10/2021 21:20, Anton Ertl wrote:
>> Performance counter stats for 'iforth s" /home/anton/forth/count-unique.in" r/o open-file throw constant f create buf 256 allot : foo 0 begin buf 256 f read-line throw nip while 1+ repeat ; foo . cr bye':
>
>You seem to be falling into the same trap as novices - ALLOTing less buffer
>space than ANS requires for READ-LINE and other implementations may need.

Yes. How did this come about? At first I designed this
microbenchmark for Gforth only, and I know that Gforth does not need
the extra characters; then people mentioned other Forth systems, and I
reused it for the other Forth systems, fixing what did not work (e.g.,
Gforth 0.7.3 does not understand the string recognizer I used at
first). For some reason, the insufficient buffer length did not cause
any obvious breakage.
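
For comparison, a variant of that line-counting loop with the two extra characters reserved might look like the sketch below. The names /LINE, LINEBUF and COUNT-LINES are made up for illustration and do not come from the benchmark; as in the original one-liner, a line longer than /LINE characters is counted once per chunk.

```forth
\ Sketch only: /LINE, LINEBUF and COUNT-LINES are illustrative names.
256 constant /line                      \ u1: most characters requested per call
create linebuf /line 2 + chars allot    \ u1 + 2: room for a line terminator

: count-lines ( fileid -- n )
  >r 0                                  \ n = lines (or chunks) seen so far
  begin
    linebuf /line r@ read-line throw    ( n u2 flag )
  while
    drop 1+                             \ discard u2, count this line
  repeat
  drop r> drop ;                        \ discard the final u2 and the fileid
```

Only the ALLOT size differs from the loop in the benchmark; the count passed to READ-LINE stays at /LINE, so the result on a system that does not touch the extra characters should be unchanged.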

[write to the extra two chars]
>Curiously Gforth did not - while I expected it to. Perhaps
>it's doing more work 'under the hood' than SF and Win32F ?

Gforth uses C buffered I/O under the hood, and uses getc() to inspect
every char individually. And when it sees CR or LF, it does not write
them into the buffer.
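
The character-at-a-time idea can be imitated in Forth with READ-FILE. The following is not Gforth's implementation, only a sketch of the approach under some loudly stated assumptions: line endings are LF or CRLF, every CR is simply discarded, one character occupies one address unit, and speed does not matter (there is one READ-FILE call per character). ONE-CHAR, NEXT-CHAR and READ-LINE-STRICT are hypothetical names.

```forth
\ Sketch only: not Gforth's code. Assumes LF or CRLF endings and
\ one address unit per character; every CR is dropped.
create one-char 1 chars allot

: next-char ( fileid -- c true | false )      \ false at end of file
  one-char 1 rot read-file throw              ( u )
  if one-char c@ true else false then ;

: read-line-strict ( c-addr u1 fileid -- u2 flag )
  >r chars over + over                        ( start end cur )  \ R: fileid
  begin
    2dup u>                                   \ room left in the buffer?
  while
    r@ next-char 0= if                        \ end of file
      nip swap - dup 0<> r> drop exit         ( u2 flag )
    then                                      ( start end cur c )
    dup 10 = if                               \ LF ends the line, is not stored
      drop nip swap - true r> drop exit
    then
    dup 13 = if drop else over c! char+ then  \ drop CR, store anything else
  repeat
  nip swap - true r> drop ;                   \ buffer full: u2 = u1
```

Because terminators are consumed but never stored, a buffer of exactly u1 characters is enough for this word, which matches the untouched 'x' bytes in the Gforth dump.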

- anton

Re: Handling unsupported line-endings

<sl0opd$4dh$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15057&group=comp.lang.forth#15057
Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Sat, 23 Oct 2021 21:39:40 +1100
Organization: Aioe.org NNTP Server
Message-ID: <sl0opd$4dh$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org> <skjlhf$k4f$1@gioia.aioe.org>
<649de292-1f29-4522-b41b-7b9d1faf6210n@googlegroups.com>
<2021Oct19.095538@mips.complang.tuwien.ac.at>
<0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<sknsb6$1jc7$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org>
<2021Oct21.180930@mips.complang.tuwien.ac.at> <skt1fp$1826$1@gioia.aioe.org>
<skt3bc$1pn4$1@gioia.aioe.org> <2021Oct22.105318@mips.complang.tuwien.ac.at>
<sku5be$1t2m$1@gioia.aioe.org> <2021Oct23.103251@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="4529"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
Content-Language: en-GB
X-Notice: Filtered by postfilter v. 0.9.2
 by: dxforth - Sat, 23 Oct 2021 10:39 UTC

On 23/10/2021 19:32, Anton Ertl wrote:
> dxforth <dxforth@gmail.com> writes:
> [change READ-LINE to include the two extra characters]

Specifically change u1 to mean buffer size.

>>AFAICS it's minimal change and breakage to what exists now.
>
> It will not work as intended with any existing code that uses
> READ-LINE and checks for incomplete lines.

How does ANS READ-LINE currently check for incomplete lines?

Re: Handling unsupported line-endings

<2021Oct23.125900@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15058&group=comp.lang.forth#15058
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Sat, 23 Oct 2021 10:59:00 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 31
Message-ID: <2021Oct23.125900@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <2021Oct19.095538@mips.complang.tuwien.ac.at> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com> <sknsb6$1jc7$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org> <2021Oct21.180930@mips.complang.tuwien.ac.at> <skt1fp$1826$1@gioia.aioe.org> <skt3bc$1pn4$1@gioia.aioe.org> <2021Oct22.105318@mips.complang.tuwien.ac.at> <sku5be$1t2m$1@gioia.aioe.org> <2021Oct23.103251@mips.complang.tuwien.ac.at> <sl0opd$4dh$1@gioia.aioe.org>
Injection-Info: reader02.eternal-september.org; posting-host="646da25659bb81e3912ea511d4515175";
logging-data="2155"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19mZoN8ZX7rBYi/4kpyCB5n"
Cancel-Lock: sha1:17/o/VOHZH81RQKgmCCCJHVSJys=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sat, 23 Oct 2021 10:59 UTC

dxforth <dxforth@gmail.com> writes:
>On 23/10/2021 19:32, Anton Ertl wrote:
>> dxforth <dxforth@gmail.com> writes:
>> [change READ-LINE to include the two extra characters]
>
>Specifically change u1 to mean buffer size.
>
>>>AFAICS it's minimal change and breakage to what exists now.
>>
>> It will not work as intended with any existing code that uses
>> READ-LINE and checks for incomplete lines.
>
>How does ANS READ-LINE currently check for incomplete lines?

By checking whether u2=u1. But note that that also means that all u2
chars are valid.

Example usage (not originally written as an example):

: $slurp-line { fid addr -- flag } addr $free
BEGIN
addr $@len dup { sk } $100 umax dup >r addr $+!len
r@ fid read-line throw
swap dup r> = WHILE 2drop REPEAT sk + addr $!len ;
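
For systems without Gforth's string words, the same u2=u1 test can be sketched with standard words only. The word below does not keep the text; it just adds up the chunk lengths to report how long the next line is, however it compares to the buffer. CHUNK, CHUNKBUF and LINE-LENGTH are hypothetical names chosen for this illustration.

```forth
\ Sketch only: CHUNK, CHUNKBUF and LINE-LENGTH are illustrative names.
64 constant chunk                        \ u1 passed to READ-LINE on every call
create chunkbuf chunk 2 + chars allot    \ u1 + 2 characters, as ANS requires

: line-length ( fileid -- u flag )       \ flag is that of the last READ-LINE
  >r 0                                   ( total )
  begin
    chunkbuf chunk r@ read-line throw    ( total u2 flag )
    >r tuck + swap r>                    ( total' u2 flag )
    swap chunk = over and                ( total' flag more? )  \ u2=u1: no terminator yet
  while
    drop                                 \ incomplete line: read the next chunk
  repeat
  r> drop ;
```

A line whose length is an exact multiple of CHUNK costs one extra READ-LINE call, which returns u2 = 0 with a true flag and so ends the loop.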

- anton

Re: Handling unsupported line-endings

<sl0v52$lvk$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15059&group=comp.lang.forth#15059
Path: i2pn2.org!i2pn.org!aioe.org!7AktqsUqy5CCvnKa3S0Dkw.user.46.165.242.75.POSTED!not-for-mail
From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Sat, 23 Oct 2021 23:28:17 +1100
Organization: Aioe.org NNTP Server
Message-ID: <sl0v52$lvk$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org>
<2021Oct19.095538@mips.complang.tuwien.ac.at>
<0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<sknsb6$1jc7$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org>
<2021Oct21.180930@mips.complang.tuwien.ac.at> <skt1fp$1826$1@gioia.aioe.org>
<skt3bc$1pn4$1@gioia.aioe.org> <2021Oct22.105318@mips.complang.tuwien.ac.at>
<sku5be$1t2m$1@gioia.aioe.org> <2021Oct23.103251@mips.complang.tuwien.ac.at>
<sl0opd$4dh$1@gioia.aioe.org> <2021Oct23.125900@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="22516"; posting-host="7AktqsUqy5CCvnKa3S0Dkw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-GB
 by: dxforth - Sat, 23 Oct 2021 12:28 UTC

On 23/10/2021 21:59, Anton Ertl wrote:
> dxforth <dxforth@gmail.com> writes:
>>On 23/10/2021 19:32, Anton Ertl wrote:
>>> dxforth <dxforth@gmail.com> writes:
>>> [change READ-LINE to include the two extra characters]
>>
>>Specifically change u1 to mean buffer size.
>>
>>>>AFAICS it's minimal change and breakage to what exists now.
>>>
>>> It will not work as intended with any existing code that uses
>>> READ-LINE and checks for incomplete lines.
>>
>>How does ANS READ-LINE currently check for incomplete lines?
>
> By checking whether u2=u1.

You are saying a completed line can never be u1 characters long.
Under ANS the buffer is u1 + 2 characters. That's enough room
to hold a completed line should an implementor so choose.

I agree it would be useful were the u2=u1 test to work on every
READ-LINE implementation; however, ANS neither stipulates nor
guarantees it. With the change to the spec I suggested, a
completed line is always less than u1 characters and your test
should work every time. I'm puzzled what leads you to believe
it won't.

Re: Handling unsupported line-endings

<2021Oct23.144831@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15060&group=comp.lang.forth#15060
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Sat, 23 Oct 2021 12:48:31 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 47
Message-ID: <2021Oct23.144831@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com> <sknsb6$1jc7$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org> <2021Oct21.180930@mips.complang.tuwien.ac.at> <skt1fp$1826$1@gioia.aioe.org> <skt3bc$1pn4$1@gioia.aioe.org> <2021Oct22.105318@mips.complang.tuwien.ac.at> <sku5be$1t2m$1@gioia.aioe.org> <2021Oct23.103251@mips.complang.tuwien.ac.at> <sl0opd$4dh$1@gioia.aioe.org> <2021Oct23.125900@mips.complang.tuwien.ac.at> <sl0v52$lvk$1@gioia.aioe.org>
 by: Anton Ertl - Sat, 23 Oct 2021 12:48 UTC

dxforth <dxforth@gmail.com> writes:
>On 23/10/2021 21:59, Anton Ertl wrote:
>> dxforth <dxforth@gmail.com> writes:
>>>How does ANS READ-LINE currently check for incomplete lines?
>>
>> By checking whether u2=u1.
>
>You are saying a completed line can never be u1 characters long.
>Under ANS the buffer is u1 + 2 characters. That's enough room
>to hold a completed line should an implementor so choose.
>
>I agree it would be useful were the u2=u1 test to work on every
>READ-LINE implementation, however ANS doesn't stipulate it nor
>guarantees it.

It does:

|When u1 = u2 the line terminator has yet to be reached.

>With the change to the spec I suggested, a
>completed line is always less than u1 characters and your test
>should work every time. I'm puzzled what leads you to believe
>it won't.

Your spec does not specify what happens when the buffer is too short
for the line, or one of the in-between cases. Given that the
intention seems to be that the system reads the newline characters
into the buffer, I expect breakage in corner cases.

In particular, if you have u1=10, and a line consisting of 9 chars
followed by CR and LF, you only see the CR with your read.

A simple interface would return u2=9, reposition the file to just
before the CR, and indicate separately that the line continues (so
the next read would see the full CRLF).

There are ways to deal with this case if only u2=u1 indicates that the
line is incomplete, but the system implementation becomes more complex
and the probability of bugs increases.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2021: https://euro.theforth.net/2021

Re: Handling unsupported line-endings

<sl19jo$1533$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15061&group=comp.lang.forth#15061

From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Sun, 24 Oct 2021 02:26:47 +1100
Organization: Aioe.org NNTP Server
Message-ID: <sl19jo$1533$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org>
<0c4e0a35-7d6f-45bb-8d18-5ac4a83e2b99n@googlegroups.com>
<sknsb6$1jc7$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org>
<2021Oct21.180930@mips.complang.tuwien.ac.at> <skt1fp$1826$1@gioia.aioe.org>
<skt3bc$1pn4$1@gioia.aioe.org> <2021Oct22.105318@mips.complang.tuwien.ac.at>
<sku5be$1t2m$1@gioia.aioe.org> <2021Oct23.103251@mips.complang.tuwien.ac.at>
<sl0opd$4dh$1@gioia.aioe.org> <2021Oct23.125900@mips.complang.tuwien.ac.at>
<sl0v52$lvk$1@gioia.aioe.org> <2021Oct23.144831@mips.complang.tuwien.ac.at>
 by: dxforth - Sat, 23 Oct 2021 15:26 UTC

On 23/10/2021 23:48, Anton Ertl wrote:
> dxforth <dxforth@gmail.com> writes:
>>On 23/10/2021 21:59, Anton Ertl wrote:
>>> dxforth <dxforth@gmail.com> writes:
>>>>How does ANS READ-LINE currently check for incomplete lines?
>>>
>>> By checking whether u2=u1.
>>
>>You are saying a completed line can never be u1 characters long.
>>Under ANS the buffer is u1 + 2 characters. That's enough room
>>to hold a completed line should an implementor so choose.
>>
>>I agree it would be useful were the u2=u1 test to work on every
>>READ-LINE implementation, however ANS doesn't stipulate it nor
>>guarantees it.
>
> It does:
>
> |When u1 = u2 the line terminator has yet to be reached.

I see no "shall" in there - just an observation about overly long
lines. The previous sentence indicates a completed line may range
"0 <= u2 <= u1".

>
>>With the change to the spec I suggested, a
>>completed line is always less than u1 characters and your test
>>should work every time. I'm puzzled what leads you to believe
>>it won't.
>
> Your spec does not specify what happens when the buffer is too short
> for the line, or one of the in-between cases. Given that the
> intention seems to be that the system reads the newline characters
> into the buffer, I expect breakage in corner cases.
>
> In particular, if you have u1=10, and a line consisting of 9 chars
> followed by CR and LF, you only see the CR with your read.
>
> A simple interface would return u2=9 and have a separate way for
> indicating that the line continues, and would just reposition to
> before the CR, and indicate that the line continues (so the next read
> would see the full CRLF).
>
> There are ways to deal with this case if only u2=u1 indicates that the
> line is incomplete, but the system implementation becomes more complex
> and the probability of bugs increases.

It's news to me that READ-LINE is particularly hard to implement or
the u2=u1 test doesn't just work.

https://pastebin.com/z5Zmaatb

Re: Handling unsupported line-endings

<2021Oct23.185441@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15062&group=comp.lang.forth#15062

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Sat, 23 Oct 2021 16:54:41 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 86
Message-ID: <2021Oct23.185441@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org> <2021Oct21.180930@mips.complang.tuwien.ac.at> <skt1fp$1826$1@gioia.aioe.org> <skt3bc$1pn4$1@gioia.aioe.org> <2021Oct22.105318@mips.complang.tuwien.ac.at> <sku5be$1t2m$1@gioia.aioe.org> <2021Oct23.103251@mips.complang.tuwien.ac.at> <sl0opd$4dh$1@gioia.aioe.org> <2021Oct23.125900@mips.complang.tuwien.ac.at> <sl0v52$lvk$1@gioia.aioe.org> <2021Oct23.144831@mips.complang.tuwien.ac.at> <sl19jo$1533$1@gioia.aioe.org>
 by: Anton Ertl - Sat, 23 Oct 2021 16:54 UTC

dxforth <dxforth@gmail.com> writes:
>On 23/10/2021 23:48, Anton Ertl wrote:
>> dxforth <dxforth@gmail.com> writes:
>>>I agree it would be useful were the u2=u1 test to work on every
>>>READ-LINE implementation, however ANS doesn't stipulate it nor
>>>guarantees it.
>>
>> It does:
>>
>> |When u1 = u2 the line terminator has yet to be reached.
>
>I see no "shall" in there - just an observation about overly long
>lines. The previous sentence indicates a completed line may range
>"0 <= u2 <= u1".

The exact sentence is:

|If a line terminator was received before u1 characters were read, then
|u2 is the number of characters, not including the line terminator,
|actually read (0 <= u2 <= u1).

A little thinking produces the result that, not only does 0 <= u2 <=
u1 hold in that case, but 0 <= u2 < u1 holds, too (because the
terminator was received /before/ u1 characters were read, and the line
terminator is not included).

>> Your spec does not specify what happens when the buffer is too short
>> for the line, or one of the in-between cases. Given that the
>> intention seems to be that the system reads the newline characters
>> into the buffer, I expect breakage in corner cases.
>>
>> In particular, if you have u1=10, and a line consisting of 9 chars
>> followed by CR and LF, you only see the CR with your read.
>>
>> A simple interface would return u2=9 and have a separate way for
>> indicating that the line continues, and would just reposition to
>> before the CR, and indicate that the line continues (so the next read
>> would see the full CRLF).
>>
>> There are ways to deal with this case if only u2=u1 indicates that the
>> line is incomplete, but the system implementation becomes more complex
>> and the probability of bugs increases.
>
>It's news to me that READ-LINE is particularly hard to implement or
>the u2=u1 test doesn't just work.
>
>https://pastebin.com/z5Zmaatb

I have tried this out, and, as I expected, it breaks in a corner case:

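: makeln ( -- )
  s" foo" r/w create-file throw >r
  \ Assumption: (CR) returns the two-character CR LF string.
  \ Line 1 ends with CR LF
  s" 123456789" r@ write-file throw (cr) r@ write-file throw
  \ Line 2 ends with CR only
  s" 123456789" r@ write-file throw (cr) 1- r@ write-file throw
  \ Line 3 ends with LF only
  s" 123456789" r@ write-file throw (cr) 1 /string r@ write-file throw
  \ Line 4 has no terminator
  s" 123456789" r@ write-file throw
  r> close-file drop ;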

: readln
  s" foo" r/w open-file throw >r
  cr ." asking: " buf size dup . r@ read-line throw
  cr ." got: flag = " . ." u2 = " dup . buf swap type
  cr ." asking: " buf size dup . r@ read-line throw
  cr ." got: flag = " . ." u2 = " dup . buf swap type
  cr ." asking: " buf size dup . r@ read-line throw
  cr ." got: flag = " . ." u2 = " dup . buf swap type
  cr ." asking: " buf size dup . r@ read-line throw
  cr ." got: flag = " . ." u2 = " dup . buf swap type
  r> close-file drop ;

Your READ-LINE treats the CR without following LF as a newline, but
does not deliver the following character. Treating CR-only as newline
is ok, but then the next character must be delivered. Treating it as
non-newline is also ok; then the second line should be treated as a
19-character line.

Could a determined fuzzer find more corner cases?

Cool use of the scan index, though.

- anton

Re: Handling unsupported line-endings

<sl2e46$16ea$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15063&group=comp.lang.forth#15063

From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Sun, 24 Oct 2021 12:49:57 +1100
Organization: Aioe.org NNTP Server
Message-ID: <sl2e46$16ea$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org> <sko3h4$1p8s$1@gioia.aioe.org>
<2021Oct21.180930@mips.complang.tuwien.ac.at> <skt1fp$1826$1@gioia.aioe.org>
<skt3bc$1pn4$1@gioia.aioe.org> <2021Oct22.105318@mips.complang.tuwien.ac.at>
<sku5be$1t2m$1@gioia.aioe.org> <2021Oct23.103251@mips.complang.tuwien.ac.at>
<sl0opd$4dh$1@gioia.aioe.org> <2021Oct23.125900@mips.complang.tuwien.ac.at>
<sl0v52$lvk$1@gioia.aioe.org> <2021Oct23.144831@mips.complang.tuwien.ac.at>
<sl19jo$1533$1@gioia.aioe.org> <2021Oct23.185441@mips.complang.tuwien.ac.at>
 by: dxforth - Sun, 24 Oct 2021 01:49 UTC

On 24/10/2021 03:54, Anton Ertl wrote:
> dxforth <dxforth@gmail.com> writes:
>>On 23/10/2021 23:48, Anton Ertl wrote:
>>> dxforth <dxforth@gmail.com> writes:
>>>>I agree it would be useful were the u2=u1 test to work on every
>>>>READ-LINE implementation, however ANS doesn't stipulate it nor
>>>>guarantees it.
>>>
>>> It does:
>>>
>>> |When u1 = u2 the line terminator has yet to be reached.
>>
>>I see no "shall" in there - just an observation about overly long
>>lines. The previous sentence indicates a completed line may range
>>"0 <= u2 <= u1".
>
> The exact sentence is:
>
> |If a line terminator was received before u1 characters were read, then
> |u2 is the number of characters, not including the line terminator,
> |actually read (0 <= u2 <= u1).
>
> A little thinking produces the result that, not only does 0 <= u2 <=
> u1 hold in that case, but 0 <= u2 < u1 holds, too (because the
> terminator was received /before/ u1 characters were read, and the line
> terminator is not included).

A little thinking will show that the purpose of specifying a 'u1 + 2' sized
buffer was to allow a completed line of 'u1' characters. If you can
interpret the remainder of ANS' specification in a way that contradicts
it, then there is something wrong either with your interpretation or
the spec.

>
>>> Your spec does not specify what happens when the buffer is too short
>>> for the line, or one of the in-between cases. Given that the
>>> intention seems to be that the system reads the newline characters
>>> into the buffer, I expect breakage in corner cases.
>>>
>>> In particular, if you have u1=10, and a line consisting of 9 chars
>>> followed by CR and LF, you only see the CR with your read.
>>>
>>> A simple interface would return u2=9 and have a separate way for
>>> indicating that the line continues, and would just reposition to
>>> before the CR, and indicate that the line continues (so the next read
>>> would see the full CRLF).
>>>
>>> There are ways to deal with this case if only u2=u1 indicates that the
>>> line is incomplete, but the system implementation becomes more complex
>>> and the probability of bugs increases.
>>
>>It's news to me that READ-LINE is particularly hard to implement or
>>the u2=u1 test doesn't just work.
>>
>>https://pastebin.com/z5Zmaatb
>
> I have tried this out, and, as I expected, it breaks in a corner case:
>
> : makeln
> s" foo" r/w create-file throw >r
> s" 123456789" r@ write-file throw (cr) r@ write-file throw
> s" 123456789" r@ write-file throw (cr) 1- r@ write-file throw
> s" 123456789" r@ write-file throw (cr) 1 /string r@ write-file throw
> s" 123456789" r@ write-file throw
> r> close-file drop ;

Sorry but what you're testing here is my EOL scanning algorithm -
not whether READ-LINE works with smaller than line-length buffers.

>
> : readln
> s" foo" r/w open-file throw >r
> cr ." asking: " buf size dup . r@ read-line throw
> cr ." got: flag = " . ." u2 = " dup . buf swap type
> cr ." asking: " buf size dup . r@ read-line throw
> cr ." got: flag = " . ." u2 = " dup . buf swap type
> cr ." asking: " buf size dup . r@ read-line throw
> cr ." got: flag = " . ." u2 = " dup . buf swap type
> cr ." asking: " buf size dup . r@ read-line throw
> cr ." got: flag = " . ." u2 = " dup . buf swap type
> r> close-file drop ;
>
> Your READ-LINE treats the CR without following LF as a newline, but
> does not deliver the following character. Treating CR-only as newline
> is ok, but then the next character must be delivered. Treating it as
> non-newline is also ok; then the second line should be treated as a
> 19-character line.
>
> Could a determined fuzzer find more corner cases?
>
> Cool use of the scan index, though.
>
> - anton
>

Re: Handling unsupported line-endings

<2021Oct24.090321@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15064&group=comp.lang.forth#15064

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Sun, 24 Oct 2021 07:03:21 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 125
Message-ID: <2021Oct24.090321@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <skt1fp$1826$1@gioia.aioe.org> <skt3bc$1pn4$1@gioia.aioe.org> <2021Oct22.105318@mips.complang.tuwien.ac.at> <sku5be$1t2m$1@gioia.aioe.org> <2021Oct23.103251@mips.complang.tuwien.ac.at> <sl0opd$4dh$1@gioia.aioe.org> <2021Oct23.125900@mips.complang.tuwien.ac.at> <sl0v52$lvk$1@gioia.aioe.org> <2021Oct23.144831@mips.complang.tuwien.ac.at> <sl19jo$1533$1@gioia.aioe.org> <2021Oct23.185441@mips.complang.tuwien.ac.at> <sl2e46$16ea$1@gioia.aioe.org>
 by: Anton Ertl - Sun, 24 Oct 2021 07:03 UTC

dxforth <dxforth@gmail.com> writes:
>On 24/10/2021 03:54, Anton Ertl wrote:
>> dxforth <dxforth@gmail.com> writes:
>>>On 23/10/2021 23:48, Anton Ertl wrote:
>>>> dxforth <dxforth@gmail.com> writes:
>>>>>I agree it would be useful were the u2=u1 test to work on every
>>>>>READ-LINE implementation, however ANS doesn't stipulate it nor
>>>>>guarantees it.
>>>>
>>>> It does:
>>>>
>>>> |When u1 = u2 the line terminator has yet to be reached.
>>>
>>>I see no "shall" in there - just an observation about overly long
>>>lines. The previous sentence indicates a completed line may range
>>>"0 <= u2 <= u1".
>>
>> The exact sentence is:
>>
>> |If a line terminator was received before u1 characters were read, then
>> |u2 is the number of characters, not including the line terminator,
>> |actually read (0 <= u2 <= u1).
>>
>> A little thinking produces the result that, not only does 0 <= u2 <=
>> u1 hold in that case, but 0 <= u2 < u1 holds, too (because the
>> terminator was received /before/ u1 characters were read, and the line
>> terminator is not included).
>
>A little thinking will show that the purpose of specifying a 'u1 + 2' sized
>buffer was to allow a completed line of 'u1' characters. If you can
>interpret the remainder of ANS' specification in a way that contradicts
>it, then there is something wrong either with your interpretation or
>the spec.

What is wrong with the specification IYO? I see nothing that is
contradictory. Would specifying 1 rather than 2 extra char have been
sufficient for the intended purpose? Maybe. As Gforth shows, you can
also do it completely without extra chars.

Let's see how, e.g., SwiftForth behaves:

[~/gforth:126015] echo "bla" >xbla
[~/gforth:126017] unix2dos xbla
unix2dos: converting file xbla to DOS format...
[~/gforth:126018] od -t x1 xbla
0000000 62 6c 61 0d 0a
0000005
[~/gforth:126022] sf
pad 16 255 fill pad 3 f read-line cr . . . pad 16 dump
0 -1 3
8084E28 62 6C 61 0D FF FF FF FF FF FF FF FF FF FF FF FF bla............. ok
pad 16 255 fill pad 3 f read-line cr . . . pad 16 dump
0 -1 0
8084E28 0D 0A FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................ ok

So SwiftForth indeed uses only one of the extra chars. Note that
SwiftForth sticks to the specification and considers the line
terminator not yet reached when u2=u1; you need another READ-LINE to
get the (empty) rest of the line (u2=0, which indicates that the line
terminator has been reached).
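
The transcript suggests how a program can reassemble a line that is longer than its buffer: keep calling READ-LINE until u2<u1. A minimal sketch in standard Forth, assuming a system that follows the u2=u1 rule the way SwiftForth does here; LINEBUF, /LINEBUF and TYPE-LINE are names made up for this illustration:

```forth
\ Illustrative sketch; LINEBUF, /LINEBUF and TYPE-LINE are hypothetical names.
3 constant /linebuf                      \ u1, deliberately tiny as in the transcript
create linebuf  /linebuf 2 + chars allot \ ANS asks for room for u1+2 chars

: type-line ( fid -- flag )  \ type one whole line; flag is false at end of file
  >r
  begin
    linebuf /linebuf r@ read-line throw  ( u2 flag )
    0= if  drop r> drop false exit  then ( u2 )
    linebuf over type                    ( u2 )
    /linebuf <                           \ u2<u1: the terminator has been reached
  until
  r> drop true ;
```

On the xbla file this would need two READ-LINE calls for the single line: the first delivers "bla" with u2=u1, the second delivers the empty rest. The sketch does not try to handle a file whose last line lacks a terminator and ends exactly on a buffer boundary; in that case it returns false after typing the final chunk.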

>>>> Your spec does not specify what happens when the buffer is too short
>>>> for the line, or one of the in-between cases. Given that the
>>>> intention seems to be that the system reads the newline characters
>>>> into the buffer, I expect breakage in corner cases.
>>>>
>>>> In particular, if you have u1=10, and a line consisting of 9 chars
>>>> followed by CR and LF, you only see the CR with your read.
>>>>
>>>> A simple interface would return u2=9 and have a separate way for
>>>> indicating that the line continues, and would just reposition to
>>>> before the CR, and indicate that the line continues (so the next read
>>>> would see the full CRLF).
>>>>
>>>> There are ways to deal with this case if only u2=u1 indicates that the
>>>> line is incomplete, but the system implementation becomes more complex
>>>> and the probability of bugs increases.
>>>
>>>It's news to me that READ-LINE is particularly hard to implement or
>>>the u2=u1 test doesn't just work.
>>>
>>>https://pastebin.com/z5Zmaatb
>>
>> I have tried this out, and, as I expected, it breaks in a corner case:
>>
>> : makeln
>> s" foo" r/w create-file throw >r
>> s" 123456789" r@ write-file throw (cr) r@ write-file throw
>> s" 123456789" r@ write-file throw (cr) 1- r@ write-file throw
>> s" 123456789" r@ write-file throw (cr) 1 /string r@ write-file throw
>> s" 123456789" r@ write-file throw
>> r> close-file drop ;
[reinserted from below]
>> : readln
>> s" foo" r/w open-file throw >r
>> cr ." asking: " buf size dup . r@ read-line throw
>> cr ." got: flag = " . ." u2 = " dup . buf swap type
>> cr ." asking: " buf size dup . r@ read-line throw
>> cr ." got: flag = " . ." u2 = " dup . buf swap type
>> cr ." asking: " buf size dup . r@ read-line throw
>> cr ." got: flag = " . ." u2 = " dup . buf swap type
>> cr ." asking: " buf size dup . r@ read-line throw
>> cr ." got: flag = " . ." u2 = " dup . buf swap type
>> r> close-file drop ;
>
>Sorry but what you're testing here is my EOL scanning algorithm -
>not whether READ-LINE works with smaller than line-length buffers.

It is easy to see that the test (READLN) only calls READ-LINE, so what
it tests is (modified-spec) READ-LINE. What part of your READ-LINE
fails does not matter for the question at hand (complexity of a
correct (modified-spec) READ-LINE).

So, once you manage to have a (modified-spec) READ-LINE that is
correct, how complicated is it? Could you do a less complicated
READ-LINE if u2<u1 does not indicate that the line terminator has been
reached, but instead you return that in an extra flag (or as a third
value in the existing flag, which would no longer be a flag)?
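
To make the second question concrete, here is a rough sketch of what a three-valued result could look like to the caller. It is only a wrapper over the existing READ-LINE, so it still derives "line continues" from u2=u1 and gains none of the simplification asked about; READ-LINE3 and the three status names are hypothetical:

```forth
\ Illustrative sketch only; all names below are hypothetical.
0 constant at-eof           \ nothing read, end of file
1 constant line-done        \ the line terminator was reached
2 constant line-continues   \ buffer filled before the terminator

: read-line3 ( c-addr u1 fid -- u2 status ior )
  over >r  read-line  >r         ( u2 flag ) ( R: u1 ior )
  if                             \ flag true: something was read
    dup 2r@ drop =               ( u2 u2=u1? )
    if line-continues else line-done then
  else
    at-eof
  then                           ( u2 status )
  r> r> drop ;                   ( u2 status ior )
```

A system-level implementation could set the status directly while scanning for the terminator, which is where any reduction in complexity would have to come from.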

- anton

Re: Handling unsupported line-endings

<sl3e4n$1m83$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15065&group=comp.lang.forth#15065

From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Sun, 24 Oct 2021 21:56:22 +1100
Organization: Aioe.org NNTP Server
Message-ID: <sl3e4n$1m83$1@gioia.aioe.org>
References: <skjhir$jd9$1@gioia.aioe.org> <skt1fp$1826$1@gioia.aioe.org>
<skt3bc$1pn4$1@gioia.aioe.org> <2021Oct22.105318@mips.complang.tuwien.ac.at>
<sku5be$1t2m$1@gioia.aioe.org> <2021Oct23.103251@mips.complang.tuwien.ac.at>
<sl0opd$4dh$1@gioia.aioe.org> <2021Oct23.125900@mips.complang.tuwien.ac.at>
<sl0v52$lvk$1@gioia.aioe.org> <2021Oct23.144831@mips.complang.tuwien.ac.at>
<sl19jo$1533$1@gioia.aioe.org> <2021Oct23.185441@mips.complang.tuwien.ac.at>
<sl2e46$16ea$1@gioia.aioe.org> <2021Oct24.090321@mips.complang.tuwien.ac.at>
 by: dxforth - Sun, 24 Oct 2021 10:56 UTC

On 24/10/2021 18:03, Anton Ertl wrote:
> dxforth <dxforth@gmail.com> writes:
>>On 24/10/2021 03:54, Anton Ertl wrote:
>>> dxforth <dxforth@gmail.com> writes:
>>>>On 23/10/2021 23:48, Anton Ertl wrote:
>>>>> dxforth <dxforth@gmail.com> writes:
>>>>>>I agree it would be useful were the u2=u1 test to work on every
>>>>>>READ-LINE implementation, however ANS doesn't stipulate it nor
>>>>>>guarantees it.
>>>>>
>>>>> It does:
>>>>>
>>>>> |When u1 = u2 the line terminator has yet to be reached.
>>>>
>>>>I see no "shall" in there - just an observation about overly long
>>>>lines. The previous sentence indicates a completed line may range
>>>>"0 <= u2 <= u1".
>>>
>>> The exact sentence is:
>>>
>>> |If a line terminator was received before u1 characters were read, then
>>> |u2 is the number of characters, not including the line terminator,
>>> |actually read (0 <= u2 <= u1).
>>>
>>> A little thinking produces the result that, not only does 0 <= u2 <=
>>> u1 hold in that case, but 0 <= u2 < u1 holds, too (because the
>>> terminator was received /before/ u1 characters were read, and the line
>>> terminator is not included).
>>
>>A little thinking will show that the purpose of specifying a 'u1 + 2' sized
>>buffer was to allow a completed line of 'u1' characters. If you can
>>interpret the remainder of ANS' specification in a way that contradicts
>>it, then there is something wrong either with your interpretation or
>>the spec.
>
> What is wrong with the specification IYO? I see nothing that is
> contradictory.

You claim an entitlement that would render the specification contradictory.

> Would specifying 1 rather than 2 extra char have been
> sufficient for the intended purpose? Maybe. As Gforth shows, you can
> also do it completely without extra chars.

No. It entrenches the notion that u1 characters need to be received when
there was never such a need (in any language).

> ...
>>>>It's news to me that READ-LINE is particularly hard to implement or
>>>>the u2=u1 test doesn't just work.
>>>>
>>>>https://pastebin.com/z5Zmaatb
>>>
>>> I have tried this out, and, as I expected, it breaks in a corner case:
>>>
>>> : makeln
>>> s" foo" r/w create-file throw >r
>>> s" 123456789" r@ write-file throw (cr) r@ write-file throw
>>> s" 123456789" r@ write-file throw (cr) 1- r@ write-file throw
>>> s" 123456789" r@ write-file throw (cr) 1 /string r@ write-file throw
>>> s" 123456789" r@ write-file throw
>>> r> close-file drop ;
> [reinserted from below]
>>> : readln
>>> s" foo" r/w open-file throw >r
>>> cr ." asking: " buf size dup . r@ read-line throw
>>> cr ." got: flag = " . ." u2 = " dup . buf swap type
>>> cr ." asking: " buf size dup . r@ read-line throw
>>> cr ." got: flag = " . ." u2 = " dup . buf swap type
>>> cr ." asking: " buf size dup . r@ read-line throw
>>> cr ." got: flag = " . ." u2 = " dup . buf swap type
>>> cr ." asking: " buf size dup . r@ read-line throw
>>> cr ." got: flag = " . ." u2 = " dup . buf swap type
>>> r> close-file drop ;
>>
>>Sorry but what you're testing here is my EOL scanning algorithm -
>>not whether READ-LINE works with smaller than line-length buffers.
>
> It is easy to see that the test (READLN) only calls READ-LINE, so what
> it tests is (modified-spec) READ-LINE. What part of your READ-LINE
> fails does not matter for the question at hand (complexity of a
> correct (modified-spec) READ-LINE).
>
> So, once you manage to have a (modified-spec) READ-LINE that is
> correct, how complicated is it? Could you do a less complicated
> READ-LINE if u2<u1 does not indicate that the line terminator has been
> reached, but instead you return that in an extra flag (or as a third
> value in the existing flag, which would no longer be a flag)?

You've lost me. Here's a run on a text file comprising lines of
mixed length. I assume this is the behaviour you were seeking.

Making text file: FOO

Showing: FOO
1
12
123
1234
12345
123456
1234567
12345678
123456789

Reading: FOO using READ-LINE u1 = 5

Writing: BAR u2 chars using WRITE-FILE

Writing EOL sequence when u2 <> u1

Showing: BAR
1
12
123
1234
12345
123456
1234567
12345678
123456789
ok
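
The run above can be approximated in standard Forth roughly as follows. This is a sketch under stated assumptions, not the code behind that output: COPY-LINES, FIN, FOUT, LINEBUF and /LINEBUF are assumed names, and it relies on READ-LINE returning u2=u1 only when the terminator has not yet been reached.

```forth
\ Illustrative sketch; every name defined here is hypothetical.
5 constant /linebuf                      \ u1 = 5, as in the run above
create linebuf  /linebuf 2 + chars allot \ ANS asks for room for u1+2 chars
0 value fin   0 value fout               \ input and output file ids

: copy-lines ( fid-in fid-out -- )
  to fout  to fin
  begin
    linebuf /linebuf fin read-line throw ( u2 flag )
  while                                  ( u2 )  \ flag false means end of file
    linebuf over fout write-file throw   ( u2 )  \ write the u2 chars received
    /linebuf <> if                       \ u2 <> u1: terminator was reached
      s" " fout write-line throw         \ empty string plus the EOL sequence
    then
  repeat
  drop ;                                 \ discard the u2 left by the last read

s" FOO" r/o open-file throw              \ FOO must already exist
s" BAR" w/o create-file throw
2dup copy-lines
close-file throw  close-file throw
```

On a system where a completed line may come back with u2=u1, the five-character line would lose its terminator in BAR, which is the case the u2=u1 rule is meant to exclude.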

Re: Handling unsupported line-endings

<2021Oct24.185514@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15066&group=comp.lang.forth#15066

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Sun, 24 Oct 2021 16:55:14 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 103
Message-ID: <2021Oct24.185514@mips.complang.tuwien.ac.at>
References: <skjhir$jd9$1@gioia.aioe.org> <2021Oct22.105318@mips.complang.tuwien.ac.at> <sku5be$1t2m$1@gioia.aioe.org> <2021Oct23.103251@mips.complang.tuwien.ac.at> <sl0opd$4dh$1@gioia.aioe.org> <2021Oct23.125900@mips.complang.tuwien.ac.at> <sl0v52$lvk$1@gioia.aioe.org> <2021Oct23.144831@mips.complang.tuwien.ac.at> <sl19jo$1533$1@gioia.aioe.org> <2021Oct23.185441@mips.complang.tuwien.ac.at> <sl2e46$16ea$1@gioia.aioe.org> <2021Oct24.090321@mips.complang.tuwien.ac.at> <sl3e4n$1m83$1@gioia.aioe.org>
 by: Anton Ertl - Sun, 24 Oct 2021 16:55 UTC

dxforth <dxforth@gmail.com> writes:
>On 24/10/2021 18:03, Anton Ertl wrote:
>> dxforth <dxforth@gmail.com> writes:
>>>On 24/10/2021 03:54, Anton Ertl wrote:
>>>> dxforth <dxforth@gmail.com> writes:
>>>>>On 23/10/2021 23:48, Anton Ertl wrote:
>>>>>> |When u1 = u2 the line terminator has yet to be reached.
>>>>>
>>>>>I see no "shall" in there - just an observation about overly long
>>>>>lines. The previous sentence indicates a completed line may range
>>>>>"0 <= u2 <= u1".
>>>>
>>>> The exact sentence is:
>>>>
>>>> |If a line terminator was received before u1 characters were read, then
>>>> |u2 is the number of characters, not including the line terminator,
>>>> |actually read (0 <= u2 <= u1).
>>>>
>>>> A little thinking produces the result that, not only does 0 <= u2 <=
>>>> u1 hold in that case, but 0 <= u2 < u1 holds, too (because the
>>>> terminator was received /before/ u1 characters were read, and the line
>>>> terminator is not included).
>>>
>>>A little thinking will show that the purpose of specifying a 'u1 + 2' sized
>>>buffer was to allow a completed line of 'u1' characters. If you can
>>>interpret the remainder of ANS' specification in a way that contradicts
>>>it, then there is something wrong either with your interpretation or
>>>the spec.
>>
>> What is wrong with the specification IYO? I see nothing that is
>> contradictory.
>
>You claim an entitlement that would render the specification contradictory.

Which entitlement? What contradiction?

>> Would specifying 1 rather than 2 extra char have been
>> sufficient for the intended purpose? Maybe. As Gforth shows, you can
>> also do it completely without extra chars.
>
>No. It entrenches the notion that u1 characters need to be received when
>there was never such a need (in any language).

You lost me here. What is "it", what do you mean by "entrench", and
the "u1 characters need to be received" is also mysterious without
context.

>>>>>It's news to me that READ-LINE is particularly hard to implement or
>>>>>the u2=u1 test doesn't just work.
>>>>>
>>>>>https://pastebin.com/z5Zmaatb
>>>>
>>>> I have tried this out, and, as I expected, it breaks in a corner case:
>>>>
>>>> : makeln
>>>> s" foo" r/w create-file throw >r
>>>> s" 123456789" r@ write-file throw (cr) r@ write-file throw
>>>> s" 123456789" r@ write-file throw (cr) 1- r@ write-file throw
>>>> s" 123456789" r@ write-file throw (cr) 1 /string r@ write-file throw
>>>> s" 123456789" r@ write-file throw
>>>> r> close-file drop ;
>> [reinserted from below]
>>>> : readln
>>>> s" foo" r/w open-file throw >r
>>>> cr ." asking: " buf size dup . r@ read-line throw
>>>> cr ." got: flag = " . ." u2 = " dup . buf swap type
>>>> cr ." asking: " buf size dup . r@ read-line throw
>>>> cr ." got: flag = " . ." u2 = " dup . buf swap type
>>>> cr ." asking: " buf size dup . r@ read-line throw
>>>> cr ." got: flag = " . ." u2 = " dup . buf swap type
>>>> cr ." asking: " buf size dup . r@ read-line throw
>>>> cr ." got: flag = " . ." u2 = " dup . buf swap type
>>>> r> close-file drop ;
>>>
>>>Sorry but what you're testing here is my EOL scanning algorithm -
>>>not whether READ-LINE works with smaller than line-length buffers.
>>
>> It is easy to see that the test (READLN) only calls READ-LINE, so what
>> it tests is (modified-spec) READ-LINE. What part of your READ-LINE
>> fails does not matter for the question at hand (complexity of a
>> correct (modified-spec) READ-LINE).
>>
>> So, once you manage to have a (modified-spec) READ-LINE that is
>> correct, how complicated is it? Could you do a less complicated
>> READ-LINE if u2<u1 does not indicate that the line terminator has been
>> reached, but instead you return that in an extra flag (or as a third
>> value in the existing flag, which would no longer be a flag)?
>
>You've lost me.

Your current READ-LINE does not handle CR without following LF
correctly. There are two correct ways to handle it: 1) Treat it as
newline. 2) Treat it as non-newline char. What your READ-LINE does is
to treat the CR as newline and then skip the next char.

If you fix that bug, what does your READ-LINE look like?
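
A minimal sketch of option 1, assuming LF, CRLF and a lone CR should all end a line. TERM-LEN and SPLIT-LINE are hypothetical names chosen for illustration, not words from the pastebin code or from any system discussed here. The sketch only scans a block that is already in memory; a real READ-LINE also has to cope with a CR that falls on the last char of its buffer.

```forth
10 constant lf-char
13 constant cr-char

\ TERM-LEN ( c-addr u -- n )
\ Length of the line terminator at the start of the string: 0, 1 or 2.
: term-len ( c-addr u -- n )
  dup 0= if  2drop 0 exit  then              \ empty string
  over c@ lf-char = if  2drop 1 exit  then   \ LF
  over c@ cr-char <> if  2drop 0 exit  then  \ ordinary char
  \ the string starts with CR; CRLF counts as one two-char terminator
  1 > if  char+ c@ lf-char = if  2 exit  then  1 exit  then
  drop 1 ;                                   \ CR is the last char

\ SPLIT-LINE ( c-addr u -- u-line u-used )
\ u-line: chars before the first terminator.
\ u-used: u-line plus the length of that terminator.
\ Both are u if the block contains no terminator.
: split-line ( c-addr u -- u-line u-used )
  dup 0 ?do
    2dup i /string term-len                  ( c-addr u n )
    ?dup if  nip nip  i swap i +  unloop exit  then
  loop
  nip dup ;
```

For the four chars a b CR 1, SPLIT-LINE returns 2 and 3: the line is "ab", three chars are consumed, and the "1" stays at the start of the next line instead of being skipped.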

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2021: https://euro.theforth.net/2021

Re: Handling unsupported line-endings

<f0a0af84-d8f8-4c88-8279-b8d76bd279b9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=15067&group=comp.lang.forth#15067

Newsgroups: comp.lang.forth
Date: Sun, 24 Oct 2021 10:56:53 -0700 (PDT)
In-Reply-To: <2021Oct23.104040@mips.complang.tuwien.ac.at>
Message-ID: <f0a0af84-d8f8-4c88-8279-b8d76bd279b9n@googlegroups.com>
Subject: Re: Handling unsupported line-endings
From: pah...@gmail.com (pahihu)
 by: pahihu - Sun, 24 Oct 2021 17:56 UTC

Anton Ertl wrote (Saturday, 23 October 2021, 10:57:09 UTC+2):
> Gforth uses C buffered I/O under the hood, and uses getc() to inspect
> every char individually. And when it sees CR or LF, it does not write
> them into the buffer.

With a bla2.txt as
00000000: 626c 610a 0d
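
To reproduce the file, a minimal sketch that writes those five bytes from Forth. BLA2-BYTES and MAKE-BLA2 are made-up names, not words from any of the systems tested here, and BIN is added on the assumption that some systems would otherwise translate line ends on output.

```forth
\ Hypothetical helper: the bytes 62 6C 61 0A 0D from the dump above.
create bla2-bytes
  char b c,  char l c,  char a c,   \ "bla"
  10 c,  13 c,                      \ LF then CR (LFCR, not CRLF)

\ MAKE-BLA2 ( -- )  create bla2.txt holding exactly those five bytes
: make-bla2 ( -- )
  s" bla2.txt" w/o bin create-file throw >r
  bla2-bytes 5 r@ write-file throw
  r> close-file throw ;
```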

gforth 0.7.3 (so does SwiftForth, iForth) gives:

s" bla2.txt" r/o open-file throw value fid ok
pad 3 fid read-line .s drop 2drop <3> 3 -1 0 ok
pad 3 fid read-line .s drop 2drop <3> 0 -1 0 ok
pad 3 fid read-line .s drop 2drop <3> 0 -1 0 ok
pad 3 fid read-line .s drop 2drop <3> 0 0 0 ok

gforth 0.7.9_20211021 gives:
s" bla2.txt" r/o open-file throw value fid ok
pad 3 fid read-line .s drop 2drop <3> 3 -1 0 ok
pad 3 fid read-line .s drop 2drop <3> 0 -1 0 ok
pad 3 fid read-line .s drop 2drop <3> 0 0 0 ok

Was this a planned change or a bug?
pahihu

Re: Handling unsupported line-endings

<sl59h0$2kr$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15068&group=comp.lang.forth#15068

From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Mon, 25 Oct 2021 14:49:51 +1100
Organization: Aioe.org NNTP Server
Message-ID: <sl59h0$2kr$1@gioia.aioe.org>
 by: dxforth - Mon, 25 Oct 2021 03:49 UTC

On 25/10/2021 03:55, Anton Ertl wrote:
> dxforth <dxforth@gmail.com> writes:
>>On 24/10/2021 18:03, Anton Ertl wrote:
>>> dxforth <dxforth@gmail.com> writes:
>>>>On 24/10/2021 03:54, Anton Ertl wrote:
>>>>> dxforth <dxforth@gmail.com> writes:
>>>>>>On 23/10/2021 23:48, Anton Ertl wrote:
>>>>>>> |When u1 = u2 the line terminator has yet to be reached.
>>>>>>
>>>>>>I see no "shall" in there - just an observation about overly long
>>>>>>lines. The previous sentence indicates a completed line may range
>>>>>>"0 <= u2 <= u1".
>>>>>
>>>>> The exact sentence is:
>>>>>
>>>>> |If a line terminator was received before u1 characters were read, then
>>>>> |u2 is the number of characters, not including the line terminator,
>>>>> |actually read (0 <= u2 <= u1).
>>>>>
>>>>> A little thinking produces the result that, not only does 0 <= u2 <=
>>>>> u1 hold in that case, but 0 <= u2 < u1 holds, too (because the
>>>>> terminator was received /before/ u1 characters were read, and the line
>>>>> terminator is not included).
>>>>
>>>>A little thinking will inform the purpose of specifying a 'u1 + 2' sized
>>>>buffer was to allow a completed line of 'u1' characters. If you can
>>>>interpret the remainder of ANS' specification in a way that contradicts
>>>>it, then there is something wrong either with your interpretation or
>>>>the spec.
>>>
>>> What is wrong with the specification IYO? I see nothing that is
>>> contradictory.
>>
>>You claim an entitlement that would render the specification contradictory.
>
> Which entitlement? What contradiction?

Amnesia?

ANS does not entitle a standard program to test for a completed line using
'u2<>u1'. It would deny an implementer the right to have a completed line of
u2=u1 chars - an entitlement given in the first paragraph of the specification.

>
>>> Would specifying 1 rather than 2 extra char have been
>>> sufficient for the intended purpose? Maybe. As Gforth shows, you can
>>> also do it completely without extra chars.
>>
>>No. It entrenches the notion that u1 characters need to be received when
>>there was never such a need (in any language).
>
> You lost me here. What is "it", what do you mean by "entrench", and
> the "u1 characters need to be received" is also mysterious without
> context.

"As Gforth shows, you can also do it completely without extra chars."

is a complication nobody needs. You want Forth to be slower than C ?

>>>>
>>>>Sorry but what you're testing here is my EOL scanning algorithm -
>>>>not whether READ-LINE works with smaller than line-length buffers.
>>>
>>> It is easy to see that the test (READLN) only calls READ-LINE, so what
>>> it tests is (modified-spec) READ-LINE. What part of your READ-LINE
>>> fails does not matter for the question at hand (complexity of a
>>> correct (modified-spec) READ-LINE).
>>>
>>> So, once you manage to have a (modified-spec) READ-LINE that is
>>> correct, how complicated is it? Could you do a less complicated
>>> READ-LINE if u2<u1 does not indicate that the line terminator has been
>>> reached, but instead you return that in an extra flag (or as a third
>>> value in the existing flag, which would no longer be a flag)?
>>
>>You've lost me.
>
> Your current READ-LINE does not handle CR without following LF
> correctly.

Since when does ANS require it? ANS permits implementers to choose
what line terminators to support and CR was low priority for me.

> There are two correct ways to handle it: 1) Treat it as
> newline. 2) Treat it as non-newline char. What your READ-LINE does is
> to treat the CR as newline and then skip the next char.
>
> If you fix that bug, what does your READ-LINE look like?

You mean 'want'. You want a READ-LINE that handles all common line
terminators and works within the confines of a 'u1' sized buffer?
ISTM SwiftForth's READ-LINE will do that with the following source
tweaks. Remove the '1+' and insert '1-' before EOL-SCANNER.
I haven't tested it but you can.

Re: Handling unsupported line-endings

<sl5ea9$1gse$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15069&group=comp.lang.forth#15069

From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Mon, 25 Oct 2021 16:11:37 +1100
Organization: Aioe.org NNTP Server
Message-ID: <sl5ea9$1gse$1@gioia.aioe.org>
 by: dxforth - Mon, 25 Oct 2021 05:11 UTC

On 25/10/2021 04:56, pahihu wrote:
> Anton Ertl wrote (Saturday, 23 October 2021, 10:57:09 UTC+2):
>> Gforth uses C buffered I/O under the hood, and uses getc() to inspect
>> every char individually. And when it sees CR or LF, it does not write
>> them into the buffer.
>
> With a bla2.txt as
> 00000000: 626c 610a 0d
>
> gforth 0.7.3 (so does SwiftForth, iForth) gives:
>
> s" bla2.txt" r/o open-file throw value fid ok
> pad 3 fid read-line .s drop 2drop <3> 3 -1 0 ok
> pad 3 fid read-line .s drop 2drop <3> 0 -1 0 ok
> pad 3 fid read-line .s drop 2drop <3> 0 -1 0 ok
> pad 3 fid read-line .s drop 2drop <3> 0 0 0 ok
>
> gforth 0.7.9_20211021 gives:
> s" bla2.txt" r/o open-file throw value fid ok
> pad 3 fid read-line .s drop 2drop <3> 3 -1 0 ok
> pad 3 fid read-line .s drop 2drop <3> 0 -1 0 ok
> pad 3 fid read-line .s drop 2drop <3> 0 0 0 ok
>
> Was this a planned change or a bug?
> pahihu
>

SwiftForth i386-Win32 3.11.2 22-Jun-2021
s" bla2.txt" r/o open-file throw value fid ok
pad 3 fid read-line .s drop 2drop 3 -1 0 <-Top ok
pad 3 fid read-line .s drop 2drop 0 -1 0 <-Top ok
pad 3 fid read-line .s drop 2drop 0 0 0 <-Top ok

Re: Handling unsupported line-endings

<2021Oct25.085345@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15070&group=comp.lang.forth#15070

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Mon, 25 Oct 2021 06:53:45 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2021Oct25.085345@mips.complang.tuwien.ac.at>
 by: Anton Ertl - Mon, 25 Oct 2021 06:53 UTC

pahihu <pahihu@gmail.com> writes:
>With a bla2.txt as
>00000000: 626c 610a 0d
>
>gforth 0.7.3 (so does SwiftForth, iForth) gives:
>
>s" bla2.txt" r/o open-file throw value fid ok
>pad 3 fid read-line .s drop 2drop <3> 3 -1 0 ok
>pad 3 fid read-line .s drop 2drop <3> 0 -1 0 ok
>pad 3 fid read-line .s drop 2drop <3> 0 -1 0 ok
>pad 3 fid read-line .s drop 2drop <3> 0 0 0 ok
>
>gforth 0.7.9_20211021 gives:
>s" bla2.txt" r/o open-file throw value fid ok
>pad 3 fid read-line .s drop 2drop <3> 3 -1 0 ok
>pad 3 fid read-line .s drop 2drop <3> 0 -1 0 ok
>pad 3 fid read-line .s drop 2drop <3> 0 0 0 ok
>
>Was this a planned change or a bug?

A bug, now fixed. Thanks for reporting.

- anton

Re: Handling unsupported line-endings

<2021Oct25.085451@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15071&group=comp.lang.forth#15071

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Mon, 25 Oct 2021 06:54:51 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2021Oct25.085451@mips.complang.tuwien.ac.at>
 by: Anton Ertl - Mon, 25 Oct 2021 06:54 UTC

dxforth <dxforth@gmail.com> writes:
>SwiftForth i386-Win32 3.11.2 22-Jun-2021
>s" bla2.txt" r/o open-file throw value fid ok
>pad 3 fid read-line .s drop 2drop 3 -1 0 <-Top ok
>pad 3 fid read-line .s drop 2drop 0 -1 0 <-Top ok
>pad 3 fid read-line .s drop 2drop 0 0 0 <-Top ok

3.11.0 behaves as pahihu described. Are you sure that the file you
read contains S" abc\n\r" (LFCR, not CRLF)?

- anton

Re: Handling unsupported line-endings

<sl5l52$1v1k$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15072&group=comp.lang.forth#15072

From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Mon, 25 Oct 2021 18:08:17 +1100
Organization: Aioe.org NNTP Server
Message-ID: <sl5l52$1v1k$1@gioia.aioe.org>
 by: dxforth - Mon, 25 Oct 2021 07:08 UTC

On 25/10/2021 17:54, Anton Ertl wrote:
> dxforth <dxforth@gmail.com> writes:
>>SwiftForth i386-Win32 3.11.2 22-Jun-2021
>>s" bla2.txt" r/o open-file throw value fid ok
>>pad 3 fid read-line .s drop 2drop 3 -1 0 <-Top ok
>>pad 3 fid read-line .s drop 2drop 0 -1 0 <-Top ok
>>pad 3 fid read-line .s drop 2drop 0 0 0 <-Top ok
>
> 3.11.0 behaves as pahihu described. Are you sure that the file you
> read contains S" abc\n\r" (LFCR, not CRLF)?

No. Had I known LFCR was intended I would have let it pass.

Re: Handling unsupported line-endings

<2021Oct25.085641@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=15073&group=comp.lang.forth#15073

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Mon, 25 Oct 2021 06:56:41 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2021Oct25.085641@mips.complang.tuwien.ac.at>
 by: Anton Ertl - Mon, 25 Oct 2021 06:56 UTC

dxforth <dxforth@gmail.com> writes:
>On 25/10/2021 03:55, Anton Ertl wrote:
>> dxforth <dxforth@gmail.com> writes:
>>>On 24/10/2021 18:03, Anton Ertl wrote:
>>>> dxforth <dxforth@gmail.com> writes:
>>>>>On 24/10/2021 03:54, Anton Ertl wrote:
>>>>>> dxforth <dxforth@gmail.com> writes:
>>>>>>>On 23/10/2021 23:48, Anton Ertl wrote:
>>>>>>>> |When u1 = u2 the line terminator has yet to be reached.
>>>>>>>
>>>>>>>I see no "shall" in there - just an observation about overly long
>>>>>>>lines. The previous sentence indicates a completed line may range
>>>>>>>"0 <= u2 <= u1".
>>>>>>
>>>>>> The exact sentence is:
>>>>>>
>>>>>> |If a line terminator was received before u1 characters were read, then
>>>>>> |u2 is the number of characters, not including the line terminator,
>>>>>> |actually read (0 <= u2 <= u1).
>>>>>>
>>>>>> A little thinking produces the result that, not only does 0 <= u2 <=
>>>>>> u1 hold in that case, but 0 <= u2 < u1 holds, too (because the
>>>>>> terminator was received /before/ u1 characters were read, and the line
>>>>>> terminator is not included).
>>>>>
>>>>>A little thinking will inform the purpose of specifying a 'u1 + 2' sized
>>>>>buffer was to allow a completed line of 'u1' characters. If you can
>>>>>interpret the remainder of ANS' specification in a way that contradicts
>>>>>it, then there is something wrong either with your interpretation or
>>>>>the spec.
>>>>
>>>> What is wrong with the specification IYO? I see nothing that is
>>>> contradictory.
>>>
>>>You claim an entitlement that would render the specification contradictory.
>>
>> Which entitlement? What contradiction?
>
>Amnesia?
>
>ANS does not entitle a standard program to test for a completed line using
>'u2<>u1'.

It does, as discussed above.
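
A minimal sketch of a standard program that relies on that test, under the reading that u2 = u1 means the terminator has not been reached yet. CHUNK, LINEBUF and LINE-LENGTH are hypothetical names chosen for illustration; the word adds up the length of one line that may be longer than the buffer.

```forth
16 constant chunk                         \ u1: arbitrary small request size
create linebuf  chunk 2 + chars allot     \ u1+2 chars, as the specification asks

\ LINE-LENGTH ( fileid -- u flag )
\ u is the length of the next line. flag is false only if end of file
\ was hit before anything was read.
: line-length ( fileid -- u flag )
  >r  0 false                             ( total seen )
  begin
    linebuf chunk r@ read-line throw      ( total seen u2 flag )
    0= if  drop  r> drop  exit  then      \ end of file: return total seen
    nip tuck + swap                       ( total' u2 )
    chunk < if  true  r> drop  exit  then \ u2<u1: terminator was reached
    true                                  ( total' seen )
  again ;
```

On a system where a completed line may come back with u2 = u1, this loop would issue one more READ-LINE and run into the following line, which is why the two readings of the specification cannot both hold for such a program.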

>It would deny an implementer the right to have a completed line of
>u2=u1 chars - an entitlement given in the first paragraph of the specification.

I find no such entitlement there.

>"As Gforth shows, you can also do it completely without extra chars."
>
>is a complication nobody needs. You want Forth to be slower than C ?

Forth's READ-LINE can be as fast as C's fgets(), if Forth does its own
buffering. In that case you just don't copy the line terminator
characters.

While Gforth does not do this (it uses C's buffering through getc, and
performance suffers from that), its READ-LINE is still the fastest
one among the Forth systems posted here.

>> Your current READ-LINE does not handle CR without following LF
>> correctly.
>
>Since when does ANS require it? ANS permits implementers to choose
>what line terminators to support and CR was low priority for me.

It's acceptable to treat CR as a non-line-terminator. It is not
acceptable to treat "CR 1" as a line terminator, and your READ-LINE
does that.

- anton

Re: Handling unsupported line-endings

<sl7p4a$1je$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15075&group=comp.lang.forth#15075

From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Tue, 26 Oct 2021 13:28:27 +1100
Organization: Aioe.org NNTP Server
Message-ID: <sl7p4a$1je$1@gioia.aioe.org>
 by: dxforth - Tue, 26 Oct 2021 02:28 UTC

On 25/10/2021 17:56, Anton Ertl wrote:
> dxforth <dxforth@gmail.com> writes:
>>On 25/10/2021 03:55, Anton Ertl wrote:
>>> dxforth <dxforth@gmail.com> writes:
>>>>On 24/10/2021 18:03, Anton Ertl wrote:
>>>>> dxforth <dxforth@gmail.com> writes:
>>>>>>On 24/10/2021 03:54, Anton Ertl wrote:
>>>>>>> dxforth <dxforth@gmail.com> writes:
>>>>>>>>On 23/10/2021 23:48, Anton Ertl wrote:
>>>>>>>>> |When u1 = u2 the line terminator has yet to be reached.
>>>>>>>>
>>>>>>>>I see no "shall" in there - just an observation about overly long
>>>>>>>>lines. The previous sentence indicates a completed line may range
>>>>>>>>"0 <= u2 <= u1".
>>>>>>>
>>>>>>> The exact sentence is:
>>>>>>>
>>>>>>> |If a line terminator was received before u1 characters were read, then
>>>>>>> |u2 is the number of characters, not including the line terminator,
>>>>>>> |actually read (0 <= u2 <= u1).
>>>>>>>
>>>>>>> A little thinking produces the result that, not only does 0 <= u2 <=
>>>>>>> u1 hold in that case, but 0 <= u2 < u1 holds, too (because the
>>>>>>> terminator was received /before/ u1 characters were read, and the line
>>>>>>> terminator is not included).
>>>>>>
>>>>>>A little thinking will inform the purpose of specifying a 'u1 + 2' sized
>>>>>>buffer was to allow a completed line of 'u1' characters. If you can
>>>>>>interpret the remainder of ANS' specification in a way that contradicts
>>>>>>it, then there is something wrong either with your interpretation or
>>>>>>the spec.
>>>>>
>>>>> What is wrong with the specification IYO? I see nothing that is
>>>>> contradictory.
>>>>
>>>>You claim an entitlement that would render the specification contradictory.
>>>
>>> Which entitlement? What contradiction?
>>
>>Amnesia?
>>
>>ANS does not entitle a standard program to test for a completed line using
>>'u2<>u1'.
>
> It does, as discussed above.
>
>>It would deny an implementer the right to have a completed line of
>>u2=u1 chars - an entitlement given in the first paragraph of the specification.
>
> I find no such entitlement there.

"The line buffer provided by c-addr should be at least u1+2 characters long."

Entitles it.

>
>>"As Gforth shows, you can also do it completely without extra chars."
>>
>>is a complication nobody needs. You want Forth to be slower than C ?
>
> Forth's READ-LINE can be as fast as C's fgets(), if Forth does its own
> buffering. In that case you just don't copy the line terminator
> characters.
>
> While Gforth does not do this (it uses C's buffering through getc, and
> performance suffers from that), its READ-LINE is still the fastest
> one among the Forth systems posted here.

Good for you but let's not condemn everyone to using getc.

>
>>> Your current READ-LINE does not handle CR without following LF
>>> correctly.
>>
>>Since when does ANS require it? ANS permits implementers to choose
>>what line terminators to support and CR was low priority for me.
>
> It's acceptable to treat CR as a non-line-terminator. It is not
> acceptable to treat "CR 1" as a line terminator, and your READ-LINE
> does that.

A lone CR in a text file that uses CRLF line endings is not conventional.
There is no 'correct response' to such a situation, nor AFAIK did ANS
suggest one.

Re: Handling unsupported line-endings

<sl8472$13k6$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=15076&group=comp.lang.forth#15076

From: dxfo...@gmail.com (dxforth)
Newsgroups: comp.lang.forth
Subject: Re: Handling unsupported line-endings
Date: Tue, 26 Oct 2021 16:37:38 +1100
Organization: Aioe.org NNTP Server
Message-ID: <sl8472$13k6$1@gioia.aioe.org>
 by: dxforth - Tue, 26 Oct 2021 05:37 UTC

On 25/10/2021 14:49, dxforth wrote:
> ...
> You want a READ-LINE that handles all common line
> terminators and works within the confines of a 'u1' sized buffer?
> ISTM SwiftForth's READ-LINE will do that with the following source
> tweaks. Remove the '1+' and insert '1-' before EOL-SCANNER.

Correction. Amend the above to read:

Remove the '1+' and insert 'SWAP 1-' before MIN.
