devel / comp.lang.forth / Re: Updated String Parsing Words in kForth

Subject  Author
* Updated String Parsing Words in kForth  Krishna Myneni
+* Re: Updated String Parsing Words in kForth  P Falth
|`- Re: Updated String Parsing Words in kForth  Krishna Myneni
+* Re: Updated String Parsing Words in kForth  Marcel Hendrix
|`- Re: Updated String Parsing Words in kForth  Doug Hoffman
+* Re: Updated String Parsing Words in kForth  Anton Ertl
|+* Re: Updated String Parsing Words in kForth  Marcel Hendrix
||+* Re: Updated String Parsing Words in kForth  S Jack
|||+- Re: Updated String Parsing Words in kForth  minf...@arcor.de
|||`* Re: Updated String Parsing Words in kForth  Hans Bezemer
||| `* Re: Updated String Parsing Words in kForth  S Jack
|||  `- Re: Updated String Parsing Words in kForth  S Jack
||+* Re: Updated String Parsing Words in kForth  Krishna Myneni
|||+- Re: Updated String Parsing Words in kForth  Krishna Myneni
|||`* Re: Updated String Parsing Words in kForth  Anton Ertl
||| `- Re: Updated String Parsing Words in kForth  Krishna Myneni
||`- Re: Updated String Parsing Words in kForth  Anton Ertl
|`* Re: Updated String Parsing Words in kForth  Krishna Myneni
| `* Re: Updated String Parsing Words in kForth  Anton Ertl
|  `* Re: Updated String Parsing Words in kForth  Krishna Myneni
|   `* Re: Updated String Parsing Words in kForth  Krishna Myneni
|    +* Re: Updated String Parsing Words in kForth  Krishna Myneni
|    |`* Re: Updated String Parsing Words in kForth  Doug Hoffman
|    | `- Re: Updated String Parsing Words in kForth  Krishna Myneni
|    `* Re: Updated String Parsing Words in kForth  Krishna Myneni
|     `* Re: Updated String Parsing Words in kForth  NN
|      `* Re: Updated String Parsing Words in kForth  Krishna Myneni
|       `* Re: Updated String Parsing Words in kForth  Doug Hoffman
|        +* Re: Updated String Parsing Words in kForth  Hans Bezemer
|        |`- Re: Updated String Parsing Words in kForth  Doug Hoffman
|        +- Re: Updated String Parsing Words in kForth  Doug Hoffman
|        `* Re: Updated String Parsing Words in kForth  Krishna Myneni
|         +- Re: Updated String Parsing Words in kForth  Doug Hoffman
|         `* Re: Updated String Parsing Words in kForth  Doug Hoffman
|          `* Re: Updated String Parsing Words in kForth  Hans Bezemer
|           `* Re: Updated String Parsing Words in kForth  Doug Hoffman
|            `* Re: Updated String Parsing Words in kForth  Hans Bezemer
|             `- Re: Updated String Parsing Words in kForth  Hans Bezemer
+* Re: Updated String Parsing Words in kForth  NN
|+- Re: Updated String Parsing Words in kForth  dxforth
|`- Re: Updated String Parsing Words in kForth  Krishna Myneni
`* Re: Updated String Parsing Words in kForth  minf...@arcor.de
 `* Re: Updated String Parsing Words in kForth  Krishna Myneni
  `- Re: Updated String Parsing Words in kForth  Marcel Hendrix

Re: Updated String Parsing Words in kForth

<623705e5$0$695$14726298@news.sunsite.dk>

https://www.novabbs.com/devel/article-flat.php?id=17372&group=comp.lang.forth#17372
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!dotsrc.org!filter.dotsrc.org!news.dotsrc.org!not-for-mail
Date: Sun, 20 Mar 2022 06:45:54 -0400
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0)
Gecko/20100101 Thunderbird/91.7.0
Subject: Re: Updated String Parsing Words in kForth
Content-Language: en-US
Newsgroups: comp.lang.forth
References: <t0jcsp$9g4$1@dont-email.me>
<2022Mar13.103712@mips.complang.tuwien.ac.at> <t0lumk$8jf$1@dont-email.me>
<2022Mar18.095852@mips.complang.tuwien.ac.at> <t125t2$agt$1@dont-email.me>
<t14quq$2hu$1@dont-email.me> <t155th$9eu$1@dont-email.me>
From: dhoffman...@gmail.com (Doug Hoffman)
In-Reply-To: <t155th$9eu$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 52
Message-ID: <623705e5$0$695$14726298@news.sunsite.dk>
Organization: SunSITE.dk - Supporting Open source
NNTP-Posting-Host: b18da22c.news.sunsite.dk
X-Trace: 1647773157 news.sunsite.dk 695 glidedog@gmail.com/68.55.82.126:50349
X-Complaints-To: staff@sunsite.dk
 by: Doug Hoffman - Sun, 20 Mar 2022 10:45 UTC

Why not simply supply the file h2lines.dat so others can try different
parsing techniques without having to follow a non-trivial recipe to
first create it? Or perhaps I am missing something about the parsing
problem you are wanting to solve.

-Doug

On 3/19/22 2:03 PM, Krishna Myneni wrote:

>> The line list file, h2lines.dat, must be present in the directory. If
>> you wish to try the example program under your Forth (2012-compatible)
>> system, you will need a copy of the line list file. ...
>
> If you download the H2SPEC files, you may build h2spec with the
> following command, under Linux:
>
> $ gfortran -o h2spec h2spec.f
>
> Then, run h2spec to generate the line list, using input parameters shown
> below:
> ---
> $ ./h2spec
>
>  Compute the H2 spectrum with these conditions:
>
>  Rotational temperature (K) >> 300
>  Lower wavenumber (cm-1) >> 60000
>  Upper wavenumber (cm-1) >> 90000
>  FWHM (cm-1) >> 10
> $
> ---
> This will generate two files: the line list, h2lines.dat, and a spectrum
> file, h2vuv.dat. Although the parameters entered into the program are
> relevant only for the spectrum file, the design of the program does not
> allow generating the line list alone.
>
> ---
> $ ls -l h2lines.dat
> -rw-rw-r--. 1 krishna krishna 202182 Mar 19 12:52 h2lines.dat
> $ md5sum h2lines.dat
> dbb81b16961d0a71e28fece40129def2  h2lines.dat
> ---
>
> --
> Krishna

Re: Updated String Parsing Words in kForth

<t17ktl$4kt$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17375&group=comp.lang.forth#17375
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: krishna....@ccreweb.org (Krishna Myneni)
Newsgroups: comp.lang.forth
Subject: Re: Updated String Parsing Words in kForth
Date: Sun, 20 Mar 2022 11:31:49 -0500
Organization: A noiseless patient Spider
Lines: 15
Message-ID: <t17ktl$4kt$1@dont-email.me>
References: <t0jcsp$9g4$1@dont-email.me>
<2022Mar13.103712@mips.complang.tuwien.ac.at> <t0lumk$8jf$1@dont-email.me>
<2022Mar18.095852@mips.complang.tuwien.ac.at> <t125t2$agt$1@dont-email.me>
<t14quq$2hu$1@dont-email.me> <t155th$9eu$1@dont-email.me>
<623705e5$0$695$14726298@news.sunsite.dk>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 20 Mar 2022 16:31:50 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="3f01fcf7a502ef362a8c5c3c7c632c22";
logging-data="4765"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+hfoIAsGNBUMgPOsgfZTvS"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.7.0
Cancel-Lock: sha1:MHitBsmfozGXgFi+RJfeX9qtlk4=
In-Reply-To: <623705e5$0$695$14726298@news.sunsite.dk>
Content-Language: en-US
 by: Krishna Myneni - Sun, 20 Mar 2022 16:31 UTC

On 3/20/22 05:45, Doug Hoffman wrote:
> Why not simply supply the file h2lines.dat so others can try different
> parsing techniques without having to follow a non-trivial recipe to
> first create it? Or perhaps I am missing something about the parsing
> problem you are wanting to solve.
>

Just to avoid cluttering the kForth repo. However, since you are
interested in parsing the file, I added it here:

https://github.com/mynenik/kForth-64/blob/master/forth-src/h2lines.dat

Cheers,
Krishna

Re: Updated String Parsing Words in kForth

<t17p41$ntj$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17377&group=comp.lang.forth#17377
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: krishna....@ccreweb.org (Krishna Myneni)
Newsgroups: comp.lang.forth
Subject: Re: Updated String Parsing Words in kForth
Date: Sun, 20 Mar 2022 12:43:29 -0500
Organization: A noiseless patient Spider
Lines: 22
Message-ID: <t17p41$ntj$1@dont-email.me>
References: <t0jcsp$9g4$1@dont-email.me>
<2022Mar13.103712@mips.complang.tuwien.ac.at> <t0lumk$8jf$1@dont-email.me>
<2022Mar18.095852@mips.complang.tuwien.ac.at> <t125t2$agt$1@dont-email.me>
<t14quq$2hu$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 20 Mar 2022 17:43:29 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="3f01fcf7a502ef362a8c5c3c7c632c22";
logging-data="24499"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+2EMVczRwJfQGLUTL/202O"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.7.0
Cancel-Lock: sha1:+2iu6/TjbwIk6+ySan4QW5RY/bY=
In-Reply-To: <t14quq$2hu$1@dont-email.me>
Content-Language: en-US
 by: Krishna Myneni - Sun, 20 Mar 2022 17:43 UTC

On 3/19/22 09:56, Krishna Myneni wrote:
...
> https://github.com/mynenik/kForth-64/blob/master/forth-src/parse-h2lines.4th
>
...
Minor revisions were made to the above file. I don't know why I used an
unaligned field in the structure, SpectralLine%, for the floating point
data. The conditional definition of +FFIELD may be removed.

For simplicity, it should be

BEGIN-STRUCTURE SpectralLine%
field: LineIndex
ffield: Wavelength \ Angstroms
ffield: Wavenumber \ frequency in cm^-1
ffield: Aul \ line transition rate in s^-1
EnergyLevel% +field LowerLevel
EnergyLevel% +field UpperLevel
END-STRUCTURE
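
As a rough illustration of why the plain ffield: form is enough: in Forth 2012, FFIELD: aligns the field offset for floating-point access, so no manual padding is needed. The sketch below uses a cut-down, hypothetical structure (Demo%, d.index, d.wavelength and rec are made-up names) rather than SpectralLine%, since EnergyLevel% is not shown here. It assumes a case-insensitive Forth 2012 system with the Floating-Point word set.

```forth
\ Hypothetical cut-down structure; all names are illustrative only.
BEGIN-STRUCTURE Demo%
  field:  d.index        \ one cell
  ffield: d.wavelength   \ FFIELD: float-aligns this offset
END-STRUCTURE

\ Float-aligned storage for one record.
falign here Demo% allot constant rec

1 rec d.index !
1108.127e0 rec d.wavelength f!

rec d.index @ .            \ prints the stored index
rec d.wavelength f@ f.     \ prints the stored wavelength
```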

KM

Re: Updated String Parsing Words in kForth

<1de42645-1f88-4cd6-8692-abcb0db5841bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17406&group=comp.lang.forth#17406
X-Received: by 2002:a05:620a:21d4:b0:67d:6a35:5dff with SMTP id h20-20020a05620a21d400b0067d6a355dffmr16759466qka.747.1648488249747;
Mon, 28 Mar 2022 10:24:09 -0700 (PDT)
X-Received: by 2002:ac8:598e:0:b0:2e2:3243:1e52 with SMTP id
e14-20020ac8598e000000b002e232431e52mr23180273qte.369.1648488249252; Mon, 28
Mar 2022 10:24:09 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!2.eu.feeder.erje.net!feeder.erje.net!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Mon, 28 Mar 2022 10:24:09 -0700 (PDT)
In-Reply-To: <t17p41$ntj$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=92.40.200.220; posting-account=9A5f7goAAAD_QfJPZnlK3Xq_UhzYjdP-
NNTP-Posting-Host: 92.40.200.220
References: <t0jcsp$9g4$1@dont-email.me> <2022Mar13.103712@mips.complang.tuwien.ac.at>
<t0lumk$8jf$1@dont-email.me> <2022Mar18.095852@mips.complang.tuwien.ac.at>
<t125t2$agt$1@dont-email.me> <t14quq$2hu$1@dont-email.me> <t17p41$ntj$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1de42645-1f88-4cd6-8692-abcb0db5841bn@googlegroups.com>
Subject: Re: Updated String Parsing Words in kForth
From: november...@gmail.com (NN)
Injection-Date: Mon, 28 Mar 2022 17:24:09 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 214
 by: NN - Mon, 28 Mar 2022 17:24 UTC

Hi Krishna,

Since no one has posted anything, here's my take on the parsing.

The first half sets up the list helper functions. These should be obvious to anyone familiar with Lisp.
I also like to have extra stacks handy, here in the form of s1 & s2.

I read the file into memory.
I pick out the lines and store these as strings.
Since we now have the lines, the file buffer isn't needed and its memory is released.
For each line, I pick out the space-separated tokens, which are stored as symbols.
The symbols are collected into lists, and the lists are stored on stack s2.

(NB: I have tagged the atoms as symbols and strings.)

I used 2variable rather than 2value because gforth didn't appear to have 2value.
I have found src, in>, eol?, rch, nch, pch, bl?, skipp, scann and token to be sufficient for most purposes.

If someone does run the code below:
s1 should be empty
s2 will have the lines as lists
You can inspect them using s1. & s2.
s2> will pop a line at a time, but it will be in list format.

Although you have given the structure of the data in the previous post, I have left the data as it is
because I just wanted the code to be general. Since you have written about recognisers, they could
be the next step in identifying ints, reals and labels.
be the next step in identifying ints, reals and labels.

\\ -------------------------------------------------------------------

only forth
vocabulary testlex
also testlex definitions

: cn ( y x -- p ) 2 cells allocate throw dup >r 2! r> ;
: hd ( p -- x ) 2@ nip ;
: tl ( p -- y ) 2@ drop ;

: -cn ( p -- ) free throw ;

: !hd ( x p -- ) dup >r tl swap r> 2! ;
: !tl ( y p -- ) dup >r hd r> 2! ;

: swons ( x y -- p ) swap cn ;
: lst ( x --[x] ) 0 swons ;

: tag ( a -- n ) 3 and ;
: null ( a -- f ) [ -1 3 xor ] literal and 0= ;
: atom ( a -- f ) tag 0> ;
: eq ( y x -- f ) = ;

: ?sym ( a -- f ) tag 1 = ;
: ?str ( a -- f ) tag 2 = ;

: concat ( l1 l2 -- l3 )
over null if nip else
over begin dup tl null 0= while tl repeat !tl
then ;

: reverse ( l1 -- l2 )
dup null 0= if
0 swap begin dup null 0= while 2@ rot swons swap repeat drop
then ;

: len ( l -- n )
0 swap begin dup null 0= while swap 1+ swap tl repeat drop ;

: nth ( l i -- x ) \ i is 1-based
dup 1 < abort" ERR: Index outside range"
2dup swap len > abort" ERR: Index outside range"
begin 1- dup while swap tl swap repeat drop hd ;

: cp$ ( a u -- b )
dup 1+ dup 255 > abort" ERR: String longer than expected"
allocate throw dup >r place r> ;

: esym ( a u -- b ) 2dup s" nil" compare 0= if 2drop 0 else cp$ then 1+ ;
: dsym ( b -- a u ) dup null if drop s" nil" else 1- count then ;

: estr ( a u -- b ) ?dup if cp$ else drop 0 then 2 + ;
: dstr ( b -- a u ) dup null if drop s" " else 2 - count then ;

defer pr

: prlst1 ( a -- ) begin 2@ space pr dup null until drop ;

: prlst ( a -- )
." ("
dup null if drop else
2@ pr dup null if drop else prlst1 then
then
." )" ;

: prstr ( a -- ) '"' emit dstr type '"' emit ;
: prsym ( a -- ) dsym type ;

create _prtbl
' prlst ,
' prsym ,
' prstr ,

:noname ( a -- ) dup tag cells _prtbl + perform ; is pr

: ?stk ( s -- f ) null ;
: push ( n s -- s ) swons ;
: pop ( s -- n s )
dup ?stk abort" ERR: Stack empty "
dup 2@ swap rot -cn ;
: peek ( s -- n )
dup ?stk abort" ERR: Stack empty " 2@ nip ;
: empty-stk ( s -- s' ) begin dup ?stk 0= while pop nip repeat ;

variable s1

: ?s1 ( -- f ) s1 @ ?stk ;
: >s1 ( n -- ) s1 @ push s1 ! ;
: s1> ( -- n ) s1 @ pop s1 ! ;
: s1@ ( -- n ) s1 @ peek ;
: s1. ( -- ) s1 @ pr ;

variable s2

: ?s2 ( -- f ) s2 @ ?stk ;
: >s2 ( n -- ) s2 @ push s2 ! ;
: s2> ( -- n ) s2 @ pop s2 ! ;
: s2@ ( -- n ) s2 @ peek ;
: s2. ( -- ) s2 @ pr ;

: s1->s2 ( -- ) begin ?s1 0= while s1> >s2 repeat ;
: s2->s1 ( -- ) begin ?s2 0= while s2> >s1 repeat ;

: empty-s1 ( -- ) s1 @ empty-stk s1 ! ;
: empty-s2 ( -- ) s2 @ empty-stk s2 ! ;

2variable src
variable in>

: eol? ( -- f ) src 2@ nip in> @ > 0= ;
: rch ( -- c ) src 2@ drop in> @ + c@ ;
: nch ( -- ) 1 in> +! ;
: pch ( -- ) -1 in> +! ;

( tab or space )
: bl? ( c -- f )
dup 9 = swap bl = or ;

\ ( whitespace )
\ : bl? ( c -- f )
\ dup 9 =
\ over 10 = or
\ over 13 = or
\ swap bl = or ;

\ ( all values 0..32 )
\ : bl? ( c -- f ) bl 1+ u< ;

: skipp ( c -- )
>r begin eol? 0= while rch r@ = while nch repeat then r> drop ;

: scann ( c -- )
>r begin eol? 0= while rch r@ <> while nch repeat then r> drop ;

: token ( c -- a u )
>r
src 2@ drop in> @ ( a1 u1 )
r> scann
in> @ ( a1 u1 u2 )
over - >r + r> nch ;

: (slurp)
r/o open-file throw >r
r@ file-size throw d>s
dup allocate throw dup rot
r@ read-file throw
r> close-file throw ;

: slurp ( name -- a n )
['] (slurp) catch if
-1 abort" ERR: error occurred reading file" then ;

: rdlines ( -- )
empty-s1
begin eol? 0= while
10 token estr >s1
repeat ;

( read file into memory )
s" h2lines.dat" slurp src 2! 0 in> !

( break up into lines )
rdlines

( release mem )
src 2@ drop -cn

: parse-line ( -- l )
0 begin eol? 0= while
bl token ?dup if esym cn else drop then
repeat reverse ;
: parse-lines
begin ?s1 0= while
s1> dstr src 2! 0 in> !
parse-line >s2
repeat ;

parse-lines

Re: Updated String Parsing Words in kForth

<t202km$bjl$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17411&group=comp.lang.forth#17411
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: krishna....@ccreweb.org (Krishna Myneni)
Newsgroups: comp.lang.forth
Subject: Re: Updated String Parsing Words in kForth
Date: Tue, 29 Mar 2022 17:53:08 -0500
Organization: A noiseless patient Spider
Lines: 101
Message-ID: <t202km$bjl$1@dont-email.me>
References: <t0jcsp$9g4$1@dont-email.me>
<2022Mar13.103712@mips.complang.tuwien.ac.at> <t0lumk$8jf$1@dont-email.me>
<2022Mar18.095852@mips.complang.tuwien.ac.at> <t125t2$agt$1@dont-email.me>
<t14quq$2hu$1@dont-email.me> <t17p41$ntj$1@dont-email.me>
<1de42645-1f88-4cd6-8692-abcb0db5841bn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 29 Mar 2022 22:53:10 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="ec2c86ec4e29ab1baaa4d2022dfc110c";
logging-data="11893"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX196AA+uANaByFIv+XDGpCtQ"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.7.0
Cancel-Lock: sha1:jpMpyDj1Bw3Nc24S3rpmDUp5WTQ=
In-Reply-To: <1de42645-1f88-4cd6-8692-abcb0db5841bn@googlegroups.com>
Content-Language: en-US
 by: Krishna Myneni - Tue, 29 Mar 2022 22:53 UTC

On 3/28/22 12:24, NN wrote:
> Hi Krishna,
>
> Since no one has posted anything, heres my take on the parsing.
>
> The first half is setting up the list helper fns. These should be obvious to anyone familiar with lisp.
> I also like to have extra stacks handy which are in the form of s1 & s2.
>
> I read the file into memory.

This usually will work on desktops, but not in memory-constrained
systems, so one should plan to be able to read the file, line by line.
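
A minimal sketch of the line-by-line approach, assuming a Forth 2012 system with the File-Access word set. The names MAX-LINE, line-buf, nlines, handle-line and each-line are hypothetical, and handle-line is only a placeholder where the real field parsing and storage would go. MAX-LINE is an assumed bound: a longer line is delivered by READ-LINE in several pieces, so it must be raised for files with long rows.

```forth
256 constant MAX-LINE                      \ assumed upper bound on line length
create line-buf MAX-LINE 2 + chars allot   \ READ-LINE needs 2 extra chars
variable nlines

\ Placeholder: a real version would parse and store the fields here.
: handle-line ( a u -- )  2drop 1 nlines +! ;

: each-line ( a u -- )   \ a u is the file name
  r/o open-file throw >r
  0 nlines !
  begin
    line-buf MAX-LINE r@ read-line throw   ( u2 flag )
  while
    line-buf swap handle-line              \ one line at a time
  repeat
  drop                                     \ discard u2 from the final read
  r> close-file throw ;

s" h2lines.dat" each-line  nlines @ .      \ prints the number of lines read
```

Only one line is held in memory at any time, so the memory needed does not grow with the size of the file.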

> Pick out the lines and store these as strings.
> Since we now have lines the file isnt needed and the memory released.
> For each line, pick out space separated tokens which are stored as symbols.

Yes, that's easy to do with substrings/lexemes separated by space(s),
but we want a general framework for arbitrary delimiters, e.g. comma
separated values exported from a spreadsheet. The individual field
parsing and conversion words should not be coupled so tightly to the
line parser that they are not generally useful.

For example, consider a single line (row) from an exported CSV file of
COVID-19 daily case data by country, covering more than two years (this
file is publicly available). Each line has the following fields:

1. Province/State
2. Country/Region
3. Latitude
4. Longitude
5. # of cases for Date1
6. # of cases for Date2
7. ...
...

The fields are separated by commas. Complications include:

-- The Province/State field may be missing, so the line starts with
",[country/region name],...", e.g.

,Argentina,-38.4161,-63.6167,...

-- The first two fields can be strings which include spaces and also
commas. When commas are present in the string, the entire string is
enclosed in quotes, e.g.

"Saint Helena, Ascension and Tristan da Cunha",United Kingdom,-7.9467,-14.3559, ...

Note that the quotes in the above line are part of the data in the first
field. The field parser/converter for the first two fields must be
capable of handling such complications, so it will not be quite so simple.
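
A hedged sketch of one way to split off such a field, using only standard words (Core, Core-ext and String word sets). All word names here (find-char, split-at, skip1, next-csv-field) are hypothetical and this is not the kForth implementation. It strips the enclosing quotes, does not handle doubled quotes inside a quoted field, and cannot tell a trailing empty field from the end of the line.

```forth
: find-char ( a u c -- a' u' )   \ advance to the first c; u'=0 if none
  >r
  begin dup while                \ characters left?
    over c@ r@ <> while          \ current char is not c?
    1 /string
  repeat then
  r> drop ;

: split-at ( a u a' u' -- a' u' a utok )   \ a' u' is a tail of a u
  2swap 2 pick - ;

: skip1 ( a u -- a' u' )  dup if 1 /string then ;

: next-csv-field ( a u -- arem urem afld ufld )
  dup if over c@ [char] " = else false then
  if                                      \ quoted field
    1 /string                             \ drop the opening quote
    2dup [char] " find-char split-at      ( aq uq afld ufld )
    2swap skip1 skip1 2swap               \ skip closing quote and comma
  else                                    \ plain field
    2dup [char] , find-char split-at      ( ac uc afld ufld )
    2swap skip1 2swap                     \ skip the comma
  then ;

s\" \"Saint Helena, Ascension and Tristan da Cunha\",United Kingdom,-7.9467"
next-csv-field cr type      \ first field, without the quotes
next-csv-field cr type      \ second field
2drop
```

A line that starts with a comma, such as the Argentina example, yields a zero-length first field, so a missing Province/State needs no special case.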

> The symbols are collected into lists and lists stored on stack s2.
>

Pre-parsing the fields and keeping them as strings for later conversion
and storage is fine. But the conversion/storage step should be
accomplished before proceeding to the next line, to avoid overflowing
stacks. Also, keeping the substrings within a list is not going to scale
well for a large amount of data. It is handy to be able to examine the
substrings but not always practical.

> (NB: I have tagged the atoms into symbols and strings. )
>
> I used 2variable rather than 2value because gforth didnt appear to have 2value.
> I have found : src in> eol? rch nch pch bl? skipp scann and token to be sufficient for most purposes.
>
>
> if someone does run the code below.
> s1 should be empty
> s2 will have the lines as lists.
> you can inspect using s1. & s2.
> s2> will pop a line at a time but it will be in a list format.
>
> Although you have given the structure of the data in the previous post, I have left it as it is
> because I just wanted it to be a general. Since you have written about recognisers they could
> be the next step in identifying ints, reals and labels.
>

I haven't tested your code yet, but thanks for providing it. I will
check it out.

I'm working on the COVID-19 case data example within the PARSE-STRING
framework I proposed and used for a complete demonstration of reading
and storing the h2lines.dat data field. Also, I now think it may be good
to change the name of PARSE-STRING to STRING-PARSE. Words prefaced with
"PARSE" should probably be reserved for those words which parse from the
input stream, to avoid confusion.

The csv data file may be found here:

https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv

The first line contains headings for the fields.

--
Krishna

Re: Updated String Parsing Words in kForth

<b487fc38-26e8-474f-849c-afd59fc3e284n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17413&group=comp.lang.forth#17413
X-Received: by 2002:a05:620a:d87:b0:67b:311c:ecbd with SMTP id q7-20020a05620a0d8700b0067b311cecbdmr22970141qkl.146.1648622700472;
Tue, 29 Mar 2022 23:45:00 -0700 (PDT)
X-Received: by 2002:a05:620a:3cc:b0:67b:e77:6f21 with SMTP id
r12-20020a05620a03cc00b0067b0e776f21mr22991053qkm.272.1648622700318; Tue, 29
Mar 2022 23:45:00 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Tue, 29 Mar 2022 23:45:00 -0700 (PDT)
In-Reply-To: <t0jcsp$9g4$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=79.224.111.239; posting-account=AqNUYgoAAADmkK2pN-RKms8sww57W0Iw
NNTP-Posting-Host: 79.224.111.239
References: <t0jcsp$9g4$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b487fc38-26e8-474f-849c-afd59fc3e284n@googlegroups.com>
Subject: Re: Updated String Parsing Words in kForth
From: minfo...@arcor.de (minf...@arcor.de)
Injection-Date: Wed, 30 Mar 2022 06:45:00 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 79
 by: minf...@arcor.de - Wed, 30 Mar 2022 06:45 UTC

Krishna Myneni wrote on Sunday, 13 March 2022 at 01:12:12 UTC+1:
> I am overhauling the string parsing words in the kForth String Words
> library (strings.4th). The existing words are not named as clearly as
> they should be, and have inefficient implementations:
>
> \ PARSE_TOKEN ( a u -- arem urem atok utok )
> \ PARSE_LINE ( a u -- a1 u1 a2 u2 ... an un n )
> \ PARSE_ARGS ( a u -- n ) ( F: r1 ... rn )
> \ PARSE_CSV ( a u -- n ) ( F: r1 ... rn )
>
> PARSE_TOKEN skips leading spaces and parses the next blank(s)-delimited
> substring, atok utok, and also returns the remaining portion of the string.
>
> PARSE_LINE applies PARSE_TOKEN repeatedly to place the token strings and
> token count on the stack.
>
> PARSE_ARGS parses a string with blank delimited numbers, converting each
> substring to a floating point number, returning the n floating point
> numbers and the count.
>
> PARSE_CSV is a hack which replaces each comma in the string with a space
> and performs the same function as PARSE_ARGS.
>
> These words need clearer names and an upgrade (particularly PARSE_CSV).
> My proposed replacements are
>
>
> \ NEXT-BS-TOKEN ( a u -- atok utok arem urem )
> \ NEXT-CS-TOKEN ( a u -- atok utok arem urem )
> \ PARSED-BSV ( a u -- a1 u1 a2 u2 ... an un n )
> \ PARSED-CSV ( a u -- a1 u1 a2 u2 ... an un n )
> \ ITH-PARSED ( a1 u1 ... an un n i -- a1 u1 ... an un n ai ui )
> \ DROP-PARSED ( a1 u1 ... an un n -- )
> \ PARSED>FLOATS ( a1 u1 ... an un n -- n ) ( F: r1 ... rn )
>
> NEXT-BS-TOKEN is a slightly more efficient version of PARSE_TOKEN, but
> leaves the token and remaining strings in a swapped order. The name
> indicates that it is parsing the next "blank(s)-separated" token in the
> string.
>
> NEXT-CS-TOKEN parses the next "comma-separated" token in the string and
> returns the token and remaining substrings.
>
> PARSED-BSV starts with an input string and repeatedly applies
> NEXT-BS-TOKEN until the remaining string is null and returns all of the
> substrings and the token count on the stack.
>
> PARSED-CSV starts with an input string and repeatedly applies
> NEXT-CS-TOKEN until the remaining string is null. It returns all of the
> substrings and the token count on the stack. Unlike PARSED-BSV, each
> comma delimiter will mark a separate token string, e.g. if the input
> string is ",," there will be three token strings, each of length zero.
>
> ITH-PARSED provides a way to PICK the ith substring returned by
> PARSED-BSV and PARSED-CSV.
>
> DROP-PARSED may be used to discard the n token substrings returned by
> PARSED-BSV and PARSED-CSV.
>
> PARSED>FLOATS converts each of the token substrings returned by
> PARSED-BSV or PARSED-CSV to n floating point numbers. If the string to
> float conversion fails, the floating point value NAN will be returned.
> If a substring has zero length, the corresponding fp value will also be NAN.
>
> Comments or suggestions?

How much would it simplify the task to just use scanf?
If a C compiler is not available, scanf can still be called from an OS library.
IIRC there are also similar Forth implementations around.
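
For reference, the stack layout quoted above (a1 u1 ... an un n) can be indexed with PICK. The following is only a sketch with a hypothetical name, ith-token, and an assumed 1-based index; it is not kForth's ITH-PARSED, whose indexing convention is not stated here.

```forth
\ Hypothetical helper, 1-based index; requires 1 <= i <= n.
: ith-token ( a1 u1 ... an un n i -- a1 u1 ... an un n ai ui )
  over swap - 2*      \ k = 2*(n-i): cells between token i and token n
  dup 3 + pick        \ fetch ai
  swap 2 + pick ;     \ fetch ui

s" alpha" s" beta" 2   \ two tokens and their count
1 ith-token type       \ types the first token
```

The tokens and the count stay on the stack afterwards, so a matching word is still needed to discard them.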

Re: Updated String Parsing Words in kForth

<62442315$0$699$14726298@news.sunsite.dk>

https://www.novabbs.com/devel/article-flat.php?id=17415&group=comp.lang.forth#17415
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!dotsrc.org!filter.dotsrc.org!news.dotsrc.org!not-for-mail
Date: Wed, 30 Mar 2022 05:29:53 -0400
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0)
Gecko/20100101 Thunderbird/91.7.0
Subject: Re: Updated String Parsing Words in kForth
Content-Language: en-US
Newsgroups: comp.lang.forth
References: <t0jcsp$9g4$1@dont-email.me>
<2022Mar13.103712@mips.complang.tuwien.ac.at> <t0lumk$8jf$1@dont-email.me>
<2022Mar18.095852@mips.complang.tuwien.ac.at> <t125t2$agt$1@dont-email.me>
<t14quq$2hu$1@dont-email.me> <t17p41$ntj$1@dont-email.me>
<1de42645-1f88-4cd6-8692-abcb0db5841bn@googlegroups.com>
<t202km$bjl$1@dont-email.me>
From: dhoffman...@gmail.com (Doug Hoffman)
In-Reply-To: <t202km$bjl$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 203
Message-ID: <62442315$0$699$14726298@news.sunsite.dk>
Organization: SunSITE.dk - Supporting Open source
NNTP-Posting-Host: 3662b546.news.sunsite.dk
X-Trace: 1648632597 news.sunsite.dk 699 glidedog@gmail.com/68.55.82.126:49540
X-Complaints-To: staff@sunsite.dk
 by: Doug Hoffman - Wed, 30 Mar 2022 09:29 UTC

include FMS2VT.f

0 value records

: .line ( n -- ) 1- records :at { a }
10 set-precision
a :size 0 ?do cr i dup . a :at dup .class :. loop ;

: process-line { a -- }
a :size 0 ?do i a :at
dup :@ >integer
if >int i a :to <free
else dup :@ >float
if >flt i a :to <free
else drop
then
then
loop a records :add ;

: process-file ( fileObj n c -- ) >r
over :!line-len
>array to records
begin
dup :each
while
r@ swap :split process-line
repeat
<free r> drop ; \ free the file

timer-reset
\ s" h2lines.dat" >file 500 bl process-file
\ s" time_series_covid19_confirmed_global.csv"
\ >file 10000 ',' process-file
.elapsed

\ h2lines.dat
\ => 40 ms

records :size . \ => 3261 ok

1 .line \ => output follows
0 int 1
1 flt 1108.127000
2 flt 90242.35200
3 flt 3060000.000
4 string X
5 int 0
6 int 0
7 string P
8 string B
9 int 0
10 int 1
11 string P ok

records <freeAll

\ time_series_covid19_confirmed_global.csv
\ => 190 ms

records :size . \ => 285 ok
1 .line
0 string Province/State
1 string Country/Region
2 string Lat
3 string Long
4 string 1/22/20
5 string 1/23/20
6 string 1/24/20
...
799 string 3/27/22
800 string 3/28/22
801 string 3/29/22 ok

2 .line
0 string Afghanistan
1 flt 33.93911000
2 flt 67.70995300
3 int 0
4 int 0
5 int 0
...
550 int 143183
551 int 143439
552 int 143666
...
799 int 177602
800 int 177658 ok

239 .line
0 string South Sudan
1 flt 6.877000000
2 flt 31.30700000
3 int 0
...

262 .line
0 string Anguilla
1 string United Kingdom
2 flt 18.22060000
3 flt -63.06860000
4 int 0
5 int 0
...
801 int 2700 ok

285 .line
0 string Zimbabwe
1 flt -19.01543800
2 flt 29.15485700
3 int 0
4 int 0
5 int 0
...
799 int 245927
800 int 246042 ok

On 3/29/22 6:53 PM, Krishna Myneni wrote:
> On 3/28/22 12:24, NN wrote:

>> I read the file into memory.
>
> This usually will work on desktops, but not in memory-constrained
> systems, so one should plan to be able to read the file, line by line.

My meager 7-year-old notebook with 16 GB of RAM was plenty to hold the
entire file data in a heap array. But the program could easily be
changed to process just one line at a time; the flow and logic would
remain basically unchanged.

All fields were either strings, integers, or floats. I used the generic
:. message to print each. For custom output use :@ instead which gives
caddr len for strings, n for integers, and F: r for floats.
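
The integer / float / string classification can also be sketched with standard words only. This is not the FMS code used above: integer? and .field are hypothetical names, BASE is assumed to be decimal, and the Floating-Point word set is assumed. The integer test comes first because >FLOAT also accepts strings such as "1". Some systems let >FLOAT treat an empty or blank string as zero, so empty fields may need a separate check.

```forth
: integer? ( a u -- n true | false )
  dup 0= if 2drop false exit then          \ empty string
  over c@ [char] - = dup >r                \ remember a leading minus sign
  if 1 /string then
  dup 0= if 2drop r> drop false exit then  \ a lone minus sign is not a number
  0 0 2swap >number                        ( ud a' u' )
  nip if 2drop r> drop false exit then     \ unconverted characters remain
  d>s r> if negate then true ;

: .field ( a u -- )   \ print the field with its detected type
  2dup integer? if  ." int " . 2drop exit then
  2dup >float   if  ." flt " f. 2drop exit then
  ." string " type ;

s" 1108.127" .field cr
s" 177658"   .field cr
s" 1/22/20"  .field cr
```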

>> Pick out the lines and store these as strings.
>> Since we now have lines the file isnt needed and the memory released.
>> For each line, pick out space separated tokens which are stored as
>> symbols.
>
> Yes, that's easy to do with substrings/lexemes separated by space(s),
> but we want a general framework for any arbitrary delimiters, e.g. comma
> separated values, exported from a spreadsheet. The individual field
> parsing and conversion words should not be so tightly coupled to the
> line parser such that it is not generally useful.

Agreed. I use a :split function that takes a string and a char. That was
enough to handle your two files.

> For example, consider a single line (row) from an exported CSV file of
> COVID-19 daily case data by country, covering more than two years (this
> file is publicly available). Each line has the following fields:
>
> 1. Province/State
> 2. Country/Region
> 3. Latitude
> 4. Longitude
> 5. # of cases for Date1
> 6. # of cases for Date2
> 7. ...
> ...
>
> The fields are separated by commas. Complications include:
>
> -- The Province/State field may be missing, so the line starts with
> ",[country/region name],...", e.g.
>
> ,Argentina,-38.4161,-63.6167,...
>
> -- The first two fields can be strings which include spaces and also
> commas. When commas are present in the string, the entire string is
> enclosed in quotes, e.g.
>
> "Saint Helena, Ascension and Tristan da Cunha",United Kingdom,-7.9467,-14.3559, ...
>
> Note that the quotes in the above line are part of the data in the first
> field. The field parser/converter for the first two fields must be
> capable of handling such complications, so it will not be quite so simple.

Your csv file did not have any "XX YY", (quotes within fields) so no
special handling was needed. If there were, it would be simple to first
strip the '"' chars from the string using the string :replall function.
Or a different strategy could be used if needed.

> ... keeping the substrings within a list is not going to scale
> well for a large amount of data.

It did for me for your two examples. But as I said it would be fairly
simple to modify the above program to read in just one line at a time.
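
A rough sketch of the one-line-at-a-time variant, in Python for neutrality (the names `records` and `sample` are only illustrative). Only the current line is held in memory, and the caller decides what to keep.

```python
import io


def records(lines, delimiter=","):
    """Yield one list of fields per input line, holding only the current line in memory."""
    for line in lines:
        yield line.rstrip("\r\n").split(delimiter)


# io.StringIO stands in for an open file here; a real file object iterates
# line by line in the same way.
sample = io.StringIO("a,b,c\n1,2,3\n")
for fields in records(sample):
    print(fields)
```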

-Doug

> The csv data file may be found here:
>
> https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
>
>
> The first line contains headings for the fields.

Re: Updated String Parsing Words in kForth

<c15460a1-c651-4eef-9fbb-2ec02d366f3dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17423&group=comp.lang.forth#17423

 by: Hans Bezemer - Thu, 31 Mar 2022 10:12 UTC

On Wednesday, March 30, 2022 at 11:30:00 AM UTC+2, Doug Hoffman wrote:
> > https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
> >
> >
> > The first line contains headings for the fields.

I had absolutely no trouble at all converting it to a 4tH table using 4tH's "csv2xls.4th" (https://sourceforge.net/p/forth-4th/code/HEAD/tree/trunk/4th.src/csv2xls.4th) - after I had adjusted it to the insane line length of 8192 (note: this is JUST a header and ONE SINGLE data line!):

---8<---
include lib/constant.4th

create covid
," Province/State" ," Country/Region" ," Lat" ," Long" ," 1/22/20" ," 1/23/20" ," 1/24/20" ," 1/25/20" ," 1/26/20" ," 1/27/20" ," 1/28/20" ," 1/29/20" ," 1/30/20" ," 1/31/20" ," 2/1/20" ," 2/2/20" ," 2/3/20" ," 2/4/20" ," 2/5/20" ," 2/6/20" ," 2/7/20" ," 2/8/20" ," 2/9/20" ," 2/10/20" ," 2/11/20" ," 2/12/20" ," 2/13/20" ," 2/14/20" ," 2/15/20" ," 2/16/20" ," 2/17/20" ," 2/18/20" ," 2/19/20" ," 2/20/20" ," 2/21/20" ," 2/22/20" ," 2/23/20" ," 2/24/20" ," 2/25/20" ," 2/26/20" ," 2/27/20" ," 2/28/20" ," 2/29/20" ," 3/1/20" ," 3/2/20" ," 3/3/20" ," 3/4/20" ," 3/5/20" ," 3/6/20" ," 3/7/20" ," 3/8/20" ," 3/9/20" ," 3/10/20" ," 3/11/20" ," 3/12/20" ," 3/13/20" ," 3/14/20" ," 3/15/20" ," 3/16/20" ," 3/17/20" ," 3/18/20" ," 3/19/20" ," 3/20/20" ," 3/21/20" ," 3/22/20" ," 3/23/20" ," 3/24/20" ," 3/25/20" ," 3/26/20" ," 3/27/20" ," 3/28/20" ," 3/29/20" ," 3/30/20" ," 3/31/20" ," 4/1/20" ," 4/2/20" ," 4/3/20" ," 4/4/20" ," 4/5/20" ," 4/6/20" ," 4/7/20" ," 4/8/20" ," 4/9/20" ," 4/10/20" ," 4/11/20" ," 4/12/20" ," 4/13/20" ," 4/14/20" ," 4/15/20" ," 4/16/20" ," 4/17/20" ," 4/18/20" ," 4/19/20" ," 4/20/20" ," 4/21/20" ," 4/22/20" ," 4/23/20" ," 4/24/20" ," 4/25/20" ," 4/26/20" ," 4/27/20" ," 4/28/20" ," 4/29/20" ," 4/30/20" ," 5/1/20" ," 5/2/20" ," 5/3/20" ," 5/4/20" ," 5/5/20" ," 5/6/20" ," 5/7/20" ," 5/8/20" ," 5/9/20" ," 5/10/20" ," 5/11/20" ," 5/12/20" ," 5/13/20" ," 5/14/20" ," 5/15/20" ," 5/16/20" ," 5/17/20" ," 5/18/20" ," 5/19/20" ," 5/20/20" ," 5/21/20" ," 5/22/20" ," 5/23/20" ," 5/24/20" ," 5/25/20" ," 5/26/20" ," 5/27/20" ," 5/28/20" ," 5/29/20" ," 5/30/20" ," 5/31/20" ," 6/1/20" ," 6/2/20" ," 6/3/20" ," 6/4/20" ," 6/5/20" ," 6/6/20" ," 6/7/20" ," 6/8/20" ," 6/9/20" ," 6/10/20" ," 6/11/20" ," 6/12/20" ," 6/13/20" ," 6/14/20" ," 6/15/20" ," 6/16/20" ," 6/17/20" ," 6/18/20" ," 6/19/20" ," 6/20/20" ," 6/21/20" ," 6/22/20" ," 6/23/20" ," 6/24/20" ," 6/25/20" ," 6/26/20" ," 6/27/20" ," 6/28/20" ," 6/29/20" ," 6/30/20" ," 7/1/20" ," 7/2/20" ," 7/3/20" ," 7/4/20" ," 7/5/20" ," 
7/6/20" ," 7/7/20" ," 7/8/20" ," 7/9/20" ," 7/10/20" ," 7/11/20" ," 7/12/20" ," 7/13/20" ," 7/14/20" ," 7/15/20" ," 7/16/20" ," 7/17/20" ," 7/18/20" ," 7/19/20" ," 7/20/20" ," 7/21/20" ," 7/22/20" ," 7/23/20" ," 7/24/20" ," 7/25/20" ," 7/26/20" ," 7/27/20" ," 7/28/20" ," 7/29/20" ," 7/30/20" ," 7/31/20" ," 8/1/20" ," 8/2/20" ," 8/3/20" ," 8/4/20" ," 8/5/20" ," 8/6/20" ," 8/7/20" ," 8/8/20" ," 8/9/20" ," 8/10/20" ," 8/11/20" ," 8/12/20" ," 8/13/20" ," 8/14/20" ," 8/15/20" ," 8/16/20" ," 8/17/20" ," 8/18/20" ," 8/19/20" ," 8/20/20" ," 8/21/20" ," 8/22/20" ," 8/23/20" ," 8/24/20" ," 8/25/20" ," 8/26/20" ," 8/27/20" ," 8/28/20" ," 8/29/20" ," 8/30/20" ," 8/31/20" ," 9/1/20" ," 9/2/20" ," 9/3/20" ," 9/4/20" ," 9/5/20" ," 9/6/20" ," 9/7/20" ," 9/8/20" ," 9/9/20" ," 9/10/20" ," 9/11/20" ," 9/12/20" ," 9/13/20" ," 9/14/20" ," 9/15/20" ," 9/16/20" ," 9/17/20" ," 9/18/20" ," 9/19/20" ," 9/20/20" ," 9/21/20" ," 9/22/20" ," 9/23/20" ," 9/24/20" ," 9/25/20" ," 9/26/20" ," 9/27/20" ," 9/28/20" ," 9/29/20" ," 9/30/20" ," 10/1/20" ," 10/2/20" ," 10/3/20" ," 10/4/20" ," 10/5/20" ," 10/6/20" ," 10/7/20" ," 10/8/20" ," 10/9/20" ," 10/10/20" ," 10/11/20" ," 10/12/20" ," 10/13/20" ," 10/14/20" ," 10/15/20" ," 10/16/20" ," 10/17/20" ," 10/18/20" ," 10/19/20" ," 10/20/20" ," 10/21/20" ," 10/22/20" ," 10/23/20" ," 10/24/20" ," 10/25/20" ," 10/26/20" ," 10/27/20" ," 10/28/20" ," 10/29/20" ," 10/30/20" ," 10/31/20" ," 11/1/20" ," 11/2/20" ," 11/3/20" ," 11/4/20" ," 11/5/20" ," 11/6/20" ," 11/7/20" ," 11/8/20" ," 11/9/20" ," 11/10/20" ," 11/11/20" ," 11/12/20" ," 11/13/20" ," 11/14/20" ," 11/15/20" ," 11/16/20" ," 11/17/20" ," 11/18/20" ," 11/19/20" ," 11/20/20" ," 11/21/20" ," 11/22/20" ," 11/23/20" ," 11/24/20" ," 11/25/20" ," 11/26/20" ," 11/27/20" ," 11/28/20" ," 11/29/20" ," 11/30/20" ," 12/1/20" ," 12/2/20" ," 12/3/20" ," 12/4/20" ," 12/5/20" ," 12/6/20" ," 12/7/20" ," 12/8/20" ," 12/9/20" ," 12/10/20" ," 12/11/20" ," 12/12/20" ," 12/13/20" ," 12/14/20" ," 12/15/20" ," 12/16/20" ," 
12/17/20" ," 12/18/20" ," 12/19/20" ," 12/20/20" ," 12/21/20" ," 12/22/20" ," 12/23/20" ," 12/24/20" ," 12/25/20" ," 12/26/20" ," 12/27/20" ," 12/28/20" ," 12/29/20" ," 12/30/20" ," 12/31/20" ," 1/1/21" ," 1/2/21" ," 1/3/21" ," 1/4/21" ," 1/5/21" ," 1/6/21" ," 1/7/21" ," 1/8/21" ," 1/9/21" ," 1/10/21" ," 1/11/21" ," 1/12/21" ," 1/13/21" ," 1/14/21" ," 1/15/21" ," 1/16/21" ," 1/17/21" ," 1/18/21" ," 1/19/21" ," 1/20/21" ," 1/21/21" ," 1/22/21" ," 1/23/21" ," 1/24/21" ," 1/25/21" ," 1/26/21" ," 1/27/21" ," 1/28/21" ," 1/29/21" ," 1/30/21" ," 1/31/21" ," 2/1/21" ," 2/2/21" ," 2/3/21" ," 2/4/21" ," 2/5/21" ," 2/6/21" ," 2/7/21" ," 2/8/21" ," 2/9/21" ," 2/10/21" ," 2/11/21" ," 2/12/21" ," 2/13/21" ," 2/14/21" ," 2/15/21" ," 2/16/21" ," 2/17/21" ," 2/18/21" ," 2/19/21" ," 2/20/21" ," 2/21/21" ," 2/22/21" ," 2/23/21" ," 2/24/21" ," 2/25/21" ," 2/26/21" ," 2/27/21" ," 2/28/21" ," 3/1/21" ," 3/2/21" ," 3/3/21" ," 3/4/21" ," 3/5/21" ," 3/6/21" ," 3/7/21" ," 3/8/21" ," 3/9/21" ," 3/10/21" ," 3/11/21" ," 3/12/21" ," 3/13/21" ," 3/14/21" ," 3/15/21" ," 3/16/21" ," 3/17/21" ," 3/18/21" ," 3/19/21" ," 3/20/21" ," 3/21/21" ," 3/22/21" ," 3/23/21" ," 3/24/21" ," 3/25/21" ," 3/26/21" ," 3/27/21" ," 3/28/21" ," 3/29/21" ," 3/30/21" ," 3/31/21" ," 4/1/21" ," 4/2/21" ," 4/3/21" ," 4/4/21" ," 4/5/21" ," 4/6/21" ," 4/7/21" ," 4/8/21" ," 4/9/21" ," 4/10/21" ," 4/11/21" ," 4/12/21" ," 4/13/21" ," 4/14/21" ," 4/15/21" ," 4/16/21" ," 4/17/21" ," 4/18/21" ," 4/19/21" ," 4/20/21" ," 4/21/21" ," 4/22/21" ," 4/23/21" ," 4/24/21" ," 4/25/21" ," 4/26/21" ," 4/27/21" ," 4/28/21" ," 4/29/21" ," 4/30/21" ," 5/1/21" ," 5/2/21" ," 5/3/21" ," 5/4/21" ," 5/5/21" ," 5/6/21" ," 5/7/21" ," 5/8/21" ," 5/9/21" ," 5/10/21" ," 5/11/21" ," 5/12/21" ," 5/13/21" ," 5/14/21" ," 5/15/21" ," 5/16/21" ," 5/17/21" ," 5/18/21" ," 5/19/21" ," 5/20/21" ," 5/21/21" ," 5/22/21" ," 5/23/21" ," 5/24/21" ," 5/25/21" ," 5/26/21" ," 5/27/21" ," 5/28/21" ," 5/29/21" ," 5/30/21" ," 5/31/21" ," 6/1/21" ," 6/2/21" ," 6/3/21" ," 
6/4/21" ," 6/5/21" ," 6/6/21" ," 6/7/21" ," 6/8/21" ," 6/9/21" ," 6/10/21" ," 6/11/21" ," 6/12/21" ," 6/13/21" ," 6/14/21" ," 6/15/21" ," 6/16/21" ," 6/17/21" ," 6/18/21" ," 6/19/21" ," 6/20/21" ," 6/21/21" ," 6/22/21" ," 6/23/21" ," 6/24/21" ," 6/25/21" ," 6/26/21" ," 6/27/21" ," 6/28/21" ," 6/29/21" ," 6/30/21" ," 7/1/21" ," 7/2/21" ," 7/3/21" ," 7/4/21" ," 7/5/21" ," 7/6/21" ," 7/7/21" ," 7/8/21" ," 7/9/21" ," 7/10/21" ," 7/11/21" ," 7/12/21" ," 7/13/21" ," 7/14/21" ," 7/15/21" ," 7/16/21" ," 7/17/21" ," 7/18/21" ," 7/19/21" ," 7/20/21" ," 7/21/21" ," 7/22/21" ," 7/23/21" ," 7/24/21" ," 7/25/21" ," 7/26/21" ," 7/27/21" ," 7/28/21" ," 7/29/21" ," 7/30/21" ," 7/31/21" ," 8/1/21" ," 8/2/21" ," 8/3/21" ," 8/4/21" ," 8/5/21" ," 8/6/21" ," 8/7/21" ," 8/8/21" ," 8/9/21" ," 8/10/21" ," 8/11/21" ," 8/12/21" ," 8/13/21" ," 8/14/21" ," 8/15/21" ," 8/16/21" ," 8/17/21" ," 8/18/21" ," 8/19/21" ," 8/20/21" ," 8/21/21" ," 8/22/21" ," 8/23/21" ," 8/24/21" ," 8/25/21" ," 8/26/21" ," 8/27/21" ," 8/28/21" ," 8/29/21" ," 8/30/21" ," 8/31/21" ," 9/1/21" ," 9/2/21" ," 9/3/21" ," 9/4/21" ," 9/5/21" ," 9/6/21" ," 9/7/21" ," 9/8/21" ," 9/9/21" ," 9/10/21" ," 9/11/21" ," 9/12/21" ," 9/13/21" ," 9/14/21" ," 9/15/21" ," 9/16/21" ," 9/17/21" ," 9/18/21" ," 9/19/21" ," 9/20/21" ," 9/21/21" ," 9/22/21" ," 9/23/21" ," 9/24/21" ," 9/25/21" ," 9/26/21" ," 9/27/21" ," 9/28/21" ," 9/29/21" ," 9/30/21" ," 10/1/21" ," 10/2/21" ," 10/3/21" ," 10/4/21" ," 10/5/21" ," 10/6/21" ," 10/7/21" ," 10/8/21" ," 10/9/21" ," 10/10/21" ," 10/11/21" ," 10/12/21" ," 10/13/21" ," 10/14/21" ," 10/15/21" ," 10/16/21" ," 10/17/21" ," 10/18/21" ," 10/19/21" ," 10/20/21" ," 10/21/21" ," 10/22/21" ," 10/23/21" ," 10/24/21" ," 10/25/21" ," 10/26/21" ," 10/27/21" ," 10/28/21" ," 10/29/21" ," 10/30/21" ," 10/31/21" ," 11/1/21" ," 11/2/21" ," 11/3/21" ," 11/4/21" ," 11/5/21" ," 11/6/21" ," 11/7/21" ," 11/8/21" ," 11/9/21" ," 11/10/21" ," 11/11/21" ," 11/12/21" ," 11/13/21" ," 11/14/21" ," 11/15/21" ," 11/16/21" ," 11/17/21" 
," 11/18/21" ," 11/19/21" ," 11/20/21" ," 11/21/21" ," 11/22/21" ," 11/23/21" ," 11/24/21" ," 11/25/21" ," 11/26/21" ," 11/27/21" ," 11/28/21" ," 11/29/21" ," 11/30/21" ," 12/1/21" ," 12/2/21" ," 12/3/21" ," 12/4/21" ," 12/5/21" ," 12/6/21" ," 12/7/21" ," 12/8/21" ," 12/9/21" ," 12/10/21" ," 12/11/21" ," 12/12/21" ," 12/13/21" ," 12/14/21" ," 12/15/21" ," 12/16/21" ," 12/17/21" ," 12/18/21" ," 12/19/21" ," 12/20/21" ," 12/21/21" ," 12/22/21" ," 12/23/21" ," 12/24/21" ," 12/25/21" ," 12/26/21" ," 12/27/21" ," 12/28/21" ," 12/29/21" ," 12/30/21" ," 12/31/21" ," 1/1/22" ," 1/2/22" ," 1/3/22" ," 1/4/22" ," 1/5/22" ," 1/6/22" ," 1/7/22" ," 1/8/22" ," 1/9/22" ," 1/10/22" ," 1/11/22" ," 1/12/22" ," 1/13/22" ," 1/14/22" ," 1/15/22" ," 1/16/22" ," 1/17/22" ," 1/18/22" ," 1/19/22" ," 1/20/22" ," 1/21/22" ," 1/22/22" ," 1/23/22" ," 1/24/22" ," 1/25/22" ," 1/26/22" ," 1/27/22" ," 1/28/22" ," 1/29/22" ," 1/30/22" ," 1/31/22" ," 2/1/22" ," 2/2/22" ," 2/3/22" ," 2/4/22" ," 2/5/22" ," 2/6/22" ," 2/7/22" ," 2/8/22" ," 2/9/22" ," 2/10/22" ," 2/11/22" ," 2/12/22" ," 2/13/22" ," 2/14/22" ," 2/15/22" ," 2/16/22" ," 2/17/22" ," 2/18/22" ," 2/19/22" ," 2/20/22" ," 2/21/22" ," 2/22/22" ," 2/23/22" ," 2/24/22" ," 2/25/22" ," 2/26/22" ," 2/27/22" ," 2/28/22" ," 3/1/22" ," 3/2/22" ," 3/3/22" ," 3/4/22" ," 3/5/22" ," 3/6/22" ," 3/7/22" ," 3/8/22" ," 3/9/22" ," 3/10/22" ," 3/11/22" ," 3/12/22" ," 3/13/22" ," 3/14/22" ," 3/15/22" ," 3/16/22" ," 3/17/22" ," 3/18/22" ," 3/19/22" ," 3/20/22" ," 3/21/22" ," 3/22/22" ," 3/23/22" ," 3/24/22" ," 3/25/22" ," 3/26/22" ," 3/27/22" ," 3/28/22" ," 3/29/22" ," 3/30/22"
,"" ," Afghanistan" ," 33.93911" ," 67.709953" 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 5 , 5 , 5 , 5 , 5 , 5 , 5 , 5 , 5 , 5 , 5 , 5 , 8 , 8 , 8 , 8 , 11 , 11 , 11 , 14 , 20 , 25 , 26 , 26 , 26 , 24 , 24 , 34 , 40 , 42 , 74 , 80 , 91 , 106 , 114 , 114 , 166 , 192 , 235 , 269 , 270 , 299 , 337 , 367 , 423 , 444 , 521 , 521 , 555 , 607 , 665 , 770 , 794 , 845 , 908 , 933 , 996 , 1026 , 1092 , 1176 , 1226 , 1330 , 1463 , 1531 , 1703 , 1827 , 1827 , 2171 , 2469 , 2469 , 2469 , 2469 , 3224 , 3392 , 3563 , 3563 , 4402 , 4664 , 4967 , 4967 , 5339 , 6053 , 6402 , 6635 , 7072 , 7655 , 8145 , 8676 , 9216 , 9952 , 10668 , 11180 , 11917 , 12465 , 13102 , 13745 , 14529 , 15180 , 15836 , 16578 , 17353 , 17977 , 19055 , 19637 , 20428 , 21003 , 21308 , 22228 , 22976 , 23632 , 24188 , 24852 , 25613 , 25719 , 26960 , 27423 , 27964 , 28383 , 28919 , 29229 , 29567 , 29726 , 30261 , 30346 , 30702 , 31053 , 31324 , 31445 , 31848 , 32108 , 32410 , 32758 , 33037 , 33150 , 33470 , 33680 , 33739 , 34280 , 34437 , 34537 , 34541 , 34826 , 35026 , 35156 , 35315 , 35375 , 35561 , 35595 , 35701 , 35813 , 36001 , 36067 , 36122 , 36243 , 36349 , 36454 , 36557 , 36628 , 36628 , 36796 , 36796 , 36796 , 36833 , 36915 , 36982 , 37023 , 37101 , 37140 , 37140 , 37355 , 37431 , 37510 , 37517 , 37637 , 37682 , 37682 , 37685 , 37685 , 37845 , 37942 , 37980 , 38039 , 38085 , 38156 , 38199 , 38215 , 38226 , 38229 , 38229 , 38248 , 38282 , 38329 , 38374 , 38374 , 38390 , 38484 , 38580 , 38606 , 38630 , 38658 , 38692 , 38727 , 38802 , 38858 , 38901 , 38941 , 38958 , 38969 , 39005 , 39130 , 39160 , 39182 , 39231 , 39256 , 39272 , 39278 , 39313 , 39325 , 39340 , 39354 , 39371 , 39376 , 39383 , 39427 , 39508 , 39572 , 39634 , 39702 , 39779 , 39789 , 39885 , 39956 , 40014 , 40080 , 40112 , 40159 , 40227 , 40286 , 40373 , 40461 , 40461 , 40510 , 40626 , 40687 , 40768 , 40833 , 40937 , 41032 , 41145 , 41268 , 41334 , 41425 , 
41501 , 41633 , 41728 , 41814 , 41935 , 41975 , 42033 , 42159 , 42297 , 42463 , 42609 , 42795 , 42969 , 43035 , 43240 , 43403 , 43628 , 43851 , 44228 , 44443 , 44503 , 44706 , 44988 , 45278 , 45490 , 45716 , 45839 , 45966 , 46215 , 46498 , 46717 , 46980 , 47258 , 47388 , 47641 , 47901 , 48136 , 48366 , 48540 , 48753 , 48826 , 48952 , 49273 , 49484 , 49703 , 49927 , 50202 , 50456 , 50536 , 50678 , 50888 , 51070 , 51357 , 51595 , 51764 , 51848 , 52007 , 52147 , 52330 , 52330 , 52513 , 52586 , 52709 , 52909 , 53011 , 53105 , 53207 , 53332 , 53400 , 53489 , 53538 , 53584 , 53690 , 53775 , 53831 , 53938 , 53984 , 54062 , 54141 , 54278 , 54403 , 54483 , 54559 , 54595 , 54672 , 54750 , 54854 , 54891 , 54939 , 55008 , 55023 , 55059 , 55121 , 55174 , 55231 , 55265 , 55330 , 55335 , 55359 , 55384 , 55402 , 55420 , 55445 , 55473 , 55492 , 55514 , 55518 , 55540 , 55557 , 55575 , 55580 , 55604 , 55617 , 55646 , 55664 , 55680 , 55696 , 55707 , 55714 , 55733 , 55759 , 55770 , 55775 , 55827 , 55840 , 55847 , 55876 , 55876 , 55894 , 55917 , 55959 , 55959 , 55985 , 55985 , 55995 , 56016 , 56044 , 56069 , 56093 , 56103 , 56153 , 56177 , 56192 , 56226 , 56254 , 56290 , 56294 , 56322 , 56384 , 56454 , 56517 , 56572 , 56595 , 56676 , 56717 , 56779 , 56873 , 56943 , 57019 , 57144 , 57160 , 57242 , 57364 , 57492 , 57534 , 57612 , 57721 , 57793 , 57898 , 58037 , 58214 , 58312 , 58542 , 58730 , 58843 , 59015 , 59225 , 59370 , 59576 , 59745 , 59939 , 60122 , 60300 , 60563 , 60797 , 61162 , 61455 , 61755 , 61842 , 62063 , 62403 , 62718 , 63045 , 63355 , 63412 , 63484 , 63598 , 63819 , 64122 , 64575 , 65080 , 65486 , 65728 , 66275 , 66903 , 67743 , 68366 , 69130 , 70111 , 70761 , 71838 , 72977 , 74026 , 75119 , 76628 , 77963 , 79224 , 80841 , 82326 , 84050 , 85892 , 87716 , 88740 , 89861 , 91458 , 93272 , 93288 , 96531 , 98734 , 100521 , 101906 , 103902 , 105749 , 107957 , 109532 , 111592 , 113124 , 114220 , 115751 , 117158 , 118659 , 120216 , 122156 , 123485 , 124748 , 125937 , 127464 , 
129021 , 130113 , 131586 , 132777 , 133578 , 134653 , 135889 , 136643 , 137853 , 139051 , 140224 , 140602 , 141499 , 142414 , 142762 , 143183 , 143439 , 143666 , 143871 , 144285 , 145008 , 145552 , 145996 , 146523 , 147154 , 147501 , 147985 , 148572 , 148933 , 149361 , 149810 , 150240 , 150458 , 150778 , 151013 , 151291 , 151563 , 151770 , 151941 , 152033 , 152142 , 152243 , 152363 , 152411 , 152448 , 152497 , 152511 , 152583 , 152660 , 152722 , 152822 , 152960 , 153007 , 153033 , 153148 , 153220 , 153260 , 153306 , 153375 , 153395 , 153423 , 153534 , 153626 , 153736 , 153840 , 153962 , 153982 , 153990 , 154094 , 154180 , 154283 , 154361 , 154487 , 154487 , 154487 , 154585 , 154712 , 154757 , 154800 , 154960 , 154960 , 154960 , 155072 , 155093 , 155128 , 155174 , 155191 , 155191 , 155191 , 155287 , 155309 , 155380 , 155429 , 155448 , 155466 , 155508 , 155540 , 155599 , 155627 , 155682 , 155688 , 155739 , 155764 , 155776 , 155801 , 155859 , 155891 , 155931 , 155940 , 155944 , 156040 , 156071 , 156124 , 156166 , 156196 , 156210 , 156250 , 156284 , 156307 , 156323 , 156363 , 156392 , 156397 , 156397 , 156397 , 156397 , 156414 , 156456 , 156487 , 156510 , 156552 , 156610 , 156649 , 156739 , 156739 , 156812 , 156864 , 156896 , 156911 , 157015 , 157032 , 157144 , 157171 , 157190 , 157218 , 157260 , 157289 , 157359 , 157387 , 157412 , 157431 , 157454 , 157499 , 157508 , 157542 , 157585 , 157603 , 157611 , 157633 , 157648 , 157660 , 157665 , 157725 , 157734 , 157745 , 157787 , 157797 , 157816 , 157841 , 157878 , 157887 , 157895 , 157951 , 157967 , 157998 , 158037 , 158056 , 158084 , 158107 , 158189 , 158183 , 158205 , 158245 , 158275 , 158300 , 158309 , 158381 , 158394 , 158471 , 158511 , 158602 , 158639 , 158678 , 158717 , 158826 , 158974 , 159070 , 159303 , 159516 , 159548 , 159649 , 159896 , 160252 , 160692 , 161004 , 161057 , 161290 , 162111 , 162926 , 163555 , 164190 , 164727 , 165358 , 165711 , 166191 , 166924 , 167739 , 168550 , 169448 , 169940 , 170152 , 170604 , 
171246 , 171422 , 171519 , 171673 , 171857 , 171931 , 172205 , 172441 , 172716 , 172901 , 173047 , 173084 , 173146 , 173395 , 173659 , 173879 , 174073 , 174214 , 174214 , 174331 , 174582 , 175000 , 175353 , 175525 , 175893 , 175974 , 176039 , 176201 , 176409 , 176571 , 176743 , 176918 , 176983 , 177039 , 177093 , 177191 , 177255 , 177321 , 177321 , 177321 , 177321 , 177520 , 177602 , 177658 , 177716 ,
---8<---
And yes, it compiled as well (clipped) ;-)


Re: Updated String Parsing Words in kForth

<62458c46$0$704$14726298@news.sunsite.dk>

https://www.novabbs.com/devel/article-flat.php?id=17424&group=comp.lang.forth#17424

 by: Doug Hoffman - Thu, 31 Mar 2022 11:10 UTC

On 3/31/22 6:12 AM, Hans Bezemer wrote:

> I had absolutely no trouble at all converting it to a 4tH table using
> 4tH's "csv2xls.4th"

Yeah. It isn't a difficult file to parse. JSON is a bit more challenging :-)

> Using "csvscan.4th"

> I use 4tH a LOT professionally, therefore it is possible I've developed
> a whole shebang of tools to tackle stuff like this.
>
> So - to me it's a walk in the park. Nothing special. All in a day's work.

Interesting to see the different ways of doing it.

-Doug

Re: Updated String Parsing Words in kForth

<6245c669$0$692$14726298@news.sunsite.dk>

https://www.novabbs.com/devel/article-flat.php?id=17429&group=comp.lang.forth#17429

 by: Doug Hoffman - Thu, 31 Mar 2022 15:19 UTC

I notice that the .csv file parsing I show needs a tweak. When a line
*begins* with a comma then the first field must be read as a blank.
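
To illustrate the intended behaviour (Python used only as neutral notation, not the FMS2 code): a delimiter-based split keeps the empty field in front of a leading comma, whereas a whitespace-style split silently drops empty tokens.

```python
line = ",Argentina,-38.4161,-63.6167"

fields = line.split(",")
print(fields)       # ['', 'Argentina', '-38.4161', '-63.6167']
print(len(fields))  # 4: the missing Province/State is still counted as a field

# By contrast, splitting on runs of blanks discards empty tokens.
print("  a   b ".split())  # ['a', 'b']
```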

-Doug

Re: Updated String Parsing Words in kForth

<t29luq$rv9$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17442&group=comp.lang.forth#17442

 by: Krishna Myneni - Sat, 2 Apr 2022 14:18 UTC

On 3/30/22 04:29, Doug Hoffman wrote:
>
> include FMS2VT.f
>
> 0 value records
>
> : .line ( n -- ) 1- records :at { a }
>   10 set-precision
>   a :size 0 ?do cr i dup .  a :at dup .class :. loop ;
>
> : process-line { a -- }
>    a :size 0 ?do i a :at
>     dup :@ >integer
>      if >int i a :to <free
>      else dup :@ >float
>           if >flt i a :to <free
>           else drop
>           then
>      then
>    loop a records :add ;
>
> : process-file ( fileObj n c -- ) >r
>   over :!line-len
>   >array to records
>   begin
>     dup :each
>   while
>     r@ swap :split process-line
>   repeat
>     <free r> drop ; \ free the file
>
> timer-reset
> \ s" h2lines.dat" >file 500 bl process-file
> \ s" time_series_covid19_confirmed_global.csv"
> \    >file 10000 ',' process-file
> .elapsed
>
> \ h2lines.dat
> \ => 40 ms
>
> records :size . \ => 3261  ok
>
> 1 .line \ => output follows
> 0 int 1
> 1 flt 1108.127000
> 2 flt 90242.35200
> 3 flt 3060000.000
> 4 string X
> 5 int 0
> 6 int 0
> 7 string P
> 8 string B
> 9 int 0
> 10 int 1
> 11 string P ok
>
> records <freeAll
>

It appears that you have a convenient package for storing records with
fields consisting of objects. However, it's not clear if your records
have named references to the individual fields. I imagine you should be
able to add named references to individual fields.

> \ time_series_covid19_confirmed_global.csv
> \ => 190 ms
>
> records :size . \ => 285 ok
> 1 .line
> 0 string Province/State
> 1 string Country/Region
> 2 string Lat
> 3 string Long
> 4 string 1/22/20
> 5 string 1/23/20
> 6 string 1/24/20
> ...

The general case has commas inside of a single field, for a text field
enclosed in quotes. I don't know if it gets more complicated than that, e.g.

"text1,text2,\"text3\",text4", ...

>> ...
>> "Saint Helena, Ascension and Tristan da Cunha",United
>> Kingdom,-7.9467,-14.3559, ...
>>
>> Note that the quotes in the above line are part of the data in the
>> first field. The field parser/converter for the first two fields must
>> be capable of handling such complications, so it will not be quite so
>> simple.
>
> Your csv file did not have any "XX YY", (quotes within fields) so no
> special handling was needed. If there were, it would be simple to first
> strip the '"' chars from the string using the string :replall function.
> Or a different strategy could be used if needed.
>

Strange. The example I'm referring to is line 273. Perhaps the file has
changed. The copy I downloaded has an md5sum of

08a6a9d1c3a114d92500cdaad755ac29 time_series_covid19_confirmed_global.csv
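
For anyone wanting to compare their own copy without the md5sum utility, a checksum can be computed along these lines. This is a Python sketch; `md5_of_file` is just an illustrative name.

```python
import hashlib
import sys


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print digest and name for each file given on the command line.
    for path in sys.argv[1:]:
        print(md5_of_file(path), path)
```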

Yes, one must detect that the first character is a quote and then parse
to the next quote. It is simple, but still an additional complication
compared to just scanning for the next comma.
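
A minimal sketch of that rule, written in Python rather than Forth so it does not depend on any one system. `split_csv_line` is a hypothetical name. It assumes a quote only matters as the first character of a field, drops the enclosing quotes from the result, and does not handle quotes embedded inside a quoted field.

```python
def split_csv_line(line, delimiter=",", quote='"'):
    """Split one line into fields; a field that starts with a quote runs to the next quote."""
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quote:
            close = line.find(quote, i + 1)
            if close == -1:
                close = n  # unterminated quote: take the rest of the line
            fields.append(line[i + 1:close])
            i = line.find(delimiter, close)  # delimiter following the closing quote
            if i == -1:
                break
            i += 1
        else:
            j = line.find(delimiter, i)
            if j == -1:
                fields.append(line[i:])  # last field
                break
            fields.append(line[i:j])
            i = j + 1
    return fields


print(split_csv_line('"Saint Helena, Ascension and Tristan da Cunha",United Kingdom,-7.9467,-14.3559'))
print(split_csv_line(",Argentina,-38.4161,-63.6167"))
```

Whether the quotes should stay in the returned field is a design choice; they are stripped here only to keep the sketch short.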

....
>> The csv data file may be found here:
>>
>> https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
>>
>>
>> The first line contains headings for the fields.
>
>

Krishna

Re: Updated String Parsing Words in kForth

<624885fd$0$705$14726298@news.sunsite.dk>

https://www.novabbs.com/devel/article-flat.php?id=17445&group=comp.lang.forth#17445

 by: Doug Hoffman - Sat, 2 Apr 2022 17:20 UTC

On 4/2/22 10:18 AM, Krishna Myneni wrote:

> ... it's not clear if your records
> have named references to the individual fields.

They don't. The records are arrays, so fields are accessed by zero-based
index. It is straightforward to assign names to the numbers, as you
suggest.

> I imagine you should be able to add named references to individual
> fields.

Or perhaps you could use the Country/Region and Date in the first line
to have automatic conversion into column (field) and row (line) numbers,
at least in the case of your .csv file. A simple string search for each
would work.
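
Sketched in Python (not FMS2; the variable names are illustrative), the heading line can be turned into a name-to-index map once, after which fields are fetched by name. The sample values are taken from the Afghanistan line posted earlier in the thread, truncated to six fields.

```python
# First line of the file: the field headings.
header = ["Province/State", "Country/Region", "Lat", "Long", "1/22/20", "1/23/20"]

# Map each heading to its zero-based column number.
column = {name: i for i, name in enumerate(header)}

row = ["", "Afghanistan", "33.93911", "67.709953", "0", "0"]
print(row[column["Country/Region"]])  # Afghanistan
print(row[column["Lat"]])             # 33.93911
```

A dictionary replaces the string search suggested above; a linear search over the headings would work just as well for a file with a few hundred columns.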

Maybe you want the date strings converted into actual dates. This should
not be hard. A date class could be used.

> The general case has commas inside of a single field, for a text field
> enclosed in quotes. I don't know if it gets more complicated than that,
> e.g.
>
> "text1,text2,\"text3\",text4", ...

I think it can get fairly complicated.
Ref: https://www.csvreader.com/csv_format.php

>>> "Saint Helena, Ascension and Tristan da Cunha",United
>>> Kingdom,-7.9467,-14.3559, ...
>>>
>>> Note that the quotes in the above line are part of the data in the
>>> first field. The field parser/converter for the first two fields must
>>> be capable of handling such complications, so it will not be quite so
>>> simple.

Yes, you are right and I missed that. My personal inclination would be
to just modify :split to handle the case when the first field is
escaped with quotes. Not hard to do.

But to handle all possible cases as shown in the referenced URL might
not be something I would do unless I needed it. Maybe it wouldn't be too
complex. Not sure.
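
For comparison, this is how an existing full CSV reader treats these cases. The sketch uses Python's standard csv module with its default dialect; the third sample line is made up to show a quote inside a quoted field, which that dialect expects to be written as a doubled quote rather than with a backslash.

```python
import csv

lines = [
    '"Saint Helena, Ascension and Tristan da Cunha",United Kingdom,-7.9467,-14.3559',
    ",Argentina,-38.4161,-63.6167",
    '"say ""hi""",x',  # made-up line: doubled quote inside a quoted field
]

# csv.reader accepts any iterable of strings, one per line.
rows = list(csv.reader(lines))
for r in rows:
    print(r)
```

The first row comes back with four fields, the second with an empty first field, and the third as `['say "hi"', 'x']`.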

Line 197 is the other line that has an escaped comma (first field only).

Thanks for the feedback.

-Doug

Re: Updated String Parsing Words in kForth

<t2a1uj$cg9$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17446&group=comp.lang.forth#17446

 by: Krishna Myneni - Sat, 2 Apr 2022 17:42 UTC

On 3/30/22 01:45, minf...@arcor.de wrote:
> Krishna Myneni schrieb am Sonntag, 13. März 2022 um 01:12:12 UTC+1:
>> I am overhauling the string parsing words in the kForth String Words
>> library (strings.4th).
...
>> These words need clearer names and an upgrade (particularly PARSE_CSV).
>> My proposed replacements are
>>
>>
>> \ NEXT-BS-TOKEN ( a u -- atok utok arem urem )
>> \ NEXT-CS-TOKEN ( a u -- atok utok arem urem )
>> \ PARSED-BSV ( a u -- a1 u1 a2 u2 ... an un n )
>> \ PARSED-CSV ( a u -- a1 u1 a2 u2 ... an un n )
>> \ ITH-PARSED ( a1 u1 ... an un n i -- a1 u1 ... an un n ai ui )
>> \ DROP-PARSED ( a1 u1 ... an un n -- )
>> \ PARSED>FLOATS ( a1 u1 ... an un n -- n ) ( F: r1 ... rn )
>>
>> NEXT-BS-TOKEN is a slightly more efficient version of PARSE_TOKEN, but
>> leaves the token and remaining strings in a swapped order. The name
>> indicates that it is parsing the next "blank(s)-separated" token in the
>> string.
>>
>> NEXT-CS-TOKEN parses the next "comma-separated" token in the string and
>> returns the token and remaining substrings.
>>
>> PARSED-BSV starts with an input string and repeatedly applies
>> NEXT-BS-TOKEN until the remaining string is null and returns all of the
>> substrings and the token count on the stack.
>>
>> PARSED-CSV starts with an input string and repeatedly applies
>> NEXT-CS-TOKEN until the remaining string is null. It returns all of the
>> substrings and the token count on the stack. Unlike PARSED-BSV, each
>> comma delimiter will mark a separate token string, e.g. if the input
>> string is ",," there will be three token strings, each of length zero.
>>
>> ITH-PARSED provides a way to PICK the ith substring returned by
>> PARSED-BSV and PARSED-CSV.
>>
>> DROP-PARSED may be used to discard the n token substrings returned by
>> PARSED-BSV and PARSED-CSV.
>>
>> PARSED>FLOATS converts each of the token substrings returned by
>> PARSED-BSV or PARSED-CSV to n floating point numbers. If the string to
>> float conversion fails, the floating point value NAN will be returned.
>> If a substring has zero length, the corresponding fp value will also be NAN.
>>
>> Comments or suggestions?
>
> How much would it simplify the task to just use scanf ??
> If C is not available it can be called from an OS library.
> IIRC there are also similar Forth versions around.

I believe there has been discussion in c.l.f. of a scanf-like word in
Forth in the past. That's one valid approach, but might be actually more
complex than the STRING-PARSE solution I'm looking at.
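
For reference, the comma-splitting rule described in the quoted proposal (every comma delimits a field, so ",," yields three zero-length fields) can be sketched in standard Forth. This is only an illustration under my own assumptions, not the strings.4th source: the names FIND-COMMA, SPLIT-AT-COMMA, CSV-FIELDS and #FIELDS are hypothetical, quoted fields are not handled, and an empty input string is treated as one empty field.

```forth
\ Hypothetical sketch; not the kForth strings.4th implementation.
variable #fields                 \ number of fields found so far

\ Offset of the first comma in the string, or u if there is none.
: find-comma ( a u -- u' )
  dup 0 ?do
    over i + c@ [char] , = if 2drop i unloop exit then
  loop nip ;

\ Split at the first comma; found? is true when a comma was present.
: split-at-comma ( a u -- atok utok arem urem found? )
  2dup find-comma          \ a u utok
  2dup <> >r               \ utok < u means a comma was hit
  >r over r@ 2swap         \ a utok a u
  r> r@ if 1+ then         \ step over the comma when one was found
  /string r> ;

\ Leave every field on the stack, followed by the field count.
: csv-fields ( a u -- a1 u1 ... an un n )
  0 #fields !
  begin
    split-at-comma 1 #fields +!
  while repeat
  2drop #fields @ ;
```

With these definitions, `s" 1.5,,abc" csv-fields` should leave three address/length pairs (the middle one of length zero) with the count 3 on top.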

--
Krishna

Re: Updated String Parsing Words in kForth

<1658d1b7-0980-4fca-be85-32ff318ca55cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17448&group=comp.lang.forth#17448

Newsgroups: comp.lang.forth
X-Received: by 2002:a05:620a:2293:b0:600:2b7b:2a19 with SMTP id o19-20020a05620a229300b006002b7b2a19mr9613815qkh.408.1648923831450;
Sat, 02 Apr 2022 11:23:51 -0700 (PDT)
X-Received: by 2002:a05:6214:1cc5:b0:443:6a15:5894 with SMTP id
g5-20020a0562141cc500b004436a155894mr11941863qvd.59.1648923831269; Sat, 02
Apr 2022 11:23:51 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Sat, 2 Apr 2022 11:23:51 -0700 (PDT)
In-Reply-To: <t2a1uj$cg9$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:1c05:2f14:600:35b9:ee0b:a628:8c7;
posting-account=-JQ2RQoAAAB6B5tcBTSdvOqrD1HpT_Rk
NNTP-Posting-Host: 2001:1c05:2f14:600:35b9:ee0b:a628:8c7
References: <t0jcsp$9g4$1@dont-email.me> <b487fc38-26e8-474f-849c-afd59fc3e284n@googlegroups.com>
<t2a1uj$cg9$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1658d1b7-0980-4fca-be85-32ff318ca55cn@googlegroups.com>
Subject: Re: Updated String Parsing Words in kForth
From: mhx...@iae.nl (Marcel Hendrix)
Injection-Date: Sat, 02 Apr 2022 18:23:51 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 213
 by: Marcel Hendrix - Sat, 2 Apr 2022 18:23 UTC

On Saturday, April 2, 2022 at 7:42:45 PM UTC+2, Krishna Myneni wrote:
> On 3/30/22 01:45, minf...@arcor.de wrote:
> > Krishna Myneni schrieb am Sonntag, 13. März 2022 um 01:12:12 UTC+1:
> >> I am overhauling the string parsing words in the kForth String Words
> >> library (strings.4th).
> ...
[..]
> > How much would it simplify the task to just use scanf ??
> > If C is not available it can be called from an OS library.
> > IIRC there are also similar Forth versions around.
> I believe there has been discussion in c.l.f. of a scanf-like word in
> Forth in the past. That's one valid approach, but might be actually more
> complex than the STRING-PARSE solution I'm looking at.
(*
* LANGUAGE : ANS Forth with extensions
* PROJECT : Forth Environments
* DESCRIPTION : scanf() functionality
* CATEGORY : Tools
* AUTHOR : Marcel Hendrix
* LAST CHANGE : Monday, March 28, 2011, 01:08, Marcel Hendrix, FGETF -- WS instead of BL
* LAST CHANGE : August 27, 2010, Marcel Hendrix
*)

NEEDS -miscutil

REVISION -scanf "--- Scanf reader Version 0.03 ---"

PRIVATES

DOC
(*
Limitation #1: Line length should be less than 4096 characters.

Limitation #2: Reads up to 256 items per line. Items are either strings or
floating-point values (integers and doubles are converted). Strings
are auto allocated and never deallocated, deallocation is the job of
the program ( FREE the returned address.)

The format string states the sequence that is expected with %<char>[*] markers.

Recognized %<char> markers are:
%s -- a string, not containing white space
n %sx -- a string, delimited by ASCII n
%d -- a single-precision integer value
%d2 -- a double integer
%g -- a float value
n %c -- the character with ASCII value n (normally only used as "n %c*" )

With %<char>* the corresponding item is discarded and not stored (skip item).
The accepted items are stored sequentially in a read-only array[16] of pointers.
3 %g@ retrieves the fourth read (float) item.
1 %s@ retrieves the string read as the second item.
*)
ENDDOC

#4096 =: /line PRIVATE
#256 =: /items PRIVATE

0 VALUE eof? PRIVATE
0 VALUE line# PRIVATE
0 VALUE #size PRIVATE
0 VALUE #esize PRIVATE
0 VALUE handle PRIVATE
0 VALUE #items PRIVATE ( permanent )
0 VALUE rptr PRIVATE

CREATE NUMPAD PRIVATE /line 2+ CHARS ALLOT
CREATE rvals PRIVATE /items CELLS ALLOT
CREATE fvals PRIVATE /items FLOATS ALLOT

: %g@ fvals []FLOAT F@ ; ( ix -- ) ( F: -- r )
: %d@ %g@ F>S ; ( ix -- ) ( -- n )
: %d2@ %g@ F>D ; ( ix -- ) ( -- d )
: %c@ rvals []CELL @ ; ( ix -- ) ( -- c )
: %s@ %c@ @+ ; ( ix -- c-addr u )

: GET-INDEX ( -- n ) #items ;
: RESET-INDEX ( -- ) CLEAR #items ; -- does not deallocate strings; do that yourself when needed!

: CLOSE-DATAFILE ( -- )
handle CLOSE-FILE ?FILE
CLEAR handle CLEAR line# CLEAR eof? ;

: >>ITEMS ( c-addr u -- )
#items /items >= IF CLOSE-DATAFILE TRUE ABORT" >>ITEMS :: too many items, overflow in rvals[]" ENDIF
DUP CELL+ ALLOCATE ?ALLOCATE
( -- c-addr1 u1 addr2 ) CELLPACK
rvals #items CELL[] ! 1 +TO #items ; PRIVATE

: >>FITEMS ( F: r -- )
#items /items >= IF CLOSE-DATAFILE TRUE ABORT" >>FITEMS :: too many items, overflow in fvals[]" ENDIF
fvals #items FLOAT[] F! 1 +TO #items ; PRIVATE

: >>CITEMS ( c -- )
#items /items >= IF CLOSE-DATAFILE TRUE ABORT" >>CITEMS :: too many items, overflow in rvals[]" ENDIF
rvals #items CELL[] ! 1 +TO #items ; PRIVATE

: OPEN-DATAFILE ( c-addr u -- )
handle IF CR ." OPEN-DATAFILE :: " &" EMIT TYPE &" EMIT ." -- file already open."
FALSE TO eof? CLEAR line# CLEAR #items
0. handle REPOSITION-FILE ?FILE EXIT
ENDIF
R/O BIN OPEN-FILE ?FILE TO handle
FALSE TO eof? CLEAR line# CLEAR #items ;

: .EOF ( c-addr u -- )
CR TYPE ." :: <EOF> in "
HERE #256 handle FID>NAME DROP TYPE ." at line " line# DEC.
CR ." current input is " &" EMIT NUMPAD #esize TYPE &" EMIT ABORT ; PRIVATE

: READ-NEXTLINE ( -- )
1 +TO line#
NUMPAD /line 2+ handle READ-LINE ?FILE
0= IF TRUE TO eof? S" READ-NEXTLINE" .EOF ENDIF
DUP TO #esize TO #size NUMPAD TO rptr ;

-- Quickly finds the interesting part of a file (skips headers etc.)
: MARKED-WITH ( c-addr u -- )
DLOCAL str
BEGIN eof? 0= WHILE READ-NEXTLINE
NUMPAD #size str SEARCH NIP NIP
UNTIL THEN
eof? IF S" MARKED-WITH" .EOF ENDIF ;

: GET$ ( delimiter -- c-addr u )
LOCAL delimiter
rptr #size delimiter SKIP
delimiter Split-At-Char TO #size TO rptr ; PRIVATE

: FGET$ ( -- c-addr u ) BL GET$ ;

: FGETF ( F: -- r )
rptr #size -LEADING-WS Split-At-WS TO #size TO rptr
( c-addr u -- ) >FLOAT
0= IF S" FGETF" .EOF CLOSE-DATAFILE TRUE ABORT" FGETF :: invalid number" ENDIF ;

: FGETI ( -- n ) FGETF F>S ;

: FGETC ( -- char )
#size 0<= IF S" FGETC" .EOF CLOSE-DATAFILE TRUE ABORT" FGETC :: empty line" ENDIF
rptr C@ 1 +TO rptr 1 -TO #size ;

WARNING @ WARNING OFF
: %sx GET$ >>ITEMS ; ( delimiter -- )
: %sx* GET$ 2DROP ; ( delimiter -- )
: %d FGETF >>FITEMS ;
: %d* FGETF FDROP ;
: %c FGETC >>CITEMS ;
: %c* FGETC DROP ;
: %s BL %sx ;
: %s* BL %sx* ;
: %d2 %d ;
: %g %d ;
: %d2* %d* ;
: %g* %d* ;
WARNING !

:ABOUT CR ." OPEN-DATAFILE ( c-addr u -- ) -- open file"
CR ." CLOSE-DATAFILE ( -- ) -- close file"
CR ." READ-NEXTLINE ( -- ) -- read a full line but do not parse it"
CR ." MARKED-WITH ( c-addr u -- ) -- go to the first line with this string on it (no parsing)"
CR
CR ." FGET$ FGETF FGETI FGETC -- parse next typed item (don't store, return on stack)"
CR ." From the current line, parse and store internally:"
CR ." %s %s* -- read a blank delimited string, optionally discard it"
CR ." %sx %sx* ( n -- ) -- read n-delimited string, optionally discard it"
CR ." %d %d* -- read integer, optionally discard it"
CR ." %d2 %d2* -- read double integer, optionally discard it"
CR ." %g %g* -- read a float, optionally discard it"
CR ." RESET-INDEX -- free up space for next items (before CLOSE-DATAFILE)"
CR
CR ." Retrieve values at any point (#items available)"
CR ." %c@ ( ix -- c ) -- fetch a character"
CR ." %s@ ( ix -- c-addr u ) -- fetch a string value, deallocate yourself"
CR ." %g@ ( ix -- ) ( F: -- r ) -- fetch a float value"
CR ." %d@ ( ix -- n ) -- fetch an integer value"
CR ." %d2@ ( ix -- d ) -- fetch a double integer value" ;

NESTING @ 1 = [IF] .ABOUT -scanf CR [THEN]
DEPRIVE

(* End of Source *)

Re: Updated String Parsing Words in kForth

<6249acb9$0$692$14726298@news.sunsite.dk>

https://www.novabbs.com/devel/article-flat.php?id=17454&group=comp.lang.forth#17454

Newsgroups: comp.lang.forth
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!dotsrc.org!filter.dotsrc.org!news.dotsrc.org!not-for-mail
Date: Sun, 3 Apr 2022 10:18:30 -0400
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0)
Gecko/20100101 Thunderbird/91.7.0
Subject: Re: Updated String Parsing Words in kForth
Content-Language: en-US
Newsgroups: comp.lang.forth
References: <t0jcsp$9g4$1@dont-email.me>
<2022Mar13.103712@mips.complang.tuwien.ac.at> <t0lumk$8jf$1@dont-email.me>
<2022Mar18.095852@mips.complang.tuwien.ac.at> <t125t2$agt$1@dont-email.me>
<t14quq$2hu$1@dont-email.me> <t17p41$ntj$1@dont-email.me>
<1de42645-1f88-4cd6-8692-abcb0db5841bn@googlegroups.com>
<t202km$bjl$1@dont-email.me> <62442315$0$699$14726298@news.sunsite.dk>
<t29luq$rv9$1@dont-email.me>
From: dhoffman...@gmail.com (Doug Hoffman)
In-Reply-To: <t29luq$rv9$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 44
Message-ID: <6249acb9$0$692$14726298@news.sunsite.dk>
Organization: SunSITE.dk - Supporting Open source
NNTP-Posting-Host: d5362fa6.news.sunsite.dk
X-Trace: 1648995513 news.sunsite.dk 692 glidedog@gmail.com/68.55.82.126:52114
X-Complaints-To: staff@sunsite.dk
 by: Doug Hoffman - Sun, 3 Apr 2022 14:18 UTC

On 4/2/22 10:18 AM, Krishna Myneni wrote:

> I imagine you should be
> able to add named references to individual fields.

Yes. One way:

: myfield ( n -- ) create , does> @ swap :at ;
: enumk 0 do i myfield loop ;
8 enumk NAME1 NAME2 NAME3 NAME4 NAME5 NAME6 NAME7 NAME8

\ Example use for file h2lines.dat
1 records :at name3 :. \ => 1092.195000 ok

> The general case has commas inside of a ... text field
> enclosed in quotes.

It was not hard to modify :split to handle that:

: :split-csv ( in-str -- 1-arry-obj )
  >array 0 0 >string 0 locals| doing"? ac-str arr in-str |
  in-str :reset
  in-str :@ ( adr len ) bounds ?do
    i c@ dup '"' =
    if drop doing"? invert to doing"?
    else
      dup ',' =
      if doing"? if ac-str :ch+
                 else drop ac-str arr :add
                      0 0 >string to ac-str
                 then
      else ac-str :ch+
      then
    then
  loop ac-str arr :add
  arr ;

: :split ( char str-obj -- 1-arry-obj )
over ',' = if nip :split-csv exit then :split ;

Both files now parse correctly. Naming fields is optional.
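
The same quote-tracking idea can be sketched without the object words, for readers on a plain standard system. This is a hedged illustration only: FIELD-END and IN-QUOTES are hypothetical names, S\" assumes a Forth-2012 system, and unlike :split-csv above the sketch leaves the quote characters inside the field instead of dropping them.

```forth
\ Hypothetical sketch; FIELD-END and IN-QUOTES are illustrative names.
variable in-quotes               \ true while scanning inside "..."

\ Offset of the first comma that lies outside double quotes,
\ or u if the string holds no such comma.
: field-end ( a u -- u' )
  false in-quotes !
  dup 0 ?do
    over i + c@
    dup [char] " = if in-quotes @ invert in-quotes ! then
    [char] , = in-quotes @ 0= and
    if 2drop i unloop exit then
  loop nip ;

\ Example: the comma inside the quotes is skipped, so the result is
\ the offset of the comma that follows the closing quote.
: demo ( -- u' )  s\" \"a,b\",c" field-end ;
```

Here `demo` should return 5, the length of the quoted first field `"a,b"` including its quotes.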

-Doug

Re: Updated String Parsing Words in kForth

<8cc3ed6c-9084-4ca2-b788-37e537e384dan@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17463&group=comp.lang.forth#17463

Newsgroups: comp.lang.forth
X-Received: by 2002:a05:6214:2529:b0:443:7b3d:2d5c with SMTP id gg9-20020a056214252900b004437b3d2d5cmr16100293qvb.50.1649063933903;
Mon, 04 Apr 2022 02:18:53 -0700 (PDT)
X-Received: by 2002:a05:6214:1cc3:b0:443:689a:9b72 with SMTP id
g3-20020a0562141cc300b00443689a9b72mr26029470qvd.125.1649063933719; Mon, 04
Apr 2022 02:18:53 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Mon, 4 Apr 2022 02:18:53 -0700 (PDT)
In-Reply-To: <6249acb9$0$692$14726298@news.sunsite.dk>
Injection-Info: google-groups.googlegroups.com; posting-host=82.95.228.79; posting-account=Ebqe4AoAAABfjCRL4ZqOHWv4jv5ZU4Cs
NNTP-Posting-Host: 82.95.228.79
References: <t0jcsp$9g4$1@dont-email.me> <2022Mar13.103712@mips.complang.tuwien.ac.at>
<t0lumk$8jf$1@dont-email.me> <2022Mar18.095852@mips.complang.tuwien.ac.at>
<t125t2$agt$1@dont-email.me> <t14quq$2hu$1@dont-email.me> <t17p41$ntj$1@dont-email.me>
<1de42645-1f88-4cd6-8692-abcb0db5841bn@googlegroups.com> <t202km$bjl$1@dont-email.me>
<62442315$0$699$14726298@news.sunsite.dk> <t29luq$rv9$1@dont-email.me> <6249acb9$0$692$14726298@news.sunsite.dk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8cc3ed6c-9084-4ca2-b788-37e537e384dan@googlegroups.com>
Subject: Re: Updated String Parsing Words in kForth
From: the.beez...@gmail.com (Hans Bezemer)
Injection-Date: Mon, 04 Apr 2022 09:18:53 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 88
 by: Hans Bezemer - Mon, 4 Apr 2022 09:18 UTC

On Sunday, April 3, 2022 at 4:18:35 PM UTC+2, Doug Hoffman wrote:
Ok, this is my contribution. Don't whine about "it's not ANS" - it wasn't intended to. So if you wanna use it, do the work. I'll give you a few hints:

: c/string over >r chop r> c@ ;
: (unumber) 0. 2swap >number ;
: >zero dup xor ;
<largest possible integer> constant max-n

And here she comes:

\ Like sscanf(), but with a few differences
\ - Whitespace in both buffer and format string is largely ignored;
\ - %c with a width REQUIRES a string, %c without REQUIRES a variable;
\ - %s will stop on ANY defined delimiter, not just whitespace;
\ - Returns the unscanned part of the format string;
\ - Variable #SCANF returns the number of assignments made;
\ - Stack diagram of variables is inverted: d c b a s" %a %b %c %d";
\ - On failure, unused variables WILL remain on the stack.

\ Typical use:
\ a b c$ d$ e f g
\ s" %c%c %d%%%s, %4c%c%c" s" ab -12345%This is the end, 543210" sscanf

variable (delim) \ delimiter of string
variable (width) \ width of string
variable #scanf \ number of assignments
\ a few execution tokens
: (dec) decimal ; : (hex) hex ; : (oct) octal ;
: (putback) 1+ swap char- swap ; ( a n -- a-1 n+1)
: (unumber) 0 -rot >snumber ; ( a1 n1 -- n3 a2 n2)
: (number) base @ >r execute (unumber) r> base ! ;
: (delimiter!) dup if c/string >r (putback) r> else dup then (delim) ! ;
: (delimiter?) over [char] ! - 0< over bl = and >r = r> or ;
: (getwidth) ['] (dec) (number) rot dup unless max-n + then ;
: (assigned) 1 #scanf +! false ; ( -- f)
: (assign) swap ! (assigned) ; ( x n -- f)
: (place) rot place (assigned) ; ( a1 a2 n2 -- f)
\ get sign flag
: (sign) ( a1 n1 -- f a2 n2)
c/string dup [char] - = dup >r \ is it a minus sign?
if drop else [char] + <> if (putback) then then r> -rot
; \ drop plus sign
\ skip white space
: (skipwhite) ( a1 n1 -- a2 n2)
begin dup while c/string bl > dup >r if (putback) then r> until
; \ parse buffer string
: (getstr) ( a1 n1 -- a2 n2 a3 n3)
over >r begin \ save starting address
dup \ any string left?
while \ if so, still within width?
over r@ - (width) @ < \ did we hit the delimiter?
while \ if so, put back the character
c/string (delim) @ (delimiter?) dup if >r (putback) r> then
until over r@ - r> swap 2swap \ calculate string dimensions
; \ handle type specifiers
: (gettype) ( a1 n1 a2 n2 -- f)
2>r (getwidth) (width) ! c/string >r (delimiter!) r> -rot
2r> 2swap 2>r (skipwhite) rot \ get properties, skip whitespace
case \ select type specifier
[char] d of \ 'd' requires a sign
(sign) ['] (dec) (number) 2>r swap if negate then (assign) endof
[char] u of ['] (dec) (number) 2>r (assign) endof
[char] x of ['] (hex) (number) 2>r (assign) endof
[char] o of ['] (oct) (number) 2>r (assign) endof
[char] % of c/string [char] % <> -rot 2>r endof
[char] s of (getstr) 2>r (place) endof
[char] c of \ if default width specified
(width) @ max-n = if \ just parse a single character
c/string -rot 2>r (assign) \ if a width has been specified
else \ take the entire length specified
2dup (width) @ /string 0 max 2>r (width) @ min (place)
then endof \ and put it in a string
endcase 2r> rot 2r> rot >r 2swap r>
; \ select character on format string
: (select) ( a1 n1 c a2 n2 -- a3 n3 a4 n4 f)
rot dup >r \ save current character
[char] % = if rdrop (gettype) ;then \ act on type specifier
r@ bl = if (skipwhite) 2swap (skipwhite) 2swap r> >zero ;then
c/string r> <> \ does this character match
;

: sscanf ( xn .. x0 a1 n1 a2 n2 -- a3 n3)
0 #scanf ! 2>r begin dup while c/string 2r> (select) -rot 2>r until 2rdrop
;

Hans Bezemer

Re: Updated String Parsing Words in kForth

<624afa89$0$706$14726298@news.sunsite.dk>

https://www.novabbs.com/devel/article-flat.php?id=17464&group=comp.lang.forth#17464

Newsgroups: comp.lang.forth
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!dotsrc.org!filter.dotsrc.org!news.dotsrc.org!not-for-mail
Date: Mon, 4 Apr 2022 10:02:45 -0400
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0)
Gecko/20100101 Thunderbird/91.7.0
Subject: Re: Updated String Parsing Words in kForth
Content-Language: en-US
Newsgroups: comp.lang.forth
References: <t0jcsp$9g4$1@dont-email.me>
<2022Mar13.103712@mips.complang.tuwien.ac.at> <t0lumk$8jf$1@dont-email.me>
<2022Mar18.095852@mips.complang.tuwien.ac.at> <t125t2$agt$1@dont-email.me>
<t14quq$2hu$1@dont-email.me> <t17p41$ntj$1@dont-email.me>
<1de42645-1f88-4cd6-8692-abcb0db5841bn@googlegroups.com>
<t202km$bjl$1@dont-email.me> <62442315$0$699$14726298@news.sunsite.dk>
<t29luq$rv9$1@dont-email.me> <6249acb9$0$692$14726298@news.sunsite.dk>
<8cc3ed6c-9084-4ca2-b788-37e537e384dan@googlegroups.com>
From: dhoffman...@gmail.com (Doug Hoffman)
In-Reply-To: <8cc3ed6c-9084-4ca2-b788-37e537e384dan@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 9
Message-ID: <624afa89$0$706$14726298@news.sunsite.dk>
Organization: SunSITE.dk - Supporting Open source
NNTP-Posting-Host: 5c748849.news.sunsite.dk
X-Trace: 1649080969 news.sunsite.dk 706 glidedog@gmail.com/68.55.82.126:52663
X-Complaints-To: staff@sunsite.dk
 by: Doug Hoffman - Mon, 4 Apr 2022 14:02 UTC

On 4/4/22 5:18 AM, Hans Bezemer wrote:

> Ok, this is my contribution. Don't whine about "it's not ANS" - it
> wasn't intended to. So if you wanna use it, do the work.
Looks good to me, though I didn't test it. Also looks to be well
factored and documented. I'll not complain about non-Ans as I often use
oForth. ;-)

-Doug

Re: Updated String Parsing Words in kForth

<f0506731-e2c0-46ad-a160-b5bac9915f82n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17465&group=comp.lang.forth#17465

Newsgroups: comp.lang.forth
X-Received: by 2002:a05:6214:21cf:b0:42d:cc:4121 with SMTP id d15-20020a05621421cf00b0042d00cc4121mr468754qvh.70.1649085290802;
Mon, 04 Apr 2022 08:14:50 -0700 (PDT)
X-Received: by 2002:ad4:5ca3:0:b0:440:f131:a7a4 with SMTP id
q3-20020ad45ca3000000b00440f131a7a4mr98152qvh.16.1649085290656; Mon, 04 Apr
2022 08:14:50 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Mon, 4 Apr 2022 08:14:50 -0700 (PDT)
In-Reply-To: <624afa89$0$706$14726298@news.sunsite.dk>
Injection-Info: google-groups.googlegroups.com; posting-host=82.95.228.79; posting-account=Ebqe4AoAAABfjCRL4ZqOHWv4jv5ZU4Cs
NNTP-Posting-Host: 82.95.228.79
References: <t0jcsp$9g4$1@dont-email.me> <2022Mar13.103712@mips.complang.tuwien.ac.at>
<t0lumk$8jf$1@dont-email.me> <2022Mar18.095852@mips.complang.tuwien.ac.at>
<t125t2$agt$1@dont-email.me> <t14quq$2hu$1@dont-email.me> <t17p41$ntj$1@dont-email.me>
<1de42645-1f88-4cd6-8692-abcb0db5841bn@googlegroups.com> <t202km$bjl$1@dont-email.me>
<62442315$0$699$14726298@news.sunsite.dk> <t29luq$rv9$1@dont-email.me>
<6249acb9$0$692$14726298@news.sunsite.dk> <8cc3ed6c-9084-4ca2-b788-37e537e384dan@googlegroups.com>
<624afa89$0$706$14726298@news.sunsite.dk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f0506731-e2c0-46ad-a160-b5bac9915f82n@googlegroups.com>
Subject: Re: Updated String Parsing Words in kForth
From: the.beez...@gmail.com (Hans Bezemer)
Injection-Date: Mon, 04 Apr 2022 15:14:50 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Hans Bezemer - Mon, 4 Apr 2022 15:14 UTC

On Monday, April 4, 2022 at 4:02:53 PM UTC+2, Doug Hoffman wrote:
> On 4/4/22 5:18 AM, Hans Bezemer wrote:
> Looks good to me, though I didn't test it. Also looks to be well
> factored and documented. I'll not complain about non-Ans as I often use
> oForth. ;-)
Thx! I hope it will be useful. Last time I posted a snippet it went into a LONG discussion
about Comus words. As if BOUNDS or PLACE came from a different planet..

I forgot a little word : CHOP 1- SWAP CHAR+ SWAP ; or : CHOP 1 /STRING ;

And thanks for the nice words ;-)

Hans Bezemer

Re: Updated String Parsing Words in kForth

<e297ce91-ce3e-4e07-9e75-3c4888c1a14cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17466&group=comp.lang.forth#17466

Newsgroups: comp.lang.forth
X-Received: by 2002:a05:622a:5d1:b0:2e0:70c7:1678 with SMTP id d17-20020a05622a05d100b002e070c71678mr1026685qtb.43.1649093808744;
Mon, 04 Apr 2022 10:36:48 -0700 (PDT)
X-Received: by 2002:a05:620a:240c:b0:680:a0f6:af19 with SMTP id
d12-20020a05620a240c00b00680a0f6af19mr719793qkn.110.1649093808560; Mon, 04
Apr 2022 10:36:48 -0700 (PDT)
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Mon, 4 Apr 2022 10:36:48 -0700 (PDT)
In-Reply-To: <f0506731-e2c0-46ad-a160-b5bac9915f82n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=82.95.228.79; posting-account=Ebqe4AoAAABfjCRL4ZqOHWv4jv5ZU4Cs
NNTP-Posting-Host: 82.95.228.79
References: <t0jcsp$9g4$1@dont-email.me> <2022Mar13.103712@mips.complang.tuwien.ac.at>
<t0lumk$8jf$1@dont-email.me> <2022Mar18.095852@mips.complang.tuwien.ac.at>
<t125t2$agt$1@dont-email.me> <t14quq$2hu$1@dont-email.me> <t17p41$ntj$1@dont-email.me>
<1de42645-1f88-4cd6-8692-abcb0db5841bn@googlegroups.com> <t202km$bjl$1@dont-email.me>
<62442315$0$699$14726298@news.sunsite.dk> <t29luq$rv9$1@dont-email.me>
<6249acb9$0$692$14726298@news.sunsite.dk> <8cc3ed6c-9084-4ca2-b788-37e537e384dan@googlegroups.com>
<624afa89$0$706$14726298@news.sunsite.dk> <f0506731-e2c0-46ad-a160-b5bac9915f82n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e297ce91-ce3e-4e07-9e75-3c4888c1a14cn@googlegroups.com>
Subject: Re: Updated String Parsing Words in kForth
From: the.beez...@gmail.com (Hans Bezemer)
Injection-Date: Mon, 04 Apr 2022 17:36:48 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 101
 by: Hans Bezemer - Mon, 4 Apr 2022 17:36 UTC

On Monday, April 4, 2022 at 5:14:52 PM UTC+2, Hans Bezemer wrote:
I thought, "let's be nice". It took me a few minutes to port. Seems to run ok, but your mileage may vary:

---8<---
\ 4tH library - SSCANF - Copyright 2022 J.L. Bezemer
\ You can redistribute this file and/or modify it under
\ the terms of the GNU General Public License

\ Like sscanf(), but with a few differences
\ - Whitespace in both buffer and format string is largely ignored;
\ - %c with a width REQUIRES a string, %c without REQUIRES a variable;
\ - %s will stop on ANY defined delimiter, not just whitespace;
\ - Returns the unscanned part of the format string;
\ - Variable #SCANF returns the number of assignments made;
\ - Stack diagram of variables is inverted: d c b a s" %a %b %c %d";
\ - On failure, unused variables WILL remain on the stack.

\ Typical use:
\ a b c$ d$ e f g
\ s" %c%c %d%%%s, %4c%c%c" s" ab -12345%This is the end, 543210" sscanf

s" MAX-N" environment? \ query environment
[IF] \ if successful
constant max-n \ create constant MAX-N
[ELSE]
.( Warning: MAX-N undefined) cr
[THEN]

variable (delim) \ delimiter of string
variable (width) \ width of string
variable #scanf \ number of assignments
\ ANS Forth interface
: char- 1 chars - ; ( a -- a-1)
: >zero dup xor ; ( n -- 0)
: ;then postpone exit postpone then ; immediate
: unless postpone 0= postpone if ; immediate
: c/string over >r 1- swap char+ swap r> c@ ; ( a n -- a n-1 c)
\ a few execution tokens
: (dec) decimal ; : (hex) hex ; : (oct) 8 base ! ;
: (putback) 1+ swap char- swap ; ( a n -- a-1 n+1)
: (unumber) 0. 2swap >number 2>r d>s 2r> ; ( a1 n1 -- n3 a2 n2)
: (number) base @ >r execute (unumber) r> base ! ;
: (width!) ['] (dec) (number) rot dup unless max-n + then (width) ! ;
: (delimiter!) dup if c/string >r (putback) r> else dup then (delim) ! ;
: (delimiter?) over [char] ! - 0< over bl = and >r = r> or ;
: (assigned) 1 #scanf +! false ; ( -- f)
: (assign) swap ! (assigned) ; ( x n -- f)
: (place) rot place (assigned) ; ( a1 a2 n2 -- f)
\ get sign flag
: (sign) ( a1 n1 -- f a2 n2)
c/string dup [char] - = dup >r \ is it a minus sign?
if drop else [char] + <> if (putback) then then r> -rot
; \ drop plus sign
\ skip white space
: (skipwhite) ( a1 n1 -- a2 n2)
begin dup while c/string bl > dup >r if (putback) then r> until then
; \ parse buffer string
: (getstr) ( a1 n1 -- a2 n2 a3 n3)
over >r begin \ save starting address
dup \ any string left?
while \ if so, still within width?
over r@ - (width) @ < \ did we hit the delimiter?
while \ if so, put back the character
c/string (delim) @ (delimiter?) dup if >r (putback) r> then
until then then
over r@ - r> swap 2swap \ calculate string dimensions
; \ handle type specifiers
: (gettype) ( a1 n1 a2 n2 -- f)
2>r (width!) c/string >r (delimiter!) r> -rot
2r> 2swap 2>r (skipwhite) rot \ get properties, skip whitespace
case \ select type specifier
[char] d of \ 'd' requires a sign
(sign) ['] (dec) (number) 2>r swap if negate then (assign) endof
[char] u of ['] (dec) (number) 2>r (assign) endof
[char] x of ['] (hex) (number) 2>r (assign) endof
[char] o of ['] (oct) (number) 2>r (assign) endof
[char] % of c/string [char] % <> -rot 2>r endof
[char] s of (getstr) 2>r (place) endof
[char] c of \ if default width specified
(width) @ max-n = if \ just parse a single character
c/string -rot 2>r (assign) \ if a width has been specified
else \ take the entire length specified
2dup (width) @ /string 0 max 2>r (width) @ min (place)
then endof \ and put it in a string
endcase 2r> rot 2r> rot >r 2swap r> \ restore stack and pull up flag
; \ select character on format string
: (select) ( a1 n1 c a2 n2 -- a3 n3 a4 n4 f)
rot dup >r \ save current character
[char] % = if rdrop (gettype) ;then \ act on type specifier
r@ bl = if (skipwhite) 2swap (skipwhite) 2swap r> >zero ;then
c/string r> <> \ does this character match
;

: sscanf ( xn .. x0 a1 n1 a2 n2 -- a3 n3)
0 #scanf ! 2>r begin dup while c/string 2r> (select) -rot 2>r until then 2rdrop
;
---8<---

Hans Bezemer
