
C vs Haskell for XML parsing

<576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27748&group=comp.lang.c#27748

X-Received: by 2002:ac8:7d46:0:b0:403:acaa:abe0 with SMTP id h6-20020ac87d46000000b00403acaaabe0mr10770qtb.8.1692171070093;
Wed, 16 Aug 2023 00:31:10 -0700 (PDT)
X-Received: by 2002:a05:6a00:22cb:b0:675:b734:d2fe with SMTP id
f11-20020a056a0022cb00b00675b734d2femr601169pfj.3.1692171069464; Wed, 16 Aug
2023 00:31:09 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 16 Aug 2023 00:31:08 -0700 (PDT)
Injection-Info: google-groups.googlegroups.com; posting-host=2a00:23a8:400a:5601:a6:dd70:dbcb:58c;
posting-account=Dz2zqgkAAADlK5MFu78bw3ab-BRFV4Qn
NNTP-Posting-Host: 2a00:23a8:400a:5601:a6:dd70:dbcb:58c
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
Subject: C vs Haskell for XML parsing
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Wed, 16 Aug 2023 07:31:10 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 1360
 by: Malcolm McLean - Wed, 16 Aug 2023 07:31 UTC

Some people here are interested in Haskell.
They might be interested in this:

https://chrisdone.com/posts/fast-haskell-c-parsing-xml/

Of course it's written from a pro-Haskell point of view, and writing an improved version when you've got the C in front of you isn't really a fair test. But he does match C for speed.

Re: C vs Haskell for XML parsing

<ubi7hd$38q7d$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=27753&group=comp.lang.c#27753

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bc...@freeuk.com (Bart)
Newsgroups: comp.lang.c
Subject: Re: C vs Haskell for XML parsing
Date: Wed, 16 Aug 2023 11:14:06 +0100
Organization: A noiseless patient Spider
Lines: 18
Message-ID: <ubi7hd$38q7d$1@dont-email.me>
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 16 Aug 2023 10:14:05 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="17f8aedb09110eed4ec1c7adfd187d70";
logging-data="3434733"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+SVMyV6RcaFX3dumzt5CNpHcSv76oLqcA="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.14.0
Cancel-Lock: sha1:1Za8Kp0EKWps0EYI4VTUHmxGxvY=
In-Reply-To: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
 by: Bart - Wed, 16 Aug 2023 10:14 UTC

On 16/08/2023 08:31, Malcolm McLean wrote:
> Some people here are interested in Haskell.
> They might be interested in this:
>
> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
>
> Of course it's written from a pro-Haskell point of view, and writing an improved version when you've got the C in front of you isn't really a fair test. But he does match C for speed.
>

"Portability (i.e. Windows) is a pain in the arse with C."

I wonder what makes them say that?

Reading from a file must be the world's most portable kind of program.
While issues with filenames and paths will be the same whatever the
language.

So what is it?

Re: C vs Haskell for XML parsing

<87ttsyfuxa.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=27815&group=comp.lang.c#27815

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.c
Subject: Re: C vs Haskell for XML parsing
Date: Thu, 17 Aug 2023 00:07:29 +0100
Organization: A noiseless patient Spider
Lines: 21
Message-ID: <87ttsyfuxa.fsf@bsb.me.uk>
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="c666b555a840b444cd6c032219f8b5e7";
logging-data="3653731"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18dOgDC70AR567RDPooiXzTdC1Wbjo391U="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:ZEXPYiSDuaKiNGAehLVrFjn4rss=
sha1:TXxvr00XKExyS40p4Qxya+WJwIo=
X-BSB-Auth: 1.e7d324c3763454823f82.20230817000729BST.87ttsyfuxa.fsf@bsb.me.uk
 by: Ben Bacarisse - Wed, 16 Aug 2023 23:07 UTC

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> Some people here are interested in Haskell.
> They might be interested in this:
>
> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/

Interesting. Thanks.

> Of course it's written from a pro-Haskell point of view, and writing
> an improved version when you've got the C in front of you isn't really
> a fair test. But he does match C for speed.

A lot of the refinements are ones one might try anyway, so I don't think
it's as directed as you suggest. Sadly, a lot of the improvements come
from targeted hints to a clever implementation so it's not always
Haskell the language that is giving the speed, but the phenomenal piece
of work that is the Glasgow Haskell compiler.

--
Ben.

Re: C vs Haskell for XML parsing

<87o7j6fu74.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=27817&group=comp.lang.c#27817

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.c
Subject: Re: C vs Haskell for XML parsing
Date: Thu, 17 Aug 2023 00:23:11 +0100
Organization: A noiseless patient Spider
Lines: 36
Message-ID: <87o7j6fu74.fsf@bsb.me.uk>
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="c666b555a840b444cd6c032219f8b5e7";
logging-data="3657358"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/SWkpZNmoYYo3Sw7jv888/OTlt27hXohU="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:6MQXpymSd/Y81b3afJ857PSNhww=
sha1:Rt7FEsqRhDqIiLdeao13R916B+8=
X-BSB-Auth: 1.26de6dc6c6a614f16297.20230817002311BST.87o7j6fu74.fsf@bsb.me.uk
 by: Ben Bacarisse - Wed, 16 Aug 2023 23:23 UTC

Bart <bc@freeuk.com> writes:

> On 16/08/2023 08:31, Malcolm McLean wrote:
>> Some people here are interested in Haskell.
>> They might be interested in this:
>> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
>> Of course it's written from a pro-Haskell point of view, and writing an
>> improved version when you've got the C in front of you isn't really a
>> fair test. But he does match C for speed.
>>
>
> "Portability (i.e. Windows) is a pain in the arse with C."
>
> I wonder what makes them say that?

Yes, I wondered that too, since the cut-down XML parsing they are doing
is one of the most potentially portable bits of C one could write (as
you say yourself):

> Reading from a file must be the world's most portable kind of
> program.

But reading more closely, the remark is a general one about dropping out
of a high-level language for some part of a program rather than being
specific to this task. None the less, I'd have liked a citation or
link.

> While issues with filenames and paths will be the same whatever
> the language.

Not always. Some languages have standard library functions to handle
such things (e.g. Python and Haskell). I imagine that's what the author
was thinking about.

--
Ben.

Re: C vs Haskell for XML parsing

<af9d6384-203a-48c0-afbf-3e50de15252fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27821&group=comp.lang.c#27821

X-Received: by 2002:a5d:51c3:0:b0:318:7887:5610 with SMTP id n3-20020a5d51c3000000b0031878875610mr26335wrv.1.1692231904996;
Wed, 16 Aug 2023 17:25:04 -0700 (PDT)
X-Received: by 2002:a17:902:f950:b0:1bb:a78c:7a3e with SMTP id
kx16-20020a170902f95000b001bba78c7a3emr1139288plb.3.1692231904039; Wed, 16
Aug 2023 17:25:04 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.128.88.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 16 Aug 2023 17:25:03 -0700 (PDT)
In-Reply-To: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=5.172.255.214; posting-account=Sb6m8goAAABbWsBL7gouk3bfLsuxwMgN
NNTP-Posting-Host: 5.172.255.214
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <af9d6384-203a-48c0-afbf-3e50de15252fn@googlegroups.com>
Subject: Re: C vs Haskell for XML parsing
From: profesor...@gmail.com (fir)
Injection-Date: Thu, 17 Aug 2023 00:25:04 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: fir - Thu, 17 Aug 2023 00:25 UTC

On Wednesday, 16 August 2023 at 09:31:17 UTC+2, Malcolm McLean wrote:
> Some people here are interested in Haskell.
> They might be interested in this:
>
> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
>
> Of course it's written from a pro-Haskell point of view, and writing an improved version when you've got the C in front of you isn't really a fair test. But he does match C for speed.

Many articles on programming are written by noobs/ponies, and judging from some sentences
here, the author is probably a noob/pony. (I wouldn't say definitely, as some things may depend on context, but a 90% chance of being a pony is enough to be wary of this kind of article.)

For example, it's terribly noobish to judge languages by the libraries written in them; it's preposterously stupid. Many ponies write pony code even in C: you yourself wrote that piece with fgetc in a loop which turned out to be so slow. If someone wants to judge and draw conclusions, better to compare really good code in C with really good code in the other language. Besides, all those conclusions are ultimately uninteresting, as I generally know what things look like in terms of speed (learning a language for its constructs may be valuable for some other reasons).

Re: C vs Haskell for XML parsing

<1d587f1a-6958-4dd2-bc02-45b47f535278n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27822&group=comp.lang.c#27822

X-Received: by 2002:a5d:6a49:0:b0:317:4797:ff11 with SMTP id t9-20020a5d6a49000000b003174797ff11mr26948wrw.1.1692232322696;
Wed, 16 Aug 2023 17:32:02 -0700 (PDT)
X-Received: by 2002:a17:903:1109:b0:1b5:2b14:5f2c with SMTP id
n9-20020a170903110900b001b52b145f2cmr1432192plh.4.1692232321772; Wed, 16 Aug
2023 17:32:01 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.128.88.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 16 Aug 2023 17:32:01 -0700 (PDT)
In-Reply-To: <ubi7hd$38q7d$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=5.172.255.214; posting-account=Sb6m8goAAABbWsBL7gouk3bfLsuxwMgN
NNTP-Posting-Host: 5.172.255.214
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com> <ubi7hd$38q7d$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1d587f1a-6958-4dd2-bc02-45b47f535278n@googlegroups.com>
Subject: Re: C vs Haskell for XML parsing
From: profesor...@gmail.com (fir)
Injection-Date: Thu, 17 Aug 2023 00:32:02 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: fir - Thu, 17 Aug 2023 00:32 UTC

On Wednesday, 16 August 2023 at 12:14:20 UTC+2, Bart wrote:
> On 16/08/2023 08:31, Malcolm McLean wrote:
> > Some people here are interested in Haskell.
> > They might be interested in this:
> >
> > https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
> >
> > Of course it's written from a pro-Haskell point of view, and writing an improved version when you've got the C in front of you isn't really a fair test. But he does match C for speed.
> >
> "Portability (i.e. Windows) is a pain in the arse with C."
>
> I wonder what makes them say that?
>
> Reading from a file must be the world's most portable kind of program.
> While issues with filenames and paths will be the same whatever the
> language.
>
> So what is it?

Ponies have special areas of pony trash talk. "Portability" is one of those areas; others are, for example, "standard/undefined behaviour", "don't reinvent the wheel", "OOP is good", etc. It's a pony world full of beliefs, myths, lies and general bullshit.

Re: C vs Haskell for XML parsing

<20230816173214.630@kylheku.com>

https://www.novabbs.com/devel/article-flat.php?id=27823&group=comp.lang.c#27823

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 864-117-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: C vs Haskell for XML parsing
Date: Thu, 17 Aug 2023 00:37:43 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <20230816173214.630@kylheku.com>
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me>
Injection-Date: Thu, 17 Aug 2023 00:37:43 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1aa1e972d16a2b9e389dd8d4860af990";
logging-data="3672940"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+ivmUe/n8EOts44Qr2x81Lsdq/UjOX1Qg="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:pTcyexB+KJRNqYxCCDfhpZbuIsI=
 by: Kaz Kylheku - Thu, 17 Aug 2023 00:37 UTC

On 2023-08-16, Bart <bc@freeuk.com> wrote:
> On 16/08/2023 08:31, Malcolm McLean wrote:
>> Some people here are interested in Haskell.
>> They might be interested in this:
>>
>> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
>>
>> Of course it's written from a pro-Haskell point of view, and writing an improved version when you've got the C in front of you isn't really a fair test. But he does match C for speed.
>>
>
> "Portability (i.e. Windows) is a pain in the arse with C."
>
> I wonder what makes them say that?
>
> Reading from a file must be the world's most portable kind of program.
> While issues with filenames and paths will be the same whatever the
> language.
>
> So what is it?

Portability of advanced programs that go beyond the standard C library.

For instance, say, a program that opens a serial device and sets the
baud rate, framing/parity bits and hardware handshaking, and then
reads from it --- with timeouts.

POSIX is quite different from Win32 in every way. Everything is done
differently: ranging from a bit differently, to radically.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
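
As a rough illustration of the serial-port point above, here is a minimal C sketch (not taken from the thread) of one small helper that has to be written twice: once against POSIX termios and once against the Win32 comm API. The name serial_open_9600(), the fixed 9600 baud 8N1 settings and the one-second read timeout are all assumptions chosen for the example, and CRTSCTS is a common extension rather than part of POSIX itself.

```c
#include <assert.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
typedef HANDLE serial_t;
#define SERIAL_INVALID INVALID_HANDLE_VALUE
#else
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>
typedef int serial_t;
#define SERIAL_INVALID (-1)
#endif

/* Hypothetical helper: open `path` at 9600 baud, 8 data bits, no parity,
   1 stop bit, hardware handshaking, reads timing out after about 1 second.
   Returns SERIAL_INVALID on any failure. */
serial_t serial_open_9600(const char *path)
{
    assert(path != NULL);
#ifdef _WIN32
    DCB dcb;
    COMMTIMEOUTS to;
    HANDLE h = CreateFileA(path, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                           OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return SERIAL_INVALID;
    memset(&dcb, 0, sizeof dcb);
    dcb.DCBlength = sizeof dcb;
    if (!GetCommState(h, &dcb)) {
        CloseHandle(h);
        return SERIAL_INVALID;
    }
    dcb.BaudRate = CBR_9600;
    dcb.ByteSize = 8;
    dcb.Parity = NOPARITY;
    dcb.StopBits = ONESTOPBIT;
    dcb.fOutxCtsFlow = TRUE;                 /* hardware handshaking */
    dcb.fRtsControl = RTS_CONTROL_HANDSHAKE;
    memset(&to, 0, sizeof to);
    to.ReadTotalTimeoutConstant = 1000;      /* milliseconds */
    if (!SetCommState(h, &dcb) || !SetCommTimeouts(h, &to)) {
        CloseHandle(h);
        return SERIAL_INVALID;
    }
    return h;
#else
    struct termios tio;
    int fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return SERIAL_INVALID;
    if (tcgetattr(fd, &tio) != 0) {
        close(fd);
        return SERIAL_INVALID;
    }
    /* Raw mode: no line editing, echo, signals or byte translation. */
    tio.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP |
                     INLCR | IGNCR | ICRNL | IXON);
    tio.c_oflag &= ~OPOST;
    tio.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    tio.c_cflag &= ~(CSIZE | PARENB | CSTOPB);
    tio.c_cflag |= CS8 | CREAD | CLOCAL;
#ifdef CRTSCTS
    tio.c_cflag |= CRTSCTS;                  /* hardware handshaking */
#endif
    tio.c_cc[VMIN] = 0;                      /* read() may return 0 bytes */
    tio.c_cc[VTIME] = 10;                    /* timeout in tenths of a second */
    cfsetispeed(&tio, B9600);
    cfsetospeed(&tio, B9600);
    if (tcsetattr(fd, TCSANOW, &tio) != 0) {
        close(fd);
        return SERIAL_INVALID;
    }
    return fd;
#endif
}
```

Even in this small case the two branches share almost nothing beyond the function signature, which is the kind of divergence the post describes.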

Re: C vs Haskell for XML parsing

<38ef19cc-ff88-4200-b7ba-cf4757c82bebn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27824&group=comp.lang.c#27824

X-Received: by 2002:adf:f052:0:b0:30f:c27c:a363 with SMTP id t18-20020adff052000000b0030fc27ca363mr25205wro.11.1692232836866;
Wed, 16 Aug 2023 17:40:36 -0700 (PDT)
X-Received: by 2002:a37:5ac4:0:b0:76d:3475:2e04 with SMTP id
o187-20020a375ac4000000b0076d34752e04mr37496qkb.3.1692232836045; Wed, 16 Aug
2023 17:40:36 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer01.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!feeder1.cambriumusenet.nl!feed.tweak.nl!209.85.128.88.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 16 Aug 2023 17:40:35 -0700 (PDT)
In-Reply-To: <20230816173214.630@kylheku.com>
Injection-Info: google-groups.googlegroups.com; posting-host=5.172.255.44; posting-account=Sb6m8goAAABbWsBL7gouk3bfLsuxwMgN
NNTP-Posting-Host: 5.172.255.44
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me> <20230816173214.630@kylheku.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <38ef19cc-ff88-4200-b7ba-cf4757c82bebn@googlegroups.com>
Subject: Re: C vs Haskell for XML parsing
From: profesor...@gmail.com (fir)
Injection-Date: Thu, 17 Aug 2023 00:40:36 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2701
 by: fir - Thu, 17 Aug 2023 00:40 UTC

On Thursday, 17 August 2023 at 02:37:57 UTC+2, Kaz Kylheku wrote:
> On 2023-08-16, Bart <b...@freeuk.com> wrote:
> > On 16/08/2023 08:31, Malcolm McLean wrote:
> >> Some people here are interested in Haskell.
> >> They might be interested in this:
> >>
> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
> >>
> >> Of course it's written from a pro-Haskell point of view, and writing an improved version when you've got the C in front of you isn't really a fair test. But he does match C for speed.
> >>
> >
> > "Portability (i.e. Windows) is a pain in the arse with C."
> >
> > I wonder what makes them say that?
> >
> > Reading from a file must be the world's most portable kind of program.
> > While issues with filenames and paths will be the same whatever the
> > language.
> >
> > So what is it?
> Portability of advanced programs that go beyond the standard C library.
>
> For instance, say, a program that opens a serial device and sets the
> baud rate, framing/parity bits and hardware handshaking, and then
> reads from it --- with timeouts.
>
> POSIX is quite different from Win32 in every way. Everything is done
> differently: ranging from a bit differently, to radically.
>
And in Haskell you get it more portable?

Re: C vs Haskell for XML parsing

<101175be-f5e7-46d5-8369-64278ec9a3b9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27825&group=comp.lang.c#27825

X-Received: by 2002:a5d:61c2:0:b0:317:5f7b:8c1f with SMTP id q2-20020a5d61c2000000b003175f7b8c1fmr25543wrv.10.1692233270030;
Wed, 16 Aug 2023 17:47:50 -0700 (PDT)
X-Received: by 2002:a17:902:ea0d:b0:1bc:7001:6e5c with SMTP id
s13-20020a170902ea0d00b001bc70016e5cmr1458059plg.3.1692233269244; Wed, 16 Aug
2023 17:47:49 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.128.87.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 16 Aug 2023 17:47:48 -0700 (PDT)
In-Reply-To: <1d587f1a-6958-4dd2-bc02-45b47f535278n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=5.172.255.44; posting-account=Sb6m8goAAABbWsBL7gouk3bfLsuxwMgN
NNTP-Posting-Host: 5.172.255.44
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me> <1d587f1a-6958-4dd2-bc02-45b47f535278n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <101175be-f5e7-46d5-8369-64278ec9a3b9n@googlegroups.com>
Subject: Re: C vs Haskell for XML parsing
From: profesor...@gmail.com (fir)
Injection-Date: Thu, 17 Aug 2023 00:47:50 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: fir - Thu, 17 Aug 2023 00:47 UTC

On Thursday, 17 August 2023 at 02:32:15 UTC+2, fir wrote:
> On Wednesday, 16 August 2023 at 12:14:20 UTC+2, Bart wrote:
> > On 16/08/2023 08:31, Malcolm McLean wrote:
> > > Some people here are interested in Haskell.
> > > They might be interested in this:
> > >
> > > https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
> > >
> > > Of course it's written from a pro-Haskell point of view, and writing an improved version when you've got the C in front of you isn't really a fair test. But he does match C for speed.
> > >
> > "Portability (i.e. Windows) is a pain in the arse with C."
> >
> > I wonder what makes them say that?
> >
> > Reading from a file must be the world's most portable kind of program.
> > While issues with filenames and paths will be the same whatever the
> > language.
> >
> > So what is it?
> Ponies have special areas of pony trash talk. "Portability" is one of those areas; others are, for example, "standard/undefined behaviour", "don't reinvent the wheel", "OOP is good", etc. It's a pony world full of beliefs, myths, lies and general bullshit.

This list of pony areas is of course longer, and it grows: the more things one inspects for oneself, the more of the myths ponies live in show up. That is why reading pony articles that propagate myths has, if anything, negative value.

Re: C vs Haskell for XML parsing

<37f1a926-972c-42c8-a276-8d3f6457ccb8n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27828&group=comp.lang.c#27828

X-Received: by 2002:a05:620a:8c02:b0:76d:7b5c:c87a with SMTP id qz2-20020a05620a8c0200b0076d7b5cc87amr7651qkn.4.1692247088195;
Wed, 16 Aug 2023 21:38:08 -0700 (PDT)
X-Received: by 2002:a05:6a00:1704:b0:687:a55f:a9ef with SMTP id
h4-20020a056a00170400b00687a55fa9efmr1887843pfc.2.1692247087644; Wed, 16 Aug
2023 21:38:07 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 16 Aug 2023 21:38:06 -0700 (PDT)
In-Reply-To: <87o7j6fu74.fsf@bsb.me.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=2a00:23a8:400a:5601:e1a0:67fe:7527:13f;
posting-account=Dz2zqgkAAADlK5MFu78bw3ab-BRFV4Qn
NNTP-Posting-Host: 2a00:23a8:400a:5601:e1a0:67fe:7527:13f
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me> <87o7j6fu74.fsf@bsb.me.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <37f1a926-972c-42c8-a276-8d3f6457ccb8n@googlegroups.com>
Subject: Re: C vs Haskell for XML parsing
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Thu, 17 Aug 2023 04:38:08 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Malcolm McLean - Thu, 17 Aug 2023 04:38 UTC

On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote:
> Bart <b...@freeuk.com> writes:
>
> > On 16/08/2023 08:31, Malcolm McLean wrote:
> >> Some people here are interested in Haskell.
> >> They might be interested in this:
> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
> >> Of course it's written from a pro-Haskell point of view, and writing an
> >> improved version when you've got the C in front of you isn't really a
> >> fair test. But he does match C for speed.
> >>
> >
> > "Portability (i.e. Windows) is a pain in the arse with C."
> >
> > I wonder what makes them say that?
> Yes, I wondered that too, since the cut-down XML parsing they are doing
> is one of the most potentially portable bits of C one could write (as
> you say yourself):
>
There are some gotchas with files, but not for the cut down parsing
they implement.
Windows used to accept "rt" for reading a text stream. And there's still
a mess with Unicode. And the XML people say that a parser must accept
UTF-16.
I implement this by having the lexer call a function pointer to read a
UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8
on the fly. If it's UTF-8, it's just an alias for fgetc().
But how do you know the file format? I have code that does this, but if
I called it, the XML parser would no longer be a single file module. So
I read the first few characters of the file, then seek back to the start
position.
But this only works on seekable streams. So the high-level parse function
which accepts a stream rather than a file name either has to insist on
a seekable stream, or it has to insist that the stream be in known format,
or the stream access function has to maintain a little buffer. The last
solution is the real one, but it's such a fiddly thing that instead I decided on the
known format (if you call with a FILE * rather than a filename, the data must
be in UTF-8).
But it is a complete pain which would have been avoided with a high-level
language which just loads a text file.
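
A minimal sketch of the function-pointer reader described above, not the actual parser code. The names `Utf8Reader`, `reader_init` and friends are hypothetical, and it assumes little-endian UTF-16 with no BOM handling. Malformed surrogate pairs simply end the stream here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reader: the lexer only ever calls r->next(r). */
typedef struct Utf8Reader Utf8Reader;
struct Utf8Reader {
    int (*next)(Utf8Reader *r);   /* returns the next UTF-8 byte, or EOF */
    FILE *fp;
    unsigned char pending[3];     /* UTF-8 bytes still owed to the caller */
    int npending;
};

/* UTF-8 input: effectively an alias for fgetc(). */
static int next_from_utf8(Utf8Reader *r)
{
    return fgetc(r->fp);
}

/* Read one little-endian 16-bit unit, or -1 at end of input. */
static long read_unit_le(FILE *fp)
{
    int lo = fgetc(fp);
    if (lo == EOF)
        return -1;
    int hi = fgetc(fp);
    if (hi == EOF)
        return -1;
    return (long)lo | ((long)hi << 8);
}

/* UTF-16LE input: decode one code point, re-encode it as UTF-8,
   return the lead byte and queue the continuation bytes. */
static int next_from_utf16le(Utf8Reader *r)
{
    if (r->npending > 0)
        return r->pending[--r->npending];

    long cp = read_unit_le(r->fp);
    if (cp < 0)
        return EOF;
    if (cp >= 0xDC00 && cp <= 0xDFFF)
        return EOF;                          /* stray low surrogate */
    if (cp >= 0xD800 && cp <= 0xDBFF) {      /* high surrogate: need a pair */
        long low = read_unit_le(r->fp);
        if (low < 0xDC00 || low > 0xDFFF)
            return EOF;
        cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
    }

    if (cp < 0x80)
        return (int)cp;
    /* pending[] is filled last byte first because it is popped from the end */
    r->pending[r->npending++] = (unsigned char)(0x80 | (cp & 0x3F));
    if (cp < 0x800)
        return (int)(0xC0 | (cp >> 6));
    r->pending[r->npending++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    if (cp < 0x10000)
        return (int)(0xE0 | (cp >> 12));
    r->pending[r->npending++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    return (int)(0xF0 | (cp >> 18));
}

/* Pick the byte source once; the lexer never needs to know which. */
static void reader_init(Utf8Reader *r, FILE *fp, int is_utf16le)
{
    assert(r != NULL && fp != NULL);
    r->fp = fp;
    r->npending = 0;
    r->next = is_utf16le ? next_from_utf16le : next_from_utf8;
}
```

The lexer would loop on `r.next(&r)` until it returns EOF, whichever encoding was selected in `reader_init`.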

Re: C vs Haskell for XML parsing

<ccdad921-747e-42b8-bf3d-b9f56fb30e71n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27833&group=comp.lang.c#27833
X-Received: by 2002:a05:622a:1a0e:b0:40d:4c6:bce4 with SMTP id f14-20020a05622a1a0e00b0040d04c6bce4mr41479qtb.11.1692265026630;
Thu, 17 Aug 2023 02:37:06 -0700 (PDT)
X-Received: by 2002:a17:90a:cc03:b0:26c:fab1:9e23 with SMTP id
b3-20020a17090acc0300b0026cfab19e23mr823306pju.0.1692265026005; Thu, 17 Aug
2023 02:37:06 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Thu, 17 Aug 2023 02:37:05 -0700 (PDT)
In-Reply-To: <20230816173214.630@kylheku.com>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me> <20230816173214.630@kylheku.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ccdad921-747e-42b8-bf3d-b9f56fb30e71n@googlegroups.com>
Subject: Re: C vs Haskell for XML parsing
From: already5...@yahoo.com (Michael S)
Injection-Date: Thu, 17 Aug 2023 09:37:06 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: Michael S - Thu, 17 Aug 2023 09:37 UTC

On Thursday, August 17, 2023 at 3:37:57 AM UTC+3, Kaz Kylheku wrote:
> On 2023-08-16, Bart <b...@freeuk.com> wrote:
> > On 16/08/2023 08:31, Malcolm McLean wrote:
> >> Some people here are interested in Haskell.
> >> They might be interested in this:
> >>
> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
> >>
> >> Of course it's written from a pro-Haskell point of view, and writing an improved version when you've got the C in front of you isn't really a fair test. But he does match C for speed.
> >>
> >
> > "Portability (i.e. Windows) is a pain in the arse with C."
> >
> > I wonder what makes them say that?
> >
> > Reading from a file must be the world's most portable kind of program.
> > While issues with filenames and paths will be the same whatever the
> > language.
> >
> > So what is it?
> Portability of advanced programs that go beyond the standard C library.
>
> For instance, say, a program that opens a serial device and sets the
> baud rate, framing/parity bits and hardware handshaking, and then
> reads from it --- with timeouts.
>
> POSIX is quite different from Win32 in every way. Everything is done
> differently: ranging from a bit differently, to radically.
>

The semantic difference between the POSIX and Win32 APIs is quite small
relative to the expected difference between a randomly chosen pair of
unrelated operating systems.
Moreover, at the level of OS API concepts, Win32 is more similar to
[unrelated] POSIX than to [related] VMS.

Kernel-user separation; processes, threads and preemptive multitasking;
memory not shared between processes by default, but shareable
on request; hierarchical file systems; files as collections of bytes;
block devices and character devices; everything is a file descriptor (handle)
except the few things that are not; etc...
We are used to taking all these similarities for granted because we have had
little exposure to a more diverse world.

> --
> TXR Programming Language: http://nongnu.org/txr
> Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
> Mastodon: @Kazi...@mstdn.ca

Re: C vs Haskell for XML parsing

<84381416-9dc3-47fd-abac-da497a5f8860n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27837&group=comp.lang.c#27837
X-Received: by 2002:a05:6214:4a4b:b0:63c:f7eb:470 with SMTP id ph11-20020a0562144a4b00b0063cf7eb0470mr42488qvb.11.1692268356657;
Thu, 17 Aug 2023 03:32:36 -0700 (PDT)
X-Received: by 2002:a17:903:2348:b0:1b8:5541:9d3e with SMTP id
c8-20020a170903234800b001b855419d3emr1833943plh.6.1692268356326; Thu, 17 Aug
2023 03:32:36 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Thu, 17 Aug 2023 03:32:35 -0700 (PDT)
In-Reply-To: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=5.172.255.102; posting-account=Sb6m8goAAABbWsBL7gouk3bfLsuxwMgN
NNTP-Posting-Host: 5.172.255.102
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <84381416-9dc3-47fd-abac-da497a5f8860n@googlegroups.com>
Subject: Re: C vs Haskell for XML parsing
From: profesor...@gmail.com (fir)
Injection-Date: Thu, 17 Aug 2023 10:32:36 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: fir - Thu, 17 Aug 2023 10:32 UTC

On Wednesday, 16 August 2023 at 09:31:17 UTC+2, Malcolm McLean wrote:
> Some people here are interested in Haskell.
> They might be interested in this:
>
> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
>
> Of course it's written from a pro-Haskell point of view, and writing an improved version when you've got the C in front of you isn't really a fair test. But he does match C for speed.

he does match the c library.. and this library is probably sloppily written, so it's a library vs library test (counting the haskell environment as a library) ...and if so, so what?
there are thousands upon thousands of libraries, and saying that this uninteresting one is faster than that uninteresting one is not of much value if someone wants to learn something more "groundfull" ... worse still if it does not inspect low-level things
the overall outcome is propagating myths, and that is in fact a bad, harmful activity

personally i call some people in programming who live in myths and propagate myths "ponys" (though this is not the definition, ponys are ponys, but they propagate myths and live in myths, write articles on myths etc)

Re: C vs Haskell for XML parsing

<815a0ff4-dcd0-4c4c-ba94-8fd764f34b01n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27838&group=comp.lang.c#27838
X-Received: by 2002:a05:6214:3016:b0:635:e500:8dc7 with SMTP id ke22-20020a056214301600b00635e5008dc7mr35186qvb.4.1692268944216;
Thu, 17 Aug 2023 03:42:24 -0700 (PDT)
X-Received: by 2002:a17:902:d4d0:b0:1bf:cc5:7b57 with SMTP id
o16-20020a170902d4d000b001bf0cc57b57mr1100805plg.3.1692268943634; Thu, 17 Aug
2023 03:42:23 -0700 (PDT)
Path: i2pn2.org!i2pn.org!news.1d4.us!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Thu, 17 Aug 2023 03:42:22 -0700 (PDT)
In-Reply-To: <84381416-9dc3-47fd-abac-da497a5f8860n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=5.172.255.102; posting-account=Sb6m8goAAABbWsBL7gouk3bfLsuxwMgN
NNTP-Posting-Host: 5.172.255.102
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com> <84381416-9dc3-47fd-abac-da497a5f8860n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <815a0ff4-dcd0-4c4c-ba94-8fd764f34b01n@googlegroups.com>
Subject: Re: C vs Haskell for XML parsing
From: profesor...@gmail.com (fir)
Injection-Date: Thu, 17 Aug 2023 10:42:24 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2905
 by: fir - Thu, 17 Aug 2023 10:42 UTC

On Thursday, 17 August 2023 at 12:32:45 UTC+2, fir wrote:
> On Wednesday, 16 August 2023 at 09:31:17 UTC+2, Malcolm McLean wrote:
> > Some people here are interested in Haskell.
> > They might be interested in this:
> >
> > https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
> >
> > Of course it's written from a pro-Haskell point of view, and writing an improved version when you've got the C in front of you isn't really a fair test. But he does match C for speed.
> he does match the c library.. and this library is probably sloppily written, so it's a library vs library test (counting the haskell environment as a library) ..and if so, so what?
> there are thousands upon thousands of libraries, and saying that this uninteresting one is faster than that uninteresting one is not of much value if someone wants to learn something more "groundfull" ... worse still if it does not inspect low-level things
> the overall outcome is propagating myths, and that is in fact a bad, harmful activity
>
> personally i call some people in programming who live in myths and propagate myths "ponys" (though this is not the definition, ponys are ponys, but they propagate myths and live in myths, write articles on myths etc)

btw those numbers, 0.2 ms for 0.2 MB, are not bad / quite high, so
i wouldn't necessarily say it is badly written, as the speed is good,
but if some haskell beats a c lib then that part of the c lib is sloppily written (or the test
may also be unfair), but this is a banal and obvious conclusion, so this article's
outcome is at most banal, and in fact it is worse, as it uses this myth-talking

Re: C vs Haskell for XML parsing

<877cptgbli.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=27839&group=comp.lang.c#27839
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.c
Subject: Re: C vs Haskell for XML parsing
Date: Thu, 17 Aug 2023 12:19:37 +0100
Organization: A noiseless patient Spider
Lines: 88
Message-ID: <877cptgbli.fsf@bsb.me.uk>
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me> <87o7j6fu74.fsf@bsb.me.uk>
<37f1a926-972c-42c8-a276-8d3f6457ccb8n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="c666b555a840b444cd6c032219f8b5e7";
logging-data="3954483"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19YdFk6oyi/uFpiPYVgTuZ/Emyc6tyNjWg="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:7Q5XjDsp8FJEos4HDprZhNllc/8=
sha1:s0WILsIwa7ujiMDrLPAjK1RzRqY=
X-BSB-Auth: 1.962420fd80e9c4cea360.20230817121937BST.877cptgbli.fsf@bsb.me.uk
 by: Ben Bacarisse - Thu, 17 Aug 2023 11:19 UTC

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote:
>> Bart <b...@freeuk.com> writes:
>>
>> > On 16/08/2023 08:31, Malcolm McLean wrote:
>> >> Some people here are interested in Haskell.
>> >> They might be interested in this:
>> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
>> >> Of course it's written from a pro-Haskell point of view, and writing an
>> >> improved version when you've got the C in front of you isn't really a
>> >> fair test. But he does match C for speed.
>> >>
>> >
>> > "Portability (i.e. Windows) is a pain in the arse with C."
>> >
>> > I wonder what makes them say that?
>> Yes, I wondered that too, since the cut-down XML parsing they are doing
>> is one of the most potentially portable bits of C one could write (as
>> you say yourself):
>>
> There are some gotchas with files, but not for the cut down parsing
> they implement.
> Windows used to accept "rt" for reading a text stream. And there's
> still a mess with Unicode.

None of that matters for the case in point. The C code treats the input
like a stream of 8-bit bytes. You can do that without regard to line
convention.

> And the XML people say that a parser must
> accept UTF-16.

Again, that's not relevant to the case in the article. But it's also a
completely different issue. An XML parser that must handle either UTF-8
or UTF-16 needs a layer below the parser (conceptually) to detect the
encoding and return "characters" (as I think you have done). There is
no reason to suppose that that can't be written in portable C.

> I implement this by having the lexer call a function pointer to read a
> UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8
> on the fly.

Exactly -- though I think I would not have converted to UTF-8 in a plain
parser. Maybe your application makes that a good choice.

> If it's UTF-8, it's just an alias for fgetc().
> But how do you know the file format?

The first character must be '<' (and, technically, it must be the '<' that
opens an XML declaration). The encoding should be clear from the first
two bytes.

> I have code that does this, but if
> I called it, the XML parser would no longer be a single file module. So
> I read the first few characters of the file, then seek back to the start
> position.
> But this only works on seekable streams.

Actually, you don't need to read more than one character to determine if
the file is UTF-8 or UTF-16, all you need to do is an ungetc call and
that works on non-seekable streams.
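
A minimal sketch of that ungetc idea; `peek_byte` is a hypothetical helper name. Standard C guarantees only one character of pushback, so this covers a one-byte look-ahead and nothing more.

```c
#include <assert.h>
#include <stdio.h>

/* Look at the next byte of a stream without consuming it.
   Returns the byte, or EOF if nothing is left. Works on pipes too,
   because ungetc() needs no seeking. */
static int peek_byte(FILE *fp)
{
    assert(fp != NULL);
    int c = fgetc(fp);
    if (c != EOF)
        ungetc(c, fp);   /* one pushed-back byte is all the standard promises */
    return c;
}
```

After `peek_byte(fp)` the next `fgetc(fp)` returns the same byte again, so the parser proper can start from the beginning of the stream.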

> So the high-level parse function which accepts a stream rather than a
> file name either has to insist on a seekable stream, or it has to
> insist that the stream be in known format, or the stream access
> function has to maintain a little buffer. The last solution is the
> real one, but it's such a fiddly thing

Given that you convert UTF-16 to UTF-8, I'd have thought it was the
natural choice, even though you can get away with just an ungetc
call. But then I don't know how your code is organised. What's fiddly
about it?

> that instead I decided on the
> known format (if you call with a FILE * rather than a filename, the
> data must be in UTF-8). But it is a complete pain which would have
> been avoided with a high-level language which just loads a text file.

Agreed. Though I don't think the world is that good at agreeing things.
It's possible that this is not what the author had in mind with the
high/low-level portable code remark, but it's not a clear-cut case. When
the world has decided that a text file can just be opened and read, such
a facility could be provided by standard C, and even if not standard, it
could probably be written in portable C.

--
Ben.

Re: C vs Haskell for XML parsing

<20230817064532.687@kylheku.com>

https://www.novabbs.com/devel/article-flat.php?id=27846&group=comp.lang.c#27846
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 864-117-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: C vs Haskell for XML parsing
Date: Thu, 17 Aug 2023 13:50:05 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 48
Message-ID: <20230817064532.687@kylheku.com>
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me> <20230816173214.630@kylheku.com>
<ccdad921-747e-42b8-bf3d-b9f56fb30e71n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 17 Aug 2023 13:50:05 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1aa1e972d16a2b9e389dd8d4860af990";
logging-data="3992894"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+LKh8rq9obWjF0qgEHKkCvXoYUqJ6MZwM="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:0v7444L/rssbDTsK+buXm4Dr1ro=
 by: Kaz Kylheku - Thu, 17 Aug 2023 13:50 UTC

On 2023-08-17, Michael S <already5chosen@yahoo.com> wrote:
> On Thursday, August 17, 2023 at 3:37:57 AM UTC+3, Kaz Kylheku wrote:
>> On 2023-08-16, Bart <b...@freeuk.com> wrote:
>> > On 16/08/2023 08:31, Malcolm McLean wrote:
>> >> Some people here are interested in Haskell.
>> >> They might be interested in this:
>> >>
>> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
>> >>
>> >> Of course it's written from a pro-Haskell point of view, and writing an improved version when you've got the C in front of you isn't really a fair test. But he does match C for speed.
>> >>
>> >
>> > "Portability (i.e. Windows) is a pain in the arse with C."
>> >
>> > I wonder what makes them say that?
>> >
>> > Reading from a file must be the world's most portable kind of program.
>> > While issues with filenames and paths will be the same whatever the
>> > language.
>> >
>> > So what is it?
>> Portability of advanced programs that go beyond the standard C library.
>>
>> For instance, say, a program that opens a serial device and sets the
>> baud rate, framing/parity bits and hardware handshaking, and then
>> reads from it --- with timeouts.
>>
>> POSIX is quite different from Win32 in every way. Everything is done
>> differently: ranging from a bit differently, to radically.
>>
>
> The semantic difference between the POSIX and Win32 APIs is quite small
> relative to the expected difference between a randomly chosen pair of
> unrelated operating systems.

Yes; if we take a broad, and especially historic perspective, sure.

TOPS-10 versus OS/360, or whatever random pair.

It's the syntactic differences that cause the difficulty. If
the concept behind a functional area is very similar, but the
actual functional area looks different at the API level, then
you're basically writing the code twice.
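
A small illustration of that point, using a much simpler case than a serial port: pausing for a number of milliseconds. `sleep_ms` is a hypothetical wrapper. The concept is identical on both systems, yet the body is written once per API.

```c
/* Ask for the POSIX declarations; this must come before any #include. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Sleep for ms milliseconds. Returns 0 on success, -1 on failure. */
static int sleep_ms(long ms)
{
    assert(ms >= 0);
#ifdef _WIN32
    Sleep((DWORD)ms);                     /* Win32: one call, milliseconds */
    return 0;
#else
    struct timespec ts;
    ts.tv_sec = ms / 1000;                /* POSIX: seconds plus nanoseconds */
    ts.tv_nsec = (ms % 1000) * 1000000L;
    return nanosleep(&ts, NULL) == 0 ? 0 : -1;
#endif
}
```

For something like serial-port setup the two branches grow from three lines each to dozens, with the same duplication.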

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: C vs Haskell for XML parsing

<250cc72c-f682-4986-96bd-80011967c8dbn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27855&group=comp.lang.c#27855
X-Received: by 2002:a05:6214:b93:b0:635:f3c8:860d with SMTP id fe19-20020a0562140b9300b00635f3c8860dmr51074qvb.11.1692284011756;
Thu, 17 Aug 2023 07:53:31 -0700 (PDT)
X-Received: by 2002:a63:a312:0:b0:565:dc04:c915 with SMTP id
s18-20020a63a312000000b00565dc04c915mr1030204pge.9.1692284011201; Thu, 17 Aug
2023 07:53:31 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Thu, 17 Aug 2023 07:53:30 -0700 (PDT)
In-Reply-To: <877cptgbli.fsf@bsb.me.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=2a00:23a8:400a:5601:e1a0:67fe:7527:13f;
posting-account=Dz2zqgkAAADlK5MFu78bw3ab-BRFV4Qn
NNTP-Posting-Host: 2a00:23a8:400a:5601:e1a0:67fe:7527:13f
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me> <87o7j6fu74.fsf@bsb.me.uk> <37f1a926-972c-42c8-a276-8d3f6457ccb8n@googlegroups.com>
<877cptgbli.fsf@bsb.me.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <250cc72c-f682-4986-96bd-80011967c8dbn@googlegroups.com>
Subject: Re: C vs Haskell for XML parsing
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Thu, 17 Aug 2023 14:53:31 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 4671
 by: Malcolm McLean - Thu, 17 Aug 2023 14:53 UTC

On Thursday, 17 August 2023 at 12:19:55 UTC+1, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>
> > On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote:
> >> Bart <b...@freeuk.com> writes:
> >>
> >> > On 16/08/2023 08:31, Malcolm McLean wrote:
> >> >> Some people here are interested in Haskell.
> >> >> They might be interested in this:
> >> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
> >> >> Of course it's written from a pro-Haskell point of view, and writing an
> >> >> improved version when you've got the C in front of you isn't really a
> >> >> fair test. But he does match C for speed.
> >> >>
> >> >
> >> > "Portability (i.e. Windows) is a pain in the arse with C."
> >> >
> >> > I wonder what makes them say that?
> >> Yes, I wondered that too, since the cut-down XML parsing they are doing
> >> is one of the most potentially portable bits of C one could write (as
> >> you say yourself):
> >>
> > There are some gotchas with files, but not for the cut down parsing
> > they implement.
> > Windows used to accept "rt" for reading a text stream. And there's
> > still a mess with Unicode.
> None of that matters for the case in point. The C code treats the input
> like a stream of 8-bit bytes. You can do that without regard to line
> convention.
> > And the XML people say that a parser must
> > accept UTF-16.
> Again, that's not relevant to the case in the article. But it's also a
> completely different issue. An XML parser that must handle either UTF-8
> or UTF-16 needs a layer below the parser (conceptually) to detect the
> encoding and return "characters" (as I think you have done). There is
> no reason to suppose that that can't be written in portable C.
> > I implement this by having the lexer call a function pointer to read a
> > UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8
> > on the fly.
> Exactly -- though I think I would not have converted to UTF-8 in a plain
> parser. Maybe your application makes that a good choice.
> > If it's UTF-8, it's just an alias for fgetc().
> > But how do you know the file format?
> The first character must be '<' (and, technically, it must be the '<' that
> opens an XML declaration). The encoding should be clear from the first
> two bytes.
> > I have code that does this, but if
> > I called it, the XML parser would no longer be a single file module. So
> > I read the first few characters of the file, then seek back to the start
> > position.
> > But this only works on seekable streams.
>
> Actually, you don't need to read more than one character to determine if
> the file is UTF-8 or UTF-16, all you need to do is an ungetc call and
> that works on non-seekable streams.
>
You need two characters, because you might have a UTF-16 little-endian
stream without a BOM. So the first character in 8 bit bytes would be '<'.
But there's a simple hack, which is to read the first character from the
stream, then set up the lexer with a '<' sitting in its token. So of course
you also have to read the first character when passed a string, which
is a bit of a nuisance (and that's the sort of thing that gives programming
such a bad reputation). But it should work now when piped a non-seekable
UTF-16 stream.
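
A minimal sketch of a two-byte sniff that needs no seeking, assuming only UTF-8 and UTF-16 matter. It keeps the consumed bytes in a tiny replay buffer rather than priming the lexer's token, so it is a variation on the hack above, and `SniffReader`, `sniff_open` and `sniff_getc` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

enum Encoding { ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE };

/* Bytes eaten while guessing are held here and replayed first. */
typedef struct {
    FILE *fp;
    unsigned char held[2];
    int nheld;   /* number of bytes stored in held[] */
    int pos;     /* index of the next held byte to replay */
} SniffReader;

/* Read up to two bytes and guess the encoding from them. */
static enum Encoding sniff_open(SniffReader *r, FILE *fp)
{
    assert(r != NULL && fp != NULL);
    r->fp = fp;
    r->nheld = 0;
    r->pos = 0;
    for (int i = 0; i < 2; i++) {
        int c = fgetc(fp);
        if (c == EOF)
            break;
        r->held[r->nheld++] = (unsigned char)c;
    }
    if (r->nheld < 2)
        return ENC_UTF8;

    unsigned b0 = r->held[0], b1 = r->held[1];
    if (b0 == 0xFF && b1 == 0xFE) { r->pos = 2; return ENC_UTF16LE; } /* BOM, dropped */
    if (b0 == 0xFE && b1 == 0xFF) { r->pos = 2; return ENC_UTF16BE; } /* BOM, dropped */
    if (b0 == '<' && b1 == 0x00)
        return ENC_UTF16LE;      /* '<' as a little-endian unit, no BOM */
    if (b0 == 0x00 && b1 == '<')
        return ENC_UTF16BE;      /* '<' as a big-endian unit, no BOM */
    return ENC_UTF8;
}

/* Replay held bytes first, then continue from the stream. */
static int sniff_getc(SniffReader *r)
{
    if (r->pos < r->nheld)
        return r->held[r->pos++];
    return fgetc(r->fp);
}
```

Everything downstream reads through `sniff_getc`, so a pipe and a regular file behave the same way.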

Re: C vs Haskell for XML parsing

<87o7j4vt6r.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=27905&group=comp.lang.c#27905
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.c
Subject: Re: C vs Haskell for XML parsing
Date: Sat, 19 Aug 2023 00:15:08 +0100
Organization: A noiseless patient Spider
Lines: 100
Message-ID: <87o7j4vt6r.fsf@bsb.me.uk>
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me> <87o7j6fu74.fsf@bsb.me.uk>
<37f1a926-972c-42c8-a276-8d3f6457ccb8n@googlegroups.com>
<877cptgbli.fsf@bsb.me.uk>
<250cc72c-f682-4986-96bd-80011967c8dbn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="3fc15511aac7d6a52e9762e8994bb691";
logging-data="470062"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX191RXhW35VUC1xWagla23O4qBrtUz02OpA="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:uszjmccAS+devldJgsvG5L7z2Xs=
sha1:z+Gz8UOjeXOLD3ZLnkRYQ94AyDU=
X-BSB-Auth: 1.151ef3d87d92ba5df957.20230819001508BST.87o7j4vt6r.fsf@bsb.me.uk
 by: Ben Bacarisse - Fri, 18 Aug 2023 23:15 UTC

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On Thursday, 17 August 2023 at 12:19:55 UTC+1, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>>
>> > On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote:
>> >> Bart <b...@freeuk.com> writes:
>> >>
>> >> > On 16/08/2023 08:31, Malcolm McLean wrote:
>> >> >> Some people here are interested in Haskell.
>> >> >> They might be interested in this:
>> >> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
>> >> >> Of course it's written from a pro-Haskell point of view, and writing an
>> >> >> improved version when you've got the C in front of you isn't really a
>> >> >> fair test. But he does match C for speed.
>> >> >>
>> >> >
>> >> > "Portability (i.e. Windows) is a pain in the arse with C."
>> >> >
>> >> > I wonder what makes them say that?
>> >> Yes, I wondered that too, since the cut-down XML parsing they are doing
>> >> is one of the most potentially portable bits of C one could write (as
>> >> you say yourself):
>> >>
>> > There are some gotchas with files, but not for the cut down parsing
>> > they implement.
>> > Windows used to accept "rt" for reading a text stream. And there's
>> > still a mess with Unicode.
>> None of that matters for the case in point. The C code treats the input
>> like a stream of 8-bit bytes. You can do that without regard to line
>> convention.
>> > And the XML people say that a parser must
>> > accept UTF-16.
>> Again, that's not relevant to the case in the article. But it's also a
>> completely different issue. An XML parser that must handle either UTF-8
>> or UTF-16 needs a layer below the parser (conceptually) to detect the
>> encoding and return "characters" (as I think you have done). There is
>> no reason to suppose that that can't be written in portable C.
>> > I implement this by having the lexer call a function pointer to read a
>> > UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8
>> > on the fly.
>> Exactly -- though I think I would not have converted to UTF-8 in a plain
>> parser. Maybe your application makes that a good choice.
>> > If it's UTF-8, it's just an alias for fgetc().
>> > But how do you know the file format?
>> The first character must be '<' (and, technically, it must be the '<' that
>> opens an XML declaration). The encoding should be clear from the first
>> two bytes.

I ended up looking at the spec (that's an hour I'll never get back!) and
it's more complicated...

>> > I have code that does this, but if
>> > I called it, the XML parser would no longer be a single file module. So
>> > I read the first few characters of the file, then seek back to the start
>> > position.
>> > But this only works on seekable streams.
>>
>> Actually, you don't need to read more than one character to determine if
>> the file is UTF-8 or UTF-16, all you need to do is an ungetc call and
>> that works on non-seekable streams.
>>
> You need two characters, because you might have a UTF-16 little-endian
> stream without a BOM. So the first character in 8 bit bytes would be
> '<'.

Yes, I wasn't thinking. Thanks. You can't always tell until the second
byte, but you don't have to "unget" anything in that case because you
now know the character.

But as it happens I spoke way too soon... The full picture is a mess.

> But there's a simple hack, which is to read the first character from the
> stream, then set up the lexer with a '<' sitting in its token. So of course
> you also have to read the first character when passed a string, which
> is a bit of a nuisance (and that's the sort of thing that gives programming
> such a bad reputation).

What do you mean "when passed a string"? Do you mean when the parser is
acting on in-memory data?

> But it should work now when piped a non-seekable
> UTF-16 stream.

It turns out that if you want to be 100% conforming you need to be able
to detect both UCS-4 and (eye roll) EBCDIC. What's more, you need to
set up just enough of the reading mechanism to be able to read the XML
declaration and then adjust the reading mechanism to handle the named
encoding. For your application, ISO-8859-1 might be effectively the
same as ISO-8859-15, but UCS-4 is a complication and you might want to
flag certain errors if the encoding is named as ISO-10646-UCS-2 rather
than UTF-16.
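
A rough sketch of the first step, picking an encoding family from the leading bytes. The byte patterns follow the autodetection appendix of the XML 1.0 recommendation as I understand it, and the unusual UCS-4 byte orders are left out. `guess_family` and the enum names are hypothetical. It only chooses enough of a decoder to read the XML declaration. Switching to the named encoding afterwards is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum Family {
    F_UCS4_BE, F_UCS4_LE, F_UTF16_BE, F_UTF16_LE,
    F_UTF8, F_ASCII_COMPATIBLE, F_EBCDIC, F_UNKNOWN
};

struct Signature {
    unsigned char bytes[4];
    size_t len;
    enum Family family;
};

/* Order matters: the four-byte UCS-4 BOM must be tried before the
   two-byte UTF-16 BOM that it starts with. */
static const struct Signature signatures[] = {
    /* with a byte order mark */
    {{0x00, 0x00, 0xFE, 0xFF}, 4, F_UCS4_BE},
    {{0xFF, 0xFE, 0x00, 0x00}, 4, F_UCS4_LE},
    {{0xFE, 0xFF},             2, F_UTF16_BE},
    {{0xFF, 0xFE},             2, F_UTF16_LE},
    {{0xEF, 0xBB, 0xBF},       3, F_UTF8},
    /* without one: the bytes of "<" or "<?xm" in each encoding */
    {{0x00, 0x00, 0x00, 0x3C}, 4, F_UCS4_BE},
    {{0x3C, 0x00, 0x00, 0x00}, 4, F_UCS4_LE},
    {{0x00, 0x3C, 0x00, 0x3F}, 4, F_UTF16_BE},
    {{0x3C, 0x00, 0x3F, 0x00}, 4, F_UTF16_LE},
    {{0x3C, 0x3F, 0x78, 0x6D}, 4, F_ASCII_COMPATIBLE},
    {{0x4C, 0x6F, 0xA7, 0x94}, 4, F_EBCDIC},
};

/* Return the first family whose signature prefixes buf[0..n). */
static enum Family guess_family(const unsigned char *buf, size_t n)
{
    assert(buf != NULL);
    for (size_t i = 0; i < sizeof signatures / sizeof signatures[0]; i++) {
        const struct Signature *s = &signatures[i];
        if (n >= s->len && memcmp(buf, s->bytes, s->len) == 0)
            return s->family;
    }
    return F_UNKNOWN;   /* no declaration and no BOM: UTF-8 is the default */
}
```

`F_ASCII_COMPATIBLE` covers UTF-8, the ISO-8859 family and similar encodings. They can only be told apart by reading the `encoding` name in the declaration itself.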

While this can obviously be done in C, I would much rather do it in
Haskell. Haskell's lazy evaluation gives you stream IO for free (so to
speak), and handling the tail of a lazy stream with functions computed
by looking at the start of it comes naturally in Haskell.

--
Ben.

Re: C vs Haskell for XML parsing

<323a8074-838d-4dfd-ad44-32eda639760en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27908&group=comp.lang.c#27908
X-Received: by 2002:a05:6214:a44:b0:649:7b86:8aaa with SMTP id ee4-20020a0562140a4400b006497b868aaamr7910qvb.0.1692401622419;
Fri, 18 Aug 2023 16:33:42 -0700 (PDT)
X-Received: by 2002:a17:90b:895:b0:26d:1201:a8cb with SMTP id
bj21-20020a17090b089500b0026d1201a8cbmr161452pjb.2.1692401618440; Fri, 18 Aug
2023 16:33:38 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 18 Aug 2023 16:33:37 -0700 (PDT)
In-Reply-To: <87o7j4vt6r.fsf@bsb.me.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=2a00:23a8:400a:5601:5d6c:de7b:7e4:995e;
posting-account=Dz2zqgkAAADlK5MFu78bw3ab-BRFV4Qn
NNTP-Posting-Host: 2a00:23a8:400a:5601:5d6c:de7b:7e4:995e
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me> <87o7j6fu74.fsf@bsb.me.uk> <37f1a926-972c-42c8-a276-8d3f6457ccb8n@googlegroups.com>
<877cptgbli.fsf@bsb.me.uk> <250cc72c-f682-4986-96bd-80011967c8dbn@googlegroups.com>
<87o7j4vt6r.fsf@bsb.me.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <323a8074-838d-4dfd-ad44-32eda639760en@googlegroups.com>
Subject: Re: C vs Haskell for XML parsing
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Fri, 18 Aug 2023 23:33:42 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 7883
 by: Malcolm McLean - Fri, 18 Aug 2023 23:33 UTC

On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>
> > On Thursday, 17 August 2023 at 12:19:55 UTC+1, Ben Bacarisse wrote:
> >> Malcolm McLean <malcolm.ar...@gmail.com> writes:
> >>
> >> > On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote:
> >> >> Bart <b...@freeuk.com> writes:
> >> >>
> >> >> > On 16/08/2023 08:31, Malcolm McLean wrote:
> >> >> >> Some people here are interested in Haskell.
> >> >> >> They might be interested in this:
> >> >> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
> >> >> >> Of course it's written from a pro-Haskell point of view, and writing an
> >> >> >> improved version when you've got the C in front of you isn't really a
> >> >> >> fair test. But he does match C for speed.
> >> >> >>
> >> >> >
> >> >> > "Portability (i.e. Windows) is a pain in the arse with C."
> >> >> >
> >> >> > I wonder what makes them say that?
> >> >> Yes, I wondered that too, since the cut-down XML parsing they are doing
> >> >> is one of the most potentially portable bits of C one could write (as
> >> >> you say yourself):
> >> >>
> >> > There are some gotchas with files, but not for the cut down parsing
> >> > they implement.
> >> > Windows used to accept "rt" for reading a text stream. And there's
> >> > still a mess with Unicode.
> >> None of that matters for the case in point. The C code treats the input
> >> like a stream of 8-bit bytes. You can do that without regard to line
> >> convention.
> >> > And the XML people say that a parser must
> >> > accept UTF-16.
> >> Again, that's not relevant to the case in the article. But it's also a
> >> completely different issue. An XML parser that must handle either UTF-8
> >> or UTF-16 needs a layer below the parser (conceptually) to detect the
> >> encoding and return "characters" (as I think you have done). There is
> >> no reason to suppose that that can't be written in portable C.
> >> > I implement this by having the lexer call a function pointer to read a
> >> > UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8
> >> > on the fly.
> >> Exactly -- though I think I would not have converted to UTF-8 in a plain
> >> parser. Maybe your application makes that a good choice.
> >> > If it's UTF-8, it's just an alias for fgetc().
> >> > But how do you know the file format?
> >> The first character must be '<' (and, technically, it must be the '<' that
> >> opens an XML declaration). The encoding should be clear from the first
> >> two bytes.
> I ended up looking at the spec (that's an hour I'll never get back!) and
> it's more complicated...
> >> > I have code that does this, but if
> >> > I called it, the XML parser would no longer be a single file module. So
> >> > I read the first few characters of the file, then seek back to the start
> >> > position.
> >> > But this only works on seekable streams.
> >>
> >> Actually, you don't need to read more than one character to determine if
> >> the file is UTF-8 or UTF-16, all you need to do is an ungetc call and
> >> that works on non-seekable streams.
> >>
> > You need two characters, because you might have a UTF-16 little-endian
> > stream without a BOM. So the first character in 8 bit bytes would be
> > '<'.
> Yes, I wasn't thinking. Thanks. You can't always tell until the second
> byte, but you don't have to "unget" anything in that case because you
> now know the character.
>
> But as it happens I spoke way too soon... The full picture is a mess.
>
Yes, it's awful. You have an "encoding" field in XML 1.0, but you can't
depend on it, because not all XML is version 1.0; some of it is bare. Now
I don't have much experience with text, but I reckon it's entirely
possible that someone would run XML through a program like iconv,
which won't be clever enough to change the "encoding" field.
>
> > But there's a simple hack, which is to read the first character from the
> > stream, then set up the lexer with a '<' sitting in its token. So of course
> > you also have to read the first character when passed a string, which
> > is a bit of a nuisance (and that's the sort of thing that gives programming
> > such a bad reputation).
> What do you mean "when passed a string"? Do you mean when the parser is
> acting on in-memory data?
>
Sorry, I was so close to the program that I forgot that everybody else knows
nothing of the code. (It's on GitHub, but not in the resource compiler; it's in a
separate project.) You can pass it a file name, an open stream, or a
string. The string has to be UTF-8 because it is a char *. Of course I have to
read the first character of the string to make the string work the same way
as the rest of the code, all to support UTF-16 without a BOM on non-seekable
streams.
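
For illustration, a minimal sketch of the two-byte check discussed above, assuming the document starts either with a byte order mark or with '<'. The names xml_enc and sniff_two_bytes are made up for this example and are not taken from the parser under discussion; UCS-4 and EBCDIC are deliberately ignored here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum xml_enc { ENC_UTF8, ENC_UTF16_LE, ENC_UTF16_BE, ENC_UNKNOWN };

/* Guess the encoding from the first two bytes of an XML document.
   Assumes the input begins with a byte order mark or with '<'. */
static enum xml_enc sniff_two_bytes(const unsigned char *b, size_t n)
{
    assert(b != NULL);
    if (n < 2)
        return ENC_UNKNOWN;
    if (b[0] == 0xFF && b[1] == 0xFE) return ENC_UTF16_LE; /* BOM, little-endian */
    if (b[0] == 0xFE && b[1] == 0xFF) return ENC_UTF16_BE; /* BOM, big-endian */
    if (b[0] == 0x3C && b[1] == 0x00) return ENC_UTF16_LE; /* '<' then a zero byte */
    if (b[0] == 0x00 && b[1] == 0x3C) return ENC_UTF16_BE; /* zero byte then '<' */
    if (b[0] == 0xEF && b[1] == 0xBB) return ENC_UTF8;     /* start of a UTF-8 BOM */
    if (b[0] == 0x3C) return ENC_UTF8;                     /* '<' then a non-zero byte */
    return ENC_UNKNOWN;
}
```

In the BOM-less UTF-16 cases both bytes belong to the opening '<', so the lexer can be primed with that character and nothing needs pushing back. In the UTF-8 case the second byte can be returned with a single ungetc(), which is the one byte of pushback the C standard guarantees.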
> > But it should work now when piped a non-seekable
> > UTF-16 stream.
> It turns out that if you want to be 100% conforming you need to be able
> to detect both UCS-4 and (eye roll) EBCDIC. What's more, you need to
> set up just enough of the reading mechanism to be able to read the XML
> declaration and then adjust the reading mechanism to handle the named
> encoding. For your application, ISO-8859-1 might be effectively the
> same as ISO-8859-15, but UCS-4 is a complication and you might want to
> flag certain errors if the encoding is named as ISO-10646-UCS-2 rather
> than UTF-16.
>
The XML people say that a parser must accept UTF-8 and UTF-16. I have
heard of files which switch encodings, but I think they are largely mythical.
The basic idea of XML was very good, but I'm not impressed with the standard.
>
> While this can obviously be done in C, I would much rather do it in
> Haskell. Haskell's lazy evaluation gives you stream IO for free (so to
> speak), and handling the tail of a lazy stream with functions computed
> by looking at the start of it comes naturally in Haskell.
>
The structure of the C function is massively improved by going to a lexer and
having a proper hierarchical, recursive grammar rather than the old ad-hoc
system (which was tempting because basic XML is so simple).
However, it might be possible to do a much better job in Haskell. Unfortunately
I can't do that better job.
I'm confident that it is shaping up as a very good single-file C XML parser,
however.
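
The function-pointer reader described in the quoted text could be sketched roughly as follows. This is a guess at the shape, not Malcolm's code: struct reader, next_utf8, next_utf16le and reader_init are hypothetical names, only little-endian UTF-16 is handled, and a lone low surrogate is passed through unchecked.

```c
#include <assert.h>
#include <stdio.h>

/* Encode one code point as UTF-8; returns the number of bytes written (1-4). */
static int utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

struct reader {
    FILE *fp;
    int (*next_byte)(struct reader *r); /* the lexer only ever calls this */
    unsigned char pending[4];           /* UTF-8 bytes not yet handed out */
    int npending, pos;
};

/* UTF-8 input: pass bytes straight through. */
static int next_utf8(struct reader *r) { return fgetc(r->fp); }

/* Read one little-endian 16-bit unit, or -1 at end of input. */
static long read_unit_le(FILE *fp)
{
    int lo = fgetc(fp), hi;
    if (lo == EOF) return -1;
    hi = fgetc(fp);
    if (hi == EOF) return -1;
    return (long)lo | ((long)hi << 8);
}

/* UTF-16LE input: decode a code point, re-encode it as UTF-8,
   then hand the bytes out one at a time. */
static int next_utf16le(struct reader *r)
{
    if (r->pos == r->npending) {
        long u = read_unit_le(r->fp);
        if (u < 0) return EOF;
        if (u >= 0xD800 && u <= 0xDBFF) {             /* high surrogate */
            long v = read_unit_le(r->fp);
            if (v < 0xDC00 || v > 0xDFFF) return EOF; /* malformed pair */
            u = 0x10000 + ((u - 0xD800) << 10) + (v - 0xDC00);
        }
        r->npending = utf8_encode((unsigned long)u, r->pending);
        r->pos = 0;
    }
    return r->pending[r->pos++];
}

static void reader_init(struct reader *r, FILE *fp, int is_utf16le)
{
    assert(r != NULL && fp != NULL);
    r->fp = fp;
    r->next_byte = is_utf16le ? next_utf16le : next_utf8;
    r->npending = r->pos = 0;
}
```

The lexer then calls r.next_byte(&r) and never needs to know which encoding the file was in. Converting to code points instead of UTF-8, as Ben suggests he would, only changes what the function pointer returns.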

Re: C vs Haskell for XML parsing

<cb35076d-f8ec-441c-a963-7077bd5f884cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27913&group=comp.lang.c#27913

X-Received: by 2002:a05:620a:3d11:b0:76d:7431:2d55 with SMTP id tq17-20020a05620a3d1100b0076d74312d55mr7312qkn.9.1692439468951;
Sat, 19 Aug 2023 03:04:28 -0700 (PDT)
X-Received: by 2002:a63:7789:0:b0:569:350a:a690 with SMTP id
s131-20020a637789000000b00569350aa690mr281887pgc.1.1692439468458; Sat, 19 Aug
2023 03:04:28 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Sat, 19 Aug 2023 03:04:27 -0700 (PDT)
In-Reply-To: <87o7j4vt6r.fsf@bsb.me.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=2a00:23a8:400a:5601:40e6:743f:6a80:14c2;
posting-account=Dz2zqgkAAADlK5MFu78bw3ab-BRFV4Qn
NNTP-Posting-Host: 2a00:23a8:400a:5601:40e6:743f:6a80:14c2
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me> <87o7j6fu74.fsf@bsb.me.uk> <37f1a926-972c-42c8-a276-8d3f6457ccb8n@googlegroups.com>
<877cptgbli.fsf@bsb.me.uk> <250cc72c-f682-4986-96bd-80011967c8dbn@googlegroups.com>
<87o7j4vt6r.fsf@bsb.me.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <cb35076d-f8ec-441c-a963-7077bd5f884cn@googlegroups.com>
Subject: Re: C vs Haskell for XML parsing
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Sat, 19 Aug 2023 10:04:28 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 2163
 by: Malcolm McLean - Sat, 19 Aug 2023 10:04 UTC

On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>
> It turns out that if you want to be 100% conforming you need to be able
> to detect both UCS-4 and (eye roll) EBCDIC.
>
I had a go at EBCDIC.

If anyone has an EBCDIC XML file they'd like to test, please post a link.

Of course the next challenge is to support EBCDIC as the execution character
set. This means all the if (ch == '<') statements have to come out and be replaced
by if (ch == ASCII_LESSTHEN). And the strings have to be replaced with hex codes.

Here's where the Baby X resource compiler shows its power. Simply set up the input
<BabyXRC>
<utf8 name="cdata"><CDATA</utf8>
</BabyXRC>

And so on, and you get all the strings in hex-encoded UTF-8, ready to cut and paste.
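
A minimal sketch of the change described above, assuming the lexer sees UTF-8 bytes: the comparison uses a named code point rather than a character literal, so it means the same thing whatever the execution character set. ASCII_LESSTHEN is the post's own spelling; the other names, and the choice of the standard CDATA opener "<![CDATA[" as the example string, are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Code points named by value, independent of the execution character set. */
enum {
    ASCII_LESSTHEN    = 0x3C, /* '<' in ASCII and UTF-8 */
    ASCII_GREATERTHAN = 0x3E, /* '>' in ASCII and UTF-8 */
    ASCII_SLASH       = 0x2F  /* '/' in ASCII and UTF-8 */
};

/* "<![CDATA[" written as hex-encoded UTF-8 instead of a string literal,
   which an EBCDIC compiler would otherwise translate. */
static const unsigned char CDATA_OPEN[] = {
    0x3C, 0x21, 0x5B, 0x43, 0x44, 0x41, 0x54, 0x41, 0x5B
};

/* Replaces: if (ch == '<') */
static int is_tag_open(int ch)
{
    return ch == ASCII_LESSTHEN;
}

/* Returns 1 if the n bytes at p begin with the CDATA opener, else 0. */
static int starts_with_cdata(const unsigned char *p, size_t n)
{
    size_t i;
    assert(p != NULL);
    if (n < sizeof CDATA_OPEN)
        return 0;
    for (i = 0; i < sizeof CDATA_OPEN; i++)
        if (p[i] != CDATA_OPEN[i])
            return 0;
    return 1;
}
```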

Re: C vs Haskell for XML parsing

<ubqfgf$r0tm$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=27925&group=comp.lang.c#27925

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: lew.pitc...@digitalfreehold.ca (Lew Pitcher)
Newsgroups: comp.lang.c
Subject: Re: C vs Haskell for XML parsing
Date: Sat, 19 Aug 2023 13:19:11 -0000 (UTC)
Organization: The Pitcher Digital Freehold
Lines: 33
Message-ID: <ubqfgf$r0tm$1@dont-email.me>
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me> <87o7j6fu74.fsf@bsb.me.uk>
<37f1a926-972c-42c8-a276-8d3f6457ccb8n@googlegroups.com>
<877cptgbli.fsf@bsb.me.uk>
<250cc72c-f682-4986-96bd-80011967c8dbn@googlegroups.com>
<87o7j4vt6r.fsf@bsb.me.uk>
<cb35076d-f8ec-441c-a963-7077bd5f884cn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 19 Aug 2023 13:19:11 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="79d7d6344ec940457ae1bda346c076e0";
logging-data="885686"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19OUYIDYJwdKSUj9PJaSMrDd8oJabhEaiY="
User-Agent: Pan/0.139 (Sexual Chocolate; GIT bf56508
git://git.gnome.org/pan2)
Cancel-Lock: sha1:TWSbjpeQ+EW+nQxn9fmrOVK9A2o=
 by: Lew Pitcher - Sat, 19 Aug 2023 13:19 UTC

On Sat, 19 Aug 2023 03:04:27 -0700, Malcolm McLean wrote:

> On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>
>> It turns out that if you want to be 100% conforming you need to be able
>> to detect both UCS-4 and (eye roll) EBCDIC.
>>
> I had a go at EBCDIC.
>
> If anyone has an EBCDIC XML file they'd like to test, please post a link.

Be careful what you ask for, Malcolm.

You /do/ realize that "EBCDIC" refers to a whole family of charactersets,
(at least 46 individual charactersets, most with /some/ common elements)
and /not/ to a single characterset like Unicode or US-ASCII (although, you
could argue that ASCII embodied multiple charactersets, just with fewer
variants).

FWIW, there are a number of EBCDIC charactersets that you could not reliably use
in XML, as they lack a few of the required characters. You might take a
look at the DKUUG's characterset standards website[1] - they contributed to
the ISO/IEC JTC 1/SC 2 [2] effort to catalogue and standardize charactersets.

[1] http://std.dkuug.dk/i18n/charmaps/
[2] https://en.wikipedia.org/wiki/ISO/IEC_JTC_1/SC_2

[snip]

--
Lew Pitcher
"In Skills We Trust"

Re: C vs Haskell for XML parsing

<IC4EM.686039$TPw2.185069@fx17.iad>

https://www.novabbs.com/devel/article-flat.php?id=27929&group=comp.lang.c#27929

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx17.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: sco...@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: C vs Haskell for XML parsing
Newsgroups: comp.lang.c
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com> <ubi7hd$38q7d$1@dont-email.me> <87o7j6fu74.fsf@bsb.me.uk> <37f1a926-972c-42c8-a276-8d3f6457ccb8n@googlegroups.com> <877cptgbli.fsf@bsb.me.uk> <250cc72c-f682-4986-96bd-80011967c8dbn@googlegroups.com> <87o7j4vt6r.fsf@bsb.me.uk> <cb35076d-f8ec-441c-a963-7077bd5f884cn@googlegroups.com>
Lines: 14
Message-ID: <IC4EM.686039$TPw2.185069@fx17.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sat, 19 Aug 2023 14:48:08 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sat, 19 Aug 2023 14:48:08 GMT
X-Received-Bytes: 10335
 by: Scott Lurndal - Sat, 19 Aug 2023 14:48 UTC

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>
>> It turns out that if you want to be 100% conforming you need to be able
>> to detect both UCS-4 and (eye roll) EBCDIC.
>>
>I had a go at EBCDIC.
>
>If anyone has an EBCDIC XML file they'd like to test, please post a link.

Here's one:

[EBCDIC-encoded XML document; the raw bytes are not representable as text and are omitted here]


Re: C vs Haskell for XML parsing

<ubqlvc$shgn$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=27931&group=comp.lang.c#27931

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: lew.pitc...@digitalfreehold.ca (Lew Pitcher)
Newsgroups: comp.lang.c
Subject: Re: C vs Haskell for XML parsing
Date: Sat, 19 Aug 2023 15:09:32 -0000 (UTC)
Organization: The Pitcher Digital Freehold
Lines: 41
Message-ID: <ubqlvc$shgn$1@dont-email.me>
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me> <87o7j6fu74.fsf@bsb.me.uk>
<37f1a926-972c-42c8-a276-8d3f6457ccb8n@googlegroups.com>
<877cptgbli.fsf@bsb.me.uk>
<250cc72c-f682-4986-96bd-80011967c8dbn@googlegroups.com>
<87o7j4vt6r.fsf@bsb.me.uk>
<cb35076d-f8ec-441c-a963-7077bd5f884cn@googlegroups.com>
<IC4EM.686039$TPw2.185069@fx17.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 19 Aug 2023 15:09:32 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="79d7d6344ec940457ae1bda346c076e0";
logging-data="935447"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+ISxOCkM+NLMe/XJFakW1Fm2rFxdlfIzw="
User-Agent: Pan/0.139 (Sexual Chocolate; GIT bf56508
git://git.gnome.org/pan2)
Cancel-Lock: sha1:EnF9ixZ3VZk+BsAi0jFz8k2B9E8=
 by: Lew Pitcher - Sat, 19 Aug 2023 15:09 UTC

On Sat, 19 Aug 2023 14:48:08 +0000, Scott Lurndal wrote:

> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>>
>>> It turns out that if you want to be 100% conforming you need to be able
>>> to detect both UCS-4 and (eye roll) EBCDIC.
>>>
>>I had a go at EBCDIC.
>>
>>If anyone has an EBCDIC XML file they'd like to test, please post a link.
>
> Here's one:
[snip]

And that's an excellent illustration of my point about some EBCDIC
charactersets lacking the necessary characters to properly express XML.

Here are the first four lines of the ASCII equivalent of that message,
as generated by
dd if=ebcdic.msg of=ascii.msg conv=ascii
where
conv=ascii
will convert "from EBCDIC to ASCII" (dd(1) manpage)

Note the (translated) format of the DOCTYPE entities
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="one_register.xsl" type="text/xsl" ?>
<|DOCTYPE registers SYSTEM "registers.dtd">
<|-- Copyright (c) 2010-2014 ARM Limited. All rights reserved. -->

Apparently, you used a variant of EBCDIC that includes an exclamation mark
at codepoint 0x4f; dd uses EBCDIC-US, which encodes a "VERTICAL LINE" at
codepoint 0x4f.
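
To make the 0x4f difference concrete, here is a small sketch that translates the bytes of "<!DOCTYPE" under two EBCDIC code pages. IBM code pages 037 and 500 are used as stand-ins because they differ in exactly this way (037 has '|' at 0x4F and '!' at 0x5A; 500 has '!' at 0x4F and ']' at 0x5A); which variants were actually involved in the post above is not established here. The function names are hypothetical and only a handful of code points are covered.

```c
#include <assert.h>
#include <stddef.h>

enum ebcdic_variant { EBCDIC_CP037, EBCDIC_CP500 };

/* Map one EBCDIC byte to its ASCII code, or -1 if it is outside this sketch. */
static int ebcdic_to_ascii(unsigned char b, enum ebcdic_variant v)
{
    /* Uppercase letters sit in three runs, the same in both code pages. */
    if (b >= 0xC1 && b <= 0xC9) return 0x41 + (b - 0xC1); /* A-I */
    if (b >= 0xD1 && b <= 0xD9) return 0x4A + (b - 0xD1); /* J-R */
    if (b >= 0xE2 && b <= 0xE9) return 0x53 + (b - 0xE2); /* S-Z */
    switch (b) {
    case 0x40: return 0x20;                             /* space */
    case 0x4C: return 0x3C;                             /* '<' */
    case 0x6E: return 0x3E;                             /* '>' */
    case 0x6F: return 0x3F;                             /* '?' */
    case 0x4F: return v == EBCDIC_CP037 ? 0x7C : 0x21;  /* '|' versus '!' */
    case 0x5A: return v == EBCDIC_CP037 ? 0x21 : 0x5D;  /* '!' versus ']' */
    default:   return -1;
    }
}

/* Translate n bytes into out (which must hold n + 1 chars);
   unmapped bytes become '?'. */
static void ebcdic_translate(const unsigned char *in, size_t n,
                             enum ebcdic_variant v, char *out)
{
    size_t i;
    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        int c = ebcdic_to_ascii(in[i], v);
        out[i] = (char)(c < 0 ? 0x3F : c);
    }
    out[n] = '\0';
}
```

Fed the bytes 4C 4F C4 D6 C3 E3 E8 D7 C5, the code page 500 table yields "<!DOCTYPE" while the code page 037 table yields "<|DOCTYPE", which is the same symptom as in the dd output above.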

--
Lew Pitcher
"In Skills We Trust"

Re: C vs Haskell for XML parsing

<ubqmdh$shgn$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=27933&group=comp.lang.c#27933

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: lew.pitc...@digitalfreehold.ca (Lew Pitcher)
Newsgroups: comp.lang.c
Subject: Re: C vs Haskell for XML parsing
Date: Sat, 19 Aug 2023 15:17:05 -0000 (UTC)
Organization: The Pitcher Digital Freehold
Lines: 48
Message-ID: <ubqmdh$shgn$2@dont-email.me>
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me> <87o7j6fu74.fsf@bsb.me.uk>
<37f1a926-972c-42c8-a276-8d3f6457ccb8n@googlegroups.com>
<877cptgbli.fsf@bsb.me.uk>
<250cc72c-f682-4986-96bd-80011967c8dbn@googlegroups.com>
<87o7j4vt6r.fsf@bsb.me.uk>
<cb35076d-f8ec-441c-a963-7077bd5f884cn@googlegroups.com>
<IC4EM.686039$TPw2.185069@fx17.iad> <ubqlvc$shgn$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 19 Aug 2023 15:17:05 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="79d7d6344ec940457ae1bda346c076e0";
logging-data="935447"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19pbnxJiizdxz7mXicz4Y2uYdqKLaWjUtM="
User-Agent: Pan/0.139 (Sexual Chocolate; GIT bf56508
git://git.gnome.org/pan2)
Cancel-Lock: sha1:EO2+ATOyEHb6ukCO/l43ncK6Cek=
 by: Lew Pitcher - Sat, 19 Aug 2023 15:17 UTC

On Sat, 19 Aug 2023 15:09:32 +0000, Lew Pitcher wrote:

> On Sat, 19 Aug 2023 14:48:08 +0000, Scott Lurndal wrote:
>
>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>>On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>>>
>>>> It turns out that if you want to be 100% conforming you need to be able
>>>> to detect both UCS-4 and (eye roll) EBCDIC.
>>>>
>>>I had a go at EBCDIC.
>>>
>>>If anyone has an EBCDIC XML file they'd like to test, please post a link.
>>
>> Here's one:
> [snip]
>
> And that's an excellent illustration of my point about some EBCDIC
> charactersets lacking the necessary characters to properly express XML.
>
> Here are the first four lines of the ASCII equivalent of that message,
> as generated by
> dd if=ebcdic.msg of=ascii.msg conv=ascii
> where
> conv=ascii
> will convert "from EBCDIC to ASCII" (dd(1) manpage)
>
> Note the (translated) format of the DOCTYPE entities
> <?xml version="1.0" encoding="utf-8"?>

Oh, and by the way, that "encoding" value is incorrect for
the XML document you posted. It should have named the
EBCDIC variant you used, not "utf-8".

I suspect that you just machine- or hand-encoded an existing
utf-8 XML document, rather than composing a completely new
document in EBCDIC.

FWIW, I spent many years working in an EBCDIC environment,
manipulating XML documents (in EBCDIC) with a tool developed
in-house. I had to write a number of "white papers" on the
subjects of characterset translation (to/from EBCDIC, and
between EBCDIC variants), and on XML handling in an EBCDIC
environment. :-)

--
Lew Pitcher
"In Skills We Trust"

Re: C vs Haskell for XML parsing

<871qfyx0fe.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=27944&group=comp.lang.c#27944

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.c
Subject: Re: C vs Haskell for XML parsing
Date: Sat, 19 Aug 2023 21:05:41 +0100
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <871qfyx0fe.fsf@bsb.me.uk>
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me> <87o7j6fu74.fsf@bsb.me.uk>
<37f1a926-972c-42c8-a276-8d3f6457ccb8n@googlegroups.com>
<877cptgbli.fsf@bsb.me.uk>
<250cc72c-f682-4986-96bd-80011967c8dbn@googlegroups.com>
<87o7j4vt6r.fsf@bsb.me.uk>
<cb35076d-f8ec-441c-a963-7077bd5f884cn@googlegroups.com>
<IC4EM.686039$TPw2.185069@fx17.iad>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="3fc15511aac7d6a52e9762e8994bb691";
logging-data="1058273"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+mMsuROfVnNN5gkGqRSZg8tJm+8TKTmz8="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:53OGdb3en6gJ5vGYle/2tyoIC8g=
sha1:ks0sXSMy/c2xjzqWvwUltlZzGhc=
X-BSB-Auth: 1.a6b3ce20d31e39e75630.20230819210541BST.871qfyx0fe.fsf@bsb.me.uk
 by: Ben Bacarisse - Sat, 19 Aug 2023 20:05 UTC

scott@slp53.sl.home (Scott Lurndal) writes:

> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>>>
>>> It turns out that if you want to be 100% conforming you need to be able
>>> to detect both UCS-4 and (eye roll) EBCDIC.
>>>
>>I had a go at EBCDIC.
>>
>>If anyone has an EBCDIC XML file they'd like to test, please post a link.
>
> Here's one:
>
> Lo...

<EBCDIC-encoded XML deleted>

Is that legal? I thought an EBCDIC XML file must give the correct
encoding in the XML declaration. xmllint rejects it unless I edit the
declaration.
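
For reference, a sketch of the first-four-bytes check from the (non-normative) autodetection appendix of the XML 1.0 spec, for documents with no byte order mark that begin with an XML declaration. It only identifies the encoding family; the parser still has to read the declaration to learn the named encoding, which is why a wrong "encoding" value matters. The enum and function names are hypothetical, and the unusual UCS-4 byte orders are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum xml_family {
    FAM_UCS4_BE, FAM_UCS4_LE, FAM_UTF16_BE, FAM_UTF16_LE,
    FAM_ASCII_LIKE, /* UTF-8, ISO 8859-x and other ASCII-compatible encodings */
    FAM_EBCDIC,     /* some EBCDIC flavour; the declaration must say which */
    FAM_UNKNOWN
};

/* Classify a document by the byte pattern of "<?xm" (or of '<' in UCS-4). */
static enum xml_family xml_autodetect(const unsigned char *b, size_t n)
{
    static const struct {
        unsigned char sig[4];
        enum xml_family fam;
    } table[] = {
        {{0x00, 0x00, 0x00, 0x3C}, FAM_UCS4_BE},
        {{0x3C, 0x00, 0x00, 0x00}, FAM_UCS4_LE},
        {{0x00, 0x3C, 0x00, 0x3F}, FAM_UTF16_BE},
        {{0x3C, 0x00, 0x3F, 0x00}, FAM_UTF16_LE},
        {{0x3C, 0x3F, 0x78, 0x6D}, FAM_ASCII_LIKE},
        {{0x4C, 0x6F, 0xA7, 0x94}, FAM_EBCDIC}
    };
    size_t i;

    assert(b != NULL);
    if (n < 4)
        return FAM_UNKNOWN;
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (memcmp(b, table[i].sig, 4) == 0)
            return table[i].fam;
    return FAM_UNKNOWN;
}
```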

--
Ben.

Re: C vs Haskell for XML parsing

<87pm3ivjyd.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=27947&group=comp.lang.c#27947

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.c
Subject: Re: C vs Haskell for XML parsing
Date: Sat, 19 Aug 2023 21:46:50 +0100
Organization: A noiseless patient Spider
Lines: 122
Message-ID: <87pm3ivjyd.fsf@bsb.me.uk>
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com>
<ubi7hd$38q7d$1@dont-email.me> <87o7j6fu74.fsf@bsb.me.uk>
<37f1a926-972c-42c8-a276-8d3f6457ccb8n@googlegroups.com>
<877cptgbli.fsf@bsb.me.uk>
<250cc72c-f682-4986-96bd-80011967c8dbn@googlegroups.com>
<87o7j4vt6r.fsf@bsb.me.uk>
<323a8074-838d-4dfd-ad44-32eda639760en@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="3fc15511aac7d6a52e9762e8994bb691";
logging-data="1069088"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18admh/24r2AQTuccJ6ndb8FEUy2zi9gp8="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:ST+YIoaC7E5rJzAVuJkUQytYzFk=
sha1:TK0XymIXY0lx+bqgagRzXAvFkUE=
X-BSB-Auth: 1.a54fab9fc28044276313.20230819214650BST.87pm3ivjyd.fsf@bsb.me.uk
 by: Ben Bacarisse - Sat, 19 Aug 2023 20:46 UTC

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On Saturday, 19 August 2023 at 00:15:25 UTC+1, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>>
>> > On Thursday, 17 August 2023 at 12:19:55 UTC+1, Ben Bacarisse wrote:
>> >> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>> >>
>> >> > On Thursday, 17 August 2023 at 00:23:26 UTC+1, Ben Bacarisse wrote:
>> >> >> Bart <b...@freeuk.com> writes:
>> >> >>
>> >> >> > On 16/08/2023 08:31, Malcolm McLean wrote:
>> >> >> >> Some people here are interested in Haskell.
>> >> >> >> They might be interested in this:
>> >> >> >> https://chrisdone.com/posts/fast-haskell-c-parsing-xml/
>> >> >> >> Of course it's written from a pro-Haskell point of view, and writing an
>> >> >> >> improved version when you've got the C in front of you isn't really a
>> >> >> >> fair test. But he does match C for speed.
>> >> >> >>
>> >> >> >
>> >> >> > "Portability (i.e. Windows) is a pain in the arse with C."
>> >> >> >
>> >> >> > I wonder what makes them say that?
>> >> >> Yes, I wondered that too, since the cut-down XML parsing they are doing
>> >> >> is one of the most potentially portable bits of C one could write (as
>> >> >> you say yourself):
>> >> >>
>> >> > There are some gotchas with files, but not for the cut down parsing
>> >> > they implement.
>> >> > Windows used to accept "rt" for reading a text stream. And there's
>> >> > still a mess with Unicode.
>> >> None of that matters for the case in point. The C code treats the input
>> >> like a stream of 8-bit bytes. You can do that without regard to line
>> >> convention.
>> >> > And the XML people say that a parser must
>> >> > accept UTF-16.
>> >> Again, that's not relevant to the case in the article. But it's also a
>> >> completely different issue. An XML parser that must handle either UTF-8
>> >> or UTF-16 needs a layer below the parser (conceptually) to detect the
>> >> encoding and return "characters" (as I think you have done). There is
>> >> no reason to suppose that that can't be written in portable C.
>> >> > I implement this by having the lexer call a function pointer to read a
>> >> > UTF-8 character from a stream. If the file is UTF-16, it converts to UTF-8
>> >> > on the fly.
>> >> Exactly -- though I think I would not have converted to UTF-8 in a plain
>> >> parser. Maybe your application makes that a good choice.
>> >> > If it's UTF-8, it's just an alias for fgetc().
>> >> > But how do you know the file format?
>> >> The first character must be '<' (and, technically, it must be the '<' that
>> >> opens an XML declaration). The encoding should be clear from the first
>> >> two bytes.
>> I ended up looking at the spec (that's an hour I'll never get back!) and
>> it's more complicated...
>> >> > I have code that does this, but if
>> >> > I called it, the XML parser would no longer be a single file module. So
>> >> > I read the first few characters of the file, then seek back to the start
>> >> > position.
>> >> > But this only works on seekable streams.
>> >>
>> >> Actually, you don't need to read more than one character to determine if
>> >> the file is UTF-8 or UTF-16, all you need to do is an ungetc call and
>> >> that works on non-seekable streams.
>> >>
>> > You need two characters, because you might have a UTF-16 little-endian
>> > stream without a BOM. So the first character in 8 bit bytes would be
>> > '<'.
>> Yes, I wasn't thinking. Thanks. You can't always tell until the second
>> byte, but you don't have to "unget" anything in that case because you
>> now know the character.
>>
>> But as it happens I spoke way too soon... The full picture is a mess.
>>
> Yes, it's awful. You have an "encoding" field in XML 1.0. But you can't
> depend on it because not all XML is version 1.0, some of it is bare. Now
> I don't have much experience with text, but I reckon that it's entirely
> possible that someone would run XML through a program like iconv,
> and it won't be clever enough to change the "encoding" field:

Maybe. But you are providing a tool and you don't have to accept
everything. A converted document with the wrong XML declaration is an
error and you could just reject it. You don't have to bend over
backwards for bad input.
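
A minimal sketch of that "just reject it" policy, assuming the reader has
already worked out the actual encoding from the leading bytes and has the
XML declaration available as plain ASCII text. The function names
(decl_matches, name_equals) are made up for illustration and the parsing
of the declaration is deliberately crude; it is not a conforming
declaration parser.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of name[0..n-1] with the C string want. */
static int name_equals(const char *name, size_t n, const char *want)
{
    if (strlen(want) != n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)name[i]) != tolower((unsigned char)want[i]))
            return 0;
    return 1;
}

/* Hypothetical helper: returns 1 if the declaration text either has no
   encoding pseudo-attribute or names the encoding the reader detected,
   and 0 if it names something else or is malformed. */
int decl_matches(const char *decl, const char *detected)
{
    const char *p = strstr(decl, "encoding");
    if (p == NULL)
        return 1;                       /* nothing declared, nothing to contradict */
    p += strlen("encoding");
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        p++;
    if (*p != '=')
        return 0;
    p++;
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        p++;
    char quote = *p;
    if (quote != '"' && quote != '\'')
        return 0;
    p++;
    const char *end = strchr(p, quote); /* closing quote of the value */
    if (end == NULL)
        return 0;
    return name_equals(p, (size_t)(end - p), detected);
}
```

A caller that only supports UTF-8 would report an error and stop when
decl_matches(decl, "UTF-8") returns 0, rather than trying to guess what
the file really contains.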

Not being a Windows user, I've not seen a UTF-16 encoded file in the
wild, so I would not even accept that. In the Linux world, I'd probably
accept only UTF-8 and point my users to xmllint.

xmllint can read any valid XML file and re-write it using UTF-8 (or,
indeed, many other encodings), changing the XML declaration on the fly.
Hence

xmllint -encode UTF-8 myresources | babyxrc

would be close to a universal XML processor with little extra work on
your part. But you probably can't assume your users will want to
do that.
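
Input arriving through a pipe like that is not seekable, and ungetc only
guarantees one byte of pushback, so any bytes read to check the encoding
have to be kept and replayed. A rough sketch of one way to do that, using
a small look-ahead buffer; struct peek_reader, peek_open and peek_getc
are hypothetical names, not taken from any existing parser.

```c
#include <stddef.h>
#include <stdio.h>

/* A stream plus the bytes consumed while sniffing the encoding. */
struct peek_reader {
    FILE *fp;
    unsigned char buf[4];
    size_t len;   /* number of bytes held in buf */
    size_t pos;   /* next byte of buf to hand out */
};

/* Read up to four leading bytes for inspection.
   Returns how many bytes were actually available. */
size_t peek_open(struct peek_reader *r, FILE *fp)
{
    r->fp = fp;
    r->pos = 0;
    r->len = fread(r->buf, 1, sizeof r->buf, fp);
    return r->len;
}

/* Same contract as fgetc(), but replays the sniffed bytes first,
   so no seeking is needed. */
int peek_getc(struct peek_reader *r)
{
    if (r->pos < r->len)
        return r->buf[r->pos++];
    return fgetc(r->fp);
}
```

After peek_open(&r, stdin) the caller can look at r.buf[0] to r.buf[r.len - 1]
to decide whether to accept the input, and the parser then reads every
byte, including those four, through peek_getc.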

>> It turns out that if you want to be 100% conforming you need to be able
>> to detect both UCS-4 and (eye roll) EBCDIC. What's more, you need to
>> set up just enough of the reading mechanism to be able to read the XML
>> declaration and then adjust the reading mechanism to handle the named
>> encoding. For your application, ISO-8859-1 might be effectively the
>> same as ISO-8859-15, but UCS-4 is a complication and you might want to
>> flag certain errors if the encoding is named as ISO-10646-UCS-2 rather
>> than UTF-16.
>>
> The XML people say that a parser must accept UTF-8 and UTF-16.

Don't they go further? I thought they did. Maybe the others are
optional and those two are the only must-haves.
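
For what it's worth, the first-bytes detection discussed above can be
sketched as a table lookup. The byte patterns are the ones listed in the
(non-normative) Appendix F of the XML 1.0 recommendation; the enum and
function names are invented for this example, and the result is only an
encoding family, enough to read the XML declaration and find the named
encoding.

```c
#include <stddef.h>
#include <string.h>

/* Encoding families that can be told apart from the leading bytes. */
enum xml_family {
    FAM_UNKNOWN, FAM_UCS4_BE, FAM_UCS4_LE, FAM_UCS4_ODD,
    FAM_UTF16_BE, FAM_UTF16_LE, FAM_UTF8, FAM_EBCDIC
};

/* Longer signatures come first so that, for example, the UCS-4
   little-endian BOM is not mistaken for the UTF-16 one. */
static const struct {
    unsigned char sig[4];
    size_t len;
    enum xml_family fam;
} sigs[] = {
    {{0x00, 0x00, 0xFE, 0xFF}, 4, FAM_UCS4_BE},   /* BOM */
    {{0xFF, 0xFE, 0x00, 0x00}, 4, FAM_UCS4_LE},   /* BOM */
    {{0x00, 0x00, 0xFF, 0xFE}, 4, FAM_UCS4_ODD},  /* BOM, unusual byte order */
    {{0xFE, 0xFF, 0x00, 0x00}, 4, FAM_UCS4_ODD},  /* BOM, unusual byte order */
    {{0x00, 0x00, 0x00, 0x3C}, 4, FAM_UCS4_BE},   /* '<' */
    {{0x3C, 0x00, 0x00, 0x00}, 4, FAM_UCS4_LE},   /* '<' */
    {{0x00, 0x00, 0x3C, 0x00}, 4, FAM_UCS4_ODD},  /* '<', unusual byte order */
    {{0x00, 0x3C, 0x00, 0x00}, 4, FAM_UCS4_ODD},  /* '<', unusual byte order */
    {{0x00, 0x3C, 0x00, 0x3F}, 4, FAM_UTF16_BE},  /* "<?" */
    {{0x3C, 0x00, 0x3F, 0x00}, 4, FAM_UTF16_LE},  /* "<?" */
    {{0x3C, 0x3F, 0x78, 0x6D}, 4, FAM_UTF8},      /* "<?xm" in an ASCII superset */
    {{0x4C, 0x6F, 0xA7, 0x94}, 4, FAM_EBCDIC},    /* "<?xm" in EBCDIC */
    {{0xEF, 0xBB, 0xBF, 0x00}, 3, FAM_UTF8},      /* BOM */
    {{0xFE, 0xFF, 0x00, 0x00}, 2, FAM_UTF16_BE},  /* BOM */
    {{0xFF, 0xFE, 0x00, 0x00}, 2, FAM_UTF16_LE},  /* BOM */
};

/* Classify the first n bytes of a document (n may be less than four). */
enum xml_family xml_sniff(const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (n >= sigs[i].len && memcmp(b, sigs[i].sig, sigs[i].len) == 0)
            return sigs[i].fam;
    return FAM_UNKNOWN;   /* a caller would normally try UTF-8 here */
}
```

FAM_UTF8 for "<?xm" really means "some ASCII-compatible encoding": the
reader still has to parse the declaration to learn whether the document
is UTF-8, ISO-8859-1 or something else.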

> I have
> heard of files which switch encodings, but I think they are largely mythical.
> The basic idea of XML was very good, but I'm not impressed with the
> standard.

There are two kinds of standards: those that incorporate lots of options
because of all the interested parties, and those that make decisive
choices between competing candidates.

--
Ben.
