Why is flex pattern-matching of NULs slow?

<22-04-001@comp.compilers>

https://www.novabbs.com/devel/article-flat.php?id=354&group=comp.compilers#354

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: coste...@mitre.org (Roger L Costello)
Newsgroups: comp.compilers
Subject: Why is flex pattern-matching of NULs slow?
Date: Fri, 8 Apr 2022 11:06:00 +0000
Organization: Compilers Central
Lines: 14
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-04-001@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="77256"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, performance, question, comment
Posted-Date: 08 Apr 2022 11:48:17 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Accept-Language: en-US
Content-Language: en-US
 by: Roger L Costello - Fri, 8 Apr 2022 11:06 UTC

Hi Folks,

The Flex manual says this:

Pattern-matching of NULs is substantially slower
than matching other characters.

Why is that?

/Roger
[My recollection is that zero is used as a flag value in internal
tables and there is some slow kludge to say that this is a nul not the
flag, but perhaps someone who has looked at the code more recently
will remember the details. -John]

Why is flex pattern-matching of NULs slow?

<22-04-010@comp.compilers>

https://www.novabbs.com/devel/article-flat.php?id=359&group=comp.compilers#359

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: christop...@compiler-resources.com (Christopher F Clark)
Newsgroups: comp.compilers
Subject: Why is flex pattern-matching of NULs slow?
Date: Sat, 9 Apr 2022 21:40:45 +0300
Organization: Compilers Central
Lines: 31
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-04-010@comp.compilers>
References: <22-04-001@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="26217"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, i18n, comment
Posted-Date: 09 Apr 2022 16:19:41 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Christopher F Clark - Sat, 9 Apr 2022 18:40 UTC

I haven't looked at Flex in a while either, but what I remember is
that 0 is used as end of buffer and EOF indication and that you had to
validate against that. I don't recall whether that required an
attempt at reading or not. It wouldn't surprise me if it were used as
a flag also, and for a "null pointer". Depending upon how you look at
it, C either hates 0 or loves it, but it is very often "special".
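To make the "validate against that" cost concrete, here is a minimal sketch of a sentinel-terminated scan loop. It is not flex's actual code; the names count_a, SENTINEL, and slow_checks are made up for illustration, and the only assumption is the one described above: a zero byte sits just past the real data. Ordinary bytes take the short path, while every zero byte forces an extra "is this really the end?" check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel: one extra zero byte stored just past the data. */
#define SENTINEL '\0'

/* Count the letters 'a' in buf[0..len-1]; buf[len] must hold SENTINEL.
   *slow_checks receives how many times the slow path was taken. */
static size_t count_a(const char *buf, size_t len, size_t *slow_checks)
{
    const char *p = buf;
    const char *end = buf + len;
    size_t count = 0;

    assert(buf[len] == SENTINEL);
    *slow_checks = 0;

    for (;;) {
        char c = *p;
        if (c == SENTINEL) {
            /* Slow path: a zero byte is either the end-of-buffer mark
               or a NUL that belongs to the input. Only a pointer
               comparison can tell the two apart. */
            ++*slow_checks;
            if (p == end)
                break;          /* real end of buffer */
            ++p;                /* embedded NUL: ordinary data, skip it */
            continue;
        }
        /* Fast path: any non-zero byte. */
        if (c == 'a')
            ++count;
        ++p;
    }
    return count;
}

/* Example: count_a("banana", 6, &slow) returns 3 with slow == 1;
   count_a("a\0b\0a\0", 6, &slow) returns 2 with slow == 4. */
```

For text input the slow path runs once per buffer; for input full of NULs it runs once per NUL, which is the kind of overhead the manual's warning suggests.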

But if you are parsing human-readable ASCII text, having 0 (NUL) be an
EOF mark is actually not a bad solution. If I recall correctly, that
isn't even a bad choice for human-readable UTF-8 (including
non-Latin-1 texts, because 2- and 3-byte sequences don't have NULs in
them). It only becomes a pain if you want to parse binary data.
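The UTF-8 claim can be checked by brute force. The sketch below uses a hand-written encoder (utf8_encode and count_encodings_with_nul are illustrative names, not from any library) and walks every code point except U+0000, counting encodings that contain a zero byte. The same property holds for 4-byte sequences, since every lead byte and every continuation byte has its high bit set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is a surrogate
   or lies beyond U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Count code points in U+0001..U+10FFFF whose UTF-8 encoding
   contains a zero byte. */
static unsigned long count_encodings_with_nul(void)
{
    unsigned long hits = 0;
    for (uint32_t cp = 1; cp <= 0x10FFFF; ++cp) {
        unsigned char bytes[4];
        size_t n = utf8_encode(cp, bytes);
        for (size_t i = 0; i < n; ++i) {
            if (bytes[i] == 0) {
                ++hits;
                break;
            }
        }
    }
    return hits;
}
```

count_encodings_with_nul() returns 0, so in well-formed UTF-8 a zero byte can only be U+0000 itself.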

By the way, in our lexer we used -1 (i.e., what getc used to return
for EOF) for the same condition, and I don't recall how we put it in the
buffer (or whether we even did). Being ex-PL/I and Pascal
programmers, we used strings with lengths in many places instead of C
strings. I don't remember whether we used Paul Abrahams's clever hack
of putting the length at the end of the string, which, if done right, also
serves as a null byte for use as C strings.
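The out-of-band EOF idea can be sketched with plain getc. This is not the lexer described above, only an illustration (count_nuls is a made-up name): because getc returns an int, EOF is distinct from every byte value, so a NUL in the input is just another character and needs no second check.

```c
#include <assert.h>
#include <stdio.h>

/* Count the NUL bytes in a stream. The character is held in an int,
   not a char, so EOF cannot collide with any byte read from the
   stream, including 0. */
static long count_nuls(FILE *in)
{
    long nuls = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (c == 0)
            ++nuls;     /* NUL is ordinary data here */
    }
    return nuls;
}
```

The trade-off is that the end marker no longer fits in a byte buffer, which may be why the post is unsure whether it was ever stored there.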

--
******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------
[You're right about UTF-8, where NUL is also a reasonable string terminator.
UTF-8 is self-synchronizing -- the bytes of no UTF-8 code point are a prefix
or suffix of any other code point. -John]
