Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

"Turn on, tune up, rock out." -- Billy Gibbons


devel / comp.compilers / Re: State-of-the-art algorithms for lexical analysis?

SubjectAuthor
* Re: State-of-the-art algorithms for lexical analysis?Roger L Costello
+- Re: State-of-the-art algorithms for lexical analysis?gah4
`- Re: State-of-the-art algorithms for lexical analysis?gah4

1
Re: State-of-the-art algorithms for lexical analysis?

<22-06-009@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=430&group=comp.compilers#430

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: coste...@mitre.org (Roger L Costello)
Newsgroups: comp.compilers
Subject: Re: State-of-the-art algorithms for lexical analysis?
Date: Mon, 6 Jun 2022 10:48:24 +0000
Organization: Compilers Central
Lines: 17
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-06-009@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="36459"; mail-complaints-to="abuse@iecc.com"
Keywords: lex
Posted-Date: 06 Jun 2022 11:06:25 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Thread-Topic: State-of-the-art algorithms for lexical analysis?
Thread-Index: Adh5kg76Z0xZslIuRRyzgUhteE2M6A==
Accept-Language: en-US
Content-Language: en-US
authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=mitre.org;
 by: Roger L Costello - Mon, 6 Jun 2022 10:48 UTC

gah4 wrote:

> Pattern Specification Language (PSL) is
> much more powerful than the usual
> regular expression.

Neat!

> I suspect that if regexes hadn't previously
> been defined, we might come up with
> something different today.

Wow! That is a remarkable statement.

I will look into PSL. There are algorithms for converting regexes to DFA and then using the DFA to tokenize the input. Are there algorithms for converting PSL to (what?) and then using the (what?) to tokenize the input?

/Roger

Re: State-of-the-art algorithms for lexical analysis?

<22-06-011@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=432&group=comp.compilers#432

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: State-of-the-art algorithms for lexical analysis?
Date: Mon, 6 Jun 2022 10:03:55 -0700 (PDT)
Organization: Compilers Central
Lines: 29
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-06-011@comp.compilers>
References: <Adh5kg76Z0xZslIuRRyzgUhteE2M6A==> <22-06-009@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="77191"; mail-complaints-to="abuse@iecc.com"
Keywords: lex
Posted-Date: 06 Jun 2022 15:57:06 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-06-009@comp.compilers>
 by: gah4 - Mon, 6 Jun 2022 17:03 UTC

On Monday, June 6, 2022 at 8:06:28 AM UTC-7, Roger L Costello wrote:

(snip)

> I will look into PSL. There are algorithms for converting regexes to DFA
> and then using the DFA to tokenize the input. Are there algorithms for
> converting PSL to (what?) and then using the (what?) to tokenize the input?

The approximate searches are done using dynamic programming.
The penalty is 1 for insertion, deletion, or substitution and the score
is in 3 bits, so up to six spelling errors.

The whole query is then compiled into code for a systolic array,
which then runs as fast as the data comes off disk.

FDF2 is a 9U VME board that runs in a VME based Sun system.

FDF3 connects directly to a SCSI disk, and also to a Sun workstation.
In searching, it transfers directly from the disk. To load data into
the disk, the disk is accessed indirectly through the FDF3.
It is a desktop box, about the size of a large external SCSI disk.

Some of it is described here:

https://aclanthology.org/X93-1011.pdf

along with its use for searching Japanese text, and:

https://trec.nist.gov/pubs/trec3/papers/paper.ps.gz

Re: State-of-the-art algorithms for lexical analysis?

<22-06-014@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=435&group=comp.compilers#435

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: State-of-the-art algorithms for lexical analysis?
Date: Mon, 6 Jun 2022 12:25:56 -0700 (PDT)
Organization: Compilers Central
Lines: 21
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-06-014@comp.compilers>
References: <Adh5kg76Z0xZslIuRRyzgUhteE2M6A==> <22-06-009@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="79244"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, performance, comment
Posted-Date: 06 Jun 2022 16:05:09 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-06-009@comp.compilers>
 by: gah4 - Mon, 6 Jun 2022 19:25 UTC

On Monday, June 6, 2022 at 8:06:28 AM UTC-7, Roger L Costello wrote:

(snip, I wrote)
> > I suspect that if regexes hadn't previously
> > been defined, we might come up with
> > something different today.

> Wow! That is a remarkable statement.

Well, mostly, regex were defined based on what was reasonable to do on
computers at the time. It seems reasonable, then, with the more powerful
computers of today, to expect that more features would have been added.

Some of that was done in the later ERE, Extended Regular Expression.

But there is a strong tendency not to break backward compatibility,
and so not add new features later.
[See my note about DFAs a few messages back. EREs are just syntactic
sugar on regular REs so sure. PCREs are swell but they are a lot
slower since backreferences mean you need to be able to back up.
-John]

1
server_pubkey.txt

rocksolid light 0.9.8
clearnet tor