Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

Beware of Programmers who carry screwdrivers. -- Leonard Brandwein


devel / comp.compilers / Re: Some questions about recursive descent?

SubjectAuthor
* Some questions about recursive descent?Johann 'Myrkraverk' Oskarsson
+- Re: Some questions about recursive descent?gah4
+- Re: Some questions about recursive descent?Hans-Peter Diettrich
+* Re: Some questions about recursive descent?Anton Ertl
|+* Re: Some questions about recursive descent?Alain Ketterlin
||`* Re: Some questions about recursive descent?Johann 'Myrkraverk' Oskarsson
|| +- Re: Some questions about recursive descent?Anton Ertl
|| +- Re: Some questions about recursive descent?Alain Ketterlin
|| `- Re: Some questions about recursive descent?Hans-Peter Diettrich
|`- Re:Some questions about recursive descent?Paul Robinson
`- RE: Some questions about recursive descentChristopher F Clark

1
Some questions about recursive descent?

<22-02-021@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=282&group=comp.compilers#282

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: joh...@myrkraverk.com (Johann 'Myrkraverk' Oskarsson)
Newsgroups: comp.compilers
Subject: Some questions about recursive descent?
Date: Sun, 27 Feb 2022 19:02:59 +0000
Organization: Easynews - www.easynews.com
Lines: 73
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-02-021@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="1651"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, question
Posted-Date: 27 Feb 2022 16:37:05 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Content-Language: en-US
 by: Johann 'Myrkrav - Sun, 27 Feb 2022 19:02 UTC

Dear comp.compilers,

I am currently reading two compiler books, [Holub] and [Appel], and
then [Salomon] for a bit of context below,

[Holub] Compiler Design in C, 1990, by Allen Holub

[Appel] Modern Compiler Implementation in ML, 2004, by Andrew Appel

[Salomon] Assemblers and Loaders, 1993, David Salomon

and I have some questions about the construction of recursive descent
parsers. If some of my questions are adequately answered in some other
publication, reference, preferably with chapter and verse, is welcome.

The reason for my questions is that I know production compilers /use/
recursive descent. GCC is probably the most famous example, but Watcom
does it too. More below as well.

My first question is: how are production recursive descent parsers con-
structed? From these two books, and other things I have read in the
past, recursive descent is so simple it's mostly just brushed off with
an example or two, but there seems to be a whole lot more to it. For
instance, [Appel] mentions a parse table but mostly does not explain how
to construct one. The rest of the book seems to be focused on ML-lex
and ML-yacc, which I haven't read yet.
I have read more of [Holub] and he explains the theory much better,
in fact this seems to be the best theory introduction I remember coming
across ever. Yet, that book also does not spend too much effort on rec-
ursive descent, and goes on about implementing and using lex and yacc
like tools.
[Holub] explains the construction/algorithms for FIRST and FOLLOW
sets better than [Appel], in my opinion, but I do not recall any more
details about said parse table, which seems instrumental in the con-
struction of the final recursive descent. [Holub] does mention Q
grammars, and how they are very easy to turn into recursive descent.

Second question: why are recursive descent parsers I've come across
always using globals and construct code and/or the parse tree as side
effects, rather than, say, return the parse tree to the caller?
Is this something to do with /synthesized attributes/ that [Holub]
talks about? Where the recursive descent parser routines return values
to the caller, or is this maybe just a tradition that no one bothers to
change?

Now, to give these questions a bit of context, as a practice, I wanted
to create a recursive descent parser for the first programming exercise
in [Salomon] but found out the hard way that it's not so trivial to
figure out /how/.
My first methods were somewhat ad-hoc to create and modify the
grammar to be non-left-recursive, which I think was mostly easy due to
the assembler being very bare bones, and hence the grammar can be very
simple. Then I ran into trouble when I tried to make the parser return
the parse tree, rather than construct it with side effects.
Granted, some of my problems are from using a programming language,
ML, that I'm not familiar with, which I decided to use for another
wrinkle in the exercise. If the project is too easy, it just isn't
/fun/.
The first wrinkle I stumbled upon, when I started to read up on the
subject, is that constructing the parse table, as a preliminary step,
isn't really explained in [Appel] -- at least not to my satisfaction --
and I found not much better coverage of the subject in [Holub] though
certain topics are covered better.
In the past, when I've stumbled like this, reading up on the subject
has helped a lot, but so far, almost everything I've read seems inade-
quate in different ways.

Side thought: do any of the mentioned authors read this list?

Thanks,
--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk

Re: Some questions about recursive descent?

<22-02-022@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=283&group=comp.compilers#283

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: Some questions about recursive descent?
Date: Sun, 27 Feb 2022 17:13:34 -0800 (PST)
Organization: Compilers Central
Lines: 34
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-02-022@comp.compilers>
References: <22-02-021@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="55008"; mail-complaints-to="abuse@iecc.com"
Keywords: parse
Posted-Date: 27 Feb 2022 21:52:24 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-02-021@comp.compilers>
 by: gah4 - Mon, 28 Feb 2022 01:13 UTC

On Sunday, February 27, 2022 at 1:37:09 PM UTC-8, Johann 'Myrkraverk' Oskarsson wrote:
> I am currently reading two compiler books, [Holub] and [Appel], and
> then [Salomon] for a bit of context below,
>
> [Holub] Compiler Design in C, 1990, by Allen Holub

> [Appel] Modern Compiler Implementation in ML, 2004, by Andrew Appel
>
> [Salomon] Assemblers and Loaders, 1993, David Salomon

> and I have some questions about the construction of recursive descent
> parsers. If some of my questions are adequately answered in some other
> publication, reference, preferably with chapter and verse, is welcome.

A few thoughts, which might not answer all your questions.

One is that compilers that use recursive descent often don't use it for
everything. Some might use it only for the statement level, and
something else, such as operator precedence, for expressions.

One of the complications with recursive descent is error handling.
Consider that a user might forget, or misspell, a closing statement
of some block structure. All the rest of the program is then parsed
as inside the block, or at least until something that isn't allowed inside.

And that might partly get to your question about returning the parse tree.
Without errors, that might be fine. But when errors occur, the compiler
has to undo much that was done, but should not have been done. That
is somewhat easier with a global tree, than one that is distributed
throughout the recursive contexts of the called routines.

Also, much of earlier compiler design was done when computer
memories were small. Designing compilers for small memory
usage is now a lost art.

Re: Some questions about recursive descent?

<22-02-023@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=284&group=comp.compilers#284

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: DrDiettr...@netscape.net (Hans-Peter Diettrich)
Newsgroups: comp.compilers
Subject: Re: Some questions about recursive descent?
Date: Mon, 28 Feb 2022 06:48:09 +0100
Organization: Compilers Central
Lines: 58
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-02-023@comp.compilers>
References: <22-02-021@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="210"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, LL(1)
Posted-Date: 28 Feb 2022 10:55:14 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-02-021@comp.compilers>
 by: Hans-Peter Diettrich - Mon, 28 Feb 2022 05:48 UTC

On 2/27/22 8:02 PM, Johann 'Myrkraverk' Oskarsson wrote:

> My first question is: how are production recursive descent parsers con-
> structed?

I think that most parsers are hand made, from a well suited (EBNF...)
grammar.

But also have a look at the many tools found by "LL(1) parser
generator". I don't think that all these tools are/have been used to
generate compilers for a couple of new programming languages.

I've explored and modified CoCo/R many years ago. After I understood the
principles I wrote my essential parsers for OPL and C manually, based on
the fine patterns of that Compiler Compiler.

> Second question: why are recursive descent parsers I've come across
> always using globals and construct code and/or the parse tree as side
> effects, rather than, say, return the parse tree to the caller?

Don't think only of programming language compilers if you ask for a
parser. There exist many use cases for parsers like pretty printer, CSV,
RNA or other data deciphering. Not all such "compilers" need a
(complete) parse tree but interpret the parser output differently,
depending on the compilation target.

>     Is this something to do with /synthesized attributes/ that [Holub]
> talks about?  Where the recursive descent parser routines return values
> to the caller, or is this maybe just a tradition that no one bothers to
> change?

One of the nasties disadvantages of commonly used formal grammars are
the specific attributes or similar decoration for a specific compiler.
This makes it hard to use a verified grammar for construction of a tool
other than the one the grammar writer had in mind, maybe a different
compiler implementation language or framework. So one use case of a
parser generator were a decoration remover or converter from some
grammar. This were my most important argument *against* embedding the
parse tree generation into the parser.

Another very interesting experiment was cooperation between Quinn Tyler
Jackson's MetaS parser program, written in C++, and my parse tree
generator, written in Delphi.

> Now, to give these questions a bit of context, as a practice, I wanted
> to create a recursive descent parser for the first programming exercise
> in [Salomon] but found out the hard way that it's not so trivial to
> figure out /how/.

Parser generators can be too picky WRT minor problems of a grammar, like
the famous dangling else. Most practical solution then is a reduced
grammar without the problematic parts and full or partial manual parser
implementation.

DoDi

Re: Some questions about recursive descent?

<22-02-024@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=285&group=comp.compilers#285

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: Re: Some questions about recursive descent?
Date: Mon, 28 Feb 2022 07:43:39 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 79
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-02-024@comp.compilers>
References: <22-02-021@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="739"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, LL(1)
Posted-Date: 28 Feb 2022 10:56:12 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Anton Ertl - Mon, 28 Feb 2022 07:43 UTC

Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.com> writes:
>My first question is: how are production recursive descent parsers con-
>structed? From these two books, and other things I have read in the
>past, recursive descent is so simple it's mostly just brushed off with
>an example or two, but there seems to be a whole lot more to it. For
>instance, [Appel] mentions a parse table but mostly does not explain how
>to construct one.

I don't know what such a table would be good for. My parser generator
Gray <https://www.complang.tuwien.ac.at/forth/gray.zip>, which
generates recursive-descent parsers, certainly does not need one.

The way Gray works is: It computes first and follow sets for all nodes
in the grammar; follow sets are only used for computing first sets and
for reporting conflicts, first sets are used for deciding how to
parse. It generates code as follows:

grammar code
terminal if sym in first(terminal) then sym=scan(); else syntax_error(); end
a b code(a) code(b)
a|b if sym in first(a) then code(a); else code(b); end
a? if sym in first(a) do code(a); end
a* while sym in first(a) do code(a); end
a+ do code(a) while sym in first(a)
nonterm call(function(nonterm))
action action

a?, a* and a+ are regular right part grammar (RRPG) syntax; they
correspond to [a], {a} and a{a} respectively in EBNF.

>Second question: why are recursive descent parsers I've come across
>always using globals and construct code and/or the parse tree as side
>effects, rather than, say, return the parse tree to the caller?

I have no explanation for the case of constructing the parse tree;
that's trivial to do by returning it.

Concerning code generation, constructing it as a string is inefficient
with the conventional string representation (array of chars), because
concatenation takes time proportional to the length of the string
(likewise, if you generate machine code or virtual machine code as an
array of bytes). You can now add an unconventional string
representation (e.g., a rope), but that still consumes memory; or you
just use side effects for outputting the code.

The disadvantage of the latter is that once you have output the code,
you cannot take it back (or at least not easily), while with strings
you can arrange the code for the nodes in a different than processing
order, have the code for a node several times etc. without introducing
additional complications in your compiler.

> Is this something to do with /synthesized attributes/ that [Holub]
>talks about? Where the recursive descent parser routines return values
>to the caller

Synthesized attributes are trivial to pass along in recursive-descent
parsers by returning them. You can also trivially do L-attributed
grammars by passing attributes inherited from left of a nonterminal as
parameters to a rule.

> Then I ran into trouble when I tried to make the parser return
>the parse tree, rather than construct it with side effects.
> Granted, some of my problems are from using a programming language,
>ML, that I'm not familiar with

I would expect that returning the parse tree is easier in ML than
using side effects (especially if you are unfamiliar with ML, because
literature about ML probably does not explain side effects early on).

If you found the literature you have read up to now to be not
satisfactory, maybe Wirth's compiler book is more to your taste. IIRC
he explains how to write a recursive-descent parser for PL/0 (a simple
Pascal-like language).

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

RE: Some questions about recursive descent

<22-02-025@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=286&group=comp.compilers#286

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: christop...@compiler-resources.com (Christopher F Clark)
Newsgroups: comp.compilers
Subject: RE: Some questions about recursive descent
Date: Mon, 28 Feb 2022 15:40:37 +0200
Organization: Compilers Central
Lines: 50
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-02-025@comp.compilers>
References: <22-02-021@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="1706"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, LL(1)
Posted-Date: 28 Feb 2022 10:57:34 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Christopher F Clark - Mon, 28 Feb 2022 13:40 UTC

I'll start with the most important response first.

It should definitely be possible to return the generated syntax tree
from a procedure implementing a parser rule (for a non-terminal) in a
recursive descent compiler. In fact, it surprises me that you say
that's not the normal case. It's *almost* the definition of recursive
descent. One calls the function implementing some non-terminal and
the non-terminal consumes its input, building internally a
representation of that input (as a single structure, i.e. an AST node)
and returns that structure to the caller, so that the caller can do
the same. Each call makes a new rooted tree (that identifies what
kind of tree it is, i.e. what non-terminal it represents) and it is
used by the caller as part of the larger tree containing it. Now, I
supposed it *could* make sense not to make that the return value for
the function implementing the non-terminal, but it certainly is a
natural value to return. More on that after I talk about tables.

It is possible to implement LL (as was as LR and precedence and other)
parsers as parser tables. Recursive descent doesn't generally do so.
It implements that parser as code, specifically nested function calls,
with some (generally, but not necessarily) small amount of logic to
determine which variant (alternative) of a non-terminal is being
processed. This logic is what uses the result of the first (and
follow) functions to determine based upon the initial tokens which
alternative matches the input. This is generally done without parsing
tables. Using parsing tables generally suggests you are implementing
a different parsing methodology than recursive descent, even if you
are doing so only as a subroutine within a recursive descent parser.
The parts not using the tables are the recursive descent part. The
part using the table is an embedded parsing technology within an
otherwise recursive descent framework.

Ok, back to the question of what a recursive descent parser's routine
implementing the parsing of a non-terminal should be. In an
Object-oriented world, it would be reasonable for it to return an
"object" representing the non-terminal, with methods/member functions
that get you the sub-parts, evaluate attributes, etc. Another
reasonable return value, comes from the Rust world, it could be a
"result type", where it is either the non-terminal as discussed
before, or an error. (Result<NonTerminal, Error>). However, in the
non-error case, you generally want each parsing rule to return the
root non-terminal of the AST is parsed.

--
******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------

Re: Some questions about recursive descent?

<22-02-026@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=287&group=comp.compilers#287

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ala...@universite-de-strasbourg.fr.invalid (Alain Ketterlin)
Newsgroups: comp.compilers
Subject: Re: Some questions about recursive descent?
Date: Mon, 28 Feb 2022 21:52:54 +0100
Organization: Université de Strasbourg
Lines: 60
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-02-026@comp.compilers>
References: <22-02-021@comp.compilers> <22-02-024@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="15798"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, LL(1), comment
Posted-Date: 28 Feb 2022 16:19:54 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Alain Ketterlin - Mon, 28 Feb 2022 20:52 UTC

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

> Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.com> writes:
>>My first question is: how are production recursive descent parsers con-
>>structed? From these two books, and other things I have read in the
>>past, recursive descent is so simple it's mostly just brushed off with
>>an example or two, but there seems to be a whole lot more to it. For
>>instance, [Appel] mentions a parse table but mostly does not explain how
>>to construct one.
>
> I don't know what such a table would be good for. My parser generator
> Gray <https://www.complang.tuwien.ac.at/forth/gray.zip>, which
> generates recursive-descent parsers, certainly does not need one.

A parse table for recursive descent/LL(1) simply tells which rule to
expand given the non-terminal currently parsed and the incoming token
type. The building of this table is described in, e.g., the dragon book
(for sure in the 2nd edition).

> The way Gray works is: It computes first and follow sets for all nodes
> in the grammar; follow sets are only used for computing first

You probably mean the reverse (first is used to compute follow).

> sets and for reporting conflicts, first sets are used for deciding how
> to parse. It generates code as follows:
>
> grammar code
> terminal if sym in first(terminal) then sym=scan(); else syntax_error(); end
> a b code(a) code(b)
> a|b if sym in first(a) then code(a); else code(b); end
> a? if sym in first(a) do code(a); end
> a* while sym in first(a) do code(a); end
> a+ do code(a) while sym in first(a)
> nonterm call(function(nonterm))
> action action

The parsing table embodies all that logic. Parsing works by maintaining
a stack of symbols, and the table indicates what actions need to be
taken (either replace one non-terminal with a rhs on the stack, or match
one input symbol). The algorithm works with any table, and the whole
thing (algorithm+table) is certainly much shorter than generating code
for each and every rule. EBNF constructions like ?/*/+/... have to be
translated first into "regular" rules, but that's trivial pre-processing
(along with left factoring/recursivity elimination).

Building a parse tree in this context is less obvious, because
non-terminal symbols seem to "disappear" during the parsing. The
solution (at least the one I came up with) consists in keeping special
entries on the stack signaling the end of a rhs. Then, special actions
(i.e., "reduce") can be taken when finding these special entries during
parsing, much in the same way as it is done in bottom-up (e.g., LR)
parsing: the "semantic values" of the various symbols can be pushed onto
a separate stack, and you end up with something really similar to
LR-parsing.

-- Alain.
[I think the point is that if you have a table, you have a table driven
LL(1) parser rather than recursive descent in which the grammar is embedded
in the code. -John]

Re: Some questions about recursive descent?

<22-02-027@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=288&group=comp.compilers#288

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: joh...@myrkraverk.invalid (Johann 'Myrkraverk' Oskarsson)
Newsgroups: comp.compilers
Subject: Re: Some questions about recursive descent?
Date: Tue, 1 Mar 2022 01:40:33 +0000
Organization: Easynews - www.easynews.com
Lines: 103
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-02-027@comp.compilers>
References: <22-02-021@comp.compilers> <22-02-024@comp.compilers> <22-02-026@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="89437"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, LL(1), comment
Posted-Date: 28 Feb 2022 20:57:17 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Content-Language: en-US
In-Reply-To: <22-02-026@comp.compilers>
 by: Johann 'Myrkrav - Tue, 1 Mar 2022 01:40 UTC

Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.com> writes:
>>> My first question is: how are production recursive descent parsers con-
>>> structed? From these two books, and other things I have read in the
>>> past, recursive descent is so simple it's mostly just brushed off with
>>> an example or two, but there seems to be a whole lot more to it. For
>>> instance, [Appel] mentions a parse table but mostly does not explain how
>>> to construct one. ...

I think, at this point, I'd like to demonstrate what [Appel] was talking
about. I can't expect every reader of comp.compilers to have the book
at hand.

The demonstration starts with grammar 3.12, page 47.

Z -> d Y -> X -> Y
Z -> X Y Z Y -> c X -> a

where Y goes to epsilon in the centre of the first line.
Then [Appel] shows how FIRST and FOLLOW sets are computed
given this grammar, ending with

nullable FIRST FOLLOW
X yes a c a c d
Y yes c a c d
Z no a c d

where previous steps are also shown[1]. Then figure 3.14 shows a
predictive parsing table, presumably constructed using the FIRST
and FOLLOW sets.

a c d
+-----------------------------------------+
X | X -> a X -> Y X -> Y |
| Y -> Y |
| |
Y | Y -> Y -> Y -> |
| Y -> c |
| |
Z | Z -> X Y Z Z -> X Y Z Z -> d |
| Z -> X Y Z |
+-----------------------------------------+

Where {X,a}, {Y,c}, and {Z,d} show conflicts, so this cannot be a
recursive descent parser; or LL(1) grammar per the book.

I am not clear on how to construct such a table from the description
in the book, and would not presume to be able to do so by hand, using
a more complex grammar.

I'll give the paragraph where [Appel] explains /recursive descent/.

> Consider a recursive-descent parser. The parsing function for some
> nonterminal X has a clause for each X-production; it must choose one
> of these clauses based on the next token T of the input. If we can
> choose the right production for each (X, T), then we can write the
> recursive-descent parser. All the information we need can be encoded
> as a two-dimensional table of productions, indexed by nonterminals X
> and terminals T. This is called a /predictive parsing/ table.

So I was wondering if people who create /recursive descent/ parsers for
production compilers use such a table as an intermediary step, or not.
Of course, as I mentioned before, [Holub] mentions Q grammars as an
alternative intermediary step, but only in an off hand manner.

This lack of clear directions led me to ask the first question: how are
production compilers using /recursive descent/ constructed? What are
the steps involved? What do people use, or not use?

I don't know Forth, so I haven't done much more than take a cursory look
at the Gray manual; but it certainly wouldn't surprise me if people used
a skeleton generator, from [E]BNF grammar, and then filled in the
details later.

I tried to make the question open-ended, because I am pretty sure people
use methods not described in either book. I have not looked this up in
the dragon book, because currently my copy is in a different country,
and the shipping cost might rival the price of a new copy. Given a
chapter and verse in that book, I can probably read it with a little bit
of help, and I am totally up to get myself further material if it's
useful.

[1] I think overall the steps are better explained in [Holub] but I did
not try to following along with this particular grammar, using [Holub]'s
method.

P.S. I'll try to reply to the other threads later on.

--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk

[Appel is showing how you turn a normal grammar into an LL(1) grammar. The
algorithm on page 49 tells you how to make the first and follow sets, then
the informal algorithm in the second para on page 51 turns it into the parse
table. Then he says gee, the table has more then one rule in some of the
cells so he spends a few more pages turning it into an LL(1) grammar on
pages 53-54. In my experience most people writing RD parsers do this
informally, e.g., to deal with left recursion in expressions you know
that you eat the first token, then peek at the next token and eat it
if you're in the correct rule, otherwise return and let the caller
deal with it. If this seems like more trouble than it's worth, now
you know why we use parser generators. -John]

Re: Some questions about recursive descent?

<22-03-001@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=289&group=comp.compilers#289

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: Re: Some questions about recursive descent?
Date: Tue, 01 Mar 2022 08:01:24 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 50
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-001@comp.compilers>
References: <22-02-021@comp.compilers> <22-02-024@comp.compilers> <22-02-026@comp.compilers> <22-02-027@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="41624"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, LL(1)
Posted-Date: 01 Mar 2022 17:38:47 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Anton Ertl - Tue, 1 Mar 2022 08:01 UTC

Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> writes:
>So I was wondering if people who create /recursive descent/ parsers for
>production compilers use such a table as an intermediary step, or not.

Gray doesn't. For finding out about conflicts, it just looks at the
nodes involved. E.g., for a|b, it produces a warning if both a and b
can produce epsilon, if the first sets of a and b are not disjoint,
and if a|b may produce epsilon and its first set and its follow set
are not disjoint.

>I don't know Forth, so I haven't done much more than take a cursory look
>at the Gray manual; but it certainly wouldn't surprise me if people used
>a skeleton generator, from [E]BNF grammar, and then filled in the
>details later.

Gray is not a skeleton generator. It takes an RRPG (regular right
part grammar, similar to EBNF) with actions, and produces an
executable parser (i.e., not as source code). You then run that
parser, you do not fill in the details later; you provide all the
details in the source code before you generate the parser.

BTW, an updated version of Wirth's book that I mentioned is available
online for free:
<https://people.inf.ethz.ch/wirth/CompilerConstruction/CompilerConstruction1.pdf>.
The sections of interest to you seem to be Section 4.1 and 7.2,
probably also 7.3.

>[ [...] In my experience most people writing RD parsers do this
>informally, e.g., to deal with left recursion in expressions you know
>that you eat the first token, then peek at the next token and eat it
>if you're in the correct rule, otherwise return and let the caller
>deal with it. If this seems like more trouble than it's worth, now
>you know why we use parser generators. -John]

Or you start with EBNF or RRPG where you can express repetition
without recursion, so left recursion is rare.

I happen to look at Exercise 7.2 at this moment, which suggests
another way to deal with difficulties:

|7.2. Where is the Oberon syntax not LL(1), that is, where is a
|lookahead of more than one symbol necessary? Change the syntax in such
|a way that it satisfies the LL(1) property.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Re: Some questions about recursive descent?

<22-03-002@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=290&group=comp.compilers#290

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ala...@universite-de-strasbourg.fr (Alain Ketterlin)
Newsgroups: comp.compilers
Subject: Re: Some questions about recursive descent?
Date: Tue, 01 Mar 2022 23:13:25 +0100
Organization: Université de Strasbourg
Lines: 113
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-002@comp.compilers>
References: <22-02-021@comp.compilers> <22-02-024@comp.compilers> <22-02-026@comp.compilers> <22-02-027@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="42163"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, LL(1)
Posted-Date: 01 Mar 2022 17:40:05 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Alain Ketterlin - Tue, 1 Mar 2022 22:13 UTC

Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> writes:

> Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.com> writes:
>>>> My first question is: how are production recursive descent parsers con-
>>>> structed? From these two books, and other things I have read in the
>>>> past, recursive descent is so simple it's mostly just brushed off with
>>>> an example or two, but there seems to be a whole lot more to it. For
>>>> instance, [Appel] mentions a parse table but mostly does not explain how
>>>> to construct one. ...

[...]
> The demonstration starts with grammar 3.12, page 47.
>
> Z -> d Y -> X -> Y
> Z -> X Y Z Y -> c X -> a
>
> where Y goes to epsilon in the centre of the first line.
> Then [Appel] shows how FIRST and FOLLOW sets are computed
> given this grammar, ending with
>
> nullable FIRST FOLLOW
> X yes a c a c d
> Y yes c a c d
> Z no a c d
>
> where previous steps are also shown[1]. Then figure 3.14 shows a
> predictive parsing table, presumably constructed using the FIRST
> and FOLLOW sets.
>
> a c d
> +-----------------------------------------+
> X | X -> a X -> Y X -> Y |
> | Y -> Y |
> | |
> Y | Y -> Y -> Y -> |
> | Y -> c |
> | |
> Z | Z -> X Y Z Z -> X Y Z Z -> d |
> | Z -> X Y Z |
> +-----------------------------------------+
>
> Where {X,a}, {Y,c}, and {Z,d} show conflicts, so this cannot be a
> recursive descent parser; or LL(1) grammar per the book.
>
> I am not clear on how to construct such a table from the description
> in the book, and would not presume to be able to do so by hand, using
> a more complex grammar.

That's a 2-dimensional table M[N,t] where N is a non-terminal, and t
is a terminal symbol. It is build by the following algorithm:

for each rule L -> R1...Rk
for each terminal f in FIRST (R1...Rk)
add the rule to M[L,f]
if R1...Rk is nullable
for each terminal f in FOLLOW (L)
add the rule to M[L,f]

In essence: FIRST lets you know when to expand a non-terminal, and
FOLLOW lets you know when to apply a rule that matches nothing in
the input.

The grammar is LL(1) is all entries of the table contain at most one
rule. That is not the case in your example.

Parsing goes as follows: maintain a stack on which you initially push
the start symbol. Then repeat:

if top(stack) is a terminal symbol
if it matches the next input
pop it from the stack and advance input
else
signal error
else
if M[top(stack),next_input] is empty
signal error
else // M[...] is a rule L -> R1...Rk
pop an element from the stack and push Rk, ..., R1

(note that you push the rhs in reverse order, such as the leftmost
symbol appears on the top of the stack).

The parsing succeeds if you end up with an empty stack and an empty
input.

I don't have my copy of the dragon book with me, but from pearson's web
site I find this is the object of chapter 4 section 4.

> So I was wondering if people who create /recursive descent/ parsers for
> production compilers use such a table as an intermediary step, or not.
[...]

As John says, if your language fits the requirements (that is, it has a
LL(1) grammar, or LR(1)), you are better off using a generator. It turns
out that "big" compiler parsers are often written by hand, because 1)
grammars are often ambiguous (sometimes at the lexical level), 2)
ambiguities are easier to resolve "by hand" because you can take more
context into account, and 3) hand-written parsers give you more control
on error recovery. (Also because memory is now big enough to hold a
parse tree.)

For C/C++ parsers, add the fact that there are several grammatical
layers (e.g., OpenMP pragmas), and/or you need details that automated
parsing usually omits (e.g., for diagnostics and tools). That's why both
gcc and clang parsers are hand-written.

The official Python implementation used an LL(1) (auto-generated) parser
until recently, where they switched to a PEG-parser (I have to admit I
don't understand their motivation for doing this -- as exposed in a
document called PEP 617). It is still auto-generated as far as I
understand.

-- Alain.

Re: Some questions about recursive descent?

<22-03-003@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=291&group=comp.compilers#291

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: DrDiettr...@netscape.net (Hans-Peter Diettrich)
Newsgroups: comp.compilers
Subject: Re: Some questions about recursive descent?
Date: Wed, 2 Mar 2022 05:52:46 +0100
Organization: Compilers Central
Lines: 16
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-003@comp.compilers>
References: <22-02-021@comp.compilers> <22-02-024@comp.compilers> <22-02-026@comp.compilers> <22-02-027@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="21177"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, LL(1)
Posted-Date: 02 Mar 2022 09:50:40 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-02-027@comp.compilers>
 by: Hans-Peter Diettrich - Wed, 2 Mar 2022 04:52 UTC

On 3/1/22 2:40 AM, Johann 'Myrkraverk' Oskarsson wrote:

> where previous steps are also shown[1].  Then figure 3.14 shows a
> predictive parsing table, presumably constructed using the FIRST
> and FOLLOW sets.
>
>           a            c             d
>     +-----------------------------------------+
>   X |  X -> a        X -> Y        X -> Y     |
>     |  Y -> Y                                 |

Just a typo correction: In (X,a) the second alternative should read
X -> Y
not Y -> Y.

DoDi

Re:Some questions about recursive descent?

<22-03-005@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=293&group=comp.compilers#293

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: xdpas...@xdpascal.com (Paul Robinson)
Newsgroups: comp.compilers
Subject: Re:Some questions about recursive descent?
Date: 5 Mar 2022 14:46:08 -0600
Organization: Compilers Central
Lines: 108
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-005@comp.compilers>
References: <22-02-021@comp.compilers> <22-02-024@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="17429"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, LL(1)
Posted-Date: 05 Mar 2022 16:53:47 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Paul Robinson - Sat, 5 Mar 2022 20:46 UTC

On 27 Feb 2022 16:37:05 EST, Johann 'Myrkraverk' Oskarsson
<johann@myrkraverk.com> asked in Newsgroup comp.compilers:

| I have some questions about the construction of
| recursive descent parsers. The reason for my questions
| is that I know production compilers /use/ recursive
| descent. My first question is: how are production
| recursive descent parsers constructed?

If you want to see another example of this, since I added features to
it, is the XDPascal compiler, available from
https://github.com/electric-socket/xdpw or https://XDpascal.com. Now,
what I'm going to do is explain why and how this is done.

Pascal syntax requires that, after a procedure or function signature,
and any constants, types or variables are declared, it must have a
block. This is a BEGIN, zero or more statements, each separated by a
semicolon, then an optional semicolon, then END and a semicolon if it's
a procedure or function, a period if it's the end of a program. A syntax
diagram may help here:

Block: ^______________>|
| V
--->BEGIN---+-->Statement---+-->V--------->^----> END --->
^_______ ; <____V |__> ;_>__|

In the code example below, any identifier ending
in TOK is the token for that symbol or keyword,
e.g. DOTOK for DO, DOTTOK for period, etc.

So at the end of the compile, the compiler would
have a piece of code similar to this:

block;
nexttoken;
if thistoken<>DOTTOK then error('Period expected');
finishcompiling;

Procedure "block" would have something similar to
the following:

repeat
statement;
until thistoken = ENDTOK or END_OF_FILE ;

Procedure "Statement" would look like this:

nextToken;
Case thisToken of
Identifier : Identstmt;
FORTOK: ForStmt
REPEATTOK: RepeatStmt;
BEGINTOK: Block; // [1]
...
WHILETOK: WhileStmt
ELSE
Error('Syntax error');

FOR statement:
----> FOR --> := --> ident ------> expression -->|
|<-----------------------------------------------V
V--V------- TO ----->----> expression --> DO --|
V____> DOWNTO -->| |
|
|<---------------------------------------------V
V--------- statement ----------------->

Now, procedure ForStmt might be coded like this:

nexttoken;
if thistok <> BECOMESTOK then
error(' := expected');
nexttoken;
if thistok <> identTok then
error(' Identifier expected ');
...

and so on until:

nexttoken;
if thistok <> DOTOK then
error(' DO expected');
nexttoken;
statement;

Now, if you look, at // [1], the case selector for
the token BEGIN calls procedure block again, while
it itself was called by block earlier. And, after the
DO token in FOR, statement calls itself.

If you notice, recursive descent works because each
statement in the program is processed until a "trigger"
token exits that state back to where it was called.
On return, syntax checking continues with the next
token after the one that caused a block or statement
procedure to be called.

I hope this example helps you to understand recursive
descent parsing.

XDPascal.com: The (new) home of the XDPascal Compiler.
----
"The lessons of history teach us - if they teach us anything - that no
one learns the lessons that history teaches us."
Paul Robinson <XDpascal@XDPascal.com> / <paul@paul-robinson.us>

1
server_pubkey.txt

rocksolid light 0.9.8
clearnet tor