Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

With all the fancy scientists in the world, why can't they just once build a nuclear balm?


devel / comp.compilers / Re: How do you create a grammar for a multi-language language?

SubjectAuthor
* How do you create a grammar for a multi-language language?Roger L Costello
+* Re: How do you create a grammar for a multi-language language?Roger L Costello
|+- Re: How do you create a grammar for a multi-language language?Kartik Agaram
|`* RE: How do you create a grammar for a multi-language language?Christopher F Clark
| +* Re: How do you create a grammar for a multi-language language?gah4
| |`- Re: How do you create a grammar for a multi-language language?gah4
| `- RE: How do you create a grammar for a multi-language language?Roger L Costello
+* Re: How do you create a grammar for a multi-language language?gah4
|`- Re: How do you create a grammar for a multi-language language?Robin Vowels
`* Re: How do you create a grammar for a multi-language language?Hans-Peter Diettrich
 `* Re: How do you create a grammar for a multi-language language?Hans-Peter Diettrich
  `* Re: How do you create a grammar for a multi-language language?gah4
   `* Keywords and Reserved WordsChristopher F Clark
    +* Re: Keywords and Reserved Wordsgah4
    |`- Re: Keywords and Reserved WordsRobin Vowels
    +* Re: Keywords and Reserved WordsRobin Vowels
    |`- Re: Keywords and Reserved Words in FortranThomas Koenig
    `- Re: Keywords and Reserved WordsRobin Vowels

1
How do you create a grammar for a multi-language language?

<22-03-004@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=292&group=comp.compilers#292

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: coste...@mitre.org (Roger L Costello)
Newsgroups: comp.compilers
Subject: How do you create a grammar for a multi-language language?
Date: Thu, 3 Mar 2022 13:57:55 +0000
Organization: Compilers Central
Lines: 19
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-004@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="16318"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, question
Posted-Date: 05 Mar 2022 16:50:21 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Thread-Topic: How do you create a grammar for a multi-language language?
Thread-Index: AdgvBM3tabuoXFasQLiTikka5cGndQ==
Content-Language: en-US
 by: Roger L Costello - Thu, 3 Mar 2022 13:57 UTC

Hello Compiler Experts!

Suppose you are creating a grammar for a language that hosts other languages.
For example, the (parent) language hosts the regular expression language and
the XPath language. How do you create a grammar for a multi-language language?
I can imagine two approaches:

1. Create the grammar for the parent language and copy and paste into it the
grammars of the hosted languages. Copy-and-paste doesn't sound appealing.

2. Create a grammar just for the parent language. Then, create a parsing
pipeline: parse the input first with the grammar for the parent language, then
parse the input with the grammar for one hosted language, then parse the input
with the grammar for the second hosted language, and so forth. I have no idea
how this would work; e.g., how would an abstract syntax tree be constructed?

How do you create a grammar for a multi-language language?

/Roger

Re: How do you create a grammar for a multi-language language?

<22-03-006@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=294&group=comp.compilers#294

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: coste...@mitre.org (Roger L Costello)
Newsgroups: comp.compilers
Subject: Re: How do you create a grammar for a multi-language language?
Date: Sat, 5 Mar 2022 22:29:37 +0000
Organization: Compilers Central
Lines: 51
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-006@comp.compilers>
References: <22-03-004@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="40565"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, question, comment
Posted-Date: 05 Mar 2022 18:27:13 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Content-Language: en-US
 by: Roger L Costello - Sat, 5 Mar 2022 22:29 UTC

Kartik Agaram wrote:

* I'd re-ask the question in the context of the concrete languages you're
considering.

I am working with a team to develop a new language and our current thinking is
that the language will use the XPath language for navigating through documents
and the CSS language for expressing sets of match/action pairs.

This idea of multi-language languages is very common within the XML community.
For example, the XSLT language uses (hosts) the XPath language. Here is an
excerpt to illustrate:

select="/Bookstore/Book[1]/Title"

The expression /Bookstore/Book[1]/Title is an XPath expression, the other
parts are XSLT. So, the format of the select statement is:

select="XPath"

See how XSLT hosts XPath? That is, one language (XSLT) is using another
language (XPath).

Another XML language called XML Schema hosts two languages: the regex language
and the XPath language.

How do compiler experts create grammars for multi-language languages?

/Roger

From: Kartik Agaram <ak@akkartik.com>
Sent: Saturday, March 5, 2022 4:58 PM
Subject: [EXT] Re: How do you create a grammar for a multi-language language?

The first way is unlikely to work. Inlining one grammar into another is
overwhelmingly likely to result in garbage. If you're lucky it'll recognize
one of the languages and ignore the productions of the other. More likely
it'll recognize a single language that's an uncontrollable melange of both. In
the worst case it'll seem to work but fail in strange corner cases.

In general, I think you under-estimate the difficulty of this problem, and how
easy it is for two languages to be fundamentally incompatible. The first
question isn't "what's the cleanest way to do this for arbitrary languages?"
but "is it even possible for these two languages?" followed by "is it worth
the trouble?"

So I'd re-ask the question in the context of the concrete languages you're
considering.
[XSLT is written in XML. Conceptually at least, first you parse the XML using
an XML grammar, then you parse the XSLT using the XSLT grammar. -John]

Re: How do you create a grammar for a multi-language language?

<22-03-007@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=295&group=comp.compilers#295

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ak...@akkartik.com (Kartik Agaram)
Newsgroups: comp.compilers
Subject: Re: How do you create a grammar for a multi-language language?
Date: Sat, 05 Mar 2022 16:55:48 -0800
Organization: Compilers Central
Lines: 36
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-007@comp.compilers>
References: <22-03-004@comp.compilers> <22-03-006@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="68525"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, design
Posted-Date: 05 Mar 2022 20:11:57 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-03-006@comp.compilers>
 by: Kartik Agaram - Sun, 6 Mar 2022 00:55 UTC

> I am working with a team to develop a new language and our current thinking is
> that the language will use the XPath language for navigating through documents
> and the CSS language for expressing sets of match/action pairs.

That feels like a much more tractable problem. Don't even think of it
as a multi-language language. It's a language you're designing (so you
have lots of room for maneuver) around XPath.

> This idea of multi-language languages is very common within the XML community.
> For example, the XSLT language uses (hosts) the XPath language. Here is an
> excerpt to illustrate:
>
> select="/Bookstore/Book[1]/Title"
>
> The expression /Bookstore/Book[1]/Title is an XPath expression, the other
> parts are XSLT. So, the format of the select statement is:
>
> select="XPath"
>
> See how XSLT hosts XPath? That is, one language (XSLT) is using another
> language (XPath).

Yeah, putting a second language inside the string literals of the
first is a very common approach. All the tooling for the outer
language can remain oblivious of the second. You don't even need to
design a new language, this approach works with any existing one, and
you can focus on implementing a library to process XPath.

If you do want to design another language, however, you can make things a lot nicer. For example:

- any URL is a valid literal in the Red programming language: https://github.com/red/docs/blob/master/en/datatypes.adoc
- the JSX extension of JavaScript used in the React framework permits component literals that look a lot like some sort of markup language: https://reactjs.org/docs/introducing-jsx.html

Kartik
http://akkartik.name/about

Re: How do you create a grammar for a multi-language language?

<22-03-008@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=296&group=comp.compilers#296

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: How do you create a grammar for a multi-language language?
Date: Sat, 5 Mar 2022 21:10:46 -0800 (PST)
Organization: Compilers Central
Lines: 24
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-008@comp.compilers>
References: <AdgvBM3tabuoXFasQLiTikka5cGndQ==> <22-03-004@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="88045"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, design
Posted-Date: 06 Mar 2022 12:01:52 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-03-004@comp.compilers>
 by: gah4 - Sun, 6 Mar 2022 05:10 UTC

On Saturday, March 5, 2022 at 1:50:24 PM UTC-8, Roger L Costello wrote:
> Hello Compiler Experts!

> Suppose you are creating a grammar for a language that hosts other languages.
> For example, the (parent) language hosts the regular expression language and
> the XPath language. How do you create a grammar for a multi-language language?

A little it depends on how the two go together.

A not so unusual way to write compilers years ago, was recursive descent
for statements, and operator precedence for expressions. The recursive
descent parser calls the operator precedence parser when it needs an
expression to be parsed.

Some mixed languages can be parsed separately.

The C preprocessor, originally a separate pass, but now usually implemented
together with the rest of the C compiler, processes its statements, and passes
everything else through. It does that well enough, that it is commonly used
with Fortran. (The traditional version, not the newer one.)

PHP is designed to recognize its syntax, and ignore everything else, which
is normally html, but could be another language.

Re: How do you create a grammar for a multi-language language?

<22-03-009@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=297&group=comp.compilers#297

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: DrDiettr...@netscape.net (Hans-Peter Diettrich)
Newsgroups: comp.compilers
Subject: Re: How do you create a grammar for a multi-language language?
Date: Sun, 6 Mar 2022 12:23:23 +0100
Organization: Compilers Central
Lines: 10
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-009@comp.compilers>
References: <22-03-004@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="88645"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, comment
Posted-Date: 06 Mar 2022 12:03:48 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Hans-Peter Diettrich - Sun, 6 Mar 2022 11:23 UTC

On 3/3/22 2:57 PM, Roger L Costello wrote:
> How do you create a grammar for a multi-language language?

A grammar is not required. In 1971 I saw a program source that could
tell the compiler it was compiled with - Fortran or Algol.

DoDi
[I've seen obfuscated programs that will work as shell scripts or
perl programs but I don't think that was what he was asking about.
-John]

RE: How do you create a grammar for a multi-language language?

<22-03-010@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=298&group=comp.compilers#298

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: christop...@compiler-resources.com (Christopher F Clark)
Newsgroups: comp.compilers
Subject: RE: How do you create a grammar for a multi-language language?
Date: Sun, 6 Mar 2022 15:37:13 +0200
Organization: Compilers Central
Lines: 90
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-010@comp.compilers>
References: <22-03-004@comp.compilers> <22-03-006@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="89298"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, syntax
Posted-Date: 06 Mar 2022 12:06:31 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Christopher F Clark - Sun, 6 Mar 2022 13:37 UTC

At first glance, I misread the question and thought about using
inheritance to do so. We used that in Yacc++ (the one from compiler
resources, not the one that comes with your C++ compiler) to solve
certain multi-language problems.

However, you are actually solving a "simpler" problem. And, the
standard approach to that is to embed the second language as a
"string" in the outer language. Many languages (and their
compilers/interpreters) do that. That's exactly what your XSLT case
does. The XPATH code is simply a string in the XSLT language, and the
XSLT language doesn't attempt to parse it. It simply hands the code
off to an XPATH parser when in knows the string is XPATH code.

Now, the only problem with that has to do with nested strings. If
your inner language has strings, that probably gives you the
quoting/backslash problem. Many shells have this issue where to nest
strings one needs to add backslashes (and double existing backslashes)
resulting in a terrible counting problem as to when one has enough
(and not too many) of them.

foo = "a \"nested\" string, worse one with an \\\"extra\\\" level of
nesting, imaging doing that \\\\\"more than once\\\\\" and getting
them to line up, recursive \\s are a pain"
letsPlay = "whose newline \\n is it \n" // a game where the rules are
made up and the formatting doesn't matter (or does it)

An alternative to that is often "raw strings" where the outer language
has a syntax where the outer delimiter says within it, ignore quotes
and backslashes etc, and accept anything up to the matching closing
outer delimiter. The best form of these allows one to "tag" the
delimiter with a key that the closing delimiter must include and
match. That way you can use the key to keep the closing delimiter
from being "reserved" in the inner text. Something like this:

foo = """xyzzy
flk" \n \" \\\ js // none of these are special and used verbatim
sdflkjs
ssfs ''''' xlkjxlvj // not a closing """ because it doesn't have
the key xyzzy
sflsfj
''''xyzzy; // now we closed the string

Note that matching the outer delimiter plus the tag might be beyond
the capacity of your lexer, especially if it only does regular
expressions.

A different alternative is to make "strings" that are special in your
language and switch grammars. For example, I like / as a string
delimiter for regular expressions.
Thus /regex/ is a "string" in the outer language, it doesn't parse its
contents, but it also knows that the string is a regular expression
and should be handled as one.
Here is an example of that.

foo = /ab.c*([abc]|xb+)*/;

We actually use a variant of that in Yacc++. We treat {code} as a
string in Yacc++. We don't parse the code. We send it to the C++ or
C# (et al) compiler. However, we have modified the lexer (using an LR
rule rather than just a regular expression) so that we do a little bit
of parsing to allow nested { } pairs in the code and comments. That
takes a little work to do. In Yacc++, we made that easy by merging
regular expressions and LR rules into a unified model, but most lexer
and parser generators don't do that. (A shame in my opinion, but
that's just *my* *opinion*.)

foo : bar { if (x == y) { a = b } /* a C style comment with a /*
inside, C comments don't nest per spec */
while (a < b) { a = a << 1; // C++ comment with a } inside
} // end while
} /* close code */ bletch;

The disadvantage of doing that in the lexer is that we cannot
conveniently use { } pairs in the "outer language" syntax. Not
without lexer states or the equivalent.

But, notice in all these methods we don't actually do (much) parsing
of the string. We don't try to make "one" grammar out of them,
because what's really hard to do (unless the languages are quite
similar, and especially have the same tokenizing rules) is to embed
the second language as something other than some kind of string. It
can be done, but it can be a lot of work and fragile.

--
******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------

Re: How do you create a grammar for a multi-language language?

<22-03-011@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=299&group=comp.compilers#299

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: How do you create a grammar for a multi-language language?
Date: Sun, 6 Mar 2022 14:36:01 -0800 (PST)
Organization: Compilers Central
Lines: 31
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-011@comp.compilers>
References: <22-03-004@comp.compilers> <22-03-006@comp.compilers> <22-03-010@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="72642"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, design
Posted-Date: 06 Mar 2022 18:22:54 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-03-010@comp.compilers>
 by: gah4 - Sun, 6 Mar 2022 22:36 UTC

On Sunday, March 6, 2022 at 9:06:34 AM UTC-8, Christopher F Clark wrote:

(snip)

> However, you are actually solving a "simpler" problem. And, the
> standard approach to that is to embed the second language as a
> "string" in the outer language. Many languages (and their
> compilers/interpreters) do that. That's exactly what your XSLT case
> does. The XPATH code is simply a string in the XSLT language, and the
> XSLT language doesn't attempt to parse it. It simply hands the code
> off to an XPATH parser when in knows the string is XPATH code.

That works when the outer language is parsed first.

For PHP, and I believe the C preprocessor, the inner language is parsed
first, so the parser has to ignore everything, including quoting, in
the outer language.

I don't believe I ever tried preprocessor statements inside C string
constants, but as far as I know, it works.

One has to be careful to avoid ambiguities between the two,
or are created with the combination of the two languages.
[PHP effectively treats material between ?> and <?php as an instruction
to print the material as if it were a quoted string.

C says that the input is tokenized before it does the preprocessor
phase, so it does not look inside quoted strings. The # and ##
preprocessor operators allow some preprocessor time creation of quoted
strings. -John]

RE: How do you create a grammar for a multi-language language?

<22-03-012@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=300&group=comp.compilers#300

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: coste...@mitre.org (Roger L Costello)
Newsgroups: comp.compilers
Subject: RE: How do you create a grammar for a multi-language language?
Date: Sun, 6 Mar 2022 23:32:13 +0000
Organization: Compilers Central
Lines: 23
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-012@comp.compilers>
References: <22-03-004@comp.compilers> <22-03-006@comp.compilers> <22-03-010@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="81037"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, design
Posted-Date: 06 Mar 2022 18:49:53 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Content-Language: en-US
 by: Roger L Costello - Sun, 6 Mar 2022 23:32 UTC

Chris Clark wrote:

> the standard approach to that is to embed the
> second language as a "string" in the outer language.
> Many languages (and their compilers/interpreters)
> do that. That's exactly what your XSLT case
> does. The XPATH code is simply a string in the
> XSLT language, and the XSLT language doesn't
> attempt to parse it. It simply hands the code
> off to an XPATH parser when in knows the string
> is XPATH code.

Okay, so there would be one grammar for XSLT, a second (independent) grammar for XPath. The grammar for XSLT just treats the XPath portions as strings. The grammar for XPath ignores the XSLT portions. So the input is processed in a pipeline fashion.

Is that correct?

But, but, but, ... how would an Abstract Syntax Tree be constructed when the input is processed in a pipeline?

/Roger
[I believe the answer is "incrementally." You can run the parsers sequentially
or you can run them as coroutines. I don't think it makes much difference unless
the intermediate version is extremely huge, but that's rarely a problem with XML. -John]

Re: How do you create a grammar for a multi-language language?

<22-03-013@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=301&group=comp.compilers#301

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: How do you create a grammar for a multi-language language?
Date: Sun, 6 Mar 2022 16:50:08 -0800 (PST)
Organization: Compilers Central
Lines: 52
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-013@comp.compilers>
References: <22-03-004@comp.compilers> <22-03-006@comp.compilers> <22-03-010@comp.compilers> <22-03-011@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="1066"; mail-complaints-to="abuse@iecc.com"
Keywords: C, history
Posted-Date: 06 Mar 2022 20:44:17 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-03-011@comp.compilers>
 by: gah4 - Mon, 7 Mar 2022 00:50 UTC

On Sunday, March 6, 2022 at 3:22:57 PM UTC-8, gah4 wrote:

(snip)

> I don't believe I ever tried preprocessor statements inside C string
> constants, but as far as I know, it works.
(snip)

> [PHP effectively treats material between ?> and <?php as an instruction
> to print the material as if it were a quoted string.

I suppose it works that way.

> C says that the input is tokenized before it does the preprocessor
> phase, so it does not look inside quoted strings. The # and ##
> preprocessor operators allow some preprocessor time creation of quoted
> strings. -John]

It seems that the traditional C compiler, sometimes used with other
languages, such as Fortran, has different parsing and tokenizing rules.

gcc -E --traditional quote.c

will process files with a preprocessor statement inside quotes.

On the other hand, the result is likely not what was wanted,
at least not for C. Among others, it puts out lines like:

# 6 "quote.c" 2

which then end up inside the string.

On the other hand, if you language uses ' for other than strings,
or for two uses, then it should work.

One of the stranger things that I have known for about 50 years,
is direct access I/O in IBM Fortran IV. There are statements like:

WRITE(1'N) X, Y, Z

where N is the record in the direct access file. The compiler also
accepts string (Hollerith) constants with apostrophes.
(But not in direct access I/O statements.)

In any case, the --traditional C preprocessor is commonly used with Fortran,
where I suspect C tokenizing could cause problems.

Some C preprocessors, but not the current ISO C version, and it
seems not gcc -E --traditional, will substitute preprocessor symbols
inside quoted strings.

As the OP notes, mixed parsing can have surprising effects!

Re: How do you create a grammar for a multi-language language?

<22-03-014@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=302&group=comp.compilers#302

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: robi...@dodo.com.au (Robin Vowels)
Newsgroups: comp.compilers
Subject: Re: How do you create a grammar for a multi-language language?
Date: Mon, 7 Mar 2022 13:39:33 +1100
Organization: Compilers Central
Lines: 13
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-014@comp.compilers>
References: <AdgvBM3tabuoXFasQLiTikka5cGndQ==> <22-03-004@comp.compilers> <22-03-008@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; format=flowed; charset="utf-8"; reply-type=original
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="17392"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, PL/I
Posted-Date: 06 Mar 2022 22:08:18 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Robin Vowels - Mon, 7 Mar 2022 02:39 UTC

From: "gah4" <gah4@u.washington.edu>
Sent: Sunday, March 06, 2022 4:10 PM
>
> The C preprocessor, originally a separate pass, but now usually implemented
> together with the rest of the C compiler, processes its statements, and passes
> everything else through. It does that well enough, that it is commonly used
> with Fortran. (The traditional version, not the newer one.)

IBM's PL/I has a preprocessor that accepts a subset of PL/I.
The finished text is then passed to the compiler.

XPL processes such text as it goes, handling the text processing
before passing it to the compiler.

Re: How do you create a grammar for a multi-language language?

<22-03-015@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=303&group=comp.compilers#303

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!rocksolid2!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: DrDiettr...@netscape.net (Hans-Peter Diettrich)
Newsgroups: comp.compilers
Subject: Re: How do you create a grammar for a multi-language language?
Date: Mon, 7 Mar 2022 05:08:33 +0100
Organization: Compilers Central
Lines: 28
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-015@comp.compilers>
References: <22-03-004@comp.compilers> <22-03-009@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="36239"; mail-complaints-to="abuse@iecc.com"
Keywords: parse
Posted-Date: 06 Mar 2022 23:43:40 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-03-009@comp.compilers>
 by: Hans-Peter Diettrich - Mon, 7 Mar 2022 04:08 UTC

On 3/6/22 12:23 PM, Hans-Peter Diettrich wrote:
> I don't think that was what he was asking about.

The question is too unspecific for me. A (traditional) grammar covers
the parser part, while the lexer is specified differently. Languages
based on the same lexer can be merged into one grammar, no problem so
far. But if the second language shall apply to a single token
(string...) of the primary language then both languages are independent
and can not share a single grammar. Like a document can contain parts of
several natural languages, where each language applies only to specific
parts of the documents and is subject to special lexer rules (RTL/LTR
reading...).

In C/C++ and its preprocessor we have a special construct where the
preprocessor tokens are refined/extended by the C/C++ lexer. And there
were discussions whether the preprocessor should look into string
literals (Siemens?), or whether the preprocessor can generate C comments
(Microsoft). A separate preprocessor run at least allows for the latter,
as the preprocessed source code is lexed again and can build tokens
differently from the tokens in the original source code.

My conclusion:
A single (formal) grammar can not contain multiple languages. Unless you
specify that e.g. statements and expressions in a programming language
shall be considered subject to different languages. Such nitpicking is
not worth further thoughts :-(

DoDi

Re: How do you create a grammar for a multi-language language?

<22-03-016@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=304&group=comp.compilers#304

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: How do you create a grammar for a multi-language language?
Date: Sun, 6 Mar 2022 21:22:17 -0800 (PST)
Organization: Compilers Central
Lines: 29
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-016@comp.compilers>
References: <22-03-004@comp.compilers> <22-03-009@comp.compilers> <22-03-015@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="15313"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, design
Posted-Date: 07 Mar 2022 13:04:25 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-03-015@comp.compilers>
 by: gah4 - Mon, 7 Mar 2022 05:22 UTC

On Sunday, March 6, 2022 at 8:43:43 PM UTC-8, Hans-Peter Diettrich wrote:

(snip)
> My conclusion:
> A single (formal) grammar can not contain multiple languages. Unless you
> specify that e.g. statements and expressions in a programming language
> shall be considered subject to different languages. Such nitpicking is
> not worth further thoughts :-(

It would be complicated for compiled languages.

TeX allows one to change, character by character in the input, which characters
are letters, and so used in a control sequence name.

LaTeX macros, to allow for internal names that don't conflict with any
user defined names, puts an @ sign in them, after changing @ to a letter.
Then, just before going into user code, changes @ back to not a letter.
(Specifically, it is other.) The lexer can't read too far ahead, as the character
codes might change at any time.

It is also interesting to see how languages without reserved
words, keep track of which words have the keyword meaning,
and which are ordinary names.
[Back in the 1970s there were a bunch of extendible languages like EL/1 and
IMP72 where you could add and change syntax on the fly. They all died since
it meant that in practice no two programs were written in the same language
and they were unreadable. Now we have overloading so you understand the
syntax but you don't know what it means. -John]

Keywords and Reserved Words

<22-03-017@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=305&group=comp.compilers#305

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: christop...@compiler-resources.com (Christopher F Clark)
Newsgroups: comp.compilers
Subject: Keywords and Reserved Words
Date: Tue, 8 Mar 2022 21:46:44 +0200
Organization: Compilers Central
Lines: 124
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-017@comp.compilers>
References: <22-03-004@comp.compilers> <22-03-009@comp.compilers> <22-03-015@comp.compilers> <22-03-016@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="80364"; mail-complaints-to="abuse@iecc.com"
Keywords: syntax, parse, comment
Posted-Date: 08 Mar 2022 15:23:51 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Christopher F Clark - Tue, 8 Mar 2022 19:46 UTC

> It is also interesting to see how languages without reserved
> words, keep track of which words have the keyword meaning,
> and which are ordinary names.

This is a topic close to my heart. There are actually a variety of
methods that work relatively well in this regard.

-----------------------------------------------

First, recognizing keywords, if you are using a language (unlike the
original FORTRAN where spaces were not significant), it is useful to
follow a model proposed by Frank Deremer, with lexing divided into a
scanner and a screener. The scanner has only the responsibility of
dividing tokens into discrete entities and labelling those which have
fixed types, so keyword recognition is not part of the scanner to the
scanner they are just identifiers. The screener then checks the
identifier to see if it is a keyword.

In most languages, that lookup can easily be done by preloading the
symbol table with the identifiers that are keywords. This is not
inefficient, because even if the identifier is not a keyword, you need
a unique symbol table entry for each identifier in any case, so you
are doing that lookup anyway.

The only wrinkle here is the case (which we have in Yacc++) where you
want your keywords to be case insensitive and your identifier to be
case sensitive. You do get an extra lookup in that case, but only the
first time an identifier is seen. Once, it is seen, you have marked
the symbol table entry with whether it is a keyword or not.

-----------------------------------------------

Now, the inverse problem. For keywords that are reserved words.
There is nothing more to do. However, if you only want your keywords
to be recognized as keywords in specific contexts, then you need the
second part of the solution. This is a [set of] parser rule[s] that
turns keywords back into identifiers if they aren't being used in a
context where they have a special meaning.

This rule looks something like this:

identifier: IDENTIFIER | IF | THEN | ELSE | WHILE | DO | ... ;

That is, you define an identifier non-terminal that matches either an
IDENTIFIER token or any of your KEYWORD[ token]s. And, when you want
an identifier you use the non-terminal rather than token in your rule.

Now, this may cause "conflicts" or "ambiguities" and you resolve that
by figuring out which keywords are special in that context and making
an alternate idenitiier_in_this_context rule, which omits the
conflicting keywords from its list. (And, by the way, this is one
reason to use a parser generator, because it will report those
conflicts to you and tell you when you have some keyword that is still
an issue in some context. The checking that a parser generator does
helps you not make mistakes.) And, in most cases the parsers
lookahead can figure out whether you mean the keyword or the
identifier so that you rarely have to make those extra rules that
exclude specific keywords from the list.

-----------------------------------------------

The main place one gets in trouble is when you have a [set of]
keyword[s] that is/are optional before a list of identifiers. In that
case, your identifier list cannot start with one of those keywords.
However, a better solution (if you are designing your own language) is
to put the optional keywords before a mandatory one.

For example, in Yacc++ we have a KEYWORD definition, and a variation
on it for SUBSTRING keywords. But we don't use SUBSTRING as a
declaration by itself, so the rule for the declaration of a list of
keywords looks like:

keyword_declaration:
SUBSTRING? KEYWORD identifier ("=" number)? (","? identifier ("="
number)?)* ";" ;

If we did it in the other order, we couldn't allow substring as the
first identifier in the list.
That is:

keyword_declaration:
KEYWORD SUBSTRING? identifier_but_not_substring ("=" number)? (","?
identifier ("=" number)?)* ";" ;

And, note, if we had hand-written a recursive descent parser, we
wouldn't have been warned of that edge case. Either we would have
mis-parsed it or we would have that weird exception in our language.
Neither is ideal in my point of view.

And, if you go through it carefully, you can see that you can make a
much more complex set of rules that covers the second case, so that if
you saw the SUBSTRING keyword or your "substring" identifier was
followed by an equals or a comma or a semi-colon it was still legal.
Something to introduce subtle bugs in users' programs where a
seemingly innocuous change causes something to go from legal (and
meaning one thing) to illegal or meaning something else.

Or as our wise moderator might put it, you start creating a language
where the user is no longer certain what their code means.

-----------------------------------------------

By the way, we used both parts in Yacc++. We do have a small number
of reserved words, like yy_eof which we use to communicate between
lexer and the parser to indicate the end of the input. You cannot use
that identifier for any other purpose in our grammars. However, we
allow keywords like TOKEN to be used also as names of tokens or
non-terminals and we can tell when you are using it to declare a token
versus using it as the name of a token or non-terminal. So, most of
our keywords are just that context sensitive keywords that only have
reserved meanings in certain contexts. Otherwise, they are just
identifiers. I guess one can tell that before starting compiler
resources, my partner and I had spent a fair amount of time writing
and using compilers in PL/I dialects.

--
******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------
[If anyone wants to know how to find the tokens in old space-insensitive Fortran
I can tell you, but it's as ugly as you might imagine. -John]

Re: Keywords and Reserved Words

<22-03-018@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=306&group=comp.compilers#306

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: Keywords and Reserved Words
Date: Wed, 9 Mar 2022 00:31:25 -0800 (PST)
Organization: Compilers Central
Lines: 16
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-018@comp.compilers>
References: <22-03-004@comp.compilers> <22-03-009@comp.compilers> <22-03-015@comp.compilers> <22-03-016@comp.compilers> <22-03-017@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="88054"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, Fortran
Posted-Date: 09 Mar 2022 15:20:16 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-03-017@comp.compilers>
 by: gah4 - Wed, 9 Mar 2022 08:31 UTC

(snip, our moderator wrote)

> [If anyone wants to know how to find the tokens in old space-insensitive Fortran
> I can tell you, but it's as ugly as you might imagine. -John]

As well as I know, the original 704 Fortran compiler has been lost to history,
but the Fortran II compiler is still around.

Note, though, that fixed-form is still in the Fortran standard, and still has the
space-insensitivity. More popular for new programs, free-form is not space
insensitive.

Also, to make it easier for programmers, some space is still optional in
free-form Fortran, for old and new keywords. For example, both GO TO
and GOTO are allowed, I suspect with separate entries into the parsing
tables. PL/I also has both GO TO and GOTO statements.

Re: Keywords and Reserved Words

<22-03-021@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=308&group=comp.compilers#308

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: robi...@dodo.com.au (Robin Vowels)
Newsgroups: comp.compilers
Subject: Re: Keywords and Reserved Words
Date: Thu, 10 Mar 2022 09:55:46 +1100
Organization: Compilers Central
Lines: 10
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-021@comp.compilers>
References: <22-03-004@comp.compilers> <22-03-009@comp.compilers> <22-03-015@comp.compilers> <22-03-016@comp.compilers> <22-03-017@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; format=flowed; charset="UTF-8"; reply-type=original
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="64703"; mail-complaints-to="abuse@iecc.com"
Keywords: Fortran, parse, comment
Posted-Date: 09 Mar 2022 20:53:53 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Robin Vowels - Wed, 9 Mar 2022 22:55 UTC

> [If anyone wants to know how to find the tokens in old space-insensitive Fortran
> I can tell you, but it's as ugly as you might imagine. -John]

No it's not.
See A. H. J. Sale, "The Classification of FORTRAN Statements", Computer Journal,
Vol. 14 No. 1, 1971.
[Having written production Fortran compilers, I can assure you that to
do it right is quite ugly. I looked at the article and he missed some things,
and he made some unrealistic assumptions that a compiler would accept nothing
beyond what was in the Basic Fortran standard. -John]

Re: Keywords and Reserved Words

<22-03-022@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=309&group=comp.compilers#309

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: robi...@dodo.com.au (Robin Vowels)
Newsgroups: comp.compilers
Subject: Re: Keywords and Reserved Words
Date: Thu, 10 Mar 2022 10:00:22 +1100
Organization: Compilers Central
Lines: 10
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-022@comp.compilers>
References: <22-03-004@comp.compilers> <22-03-009@comp.compilers> <22-03-015@comp.compilers> <22-03-016@comp.compilers> <22-03-017@comp.compilers> <22-03-018@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; format=flowed; charset="UTF-8"; reply-type=original
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="65230"; mail-complaints-to="abuse@iecc.com"
Keywords: Fortran, PL/I
Posted-Date: 09 Mar 2022 20:55:01 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Robin Vowels - Wed, 9 Mar 2022 23:00 UTC

From: "gah4" <gah4@u.washington.edu>
Sent: Wednesday, March 09, 2022 7:31 PM

> Also, to make it easier for programmers, some space is still optional in
> free-form Fortran, for old and new keywords. For example, both GO TO
> and GOTO are allowed, I suspect with separate entries into the parsing
> tables. PL/I also has both GO TO and GOTO statements.

GO and GOTO are keywords in both FORTRAN and PL/I,
and they are not reserved.

Re: Keywords and Reserved Words

<22-03-023@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=310&group=comp.compilers#310

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: robi...@dodo.com.au (Robin Vowels)
Newsgroups: comp.compilers
Subject: Re: Keywords and Reserved Words
Date: Thu, 10 Mar 2022 11:59:42 +1100
Organization: Compilers Central
Lines: 41
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-023@comp.compilers>
References: <22-03-004@comp.compilers> <22-03-009@comp.compilers> <22-03-015@comp.compilers> <22-03-016@comp.compilers> <22-03-017@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; format=flowed; charset="UTF-8"; reply-type=original
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="65701"; mail-complaints-to="abuse@iecc.com"
Keywords: parse
Posted-Date: 09 Mar 2022 20:56:09 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Robin Vowels - Thu, 10 Mar 2022 00:59 UTC

From: "Christopher F Clark" <christopher.f.clark@compiler-resources.com>
Sent: Wednesday, March 09, 2022 6:46 AM

> First, recognizing keywords, if you are using a language (unlike the
> original FORTRAN where spaces were not significant), it is useful to
> follow a model proposed by Frank Deremer, with lexing divided into a
> scanner and a screener. The scanner has only the responsibility of
> dividing tokens into discrete entities and labelling those which have
> fixed types, so keyword recognition is not part of the scanner to the
> scanner they are just identifiers. The screener then checks the
> identifier to see if it is a keyword.

A similar approach is used in DEUCE PL/I. (more in a moment).

In XPL, the scanner recognises keywords (which are reserved words)
and classifies them differently from identifiers.
The other objects (integers, strings, operators, parentheses, etc)
are classified appropriately.

Returning to DEUCE PL/I, the scanner classifies indentifiers
without making any distinction between identifiers and keywords.
Other objects are classified as above. The symbol table
already contains the keywords. All new identifiers are entered
into the symbol table.

In the semantics section, initial recognition is on groupings of
identifiers (again keywords are not recognised as such).
Once a phrase is recognised, the opening keywords in the
statement are compared with permissible keyword combinations,
without resorting to the symbol table. For example, PUT EDIT ( ...
PUT LIST ( ... and PUT DATA etc are identified after comparison with
the first word, and comparison with the second word is sufficient to
identify the kind of statement.
At this point, it can be said that the keywords are not reserved.
Avoiding looking in the symbol table for keywords saves time.

> In most languages, that lookup can easily be done by preloading the
> symbol table with the identifiers that are keywords. This is not
> inefficient, because even if the identifier is not a keyword, you need
> a unique symbol table entry for each identifier in any case, so you
> are doing that lookup anyway.

Re: Keywords and Reserved Words in Fortran

<22-03-024@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=311&group=comp.compilers#311

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.compilers
Subject: Re: Keywords and Reserved Words in Fortran
Date: Thu, 10 Mar 2022 07:07:25 -0000 (UTC)
Organization: news.netcologne.de
Lines: 38
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-024@comp.compilers>
References: <22-03-004@comp.compilers> <22-03-009@comp.compilers> <22-03-015@comp.compilers> <22-03-016@comp.compilers> <22-03-017@comp.compilers> <22-03-021@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="37264"; mail-complaints-to="abuse@iecc.com"
Keywords: Fortran, parse, comment
Posted-Date: 11 Mar 2022 14:47:29 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Thomas Koenig - Thu, 10 Mar 2022 07:07 UTC

Our moderator wrote

> [Having written production Fortran compilers, I can assure you that to
> do it right is quite ugly.

I can confirm that.

Looking at how a recent compiler does that, I can slightly rephrase
https://gcc.gnu.org/wiki/GFortranHacking :

The front end's action can be grouped into four phases:

Parsing: This converts source code into a stream of tokens which
describe the language. Because Fortran does not have reserved
keywords, the gfortran runs a series of matchers against code
trying to find one that matches a statement. On failing a match
an error message may be queued, and another matcher tried. If all
attempts at matching fail, the error queue is dumped to the user.

Resolution: This resolves things left over from the parsing phase,
such as types of expressions, and compile-time simplification of
constants. Many errors are issued in this phase. At the end of
this phase, the abstract syntax tree is finished.

[...]

This works (sort of) but has the drawback that cleaning up after an
error is error-prone (hah!) so there are a lot of ice-after-invalid
(internal compiler error after invalid code) issues that have emerged
over the years.
[As I recall, first I did a pass to fold continuation cards and mark
quoted and hollerith strings, then a pass to look for an equal sign
not inside parens and not followed by an comma, which said it was an
assignment statement, and then the parsing was straightforward. Eacn
non-assignment statement starts with a keyword and it is easy to tell
from the syntax when to look for another keyword. Repeat for the
statement that follows a logical IF with a special case for the
statement numbers in an arithmetic IF. -John]

1
server_pubkey.txt

rocksolid light 0.9.7
clearnet tor