novaBBS - comp.compilers - Re: About finding the start symbol of a grammar

About finding the start symbol of a grammar

<21-05-015@comp.compilers>

https://www.novabbs.com/devel/article-flat.php?id=48&group=comp.compilers#48

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.imp.ch!usenet.csail.mit.edu!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ecosta....@gmail.com (Eduardo Costa)
Newsgroups: comp.compilers
Subject: About finding the start symbol of a grammar
Date: Fri, 21 May 2021 03:49:43 -0700 (PDT)
Organization: Compilers Central
Lines: 28
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <21-05-015@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="65584"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, question, comment
Posted-Date: 21 May 2021 09:46:22 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com

by: Eduardo Costa - Fri, 21 May 2021 10:49 UTC

Hey guys,

I've been lately dealing with a parser generator for LL grammars, and since
it's inception I've always been blindy assuming the first element read from
within the input file is going to be the start symbol or starting rule.

So I've been wondering all this time, just out of curiosity, if there exists a
method or algorithm to find out the start symbol of a given grammar?

I guess the answer is no.

While there would exist grammars we could recursively check to find out which
it's start symbol is (i.e.: it's the only rule that used the rest of them,
where checking every other resulted in dangling rules that weren't even called
in), there might be other grammars for which more than one rule yields full
coverage (all of these obviously defining different languages) and so leading
to ambiguity.

I only contemplate a simple coverage test, even though other techniques could
exist, again, all of them leading to a point where we couldn't ascertain if
one or the other is what the user meant.

So I'm wondering if this is even an issue in production-grade
parser-generators out there?

Regards,
[yacc and its descendants have an explicit %start declaration, usually defaulting to
the first rule in the file. -John]

Re: About finding the start symbol of a grammar

<21-05-016@comp.compilers>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=49&group=comp.compilers#49

copy link Newsgroups: comp.compilers

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: 563-365-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.compilers
Subject: Re: About finding the start symbol of a grammar
Date: Fri, 21 May 2021 14:14:37 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 50
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <21-05-016@comp.compilers>
References: <21-05-015@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="88660"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, comment
Posted-Date: 21 May 2021 11:17:15 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com

by: Kaz Kylheku - Fri, 21 May 2021 14:14 UTC

On 2021-05-21, Eduardo Costa <ecosta.tmp@gmail.com> wrote:
> Hey guys,
>
> I've been lately dealing with a parser generator for LL grammars, and since
> it's inception I've always been blindy assuming the first element read from
> within the input file is going to be the start symbol or starting rule.
>
> So I've been wondering all this time, just out of curiosity, if there exists a
> method or algorithm to find out the start symbol of a given grammar?
>
> I guess the answer is no.

Surely, the start symbol of a context-free grammar is one which appears
only in the left hand side of a rule. If there is such a unique symbol,
it must be /the/ start symbol.

> While there would exist grammars we could recursively check to find out which
> it's start symbol is (i.e.: it's the only rule that used the rest of them,
> where checking every other resulted in dangling rules that weren't even called
> in), there might be other grammars for which more than one rule yields full
> coverage (all of these obviously defining different languages) and so leading
> to ambiguity.

Ambiguity doesn't imply there is no algorithm to find a start symbol,
but that the algorithm must be able to report situations like the
presence of multiple start symbols, or no start symbols.

On the face of it, this problem does not smell of undecidability, or
even NP completeness. Where do you suspect is the difficulty?

It seems like this is a fairly trivial property of a graph, type of
thing.

Whether rules are dangling is also a graph property: is the graph
connected.

But tha seems like an identifiable point where the algorithm can stop
and announce a decision. Then diagnostics can be issued.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
[I have seen useful grammars where the start symbol also appears in the RHS of a rule.
Think of the standard expression grammar.

You pick the start symbol that gives you the language you want to parse. -John]

Re: About finding the start symbol of a grammar

<21-05-017@comp.compilers>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=50&group=comp.compilers#50

copy link Newsgroups: comp.compilers

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!xmission!usenet.csail.mit.edu!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: DrDiettr...@netscape.net (Hans-Peter Diettrich)
Newsgroups: comp.compilers
Subject: Re: About finding the start symbol of a grammar
Date: Fri, 21 May 2021 17:02:34 +0200
Organization: Compilers Central
Lines: 35
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <21-05-017@comp.compilers>
References: <21-05-015@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="88903"; mail-complaints-to="abuse@iecc.com"
Keywords: parse
Posted-Date: 21 May 2021 11:17:47 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <21-05-015@comp.compilers>
Content-Language: de-DE

by: Hans-Peter Diettrich - Fri, 21 May 2021 15:02 UTC

On 5/21/21 12:49 PM, Eduardo Costa wrote:
> I've been lately dealing with a parser generator for LL grammars, and since
> it's inception I've always been blindy assuming the first element read from
> within the input file is going to be the start symbol or starting rule.
>
> So I've been wondering all this time, just out of curiosity, if there exists a
> method or algorithm to find out the start symbol of a given grammar?

Graph analysis methods exist to find unreachable nodes which can become
start symbols. In short any node that is a predecessor of *all* nodes
can be a start symbol. If no such node exists then the grammar is faulty
(discontiguous).

IMO this problem can be solved by introduction of an artificial start
symbol that allows to reach all other symbols but can not be reached
itself. Please note that this solution solves a syntactic problem but
may not prevent or even cause semantic problems.

> I only contemplate a simple coverage test, even though other techniques could
> exist, again, all of them leading to a point where we couldn't ascertain if
> one or the other is what the user meant.
>
> So I'm wondering if this is even an issue in production-grade
> parser-generators out there?

A useful parser generator should include checks for grammar sanity.

DoDi

Re: About finding the start symbol of a grammar

<21-05-018@comp.compilers>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=51&group=comp.compilers#51

copy link Newsgroups: comp.compilers

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!xmission!usenet.csail.mit.edu!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: Re: About finding the start symbol of a grammar
Date: Fri, 21 May 2021 15:32:22 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 34
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <21-05-018@comp.compilers>
References: <21-05-015@comp.compilers> <21-05-016@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="33119"; mail-complaints-to="abuse@iecc.com"
Keywords: parse
Posted-Date: 22 May 2021 13:23:38 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com

by: Anton Ertl - Fri, 21 May 2021 15:32 UTC

Kaz Kylheku <563-365-8930@kylheku.com> writes:
>Surely, the start symbol of a context-free grammar is one which appears
>only in the left hand side of a rule. If there is such a unique symbol,
>it must be /the/ start symbol.

It could be a now-unused nonterminal, while the start symbol is part
of a strongly connected component of the graph.

If you have one nonterminal that dominates all other nonterminals (the
other nonterminals can be derived from it, but not the other way
round), it probably is the start symbol. Why "probably"? There is
still the possibility that there is a wrapper rule around the real
start symbol that was used for debugging and is left in the grammar.

>On the face of it, this problem does not smell of undecidability, or
>even NP completeness. Where do you suspect is the difficulty?

It's easy to find nodes with particular properties in a graph. But
there is no guarantee that the result really is the start symbol.

There is a reason why you specify the start symbol in some way.

>Whether rules are dangling is also a graph property: is the graph
>connected.

"Connected" is an undirected-graph property. If a nonterminal is
unreachable from the start symbol, it can still be connected to the
reachable graph through a RHS-nonterminal.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Re: About finding the start symbol of a grammar

<21-05-019@comp.compilers>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=52&group=comp.compilers#52

copy link Newsgroups: comp.compilers

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: driko...@gmail.com (Ev. Drikos)
Newsgroups: comp.compilers
Subject: Re: About finding the start symbol of a grammar
Date: Sat, 22 May 2021 06:52:17 +0300
Organization: Aioe.org NNTP Server
Lines: 15
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <21-05-019@comp.compilers>
References: <21-05-015@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="33346"; mail-complaints-to="abuse@iecc.com"
Keywords: parse
Posted-Date: 22 May 2021 13:23:54 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Content-Language: en-US

by: Ev. Drikos - Sat, 22 May 2021 03:52 UTC

On 21/05/2021 13:49, Eduardo Costa wrote:
> While there would exist grammars we could recursively check to find out which
> it's start symbol is (i.e.: it's the only rule that used the rest of them,
> where checking every other resulted in dangling rules that weren't even called
> in), there might be other grammars for which more than one rule yields full
> coverage (all of these obviously defining different languages) and so leading
> to ambiguity.

IMHO, it can be so simple as you describe here without important overhead.
Typically, a parser will reduce the start symbol and finish. All rules
that yield full coverage can be ie alternatives of a single root symbol:

RootSymbol -> R1 | R2 | ... | Rn

Ev. Drikos

Not only Guinness - Linux is good for you, too. -- Banzai on IRC

devel / comp.compilers / Re: About finding the start symbol of a grammar

Subject	Author
About finding the start symbol of a grammar	Eduardo Costa
Re: About finding the start symbol of a grammar	Kaz Kylheku
Re: About finding the start symbol of a grammar	Anton Ertl
Re: About finding the start symbol of a grammar	Hans-Peter Diettrich
Re: About finding the start symbol of a grammar	Ev. Drikos