Reserve your abuse for your true friends. -- Larry Wall in <199712041852.KAA19364@wall.org>

On 3/24/2022 6:11 PM, Keith Thompson wrote:
> BGB <cr88192@gmail.com> writes:
>> On 3/24/2022 1:36 PM, Keith Thompson wrote:
>>> Guillaume <message@bottle.org> writes:
>>>> Le 24/03/2022 à 16:45, Scott Lurndal a écrit :
>>>>> Richard Harnden <richard.nospam@gmail.com> writes:
>>>>>> "C: Everyone's favourite programming language isn't a programming language"
>>>>>>
>>>>>> Erm ... <https://www.theregister.com/2022/03/23/c_not_a_language>
>>>>> The article doesn't make a whole lot of sense - it's basically just
>>>>> bitching because swift and rust can't do everything that C can.
>>>>
>>>> And you're putting it lightly.
>>>> Of course, first thing, without even reading it, is questioning the
>>>> relevance of such articles written by heavy proponents of
>>>> "alternative" languages. How much more biased can one be?
>>>>
>>>> Digging a bit deeper, there is nothing much to even take from this
>>>> short article. It's pretty shallow.
>>>>
>>>> - C is difficult to parse? C declarations are, for instance, not
>>>> exactly easy, but it's only very moderately difficult. You can have
>>>> fun with this to get an idea: https://cdecl.org/ . It has never made
>>>> it a problem to write a C compiler. There are more of them out there
>>>> than for any other language, I think. OTOH, there is, AFAIK, only one
>>>> implementation of Rust. Which to me is in itself a problem. I'm
>>>> curious to see if it ever even gets to the point of having other
>>>> competing implementations.
>>> It's probably referring to the typedef issue.
>>> In the absence of typedefs, type names consist of keywords and
>>> punctuation. A typedef (a feature that was added to the language
>>> after the syntax had been established) creates a type name that's an
>>> identifier. It's impossible to parse something that uses a typedef name
>>> without knowing that it's a typedef name. In effect a typedef adds a
>>> context-sensitive keyword to the grammar. The parser needs to query the
>>> symbol table.
>>> But it's a solved problem and not that big a deal.
>>
>> Or, at least, you can't do it "properly" without knowing the typedefs.
>
> I don't know what distinction you're making. I said that C cannot be
> parsed without recognizing typedefs.
>

Note that I said "properly": one can do it, sort of, but may get
incorrect results. For the most part, these results *may* be sufficient
for tasks with looser requirements.

In this case, one can ignore the existence of cases where the parser
will fall on its face (they will still exist, but we can pretend they
don't exist...).

>> But, heuristics exist, say we assume that:
>>     identifier identifier;
>>     identifier '*' identifier;
>>     ...
>>
>> Are not valid in statement context, then one can assume by extension,
>> that encountering one of these patterns means to interpret the first
>> identifier as a type name.
>
> The latter assumption is invalid. For example:
>
> int main(void) {
>     int x=0, y=0;
>     x * y;
>     {
>         typedef int x;
>         x * y;
>     }
> }
>
> The two lines "x * y;" are textually identical. The first is an
> expression statement containing a multiplicative expression. The second
> defines an object y of type int*. A parser cannot distinguish between
> these two interpretations without knowing that x has been declared as a
> typedef in the inner scope. (This applies equally in C90, which did not
> allow mixing declarations and statements.)
>
> A C parser that makes "assumptions" without already knowing which
> identifiers are defined as typedefs in the current scope is not a valid
> C parser. It's annoying, but as I said, it's a solved problem -- and
> it's only a problem if you're writing a C parser.
>
> And I've never heard of a C compiler that doesn't get this right.
>

Granted.

But, the context in this case was for parsing for things like FFI tools,
rather than necessarily for a C compiler. Namely, if you want to parse
headers without needing to do so via an actual C compiler.

For a C compiler, doing it this way will occasionally parse things
incorrectly. I was not trying to claim that it wouldn't; rather, you
would have a variant of the language that does not allow things like a
multiplication expression where a statement is expected.
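
A minimal sketch of that heuristic, assuming the statement has already been split into tokens. The `Tok` kinds and `looks_like_decl` are made-up names for illustration, not from any real tool. It accepts a run of '*' rather than just one, and it shows the trade-off: "x * y;" is always taken as a declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for a pre-tokenized statement. */
typedef enum { T_IDENT, T_STAR, T_SEMI, T_OTHER } Tok;

/* Heuristic only: returns 1 if the tokens match
 *     identifier identifier ;
 *     identifier '*'... identifier ;
 * and are therefore assumed to be a declaration. No symbol table is
 * consulted, so "x * y;" is taken as declaring y even if x is a variable. */
int looks_like_decl(const Tok *t, size_t n)
{
    size_t i = 1;
    if (n < 3 || t[0] != T_IDENT)
        return 0;
    while (i < n && t[i] == T_STAR)     /* skip any run of '*' */
        i++;
    return i + 2 == n && t[i] == T_IDENT && t[i + 1] == T_SEMI;
}

void demo_heuristic(void)
{
    const Tok mul[] = { T_IDENT, T_STAR, T_IDENT, T_SEMI };   /* x * y; */
    const Tok asg[] = { T_IDENT, T_OTHER, T_IDENT, T_SEMI };  /* x = y; */
    assert(looks_like_decl(mul, 4));    /* treated as a declaration of y */
    assert(!looks_like_decl(asg, 4));   /* left to the expression parser */
}
```

Anything the function rejects would fall through to ordinary expression-statement parsing.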

And, OTOH, for something like "mine program for prototypes to put into a
header" or similar, one can get "even more tacky" (e.g., using textual
pattern matching to grab anything that looks like a function
declaration, *).

*: And then using annotation comments to tell the tool when to ignore
any "false positives".

>> In some cases, one can make an assumption regarding any preceding
>> keywords, and whether these keywords already define a valid type (as
>> to whether to parse an identifier as a typename or part of the
>> declarator), in combination with assuming the original cases to still
>> be invalid (if we see this, the identifier is parsed as a typename).
>>
>> ...
>>
>> This will parse most code OK, but if one tries to use this as a
>> general strategy, it may trip up occasionally.
>>
>> One major problem case is casts:
>>     x=(foo)(1, 2);
>> If this is meant to be a function call, it may be parsed incorrectly
>> if 'foo' is assumed to be a type. This situation does pop up
>> occasionally in the wild.
>
> Again, no assumption is necessary or sufficient. The parser has to know
> whether "foo" is a type name or not.
>

To be "correct".

Requirements are looser if:
  It doesn't need to always parse correctly;
  It doesn't need to deal with "general purpose" code;
  ...

So, it is not valid for a general purpose compiler, but may be
sufficient for FFI tools or similar.
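
To make the cast-vs-call case concrete, here is a small sketch in which the same token sequence gets both readings. The names `foo`, `one`, `parsed_as_call` and `parsed_as_cast` are made up for illustration. The first argument is a function call rather than the literal 1, only to avoid "operand has no effect" warnings in the cast reading.

```c
#include <assert.h>

static int calls;                        /* counts evaluations of one() */
static int one(void) { calls++; return 1; }

/* Hypothetical function: with this in scope, (foo)(...) is a call. */
static int foo(int a, int b) { return a + b; }

int parsed_as_call(void)
{
    return (foo)(one(), 2);              /* calls foo(1, 2), yields 3 */
}

int parsed_as_cast(void)
{
    typedef int foo;                     /* hides the function in this scope */
    return (foo)(one(), 2);              /* casts the comma expression, yields 2 */
}

void demo_cast_vs_call(void)
{
    assert(parsed_as_call() == 3);
    assert(parsed_as_cast() == 2);
    assert(calls == 2);                  /* one() was evaluated in both readings */
}
```

A tool that guesses "type" for `foo` gets the first function wrong, and one that guesses "function" gets the second wrong.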

It may still need to at least attempt to parse function bodies though,
because these are semi-common inside headers.

Granted, one can argue as well that maintaining a table of previously
seen typedefs isn't all that difficult, and is required for correct parsing.
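
Something like the following would probably be enough for the table itself. This is only a sketch under assumptions: a fixed-size table, block-scope tracking by nesting depth, and made-up names (`enter_scope`, `leave_scope`, `declare`, `is_typedef_name`). A real parser would call these from its declaration and compound-statement handling.

```c
#include <assert.h>
#include <string.h>

#define MAX_NAMES 256

/* Hypothetical scoped name table: each entry records whether the name was
 * declared as a typedef, and at which block nesting depth. */
typedef struct { const char *name; int is_typedef; int depth; } Entry;

static Entry table[MAX_NAMES];
static int   count, depth;

void enter_scope(void) { depth++; }

void leave_scope(void)          /* drop names declared in the scope being closed */
{
    while (count > 0 && table[count - 1].depth == depth)
        count--;
    depth--;
}

void declare(const char *name, int is_typedef)
{
    assert(count < MAX_NAMES);
    table[count++] = (Entry){ name, is_typedef, depth };
}

/* The innermost declaration wins, so a typedef can hide a variable and a
 * variable can hide a typedef. */
int is_typedef_name(const char *name)
{
    for (int i = count - 1; i >= 0; i--)
        if (strcmp(table[i].name, name) == 0)
            return table[i].is_typedef;
    return 0;                   /* unknown identifiers are not type names */
}
```

For the "x * y;" example quoted earlier, `is_typedef_name("x")` would be 0 at the first statement and 1 inside the inner block after `declare("x", 1)`, then 0 again after `leave_scope()`.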

One could potentially have a language which has the "look and feel" of
C, but uses a different parsing strategy, and leave it as the
programmers' issue if they write something and it doesn't work as expected.

>> ...
>>
>>
>>
>> But, yeah, if one has a preprocessor, handles typedefs during parsing,
>> ... then parsing C correctly isn't all that difficult.
>
> This has nothing to do with the C preprocessor.
>

One still needs a preprocessor for "#include" and "#ifdef" and similar,
to be able to make much useful sense of the contents of most C headers
for FFI tools.

>> There may be other edge cases, as I recently encountered code that did
>> something like:
>>     enum vals1 {
>>         vals1a, vals1b, vals1c, ...
>>     };
>>     enum vals2 {
>>         vals2a, vals2b=vals1c, ...
>>     };
>>
>> Which required adding a new table in this case to sort out enum parsing.
>
> The expression on the RHS of "vals2b=..." can be any integer constant
> expression. The fact that it happens to be an enumeration constant is
> irrelevant, and is not a problem for parsing.
>

It depends some on when and where the values are assigned in the "enum".
In this case, the enums were handled fairly early in the process, so
ended up needing another table mostly to deal with values for things
like "enum" and "const int x=whatever;" and similar, so that these
values could be propagated at a fairly early stage (namely during
parse-time constant-folding operations).

Things like dealing with global variables and functions would not happen
until a following stage.
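
Roughly, the extra table amounts to something like the sketch below. The names (`Const`, `lookup_const`, `add_enumerator`) are made up for illustration and this is not the actual code. It also assumes the initializer is either absent or a single previously seen name, whereas the real thing has to fold arbitrary integer constant expressions.

```c
#include <assert.h>
#include <string.h>

#define MAX_CONSTS 256

/* Hypothetical table of named integer constants known at parse time. */
typedef struct { const char *name; long value; } Const;

static Const consts[MAX_CONSTS];
static int   nconsts;

/* Returns 1 and stores the value if name is a known constant, else 0. */
int lookup_const(const char *name, long *out)
{
    for (int i = nconsts - 1; i >= 0; i--)
        if (strcmp(consts[i].name, name) == 0) {
            *out = consts[i].value;
            return 1;
        }
    return 0;
}

/* Records one enumerator. init is NULL for a bare "name" (previous + 1),
 * or the name of an earlier constant for "name = other". Returns the value
 * assigned, which the caller passes back as prev for the next enumerator. */
long add_enumerator(const char *name, const char *init, long prev)
{
    long v = prev + 1;
    if (init != NULL) {
        int found = lookup_const(init, &v);
        assert(found);          /* a real tool would report an error here */
        (void)found;
    }
    assert(nconsts < MAX_CONSTS);
    consts[nconsts++] = (Const){ name, v };
    return v;
}
```

Each "enum" starts with prev = -1, so that "vals2b=vals1c" picks up the value recorded while parsing the first "enum".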

>> If designing a new language, would prefer to avoid a dependence on
>> prior context during parsing, but this is its own thing.
>
> Agreed. If the idea that an identifier can be a type name had been
> introduced to the language earlier, the syntax could have been designed
> to avoid the typedef problem.
>

Yeah.

Languages like C# seem to be built around a strategy more like the one
I had mentioned earlier.

Thread: C isn't a programming language (started by Richard Harnden on Thu, 24 Mar 2022; 71 replies)


devel / comp.lang.c / Re: C isn't a programming language