devel / comp.lang.c / Re: Implicit String-Literal Concatenation


Re: Implicit String-Literal Concatenation

<urn6li$3s62i$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33892&group=comp.lang.c#33892
Path: i2pn2.org!i2pn.org!news.furie.org.uk!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Wed, 28 Feb 2024 12:50:10 +0100
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <urn6li$3s62i$1@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlb8p$3bvbc$1@dont-email.me>
<urln99$3ejjt$1@dont-email.me> <urlp3h$3ep9p$5@dont-email.me>
<20240227170925.837@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 28 Feb 2024 11:50:10 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="2b7eb0b264583c4bb7dc0db8d671358e";
logging-data="4069458"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Gw64yCSB+pCZQNRm1vfYXC4Ldc8pPF10="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:uK7+7oU9gtzAnuDbwUEcNYRdOuo=
Content-Language: en-GB
In-Reply-To: <20240227170925.837@kylheku.com>
 by: David Brown - Wed, 28 Feb 2024 11:50 UTC

On 28/02/2024 02:09, Kaz Kylheku wrote:
> On 2024-02-27, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
>> On Tue, 27 Feb 2024 23:21:28 +0100, David Brown wrote:
>>
>>> The #embed pre-processor directive turns the file into a list of integer
>>> constants, one per byte (unless an implementation offers other options).
>>
>> What a waste of time.
>
> Plus easily doable in 1970's Lisp.
>

That would be useful, if we were living in the 1970's or if anyone had
wanted to learn Lisp this side of the millennium bug.

As I mentioned before, it's not particularly difficult to do this kind
of manipulation, and people write utilities for them in a variety of
languages, or download a variety of free tools for the job.

But it will often be more convenient to have it built into the language
and compiler. And for those interested in speed, the test
implementations have handled this far more efficiently than other
techniques. Logically, #embed turns the file into a list of numbers. In
practice, if you use it for the common case of initialising a const
array of unsigned char, the compiler simply copies and pastes the file
into the output as a binary blob.
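
To make the "list of numbers" model concrete, here is a minimal sketch. It assumes a hypothetical four-byte file "foo.dat" holding the bytes 0, 1, 2, 3, and the names blob, blob_size and blob_sum are purely illustrative. Because #embed support is not yet widespread, the directive appears only in a comment and its logical expansion is written out by hand.

```c
#include <stddef.h>

/*
 * With a C23 compiler and a (hypothetical) file "foo.dat" containing the
 * four bytes 0, 1, 2, 3, one would write:
 *
 *     static const unsigned char blob[] = {
 *     #embed "foo.dat"
 *     };
 *
 * Logically the directive expands to a comma-separated list of integer
 * constants, so the declaration below is the hand-expanded equivalent.
 */
static const unsigned char blob[] = { 0, 1, 2, 3 };

/* Number of embedded bytes; no terminator is appended. */
size_t blob_size(void)
{
    return sizeof blob;
}

/* Sum of the bytes, just to show the data is ordinary array content. */
unsigned blob_sum(void)
{
    unsigned sum = 0;
    for (size_t i = 0; i < sizeof blob; i++)
        sum += blob[i];
    return sum;
}
```

Whether a given compiler takes a fast path for this pattern is an implementation detail; the observable result should be the same as for the hand-written list.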

It would, IMHO, have been useful also to have had an "embed operator" in
the manner of the "pragma operator", so that it could be used in a macro
definition.

Re: Implicit String-Literal Concatenation

<urn6sv$3s62i$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33893&group=comp.lang.c#33893
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Wed, 28 Feb 2024 12:54:06 +0100
Organization: A noiseless patient Spider
Lines: 40
Message-ID: <urn6sv$3s62i$2@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 28 Feb 2024 11:54:07 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="2b7eb0b264583c4bb7dc0db8d671358e";
logging-data="4069458"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18WERjiz+3Mu5Q1n3SLnOw2jmLX0VU85ig="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:+CId1AgXcRh484GzbG29h7wsw8s=
Content-Language: en-GB
In-Reply-To: <urlmo7$3eg2j$1@dont-email.me>
 by: David Brown - Wed, 28 Feb 2024 11:54 UTC

On 27/02/2024 23:12, bart wrote:
> On 27/02/2024 20:25, Lawrence D'Oliveiro wrote:
>> On Tue, 27 Feb 2024 09:36:38 +0100, David Brown wrote:
>>
>>> And with C23, we will get #embed, though it is not yet supported by
>>> major tools.
>>
>> More and more hacks on the preprocessor. Why not just get rid of it and
>> replace it with something like m4?
>>
>> Because then you will discover that string-based macros are inherently an
>> unmanageable problem.
>
> I hadn't noticed that #embed was a preprocessor directive. But that is
> not the problem here, it is this:
>
> "The expansion of a #embed directive is a token sequence formed from the
> list of integer constant expressions described below."
>
> If a string like "ABC" really is converted to the five tokens 'A' comma
> 'B' comma 'C', then it's going to make long strings and binary files
> inefficient.
>
> Embedding a 100KB file will result in a 100KB bigger executable, but
> along the way it may have to generate 200,000 tokens within the
> compiler, half of them commas. Which in turn will need to be turned into
> 100,000 integer expressions.
>
> I would hope that implementations find some way of streamlining that
> process, perhaps by turning that 100KB of data directly into a 100KB
> string.

They won't use strings, they will use data blobs - binary data. Then
there is no issue with null bytes. And yes, implementations will skip
the token generation (unless you are doing something weird, such as
using #embed to read the parameters to a function call).

Tests with prototype implementations gave extremely fast results.
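
The token estimate quoted above can be checked with a little arithmetic. Assuming the literal model, where a file of n bytes expands to n integer constants separated by n - 1 commas, a small hypothetical helper (embed_token_count is not a real API) gives the count:

```c
#include <stddef.h>

/*
 * Token count for the literal expansion of an n-byte file:
 * n integer constants plus n - 1 separating commas.
 * An empty file is assumed here to produce no tokens.
 */
size_t embed_token_count(size_t n_bytes)
{
    if (n_bytes == 0)
        return 0;
    return 2 * n_bytes - 1;
}
```

For the three bytes of "ABC" this gives 5, and for a file of 100,000 bytes it gives 199,999, in line with the rough figure of 200,000.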

Re: Implicit String-Literal Concatenation

<urnbh6$3t14d$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33896&group=comp.lang.c#33896
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bc...@freeuk.com (bart)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Wed, 28 Feb 2024 13:13:13 +0000
Organization: A noiseless patient Spider
Lines: 50
Message-ID: <urnbh6$3t14d$1@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 28 Feb 2024 13:13:10 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="4c905df20a464da86fec05f3fa822103";
logging-data="4097165"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19lLC1ZdHf3emhAJ1aoPdMX"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:E3tOjeBqFto6mOvxxWx163DkxCs=
Content-Language: en-GB
In-Reply-To: <urn6sv$3s62i$2@dont-email.me>
 by: bart - Wed, 28 Feb 2024 13:13 UTC

On 28/02/2024 11:54, David Brown wrote:
> On 27/02/2024 23:12, bart wrote:
>> On 27/02/2024 20:25, Lawrence D'Oliveiro wrote:
>>> On Tue, 27 Feb 2024 09:36:38 +0100, David Brown wrote:
>>>
>>>> And with C23, we will get #embed, though it is not yet supported by
>>>> major tools.
>>>
>>> More and more hacks on the preprocessor. Why not just get rid of it and
>>> replace it with something like m4?
>>>
>>> Because then you will discover that string-based macros are
>>> inherently an
>>> unmanageable problem.
>>
>> I hadn't noticed that #embed was a preprocessor directive. But that is
>> not the problem here, it is this:
>>
>> "The expansion of a #embed directive is a token sequence formed from
>> the list of integer constant expressions described below."
>>
>> If a string like "ABC" really is converted to the five tokens 'A'
>> comma 'B' comma 'C', then it's going to make long strings and binary
>> files inefficient.
>>
>> Embedding a 100KB file will result in a 100KB bigger executable, but
>> along the way it may have to generate 200,000 tokens within the
>> compiler, half of them commas. Which in turn will need to be turned
>> into 100,000 integer expressions.
>>
>> I would hope that implementations find some way of streamlining that
>> process, perhaps by turning that 100KB of data directly into a 100KB
>> string.
>
> They won't use strings, they will use data blobs - binary data.  Then
> there is no issue with null bytes.

AFAIK strings in C can have embedded zeros when not assumed to be
zero-terminated. So here:

char s[]={1,2,3,0,4,5,6};

s will have a length of 7.

>  And yes, implementations will skip
> the token generation (unless you are doing something weird, such as
> using #embed to read the parameters to a function call).

What happens if you do -E to preprocess only?

Re: Implicit String-Literal Concatenation

<urnepk$3tmoq$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33897&group=comp.lang.c#33897
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Wed, 28 Feb 2024 15:08:52 +0100
Organization: A noiseless patient Spider
Lines: 66
Message-ID: <urnepk$3tmoq$1@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
<urnbh6$3t14d$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 28 Feb 2024 14:08:53 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="2b7eb0b264583c4bb7dc0db8d671358e";
logging-data="4119322"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19vnBSeayfGmoyK+kN89LM3TCwol4hWPzo="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:CFsZ3beEIuNsL/S51PWiGBMmtlQ=
In-Reply-To: <urnbh6$3t14d$1@dont-email.me>
Content-Language: en-GB
 by: David Brown - Wed, 28 Feb 2024 14:08 UTC

On 28/02/2024 14:13, bart wrote:
> On 28/02/2024 11:54, David Brown wrote:
>> On 27/02/2024 23:12, bart wrote:
>>> On 27/02/2024 20:25, Lawrence D'Oliveiro wrote:
>>>> On Tue, 27 Feb 2024 09:36:38 +0100, David Brown wrote:
>>>>
>>>>> And with C23, we will get #embed, though it is not yet supported by
>>>>> major tools.
>>>>
>>>> More and more hacks on the preprocessor. Why not just get rid of it and
>>>> replace it with something like m4?
>>>>
>>>> Because then you will discover that string-based macros are
>>>> inherently an
>>>> unmanageable problem.
>>>
>>> I hadn't noticed that #embed was a preprocessor directive. But that is
>>> not the problem here, it is this:
>>>
>>> "The expansion of a #embed directive is a token sequence formed from
>>> the list of integer constant expressions described below."
>>>
>>> If a string like "ABC" really is converted to the five tokens 'A'
>>> comma 'B' comma 'C', then it's going to make long strings and binary
>>> files inefficient.
>>>
>>> Embedding a 100KB file will result in a 100KB bigger executable, but
>>> along the way it may have to generate 200,000 tokens within the
>>> compiler, half of them commas. Which in turn will need to be turned
>>> into 100,000 integer expressions.
>>>
>>> I would hope that implementations find some way of streamlining that
>>> process, perhaps by turning that 100KB of data directly into a 100KB
>>> string.
>>
>> They won't use strings, they will use data blobs - binary data.  Then
>> there is no issue with null bytes.
>
> AFAIK strings in C can have embedded zeros when not assumed to be
> zero-terminated. So here:
>
>     char s[]={1,2,3,0,4,5,6};
>
> s will have a length of 7.

That's not a string, it's an array of char. A "string" in C is "a
contiguous sequence of characters terminated by and including the first
null character". The difference is crucial in respect to the handling
of null bytes. And it is the main reason for #embed generating a
comma-separated sequence of integer constants rather than a string. (It
also avoids messy hex character sequences if you show the output of
#embed somewhere.)

>
>>   And yes, implementations will skip the token generation (unless you
>> are doing something weird, such as using #embed to read the parameters
>> to a function call).
>
> What happens if you do -E to preprocess only?
>

That's something weird :-)

I guess you get the integer list.

Re: Implicit String-Literal Concatenation

<uro6ls$2mh1$5@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33900&group=comp.lang.c#33900
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ldo...@nz.invalid (Lawrence D'Oliveiro)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Wed, 28 Feb 2024 20:56:28 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 22
Message-ID: <uro6ls$2mh1$5@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlb8p$3bvbc$1@dont-email.me>
<urln99$3ejjt$1@dont-email.me> <urlp3h$3ep9p$5@dont-email.me>
<20240227170925.837@kylheku.com> <urn6li$3s62i$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 28 Feb 2024 20:56:28 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="2b5f550de027bed53e7de3d0720ccb3e";
logging-data="88609"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+ZXem1L6CzyRciAG7JnkF8"
User-Agent: Pan/0.155 (Kherson; fc5a80b8)
Cancel-Lock: sha1:/vO/68obnR2gxugJoQS42hbXKRM=
 by: Lawrence D'Oliv - Wed, 28 Feb 2024 20:56 UTC

On Wed, 28 Feb 2024 12:50:10 +0100, David Brown wrote:

> ... people write utilities for them in a variety of languages ...
>
> But it will often be more convenient to have it built into the language
> and compiler.

What can be built into the language can only ever be a small subset of
the many and varied ways that people have incorporated data blobs into
their programs. Often these will need to have custom structures with
computed header fields, that kind of thing. So you will need custom
build tools to construct these structures, and then you might as well
include those blobs directly into the final build, rather than go
through some extra step of pretending to turn them back into some
source form.

For example, here’s an old Android project of mine (OK, so the app is
Java code, but the same principle applies)
<https://bitbucket.org/ldo17/unicode_browser_android/src/master/>
where I wrote a custom Python script to read a Nameslist.txt file
downloaded from unicode.org to generate a table which could be loaded
into memory quickly for easy searching.

Re: Implicit String-Literal Concatenation

<uro8sl$3d71$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33902&group=comp.lang.c#33902
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bc...@freeuk.com (bart)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Wed, 28 Feb 2024 21:34:14 +0000
Organization: A noiseless patient Spider
Lines: 47
Message-ID: <uro8sl$3d71$1@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlb8p$3bvbc$1@dont-email.me>
<urln99$3ejjt$1@dont-email.me> <urlp3h$3ep9p$5@dont-email.me>
<20240227170925.837@kylheku.com> <urn6li$3s62i$1@dont-email.me>
<uro6ls$2mh1$5@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 28 Feb 2024 21:34:14 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="4c905df20a464da86fec05f3fa822103";
logging-data="111841"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+BiltanoUiFwGiBFzgIEAV"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:h1Zkr+n3C8FNKwHJ3C1Zb1HR0LE=
In-Reply-To: <uro6ls$2mh1$5@dont-email.me>
Content-Language: en-GB
 by: bart - Wed, 28 Feb 2024 21:34 UTC

On 28/02/2024 20:56, Lawrence D'Oliveiro wrote:
> On Wed, 28 Feb 2024 12:50:10 +0100, David Brown wrote:
>
>> ... people write utilities for them in a variety of languages ...
>>
>> But it will often be more convenient to have it built into the language
>> and compiler.
>
> What can be built into the language can only ever be a small subset of
> the many and varied ways that people have incorporated data blobs into
> their programs. Often these will need to have custom structures with
> computed header fields, that kind of thing. So you will need custom
> build tools to construct these structures, and then you might as well
> include those blobs directly into the final build, rather than go
> through some extra step of pretending to turn them back into some
> source form.
>
> For example, here’s an old Android project of mine (OK, so the app is
> Java code, but the same principle applies)
> <https://bitbucket.org/ldo17/unicode_browser_android/src/master/>
> where I wrote a custom Python script to read a Nameslist.txt file
> downloaded from unicode.org to generate a table which could be loaded
> into memory quickly for easy searching.

I can see now where you get your coding style from. You seem to like
stretching things out vertically as much as possible:

    public void Add
      (
        int CategoryCode,
        ItemType Item
      )
      /* Use this instead of add to populate CodeToIndex table. */
      {
        CodeToIndex.put(CategoryCode, getCount());
        add(Item);
      } /*Add*/

In C:

    void Add(int CategoryCode, ItemType Item) {
        CodeToIndex_put(CategoryCode, getCount());
        add(Item);
    }

4 non-comment lines versus 9. I know Java needs tons of boilerplate, but
it is not all the language's fault.

Re: Implicit String-Literal Concatenation

<87frxcuv87.fsf@nosuchdomain.example.com>

https://www.novabbs.com/devel/article-flat.php?id=33903&group=comp.lang.c#33903
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Wed, 28 Feb 2024 13:36:40 -0800
Organization: None to speak of
Lines: 24
Message-ID: <87frxcuv87.fsf@nosuchdomain.example.com>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
<urnbh6$3t14d$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="e720677c6ddefaeb35e4f23cb0e9f8ab";
logging-data="115558"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18DvcZAnittI335G+j04Yku"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:iEVSO8naiLHOpLNxR7sYcHTXPGI=
sha1:e3JaSp5+wxECMpxReSUYzwO3Knk=
 by: Keith Thompson - Wed, 28 Feb 2024 21:36 UTC

bart <bc@freeuk.com> writes:
[...]
> AFAIK strings in C can have embedded zeros when not assumed to be
> zero-terminated. So here:
>
> char s[]={1,2,3,0,4,5,6};
>
> s will have a length of 7.

Strings *by definition* cannot have embedded zeros. A null character
terminates a string.

A string literal can have embedded \0 characters, but if you're
suggesting that #embed should expand to a string literal, I can see
several disadvantages and no significant advantages. For one thing, the
data may or may not end with a null character; string literals always
do.
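
A small sketch of both points, using an illustrative literal that does not come from the thread: the literal may contain an embedded \0, its array always carries one extra terminating null, and string functions stop at the first null. The names lit, literal_size and literal_length are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* A string literal with an embedded null character (illustrative data). */
static const char lit[] = "ab\0cd";

/* Size of the array: 5 written characters plus the implicit terminator. */
size_t literal_size(void)
{
    return sizeof lit;
}

/* Length as a string: counting stops at the first null character. */
size_t literal_length(void)
{
    return strlen(lit);
}
```

Here literal_size() is 6 and literal_length() is 2, so a string literal would always add a byte that the embedded file may not contain.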

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */

Re: Implicit String-Literal Concatenation

<87bk80uuzr.fsf@nosuchdomain.example.com>

https://www.novabbs.com/devel/article-flat.php?id=33904&group=comp.lang.c#33904
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Wed, 28 Feb 2024 13:41:44 -0800
Organization: None to speak of
Lines: 17
Message-ID: <87bk80uuzr.fsf@nosuchdomain.example.com>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
<urnbh6$3t14d$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="e720677c6ddefaeb35e4f23cb0e9f8ab";
logging-data="115558"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+JAev965kEPtqLZghKNWH9"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:nDI7s7lx8kuxI48uAEHgP3DJO9Q=
sha1:AdGuLBrhD9l/t6LUSh8D3dkwpfQ=
 by: Keith Thompson - Wed, 28 Feb 2024 21:41 UTC

bart <bc@freeuk.com> writes:
[...]
> AFAIK strings in C can have embedded zeros when not assumed to be
> zero-terminated. So here:
>
> char s[]={1,2,3,0,4,5,6};
>
> s will have a length of 7.

s will have a *size* of 7. Its length, as a string, is 3. The
distinction between "length" and "size" is particularly important in
this case.
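
The difference can be checked directly. This is a minimal sketch using the same array; the helper names s_size and s_length are illustrative only.

```c
#include <stddef.h>
#include <string.h>

/* The array from the quoted example: seven elements, no terminator appended. */
static const char s[] = {1, 2, 3, 0, 4, 5, 6};

/* Size in bytes of the whole array object. */
size_t s_size(void)
{
    return sizeof s;
}

/* Length of the string starting at s[0]; the scan stops at s[3] == 0. */
size_t s_length(void)
{
    return strlen(s);
}
```

s_size() yields 7 and s_length() yields 3.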

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */

Re: Implicit String-Literal Concatenation

<877ciouua2.fsf@nosuchdomain.example.com>

https://www.novabbs.com/devel/article-flat.php?id=33905&group=comp.lang.c#33905
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Wed, 28 Feb 2024 13:57:09 -0800
Organization: None to speak of
Lines: 46
Message-ID: <877ciouua2.fsf@nosuchdomain.example.com>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="e720677c6ddefaeb35e4f23cb0e9f8ab";
logging-data="124128"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/8vj6lGs2Iy4Fnu2xMH5je"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:tBf6TKeO0m+RID0K7lJPR/91o2s=
sha1:hk+LxggzR2tLPeX+q85DVywo0U8=
 by: Keith Thompson - Wed, 28 Feb 2024 21:57 UTC

David Brown <david.brown@hesbynett.no> writes:
[...]
> They won't use strings, they will use data blobs - binary data. Then
> there is no issue with null bytes. And yes, implementations will skip
> the token generation (unless you are doing something weird, such as
> using #embed to read the parameters to a function call).
>
> Tests with prototype implementations gave extremely fast results.

I'm not sure how that would work. #embed is a preprocessor directive,
and at least in the abstract model it has to expand to valid C code.

I would have expected that it would simply generate the list of
comma-separated integer constants described in the standard; later
phases would simply parse that list and generate code as if that
sequence had been written in the original source file. Do you know of
an implementation that does something else?

For example, say you have a file "foo.dat" containing 4 bytes with
values 0, 1, 2, and 3. This would be perfectly valid:

    struct foo {
        unsigned char a;
        unsigned short b;
        unsigned int c;
        double d;
    };

    struct foo obj = {
    #embed "foo.dat"
    };

#embed isn't defined to translate an input file to a sequence of bytes.
It's defined to translate an input file to a sequence of integer
constant expressions.
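
As a sketch of what that means for the struct example, here is the same initialisation with the expansion written out by hand, assuming "foo.dat" holds exactly the four bytes 0, 1, 2, 3 as described. The helper foo_matches_expected is hypothetical and exists only to check the result.

```c
struct foo {
    unsigned char a;
    unsigned short b;
    unsigned int c;
    double d;
};

/*
 * Hand-written equivalent of
 *
 *     struct foo obj = {
 *     #embed "foo.dat"
 *     };
 *
 * for a file holding the four bytes 0, 1, 2, 3: each integer constant
 * initialises one member in order, with the usual conversions.
 */
struct foo obj = { 0, 1, 2, 3 };

/* Returns 1 if every member received the corresponding constant. */
int foo_matches_expected(void)
{
    return obj.a == 0 && obj.b == 1 && obj.c == 2 && obj.d == 3.0;
}
```

Note that obj.d ends up as 3.0, which a raw byte copy of the file into the object would not produce.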

*Maybe* a compiler could optimize for the case where it knows that it's
being used to initialize an array of unsigned char, but (a) that would
require the preprocessor to have information that normally doesn't exist
until later phases, and (b) I'm not convinced it would be worth the
effort.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */

Re: Implicit String-Literal Concatenation

<uroe02$4eoh$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33908&group=comp.lang.c#33908

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bc...@freeuk.com (bart)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Wed, 28 Feb 2024 23:01:22 +0000
Organization: A noiseless patient Spider
Lines: 51
Message-ID: <uroe02$4eoh$1@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
<877ciouua2.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 28 Feb 2024 23:01:22 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c3e2a728c2cf8236510c608f905802b0";
logging-data="146193"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19D16OHBt8ePicltguwoyeFWA5L9tHozqY="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:TmdpNrAeBbTUy47tRvjTEWapiU8=
Content-Language: en-GB
In-Reply-To: <877ciouua2.fsf@nosuchdomain.example.com>
 by: bart - Wed, 28 Feb 2024 23:01 UTC

On 28/02/2024 21:57, Keith Thompson wrote:
> David Brown <david.brown@hesbynett.no> writes:
> [...]
>> They won't use strings, they will use data blobs - binary data. Then
>> there is no issue with null bytes. And yes, implementations will skip
>> the token generation (unless you are doing something weird, such as
>> using #embed to read the parameters to a function call).
>>
>> Tests with prototype implementations gave extremely fast results.
>
> I'm not sure how that would work. #embed is a preprocessor directive,
> and at least in the abstract model it has to expand to valid C code.
>
> I would have expected that it would simply generate the list of
> comma-separated integer constants described in the standard; later
> phases would simply parse that list and generate code as if that
> sequence had been written in the original source file. Do you know of
> an implementation that does something else?
>
> For example, say you have a file "foo.dat" containing 4 bytes with
> values 0, 1, 2, and 3. This would be perfectly valid:
>
> struct foo {
> unsigned char a;
> unsigned short b;
> unsigned int c;
> double d;
> };
>
> struct foo obj = {
> #embed "foo.dat"
> };

It would be unfortunate if your example was allowed. Clearly a binary
representation of an instance of your struct would probably require 16
bytes rather than 4, of which one may be padding.

Certainly if you were to write it out to disk as binary, it would need
more than 4.

> #embed isn't defined to translate an input file to a sequence of bytes.
> It's defined to translate an input file to a sequence of integer
> constant expressions.

Maybe it should be defined exactly like that, because that is what
people might expect. Your example is better off using a normal text file
which contains an actual comma-delimited list (and which can mix ints
and floats), and a regular #include.

Re: Implicit String-Literal Concatenation

<87y1b4tbcq.fsf@nosuchdomain.example.com>

https://www.novabbs.com/devel/article-flat.php?id=33909&group=comp.lang.c#33909

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Wed, 28 Feb 2024 15:31:17 -0800
Organization: None to speak of
Lines: 109
Message-ID: <87y1b4tbcq.fsf@nosuchdomain.example.com>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
<877ciouua2.fsf@nosuchdomain.example.com>
<uroe02$4eoh$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="2beb5206cb1be4986a7eae51046cf684";
logging-data="160528"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18fhfNhdwB0hnyXJyKEeP3b"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:S8sQ8r1znloi40ggYS52p4ekOo8=
sha1:B3FdYpdpy8z0vUETrRg/gJ7Z3VI=
 by: Keith Thompson - Wed, 28 Feb 2024 23:31 UTC

bart <bc@freeuk.com> writes:
> On 28/02/2024 21:57, Keith Thompson wrote:
>> David Brown <david.brown@hesbynett.no> writes:
>> [...]
>>> They won't use strings, they will use data blobs - binary data. Then
>>> there is no issue with null bytes. And yes, implementations will skip
>>> the token generation (unless you are doing something weird, such as
>>> using #embed to read the parameters to a function call).
>>>
>>> Tests with prototype implementations gave extremely fast results.
>> I'm not sure how that would work. #embed is a preprocessor
>> directive,
>> and at least in the abstract model it has to expand to valid C code.
>> I would have expected that it would simply generate the list of
>> comma-separated integer constants described in the standard; later
>> phases would simply parse that list and generate code as if that
>> sequence had been written in the original source file. Do you know of
>> an implementation that does something else?
>> For example, say you have a file "foo.dat" containing 4 bytes with
>> values 0, 1, 2, and 3. This would be perfectly valid:
>> struct foo {
>> unsigned char a;
>> unsigned short b;
>> unsigned int c;
>> double d;
>> };
>> struct foo obj = {
>> #embed "foo.dat"
>> };
>
> It would be unfortunate if your example was allowed. Clearly a binary
> representation of an instance of your struct would probably require 16
> bytes rather than 4, of which one may be padding.

Depending on the sizes and alignments of the various types, sure.
So what?

N3096 is the latest public C23 draft.

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf

#embed is defined in section 6.10.3.

The expansion of a #embed directive is a token sequence
formed from the list of integer constant expressions described
below. The group of tokens for each integer constant expression
in the list is separated in the token sequence from the group
of tokens for the previous integer constant expression in the
list by a comma. The sequence neither begins nor ends in a
comma. If the list of integer constant expressions is empty,
the token sequence is empty. The directive is replaced by its
expansion and, with the presence of certain embed parameters,
additional or replacement token sequences.
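
To make the comma rule concrete, here is a minimal sketch of a formatter that follows it. This is purely illustrative and not how any particular compiler does it; format_embed_list is a hypothetical name, and decimal constants are just one spelling an implementation might choose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of any compiler: writes the bytes in data[]
   as decimal integer constants separated by ", " into out[].
   Returns the number of characters written (excluding the terminator),
   or -1 if out[] is too small. */
static int format_embed_list(const unsigned char *data, size_t n,
                             char *out, size_t outsz)
{
    size_t used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';              /* empty input gives an empty sequence */

    for (size_t i = 0; i < n; i++) {
        /* The comma goes before every constant except the first, so the
           sequence neither begins nor ends in a comma. */
        int w = snprintf(out + used, outsz - used, "%s%u",
                         i ? ", " : "", (unsigned)data[i]);
        if (w < 0 || (size_t)w >= outsz - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

For the four-byte foo.dat discussed earlier this produces the text 0, 1, 2, 3, and for an empty file it produces nothing at all.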

It's a preprocessor directive. The preprocessor operates on text and
preprocessing tokens, not on raw data. There is no way to directly
represent raw data in C source code. (I suppose string literals do so
to some extent, but they can't represent generalized raw data.)

The usage I described above is allowed. I see nothing unfortunate about
it. If you only want to use #embed with arrays of unsigned char, then
do that.

Its primary intended use is to read binary file contents at compile time
and allow a program to treat those contents as a raw representation,
particularly as the initialization for an array of unsigned char. There
was no reason to impose arbitrary restrictions to make it impossible to
use for any other purposes.

I suppose it would have been possible for #embed to expand to the raw
data itself, a binary copy of the input file. That would require C
source code, which currently is plain text, to be able to support
delimited chunks of binary data. It would require changes to portions
of the compiler after the preprocessor. Presumably you'd be able to
write the same representation directly in a C source file, which means
that C source files would no longer necessarily be representable as
text. I can see that causing all kinds of problems.

Fortunately, none of that was necessary, since the authors came up with
a way to define #embed in the preprocessor without making any other
changes to how C source code is represented. The fact that it can be
used in other odd ways doesn't bother me. The code I wrote above is
valid; I never said it was acceptable style.

> Certainly if you were to write it out to disk as binary, it would need
> more than 4.

Yes. So what?

>> #embed isn't defined to translate an input file to a sequence of bytes.
>> It's defined to translate an input file to a sequence of integer
>> constant expressions.
>
> Maybe it should be defined exactly like that, because that is what
> people might expect. Your example is better off using a normal text
> file which contains an actual comma-delimited list (and which can mix
> ints and floats), and a regular #include.

I certainly wouldn't advocate writing code like the above. My point is
that, given the definition of #embed in the C23 standard, it's valid and
has well defined semantics.

If you have suggestions for alternate ways to define #embed, they might
be interesting, but it's too late to change the existing specification.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */

Re: Implicit String-Literal Concatenation

<uroh0n$51an$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33911&group=comp.lang.c#33911

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ldo...@nz.invalid (Lawrence D'Oliveiro)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Wed, 28 Feb 2024 23:52:55 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 18
Message-ID: <uroh0n$51an$2@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlb8p$3bvbc$1@dont-email.me>
<urln99$3ejjt$1@dont-email.me> <urlp3h$3ep9p$5@dont-email.me>
<20240227170925.837@kylheku.com> <urn6li$3s62i$1@dont-email.me>
<uro6ls$2mh1$5@dont-email.me> <uro8sl$3d71$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 28 Feb 2024 23:52:55 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="daeb6559b2144face1d3464670b018f1";
logging-data="165207"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18ucCJ6E34s/B+lEUrw7LYk"
User-Agent: Pan/0.155 (Kherson; fc5a80b8)
Cancel-Lock: sha1:LBp3ZlwSp68D8F6EqzNMLWl+8xg=
 by: Lawrence D'Oliv - Wed, 28 Feb 2024 23:52 UTC

On Wed, 28 Feb 2024 21:34:14 +0000, bart wrote:

> In C:
>
> void Add(int CategoryCode, ItemType Item) {
> CodeToIndex_put(CategoryCode, getCount());
> add(Item);
> }
>
> 4 non-comment lines versus 9. I know Java needs tons of boilerplate, but
> it is not all the language's fault.

Or how about

void Add(int CategoryCode, ItemType Item) {CodeToIndex_put(CategoryCode, getCount());add(Item);}

Wow! I never realized you could do that in C!! I thought it was an
error to put stuff after column 72 or something. Thanks for the tip!!!

Re: Implicit String-Literal Concatenation

<uroial$58n9$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33912&group=comp.lang.c#33912

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bc...@freeuk.com (bart)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Thu, 29 Feb 2024 00:15:17 +0000
Organization: A noiseless patient Spider
Lines: 36
Message-ID: <uroial$58n9$1@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlb8p$3bvbc$1@dont-email.me>
<urln99$3ejjt$1@dont-email.me> <urlp3h$3ep9p$5@dont-email.me>
<20240227170925.837@kylheku.com> <urn6li$3s62i$1@dont-email.me>
<uro6ls$2mh1$5@dont-email.me> <uro8sl$3d71$1@dont-email.me>
<uroh0n$51an$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 29 Feb 2024 00:15:17 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c3e2a728c2cf8236510c608f905802b0";
logging-data="172777"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX182HPBqb3KC5s4gt/Z45P6h"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:VhchBe7pyjjtlXTg2FfJb0cC/C8=
In-Reply-To: <uroh0n$51an$2@dont-email.me>
Content-Language: en-GB
 by: bart - Thu, 29 Feb 2024 00:15 UTC

On 28/02/2024 23:52, Lawrence D'Oliveiro wrote:
> On Wed, 28 Feb 2024 21:34:14 +0000, bart wrote:
>
>> In C:
>>
>> void Add(int CategoryCode, ItemType Item) {
>> CodeToIndex_put(CategoryCode, getCount());
>> add(Item);
>> }
>>
>> 4 non-comment lines versus 9. I know Java needs tons of boilerplate, but
>> it is not all the language's fault.
>
> Or how about
>
> void Add(int CategoryCode, ItemType Item) {CodeToIndex_put(CategoryCode, getCount());add(Item);}
>
> Wow! I never realized you could do that in C!! I thought it was an
> error to put stuff after column 72 or something. Thanks for the tip!!!

Well, you could write an entire program on one line.

Or you can write an entire program in one thin column:

v\
o\
i\
d\
....

Or you can use common sense and avoid writing code which is either
too compact or so spread out vertically that you have to hunt for the
actual code. Like trying to find the bits of meat in a thin soup.

That's what I took away from your Java code, which looks remarkably like
the spaced-out examples you posted recently.

Re: Implicit String-Literal Concatenation

<urok6t$5lv4$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33913&group=comp.lang.c#33913

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bc...@freeuk.com (bart)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Thu, 29 Feb 2024 00:47:25 +0000
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <urok6t$5lv4$1@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
<877ciouua2.fsf@nosuchdomain.example.com> <uroe02$4eoh$1@dont-email.me>
<87y1b4tbcq.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 29 Feb 2024 00:47:25 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c3e2a728c2cf8236510c608f905802b0";
logging-data="186340"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18I0C1H6Cwl8Y/ot0LE96HN"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:FGk75z52XxrqUDPhTdGPW06J6bw=
In-Reply-To: <87y1b4tbcq.fsf@nosuchdomain.example.com>
Content-Language: en-GB
 by: bart - Thu, 29 Feb 2024 00:47 UTC

On 28/02/2024 23:31, Keith Thompson wrote:
> bart <bc@freeuk.com> writes:

>> It would be unfortunate if your example was allowed. Clearly a binary
>> representation of an instance of your struct would probably require 16
>> bytes rather than 4, of which one may be padding.
>
> Depending on the sizes and alignments of the various types, sure.
> So what?
>

> If you have suggestions for alternate ways to define #embed, they might
> be interesting, but it's too late to change the existing specification.
>

My early comments on this were about compiler performance. I suggested
there might be a way to turn 100,000 byte values in a file, directly
into a 100KB string or data block, without needing to first convert
100,000 values into 100,000 integer expressions represented as tokens,
and to then parse those 100,000 expressions into AST nodes etc.

DB suggested something like that was actually done. But you can't do
that if those 100,000 numbers represent from 100KB to 800KB of memory
depending on the data type of the structure they're initialising.

They might even be mixed type. Or it might be an example like this:

A binary file contains 8 bytes representing one IEEE754 float value. It
is desired to use that to initialise a double array of one element.

However #embed will turn that into 8 integer values of 0 to 255 each (I assume).

It's not clear either what happens when one of the integers has the
value 150, say, but it is used to initialise an element of type (signed)
char. It sounds like it would make it hard to initialise a char[] array,
when char is signed, from a file of UTF8 text.

Basically, #embed is dumb.

For flexibility, I wouldn't use #embed at all. Just have an actual
comma-separated set of values in a text file, and use #include instead.

Re: Implicit String-Literal Concatenation

<87ttlst6os.fsf@nosuchdomain.example.com>

https://www.novabbs.com/devel/article-flat.php?id=33914&group=comp.lang.c#33914

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Wed, 28 Feb 2024 17:12:03 -0800
Organization: None to speak of
Lines: 93
Message-ID: <87ttlst6os.fsf@nosuchdomain.example.com>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
<877ciouua2.fsf@nosuchdomain.example.com>
<uroe02$4eoh$1@dont-email.me>
<87y1b4tbcq.fsf@nosuchdomain.example.com>
<urok6t$5lv4$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="2beb5206cb1be4986a7eae51046cf684";
logging-data="195026"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+6VGtV7sVb03TMgWErqZvn"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:Qip2/JuNiZ2olAer1FxsWpD1hVw=
sha1:Y7FHTPHv1QnZXSZ7b+s+rqEdxUo=
 by: Keith Thompson - Thu, 29 Feb 2024 01:12 UTC

bart <bc@freeuk.com> writes:
> On 28/02/2024 23:31, Keith Thompson wrote:
>> bart <bc@freeuk.com> writes:
>>> It would be unfortunate if your example was allowed. Clearly a binary
>>> representation of an instance of your struct would probably require 16
>>> bytes rather than 4, of which one may be padding.
>> Depending on the sizes and alignments of the various types, sure.
>> So what?
>
>> If you have suggestions for alternate ways to define #embed, they might
>> be interesting, but it's too late to change the existing specification.
>
> My early comments on this were about compiler performance. I suggested
> there might be a way to turn 100,000 byte values in a file, directly
> into a 100KB string or data block, without needing to first convert
> 100,000 values into 100,000 integer expressions represented as
> tokens, and to then parse those 100,000 expressions into AST nodes
> etc.

I suggest that (a) parsing those 100,000 byte values isn't likely to be
a huge deal (if you have actual performance figures that contradict
that, feel free to present them), and (b) any solution that doesn't
involve expanding to C source code would require a lot more work to
implement for very little benefit.

> DB suggested something like that was actually done. But you can't do
> that if those 100,000 numbers represent from 100KB to 800KB of memory
> depending on the data type of the structure they're initialising.

Neither gcc nor clang implements #embed yet. DB mentioned prototype
implementations. I've asked him for more information.

> They might even be mixed type. Or it might be an example like this:
>
> A binary file contains 8 bytes representing one IEEE754 float
> value. It is desired to use that to initialise a double array of one
> element.
>
> However #embed will turn that into 8 integer values of 0 to 255 each (I assume).

Assuming CHAR_BIT==8, yes. You can use it to initialize a union, or use
memcpy() to copy from an array of unsigned char into a double object.
(Storing double values in binary files is uncommon, but it's certainly
possible.)
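
A minimal sketch of the memcpy() approach, under the assumption that the file holds exactly sizeof(double) bytes in the host's own floating-point format and byte order. The helper name double_from_bytes and the file name value.bin are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* In C23 the byte array would be filled at compile time, for example:

       static const unsigned char raw[] = {
       #embed "value.bin"
       };

   (value.bin is a made-up name). The helper below works on any such
   array of at least sizeof(double) bytes. */

/* Hypothetical helper: reinterprets sizeof(double) raw bytes as a double.
   memcpy avoids the alignment and aliasing problems of a pointer cast. */
static double double_from_bytes(const unsigned char *bytes)
{
    double d;

    assert(bytes != NULL);
    memcpy(&d, bytes, sizeof d);
    return d;
}
```

The result is only meaningful if the file was written with the same representation the target uses; nothing in #embed itself checks that.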

> It's not clear either what happens when one of the integers has the
> value 150, say, but it is used to initialise an element of type
> (signed) char. It sounds like it would make it hard to initialise a
> char[] array, when char is signed, from a file of UTF8 text.

Say you have a binary file containing a single byte with the value 150
(when interpreted as an 8-bit unsigned char). Then
#embed "file.dat"
will expand to something like
150
or
0x96

So if you write:

char array[] = {
#embed "file.dat"
};

then it's treated exactly the same as
char array[] = { 150 };

If plain char is signed, then the result of the conversion is
implementation-defined, but is very very likely to result in a value of
-106.
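
Here is a small sketch of that conversion, written without #embed so it compiles on current compilers. It assumes CHAR_BIT is 8 and uses signed char explicitly so the behavior doesn't depend on whether plain char is signed; the helper names are made up for the example.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: the value a byte from the file ends up as when it
   is stored in a signed char. For values above SCHAR_MAX the result is
   implementation-defined; gcc documents that it reduces the value
   modulo 2^N, which gives -106 for 150 when CHAR_BIT == 8. */
static int as_signed_char(int byte)
{
    assert(byte >= 0 && byte <= UCHAR_MAX);
    signed char c = (signed char)byte;
    return c;
}

/* Converting back to unsigned char is fully defined (it is done modulo
   UCHAR_MAX + 1), so on such implementations the original byte value
   is recovered. */
static int back_to_byte(int byte)
{
    return (unsigned char)as_signed_char(byte);
}
```

So UTF-8 text stored in a char array can still be read back byte for byte by converting each element to unsigned char, at least on the common implementations.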

I expect that 99% of the uses of #embed will be to initialize arrays of
unsigned char (or uint8_t). For that purpose, it should work just fine.

> Basically, #embed is dumb.

Do you object to the fact that the authors didn't add additional
arbitrary restrictions to forbid uses that you don't like?

> For flexibility, I wouldn't use #embed at all. Just have an actual
> comma-separated set of values in a text file, and use #include
> instead.

And you can still do that.

If you have a png image file and you want to include its contents in
your C program, you can use a separate program to translate the file to
C source and #include the result, or you can use `#embed "foo.png"`.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */

Re: Implicit String-Literal Concatenation

<urorjd$ahe2$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33915&group=comp.lang.c#33915

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ldo...@nz.invalid (Lawrence D'Oliveiro)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Thu, 29 Feb 2024 02:53:33 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 14
Message-ID: <urorjd$ahe2$2@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlb8p$3bvbc$1@dont-email.me>
<urln99$3ejjt$1@dont-email.me> <urlp3h$3ep9p$5@dont-email.me>
<20240227170925.837@kylheku.com> <urn6li$3s62i$1@dont-email.me>
<uro6ls$2mh1$5@dont-email.me> <uro8sl$3d71$1@dont-email.me>
<uroh0n$51an$2@dont-email.me> <uroial$58n9$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 29 Feb 2024 02:53:33 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="daeb6559b2144face1d3464670b018f1";
logging-data="345538"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Gbp0z/5pMCMSqEoGnodvb"
User-Agent: Pan/0.155 (Kherson; fc5a80b8)
Cancel-Lock: sha1:EIXwYnzuSXdcdgO3MUr0hJZuGNY=
 by: Lawrence D'Oliv - Thu, 29 Feb 2024 02:53 UTC

On Thu, 29 Feb 2024 00:15:17 +0000, bart wrote:

> Or you can use common sense and avoid writing code which is either
> too compact or so spread out vertically that you have to hunt for the
> actual code. Like trying to find the bits of meat in a thin soup.

Terribly sorry about that. I wonder if you could look at this part of the
same code file:

final android.util.SparseArray<Integer> CodeToIndex =
new android.util.SparseArray<Integer>();

and show me how to thicken that part of my humble, tasteless gruel? Maybe
using that same “_” trick you used to do OO in C in your previous example?

Re: Implicit String-Literal Concatenation

<urpdfg$dl6q$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33916&group=comp.lang.c#33916

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Thu, 29 Feb 2024 08:58:40 +0100
Organization: A noiseless patient Spider
Lines: 17
Message-ID: <urpdfg$dl6q$1@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlb8p$3bvbc$1@dont-email.me>
<urln99$3ejjt$1@dont-email.me> <urlp3h$3ep9p$5@dont-email.me>
<20240227170925.837@kylheku.com> <urn6li$3s62i$1@dont-email.me>
<uro6ls$2mh1$5@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 29 Feb 2024 07:58:40 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5f99e7338f93fdbe8ddcb741f55a4912";
logging-data="447706"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18jmdyegiMM1sWN1SDvyvXd0QzvR7y5mS0="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:lUA0WTM7ns3zQUSCjfcrRS6F66c=
In-Reply-To: <uro6ls$2mh1$5@dont-email.me>
Content-Language: en-GB
 by: David Brown - Thu, 29 Feb 2024 07:58 UTC

On 28/02/2024 21:56, Lawrence D'Oliveiro wrote:
> On Wed, 28 Feb 2024 12:50:10 +0100, David Brown wrote:
>
>> ... people write utilities for them in a variety of languages ...
>>
>> But it will often be more convenient to have it built into the language
>> and compiler.
>
> What can be built into the language can only ever be a small subset of
> the many and varied ways that people have incorporated data blobs into
> their programs.

Of course. But that doesn't mean that a language should not include a
feature that makes it easy for a lot of people to get some data blobs
into their code. Maybe /you/ won't find it useful, but other people will.

Re: Implicit String-Literal Concatenation

<urphlj$ejuu$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33917&group=comp.lang.c#33917

Path: i2pn2.org!i2pn.org!news.chmurka.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Thu, 29 Feb 2024 10:10:10 +0100
Organization: A noiseless patient Spider
Lines: 100
Message-ID: <urphlj$ejuu$1@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
<877ciouua2.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 29 Feb 2024 09:10:11 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5f99e7338f93fdbe8ddcb741f55a4912";
logging-data="479198"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+zDIDxD2KjAGC1YJd8Dik3NhjhcLL9R7s="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:GFoByLXauU8nOZ6lkx5ZbAKDTks=
Content-Language: en-GB
In-Reply-To: <877ciouua2.fsf@nosuchdomain.example.com>
 by: David Brown - Thu, 29 Feb 2024 09:10 UTC

On 28/02/2024 22:57, Keith Thompson wrote:
> David Brown <david.brown@hesbynett.no> writes:
> [...]
>> They won't use strings, they will use data blobs - binary data. Then
>> there is no issue with null bytes. And yes, implementations will skip
>> the token generation (unless you are doing something weird, such as
>> using #embed to read the parameters to a function call).
>>
>> Tests with prototype implementations gave extremely fast results.
>
> I'm not sure how that would work. #embed is a preprocessor directive,
> and at least in the abstract model it has to expand to valid C code.
>
> I would have expected that it would simply generate the list of
> comma-separated integer constants described in the standard; later
> phases would simply parse that list and generate code as if that
> sequence had been written in the original source file. Do you know of
> an implementation that does something else?
>

The key thing, as I understand it, is that the compiler gets to know
that the integers in the list are all "nice". And since the
preprocessor and the compiler are part of the same implementation (even
if they are separate programs communicating with pipes or temporary
files), the preprocessor could pass on the binary blob in a pre-parsed form.

Think about what a preprocessor and compiler do with the initialisers
in an array, written in normal text (such as by using "xxd -i" or
another external script). For each integer, it has to divide up the
tokens, identify the comma, parse the integer, check that it is a valid
integer, figure out its type based on the size (and suffix, if any). It
needs to record the line number and column number for possible later
reference in error or warning messages. It has to check the value of
the integer against the type for the array elements, and possibly change
the value to suit, or issue warnings for out-of-range values. It has to
allocate all the space to store this information as it goes along,
without knowing the size of the array - so it will be lots of small
mallocs and/or wasted space. It's a /lot/. (Simpler compilers can get
away with a bit less effort, especially if they have more limited warnings.)

With #embed, the preprocessor can generate a compiler-specific "start of
embed" informational directive (much like "#line" directives and such
things generated by preprocessors today), then the data in a very
specific format, then an "end of embed" directive. It could, for
example, generate all the integers in the format "0xAB, " with 16
elements per line. The compiler wouldn't need to parse the data
normally - it knows exactly how many elements there are (from the "start
of embed" directive), it knows exactly where to find each entry (as each
is 6 characters long), it only needs to look at two of these characters,
there's never any errors, the source line number is fixed (at the #embed
line), and so on.
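
A rough sketch of what such a fast path could look like on the compiler side. This is an illustration of the idea only, not taken from any real implementation. It assumes the entries arrive as one contiguous run of 6-character "0xAB, " fields with no line breaks (a real format with 16 per line would also need to skip the newlines), and the names decode_fixed_entries and hex_digit are made up.

```c
#include <assert.h>
#include <stddef.h>

enum { ENTRY_WIDTH = 6 };   /* each entry is exactly "0xAB, " */

/* Value of one hex digit. The input comes from the preprocessor of the
   same implementation, so it is trusted and there is no error path. */
static unsigned hex_digit(char ch)
{
    if (ch >= '0' && ch <= '9') return (unsigned)(ch - '0');
    if (ch >= 'a' && ch <= 'f') return (unsigned)(ch - 'a') + 10u;
    return (unsigned)(ch - 'A') + 10u;
}

/* Decodes count fixed-width entries from text into out[]. The count is
   known in advance (from the hypothetical "start of embed" directive),
   so out[] can be allocated once at the right size. */
static void decode_fixed_entries(const char *text, size_t count,
                                 unsigned char *out)
{
    assert(text != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        const char *entry = text + i * ENTRY_WIDTH;
        /* Only characters 2 and 3 of each entry carry information. */
        out[i] = (unsigned char)(hex_digit(entry[2]) * 16u
                                 + hex_digit(entry[3]));
    }
}
```

There is no tokenising, no type deduction and no range checking in the loop, which is where the savings over a general initialiser list would come from.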

A more tightly coupled preprocessor and compiler can do even better -
for array initialisation, the binary blob could be used directly without
ever generating integer constants or parsing them.

The results of testing are that #embed is /massively/ faster, and uses
much less memory, than external generators, especially for larger files.
And it gives you the data on-hand for optimisation purposes, unlike
external direct linking of binary blobs. (So you can get the size of
the array, or use values from it as compile-time known values.)
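
A small sketch of that last point, with a hand-written array standing in for one filled by #embed so that it compiles without C23 support. The names blob, BLOB_SIZE and blob_starts_with are made up for the example.

```c
#include <assert.h>

/* Stand-in for an array filled by #embed; the four bytes are written out
   by hand here so the sketch compiles without C23 support. */
static const unsigned char blob[] = { 0x89, 0x50, 0x4E, 0x47 };

/* The size is an ordinary constant expression, usable in static_assert
   or as an enumeration constant or array bound. */
static_assert(sizeof blob == 4, "blob has an unexpected size");
enum { BLOB_SIZE = sizeof blob };

/* The contents are visible to the optimiser, so a check like this can in
   principle be folded to a constant at compile time. */
static int blob_starts_with(unsigned char first)
{
    return BLOB_SIZE > 0 && blob[0] == first;
}
```

With a blob linked in as a separate object file, the compiler sees only an external symbol, so neither the size nor the contents are available to it in this way.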

> For example, say you have a file "foo.dat" containing 4 bytes with
> values 0, 1, 2, and 3. This would be perfectly valid:
>
> struct foo {
> unsigned char a;
> unsigned short b;
> unsigned int c;
> double d;
> };
>
> struct foo obj = {
> #embed "foo.dat"
> };
>
> #embed isn't defined to translate an input file to a sequence of bytes.
> It's defined to translate an input file to a sequence of integer
> constant expressions.
>

Yes. But the main speed (and memory usage) gains are for large files,
and that means array initialisers. That does not conflict
with using it for cases like yours.

> *Maybe* a compiler could optimize for the case where it knows that it's
> being used to initialize an array of unsigned char, but (a) that would
> require the preprocessor to have information that normally doesn't exist
> until later phases, and (b) I'm not convinced it would be worth the
> effort.
>

Look at
<https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1040r6.html#design-practice-speed>.

In those tests, for a 40 MB file gcc #embed is 200 times faster than
"xxd -i" generated files, and takes about 2.5% of the memory. It scales
to 1 GB files. And that's just a proof-of-concept implementation.

Re: Implicit String-Literal Concatenation

<urpi8u$eo9h$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33918&group=comp.lang.c#33918

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bc...@freeuk.com (bart)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Thu, 29 Feb 2024 09:20:30 +0000
Organization: A noiseless patient Spider
Lines: 18
Message-ID: <urpi8u$eo9h$1@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlb8p$3bvbc$1@dont-email.me>
<urln99$3ejjt$1@dont-email.me> <urlp3h$3ep9p$5@dont-email.me>
<20240227170925.837@kylheku.com> <urn6li$3s62i$1@dont-email.me>
<uro6ls$2mh1$5@dont-email.me> <uro8sl$3d71$1@dont-email.me>
<uroh0n$51an$2@dont-email.me> <uroial$58n9$1@dont-email.me>
<urorjd$ahe2$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 29 Feb 2024 09:20:30 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c3e2a728c2cf8236510c608f905802b0";
logging-data="483633"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX192Jw+1UxVNosm0kKXkcVbR"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:ouBjEj4TP5/tpb692c7FWQ7rMXE=
In-Reply-To: <urorjd$ahe2$2@dont-email.me>
Content-Language: en-GB
 by: bart - Thu, 29 Feb 2024 09:20 UTC

On 29/02/2024 02:53, Lawrence D'Oliveiro wrote:
> On Thu, 29 Feb 2024 00:15:17 +0000, bart wrote:
>
>> Or you can use common sense and avoiding writing code which is either
>> too compact or so spread out vertically that you have to hunt for the
>> actual code. Like trying to find the bits of meat in a thin soup.
>
> Terribly sorry about that. I wonder if you could look at this part of the
> same code file:
>
> final android.util.SparseArray<Integer> CodeToIndex =
> new android.util.SparseArray<Integer>();
>
> and show me how to thicken that part of my humble, tasteless gruel? Maybe
> using that same “_” trick you used to do OO in C in your previous example?

You've shown an example of a piece of meat. In main.java, 70% of the
lines are either blanks or contain only an opening or closing bracket.

Re: Implicit String-Literal Concatenation

<urpllc$ff02$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33919&group=comp.lang.c#33919

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bc...@freeuk.com (bart)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Thu, 29 Feb 2024 10:18:21 +0000
Organization: A noiseless patient Spider
Lines: 49
Message-ID: <urpllc$ff02$1@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
<877ciouua2.fsf@nosuchdomain.example.com> <urphlj$ejuu$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 29 Feb 2024 10:18:20 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c3e2a728c2cf8236510c608f905802b0";
logging-data="506882"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Zm76thF9DiigXXAbCh1db"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:peTtRd51eyc8UtKyzAyzeWedBbM=
In-Reply-To: <urphlj$ejuu$1@dont-email.me>
Content-Language: en-GB
 by: bart - Thu, 29 Feb 2024 10:18 UTC

On 29/02/2024 09:10, David Brown wrote:
> On 28/02/2024 22:57, Keith Thompson wrote:

>> *Maybe* a compiler could optimize for the case where it knows that it's
>> being used to initialize an array of unsigned char, but (a) that would
>> require the preprocessor to have information that normally doesn't exist
>> until later phases, and (b) I'm not convinced it would be worth the
>> effort.
>>
>
> Look at
> <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1040r6.html#design-practice-speed>.
>
> In those tests, for a 40 MB file gcc #embed is 200 times faster than
> "xxd -i" generated files, and takes about 2.5% of the memory.  It scales
> to 1 GB files.  And that's just a proof-of-concept implementation.

I've just done my own tests, with a 40MB data file containing random
A..Z letters (so can be processed as a text file).

This was also converted to a 120MB text file containing a list of numbers
("65,66,73,...", 3 characters for each data byte).
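
For readers who want to reproduce that conversion, a minimal sketch follows. The function name and the buffer-based interface are assumptions for illustration; the actual tool used for the test is not shown in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write each byte as a decimal number followed by a comma, for example
   "65,66,73,". Returns the number of characters written (excluding the
   terminating null), or 0 if the output buffer is too small. */
static size_t bytes_to_decimal_list(const unsigned char *data, size_t count,
                                    char *out, size_t out_size)
{
    size_t pos = 0;

    assert(out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        int n = snprintf(out + pos, out_size - pos, "%u,",
                         (unsigned)data[i]);
        if (n < 0 || (size_t)n >= out_size - pos)
            return 0;  /* output buffer too small */
        pos += (size_t)n;
    }
    return pos;
}
```

With letters A..Z (values 65 to 90) every byte becomes two digits plus a comma, which is where the 3:1 size ratio comes from.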

Using 'strinclude' in my old C compiler, it took about 1 second to build
this program:

#include <stdio.h>
#include <string.h>

char* s=strinclude("data");

int main(void) {
    printf("%zu\n", strlen(s));
}

(Running it shows '40000000'.) The same test in my language (which has
no intermediate ASM stage) took 0.3 seconds.

Next I tried instead that 120MB text file containing the same data but
as text, initialising a char[] array using #include.

Tcc took 12 seconds. Bcc took 56 seconds (via ASM etc).

gcc got up to about 3GB memory usage then 'cc1' failed trying to
allocate 0.5GB, after about a minute.

Processing a long list of numbers DOES use considerable resources. Bear in
mind that #embed also needs to take binary data and generate tokens,
possibly converting each binary number to text.

Re: Implicit String-Literal Concatenation

<urprdv$gfvq$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33922&group=comp.lang.c#33922

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: malcolm....@gmail.com (Malcolm McLean)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Thu, 29 Feb 2024 11:56:47 +0000
Organization: A noiseless patient Spider
Lines: 21
Message-ID: <urprdv$gfvq$1@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
<urnbh6$3t14d$1@dont-email.me> <87frxcuv87.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 29 Feb 2024 11:56:48 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="67cef76909880883a80a7ee97b1c9595";
logging-data="540666"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/oVg9k/aXJqpz7uBd7XMpFeDw1KRiZDS4="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:oUWnJ0Z86Y3Dl6bw2+VVN4zx2BU=
In-Reply-To: <87frxcuv87.fsf@nosuchdomain.example.com>
Content-Language: en-GB
 by: Malcolm McLean - Thu, 29 Feb 2024 11:56 UTC

On 28/02/2024 21:36, Keith Thompson wrote:
> bart <bc@freeuk.com> writes:
> [...]
>> AFAIK strings in C can have embedded zeros when not assumed to be
>> zero-terminated. So here:
>>
>> char s[]={1,2,3,0,4,5,6};
>>
>> s will have a length of 7.
>
> Strings *by definition* cannot have embedded zeros. A null character
> terminates a string.
>
C strings. Not strings in other programming languages. And only if you
define "C strings" in a rather restrictive but, to be fair, totally
legitimate way. So I wouldn't have put in the asterisks.

--
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm

Re: Implicit String-Literal Concatenation

<urq4fe$lapm$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33923&group=comp.lang.c#33923

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bc...@freeuk.com (bart)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Thu, 29 Feb 2024 14:31:11 +0000
Organization: A noiseless patient Spider
Lines: 46
Message-ID: <urq4fe$lapm$1@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
<urnbh6$3t14d$1@dont-email.me> <87frxcuv87.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 29 Feb 2024 14:31:10 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c3e2a728c2cf8236510c608f905802b0";
logging-data="699190"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+iO97oD5QkRTU79qgfKW06"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:usToOz1GUfisoPr9UEXTKQ299lQ=
In-Reply-To: <87frxcuv87.fsf@nosuchdomain.example.com>
Content-Language: en-GB
 by: bart - Thu, 29 Feb 2024 14:31 UTC

On 28/02/2024 21:36, Keith Thompson wrote:
> bart <bc@freeuk.com> writes:
> [...]
>> AFAIK strings in C can have embedded zeros when not assumed to be
>> zero-terminated. So here:
>>
>> char s[]={1,2,3,0,4,5,6};
>>
>> s will have a length of 7.
>
> Strings *by definition* cannot have embedded zeros. A null character
> terminates a string.
>
> A string literal can have embedded \0 characters, but if you're
> suggesting that #embed should expand to a string literal, I can see
> several disadvantages and no significant advantages. For one thing, the
> data may or may not end with a null character; string literals always
> do.

Not here:

char s[] = "ABC";
char t[3] = "DEF";

The "DEF" string doesn't end with a zero.
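
A small sketch of the difference, for illustration only (the helper name holds_string is made up here). Note that initialising a 3-element array from "DEF" is valid in C but not in C++.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char s[] = "ABC";   /* 4 chars: 'A', 'B', 'C', '\0' */
static const char t[3] = "DEF";  /* 3 chars: no room for the '\0' */

static_assert(sizeof s == 4, "s includes the terminating null");
static_assert(sizeof t == 3, "t has no terminating null");

/* Returns 1 if the first n chars at p contain a null character, that
   is, if the array holds a string in the C standard's sense. */
static int holds_string(const char *p, size_t n)
{
    return memchr(p, '\0', n) != NULL;
}
```

Here holds_string(s, sizeof s) is 1 and holds_string(t, sizeof t) is 0, so passing t to strlen() or printf("%s") would read past the end of the array.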

Is 'string' given a special meaning in the standard?

/That/ would seem to me to be too restrictive. Does this:

char *s;

define a pointer to such a string, or can it be any kind of data? For
example, `char*` is used by the GetOpenFileName WinAPI function for a
/series/ of zero-terminated strings which itself is terminated with two
zero bytes.
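
As a sketch of that layout (portable C only, no Windows headers; the function name is hypothetical), this walks a block of consecutive zero-terminated strings that ends with an extra zero byte:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the strings in a block of consecutive null-terminated strings.
   The block ends where an empty string would start, i.e. at the second
   of two adjacent null characters. */
static size_t count_multi_string(const char *block)
{
    size_t count = 0;

    assert(block != NULL);
    while (*block != '\0') {
        block += strlen(block) + 1;  /* skip this string and its '\0' */
        count++;
    }
    return count;
}
```

For example, the literal "C:\\dir\0a.txt\0b.txt\0" (whose implicit terminator supplies the second zero byte) holds three strings. The `char*` type says nothing about this; the convention lives entirely in the documentation of the function using it.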

So it is some property that is attributed to the data that will be stored.

I normally use `cstring` or `stringz` outside the language when referring
to a zero-terminated sequence of characters, which implies that
embedded zeros aren't allowed.

Re: Implicit String-Literal Concatenation

<urq7ah$lrbn$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33925&group=comp.lang.c#33925

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Thu, 29 Feb 2024 16:19:45 +0100
Organization: A noiseless patient Spider
Lines: 36
Message-ID: <urq7ah$lrbn$2@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
<urnbh6$3t14d$1@dont-email.me> <87frxcuv87.fsf@nosuchdomain.example.com>
<urprdv$gfvq$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 29 Feb 2024 15:19:45 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5f99e7338f93fdbe8ddcb741f55a4912";
logging-data="716151"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/EdOFbx+/Ah+J0Etg0KHBXFv9uQsvhOHs="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:oYtgB+3ujO3OJOykAJ+BXznkYCM=
Content-Language: en-GB
In-Reply-To: <urprdv$gfvq$1@dont-email.me>
 by: David Brown - Thu, 29 Feb 2024 15:19 UTC

On 29/02/2024 12:56, Malcolm McLean wrote:
> On 28/02/2024 21:36, Keith Thompson wrote:
>> bart <bc@freeuk.com> writes:
>> [...]
>>> AFAIK strings in C can have embedded zeros when not assumed to be
>>> zero-terminated. So here:
>>>
>>>      char s[]={1,2,3,0,4,5,6};
>>>
>>> s will have a length of 7.
>>
>> Strings *by definition* cannot have embedded zeros.  A null character
>> terminates a string.
>>
> C strings. Not strings in other programming languages.

Let me point you to the name of this Usenet group.

And strings in any programming language take one of these forms:

1. A string of characters and a terminating null, thus no embedded null
characters.
2. A starting length (such as Pascal strings).
3. A fixed size.
4. A more advanced structure.

An array of bytes is not a "string".

> And only if you
> define "C strings" in a rather restrictive but, to be fair, totally
> legitimate way. So I wouldn't have put in the asterisks.
>

The definition of "C string" is given in section 7.1.1p1 of the C
standards. There is only one definition of "C string".
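
A short sketch applying that definition to the array from earlier in the thread (the wrapper function name is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char s[] = {1, 2, 3, 0, 4, 5, 6};

/* The array object has 7 elements... */
static_assert(sizeof s == 7, "array has 7 elements");

/* ...but the string that starts at s ends at the first null character,
   so its length is 3. The trailing 4, 5, 6 are array contents that are
   not part of that string. */
static size_t string_length_of_s(void)
{
    return strlen(s);
}
```

So "the length of s" is 7 as an array and 3 as a string, which is the distinction the two sides of this subthread are arguing over.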

Re: Implicit String-Literal Concatenation

<urq7fd$lupv$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=33926&group=comp.lang.c#33926

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: richard....@gmail.invalid (Richard Harnden)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Thu, 29 Feb 2024 15:22:18 +0000
Organization: A noiseless patient Spider
Lines: 65
Message-ID: <urq7fd$lupv$1@dont-email.me>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
<urnbh6$3t14d$1@dont-email.me> <87frxcuv87.fsf@nosuchdomain.example.com>
<urq4fe$lapm$1@dont-email.me>
Reply-To: richard.harnden@invalid.com
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 29 Feb 2024 15:22:21 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="60b1b7b9bcee9ba5318e909c10756aab";
logging-data="719679"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+wPZjK99hqlO+8EjuZm/LuEE7RfI1jj6M="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:awPNa2yhiSvW/pTjr4yxAIRuD74=
In-Reply-To: <urq4fe$lapm$1@dont-email.me>
Content-Language: en-GB
 by: Richard Harnden - Thu, 29 Feb 2024 15:22 UTC

On 29/02/2024 14:31, bart wrote:
> On 28/02/2024 21:36, Keith Thompson wrote:
>> bart <bc@freeuk.com> writes:
>> [...]
>>> AFAIK strings in C can have embedded zeros when not assumed to be
>>> zero-terminated. So here:
>>>
>>>      char s[]={1,2,3,0,4,5,6};
>>>
>>> s will have a length of 7.
>>
>> Strings *by definition* cannot have embedded zeros.  A null character
>> terminates a string.
>>
>> A string literal can have embedded \0 characters, but if you're
>> suggesting that #embed should expand to a string literal, I can see
>> several disadvantages and no significant advantages.  For one thing, the
>> data may or may not end with a null character; string literals always
>> do.
>
> Not here:
>
>     char s[]  = "ABC";
>     char t[3] = "DEF";
>
> The "DEF" string doesn't end with a zero.

And is, therefore, not a string.

>
> Is 'string' given a special meaning in the standard?

Yes. Things that work with the strX functions. Which means they are
'\0' terminated.

>
> /That/ would seem to me to be too restrictive. Does this:
>
>    char *s;
>
> define a pointer to such a string, or can it be any kind of data? For

That points to a char. That could be followed by more chars, and if one
of those is a '\0', then it's a string. You know this.

> example, `char*` is used by the GetOpenFileName WinAPI function for a
> /series/ of zero-terminated strings which itself is terminated with two
> zero bytes.

That is a windowsism, then.

Why didn't they use the NULL terminated char **argv kind of thing?

>
> So it is some property that is attributed to the data that will be stored.
>
> I normally use `cstring` or `stringz` outside the language when referring
> to a zero-terminated sequence of characters, which implies that
> embedded zeros aren't allowed.
>
>
>
>

Re: Implicit String-Literal Concatenation

<urq7sk$2ntv$1@news.gegeweb.eu>

https://www.novabbs.com/devel/article-flat.php?id=33927&group=comp.lang.c#33927

Path: i2pn2.org!i2pn.org!news.nntp4.net!news.gegeweb.eu!gegeweb.org!.POSTED.2a01:cb19:8674:1100:216d:274a:f506:6cd1!not-for-mail
From: tth...@none.invalid (tTh)
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Thu, 29 Feb 2024 16:29:24 +0100
Organization: none
Message-ID: <urq7sk$2ntv$1@news.gegeweb.eu>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
<877ciouua2.fsf@nosuchdomain.example.com> <uroe02$4eoh$1@dont-email.me>
<87y1b4tbcq.fsf@nosuchdomain.example.com> <urok6t$5lv4$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 29 Feb 2024 15:29:24 -0000 (UTC)
Injection-Info: news.gegeweb.eu; posting-account="tontonth@usenet.local"; posting-host="2a01:cb19:8674:1100:216d:274a:f506:6cd1";
logging-data="90047"; mail-complaints-to="abuse@gegeweb.eu"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha256:FqrzG/Q4A3Sn8COdO1PFBXokEWQU7U3IsMDroO+5S6Q=
Content-Language: en-US
In-Reply-To: <urok6t$5lv4$1@dont-email.me>
 by: tTh - Thu, 29 Feb 2024 15:29 UTC

On 2/29/24 01:47, bart wrote:

> My early comments on this were about compiler performance. I suggested
> there might be a way to turn 100,000 byte values in a file, directly
> into a 100KB string or data block, without needing to first convert
> 100,000 values into 100,000 integer expressions represented as tokens,
> and to then parse those 100,000 expressions into AST nodes etc.

But you HAVE to do that if #embed is in the preprocessor,
because its job is to give compilable text to the real
compiler. No other way is possible.

> Basically, #embed is dumb.

No.

--
+---------------------------------------------------------------------+
| https://tube.interhacker.space/a/tth/video-channels |
+---------------------------------------------------------------------+

