devel / comp.lang.c++ / Re: Strange optimization

Subject (Author)
* Strange optimization (Bonita Montero)
`* Re: Strange optimization (Bo Persson)
 +* Re: Strange optimization (Alf P. Steinbach)
 |`* Re: Strange optimization (Bonita Montero)
 | `* Re: Strange optimization (Alf P. Steinbach)
 |  +- Re: Strange optimization (Alf P. Steinbach)
 |  `* Re: Strange optimization (Bonita Montero)
 |   +- Re: Strange optimization (David Brown)
 |   `* Re: Strange optimization (Alf P. Steinbach)
 |    +- Re: Strange optimization (Bonita Montero)
 |    +* Re: Strange optimization (James Kuyper)
 |    |+* Re: Strange optimization (Chris M. Thomasson)
 |    ||`* Re: Strange optimization (Chris M. Thomasson)
 |    || `- Re: Strange optimization (Chris M. Thomasson)
 |    |+* Re: Strange optimization (David Brown)
 |    ||`- Re: Strange optimization (V)
 |    |`* Re: Strange optimization (Alf P. Steinbach)
 |    | `* Re: Strange optimization (james...@alumni.caltech.edu)
 |    |  +* Re: Strange optimization (Alf P. Steinbach)
 |    |  |+* Re: Strange optimization (David Brown)
 |    |  ||`* Re: Strange optimization (Alf P. Steinbach)
 |    |  || `* Re: Strange optimization (David Brown)
 |    |  ||  `* Re: Strange optimization (Alf P. Steinbach)
 |    |  ||   `- Re: Strange optimization (David Brown)
 |    |  |`* Re: Strange optimization (james...@alumni.caltech.edu)
 |    |  | `* Re: Strange optimization (Alf P. Steinbach)
 |    |  |  `* Re: Strange optimization (James Kuyper)
 |    |  |   +- Re: Strange optimization (Bonita Montero)
 |    |  |   `* Re: Strange optimization (Alf P. Steinbach)
 |    |  |    +* Re: Strange optimization (Keith Thompson)
 |    |  |    |`* Re: Strange optimization (Alf P. Steinbach)
 |    |  |    | `- Re: Strange optimization (David Brown)
 |    |  |    `- Re: Strange optimization (James Kuyper)
 |    |  `- Re: Strange optimization (Bonita Montero)
 |    `* Re: Strange optimization (David Brown)
 |     `* Re: Strange optimization (Alf P. Steinbach)
 |      +* Re: Strange optimization (David Brown)
 |      |`* Re: Strange optimization (Alf P. Steinbach)
 |      | +* Re: Strange optimization (David Brown)
 |      | |+* Re: Strange optimization (Ben Bacarisse)
 |      | ||`- Re: Strange optimization (David Brown)
 |      | |`* Re: Strange optimization (Alf P. Steinbach)
 |      | | `* Re: Strange optimization (David Brown)
 |      | |  `* Re: Strange optimization (Alf P. Steinbach)
 |      | |   `- Re: Strange optimization (David Brown)
 |      | `* Re: Strange optimization (Keith Thompson)
 |      |  `- Re: Strange optimization (David Brown)
 |      `- Re: Strange optimization (Bonita Montero)
 +* Re: Strange optimization (Bonita Montero)
 |`* Re: Strange optimization (David Brown)
 | `* Re: Strange optimization (Bonita Montero)
 |  `* Re: Strange optimization (David Brown)
 |   +* Re: Strange optimization (Bonita Montero)
 |   |`* Re: Strange optimization (David Brown)
 |   | `* Re: Strange optimization (Bonita Montero)
 |   |  `* Re: Strange optimization (David Brown)
 |   |   `* Re: Strange optimization (Bonita Montero)
 |   |    `* Re: Strange optimization (David Brown)
 |   |     `- Re: Strange optimization (Bonita Montero)
 |   `* Re: Strange optimization (Scott Lurndal)
 |    +* Re: Strange optimization (Bonita Montero)
 |    |+- Re: Strange optimization (Scott Lurndal)
 |    |`* Re: Strange optimization (james...@alumni.caltech.edu)
 |    | `* Re: Strange optimization (Bonita Montero)
 |    |  `- Re: Strange optimization (james...@alumni.caltech.edu)
 |    `* Re: Strange optimization (David Brown)
 |     `- Re: Strange optimization (Scott Lurndal)
 `* Re: Strange optimization (Vir Campestris)
  `- Re: Strange optimization (Keith Thompson)

Re: Strange optimization

https://www.novabbs.com/devel/article-flat.php?id=384&group=comp.lang.c%2B%2B#384

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m....@gmail.com (Chris M. Thomasson)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Thu, 15 Jun 2023 01:50:26 -0700
Organization: A noiseless patient Spider
Lines: 35
Message-ID: <u6ejcj$bum6$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6desf$4oa4$1@dont-email.me>
<u6dgsr$4ums$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 15 Jun 2023 08:50:28 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="bc222fdc488b0299ae962f359e97db3d";
logging-data="391878"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18fl4+OyvxqNxKIfsxDvI7fk1YClA5wnPs="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:S+wmcZznuKsUORn1SzPkbuDs6JU=
Content-Language: en-US
In-Reply-To: <u6dgsr$4ums$2@dont-email.me>
 by: Chris M. Thomasson - Thu, 15 Jun 2023 08:50 UTC

On 6/14/2023 4:01 PM, Chris M. Thomasson wrote:
> On 6/14/2023 3:27 PM, James Kuyper wrote:
>> On 6/14/23 15:46, Alf P. Steinbach wrote:
>>> On 2023-06-14 7:56 AM, Bonita Montero wrote:
>>>> Am 14.06.2023 um 07:52 schrieb Alf P. Steinbach:
>>>>
>>>>> It's copying an `uint64_t` that is known to be correctly aliased, to
>>>>> an `uint64_t`; that's nonsense.
>>>>
>>>> The reference initially supplied by the caller is cast from a
>>>> char-array. memcpy() is the only legal way in C++ to alias that content.
>> ...
>>> So for the separately compiled function it does not matter technically,
>>> except possibly for performance, whether it uses clear, concise, safe
>>> and guaranteed max efficient `=`, or verbose and unsafe `memcpy`.
>>>
>>> That means that regarding this matter the common interpretation of the
>>> standard is not technical but instead specifies a formal UB that can't
>>> happen unless one informs a really perverse compiler that it's there.
>>
>> All you need is a platform where misaligned pointers do not merely cause
>> the code to be inefficient, but to actually malfunction. On such a
>> platform, if p_bytes is not correctly aligned to store a uint64_t, then
>> the code will malfunction in the reinterpret_cast<>.
> [...]
>
> What about some code that crosses an L2 cache line boundary and causes
> the damn processor to assert a bus lock... Argh!
>

CMPXCHG on an address that points to data that straddles an L2 cache line
should do it... I cannot remember for sure if the LOCK prefix _has_ to
be present here... Cannot remember that detail right now, damn!
FWIW, XCHG should assert a bus lock as well with respect to the "bad"
location, and LOCK is automatically implied in XCHG to begin with.
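A minimal sketch (not from the thread) of the alternative the quoted posts argue about, assuming the goal is to read a `uint64_t` stored at an arbitrary offset in a byte buffer; `load_u64` and `is_aligned_u64` are hypothetical names. `std::memcpy` copies the bytes into a genuine `uint64_t` object, so it neither violates the aliasing rules nor assumes alignment, while dereferencing the `reinterpret_cast` pointer does both.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a uint64_t stored at an arbitrary byte offset.
// std::memcpy copies the bytes into a real uint64_t object, so it neither
// accesses the buffer through a uint64_t lvalue nor assumes alignment.
inline std::uint64_t load_u64(const unsigned char* p) {
    std::uint64_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Hypothetical helper: true if p is suitably aligned for a uint64_t.
inline bool is_aligned_u64(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint64_t) == 0;
}

// The form under dispute, for comparison. It is undefined behavior when the
// buffer holds bytes rather than a uint64_t object, and it can fault on
// strict-alignment hardware when p is misaligned:
//     std::uint64_t v = *reinterpret_cast<const std::uint64_t*>(p);
```

On targets that allow unaligned loads, compilers commonly turn a fixed-size `memcpy` like this into a single load instruction, so the safe form usually costs nothing; that is a common optimization, not a guarantee.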

Re: Strange optimization

https://www.novabbs.com/devel/article-flat.php?id=385&group=comp.lang.c%2B%2B#385

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m....@gmail.com (Chris M. Thomasson)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Thu, 15 Jun 2023 01:55:15 -0700
Organization: A noiseless patient Spider
Lines: 54
Message-ID: <u6ejlk$c16k$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6desf$4oa4$1@dont-email.me>
<u6dgsr$4ums$2@dont-email.me> <u6ejcj$bum6$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 15 Jun 2023 08:55:16 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="bc222fdc488b0299ae962f359e97db3d";
logging-data="394452"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Ex5S2WjOwlv6yg/NsG1U/K6oCToioZks="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:s30bU3cbaa0ZGtsSW4rQf+2Dq1o=
Content-Language: en-US
In-Reply-To: <u6ejcj$bum6$1@dont-email.me>
 by: Chris M. Thomasson - Thu, 15 Jun 2023 08:55 UTC

On 6/15/2023 1:50 AM, Chris M. Thomasson wrote:
> On 6/14/2023 4:01 PM, Chris M. Thomasson wrote:
>> On 6/14/2023 3:27 PM, James Kuyper wrote:
>>> On 6/14/23 15:46, Alf P. Steinbach wrote:
>>>> On 2023-06-14 7:56 AM, Bonita Montero wrote:
>>>>> Am 14.06.2023 um 07:52 schrieb Alf P. Steinbach:
>>>>>
>>>>>> It's copying an `uint64_t` that is known to be correctly aliased, to
>>>>>> an `uint64_t`; that's nonsense.
>>>>>
>>>>> The reference initially supplied by the caller is cast from a
>>>>> char-array. memcpy() is the only legal way in C++ to alias that content.
>>> ...
>>>> So for the separately compiled function it does not matter technically,
>>>> except possibly for performance, whether it uses clear, concise, safe
>>>> and guaranteed max efficient `=`, or verbose and unsafe `memcpy`.
>>>>
>>>> That means that regarding this matter the common interpretation of the
>>>> standard is not technical but instead specifies a formal UB that can't
>>>> happen unless one informs a really perverse compiler that it's there.
>>>
>>> All you need is a platform where misaligned pointers do not merely cause
>>> the code to be inefficient, but to actually malfunction. On such a
>>> platform, if p_bytes is not correctly aligned to store a uint64_t, then
>>> the code will malfunction in the reinterpret_cast<>.
>> [...]
>>
>> What about some code that crosses an L2 cache line boundary and causes
>> the damn processor to assert a bus lock... Argh!
>>
>
> CMPXCHG on an address that points to data that straddles an L2 cache line
> should do it... I cannot remember for sure if the LOCK prefix _has_ to
> be present here... Cannot remember that detail right now, damn!
> FWIW, XCHG should assert a bus lock as well with respect to the "bad"
> location, and LOCK is automatically implied in XCHG to begin with.

I also cannot remember if it only asserts the bus lock when there is
"contention" on a LOCK'ed atomic RMW using an address that goes to data
that straddles an L2 cache line on Intel. It's been a while since I have
worked with raw x86 asm. I am sure some of my work is up on the Wayback
Machine. Let me check...

I found some of my old asm work!

This is MASM:

http://web.archive.org/web/20060214112539/http://appcore.home.comcast.net/appcore/src/cpu/i686/ac_i686_masm_asm.html

This should be GAS:

http://web.archive.org/web/20060214112345/http://appcore.home.comcast.net/appcore/src/cpu/i686/ac_i686_gcc_asm.html

;^)
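Whether an operand straddles a cache line, the condition being recalled above, can be checked directly. A minimal sketch (not from the thread), assuming a 64-byte line, which is typical on current x86 CPUs but not something the language guarantees; `straddles_line` and `kCacheLine` are hypothetical names. On x86, a LOCK-prefixed read-modify-write on such an operand is what Intel calls a split lock.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache-line size. 64 bytes is typical on current x86 CPUs but is
// not guaranteed by the language.
constexpr std::uintptr_t kCacheLine = 64;

// True if the bytes [addr, addr + size) touch more than one cache line.
// Precondition: size > 0.
constexpr bool straddles_line(std::uintptr_t addr, std::size_t size) {
    return addr / kCacheLine != (addr + size - 1) / kCacheLine;
}
```

For an object `obj`, call `straddles_line(reinterpret_cast<std::uintptr_t>(&obj), sizeof obj)`. Aligning an object to its own size (for power-of-two sizes no larger than the line) guarantees that it never straddles.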

Re: Strange optimization

https://www.novabbs.com/devel/article-flat.php?id=386&group=comp.lang.c%2B%2B#386

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Bonita.M...@gmail.com (Bonita Montero)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Thu, 15 Jun 2023 12:06:03 +0200
Organization: A noiseless patient Spider
Lines: 19
Message-ID: <u6enq9$cirl$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bpr3$3v0ra$1@dont-email.me> <u6c6vf$hh4$1@dont-email.me>
<u6cffm$1efd$1@dont-email.me> <u6cj8u$1sek$1@dont-email.me>
<u6cjrm$1ts6$1@dont-email.me> <u6dbee$4cof$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 15 Jun 2023 10:06:01 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="959c7d2278d53cc6c55fcb40db86c57a";
logging-data="412533"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+aDifL9ZraJfzTgPrhgH3k01RufVzHaq8="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:U5iRgl3TkP6HxaFd98N7qHqg9tI=
Content-Language: de-DE
In-Reply-To: <u6dbee$4cof$1@dont-email.me>
 by: Bonita Montero - Thu, 15 Jun 2023 10:06 UTC

Am 14.06.2023 um 23:28 schrieb David Brown:

>> In C you can alias anything as a char-array and vice versa,
>> but not in C++.

> <https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_aliasing>
> <https://en.cppreference.com/w/cpp/types/byte>

There never will be an upcoming CPU where CHAR_BIT is not eight.
Even POSIX requires that.

> Finally you have managed to check it, ...

I had checked it long before your previous posting.

> No, it is still the wrong type regardless of the memory flatness.

You're paranoid.
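Whatever the future of CHAR_BIT, code that really depends on 8-bit bytes can say so, so that a port to a platform with 16- or 32-bit `char` fails at compile time rather than misbehaving. A minimal sketch (not from the thread); `octet` is a hypothetical helper. Because it works on values with shifts rather than on memory, its result does not depend on byte order.

```cpp
#include <climits>
#include <cstdint>

// State the 8-bit-byte assumption explicitly: on a platform with 16- or
// 32-bit char this fails to compile instead of silently misbehaving.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

// Hypothetical helper: octet i (0 = least significant) of a 64-bit value.
// Shifts operate on the value, not on memory, so byte order is irrelevant.
// Precondition: i < 8.
constexpr std::uint8_t octet(std::uint64_t v, unsigned i) {
    return static_cast<std::uint8_t>(v >> (8 * i));
}
```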

Re: Strange optimization

https://www.novabbs.com/devel/article-flat.php?id=387&group=comp.lang.c%2B%2B#387

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Thu, 15 Jun 2023 15:05:37 +0200
Organization: A noiseless patient Spider
Lines: 40
Message-ID: <u6f2b1$dp2e$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bpr3$3v0ra$1@dont-email.me> <u6c6vf$hh4$1@dont-email.me>
<u6cffm$1efd$1@dont-email.me> <u6cj8u$1sek$1@dont-email.me>
<u6cjrm$1ts6$1@dont-email.me> <u6dbee$4cof$1@dont-email.me>
<u6enq9$cirl$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 15 Jun 2023 13:05:37 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="fff2e5cb247c5ed47788381bd530146c";
logging-data="451662"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/XNK8zZoCfmBktm6dEju/aPdkItwTWSZM="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.9.0
Cancel-Lock: sha1:Iis5A9otmmPdxwPImtQHam4sEGo=
Content-Language: en-GB
In-Reply-To: <u6enq9$cirl$1@dont-email.me>
 by: David Brown - Thu, 15 Jun 2023 13:05 UTC

On 15/06/2023 12:06, Bonita Montero wrote:
> Am 14.06.2023 um 23:28 schrieb David Brown:
>
>>> In C you can alias anything as a char-array and vice versa,
>>> but not in C++.
>
>> <https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_aliasing>
>> <https://en.cppreference.com/w/cpp/types/byte>
>
> There never will be an upcoming CPU where CHAR_BIT is not eight.
> Even POSIX requires that.

To the nearest percent, 0% of all CPUs shipped are used in POSIX systems.

CPUs are made all the time that don't have 8-bit char. Just because you
have a limited view does not mean C, C++ or all other programmers do so.

Of course, none of that matters in the slightest here - nothing about
std::byte, type aliasing, or accessing via char types relies on char
being 8-bit.

>
>
>> Finally you have managed to check it, ...
>
> I had checked it long before your previous posting.
>

Either you are lying in an attempt to look less incompetent, or you are
incompetent, or you wrote a poorly considered "optimisation" that
doesn't work at all in a major use-case and were so proud of your
half-arsed solution that you hoped no one would notice.

>> No, it is still the wrong type regardless of the memory flatness.
>
> You're paranoid.

No, understanding the point of basic language types is not paranoia.
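The second cppreference link quoted above is for `std::byte` (C++17, `<cstddef>`), the type intended for raw storage rather than character data. A minimal sketch (not from the thread) of typical use; `checksum` is a hypothetical function. `std::byte` supports only bitwise operators, so getting a number out of it requires an explicit `std::to_integer`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: a simple additive checksum over raw storage.
// std::byte has no arithmetic operators, so each element is converted
// explicitly with std::to_integer before it is added.
inline std::uint32_t checksum(const std::byte* p, std::size_t n) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += std::to_integer<std::uint32_t>(p[i]);
    return sum;
}
```

Bitwise manipulation stays within the type, for example `b |= std::byte{0xF0};`, which keeps "this is raw memory" visible in the code in a way that `char` does not.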

Re: Strange optimization

https://www.novabbs.com/devel/article-flat.php?id=388&group=comp.lang.c%2B%2B#388

X-Received: by 2002:ac8:7f94:0:b0:3f9:aa64:7dbf with SMTP id z20-20020ac87f94000000b003f9aa647dbfmr1675139qtj.4.1686835711906;
Thu, 15 Jun 2023 06:28:31 -0700 (PDT)
X-Received: by 2002:a37:4541:0:b0:75e:c6ad:c98 with SMTP id
s62-20020a374541000000b0075ec6ad0c98mr1417447qka.13.1686835711697; Thu, 15
Jun 2023 06:28:31 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!newsfeed.hasname.com!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c++
Date: Thu, 15 Jun 2023 06:28:31 -0700 (PDT)
In-Reply-To: <u6cm5p$26u1$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=166.252.74.253; posting-account=Ix1u_AoAAAAILVQeRkP2ENwli-Uv6vO8
NNTP-Posting-Host: 166.252.74.253
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bpr3$3v0ra$1@dont-email.me> <u6c6vf$hh4$1@dont-email.me>
<u6cffm$1efd$1@dont-email.me> <u6cj8u$1sek$1@dont-email.me>
<3PkiM.61009$hl93.4930@fx18.iad> <u6cm5p$26u1$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6c624506-c15a-4e2d-b533-c2ce6ffcf27fn@googlegroups.com>
Subject: Re: Strange optimization
From: jameskuy...@alumni.caltech.edu (james...@alumni.caltech.edu)
Injection-Date: Thu, 15 Jun 2023 13:28:31 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2473
 by: james...@alumni.calt - Thu, 15 Jun 2023 13:28 UTC

On Wednesday, June 14, 2023 at 11:26:01 AM UTC-4, Bonita Montero wrote:
...
> In C you can alias anything as a char array and a char array as
> anything so it would be safe to alias anything as anything with
> double-casting (I guess that's correctly supported by the compilers).

C's anti-aliasing rules are asymmetric. They distinguish between the effective type of an object (which is the same as its declared type, if it has one) and the type of the lvalue used to access it. Aliasing an object with an effective type of uint64_t using an lvalue of character type is allowed by the following clause:

"An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
...
- a character type" (6.5p7)

Accessing an object with an effective type that is a character type (or an array thereof) using an lvalue with a type of uint64_t is not allowed by any of the cases listed in that paragraph unless they are members of the same union (or if uint64_t is a character type, which is pretty unlikely, but permitted).
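The asymmetry described above can be shown in a few lines of C++, whose access list in [basic.lval] has the same shape: char, unsigned char and (since C++17) std::byte may be used to examine any object, but not the reverse. A minimal sketch (not from the thread); `nonzero_bytes` and `load_from_char_array` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Permitted direction: inspect a uint64_t object through unsigned char
// lvalues (the "character type" case). Counts the non-zero bytes.
inline unsigned nonzero_bytes(const std::uint64_t& v) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    unsigned count = 0;
    for (std::size_t i = 0; i < sizeof v; ++i)
        if (p[i] != 0) ++count;
    return count;
}

// Not permitted: *reinterpret_cast<const std::uint64_t*>(a) on a char array.
// Copying the bytes into a genuine uint64_t object is the portable route.
inline std::uint64_t load_from_char_array(const char (&a)[sizeof(std::uint64_t)]) {
    std::uint64_t v;
    std::memcpy(&v, a, sizeof v);
    return v;
}
```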

Re: Strange optimization

https://www.novabbs.com/devel/article-flat.php?id=389&group=comp.lang.c%2B%2B#389

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Bonita.M...@gmail.com (Bonita Montero)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Thu, 15 Jun 2023 15:35:48 +0200
Organization: A noiseless patient Spider
Lines: 49
Message-ID: <u6f43i$e06o$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bpr3$3v0ra$1@dont-email.me> <u6c6vf$hh4$1@dont-email.me>
<u6cffm$1efd$1@dont-email.me> <u6cj8u$1sek$1@dont-email.me>
<u6cjrm$1ts6$1@dont-email.me> <u6dbee$4cof$1@dont-email.me>
<u6enq9$cirl$1@dont-email.me> <u6f2b1$dp2e$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 15 Jun 2023 13:35:46 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="959c7d2278d53cc6c55fcb40db86c57a";
logging-data="458968"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+bMj3zE8v1hqFBBedaTl6zdMD9iWCutyQ="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:oUZ5RLFQFHDpR3xXYolbkLF5V1s=
Content-Language: de-DE
In-Reply-To: <u6f2b1$dp2e$1@dont-email.me>
 by: Bonita Montero - Thu, 15 Jun 2023 13:35 UTC

Am 15.06.2023 um 15:05 schrieb David Brown:
> On 15/06/2023 12:06, Bonita Montero wrote:
>> Am 14.06.2023 um 23:28 schrieb David Brown:
>>
>>>> In C you can alias anything as a char-array and vice versa,
>>>> but not in C++.
>>
>>> <https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_aliasing>
>>> <https://en.cppreference.com/w/cpp/types/byte>
>>
>> There never will be an upcoming CPU where CHAR_BIT is not eight.
>> Even POSIX requires that.
>
> To the nearest percent, 0% of all CPUs shipped are used in POSIX systems.
>
> CPUs are made all the time that don't have 8-bit char. Just because you
> have a limited view does not mean C, C++ or all other programmers do so.

CPUs with CHAR_BIT != 8 are rare and there won't be any more
in the future.

> Of course, none of that matters in the slightest here - nothing about
> std::byte, type aliasing, or accessing via char types relies on char
> being 8-bit.
>
>>
>>
>>> Finally you have managed to check it, ...
>>
>> I had checked it long before your previous posting.
>>
>
> Either you are lying in an attempt to look less incompetent, ...

I checked it, you didn't.

> incompetent, or you wrote a poorly considered "optimisation" that
> doesn't work at all in a major use-case and were so proud of your
> half-arsed solution that you hoped no one would notice.
>
>>> No, it is still the wrong type regardless of the memory flatness.
>>
>> You're paranoid.
>
> No, understanding the point of basic language types is not paranoia.

I would never run into problems with that.
Your opinion is compulsive.

Re: Strange optimization

https://www.novabbs.com/devel/article-flat.php?id=390&group=comp.lang.c%2B%2B#390

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Bonita.M...@gmail.com (Bonita Montero)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Thu, 15 Jun 2023 15:42:49 +0200
Organization: A noiseless patient Spider
Lines: 21
Message-ID: <u6f4go$e14c$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bpr3$3v0ra$1@dont-email.me> <u6c6vf$hh4$1@dont-email.me>
<u6cffm$1efd$1@dont-email.me> <u6cj8u$1sek$1@dont-email.me>
<3PkiM.61009$hl93.4930@fx18.iad> <u6cm5p$26u1$1@dont-email.me>
<6c624506-c15a-4e2d-b533-c2ce6ffcf27fn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 15 Jun 2023 13:42:48 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="959c7d2278d53cc6c55fcb40db86c57a";
logging-data="459916"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18VKI+A1+sHOsX3uNHY3L4EGzQ+IdZO8L0="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:+3FyAi1UNCx89pw+t6FxMxWwxrc=
In-Reply-To: <6c624506-c15a-4e2d-b533-c2ce6ffcf27fn@googlegroups.com>
Content-Language: de-DE
 by: Bonita Montero - Thu, 15 Jun 2023 13:42 UTC

Am 15.06.2023 um 15:28 schrieb james...@alumni.caltech.edu:
> On Wednesday, June 14, 2023 at 11:26:01 AM UTC-4, Bonita Montero wrote:
> ...
>> In C you can alias anything as a char array and a char array as
>> anything so it would be safe to alias anything as anything with
>> double-casting (I guess that's correctly supported by the compilers).
>
> C's anti-aliasing rules are asymmetric. They distinguish between the effective type of an object (which is the same as its declared type, if it has one) and the type of the lvalue used to access it. Aliasing an object with an effective type of uint64_t using an lvalue of character type is allowed by the following clause:
>
> "An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
> ...
> - a character type" (6.5p7)
>
> Accessing an object with an effective type that is a character type (or an array thereof) using an lvalue with a type of uint64_t is not allowed by any of the cases listed in that paragraph unless they are members of the same union (or if uint64_t is a character type, which is pretty unlikely, but permitted).

In C you can alias anything as a char-array and a char-array as
anything. And you can alias signed, plain (char) or unsigned
entities as their counterparts. That's all.
Since aliasing with a union is very common, all compilers support
that, although there's no guarantee from the standard for that.
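For C++ specifically, reading a union member other than the one last written is undefined behavior; GCC documents it as permitted when the access goes through the union type, but that is an extension. The portable alternatives are `std::memcpy` or, in C++20, `std::bit_cast` from `<bit>`. A minimal sketch (not from the thread) using `memcpy`, so it also builds as C++17; `double_bits` is a hypothetical name, and the expected values assume IEEE 754 binary64 `double`.

```cpp
#include <cstdint>
#include <cstring>

// Portable type punning: copy the object representation of a double into a
// uint64_t. In C++20 this is equivalent to std::bit_cast<std::uint64_t>(d).
inline std::uint64_t double_bits(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "assumes a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Not portable C++ (undefined behavior, though GCC documents it as allowed):
//     union { double d; std::uint64_t u; } pun;
//     pun.d = 1.0;
//     std::uint64_t bits = pun.u;
```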

Re: Strange optimization

https://www.novabbs.com/devel/article-flat.php?id=391&group=comp.lang.c%2B%2B#391

X-Received: by 2002:a37:9a13:0:b0:759:184a:de49 with SMTP id c19-20020a379a13000000b00759184ade49mr1534965qke.11.1686839921296;
Thu, 15 Jun 2023 07:38:41 -0700 (PDT)
X-Received: by 2002:a05:622a:413:b0:3f8:6bf6:73fb with SMTP id
n19-20020a05622a041300b003f86bf673fbmr1637811qtx.8.1686839920962; Thu, 15 Jun
2023 07:38:40 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c++
Date: Thu, 15 Jun 2023 07:38:40 -0700 (PDT)
In-Reply-To: <u6f4go$e14c$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=174.192.200.114; posting-account=Ix1u_AoAAAAILVQeRkP2ENwli-Uv6vO8
NNTP-Posting-Host: 174.192.200.114
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bpr3$3v0ra$1@dont-email.me> <u6c6vf$hh4$1@dont-email.me>
<u6cffm$1efd$1@dont-email.me> <u6cj8u$1sek$1@dont-email.me>
<3PkiM.61009$hl93.4930@fx18.iad> <u6cm5p$26u1$1@dont-email.me>
<6c624506-c15a-4e2d-b533-c2ce6ffcf27fn@googlegroups.com> <u6f4go$e14c$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5cb3bf03-8afb-4bda-9722-ba584007b1e7n@googlegroups.com>
Subject: Re: Strange optimization
From: jameskuy...@alumni.caltech.edu (james...@alumni.caltech.edu)
Injection-Date: Thu, 15 Jun 2023 14:38:41 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 4876
 by: james...@alumni.calt - Thu, 15 Jun 2023 14:38 UTC

On Thursday, June 15, 2023 at 9:43:04 AM UTC-4, Bonita Montero wrote:
> Am 15.06.2023 um 15:28 schrieb james...@alumni.caltech.edu:
> > On Wednesday, June 14, 2023 at 11:26:01 AM UTC-4, Bonita Montero wrote:
> > ...
> >> In C you can alias anything as a char array and a char array as
> >> anything so it would be safe to alias anything as anything with
> >> double-casting (I guess that's correctly supported by the compilers).
> >
> > C's anti-aliasing rules are asymmetric. They distinguish between the effective type of an object (which is the same as its declared type, if it has one) and the type of the lvalue used to access it. Aliasing an object with an effective type of uint64_t using an lvalue of character type is allowed by the following clause:
> >
> > "An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
> > ...
> > - a character type" (6.5p7)
> >
> > Accessing an object with an effective type that is a character type (or an array thereof) using an lvalue with a type of uint64_t is not allowed by any of the cases listed in that paragraph unless they are members of the same union (or if uint64_t is a character type, which is pretty unlikely, but permitted).
> In C you can alias anything as a char-array and a char-array as
> anything. And you can alias signed, plain (char) or unsigned
> entities as their counterparts. That's all.

Citation, please?
6.5p7 is a complete and exhaustive list of the situations where an object can be accessed with defined behavior using an lvalue with a type that is different from the effective type of that object. It starts with a "shall", so violations have undefined behavior. Please identify which item on that list covers the case where the lvalue is uint64_t and the effective type is an array of char.

> Since aliasing with a union is very common, all compilers support
> that, although there's no guarantee from the standard for that.

There's a footnote in the C standard which says "If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning)."

Footnotes are non-normative. They are not supposed to contain the sole specification of some aspect of the language. They're only supposed to explain something that could be derived from the normative text of the standard. I don't believe that is the case for this footnote. I've discussed this issue with a couple of people, one of them a member of the committee, who disagreed. Neither of them was able to present an argument laying out that derivation, so I believe that you are technically correct.

However, what that footnote describes is the intent of the committee, the expectation that almost all users of C have had since unions were first introduced, and the way essentially all real-world implementors have implemented them. Therefore, I would recommend treating that footnote as if it were normative, until such time as the standard is corrected to say the same thing in normative text.

Re: Strange optimization

https://www.novabbs.com/devel/article-flat.php?id=394&group=comp.lang.c%2B%2B#394

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Thu, 15 Jun 2023 20:35:27 +0200
Organization: A noiseless patient Spider
Lines: 63
Message-ID: <u6fllf$g7jm$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bpr3$3v0ra$1@dont-email.me> <u6c6vf$hh4$1@dont-email.me>
<u6cffm$1efd$1@dont-email.me> <u6cj8u$1sek$1@dont-email.me>
<u6cjrm$1ts6$1@dont-email.me> <u6dbee$4cof$1@dont-email.me>
<u6enq9$cirl$1@dont-email.me> <u6f2b1$dp2e$1@dont-email.me>
<u6f43i$e06o$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 15 Jun 2023 18:35:28 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="6076497e32e11112fd548c4f9a39d5a8";
logging-data="532086"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/RimCmRmCk8U3Zx5DMuKRoqf++2w31v0U="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.7.1
Cancel-Lock: sha1:VTXkjXcKWeXPNlER5k/Jke6l/GI=
In-Reply-To: <u6f43i$e06o$1@dont-email.me>
Content-Language: en-GB
 by: David Brown - Thu, 15 Jun 2023 18:35 UTC

On 15/06/2023 15:35, Bonita Montero wrote:
> Am 15.06.2023 um 15:05 schrieb David Brown:
>> On 15/06/2023 12:06, Bonita Montero wrote:
>>> Am 14.06.2023 um 23:28 schrieb David Brown:
>>>
>>>>> In C you can alias anything as a char-array and vice versa,
>>>>> but not in C++.
>>>
>>>> <https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_aliasing>
>>>> <https://en.cppreference.com/w/cpp/types/byte>
>>>
>>> There never will be an upcoming CPU where CHAR_BIT is not eight.
>>> Even POSIX requires that.
>>
>> To the nearest percent, 0% of all CPUs shipped are used in POSIX systems.
>>
>> CPUs are made all the time that don't have 8-bit char. Just because
>> you have a limited view does not mean C, C++ or all other programmers
>> do so.
>
> CPUs with CHAR_BIT != 8 are rare and there won't be any more
> in the future.

You do understand that simply repeating something does not make it true?
Processors with char greater than 8 bits are niche, but certainly not
rare in numbers of devices delivered. I suppose it's fair to assume
that /you/ will never be programming any.

>
>> Of course, none of that matters in the slightest here - nothing about
>> std::byte, type aliasing, or accessing via char types relies on char
>> being 8-bit.
>>
>>>
>>>
>>>  > Finally you have managed to check it, ...
>>>
>>> I checked it long before your earlier posting.
>>>
>>
>> Either you are lying in an attempt to look less incompetent, ...
>
> I checked it, you didn't.
>
>> incompetent, or you wrote a poorly considered "optimisation" that
>> doesn't work at all in a major use-case and were so proud of your
>> half-arsed solution that you hoped no one would notice.
>>
>>>> No, it is still the wrong type regardless of the memory flatness.
>>>
>>> You're paranoid.
>>
>> No, understanding the point of basic language types is not paranoia.
>
> I would never run into problems with that.
> Your opinion is compulsive.
>

"Compulsive" is not nearly as inaccurate as "paranoid" - I do prefer to
try to be accurate in my coding. If there is a type that fits a
particular usage, I'll use that rather than one that just happens to work.
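To illustrate the point quoted above (that byte-wise access through `char` or `std::byte` does not depend on `char` being 8 bits), here is a minimal C++17 sketch. The helper `object_bytes` is hypothetical, not from the thread. It copies an object's representation without assuming any particular byte width:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns a copy of the object representation of
// `value`, one std::byte per byte. Reading any object through std::byte
// (or unsigned char) is allowed by the type-aliasing rules, and the length
// comes from sizeof, so nothing here assumes CHAR_BIT == 8.
template <typename T>
std::vector<std::byte> object_bytes(const T& value) {
    const auto* first = reinterpret_cast<const std::byte*>(&value);
    return std::vector<std::byte>(first, first + sizeof(T));
}
```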

Re: Strange optimization

<u6fni9$gfbu$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=396&group=comp.lang.c%2B%2B#396

From: Bonita.M...@gmail.com (Bonita Montero)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Thu, 15 Jun 2023 21:07:55 +0200
Organization: A noiseless patient Spider
Lines: 18
Message-ID: <u6fni9$gfbu$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bpr3$3v0ra$1@dont-email.me> <u6c6vf$hh4$1@dont-email.me>
<u6cffm$1efd$1@dont-email.me> <u6cj8u$1sek$1@dont-email.me>
<u6cjrm$1ts6$1@dont-email.me> <u6dbee$4cof$1@dont-email.me>
<u6enq9$cirl$1@dont-email.me> <u6f2b1$dp2e$1@dont-email.me>
<u6f43i$e06o$1@dont-email.me> <u6fllf$g7jm$1@dont-email.me>
Injection-Date: Thu, 15 Jun 2023 19:07:53 -0000 (UTC)
In-Reply-To: <u6fllf$g7jm$1@dont-email.me>
 by: Bonita Montero - Thu, 15 Jun 2023 19:07 UTC

Am 15.06.2023 um 20:35 schrieb David Brown:

> Processors with char greater than 8 bits are niche, but certainly not
> rare in numbers of devices delivered.  I suppose it's fair to assume
> that /you/ will never be programming any.

Almost no programmer writes code for such CPUs. Yet you seem to think
everything must be portable to them, since you complain about that in
code whose purpose you don't understand. There is certainly no C++
compiler supporting C++20 for systems where CHAR_BIT is not eight.

> "Compulsive" is not nearly as inaccurate as "paranoid" - I do prefer
> to try to be accurate in my coding. ...

... if necessary. If it is not necessary, it's just compulsiveness.
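If code really does rely on 8-bit bytes, one low-cost hedge is to state that assumption in the source, so that a port to an unusual target fails at compile time rather than misbehaving at run time. A minimal sketch (the constant `u64_bytes` is a hypothetical example, not from the thread):

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Make the 8-bit-byte assumption explicit: on a target where it does not
// hold, compilation stops here instead of the code misbehaving later.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

// Hypothetical constant that code relying on the assumption might use:
// the number of bytes in a uint64_t, which is 8 only when CHAR_BIT == 8.
constexpr std::size_t u64_bytes = sizeof(std::uint64_t);
static_assert(u64_bytes == 8, "follows from CHAR_BIT == 8");
```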

Re: Strange optimization

<u6h0j5$olvu$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=402&group=comp.lang.c%2B%2B#402

From: alf.p.st...@gmail.com (Alf P. Steinbach)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Fri, 16 Jun 2023 08:48:04 +0200
Organization: A noiseless patient Spider
Lines: 57
Message-ID: <u6h0j5$olvu$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6desf$4oa4$1@dont-email.me>
Injection-Date: Fri, 16 Jun 2023 06:48:05 -0000 (UTC)
In-Reply-To: <u6desf$4oa4$1@dont-email.me>
 by: Alf P. Steinbach - Fri, 16 Jun 2023 06:48 UTC

On 2023-06-15 12:27 AM, James Kuyper wrote:
> On 6/14/23 15:46, Alf P. Steinbach wrote:
>> On 2023-06-14 7:56 AM, Bonita Montero wrote:
>>> Am 14.06.2023 um 07:52 schrieb Alf P. Steinbach:
>>>
>>>> It's copying a `uint64_t` that is known to be correctly aliased, to
>>>> a `uint64_t`; that's nonsense.
>>>
>>> The reference initially supplied by the caller is cast from a char
>>> array. memcpy() is the only legal way in C++ to alias that content.
> ...
>> So for the separately compiled function it does not matter technically,
>> except possibly for performance, whether it uses clear, concise, safe
>> and guaranteed max efficient `=`, or verbose and unsafe `memcpy`.
>>
>> That means that regarding this matter the common interpretation of the
>> standard is not technical but instead specifies a formal UB that can't
>> happen unless one informs a really perverse compiler that it's there.
>
> All you need is a platform where misaligned pointers do not merely cause
> the code to be inefficient, but to actually malfunction. On such a
> platform, if p_bytes is not correctly aligned to store a uint64_t, then
> the code will malfunction in the reinterpret_cast<>.

Yes, but irrelevant for the case discussed, because the values are
guaranteed correctly aligned.

[snip]
> If p_bytes is correctly aligned, simple assignment will work just as
> well as memcpy().

Yes.

>> Even g++'s documentation of `-fstrict-aliasing` says "A character type
>> may alias any other type.". ...
>
> True, but that's not what this reinterpret_cast does;

It is what this `reinterpret_cast` does.

> it aliases a
> character type with uint64_t, and that's a problem if p_bytes is not
> correctly aligned to hold a uint64_t.

It is correctly aligned.

> The anti-aliasing rules are not symmetric.

That's a tangential issue, and best discussed in a new thread.

- Alf
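A sketch of the asymmetry James Kuyper describes, under the common reading of the standard (function names are hypothetical, C++17 assumed). Reading a `uint64_t` through `unsigned char` is allowed; reading a `char` buffer as a `uint64_t` is not made valid by a cast, and on strict-alignment platforms would additionally need suitable alignment. Copying the bytes sidesteps both issues:

```cpp
#include <cstdint>
#include <cstring>

// Allowed direction: any object may be inspected through unsigned char.
unsigned char first_byte(const std::uint64_t& v) {
    return *reinterpret_cast<const unsigned char*>(&v);
}

// Alignment check for uint64_t. Converting a pointer to uintptr_t is
// implementation-defined, but this works on flat-memory targets.
bool aligned_for_u64(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint64_t) == 0;
}

// Other direction: a char buffer holds no uint64_t object, so under the
// usual reading of the standard *reinterpret_cast<const std::uint64_t*>(buf)
// is undefined behaviour even when buf is suitably aligned. Copying the
// bytes is well defined and has no alignment requirement at all.
std::uint64_t load_u64(const char* buf) {
    std::uint64_t v;
    std::memcpy(&v, buf, sizeof v);
    return v;
}
```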

Re: Strange optimization

<u6h18t$op4b$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=403&group=comp.lang.c%2B%2B#403

From: alf.p.st...@gmail.com (Alf P. Steinbach)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Fri, 16 Jun 2023 08:59:39 +0200
Organization: A noiseless patient Spider
Lines: 55
Message-ID: <u6h18t$op4b$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6egpk$bn1q$2@dont-email.me>
Injection-Date: Fri, 16 Jun 2023 06:59:41 -0000 (UTC)
In-Reply-To: <u6egpk$bn1q$2@dont-email.me>
 by: Alf P. Steinbach - Fri, 16 Jun 2023 06:59 UTC

On 2023-06-15 10:06 AM, David Brown wrote:
> On 14/06/2023 21:46, Alf P. Steinbach wrote:
>> On 2023-06-14 7:56 AM, Bonita Montero wrote:
>>> Am 14.06.2023 um 07:52 schrieb Alf P. Steinbach:
>>>
>>>> It's copying a `uint64_t` that is known to be correctly aliased, to
>>>> a `uint64_t`; that's nonsense.
>>>
>>> The reference initially supplied by the caller is cast from a char
>>> array. memcpy() is the only legal way in C++ to alias that content.
>>
>> When that function is separately compiled in a different translation
>> unit, how is the compiler to know when compiling calling code that the
>> function uses `memcpy` internally,
>>
>> and when compiling the function, how is the compiler to know that it's
>> generally called with a `*reinterpret_cast<uint64_t*>( p_bytes )` as
>> argument?
>>
>> Answer: it doesn't know, in either case.
>>
>> So generally (let's disregard global optimization with link time
>> compilation) `memcpy` versus `=` can't affect the outcome.
>>
>> So for the separately compiled function it does not matter
>> technically, except possibly for performance, whether it uses clear,
>> concise, safe and guaranteed max efficient `=`, or verbose and unsafe
>> `memcpy`.
>>
>
> Code that relies on limited optimisation or separate compilation for
> correct behaviour, is an extremely bad idea - it is fragile and a hidden
> bug waiting to explode in the future.  Shortcuts now will cost dearly
> later on.  Take pride in your work, and code responsibly - do what you
> can to make your code /correct/, rather than relying on weak tools!

Repeatedly copying lots of data instead of reinterpreting is inefficient
and awkward, and a source of bugs in itself.

`memcpy` is not the safest tool around. It's rather the opposite,
something to avoid /if possible/. `memcpy` is the "weak tool" here.

The problem is not the standard, which simply lacks wording for what was
obviously intended (wording like that in the g++ documentation), and which,
at least in C++03, was a bit inconsistent on this issue (e.g. the separate
rule about the address of the first member of a POD struct was not part of
the allegedly exhaustive strict aliasing list). The problem is, clearly,
the C++ standardization committee and its interpretation.

Once you realize that, it should not be hard to code responsibly.

- Alf

Re: Strange optimization

<u6h82j$phr2$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=404&group=comp.lang.c%2B%2B#404

From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Fri, 16 Jun 2023 10:55:47 +0200
Organization: A noiseless patient Spider
Lines: 110
Message-ID: <u6h82j$phr2$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6egpk$bn1q$2@dont-email.me>
<u6h18t$op4b$1@dont-email.me>
Injection-Date: Fri, 16 Jun 2023 08:55:47 -0000 (UTC)
In-Reply-To: <u6h18t$op4b$1@dont-email.me>
 by: David Brown - Fri, 16 Jun 2023 08:55 UTC

On 16/06/2023 08:59, Alf P. Steinbach wrote:
> On 2023-06-15 10:06 AM, David Brown wrote:
>> On 14/06/2023 21:46, Alf P. Steinbach wrote:
>>> On 2023-06-14 7:56 AM, Bonita Montero wrote:
>>>> Am 14.06.2023 um 07:52 schrieb Alf P. Steinbach:
>>>>
>>>>> It's copying a `uint64_t` that is known to be correctly aliased,
>>>>> to a `uint64_t`; that's nonsense.
>>>>
>>>> The reference initially supplied by the caller is cast from a char
>>>> array. memcpy() is the only legal way in C++ to alias that content.
>>>
>>> When that function is separately compiled in a different translation
>>> unit, how is the compiler to know when compiling calling code that
>>> the function uses `memcpy` internally,
>>>
>>> and when compiling the function, how is the compiler to know that
>>> it's generally called with a `*reinterpret_cast<uint64_t*>( p_bytes
>>> )` as argument?
>>>
>>> Answer: it doesn't know, in either case.
>>>
>>> So generally (let's disregard global optimization with link time
>>> compilation) `memcpy` versus `=` can't affect the outcome.
>>>
>>> So for the separately compiled function it does not matter
>>> technically, except possibly for performance, whether it uses clear,
>>> concise, safe and guaranteed max efficient `=`, or verbose and unsafe
>>> `memcpy`.
>>>
>>
>> Code that relies on limited optimisation or separate compilation for
>> correct behaviour, is an extremely bad idea - it is fragile and a
>> hidden bug waiting to explode in the future.  Shortcuts now will cost
>> dearly later on.  Take pride in your work, and code responsibly - do
>> what you can to make your code /correct/, rather than relying on weak
>> tools!
>
> Repeatedly copying lots of data instead of reinterpreting is inefficient
> and awkward, and a source of bugs in itself.

With a half-decent compiler, "copying" like this disappears in the
optimisation. But there's no disagreement that it is awkward and could
be a source of bugs (Bonita's half-way attempt at optimisation shows that).

Reinterpreting data as a different type from its real type is, in
general, undefined behaviour. Reinterpreting, such as by pointer casts
or reinterpret_cast<>, does not allow you to break the language's type
aliasing rules.

In C++20, there is an alternative to some related uses of memcpy() -
std::bit_cast<>. This is safer, because it hides messy details like the
size of the copy and fails to compile for inappropriate object types,
but in practice it is just a nice wrapper for a memcpy() with a little
"magic" to make it work as constexpr.

(If Bonita had been targeting C++20, presumably the code would have used
std::assume_aligned instead of a gcc/clang extension.)

>
> `memcpy` is not the safest tool around. It's rather the opposite,
> something to avoid /if possible/. `memcpy` is the "weak tool" here.
>

I don't know how the function here is supposed to be used. We know that
the pointer (it is syntactically a reference, but effectively a pointer)
is properly aligned for a uint64_t, but we don't know if it actually
points to a uint64_t. Perhaps it points to a double, or some other
64-bit type.

You are absolutely right that memcpy() is not a "safe" tool - used like
this, it is a way to bypass the normal safe typing of the language. But
it is a way to get around the rules, rather than break the rules. So if
you need to access the representation of one type as though it were a
different type, then memcpy (or equivalent use of std::byte or char
pointers) is the correct way to do it. So memcpy is not a /weak/ tool
here - it is the /best/ tool here. But you only use it if you need it.
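Where such a reinterpretation is genuinely needed before C++20, a common pattern is a small wrapper around `memcpy` that checks the preconditions at compile time. A C++17 sketch (the name `copy_representation` is hypothetical, not a standard function):

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical pre-C++20 helper: reinterpret the bytes of `from` as a `To`
// by copying them, rather than by casting pointers.
template <typename To, typename From>
To copy_representation(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable_v<To> &&
                      std::is_trivially_copyable_v<From>,
                  "both types must be trivially copyable");
    static_assert(std::is_default_constructible_v<To>,
                  "To is created first, then overwritten");
    To to;
    std::memcpy(&to, &from, sizeof to);
    return to;
}
```

With IEEE 754 `float`, for example, `copy_representation<std::uint32_t>(1.0f)` yields `0x3F800000`.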

The alternative you are suggesting - separately compiled functions - is
far less safe. It is a hidden bomb, waiting to cause trouble when later
compilers or flags combine code differently, or when someone moves the
function to a different part of the code. memcpy() is part of the
language, and is fully defined and documented behaviour, while separate
compilation is going outside the language and defined behaviour. It is
the most fragile solution, and /never/ one to be recommended.

> The problem is not the standard, which simply lacks wording for what was
> obviously intended (wording like that in the g++ documentation), and which,
> at least in C++03, was a bit inconsistent on this issue (e.g. the separate
> rule about the address of the first member of a POD struct was not part of
> the allegedly exhaustive strict aliasing list). The problem is, clearly,
> the C++ standardization committee and its interpretation.
>

I don't think anyone would accuse the C++ standards of being too clear,
but the behaviour of memcpy is well documented and well defined. Bonita
is thoroughly confused about how and when access by character pointers
is defined behaviour in C and C++, but knows that memcpy works correctly
here. It is usually a better choice to use memcpy() than roll-your-own
character pointer access code anyway.
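For comparison, a sketch of a hand-rolled character-pointer copy next to the `memcpy` one (both function names are hypothetical). The byte loop is defined behaviour, since `unsigned char` may be used to access any object, but it is more code to get right for the same result:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hand-written byte copy into a uint64_t object through unsigned char
// pointers. Defined behaviour, but longer and easier to get wrong.
std::uint64_t load_bytewise(const void* src) {
    std::uint64_t v;
    auto* dst = reinterpret_cast<unsigned char*>(&v);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < sizeof v; ++i) {
        dst[i] = s[i];
    }
    return v;
}

// The memcpy version: same result, one line of real work.
std::uint64_t load_memcpy(const void* src) {
    std::uint64_t v;
    std::memcpy(&v, src, sizeof v);
    return v;
}
```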

> Once you realize that, it should not be hard to code responsibly.
>

Yes - you do so by writing code that has defined behaviour that does
what you want. You don't do it by relying on luck of compilation details.

Re: Strange optimization

<u6hdrh$q807$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=405&group=comp.lang.c%2B%2B#405

From: alf.p.st...@gmail.com (Alf P. Steinbach)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Fri, 16 Jun 2023 12:34:23 +0200
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <u6hdrh$q807$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6egpk$bn1q$2@dont-email.me>
<u6h18t$op4b$1@dont-email.me> <u6h82j$phr2$1@dont-email.me>
Injection-Date: Fri, 16 Jun 2023 10:34:25 -0000 (UTC)
In-Reply-To: <u6h82j$phr2$1@dont-email.me>
 by: Alf P. Steinbach - Fri, 16 Jun 2023 10:34 UTC

On 2023-06-16 10:55 AM, David Brown wrote:
> On 16/06/2023 08:59, Alf P. Steinbach wrote:
>>
>> `memcpy` is not the safest tool around. It's rather the opposite,
>> something to avoid /if possible/. `memcpy` is the "weak tool" here.
>>
>
> I don't know how the function here is supposed to be used.  We know that
> the pointer (it is syntactically a reference, but effectively a pointer)
> is properly aligned for a uint64_t, but we don't know if it actually
> points to a uint64_t.  Perhaps it points to a double, or some other
> 64-bit type.

In the case you sketch where the bits do not represent a valid
`uint64_t`, the `memcpy` does not make the behavior well-defined: that's
a (dangerous) misconception.

I'm not sure if you're addressing only the formal here.

However, if you intended this to also apply to the in-practice, then the
burden of proof is on you that some computer exists where `uint64_t` has
bits that do not participate in the value representation, so that they
/can/ be invalid: as far as I know there is no such computer.

- Alf

Re: Strange optimization

<u6hgt2$qicu$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=406&group=comp.lang.c%2B%2B#406

From: Bonita.M...@gmail.com (Bonita Montero)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Fri, 16 Jun 2023 13:26:29 +0200
Organization: A noiseless patient Spider
Lines: 6
Message-ID: <u6hgt2$qicu$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6egpk$bn1q$2@dont-email.me>
<u6h18t$op4b$1@dont-email.me>
Injection-Date: Fri, 16 Jun 2023 11:26:26 -0000 (UTC)
In-Reply-To: <u6h18t$op4b$1@dont-email.me>
 by: Bonita Montero - Fri, 16 Jun 2023 11:26 UTC

Am 16.06.2023 um 08:59 schrieb Alf P. Steinbach:

> `memcpy` is not the safest tool around. It's rather the opposite,
> something to avoid /if possible/. `memcpy` is the "weak tool" here.

I'm aliasing, so memcpy() is the only tool here.

Re: Strange optimization

<u6hm4v$r4fl$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=407&group=comp.lang.c%2B%2B#407

From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Fri, 16 Jun 2023 14:55:59 +0200
Organization: A noiseless patient Spider
Lines: 83
Message-ID: <u6hm4v$r4fl$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6egpk$bn1q$2@dont-email.me>
<u6h18t$op4b$1@dont-email.me> <u6h82j$phr2$1@dont-email.me>
<u6hdrh$q807$1@dont-email.me>
Injection-Date: Fri, 16 Jun 2023 12:55:59 -0000 (UTC)
In-Reply-To: <u6hdrh$q807$1@dont-email.me>
 by: David Brown - Fri, 16 Jun 2023 12:55 UTC

On 16/06/2023 12:34, Alf P. Steinbach wrote:
> On 2023-06-16 10:55 AM, David Brown wrote:
>> On 16/06/2023 08:59, Alf P. Steinbach wrote:
>>>
>>> `memcpy` is not the safest tool around. It's rather the opposite,
>>> something to avoid /if possible/. `memcpy` is the "weak tool" here.
>>>
>>
>> I don't know how the function here is supposed to be used.  We know
>> that the pointer (it is syntactically a reference, but effectively a
>> pointer) is properly aligned for a uint64_t, but we don't know if it
>> actually points to a uint64_t.  Perhaps it points to a double, or some
>> other 64-bit type.
>
> In the case you sketch where the bits do not represent a valid
> `uint64_t`, the `memcpy` does not make the behavior well-defined: that's
> a (dangerous) misconception.

A "uint64_t" has a guaranteed fully-defined format and no padding bits.
All possible bit patterns for the type have well-defined behaviour.
(Hypothetically, that would not be true for "unsigned long long", which
could contain padding bits and could have trap representations.)
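The property claimed for `uint64_t` can at least be checked for a given implementation at compile time. A C++17 sketch (it tests the compiler in use; it is not a proof from the standard's wording): `std::numeric_limits<T>::digits` counts value bits, and `std::has_unique_object_representations_v` is only true when objects with equal values always have identical bytes, which excludes padding bits:

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// uint64_t has exactly 64 value bits...
static_assert(std::numeric_limits<std::uint64_t>::digits == 64);

// ...and its object representation is exactly 64 bits, so no padding.
static_assert(sizeof(std::uint64_t) * CHAR_BIT == 64);

// Equal values always have identical bytes.
static_assert(std::has_unique_object_representations_v<std::uint64_t>);

// unsigned long long is only required to have at least 64 value bits and
// may, in principle, have padding; this asks the current implementation.
constexpr bool ull_has_padding =
    sizeof(unsigned long long) * CHAR_BIT !=
    static_cast<std::size_t>(std::numeric_limits<unsigned long long>::digits);
```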

In case I am missing something, please tell me where you see any
possible dangerous or not fully defined behaviour even for the more
general case :

#include <string.h>
#include <stdint.h>

uint64_t read64bits(const void * p) {
    uint64_t x;
    memcpy((void*) &x, p, sizeof(uint64_t));
    return x;
}

We can assume that "p" points to data of some type of at least 64 bits
in size. I would like to hear of any potential issues in C or C++
(hence the cross-language code).
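A small C++ usage sketch of the same function (the buffer layout and `field_at_offset_1` are hypothetical). Because `memcpy` has no alignment requirement, `p` may even point into the middle of a byte buffer:

```cpp
#include <cstdint>
#include <cstring>

// C++ spelling of read64bits: std::memcpy accepts &x as void* implicitly,
// so no cast is needed.
std::uint64_t read64bits(const void* p) {
    std::uint64_t x;
    std::memcpy(&x, p, sizeof x);
    return x;
}

// Hypothetical usage: a 64-bit field stored at an odd offset inside a
// packed byte buffer. memcpy has no alignment requirement, so this is fine.
std::uint64_t field_at_offset_1(const unsigned char (&buf)[9]) {
    return read64bits(buf + 1);
}
```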

>
> I'm not sure if you're addressing only the formal here.

No. It is a general principle. Some people /do/ believe that "separate
compilation" creates magical barriers that limit a compiler's ability to
see the relationships between code sections, and therefore its ability
to "optimise using assumptions about defined and undefined behaviours",
and that this means some kinds of undefined behaviours become defined by
moving code to a different file or disabling optimisation. They are
wrong to believe this. They may manage to write code that works when
they test it, but it will be fragile - the code still has exactly the
same undefined behaviours, and these may manifest as bugs in the future.

>
> However, if you intended this to also apply to the in-practice, then the
> burden of proof is on you that some computer exists where `uint64_t` has
> bits that do not participate in the value representation, so that they
> /can/ be invalid: as far as I know there is no such computer.
>

No, not at all. It is only /you/ that has suggested, by claiming the
use of memcpy is not fully defined, that uint64_t may hypothetically
contain padding bits. (See earlier in my reply.) I know it can't, so
that is not the issue.

My argument is that the following code is /wrong/, even if the functions
are compiled in separate sources :

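#include <stdint.h>

uint64_t read64(const uint64_t * p) {
    return *p;
}

// Passes the address of a double as if it pointed to a uint64_t; the
// read in read64 is then undefined behaviour under the type-aliasing
// rules, wherever the two functions are compiled.
uint64_t reinterpret(double x) {
    return read64((const uint64_t *) &x);
}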

If these are placed in separate files and compiled separately, with
today's compilers, with no link-time or whole-program optimisation, then
the code will work as the programmer expected and get a bit
representation of the double (which we assume is 64-bit). But working
in a test does not make it /correct/ code, and certainly not /good/ code.
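For contrast, a sketch of the same two functions with defined behaviour, assuming a 64-bit `double` (checked by the `static_assert`). The only change is that `read64` takes `const void*` and copies the bytes instead of dereferencing a `uint64_t` pointer, so the result no longer depends on how the functions are compiled or combined:

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Reads 8 bytes of any object; memcpy raises no aliasing or alignment issue.
std::uint64_t read64(const void* p) {
    std::uint64_t x;
    std::memcpy(&x, p, sizeof x);
    return x;
}

// Defined behaviour regardless of inlining, LTO, or which file each
// function lives in: the double's bytes are copied, never accessed
// through a uint64_t lvalue.
std::uint64_t reinterpret(double x) {
    return read64(&x);
}
```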

Re: Strange optimization

<87ilbno5r4.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=408&group=comp.lang.c%2B%2B#408

From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Fri, 16 Jun 2023 15:12:47 +0100
Organization: A noiseless patient Spider
Lines: 52
Message-ID: <87ilbno5r4.fsf@bsb.me.uk>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6egpk$bn1q$2@dont-email.me>
<u6h18t$op4b$1@dont-email.me> <u6h82j$phr2$1@dont-email.me>
<u6hdrh$q807$1@dont-email.me> <u6hm4v$r4fl$1@dont-email.me>
 by: Ben Bacarisse - Fri, 16 Jun 2023 14:12 UTC

David Brown <david.brown@hesbynett.no> writes:

> On 16/06/2023 12:34, Alf P. Steinbach wrote:
>> On 2023-06-16 10:55 AM, David Brown wrote:
>>> On 16/06/2023 08:59, Alf P. Steinbach wrote:
>>>>
>>>> `memcpy` is not the safest tool around. It's rather the opposite,
>>>> something to avoid /if possible/. `memcpy` is the "weak tool" here.
>>>>
>>>
>>> I don't know how the function here is supposed to be used.  We know that
>>> the pointer (it is syntactically a reference, but effectively a pointer)
>>> is properly aligned for a uint64_t, but we don't know if it actually
>>> points to a uint64_t.  Perhaps it points to a double, or some other
>>> 64-bit type.
>> In the case you sketch where the bits do not represent a valid
>> `uint64_t`, the `memcpy` does not make the behavior well-defined: that's
>> a (dangerous) misconception.
>
> A "uint64_t" has a guaranteed fully-defined format and no padding bits. All
> possible bit patterns for the type have well-defined
> behaviour.

"Fully-defined format" says, to my mind, more than you wanted to say.
In particular the significance of the bits is not defined.

> (Hypothetically, that would not be true for "unsigned long
> long", which could contain padding bits and could have trap
> representations.)
>
> In case I am missing something, please tell me where you see any possible
> dangerous or not fully defined behaviour even for the more general case :
>
> #include <string.h>
> #include <stdint.h>
>
> uint64_t read64bits(const void * p) {
>     uint64_t x;
>     memcpy((void*) &x, p, sizeof(uint64_t));
>     return x;
> }

What's not "fully defined behaviour" is the return value's relationship
to the bytes pointed to by p. I think you are using "fully defined
behaviour" to mean "not undefined behaviour", but since the former is
not a technical term in the language standards, a reader might take more
from it than you intended.

Yes, this is something of a nit-pick, I know.

--
Ben.

Re: Strange optimization

<u6hutm$rvk8$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=409&group=comp.lang.c%2B%2B#409

From: alf.p.st...@gmail.com (Alf P. Steinbach)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Fri, 16 Jun 2023 17:25:40 +0200
Organization: A noiseless patient Spider
Lines: 123
Message-ID: <u6hutm$rvk8$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6egpk$bn1q$2@dont-email.me>
<u6h18t$op4b$1@dont-email.me> <u6h82j$phr2$1@dont-email.me>
<u6hdrh$q807$1@dont-email.me> <u6hm4v$r4fl$1@dont-email.me>
Injection-Date: Fri, 16 Jun 2023 15:25:42 -0000 (UTC)
In-Reply-To: <u6hm4v$r4fl$1@dont-email.me>
 by: Alf P. Steinbach - Fri, 16 Jun 2023 15:25 UTC

On 2023-06-16 2:55 PM, David Brown wrote:
> On 16/06/2023 12:34, Alf P. Steinbach wrote:
>> On 2023-06-16 10:55 AM, David Brown wrote:
>>> On 16/06/2023 08:59, Alf P. Steinbach wrote:
>>>>
>>>> `memcpy` is not the safest tool around. It's rather the opposite,
>>>> something to avoid /if possible/. `memcpy` is the "weak tool" here.
>>>>
>>> I don't know how the function here is supposed to be used.  We know
>>> that the pointer (it is syntactically a reference, but effectively a
>>> pointer) is properly aligned for a uint64_t, but we don't know if it
>>> actually points to a uint64_t.  Perhaps it points to a double, or
>>> some other 64-bit type.
>>
>> In the case you sketch where the bits do not represent a valid
>> `uint64_t`, the `memcpy` does not make the behavior well-defined:
>> that's a (dangerous) misconception.
>
> A "uint64_t" has a guaranteed fully-defined format and no padding bits.

Since you believe that, your comment about "Perhaps points to a double"
is meaningless nonsense.

You're arguing against yourself, with only three lines between your two
comments that are shooting dum-dum bullets at each other.

Make up your mind, please.

> All possible bit patterns for the type have well-defined behaviour.
> (Hypothetically, that would not be true for "unsigned long long", which
> could contain padding bits and could have trap representations.)

We could discuss this assertion, e.g. I could helpfully mention that in
C++ "these requirements do not hold for other types [than character
types]", but better that you waste time attempting to PROVE IT.

Chapter and verse, please.

Not that it matters for what I've written, but it matters for the silly
argument that you offered, quoted above, and that you now argue against,
plus, there is the thing about being Wrong on the internet, not to
mention Doubly Wrong: just on principle one should not let that pass.

> In case I am missing something, please tell me where you see any
> possible dangerous or not fully defined behaviour even for the more
> general case :
>
> #include <string.h>
> #include <stdint.h>
>
> uint64_t read64bits(const void * p) {
>     uint64_t x;
>     memcpy((void*) &x, p, sizeof(uint64_t));
>     return x;
> }

Now you're arguing against yourself again.

That means that whatever I respond, I can expect a random direction answer.

Anyway:

* The C style cast there is both unnecessary and dangerous, because it
can cast away const, and is difficult to grep, so it's ungood code.
* You're wrong about formally no padding bits for C++, so in principle
that function can produce an invalid `uint64_t` with a trap representation;
that's UB -- except that that's in principle, not in practice.
* If the pointer `p` is invalid, or is a null pointer, or doesn't point
to at least sizeof(uint64_t) contiguous bytes of readable memory, then
that's UB, and it's UB both in principle and in practice.
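Alf's first point can be illustrated with a minimal C++ sketch of the same function that drops the C-style cast. `&x` converts to `void*` implicitly, so the cast adds nothing. Without it, nothing can silently strip a `const`. The precondition is the one stated in the thread: `p` must point to at least `sizeof(std::uint64_t)` readable bytes. Because `memcpy` has no alignment requirement, the source may also be misaligned.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Same behaviour as read64bits above, without the C-style cast.
// &x converts to void* implicitly, and p stays const void*, so const
// cannot be cast away by accident.
// Precondition: p points to at least sizeof(std::uint64_t) readable bytes.
inline std::uint64_t read64bits(const void* p)
{
    std::uint64_t x;
    std::memcpy(&x, p, sizeof x);
    return x;
}
```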

If you had been a certain other old timer in this group, then it would
also be relevant that there's possible UB due to stack overflow.

Which by his (lack of) logic leads to the conclusion that all C++
programs have UB.

> We can assume that "p" points to data of some type of at least 64 bits
> in size.  I would like to hear of any potential issues in C or C++
> (hence the cross-language code).

Oh, cross language, sorry.

For the case of C /I believe/ that there's a formal guarantee of no
padding bits, so then only the last point above matters wrt. UB.

>> I'm not sure if you're addressing only the formal here.
>
> No.  It is a general principle.  Some people /do/ believe that "separate
> compilation" creates magical barriers that limit a compiler's ability to
> see the relationships between code sections, and therefore its ability
> to "optimise using assumptions about defined and undefined behaviours",
> and that this means some kinds of undefined behaviours become defined by
> moving code to a different file or disabling optimisation.  They are
> wrong to believe this.  They may manage to write code that works when
> they test it, but it will be fragile - the code still has exactly the
> same undefined behaviours, and these may manifest as bugs in the future.
>
>>
>> However, if you intended this to also apply to the in-practice, then
>> the burden of proof is on you that some computer exists where
>> `uint64_t` has bits that do not participate in the value
>> representation, so that they /can/ be invalid: as far as I know there
>> is no such computer.
>>
>
> No, not at all.  It is only /you/ that has suggested, by claiming the
> use of memcpy is not fully defined, that uint64_t may hypothetically
> contain padding bits.  (See earlier in my reply.)  I know it can't, so
> that is not the issue.

Now you're claiming that your argument was my argument. Jeez.

[snip]

- Alf

Re: Strange optimization

<u6i14i$s9ke$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=410&group=comp.lang.c%2B%2B#410

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Fri, 16 Jun 2023 18:03:29 +0200
Organization: A noiseless patient Spider
Lines: 75
Message-ID: <u6i14i$s9ke$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6egpk$bn1q$2@dont-email.me>
<u6h18t$op4b$1@dont-email.me> <u6h82j$phr2$1@dont-email.me>
<u6hdrh$q807$1@dont-email.me> <u6hm4v$r4fl$1@dont-email.me>
<87ilbno5r4.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 16 Jun 2023 16:03:30 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="fa4eed4594fafe51d746a65a9db235c9";
logging-data="927374"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX187ILX+2hboXV184BCw0xHe9PwYJoS+DXg="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.9.0
Cancel-Lock: sha1:YqMbsbQYd/AZWIj5V2Naavf/gmg=
Content-Language: en-GB
In-Reply-To: <87ilbno5r4.fsf@bsb.me.uk>
 by: David Brown - Fri, 16 Jun 2023 16:03 UTC

On 16/06/2023 16:12, Ben Bacarisse wrote:
> David Brown <david.brown@hesbynett.no> writes:
>
>> On 16/06/2023 12:34, Alf P. Steinbach wrote:
>>> On 2023-06-16 10:55 AM, David Brown wrote:
>>>> On 16/06/2023 08:59, Alf P. Steinbach wrote:
>>>>>
>>>>> `memcpy` is not the safest tool around. It's rather the opposite,
>>>>> something to avoid /if possible/. `memcpy` is the "weak tool" here.
>>>>>
>>>>
>>>> I don't know how the function here is supposed to be used.  We know that
>>>> the pointer (it is syntactically a reference, but effectively a pointer)
>>>> is properly aligned for a uint64_t, but we don't know if it actually
>>>> points to a uint64_t.  Perhaps it points to a double, or some other
>>>> 64-bit type.
>>> In the case you sketch where the bits do not represent a valid
>>> `uint64_t`, the `memcpy` does not make the behavior well-defined: that's
>>> a (dangerous) misconception.
>>
>> A "uint64_t" has a guaranteed fully-defined format and no padding bits. All
>> possible bit patterns for the type have well-defined
>> behaviour.
>
> "Fully-defined format" says, to my mind, more than you wanted to say.
> In particular the significance of the bits is not defined.

The order of the bits is implementation-defined, and the bits are
required to represent different powers of two from 0 to 63. There can
be no padding bits.

I was using "fully defined" to mean "defined by the standard or
implementation defined" - i.e., something documented that you could rely
upon. (I should have made that clear.)

It is certainly allowed for an implementation to have different
endianness for different types - so even if "double" uses the IEEE
formats, storing a value as a double and reading it as a uint64_t may
have different results on different platforms. I suspect such
inconsistently ordered implementations would be quite rare, however -
usually the bit ordering is either clearly little-endian or clearly
big-endian.
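The double-to-uint64_t case David describes can be shown with a small sketch. It assumes a 64-bit IEEE 754 `double`, and the helper name `double_bits` is hypothetical. On the usual platforms, where `double` and `uint64_t` share a byte order, 1.0 comes out as 0x3FF0000000000000. As noted above, the standards do not promise that.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: copy the object representation of a double into a
// uint64_t. The copy is well defined; the numeric result depends on the
// platform's double format and on whether double and uint64_t share a
// byte order.
inline std::uint64_t double_bits(double d)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "assumes a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}
```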

>
>> (Hypothetically, that would not be true for "unsigned long
>> long", which could contain padding bits and could have trap
>> representations.)
>>
>> In case I am missing something, please tell me where you see any possible
>> dangerous or not fully defined behaviour even for the more general case :
>>
>> #include <string.h>
>> #include <stdint.h>
>>
>> uint64_t read64bits(const void * p) {
>>     uint64_t x;
>>     memcpy((void*) &x, p, sizeof(uint64_t));
>>     return x;
>> }
>
> What's not "fully defined behaviour" is the return value's relationship
> to the bytes pointed to by p.
> I think you are using "fully defined
> behaviour" to mean "not undefined behaviour", but since the former is
> not a technical term in the language standards, a reader might take more
> from it than you intended.
>
> Yes, this is something of a nit-pick, I know.
>

I am interested in nit-picks, and glad to hear of any you find.

If I had written "standards-defined and/or implementation defined
behaviour" instead of "fully defined behaviour", would that be sufficient?
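Ben's point can be made concrete. In this hedged sketch, the function name `first_byte_one` is hypothetical. The same eight bytes are copied into a `uint64_t`. The copy is well defined, but the value that results is left to the implementation's byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy the bytes {1, 0, 0, 0, 0, 0, 0, 0} into a uint64_t. The standards
// do not fix which value this produces.
inline std::uint64_t first_byte_one()
{
    const unsigned char bytes[8] = {1, 0, 0, 0, 0, 0, 0, 0};
    std::uint64_t x;
    std::memcpy(&x, bytes, sizeof x);
    return x;   // 1 on little-endian hosts, 1 << 56 on big-endian hosts
}
```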

Re: Strange optimization

<u6i2rp$sgu3$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=411&group=comp.lang.c%2B%2B#411

Path: i2pn2.org!i2pn.org!paganini.bofh.team!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Fri, 16 Jun 2023 18:32:56 +0200
Organization: A noiseless patient Spider
Lines: 196
Message-ID: <u6i2rp$sgu3$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6egpk$bn1q$2@dont-email.me>
<u6h18t$op4b$1@dont-email.me> <u6h82j$phr2$1@dont-email.me>
<u6hdrh$q807$1@dont-email.me> <u6hm4v$r4fl$1@dont-email.me>
<u6hutm$rvk8$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 16 Jun 2023 16:32:57 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="fa4eed4594fafe51d746a65a9db235c9";
logging-data="934851"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/LWznhBu9lI6AQOqjjhP7/GkpNWOM0r6E="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.9.0
Cancel-Lock: sha1:lIfsNmRigPMBMTmaDHlzr3JJVN0=
Content-Language: en-GB
In-Reply-To: <u6hutm$rvk8$1@dont-email.me>
 by: David Brown - Fri, 16 Jun 2023 16:32 UTC

On 16/06/2023 17:25, Alf P. Steinbach wrote:
> On 2023-06-16 2:55 PM, David Brown wrote:
>> On 16/06/2023 12:34, Alf P. Steinbach wrote:
>>> On 2023-06-16 10:55 AM, David Brown wrote:
>>>> On 16/06/2023 08:59, Alf P. Steinbach wrote:
>>>>>
>>>>> `memcpy` is not the safest tool around. It's rather the opposite,
>>>>> something to avoid /if possible/. `memcpy` is the "weak tool" here.
>>>>>
>>>> I don't know how the function here is supposed to be used.  We know
>>>> that the pointer (it is syntactically a reference, but effectively a
>>>> pointer) is properly aligned for a uint64_t, but we don't know if it
>>>> actually points to a uint64_t.  Perhaps it points to a double, or
>>>> some other 64-bit type.
>>>
>>> In the case you sketch where the bits do not represent a valid
>>> `uint64_t`, the `memcpy` does not make the behavior well-defined:
>>> that's a (dangerous) misconception.
>>
>> A "uint64_t" has a guaranteed fully-defined format and no padding bits.
>
> Since you believe that,

Ref. C standards 7.20.1.1p2, 6.2.6.1p2, 6.2.6.2p1.

I believe the C++ standards inherit this from C, and the C standards are
easier to reference (IMHO) because the numbering stays consistent
between versions.

> your comment about "Perhaps points to a double"
> is meaningless nonsense.

I'm sorry, I have no idea what you mean by that.

>
> You're arguing against yourself, with only three lines between your two
> comments that are shooting dum-dum bullets at each other.
>
> Make up your mind, please.

Again, I have no idea what you mean.

Perhaps you don't remember what you wrote yourself?

>
>
>> All possible bit patterns for the type have well-defined behaviour.
>> (Hypothetically, that would not be true for "unsigned long long",
>> which could contain padding bits and could have trap
>> representations.)
>
> We could discuss this assertion, e.g. I could helpfully mention that in
> C++ "these requirements do not hold for other types [than character
> types]", but better that you waste time attempting to PROVE IT.
>
> Chapter and verse, please.
>

The C++ standards change their numbering regularly. These numbers are
from the C++20 draft N4860 - I don't believe the contents change much
between versions.

6.8.1p3 .. p5 describes the representation of unsigned types as
collections of bits representing powers of 2. For unsigned integer
types in general, there may be padding bits, but there are no padding
bits in char, signed char, unsigned char and char8_t. (C defines the
bit order for unsigned char, but as far as I can see C++ does not.)

17.4.1 specifies <cstdint>, and in 17.4.1p2 this refers back to the C
standards section 7.20 where the size-specific integer types are defined
to have no padding bits.
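One way to turn the "no padding bits" claim into something the compiler checks is a pair of `static_assert`s. This is a sketch, not a proof. `has_no_padding_bits` is a hypothetical helper, and it only makes sense for unsigned types, where `numeric_limits<T>::digits` counts every value bit. The `unsigned long long` line is a per-platform check, since neither standard guarantees it.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <limits>

// For an unsigned type, numeric_limits<T>::digits counts the value bits and
// sizeof(T) * CHAR_BIT counts every bit of the object. If the two are equal,
// there is no room left for padding bits.
template <typename T>
constexpr bool has_no_padding_bits()
{
    static_assert(std::numeric_limits<T>::is_integer &&
                  !std::numeric_limits<T>::is_signed,
                  "only meaningful for unsigned integer types");
    return static_cast<std::size_t>(std::numeric_limits<T>::digits) ==
           sizeof(T) * CHAR_BIT;
}

// Guaranteed via the C rule that <cstdint> refers to, so it should never fire.
static_assert(has_no_padding_bits<std::uint64_t>(), "uint64_t has padding bits?");

// Not guaranteed by either standard; a check of this particular platform.
static_assert(has_no_padding_bits<unsigned long long>(),
              "this platform's unsigned long long has padding bits");
```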

So again - please tell me why you think memcpy'ing data into a uint64_t
may have dangerous or poorly defined behaviour.

The ball is in your court.

> Not that it matters for what I've written, but it matters for the silly
> argument that you offered, quoted above, and that you now argue against,
> plus, there is the thing about being Wrong on the internet, not to
> mention Doubly Wrong: just on principle one should not let that pass.
>
>
>> In case I am missing something, please tell me where you see any
>> possible dangerous or not fully defined behaviour even for the more
>> general case :
>>
>> #include <string.h>
>> #include <stdint.h>
>>
>> uint64_t read64bits(const void * p) {
>>      uint64_t x;
>>      memcpy((void*) &x, p, sizeof(uint64_t));
>>      return x;
>> }
>
> Now you're arguing against yourself again.
>
> That means that whatever I respond, I can expect a random direction answer.
>
> Anyway:
>
> * The C style cast there is both unnecessary and dangerous, because it
> can cast away const, and is difficult to grep, so it's ungood code.

I am asking what you think is /dangerous/ or not fully defined here -
not whether there might be an unnecessary cast or whether the style is
"perfect" according to your questionable judgement.

> * You're wrong about formally no padding bits for C++, so in principle
> that function can produce an invalid `uint64_t` with a trap representation;
> that's UB -- except that that's in principle, not in practice.

See above for the chapter and verse you requested.

> * If the pointer `p` is invalid, or is a null pointer, or doesn't point
> to at least sizeof(uint64_t) contiguous bytes of readable memory, then
> that's UB, and it's UB both in principle and in practice.
>

See below for the explicit assumption that p points to sufficient
readable data.

> If you had been a certain other old timer in this group, then it would
> also be relevant that there's possible UB due to stack overflow.
>
> Which by his (lack of) logic leads to the conclusion that all C++
> programs have UB.
>

I hope we can agree to assume that kind of thing is outside the scope of
the language!

>
>> We can assume that "p" points to data of some type of at least 64 bits
>> in size.  I would like to hear of any potential issues in C or C++
>> (hence the cross-language code).
>
> Oh, cross language, sorry.

No problem. As long as we avoid crass language :-)

>
> For the case of C /I believe/ that there's a formal guarantee of no
> padding bits, so then only the last point above matters wrt. UB.
>

Ah ha - now we are getting somewhere!

I suspect the key point is that the C++ standard does not explicitly
define uint64_t to have no padding bits, but it refers to the C standard
which /does/ have such guarantees. C++ inherits them from C.

>
>>> I'm not sure if you're addressing only the formal here.
>>
>> No.  It is a general principle.  Some people /do/ believe that
>> "separate compilation" creates magical barriers that limit a
>> compiler's ability to see the relationships between code sections, and
>> therefore its ability to "optimise using assumptions about defined and
>> undefined behaviours", and that this means some kinds of undefined
>> behaviours become defined by moving code to a different file or
>> disabling optimisation.  They are wrong to believe this.  They may
>> manage to write code that works when they test it, but it will be
>> fragile - the code still has exactly the same undefined behaviours,
>> and these may manifest as bugs in the future.
>>
>>>
>>> However, if you intended this to also apply to the in-practice, then
>>> the burden of proof is on you that some computer exists where
>>> `uint64_t` has bits that do not participate in the value
>>> representation, so that they /can/ be invalid: as far as I know there
>>> is no such computer.
>>>
>>
>> No, not at all.  It is only /you/ that has suggested, by claiming the
>> use of memcpy is not fully defined, that uint64_t may hypothetically
>> contain padding bits.  (See earlier in my reply.)  I know it can't, so
>> that is not the issue.
>
> Now you're claiming that your argument was my argument. Jeez.
>

Please re-read your reply to Bonita. You were advocating using
separately compiled code so that you can use assignment instead of
memcpy - claiming, bizarrely, that assignment was "safe" and memcpy
"unsafe" despite insisting on separately compiled functions for your
"solution" to work "in practice".

And then you claimed that the memcpy version was not well defined.

Re: Strange optimization

<u6i61g$srpv$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=412&group=comp.lang.c%2B%2B#412

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: alf.p.st...@gmail.com (Alf P. Steinbach)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Fri, 16 Jun 2023 19:27:11 +0200
Organization: A noiseless patient Spider
Lines: 31
Message-ID: <u6i61g$srpv$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6egpk$bn1q$2@dont-email.me>
<u6h18t$op4b$1@dont-email.me> <u6h82j$phr2$1@dont-email.me>
<u6hdrh$q807$1@dont-email.me> <u6hm4v$r4fl$1@dont-email.me>
<u6hutm$rvk8$1@dont-email.me> <u6i2rp$sgu3$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 16 Jun 2023 17:27:12 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="41fde8fe75c13341b4da7e47a75482a0";
logging-data="945983"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19864hGMCMiepkomvLnF4En"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:USdcq+CsBXWGhHrAks0yngMy66Q=
In-Reply-To: <u6i2rp$sgu3$1@dont-email.me>
Content-Language: en-US
 by: Alf P. Steinbach - Fri, 16 Jun 2023 17:27 UTC

On 2023-06-16 6:32 PM, David Brown wrote:
[snip]
>>>
>>> A "uint64_t" has a guaranteed fully-defined format and no padding bits.
[snip]
> The C++ standards change their numbering regularly.  These numbers are
> from the C++20 draft N4860 - I don't believe the contents change much
> between versions.
>
> 6.8.1p3 .. p5 describes the representation of unsigned types as
> collections of bits representing powers of 2.  For unsigned integer
> types in general, there may be padding bits, but there are no padding
> bits in char, signed char, unsigned char and char8_t.

[snip]

> The ball is in your court.

In the above you contradict yourself:

* First you claim that there are no padding bits for `uint64_t`.
* Then you paraphrase the standard that "there may be padding bits".

---

I elected now to not quote or respond to the rest which likewise
involved some self-contradictions.

- Alf

Re: Strange optimization

<87o7lfgu2y.fsf@nosuchdomain.example.com>

https://www.novabbs.com/devel/article-flat.php?id=414&group=comp.lang.c%2B%2B#414

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Fri, 16 Jun 2023 11:06:45 -0700
Organization: None to speak of
Lines: 29
Message-ID: <87o7lfgu2y.fsf@nosuchdomain.example.com>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6egpk$bn1q$2@dont-email.me>
<u6h18t$op4b$1@dont-email.me> <u6h82j$phr2$1@dont-email.me>
<u6hdrh$q807$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="2cdc7a1159dbf1e85495be5cdd32a693";
logging-data="951802"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/tuBNVxocRRvafpId2eeHZ"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:K9QAU13hzDFyt1YXP7AgmINuESE=
sha1:IQRm1AEaNY6fmbBw2mX6TvnDLpA=
 by: Keith Thompson - Fri, 16 Jun 2023 18:06 UTC

"Alf P. Steinbach" <alf.p.steinbach@gmail.com> writes:
> On 2023-06-16 10:55 AM, David Brown wrote:
>> On 16/06/2023 08:59, Alf P. Steinbach wrote:
>>>
>>> `memcpy` is not the safest tool around. It's rather the opposite,
>>> something to avoid /if possible/. `memcpy` is the "weak tool" here.
>>>
>> I don't know how the function here is supposed to be used.  We know
>> that the pointer (it is syntactically a reference, but effectively a
>> pointer) is properly aligned for a uint64_t, but we don't know if it
>> actually points to a uint64_t.  Perhaps it points to a double, or
>> some other 64-bit type.
>
> In the case you sketch where the bits do not represent a valid
> `uint64_t`, the `memcpy` does not make the behavior well-defined:
> that's a (dangerous) misconception.

I think David was suggesting, not that the bits are not a valid
representation for an object of type uint64_t, but that, for example, the
pointer points to an object whose type is, say, double. (I haven't
followed the discussion closely enough to have an opinion on whether
that matters.)

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Re: Strange optimization

<a435a6d6-d86e-4a35-ac22-08f72c0fe369n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=415&group=comp.lang.c%2B%2B#415

X-Received: by 2002:a37:4642:0:b0:75c:9b66:d021 with SMTP id t63-20020a374642000000b0075c9b66d021mr532455qka.15.1686950201460;
Fri, 16 Jun 2023 14:16:41 -0700 (PDT)
X-Received: by 2002:ad4:58a7:0:b0:62d:f2f4:af8 with SMTP id
ea7-20020ad458a7000000b0062df2f40af8mr609943qvb.1.1686950201285; Fri, 16 Jun
2023 14:16:41 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer03.ams4!peer.am4.highwinds-media.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c++
Date: Fri, 16 Jun 2023 14:16:41 -0700 (PDT)
In-Reply-To: <u6h0j5$olvu$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=108.45.179.220; posting-account=Ix1u_AoAAAAILVQeRkP2ENwli-Uv6vO8
NNTP-Posting-Host: 108.45.179.220
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6desf$4oa4$1@dont-email.me> <u6h0j5$olvu$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a435a6d6-d86e-4a35-ac22-08f72c0fe369n@googlegroups.com>
Subject: Re: Strange optimization
From: jameskuy...@alumni.caltech.edu (james...@alumni.caltech.edu)
Injection-Date: Fri, 16 Jun 2023 21:16:41 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3145
 by: james...@alumni.calt - Fri, 16 Jun 2023 21:16 UTC

On Friday, June 16, 2023 at 2:48:24 AM UTC-4, Alf P. Steinbach wrote:
> On 2023-06-15 12:27 AM, James Kuyper wrote:
> > On 6/14/23 15:46, Alf P. Steinbach wrote:
....
> > All you need is a platform where misaligned pointers do not merely cause
> > the code to be inefficient, but to actually malfunction. On such a
> > platform, if p_bytes is not correctly aligned to store a uint64_t, then
> > the code will malfunction in the reinterpret_cast<>.
> Yes, but irrelevant for the case discussed, because the values are
> guaranteed correctly aligned.

You said nothing about what p_bytes was when you asked

"how is the compiler to know that it's generally called with a
`*reinterpret_cast<uint64_t*>( p_bytes )` as argument?"

In particular, you said nothing about p_bytes that would guarantee that it was
correctly aligned. The memcpy() would only be reasonable if it were possible
for "data" to be a misaligned pointer; otherwise simple assignment would be
simpler and equally safe.
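This sketch shows the case where `memcpy` does earn its keep. It reads eight bytes at an arbitrary offset in a byte buffer, where the address may well be misaligned for `uint64_t`. `load_u64_at` is a hypothetical name, and the caller is assumed to guarantee that at least `offset + 8` bytes are readable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read 8 bytes starting at an arbitrary offset.
// memcpy has no alignment requirement, so this is valid even when
// buf + offset is not suitably aligned for uint64_t, unlike
// *reinterpret_cast<const std::uint64_t*>(buf + offset).
// Assumes buf has at least offset + 8 readable bytes.
inline std::uint64_t load_u64_at(const unsigned char* buf, std::size_t offset)
{
    std::uint64_t x;
    std::memcpy(&x, buf + offset, sizeof x);
    return x;
}
```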

> [snip]
> > If p_bytes is correctly aligned, simple assignment will work just as
> > well as memcpy().
> Yes.
> >> Even g++'s documentation of `-fstrict-aliasing` says "A character type
> >> may alias any other type.". ...
> >
> > True, but that's not what this reinterpret_cast does;
> It is what this `reinterpret_cast` does.

This reinterpret_cast converts p_bytes, which presumably points to the
first element of an array of character type, into a uint64_t*. In
other words, an array of char is being aliased as a uint64_t. Are you
claiming that uint64_t is a character type?

Re: Strange optimization

<u6in1p$uvth$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=417&group=comp.lang.c%2B%2B#417

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: alf.p.st...@gmail.com (Alf P. Steinbach)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Sat, 17 Jun 2023 00:17:26 +0200
Organization: A noiseless patient Spider
Lines: 81
Message-ID: <u6in1p$uvth$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6desf$4oa4$1@dont-email.me>
<u6h0j5$olvu$1@dont-email.me>
<a435a6d6-d86e-4a35-ac22-08f72c0fe369n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 16 Jun 2023 22:17:29 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="e7bb0977682eeae68857305e0469c284";
logging-data="1015729"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19JKJKHiz46qla4HL04O6m/"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:i6DRxmE2kE7uLraFggzJZot4e3w=
Content-Language: en-US
In-Reply-To: <a435a6d6-d86e-4a35-ac22-08f72c0fe369n@googlegroups.com>
 by: Alf P. Steinbach - Fri, 16 Jun 2023 22:17 UTC

On 2023-06-16 11:16 PM, james...@alumni.caltech.edu wrote:
> On Friday, June 16, 2023 at 2:48:24 AM UTC-4, Alf P. Steinbach wrote:
>> On 2023-06-15 12:27 AM, James Kuyper wrote:
>>> On 6/14/23 15:46, Alf P. Steinbach wrote:
> ...
>>> All you need is a platform where misaligned pointers do not merely cause
>>> the code to be inefficient, but to actually malfunction. On such a
>>> platform, if p_bytes is not correctly aligned to store a uint64_t, then
>>> the code will malfunction in the reinterpret_cast<>.
>> Yes, but irrelevant for the case discussed, because the values are
>> guaranteed correctly aligned.
>
> You said nothing about what p_bytes was when you asked
>
> "how is the compiler to know that it's generally called with a
> `*reinterpret_cast<uint64_t*>( p_bytes )` as argument?"
>
> In particular, you said nothing about p_bytes that would guarantee that it was
> correctly aligned.

That was known from the optimization attempt in the original posting's
code (I'm not the OP):

#if defined(__GNUC__) || defined(__llvm__)
if( (size_t)&data % 8 )
__builtin_unreachable();
#endif

> The memcpy() would only be reasonable if it were possible
> for "data" to be a misaligned pointer; otherwise simple assignment would be
> simpler and equally safe.

On that we agree. :)

>> [snip]
>>> If p_bytes is correctly aligned, simple assignment will work just as
>>> well as memcpy().
>> Yes.
>>>> Even g++'s documentation of `-fstrict-aliasing` says "A character type
>>>> may alias any other type.". ...
>>>
>>> True, but that's not what this reinterpret_cast does;
>> It is what this `reinterpret_cast` does.
>
> This reinterpret_cast converts p_bytes, which presumably points to the
> first element of an array of character type, into a uint64_t*. In
> other words, an array of char is being aliased as a uint64_t. Are you
> claiming that uint64_t is a character type?

It may be that my English isn't good enough to understand a one-way
nature of the GCC docs' wording.

Perhaps.

The way I think, aliasing is of interest to the compiler because where
it can assume that a T value is never changed via a U* pointer, and that
the value that the U* points to is never changed by a change to the T
value, it can optimize, e.g. not reload that value from memory when it's
already in a register. If T = float and U* = int*, then it can make this
assumption. Similarly if T = int and U* = float*, it can assume this.

But if T = char-type, such as the first char in an array, and U* is a
double*, then it cannot reasonably make this assumption, and as I read
the quoted docs, g++ doesn't make this assumption for T = char-type.

And similarly, if T = double and U* is a char-type*, then it cannot
reasonably make this assumption, and as I read the quoted docs, g++
doesn't make this assumption for U* = char-type*.

That is, as I read it, based more on reasoning about the purpose than
expertise in English (I'm Norwegian), the wording works both ways.
char-type on either side is OK. But invoking that freedom twice with
different types, to end up with e.g. int* and double* pointers that are
the same address, so that an int assignment can change a double or vice
versa, is ungood.
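The direction that the standards' wording does spell out can be shown with a short sketch: any object's bytes may be read through an `unsigned char` glvalue. The hypothetical `byte_sum` below does exactly that. The reverse is reading a `char` array through a `uint64_t` lvalue, as in the `reinterpret_cast` under discussion. As far as the standards' text goes, that access is not covered by the character-type exception, whatever a given compiler does in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reading an object's bytes through an unsigned char pointer is explicitly
// allowed (char, unsigned char and, since C++17, std::byte).
// Hypothetical helper: sums the bytes of any object.
template <typename T>
unsigned byte_sum(const T& obj)
{
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&obj);
    unsigned sum = 0;
    for (std::size_t i = 0; i < sizeof obj; ++i)
        sum += bytes[i];
    return sum;
}
```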

- Alf

Re: Strange optimization

<u6ks5p$1ajn0$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=424&group=comp.lang.c%2B%2B#424

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.lang.c++
Subject: Re: Strange optimization
Date: Sat, 17 Jun 2023 19:57:11 +0200
Organization: A noiseless patient Spider
Lines: 143
Message-ID: <u6ks5p$1ajn0$1@dont-email.me>
References: <u6acl4$3m5sb$1@dont-email.me> <kerr9cF31i7U1@mid.individual.net>
<u6bb5e$3tf93$1@dont-email.me> <u6be9i$3tpfh$1@dont-email.me>
<u6bkir$3ueat$1@dont-email.me> <u6bkqb$3ufs0$1@dont-email.me>
<u6d5dr$3rml$1@dont-email.me> <u6desf$4oa4$1@dont-email.me>
<u6h0j5$olvu$1@dont-email.me>
<a435a6d6-d86e-4a35-ac22-08f72c0fe369n@googlegroups.com>
<u6in1p$uvth$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 17 Jun 2023 17:57:13 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="f5dfb6c9df3d7eafcdacd0c306c08410";
logging-data="1396448"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/ZE6a7+tk5PFkXe5dAqLAMVt37xDxoEDg="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.7.1
Cancel-Lock: sha1:qfAt0KQWBkzfL+R1i+3FYQljkqk=
Content-Language: en-GB
In-Reply-To: <u6in1p$uvth$1@dont-email.me>
 by: David Brown - Sat, 17 Jun 2023 17:57 UTC

On 17/06/2023 00:17, Alf P. Steinbach wrote:
> On 2023-06-16 11:16 PM, james...@alumni.caltech.edu wrote:
>> On Friday, June 16, 2023 at 2:48:24 AM UTC-4, Alf P. Steinbach wrote:
>>> On 2023-06-15 12:27 AM, James Kuyper wrote:
>>>> On 6/14/23 15:46, Alf P. Steinbach wrote:
>> ...
>>>> All you need is a platform where misaligned pointers do not merely cause
>>>> the code to be inefficient, but to actually malfunction. On such a
>>>> platform, if p_bytes is not correctly aligned to store a uint64_t, then
>>>> the code will malfunction in the reinterpret_cast<>.
>>> Yes, but irrelevant for the case discussed, because the values are
>>> guaranteed correctly aligned.
>>
>> You said nothing about what p_bytes was when you asked
>>
>> "how is the compiler to know that it's generally called with a
>> `*reinterpret_cast<uint64_t*>( p_bytes )` as argument?"
>>
>> In particular, you said nothing about p_bytes that would guarantee
>> that it was correctly aligned.
>
> That was known from the optimization attempt in the original posting's
> code (I'm not the OP):
>
>     #if defined(__GNUC__) || defined(__llvm__)
>         if( (size_t)&data % 8 )
>             __builtin_unreachable();
>     #endif
>
>
>> The memcpy() would only be reasonable if it were possible
>> for "data" to be a misaligned pointer; otherwise simple assignment
>> would be simpler and equally safe.
>
> On that we agree. :)

If we knew that the original pointer pointed to a uint64_t, then we
could /all/ agree that a plain assignment would be simpler, clearer, and
as efficient as possible.

But as far as I know, no such guarantee exists. My guess, from how
functions like this are sometimes used, is that the original data is in
an array of unsigned char - perhaps a buffer for a received network
packet or a file that has been read.

And if the data does not start as a uint64_t (or compatible type), then
reading it through a uint64_t glvalue is /not/ safe, even if alignment
is guaranteed.
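A minimal sketch of the safe alternative, assuming (as guessed above) that the data sits in an unsigned char buffer in native byte order. The name load_u64 is hypothetical. Copying the bytes with std::memcpy into a genuine uint64_t object is defined for any alignment and never reads the buffer through a uint64_t glvalue:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a uint64_t stored in native byte order at an
// arbitrary position in a byte buffer. memcpy copies the object
// representation into a real uint64_t, so neither alignment nor the strict
// aliasing rules come into play.
std::uint64_t load_u64(const unsigned char* p_bytes) {
    std::uint64_t value;
    std::memcpy(&value, p_bytes, sizeof value);
    return value;
}
```

With optimisation enabled, gcc and clang typically compile this to a single 64-bit load on targets that permit unaligned access, so the "safe" version need not cost anything.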

>
>
>>> [snip]
>>>> If p_bytes is correctly aligned, simple assignment will work just as
>>>> well as memcpy().
>>> Yes.
>>>>> Even g++'s documentation of `-fstrict-aliasing` says "A character type
>>>>> may alias any other type.". ...
>>>>
>>>> True, but that's not what this reinterpret_cast does;
>>> It is what this `reinterpret_cast` does.
>>
>> This reinterpret_cast converts p_bytes, which presumably points to the
>> first element of an array of character type, into a uint64_t*. In
>> other words, an array of char is being aliased as a uint64_t. Are you
>> claiming that uint64_t is a character type?
>
> It may be that my English isn't good enough to understand a one-way
> nature of the GCC docs' wording.
>

The "cppreference" site is often clearer than the standards language:

<https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_aliasing>

The key point is that a reinterpret_cast (or the equivalent C-style
cast) does not let you access an object through an incompatible type.
It does not, in gcc parlance, side-step the strict aliasing rules.

Like a C cast, reinterpret_cast is a way to tell the compiler that you
think it is safe to change the type of an object (usually a pointer).
But it does not /make/ it safe. And if the compiler can see that the
actual object type is not compatible with the way you are accessing it,
then you have a conflict - you are lying to the compiler, and no good
will come of it. A reinterpret_cast will not change that situation.
(And separate compilation will only hide the problem.)
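To illustrate that the cast itself is not the problem, here is a small sketch (the function name is hypothetical). Converting a pointer to another object pointer type and back gives the original pointer when the intermediate type has no stricter alignment; what decides whether an access is defined is the type of the object actually read:

```cpp
#include <cassert>

// Hypothetical example: casts change the type of the pointer, never the
// type of the object it points to.
int read_via_round_trip(int* p) {
    // int* -> unsigned char* -> int* yields the original pointer value,
    // since unsigned char has no stricter alignment than int.
    unsigned char* bytes = reinterpret_cast<unsigned char*>(p);
    int* back = reinterpret_cast<int*>(bytes);

    // A float* obtained the same way would compile just as happily, but
    // reading *reinterpret_cast<float*>(p) would access an int object
    // through a float glvalue, which is undefined behaviour.
    return *back;  // fine: an int glvalue reading an actual int object
}
```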

> Perhaps.
>
> The way I think, aliasing is of interest to the compiler because where
> it can assume that a T value is never changed via a U* pointer, and that
> the value that the U* points to is never changed by a change to the T
> value, it can optimize, e.g. not reload that value from memory when it's
> already in a register. If T = float and U* = int*, then it can make this
> assumption. Similarly if T = int and U* = float*, it can assume this.
>

Yes.
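The optimisation described above can be seen in a sketch like this (the function name is hypothetical). Because int and float are unrelated types, a compiler using type-based alias analysis may assume the store through pf cannot change *pi:

```cpp
#include <cassert>

// Hypothetical function. Under strict aliasing the compiler may assume
// *pf does not overlap *pi, and fold the final load into the constant 1.
int store_then_load(int* pi, float* pf) {
    *pi = 1;
    *pf = 2.0f;   // assumed not to modify *pi
    return *pi;   // may be compiled as "return 1" without reloading
}
```

Calling it with pi and pf pointing at the same storage is undefined behaviour; with "-fno-strict-aliasing", gcc has to assume the two may overlap and reload *pi after the store through pf.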

> But if T = char-type, such as the first char in an array, and U* is a
> double*, then it can not reasonably make this assumption, and as I read
> the docs quote g++ doesn't make this assumption for T = char-type.
>

No. You can use a char-type pointer to access a non-char object, but
you cannot use a non-char pointer to access a char (array) object.
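A short sketch of the two directions (the function name is hypothetical). Reading the bytes of a double through unsigned char is allowed; treating a char buffer as if it held a double is not, and is shown only as a comment:

```cpp
#include <cassert>
#include <cstddef>

// Allowed direction: examining the bytes of a double through an unsigned
// char glvalue. Character types may alias objects of any type.
std::size_t count_nonzero_bytes(const double& d) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&d);
    std::size_t n = 0;
    for (std::size_t i = 0; i < sizeof d; ++i)
        if (bytes[i] != 0)
            ++n;
    return n;
}

// Forbidden direction (comment only): no double object exists in buf, so
// reading it through a double glvalue is undefined behaviour even though
// the storage is suitably aligned.
//   alignas(double) unsigned char buf[sizeof(double)] = {};
//   double x = *reinterpret_cast<double*>(buf);
```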

I think gcc's documentation could certainly be clearer here, and I also
think that the documentation for the "-fstrict-aliasing" flag could be
moved from the optimisation options page to the "code generation
options" page. IMHO, using "-fno-strict-aliasing" is a significant
change to the semantics of the language, turning previously undefined
behaviour into defined behaviour (much like "-fwrapv" does for signed
integer overflow).

> And similarly, if T = double and U* is a char-type*, then it can not
> reasonably make this assumption, and as I read the docs quote g++
> doesn't make this assumption for U* = char-type*.
>

Correct.

However, gcc/g++ is not saying anything more or less than the standards
say here. The behaviour - both the defined behaviour, and the undefined
behaviour - comes straight from the standard. (The only exception is
for type-punning unions. C90 did not explicitly say they were allowed,
but the gcc documentation says they are allowed even in C90 mode. C99
onwards allows them, while C++ never has.)
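For comparison, a sketch of the usual C++ replacement for union type-punning, assuming a 32-bit IEEE 754 float (the helper name float_bits is hypothetical). The union version is shown in a comment because the member read is only defined in C:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// In C99 and later this union read reinterprets the bytes; in C++ reading
// a member other than the one last written is undefined behaviour:
//   union Pun { float f; std::uint32_t u; };
//   Pun p; p.f = 1.0f;
//   std::uint32_t bits = p.u;   // fine in C99, UB in C++
//
// Hypothetical C++ helper with defined behaviour: copy the object
// representation instead. Assumes float is 32 bits wide.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
```

Since C++20, std::bit_cast<std::uint32_t>(f) from <bit> does the same job more directly.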

> That is, as I read it, based more on reasoning about the purpose than
> expertise in English (I'm Norwegian), the wording works both ways.
> char-type on either side is OK. But invoking that freedom twice with
> different types, to end up with e.g. int* and double* pointers that are
> the same address, so that an int assignment can change a double or vice
> versa, is ungood.
>
>
> - Alf
>

