Re: Undefined Behavior Optimizations in C

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Wed, 18 Jan 2023 21:14:44 +0100
Organization: A noiseless patient Spider
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-063@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-041@comp.compilers> <23-01-062@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="83512"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize
Posted-Date: 18 Jan 2023 18:55:44 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-01-062@comp.compilers>
Content-Language: en-GB
 by: David Brown - Wed, 18 Jan 2023 20:14 UTC

On 18/01/2023 14:14, Spiros Bousbouras wrote:
> On Wed, 11 Jan 2023 14:20:49 +0100
> David Brown <david.brown@hesbynett.no> wrote:
>> C was designed from day one to be a high-level language, not an
>> assembler of any sort. Limitations of weaker earlier compilers do
>> not mean the language was supposed to work that way.
>
> For those who want an abstract or portable assembler, there exists
> c9x.me/compile/. I've never used it, but at least it aims to be that,
> unlike C. I would be curious to know of other analogous projects. I
> guess the "register transfer language" of GCC is somewhat analogous.

I haven't looked at that project - but as a general point, I am
sceptical of any claims about "portable assembler". If there is
translation and it is not one-to-one (or very close to that), then you
don't really have "assembler" even if you have a rather low-level
language. (And gcc's RTL is an internal format - usually there are
several optimisation passes done at the RTL level.)

>
>> I first used a C compiler that optimised on the assumption that UB
>> didn't happen some 25 years ago. (In particular, it assumed signed
>> integer arithmetic never overflowed.)
>
> I have several times encountered the claim that compilers assume that UB does
> not happen and I don't understand it. Let's consider two examples:
>
> x + 1 > x
>
> in C where x is a signed integer. Compilers will often treat this as
> always true with the following reasoning :
>
> - if x does not have the maximum value which fits in its type then the
> meaning of the C expressions is the same as their mathematical meaning
> so the expression evaluates to true.
>
> - if x has the maximum value which fits in its type then x + 1 is not
> defined so any translation (including treating the whole expression as
> true) is valid.
>
> There's no assumption that UB (undefined behaviour) will not happen; both
> possibilities are accounted for.
>

I think I see what you are saying, but I don't make a big distinction
between "assumes UB does not happen", "assumes you don't care about
results if UB /does/ happen" and "can make any transformations if UB
happens".

One thing that you might view as a distinction is that compilers can
use their knowledge of UB to affect surrounding code.

So if you have:

int x, y;

if (x + 1 > x) y++; // (a)
if (x == INT_MAX) y = 10; // (b)

From your example above, we can see that the compiler can transform (a)
into "y++;" - there is no need for the conditional. But the compiler
can /also/ transform (b) into ";" - it is allowed to reason that if x
/were/ equal to INT_MAX, statement (a) would be undefined behaviour
(even though it was transformed away) and there is no value for x which
would result in "y = 10" being executed without also executing UB.

(A quick check on <https://godbolt.org> shows that gcc does the first
transformation, but not the second one.)
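A minimal sketch of condition (a), assuming a hosted C compiler such as gcc or clang; the function names are hypothetical. `cond_ub` is the test exactly as written, which has no defined value when x == INT_MAX, so an optimiser may fold it to 1. `cond_defined` spells out the intended result for every int, leaving nothing for the compiler to assume.

```c
#include <limits.h>

/* Condition (a) as written: undefined when x == INT_MAX, so an
   optimising compiler is free to fold it to 1 for every call. */
int cond_ub(int x)
{
    return x + 1 > x;
}

/* A well-defined spelling of the same intent: true for every x
   except INT_MAX, where the result is explicitly 0. */
int cond_defined(int x)
{
    return x < INT_MAX;
}
```

Only call `cond_ub` with arguments below INT_MAX; on that whole range the two functions agree.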

> Another example is
>
> ... *some_pointer_object ...
> [ some_pointer_object does not get modified in this part of the code and
> has not been declared as volatile ]
> if (some_pointer_object == NULL) ...
>
> If some_pointer_object is not NULL then the test can be omitted; if it is
> NULL then the earlier dereference is UB so any translation is valid including
> omitting the test.
>
> Again, there's no assumption that UB will not happen.

I think that is one way to look at it, but really it comes down to the
same thing.
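The pointer example can be made concrete with a sketch using hypothetical function names. In `read_then_check` the dereference comes first, so a compiler may treat the later NULL test as always false and delete it; `check_then_read` orders the test before the access, which keeps the test meaningful for every input.

```c
#include <stddef.h>

/* Dereference before the test: if p were NULL, *p would already be UB,
   so the compiler may treat the test below as always false. */
int read_then_check(const int *p)
{
    int v = *p;
    if (p == NULL)
        return -1;
    return v;
}

/* Test before dereference: well-defined for every p, including NULL. */
int check_then_read(const int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```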

One thing that is worth noting in this context is that compilers like
gcc and clang translate known undefined behaviour into a special marker.
You can imagine it as translating "*p = ..." into:

if (!p) undefined_behaviour();
*p = ...

And the builtin function __builtin_unreachable() is translated into
exactly the same internal marker or tree node type. These compilers do
not distinguish between "undefined behaviour" and "code flow cannot
get here".
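A sketch of writing that marker by hand, assuming gcc or clang: `__builtin_unreachable()` is a compiler extension, not ISO C, and the function names are hypothetical. Each explicit check plays the role of the implicit `if (!p) undefined_behaviour();`, so the compiler may drop it along with anything that depends on the excluded case.

```c
#include <limits.h>

/* gcc/clang extension: reaching __builtin_unreachable() is undefined,
   so the compiler may assume p != NULL from here on. */
int load_nonnull(const int *p)
{
    if (!p)
        __builtin_unreachable();
    return *p;
}

/* The same promise the compiler derives implicitly from "x + 1":
   x is never INT_MAX, so the comparison may be folded to 1. */
int plus_one_gt(int x)
{
    if (x == INT_MAX)
        __builtin_unreachable();
    return x + 1 > x;
}
```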

>
> So the request that C compilers should stop assuming that UB will not
> happen seems to me completely misguided. I think what is really meant
> is that, in reasoning what a valid translation is, C compilers (or
> the authors of the compilers) should not employ the notion of UB. But
> then how should UB be translated? Again there exists the assumption
> or claim that there is some intuitively obvious translation and
> compilers should go for that. First, I'm not sure that there exists
> such a common intuition even among humans and second, even if it does,
> how does one go from an intuition to an algorithm C compilers can
> use to do translation? Lots of things are intuitively obvious but
> creating an algorithm to duplicate the human intuition is a hard
> problem, one which has not been solved in many cases and perhaps even
> one which is unsolvable in some cases.
>

I agree entirely with your assessment, with the exception that
"compilers can and do assume UB doesn't happen" is a valid way to view
things.

(I'm snipping the rest, because I fully agree - and it is so well
written that I've nothing to add!)

Re: Undefined Behavior Optimizations in C

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Wed, 18 Jan 2023 21:10:55 -0800 (PST)
Organization: Compilers Central
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-064@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-041@comp.compilers> <23-01-062@comp.compilers> <23-01-063@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="12660"; mail-complaints-to="abuse@iecc.com"
Keywords: analysis
Posted-Date: 19 Jan 2023 12:25:23 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-01-063@comp.compilers>
 by: gah4 - Thu, 19 Jan 2023 05:10 UTC

On Wednesday, January 18, 2023 at 3:55:49 PM UTC-8, David Brown wrote:

(snip)

> From your example above, we can see that the compiler can transform (a)
> into "y++;" - there is no need for the conditional. But the compiler
> can /also/ transform (b) into ";" - it is allowed to reason that if x
> /were/ equal to INT_MAX, statement (a) would be undefined behaviour
> (even though it was transformed away) and there is no value for x which
> would result in "y = 10" being executed without also executing UB.

This is reminding me of some quantum mechanics rules described here:

https://www.sciencenews.org/wp-content/uploads/2010/11/baseball.pdf

It is interesting reading for those interested in the physics, but also
for those who aren't. It doesn't take much physics thought.

It has to do with what quantum mechanics allows when you don't
actually measure something.

And, similarly, compilers shouldn't try to make too many assumptions
about things not measured.

Re: Undefined Behavior Optimizations in C

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: alexfrun...@gmail.com (Alexei A. Frounze)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Thu, 19 Jan 2023 21:18:52 -0800 (PST)
Organization: Compilers Central
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-065@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-041@comp.compilers> <23-01-062@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="1115"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize
Posted-Date: 20 Jan 2023 11:16:59 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-01-062@comp.compilers>
 by: Alexei A. Frounze - Fri, 20 Jan 2023 05:18 UTC

On Wednesday, January 18, 2023 at 8:35:40 AM UTC-8, Spiros Bousbouras wrote:
....
> I have several times encountered the claim that compilers assume that UB does
> not happen and I don't understand it. Let's consider two examples:
>
> x + 1 > x
>
> in C where x is a signed integer. Compilers will often treat this as
> always true with the following reasoning :
>
> - if x does not have the maximum value which fits in its type then the
> meaning of the C expressions is the same as their mathematical meaning
> so the expression evaluates to true.
>
> - if x has the maximum value which fits in its type then x + 1 is not
> defined so any translation (including treating the whole expression as
> true) is valid.
>
> There's no assumption that UB (undefined behaviour) will not happen; both
> possibilities are accounted for.

I believe in a case like this a modern C/C++ compiler reasons that x must be
less than the maximum representable value and it generates code according
to this, possibly removing dead code that depends on x being the maximum
representable value. If the compiler's assumption that x is less than the
maximum is wrong, it's perfectly fine, it's UB, any "broken" code generated
is allowed.
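When the assumption that x is below the maximum cannot be guaranteed, the usual remedy is to test before adding. A sketch with hypothetical helper names: `checked_increment` uses a portable ISO C pre-check, while `checked_add` relies on `__builtin_add_overflow`, a gcc (since version 5) and clang builtin.

```c
#include <limits.h>
#include <stdbool.h>

/* Portable ISO C: compare against the limit before the addition,
   so the addition itself can never overflow. */
bool checked_increment(int x, int *out)
{
    if (x == INT_MAX)
        return false;
    *out = x + 1;
    return true;
}

/* gcc/clang builtin: __builtin_add_overflow returns true when the
   exact sum does not fit in *out; this wrapper inverts it so that
   true means the stored result is exact. */
bool checked_add(int a, int b, int *out)
{
    return !__builtin_add_overflow(a, b, out);
}
```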

Alex

Re: Undefined Behavior Optimizations in C

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Fri, 20 Jan 2023 10:45:11 -0800 (PST)
Organization: Compilers Central
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-066@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-041@comp.compilers> <23-01-062@comp.compilers> <23-01-063@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="53085"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize
Posted-Date: 20 Jan 2023 16:13:46 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-01-063@comp.compilers>
 by: gah4 - Fri, 20 Jan 2023 18:45 UTC

On Wednesday, January 18, 2023 at 3:55:49 PM UTC-8, David Brown wrote:

(snip)

> So if you have:
>
> int x, y;
>
> if (x + 1 > x) y++; // (a)
> if (x == INT_MAX) y = 10; // (b)

> From your example above, we can see that the compiler can transform (a)
> into "y++;" - there is no need for the conditional. But the compiler
> can /also/ transform (b) into ";" - it is allowed to reason that if x
> /were/ equal to INT_MAX, statement (a) would be undefined behaviour
> (even though it was transformed away) and there is no value for x which
> would result in "y = 10" being executed without also executing UB.

I am now wondering both how well compilers do this, and how well
people do this.

Note that the only case where, on all machines I use, the y++ is
not executed, is when x==INT_MAX. Now, it would be completely
different for:

if (x + 2 > x) y++; // (a)

or

if(x == INT_MAX) y--;

For many years, C allowed for sign-magnitude and ones' complement
representation. Fixed point overflow was, then, at least machine
dependent, as those representations overflow differently. Some machines have the
ability to enable an interrupt on fixed point overflow, but at least
for Fortran and C, it is normally disabled.

Now, C has unsigned int which you can use when you need specific
overflow behavior (except on some Unisys machines). Fortran does not,
and so people expect, and depend on, two's complement overflow. (The
Fortran standard allows for any integer radix greater than one, and
also for different sign representations. But often enough, people know
it is two's complement binary.)
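A sketch of getting two's-complement wraparound in C through unsigned arithmetic; the helper names are hypothetical. Unsigned addition is defined to wrap modulo UINT_MAX + 1. Converting an out-of-range result back to int is implementation-defined, and the sketch assumes gcc's documented behaviour of reducing modulo 2^N.

```c
#include <limits.h>

/* Unsigned arithmetic wraps modulo UINT_MAX + 1 by definition. */
unsigned int unsigned_wrap(unsigned int a, unsigned int b)
{
    return a + b;
}

/* Two's-complement style wrapping add for int. The addition is done
   in unsigned int (well-defined); the conversion back to int is
   implementation-defined when the value exceeds INT_MAX, and gcc
   defines it as modulo reduction. */
int wrapping_add(int a, int b)
{
    return (int)((unsigned int)a + (unsigned int)b);
}
```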

As C allows for both UB and system-dependent behavior, and it is
hard for people to remember every case of each one, it is unreasonable
to me to assume fixed point overflow is UB.

Dereference of pointers that might point to the wrong place, maybe.

The reason for the Fortran example above, mentioning short circuit
IF statements, was that Fortran programs, at least with IBM compilers,
could reliably fetch from element 0 of an array. With static allocation,
and data stored after code, there was no chance of element 0 being
outside the address space. Yes, people shouldn't rely on it,
but sometimes they did.

Storing outside an array often enough leads to problems, but fetching
from outside it much less often.

Re: Undefined Behavior Optimizations in C

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Fri, 20 Jan 2023 20:42:25 -0000 (UTC)
Organization: news.netcologne.de
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-067@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-041@comp.compilers> <23-01-062@comp.compilers> <23-01-065@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="53399"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize
Posted-Date: 20 Jan 2023 16:14:34 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Thomas Koenig - Fri, 20 Jan 2023 20:42 UTC

Alexei A. Frounze <alexfrunews@gmail.com> schrieb:

> I believe in a case like this a modern C/C++ compiler reasons that x must be
> less than the maximum representable value and it generates code according
> to this, possibly removing dead code that depends on x being the maximum
> representable value. If the compiler's assumption that x is less than the
> maximum is wrong, it's perfectly fine, it's UB, any "broken" code generated
> is allowed.

There are cases when compilers don't even use this knowledge.

Take the function

int add (int a, int b)
{ return a+b;
}

on an instruction set architecture which has only 64-bit
arithmetic, such as POWER. This is translated by gcc,
with optimization, to

add 3,3,4
extsw 3,3
blr

(which is an addition followed by a sign extension). The POWER ABI
specifies that all values passed in registers are sign-extended,
so the content of a register has the same value independent of
the width of the signed integer it is being considered as.

So, the compiler would be within its right to _not_ extend the sign
of the result, because it could assume that no overflow occurs.
This, however, would result in a violation of the ABI, so the
compiler puts in the extra instruction just in case. If you
replace int by long in the example above, the sign extension
instruction is not generated.
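A C model of the two instructions, purely illustrative: the helper names are hypothetical, and int64_t stands in for a 64-bit POWER register. The full 64-bit sum of two sign-extended 32-bit values can leave the register holding something that is not itself a sign-extended 32-bit value; `extsw` restores the ABI's invariant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models "add 3,3,4": a full-width 64-bit addition. */
int64_t model_add(int64_t ra, int64_t rb)
{
    return ra + rb;
}

/* Models "extsw 3,3": keep the low 32 bits and sign-extend them.
   The uint32_t -> int32_t conversion is implementation-defined for
   values above INT32_MAX; gcc and clang reduce modulo 2^32. */
int64_t model_extsw(int64_t r)
{
    return (int64_t)(int32_t)(uint32_t)r;
}

/* The ABI invariant: the register reads the same as int and as long. */
bool is_sign_extended_32(int64_t r)
{
    return r == model_extsw(r);
}
```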

By comparison, MIPS gcc translates this to

jr $31
addu $2,$4,$5

(note use of the delay slot), so no explicit sign extension is done,
and the value returned in the register might have a different value
if interpreted as a 64-bit value.

Re: Undefined Behavior Optimizations in C

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Fri, 20 Jan 2023 13:54:36 -0800
Organization: None to speak of
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-068@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-041@comp.compilers> <23-01-062@comp.compilers> <23-01-063@comp.compilers> <23-01-066@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="25720"; mail-complaints-to="abuse@iecc.com"
Keywords: C, arithmetic
Posted-Date: 20 Jan 2023 22:09:05 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Keith Thompson - Fri, 20 Jan 2023 21:54 UTC

gah4 <gah4@u.washington.edu> writes:
[snip]
> For many years, C allowed for sign-magnitude and ones' complement
> representation.

It still does. The upcoming 2023 ISO C standard mandates 2's-complement
for signed integer types, but it hasn't been published yet.

[...]

> Now, C has unsigned int which you can use when you need specific
> overflow behavior (except on some Unisys machines).

If a Unisys C implementation doesn't behave as the standard requires
with respect to unsigned overflow, then it's a non-conforming
implementation. (Non-conforming implementations can still be useful.)

[...]

> As C allows for both UB and system-dependent behavior, and it is
> hard for people to remember every case of each one, it is unreasonable
> to me to assume fixed point overflow is UB.

C has:
- Undefined behavior (the standard imposes no requirements);
- Unspecified behavior (the standard provides 2 or more possibilities
and implementations can choose arbitrarily; an example is the order of
evaluation of function arguments); and
- Implementation-defined behavior (unspecified behavior where the
implementation must document its choice).

Signed integer overflow has undefined behavior -- not because it is or
isn't reasonable, but because the standard says so.
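A small sketch of the two milder categories, with deterministic checks; the function names are hypothetical, and the shift result assumes gcc or clang, which document an arithmetic right shift for negative values. Signed overflow, the undefined case, is deliberately not executed.

```c
/* Unspecified: the two calls to next_digit() may be evaluated in either
   order, so combine() sees (1, 2) or (2, 1). Function calls are
   indeterminately sequenced, not unsequenced, so this is not UB. */
static int counter;

static int next_digit(void)
{
    return ++counter;
}

static int combine(int first, int second)
{
    return first * 10 + second;
}

int unspecified_order(void)
{
    counter = 0;
    return combine(next_digit(), next_digit());
}

/* Implementation-defined: right-shifting a negative value. gcc and
   clang document sign extension, giving -4 here. */
int implementation_defined_shift(void)
{
    int v = -8;
    return v >> 1;
}
```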

Even in C23, the mandate for 2's-complement representation doesn't
imply a requirement for 2's-complement behavior on overflow.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */

Re: Undefined Behavior Optimizations in C

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Sat, 21 Jan 2023 11:54:43 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-069@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-041@comp.compilers> <23-01-062@comp.compilers> <23-01-065@comp.compilers> <23-01-067@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="61901"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize
Posted-Date: 21 Jan 2023 22:40:50 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Anton Ertl - Sat, 21 Jan 2023 11:54 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
>Take the function
>
>int add (int a, int b)
>{
> return a+b;
>}
>
>on an instruction set architecture which has only 64-bit
>arithmetic, such as POWER. This is translated by gcc,
>with optimization, to
>
> add 3,3,4
> extsw 3,3
> blr
>
>(which is an addition followed by a sign extension). The POWER ABI
>specifies that all values passed in registers are sign-extended,
>so the content of a register has the same value independent of
>the width of the signed integer it is being considered as.
>
>So, the compiler would be within its right to _not_ extend the sign
>of the result, because it could assume that no overflow occurs.
>This, however, would result in a violation of the ABI, so the
>compiler puts in the extra instruction just in case. If you
>replace int by long in the example above, the sign extension
>instruction is not generated.
>
>By comparison, MIPS gcc translates this to
>
> jr $31
> addu $2,$4,$5
>
>(note use of the delay slot), so no explicit sign extension is done,

What makes you think so? The definition of ADDU in MIPS IV Rev. 3.2
is pretty perverse, specifying an undefined result if one of the
operands is not a sign-extended 32-bit value; but if both operands are
to the instruction's liking, it produces a sign-extended 32-bit
result.

A programming note says:

|[ADDU] is appropriate for arithmetic which is not signed, such as
|address arithmetic, or integer arithmetic environments that ignore
|overflow, such as “C” language arithmetic.
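A rough C model of ADDU as described above; the helper names are hypothetical, the architecturally undefined result is represented by returning false, and the int32_t conversion assumes gcc/clang modulo behaviour. The result is defined only when both 64-bit inputs are sign-extended 32-bit values, and then it is the sign-extended low 32 bits of the sum.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 64-bit register holds a sign-extended 32-bit value. */
static bool sign_extended_32(int64_t r)
{
    return r == (int64_t)(int32_t)(uint32_t)r;
}

/* Returns false where MIPS IV leaves the ADDU result undefined;
   otherwise stores the sign-extended 32-bit wrapping sum in *rd. */
bool model_addu(int64_t rs, int64_t rt, int64_t *rd)
{
    if (!sign_extended_32(rs) || !sign_extended_32(rt))
        return false;
    uint32_t low = (uint32_t)rs + (uint32_t)rt; /* wraps modulo 2^32 */
    *rd = (int64_t)(int32_t)low;
    return true;
}
```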

One interesting aspect is that the Power ABI specifies sign-extension
rather than garbage-extension for passing around ints. Many other
ABIs are similar (e.g., the RISC-V ABI specifies sign extension, even
for unsigned ints), and AMD64 specifies zero-extension for both signed
and unsigned ints (and has instructions that generate zero-extended
results).

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Re: Undefined Behavior Optimizations in C

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: 864-117-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Sun, 22 Jan 2023 07:04:26 -0000 (UTC)
Organization: A noiseless patient Spider
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-070@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-041@comp.compilers> <23-01-062@comp.compilers> <23-01-065@comp.compilers> <23-01-067@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="70091"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize
Posted-Date: 22 Jan 2023 12:20:49 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Kaz Kylheku - Sun, 22 Jan 2023 07:04 UTC

On 2023-01-20, Thomas Koenig <tkoenig@netcologne.de> wrote:
> Alexei A. Frounze <alexfrunews@gmail.com> schrieb:
>
>> I believe in a case like this a modern C/C++ compiler reasons that x must be
>> less than the maximum representable value and it generates code according
>> to this, possibly removing dead code that depends on x being the maximum
>> representable value. If the compiler's assumption that x is less than the
>> maximum is wrong, it's perfectly fine, it's UB, any "broken" code generated
>> is allowed.
>
> There are cases when compilers don't even use this knowledge.
>
> Take the function
>
> int add (int a, int b)
> {
> return a+b;
> }
>
> on an instruction set architecture which has only 64-bit
> arithmetic, such as POWER. This is translated by gcc,
> with optimization, to
>
> add 3,3,4
> extsw 3,3
> blr
>
> (which is an addition followed by a sign extension). The POWER ABI
> specifies that all values passed in registers are sign-extended,
> so the content of a register has the same value independent of
> the width of the signed integer it is being considered as.
>
> So, the compiler would be within its right to _not_ extend the sign
> of the result, because it could assume that no overflow occurs.

Indeed. If overflow occurs, then there could be unexpected results if,
say, the result of the addition is converted to a 64-bit type.

Suppose that a 32-to-64-bit conversion is actually a no-op, because the
register values are assumed to be sign-extended, as the ABI says.

Then, say a and b are positive values that fit into the 32-bit
range, such that the a + b sum does not; it wraps negative
in 32 bits.

The programmer might thus expect (long) (a + b) to be
that wrapped negative value.

But since the addition is done in 64 bits, it doesn't overflow;
there is a positive 64-bit result. If conversion to long is a no-op,
a positive value of long will result, as if the expression were
(long) a + (long) b.
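The scenario can be sketched in portable C; the helper names are hypothetical, and int32_t and int64_t stand in for int and long on an LP64 target. The first function is what the register holds if the 32-bit add is done in 64 bits and never re-extended; the second is the wrapped value a programmer expecting 32-bit overflow would anticipate from (long) (a + b).

```c
#include <stdint.h>

/* The 64-bit register after a full-width add of two sign-extended
   32-bit inputs, with the conversion to long treated as a no-op. */
int64_t widened_without_extension(int32_t a, int32_t b)
{
    return (int64_t)a + (int64_t)b;
}

/* The "expected" wrapped result: add modulo 2^32 in unsigned
   arithmetic (well-defined), then narrow and widen. The narrowing is
   implementation-defined; gcc and clang reduce modulo 2^32. */
int64_t wrapped_then_widened(int32_t a, int32_t b)
{
    uint32_t low = (uint32_t)a + (uint32_t)b;
    return (int64_t)(int32_t)low;
}
```

With a = b = 2000000000 the first yields 4000000000 and the second -294967296; without overflow the two agree.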

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: Undefined Behavior Optimizations in C

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Sun, 22 Jan 2023 09:56:22 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-071@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-041@comp.compilers> <23-01-062@comp.compilers> <23-01-065@comp.compilers> <23-01-067@comp.compilers> <23-01-069@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="77450"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize
Posted-Date: 22 Jan 2023 12:42:35 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Anton Ertl - Sun, 22 Jan 2023 09:56 UTC

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>AMD64 specifies zero-extension for both signed
>and unsigned ints (and has instructions that generate zero-extended
>results).

Looking at <https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf>, I
find no such specification. However, compilers certainly behave in
that way. E.g., for

int add (int a, int b)
{ return a+b;
}

gcc generates:

0: 8d 04 37 lea (%rdi,%rsi,1),%eax
3: c3 retq

which zero-extends the result. This certainly rules out an ABI that
requires sign-extension for signed integers.

One interesting case is:

long add (unsigned a, long b)
{ return a+b;
}

which gcc compiles into

0: 89 ff mov %edi,%edi
2: 48 8d 04 37 lea (%rdi,%rsi,1),%rax
6: c3 retq

What's the point of the MOV instruction here? It performs a
32->64-bit zero extension of %rdi. So gcc apparently assumes that
passed operands are garbage-extended on AMD64. Or maybe gcc is just
cautious here. Another test:

unsigned bar(int x);

unsigned long foo(long x)
{ return bar(x);
}

gcc -O compiles this to:

0: 48 83 ec 08 sub $0x8,%rsp
4: e8 00 00 00 00 callq 9 <foo+0x9>
9: 89 c0 mov %eax,%eax
b: 48 83 c4 08 add $0x8,%rsp
f: c3 retq

There is no zero- or sign-extension when passing x to bar(), so the value
is passed garbage-extended. There is a zero extension for converting
the return value to unsigned long, so gcc assumes that the return value
of bar is not necessarily zero-extended.

Conclusion: In the System V ABI for AMD64, values are passed around
garbage-extended (in the general case).
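A minimal C sketch of what "garbage-extended" means for the two examples above, modelling a 64-bit register as a `uint64_t` whose upper 32 bits may hold arbitrary junk. The helper names (`garbage_extend`, `add_callee`, `bar_sees`, `foo_result`) are hypothetical, and the model only mirrors the gcc output shown; it is not taken from the ABI document.

```c
#include <stdint.h>

/* Hypothetical model: a 64-bit register holding a 32-bit value whose
   upper half contains arbitrary bits ("garbage-extended"). */
static uint64_t garbage_extend(uint32_t value, uint32_t junk)
{
    return ((uint64_t)junk << 32) | value;
}

/* long add(unsigned a, long b): the callee cannot trust the upper half
   of %rdi, so it zero-extends first (mov %edi,%edi), then adds (lea). */
static int64_t add_callee(uint64_t rdi, int64_t rsi)
{
    uint64_t a = (uint32_t)rdi;    /* 32->64-bit zero extension */
    return (int64_t)a + rsi;
}

/* unsigned long foo(long x) { return bar(x); }
   Passing x to bar(int) needs no instruction: bar reads only %edi. */
static uint32_t bar_sees(uint64_t rdi)
{
    return (uint32_t)rdi;          /* truncation: low 32 bits only */
}

/* Widening bar's unsigned result needs mov %eax,%eax, because the
   upper half of %rax may be garbage. */
static uint64_t foo_result(uint64_t rax_from_bar)
{
    return (uint32_t)rax_from_bar; /* zero-extended on the way out */
}
```

Without the masking step in `add_callee`, a junk upper half would be added into the 64-bit sum, which is presumably why gcc emits `mov %edi,%edi` rather than trusting the caller.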

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Re: Undefined Behavior Optimizations in C

<23-01-072@comp.compilers>

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: mar...@gkc.org.uk (Martin Ward)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Mon, 23 Jan 2023 17:12:14 +0000
Organization: Compilers Central
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-072@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-041@comp.compilers> <23-01-062@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="74330"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize
Posted-Date: 23 Jan 2023 13:27:46 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Martin Ward - Mon, 23 Jan 2023 17:12 UTC

On 18/01/2023 13:14, Spiros Bousbouras wrote:

> There's no assumption that UB (undefined behaviour) will not happen, both
> possibilities are accounted for.

The "assumption that UB will not happen" is shorthand for the idea
that any optimisation is valid if the optimised code is a refinement
of the unoptimised code for all initial states such that UB does not
occur. Equivalently, a proposed optimisation is valid if we represent
UB as "abort" (a statement which can be refined to anything) and the
optimised code is a refinement of the unoptimised code for all initial
states.
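To make the definition concrete, here is a toy, exhaustive check in C of the refinement condition for the `x + 1 > x` example discussed in this thread. It assumes a hypothetical 8-bit signed int so that every initial state can be enumerated; `outcome`, `original`, `fold_to_one`, `fold_to_zero` and `refines` are illustrative names, not part of any real compiler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy semantics (assumption): an 8-bit signed int in which overflow is UB.
   An outcome is either a value or UB, the latter standing for "abort". */
typedef struct { bool ub; int value; } outcome;

/* Unoptimised evaluation of (x + 1 > x). */
static outcome original(int8_t x)
{
    if (x == INT8_MAX)
        return (outcome){ .ub = true, .value = 0 };   /* overflow: UB */
    return (outcome){ .ub = false, .value = (int8_t)(x + 1) > x };
}

/* Candidate optimisation: fold the comparison to 1. */
static outcome fold_to_one(int8_t x)
{
    (void)x;
    return (outcome){ .ub = false, .value = 1 };
}

/* An invalid candidate, for contrast: fold the comparison to 0. */
static outcome fold_to_zero(int8_t x)
{
    (void)x;
    return (outcome){ .ub = false, .value = 0 };
}

/* `opt` refines `original` if, for every initial state where the original
   is defined, it produces the same value. States where the original is UB
   ("abort") impose no constraint. */
static bool refines(outcome (*opt)(int8_t))
{
    for (int i = INT8_MIN; i <= INT8_MAX; i++) {
        outcome o = original((int8_t)i);
        if (o.ub)
            continue;
        outcome p = opt((int8_t)i);
        if (p.ub || p.value != o.value)
            return false;
    }
    return true;
}
```

`refines(fold_to_one)` holds because the only state where the two could disagree (x == 127) is one where the original is UB; `refines(fold_to_zero)` fails because it changes the result for defined inputs.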

--
Martin

Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk
G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4

Re: Undefined Behavior Optimizations in C

<23-01-073@comp.compilers>

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Mon, 23 Jan 2023 18:50:31 -0800 (PST)
Organization: Compilers Central
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-073@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-041@comp.compilers> <23-01-062@comp.compilers> <23-01-063@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="16579"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, comment
Posted-Date: 26 Jan 2023 13:59:58 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-01-063@comp.compilers>
 by: gah4 - Tue, 24 Jan 2023 02:50 UTC

On Wednesday, January 18, 2023 at 3:55:49 PM UTC-8, David Brown wrote:

(snip)

> int x, y;
>
> if (x + 1 > x) y++;

OK, for now I'll consider just this one.

It reminds me of the only Fortran program I still remember from almost 50 years ago.
(I first started learning Fortran in summer 1972, and learned this one not long after.)

| SUBROUTINE RANDU (IX, IY, YFL)
| IY = IX * 65539
| IF (IY) 5,6,6
|5 IY = IY + 2147483647 + 1
|6 YFL = IY
| YFL = YFL * .4656613E-9
| RETURN
| END

To preserve the indentation, I put in the | characters.

Since Fortran doesn't have unsigned integers, programs use signed integers.

And a lot of Fortran programs get translated to C.

But first, I suspect that many C programmers don't know, or don't remember
if they once did, that signed (two's complement) integer overflow is UB.

And of those who know it is UB, I suspect many don't expect compilers to remove
statements that depend on it. Most C programmers don't carry around the
standard, and don't look up every operation.
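A short C illustration of the point, assuming a typical optimising compiler such as gcc or clang: the first function may legitimately be compiled to return true unconditionally, since the only input where it could be false (INT_MAX) overflows, while the second expresses the same intent without relying on overflow. Both function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Relies on signed overflow: because x + 1 overflowing is UB, an
   optimising compiler may fold this to "return true".
   Only call it with x < INT_MAX. */
static bool next_is_larger_ub(int x)
{
    return x + 1 > x;
}

/* Well-defined version: compare against INT_MAX instead of adding. */
static bool next_is_larger(int x)
{
    return x < INT_MAX;
}
```

gcc and clang also accept `-fwrapv`, which defines signed overflow to wrap; under that flag the first form returns false for `INT_MAX`.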

Now, C programmers should know about unsigned int and its properties.

But those translating Fortran programs to C won't always know when they
depend on it. RANDU is probably the most infamous random number generator,
and it was popular for many years. It comes from the IBM Scientific Subroutine
Package (SSP), which traces back to the early 1960s.

It conveniently depends on the fixed point overflow behavior of the multiply,
and then uses the IF to correct negative results.
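A sketch of how RANDU might be translated to C without depending on signed overflow, assuming 32-bit `int32_t`/`uint32_t` and returning IY while storing YFL through a pointer (an interface choice, not SSP's; `randu` is a hypothetical name). Doing the multiply in unsigned arithmetic makes the wraparound well defined, and clearing bit 31 is equivalent to the Fortran `IF (IY) 5,6,6` followed by adding 2147483647 + 1 to a negative result.

```c
#include <stdint.h>

/* RANDU step without signed-overflow UB (a sketch, not the SSP original).
   The Fortran relies on IX * 65539 wrapping as a 32-bit two's complement
   product, then adds 2**31 if the result is negative; together that is
   just the product reduced modulo 2**31. */
static int32_t randu(int32_t ix, float *yfl)
{
    uint32_t prod = (uint32_t)ix * 65539u;      /* wraps modulo 2**32 */
    int32_t iy = (int32_t)(prod & 0x7FFFFFFFu); /* clear bit 31 = add 2**31 */
    *yfl = (float)iy * 0.4656613e-9f;           /* scale by about 2**-31 */
    return iy;
}
```

Starting from IX = 1, successive calls give 65539, 393225, 1769499, ..., the familiar RANDU sequence.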

As far as I can tell, Fortran makes no claim about fixed point overflow,
either that it is undefined or that it is system dependent.

Fortran does allow for fixed point values in any radix greater than one.
There are requirements for the number of decimal digits that types can
represent.

There are also, as part of the C interoperability feature, fixed point types
(KINDs in Fortran terms) with specific bit widths.

Now, the original subject of this thread is the cost vs. benefit of such
optimizations. The benefit is not so obvious, but there is a cost when people
try to debug programs where things are optimized away.
[Gee, it's been a while since I thought about SSP. I believe that IBM wrote
it largely to give people code that would get reasonable numeric answers
with the 360's funky floating point. Then there were a few odds and ends
like RANDU. They never promised the code would work on anything other
than IBM 360 Fortran. -John]

Re: Undefined Behavior Optimizations in Fortran

<23-01-074@comp.compilers>

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: sgk...@REMOVEtroutmask.apl.washington.edu (Steven G. Kargl)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in Fortran
Date: Thu, 26 Jan 2023 21:12:12 -0000 (UTC)
Organization: A noiseless patient Spider
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-074@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-041@comp.compilers> <23-01-062@comp.compilers> <23-01-063@comp.compilers> <23-01-073@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="90413"; mail-complaints-to="abuse@iecc.com"
Keywords: Fortran, comment
Posted-Date: 26 Jan 2023 21:05:52 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Steven G. Kargl - Thu, 26 Jan 2023 21:12 UTC

On Mon, 23 Jan 2023 18:50:31 -0800, gah4 wrote:
>
> As far as I can tell, Fortran makes no claims regarding
> fixed point overflow, being undefined or system dependent.
>

It makes claims. F2018 (18-007r1.pdf), p. 148.

The execution of any numeric operation whose result is
not defined by the arithmetic used by the processor is
prohibited.

If you want to go back to F66, one finds in Sec. 6.4,
"Evaluation of Expressions":

No element may be evaluated whose value is not
mathematically defined.
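In C terms, respecting a rule like this means testing whether a product is representable before computing it, rather than computing it and inspecting the result. A minimal sketch for the RANDU multiply, assuming a 32-bit `int32_t` and a positive constant multiplier; `randu_mul_checked` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Computes ix * 65539 only if the product fits in int32_t; otherwise
   returns false without performing the overflowing multiply. The bounds
   test is the usual one for a positive constant multiplier. */
static bool randu_mul_checked(int32_t ix, int32_t *out)
{
    const int32_t k = 65539;
    if (ix > INT32_MAX / k || ix < INT32_MIN / k)
        return false;          /* product would not be representable */
    *out = ix * k;
    return true;
}
```

Under this check the RANDU step from IX = 65539 is rejected, since 65539 * 65539 exceeds 2**31 - 1; SSP's code relies on exactly such an out-of-range multiply.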

--
steve
[Like I said, IBM didn't promise that SSP would work anywhere else. -John]

Re: Undefined Behavior Optimizations in Fortran

<23-01-075@comp.compilers>

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in Fortran
Date: Thu, 26 Jan 2023 17:50:32 -0800 (PST)
Organization: Compilers Central
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-075@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-041@comp.compilers> <23-01-062@comp.compilers> <23-01-063@comp.compilers> <23-01-073@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="94396"; mail-complaints-to="abuse@iecc.com"
Keywords: Fortran, standards, comment
Posted-Date: 26 Jan 2023 21:24:07 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-01-073@comp.compilers>
 by: gah4 - Fri, 27 Jan 2023 01:50 UTC

On Thursday, January 26, 2023 at 11:00:02 AM UTC-8, gah4 wrote:

(snip)

> Now, the original subject of this thread, is the cost vs. benefit of such
> optimizations. Not so obvious the benefit, but there is a cost when people
> try to debug programs where things are optimized away.

> [Gee, it's been a while since I thought about SSP. I believe that IBM wrote
> it largely to give people code that would get reasonable numeric answers
> with the 360's funky floating point. Then there were a few odds and ends
> like RANDU. They never promised the code would work on anything other
> than IBM 360 Fortran. -John]

It seems that there is also an SSP for the IBM 1130, which is a 16-bit binary
machine, so probably also with a 32-bit two's complement integer.

There is a PL/I SSP, too, but it seems not to have RANDU.

When I was in high school, we had CALL/OS, with PL/I, and
I used the RANDU algorithm, as I didn't have any other one.

As you say, it wasn't promised to work with any other Fortran,
but others did try to stay compatible with IBM.
(But often with non-IBM extensions.)

Fortran systems that I know of are good at ignoring fixed point
overflow, though they often trap or count floating point overflow.
[I took a look, RANDU on the 1130 repeated after 2^13 items.
Yow. -John]

Re: Undefined Behavior Optimizations in C

<23-01-076@comp.compilers>

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: dave_tho...@comcast.net
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Sat, 28 Jan 2023 10:35:18 -0500
Organization: A noiseless patient Spider
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-076@comp.compilers>
References: <23-01-009@comp.compilers> <23-01-011@comp.compilers> <23-01-012@comp.compilers> <23-01-017@comp.compilers> <23-01-027@comp.compilers> <23-01-032@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="12030"; mail-complaints-to="abuse@iecc.com"
Keywords: C, Fortran, history
Posted-Date: 29 Jan 2023 11:56:51 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: dave_tho...@comcast.net - Sat, 28 Jan 2023 15:35 UTC

On Tue, 10 Jan 2023 17:32:28 +0100, David Brown
<david.brown@hesbynett.no> wrote:

[ UB example: generated code assumes bool parameter is only 0 or 1 and
mishandles anything else, but misdeclared call passes 2 ]

> This is clearly wrong - clearly undefined behaviour. And the result
> would be a formatted disk. But there is nothing wrong with the
> compiler's generated code. I have seen other occasions when compilers
> have made code with booleans that appear to be both true and false, or
> neither true nor false, as a result of undefined behaviour setting the
> underlying memory to something other than 0 or 1, simply because that
> was the result of the most efficient code.
>
FWIW -- back in the 80s the VAX FORTRAN compiler checked only the _low
bit_ of a LOGICAL variable (stored as a byte or a 4-byte word, I
forget which) because this was faster. People often reused (scarce and
expensive) memory then and such a variable might accidentally get set
to a value other than 0 or 1, producing surprising/confusing results.
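A small C model of the effect described above, using a hypothetical LOGICAL stored in an `unsigned char` (plain bytes, so the C code itself has no UB). Testing only the low bit and testing for nonzero agree on 0 and 1 but disagree on a stray value such as 2, which one test sees as false and the other as true.

```c
#include <stdbool.h>

/* Hypothetical model of a LOGICAL stored in a byte. */

/* The faster test described above: look only at the low bit. */
static bool true_by_low_bit(unsigned char v)
{
    return (v & 1u) != 0;
}

/* The conventional test: any nonzero value is true. */
static bool true_by_nonzero(unsigned char v)
{
    return v != 0;
}
```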

In those days DEC was vigorously opposing C (they had BLISS instead)
and Unix, so there was no DEC C compiler, though I'm pretty sure there
was a DECUS (user group) one, and of course there was not yet a C
standard at all, much less one including bool.
