Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

Digital circuits are made from analog parts. -- Don Vonada


devel / comp.compilers / Re: Undefined Behavior Optimizations in C

SubjectAuthor
* Undefined Behavior Optimizations in CLucian Popescu
+- RE: Undefined Behavior Optimizations in CNuno Lopes
+* Re: Undefined Behavior Optimizations in CSpiros Bousbouras
|`* Re: Undefined Behavior Optimizations in Cgah4
| +- Re: Undefined Behavior Optimizations in CAnton Ertl
| `* Re: Undefined Behavior Optimizations in CDavid Brown
|  +* Re: Undefined Behavior Optimizations in Cgah4
|  |+- Re: Undefined Behavior Optimizations in Cgah4
|  |`* Re: Undefined Behavior Optimizations in CSpiros Bousbouras
|  | `- Re: Undefined Behavior Optimizations in Cantispam
|  `* Re: Undefined Behavior Optimizations in CKaz Kylheku
|   +* Re: Re: Undefined Behavior Optimizations in CJon Chesterfield
|   |+* Re: Undefined Behavior Optimizations in CThomas Koenig
|   ||`* Re: Undefined Behavior Optimizations in CKaz Kylheku
|   || +- Re: Undefined Behavior Optimizations in CKeith Thompson
|   || +- Re: Undefined Behavior Optimizations in CThomas Koenig
|   || `- Re: Undefined Behavior Optimizations in CKaz Kylheku
|   |`* Re: Undefined Behavior Optimizations in CDavid Brown
|   | `* Re: Undefined Behavior Optimizations in CSpiros Bousbouras
|   |  +* Re: Undefined Behavior Optimizations in CDavid Brown
|   |  |+- Re: Undefined Behavior Optimizations in Cgah4
|   |  |+* Re: Undefined Behavior Optimizations in Cgah4
|   |  ||`- Re: Undefined Behavior Optimizations in CKeith Thompson
|   |  |`* Re: Undefined Behavior Optimizations in Cgah4
|   |  | +- Re: Undefined Behavior Optimizations in FortranSteven G. Kargl
|   |  | `- Re: Undefined Behavior Optimizations in Fortrangah4
|   |  +* Re: Undefined Behavior Optimizations in CAlexei A. Frounze
|   |  |`* Re: Undefined Behavior Optimizations in CThomas Koenig
|   |  | +* Re: Undefined Behavior Optimizations in CAnton Ertl
|   |  | |`- Re: Undefined Behavior Optimizations in CAnton Ertl
|   |  | `- Re: Undefined Behavior Optimizations in CKaz Kylheku
|   |  `- Re: Undefined Behavior Optimizations in CMartin Ward
|   `* Re: Undefined Behavior Optimizations in CDavid Brown
|    +* Re: Undefined Behavior Optimizations in Cgah4
|    |`* Re: Undefined Behavior Optimizations in CDavid Brown
|    | `- Re: Undefined Behavior Optimizations in Cgah4
|    `- Re: Undefined Behavior Optimizations in Cdave thompson 2
+- Re: Undefined Behavior Optimizations in CAnton Ertl
`- Re: Undefined Behavior Optimizations in CKaz Kylheku

Pages:12
Undefined Behavior Optimizations in C

<23-01-009@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=637&group=comp.compilers#637

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: luci...@ctrl-c.club (Lucian Popescu)
Newsgroups: comp.compilers
Subject: Undefined Behavior Optimizations in C
Date: 5 Jan 2023 10:05:49 +0000
Organization: Compilers Central
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-009@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="50451"; mail-complaints-to="abuse@iecc.com"
Keywords: semantics, optimize
Posted-Date: 05 Jan 2023 11:51:36 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Lucian Popescu - Thu, 5 Jan 2023 10:05 UTC

Hi,

I'm currently working on an academic project that analyzes the speedup gain of
Undefined Behavior Optimizations in C. I argue that UB Optimizations introduce
more risks than actual speedup gains. This happens because most of the time
the compiler does not have the context to understand the intention of the
programmer in code that contains UB. For example this fast inverse square root
function [1] triggers UB at this line: i = * ( long * ) &y;. A "smart"
compiler could delete this line as an optimization because it contains UB.

Don't get me wrong, I'm not saying that the compiler should magically
understand situations where the programmer makes UB mistakes and then
repair them. What I'm saying is that the compiler should compile the code
"as is" without making smart optimization transformations that break the
intention of the programmer.

To test the theory that the UB Optimizations introduce more risks than speedup
gains, I take OpenBSD (for its focus on security and robustness) and compile
it
on one hand with UB Optimizations turned on and with UB Optimizations turned
off. If I will find out that the speedup gain is not so big (I don't know what big
means at the moment) then the UB Optimizations don't make sense in the OS.
Otherwise, it means that I was wrong, but I will still want to see what
security
risks they introduce on the respective OS setup.

My current progress is here [2]. I did not start the technical work, ATM I only
have the research proposal. I reached you to see if you have any feedback
on this proposal. Is it a manageable goal? Do you see ways in which it can
be improved? Does it suck? etc, etc.

Lucian Popescu

[1] https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overview_of_the_code
[2] https://tildegit.org/lucic71/dissertation/src/branch/master/TSW/tsw.pdf

RE: Undefined Behavior Optimizations in C

<23-01-010@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=638&group=comp.compilers#638

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: nuno.lo...@tecnico.ulisboa.pt (Nuno Lopes)
Newsgroups: comp.compilers
Subject: RE: Undefined Behavior Optimizations in C
Date: Thu, 5 Jan 2023 10:24:13 -0000
Organization: Compilers Central
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-010@comp.compilers>
References: <23-01-009@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="50911"; mail-complaints-to="abuse@iecc.com"
Keywords: semantics, optimize
Posted-Date: 05 Jan 2023 11:52:36 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Content-Language: en-gb
 by: Nuno Lopes - Thu, 5 Jan 2023 10:24 UTC

Hi Lucian,

I've a fair bit of experience with UB-related optimizations in LLVM. I don't
think the speedup and code size reduction (also important) from certain UB
exploitation is negligible. I think a better project would be to quantify
the impact of each of the UB features.
As a counter argument to your thesis, see the slides of my presentation
"Semantics for Compiler IRs: Undefined Behavior is not Evil!":
https://web.ist.utl.pt/nuno.lopes/pres/ub-vmcai19.pptx

See an example here that gives a 39% speedup in a loop:
https://web.ist.utl.pt/nuno.lopes/pres/poison-llvm16.pptx
More data on how alias analysis loves UB:
https://web.ist.utl.pt/nuno.lopes/pres/ub-vmcai19.pptx

There's a key implementation difficulty in doing your study. While it's
feasible to disable most UB exploitation at the clang side (not with
compiler switches, you need to change the code). At the LLVM side is
trickier to do a fair comparison. Because optimizations themselves produce
UB code, but that's fine; it's just their way of expressing invariants. But
then you cannot distinguish between what's an invariant and what's an
exploitation of UB in the program.

Don't get me wrong, I would be very keen to read such a study. But it's a
lot more work than you think to do it right.

Do ping me privately if you want more information/pointers.

Nuno

-----Original Message-----
From: Lucian Popescu
Sent: Thursday, January 5, 2023 10:06 AM
Subject: Undefined Behavior Optimizations in C

Hi,

I'm currently working on an academic project that analyzes the speedup gain
of Undefined Behavior Optimizations in C. I argue that UB Optimizations
introduce more risks than actual speedup gains. This happens because most of
the time the compiler does not have the context to understand the intention
of the programmer in code that contains UB. ...

Re: Undefined Behavior Optimizations in C

<23-01-011@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=639&group=comp.compilers#639

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: spi...@gmail.com (Spiros Bousbouras)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Thu, 5 Jan 2023 18:06:53 -0000 (UTC)
Organization: A noiseless patient Spider
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-011@comp.compilers>
References: <23-01-009@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="65170"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, semantics
Posted-Date: 05 Jan 2023 13:13:05 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-01-009@comp.compilers>
X-Organisation: Weyland-Yutani
 by: Spiros Bousbouras - Thu, 5 Jan 2023 18:06 UTC

On 5 Jan 2023 10:05:49 +0000
"Lucian Popescu" <lucic71@ctrl-c.club> wrote:
> Hi,
>
> I'm currently working on an academic project that analyzes the speedup gain of
> Undefined Behavior Optimizations in C. I argue that UB Optimizations introduce
> more risks than actual speedup gains. This happens because most of the time
> the compiler does not have the context to understand the intention of the
> programmer in code that contains UB. For example this fast inverse square root
> function [1] triggers UB at this line: i = * ( long * ) &y;. A "smart"
> compiler could delete this line as an optimization because it contains UB.
>
> Don't get me wrong, I'm not saying that the compiler should magically
> understand situations where the programmer makes UB mistakes and then
> repair them. What I'm saying is that the compiler should compile the code
> "as is" without making smart optimization transformations that break the
> intention of the programmer.

Do you have a rigorous definition of what it means for a compiler to

compile the code "as is" without making smart optimization
transformations that break the intention of the programmer.

? Without such a definition , I don't think you are making a meaningful
distinction and I think it would be hard to impossible to come up with
such a definition.

> To test the theory that the UB Optimizations introduce more risks than
> speedup gains,

Isn't this comparing apples and oranges ?

> I take OpenBSD (for its focus on security and robustness) and compile it on
> one hand with UB Optimizations turned on and with UB Optimizations turned
> off. If I will find out that the speedup gain is not so big (I don't know
> what big means at the moment) then the UB Optimizations don't make sense in
> the OS. Otherwise, it means that I was wrong, but I will still want to see
> what security risks they introduce on the respective OS setup.

Given the large variety of tasks that an operating system can be used for ,
I'm curious to see what you are going to test on OpenBSD to determine speedup
gain. I think it would make more sense to test with chess engines : prepare
a collection of chess positions and see how many nodes per second an engine
can calculate for a given position depending on how the engine was compiled.

Doing analogous experiments with graphics code or numerical analysis code
would also be interesting. But an operating system , I don't know , it seems
too general.

> My current progress is here [2]. I did not start the technical work, ATM I only
> have the research proposal. I reached you to see if you have any feedback
> on this proposal. Is it a manageable goal? Do you see ways in which it can
> be improved? Does it suck? etc, etc.
>
> Lucian Popescu
>
> [1] https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overview_of_the_code
> [2] https://tildegit.org/lucic71/dissertation/src/branch/master/TSW/tsw.pdf

Re: Undefined Behavior Optimizations in C

<23-01-012@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=640&group=comp.compilers#640

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Thu, 5 Jan 2023 16:22:07 -0800 (PST)
Organization: Compilers Central
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-012@comp.compilers>
References: <23-01-009@comp.compilers> <23-01-011@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="36560"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize
Posted-Date: 05 Jan 2023 19:29:10 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-01-011@comp.compilers>
 by: gah4 - Fri, 6 Jan 2023 00:22 UTC

On Thursday, January 5, 2023 at 10:13:08 AM UTC-8, Spiros Bousbouras wrote:
> On 5 Jan 2023 10:05:49 +0000
> "Lucian Popescu" <luc...@ctrl-c.club> wrote:

> > I'm currently working on an academic project that analyzes the speedup gain of
> > Undefined Behavior Optimizations in C.
(snip)

> > To test the theory that the UB Optimizations introduce more risks than
> > speedup gains,

> Isn't this comparing apples and oranges ?

Probably.

You can quantify speed-up, but it is harder to quantify risk.

You might be able to quantify debug time, and how much longer
it takes to debug a program with such behavior.

Most important when debugging, is that you can trust the compiler to
do what you said. That they don't, has always been part of
optimization, but these UB make it worse.

> > I take OpenBSD (for its focus on security and robustness) and compile it on
> > one hand with UB Optimizations turned on and with UB Optimizations turned
> > off.
(snip)

> I think it would make more sense to test with chess engines : prepare
> a collection of chess positions and see how many nodes per second an engine
> can calculate for a given position depending on how the engine was compiled.

Sounds good to me. Big enough, but not too big. And something that someone
always wants to be faster.

> Doing analogous experiments with graphics code or numerical analysis code
> would also be interesting. But an operating system , I don't know , it seems
> too general.

My thought would be all of SPEC, if you can avoid the funny SPEC rules on
using the numbers.

Numerical analysis code sounds interesting, as I don't know how much
the problem affects floating point code.

I haven't thought about this so recently. Do compilers give warnings
when they remove such UB code?

Re: Undefined Behavior Optimizations in C

<23-01-014@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=642&group=comp.compilers#642

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!xmission!usenet.csail.mit.edu!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Fri, 06 Jan 2023 07:47:55 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-014@comp.compilers>
References: <23-01-009@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="29210"; mail-complaints-to="abuse@iecc.com"
Keywords: semantics, optimize
Posted-Date: 06 Jan 2023 11:57:45 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Anton Ertl - Fri, 6 Jan 2023 07:47 UTC

"Lucian Popescu" <lucic71@ctrl-c.club> writes:
>I'm currently working on an academic project that analyzes the speedup gain of
>Undefined Behavior Optimizations in C.

Some earlier work:

@InProceedings{wang+12,
author = {Xi Wang and Haogang Chen and Alvin Cheung and Zhihao Jia and Nickolai Zeldovich and M. Frans Kaashoek},
title = {Undefined Behavior: What Happened to My Code?},
booktitle = {Asia-Pacific Workshop on Systems (APSYS'12)},
OPTpages = {},
year = {2012},
url1 = {http://homes.cs.washington.edu/~akcheung/getFile.php?file=apsys12.pdf},
url2 = {http://people.csail.mit.edu/nickolai/papers/wang-undef-2012-08-21.pdf},
OPTannote = {}
}

The most interesting part wrt. topic is section 3.3: They compiled
SPECint 2006 with gcc and clang, with either the default options, or
with defining the behaviour with the options -fno-strict-overflow (or
-fwrapv) -fno-delete-null-pointer-checks -fno-strict-aliasing, which
disables most of what you call "UB Optimizations" in the compiler
versions they used (and probably still disables most "UB
Optimizations"; there have been new -fno-* flags added, corresponding
to some of the new "UB Optimizations", but I expect that the early "UB
Optimizations" have the most impact (you would not start with
low-impact "optimizations").

The results were that in 10 of 12 programs, there was no significant
performance difference. In 2 of the 12 programs, there were speed
differences by factors 1.072, 1.09, 1.063 and 1.118 (depending on
program and compiler used).

However, they also found that these speed differences are due to a
single place in the source code of each of the two programs, and that
a small change to the source code of each program could eliminate the
speed difference.

@InProceedings{ertl15kps,
author = {M. Anton Ertl},
title = {What every compiler writer should know about programmers},
crossref = {kps15},
pages = {112--133},
url = {http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf},
abstract = {In recent years C compiler writers have taken the
attitude that tested and working production (i.e.,
\emph{conforming} according to the C standard) C
programs are buggy if they contain undefined
behavior, and they feel free to compile these
programs (except benchmarks) in a way that they no
longer work. The justification for this attitude is
that it allows C compilers to optimize better. But
while these ``optimizations'' provide a factor 1.017
speedup with Clang-3.1 on SPECint 2006, for
non-benchmarks it also has a cost: if we go by the
wishes of the compiler maintainers, we have to
``fix'' our working, conforming C programs; this
requires a substantial effort, and can result in
bigger and slower code. The effort could otherwise
be invested in source-level optimizations by
programmers, which can have a much bigger effect
(e.g., a factor $>2.7$ for Jon Bentley's traveling
salesman program). Therefore, optimizations that
assume that undefined behavior does not exist are a
bad idea not just for security, but also for the
performance of non-benchmark programs.}
}

@Proceedings{kps15,
title = {18. Kolloquium Programmiersprachen und Grundlagen der Programmierung (KPS'15)},
booktitle = {18. Kolloquium Programmiersprachen und Grundlagen der Programmierung (KPS'15)},
year = {2015},
key = {KPS '15},
editor = {Jens Knoop and M. Anton Ertl},
url = {http://www.complang.tuwien.ac.at/kps2015/proceedings/},
url-pdf = {http://www.complang.tuwien.ac.at/kps2015/proceedings/kps15.pdf}
}

In section 4.2 I use the same way to determine the effectiveness of
"UB Optimizations" (called "optimizations" (with quotes) in the
paper), but with a longer list of flags for gcc-5.2.0:

-fno-aggressive-loop-optimizations -fno-unsafe-loop-optimizations
-fno-delete-null-pointer-checks -fno-strict-aliasing
-fno-strict-overflow -fno-isolate-erroneous-paths-dereference
-fwrapv

I used 8 different versions of Jon Bentley's Traveling Salesman
program translated to C (from Pascal). I found that for clang-3.5 the
flags made no difference (same binary) for any of the programs, while
for gcc-5.2.0 they made no difference in the binary for three of the
programs and no significant speed difference for two more programs.
For the remaining programs, the "speedups" from "UB Optimizations"
were by a factor of 1.04, 0.95, and 1.02.

It contrasts this with the speedup seen from source-level (manual)
optimizations: The 8 programs were successive steps in manual
optimization. The speedup seen here was a factor 2.7 when using
gcc-5.2.0 -O3 and more for other compilers.

>I argue that UB Optimizations introduce
>more risks than actual speedup gains.

It's hard to quantify the risks.

My argument has been that it is cheaper and much more effective to
perform manual optimization than to ensure that a C program contains
no undefined behaviour.

>Don't get me wrong, I'm not saying that the compiler should magically
>understand situations where the programmer makes UB mistakes and then
>repair them. What I'm saying is that the compiler should compile the code
>"as is" without making smart optimization transformations that break the
>intention of the programmer.

To which the advocates of "UB Optimizations" react by claiming that
they don't know the intention. So I have written a followup paper
that discusses this aspect in more depth:

@InProceedings{ertl17kps,
author = {M. Anton Ertl},
title = {The Intended Meaning of \emph{Undefined Behaviour}
in {C} Programs},
crossref = {kps17},
pages = {20--28},
url = {http://www.complang.tuwien.ac.at/papers/ertl17kps.pdf},
url2 = {http://www.kps2017.uni-jena.de/proceedings/kps2017_submission_5.pdf},
abstract = {All significant C programs contain undefined
behaviour. There are conflicting positions about how
to deal with that: One position is that all these
programs are broken and may be compiled to arbitrary
code. Another position is that tested and working
programs should continue to work as intended by the
programmer with future versions of the same C
compiler. In that context the advocates of the first
position often claim that they do not know the
intended meaning of a program with undefined
behaviour. This paper explores this topic in greater
depth. The goal is to preserve the behaviour of
existing, tested programs. It is achieved by letting
the compiler define a consistent mapping of C
operations to machine code; and the compiler then
has to stick to this behaviour during optimizations
and in future releases.}
}

@Proceedings{kps17,
title = {19. Kolloquium Programmiersprachen und Grundlagen der Programmierung (KPS'17)},
booktitle = {19. Kolloquium Programmiersprachen und Grundlagen der Programmierung (KPS'17)},
year = {2017},
key = {KPS '17},
editor = {Wolfram Amme and Thomas Heinze},
url = {http://www.kps2017.uni-jena.de/kps2017_proceedings.html},
url-pdf = {http://www.kps2017.uni-jena.de/proceedings/kps2017.pdf}
}

>My current progress is here [2]. I did not start the technical work, ATM I only
>have the research proposal. I reached you to see if you have any feedback
>on this proposal.
[[2] https://tildegit.org/lucic71/dissertation/src/branch/master/TSW/tsw.pdf]

I have read Section III and IV of this paper.

Section III: You cite my 2015 paper above. About SPECInt 2006, I
cited only Wang et al.'s paper (and compute the overall SPECInt
difference based on their numbers). For the traveling salesman
program, the speedup factor of 2.7 (or more, depending on compiler and
optimization level) is for source-level (i.e., manual) optimization,
not "UB Optimization". The numbers for "UB Optimization" are, as
mentioned above, usually a factor of 1, and in the three cases where
there is a difference, it's a factor 1.04, 0.95, and 1.02, as
mentioned above.

Section IV: You have the beginnings of a plan to evaluate the
performance; you leave the question of benchmarks open until you know
what changes in the resulting code are due to "UB Optimization". It's
not clear what the point of such a benchmark selection would be. It
certainly would not make the selection more representative, and would
have the smell of cherry-picking. Also, "UB Optimization" may cause
changes in thousands of places in the code of OpenBSD, but only few of
those places will be in hot code and be relevant for performance.


Click here to read the complete article
Re: Undefined Behavior Optimizations in C

<23-01-015@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=643&group=comp.compilers#643

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!xmission!usenet.csail.mit.edu!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Fri, 06 Jan 2023 08:41:49 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-015@comp.compilers>
References: <23-01-009@comp.compilers> <23-01-011@comp.compilers> <23-01-012@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="29736"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, semantics
Posted-Date: 06 Jan 2023 12:00:03 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Anton Ertl - Fri, 6 Jan 2023 08:41 UTC

gah4 <gah4@u.washington.edu> writes:
>On Thursday, January 5, 2023 at 10:13:08 AM UTC-8, Spiros Bousbouras wrote:
>> On 5 Jan 2023 10:05:49 +0000
>> "Lucian Popescu" <luc...@ctrl-c.club> wrote:
>
>> > I'm currently working on an academic project that analyzes the speedup gain of
>> > Undefined Behavior Optimizations in C.
>(snip)
>
>> > To test the theory that the UB Optimizations introduce more risks than
>> > speedup gains,
>
>> Isn't this comparing apples and oranges ?
>
>Probably.

You can compare apples and oranges: How many can you buy for, say, EUR
5? What's their nutritional value? etc.

In a similar way, we can compare the advantages and costs of "UB
Optimization".

The advantages claimed by the faithful are speedups (usually
unquantified, to let the the audience imagine high speedups). Wang et
al. has done some quantification work, as have I; the results were
that the speedups were small, or sometimes even slowdowns.

Among the costs are that programmers have to change your program to
eliminate standard-undefined behaviour.

First you have to find that behaviour. The "UB Optimization"
compilers have added "sanitizers" to let you find it; however,
sanitizers work at run-time, so they won't find cases where, e.g., the
"UB Optimization" "optimized" away a bounds-check preventing a buffer
overflow unless there is a test case that tries to perform this buffer
overflow.

And once you have found the behaviour, you have to change it. This
not only costs developers time, it can also cost performance. E.g.,
in Section 4.3 of
<https://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf>,
I estimate that converting Gforth to C without undefined behaviour
would cause a slowdown by a factor >3; Gforth is currently compiled
with as few "UB Optimizations" as possible (by using flags like
-fwrapv and -fno-... flags). I expect that removing these flags will
give a speedup by approximately a factor of 1. For those who believe
differently, I have repeatedly posted a challenge
<2017Aug18.152854@mips.complang.tuwien.ac.at>

|Write a proof-of-concept Forth interpreter in the language you
|advocate that runs at least one of bubble-sort, matrix-mult or sieve
|from bench/forth in
|<http://www.complang.tuwien.ac.at/forth/bench.zip>

We can then check whether the language you advocate can compete with a
low-level language by benchmarking your interpreter against Gforth.

We can compare the advantages to the costs of "UB Optimization" in the
currency of development time:

How much development time does it cost to achieve as much speedup as
the UB optimizations give?

How much development time does it cost to add test cases for
sanitizing, to run all the different sanitizers, to remove the
undefined behaviour that you find, and to add manual optimizations
that compensate any performance regressions that may turn up in that
process?

My expectation is that the latter development cost would often be far
greater (and, in the case of the Forth interpreter, I expect that the
performance regression compensation is not possible), but it would be
nice if we had some empirical results for that.

>Most important when debugging, is that you can trust the compiler to
>do what you said. That they don't, has always been part of
>optimization

If a compiler produces something that behaves differently, it's not an
optimization. One usually considers the I/O (but not timing)
behaviour of the program when evaluating behaviour. When using a
single-step debugger, you see execution at a more fine-grained level
than the optimization was designed for. One approach is to debug the
unoptimized program, and enable optimization afterwards. If it's a
proper optimization, the I/O behaviour will be unchanged.

>but these UB make it worse.

"UB optimizations" assume that programs don't perform undefined
behaviour, and they are transformations that are proper optimizations
for such programs. The problem is that this assumption is usually
wrong.

>My thought would be all of SPEC

SPEC CPU programs are not representative in many respects:

First, they are chosen to be standard-compliant; it's not clear how
SPEC tries to ensure this, but as the 464.h264ref function SATD shows,
the process is not perfect. Still, a program that uses one of the
flags for disabling "UB optimization" is going to be rejected by SPEC,
so SPEC programs cannot be used for evaluating all the costs of "UB
optimizations".

Moreover, the "UB optimization" compilers are tuned for SPEC CPU: They
compile the undefined behaviour in SPEC CPU programs (e.g., SATD) as
intended, while they miscompile a non-SPEC CPU program with the same
undefined behaviour (see gcc bug 66875).

And given that the compiler maintainers (and others) evaluate the
performance of the compilers using SPEC CPU, the compiler maintainers
are motivated to add optimizations that speed up a SPEC CPU program;
e.g., if an "UB optimization idea" breaks a test case one one of the
tested programs (for gcc those reportedly nowadays include all
automatically testable Debian programs, which is good), but benefits a
SPEC CPU program, the compiler maintainer will be motivated to refine
the criteria for applying the optimizations until the breakage
disappears while still applying for the SPEC CPU program; if the "UB
optimization" benefits only programs outside the relevant benchmarks,
the compiler maintainer will be less likely to spend as much time on
it. As a result, I expect to see more benefits from "UB
Optimizations" for SPEC CPU than for, say, code that was written after
the compiler was released (and was not tuned for the compiler).

>I haven't thought about this so recently. Do compilers give warnings
>when they remove such UB code?

No. The faithful claim that it is impossible, but I don't think so:

The compiler could internally compile the program twice: Once with the
flags as given, and once with all the flags that the compiler has for
defining undefined behaviour (-fwrapv and the various
-fno-... options). If there is a difference in the resulting code,
produce a warning. That warning could be for something that is a
proper optimization for this particular program (i.e., a false
positive), or it could be for miscompilation (a true positive); that's
up to the developer to sort out.

Once we have that framework in place, we can look at the false
positives and refine it. E.g., there is a claim that many false
negatives come from macro evaluations that are possible at compile
time. In that case, one could add something to the source code that
suppresses the warning (just like there is "__attribute__((unused))"
for suppressing warnings about unused variables.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Re: Undefined Behavior Optimizations in C

<23-01-017@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=645&group=comp.compilers#645

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!aioe.org!adore2!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Fri, 6 Jan 2023 16:12:25 +0100
Organization: A noiseless patient Spider
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-017@comp.compilers>
References: <23-01-009@comp.compilers> <23-01-011@comp.compilers> <23-01-012@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="30902"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, semantics
Posted-Date: 06 Jan 2023 12:03:27 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-01-012@comp.compilers>
Content-Language: en-GB
 by: David Brown - Fri, 6 Jan 2023 15:12 UTC

On 06/01/2023 01:22, gah4 wrote:
> On Thursday, January 5, 2023 at 10:13:08 AM UTC-8, Spiros Bousbouras wrote:
>> On 5 Jan 2023 10:05:49 +0000
>> "Lucian Popescu" <luc...@ctrl-c.club> wrote:
>
>>> I'm currently working on an academic project that analyzes the speedup gain of
>>> Undefined Behavior Optimizations in C.
> (snip)
>
>>> To test the theory that the UB Optimizations introduce more risks than
>>> speedup gains,
>
>> Isn't this comparing apples and oranges ?
>
> Probably.
>
> You can quantify speed-up, but it is harder to quantify risk.
>
> You might be able to quantify debug time, and how much longer
> it takes to debug a program with such behavior.
>
> Most important when debugging, is that you can trust the compiler to
> do what you said. That they don't, has always been part of
> optimization, but these UB make it worse.

The trouble with undefined behaviour is that, in general, you cannot
trust the compiler to "do what you say" because it cannot know what you
have said.

A computer language like C is defined by its standard. This says what
particular combinations of characters in the source code actually mean.
If what you write does not fit the specified and documented patterns
(or the pattern is explicitly labelled "undefined behaviour"), then it
does not mean anything at all.

So if you write "give me some prime numbers" as your C code, it means
nothing and the compiler can't help you. If you write "int * p = 0; int
x = *p;", it means nothing and the compiler can't help you.

(Well, the compiler might be able to give helpful error messages!)

When you write "x = *p;", you are saying to the compiler "It is a fact
that p is valid pointer to data of a type compatible to *p, all the
constraints required for the assignment operation are met, there is no
partial overlap between x and *p, there are no data races, and no range
errors converting any floating point values. Given that, act as though
the value of x is now equal to *p after any required conversions."

You might /think/ you are saying "read the value at address p, and store
it in the memory reserved for x".

When you write "x = (y * 30) / 15;" (for "int" x and y), you might
/think/ you are asking the compiler to multiply by 30 and then divide by
15. But you are actually telling it that there would be no overflow if
it were to multiply y by 30, and thus it can use simple mathematical
equalities to reduce the expression to "x = y * 2;".

You can /always/ trust the compiler to do what you said, barring
occasional bugs in the compiler. What you cannot always do is trust the
programmer to know what he or she /actually/ said, or to write what he
or she meant.

And outside of flags that change the language semantics, such as gcc's
-fwrapv or -fno-strict-aliasing, enabling or disabling optimisations
does not change the meaning of the code. It might affect how code is
generated (and therefore its speed and size), but if the behaviour is
different then it is because your code did not say what you thought it said.

And yes, I know debugging optimised code can be difficult. Sometimes
that means adding extra "volatile" qualifiers, or "asm volatile("" :::
"memory");" fences, in order to have good breakpoint spots or debugging
information.

Re: Undefined Behavior Optimizations in C

<23-01-018@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=646&group=comp.compilers#646

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!aioe.org!adore2!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Fri, 6 Jan 2023 10:33:29 -0800 (PST)
Organization: Compilers Central
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-018@comp.compilers>
References: <23-01-009@comp.compilers> <23-01-011@comp.compilers> <23-01-012@comp.compilers> <23-01-017@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="52443"; mail-complaints-to="abuse@iecc.com"
Keywords: code, history, comment
Posted-Date: 06 Jan 2023 13:49:53 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-01-017@comp.compilers>
 by: gah4 - Fri, 6 Jan 2023 18:33 UTC

On Friday, January 6, 2023 at 9:03:30 AM UTC-8, David Brown wrote:

(snip, I wrote)

> > Most important when debugging, is that you can trust the compiler to
> > do what you said. That they don't, has always been part of
> > optimization, but these UB make it worse.

> The trouble with undefined behaviour is that, in general, you cannot
> trust the compiler to "do what you say" because it cannot know what you
> have said.

Well, yes. But sometimes ...

This is reminding me, last year I was working on a program that I worked
on (but didn't write) in 1977. Writting in, as close as it could be, to
standard Fortran 66. (Except that it makes some assumptions about
the use of values read in, and written out, with A format. Like that you
can compare such values.)

In any case, I wanted to compile with bounds check on because it wasn't
working right. (The version I had was not the exact one from 1977.)

It turns out, though, to have a fair number of statements like:

IF(I.GT.0.AND.X(I).EQ.3) ...

That is, test the subscript and use it in the same IF statement.

Now, this would not be a problem at all in C, with short-circuit IF
evaluation required, but even now Fortran doesn't do short
circuit IF evaluation. The way Fortran did static allocation in 1977,
there was pretty much no chance that you would access outside
your address space evaluating X(0). There is a chance, though,
that the value 3 is stored there.

Pretty much what I say in this case, is that I rely on the system
memory protection, and don't need the compiler to ignore the
test on I, which I suspect is one of the UB that C compilers do.

Anyway, in this case I know exactly what I said, and the compiler
does, too. And as long as subscript checks are off, it most
likely works.
[Both Fortran 66 and 77 could do short-circuit evaluation, but it's
not required. I think most compilers did, since it was easy, but
of course you couldn't count on it. -John]

Re: Undefined Behavior Optimizations in C

<23-01-019@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=647&group=comp.compilers#647

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!aioe.org!adore2!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Fri, 6 Jan 2023 11:39:42 -0800 (PST)
Organization: Compilers Central
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-019@comp.compilers>
References: <23-01-009@comp.compilers> <23-01-011@comp.compilers> <23-01-012@comp.compilers> <23-01-017@comp.compilers> <23-01-018@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="65439"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, history, comment
Posted-Date: 06 Jan 2023 15:04:26 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-01-018@comp.compilers>
 by: gah4 - Fri, 6 Jan 2023 19:39 UTC

On Friday, January 6, 2023 at 10:49:59 AM UTC-8, gah4 wrote:

(snip)

> Now, this would not be a problem at all in C, with short-circuit IF
> evaluation required, but even now Fortran doesn't do short
> circuit IF evaluation. The way Fortran did static allocation in 1977,
> there was pretty much no chance that you would access outside
> your address space evaluating X(0). There is a chance, though,
> that the value 3 is stored there.

(snip)

> [Both Fortran 66 and 77 could do short-circuit evaluation, but it's
> not required. I think most compilers did, since it was easy, but
> of course you couldn't count on it. -John]

Sorry, yes, you can't count on it.

But if you can't count on it, then you can't use it in standard programs.

But now that you mention it, I was compiling this last year with
gfortran, and ran into those, so it wasn't doing it. At least it wasn't
doing all of them. I would run the program with bounds check on,
and see where it failed.

Reminds me, though, of another one from years ago.

DO 11 I=l,10
DO 12 J=l,10
IF (B(I}.LT.0.)GO TO 11
12 C(J)=SQRT(B(I)}
11 CONTINUE

comes from an IBM manual explaining the Fortran H optimizations.
It seems that the optimizer can move the SQRT out of the inner loop,
but not the test on B(I). This is actually documented, so isn't UB
anymore. (Well, not IBM UB.)

That is in an IBM manual from 1973.
[It's on page 65 of this 1967 edition. I guess they did data flow but not control flow.
http://bitsavers.org/pdf/ibm/360/fortran/C28-6602-3_OS_FORTRAN_IV_H_Programmers_Guide_1967.pdf
-John]

Re: Undefined Behavior Optimizations in C

<23-01-021@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=649&group=comp.compilers#649

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: spi...@gmail.com (Spiros Bousbouras)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Sat, 7 Jan 2023 12:10:21 -0000 (UTC)
Organization: Aioe.org NNTP Server
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-021@comp.compilers>
References: <23-01-009@comp.compilers> <23-01-011@comp.compilers> <23-01-012@comp.compilers> <23-01-017@comp.compilers> <23-01-018@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="74721"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, comment
Posted-Date: 08 Jan 2023 13:58:37 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
X-Organisation: Weyland-Yutani
 by: Spiros Bousbouras - Sat, 7 Jan 2023 12:10 UTC

On Fri, 6 Jan 2023 10:33:29 -0800 (PST)
gah4 <gah4@u.washington.edu> wrote:
> It turns out, though, to have a fair number of statements like:
>
> IF(I.GT.0.AND.X(I).EQ.3) ...
>
> That is, test the subscript and use it in the same IF statement.
>
> Now, this would not be a problem at all in C, with short-circuit IF
> evaluation required, but even now Fortran doesn't do short
> circuit IF evaluation. ...

The analogous C code would be
if (I > 0 && X[I] == 3) ...

..A C compiler would be "entitled" to ignore the I > 0 test if I has not
been initialised. If the array X has not been initialised or if it only has 1
element (so only X[0] is meaningful) then a C compiler could treat the
whole (I > 0 && X[I] == 3) expression as always false and not emit any code
for it. I don't know if these are the scenarios you have in mind when you say
"is one of the UB that C compilers do". In the first scenario the code would
be meaningless even from a human point of view. In scenarios 2 and 3 , the
code is meaningful if I is 0 when the expression is reached but then there
would be no need for the tests so the code would almost certainly have a bug.
So I find the behaviour of the C compilers perfectly intuitive regardless of
whether it produces a speed-up. If I is unitialised then likely you would
get a warning. For the other 2 scenarios , I'm not sure you would even if the C
compiler does something special like treat the expression as always false. A
warning would always be nice but I've heard it is hard to achieve , I think
because by the time in the compilation process certain optimisations are made
, the original code has been transformed too much for the compiler to make
the connection with some specific point in the source code.

[I'm pretty sure I don't want to use a C compiler that says aha, I know that
this code is buggy, so I will silently throw it away. -John]

Re: Undefined Behavior Optimizations in C

<23-01-026@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=653&group=comp.compilers#653

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: 864-117-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Mon, 9 Jan 2023 09:10:24 -0000 (UTC)
Organization: A noiseless patient Spider
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-026@comp.compilers>
References: <23-01-009@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="23342"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, semantics
Posted-Date: 09 Jan 2023 11:41:15 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Kaz Kylheku - Mon, 9 Jan 2023 09:10 UTC

On 2023-01-05, Lucian Popescu <lucic71@ctrl-c.club> wrote:
> Hi,
>
> I'm currently working on an academic project that analyzes the speedup gain of
> Undefined Behavior Optimizations in C. I argue that UB Optimizations introduce
> more risks than actual speedup gains. This happens because most of the time
> the compiler does not have the context to understand the intention of the
> programmer in code that contains UB. For example this fast inverse square root
> function [1] triggers UB at this line: i = * ( long * ) &y;. A "smart"
> compiler could delete this line as an optimization because it contains UB.

"(Lack of) UB optimizations" are based on these ideas:

1. The programmer is infallible, and therefore
2. Any interpretation of the code's meaning which entails undefined
behavior must be incorrect and unintended; and also
3. If some construct has no interpretation that is not UB, the
programmer must be saying that that code is unreachable.

They are pretty terrible engineering.

Assuming a lack of undefined behavior in C is simply foolish; it's
extremely unlikely to be correct for any nontrivial program. A
compiler processing just tens of thousands of lines of C is likely to
be incorrect a few times in making a no-UB assumption.

In regard to 3, there are actually some extensions being proposed to C
whereby the programer can insert an expression which means "this code
is not reached", and the only semantics of that expression is that it
has undefined behavior. Since the programmer would never intend
undefined behavior, by (3) above it implies that the construct is not
reached. The compiler can replace it by an infinite loop, abortive
exit, reformatting of the disk, or whatever. Or just generate no
instructions at all.

I simply cannot get behind the idea of a deliberately introduced
language construct that has nothing but undefined behavior. I
understand that it's efficient: the compiler can reason about that not
being reachable, and not expend any instructions for it that would add
to the code size. I would rather have spend the code size to have it
stop. Just stick a breakpoint trap instruction in there or whatever.

> Don't get me wrong, I'm not saying that the compiler should magically
> understand situations where the programmer makes UB mistakes and then
> repair them. What I'm saying is that the compiler should compile the code
> "as is" without making smart optimization transformations that break the
> intention of the programmer.

I echo that serntiment. I don't require optimizations of the form
"assuming expression X is always well-defined, thus can translate some
other expression Y as follows".

In C, the programmer is supposed to do a lot of the optimizing.

In spite of these UB-related shennanigans, new developments in compilers
like GCC are not improving this basic fact.

You still have to know the tight coding techniques from 1990 in order
to get good code out of the latest GCC.

In the 1990's, I compared code for deleting from a linked list:

node->next->prev = node->prev;
node->prev->next = node->next;

versus:

node_t *next = node->next;
node_t *prev = node->prev;

next->prev = prev;
prev->next = next;

I get the same results today as I did in 1990-something: a shorter
instruction sequence for the code in which I cached the pointers in
local variables! This is because the compiler suspects that
the first assignment "node->next->prev = node->prev"
might or might not have affected the value of "node->prev";
so that has to be loaded again; it cannot be subject to CSE. Strict
aliasing reasoning doesn't help because everything has the same node_t *
type.

(But, note! If the assignment node->next->prev = node->prev
affects node->prev, that could only be becauase node->next == node.
And in that case, the assignment is a no-op: the cached node->prev
value could be used to avoid accesing that location again! GCC is not
yet smart enough to figure this out: and nobody needs it to be.)

A C compiler just needs to obey the memory accesses the programmer
expressed, figure out which local variables to keep in registers,
and do some basic things with loops, inlining and whatnot; and
have good basic optimizations like control flow stuff (jump thraeding
and whatnot), peephole optimizations, CSE, ...

Approaches like "oh this loop must be infinite because it nothing but
increments the dummy variable and tests for it going negative" just have
no place in engineering; that's just someone dicking around to get their
name in the git history of the compiler, and stuff their resume.

If you want an optimal instruction sequence for something, there is
always assembly language.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: Undefined Behavior Optimizations in C

<23-01-027@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=654&group=comp.compilers#654

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: 864-117-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Mon, 9 Jan 2023 10:14:02 -0000 (UTC)
Organization: A noiseless patient Spider
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-027@comp.compilers>
References: <23-01-009@comp.compilers> <23-01-011@comp.compilers> <23-01-012@comp.compilers> <23-01-017@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="23675"; mail-complaints-to="abuse@iecc.com"
Keywords: C, semantics, optimize
Posted-Date: 09 Jan 2023 11:41:49 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Kaz Kylheku - Mon, 9 Jan 2023 10:14 UTC

On 2023-01-06, David Brown <david.brown@hesbynett.no> wrote:
> On 06/01/2023 01:22, gah4 wrote:
>> On Thursday, January 5, 2023 at 10:13:08 AM UTC-8, Spiros Bousbouras wrote:
>>> On 5 Jan 2023 10:05:49 +0000
>>> "Lucian Popescu" <luc...@ctrl-c.club> wrote:
>>
>>>> I'm currently working on an academic project that analyzes the speedup gain of
>>>> Undefined Behavior Optimizations in C.
>> (snip)
>>
>>>> To test the theory that the UB Optimizations introduce more risks than
>>>> speedup gains,
>>
>>> Isn't this comparing apples and oranges ?
>>
>> Probably.
>>
>> You can quantify speed-up, but it is harder to quantify risk.
>>
>> You might be able to quantify debug time, and how much longer
>> it takes to debug a program with such behavior.
>>
>> Most important when debugging, is that you can trust the compiler to
>> do what you said. That they don't, has always been part of
>> optimization, but these UB make it worse.
>
> The trouble with undefined behaviour is that, in general, you cannot
> trust the compiler to "do what you say" because it cannot know what you
> have said.

That's correcst; however, what we should be able to trust is for the
compiler not to make it worse.

The compiler makes it worse when it assumes that the programmer is
infallible, and thus makes logical inferences predicated on some
construct never having undefined behavior, using those to guide the
translation of other constructs.

This is particularly harmful when the undefined behavior is *de facto*
defined: like that if the undefinedness aspect is ignored, and the
obvious code is emitted, that code will do something characteristic
of the machine.

In C99, the original definition of undefined behavior is this:

behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements.

NOTE: Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during translation
or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of a
diagnostic message).

There is an obvious intepretation of this text which rules out
the too-clever optimizations.

If the compiler optimizes construct Y based on some other construct
X being free of undefined behavior, and that assumption turns
out to be false, that compiler has not conformed to the "NOTE"
part of the treatment of undefined behavior.

Firstly, it has not "ignor[ed] the situation completely". You cannot
assume that X is well-defined, in order to treat Y in some way, and yet
say that you're completely ignoring the situaton of X. If the UB
problem in X causes a secondary problem with the way Y was translated,
that is a problem with the assumption that was made about X. That
assumption was made because it's possible for X to be undefined, and
that possiblity was expliclty disregarded, which is not the same as
completely ignored.

The situation also isn't a documented manner characteristic of
the environment.

It also isn't a termination of translation or execution with or without
a diagnostic message.

The situation doesn't conform to the NOTE.

C compiler writers should take the NOTE more seriously.

Ignoring the situation completely must mean something like:

"Do not engage in any reasoning about whether the inputs or other
conditions are right for the correct execution of the construct. Do not
handle the bad situations at translation time, and don't emit any code
to handle them. Just translate the specified semantics, and let the
translated language and its run-time deal with those potential issues."

As soon as we assume that, oh, if X is well-defined, we can make a
certain shortcut in translating this other construct Y, we are no longer
"completely ignoring"; we are engaging in reasoning about whether the
inputs or other conditions are right for the correct execution of X, and
insisting that they must be, as a justification for treating Y in a
certain way.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: Re: Undefined Behavior Optimizations in C

<23-01-031@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=658&group=comp.compilers#658

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: jonathan...@gmail.com (Jon Chesterfield)
Newsgroups: comp.compilers
Subject: Re: Re: Undefined Behavior Optimizations in C
Date: Tue, 10 Jan 2023 10:46:32 +0000
Organization: Compilers Central
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-031@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="78830"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize
Posted-Date: 10 Jan 2023 16:59:04 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <sympa.1673343321.1624.383@lists.iecc.com>
 by: Jon Chesterfield - Tue, 10 Jan 2023 10:46 UTC

> So before we decide if UB optimizations are actually allowed by the
standard we need to decide what "ignoring the situation completely
with unpredictable results" actually means.

[1] https://port70.net/~nsz/c/c89/rationale/

Lucian

WG14 are aware of UB optimising compilers and could have steered away from
this path, but haven't. It's been decades now. The pointer provenance work
seeks to apply aliasing rules even more aggressively. GCC and clang are
both pursuing faster codegen via exploiting undefined behaviour.

C, the WG14 ISO defined language, as implemented by the primary open source
toolchains, is thus unfit for my purposes. I'm not clear what use that
language has.

C, the typed assembler of ye olde times, is a profoundly useful language.
One just can't use GCC or clang to build it reliably.

It annoys me intensely that the type aliasing rules capture something a
whole program optimising compiler can usually work out for itself anyway,
while preventing me from reading 128bit integers from the same memory I
fetch_add 32bit integers into.

Jon

Re: Undefined Behavior Optimizations in C

<23-01-032@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=659&group=comp.compilers#659

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Tue, 10 Jan 2023 17:32:28 +0100
Organization: A noiseless patient Spider
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-032@comp.compilers>
References: <23-01-009@comp.compilers> <23-01-011@comp.compilers> <23-01-012@comp.compilers> <23-01-017@comp.compilers> <23-01-027@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="79420"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize
Posted-Date: 10 Jan 2023 17:00:54 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: David Brown - Tue, 10 Jan 2023 16:32 UTC

On 09/01/2023 11:14, Kaz Kylheku wrote:
> On 2023-01-06, David Brown <david.brown@hesbynett.no> wrote:
>> On 06/01/2023 01:22, gah4 wrote:

>>> Most important when debugging, is that you can trust the compiler to
>>> do what you said. That they don't, has always been part of
>>> optimization, but these UB make it worse.
>>
>> The trouble with undefined behaviour is that, in general, you cannot
>> trust the compiler to "do what you say" because it cannot know what you
>> have said.
>
> That's correcst; however, what we should be able to trust is for the
> compiler not to make it worse.
>

Let's be clear here. At the theoretical level, when your code executes
undefined behaviour at run time, you have said "I don't care what
happens now". The compiler is allowed to do /anything/. There is no
such thing as "making it worse" - you have said you will accept
absolutely /anything/ from the compiler. No restrictions, no
limitations - /anything/.

That is what "undefined behaviour" means. There are /no/ definitions of
the behaviour, and no limits, expectations, "worse" choices, "better"
choices.

At a /practical/ level, compilers can try to be more helpful. They can
define behaviour for things that are not defined in the language
specifications. They can give you warnings or error messages (at
compile time or run time, such as when using sanitizers). They can
offer optional extra semantics (like gcc's "-fwrapv" flag). They can
certainly avoid going out of their way to avoid being specially awkward
- normally they are simply trying to make the code more efficient when
it is running correctly.

But you cannot be sure things won't go unexpectedly bad. You can never
have guarantees about that. Guarantees require definitions, and you
don't have that with undefined behaviour.

Consider this example source code:

void print_message(bool x) {
if (x) {
printf("Great!\n");
} else {
printf("Boo!\n");
}
}

void format_harddisk(int disk_number) {
...
}

This is fine code, with no UB in sight.

The compiler could this to the pseudo-code :

print_message:
if ($GPR1 == 0) :
$GPR1 = "Boo!"
jump puts

if ($GPR1 == 1) :
$GPR1 = "Great!"
jump puts

format_harddisk:
...

It knows that on entry, the parameter in GPR1 is either 0 or 1, because
it is type "bool".

What happens if you call the function from a different translation unit
like this :

extern void print_message(int x);
print_message(2);

?

This is clearly wrong - clearly undefined behaviour. And the result
would be a formatted disk. But there is nothing wrong with the
compiler's generated code. I have seen other occasions when compiler's
have made code with booleans that appear to be both true and false, or
neither true nor false, as a result of undefined behaviour setting the
underlying memory to something other than 0 or 1, simply because that
was the result of the most efficient code.

There are all sorts of other ways a compiler can take advantage of
knowing a boolean is either 0 or 1, and all sorts of things that can go
wrong if you've messed up your code and the boolean is /not/ 0 or 1. It
can use it as an index into an array, or to multiply a value
(translating "x = f ? a : b;" into "x = f * (a - b) + b;" ), or using it
as a bit mask, or subtracting 1 and using /that/ as a mask. And once
you have started getting the wrong values, there is no theoretical limit
to how bad things can get - it is not the compiler "making things
worse", it is bugs in the code being outside the specifications for what
is intended for the program.

Of course the compiler should not be messing around checking that the
value you pass is actually 0 or 1. That would be inefficient. And what
would it do if it was a bad value? Treat it as "false" ? What about
calling "missile_control(bool cancel_launch)" with a bad value? Should
it treat it as "true" ? What about calling "missile_control(bool
confirm_launch)" with a bad value? Jump to an error handler and stop
the program with a message? How does that work for your flight controller?

If you want a hand-holding check everything language, there are plenty
of them to go around. They're great, and far more appropriate than C
for many purposes. C, on the other hand, trusts the programmer and
gives you the most efficient object code it can from the source you give
it, based on the rules of the language (plus any extra features defined
by the compiler).

> The compiler makes it worse when it assumes that the programmer is
> infallible, and thus makes logical inferences predicated on some
> construct never having undefined behavior, using those to guide the
> translation of other constructs.

It is the programmers' /job/ to write correct code. It is up to them to
use any and all tools available to do that - static error checking,
run-time sanitizers, code reviews, unit tests, system tests, etc. That
also includes using the write language for the task in hand. If the
code is critical, but too complex to be written bug-free in C, then
maybe a different language would be better - or maybe a different
programmer.

I am not suggesting that I always write bug-free code. But the bugs I
put in my code are /my/ bugs - /my/ responsibility. And I don't blame
the compiler for the effects of these bugs.

>
> This is particularly harmful when the undefined behavior is *de facto*
> defined: like that if the undefinedness aspect is ignored, and the
> obvious code is emitted, that code will do something characteristic
> of the machine.

It is particularly harmful when programmers think there is such a thing
as "de facto defined". That's an oxymoron. If the behaviour is
defined, it is defined. If it is not defined, it is not defined. If it
is not defined and a programmer makes unwarranted and incorrect
assumptions about what they think it means, then the programmer needs to
update his or her understanding of the language. They don't get to
blame the compiler or the compiler writer for not making the same
unfounded assumptions that they did.

>
> In C99, the original definition of undefined behavior is this:
>
> behavior, upon use of a nonportable or erroneous program construct or
> of erroneous data, for which this International Standard imposes no
> requirements.
>
> NOTE: Possible undefined behavior ranges from ignoring the situation
> completely with unpredictable results, to behaving during translation
> or program execution in a documented manner characteristic of the
> environment (with or without the issuance of a diagnostic message), to
> terminating a translation or execution (with the issuance of a
> diagnostic message).
>
> There is an obvious intepretation of this text which rules out
> the too-clever optimizations.
>

I presume you mean "a documented manner characteristic of the
environment". That would be something like gcc's "-fwrapv" flag. It
takes something that is undefined in the C standards - signed integer
overflow - and gives it a /documented/ behaviour. The key here is
"documented". Now the behaviour is /defined/ - it is UB as far as the C
standards are concerned, but /defined/ behaviour as far as the
implementation is concerned. And you can rely on it, as long as you use
a compiler with such a flag and documented behaviour.

The more general and obvious interpretation of the definition is
"imposes no requirements" - you have no right to have any expectations
about the results of the undefined behaviour. In particular, while you
might have some guesses as to what will happen just at the time, you
have no idea about knock-on effects.

> If the compiler optimizes construct Y based on some other construct
> X being free of undefined behavior, and that assumption turns
> out to be false, that compiler has not conformed to the "NOTE"
> part of the treatment of undefined behavior.
>
> Firstly, it has not "ignor[ed] the situation completely". You cannot
> assume that X is well-defined, in order to treat Y in some way, and yet
> say that you're completely ignoring the situaton of X. If the UB
> problem in X causes a secondary problem with the way Y was translated,
> that is a problem with the assumption that was made about X. That
> assumption was made because it's possible for X to be undefined, and
> that possiblity was expliclty disregarded, which is not the same as
> completely ignored.
>
> The situation also isn't a documented manner characteristic of
> the environment.
>
> It also isn't a termination of translation or execution with or without
> a diagnostic message.
>
> The situation doesn't conform to the NOTE.
>
> C compiler writers should take the NOTE more seriously.


Click here to read the complete article
Re: Undefined Behavior Optimizations in C

<23-01-035@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=662&group=comp.compilers#662

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Tue, 10 Jan 2023 15:57:07 -0800 (PST)
Organization: Compilers Central
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-035@comp.compilers>
References: <23-01-009@comp.compilers> <23-01-011@comp.compilers> <23-01-012@comp.compilers> <23-01-017@comp.compilers> <23-01-027@comp.compilers> <23-01-032@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="8236"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, C
Posted-Date: 10 Jan 2023 19:57:07 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-01-032@comp.compilers>
 by: gah4 - Tue, 10 Jan 2023 23:57 UTC

On Tuesday, January 10, 2023 at 2:00:57 PM UTC-8, David Brown wrote:

(big snip)

> It is particularly harmful when programmers think there is such a thing
> as "de facto defined". That's an oxymoron. If the behaviour is
> defined, it is defined. If it is not defined, it is not defined. If it
> is not defined and a programmer makes unwarranted and incorrect
> assumptions about what they think it means, then the programmer needs to
> update his or her understanding of the language. They don't get to
> blame the compiler or the compiler writer for not making the same
> unfounded assumptions that they did.

A lot of C code assumes two's complement. I believe Unisys still sells
ones' complement hardware, and so that code might have UB, but often
people know that it will only run on two's complement machines.

Much also assumes ASCII code. Don't tell IBM about that.

I remember comments in some routines in the IBM PL/I (F)
library, such as:

THE OPERATION OF THIS MODULE DEPENDS UPON AN INTERNAL
REPRESENTATION OF THE EXTERNAL CHARACTER SET WHICH IS
EQUIVALENT TO THE ONE USED AT ASSEMBLY TIME. THE CODING
HAS BEEN ARRANGED SO THAT REDEFINITION OF ''CHARACTER''
CONSTANTS, BY REASSEMBLY, WILL RESULT IN A CORRECT
MODULE FOR THE NEW DEFINITIONS.

I have never seen that in C programs.

As I noted in another thread, Java has a rule that the compiler be
able to determine that a variable is given a value before it is used.
It is a fatal compilation error if it can't.

If C compilers had a fatal compilation error when they found
suspicious UB, I could live with that. Maybe it means doing
something to convince the compiler, even if it really is still UB.

Remember, much of the data breaches we hear about
(and also the ones we don't) are due to buffer overflow
in C programs. Make it easier, instead of harder, for programmers
to avoid buffer problems.

(snip)

Re: Undefined Behavior Optimizations in C

<23-01-036@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=663&group=comp.compilers#663

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Wed, 11 Jan 2023 09:34:28 -0000 (UTC)
Organization: news.netcologne.de
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-036@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="93651"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize
Posted-Date: 11 Jan 2023 18:07:01 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Thomas Koenig - Wed, 11 Jan 2023 09:34 UTC

Jon Chesterfield <jonathanchesterfield@gmail.com> schrieb:
>> So before we decide if UB optimizations are actually allowed by the
> standard we need to decide what "ignoring the situation completely
> with unpredictable results" actually means.
>
> [1] https://port70.net/~nsz/c/c89/rationale/
>
> Lucian
>
> WG14 are aware of UB optimising compilers and could have steered away from
> this path, but haven't. It's been decades now. The pointer provenance work
> seeks to apply aliasing rules even more aggressively. GCC and clang are
> both pursuing faster codegen via exploiting undefined behaviour.
>
> C, the WG14 ISO defined language, as implemented by the primary open source
> toolchains, is thus unfit for my purposes.

This whole discussion is a bit weird.

The C standard, like any programming language standard, is a
specification. Compiler writers promise to keep their end of the
bargain by translating code to machien language according to that
specification (modulo bugs). A programmer who uses the language
according to that specification can be sure that this is translated
correctly (again, modulo bugs).

Compilers are required to document implementation-defined behavior,
and it is also possible to document undefined behavior; a
documented instance of undefined behavior is an extension.

For example, a program which uses #include <unistd.h> (where in
fact another standard, POSIX, is referenced) or using __builtin_clz
(a gcc extension taken up by other compilers) invokes undefined
behavior according to the C standard. There is nothing wrong with
that, provided the programmers sticks to compilers where it is
documented to be supported.

There is another class of undefined behavoor, that which is the result
of an erroneous construct. It appears that some people like to violate
the specifications (both the C standard and the compiler's
documentation, or other standards) and expect consistent results
according to their wishes. Or, to put it slightly differently, they
assume some sort of unwritten super specification that should apply in
addition to the other specifications.

There is in fact no such specification, and it is unclear if peple
who think so could agree on what exactly it should and should
not contain.

And people who have such an unspecified specification in mind then are
then hurt, indignant and sometimes downright offensive to people who
point out that compiler writers follow the written and
well-established specifications of the standards, and that something
that happened by accident in earlier revisions of a compiler has
changed.

> I'm not clear what use that language has.

Either you know very little knowledge of the software currently used
(did you ever hear of a piece of software called the "Linux kernel"?)
or you are engaging in hyperbole. I strongly suspect the latter.

If C is indeed the right sort of language to write large programs in
is quite another question, in which a strong argument against C can
indeed be made.

> C, the typed assembler of ye olde times, is a profoundly useful language.
> One just can't use GCC or clang to build it reliably.

You mean without a lot of the optimizations of the last (almost)
fifty years?

> It annoys me intensely that the type aliasing rules capture something a
> whole program optimising compiler can usually work out for itself anyway,

Link-time optimization, while highly useful, currently has severe
scaling issues. This is an area where advances would be really
useful (if implemented in production compilers).

> while preventing me from reading 128bit integers from the same memory I
> fetch_add 32bit integers into.

Not clear what exactly you want to do this.

Re: Undefined Behavior Optimizations in C

<23-01-041@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=668&group=comp.compilers#668

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Wed, 11 Jan 2023 14:20:49 +0100
Organization: A noiseless patient Spider
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-041@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="95876"; mail-complaints-to="abuse@iecc.com"
Keywords: C, standards
Posted-Date: 11 Jan 2023 18:11:50 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Content-Language: en-GB
In-Reply-To: <23-01-031@comp.compilers>
 by: David Brown - Wed, 11 Jan 2023 13:20 UTC

On 10/01/2023 11:46, Jon Chesterfield wrote:
>> So before we decide if UB optimizations are actually allowed by the
> standard we need to decide what "ignoring the situation completely
> with unpredictable results" actually means.
>
> [1] https://port70.net/~nsz/c/c89/rationale/
>
> Lucian
>
> WG14 are aware of UB optimising compilers and could have steered away from
> this path, but haven't. It's been decades now. The pointer provenance work
> seeks to apply aliasing rules even more aggressively. GCC and clang are
> both pursuing faster codegen via exploiting undefined behaviour.
>
> C, the WG14 ISO defined language, as implemented by the primary open source
> toolchains, is thus unfit for my purposes. I'm not clear what use that
> language has.

It seems to be very popular, so many people find it fit for their
purposes. (I certainly find it, along with C++, a good fit for my
low-level small-systems embedded programming, and I am quite happy with
"UB optimisations" as you call them.) But some people don't like it,
which is fair enough. And certainly no one thinks either the language
or the tools are perfect.

Some people want a language that is mostly like C, except for certain
features - and accessing objects in memory using different pointer types
is a common request. This is why both gcc and clang (and a few other
compilers) have a flag that gives you this behaviour
"-fno-strict-aliasing". I always find it ironic that the compilers that
some people complain "doesn't do what I want" or "doesn't do what old
compilers did" are precisely the compilers that give you these options.

>
> C, the typed assembler of ye olde times, is a profoundly useful language.

It's a myth. It never existed. There has simply been a steady
improvement in the optimisation of correct code as compilers have got
more sophisticated. There are compilers that document and define
behaviour for certain things that are undefined behaviour in the C
standards, but I have never heard of a compiler that claims to
understand the programmers' intentions even when they write incorrect
code.

C was designed from day one to be a high-level language, not an
assembler of any sort. Limitations of weaker earlier compilers does
not mean the language was supposed to work that way.

I first used a C compiler that optimised on the assumption that UB
didn't happen some 25 years ago. (In particular, it assumed signed
integer arithmetic never overflowed.)

> One just can't use GCC or clang to build it reliably.

You mean newer tools treat your code bugs in different ways from older
tools? There's a solution for that.

> It annoys me intensely that the type aliasing rules capture something a
> whole program optimising compiler can usually work out for itself anyway,
> while preventing me from reading 128bit integers from the same memory I
> fetch_add 32bit integers into.
>

It annoys /me/ intensely that people complain about this sort of thing,
and yet apparently haven't bothered to read the compiler manuals to see
how to get the effects they want. Compile with "-fno-strict-aliasing",
or (better, IMHO) add this to your code:

#pragma GCC optimize ("-fno-strict-aliasing")

Now, if you want to complain that the gcc documentation is not great, or
that flags like this should be documented along with the standards flags
rather than optimisation flags, I'll happily agree. (I don't know if
clang does better here.) But don't complain that the compiler is a problem.

And there are other ways to handle this in gcc. Use "may_alias" types.
Or use volatile accesses. Or use memcpy(). Or use unions.

There are two /real/ problems here. One is that C is not, and never has
been, the language that some people think it is - and thus they get
frustrated when they find out there code is not as correct as they
thought. A second is that there are weak compilers out there that on
the one hand lull developers into a false understanding of the language
due to their limited code optimisations, and on the other hand make safe
alternatives such as "memcpy" highly inefficient on their tools.

What this means is that different compilers, including gcc and clang,
are perfectly capable of generating code that efficiently mixes accesses
of different kinds to the same object. But the details of the code you
write to get the effects are different - C is not as portable here as it
should be. For code that needs to work well on multiple toolchains, you
quickly end up with a header that has conditional compilation and macros
that vary depending on the compiler in use. That is ugly and awkward,
but I know of no better way.

Re: Undefined Behavior Optimizations in C

<23-01-042@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=669&group=comp.compilers#669

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Wed, 11 Jan 2023 14:40:30 +0100
Organization: A noiseless patient Spider
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-042@comp.compilers>
References: <23-01-009@comp.compilers> <23-01-011@comp.compilers> <23-01-012@comp.compilers> <23-01-017@comp.compilers> <23-01-027@comp.compilers> <23-01-032@comp.compilers> <23-01-035@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="96511"; mail-complaints-to="abuse@iecc.com"
Keywords: C, standards
Posted-Date: 11 Jan 2023 18:13:05 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-01-035@comp.compilers>
Content-Language: en-GB
 by: David Brown - Wed, 11 Jan 2023 13:40 UTC

On 11/01/2023 00:57, gah4 wrote:
> On Tuesday, January 10, 2023 at 2:00:57 PM UTC-8, David Brown wrote:
>
> (big snip)
>
>> It is particularly harmful when programmers think there is such a thing
>> as "de facto defined". That's an oxymoron. If the behaviour is
>> defined, it is defined. If it is not defined, it is not defined. If it
>> is not defined and a programmer makes unwarranted and incorrect
>> assumptions about what they think it means, then the programmer needs to
>> update his or her understanding of the language. They don't get to
>> blame the compiler or the compiler writer for not making the same
>> unfounded assumptions that they did.
>
> A lot of C code assumes two's complement. I believe Unisys still sells
> ones' complement hardware, and so that code might have UB, but often
> people know that it will only run on two's complement machines.
>
> Much also assumes ASCII code. Don't tell IBM about that.

There is usually no requirement for a given piece of C code to be fully
portable to all conforming compilers. Most C code is quite limited in
the scope of its use, and it is quite reasonable to assume two's
complement representation (though /not/ two's complement wrapping on
overflow), 8-bit char, ASCII characters, etc. It may be fine to assume
32-bit int, and perhaps little-endian ordering. These are /warranted/
assumptions, not unwarranted ones - they are typically reasonable to
make, and if you want to you can sometimes put compile-time checks so
that if someone does try to use it out of context, they get an error
message.

And these are all points that the C standards call
implementation-dependent behaviour. These may vary between compilers or
targets, but for a given toolchain and target the behaviour is clear and
defined. That is quite different from undefined behaviour.

>
> If C compilers had a fatal compilation error when they found
> suspicious UB, I could live with that. Maybe it means doing
> something to convince the compiler, even if it really is still UB.
>

C compilers often do their best here, as long as you enable warnings and
optimisation (needed for the code analysis). I'd prefer it if tools
like gcc had a lot more warnings by default. However, it is rarely
possible to find run-time UB at compile time. If a function takes a
pointer parameter and dereferences it, it is only UB if the function is
called with an invalid pointer. The compiler will assume that UB does
not occur, and compile the function optimised with that assumption. If
the compiler were to warn you about the potential UB in the function,
your builds would be swamped by false positives.

You can come a long way be using sanitizers when testing code - then
your code is augmented by extra checks, and the run-time will stop with
a fatal error when there are problems. I think such tools are often
under-used by developers (as are static warnings). Of course no testing
can prove code is correct, but it is certainly a help.

> Remember, much of the data breaches we hear about
> (and also the ones we don't) are due to buffer overflow
> in C programs. Make it easier, instead of harder, for programmers
> to avoid buffer problems.
>

Sure. The prime solution here is to stop using C - use a language that
gives you more help in such cases. That could be a high level language
such as Python, a managed language like C# or Java, or a smarter
language like Rust or C++.

If you have to use C, then write /better/ C and do better testing. Stop
passing raw pointers around independently of the size of the data. Stop
using fixed size "magic numbers" for the size of buffers when the size
needed might change later. Refactor your data accesses into small
inline functions, where you can easily add tests and limits during
debugging.

Yes, C makes it easy to write incorrect code - but it is quite possible
to use it for writing correct code. There are other languages that make
it easier to write correct code and harder to write incorrect code -
it's too late for C to change.

Re: Undefined Behavior Optimizations in C

<23-01-043@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=670&group=comp.compilers#670

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Wed, 11 Jan 2023 16:09:32 -0800 (PST)
Organization: Compilers Central
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-043@comp.compilers>
References: <23-01-009@comp.compilers> <23-01-011@comp.compilers> <23-01-012@comp.compilers> <23-01-017@comp.compilers> <23-01-027@comp.compilers> <23-01-032@comp.compilers> <23-01-035@comp.compilers> <23-01-042@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="7463"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, history
Posted-Date: 11 Jan 2023 19:21:25 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-01-042@comp.compilers>
 by: gah4 - Thu, 12 Jan 2023 00:09 UTC

On Wednesday, January 11, 2023 at 3:13:08 PM UTC-8, David Brown wrote:

(snip)

> > Much also assumes ASCII code. Don't tell IBM about that.

> There is usually no requirement for a given piece of C code to be fully
> portable to all conforming compilers. Most C code is quite limited in
> the scope of its use, and it is quite reasonable to assume two's
> complement representation (though /not/ two's complement wrapping on
> overflow), 8-bit char, ASCII characters, etc. It may be fine to assume
> 32-bit int, and perhaps little-endian ordering. These are /warranted/
> assumptions, not unwarranted ones - they are typically reasonable to
> make, and if you want to you can sometimes put compile-time checks so
> that if someone does try to use it out of context, they get an error
> message.

As I noted with the Fortran optimization example from 50 years ago,
this is not a new problem. But IBM documented it (so, as you note, it
is an extension and not UB), and programmers learn to expect it.

And yes I ignored the distinction between implementation defined
and undefined. But that is exactly what happens when actual
people (as opposed to other programs) write programs.

People learn what works and what doesn't. When they need to worry
about endianness or ASCII or two's complement.

Much of the implementation, or undefined, behavior of early Fortran
compilers, later got implemented into the standard. And much didn't,
but enough people believe it did.

If compiler writers remember that they are writing for use by actual
people, who are sometimes imperfect, then it should be fine. Try to
make the changes not too surprising to users.

I don't remember the exact example of surprising UB, but they are out
there.

In any case, I am all for the original project, of studying compilers,
UB, and optimization. If it doesn't really help speed up problems, but
does slow down debugging, maybe it isn't worth doing.

Re: Undefined Behavior Optimizations in C

<23-01-045@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=672&group=comp.compilers#672

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: 864-117-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Thu, 12 Jan 2023 05:21:16 -0000 (UTC)
Organization: A noiseless patient Spider
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-045@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-036@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="11680"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize, comment
Posted-Date: 12 Jan 2023 13:37:40 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Kaz Kylheku - Thu, 12 Jan 2023 05:21 UTC

On 2023-01-11, Thomas Koenig <tkoenig@netcologne.de> wrote:
> Jon Chesterfield <jonathanchesterfield@gmail.com> schrieb:
>> It annoys me intensely that the type aliasing rules capture something a
>> whole program optimising compiler can usually work out for itself anyway,
>
> Link-time optimization, while highly useful, currently has severe
> scaling issues. This is an area where advances would be really
> useful (if implemented in production compilers).

Link-time optimization violates ISO C, unfortunately.

ISO C describes C translation in a number of phases. I'm going to use
C99 numbering; it might have changed in C11.

In translation phase 7, syntax and semantic analysis is performed on the
tokens coming from the previous phases (preprocessing) and the
translation unit is translated.

Then in translation phase 8, multiple translation units are combined,
and references are resolved.

Link time optimization breaks the layering; it reintroduces semantic
analysis into translation phase 8, where only linking is supposed to be
taking place.

When the compiler/linker peeks into translation unit a.o in order to
alter something in b.o, you no longer have translation units.

You have something similar to the effect of doing

cat a.c b.c > combined.c
cc combined.c

except that such a crude approach combines all internal linkage
identifiers ("static") into one.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
[I don't see the problem, since there's invariably an "as if" rule
that lets you do whatever you want so long as the results are the same
as if you'd done everything in sequence. If a linktime optimizer looks
at the code, promotes a couple of very busy variables that never have
their addresses taken into global registers, so what? Or if it can
globally tell that two variables are never aliased so it can reuse a
register copy of one after modifying the other, again so what? -John]

Re: Undefined Behavior Optimizations in C

<23-01-048@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=675&group=comp.compilers#675

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Thu, 12 Jan 2023 12:21:40 -0800
Organization: None to speak of
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-048@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-036@comp.compilers> <23-01-045@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="82638"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize
Posted-Date: 12 Jan 2023 20:53:07 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Keith Thompson - Thu, 12 Jan 2023 20:21 UTC

Kaz Kylheku <864-117-4973@kylheku.com> writes:
[...]
> Link-time optimization violates ISO C, unfortunately.
>
> ISO C describes C translation in a number of phases. I'm going to use
> C99 numbering; it might have changed in C11.

(It didn't.)

> In translation phase 7, syntax and semantic analysis is performed on the
> tokens coming from the previous phases (preprocessing) and the
> translation unit is translated.
>
> Then in translation phase 8, multiple translation units are combined,
> and references are resolved.
>
> Link time optimization breaks the layering; it reintroduces semantic
> analysis into translation phase 8, where only linking is supposed to be
> taking place.
>
> When the compiler/linker peeks into translation unit a.o in order to
> alter something in b.o, you no longer have translation units.
[...]
> [I don't see the problem, since there's invariably an "as if" rule
> that lets you do whatever you want so long as the results are the same
> as if you'd done everything in sequence. If a linktime optimizer looks
> at the code, promotes a couple of very busy variables that never have
> their addresses taken into global registers, so what? Or if it can
> globally tell that two variables are never aliased so it can reuse a
> register copy of one after modifying the other, again so what? -John]

And in fact the C standard makes this explicit in this case, in a
footnote in the section defining the translation phases:

Physical source file multibyte characters are mapped, in an
implementation- defined manner, to the source character set
(introducing new-line characters for end-of-line indicators) if
necessary. Trigraph sequences are replaced by corresponding
single-character internal representations. Implementations shall
behave as if these separate phases occur, even though many are
typically folded together in practice. Source files, translation
units, and translated translation units need not necessarily be
stored as files, nor need there be any one-to-one correspondence
between these entities and any external representation. The
description is conceptual only, and does not specify any particular
implementation.

Of course any link-time optimization that breaks the standard-defined
semantics makes the implementation non-conforming (the same as for any
optimization).

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */

Re: Undefined Behavior Optimizations in C

<23-01-049@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=676&group=comp.compilers#676

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Thu, 12 Jan 2023 21:50:14 -0000 (UTC)
Organization: news.netcologne.de
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-049@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-036@comp.compilers> <23-01-045@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="83074"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize
Posted-Date: 12 Jan 2023 20:53:51 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Thomas Koenig - Thu, 12 Jan 2023 21:50 UTC

Kaz Kylheku <864-117-4973@kylheku.com> schrieb:
> On 2023-01-11, Thomas Koenig <tkoenig@netcologne.de> wrote:
>> Jon Chesterfield <jonathanchesterfield@gmail.com> schrieb:
>>> It annoys me intensely that the type aliasing rules capture something a
>>> whole program optimising compiler can usually work out for itself anyway,
>>
>> Link-time optimization, while highly useful, currently has severe
>> scaling issues. This is an area where advances would be really
>> useful (if implemented in production compilers).
>
> Link-time optimization violates ISO C, unfortunately.
>
> ISO C describes C translation in a number of phases. I'm going to use
> C99 numbering; it might have changed in C11.
>
> In translation phase 7, syntax and semantic analysis is performed on the
> tokens coming from the previous phases (preprocessing) and the
> translation unit is translated.

[...]

The C standard (like other programming language standards) describes
behavior, not implementation. A compiler which translates "as if" the
phases were followed works as well. The same applies to program
execution, where the concept of the abstract machine is used.

Regarding compilation phases, this is made clear in N2596,

5.1.1.2 Translation phases
The precedence among the syntax rules of translation is specified
by the following phases.6)

plus the footnote which explains a bit what is meant by

6) This requires implementations to behave as if these separate
phases occur, [...]

> Then in translation phase 8, multiple translation units are combined,
> and references are resolved.
>
> Link time optimization breaks the layering; it reintroduces semantic
> analysis into translation phase 8, where only linking is supposed to be
> taking place.

.... which does not have to be the case.

> When the compiler/linker peeks into translation unit a.o in order to
> alter something in b.o, you no longer have translation units.

.... which you don't need; 5.1.1.2 shows the "as if" rule quite clearly.

If LTO changes the semantics of a standard-conforming program, that
is another matter; that is then simply a bug.

Re: Undefined Behavior Optimizations in C

<23-01-055@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=682&group=comp.compilers#682

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: antis...@math.uni.wroc.pl
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Fri, 13 Jan 2023 20:46:28 -0000 (UTC)
Organization: Aioe.org NNTP Server
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-055@comp.compilers>
References: <23-01-009@comp.compilers> <23-01-011@comp.compilers> <23-01-012@comp.compilers> <23-01-017@comp.compilers> <23-01-018@comp.compilers> <23-01-021@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="18899"; mail-complaints-to="abuse@iecc.com"
Keywords: C, Fortran, comment
Posted-Date: 13 Jan 2023 16:21:12 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
X-Notice: Filtered by postfilter v. 0.9.2
 by: antis...@math.uni.wroc.pl - Fri, 13 Jan 2023 20:46 UTC

Spiros Bousbouras <spibou@gmail.com> wrote:
> On Fri, 6 Jan 2023 10:33:29 -0800 (PST)
> gah4 <gah4@u.washington.edu> wrote:
> > It turns out, though, to have a fair number of statements like:
> >
> > IF(I.GT.0.AND.X(I).EQ.3) ...
> >
> > That is, test the subscript and use it in the same IF statement.
> >
> > Now, this would not be a problem at all in C, with short-circuit IF
> > evaluation required, but even now Fortran doesn't do short
> > circuit IF evaluation. ...
>
> The analogous C code would be
> if (I > 0 && X[I] == 3) ...

No, Fortran array start from 1, C arrays start from 0. And Fortran
is not obliged do do short circuit evaluation. So analogous
C code would be

char bv1 = (I>=0);
char bv2 = (X[I] == 3);

if (bv1 && bv2) ...

I this code compiler may throw out 'bv1' (assume that it is true).
Of course, normal C programmer would write

if ((I>=0)&&(X[I] == 3))

which is fine due to short circuit evaluation. Reversing order:

if ((X[I] == 3)&&(I>=0))

have the same problem as first version. Both first version and
third version may appear as output of generators or due to macro
expansion, so this is not entirely theoretical problem. OTOH,
in Fortran and Pascal classic teaching was "do not do this".
This was particularly important in Pascal, as standard required
full evaluation.

--
Waldek Hebisch
[You can start Fortran arrays at zero if you want
DIMENSION FOO(0:99)
But I agree mostly people use the 1 default -John]

Re: Undefined Behavior Optimizations in C

<23-01-058@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=685&group=comp.compilers#685

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: 864-117-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Sun, 15 Jan 2023 04:17:31 -0000 (UTC)
Organization: A noiseless patient Spider
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-058@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-036@comp.compilers> <23-01-045@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="38641"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize, comment
Posted-Date: 15 Jan 2023 13:09:50 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Kaz Kylheku - Sun, 15 Jan 2023 04:17 UTC

On 2023-01-12, the moderator wrote:
> [I don't see the problem, since there's invariably an "as if" rule
> that lets you do whatever you want so long as the results are the same
> as if you'd done everything in sequence. If a linktime optimizer looks
> at the code, promotes a couple of very busy variables that never have
> their addresses taken into global registers, so what? Or if it can
> globally tell that two variables are never aliased so it can reuse a
> register copy of one after modifying the other, again so what? -John]

We can look at this as if the translation units have been rolled back in
time to translation phase 7, where semantic analysis is now taking place
again in an aternative future. This time, there is a magic oracle
present that provides information about all the translation units
which will be linked to produce the program.

I'm skeptical whether the GCC and Clang LTO systems always behave like a
magic oracle that supplies iron-clad truths to translation phase 7.

And even if so, it seems wrong for the translation phase semantic
analysis to be looking at information from any other part of a program
than the translation unit at hand. The standard clearly says what is
subject to semantic analysis, and it's only the material from the
translation unit:

7. White-space characters separating tokens are no longer significant.
Each preprocessing token is converted into a token. The resulting
tokens are syntactically and semantically analyzed and translated
as a translation unit.

"The resulting tokens" are those from this translation unit, and nowhere
else. Nothing other than those tokens is to be semantically analyzed.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
[Hey, wait, it's one thing to say LTO might be buggy, another to say
it's fundamentally imposible. Any optimizer might be buggy. -John]

Re: Undefined Behavior Optimizations in C

<23-01-062@comp.compilers>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=689&group=comp.compilers#689

 copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: spi...@gmail.com (Spiros Bousbouras)
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Wed, 18 Jan 2023 13:14:35 -0000 (UTC)
Organization: Aioe.org NNTP Server
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-062@comp.compilers>
References: <23-01-027@comp.compilers> <sympa.1673343321.1624.383@lists.iecc.com> <23-01-031@comp.compilers> <23-01-041@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="85677"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize
Posted-Date: 18 Jan 2023 11:35:37 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Spiros Bousbouras - Wed, 18 Jan 2023 13:14 UTC

On Wed, 11 Jan 2023 14:20:49 +0100
David Brown <david.brown@hesbynett.no> wrote:
> C was designed from day one to be a high-level language, not an
> assembler of any sort. Limitations of weaker earlier compilers does
> not mean the language was supposed to work that way.

For those who want an abstract or portable assembler , there exists
c9x.me/compile/ .I've never used it but at least it aims to be that ,
unlike C. I would be curious to know of other analogous projects. I
guess the "register transfer language" of GCC is somewhat analogous.

> I first used a C compiler that optimised on the assumption that UB
> didn't happen some 25 years ago. (In particular, it assumed signed
> integer arithmetic never overflowed.)

I have encountered several times the claim that compilers assume that UB does
not happen and I don't understand it. Lets consider 2 examples :

x + 1 > x

in C where x is a signed integer. Compilers will often treat this as
always true with the following reasoning :

- if x does not have the maximum value which fits in its type then the
meaning of the C expressions is the same as their mathematical meaning
so the expression evaluates to true.

- if x has the maximum value which fits in its type then x + 1 is not
defined so any translation (including treating the whole expression as
true) is valid.

There's no assumption that UB (undefined behaviour) will not happen, both
possibilities are accounted for.

Another example is

... *some_pointer_object ...
[ some_pointer_object does not get modified in this part of the code and
has not been declared as volatile ]
if (some_pointer_object == NULL) ...

If some_pointer_object is not NULL then the test can be omitted ; if it is
NULL then the earlier dereference is UB so any translation is valid including
omitting the test.

Again, there's no assumpion that UB will not happen.

So the request that C compilers should stop assuming that UB will not
happen seems to me completely misguided. I think what is really meant
is that, in reasoning what a valid translation is, C compilers (or
the authors of the compilers) should not employ the notion of UB. But
then how should UB be translated ? Again there exists the assumption
or claim that there is some intuitively obvious translation and
compilers should go for that. First, I'm not sure that there exists
such a common intuition even among humans and second, even if it does
, how does one go from an intuition to an algorithm C compilers can
use to do translation ? Lots of things are intuitively obvious but
creating an algorithm to duplicate the human intuition is a hard
problem, one which has not been solved in many cases and perhaps even
one which is unsolvable in some cases.

I've seen the suggestion that compilers should describe their behaviour in
terms of assembly generated (possibly some kind of abstract assembly) as
opposed to higher terms. I'm not sure if this is possible and, even if it is,
I would not find it useful. I tend to think of what I want my code to do in
higher terms and then bring it down to the level of the language with
successive refinements. If parts of C were described in assembly terms then
it would potentially force me to do at least 1 more refinement step with no
benefit.

A more productive avenue is for people to give definitions, as precise as
possible, to the kinds of UB which has caused them problems and then try to
convince compiler writers to implement such extensions if they don't do so
already. In this area even compiler documentation should perhaps improve. For
example, from the GCC man page

-fdelete-null-pointer-checks
Use global dataflow analysis to identify and eliminate useless
checks for null pointers. The compiler assumes that
dereferencing a null pointer would have halted the program. If
a pointer is checked after it has already been dereferenced, it
cannot be null.

In some environments, this assumption is not true, and programs
can safely dereference null pointers. Use
-fno-delete-null-pointer-checks to disable this optimization for
programs which depend on that behavior.

..The above still doesn't tell me what is supposed to happen when a NULL pointer
is dereferenced even with the -fno-delete-null-pointer-checks flag. I'm
guessing it's impossible to give a general definition. One can in specific
systems but in general no so perhaps the above description does the best
possible.

Another example

-fstrict-overflow
Allow the compiler to assume strict signed overflow rules,
depending on the language being compiled. For C (and C++) this
means that overflow when doing arithmetic with signed numbers is
undefined, which means that the compiler may assume that it will
not happen.

This is poor phrasing, in particular the part "which means that the
compiler may assume that it will not happen" is redundant. There is no
reason for the compiler to assume anything about which execution paths will
happen during runtime to conclude for example that x + 1 > x can be
translated as true. The above quote gives an unnecessarily circuitous
reasoning as to why the expression can be translated as true. I give a more
direct reasoning above.

> It annoys /me/ intensely that people complain about this sort of thing,
> and yet apparently haven't bothered to read the compiler manuals to see
> how to get the effects they want. Compile with "-fno-strict-aliasing",
> or (better, IMHO) add this to your code:
>
> #pragma GCC optimize ("-fno-strict-aliasing")
>
> Now, if you want to complain that the gcc documentation is not great,

Yeah, it would be good if there was a more precise specification as to what
additional guarantees beyond the C standard this gives. For translating other
languages into C, this seems to be important for achieving object allocation
and garbage collection since relying on the native malloc() and related is
generally not adequate, at least not if your garbage collector is allowed to
move objects.

Pages:12
server_pubkey.txt

rocksolid light 0.9.7
clearnet tor