novaBBS - comp.arch - Re: The value of floating-point exceptions?

Re: The value of floating-point exceptions?

<sdhqj9$1l9s$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=19138&group=comp.arch#19138

From: chris.m....@gmail.com (Chris M. Thomasson)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Sat, 24 Jul 2021 12:44:40 -0700
Organization: Aioe.org NNTP Server
Message-ID: <sdhqj9$1l9s$1@gioia.aioe.org>
References: <sd9a9h$ro6$1@dont-email.me>
<memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com>
<a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me>
<7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com>
<905454f4-6aa8-4a4e-b0ef-6b55340c61d7n@googlegroups.com>
<sdfcrn$2s3$1@dont-email.me>
<2ec34aff-417e-4e2c-ad88-c500e17b1e78n@googlegroups.com>
<d3b65e22-c7de-4bf3-a650-63ca49ca623cn@googlegroups.com>
<b195fbdb-0cab-44a9-bb9d-5fd0c469fc10n@googlegroups.com>
<sdgek0$70h$1@dont-email.me>

by: Chris M. Thomasson - Sat, 24 Jul 2021 19:44 UTC

On 7/24/2021 12:14 AM, BGB wrote:
> On 7/24/2021 1:05 AM, Quadibloc wrote:
>> On Saturday, July 24, 2021 at 12:01:35 AM UTC-6, Quadibloc wrote:
>>
>>> I remember specifically that one particular failed Ariane launch was
>>> held up as what bad floating-point can cause...
>>
>> Ah. A Google search led me to what really happened.
>>
>> The maiden flight of Ariane 5 failed because conversion from
>> floating-point to 16-bit integer caused an exception, as the
>> floating-point value was not within the range of such integers.
>>
>> No doubt bad software design, but not necessarily
>> the kind of numerical-analysis concern that IEEE 754
>> addresses.
>>
>
> IME:
> Inexact rounding is hardly ever the source of problems (excluding
> integer-exact math in codecs, but this sort of stuff is typically
> fixed-point).
>
>
> However, more serious bugs, like unexpected overflow, saturation, or
> traps, can ruin one's day.
>
> Likewise for algorithms whose domain is "all real numbers except zero",
> which fail catastrophically and nuke the program if the input
> happens to be zero.
>
>
> Insufficient precision can also be a source of problems.
>
> Issues with the use of single precision coordinates in games trying to
> deal with large worlds are a fairly common example (typically a lot of
> workarounds are needed to make all the math work for worlds much larger
> than a few km or so).
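A short C sketch of the two failure modes mentioned above: an out-of-range float-to-integer conversion (the Ariane 5 case) and the limited resolution of single-precision coordinates far from the origin. Both helper names are hypothetical. `to_int16_checked` rejects NaN and out-of-range input instead of converting it (in C such a conversion is undefined behavior), and `float_spacing` assumes a positive, finite `float` in IEEE 754 binary32 format.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a double to int16_t only if it fits.
   NaN compares false with everything, so it would slip past the range
   test; v != v catches it. Values just above 32767 (e.g. 32767.5) are
   rejected as well, which is conservative. */
bool to_int16_checked(double v, int16_t *out)
{
    if (v != v || v < INT16_MIN || v > INT16_MAX)
        return false;
    *out = (int16_t)v;  /* in range: truncates toward zero */
    return true;
}

/* Hypothetical helper: distance from a positive, finite float x to the
   next larger float, i.e. the resolution available at that magnitude.
   Assumes IEEE 754 binary32, where incrementing the bit pattern of a
   positive value yields the next representable value. */
float float_spacing(float x)
{
    uint32_t bits;
    float next;
    memcpy(&bits, &x, sizeof bits);
    bits += 1;
    memcpy(&next, &bits, sizeof next);
    return next - x;
}
```

For example, `float_spacing(10000.0f)` is 2^-10, so a coordinate 10 km from the origin (in metres) can only change in steps of roughly 1 mm, and the step doubles with every further doubling of the distance.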

FWIW, I did a fractal encoding thing for pure fun where I map symbols to
complex roots. The code only encodes/decodes data in a Julia set with a
power of 16, so that each symbol maps conveniently onto one hex digit.
This is a pure experiment, but it works. Rounding errors and other
floating-point issues can make it crap out by decrypting part of the
plaintext with some "junk" on it. So, I found an arbitrary-precision
library in JavaScript, Decimal.js, and put it up online. Here it is:

http://fractallife247.com/test/rifc_cipher

A ciphertext is a single complex number. For instance here is one that
contains my name:

real:
-0.70928383564905214400492596591643890200098992164665980782966227733203960188288097070737389345985516069300117982413622497654113697

imag:
0.75006448767684071252250616852543657203512420946592887596427538863664520584158627985390890157772176873489867565028334553930789721

It is hardcoded to use 128 digits of precision.

To decrypt it, you copy and paste the real and imaginary parts into
their respective textboxes and click decrypt. Can you get it to work?

This is sensitive to floating-point issues. Also, it can take a long
time to compute for larger plaintexts. I really need to move the
processing into a Web Worker, or into an animation loop where each
frame decodes a bit of the work. Right now, it "freezes" the UI during
long computations.

Oh, btw, here is an implementation of it using C++ and doubles:

https://github.com/ChrisMThomasson/fractal_cipher/blob/master/RIFC/cpp/ct_rifc_sample.cpp

IIRC the C++ code can handle different powers, even negative ones. It's
not hardcoded to 16 symbols.
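Why the decryption is so sensitive to rounding can be seen from a generic reverse-iteration scheme for $f(z) = z^{16} + c$. This is a sketch of the general idea under stated assumptions, not necessarily how the linked code works: encoding picks one of the 16 branches of the inverse map per hex digit $s$, and decoding runs the forward map and reads each digit back from the angle.

```latex
% Encoding one hex digit s \in \{0,\dots,15\} (inverse iteration, branch s):
z_{k+1} = \lvert z_k - c \rvert^{1/16}
          \exp\!\left( i\,\frac{\arg(z_k - c) + 2\pi s}{16} \right),
\qquad \arg(\cdot) \in [0, 2\pi)

% Decoding (forward iteration); digits come back in reverse order:
s = \left\lfloor \frac{16\,\arg(z_{k+1})}{2\pi} \right\rfloor,
\qquad z_k = z_{k+1}^{16} + c

% An error \delta_{k+1} in z_{k+1} is amplified on every decoding step:
\delta_k \approx \lvert 16\, z_{k+1}^{15} \rvert \, \delta_{k+1}
```

Near the Julia set, where $\lvert z \rvert$ stays close to 1 for small $c$, each decoded digit multiplies the error by roughly 16, i.e. costs about $\log_{10} 16 \approx 1.2$ decimal digits. On that rough estimate, a double (about 16 significant digits) supports only a dozen or so hex digits before junk appears, while 128-digit arithmetic supports on the order of a hundred.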

Re: The value of floating-point exceptions?

<Ws_KI.75293$VU3.57635@fx46.iad>

https://www.novabbs.com/devel/article-flat.php?id=19139&group=comp.arch#19139

Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: The value of floating-point exceptions?
References: <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<memo.20210723223630.14132H@jgd.cix.co.uk>
<2021Jul24.112832@mips.complang.tuwien.ac.at>
<TZRKI.24285$Nq7.6581@fx33.iad>
<2021Jul24.184642@mips.complang.tuwien.ac.at>
Message-ID: <Ws_KI.75293$VU3.57635@fx46.iad>
Organization: usenet-news.net
Date: Sat, 24 Jul 2021 20:10:30 GMT

by: Branimir Maksimovic - Sat, 24 Jul 2021 20:10 UTC

On 2021-07-24, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> Branimir Maksimovic <branimir.maksimovic@gmail.com> writes:
>>> Both loops execute at 2 cycles/iteration (IPC=4) on a Skylake. I
> ...
>>Could you provide test program?
>
> http://www.complang.tuwien.ac.at/anton/tmp/axpy.zip
>
> Today I measure 1.8 cycles per iteration (IPC=4.44) from this program
> (both compiled for 387 and SSE2) on a Skylake. Strange.
>
> - anton
Heh, now I just need timing code.
Here is an ARMv8 equivalent of rdtsc, in gas:
init_time:
mrs x0,CNTPCT_EL0 ; counter
adrp x8,elapsed@PAGE
str x0, [x8,elapsed@PAGEOFF]
ret
time_me:
mrs x8,cntfrq_el0 ; clock
ucvtf d1,x8
mrs x8,CNTPCT_EL0 ; counter
adrp x9,elapsed@PAGE
ldr x9,[x9,elapsed@PAGEOFF]
sub x8,x8,x9
ucvtf d0,x8
fdiv d0,d0,d1
str d0,[sp]
b _printf
I just don't know how to read the ticks from C :P
I also modified the gcc flags:
bmaxa@Branimirs-Air axpy % cat Makefile
all: axpy-sse axpy-387

axpy-sse: axpy-sse.o axpy-main-sse.o
gcc axpy-sse.o axpy-main-sse.o -o $@

axpy-387: axpy-387.o axpy-main-387.o
gcc axpy-387.o axpy-main-387.o -o $@

axpy-sse.o: axpy.c
gcc-11 -c -O -march=armv8.4-a+simd -D"Float=double" axpy.c -o $@

axpy-387.o: axpy.c
gcc-11 -c -O -march=armv8.4-a -D"Float=double" axpy.c -o $@

axpy-main-sse.o: axpy-main.c
gcc-11 -c -O -march=armv8.4-a+simd -D"Float=double" axpy-main.c -o $@

axpy-main-387.o: axpy-main.c
gcc-11 -c -O -march=armv8.4-a -D"Float=double" axpy-main.c -o $@
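Reading the counter from C needs only a couple of lines of GCC/Clang extended inline assembly on AArch64. This sketch assumes the virtual counter `CNTVCT_EL0` is readable from user space (the assembly above reads `CNTPCT_EL0` instead) and falls back to POSIX `clock_gettime(CLOCK_MONOTONIC)` on other targets. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: a monotonically increasing tick count. */
uint64_t read_ticks(void)
{
#if defined(__aarch64__)
    uint64_t t;
    /* isb stops the CPU from reading the counter ahead of earlier
       instructions. "\n\t" separates the two instructions because ';'
       starts a comment in Apple's assembler. */
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(t) : : "memory");
    return t;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Ticks per second for read_ticks(). */
uint64_t tick_frequency(void)
{
#if defined(__aarch64__)
    uint64_t f;
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(f));
    return f;
#else
    return 1000000000u;  /* the fallback counts nanoseconds */
#endif
}

/* Elapsed seconds between two read_ticks() values. */
double ticks_to_seconds(uint64_t start, uint64_t end)
{
    uint64_t f = tick_frequency();
    assert(f > 0);
    return (double)(end - start) / (double)f;
}
```

Usage mirrors `init_time`/`time_me`: take `uint64_t t0 = read_ticks();` before the loop and print `ticks_to_seconds(t0, read_ticks())` after it.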

--
bmaxa now listens Sex & Violence by Exploited from Totally Exploited

Re: The value of floating-point exceptions?

<548e3590-e939-40b7-8f78-3040d59358f7n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19141&group=comp.arch#19141

Newsgroups: comp.arch
Date: Sat, 24 Jul 2021 14:24:20 -0700 (PDT)
In-Reply-To: <c0917435-65c8-45f1-b745-5fd7ff4f58c0n@googlegroups.com>
References: <sd9a9h$ro6$1@dont-email.me> <memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com> <a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me> <7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com> <sdf4t3$6cl$2@newsreader4.netcologne.de>
<c0917435-65c8-45f1-b745-5fd7ff4f58c0n@googlegroups.com>
Message-ID: <548e3590-e939-40b7-8f78-3040d59358f7n@googlegroups.com>
Subject: Re: The value of floating-point exceptions?
From: jsav...@ecn.ab.ca (Quadibloc)

by: Quadibloc - Sat, 24 Jul 2021 21:24 UTC

On Friday, July 23, 2021 at 2:13:02 PM UTC-6, MitchAlsup wrote:

> IBM's was worse than simply radix 16; it was also a truncation system
> with a guard digit (½ a byte) in calculations. To make matters worse, its
> competitors were 36-bit, 48-bit, and 60-bit (single precision).

And it originally went out the door without even the guard digit: the
results of arithmetic done that way were so bad, they had to fix that
in every computer they sold as well as in all future ones.
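The effect of the missing guard digit is easy to reproduce in a toy radix-16 model. The sketch below is hypothetical and greatly simplified (six-hex-digit fraction, positive operands only, truncation, no sign or excess-64 exponent), not the actual System/360 datapath. It subtracts the largest value below 1 from 1, with and without one extra hex digit kept during alignment.

```c
#include <stdint.h>

/* Toy radix-16 number: value = frac * 16^exp, where a normalized frac
   has exactly six hex digits (0x100000..0xFFFFFF). Positive values only. */
typedef struct {
    uint32_t frac;
    int exp;
} hexfloat;

/* Truncate to six hex digits, then shift out leading zero digits. */
hexfloat hf_normalize(uint64_t frac, int exp)
{
    while (frac > 0xFFFFFFu) {               /* too many digits: chop the lowest */
        frac >>= 4;
        exp++;
    }
    while (frac != 0 && frac < 0x100000u) {  /* leading hex digit is zero */
        frac <<= 4;
        exp--;
    }
    hexfloat r = { (uint32_t)frac, exp };
    return r;
}

/* a - b for a >= b > 0 with a.exp >= b.exp. guard is the number of extra
   hex digits kept while aligning b (0 = no guard digit, 1 = one guard
   digit). Digits shifted past that point are simply dropped. */
hexfloat hf_sub(hexfloat a, hexfloat b, int guard)
{
    int shift = a.exp - b.exp;
    uint64_t fa = (uint64_t)a.frac << (4 * guard);
    uint64_t fb = (uint64_t)b.frac << (4 * guard);
    fb = (4 * shift < 64) ? fb >> (4 * shift) : 0;
    return hf_normalize(fa - fb, a.exp - guard);
}
```

With `one = {0x100000, -5}` (exactly 1) and `below = {0xFFFFFF, -6}` (1 - 16^-6), `hf_sub(one, below, 0)` gives 16^-5, sixteen times the exact difference, while `hf_sub(one, below, 1)` gives the exact 16^-6.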

I hadn't looked at things that way, though, as far as precision went.

Since, unlike the STRETCH, the IBM 360 never went in for *bit*
addressing, they could have designed it with a 36-bit word.

Packed decimal would be less efficient, with one wasted bit in
each byte; but at least in one of those bytes, that bit could be
used for the sign, instead of using a whole digit for that.

If one wanted lower-case, putting characters in 9-bit bytes would
be just fine. The punched-card code, though, would now be more
complicated: with rows 12, 11, 0, 8, and 9 used independently,
there would be seven rows (1 through 7) left to encode the remaining
three bits with one punch or none.

Using another punch for an additional binary bit means that only six
are left, so at least one combination with two of them punched would
also have to be allowed.

One could use row 4 as the additional punch, and rows 2 and 6 as the
extra two-punch combination among the rest; this would help keep the
card strong.

But with a 36-bit word, an additional complication to the instruction
set would obviously have been needed: the computer would have to
handle character instructions not just for 9-bit characters, but also
for 6-bit upper-case-only characters, three to a halfword. Upper-case-only
text was what was usually used, and if it could be handled efficiently,
that would be insisted upon.

As individual packed decimal digits couldn't be addressed, the fact that
one could only address these characters in multiples of three shouldn't
be too much of an issue; there would be a way to unpack them into
nine-bit characters when required.
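To make the "three to a halfword" layout concrete: on this hypothetical 36-bit machine an 18-bit halfword holds three 6-bit codes, and unpacking into nine-bit characters is a shift and mask per character plus a code translation. In the sketch below the halfword lives in the low 18 bits of a `uint32_t`, the first character is assumed to occupy the high bits, and the 6-bit-to-9-bit translation table is a caller-supplied placeholder, since no actual codes are specified.

```c
#include <stdint.h>

/* Pack three 6-bit codes into an 18-bit halfword, first character high. */
uint32_t pack3_sixbit(unsigned c0, unsigned c1, unsigned c2)
{
    return ((c0 & 077u) << 12) | ((c1 & 077u) << 6) | (c2 & 077u);
}

/* Extract character i (0..2) of a packed halfword. */
unsigned sixbit_at(uint32_t half, int i)
{
    return (half >> (6 * (2 - i))) & 077u;
}

/* Unpack into three nine-bit characters (held in uint16_t) through a
   hypothetical 64-entry table mapping 6-bit codes to 9-bit codes. */
void unpack3_to_ninebit(uint32_t half, const uint16_t table[64], uint16_t out[3])
{
    for (int i = 0; i < 3; i++)
        out[i] = table[sixbit_at(half, i)] & 0777u;
}
```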

John Savard

Re: The value of floating-point exceptions?

<Ej1LI.12196$gE.4282@fx21.iad>

https://www.novabbs.com/devel/article-flat.php?id=19143&group=comp.arch#19143

Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: The value of floating-point exceptions?
References: <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<memo.20210723223630.14132H@jgd.cix.co.uk>
<2021Jul24.112832@mips.complang.tuwien.ac.at>
<TZRKI.24285$Nq7.6581@fx33.iad>
<2021Jul24.184642@mips.complang.tuwien.ac.at>
<Ws_KI.75293$VU3.57635@fx46.iad>
Message-ID: <Ej1LI.12196$gE.4282@fx21.iad>
Organization: usenet-news.net
Date: Sat, 24 Jul 2021 23:25:24 GMT

by: Branimir Maksimovic - Sat, 24 Jul 2021 23:25 UTC

On 2021-07-24, Branimir Maksimovic <branimir.maksimovic@gmail.com> wrote:
> On 2021-07-24, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> [...]
> Heh, I just need timing code now:
> [...]
Here it is on M1:
bmaxa@Branimirs-Air axpy % make
gcc-11 -c -O -march=armv8.4-a+simd -D"Float=double" axpy.c -o axpy-sse.o
gcc-11 -c -O -march=armv8.4-a+simd -D"Float=double" axpy-main.c -o axpy-main-sse.o
as timing.gas -o timing.o
gcc axpy-sse.o axpy-main-sse.o timing.o -o axpy-sse
gcc-11 -c -O -march=armv8.4-a -D"Float=double" axpy.c -o axpy-387.o
gcc-11 -c -O -march=armv8.4-a -D"Float=double" axpy-main.c -o axpy-main-387.o
gcc axpy-387.o axpy-main-387.o timing.o -o axpy-387
bmaxa@Branimirs-Air axpy % ./axpy-387
stride: 0.000004 secs
axpy: 0.356832 secs
bmaxa@Branimirs-Air axpy % ./axpy-sse
stride: 0.000004 secs
axpy: 0.357641 secs
bmaxa@Branimirs-Air axpy % cat axpy-main.c
#include <stdlib.h>
void axpy(Float ra, Float *f_x, Float *f_y, long stride, unsigned long ucount);
extern void init_time(void);
extern void time_me(const char* format);
int main()
{ long stride=10;
long i;
char *x=malloc(16000);
char *y=malloc(16000);
char *px=x, *py=y;
init_time();
for (i=0; i<1000; i++) {
*(Float *)px=1.0;
*(Float *)py=0.0;
px+=stride;
py+=stride;
}
time_me("stride: %f secs\n");
init_time();
for (i=0; i<1000000; i++)
axpy(1.000001, (Float *)x, (Float *)y, stride, 1000);
time_me("axpy: %f secs\n");
}

bmaxa@Branimirs-Air axpy % cat timing.gas
.text
.globl _init_time
.globl _time_me
.align 4
_init_time:
mrs x0,CNTPCT_EL0 ; counter
adrp x8,elapsed@PAGE
str x0, [x8,elapsed@PAGEOFF]
ret
_time_me:
mrs x8,cntfrq_el0 ; clock
ucvtf d1,x8
mrs x8,CNTPCT_EL0 ; counter
adrp x9,elapsed@PAGE
ldr x9,[x9,elapsed@PAGEOFF]
sub x8,x8,x9
ucvtf d0,x8
fdiv d0,d0,d1
str d0,[sp]
b _printf
.data
.bss
.align 8
elapsed: .space 8

--
bmaxa now listens 03. Yorgos Kazantzis - Sorocos

Re: The value of floating-point exceptions?

<98ee6cd3-1a1e-44a6-848b-20a8d81a6adfn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19146&group=comp.arch#19146

Newsgroups: comp.arch
Date: Sat, 24 Jul 2021 20:40:12 -0700 (PDT)
In-Reply-To: <548e3590-e939-40b7-8f78-3040d59358f7n@googlegroups.com>
References: <sd9a9h$ro6$1@dont-email.me> <memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com> <a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me> <7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com> <sdf4t3$6cl$2@newsreader4.netcologne.de>
<c0917435-65c8-45f1-b745-5fd7ff4f58c0n@googlegroups.com> <548e3590-e939-40b7-8f78-3040d59358f7n@googlegroups.com>
Message-ID: <98ee6cd3-1a1e-44a6-848b-20a8d81a6adfn@googlegroups.com>
Subject: Re: The value of floating-point exceptions?
From: jsav...@ecn.ab.ca (Quadibloc)

by: Quadibloc - Sun, 25 Jul 2021 03:40 UTC

On Saturday, July 24, 2021 at 3:24:21 PM UTC-6, Quadibloc wrote:

> Since, unlike the STRETCH, the IBM 360 never went in for *bit*
> addressing, they could have designed it with a 36-bit word.

I've tried to imagine what a 360 with a nine-bit byte might be like.

I felt that it might be possible to design the RX format so that
displacements could grow from 12 bits to 15 bits:

opcode: 9 bits
destination register: 4 bits
index register: 4 bits
base register: 4 bits
displacement: 15 bits

But that would mean the RR format would have to look
like this:

opcode: 9 bits
destination register: 4 bits
source register: 4 bits
opcode: 1 bit

And the SS format instructions would be a mess:

opcode: 8 bits
source base register: 1 bit
length: 8 bits
destination base register: 4 bits
destination address: 15 bits
source base register (continued): 3 bits
source address: 15 bits

so I might just have to resign myself to 14-bit
displacements instead.
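The field widths above add up as intended: 9+4+4+4+15 = 36 bits for RX (one word), 9+4+4+1 = 18 bits for RR (one halfword), and 8+1+8+4+15+3+15 = 54 bits for SS (a word and a half). A sketch of packing the RX and RR forms, assuming the fields are laid out from the most significant bit in the order listed (only widths are given, not bit positions); the function names are hypothetical.

```c
#include <stdint.h>

/* RX: opcode(9) R1(4) X2(4) B2(4) D2(15) = 36 bits, stored in the low
   36 bits of a uint64_t with the opcode in the most significant position. */
uint64_t rx_encode(unsigned op, unsigned r1, unsigned x2, unsigned b2, unsigned d2)
{
    return ((uint64_t)(op & 0x1FFu) << 27) |
           ((uint64_t)(r1 & 0xFu) << 23) |
           ((uint64_t)(x2 & 0xFu) << 19) |
           ((uint64_t)(b2 & 0xFu) << 15) |
           (uint64_t)(d2 & 0x7FFFu);
}

/* Field extraction for RX. */
unsigned rx_opcode(uint64_t w)       { return (unsigned)(w >> 27) & 0x1FFu; }
unsigned rx_r1(uint64_t w)           { return (unsigned)(w >> 23) & 0xFu; }
unsigned rx_x2(uint64_t w)           { return (unsigned)(w >> 19) & 0xFu; }
unsigned rx_b2(uint64_t w)           { return (unsigned)(w >> 15) & 0xFu; }
unsigned rx_displacement(uint64_t w) { return (unsigned)w & 0x7FFFu; }

/* RR: opcode(9) R1(4) R2(4) extra opcode bit(1) = 18 bits (one halfword). */
uint32_t rr_encode(unsigned op, unsigned r1, unsigned r2, unsigned ext)
{
    return ((op & 0x1FFu) << 9) | ((r1 & 0xFu) << 5) | ((r2 & 0xFu) << 1) | (ext & 1u);
}
```

With 15 bits the displacement reaches 32,768 (nine-bit) bytes from the base register, versus 4,096 with the real System/360's 12-bit field.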

John Savard

Re: The value of floating-point exceptions?

<1aac679c-f89b-4a20-afcc-32b581cb3e7cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19149&group=comp.arch#19149

Newsgroups: comp.arch
Date: Sat, 24 Jul 2021 23:56:42 -0700 (PDT)
In-Reply-To: <98ee6cd3-1a1e-44a6-848b-20a8d81a6adfn@googlegroups.com>
References: <sd9a9h$ro6$1@dont-email.me> <memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com> <a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me> <7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com> <sdf4t3$6cl$2@newsreader4.netcologne.de>
<c0917435-65c8-45f1-b745-5fd7ff4f58c0n@googlegroups.com> <548e3590-e939-40b7-8f78-3040d59358f7n@googlegroups.com>
<98ee6cd3-1a1e-44a6-848b-20a8d81a6adfn@googlegroups.com>
Message-ID: <1aac679c-f89b-4a20-afcc-32b581cb3e7cn@googlegroups.com>
Subject: Re: The value of floating-point exceptions?
From: jsav...@ecn.ab.ca (Quadibloc)

by: Quadibloc - Sun, 25 Jul 2021 06:56 UTC

On Saturday, July 24, 2021 at 9:40:14 PM UTC-6, Quadibloc wrote:

> so I might just have to resign myself to 14-bit
> displacements instead.

But, on the other hand, if I do _that_, I end up with
opcode fields that are far larger than necessary.

John Savard

Re: The value of floating-point exceptions?

<sdj6qs$1fg0$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=19150&group=comp.arch#19150

From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Sun, 25 Jul 2021 10:19:39 +0200
Organization: Aioe.org NNTP Server
Message-ID: <sdj6qs$1fg0$1@gioia.aioe.org>
References: <sd9a9h$ro6$1@dont-email.me>
<memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com>
<a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me>
<7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<sdeill$7ct$1@dont-email.me>

by: Terje Mathisen - Sun, 25 Jul 2021 08:19 UTC

Ivan Godard wrote:
> On 7/23/2021 5:19 AM, Quadibloc wrote:
>> On Friday, July 23, 2021 at 4:54:17 AM UTC-6, Ivan Godard wrote:
>>> Some people care if the moon lander
>>> lands on the moon surface, or ten meters above or below it.
>>
>> That's definitely a good thing to care about.
>>
>> Does the fancy stuff in IEEE 754 really help with that?
>>
>> Or would doing everything in double precision, even if one
>> were using something like the old System/360 floating
>> format, do a better job?
>>
>> Accurate and reliable calculations by computers are a very
>> important thing. How much the approach taken by IEEE 754
>> contributes to that goal, let alone the more elaborate notions
>> presented by people like John Gustafson, is, however, an open
>> question, I would think.
>>
>> John Savard
>>
>
>
> Back when I was active on the IEEE committee, I once asked Kahan
> whether, if quad (128-bit) were as fast as double, would he still have
> denorms. He answered an unequivocal "No!".

The funny part is of course that since then, due to the universal
inclusion of FMAC, denorms no longer have any speed penalty, just a
single-digit percentage gate increase.

I.e. no reason to skip it even on quad where I'm guessing the huge FMAC
post-normalization network would be one of the largest single features.
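One concrete property denormals (subnormals) buy: with gradual underflow, `x - y == 0` holds only when `x == y`. A small C check with a hypothetical helper name, assuming the default IEEE 754 environment (no flush-to-zero):

```c
#include <float.h>
#include <stdbool.h>

/* True when x - y is nonzero but smaller than the smallest normal double,
   i.e. the difference survives only as a subnormal. A flush-to-zero unit
   would return 0 here even though x != y. */
bool difference_is_subnormal(double x, double y)
{
    double d = x - y;
    if (d < 0)
        d = -d;
    return d != 0.0 && d < DBL_MIN;
}
```

For example, `difference_is_subnormal(1.5 * DBL_MIN, DBL_MIN)` is true: the exact difference, 2^-1023, is representable only as a subnormal.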

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: The value of floating-point exceptions?

<0p9LI.21872$tL2.6289@fx43.iad>

https://www.novabbs.com/devel/article-flat.php?id=19151&group=comp.arch#19151

Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: The value of floating-point exceptions?
References: <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<memo.20210723223630.14132H@jgd.cix.co.uk>
<2021Jul24.112832@mips.complang.tuwien.ac.at>
<TZRKI.24285$Nq7.6581@fx33.iad>
<2021Jul24.184642@mips.complang.tuwien.ac.at>
<Ws_KI.75293$VU3.57635@fx46.iad> <Ej1LI.12196$gE.4282@fx21.iad>
Message-ID: <0p9LI.21872$tL2.6289@fx43.iad>
Organization: usenet-news.net
Date: Sun, 25 Jul 2021 08:37:16 GMT

by: Branimir Maksimovic - Sun, 25 Jul 2021 08:37 UTC

On 2021-07-24, Branimir Maksimovic <branimir.maksimovic@gmail.com> wrote:
> On 2021-07-24, Branimir Maksimovic <branimir.maksimovic@gmail.com> wrote:
> [...]
> Here it is on M1:
> [...]
> bmaxa@Branimirs-Air axpy % ./axpy-387
> stride: 0.000004 secs
> axpy: 0.356832 secs
> bmaxa@Branimirs-Air axpy % ./axpy-sse
> stride: 0.000004 secs
> axpy: 0.357641 secs
> [...]
Hm, it seems that on ARMv8 SIMD can't be switched off; the compiler produces identical code
for both cases. The -mfpu option is also not present on AArch64...

--
bmaxa now listens Volim Te by Lollobrigida from Lollobrigida Inc.

Re: The value of floating-point exceptions?

<sdj9rq$ue1$2@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=19153&group=comp.arch#19153

From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Sun, 25 Jul 2021 09:11:23 -0000 (UTC)
Organization: news.netcologne.de
Message-ID: <sdj9rq$ue1$2@newsreader4.netcologne.de>
References: <sd9a9h$ro6$1@dont-email.me>
<memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com>
<a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me>
<7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com>
<sdf4t3$6cl$2@newsreader4.netcologne.de>
<c0917435-65c8-45f1-b745-5fd7ff4f58c0n@googlegroups.com>
<548e3590-e939-40b7-8f78-3040d59358f7n@googlegroups.com>

by: Thomas Koenig - Sun, 25 Jul 2021 09:11 UTC

Quadibloc <jsavard@ecn.ab.ca> schrieb:
> On Friday, July 23, 2021 at 2:13:02 PM UTC-6, MitchAlsup wrote:
>
>> IBM's was worse than simply radix 16; it was also a truncation system
>> with a guard digit (½ a byte) in calculations. To make matters worse, its
>> competitors were 36-bit, 48-bit, and 60-bit (single precision).
>
> And it originally went out the door without even the guard digit: the
> results of arithmetic done that way were so bad, they had to fix that
> in every computer they sold as well as in all future ones.
>
> I hadn't looked at things that way, though, as far as precision went.
>
> Since, unlike the STRETCH, the IBM 360 never went in for *bit*
> addressing, they could have designed it with a 36-bit word.

That was not in the cards.

Gene Amdahl wanted a 24-bit machine, but got overruled by
management because they insisted on a power of two for the number
of bits, and because the 7-bit ASCII standard was already on
the horizon.

If he had had his way, the /360 would have gone the way of the
CDC 3000 series and other 24-bit systems a long time ago.

There are ample examples of how to deal with 36-bit words:
the IBM 701 and its successors, and the PDP-10, for example.

Re: The value of floating-point exceptions?

<sdjghd$1i4a$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=19154&group=comp.arch#19154

From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Sun, 25 Jul 2021 13:05:17 +0200
Organization: Aioe.org NNTP Server
Message-ID: <sdjghd$1i4a$1@gioia.aioe.org>
References: <sd9a9h$ro6$1@dont-email.me>
<memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com>
<a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me>
<7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com>
<fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<713a35af-9cce-4954-b968-1b4b754e7b1en@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="51338"; posting-host="gQesnmzb0qB9vB6KzIYu2A.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.8
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Sun, 25 Jul 2021 11:05 UTC

MitchAlsup wrote:
> Having watched this from inside:
> a) HW designers know a lot more about this today than in 1980
> b) even systems that started out as IEEE-format gradually went
> closer and closer to full IEEE-compliant (GPUs) until there is no
> useful difference in the quality of the arithmetic.
> c) once 754-2009 came out the overhead to do denorms went to
> zero, and there is no reason to avoid full speed denorms in practice.
> (BGB's small FPGA prototyping environment aside.)

I agree.

> d) HW designers have learned how to perform all of the rounding
> modes at no overhead compared to RNE.

This is actually dead easy since all the other modes are easier than
RNE: As soon as you have all four bits required for RNE (i.e.
sign/ulp/guard/sticky) then the remaining rounding modes only need
various subsets of these, so you use the rounding mode to route one of 5
or 6 possible 16-entry one-bit lookup tables into the rounding circuit
where it becomes the input to be added into the ulp position of the
final packed (sign/exp/mantissa) fp result.
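A minimal C sketch of the table-driven rounding described above, assuming a 4-bit index built from sign, ulp (lsb), guard and sticky, and five IEEE 754 rounding modes. The mode names and table layout are illustrative only, not taken from any real FPU. Each mode's 16-entry, one-bit table is packed into a uint16_t; RNE ignores the sign bit, which is only needed by the two directed modes.

```c
#include <stdint.h>

/* Illustrative mode names (hypothetical, not from a real FPU). */
enum round_mode { RM_RNE, RM_RTZ, RM_RUP, RM_RDN, RM_RNA };

/* One 16-entry, one-bit table per mode, packed into a uint16_t.
   Bit index = sign<<3 | lsb<<2 | guard<<1 | sticky. */
static const uint16_t round_table[5] = {
    0xC8C8, /* RNE: guard && (sticky || lsb)                  */
    0x0000, /* RTZ: never increment                           */
    0x00EE, /* RUP (toward +Inf): !sign && (guard || sticky)  */
    0xEE00, /* RDN (toward -Inf):  sign && (guard || sticky)  */
    0xCCCC, /* RNA (nearest, ties away from zero): guard      */
};

/* Returns the 0/1 increment to add at the ulp position of the magnitude. */
unsigned round_increment(enum round_mode mode, unsigned sign, unsigned lsb,
                         unsigned guard, unsigned sticky)
{
    unsigned idx = ((sign & 1u) << 3) | ((lsb & 1u) << 2) |
                   ((guard & 1u) << 1) | (sticky & 1u);
    return (round_table[mode] >> idx) & 1u;
}
```

Selecting a table by mode is just a multiplexer in hardware, which is why the extra modes cost essentially nothing once the four input bits exist.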

Since the hidden bit is already hidden at this point, any rounding
overflow of the mantissa from 0xfff.. to 0x000.. will cause the exponent
term to be incremented, possibly all the way to Inf. In all cases, this
is the exactly correct behaviour.
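The carry-into-exponent behaviour can be checked in software by treating the packed binary32 word as an integer. A sketch, assuming IEEE 754 binary32 `float` and a 32-bit `uint32_t`; the helper names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret float <-> packed bits; memcpy avoids strict-aliasing issues. */
uint32_t f32_bits(float f)    { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
float    f32_from(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Add a 0/1 rounding increment directly at the ulp of the packed
   sign/exponent/mantissa word. An all-ones mantissa wraps to zero and
   the carry bumps the exponent; from the largest finite value it lands
   exactly on the +Inf (or -Inf) encoding. */
float add_ulp_packed(float f, unsigned inc)
{
    return f32_from(f32_bits(f) + (inc & 1u));
}
```

For example, 0x3FFFFFFF (the float just below 2.0) plus one becomes 0x40000000, which is exactly 2.0f, and 0x7F7FFFFF (FLT_MAX) plus one becomes 0x7F800000, the +Inf encoding.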

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: The value of floating-point exceptions?

<jwvmtqalefg.fsf-monnier+comp.arch@gnu.org>


https://www.novabbs.com/devel/article-flat.php?id=19158&group=comp.arch#19158


Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Sun, 25 Jul 2021 09:19:47 -0400
Organization: A noiseless patient Spider
Lines: 10
Message-ID: <jwvmtqalefg.fsf-monnier+comp.arch@gnu.org>
References: <sd9a9h$ro6$1@dont-email.me>
<memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com>
<a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me>
<7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com>
<sdf4t3$6cl$2@newsreader4.netcologne.de>
<c0917435-65c8-45f1-b745-5fd7ff4f58c0n@googlegroups.com>
<548e3590-e939-40b7-8f78-3040d59358f7n@googlegroups.com>
<sdj9rq$ue1$2@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="a34bdcb9792c85d65dd509414c161a94";
logging-data="17230"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18p1rbJkwLzBXgYxg+drVvg"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Cancel-Lock: sha1:618p3idXggRgHitTXRiCOvnTgrU=
sha1:n2r/fX3t8Dowbcd8Qd1wEdMi9A4=

by: Stefan Monnier - Sun, 25 Jul 2021 13:19 UTC

> Gene Amdahl wanted a 24-bit machine, but got overruled by
> management because they insisted on a power of two for the number
> of bits, and because the 7-bit ASCII standard was already on
> the horizon.

A good reminder that management's decisions can be quite sane, even when
they displease the engineers.

Stefan

Re: The value of floating-point exceptions?

<bdd7738f-1b15-4970-a3d5-dcbf62496ffen@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=19161&group=comp.arch#19161


X-Received: by 2002:ad4:518f:: with SMTP id b15mr13560871qvp.52.1627222888479;
Sun, 25 Jul 2021 07:21:28 -0700 (PDT)
X-Received: by 2002:a05:6820:61d:: with SMTP id e29mr7885333oow.69.1627222888208;
Sun, 25 Jul 2021 07:21:28 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 25 Jul 2021 07:21:28 -0700 (PDT)
In-Reply-To: <jwvmtqalefg.fsf-monnier+comp.arch@gnu.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:f39a:d100:a4fa:c486:95ac:988e;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:f39a:d100:a4fa:c486:95ac:988e
References: <sd9a9h$ro6$1@dont-email.me> <memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com> <a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me> <7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com> <sdf4t3$6cl$2@newsreader4.netcologne.de>
<c0917435-65c8-45f1-b745-5fd7ff4f58c0n@googlegroups.com> <548e3590-e939-40b7-8f78-3040d59358f7n@googlegroups.com>
<sdj9rq$ue1$2@newsreader4.netcologne.de> <jwvmtqalefg.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <bdd7738f-1b15-4970-a3d5-dcbf62496ffen@googlegroups.com>
Subject: Re: The value of floating-point exceptions?
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Sun, 25 Jul 2021 14:21:28 +0000
Content-Type: text/plain; charset="UTF-8"

by: Quadibloc - Sun, 25 Jul 2021 14:21 UTC

On Sunday, July 25, 2021 at 7:19:51 AM UTC-6, Stefan Monnier wrote:

> > Gene Amdahl wanted a 24-bit machine, but got overruled by
> > management because they insisted on a power of two for the number
> > of bits, and because the 7-bit ASCII standard was already on
> > the horizon.

> A good reminder that management's decisions can be quite sane, even when
> they displease the engineers.

That all depends. If Gene Amdahl wanted to build something like,
say, an SDS 9300, yes, management was right, I will agree.

However, IBM also built the AN/FSQ-31 and AN/FSQ-32. These were
48-bit machines, and if the IBM 360 had looked something like them,
it could well have been just as successful.

John Savard

Re: The value of floating-point exceptions?

<2021Jul25.172251@mips.complang.tuwien.ac.at>


https://www.novabbs.com/devel/article-flat.php?id=19163&group=comp.arch#19163


Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Sun, 25 Jul 2021 15:22:51 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 21
Message-ID: <2021Jul25.172251@mips.complang.tuwien.ac.at>
References: <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com> <memo.20210723223630.14132H@jgd.cix.co.uk> <2021Jul24.112832@mips.complang.tuwien.ac.at> <TZRKI.24285$Nq7.6581@fx33.iad> <2021Jul24.184642@mips.complang.tuwien.ac.at> <Ws_KI.75293$VU3.57635@fx46.iad> <Ej1LI.12196$gE.4282@fx21.iad> <0p9LI.21872$tL2.6289@fx43.iad>
Injection-Info: reader02.eternal-september.org; posting-host="3a9da7f4055550120b631e882934953b";
logging-data="31107"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18xTOTuEnCmzOC4MLXk4Mr3"
Cancel-Lock: sha1:oTvdvJmCb6kVczE9s38pwUU99KI=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Sun, 25 Jul 2021 15:22 UTC

Branimir Maksimovic <branimir.maksimovic@gmail.com> writes:
>Hm, seems that on ARMv8 simd can't be switched off, compiler produces identical code
>for both cases.. -mfpu option is also not present on aarch64...

It will be once you can choose between Neon and SVE (and maybe
Helium).

Even if the code for axpy contains a vectorized variant for stride=8,
this variant will not run, because stride=10 (because this test has
originally been written to test the speed difference between 80-bit
387 FP and 64-bit 387 FP). I very much doubt that they perform
autovectorization for stride=10.

Testing this with gcc-10.2 and clang-11.0 on AMD64 with -O3 -mavx2,
gcc produces a simple scalar loop, while clang produces an unrolled
scalar loop (no vectorized variants to be seen).

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: The value of floating-point exceptions?

<174a169c-1005-4fc2-8a20-0b61c1e8dd8en@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=19165&group=comp.arch#19165


X-Received: by 2002:ac8:5645:: with SMTP id 5mr11940866qtt.200.1627228077513; Sun, 25 Jul 2021 08:47:57 -0700 (PDT)
X-Received: by 2002:aca:4946:: with SMTP id w67mr14407136oia.155.1627228077291; Sun, 25 Jul 2021 08:47:57 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed9.news.xs4all.nl!tr2.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 25 Jul 2021 08:47:57 -0700 (PDT)
In-Reply-To: <0p9LI.21872$tL2.6289@fx43.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com> <memo.20210723223630.14132H@jgd.cix.co.uk> <2021Jul24.112832@mips.complang.tuwien.ac.at> <TZRKI.24285$Nq7.6581@fx33.iad> <2021Jul24.184642@mips.complang.tuwien.ac.at> <Ws_KI.75293$VU3.57635@fx46.iad> <Ej1LI.12196$gE.4282@fx21.iad> <0p9LI.21872$tL2.6289@fx43.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <174a169c-1005-4fc2-8a20-0b61c1e8dd8en@googlegroups.com>
Subject: Re: The value of floating-point exceptions?
From: already5...@yahoo.com (Michael S)
Injection-Date: Sun, 25 Jul 2021 15:47:57 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 133

by: Michael S - Sun, 25 Jul 2021 15:47 UTC

On Sunday, July 25, 2021 at 11:37:21 AM UTC+3, Branimir Maksimovic wrote:
> On 2021-07-24, Branimir Maksimovic <branimir....@gmail.com> wrote:
> > On 2021-07-24, Branimir Maksimovic <branimir....@gmail.com> wrote:
> >> On 2021-07-24, Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> >>> Branimir Maksimovic <branimir....@gmail.com> writes:
> >>>>> Both loops execute at 2 cycles/iteration (IPC=4) on a Skylake. I
> >>> ...
> >>>>Could you provide test program?
> >>>
> >>> http://www.complang.tuwien.ac.at/anton/tmp/axpy.zip
> >>>
> >>> Today I measure 1.8 cycles per iteration (IPC=4.44) from this program
> >>> (both compiled for 387 and SSE2) on a Skylake. Strange.
> >>>
> >>> - anton
> >> Heh, I just need timing code now:
> >> in gas, ARMv8 equivalent of rdtsc:
> >> init_time:
> >> mrs x0,CNTPCT_EL0 ; counter
> >> adrp x8,elapsed@PAGE
> >> str x0, [x8,elapsed@PAGEOFF]
> >> ret
> >> time_me:
> >> mrs x8,cntfrq_el0 ; clock
> >> ucvtf d1,x8
> >> mrs x8,CNTPCT_EL0 ; counter
> >> adrp x9,elapsed@PAGE
> >> ldr x9,[x9,elapsed@PAGEOFF]
> >> sub x8,x8,x9
> >> ucvtf d0,x8
> >> fdiv d0,d0,d1
> >> str d0,[sp]
> >> b _printf
> >> Just I dunno how to measure ticks in C :P
> >> modified also flags for gcc:
> >> bmaxa@Branimirs-Air axpy % cat Makefile
> >> all: axpy-sse axpy-387
> >>
> >> axpy-sse: axpy-sse.o axpy-main-sse.o
> >> gcc axpy-sse.o axpy-main-sse.o -o $@
> >>
> >> axpy-387: axpy-387.o axpy-main-387.o
> >> gcc axpy-387.o axpy-main-387.o -o $@
> >>
> >>
> >> axpy-sse.o: axpy.c
> >> gcc-11 -c -O -march=armv8.4-a+simd -D"Float=double" axpy.c -o $@
> >>
> >> axpy-387.o: axpy.c
> >> gcc-11 -c -O -march=armv8.4-a -D"Float=double" axpy.c -o $@
> >>
> >> axpy-main-sse.o: axpy-main.c
> >> gcc-11 -c -O -march=armv8.4-a+simd -D"Float=double" axpy-main.c -o $@
> >>
> >> axpy-main-387.o: axpy-main.c
> >> gcc-11 -c -O -march=armv8.4-a -D"Float=double" axpy-main.c -o $@
> >>
> >>
> > Here it is on M1:
> > bmaxa@Branimirs-Air axpy % make
> > gcc-11 -c -O -march=armv8.4-a+simd -D"Float=double" axpy.c -o axpy-sse.o
> > gcc-11 -c -O -march=armv8.4-a+simd -D"Float=double" axpy-main.c -o axpy-main-sse.o
> > as timing.gas -o timing.o
> > gcc axpy-sse.o axpy-main-sse.o timing.o -o axpy-sse
> > gcc-11 -c -O -march=armv8.4-a -D"Float=double" axpy.c -o axpy-387.o
> > gcc-11 -c -O -march=armv8.4-a -D"Float=double" axpy-main.c -o axpy-main-387.o
> > gcc axpy-387.o axpy-main-387.o timing.o -o axpy-387
> > bmaxa@Branimirs-Air axpy % ./axpy-387
> > stride: 0.000004 secs
> > axpy: 0.356832 secs
> > bmaxa@Branimirs-Air axpy % ./axpy-sse
> > stride: 0.000004 secs
> > axpy: 0.357641 secs
> > bmaxa@Branimirs-Air axpy % cat axpy-main.c
> > #include <stdlib.h>
> > void axpy(Float ra, Float *f_x, Float *f_y, long stride, unsigned long ucount);
> > extern void init_time(void);
> > extern void time_me(const char* format);
> > int main()
> > {
> > long stride=10;
> > long i;
> > char *x=malloc(16000);
> > char *y=malloc(16000);
> > char *px=x, *py=y;
> > init_time();
> > for (i=0; i<1000; i++) {
> > *(Float *)px=1.0;
> > *(Float *)py=0.0;
> > px+=stride;
> > py+=stride;
> > }
> > time_me("stride: %f secs\n");
> > init_time();
> > for (i=0; i<1000000; i++)
> > axpy(1.000001, (Float *)x, (Float *)y, stride, 1000);
> > time_me("axpy: %f secs\n");
> > }
> >
> > bmaxa@Branimirs-Air axpy % cat timing.gas
> > .text
> > .globl _init_time
> > .globl _time_me
> > .align 4
> > _init_time:
> > mrs x0,CNTPCT_EL0 ; counter
> > adrp x8,elapsed@PAGE
> > str x0, [x8,elapsed@PAGEOFF]
> > ret
> > _time_me:
> > mrs x8,cntfrq_el0 ; clock
> > ucvtf d1,x8
> > mrs x8,CNTPCT_EL0 ; counter
> > adrp x9,elapsed@PAGE
> > ldr x9,[x9,elapsed@PAGEOFF]
> > sub x8,x8,x9
> > ucvtf d0,x8
> > fdiv d0,d0,d1
> > str d0,[sp]
> > b _printf
> > .data
> > .bss
> > .align 8
> > elapsed: .space 8
> >
> >
> Hm, seems that on ARMv8 simd can't be switched off, compiler produces identical code
> for both cases.. -mfpu option is also not present on aarch64...
>

IIRC, clang supports -fno-vectorize

> --
> bmaxa now listens Volim Te by Lollobrigida from Lollobrigida Inc.
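On the aside in the quoted post about measuring time from C: on a POSIX system, one portable option (an alternative, not what timing.gas does) is clock_gettime with CLOCK_MONOTONIC, which reports seconds rather than raw CNTPCT_EL0 ticks. A sketch; the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime/CLOCK_MONOTONIC under strict -std= modes */
#include <time.h>

/* Seconds on the monotonic clock (arbitrary epoch); only differences are meaningful. */
double monotonic_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

Usage mirrors init_time/time_me: take t0 = monotonic_seconds() before the axpy loop and print monotonic_seconds() - t0 after it.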

Re: The value of floating-point exceptions?

<sdk2kd$lu1$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=19167&group=comp.arch#19167


Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Sun, 25 Jul 2021 11:14:03 -0500
Organization: A noiseless patient Spider
Lines: 90
Message-ID: <sdk2kd$lu1$1@dont-email.me>
References: <sd9a9h$ro6$1@dont-email.me>
<memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com>
<a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me>
<7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com>
<fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<713a35af-9cce-4954-b968-1b4b754e7b1en@googlegroups.com>
<sdjghd$1i4a$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 25 Jul 2021 16:14:06 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="ecf570b92954f16bd440b3dfeb2dc12f";
logging-data="22465"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18QWgFxV1oC8LEVJXosXrq6"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
Cancel-Lock: sha1:ZKcxQo7pwAqKjLseQGW2BeiD0FA=
In-Reply-To: <sdjghd$1i4a$1@gioia.aioe.org>
Content-Language: en-US

by: BGB - Sun, 25 Jul 2021 16:14 UTC

On 7/25/2021 6:05 AM, Terje Mathisen wrote:
> MitchAlsup wrote:
>> Having watched this from inside:
>> a) HW designers know a lot more about this today than in 1980
>> b) even systems that started out as IEEE-format gradually went
>> closer and closer to full IEEE-compliant (GPUs) until there is no
>> useful difference in the quality of the arithmetic.
>> c) once 754-2009 came out the overhead to do denorms went to
>> zero, and there is no reason to avoid full speed denorms in practice.
>> (BGB's small FPGA prototyping environment aside.)
>
> I agree.
>
>> d) HW designers have learned how to perform all of the rounding
>> modes at no overhead compared to RNE.
>
> This is actually dead easy since all the other modes are easier than
> RNE: As soon as you have all four bits required for RNE (i.e.
> sign/ulp/guard/sticky) then the remaining rounding modes only need
> various subsets of these, so you use the rounding mode to route one of 5
> or 6 possible 16-entry one-bit lookup tables into the rounding circuit
> where it becomes the input to be added into the ulp position of the
> final packed (sign/exp/mantissa) fp result.
>

Oddly enough, the extra cost to rounding itself is not the main issue
with multiple rounding modes, but more the question of how the bits get
there (if one doesn't already have an FPU status register or similar).

Granted, could in theory put these bits in SR or similar, but, yeah...

It would be better IMO if it were part of the instruction, but there
isn't really any good / non-annoying way to encode this. The
"least awful" option would probably be to use an Op64 encoding, which then uses
some of the Immed extension bits to encode a rounding mode.

* FFw0_00ii_F0nm_5eo8 FADD Rm, Ro, Rn, Imm8
* FFw0_00ii_F0nm_5eo9 FSUB Rm, Ro, Rn, Imm8
* FFw0_00ii_F0nm_5eoA FMUL Rm, Ro, Rn, Imm8

Where the Imm8 field encodes the rounding mode, say:
00 = Round to Nearest.
01 = Truncate.

Or could go the SR route, but I don't want FPU behavior to depend on SR.

> Since the hidden bit is already hidden at this point, any rounding
> overflow of the mantissa from 0xfff.. to 0x000.. will cause the exponent
> term to be incremented, possibly all the way to Inf. In all cases, this
> is the exactly correct behaviour.
>

Yep.

Main limiting factor though is that for bigger formats (Double or FP96),
propagating the carry that far can be an issue.

In the vast majority of cases, the carry gets absorbed within the low 8
or 16 bits or so (or if it doesn't, leave these bits as-is).
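One way to read that shortcut: let the rounding carry ripple only through the low few bits, and skip the increment entirely (leaving the bits as-is) when it would carry out of that window. This is a sketch of that reading, assuming a 64-bit packed word and a hypothetical window width; it is deliberately not IEEE-exact in the rare skipped cases.

```c
#include <stdint.h>

/* Apply a 0/1 rounding increment, but only when its carry stays within
   the low `width` bits (1..63). If those bits are all ones the carry
   would propagate further, so the value is left unchanged (truncated). */
uint64_t round_limited_carry(uint64_t packed, unsigned inc, unsigned width)
{
    uint64_t window = (UINT64_C(1) << width) - 1;
    if ((inc & 1u) && (packed & window) != window)
        return packed + 1;   /* carry is absorbed inside the window */
    return packed;           /* skip the increment rather than propagate */
}
```

The hardware saving is that the rounding incrementer only needs to be `width` bits wide instead of spanning the whole mantissa.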

For narrowing conversions to Binary16 or Binary32, full width rounding
is both easier and more useful.

For FADD/FSUB, the vast majority of cases where a very long stream of
1's would have occurred can be avoided by doing the math internally in
twos complement form.

Though, in this case, one can save a little cost by implementing the
"twos complement" as essentially ones' complement with a carry bit input
to the adder (one can't arrive at a case where both inputs are negative
with FADD).

Cases can occur, though, where the result mantissa comes up negative,
which can itself require a sign inversion. The only alternative
is to compare mantissa input values by value if the exponents are equal,
which is also fairly expensive.

Though, potentially one could use the rounding step to "absorb" part of
the cost of the second sign inversion.

Another possibility here could be to have an adder which produces two
outputs, namely both ((A+B)+Cin) and (~(A+B)+(!Cin)), and then using the
second output if the first came up negative.
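A small C model of that effective-subtraction path: the subtrahend enters as its ones' complement with a carry-in of 1, and the second output ~(A+B)+(!Cin), which always equals the negation of (A+B)+Cin, supplies the magnitude when the first result comes up negative. The struct, names and 64-bit width are assumptions for illustration; mantissas are taken to be below 2^63 so bit 63 acts as the sign.

```c
#include <stdint.h>

typedef struct { uint64_t mag; unsigned negative; } mant_diff;  /* hypothetical */

/* Effective subtraction of mantissas a - b (both < 2^63). */
mant_diff mant_sub(uint64_t a, uint64_t b)
{
    uint64_t bin  = ~b;              /* ones' complement of the subtrahend */
    uint64_t cin  = 1;               /* carry-in makes it two's complement */
    uint64_t s    = a + bin;         /* shared adder core: A + B'          */
    uint64_t out1 = s + cin;         /* (A + B') + Cin    == a - b         */
    uint64_t out2 = ~s + (cin ^ 1);  /* ~(A + B') + !Cin  == b - a         */
    mant_diff r;
    r.negative = (unsigned)(out1 >> 63);  /* sign bit of the first output */
    r.mag = r.negative ? out2 : out1;     /* pick the non-negative one    */
    return r;
}
```

The identity behind it: with Cin = 1, ~(A + ~B) = -(A + ~B + 1) = B - A, so both outputs come from one sum and a final inverter, with no second carry-propagate pass.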

....

Re: The value of floating-point exceptions?

<476a9f6b-5fa0-4606-aed6-cf31089b8c5bn@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=19170&group=comp.arch#19170


X-Received: by 2002:a05:6214:4102:: with SMTP id kc2mr14152645qvb.44.1627233779822; Sun, 25 Jul 2021 10:22:59 -0700 (PDT)
X-Received: by 2002:aca:3094:: with SMTP id w142mr7477583oiw.37.1627233779622; Sun, 25 Jul 2021 10:22:59 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 25 Jul 2021 10:22:59 -0700 (PDT)
In-Reply-To: <sdk2kd$lu1$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:c819:dc8e:782d:1663; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:c819:dc8e:782d:1663
References: <sd9a9h$ro6$1@dont-email.me> <memo.20210721153537.10680P@jgd.cix.co.uk> <e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com> <a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com> <sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me> <7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com> <e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com> <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com> <713a35af-9cce-4954-b968-1b4b754e7b1en@googlegroups.com> <sdjghd$1i4a$1@gioia.aioe.org> <sdk2kd$lu1$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <476a9f6b-5fa0-4606-aed6-cf31089b8c5bn@googlegroups.com>
Subject: Re: The value of floating-point exceptions?
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 25 Jul 2021 17:22:59 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 111

by: MitchAlsup - Sun, 25 Jul 2021 17:22 UTC

On Sunday, July 25, 2021 at 11:14:08 AM UTC-5, BGB wrote:
> On 7/25/2021 6:05 AM, Terje Mathisen wrote:
> > MitchAlsup wrote:
> >> Having watched this from inside:
> >> a) HW designers know a lot more about this today than in 1980
> >> b) even systems that started out as IEEE-format gradually went
> >> closer and closer to full IEEE-compliant (GPUs) until there is no
> >> useful difference in the quality of the arithmetic.
> >> c) once 754-2009 came out the overhead to do denorms went to
> >> zero, and there is no reason to avoid full speed denorms in practice.
> >> (BGB's small FPGA prototyping environment aside.)
> >
> > I agree.
> >
> >> d) HW designers have learned how to perform all of the rounding
> >> modes at no overhead compared to RNE.
> >
> > This is actually dead easy since all the other modes are easier than
> > RNE: As soon as you have all four bits required for RNE (i.e.
> > sign/ulp/guard/sticky) then the remaining rounding modes only need
> > various subsets of these, so you use the rounding mode to route one of 5
> > or 6 possible 16-entry one-bit lookup tables into the rounding circuit
> > where it becomes the input to be added into the ulp position of the
> > final packed (sign/exp/mantissa) fp result.
> >
> Oddly enough, the extra cost to rounding itself is not the main issue
> with multiple rounding modes, but more the question of how the bits get
> there (if one doesn't already have an FPU status register or similar).
>
> Granted, could in theory put these bits in SR or similar, but, yeah...
>
> It would be better IMO if it were part of the instruction, but there
> isn't really any good / non-annoying way to encode this.
<
And this is why they are put in control/status registers.
<
> The "least awful" option would probably be to use an Op64 encoding, which then uses
> some of the Immed extension bits to encode a rounding mode.
<
The argument against having them in instructions is that this prevents
someone from running the code several times with different rounding
modes set to detect any sensitivity to the actually chosen rounding mode.
Kahan said he uses this a lot.
>
>
> * FFw0_00ii_F0nm_5eo8 FADD Rm, Ro, Rn, Imm8
> * FFw0_00ii_F0nm_5eo9 FSUB Rm, Ro, Rn, Imm8
> * FFw0_00ii_F0nm_5eoA FMUL Rm, Ro, Rn, Imm8
>
> Where the Imm8 field encodes the rounding mode, say:
> 00 = Round to Nearest.
> 01 = Truncate.
>
> Or could go the SR route, but I don't want FPU behavior to depend on SR.
<
When one has multi-threading and a control/status register, one simply
reads the RM field and delivers it to the FU as an operand. A couple
of interlock checks means you don't really have to stall the pipeline
because these modes don't change all that often.
<
> > Since the hidden bit is already hidden at this point, any rounding
> > overflow of the mantissa from 0xfff.. to 0x000.. will cause the exponent
> > term to be incremented, possibly all the way to Inf. In all cases, this
> > is the exactly correct behaviour.
> >
> Yep.
>
> Main limiting factor though is that for bigger formats (Double or FP96),
> propagating the carry that far can be an issue.
<
Kogge-Stone adders!
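For readers who haven't met the term: a Kogge-Stone adder computes all carries with a parallel prefix over (generate, propagate) pairs in log2(n) levels, so a rounding carry that must cross a whole 64-bit word costs 6 prefix levels rather than a 64-step ripple. The bit-parallel C model below is only a software illustration of that prefix structure, not a gate-level design.

```c
#include <stdint.h>

/* Software model of a 64-bit Kogge-Stone adder (carry-out discarded). */
uint64_t kogge_stone_add(uint64_t a, uint64_t b)
{
    uint64_t g = a & b;   /* bit i generates a carry            */
    uint64_t p = a ^ b;   /* bit i propagates an incoming carry */
    /* Prefix levels at distance 1, 2, 4, ..., 32: after the loop,
       bit i of g is the carry out of bits 0..i. */
    for (unsigned d = 1; d < 64; d <<= 1) {
        g |= p & (g << d);  /* uses p from the previous level */
        p &= p << d;
    }
    return (a ^ b) ^ (g << 1);  /* sum bit i = a_i ^ b_i ^ carry into i */
}
```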
>
> In the vast majority of cases, the carry gets absorbed within the low 8
> or 16 bits or so (or if it doesn't, leave these bits as-is).
>
> For narrowing conversions to Binary16 or Binary32, full width rounding
> is both easier and more useful.
>
>
>
> For FADD/FSUB, the vast majority of cases where a very long stream of
> 1's would have occurred can be avoided by doing the math internally in
> twos complement form.
>
> Though, in this case, one can save a little cost by implementing the
> "twos complement" as essentially ones' complement with a carry bit input
> to the adder (one can't arrive at a case where both inputs are negative
> with FADD).
<
This is a standard trick that everyone should know--I first saw it in the
PDP-8 in the Complement and increment instruction--but it has come in
handy several times and is the way operands are negated and complemented
in My 66000. The operand is conditionally complemented with a carry in
conditionally asserted. If the operand being processed is an integer, there
is an adder that deals with the carry in. If the operand is logical, there is
no adder and the carry in is ignored.
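A hypothetical C model of that operand path (the names are invented, and this is not the actual My 66000 datapath): one control bit both complements the B operand and asserts the carry-in. The integer path's adder consumes the carry to form a - b, as the PDP-8 complement-and-increment does for negation; the logical path has no adder, so the same complemented operand yields a & ~b and the carry is simply dropped.

```c
#include <stdint.h>

typedef enum { OP_ADD, OP_AND } alu_op;   /* hypothetical op names */

/* b is conditionally complemented and the carry-in conditionally
   asserted by the same control bit (invert_b). */
uint64_t alu(alu_op op, uint64_t a, uint64_t b, int invert_b)
{
    uint64_t bb  = invert_b ? ~b : b;    /* conditional complement */
    uint64_t cin = invert_b ? 1u : 0u;   /* conditional carry-in   */
    if (op == OP_ADD)
        return a + bb + cin;             /* integer: a - b when inverted */
    return a & bb;                       /* logical: carry-in ignored, a & ~b */
}
```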
>
>
> Cases can occur, though, where the result mantissa comes up negative,
> which can itself require a sign inversion. The only alternative
> is to compare mantissa input values by value if the exponents are equal,
> which is also fairly expensive.
>
> Though, potentially one could use the rounding step to "absorb" part of
> the cost of the second sign inversion.
>
> Another possibility here could be to have an adder which produces two
> outputs, namely both ((A+B)+Cin) and (~(A+B)+(!Cin)), and then using the
> second output if the first came up negative.
>
> ...

Re: The value of floating-point exceptions?

<sdk7e0$mk2$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=19171&group=comp.arch#19171


Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Sun, 25 Jul 2021 10:36:00 -0700
Organization: A noiseless patient Spider
Lines: 45
Message-ID: <sdk7e0$mk2$1@dont-email.me>
References: <sd9a9h$ro6$1@dont-email.me>
<memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com>
<a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me>
<7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com>
<fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<713a35af-9cce-4954-b968-1b4b754e7b1en@googlegroups.com>
<sdjghd$1i4a$1@gioia.aioe.org> <sdk2kd$lu1$1@dont-email.me>
<476a9f6b-5fa0-4606-aed6-cf31089b8c5bn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 25 Jul 2021 17:36:01 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="4c76a7b01451991c24954e3271229dfa";
logging-data="23170"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+a2YHpSwWpp1ei08X0pVzL"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:4z8fdlw7wmB0BsBYNoRjy/Zpnx0=
In-Reply-To: <476a9f6b-5fa0-4606-aed6-cf31089b8c5bn@googlegroups.com>
Content-Language: en-US

by: Ivan Godard - Sun, 25 Jul 2021 17:36 UTC

On 7/25/2021 10:22 AM, MitchAlsup wrote:
> On Sunday, July 25, 2021 at 11:14:08 AM UTC-5, BGB wrote:
>> On 7/25/2021 6:05 AM, Terje Mathisen wrote:
>>> MitchAlsup wrote:
>>>> Having watched this from inside:
>>>> a) HW designers know a lot more about this today than in 1980
>>>> b) even systems that started out as IEEE-format gradually went
>>>> closer and closer to full IEEE-compliant (GPUs) until there is no
>>>> useful difference in the quality of the arithmetic.
>>>> c) once 754-2009 came out the overhead to do denorms went to
>>>> zero, and there is no reason to avoid full speed denorms in practice.
>>>> (BGB's small FPGA prototyping environment aside.)
>>>
>>> I agree.
>>>
>>>> d) HW designers have learned how to perform all of the rounding
>>>> modes at no overhead compared to RNE.
>>>
>>> This is actually dead easy since all the other modes are easier than
>>> RNE: As soon as you have all four bits required for RNE (i.e.
>>> sign/ulp/guard/sticky) then the remaining rounding modes only need
>>> various subsets of these, so you use the rounding mode to route one of 5
>>> or 6 possible 16-entry one-bit lookup tables into the rounding circuit
>>> where it becomes the input to be added into the ulp position of the
>>> final packed (sign/exp/mantissa) fp result.
>>>
>> Oddly enough, the extra cost to rounding itself is not the main issue
>> with multiple rounding modes, but more the question of how the bits get
>> there (if one doesn't already have an FPU status register or similar).
>>
>> Granted, could in theory put these bits in SR or similar, but, yeah...
>>
>> It would be better IMO if it were part of the instruction, but there
>> isn't really any good / non-annoying way to encode this.
> <
> And this is why they are put in control/status registers.

However, putting them in status regs mucks up any code that actually does
care about mode; interval arithmetic, for example. Especially because
changing the mode commonly costs a pipe flush (yes, you can put the
status in the decoder and decorate the op in the pipe with it, but that
adds five bits to the op state). And then there's save/restore of the
mode across calls.

Status reg and ignoring the software is a good hardware solution. :-(

Re: The value of floating-point exceptions?

<NvhLI.28465$0N5.11765@fx06.iad>


https://www.novabbs.com/devel/article-flat.php?id=19172&group=comp.arch#19172


Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed7.news.xs4all.nl!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx06.iad.POSTED!not-for-mail
Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: The value of floating-point exceptions?
References: <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<memo.20210723223630.14132H@jgd.cix.co.uk>
<2021Jul24.112832@mips.complang.tuwien.ac.at>
<TZRKI.24285$Nq7.6581@fx33.iad>
<2021Jul24.184642@mips.complang.tuwien.ac.at>
<Ws_KI.75293$VU3.57635@fx46.iad> <Ej1LI.12196$gE.4282@fx21.iad>
<0p9LI.21872$tL2.6289@fx43.iad>
<2021Jul25.172251@mips.complang.tuwien.ac.at>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 45
Message-ID: <NvhLI.28465$0N5.11765@fx06.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Sun, 25 Jul 2021 17:50:37 UTC
Organization: usenet-news.net
Date: Sun, 25 Jul 2021 17:50:37 GMT
X-Received-Bytes: 2236

by: Branimir Maksimovic - Sun, 25 Jul 2021 17:50 UTC

On 2021-07-25, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> Branimir Maksimovic <branimir.maksimovic@gmail.com> writes:
>>Hm, seems that on ARMv8 simd can't be switched off, compiler produces identical code
>>for both cases.. -mfpu option is also not present on aarch64...
>
> It will be once you can choose between Neon and SVE (and maybe
> Helium).
>
> Even if the code for axpy contains a vectorized variant for stride=8,
> this variant will not run, because stride=10 (because this test has
> originally been written to test the speed difference between 80-bit
> 387 FP and 64-bit 387 FP). I very much doubt that they perform
> autovectorization for stride=10.
>
> Testing this with gcc-10.2 and clang-11.0 on AMD64 with -O3 -mavx2,
> gcc produces a simple scalar loop, while clang produces an unrolled
> scalar loop (no vectorized variants to be seen).
>
> - anton
You are right, this is what gcc-11 produces:
bmaxa@Branimirs-Air axpy % cat axpysimd.s
.arch armv8.4-a+crc
.text
.align 2
.globl _axpy
_axpy:
LFB0:
cbz x3, L1
mov x4, 0
L3:
ldr d2, [x0, x4]
fmul d2, d0, d2
ldr d1, [x1, x4]
fadd d1, d1, d2
str d1, [x1, x4]
add x4, x4, x2
subs x3, x3, #1
bne L3
L1:
ret
pure scalar code...

--
bmaxa now listens Sick Muse by Metric from Fantasies
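The loop above advances x and y by `stride` bytes, and the benchmark uses stride = 10, so the 8-byte doubles are not 8-byte aligned. That works because AArch64 and x86-64 tolerate unaligned FP loads, but in ISO C dereferencing a misaligned Float * is undefined behaviour. A portable variant of the same byte-strided loop can route each access through memcpy. This is a sketch under that assumption: `axpy_bytes` is a hypothetical name, and the expectation that gcc and clang lower each fixed-size 8-byte memcpy to a single load or store reflects their usual behaviour at -O1 and above, not a guarantee.

```c
#include <assert.h>
#include <string.h>

#ifndef Float
#define Float double   /* the thread builds with -D"Float=double" */
#endif

/* Byte-strided y += ra * x over ucount elements. Each element is read and
   written through memcpy, so odd strides such as 10 need no aligned
   Float pointer. */
void axpy_bytes(Float ra, const char *x, char *y, long stride,
                unsigned long ucount)
{
    assert(ucount == 0 || (x != NULL && y != NULL));
    for (; ucount > 0; ucount--) {
        Float xv, yv;
        memcpy(&xv, x, sizeof xv);   /* unaligned load of x[i] */
        memcpy(&yv, y, sizeof yv);   /* unaligned load of y[i] */
        yv += ra * xv;
        memcpy(y, &yv, sizeof yv);   /* unaligned store of y[i] */
        x += stride;
        y += stride;
    }
}
```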

Re: The value of floating-point exceptions?

<UBhLI.23229$uj5.19735@fx03.iad>

https://www.novabbs.com/devel/article-flat.php?id=19173&group=comp.arch#19173

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!feeder1.feed.usenet.farm!feed.usenet.farm!peer01.ams4!peer.am4.highwinds-media.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx03.iad.POSTED!not-for-mail
Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: The value of floating-point exceptions?
References: <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<memo.20210723223630.14132H@jgd.cix.co.uk>
<2021Jul24.112832@mips.complang.tuwien.ac.at>
<TZRKI.24285$Nq7.6581@fx33.iad>
<2021Jul24.184642@mips.complang.tuwien.ac.at>
<Ws_KI.75293$VU3.57635@fx46.iad> <Ej1LI.12196$gE.4282@fx21.iad>
<0p9LI.21872$tL2.6289@fx43.iad>
<174a169c-1005-4fc2-8a20-0b61c1e8dd8en@googlegroups.com>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 141
Message-ID: <UBhLI.23229$uj5.19735@fx03.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Sun, 25 Jul 2021 17:57:08 UTC
Organization: usenet-news.net
Date: Sun, 25 Jul 2021 17:57:08 GMT
X-Received-Bytes: 5516

by: Branimir Maksimovic - Sun, 25 Jul 2021 17:57 UTC

On 2021-07-25, Michael S <already5chosen@yahoo.com> wrote:
> On Sunday, July 25, 2021 at 11:37:21 AM UTC+3, Branimir Maksimovic wrote:
>> On 2021-07-24, Branimir Maksimovic <branimir....@gmail.com> wrote:
>> > On 2021-07-24, Branimir Maksimovic <branimir....@gmail.com> wrote:
>> >> On 2021-07-24, Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> >>> Branimir Maksimovic <branimir....@gmail.com> writes:
>> >>>>> Both loops execute at 2 cycles/iteration (IPC=4) on a Skylake. I
>> >>> ...
>> >>>>Could you provide test program?
>> >>>
>> >>> http://www.complang.tuwien.ac.at/anton/tmp/axpy.zip
>> >>>
>> >>> Today I measure 1.8 cycles per iteration (IPC=4.44) from this program
>> >>> (both compiled for 387 and SSE2) on a Skylake. Strange.
>> >>>
>> >>> - anton
>> >> Heh, I just need timing code now:
>> >> in gas, ARMv8 equivalent of rdtsc:
>> >> init_time:
>> >> mrs x0,CNTPCT_EL0 ; counter
>> >> adrp x8,elapsed@PAGE
>> >> str x0, [x8,elapsed@PAGEOFF]
>> >> ret
>> >> time_me:
>> >> mrs x8,cntfrq_el0 ; clock
>> >> ucvtf d1,x8
>> >> mrs x8,CNTPCT_EL0 ; counter
>> >> adrp x9,elapsed@PAGE
>> >> ldr x9,[x9,elapsed@PAGEOFF]
>> >> sub x8,x8,x9
>> >> ucvtf d0,x8
>> >> fdiv d0,d0,d1
>> >> str d0,[sp]
>> >> b _printf
>> >> Just I dunno how to measure ticks in C :P
>> >> modified also flags for gcc:
>> >> bmaxa@Branimirs-Air axpy % cat Makefile
>> >> all: axpy-sse axpy-387
>> >>
>> >> axpy-sse: axpy-sse.o axpy-main-sse.o
>> >> gcc axpy-sse.o axpy-main-sse.o -o $@
>> >>
>> >> axpy-387: axpy-387.o axpy-main-387.o
>> >> gcc axpy-387.o axpy-main-387.o -o $@
>> >>
>> >>
>> >> axpy-sse.o: axpy.c
>> >> gcc-11 -c -O -march=armv8.4-a+simd -D"Float=double" axpy.c -o $@
>> >>
>> >> axpy-387.o: axpy.c
>> >> gcc-11 -c -O -march=armv8.4-a -D"Float=double" axpy.c -o $@
>> >>
>> >> axpy-main-sse.o: axpy-main.c
>> >> gcc-11 -c -O -march=armv8.4-a+simd -D"Float=double" axpy-main.c -o $@
>> >>
>> >> axpy-main-387.o: axpy-main.c
>> >> gcc-11 -c -O -march=armv8.4-a -D"Float=double" axpy-main.c -o $@
>> >>
>> >>
>> > Here it is on M1:
>> > bmaxa@Branimirs-Air axpy % make
>> > gcc-11 -c -O -march=armv8.4-a+simd -D"Float=double" axpy.c -o axpy-sse.o
>> > gcc-11 -c -O -march=armv8.4-a+simd -D"Float=double" axpy-main.c -o axpy-main-sse.o
>> > as timing.gas -o timing.o
>> > gcc axpy-sse.o axpy-main-sse.o timing.o -o axpy-sse
>> > gcc-11 -c -O -march=armv8.4-a -D"Float=double" axpy.c -o axpy-387.o
>> > gcc-11 -c -O -march=armv8.4-a -D"Float=double" axpy-main.c -o axpy-main-387.o
>> > gcc axpy-387.o axpy-main-387.o timing.o -o axpy-387
>> > bmaxa@Branimirs-Air axpy % ./axpy-387
>> > stride: 0.000004 secs
>> > axpy: 0.356832 secs
>> > bmaxa@Branimirs-Air axpy % ./axpy-sse
>> > stride: 0.000004 secs
>> > axpy: 0.357641 secs
>> > bmaxa@Branimirs-Air axpy % cat axpy-main.c
>> > #include <stdlib.h>
>> > void axpy(Float ra, Float *f_x, Float *f_y, long stride, unsigned long ucount);
>> > extern void init_time(void);
>> > extern void time_me(const char* format);
>> > int main()
>> > {
>> > long stride=10;
>> > long i;
>> > char *x=malloc(16000);
>> > char *y=malloc(16000);
>> > char *px=x, *py=y;
>> > init_time();
>> > for (i=0; i<1000; i++) {
>> > *(Float *)px=1.0;
>> > *(Float *)py=0.0;
>> > px+=stride;
>> > py+=stride;
>> > }
>> > time_me("stride: %f secs\n");
>> > init_time();
>> > for (i=0; i<1000000; i++)
>> > axpy(1.000001, (Float *)x, (Float *)y, stride, 1000);
>> > time_me("axpy: %f secs\n");
>> > }
>> >
>> > bmaxa@Branimirs-Air axpy % cat timing.gas
>> > .text
>> > .globl _init_time
>> > .globl _time_me
>> > .align 4
>> > _init_time:
>> > mrs x0,CNTPCT_EL0 ; counter
>> > adrp x8,elapsed@PAGE
>> > str x0, [x8,elapsed@PAGEOFF]
>> > ret
>> > _time_me:
>> > mrs x8,cntfrq_el0 ; clock
>> > ucvtf d1,x8
>> > mrs x8,CNTPCT_EL0 ; counter
>> > adrp x9,elapsed@PAGE
>> > ldr x9,[x9,elapsed@PAGEOFF]
>> > sub x8,x8,x9
>> > ucvtf d0,x8
>> > fdiv d0,d0,d1
>> > str d0,[sp]
>> > b _printf
>> > .data
>> > .bss
>> > .align 8
>> > elapsed: .space 8
>> >
>> >
>> Hm, seems that on ARMv8 simd can't be switched off, compiler produces identical code
>> for both cases.. -mfpu option is also not present on aarch64...
>>
>
> IIRC, clang supports -fno-vectorize
>

thanks!
>> --
>> bmaxa now listens Volim Te by Lollobrigida from Lollobrigida Inc.

--
bmaxa now listens Sick Muse by Metric from Fantasies
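On the question of measuring time from C rather than assembly, one portable option on POSIX systems (Linux, the BSDs and recent macOS all provide it) is clock_gettime with CLOCK_MONOTONIC. It reports elapsed time as seconds plus nanoseconds rather than raw CNTPCT_EL0 ticks, which matches what time_me prints anyway. Below is a minimal drop-in sketch for timing.gas that keeps the init_time/time_me signatures axpy-main.c declares; `elapsed_seconds` is a hypothetical helper added so the result can be checked.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Start time; plays the role of `elapsed` in timing.gas. */
static struct timespec start_ts;

void init_time(void)
{
    int rc = clock_gettime(CLOCK_MONOTONIC, &start_ts);
    assert(rc == 0);
    (void)rc;
}

/* Seconds elapsed since the last init_time() call. */
double elapsed_seconds(void)
{
    struct timespec now;
    int rc = clock_gettime(CLOCK_MONOTONIC, &now);
    assert(rc == 0);
    (void)rc;
    return (double)(now.tv_sec - start_ts.tv_sec)
         + (double)(now.tv_nsec - start_ts.tv_nsec) * 1e-9;
}

/* Same interface as the assembly version: `format` consumes one double. */
void time_me(const char *format)
{
    printf(format, elapsed_seconds());
}
```

Compiling this file to timing.o in place of assembling timing.gas should leave axpy-main.c unchanged.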

Re: The value of floating-point exceptions?

<sdk95b$mjj$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=19174&group=comp.arch#19174

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2a0a-a540-a40-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Sun, 25 Jul 2021 18:05:31 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sdk95b$mjj$1@newsreader4.netcologne.de>
References: <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<memo.20210723223630.14132H@jgd.cix.co.uk>
<2021Jul24.112832@mips.complang.tuwien.ac.at>
<TZRKI.24285$Nq7.6581@fx33.iad>
<2021Jul24.184642@mips.complang.tuwien.ac.at>
<Ws_KI.75293$VU3.57635@fx46.iad> <Ej1LI.12196$gE.4282@fx21.iad>
<0p9LI.21872$tL2.6289@fx43.iad>
<2021Jul25.172251@mips.complang.tuwien.ac.at>
<NvhLI.28465$0N5.11765@fx06.iad>
Injection-Date: Sun, 25 Jul 2021 18:05:31 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2a0a-a540-a40-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2a0a:a540:a40:0:7285:c2ff:fe6c:992d";
logging-data="23155"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Sun, 25 Jul 2021 18:05 UTC

Branimir Maksimovic <branimir.maksimovic@gmail.com> schrieb:

> You are right this is what gcc-11 produces @Branimirs-Air axpy % cat axpysimd.s

I don't find the source code, so...

> pure scalar code...

Did you use restrict on the pointers? If not, the compiler
has to assume all sorts of aliasing issues, which usually
preclude vectorization.

Re: The value of floating-point exceptions?

<p4iLI.42308$r21.7912@fx38.iad>

https://www.novabbs.com/devel/article-flat.php?id=19176&group=comp.arch#19176

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed7.news.xs4all.nl!feeder1.feed.usenet.farm!feed.usenet.farm!peer02.ams4!peer.am4.highwinds-media.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx38.iad.POSTED!not-for-mail
Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: The value of floating-point exceptions?
References: <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<memo.20210723223630.14132H@jgd.cix.co.uk>
<2021Jul24.112832@mips.complang.tuwien.ac.at>
<TZRKI.24285$Nq7.6581@fx33.iad>
<2021Jul24.184642@mips.complang.tuwien.ac.at>
<Ws_KI.75293$VU3.57635@fx46.iad> <Ej1LI.12196$gE.4282@fx21.iad>
<0p9LI.21872$tL2.6289@fx43.iad>
<2021Jul25.172251@mips.complang.tuwien.ac.at>
<NvhLI.28465$0N5.11765@fx06.iad> <sdk95b$mjj$1@newsreader4.netcologne.de>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 31
Message-ID: <p4iLI.42308$r21.7912@fx38.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Sun, 25 Jul 2021 18:29:41 UTC
Organization: usenet-news.net
Date: Sun, 25 Jul 2021 18:29:41 GMT
X-Received-Bytes: 1937

by: Branimir Maksimovic - Sun, 25 Jul 2021 18:29 UTC

On 2021-07-25, Thomas Koenig <tkoenig@netcologne.de> wrote:
> Branimir Maksimovic <branimir.maksimovic@gmail.com> schrieb:
>
>> You are right this is what gcc-11 produces @Branimirs-Air axpy % cat axpysimd.s
>
> I don't find the source code, so...
>
>> pure scalar code...
>
> Did you use restrict on the pointers? If not, the compiler
> has to assume all sorts of aliasing issues, which usually
> preclude vectorization.
bmaxa@Branimirs-Air axpy % cat axpy.c
void axpy(Float ra, Float *f_x, Float *f_y, long stride, unsigned long ucount)
{
for (; ucount>0; ucount--) {
*f_y += ra * *f_x;
f_x = (Float *)(((char *)f_x)+stride);
f_y = (Float *)(((char *)f_y)+stride);
}
}
bmaxa@Branimirs-Air axpy % gcc-11 -c -O -march=armv8.4-a+simd -D"Float=double" -S axpy.c -o axpysimd.s
bmaxa@Branimirs-Air axpy %


--
bmaxa now listens Knights of Cydonia by Muse from Black Holes and Revelations

Re: The value of floating-point exceptions?

<sdkeqm$qnu$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=19178&group=comp.arch#19178

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2a0a-a540-aea-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Sun, 25 Jul 2021 19:42:14 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sdkeqm$qnu$1@newsreader4.netcologne.de>
References: <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<memo.20210723223630.14132H@jgd.cix.co.uk>
<2021Jul24.112832@mips.complang.tuwien.ac.at>
<TZRKI.24285$Nq7.6581@fx33.iad>
<2021Jul24.184642@mips.complang.tuwien.ac.at>
<Ws_KI.75293$VU3.57635@fx46.iad> <Ej1LI.12196$gE.4282@fx21.iad>
<0p9LI.21872$tL2.6289@fx43.iad>
<2021Jul25.172251@mips.complang.tuwien.ac.at>
<NvhLI.28465$0N5.11765@fx06.iad> <sdk95b$mjj$1@newsreader4.netcologne.de>
<p4iLI.42308$r21.7912@fx38.iad>
Injection-Date: Sun, 25 Jul 2021 19:42:14 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2a0a-a540-aea-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2a0a:a540:aea:0:7285:c2ff:fe6c:992d";
logging-data="27390"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Sun, 25 Jul 2021 19:42 UTC

Branimir Maksimovic <branimir.maksimovic@gmail.com> schrieb:
> On 2021-07-25, Thomas Koenig <tkoenig@netcologne.de> wrote:
>> Branimir Maksimovic <branimir.maksimovic@gmail.com> schrieb:
>>
>>> You are right this is what gcc-11 produces @Branimirs-Air axpy % cat axpysimd.s
>>
>> I don't find the source code, so...
>>
>>> pure scalar code...
>>
>> Did you use restrict on the pointers? If not, the compiler
>> has to assume all sorts of aliasing issues, which usually
>> preclude vectorization.
> bmaxa@Branimirs-Air axpy % cat axpy.c
> void axpy(Float ra, Float *f_x, Float *f_y, long stride, unsigned long ucount)
> {
> for (; ucount>0; ucount--) {
> *f_y += ra * *f_x;
> f_x = (Float *)(((char *)f_x)+stride);
> f_y = (Float *)(((char *)f_y)+stride);
> }
> }

Try

$ cat a.c
void axpy(Float ra, Float const * restrict f_x,
Float * restrict f_y,
long stride, unsigned long ucount)
{
for (; ucount>0; ucount--) {
*f_y += ra * *f_x;
f_x = (Float *)(((char *)f_x)+stride);
f_y = (Float *)(((char *)f_y)+stride);
}
}
$ gcc -DFloat=double -march=native -O3 -S a.c

and on my home system you get something like

.L3:
vmovsd (%rdi), %xmm1
addq %rdx, %rdi
vfmadd213sd (%rsi), %xmm0, %xmm1
vmovsd %xmm1, (%rsi)
addq %rdx, %rsi
decq %rcx
jne .L3

so it's at least a bit better.

Vectorization with strides which are unknown at compile time is
difficult, which is why you need either LTO or VVM :-)
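One manual workaround for a stride known only at run time is loop versioning: test the stride once, dispatch to a contiguous, restrict-qualified loop that the compiler can vectorize when the stride equals sizeof(Float), and fall back to the strided scalar loop otherwise. The sketch below is illustrative and not from the thread. The function names are hypothetical, and whether the unit-stride loop actually vectorizes depends on the compiler and flags. Note that the benchmark's stride of 10 bytes would still take the scalar path.

```c
#include <assert.h>
#include <string.h>

#ifndef Float
#define Float double
#endif

/* Contiguous case: with restrict and unit stride, gcc/clang at -O3 can
   normally vectorize this loop. */
static void axpy_unit(Float ra, const Float *restrict x, Float *restrict y,
                      unsigned long n)
{
    for (unsigned long i = 0; i < n; i++)
        y[i] += ra * x[i];
}

/* General byte-strided case, as in the thread's axpy(); memcpy keeps odd
   strides such as 10 free of misaligned-pointer undefined behaviour. */
static void axpy_strided(Float ra, const char *restrict x, char *restrict y,
                         long stride, unsigned long n)
{
    for (; n > 0; n--) {
        Float xv, yv;
        memcpy(&xv, x, sizeof xv);
        memcpy(&yv, y, sizeof yv);
        yv += ra * xv;
        memcpy(y, &yv, sizeof yv);
        x += stride;
        y += stride;
    }
}

/* Hand-written loop versioning: check the run-time stride once and pick
   the kernel. */
void axpy_versioned(Float ra, const Float *restrict x, Float *restrict y,
                    long stride, unsigned long n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    if (stride == (long)sizeof(Float))
        axpy_unit(ra, x, y, n);
    else
        axpy_strided(ra, (const char *)x, (char *)y, stride, n);
}
```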

Re: The value of floating-point exceptions?

<HqjLI.1611$rl3.1306@fx26.iad>

https://www.novabbs.com/devel/article-flat.php?id=19180&group=comp.arch#19180

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx26.iad.POSTED!not-for-mail
Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: The value of floating-point exceptions?
References: <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<memo.20210723223630.14132H@jgd.cix.co.uk>
<2021Jul24.112832@mips.complang.tuwien.ac.at>
<TZRKI.24285$Nq7.6581@fx33.iad>
<2021Jul24.184642@mips.complang.tuwien.ac.at>
<Ws_KI.75293$VU3.57635@fx46.iad> <Ej1LI.12196$gE.4282@fx21.iad>
<0p9LI.21872$tL2.6289@fx43.iad>
<2021Jul25.172251@mips.complang.tuwien.ac.at>
<NvhLI.28465$0N5.11765@fx06.iad> <sdk95b$mjj$1@newsreader4.netcologne.de>
<p4iLI.42308$r21.7912@fx38.iad> <sdkeqm$qnu$1@newsreader4.netcologne.de>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 74
Message-ID: <HqjLI.1611$rl3.1306@fx26.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Sun, 25 Jul 2021 20:01:43 UTC
Organization: usenet-news.net
Date: Sun, 25 Jul 2021 20:01:43 GMT
X-Received-Bytes: 3080

by: Branimir Maksimovic - Sun, 25 Jul 2021 20:01 UTC

On 2021-07-25, Thomas Koenig <tkoenig@netcologne.de> wrote:
> Branimir Maksimovic <branimir.maksimovic@gmail.com> schrieb:
>> On 2021-07-25, Thomas Koenig <tkoenig@netcologne.de> wrote:
>>> Branimir Maksimovic <branimir.maksimovic@gmail.com> schrieb:
>>>
>>>> You are right this is what gcc-11 produces @Branimirs-Air axpy % cat axpysimd.s
>>>
>>> I don't find the source code, so...
>>>
>>>> pure scalar code...
>>>
>>> Did you use restrict on the pointers? If not, the compiler
>>> has to assume all sorts of aliasing issues, which usually
>>> preclude vectorization.
>> bmaxa@Branimirs-Air axpy % cat axpy.c
>> void axpy(Float ra, Float *f_x, Float *f_y, long stride, unsigned long ucount)
>> {
>> for (; ucount>0; ucount--) {
>> *f_y += ra * *f_x;
>> f_x = (Float *)(((char *)f_x)+stride);
>> f_y = (Float *)(((char *)f_y)+stride);
>> }
>> }
>
> Try
>
> $ cat a.c
> void axpy(Float ra, Float const * restrict f_x,
> Float * restrict f_y,
> long stride, unsigned long ucount)
> {
> for (; ucount>0; ucount--) {
> *f_y += ra * *f_x;
> f_x = (Float *)(((char *)f_x)+stride);
> f_y = (Float *)(((char *)f_y)+stride);
> }
> }
> $ gcc -DFloat=double -march=native -O3 -S a.c
>
> and on my home system you get something like
>
> .L3:
> vmovsd (%rdi), %xmm1
> addq %rdx, %rdi
> vfmadd213sd (%rsi), %xmm0, %xmm1
> vmovsd %xmm1, (%rsi)
> addq %rdx, %rsi
> decq %rcx
> jne .L3
>
> so it's at least a bit better.
>
same on arm:
_axpy:
LFB0:
mov x4, 0
cbz x3, L1
.p2align 3,,7
L3:
ldr d2, [x0, x4]
subs x3, x3, #1
ldr d1, [x1, x4]
fmadd d1, d2, d0, d1
str d1, [x1, x4]
add x4, x4, x2
bne L3
> Vectorization with strides which are unknown at compile time is
> difficult, which is why you need either LTO or VVM :-)

oh lto :P

--
bmaxa now listens The Skank Heads by Skunk Anansie from Post Orgasmic Chill

Re: The value of floating-point exceptions?

<9a426609-3695-46b7-bc19-5130b2f068a8n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19185&group=comp.arch#19185

X-Received: by 2002:aed:2149:: with SMTP id 67mr12525605qtc.60.1627249156417;
Sun, 25 Jul 2021 14:39:16 -0700 (PDT)
X-Received: by 2002:a4a:2242:: with SMTP id z2mr8597051ooe.90.1627249156208;
Sun, 25 Jul 2021 14:39:16 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!usenet.pasdenom.info!usenet-fr.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 25 Jul 2021 14:39:16 -0700 (PDT)
In-Reply-To: <476a9f6b-5fa0-4606-aed6-cf31089b8c5bn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:f39a:d100:a4fa:c486:95ac:988e;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:f39a:d100:a4fa:c486:95ac:988e
References: <sd9a9h$ro6$1@dont-email.me> <memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com> <a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me> <7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com> <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<713a35af-9cce-4954-b968-1b4b754e7b1en@googlegroups.com> <sdjghd$1i4a$1@gioia.aioe.org>
<sdk2kd$lu1$1@dont-email.me> <476a9f6b-5fa0-4606-aed6-cf31089b8c5bn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <9a426609-3695-46b7-bc19-5130b2f068a8n@googlegroups.com>
Subject: Re: The value of floating-point exceptions?
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Sun, 25 Jul 2021 21:39:16 +0000
Content-Type: text/plain; charset="UTF-8"

by: Quadibloc - Sun, 25 Jul 2021 21:39 UTC

On Sunday, July 25, 2021 at 11:23:00 AM UTC-6, MitchAlsup wrote:

> Koogie-Stone adders !

Kogge-Stone adders, please!

John Savard
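For reference, a Kogge-Stone adder is a parallel-prefix carry-lookahead adder. It computes every bit's carry in log2(n) combining levels, and each level merges a bit's (generate, propagate) pair with the pair 2^k positions below it. The sketch below is a bit-parallel C model of a 64-bit Kogge-Stone carry network, not a hardware description; `kogge_stone_add` is an illustrative name, and its result should equal ordinary modulo-2^64 addition.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a 64-bit Kogge-Stone adder.
   Level with distance d (1, 2, 4, ..., 32) computes, for every bit i:
       G_i = G_i OR (P_i AND G_{i-d}),   P_i = P_i AND P_{i-d}
   After 6 levels, G_i is the carry out of bits i..0 (carry-in 0). */
uint64_t kogge_stone_add(uint64_t a, uint64_t b)
{
    uint64_t p = a ^ b;   /* per-bit propagate, also the carry-less sum */
    uint64_t g = a & b;   /* group generate, refined each level */
    uint64_t gp = p;      /* group propagate, refined each level */

    for (unsigned d = 1; d < 64; d <<= 1) {
        g |= gp & (g << d);   /* uses the previous level's gp */
        gp &= gp << d;
    }
    /* Carry into bit i is the carry out of bit i-1. */
    return p ^ (g << 1);
}
```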

Re: The value of floating-point exceptions?

<sdkp9i$cvq$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19188&group=comp.arch#19188

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Sun, 25 Jul 2021 17:40:47 -0500
Organization: A noiseless patient Spider
Lines: 130
Message-ID: <sdkp9i$cvq$1@dont-email.me>
References: <sd9a9h$ro6$1@dont-email.me>
<memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com>
<a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me>
<7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com>
<fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<713a35af-9cce-4954-b968-1b4b754e7b1en@googlegroups.com>
<sdjghd$1i4a$1@gioia.aioe.org> <sdk2kd$lu1$1@dont-email.me>
<476a9f6b-5fa0-4606-aed6-cf31089b8c5bn@googlegroups.com>
<sdk7e0$mk2$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 25 Jul 2021 22:40:50 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="21a115adb001a4814e15cab9b01f96eb";
logging-data="13306"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18bbrucUb6MdlqDShE4vgHA"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
Cancel-Lock: sha1:3OdbCuczjSpdArCNofMs+TgwX9Q=
In-Reply-To: <sdk7e0$mk2$1@dont-email.me>
Content-Language: en-US

by: BGB - Sun, 25 Jul 2021 22:40 UTC

On 7/25/2021 12:36 PM, Ivan Godard wrote:
> On 7/25/2021 10:22 AM, MitchAlsup wrote:
>> On Sunday, July 25, 2021 at 11:14:08 AM UTC-5, BGB wrote:
>>> On 7/25/2021 6:05 AM, Terje Mathisen wrote:
>>>> MitchAlsup wrote:
>>>>> Having watched this from inside:
>>>>> a) HW designers know a lot more about this today than in 1980
>>>>> b) even systems that started out as IEEE-format gradually went
>>>>> closer and closer to full IEEE-compliant (GPUs) until there is no
>>>>> useful difference in the quality of the arithmetic.
>>>>> c) once 754-2009 came out the overhead to do denorms went to
>>>>> zero, and there is no reason to avoid full speed denorms in practice.
>>>>> (BGB's small FPGA prototyping environment aside.)
>>>>
>>>> I agree.
>>>>
>>>>> d) HW designers have learned how to perform all of the rounding
>>>>> modes at no overhead compared to RNE.
>>>>
>>>> This is actually dead easy since all the other modes are easier than
>>>> RNE: As soon as you have all four bits required for RNE (i.e.
>>>> sign/ulp/guard/sticky) then the remaining rounding modes only need
>>>> various subsets of these, so you use the rounding mode to route one
>>>> of 5
>>>> or 6 possible 16-entry one-bit lookup tables into the rounding circuit
>>>> where it becomes the input to be added into the ulp position of the
>>>> final packed (sign/exp/mantissa) fp result.
>>>>
>>> Oddly enough, the extra cost to rounding itself is not the main issue
>>> with multiple rounding modes, but more the question of how the bits get
>>> there (if one doesn't already have an FPU status register or similar).
>>>
>>> Granted, could in theory put these bits in SR or similar, but, yeah...
>>>
>>> It would be better IMO if it were part of the instruction, but there
>>> isn't really any good / non-annoying way to encode this.
>> <
>> And this is why they are put in control/status registers.
>
> However putting them in status regs mucks up any code that actually does
> care about mode; interval arithmetic for example. Especially because
> changing the mode commonly costs a pipe flush (yes, you can put the
> status in the decoder and decorate the op in the pipe with it, but that
> adds five bits to the op state). And then there's save/restore of the
> mode across calls.
>

Yeah, to be useful, it kinda needs to be per-instruction.
This means putting it in the encoding, as a register-based mode is a bit
too coarse-grained to be particularly useful.

> Status reg and ignoring the software is a good hardware solution. :-(

Unless one adds logic to save/restore the FPU control state as part of
the ABI, it effectively becomes global state.

Unless the register is regularly reloaded for some other reason, one
has the issue that if some random piece of code in some library
somewhere changes the FPU rounding mode, everything else in the program
may quietly start producing subtly different results, which I personally
feel is a *worse* scenario than not having an option to change the
rounding mode in the first place.

Having it either fixed in the implementation, or encoded as part of the
instruction, avoids this scenario in that the same instruction sequence
with the same inputs will always produce the same results.

I also don't want to add another register just to mandate that it be
saved/restored via the C ABI, as this would add a lot of extra cost for
a fairly obscure use case.

In this case, it would almost make more sense to add the bits into
GBR(63:48), along with an intrinsic to modify them, and the compiler
would force the function to do a GBR save/restore if this intrinsic is used:
GBR(47: 0): Global Base Register (used to access .data/.bss);
GBR(63:48): Repurposed as FPSCR State.

It would behave as dynamic state within a given program or DLL, but revert
to defaults across DLL boundaries. Likewise, returning from the function
which updated the rounding mode would automatically revert it to
whatever value it held previously.

Say, FPSCR:
(3:0): Rounding Mode
0=Nearest
1=Truncate
2=+Inf
3=-Inf
4=(?) Nearest via Frac(2), Frac(1:0)=Status
...
(7:4): Sticky Bits (Inexact, Underflow, Overflow, Inv-Op)

This would be kind of an ugly hack though...
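The GBR(63:48) layout sketched above amounts to a few shifts and masks. The following C sketch is purely illustrative: the macro, enum and function names are hypothetical, and since the post lists the four sticky bits without giving their order, the bit assignment inside FPSCR(7:4) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout following the post:
     GBR(47:0)   global base address (.data/.bss)
     GBR(63:48)  FPSCR: bits (3:0) rounding mode, bits (7:4) sticky flags
   The order of the sticky bits below is an assumption. */
#define GBR_ADDR_MASK   ((UINT64_C(1) << 48) - 1)
#define GBR_FPSCR_SHIFT 48

enum { RM_NEAREST = 0, RM_TRUNCATE = 1, RM_POS_INF = 2, RM_NEG_INF = 3,
       RM_MODE4 = 4 };
enum { ST_INEXACT = 1, ST_UNDERFLOW = 2, ST_OVERFLOW = 4, ST_INVALID = 8 };

static inline uint64_t gbr_base(uint64_t gbr)     { return gbr & GBR_ADDR_MASK; }
static inline unsigned gbr_fpscr(uint64_t gbr)    { return (unsigned)(gbr >> GBR_FPSCR_SHIFT); }
static inline unsigned gbr_rounding(uint64_t gbr) { return gbr_fpscr(gbr) & 0xFu; }
static inline unsigned gbr_sticky(uint64_t gbr)   { return (gbr_fpscr(gbr) >> 4) & 0xFu; }

/* Replace FPSCR(3:0); address bits and sticky flags are left untouched. */
static inline uint64_t gbr_set_rounding(uint64_t gbr, unsigned rm)
{
    assert(rm <= 0xFu);
    const uint64_t field = UINT64_C(0xF) << GBR_FPSCR_SHIFT;
    return (gbr & ~field) | ((uint64_t)rm << GBR_FPSCR_SHIFT);
}

/* OR newly raised exception flags into the sticky field FPSCR(7:4). */
static inline uint64_t gbr_raise(uint64_t gbr, unsigned flags)
{
    return gbr | ((uint64_t)(flags & 0xFu) << (GBR_FPSCR_SHIFT + 4));
}
```

Under the scheme described, a function using the rounding intrinsic would save GBR on entry and restore it on return, so the caller's rounding mode comes back with it.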

Mode 4 would allow using Binary64 to hold a 50-bit integer exactly; the
low 2 bits could serve as a result status (00=Exact,
01=Inexact/Underflow, 10/11=Reserved).

It also effectively moves the ULP over by 2 bits, rounding the number in
a way which is more appropriate for flonum operations. The inexact
status would be sticky, such that inexact inputs cannot yield an exact
output. This distinction would be N/A for flonums.
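To make mode 4 more concrete, here is a hypothetical software model for adding finite binary64 values. Everything in it is an assumption layered on the description above: the names are invented, rounding at bit 2 is ties-to-even, underflow is not modelled, and because it post-processes a result the host FPU has already rounded, the value is double-rounded where real hardware would round once. Knuth's TwoSum recovers the host's rounding error so the inexact status is still reported correctly; that requires the host to be in round-to-nearest and the sum not to overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fraction bits (1:0) hold the status: 00 = exact, 01 = inexact (sticky). */
enum { M4_EXACT = 0, M4_INEXACT = 1 };

static uint64_t to_bits(double d)     { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double   from_bits(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

unsigned m4_status(double v) { return (unsigned)(to_bits(v) & 3u); }
double   m4_value(double v)  { return from_bits(to_bits(v) & ~UINT64_C(3)); }

/* Round an ordinary double at fraction bit 2 (ties to even) and attach
   a status. Inf and NaN pass through unchanged. */
static double m4_round(double r, int inexact)
{
    uint64_t b = to_bits(r);
    if (((b >> 52) & 0x7FF) == 0x7FF)
        return r;
    uint64_t low = b & 3u;             /* bits below the new ulp */
    b &= ~UINT64_C(3);
    if (low > 2 || (low == 2 && (b & 4u)))
        b += 4;                        /* a carry may ripple into the exponent */
    if (((b >> 52) & 0x7FF) == 0x7FF)
        return from_bits(b);           /* rounded up to infinity */
    if (low != 0 || inexact)
        b |= M4_INEXACT;
    return from_bits(b);
}

/* Mode-4 addition: an inexact input makes the output inexact, and TwoSum
   detects whether the host add itself lost bits. */
double m4_add(double a, double b)
{
    int inexact = m4_status(a) != M4_EXACT || m4_status(b) != M4_EXACT;
    double x = m4_value(a), y = m4_value(b);
    double s = x + y;
    double yy = s - x;
    double err = (x - (s - yy)) + (y - yy);
    return m4_round(s, inexact || err != 0.0);
}
```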

Note that the high 16 bits of LR are already used for saving/restoring
some SR state bits (WEX mode and predicate flags and similar).

Though it could be better to just add rounding modes via an Op64
encoding and skip the use of any register bits.

Personally I feel any sticky bits are also borderline useless unless one
can tell which value they apply to.

OTOH:
Overflow -> Inf
Invalid Operation -> NaN
Underflow -> Inexact + Zero (Mode 4)
Inexact -> Inexact (Mode 4)

Modes 0..3 would lose both Inexact and Underflow status, but these are
likely to be niche cases anyways.

....

But Captain -- the engines can't take this much longer!

Subject	Author
The value of floating-point exceptions?	Marcus
Re: The value of floating-point exceptions?	Marcus
Re: The value of floating-point exceptions?	Stephen Fuld
Re: The value of floating-point exceptions?	Marcus
Re: The value of floating-point exceptions?	luke.l...@gmail.com
Re: The value of floating-point exceptions?	BGB
Re: The value of floating-point exceptions?	John Dallman
Re: The value of floating-point exceptions?	Marcus
Re: The value of floating-point exceptions?	John Dallman
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	Quadibloc
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	Marcus
Re: The value of floating-point exceptions?	Ivan Godard
Re: The value of floating-point exceptions?	Quadibloc
Re: The value of floating-point exceptions?	Ivan Godard
Re: The value of floating-point exceptions?	Anton Ertl
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	Quadibloc
Re: The value of floating-point exceptions?	Terje Mathisen
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	Quadibloc
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	BGB
Re: The value of floating-point exceptions?	Terje Mathisen
Re: The value of floating-point exceptions?	BGB
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	Ivan Godard
Re: The value of floating-point exceptions?	BGB
Re: The value of floating-point exceptions?	Terje Mathisen
Re: The value of floating-point exceptions?	BGB
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	Marcus
Re: The value of floating-point exceptions?	BGB
Re: The value of floating-point exceptions?	EricP
Re: The value of floating-point exceptions?	Ivan Godard
Re: The value of floating-point exceptions?	MitchAlsup
Re: Configurable rounding modes (was The value of floating-point	Marcus
Re: Configurable rounding modes (was The value of floating-point	Terje Mathisen
Re: Configurable rounding modes (was The value of floating-point exceptions?)	MitchAlsup
Re: Configurable rounding modes (was The value of floating-point	Stephen Fuld
Re: Configurable rounding modes (was The value of floating-point	Marcus
Re: The value of floating-point exceptions?	EricP
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	Quadibloc
Re: The value of floating-point exceptions?	Quadibloc
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	Quadibloc
Re: The value of floating-point exceptions?	Marcus
Re: The value of floating-point exceptions?	Thomas Koenig
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	Thomas Koenig
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	Ivan Godard
Re: The value of floating-point exceptions?	BGB
Re: The value of floating-point exceptions?	Quadibloc
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	Ivan Godard
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	Ivan Godard
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	Ivan Godard
Re: The value of floating-point exceptions?	Marcus
Re: The value of floating-point exceptions?	Quadibloc
Re: The value of floating-point exceptions?	Anton Ertl
Re: The value of floating-point exceptions?	Michael S
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	Terje Mathisen
Re: The value of floating-point exceptions?	Quadibloc
Re: The value of floating-point exceptions?	Quadibloc
Re: The value of floating-point exceptions?	Marcus
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	Marcus
Re: The value of floating-point exceptions?	MitchAlsup
Re: Configurable rounding modes (was The value of floating-point	Marcus
Re: Configurable rounding modes (was The value of floating-point	Marcus
Re: Configurable rounding modes (was The value of floating-point exceptions?)	MitchAlsup
Re: Configurable rounding modes (was The value of floating-point	Ivan Godard
Re: Configurable rounding modes (was The value of floating-point	BGB
Re: Configurable rounding modes (was The value of floating-point	Marcus
Re: Configurable rounding modes (was The value of floating-point	BGB
Re: Configurable rounding modes (was The value of floating-point exceptions?)	Quadibloc
Re: Configurable rounding modes (was The value of floating-point exceptions?)	Quadibloc
Re: Configurable rounding modes (was The value of floating-point	Ivan Godard
Re: Configurable rounding modes (was The value of floating-point exceptions?)	Quadibloc
Re: Configurable rounding modes (was The value of floating-point exceptions?)	MitchAlsup
Re: Configurable rounding modes (was The value of floating-point	Ivan Godard
Re: Configurable rounding modes (was The value of floating-point exceptions?)	Quadibloc
Re: Configurable rounding modes (was The value of floating-point exceptions?)	Quadibloc
Re: Configurable rounding modes (was The value of floating-point exceptions?)	Quadibloc
Re: Configurable rounding modes (was The value of floating-point exceptions?)	Quadibloc
Re: Configurable rounding modes (was The value of floating-point	BGB
Re: Configurable rounding modes (was The value of floating-point exceptions?)	MitchAlsup
Re: Configurable rounding modes (was The value of floating-point	BGB
Re: The value of floating-point exceptions?	antispam
Re: The value of floating-point exceptions?	BGB
Re: The value of floating-point exceptions?	Terje Mathisen
Re: The value of floating-point exceptions?	BGB
Re: The value of floating-point exceptions?	antispam
Re: The value of floating-point exceptions?	Quadibloc
Re: The value of floating-point exceptions?	antispam
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	John Dallman
Re: The value of floating-point exceptions?	John Dallman
Re: The value of floating-point exceptions?	Quadibloc
Re: The value of floating-point exceptions?	Thomas Koenig
Re: The value of floating-point exceptions?	Quadibloc
Re: The value of floating-point exceptions?	Quadibloc
Re: The value of floating-point exceptions?	Marcus
Re: The value of floating-point exceptions?	Terje Mathisen
Re: The value of floating-point exceptions?	Ivan Godard
Re: The value of floating-point exceptions?	BGB
Re: The value of floating-point exceptions?	EricP
Re: The value of floating-point exceptions?	Anton Ertl
Re: The value of floating-point exceptions?	MitchAlsup
Re: The value of floating-point exceptions?	antispam