devel / comp.arch / Re: The value of floating-point exceptions?

Subject / Author
* The value of floating-point exceptions?Marcus
+* Re: The value of floating-point exceptions?Marcus
|`* Re: The value of floating-point exceptions?Stephen Fuld
| +- Re: The value of floating-point exceptions?Marcus
| `* Re: The value of floating-point exceptions?luke.l...@gmail.com
|  `- Re: The value of floating-point exceptions?BGB
+* Re: The value of floating-point exceptions?John Dallman
|+* Re: The value of floating-point exceptions?Marcus
||`* Re: The value of floating-point exceptions?John Dallman
|| +- Re: The value of floating-point exceptions?MitchAlsup
|| `* Re: The value of floating-point exceptions?Quadibloc
||  `* Re: The value of floating-point exceptions?MitchAlsup
||   +* Re: The value of floating-point exceptions?Marcus
||   |+* Re: The value of floating-point exceptions?Ivan Godard
||   ||`* Re: The value of floating-point exceptions?Quadibloc
||   || +* Re: The value of floating-point exceptions?Ivan Godard
||   || |+* Re: The value of floating-point exceptions?Anton Ertl
||   || ||`* Re: The value of floating-point exceptions?MitchAlsup
||   || || `- Re: The value of floating-point exceptions?Quadibloc
||   || |`- Re: The value of floating-point exceptions?Terje Mathisen
||   || `* Re: The value of floating-point exceptions?MitchAlsup
||   ||  +* Re: The value of floating-point exceptions?Quadibloc
||   ||  |+* Re: The value of floating-point exceptions?MitchAlsup
||   ||  ||+- Re: The value of floating-point exceptions?BGB
||   ||  ||`* Re: The value of floating-point exceptions?Terje Mathisen
||   ||  || `* Re: The value of floating-point exceptions?BGB
||   ||  ||  `* Re: The value of floating-point exceptions?MitchAlsup
||   ||  ||   +* Re: The value of floating-point exceptions?Ivan Godard
||   ||  ||   |+* Re: The value of floating-point exceptions?BGB
||   ||  ||   ||`* Re: The value of floating-point exceptions?Terje Mathisen
||   ||  ||   || `- Re: The value of floating-point exceptions?BGB
||   ||  ||   |+* Re: The value of floating-point exceptions?MitchAlsup
||   ||  ||   ||`* Re: The value of floating-point exceptions?Marcus
||   ||  ||   || `- Re: The value of floating-point exceptions?BGB
||   ||  ||   |`* Re: The value of floating-point exceptions?EricP
||   ||  ||   | `* Re: The value of floating-point exceptions?Ivan Godard
||   ||  ||   |  `* Re: The value of floating-point exceptions?MitchAlsup
||   ||  ||   |   +* Re: Configurable rounding modes (was The value of floating-pointMarcus
||   ||  ||   |   |+- Re: Configurable rounding modes (was The value of floating-pointTerje Mathisen
||   ||  ||   |   |`* Re: Configurable rounding modes (was The value of floating-point exceptions?)MitchAlsup
||   ||  ||   |   | +- Re: Configurable rounding modes (was The value of floating-pointStephen Fuld
||   ||  ||   |   | `- Re: Configurable rounding modes (was The value of floating-pointMarcus
||   ||  ||   |   `* Re: The value of floating-point exceptions?EricP
||   ||  ||   |    `* Re: The value of floating-point exceptions?MitchAlsup
||   ||  ||   |     `- Re: The value of floating-point exceptions?Quadibloc
||   ||  ||   +* Re: The value of floating-point exceptions?Quadibloc
||   ||  ||   |`* Re: The value of floating-point exceptions?MitchAlsup
||   ||  ||   | `- Re: The value of floating-point exceptions?Quadibloc
||   ||  ||   `* Re: The value of floating-point exceptions?Marcus
||   ||  ||    +* Re: The value of floating-point exceptions?Thomas Koenig
||   ||  ||    |`* Re: The value of floating-point exceptions?MitchAlsup
||   ||  ||    | `- Re: The value of floating-point exceptions?Thomas Koenig
||   ||  ||    +* Re: The value of floating-point exceptions?MitchAlsup
||   ||  ||    |+* Re: The value of floating-point exceptions?Ivan Godard
||   ||  ||    ||`- Re: The value of floating-point exceptions?BGB
||   ||  ||    |+* Re: The value of floating-point exceptions?Quadibloc
||   ||  ||    ||`* Re: The value of floating-point exceptions?MitchAlsup
||   ||  ||    || +* Re: The value of floating-point exceptions?Ivan Godard
||   ||  ||    || |`* Re: The value of floating-point exceptions?MitchAlsup
||   ||  ||    || | `* Re: The value of floating-point exceptions?Ivan Godard
||   ||  ||    || |  +* Re: The value of floating-point exceptions?MitchAlsup
||   ||  ||    || |  |`- Re: The value of floating-point exceptions?Ivan Godard
||   ||  ||    || |  `* Re: The value of floating-point exceptions?Marcus
||   ||  ||    || |   +- Re: The value of floating-point exceptions?Quadibloc
||   ||  ||    || |   +* Re: The value of floating-point exceptions?Anton Ertl
||   ||  ||    || |   |`* Re: The value of floating-point exceptions?Michael S
||   ||  ||    || |   | `* Re: The value of floating-point exceptions?MitchAlsup
||   ||  ||    || |   |  `- Re: The value of floating-point exceptions?Terje Mathisen
||   ||  ||    || |   `* Re: The value of floating-point exceptions?Quadibloc
||   ||  ||    || |    +- Re: The value of floating-point exceptions?Quadibloc
||   ||  ||    || |    `* Re: The value of floating-point exceptions?Marcus
||   ||  ||    || |     `* Re: The value of floating-point exceptions?MitchAlsup
||   ||  ||    || |      `* Re: The value of floating-point exceptions?Marcus
||   ||  ||    || |       `- Re: The value of floating-point exceptions?MitchAlsup
||   ||  ||    || `- Re: Configurable rounding modes (was The value of floating-pointMarcus
||   ||  ||    |`* Re: Configurable rounding modes (was The value of floating-pointMarcus
||   ||  ||    | +* Re: Configurable rounding modes (was The value of floating-point exceptions?)MitchAlsup
||   ||  ||    | |`* Re: Configurable rounding modes (was The value of floating-pointIvan Godard
||   ||  ||    | | +* Re: Configurable rounding modes (was The value of floating-pointBGB
||   ||  ||    | | |`* Re: Configurable rounding modes (was The value of floating-pointMarcus
||   ||  ||    | | | `- Re: Configurable rounding modes (was The value of floating-pointBGB
||   ||  ||    | | +* Re: Configurable rounding modes (was The value of floating-point exceptions?)Quadibloc
||   ||  ||    | | |+- Re: Configurable rounding modes (was The value of floating-point exceptions?)Quadibloc
||   ||  ||    | | |`* Re: Configurable rounding modes (was The value of floating-pointIvan Godard
||   ||  ||    | | | `- Re: Configurable rounding modes (was The value of floating-point exceptions?)Quadibloc
||   ||  ||    | | `* Re: Configurable rounding modes (was The value of floating-point exceptions?)MitchAlsup
||   ||  ||    | |  `* Re: Configurable rounding modes (was The value of floating-pointIvan Godard
||   ||  ||    | |   `* Re: Configurable rounding modes (was The value of floating-point exceptions?)Quadibloc
||   ||  ||    | |    `* Re: Configurable rounding modes (was The value of floating-point exceptions?)Quadibloc
||   ||  ||    | |     +- Re: Configurable rounding modes (was The value of floating-point exceptions?)Quadibloc
||   ||  ||    | |     `- Re: Configurable rounding modes (was The value of floating-point exceptions?)Quadibloc
||   ||  ||    | `* Re: Configurable rounding modes (was The value of floating-pointBGB
||   ||  ||    |  `* Re: Configurable rounding modes (was The value of floating-point exceptions?)MitchAlsup
||   ||  ||    |   `- Re: Configurable rounding modes (was The value of floating-pointBGB
||   ||  ||    `* Re: The value of floating-point exceptions?antispam
||   ||  ||     +- Re: The value of floating-point exceptions?BGB
||   ||  ||     +* Re: The value of floating-point exceptions?Terje Mathisen
||   ||  ||     |+- Re: The value of floating-point exceptions?BGB
||   ||  ||     |`* Re: The value of floating-point exceptions?antispam
||   ||  ||     | `* Re: The value of floating-point exceptions?Quadibloc
||   ||  ||     |  +* Re: The value of floating-point exceptions?antispam
||   ||  ||     |  `* Re: The value of floating-point exceptions?MitchAlsup
||   ||  ||     `- Re: The value of floating-point exceptions?John Dallman
||   ||  |`* Re: The value of floating-point exceptions?John Dallman
||   ||  +* Re: The value of floating-point exceptions?Quadibloc
||   ||  `* Re: The value of floating-point exceptions?Thomas Koenig
||   |`* Re: The value of floating-point exceptions?Quadibloc
||   `- Re: The value of floating-point exceptions?Quadibloc
|`* Re: The value of floating-point exceptions?Marcus
+- Re: The value of floating-point exceptions?Terje Mathisen
+* Re: The value of floating-point exceptions?Ivan Godard
+- Re: The value of floating-point exceptions?BGB
+* Re: The value of floating-point exceptions?EricP
+* Re: The value of floating-point exceptions?Anton Ertl
+* Re: The value of floating-point exceptions?MitchAlsup
`- Re: The value of floating-point exceptions?antispam

Re: The value of floating-point exceptions?

<0b1c1274-8e64-4971-b7f0-3d3d89e2ac8dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19190&group=comp.arch#19190
 by: MitchAlsup - Sun, 25 Jul 2021 23:06 UTC

On Sunday, July 25, 2021 at 12:36:03 PM UTC-5, Ivan Godard wrote:
> On 7/25/2021 10:22 AM, MitchAlsup wrote:
> > On Sunday, July 25, 2021 at 11:14:08 AM UTC-5, BGB wrote:
> >> On 7/25/2021 6:05 AM, Terje Mathisen wrote:
> >>> MitchAlsup wrote:
> >>>> Having watched this from inside:
> >>>> a) HW designers know a lot more about this today than in 1980
> >>>> b) even systems that started out as IEEE-format gradually went
> >>>> closer and closer to full IEEE-compliant (GPUs) until there is no
> >>>> useful difference in the quality of the arithmetic.
> >>>> c) once 754-2009 came out the overhead to do denorms went to
> >>>> zero, and there is no reason to avoid full speed denorms in practice.
> >>>> (BGB's small FPGA prototyping environment aside.)
> >>>
> >>> I agree.
> >>>
> >>>> d) HW designers have learned how to perform all of the rounding
> >>>> modes at no overhead compared to RNE.
> >>>
> >>> This is actually dead easy since all the other modes are easier than
> >>> RNE: As soon as you have all four bits required for RNE (i.e.
> >>> sign/ulp/guard/sticky) then the remaining rounding modes only need
> >>> various subsets of these, so you use the rounding mode to route one of 5
> >>> or 6 possible 16-entry one-bit lookup tables into the rounding circuit
> >>> where it becomes the input to be added into the ulp position of the
> >>> final packed (sign/exp/mantissa) fp result.
> >>>
> >> Oddly enough, the extra cost to rounding itself is not the main issue
> >> with multiple rounding modes, but more the question of how the bits get
> >> there (if one doesn't already have an FPU status register or similar).
> >>
> >> Granted, could in theory put these bits in SR or similar, but, yeah...
> >>
> >> It would be better IMO if it were part of the instruction, but there
> >> isn't really any good / non-annoying way to encode this.
> > <
> > And this is why they are put in control/status registers.
<
> However putting them in status regs mucks up any code that actually does
> care about mode; interval arithmetic for example. Especially because
> changing the mode commonly costs a pipe flush (yes, you can put the
> status in the decoder and decorate the op in the pipe with it, but that
> adds five bits to the op state). And then there's save/restore of the
> mode across calls.
<
Yes, some old machines did require pipeline flushes to read status or write control.
<
However, once SMT came along they (T H E Y) should have quit building
data paths that require pipeline flushes. Whether they (T H E Y) did or not
I really don't know, I just know it is not necessary.
<
Besides, nobody actually uses interval arithmetic .........
<
What software environments require preserving modes across subroutine calls?
That is, so few FP subroutines modify rounding modes that saving and restoring
these seems pointless in the large, but maybe a subroutine here or there needs to
do this.
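
For what it's worth, the usual discipline in C is that the rare routine which
does change the mode saves and restores it around its own code, roughly like
this (a minimal sketch using the standard <fenv.h> calls; the summation
routine itself is made up):

#include <fenv.h>
#include <stdio.h>

/* Made-up routine that wants a specific rounding mode for its own work.
   (Strictly, #pragma STDC FENV_ACCESS ON should be in effect here.) */
static double sum_toward_zero(const double *x, int n)
{
    int old = fegetround();        /* save the caller's rounding mode */
    fesetround(FE_TOWARDZERO);
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    fesetround(old);               /* restore it before returning */
    return s;
}

int main(void)
{
    double v[3] = { 1.0, 1e-17, -1e-17 };
    printf("%g\n", sum_toward_zero(v, 3));
    return 0;
}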
>
> Status reg and ignoring the software is a good hardware solution. :-(

Re: The value of floating-point exceptions?

<2e8fba39-7ded-4876-8409-46461415ac3bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19192&group=comp.arch#19192
 by: MitchAlsup - Sun, 25 Jul 2021 23:07 UTC

On Sunday, July 25, 2021 at 4:39:17 PM UTC-5, Quadibloc wrote:
> On Sunday, July 25, 2021 at 11:23:00 AM UTC-6, MitchAlsup wrote:
>
> > Koogie-Stone adders !
>
> Kogge-Stone adders, please!
>
Getting picky in your old age ??
<
> John Savard

Re: The value of floating-point exceptions?

<sdl6rr$prr$1@z-news.wcss.wroc.pl>

https://www.novabbs.com/devel/article-flat.php?id=19193&group=comp.arch#19193
 by: antis...@math.uni.wroc.pl - Mon, 26 Jul 2021 02:32 UTC

Marcus <m.delete@this.bitsnbites.eu> wrote:
> Hi!
>
> I would like to ask a couple of simple questions in this group, related
> to the IEEE 754 standard and floating-point exceptions.
>
> (Some of you may know that I've been pondering this subject before in
> other forums).
>
> As we all know, support for floating-point exceptions comes with a cost
> (sometimes significant, since it dictates many aspects of a HW
> implementation for instance). If we imagine an alternative universe in
> which the dominant floating-point standard did not require exceptions,
> I suspect that most hardware (and language) implementations would most
> likely be simpler, more energy efficient and more suitable for parallel
> and pipelined floating-point operations.
>
> My questions are:
>
> 1) What (in your opinion) are the benefits of floating-point exceptions?

No need for explicit error checks => fewer instructions, so simpler
and faster code.
> 2) In what situations have you had use for floating-point exceptions?

My code contains no explicit error checks, so I depend on exceptions
to detect errors. Note that while I want to abort the FP computation,
in many cases I do not want to abort the whole program. In particular,
an outer routine may restart the computation using a more expensive
method which hopefully will work without error.

Extra explanation: IEEE says that instructions should generate
NaNs in certain situations; however, NaNs have problems:
- overflow generates Infinity, so only exceptions will reliably
catch it
- sometimes there is a need to catch underflow (loss of precision);
flush-to-zero or denormals tend to mask such a problem (denormals
make detection harder), while an exception on denormals catches it
- NaNs produce strange results during comparisons, so a wild
NaN may easily divert program flow to the wrong place, leading
to a completely bogus result (see the sketch below).
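
A tiny C illustration of that last point (the limit check and values are
made up):

#include <stdio.h>

int main(void)
{
    volatile double zero = 0.0;
    double limit = 100.0;
    double x = zero / zero;      /* NaN from an invalid operation */

    /* Every ordered comparison involving NaN is false, so the check
       below quietly takes the "looks fine" branch with garbage data. */
    if (x > limit)
        printf("over limit, rejecting\n");
    else
        printf("within limit, using %f\n", x);   /* prints nan */
    return 0;
}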

I would prefer to have only a single 0, and that any invalid
operation should produce a NaN or an exception. I find the current
rules for comparison of NaNs to be insane: they break
basic math identities which a decent optimizer should use.
For high-level languages it means, in practice, that the
language must make comparisons with NaNs
undefined (since the optimizer will give different results
than the hardware).

For a higher-level language that wants to offer "safe" semantics
there is a need to detect errors. Exceptions do this. The alternative
is automatic testing for NaNs, Infinities, etc. AFAICS,
with current IEEE and its Infinities, automatic checks would
be quite expensive (I do not see any way to avoid checking the
results of almost all operations), probably slowing down
programs several times. With "NaN on any invalid op" one
could delay checks a bit, but this would require testing
for NaNs before any comparisons, so still a significant
slowdown for programs performing a lot of FP comparisons.

I could probably live with rules saying that one gets
NaNs on any invalid op and that use of NaNs in comparisons
causes an exception.

Still, exceptions are much better for debugging than alternatives.
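
In C, the middle ground between trapping and per-operation checks is the
sticky-flag interface in <fenv.h>: clear the flags, run the whole
computation, test once at the end, and only then decide to retry with the
more careful method. A minimal sketch (values chosen just to raise the
flags):

#include <fenv.h>
#include <stdio.h>

int main(void)
{
    volatile double big = 1e308, zero = 0.0;  /* volatile keeps the ops at run time */

    feclearexcept(FE_ALL_EXCEPT);
    double y = big * 10.0;      /* overflows to +Inf, sets FE_OVERFLOW */
    double z = zero / zero;     /* invalid, produces NaN, sets FE_INVALID */

    if (fetestexcept(FE_INVALID | FE_OVERFLOW | FE_DIVBYZERO))
        printf("computation failed, retry with the expensive method\n");
    else
        printf("results: %g %g\n", y, z);
    return 0;
}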

--
Waldek Hebisch

Re: The value of floating-point exceptions?

<10a5151f-d8b9-4ff3-a22c-85b0b92fb5f9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19195&group=comp.arch#19195
 by: Quadibloc - Mon, 26 Jul 2021 05:25 UTC

On Saturday, July 24, 2021 at 9:40:14 PM UTC-6, Quadibloc wrote:
> On Saturday, July 24, 2021 at 3:24:21 PM UTC-6, Quadibloc wrote:

> And the SS format instructions would be a mess:
>
> opcode: 8 bits
> source base register: 1 bit
> length: 8 bits
> destination base register: 4 bits
> destination address: 15 bits
> source base register (continued): 3 bits
> source address: 15 bits

Silly me. Just move the destination address constant to an *odd*
byte position, and everything is fine.

opcode: 8 bits
destination base register: 4 bits
destination address: 15 bits
length: 8 bits
source base register: 4 bits
source address: 15 bits

So all I have to do is accept that the order of fields will be
different from that in the System/360, and I get a reasonable
36-bit version of the System/360.

John Savard

Re: The value of floating-point exceptions?

<b9e4d9ae-2fca-49f9-a60d-3631a77cd6d7n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19196&group=comp.arch#19196
 by: Quadibloc - Mon, 26 Jul 2021 05:26 UTC

On Sunday, July 25, 2021 at 5:07:51 PM UTC-6, MitchAlsup wrote:
> On Sunday, July 25, 2021 at 4:39:17 PM UTC-5, Quadibloc wrote:
> > On Sunday, July 25, 2021 at 11:23:00 AM UTC-6, MitchAlsup wrote:

> > > Koogie-Stone adders !

> > Kogge-Stone adders, please!

> Getting picky in your old age ??

Well, what if somebody wanted to Google them?

John Savard

Re: The value of floating-point exceptions?

<sdlo5n$da3$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19197&group=comp.arch#19197
 by: Marcus - Mon, 26 Jul 2021 07:27 UTC

On 2021-07-25 19:22, MitchAlsup wrote:
> On Sunday, July 25, 2021 at 11:14:08 AM UTC-5, BGB wrote:
>> On 7/25/2021 6:05 AM, Terje Mathisen wrote:
>>> MitchAlsup wrote:
>>>> Having watched this from inside:
>>>> a) HW designers know a lot more about this today than in 1980
>>>> b) even systems that started out as IEEE-format gradually went
>>>> closer and closer to full IEEE-compliant (GPUs) until there is no
>>>> useful difference in the quality of the arithmetic.
>>>> c) once 754-2009 came out the overhead to do denorms went to
>>>> zero, and there is no reason to avoid full speed denorms in practice.
>>>> (BGB's small FPGA prototyping environment aside.)
>>>
>>> I agree.
>>>
>>>> d) HW designers have learned how to perform all of the rounding
>>>> modes at no overhead compared to RNE.
>>>
>>> This is actually dead easy since all the other modes are easier than
>>> RNE: As soon as you have all four bits required for RNE (i.e.
>>> sign/ulp/guard/sticky) then the remaining rounding modes only need
>>> various subsets of these, so you use the rounding mode to route one of 5
>>> or 6 possible 16-entry one-bit lookup tables into the rounding circuit
>>> where it becomes the input to be added into the ulp position of the
>>> final packed (sign/exp/mantissa) fp result.
>>>
>> Oddly enough, the extra cost to rounding itself is not the main issue
>> with multiple rounding modes, but more the question of how the bits get
>> there (if one doesn't already have an FPU status register or similar).
>>
>> Granted, could in theory put these bits in SR or similar, but, yeah...
>>
>> It would be better IMO if it were part of the instruction, but there
>> isn't really any good / non-annoying way to encode this.
> <
> And this is why they are put in control/status registers.
> <

There are several problems with this, but the *main* problem is that the
rounding mode setting becomes a super-global variable. If one subroutine
touches the register, it affects all code in the same thread. And it's
"super-global" because it crosses source code and language barriers
(e.g. consider a program written in Go that has a Python scripting back
end that calls out to a DLL that is written in C++ that changes the
floating-point rounding mode...).
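
A contrived C sketch of exactly that failure mode (the "third-party"
function is made up; in real life it hides behind a DLL or language
boundary):

#include <fenv.h>
#include <stdio.h>

/* Pretend this lives deep inside some library: it flips the rounding
   mode for its own purposes and never puts it back. */
static void third_party_init(void)
{
    fesetround(FE_DOWNWARD);
}

int main(void)
{
    volatile double x = 1.0, y = 3.0;

    printf("before: %.20f\n", x / y);   /* round-to-nearest result */
    third_party_init();                 /* silently flips the "super-global" */
    printf("after:  %.20f\n", x / y);   /* now rounded toward -infinity */
    return 0;
}

Whether the two printed values actually differ in the last digits depends on
the compiler honouring the mode change at all (e.g. gcc wants
-frounding-math), which is its own can of worms.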

As I have accounted for elsewhere, this is a real problem. As a SW
developer I therefore prefer to work with architectures that do not have
an FPU control register - /even/ if that means that I can only use RNE
(because let's face it - that's the only rounding mode I'm ever going
to use anyway).

Having the rounding modes as part of the instruction is *much* better
from a SW developer's perspective - but that blows up opcode space for
a feature that is never used.

Perhaps your prefix instruction paradigm could be used for this (e.g.
like "CARRY")?

> < Probably the
>> "least awful" would probably be to use an Op64 encoding, which then uses
>> some of the Immed extension bits to encode a rounding mode.
> <
> The argument against having them in instructions is that this prevents
> someone from running the code several times with different rounding
> modes set to detect any sensitivity to the actually chosen rounding mode.
> Kahan said he uses this a lot.

.......let me question the usefulness of that. If I were to do such
experiments I would just compile the algorithm with different rounding
modes set explicitly (e.g. as a C++ template argument or something).
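
Something like the following, i.e. the mode is an explicit argument of the
algorithm rather than ambient machine state (a C sketch; in C++ the rmode
parameter would be a template argument instead):

#include <fenv.h>
#include <stddef.h>
#include <stdio.h>

/* The algorithm is handed the rounding mode it should run under, so a
   sensitivity experiment is just two calls with different modes.
   (Build with something like -frounding-math so the compiler does not
   assume round-to-nearest throughout.) */
static double sum(const double *a, size_t n, int rmode)
{
    int old = fegetround();
    fesetround(rmode);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    fesetround(old);
    return s;
}

int main(void)
{
    double a[4] = { 1e16, 1.0, -1e16, 1.0 };
    printf("upward:   %g\n", sum(a, 4, FE_UPWARD));
    printf("downward: %g\n", sum(a, 4, FE_DOWNWARD));  /* differs => sensitive */
    return 0;
}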

>>
>>
>> * FFw0_00ii_F0nm_5eo8 FADD Rm, Ro, Rn, Imm8
>> * FFw0_00ii_F0nm_5eo9 FSUB Rm, Ro, Rn, Imm8
>> * FFw0_00ii_F0nm_5eoA FMUL Rm, Ro, Rn, Imm8
>>
>> Where the Imm8 field encodes the rounding mode, say:
>> 00 = Round to Nearest.
>> 01 = Truncate.
>>
>> Or could go the SR route, but I don't want FPU behavior to depend on SR.
> <
> When one has multi-threading and control/status register, one simply
> reads the RM field and delivers it to the FU as an operand. A couple
> of interlock checks means you don't really have to stall the pipeline
> because these modes don't change all that often.
> <

Again, that's the HW perspective. But from a SW perspective you *don't*
want RM to be part of a configuration register.

Global variables are bad. This was well established some 50 years ago
(e.g. Wulf, W.A., Shaw, M., "Global Variables Considered Harmful" from
1973). Global configuration registers even more so.

>>> Since the hidden bit is already hidden at this point, any rounding
>>> overflow of the mantissa from 0xfff.. to 0x000.. will cause the exponent
>>> term to be incremented, possibly all the way to Inf. In all cases, this
>>> is the exactly correct behaviour.
>>>
>> Yep.
>>
>> Main limiting factor though is that for bigger formats (Double or FP96),
>> propagating the carry that far can be an issue.
> <
> Koogie-Stone adders !
>>
>> In the vast majority of cases, the carry gets absorbed within the low 8
>> or 16 bits or so (or if it doesn't, leave these bits as-is).
>>
>> For narrowing conversions to Binary16 or Binary32, full width rounding
>> is both easier and more useful.
>>
>>
>>
>> For FADD/FSUB, the vast majority of cases where a very long stream of
>> 1's would have occurred can be avoided by doing the math internally in
>> twos complement form.
>>
>> Though, in this case, one can save a little cost by implementing the
>> "twos complement" as essentially ones' complement with a carry bit input
>> to the adder (one can't arrive at a case where both inputs are negative
>> with FADD).
> <
> This is a standard trick that everyone should know--I first saw it in the
> PDP-8 in the Complement and increment instruction--but it has come in
> handy several times and is the way operands are negated and complemented
> in My 66000. The operand is conditionally complemented with a carry in
> conditionally asserted. If the operand being processed is an integer, there
> is an adder that deals with the carry in. If the operand is logical, there is
> no adder and the carry in is ignored.
>>
>>
>> Cases can occur though where the result mantissa comes up negative
>> though, which can itself require a sign inversion. The only alternative
>> is to compare mantissa input values by value if the exponents are equal,
>> which is also fairly expensive.
>>
>> Though, potentially one could use the rounding step to "absorb" part of
>> the cost of the second sign inversion.
>>
>> Another possibility here could be to have an adder which produces two
>> outputs, namely both ((A+B)+Cin) and (~(A+B)+(!Cin)), and then using the
>> second output if the first came up negative.
>>
>> ...
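
The conditional complement-with-carry-in trick quoted above, written out in
C for illustration (a sketch of the idea, not anybody's actual datapath):

#include <stdint.h>
#include <stdio.h>

/* One adder does both a+b and a-b: conditionally one's-complement the
   second operand and assert the carry-in under the same condition, so
   the "+1" of two's complement rides in on the adder's carry input. */
static int64_t add_or_sub(int64_t a, int64_t b, int subtract)
{
    int64_t mask = -(int64_t)(subtract != 0);   /* 0 or all ones */
    return a + (b ^ mask) + (mask & 1);         /* complement + carry-in */
}

int main(void)
{
    printf("%lld %lld\n",
           (long long)add_or_sub(10, 3, 0),     /* 13 */
           (long long)add_or_sub(10, 3, 1));    /* 7  */
    return 0;
}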

Re: The value of floating-point exceptions?

<sdloj0$fvd$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19198&group=comp.arch#19198
 by: Marcus - Mon, 26 Jul 2021 07:34 UTC

On 2021-07-26 01:06, MitchAlsup wrote:
> On Sunday, July 25, 2021 at 12:36:03 PM UTC-5, Ivan Godard wrote:
>> On 7/25/2021 10:22 AM, MitchAlsup wrote:
>>> On Sunday, July 25, 2021 at 11:14:08 AM UTC-5, BGB wrote:
>>>> On 7/25/2021 6:05 AM, Terje Mathisen wrote:
>>>>> MitchAlsup wrote:
>>>>>> Having watched this from inside:
>>>>>> a) HW designers know a lot more about this today than in 1980
>>>>>> b) even systems that started out as IEEE-format gradually went
>>>>>> closer and closer to full IEEE-compliant (GPUs) until there is no
>>>>>> useful difference in the quality of the arithmetic.
>>>>>> c) once 754-2009 came out the overhead to do denorms went to
>>>>>> zero, and there is no reason to avoid full speed denorms in practice.
>>>>>> (BGB's small FPGA prototyping environment aside.)
>>>>>
>>>>> I agree.
>>>>>
>>>>>> d) HW designers have learned how to perform all of the rounding
>>>>>> modes at no overhead compared to RNE.
>>>>>
>>>>> This is actually dead easy since all the other modes are easier than
>>>>> RNE: As soon as you have all four bits required for RNE (i.e.
>>>>> sign/ulp/guard/sticky) then the remaining rounding modes only need
>>>>> various subsets of these, so you use the rounding mode to route one of 5
>>>>> or 6 possible 16-entry one-bit lookup tables into the rounding circuit
>>>>> where it becomes the input to be added into the ulp position of the
>>>>> final packed (sign/exp/mantissa) fp result.
>>>>>
>>>> Oddly enough, the extra cost to rounding itself is not the main issue
>>>> with multiple rounding modes, but more the question of how the bits get
>>>> there (if one doesn't already have an FPU status register or similar).
>>>>
>>>> Granted, could in theory put these bits in SR or similar, but, yeah...
>>>>
>>>> It would be better IMO if it were part of the instruction, but there
>>>> isn't really any good / non-annoying way to encode this.
>>> <
>>> And this is why they are put in control/status registers.
> <
>> However putting them in status regs mucks up any code that actually does
>> care about mode; interval arithmetic for example. Especially because
>> changing the mode commonly costs a pipe flush (yes, you can put the
>> status in the decoder and decorate the op in the pipe with it, but that
>> adds five bits to the op state). And then there's save/restore of the
>> mode across calls.
> <
> Yes, some old machines did require pipeline flushes to read status or write control.
> <
> However, once SMT came along they (T H E Y) should have quit building
> data paths that require pipeline flushes. Whether they (T H E Y) did or not
> I really don't know, I just know it is not necessary.
> <
> Besides, nobody actually uses interval arithmetic .........
> <
> What software environments require preserving modes across subroutine calls?
> That is, so few FP subroutines modify rounding modes that saving and restoring
> these seems pointless in the large, but maybe a subroutine here or there needs to
> do this.

A much more common scenario is that you change the floating-point
configuration (including rounding-mode) in your DllMain() function (i.e.
the function that is called when a DLL is loaded into your process), and
then you rely on that configuration to be in effect when DLL routines
are later called.

(!!!)
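
In other words, something like this (a Windows-flavoured sketch; DllMain and
DLL_PROCESS_ATTACH are the standard DLL entry-point convention, and the
specific mode chosen here is just an example):

#include <windows.h>
#include <fenv.h>

/* Runs when the DLL is loaded into a process.  Some libraries really do
   set global FP state here and then assume it stays that way for every
   later call into the DLL. */
BOOL WINAPI DllMain(HINSTANCE hinst, DWORD reason, LPVOID reserved)
{
    (void)hinst; (void)reserved;
    if (reason == DLL_PROCESS_ATTACH)
        fesetround(FE_TOWARDZERO);   /* silently affects the whole thread */
    return TRUE;
}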

>>
>> Status reg and ignoring the software is a good hardware solution. :-(

Re: The value of floating-point exceptions?

<2021Jul26.093451@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=19199&group=comp.arch#19199
 by: Anton Ertl - Mon, 26 Jul 2021 07:34 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
>Branimir Maksimovic <branimir.maksimovic@gmail.com> schrieb:
>
>> You are right this is what gcc-11 produces @Branimirs-Air axpy % cat axpysimd.s
>
>I don't find the source code, so...
>
>> pure scalar code...
>
>Did you use restrict on the pointers? If not, the compiler
>has to assume all sorts of aliasing issues, which usually
>preculde vectorization.

Source code is

http://www.complang.tuwien.ac.at/anton/tmp/axpy.zip
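
For readers who don't fetch the zip: a generic axpy kernel with
restrict-qualified pointers looks roughly like this (an illustrative sketch,
not necessarily the code behind that link):

#include <stddef.h>

/* y[i] += a * x[i].  The restrict qualifiers promise the compiler that
   x and y do not overlap, which is what allows it to vectorize. */
void axpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

Without the restrict qualifiers the compiler has to allow for y overlapping
x, which is one common reason for getting the pure scalar loop quoted
up-thread.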

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: The value of floating-point exceptions?

<sdlpam$kbf$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19200&group=comp.arch#19200
 by: Marcus - Mon, 26 Jul 2021 07:47 UTC

On 2021-07-23 14:15, Quadibloc wrote:
> On Friday, July 23, 2021 at 4:46:33 AM UTC-6, Marcus wrote:
>
>> I agree. I tried approaching the IEEE 754 working group [1] with a
>> suggestion to standardize a leaner subset of the current standard that
>> would better acknowledge the current reality (i.e. that many floating-
>> point implementations lack some of the mandatory IEEE 754 features),
>> and to analyze the effects of such implementations w.r.t. the numerical
>> and operational guarantees that the standard aims to provide. At least
>> to preempt an explosion of fragmented, non-conforming implementations.
>
>> To no avail...
>
> I can see why there would be issues.
>
> The characteristics of the "leaner subset" would depend *strongly* on
> the level of the implementation.
>

It is an issue, but I don't think that was the issue this time.

In contrast, I think that if there *were* a leaner subset, an
implementation would be far more likely to go with it rather than
roll its own non-standard FP solution, as is the case now, when
the only option for being fully compliant is simply impractical.

> Thus, for the kind of floating-point arithmetic unit I might prefer, the
> leaner subset would have these characteristics:
>
> - Denormals would work. What would not guarantee is that they
> would *fail* when they were supposed to.
>
> - Accurate rounding would be guaranteed for addition, subtraction,
> and multiplication, but *not* division, which would instead be accurate
> to something like 0.6 or 0.51 units in the last place.
>
> That's because I would want to implement division using one of the
> fast algorithms which require significant additional overhead to get
> accurate rounding...

Now I'm mostly guessing, but wouldn't it be better to have a fully
compliant divider - even if it's rather slow - and provide a fast
approximate reciprocal instead?

After all, divisions are rare enough that you can usually track them
down and change them to A * (1 / B) in places where you need the
performance and can live with the approximation.
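
I.e. the classic hoisting, sketched in C (names made up):

#include <stddef.h>

/* Divide a whole array by the same b: one real division up front, then
   multiplies.  Fine wherever the extra rounding of a*(1/b) is tolerable. */
void scale_down(double *a, size_t n, double b)
{
    double rb = 1.0 / b;        /* or a fast approximate-reciprocal op */
    for (size_t i = 0; i < n; i++)
        a[i] *= rb;
}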

>
> and I intend to support denormals with no loss in speed by converting
> floats to a format with no hidden bit inside registers, so it would be
> extra work to reproduce the exact numerical range and precision of
> the IEEE 754 floating-point format. Instead, the denormals - and some
> numeric range beyond them - would still have full precision, and only
> get rounded down when it came time to store values in memory.
>
> People implementing floating-point in a *different* manner would have
> entirely different choices for what parts of the IEEE 754 standard they
> would want omitted from a 'leaner subset'. An implementation with
> significantly less hardware would omit the entire denormal range, but
> support exact rounding for division, for example.
>
> So there's a 'lightweight' leaner subset, and there's a 'high-performance'
> leaner subset, at the very least. Which are disjoint rather than one being
> a subset of the other.
>
> John Savard
>

Re: The value of floating-point exceptions?

<2021Jul26.093622@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=19201&group=comp.arch#19201
 by: Anton Ertl - Mon, 26 Jul 2021 07:36 UTC

Quadibloc <jsavard@ecn.ab.ca> writes:
>However, IBM also built the AN/FSQ-31 and AN/FSQ-32. These were
>48-bit machines, and if the IBM 360 had looked something like them,
>it could well have been just as successful.

What makes you think so? I never heard of these systems before.

Reading up on the FSQ-32, it's a word-addressed machine with 18 bits
for the memory address. It would not have achieved IBM's goal for the
S/360 of unifying the scientific and commercial lines, which the S/360
achieved with byte addressing. Also, the 18-bit address limit would
have meant that it would have run out of addresses soon. Admittedly,
the PDP-6/DEC-10 line managed to live until the VAX, and the Burroughs
and Univac architectures survive, in a way, until today, but in any
case it would have hampered the architecture.

I am very happy that IBM did not give in to Amdahl. And when Amdahl
was the boss, he saw the wisdom of this decision and developed and
sold S/370-compatible machines.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: The value of floating-point exceptions?

<9f14668a-1878-4763-8c1f-fdc9ad2a9c9fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19203&group=comp.arch#19203
 by: Quadibloc - Mon, 26 Jul 2021 11:21 UTC

On Monday, July 26, 2021 at 2:04:26 AM UTC-6, Anton Ertl wrote:
> And when Amdahl
> was the boss, he saw the wisdom of this decision and developed and
> sold S/370-compatible machines.

I'm not sure the data justifies the conclusion.

That is, he might have still believed a 24-bit computer would have been
better, but by _that_ time, the IBM System/360 was well established as
a _de facto_ industry standard.

For all I know, Lisa Su might think that the 680x0 or the Power PC
architectures are superior to the Intel 80386 architecture... or ARM
or RISC-V even... but what she can sell is what can run Windows.

John Savard

Re: The value of floating-point exceptions?

<sdmc4e$2l9$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=19204&group=comp.arch#19204
 by: Thomas Koenig - Mon, 26 Jul 2021 13:08 UTC

Marcus <m.delete@this.bitsnbites.eu> schrieb:

> There are several problems with this, but the *main* problem is that the
> rounding mode setting becomes a super-global variable. If one subroutine
> touches the register, it affects all code in the same thread.

And obviously, that should not happen, so the rounding mode should
be reset on leaving the subroutine (or on leaving your library,
at the latest).

Fortran chose to reset the rounding mode on leaving the subroutine.

Re: The value of floating-point exceptions?

<sdml6u$s04$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19206&group=comp.arch#19206
 by: BGB - Mon, 26 Jul 2021 15:43 UTC

On 7/26/2021 2:34 AM, Marcus wrote:
> On 2021-07-26 01:06, MitchAlsup wrote:
>> On Sunday, July 25, 2021 at 12:36:03 PM UTC-5, Ivan Godard wrote:
>>> On 7/25/2021 10:22 AM, MitchAlsup wrote:
>>>> On Sunday, July 25, 2021 at 11:14:08 AM UTC-5, BGB wrote:
>>>>> On 7/25/2021 6:05 AM, Terje Mathisen wrote:
>>>>>> MitchAlsup wrote:
>>>>>>> Having watched this from inside:
>>>>>>> a) HW designers know a lot more about this today than in 1980
>>>>>>> b) even systems that started out as IEEE-format gradually went
>>>>>>> closer and closer to full IEEE-compliant (GPUs) until there is no
>>>>>>> useful difference in the quality of the arithmetic.
>>>>>>> c) once 754-2009 came out the overhead to do denorms went to
>>>>>>> zero, and there is no reason to avoid full speed denorms in
>>>>>>> practice.
>>>>>>> (BGB's small FPGA prototyping environment aside.)
>>>>>>
>>>>>> I agree.
>>>>>>
>>>>>>> d) HW designers have learned how to perform all of the rounding
>>>>>>> modes at no overhead compared to RNE.
>>>>>>
>>>>>> This is actually dead easy since all the other modes are easier than
>>>>>> RNE: As soon as you have all four bits required for RNE (i.e.
>>>>>> sign/ulp/guard/sticky) then the remaining rounding modes only need
>>>>>> various subsets of these, so you use the rounding mode to route
>>>>>> one of 5
>>>>>> or 6 possible 16-entry one-bit lookup tables into the rounding
>>>>>> circuit
>>>>>> where it becomes the input to be added into the ulp position of the
>>>>>> final packed (sign/exp/mantissa) fp result.
>>>>>>
>>>>> Oddly enough, the extra cost to rounding itself is not the main issue
>>>>> with multiple rounding modes, but more the question of how the bits
>>>>> get
>>>>> there (if one doesn't already have an FPU status register or similar).
>>>>>
>>>>> Granted, could in theory put these bits in SR or similar, but, yeah...
>>>>>
>>>>> It would be better IMO if it were part of the instruction, but there
>>>>> isn't really any good / non-annoying way to encode this.
>>>> <
>>>> And this is why they are put in control/status registers.
>> <
>>> However putting them in status regs mucks up any code that actually does
>>> care about mode; interval arithmetic for example. Especially because
>>> changing the mode commonly costs a pipe flush (yes, you can put the
>>> status in the decoder and decorate the op in the pipe with it, but that
>>> adds five bits to the op state). And then there's save/restore of the
>>> mode across calls.
>> <
>> Yes, some old machines did require pipeline flushes to read status or
>> write control.
>> <
>> However, once SMT came along they (T H E Y) should have quit building
>> data paths that require pipeline flushes. Whether they (T H E Y) did
>> or not
>> I really don't know, I just know it is not necessary.
>> <
>> Besides, nobody actually uses interval arithmetic .........
>> <
>> What software environments require preserving modes across subroutine
>> calls?
>> That is, so few FP subroutines modify rounding modes, that saving and
>> restoring
>> these seem pointless en-the-large, but maybe a subroutine here or
>> there needs to
>> do this.
>
> A much more common scenario is that you change the floating-point
> configuration (including rounding-mode) in your DllMain() function (i.e.
> the function that is called when a DLL is loaded into your process), and
> then you rely on that configuration to be in effect when DLL routines
> are later called.
>
> (!!!)
>

This sort of thing is why I figure putting it in a global register is
not an ideal solution.

Though, at least, if the register were preserved by the ABI:
Far-reaching "damage" to this state is avoided;
Any functions which need a particular mode would be under more pressure
to set it explicitly;
....

>>>
>>> Status reg and ignoring the software is a good hardware solution. :-(
>

Re: The value of floating-point exceptions?

<1f8827fe-a288-4f2e-8e4e-40cc343febbdn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19208&group=comp.arch#19208

Newsgroups: comp.arch
X-Received: by 2002:a0c:f6c6:: with SMTP id d6mr18486668qvo.30.1627316371104;
Mon, 26 Jul 2021 09:19:31 -0700 (PDT)
X-Received: by 2002:aca:c78d:: with SMTP id x135mr16864413oif.30.1627316370811;
Mon, 26 Jul 2021 09:19:30 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 26 Jul 2021 09:19:30 -0700 (PDT)
In-Reply-To: <sdlo5n$da3$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4cd:6eb0:1a8:164c;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4cd:6eb0:1a8:164c
References: <sd9a9h$ro6$1@dont-email.me> <memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com> <a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me> <7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com> <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<713a35af-9cce-4954-b968-1b4b754e7b1en@googlegroups.com> <sdjghd$1i4a$1@gioia.aioe.org>
<sdk2kd$lu1$1@dont-email.me> <476a9f6b-5fa0-4606-aed6-cf31089b8c5bn@googlegroups.com>
<sdlo5n$da3$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1f8827fe-a288-4f2e-8e4e-40cc343febbdn@googlegroups.com>
Subject: Re: The value of floating-point exceptions?
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 26 Jul 2021 16:19:31 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 215
 by: MitchAlsup - Mon, 26 Jul 2021 16:19 UTC

On Monday, July 26, 2021 at 2:27:54 AM UTC-5, Marcus wrote:
> On 2021-07-25 19:22, MitchAlsup wrote:
> > On Sunday, July 25, 2021 at 11:14:08 AM UTC-5, BGB wrote:
> >> On 7/25/2021 6:05 AM, Terje Mathisen wrote:
> >>> MitchAlsup wrote:
> >>>> Having watched this from inside:
> >>>> a) HW designers know a lot more about this today than in 1980
> >>>> b) even systems that started out as IEEE-format gradually went
> >>>> closer and closer to full IEEE-compliant (GPUs) until there is no
> >>>> useful difference in the quality of the arithmetic.
> >>>> c) once 754-2009 came out the overhead to do denorms went to
> >>>> zero, and there is no reason to avoid full speed denorms in practice..
> >>>> (BGB's small FPGA prototyping environment aside.)
> >>>
> >>> I agree.
> >>>
> >>>> d) HW designers have learned how to perform all of the rounding
> >>>> modes at no overhead compared to RNE.
> >>>
> >>> This is actually dead easy since all the other modes are easier than
> >>> RNE: As soon as you have all four bits required for RNE (i.e.
> >>> sign/ulp/guard/sticky) then the remaining rounding modes only need
> >>> various subsets of these, so you use the rounding mode to route one of 5
> >>> or 6 possible 16-entry one-bit lookup tables into the rounding circuit
> >>> where it becomes the input to be added into the ulp position of the
> >>> final packed (sign/exp/mantissa) fp result.
> >>>
> >> Oddly enough, the extra cost to rounding itself is not the main issue
> >> with multiple rounding modes, but more the question of how the bits get
> >> there (if one doesn't already have an FPU status register or similar).
> >>
> >> Granted, could in theory put these bits in SR or similar, but, yeah...
> >>
> >> It would be better IMO if it were part of the instruction, but there
> >> isn't really any good / non-annoying way to encode this.
> > <
> > And this is why they are put in control/status registers.
> > <
> There are several problems with this, but the *main* problem is that the
> rounding mode setting becomes a super-global variable.
<
Given that they cannot be in a GPR or an FPR, Oh wise one,
Where would you put them ?
<
> If one subroutine
> touches the register, it affects all code in the same thread. And it's
> "super-global" because it crosses source code and language barriers
<
But it does not cross thread or task boundaries. So it's not "like memory" either.
<
> (e.g. consider a program written in Go that has a Python scripting back
> end that calls out to a DLL that is written in C++ that changes the
> floating-point rounding mode...).
>
> As I have accounted for elsewhere, this is a real problem. As a SW
> developer I therefore prefer to work with architectures that do not have
> an FPU control register - /even/ if that means that I can only use RNE
> (because let's face it - that's the only rounding mode I'm ever going
> to use anyway).
<
So you discount libraries that may contain interval arithmetic or properly
rounded transcendentals ?
>
> Having the rounding modes as part of the instruction is *much* better
> from a SW developer's perspective - but that blows up opcode space for
> a feature that is never used.
<
And there are cases where this philosophy would blow up code space
when they were used. Does the compiler have to generate code for
foo() in RNE, and RTPI and RTNI, and RTZ, and RTMAG modes?
>
> Perhaps your prefix instruction paradigm could be used for this (e.g.
> like "CARRY")?
<
If I wanted 'per instruction RMs' this would be how I did it. A CARRY-like
RM-instruction-modifier could cast RM over 5-ish subsequent instructions.
>
> > < Probably the
> >> "least awful" would probably be to use an Op64 encoding, which then uses
> >> some of the Immed extension bits to encode a rounding mode.
> > <
> > The argument against having them in instructions is that this prevents
> > someone from running the code several times with different rounding
> > modes set to detect any sensitivity to the actually chosen rounding mode.
> > Kahan said he uses this a lot.
>
> ......let me question the usefulness of that. If I were to do such
> experiments I would just compile the algorithm with different rounding
> modes set explicitly (e.g. as a C++ template argument or something).
<
So you would end up with 5 copies of FPPPP, one for each rounding mode
(at the cost of 8K×sizeof(inst) per rounding mode = 32KB/mode = 165KB)
??!!?
>
> >>
> >>
> >> * FFw0_00ii_F0nm_5eo8 FADD Rm, Ro, Rn, Imm8
> >> * FFw0_00ii_F0nm_5eo9 FSUB Rm, Ro, Rn, Imm8
> >> * FFw0_00ii_F0nm_5eoA FMUL Rm, Ro, Rn, Imm8
> >>
> >> Where the Imm8 field encodes the rounding mode, say:
> >> 00 = Round to Nearest.
> >> 01 = Truncate.
> >>
> >> Or could go the SR route, but I don't want FPU behavior to depend on SR.
> > <
> > When one has multi-threading and control/status register, one simply
> > reads the RM field and delivers it to the FU as an operand. A couple
> > of interlock checks means you don't really have to stall the pipeline
> > because these modes don't change all that often.
> > <
>
> Again, that's the HW perspective. But from a SW perspective you *don't*
> want RM to be part of a configuration register.
>
> Global variables are bad. This was well established some 50 years ago
> (e.g. Wulf, W.A., Shaw, M., "Global Variables Considered Harmful" from
> 1973). Global configuration registers even more so.
<
So Root pointers are now considered Harmful ?!?
>
> >>> Since the hidden bit is already hidden at this point, any rounding
> >>> overflow of the mantissa from 0xfff.. to 0x000.. will cause the exponent
> >>> term to be incremented, possibly all the way to Inf. In all cases, this
> >>> is the exactly correct behaviour.
> >>>
> >> Yep.
> >>
> >> Main limiting factor though is that for bigger formats (Double or FP96),
> >> propagating the carry that far can be an issue.
> > <
> > Kogge-Stone adders!
> >>
> >> In the vast majority of cases, the carry gets absorbed within the low 8
> >> or 16 bits or so (or if it doesn't, leave these bits as-is).
> >>
> >> For narrowing conversions to Binary16 or Binary32, full width rounding
> >> is both easier and more useful.
> >>
> >>
> >>
> >> For FADD/FSUB, the vast majority of cases where a very long stream of
> >> 1's would have occured can be avoided by doing the math internally in
> >> twos complement form.
> >>
> >> Though, in this case, one can save a little cost by implementing the
> >> "twos complement" as essentially ones' complement with a carry bit input
> >> to the adder (one can't arrive at a case where both inputs are negative
> >> with FADD).
> > <
> > This is a standard trick that everyone should know--I first saw it in the
> > PDP-8 in the Complement and increment instruction--but it has come in
> > handy several times and is the way operands are negated and complemented
> > in My 66000. The operand is conditionally complemented with a carry in
> > conditionally asserted. IF the operand is being processed is integer there
> > is an adder that deals with the carry in. If the operand is logical, there is
> > no adder and the carry in is ignored.
> >>
> >>
> >> Cases can occur though where the result mantissa comes up negative
> >> though, which can itself require a sign inversion. The only alternative
> >> is to compare mantissa input values by value if the exponents are equal,
> >> which is also fairly expensive.
> >>
> >> Though, potentially one could use the rounding step to "absorb" part of
> >> the cost of the second sign inversion.
> >>
> >> Another possibility here could be to have an adder which produces two
> >> outputs, namely both ((A+B)+Cin) and (~(A+B)+(!Cin)), and then using the
> >> second output if the first came up negative.
> >>
> >> ...


Re: The value of floating-point exceptions?

<d972a7e6-17e1-4a83-885f-68e747beb4f6n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19209&group=comp.arch#19209

Newsgroups: comp.arch
X-Received: by 2002:a05:620a:1087:: with SMTP id g7mr5566080qkk.436.1627316477233;
Mon, 26 Jul 2021 09:21:17 -0700 (PDT)
X-Received: by 2002:a05:6808:10d5:: with SMTP id s21mr11757951ois.7.1627316477050;
Mon, 26 Jul 2021 09:21:17 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 26 Jul 2021 09:21:16 -0700 (PDT)
In-Reply-To: <sdmc4e$2l9$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4cd:6eb0:1a8:164c;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4cd:6eb0:1a8:164c
References: <sd9a9h$ro6$1@dont-email.me> <memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com> <a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me> <7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com> <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<713a35af-9cce-4954-b968-1b4b754e7b1en@googlegroups.com> <sdjghd$1i4a$1@gioia.aioe.org>
<sdk2kd$lu1$1@dont-email.me> <476a9f6b-5fa0-4606-aed6-cf31089b8c5bn@googlegroups.com>
<sdlo5n$da3$1@dont-email.me> <sdmc4e$2l9$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d972a7e6-17e1-4a83-885f-68e747beb4f6n@googlegroups.com>
Subject: Re: The value of floating-point exceptions?
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 26 Jul 2021 16:21:17 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Mon, 26 Jul 2021 16:21 UTC

On Monday, July 26, 2021 at 8:08:32 AM UTC-5, Thomas Koenig wrote:
> Marcus <m.de...@this.bitsnbites.eu> schrieb:
> > There are several problems with this, but the *main* problem is that the
> > rounding mode setting becomes a super-global variable. If one subroutine
> > touches the register, it affects all code in the same thread.
> And obviously, that should not happen, so the rounding mode should
> be reset on leaving the subroutine (or on leaving your library,
> at the latest).
>
> Fortran chose to reset the rounding mode on leaving the subroutine.
<
So how does one set a rounding mode by calling the ChangeRoundingMode(rm)
subroutine?

Re: The value of floating-point exceptions?

<sdmoe3$aro$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=19210&group=comp.arch#19210

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2a0a-a540-aea-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Mon, 26 Jul 2021 16:38:27 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sdmoe3$aro$1@newsreader4.netcologne.de>
References: <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<memo.20210723223630.14132H@jgd.cix.co.uk>
<2021Jul24.112832@mips.complang.tuwien.ac.at>
<TZRKI.24285$Nq7.6581@fx33.iad>
<2021Jul24.184642@mips.complang.tuwien.ac.at>
<Ws_KI.75293$VU3.57635@fx46.iad> <Ej1LI.12196$gE.4282@fx21.iad>
<0p9LI.21872$tL2.6289@fx43.iad>
<2021Jul25.172251@mips.complang.tuwien.ac.at>
<NvhLI.28465$0N5.11765@fx06.iad> <sdk95b$mjj$1@newsreader4.netcologne.de>
<2021Jul26.093451@mips.complang.tuwien.ac.at>
Injection-Date: Mon, 26 Jul 2021 16:38:27 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2a0a-a540-aea-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2a0a:a540:aea:0:7285:c2ff:fe6c:992d";
logging-data="11128"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Mon, 26 Jul 2021 16:38 UTC

Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
> Thomas Koenig <tkoenig@netcologne.de> writes:
>>Branimir Maksimovic <branimir.maksimovic@gmail.com> schrieb:
>>
>>> You are right this is what gcc-11 produces @Branimirs-Air axpy % cat axpysimd.s
>>
>>I don't find the source code, so...
>>
>>> pure scalar code...
>>
>>Did you use restrict on the pointers? If not, the compiler
>>has to assume all sorts of aliasing issues, which usually
>>preculde vectorization.
>
> Source code is
>
> http://www.complang.tuwien.ac.at/anton/tmp/axpy.zip

(The BLAS *AXPY has two different strides, by the way).

Let's see... unmodified source, inner loop only, -O3 -march=native
on a Zen 1:

.L3:
        vmovsd  (%rdi), %xmm1
        addq    %rdx, %rdi
        vfmadd213sd (%rsi), %xmm0, %xmm1
        vmovsd  %xmm1, (%rsi)
        addq    %rdx, %rsi
        decq    %rcx
        jne     .L3

The inner loop is unchanged when putting restrict on f_x and f_y.
There is no vectorization, however, and restrict does not help with that either.

Hm, I just saw that equivalent Fortran code is rather poorly
translated. Time for a PR...
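
(For reference, the kind of kernel under discussion looks roughly like the
sketch below -- a generic BLAS-style *AXPY with the two strides mentioned
above, written from the description only, not the actual source in axpy.zip;
the names f_x, f_y and the stride parameters are assumptions.)

void daxpy(long n, double a,
           const double *restrict f_x, long incx,
           double *restrict f_y, long incy)
{
    /* y[i*incy] += a * x[i*incx]; with two runtime strides the accesses
       are effectively gathers/scatters, which is presumably part of why
       restrict alone does not get this auto-vectorized. */
    for (long i = 0; i < n; i++)
        f_y[i * incy] += a * f_x[i * incx];
}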

Re: The value of floating-point exceptions?

<sdmofl$iv3$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19211&group=comp.arch#19211

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Mon, 26 Jul 2021 09:39:17 -0700
Organization: A noiseless patient Spider
Lines: 111
Message-ID: <sdmofl$iv3$1@dont-email.me>
References: <sd9a9h$ro6$1@dont-email.me>
<memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com>
<a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me>
<7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com>
<fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<713a35af-9cce-4954-b968-1b4b754e7b1en@googlegroups.com>
<sdjghd$1i4a$1@gioia.aioe.org> <sdk2kd$lu1$1@dont-email.me>
<476a9f6b-5fa0-4606-aed6-cf31089b8c5bn@googlegroups.com>
<sdlo5n$da3$1@dont-email.me>
<1f8827fe-a288-4f2e-8e4e-40cc343febbdn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 26 Jul 2021 16:39:17 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="bd05d2ba9b54b1196d811f6c4148a764";
logging-data="19427"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18fYSE9gleyAsg1+yoe/yt6"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
Cancel-Lock: sha1:QRwumMXFjA5rQhoXcw2MBwOM9JY=
In-Reply-To: <1f8827fe-a288-4f2e-8e4e-40cc343febbdn@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Mon, 26 Jul 2021 16:39 UTC

On 7/26/2021 9:19 AM, MitchAlsup wrote:
> On Monday, July 26, 2021 at 2:27:54 AM UTC-5, Marcus wrote:
>> On 2021-07-25 19:22, MitchAlsup wrote:
>>> On Sunday, July 25, 2021 at 11:14:08 AM UTC-5, BGB wrote:
>>>> On 7/25/2021 6:05 AM, Terje Mathisen wrote:
>>>>> MitchAlsup wrote:
>>>>>> Having watched this from inside:
>>>>>> a) HW designers know a lot more about this today than in 1980
>>>>>> b) even systems that started out as IEEE-format gradually went
>>>>>> closer and closer to full IEEE-compliant (GPUs) until there is no
>>>>>> useful difference in the quality of the arithmetic.
>>>>>> c) once 754-2009 came out the overhead to do denorms went to
>>>>>> zero, and there is no reason to avoid full speed denorms in practice.
>>>>>> (BGB's small FPGA prototyping environment aside.)
>>>>>
>>>>> I agree.
>>>>>
>>>>>> d) HW designers have learned how to perform all of the rounding
>>>>>> modes at no overhead compared to RNE.
>>>>>
>>>>> This is actually dead easy since all the other modes are easier than
>>>>> RNE: As soon as you have all four bits required for RNE (i.e.
>>>>> sign/ulp/guard/sticky) then the remaining rounding modes only need
>>>>> various subsets of these, so you use the rounding mode to route one of 5
>>>>> or 6 possible 16-entry one-bit lookup tables into the rounding circuit
>>>>> where it becomes the input to be added into the ulp position of the
>>>>> final packed (sign/exp/mantissa) fp result.
>>>>>
>>>> Oddly enough, the extra cost to rounding itself is not the main issue
>>>> with multiple rounding modes, but more the question of how the bits get
>>>> there (if one doesn't already have an FPU status register or similar).
>>>>
>>>> Granted, could in theory put these bits in SR or similar, but, yeah...
>>>>
>>>> It would be better IMO if it were part of the instruction, but there
>>>> isn't really any good / non-annoying way to encode this.
>>> <
>>> And this is why they are put in control/status registers.
>>> <
>> There are several problems with this, but the *main* problem is that the
>> rounding mode setting becomes a super-global variable.
> <
> Given that they cannot be in a GPR or an FPR, Oh wise one,
> Where would you put them ?
> <
>> If one subroutine
>> touches the register, it affects all code in the same thread. And it's
>> "super-global" because it crosses source code and language barriers
> <
> But does not cross thread or task boundaries. So its not "like memory" either.
> <
>> (e.g. consider a program written in Go that has a Python scripting back
>> end that calls out to a DLL that is written in C++ that changes the
>> floating-point rounding mode...).
>>
>> As I have accounted for elsewhere, this is a real problem. As a SW
>> developer I therefore prefer to work with architectures that do not have
>> an FPU control register - /even/ if that means that I can only use RNE
>> (because let's face it - that's the only rounding mode I'm ever going
>> to use anyway).
> <
> So you discount libraries that may contain interval arithmetic or properly
> rounded transcendentals ?
>>
>> Having the rounding modes as part of the instruction is *much* better
>> from a SW developer's perspective - but that blows up opcode space for
>> a feature that is never used.
> <
> And there are cases where this philosophy would blow up code space
> when they were used. Does the compiler have to generate code for
> foo() in RNE, and RTPI and RTNI, and RTZ, and RTMAG modes?

Other than the shoehorn problem for instruction encoding, why not?

So many of the problems discussed here go away if you don't start off by
saying, "well, I'm going to have an <N> bit instruction".

>>
>> Perhaps your prefix instruction paradigm could be used for this (e.g.
>> like "CARRY")?
> <
> If I wanted 'per instruction RMs' this would be how I did it. A CARRY-like
> RM-instruction-modifier could cast RM over 5-ish subsequent instructions.

As you ladle more and more onto the CARRY mechanism, they start to look
more and more like bundles. Why not take it all the way?

>>
>>> < Probably the
>>>> "least awful" would probably be to use an Op64 encoding, which then uses
>>>> some of the Immed extension bits to encode a rounding mode.
>>> <
>>> The argument against having them in instructions is that this prevents
>>> someone from running the code several times with different rounding
>>> modes set to detect any sensitivity to the actually chosen rounding mode.
>>> Kahan said he uses this a lot.

This might have been relevant in days when compiles were slow and
monolithic. Today if you want to experiment you change a line in, or a
compile flag on, a DLL and recompile.

>> ......let me question the usefulness of that. If I were to do such
>> experiments I would just compile the algorithm with different rounding
>> modes set explicitly (e.g. as a C++ template argument or something).
> <
> So you would end up with 5 copies of FPPPP, one for each rounding mode
> (at the cost of 8K×sizeof(inst) per rounding mode = 32KB/mode = 165KB)
> ??!!?

No, just five copies of the program

Re: The value of floating-point exceptions?

<sdmpea$c57$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=19212&group=comp.arch#19212

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2a0a-a540-aea-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Mon, 26 Jul 2021 16:55:39 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sdmpea$c57$1@newsreader4.netcologne.de>
References: <sd9a9h$ro6$1@dont-email.me>
<memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com>
<a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me>
<7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com>
<fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<713a35af-9cce-4954-b968-1b4b754e7b1en@googlegroups.com>
<sdjghd$1i4a$1@gioia.aioe.org> <sdk2kd$lu1$1@dont-email.me>
<476a9f6b-5fa0-4606-aed6-cf31089b8c5bn@googlegroups.com>
<sdlo5n$da3$1@dont-email.me> <sdmc4e$2l9$1@newsreader4.netcologne.de>
<d972a7e6-17e1-4a83-885f-68e747beb4f6n@googlegroups.com>
Injection-Date: Mon, 26 Jul 2021 16:55:39 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2a0a-a540-aea-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2a0a:a540:aea:0:7285:c2ff:fe6c:992d";
logging-data="12455"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Mon, 26 Jul 2021 16:55 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:
> On Monday, July 26, 2021 at 8:08:32 AM UTC-5, Thomas Koenig wrote:
>> Marcus <m.de...@this.bitsnbites.eu> schrieb:
>> > There are several problems with this, but the *main* problem is that the
>> > rounding mode setting becomes a super-global variable. If one subroutine
>> > touches the register, it affects all code in the same thread.
>> And obviously, that should not happen, so the rounding mode should
>> be reset on leaving the subroutine (or on leaving your library,
>> at the latest).
>>
>> Fortran chose to reset the rounding mode on leaving the subroutine.
><
> So how does on set a rounding mode by calling the ChangeRoundingMode(rm)
> subroutine ?

You call the subroutine from the intrinsic module in your main
program, or rather in the procedure from which you call all other
procedures you want to call with that rounding mode set.

You do _NOT_ call a routine to set up the rounding mode for you;
that would be a no-op. (Come to think of it, it might be
a nice warning if somebody sets something IEEE-like and then
doesn't do any floating point operations until the end of
the procedure.) :-(

I guess that's a trap for the unwary, but people who set rounding
modes are apparently assumed to know what they are doing (which
is true of a good many things in programming languages, and not
always the case).

IEEE support is optional in Fortran, by the way; you can ask
the compiler whether an operation is supported. That you can do in
a subroutine.
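
For comparison, the closest C idiom to that "set it in the caller and
let it apply to the callees" discipline is an explicit save/set/restore
around the region that needs a non-default mode. A minimal sketch using
the standard <fenv.h> interface (the function and loop are illustrative
only):

#include <fenv.h>

/* Tell the compiler we really do touch the FP environment. */
#pragma STDC FENV_ACCESS ON

double sum_toward_zero(const double *x, int n)
{
    int saved = fegetround();      /* remember the caller's mode          */
    fesetround(FE_TOWARDZERO);     /* mode this routine and its callees
                                      will run under                      */
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];

    fesetround(saved);             /* restore on the way out,
                                      Fortran-style                       */
    return s;
}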

Re: The value of floating-point exceptions?

<FyCLI.4763$xn6.2271@fx23.iad>

https://www.novabbs.com/devel/article-flat.php?id=19213&group=comp.arch#19213

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx23.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
References: <sd9a9h$ro6$1@dont-email.me> <memo.20210721153537.10680P@jgd.cix.co.uk> <e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com> <a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com> <sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me> <7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com> <e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com> <fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com> <713a35af-9cce-4954-b968-1b4b754e7b1en@googlegroups.com> <sdjghd$1i4a$1@gioia.aioe.org> <sdk2kd$lu1$1@dont-email.me> <476a9f6b-5fa0-4606-aed6-cf31089b8c5bn@googlegroups.com> <sdk7e0$mk2$1@dont-email.me>
In-Reply-To: <sdk7e0$mk2$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 99
Message-ID: <FyCLI.4763$xn6.2271@fx23.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 26 Jul 2021 17:47:17 UTC
Date: Mon, 26 Jul 2021 13:46:57 -0400
X-Received-Bytes: 6053
 by: EricP - Mon, 26 Jul 2021 17:46 UTC

Ivan Godard wrote:
> On 7/25/2021 10:22 AM, MitchAlsup wrote:
>> On Sunday, July 25, 2021 at 11:14:08 AM UTC-5, BGB wrote:
>>> On 7/25/2021 6:05 AM, Terje Mathisen wrote:
>>>> MitchAlsup wrote:
>>>>> Having watched this from inside:
>>>>> a) HW designers know a lot more about this today than in 1980
>>>>> b) even systems that started out as IEEE-format gradually went
>>>>> closer and closer to full IEEE-compliant (GPUs) until there is no
>>>>> useful difference in the quality of the arithmetic.
>>>>> c) once 754-2009 came out the overhead to do denorms went to
>>>>> zero, and there is no reason to avoid full speed denorms in practice.
>>>>> (BGB's small FPGA prototyping environment aside.)
>>>>
>>>> I agree.
>>>>
>>>>> d) HW designers have learned how to perform all of the rounding
>>>>> modes at no overhead compared to RNE.
>>>>
>>>> This is actually dead easy since all the other modes are easier than
>>>> RNE: As soon as you have all four bits required for RNE (i.e.
>>>> sign/ulp/guard/sticky) then the remaining rounding modes only need
>>>> various subsets of these, so you use the rounding mode to route one
>>>> of 5
>>>> or 6 possible 16-entry one-bit lookup tables into the rounding circuit
>>>> where it becomes the input to be added into the ulp position of the
>>>> final packed (sign/exp/mantissa) fp result.
>>>>
>>> Oddly enough, the extra cost to rounding itself is not the main issue
>>> with multiple rounding modes, but more the question of how the bits get
>>> there (if one doesn't already have an FPU status register or similar).
>>>
>>> Granted, could in theory put these bits in SR or similar, but, yeah...
>>>
>>> It would be better IMO if it were part of the instruction, but there
>>> isn't really any good / non-annoying way to encode this.
>> <
>> And this is why they are put in control/status registers.
>
> However putting them in status regs mucks up any code that actually does
> care about mode; interval arithmetic for example. Especially because
> changing the mode commonly costs a pipe flush (yes, you can put the
> status in the decoder and decorate the op in the pipe with it, but that
> adds five bits to the op state).

The data type should be on the instruction: fp16, fp32, fp64, fp128.
The round mode requires 2 bits: 3 static modes plus 1 dynamic mode.
That's 4 bits for most FP operate instructions, so they need to be
a 32-bit format, just like other 3-register instructions.

It is probably best to not do that dynamic round mode merge in the decoder.
Merging the dynamic control bits into a uOp _when needed_ is best done
at the front of the FPU pipeline.

Setting the FpControl flags requires moving from an integer register
and you don't want to make your decoder serially dependent on that
as it would stall at decode until the new value shows up.
Doing the merge *for dynamic round mode instructions only* at the front
of the FPU pipeline confines any RAW dependencies to just that unit
and just those dynamic round instructions.

A Move To FPCTRL MVTFPC instruction makes subsequent FP operate instructions
RAW dependent on the new control value, which causes an FP stall until
the integer value is ready. However, by locating a future version of
FpControl at the front of the FP pipeline, in addition to a committed
version located at write back, an FP pipeline drain can be eliminated.
The future FpControl would also supply the bits for a Move From FPCTRL.

If there is a branch mispredict and if there was a MVTFPC in flight
at the time then you would have to purge any subsequent FP uOps which
had used those future control bits. However they would be purged anyway
for the branch. The future FpControl is then reset to the committed
FpControl value.

One can have multiple versions of the FpControl register, but
having the precision and round control on the instructions
eliminates almost all writes to FpControl.

> And then there's save/restore of the
> mode across calls.
>
> Status reg and ignoring the software is a good hardware solution. :-(

Most of this problem is caused because x86 _only_ has dynamic precision
and rounding modes, forcing almost every FP instruction to be
dependent on FpControl's current setting.

Everyone expects FpControl to be set their way but doesn't
want to pay the price of saving, setting and restoring it
on every FP-using routine. So FpControl becomes
an invisible ABI register that no one talks about.

Having the precision and round mode bits on individual instructions
should eliminate almost all dynamic changes to FpControl.
The only users of dynamic round will be those who really
do want dynamic changes.
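
A toy software model may make the future/committed bookkeeping concrete
(all names and field widths here are illustrative, not a real design):

#include <stdint.h>

typedef struct {
    uint32_t committed;   /* architectural FpControl, updated at retire  */
    uint32_t future;      /* speculative copy at the FPU pipeline front  */
} fp_control_t;

/* MVTFPC reaching the FPU front end: later dynamic-mode uOps see it
   immediately, without draining the FP pipeline. */
static void mvtfpc_issue(fp_control_t *fc, uint32_t v)  { fc->future = v; }

/* MVTFPC retiring: the architectural copy catches up. */
static void mvtfpc_retire(fp_control_t *fc, uint32_t v) { fc->committed = v; }

/* Only uOps encoded with the "dynamic" round mode read this; static-mode
   uOps carry their rounding bits in the instruction itself. */
static uint32_t dynamic_round_mode(const fp_control_t *fc)
{
    return fc->future & 0x3u;
}

/* Branch mispredict: the uOps that used 'future' are squashed anyway,
   so the speculative copy simply resets to the committed value. */
static void mispredict_recover(fp_control_t *fc)
{
    fc->future = fc->committed;
}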

Re: The value of floating-point exceptions?

<sdmu62$rlr$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19214&group=comp.arch#19214

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Mon, 26 Jul 2021 11:16:34 -0700
Organization: A noiseless patient Spider
Lines: 114
Message-ID: <sdmu62$rlr$1@dont-email.me>
References: <sd9a9h$ro6$1@dont-email.me>
<memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com>
<a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me>
<7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com>
<fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<713a35af-9cce-4954-b968-1b4b754e7b1en@googlegroups.com>
<sdjghd$1i4a$1@gioia.aioe.org> <sdk2kd$lu1$1@dont-email.me>
<476a9f6b-5fa0-4606-aed6-cf31089b8c5bn@googlegroups.com>
<sdk7e0$mk2$1@dont-email.me> <FyCLI.4763$xn6.2271@fx23.iad>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 26 Jul 2021 18:16:34 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="bd05d2ba9b54b1196d811f6c4148a764";
logging-data="28347"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+FO5vQEbQiVtku0kCSDZR4"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
Cancel-Lock: sha1:2nDI5Ku1F15x1v0/+dx+ZyjsikU=
In-Reply-To: <FyCLI.4763$xn6.2271@fx23.iad>
Content-Language: en-US
 by: Ivan Godard - Mon, 26 Jul 2021 18:16 UTC

On 7/26/2021 10:46 AM, EricP wrote:
> Ivan Godard wrote:
>> On 7/25/2021 10:22 AM, MitchAlsup wrote:
>>> On Sunday, July 25, 2021 at 11:14:08 AM UTC-5, BGB wrote:
>>>> On 7/25/2021 6:05 AM, Terje Mathisen wrote:
>>>>> MitchAlsup wrote:
>>>>>> Having watched this from inside:
>>>>>> a) HW designers know a lot more about this today than in 1980
>>>>>> b) even systems that started out as IEEE-format gradually went
>>>>>> closer and closer to full IEEE-compliant (GPUs) until there is no
>>>>>> useful difference in the quality of the arithmetic.
>>>>>> c) once 754-2009 came out the overhead to do denorms went to
>>>>>> zero, and there is no reason to avoid full speed denorms in practice.
>>>>>> (BGB's small FPGA prototyping environment aside.)
>>>>>
>>>>> I agree.
>>>>>
>>>>>> d) HW designers have learned how to perform all of the rounding
>>>>>> modes at no overhead compared to RNE.
>>>>>
>>>>> This is actually dead easy since all the other modes are easier than
>>>>> RNE: As soon as you have all four bits required for RNE (i.e.
>>>>> sign/ulp/guard/sticky) then the remaining rounding modes only need
>>>>> various subsets of these, so you use the rounding mode to route one
>>>>> of 5
>>>>> or 6 possible 16-entry one-bit lookup tables into the rounding circuit
>>>>> where it becomes the input to be added into the ulp position of the
>>>>> final packed (sign/exp/mantissa) fp result.
>>>>>
>>>> Oddly enough, the extra cost to rounding itself is not the main issue
>>>> with multiple rounding modes, but more the question of how the bits get
>>>> there (if one doesn't already have an FPU status register or similar).
>>>>
>>>> Granted, could in theory put these bits in SR or similar, but, yeah...
>>>>
>>>> It would be better IMO if it were part of the instruction, but there
>>>> isn't really any good / non-annoying way to encode this.
>>> <
>>> And this is why they are put in control/status registers.
>>
>> However putting them in status regs mucks up any code that actually
>> does care about mode; interval arithmetic for example. Especially
>> because changing the mode commonly costs a pipe flush (yes, you can
>> put the status in the decoder and decorate the op in the pipe with it,
>> but that adds five bits to the op state).
>
> The data type should be on the instruction: fp16, fp32, fp64, fp128.
> The round mode requires 2 bits, 3 static and 1 dynamic mode.
> Thats 4 bits for most FP operate instructions so they need to be
> a 32-bit format, just like other 3 register instructions.
>
> It is probably best to not do that dynamic round mode merge in the decoder.
> Merging the dynamic control bits into a uOp _when needed_ is best done
> at the front of the FPU pipeline.
>
> Setting the FpControl flags requires moving from an integer register
> and you don't want to make your decoder serially dependent on that
> as it would stall at decode until the new value shows up.
> Doing the merge *for dynamic round mode instructions only* at the front
> of the FPU pipeline confines any RAW dependencies to just that unit
> and just those dynamic round instructions.
>
> A Move To FPCTRL MVTFPC instruction makes subsequent FP operate
> instructions
> RAW dependent on the new control value, which causes an FP stall until
> the integer value is ready. However by locating the a future version of
> FpControl at the front of the FP pipeline, in addition to a committed
> version located at write back, an FP pipeline drain can be eliminated.
> The future FpControl would also supply the bits for a Move From FPCTRL.
>
> If there is a branch mispredict and if there was a MVTFPC in flight
> at the time then you would have to purge any subsequent FP uOps which
> had used those future control bits. However they would be purged anyway
> for the branch. The future FpControl is then reset to the committed
> FpControl value.
>
> One can have multiple versions of the FpControl register but
> by having the precision and round control on the instructions
> it pretty much eliminates almost all writes to FpControl.

Almost exactly the way we do it.

>> And then there's save/restore of the mode across calls.
>>
>> Status reg and ignoring the software is a good hardware solution. :-(
>
> Most of this problem is caused because x86 _only_ has dynamic precision
> and rounding modes, forcing almost every FP instruction to be
> dependent on FpControl's current setting.
>
> Everyone expects FpControl to be set their way but doesn't
> want to pay the price of saving, setting and restoring it
> on every FP using routine. So the setting of FpControl
> becomes an invisible ABI register that no one talks about.
>
> Having the precision and round mode bits on individual instructions
> should eliminate almost all dynamic changes to FpControl.
> The only users of dynamic round will be those who really
> do want dynamic changes.

That's over-optimistic IMO. There will be legacy code that does dynamic
forever, and that code will expect legacy scope rules for what
dynamic actions it takes. Unfortunately the best-behaved legacy code
will pay the biggest cost: a function that carefully saves the mode on
entry, sets it to a desired mode, and then restores the saved mode on
all exits including exceptions has to pay for all those actions, but
changing the source to use a static in-opcode mode costs software dev
too - and won't be portable.

Yes, mode-in-opcode is the way to go, but uptake in real software will
be slow. Language standards can help, but they are hamstrung by the
vendors of legacy systems. Politics and money trump engineering
excellence all too often.

Re: The value of floating-point exceptions?

<sdn1s3$m4l$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19217&group=comp.arch#19217

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Mon, 26 Jul 2021 14:19:29 -0500
Organization: A noiseless patient Spider
Lines: 166
Message-ID: <sdn1s3$m4l$1@dont-email.me>
References: <sd9a9h$ro6$1@dont-email.me>
<memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com>
<a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me>
<7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com>
<fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<713a35af-9cce-4954-b968-1b4b754e7b1en@googlegroups.com>
<sdjghd$1i4a$1@gioia.aioe.org> <sdk2kd$lu1$1@dont-email.me>
<476a9f6b-5fa0-4606-aed6-cf31089b8c5bn@googlegroups.com>
<sdlo5n$da3$1@dont-email.me>
<1f8827fe-a288-4f2e-8e4e-40cc343febbdn@googlegroups.com>
<sdmofl$iv3$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 26 Jul 2021 19:19:31 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="02f2099a34b7570ec045b5def5025dcc";
logging-data="22677"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/zvUrSdxp2RfSxPeU4+u32"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
Cancel-Lock: sha1:nbTvuOYz4lb+AqXL2E3CDDrk1LE=
In-Reply-To: <sdmofl$iv3$1@dont-email.me>
Content-Language: en-US
 by: BGB - Mon, 26 Jul 2021 19:19 UTC

On 7/26/2021 11:39 AM, Ivan Godard wrote:
> On 7/26/2021 9:19 AM, MitchAlsup wrote:
>> On Monday, July 26, 2021 at 2:27:54 AM UTC-5, Marcus wrote:
>>> On 2021-07-25 19:22, MitchAlsup wrote:
>>>> On Sunday, July 25, 2021 at 11:14:08 AM UTC-5, BGB wrote:
>>>>> On 7/25/2021 6:05 AM, Terje Mathisen wrote:
>>>>>> MitchAlsup wrote:
>>>>>>> Having watched this from inside:
>>>>>>> a) HW designers know a lot more about this today than in 1980
>>>>>>> b) even systems that started out as IEEE-format gradually went
>>>>>>> closer and closer to full IEEE-compliant (GPUs) until there is no
>>>>>>> useful difference in the quality of the arithmetic.
>>>>>>> c) once 754-2009 came out the overhead to do denorms went to
>>>>>>> zero, and there is no reason to avoid full speed denorms in
>>>>>>> practice.
>>>>>>> (BGB's small FPGA prototyping environment aside.)
>>>>>>
>>>>>> I agree.
>>>>>>
>>>>>>> d) HW designers have learned how to perform all of the rounding
>>>>>>> modes at no overhead compared to RNE.
>>>>>>
>>>>>> This is actually dead easy since all the other modes are easier than
>>>>>> RNE: As soon as you have all four bits required for RNE (i.e.
>>>>>> sign/ulp/guard/sticky) then the remaining rounding modes only need
>>>>>> various subsets of these, so you use the rounding mode to route
>>>>>> one of 5
>>>>>> or 6 possible 16-entry one-bit lookup tables into the rounding
>>>>>> circuit
>>>>>> where it becomes the input to be added into the ulp position of the
>>>>>> final packed (sign/exp/mantissa) fp result.
>>>>>>
>>>>> Oddly enough, the extra cost to rounding itself is not the main issue
>>>>> with multiple rounding modes, but more the question of how the bits
>>>>> get
>>>>> there (if one doesn't already have an FPU status register or similar).
>>>>>
>>>>> Granted, could in theory put these bits in SR or similar, but, yeah...
>>>>>
>>>>> It would be better IMO if it were part of the instruction, but there
>>>>> isn't really any good / non-annoying way to encode this.
>>>> <
>>>> And this is why they are put in control/status registers.
>>>> <
>>> There are several problems with this, but the *main* problem is that the
>>> rounding mode setting becomes a super-global variable.
>> <
>> Given that they cannot be in a GPR or an FPR, Oh wise one,
>> Where would you put them ?
>> <
>>>
>>> If one subroutine
>>> touches the register, it affects all code in the same thread. And it's
>>> "super-global" because it crosses source code and language barriers
>> <
>> But does not cross thread or task boundaries. So its not "like memory"
>> either.
>> <
>>> (e.g. consider a program written in Go that has a Python scripting back
>>> end that calls out to a DLL that is written in C++ that changes the
>>> floating-point rounding mode...).
>>>
>>> As I have accounted for elsewhere, this is a real problem. As a SW
>>> developer I therefore prefer to work with architectures that do not have
>>> an FPU control register - /even/ if that means that I can only use RNE
>>> (because let's face it - that's the only rounding mode I'm ever going
>>> to use anyway).
>> <
>> So you discount libraries that may contain interval arithmetic or
>> properly
>> rounded transcendentals ?
>>>
>>> Having the rounding modes as part of the instruction is *much* better
>>> from a SW developer's perspective - but that blows up opcode space for
>>> a feature that is never used.
>> <
>> And there are cases where this philosophy would blow up code space
>> when they were used. Does the compiler have to generate code for
>> foo() in RNE, and RTPI and RTNI, and RTZ, and RTMAG modes?
>
> Other than the shoehorn problem for instruction encoding, why not?
>
> So many of the problems discussed here go away if you don't start off by
> saying, "well, I'm going to have an <N> bit instruction".
>

In my ISA spec, I ended up going this way:
  FADD Rm, Ro, Rn
    32-bit encoding.
    Rounding mode still hard-wired as Round-Nearest.
  FADD Rm, Ro, Rn, Imm8
    64-bit encoding.
    User can select the rounding mode explicitly.

Also applies to several other FPU operations.

Imm8 is a bit overkill for a rounding mode (3 bits would have been
sufficient), but this is the main option I had on hand. Could maybe be
useful for something else later.

These operations are also likely to be rare enough that the slightly
bloated encoding shouldn't matter too much.

Unclear if or how this would be exposed at the C level:
* Not at all (easiest);
* Intrinsics (explicit):
** h=__fpu_fmul_tr(f, g);
* #pragma or something...
** #pragma rounding_mode(truncate)
** Would apply on the level of a given translation unit.
* Attributes:
** [[rounding_mode(truncate)]] double foo(double x);

Not aware of any way to specify this in terms of standard C though.
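
To make the granularity difference concrete: standard C only exposes the
dynamic, thread-wide mode via <fenv.h>, whereas a per-operation intrinsic
would pin the mode to a single instruction. A sketch (the intrinsic is
hypothetical, mirroring the __fpu_fmul_tr example above, and is not an
actual BGBCC builtin):

#include <fenv.h>
#pragma STDC FENV_ACCESS ON

/* Standard C today: rounding mode is dynamic, thread-wide state. */
double mul_trunc_fenv(double f, double g)
{
    int saved = fegetround();
    fesetround(FE_TOWARDZERO);
    double h = f * g;            /* rounds toward zero        */
    fesetround(saved);           /* put the global mode back  */
    return h;
}

#if 0   /* hypothetical per-operation intrinsic, not standard C */
double mul_trunc_intrinsic(double f, double g)
{
    return __fpu_fmul_tr(f, g);  /* e.g. FMUL Rm, Ro, Rn, Imm8=trunc */
}
#endif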

Decided mostly to avoid going into the specifics of BGBCC recently
adding support for _BitInt and lambda functions (more or less compatible
with the C23 proposal, and a subset of C++ lambda semantics, though not
exactly 1:1 on the semantics front).

>>>
>>> Perhaps your prefix instruction paradigm could be used for this (e.g.
>>> like "CARRY")?
>> <
>> If I wanted 'per instruction RMs' this would be how I did it. A
>> CARRY-like
>> RM-instruction-modifier could cast RM over 5-ish subsequent instructions.
>
> As you ladle more and more onto the CARRY mechanism, they start to look
> more and more like bundles. Why not take it all the way?
>
>>>
>>>> < Probably the
>>>>> "least awful" would probably be to use an Op64 encoding, which then
>>>>> uses
>>>>> some of the Immed extension bits to encode a rounding mode.
>>>> <
>>>> The argument against having them in instructions is that this prevents
>>>> someone from running the code several times with different rounding
>>>> modes set to detect any sensitivity to the actually chosen rounding
>>>> mode.
>>>> Kahan said he uses this a lot.
>
> This might have been relevant in days when compiles were slow and
> monolithic. Today if you want to experiment you change a line in or a
> compile flag on a DLL and recompile.
>

Recompilation is fairly fast, at least typically for C.

>>> ......let me question the usefulness of that. If I were to do such
>>> experiments I would just compile the algorithm with different rounding
>>> modes set explicitly (e.g. as a C++ template argument or something).
>> <
>> So you would end up with 5 copies of FPPPP, one for each rounding mode
>> (at the cost of 8K×sizeof(inst) per rounding mode = 32KB/mode = 165KB)
>> ??!!?
>
> No, just five copies of the program
>

....

Re: byte me, was The value of floating-point exceptions?

<sdn5jc$30oa$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=19218&group=comp.arch#19218

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.cmpublishers.com!adore2!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: byte me, was The value of floating-point exceptions?
Date: Mon, 26 Jul 2021 20:23:08 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <sdn5jc$30oa$1@gal.iecc.com>
References: <sd9a9h$ro6$1@dont-email.me> <jwvmtqalefg.fsf-monnier+comp.arch@gnu.org> <bdd7738f-1b15-4970-a3d5-dcbf62496ffen@googlegroups.com> <2021Jul26.093622@mips.complang.tuwien.ac.at>
Injection-Date: Mon, 26 Jul 2021 20:23:08 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="99082"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <sd9a9h$ro6$1@dont-email.me> <jwvmtqalefg.fsf-monnier+comp.arch@gnu.org> <bdd7738f-1b15-4970-a3d5-dcbf62496ffen@googlegroups.com> <2021Jul26.093622@mips.complang.tuwien.ac.at>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Mon, 26 Jul 2021 20:23 UTC

According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>Quadibloc <jsavard@ecn.ab.ca> writes:
>>However, IBM also built the AN/FSQ-31 and AN/FSQ-32. These were
>>48-bit machines, and if the IBM 360 had looked something like them,
>>it could well have been just as successful.
>
>What makes you think so? I never heard of these systems before.
>
>Reading up on the FSQ-32, it's a word-addressed machine with 18 bits
>for the memory address. It would not have achieved IBM's goal for the
>S/360 of unifying the scientific and commercial lines, which the S/360
>achieved with byte addressing.

Quite right. IBM's commercial customers were happy with their
character-addressed 7080 and 1401 machines and had rejected the
word-addressed 7070, so there was no way they were going to move to a
word-addressed S/360. The power-of-2 byte addressing was a brilliant
hack to allow both character and word addressing with decent performance.

There was some grousing from people who were happy with 36- and 48-bit
floating point, but the botched hex float made everyone move to 64-bit
double precision anyway.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: The value of floating-point exceptions?

<sdn7g4$d66$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=19219&group=comp.arch#19219

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!aoTW/Fm1++YcllHDt9jnUQ.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: The value of floating-point exceptions?
Date: Mon, 26 Jul 2021 22:55:32 +0200
Organization: Aioe.org NNTP Server
Message-ID: <sdn7g4$d66$1@gioia.aioe.org>
References: <sd9a9h$ro6$1@dont-email.me>
<memo.20210721153537.10680P@jgd.cix.co.uk>
<e9c738bd-7c0e-4b3b-9385-3a0d0658b059n@googlegroups.com>
<a74c6bf2-9ad1-4969-b3cb-b650ae8ebdadn@googlegroups.com>
<sde6m7$kr1$1@dont-email.me> <sde74m$nio$1@dont-email.me>
<7cf5713e-f138-488b-9ccf-d85df84c50can@googlegroups.com>
<e7e0b9a2-7990-4ec8-9c40-a6e9a07bd306n@googlegroups.com>
<fc5a33d0-7c17-4855-8ab3-162884bd6b7bn@googlegroups.com>
<713a35af-9cce-4954-b968-1b4b754e7b1en@googlegroups.com>
<sdjghd$1i4a$1@gioia.aioe.org> <sdk2kd$lu1$1@dont-email.me>
<476a9f6b-5fa0-4606-aed6-cf31089b8c5bn@googlegroups.com>
<sdk7e0$mk2$1@dont-email.me> <sdkp9i$cvq$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="13510"; posting-host="aoTW/Fm1++YcllHDt9jnUQ.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.8.1
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Mon, 26 Jul 2021 20:55 UTC

BGB wrote:
> On 7/25/2021 12:36 PM, Ivan Godard wrote:
>> However putting them in status regs mucks up any code that actually
>> does care about mode; interval arithmetic for example. Especially
>> because changing the mode commonly costs a pipe flush (yes, you can
>> put the status in the decoder and decorate the op in the pipe with it,
>> but that adds five bits to the op state). And then there's
>> save/restore of the mode across calls.
>>
>
> Yeah, to be useful, it kinda needs to be per-instruction.
> This means putting it in the encoding, as a register-based mode is a bit
> too coarse grain to be particularly useful.

I would like to have my cake and eat it too:

a) We need a global/process/thread default Rounding Mode register which
is saved & restored on any context switch, i.e. like pretty much all
systems today.

b) Any (short) group of FP operations can have a CARRY-like prefix op
that specifies the exact rounding mode to be used for the corresponding
operations. Using 3 bits per instruction makes it possible for a 32-bit
prefix instruction with 15 available immediate bits to cover 5 FP
instructions.

I.e. ROUND (TRUNC, RNE, RNA, CEIL, FLOOR)

Of the 8 possible encodings, we're currently using 5 for rounding modes,
so we can dedicate one of them (000/111 probably) to indicate "ignore
this, use the thread control word setting here".

One additional requirement is that the use of the ROUND prefix
guarantees that there will be no change of the control word rounding
setting until the shadow ends.

Using Mitch's CARRY shadow trick means that the decoder can then figure
out very early where the rounding mode bits will come from and dispatch
them alongside each FP operation.
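
A quick sketch of the immediate-field arithmetic (the field order and
which code means "use the control word" are my assumptions, not a fixed
spec):

#include <stdint.h>

/* Rounding-mode codes for the ROUND prefix; code 0 is reserved to mean
   "use the thread/process control word".  Assignment is illustrative. */
enum { RM_DEFAULT = 0, RM_TRUNC, RM_RNE, RM_RNA, RM_CEIL, RM_FLOOR };

/* Pack five 3-bit fields into the 15 available immediate bits. */
static uint16_t round_prefix_imm(const uint8_t rm[5])
{
    uint16_t imm = 0;
    for (int i = 0; i < 5; i++)
        imm |= (uint16_t)((rm[i] & 0x7u) << (3 * i));
    return imm;
}

/* Rounding mode for the i-th instruction in the prefix shadow. */
static uint8_t round_for_slot(uint16_t imm, int i)
{
    return (uint8_t)((imm >> (3 * i)) & 0x7u);
}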

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: byte me, was The value of floating-point exceptions?

<13a04382-b3c7-4cc3-9dec-72b01f88233bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19220&group=comp.arch#19220

Newsgroups: comp.arch
X-Received: by 2002:a37:a154:: with SMTP id k81mr19843256qke.202.1627336950774;
Mon, 26 Jul 2021 15:02:30 -0700 (PDT)
X-Received: by 2002:a9d:5603:: with SMTP id e3mr13122855oti.178.1627336950653;
Mon, 26 Jul 2021 15:02:30 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 26 Jul 2021 15:02:30 -0700 (PDT)
In-Reply-To: <sdn5jc$30oa$1@gal.iecc.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:a168:359b:a6c6:288b;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:a168:359b:a6c6:288b
References: <sd9a9h$ro6$1@dont-email.me> <jwvmtqalefg.fsf-monnier+comp.arch@gnu.org>
<bdd7738f-1b15-4970-a3d5-dcbf62496ffen@googlegroups.com> <2021Jul26.093622@mips.complang.tuwien.ac.at>
<sdn5jc$30oa$1@gal.iecc.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <13a04382-b3c7-4cc3-9dec-72b01f88233bn@googlegroups.com>
Subject: Re: byte me, was The value of floating-point exceptions?
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 26 Jul 2021 22:02:30 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Mon, 26 Jul 2021 22:02 UTC

On Monday, July 26, 2021 at 3:23:10 PM UTC-5, John Levine wrote:
> According to Anton Ertl <an...@mips.complang.tuwien.ac.at>:
> >Quadibloc <jsa...@ecn.ab.ca> writes:
> >>However, IBM also built the AN/FSQ-31 and AN/FSQ-32. These were
> >>48-bit machines, and if the IBM 360 had looked something like them,
> >>it could well have been just as successful.
> >
> >What makes you think so? I never heard of these systems before.
> >
> >Reading up on the FSQ-32, it's a word-addressed machine with 18 bits
> >for the memory address. It would not have achieved IBM's goal for the
> >S/360 of unifying the scientific and commercial lines, which the S/360
> >achieved with byte addressing.
> Quite right. IBM's commercial customers were happy with their
> character addressed 7080 and 1401 machines and had rejected the word
> addressed 7070 so there was no way they were going to move to a word
> addressed S/360. The power of 2 byte addressing was a brilliant hack
> to allow both character and word addressing with decent performance.
>
> There was some grousing from people who were happy with 36 and 48 bit
> floating point, but the botched hex float made everyone move to 64
> bit double precision anyway.
<
The deliveries of the first S/360 coincided with the FP community jumping
ship to the CDC 6600, so the botched FP only lost them a few % of the base
customers, anyway.
>
> --
> Regards,
> John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
> Please consider the environment before reading this e-mail. https://jl.ly

Re: The value of floating-point exceptions?

<ad8db496-015a-4c9f-bda9-2057ad61a373n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=19221&group=comp.arch#19221

 by: MitchAlsup - Mon, 26 Jul 2021 22:12 UTC

On Monday, July 26, 2021 at 1:16:36 PM UTC-5, Ivan Godard wrote:
> On 7/26/2021 10:46 AM, EricP wrote:
> > Ivan Godard wrote:
> >> On 7/25/2021 10:22 AM, MitchAlsup wrote:
> >>> On Sunday, July 25, 2021 at 11:14:08 AM UTC-5, BGB wrote:
> >>>> On 7/25/2021 6:05 AM, Terje Mathisen wrote:
> >>>>> MitchAlsup wrote:
> >>>>>> Having watched this from inside:
> >>>>>> a) HW designers know a lot more about this today than in 1980
> >>>>>> b) even systems that started out as IEEE-format gradually went
> >>>>>> closer and closer to full IEEE-compliant (GPUs) until there is no
> >>>>>> useful difference in the quality of the arithmetic.
> >>>>>> c) once 754-2009 came out the overhead to do denorms went to
> >>>>>> zero, and there is no reason to avoid full speed denorms in practice.
> >>>>>> (BGB's small FPGA prototyping environment aside.)
> >>>>>
> >>>>> I agree.
> >>>>>
> >>>>>> d) HW designers have learned how to perform all of the rounding
> >>>>>> modes at no overhead compared to RNE.
> >>>>>
> >>>>> This is actually dead easy since all the other modes are easier than
> >>>>> RNE: As soon as you have all four bits required for RNE (i.e.
> >>>>> sign/ulp/guard/sticky) then the remaining rounding modes only need
> >>>>> various subsets of these, so you use the rounding mode to route one
> >>>>> of 5
> >>>>> or 6 possible 16-entry one-bit lookup tables into the rounding circuit
> >>>>> where it becomes the input to be added into the ulp position of the
> >>>>> final packed (sign/exp/mantissa) fp result.
> >>>>>
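
[A sketch of the table-driven increment described just above; the table
contents are reconstructed here from the definitions of the rounding
modes, and the mode numbering is arbitrary, not taken from any actual
design. Index = sign<<3 | lsb<<2 | guard<<1 | sticky; the selected bit
is what gets added into the ulp position of the packed result.]

static const unsigned rnd_tbl[] = {
    0x0000,  /* TRUNC: never increment              */
    0xC8C8,  /* RNE:   guard & (lsb | sticky)       */
    0xCCCC,  /* RNA:   guard                        */
    0x00EE,  /* CEIL:  !sign & (guard | sticky)     */
    0xEE00,  /* FLOOR:  sign & (guard | sticky)     */
};

static unsigned round_increment(unsigned mode, unsigned sign, unsigned lsb,
                                unsigned guard, unsigned sticky)
{
    unsigned idx = (sign << 3) | (lsb << 2) | (guard << 1) | sticky;
    return (rnd_tbl[mode] >> idx) & 1u;
}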
> >>>> Oddly enough, the extra cost to rounding itself is not the main issue
> >>>> with multiple rounding modes, but more the question of how the bits get
> >>>> there (if one doesn't already have an FPU status register or similar).
> >>>>
> >>>> Granted, could in theory put these bits in SR or similar, but, yeah...
> >>>>
> >>>> It would be better IMO if it were part of the instruction, but there
> >>>> isn't really any good / non-annoying way to encode this.
> >>> <
> >>> And this is why they are put in control/status registers.
> >>
> >> However putting them in status regs mucks up any code that actually
> >> does care about mode; interval arithmetic for example. Especially
> >> because changing the mode commonly costs a pipe flush (yes, you can
> >> put the status in the decoder and decorate the op in the pipe with it,
> >> but that adds five bits to the op state).
> >
> > The data type should be on the instruction: fp16, fp32, fp64, fp128.
> > The round mode requires 2 bits, 3 static and 1 dynamic mode.
> > That's 4 bits for most FP operate instructions so they need to be
> > a 32-bit format, just like other 3 register instructions.
> >
> > It is probably best to not do that dynamic round mode merge in the decoder.
> > Merging the dynamic control bits into a uOp _when needed_ is best done
> > at the front of the FPU pipeline.
> >
> > Setting the FpControl flags requires moving from an integer register
> > and you don't want to make your decoder serially dependent on that
> > as it would stall at decode until the new value shows up.
> > Doing the merge *for dynamic round mode instructions only* at the front
> > of the FPU pipeline confines any RAW dependencies to just that unit
> > and just those dynamic round instructions.
> >
> > A Move To FPCTRL MVTFPC instruction makes subsequent FP operate
> > instructions
> > RAW dependent on the new control value, which causes an FP stall until
> > the integer value is ready. However by locating a future version of
> > FpControl at the front of the FP pipeline, in addition to a committed
> > version located at write back, an FP pipeline drain can be eliminated.
> > The future FpControl would also supply the bits for a Move From FPCTRL.
> >
> > If there is a branch mispredict and if there was a MVTFPC in flight
> > at the time then you would have to purge any subsequent FP uOps which
> > had used those future control bits. However they would be purged anyway
> > for the branch. The future FpControl is then reset to the committed
> > FpControl value.
> >
> > One can have multiple versions of the FpControl register but
> > by having the precision and round control on the instructions
> > it pretty much eliminates almost all writes to FpControl.
<
> Almost exactly the way we do it.
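
[Sketch only: one way the per-instruction fields EricP describes might
be laid out; the opcode width, the choice of the three static modes and
the field order are assumptions here, not taken from the posts above.]

enum fp_size  { SZ_FP16, SZ_FP32, SZ_FP64, SZ_FP128 };     /* 2 bits */
enum fp_round { FR_RNE, FR_TRUNC, FR_CEIL, FR_DYNAMIC };    /* 2 bits:
                                3 static modes + 1 "use FpControl" escape */

struct fp_op {                  /* fits a 32-bit 3-register format */
    unsigned opcode : 9;
    unsigned rd     : 5;
    unsigned rs1    : 5;
    unsigned rs2    : 5;
    unsigned size   : 2;        /* fp16/fp32/fp64/fp128 */
    unsigned round  : 2;        /* FR_DYNAMIC is merged with FpControl
                                   at the front of the FP pipeline   */
    unsigned pad    : 4;
};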
<
One of my FP observations concerns Newton-Raphson iterations::
One gets a more predictable (i.e., accurate) final result when the
first k-2 iterations don't even care about rounding, iteration k-1
REALLY wants truncate, and iteration k wants FP.control.RM.
<
Where being "more accurate" means that over hundreds of thousands of
N-R calculations the total error is reduced. This is generally about
2 decimal digits below 1 ULP.
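
[A sketch of that rounding schedule, emulated with C99 fenv.h; on
hardware with per-instruction rounding the last two iterations would
simply encode their modes directly. The reciprocal iteration, the
helper name and the iteration count K are illustrative only, and K is
assumed to be at least 2.]

#include <fenv.h>
#pragma STDC FENV_ACCESS ON

/* x_{n+1} = x_n * (2 - d * x_n), refining x toward 1/d. */
static double newton_recip(double d, double x0, int K)
{
    int caller_rm = fegetround();          /* "FP.control.RM"       */
    double x = x0;
    for (int i = 1; i <= K; i++) {
        if (i == K - 1)
            fesetround(FE_TOWARDZERO);     /* penultimate: truncate */
        else if (i == K)
            fesetround(caller_rm);         /* final: caller's mode  */
        /* iterations 1 .. K-2 run in whatever mode is already set  */
        x = x * (2.0 - d * x);
    }
    fesetround(caller_rm);
    return x;
}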
<
> >> And then there's save/restore of the mode across calls.
> >>
> >> Status reg and ignoring the software is a good hardware solution. :-(
> >
> > Most of this problem is caused because x86 _only_ has dynamic precision
> > and rounding modes, forcing almost every FP instruction to be
> > dependent on FpControl's current setting.
> >
> > Everyone expects FpControl to be set their way but doesn't
> > want to pay the price of saving, setting and restoring it
> > on every FP using routine. So the setting of FpControl
> > becomes an invisible ABI register that no one talks about.
<
Why is the cost of saving and restoring something other than on the
order of 4 instructions:: ReadCR-STstack.......LDstack-WriteCR.
That is:: without requiring pipeline flushes and crap like that.
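
[The bracket being priced here, written out in C with fenv.h calls
standing in for the ReadCR/WriteCR pair; the choice of FE_UPWARD and
the callback are just placeholders.]

#include <fenv.h>

double with_round_upward(double (*work)(void))
{
    int saved = fegetround();     /* ReadCR, spilled like a stack slot */
    fesetround(FE_UPWARD);        /* the mode this routine wants       */
    double r = work();
    fesetround(saved);            /* reload and WriteCR on the way out */
    return r;
}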
> >
> > Having the precision and round mode bits on individual instructions
> > should eliminate almost all dynamic changes to FpControl.
> > The only users of dynamic round will be those who really
> > do want dynamic changes.
> That's over-optimistic IMO. There will be legacy code that does dynamic
> forever, and that code will expect legacy scope rules for what
> dynamic actions it takes. Unfortunately the best-behaved legacy code
> will pay the biggest cost: a function that carefully saves the mode on
> entry, sets it to a desired mode, and then restores the saved mode on
> all exits including exceptions has to pay for all those actions, but
> changing the source to use a static in-opcode mode costs software dev
> too - and won't be portable.
>
> Yes, mode-in-opcode is the way to go, but uptake in real software will
> be slow. Language standards can help, but they are hamstrung by the
> vendors or legacy systems. Politics and money trump engineering
> excellence all too often.
