Subject (Author)
* Why separate 32-bit arithmetic on a 64-bit architecture? (Thomas Koenig)
+* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (BGB)
|`* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (David Brown)
| +- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (BGB)
| `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Anton Ertl)
|  `- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Anton Ertl)
+* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Anton Ertl)
|`- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (BGB)
+* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
|`* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Marcus)
| +- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Marcus)
| `- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
+* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (EricP)
|+* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Thomas Koenig)
||`* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Thomas Koenig)
|| `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (EricP)
||  `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Thomas Koenig)
||   `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Quadibloc)
||    +* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Terje Mathisen)
||    |`* The cost of gradual underflow (was: Why separate 32-bit arithmetic on a 64-bit a (Stefan Monnier)
||    | `- Re: The cost of gradual underflow (Terje Mathisen)
||    +- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
||    `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (antispam)
||     +* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Terje Mathisen)
||     |`* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Michael S)
||     | +* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Terje Mathisen)
||     | |`* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Michael S)
||     | | +* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
||     | | |`* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Michael S)
||     | | | `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
||     | | |  `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Michael S)
||     | | |   +- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
||     | | |   `- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Anton Ertl)
||     | | `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Terje Mathisen)
||     | |  `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Michael S)
||     | |   `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
||     | |    `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Michael S)
||     | |     `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
||     | |      `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Michael S)
||     | |       `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
||     | |        `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Michael S)
||     | |         +* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Thomas Koenig)
||     | |         |+- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Quadibloc)
||     | |         |`- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Quadibloc)
||     | |         `- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Terje Mathisen)
||     | `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
||     |  +* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Quadibloc)
||     |  |+* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Quadibloc)
||     |  ||`- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
||     |  |+* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
||     |  ||`* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Quadibloc)
||     |  || `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
||     |  ||  `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Quadibloc)
||     |  ||   +* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
||     |  ||   |`* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Thomas Koenig)
||     |  ||   | `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Michael S)
||     |  ||   |  +* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Anton Ertl)
||     |  ||   |  |+- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (EricP)
||     |  ||   |  |`* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
||     |  ||   |  | `- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Michael S)
||     |  ||   |  `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Quadibloc)
||     |  ||   |   +- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Michael S)
||     |  ||   |   +* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Quadibloc)
||     |  ||   |   |`* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (George Neuner)
||     |  ||   |   | `- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Quadibloc)
||     |  ||   |   `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Quadibloc)
||     |  ||   |    +- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Quadibloc)
||     |  ||   |    +- Spectre ane EPIC (was: Why separate 32-bit arithmetic...) (Anton Ertl)
||     |  ||   |    +* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Michael S)
||     |  ||   |    |`* Spectre (was: Why separate 32-bit arithmetic ...) (Anton Ertl)
||     |  ||   |    | +* Re: Spectre (was: Why separate 32-bit arithmetic ...) (Michael S)
||     |  ||   |    | |+* Re: Spectre (EricP)
||     |  ||   |    | ||+* Re: Spectre (MitchAlsup)
||     |  ||   |    | |||`* Re: Spectre (EricP)
||     |  ||   |    | ||| `- Re: Spectre (MitchAlsup)
||     |  ||   |    | ||`- Re: Spectre (Anton Ertl)
||     |  ||   |    | |`* Re: Spectre (was: Why separate 32-bit arithmetic ...) (Anton Ertl)
||     |  ||   |    | | +* Re: Spectre (was: Why separate 32-bit arithmetic ...) (MitchAlsup)
||     |  ||   |    | | |`- Re: Spectre (was: Why separate 32-bit arithmetic ...) (Thomas Koenig)
||     |  ||   |    | | `- Re: Spectre (was: Why separate 32-bit arithmetic ...) (Anton Ertl)
||     |  ||   |    | +* Re: Spectre (EricP)
||     |  ||   |    | |`* Re: Spectre (Anton Ertl)
||     |  ||   |    | | +* Memory encryption (was: Spectre) (Thomas Koenig)
||     |  ||   |    | | |`* Re: Memory encryption (was: Spectre) (Anton Ertl)
||     |  ||   |    | | | `* Re: Memory encryption (was: Spectre) (Elijah Stone)
||     |  ||   |    | | |  +- Re: Memory encryption (was: Spectre) (Michael S)
||     |  ||   |    | | |  `* Re: Memory encryption (was: Spectre) (Anton Ertl)
||     |  ||   |    | | |   +- Re: Memory encryption (was: Spectre) (MitchAlsup)
||     |  ||   |    | | |   `* Re: Memory encryption (was: Spectre) (Thomas Koenig)
||     |  ||   |    | | |    `- Re: Memory encryption (was: Spectre) (Anton Ertl)
||     |  ||   |    | | `* Re: Spectre (Terje Mathisen)
||     |  ||   |    | |  `* Re: Spectre (Thomas Koenig)
||     |  ||   |    | |   +* Re: Spectre (Anton Ertl)
||     |  ||   |    | |   |`* Re: Spectre (Thomas Koenig)
||     |  ||   |    | |   | +- Re: Spectre (Anton Ertl)
||     |  ||   |    | |   | `- Re: Spectre (Michael S)
||     |  ||   |    | |   `- Re: Spectre (MitchAlsup)
||     |  ||   |    | `* Re: Spectre (was: Why separate 32-bit arithmetic ...) (MitchAlsup)
||     |  ||   |    |  `- Re: Spectre (was: Why separate 32-bit arithmetic ...) (Anton Ertl)
||     |  ||   |    `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Quadibloc)
||     |  ||   |     `- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Quadibloc)
||     |  ||   `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Anton Ertl)
||     |  |+- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Bill Findlay)
||     |  |`* Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture? (John Levine)
||     |  `- Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Michael S)
||     `* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (MitchAlsup)
|`* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Anton Ertl)
`* Re: Why separate 32-bit arithmetic on a 64-bit architecture? (Quadibloc)

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<b096b5cc-df83-4197-9de3-3212f0666ae2n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24310&group=comp.arch#24310

Newsgroups: comp.arch
Date: Fri, 18 Mar 2022 13:33:31 -0700 (PDT)
In-Reply-To: <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>
Message-ID: <b096b5cc-df83-4197-9de3-3212f0666ae2n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: jsav...@ecn.ab.ca (Quadibloc)
 by: Quadibloc - Fri, 18 Mar 2022 20:33 UTC

On Friday, March 18, 2022 at 2:16:36 PM UTC-6, Quadibloc wrote:

> There is cost, and then there is value.

And on reading up a bit more about this, I can see now what
the big problem with imprecise interrupts is.

Once your computer has a swap file - virtual memory - you
have to be able to recover from a page fault. So that interrupt
has to be precise.
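Below is a minimal Python sketch of why a page fault has to be restartable. It assumes a toy machine whose only instruction is a load and whose "swap file" is a dict. `Machine`, `PageFault`, `run` and `PAGE_SIZE` are hypothetical names that do not appear in the thread. The fault is raised before the load changes any state, and `pc` still names the faulting load. So the handler only has to page the data in and re-execute the same instruction.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for the toy MMU

class PageFault(Exception):
    """Raised when a load touches a page that is not resident."""
    def __init__(self, addr: int):
        super().__init__(f"page fault at {addr:#x}")
        self.addr = addr

@dataclass
class Machine:
    regs: dict = field(default_factory=dict)
    resident: set = field(default_factory=set)   # resident page numbers
    backing: dict = field(default_factory=dict)  # addr -> value (the "swap file")
    pc: int = 0
    faults: int = 0

    def load(self, addr: int) -> int:
        if addr // PAGE_SIZE not in self.resident:
            raise PageFault(addr)                # no state changed yet
        return self.backing.get(addr, 0)

def run(m: Machine, program: list) -> Machine:
    """Execute (dest_reg, addr) loads, servicing page faults as they occur."""
    while m.pc < len(program):
        dest, addr = program[m.pc]
        try:
            m.regs[dest] = m.load(addr)
        except PageFault as pf:
            # Precise fault: m.pc still names the faulting load and nothing
            # after it has run, so "page in, then retry" is all that's needed.
            m.faults += 1
            m.resident.add(pf.addr // PAGE_SIZE)
            continue                             # re-execute the same load
        m.pc += 1
    return m
```

Example: starting with nothing resident, `run(Machine(backing={0x1000: 7, 0x5008: 9}), [("r1", 0x1000), ("r2", 0x5008)])` takes two faults and ends with both registers loaded.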

Maybe there is an alternative. Instead of retrying the failed
instruction after the interrupt is serviced, the interrupt service
routine could use a special instruction to feed the memory access
into the saved complicated pipeline state. That would avoid the
2% - 3% performance cost of having precise interrupts given by
the paper I saw this in.

John Savard

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24311&group=comp.arch#24311

Newsgroups: comp.arch
Date: Fri, 18 Mar 2022 14:09:30 -0700 (PDT)
In-Reply-To: <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>
Message-ID: <bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Fri, 18 Mar 2022 21:09 UTC

On Friday, March 18, 2022 at 3:16:36 PM UTC-5, Quadibloc wrote:
> On Friday, March 18, 2022 at 12:43:53 PM UTC-6, MitchAlsup wrote:
>
> > It costs far less to satisfy the denorm problem in the FPU than it costs to solve
> > the precise exception problem when taking exceptions. Far far less. Yet you
> > would never accept the latter.
> There is cost, and then there is value.
>
> People mistakenly believe that solving the denormals problem in
> IEEE 754 floating-point has a significant cost, and so they're
> against bothering because computers got along fine for years
> without denormals.
>
> But exceptions were always precise before IBM came up with
> the Tomasulo algorithm and the Model 91. (Actually, not quite:
> the 6600 had imprecise interrupts too.) So this was a _change_
> that deprived people of something they were used to, and didn't
> have any idea of how to do without.
<
The CDC 6600 had NO exceptions other than touching memory outside
of base and bounds. Exceptional calculations simply produced results, much
like IEEE overflow to infinity, underflow to zero, and integer
wrapping (1's complement), ...
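The integer wrapping mentioned above can be sketched as one's-complement addition with end-around carry on a 60-bit word, the 6600's word length. This is a generic one's-complement adder, not a model of the 6600's actual adder circuit. In particular, it does not model how the real hardware handled negative zero. The function names are hypothetical.

```python
WORD_BITS = 60                    # CDC 6600 word length
MASK = (1 << WORD_BITS) - 1

def to_ones(x: int) -> int:
    """Encode a Python int as a 60-bit one's-complement word."""
    return x & MASK if x >= 0 else ~(-x) & MASK

def from_ones(w: int) -> int:
    """Decode a 60-bit one's-complement word; both zeros decode to 0."""
    if w >> (WORD_BITS - 1):      # sign bit set
        return -(~w & MASK)
    return w

def add_ones(a: int, b: int) -> int:
    """Add two words with end-around carry; overflow wraps, nothing traps."""
    s = a + b
    if s > MASK:                  # a carry out of the top bit...
        s = (s & MASK) + 1        # ...is added back in at the bottom
    return s
```

Adding 1 to the largest positive word, 2**59 - 1, silently yields -(2**59 - 1); nothing is signalled. Python's IEEE doubles are non-trapping in the same spirit: `1e308 * 10.0` evaluates to `inf` and `1e-308 * 1e-308` to `0.0`.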
<
Also note the CDC 6600 CPU did not respond to interrupts. The peripheral
processors responded to interrupts, performed I/O data movement,
did CPU scheduling, and when they knew what was next to run, the PPs
kicked the CPU in the shin and made it execute an exchange jump
instruction. Ten cycles later you were running a new thread in a new
context.
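Here is a hypothetical Python sketch of the division of labour described above. A peripheral-processor routine fields the interrupt, then has the CPU perform an exchange jump, which swaps the running context with a saved exchange package.

- The `p`, `ra` and `fl` fields loosely follow the 6600's program address, reference address and field length.
- `Context`, `CPU` and `pp_service_interrupt` are illustrative names.
- The 10-cycle timing is not modelled.

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """Saved CPU state, loosely modelled on an exchange package."""
    p: int = 0                # program address
    ra: int = 0               # reference (base) address
    fl: int = 0               # field length (bound)
    x: list = field(default_factory=lambda: [0] * 8)   # operand registers

class CPU:
    def __init__(self, ctx: Context):
        self.ctx = ctx

    def exchange_jump(self, memory: dict, addr: int) -> None:
        # Store the running context at `addr` and resume the one that was there.
        memory[addr], self.ctx = self.ctx, memory[addr]

def pp_service_interrupt(cpu: CPU, memory: dict, next_pkg: int) -> None:
    """The PP, not the CPU, handles the interrupt, does the I/O and scheduling,
    then 'kicks' the CPU into the context stored at `next_pkg`."""
    # ... I/O data movement and scheduling decisions would go here ...
    cpu.exchange_jump(memory, next_pkg)
```

Calling `pp_service_interrupt` twice with the same package address switches to the other context and then back. The CPU itself never runs any interrupt-handler code.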
<
Moral: the CPU does not have to be the one that responds to
interrupts.
>
> Initially, the Model 91 didn't have precise exceptions; the manual
> explained that there was this limitation of the machine compared
> to others.
>
> Incidentally, on a page about the history of this computer, I saw
> this quote from a memo at SLAC: "The Model 91 designers have
> provided a switch that puts the machine into non-overlapped
> mode, in which it runs at about Model 75 speed but with precise
> interrupts. This switch will be important for program debugging,
> and it can be set and reset by programming."
<
I thought that was the /195.
>
> This was cheaper than providing precise interrupts and out-of-order
> operation at the same time; a similar feature on modern machines
> might be useful as a Spectre mitigation - let only the untrusted
> code run slower, trusted code doesn't need it.
<
Trusted code does not need it only if you can prevent ROP (return-oriented programming) attacks.
>
> But the Model 91 was a usable computer as it was. So it is
> possible to have a computer capable of doing useful work, with
> I/O done by interrupts, even if the interrupts are imprecise.
>
> Actually, _that_ makes sense. Interrupts to service I/O should be
> totally invisible to the programs that are running concurrently.
<
Even if you interrupt it early, or let the whole pipeline drain, interrupts
are still precise.
<
It always amused me that CPU designers would go to great lengths
to transfer control to SW on an interrupt in as few cycles as possible
(as few as 5 cycles on some RISC machines) and then spend 50-100
cycles saving state before SW could get to the code that deals with why control
arrived there.
<
> So they shouldn't mess with them in any way. So it doesn't matter
> if the interrupt happens after an instruction, or in the middle of
> a complicated pipeline state, since as far as the running user mode
> programs are concerned, the interrupt could have been serviced by
> another core.
>
> It's when _traps_ are imprecise, like a divide by zero error, that it's
> a pain; now, the program has no choice but to fail, while with
> precise traps, the interrupt handler could include a customized
> fixup, and set the program to running again.
<
OK
<
You have used the words "exception", "interrupt" and "trap" almost interchangeably.
They are not interchangeable (and they are), depending on where inside the machine you look.
<
An exception is caused by an instruction which cannot be completed.
A trap is caused by an instruction doing exactly what it is supposed to do.
An interrupt is a call for help asynchronous to the program at hand.
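The three definitions above can be encoded as a small decision function. This is an illustrative Python sketch. The two boolean inputs are a simplification introduced here, and the example mappings in the note below are one interpretation of the definitions, not something stated in the post.

```python
from enum import Enum

class Event(Enum):
    EXCEPTION = "exception"   # an instruction cannot be completed
    TRAP = "trap"             # an instruction did exactly what it is supposed to
    INTERRUPT = "interrupt"   # asynchronous call for help

def classify(caused_by_instruction: bool, instruction_completed: bool) -> Event:
    """Classify a control transfer using the three definitions above."""
    if not caused_by_instruction:
        return Event.INTERRUPT  # asynchronous to the program at hand
    return Event.TRAP if instruction_completed else Event.EXCEPTION
```

Under these assumptions:

- A load that page-faults is an EXCEPTION: `classify(True, False)`.
- A system-call instruction that transfers control as designed is a TRAP: `classify(True, True)`.
- A disk-completion signal is an INTERRUPT: `classify(False, False)`.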
<
The MC88100 had imprecise precise exceptions. When an exception occurred,
enough state was latched to unload the machine, figure out what was going
on, fix the issues, and continue. But it did not have the 1 PC pointing at the
1 instruction which caused the 1 exception; it had the PC pointing at the next
instruction to execute after you fixed everything.
<
When control arrived at the exception handler, it could unload up to 5 problems.
It would then go about fixing all pending problems, and then it could get on
with running code.
<
EVEN without the 1 PC pointing at the 1 instruction which ultimately caused
the 1 exception control transfer, and without backing up the machine to become
precise, it ran DG UNIX "just fine".
<
The only applications which had issues were those that assumed that there
was 1 PC, 1 instruction, 1 exception, and long jumped out of the signal handler
instead of returning to the OS to see if there was another exception pending.
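Here is a hypothetical Python sketch of the handler discipline described above. The latched state holds up to five pending problems plus a single resume PC that points past all of them. A correct handler drains every problem before resuming. The second function mimics an application that assumes 1 PC, 1 instruction, 1 exception and escapes after the first fix, leaving work pending. The class and function names are illustrative, and nothing here models the 88100's actual register layout.

```python
from collections import deque

MAX_PENDING = 5   # "up to 5 problems", per the description above

class LatchedState:
    """Pending problems plus a resume PC that names no single culprit."""
    def __init__(self, problems, resume_pc):
        if len(problems) > MAX_PENDING:
            raise ValueError("more problems than the latches can hold")
        self.pending = deque(problems)
        self.resume_pc = resume_pc

def handle_all(state, fix):
    """Fix every pending problem, then resume at the latched PC."""
    fixed = []
    while state.pending:
        fixed.append(fix(state.pending.popleft()))
    return state.resume_pc, fixed

def handle_one_and_escape(state, fix):
    """Fix only the first problem, then bail out (like a longjmp out of a
    signal handler); whatever is still pending is returned unfixed."""
    first = fix(state.pending.popleft())
    return first, list(state.pending)
```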
>
> John Savard

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<a97a6b26-276e-44ab-b0db-a7ffe8dc314an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24312&group=comp.arch#24312

Newsgroups: comp.arch
Date: Fri, 18 Mar 2022 14:15:23 -0700 (PDT)
In-Reply-To: <b096b5cc-df83-4197-9de3-3212f0666ae2n@googlegroups.com>
Message-ID: <a97a6b26-276e-44ab-b0db-a7ffe8dc314an@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Fri, 18 Mar 2022 21:15 UTC

On Friday, March 18, 2022 at 3:33:33 PM UTC-5, Quadibloc wrote:
> On Friday, March 18, 2022 at 2:16:36 PM UTC-6, Quadibloc wrote:
>
> > There is cost, and then there is value.
> And on reading up a bit more about this, I can see now what
> the big problem with imprecise interrupts is.
>
> Once your computer has a swap file - virtual memory - you
> have to be able to recover from a page fault. So that interrupt
> has to be precise.
<
Not the kind of precision you think it needs; much less.
>
> Maybe there is an alternative. Instead of retrying the failed
> instruction after the interrupt is serviced, the interrupt service
> routine could use a special instruction to feed the memory access
> into the saved complicated pipeline state. That would avoid the
> 2% - 3% performance cost of having precise interrupts given by
> the paper I saw this in.
<
You can use the "completion" model or the "retry" model.
<
In the completion model, the exception handler performs the instruction,
updates the state of the thread (registers and memory), and returns to the
instruction after the excepting instruction.
<
In the retry model, you fix one thing, try again, take another exception, fix
that, try again, and so on until you have fixed everything about the instruction
and it makes forward progress again.
<
Both models work; the completion model does not need the precision
that people think they need, and can deal with k exceptions, interrupts, or
traps rather than just 1. This is a lot like a file system that sends out
disk requests and gets sectors returned in a different order than the
requests went out.
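The two models can be contrasted with a toy Python sketch. It assumes a single load whose only possible problems are non-resident pages, for instance a misaligned access straddling two pages. The function names and the page-residency model are hypothetical. The point is only that the retry model takes one exception per problem, while the completion model takes one exception per instruction.

```python
def problems_of(instr, state):
    """Everything currently stopping `instr` from completing: here, any
    page it touches that is not resident."""
    return [pg for pg in instr["pages"] if pg not in state["resident"]]

def execute(instr, state):
    state["regs"][instr["dest"]] = instr["value"]

def retry_model(instr, state):
    """Fix one thing per exception and re-execute until it completes."""
    exceptions = 0
    while missing := problems_of(instr, state):
        exceptions += 1
        state["resident"].add(missing[0])  # handler fixes only the first problem
    execute(instr, state)                  # the final retry succeeds
    return exceptions

def completion_model(instr, state):
    """On the first exception the handler fixes everything, performs the
    instruction itself, and returns past it."""
    missing = problems_of(instr, state)
    state["resident"].update(missing)
    execute(instr, state)
    return 1 if missing else 0
```

Take `ld = {"dest": "r1", "value": 42, "pages": [7, 8]}` with nothing resident. `retry_model` reports 2 exceptions and `completion_model` reports 1; both leave `r1 == 42`.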
>
> John Savard

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<0001HW.27E55CB7006C64B270000599B38F@news.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=24316&group=comp.arch#24316

From: findlayb...@blueyonder.co.uk (Bill Findlay)
Newsgroups: comp.arch
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
Date: Sat, 19 Mar 2022 00:35:03 +0000
Message-ID: <0001HW.27E55CB7006C64B270000599B38F@news.individual.net>
 by: Bill Findlay - Sat, 19 Mar 2022 00:35 UTC

On 18 Mar 2022, Quadibloc wrote
(in article <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>):

> But exceptions were always precise before IBM came up with
> the Tomasulo algorithm and the Model 91. (Actually, not quite:
> the 6600 had imprecise interrupts too.)
The KDF9 beat the 6600 by about 6 months.

--
Bill Findlay

Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?

<t13bi2$nbc$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=24317&group=comp.arch#24317

From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?
Date: Sat, 19 Mar 2022 01:27:30 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <t13bi2$nbc$1@gal.iecc.com>
 by: John Levine - Sat, 19 Mar 2022 01:27 UTC

According to Quadibloc <jsavard@ecn.ab.ca>:
>Initially, the Model 91 didn't have precise exceptions; the manual
>explained that there was this limitation of the machine compared
>to others.
>
>Incidentally, on a page about the history of this computer, I saw
>this quote from a memo at SLAC: "The Model 91 designers have
>provided a switch that puts the machine into non-overlapped
>mode, in which it runs at about Model 75 speed but with precise
>interrupts. This switch will be important for program debugging,
>and it can be set and reset by programming."

There was no such switch on the /91, although the manual says there
was one on the /195. You had to restart the machine to turn it on
or off, so I doubt it was used much.

On both machines there was a variety of no-op that waited for all
previous instructions to finish that you could use to get the effect
of a precise interrupt. I believe there was an option in the PL/I
compiler to generate them.

Having spent a summer working on Fortran programs on a 360/91, I
can say that the imprecise interrupts were annoying but not all
that hard to deal with. You'd look at the dump, figure out
what failed, and adjust the program either to check the data
or to rearrange it so the fault wouldn't happen.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?

<dbbf2425-d7f7-4bf0-8a09-d360eb619421n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24319&group=comp.arch#24319

Newsgroups: comp.arch
Date: Fri, 18 Mar 2022 21:07:22 -0700 (PDT)
In-Reply-To: <t13bi2$nbc$1@gal.iecc.com>
Message-ID: <dbbf2425-d7f7-4bf0-8a09-d360eb619421n@googlegroups.com>
Subject: Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?
From: jsav...@ecn.ab.ca (Quadibloc)
 by: Quadibloc - Sat, 19 Mar 2022 04:07 UTC

On Friday, March 18, 2022 at 7:27:33 PM UTC-6, John Levine wrote:

> On both machines there was a variety of no-op that waited for all
> previous instructions to finish

Yes, there was, I saw that in my search to confirm the 91's interrupts
were imprecise.

> that you could use to get the effect
> of a precise interrupt.

No. While you could indeed use that instruction to get
an interrupt after an instruction, instead of an interrupt in the middle
of a complicated pipeline state with many instructions in flight, the
instruction you interrupted at wouldn't be the _right_ instruction.

If there was an interrupt because of a memory access instruction
having a page fault, you need *that instruction* to be pointed to by
the return address. The no-op would flush all instructions in the
pipeline when the interrupt happened, so your return address would
be several instructions later.
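Here is a toy Python model of the return-address problem. It assumes that `in_flight` later instructions have already issued and are allowed to finish before the interrupt is taken. The function name and the parameters are hypothetical, and the numbers are illustrative only. A restartable page fault needs the returned address to equal `fault_index`. Letting the pipeline drain instead returns an address past every instruction that finished.

```python
def resume_address(fault_index: int, in_flight: int, precise: bool) -> int:
    """Address an interrupt hands back as the return address (toy model).

    fault_index: index of the instruction that page-faulted
    in_flight:   later instructions that had already issued and are
                 allowed to finish before the interrupt is taken
    """
    if precise:
        return fault_index                    # re-execute the faulting load
    return fault_index + 1 + in_flight        # past everything that drained
```

With a fault at index 3 and four instructions in flight, the precise case returns 3 and the drained case returns 8. Even with `in_flight=0`, the drained case returns 4, so the faulting load would be skipped rather than retried.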

The same problem arises if you want to fix an arithmetic underflow,
or simulate an additional instruction when an unimplemented opcode
is hit.

John Savard

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24320&group=comp.arch#24320

Newsgroups: comp.arch
Date: Fri, 18 Mar 2022 21:17:14 -0700 (PDT)
In-Reply-To: <bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com>
Message-ID: <bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: jsav...@ecn.ab.ca (Quadibloc)
 by: Quadibloc - Sat, 19 Mar 2022 04:17 UTC

On Friday, March 18, 2022 at 3:09:33 PM UTC-6, MitchAlsup wrote:
> On Friday, March 18, 2022 at 3:16:36 PM UTC-5, Quadibloc wrote:

> > This was cheaper than providing precise interrupts and out-of-order
> > operation at the same time; a similar feature on modern machines
> > might be useful as a Spectre mitigation - let only the untrusted
> > code run slower, trusted code doesn't need it.

> Trusted code does not need it only if you can prevent ROP attacks.

It is true that if one wishes to be serious about security, on a system
where all kinds of attacks are possible against the operating system,
it's better to mitigate Spectre everywhere you can.

Basically, though, if your operating system _were_ so good that
Spectre was the *only thing* that let anyone attack it, then
preventing untrusted code from making use of Spectre would leave
no attack vector through which untrusted code could
masquerade as trusted code.

But in the real world, surely that's never true?

That may be. But if the required mitigations come with a _heavy
performance cost_ and the system is performance-critical and
cost-critical, this lower level of security, combined with a better-quality
operating system and alert and trained users, may be appropriate.

Also, another possibility is: use the best mitigations possible that
come with a "reasonable" performance cost all the time - but use
a 'perfect' mitigation - running the system in 486 mode - that has
a heavy performance cost for untrusted code only. If Spectre _is_
the major security hole in the system, and attackers _are_ exploiting
forms of the attack that can't be mitigated without excessive performance
cost, what other choice is there?

John Savard

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24323&group=comp.arch#24323

 by: MitchAlsup - Sat, 19 Mar 2022 16:16 UTC

On Friday, March 18, 2022 at 11:17:16 PM UTC-5, Quadibloc wrote:
> On Friday, March 18, 2022 at 3:09:33 PM UTC-6, MitchAlsup wrote:
> > On Friday, March 18, 2022 at 3:16:36 PM UTC-5, Quadibloc wrote:
>
> > > This was cheaper than providing precise interrupts and out-of-order
> > > operation at the same time; a similar feature on modern machines
> > > might be useful as a Spectre mitigation - let only the untrusted
> > > code run slower, trusted code doesn't need it.
>
> > Trusted code does not need it only if you can prevent ROP attacks.
> It is true that if one wishes to be serious about security, on a system
> where all kinds of attacks are possible against the operating system,
> it's better to mitigate Spectre everywhere you can.
<
Yes,
>
> Basically, though, if your operating system _were_ so good that
> Spectre was the *only thing* that let anyone attack it, then
> preventing untrusted code from making use of Spectre would leave
> no attack vector through which untrusted code could
> masquerade as trusted code.
>
> But in the real world, surely that's never true?
>
> That may be. But if the required mitigations come with a _heavy
> performance cost_ and the system is performance-critical and
> cost-critical, this lower level of security, combined with a better-quality
> operating system and alert and trained users, may be appropriate.
<
What if the mitigations do not carry such a heavy cost, say on the
order of 1%?
>
> Also, another possibility is: use the best mitigations possible that
> come with a "reasonable" performance cost all the time - but use
> a 'perfect' mitigation - running the system in 486 mode - that has
> a heavy performance cost for untrusted code only. If Spectre _is_
> the major security hole in the system, and attackers _are_ exploiting
> forms of the attack that can't be mitigated without excessive performance
> cost, what other choice is there?
<
Build machines that are immune from Spectré.
>
> John Savard

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<t154mh$ck1$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=24324&group=comp.arch#24324

 by: antis...@math.uni.wroc.pl - Sat, 19 Mar 2022 17:42 UTC

MitchAlsup <MitchAlsup@aol.com> wrote:
> On Thursday, March 17, 2022 at 8:24:14 PM UTC-5, anti...@math.uni.wroc.pl wrote:
> > Quadibloc <jsa...@ecn.ab.ca> wrote:
> > > On Wednesday, January 26, 2022 at 12:05:13 AM UTC-7, Thomas Koenig wrote:
> > >
> > > > I especially like the sentence
> > > >
> > > > DEC's main advocate on the IEEE p754 Committee was a Mathematician
> > > > and Numerical Analyst Dr. Mary H. Payne. She was experienced,
> > > > fully competent,
> > > >
> > > > and misled by DEC's hardware engineers
> > > >
> > > > when they assured her that Gradual Underflow was unimplementable
> > > > as the default (no trap) of any computer arithmetic aspiring to
> > > > high performance.
> > >
> > > That is true, but, although I may be mistaken, it appeared to me that
> > > while one certainly _could_ have a high-performance floating-point
> > > implementation which fully supported IEEE 754 gradual underflow,
> > >
> > > one would have to do it by representing floating-point numbers in an
> > > "internal form" when they're in registers, like the way the 8087 did it,
> > > where the internal form was like an old-style plain floating-point format,
> > > but with a range larger than the architectural floating-point format, so
> > > that it covered all the gradual underflow territory.
> > >
> > > And, of course, if you do it that way, you can't properly and accurately
> > > trap when a register-to-register calculation underflows, because that isn't
> > > easily visible any longer. Instead, you find out when you store the result
> > > in memory.
> >
> > Well, "internal format" really means one extra exponent bit.
> > Detecting underflow with such a format presents no extra
> > difficulty. The real problem is that one has to round the
> > result properly so that it does not have more precision than
> > the denormal representation allows. AFAICS rounding after addition
> > does not present a problem. But rounding after multiplication
> > or division may be problematic: instead of rounding at a
> > fixed binary position one needs rounding at a variable position.
> >
> > > So if you _exclude_ the idea of doing calculations on an internal form
> > > of numbers, because it's problematic, then indeed gradual underflow will
> > > obstruct high performance.
> >
> > Well, there is a need for extra hardware in the data path. Depending
> > on the organization of the FPU, almost all hardware needed for denormals
> > may already be there. Probably the worst case is an FPU with separate
> > load/store unit, adder and multiplier. When operating internally
> > on normalized numbers such an FPU needs an extra exponent bit, a
> > shifter in the load/store unit (not needed otherwise), and extra
> > rounding circuitry after multiplication.
> <
> Once your FPU design is centered around FMAC, the added circuitry
> is about 1.2-gates per bit of accumulator width and 1 gate delay of
> added latency.
> <
> The FMUL unit in Opteron was ~60-gates of delay
> The FADD unit of Opteron was ~50-gates of delay
> With a cycle time of 16-gates of logic per cycle.

OK, using your numbers the cost is about 1.6% of latency. Now look
at the gain (for simplicity only for single precision; for double
precision the relative gain is smaller): the extra representable numbers
are worth less than 0.13 extra exponent bits, that is, a 0.4% improvement.
And _everyone_ must pay the latency cost, while denormals gain you
this 0.4% only if you use them.
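The cost/benefit figures above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the 1-gate cost and the ~50/~60-gate unit delays are the approximate Opteron figures quoted in this thread, and the "gain" is taken to be the extra dynamic range denormals give binary32 (23 extra binades below the 254 normal ones), expressed as extra exponent bits and then spread over a 32-bit word. The variable names are illustrative only.

```python
import math

# Approximate figures quoted in the thread (gates of delay, Opteron).
FMUL_GATES = 60
FADD_GATES = 50
DENORMAL_EXTRA_GATES = 1  # estimate for an FMAC-centered FPU

# Latency cost as a fraction of each unit's delay.
cost_fmul = DENORMAL_EXTRA_GATES / FMUL_GATES
cost_fadd = DENORMAL_EXTRA_GATES / FADD_GATES

# binary32: biased exponents 1..254 give 254 binades of normal numbers;
# denormals add one binade per fraction bit, i.e. 23 more below 2**-126.
NORMAL_BINADES = 254
DENORMAL_BINADES = 23

# Exponent bits needed to index 277 binades instead of 254.
extra_exponent_bits = math.log2(NORMAL_BINADES + DENORMAL_BINADES) - math.log2(NORMAL_BINADES)

# Spread over the 32 bits of the format, as in the post.
gain_per_word = extra_exponent_bits / 32

print(f"latency cost: {cost_fmul:.2%} (FMUL), {cost_fadd:.2%} (FADD)")
print(f"extra exponent bits: {extra_exponent_bits:.3f}")
print(f"gain per 32-bit word: {gain_per_word:.2%}")
```

It prints about 1.67% (FMUL) and 2.00% (FADD), 0.125 extra exponent bits, and a 0.39% gain per word, close to the "about 1.6%", "less than 0.13" and "0.4%" figures in the post.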

> <
> <
> > This may increase load/store latency and multiplication latency.
> > AFAICS allowing denormals internally means slightly modified
> > arithmetic units to avoid inserting the hidden bit for denormals.
> > For multiplication and division there is again the problem of
> > rounding, plus the need to shift (to get the denormal representation).
> >
> > Neither case looks like a very big problem. OTOH
> > extra gates may lower the clock frequency by a few percent or
> > add an extra clock of latency.
> >
> > I find denormals of almost no use, so even a low extra cost
> > seems too high. But other folks may prefer a different
> > tradeoff.
> <
> That was my position in 1985; I have been edumacated to the
> facts that denorms are not "expensive" and that it is easier to obey
> the std than to hand-wave around not needing to fully comply.

You write "obey the std". IMO the standard is bad here (well,
this looks like a typical mistake made when decisions are made
by committee). I understand that some programs may depend
on denormals. Using traps and software emulation is probably
the best way of handling such programs. If authors care
about speed they will fix their programs to avoid denormals. If
authors do not care, then presumably emulation is fast
enough.
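The internal-format scheme discussed in the quoted text above (one extra exponent bit, a normalizing shifter on load, and rounding at a variable bit position) can be modeled in software. This is a minimal sketch assuming IEEE 754 binary32 with round-to-nearest-even; `to_internal`, `internal_value`, `round_to_binary32` and `f32` are hypothetical helpers, not anything from the thread, and Inf/NaN and overflow are deliberately not handled. The `struct` round trip is used only as a reference, relying on the platform's double-to-float conversion.

```python
import math
import struct

def to_internal(bits: int) -> tuple[int, int, int]:
    """Unpack binary32 bits into (sign, exponent, 24-bit significand).

    The significand's leading bit is always at position 23, so a denormal
    gets an exponent below -126 (down to -149). That wider exponent range
    is what the extra internal exponent bit pays for; the shift loop plays
    the role of the load-path shifter. Inf/NaN are out of scope.
    """
    sign = bits >> 31
    exp_field = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp_field == 0xFF:
        raise ValueError("Inf/NaN not modeled")
    if exp_field != 0:
        return sign, exp_field - 127, frac | (1 << 23)
    if frac == 0:
        return sign, 0, 0  # signed zero
    exp = -126
    while frac < (1 << 23):  # normalize the denormal
        frac <<= 1
        exp -= 1
    return sign, exp, frac

def internal_value(sign: int, exp: int, sig: int) -> float:
    """Value of an internal-form number: (-1)**sign * sig * 2**(exp - 23)."""
    return math.copysign(math.ldexp(sig, exp - 23), -1.0 if sign else 1.0)

def round_to_binary32(x: float) -> float:
    """Round a double to the nearest binary32 value, ties to even.

    The rounding position depends on the result: 23 bits below the
    leading bit for normal results, but pinned at 2**-149 (the denormal
    quantum) for results below 2**-126. Overflow is not handled.
    """
    _, e = math.frexp(x)           # |x| lies in [2**(e-1), 2**e)
    ulp_exp = max(e - 24, -149)    # the variable rounding position
    q = math.ldexp(x, -ulp_exp)    # exact: scaling by a power of two
    return math.copysign(math.ldexp(round(q), ulp_exp), x)  # round() is half-even

def f32(x: float) -> float:
    """Reference rounding via struct (the platform's double-to-float cast)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# A binary32 product is exact in a double (24 + 24 bits <= 53), so rounding
# it once models a multiplier whose result lands in the denormal range.
a = f32(1.5 * 2.0**-75)
b = f32(1.25 * 2.0**-75)
print(round_to_binary32(a * b) == f32(a * b), round_to_binary32(a * b) == 2.0**-149)
# prints: True True
```

The smallest denormal unpacks to exponent -149, which does not fit the 8-bit biased range of normal binary32 exponents; that is the "one extra exponent bit" in concrete form.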

--
Waldek Hebisch

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<t156a5$cq9$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24325&group=comp.arch#24325

 by: Ivan Godard - Sat, 19 Mar 2022 18:10 UTC

On 3/19/2022 10:42 AM, antispam@math.uni.wroc.pl wrote:
> MitchAlsup <MitchAlsup@aol.com> wrote:
>> On Thursday, March 17, 2022 at 8:24:14 PM UTC-5, anti...@math.uni.wroc.pl wrote:
>>> Quadibloc <jsa...@ecn.ab.ca> wrote:
>>>> On Wednesday, January 26, 2022 at 12:05:13 AM UTC-7, Thomas Koenig wrote:
>>>>
>>>>> I especially like the sentence
>>>>>
>>>>> DEC's main advocate on the IEEE p754 Committee was a Mathematician
>>>>> and Numerical Analyst Dr. Mary H. Payne. She was experienced,
>>>>> fully competent,
>>>>>
>>>>> and misled by DEC's hardware engineers
>>>>>
>>>>> when they assured her that Gradual Underflow was unimplementable
>>>>> as the default (no trap) of any computer arithmetic aspiring to
>>>>> high performance.
>>>>
>>>> That is true, but, although I may be mistaken, it appeared to me that
>>>> while one certainly _could_ have a high-performance floating-point
>>>> implementation which fully supported IEEE 754 gradual underflow,
>>>>
>>>> one would have to do it by representing floating-point numbers in an
>>>> "internal form" when they're in registers, like the way the 8087 did it,
>>>> where the internal form was like an old-style plain floating-point format,
>>>> but with a range larger than the architectural floating-point format, so
>>>> that it covered all the gradual underflow territory.
>>>>
>>>> And, of course, if you do it that way, you can't properly and accurately
>>>> trap when a register-to-register calculation underflows, because that isn't
>>>> easily visible any longer. Instead, you find out when you store the result
>>>> in memory.
>>>
>>> Well, "internal format" really means one extra exponent bit.
>>> Detecting underflow with such a format presents no extra
>>> difficulty. The real problem is that one has to round the
>>> result properly so that it does not have more precision than
>>> the denormal representation allows. AFAICS rounding after addition
>>> does not present a problem. But rounding after multiplication
>>> or division may be problematic: instead of rounding at a
>>> fixed binary position one needs rounding at a variable position.
>>>
>>>> So if you _exclude_ the idea of doing calculations on an internal form
>>>> of numbers, because it's problematic, then indeed gradual underflow will
>>>> obstruct high performance.
>>>
>>> Well, there is a need for extra hardware in the data path. Depending
>>> on the organization of the FPU, almost all hardware needed for denormals
>>> may already be there. Probably the worst case is an FPU with separate
>>> load/store unit, adder and multiplier. When operating internally
>>> on normalized numbers such an FPU needs an extra exponent bit, a
>>> shifter in the load/store unit (not needed otherwise), and extra
>>> rounding circuitry after multiplication.
>> <
>> Once your FPU design is centered around FMAC, the added circuitry
>> is about 1.2-gates per bit of accumulator width and 1 gate delay of
>> added latency.
>> <
>> The FMUL unit in Opteron was ~60-gates of delay
>> The FADD unit of Opteron was ~50-gates of delay
>> With a cycle time of 16-gates of logic per cycle.
>
> OK, using your numbers the cost is about 1.6% of latency. Now look
> at the gain (for simplicity only for single precision; for double
> precision the relative gain is smaller): the extra representable numbers
> are worth less than 0.13 extra exponent bits, that is, a 0.4% improvement.
> And _everyone_ must pay the latency cost, while denormals gain you
> this 0.4% only if you use them.
>
>> <
>> <
>>> This may increase load/store latency and multiplication latency.
>>> AFAICS allowing denormals internally means slightly modified
>>> arithmetic units to avoid inserting the hidden bit for denormals.
>>> For multiplication and division there is again the problem of
>>> rounding, plus the need to shift (to get the denormal representation).
>>>
>>> Neither case looks like a very big problem. OTOH
>>> extra gates may lower the clock frequency by a few percent or
>>> add an extra clock of latency.
>>>
>>> I find denormals of almost no use, so even a low extra cost
>>> seems too high. But other folks may prefer a different
>>> tradeoff.
>> <
>> That was my position in 1985; I have been edumacated to the
>> facts that denorms are not "expensive" and that it is easier to obey
>> the std than to hand-wave around not needing to fully comply.
>
> You write "obey the std". IMO the standard is bad here (well,
> this looks like a typical mistake made when decisions are made
> by committee). I understand that some programs may depend
> on denormals. Using traps and software emulation is probably
> the best way of handling such programs. If authors care
> about speed they will fix their programs to avoid denormals. If
> authors do not care, then presumably emulation is fast
> enough.

Not all increases in gate count lead to increases in cycle count.
Digital circuits have a gate-count granularity; if the increase in gates
just uses gates that would otherwise be wasted in the final cycle,
then the actual cost of denormal support is zero.

And the designer has some control over granularity. If it happens that
denormals push a performance-critical instruction over a cycle boundary,
then the number of gates per cycle can be adjusted, balanced if necessary
against the clock rate. However, it is extremely unlikely that any FP
instruction is so critical in the market - far more likely is that the
latency to the L1 data cache (D$1) is what chooses the cycle gate-width.
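The granularity argument can be made concrete with a toy model, assumed here purely for illustration: a unit with g gates of delay needs ceil(g / w) pipeline stages at w gates per cycle, ignoring latch overhead and wire delay. Using the approximate Opteron figures quoted above (FADD ~50, FMUL ~60 gates) and assuming one extra gate for denormal support, the sketch lists which cycle widths between 12 and 20 gates would actually gain a stage. `stages` and `widths_hurt_by_extra` are hypothetical names.

```python
# Toy model: stages = ceil(gates / gates_per_cycle); latch overhead ignored.
UNIT_GATES = {"FADD": 50, "FMUL": 60}  # approximate Opteron figures from the thread
DENORMAL_EXTRA = 1                     # assumed extra gate of delay

def stages(gates: int, per_cycle: int) -> int:
    """Number of pipeline stages needed (ceiling division)."""
    return -(-gates // per_cycle)

def widths_hurt_by_extra(unit_gates: dict, extra: int, widths) -> dict:
    """Map each cycle width to the units that gain a stage from `extra` gates."""
    hurt = {}
    for w in widths:
        slower = [name for name, g in unit_gates.items()
                  if stages(g + extra, w) > stages(g, w)]
        if slower:
            hurt[w] = slower
    return hurt

for w, names in widths_hurt_by_extra(UNIT_GATES, DENORMAL_EXTRA, range(12, 21)).items():
    print(f"{w} gates/cycle: one extra gate adds a stage to {', '.join(names)}")
```

Under this model only 12, 15 and 20 gates per cycle are affected, and only FMUL, because 60 divides evenly by those widths and any extra gate spills into a new stage; at 16 gates per cycle neither unit changes. The model says nothing about area or power.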

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<6d86a7b6-0e7b-40b0-95ae-d31e7ec16244n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24326&group=comp.arch#24326

 by: MitchAlsup - Sat, 19 Mar 2022 18:10 UTC

On Saturday, March 19, 2022 at 12:42:47 PM UTC-5, anti...@math.uni.wroc.pl wrote:
> MitchAlsup <Mitch...@aol.com> wrote:
> > On Thursday, March 17, 2022 at 8:24:14 PM UTC-5, anti...@math.uni.wroc.pl wrote:
> > > Quadibloc <jsa...@ecn.ab.ca> wrote:
> > > > On Wednesday, January 26, 2022 at 12:05:13 AM UTC-7, Thomas Koenig wrote:
> > > >
> > > > > I especially like the sentence
> > > > >
> > > > > DEC's main advocate on the IEEE p754 Committee was a Mathematician
> > > > > and Numerical Analyst Dr. Mary H. Payne. She was experienced,
> > > > > fully competent,
> > > > >
> > > > > and misled by DEC's hardware engineers
> > > > >
> > > > > when they assured her that Gradual Underflow was unimplementable
> > > > > as the default (no trap) of any computer arithmetic aspiring to
> > > > > high performance.
> > > >
> > > > That is true, but, although I may be mistaken, it appeared to me that
> > > > while one certainly _could_ have a high-performance floating-point
> > > > implementation which fully supported IEEE 754 gradual underflow,
> > > >
> > > > one would have to do it by representing floating-point numbers in an
> > > > "internal form" when they're in registers, like the way the 8087 did it,
> > > > where the internal form was like an old-style plain floating-point format,
> > > > but with a range larger than the architectural floating-point format, so
> > > > that it covered all the gradual underflow territory.
> > > >
> > > > And, of course, if you do it that way, you can't properly and accurately
> > > > trap when a register-to-register calculation underflows, because that isn't
> > > > easily visible any longer. Instead, you find out when you store the result
> > > > in memory.
> > >
> > > Well, "internal format" really means one extra exponent bit.
> > > Detecting underflow with such a format presents no extra
> > > difficulty. The real problem is that one has to round the
> > > result properly so that it does not have more precision than
> > > the denormal representation allows. AFAICS rounding after addition
> > > does not present a problem. But rounding after multiplication
> > > or division may be problematic: instead of rounding at a
> > > fixed binary position one needs rounding at a variable position.
> > >
> > > > So if you _exclude_ the idea of doing calculations on an internal form
> > > > of numbers, because it's problematic, then indeed gradual underflow will
> > > > obstruct high performance.
> > >
> > > Well, there is a need for extra hardware in the data path. Depending
> > > on the organization of the FPU, almost all hardware needed for denormals
> > > may already be there. Probably the worst case is an FPU with separate
> > > load/store unit, adder and multiplier. When operating internally
> > > on normalized numbers such an FPU needs an extra exponent bit, a
> > > shifter in the load/store unit (not needed otherwise), and extra
> > > rounding circuitry after multiplication.
> > <
> > Once your FPU design is centered around FMAC, the added circuitry
> > is about 1.2-gates per bit of accumulator width and 1 gate delay of
> > added latency.
> > <
> > The FMUL unit in Opteron was ~60-gates of delay
> > The FADD unit of Opteron was ~50-gates of delay
> > With a cycle time of 16-gates of logic per cycle.
> OK, using your numbers the cost is about 1.6% of latency. Now look
<
That is not how it works.
<
With a 16-gate machine, you have 16, 32, 48, and 64 gates as design
targets. It would take a lot of pounding to get FADD down to 48 gates
and thereby 3 cycles, and it is impossible to get FMUL down to 3 cycles,
so FMUL has 4 more gates of logic (60 of 64) where you can do stuff.
Thus: 4 easy cycles for FADD, 4 not-so-hard cycles for FMUL, and the
FP pipeline is uniformly 4 cycles long, making the overall FPU easier
to design.
<
You can do the same analysis with any gates of delay you like.
<
I should also add that an FMAC is 62 gates of delay. This limits
some of the gates-per-cycle decisions that are viable.
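The cycle budgeting above can be written down the same way. This is a minimal sketch assuming the figures from these posts (FADD ~50, FMUL ~60, FMAC 62 gates of delay, 16 gates per cycle) and ignoring latch overhead: for each unit it computes the number of cycles and the spare gates left in the last cycle, i.e. the room available for extras such as denormal handling. `budget` is a hypothetical helper.

```python
GATES_PER_CYCLE = 16
UNIT_GATES = {"FADD": 50, "FMUL": 60, "FMAC": 62}  # approximate figures from the thread

def budget(gates: int, per_cycle: int) -> tuple[int, int]:
    """Return (cycles, spare gates in the last cycle) for a unit's delay."""
    cycles = -(-gates // per_cycle)  # ceiling division
    return cycles, cycles * per_cycle - gates

for name, gates in UNIT_GATES.items():
    cycles, spare = budget(gates, GATES_PER_CYCLE)
    print(f"{name}: {gates} gates -> {cycles} cycles, {spare} spare gates")

# Gates FADD would have to lose to fit in 3 cycles (48 gates).
fadd_cut_for_3_cycles = UNIT_GATES["FADD"] - 3 * GATES_PER_CYCLE
```

FMUL's 4 spare gates correspond to the "4 more gates" above, and FADD would have to shed 2 gates to reach 3 cycles. The 62-gate FMAC leaves only 2 spare gates at 16 gates per cycle, which illustrates why it constrains the viable cycle widths, although under this simple model a single extra gate still fits.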
<
> at the gain (for simplicity only for single precision; for double
> precision the relative gain is smaller): the extra representable numbers
> are worth less than 0.13 extra exponent bits, that is, a 0.4% improvement.
> And _everyone_ must pay the latency cost, while denormals gain you
> this 0.4% only if you use them.
> > <
> > <
> > > This may increase load/store latency and multiplication latency.
> > > AFAICS allowing denormals internally means slightly modified
> > > arithmetic units to avoid inserting the hidden bit for denormals.
> > > For multiplication and division there is again the problem of
> > > rounding, plus the need to shift (to get the denormal representation).
> > >
> > > Neither case looks like a very big problem. OTOH
> > > extra gates may lower the clock frequency by a few percent or
> > > add an extra clock of latency.
> > >
> > > I find denormals of almost no use, so even a low extra cost
> > > seems too high. But other folks may prefer a different
> > > tradeoff.
> > <
> > That was my position in 1985; I have been edumacated to the
> > facts that denorms are not "expensive" and that it is easier to obey
> > the std than to hand-wave around not needing to fully comply.
<
> You write "obey the std". IMO the standard is bad here (well,
> this looks like a typical mistake made when decisions are made
> by committee). I understand that some programs may depend
> on denormals. Using traps and software emulation is probably
> the best way of handling such programs. If authors care
> about speed they will fix their programs to avoid denormals. If
> authors do not care, then presumably emulation is fast
> enough.
<
a) I can agree that the std is baroque and has stuff in it that
should not be there. But it is there, and some people (i.e., fools)
are dependent upon those poor ideas being in the std.
b) therefore change the std, introduce a new std and have
people/applications migrate to the new std.
c) but do not argue that the current std is too hard to
implement. That part is not true.
>
> --
> Waldek Hebisch

Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?

<t156o6$30a$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=24327&group=comp.arch#24327

 by: John Levine - Sat, 19 Mar 2022 18:17 UTC

According to Quadibloc <jsavard@ecn.ab.ca>:
>On Friday, March 18, 2022 at 7:27:33 PM UTC-6, John Levine wrote:
>
>> On both machines there was a variety of no-op that waited for all
>> previous instructions to finish
>
>Yes, there was, I saw that in my search to confirm the 91's interrupts
>were imprecise.
>
>> that you could use to get the effect of a precise interrupt.
>
>No. Because while you could indeed use that instruction to get
>an interrupt after an instruction, instead of an interrupt in the middle
>of a complicated pipeline state with many instructions in flight, the
>instruction you interrupted at wouldn't be the _right_ instruction.

I said the effect of a precise interrupt, not an actual precise interrupt.
If you put the flushing no-op after each instruction that might fault, you
just subtract 2 to get the address of the faulting instruction. Nobody said
this would be fast.

>If there was an interrupt because of a memory access instruction
>having a page fault, ...

A what? What is a "page fault"? If you're looking for the 360/67, it's down the
hall on your right.

On the /91 and /195, the imprecise interrupts you might want to
recover from were floating point overflow and underflow. Anything else
was either an addressing error, which would mean a program bug, or a
zero divide or fixed point overflow, which was easy to test for and
prevent.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<0042f1d2-b53c-47e3-86df-13f684f517a0n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24330&group=comp.arch#24330

 by: Michael S - Sat, 19 Mar 2022 20:14 UTC

On Friday, March 18, 2022 at 8:43:53 PM UTC+2, MitchAlsup wrote:
> On Friday, March 18, 2022 at 10:27:35 AM UTC-5, Michael S wrote:
> > On Friday, March 18, 2022 at 10:53:03 AM UTC+2, Terje Mathisen wrote:
> > > anti...@math.uni.wroc.pl wrote:
> > > > Quadibloc <jsa...@ecn.ab.ca> wrote:
> > > >> On Wednesday, January 26, 2022 at 12:05:13 AM UTC-7, Thomas Koenig wrote:
> > > >>
> > > >>> I especially like the sentence
> > > >>>
> > > >>> DEC's main advocate on the IEEE p754 Committee was a Mathematician
> > > >>> and Numerical Analyst Dr. Mary H. Payne. She was experienced,
> > > >>> fully competent,
> > > >>>
> > > >>> and misled by DEC's hardware engineers
> > > >>>
> > > >>> when they assured her that Gradual Underflow was unimplementable
> > > >>> as the default (no trap) of any computer arithmetic aspiring to
> > > >>> high performance.
> > > >>
> > > >> That is true, but, although I may be mistaken, it appeared to me that
> > > >> while one certainly _could_ have a high-performance floating-point
> > > >> implementation which fully supported IEEE 754 gradual underflow,
> > > >>
> > > >> one would have to do it by representing floating-point numbers in an
> > > >> "internal form" when they're in registers, like the way the 8087 did it,
> > > >> where the internal form was like an old-style plain floating-point format,
> > > >> but with a range larger than the architectural floating-point format, so
> > > >> that it covered all the gradual underflow territory.
> > > >>
> > > >> And, of course, if you do it that way, you can't properly and accurately
> > > >> trap when a register-to-register calculation underflows, because that isn't
> > > >> easily visible any longer. Instead, you find out when you store the result
> > > >> in memory.
> > > >
> > > > Well, "internal format" really means one extra exponent bit.
> > > > Detecting underflow with such a format does not present extra
> > > > difficulty. The real problem is that one has to properly round the
> > > > result so that it does not have more accuracy than allowed
> > > > by the denormal representation. AFAICS rounding after addition
> > > > does not present a problem. But rounding after multiplication
> > > > or division may be problematic: basically, instead of rounding
> > > > at a fixed binary position one needs rounding at a variable position.
> > > >
> > > >> So if you _exclude_ the idea of doing calculations on an internal form
> > > >> of numbers, because it's problematic, then indeed gradual underflow will
> > > >> obstruct high performance.
> > > >
> > > > Well, there is a need for extra hardware in the data path. Depending
> > > > on the organization of the FPU, almost all hardware needed for denormals
> > > > may already be there. Probably the worst case is an FPU with a separate
> > > > load/store unit, adder and multiplier. When operating internally
> > > > on normalized numbers, such an FPU needs an extra exponent bit,
> > > > a shifter in the load/store unit (not needed otherwise), and extra
> > > > rounding circuitry after multiplication. This may increase
> > > > load/store latency and multiplication latency. AFAICS allowing
> > > > denormals internally means slightly modified arithmetic units
> > > > to avoid inserting the hidden bit for denormals. For multiplication
> > > > and division there is again the problem of rounding, plus the need to
> > > > shift (to get the denormal representation).
> > > >
> > > > Neither case looks like a very big problem. OTOH
> > > > extra gates may lower clock frequency by a few percent or
> > > > add an extra clock of latency.
> > > >
> > > > I find denormals of almost no use, so even a low extra cost
> > > > seems too high. But other folks may prefer a different
> > > > tradeoff.
> > > I seldom _need_ subnormals myself, but I have been told that there exist
> > > solvers/zero finders that fail to stabilize (you end up
> > > flipping back & forth between two values) unless you have subnormals.
> > >
> > That's what I expect for the "chord&tangent" solver, the one I was taught back in high school,
> > even before we were "officially" taught derivatives in math class.
> > The same probably applies to the pure tangent (i.e. Newton) method, but here I'm less sure.
> > > That, plus the fact (which you mention above) that any FMAC-supporting
> > > fp unit already has all the required circuitry to allow you to skip
> > > input normalization, means that I, like Mitch Alsup, strongly believe all
> > > fp units should support subnormal values at zero cycle cost.
> <
> > There are different scenarios for one or more denormal inputs or outputs.
> > In some of them a microtrap could be the best practical solution.
> > In particular, I am thinking about designs where the latency of a regular ("normal")
> > FADD is lower than the latency of FMUL and of FMADD.
> <
> For both FADD and FMUL and FMAC and for the numerator of FDIV, a denorm
> as input simply does not have to generate the hidden bit::
> <
> if( operand.exponent = 0 )
> then 0.fraction;
> else 1.fraction;
> <
> For the output of FADD and FMUL and FMAC and FDIV, you inject a synthetic
> 1-bit at the position the hidden bit would have been in the largest possible denorm.
> This prevents the normalizer from shifting too many positions to the left and gives
> you the proper denormalized result.
> {Note: you insert the synthetic 1-bit only with the value going into the find-first
> circuit, not the value going into the shifter the find-first controls.}
> <
> > Mitch does not see a point in such designs, but I do see it.
> <
> Correction, Mitch sees no value in not fully obeying IEEE 754-2019 because
> the cost to do so is so small.

Implementing corner cases via a microtrap (as opposed to a software-visible trap) fully obeys IEEE 754-2019.
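
The two datapath tricks Mitch describes in the quoted text above (no hidden bit when the exponent field is zero, and a synthetic 1-bit fed only to the find-first circuit) can be modeled in a few lines. This is a minimal sketch, assuming IEEE 754 binary32 and finite inputs; `unpack_f32`, `pack_f32` and `normalize` are hypothetical names, and only the exponent/significand bookkeeping after a cancelling FADD is modeled (no alignment shift, rounding, or carry-out).

```python
import struct

# Hypothetical helpers modeling binary32 only; a sketch, not any real FPU.
FRAC_BITS = 23             # binary32 fraction width
HIDDEN = 1 << FRAC_BITS    # weight of the implicit leading 1


def unpack_f32(x: float):
    """Split a finite binary32 value into (sign, exponent, significand).

    Hidden-bit rule: exponent field 0 gives 0.fraction, anything else
    gives 1.fraction. A subnormal is scaled like exponent 1, so the value
    is always significand * 2**(exponent - 150).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign, field, frac = bits >> 31, (bits >> FRAC_BITS) & 0xFF, bits & (HIDDEN - 1)
    if field == 0:
        return sign, 1, frac           # denorm input: no hidden bit generated
    return sign, field, HIDDEN | frac  # normal input: hidden bit present


def pack_f32(sign: int, exp: int, sig: int) -> float:
    """Encode a normalized or denormalized (exp, sig) pair as binary32."""
    field = exp if sig & HIDDEN else 0   # no leading 1 means subnormal or zero
    bits = (sign << 31) | (field << FRAC_BITS) | (sig & (HIDDEN - 1))
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def normalize(exp: int, sig: int):
    """Left-normalize the 24-bit significand that remains after cancellation.

    The find-first-one runs on sig OR a synthetic 1-bit placed where the
    hidden bit would sit at exponent 1 (the largest subnormal binade), so
    the shift can never exceed exp - 1. The shifter itself gets plain sig.
    """
    assert exp >= 1 and 0 <= sig < 2 * HIDDEN
    if sig == 0:
        return 1, 0                                # exact zero
    probe = sig
    if exp - 1 <= FRAC_BITS:
        probe |= HIDDEN >> (exp - 1)               # synthetic 1-bit
    shift = FRAC_BITS - (probe.bit_length() - 1)   # leading zeros in 24 bits
    return exp - shift, sig << shift


# Two normals just above the underflow threshold whose difference is subnormal.
s, ea, sa = unpack_f32(1.5 * 2.0**-126)
_, eb, sb = unpack_f32(1.25 * 2.0**-126)
exp, sig = normalize(ea, sa - sb)              # equal exponents: no alignment needed
print(pack_f32(s, exp, sig) == 2.0**-128)      # True
```

Without the synthetic bit, the find-first would shift `0x200000` left by 2 and drive the exponent to -1; with it, the shift stops at exponent 1 and the result comes out as the subnormal 2**-128.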

> <
> 25 years ago I was in the same camp you seem to be in today. I was shown
> various circuitry tricks that made the denorm problem vanish into a cost so
> small I would be foolish not to do the complete package.
> <
> It costs far less to satisfy the denorm problem in the FPU than it costs to solve
> the precise exception problem when taking exceptions. Far, far less. Yet you
> would never accept the latter.
> <
> > > Terje
> > >
> > > --
> > > - <Terje.Mathisen at tmsw.no>
> > > "almost all programming can be viewed as an exercise in caching"
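
The earlier point (quoted above) that, with subnormals, rounding after a multiplication has to happen at a variable bit position rather than a fixed one can be sketched in software. This is a minimal Python model of binary32 multiplication with round-to-nearest-even, assuming finite inputs and no overflow (infinities, NaNs and overflow are not handled); `unpack_f32`, `pack_f32`, `round_shift_rne` and `fmul_f32` are hypothetical names, not any hardware's actual datapath.

```python
import struct

# Hypothetical helpers modeling binary32 multiplication only; a sketch.
FRAC_BITS = 23
HIDDEN = 1 << FRAC_BITS


def unpack_f32(x: float):
    """(sign, exponent, significand) with value = significand * 2**(exponent - 150)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    field, frac = (bits >> FRAC_BITS) & 0xFF, bits & (HIDDEN - 1)
    if field == 0:
        return bits >> 31, 1, frac             # subnormal: no hidden bit
    return bits >> 31, field, HIDDEN | frac


def pack_f32(sign: int, exp: int, sig: int) -> float:
    field = exp if sig & HIDDEN else 0         # no leading 1 means subnormal or zero
    bits = (sign << 31) | (field << FRAC_BITS) | (sig & (HIDDEN - 1))
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def round_shift_rne(value: int, shift: int) -> int:
    """Drop the low `shift` bits of a non-negative int, round to nearest, ties to even."""
    if shift <= 0:
        return value << -shift
    q, rem, half = value >> shift, value & ((1 << shift) - 1), 1 << (shift - 1)
    if rem > half or (rem == half and q & 1):
        q += 1
    return q


def fmul_f32(a: float, b: float) -> float:
    """binary32 a * b, assuming finite inputs and no overflow."""
    sa, ea, ma = unpack_f32(a)
    sb, eb, mb = unpack_f32(b)
    sign, p = sa ^ sb, ma * mb                # exact product, at most 48 bits
    if p == 0:
        return pack_f32(sign, 1, 0)
    shift = p.bit_length() - 24               # normal result: keep the top 24 bits
    exp = ea + eb - 150 + shift
    if exp < 1:                               # subnormal result: the rounding
        shift += 1 - exp                      # point slides left by (1 - exp) bits
        exp = 1
    sig = round_shift_rne(p, shift)
    if sig >> (FRAC_BITS + 1):                # rounding carried into a 25th bit
        sig >>= 1                             # sig was exactly 2**24, so this is exact
        exp += 1
    return pack_f32(sign, exp, sig)


a, b = 1.5 * 2.0**-70, 1.75 * 2.0**-70       # exact product 2.625 * 2**-140 < 2**-126
print(fmul_f32(a, b) == struct.unpack("<f", struct.pack("<f", a * b))[0])  # True
```

For a normal product the rounding point sits a fixed 24 bits below the leading one; once the exponent would fall below 1, it slides left by `1 - exp` bits, which is the variable-position rounding described in the quoted text.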

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<ef80eb7c-8b61-4464-bd1c-e6551f826582n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24332&group=comp.arch#24332

Newsgroups: comp.arch
Date: Sat, 19 Mar 2022 13:26:19 -0700 (PDT)
In-Reply-To: <t12aif$182v$1@gioia.aioe.org>
Message-ID: <ef80eb7c-8b61-4464-bd1c-e6551f826582n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: already5...@yahoo.com (Michael S)
 by: Michael S - Sat, 19 Mar 2022 20:26 UTC

On Friday, March 18, 2022 at 6:04:37 PM UTC+2, Terje Mathisen wrote:
> Michael S wrote:
> > On Friday, March 18, 2022 at 10:53:03 AM UTC+2, Terje Mathisen wrote:
> >> I seldom _need_ subnormals myself, but I have been told that there exist
> >> solvers/zero finders that fail to stabilize (you end up
> >> flipping back & forth between two values) unless you have subnormals.
> >>
> >
> > That's what I expect for the "chord&tangent" solver, the one I was taught back in high school,
> > even before we were "officially" taught derivatives in math class.
> > The same probably applies to the pure tangent (i.e. Newton) method, but here I'm less sure.
> >
> >> That, plus the fact (which you mention above) that any FMAC-supporting
> >> fp unit already has all the required circuitry to allow you to skip
> >> input normalization, means that I, like Mitch Alsup, strongly believe all
> >> fp units should support subnormal values at zero cycle cost.
> >
> > There are different scenarios for one or more denormal inputs or outputs.
> > In some of them a microtrap could be the best practical solution.
> > In particular, I am thinking about designs where the latency of a regular ("normal") FADD is lower than the latency of FMUL and of FMADD.
> > Mitch does not see a point in such designs, but I do see it.
> After spending 2016-2019 on the IEEE 754 update working group, I strongly
> support Mitch's viewpoint here: just optimize the heck out of FMAC and
> make that the basic building block for all your operations, including
> FDIV/FSQRT and all the transcendentals.
>
> The possibility of getting a pure FADD one cycle faster, but only when
> none of potentially many parallel SIMD operations hit a subnormal input
> or output, fails to excite me.
> Terje
>
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

The overwhelming majority of FP instructions, as measured by appearance in the world's program code, are scalar.
Quite possibly that's not true for the majority of *executed* FP instructions, and especially not for the majority of
performed FLOPs, but I wouldn't bet even on that. There are quite a few Cortex-M4s out there for
each A64FX.

Well, considering that all GPU code is SIMD (even if they call it SIMT), maybe not.
But for GPUs I agree with you and Mitch about uniform latency as the way to go.

BTW, in a post above I didn't mean *one* cycle faster.
What I had in mind was a heavily pipelined design where FMADD is 5 clocks and FADD is 3.
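
Terje's remark, quoted above, about solvers that fail to stabilize without subnormals plausibly relates to a property gradual underflow guarantees: if `x != y`, then `x - y != 0`. A minimal Python sketch, assuming IEEE 754 binary64 floats (as CPython uses on common platforms); `sub_ftz` is a hypothetical software emulation of a flush-to-zero unit, not a real hardware mode.

```python
import sys

TINY = sys.float_info.min   # smallest positive normal double, 2**-1022


def sub_gradual(x: float, y: float) -> float:
    """IEEE 754 default subtraction: tiny differences become subnormals."""
    return x - y


def sub_ftz(x: float, y: float) -> float:
    """Hypothetical flush-to-zero emulation: any subnormal result becomes zero."""
    d = x - y
    return 0.0 if abs(d) < TINY else d


x, y = 1.5 * TINY, 1.25 * TINY      # two distinct normal numbers
print(x != y)                        # True
print(sub_gradual(x, y) != 0.0)      # True: the difference is TINY / 4, a subnormal
print(sub_ftz(x, y))                 # 0.0, even though x != y
```

A loop that tests convergence with `x - y == 0`, or divides by `x - y` after checking `x != y`, can therefore behave differently on a flush-to-zero unit. Whether this is what trips up the particular solvers Terje heard about is an assumption; the thread does not say.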

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<bb6c8d76-8c88-43c8-919d-d0448bf33900n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24333&group=comp.arch#24333

Newsgroups: comp.arch
Date: Sat, 19 Mar 2022 15:13:23 -0700 (PDT)
In-Reply-To: <ef80eb7c-8b61-4464-bd1c-e6551f826582n@googlegroups.com>
Message-ID: <bb6c8d76-8c88-43c8-919d-d0448bf33900n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Sat, 19 Mar 2022 22:13 UTC

On Saturday, March 19, 2022 at 3:26:21 PM UTC-5, Michael S wrote:
> On Friday, March 18, 2022 at 6:04:37 PM UTC+2, Terje Mathisen wrote:
> > Michael S wrote:
> > > On Friday, March 18, 2022 at 10:53:03 AM UTC+2, Terje Mathisen wrote:
> > >> I seldom _need_ subnormals myself, but I have been told that there exist
> > >> solvers/zero finders that fail to stabilize (you end up
> > >> flipping back & forth between two values) unless you have subnormals.
> > >>
> > >
> > > That's what I expect for the "chord&tangent" solver, the one I was taught back in high school,
> > > even before we were "officially" taught derivatives in math class.
> > > The same probably applies to the pure tangent (i.e. Newton) method, but here I'm less sure.
> > >
> > >> That, plus the fact (which you mention above) that any FMAC-supporting
> > >> fp unit already has all the required circuitry to allow you to skip
> > >> input normalization, means that I, like Mitch Alsup, strongly believe all
> > >> fp units should support subnormal values at zero cycle cost.
> > >
> > > There are different scenarios for one or more denormal inputs or outputs.
> > > In some of them a microtrap could be the best practical solution.
> > > In particular, I am thinking about designs where the latency of a regular ("normal") FADD is lower than the latency of FMUL and of FMADD.
> > > Mitch does not see a point in such designs, but I do see it.
> > After spending 2016-2019 on the IEEE 754 update working group, I strongly
> > support Mitch's viewpoint here: just optimize the heck out of FMAC and
> > make that the basic building block for all your operations, including
> > FDIV/FSQRT and all the transcendentals.
> >
> > The possibility of getting a pure FADD one cycle faster, but only when
> > none of potentially many parallel SIMD operations hit a subnormal input
> > or output, fails to excite me.
> > Terje
> >
> >
> > --
> > - <Terje.Mathisen at tmsw.no>
> > "almost all programming can be viewed as an exercise in caching"
> The overwhelming majority of FP instructions, as measured by appearance in the world's program code, are scalar.
> Quite possibly that's not true for the majority of *executed* FP instructions, and especially not for the majority of
> performed FLOPs, but I wouldn't bet even on that. There are quite a few Cortex-M4s out there for
> each A64FX.
>
> Well, considering that all GPU code is SIMD (even if they call it SIMT), maybe not.
> But for GPUs I agree with you and Mitch about uniform latency as the way to go.
<
GPUs are not SIMD because each calculation belongs to a different thread, one which
can take different control flow,...
>
> BTW, in a post above I didn't mean *one* cycle faster.
> What I had in mind was a heavily pipelined design where FMADD is 5 clocks and FADD is 3.
<
FMAC is 62 gates of delay
FMUL is 60 gates of delay
FADD is 50 gates of delay
<
To get FADD into 3 cycles, you need 17 gates per cycle (3×17=51)
Then FMAC is CEIL(62/17) = CEIL(3.647) = 4 cycles, with a comfortable margin.
<
Thus, there is not a realistic scenario where FADD is 3 and FMAC is 5.
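
Mitch's arithmetic can be checked by brute force: enumerate every per-cycle gate budget that puts FADD in exactly 3 cycles and look at the FMAC latency each one implies. A minimal Python sketch using only the gate counts from the post; `DELAY` and `cycles` are hypothetical names, and per-stage latch overhead is deliberately ignored.

```python
# Gate-delay counts quoted in the post. Latch overhead, clock skew and wire
# delay are ignored (an assumption; real pipelines pay some per stage).
DELAY = {"FADD": 50, "FMUL": 60, "FMAC": 62}


def cycles(gates: int, gates_per_cycle: int) -> int:
    """Stages needed for a path: ceil(gates / gates_per_cycle), in integers."""
    return -(-gates // gates_per_cycle)


# Every per-cycle gate budget for which FADD takes exactly 3 cycles.
budgets = [g for g in range(1, 100) if cycles(DELAY["FADD"], g) == 3]
print(budgets)                                              # [17, 18, 19, 20, 21, 22, 23, 24]

# FMAC latency implied by each of those budgets.
print(sorted({cycles(DELAY["FMAC"], g) for g in budgets}))  # [3, 4]
```

Across the whole 3-cycle-FADD window (17 to 24 gates per cycle), FMAC needs at most 4 cycles, which is the arithmetic behind "not a realistic scenario where FADD is 3 and FMAC is 5". A fixed per-stage latch cost would move the window, but that is not modeled here.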

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<b3e09325-b4be-4a1b-a28f-bf1b43f18247n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24334&group=comp.arch#24334

Newsgroups: comp.arch
Date: Sat, 19 Mar 2022 15:40:44 -0700 (PDT)
In-Reply-To: <bb6c8d76-8c88-43c8-919d-d0448bf33900n@googlegroups.com>
Message-ID: <b3e09325-b4be-4a1b-a28f-bf1b43f18247n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: already5...@yahoo.com (Michael S)
 by: Michael S - Sat, 19 Mar 2022 22:40 UTC

On Sunday, March 20, 2022 at 12:13:25 AM UTC+2, MitchAlsup wrote:
> On Saturday, March 19, 2022 at 3:26:21 PM UTC-5, Michael S wrote:
> > On Friday, March 18, 2022 at 6:04:37 PM UTC+2, Terje Mathisen wrote:
> > > Michael S wrote:
> > > > On Friday, March 18, 2022 at 10:53:03 AM UTC+2, Terje Mathisen wrote:
> > > >> I seldom _need_ subnormals myself, but I have been told that there exist
> > > >> solvers/zero finders that fail to stabilize (you end up
> > > >> flipping back & forth between two values) unless you have subnormals.
> > > >>
> > > >
> > > > That's what I expect for the "chord&tangent" solver, the one I was taught back in high school,
> > > > even before we were "officially" taught derivatives in math class.
> > > > The same probably applies to the pure tangent (i.e. Newton) method, but here I'm less sure.
> > > >
> > > >> That, plus the fact (which you mention above) that any FMAC-supporting
> > > >> fp unit already has all the required circuitry to allow you to skip
> > > >> input normalization, means that I, like Mitch Alsup, strongly believe all
> > > >> fp units should support subnormal values at zero cycle cost.
> > > >
> > > > There are different scenarios for one or more denormal inputs or outputs.
> > > > In some of them a microtrap could be the best practical solution.
> > > > In particular, I am thinking about designs where the latency of a regular ("normal") FADD is lower than the latency of FMUL and of FMADD.
> > > > Mitch does not see a point in such designs, but I do see it.
> > > After spending 2016-2019 on the IEEE 754 update working group, I strongly
> > > support Mitch's viewpoint here: just optimize the heck out of FMAC and
> > > make that the basic building block for all your operations, including
> > > FDIV/FSQRT and all the transcendentals.
> > >
> > > The possibility of getting a pure FADD one cycle faster, but only when
> > > none of potentially many parallel SIMD operations hit a subnormal input
> > > or output, fails to excite me.
> > > Terje
> > >
> > >
> > > --
> > > - <Terje.Mathisen at tmsw.no>
> > > "almost all programming can be viewed as an exercise in caching"
> > The overwhelming majority of FP instructions, as measured by appearance in the world's program code, are scalar.
> > Quite possibly that's not true for the majority of *executed* FP instructions, and especially not for the majority of
> > performed FLOPs, but I wouldn't bet even on that. There are quite a few Cortex-M4s out there for
> > each A64FX.
> >
> > Well, considering that all GPU code is SIMD (even if they call it SIMT), maybe not.
> > But for GPUs I agree with you and Mitch about uniform latency as the way to go.
> <
> GPUs are not SIMD because each calculation belongs to a different thread, one which
> can take different control flow,...
> >
> > BTW, in a post above I didn't mean *one* cycle faster.
> > What I had in mind was a heavily pipelined design where FMADD is 5 clocks and FADD is 3.
> <
> FMAC is 62 gates of delay
> FMUL is 60 gates of delay
> FADD is 50 gates of delay
> <
> To get FADD into 3 cycles, you need 17 gates per cycle (3×17=51)

Is that still the case when rare corner cases raise uTraps?

> Then FMAC is CEIL(62/17) = CEIL(3.647) = 4 cycles, with a comfortable margin.
> <
> Thus, there is not a realistic scenario where FADD is 3 and FMAC is 5.

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<4661fd13-5375-4629-b559-18a31126a182n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24335&group=comp.arch#24335

Newsgroups: comp.arch
Date: Sat, 19 Mar 2022 17:34:40 -0700 (PDT)
In-Reply-To: <b3e09325-b4be-4a1b-a28f-bf1b43f18247n@googlegroups.com>
Message-ID: <4661fd13-5375-4629-b559-18a31126a182n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Sun, 20 Mar 2022 00:34 UTC

On Saturday, March 19, 2022 at 5:40:46 PM UTC-5, Michael S wrote:
> On Sunday, March 20, 2022 at 12:13:25 AM UTC+2, MitchAlsup wrote:

> > > BTW, in a post above I didn't mean *one* cycle faster.
> > > What I had in mind was a heavily pipelined design where FMADD is 5 clocks and FADD is 3.
> > <
> > FMAC is 62 gates of delay
> > FMUL is 60 gates of delay
> > FADD is 50 gates of delay
> > <
> > To get FADD into 3 cycles, you need 17 gates per cycle (3×17=51)
<
> Is that still the case when rare corner cases raise uTraps?
<
Not the way most people build them these days; about the only corner case is when the
denominator is denormalized and one is using Newton-Raphson or Goldschmidt
division. It does not matter if division is SRT.
<
> > Then FMAC is CEIL(62/17) = CEIL(3.647) = 4 cycles, with a comfortable margin.
> > <
> > Thus, there is not a realistic scenario where FADD is 3 and FMAC is 5.
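
One way to see why a denormalized denominator is awkward for reciprocal-based division: the iteration wants a normalized significand, and the reciprocal of a small enough subnormal (below about 2**-1024 in binary64) is not even representable. A minimal Python sketch of Newton-Raphson division with an explicit prescaling step, assuming binary64 floats; it uses the textbook linear seed 48/17 - 32/17*m instead of a hardware lookup table, `nr_reciprocal` and `nr_divide` are hypothetical names, and Goldschmidt division is not shown.

```python
import math

# Hypothetical helpers; a numerical sketch, not a hardware divider.


def nr_reciprocal(m: float, iterations: int = 4) -> float:
    """Newton-Raphson reciprocal of m in [0.5, 1): x <- x * (2 - m * x).

    The linear seed 48/17 - 32/17 * m has relative error at most 1/17 on
    [0.5, 1]; each step squares the error, so 4 steps reach double precision.
    """
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        x = x * (2.0 - m * x)
    return x


def nr_divide(a: float, d: float) -> float:
    """a / d, with d first scaled into [0.5, 1) so the iteration stays in range.

    math.frexp normalizes a subnormal d as well; that prescaling is the
    extra step a divider built on reciprocal iteration needs for a
    denormalized denominator.
    """
    if d == 0.0 or not math.isfinite(d):
        raise ValueError("finite, nonzero denominator required")
    m, e = math.frexp(d)                        # d == m * 2**e, 0.5 <= |m| < 1
    r = math.copysign(nr_reciprocal(abs(m)), m)
    return math.ldexp(a * r, -e)                # assumes the quotient is finite


d = 5e-324                                      # smallest positive subnormal double
print(1.0 / d)                                  # inf: the raw reciprocal overflows
print(math.isclose(nr_divide(1e-300, d), 1e-300 / d, rel_tol=1e-14))  # True
```

The scaling by a power of two is exact, so all of the approximation error comes from the few Newton-Raphson steps on the normalized significand.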

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<4605d6df-d71a-4711-9271-fe2fc01fc4f6n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24336&group=comp.arch#24336

Newsgroups: comp.arch
Date: Sat, 19 Mar 2022 18:06:06 -0700 (PDT)
In-Reply-To: <4661fd13-5375-4629-b559-18a31126a182n@googlegroups.com>
Message-ID: <4605d6df-d71a-4711-9271-fe2fc01fc4f6n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: already5...@yahoo.com (Michael S)
 by: Michael S - Sun, 20 Mar 2022 01:06 UTC

On Sunday, March 20, 2022 at 2:34:42 AM UTC+2, MitchAlsup wrote:
> On Saturday, March 19, 2022 at 5:40:46 PM UTC-5, Michael S wrote:
> > On Sunday, March 20, 2022 at 12:13:25 AM UTC+2, MitchAlsup wrote:
>
> > > > BTW, in a post above I didn't mean *one* cycle faster.
> > > > What I had in mind was a heavily pipelined design where FMADD is 5 clocks and FADD is 3.
> > > <
> > > FMAC is 62 gates of delay
> > > FMUL is 60 gates of delay
> > > FADD is 50 gates of delay
> > > <
> > > To get FADD into 3 cycles, you need 17 gates per cycle (3×17=51)
> <
> > Is that still the case when rare corner cases raise uTraps?
> <
> Not the way most people build them these days; about the only corner case is when the
> denominator is denormalized and one is using Newton-Raphson or Goldschmidt
> division. It does not matter if division is SRT.

I was talking about a class of designs where the FADD unit is specialized for FADD and nothing else.
FDIV is done either by completely separate HW or by the same unit that does FMADD.

That appears to be the design Intel used before SKL, and AMD used in Zen1 and Zen2 for at least one of their pair of FADD pipes.
Maybe Zen3 too.

> <
> > > Then FMAC is CEIL(62/17) = CEIL(3.647) = 4 cycles, with a comfortable margin.
> > > <
> > > Thus, there is not a realistic scenario where FADD is 3 and FMAC is 5.

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<6121dcd1-6eca-4aef-9a10-2c7f1503f2c8n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24337&group=comp.arch#24337

Newsgroups: comp.arch
Date: Sat, 19 Mar 2022 18:41:58 -0700 (PDT)
In-Reply-To: <4605d6df-d71a-4711-9271-fe2fc01fc4f6n@googlegroups.com>
Message-ID: <6121dcd1-6eca-4aef-9a10-2c7f1503f2c8n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Sun, 20 Mar 2022 01:41 UTC

On Saturday, March 19, 2022 at 8:06:07 PM UTC-5, Michael S wrote:
> On Sunday, March 20, 2022 at 2:34:42 AM UTC+2, MitchAlsup wrote:
> > On Saturday, March 19, 2022 at 5:40:46 PM UTC-5, Michael S wrote:
> > > On Sunday, March 20, 2022 at 12:13:25 AM UTC+2, MitchAlsup wrote:
> >
> > > > > BTW, in a post above I didn't mean *one* cycle faster.
> > > > > What I had in mind was a heavily pipelined design where FMADD is 5 clocks and FADD is 3.
> > > > <
> > > > FMAC is 62 gates of delay
> > > > FMUL is 60 gates of delay
> > > > FADD is 50 gates of delay
> > > > <
> > > > To get FADD into 3 cycles, you need 17 gates per cycle (3×17=51)
> > <
> > > Is that still the case when rare corner cases raise uTraps?
> > <
> > Not the way most people build them these days; about the only corner case is when the
> > denominator is denormalized and one is using Newton-Raphson or Goldschmidt
> > division. It does not matter if division is SRT.
<
> I was talking about a class of designs where the FADD unit is specialized for FADD and nothing else.
<
I was talking about the Opteron FADD unit, which does FADD, FSUB and FCMP, but not
FMUL, FDIV, SQRT, or conversions.
<
> FDIV is done either by completely separate HW or by the same unit that does FMADD.
<
This does not change the delay of the FADD part.
>
> That appears to be the design Intel used before SKL, and AMD used in Zen1 and Zen2
> for at least one of their pair of FADD pipes. Maybe Zen3 too.
> > <
> > > > Then FMAC is CEIL(62/17) = CEIL(3.647) = 4 cycles, with a comfortable margin.
> > > > <
> > > > Thus, there is not a realistic scenario where FADD is 3 and FMAC is 5.

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24338&group=comp.arch#24338

Newsgroups: comp.arch
Date: Sat, 19 Mar 2022 19:33:42 -0700 (PDT)
In-Reply-To: <619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:bde6:e5e0:5d72:7e4e;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:bde6:e5e0:5d72:7e4e
References: <sso6aq$37b$1@newsreader4.netcologne.de> <UPXHJ.6202$9O.4300@fx12.iad>
<ssplm7$1sv$1@newsreader4.netcologne.de> <sspnd8$3d6$1@newsreader4.netcologne.de>
<tJZHJ.10626$8Q.353@fx19.iad> <ssqrr7$ptr$1@newsreader4.netcologne.de>
<0cf5023d-3458-46d2-ad3d-fa0e6ecb18dfn@googlegroups.com> <t10mvq$4oe$1@gioia.aioe.org>
<t11h9a$g5v$1@gioia.aioe.org> <1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com>
<ea3fec28-7635-450a-afa9-ad3d93baef97n@googlegroups.com> <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>
<bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com> <bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com>
<619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Sun, 20 Mar 2022 02:33:42 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 15
 by: Quadibloc - Sun, 20 Mar 2022 02:33 UTC

On Saturday, March 19, 2022 at 10:16:10 AM UTC-6, MitchAlsup wrote:

> What if the mitigations do not bear such a heavy cost; say on the
> order of 1%.......

Then, as you say, the right choice _is_ to

> Build machines that are immune from Spectré.

What I've been reading, though, is that while costs of 2% to 3%
are associated with _some_ sets of mitigations for Spectre,
later versions of the attack have proven harder to deal with, and
so the figure for protecting against _all_ the variants of Spectre
and Meltdown is now at 25% and climbing.

John Savard

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<b19ea6d6-9d87-4ac4-b571-797a4bc1db2cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24339&group=comp.arch#24339

X-Received: by 2002:a05:6214:2aad:b0:441:50e:ce56 with SMTP id js13-20020a0562142aad00b00441050ece56mr477876qvb.128.1647744156770;
Sat, 19 Mar 2022 19:42:36 -0700 (PDT)
X-Received: by 2002:a9d:72c6:0:b0:5af:42ef:bb7c with SMTP id
d6-20020a9d72c6000000b005af42efbb7cmr5542769otk.96.1647744156548; Sat, 19 Mar
2022 19:42:36 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 19 Mar 2022 19:42:36 -0700 (PDT)
In-Reply-To: <fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4073:7501:59a7:2fa1;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4073:7501:59a7:2fa1
References: <sso6aq$37b$1@newsreader4.netcologne.de> <UPXHJ.6202$9O.4300@fx12.iad>
<ssplm7$1sv$1@newsreader4.netcologne.de> <sspnd8$3d6$1@newsreader4.netcologne.de>
<tJZHJ.10626$8Q.353@fx19.iad> <ssqrr7$ptr$1@newsreader4.netcologne.de>
<0cf5023d-3458-46d2-ad3d-fa0e6ecb18dfn@googlegroups.com> <t10mvq$4oe$1@gioia.aioe.org>
<t11h9a$g5v$1@gioia.aioe.org> <1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com>
<ea3fec28-7635-450a-afa9-ad3d93baef97n@googlegroups.com> <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>
<bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com> <bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com>
<619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com> <fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b19ea6d6-9d87-4ac4-b571-797a4bc1db2cn@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 20 Mar 2022 02:42:36 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Sun, 20 Mar 2022 02:42 UTC

On Saturday, March 19, 2022 at 9:33:44 PM UTC-5, Quadibloc wrote:
> On Saturday, March 19, 2022 at 10:16:10 AM UTC-6, MitchAlsup wrote:
>
> > What if the mitigations do not bear such a heavy cost; say on the
> > order of 1%.......
> Then, as you say, the right choice _is_ to
> > Build machines that are immune from Spectré.
<
> What I've been reading, though, is that while costs of 2% to 3%
> are associated with _some_ sets of mitigations for Spectre,
> later versions of the attack have proven harder to deal with, and
> so the figure for protecting against _all_ the variants of Spectre
> and Meltdown is now at 25% and climbing.
<
Just because x86 and ARM are having problems does not mean everyone is.
>
> John Savard

Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?

<t16ft4$q9o$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24340&group=comp.arch#24340

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit
architecture?
Date: Sat, 19 Mar 2022 23:00:03 -0700
Organization: A noiseless patient Spider
Lines: 18
Message-ID: <t16ft4$q9o$1@dont-email.me>
References: <sso6aq$37b$1@newsreader4.netcologne.de>
<3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>
<t13bi2$nbc$1@gal.iecc.com>
<dbbf2425-d7f7-4bf0-8a09-d360eb619421n@googlegroups.com>
<t156o6$30a$1@gal.iecc.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 20 Mar 2022 06:00:05 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="fdf6e8caf7bcfe21273589171002b318";
logging-data="26936"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+1W4+h6EdMyh4hpFpfU0YBOfVNZFQcFrk="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.7.0
Cancel-Lock: sha1:MF6/fnB/nUtu102brahrfmWBs8Y=
In-Reply-To: <t156o6$30a$1@gal.iecc.com>
Content-Language: en-US
 by: Stephen Fuld - Sun, 20 Mar 2022 06:00 UTC

On 3/19/2022 11:17 AM, John Levine wrote:
> According to Quadibloc <jsavard@ecn.ab.ca>:

snip

>> If there was an interrupt because of a memory access instruction
>> having a page fault, ...
>
> A what? What is a "page fault"? If you're looking for the 360/67, it's down the
> hall on your right.

:-)

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<2022Mar20.092534@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=24341&group=comp.arch#24341

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
Date: Sun, 20 Mar 2022 08:25:34 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 28
Message-ID: <2022Mar20.092534@mips.complang.tuwien.ac.at>
References: <sso6aq$37b$1@newsreader4.netcologne.de> <t10mvq$4oe$1@gioia.aioe.org> <t11h9a$g5v$1@gioia.aioe.org> <1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com> <t12aif$182v$1@gioia.aioe.org> <ef80eb7c-8b61-4464-bd1c-e6551f826582n@googlegroups.com> <bb6c8d76-8c88-43c8-919d-d0448bf33900n@googlegroups.com> <b3e09325-b4be-4a1b-a28f-bf1b43f18247n@googlegroups.com> <4661fd13-5375-4629-b559-18a31126a182n@googlegroups.com> <4605d6df-d71a-4711-9271-fe2fc01fc4f6n@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="53f8d1dc4f582a946a31715650919540";
logging-data="15160"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+6fN6myVtaRuFFIO6o5Gc1"
Cancel-Lock: sha1:DEoaeGinvbaO5L3jFJaAQzwKlhw=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sun, 20 Mar 2022 08:25 UTC

Michael S <already5chosen@yahoo.com> writes:
>On Sunday, March 20, 2022 at 2:34:42 AM UTC+2, MitchAlsup wrote:
>> > > FMAC is 62 gates of delay
>> > > FMUL is 60 gates of delay
>> > > FADD is 50 gates of delay
....
>I was talking about a class of designs where the FADD unit is specialized for
>FADD and nothing else.
....
>That appears to be a design that Intel used before SKL and that AMD used
>in Zen1 and Zen2 for at least one of their pair of FADD pipes.
>Maybe Zen3 too.

Looking at <https://www.agner.org/optimize/instruction_tables.pdf>,
the latencies (in cycles) are:

          ADDPD  MULPD  VFMADD132PD
Zen         3      4        5
Zen2        3      3        5
Zen3        3      3        4

The diagrams I remember show FADD and FMA units for all of these
cores.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
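Below is a small Python sketch that encodes the latency table from the post above. For each core, it reads off how many cycles the FMA takes beyond the plain add. That gap is the quantity at issue in the gate-delay argument earlier in the thread (FADD at 3 cycles versus FMAC at 5). The dictionary and function names are hypothetical, and the numbers are only those listed in the post.

```python
# Latencies in cycles, as listed in the post: (ADDPD, MULPD, VFMADD132PD).
LATENCY = {
    "Zen": (3, 4, 5),
    "Zen2": (3, 3, 5),
    "Zen3": (3, 3, 4),
}


def fma_minus_add(core: str) -> int:
    """Extra cycles the FMA takes over the plain add on the given core."""
    add, _mul, fma = LATENCY[core]
    return fma - add


if __name__ == "__main__":
    for core in LATENCY:
        print(f"{core}: FMA is {fma_minus_add(core)} cycle(s) slower than ADD")
```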

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<2022Mar20.094045@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=24342&group=comp.arch#24342

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
Date: Sun, 20 Mar 2022 08:40:45 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 23
Message-ID: <2022Mar20.094045@mips.complang.tuwien.ac.at>
References: <sso6aq$37b$1@newsreader4.netcologne.de> <t11h9a$g5v$1@gioia.aioe.org> <1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com> <ea3fec28-7635-450a-afa9-ad3d93baef97n@googlegroups.com> <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com> <bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com> <bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com> <619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com> <fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="53f8d1dc4f582a946a31715650919540";
logging-data="15160"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18mklobouR6mvSIx4dEwQpP"
Cancel-Lock: sha1:J78whpNwlOPQ98eUih0SUhfANqc=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sun, 20 Mar 2022 08:40 UTC

Quadibloc <jsavard@ecn.ab.ca> writes:
>On Saturday, March 19, 2022 at 10:16:10 AM UTC-6, MitchAlsup wrote:
>> Build machines that are immune from Spectré.
>
>What I've been reading, though, is that while costs of 2% to 3%
>are associated with _some_ sets of mitigations for Spectre,
>later versions of the attack have proven harder to deal with, and
>so the figure for protecting against _all_ the variants of Spectre
>and Meltdown is now at 25% and climbing.

We have not seen any CPU designed to be immune to Spectre. What we
have seen are CPUs that were designed before Spectre was known to
their designers, or, in the case of Alder Lake (where Spectre may have
become known early enough in the design process to make the CPU immune
without significant schedule delays), CPUs whose designers were told
(or told each other) not to worry about it and to throw the problem over
to software. Because these CPUs are not immune to Spectre, they need
mitigations, and these mitigations cost.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?

<31efa824-7995-42e1-9d57-26a7a67f9b8dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24343&group=comp.arch#24343

X-Received: by 2002:a05:620a:2708:b0:67e:9b07:69bf with SMTP id b8-20020a05620a270800b0067e9b0769bfmr159669qkp.274.1647767754849;
Sun, 20 Mar 2022 02:15:54 -0700 (PDT)
X-Received: by 2002:a9d:5e15:0:b0:5b2:5125:fd09 with SMTP id
d21-20020a9d5e15000000b005b25125fd09mr6036473oti.129.1647767754609; Sun, 20
Mar 2022 02:15:54 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 20 Mar 2022 02:15:54 -0700 (PDT)
In-Reply-To: <t156o6$30a$1@gal.iecc.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:bde6:e5e0:5d72:7e4e;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:bde6:e5e0:5d72:7e4e
References: <sso6aq$37b$1@newsreader4.netcologne.de> <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>
<t13bi2$nbc$1@gal.iecc.com> <dbbf2425-d7f7-4bf0-8a09-d360eb619421n@googlegroups.com>
<t156o6$30a$1@gal.iecc.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <31efa824-7995-42e1-9d57-26a7a67f9b8dn@googlegroups.com>
Subject: Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Sun, 20 Mar 2022 09:15:54 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 18
 by: Quadibloc - Sun, 20 Mar 2022 09:15 UTC

On Saturday, March 19, 2022 at 12:17:45 PM UTC-6, John Levine wrote:
> According to Quadibloc <jsa...@ecn.ab.ca>:

> >If there was an interrupt because of a memory access instruction
> >having a page fault, ...

> A what? What is a "page fault"? If you're looking for the 360/67, it's down the
> hall on your right.

Oh, you mean that when a program executes a memory access instruction
to a location which... is in virtual memory, but not physical memory, as that
location's data (in the case of a read) is only sitting on the swap file... some
other name than "page fault" is used for the condition of that happening on
modern computer architectures?

I guess that's reasonable, given that modern architectures probably don't
use fixed-size pages for this, the way the 360 and 370 did.

John Savard
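The condition described in the post above is an access to a location that is mapped in virtual memory but not currently resident in physical memory. Below is a toy address translation in Python that sketches it. It assumes a single fixed page size of 4 KiB and uses a flat dictionary as a stand-in for a real page table. `translate` and `PageFault` are hypothetical names, not any operating system's API.

```python
from __future__ import annotations

PAGE_SIZE = 4096  # assumed fixed page size (4 KiB)


class PageFault(Exception):
    """Raised when a virtual page has no resident physical frame."""


def translate(vaddr: int, page_table: dict[int, int | None]) -> int:
    """Map a virtual address to a physical one, faulting on non-resident pages."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)  # virtual page number, byte offset
    frame = page_table.get(vpn)
    if frame is None:
        # The page is not in physical memory (e.g. its data is only in the
        # swap file), so the hardware would trap to the OS at this point.
        raise PageFault(f"page fault at virtual address {vaddr:#x}")
    return frame * PAGE_SIZE + offset


if __name__ == "__main__":
    table = {0: 7, 1: None}  # page 0 resident in frame 7; page 1 swapped out
    print(hex(translate(0x123, table)))
    try:
        translate(0x1000, table)
    except PageFault as exc:
        print(exc)
```

The first call prints `0x7123`, and the second reports a page fault at `0x1000`. In a real system, the OS would then load the page, update the table, and restart the instruction.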
