
Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<54515e0e-04c0-4af4-b543-f1264c69456an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24344&group=comp.arch#24344

X-Received: by 2002:a05:620a:1713:b0:67b:3b91:e91b with SMTP id az19-20020a05620a171300b0067b3b91e91bmr10206585qkb.534.1647768014305;
Sun, 20 Mar 2022 02:20:14 -0700 (PDT)
X-Received: by 2002:aca:a8c5:0:b0:2ec:b137:c0a2 with SMTP id
r188-20020acaa8c5000000b002ecb137c0a2mr217434oie.1.1647768014046; Sun, 20 Mar
2022 02:20:14 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 20 Mar 2022 02:20:13 -0700 (PDT)
In-Reply-To: <2022Mar20.094045@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:bde6:e5e0:5d72:7e4e;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:bde6:e5e0:5d72:7e4e
References: <sso6aq$37b$1@newsreader4.netcologne.de> <t11h9a$g5v$1@gioia.aioe.org>
<1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com> <ea3fec28-7635-450a-afa9-ad3d93baef97n@googlegroups.com>
<3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com> <bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com>
<bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com> <619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com>
<fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com> <2022Mar20.094045@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <54515e0e-04c0-4af4-b543-f1264c69456an@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Sun, 20 Mar 2022 09:20:14 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 19

by: Quadibloc - Sun, 20 Mar 2022 09:20 UTC

On Sunday, March 20, 2022 at 2:47:29 AM UTC-6, Anton Ertl wrote:

> We have not seen any CPU designed to be immune to Spectre. What we
> have seen are CPUs that have been designed before Spectre was known to
> the CPU designers,

Well, though, some of the CPUs in that latter category _are_ immune to
Spectre. The 80486, the IBM System/360 Model 75, to name a few.

According to Mitch Alsup, though, there _are_ CPU designs out there
that can be made immune to Spectre by means of mitigations at a
significantly smaller performance cost than either x86 or ARM CPUs
must bear. I have no idea what they are, but I _hope_ he doesn't
mean CPUs based on the CDC 6600 scoreboard.

Because _that_ costs performance compared to real OoO, so it's not
really a way to avoid mitigation cost. But surely he would have enough
sense to realize that.

John Savard

Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?

<078ddb0e-c1c2-4716-bd1c-6b1f99d21087n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24345&group=comp.arch#24345

X-Received: by 2002:ae9:e411:0:b0:67e:616f:400a with SMTP id q17-20020ae9e411000000b0067e616f400amr5120788qkc.645.1647768222866;
Sun, 20 Mar 2022 02:23:42 -0700 (PDT)
X-Received: by 2002:a05:6870:d58b:b0:d2:8d1d:c12 with SMTP id
u11-20020a056870d58b00b000d28d1d0c12mr10773519oao.108.1647768222653; Sun, 20
Mar 2022 02:23:42 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 20 Mar 2022 02:23:42 -0700 (PDT)
In-Reply-To: <31efa824-7995-42e1-9d57-26a7a67f9b8dn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:bde6:e5e0:5d72:7e4e;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:bde6:e5e0:5d72:7e4e
References: <sso6aq$37b$1@newsreader4.netcologne.de> <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>
<t13bi2$nbc$1@gal.iecc.com> <dbbf2425-d7f7-4bf0-8a09-d360eb619421n@googlegroups.com>
<t156o6$30a$1@gal.iecc.com> <31efa824-7995-42e1-9d57-26a7a67f9b8dn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <078ddb0e-c1c2-4716-bd1c-6b1f99d21087n@googlegroups.com>
Subject: Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Sun, 20 Mar 2022 09:23:42 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 20

by: Quadibloc - Sun, 20 Mar 2022 09:23 UTC

On Sunday, March 20, 2022 at 3:15:56 AM UTC-6, Quadibloc wrote:
> On Saturday, March 19, 2022 at 12:17:45 PM UTC-6, John Levine wrote:
> > According to Quadibloc <jsa...@ecn.ab.ca>:
> > >If there was an interrupt because of a memory access instruction
> > >having a page fault, ...
>
> > A what? What is a "page fault"? If you're looking for the 360/67, it's down the
> > hall on your right.
> Oh, you mean that when a program executes a memory access instruction
> to a location which... is in virtual memory, but not physical memory, as that
> location's data (in the case of a read) is only sitting on the swap file... some
> other name than "page fault" is used for the condition of that happening on
> modern computer architectures?
>
> I guess that's reasonable, given that modern architectures probably don't
> use fixed-size pages for this, the way the 360 and 370 did.

....ah, if only I could walk down a hall, and turn right, and be eighteen years
old once again...

John Savard

Fixing Spectre (was: Why separate 32-bit arithmetic ...)

<2022Mar20.105613@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=24346&group=comp.arch#24346

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Fixing Spectre (was: Why separate 32-bit arithmetic ...)
Date: Sun, 20 Mar 2022 09:56:13 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 42
Message-ID: <2022Mar20.105613@mips.complang.tuwien.ac.at>
References: <sso6aq$37b$1@newsreader4.netcologne.de> <ea3fec28-7635-450a-afa9-ad3d93baef97n@googlegroups.com> <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com> <bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com> <bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com> <619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com> <fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com> <2022Mar20.094045@mips.complang.tuwien.ac.at> <54515e0e-04c0-4af4-b543-f1264c69456an@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="53f8d1dc4f582a946a31715650919540";
logging-data="31210"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+TXek7ShrTdbciXGzbVDke"
Cancel-Lock: sha1:ufNfqgQXAejow020jCB5+m+YzkE=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Sun, 20 Mar 2022 09:56 UTC

Quadibloc <jsavard@ecn.ab.ca> writes:
>On Sunday, March 20, 2022 at 2:47:29 AM UTC-6, Anton Ertl wrote:
>
>> We have not seen any CPU designed to be immune to Spectre. What we
>> have seen are CPUs that have been designed before Spectre was known to
>> the CPU designers,
>
>Well, though, some of the CPUs in that latter category _are_ immune to
>Spectre. The 80486, the IBM System/360 Model 75, to name a few.

Sure, CPUs that do not perform speculative execution are immune to
Spectre, because Spectre is based on speculative execution.

>According to Mitch Alsup, though, there _are_ CPU designs out there
>that can be made immune to Spectre by means of mitigations at a
>significantly smaller performance cost than either x86 or ARM CPUs
>must bear. I have no idea what they are

That's disappointing, given that we have discussed this repeatedly
since Spectre was revealed in January 2018, e.g., the discussion
including <2018Jan10.133244@mips.complang.tuwien.ac.at>
(<http://al.howardknight.net/?ID=151559737200>). A recent discussion
has included e.g., <2022Jan23.002646@mips.complang.tuwien.ac.at>
<2022Jan24.144612@mips.complang.tuwien.ac.at>
(<https://news.novabbs.com/computers/article-flat.php?id=23055&group=comp.arch#23055>;
note that the tree overview does not show the postings on the subjects
"Avoiding Spectre" and "Fixing Spectre"; you could try Google Groups
for a better tree view). You even participated in the recent
discussion.

Given that such CPUs are designed to be immune, I would not call it a
mitigation.

As for the architecture (you seem to assume that the supposed
significant performance cost is inevitable for "x86" and ARM):
architecture does not matter, because the bug is a microarchitectural
bug. You fix it in the microarchitecture.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<t1744k$o4s$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=24347&group=comp.arch#24347

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd4-d5e7-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
Date: Sun, 20 Mar 2022 11:45:24 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <t1744k$o4s$1@newsreader4.netcologne.de>
References: <sso6aq$37b$1@newsreader4.netcologne.de>
<UPXHJ.6202$9O.4300@fx12.iad> <ssplm7$1sv$1@newsreader4.netcologne.de>
<sspnd8$3d6$1@newsreader4.netcologne.de> <tJZHJ.10626$8Q.353@fx19.iad>
<ssqrr7$ptr$1@newsreader4.netcologne.de>
<0cf5023d-3458-46d2-ad3d-fa0e6ecb18dfn@googlegroups.com>
<t10mvq$4oe$1@gioia.aioe.org> <t11h9a$g5v$1@gioia.aioe.org>
<1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com>
<ea3fec28-7635-450a-afa9-ad3d93baef97n@googlegroups.com>
<3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>
<bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com>
<bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com>
<619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com>
<fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com>
<b19ea6d6-9d87-4ac4-b571-797a4bc1db2cn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 20 Mar 2022 11:45:24 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd4-d5e7-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd4:d5e7:0:7285:c2ff:fe6c:992d";
logging-data="24732"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Sun, 20 Mar 2022 11:45 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:
> On Saturday, March 19, 2022 at 9:33:44 PM UTC-5, Quadibloc wrote:
>> On Saturday, March 19, 2022 at 10:16:10 AM UTC-6, MitchAlsup wrote:
>>
>> > What if the mitigations do not bear such a heavy cost; say on the
>> > order of 1%.......
>> Then, as you say, the right choice _is_ to
>> > Build machines that are immune from Spectré.
><
>> What I've been reading, though, is that while costs of 2% to 3%
>> are associated with _some_ sets of mitigations for Spectre,
>> later versions of the attack have proven harder to deal with, and
>> so the figure for protecting against _all_ the variants of Spectre
>> and Meltdown is now at 25% and climbing.
><
> Just because x86 and ARM are having problems does not mean everyone is.

As are POWER and zSystem...

Is there an OOO implementation of a CPU (existing, in large-scale
use) that does not suffer from Spectre, or experience slowdowns
from the fixes?

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<0de33b45-2e18-4f69-a1f2-c633fa9dacaan@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24348&group=comp.arch#24348

X-Received: by 2002:a05:620a:4691:b0:67d:9bab:33d7 with SMTP id bq17-20020a05620a469100b0067d9bab33d7mr10561093qkb.500.1647785111931;
Sun, 20 Mar 2022 07:05:11 -0700 (PDT)
X-Received: by 2002:a05:6870:a2d0:b0:d9:ae66:b8e2 with SMTP id
w16-20020a056870a2d000b000d9ae66b8e2mr10235702oak.7.1647785111736; Sun, 20
Mar 2022 07:05:11 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 20 Mar 2022 07:05:11 -0700 (PDT)
In-Reply-To: <t1744k$o4s$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:9c02:dcf9:eab4:49a;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:9c02:dcf9:eab4:49a
References: <sso6aq$37b$1@newsreader4.netcologne.de> <UPXHJ.6202$9O.4300@fx12.iad>
<ssplm7$1sv$1@newsreader4.netcologne.de> <sspnd8$3d6$1@newsreader4.netcologne.de>
<tJZHJ.10626$8Q.353@fx19.iad> <ssqrr7$ptr$1@newsreader4.netcologne.de>
<0cf5023d-3458-46d2-ad3d-fa0e6ecb18dfn@googlegroups.com> <t10mvq$4oe$1@gioia.aioe.org>
<t11h9a$g5v$1@gioia.aioe.org> <1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com>
<ea3fec28-7635-450a-afa9-ad3d93baef97n@googlegroups.com> <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>
<bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com> <bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com>
<619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com> <fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com>
<b19ea6d6-9d87-4ac4-b571-797a4bc1db2cn@googlegroups.com> <t1744k$o4s$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <0de33b45-2e18-4f69-a1f2-c633fa9dacaan@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: already5...@yahoo.com (Michael S)
Injection-Date: Sun, 20 Mar 2022 14:05:11 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 32

by: Michael S - Sun, 20 Mar 2022 14:05 UTC

On Sunday, March 20, 2022 at 1:45:27 PM UTC+2, Thomas Koenig wrote:
> MitchAlsup <Mitch...@aol.com> schrieb:
> > On Saturday, March 19, 2022 at 9:33:44 PM UTC-5, Quadibloc wrote:
> >> On Saturday, March 19, 2022 at 10:16:10 AM UTC-6, MitchAlsup wrote:
> >>
> >> > What if the mitigations do not bear such a heavy cost; say on the
> >> > order of 1%.......
> >> Then, as you say, the right choice _is_ to
> >> > Build machines that are immune from Spectré.
> ><
> >> What I've been reading, though, is that while costs of 2% to 3%
> >> are associated with _some_ sets of mitigations for Spectre,
> >> later versions of the attack have proven harder to deal with, and
> >> so the figure for protecting against _all_ the variants of Spectre
> >> and Meltdown is now at 25% and climbing.
> ><
> > Just because x86 and ARM are having problems does not mean everyone is.
>
> As does POWER, and zSystem...
>
> Is there an OOO implementation of a CPU (existing, in large-scale
> use) that does not suffer from Spectre, or experience slowdowns
> from the fixes?

Is there any high-performance CPU, either OoO or in-order, that does not suffer from Spectre V1 (bounds check bypass)?
I heard claims that both Itanium uArchs are not vulnerable, but I don't remember if it came from an authoritative source.
POWER6 is certainly vulnerable.
Even some low-performance in-order cores, in particular Arm Cortex-A8, are vulnerable.

Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?

<t17hpf$l3e$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=24349&group=comp.arch#24349

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?
Date: Sun, 20 Mar 2022 15:38:23 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <t17hpf$l3e$1@gal.iecc.com>
References: <sso6aq$37b$1@newsreader4.netcologne.de> <dbbf2425-d7f7-4bf0-8a09-d360eb619421n@googlegroups.com> <t156o6$30a$1@gal.iecc.com> <31efa824-7995-42e1-9d57-26a7a67f9b8dn@googlegroups.com>
Injection-Date: Sun, 20 Mar 2022 15:38:23 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="21614"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <sso6aq$37b$1@newsreader4.netcologne.de> <dbbf2425-d7f7-4bf0-8a09-d360eb619421n@googlegroups.com> <t156o6$30a$1@gal.iecc.com> <31efa824-7995-42e1-9d57-26a7a67f9b8dn@googlegroups.com>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)

by: John Levine - Sun, 20 Mar 2022 15:38 UTC

According to Quadibloc <jsavard@ecn.ab.ca>:
>On Saturday, March 19, 2022 at 12:17:45 PM UTC-6, John Levine wrote:
>> According to Quadibloc <jsa...@ecn.ab.ca>:
>
>> >If there was an interrupt because of a memory access instruction
>> >having a page fault, ...
>
>> A what? What is a "page fault"? If you're looking for the 360/67, it's down the
>> hall on your right.
>
>Oh, you mean that when a program executes a memory access instruction
>to a location which... is in virtual memory, ...

You keep using words that we 360/91 programmers don't recognize. A program
is either in core (and I mean core) or it isn't.

I think that everyone who designs computers with virtual memory understands
that it has to be possible to restart after a page fault. Usually the
state is synchronized in hardware but I've seen a few where the fault dumps
internal registers and the fault handler has to fix up the state. Intel 860
I think did that.

There have also been implementation bugs. e.g., on the VAX 11/750 you
couldn't recover from faults in read-only stack pages. The brute force
response to that was vfork(), although the sensible approach would have
been to do copy-on-touch rather than copy-on-write in the stack.
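
For readers who have not met vfork(), here is a minimal C sketch of how it is typically used, assuming a POSIX system. The child borrows the parent's address space until it calls exec or _exit, so no stack pages have to be copied or faulted on in between. The helper name run_with_vfork is hypothetical and does not come from this thread.

```c
#define _DEFAULT_SOURCE   /* glibc: expose vfork() under strict -std= modes */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at `path` with arguments `argv` in a child created by
 * vfork(). Returns the child's exit status, or -1 on failure.
 * run_with_vfork is a hypothetical helper name used for illustration. */
int run_with_vfork(const char *path, char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;                /* vfork itself failed */
    if (pid == 0) {
        /* Child: shares the parent's memory, so it only calls exec or _exit. */
        execv(path, argv);
        _exit(127);               /* reached only if exec failed */
    }
    /* Parent: resumes once the child has called exec or _exit. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with "/bin/sh" and the arguments "sh", "-c", "exit 3" should return 3 on a system where /bin/sh exists.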
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?

<d05ed798-4a72-45d7-984d-993867f124d9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24350&group=comp.arch#24350

X-Received: by 2002:a05:6214:2aad:b0:441:50e:ce56 with SMTP id js13-20020a0562142aad00b00441050ece56mr2066602qvb.128.1647794162098;
Sun, 20 Mar 2022 09:36:02 -0700 (PDT)
X-Received: by 2002:a05:6808:2018:b0:2ec:c22b:15b8 with SMTP id
q24-20020a056808201800b002ecc22b15b8mr11883776oiw.136.1647794161882; Sun, 20
Mar 2022 09:36:01 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 20 Mar 2022 09:36:01 -0700 (PDT)
In-Reply-To: <t17hpf$l3e$1@gal.iecc.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:9c02:dcf9:eab4:49a;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:9c02:dcf9:eab4:49a
References: <sso6aq$37b$1@newsreader4.netcologne.de> <dbbf2425-d7f7-4bf0-8a09-d360eb619421n@googlegroups.com>
<t156o6$30a$1@gal.iecc.com> <31efa824-7995-42e1-9d57-26a7a67f9b8dn@googlegroups.com>
<t17hpf$l3e$1@gal.iecc.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d05ed798-4a72-45d7-984d-993867f124d9n@googlegroups.com>
Subject: Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?
From: already5...@yahoo.com (Michael S)
Injection-Date: Sun, 20 Mar 2022 16:36:02 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 33

by: Michael S - Sun, 20 Mar 2022 16:36 UTC

On Sunday, March 20, 2022 at 5:38:26 PM UTC+2, John Levine wrote:
> According to Quadibloc <jsa...@ecn.ab.ca>:
> >On Saturday, March 19, 2022 at 12:17:45 PM UTC-6, John Levine wrote:
> >> According to Quadibloc <jsa...@ecn.ab.ca>:
> >
> >> >If there was an interrupt because of a memory access instruction
> >> >having a page fault, ...
> >
> >> A what? What is a "page fault"? If you're looking for the 360/67, it's down the
> >> hall on your right.
> >
> >Oh, you mean that when a program executes a memory access instruction
> >to a location which... is in virtual memory, ...
>
> You keep using words that we 360/91 programmers don't recognize. A program
> is either in core (and I mean core) or it isn't.
>
> I think that everyone who designs computers with virtual memory understands
> that it has to be possible to restart after a page fault. Usually the
> state is synchronized in hardware but I've seen a few where the fault dumps
> internal registers and the fault handler has to fix up the state. Intel 860
> I think did that.
>
> There have also been implementation bugs. e.g., on the VAX 11/750 you
> couldn't recover from faults in read-only stack pages. The brute force
> response to that was vfork(), although the sensible approach would have
> been to do copy-on-touch rather than copy-on-write in the stack.
> --
> Regards,
> John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
> Please consider the environment before reading this e-mail. https://jl.ly

I think you're trying to remind John Savard that the S/360, with the possible exception of the Model 67, had nothing that we would now call an MMU.
I also think that you could do it in a more direct way.

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<2022Mar20.174140@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=24351&group=comp.arch#24351

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
Date: Sun, 20 Mar 2022 16:41:40 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 32
Message-ID: <2022Mar20.174140@mips.complang.tuwien.ac.at>
References: <sso6aq$37b$1@newsreader4.netcologne.de> <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com> <bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com> <bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com> <619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com> <fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com> <b19ea6d6-9d87-4ac4-b571-797a4bc1db2cn@googlegroups.com> <t1744k$o4s$1@newsreader4.netcologne.de> <0de33b45-2e18-4f69-a1f2-c633fa9dacaan@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="53f8d1dc4f582a946a31715650919540";
logging-data="3645"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX193qoOHCIvCWROulb77u4ZD"
Cancel-Lock: sha1:rpc/we4USPWhG3+P65TSsIYsYYw=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Sun, 20 Mar 2022 16:41 UTC

Michael S <already5chosen@yahoo.com> writes:
>On Sunday, March 20, 2022 at 1:45:27 PM UTC+2, Thomas Koenig wrote:
>> Is there an OOO implementation of a CPU (existing, in large-scale
>> use) that does not suffer from Spectre, or experience slowdowns
>> from the fixes?

Not yet AFAIK.

>Is there any high-performance CPU, either OoO or In-order, that does not
>suffer from SpectreV1 (bound check bypass) ?
>I heard claims that both Itanium uArchs are not vulnerable, but don't
>remember if it came from authoritative source.
>POWER6 is certainly vulnerable.
>Even some low-performance in-order cores, in particular Arm Cortex-A8, are
>vulnerable.

References?

Given that Spectre V1 performs two speculatively executed loads, with
the second depending on the result of the first, I would expect that
your typical in-order core is immune to that, because while the
typical in-order core can predict the branch and do instruction
fetching, decoding, and maybe even the execute stage, it stops at the
commit stage because it cannot back out of that. Ok, the second load
could get the result through the forwarding path and continue
execution with that, but do actual in-order cores do that? When does
it stop?
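
For readers who want the two dependent loads spelled out, here is a minimal C sketch of the bounds-check-bypass pattern. The names array1, array2 and victim_function and the 4096-byte stride are illustrative assumptions, not taken from this thread. The code only shows the shape of the victim code; it is not an exploit and does no timing measurement.

```c
#include <stddef.h>
#include <stdint.h>

#define PROBE_STRIDE 4096   /* assumed spacing: one page per possible byte value */

static uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8,
                             9, 10, 11, 12, 13, 14, 15, 16};
static size_t  array1_size = sizeof array1;
static uint8_t array2[256 * PROBE_STRIDE];

/* Architecturally, an out-of-bounds x never reaches either load and the
 * function returns 0. */
uint8_t victim_function(size_t x)
{
    uint8_t y = 0;
    if (x < array1_size) {                   /* branch that may be mispredicted */
        uint8_t value = array1[x];           /* first load */
        y = array2[value * PROBE_STRIDE];    /* second load: address depends on the first */
    }
    return y;
}
```

Whether a given core is affected then comes down to the question asked above: does it get as far as issuing the second load, using the result of the first, while the branch is still unresolved?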

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<qVJZJ.155290$r6p7.55679@fx41.iad>

https://www.novabbs.com/devel/article-flat.php?id=24353&group=comp.arch#24353

Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!feeder1.feed.usenet.farm!feed.usenet.farm!peer01.ams4!peer.am4.highwinds-media.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx41.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
References: <sso6aq$37b$1@newsreader4.netcologne.de> <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com> <bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com> <bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com> <619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com> <fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com> <b19ea6d6-9d87-4ac4-b571-797a4bc1db2cn@googlegroups.com> <t1744k$o4s$1@newsreader4.netcologne.de> <0de33b45-2e18-4f69-a1f2-c633fa9dacaan@googlegroups.com> <2022Mar20.174140@mips.complang.tuwien.ac.at>
In-Reply-To: <2022Mar20.174140@mips.complang.tuwien.ac.at>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 48
Message-ID: <qVJZJ.155290$r6p7.55679@fx41.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sun, 20 Mar 2022 17:56:38 UTC
Date: Sun, 20 Mar 2022 13:56:24 -0400
X-Received-Bytes: 3451

by: EricP - Sun, 20 Mar 2022 17:56 UTC

Anton Ertl wrote:
> Michael S <already5chosen@yahoo.com> writes:
>> On Sunday, March 20, 2022 at 1:45:27 PM UTC+2, Thomas Koenig wrote:
>>> Is there an OOO implementation of a CPU (existing, in large-scale
>>> use) that does not suffer from Spectre, or experience slowdowns
>>> from the fixes?
>
> Not yet AFAIK.
>
>> Is there any high-performance CPU, either OoO or In-order, that does not
>> suffer from SpectreV1 (bound check bypass) ?
>> I heard claims that both Itanium uArchs are not vulnerable, but don't
>> remember if it came from authoritative source.
>> POWER6 is certainly vulnerable.
>> Even some low-performance in-order cores, in particular Arm Cortex-A8, are
>> vulnerable.
>
> References?
>
> Given that Spectre V1 performs two speculatively executed loads, with
> the second depending on the result of the first, I would expect that
> your typical in-order core is immune to that, because while the
> typical in-order core can predict the branch and do instruction
> fetching, decoding, and maybe even the execute stage, it stops at the
> commit stage because it cannot back out of that. Ok, the second load
> could get the result through the forwarding path and continue
> execution with that, but do actual in-order cores do that? When does
> it stop?
>
> - anton

The following page contains a table of which Spectre variants affect which
Arm cores. As you can see, the in-order Cortex-A8 is vulnerable to V1 and V2.

https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability

Arm are slightly coy about disclosing exactly how this occurs on in-order
cores, but I suspect store values can be speculatively saved in the store
buffer (the secret byte), and used via store-load forwarding as an index
for an adjacent load, to touch a cache line and display the secret value.

This might also occur if the load-store pipeline is processed separately
from integer, and stores are performed at the WB stage but loads are
performed at the earlier EXE stage, and if store-load forwarding occurs.
The older store would not block the load from proceeding,
allowing the load to touch a cache line and display a secret value.
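
The store-load forwarding idea can be modelled in software. The following is a toy C model, not a description of any real Arm pipeline: the struct layout, the four-entry buffer and all function names are assumptions made for illustration. It shows a load picking up a value from a store that has not been committed, and the value vanishing once that store is squashed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_BYTES  64   /* toy memory size (assumed) */
#define SB_ENTRIES 4    /* toy store buffer depth (assumed) */

struct sb_entry {
    size_t  addr;
    uint8_t data;
};

struct core_model {
    uint8_t         memory[MEM_BYTES];   /* committed state */
    struct sb_entry sb[SB_ENTRIES];      /* pending stores, oldest first */
    size_t          sb_count;
};

/* Queue a store without committing it. Returns false if the buffer is
 * full or the address is out of range. */
bool model_store(struct core_model *c, size_t addr, uint8_t data)
{
    if (addr >= MEM_BYTES || c->sb_count == SB_ENTRIES)
        return false;
    c->sb[c->sb_count].addr = addr;
    c->sb[c->sb_count].data = data;
    c->sb_count++;
    return true;
}

/* Load one byte. The youngest matching pending store wins (forwarding);
 * otherwise the committed memory value is returned.
 * The caller must pass addr < MEM_BYTES. */
uint8_t model_load(const struct core_model *c, size_t addr, bool *forwarded)
{
    for (size_t i = c->sb_count; i > 0; i--) {
        if (c->sb[i - 1].addr == addr) {
            *forwarded = true;
            return c->sb[i - 1].data;
        }
    }
    *forwarded = false;
    return c->memory[addr];
}

/* Discard all pending stores, as after a misprediction. */
void model_squash(struct core_model *c)
{
    c->sb_count = 0;
}
```

With memory[5] holding 7, a pending store of 42 to address 5 makes model_load return 42 with forwarded set. After model_squash the same load returns 7 again, and memory was never modified. In the scenario suspected above, the forwarded value would already have been used as an index by a later load before the squash happened.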

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<4fa644c9-0faa-4d53-896b-501b97884b4fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24354&group=comp.arch#24354

X-Received: by 2002:a05:6214:3004:b0:434:ec44:a4aa with SMTP id ke4-20020a056214300400b00434ec44a4aamr13646495qvb.82.1647799113675;
Sun, 20 Mar 2022 10:58:33 -0700 (PDT)
X-Received: by 2002:a05:6870:5829:b0:c8:9f42:f919 with SMTP id
r41-20020a056870582900b000c89f42f919mr6829033oap.54.1647799113478; Sun, 20
Mar 2022 10:58:33 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 20 Mar 2022 10:58:33 -0700 (PDT)
In-Reply-To: <2022Mar20.174140@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:f1de:c51c:bcfe:6234;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:f1de:c51c:bcfe:6234
References: <sso6aq$37b$1@newsreader4.netcologne.de> <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>
<bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com> <bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com>
<619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com> <fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com>
<b19ea6d6-9d87-4ac4-b571-797a4bc1db2cn@googlegroups.com> <t1744k$o4s$1@newsreader4.netcologne.de>
<0de33b45-2e18-4f69-a1f2-c633fa9dacaan@googlegroups.com> <2022Mar20.174140@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <4fa644c9-0faa-4d53-896b-501b97884b4fn@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 20 Mar 2022 17:58:33 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 41

by: MitchAlsup - Sun, 20 Mar 2022 17:58 UTC

On Sunday, March 20, 2022 at 11:54:37 AM UTC-5, Anton Ertl wrote:
> Michael S <already...@yahoo.com> writes:
> >On Sunday, March 20, 2022 at 1:45:27 PM UTC+2, Thomas Koenig wrote:
> >> Is there an OOO implementation of a CPU (existing, in large-scale
> >> use) that does not suffer from Spectre, or experience slowdowns
> >> from the fixes?
>
> Not yet AFAIK.
>
> >Is there any high-performance CPU, either OoO or In-order, that does not
> >suffer from SpectreV1 (bound check bypass) ?
> >I heard claims that both Itanium uArchs are not vulnerable, but don't
> >remember if it came from authoritative source.
> >POWER6 is certainly vulnerable.
> >Even some low-performance in-order cores, in particular Arm Cortex-A8, are
> >vulnerable.
>
> References?
>
> Given that Spectre V1 performs two speculatively executed loads, with
> the second depending on the result of the first, I would expect that
> your typical in-order core is immune to that, because while the
> typical in-order core can predict the branch and do instruction
> fetching, decoding, and maybe even the execute stage, it stops at the
> commit stage because it cannot back out of that. Ok, the second load
> could get the result through the forwarding path and continue
> execution with that, but do actual in-order cores do that? When does
> it stop?
<
AMD machines were not sensitive to the double load version of Spectré
These all fit the usual definition of GBOoO machines.
<
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<c5fa7171-22da-4878-87a1-e7181edba629n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24355&group=comp.arch#24355

X-Received: by 2002:a37:6407:0:b0:67e:4423:7127 with SMTP id y7-20020a376407000000b0067e44237127mr9770265qkb.526.1647831838323;
Sun, 20 Mar 2022 20:03:58 -0700 (PDT)
X-Received: by 2002:a05:6870:9590:b0:de:27ca:c60c with SMTP id
k16-20020a056870959000b000de27cac60cmr512076oao.108.1647831838039; Sun, 20
Mar 2022 20:03:58 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 20 Mar 2022 20:03:57 -0700 (PDT)
In-Reply-To: <0de33b45-2e18-4f69-a1f2-c633fa9dacaan@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:20d5:40c0:3e5e:1bfe;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:20d5:40c0:3e5e:1bfe
References: <sso6aq$37b$1@newsreader4.netcologne.de> <UPXHJ.6202$9O.4300@fx12.iad>
<ssplm7$1sv$1@newsreader4.netcologne.de> <sspnd8$3d6$1@newsreader4.netcologne.de>
<tJZHJ.10626$8Q.353@fx19.iad> <ssqrr7$ptr$1@newsreader4.netcologne.de>
<0cf5023d-3458-46d2-ad3d-fa0e6ecb18dfn@googlegroups.com> <t10mvq$4oe$1@gioia.aioe.org>
<t11h9a$g5v$1@gioia.aioe.org> <1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com>
<ea3fec28-7635-450a-afa9-ad3d93baef97n@googlegroups.com> <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>
<bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com> <bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com>
<619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com> <fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com>
<b19ea6d6-9d87-4ac4-b571-797a4bc1db2cn@googlegroups.com> <t1744k$o4s$1@newsreader4.netcologne.de>
<0de33b45-2e18-4f69-a1f2-c633fa9dacaan@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c5fa7171-22da-4878-87a1-e7181edba629n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Mon, 21 Mar 2022 03:03:58 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 7

by: Quadibloc - Mon, 21 Mar 2022 03:03 UTC

On Sunday, March 20, 2022 at 8:05:13 AM UTC-6, Michael S wrote:

> Is there any high-performance CPU, either OoO or In-order,

Well, an in-order CPU would definitely not suffer from Spectre.
But is there such a thing as a high-performance in-order CPU?

John Savard

Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?

<07499662-a7ce-484e-9377-4a3f4b8cea81n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24356&group=comp.arch#24356

X-Received: by 2002:a0c:fe47:0:b0:42d:f798:3da5 with SMTP id u7-20020a0cfe47000000b0042df7983da5mr14870006qvs.77.1647832005094;
Sun, 20 Mar 2022 20:06:45 -0700 (PDT)
X-Received: by 2002:a05:6870:1607:b0:de:984:496d with SMTP id
b7-20020a056870160700b000de0984496dmr2018524oae.253.1647832004891; Sun, 20
Mar 2022 20:06:44 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 20 Mar 2022 20:06:44 -0700 (PDT)
In-Reply-To: <d05ed798-4a72-45d7-984d-993867f124d9n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:20d5:40c0:3e5e:1bfe;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:20d5:40c0:3e5e:1bfe
References: <sso6aq$37b$1@newsreader4.netcologne.de> <dbbf2425-d7f7-4bf0-8a09-d360eb619421n@googlegroups.com>
<t156o6$30a$1@gal.iecc.com> <31efa824-7995-42e1-9d57-26a7a67f9b8dn@googlegroups.com>
<t17hpf$l3e$1@gal.iecc.com> <d05ed798-4a72-45d7-984d-993867f124d9n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <07499662-a7ce-484e-9377-4a3f4b8cea81n@googlegroups.com>
Subject: Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Mon, 21 Mar 2022 03:06:45 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 16

by: Quadibloc - Mon, 21 Mar 2022 03:06 UTC

On Sunday, March 20, 2022 at 10:36:03 AM UTC-6, Michael S wrote:

> I think you're trying to remind John Savard that S/360, with the possible exception of the Model 67, had nothing that we would now call an MMU.
> I also think that you could do it in a more direct way.

And I was perfectly well aware that the 360/91 didn't have virtual memory.

Even the 360/195 which had cache didn't have virtual memory.

But since I was discussing out-of-order execution in general, and
not the 360/91 in particular, why on Earth would someone get
the impression that I had forgotten that by my mention of page
faults?

This is now getting more confusing by the minute.

John Savard

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<7b566a91-2eda-4289-aeab-c2d879931d11n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24358&group=comp.arch#24358

X-Received: by 2002:a05:620a:31a0:b0:67d:7500:1752 with SMTP id bi32-20020a05620a31a000b0067d75001752mr12200718qkb.485.1647861404782;
Mon, 21 Mar 2022 04:16:44 -0700 (PDT)
X-Received: by 2002:a05:6870:45a4:b0:dd:b08e:fa49 with SMTP id
y36-20020a05687045a400b000ddb08efa49mr8401095oao.270.1647861404292; Mon, 21
Mar 2022 04:16:44 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 21 Mar 2022 04:16:44 -0700 (PDT)
In-Reply-To: <4fa644c9-0faa-4d53-896b-501b97884b4fn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <sso6aq$37b$1@newsreader4.netcologne.de> <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>
<bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com> <bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com>
<619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com> <fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com>
<b19ea6d6-9d87-4ac4-b571-797a4bc1db2cn@googlegroups.com> <t1744k$o4s$1@newsreader4.netcologne.de>
<0de33b45-2e18-4f69-a1f2-c633fa9dacaan@googlegroups.com> <2022Mar20.174140@mips.complang.tuwien.ac.at>
<4fa644c9-0faa-4d53-896b-501b97884b4fn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7b566a91-2eda-4289-aeab-c2d879931d11n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: already5...@yahoo.com (Michael S)
Injection-Date: Mon, 21 Mar 2022 11:16:44 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

by: Michael S - Mon, 21 Mar 2022 11:16 UTC

On Sunday, March 20, 2022 at 7:58:34 PM UTC+2, MitchAlsup wrote:
> On Sunday, March 20, 2022 at 11:54:37 AM UTC-5, Anton Ertl wrote:
> > Michael S <already...@yahoo.com> writes:
> > >On Sunday, March 20, 2022 at 1:45:27 PM UTC+2, Thomas Koenig wrote:
> > >> Is there an OOO implementation of a CPU (existing, in large-scale
> > >> use) that does not suffer from Spectre, or experience slowdowns
> > >> from the fixes?
> >
> > Not yet AFAIK.
> >
> > >Is there any high-performance CPU, either OoO or In-order, that does not suffer from SpectreV1 (bound check bypass)?
> > >I heard claims that both Itanium uArchs are not vulnerable, but don't remember if it came from authoritative source.
> > >POWER6 is certainly vulnerable.
> > >Even some low-performance in-order cores, in particular Arm Cortex-A8, are vulnerable.
> >
> > References?
> >
> > Given that Spectre V1 performs two speculatively executed loads, with
> > the second depending on the result of the first, I would expect that
> > your typical in-order core is immune to that, because while the
> > typical in-order core can predict the branch and do instruction
> > fetching, decoding, and maybe even the execute stage, it stops at the
> > commit stage because it cannot back out of that. Ok, the second load
> > could get the result through the forwarding path and continue
> > execution with that, but do actual in-order cores do that? When does
> > it stop?
> <
> AMD machines were not sensitive to the double load version of Spectré
> These all fit the usual definition of GBOoO machines.

AMD machines (Bulldozer derivatives and Zen) were not vulnerable to Spectre-V3, a.k.a. Meltdown.
They were, nevertheless, vulnerable to Spectre-V1, a.k.a. Bounds Check Bypass.
Both V1 and V3 are double-load exploits.

This is a quote from the AMD whitepaper:
For all flavors of variant 1, the AMD mitigation recommendation is software only
solutions which need to be evaluated in a wide range of software including kernel
software, JITs, browsers, and other user applications

> <
> > - anton
> > --
> > 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> > Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<34c92e84-445d-431b-a821-4f4e49da50ffn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24359&group=comp.arch#24359

X-Received: by 2002:ac8:57d6:0:b0:2e0:68af:7c52 with SMTP id w22-20020ac857d6000000b002e068af7c52mr15802868qta.380.1647861936910;
Mon, 21 Mar 2022 04:25:36 -0700 (PDT)
X-Received: by 2002:a05:6830:40c8:b0:5cb:557a:bba6 with SMTP id
h8-20020a05683040c800b005cb557abba6mr4634926otu.350.1647861936703; Mon, 21
Mar 2022 04:25:36 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 21 Mar 2022 04:25:36 -0700 (PDT)
In-Reply-To: <c5fa7171-22da-4878-87a1-e7181edba629n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <sso6aq$37b$1@newsreader4.netcologne.de> <UPXHJ.6202$9O.4300@fx12.iad>
<ssplm7$1sv$1@newsreader4.netcologne.de> <sspnd8$3d6$1@newsreader4.netcologne.de>
<tJZHJ.10626$8Q.353@fx19.iad> <ssqrr7$ptr$1@newsreader4.netcologne.de>
<0cf5023d-3458-46d2-ad3d-fa0e6ecb18dfn@googlegroups.com> <t10mvq$4oe$1@gioia.aioe.org>
<t11h9a$g5v$1@gioia.aioe.org> <1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com>
<ea3fec28-7635-450a-afa9-ad3d93baef97n@googlegroups.com> <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>
<bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com> <bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com>
<619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com> <fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com>
<b19ea6d6-9d87-4ac4-b571-797a4bc1db2cn@googlegroups.com> <t1744k$o4s$1@newsreader4.netcologne.de>
<0de33b45-2e18-4f69-a1f2-c633fa9dacaan@googlegroups.com> <c5fa7171-22da-4878-87a1-e7181edba629n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <34c92e84-445d-431b-a821-4f4e49da50ffn@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: already5...@yahoo.com (Michael S)
Injection-Date: Mon, 21 Mar 2022 11:25:36 +0000
Content-Type: text/plain; charset="UTF-8"

by: Michael S - Mon, 21 Mar 2022 11:25 UTC

On Monday, March 21, 2022 at 5:03:59 AM UTC+2, Quadibloc wrote:
> On Sunday, March 20, 2022 at 8:05:13 AM UTC-6, Michael S wrote:
>
> > Is there any high-performance CPU, either OoO or In-order,
> Well, an in-order CPU would definitely not suffer from Spectre.

No, in-order CPUs are not automatically immune. Read the post you are replying to.

> But is there such a thing as a high-performance in-order CPU?

Right now the closest thing to a high-performance in-order CPU is Intel Poulson, which is
quite significantly slower than the best OoO CPUs, but I wouldn't call it slow in an absolute sense.

Back at the time of its introduction (2007), POWER6 was most definitely considered a high-performance CPU.
It was approximately on par with the fastest competitors' offerings in single-threaded benchmarks
and easily beat them in throughput-per-core metrics.
Yes, that was 15 years ago. Whether that is a long time or a short one depends on your perspective.

>
> John Savard

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<t19ohr$r21$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=24360&group=comp.arch#24360

Path: i2pn2.org!i2pn.org!aioe.org!T3F9KNSTSM9ffyC31YXeHw.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
Date: Mon, 21 Mar 2022 12:46:01 +0100
Organization: Aioe.org NNTP Server
Message-ID: <t19ohr$r21$1@gioia.aioe.org>
References: <sso6aq$37b$1@newsreader4.netcologne.de>
<UPXHJ.6202$9O.4300@fx12.iad> <ssplm7$1sv$1@newsreader4.netcologne.de>
<sspnd8$3d6$1@newsreader4.netcologne.de> <tJZHJ.10626$8Q.353@fx19.iad>
<ssqrr7$ptr$1@newsreader4.netcologne.de>
<0cf5023d-3458-46d2-ad3d-fa0e6ecb18dfn@googlegroups.com>
<t10mvq$4oe$1@gioia.aioe.org> <t11h9a$g5v$1@gioia.aioe.org>
<1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com>
<t12aif$182v$1@gioia.aioe.org>
<ef80eb7c-8b61-4464-bd1c-e6551f826582n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="27713"; posting-host="T3F9KNSTSM9ffyC31YXeHw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.11
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Mon, 21 Mar 2022 11:46 UTC

Michael S wrote:
> The overwhelming majority of FP instructions, as measured by appearance in the world's program code, is scalar.
> Quite possibly, that's not true for the majority of *executed* FP instructions and esp. for the majority of
> performed FLOPs, but I wouldn't bet even on that. There are quite a few Cortex-M4s out there per
> each A64FX.
>
> Well, considering that all GPU code is SIMD (even if they call it SIMT), maybe not.
> But for GPUs I agree with you and Mitch about uniform latency as the way to go.
>
> BTW, in a post above I didn't mean *one* cycle faster.
> What I had in mind was heavily pipelined design where FMADD is 5 clocks and FADD is 3.

That's interesting: From Mitch's numbers FADD is ~= 48 gates, FMAC = 62,
leaving no possible gates/cycle value which makes the first 3 and the
second 5 cycles?

I.e. FMAC==5 means gates/clock must be between 13 (65 gates) and 15
(75), leaving 39 to 45 gates for a 3-cycles FADD, so I'll take 45 as the
limit.
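
The arithmetic above can be checked with a small sketch. It assumes the simplest possible model, namely that an operation with G gates of logic takes ceil(G / g) cycles at g gates per cycle. The 48 and 62 gate figures are the ones quoted above; the function and variable names are made up for illustration.

```python
# Simplified model (an assumption): latency = ceil(gate depth / gates per cycle).
FADD_GATES = 48   # approximate FADD depth quoted in the post
FMAC_GATES = 62   # FMAC depth quoted in the post


def cycles(gates, gates_per_cycle):
    """Pipeline stages needed to fit `gates` of logic (integer ceiling)."""
    return -(-gates // gates_per_cycle)


candidates = range(1, FMAC_GATES + 1)
fmac_in_5 = [g for g in candidates if cycles(FMAC_GATES, g) == 5]
fadd_in_3 = [g for g in candidates if cycles(FADD_GATES, g) == 3]
both = sorted(set(fmac_in_5) & set(fadd_in_3))

print("gates/cycle giving FMAC in 5 cycles:", fmac_in_5)
print("gates/cycle giving FADD in 3 cycles:", fadd_in_3)
print("gates/cycle satisfying both:", both)
print("gate budget of a 3-cycle FADD at those rates:", [3 * g for g in fmac_in_5])
```

Under this model the two sets of gates-per-cycle values do not overlap, which is the point made above. A real design may well count mux, latch, and wire delays differently.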

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<a41172c7-1903-400c-bfae-c54978d46605n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24364&group=comp.arch#24364

X-Received: by 2002:ac8:7fc6:0:b0:2e1:ce3e:b491 with SMTP id b6-20020ac87fc6000000b002e1ce3eb491mr17138703qtk.287.1647885854083;
Mon, 21 Mar 2022 11:04:14 -0700 (PDT)
X-Received: by 2002:a05:6870:a2d0:b0:d9:ae66:b8e2 with SMTP id
w16-20020a056870a2d000b000d9ae66b8e2mr137391oak.7.1647885853841; Mon, 21 Mar
2022 11:04:13 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 21 Mar 2022 11:04:13 -0700 (PDT)
In-Reply-To: <t19ohr$r21$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <sso6aq$37b$1@newsreader4.netcologne.de> <UPXHJ.6202$9O.4300@fx12.iad>
<ssplm7$1sv$1@newsreader4.netcologne.de> <sspnd8$3d6$1@newsreader4.netcologne.de>
<tJZHJ.10626$8Q.353@fx19.iad> <ssqrr7$ptr$1@newsreader4.netcologne.de>
<0cf5023d-3458-46d2-ad3d-fa0e6ecb18dfn@googlegroups.com> <t10mvq$4oe$1@gioia.aioe.org>
<t11h9a$g5v$1@gioia.aioe.org> <1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com>
<t12aif$182v$1@gioia.aioe.org> <ef80eb7c-8b61-4464-bd1c-e6551f826582n@googlegroups.com>
<t19ohr$r21$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a41172c7-1903-400c-bfae-c54978d46605n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: already5...@yahoo.com (Michael S)
Injection-Date: Mon, 21 Mar 2022 18:04:14 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 42

by: Michael S - Mon, 21 Mar 2022 18:04 UTC

On Monday, March 21, 2022 at 1:46:07 PM UTC+2, Terje Mathisen wrote:
> Michael S wrote:
> > Overwhelming majority of FP instructions, as measure by appearance in world's program code, is scalar.
> > Quite possibly, it's not true for majority of *executed* FP instructions and esp. for majority of
> > performed FLOPs, but I wouldn't bet even on that. There are quite a few Cortex-M4s out here per
> > each A64FX.
> >
> > Well, considering that all GPU code is SIMD (even if they call it SIMT), may be not.
> > But for GPUs I agree with you and Mitch about uniform latency as a way to go.
> >
> > BTW, in a post above I didn't mean *one* cycle faster.
> > What I had in mind was heavily pipelined design where FMADD is 5 clocks and FADD is 3.
>
> That's interesting: From Mitch's numbers FADD is ~= 48 gates, FMAC = 62,
> leaving no possible gates/cycle value which makes the first 3 and the
> second 5 cycles?
>
> I.e. FMAC==5 means gates/clock must be between 13 (65 gates) and 15
> (75), leaving 39 to 45 gates for a 3-cycles FADD, so I'll take 45 as the
> limit.
>
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

I don't know which components of latency are included in Mitch's figures and which are omitted.
In particular, do they include the muxes that select inputs between RF read ports and forwarding sources?
Such muxes can be more complicated for FMADD (3 inputs) than for FADD/FMUL (2 inputs).
Do the figures include the delays of pipeline latches?
Do they include wire delays to the forwarding network and into the input muxes?

What I do know is that, as mentioned in Anton Ertl's post, in recent years we have seen several
high-volume CPU cores with sw-visible FADD latency = 3 and sw-visible FMADD latency = 5.
The FMUL latency of these designs varied (5 on HSWL, 4 on Zen1, 3 on BDWL and Zen2).
One of these designs (Zen2) is still shipping in significant volumes.

I don't know for sure what the FO4-per-stage design target was for each of these cores,
but if I had to guess I'd say that none of them was 16 or 17, more like 20 or 21.

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<134185cd-d54d-4c8d-9fac-3dadae9f6624n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24367&group=comp.arch#24367

X-Received: by 2002:a05:622a:1051:b0:2e1:eb06:ecc2 with SMTP id f17-20020a05622a105100b002e1eb06ecc2mr17192658qte.171.1647886370711;
Mon, 21 Mar 2022 11:12:50 -0700 (PDT)
X-Received: by 2002:a05:6870:1692:b0:dd:9dc0:1747 with SMTP id
j18-20020a056870169200b000dd9dc01747mr149142oae.205.1647886370470; Mon, 21
Mar 2022 11:12:50 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 21 Mar 2022 11:12:50 -0700 (PDT)
In-Reply-To: <a41172c7-1903-400c-bfae-c54978d46605n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:30:df54:dce3:b07d;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:30:df54:dce3:b07d
References: <sso6aq$37b$1@newsreader4.netcologne.de> <UPXHJ.6202$9O.4300@fx12.iad>
<ssplm7$1sv$1@newsreader4.netcologne.de> <sspnd8$3d6$1@newsreader4.netcologne.de>
<tJZHJ.10626$8Q.353@fx19.iad> <ssqrr7$ptr$1@newsreader4.netcologne.de>
<0cf5023d-3458-46d2-ad3d-fa0e6ecb18dfn@googlegroups.com> <t10mvq$4oe$1@gioia.aioe.org>
<t11h9a$g5v$1@gioia.aioe.org> <1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com>
<t12aif$182v$1@gioia.aioe.org> <ef80eb7c-8b61-4464-bd1c-e6551f826582n@googlegroups.com>
<t19ohr$r21$1@gioia.aioe.org> <a41172c7-1903-400c-bfae-c54978d46605n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <134185cd-d54d-4c8d-9fac-3dadae9f6624n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 21 Mar 2022 18:12:50 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 57

by: MitchAlsup - Mon, 21 Mar 2022 18:12 UTC

On Monday, March 21, 2022 at 1:04:15 PM UTC-5, Michael S wrote:
> On Monday, March 21, 2022 at 1:46:07 PM UTC+2, Terje Mathisen wrote:
> > Michael S wrote:
> > > Overwhelming majority of FP instructions, as measure by appearance in world's program code, is scalar.
> > > Quite possibly, it's not true for majority of *executed* FP instructions and esp. for majority of
> > > performed FLOPs, but I wouldn't bet even on that. There are quite a few Cortex-M4s out here per
> > > each A64FX.
> > >
> > > Well, considering that all GPU code is SIMD (even if they call it SIMT), may be not.
> > > But for GPUs I agree with you and Mitch about uniform latency as a way to go.
> > >
> > > BTW, in a post above I didn't mean *one* cycle faster.
> > > What I had in mind was heavily pipelined design where FMADD is 5 clocks and FADD is 3.
> >
> > That's interesting: From Mitch's numbers FADD is ~= 48 gates, FMAC = 62,
> > leaving no possible gates/cycle value which makes the first 3 and the
> > second 5 cycles?
> >
> > I.e. FMAC==5 means gates/clock must be between 13 (65 gates) and 15
> > (75), leaving 39 to 45 gates for a 3-cycles FADD, so I'll take 45 as the
> > limit.
> >
> > Terje
> >
> > --
> > - <Terje.Mathisen at tmsw.no>
> > "almost all programming can be viewed as an exercise in caching"
<
> I don't know which components of latency are included in figures of Mitch and which are omitted.
> In particular, do they include muxes for selection of inputs between RF read ports and forwarding sources?
<
These are pipeline delays, not FPU delays. FPU delay starts when the operands arrive and
finishes when the FPU result is forwarded to become an operand of a subsequent instruction.
<
> Such muxes can be more complicated for FMADD (3 inputs) vs FADD/FMUL (2 inputs).
<
If you can route the wires to where they need to be, a 1:2 mux versus a 1:4 mux is
not that much different in the gates of delay problem (both are 1 gate of delay).
<
> Do figures include delays of pipeline latches?
>
No, flip-flops are part of the clocking of the pipeline, not part of the logic of calculation.
>
> Do they include wire delays to forwarding network and into input muxes?
<
Yes.
>
> What I do know is that, as mentioned in post of Anton Ertl, in recent years we had seen several
> high-volume CPU cores that had sw-visible FADD latency = 3 and sw-visible FMADD=5.
> FMUL latency of these designs varied (5 on HSWL, 4 on Zen1, 3 on BDWL and Zen2).
> One of these designs (Zen2) is still shipping in significant volumes.
>
> I don't know for sure what was FO4 per stage design target for each and every of these cores,
> but if I had to guess I'd say that none of them was 16 or 17, more like 20 or 21.
<
Athlon was 16 gates of logic per cycle
Opteron was 17 gates of logic per cycle
Bulldozer tried to be 12.....

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<bd0c9217-7b0b-48b5-9532-acf2c3825806n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24370&group=comp.arch#24370

X-Received: by 2002:a05:620a:56b:b0:62c:eff4:fe8d with SMTP id p11-20020a05620a056b00b0062ceff4fe8dmr13207736qkp.459.1647888508022;
Mon, 21 Mar 2022 11:48:28 -0700 (PDT)
X-Received: by 2002:a05:6870:d249:b0:dd:ada6:736b with SMTP id
h9-20020a056870d24900b000ddada6736bmr215931oac.27.1647888507723; Mon, 21 Mar
2022 11:48:27 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 21 Mar 2022 11:48:27 -0700 (PDT)
In-Reply-To: <134185cd-d54d-4c8d-9fac-3dadae9f6624n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <sso6aq$37b$1@newsreader4.netcologne.de> <UPXHJ.6202$9O.4300@fx12.iad>
<ssplm7$1sv$1@newsreader4.netcologne.de> <sspnd8$3d6$1@newsreader4.netcologne.de>
<tJZHJ.10626$8Q.353@fx19.iad> <ssqrr7$ptr$1@newsreader4.netcologne.de>
<0cf5023d-3458-46d2-ad3d-fa0e6ecb18dfn@googlegroups.com> <t10mvq$4oe$1@gioia.aioe.org>
<t11h9a$g5v$1@gioia.aioe.org> <1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com>
<t12aif$182v$1@gioia.aioe.org> <ef80eb7c-8b61-4464-bd1c-e6551f826582n@googlegroups.com>
<t19ohr$r21$1@gioia.aioe.org> <a41172c7-1903-400c-bfae-c54978d46605n@googlegroups.com>
<134185cd-d54d-4c8d-9fac-3dadae9f6624n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <bd0c9217-7b0b-48b5-9532-acf2c3825806n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: already5...@yahoo.com (Michael S)
Injection-Date: Mon, 21 Mar 2022 18:48:28 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 69

by: Michael S - Mon, 21 Mar 2022 18:48 UTC

On Monday, March 21, 2022 at 8:12:52 PM UTC+2, MitchAlsup wrote:
> On Monday, March 21, 2022 at 1:04:15 PM UTC-5, Michael S wrote:
> > On Monday, March 21, 2022 at 1:46:07 PM UTC+2, Terje Mathisen wrote:
> > > Michael S wrote:
> > > > Overwhelming majority of FP instructions, as measure by appearance in world's program code, is scalar.
> > > > Quite possibly, it's not true for majority of *executed* FP instructions and esp. for majority of
> > > > performed FLOPs, but I wouldn't bet even on that. There are quite a few Cortex-M4s out here per
> > > > each A64FX.
> > > >
> > > > Well, considering that all GPU code is SIMD (even if they call it SIMT), may be not.
> > > > But for GPUs I agree with you and Mitch about uniform latency as a way to go.
> > > >
> > > > BTW, in a post above I didn't mean *one* cycle faster.
> > > > What I had in mind was heavily pipelined design where FMADD is 5 clocks and FADD is 3.
> > >
> > > That's interesting: From Mitch's numbers FADD is ~= 48 gates, FMAC = 62,
> > > leaving no possible gates/cycle value which makes the first 3 and the
> > > second 5 cycles?
> > >
> > > I.e. FMAC==5 means gates/clock must be between 13 (65 gates) and 15
> > > (75), leaving 39 to 45 gates for a 3-cycles FADD, so I'll take 45 as the
> > > limit.
> > >
> > > Terje
> > >
> > > --
> > > - <Terje.Mathisen at tmsw.no>
> > > "almost all programming can be viewed as an exercise in caching"
> <
> > I don't know which components of latency are included in figures of Mitch and which are omitted.
> > In particular, do they include muxes for selection of inputs between RF read ports and forwarding sources?
> <
> These are pipeline delays not FPU delays. FPU delay starts when operands arrive and
> finish when FPU result is forwarded to become an operand of a subsequent instruction.
> <
> > Such muxes can be more complicated for FMADD (3 inputs) vs FADD/FMUL (2 inputs).
> <
> If you can route the wires to where they need to be, a 1:2 mux versus a 1:4 mux is
> not that much different in the gates of delay problem (both are 1 gate of delay).
> <
> > Do figures include delays of pipeline latches?
> >
> No, flip-flops are part of the clocking of the pipeline, not part of the logic of calculation.
> >
> > Do they include wire delays to forwarding network and into input muxes?
> <
> Yes,

Do you assume 2 FPUs that freely forward to each other, plus one or more non-FP ports that can also be sources for the FPUs?
For example, in HSWL/BDWL each of the two FPUs can be fed by 3 execution ports. I also think that FP load
results can be forwarded into the FPUs, bypassing the RF, but I am not sure about it.
And that's just HSWL/BDWL; Zen2 is significantly wider than that.

> >
> > What I do know is that, as mentioned in post of Anton Ertl, in recent years we had seen several
> > high-volume CPU cores that had sw-visible FADD latency = 3 and sw-visible FMADD=5.
> > FMUL latency of these designs varied (5 on HSWL, 4 on Zen1, 3 on BDWL and Zen2).
> > One of these designs (Zen2) is still shipping in significant volumes.
> >
> > I don't know for sure what was FO4 per stage design target for each and every of these cores,
> > but if I had to guess I'd say that none of them was 16 or 17, more like 20 or 21.
> <
> Athlon was 16 gates of logic per cycle
> Opteron was 17 gates of logic per cycle
> Bulldozer tried to be 12.....

The CPUs mentioned above are certainly clocked much more conservatively than BD and probably more conservatively than K8 too, especially Broadwell, whose maximum clock frequency of 3.8 GHz is quite low relative to other Intel 14nm CPUs.
(Broadwell-E turbos to 4 GHz, but it's a different die and likely does so only at quite high voltage.)

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<t1ahum$4rd$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24371&group=comp.arch#24371

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
Date: Mon, 21 Mar 2022 19:59:33 +0100
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <t1ahum$4rd$1@dont-email.me>
References: <sso6aq$37b$1@newsreader4.netcologne.de>
<06d93522-9312-4c73-8c4f-8fc29e305b81n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 21 Mar 2022 18:59:34 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="0fcea8588bfc284dfde8940db91e62f0";
logging-data="4973"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+NXBpk2W0G4Bmi2WsWzOTNn05M16J7uBg="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Cancel-Lock: sha1:cTE8X5POUzuYZsr9ukQDN7XJ6LY=
In-Reply-To: <06d93522-9312-4c73-8c4f-8fc29e305b81n@googlegroups.com>
Content-Language: en-US

by: Marcus - Mon, 21 Mar 2022 18:59 UTC

On 2022-01-25, MitchAlsup wrote:
> On Tuesday, January 25, 2022 at 12:45:50 AM UTC-6, Thomas Koenig wrote:
>> Looking at Alpha, and also now at LoongArch, I find separate
>> instructions for 32-bit and 64-bit addition for a 64-bit
>> architecture. For example, Alpha has both addl (add longword)
>> and addq (add quadword).
>>
>> What is not quite clear to me is the rationale behind this - a
>> 64-bit addition is (blindingly obviously) just a 32-bit addition
>> when ignoring the high bits.
> <
> For My 66000 architecture, arithmetic is only register size (except
> an accommodation for single precision floats).
> <
> On the integer side: if one wants a particular number of bits after
> arithmetic has been performed, I have both signed and unsigned
> versions of extract (any number of bits from 1..64) not just std
> sizes.

Have you got the statistics for how common this is?

It seems to me that mainstream 64-bit ISAs (x86-64 & ARMv8) have
fairly extensive support for 32-bit arithmetic on 64-bit registers.
RV64 also has reasonable 32-bit arithmetic support.

My assumption is that the decision was data driven: that too much
software is written with C type int rather than long long, and the
compiler ends up emitting lots of sign/zero-extension or truncation
operations.

I'm still not sure if this could be sorted out more efficiently at the
compiler level, to minimize the number of size conversions, or if that
is already a solved problem and the decision to support 32-bit
arithmetic is actually warranted.
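
As a rough illustration of why the narrower type is not free on a 64-bit machine, here is a minimal C sketch (the function names are made up for this example, not taken from any ISA or compiler). The low 32 bits of a 64-bit add do match the 32-bit add, but anything that later looks at the whole register (a compare, a right shift, a widening to 64 bits) only agrees after an explicit truncation or extension, which is the extra operation in question.

```c
#include <stdint.h>

/* 32-bit add as C defines it for uint32_t: the result wraps modulo 2^32. */
uint32_t add32(uint32_t a, uint32_t b) {
    return (uint32_t)(a + b);
}

/* The same add done at full register width, with no fix-up afterwards. */
uint64_t add64_raw(uint64_t a, uint64_t b) {
    return a + b;
}

/* The fix-up a 64-bit-only ISA needs before the value is used as a
   32-bit unsigned quantity: clear bits 63..32 (zero-extension). */
uint64_t zext32(uint64_t x) {
    return x & 0xFFFFFFFFu;
}

/* The signed counterpart: replicate bit 31 into bits 63..32.
   Assumes the usual two's-complement conversion to int32_t. */
int64_t sext32(uint64_t x) {
    return (int64_t)(int32_t)(uint32_t)x;
}
```

With a = 0xFFFFFFFF and b = 1, add32 gives 0 while add64_raw gives 0x100000000; the two only compare equal after zext32. Instructions such as Alpha's addl or RV64's addw fold a sign-extending fix-up into the add itself.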

> <
> On the floating point side: There is a full complement (49) of insts
> that convert one {S, U, F, D} to the other with any of the rounding
> modes.

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<98c688f6-e886-4378-ab1a-85cf879832ben@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24372&group=comp.arch#24372

X-Received: by 2002:ac8:5d86:0:b0:2e1:b9fd:ec24 with SMTP id d6-20020ac85d86000000b002e1b9fdec24mr17615827qtx.290.1647889495691;
Mon, 21 Mar 2022 12:04:55 -0700 (PDT)
X-Received: by 2002:a05:6808:211f:b0:2da:84f6:9eed with SMTP id
r31-20020a056808211f00b002da84f69eedmr285306oiw.239.1647889495459; Mon, 21
Mar 2022 12:04:55 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 21 Mar 2022 12:04:55 -0700 (PDT)
In-Reply-To: <bd0c9217-7b0b-48b5-9532-acf2c3825806n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:30:df54:dce3:b07d;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:30:df54:dce3:b07d
References: <sso6aq$37b$1@newsreader4.netcologne.de> <UPXHJ.6202$9O.4300@fx12.iad>
<ssplm7$1sv$1@newsreader4.netcologne.de> <sspnd8$3d6$1@newsreader4.netcologne.de>
<tJZHJ.10626$8Q.353@fx19.iad> <ssqrr7$ptr$1@newsreader4.netcologne.de>
<0cf5023d-3458-46d2-ad3d-fa0e6ecb18dfn@googlegroups.com> <t10mvq$4oe$1@gioia.aioe.org>
<t11h9a$g5v$1@gioia.aioe.org> <1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com>
<t12aif$182v$1@gioia.aioe.org> <ef80eb7c-8b61-4464-bd1c-e6551f826582n@googlegroups.com>
<t19ohr$r21$1@gioia.aioe.org> <a41172c7-1903-400c-bfae-c54978d46605n@googlegroups.com>
<134185cd-d54d-4c8d-9fac-3dadae9f6624n@googlegroups.com> <bd0c9217-7b0b-48b5-9532-acf2c3825806n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <98c688f6-e886-4378-ab1a-85cf879832ben@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 21 Mar 2022 19:04:55 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 78

by: MitchAlsup - Mon, 21 Mar 2022 19:04 UTC

On Monday, March 21, 2022 at 1:48:29 PM UTC-5, Michael S wrote:
> On Monday, March 21, 2022 at 8:12:52 PM UTC+2, MitchAlsup wrote:
> > On Monday, March 21, 2022 at 1:04:15 PM UTC-5, Michael S wrote:
> > > On Monday, March 21, 2022 at 1:46:07 PM UTC+2, Terje Mathisen wrote:
> > > > Michael S wrote:
> > > > > The overwhelming majority of FP instructions, as measured by appearance in the world's program code, is scalar.
> > > > > Quite possibly, it's not true for the majority of *executed* FP instructions and esp. for the majority of
> > > > > performed FLOPs, but I wouldn't bet even on that. There are quite a few Cortex-M4s out there for
> > > > > each A64FX.
> > > > >
> > > > > Well, considering that all GPU code is SIMD (even if they call it SIMT), maybe not.
> > > > > But for GPUs I agree with you and Mitch about uniform latency as a way to go.
> > > > >
> > > > > BTW, in a post above I didn't mean *one* cycle faster.
> > > > > What I had in mind was heavily pipelined design where FMADD is 5 clocks and FADD is 3.
> > > >
> > > > That's interesting: From Mitch's numbers FADD is ~= 48 gates, FMAC = 62,
> > > > leaving no possible gates/cycle value which makes the first 3 and the
> > > > second 5 cycles?
> > > >
> > > > I.e. FMAC==5 means gates/clock must be between 13 (65 gates) and 15
> > > > (75), leaving 39 to 45 gates for a 3-cycles FADD, so I'll take 45 as the
> > > > limit.
> > > >
> > > > Terje
> > > >
> > > > --
> > > > - <Terje.Mathisen at tmsw.no>
> > > > "almost all programming can be viewed as an exercise in caching"
> > <
> > > I don't know which components of latency are included in figures of Mitch and which are omitted.
> > > In particular, do they include muxes for selection of inputs between RF read ports and forwarding sources?
> > <
> > These are pipeline delays, not FPU delays. FPU delay starts when operands arrive and
> > finishes when the FPU result is forwarded to become an operand of a subsequent instruction.
> > <
> > > Such muxes can be more complicated for FMADD (3 inputs) vs FADD/FMUL (2 inputs).
> > <
> > If you can route the wires to where they need to be, a 1:2 mux versus a 1:4 mux is
> > not that much different in the gates of delay problem (both are 1 gate of delay).
> > <
> > > Do figures include delays of pipeline latches?
> > >
> > No, flip-flops are part of the clocking of the pipeline, not part of the logic of calculation.
> > >
> > > Do they include wire delays to forwarding network and into input muxes?
> > <
> > Yes,
<
> Do you assume 2 FPUs that freely forward to each other and one or more non-FP ports that also can be sources to FPU ?
<
You have to assume that the Reg File, any FPU, and any LD port can be feeding operands to any FPU.
So, if you have 2 FPUs, 3 memory units, each operand comes from 1 of 6 places. (1 RF, 2 FPU, 3 MEM).
<
> For example, in HSWL/BDWL each of two FPUs can be fed by 3 execution ports.
<
Can you say this again and use different words. Or at least define the words you are using.
<
> I also think that during FP loads
> results can be forwarded into FPUs bypassing RF, but I am not sure about it.
> And that just HSWL/BDWL, Zen2 is significantly wider than that.
<
Hard Starboard Won't List ?
Bowel Deflection Wasn't Liked ?
<
> > >
> > > What I do know is that, as mentioned in post of Anton Ertl, in recent years we had seen several
> > > high-volume CPU cores that had sw-visible FADD latency = 3 and sw-visible FMADD=5.
> > > FMUL latency of these designs varied (5 on HSWL, 4 on Zen1, 3 on BDWL and Zen2).
> > > One of these designs (Zen2) is still shipping in significant volumes.
> > >
> > > I don't know for sure what was FO4 per stage design target for each and every of these cores,
> > > but if I had to guess I'd say that none of them was 16 or 17, more like 20 or 21.
> > <
> > Athlon was 16 gates of logic per cycle
> > Opteron was 17 gates of logic per cycle
> > Bulldozer tried to be 12.....
> Those CPU mentioned above are certainly much more conservative w.r.t. clocking than BD and probably more conservative than K8 too. Esp. Broadwell that has max. clock frequency = 3.8GHz which is quite low relatively to other Intel 14nm CPUs.
> (Broadwell-E turbos to 4GHz, but it's a different die and likely does it only at quite high voltage).
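
Terje's gate-count argument quoted above can be checked mechanically. The sketch below is only a model under stated assumptions: it counts logic depth alone (48 gates for FADD and 62 for FMAC, the figures attributed to Mitch), rounds up to whole cycles, and ignores latch overhead, wire delay and bypass muxes. The function names are hypothetical.

```c
/* Pipeline stages needed for `gates` levels of logic when each clock
   has room for `gates_per_cycle` levels (ceiling division). */
unsigned stages(unsigned gates, unsigned gates_per_cycle) {
    return (gates + gates_per_cycle - 1) / gates_per_cycle;
}

/* Smallest gates/cycle in [lo, hi] that gives exactly the requested
   FADD and FMAC stage counts, or 0 if no value in the range does.
   `lo` must be at least 1. */
unsigned find_gates_per_cycle(unsigned fadd_gates, unsigned fmac_gates,
                              unsigned fadd_cycles, unsigned fmac_cycles,
                              unsigned lo, unsigned hi) {
    for (unsigned g = lo; g <= hi; g++) {
        if (stages(fadd_gates, g) == fadd_cycles &&
            stages(fmac_gates, g) == fmac_cycles)
            return g;
    }
    return 0;
}
```

With 48 and 62 gates, no value from 1 to 64 gates per cycle yields 3 and 5 cycles, while 16 through 20 yield 3 and 4. If shipping cores really show 3 and 5, something outside the pure logic-depth count must account for it, which is what the questions about muxes, latches and wires are probing.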

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<t1aku3$3oc$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24376&group=comp.arch#24376

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
Date: Mon, 21 Mar 2022 20:50:27 +0100
Organization: A noiseless patient Spider
Lines: 42
Message-ID: <t1aku3$3oc$1@dont-email.me>
References: <t1ahum$4rd$1@dont-email.me>
<memo.20220321191516.1928b@jgd.cix.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 21 Mar 2022 19:50:27 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="0fcea8588bfc284dfde8940db91e62f0";
logging-data="3852"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Uu3i5V6i+QnA5f2kBl05qUFeGKW87EVM="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Cancel-Lock: sha1:GDx6nMYzD0zhfPK8XhgFncK6/34=
In-Reply-To: <memo.20220321191516.1928b@jgd.cix.co.uk>
Content-Language: en-US

by: Marcus - Mon, 21 Mar 2022 19:50 UTC

On 2022-03-21, John Dallman wrote:
> In article <t1ahum$4rd$1@dont-email.me>, m.delete@this.bitsnbites.eu
> (Marcus) wrote:
>
>> My assumption is that the decision was data driven: that too much
>> software is written with C type int rather than long long, and the
>> compiler ends up emitting lots of sign/zero-extension or truncation
>> operations.
>
> Software that counts macroscopic physical objects, or their equivalents
> in modelling space, is usually happy with 32-bit ints. Switching to
> 64-bit ints for that won't be attractive until support for ILP32
> programming environments is /completely/ dead for any given type of
> applications.
>

There's also the issue of programming language bias. The utterly
dominant integer type in C/C++ is still "int", which maps to a 32-bit
integer for most architectures.

for (int i = 0; i < N; i++) { do_something(); }

...is still very common. And types are viral (you usually avoid mixing
different integer types in a segment of code).

And finally, there's the data size issue. For large data sets it's
perfectly valid to use 32-bit (or even 16-bit) integers to reduce memory
and D$ load. It's a common trick in 3D graphics, for instance (which is
what you were hinting at, I guess).
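
To put rough numbers on the data size point, here is a small C sketch. The helper names are hypothetical, and the 64-byte cache line used in the note below is an assumption (common, but not universal). It shows the usual pattern: store indices narrow, widen them only when they are loaded for arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store n indices of the given element size. */
size_t index_buffer_bytes(size_t n, size_t elem_size) {
    return n * elem_size;
}

/* How many indices of the given size fit in one cache line. */
size_t indices_per_line(size_t line_bytes, size_t elem_size) {
    return line_bytes / elem_size;
}

/* Narrow storage, wide arithmetic: each 16-bit index is zero-extended
   to 64 bits on load, so the sum cannot wrap at 16 bits. */
uint64_t sum_indices16(const uint16_t *idx, size_t n) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += idx[i];
    return sum;
}
```

For a million indices, 16-bit storage takes 2 MB where 64-bit storage takes 8 MB, and a 64-byte line holds 32 of the former against 8 of the latter.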

> The stuff I work on is all 64-bit, except that some customers still
> insist on 32-bit Windows support. This is usually because they use one or
> another commercial software component whose supplier is scared of 64-bit,
> or wants a great deal more money for it.

In my case it's 32-bit architectures for embedded platforms that set the
limit. E.g. many DSPs are 32-bit, and ARMv7 is still common. But ARMv8
and x86-64 are also in the mix.

>
> John

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<a24ef448-6f5a-4bfd-82c4-54988de5d58bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24377&group=comp.arch#24377

X-Received: by 2002:ac8:5acd:0:b0:2e1:e117:b303 with SMTP id d13-20020ac85acd000000b002e1e117b303mr18370432qtd.216.1647892289512;
Mon, 21 Mar 2022 12:51:29 -0700 (PDT)
X-Received: by 2002:a05:6808:2185:b0:2d9:ebf0:fb66 with SMTP id
be5-20020a056808218500b002d9ebf0fb66mr378660oib.69.1647892289164; Mon, 21 Mar
2022 12:51:29 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 21 Mar 2022 12:51:28 -0700 (PDT)
In-Reply-To: <t1ahum$4rd$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:30:df54:dce3:b07d;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:30:df54:dce3:b07d
References: <sso6aq$37b$1@newsreader4.netcologne.de> <06d93522-9312-4c73-8c4f-8fc29e305b81n@googlegroups.com>
<t1ahum$4rd$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a24ef448-6f5a-4bfd-82c4-54988de5d58bn@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 21 Mar 2022 19:51:29 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 82

by: MitchAlsup - Mon, 21 Mar 2022 19:51 UTC

On Monday, March 21, 2022 at 1:59:37 PM UTC-5, Marcus wrote:
> On 2022-01-25, MitchAlsup wrote:
> > On Tuesday, January 25, 2022 at 12:45:50 AM UTC-6, Thomas Koenig wrote:
> >> Looking at Alpha, and also now at LoongArch, I find separate
> >> instructions for 32-bit and 64-bit addition for a 64-bit
> >> architecture. For example, Alpha has both addl (add longword)
> >> and addq (add quadword).
> >>
> >> What is not quite clear to me is the rationale behind this - a
> >> 64-bit addition is (blindingly obviously) just a 32-bit addition
> >> when ignoring the high bits.
> > <
> > For My 66000 architecture, arithmetic is only register size (except
> > an accommodation for single precision floats).
> > <
> > On the integer side: if one wants a particular number of bits after
> > arithmetic has been performed, I have both signed and unsigned
> > versions of extract (any number of bits from 1..64) not just std
> > sizes.
>
> Have you got the statistics for how common this is?
<
Page fault handlers need 10-bit and 9-bit field extracts from the VA.
<
Floating point needs an 11-bit (8-bit) field 1 bit down (33 down) from the top.
<
And once you start supporting random offsets and random lengths,
you don't need explicit support for 32-bits.
<
It occurs often enough, and costs so little beyond the shifter itself,
that it falls into the "why not" category. Note: a 64-bit shift is 3 layers
of logic (4:1 mux) with lots of wire delay. You have time to decode the
mask and amplify the sign bit and still fit in 16 gates with ease.
<
Thus, adding masking capability to the shifter adds area but no real
delay.
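
A C sketch of what such an extract computes, for readers who want to see the special cases fall out. This is not My 66000 code: the function names and the (offset, width) argument order are assumptions made for illustration, and the signed version relies on the usual two's-complement conversion to int64_t.

```c
#include <stdint.h>

/* Unsigned extract: `width` bits of x starting at bit `offset`.
   Requires 1 <= width <= 64 and offset + width <= 64. */
uint64_t extract_u(uint64_t x, unsigned offset, unsigned width) {
    /* Shift the field to the top of the register, then back down. */
    return (x << (64 - offset - width)) >> (64 - width);
}

/* Signed extract: same field, with its top bit replicated upwards. */
int64_t extract_s(uint64_t x, unsigned offset, unsigned width) {
    uint64_t v = extract_u(x, offset, width);
    uint64_t sign = 1ull << (width - 1);
    /* (v ^ sign) - sign sign-extends a `width`-bit value. */
    return (int64_t)((v ^ sign) - sign);
}
```

A 9-bit table index at bit 12 of a virtual address is extract_u(va, 12, 9), the exponent of a double is extract_u(bits, 52, 11), and sign-extending a 32-bit result is just extract_s(x, 0, 32), so no dedicated 32-bit case is needed.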
>
> It seems to me that mainstream 64-bit ISAs (x86-64 & ARMv8) have
> fairly extensive support for 32-bit arithmetic on 64-bit registers.
> RV64 also has reasonable 32-bit arithmetic support.
<
Mostly because they started out as 32-bit machines (or smaller).
None of them has support for 24-bit arithmetic, or 37, or 49.....
<
And part of this is: if you don't provide X, programmers will not use
X, and thus you were justified in leaving X out. A self-fulfilling prophecy.
>
> My assumption is that the decision was data driven: that too much
> software is written with C type int rather than long long, and the
> compiler ends up emitting lots of sign/zero-extension or truncation
> operations.
<
Any compiler (hint: LLVM) which compiles code such that a container
never holds a value outside of what it is allowed to contain will
emit plenty of these "smashes". This has a low surprise coefficient.
>
> I'm still not sure if this could be sorted out more efficiently at the
> compiler level, to minimize the number of size conversions, or if that
> is already a solved problem and the decision to support 32-bit
> arithmetic is actually warranted.
<
Where does it stop?
{8, 16, 32, 64} ?
{Right, middle, left}
{1,2,4,128,256,512}?
{2,3,5,7,11,13,17,19,23}?
In other words, it is your encoding; choose what delivers more to
your bottom line.
<
And if you want to know how it hurts your bottom line: jump in and
draw up the circuits to see if it fits!
>
> > <
> > On the floating point side: There is a full complement (49) of insts
> > that convert one {S, U, F, D} to the other with any of the rounding
> > modes.
<
In the 1-operand subGroup there are encodings for 2048 instructions.
Burning up 49 of these on conversions, which use almost identical
circuitry, alongside only 8 other encodings, leaves no argument that
you will run out of encodings; the argument switches around to what is
useful to the compiled code.

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<935a2d8f-e3a8-46aa-b3c2-d2f4e25460b8n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24381&group=comp.arch#24381

X-Received: by 2002:a05:620a:2487:b0:67b:3113:f83f with SMTP id i7-20020a05620a248700b0067b3113f83fmr13974383qkn.604.1647896211979;
Mon, 21 Mar 2022 13:56:51 -0700 (PDT)
X-Received: by 2002:a05:6808:211f:b0:2da:84f6:9eed with SMTP id
r31-20020a056808211f00b002da84f69eedmr484613oiw.239.1647896211776; Mon, 21
Mar 2022 13:56:51 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 21 Mar 2022 13:56:51 -0700 (PDT)
In-Reply-To: <98c688f6-e886-4378-ab1a-85cf879832ben@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:9c02:dcf9:eab4:49a;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:9c02:dcf9:eab4:49a
References: <sso6aq$37b$1@newsreader4.netcologne.de> <UPXHJ.6202$9O.4300@fx12.iad>
<ssplm7$1sv$1@newsreader4.netcologne.de> <sspnd8$3d6$1@newsreader4.netcologne.de>
<tJZHJ.10626$8Q.353@fx19.iad> <ssqrr7$ptr$1@newsreader4.netcologne.de>
<0cf5023d-3458-46d2-ad3d-fa0e6ecb18dfn@googlegroups.com> <t10mvq$4oe$1@gioia.aioe.org>
<t11h9a$g5v$1@gioia.aioe.org> <1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com>
<t12aif$182v$1@gioia.aioe.org> <ef80eb7c-8b61-4464-bd1c-e6551f826582n@googlegroups.com>
<t19ohr$r21$1@gioia.aioe.org> <a41172c7-1903-400c-bfae-c54978d46605n@googlegroups.com>
<134185cd-d54d-4c8d-9fac-3dadae9f6624n@googlegroups.com> <bd0c9217-7b0b-48b5-9532-acf2c3825806n@googlegroups.com>
<98c688f6-e886-4378-ab1a-85cf879832ben@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <935a2d8f-e3a8-46aa-b3c2-d2f4e25460b8n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: already5...@yahoo.com (Michael S)
Injection-Date: Mon, 21 Mar 2022 20:56:51 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 130

by: Michael S - Mon, 21 Mar 2022 20:56 UTC

On Monday, March 21, 2022 at 9:04:57 PM UTC+2, MitchAlsup wrote:
> On Monday, March 21, 2022 at 1:48:29 PM UTC-5, Michael S wrote:
> > On Monday, March 21, 2022 at 8:12:52 PM UTC+2, MitchAlsup wrote:
> > > On Monday, March 21, 2022 at 1:04:15 PM UTC-5, Michael S wrote:
> > > > On Monday, March 21, 2022 at 1:46:07 PM UTC+2, Terje Mathisen wrote:
> > > > > Michael S wrote:
> > > > > > The overwhelming majority of FP instructions, as measured by appearance in the world's program code, is scalar.
> > > > > > Quite possibly, it's not true for the majority of *executed* FP instructions and esp. for the majority of
> > > > > > performed FLOPs, but I wouldn't bet even on that. There are quite a few Cortex-M4s out there for
> > > > > > each A64FX.
> > > > > >
> > > > > > Well, considering that all GPU code is SIMD (even if they call it SIMT), maybe not.
> > > > > > But for GPUs I agree with you and Mitch about uniform latency as a way to go.
> > > > > >
> > > > > > BTW, in a post above I didn't mean *one* cycle faster.
> > > > > > What I had in mind was heavily pipelined design where FMADD is 5 clocks and FADD is 3.
> > > > >
> > > > > That's interesting: From Mitch's numbers FADD is ~= 48 gates, FMAC = 62,
> > > > > leaving no possible gates/cycle value which makes the first 3 and the
> > > > > second 5 cycles?
> > > > >
> > > > > I.e. FMAC==5 means gates/clock must be between 13 (65 gates) and 15
> > > > > (75), leaving 39 to 45 gates for a 3-cycles FADD, so I'll take 45 as the
> > > > > limit.
> > > > >
> > > > > Terje
> > > > >
> > > > > --
> > > > > - <Terje.Mathisen at tmsw.no>
> > > > > "almost all programming can be viewed as an exercise in caching"
> > > <
> > > > I don't know which components of latency are included in figures of Mitch and which are omitted.
> > > > In particular, do they include muxes for selection of inputs between RF read ports and forwarding sources?
> > > <
> > > These are pipeline delays, not FPU delays. FPU delay starts when operands arrive and
> > > finishes when the FPU result is forwarded to become an operand of a subsequent instruction.
> > > <
> > > > Such muxes can be more complicated for FMADD (3 inputs) vs FADD/FMUL (2 inputs).
> > > <
> > > If you can route the wires to where they need to be, a 1:2 mux versus a 1:4 mux is
> > > not that much different in the gates of delay problem (both are 1 gate of delay).
> > > <
> > > > Do figures include delays of pipeline latches?
> > > >
> > > No, flip-flops are part of the clocking of the pipeline, not part of the logic of calculation.
> > > >
> > > > Do they include wire delays to forwarding network and into input muxes?
> > > <
> > > Yes,
> <
> > Do you assume 2 FPUs that freely forward to each other and one or more non-FP ports that also can be sources to FPU ?
> <
> You have to assume that the Reg File, any FPU, and any LD port can be feeding operands to any FPU.
> So, if you have 2 FPUs, 3 memory units, each operand comes from 1 of 6 places. (1 RF, 2 FPU, 3 MEM).

Almost. In the particular case of HSWL/BDWL: 2 FPUs, 1 non-FP SIMD EU (int ops, logical ops, shifts, shuffles) and 2 MEM loads.

> <
> > For example, in HSWL/BDWL each of two FPUs can be fed by 3 execution ports.
> <
> Can you say this again and use different words. Or at least define the words you are using.

Execution ports are defined both in Intel's "Intel® 64 and IA-32 Architectures Optimization Reference Manual", e.g. 248966-042b
and in Agner Fog's microarchitecture tables.

> <
> > I also think that during FP loads
> > results can be forwarded into FPUs bypassing RF, but I am not sure about it.
> > And that just HSWL/BDWL, Zen2 is significantly wider than that.
> <
> Hard Starboard Won't List ?
> Bowel Deflection Wasn't Liked ?

Intel codenames: Haswell and Broadwell. A pair of rather similar microarchitectures, though their FP parts are not identical.

> <
> > > >
> > > > What I do know is that, as mentioned in post of Anton Ertl, in recent years we had seen several
> > > > high-volume CPU cores that had sw-visible FADD latency = 3 and sw-visible FMADD=5.
> > > > FMUL latency of these designs varied (5 on HSWL, 4 on Zen1, 3 on BDWL and Zen2).
> > > > One of these designs (Zen2) is still shipping in significant volumes.
> > > >
> > > > I don't know for sure what was FO4 per stage design target for each and every of these cores,
> > > > but if I had to guess I'd say that none of them was 16 or 17, more like 20 or 21.
> > > <
> > > Athlon was 16 gates of logic per cycle
> > > Opteron was 17 gates of logic per cycle
> > > Bulldozer tried to be 12.....
> > Those CPU mentioned above are certainly much more conservative w.r.t. clocking than BD and probably more conservative than K8 too. Esp. Broadwell that has max. clock frequency = 3.8GHz which is quite low relatively to other Intel 14nm CPUs.
> > (Broadwell-E turbos to 4GHz, but it's a different die and likely does it only at quite high voltage).

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<9a6a22cd-48fb-4880-8468-b4f8561bfa20n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24383&group=comp.arch#24383

X-Received: by 2002:a05:620a:1386:b0:67e:46c1:1b33 with SMTP id k6-20020a05620a138600b0067e46c11b33mr11925580qki.53.1647897049445;
Mon, 21 Mar 2022 14:10:49 -0700 (PDT)
X-Received: by 2002:a05:6808:113:b0:2ec:b7db:df66 with SMTP id
b19-20020a056808011300b002ecb7dbdf66mr588643oie.108.1647897049210; Mon, 21
Mar 2022 14:10:49 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 21 Mar 2022 14:10:49 -0700 (PDT)
In-Reply-To: <34c92e84-445d-431b-a821-4f4e49da50ffn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:5f3:4cc4:d5eb:d696;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:5f3:4cc4:d5eb:d696
References: <sso6aq$37b$1@newsreader4.netcologne.de> <UPXHJ.6202$9O.4300@fx12.iad>
<ssplm7$1sv$1@newsreader4.netcologne.de> <sspnd8$3d6$1@newsreader4.netcologne.de>
<tJZHJ.10626$8Q.353@fx19.iad> <ssqrr7$ptr$1@newsreader4.netcologne.de>
<0cf5023d-3458-46d2-ad3d-fa0e6ecb18dfn@googlegroups.com> <t10mvq$4oe$1@gioia.aioe.org>
<t11h9a$g5v$1@gioia.aioe.org> <1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com>
<ea3fec28-7635-450a-afa9-ad3d93baef97n@googlegroups.com> <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>
<bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com> <bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com>
<619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com> <fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com>
<b19ea6d6-9d87-4ac4-b571-797a4bc1db2cn@googlegroups.com> <t1744k$o4s$1@newsreader4.netcologne.de>
<0de33b45-2e18-4f69-a1f2-c633fa9dacaan@googlegroups.com> <c5fa7171-22da-4878-87a1-e7181edba629n@googlegroups.com>
<34c92e84-445d-431b-a821-4f4e49da50ffn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <9a6a22cd-48fb-4880-8468-b4f8561bfa20n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Mon, 21 Mar 2022 21:10:49 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 23

by: Quadibloc - Mon, 21 Mar 2022 21:10 UTC

On Monday, March 21, 2022 at 5:25:38 AM UTC-6, Michael S wrote:

> Right now the closest thing to a high-performance in-order CPU is Intel Poulson, which is
> quite significantly slower than the best OoO CPUs, but I wouldn't call it slow in an absolute sense.

If it isn't possible to make an in-order CPU that is as fast
as *the best there is*, then in-order CPUs are basically
eliminated from consideration - for doing the heavy lifting.

But that doesn't mean that they're useless. Even a 386 SX
could run Windows 3.1 with good responsiveness, therefore
a 486 CPU would be plenty fast enough for a core intended
to run untrusted code.

But just running untrusted code on a Spectre-proof CPU isn't
good enough unless all the rest of your security is perfect, and
that doesn't seem to be attainable. (But if this _were_ the only
practical alternative, then perhaps more effort might be made
to make operating systems genuinely secure. Yes, they're not
secure now, but this is largely for lack of trying, in my opinion.
_That's_ why I make this suggestion, though it may seem daft
to Mitch.)

John Savard

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<82d668ec-a140-454c-8606-9e665c750997n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24384&group=comp.arch#24384

X-Received: by 2002:a05:622a:208:b0:2e1:b3ec:b7ce with SMTP id b8-20020a05622a020800b002e1b3ecb7cemr18176234qtx.345.1647897739440;
Mon, 21 Mar 2022 14:22:19 -0700 (PDT)
X-Received: by 2002:a05:6808:1152:b0:2da:c7f:66c2 with SMTP id
u18-20020a056808115200b002da0c7f66c2mr570553oiu.253.1647897739215; Mon, 21
Mar 2022 14:22:19 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 21 Mar 2022 14:22:19 -0700 (PDT)
In-Reply-To: <34c92e84-445d-431b-a821-4f4e49da50ffn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:5f3:4cc4:d5eb:d696;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:5f3:4cc4:d5eb:d696
References: <sso6aq$37b$1@newsreader4.netcologne.de> <UPXHJ.6202$9O.4300@fx12.iad>
<ssplm7$1sv$1@newsreader4.netcologne.de> <sspnd8$3d6$1@newsreader4.netcologne.de>
<tJZHJ.10626$8Q.353@fx19.iad> <ssqrr7$ptr$1@newsreader4.netcologne.de>
<0cf5023d-3458-46d2-ad3d-fa0e6ecb18dfn@googlegroups.com> <t10mvq$4oe$1@gioia.aioe.org>
<t11h9a$g5v$1@gioia.aioe.org> <1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com>
<ea3fec28-7635-450a-afa9-ad3d93baef97n@googlegroups.com> <3d95c40a-41c8-48db-a983-98a2fc066023n@googlegroups.com>
<bdef5bdc-0d13-4e62-8763-289d955094ebn@googlegroups.com> <bbf922c2-66d1-432c-aee7-d66a4097623dn@googlegroups.com>
<619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com> <fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com>
<b19ea6d6-9d87-4ac4-b571-797a4bc1db2cn@googlegroups.com> <t1744k$o4s$1@newsreader4.netcologne.de>
<0de33b45-2e18-4f69-a1f2-c633fa9dacaan@googlegroups.com> <c5fa7171-22da-4878-87a1-e7181edba629n@googlegroups.com>
<34c92e84-445d-431b-a821-4f4e49da50ffn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <82d668ec-a140-454c-8606-9e665c750997n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Mon, 21 Mar 2022 21:22:19 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 18

by: Quadibloc - Mon, 21 Mar 2022 21:22 UTC

On Monday, March 21, 2022 at 5:25:38 AM UTC-6, Michael S wrote:
> On Monday, March 21, 2022 at 5:03:59 AM UTC+2, Quadibloc wrote:

> > But is there such a thing as a high-performance in-order CPU?

> Right now the closest thing to a high-performance in-order CPU is Intel Poulson, which is
> quite significantly slower than the best OoO CPUs, but I wouldn't call it slow in an absolute sense.

I had to look it up to refresh my memory. Intel Poulson is an *Itanium*,
and this microarchitecture was even made a bit faster in Kittson by
using a newer process.

This certainly would offer a jump in performance over a 486 core.

But it doesn't address the issue of *running existing Windows software*
which is basically what a computer is *for* as far as the market is
concerned.

John Savard

Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Thomas Koenig
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Anton Ertl
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	EricP
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	George Neuner
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Spectre ane EPIC (was: Why separate 32-bit arithmetic...)	Anton Ertl
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Spectre (was: Why separate 32-bit arithmetic ...)	Anton Ertl
Re: Spectre (was: Why separate 32-bit arithmetic ...)	Michael S
Re: Spectre	EricP
Re: Spectre	MitchAlsup
Re: Spectre	EricP
Re: Spectre	MitchAlsup
Re: Spectre	Anton Ertl
Re: Spectre (was: Why separate 32-bit arithmetic ...)	Anton Ertl
Re: Spectre (was: Why separate 32-bit arithmetic ...)	MitchAlsup
Re: Spectre (was: Why separate 32-bit arithmetic ...)	Thomas Koenig
Re: Spectre (was: Why separate 32-bit arithmetic ...)	Anton Ertl
Re: Spectre	EricP
Re: Spectre	Anton Ertl
Memory encryption (was: Spectre)	Thomas Koenig
Re: Memory encryption (was: Spectre)	Anton Ertl
Re: Memory encryption (was: Spectre)	Elijah Stone
Re: Memory encryption (was: Spectre)	Michael S
Re: Memory encryption (was: Spectre)	Anton Ertl
Re: Memory encryption (was: Spectre)	MitchAlsup
Re: Memory encryption (was: Spectre)	Thomas Koenig
Re: Memory encryption (was: Spectre)	Anton Ertl
Re: Spectre	Terje Mathisen
Re: Spectre	Thomas Koenig
Re: Spectre	Anton Ertl
Re: Spectre	Thomas Koenig
Re: Spectre	Anton Ertl
Re: Spectre	Michael S
Re: Spectre	MitchAlsup
Re: Spectre (was: Why separate 32-bit arithmetic ...)	MitchAlsup
Re: Spectre (was: Why separate 32-bit arithmetic ...)	Anton Ertl
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Anton Ertl
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Bill Findlay
Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?	John Levine
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Anton Ertl
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc