Re: Spectre

Subject	Author
Why separate 32-bit arithmetic on a 64-bit architecture?	Thomas Koenig
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	BGB
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	David Brown
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	BGB
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Anton Ertl
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Anton Ertl
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Anton Ertl
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	BGB
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Marcus
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Marcus
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	EricP
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Thomas Koenig
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Thomas Koenig
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	EricP
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Thomas Koenig
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Terje Mathisen
The cost of gradual underflow (was: Why separate 32-bit arithmetic on a 64-bit a	Stefan Monnier
Re: The cost of gradual underflow	Terje Mathisen
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	antispam
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Terje Mathisen
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Terje Mathisen
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Anton Ertl
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Terje Mathisen
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Thomas Koenig
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Terje Mathisen
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Thomas Koenig
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Anton Ertl
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	EricP
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	George Neuner
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Spectre and EPIC (was: Why separate 32-bit arithmetic...)	Anton Ertl
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Spectre (was: Why separate 32-bit arithmetic ...)	Anton Ertl
Re: Spectre (was: Why separate 32-bit arithmetic ...)	Michael S
Re: Spectre	EricP
Re: Spectre	MitchAlsup
Re: Spectre	EricP
Re: Spectre	MitchAlsup
Re: Spectre	Anton Ertl
Re: Spectre (was: Why separate 32-bit arithmetic ...)	Anton Ertl
Re: Spectre (was: Why separate 32-bit arithmetic ...)	MitchAlsup
Re: Spectre (was: Why separate 32-bit arithmetic ...)	Thomas Koenig
Re: Spectre (was: Why separate 32-bit arithmetic ...)	Anton Ertl
Re: Spectre	EricP
Re: Spectre	Anton Ertl
Memory encryption (was: Spectre)	Thomas Koenig
Re: Memory encryption (was: Spectre)	Anton Ertl
Re: Memory encryption (was: Spectre)	Elijah Stone
Re: Memory encryption (was: Spectre)	Michael S
Re: Memory encryption (was: Spectre)	Anton Ertl
Re: Memory encryption (was: Spectre)	MitchAlsup
Re: Memory encryption (was: Spectre)	Thomas Koenig
Re: Memory encryption (was: Spectre)	Anton Ertl
Re: Spectre	Terje Mathisen
Re: Spectre	Thomas Koenig
Re: Spectre	Anton Ertl
Re: Spectre	Thomas Koenig
Re: Spectre	Anton Ertl
Re: Spectre	Michael S
Re: Spectre	MitchAlsup
Re: Spectre (was: Why separate 32-bit arithmetic ...)	MitchAlsup
Re: Spectre (was: Why separate 32-bit arithmetic ...)	Anton Ertl
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Anton Ertl
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Bill Findlay
Re: Imprecision, was Why separate 32-bit arithmetic on a 64-bit architecture?	John Levine
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Michael S
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	MitchAlsup
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Anton Ertl
Re: Why separate 32-bit arithmetic on a 64-bit architecture?	Quadibloc

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<eacd1a59-4b00-42e3-b37a-188064dfe3ean@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24387&group=comp.arch#24387

Newsgroups: comp.arch

by: MitchAlsup - Mon, 21 Mar 2022 22:08 UTC

On Monday, March 21, 2022 at 3:56:53 PM UTC-5, Michael S wrote:
> On Monday, March 21, 2022 at 9:04:57 PM UTC+2, MitchAlsup wrote:
> > On Monday, March 21, 2022 at 1:48:29 PM UTC-5, Michael S wrote:
> > <
> > > For example, in HSWL/BDWL each of the two FPUs can be fed by 3 execution ports.
> > <
> > Can you say this again and use different words. Or at least define the words you are using.
> Execution ports are defined both in Intel's "Intel® 64 and IA-32 Architectures Optimization Reference Manual", e.g. 248966-042b
> and in Agner Fog's microarchitecture tables.
<
Converting out of Intel-ese: 3 instruction queues can feed either FPU.
> > <
> > > I also think that during FP loads
> > > results can be forwarded into FPUs bypassing RF, but I am not sure about it.
> > > And that's just HSWL/BDWL; Zen2 is significantly wider than that.
> > <
> > Hard Starboard Won't List ?
> > Bowel Deflection Wasn't Liked ?
> Intel codenames: Haswell and Broadwell. A pair of rather similar microarchitectures, but their FP parts are not identical.
<
Now, wouldn't it have been easier to spell them out the first time?

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<5uuh3hp56ttjf8bek6j8vdhf077d17mo22@4ax.com>

https://www.novabbs.com/devel/article-flat.php?id=24390&group=comp.arch#24390

Newsgroups: comp.arch

by: George Neuner - Mon, 21 Mar 2022 22:41 UTC

On Mon, 21 Mar 2022 14:10:49 -0700 (PDT), Quadibloc
<jsavard@ecn.ab.ca> wrote:

>On Monday, March 21, 2022 at 5:25:38 AM UTC-6, Michael S wrote:
>
>> Right now the closest thing to a high-performance in-order CPU is Intel
>> Poulson, which is quite significantly slower than the best OoO CPUs,
>> but I wouldn't call it slow in absolute sense.
>
>If it isn't possible to make an in-order CPU that is as fast
>as *the best there is*, then in-order CPUs are basically
>eliminated from consideration - for doing the heavy lifting.
>
>But that doesn't mean that they're useless. Even a 386 SX
>could run Windows 3.1 with good responsiveness, therefore
>a 486 CPU would be plenty fast enough for a core intended
>to run untrusted code.

i386sx worked for Windows 3.x because all the GUI code - applications
and system libraries - was 16-bit and few people used any Win32s
applications. i386sx certainly did /not/ provide acceptable
performance for the 32-bit GUIs in Windows 9x or NT.

Similarly, i486 would not work well for 64-bit code. You are probably
correct that sandboxed code could make do with performance similar to
an i486 running Windows 9x ... but the actual chip would need to be
64-bit.

>But just running untrusted code on a Spectre-proof CPU isn't
>good enough unless all the rest of your security is perfect, and
>that doesn't seem to be attainable. (But if this _were_ the only
>practical alternative, then perhaps more effort might be made
>to make operating systems genuinely secure. Yes, they're not
>secure now, but this is largely for lack of trying, in my opinion.
>_That's_ why I make this suggestion, though it may seem daft
>to Mitch.)
>
>John Savard

YMMV,
George

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<b25cb519-1f63-4f7f-a137-ba347676f654n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24392&group=comp.arch#24392

Newsgroups: comp.arch

by: Michael S - Mon, 21 Mar 2022 23:55 UTC

On Tuesday, March 22, 2022 at 12:08:32 AM UTC+2, MitchAlsup wrote:
> On Monday, March 21, 2022 at 3:56:53 PM UTC-5, Michael S wrote:
> > On Monday, March 21, 2022 at 9:04:57 PM UTC+2, MitchAlsup wrote:
> > > On Monday, March 21, 2022 at 1:48:29 PM UTC-5, Michael S wrote:
> > > <
> > > > For example, in HSWL/BDWL each of the two FPUs can be fed by 3 execution ports.
> > > <
> > > Can you say this again and use different words. Or at least define the words you are using.
> > Execution ports are defined both in Intel's "Intel® 64 and IA-32 Architectures Optimization Reference Manual", e.g. 248966-042b
> > and in Agner Fog's microarchitecture tables.
> <
> Converting out of Intel-ese: 3 instruction queues can feed either FPU.

As I understand it, Intel's execution ports are not quite the same as the instruction queues of K7/K8.
While it's true that, as on K7/K8, uOps are bound to execution ports rather early (last decode stage or rename, I'm not
sure which of the two), the storage for instructions is not separated by port but shared between ports.
Also, there is no provision for inter-port stealing of input or output paths/buses of the kind K8 used for, e.g., integer multiplication.
Instead, all instructions or macro-ops that produce more than one register result are split into uOps that each produce no more
than one register result.
Probably, there are other differences too. That is, apart from the BIG difference of using PRF and dataless RSs.
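
A toy model may make the distinction concrete. This is a minimal sketch under stated assumptions, not Intel's actual design: a single scheduler pool is shared by all ports, each uOp is bound to a port before it enters the pool, ports never borrow each other's paths, and a hypothetical split_macro_op helper breaks a macro-op with several register results (say, a widening multiply writing two registers) into uOps that each write at most one register. The class names, capacity, port numbers and register names are all illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Uop:
    name: str
    dest: Optional[str]  # at most one register result per uOp
    port: int            # bound to an execution port early (before scheduling)


def split_macro_op(name: str, dests: List[str], ports: List[int]) -> List[Uop]:
    """Hypothetical helper: split a macro-op that writes several registers
    into uOps that each write at most one register."""
    return [Uop(f"{name}.{i}", d, ports[i % len(ports)])
            for i, d in enumerate(dests)]


class UnifiedScheduler:
    """One storage pool shared by all ports (not one queue per port)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: List[Uop] = []  # kept in program order

    def insert(self, uop: Uop) -> bool:
        if len(self.entries) >= self.capacity:
            return False  # the shared pool is full: the front end must stall
        self.entries.append(uop)
        return True

    def issue(self, ready: Callable[[Uop], bool]) -> Dict[int, Uop]:
        """Each port issues at most one uOp per call: the oldest ready uOp
        bound to that port. No port can take over another port's uOps."""
        issued: Dict[int, Uop] = {}
        remaining: List[Uop] = []
        for uop in self.entries:
            if uop.port not in issued and ready(uop):
                issued[uop.port] = uop
            else:
                remaining.append(uop)
        self.entries = remaining
        return issued


if __name__ == "__main__":
    sched = UnifiedScheduler(capacity=3)
    for u in split_macro_op("mul", ["rax", "rdx"], ports=[0, 1]):
        sched.insert(u)
    sched.insert(Uop("add", "rbx", 0))
    print(sched.insert(Uop("sub", "rcx", 1)))  # False: pool is full
    print({p: u.name for p, u in sched.issue(lambda u: True).items()})
```

Because the pool is shared, a burst of uOps bound to one port can occupy every entry, whereas per-port queues would cap each port's share; in this toy model that is the only structural difference being illustrated.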

> > > <
> > > > I also think that during FP loads
> > > > results can be forwarded into FPUs bypassing RF, but I am not sure about it.
> > > > And that's just HSWL/BDWL; Zen2 is significantly wider than that.
> > > <
> > > Hard Starboard Won't List ?
> > > Bowel Deflection Wasn't Liked ?
> > Intel codenames: Haswell and Broadwell. A pair of rather similar microarchitectures, but their FP parts are not identical.
> <
> Now, wouldn't it have been easier to spell them out the first time?

Not for me.

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<t1bqp8$vsg$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=24395&group=comp.arch#24395

Newsgroups: comp.arch

by: Thomas Koenig - Tue, 22 Mar 2022 06:36 UTC

Michael S <already5chosen@yahoo.com> schrieb:

[Intel]

> Probably, there are other differences too. That is, apart from
> the BIG difference of using PRF and dataless RSs.

The Best In Group difference of using Public Relations Firms and
dataless Research Scientists.

Sounds a lot like what Intel does.

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<dadf79d5-f4d9-4cf5-81e4-6c9c19495329n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24396&group=comp.arch#24396

Newsgroups: comp.arch

by: Quadibloc - Tue, 22 Mar 2022 06:47 UTC

On Monday, March 21, 2022 at 4:42:01 PM UTC-6, George Neuner wrote:

> Similarly, i486 would not work well for 64-bit code. You are probably
> correct that sandboxed code could make do with performance similar to
> an i486 running Windows 9x ... but the actual chip would need to be
> 64-bit.

Of course this is true, but I presume the 486 design could be extended
to include EM64T, as well as all the other x86 architectural extensions
since then, up to and including AVX-512.

After all, even MMX only arrived on the Pentium.

But an architecturally extended 64-bit version of the 486 would still not
be much faster than the original 486; only some operations would be sped
up by the new instructions, and the process shrink would help as well.

John Savard

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<48817699-887e-43ef-a4c8-762d4323edf4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24397&group=comp.arch#24397

Newsgroups: comp.arch

by: Quadibloc - Tue, 22 Mar 2022 06:50 UTC

On Monday, March 21, 2022 at 3:34:25 PM UTC-6, John Dallman wrote:

> Poulson and Kittson are discontinued, along with all other Itaniums.
> Nobody seems to have seriously considered continuing Itanium production
> as a way of avoiding Spectre vulnerabilities.

And I wouldn't either, given that Itanium isn't x86. If there were some way to
make an x86 chip that was as fast as Kittson while still being in-order, thanks
to additional circuitry that efficiently translated x86 instructions on the fly
into Itanium instructions... but for many reasons, I suspect that doing so is
utterly impractical.

John Savard

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<ff80fb51-ded0-4b6b-9a63-a905d3ea1ce9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24398&group=comp.arch#24398

Newsgroups: comp.arch

by: Quadibloc - Tue, 22 Mar 2022 07:07 UTC

On Tuesday, March 22, 2022 at 12:36:27 AM UTC-6, Thomas Koenig wrote:
> Michael S <already...@yahoo.com> schrieb:
>
> [Intel]
> > Probably, there are other differences too. That is, apart from
> > the BIG difference of using PRF and dataless RSs.

> The Best In Group difference of using Public Relations Firms and
> dataless Research Scientists.
>
> Sounds a lot like what Intel does.

Actually, it's

the *big* difference of using [a] Physical Register File [-based renaming
scheme] and dataless Reservation Stations

although finding out what RS meant required me to look at a PDF paper
instead of just a web page. Once I found it, though, it made sense:
if you're giving up using an RRF (Rename Register File) and you're still
out-of-order, you need to do something else, so why not go partly back
to Tomasulo, but not all the way, keeping the data somewhere else instead
of in the reservation stations.
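
The PRF-plus-dataless-RS arrangement can be sketched in a few lines. This is a minimal model under stated assumptions, not any particular Intel core: a register alias table (RAT) maps architectural to physical registers, each reservation-station entry holds only an operation and physical-register tags, and operands are read from the physical register file when the entry issues. In a classic Tomasulo design the reservation stations would capture the operand values themselves. Retirement, recycling of old physical registers, and exceptions are deliberately left out, and the class and method names are illustrative.

```python
import operator
from typing import Callable, List, Optional, Tuple

Op = Callable[[int, int], int]


class PrfCore:
    """Toy out-of-order core: values live only in the physical register
    file; reservation stations hold tags, not data."""

    def __init__(self, num_arch: int, num_phys: int):
        self.prf: List[int] = [0] * num_phys          # the only place values live
        self.ready: List[bool] = [True] * num_phys
        self.rat: List[int] = list(range(num_arch))   # arch reg -> phys reg
        self.free: List[int] = list(range(num_arch, num_phys))
        self.rs: List[Tuple[Op, int, int, int]] = []  # (op, dst tag, src tags)

    def rename(self, op: Op, dst: int, src1: int, src2: int) -> int:
        # Look up sources before remapping dst, so "r1 = r1 + r1" works.
        p1, p2 = self.rat[src1], self.rat[src2]
        pd = self.free.pop(0)             # allocate a fresh physical register
        self.ready[pd] = False
        self.rat[dst] = pd
        self.rs.append((op, pd, p1, p2))  # dataless entry: tags only
        return pd

    def execute_one(self) -> Optional[int]:
        """Issue the oldest RS entry whose source tags are ready."""
        for i, (op, pd, p1, p2) in enumerate(self.rs):
            if self.ready[p1] and self.ready[p2]:
                a, b = self.prf[p1], self.prf[p2]  # operands read from the PRF
                self.prf[pd] = op(a, b)
                self.ready[pd] = True
                del self.rs[i]
                return pd
        return None


if __name__ == "__main__":
    core = PrfCore(num_arch=4, num_phys=8)
    core.prf[1], core.prf[2] = 5, 7      # initial values of r1 and r2
    core.rename(operator.add, 3, 1, 2)   # r3 = r1 + r2
    core.rename(operator.mul, 0, 3, 3)   # r0 = r3 * r3 (waits on r3)
    while core.execute_one() is not None:
        pass
    print(core.prf[core.rat[0]])         # 144
```

In this model, keeping data out of the reservation stations shrinks each entry to a few tags, and the cost shows up as a register-file read at issue time.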

John Savard

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<3cb7f110-6c5d-4a5c-a117-6b5ec8112af0n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24399&group=comp.arch#24399

Newsgroups: comp.arch

by: Quadibloc - Tue, 22 Mar 2022 07:10 UTC

On Tuesday, March 22, 2022 at 1:07:32 AM UTC-6, Quadibloc wrote:
> On Tuesday, March 22, 2022 at 12:36:27 AM UTC-6, Thomas Koenig wrote:
> > Michael S <already...@yahoo.com> schrieb:
> >
> > [Intel]
> > > Probably, there are other differences too. That is, apart from
> > > the BIG difference of using PRF and dataless RSs.
>
> > The Best In Group difference of using Public Relations Firms and
> > dataless Research Scientists.
> >
> > Sounds a lot like what Intel does.
> Actually, it's
>
> the *big* difference of using [a] Physical Register File [-based renaming
> scheme] and dataless Reservation Stations
>
> although finding out what RS meant required me to look at a PDF paper
> instead of just a web page. Once I found it, though, it made sense:
> if you're giving up using an RRF (Rename Register File) and you're still
> out-of-order, you need to do something else, so why not go partly back
> to Tomasulo, but not all the way, keeping the data somewhere else instead
> of in the reservation stations.

It was clear from context that he didn't mean Platelet-Rich Fibrin, which was
the result that Google favored on my first attempt at a search...

John Savard

Spectre and EPIC (was: Why separate 32-bit arithmetic...)

<2022Mar22.132957@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=24401&group=comp.arch#24401

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Spectre and EPIC (was: Why separate 32-bit arithmetic...)
Date: Tue, 22 Mar 2022 12:29:57 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 64
Message-ID: <2022Mar22.132957@mips.complang.tuwien.ac.at>
References: <sso6aq$37b$1@newsreader4.netcologne.de> <619a3fdf-ae89-4feb-b36a-e96cc3f78e46n@googlegroups.com> <fe009d94-7665-4b2f-9e33-1bcd3967a48fn@googlegroups.com> <b19ea6d6-9d87-4ac4-b571-797a4bc1db2cn@googlegroups.com> <t1744k$o4s$1@newsreader4.netcologne.de> <0de33b45-2e18-4f69-a1f2-c633fa9dacaan@googlegroups.com> <c5fa7171-22da-4878-87a1-e7181edba629n@googlegroups.com> <34c92e84-445d-431b-a821-4f4e49da50ffn@googlegroups.com> <82d668ec-a140-454c-8606-9e665c750997n@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="361c92486fcb72a1ebc3baaeb28ec05b";
logging-data="25444"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+NCV1cgumHyKjxEcGUPcvW"
Cancel-Lock: sha1:EA2nb80k7+Ucb5Nrmckp4oC/Q6Y=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Tue, 22 Mar 2022 12:29 UTC

Quadibloc <jsavard@ecn.ab.ca> writes:
>On Monday, March 21, 2022 at 5:25:38 AM UTC-6, Michael S wrote:
>> On Monday, March 21, 2022 at 5:03:59 AM UTC+2, Quadibloc wrote:
>
>> > But is there such a thing as a high-performance in-order CPU?
>
>> Right now the closest thing to high-performance in-order CPU is Intel Paulson which is
>> quite significantly slower than the best OoO CPUs,

And of course the point of IA-64 (or, more generally, EPIC) is to let
the compiler do the things that hardware does in an OoO core,
including speculative execution.

So if the compiler performs two dependent speculative loads, the
resulting code may exhibit a Spectre v1 vulnerability. If the
compiler performs one speculative load, and the hardware then performs
another speculative load far enough to update a cache (but without
writing the result back), the result is a Spectre vulnerability (and
of course other microarchitectural effects can be used as side
channels).
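
The "two dependent loads behind a bounds check" is the classic Spectre v1 gadget. A minimal C sketch of its shape follows; the names and sizes are hypothetical, modelled on the example in the original Spectre paper. Plain assertions can only check the architecturally correct behaviour, not the speculative leak. On an OoO core the hardware speculates past the branch; per the post, an EPIC compiler that hoists load 1 above the check as a speculative load yields the same shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical victim data; names and sizes are illustrative only. */
static uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8,
                             9, 10, 11, 12, 13, 14, 15, 16};
static size_t array1_size = 16;
static uint8_t array2[256 * 512];  /* one 512-byte slot per possible byte value */

/* Architecturally safe: array1 is read only when x is in bounds.
   If the bounds check is mispredicted as passing for an out-of-bounds x,
   load 1 can read past array1, and load 2 then brings into the cache a
   line of array2 selected by that byte. The cache state survives the
   squash and can be probed by timing. */
static uint8_t victim(size_t x) {
    if (x < array1_size) {                 /* bounds check: the speculated branch */
        uint8_t v = array1[x];             /* load 1 */
        return array2[(size_t)v * 512];    /* load 2: address depends on load 1 */
    }
    return 0;
}
```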

You can then decide to forego compiler speculation, but then the
resulting performance is even lower. You can try to analyse the cases
where speculation can be used by an attacker and where it cannot, and
only use speculation in the latter cases, but that is hard, and if we
could do it reliably, software mitigation of Spectre would be a viable
approach.

Note that Intel announced Itanium's end of life in 2019, after Spectre
was discovered. So Intel (and HP) did not see this discovery as an
opportunity to revitalise the IA-64 business.

>I had to look it up to refresh my memory. Intel Poulson is an *Itanium*,
>and this microarchitecture was even made a bit faster in Kittson by
>using a newer process.

Kittson was originally planned to be a 22nm shrink of Poulson, but
Intel then decided to use the same 32nm process as Poulson, and
the same microarchitecture. My impression is that it is a relabeled
Poulson, with process tweaks (or just binning) allowing a slightly
higher clock.

There is also Bonnell (in-order Atom):

LaTeX benchmark; numbers are run times in seconds:

- HP workstation 900MHz Itanium II, Debian Linux 3.528
- Intel Atom 330, 1.6GHz, 512K L2 Zotac ION A, Debian 9 64bit 2.368
- Xeon W-1370P (=Core i7-11700K), 5200MHz, Debian 11 (64-bit) 0.175

Kittson could be had at a 3 times higher clock than the Itanium II,
and I wonder how much microarchitectural and compiler advances would
have bought compared to McKinley (Itanium II), but I expect that the
microarchitectural advances of Poulson/Kittson would not improve the
IPC by much for this benchmark (I think it is dependency-limited);
compiler advances in gcc since we installed Debian on it are probably
small (we only got the Itanium II box when it was retired by the
original owner); more specialized compilers may fare better, but it's
unclear by how much.
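
Putting the posted run times in proportion: the first three lines are plain ratios of the numbers above. The last line is only a back-of-envelope estimate under a stated assumption, not a measurement: it assumes Kittson keeps McKinley's IPC on this benchmark and that run time scales inversely with the roughly 3x higher clock, which is optimistic because memory latency does not scale with core clock.

```latex
\begin{align*}
\frac{3.528\ \mathrm{s}}{2.368\ \mathrm{s}} &\approx 1.49
  && \text{Itanium II (900 MHz) vs.\ Atom 330}\\
\frac{3.528\ \mathrm{s}}{0.175\ \mathrm{s}} &\approx 20.2
  && \text{Itanium II vs.\ Xeon W-1370P}\\
\frac{2.368\ \mathrm{s}}{0.175\ \mathrm{s}} &\approx 13.5
  && \text{Atom 330 vs.\ Xeon W-1370P}\\
t_{\mathrm{Kittson}} \approx \frac{3.528\ \mathrm{s}}{3} &\approx 1.18\ \mathrm{s},
  \quad \frac{1.18\ \mathrm{s}}{0.175\ \mathrm{s}} \approx 6.7
  && \text{assumed: same IPC, run time} \propto 1/\text{clock}
\end{align*}
```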

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<t1cib0$c3i$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=24402&group=comp.arch#24402

Path: i2pn2.org!i2pn.org!aioe.org!EhtdJS5E9ITDZpJm3Uerlg.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
Date: Tue, 22 Mar 2022 14:18:23 +0100
Organization: Aioe.org NNTP Server
Message-ID: <t1cib0$c3i$1@gioia.aioe.org>
References: <sso6aq$37b$1@newsreader4.netcologne.de>
<UPXHJ.6202$9O.4300@fx12.iad> <ssplm7$1sv$1@newsreader4.netcologne.de>
<sspnd8$3d6$1@newsreader4.netcologne.de> <tJZHJ.10626$8Q.353@fx19.iad>
<ssqrr7$ptr$1@newsreader4.netcologne.de>
<0cf5023d-3458-46d2-ad3d-fa0e6ecb18dfn@googlegroups.com>
<t10mvq$4oe$1@gioia.aioe.org> <t11h9a$g5v$1@gioia.aioe.org>
<1cb8bb2d-4e59-43f3-9992-ef658ec5ecden@googlegroups.com>
<t12aif$182v$1@gioia.aioe.org>
<ef80eb7c-8b61-4464-bd1c-e6551f826582n@googlegroups.com>
<t19ohr$r21$1@gioia.aioe.org>
<a41172c7-1903-400c-bfae-c54978d46605n@googlegroups.com>
<134185cd-d54d-4c8d-9fac-3dadae9f6624n@googlegroups.com>
<bd0c9217-7b0b-48b5-9532-acf2c3825806n@googlegroups.com>
<98c688f6-e886-4378-ab1a-85cf879832ben@googlegroups.com>
<935a2d8f-e3a8-46aa-b3c2-d2f4e25460b8n@googlegroups.com>
<eacd1a59-4b00-42e3-b37a-188064dfe3ean@googlegroups.com>
<b25cb519-1f63-4f7f-a137-ba347676f654n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="12402"; posting-host="EhtdJS5E9ITDZpJm3Uerlg.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.11
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Tue, 22 Mar 2022 13:18 UTC

Michael S wrote:
> On Tuesday, March 22, 2022 at 12:08:32 AM UTC+2, MitchAlsup wrote:
>> On Monday, March 21, 2022 at 3:56:53 PM UTC-5, Michael S wrote:
>>> On Monday, March 21, 2022 at 9:04:57 PM UTC+2, MitchAlsup wrote:
>>>> On Monday, March 21, 2022 at 1:48:29 PM UTC-5, Michael S wrote:
>>>> <
>>>>> For example, in HSWL/BDWL each of two FPUs can be fed by 3 execution ports.
>>>> <
>>>> Can you say this again and use different words. Or at least define the words you are using.
>>> Execution ports are defined both in Intel's "Intel® 64 and IA-32 Architectures Optimization Reference Manual", e.g. 248966-042b
>>> and in Agner Fog's microarchitecture tables.
>> <
>> Converting out of Intel-ese: 3 instruction queues can feed either FPU.
>
> According to my understanding, Intel's execution ports are not quite identical to instruction queues of K7/K8.
> While it's true that, like on K7/K8, uOps are bound to execution ports rather early (last decode stage or rename, I'm not
> sure which of the two), the storage for instructions is not separated by port, but shared between ports.
> Also, there is no provision for inter-port stealing of input or output paths/buses similar to how K8 handled, for example, integer multiplication.
> Instead, all instructions or macro-ops that produce more than one register result are split into uOps that produce no more
> than one register result.
> Probably, there are other differences too. That is, apart from the BIG difference of using PRF and dataless RSs.
>
>
>>>> <
>>>>> I also think that during FP loads
>>>>> results can be forwarded into FPUs bypassing RF, but I am not sure about it.
>>>>> And that just HSWL/BDWL, Zen2 is significantly wider than that.
>>>> <
>>>> Hard Starboard Won't List ?
>>>> Bowel Deflection Wasn't Liked ?
>>> Intel codename. Haswell and Broadwell. A pair of rather similar microarchitectures, but FP parts are not identical.
>> <
>> Now, would it have not been easier to spell them out the first time?
>
> Not for me.
>
I have probably spent more time on low-level Intel x86 than anyone else
here not working for Intel, and I would have been helped by having those
abbreviations spelled out.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<2c5bd9b0-1439-49c1-a4c8-707f27a9eb11n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24409&group=comp.arch#24409

X-Received: by 2002:ac8:5bcd:0:b0:2e1:c6c4:ca00 with SMTP id b13-20020ac85bcd000000b002e1c6c4ca00mr20848253qtb.528.1647963203879;
Tue, 22 Mar 2022 08:33:23 -0700 (PDT)
X-Received: by 2002:a05:6830:40c8:b0:5cb:557a:bba6 with SMTP id
h8-20020a05683040c800b005cb557abba6mr6925618otu.350.1647963203655; Tue, 22
Mar 2022 08:33:23 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 22 Mar 2022 08:33:23 -0700 (PDT)
In-Reply-To: <memo.20220322141111.1928h@jgd.cix.co.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <82d668ec-a140-454c-8606-9e665c750997n@googlegroups.com> <memo.20220322141111.1928h@jgd.cix.co.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2c5bd9b0-1439-49c1-a4c8-707f27a9eb11n@googlegroups.com>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: already5...@yahoo.com (Michael S)
Injection-Date: Tue, 22 Mar 2022 15:33:23 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 27

by: Michael S - Tue, 22 Mar 2022 15:33 UTC

On Tuesday, March 22, 2022 at 4:11:14 PM UTC+2, John Dallman wrote:
> In article <82d668ec-a140-454c...@googlegroups.com>,
> jsa...@ecn.ab.ca (Quadibloc) wrote:
> > But it doesn't address the issue of *running existing Windows
> > software* which is basically what a computer is *for* as far
> > as the market is concerned.
> You're neglecting "Running Android apps", which sells a great many more
> ARM-based devices, although they're lower-cost.
>
> However, neither Intel/AMD nor any of the ARM manufacturers seem to have
> found a practical way to achieve high performance without speculation and
> the consequential Spectre-like vulnerabilities. The world as a whole's
> response to this problem seems to be to try to ignore it, aided by the
> lack of publicised Spectre attacks.
>
> John

I think it is a very sane response.
Spectre-V1, as well as any other existing or future Spectre variants that do not penetrate official trust boundaries,
should be dealt with in software, i.e., in the JavaScript JIT engines of Web browsers. The only thing that is required of the HW is providing tools for such handling.
In the case of x86, ARM and POWER, the tools, in the form of non-speculative data dependencies, are already here in all
implementations. The only missing step is an ISA-level guarantee that for some instructions, preferably
for conditional moves, data dependencies will remain non-speculative in all future implementations.
I didn't check; maybe ARMv9-A already provides such a guarantee.
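
A minimal C sketch of the kind of software tool being discussed, assuming the compiler keeps the comparison as data flow: the architectural check stays a branch, and the index used for the load is additionally masked with a value computed from the comparison, so a mispredicted branch sees index 0 instead of the attacker's index. The helper names are hypothetical; the Linux kernel's array_index_nospec() macro is a production mechanism of this general kind. C itself guarantees neither a branch-free lowering nor that the mask survives optimization, hence the GNU asm barrier (an assumption for GCC/Clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Keep the compiler from proving idx < len after the bounds-check branch
   and deleting the mask. GNU C extension; other compilers get no barrier. */
#if defined(__GNUC__)
#define HIDE_FROM_OPTIMIZER(x) __asm__ volatile("" : "+r"(x))
#else
#define HIDE_FROM_OPTIMIZER(x) ((void)0)
#endif

/* Returns idx if idx < len, else 0, via a mask derived from the comparison
   rather than a branch. Check the generated code for setcc/cmov-style
   instructions; C does not promise them. */
static inline size_t clamp_index(size_t idx, size_t len) {
    HIDE_FROM_OPTIMIZER(idx);
    size_t mask = (size_t)0 - (size_t)(idx < len);  /* all ones or zero */
    return idx & mask;
}

/* Architectural check by branch; speculative containment by data dependency. */
static int load_checked(const int *a, size_t len, size_t idx, int *out) {
    if (idx >= len)
        return -1;
    *out = a[clamp_index(idx, len)];
    return 0;
}
```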

As to RISC-V, IMHO, they have to add a MIPSr6-style conditional zeroize instruction to the mandatory part of
the ISA, at least for cores that feature virtual memory support.
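
For reference, a C model of the MIPS Release 6 conditional-zero instructions: SELEQZ rd, rs, rt gives rs if rt is zero, else 0; SELNEZ gives rs if rt is nonzero, else 0. An OR of the two builds a full select. This models only the semantics; the non-speculative property asked for above is a property of the hardware implementation that C cannot express. The helper names other than the mnemonics are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* seleqz rd, rs, rt : rd = (rt == 0) ? rs : 0   (written with a mask) */
static uint64_t seleqz(uint64_t rs, uint64_t rt) {
    uint64_t keep = (uint64_t)0 - (uint64_t)(rt == 0);
    return rs & keep;
}

/* selnez rd, rs, rt : rd = (rt != 0) ? rs : 0 */
static uint64_t selnez(uint64_t rs, uint64_t rt) {
    uint64_t keep = (uint64_t)0 - (uint64_t)(rt != 0);
    return rs & keep;
}

/* Full select (cond ? a : b) from two conditional zeroes and an OR. */
static uint64_t select_u64(uint64_t cond, uint64_t a, uint64_t b) {
    return selnez(a, cond) | seleqz(b, cond);
}

/* Hypothetical Spectre-style use: zero an address when the index is out of
   bounds, so a dependent load on a mispredicted path sees address 0. */
static uint64_t zero_if_out_of_bounds(uint64_t addr, uint64_t idx, uint64_t len) {
    return selnez(addr, (uint64_t)(idx < len));
}
```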

Spectre (was: Why separate 32-bit arithmetic ...)

<2022Mar23.093749@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=24422&group=comp.arch#24422

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Spectre (was: Why separate 32-bit arithmetic ...)
Date: Wed, 23 Mar 2022 08:37:49 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 133
Message-ID: <2022Mar23.093749@mips.complang.tuwien.ac.at>
References: <82d668ec-a140-454c-8606-9e665c750997n@googlegroups.com> <memo.20220322141111.1928h@jgd.cix.co.uk> <2c5bd9b0-1439-49c1-a4c8-707f27a9eb11n@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="59d4db3a8cbbe98be236cd8a7a3c085e";
logging-data="12301"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18OEePJOqxNyizFIq8d6APy"
Cancel-Lock: sha1:DhoM03QqaDiUKkbXa5E2Q5Mz1uY=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Wed, 23 Mar 2022 08:37 UTC

Michael S <already5chosen@yahoo.com> writes:
>On Tuesday, March 22, 2022 at 4:11:14 PM UTC+2, John Dallman wrote:
>> However, neither Intel/AMD nor any of the ARM manufacturers seem to have
>> found a practical way to achieve high performance without speculation and
>> the consequential Spectre-like vulnerabilities.

Spectre vulnerabilities are in no way a consequence of speculation,
just like corrupted architectural state is not a consequence of
speculation.

>> The world as a whole's
>> response to this problem seems to be to try to ignore it, aided by the
>> lack of publicised Spectre attacks.

Spectre attacks have been well-publicized. I guess you mean that
Spectre attacks by real-world attackers (rather than security
researchers) have not been publicized. The question is if that's
because they don't occur, or because such attacks are so subtle that
you don't notice when they occur, and their noticeable effects are far
enough from the attack that even if you notice it you don't know that
Spectre was used to get it (and given that the common narrative is
that Spectre is not used for real-world attacks, you are likely to
dismiss that possibility, and even if you don't, others will dismiss
your suspicion).

Of course there are powerful interests that are well-known to have
worked on subverting security (especially in subtle ways, e.g., by
weakening random-number generators). They are likely interested in
CPUs continuing to be vulnerable to Spectre.

>I think it is a very sane response.
>Spectre-V1, as well as any other existing or future Spectre variants that do not penetrate official trust boundaries,

Is security based on not performing architectural accesses to the
secret any less official than security based on page table entries
(another architectural feature)? Not in my book, and certainly not in
the architecture manuals of the CPU manufacturers.

>should be dealt with in software, i.e., in the JavaScript JIT engines of Web browsers.

But if the only security you consider official (and therefore the only
one the hardware is required to guarantee IYO) is that provided by
page table entries, that's incredibly expensive: The JavaScript JIT
engine would have to change the page table entries to allow access
before the access, and change the page table entries again after the
access to disallow the access. What's the slowdown of that? A factor
100?
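
To make the cost argument concrete, a minimal POSIX sketch of page-table-only protection applied per access: the "secret" page is inaccessible except during each access, so every access pays for two mprotect() system calls plus whatever TLB maintenance the kernel performs. mmap, mprotect and sysconf are standard POSIX calls; the helper names are hypothetical, and the sketch makes no claim about the actual slowdown factor.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t page_size(void) { return (size_t)sysconf(_SC_PAGESIZE); }

/* One anonymous page, mapped with no access rights. */
static int *map_guarded_page(void) {
    void *p = mmap(NULL, page_size(), PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (int *)p;
}

static void set_access(int *page, int prot) {
    if (mprotect(page, page_size(), prot) != 0)
        abort();                              /* keep the sketch simple */
}

static int guarded_read(int *page, size_t idx) {
    set_access(page, PROT_READ);              /* open the window (syscall 1) */
    int v = page[idx];
    set_access(page, PROT_NONE);              /* close it again (syscall 2) */
    return v;
}

static void guarded_write(int *page, size_t idx, int v) {
    set_access(page, PROT_READ | PROT_WRITE);
    page[idx] = v;
    set_access(page, PROT_NONE);
}
```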

>The only thing that is required of the HW is providing tools for such handling.
>In the case of x86, ARM and POWER, the tools, in the form of non-speculative data dependencies, are already here in all
>implementations. The only missing step is an ISA-level guarantee that for some instructions, preferably
>for conditional moves, data dependencies will remain non-speculative in all future implementations.

I don't think that any existing OoO implementation (certainly not all)
waits to execute a conditional move instruction (or an instruction dependent
on a conditional move) until it is no longer speculative. You may
mean that they treat the condition as a data dependency, rather
than something that itself is subject to speculation.

Assuming you mean the latter, how would such a guarantee give any
"official" security guarantee? The next time it's inconvenient for
the CPU manufacturers or the NSA to fix a bug, they will just take
your stand and say that the only official security is that implemented
through page-table entries.

Can someone define a security model somewhere between page-table-based
security and architectural security that involves conditional move
instructions? It has been almost 5 years since Intel and AMD were
informed of Spectre, and they have not defined such a security
model in their architecture manual despite your claim that all
existing hardware satisfies it. Maybe it's not as easy as you think
it is.

But let's assume that they actually defined such a security model in
their hardware manuals: what would the consequences be?
Implementations of programming languages with bounds-checked arrays
could introduce a conditional move on every index computation; note
that this is quite a bit more often than the original bounds checking;
you can eliminate a lot of bounds-check branches by knowing that the
branch only checks a bound that already has been checked. You cannot
do that in all those cases for index operations: E.g., if you have (in
Java):

for (i=0; i<n; i++) {
... a[i] ...
}

a single check of n against the a.length is sufficient to eliminate
all architectural out-of-bounds accesses. But every a[i] access can
be speculated independently, so you would have to add a conditional
move into every execution of a[i]. This will cost quite a bit of
performance, probably significantly more than a proper hardware fix
for Spectre.
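
The cost argument can be sketched in C as a stand-in for what a JIT might emit (the struct and helper names are hypothetical, and the early return stands in for throwing): the loop needs only one architectural check, but masking every a[i] adds a compare and an AND per iteration. The empty GNU asm statement is an optimization barrier, assumed necessary because otherwise the compiler can prove i < length inside the loop and drop the mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__)
#define HIDE_FROM_OPTIMIZER(x) __asm__ volatile("" : "+r"(x))
#else
#define HIDE_FROM_OPTIMIZER(x) ((void)0)
#endif

/* Hypothetical stand-in for a Java int[]: data pointer plus length. */
struct int_array { const int32_t *data; size_t length; };

/* i if i < len, else 0, computed without a branch. */
static inline size_t mask_index(size_t i, size_t len) {
    HIDE_FROM_OPTIMIZER(i);                  /* stop the compiler deleting the mask */
    return i & ((size_t)0 - (size_t)(i < len));
}

/* Sums a[0..n). One check before the loop suffices architecturally (the
   bounds-check elimination a JIT performs), but each load is also masked,
   so the extra work is paid on every iteration. Returns -1 if n > length. */
static int sum_prefix(struct int_array a, size_t n, int64_t *out) {
    if (n > a.length)
        return -1;
    int64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a.data[mask_index(i, a.length)];
    *out = s;
    return 0;
}
```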

And then you have those languages that are not bounds-checked, in
particular C, where inserting all these additional conditional moves
cannot be done automatically, because the compiler in many cases does
not know the array bounds. People have given up on what are, IMO, promising
approaches for architectural security for C like Deputy
<http://ivy.cs.berkeley.edu/ivywiki/uploads/deputy-manual.html>, with
new security vulnerabilities for C programs being reported frequently
that are exploited in non-subtle ways. So I very much doubt people
will take a similar effort (and incur the run-time penalty!) to add
conditional-move-based security to C programs to mitigate Spectre, for
which the narrative is that nobody uses it for real-world attacks.

And if the C libraries called by your JavaScript or Java programs do
not have the software mitigation you suggest, adding that software
mitigation to the JavaScript or Java compiler gives little benefit.

In conclusion, dealing with Spectre "in software" the way you suggest

* complicates the architecture descriptions.

* will not be accepted for the C parts of our software stacks (and in
the few cases where it is accepted, it will cause a slowdown).

* will slow down the Java/JavaScript parts of our software stacks, but
the software stack will still be vulnerable thanks to the C parts.

By contrast, fixing Spectre in hardware the way I suggest

* costs a little area.

* costs hardly any performance.

* works for the whole software stack without needing any changes to
the architecture definition or the software.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Spectre (was: Why separate 32-bit arithmetic ...)

<02d8578d-eb8e-463d-87de-e68a23952aa4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24424&group=comp.arch#24424

X-Received: by 2002:a05:620a:3184:b0:67d:cce9:bab4 with SMTP id bi4-20020a05620a318400b0067dcce9bab4mr18506167qkb.685.1648043122960;
Wed, 23 Mar 2022 06:45:22 -0700 (PDT)
X-Received: by 2002:a4a:dd15:0:b0:320:da3c:c342 with SMTP id
m21-20020a4add15000000b00320da3cc342mr32799oou.7.1648043122631; Wed, 23 Mar
2022 06:45:22 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 23 Mar 2022 06:45:22 -0700 (PDT)
In-Reply-To: <2022Mar23.093749@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <82d668ec-a140-454c-8606-9e665c750997n@googlegroups.com>
<memo.20220322141111.1928h@jgd.cix.co.uk> <2c5bd9b0-1439-49c1-a4c8-707f27a9eb11n@googlegroups.com>
<2022Mar23.093749@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <02d8578d-eb8e-463d-87de-e68a23952aa4n@googlegroups.com>
Subject: Re: Spectre (was: Why separate 32-bit arithmetic ...)
From: already5...@yahoo.com (Michael S)
Injection-Date: Wed, 23 Mar 2022 13:45:22 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 189

by: Michael S - Wed, 23 Mar 2022 13:45 UTC

On Wednesday, March 23, 2022 at 11:58:49 AM UTC+2, Anton Ertl wrote:
> Michael S <already...@yahoo.com> writes:
> >On Tuesday, March 22, 2022 at 4:11:14 PM UTC+2, John Dallman wrote:
> >> However, neither Intel/AMD nor any of the ARM manufacturers seem to have
> >> found a practical way to achieve high performance without speculation and
> >> the consequential Spectre-like vulnerabilities.
> Spectre vulnerabilities are in no way a consequence of speculation,
> just like corrupted architectural state is not a consequence of
> speculation.
> >> The world as a whole's
> >> response to this problem seems to be to try to ignore it, aided by the
> >> lack of publicised Spectre attacks.
> Spectre attacks have been well-publicized. I guess you mean that
> Spectre attacks by real-world attackers (rather than security
> researchers) have not been publicized. The question is if that's
> because they don't occur, or because such attacks are so subtle that
> you don't notice when they occur, and their noticeable effects are far
> enough from the attack that even if you notice it you don't know that
> Spectre was used to get it (and given that the common narrative is
> that Spectre is not used for real-world attacks, you are likely to
> dismiss that possibility, and even if you don't, others will dismiss
> your suspicion).
>
> Of course there are powerful interests that are well-known to have
> worked on subverting security (especially in subtle ways, e.g., by
> weakening random-number generators). They are likely interested in
> CPUs continuing to be vulnerable to Spectre.
> >I think it is a very sane response.
> >Spectre-V1, as well as any other existing or future Spectre variants that do not penetrate official trust boundaries,
> Is security based on not performing architectural accesses to the
> secret any less official than security based on page table entries
> (another architectural feature)? Not in my book, and certainly not in
> the architecture manuals of the CPU manufacturers.
> >should be dealt with in software, i.e., in the JavaScript JIT engines of Web browsers.
> But if the only security you consider official (and therefore the only
> one the hardware is required to guarantee IYO) is that provided by
> page table entries, that's incredibly expensive: The JavaScript JIT
> engine would have to change the page table entries to allow access
> before the access, and change the page table entries again after the
> access to disallow the access. What's the slowdown of that? A factor
> 100?

That's not my suggestion.
But it seems that below you figured out what I mean.

I don't follow.
If the book (architecture manual) says that all 3 data dependencies of cmove/select are non-speculative,
then SW can rely on it.

> Can someone define a security model somewhere between page-table-based
> security and architectural security that involves conditional move
> instructions? It has been almost 5 years since Intel and AMD were
> informed of Spectre, and they have not defined such a security
> model in their architecture manual despite your claim that all
> existing hardware satisfies it. Maybe it's not as easy as you think
> it is.
>

Right now not only cmove but all Intel and AMD instructions have all data dependencies non-speculative.
I'd guess the fact won't be codified in the manuals until they have plans to make some of the dependencies speculative.
That's simply how it has worked until now in the x86 world: hw practice leads and manuals follow.
Back in the mid-00s, when they finally documented the memory ordering model, it was the same story.

So, I expect that guarantees will first appear in ARM manuals and then Intel and AMD will say that theirs are the same.

> But let's assume that they actually defined such a security model in
> their hardware manuals: what would the consequences be?
> Implementations of programming languages with bounds-checked arrays

It only matters if the language guarantees sandboxing properties.
Most bounds-checked languages do not.
JS does, and everybody cares.
Java sort of does, but 99.9% of users do not care.
.NET used to, but I think their ambitions in that area (Silverlight) died a full decade ago.
Which other languages care?

> could introduce a conditional move on every index computation; note
> that this is quite a bit more often than the original bounds checking;
> you can eliminate a lot of bounds-check branches by knowing that the
> branch only checks a bound that already has been checked. You cannot
> do that in all those cases for index operations: E.g., if you have (in
> Java):
>
> for (i=0; i<n; i++) {
> ... a[i] ...
> }

That's easy.
p = a.ptr;
if (n > a.length) {
    throw();
}
p = (n > a.length) ? 0 : p; // implemented as cmove/select
for (i = 0; i < n; i++) {
    ... p[i] ...
}
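
The hoisted variant above, translated into a C sketch (the struct and function names are hypothetical; the early return stands in for the exception). The select nulls the base pointer when n exceeds the length, so a mispredicted check leads to loads starting at address 0 instead of from the array. The select depends only on n and length, not on the loop index i. The GNU asm barrier is an assumption for GCC/Clang: without it an optimizing compiler can prove n <= length after the check and delete the select.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__)
#define HIDE_FROM_OPTIMIZER(x) __asm__ volatile("" : "+r"(x))
#else
#define HIDE_FROM_OPTIMIZER(x) ((void)0)
#endif

struct int_array { const int32_t *data; size_t length; };  /* hypothetical */

/* One architectural check and one select, both before the loop. */
static int sum_prefix_hoisted(struct int_array a, size_t n, int64_t *out) {
    if (n > a.length)
        return -1;                                      /* stands in for throw */
    uintptr_t base = (uintptr_t)a.data;
    HIDE_FROM_OPTIMIZER(n);                             /* keep the select below */
    uintptr_t keep = (uintptr_t)0 - (uintptr_t)(n <= a.length);
    const int32_t *p = (const int32_t *)(base & keep);  /* cmov/select */
    int64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];                                      /* no per-access mask */
    *out = s;
    return 0;
}
```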

>
> a single check of n against the a.length is sufficient to eliminate
> all architectural out-of-bounds accesses. But every a[i] access can
> be speculated independently, so you would have to add a conditional
> move into every execution of a[i]. This will cost quite a bit of
> performance, probably significantly more than a proper hardware fix
> for Spectre.
>

"Proper hardware fix for Spectre" is a pipe dream, IMHO.
At least I, personally, would not pay for it even 3% either in performance or in money.

> And then you have those languages that are not bounds-checked, in
> particular C, where inserting all these additional conditional moves
> cannot be done automatically, because the compiler in many cases does
> not know the array bounds. People have given up on what are, IMO, promising
> approaches for architectural security for C like Deputy
> <http://ivy.cs.berkeley.edu/ivywiki/uploads/deputy-manual.html>, with
> new security vulnerabilities for C programs being reported frequently
> that are exploited in non-subtle ways. So I very much doubt people
> will take a similar effort (and incur the run-time penalty!) to add
> conditional-move-based security to C programs to mitigate Spectre, for
> which the narrative is that nobody uses it for real-world attacks.
>

I don't understand how C came into the discussion.
C does not promise anything w.r.t. protecting parts of the program state from being read
by other parts of the same program, so no promise can be violated.

> And if the C libraries called by your JavaScript or Java programs do
> not have the software mitigation you suggest, adding that software
> mitigation to the JavaScript or Java compiler gives little benefit.

A C library called by JavaScript is fixed code. It can't be modified by an attacker to suit his needs.
There is absolutely no need for Spectre mitigations in such a library.

>
> In conclusion, dealing with Spectre "in software" the way you suggest
>
> * complicates the architecture descriptions.

Yes. I prefer to call it "tightening", but yes, it's not very simple.
Still, it's much simpler than, for example, memory ordering or many other things related to cache coherence.

>
> * will not be accepted for the C parts of our software stacks (and in
> the few cases where it is accepted, it will cause a slowdown).
>
> * will slow down the Java/JavaScript parts of our software stacks, but
> the software stack will still be vulnerable thanks to the C parts.

The slowdown will be (hopefully we can use the present tense: is) very slight.
I don't see how the C part is relevant.

>
> By contrast, fixing Spectre in hardware the way I suggest
>
> * costs a little area.
>
> * costs hardly any performance.
>

Handwaving.

> * works for the whole software stack without needing any changes to
> the architecture definition or the software.

Re: Spectre

<%OF_J.126355$ZmJ7.99768@fx06.iad>

https://www.novabbs.com/devel/article-flat.php?id=24425&group=comp.arch#24425

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!news.in-chemnitz.de!news2.arglkargh.de!news.karotte.org!news.uzoreto.com!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx06.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Spectre
References: <82d668ec-a140-454c-8606-9e665c750997n@googlegroups.com> <memo.20220322141111.1928h@jgd.cix.co.uk> <2c5bd9b0-1439-49c1-a4c8-707f27a9eb11n@googlegroups.com> <2022Mar23.093749@mips.complang.tuwien.ac.at>
In-Reply-To: <2022Mar23.093749@mips.complang.tuwien.ac.at>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 129
Message-ID: <%OF_J.126355$ZmJ7.99768@fx06.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Wed, 23 Mar 2022 14:05:47 UTC
Date: Wed, 23 Mar 2022 10:05:38 -0400
X-Received-Bytes: 7781

by: EricP - Wed, 23 Mar 2022 14:05 UTC

Anton Ertl wrote:
> Michael S <already5chosen@yahoo.com> writes:
>> On Tuesday, March 22, 2022 at 4:11:14 PM UTC+2, John Dallman wrote:
>>> However, neither Intel/AMD nor any of the ARM manufacturers seem to have
>>> found a practical way to achieve high performance without speculation and
>>> the consequential Spectre-like vulnerabilities.
>
> Spectre vulnerabilities are in no way a consequence of speculation,
> just like corrupted architectural state is not a consequence of
> speculation.

I assume you mean Spectre is not an *inevitable* consequence of speculation,
just a consequence of its current implementation.

>>> The world as a whole's
>>> response to this problem seems to be to try to ignore it, aided by the
>>> lack of publicised Spectre attacks.
>
> Spectre attacks have been well-publicized. I guess you mean that
> Spectre attacks by real-world attackers (rather than security
> researchers) have not been publicized. The question is if that's
> because they don't occur, or because such attacks are so subtle that
> you don't notice when they occur, and their noticeable effects are far
> enough from the attack that even if you notice it you don't know that
> Spectre was used to get it (and given that the common narrative is
> that Spectre is not used for real-world attacks, you are likely to
> dismiss that possibility, and even if you don't, others will dismiss
> your suspicion).
>
> Of course there are powerful interests that are well-known to have
> worked on subverting security (especially in subtle ways, e.g., by
> weakening random-number generators). They are likely interested in
> CPUs continuing to be vulnerable to Spectre.
>
>> I think it is a very sane response.
>> Spectre-V1 as well as any other existing or future Spectre variants that do not penetrate official trust boundaries
>
> Is security based on not performing architectural accesses to the
> secret any less official than security based on page table entries
> (another architectural feature)? Not in my book, and certainly not in
> the architecture manuals of the CPU manufacturers.

That is not official security, as defined by the designers of that system.
JavaScript can stamp its feet all it likes, but just because someone
wrote a program that time-shares multiple insecure, unverified programs
inside a security domain doesn't modify or expand the guarantees defined
by the HW designers, which are address-space-based security domains.

Which is why JavaScript now runs in separate process address spaces.

>> should be dealt with in software, i.e. in JavaScript JIT engines of Web browsers.
>
> But if the only security you consider official (and therefore the only
> one the hardware is required to guarantee IYO) is that provided by
> page table entries, that's incredibly expensive: The JavaScript JIT
> engine would have to change the page table entries to allow access
> before the access, and change the page table entries again after the
> access to disallow the access. What's the slowdown of that? A factor
> 100?

Which is probably why, as far as I know, no one does it that way.
Switching address spaces to run programs and passing messages between
cooperating processes was a common technique before multi-threading.
It can be cheaper than pipes between Unix processes because you can
create large shared memory sections for message passing.
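A minimal POSIX sketch of the shared-memory message passing described above: a parent and a child process exchange a message through an anonymous `MAP_SHARED` mapping instead of a pipe. It assumes Linux/POSIX (`mmap`, `fork`, `waitpid`); the struct and function names are hypothetical, and real code would need proper synchronization, since here `waitpid()` is the only ordering point.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical message layout placed in the shared region. */
struct shared_msg {
    size_t len;
    char   text[256];
};

/* Child writes msg into a MAP_SHARED region; parent reads it after
   waitpid(). Copies at most out_size-1 bytes into out (NUL-terminated,
   out_size must be >= 1). Returns the length the parent observed, or
   -1 on error. */
long exchange_via_shared_memory(const char *msg, char *out, size_t out_size)
{
    struct shared_msg *shm = mmap(NULL, sizeof *shm, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shm == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shm, sizeof *shm);
        return -1;
    }
    if (pid == 0) {                       /* child: producer */
        size_t n = strlen(msg);
        if (n >= sizeof shm->text)
            n = sizeof shm->text - 1;
        memcpy(shm->text, msg, n);
        shm->text[n] = '\0';
        shm->len = n;
        _exit(0);                         /* no stdio flush, no atexit */
    }

    int status;                           /* parent: consumer */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        munmap(shm, sizeof *shm);
        return -1;
    }
    size_t n = shm->len < out_size - 1 ? shm->len : out_size - 1;
    memcpy(out, shm->text, n);
    out[n] = '\0';
    long result = (long)shm->len;
    munmap(shm, sizeof *shm);
    return result;
}
```

Unlike a pipe, the payload is never copied through the kernel; only the mapping setup and the `fork`/`waitpid` calls are system calls.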

>> The only thing that is required by HW is providing tools for such handling.
>> In case of x86, ARM and POWER the tools, in form of non-speculative data dependencies, are already here in all
>> implementations. The only missing step is an ISA-level guarantee that for some instructions, preferably
>> for conditional moves, data dependencies will remain non-speculative in all future implementations.
>
> I don't think that any existing OoO implementation (certainly not all)
> waits with a conditional move instruction (or an instruction dependent
> on a conditional move) until it is no longer speculative. You may be
> meaning that they treat the condition as a data dependency, rather
> than something that itself is subject to speculation.
>
> Assuming you mean the latter, how would such a guarantee give any
> "official" security guarantee? The next time it's inconvenient for
> the CPU manufacturers or the NSA to fix a bug, they will just take
> your stand and say that the only official security is that implemented
> through page-table entries.
>
> Can someone define a security model somewhere between page-table-based
> security and architectural security that involves conditional move
> instructions? It has been almost 5 years since Intel and AMD have
> been informed of Spectre, and they have not defined such a security
> model in their architecture manual despite your claim that all
> existing hardware satisfies it. Maybe it's not as easy as you think
> it is.
>
> But let's assume that they actually defined such a security model in
> their hardware manuals, what would be the consequence?
> Implementations of programming languages with bounds-checked arrays
> could introduce a conditional move on every index computation; note
> that this is quite a bit more often than the original bounds checking;
> you can eliminate a lot of bounds-check branches by knowing that the
> branch only checks a bound that already has been checked. You cannot
> do that in all those cases for index operations: E.g., if you have (in
> Java):
>
> for (i=0; i<n; i++) {
> ... a[i] ...
> }
>
> a single check of n against a.length is sufficient to eliminate
> all architectural out-of-bounds accesses. But every a[i] access can
> be speculated independently, so you would have to add a conditional
> move into every execution of a[i]. This will cost quite a bit of
> performance, probably significantly more than a proper hardware fix
> for Spectre.

I want to have various Compare-And-Trap CATcc instructions for exactly
these kinds of bounds checks. That makes bounds checks cheap; however,
those too could be vulnerable to Spectre. Normally, in an OoO core,
subsequent instructions would not be data-flow dependent on a CAT
instruction and so would not be blocked from early execution.

A solution I was thinking of is a new variant of a Compare-And-Trap that
copies the checked value to a dest register and acts as a data flow gate.
Move-Trap-Conditional MTCcc compares to a limit value and traps or moves.
That makes the checked value data flow dependent on the guard condition
and at run time costs only forwarding a value if the guard condition is
already resolved, or delays until it is resolved.
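CATcc and MTCcc are proposed, hypothetical instructions, so the following is only a C approximation of the data-flow difference described above. `cat_load` checks and then loads with the original index, so the load has no dependency on the check; `mtc_load` routes the index through a select whose result feeds the load address. The function names are made up, `abort()` stands in for the trap, and the empty `__asm__` (a GCC/Clang extension) merely stops the optimizer from deleting the select after the check. C gives no guarantee that the select becomes a cmov/csel rather than a branch, so this shows the dependency shape, not a Spectre guarantee.

```c
#include <stddef.h>
#include <stdlib.h>

/* CATcc-style: compare-and-trap, then load with the ORIGINAL index.
   The load is not data dependent on the comparison, so an OoO core
   may issue it before the check resolves. */
long cat_load(const long *a, size_t i, size_t len)
{
    if (i >= len)
        abort();                  /* stands in for the trap */
    return a[i];
}

/* MTCcc-style: the checked value is "moved" through the guard, so the
   value used for addressing depends on the comparison result. */
static inline size_t mtc_index(size_t i, size_t len)
{
    if (i >= len)
        abort();                  /* architectural trap path */
    /* Hide i from the optimizer so it cannot use i < len (known after
       the check) to delete the select below. */
    __asm__("" : "+r"(i));
    return i < len ? i : 0;       /* hoped to compile to cmov/csel */
}

long mtc_load(const long *a, size_t i, size_t len)
{
    return a[mtc_index(i, len)];
}
```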

Of course this doesn't help software in languages that don't have array
bounds to check automatically, but a compiler could have intrinsics that
make such checks as cheap as possible at run time.

Re: Spectre

<2cG_J.96741$dln7.46979@fx03.iad>


https://www.novabbs.com/devel/article-flat.php?id=24426&group=comp.arch#24426


Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx03.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Spectre
References: <82d668ec-a140-454c-8606-9e665c750997n@googlegroups.com> <memo.20220322141111.1928h@jgd.cix.co.uk> <2c5bd9b0-1439-49c1-a4c8-707f27a9eb11n@googlegroups.com> <2022Mar23.093749@mips.complang.tuwien.ac.at> <02d8578d-eb8e-463d-87de-e68a23952aa4n@googlegroups.com>
In-Reply-To: <02d8578d-eb8e-463d-87de-e68a23952aa4n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 35
Message-ID: <2cG_J.96741$dln7.46979@fx03.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Wed, 23 Mar 2022 14:32:30 UTC
Date: Wed, 23 Mar 2022 10:32:31 -0400
X-Received-Bytes: 2750

by: EricP - Wed, 23 Mar 2022 14:32 UTC

Michael S wrote:
> On Wednesday, March 23, 2022 at 11:58:49 AM UTC+2, Anton Ertl wrote:
>> Michael S <already...@yahoo.com> writes:
>>> The only thing that is required by HW is providing tools for such handling.
>>> In case of x86, ARM and POWER the tools, in form of non-speculative data dependencies, are already here in all
>>> implementations. The only missing step is an ISA-level guarantee that for some instructions, preferably
>>> for conditional moves, data dependencies will remain non-speculative in all future implementations.

There has been a lot of research on *value* speculation, though
I don't think any has made it into the commercial market.
For example, value speculation allows speculative forwarding
of a store value to a subsequent load and then at load retire
checks that the value was correct.
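To make the idea concrete, here is a toy software model of last-value prediction. It does not describe any real microarchitecture. A load's dependents consume a predicted value immediately, and the prediction is checked when the load retires; a mismatch would force a squash and replay. The table size, indexing by load PC, and all names are assumptions for illustration. The trivial `select_u64` at the end illustrates the point made later in this post: a consumer such as a CMOV just sees a value and cannot tell whether it came from memory or from the predictor.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VP_ENTRIES 64              /* hypothetical predictor size */

struct vp_entry {
    uint64_t last_value;
    bool     valid;
};

struct value_predictor {
    struct vp_entry table[VP_ENTRIES];
    unsigned hits, squashes;
};

/* Predict the value a load at 'pc' will return (last-value prediction).
   Returns false if there is no prediction yet. */
bool vp_predict(const struct value_predictor *vp, uint64_t pc, uint64_t *out)
{
    const struct vp_entry *e = &vp->table[pc % VP_ENTRIES];
    if (!e->valid)
        return false;
    *out = e->last_value;
    return true;
}

/* At retire: compare the prediction with the value actually loaded,
   train the table, and report whether dependents must be squashed. */
bool vp_verify(struct value_predictor *vp, uint64_t pc,
               bool predicted, uint64_t guess, uint64_t actual)
{
    struct vp_entry *e = &vp->table[pc % VP_ENTRIES];
    bool squash = predicted && guess != actual;
    if (predicted && !squash)
        vp->hits++;
    if (squash)
        vp->squashes++;
    e->last_value = actual;
    e->valid = true;
    return squash;
}

/* A select/cmov consumer: nothing in its inputs says "speculative". */
uint64_t select_u64(bool cond, uint64_t a, uint64_t b)
{
    return cond ? a : b;
}
```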

>> I don't think that any existing OoO implementation (certainly not all)
>> waits with a conditional move instruction (or an instruction dependent
>> on a conditional move) until it is no longer speculative. You may be
>> meaning that they treat the condition as a data dependency, rather
>> than something that itself is subject to speculation.
>>
>> Assuming you mean the latter, how would such a guarantee give any
>> "official" security guarantee? The next time it's inconvenient for
>> the CPU manufacturers or the NSA to fix a bug, they will just take
>> your stand and say that the only official security is that implemented
>> through page-table entries.
>>
>
>
> I don't follow.
> If the book (architecture manual) says that all 3 data dependencies of cmove/select are non speculative
> then SW can rely on it.

Value speculation would effectively break this CMOV guarantee
because the CMOV wouldn't know that its values were speculative.

Re: Spectre (was: Why separate 32-bit arithmetic ...)

<a24a8df0-2adc-4728-9c83-193d9e41f487n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=24427&group=comp.arch#24427


X-Received: by 2002:ad4:5961:0:b0:435:a1d7:c243 with SMTP id eq1-20020ad45961000000b00435a1d7c243mr671185qvb.46.1648056471005;
Wed, 23 Mar 2022 10:27:51 -0700 (PDT)
X-Received: by 2002:aca:ead4:0:b0:2ec:ba66:12df with SMTP id
i203-20020acaead4000000b002ecba6612dfmr5603131oih.194.1648056470403; Wed, 23
Mar 2022 10:27:50 -0700 (PDT)
Path: i2pn2.org!rocksolid2!news.neodome.net!3.us.feeder.erje.net!feeder.erje.net!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 23 Mar 2022 10:27:50 -0700 (PDT)
In-Reply-To: <2022Mar23.093749@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:cd7f:a8f3:4756:422e;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:cd7f:a8f3:4756:422e
References: <82d668ec-a140-454c-8606-9e665c750997n@googlegroups.com>
<memo.20220322141111.1928h@jgd.cix.co.uk> <2c5bd9b0-1439-49c1-a4c8-707f27a9eb11n@googlegroups.com>
<2022Mar23.093749@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a24a8df0-2adc-4728-9c83-193d9e41f487n@googlegroups.com>
Subject: Re: Spectre (was: Why separate 32-bit arithmetic ...)
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 23 Mar 2022 17:27:51 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 172

by: MitchAlsup - Wed, 23 Mar 2022 17:27 UTC

On Wednesday, March 23, 2022 at 4:58:49 AM UTC-5, Anton Ertl wrote:
> Michael S <already...@yahoo.com> writes:
> >On Tuesday, March 22, 2022 at 4:11:14 PM UTC+2, John Dallman wrote:
> >> However, neither Intel/AMD nor any of the ARM manufacturers seem to have
> >> found a practical way to achieve high performance without speculation and
> >> the consequential Spectre-like vulnerabilities.
> Spectre vulnerabilities are in no way a consequence of speculation,
<
I could argue it both ways, but in a fundamental sense, you are correct:
Spectré is a consequence of allowing microarchitectural state to become
visible even if an instruction got squashed later on, and thus is independent
of speculation per se.
<
> just like corrupted architectural state is not a consequence of
> speculation.
> >> The world as a whole's
> >> response to this problem seems to be to try to ignore it, aided by the
> >> lack of publicised Spectre attacks.
> Spectre attacks have been well-publicized. I guess you mean that
> Spectre attacks by real-world attackers (rather than security
> researchers) have not been publicized. The question is if that's
> because they don't occur, or because such attacks are so subtle that
> you don't notice when they occur, and their noticeable effects are far
> enough from the attack that even if you notice it you don't know that
> Spectre was used to get it (and given that the common narrative is
> that Spectre is not used for real-world attacks, you are likely to
> dismiss that possibility, and even if you don't, others will dismiss
> your suspicion).
<
None of the above: Neither of the available choices is immune from
Spectré--and no one has stepped up the game and presented an option
that is both fast and immune. Thus, the market has NO choice but to
just buy computers and hope for the best.
<
>
> Of course there are powerful interests that are well-known to have
> worked on subverting security (especially in subtle ways, e.g., by
> weakening random-number generators). They are likely interested in
> CPUs continuing to be vulnerable to Spectre.
> >I think it is a very sane response.
> >Spectre-V1 as well as any other existing or future Spectre variants that do not penetrate official trust boundaries
> Is security based on not performing architectural accesses to the
> secret any less official than security based on page table entries
> (another architectural feature)? Not in my book, and certainly not in
> the architecture manuals of the CPU manufacturers.
> >should be dealt with in software, i.e. in JavaScript JIT engines of Web browsers.
> But if the only security you consider official (and therefore the only
> one the hardware is required to guarantee IYO) is that provided by
> page table entries, that's incredibly expensive: The JavaScript JIT
> engine would have to change the page table entries to allow access
> before the access, and change the page table entries again after the
> access to disallow the access. What's the slowdown of that? A factor
> 100?
<
The JITer should not be modifying JIT code that is executable by a JIT
consumer. Thus, write and execute access are not permitted at the same
time. Removing execute permission in order to write involves a page shootdown,
which is exceedingly expensive. So you get into this "I'm smart enough
to hold a gun to my foot with my finger on the trigger" mantra, which allows
the JITer to modify lines on the page that are known not to be in use
by the JIT consumer. I, personally, don't think JITers are that smart.
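A minimal POSIX sketch of the W^X discipline described above: the page is writable while the JIT emits into it, then switched to read+execute before it may be run, so write and execute are never granted at the same time. It assumes Linux/POSIX `mmap`/`mprotect`, and the struct and function names are hypothetical. The buffer holds placeholder bytes, not real machine code, so the sketch never jumps into the page.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical JIT code buffer: never writable and executable at once. */
struct jit_buf {
    unsigned char *mem;
    size_t size;
};

int jit_buf_init(struct jit_buf *b)
{
    b->size = (size_t)sysconf(_SC_PAGESIZE);
    b->mem = mmap(NULL, b->size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return b->mem == MAP_FAILED ? -1 : 0;
}

/* Emit phase: page is RW, not X. Taking execute away from a page other
   cores may be running is the step that can need a TLB shootdown. */
int jit_emit(struct jit_buf *b, const void *code, size_t n)
{
    if (n > b->size)
        return -1;
    if (mprotect(b->mem, b->size, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy(b->mem, code, n);
    return 0;
}

/* Publish phase: page becomes RX and is no longer writable. */
int jit_publish(struct jit_buf *b)
{
    return mprotect(b->mem, b->size, PROT_READ | PROT_EXEC);
}

void jit_buf_free(struct jit_buf *b)
{
    munmap(b->mem, b->size);
}
```

Every emit/publish cycle costs two `mprotect` calls, which is why JITs are tempted by the cheaper, riskier approach criticized above.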
<
> >The only thing that is required by HW is providing tools for such handling.
> >In case of x86, ARM and POWER the tools, in form of non-speculative data dependencies, are already here in all
> >implementations. The only missing step is an ISA-level guarantee that for some instructions, preferably
> >for conditional moves, data dependencies will remain non-speculative in all future implementations.
<
> I don't think that any existing OoO implementation (certainly not all)
> waits with a conditional move instruction (or an instruction dependent
> on a conditional move) until it is no longer speculative. You may be
> meaning that they treat the condition as a data dependency, rather
> than something that itself is subject to speculation.
<
Waiting until non-speculative would be the kiss of death to performance.
>
> Assuming you mean the latter, how would such a guarantee give any
> "official" security guarantee? The next time it's inconvenient for
> the CPU manufacturers or the NSA to fix a bug, they will just take
> your stand and say that the only official security is that implemented
> through page-table entries.
>
> Can someone define a security model somewhere between page-table-based
> security and architectural security that involves conditional move
> instructions? It has been almost 5 years since Intel and AMD have
> been informed of Spectre, and they have not defined such a security
> model in their architecture manual despite your claim that all
> existing hardware satisfies it. Maybe it's not as easy as you think
> it is.
>
> But let's assume that they actually defined such a security model in
> their hardware manuals, what would be the consequence?
> Implementations of programming languages with bounds-checked arrays
> could introduce a conditional move on every index computation; note
> that this is quite a bit more often than the original bounds checking;
> you can eliminate a lot of bounds-check branches by knowing that the
> branch only checks a bound that already has been checked. You cannot
> do that in all those cases for index operations: E.g., if you have (in
> Java):
>
> for (i=0; i<n; i++) {
> ... a[i] ...
> }
>
> a single check of n against a.length is sufficient to eliminate
> all architectural out-of-bounds accesses. But every a[i] access can
> be speculated independently, so you would have to add a conditional
> move into every execution of a[i]. This will cost quite a bit of
> performance, probably significantly more than a proper hardware fix
> for Spectre.
>
> And then you have those languages that are not bounds-checked, in
> particular C, where inserting all these additional conditional moves
> cannot be done automatically, because the compiler in many cases does
> not know the array bounds. People have given up on IMO promising
> approaches for architectural security for C like Deputy
> <http://ivy.cs.berkeley.edu/ivywiki/uploads/deputy-manual.html>, with
> new security vulnerabilities for C programs being reported frequently
> that are exploited in non-subtle ways. So I very much doubt people
> will take a similar effort (and incur the run-time penalty!) to add
> conditional-move-based security to C programs to mitigate Spectre, for
> which the narrative is that nobody uses it for real-world attacks.
>
> And if the C libraries called by your JavaScript or Java programs do
> not have the software mitigation you suggest, adding that software
> mitigation to the JavaScript or Java compiler gives little benefit.
>
> In conclusion, dealing with Spectre "in software" the way you suggest
<
I don't see how SW can be both high performing and Spectré immune if
the underlying HW is not Spectré immune.
<
>
> * complicates the architecture descriptions.
>
> * will not be accepted for the C parts of our software stacks (and in
> the few cases where it is accepted, it will cause a slowdown).
>
> * will slow down the Java/JavaScript parts of our software stacks, but
> the software stack will still be vulnerable thanks to the C parts.
>
> By contrast, fixing Spectre in hardware the way I suggest
>
> * costs a little area.
<
I have suggested the area penalty is small, maybe even close to invisible.
>
> * costs hardly any performance.
<
I have suggested the performance penalty is essentially invisible.
>
> * works for the whole software stack without needing any changes to
> the architecture definition or the software.
<
What is the world going to do if x86, ARM, and RISC-V all fail to have
immunity from Spectré?
<
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Spectre

<552789f8-7daa-4198-b76d-b2a3aa4f27ccn@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=24428&group=comp.arch#24428


X-Received: by 2002:a05:620a:d87:b0:67b:311c:ecbd with SMTP id q7-20020a05620a0d8700b0067b311cecbdmr704518qkl.146.1648056639095;
Wed, 23 Mar 2022 10:30:39 -0700 (PDT)
X-Received: by 2002:a05:6808:1451:b0:2ec:cfe4:21e with SMTP id
x17-20020a056808145100b002eccfe4021emr5506704oiv.147.1648056638840; Wed, 23
Mar 2022 10:30:38 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 23 Mar 2022 10:30:38 -0700 (PDT)
In-Reply-To: <2cG_J.96741$dln7.46979@fx03.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:cd7f:a8f3:4756:422e;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:cd7f:a8f3:4756:422e
References: <82d668ec-a140-454c-8606-9e665c750997n@googlegroups.com>
<memo.20220322141111.1928h@jgd.cix.co.uk> <2c5bd9b0-1439-49c1-a4c8-707f27a9eb11n@googlegroups.com>
<2022Mar23.093749@mips.complang.tuwien.ac.at> <02d8578d-eb8e-463d-87de-e68a23952aa4n@googlegroups.com>
<2cG_J.96741$dln7.46979@fx03.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <552789f8-7daa-4198-b76d-b2a3aa4f27ccn@googlegroups.com>
Subject: Re: Spectre
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 23 Mar 2022 17:30:39 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 39

by: MitchAlsup - Wed, 23 Mar 2022 17:30 UTC

On Wednesday, March 23, 2022 at 9:32:33 AM UTC-5, EricP wrote:
> Michael S wrote:
> > On Wednesday, March 23, 2022 at 11:58:49 AM UTC+2, Anton Ertl wrote:
> >> Michael S <already...@yahoo.com> writes:
> >>> The only thing that is required by HW is providing tools for such handling.
> >>> In case of x86, ARM and POWER the tools, in form of non-speculative data dependencies, are already here in all
> >>> implementations. The only missing step is an ISA-level guarantee that for some instructions, preferably
> >>> for conditional moves, data dependencies will remain non-speculative in all future implementations.
> There has been a lot of research on *value* speculation, though
> I don't think any has made it into the commercial market.
> For example, value speculation allows speculative forwarding
> of a store value to a subsequent load and then at load retire
> checks that the value was correct.
<
This ruins the concept of memory consistency when you get good
enough at speculating data that the memory reference is never
actually performed.
<
> >> I don't think that any existing OoO implementation (certainly not all)
> >> waits with a conditional move instruction (or an instruction dependent
> >> on a conditional move) until it is no longer speculative. You may be
> >> meaning that they treat the condition as a data dependency, rather
> >> than something that itself is subject to speculation.
> >>
> >> Assuming you mean the latter, how would such a guarantee give any
> >> "official" security guarantee? The next time it's inconvenient for
> >> the CPU manufacturers or the NSA to fix a bug, they will just take
> >> your stand and say that the only official security is that implemented
> >> through page-table entries.
> >>
> >
> >
> > I don't follow.
> > If the book (architecture manual) says that all 3 data dependencies of cmove/select are non speculative
> > then SW can rely on it.
> Value speculation would effectively break this CMOV guarantee
> because the CMOV wouldn't know that its values were speculative.
<
No-one is going to produce a machine where CMOV cannot be performed
under speculation based entirely on data (and control) flow.

Re: Spectre (was: Why separate 32-bit arithmetic ...)

<2022Mar23.165254@mips.complang.tuwien.ac.at>


https://www.novabbs.com/devel/article-flat.php?id=24430&group=comp.arch#24430


Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Spectre (was: Why separate 32-bit arithmetic ...)
Date: Wed, 23 Mar 2022 15:52:54 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 267
Message-ID: <2022Mar23.165254@mips.complang.tuwien.ac.at>
References: <82d668ec-a140-454c-8606-9e665c750997n@googlegroups.com> <memo.20220322141111.1928h@jgd.cix.co.uk> <2c5bd9b0-1439-49c1-a4c8-707f27a9eb11n@googlegroups.com> <2022Mar23.093749@mips.complang.tuwien.ac.at> <02d8578d-eb8e-463d-87de-e68a23952aa4n@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="59d4db3a8cbbe98be236cd8a7a3c085e";
logging-data="16382"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/qUMbBEO/vtaV1v448xuED"
Cancel-Lock: sha1:VUU0qz68NUHe17HEMoDzxfD+zIs=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Wed, 23 Mar 2022 15:52 UTC

Michael S <already5chosen@yahoo.com> writes:
>On Wednesday, March 23, 2022 at 11:58:49 AM UTC+2, Anton Ertl wrote:
>> Michael S <already...@yahoo.com> writes:
>> The next time it's inconvenient for
>> the CPU manufacturers or the NSA to fix a bug, they will just take
>> your stand and say that the only official security is that implemented
>> through page-table entries.
>>
>
>
>I don't follow.
>If the book (architecture manual) says that all 3 data dependencies of cmove/select are non speculative
>then SW can rely on it.

Sorry, "speculative" is not an architectural term. What is "SW can
rely on it" supposed to mean?

Consider the following machine code:

0: b8 00 00 00 00 mov eax,0x0
5: 48 39 d6 cmp rsi,rdx
8: 72 01 jb b <foo+0xb>
a: c3 ret
b: 48 8b 04 f7 mov rax,QWORD PTR [rdi+rsi*8]
f: eb f9 jmp a <foo+0xa>

[ This was generated by gcc-10 -O from:

long foo(long *a, unsigned long i, unsigned long n)
{
if (i<n)
return a[i];
return 0;
}

Can anyone explain the jump back and the placement of the instruction
at 0 to me? ]
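For comparison, here is a sketch of the kind of branchless index hardening Michael S has in mind, applied to foo() above. The mask follows the generic formula used by the Linux kernel's `array_index_mask_nospec()` fallback: all ones when i < n, zero otherwise, computed with arithmetic instead of a branch. It assumes n <= LONG_MAX, two's-complement conversion, and arithmetic right shift of negative longs (implementation-defined in C, but what GCC and Clang do). The names are hypothetical, the empty `__asm__` is a GCC/Clang extension, and nothing in the C standard prevents a compiler from reintroducing a branch.

```c
#include <limits.h>
#include <stddef.h>

/* All ones if i < n, zero otherwise, without a conditional branch. */
static inline unsigned long index_mask(unsigned long i, unsigned long n)
{
    __asm__("" : "+r"(i));     /* hide i so the optimizer can't fold it */
    return (unsigned long)(~(long)(i | (n - 1UL - i))
                           >> (sizeof(long) * CHAR_BIT - 1));
}

/* foo() with the index clamped by data flow: if the i<n branch is
   mispredicted, the speculative load uses index 0 instead of i. */
long foo_masked(long *a, unsigned long i, unsigned long n)
{
    if (i < n)
        return a[i & index_mask(i, n)];
    return 0;
}
```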

If I look in the architecture manual what this program does, it tells
me that if rsi>=rdx, the jb falls through and the code never reaches
the mov at offset b, and therefore does not perform it and does not
access the data at rdi+rsi*8. Now you claim that this official
architecture manual (which also describes the page tables) does not
describe the official trust boundaries. Why should a statement about
"speculative dependencies" be any different? Assuming your statement
is added to the manuals, and people put cmovs in all kinds of places,
and then a vulnerability is discovered, someone could claim that
that's not "official trust boundaries", so it's not a bug, and
therefore software should work around it. And this could even happen
if the conditional move performed speculative execution based on the
condition, because someone (probably not you in this case, but someone
like you, maybe EricP) would claim that the only official trust
boundary is that implemented in page tables.

But note that Meltdown has taught us that page-table entries cannot be
relied on, either; fortunately that was easier to fix in hardware and
easier to mitigate in software, but does that make page tables more
official than the architectural execution sequence?

>Right now not only cmove but all Intel and AMD instructions have all data dependencies non-speculative.

In your dreams. Alias prediction has been used for quite some time.
One can probably work out a Spectre vulnerability based on it.

>I'd guess the fact won't be codified in manuals until they have plans to make some of the dependencies speculative.

They also have not codified branch prediction in the architecture
manuals (only in the optimization manuals). That's because branch
prediction is microarchitecture, not architecture.

>That's simply how it has worked until now in the x86 world - hw practice leads and manuals follow.
>Back in mid-00s when they finally documented memory ordering model, it was the same story.

We have had branch prediction since the Pentium in 1993.

>So, I expect that guarantees will first appear in ARM manuals and then Intel and AMD will say that they are the same.

I doubt that the ARM manuals describe branch prediction or alias prediction
in the architecture part. That's because that's microarchitecture.

>> Implementations of programming languages with bounds-checked arrays
>
>It only matters if the language guarantees sandboxing properties.
>Most bounds-checked languages do not.
>JS does, and everybody cares.
>Java sort-of does, but 99.9% of users do not care.

I am pretty sure that most Java programmers and most users of Java
programs care, as well as the users of all other bounds-checked
languages. Security and safety are big selling points of these
languages. If you told the programmers that programs in their languages
are supposed to leak information like badly written C programs, why
should they use these languages instead of C?

>> could introduce a conditional move on every index computation; note
>> that this is quite a bit more often than the original bounds checking;
>> you can eliminate a lot of bounds-check branches by knowing that the
>> branch only checks a bound that already has been checked. You cannot
>> do that in all those cases for index operations: E.g., if you have (in
>> Java):
>>
>> for (i=0; i<n; i++) {
>> ... a[i] ...
>> }
>
>That's easy.
>p = a.ptr;
>if (n >= a.length) {
> throw()
>}
>p = n >= a.length ? 0 : p; // implemented as cmove/select
>for (i=0; i<n; i++) {
> ... p[i] ...
>}

Sorry, if n>100 (and also for most values of n<100), your average OoO
CPU will access p[n], p[n+1], p[n+2], ... until the branch
misprediction is corrected. The throw at the start is also premature,
as you must only throw when i==a.length. So you really need:

for (i=0; i<n; i++) {
i1 = i<a.length ? i : 0; // implemented as conditional move
... a[i1] ...
}

I leave inserting the throw check as an exercise to the reader.
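One possible C rendering of that loop, including the throw check, is sketched below. The function `sum_prefix()` is hypothetical and models the Java semantics: it fails (returning -1 in place of throwing `ArrayIndexOutOfBoundsException`) as soon as i reaches the array length, and every access uses an index that has passed through a select, so a mispredicted loop branch should not drive the load past the array. As with any C version, the compiler could drop the select because it can see the check has already been done; the empty `__asm__` (GCC/Clang only) is there to prevent that, and there is still no guarantee the select becomes a conditional move.

```c
#include <stddef.h>

/* Sum a[0..n-1] with Java-like semantics: returns 0 and stores the sum
   on success, or returns -1 (standing in for a thrown
   ArrayIndexOutOfBoundsException) when i reaches len. */
int sum_prefix(const long *a, size_t len, size_t n, long *sum)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i >= len)
            return -1;                 /* the "throw" */
        size_t i1 = i;
        __asm__("" : "+r"(i1));        /* hide i1 so the select survives */
        i1 = i1 < len ? i1 : 0;        /* intended as a conditional move */
        s += a[i1];
    }
    *sum = s;
    return 0;
}
```

Note that the select executes on every iteration, even though the architectural check could in principle be hoisted out of the loop, which is the per-access cost discussed above.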

>"Proper hardware fix for Spectre" is a pipe dream, IMHO.
>At least I, personally, would not pay for it even 3% either in performance or in money.

Somehow these statements are at odds with each other. "Pipe dream"
suggests it is not possible; the second statement says that it is
worthless to you.

>> And then you have those languages that are not bounds-checked, in
>> particular C, where inserting all these additional conditional moves
>> cannot be done automatically, because the compiler in many cases does
>> not know the array bounds. People have given up on IMO promising
>> approaches for architectural security for C like Deputy
>> <http://ivy.cs.berkeley.edu/ivywiki/uploads/deputy-manual.html>, with
>> new security vulnerabilities for C programs being reported frequently
>> that are exploited in non-subtle ways. So I very much doubt people
>> will take a similar effort (and incur the run-time penalty!) to add
>> conditional-move-based security to C programs to mitigate Spectre, for
>> which the narrative is that nobody uses it for real-world attacks.
>>
>
>I don't understand how 'C' came into discussion.
>C does not promise anything w.r.t. read protection of parts of program
>state from other parts of the same program, so no promise can be violated.

If you read what the C standard says about the function foo() above,
the program does not perform the access to a[i] if i>=n.

>C library called by JavaScript is a fixed code. It can't be modified by attacker to suit his needs.

The attacker does not need to modify the code to suit his needs. The
attacker just needs to find code in the library that suits his needs
(that's called a "gadget" by security researchers). E.g., if the
attacker can control i in the call to foo() above, and there is a side
channel that reveals the result of foo(), that would be a gadget.

In your "should be dealt in SW" scenario, all the C code has to be
inspected for gadgets, and that may have to be repeated every time a
new Spectre variant is discovered.

>There is absolutely no needs for Spectre mitigations in such library.

Then you leave the library, and thus the whole process open to Spectre
attacks.

>> In conclusion, dealing with Spectre "in software" the way you suggest
>>
>> * complicates the architecture descriptions.
>
>Yes. I prefer to call it "tighten", but yes it's not very simple.
>Still, much simpler than, for example, memory ordering or than many other things related to cache coherence.

Specifying sequential consistency would be simple. It's just that the
hardware people want to implement something weaker that makes
specifying it complex.

>> By contrast, fixing Spectre in hardware the way I suggest
>>
>> * costs a little area.
>>
>> * costs hardly any performance.
>>
>
>Handwaving.
>
>> * works for the whole software stack without needing any changes to
>> the architecture definition or the software.
>
>Does not work at all before demonstrated.

The Pentium Pro and every OoO processor since then have demonstrated
that hardware people know how to change architectural state
speculatively without committing these changes to permanent state.
The hardware people could do the same with microarchitectural state;
until Spectre, they made it permanent because it is microarchitectural
state, which supposedly is not relevant. Spectre has demonstrated that
it is relevant.
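A toy software model of that idea follows. It does not describe any shipping core or a specific published design. Cache fills caused by speculative loads go into a side buffer and are installed only when the loads commit. On a squash they are simply dropped, just as OoO cores already drop speculative register results. The direct-mapped cache, the buffer size, and all names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SETS 16              /* hypothetical direct-mapped cache */
#define SPEC_SLOTS 8               /* hypothetical speculative fill buffer */

struct toy_cache {
    uint64_t tag[CACHE_SETS];
    bool     valid[CACHE_SETS];
    uint64_t spec_line[SPEC_SLOTS];    /* fills not yet committed */
    size_t   spec_count;
};

bool cache_hit(const struct toy_cache *c, uint64_t line)
{
    size_t set = line % CACHE_SETS;
    return c->valid[set] && c->tag[set] == line;
}

/* A speculative load that misses records its fill on the side instead
   of evicting anything from the visible cache. (A full buffer simply
   drops the fill here; real hardware would stall or refetch.) */
void spec_load(struct toy_cache *c, uint64_t line)
{
    if (!cache_hit(c, line) && c->spec_count < SPEC_SLOTS)
        c->spec_line[c->spec_count++] = line;
}

/* The loads became non-speculative: make their fills visible. */
void commit_loads(struct toy_cache *c)
{
    for (size_t k = 0; k < c->spec_count; k++) {
        size_t set = c->spec_line[k] % CACHE_SETS;
        c->tag[set] = c->spec_line[k];
        c->valid[set] = true;
    }
    c->spec_count = 0;
}

/* Misprediction: discard the fills, so in this model the visible cache
   state is exactly what it was before the squashed loads. */
void squash_loads(struct toy_cache *c)
{
    c->spec_count = 0;
}
```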


Re: Spectre

<2kK_J.37396$zT.15971@fx27.iad>


https://www.novabbs.com/devel/article-flat.php?id=24431&group=comp.arch#24431


Path: i2pn2.org!i2pn.org!news.neodome.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer02.ams4!peer.am4.highwinds-media.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx27.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Spectre
References: <82d668ec-a140-454c-8606-9e665c750997n@googlegroups.com> <memo.20220322141111.1928h@jgd.cix.co.uk> <2c5bd9b0-1439-49c1-a4c8-707f27a9eb11n@googlegroups.com> <2022Mar23.093749@mips.complang.tuwien.ac.at> <02d8578d-eb8e-463d-87de-e68a23952aa4n@googlegroups.com> <2cG_J.96741$dln7.46979@fx03.iad> <552789f8-7daa-4198-b76d-b2a3aa4f27ccn@googlegroups.com>
In-Reply-To: <552789f8-7daa-4198-b76d-b2a3aa4f27ccn@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 74
Message-ID: <2kK_J.37396$zT.15971@fx27.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Wed, 23 Mar 2022 19:14:06 UTC
Date: Wed, 23 Mar 2022 15:14:06 -0400
X-Received-Bytes: 4809

by: EricP - Wed, 23 Mar 2022 19:14 UTC

MitchAlsup wrote:
> On Wednesday, March 23, 2022 at 9:32:33 AM UTC-5, EricP wrote:
>> Michael S wrote:
>>> On Wednesday, March 23, 2022 at 11:58:49 AM UTC+2, Anton Ertl wrote:
>>>> Michael S <already...@yahoo.com> writes:
>>>>> The only thing that is required by HW is providing tools for such handling.
>>>>> In case of x86, ARM and POWER the tools, in form of non-speculative data dependencies, are already here in all
>>>>> implementations. The only missing step is an ISA-level guarantee that for some instructions, preferably
>>>>> for conditional moves, data dependencies will remain non-speculative in all future implementations.
>> There has been a lot of research on *value* speculation though
>> I don't think any has made it into the commercial market.
>> For example, value speculation allows speculative forwarding
>> of a store value to a subsequent load and then at load retire
>> checks that the value was correct.
> <
> This ruins the concept of memory consistency when you get good
> enough at speculating data that the memory reference is never
> actually performed.

As I understand it, the consistency model is still obeyed, but the
rules for achieving it are rearranged. In x86-TSO:

ST [r0], r1
LD r2, [r3] // LD#1
LD r4, [r5] // LD#2

If address [r3] is unresolved and [r5] is resolved, LD#2 cannot bypass
LD#1 because [r3] might be the same address and an Invalidate message
for that address might arrive between LD#1 and LD#2.
If that did happen then the older LD#1 gets the younger value and
younger LD#2 gets the older value, violating x86-TSO consistency.
So LD#2 must wait for [r3] to resolve even though this scenario is
extremely unlikely.

A value speculating LSQ could allow LD#2 to bypass older unresolved
addresses and execute. It gets its value from cache or the older ST and
forwards it to dependents and dest register, and marks the uOp as
"Done-Speculate". However the uOp remains in the LSQ with the value.
Later when all older addresses have resolved it repeats its load
but this time in proper x86-TSO order, compares the values,
and if the same marks the uOp as "Done-Final" or if different
triggers a replay trap.

This sounds to me like the logic to implement this LSQ would be a lot
simpler than trying to track which uOps made potentially hazardous
bypasses and detect when actual consistency violations occur.
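EricP's value-speculating LSQ can be illustrated with a toy C model of a single load-queue entry. The state names follow his "Done-Speculate"/"Done-Final" description; the struct and function names, and the use of a plain pointer read in place of a cache access or store-to-load forwarding, are assumptions made for this sketch, not a description of any shipping design.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model of one load-queue entry; a pointer dereference
   stands in for reading the cache or forwarding from an older store. */
enum ld_state { LD_WAITING, LD_DONE_SPECULATE, LD_DONE_FINAL, LD_REPLAY };

struct ld_entry {
    const uint64_t *addr;   /* resolved address of this load (LD#2) */
    uint64_t spec_value;    /* value forwarded early to dependents */
    enum ld_state state;
};

/* Execute the load early, bypassing older loads with unresolved addresses. */
void ld_execute_early(struct ld_entry *e)
{
    assert(e->state == LD_WAITING);
    e->spec_value = *e->addr;     /* dependents may consume this right away */
    e->state = LD_DONE_SPECULATE; /* entry stays in the LSQ with its value */
}

/* Once all older addresses have resolved: repeat the load in proper TSO
   order and compare with the value that was handed out early. */
void ld_verify_in_order(struct ld_entry *e)
{
    assert(e->state == LD_DONE_SPECULATE);
    uint64_t in_order_value = *e->addr;
    e->state = (in_order_value == e->spec_value) ? LD_DONE_FINAL : LD_REPLAY;
}
```

If the memory word changes between the two calls (standing in for an Invalidate plus a remote store arriving in between), the entry ends in LD_REPLAY; a real core would then replay LD#2 and everything that consumed its value.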

> <
>>>> I don't think that any existing OoO implementation (certainly not all)
>>>> waits with a conditional move instruction (or an instruction dependent
>>>> on a conditional move) until it is no longer speculative. You may be
>>>> meaning that they treat the condition as a data dependency, rather
>>>> than something that itself is subject to speculation.
>>>>
>>>> Assuming you mean the latter, how would such a guarantee give any
>>>> "official" security guarantee? The next time it's inconvenient for
>>>> the CPU manufacturers or the NSA to fix a bug, they will just take
>>>> your stand and say that the only official security is that implemented
>>>> through page-table entries.
>>>>
>>>
>>> I don't follow.
>>> If the book (architecture manual) says that all 3 data dependencies of cmove/select are non speculative
>>> then SW can rely on it.
>> Value speculation would effectively break this CMOV guarantee
>> because the CMOV wouldn't know that its values were speculative.
> <
> No-one is going to produce a machine where CMOV cannot be performed
> under speculation based entirely on data (and control) flow.

Yes. Tracking when values are speculative or not gets complicated
real quick when one considers resolved and unresolved nested branches
interacting with resolved and unresolved LSQ ordered addresses.

Re: Spectre (was: Why separate 32-bit arithmetic ...)

<4b119551-6012-4982-8ca6-92a85f9708ccn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24432&group=comp.arch#24432

Newsgroups: comp.arch
Date: Wed, 23 Mar 2022 12:32:13 -0700 (PDT)
In-Reply-To: <2022Mar23.165254@mips.complang.tuwien.ac.at>
Subject: Re: Spectre (was: Why separate 32-bit arithmetic ...)
From: MitchAl...@aol.com (MitchAlsup)

by: MitchAlsup - Wed, 23 Mar 2022 19:32 UTC

On Wednesday, March 23, 2022 at 1:51:32 PM UTC-5, Anton Ertl wrote:
> Michael S <already...@yahoo.com> writes:
> >On Wednesday, March 23, 2022 at 11:58:49 AM UTC+2, Anton Ertl wrote:
> >> Michael S <already...@yahoo.com> writes:
> >> The next time it's inconvenient for
> >> the CPU manufacturers or the NSA to fix a bug, they will just take
> >> your stand and say that the only official security is that implemented
> >> through page-table entries.
> >>
> >
> >
> >I don't follow.
> >If the book (architecture manual) says that all 3 data dependencies of cmove/select are non speculative
> >then SW can rely on it.
> Sorry, "speculative" is not an architectural term. What is "SW can
> rely on it" supposed to mean?
>
> Consider the following machine code:
>
> 0: b8 00 00 00 00 mov eax,0x0
> 5: 48 39 d6 cmp rsi,rdx
> 8: 72 01 jb b <foo+0xb>
> a: c3 ret
> b: 48 8b 04 f7 mov rax,QWORD PTR [rdi+rsi*8]
> f: eb f9 jmp a <foo+0xa>
>
> [ This was generated by gcc-10 -O from:
>
> long foo(long *a, unsigned long i, unsigned long n)
> {
> if (i<n)
> return a[i];
> return 0;
> }
>
> Can anyone explain the jump back and the placement of the instruction
> at 0 to me? ]
<
It is likely that at some point the compiler thought there might be an epilogue
to this subroutine, and the jump is there to make all returns share common code.
<
On the other hand, the unconditional-jump-to-ret peephole optimization
was not performed, either.
>
> If I look in the architecture manual what this program does, it tells
> me that if rsi>=rdx, the jb falls through and the code never reaches
> the mov at offset b, and therefore does not perform it and does not
> access the data at rdi+rsi*8. Now you claim that this official
> architecture manual (which also describes the page tables) does not
> describe the official trust boundaries. Why should a statement about
> "speculative dependencies" be any different?
<
It is different because in the code above, the LD was not performed if
i>=n. In the CMOV case, the LD would always be performed.
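Mitch's distinction can be made concrete in C. In the branchy foo() the load is skipped when i>=n; in a select-style version the load always executes, just at a harmless index. A minimal sketch, assuming n>0 so that a[0] exists (foo_masked is a hypothetical name). ISO C says nothing about speculation, so a compiler may still turn the comparison into a branch; whether this helps against Spectre depends on the generated code and the hardware, which is exactly the guarantee under discussion.

```c
#include <assert.h>

/* Hypothetical branch-free variant of foo(): the load is always performed,
   at index i when i < n and at index 0 otherwise. */
long foo_masked(const long *a, unsigned long i, unsigned long n)
{
    assert(n > 0);                             /* a[0] must exist */
    unsigned long in_bounds = (i < n);         /* 1 or 0 */
    unsigned long idx_mask = 0UL - in_bounds;  /* all ones if in bounds, else 0 */
    long val_mask = -(long)in_bounds;          /* -1 (all ones) or 0 */
    long v = a[i & idx_mask];                  /* never reads outside a[0..n-1] */
    return v & val_mask;                       /* a[i] or 0, like foo() */
}
```

Unlike foo(), this version touches a[0] even when the result is discarded, which is why it needs n>0.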
<
> Assuming your statement
> is added to the manuals, and people put cmovs in all kinds of places,
> and then a vulnerability is discovered, someone could claim that
> that's not "official trust boundaries", so it's not a bug, and
> therefore software should work around it. And this could even happen
> if the conditional move performed speculative execution based on the
> condition, because someone (probably not you in this case, but someone
> like you, maybe EricP) would claim that the only official trust
> boundary is that implemented in page tables.
>
> But note that Meltdown has taught us that page-table entries cannot be
> relied on, either; fortunately that was easier to fix in hardware and
> easier to mitigate in software, but does that make page tables more
> official than the architectural execution sequence?
> >Right now not only cmove but all Intel and AMD instructions have all data dependencies non-speculative.
> In your dreams. Alias prediction has been used for quite some time.
> One can probably work out a Spectre vulnerability based on it.
> >I'd guess the fact wouldn't be codified in manuals until they have plans to make some of the dependencies speculative.
> They also have not codified branch prediction in the architecture
> manuals (only in the optimization manuals). That's because branch
> prediction is microarchitecture, not architecture.
<
Is not all prediction microarchitecture ?
<
> >That's simply how it worked until now in x86 world - hw practice leads and manuals follow.
> >Back in mid-00s when they finally documented memory ordering model, it was the same story.
> We have had branch prediction since the Pentium in 1993.
> >So, I expect that guarantees will first appear in ARM manuals and then Intel and AMD will say that they are the same.
> I doubt that ARM manuals describe branch prediction or alias prediction
> in the architecture part. That's because that's microarchitecture.
> >> Implementations of programming languages with bounds-checked arrays
> >
> >It only matters if the language guarantees sandboxing properties.
> >Most bound-checked languages do not.
> >JS does and everybody cares.
> >Java sort-of does, but 99.9% of users do not care
> I am pretty sure that most Java programmers and most users of Java
> programs care, as well as the users of all other bounds-checked
> languages. Security and safety are big selling points of these
> languages. If you told the programmers that programs in their languages
> are supposed to leak information like badly written C programs, why
> should they use these languages instead of C?
<
Because of JIT. If C had been set up for download-compile-and-go, Java
and JS would not have gotten going.
<
> >> could introduce a conditional move on every index computation; note
> >> that this is quite a bit more often than the original bounds checking;
> >> you can eliminate a lot of bounds-check branches by knowing that the
> >> branch only checks a bound that already has been checked. You cannot
> >> do that in all those cases for index operations: E.g., if you have (in
> >> Java):
> >>
> >> for (i=0; i<n; i++) {
> >> ... a[i] ...
> >> }
> >
> >That's easy.
> >p = a.ptr;
> >if (n >= a.length) {
> > throw()
> >}
> >p = n >= a.length ? 0 : p; // implemented as cmove/select
> >for (i=0; i<n; i++) {
> > ... p[i] ...
> >}
> Sorry, if n>100 (and also for most values of n<100), your average OoO
> CPU will access p[n], p[n+1], p[n+2], ... until the branch
> misprediction is corrected. The throw at the start is also premature,
> as you must only throw when i==a.length. So you really need:
> for (i=0; i<n; i++) {
> i1 = i<a.length ? i : 0; // implemented as conditional move
> ... a[i1] ...
> }
>
> I leave inserting the throw check as an exercise to the reader.
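One way to do the exercise, rendered in C rather than Java: the clamped load happens before the architectural check, and the "throw" becomes an early return of -1 when i reaches len. This is a sketch under assumptions (len>0, hypothetical name sum_clamped). An optimizing compiler is allowed to move the load below the check and then drop the clamp as redundant, so plain C cannot by itself guarantee that the conditional move survives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C version of the loop: sums a[0..n-1] for an array of len
   elements and returns -1 ("throw") as soon as i reaches len.
   Assumes len > 0 so that a[0] is always a legal fallback index. */
int sum_clamped(const long *a, size_t len, size_t n, long *out)
{
    assert(len > 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        size_t i1 = i < len ? i : 0;  /* intended to compile to a cmov/select */
        long v = a[i1];               /* never reads outside a[0..len-1] */
        if (i >= len)                 /* architectural bounds check: "throw" */
            return -1;
        sum += v;
    }
    *out = sum;
    return 0;
}
```

For example, with a four-element array sum_clamped(a, 4, 6, &s) returns -1 at i==4, having read only in-bounds elements.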
> >"Proper hardware fix for Spectre" is a pipe dream, IMHO.
> >At least I, personally, would not pay for it even 3% either in performance or in money.
> Somehow these statements are at odds with each other. "Pipe dream"
> suggests it is not possible; the second statement says that it is
> worthless to you.
> >> And then you have those languages that are not bounds-checked, in
> >> particular C, where inserting all these additional conditional moves
> >> cannot be done automatically, because the compiler in many cases does
> >> not know the array bounds. People have given up on IMO promising
> >> approaches for architectural security for C like Deputy
> >> <http://ivy.cs.berkeley.edu/ivywiki/uploads/deputy-manual.html>, with
> >> new security vulnerabilities for C programs being reported frequently
> >> that are exploited in non-subtle ways. So I very much doubt people
> >> will take a similar effort (and incur the run-time penalty!) to add
> >> conditional-move-based security to C programs to mitigate Spectre, for
> >> which the narrative is that nobody uses it for real-world attacks.
> >>
> >
> >I don't understand how 'C' came into discussion.
> >C does not promise anything w.r.t. read protection of parts of program
> >state from other parts of the same program, so no promise can be violated.
> If you read what the C standard says about the function foo() above,
> the program does not perform the access to a[i] if i>=n.
<
But overly aggressive optimization may ignore this fact.
<
> >C library called by JavaScript is a fixed code. It can't be modified by attacker to suit his needs.
> The attacker does not need to modify the code to suit his needs. The
> attacker just needs to find code in the library that suits his needs
> (that's called a "gadget" by security researchers). E.g., if the
> attacker can control i in the call to foo() above, and there is a side
> channel that reveals the result of foo(), that would be a gadget.
>
> In your "should be dealt in SW" scenario, all the C code has to be
> inspected for gadgets, and that may have to be repeated every time a
> new Spectre variant is discovered.
> >There is absolutely no need for Spectre mitigations in such a library.
> Then you leave the library, and thus the whole process open to Spectre
> attacks.
> >> In conclusion, dealing with Spectre "in software" the way you suggest
> >>
> >> * complicates the architecture descriptions.
> >
> >Yes. I prefer to call it "tighten", but yes it's not very simple.
> >Still, much simpler than, for example, memory ordering or than many other things related to cache coherence.
> Specifying sequential consistency would be simple. It's just that the
> hardware people want to implement something weaker that makes
> specifying it complex.
> >> By contrast, fixing Spectre in hardware the way I suggest
> >>
> >> * costs a little area.
> >>
> >> * costs hardly any performance.
> >>
> >
> >Handwaving.
> >
> >> * works for the whole software stack without needing any changes to
> >> the architecture definition or the software.
> >
> >Does not work at all before demonstrated.
> The Pentium Pro and every OoO processor since then have demonstrated
> that hardware people know how to speculatively change architectural
> state without committing these changes to permanent state. Hardware
> people could do the same with microarchitectural state; until Spectre
> they made it permanent because microarchitectural state supposedly was
> not relevant. Spectre has demonstrated that it is relevant.
>
> As for the costs:
>
> Area: We need:
>
> * Buffers for cache lines for in-flight loads (that missed the L1
> cache). You could provide buffers for all the possible loads (size
> similar to the physical ZMM register file if you provided physical
> registers for all possible in-flight instructions), or you can
> provide fewer buffers (just like there are not physical registers
> for everything). I think that if you size these buffers for, say,
> losing 1% performance on average (due to occasional loads having to
> wait for a buffer to become free), the buffers would be relatively small: a
> smaller proportion of instructions are loads than produce register
> results; in many of the cases these loads are in the L1 cache, and
> we may be able to avoid the buffer for that; and often the same
> cache lines are loaded several times (e.g., accessing consecutive
> elements of an array), so the buffer is needed only once. I have no
> measurement results for that, so you can still dismiss it as
> handwaving.
<
All of the BGOoO machines I have worked on already had at least 8
outstanding miss buffers. I suspect that if you waited to update
cache state until after retirement, you would lose less than that 1%.
{But note all caches are microarchitecture--except where one gets
prefetch and post-push instructions, along with INVAL,...but I digress}
>
> * For branch predictions: the actual direction/target needs to be
> stored with the instruction anyway until it is retired, so no
> additional buffers are needed.
<
Only until it is checked (once it has been verified as consistent, it can
be disposed of before retirement). But again, a very minor area penalty, if any.
>
> * There are also other microarchitectural states, e.g., the power
> state of AVX256 units, but they don't need big buffers, so doing
> that right costs little area.
>
> Performance: As long as we don't run out of buffers, there is no
> slowdown. There might even be a speedup because the load buffers
> conceptually enlarge the L1 cache, branch mispredictions don't evict
> useful lines from the L1 cache, and don't update the branch predictor
> with invalid data.
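The load-buffer idea (cache state updated only at retirement, as Anton and Mitch describe) can be sketched as a toy C model: a speculative miss parks its line in a fill buffer, retirement installs it in L1, and a squash discards it without touching L1. Everything here (a direct-mapped 64-set L1, 8 fill buffers, sequence numbers for age, all names) is assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define L1_SETS 64
#define FILL_BUFS 8

/* Hypothetical toy model: L1 tags change only at retirement; speculative
   misses keep their line in a fill buffer until then. */
struct l1 { uint64_t tag[L1_SETS]; bool valid[L1_SETS]; };
struct fill { uint64_t line; uint32_t seq; bool busy; };

static struct l1 cache;
static struct fill fb[FILL_BUFS];

bool l1_hit(uint64_t line)
{
    unsigned set = line % L1_SETS;
    return cache.valid[set] && cache.tag[set] == line;
}

/* Speculative load miss: grab a fill buffer; false means the load must wait. */
bool fill_alloc(uint64_t line, uint32_t seq)
{
    for (int k = 0; k < FILL_BUFS; k++)
        if (!fb[k].busy) {
            fb[k] = (struct fill){ line, seq, true };
            return true;
        }
    return false;
}

/* The load with sequence number seq retires: only now may its line enter
   L1 (and only now is anything evicted). */
void fill_retire(uint32_t seq)
{
    for (int k = 0; k < FILL_BUFS; k++)
        if (fb[k].busy && fb[k].seq == seq) {
            unsigned set = fb[k].line % L1_SETS;
            cache.tag[set] = fb[k].line;
            cache.valid[set] = true;
            fb[k].busy = false;
        }
}

/* Misprediction: drop fills of all loads younger than seq; L1 is untouched.
   Sequence numbers are assumed not to wrap. */
void fill_squash_after(uint32_t seq)
{
    for (int k = 0; k < FILL_BUFS; k++)
        if (fb[k].busy && fb[k].seq > seq)
            fb[k].busy = false;
}
```

A false return from fill_alloc corresponds to the occasional load that has to wait for a free buffer, the case Anton proposes to size the buffers around.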
<
Create a temporal cache between the pipelines and L1. A 48-entry fully
associative memory reorder buffer supplied at least 50% of all the
LDs and STs in Mc 88120. STs can deliver data, LDs can read the
stored data, and it can all be thrown away if the ST fails to retire.
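Mitch's temporal cache can be sketched the same way: a small fully associative buffer where stores deposit data, younger loads read it, and stores that fail to retire are dropped without L1 ever seeing them. The 48-entry size is his Mc 88120 figure; the structure, field and function names are assumptions, and writing retired stores through to L1 is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MRB_ENTRIES 48   /* size from Mitch's Mc 88120 figure */

/* Hypothetical model of a small fully associative buffer between the
   pipelines and L1: speculative stores live here, loads can read them. */
struct mrb_entry { uint64_t addr; uint64_t data; uint32_t seq; bool valid; };
static struct mrb_entry mrb[MRB_ENTRIES];

/* A store writes only into the buffer; returns false if it is full. */
bool mrb_store(uint64_t addr, uint64_t data, uint32_t seq)
{
    for (int k = 0; k < MRB_ENTRIES; k++)
        if (!mrb[k].valid) {
            mrb[k] = (struct mrb_entry){ addr, data, seq, true };
            return true;
        }
    return false;
}

/* A load takes its data from the youngest older store to the same address;
   false means "go to L1 instead". */
bool mrb_load(uint64_t addr, uint32_t seq, uint64_t *data)
{
    bool found = false;
    uint32_t best = 0;
    for (int k = 0; k < MRB_ENTRIES; k++) {
        const struct mrb_entry *e = &mrb[k];
        if (e->valid && e->addr == addr && e->seq < seq &&
            (!found || e->seq > best)) {
            best = e->seq;
            *data = e->data;
            found = true;
        }
    }
    return found;
}

/* Stores younger than seq fail to retire (e.g. after a mispredict) and are
   simply dropped. Sequence numbers are assumed not to wrap. */
void mrb_squash_after(uint32_t seq)
{
    for (int k = 0; k < MRB_ENTRIES; k++)
        if (mrb[k].valid && mrb[k].seq > seq)
            mrb[k].valid = false;
}
```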
>
> In addition to side channels by permanent effects on
> microarchitectural state of the core, there are also effects on the
> microarchitectural state of, e.g., the DRAM controllers (in
> particular, open banks); these can be handled by waiting until the
> instruction is non-speculative, or by not keeping the banks open; the
> latter is probably more expensive.
<
Servers prefer closed-page DRAM while HP FP prefers open-page mode.
While the fall-out traffic from Lmax to DRAM is partially correlated
from miss to miss, the traffic from 2 different CPUs is rarely even
correlated enough to prefer open-page mode. That is, open-page mode
is for single-CPU system models, which are becoming rare, except
for benchmarks.
<
> L3 cache misses are rare, and they
> cost a lot of latency anyway. Adding some dozens of cycles to wait
> for the instruction to be non-speculative does not make it that much
> more expensive. E.g., if you have an L3 miss every 10000
> instructions, and that then costs 50 cycles more, CPI increases by
> 0.005.
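Anton's estimate is simple arithmetic: the extra CPI is the added latency per miss divided by the number of instructions between misses. A trivial C helper (the name is assumed) for plugging in other miss rates:

```c
#include <assert.h>

/* Extra cycles per instruction if every L3 miss waits until its instruction
   is non-speculative:
   delta_CPI = extra cycles per miss / instructions per miss. */
double cpi_increase(double extra_cycles_per_miss, double instructions_per_miss)
{
    assert(instructions_per_miss > 0.0);
    return extra_cycles_per_miss / instructions_per_miss;
}
```

With his numbers, cpi_increase(50.0, 10000.0) gives 0.005.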
<
If the L3 contains data which appears to have already been written to DRAM,
then Meltdown is eliminated. But I digress, again.....
>
> There are also side channels from resource contention to units shared
> with other cores (e.g., cache snooping). One solution for that would
> be to have a round-robin scheme for speculative accesses by cores:
> Non-speculative accesses always get served. For speculative accesses
> the other cycles are managed in a round-robin scheme, where each core
> gets a cycle whether it uses it or not. So other cores cannot observe
> whether you perform a speculative load.
<
As Spectré has demonstrated, given a high-precision timer (HPT), it is not
whether the LD was speculated or not that matters; it is whether you did
anything that changes the time seen by the HPT.
<
> This may cost area (for
> increasing the number of ports) and/or performance (because you cannot
> perform as many speculative L3 accesses as you used to), but the
> performance effect very much depends on the application. One solution
> may be to have larger per-core L2 caches, with a good part of the
> speculative bandwidth reserved for the owning core.
<
Add NAK to the coherent portion of the transport command set.
NAK means: yes, I have it; no, you can't have it quite yet.
<
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>


Re: Spectre

<b97b1064-a69f-460a-b1db-a3a0140d0261n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24433&group=comp.arch#24433

Newsgroups: comp.arch
Date: Wed, 23 Mar 2022 12:36:20 -0700 (PDT)
In-Reply-To: <2kK_J.37396$zT.15971@fx27.iad>
Subject: Re: Spectre
From: MitchAl...@aol.com (MitchAlsup)

by: MitchAlsup - Wed, 23 Mar 2022 19:36 UTC

On Wednesday, March 23, 2022 at 2:14:11 PM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Wednesday, March 23, 2022 at 9:32:33 AM UTC-5, EricP wrote:
> >> Michael S wrote:
> >>> On Wednesday, March 23, 2022 at 11:58:49 AM UTC+2, Anton Ertl wrote:
> >>>> Michael S <already...@yahoo.com> writes:
> >>>>> The only thing that is required by HW is providing tools for such handling.
> >>>>> In case of x86, ARM and POWER the tools, in form of non-speculative data dependencies, are already here in all
> >>>>> implementations. The only missing step is an ISA-level guarantee that for some instructions, preferably
> >>>>> for conditional moves, data dependencies will remain non-speculative in all future implementations.
> >> There has been a lot of research on *value* speculation though
> >> I don't think any has made it into the commercial market.
> >> For example, value speculation allows speculative forwarding
> >> of a store value to a subsequent load and then at load retire
> >> checks that the value was correct.
> > <
> > This ruins the concept of memory consistency when you get good
> > enough at speculating data that the memory reference is never
> > actually performed.
> As I understand it, the consistency model is still obeyed but the
> rules to achieve it are rearranged. In x86-TSO
>
> ST [r0], r1
> LD r2, [r3] // LD#1
> LD r4, [r5] // LD#2
>
> If address [r3] is unresolved and [r5] is resolved, LD#2 cannot bypass
> LD#1 because [r3] might be the same address and an Invalidate message
> for that address might arrive between LD#1 and LD#2.
> If that did happen then the older LD#1 gets the younger value and
> younger LD#2 gets the older value, violating x86-TSO consistency.
> So LD#2 must wait for [r3] to resolve even though this scenario is
> extremely unlikely.
>
> A value speculating LSQ could allow LD#2 to bypass older unresolved
> addresses and execute. It gets its value from cache or the older ST and
> forwards it to dependents and dest register, and marks the uOp as
> "Done-Speculate". However the uOp remains in the LSQ with the value.
> Later when all older addresses have resolved it repeats its load
> but this time in proper x86-TSO order, compares the values,
> and if the same marks the uOp as "Done-Final" or if different
> triggers a replay trap.
<
When LD#1 gets its address generated, you forward-check all younger
LDs to see whether they may have read that address too, and replay from the
oldest such LD first, while NAKing other accesses to those addresses.
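Mitch's address-based check can be sketched as a scan of the load queue: when the older load finally generates its address, find the oldest younger load that already executed to the same cache line, and replay from there. The 16-entry queue, 64-byte lines, program-order array layout and names are assumptions for this sketch; the NAKing of other accesses is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LQ_SIZE 16
#define LINE_SHIFT 6   /* 64-byte cache lines (assumed) */

/* Hypothetical load queue in program order: index 0 is the oldest load. */
struct lq_entry { uint64_t addr; bool executed; bool valid; };

/* The load at index `older` has just generated address `addr`. Return the
   index of the oldest younger load that already executed to the same cache
   line (replay starts there), or -1 if no replay is needed. */
int check_younger(const struct lq_entry lq[LQ_SIZE], int older, uint64_t addr)
{
    for (int k = older + 1; k < LQ_SIZE; k++)
        if (lq[k].valid && lq[k].executed &&
            (lq[k].addr >> LINE_SHIFT) == (addr >> LINE_SHIFT))
            return k;
    return -1;
}
```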
>
> This sounds to me like the logic to implement this LSQ would be a lot
> simpler than trying to track which uOps made potentially hazardous
> bypasses and detect when actual consistency violations occur.
<
It is, and in fact other parts of the OoO machine already depend
on this happening. So the cost is essentially zero.
> > <
> >>>> I don't think that any existing OoO implementation (certainly not all)
> >>>> waits with a conditional move instruction (or an instruction dependent
> >>>> on a conditional move) until it is no longer speculative. You may be
> >>>> meaning that they treat the condition as a data dependency, rather
> >>>> than something that itself is subject to speculation.
> >>>>
> >>>> Assuming you mean the latter, how would such a guarantee give any
> >>>> "official" security guarantee? The next time it's inconvenient for
> >>>> the CPU manufacturers or the NSA to fix a bug, they will just take
> >>>> your stand and say that the only official security is that implemented
> >>>> through page-table entries.
> >>>>
> >>>
> >>> I don't follow.
> >>> If the book (architecture manual) says that all 3 data dependencies of cmove/select are non speculative
> >>> then SW can rely on it.
> >> Value speculation would effectively break this CMOV guarantee
> >> because the CMOV wouldn't know that its values were speculative.
> > <
> > No-one is going to produce a machine where CMOV cannot be performed
> > under speculation based entirely on data (and control) flow.
> Yes. Tracking when values are speculative or not gets complicated
> real quick when one considers resolved and unresolved nested branches
> interacting with resolved and unresolved LSQ ordered addresses.

Re: Spectre (was: Why separate 32-bit arithmetic ...)

<t1ft2b$nhi$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=24434&group=comp.arch#24434

Newsgroups: comp.arch

by: Thomas Koenig - Wed, 23 Mar 2022 19:39 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:
> On Wednesday, March 23, 2022 at 1:51:32 PM UTC-5, Anton Ertl wrote:

>> Consider the following machine code:
>>
>> 0: b8 00 00 00 00 mov eax,0x0
>> 5: 48 39 d6 cmp rsi,rdx
>> 8: 72 01 jb b <foo+0xb>
>> a: c3 ret
>> b: 48 8b 04 f7 mov rax,QWORD PTR [rdi+rsi*8]
>> f: eb f9 jmp a <foo+0xa>
>>
>> [ This was generated by gcc-10 -O from:
>>
>> long foo(long *a, unsigned long i, unsigned long n)
>> {
>> if (i<n)
>> return a[i];
>> return 0;
>> }
>>
>> Can anyone explain the jump back and the placement of the instruction
>> at 0 to me? ]
><
> It is likely at some point the compiler thought their might be an epilogue
> to this subroutine and the jump is to make all returns common code.

At a higher optimization level (using gcc 12), this is translated to

xorl %eax, %eax
cmpq %rdx, %rsi
jb .L5
ret
.p2align 4,,10
.p2align 3
.L5:
movq (%rdi,%rsi,8), %rax
ret

which looks reasonable.

Re: Why separate 32-bit arithmetic on a 64-bit architecture?

<20d5ecab-f2ff-4641-8914-acc6db457d80n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24435&group=comp.arch#24435

Newsgroups: comp.arch
Date: Wed, 23 Mar 2022 13:56:54 -0700 (PDT)
In-Reply-To: <memo.20220322141111.1928h@jgd.cix.co.uk>
Subject: Re: Why separate 32-bit arithmetic on a 64-bit architecture?
From: jsav...@ecn.ab.ca (Quadibloc)

by: Quadibloc - Wed, 23 Mar 2022 20:56 UTC

On Tuesday, March 22, 2022 at 8:11:14 AM UTC-6, John Dallman wrote:

> You're neglecting "Running Android apps". which sells a great many more
> ARM-based devices, although they're lower-cost.

In that case, the problem is almost solved. While _some_ Android apps include
C code compiled to ARM machine code for ultimate performance, most are
purely written in Java. This enabled smartphones to be built that ran Android,
but had x86 processors instead of ARM processors.

Given that, the way to abolish Spectre for the Android world is clear. Just make
a low-power Itanium chip suitable for smartphones.

John Savard

Re: Spectre (was: Why separate 32-bit arithmetic ...)

<2022Mar24.084643@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=24438&group=comp.arch#24438

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Spectre (was: Why separate 32-bit arithmetic ...)
Date: Thu, 24 Mar 2022 07:46:43 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien

by: Anton Ertl - Thu, 24 Mar 2022 07:46 UTC

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>There are also side channels from resource contention to units shared
>with other cores (e.g., cache snooping). One solution for that would
>be to have a round-robin scheme for speculative accesses by cores:
>Non-speculative accesses always get served. For speculative accesses
>the other cycles are managed in a round-robin scheme, where each core
>gets a cycle whether it uses it or not. So other cores cannot observe
>whether you perform a speculative load. This may cost area (for
>increasing the number of ports) and/or performance (because you cannot
>perform as many speculative L3 accesses as you used to), but the
>performance effect very much depends on the application. One solution
>may be to have larger per-core L2 caches, with a good part of the
>speculative bandwidth reserved for the owning core.
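
A minimal C sketch of the round-robin idea quoted above. Several things here are assumptions that are not in the post: a single shared port serving one access per cycle (so "always get served" becomes "always take priority"), NCORES = 4, lowest-index priority among non-speculative requests, and the hypothetical names `request` and `arbitrate`. What the sketch tries to show is that whether a core is granted a cycle never depends on another core's speculative requests.

```c
#include <assert.h>
#include <stdbool.h>

#define NCORES 4  /* assumed core count */

/* Hypothetical per-cycle request state of one core for a shared cache port. */
typedef struct {
    bool nonspec;  /* core has a non-speculative (architectural) access pending */
    bool spec;     /* core has a speculative access pending */
} request;

/* Returns the core granted the port in this cycle, or -1 if it idles.
   Non-speculative requests are always served first (lowest core index
   wins here, an arbitrary choice). Otherwise the cycle belongs to core
   cycle % NCORES, and only that core may issue a speculative access; if
   it has none, the slot goes unused instead of being handed to another
   core, so no core can observe whether another core is speculating. */
int arbitrate(const request req[NCORES], unsigned long cycle)
{
    for (int c = 0; c < NCORES; c++)
        if (req[c].nonspec)
            return c;
    int owner = (int)(cycle % NCORES);
    return req[owner].spec ? owner : -1;
}
```

The wasted slot (returning -1 while other cores may have speculative work queued) is the price of the scheme: it is exactly what keeps the grant pattern independent of other cores' speculation.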

Another (complementary) approach is to allocate speculative bandwidth
based on observed architectural bandwidth. I.e., if a core has
recently performed a significant number of architectural accesses to
this cache level, the arbiter increases its share of allowed
speculative accesses (possibly up to 100%; the other cores then all
have to wait for the accesses to become non-speculative, but they
won't starve). The idea behind this is that processes have phases (or
probably phase transitions) with lots of cache misses, and phases with
hardly any cache misses. By allocating more bandwidth to a core with
lots of cache misses based on architectural accesses, no
microarchitectural state is revealed, and yet more bandwidth can be
utilized.
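
One way to picture the architectural-bandwidth-based allocation, as a C sketch. The post only says that a core's speculative share should grow with its recent architectural accesses; the proportional policy, the fixed window of 64 speculative slots, the equal-share fallback, and the names `allocate_slots` and `WINDOW` are all assumptions for illustration. The input counts only committed accesses, so the allocation is derived from architectural state alone.

```c
#include <assert.h>

#define NCORES 4   /* assumed core count */
#define WINDOW 64  /* hypothetical: speculative slots per allocation window */

/* committed[c]: architectural (committed) accesses core c made to this
   cache level during the previous window. slots[c]: speculative slots
   core c receives in the next window. Only committed accesses feed the
   decision, so the allocation reveals no speculative state. */
void allocate_slots(const unsigned committed[NCORES], unsigned slots[NCORES])
{
    unsigned long long total = 0;
    for (int c = 0; c < NCORES; c++)
        total += committed[c];

    if (total == 0) {  /* no recent history: equal shares */
        for (int c = 0; c < NCORES; c++)
            slots[c] = WINDOW / NCORES;
        return;
    }

    unsigned given = 0;
    for (int c = 0; c < NCORES; c++) {
        /* proportional share, rounded down */
        slots[c] = (unsigned)((unsigned long long)WINDOW * committed[c] / total);
        given += slots[c];
    }

    /* Give the rounding leftover to the core with the most committed
       accesses (the first such core on ties). */
    int busiest = 0;
    for (int c = 1; c < NCORES; c++)
        if (committed[c] > committed[busiest])
            busiest = c;
    slots[busiest] += WINDOW - given;
}
```

With counts {30, 10, 0, 0}, core 0 gets 48 of the 64 slots and core 1 gets 16. If only one core made any committed accesses, it gets all 64, the "up to 100%" case; the other cores' non-speculative accesses would still be served, so they wait but do not starve.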

Whether this really pays off is an open question, however: if a core
performs so many cache accesses that bandwidth matters, then after a
short while its accesses have to wait so long that they become
non-speculative before the resource becomes available; at that point
there is no speculative bandwidth left, and redistributing it makes no
difference.

So the question is how many workloads have cache accesses that are
rare enough that significant bandwidth is left for speculative
accesses, but still frequent enough that they see a significant
benefit from an increased bandwidth allocation. I guess it's not that
many, but that's a topic for empirical research.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Spectre

<2022Mar24.090722@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=24439&group=comp.arch#24439

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Spectre
Date: Thu, 24 Mar 2022 08:07:22 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 103
Message-ID: <2022Mar24.090722@mips.complang.tuwien.ac.at>
References: <82d668ec-a140-454c-8606-9e665c750997n@googlegroups.com> <memo.20220322141111.1928h@jgd.cix.co.uk> <2c5bd9b0-1439-49c1-a4c8-707f27a9eb11n@googlegroups.com> <2022Mar23.093749@mips.complang.tuwien.ac.at> <%OF_J.126355$ZmJ7.99768@fx06.iad>
Injection-Info: reader02.eternal-september.org; posting-host="d5f67e5c06b5b0e3beeb5b798e22ec52";
logging-data="7611"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/MgBXHmgvtkcxDu2LPGGI9"
Cancel-Lock: sha1:4PvfrW0MpOFdNynQnEIQNuw+/L0=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Thu, 24 Mar 2022 08:07 UTC

EricP <ThatWouldBeTelling@thevillage.com> writes:
>Anton Ertl wrote:
>> Michael S <already5chosen@yahoo.com> writes:
>>> On Tuesday, March 22, 2022 at 4:11:14 PM UTC+2, John Dallman wrote:
>>>> However, neither Intel/AMD nor any of the ARM manufacturers seem to have
>>>> found a practical way to achieve high performance without speculation and
>>>> the consequential Spectre-like vulnerabilities.
>>
>> Spectre vulnerabilities are in no way a consequence of speculation,
>> just like corrupted architectural state is not a consequence of
>> speculation.
>
>I assume you mean Spectre is not an *inevitable* consequence of speculation,
>just a consequence of its current implementation.

Yes.

>> Is security based on not performing architectural accesses to the
>> secret any less official than security based on page table entries
>> (another architectural feature)? Not in my book, and certainly not in
>> the architecture manuals of the CPU manufacturers.
>
>That is not official security, as defined by the designers of that system.

Where do I find this definition? I don't remember ever reading about
"official security" in the architecture manuals.

The designers of the Burroughs B5500 and its successors relied on
security by controlling architectural accesses, without segment-based
or page-table-based security. Today's descendants of that line run on
Intel hardware AFAIK, and I expect that the whole OS with all processes
on it shares the same page table (or at least shared it until Spectre
was revealed).

There are also other systems based on this idea, e.g., Singularity.

OSs that rely on page-table security won not because B5500-style
security is any less official, but because page-table security puts
fewer constraints on the software: you can write software in C and
assembly language without subverting that security, not just in
managed languages (which were not dominant when page-table-based OSs
won). And even for managed languages it provides an additional line
of defense against compiler or (more frequently) library bugs, or
(since Spectre) hardware bugs.

>> But if the only security you consider official (and therefore the only
>> one the hardware is required to guarantee IYO) is that provided by
>> page table entries, that's incredibly expensive: The JavaScript JIT
>> engine would have to change the page table entries to allow access
>> before the access, and change the page table entries again after the
>> access to disallow the access. What's the slowdown of that? A factor
>> 100?
>
>Which is probably why, as far as I know, no one does it that way.
>Switching address spaces to run programs and passing messages between
>cooperating processes is a common technique before multi-threading.

That is a better (not just more efficient, but also more secure
against attacks from other threads) workaround for protecting
super-secret stuff like secret keys against many current hardware and
software vulnerabilities, but that does not mean that these
vulnerabilities are not bugs.

As for more general security, I don't think that creating a process
for every array and then using IPC to communicate the index and the
values to be loaded or stored would be efficient enough to be usable.
But that would be the consequence if hardware designers really
disavowed bounds-checking-based security, and software designers acted
on such a disavowal.

>A solution I was thinking of is a new variant of a Compare-And-Trap that
>copies the checked value to a dest register and acts as a data flow gate.
>Move-Trap-Conditional MTCcc compares to a limit value and traps or moves.
>That makes the checked value data flow dependent on the guard condition
>and at run time costs only forwarding a value if the guard condition is
>already resolved, or delays until it is resolved.

In the usual OoO engines, control flow is mostly separated from data
flow. I don't think that an instruction that traps on one outcome of
the condition and produces a result on the other fits in well. The
likely implementation is an instruction that moves in any case
(possibly waiting for the condition to arrive) and traps on a specific
condition. Such an implementation would be vulnerable to Spectre V1
(but you would consider that ok given your position about "official
security", no?). Another possible implementation is to synchronize the
core, but then using this instruction would be abysmally slow.
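
A toy C model of the two implementations discussed above. The architectural semantics (`mtc_lt`, an unsigned "move if value < limit, otherwise trap" variant of the proposed MTCcc) and all cycle numbers are hypothetical, and real pipelines are far more complex. The model only captures when the destination becomes usable by dependent instructions: the gated version waits for the condition, while the likely always-move version does not, which leaves a window in which a dependent load runs on an unchecked value.

```c
#include <assert.h>
#include <stdbool.h>

/* Architectural result of the hypothetical MTCLT instruction. */
typedef struct {
    bool trapped;
    unsigned long dest;
} mtc_result;

/* Copies value to dest if value < limit, otherwise traps. */
mtc_result mtc_lt(unsigned long value, unsigned long limit)
{
    mtc_result r = { false, 0 };
    if (value < limit)
        r.dest = value;
    else
        r.trapped = true;
    return r;
}

/* Timing model. value_ready: cycle the source value is available;
   cond_ready: cycle the comparison against the limit is resolved.
   Each function returns the cycle at which dependents may use dest. */

/* Gated (as proposed): dest is forwarded only once the guard is known. */
unsigned dest_ready_gated(unsigned value_ready, unsigned cond_ready)
{
    return value_ready > cond_ready ? value_ready : cond_ready;
}

/* Likely OoO implementation: move in any case, trap later if needed. */
unsigned dest_ready_ungated(unsigned value_ready, unsigned cond_ready)
{
    (void)cond_ready;  /* the move does not wait for the condition */
    return value_ready;
}

/* Cycles during which a dependent load could execute with a value
   whose bounds check has not been resolved yet. */
unsigned unchecked_window(unsigned dest_ready, unsigned cond_ready)
{
    return cond_ready > dest_ready ? cond_ready - dest_ready : 0;
}
```

With the value ready at cycle 2 and the condition resolved at cycle 10, the always-move version gives dependents 8 cycles with an unchecked value, while the gated version gives none.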

The existing workaround (which needs no additional instructions) is
to squash an out-of-bounds index to an in-bounds value with an
instruction like AND or CMOV that we expect to use only data flow, not
any prediction; this works around Spectre V1. The bounds check that
decides whether to throw is done mostly separately.
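
A C sketch of the AND-based squash. It uses the same trick as the generic `array_index_mask_nospec()` in the Linux kernel's `include/linux/nospec.h`, rewritten with unsigned arithmetic; `index_mask_nospec` and `load_checked` are illustrative names. It assumes `0 < size <= SIZE_MAX / 2`. Caveat: a compiler may in principle see through such a mask, and the kernel version additionally hides the index from the optimizer, which this sketch does not do.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE_T_BITS (sizeof(size_t) * CHAR_BIT)

/* All-ones if idx < size, zero otherwise, computed with plain arithmetic
   (no compare-and-branch for the predictor to get wrong).
   Assumes 0 < size <= SIZE_MAX / 2. */
static inline size_t index_mask_nospec(size_t idx, size_t size)
{
    /* Under that assumption, the top bit of t is set iff idx >= size:
       either idx itself has the top bit set, or size - 1 - idx wraps. */
    size_t t = idx | (size - 1 - idx);
    return (t >> (SIZE_T_BITS - 1)) - 1;
}

/* Bounds-checked byte load. The branch is the architectural check (the
   "throw" path); the AND squashes a mispredicted out-of-bounds index to 0,
   so the load stays inside the array even under misspeculation. */
int load_checked(const uint8_t *arr, size_t size, size_t idx, uint8_t *out)
{
    if (idx >= size)
        return -1;
    idx &= index_mask_nospec(idx, size);
    *out = arr[idx];
    return 0;
}
```

A CMOV-style variant would select 0 instead of masking when idx >= size; note that C gives no guarantee that a ternary expression is compiled to a CMOV rather than a branch, which is one reason the arithmetic mask is popular.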

Adding workaround instructions would require designing new hardware.
While they are at it, they could and should fix Spectre. So when the
new hardware comes to the market, why would one use these workaround
instructions?

- anton
