novaBBS - comp.arch - Re: Another Spectre variant: Retbleed

Re: Another Spectre variant: Retbleed

<baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26675&group=comp.arch#26675

X-Received: by 2002:ad4:5761:0:b0:473:7861:69d1 with SMTP id r1-20020ad45761000000b00473786169d1mr16848796qvx.73.1658018072897;
Sat, 16 Jul 2022 17:34:32 -0700 (PDT)
X-Received: by 2002:a05:622a:1749:b0:31e:c44a:9656 with SMTP id
l9-20020a05622a174900b0031ec44a9656mr16910232qtk.579.1658018072752; Sat, 16
Jul 2022 17:34:32 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 16 Jul 2022 17:34:32 -0700 (PDT)
In-Reply-To: <tauunv$vce$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <tauunv$vce$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>
Subject: Re: Another Spectre variant: Retbleed
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 17 Jul 2022 00:34:32 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 4090

by: MitchAlsup - Sun, 17 Jul 2022 00:34 UTC

On Saturday, July 16, 2022 at 1:06:58 PM UTC-5, Thomas Koenig wrote:
> https://comsec.ethz.ch/research/microarch/retbleed/ (A foaf actually
> wrote the exploit code for this). It seems there is no end to
> this. Intel and AMD are affected, and mitigation is expensive,
> 14% and 39% overhead measured. Bah.
>
> @Mitch: How expensive would not allowing speculative execution to
> update microarchitectural state be?
<
I have read the associated paper and spent a couple hours thinking about
the problem space::
<
a) My 66130 is not sensitive to this kind of attack.
.....My 666x0 might be sensitive to this kind of attack.
.....I suspect Mill is not sensitive, either.
b) the attack strategy would vanish if GuestOS used different ASID
.....than application, and the branch prediction recorded ASID or just
.....used the ASID in its HASH. My 66000 has this abstraction built in.
c) using Safe Stack (and the Mill equivalent) eliminates the problem.
.....but also makes retpolines require excess privilege, but these are
.....unnecessary; so let us assume they were not "put in".
d) in any event, My 66000 has switches and method calls that are not
.....based on indirect control transfer (LD Rk,[address], JMP Rk) and use
.....of these has range verification applied in HW; so the main avenue
.....of attack is missing.
e) since indirect control transfer (LD Rk,[address], JMP Rk) will be so
.....rare, it requires no prediction, eliminating this attack strategy.
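
A minimal sketch of point (b), assuming a direct-mapped target buffer: the ASID is both folded into the index hash and recorded in the entry, so a target trained under one ASID is never returned for another. The names (btb_entry, btb_index, btb_update, btb_lookup), the table size, and the hash constant are all hypothetical; this is not the My 66000 design or any shipping predictor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BTB_ENTRIES 1024u /* hypothetical size; must be a power of two */

typedef struct {
    bool     valid;
    uint16_t asid;   /* address-space ID recorded with the entry */
    uint64_t pc;     /* address of the indirect branch */
    uint64_t target; /* last observed target */
} btb_entry;

static btb_entry btb[BTB_ENTRIES];

/* Fold the ASID into the index so that different address spaces tend to
   land in different entries (the multiplier is an arbitrary odd constant). */
static uint32_t btb_index(uint64_t pc, uint16_t asid)
{
    uint64_t h = (pc >> 2) ^ ((uint64_t)asid * 0x9E3779B97F4A7C15ull);
    h ^= h >> 29;
    return (uint32_t)(h & (BTB_ENTRIES - 1u));
}

/* Train the predictor with the resolved target of a branch. */
void btb_update(uint64_t pc, uint16_t asid, uint64_t target)
{
    btb_entry *e = &btb[btb_index(pc, asid)];
    e->valid  = true;
    e->asid   = asid;
    e->pc     = pc;
    e->target = target;
}

/* Return true and write *target only if the entry was trained by the
   same (pc, asid) pair; a different ASID never sees the prediction. */
bool btb_lookup(uint64_t pc, uint16_t asid, uint64_t *target)
{
    assert(target != NULL);
    const btb_entry *e = &btb[btb_index(pc, asid)];
    if (e->valid && e->pc == pc && e->asid == asid) {
        *target = e->target;
        return true;
    }
    return false;
}
```

Recording the ASID in the entry is what gives the isolation here; hashing it into the index alone would only make cross-ASID collisions less likely.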
>
> I am thinking about stores; I assume that most stores on OoO
> architectures are speculative, so a sizable buffer (a second L1
> cache, if you will) would have to be set aside. Do you have
> an estimate how big and expensive that would have to be?
<
In Mc88120 we allowed the conditional cache (i.e., memory reservation
station) to be ½ of the size of dynamic execution window (96 instructions).
We chose ½ instead of ⅓ because administration was simpler, and because
memory references tend to come in clumps, and breaking a clump harmed
the packet cache density.
<
In general, STs represent 10% of dynamic instructions, LDs 20%. So, you need
a ST buffer big enough not to stall issue when you have a clump or run through
a section of code with little arithmetic. It has to be long enough to absorb the
delay until you get to the point STs can migrate data to the cache hierarchy. In
heavy FP codes this will be at least (12-cycles + completion-latency) × issue-width.
12-cycles embraces the LD latency feeding the FP calculations and the FP calculations
themselves. Completion latency is the number of cycles after ST becomes
consistent (and is allowed to modify visible architectural state) until all such state
has been so modified and the store is then ready to have its microarchitectural
state recycled to a new memory reference (or ST).
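
The two sizing rules above can be written down as arithmetic. This is only a sketch of the figures quoted in the post (half of the 96-instruction window, and the (12 + completion-latency) × issue-width bound); the function names and the example inputs are hypothetical.

```c
#include <assert.h>

/* Conditional-cache size as described for Mc88120: half of the
   dynamic execution window. */
unsigned conditional_cache_entries(unsigned window_instructions)
{
    return window_instructions / 2u;
}

/* Lower bound quoted in the post for heavy FP code:
   (12 cycles + completion latency) x issue width. */
unsigned st_buffer_lower_bound(unsigned completion_latency_cycles,
                               unsigned issue_width)
{
    const unsigned ld_plus_fp_cycles = 12u; /* figure quoted in the post */
    assert(issue_width > 0u);
    return (ld_plus_fp_cycles + completion_latency_cycles) * issue_width;
}
```

For the 96-instruction window the first rule gives 48 entries. With an assumed completion latency of 4 cycles on an assumed 4-wide machine, the second gives (12 + 4) × 4 = 64.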

Re: Another Spectre variant: Retbleed

<jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=26676&group=comp.arch#26676

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Another Spectre variant: Retbleed
Date: Sat, 16 Jul 2022 22:20:52 -0400
Organization: A noiseless patient Spider
Lines: 14
Message-ID: <jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org>
References: <tauunv$vce$1@newsreader4.netcologne.de>
<baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: reader01.eternal-september.org; posting-host="617f92a15445cd5aa63900641011f127";
logging-data="3787726"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18xoO28OrtgTTjtzJf89PYC"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:KJ0pt8CXnH1qnJ5gY90WuX620O4=
sha1:Oj6FcFVvuJ3f45Mzg+TIg8ei+mk=

by: Stefan Monnier - Sun, 17 Jul 2022 02:20 UTC

> e) since indirect control transfer (LD Rk,[address], JMP Rk) will be so
> ....rare, it requires no prediction, eliminating this attack strategy.

While some languages and/or compilation strategies may be able to
replace most indirect calls with other forms of control transfer, it doesn't
sound reasonable to assume that they will necessarily be rare.

Method calls and calls to closures by default rely crucially on
indirect function calls. Various optimization techniques have been
developed to replace those with direct calls or switch tables, but they
only cover *some* cases.

Stefan

Never mind the cost of losses to Yet Another Exploit.

What if you grid, say, RK3399's onto a board, with some flavor
of switched ethernet serving all the nodes? Storage is off-board
SAN. You share a CPU IFF the apps are all in the same security
domain.

I would cheerfully switch to this, even at a 25% bump in hosting
cost.

Andy Valencia
Home page: https://www.vsta.org/andy/
To contact me: https://www.vsta.org/contact/andy.html

Re: Another Spectre variant: Retbleed

<ddd8ed6f-4d04-42ff-b91e-6fa2bd1ce32bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26687&group=comp.arch#26687

X-Received: by 2002:a0c:fe69:0:b0:473:9d82:b160 with SMTP id b9-20020a0cfe69000000b004739d82b160mr15905287qvv.111.1658070150172;
Sun, 17 Jul 2022 08:02:30 -0700 (PDT)
X-Received: by 2002:a05:622a:1345:b0:31e:b991:ac1e with SMTP id
w5-20020a05622a134500b0031eb991ac1emr19173999qtk.279.1658070150032; Sun, 17
Jul 2022 08:02:30 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!nntp.club.cc.cmu.edu!45.76.7.193.MISMATCH!3.us.feeder.erje.net!feeder.erje.net!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 17 Jul 2022 08:02:29 -0700 (PDT)
In-Reply-To: <jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <tauunv$vce$1@newsreader4.netcologne.de> <baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>
<jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ddd8ed6f-4d04-42ff-b91e-6fa2bd1ce32bn@googlegroups.com>
Subject: Re: Another Spectre variant: Retbleed
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 17 Jul 2022 15:02:30 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 16

by: MitchAlsup - Sun, 17 Jul 2022 15:02 UTC

On Saturday, July 16, 2022 at 9:20:56 PM UTC-5, Stefan Monnier wrote:
> > e) since indirect control transfer (LD Rk,[address], JMP Rk) will be so
> > ....rare, it requires no prediction, eliminating this attack strategy.
> While some languages and/or compilation strategies may be able to
> replace most indirect calls with other forms of control transfer, it doesn't
> sound reasonable to assume that they will necessarily be rare.
>
> Method calls and calls to closures by default rely crucially on
> indirect function calls. Various optimization techniques have been
> developed to replace those with direct calls or switch tables, but they
> only cover *some* cases.
<
Tabularized calls are supported and do not need indirect calling/jumping.
Tabularized calls support Method Calling.
>
>
> Stefan

Re: Another Spectre variant: Retbleed

<jwvpmi3ojy7.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=26688&group=comp.arch#26688

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Another Spectre variant: Retbleed
Date: Sun, 17 Jul 2022 11:12:32 -0400
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <jwvpmi3ojy7.fsf-monnier+comp.arch@gnu.org>
References: <tauunv$vce$1@newsreader4.netcologne.de>
<baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>
<jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org>
<ddd8ed6f-4d04-42ff-b91e-6fa2bd1ce32bn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: reader01.eternal-september.org; posting-host="617f92a15445cd5aa63900641011f127";
logging-data="3964745"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+i23cTahhZ7mC0p9reiqcM"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:kW+PuViy+fzLa0yTRV7HqmHnAQg=
sha1:Rxdjnmkh8ArqWkUeo5RarDB8yiY=

by: Stefan Monnier - Sun, 17 Jul 2022 15:12 UTC

> Tabularized calls are supported and do not need indirect calling/jumping.
> Tabularized calls support Method Calling.

Many method calls look like:

(x->vtable[CST]) (x, ..args...)

while calls to closures will look like

(x->code) (x, ...args...)

both of those seem to fall squarely in the category of indirect calls.

IIUC you're saying that you think you'll be able to avoid using
prediction for those? That seems difficult without paying a fairly
heavy cost (`x` is not necessarily known yet at the time you fetch those
instructions).
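
To make the two call shapes concrete, here is a small C sketch of a vtable method call and a closure call. The types and names (struct shape, struct closure, SLOT_AREA, make_adder, and so on) are invented for illustration; the point is only that in both cases the callee address is loaded from memory reached through `x` before the call can be made.

```c
#include <assert.h>
#include <stddef.h>

/* --- method call: (x->vtable[CST]) (x, ...) --- */
struct shape;
typedef int (*method_fn)(const struct shape *self);

struct shape {
    const method_fn *vtable; /* per-class table of method pointers */
    int w, h;
};

enum { SLOT_AREA = 0, SLOT_PERIMETER = 1 }; /* the compile-time CST */

static int rect_area(const struct shape *s)      { return s->w * s->h; }
static int rect_perimeter(const struct shape *s) { return 2 * (s->w + s->h); }

static const method_fn rect_vtable[] = { rect_area, rect_perimeter };

int shape_area(const struct shape *x)
{
    assert(x != NULL && x->vtable != NULL);
    return (x->vtable[SLOT_AREA])(x);      /* load vtable, load slot, call */
}

int shape_perimeter(const struct shape *x)
{
    assert(x != NULL && x->vtable != NULL);
    return (x->vtable[SLOT_PERIMETER])(x);
}

/* --- closure call: (x->code) (x, ...) --- */
struct closure {
    int (*code)(const struct closure *self, int arg);
    int captured; /* the closed-over variable */
};

static int add_captured(const struct closure *self, int arg)
{
    return self->captured + arg;
}

struct closure make_adder(int n)
{
    struct closure c = { add_captured, n };
    return c;
}

int call_closure(const struct closure *x, int arg)
{
    assert(x != NULL && x->code != NULL);
    return (x->code)(x, arg);              /* load code pointer, call */
}
```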

Stefan

Andy Valencia wrote:
> Thomas Koenig <tkoenig@netcologne.de> writes:
>> https://comsec.ethz.ch/research/microarch/retbleed/ (A foaf actually
>> wrote the exploit code for this). It seems there is no end to
>> this. Intel and AMD are affected, and mitigation is expensive,
>> 14% and 39% overhead measured. Bah.
>
> Never mind the cost of losses to Yet Another Exploit.
>
> What if you grid, say, RK3399's onto a board, with some flavor
> of switched ethernet serving all the nodes? Storage is off-board
> SAN. You share a CPU IFF the apps are all in the same security
> domain.
>
> I would cheerfully switch to this, even at a 25% bump in hosting
> cost.
>
> Andy Valencia
> Home page: https://www.vsta.org/andy/
> To contact me: https://www.vsta.org/contact/andy.html

Beaten with a blunt object is also a viable engineering solution.

Re: Another Spectre variant: Retbleed

<ed14f829-3f46-4a90-9cda-304f3746a9afn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26690&group=comp.arch#26690

X-Received: by 2002:a05:6214:23cb:b0:472:f1a5:5cea with SMTP id hr11-20020a05621423cb00b00472f1a55ceamr18197982qvb.13.1658080556349;
Sun, 17 Jul 2022 10:55:56 -0700 (PDT)
X-Received: by 2002:a05:620a:40c6:b0:6b1:48e4:8784 with SMTP id
g6-20020a05620a40c600b006b148e48784mr16100052qko.331.1658080556177; Sun, 17
Jul 2022 10:55:56 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 17 Jul 2022 10:55:56 -0700 (PDT)
In-Reply-To: <jwvpmi3ojy7.fsf-monnier+comp.arch@gnu.org>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <tauunv$vce$1@newsreader4.netcologne.de> <baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>
<jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org> <ddd8ed6f-4d04-42ff-b91e-6fa2bd1ce32bn@googlegroups.com>
<jwvpmi3ojy7.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ed14f829-3f46-4a90-9cda-304f3746a9afn@googlegroups.com>
Subject: Re: Another Spectre variant: Retbleed
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 17 Jul 2022 17:55:56 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 93

by: MitchAlsup - Sun, 17 Jul 2022 17:55 UTC

On Sunday, July 17, 2022 at 10:12:36 AM UTC-5, Stefan Monnier wrote:
> > Tabularized calls are supported and do not need indirect calling/jumping.
> > Tabularized calls support Method Calling.
> Many method calls look like:
>
> (x->vtable[CST]) (x, ..args...)
<
Why do they not look like::
<
globaltable[x->method](x, ..args... )
<
After all, the compiler+linker can account for all methods prior to the start
of execution:: can they not ??
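
A sketch of the globaltable[x->method] form, assuming every method is known when the table is built. The names (struct object, get_value, get_double, dispatch) are hypothetical. In portable C the final call is still an indirect call; the sketch shows only the data layout and the range check, done here in software where the post describes a hardware check.

```c
#include <assert.h>
#include <stddef.h>

struct object {
    unsigned method; /* small integer selecting the method, not an address */
    int      value;
};

typedef int (*method_fn)(const struct object *self);

static int get_value(const struct object *self)  { return self->value; }
static int get_double(const struct object *self) { return 2 * self->value; }

/* One read-only table, built before execution starts. */
static const method_fn globaltable[] = { get_value, get_double };
#define GLOBALTABLE_LEN (sizeof globaltable / sizeof globaltable[0])

/* Returns -1 for an out-of-range selector instead of jumping through
   whatever happens to lie beyond the table. */
int dispatch(const struct object *x)
{
    assert(x != NULL);
    if (x->method >= GLOBALTABLE_LEN)
        return -1;
    return globaltable[x->method](x);
}
```

Because the object stores only an index, a corrupted object can at worst select another entry of the table rather than an arbitrary address.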
>
> while calls to closures will look like
>
> (x->code) (x, ...args...)
<
Which languages expose closures to applications ??
>
> both of those seem to fall squarely in the category of indirect calls.
>
> IIUC you're saying that you think you'll be able to avoid using
> prediction for those? That seems difficult without paying a fairly
> heavy cost (`x` is not necessarily known yet at the time you fetch those
> instructions).
<
Prediction is based on the performance goals. A 1-wide in-order machine
is unlikely to use prediction--indeed, the ISA of My 66000 was designed
so that prediction was unnecessary on such lowly implementations. VVM,
then, enabled these lowly implementations to achieve performance results
within spitting distance of current GBOoO machines (that is 2.0 IPC long
term average).
<
However, GBOoO My 66000's will indeed have (and need) prediction.
I envision 3 kinds of predictors:: conditional branch predictor, call-return
stack predictor, and an indirect predictor. Exactly how these would be
configured is dependent on how the FETCH-DECODE part of the execution
pipeline is designed.
<
Indirect prediction in the past has not had very good accuracy. In fact,
my My 66000 simulator uses tabularized subroutine calls:: PARSE uses
instruction<31:26> to index a format table. The format subroutine, then,
extracts more bits from the instruction (dependent on which format is
active) and then calls an indirect function to deal with operands (in
DECODE) and then calls the calculation indirect function (in EXECUTE).
<
// array of pointers to the 2-operand calculation functions
static
REGISTER (*op2table[])( REGISTER S1, REGISTER S2, SIGN s ) = {
    illegalOpCode, // 0
    SR, // 1
    SL, // 2
    BMM, // 3
    ADD, // 4
    MUL, // 5
    DIV, // 6
    CMP, // 7
    MAX, // 8
    MIN, // 9
    OR, // 10
    XOR, // 11
    AND, // 12
    illegalOpCode, // 13
    illegalOpCode, // 14
    illegalOpCode, // 15
    EADD, // 16
    FADD, // 17
    FMUL, // 18
    FDIV, // 19
    FCMP, // 20
    FMAX, // 21
    FMIN, // 22
    POW, // 23
    ATAN2, // 24
    illegalOpCode, // 25
    illegalOpCode, // 26
    illegalOpCode, // 27
    illegalOpCode, // 28
    illegalOpCode, // 29
    illegalOpCode, // 30
    illegalOpCode, // 31
};
<
Since each instruction is independent of its predecessor and successor,
prediction is not going to be good with any kind of predictor !!
<
In Mc88120 we had an indirect predictor and we gave it 1 SRAM of storage;
1024 entries; replace on mispredict. We were getting 50% indirect prediction
accuracy in SPEC89. On the other hand, the design of the front end allotted
us only 4 gates of delay from arrival of a packet to the choice of
next_fetch_address. So we had no room for any exotic prediction algorithms.
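
A rough software model of a predictor of that shape: 1024 entries, with an entry replaced only on a mispredict. Indexing by low PC bits, the function names, and the zero-means-empty convention are assumptions made for this sketch; the post does not say how the Mc88120 table was indexed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP_ENTRIES 1024u /* entry count quoted in the post */

static uint64_t ip_target[IP_ENTRIES]; /* 0 means "no prediction yet" */

/* Assumed indexing: low bits of the word-aligned branch address. */
static uint32_t ip_index(uint64_t pc)
{
    return (uint32_t)((pc >> 2) & (IP_ENTRIES - 1u));
}

/* Predict the stored target, then replace it only on a mispredict.
   Returns 1 on a correct prediction, 0 otherwise. */
int ip_predict_and_train(uint64_t pc, uint64_t actual)
{
    uint64_t *slot = &ip_target[ip_index(pc)];
    int hit = (*slot == actual);
    if (!hit)
        *slot = actual;
    return hit;
}

/* Count correct predictions for one call site over a trace of targets. */
size_t ip_count_hits(uint64_t pc, const uint64_t *targets, size_t n)
{
    size_t hits = 0;
    assert(targets != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        hits += (size_t)ip_predict_and_train(pc, targets[i]);
    return hits;
}
```

A call site that always goes to the same target misses once and then always hits; a site that alternates between two targets never hits under this policy, which is the kind of behavior that drags the average accuracy down.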
>
>
> Stefan

Re: Another Spectre variant: Retbleed

<jwv8roroc31.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=26691&group=comp.arch#26691

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Another Spectre variant: Retbleed
Date: Sun, 17 Jul 2022 14:15:07 -0400
Organization: A noiseless patient Spider
Lines: 46
Message-ID: <jwv8roroc31.fsf-monnier+comp.arch@gnu.org>
References: <tauunv$vce$1@newsreader4.netcologne.de>
<baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>
<jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org>
<ddd8ed6f-4d04-42ff-b91e-6fa2bd1ce32bn@googlegroups.com>
<jwvpmi3ojy7.fsf-monnier+comp.arch@gnu.org>
<ed14f829-3f46-4a90-9cda-304f3746a9afn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: reader01.eternal-september.org; posting-host="617f92a15445cd5aa63900641011f127";
logging-data="3991603"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX195kD2ePaqPRAhOwwuRKrvZ"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:K7Vc5+0ou1YwntoH3Ix2KgWQADQ=
sha1:p9PwyKe9GQqNcoutx7uuUUgA/3E=

by: Stefan Monnier - Sun, 17 Jul 2022 18:15 UTC

MitchAlsup [2022-07-17 10:55:56] wrote:
> On Sunday, July 17, 2022 at 10:12:36 AM UTC-5, Stefan Monnier wrote:
>> > Tabularized calls are supported and do not need indirect calling/jumping.
>> > Tabularized calls support Method Calling.
>> Many method calls look like:
>> (x->vtable[CST]) (x, ..args...)
> Why do they not look like::
> <
> globaltable[x->method](x, ..args... )
> <
> After all, the compiler+linker can account for all methods prior to the start
> of execution:: can they not ??

For most applications nowadays you never get to see "the whole world",
e.g. because of plugins, or dynamically linked libraries, or jit, or ...

>> while calls to closures will look like
>> (x->code) (x, ...args...)
> Which languages expose closures to applications ??

AFAIK nowadays basically all languages expose some kind of closure
construct (except for C and maybe a handful of other holdovers).

> Prediction is based on the performance goals. A 1-wide in-order machine
> is unlikely to use prediction--indeed, the ISA of My 66000 was designed
> so that prediction was unnecessary on such lowly implementations.

For the typical `(x->vtable[CST]) (x, ..args...)` method calls I suspect
that the performance will suck if you don't have a BTB-style (or
better) predictor [ Tho admittedly it depends on how much your machine
is strictly "in-order", but there's about 10 cycles of latency with
a naive execution and the BTB can remove that dependency completely,
especially since in practice most of those method calls always jump to the
same destination so even a simple BTB tends to work very well. ]

> Since each instruction is independent of the predecessor and successor
> prediction is not going to be good with any kind of predictor !!

Actually the kinds of predictors used on modern amd64 processors can make
use of the branch history to provide surprisingly good predictions in
these kinds of situations (basically running an interpreter). They end
up being able to take advantage of patterns in the interpreted code
(where you do get usual control flow behaviors like loops etc...).
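
A toy comparison of the two ideas, under strong simplifying assumptions: the interpreter's dispatch target is modeled by the opcode number itself (opcodes below 16), the "BTB" remembers only the last target, and the history predictor indexes a 256-entry table by the previous two opcodes. All names are hypothetical and this is far simpler than the predictors in real processors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HIST_ENTRIES 256u /* 2 opcodes x 4 bits of history */

/* Last-target predictor for a single dispatch site: predict that the
   next opcode equals the previous one. Returns the number of hits. */
size_t last_target_hits(const uint8_t *ops, size_t n)
{
    size_t hits = 0;
    int last = -1; /* no prediction yet */
    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (last == (int)ops[i])
            hits++;
        last = (int)ops[i];
    }
    return hits;
}

/* History predictor: the table is indexed by the last two opcodes and
   stores the opcode that followed them last time. */
size_t history_hits(const uint8_t *ops, size_t n)
{
    int table[HIST_ENTRIES];
    uint32_t hist = 0;
    size_t hits = 0;
    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < HIST_ENTRIES; i++)
        table[i] = -1; /* empty */
    for (size_t i = 0; i < n; i++) {
        assert(ops[i] < 16u);
        if (table[hist] == (int)ops[i])
            hits++;
        else
            table[hist] = (int)ops[i]; /* replace on mispredict */
        hist = ((hist << 4) | ops[i]) & (HIST_ENTRIES - 1u);
    }
    return hits;
}
```

For an interpreted loop whose opcodes repeat as 1, 2, 3, 2, the last-target scheme never hits, because no opcode follows itself. The history scheme hits every time once each two-opcode history has been seen, because each pair of preceding opcodes determines the next one.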

Stefan

Re: Another Spectre variant: Retbleed

<jwv1qujobas.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=26692&group=comp.arch#26692

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Another Spectre variant: Retbleed
Date: Sun, 17 Jul 2022 14:15:53 -0400
Organization: A noiseless patient Spider
Lines: 47
Message-ID: <jwv1qujobas.fsf-monnier+comp.arch@gnu.org>
References: <tauunv$vce$1@newsreader4.netcologne.de>
<baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>
<jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org>
<ddd8ed6f-4d04-42ff-b91e-6fa2bd1ce32bn@googlegroups.com>
<jwvpmi3ojy7.fsf-monnier+comp.arch@gnu.org>
<ed14f829-3f46-4a90-9cda-304f3746a9afn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: reader01.eternal-september.org; posting-host="617f92a15445cd5aa63900641011f127";
logging-data="4001203"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18nw3TBCWpGHUsmo1B1+XHS"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:mbIiJ8Z2EI+xnFv+k34evzdjxmA=
sha1:4G6jtnTuG19TOUHzpum5G8TWZ+4=

by: Stefan Monnier - Sun, 17 Jul 2022 18:15 UTC

For most applications nowadays you never get to see "the whole world",
e.g. because of plugins, or dynamically linked libraries, or jit, or ...

>> while calls to closures will look like
>> (x->code) (x, ...args...)
> Which languages expose closures to applications ??

AFAIK nowadays basically all languages expose some kind of closure
construct (except for C and maybe a handful of other holdovers).

> Since each instruction is independent of the predecessor and successor
> prediction is not going to be good with any kind of predictor !!

Stefan

Re: Another Spectre variant: Retbleed

<b92a934a-80e1-45fb-9e7b-bba11db29474n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26699&group=comp.arch#26699

X-Received: by 2002:a05:620a:4542:b0:6b3:7c51:6177 with SMTP id u2-20020a05620a454200b006b37c516177mr15987084qkp.306.1658092319776;
Sun, 17 Jul 2022 14:11:59 -0700 (PDT)
X-Received: by 2002:a05:622a:198b:b0:31e:ec25:8ead with SMTP id
u11-20020a05622a198b00b0031eec258eadmr2625318qtc.423.1658092319622; Sun, 17
Jul 2022 14:11:59 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 17 Jul 2022 14:11:59 -0700 (PDT)
In-Reply-To: <jwv1qujobas.fsf-monnier+comp.arch@gnu.org>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <tauunv$vce$1@newsreader4.netcologne.de> <baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>
<jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org> <ddd8ed6f-4d04-42ff-b91e-6fa2bd1ce32bn@googlegroups.com>
<jwvpmi3ojy7.fsf-monnier+comp.arch@gnu.org> <ed14f829-3f46-4a90-9cda-304f3746a9afn@googlegroups.com>
<jwv1qujobas.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b92a934a-80e1-45fb-9e7b-bba11db29474n@googlegroups.com>
Subject: Re: Another Spectre variant: Retbleed
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 17 Jul 2022 21:11:59 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 3774

by: MitchAlsup - Sun, 17 Jul 2022 21:11 UTC

On Sunday, July 17, 2022 at 1:15:56 PM UTC-5, Stefan Monnier wrote:
> MitchAlsup [2022-07-17 10:55:56] wrote:
> > On Sunday, July 17, 2022 at 10:12:36 AM UTC-5, Stefan Monnier wrote:
> >> > Tabularized calls are supported and do not need indirect calling/jumping.
> >> > Tabularized calls support Method Calling.
> >> Many method calls look like:
> >> (x->vtable[CST]) (x, ..args...)
> > Why do they not look like::
> > <
> > globaltable[x->method](x, ..args... )
> > <
> > After all, the compiler+linker can account for all methods prior to the start
> > of execution:: can they not ??
> For most applications nowadays you never get to see "the whole world",
> e.g. because of plugins, or dynamically linked libraries, or jit, or ...
> >> while calls to closures will look like
> >> (x->code) (x, ...args...)
> > Which languages expose closures to applications ??
> AFAIK nowadays basically all languages expose some kind of closure
> construct (except for C and maybe a handful of other holdovers).
> > Prediction is based on the performance goals. A 1-wide in-order machine
> > is unlikely to use prediction--indeed, the ISA of My 66000 was designed
> > so that prediction was unnecessary on such lowly implementations.
> For the typical `(x->vtable[CST]) (x, ..args...)` method calls I suspect
> that the performance will suck if you don't have a BTB-style (or
> better) predictor [ Tho admittedly it depends on how much your machine
> is strictly "in-order", but there's about 10 cycles of latency with
> a naive execution and the BTB can remove that dependency completely,
> especially since in practice most of those method calls always jump to the
> same destination so even a simple BTB tends to work very well. ]
<
I agree that GBOoO machines will use prediction on these.
<
> > Since each instruction is independent of the predecessor and successor
> > prediction is not going to be good with any kind of predictor !!
> > Actually the kinds of predictors used on modern amd64 processors can make
> use of the branch history to provide surprisingly good predictions in
> these kinds of situations (basically running an interpreter). They end
> up being able to take advantage of patterns in the interpreted code
> (where you do get usual control flow behaviors like loops etc...).
>
>
> Stefan

Re: Another Spectre variant: Retbleed

<4eb12252-fbaf-44ca-a70a-870a77d1d88bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26700&group=comp.arch#26700

X-Received: by 2002:a05:622a:50:b0:31d:29af:8224 with SMTP id y16-20020a05622a005000b0031d29af8224mr19967736qtw.350.1658092420363;
Sun, 17 Jul 2022 14:13:40 -0700 (PDT)
X-Received: by 2002:ac8:7c4d:0:b0:31a:6e91:5f7a with SMTP id
o13-20020ac87c4d000000b0031a6e915f7amr19049239qtv.441.1658092420225; Sun, 17
Jul 2022 14:13:40 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 17 Jul 2022 14:13:40 -0700 (PDT)
In-Reply-To: <jwv1qujobas.fsf-monnier+comp.arch@gnu.org>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <tauunv$vce$1@newsreader4.netcologne.de> <baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>
<jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org> <ddd8ed6f-4d04-42ff-b91e-6fa2bd1ce32bn@googlegroups.com>
<jwvpmi3ojy7.fsf-monnier+comp.arch@gnu.org> <ed14f829-3f46-4a90-9cda-304f3746a9afn@googlegroups.com>
<jwv1qujobas.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <4eb12252-fbaf-44ca-a70a-870a77d1d88bn@googlegroups.com>
Subject: Re: Another Spectre variant: Retbleed
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 17 Jul 2022 21:13:40 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 51

by: MitchAlsup - Sun, 17 Jul 2022 21:13 UTC

On Sunday, July 17, 2022 at 1:15:56 PM UTC-5, Stefan Monnier wrote:
> MitchAlsup [2022-07-17 10:55:56] wrote:
> > On Sunday, July 17, 2022 at 10:12:36 AM UTC-5, Stefan Monnier wrote:
> >> > Tabularized calls are supported and do not need indirect calling/jumping.
> >> > Tabularized calls support Method Calling.
> >> Many method calls look like:
> >> (x->vtable[CST]) (x, ..args...)
> > Why do they not look like::
> > <
> > globaltable[x->method](x, ..args... )
> > <
> > After all, the compiler+linker can account for all methods prior to the start
> > of execution:: can they not ??
> For most applications nowadays you never get to see "the whole world",
> e.g. because of plugins, or dynamically linked libraries, or jit, or ...
> >> while calls to closures will look like
> >> (x->code) (x, ...args...)
> > Which languages expose closures to applications ??
> AFAIK nowadays basically all languages expose some kind of closure
> construct (except for C and maybe a handful of other holdovers).
> > Prediction is based on the performance goals. A 1-wide in-order machine
> > is unlikely to use prediction--indeed, the ISA of My 66000 was designed
> > so that prediction was unnecessary on such lowly implementations.
> For the typical `(x->vtable[CST]) (x, ..args...)` method calls I suspect
> that the performance will suck if you don't have a BTB-style (or
> better) predictor [ Tho admittedly it depends on how much your machine
> is strictly "in-order", but there's about 10 cycles of latency with
> a naive execution and the BTB can remove that dependency completely,
> especially since in practice most of those method calls always jump to the
> same destination so even a simple BTB tends to work very well. ]
> > Since each instruction is independent of the predecessor and successor
> > prediction is not going to be good with any kind of predictor !!
> > Actually the kinds of predictors used on modern amd64 processors can make
> use of the branch history to provide surprisingly good predictions in
> these kinds of situations (basically running an interpreter). They end
> up being able to take advantage of patterns in the interpreted code
> (where you do get usual control flow behaviors like loops etc...).
<
Does anyone have a pointer to a paper where they look at <typical>
branch prediction accuracies when ½ of the branches have been
converted into predication ?
>
>
> Stefan

Re: Another Spectre variant: Retbleed

<tb22d2$3tb8g$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26703&group=comp.arch#26703

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Another Spectre variant: Retbleed
Date: Sun, 17 Jul 2022 15:27:48 -0700
Organization: A noiseless patient Spider
Lines: 118
Message-ID: <tb22d2$3tb8g$1@dont-email.me>
References: <tauunv$vce$1@newsreader4.netcologne.de>
<baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>
<jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org>
<ddd8ed6f-4d04-42ff-b91e-6fa2bd1ce32bn@googlegroups.com>
<jwvpmi3ojy7.fsf-monnier+comp.arch@gnu.org>
<ed14f829-3f46-4a90-9cda-304f3746a9afn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 17 Jul 2022 22:27:47 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="043cacd9cbc1d6af1bc71f982f6cca35";
logging-data="4107536"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+QQI7npKe2TX/jbxzACAYw"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.11.0
Cancel-Lock: sha1:OsazXOW5pFFA0HF3G25xIJrcUjw=
Content-Language: en-US
In-Reply-To: <ed14f829-3f46-4a90-9cda-304f3746a9afn@googlegroups.com>

by: Ivan Godard - Sun, 17 Jul 2022 22:27 UTC

On 7/17/2022 10:55 AM, MitchAlsup wrote:
> On Sunday, July 17, 2022 at 10:12:36 AM UTC-5, Stefan Monnier wrote:
>>> Tabularized calls are supported and do not need indirect calling/jumping.
>>> Tabularized calls support Method Calling.
>> Many method calls look like:
>>
>> (x->vtable[CST]) (x, ..args...)
> <
> Why do they not look like::
> <
> globaltable[x->method](x, ..args... )
> <
> After all, the compiler+linker can account for all methods prior to the start
> of execution:: can they not ??

No: ld.so at runtime.

>>
>> while calls to closures will look like
>>
>> (x->code) (x, ...args...)
> <
> Which languages expose closures to applications ??

Many, although they might not call them closures. Start with C++.
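
As a rough sketch of what the `(x->code) (x, ...args...)` shape means, here is a closure written out by hand in C: a code pointer plus a captured environment. The names `struct closure`, `make_adder` and `call_closure` are invented for this example; languages with built-in closures generate something broadly similar behind the scenes, though the details vary.

```c
#include <assert.h>

/* A closure as a code pointer plus captured environment. */
struct closure {
    int (*code)(const struct closure *self, int arg);
    int captured;   /* the environment: one captured variable */
};

static int add_captured(const struct closure *self, int arg)
{
    return self->captured + arg;
}

/* Build a closure that adds n to its argument. */
static struct closure make_adder(int n)
{
    struct closure c = { add_captured, n };
    return c;
}

/* The call shape from the thread: (x->code)(x, args). */
static int call_closure(const struct closure *x, int arg)
{
    assert(x->code != 0);
    return x->code(x, arg);
}
```

The target of the call is whatever `code` happens to hold at run time, which is why it lands in the indirect-call category.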

>>
>> both of those seem to fall squarely in the category of indirect calls.
>>
>> IIUC you're saying that you think you'll be able to avoid using
>> prediction for those? That seems difficult without paying a fairly
>> heavy cost (`x` is not necessarily known yet at the time you fetch those
>> instructions).
> <
> Prediction is based on the performance goals. A 1-wide in-order machine
> is unlikely to use prediction--indeed, the ISA of My 66000 was designed
> so that prediction was unnecessary on such lowly implementations. VVM,
> then, enabled these lowly implementations to achieve performance results
> within spitting distance of current GBOoO machines (that is 2.0 IPC long
> term average).
> <
> However, GBOoO My 66000's will indeed have (and need) prediction.
> I envision 3 kinds of predictors:: conditional branch predictor, call-return
> stack predictor, and an indirect predictor. Exactly how these would be
> configured is dependent on how the FETCH-DECODE part of the execution
> pipeline is designed.
> <
> Indirect prediction in the past has not had very good accuracy. In fact,
> my My 66000 simulator uses tabularized subroutine calls:: PARSE uses
> instruction<31:26> to index a format table. The format subroutine, then,
> extracts more bits from the instruction (dependent on which format is
> active) and then calls an indirect function to deal with operands (in
> DECODE) and then call the calculation indirect function (In EXECUTE).
> <
> static
> REGISTER (*op2table[])( REGISTER S1, REGISTER S2, SIGN s ) = {
> illegalOpCode, // 0
> SR, // 1
> SL, // 2
> BMM, // 3
> ADD, // 4
> MUL, // 5
> DIV, // 6
> CMP, // 7
> MAX, // 8
> MIN, // 9
> OR, // 10
> XOR, // 11
> AND, // 12
> illegalOpCode, // 13
> illegalOpCode, // 14
> illegalOpCode, // 15
> EADD, // 16
> FADD, // 17
> FMUL, // 18
> FDIV, // 19
> FCMP, // 20
> FMAX, // 21
> FMIN, // 22
> POW, // 23
> ATAN2, // 24
> illegalOpCode, // 25
> illegalOpCode, // 26
> illegalOpCode, // 27
> illegalOpCode, // 28
> illegalOpCode, // 29
> illegalOpCode, // 30
> illegalOpCode, // 31
> };
> <
> Since each instruction is independent of the predecessor and successor
> prediction is not going to be good with any kind of predictor !!
> <
> In Mc88120 we had an indirect predictor and we gave it 1 SRAM of storage;
> 1024 entries; replace on mispredict. We were getting 50% indirect prediction
> accuracy in SPEC89. On the other hand, the design of the front end allotted
> us only 4 gates of delay from arrival of a packet to the choice of
> next_fetch_address. So we had no room for any exotic prediction algorithms.
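
A minimal sketch of a 1024-entry, replace-on-mispredict indirect predictor, to make the behaviour described above concrete. The index hash, the field names and the `run_site` driver are assumptions made for this example; nothing here is claimed to match the actual Mc88120 design.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES 1024u

struct indirect_predictor {
    uint64_t target[ENTRIES];  /* last mispredict-corrected target per slot */
    unsigned long hits, lookups;
};

/* Direct-mapped index from the call-site address (an assumed hash). */
static unsigned slot_of(uint64_t pc) { return (unsigned)((pc >> 2) % ENTRIES); }

/* Predict, then train: the entry is overwritten only on a mispredict. */
static int predict_and_train(struct indirect_predictor *p, uint64_t pc,
                             uint64_t actual)
{
    unsigned s = slot_of(pc);
    int hit = (p->target[s] == actual);
    p->lookups++;
    if (hit)
        p->hits++;
    else
        p->target[s] = actual;
    return hit;
}

/* Run n calls from one site, cycling through ntargets distinct targets.
   Returns the number of correct predictions. */
static unsigned long run_site(struct indirect_predictor *p, uint64_t pc,
                              unsigned ntargets, unsigned n)
{
    unsigned long hits = 0;
    assert(ntargets > 0);
    for (unsigned i = 0; i < n; i++)
        hits += (unsigned long)predict_and_train(p, pc,
                                                 0x1000u + 16u * (i % ntargets));
    return hits;
}
```

A call site with one target misses once and then always hits; a site that alternates between two targets never hits with this scheme. An interpreter's dispatch call, where the next target depends on the next instruction, sits much closer to the second case.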

That's why Mill uses run-ahead prediction; that, and because it permits
arbitrary fetch-ahead without needing the branch in memory.

My66 is tuned for code with small working sets, so most code will be hot
in the I$1. A limited lookahead is then enough to keep the predictor up
with the fetcher, so you only need branch prediction; indirect
prediction is a matter of tidying up.

Mill is tuned for code with working sets somewhat or much larger than
I$1, and frequently larger than I$2; that's why we use exit prediction. By
chaining through exits we can prefetch all the way to RAM. The drawback
is that the added state limits the number of entries for a given budget.
That prevents us from doing much history matching; there isn't much room
for multiple histories for each ebb. There's more, but it's NYF.

>>
>>
>> Stefan

Re: Another Spectre variant: Retbleed

<tb22tu$3tfr0$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26704&group=comp.arch#26704

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Another Spectre variant: Retbleed
Date: Sun, 17 Jul 2022 15:36:47 -0700
Organization: A noiseless patient Spider
Lines: 56
Message-ID: <tb22tu$3tfr0$1@dont-email.me>
References: <tauunv$vce$1@newsreader4.netcologne.de>
<baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>
<jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org>
<ddd8ed6f-4d04-42ff-b91e-6fa2bd1ce32bn@googlegroups.com>
<jwvpmi3ojy7.fsf-monnier+comp.arch@gnu.org>
<ed14f829-3f46-4a90-9cda-304f3746a9afn@googlegroups.com>
<jwv1qujobas.fsf-monnier+comp.arch@gnu.org>
<4eb12252-fbaf-44ca-a70a-870a77d1d88bn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 17 Jul 2022 22:36:46 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="043cacd9cbc1d6af1bc71f982f6cca35";
logging-data="4112224"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+pKGaFHCEfwM2scSKneOdB"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.11.0
Cancel-Lock: sha1:NHLfv2hD9wYiCVS45CIgvpAyPUE=
In-Reply-To: <4eb12252-fbaf-44ca-a70a-870a77d1d88bn@googlegroups.com>
Content-Language: en-US

by: Ivan Godard - Sun, 17 Jul 2022 22:36 UTC

On 7/17/2022 2:13 PM, MitchAlsup wrote:
> On Sunday, July 17, 2022 at 1:15:56 PM UTC-5, Stefan Monnier wrote:
>> MitchAlsup [2022-07-17 10:55:56] wrote:
>>> On Sunday, July 17, 2022 at 10:12:36 AM UTC-5, Stefan Monnier wrote:
>>>>> Tabularized calls are supported and do not need indirect calling/jumping.
>>>>> Tabularized calls support Method Calling.
>>>> Many method calls look like:
>>>> (x->vtable[CST]) (x, ..args...)
>>> Why do they not look like::
>>> <
>>> globaltable[x->method](x, ..args... )
>>> <
>>> After all, the compiler+linker can account for all methods prior to the start
>>> of execution:: can they not ??
>> For most applications nowadays you never get to see "the whole world",
>> e.g. because of plugins, or dynamically linked libraries, or jit, or ...
>>>> while calls to closures will look like
>>>> (x->code) (x, ...args...)
>>> Which languages expose closures to applications ??
>> AFAIK nowadays basically all languages expose some kind of closure
>> construct (except for C and maybe a handful of other holdovers).
>>> Prediction is based on the performance goals. A 1-wide in-order machine
>>> is unlikely to use prediction--indeed, the ISA of My 66000 was designed
>>> so that prediction was unnecessary on such lowly implementations.
>> For the typical `(x->vtable[CST]) (x, ..args...)` method calls I suspect
>> that the performance will suck if you don't have a BTB-style (or
>> better) predictor [ Tho admittedly it depends on how much your machine
>> is strictly "in-order", but there's about 10 cycles of latency with
>> a naive execution and the BTB can remove that dependency completely,
>> especially since in practice most of those method calls always jump to the
>> same destination so even a simple BTB tends to work very well. ]
>>> Since each instruction is independent of the predecessor and successor
>>> prediction is not going to be good with any kind of predictor !!
>> Actually the kinds of predictors used on modern amd64 processors can make
>> use of the branch history to provide surprisingly good predictions in
>> these kinds of situations (basically running an interpreter). They end
>> up being able to take advantage of patterns in the interpreted code
>> (where you do get usual control flow behaviors like loops etc...).
> <
> Does anyone have a pointer to a paper where they look at <typical>
> branch prediction accuracies when ½ of the branches have been
> converted into predication ?

No paper, but some experience: predication doesn't seem to impact
accuracy much if the predictor is bigger than the code's working set.
That is, far branches (too far to use predication) seem to be about as
predictable as close ones. However, predication reduces the number of
branch instructions to be executed, i.e. it reduces the working set of
the code. That can drop the working set into the size of the predictor,
and the predictor performance dramatically improves as the code stops
thrashing in the predictor.

That's only for the middle range of working sets of course; small sets
fit in the predictor with or without predication, and large sets don't
fit either way, and thrash.
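
The working-set effect can be shown with a toy model. This is a sketch under loud assumptions: a tagless, direct-mapped, last-outcome table of 64 entries, static branches with a fixed made-up direction each, executed round-robin. It is not a model of any real predictor, and certainly not of Mill's.

```c
#include <assert.h>

#define PRED_ENTRIES 64u

/* Fixed, arbitrary direction for static branch i (an assumption). */
static int branch_taken(unsigned i) { return (int)((i / 3u) & 1u); }

/* Execute nbranches static branches round-robin for `passes` passes
   against a tagless, direct-mapped, last-outcome predictor.
   Returns mispredicts counted after the first (warm-up) pass. */
static unsigned steady_state_mispredicts(unsigned nbranches, unsigned passes)
{
    int last[PRED_ENTRIES] = {0};
    unsigned miss = 0;
    assert(passes >= 1);
    for (unsigned p = 0; p < passes; p++) {
        for (unsigned i = 0; i < nbranches; i++) {
            unsigned slot = i % PRED_ENTRIES;
            int actual = branch_taken(i);
            if (last[slot] != actual && p > 0)
                miss++;              /* wrong guess after warm-up */
            last[slot] = actual;     /* train on the real outcome */
        }
    }
    return miss;
}
```

With 128 branches, pairs of branches share a slot and keep overwriting each other's state; if predication removes half of them, the remaining 64 each get their own slot and the steady-state mispredicts drop to zero in this model.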

Re: Another Spectre variant: Retbleed

<cE3BK.562174$X_i.274477@fx18.iad>

https://www.novabbs.com/devel/article-flat.php?id=26709&group=comp.arch#26709

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx18.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Another Spectre variant: Retbleed
References: <tauunv$vce$1@newsreader4.netcologne.de> <baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com> <jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org> <ddd8ed6f-4d04-42ff-b91e-6fa2bd1ce32bn@googlegroups.com> <jwvpmi3ojy7.fsf-monnier+comp.arch@gnu.org> <ed14f829-3f46-4a90-9cda-304f3746a9afn@googlegroups.com> <jwv1qujobas.fsf-monnier+comp.arch@gnu.org> <4eb12252-fbaf-44ca-a70a-870a77d1d88bn@googlegroups.com>
In-Reply-To: <4eb12252-fbaf-44ca-a70a-870a77d1d88bn@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 42
Message-ID: <cE3BK.562174$X_i.274477@fx18.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 18 Jul 2022 02:33:44 UTC
Date: Sun, 17 Jul 2022 22:33:20 -0400
X-Received-Bytes: 2662

by: EricP - Mon, 18 Jul 2022 02:33 UTC

MitchAlsup wrote:
> <
> Does anyone have a pointer to a paper where they look at <typical>
> branch prediction accuracies when ½ of the branches have been
> converted into predication ?

I'm not sure about 1/2 but there are a bunch of papers from 1994 to
2007 on interactions between branch prediction and predication.
I found these by taking the title I have for the first one on
Guarded Execution, plopping it into Google Scholar,
and following the citations.
They have titles and/or abstracts that look relevant but
I haven't read them and have no idea if they actually are relevant.

Guarded execution and branch prediction in dynamic ILP processors, 1994
https://research.cs.wisc.edu/techreports/1993/TR1193.pdf

Characterizing the impact of predicated execution on
branch prediction, 1994
https://web.eecs.umich.edu/~mahlke/papers/1994/mahlke_micro94.pdf

The Effects of Predicated Execution on Branch Prediction, 1994
http://www.princeton.edu/~rblee/ELE572Papers/the-effects-of-predicated.pdf

Evaluating the Effects of Predicated Execution on Branch Prediction, 2006
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.75.6629&rep=rep1&type=pdf

The impact of if-conversion and branch prediction on program execution
on the intel itanium processor, 2001
http://courses.cs.washington.edu/courses/cse590g/02wi/choi_y.pdf

Predicate Prediction for Efficient Out-of-order Execution, 2003
https://cseweb.ucsd.edu/~calder/papers/ICS-03-PP.pdf

Improving Branch Prediction and Predicated Execution
in Out-of-Order Processors, 2007
http://people.site.ac.upc.edu/~equinone/docs/2006-08/hpca_2007.pdf

Re: Another Spectre variant: Retbleed

<tb2o2v$50h5$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26710&group=comp.arch#26710

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Another Spectre variant: Retbleed
Date: Sun, 17 Jul 2022 21:37:51 -0700
Organization: A noiseless patient Spider
Lines: 48
Message-ID: <tb2o2v$50h5$1@dont-email.me>
References: <tauunv$vce$1@newsreader4.netcologne.de>
<baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>
<jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org>
<ddd8ed6f-4d04-42ff-b91e-6fa2bd1ce32bn@googlegroups.com>
<jwvpmi3ojy7.fsf-monnier+comp.arch@gnu.org>
<ed14f829-3f46-4a90-9cda-304f3746a9afn@googlegroups.com>
<jwv1qujobas.fsf-monnier+comp.arch@gnu.org>
<4eb12252-fbaf-44ca-a70a-870a77d1d88bn@googlegroups.com>
<cE3BK.562174$X_i.274477@fx18.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 18 Jul 2022 04:37:51 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="043cacd9cbc1d6af1bc71f982f6cca35";
logging-data="164389"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19XoO76x6aWifjbuuOi6sSY"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.11.0
Cancel-Lock: sha1:K7w3KSa3p761Z1yUFkQ9uf95mNk=
In-Reply-To: <cE3BK.562174$X_i.274477@fx18.iad>
Content-Language: en-US

by: Ivan Godard - Mon, 18 Jul 2022 04:37 UTC

On 7/17/2022 7:33 PM, EricP wrote:
> MitchAlsup wrote:
>> <
>> Does anyone have a pointer to a paper where they look at <typical>
>> branch prediction accuracies when ½ of the branches have been
>> converted into predication ?
>
> I'm not sure about 1/2 but there are a bunch of papers from 1994 to
> 2007 on interactions between branch prediction and predication.
> I found these by taking the title I have for the first one on
> Guarded Execution, plopping it into Google Scholar,
> and following the citations.
> They have titles and/or abstracts that look relevant but
> I haven't read them and have no idea if they actually are relevant.
>
> Guarded execution and branch prediction in dynamic ILP processors, 1994
> https://research.cs.wisc.edu/techreports/1993/TR1193.pdf

This presents the same guarding instruction approach that Mitch uses. So
much for that patent :-(

> Characterizing the impact of predicated execution on
> branch prediction, 1994
> https://web.eecs.umich.edu/~mahlke/papers/1994/mahlke_micro94.pdf
>
> The Effects of Predicated Execution on Branch Prediction, 1994
> http://www.princeton.edu/~rblee/ELE572Papers/the-effects-of-predicated.pdf
>
> Evaluating the Effects of Predicated Execution on Branch Prediction, 2006
> https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.75.6629&rep=rep1&type=pdf
>
>
> The impact of if-conversion and branch prediction on program execution
> on the intel itanium processor, 2001
> http://courses.cs.washington.edu/courses/cse590g/02wi/choi_y.pdf
>
> Predicate Prediction for Efficient Out-of-order Execution, 2003
> https://cseweb.ucsd.edu/~calder/papers/ICS-03-PP.pdf
>
> Improving Branch Prediction and Predicated Execution
> in Out-of-Order Processors, 2007
> http://people.site.ac.upc.edu/~equinone/docs/2006-08/hpca_2007.pdf
>

Thomas Koenig <tkoenig@netcologne.de> writes:
>https://comsec.ethz.ch/research/microarch/retbleed/ (A foaf actually
>wrote the exploit code for this). It seems there is no end to
>this. Intel and AMD are affected, and mitigation is expensive,
>14% and 39% overhead measured. Bah.
>
>@Mitch: How expensive would not allowing speculative execution to
>update microarchitectural state be?
>
>I am thinking about stores; I assume that most stores on OoO
>architectures are speculative, so a sizable buffer (a second L1
>cache, if you will) would have to be set aside.

Store buffers are set aside already, because a store must only become
permanent on commit.

Spectre v1 and v2 use loads for the side channel because CPUs
with speculative execution and without a Spectre fix make permanent
changes to the caches when a speculative load happens. A fix would
keep the loads in load buffers and only put them in the caches on
commit. According to Mitch Alsup, CPUs already have such load
buffers. My guess is that you may want to make them somewhat bigger;
if you want to support, say, speculative loads from 32 different cache
lines that do not reside in the D-cache already, you need 32 cache
lines (2KB with 64-byte cache lines).
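
A toy model of the idea, as a sketch only: speculative load misses park their line in a small buffer, and the cache is touched only on commit. The structure, the direct-mapped cache, and the all-at-once `commit`/`squash` are simplifying assumptions for illustration; real hardware would retire loads individually.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES 64u
#define BUF_LINES  32u   /* 32 lines x 64 bytes = 2 KB, the figure above */
#define CACHE_SETS 256u  /* toy direct-mapped cache; size is arbitrary */

struct core {
    uint64_t cache_tag[CACHE_SETS];
    int      cache_valid[CACHE_SETS];
    uint64_t buf_line[BUF_LINES];
    unsigned buf_used;
};

static uint64_t line_of(uint64_t addr) { return addr / LINE_BYTES; }

static int cache_has(const struct core *c, uint64_t addr)
{
    uint64_t line = line_of(addr);
    unsigned set = (unsigned)(line % CACHE_SETS);
    return c->cache_valid[set] && c->cache_tag[set] == line;
}

static int buffered(const struct core *c, uint64_t line)
{
    for (unsigned i = 0; i < c->buf_used; i++)
        if (c->buf_line[i] == line)
            return 1;
    return 0;
}

/* Speculative load: on a miss, park the line in the buffer and leave the
   cache alone. Returns 0 if the buffer is full and the load must stall. */
static int speculative_load(struct core *c, uint64_t addr)
{
    if (cache_has(c, addr) || buffered(c, line_of(addr)))
        return 1;
    if (c->buf_used == BUF_LINES)
        return 0;
    c->buf_line[c->buf_used++] = line_of(addr);
    return 1;
}

/* Commit: buffered lines are installed in the cache. */
static void commit(struct core *c)
{
    assert(c->buf_used <= BUF_LINES);
    for (unsigned i = 0; i < c->buf_used; i++) {
        unsigned set = (unsigned)(c->buf_line[i] % CACHE_SETS);
        c->cache_tag[set] = c->buf_line[i];
        c->cache_valid[set] = 1;
    }
    c->buf_used = 0;
}

/* Squash: discard buffered lines so the cache shows no trace of them. */
static void squash(struct core *c) { c->buf_used = 0; }
```

After a squash, a later timing probe of the local cache finds nothing left behind by the mis-speculated load, which is the property the fix is after.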

There are also other microarchitectural states that have been used as
a side channel (e.g., the power state of SIMD units), but all others
need much less speculative state (or, if you delay the change until it
is no longer speculative, have a much smaller performance impact) than
loads.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Anton Ertl wrote:
> Thomas Koenig <tkoenig@netcologne.de> writes:
>> https://comsec.ethz.ch/research/microarch/retbleed/ (A foaf actually
>> wrote the exploit code for this). It seems there is no end to
>> this. Intel and AMD are affected, and mitigation is expensive,
>> 14% and 39% overhead measured. Bah.
>>
>> @Mitch: How expensive would not allowing speculative execution to
>> update microarchitectural state be?
>>
>> I am thinking about stores; I assume that most stores on OoO
>> architectures are speculative, so a sizable buffer (a second L1
>> cache, if you will) would have to be set aside.
>
> Store buffers are set aside already, because a store must only become
> permanent on commit.
>
> Spectre v1 and v2 use loads for the side channel because CPUs
> with speculative execution and without a Spectre fix make permanent
> changes to the caches when a speculative load happens. A fix would
> keep the loads in load buffers and only put them in the caches on
> commit. According to Mitch Alsup, CPUs already have such load
> buffers. My guess is that you may want to make them somewhat bigger;
> if you want to support, say, speculative loads from 32 different cache
> lines that do not reside in the D-cache already, you need 32 cache
> lines (2KB with 64-byte cache lines).

As we have discussed here previously, that might be vulnerable too.
Reading a line shared for a speculated LD cache miss could cause
a remote node to downgrade it from an exclusive or modified state,
and that line state change would be detectable at the remote node.
Holding the line in a local miss buffer wouldn't help because the
remote node's cache has already changed state.

Changing this coherence behaviour would require adding a NAK to the
protocol to read the line shared if no node has it exclusive or modified,
which no one would want to do as it adds states to the protocol.
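
To make the NAK idea concrete, here is a sketch of how a remote node might answer a read-shared snoop. The enum names and the `speculative` flag on the request are assumptions for illustration; real coherence protocols carry more states and messages than this.

```c
#include <assert.h>

enum line_state { INVALID, SHARED, EXCLUSIVE, MODIFIED };
enum reply { DATA, NAK };

/* Remote node's handling of a read-shared snoop.
   A normal read downgrades E/M to SHARED (writeback for M not modelled).
   A speculative read is refused with NAK if it would force that downgrade,
   so the remote line state carries no trace of the speculation.
   DATA here just means the request may proceed. */
static enum reply snoop_read_shared(enum line_state *remote, int speculative)
{
    assert(remote != 0);
    if (*remote == EXCLUSIVE || *remote == MODIFIED) {
        if (speculative)
            return NAK;
        *remote = SHARED;
    }
    return DATA;
}
```

The requester would then have to hold the load until it is no longer speculative and reissue it as a normal read, which is where the extra protocol states come from.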

It can do local store-load forwarding because that all gets tossed
on mispredict.

One might be able to build a dependency matrix that tracked
LD and ST vs prior branch resolutions. When all older branches
have resolved then release the loads to cause real cache misses,
and allow the stores to prefetch. But this too might be vulnerable
to using exceptions like bounds check instead of branches,
so it might have to watch pending potential exceptions too.
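
The branch-tracking part could look roughly like the sketch below: each load remembers, as a bit mask, which older branches were unresolved when it issued, and may only go out to memory once that mask is empty. `struct tracker`, `struct pending_load` and the 32-branch limit are invented for this example, and pending exceptions are deliberately not modelled, so the caveat above still applies.

```c
#include <assert.h>
#include <stdint.h>

/* Bit b set: branch b is in flight and not yet resolved. */
struct tracker { uint32_t unresolved; };

/* Bit b set: this load is younger than unresolved branch b. */
struct pending_load { uint32_t waits_on; };

static void branch_issued(struct tracker *t, unsigned b)
{
    assert(b < 32u);
    t->unresolved |= (uint32_t)1 << b;
}

/* A load records every branch that is unresolved when it issues. */
static struct pending_load load_issued(const struct tracker *t)
{
    struct pending_load l = { t->unresolved };
    return l;
}

/* Branch b resolved as predicted: clear its column for every waiting load. */
static void branch_resolved(struct tracker *t, struct pending_load *loads,
                            unsigned nloads, unsigned b)
{
    assert(b < 32u);
    t->unresolved &= ~((uint32_t)1 << b);
    for (unsigned i = 0; i < nloads; i++)
        loads[i].waits_on &= ~((uint32_t)1 << b);
}

/* The load may cause a real cache miss only when nothing older is pending. */
static int may_miss_to_memory(const struct pending_load *l)
{
    return l->waits_on == 0;
}
```

A mispredicted branch would instead squash the waiting loads; that path is left out to keep the sketch short.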

> There are also other microarchitectural states that have been used as
> a side channel (e.g., the power state of SIMD units), but all others
> need much less speculative state (or, if you delay the change until it
> is no longer speculative, have a much smaller performance impact) than
> loads.
>
> - anton

Re: Another Spectre variant: Retbleed

<0f2b59d6-a5eb-47cd-b631-c703f731e69en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26721&group=comp.arch#26721

X-Received: by 2002:a0c:aa9c:0:b0:473:4e4c:728 with SMTP id f28-20020a0caa9c000000b004734e4c0728mr20843250qvb.114.1658159985108;
Mon, 18 Jul 2022 08:59:45 -0700 (PDT)
X-Received: by 2002:a05:620a:1724:b0:6b5:8679:a5b3 with SMTP id
az36-20020a05620a172400b006b58679a5b3mr18586602qkb.656.1658159984876; Mon, 18
Jul 2022 08:59:44 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 18 Jul 2022 08:59:44 -0700 (PDT)
In-Reply-To: <zhdBK.582112$JVi.365498@fx17.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <tauunv$vce$1@newsreader4.netcologne.de> <2022Jul18.114941@mips.complang.tuwien.ac.at>
<zhdBK.582112$JVi.365498@fx17.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <0f2b59d6-a5eb-47cd-b631-c703f731e69en@googlegroups.com>
Subject: Re: Another Spectre variant: Retbleed
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 18 Jul 2022 15:59:45 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 78

by: MitchAlsup - Mon, 18 Jul 2022 15:59 UTC

On Monday, July 18, 2022 at 8:32:18 AM UTC-5, EricP wrote:
> Anton Ertl wrote:
> > Thomas Koenig <tko...@netcologne.de> writes:
> >> https://comsec.ethz.ch/research/microarch/retbleed/ (A foaf actually
> >> wrote the exploit code for this). It seems there is no end to
> >> this. Intel and AMD are affected, and mitigation is expensive,
> >> 14% and 39% overhead measured. Bah.
> >>
> >> @Mitch: How expensive would not allowing speculative execution to
> >> update microarchitectural state be?
> >>
> >> I am thinking about stores; I assume that most stores on OoO
> >> architectures are speculative, so a sizable buffer (a second L1
> >> cache, if you will) would have to be set aside.
> >
> > Store buffers are set aside already, because a store must only become
> > permanent on commit.
> >
> > > Spectre v1 and v2 use loads for the side channel because CPUs
> > > with speculative execution and without a Spectre fix make permanent
> > > changes to the caches when a speculative load happens. A fix would
> > > keep the loads in load buffers and only put them in the caches on
> > commit. According to Mitch Alsup, CPUs already have such load
> > buffers. My guess is that you may want to make them somewhat bigger;
> > if you want to support, say, speculative loads from 32 different cache
> > lines that do not reside in the D-cache already, you need 32 cache
> > lines (2KB with 64-byte cache lines).
<
> As we have discussed here previously, that might be vulnerable too.
> Reading a line shared for a speculated LD cache miss could cause
> a remote node to downgrade it from an exclusive or modified state,
> and that line state change would be detectable at the remote node.
> Holding the line in a local miss buffer wouldn't help because the
> remote node's cache has already changed state.
<
Yes, setting up 3rd-party Spectre attack strategies.
>
> Changing this coherence behaviour would require adding a NAK to the
> protocol to read the line shared if no node has it exclusive or modified,
> which no one would want to do as it adds states to the protocol.
<
My 66000 already has NAK in the protocol; currently it is only used
for ATOMIC stuff.
<
The alternative is to have the ability to send the line back to its previous
owner if the line cannot be installed in the cache due to non-commit of the
memory reference.
>
> It can do local store-load forwarding because that all gets tossed
> on mispredict.
>
> One might be able to build a dependency matrix that tracked
> LD and ST vs prior branch resolutions. When all older branches
> have resolved then release the loads to cause real cache misses,
<
Tidies up branches, does nothing for exceptions.
<
> and allow the stores to prefetch. But this too might be vulnerable
> to using exceptions like bounds check instead of branches,
> so it might have to watch pending potential exceptions too.
<
I don't see that you have gained anything (due to exceptions remaining
present.)
<
> > There are also other microarchitectural states that have been used as
> > a side channel (e.g., the power state of SIMD units), but all others
> > need much less speculative state (or, if you delay the change until it
> > is no longer speculative, have a much smaller performance impact) than
> > loads.
> >
> > - anton

Re: Another Spectre variant: Retbleed

<tb4770$epd$2@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=26726&group=comp.arch#26726

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-2699-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Another Spectre variant: Retbleed
Date: Mon, 18 Jul 2022 18:02:08 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <tb4770$epd$2@newsreader4.netcologne.de>
References: <tauunv$vce$1@newsreader4.netcologne.de>
<baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>
<jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org>
<ddd8ed6f-4d04-42ff-b91e-6fa2bd1ce32bn@googlegroups.com>
<jwvpmi3ojy7.fsf-monnier+comp.arch@gnu.org>
<ed14f829-3f46-4a90-9cda-304f3746a9afn@googlegroups.com>
<jwv1qujobas.fsf-monnier+comp.arch@gnu.org>
<4eb12252-fbaf-44ca-a70a-870a77d1d88bn@googlegroups.com>
<cE3BK.562174$X_i.274477@fx18.iad> <tb2o2v$50h5$1@dont-email.me>
Injection-Date: Mon, 18 Jul 2022 18:02:08 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-2699-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd6:2699:0:7285:c2ff:fe6c:992d";
logging-data="15149"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Mon, 18 Jul 2022 18:02 UTC

Ivan Godard <ivan@millcomputing.com> schrieb:
> On 7/17/2022 7:33 PM, EricP wrote:

>> Guarded execution and branch prediction in dynamic ILP processors, 1994
>> https://research.cs.wisc.edu/techreports/1993/TR1193.pdf
>
> This presents the same guarding instruction approach that Mitch uses. So
> much for that patent :-(

Is that indeed a patent?

Mitch's name is rare enough that it is rather easy to search for
in the patent databases (unlike mine, which is relatively common),
and the only one I found post-Samsung was US20180357043, the
Transcendental calculation unit apparatus and method.

Re: Another Spectre variant: Retbleed

<49ca2db6-41a1-4b42-b51a-cb539b25058en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26727&group=comp.arch#26727

X-Received: by 2002:a05:620a:269a:b0:6b5:b769:2591 with SMTP id c26-20020a05620a269a00b006b5b7692591mr17692407qkp.293.1658167881581;
Mon, 18 Jul 2022 11:11:21 -0700 (PDT)
X-Received: by 2002:a05:620a:1724:b0:6b5:8679:a5b3 with SMTP id
az36-20020a05620a172400b006b58679a5b3mr19003133qkb.656.1658167881411; Mon, 18
Jul 2022 11:11:21 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 18 Jul 2022 11:11:21 -0700 (PDT)
In-Reply-To: <tb4770$epd$2@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <tauunv$vce$1@newsreader4.netcologne.de> <baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>
<jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org> <ddd8ed6f-4d04-42ff-b91e-6fa2bd1ce32bn@googlegroups.com>
<jwvpmi3ojy7.fsf-monnier+comp.arch@gnu.org> <ed14f829-3f46-4a90-9cda-304f3746a9afn@googlegroups.com>
<jwv1qujobas.fsf-monnier+comp.arch@gnu.org> <4eb12252-fbaf-44ca-a70a-870a77d1d88bn@googlegroups.com>
<cE3BK.562174$X_i.274477@fx18.iad> <tb2o2v$50h5$1@dont-email.me> <tb4770$epd$2@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <49ca2db6-41a1-4b42-b51a-cb539b25058en@googlegroups.com>
Subject: Re: Another Spectre variant: Retbleed
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 18 Jul 2022 18:11:21 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 16

by: MitchAlsup - Mon, 18 Jul 2022 18:11 UTC

On Monday, July 18, 2022 at 1:02:11 PM UTC-5, Thomas Koenig wrote:
> Ivan Godard <iv...@millcomputing.com> schrieb:
> > On 7/17/2022 7:33 PM, EricP wrote:
>
> >> Guarded execution and branch prediction in dynamic ILP processors, 1994
> >> https://research.cs.wisc.edu/techreports/1993/TR1193.pdf
> >
> > This presents the same guarding instruction approach that Mitch uses. So
> > much for that patent :-(
> Is that indeed a patent?
>
> Mitch's name is rare enough that it is rather easy to search for
> in the patent databases (unlike mine, which is relatively common),
> and the only one I found post-Samsung was US20180357043, the
> Transcendental calculation unit apparatus and method.
<
You will find my name on 52 patents at USPTO.

Re: Another Spectre variant: Retbleed

<tb490n$gf9$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=26729&group=comp.arch#26729

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-2699-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Another Spectre variant: Retbleed
Date: Mon, 18 Jul 2022 18:32:55 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <tb490n$gf9$1@newsreader4.netcologne.de>
References: <tauunv$vce$1@newsreader4.netcologne.de>
<baec3aea-8428-4b0f-8c3a-305a9d8fdb8bn@googlegroups.com>
<jwvilnwpjoz.fsf-monnier+comp.arch@gnu.org>
<ddd8ed6f-4d04-42ff-b91e-6fa2bd1ce32bn@googlegroups.com>
<jwvpmi3ojy7.fsf-monnier+comp.arch@gnu.org>
<ed14f829-3f46-4a90-9cda-304f3746a9afn@googlegroups.com>
<jwv1qujobas.fsf-monnier+comp.arch@gnu.org>
<4eb12252-fbaf-44ca-a70a-870a77d1d88bn@googlegroups.com>
<cE3BK.562174$X_i.274477@fx18.iad> <tb2o2v$50h5$1@dont-email.me>
<tb4770$epd$2@newsreader4.netcologne.de>
<49ca2db6-41a1-4b42-b51a-cb539b25058en@googlegroups.com>
Injection-Date: Mon, 18 Jul 2022 18:32:55 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-2699-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd6:2699:0:7285:c2ff:fe6c:992d";
logging-data="16873"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Mon, 18 Jul 2022 18:32 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:
> On Monday, July 18, 2022 at 1:02:11 PM UTC-5, Thomas Koenig wrote:
>> Ivan Godard <iv...@millcomputing.com> schrieb:
>> > On 7/17/2022 7:33 PM, EricP wrote:
>>
>> >> Guarded execution and branch prediction in dynamic ILP processors, 1994
>> >> https://research.cs.wisc.edu/techreports/1993/TR1193.pdf
>> >
>> > This presents the same guarding instruction approach that Mitch uses. So
>> > much for that patent :-(
>> Is that indeed a patent?
>>
>> Mitch's name is rare enough that it is rather easy to search for
>> in the patent databases (unlike mine, which is relatively common),
>> and the only one I found post-Samsung was US20180357043, the
>> Transcendental calculation unit apparatus and method.
><
> You will find my name on 52 patents at USPTO.

That's quite a number (but apparently I didn't find them all,
I only used Google patents, which is somewhat iffy). I didn't
use the patent database I use at work, for obvious reasons.

Re: Another Spectre variant: Retbleed

<nkkBK.488767$zgr9.426762@fx13.iad>

https://www.novabbs.com/devel/article-flat.php?id=26737&group=comp.arch#26737

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx13.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Another Spectre variant: Retbleed
References: <tauunv$vce$1@newsreader4.netcologne.de> <2022Jul18.114941@mips.complang.tuwien.ac.at> <zhdBK.582112$JVi.365498@fx17.iad> <0f2b59d6-a5eb-47cd-b631-c703f731e69en@googlegroups.com>
In-Reply-To: <0f2b59d6-a5eb-47cd-b631-c703f731e69en@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 117
Message-ID: <nkkBK.488767$zgr9.426762@fx13.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 18 Jul 2022 21:33:07 UTC
Date: Mon, 18 Jul 2022 17:32:16 -0400
X-Received-Bytes: 6400

by: EricP - Mon, 18 Jul 2022 21:32 UTC

MitchAlsup wrote:
> On Monday, July 18, 2022 at 8:32:18 AM UTC-5, EricP wrote:
>> Anton Ertl wrote:
>>> Thomas Koenig <tko...@netcologne.de> writes:
>>>> https://comsec.ethz.ch/research/microarch/retbleed/ (A foaf actually
>>>> wrote the exploit code for this). It seems there is no end to
>>>> this. Intel and AMD are affected, and mitigation is expensive,
>>>> 14% and 39% overhead measured. Bah.
>>>>
>>>> @Mitch: How expensive would not allowing speculative execution to
>>>> update microarchitectural state be?
>>>>
>>>> I am thinking about stores; I assume that most stores on OoO
>>>> architectures are speculative, so a sizable buffer (a second L1
>>>> cache, if you will) would have to be set aside.
>>> Store buffers are set aside already, because a store must only become
>>> permanent on commit.
>>>
>>> Spectre v1 and v2 use loads for the side channel because CPUs
>>> with speculative execution and without a Spectre fix make permanent
>>> changes to the caches when a speculative load happens. A fix would
>>> keep the loads in load buffers and only put them in the caches on
>>> commit. According to Mitch Alsup, CPUs already have such load
>>> buffers. My guess is that you may want to make them somewhat bigger;
>>> if you want to support, say, speculative loads from 32 different cache
>>> lines that do not reside in the D-cache already, you need 32 cache
>>> lines (2KB with 64-byte cache lines).
> <
>> As we have discussed here previously, that might be vulnerable too.
>> Reading a line shared for a speculated LD cache miss could cause
>> a remote node to downgrade it from an exclusive or modified state,
>> and that line state change would be detectable at the remote node.
>> Holding the line in a local miss buffer wouldn't help because the
>> remote node's cache has already changed state.
> <
> yes, setting 3rd party Spectre attack strategies.
>> Changing this coherence behaviour would require adding a NAK to the
>> protocol to read the line shared if no node has it exclusive or modified,
>> which no one would want to do as it adds states to the protocol.
> <
> My 66000 already has NaK in the protocol; currently only being used
> for ATOMIC stuff.
> <
> The alternative is to have the ability to send the line back to its previous
> owner if the line cannot be installed in Cache due to non-commit of
> memory reference.

Hmmm... a cache line 'Rebound'?
My guess is that it would prove even less popular than NAK.

AIUI the problem with NAK is not just the extra protocol states
which cause possible interactions to grow exponentially.
It was also that resources would be allocated for a miss,
cache victims chosen and evicted, possibly move store data from
LSQ into the pending cache miss buffer to free the LSQ entry.
Then the cache gets back a NAK and... does what?

So it can't move the store data to the miss buffer early,
and it either lives with the empty slot or puts back the victim,
then wakes up the waiting LSQ entries which immediately
trigger the miss sequence again.

A rebound would be like a NAK but with longer latency,
and with the possibility of other intervening cache changes,
that all have to be backed out of. Yeech.

>> It can do local store-load forwarding because that all gets tossed
>> on mispredict.
>>
>> One might be able to build a dependency matrix that tracked
>> LD and ST vs prior branch resolutions. When all older branches
>> have resolved then release the loads to cause real cache misses,
> <
> Tidies up branches, does nothing for exceptions.
> <
>> and allow the stores to prefetch. But this too might be vulnerable
>> to using exceptions like bounds check instead of branches,
>> so it might have to watch pending potential exceptions too.
> <
> I don't see that you have gained anything (due to exceptions remaining
> present.)

Just spitballing...
I'll call the state of a uOp after execution Finished.
When branch instructions are finished they have resolved and,
if a mispredict, already purged mispredicted younger uOps.

All instructions except LD and ST are unaffected by the finished
status of older uOps so they do not need to take them into account
for their scheduling.

LD and ST can translate addresses in the TLB in any order as soon
as the address is ready. However because a table walk would cause
cache state and TLB changes we must synchronize how these proceed.
LD and ST table walks must wait until all older uOps are finished
and exception free before proceeding one at a time.

Note however this serializes all TLB miss table walks as we don't
know if the current walk will ultimately page fault so we can't
begin the next as it might change cache too early. So there is only
one table walker (but that was probably going to happen anyway).

After translate LD can proceed if it can receive a store-load forward,
or if it cache hits. Otherwise LD must wait until older uOps are finished
and exception free (if LD got a TLB hit it would have skipped this check
earlier so must do it now).
Note that memory disambiguation will group LD's to the same cache line
together and process the groups concurrently, so LD's to different lines
can hit the cache while others stall.

ST must wait to reach the Commit/Retire stage but can issue cache
prefetches as soon as older uOps are finished and exception free.

So it looks like there is still some concurrency and prefetching
opportunities to be had even while respecting Spectre cache rules.
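
A rough sketch of the rules above in Python, only to make the ordering explicit. The names (Uop, older_all_clean, allowed_actions) and the boolean flags are invented for illustration and do not come from any real design; the model ignores memory disambiguation and timing entirely.

```python
from dataclasses import dataclass

@dataclass
class Uop:
    kind: str                 # "LD", "ST", or "OTHER"
    finished: bool = False    # executed (a finished branch has resolved)
    faulted: bool = False     # raised an exception
    tlb_hit: bool = True
    cache_hit: bool = False
    forwarded: bool = False   # LD can take its data from an older store
    at_retire: bool = False   # uop has reached Commit/Retire

def older_all_clean(window, i):
    """True when every uop older than window[i] is finished and exception free."""
    return all(u.finished and not u.faulted for u in window[:i])

def allowed_actions(window, i, walker_busy=False):
    """Return the set of actions uop i may take right now under the rules above."""
    u = window[i]
    if u.kind not in ("LD", "ST"):
        return {"execute"}            # non-memory uops ignore older status
    clean = older_all_clean(window, i)
    acts = {"tlb_lookup"}             # TLB lookups may happen in any order
    if not u.tlb_hit:
        # A table walk changes cache/TLB state: wait, and walk one at a time.
        if clean and not walker_busy:
            acts.add("table_walk")
        return acts
    if u.kind == "LD":
        if u.forwarded or u.cache_hit:
            acts.add("load_data")     # no new cache state is created
        elif clean:
            acts.add("cache_miss")    # safe to cause a real miss now
    else:  # ST
        if clean:
            acts.add("prefetch")
        if u.at_retire:
            acts.add("write_cache")
    return acts
```

With an unfinished older uOp in the window, a LD that hits the cache still gets "load_data", while a LD that misses is left with only "tlb_lookup" until the older uOps finish cleanly.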

Re: Another Spectre variant: Retbleed

<2b25190b-a2a2-4396-a484-c1fdd78bdce2n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26742&group=comp.arch#26742

X-Received: by 2002:ac8:5c81:0:b0:31e:a97f:33c4 with SMTP id r1-20020ac85c81000000b0031ea97f33c4mr22893760qta.674.1658181921118;
Mon, 18 Jul 2022 15:05:21 -0700 (PDT)
X-Received: by 2002:a05:622a:591:b0:31d:4044:c457 with SMTP id
c17-20020a05622a059100b0031d4044c457mr23504493qtb.331.1658181920824; Mon, 18
Jul 2022 15:05:20 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 18 Jul 2022 15:05:20 -0700 (PDT)
In-Reply-To: <nkkBK.488767$zgr9.426762@fx13.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <tauunv$vce$1@newsreader4.netcologne.de> <2022Jul18.114941@mips.complang.tuwien.ac.at>
<zhdBK.582112$JVi.365498@fx17.iad> <0f2b59d6-a5eb-47cd-b631-c703f731e69en@googlegroups.com>
<nkkBK.488767$zgr9.426762@fx13.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2b25190b-a2a2-4396-a484-c1fdd78bdce2n@googlegroups.com>
Subject: Re: Another Spectre variant: Retbleed
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 18 Jul 2022 22:05:21 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 143

by: MitchAlsup - Mon, 18 Jul 2022 22:05 UTC

On Monday, July 18, 2022 at 4:33:10 PM UTC-5, EricP wrote:
> MitchAlsup wrote:
<snip>
> > yes, setting 3rd party Spectre attack strategies.
> >> Changing this coherence behaviour would require adding a NAK to the
> >> protocol to read the line shared if no node has it exclusive or modified,
> >> which no one would want to do as it adds states to the protocol.
> > <
> > My 66000 already has NaK in the protocol; currently only being used
> > for ATOMIC stuff.
> > <
> > The alternative is to have the ability to send the line back to its previous
> > owner if the line cannot be installed in Cache due to non-commit of
> > memory reference.
<
> Hmmm... a cache line 'Rebound'?
> My guess is that it would prove even less popular than NAK.
>
> AIUI the problem with NAK is not just the extra protocol states
> which cause possible interactions to grow exponentially.
> It was also that resources would be allocated for a miss,
> cache victims chosen and evicted, possibly move store data from
> LSQ into the pending cache miss buffer to free the LSQ entry.
> Then the cache gets back a NAK and... does what?
<
NAK in the protocol pushes the architect in the direction of exclusive
caches. Thus, you do not select and evict on miss, you select and
evict on data arrival. Remember, Spectre is going to prevent you
from modifying your cache until the instruction performing the
modification can retire. So, you already LOST the ability to select
and evict up front, you just have not seen a Spectre attack using
that. It is gone, none-the-less.
<
When the protocol is well sorted, a NAK arriving back at the original
requestor causes the request to be restarted {unless the request
was participating in an ATOMIC event in which case, the ATOMIC
event fails instead}.
<
At the point of replay, that cache is in the same state as it was when
the request went out the first time.
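
A minimal sketch of that replay behaviour in Python, assuming a toy fully associative cache and a `fabric` callback that answers each request with either a NAK or data. Cache, request_line, and the reply tuples are all hypothetical names made up for this illustration, and the ATOMIC-event failure path is not modelled.

```python
class Cache:
    """Tiny fully associative cache that picks a victim only when data arrives."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = {}          # address -> data, kept in insertion order

    def install(self, addr, data):
        # Select-and-evict happens here, at data arrival, not at miss time.
        if addr not in self.lines and len(self.lines) >= self.capacity:
            oldest = next(iter(self.lines))
            del self.lines[oldest]
        self.lines[addr] = data

def request_line(cache, addr, fabric, max_tries=8):
    """Issue a miss; on NAK, replay. Returns the number of attempts used."""
    for attempt in range(1, max_tries + 1):
        kind, payload = fabric(addr)   # ("NAK", None) or ("DATA", payload)
        if kind == "DATA":
            cache.install(addr, payload)
            return attempt
        # NAK: nothing was allocated or evicted, so there is nothing to undo.
    raise RuntimeError("request kept getting NAKed")
```

Because no victim is chosen until the data shows up, every replay starts from the same cache contents as the first attempt.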
>
> So it can't move the store data to the miss buffer early,
> and it either lives with the empty slot or puts back the victim,
> then wakes up the waiting LSQ entries which immediately
> trigger the miss sequence again.
>
> A rebound would be like a NAK but with longer latency,
> and with the possibility of other intervening cache changes,
> that all have to be backed out of. Yeech.
<
Rebound is not data-present; it is a command message only. The
downgraded line remains
<
> >> It can do local store-load forwarding because that all gets tossed
> >> on mispredict.
> >>
> >> One might be able to build a dependency matrix that tracked
> >> LD and ST vs prior branch resolutions. When all older branches
> >> have resolved then release the loads to cause real cache misses,
> > <
> > Tidies up branches, does nothing for exceptions.
> > <
> >> and allow the stores to prefetch. But this too might be vulnerable
> >> to using exceptions like bounds check instead of branches,
> >> so it might have to watch pending potential exceptions too.
> > <
> > I don't see that you have gained anything (due to exceptions remaining
> > present.)
> Just spitballing...
> I'll call the state of a uOp after execution Finished.
We used Complete.
> When branch instructions are finished they have resolved and,
> if a mispredict, already purged mispredicted younger uOps.
You also have to do similarly when an exception is recognized.
>
> All instructions except LD and ST are unaffected by the finished
> status of older uOps so they do not need to take them into account
> for their scheduling.
<
Memory references are dependent on their operand ordering and on
their memory ordering. {Not sure if there is any disagreement here,
just clarifying. We are also assuming result ordering was renamed
away.}
>
> LD and ST can translate addresses in the TLB in any order as soon
> as the address is ready. However because a table walk would cause
> cache state and TLB changes we must synchronize how these proceed.
> LD and ST table walks must wait until all older uOps are finished
> and exception free before proceeding one at a time.
<
We never found a need to do this. Mc 88120 started table walks as
TLB misses occurred. What Spectre brings in is that you cannot
update TLB until the TLB-missing instruction becomes consistent.
{Consistent is when all older instructions are complete and exception
free.} An expendable buffer associated with TLB allows for these
walked translations to service other memory references without
being "in" the TLB. They migrate to TLB after the cause becomes
consistent.
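
A minimal sketch of such a side buffer in Python, assuming each walked translation is tagged with the instruction that caused the walk. SpeculativeTLB and its method names are hypothetical, made up for this illustration; capacity limits and replacement are left out.

```python
class SpeculativeTLB:
    def __init__(self):
        self.tlb = {}        # committed translations: vpage -> ppage
        self.pending = {}    # walked but not yet consistent: vpage -> (ppage, owner)

    def lookup(self, vpage):
        """Either structure may service a memory reference."""
        if vpage in self.tlb:
            return self.tlb[vpage]
        if vpage in self.pending:
            return self.pending[vpage][0]
        return None

    def walk_done(self, vpage, ppage, owner):
        """A table walk finished for instruction `owner`; park the result."""
        self.pending[vpage] = (ppage, owner)

    def consistent(self, owner):
        """`owner` became consistent: migrate its translations into the TLB."""
        for vpage in [v for v, (_, o) in self.pending.items() if o == owner]:
            self.tlb[vpage] = self.pending.pop(vpage)[0]

    def squash(self, owner):
        """`owner` was thrown away: drop its translations, TLB untouched."""
        for vpage in [v for v, (_, o) in self.pending.items() if o == owner]:
            del self.pending[vpage]
```

A squashed instruction leaves no trace in `tlb`, which is the property being asked for; only `consistent` ever writes the TLB proper.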
>
> Note however this serializes all TLB miss table walks as we don't
> know if the current walk will ultimately page fault so we can't
> begin the next as it might change cache too early. So there is only
> one table walker (but that was probably going to happen anyway).
<
Mc 88120 measured this and did not find it to be a troublesome
problem in practice. We built and measured a table walker that
could service several TLB misses simultaneously (about 4) and
found no particular advantage compared to 1. But machines are
extracting more ILP now, so it would be best to rerun this experiment.
>
> After translate LD can proceed if it can receive a store-load forward,
> or if it cache hits. Otherwise LD must wait until older uOps are finished
> and exception free (if LD got a TLB hit it would have skipped this check
> earlier so must do it now).
<
You have this all lined up as if there were no buffers that relax the
problem enough to warrant their inclusion. I am not applying these
restrictions ...
<
> Note that memory disambiguation will group LD's to the same cache line
> together and process the groups concurrently, so LD's to different lines
> can hit the cache while others stall.
<
As I taught Luke a couple of years ago, disambiguation is more about
filtering out the "can't be the same as" lines from the "absolutely is
the same as". The former can be done with lower order untranslated
address bits, while the latter requires physical address bits. In practice
the former is a lot easier to implement and loses virtually nothing.
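
A minimal sketch of the two checks in Python, assuming 4 KiB pages and 64-byte lines (both assumptions, as are the function names). The cheap filter relies on the page-offset bits being identical in the virtual and physical address, so if they differ at line granularity the physical lines must differ as well.

```python
PAGE_BITS = 12   # assumed 4 KiB pages: the low 12 bits are untranslated
LINE_BITS = 6    # assumed 64-byte cache lines

def cannot_alias(vaddr_a, vaddr_b):
    """Cheap filter: True means the accesses are definitely to different lines."""
    mask = ((1 << PAGE_BITS) - 1) & ~((1 << LINE_BITS) - 1)
    return (vaddr_a & mask) != (vaddr_b & mask)

def same_line(paddr_a, paddr_b):
    """Exact check: needs translated (physical) addresses."""
    return (paddr_a >> LINE_BITS) == (paddr_b >> LINE_BITS)
```

When cannot_alias returns False the two references merely might conflict; treating them as dependent is conservative and costs only the occasional false ordering.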
>
> ST must wait to reach the Commit/Retire stage but can issue cache
> prefetches as soon as older uOps are finished and exception free.
<
Earlier with buffering.
>
> So it looks like there is still some concurrency and prefetching
> opportunities to be had even while respecting Spectre cache rules.
<
I agree with that sentiment.

EricP <ThatWouldBeTelling@thevillage.com> writes:
>Anton Ertl wrote:
>> Thomas Koenig <tkoenig@netcologne.de> writes:
>>> https://comsec.ethz.ch/research/microarch/retbleed/ (A foaf actually
>>> wrote the exploit code for this). It seems there is no end to
>>> this. Intel and AMD are affected, and mitigation is expensive,
>>> 14% and 39% overhead measured. Bah.
>>>
>>> @Mitch: How expensive would not allowing speculative execution to
>>> update microarchitectural state be?
>>>
>>> I am thinking about stores; I assume that most stores on OoO
>>> architectures are speculative, so a sizable buffer (a second L1
>>> cache, if you will) would have to be set aside.
>>
>> Store buffers are set aside already, because a store must only become
>> permanent on commit.
>>
>> Spectre v1 and v2 use loads for the side channel because CPUs
>> with speculative execution and without a Spectre fix make permanent
>> changes to the caches when a speculative load happens. A fix would
>> keep the loads in load buffers and only put them in the caches on
>> commit. According to Mitch Alsup, CPUs already have such load
>> buffers. My guess is that you may want to make them somewhat bigger;
>> if you want to support, say, speculative loads from 32 different cache
>> lines that do not reside in the D-cache already, you need 32 cache
>> lines (2KB with 64-byte cache lines).
>
>As we have discussed here previously, that might be vulnerable too.
>Reading a line shared for a speculated LD cache miss could cause
>a remote node to downgrade it from an exclusive or modified state,
>and that line state change would be detectable at the remote node.

You cannot do that, as you discussed with Mitch Alsup. There are also
other issues, like the bandwidth side channel. But the question was
about the needed buffer size, and these other issues have little to
do with that.

>Holding the line in a local miss buffer wouldn't help because the
>remote node's cache has already changed state.

You don't do that with speculative accesses. If the speculative
access can be handled without change to permanent microarchitectural
state, it can go ahead, otherwise it has to wait until it is no longer
speculative. Given that most accesses don't require such a change,
and that accesses to remote caches usually take more cycles than
waiting for the commit, I expect that the slowdown from this waiting
will be small.
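
A minimal sketch of that policy in Python, together with the buffer-size arithmetic from the quoted text. The function names, the action strings, and the single-letter remote states ("I", "S", "E", "M") are assumptions for illustration only; the bandwidth side channel is not modelled.

```python
LINE_BYTES = 64  # cache line size used in the example above

def buffer_bytes(n_lines, line_bytes=LINE_BYTES):
    """Storage needed to hold n_lines speculatively loaded cache lines."""
    return n_lines * line_bytes

def speculative_load_action(in_dcache, in_load_buffer, buffer_free, remote_state):
    """Decide what a still-speculative load may do.

    remote_state is the line's state in another node's cache:
    "I" (not present), "S" (shared), "E" (exclusive), "M" (modified).
    """
    if in_dcache or in_load_buffer:
        return "proceed"              # no permanent state changes
    if remote_state in ("E", "M"):
        return "wait_for_commit"      # would force a visible remote downgrade
    if not buffer_free:
        return "wait_for_commit"      # nowhere to hold the line off-cache
    return "fill_load_buffer"         # fetch into the buffer, not the cache
```

For example, buffer_bytes(32) gives 2048, i.e. the 2KB figure for 32 lines of 64 bytes.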

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
