Rocksolid Light




devel / comp.arch / Re: branchless binary search

Subject  Author
* Ill-advised use of CMOVE  Stefan Monnier
+* Re: Ill-advised use of CMOVE  Thomas Koenig
|`* Re: Ill-advised use of CMOVE  Stephen Fuld
| `* Re: Ill-advised use of CMOVE  Thomas Koenig
|  +* Re: Ill-advised use of CMOVE  Stephen Fuld
|  |`- Re: Ill-advised use of CMOVE  aph
|  `* Re: Ill-advised use of CMOVE  Anton Ertl
|   +* Re: Ill-advised use of CMOVE  Michael S
|   |`* Re: Ill-advised use of CMOVE  Anton Ertl
|   | `* Re: Ill-advised use of CMOVE  Michael S
|   |  +* Re: Ill-advised use of CMOVE  Ivan Godard
|   |  |`* Re: Ill-advised use of CMOVE  Michael S
|   |  | `* Re: Ill-advised use of CMOVE  Ivan Godard
|   |  |  `- Re: Ill-advised use of CMOVE  Michael S
|   |  `* Re: Ill-advised use of CMOVE  Anton Ertl
|   |   +- Re: Ill-advised use of CMOVE  Michael S
|   |   `* branchless binary search (was: Ill-advised use of CMOVE)  Anton Ertl
|   |    +* Re: branchless binary search  Stefan Monnier
|   |    |`- Re: branchless binary search  Anton Ertl
|   |    +- Re: branchless binary search  Terje Mathisen
|   |    `* Re: branchless binary search  EricP
|   |     +* Re: branchless binary search  Michael S
|   |     |+* Re: branchless binary search  Stephen Fuld
|   |     ||`* Re: branchless binary search  Michael S
|   |     || +- Re: branchless binary search  Thomas Koenig
|   |     || `* Spectre fix (was: branchless binary search)  Anton Ertl
|   |     ||  `- Re: Spectre fix (was: branchless binary search)  Michael S
|   |     |+* Re: branchless binary search  Stefan Monnier
|   |     ||`- Re: branchless binary search  MitchAlsup
|   |     |`* Re: branchless binary search  Andy Valencia
|   |     | `- Re: branchless binary search  Terje Mathisen
|   |     `* Re: branchless binary search  Anton Ertl
|   |      `* Re: branchless binary search  MitchAlsup
|   |       `* Spectre and resource contention (was: branchless binary search)  Anton Ertl
|   |        +* Re: Spectre and resource contention  Stefan Monnier
|   |        |+- Re: Spectre and resource contention  MitchAlsup
|   |        |`- Re: Spectre and resource contention  Anton Ertl
|   |        `* Re: Spectre and resource contention (was: branchless binary search)  MitchAlsup
|   |         `* Re: Spectre and resource contention (was: branchless binary search)  Anton Ertl
|   |          `- Re: Spectre and resource contention (was: branchless binary search)  MitchAlsup
|   +* Re: Ill-advised use of CMOVE  Stephen Fuld
|   |`* binary search vs. hash tables (was: Ill-advised use of CMOVE)  Anton Ertl
|   | +* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)  Stephen Fuld
|   | |`* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)  John Levine
|   | | `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)  Michael S
|   | |  `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)  Michael S
|   | |   `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)  Michael S
|   | |    `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)  Michael S
|   | |     `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)  Michael S
|   | |      `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)  Michael S
|   | |       `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)  Michael S
|   | |        `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)  Brett
|   | |         `- Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)  Michael S
|   | `- Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)  John Levine
|   +* Re: Ill-advised use of CMOVE  EricP
|   |`* Re: Ill-advised use of CMOVE  BGB
|   | +* Re: Ill-advised use of CMOVE  MitchAlsup
|   | |+* Re: Ill-advised use of CMOVE  BGB
|   | ||`* Re: Ill-advised use of CMOVE  MitchAlsup
|   | || +* Re: Ill-advised use of CMOVE  BGB
|   | || |`- Re: Ill-advised use of CMOVE  MitchAlsup
|   | || `- Re: Ill-advised use of CMOVE  Ivan Godard
|   | |`* Re: Ill-advised use of CMOVE  Ivan Godard
|   | | `- Re: Ill-advised use of CMOVE  MitchAlsup
|   | `- Re: Ill-advised use of CMOVE  Ivan Godard
|   `- Re: Ill-advised use of CMOVE  Thomas Koenig
+* Re: Ill-advised use of CMOVE  Terje Mathisen
|+* Re: Ill-advised use of CMOVE  MitchAlsup
||`* Re: Ill-advised use of CMOVE  Terje Mathisen
|| `* Re: Ill-advised use of CMOVE  MitchAlsup
||  `* Re: Ill-advised use of CMOVE  Marcus
||   `* Re: Ill-advised use of CMOVE  MitchAlsup
||    `* Re: Ill-advised use of CMOVE  BGB
||     `* Re: Ill-advised use of CMOVE  MitchAlsup
||      `- Re: Ill-advised use of CMOVE  BGB
|+- Re: Ill-advised use of CMOVE  Ivan Godard
|`* Re: Ill-advised use of CMOVE  Stephen Fuld
| +* Re: Ill-advised use of CMOVE  Michael S
| |`* Re: Ill-advised use of CMOVE  Stephen Fuld
| | `- Re: Ill-advised use of CMOVE  Anton Ertl
| +* Re: Ill-advised use of CMOVE  Stefan Monnier
| |`* Re: Ill-advised use of CMOVE  Stephen Fuld
| | `* Re: Ill-advised use of CMOVE  Anton Ertl
| |  `* Re: Ill-advised use of CMOVE  BGB
| |   `* Re: Ill-advised use of CMOVE  MitchAlsup
| |    +- Re: Ill-advised use of CMOVE  BGB
| |    `* Re: Ill-advised use of CMOVE  Thomas Koenig
| |     +- Re: Ill-advised use of CMOVE  Anton Ertl
| |     `- Re: Ill-advised use of CMOVE  MitchAlsup
| +* Re: Ill-advised use of CMOVE  Anton Ertl
| |`* Re: Ill-advised use of CMOVE  MitchAlsup
| | `* Re: Ill-advised use of CMOVE  EricP
| |  +* Re: Ill-advised use of CMOVE  Michael S
| |  |`* Re: Ill-advised use of CMOVE  EricP
| |  | +- Re: Ill-advised use of CMOVE  MitchAlsup
| |  | +* Re: Ill-advised use of CMOVE  Terje Mathisen
| |  | |`* Re: Ill-advised use of CMOVE  EricP
| |  | | `* Re: Ill-advised use of CMOVE  MitchAlsup
| |  | |  `- Re: Ill-advised use of CMOVE  BGB
| |  | `- Re: Ill-advised use of CMOVE  Anton Ertl
| |  `* Re: Ill-advised use of CMOVE  Anton Ertl
| `* Re: Ill-advised use of CMOVE  Terje Mathisen
+- Re: Ill-advised use of CMOVE  Anton Ertl
`* Re: Ill-advised use of CMOVE  Scott Michel

Re: Ill-advised use of CMOVE

<t61b96$fhn$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=25414&group=comp.arch#25414

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
Date: Tue, 17 May 2022 16:31:50 -0700
Organization: A noiseless patient Spider
Lines: 61
Message-ID: <t61b96$fhn$1@dont-email.me>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org>
<t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me>
<t5rg76$hp1$1@newsreader4.netcologne.de>
<2022May16.080143@mips.complang.tuwien.ac.at> <OQrgK.3334$6y5f.1197@fx40.iad>
<t60p72$el5$1@dont-email.me>
<a2fe9cd2-cacf-4c4e-ac89-d68dcbb4bb4dn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 17 May 2022 23:31:51 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="368e5f2b90e3d951fb5f2b7df8e7b0e2";
logging-data="15927"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+j+QULThbZors2jvOxlTGb"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.9.0
Cancel-Lock: sha1:P3k/v0r/khwgloXOoYMnA3NAn7A=
In-Reply-To: <a2fe9cd2-cacf-4c4e-ac89-d68dcbb4bb4dn@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Tue, 17 May 2022 23:31 UTC

On 5/17/2022 12:46 PM, MitchAlsup wrote:
> On Tuesday, May 17, 2022 at 1:23:33 PM UTC-5, BGB wrote:
>> On 5/16/2022 7:54 AM, EricP wrote:
>>> Anton Ertl wrote:
>>>> Thomas Koenig <tko...@netcologne.de> writes:
>>>>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:
>>>>>
>>>>>> So, I think a good solution is for the compiler to essentially never
>>>>>> decide to issue CMOV as a replacement for a conditional branch for
>>>>>> architectures with good branch predictors (not saying anything here
>>>>>> about predication.).
>>>>> Binary search was also recently quoted here as an example where
>>>>> CMOV makes a lot of sense
>>>>
>>>> As is often the case, this depends very much on the inputs.
>>>>
>>>> And then there's the question of why one uses binary search in the
>>>> first place. The only time I remember using it is when searching for
>>>> the try-catch block of an exception in a JavaVM implementation. In
>>>> that case it is likely that the path through the binary search can be
>>>> predicted from the branch history before the throw.
>>>
>>> Binary search is one of the decompositions for SWITCH statements
>>> with discontiguous blocks.
>>>
>>
>> Agreed.
>>
>> Before I added jump-tables to BGBCC, binary search was the primary form
>> of "switch()".
>>
>>
>> Say:
>> 1..5 case labels:
>> Use if-Else
>> 6+: Divide space in half, recurse on left and right halves.
>> Compare-and-branch to right half.
>>
>> But, with jump tables:
>> 1..5 case labels:
>> Use if-Else
>> 6..511 case labels, density>=50%:
>> Use jump table.
>> Else: Divide space in half, recurse on left and right halves.
>> Compare-and-branch to right half.
> <
> My 66000 has tabularized jumps directly (and PIC, too)
> For a short switch clause (less than 256 instructions in the switch clause)
> The tabularized jump is ALWAYS smaller than a series of if-elses.

Well, sort of, for the way you handle branches and prediction anyway.
But not so much when you can have a four-way branch in a single bundle,
with less that one prediction for that.

The reason search can be smaller than table is that many searches can be
done with no branch instructions at all. A lot of switches have trivial
case bodies, and these can be folded along with the comparisons using
predication or selection, becoming sequential code. And with no branches
to miss when the switch is unpredictable.

Gotta be wide for that though :-)
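[A compilable sketch of the folding described above; plain C with hypothetical values, not Mill output. A switch whose case bodies are trivial assignments can be rewritten as a chain of selects, which a wide machine with predication/selection issues as straight-line code, with no branches to mispredict:]

```c
#include <assert.h>

/* Hypothetical example: each ?: is a select, so the whole "switch"
 * becomes sequential, branch-free code. The case values and results
 * here are made up for illustration. */
static int classify(int x)
{
    int r = 0;              /* default case */
    r = (x == 3) ? 10 : r;  /* case 3 */
    r = (x == 7) ? 20 : r;  /* case 7 */
    r = (x == 9) ? 30 : r;  /* case 9 */
    return r;
}
```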

Re: Ill-advised use of CMOVE

<t61bgd$fhn$2@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=25415&group=comp.arch#25415

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
Date: Tue, 17 May 2022 16:35:42 -0700
Organization: A noiseless patient Spider
Lines: 169
Message-ID: <t61bgd$fhn$2@dont-email.me>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org>
<t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me>
<t5rg76$hp1$1@newsreader4.netcologne.de>
<2022May16.080143@mips.complang.tuwien.ac.at> <OQrgK.3334$6y5f.1197@fx40.iad>
<t60p72$el5$1@dont-email.me>
<a2fe9cd2-cacf-4c4e-ac89-d68dcbb4bb4dn@googlegroups.com>
<t6116l$cgc$1@dont-email.me>
<6f006c4d-143b-424d-8aa5-d15e86dc0496n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 17 May 2022 23:35:42 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="368e5f2b90e3d951fb5f2b7df8e7b0e2";
logging-data="15927"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX183aDZI32yrOysGG2/urMHe"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.9.0
Cancel-Lock: sha1:38HMQ0YuKHdEdMzLBoBB0JbfKx0=
In-Reply-To: <6f006c4d-143b-424d-8aa5-d15e86dc0496n@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Tue, 17 May 2022 23:35 UTC

On 5/17/2022 2:17 PM, MitchAlsup wrote:
> On Tuesday, May 17, 2022 at 3:39:52 PM UTC-5, BGB wrote:
>> On 5/17/2022 2:46 PM, MitchAlsup wrote:
>>> On Tuesday, May 17, 2022 at 1:23:33 PM UTC-5, BGB wrote:
>>>> On 5/16/2022 7:54 AM, EricP wrote:
>>>>> Anton Ertl wrote:
>>>>>> Thomas Koenig <tko...@netcologne.de> writes:
>>>>>>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:
>>>>>>>
>>>>>>>> So, I think a good solution is for the compiler to essentially never
>>>>>>>> decide to issue CMOV as a replacement for a conditional branch for
>>>>>>>> architectures with good branch predictors (not saying anything here
>>>>>>>> about predication.).
>>>>>>> Binary search was also recently quoted here as an example where
>>>>>>> CMOV makes a lot of sense
>>>>>>
>>>>>> As is often the case, this depends very much on the inputs.
>>>>>>
>>>>>> And then there's the question of why one uses binary search in the
>>>>>> first place. The only time I remember using it is when searching for
>>>>>> the try-catch block of an exception in a JavaVM implementation. In
>>>>>> that case it is likely that the path through the binary search can be
>>>>>> predicted from the branch history before the throw.
>>>>>
>>>>> Binary search is one of the decompositions for SWITCH statements
>>>>> with discontiguous blocks.
>>>>>
>>>>
>>>> Agreed.
>>>>
>>>> Before I added jump-tables to BGBCC, binary search was the primary form
>>>> of "switch()".
>>>>
>>>>
>>>> Say:
>>>> 1..5 case labels:
>>>> Use if-Else
>>>> 6+: Divide space in half, recurse on left and right halves.
>>>> Compare-and-branch to right half.
>>>>
>>>> But, with jump tables:
>>>> 1..5 case labels:
>>>> Use if-Else
>>>> 6..511 case labels, density>=50%:
>>>> Use jump table.
>>>> Else: Divide space in half, recurse on left and right halves.
>>>> Compare-and-branch to right half.
>>> <
>>> My 66000 has tabularized jumps directly (and PIC, too)
>>> For a short switch clause (less than 256 instructions in the switch clause)
>>> The tabularized jump is ALWAYS smaller than a series of if-elses.
>>> <
>>> So, the instruction looks like::
>>> <
>>> JTT<size> Rcase, #Bounds
>>> <
>>> if( Rcase < 0 or Rcase > Bounds ) IP = IP + 4 + (Bounds<<2)
>>> else IP = IP + memory.unsigned.size[ IP + 4 + Rcase << size ] << 2;
>>> <
>>> some of the time these offsets are bytes so each entry in the table is ¼
>>> the size of a branch or ⅛th the label of an address. But almost all switch
>>> statements are encoded as 16-bit displacements (unless they fit a byte).
>>> <
>>> The table is in code space, is fetched through the instruction cache,
>>> needs no intermediate register, and is position independent.
>>> <
>>> Typical code sequences look like:
>>> <
>>> JTT Rt,#bounds
>>> .byte label0-.,label1-.,label2-.,default-.
>>> ........ // until all the labels have been enumerated
>>> label0: // case[0]
>>> BR end_switch
>>> label1: // case[1]
>>> BR end_switch
>>> ...
>>> default: // default clause
>>> ...
>>> end_switch:
>>> // normal code from here on out.
>> OK.
>>
>>
>> In my case, it would be something like:
>> SUB Ri, Base, Rt
>> CMPHI Limit, Rt //unsigned Rt>Limit
>> BT .default
>> ADD Rt, Rt //(assuming a table of 32-bit branch ops)
>> BRA (PC, Rt) //AKA: "BRAF Rt"
>> ... table of branch ops ...
>>
>> Would take around 16 clock cycles.
> <
> Which is 12 too many........
>>
>>
>>
>> But, for 5 or less, eg:
>> CMPEQ Val1, Ri
>> BT .lbl1
>> CMPEQ Val2, Ri
>> BT .lbl2
>> CMPEQ Val3, Ri
>> BT .lbl3
>> CMPEQ Val4, Ri
>> BT .lbl4
>> CMPEQ Val5, Ri
>> BT .lbl5
>> BRA .default
>>
>> Will (on average) get to the destination in less than 16 clock-cycles
>> (or less for shorter sequences).
> <
> Mine takes 4 cycles uniformly.

Two on Gold, but shared with the case bodies so likely zero.

>>
>>
>> For the binary-tree case, the binary branches will tend to have a higher
>> mispredict rate than the linear-search branches (so a cut-off of 5 makes
>> more sense than 3 or similar).
> <
> Note: I was only talking about switches in general and not participating in the
> search portions of this thread.
>>
>>
>> Say:
>> CMPGE Val6, Ri
>> BT .sub1
>>
>> CMPEQ Val1, Ri
>> BT .lbl1
>> CMPEQ Val2, Ri
>> BT .lbl2
>> CMPEQ Val3, Ri
>> BT .lbl3
>> CMPEQ Val4, Ri
>> BT .lbl4
>> CMPEQ Val5, Ri
>> BT .lbl5
>> BRA .default
>>
>> .sub1:
>> CMPEQ Val6, Ri
>> BT .lbl6
>> CMPEQ Val7, Ri
>> BT .lbl7
>> CMPEQ Val8, Ri
>> BT .lbl8
>> CMPEQ Val9, Ri
>> BT .lbl9
>> BRA .default
>>
>>
>> Makes sense if the value space is sparse.
> <
> By encoding labels as word displacements off the IP, one can allow
> much greater sparseness without impacting code size or cycles.
>>>>
>>>>
>>>> This method deals acceptably with things like switch'ing based on a
>>>> bunch of FOURCC values or similar.
>>>>
>>>>
>>>> Generally, switch needs to gather up a list of all the case labels, sort
>>>> them, and then build the dispatch structure, followed by compiling the
>>>> switch body.
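[The JTT semantics quoted above can be modeled in C. This is a rough, hypothetical sketch assuming 16-bit table entries; the function name, operand widths, and table placement are illustrative guesses, not the actual My 66000 encoding:]

```c
#include <stdint.h>

/* Model of the table jump: if Rcase is out of [0, Bounds], fall through
 * past the table to the default; otherwise fetch a word displacement
 * from the table (located at ip + 4) and add it, shifted, to IP. */
static uint64_t jtt_target(uint64_t ip, int64_t rcase, uint64_t bounds,
                           const uint16_t *table /* at ip + 4 */)
{
    if (rcase < 0 || (uint64_t)rcase > bounds)
        return ip + 4 + (bounds << 2);          /* out of range: skip the table */
    return ip + ((uint64_t)table[rcase] << 2);  /* word displacement off IP */
}
```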

Re: Ill-advised use of CMOVE

<9922efb7-a7fa-4ab2-b5d5-c4c656bac9b6n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=25419&group=comp.arch#25419

Newsgroups: comp.arch
X-Received: by 2002:ac8:7fd0:0:b0:2f3:fda4:6ddf with SMTP id b16-20020ac87fd0000000b002f3fda46ddfmr21898514qtk.323.1652838932076;
Tue, 17 May 2022 18:55:32 -0700 (PDT)
X-Received: by 2002:a05:6830:1645:b0:606:fe3:fa21 with SMTP id
h5-20020a056830164500b006060fe3fa21mr9363385otr.268.1652838931802; Tue, 17
May 2022 18:55:31 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!2.eu.feeder.erje.net!feeder.erje.net!fdn.fr!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 17 May 2022 18:55:31 -0700 (PDT)
In-Reply-To: <t61b96$fhn$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b906:a655:bfcf:bf2f;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b906:a655:bfcf:bf2f
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58mug$ref$2@newsreader4.netcologne.de>
<t5rbct$rbe$1@dont-email.me> <t5rg76$hp1$1@newsreader4.netcologne.de>
<2022May16.080143@mips.complang.tuwien.ac.at> <OQrgK.3334$6y5f.1197@fx40.iad>
<t60p72$el5$1@dont-email.me> <a2fe9cd2-cacf-4c4e-ac89-d68dcbb4bb4dn@googlegroups.com>
<t61b96$fhn$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <9922efb7-a7fa-4ab2-b5d5-c4c656bac9b6n@googlegroups.com>
Subject: Re: Ill-advised use of CMOVE
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 18 May 2022 01:55:32 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Wed, 18 May 2022 01:55 UTC

On Tuesday, May 17, 2022 at 6:31:54 PM UTC-5, Ivan Godard wrote:
> On 5/17/2022 12:46 PM, MitchAlsup wrote:
> > On Tuesday, May 17, 2022 at 1:23:33 PM UTC-5, BGB wrote:
> >> On 5/16/2022 7:54 AM, EricP wrote:
> >>> Anton Ertl wrote:
> >>>> Thomas Koenig <tko...@netcologne.de> writes:
> >>>>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:
> >>>>>
> >>>>>> So, I think a good solution is for the compiler to essentially never
> >>>>>> decide to issue CMOV as a replacement for a conditional branch for
> >>>>>> architectures with good branch predictors (not saying anything here
> >>>>>> about predication.).
> >>>>> Binary search was also recently quoted here as an example where
> >>>>> CMOV makes a lot of sense
> >>>>
> >>>> As is often the case, this depends very much on the inputs.
> >>>>
> >>>> And then there's the question of why one uses binary search in the
> >>>> first place. The only time I remember using it is when searching for
> >>>> the try-catch block of an exception in a JavaVM implementation. In
> >>>> that case it is likely that the path through the binary search can be
> >>>> predicted from the branch history before the throw.
> >>>
> >>> Binary search is one of the decompositions for SWITCH statements
> >>> with discontiguous blocks.
> >>>
> >>
> >> Agreed.
> >>
> >> Before I added jump-tables to BGBCC, binary search was the primary form
> >> of "switch()".
> >>
> >>
> >> Say:
> >> 1..5 case labels:
> >> Use if-Else
> >> 6+: Divide space in half, recurse on left and right halves.
> >> Compare-and-branch to right half.
> >>
> >> But, with jump tables:
> >> 1..5 case labels:
> >> Use if-Else
> >> 6..511 case labels, density>=50%:
> >> Use jump table.
> >> Else: Divide space in half, recurse on left and right halves.
> >> Compare-and-branch to right half.
> > <
> > My 66000 has tabularized jumps directly (and PIC, too)
> > For a short switch clause (less than 256 instructions in the switch clause)
> > The tabularized jump is ALWAYS smaller than a series of if-elses.
>
> Well, sort of, for the way you handle branches and prediction anyway.
<
I have tried to steer clear how My 66000 implementations predict branches
and when and if. Low end machines may have no prediction at all (just
search ahead in the instruction buffer and prefetch.) Higher end machines
will be taking more than one branch per cycle, and there are a myriad of
choices. Not all branches need to be predicted, and I have solutions for
indirect jumps (those which remain after JYY), returns, and unconditionals
that are not predictors in the typical branch sense of prediction.
<
And of course there is plenty of predication capability to further reduce
the footprint of branches in the predictor table(s).
<
Higher end machines may be issuing instructions from more than one basic
block in the same cycle (ala Mc 88120 and K9)
,...
<
> But not so much when you can have a four-way branch in a single bundle,
> with less that one prediction for that.
<
than ? <that>
<
In a sense, a JTT is an any-way branch (more than 256 ways). I guess you
could attempt to predict it, but getting the switch index to the JTT faster
is worth more.
>
> The reason search can be smaller than table is that many searches can be
> done with no branch instructions at all. A lot of switches have trivial
> case bodies, and these can be folded along with the comparisons using
> predication or selection, becoming sequential code. And with no branches
> to miss when the switch is unpredictable.
<
a search compared to 4 total cycles ?!?
>
> Gotta be wide for that though :-)
<
in order to NEED it ? or to utilize it ?

Re: branchless binary search

<2022May18.083523@mips.complang.tuwien.ac.at>


https://www.novabbs.com/devel/article-flat.php?id=25421&group=comp.arch#25421

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: branchless binary search
Date: Wed, 18 May 2022 06:35:23 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 16
Message-ID: <2022May18.083523@mips.complang.tuwien.ac.at>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me> <t5rg76$hp1$1@newsreader4.netcologne.de> <2022May16.080143@mips.complang.tuwien.ac.at> <a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com> <2022May16.175019@mips.complang.tuwien.ac.at> <c17493ff-4a3d-4e90-bbbd-00f1ec658d23n@googlegroups.com> <2022May17.110354@mips.complang.tuwien.ac.at> <2022May17.205327@mips.complang.tuwien.ac.at> <jwvpmkb7q1h.fsf-monnier+comp.arch@gnu.org>
Injection-Info: reader02.eternal-september.org; posting-host="ceeab64d13947d9b16eab9899a387060";
logging-data="31981"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX199TEpy/UtqV2QSm39pr7FQ"
Cancel-Lock: sha1:Qkz+loSL9PMKtWxNokII0G+AQXI=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Wed, 18 May 2022 06:35 UTC

Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> * in the mispredict case you have speculatively loaded a cache line
>> that you don't need, and evicted a probably more useful cache line.
>
>If your CPU is careful to avoid Spectre attacks, this should not be the
>case, right?

Yes, a CPU with the Spectre fix that Mitch Alsup and I have advocated
would load the cache line speculatively, but would not put the result
in the (lower-level) cache unless the load was committed. So no
evictions for mispredicted loads.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)

<ddbda80c-74ba-4cbe-9676-03151747287bn@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=25422&group=comp.arch#25422

Newsgroups: comp.arch
X-Received: by 2002:a05:622a:18f:b0:2f3:f42f:67f0 with SMTP id s15-20020a05622a018f00b002f3f42f67f0mr205071qtw.42.1652886614261;
Wed, 18 May 2022 08:10:14 -0700 (PDT)
X-Received: by 2002:a05:6808:c2:b0:325:eb87:c26f with SMTP id
t2-20020a05680800c200b00325eb87c26fmr35836oic.117.1652886613644; Wed, 18 May
2022 08:10:13 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 18 May 2022 08:10:13 -0700 (PDT)
In-Reply-To: <t614s3$6ch$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t5t8r4$qdi$1@dont-email.me>
<2022May16.180830@mips.complang.tuwien.ac.at> <t5u35g$fo4$1@dont-email.me>
<t5u5c4$2n6r$1@gal.iecc.com> <677f55fc-8833-4830-bacc-9c3925c7b1adn@googlegroups.com>
<6985cff9-13dd-4b59-8a22-0d418f156c12n@googlegroups.com> <69eee9e4-1f34-4994-88c7-65d756baf30an@googlegroups.com>
<30054530-21ac-4bd3-88b4-a0e74ea5af87n@googlegroups.com> <62e6eb85-26ef-45fb-9b22-fbe787447f1en@googlegroups.com>
<736a42b0-8608-49be-aa94-e3800e1a0836n@googlegroups.com> <0f19cd66-c04b-406c-8e79-4565b1013e45n@googlegroups.com>
<t614s3$6ch$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ddbda80c-74ba-4cbe-9676-03151747287bn@googlegroups.com>
Subject: Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)
From: already5...@yahoo.com (Michael S)
Injection-Date: Wed, 18 May 2022 15:10:14 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 2985
 by: Michael S - Wed, 18 May 2022 15:10 UTC

On Wednesday, May 18, 2022 at 12:42:30 AM UTC+3, gg...@yahoo.com wrote:
> Michael S <already...@yahoo.com> wrote:
> > On Tuesday, May 17, 2022 at 11:38:22 AM UTC+3, Michael S wrote:
> >> Sorry for all 'test' posts.
> >> I am trying to figure out if google, in their eternal wisdom, had broken
> >> a posting on Google Groups completely,
> >> or if it did it selectively, just on Firefox.
>> So far it looks like the latter.
> >
> > More the strangerer
> There was a corrupt spam post since deleted that broke the NewTap reader on
> iOS, had to delete comp.arch, re-add and restart the app.

I am pretty sure that it's unrelated.
It seems Google found a way to make newer versions of Firefox unusable by exploiting
the way they handle mistakes in Cross-Origin Requests.
Under certain conditions Firefox just stops reacting, not only in the tab that caused the trouble,
but in other tabs as well. Not all of them, but many.
Now it happens to me not just in Google Groups; Google Search is also affected.
I was forced to switch to Bing as my default Firefox search engine.

Of course, that's not what Google wanted. Their ultimate goal, I suppose, is to force me
off Firefox onto one of the Chromium-based browsers. And I suspect that they will succeed
in the end. But not yet.

Re: branchless binary search

<t67upc$ruj$1@gioia.aioe.org>


https://www.novabbs.com/devel/article-flat.php?id=25429&group=comp.arch#25429

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!EhtdJS5E9ITDZpJm3Uerlg.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: branchless binary search
Date: Fri, 20 May 2022 13:41:38 +0200
Organization: Aioe.org NNTP Server
Message-ID: <t67upc$ruj$1@gioia.aioe.org>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org>
<t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me>
<t5rg76$hp1$1@newsreader4.netcologne.de>
<2022May16.080143@mips.complang.tuwien.ac.at>
<a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com>
<2022May16.175019@mips.complang.tuwien.ac.at>
<c17493ff-4a3d-4e90-bbbd-00f1ec658d23n@googlegroups.com>
<2022May17.110354@mips.complang.tuwien.ac.at>
<2022May17.205327@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="28627"; posting-host="EhtdJS5E9ITDZpJm3Uerlg.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.12
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Fri, 20 May 2022 11:41 UTC

Anton Ertl wrote:
> anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>> Michael S <already5chosen@yahoo.com> writes:
>>> On Monday, May 16, 2022 at 7:07:29 PM UTC+3, Anton Ertl wrote:
>>>> and a "branchless" variant is much faster.
>>>
>>> The CPU in question has no CMOVE/Select.
>>
>> For binary search it's relatively cheap to do without (at least
>> someone showed code here many years ago that looked convincing).
>
> I searched around a bit, and found the following code in
> <https://stackoverflow.com/questions/11360831/about-the-branchless-binary-search>
> by BeeOnRope (Travis Downs):
>
> int needle;       // value we are searching for
> int *base = ...;  // base pointer
> int n;            // number of elements in the current region
> while (n > 1) {
>     int middle = n/2;
>     base += (needle < base[middle]) ? 0 : middle;
>     n -= middle;
> }
>
> (The context indicates that you should verify this code before use.)
>
> And of course (x<y)?0:middle is equivalent to ((x<y)-1)&middle, but
> the compiler probably knows this.
>
> The more interesting part is that the posting then goes on to point
> out that this branchless code is slower than branching code in the
> cases where the loads miss the first few cache levels, even if the
> branches are unpredictable. That's because the branchless code makes
> every load depend on the previous load, while in the branching code
> there is no such dependency if the prediction is correct (50% of the
> time in the unpredictable case, more for better predictability).
>
> Downsides of the branching code in the unpredictable case:
>
> * lots of fetching down the predicted path, most of which is probably
> canceled.
>
> * in the mispredict case an iteration takes the load latency (in order
> to be able to resolve the prediction) plus the mispredict penalty
> (plus some minor change).
>
> * in the mispredict case you have speculatively loaded a cache line
> that you don't need, and evicted a probably more useful cache line.
>
> You can add prefetches or actual loads to the branchless variant:
> E.g., always prefetch the whole left spine down to the leaf (if you
> view it like a search tree). In the unpredictable case this is as
> accurate (and as wasteful) as the branching code, and the performance
> may be a little better (no mispredict penalty). Alternatively,
> prefetch one level of both sides. This has the same effect of halving
> the recurrence latency, but the load unit and cache eviction cost is
> probably lower. You could prefetch further, but the costs rise
> exponentially, while the benefits shrink with each additional level.
>
> Of course, if the searches are even moderately predictable, the
> branching code wins hands-down.

Yeah, that has been my real-life experience. I.e. I have written
branchless binary search code at least a couple of times without finding
it to be a win. OTOH, if you have a lot of these searches, with variable
history leading up to them, then they could put enough pressure on the
BP tables as to force some other more needed resource out.

If you use CMOV to implement the branchless variant, then this becomes
yet another example of where CMOV typically loses compared to the
naive/branchy version.
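For concreteness, here is a minimal sketch of the two variants being
compared (my own illustration, not code from this thread; names are
made up):

```c
#include <stddef.h>

/* Branchy variant: returns the index of the first element >= needle
   (or n if there is none).  The compare branch is what the branch
   predictor speculates on. */
size_t branchy_search(const int *a, size_t n, int needle)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < needle)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Branchless variant in the style quoted upthread: the ?: is meant to
   compile to a CMOV (or a mask), so every load depends on the previous
   load.  Returns a pointer to the largest element <= needle, assuming
   needle >= base[0]. */
const int *branchless_search(const int *base, size_t n, int needle)
{
    while (n > 1) {
        size_t half = n / 2;
        base += (needle < base[half]) ? 0 : half;
        n -= half;
    }
    return base;
}
```

In the branchless loop the loaded value feeds the pointer update, which
feeds the next load; that serial chain is exactly the recurrence being
discussed above.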

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: branchless binary search

<nkOhK.5260$wIO9.4690@fx12.iad>

https://www.novabbs.com/devel/article-flat.php?id=25430&group=comp.arch#25430
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx12.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: branchless binary search
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me> <t5rg76$hp1$1@newsreader4.netcologne.de> <2022May16.080143@mips.complang.tuwien.ac.at> <a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com> <2022May16.175019@mips.complang.tuwien.ac.at> <c17493ff-4a3d-4e90-bbbd-00f1ec658d23n@googlegroups.com> <2022May17.110354@mips.complang.tuwien.ac.at> <2022May17.205327@mips.complang.tuwien.ac.at>
In-Reply-To: <2022May17.205327@mips.complang.tuwien.ac.at>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 77
Message-ID: <nkOhK.5260$wIO9.4690@fx12.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Fri, 20 May 2022 15:19:47 UTC
Date: Fri, 20 May 2022 11:19:38 -0400
X-Received-Bytes: 4457
 by: EricP - Fri, 20 May 2022 15:19 UTC

Anton Ertl wrote:
> anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>> Michael S <already5chosen@yahoo.com> writes:
>>> On Monday, May 16, 2022 at 7:07:29 PM UTC+3, Anton Ertl wrote:
>>>> and a "branchless" variant is much faster.
>>> The CPU in question has no CMOVE/Select.
>> For binary search it's relatively cheap to do without (at least
>> someone showed code here many years ago that looked convincing).
>
> I searched around a bit, and found the following code in
> <https://stackoverflow.com/questions/11360831/about-the-branchless-binary-search>
> by BeeOnRope (Travis Downs):
>
> int needle; // value we are searching for
> int *base = ...; // base pointer
> int n; // number of elements in the current region
> while (n > 1) {
>     int middle = n/2;
>     base += (needle < base[middle]) ? 0 : middle;
>     n -= middle;
> }
>
> (The context indicates that you should verify this code before use.)
>
> And of course (x<y)?0:middle is equivalent to ((x<y)-1)&middle, but
> the compiler probably knows this.
>
> The more interesting part is that the posting then goes on to point
> out that this branchless code is slower than branching code in the
> cases where the loads miss the first few cache levels, even if the
> branches are unpredictable. That's because the branchless code makes
> every load depend on the previous load, while in the branching code
> there is no such dependency if the prediction is correct (50% of the
> time in the unpredictable case, more for better predictability).

The branching version depends on flooding the memory hierarchy with
speculative requests to get as many overlapped cache-line fetches
going at once as possible, and then on average uses maybe 50% of those
fetches, but that is still a win.

Now if we de-Spectre-ize the memory subsystem so that it cannot
fetch across memory hierarchy levels under speculation,
all that "win" disappears and the two should be the same.

That could also indicate that all loop unrolling might take a hit.

> Downsides of the branching code in the unpredictable case:
>
> * lots of fetching down the predicted path, most of which is probably
> canceled.
>
> * in the mispredict case an iteration takes the load latency (in order
> to be able to resolve the prediction) plus the mispredict penalty
> (plus some minor change).
>
> * in the mispredict case you have speculatively loaded a cache line
> that you don't need, and evicted a probably more useful cache line.
>
> You can add prefetches or actual loads to the branchless variant:
> E.g., always prefetch the whole left spine down to the leaf (if you
> view it like a search tree). In the unpredictable case this is as
> accurate (and as wasteful) as the branching code, and the performance
> may be a little better (no mispredict penalty). Alternatively,
> prefetch one level of both sides. This has the same effect of halving
> the recurrence latency, but the load unit and cache eviction cost is
> probably lower. You could prefetch further, but the costs rise
> exponentially, while the benefits shrink with each additional level.

We would need to allow explicit prefetches done under speculation
to not be blocked by de-Spectre-ization of memory hierarchy.

> Of course, if the searches are even moderately predictable, the
> branching code wins hands-down.
>
> - anton

Re: branchless binary search

<89da1370-6a58-47d6-bfd0-7a32b1c2934dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=25431&group=comp.arch#25431
X-Received: by 2002:a05:6214:19c2:b0:462:230:dbd8 with SMTP id j2-20020a05621419c200b004620230dbd8mr6140149qvc.114.1653063906295;
Fri, 20 May 2022 09:25:06 -0700 (PDT)
X-Received: by 2002:a05:6870:7084:b0:ed:d709:34be with SMTP id
v4-20020a056870708400b000edd70934bemr6116720oae.4.1653063906118; Fri, 20 May
2022 09:25:06 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 20 May 2022 09:25:05 -0700 (PDT)
In-Reply-To: <nkOhK.5260$wIO9.4690@fx12.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:783b:61d5:fd85:3249;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:783b:61d5:fd85:3249
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58mug$ref$2@newsreader4.netcologne.de>
<t5rbct$rbe$1@dont-email.me> <t5rg76$hp1$1@newsreader4.netcologne.de>
<2022May16.080143@mips.complang.tuwien.ac.at> <a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com>
<2022May16.175019@mips.complang.tuwien.ac.at> <c17493ff-4a3d-4e90-bbbd-00f1ec658d23n@googlegroups.com>
<2022May17.110354@mips.complang.tuwien.ac.at> <2022May17.205327@mips.complang.tuwien.ac.at>
<nkOhK.5260$wIO9.4690@fx12.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <89da1370-6a58-47d6-bfd0-7a32b1c2934dn@googlegroups.com>
Subject: Re: branchless binary search
From: already5...@yahoo.com (Michael S)
Injection-Date: Fri, 20 May 2022 16:25:06 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Michael S - Fri, 20 May 2022 16:25 UTC

On Friday, May 20, 2022 at 6:19:51 PM UTC+3, EricP wrote:
> Anton Ertl wrote:
> > an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> >> Michael S <already...@yahoo.com> writes:
> >>> On Monday, May 16, 2022 at 7:07:29 PM UTC+3, Anton Ertl wrote:
> >>>> and a "branchless" variant is much faster.
> >>> The CPU in question has no CMOVE/Select.
> >> For binary search it's relatively cheap to do without (at least
> >> someone showed code here many years ago that looked convincing).
> >
> > I searched around a bit, and found the following code in
> > <https://stackoverflow.com/questions/11360831/about-the-branchless-binary-search>
> > by BeeOnRope (Travis Downs):
> >
> > int needle; // value we are searching for
> > int *base = ...; // base pointer
> > int n; // number of elements in the current region
> > while (n > 1) {
> >     int middle = n/2;
> >     base += (needle < base[middle]) ? 0 : middle;
> >     n -= middle;
> > }
> >
> > (The context indicates that you should verify this code before use.)
> >
> > And of course (x<y)?0:middle is equivalent to ((x<y)-1)&middle, but
> > the compiler probably knows this.
> >
> > The more interesting part is that the posting then goes on to point
> > out that this branchless code is slower than branching code in the
> > cases where the loads miss the first few cache levels, even if the
> > branches are unpredictable. That's because the branchless code makes
> > every load depend on the previous load, while in the branching code
> > there is no such dependency if the prediction is correct (50% of the
> > time in the unpredictable case, more for better predictability).
> The branching version depends on flooding the memory hierarchy with
> speculative requests to get as many overlapped cache-line fetches
> going at once as possible, and then on average uses maybe 50% of those
> fetches, but that is still a win.
>
> Now if we de-Spectre-ize the memory subsystem so that it cannot
> fetch across memory hierarchy levels under speculation,
> all that "win" disappears and the two should be the same.
>
> That could also indicate that all loop unrolling might take a hit.
> > Downsides of the branching code in the unpredictable case:
> >
> > * lots of fetching down the predicted path, most of which is probably
> > canceled.
> >
> > * in the mispredict case an iteration takes the load latency (in order
> > to be able to resolve the prediction) plus the mispredict penalty
> > (plus some minor change).
> >
> > * in the mispredict case you have speculatively loaded a cache line
> > that you don't need, and evicted a probably more useful cache line.
> >
> > You can add prefetches or actual loads to the branchless variant:
> > E.g., always prefetch the whole left spine down to the leaf (if you
> > view it like a search tree). In the unpredictable case this is as
> > accurate (and as wasteful) as the branching code, and the performance
> > may be a little better (no mispredict penalty). Alternatively,
> > prefetch one level of both sides. This has the same effect of halving
> > the recurrence latency, but the load unit and cache eviction cost is
> > probably lower. You could prefetch further, but the costs rise
> > exponentially, while the benefits shrink with each additional level.
> We would need to allow explicit prefetches done under speculation
> to not be blocked by de-Spectre-ization of memory hierarchy.

de-Spectre-ization is a pipe dream. Not going to happen.

> > Of course, if the searches are even moderately predictable, the
> > branching code wins hands-down.
> >
> > - anton

Re: branchless binary search

<t68h2a$ruq$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=25432&group=comp.arch#25432
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: branchless binary search
Date: Fri, 20 May 2022 09:53:29 -0700
Organization: A noiseless patient Spider
Lines: 93
Message-ID: <t68h2a$ruq$1@dont-email.me>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org>
<t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me>
<t5rg76$hp1$1@newsreader4.netcologne.de>
<2022May16.080143@mips.complang.tuwien.ac.at>
<a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com>
<2022May16.175019@mips.complang.tuwien.ac.at>
<c17493ff-4a3d-4e90-bbbd-00f1ec658d23n@googlegroups.com>
<2022May17.110354@mips.complang.tuwien.ac.at>
<2022May17.205327@mips.complang.tuwien.ac.at> <nkOhK.5260$wIO9.4690@fx12.iad>
<89da1370-6a58-47d6-bfd0-7a32b1c2934dn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 20 May 2022 16:53:30 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="7626444eaec0c5d4e8d825e77af0b968";
logging-data="28634"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/valESLYTNCOkj1cVkveFwhsY9WrWnNGs="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.9.0
Cancel-Lock: sha1:Jzn10TV/CbXdhpSZC8f/ioe2P6w=
In-Reply-To: <89da1370-6a58-47d6-bfd0-7a32b1c2934dn@googlegroups.com>
Content-Language: en-US
 by: Stephen Fuld - Fri, 20 May 2022 16:53 UTC

On 5/20/2022 9:25 AM, Michael S wrote:
> On Friday, May 20, 2022 at 6:19:51 PM UTC+3, EricP wrote:
>> Anton Ertl wrote:
>>> an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>>>> Michael S <already...@yahoo.com> writes:
>>>>> On Monday, May 16, 2022 at 7:07:29 PM UTC+3, Anton Ertl wrote:
>>>>>> and a "branchless" variant is much faster.
>>>>> The CPU in question has no CMOVE/Select.
>>>> For binary search it's relatively cheap to do without (at least
>>>> someone showed code here many years ago that looked convincing).
>>>
>>> I searched around a bit, and found the following code in
>>> <https://stackoverflow.com/questions/11360831/about-the-branchless-binary-search>
>>> by BeeOnRope (Travis Downs):
>>>
>>> int needle; // value we are searching for
>>> int *base = ...; // base pointer
>>> int n; // number of elements in the current region
>>> while (n > 1) {
>>>     int middle = n/2;
>>>     base += (needle < base[middle]) ? 0 : middle;
>>>     n -= middle;
>>> }
>>>
>>> (The context indicates that you should verify this code before use.)
>>>
>>> And of course (x<y)?0:middle is equivalent to ((x<y)-1)&middle, but
>>> the compiler probably knows this.
>>>
>>> The more interesting part is that the posting then goes on to point
>>> out that this branchless code is slower than branching code in the
>>> cases where the loads miss the first few cache levels, even if the
>>> branches are unpredictable. That's because the branchless code makes
>>> every load depend on the previous load, while in the branching code
>>> there is no such dependency if the prediction is correct (50% of the
>>> time in the unpredictable case, more for better predictability).
>> The branching version depends on flooding the memory hierarchy with
>> speculative requests to get as many overlapped cache-line fetches
>> going at once as possible, and then on average uses maybe 50% of those
>> fetches, but that is still a win.
>>
>> Now if we de-Spectre-ize the memory subsystem so that it cannot
>> fetch across memory hierarchy levels under speculation,
>> all that "win" disappears and the two should be the same.
>>
>> That could also indicate that all loop unrolling might take a hit.
>>> Downsides of the branching code in the unpredictable case:
>>>
>>> * lots of fetching down the predicted path, most of which is probably
>>> canceled.
>>>
>>> * in the mispredict case an iteration takes the load latency (in order
>>> to be able to resolve the prediction) plus the mispredict penalty
>>> (plus some minor change).
>>>
>>> * in the mispredict case you have speculatively loaded a cache line
>>> that you don't need, and evicted a probably more useful cache line.
>>>
>>> You can add prefetches or actual loads to the branchless variant:
>>> E.g., always prefetch the whole left spine down to the leaf (if you
>>> view it like a search tree). In the unpredictable case this is as
>>> accurate (and as wasteful) as the branching code, and the performance
>>> may be a little better (no mispredict penalty). Alternatively,
>>> prefetch one level of both sides. This has the same effect of halving
>>> the recurrence latency, but the load unit and cache eviction cost is
>>> probably lower. You could prefetch further, but the costs rise
>>> exponentially, while the benefits shrink with each additional level.
>> We would need to allow explicit prefetches done under speculation
>> to not be blocked by de-Spectre-ization of memory hierarchy.
>
> de-Spectre-ization is a pipe dream. Not going to happen.

I am not sure what you are saying here. ISTM that statement could mean
one of three things.

1. It is impossible to design any OoO CPU, even from scratch, that is
not susceptible to Spectre. I think Mitch and others would disagree.

2. It is not possible to design an OoO X86 compatible CPU that is not
susceptible.

3. It is possible but economic concerns dictate that it will not happen.
These concerns might include reduced performance or increased power as
well as increased product cost, or even just not worth the design
resources to implement.

Which one (or a different one) are you saying?

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: branchless binary search

<2022May20.185040@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=25433&group=comp.arch#25433
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: branchless binary search
Date: Fri, 20 May 2022 16:50:40 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 67
Message-ID: <2022May20.185040@mips.complang.tuwien.ac.at>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me> <t5rg76$hp1$1@newsreader4.netcologne.de> <2022May16.080143@mips.complang.tuwien.ac.at> <a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com> <2022May16.175019@mips.complang.tuwien.ac.at> <c17493ff-4a3d-4e90-bbbd-00f1ec658d23n@googlegroups.com> <2022May17.110354@mips.complang.tuwien.ac.at> <2022May17.205327@mips.complang.tuwien.ac.at> <nkOhK.5260$wIO9.4690@fx12.iad>
Injection-Info: reader02.eternal-september.org; posting-host="2c951e407f74007c0342cdc4431af617";
logging-data="12761"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Tv6c5recWHZCjDy1O1rdN"
Cancel-Lock: sha1:caaeT25uF6L0maiVqQYAgcYxM/E=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Fri, 20 May 2022 16:50 UTC

EricP <ThatWouldBeTelling@thevillage.com> writes:
>Anton Ertl wrote:
[...]
>The branching version depends on flooding the memory hierarchy with
>speculative requests to get as much overlapped cache line fetches
>going at once, and then on average uses maybe 50% of those fetches,
>but that is still a win.

If it speculates 20 iterations of the loop, it fetches 20 cache lines
speculatively, and in the unpredictable case only about one speculative
fetch is actually used on average, i.e. 5%.

>Now if we de-Spectre-ize the memory subsystem so that it cannot
>fetch across memory hierarchy levels under speculation,
>all that "win" disappears and the two should be the same.

And if we de-Spectre-ize it such that it still can fetch across memory
hierarchy levels under speculation, it does not disappear.

However, we may significantly reduce the bandwidth of speculative
accesses in order to avoid speculative side channels through resource
contention. In that case, if there is only one cache line fetched
speculatively, that will be useful 50% of the time.

>> You can add prefetches or actual loads to the branchless variant:
>> E.g., always prefetch the whole left spine down to the leaf (if you
>> view it like a search tree). In the unpredictable case this is as
>> accurate (and as wasteful) as the branching code, and the performance
>> may be a little better (no mispredict penalty). Alternatively,
>> prefetch one level of both sides. This has the same effect of halving
>> the recurrence latency, but the load unit and cache eviction cost is
>> probably lower. You could prefetch further, but the costs rise
>> exponentially, while the benefits shrink with each additional level.
>
>We would need to allow explicit prefetches done under speculation
>to not be blocked by de-Spectre-ization of memory hierarchy.

You can do the fetches/prefetches without any speculation. What is
happening in code like:

while (n > 1) {
    int middle = n/2;
    base += (needle < base[middle]) ? 0 : middle;
    n -= middle;
}

is that the following slice

while (n > 1) {
    int middle = n/2;
    n -= middle;
}

runs ahead at about one iteration every 2-3 cycles, resolving the
branch prediction quickly, while the ld(n) (i.e., log2(n)) computations of

base += (needle < base[middle]) ? 0 : middle;

(and additional prefetches that use base) lag further and further
behind, and are therefore non-speculative as soon as they are ready.
Such fetches and prefetches are therefore also not affected by
bandwidth limitations for speculative code.
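One possible rendering of the "prefetch one level of both sides" idea
combined with the loop above (a sketch only; it assumes GCC/Clang's
__builtin_prefetch, and the function name is mine):

```c
#include <stddef.h>

/* Branchless search that prefetches both candidate probes for the
   next iteration.  The next iteration has n' = n - middle and
   middle' = n'/2, so the two possible probe addresses are
   &base[n'/2] (if base stays) and &base[middle + n'/2] (if base
   advances).  The loop-control slice (n, middle, rest) never depends
   on loaded data, so it runs ahead exactly as described above. */
const int *branchless_prefetch_search(const int *base, size_t n, int needle)
{
    while (n > 1) {
        size_t middle = n / 2;
        size_t rest = n - middle;                 /* n of the next iteration */
        __builtin_prefetch(&base[rest / 2]);            /* "stay" side */
        __builtin_prefetch(&base[middle + rest / 2]);   /* "advance" side */
        base += (needle < base[middle]) ? 0 : middle;
        n -= middle;
    }
    return base;
}
```

Both prefetched lines are within the current region, so no out-of-bounds
accesses are introduced; one of the two is wasted each iteration, which
is the cost/accuracy trade-off discussed above.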

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: branchless binary search

<ba27adf3-9b85-4392-b5b9-6afdfc8db1cbn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=25434&group=comp.arch#25434
X-Received: by 2002:ad4:5aa8:0:b0:45a:f1e4:7b24 with SMTP id u8-20020ad45aa8000000b0045af1e47b24mr8819895qvg.127.1653068493574;
Fri, 20 May 2022 10:41:33 -0700 (PDT)
X-Received: by 2002:aca:e155:0:b0:325:6d76:da4b with SMTP id
y82-20020acae155000000b003256d76da4bmr6176661oig.125.1653068493321; Fri, 20
May 2022 10:41:33 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 20 May 2022 10:41:33 -0700 (PDT)
In-Reply-To: <2022May20.185040@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:2185:754d:d068:fbc8;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:2185:754d:d068:fbc8
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58mug$ref$2@newsreader4.netcologne.de>
<t5rbct$rbe$1@dont-email.me> <t5rg76$hp1$1@newsreader4.netcologne.de>
<2022May16.080143@mips.complang.tuwien.ac.at> <a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com>
<2022May16.175019@mips.complang.tuwien.ac.at> <c17493ff-4a3d-4e90-bbbd-00f1ec658d23n@googlegroups.com>
<2022May17.110354@mips.complang.tuwien.ac.at> <2022May17.205327@mips.complang.tuwien.ac.at>
<nkOhK.5260$wIO9.4690@fx12.iad> <2022May20.185040@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ba27adf3-9b85-4392-b5b9-6afdfc8db1cbn@googlegroups.com>
Subject: Re: branchless binary search
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 20 May 2022 17:41:33 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 5844
 by: MitchAlsup - Fri, 20 May 2022 17:41 UTC

On Friday, May 20, 2022 at 12:24:54 PM UTC-5, Anton Ertl wrote:
> EricP <ThatWould...@thevillage.com> writes:
> >Anton Ertl wrote:
> [...]
> >The branching version depends on flooding the memory hierarchy with
> >speculative requests to get as much overlapped cache line fetches
> >going at once, and then on average uses maybe 50% of those fetches,
> >but that is still a win.
<
> If it speculates 20 iterations of the loop, it fetches 20 cache lines
> speculatively, and in the unpredictable case on average only one
> speculative fetch is actually used on average, i.e. 5%.
<
> >Now if we de-Spectre-ize the memory subsystem so that it cannot
> >fetch across memory hierarchy levels under speculation,
> >all that "win" disappears and the two should be the same.
<
> And if we de-Spectre-ize it such that it still can fetch across memory
> hierarchy levels under speculation, it does not disappear.
<
One can Fetch down a non-architectural path, feed the data to the LD
instruction, and allow the LD instruction (transitively) to start new
Fetches. What one cannot do is put the fetched non-architectural
data in the data cache. You CAN put it in a temporal cache (data buffer)
that gets cleaned up on a mispredict; you just can't put it in the
conventional data cache.
<
It is the installation of the non-architectural data in the cache that
opens the window to Spectre.
<
<
Same for TLBs and branch predictors.
>
> However, we may significantly reduce the bandwidth of speculative
> accesses in order to avoid speculative side channels through resource
> contention. In that case, if there is only one cache line fetched
> speculatively, that will be useful 50% of the time.
<
No need to reduce BW, you just need somewhere other than the cache to
hold the data (and forward other units of data) until the LD passes the
completion point. Most processors have something like "Miss Buffers"
where all this can be done without adding any data-path HW (there is a bit
of added sequencing logic).
<
So, for the designs I have done in the past, no real HW additions would
be necessary to de-Spectre-ize the processor.
<
> >> You can add prefetches or actual loads to the branchless variant:
> >> E.g., always prefetch the whole left spine down to the leaf (if you
> >> view it like a search tree). In the unpredictable case this is as
> >> accurate (and as wasteful) as the branching code, and the performance
> >> may be a little better (no mispredict penalty). Alternatively,
> >> prefetch one level of both sides. This has the same effect of halving
> >> the recurrence latency, but the load unit and cache eviction cost is
> >> probably lower. You could prefetch further, but the costs rise
> >> exponentially, while the benefits shrink with each additional level.
> >
> >We would need to allow explicit prefetches done under speculation
> >to not be blocked by de-Spectre-ization of memory hierarchy.
<
> You can do the fetches/prefetches without any speculation. What is
> happening in code like:
> while (n > 1) {
>     int middle = n/2;
>     base += (needle < base[middle]) ? 0 : middle;
>     n -= middle;
> }
> is that the following slice
> while (n > 1) {
>     int middle = n/2;
>     n -= middle;
> }
>
> runs ahead at about one iteration every 2-3 cycles, resolving the
> branch prediction quickly, while the ld(n) (i.e., log2(n)) computations of
> base += (needle < base[middle]) ? 0 : middle;
> (and additional prefetches that use base) lag further and further
> behind, and are therefore non-speculative as soon as they are ready.
> Such fetches and prefetches are therefore also not affected by
> bandwidth limitations for speculative code.
<
The loop-control code easily runs ahead.
The loop-comparison code has longer latency.
Just don't put speculatively fetched data in the cache.
<
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Spectre and resource contention (was: branchless binary search)

<2022May20.205631@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=25435&group=comp.arch#25435
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Spectre and resource contention (was: branchless binary search)
Date: Fri, 20 May 2022 18:56:31 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 75
Message-ID: <2022May20.205631@mips.complang.tuwien.ac.at>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <2022May16.080143@mips.complang.tuwien.ac.at> <a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com> <2022May16.175019@mips.complang.tuwien.ac.at> <c17493ff-4a3d-4e90-bbbd-00f1ec658d23n@googlegroups.com> <2022May17.110354@mips.complang.tuwien.ac.at> <2022May17.205327@mips.complang.tuwien.ac.at> <nkOhK.5260$wIO9.4690@fx12.iad> <2022May20.185040@mips.complang.tuwien.ac.at> <ba27adf3-9b85-4392-b5b9-6afdfc8db1cbn@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="2c951e407f74007c0342cdc4431af617";
logging-data="12696"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18+QNSL0+zXfOn8mlWZPoex"
Cancel-Lock: sha1:EqnSTOFKMgMvIgy8piKmSnSW8aE=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Fri, 20 May 2022 18:56 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Friday, May 20, 2022 at 12:24:54 PM UTC-5, Anton Ertl wrote:
>> However, we may significantly reduce the bandwidth of speculative=20
>> accesses in order to avoid speculative side channels through resource=20
>> contention. In that case, if there is only one cache line fetched=20
>> speculatively, that will be useful 50% of the time.
><
>No need to reduce BW, you just need somewhere other than the cache to
>hold the data (and forward other units of data) until the LD passes the
>completion point. Most processors have something like "Miss Buffers"
>where all this can be done without adding any data-path HW (there is a bit
>of added sequencing logic).

That helps against attacks that then test for the presence of some
data in a cache. But I also want to be immune to an attack like the
following:

One core contains your secret bit, and somehow the secret bit is
accessed speculatively (there is some gadget in your code that the
attacker can use). A load depends on the secret bit: if it is clear,
the load accesses data in the L1 cache, if it is set, it accesses data
in the shared L3 cache.

Another core is under control of the attacker, and the attacker
constantly accesses the L3 cache speculatively, and measures the
bandwidth he sees. If he does not get full bandwidth, he knows
someone else has accessed the L3 cache. Under certain circumstances,
this allows him to know that your secret bit is set. And if he gets
the full bandwidth in the time window when he knows you should be
doing the access, he knows that your secret bit is clear.

What can we do against this attack?

Divide the bandwidth such that every one of N cores gets only 1/Nth of
the bandwidth.

This seems overly restrictive for non-speculative accesses (we
typically use software measures to avoid such problems for important
secrets there), so my next idea is that non-speculative accesses can
make use of the full bandwidth (crowding out speculative accesses),
but speculative accesses can use only every 1/Nth time slot (and only
if that time slot is not taken by a non-speculative access), with each
time slot belonging exclusively to one core as far as speculative
accesses are concerned. So a core can never measure if there was a
speculative access by another core.
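As a toy illustration of that slot rule (my reading of the proposal,
not any real hardware; all names are made up):

```c
#include <stdbool.h>

enum { NCORES = 4 };

/* Time slots are numbered 0, 1, 2, ...  A non-speculative request may
   use any slot; a speculative request from core c may use slot t only
   if t % NCORES == c and no non-speculative request claims that slot.
   A core therefore cannot tell, from which slots it is granted,
   whether any other core issued speculative accesses. */
bool slot_granted(unsigned t, unsigned core, bool speculative,
                  bool nonspec_pending)
{
    if (!speculative)
        return true;            /* non-speculative: full bandwidth */
    if (nonspec_pending)
        return false;           /* crowded out by non-speculative traffic */
    return t % NCORES == core;  /* own slot only */
}
```

With N = 4, each core's speculative traffic is confined to a fixed
quarter of the slots, which is the 1/N bandwidth bound described above.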

For per-core caches something like that has to be done for snooping
accesses, but with many more time slots for the core the cache belongs
to.

The question is how much performance that costs. It seems to me that
the bandwidth-heavy code (HPC code) performs mostly non-speculative
accesses, and most other code tends to work well with caches, so a CPU
with large per-core caches should not suffer much from this
countermeasure.

One thing that this idea does not help against is an attack from the
same core (say, some sandboxed code under the attacker's control
speculatively accesses data outside the sandbox). Javascript works
against that by making measurements imprecise, but I would prefer that
such an attack be impossible.

However, AFAIK security researchers have not demonstrated Spectre
attacks that use resource contention as a side channel yet, only
persistent microarchitectural state, which is much less ephemeral. So
if we fix the latter kinds of attacks in the way outlined by Mitch
Alsup and me, we would gain a lot. Then security researchers could
put their sights on the (harder) attacks involving resource
contention, and we will see if they succeed, and if so, can enable the
chicken bit for the fix I outlined above:-).

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Spectre and resource contention

<jwvbkvr29vb.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=25436&group=comp.arch#25436
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Spectre and resource contention
Date: Fri, 20 May 2022 17:07:21 -0400
Organization: A noiseless patient Spider
Lines: 15
Message-ID: <jwvbkvr29vb.fsf-monnier+comp.arch@gnu.org>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org>
<2022May16.080143@mips.complang.tuwien.ac.at>
<a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com>
<2022May16.175019@mips.complang.tuwien.ac.at>
<c17493ff-4a3d-4e90-bbbd-00f1ec658d23n@googlegroups.com>
<2022May17.110354@mips.complang.tuwien.ac.at>
<2022May17.205327@mips.complang.tuwien.ac.at>
<nkOhK.5260$wIO9.4690@fx12.iad>
<2022May20.185040@mips.complang.tuwien.ac.at>
<ba27adf3-9b85-4392-b5b9-6afdfc8db1cbn@googlegroups.com>
<2022May20.205631@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="850bbd683bf1e53e42ee44f7313a6063";
logging-data="7814"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+0DVwdMGG1IZ3LovxzCg83"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:M0PIBqe2yV73TsHfLNAt1ig5/WE=
sha1:lW6yQ9pMqsM6FhW0DHBR5TStBIo=
 by: Stefan Monnier - Fri, 20 May 2022 21:07 UTC

> Another core is under control of the attacker, and the attacker
> constantly accesses the L3 cache speculatively, and measures the
> bandwidth he sees. If he does not get full bandwidth, he knows
> someone else has accessed the L3 cache. Under certain circumstances,
> this allows him to know that your secret bit is set. And if he gets
> the full bandwidth in the time window when he knows you should be
> doing the access, he knows that your secret bit is clear.

Why does the attacker need to (constantly) access L3 speculatively?
Couldn't they just as well access L3 (constantly) non-speculatively?

[ If so, your counter measure wouldn't work, right? ]

Stefan

Re: Spectre and resource contention (was: branchless binary search)

<18bd9df5-c3d1-460b-8327-99f85b7a88d4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=25437&group=comp.arch#25437

From: MitchAl...@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Spectre and resource contention (was: branchless binary search)
Date: Fri, 20 May 2022 15:39:45 -0700 (PDT)
Message-ID: <18bd9df5-c3d1-460b-8327-99f85b7a88d4n@googlegroups.com>
 by: MitchAlsup - Fri, 20 May 2022 22:39 UTC

On Friday, May 20, 2022 at 2:40:12 PM UTC-5, Anton Ertl wrote:
> MitchAlsup <Mitch...@aol.com> writes:
> >On Friday, May 20, 2022 at 12:24:54 PM UTC-5, Anton Ertl wrote:
> >> However, we may significantly reduce the bandwidth of speculative
> >> accesses in order to avoid speculative side channels through resource
> >> contention. In that case, if there is only one cache line fetched
> >> speculatively, that will be useful 50% of the time.
> ><
> >No need to reduce BW, you just need somewhere other than the cache to
> >hold the data (and forward other units of data) until the LD passes the
> >completion point. Most processors have something like "Miss Buffers"
> >where all this can be done without adding any data-path HW (there is a bit
> >of added sequencing logic).
> That helps against attacks that then test for the presence of some
> data in a cache. But I also want to be immune to an attack like the
> following:
>
> One core contains your secret bit, and somehow the secret bit is
> accessed speculatively (there is some gadget in your code that the
> attacker can use). A load depends on the secret bit: if it is clear,
> the load accesses data in the L1 cache, if it is set, it accesses data
> in the shared L3 cache.
<
Phraseology:: a LD either hits in L1 or it does not (and so on).
I fail to see how the secret bit is present in the core being attacked
unless the core performs a flush to L3 of a line recently in L1.
{So, you prevent this attack by not flushing.}
<
I can see how an attacker might conspire to move a line from L1 cache
to L3 cache if the implementation has a "generous" snoop policy.
But snoops, being as expensive as they are, almost invariably
use "precise" snooping. And under precise snooping, the attacker
and the attacked would have to be sharing address space for the
attacker to migrate a cache line from L1 to L3.
<
{Also note: in my current designs, L3 is a write back and prefetch
buffer for DRAM that is snoopable by cores which miss in L1 and
L2....}
<
Is the secret bit visible to the instruction stream prior to accessing L1?
Why is the secret bit present in the core at all?
<
I also fail to see how background I/O would not ALSO be a source
of noise to the attacker. Coherence traffic is another source of
timing irregularities.
<
But let us assume that there are explanations for the above......
>
> Another core is under control of the attacker, and the attacker
> constantly accesses the L3 cache speculatively, and measures the
> bandwidth he sees. If he does not get full bandwidth, he knows
> someone else has accessed the L3 cache. Under certain circumstances,
> this allows him to know that your secret bit is set. And if he gets
> the full bandwidth in the time window when he knows you should be
> doing the access, he knows that your secret bit is clear.
>
> What can we do against this attack?
<
Build the L3 using mainframe memory banking. As long as the multiple
cores are accessing independent banks of L3, everybody sees full bandwidth.
Given 4MB of L3, one could have between 16 and 128 banks for a chip
containing 8-32 cores.
<
Generally, a given core has only a handful of miss buffers (say 8).
......so we have a fundamental limit of 8 outstanding references per core
L1 cache is 1 cycle away with 1 cycle throughput (cache access width)
L2 cache is 10+ cycles away with 1 access per 8 cycles (cache line)
......L2 generally has multiple banks but this is used to save power.
L3 cache is 40+ cycles away with 1 access per 8 cycles (cache line)
......but L3 has multiple banks so multiple accesses are concurrent.
......total available L3 BW is typically bound by the generation of
......snoop traffic, not by cache delivery.
DRAM is 150+ cycles away when none of the caches hit (30ns gives
20ns for DRAM access and 10ns to route access to DRAM controller
and to route data back to requestor.) Coherence traffic is going to
be closer to 250 cycles (core access to knowing which data is
useable.)
<
Here the attacking core only gets 8/40 (1/5) of the available bandwidth
even if the attacker is the only core enabled to run code; once you factor
in coherence traffic, a given core accessing cacheable data will see even
less BW.
<
So, we have a fundamental limit of 8 outstanding references per core
and a coherence limitation of ~250 cycles to send out all the accesses
and receive back all of the resulting coherence traffic. So, any given core
can only "do" about 1 access every 32 cycles.
<
And basically, this leaves plenty of available BW in multi-banked L3
so that temporal measurements are going to be very difficult.
<
The above was written as if cache lines are transmitted in 4-8 beat bursts.
Transmitting an entire cache line in a single beat does not change the
above numbers much, maybe taking 6 cycles out of each layer in the
caches/DRAM except L1 and L2.
>
> Divide the bandwidth such that every one of N cores gets only 1/Nth of
> the bandwidth.
<
Nada gonna fly.........but it happens naturally........so, don't worry about this.
>
> This seems overly restrictive for non-speculative accesses (we
> typically use software measures to avoid such problems for important
> secrets there), so my next idea is that non-speculative accesses can
> make use of the full bandwidth (crowding out speculative accesses),
> but speculative accesses can use only every 1/Nth time slot (and only
> if that time slot is not taken by a non-speculative access), with each
> time slot belonging exclusively to one core as far as speculative
> accesses are concerned. So a core can never measure if there was a
> speculative access by another core.
<
Given a multi-banked L3, one faces the exact same problem as faced by
CRAY 1/XMP 8 processor system. The first 6 processors could saturate
the banking of memory (3 accesses per cycle × 6 processors > banks).
The Cray 1/YMP solved this problem by having a priority mechanism at
the bank selection, so it would take a pending request over a newly
arriving request any time the bank was available to start a new access.
<
Presto: fairly banked large L3 where everyone gets essentially equal access
with small time variances. Fairly, here, means everybody gets rather equal
BW per bank under bank collisions.
>
> For per-core caches something like that has to be done for snooping
> accesses, but with many more time slots for the core the cache belongs
> to.
<
One can deal with the snoops by replicating cache tags.
What one cannot do is shorten the latency of collecting snoop information
from all layers of the cache hierarchy--or make these have guaranteed
latency (or even a short span of latencies). This is a routing problem,
not a latency problem.
>
> The question is how much performance that costs. It seems to me that
> the bandwidth-heavy code (HPC code) performs mostly non-speculative
> accesses, and most other code tends to work well with caches, so a CPU
> with large per-core caches should not suffer much from this
> countermeasure.
<
A) it's been a long time since someone made an interconnect for HPC
access patterns that was not a GPU. Back in the CRAY days, there was
no cache, and no lack of cash to build suitable memory systems.
<
B) Your typical CPU caches see 95% code hit rates, 90% data hit rates
in L1. L2 cleans up another 4× {making this 8× causes too much added
L2 latency to "pay off" in the long run for "typical codes"}.
<
C) on the other hand, DRAM is still "lots" of latency cycles away, and is
generally accessed by EVERYONE, sharing L3 storage,....
<
D) The GPU I worked on used 2×1024-bit wide busses per "shader" core
(1 inbound, 1 outbound) whereas current cache lines are only 512-bits !!
<
So, this is what current implementations are faced with: Developing a
cache and routing hierarchy that provides a good illusion that every core
is essentially independent of every other core and treated fairly.
<
So, at this point, I know I have not given a solution to the issue raised,
merely outlined the current state of affairs. BUT:
<
1) it seems to me that when DRAM is doing any autonomous (or
programmatic) prefetching, this adds noise to L3 measurements
2) multi-banking makes the noise harder to measure
3) routing adds more noise
4) DRAM adds still more noise
5) L3 can be used to mitigate Meltdown (multiple writes get absorbed
by L3 and DRAM sees but 1 single write--which prevents writing to
the same 'word' of DRAM more than "every once in a while", and DRAM
manufacturers don't have to do anything "new" about Meltdown)
<
But this still leaves me concerned that an attack is possible.
>
> One thing that this idea does not help against is an attack from the
> same core (say, some sandboxed code under the attacker's control
> speculatively accesses data outside the sandbox). Javascript works
> against that by making measurements imprecise, but I would prefer it if
> such an attack were impossible.
<
Consider a JavaScript application which creates a native-code trojan
and transfers control to the trojan in order to read the high precision clock!?!
<
It is because of strategies like this that I dislike a page of memory being
writable and executable at the same time.
>
> However, AFAIK security researchers have not demonstrated Spectre
> attacks that use resource contention as a side channel yet, only
> persistent microarchitectural state, which is much less ephemeral. So
<
Likely because these resources have inherent noise in the measurements
AND because there is lots of other system noise being inserted randomly.
<
> if we fix the latter kinds of attacks in the way outlined by Mitch
> Alsup and me, we would gain a lot. Then security researchers could
> put their sights on the (harder) attacks involving resource
> contention, and we will see if they succeed, and if so, can enable the
> chicken bit for the fix I outlined above:-).
<
Agreed.
<
I wonder what would happen if the high precision timer was incremented
per clock and per route. That is::
next HPT = current HPT + 1 + routes;
where routes is the number of unique routing decisions made per clock in
the interconnect fabric (maybe just to the L3 or DRAM). Aggregating
noise into the HPT still enables it to be used for most of the stuff it is
currently used for that we find acceptable.
<
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>


Re: Spectre and resource contention

<32ba58a2-e39b-4035-bc06-0e664500fba9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=25438&group=comp.arch#25438

From: MitchAl...@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Spectre and resource contention
Date: Fri, 20 May 2022 15:47:27 -0700 (PDT)
Message-ID: <32ba58a2-e39b-4035-bc06-0e664500fba9n@googlegroups.com>
 by: MitchAlsup - Fri, 20 May 2022 22:47 UTC

On Friday, May 20, 2022 at 4:07:25 PM UTC-5, Stefan Monnier wrote:
> > Another core is under control of the attacker, and the attacker
> > constantly accesses the L3 cache speculatively, and measures the
> > bandwidth he sees. If he does not get full bandwidth, he knows
> > someone else has accessed the L3 cache. Under certain circumstances,
> > this allows him to know that your secret bit is set. And if he gets
> > the full bandwidth in the time window when he knows you should be
> > doing the access, he knows that your secret bit is clear.
>
> Why does the attacker need to (constantly) access L3 speculatively?
<
Basically, everything (other than the first 5 instructions executed) is
speculative. One STILL has speculative instructions in the pipeline after
MOST branch mispredictions. So, there is no getting away from this.
Just assume everything is always speculative, even though a small
percentage is not.
<
> Couldn't they just as well access L3 (constantly) non-speculatively?
<
Given 40 cycles to L3, and knowing you are missing in L1 and L2
pretty much makes you non-speculative. (Branch predictions have
resolved by the time L3 returns data.)
>
> [ If so, your counter measure wouldn't work, right? ]
<
The proposed countermeasure is a red herring (as far as I understand
the proposal).
>
>
> Stefan

Re: Spectre and resource contention

<2022May21.103752@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=25439&group=comp.arch#25439

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Spectre and resource contention
Date: Sat, 21 May 2022 08:37:52 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2022May21.103752@mips.complang.tuwien.ac.at>
 by: Anton Ertl - Sat, 21 May 2022 08:37 UTC

Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> Another core is under control of the attacker, and the attacker
>> constantly accesses the L3 cache speculatively, and measures the
>> bandwidth he sees. If he does not get full bandwidth, he knows
>> someone else has accessed the L3 cache. Under certain circumstances,
>> this allows him to know that your secret bit is set. And if he gets
>> the full bandwidth in the time window when he knows you should be
>> doing the access, he knows that your secret bit is clear.
>
>Why does the attacker need to (constantly) access L3 speculatively?
>Couldn't they just as well access L3 (constantly) non-speculatively?

My countermeasure gives priority to non-speculative accesses, so the
speculative access then just would not happen, and the attacker could
not measure it.

>[ If so, your counter measure wouldn't work, right? ]

The attack would not work in the presence of my countermeasure. For
non-speculative measurement accesses it would not work because the
non-speculative accesses get priority; for speculative measurement
accesses, it would not work, because they use different access slots
than the attacked process.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Spectre and resource contention (was: branchless binary search)

<2022May21.104922@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=25440&group=comp.arch#25440

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Spectre and resource contention (was: branchless binary search)
Date: Sat, 21 May 2022 08:49:22 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2022May21.104922@mips.complang.tuwien.ac.at>
 by: Anton Ertl - Sat, 21 May 2022 08:49 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Friday, May 20, 2022 at 2:40:12 PM UTC-5, Anton Ertl wrote:
>> One core contains your secret bit, and somehow the secret bit is
>> accessed speculatively (there is some gadget in your code that the
>> attacker can use). A load depends on the secret bit: if it is clear,
>> the load accesses data in the L1 cache, if it is set, it accesses data
>> in the shared L3 cache.
><
>Phraseology:: a LD either hits in L1 or it does not (and so on)
>I fail to see how secret bit is present in the core being attacked
>unless the core performs a Flush to L3 on a line recently in L1.
>{So, you prevent this attack by not flushing.}

More precisely, the core can access the bit speculatively through some
gadget; this is just the same as the first speculative load in
Spectre V1 or Spectre V2, only the side channel afterwards is
different.

The side channel is not about any speculative changes to the cache
hierarchy (these are prevented by our fix for that side channel), but
about the fact that (without countermeasures) you can measure whether
a different core accesses a shared cache or main memory by observing
the bandwidth you get when you access them yourself.

>I can see how attacker might conspire to move L1 cache to L3
>cache if the implementation has a "generous" snoop policy.
>But snoops, being as expensive as they are, almost invariably
>use "precise" snooping. But under precise snooping, attacker
>and attacked would have to be sharing address space for the
>attacker to migrate a cache line from L1 to L3.

That's a different potential side channel one also would have to take
care of (but does not look that hard to close). Many processes share
physical memory in the form of shared libraries (and that would be
enough for that effect, no?).

>I also fail to see how background I/O would not ALSO be a source
>of noise to the attacker. Coherence traffic is another source of
>timing irregularities.

There are certainly lots of things going on in the CPU that make such
attacks hard, but would you bet that they make them impossible? The
typical answer by security researchers is to repeat the whole thing n
times, and the signal increases by a factor of n, while the noise
increases only by (IIRC) a factor of sqrt(n), so at some n the signal
is clearly distinguishable from the noise.

>Build the L3 using mainframe memory banking. As long as the multiple
>cores are accessing independent banks of L3, everybody sees full bandwidth.
>Given 4MB of L3, one could have between 16 and 128 banks for a chip
>containing 8-32 cores.

Of course the attacker will try to access the same bank.

But sure, increasing the resources (in this case composite bandwidth)
is a way to mostly avoid the problem that my countermeasure hurts
performance.

>And basically, this leaves plenty of available BW in multi-banked L3
>so that temporal measurements are going to be very difficult.

If there were so many resources that it would be impossible, that
would be a way to avoid resource contention side channels. My
countermeasure idea is based on trying to get that effect without
needing to provide so many resources.

>Given a multi-banked L3, one faces the exact same problem as faced by
>CRAY 1/XMP 8 processor system. The first 6 processors could saturate
>the banking of memory (3 accesses per cycle × 6 processors > banks).
>The Cray 1/YMP solved this problem by having a priority mechanism at
>the bank selection, so it would take a pending request over a newly
>arriving request any time the bank was available to start a new access.
><
>Presto: fairly banked large L3 where everyone gets essentially equal access
>with small time variances. Fairly, here, means everybody gets rather equal
>BW per bank under bank collisions.

The problem at hand is not fairness, but side channels. Of course you
still want fairness (and I think that's necessary to avoid side
channels). But I am willing to sacrifice performance to avoid
speculative side channels, and once I have achieved that, I think
about how the memory system needs to look to minimize the performance
impact.

>So, this is what current implementations are faced with: Developing a
>cache and routing hierarchy that provides a good illusion that every core
>is essentially independent of every other core and treated fairly.

s/essentially//

>So, at this point, I know I have not given a solution to the issue raised,
>merely outlines the current state of affairs. BUT:
><
>1) it seems to me that when DRAM is doing any automomous (or
>programatic) prefetching, this adds noise to L3 measurements
>2) multi-banking makes the noise harder to measure
>3) routing adds more noise
>4) DRAM adds still more noise

Adding noise makes the attack harder, but not impossible.

>5) L3 can be used to mitigate Meltdown (multiple writes get absorbed
>by L3 and DRAM sees but 1 single write--which prevents writing to
>the same 'word' of DRAM more than "every once in a while". and DRAM
>manufactures don't have to do anything "new" about Meltdown)

Are you thinking about Rowhammer? I think they use instructions like
CLFLUSH or non-temporal accesses to get around that.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Spectre and resource contention (was: branchless binary search)

<8f25086f-ab43-403c-85d6-ccf0cc21a183n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=25441&group=comp.arch#25441

From: MitchAl...@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Spectre and resource contention (was: branchless binary search)
Date: Sat, 21 May 2022 11:06:35 -0700 (PDT)
Message-ID: <8f25086f-ab43-403c-85d6-ccf0cc21a183n@googlegroups.com>
 by: MitchAlsup - Sat, 21 May 2022 18:06 UTC

On Saturday, May 21, 2022 at 4:36:49 AM UTC-5, Anton Ertl wrote:
> MitchAlsup <Mitch...@aol.com> writes:

> >5) L3 can be used to mitigate Meltdown (multiple writes get absorbed
> >by L3 and DRAM sees but 1 single write--which prevents writing to
> >the same 'word' of DRAM more than "every once in a while". and DRAM
> >manufactures don't have to do anything "new" about Meltdown)
<
> Are you thinking about Rowhammer? I think they use instructions like
> CLFLUSH or non-temporal accesses to get around that.
<
Yes, s/meltdown/rowhammer/
<
No, you missed my point. When the L3 is the read and write buffer
to DRAM, then one can CLFLUSH as many times as one likes, but
DRAM only gets written once. The L3 gets written k times, DRAM 1
time. Nobody gets to DRAM without going through the L3, and L3
absorbs all the CLFLUSH writes, and retains a single pending write
to DRAM (which it gets around to sooner or later; mostly later).
<
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: branchless binary search

<5ed3c521-f7de-4774-9029-28a53b8f0a9en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=25442&group=comp.arch#25442

From: already5...@yahoo.com (Michael S)
Newsgroups: comp.arch
Subject: Re: branchless binary search
Date: Sat, 21 May 2022 11:24:24 -0700 (PDT)
Message-ID: <5ed3c521-f7de-4774-9029-28a53b8f0a9en@googlegroups.com>
 by: Michael S - Sat, 21 May 2022 18:24 UTC

On Friday, May 20, 2022 at 7:53:33 PM UTC+3, Stephen Fuld wrote:
> On 5/20/2022 9:25 AM, Michael S wrote:
> > On Friday, May 20, 2022 at 6:19:51 PM UTC+3, EricP wrote:
> >> Anton Ertl wrote:
> >>> an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> >>>> Michael S <already...@yahoo.com> writes:
> >>>>> On Monday, May 16, 2022 at 7:07:29 PM UTC+3, Anton Ertl wrote:
> >>>>>> and a "branchless" variant is much faster.
> >>>>> The CPU in question has no CMOVE/Select.
> >>>> For binary search it's relatively cheap to do without (at least
> >>>> someone showed code here many years ago that looked convincing).
> >>>
> >>> I searched around a bit, and found the following code in
> >>> <https://stackoverflow.com/questions/11360831/about-the-branchless-binary-search>
> >>> by BeeOnRope (Travis Downs):
> >>>
> >>> int needle; // value we are searching for
> >>> int *base = ...; // base pointer
> >>> int n; // number of elements in the current region
> >>> while (n > 1) {
> >>>     int middle = n/2;
> >>>     base += (needle < base[middle]) ? 0 : middle;
> >>>     n -= middle;
> >>> }
> >>>
> >>> (The context indicates that you should verify this code before use.)
> >>>
> >>> And of course (x<y)?0:middle is equivalent to ((x<y)-1)&middle, but
> >>> the compiler probably knows this.
> >>>
> >>> The more interesting part is that the posting then goes on to point
> >>> out that this branchless code is slower than branching code in the
> >>> cases where the loads miss the first few cache levels, even if the
> >>> branches are unpredictable. That's because the branchless code makes
> >>> every load depend on the previous load, while in the branching code
> >>> there is no such dependency if the prediction is correct (50% of the
> >>> time in the unpredictable case, more for better predictability).
> >> The branching version depends on flooding the memory hierarchy with
> >> speculative requests to get as many overlapped cache line fetches
> >> going at once as possible, and then on average uses maybe 50% of
> >> those fetches, but that is still a win.
> >>
> >> Now if we de-Spectre-ize the memory subsystem so that it cannot
> >> fetch across memory hierarchy levels under speculation,
> >> all that "win" disappears and the two should be the same.
> >>
> >> That could also indicate that all loop unrolling might take a hit.
> >>> Downsides of the branching code in the unpredictable case:
> >>>
> >>> * lots of fetching down the predicted path, most of which is probably
> >>> canceled.
> >>>
> >>> * in the mispredict case an iteration takes the load latency (in order
> >>> to be able to resolve the prediction) plus the mispredict penalty
> >>> (plus some minor change).
> >>>
> >>> * in the mispredict case you have speculatively loaded a cache line
> >>> that you don't need, and evicted a probably more useful cache line.
> >>>
> >>> You can add prefetches or actual loads to the branchless variant:
> >>> E.g., always prefetch the whole left spine down to the leaf (if you
> >>> view it like a search tree). In the unpredictable case this is as
> >>> accurate (and as wasteful) as the branching code, and the performance
> >>> may be a little better (no mispredict penalty). Alternatively,
> >>> prefetch one level of both sides. This has the same effect of halving
> >>> the recurrence latency, but the load unit and cache eviction cost is
> >>> probably lower. You could prefetch further, but the costs rise
> >>> exponentially, while the benefits shrink with each additional level.
> >> We would need to allow explicit prefetches done under speculation
> >> to not be blocked by de-Spectre-ization of memory hierarchy.
> >
> > de-Spectre-ization is a pipe dream. Not going to happen.
> I am not sure what you are saying here. ISTM that statement could mean
> one of three things.
>
> 1. It is impossible to design any OoO CPU, even from scratch that is not
> susceptible to Spectre. I think Mitch and others would disagree,
>
> 2. It is not possible to design an OoO X86 compatible CPU that is not
> susceptible.
>
> 3. It is possible but economic concerns dictate that it will not happen.
> These concerns might include reduced performance or increased power as
> well as increased product cost, or even just not worth the design
> resources to implement.
>
> Which one (or a different one) are you saying?
>

The 3rd. Except that I don't see increased power as major concern.
Reduced performance - yes.
Increased area - yes.
Not worth design resources - yes, yes, yes
Also, futility of the effort.
A new side channel with the same effect except, maybe, slightly lower
bandwidth, will be found within 6 months after you release your
Spectre-proof CPU. That is, if it becomes popular; otherwise nobody would care.

IMHO, all people willing to understand the obvious already know that if they want
their secrets secure then they shouldn't run an attacker's binary code on the same
machine as sensitive data. And that they should run JITted attacker's code as little
as possible, preferably not just in a different process from the sensitive data but
in a different trust domain (i.e., in the Windows/*IX trust model, a different account);
and if the data is seriously sensitive then they shouldn't run JITted attacker's code at all.
It is that simple. Anybody who claims otherwise is lying.
Don't believe him even if his name is Jeff Bezos. Or especially if his name is Jeff Bezos.

But if one's data are of lesser importance, he doesn't have to care about Spectre-type attacks,
because launching such an attack is not easy for the attacker. If he is attacked at all, it won't
be through Spectre or some other sexy side-channel method, but through something much more
primitive. And there is a big chance that his defenses won't be good enough even against
those primitive old-fashioned attacks.

>
>
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)

Re: branchless binary search

<t6bek1$tm6$1@newsreader4.netcologne.de>


https://www.novabbs.com/devel/article-flat.php?id=25443&group=comp.arch#25443

From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: branchless binary search
Date: Sat, 21 May 2022 19:30:09 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <t6bek1$tm6$1@newsreader4.netcologne.de>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org>
<t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me>
<t5rg76$hp1$1@newsreader4.netcologne.de>
<2022May16.080143@mips.complang.tuwien.ac.at>
<a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com>
<2022May16.175019@mips.complang.tuwien.ac.at>
<c17493ff-4a3d-4e90-bbbd-00f1ec658d23n@googlegroups.com>
<2022May17.110354@mips.complang.tuwien.ac.at>
<2022May17.205327@mips.complang.tuwien.ac.at>
<nkOhK.5260$wIO9.4690@fx12.iad>
<89da1370-6a58-47d6-bfd0-7a32b1c2934dn@googlegroups.com>
<t68h2a$ruq$1@dont-email.me>
<5ed3c521-f7de-4774-9029-28a53b8f0a9en@googlegroups.com>
Injection-Date: Sat, 21 May 2022 19:30:09 -0000 (UTC)
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sat, 21 May 2022 19:30 UTC

Michael S <already5chosen@yahoo.com> schrieb:

> IMHO, all people willing to understand the obvious already know that if they want
> their secrets secure then they shouldn't run an attacker's binary code on the same
> machine as sensitive data. And that they should run JITted attacker's code as little
> as possible, preferably not just in a different process from the sensitive data but
> in a different trust domain (i.e., in the Windows/*IX trust model, a different account);
> and if the data is seriously sensitive then they shouldn't run JITted attacker's code at all.
> It is that simple. Anybody who claims otherwise is lying.
> Don't believe him even if his name is Jeff Bezos. Or especially if his name is Jeff Bezos.

Amen.

>
> But if one's data are of lesser importance, he doesn't have to care about Spectre-type attacks,
> because launching such an attack is not easy for the attacker. If he is attacked at all, it won't
> be through Spectre or some other sexy side-channel method, but through something much more
> primitive. And there is a big chance that his defenses won't be good enough even against
> those primitive old-fashioned attacks.

https://xkcd.com/2166/ comes to mind.

Re: Ill-advised use of CMOVE

<m-6dnanyXqYjmxf_nZ2dnUU7-dPNnZ2d@supernews.com>


https://www.novabbs.com/devel/article-flat.php?id=25444&group=comp.arch#25444

NNTP-Posting-Date: Sun, 22 May 2022 04:47:42 -0500
Sender: Andrew Haley <aph@zarquon.pink>
From: aph...@littlepinkcloud.invalid
Subject: Re: Ill-advised use of CMOVE
Newsgroups: comp.arch
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58n6r$1riu$1@gioia.aioe.org> <t5e1ek$s7c$1@dont-email.me> <2022May10.194427@mips.complang.tuwien.ac.at> <0555194d-edc8-48fb-b304-7f78d62255d3n@googlegroups.com> <hlPeK.4263$pqKf.2583@fx12.iad> <2022May14.100411@mips.complang.tuwien.ac.at> <IZ8gK.6876$j0D5.5549@fx09.iad> <2022May15.194522@mips.complang.tuwien.ac.at> <0d201c5c-10a9-4954-994f-559e2adf9fc4n@googlegroups.com> <OQrgK.3333$6y5f.202@fx40.iad> <ZEwgK.60130$qMI1.19308@fx96.iad>
User-Agent: tin/1.9.2-20070201 ("Dalaruan") (UNIX) (Linux/4.18.0-348.12.2.el8_5.x86_64 (x86_64))
Message-ID: <m-6dnanyXqYjmxf_nZ2dnUU7-dPNnZ2d@supernews.com>
Date: Sun, 22 May 2022 04:47:42 -0500
Lines: 12
 by: aph...@littlepinkcloud.invalid - Sun, 22 May 2022 09:47 UTC

EricP <ThatWouldBeTelling@thevillage.com> wrote:
>
> I can't find any details on those ARM uArch's at all. None.
> Almost all predication studies are proposals by Intel for doing OoO Itanium.

Looking at the neoverse N2 Software Optimization Guide, I see that
basic ALU ops have 1 clock latency, 4 ops/clock throughput. Basic ALU
ops with condition run at only 1 op/clock. These predicated
instructions run on the Integer Multicycle Unit, also used for things
like multiplication, rather than the standard Integer Unit.

Andrew.

Spectre fix (was: branchless binary search)

<2022May22.110702@mips.complang.tuwien.ac.at>


https://www.novabbs.com/devel/article-flat.php?id=25445&group=comp.arch#25445

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Spectre fix (was: branchless binary search)
Date: Sun, 22 May 2022 09:07:02 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 71
Message-ID: <2022May22.110702@mips.complang.tuwien.ac.at>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com> <2022May16.175019@mips.complang.tuwien.ac.at> <c17493ff-4a3d-4e90-bbbd-00f1ec658d23n@googlegroups.com> <2022May17.110354@mips.complang.tuwien.ac.at> <2022May17.205327@mips.complang.tuwien.ac.at> <nkOhK.5260$wIO9.4690@fx12.iad> <89da1370-6a58-47d6-bfd0-7a32b1c2934dn@googlegroups.com> <t68h2a$ruq$1@dont-email.me> <5ed3c521-f7de-4774-9029-28a53b8f0a9en@googlegroups.com>
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sun, 22 May 2022 09:07 UTC

Michael S <already5chosen@yahoo.com> writes:
>On Friday, May 20, 2022 at 7:53:33 PM UTC+3, Stephen Fuld wrote:
>> On 5/20/2022 9:25 AM, Michael S wrote:
>> > de-Spectre-ization is a pipe dream. Not going to happen.
>> 3. It is possible but economic concerns dictate that it will not happen.
>> These concerns might include reduced performance or increased power as
>> well as increased product cost, or even just not worth the design
>> resources to implement.
>>
>> Which one (or a different one) are you saying?
>>
>
[reformatted to satisfy Usenet line length conventions]
>The 3rd. Except that I don't see increased power as major concern.
>Reduced performance - yes.
>Increased area - yes.
>Not worth design resources - yes, yes, yes
>Also, futility of the effort.
>A new side channel with the same effect except, maybe, slightly lower
>bandwidth, will be found within 6 months after you release your
>Spectre-proof CPU.

Possibly. The same could be said about any other vulnerability.
Should we stop fixing them? In contrast to software vulnerabilities,
which seem to have an endless supply given the huge amount of software
written every day, I have the hope that microarchitectural and
hardware vulnerabilities will be much rarer and (where they appear)
have less impact, for the same reasons that other hardware bugs are
rarer and have less impact: because fixing hardware bugs is more
expensive, hardware manufacturers are more careful in avoiding them.

But that hope will only continue to be true if the customers actually
care about these vulnerabilities, otherwise the hardware manufacturers
have no incentive to be careful about them.

So your advocacy of not fixing Spectre is working towards a world
where hardware vulnerabilities are as common as software
vulnerabilities.

>IMHO, all people willing to understand the obvious already know that
>if they want their secrets secure then they shouldn't run an
>attacker's binary code on the same machine as sensitive data.

You apparently are under the illusion (or you want us to fall prey to
that illusion) that your CPU needs to execute any attacker-controlled
code to be vulnerable to Spectre. NetSpectre
<https://martinschwarzl.at/media/files/netspectre.pdf> has
demonstrated already in 2018 that this is not the case:

|In this paper, we present NetSpectre, a new attack based on Spectre,
|requiring no attacker-controlled code on the target device

>But if one's data are of lesser importance, he doesn't have to care
>about Spectre-type attacks

This reminds me of Obi-Wan Kenobi's arrival in Mos Eisley:

|Stormtrooper : Let me see your identification.
|
|Ben Obi-Wan Kenobi : [with a small wave of his hand] You don't need to
| see his identification.
|
|Stormtrooper : We don't need to see his identification.

I am sure the NSA and their counterparts from other countries want us to
believe that we don't have to care about Spectre-type attacks.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Spectre fix (was: branchless binary search)

<14ae34ee-74c0-4211-a5b2-02c8fb06b4c2n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=25446&group=comp.arch#25446

Newsgroups: comp.arch
Date: Sun, 22 May 2022 04:41:17 -0700 (PDT)
In-Reply-To: <2022May22.110702@mips.complang.tuwien.ac.at>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com>
<2022May16.175019@mips.complang.tuwien.ac.at> <c17493ff-4a3d-4e90-bbbd-00f1ec658d23n@googlegroups.com>
<2022May17.110354@mips.complang.tuwien.ac.at> <2022May17.205327@mips.complang.tuwien.ac.at>
<nkOhK.5260$wIO9.4690@fx12.iad> <89da1370-6a58-47d6-bfd0-7a32b1c2934dn@googlegroups.com>
<t68h2a$ruq$1@dont-email.me> <5ed3c521-f7de-4774-9029-28a53b8f0a9en@googlegroups.com>
<2022May22.110702@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <14ae34ee-74c0-4211-a5b2-02c8fb06b4c2n@googlegroups.com>
Subject: Re: Spectre fix (was: branchless binary search)
From: already5...@yahoo.com (Michael S)
Injection-Date: Sun, 22 May 2022 11:41:17 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Michael S - Sun, 22 May 2022 11:41 UTC

On Sunday, May 22, 2022 at 12:53:26 PM UTC+3, Anton Ertl wrote:
> Michael S <already...@yahoo.com> writes:
> >On Friday, May 20, 2022 at 7:53:33 PM UTC+3, Stephen Fuld wrote:
> >> On 5/20/2022 9:25 AM, Michael S wrote:
> >> > de-Spectre-ization is a pipe dream. Not going to happen.
> >> 3. It is possible but economic concerns dictate that it will not happen.
> >> These concerns might include reduced performance or increased power as
> >> well as increased product cost, or even just not worth the design
> >> resources to implement.
> >>
> >> Which one (or a different one) are you saying?
> >>
> >
> [reformatted to satisfy Usenet line length conventions]
> >The 3rd. Except that I don't see increased power as major concern.
> >Reduced performance - yes.
> >Increased area - yes.
> >Not worth design resources - yes, yes, yes
> >Also, futility of the effort.
> >A new side channel with the same effect except, maybe, slightly lower
> >bandwidth, will be found within 6 months after you release your
> >Spectre-proof CPU.
> Possibly. The same could be said about any other vulnerability.
> Should we stop fixing them?

As far as HW is concerned, Spectre is not a vulnerability; it's a feature.
It should be documented rather than fixed.

> In contrast to software vulnerabilities,
> which seem to have an endless supply given the huge amount of software
> written every day, I have the hope that microarchitectural and
> hardware vulnerabilities will be much rarer and (where they appear)
> have less impact, for the same reasons that other hardware bugs are
> rarer and have less impact: because fixing hardware bugs is more
> expensive, hardware manufacturers are more careful in avoiding them.
>
> But that hope will only continue to be true if the customers actually
> care about these vulnerabilities, otherwise the hardware manufacturers
> have no incentive to be careful about them.
>

As a customer, I am not willing to pay for a HW fix for Spectre, even if the payment is small.

> So your advocacy of not fixing Spectre is working towards a world
> where hardware vulnerabilities are as common as software
> vulnerabilities.

Only if we consider Spectre a HW vulnerability. I don't.

> >IMHO, all people willing to understand the obvious already know that
> >if they want their secrets secure then they shouldn't run an
> >attacker's binary code on the same machine as sensitive data.
> You apparently are under the illusion (or you want us to fall prey to
> that illusion) that your CPU needs to execute any attacker-controlled
> code to be vulnerable to Spectre. NetSpectre
> <https://martinschwarzl.at/media/files/netspectre.pdf> has
> demonstrated already in 2018 that this is not the case:

I think I read it back then and was not impressed. I don't want to read it again.

>
> |In this paper, we present NetSpectre, a new attack based on Spectre,
> |requiring no attacker-controlled code on the target device
> >But if one's data are of lesser importance, he doesn't have to care
> >about Spectre-type attacks
> This reminds me of Obi-Wan Kenobi's arrival in Mos Eisley:
>
> |Stormtrooper : Let me see your identification.
> |
> |Ben Obi-Wan Kenobi : [with a small wave of his hand] You don't need to
> | see his identification.
> |
> |Stormtrooper : We don't need to see his identification.
>
> I am sure the NSA and their counterparts from other countries want us to
> believe that we don't have to care about Spectre-type attacks.

I don't know what they want me to believe.
Personally, I believe that not only is it not the weakest link in the hw/sw security of
what I use daily, it does not even make the top 1000 in the list of weakest links.
So maybe the opposite of what you say is true - the bad guys prefer me to believe
that Spectre is serious in order to take my attention away from more obvious
holes.

> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: branchless binary search

<jwv1qwly95l.fsf-monnier+comp.arch@gnu.org>


https://www.novabbs.com/devel/article-flat.php?id=25447&group=comp.arch#25447

From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: branchless binary search
Date: Sun, 22 May 2022 09:50:56 -0400
Organization: A noiseless patient Spider
Lines: 14
Message-ID: <jwv1qwly95l.fsf-monnier+comp.arch@gnu.org>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org>
<t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me>
<t5rg76$hp1$1@newsreader4.netcologne.de>
<2022May16.080143@mips.complang.tuwien.ac.at>
<a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com>
<2022May16.175019@mips.complang.tuwien.ac.at>
<c17493ff-4a3d-4e90-bbbd-00f1ec658d23n@googlegroups.com>
<2022May17.110354@mips.complang.tuwien.ac.at>
<2022May17.205327@mips.complang.tuwien.ac.at>
<nkOhK.5260$wIO9.4690@fx12.iad>
<89da1370-6a58-47d6-bfd0-7a32b1c2934dn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
 by: Stefan Monnier - Sun, 22 May 2022 13:50 UTC

> de-Spectre-ization is a pipe dream. Not going to happen.

I'm also wondering if it'll happen.

So far hardware manufacturers have been rather keen on improving
"security" when it comes to protecting the interest of manufacturers
against "attacks" by the device owners (e.g. HDCP, Secure Boot, the
interest around encrypting the DRAM interface, ...), so maybe
Spectre-proof hardware will only be seriously considered when it's used
for things like leaking the private keys of BluRay, or breaking out of
an iOS jail?

Stefan

Re: branchless binary search

<e08bfb5f-d19d-45fb-905a-082f544970cfn@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=25448&group=comp.arch#25448

Newsgroups: comp.arch
Date: Sun, 22 May 2022 11:07:04 -0700 (PDT)
In-Reply-To: <jwv1qwly95l.fsf-monnier+comp.arch@gnu.org>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58mug$ref$2@newsreader4.netcologne.de>
<t5rbct$rbe$1@dont-email.me> <t5rg76$hp1$1@newsreader4.netcologne.de>
<2022May16.080143@mips.complang.tuwien.ac.at> <a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com>
<2022May16.175019@mips.complang.tuwien.ac.at> <c17493ff-4a3d-4e90-bbbd-00f1ec658d23n@googlegroups.com>
<2022May17.110354@mips.complang.tuwien.ac.at> <2022May17.205327@mips.complang.tuwien.ac.at>
<nkOhK.5260$wIO9.4690@fx12.iad> <89da1370-6a58-47d6-bfd0-7a32b1c2934dn@googlegroups.com>
<jwv1qwly95l.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e08bfb5f-d19d-45fb-905a-082f544970cfn@googlegroups.com>
Subject: Re: branchless binary search
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 22 May 2022 18:07:04 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Sun, 22 May 2022 18:07 UTC

On Sunday, May 22, 2022 at 8:50:59 AM UTC-5, Stefan Monnier wrote:
> > de-Spectre-ization is a pipe dream. Not going to happen.
> I'm also wondering if it'll happen.
>
> So far hardware manufacturers have been rather keen on improving
> "security" when it comes to protecting the interest of manufacturers
> against "attacks" by the device owners (e.g. HDCP, Secure Boot, the
> interest around encrypting the DRAM interface, ...), so maybe
<
And also protecting application vendors against attacks by their customers.
<
> Spectre-proof hardware will only be seriously considered when it's used
> for things like leaking the private keys of BluRay, or breaking out of
> an iOS jail?
>
>
> Stefan
