Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

< jaybonci> actually d-i stands for "divine intervention" ;) -- in #debian-devel


devel / comp.arch / Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)

SubjectAuthor
* Ill-advised use of CMOVEStefan Monnier
+* Re: Ill-advised use of CMOVEThomas Koenig
|`* Re: Ill-advised use of CMOVEStephen Fuld
| `* Re: Ill-advised use of CMOVEThomas Koenig
|  +* Re: Ill-advised use of CMOVEStephen Fuld
|  |`- Re: Ill-advised use of CMOVEaph
|  `* Re: Ill-advised use of CMOVEAnton Ertl
|   +* Re: Ill-advised use of CMOVEMichael S
|   |`* Re: Ill-advised use of CMOVEAnton Ertl
|   | `* Re: Ill-advised use of CMOVEMichael S
|   |  +* Re: Ill-advised use of CMOVEIvan Godard
|   |  |`* Re: Ill-advised use of CMOVEMichael S
|   |  | `* Re: Ill-advised use of CMOVEIvan Godard
|   |  |  `- Re: Ill-advised use of CMOVEMichael S
|   |  `* Re: Ill-advised use of CMOVEAnton Ertl
|   |   +- Re: Ill-advised use of CMOVEMichael S
|   |   `* branchless binary search (was: Ill-advised use of CMOVE)Anton Ertl
|   |    +* Re: branchless binary searchStefan Monnier
|   |    |`- Re: branchless binary searchAnton Ertl
|   |    +- Re: branchless binary searchTerje Mathisen
|   |    `* Re: branchless binary searchEricP
|   |     +* Re: branchless binary searchMichael S
|   |     |+* Re: branchless binary searchStephen Fuld
|   |     ||`* Re: branchless binary searchMichael S
|   |     || +- Re: branchless binary searchThomas Koenig
|   |     || `* Spectre fix (was: branchless binary search)Anton Ertl
|   |     ||  `- Re: Spectre fix (was: branchless binary search)Michael S
|   |     |+* Re: branchless binary searchStefan Monnier
|   |     ||`- Re: branchless binary searchMitchAlsup
|   |     |`* Re: branchless binary searchAndy Valencia
|   |     | `- Re: branchless binary searchTerje Mathisen
|   |     `* Re: branchless binary searchAnton Ertl
|   |      `* Re: branchless binary searchMitchAlsup
|   |       `* Spectre and resource comtention (was: branchless binary search)Anton Ertl
|   |        +* Re: Spectre and resource comtentionStefan Monnier
|   |        |+- Re: Spectre and resource comtentionMitchAlsup
|   |        |`- Re: Spectre and resource comtentionAnton Ertl
|   |        `* Re: Spectre and resource comtention (was: branchless binary search)MitchAlsup
|   |         `* Re: Spectre and resource comtention (was: branchless binary search)Anton Ertl
|   |          `- Re: Spectre and resource comtention (was: branchless binary search)MitchAlsup
|   +* Re: Ill-advised use of CMOVEStephen Fuld
|   |`* binary search vs. hash tables (was: Ill-advised use of CMOVE)Anton Ertl
|   | +* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)Stephen Fuld
|   | |`* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)John Levine
|   | | `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)Michael S
|   | |  `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)Michael S
|   | |   `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)Michael S
|   | |    `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)Michael S
|   | |     `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)Michael S
|   | |      `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)Michael S
|   | |       `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)Michael S
|   | |        `* Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)Brett
|   | |         `- Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)Michael S
|   | `- Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)John Levine
|   +* Re: Ill-advised use of CMOVEEricP
|   |`* Re: Ill-advised use of CMOVEBGB
|   | +* Re: Ill-advised use of CMOVEMitchAlsup
|   | |+* Re: Ill-advised use of CMOVEBGB
|   | ||`* Re: Ill-advised use of CMOVEMitchAlsup
|   | || +* Re: Ill-advised use of CMOVEBGB
|   | || |`- Re: Ill-advised use of CMOVEMitchAlsup
|   | || `- Re: Ill-advised use of CMOVEIvan Godard
|   | |`* Re: Ill-advised use of CMOVEIvan Godard
|   | | `- Re: Ill-advised use of CMOVEMitchAlsup
|   | `- Re: Ill-advised use of CMOVEIvan Godard
|   `- Re: Ill-advised use of CMOVEThomas Koenig
+* Re: Ill-advised use of CMOVETerje Mathisen
|+* Re: Ill-advised use of CMOVEMitchAlsup
||`* Re: Ill-advised use of CMOVETerje Mathisen
|| `* Re: Ill-advised use of CMOVEMitchAlsup
||  `* Re: Ill-advised use of CMOVEMarcus
||   `* Re: Ill-advised use of CMOVEMitchAlsup
||    `* Re: Ill-advised use of CMOVEBGB
||     `* Re: Ill-advised use of CMOVEMitchAlsup
||      `- Re: Ill-advised use of CMOVEBGB
|+- Re: Ill-advised use of CMOVEIvan Godard
|`* Re: Ill-advised use of CMOVEStephen Fuld
| +* Re: Ill-advised use of CMOVEMichael S
| |`* Re: Ill-advised use of CMOVEStephen Fuld
| | `- Re: Ill-advised use of CMOVEAnton Ertl
| +* Re: Ill-advised use of CMOVEStefan Monnier
| |`* Re: Ill-advised use of CMOVEStephen Fuld
| | `* Re: Ill-advised use of CMOVEAnton Ertl
| |  `* Re: Ill-advised use of CMOVEBGB
| |   `* Re: Ill-advised use of CMOVEMitchAlsup
| |    +- Re: Ill-advised use of CMOVEBGB
| |    `* Re: Ill-advised use of CMOVEThomas Koenig
| |     +- Re: Ill-advised use of CMOVEAnton Ertl
| |     `- Re: Ill-advised use of CMOVEMitchAlsup
| +* Re: Ill-advised use of CMOVEAnton Ertl
| |`* Re: Ill-advised use of CMOVEMitchAlsup
| | `* Re: Ill-advised use of CMOVEEricP
| |  +* Re: Ill-advised use of CMOVEMichael S
| |  |`* Re: Ill-advised use of CMOVEEricP
| |  | +- Re: Ill-advised use of CMOVEMitchAlsup
| |  | +* Re: Ill-advised use of CMOVETerje Mathisen
| |  | |`* Re: Ill-advised use of CMOVEEricP
| |  | | `* Re: Ill-advised use of CMOVEMitchAlsup
| |  | |  `- Re: Ill-advised use of CMOVEBGB
| |  | `- Re: Ill-advised use of CMOVEAnton Ertl
| |  `* Re: Ill-advised use of CMOVEAnton Ertl
| `* Re: Ill-advised use of CMOVETerje Mathisen
+- Re: Ill-advised use of CMOVEAnton Ertl
`* Re: Ill-advised use of CMOVEScott Michel

Pages:123456
Re: Ill-advised use of CMOVE

<t5rbct$rbe$1@dont-email.me>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25300&group=comp.arch#25300

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
Date: Sun, 15 May 2022 09:57:00 -0700
Organization: A noiseless patient Spider
Lines: 62
Message-ID: <t5rbct$rbe$1@dont-email.me>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org>
<t58mug$ref$2@newsreader4.netcologne.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 15 May 2022 16:57:01 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="a509753fa6896ee6e789545f2bc1ed99";
logging-data="28014"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18PEWhnQiKE/Zj52HqrWzHpMnaaLaoMyTw="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.9.0
Cancel-Lock: sha1:FUqO1Q7bVzDC+Z1wDYCp+oR2QuM=
In-Reply-To: <t58mug$ref$2@newsreader4.netcologne.de>
Content-Language: en-US
 by: Stephen Fuld - Sun, 15 May 2022 16:57 UTC

On 5/8/2022 8:17 AM, Thomas Koenig wrote:
> Stefan Monnier <monnier@iro.umontreal.ca> schrieb:
>> We recently bumped into a funny performance behavior in Emacs.
>> Some code computing the length of a (possibly circular) list ended up
>> macroexpanded to something like:
>>
>> for (struct for_each_tail_internal li = { list, 2, 0, 2 };
>> CONSP (list);
>> (list = XCDR (list),
>> ((--li.q != 0
>> || 0 < --li.n
>> || (li.q = li.n = li.max <<= 1, li.n >>= USHRT_WIDTH,
>> li.tortoise = (list), false))
>> && EQ (list, li.tortoise))
>> ? (void) (list = Qnil) : (void) 0))
>> len++;
>>
>> `EQ` is currently a slightly costly operation but in this specific case
>> it can be replaced with a plain `==`. When we tried that, the resulting
>> loop ended up running almost twice *slower*.
>>
>> The problem turned out that with the simpler comparison, both GCC and
>> LLVM decided it would be a good idea to use CMOVE for the
>> `?:` operation, which just ends up making the data flow's critical path
>> longer whereas a branch works much better here since it's trivial
>> to predict.
>
> It would probably good to submit a PR for this. I'm not overly
> optimisic about the chances of finding a good heuristic for when
> code can be well predicted by a branch predictor, but it's worth
> a shot.

After going through all of the responses in this thread, I am led to the
following:

CMOV was probably a good idea when it was first developed, but the
development of highly accurate hardware branch predictors has lessened
its utility greatly to where it can frequently be the wrong choice.

As Anton has said, compilers, and humans are not very good at making
branch predictions in most cases, and the cases where they are good
(i.e. infrequent exceptional cases), the hardware does essentially as good.

But as Terje has said there are cases, such as decompressing data, where
CMOV is probably a win. Fortunately, the programmer probably know he is
doing that sort of thing, and thus knows that a branch predictor will
not be effective.

So, I think a good solution is for the compiler to essentially never
decide to issue CMOV as a replacement for a conditional branch for
architectures with good branch predictors (not saying anything here
about predication.). It should also probably ignore the existing hint
pragmas. But there probably should be a pragma to tell the compiler to
use CMOV in the rare case of things like decompression, where the
programmer can realistically know that the hardware branch predictor
will be a bad choice. The documentation should explicitly state when it
should be a good to use, i.e. rarely.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Ill-advised use of CMOVE

<cGagK.5409$x1Wf.800@fx10.iad>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25307&group=comp.arch#25307

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx10.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58n6r$1riu$1@gioia.aioe.org> <t5e1ek$s7c$1@dont-email.me> <2022May10.194427@mips.complang.tuwien.ac.at> <0555194d-edc8-48fb-b304-7f78d62255d3n@googlegroups.com> <hlPeK.4263$pqKf.2583@fx12.iad> <2022May14.100411@mips.complang.tuwien.ac.at> <49f15662-d5c6-432b-bca3-ec0b1cd820ddn@googlegroups.com> <JZ8gK.6877$j0D5.5934@fx09.iad> <749dacbf-67e9-4121-82be-cb5fe54105cen@googlegroups.com>
In-Reply-To: <749dacbf-67e9-4121-82be-cb5fe54105cen@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 79
Message-ID: <cGagK.5409$x1Wf.800@fx10.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sun, 15 May 2022 17:23:20 UTC
Date: Sun, 15 May 2022 13:22:20 -0400
X-Received-Bytes: 4467
 by: EricP - Sun, 15 May 2022 17:22 UTC

MitchAlsup wrote:
> On Sunday, May 15, 2022 at 10:27:42 AM UTC-5, EricP wrote:
>> MitchAlsup wrote:
>>> On Saturday, May 14, 2022 at 3:39:37 AM UTC-5, Anton Ertl wrote:
>>>> EricP <ThatWould...@thevillage.com> writes:
>>>>> MitchAlsup wrote:
>>>>>> On Tuesday, May 10, 2022 at 1:09:01 PM UTC-5, Anton Ertl wrote:
>>>>>>> In an OoO machine, this can mean several different things.
>>>>>>>
>>>>>>> * high resource usage. On the 21264, CMOV takes two
>>>>>>> microinstructions.
>>>>> For CMOV 1 or 2 extra uOps in a 100+ instruction queue is not a
>>>>> problem but for full predication this approach would not do
>>>>> and it requires smarter uOps.
>>> <
>>>> Yes, I doubt that the extra uOps cause a slowdown in most cases and a
>>>> significant slowdown in the rest. And I think that the independence
>>>> of the other source is so important for performance that this is a
>>>> good approach for predication, too.
>>> <
>>>>> In the Alpha 21264 case with its 2-uOp CMOV approach, each uOp can
>>>>> proceed independently when the condition and its data source are ready.
>>>>> Slightly better.
>>> <
>>>> I think this can be a lot better, depending on the data dependencies.
>>>> In particular, if you have a loop recurrence involving a CMOV and one
>>>> input to the CMOV is loop-invariant (e.g., a constant), every time
>>>> this input is selected breaks the recurrence latency chain. And even
>>>> if both inputs are loop variants, if one has a long latency (e.g., it
>>>> involves a load) and the other a short latency (e.g., it's just the
>>>> result from the last iteration), this is a win.
>>>>
>>>> For predication, if you have an if-then-else instruction, and a
>>>> then-instruction with the same destination register as an
>>>> else-instruction, you can pair these up as the two complementary
>>>> microinstructions producing the same target, without needing extra
>>>> microinstructions.
>>> <
>>> I give the same physical register name to instructions in the then-clause
>>> and in the else clause. One will get delivered, the other one will get suppressed.
>> So this handles things like:
>>
>> x = cond? y + z : a * b;
>>
>> This merges a PHI and two alternate uOps to eliminate the
>> temp register allocations and copies. But it requires all the
>> source operands y,z,a,b to be loaded so the savings are minimal.
>>
>> I would want to skip loading either y,z or a,b as that is
>> where all the big saving are to be had.
>> That would be much harder for a decoder to detect and optimize
>> as the sequence is too long.
>>
>> PRED { 111000 }
>> LD r1,[y]
>> LD r2,[z]
>> ADD r3 = r1 + r2
>> LD r1,[a]
>> LD r2,[b]
>> MUL r3 = r1 * r2
>> ST [x],r3
> <
> Yes, and especially when written this way::
> <
> PREDcnd {TE}
> ADD R3,Ry,Rz
> MUL R3,Ra,Rb

Yes, that is the idiom that forces it to always load Ry,Rz,Ra,Rb
and then skips calculation. Like CMOV it requires preparing data
for both paths, then skips using it. It's slightly better than CMOV
because it can potentially prune the MUL if disabled, but not that much.

This goes back to the original question of why CMOV doesn't really
benefit performance that much. I believe, though I have no proof,
that to benefit it needs to be able to skip loading values that it
never uses, or storing values that it didn't change.

Re: Ill-advised use of CMOVE

<325c06f2-0bbe-4192-8095-a36b6a067e42n@googlegroups.com>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25308&group=comp.arch#25308

 copy link   Newsgroups: comp.arch
X-Received: by 2002:a0c:b312:0:b0:45a:a8d7:ecd6 with SMTP id s18-20020a0cb312000000b0045aa8d7ecd6mr12277131qve.100.1652635619311;
Sun, 15 May 2022 10:26:59 -0700 (PDT)
X-Received: by 2002:a05:6870:b383:b0:e9:2fea:2148 with SMTP id
w3-20020a056870b38300b000e92fea2148mr12584774oap.103.1652635619088; Sun, 15
May 2022 10:26:59 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 15 May 2022 10:26:58 -0700 (PDT)
In-Reply-To: <cGagK.5409$x1Wf.800@fx10.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4811:b402:98e:d212;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4811:b402:98e:d212
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58n6r$1riu$1@gioia.aioe.org>
<t5e1ek$s7c$1@dont-email.me> <2022May10.194427@mips.complang.tuwien.ac.at>
<0555194d-edc8-48fb-b304-7f78d62255d3n@googlegroups.com> <hlPeK.4263$pqKf.2583@fx12.iad>
<2022May14.100411@mips.complang.tuwien.ac.at> <49f15662-d5c6-432b-bca3-ec0b1cd820ddn@googlegroups.com>
<JZ8gK.6877$j0D5.5934@fx09.iad> <749dacbf-67e9-4121-82be-cb5fe54105cen@googlegroups.com>
<cGagK.5409$x1Wf.800@fx10.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <325c06f2-0bbe-4192-8095-a36b6a067e42n@googlegroups.com>
Subject: Re: Ill-advised use of CMOVE
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 15 May 2022 17:26:59 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 5417
 by: MitchAlsup - Sun, 15 May 2022 17:26 UTC

On Sunday, May 15, 2022 at 12:23:23 PM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Sunday, May 15, 2022 at 10:27:42 AM UTC-5, EricP wrote:
> >> MitchAlsup wrote:
> >>> On Saturday, May 14, 2022 at 3:39:37 AM UTC-5, Anton Ertl wrote:
> >>>> EricP <ThatWould...@thevillage.com> writes:
> >>>>> MitchAlsup wrote:
> >>>>>> On Tuesday, May 10, 2022 at 1:09:01 PM UTC-5, Anton Ertl wrote:
> >>>>>>> In an OoO machine, this can mean several different things.
> >>>>>>>
> >>>>>>> * high resource usage. On the 21264, CMOV takes two
> >>>>>>> microinstructions.
> >>>>> For CMOV 1 or 2 extra uOps in a 100+ instruction queue is not a
> >>>>> problem but for full predication this approach would not do
> >>>>> and it requires smarter uOps.
> >>> <
> >>>> Yes, I doubt that the extra uOps cause a slowdown in most cases and a
> >>>> significant slowdown in the rest. And I think that the independence
> >>>> of the other source is so important for performance that this is a
> >>>> good approach for predication, too.
> >>> <
> >>>>> In the Alpha 21264 case with its 2-uOp CMOV approach, each uOp can
> >>>>> proceed independently when the condition and its data source are ready.
> >>>>> Slightly better.
> >>> <
> >>>> I think this can be a lot better, depending on the data dependencies.
> >>>> In particular, if you have a loop recurrence involving a CMOV and one
> >>>> input to the CMOV is loop-invariant (e.g., a constant), every time
> >>>> this input is selected breaks the recurrence latency chain. And even
> >>>> if both inputs are loop variants, if one has a long latency (e.g., it
> >>>> involves a load) and the other a short latency (e.g., it's just the
> >>>> result from the last iteration), this is a win.
> >>>>
> >>>> For predication, if you have an if-then-else instruction, and a
> >>>> then-instruction with the same destination register as an
> >>>> else-instruction, you can pair these up as the two complementary
> >>>> microinstructions producing the same target, without needing extra
> >>>> microinstructions.
> >>> <
> >>> I give the same physical register name to instructions in the then-clause
> >>> and in the else clause. One will get delivered, the other one will get suppressed.
> >> So this handles things like:
> >>
> >> x = cond? y + z : a * b;
> >>
> >> This merges a PHI and two alternate uOps to eliminate the
> >> temp register allocations and copies. But it requires all the
> >> source operands y,z,a,b to be loaded so the savings are minimal.
> >>
> >> I would want to skip loading either y,z or a,b as that is
> >> where all the big saving are to be had.
> >> That would be much harder for a decoder to detect and optimize
> >> as the sequence is too long.
> >>
> >> PRED { 111000 }
> >> LD r1,[y]
> >> LD r2,[z]
> >> ADD r3 = r1 + r2
> >> LD r1,[a]
> >> LD r2,[b]
> >> MUL r3 = r1 * r2
> >> ST [x],r3
> > <
> > Yes, and especially when written this way::
> > <
> > PREDcnd {TE}
> > ADD R3,Ry,Rz
> > MUL R3,Ra,Rb
> Yes, that is the idiom that forces it to always load Ry,Rz,Ra,Rb
<
Or allocate those values into registers.
<
> and then skips calculation. Like CMOV it requires preparing data
> for both paths, then skips using it. It's slightly better than CMOV
> because it can potentially prune the MUL if disabled, but not that much.
<
Just because one has good predication does not remove the utility
of CMOV.
>
> This goes back to the original question of why CMOV doesn't really
> benefit performance that much. I believe, though I have no proof,
> that to benefit it needs to be able to skip loading values that it
> never uses, or storing values that it didn't change.
<
CMOV is for those hard to predict cases. Do not "bet against" your
branch predictor.

Re: Ill-advised use of CMOVE

<2022May15.190911@mips.complang.tuwien.ac.at>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25309&group=comp.arch#25309

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
Date: Sun, 15 May 2022 17:09:11 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 56
Message-ID: <2022May15.190911@mips.complang.tuwien.ac.at>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58n6r$1riu$1@gioia.aioe.org> <t5e1ek$s7c$1@dont-email.me> <2022May10.194427@mips.complang.tuwien.ac.at> <0555194d-edc8-48fb-b304-7f78d62255d3n@googlegroups.com> <hlPeK.4263$pqKf.2583@fx12.iad> <2022May14.100411@mips.complang.tuwien.ac.at> <49f15662-d5c6-432b-bca3-ec0b1cd820ddn@googlegroups.com> <JZ8gK.6877$j0D5.5934@fx09.iad>
Injection-Info: reader02.eternal-september.org; posting-host="a804ed51de3bbe163e8bef27aeacdbb8";
logging-data="15055"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Y+Vuny9mtCsDfR9wjpDMO"
Cancel-Lock: sha1:Af3CP+KwyvblAtYu/nms3Guf/3w=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sun, 15 May 2022 17:09 UTC

EricP <ThatWouldBeTelling@thevillage.com> writes:
>So this handles things like:
>
> x = cond? y + z : a * b;
>
>This merges a PHI and two alternate uOps to eliminate the
>temp register allocations and copies. But it requires all the
>source operands y,z,a,b to be loaded so the savings are minimal.

This predication implementation is about reducing latency, not about
reducing resource consumption. I think that in most code latency
rather than resources limits the performance.

>I would want to skip loading either y,z or a,b as that is
>where all the big saving are to be had.

Register allocation?

But sure, if you have resource-limited code with unpredictable
branches, a different approach (and probably a different if-then-else
instruction would be more appropriate.

Maybe something like: Wait for the condition (as you suggested). If
an instruction need not be executed retire it without consuming
further execution resources. Depending on how resources are tracked,
it may be necessary to wait for the other inputs first.

>That would be much harder for a decoder to detect and optimize
>as the sequence is too long.
>
> PRED { 111000 }
> LD r1,[y]
> LD r2,[z]
> ADD r3 = r1 + r2
> LD r1,[a]
> LD r2,[b]
> MUL r3 = r1 * r2
> ST [x],r3

It could be easier if you arrange it like this:

PRED { 101010 }
LD r1,[y]
LD r1,[a]
LD r2,[z]
LD r2,[b]
ADD r3 = r1 + r2
MUL r3 = r1 * r2
ST [x],r3

Probably would also perform better on in-order CPUs.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Ill-advised use of CMOVE

<2022May15.194522@mips.complang.tuwien.ac.at>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25312&group=comp.arch#25312

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
Date: Sun, 15 May 2022 17:45:22 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 57
Message-ID: <2022May15.194522@mips.complang.tuwien.ac.at>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58n6r$1riu$1@gioia.aioe.org> <t5e1ek$s7c$1@dont-email.me> <2022May10.194427@mips.complang.tuwien.ac.at> <0555194d-edc8-48fb-b304-7f78d62255d3n@googlegroups.com> <hlPeK.4263$pqKf.2583@fx12.iad> <2022May14.100411@mips.complang.tuwien.ac.at> <IZ8gK.6876$j0D5.5549@fx09.iad>
Injection-Info: reader02.eternal-september.org; posting-host="a804ed51de3bbe163e8bef27aeacdbb8";
logging-data="15055"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX197PmYLx+6JD7d2UGhU7XVy"
Cancel-Lock: sha1:+9O/ORKHa9gaECXwNClbjBcrFGg=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sun, 15 May 2022 17:45 UTC

EricP <ThatWouldBeTelling@thevillage.com> writes:
>Anton Ertl wrote:
>> For predication, if you have an if-then-else instruction, and a
>> then-instruction with the same destination register as an
>> else-instruction, you can pair these up as the two complementary
>> microinstructions producing the same target, without needing extra
>> microinstructions. You can have a smart decoder that finds this out
>> dynamically, or encode this pairing in the if-then-else instruction
>> (with an exception if the target registers then don't match).
>
>It looks like a uOp fusion.

It certainly needs to have several instructions under consideration
when decoding such an if-then-else and the instructions it covers.

>>> This led them to propose what they called "predicate slip" whereby
>>> the data side can proceed as soon as its operands are ready,
>>> and the predicate state is checked before retire.
....
>> For OoO you don't want to wait for retirement, because other
>> instructions in the OoO engine depend on the results.
....
>This was proposed for OoO predication.
>It frees up reservation stations ASAP by executing uOps when their
>data operands are ready and not wait for the guarding predicate.
>The results are held in the ROB and cannot be forwarded until
>the predicate resolves.

Ok, but as soon as the predicate is then available, the result is made
available. So this is only about where the uOp waits in the meantime.
That should be no problem.

>For longer latency ops like DIV it makes sense.
>Also for ops like MUL which are fast but the FU
>is expensive so there is only one and they bottleneck.
>And FP which take multiple cycles and bottlenecks.
>
>But for the average 1-cycle integer ALU op where FU's are cheap
>and you have multiple units, this is unnecessary.

It's not clear to me what you mean with "it" or "this" here. I can
imagine situations where you want to perform any of these uOps
eagerly, and situations where you want to wait for the predicate.

I think that in this case the compiler can better (than for
predication-vs-branch) decide which one to use, but the decision is
microarchitecture-dependent, so it may still be better to leave it to
hardware.

>Just making OoO predication work at all looks pretty challenging to me.

OoO ARM CPUs exist.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Ill-advised use of CMOVE

<t5rg76$hp1$1@newsreader4.netcologne.de>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25313&group=comp.arch#25313

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!news.mixmin.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-5c6-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
Date: Sun, 15 May 2022 18:19:18 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <t5rg76$hp1$1@newsreader4.netcologne.de>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org>
<t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me>
Injection-Date: Sun, 15 May 2022 18:19:18 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-5c6-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:5c6:0:7285:c2ff:fe6c:992d";
logging-data="18209"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sun, 15 May 2022 18:19 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:

> So, I think a good solution is for the compiler to essentially never
> decide to issue CMOV as a replacement for a conditional branch for
> architectures with good branch predictors (not saying anything here
> about predication.).

Binary search was also recently quoted here as an example where
CMOV makes a lot of sense - running a fixed number of iterations
for each size (to keep the branch predictors happy).

> It should also probably ignore the existing hint
> pragmas. But there probably should be a pragma to tell the compiler to
> use CMOV in the rare case of things like decompression, where the
> programmer can realistically know that the hardware branch predictor
> will be a bad choice. The documentation should explicitly state when it
> should be a good to use, i.e. rarely.

Ideally, a compiler would contain heuristics when a branch, or a
CMOVE, is better. How could one predict predictability of a branch?

Re: Ill-advised use of CMOVE

<0d201c5c-10a9-4954-994f-559e2adf9fc4n@googlegroups.com>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25317&group=comp.arch#25317

 copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:620a:2886:b0:699:bab7:ae78 with SMTP id j6-20020a05620a288600b00699bab7ae78mr10105054qkp.618.1652644433382;
Sun, 15 May 2022 12:53:53 -0700 (PDT)
X-Received: by 2002:a9d:5888:0:b0:606:10d2:2fc1 with SMTP id
x8-20020a9d5888000000b0060610d22fc1mr5099096otg.29.1652644433147; Sun, 15 May
2022 12:53:53 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 15 May 2022 12:53:52 -0700 (PDT)
In-Reply-To: <2022May15.194522@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:783b:61d5:fd85:3249;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:783b:61d5:fd85:3249
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58n6r$1riu$1@gioia.aioe.org>
<t5e1ek$s7c$1@dont-email.me> <2022May10.194427@mips.complang.tuwien.ac.at>
<0555194d-edc8-48fb-b304-7f78d62255d3n@googlegroups.com> <hlPeK.4263$pqKf.2583@fx12.iad>
<2022May14.100411@mips.complang.tuwien.ac.at> <IZ8gK.6876$j0D5.5549@fx09.iad> <2022May15.194522@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <0d201c5c-10a9-4954-994f-559e2adf9fc4n@googlegroups.com>
Subject: Re: Ill-advised use of CMOVE
From: already5...@yahoo.com (Michael S)
Injection-Date: Sun, 15 May 2022 19:53:53 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 4499
 by: Michael S - Sun, 15 May 2022 19:53 UTC

On Sunday, May 15, 2022 at 8:59:35 PM UTC+3, Anton Ertl wrote:
> EricP <ThatWould...@thevillage.com> writes:
> >Anton Ertl wrote:
> >> For predication, if you have an if-then-else instruction, and a
> >> then-instruction with the same destination register as an
> >> else-instruction, you can pair these up as the two complementary
> >> microinstructions producing the same target, without needing extra
> >> microinstructions. You can have a smart decoder that finds this out
> >> dynamically, or encode this pairing in the if-then-else instruction
> >> (with an exception if the target registers then don't match).
> >
> >It looks like a uOp fusion.
> It certainly needs to have several instructions under consideration
> when decoding such an if-then-else and the instructions it covers.
> >>> This led them to propose what they called "predicate slip" whereby
> >>> the data side can proceed as soon as its operands are ready,
> >>> and the predicate state is checked before retire.
> ...
> >> For OoO you don't want to wait for retirement, because other
> >> instructions in the OoO engine depend on the results.
> ...
> >This was proposed for OoO predication.
> >It frees up reservation stations ASAP by executing uOps when their
> >data operands are ready and not wait for the guarding predicate.
> >The results are held in the ROB and cannot be forwarded until
> >the predicate resolves.
> Ok, but as soon as the predicate is then available, the result is made
> available. So this is only about where the uOp waits in the meantime.
> That should be no problem.
> >For longer latency ops like DIV it makes sense.
> >Also for ops like MUL which are fast but the FU
> >is expensive so there is only one and they bottleneck.
> >And FP which take multiple cycles and bottlenecks.
> >
> >But for the average 1-cycle integer ALU op where FU's are cheap
> >and you have multiple units, this is unnecessary.
> It's not clear to me what you mean with "it" or "this" here. I can
> imagine situations where you want to perform any of these uOps
> eagerly, and situations where you want to wait for the predicate.
>
> I think that in this case the compiler can better (than for
> predication-vs-branch) decide which one to use, but the decision is
> microarchitecture-dependent, so it may still be better to leave it to
> hardware.
> >Just making OoO predication work at all looks pretty challenging to me.
> OoO ARM CPUs exist.

But most of them (too my best knowledge, all of them) don't have "classic" A32
ISA as their primary instruction set. The older one are meant to run T32 code
fast, the newer optimized for aarch64.
So, I wouldn't be surprised if all existing Arm OoO cores simply crack predicated
A32 instruction and if it causes significant performance impact then so it goes.

> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Ill-advised use of CMOVE

<t5sbrt$jut$1@dont-email.me>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25331&group=comp.arch#25331

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
Date: Sun, 15 May 2022 19:11:10 -0700
Organization: A noiseless patient Spider
Lines: 29
Message-ID: <t5sbrt$jut$1@dont-email.me>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org>
<t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me>
<t5rg76$hp1$1@newsreader4.netcologne.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 16 May 2022 02:11:10 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="0595d2a0721e6e1fe7f0190327a23932";
logging-data="20445"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/kSgd3sJ0eZmhXxaCPz6FvvXxh4fobotQ="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.9.0
Cancel-Lock: sha1:2BMsSMp1CT0LL6QMrRWATywukYw=
In-Reply-To: <t5rg76$hp1$1@newsreader4.netcologne.de>
Content-Language: en-US
 by: Stephen Fuld - Mon, 16 May 2022 02:11 UTC

On 5/15/2022 11:19 AM, Thomas Koenig wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
>
>> So, I think a good solution is for the compiler to essentially never
>> decide to issue CMOV as a replacement for a conditional branch for
>> architectures with good branch predictors (not saying anything here
>> about predication.).
>
> Binary search was also recently quoted here as an example where
> CMOV makes a lot of sense - running a fixed number of iterations
> for each size (to keep the branch predictors happy).
>
>> It should also probably ignore the existing hint
>> pragmas. But there probably should be a pragma to tell the compiler to
>> use CMOV in the rare case of things like decompression, where the
>> programmer can realistically know that the hardware branch predictor
>> will be a bad choice. The documentation should explicitly state when it
>> should be a good to use, i.e. rarely.
>
> Ideally, a compiler would contain heuristics when a branch, or a
> CMOVE, is better. How could one predict predictability of a branch?

That was Anton's and my point. Compilers aren't very good at doing
this, so my suggestion is don't ask them to.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Ill-advised use of CMOVE

<2022May16.080143@mips.complang.tuwien.ac.at>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25337&group=comp.arch#25337

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
Date: Mon, 16 May 2022 06:01:43 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 34
Distribution: world
Message-ID: <2022May16.080143@mips.complang.tuwien.ac.at>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me> <t5rg76$hp1$1@newsreader4.netcologne.de>
Injection-Info: reader02.eternal-september.org; posting-host="128a42e0247b02702bd84c94989e66d3";
logging-data="2065"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Q8pZh8GP6E/V2CDuStqVd"
Cancel-Lock: sha1:lLsGibBwWYHJwfVmrGAwNwWrqbI=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Mon, 16 May 2022 06:01 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
>Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
>
>> So, I think a good solution is for the compiler to essentially never
>> decide to issue CMOV as a replacement for a conditional branch for
>> architectures with good branch predictors (not saying anything here
>> about predication.).
>
>Binary search was also recently quoted here as an example where
>CMOV makes a lot of sense

As is often the case, this depends very much on the inputs.

And then there's the question of why one uses binary search in the
first place. The only time I remember using it is when searching for
the try-catch block of an exception in a JavaVM implementation. In
that case it is likely that the path through the binary search can be
predicted from the branch history before the throw.

>Ideally, a compiler would contain heuristics when a branch, or a
>CMOVE, is better. How could one predict predictability of a branch?

I doubt that there are good heuristics, especially for the compiler.
As a programmer, you are a little better off: If the branch depends on
input data, and that input data does not contain patterns, there's a
good chance that the branch is unpredictable.

But essentially, you have to measure it, or you leave the whole
problem to hardware.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Ill-advised use of CMOVE

<2022May16.091727@mips.complang.tuwien.ac.at>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25339&group=comp.arch#25339

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
Date: Mon, 16 May 2022 07:17:27 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 35
Message-ID: <2022May16.091727@mips.complang.tuwien.ac.at>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58n6r$1riu$1@gioia.aioe.org> <t5e1ek$s7c$1@dont-email.me> <2022May10.194427@mips.complang.tuwien.ac.at> <0555194d-edc8-48fb-b304-7f78d62255d3n@googlegroups.com> <hlPeK.4263$pqKf.2583@fx12.iad> <2022May14.100411@mips.complang.tuwien.ac.at> <IZ8gK.6876$j0D5.5549@fx09.iad> <2022May15.194522@mips.complang.tuwien.ac.at> <0d201c5c-10a9-4954-994f-559e2adf9fc4n@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="128a42e0247b02702bd84c94989e66d3";
logging-data="6881"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1912Cz2+dgcwvKpKR9ZKTfi"
Cancel-Lock: sha1:0eRny09HsW43a6CaiHpIuVn9cTE=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Mon, 16 May 2022 07:17 UTC

Michael S <already5chosen@yahoo.com> writes:
>On Sunday, May 15, 2022 at 8:59:35 PM UTC+3, Anton Ertl wrote:
>> EricP <ThatWould...@thevillage.com> writes:
>> >Just making OoO predication work at all looks pretty challenging to me.
>> OoO ARM CPUs exist.
>
>But most of them (too my best knowledge, all of them) don't have "classic" A32
>ISA as their primary instruction set. The older one are meant to run T32 code
>fast,

T32 includes if-then-else instructions, so it has to deal with
predicated instructions, too.

As for "primary instruction set", my understanding is that T32 was
designed such that it could be easily decoded into the same uops as
A32, and behind the decoder the ISA makes little difference (although
I guess that there are some interesting interactions between
if-then-else and exceptions). In particular, both T32 if-then-else
and A32 pedicated execution will result in predicated uops (with
various implementation options, as we have discussed).

>So, I wouldn't be surprised if all existing Arm OoO cores simply crack predicated
>A32 instruction and if it causes significant performance impact then so it goes.

Crack into what? I expect that in OoO implementations the T32
if-then-else is distributed across the uOps of the controlled
instructions in a predicate-like way, allowing the uOps to be
processed OoO. I very much doubt that they process, e.g., itete and
the four instructions it controls, as one block. But who knows, Apple
processes the instructions in groups of up to seven instructions.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Ill-advised use of CMOVE

<a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25340&group=comp.arch#25340

 copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:620a:16c2:b0:69f:ca37:f6b5 with SMTP id a2-20020a05620a16c200b0069fca37f6b5mr12052318qkn.48.1652689319050;
Mon, 16 May 2022 01:21:59 -0700 (PDT)
X-Received: by 2002:a05:6870:d1cd:b0:e1:e7ee:faa0 with SMTP id
b13-20020a056870d1cd00b000e1e7eefaa0mr14378107oac.5.1652689318766; Mon, 16
May 2022 01:21:58 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 16 May 2022 01:21:58 -0700 (PDT)
In-Reply-To: <2022May16.080143@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58mug$ref$2@newsreader4.netcologne.de>
<t5rbct$rbe$1@dont-email.me> <t5rg76$hp1$1@newsreader4.netcologne.de> <2022May16.080143@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com>
Subject: Re: Ill-advised use of CMOVE
From: already5...@yahoo.com (Michael S)
Injection-Date: Mon, 16 May 2022 08:21:59 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Michael S - Mon, 16 May 2022 08:21 UTC

On Monday, May 16, 2022 at 9:22:21 AM UTC+3, Anton Ertl wrote:
> Thomas Koenig <tko...@netcologne.de> writes:
> >Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:
> >
> >> So, I think a good solution is for the compiler to essentially never
> >> decide to issue CMOV as a replacement for a conditional branch for
> >> architectures with good branch predictors (not saying anything here
> >> about predication.).
> >
> >Binary search was also recently quoted here as an example where
> >CMOV makes a lot of sense
> As is often the case, this depends very much on the inputs.
>
> And then there's the question of why one uses binary search in the
> first place.

Sorted array + binary search is a reasonable option for data structure+algorithm
when [past initialization phase] the dictionary is either totally or almost totally static.
I used it, for example, for generation of Dynamic HTML on resource-limited embedded
system.
Of course, in this case the keys were pointers to strings rather than simple integers,
and comparison was some variant of strcmp() rather than simple CMP.
My intuition tells me nothing about advantage/disadvantage of cmove vs branch for
such variation of binary search. It just mumbles something about "below noise floor".
Not that I would trust an intuition even if it was more decisive.

As to optimality of bsearch vs. other options, I'd guess, with decent certainty, that in my
case hash-based lookup would be faster, but also more complicated to code and
a little less robust in the unlikely worst case.
Preprocession of HTML sources with replacement of string keys by indices would be
the fastest option, but also the most labour-intensive.

> The only time I remember using it is when searching for
> the try-catch block of an exception in a JavaVM implementation. In
> that case it is likely that the path through the binary search can be
> predicted from the branch history before the throw.
> >Ideally, a compiler would contain heuristics when a branch, or a
> >CMOVE, is better. How could one predict predictability of a branch?
> I doubt that there are good heuristics, especially for the compiler.
> As a programmer, you are a little better off: If the branch depends on
> input data, and that input data does not contain patterns, there's a
> good chance that the branch is unpredictable.
>
> But essentially, you have to measure it, or you leave the whole
> problem to hardware.
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Ill-advised use of CMOVE

<t5t8r4$qdi$1@dont-email.me>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25344&group=comp.arch#25344

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
Date: Mon, 16 May 2022 03:25:41 -0700
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <t5t8r4$qdi$1@dont-email.me>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org>
<t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me>
<t5rg76$hp1$1@newsreader4.netcologne.de>
<2022May16.080143@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 16 May 2022 10:25:40 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="567defd78a75f8fd093bba596b09f3d8";
logging-data="27058"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+eZtsQWVMR11rV0iDcgSXtz+Ns7aVb1Xs="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.9.0
Cancel-Lock: sha1:aTz+vffXc4LOhAcEBk8SEsRtDAQ=
In-Reply-To: <2022May16.080143@mips.complang.tuwien.ac.at>
Content-Language: en-US
 by: Stephen Fuld - Mon, 16 May 2022 10:25 UTC

On 5/15/2022 11:01 PM, Anton Ertl wrote:
> Thomas Koenig <tkoenig@netcologne.de> writes:
>> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
>>
>>> So, I think a good solution is for the compiler to essentially never
>>> decide to issue CMOV as a replacement for a conditional branch for
>>> architectures with good branch predictors (not saying anything here
>>> about predication.).
>>
>> Binary search was also recently quoted here as an example where
>> CMOV makes a lot of sense
>
> As is often the case, this depends very much on the inputs.
>
> And then there's the question of why one uses binary search in the
> first place.

In business applications, it at least used to be common enough that
COBOL actually has source code statements to tell the compiler that an
array (though they don't call it that) is sorted and that when searching
it, generate binary search code instead of sequential. Back in the dark
ages when I was programming some of this stuff, I used it quite frequently.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Ill-advised use of CMOVE

<OQrgK.3333$6y5f.202@fx40.iad>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25345&group=comp.arch#25345

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx40.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58n6r$1riu$1@gioia.aioe.org> <t5e1ek$s7c$1@dont-email.me> <2022May10.194427@mips.complang.tuwien.ac.at> <0555194d-edc8-48fb-b304-7f78d62255d3n@googlegroups.com> <hlPeK.4263$pqKf.2583@fx12.iad> <2022May14.100411@mips.complang.tuwien.ac.at> <IZ8gK.6876$j0D5.5549@fx09.iad> <2022May15.194522@mips.complang.tuwien.ac.at> <0d201c5c-10a9-4954-994f-559e2adf9fc4n@googlegroups.com>
In-Reply-To: <0d201c5c-10a9-4954-994f-559e2adf9fc4n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 23
Message-ID: <OQrgK.3333$6y5f.202@fx40.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 16 May 2022 12:55:10 UTC
Date: Mon, 16 May 2022 08:50:09 -0400
X-Received-Bytes: 1935
 by: EricP - Mon, 16 May 2022 12:50 UTC

Michael S wrote:
> On Sunday, May 15, 2022 at 8:59:35 PM UTC+3, Anton Ertl wrote:
>> EricP <ThatWould...@thevillage.com> writes:
>>> Just making OoO predication work at all looks pretty challenging to me.
>> OoO ARM CPUs exist.
>
> But most of them (too my best knowledge, all of them) don't have "classic" A32
> ISA as their primary instruction set. The older one are meant to run T32 code
> fast, the newer optimized for aarch64.
> So, I wouldn't be surprised if all existing Arm OoO cores simply crack predicated
> A32 instruction and if it causes significant performance impact then so it goes.

According to page, the column for which ARMv7 are OoO lists:
Cortex-A9, Cortex-A12, Cortex-A15, Cortex-A17,
Qualcomm Scorpion (partial), Qualcomm Krait, Swift

https://en.wikipedia.org/wiki/Comparison_of_ARMv7-A_cores

I'll see if I can find some info on how
the uArch implements predication.

Re: Ill-advised use of CMOVE

<OQrgK.3334$6y5f.1197@fx40.iad>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25346&group=comp.arch#25346

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!news.freedyn.de!news1.tnib.de!feed.news.tnib.de!news.tnib.de!news.uzoreto.com!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx40.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me> <t5rg76$hp1$1@newsreader4.netcologne.de> <2022May16.080143@mips.complang.tuwien.ac.at>
In-Reply-To: <2022May16.080143@mips.complang.tuwien.ac.at>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 25
Message-ID: <OQrgK.3334$6y5f.1197@fx40.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 16 May 2022 12:55:10 UTC
Date: Mon, 16 May 2022 08:54:53 -0400
X-Received-Bytes: 1783
 by: EricP - Mon, 16 May 2022 12:54 UTC

Anton Ertl wrote:
> Thomas Koenig <tkoenig@netcologne.de> writes:
>> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
>>
>>> So, I think a good solution is for the compiler to essentially never
>>> decide to issue CMOV as a replacement for a conditional branch for
>>> architectures with good branch predictors (not saying anything here
>>> about predication.).
>> Binary search was also recently quoted here as an example where
>> CMOV makes a lot of sense
>
> As is often the case, this depends very much on the inputs.
>
> And then there's the question of why one uses binary search in the
> first place. The only time I remember using it is when searching for
> the try-catch block of an exception in a JavaVM implementation. In
> that case it is likely that the path through the binary search can be
> predicted from the branch history before the throw.

Binary search is one of the decompositions for SWITCH statements
with discontiguous blocks.

Re: Ill-advised use of CMOVE

<2022May16.175019@mips.complang.tuwien.ac.at>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25348&group=comp.arch#25348

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
Date: Mon, 16 May 2022 15:50:19 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 32
Message-ID: <2022May16.175019@mips.complang.tuwien.ac.at>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me> <t5rg76$hp1$1@newsreader4.netcologne.de> <2022May16.080143@mips.complang.tuwien.ac.at> <a52349f8-ace0-4f2c-af38-540713b6574en@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="128a42e0247b02702bd84c94989e66d3";
logging-data="15703"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+wmB7p4ntE0h5RA+PGM/1t"
Cancel-Lock: sha1:ccMBQG1mnCDaNNzQQYhg9UL6fNE=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Mon, 16 May 2022 15:50 UTC

Michael S <already5chosen@yahoo.com> writes:
>On Monday, May 16, 2022 at 9:22:21 AM UTC+3, Anton Ertl wrote:
>> And then there's the question of why one uses binary search in the
>> first place.
>
>Sorted array + binary search is a reasonable option for data structure+algorithm
>when [past initialization phase] the dictionary is either totally or almost totally static.

Slow and complex.

>My intuition tells me nothing about advantage/disadvantage of cmove vs branch for
>such variation of binary search.

The question is how predictable your searches are. If they are
unpredictable, branches cost ~10 cycles (~20 per misprediction) on
average on a typical desktop CPU, and a "branchless" variant is much
faster.

>As to optimality of bsearch vs. other options, I'd guess, with decent certainty, that in my
>case hash-based lookup would be faster, but also more complicated to code and
>a little less robust in the unlikely worst case.

A simple hash table implementation is much less complicated than
keeping an array sorted and binary search (which is easy to get wrong,
and not so easy to notice that you got it wrong). The worst case is
worse, but it's much more likely that the simple hash table beats the
binary search by a lot.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

binary search vs. hash tables (was: Ill-advised use of CMOVE)

<2022May16.180830@mips.complang.tuwien.ac.at>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25351&group=comp.arch#25351

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: binary search vs. hash tables (was: Ill-advised use of CMOVE)
Date: Mon, 16 May 2022 16:08:30 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 47
Message-ID: <2022May16.180830@mips.complang.tuwien.ac.at>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me> <t5rg76$hp1$1@newsreader4.netcologne.de> <2022May16.080143@mips.complang.tuwien.ac.at> <t5t8r4$qdi$1@dont-email.me>
Injection-Info: reader02.eternal-september.org; posting-host="128a42e0247b02702bd84c94989e66d3";
logging-data="5690"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/EmGGu5xXsE+La+z6qCQ+6"
Cancel-Lock: sha1:XiqhH7HDUE/w7LWbToCl1CPq/OQ=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Mon, 16 May 2022 16:08 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>On 5/15/2022 11:01 PM, Anton Ertl wrote:
>> And then there's the question of why one uses binary search in the
>> first place.
>
>In business applications, it at least used to be common enough that
>COBOL actually has source code statements to tell the compiler that an
>array (though they don't call it that) is sorted and that when searching
>it, generate binary search code instead of sequential. Back in the dark
>ages when I was programming some of this stuff, I used it quite frequently.

Sounds to me like it is common because it is in Cobol, not the other
way round. Certainly associative arrays in more recent languages are
typically implemented as hash tables, so much that they have sometimes
(Perl, Ruby and Seed7) been called hashes or (Common Lisp and Windows
PowerShell) hash tables rather than associative arrays.

So did they go for binary search in Cobol, because they did not know
better, or has something changed in the meantime? I think it's both.

At the time when Cobol was designed, it was not yet known how to
implement good hash functions efficiently. Instead, people advocated
division by a prime number (which indeed helps mixing the bits, but is
expensive).

On the other hand, memory lookups were cheap, and there were no caches
to be missed. Also, memory was small, so associative arrays were
small, i.e., binary search did not need many iterations (e.g., 10 in
case of 1000 entries).

These days, there may be many more entries, requiring more iterations
(e.g., 20 for 1,000,000 entries), accessing a different cache line in
every iteration, and after the first few iterations the accesses will
miss caches further and further out. By contrast, a well-implemented
hash table just incurs ~2 cache misses; and a good non-cryptographic
hash function is faster than a L3 cache hit (e.g. CityHash64 takes 9ns
(24 cycles) for a 64-byte string on a 2009-vintage 2.67GHz Xeon
X5550).

So, these days hash tables beat binary search in the typical case (and
with appropriate implementations, also in the worst case for
sufficiently large tables), but in 1960, things looked differently.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)

<t5u35g$fo4$1@dont-email.me>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25352&group=comp.arch#25352

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!news.swapon.de!news.freedyn.de!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)
Date: Mon, 16 May 2022 10:54:54 -0700
Organization: A noiseless patient Spider
Lines: 52
Message-ID: <t5u35g$fo4$1@dont-email.me>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org>
<t58mug$ref$2@newsreader4.netcologne.de> <t5rbct$rbe$1@dont-email.me>
<t5rg76$hp1$1@newsreader4.netcologne.de>
<2022May16.080143@mips.complang.tuwien.ac.at> <t5t8r4$qdi$1@dont-email.me>
<2022May16.180830@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 16 May 2022 17:54:56 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="4a27287573e3b71fdabadfe8eb764a67";
logging-data="16132"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/j+ridJ1vLxB9D/6KpvLYRxqYmo99REbc="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.9.0
Cancel-Lock: sha1:0oon4HEmtG0kc5UAlwh+cu+OdkI=
In-Reply-To: <2022May16.180830@mips.complang.tuwien.ac.at>
Content-Language: en-US
 by: Stephen Fuld - Mon, 16 May 2022 17:54 UTC

On 5/16/2022 9:08 AM, Anton Ertl wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>> On 5/15/2022 11:01 PM, Anton Ertl wrote:
>>> And then there's the question of why one uses binary search in the
>>> first place.
>>
>> In business applications, it at least used to be common enough that
>> COBOL actually has source code statements to tell the compiler that an
>> array (though they don't call it that) is sorted and that when searching
>> it, generate binary search code instead of sequential. Back in the dark
>> ages when I was programming some of this stuff, I used it quite frequently.
>
> Sounds to me like it is common because it is in Cobol, not the other
> way round. Certainly associative arrays in more recent languages are
> typically implemented as hash tables, so much that they have sometimes
> (Perl, Ruby and Seed7) been called hashes or (Common Lisp and Windows
> PowerShell) hash tables rather than associative arrays.
>
> So did they go for binary search in Cobol, because they did not know
> better, or has something changed in the meantime? I think it's both.
>
> At the time when Cobol was designed, it was not yet known how to
> implement good hash functions efficiently. Instead, people advocated
> division by a prime number (which indeed helps mixing the bits, but is
> expensive).
>
> On the other hand, memory lookups were cheap, and there were no caches
> to be missed. Also, memory was small, so associative arrays were
> small, i.e., binary search did not need many iterations (e.g., 10 in
> case of 1000 entries).
>
> These days, there may be many more entries, requiring more iterations
> (e.g., 20 for 1,000,000 entries), accessing a different cache line in
> every iteration, and after the first few iterations the accesses will
> miss caches further and further out. By contrast, a well-implemented
> hash table just incurs ~2 cache misses; and a good non-cryptographic
> hash function is faster than a L3 cache hit (e.g. CityHash64 takes 9ns
> (24 cycles) for a 64-byte string on a 2009-vintage 2.67GHz Xeon
> X5550).
>
> So, these days hash tables beat binary search in the typical case (and
> with appropriate implementations, also in the worst case for
> sufficiently large tables), but in 1960, things looked differently.

You make good points. I don't know if current versions of COBOL offer a
hash table as an alternative for the search verb.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)

<t5u3fo$2cvk$1@gal.iecc.com>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25353&group=comp.arch#25353

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)
Date: Mon, 16 May 2022 18:00:24 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <t5u3fo$2cvk$1@gal.iecc.com>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <2022May16.080143@mips.complang.tuwien.ac.at> <t5t8r4$qdi$1@dont-email.me> <2022May16.180830@mips.complang.tuwien.ac.at>
Injection-Date: Mon, 16 May 2022 18:00:24 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="78836"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <2022May16.080143@mips.complang.tuwien.ac.at> <t5t8r4$qdi$1@dont-email.me> <2022May16.180830@mips.complang.tuwien.ac.at>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Mon, 16 May 2022 18:00 UTC

According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>So did they go for binary search in Cobol, because they did not know
>better, or has something changed in the meantime? I think it's both.
>
>At the time when Cobol was designed, it was not yet known how to
>implement good hash functions efficiently. Instead, people advocated
>division by a prime number (which indeed helps mixing the bits, but is
>expensive).

Maybe, but remember that COBOL was designed for memories that seem
impossibly small now. Binary search needs no extra storage, and the
code is very simple, needing no fancy arithmetic for the hashing or
dealing with collisions. I can think of ways to hash packed decimal
numbers, but ugh.

I agree that these days binary search only makes sense if you are
taking advantage of the order, e.g., if no match find the next larger
entry.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Ill-advised use of CMOVE

<ZEwgK.60130$qMI1.19308@fx96.iad>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25355&group=comp.arch#25355

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx96.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58n6r$1riu$1@gioia.aioe.org> <t5e1ek$s7c$1@dont-email.me> <2022May10.194427@mips.complang.tuwien.ac.at> <0555194d-edc8-48fb-b304-7f78d62255d3n@googlegroups.com> <hlPeK.4263$pqKf.2583@fx12.iad> <2022May14.100411@mips.complang.tuwien.ac.at> <IZ8gK.6876$j0D5.5549@fx09.iad> <2022May15.194522@mips.complang.tuwien.ac.at> <0d201c5c-10a9-4954-994f-559e2adf9fc4n@googlegroups.com> <OQrgK.3333$6y5f.202@fx40.iad>
In-Reply-To: <OQrgK.3333$6y5f.202@fx40.iad>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 36
Message-ID: <ZEwgK.60130$qMI1.19308@fx96.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 16 May 2022 18:23:53 UTC
Date: Mon, 16 May 2022 14:23:25 -0400
X-Received-Bytes: 2383
 by: EricP - Mon, 16 May 2022 18:23 UTC

EricP wrote:
> Michael S wrote:
>> On Sunday, May 15, 2022 at 8:59:35 PM UTC+3, Anton Ertl wrote:
>>> EricP <ThatWould...@thevillage.com> writes:
>>>> Just making OoO predication work at all looks pretty challenging to me.
>>> OoO ARM CPUs exist.
>>
>> But most of them (too my best knowledge, all of them) don't have
>> "classic" A32
>> ISA as their primary instruction set. The older one are meant to run
>> T32 code
>> fast, the newer optimized for aarch64.
>> So, I wouldn't be surprised if all existing Arm OoO cores simply crack
>> predicated
>> A32 instruction and if it causes significant performance impact then
>> so it goes.
>
> According to page, the column for which ARMv7 are OoO lists:
> Cortex-A9, Cortex-A12, Cortex-A15, Cortex-A17,
> Qualcomm Scorpion (partial), Qualcomm Krait, Swift
>
> https://en.wikipedia.org/wiki/Comparison_of_ARMv7-A_cores
>
> I'll see if I can find some info on how
> the uArch implements predication.

I can't find any details on those ARM uArch's at all. None.
Almost all predication studies are proposals by Intel for doing OoO Itanium.

The only paper I found is not by ARM but looks at OoO predication
implementation, aka Guarded Execution, for an ARM ISA.

Efficient Out-of-order Execution of Guarded ISAs, 2013
https://hal.inria.fr/docs/00/91/12/30/PDF/RR-8406.pdf

Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)

<t5u5c4$2n6r$1@gal.iecc.com>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25357&group=comp.arch#25357

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)
Date: Mon, 16 May 2022 18:32:36 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <t5u5c4$2n6r$1@gal.iecc.com>
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t5t8r4$qdi$1@dont-email.me> <2022May16.180830@mips.complang.tuwien.ac.at> <t5u35g$fo4$1@dont-email.me>
Injection-Date: Mon, 16 May 2022 18:32:36 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="89307"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t5t8r4$qdi$1@dont-email.me> <2022May16.180830@mips.complang.tuwien.ac.at> <t5u35g$fo4$1@dont-email.me>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Mon, 16 May 2022 18:32 UTC

According to Stephen Fuld <sfuld@alumni.cmu.edu.invalid>:
>You make good points. I don't know if current versions of COBOL offer a
>hash table as an alternative for the search verb.

Nope. There's SEARCH (serial), SEARCH ALL (binary), and SORT (sort it so you can do binary search.)

My impression is that for more complicated stuff, you put it in a database and use SQL.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)

<677f55fc-8833-4830-bacc-9c3925c7b1adn@googlegroups.com>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25359&group=comp.arch#25359

 copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:6214:518d:b0:45a:933f:965d with SMTP id kl13-20020a056214518d00b0045a933f965dmr17091162qvb.94.1652727526756;
Mon, 16 May 2022 11:58:46 -0700 (PDT)
X-Received: by 2002:a05:6830:91:b0:606:d01b:f7af with SMTP id
a17-20020a056830009100b00606d01bf7afmr6631961oto.60.1652727526409; Mon, 16
May 2022 11:58:46 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 16 May 2022 11:58:46 -0700 (PDT)
In-Reply-To: <t5u5c4$2n6r$1@gal.iecc.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:783b:61d5:fd85:3249;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:783b:61d5:fd85:3249
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t5t8r4$qdi$1@dont-email.me>
<2022May16.180830@mips.complang.tuwien.ac.at> <t5u35g$fo4$1@dont-email.me> <t5u5c4$2n6r$1@gal.iecc.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <677f55fc-8833-4830-bacc-9c3925c7b1adn@googlegroups.com>
Subject: Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)
From: already5...@yahoo.com (Michael S)
Injection-Date: Mon, 16 May 2022 18:58:46 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 1309
 by: Michael S - Mon, 16 May 2022 18:58 UTC

test

Re: Ill-advised use of CMOVE

<atygK.48016$JSxf.35968@fx11.iad>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25362&group=comp.arch#25362

 copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx11.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Ill-advised use of CMOVE
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58n6r$1riu$1@gioia.aioe.org> <t5e1ek$s7c$1@dont-email.me> <2022May10.194427@mips.complang.tuwien.ac.at> <0555194d-edc8-48fb-b304-7f78d62255d3n@googlegroups.com> <hlPeK.4263$pqKf.2583@fx12.iad> <2022May14.100411@mips.complang.tuwien.ac.at> <IZ8gK.6876$j0D5.5549@fx09.iad> <2022May15.194522@mips.complang.tuwien.ac.at>
In-Reply-To: <2022May15.194522@mips.complang.tuwien.ac.at>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 68
Message-ID: <atygK.48016$JSxf.35968@fx11.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 16 May 2022 20:27:50 UTC
Date: Mon, 16 May 2022 16:23:00 -0400
X-Received-Bytes: 4010
 by: EricP - Mon, 16 May 2022 20:23 UTC

Anton Ertl wrote:
> EricP <ThatWouldBeTelling@thevillage.com> writes:
>> Anton Ertl wrote:
>>> For predication, if you have an if-then-else instruction, and a
>>> then-instruction with the same destination register as an
>>> else-instruction, you can pair these up as the two complementary
>>> microinstructions producing the same target, without needing extra
>>> microinstructions. You can have a smart decoder that finds this out
>>> dynamically, or encode this pairing in the if-then-else instruction
>>> (with an exception if the target registers then don't match).
>> It looks like a uOp fusion.
>
> It certainly needs to have several instructions under consideration
> when decoding such an if-then-else and the instructions it covers.
>
>>>> This led them to propose what they called "predicate slip" whereby
>>>> the data side can proceed as soon as its operands are ready,
>>>> and the predicate state is checked before retire.
> ....
>>> For OoO you don't want to wait for retirement, because other
>>> instructions in the OoO engine depend on the results.
> ....
>> This was proposed for OoO predication.
>> It frees up reservation stations ASAP by executing uOps when their
>> data operands are ready and not wait for the guarding predicate.
>> The results are held in the ROB and cannot be forwarded until
>> the predicate resolves.
>
> Ok, but as soon as the predicate is then available, the result is made
> available. So this is only about where the uOp waits in the meantime.
> That should be no problem.

This was based on their observation that operand data values
arrive before predicate values. They believed that freeing up
reservation stations ASAP was worth the extra logic in their design.

>> For longer latency ops like DIV it makes sense.
>> Also for ops like MUL which are fast but the FU
>> is expensive so there is only one and they bottleneck.
>> And FP which take multiple cycles and bottlenecks.
>>
>> But for the average 1-cycle integer ALU op where FU's are cheap
>> and you have multiple units, this is unnecessary.
>
> It's not clear to me what you mean with "it" or "this" here. I can
> imagine situations where you want to perform any of these uOps
> eagerly, and situations where you want to wait for the predicate.

"It" and "this" means their proposed design.

And yes, that was my point. The cheap ALU uOps take 1 cycle,
and if a design can do back-to-back forwarding and execution
then there is no advantage gained for their eager execution.

> I think that in this case the compiler can better (than for
> predication-vs-branch) decide which one to use, but the decision is
> microarchitecture-dependent, so it may still be better to leave it to
> hardware.
>
>> Just making OoO predication work at all looks pretty challenging to me.
>
> OoO ARM CPUs exist.

Most papers I have seen who write about ARMv7 OoO predication
uArch says something to the effect of how difficult it is.
I haven't been able to find any actual details.

Re: Ill-advised use of CMOVE

<6c37df11-28cc-42a6-a509-2d807ce4dcf8n@googlegroups.com>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25365&group=comp.arch#25365

 copy link   Newsgroups: comp.arch
X-Received: by 2002:a37:843:0:b0:6a0:47d2:cdc5 with SMTP id 64-20020a370843000000b006a047d2cdc5mr13567155qki.689.1652736054109;
Mon, 16 May 2022 14:20:54 -0700 (PDT)
X-Received: by 2002:a05:6808:11ca:b0:2d9:a01a:488b with SMTP id
p10-20020a05680811ca00b002d9a01a488bmr14685066oiv.214.1652736053913; Mon, 16
May 2022 14:20:53 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 16 May 2022 14:20:53 -0700 (PDT)
In-Reply-To: <atygK.48016$JSxf.35968@fx11.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:ed92:50fe:820d:b048;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:ed92:50fe:820d:b048
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t58n6r$1riu$1@gioia.aioe.org>
<t5e1ek$s7c$1@dont-email.me> <2022May10.194427@mips.complang.tuwien.ac.at>
<0555194d-edc8-48fb-b304-7f78d62255d3n@googlegroups.com> <hlPeK.4263$pqKf.2583@fx12.iad>
<2022May14.100411@mips.complang.tuwien.ac.at> <IZ8gK.6876$j0D5.5549@fx09.iad>
<2022May15.194522@mips.complang.tuwien.ac.at> <atygK.48016$JSxf.35968@fx11.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6c37df11-28cc-42a6-a509-2d807ce4dcf8n@googlegroups.com>
Subject: Re: Ill-advised use of CMOVE
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 16 May 2022 21:20:54 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 1974
 by: MitchAlsup - Mon, 16 May 2022 21:20 UTC

On Monday, May 16, 2022 at 3:27:53 PM UTC-5, EricP wrote:
> Anton Ertl wrote:

> > OoO ARM CPUs exist.
> Most papers I have seen who write about ARMv7 OoO predication
> uArch says something to the effect of how difficult it is.
> I haven't been able to find any actual details.
<
Perhaps because their predication model was not well though out !!
<
This does not mean that all forms of predication are difficult in all
kinds of instruction queue devices and mechanisms.

Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)

<6985cff9-13dd-4b59-8a22-0d418f156c12n@googlegroups.com>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25378&group=comp.arch#25378

 copy link   Newsgroups: comp.arch
X-Received: by 2002:ac8:5b96:0:b0:2f8:af64:a0bd with SMTP id a22-20020ac85b96000000b002f8af64a0bdmr7650325qta.463.1652773600329;
Tue, 17 May 2022 00:46:40 -0700 (PDT)
X-Received: by 2002:a4a:b912:0:b0:33a:4543:8608 with SMTP id
x18-20020a4ab912000000b0033a45438608mr7543164ooo.94.1652773600032; Tue, 17
May 2022 00:46:40 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 17 May 2022 00:46:39 -0700 (PDT)
In-Reply-To: <677f55fc-8833-4830-bacc-9c3925c7b1adn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t5t8r4$qdi$1@dont-email.me>
<2022May16.180830@mips.complang.tuwien.ac.at> <t5u35g$fo4$1@dont-email.me>
<t5u5c4$2n6r$1@gal.iecc.com> <677f55fc-8833-4830-bacc-9c3925c7b1adn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6985cff9-13dd-4b59-8a22-0d418f156c12n@googlegroups.com>
Subject: Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)
From: already5...@yahoo.com (Michael S)
Injection-Date: Tue, 17 May 2022 07:46:40 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 1341
 by: Michael S - Tue, 17 May 2022 07:46 UTC

test

Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)

<69eee9e4-1f34-4994-88c7-65d756baf30an@googlegroups.com>

 copy mid

https://www.novabbs.com/devel/article-flat.php?id=25380&group=comp.arch#25380

 copy link   Newsgroups: comp.arch
X-Received: by 2002:ac8:7e94:0:b0:2f3:ce2b:c320 with SMTP id w20-20020ac87e94000000b002f3ce2bc320mr18480124qtj.670.1652774831490;
Tue, 17 May 2022 01:07:11 -0700 (PDT)
X-Received: by 2002:aca:e155:0:b0:325:6d76:da4b with SMTP id
y82-20020acae155000000b003256d76da4bmr10333132oig.125.1652774831274; Tue, 17
May 2022 01:07:11 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 17 May 2022 01:07:11 -0700 (PDT)
In-Reply-To: <6985cff9-13dd-4b59-8a22-0d418f156c12n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <jwv1qx4xe9z.fsf-monnier+comp.arch@gnu.org> <t5t8r4$qdi$1@dont-email.me>
<2022May16.180830@mips.complang.tuwien.ac.at> <t5u35g$fo4$1@dont-email.me>
<t5u5c4$2n6r$1@gal.iecc.com> <677f55fc-8833-4830-bacc-9c3925c7b1adn@googlegroups.com>
<6985cff9-13dd-4b59-8a22-0d418f156c12n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <69eee9e4-1f34-4994-88c7-65d756baf30an@googlegroups.com>
Subject: Re: binary search vs. hash tables (was: Ill-advised use of CMOVE)
From: already5...@yahoo.com (Michael S)
Injection-Date: Tue, 17 May 2022 08:07:11 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 1403
 by: Michael S - Tue, 17 May 2022 08:07 UTC

test

Pages:123456
server_pubkey.txt

rocksolid light 0.9.7
clearnet tor