devel / comp.arch / Re: Branch prediction hints

To quote the POWER9 User Manual:

# The POWER9 core normally ignores any software that attempts to
# override the dynamic branch prediction by setting the “a” bit
# in the BO field. This is done because historically programmers
# and compilers have made poor choices for setting the “a” bit,
# which limited the performance of codes where the hardware can
# do a superior job of predicting the branches.

Having read this: Are branching hints actually useful today?

I could see some use in an "almost never used" hint for branches
for fatal error messages, maybe.

> Having read this: Are branching hints actually useful today?

I had the impression that since the CPU needs to predict the address of
the next fetch during the current fetch, there just isn't much room for
using these branching hints: they'd presumably be most useful when the
branch is not yet in the branch prediction table, but in that case we
also don't yet have the branch instruction itself to look at its bits.

Stefan
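
To illustrate the timing argument above, here is a minimal sketch of a toy direction predictor. It assumes a direct-mapped table of 2-bit saturating counters indexed by fetch address; the table size, the index function, and all names are illustrative assumptions, not any real core's design. The point is that the lookup sees only the PC, so a hint bit inside a not-yet-fetched instruction has no way to influence it.

```c
#include <stdbool.h>
#include <stdint.h>

enum { TABLE_BITS = 10, TABLE_SIZE = 1 << TABLE_BITS };

/* 2-bit saturating counters: 0 and 1 predict not-taken, 2 and 3 predict taken.
 * Static storage starts at zero, i.e. "strongly not-taken". */
static uint8_t counters[TABLE_SIZE];

/* Hypothetical index function: drop the low 2 bits, keep TABLE_BITS bits. */
static unsigned index_of(uint64_t pc)
{
    return (unsigned)((pc >> 2) & (TABLE_SIZE - 1));
}

/* The prediction depends on the fetch address alone, not on instruction bits. */
bool predict_taken(uint64_t pc)
{
    return counters[index_of(pc)] >= 2;
}

/* Called once the branch resolves: nudge the counter toward the outcome. */
void train(uint64_t pc, bool taken)
{
    uint8_t *c = &counters[index_of(pc)];
    if (taken) {
        if (*c < 3)
            (*c)++;
    } else {
        if (*c > 0)
            (*c)--;
    }
}
```

In this model a static hint could only matter on a table miss, and by then the instruction has not been decoded yet, which is the objection raised above.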

Re: Branch prediction hints

by: MitchAlsup - Sat, 22 May 2021 22:58 UTC

On Saturday, May 22, 2021 at 5:28:51 PM UTC-5, Thomas Koenig wrote:
> To quote the POWER9 User Manual:
>
> # The POWER9 core normally ignores any software that attempts to
> # override the dynamic branch prediction by setting the “a” bit
> # in the BO field. This is done because historically programmers
> # and compilers have made poor choices for setting the “a” bit,
> # which limited the performance of codes where the hardware can
> # do a superior job of predicting the branches.
>
> Having read this: Are branching hints actually useful today?
<
Branch hints might (MIGHT) have been useful when predictors were
in the 90% accuracy range. Hints are a regrettable waste of entropy
with predictors in the 98% range.
>
> I could see some use in a "almost never used" hint for branches
> for fatal error messages, maybe.
<
Over in the unpredictable branch category, predication and conditional
move have made hints even less valuable than they originally were.
<
Predictors are very accurate on the use-once branches one finds around
initialization codes.
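
As a rough illustration of why the headroom for hints shrinks as predictors improve, here is a back-of-the-envelope sketch. The 20-cycle misprediction penalty used in the note below is an assumed, illustrative figure, not a measurement of any particular core, and the function name is made up for this example.

```c
/* Expected stall cycles per branch, given predictor accuracy (0..1)
 * and an assumed fixed misprediction penalty in cycles. */
double stall_cycles_per_branch(double accuracy, double penalty_cycles)
{
    return (1.0 - accuracy) * penalty_cycles;
}
```

With the assumed 20-cycle penalty, 90% accuracy costs about 2 cycles per branch and 98% about 0.4, so even a perfect hint could recover at most the remaining 0.4.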

Re: Branch prediction hints

by: Ivan Godard - Sun, 23 May 2021 00:27 UTC

On 5/22/2021 3:28 PM, Thomas Koenig wrote:
> To quote the POWER9 User Manual:
>
> # The POWER9 core normally ignores any software that attempts to
> # override the dynamic branch prediction by setting the “a” bit
> # in the BO field. This is done because historically programmers
> # and compilers have made poor choices for setting the “a” bit,
> # which limited the performance of codes where the hardware can
> # do a superior job of predicting the branches.
>
> Having read this: Are branching hints actually useful today?
>
> I could see some use in a "almost never used" hint for branches
> for fatal error messages, maybe.
>

Hints are not useful, but the same logic that generates hints can be
used to swap the sense of conditional branches so that the expected case
is the not-taken one, i.e. fall through. This doesn't buy you anything
in latency over a good predictor, but does buy you denser code working
sets and hence reduced I$ contention and bandwidth demand.
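
With GCC and Clang, one source-level way to feed this layout logic is the __builtin_expect builtin. The sketch below is only an illustration: the likely/unlikely macros and the sum_bytes function are names chosen here, and whether the compiler actually moves the rare path out of the fall-through sequence depends on the compiler and the optimization level.

```c
#include <stddef.h>

/* Common wrappers around the GCC/Clang builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sums the buffer; returns -1 if the pointer is NULL. */
long sum_bytes(const unsigned char *buf, size_t len)
{
    if (unlikely(buf == NULL))
        return -1;              /* rare path: a candidate for out-of-line placement */

    long total = 0;             /* expected path: intended to be the fall-through */
    for (size_t i = 0; i < len; i++)
        total += buf[i];
    return total;
}
```

The annotation does not have to reach the hardware as a hint bit; it can pay off purely through code placement, as described above.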

Re: Branch prediction hints

by: MitchAlsup - Sun, 23 May 2021 02:30 UTC

On Saturday, May 22, 2021 at 7:27:03 PM UTC-5, Ivan Godard wrote:
> On 5/22/2021 3:28 PM, Thomas Koenig wrote:
> > To quote the POWER9 User Manual:
> >
> > # The POWER9 core normally ignores any software that attempts to
> > # override the dynamic branch prediction by setting the “a” bit
> > # in the BO field. This is done because historically programmers
> > # and compilers have made poor choices for setting the “a” bit,
> > # which limited the performance of codes where the hardware can
> > # do a superior job of predicting the branches.
> >
> > Having read this: Are branching hints actually useful today?
> >
> > I could see some use in a "almost never used" hint for branches
> > for fatal error messages, maybe.
> >
> Hints are not useful, but the same logic that generates hints can be
> used to swap the sense of conditional branches so that the expected case
> is the not-taken one, i.e. fall through. This doesn't buy you anything
> in latency over a good predictor, but does buy you denser code working
> sets and hence reduced I$ contention and bandwidth demand.
<
The packetizer does this automagically. Packets are built in the "observed" direction.

Re: Branch prediction hints

by: MitchAlsup - Sun, 23 May 2021 02:31 UTC

On Saturday, May 22, 2021 at 9:30:18 PM UTC-5, MitchAlsup wrote:
> On Saturday, May 22, 2021 at 7:27:03 PM UTC-5, Ivan Godard wrote:
> > On 5/22/2021 3:28 PM, Thomas Koenig wrote:
> > > To quote the POWER9 User Manual:
> > >
> > > # The POWER9 core normally ignores any software that attempts to
> > > # override the dynamic branch prediction by setting the “a” bit
> > > # in the BO field. This is done because historically programmers
> > > # and compilers have made poor choices for setting the “a” bit,
> > > # which limited the performance of codes where the hardware can
> > > # do a superior job of predicting the branches.
> > >
> > > Having read this: Are branching hints actually useful today?
> > >
> > > I could see some use in a "almost never used" hint for branches
> > > for fatal error messages, maybe.
> > >
> > Hints are not useful, but the same logic that generates hints can be
> > used to swap the sense of conditional branches so that the expected case
> > is the not-taken one, i.e. fall through. This doesn't buy you anything
> > in latency over a good predictor, but does buy you denser code working
> > sets and hence reduced I$ contention and bandwidth demand.
> <
> The packetizer does this automagically. Packets are built in the "observed" direction.
<
Not only in the "as observed" direction, but the branch to get there has zero
cost!

Re: Branch prediction hints

by: BGB - Sun, 23 May 2021 04:50 UTC

On 5/22/2021 5:28 PM, Thomas Koenig wrote:
> To quote the POWER9 User Manual:
>
> # The POWER9 core normally ignores any software that attempts to
> # override the dynamic branch prediction by setting the “a” bit
> # in the BO field. This is done because historically programmers
> # and compilers have made poor choices for setting the “a” bit,
> # which limited the performance of codes where the hardware can
> # do a superior job of predicting the branches.
>
> Having read this: Are branching hints actually useful today?
>
> I could see some use in a "almost never used" hint for branches
> for fatal error messages, maybe.
>

Scenario 1:
Core is too cheap to do branch prediction:
Branch hints are useless.
Core only does a fixed prediction with no context:
Maybe relevant.
Core does branch prediction, has context:
This is useless.

Predictable branches:
A hardware branch predictor can predict them fairly easily, so a hint is useless.

Unpredictable branches:
Can't be predicted either way, so a hint is useless.

So, general leaning:
Branch direction hints are "kinda useless"...

Never mind that my ISA has a few encodings which could potentially be
interpreted this way; in my defense, these encodings arrived as a
historical accident (predicated ops were added on after branches already
existed, so some redundant encodings appeared, ...).

It is likely I might reclaim some of this space eventually and use it
for something else (maybe more space for PrWEX ops?...).

As noted, some encodings technically exist (for Disp20 branches), but I
don't really consider them "valid":
BSR?T / BSR?F (1);
BT?T / BT?F / BF?T / BF?F;
WEX encoded branch ops (2).

*1: These operations "actually work", but predicated subroutine calls
aren't really an operation which "makes sense". So they fall into a sort of
"invalid de-facto because the operation itself is kinda absurd" category.

*2: Previously, these had been reclaimed as the original form of the
"jumbo" encoding, but then I reorganized some stuff and came up with a
"slightly less awful" encoding for Jumbo ops (at the expense of the
original Op48 space), which leaves these encodings ambiguous.

Op48: Could re-add with new encodings, but isn't likely to be useful
enough to offset its cost if re-added.

More PrWEX? It is possible, but there isn't currently anything meaningful
to put there. The parts of the ISA which can't be encoded via PrWEX
either are not valid within the current ISA semantics (eg: Load/Store
with Displacement), or would not be able to fit into the encoding space
(Imm16 ops).

Some of these cases will either trigger an invalid opcode exception, or
trigger behaviors which are effectively undefined.

Then there are also encodings which are semantically redundant:
BT / BRA?T, BF / BRA?F
Where I could change this, but there is not currently any way to
eliminate this redundancy without breaking binary compatibility with
existing code, so "I may be stuck with them".

Well, and then there are the Jumbo-Branch encodings, which are "also
invalid", but would at this point most likely be used as part of an Op64
space (while there is now a "BRA/BSR Abs48" encoding, it doesn't count
here, because it is elsewhere in the encoding space; similar also goes
for "BRA Disp33s").

And, ironically enough, neither "BT Abs48" nor "BRA?T Abs48" can be
encoded, but I can't currently think of many use cases where one
"really needs" a conditional branch to an absolute address.

...

Granted, it seems like it is almost inevitable that an ISA will be
"ugly" or "inefficient" in one area or another...
It is seemingly more just sorta shuffling stuff around to try to get
something "tolerable".
... And eventually, the ISA either rams into some impassable wall, or it
turns into something resembling x86 ...

Re: Branch prediction hints

by: Ivan Godard - Sun, 23 May 2021 06:24 UTC

On 5/22/2021 9:50 PM, BGB wrote:
> On 5/22/2021 5:28 PM, Thomas Koenig wrote:
>> To quote the POWER9 User Manual:
>>
>> # The POWER9 core normally ignores any software that attempts to
>> # override the dynamic branch prediction by setting the “a” bit
>> # in the BO field. This is done because historically programmers
>> # and compilers have made poor choices for setting the “a” bit,
>> # which limited the performance of codes where the hardware can
>> # do a superior job of predicting the branches.
>>
>> Having read this: Are branching hints actually useful today?
>>
>> I could see some use in a "almost never used" hint for branches
>> for fatal error messages, maybe.
>>
>
> Scenario 1:
> Core is too cheap to do branch prediction:
>     Branch hints are useless.
> Core only does a fixed prediction with no context:
>     Maybe relevant.
> Core does branch prediction, has context:
>     This is useless.
>
> Predictable branches:
> A hardware branch predictor can predict them fairly easily, this is
> useless.
>
> Unpredictable branches:
> Can't be predicted either way, this is useless.
>
> So, general leaning:
> Branch direction hints are "kinda useless"...
>
>
>
>
> Nevermind if my ISA has a few encodings which could potentially be
> interpreted this way, in my defense, these encodings arrived as a
> historical accident (predicated ops were added on after branches already
> existed, so some redundant encodings appeared, ...).
>
> It is likely I might reclaim some of this space eventually and use it
> for something else (maybe more space for PrWEX ops?...).
>
> As noted, some encodings technically exist (for Disp20 branches), but I
> don't really consider them "valid":
> BSR?T / BSR?F (1);
> BT?T / BT?F / BF?T / BF?F;
> WEX encoded branch ops (2).
>
> *1: These operations "actually work", but predicated subroutine calls
> aren't really an operation which "makes sense". So, fall into a sort of
> "invalid de-facto because the operation itself is kinda absurd" category.

Predicated calls are common in if-converted code. Of course, if you are
doing hardware bundle creation as in Mitch's then you don't need static
predication of any form.

Re: Branch prediction hints

by: BGB - Sun, 23 May 2021 08:33 UTC

On 5/23/2021 1:24 AM, Ivan Godard wrote:
> On 5/22/2021 9:50 PM, BGB wrote:
>> On 5/22/2021 5:28 PM, Thomas Koenig wrote:
>>> To quote the POWER9 User Manual:
>>>
>>> # The POWER9 core normally ignores any software that attempts to
>>> # override the dynamic branch prediction by setting the “a” bit
>>> # in the BO field. This is done because historically programmers
>>> # and compilers have made poor choices for setting the “a” bit,
>>> # which limited the performance of codes where the hardware can
>>> # do a superior job of predicting the branches.
>>>
>>> Having read this: Are branching hints actually useful today?
>>>
>>> I could see some use in a "almost never used" hint for branches
>>> for fatal error messages, maybe.
>>>
>>
>> Scenario 1:
>>    Core is too cheap to do branch prediction:
>>      Branch hints are useless.
>>    Core only does a fixed prediction with no context:
>>      Maybe relevant.
>>    Core does branch prediction, has context:
>>      This is useless.
>>
>> Predictable branches:
>> A hardware branch predictor can predict them fairly easily, this is
>> useless.
>>
>> Unpredictable branches:
>> Can't be predicted either way, this is useless.
>>
>> So, general leaning:
>> Branch direction hints are "kinda useless"...
>>
>>
>>
>>
>> Nevermind if my ISA has a few encodings which could potentially be
>> interpreted this way, in my defense, these encodings arrived as a
>> historical accident (predicated ops were added on after branches
>> already existed, so some redundant encodings appeared, ...).
>>
>> It is likely I might reclaim some of this space eventually and use it
>> for something else (maybe more space for PrWEX ops?...).
>>
>> As noted, some encodings technically exist (for Disp20 branches), but
>> I don't really consider them "valid":
>>    BSR?T / BSR?F (1);
>>    BT?T / BT?F / BF?T / BF?F;
>>    WEX encoded branch ops (2).
>>
>> *1: These operations "actually work", but predicated subroutine calls
>> aren't really an operation which "makes sense". So, fall into a sort
>> of "invalid de-facto because the operation itself is kinda absurd"
>> category.
>
> Predicated calls are common in if-converted code. Of course, if you are
> doing hardware bundle creation as in Mitch's then you don't need static
> predication of any form.
>

Bundle creation in my case is explicit and handled by the compiler (or
ASM programmer).

But, I am only dealing with predication for simple branches, eg:
if(x>0)
    x--;
Or:
if(x>0)
    z=x-5;
else
    z=x+13;
...

But, not for anything much more than a few instructions, or anything
involving a function call, ..., since presumably in this case a branch
is cheaper (and the core isn't really wide nor has enough registers to
really justify any IA-64 style modulo scheduling trickery, ...).

Also, the state of SR.T would not be preserved across a function call,
so any logic following the function's return could not be predicated.
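
For readers unfamiliar with the idea, the two-way example above can be emulated in plain C without a branch. This is only a sketch of the general technique (compute both sides, then select), not the SR.T-based encoding of the ISA discussed here; the function name and the mask trick are choices made for this illustration.

```c
#include <stdint.h>

/* Branch-free form of: if (x > 0) z = x - 5; else z = x + 13;
 * Assumes x is far enough from the int32_t limits that neither
 * x - 5 nor x + 13 overflows. */
int32_t select_z(int32_t x)
{
    int32_t t1 = x - 5;                 /* "true" side, computed unconditionally  */
    int32_t t2 = x + 13;                /* "false" side, computed unconditionally */
    int32_t mask = -(int32_t)(x > 0);   /* all ones if x > 0, otherwise zero      */
    return (t1 & mask) | (t2 & ~mask);  /* pick one result without branching      */
}
```

Both sides always execute, which is why this only pays off for a few cheap instructions, as noted above.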

Re: Branch prediction hints

by: Ivan Godard - Sun, 23 May 2021 10:26 UTC

On 5/23/2021 1:33 AM, BGB wrote:
> On 5/23/2021 1:24 AM, Ivan Godard wrote:
>> On 5/22/2021 9:50 PM, BGB wrote:
>>> On 5/22/2021 5:28 PM, Thomas Koenig wrote:
>>>> To quote the POWER9 User Manual:
>>>>
>>>> # The POWER9 core normally ignores any software that attempts to
>>>> # override the dynamic branch prediction by setting the “a” bit
>>>> # in the BO field. This is done because historically programmers
>>>> # and compilers have made poor choices for setting the “a” bit,
>>>> # which limited the performance of codes where the hardware can
>>>> # do a superior job of predicting the branches.
>>>>
>>>> Having read this: Are branching hints actually useful today?
>>>>
>>>> I could see some use in a "almost never used" hint for branches
>>>> for fatal error messages, maybe.
>>>>
>>>
>>> Scenario 1:
>>>    Core is too cheap to do branch prediction:
>>>      Branch hints are useless.
>>>    Core only does a fixed prediction with no context:
>>>      Maybe relevant.
>>>    Core does branch prediction, has context:
>>>      This is useless.
>>>
>>> Predictable branches:
>>> A hardware branch predictor can predict them fairly easily, this is
>>> useless.
>>>
>>> Unpredictable branches:
>>> Can't be predicted either way, this is useless.
>>>
>>> So, general leaning:
>>> Branch direction hints are "kinda useless"...
>>>
>>>
>>>
>>>
>>> Nevermind if my ISA has a few encodings which could potentially be
>>> interpreted this way, in my defense, these encodings arrived as a
>>> historical accident (predicated ops were added on after branches
>>> already existed, so some redundant encodings appeared, ...).
>>>
>>> It is likely I might reclaim some of this space eventually and use it
>>> for something else (maybe more space for PrWEX ops?...).
>>>
>>> As noted, some encodings technically exist (for Disp20 branches), but
>>> I don't really consider them "valid":
>>>    BSR?T / BSR?F (1);
>>>    BT?T / BT?F / BF?T / BF?F;
>>>    WEX encoded branch ops (2).
>>>
>>> *1: These operations "actually work", but predicated subroutine calls
>>> aren't really an operation which "makes sense". So, fall into a sort
>>> of "invalid de-facto because the operation itself is kinda absurd"
>>> category.
>>
>> Predicated calls are common in if-converted code. Of course, if you
>> are doing hardware bundle creation as in Mitch's then you don't need
>> static predication of any form.
>>
>
> Bundle creation in my case is explicit and handled by the compiler (or
> ASM programmer).
>
>
> But, I am only dealing with predication for simple branches, eg:
> if(x>0)
>     x--;
> Or:
> if(x>0)
>     z=x-5;
> else
>     z=x+13;
> ...
>

consider:
    if(x>0)
        z=x-5;
    else
        z=foo(x);
if you have exceptions under control, this can become:
    t1=x-5; t2=(x>0)?nil:foo(x);
    z=x>0?t1:t2;
the architectural challenge is how to implement "t2=(x>0)?nil:foo(x);",
i.e. predicated calls.
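
A small C model of the transformation above, to make the semantics concrete. C cannot express a true predicated call, so the conditional expression here only models the behaviour (the callee is skipped when the predicate is false); foo is a hypothetical stand-in with a visible side effect, and 0 stands in for "nil".

```c
static int foo_calls = 0;   /* counts how often foo really runs */

/* Hypothetical callee with a side effect. */
static int foo(int x)
{
    foo_calls++;
    return x / 2;
}

/* Original branching form. */
int z_branchy(int x)
{
    if (x > 0)
        return x - 5;
    return foo(x);
}

/* If-converted form. The subtraction is now speculated for every x, so this
 * sketch assumes x - 5 does not overflow; that is a small instance of the
 * "exceptions under control" caveat. */
int z_if_converted(int x)
{
    int t1 = x - 5;                   /* computed unconditionally           */
    int t2 = (x > 0) ? 0 : foo(x);    /* models the predicated call         */
    return (x > 0) ? t1 : t2;         /* final select between the two sides */
}
```

Both forms return the same value, and foo_calls only increases when x <= 0, which is the property a hardware predicated call has to preserve.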

> But, not for anything much more than a few instructions, or anything
> involving a function call, ..., since presumably in this case a branch
> is cheaper (and the core isn't really wide nor has enough registers to
> really justify any IA-64 style modulo scheduling trickery, ...).

That's a problem with any if-conversion or other speculative
scheduling: you have to have enough FUs to get useful parallelism.
There's a sweet spot in speculative width: too narrow and you lose the
benefit; too wide and the power and area cost more than the overhead of OOO.

> Also, the state of SR.T would not be preserved across a function call,
> so any logic following the function's return could not be predicated.

This can be architected around; ours is not the only possible way to do it.

Thomas Koenig wrote:
> To quote the POWER9 User Manual:
>
> # The POWER9 core normally ignores any software that attempts to
> # override the dynamic branch prediction by setting the “a” bit
> # in the BO field. This is done because historically programmers
> # and compilers have made poor choices for setting the “a” bit,
> # which limited the performance of codes where the hardware can
> # do a superior job of predicting the branches.
>
> Having read this: Are branching hints actually useful today?
>
> I could see some use in a "almost never used" hint for branches
> for fatal error messages, maybe.

I think gcc already has an option to annotate branches that are very
unlikely to be taken; if that could be encoded as a hint to the CPU, the
branch could bypass the BTB entirely, thereby saving branch buffer space.

In order to retrofit it to existing architectures it would however
probably need to be as a separate hint opcode, previously defined as a
NOP, so making the code slower (at least by the time needed to decode
that NOP) on all previous cpu versions.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Thomas Koenig wrote:
> To quote the POWER9 User Manual:
>
> # The POWER9 core normally ignores any software that attempts to
> # override the dynamic branch prediction by setting the “a” bit
> # in the BO field. This is done because historically programmers
> # and compilers have made poor choices for setting the “a” bit,
> # which limited the performance of codes where the hardware can
> # do a superior job of predicting the branches.
>
> Having read this: Are branching hints actually useful today?
>
> I could see some use in a "almost never used" hint for branches
> for fatal error messages, maybe.

There is a case for a "predict never" hint where it doesn't matter
what this conditional branch did the last million times it executed,
always predict not-taken.

In a spinlock which normally tests the lock condition with
a load before attempting the atomic sequence,
you never want to speculatively execute into the atomic sequence.
At a minimum it could cause ping-pong'ing the cache lines.
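
A minimal sketch of the test-and-test-and-set pattern being described, using C11 atomics. The type and function names are made up for this example, and C has no portable way to express a "predict never" hint, so the comment only marks the branch such a hint would apply to.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool locked;
} tts_lock;

void tts_init(tts_lock *l)
{
    atomic_init(&l->locked, false);
}

/* One acquisition attempt: returns true if the lock was taken. */
bool tts_try_lock(tts_lock *l)
{
    /* Test: a plain load, so waiters can keep the cache line shared. */
    if (atomic_load_explicit(&l->locked, memory_order_relaxed))
        return false;
    /* The branch above guards the atomic sequence below; it is the one a
     * "predict never" hint would stop the core from speculating past. */
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

void tts_acquire(tts_lock *l)
{
    while (!tts_try_lock(l)) {
        /* spin */
    }
}

void tts_release(tts_lock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The exchange needs the cache line in exclusive state, which is why running it speculatively while the lock is still held can bounce the line between cores.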

With Hardware Transactional Memory (HTM), reading a memory location
even speculatively might abort another processor's active transaction,
so you don't want to even touch data memory without explicit permission,
not even to prefetch any load or store addresses.

If you don't have a hint to explicitly block speculation at the
branch, then the design would have to use more complicated and
probably error-prone dynamic logic to "deduce" what to do.

Without an explicit "predict never" hint, in the case of HTM
it looked to me like speculation might have to be shut off
while a transaction was in progress, because there is no way
to deduce which loads are guarded by a particular condition.
At a minimum, in an HTM it looked like no loads could be performed while
any prior branch was unresolved, not even prefetched into cache
(or maybe that's a good thing, I don't know).

Predict-never can also be used for rarely executed error handling code.

Predict-always is the complementary case for branching around
rarely executed error handling code that one wants inline,
and it doesn't matter what it did the last million times it executed.

EricP <ThatWouldBeTelling@thevillage.com> schrieb:

[stuff on locks snipped]
> Thomas Koenig wrote:

>> I could see some use in a "almost never used" hint for branches
>> for fatal error messages, maybe.
>
> There is a case for a "predict never" hint where it doesn't matter
> what this conditional branch did the last million times it executed,
> always predict not-taken.
>
> In a spinlock which normally test the lock condition with
> a load before attempting the atomic sequence,
> you never want to speculatively execute into the atomic sequence.
> At a minimum it could cause ping-pong'ing the cache lines.

[HTM snipped]

> Predict-never can also be used for rarely executed error handling code.

> Predict-always is the complementary case for branching around
> rarely executed error handling code that one wants inline,
> and it doesn't matter what it did the last million times it executed.

Compilers already split functions into "hot" and "cold" parts
for better cache locality. Code that the programmer assures the
compiler will very rarely be taken is already put into the cold
section.

So, I take it from your article that a "Never predict this branch
to be taken" bit could be a reasonable thing to include in an ISA.
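
For reference, one way a programmer gives that assurance in GCC or Clang is the `cold` function attribute. The sketch below assumes a GCC-compatible compiler; `report_bad_input` and `half_of` are hypothetical names used only for illustration.

```c
#include <stdio.h>

/* GCC/Clang extension: tells the compiler this function is unlikely to run,
   so it may be optimized for size and placed away from the hot code. */
__attribute__((cold, noinline))
static int report_bad_input(int value)
{
    fprintf(stderr, "bad input: %d\n", value);
    return -1;
}

/* Hypothetical example function: halves a non-negative value. */
static int half_of(int value)
{
    if (value < 0)                      /* a path leading to a cold call is treated as unlikely */
        return report_bad_input(value);
    return value / 2;                   /* the hot path */
}
```

This affects code layout and static probability estimates inside the compiler; whether anything reaches the branch instruction as a hint bit depends on the target.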

Re: Branch prediction hints

<03uqI.447319$2A5.348060@fx45.iad>

https://www.novabbs.com/devel/article-flat.php?id=17070&group=comp.arch#17070

Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!news.uzoreto.com!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx45.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <Z7tqI.61928$N%1.35599@fx28.iad> <s8dnjp$1d1$1@newsreader4.netcologne.de>
In-Reply-To: <s8dnjp$1d1$1@newsreader4.netcologne.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 56
Message-ID: <03uqI.447319$2A5.348060@fx45.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sun, 23 May 2021 14:57:00 UTC
Date: Sun, 23 May 2021 10:56:29 -0400
X-Received-Bytes: 3154

by: EricP - Sun, 23 May 2021 14:56 UTC

Thomas Koenig wrote:
> EricP <ThatWouldBeTelling@thevillage.com> schrieb:
>
> [stuff on locks snipped]
>> Thomas Koenig wrote:
>
>>> I could see some use in a "almost never used" hint for branches
>>> for fatal error messages, maybe.
>> There is a case for a "predict never" hint where it doesn't matter
>> what this conditional branch did the last million times it executed,
>> always predict not-taken.
>>
>> In a spinlock which normally test the lock condition with
>> a load before attempting the atomic sequence,
>> you never want to speculatively execute into the atomic sequence.
>> At a minimum it could cause ping-pong'ing the cache lines.
>
> [HTM snipped]
>
>> Predict-never can also be used for rarely executed error handling code.
>
>> Predict-always is the complementary case for branching around
>> rarely executed error handling code that one wants inline,
>> and it doesn't matter what it did the last million times it executed.
>
> Compilers already split functions into "hot" and "cold" parts
> for better cache locality. Code that the programmer assures the
> compiler will very rarely be taken is already put into the cold
> section.

Yes, the case for a "predict-always" is weaker but there are times
when I want the error handler to sit right beside the error detector
so that the program counter indicates where the error occurred.

Also, symmetry is its own reward (entropy considerations be damned!).

> So, I take it from your article that a "Never predict this branch
> to be taken" bit could be a reasonable thing to include in an ISA.

I think so, or at least it makes a case for discussion.

This also crosses into Spectre territory. Spectre-safe code generation
could put the cold error handling code inline, and use a predict-never
hint to branch around the error handler to the hot code.
It doesn't use the branch predictor so it can't be mis-trained.

But note that the use cases I gave for "predict-never" were both
using the hints to achieve a kind of branch-fence effect to block
speculation while a branch condition was unresolved.

Alternatively (left field thought here) maybe a branch-fence instruction
specifically designed to do that job would be more generally useful.
(Not sure what that means... just thinking out loud.)

Re: Branch prediction hints

<f7ed098c-cf36-404b-b61b-f14732b978c9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17073&group=comp.arch#17073

X-Received: by 2002:a05:620a:11ba:: with SMTP id c26mr23090539qkk.497.1621785390535;
Sun, 23 May 2021 08:56:30 -0700 (PDT)
X-Received: by 2002:a05:6808:f94:: with SMTP id o20mr8015534oiw.30.1621785390312;
Sun, 23 May 2021 08:56:30 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 23 May 2021 08:56:30 -0700 (PDT)
In-Reply-To: <s8dakc$27r$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:bc69:35bc:a8f4:11b6;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:bc69:35bc:a8f4:11b6
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <s8cmv1$1e7$1@dont-email.me>
<s8csfm$172$1@dont-email.me> <s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f7ed098c-cf36-404b-b61b-f14732b978c9n@googlegroups.com>
Subject: Re: Branch prediction hints
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 23 May 2021 15:56:30 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

by: MitchAlsup - Sun, 23 May 2021 15:56 UTC

On Sunday, May 23, 2021 at 5:26:22 AM UTC-5, Ivan Godard wrote:
> On 5/23/2021 1:33 AM, BGB wrote:
> > On 5/23/2021 1:24 AM, Ivan Godard wrote:

> > Bundle creation in my case is explicit and handled by the compiler (or
> > ASM programmer).
> >
> >
> > But, I am only dealing with predication for simple branches, eg:
> > if(x>0)
> > x--;
> > Or:
> > if(x>0)
> > z=x-5;
> > else
> > z=x+13;
> > ...
PLT0 Rx,{1,1}
ADD Rz,Rx,#-1
....
PLT0 Rx,{2,10}
ADD Rz,Rx,#-5
ADD Rz,Rx,#13
> >
> consider:
> if(x>0)
> z=x-5;
> else
> z=foo(x);
<
PLT0 Rx,{4,1000}
ADD Rz,Rx,#-5
MOV R1,Rx
CALL foo
MOV Rz,R1
<
> if you have exceptions under control, this can become:
> t1=x-5;t2=(x>0)?nil:foo(x);
> z=x>0?t1:t2;
> the architectural challenge is how to implement "t2=(x>0)?nil:foo(x);",
> i.e. predicated calls.
<
You also have to predicate the argument setup and the result delivery.
<
> > But, not for anything much more than a few instructions, or anything
> > involving a function call, ..., since presumably in this case a branch
> > is cheaper (and the core isn't really wide nor has enough registers to
> > really justify any IA-64 style modular scheduling trickery, ...).
> That's a problem with any if-conversions or other speculative
> scheduling: you have to have enough FUs to get useful parallelism.
> There's a sweet spot in speculative width: too narrow and you lose the
> benefit; too wide and the power and area cost more than the overhead of OOO
<
The majority of the benefit has already accrued when you can predicate
as far as your FETCH width. So if you FETCH 4-wide, you get the majority
of the benefit by predicating at least 4 instructions. More than 2× this
distance and you should be using branches to avoid tracking instructions
that don't execute.
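
Read as a compiler heuristic, that rule of thumb might look like the sketch below. The enum, the function name, and the middle "either" band are one interpretation of the paragraph above, not something taken from any real compiler.

```c
#include <assert.h>

/* Hypothetical names for illustration. */
enum lowering { USE_PREDICATION, EITHER, USE_BRANCH };

/* guarded_insns: number of instructions under the condition.
   fetch_width:   instructions fetched per cycle. */
static enum lowering choose_lowering(unsigned guarded_insns, unsigned fetch_width)
{
    assert(fetch_width > 0);
    if (guarded_insns <= fetch_width)
        return USE_PREDICATION;          /* within one fetch: most of the benefit is here */
    if (guarded_insns <= 2 * fetch_width)
        return EITHER;                   /* between 1x and 2x: no strong claim either way */
    return USE_BRANCH;                   /* beyond 2x: avoid tracking ops that never execute */
}
```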
<
> > Also, the state of SR.T would not be preserved across a function call,
> > so any logic following the function's return could not be predicated.
> This can be architected around; ours is not the only possible way to do it.
<
If SR.T is in a preserved register, you just use PRED again after return.

Re: Branch prediction hints

<36626ffe-f5d8-4a62-af27-310684375561n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17074&group=comp.arch#17074

X-Received: by 2002:a37:b643:: with SMTP id g64mr25339418qkf.6.1621785495179;
Sun, 23 May 2021 08:58:15 -0700 (PDT)
X-Received: by 2002:a05:6830:4d0:: with SMTP id s16mr16067889otd.5.1621785494980;
Sun, 23 May 2021 08:58:14 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 23 May 2021 08:58:14 -0700 (PDT)
In-Reply-To: <s8dcbt$7f4$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:bc69:35bc:a8f4:11b6;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:bc69:35bc:a8f4:11b6
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <s8dcbt$7f4$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <36626ffe-f5d8-4a62-af27-310684375561n@googlegroups.com>
Subject: Re: Branch prediction hints
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 23 May 2021 15:58:15 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Sun, 23 May 2021 15:58 UTC

On Sunday, May 23, 2021 at 5:56:00 AM UTC-5, Terje Mathisen wrote:
> Thomas Koenig wrote:

> I think gcc already have an option to annotate branches that are very
> unlikely to be taken, if that could be encoded as a hint to the cpu it
> could totally bypass the BTB and thereby saving branch buffer space.
<
The fact that GCC can so annotate, does not increase the viability of
such annotations !
<
This thread is about whether the annotations are useful, and at what
scale they might be useful.
>
> In order to retrofit it to existing architectures it would however
> probably need to be as a separate hint opcode, previously defined as a
> NOP, so making the code slower (at least by the time needed to decode
> that NOP) on all previous cpu versions.
>
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: Branch prediction hints

<c42f2bb0-6920-44d2-8877-cff238443ca7n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17075&group=comp.arch#17075

X-Received: by 2002:a05:620a:574:: with SMTP id p20mr25740962qkp.70.1621786062562;
Sun, 23 May 2021 09:07:42 -0700 (PDT)
X-Received: by 2002:a05:6808:117:: with SMTP id b23mr8309548oie.7.1621786062344;
Sun, 23 May 2021 09:07:42 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 23 May 2021 09:07:42 -0700 (PDT)
In-Reply-To: <Z7tqI.61928$N%1.35599@fx28.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:bc69:35bc:a8f4:11b6;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:bc69:35bc:a8f4:11b6
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <Z7tqI.61928$N%1.35599@fx28.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c42f2bb0-6920-44d2-8877-cff238443ca7n@googlegroups.com>
Subject: Re: Branch prediction hints
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 23 May 2021 16:07:42 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Sun, 23 May 2021 16:07 UTC

On Sunday, May 23, 2021 at 8:54:03 AM UTC-5, EricP wrote:

> There is a case for a "predict never" hint where it doesn't matter
> what this conditional branch did the last million times it executed,
> always predict not-taken.
>
> In a spinlock which normally test the lock condition with
> a load before attempting the atomic sequence,
> you never want to speculatively execute into the atomic sequence.
> At a minimum it could cause ping-pong'ing the cache lines.
<
Yes, the classical test-and-test-and-set. But this only decreases
bus traffic from BigO(N^3) to BigO(N^2). There are ways to
decrease bus traffic to BigO(N+3).
>
> With Hardware Transactional Memory HTM reading a memory location
> even speculatively might abort another's processors active transaction,
> you don't want to even touch data memory without explicit permission,
> not even prefetching any load or store addresses.
<
This is one of the things WRONG about HTM.
<
In My 66000 there is an Exotic Synchronization Method (ESM) which is not
an HTM but can be used to create HTMs. In ESM, if the ATOMIC event has
reached a critical juncture (i.e., can complete), the CPUs reaching those
points gain the ability to NAK interference, allowing these CPUs to complete
the ATOMIC event, and making the interferers run slower!
>
> If you don't have a hint to explicitly block speculation at the
> branch then the design would have to use more complicated and
> probably error prone dynamic logic to "deduce" what to do.
<
You do not want Naked memory refs to be used to set up or complete
ATOMIC events. You need to "mark" their participation in the event
so the machine knows that such an event is going on from the outset.
>
> Without an explicit "predict never" hint, in the case of HTM
> this looked to me like that speculation might have to shut off
> while a transaction was in progress because there is no way
> to deduce which loads are guarded by a particular condition.
> At a minimum, in an HTM it looked like no loads performed while
> any prior branch was unresolved, not even prefetched into cache
> (or maybe that's a good thing, I don't know).
<
Should an ATOMIC event fail, the compiler needs to know that all
of the participating memory references are not viable containers
of data! And not ever use those units of stale data. The only use
that should be allowed is to print the values that failed.
>
> Predict-never can also be used for rarely executed error handling code.
>
> Predict-always is the complementary case for branching around
> rarely executed error handling code that one wants inline,
> and it doesn't matter what it did the last million times it executed.

Re: Branch prediction hints

<jwvfsyd1jio.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=17077&group=comp.arch#17077

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!4.us.feeder.erje.net!feeder.erje.net!news2.arglkargh.de!news.mixmin.net!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
Date: Sun, 23 May 2021 12:52:37 -0400
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <jwvfsyd1jio.fsf-monnier+comp.arch@gnu.org>
References: <s8c0j2$q5d$1@newsreader4.netcologne.de>
<s8dcbt$7f4$1@gioia.aioe.org>
<36626ffe-f5d8-4a62-af27-310684375561n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="c3c0b204b4c261b8b861b47a967870b7";
logging-data="12821"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/uTQINJv0h6Q+yTqhm9dgM"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Cancel-Lock: sha1:uI7avG2lNgdhb2ABcNrZis6H3QQ=
sha1:mow2WlYwWYJjDehPlk6aEWxx5us=

by: Stefan Monnier - Sun, 23 May 2021 16:52 UTC

>> I think gcc already have an option to annotate branches that are very
>> unlikely to be taken, if that could be encoded as a hint to the cpu it
>> could totally bypass the BTB and thereby saving branch buffer space.
> The fact that GCC can so annotate, does not increase the viability of
> such annotations !

Such annotations in the source code are most useful in order to improve
the generated code, not to add hints to branch instructions (I don't
know if GCC ever uses them to add hints to branch instructions, but
I know it uses them to guess which code is expected to be hot/cold).
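
A minimal sketch of such a source annotation, assuming GCC or Clang: `__builtin_expect` is the real builtin, while the `likely`/`unlikely` macro names are just a common convention and `sum` is a hypothetical example function.

```c
#include <stddef.h>

/* Conventional wrappers around the GCC/Clang builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sum an array, treating a NULL pointer as a rare error. */
static long sum(const int *v, size_t n)
{
    if (unlikely(v == NULL))    /* tells the compiler this arm is expected to be cold */
        return -1;

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

The annotation changes what the compiler assumes about the branch (block placement, which arm falls through); the sketch makes no claim about hint bits on the emitted branch instruction.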

Stefan

MitchAlsup wrote:
> On Sunday, May 23, 2021 at 5:56:00 AM UTC-5, Terje Mathisen wrote:
>> Thomas Koenig wrote:
>
>> I think gcc already have an option to annotate branches that are very
>> unlikely to be taken, if that could be encoded as a hint to the cpu it
>> could totally bypass the BTB and thereby saving branch buffer space.
> <
> The fact that GCC can so annotate, does not increase the viability of
> such annotations !
> <
> This thread is about whether the annotations are useful, and at what
> scale they might be useful.

We seem to once again be in violent agreement:

I generally don't like hint instructions, simply because I have seen far
too many instances where they hindered and _very_ few where they
actually helped.

See also what I wrote below:
>>
>> In order to retrofit it to existing architectures it would however
>> probably need to be as a separate hint opcode, previously defined as a
>> NOP, so making the code slower (at least by the time needed to decode
>> that NOP) on all previous cpu versions.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Branch prediction hints

<s8e60c$ca6$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17083&group=comp.arch#17083

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
Date: Sun, 23 May 2021 13:13:27 -0500
Organization: A noiseless patient Spider
Lines: 163
Message-ID: <s8e60c$ca6$1@dont-email.me>
References: <s8c0j2$q5d$1@newsreader4.netcologne.de>
<s8cmv1$1e7$1@dont-email.me> <s8csfm$172$1@dont-email.me>
<s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 23 May 2021 18:13:32 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="6c9cc4202b21f9c8a5dacf516cd4c7cd";
logging-data="12614"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+GnRSsp1hrD2tL469KmrMx"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.1
Cancel-Lock: sha1:9jfotj7lJzWUXi6HzhCcfI8pQvk=
In-Reply-To: <s8dakc$27r$1@dont-email.me>
Content-Language: en-US

by: BGB - Sun, 23 May 2021 18:13 UTC

On 5/23/2021 5:26 AM, Ivan Godard wrote:
> On 5/23/2021 1:33 AM, BGB wrote:
>> On 5/23/2021 1:24 AM, Ivan Godard wrote:
>>> On 5/22/2021 9:50 PM, BGB wrote:
>>>> On 5/22/2021 5:28 PM, Thomas Koenig wrote:
>>>>> To quote the POWER9 User Manual:
>>>>>
>>>>> # The POWER9 core normally ignores any software that attempts to
>>>>> # override the dynamic branch prediction by setting the “a” bit
>>>>> # in the BO field. This is done because historically programmers
>>>>> # and compilers have made poor choices for setting the “a” bit,
>>>>> # which limited the performance of codes where the hardware can
>>>>> # do a superior job of predicting the branches.
>>>>>
>>>>> Having read this: Are branching hints actually useful today?
>>>>>
>>>>> I could see some use in a "almost never used" hint for branches
>>>>> for fatal error messages, maybe.
>>>>>
>>>>
>>>> Scenario 1:
>>>>    Core is too cheap to do branch prediction:
>>>>      Branch hints are useless.
>>>>    Core only does a fixed prediction with no context:
>>>>      Maybe relevant.
>>>>    Core does branch prediction, has context:
>>>>      This is useless.
>>>>
>>>> Predictable branches:
>>>> A hardware branch predictor can predict them fairly easily, this is
>>>> useless.
>>>>
>>>> Unpredictable branches:
>>>> Can't be predicted either way, this is useless.
>>>>
>>>> So, general leaning:
>>>> Branch direction hints are "kinda useless"...
>>>>
>>>>
>>>>
>>>>
>>>> Nevermind if my ISA has a few encodings which could potentially be
>>>> interpreted this way, in my defense, these encodings arrived as a
>>>> historical accident (predicated ops were added on after branches
>>>> already existed, so some redundant encodings appeared, ...).
>>>>
>>>> It is likely I might reclaim some of this space eventually and use
>>>> it for something else (maybe more space for PrWEX ops?...).
>>>>
>>>> As noted, some encodings technically exist (for Disp20 branches),
>>>> but I don't really consider them "valid":
>>>>    BSR?T / BSR?F (1);
>>>>    BT?T / BT?F / BF?T / BF?F;
>>>>    WEX encoded branch ops (2).
>>>>
>>>> *1: These operations "actually work", but predicated subroutine
>>>> calls aren't really an operation which "makes sense". So, fall into
>>>> a sort of "invalid de-facto because the operation itself is kinda
>>>> absurd" category.
>>>
>>> Predicated calls are common in if-converted code. Of course, if you
>>> are doing hardware bundle creation as in Mitch's then you don't need
>>> static predication of any form.
>>>
>>
>> Bundle creation in my case is explicit and handled by the compiler (or
>> ASM programmer).
>>
>>
>> But, I am only dealing with predication for simple branches, eg:
>>    if(x>0)
>>      x--;
>> Or:
>>    if(x>0)
>>      z=x-5;
>>    else
>>      z=x+13;
>> ...
>>
>
> consider:
>     if(x>0)
>       z=x-5;
>     else
>       z=foo(x);
> if you have exceptions under control, this can become:
>     t1=x-5;t2=(x>0)?nil:foo(x);
>     z=x>0?t1:t2;
> the architectural challenge is how to implement "t2=(x>0)?nil:foo(x);",
> i.e. predicated calls.
>

This case could be done as-is with some register-use trickery, though
unclear how useful it would be in general.
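
To make the transformation concrete, here is a C-level sketch of the quoted example before and after if-conversion. C has no predicated call, so the guarded call is only approximated with a conditional expression; `foo` is a hypothetical stand-in, and 0 plays the role of "nil".

```c
/* Hypothetical stand-in for the callee in the example. */
static int foo(int x) { return x * 2; }

/* Branchy original. */
static int with_branch(int x)
{
    int z;
    if (x > 0)
        z = x - 5;
    else
        z = foo(x);
    return z;
}

/* If-converted shape: the cheap arm is computed unconditionally, the call is
   guarded by the predicate, and a select picks the result. */
static int if_converted(int x)
{
    int p = x > 0;

    /* Speculated arm. Unsigned arithmetic is used so the speculation cannot
       hit signed-overflow undefined behaviour when x is very negative; this
       is the C analogue of "having exceptions under control". The conversion
       back to int is implementation-defined but wraps on GCC and Clang. */
    int t1 = (int)((unsigned)x - 5u);

    int t2 = p ? 0 : foo(x);    /* stands in for the predicated call */
    return p ? t1 : t2;         /* select */
}
```

A compiler is free to turn the conditional expressions back into branches, so this shows the shape of the dataflow, not what any particular target would emit.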

>> But, not for anything much more than a few instructions, or anything
>> involving a function call, ..., since presumably in this case a branch
>> is cheaper (and the core isn't really wide nor has enough registers to
>> really justify any IA-64 style modular scheduling trickery, ...).
>
> That's a problem with any if-conversions or other speculative
> scheduling: you have to have enough FUs to get useful parallelism.
> There's a sweet spot in speculative width: too narrow and you lose the
> benefit; too wide and the power and area cost more than the overhead of OOO
>

Yeah. In my own uses, what I was able to leverage in hand-written ASM
seems to imply an optimal width of ~ 2 or 3. Any wider, and I run out of
stuff that could be run in parallel, or run out of registers to put
stuff in. Some modulo scheduling of loops was done manually in a few cases,
but is a rarity, and only sometimes pays off.

My C compiler still falls well short of this though...

It looks to me like making use of a 4 or 5 wide core would effectively
require a rather different approach:
   multiple predication registers
     with ops being able to select a src/dst predicate
   bigger register file
   ...

At this point, it would start to look more like an Itanium.

Though, something kinda like Itanium, but with say 64 GPRs and 4 or 8
predicate registers, and variable-length bundles, could make some sense
(goal being to still use 32-bit instruction words and still have a
"plausible" code density).

One possibility for predication is that ops are predicated by default,
just one of the predicate flags is hard-wired (to allow for "always
execute" ops), then ops fall into a mode:
   00: Scalar/End-Of-Bundle, Execute True
   01: Scalar/End-Of-Bundle, Execute False
   10: Wide, Execute True
   11: Wide, Execute False
With a predicate register (Source):
   00: Hard wired as True
   01: Predicate 1
   10: Predicate 2
   11: Predicate 3

Other ops would have 3 registers (18 bits), or 2 registers (12 bits).
Compare ops could have a 2-bit predicate-destination field.

It is possible that 01:00 (Never Execute) could be used to encode a
Jumbo Prefix or similar (or, maybe a few unconditional large-immed
instructions or similar).
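
A small C model of that scheme, to show how the two 2-bit fields would combine. The bit positions (mode in bits [1:0], predicate source in bits [3:2]) and all function names are assumptions made for the sketch; only the meaning of the field values comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field placement within the instruction word. */
static inline unsigned op_mode(uint32_t op) { return op & 3u; }
static inline unsigned op_pred(uint32_t op) { return (op >> 2) & 3u; }

/* Mode bit 1: 0 = Scalar/End-Of-Bundle, 1 = Wide. */
static bool op_is_wide(uint32_t op) { return (op_mode(op) & 2u) != 0; }

/* Mode bit 0: 0 = Execute True, 1 = Execute False.
   preds[0] is ignored because predicate source 00 is hard-wired to true. */
static bool op_executes(uint32_t op, const bool preds[4])
{
    unsigned src = op_pred(op);
    bool p = (src == 0) ? true : preds[src];
    bool execute_on_false = (op_mode(op) & 1u) != 0;
    return execute_on_false ? !p : p;
}

/* Builds the low four bits of an op from its mode and predicate source. */
static uint32_t make_op(unsigned mode, unsigned pred)
{
    assert(mode < 4 && pred < 4);
    return (uint32_t)(mode | (pred << 2));
}
```

With this model, mode 01 combined with predicate source 00 can never execute, which is why that combination is free to be reused for something like a prefix.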

>> Also, the state of SR.T would not be preserved across a function call,
>> so any logic following the function's return could not be predicated.
>
> This can be architected around; ours is not the only possible way to do it.

Yeah. Most likely option would be a callee-save register containing
predicates or similar. As opposed to a single predicate flag which is
treated as a scratch value (and only ISRs need to bother with preserving
it).

Re: Branch prediction hints

<12fa6b22-9cf8-4dd0-813d-1b8b21058c50n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17087&group=comp.arch#17087

X-Received: by 2002:a37:7306:: with SMTP id o6mr27147569qkc.38.1621797087388;
Sun, 23 May 2021 12:11:27 -0700 (PDT)
X-Received: by 2002:a05:6808:f94:: with SMTP id o20mr8377525oiw.30.1621797087211;
Sun, 23 May 2021 12:11:27 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 23 May 2021 12:11:27 -0700 (PDT)
In-Reply-To: <ad6c1950-c7df-4ec0-b3ab-20550baccb67n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:bc69:35bc:a8f4:11b6;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:bc69:35bc:a8f4:11b6
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <s8dcbt$7f4$1@gioia.aioe.org>
<36626ffe-f5d8-4a62-af27-310684375561n@googlegroups.com> <jwvfsyd1jio.fsf-monnier+comp.arch@gnu.org>
<ad6c1950-c7df-4ec0-b3ab-20550baccb67n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <12fa6b22-9cf8-4dd0-813d-1b8b21058c50n@googlegroups.com>
Subject: Re: Branch prediction hints
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 23 May 2021 19:11:27 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

by: MitchAlsup - Sun, 23 May 2021 19:11 UTC

On Sunday, May 23, 2021 at 12:15:04 PM UTC-5, robf...@gmail.com wrote:
> Speaking of the usefulness of branch hints for prediction I have to agree
> that they are not that useful. As a gag though I added the ability to supply
> branch predictor hints in ‘if’ statements that also allowed the branch
> predictor to be selected. How useful is it to be able to select the branch
> predictor to use (assuming multiple predictors are present)?
> The only case I can think of is maybe power savings.
<
I might note that Virtual Vector Method loops do not use the branch predictor
but are executed in advance of the loop iteration to effectively perform as if
the branch took zero cycles when the loop terminates (and zero cycles when
the loop continues.)
<
This improves the prediction accuracy of the "rest of the branches".
<
PREDication also does not use the branch predictor to get the HW set up
to execute either the then-clause or the else-clause. This also improves the
prediction accuracy of the "rest of the branches".

Re: Branch prediction hints

<s8e9pg$lnc$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17088&group=comp.arch#17088

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
Date: Sun, 23 May 2021 12:18:07 -0700
Organization: A noiseless patient Spider
Lines: 85
Message-ID: <s8e9pg$lnc$1@dont-email.me>
References: <s8c0j2$q5d$1@newsreader4.netcologne.de>
<s8cmv1$1e7$1@dont-email.me> <s8csfm$172$1@dont-email.me>
<s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me>
<f7ed098c-cf36-404b-b61b-f14732b978c9n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 23 May 2021 19:18:08 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="22a41a4766778f1e6e183f2f1c53bf8f";
logging-data="22252"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+KOL3kVR4XENWrbS+dRuKx"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.1
Cancel-Lock: sha1:hVASPPU0vo0lE1NpF1gK8LqY6GU=
In-Reply-To: <f7ed098c-cf36-404b-b61b-f14732b978c9n@googlegroups.com>
Content-Language: en-US

by: Ivan Godard - Sun, 23 May 2021 19:18 UTC

On 5/23/2021 8:56 AM, MitchAlsup wrote:
> On Sunday, May 23, 2021 at 5:26:22 AM UTC-5, Ivan Godard wrote:
>> On 5/23/2021 1:33 AM, BGB wrote:
>>> On 5/23/2021 1:24 AM, Ivan Godard wrote:
>
>>> Bundle creation in my case is explicit and handled by the compiler (or
>>> ASM programmer).
>>>
>>>
>>> But, I am only dealing with predication for simple branches, eg:
>>> if(x>0)
>>> x--;
>>> Or:
>>> if(x>0)
>>> z=x-5;
>>> else
>>> z=x+13;
>>> ...
> PLT0 Rx,{1,1}
> ADD Rz,Rx,#-1
> ...
> PLT0 Rx,{2,10}
> ADD Rz,Rx,#-5
> ADD Rz,Rx,#13
>>>
>> consider:
>> if(x>0)
>> z=x-5;
>> else
>> z=foo(x);
> <
> PLT0 Rx,{4,1000}
> ADD Rz,Rx,#-5
> MOV R1,Rx
> CALL foo
> MOV Rz,R1
> <
>> if you have exceptions under control, this can become:
>> t1=x-5;t2=(x>0)?nil:foo(x);
>> z=x>0?t1:t2;
>> the architectural challenge is how to implement "t2=(x>0)?nil:foo(x);",
>> i.e. predicated calls.
> <
> You also have to predicate the argument setup and the result delivery.

No, unless the argument setup is potentially excepting; another example
that predication must keep exceptions under control, or you get a
predication explosion and wind up with an ARM. Once you have Mill's NaR
bits or equivalent the only ops that get predicated are control flow
(including call and return) and store.

As for call result: whether a not-taken predicate clears or leaves alone
the result reg is a matter for architectural design; either can work.

>>> But, not for anything much more than a few instructions, or anything
>>> involving a function call, ..., since presumably in this case a branch
>>> is cheaper (and the core isn't really wide nor has enough registers to
>>> really justify any IA-64 style modular scheduling trickery, ...).
>> That's a problem with any if-conversions or other speculative
>> scheduling: you have to have enough FUs to get useful parallelism.
>> There's a sweet spot in speculative width: too narrow and you lose the
>> benefit; too wide and the power and area cost more than the overhead of OOO
> <
> The majority of the benefit has already accrued when you can predicate
> as far as you FETCH width. SO if you FETCH 4-wide, you get the majority
> of the benefit by predication at least 4 instructions. More than 2× this
> distance and you should be using branches to avoid tracking instructions
> that don't execute.

That assumes that consecutive instructions all predicate the same way,
which works when you have hardware dynamic OOO, and doesn't when you
have static scheduling with interleaving from multiple paths.

There really doesn't seem to be any middle ground: either you work out
all the implications of static scheduling and wind up with a Mill, or
you work out all those of OOO and wind up with a MY66.

> <
>>> Also, the state of SR.T would not be preserved across a function call,
>>> so any logic following the function's return could not be predicated.
>> This can be architected around; ours is not the only possible way to do it.
> <
> If SR.T is in a preserved register, you just use PRED again after return.
>

Re: Branch prediction hints

<s8eabb$ig$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17089&group=comp.arch#17089

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
Date: Sun, 23 May 2021 12:27:38 -0700
Organization: A noiseless patient Spider
Lines: 174
Message-ID: <s8eabb$ig$1@dont-email.me>
References: <s8c0j2$q5d$1@newsreader4.netcologne.de>
<s8cmv1$1e7$1@dont-email.me> <s8csfm$172$1@dont-email.me>
<s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me>
<s8e60c$ca6$1@dont-email.me>
In-Reply-To: <s8e60c$ca6$1@dont-email.me>

by: Ivan Godard - Sun, 23 May 2021 19:27 UTC

On 5/23/2021 11:13 AM, BGB wrote:
> On 5/23/2021 5:26 AM, Ivan Godard wrote:
>> On 5/23/2021 1:33 AM, BGB wrote:
>>> On 5/23/2021 1:24 AM, Ivan Godard wrote:
>>>> On 5/22/2021 9:50 PM, BGB wrote:
>>>>> On 5/22/2021 5:28 PM, Thomas Koenig wrote:
>>>>>> To quote the POWER9 User Manual:
>>>>>>
>>>>>> # The POWER9 core normally ignores any software that attempts to
>>>>>> # override the dynamic branch prediction by setting the “a” bit
>>>>>> # in the BO field. This is done because historically programmers
>>>>>> # and compilers have made poor choices for setting the “a” bit,
>>>>>> # which limited the performance of codes where the hardware can
>>>>>> # do a superior job of predicting the branches.
>>>>>>
>>>>>> Having read this: Are branching hints actually useful today?
>>>>>>
>>>>>> I could see some use in a "almost never used" hint for branches
>>>>>> for fatal error messages, maybe.
>>>>>>
>>>>>
>>>>> Scenario 1:
>>>>>    Core is too cheap to do branch prediction:
>>>>>      Branch hints are useless.
>>>>>    Core only does a fixed prediction with no context:
>>>>>      Maybe relevant.
>>>>>    Core does branch prediction, has context:
>>>>>      This is useless.
>>>>>
>>>>> Predictable branches:
>>>>> A hardware branch predictor can predict them fairly easily, this is
>>>>> useless.
>>>>>
>>>>> Unpredictable branches:
>>>>> Can't be predicted either way, this is useless.
>>>>>
>>>>> So, general leaning:
>>>>> Branch direction hints are "kinda useless"...
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Nevermind if my ISA has a few encodings which could potentially be
>>>>> interpreted this way, in my defense, these encodings arrived as a
>>>>> historical accident (predicated ops were added on after branches
>>>>> already existed, so some redundant encodings appeared, ...).
>>>>>
>>>>> It is likely I might reclaim some of this space eventually and use
>>>>> it for something else (maybe more space for PrWEX ops?...).
>>>>>
>>>>> As noted, some encodings technically exist (for Disp20 branches),
>>>>> but I don't really consider them "valid":
>>>>>    BSR?T / BSR?F (1);
>>>>>    BT?T / BT?F / BF?T / BF?F;
>>>>>    WEX encoded branch ops (2).
>>>>>
>>>>> *1: These operations "actually work", but predicated subroutine
>>>>> calls aren't really an operation which "makes sense". So, fall into
>>>>> a sort of "invalid de-facto because the operation itself is kinda
>>>>> absurd" category.
>>>>
>>>> Predicated calls are common in if-converted code. Of course, if you
>>>> are doing hardware bundle creation as in Mitch's then you don't need
>>>> static predication of any form.
>>>>
>>>
>>> Bundle creation in my case is explicit and handled by the compiler
>>> (or ASM programmer).
>>>
>>>
>>> But, I am only dealing with predication for simple branches, eg:
>>>    if(x>0)
>>>      x--;
>>> Or:
>>>    if(x>0)
>>>      z=x-5;
>>>    else
>>>      z=x+13;
>>> ...
>>>
>>
>> consider:
>>      if(x>0)
>>        z=x-5;
>>      else
>>        z=foo(x);
>> if you have exceptions under control, this can become:
>>      t1=x-5;t2=(x>0)?nil:foo(x);
>>      z=x>0?t1:t2;
>> the architectural challenge is how to implement
>> "t2=(x>0)?nil:foo(x);", i.e. predicated calls.
>>
>
> This case could be done as-is with some register-use trickery, though
> unclear how useful it would be in general.
>
>
>>> But, not for anything much more than a few instructions, or anything
>>> involving a function call, ..., since presumably in this case a
>>> branch is cheaper (and the core isn't really wide nor has enough
>>> registers to really justify any IA-64 style modulo scheduling
>>> trickery, ...).
>>
>> That's a problem with any if-conversions or other speculative
>> scheduling: you have to have enough FUs to get useful parallelism.
>> There's a sweet spot in speculative width: too narrow and you lose the
>> benefit; too wide and the power and area cost more than the overhead
>> of OOO
>>
>
> Yeah. In my own uses, what I was able to leverage in hand-written ASM
> seems to imply an optimal width of ~ 2 or 3. Any wider, and I run out of
> stuff that could be run in parallel, or run out of registers to put
> stuff in. Some modulo-loop scheduling was done manually in a few cases,
> but is a rarity, and only sometimes pays off.
>
>
> My C compiler still falls well short of this though...
>
>
> It looks to me like making use of a 4 or 5 wide core would effectively
> require a rather different approach:
> multiple predication registers
>    with ops being able to select a src/dst predicate
> bigger register file
> ...
>
> At this point, it would start to look more like an Itanium.
>
> Though, something kinda like Itanium, but with say 64 GPRs and 4 or 8
> predicate registers, and variable-length bundles, could make some sense
> (goal being to still use 32-bit instruction words and still have a
> "plausible" code density).
>
> One possibility for predication is that ops are predicated by default,
> just one of the predicate flags is hard-wired (to allow for "always
> execute" ops), then ops fall into a mode:
> 00 Scalar/End-Of-Bundle, Execute True
> 01 Scalar/End-Of-Bundle, Execute False
> 10 Wide, Execute True
> 11 Wide, Execute False
> With a predicate register (Source):
> 00: Hard wired as True
> 01: Predicate 1
> 10: Predicate 2
> 11: Predicate 3
>
> Other ops would have 3 registers (18 bits), or 2 registers (12 bits).
> Compare ops could have a 2-bit predicate-destination field.
>
> It is possible that 01:00 (Never Execute) could be used to encode a
> Jumbo Prefix or similar (or, maybe a few unconditional large-immed
> instructions or similar).

Having predication for most ops is just entropy clutter and a waste of
power: it costs more to *not* do an ADD than to do it, so always do it,
and junk the predicates. You need predicates for ops that might do a
hard throw, or that change persistent state like store and control flow;
nowhere else.
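
A minimal C sketch of that distinction, not taken from any of the ISAs discussed here: the arithmetic on both paths runs unconditionally and only the store, which changes persistent state, stays guarded. The function names are hypothetical, the compiler is free to lower the ternary to a select or to a branch, and signed overflow at the extremes of int is ignored for brevity.

```c
/* Both candidate results are computed unconditionally; only the
 * choice between them depends on the predicate. A compiler may lower
 * the ternary to a select/conditional move, but that is not guaranteed. */
static int select_result(int x)
{
    int t1 = x - 5;          /* "then" path, always computed */
    int t2 = x + 13;         /* "else" path, always computed */
    return (x > 0) ? t1 : t2;
}

/* A store changes persistent state, so it has to stay guarded. */
static void guarded_store(int *dst, int x)
{
    int v = x - 1;           /* plain arithmetic, computed regardless */
    if (x > 0)
        *dst = v;            /* the only operation that needs the predicate */
}
```

With these definitions, select_result(7) yields 2 and select_result(-2) yields 11, while guarded_store leaves its destination untouched whenever x is not positive.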

>>> Also, the state of SR.T would not be preserved across a function
>>> call, so any logic following the function's return could not be
>>> predicated.
>>
>> This can be architected around; ours is not the only possible way to
>> do it.
>
>
> Yeah. Most likely option would be a callee-save register containing
> predicates or similar. As opposed to a single predicate flag which is
> treated as a scratch value (and only ISRs need to bother with preserving
> it).
>

Re: Branch prediction hints

<s8eajo$ig$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17090&group=comp.arch#17090

From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
Date: Sun, 23 May 2021 12:32:08 -0700
Organization: A noiseless patient Spider
Lines: 53
Message-ID: <s8eajo$ig$2@dont-email.me>
References: <s8c0j2$q5d$1@newsreader4.netcologne.de>
<Z7tqI.61928$N%1.35599@fx28.iad>
In-Reply-To: <Z7tqI.61928$N%1.35599@fx28.iad>

by: Ivan Godard - Sun, 23 May 2021 19:32 UTC

On 5/23/2021 6:53 AM, EricP wrote:
> Thomas Koenig wrote:
>> To quote the POWER9 User Manual:
>>
>> # The POWER9 core normally ignores any software that attempts to
>> # override the dynamic branch prediction by setting the “a” bit
>> # in the BO field. This is done because historically programmers
>> # and compilers have made poor choices for setting the “a” bit,
>> # which limited the performance of codes where the hardware can
>> # do a superior job of predicting the branches.
>>
>> Having read this: Are branching hints actually useful today?
>>
>> I could see some use in a "almost never used" hint for branches
>> for fatal error messages, maybe.
>
> There is a case for a "predict never" hint where it doesn't matter
> what this conditional branch did the last million times it executed,
> always predict not-taken.
>
> In a spinlock which normally tests the lock condition with
> a load before attempting the atomic sequence,
> you never want to speculatively execute into the atomic sequence.
> At a minimum it could cause ping-pong'ing the cache lines.
>
> With Hardware Transactional Memory (HTM), reading a memory location
> even speculatively might abort another processor's active transaction,
> you don't want to even touch data memory without explicit permission,
> not even prefetching any load or store addresses.
>
> If you don't have a hint to explicitly block speculation at the
> branch then the design would have to use more complicated and
> probably error prone dynamic logic to "deduce" what to do.
>
> Without an explicit "predict never" hint, in the case of HTM
> it looked to me as though speculation might have to be shut off
> while a transaction was in progress, because there is no way
> to deduce which loads are guarded by a particular condition.
> At a minimum, in an HTM it looked like no loads could be performed
> while any prior branch was unresolved, not even prefetched into cache
> (or maybe that's a good thing, I don't know).
>
> Predict-never can also be used for rarely executed error handling code.
>
> Predict-always is the complementary case for branching around
> rarely executed error handling code that one wants inline,
> and it doesn't matter what it did the last million times it executed.
>
>

Your HTM breaks on a load? Why? We break only on a colliding store.

Perhaps your HTM's intra-transaction state is visible from outside?
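
For reference, the spinlock shape EricP describes (test the lock with a plain load before attempting the atomic sequence) can be sketched in C11 atomics roughly as follows. This is only an illustration: the ttas_* names are hypothetical, the memory orderings are one reasonable choice, and nothing here expresses a "predict never" hint, which would have to come from the ISA.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: false = free, true = held. */
typedef struct { atomic_bool held; } ttas_lock;

static void ttas_init(ttas_lock *l)
{
    atomic_init(&l->held, false);
}

/* One acquisition attempt; returns true if the lock was taken. */
static bool ttas_try_lock(ttas_lock *l)
{
    /* Test: a plain load, so waiters can spin on a shared copy of the line. */
    if (atomic_load_explicit(&l->held, memory_order_relaxed))
        return false;
    /* Test-and-set: the atomic step, reached only when the lock looked free.
     * This is the path one would not want entered speculatively. */
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

static void ttas_lock_acquire(ttas_lock *l)
{
    while (!ttas_try_lock(l))
        ;  /* spin until an attempt succeeds */
}

static void ttas_unlock(ttas_lock *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

The branch in ttas_try_lock that skips the exchange is the one the hint would apply to: if the predictor guesses "lock is free" while it is held, the core may start the atomic exchange speculatively and pull the cache line away from its owner.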

Re: Branch prediction hints

<s8earo$a9k$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17091&group=comp.arch#17091

From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
Date: Sun, 23 May 2021 12:36:24 -0700
Organization: A noiseless patient Spider
Lines: 62
Message-ID: <s8earo$a9k$1@dont-email.me>
References: <s8c0j2$q5d$1@newsreader4.netcologne.de>
<Z7tqI.61928$N%1.35599@fx28.iad>
<c42f2bb0-6920-44d2-8877-cff238443ca7n@googlegroups.com>
In-Reply-To: <c42f2bb0-6920-44d2-8877-cff238443ca7n@googlegroups.com>

by: Ivan Godard - Sun, 23 May 2021 19:36 UTC

On 5/23/2021 9:07 AM, MitchAlsup wrote:
> On Sunday, May 23, 2021 at 8:54:03 AM UTC-5, EricP wrote:
>
>> There is a case for a "predict never" hint where it doesn't matter
>> what this conditional branch did the last million times it executed,
>> always predict not-taken.
>>
>> In a spinlock which normally tests the lock condition with
>> a load before attempting the atomic sequence,
>> you never want to speculatively execute into the atomic sequence.
>> At a minimum it could cause ping-pong'ing the cache lines.
> <
> Yes, the classical test-and-test-and-set. But this only decreases
> bus traffic from BigO(N^3) to BigO(N^2). There are ways to
> decrease bus traffic to BigO(N+3).
>>
>> With Hardware Transactional Memory (HTM), reading a memory location
>> even speculatively might abort another processor's active transaction,
>> you don't want to even touch data memory without explicit permission,
>> not even prefetching any load or store addresses.
> <
> This is one of the things WRONG about HTM.
> <
> In My 66000 there is an Exotic Synchronization Method (ESM) which is not
> an HTM but can be used to create HTMs. In ESM, if the ATOMIC event has
> reached a critical juncture (i.e., can complete) the CPUs reaching those
> points gain the ability to NAK interference, allowing these CPUs to complete
> the ATOMIC event, and making the interferers run slower!
>>
>> If you don't have a hint to explicitly block speculation at the
>> branch then the design would have to use more complicated and
>> probably error prone dynamic logic to "deduce" what to do.
> <
> You do not want Naked memory refs to be used to set up or complete
> ATOMIC events. You need to "mark" their participation in the event
> so the machine knows that such an event is going on from the outset.

This also permits you to have intra-transaction stores that are not part
of the transaction, say for logging and debugging, where you don't lock
the log memory.

>> Without an explicit "predict never" hint, in the case of HTM
>> it looked to me as though speculation might have to be shut off
>> while a transaction was in progress, because there is no way
>> to deduce which loads are guarded by a particular condition.
>> At a minimum, in an HTM it looked like no loads could be performed
>> while any prior branch was unresolved, not even prefetched into cache
>> (or maybe that's a good thing, I don't know).
> <
> Should an ATOMIC event fail, the compiler needs to know that all
> of the participating memory references are not viable containers
> of data! And not ever use those units of stale data. The only use
> that should be allowed is to print the values that failed.
>>
>> Predict-never can also be used for rarely executed error handling code.
>>
>> Predict-always is the complementary case for branching around
>> rarely executed error handling code that one wants inline,
>> and it doesn't matter what it did the last million times it executed.
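
At the source level, the closest widely available analogue to the error-handling case is a compiler hint rather than a hardware one. The sketch below assumes GCC or Clang, whose __builtin_expect mainly influences code layout and the compiler's own static assumptions; whether anything reaches the hardware as a hint bit depends on the target, and it is not the "predict never" semantics discussed above. The unlikely() macro and sum_checked() are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* unlikely() is a local convenience macro. __builtin_expect is a
 * GCC/Clang extension; other compilers get a plain fallback. */
#if defined(__GNUC__) || defined(__clang__)
#  define unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define unlikely(x) (x)
#endif

/* Sums n values; returns -1 on a NULL buffer (the rare error path). */
static long sum_checked(const int *buf, size_t n)
{
    if (unlikely(buf == NULL))
        return -1;           /* rarely executed error handling */

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

A compiler given this hint will typically move the error return out of the fall-through path, which helps instruction fetch even on cores that ignore static prediction bits entirely.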
