Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

Time is an illusion perpetrated by the manufacturers of space.


devel / comp.arch / Minor idea for indirect target predictor

SubjectAuthor
* Minor idea for indirect target predictorPaul A. Clayton
`* Re: Minor idea for indirect target predictorMitchAlsup
 +* Re: Minor idea for indirect target predictorMitchAlsup
 |+- Re: Minor idea for indirect target predictorTerje Mathisen
 |`* Re: Minor idea for indirect target predictorEricP
 | `* Re: Minor idea for indirect target predictorMitchAlsup
 |  `* Re: Minor idea for indirect target predictorEricP
 |   `* Re: Minor idea for indirect target predictorThomas Koenig
 |    `- Re: Minor idea for indirect target predictorMitchAlsup
 +* Re: Minor idea for indirect target predictorPaul A. Clayton
 |`* Re: Minor idea for indirect target predictorMitchAlsup
 | `* Re: Minor idea for indirect target predictorPaul A. Clayton
 |  `- Re: Minor idea for indirect target predictorMitchAlsup
 `* Re: Minor idea for indirect target predictorAndy
  +* Re: Minor idea for indirect target predictorBGB
  |+- Re: Minor idea for indirect target predictorMitchAlsup
  |+* Re: Minor idea for indirect target predictorAnton Ertl
  ||+- Re: Minor idea for indirect target predictorMitchAlsup
  ||`- Re: Minor idea for indirect target predictorBGB
  |`* Re: Minor idea for indirect target predictorIvan Godard
  | +* Re: Minor idea for indirect target predictorMitchAlsup
  | |+* Re: Minor idea for indirect target predictorIvan Godard
  | ||+* Re: Minor idea for indirect target predictorMitchAlsup
  | |||`* Re: sparse switch (was Minor idea for indirect target predictor)Brian G. Lucas
  | ||| `* Re: sparse switch (was Minor idea for indirect target predictor)EricP
  | |||  +- Re: sparse switch (was Minor idea for indirect target predictor)Thomas Koenig
  | |||  +* Re: sparse switch (was Minor idea for indirect target predictor)Brian G. Lucas
  | |||  |`- Re: sparse switch (was Minor idea for indirect target predictor)Thomas Koenig
  | |||  `* Re: sparse switch (was Minor idea for indirect target predictor)MitchAlsup
  | |||   +- Re: sparse switch (was Minor idea for indirect target predictor)Thomas Koenig
  | |||   +- Re: sparse switch (was Minor idea for indirect target predictor)Terje Mathisen
  | |||   `* Re: sparse switch (was Minor idea for indirect target predictor)John Levine
  | |||    `* Re: sparse switch (was Minor idea for indirect target predictor)MitchAlsup
  | |||     `- Re: sparse switch (was Minor idea for indirect target predictor)Thomas Koenig
  | ||`- Re: Minor idea for indirect target predictorBGB
  | |+* Re: Minor idea for indirect target predictorStefan Monnier
  | ||`* Re: Minor idea for indirect target predictorMitchAlsup
  | || `* Re: Minor idea for indirect target predictorStefan Monnier
  | ||  `* Re: Minor idea for indirect target predictorMitchAlsup
  | ||   `* Re: Minor idea for indirect target predictorStefan Monnier
  | ||    `- Re: Minor idea for indirect target predictorMitchAlsup
  | |`- Re: Minor idea for indirect target predictorStefan Monnier
  | +* Re: Minor idea for indirect target predictorStephen Fuld
  | |+- Re: Minor idea for indirect target predictorIvan Godard
  | |+- Re: Minor idea for indirect target predictorMitchAlsup
  | |`* Re: Minor idea for indirect target predictorTerje Mathisen
  | | +* Re: Minor idea for indirect target predictorThomas Koenig
  | | |`* Re: Minor idea for indirect target predictorDavid Brown
  | | | +* Re: Minor idea for indirect target predictorBGB
  | | | |+* Re: Minor idea for indirect target predictorDavid Brown
  | | | ||`- Re: Minor idea for indirect target predictorBGB
  | | | |`* Re: Minor idea for indirect target predictorMarcus
  | | | | `* Re: Minor idea for indirect target predictorBGB
  | | | |  `* Re: Minor idea for indirect target predictorMarcus
  | | | |   `* Re: Minor idea for indirect target predictorBGB
  | | | |    `* Re: Minor idea for indirect target predictorTerje Mathisen
  | | | |     `* Re: Minor idea for indirect target predictorBGB
  | | | |      `* Re: Minor idea for indirect target predictorThomas Koenig
  | | | |       `* Re: Minor idea for indirect target predictorBGB
  | | | |        `* Re: Minor idea for indirect target predictorThomas Koenig
  | | | |         +* Re: Minor idea for indirect target predictorMitchAlsup
  | | | |         |+- Re: Minor idea for indirect target predictorBGB
  | | | |         |`* Re: Minor idea for indirect target predictorThomas Koenig
  | | | |         | `* Re: Minor idea for indirect target predictorIvan Godard
  | | | |         |  +- Re: Minor idea for indirect target predictorJohn Dallman
  | | | |         |  `* Re: Minor idea for indirect target predictorThomas Koenig
  | | | |         |   `* Re: Minor idea for indirect target predictorGeorge Neuner
  | | | |         |    `- Re: Minor idea for indirect target predictorEricP
  | | | |         +* Re: Minor idea for indirect target predictorAndy Valencia
  | | | |         |`* Re: Minor idea for indirect target predictorIvan Godard
  | | | |         | `* Re: Minor idea for indirect target predictorThomas Koenig
  | | | |         |  `* Re: Minor idea for indirect target predictorBGB
  | | | |         |   +* Re: Minor idea for indirect target predictorThomas Koenig
  | | | |         |   |+* Re: Minor idea for indirect target predictorIvan Godard
  | | | |         |   ||+- Re: Minor idea for indirect target predictorThomas Koenig
  | | | |         |   ||`- Re: Minor idea for indirect target predictorGeorge Neuner
  | | | |         |   |+* Re: Minor idea for indirect target predictorBGB
  | | | |         |   ||`* Re: Minor idea for indirect target predictorStefan Monnier
  | | | |         |   || `* Re: Minor idea for indirect target predictorGeorge Neuner
  | | | |         |   ||  `* Re: Minor idea for indirect target predictorBGB
  | | | |         |   ||   `* Re: Minor idea for indirect target predictorMitchAlsup
  | | | |         |   ||    `- Re: Minor idea for indirect target predictorBGB
  | | | |         |   |`* Re: Minor idea for indirect target predictorDavid Brown
  | | | |         |   | `* Re: Minor idea for indirect target predictorThomas Koenig
  | | | |         |   |  +- Re: Minor idea for indirect target predictorJohn Dallman
  | | | |         |   |  `* Re: Minor idea for indirect target predictorDavid Brown
  | | | |         |   |   `* Re: Minor idea for indirect target predictorThomas Koenig
  | | | |         |   |    `* Re: Minor idea for indirect target predictorDavid Brown
  | | | |         |   |     `* Re: Minor idea for indirect target predictorThomas Koenig
  | | | |         |   |      +* Re: Minor idea for indirect target predictorStephen Fuld
  | | | |         |   |      |+* Re: Minor idea for indirect target predictorGeorge Neuner
  | | | |         |   |      ||`* Re: Minor idea for indirect target predictorStephen Fuld
  | | | |         |   |      || +* Re: Minor idea for indirect target predictorMitchAlsup
  | | | |         |   |      || |+- Re: Minor idea for indirect target predictorThomas Koenig
  | | | |         |   |      || |`* Re: Minor idea for indirect target predictorGeorge Neuner
  | | | |         |   |      || | `- Re: Minor idea for indirect target predictorMitchAlsup
  | | | |         |   |      || `* Re: Minor idea for indirect target predictorGeorge Neuner
  | | | |         |   |      ||  +* Re: Minor idea for indirect target predictorStephen Fuld
  | | | |         |   |      ||  |`- Re: Minor idea for indirect target predictorDavid Brown
  | | | |         |   |      ||  `* Re: Minor idea for indirect target predictorBGB
  | | | |         |   |      ||   +* Re: Minor idea for indirect target predictorStefan Monnier
  | | | |         |   |      ||   +* Re: Minor idea for indirect target predictorIvan Godard
  | | | |         |   |      ||   `* Re: Python performance (was: Minor idea for indirect targetMarcus
  | | | |         |   |      |`* Re: Minor idea for indirect target predictorThomas Koenig
  | | | |         |   |      `* Re: Minor idea for indirect target predictorDavid Brown
  | | | |         |   `* Re: Minor idea for indirect target predictorDavid Brown
  | | | |         `- Re: Minor idea for indirect target predictorBGB
  | | | `* Re: Minor idea for indirect target predictorThomas Koenig
  | | `- Re: Minor idea for indirect target predictorBGB
  | `* Re: Minor idea for indirect target predictorBGB
  `- Re: Minor idea for indirect target predictorMitchAlsup

Pages:12345678
Minor idea for indirect target predictor

<bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18110&group=comp.arch#18110

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a37:d55:: with SMTP id 82mr4500712qkn.330.1624731990551;
Sat, 26 Jun 2021 11:26:30 -0700 (PDT)
X-Received: by 2002:a9d:784e:: with SMTP id c14mr2866866otm.276.1624731990329;
Sat, 26 Jun 2021 11:26:30 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed9.news.xs4all.nl!news.uzoreto.com!news-out.netnews.com!news.alt.net!fdc3.netnews.com!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 26 Jun 2021 11:26:30 -0700 (PDT)
Injection-Info: google-groups.googlegroups.com; posting-host=64.26.99.248; posting-account=6JNn0QoAAAD-Scrkl0ClrfutZTkrOS9S
NNTP-Posting-Host: 64.26.99.248
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
Subject: Minor idea for indirect target predictor
From: paaroncl...@gmail.com (Paul A. Clayton)
Injection-Date: Sat, 26 Jun 2021 18:26:30 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3103
 by: Paul A. Clayton - Sat, 26 Jun 2021 18:26 UTC

Indirect branches are typically divided into single-target branches (at least within a local execution context sufficient to provide accurate prediction and multiple-target branches. The predictor for the former is best indexed by a hash of the address of branch instruction, while for the multiple-target branches including some history information for the predictor index is desirable.

This is similar to the problem of indexing a TLB for multiple page sizes and so might be addressed by André Seznec's overlaid skewed associativity (my term — Seznec did not provide a name distinguishing from separate bank skewed associativity) technique ("Concurrent Support of Multiple Page Sizes On a Skewed Associative TLB", 2003). This would have the same advantage of sharing capacity among branch types. (In principle, direct branches might also be included as a special case of single-target branches.)

By preferring a more specific prediction, a common case target with one or more specific exceptions might be accurately predicted.

This is not fundamentally different from using separate target predictors except that the capacity is shared (at the cost of conflicts) — very analogous to the difference between André Seznec's TLB and using per-page-size TLBs. Using hash-rehash/multiple probes is another alternative for capacity sharing; multiple probes could also be applied with overlaid skewed associativity. A later probe could use later available information; while one bit of later arriving information might be used by doubling the read width and selecting, the cost of that technique increases exponentially.

If the storage has non-uniform access latency, there may be optimization opportunities in the indexing functions and their mappings to different-latency locations, perhaps exploiting different timings of information availability or accuracy/criticality of prediction. (Non-uniform latency encourages more complex placement, migration, and replacement choices.)

Re: Minor idea for indirect target predictor

<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18114&group=comp.arch#18114

  copy link   Newsgroups: comp.arch
X-Received: by 2002:ac8:6911:: with SMTP id e17mr15098377qtr.256.1624735895124;
Sat, 26 Jun 2021 12:31:35 -0700 (PDT)
X-Received: by 2002:a9d:ecf:: with SMTP id 73mr10636478otj.5.1624735894837;
Sat, 26 Jun 2021 12:31:34 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 26 Jun 2021 12:31:34 -0700 (PDT)
In-Reply-To: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:d167:c4a4:9932:46f3;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:d167:c4a4:9932:46f3
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sat, 26 Jun 2021 19:31:35 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Sat, 26 Jun 2021 19:31 UTC

On Saturday, June 26, 2021 at 1:26:31 PM UTC-5, Paul A. Clayton wrote:
> Indirect branches are typically divided into single-target branches (at least within a local execution context sufficient to provide accurate prediction and multiple-target branches. The predictor for the former is best indexed by a hash of the address of branch instruction, while for the multiple-target branches including some history information for the predictor index is desirable.
<
There are five cases: Switches, Returns, Method calls, Tabulated Calls, and Longjump.
<
Switches tend to be unpredictable much of the time.
Returns are easily highly predictable
Method calls tend to use few methods more than others
Tabulated calls tend to be indexed tables of entry points
And longjump is a category all by itself.
<
If your ISA is supporting indirect branches within a subroutine, you are likely to remain
"on your own" as far as branch prediction is concerned.
<
Dense Switches, Method Calls, and Tabulated Calls are directly supported in My 66000 ISA
and do not use Indirect Jump or Indirect Call.
<
Returns are best supported by a call-return stack predictor that can self repair when
a mispredicted branch leads to a return which must then be undone when the branch
is recovered (so the return when it does happen returns to the proper place..)
>
Switches are decomposed into dense areas and loose areas. Loose areas are
performed by a series of compares and branches, while dense areas are performed
by indexing through a table of branch target addresses. The Loose areas are adequately
supported by normal branch prediction, while the dense areas have in the past been
supported by the generalized indirect predictors.
<
A Tabulated Call is where a field of (say) an instruction is used to index a table of
worker functions that perform the required semantic of the (say) instruction. The
index into the table is the bit pattern of the field. Should there not be a defined
semantic for a particular bit pattern, the call is performed to the undefined handler.
<
A method call is similar to a tabulated call except the compiler is in charge of the
indexing of the table and of its contents.
>
> This is similar to the problem of indexing a TLB for multiple page sizes and so might be addressed by André Seznec's overlaid skewed associativity (my term — Seznec did not provide a name distinguishing from separate bank skewed associativity) technique ("Concurrent Support of Multiple Page Sizes On a Skewed Associative TLB", 2003). This would have the same advantage of sharing capacity among branch types. (In principle, direct branches might also be included as a special case of single-target branches.)
>
> By preferring a more specific prediction, a common case target with one or more specific exceptions might be accurately predicted.
<
I would say that by filtering out Method Calls, from Tabularized calls, from "indirect"
calls and jumps one improves the predictability of the well organized means from the
less well organized means.
>
> This is not fundamentally different from using separate target predictors except that the capacity is shared (at the cost of conflicts) — very analogous to the difference between André Seznec's TLB and using per-page-size TLBs. Using hash-rehash/multiple probes is another alternative for capacity sharing; multiple probes could also be applied with overlaid skewed associativity. A later probe could use later available information; while one bit of later arriving information might be used by doubling the read width and selecting, the cost of that technique increases exponentially.
>
> If the storage has non-uniform access latency, there may be optimization opportunities in the indexing functions and their mappings to different-latency locations, perhaps exploiting different timings of information availability or accuracy/criticality of prediction. (Non-uniform latency encourages more complex placement, migration, and replacement choices.)

Re: Minor idea for indirect target predictor

<e7216926-1067-4bfe-b3fc-838c9c1cc541n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18122&group=comp.arch#18122

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:6214:4b:: with SMTP id c11mr18503897qvr.18.1624757938894;
Sat, 26 Jun 2021 18:38:58 -0700 (PDT)
X-Received: by 2002:a4a:956f:: with SMTP id n44mr14524140ooi.54.1624757938657;
Sat, 26 Jun 2021 18:38:58 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 26 Jun 2021 18:38:58 -0700 (PDT)
In-Reply-To: <17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:d167:c4a4:9932:46f3;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:d167:c4a4:9932:46f3
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com> <17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e7216926-1067-4bfe-b3fc-838c9c1cc541n@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 27 Jun 2021 01:38:58 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Sun, 27 Jun 2021 01:38 UTC

Adding::

On Saturday, June 26, 2021 at 2:31:36 PM UTC-5, MitchAlsup wrote:
> On Saturday, June 26, 2021 at 1:26:31 PM UTC-5, Paul A. Clayton wrote:
> > Indirect branches are typically divided into single-target branches (at least within a local execution context sufficient to provide accurate prediction and multiple-target branches. The predictor for the former is best indexed by a hash of the address of branch instruction, while for the multiple-target branches including some history information for the predictor index is desirable.
> <
> There are five cases: Switches, Returns, Method calls, Tabulated Calls, and Longjump.
> <
> Switches tend to be unpredictable much of the time.
> Returns are easily highly predictable
> Method calls tend to use few methods more than others
> Tabulated calls tend to be indexed tables of entry points
> And longjump is a category all by itself.
> <
> If your ISA is supporting indirect branches within a subroutine, you are likely to remain
> "on your own" as far as branch prediction is concerned.
> <
> Dense Switches, Method Calls, and Tabulated Calls are directly supported in My 66000 ISA
> and do not use Indirect Jump or Indirect Call.
> <
> Returns are best supported by a call-return stack predictor that can self repair when
> a mispredicted branch leads to a return which must then be undone when the branch
> is recovered (so the return when it does happen returns to the proper place.)
> >
> Switches are decomposed into dense areas and loose areas. Loose areas are
> performed by a series of compares and branches, while dense areas are performed
> by indexing through a table of branch target addresses. The Loose areas are adequately
> supported by normal branch prediction, while the dense areas have in the past been
> supported by the generalized indirect predictors.
> <
> A Tabulated Call is where a field of (say) an instruction is used to index a table of
> worker functions that perform the required semantic of the (say) instruction. The
> index into the table is the bit pattern of the field. Should there not be a defined
> semantic for a particular bit pattern, the call is performed to the undefined handler.
> <
> A method call is similar to a tabulated call except the compiler is in charge of the
> indexing of the table and of its contents.
> >
> > This is similar to the problem of indexing a TLB for multiple page sizes and so might be addressed by André Seznec's overlaid skewed associativity (my term — Seznec did not provide a name distinguishing from separate bank skewed associativity) technique ("Concurrent Support of Multiple Page Sizes On a Skewed Associative TLB", 2003). This would have the same advantage of sharing capacity among branch types. (In principle, direct branches might also be included as a special case of single-target branches.)
<
I think the problem is more fundamental, let us postulate that you have a simulator for a processor.
This processor has an instruction set, and the instruction set has a few formats. One way to decode
this ISA is to take the major OpCode and index a table of format subroutines. These format subroutines
look at another (format specific) field and index another table of subroutines. There will be heavy use
of the first table, with the Major OpCode touching most of the (valid) decode indexes ever hundred
simulated instructions. The other format subroutines are used less often (on average) but with even
more sporadic use.
<
This puts the predictor in limbo--unless it can somehow guess the actual instruction stream being
processed. Thus, low prediction accuracy is expected. Lower than typical Switch prediction accuracy
while processing a parser done in YACC.
<
> >
> > By preferring a more specific prediction, a common case target with one or more specific exceptions might be accurately predicted.
> <
> I would say that by filtering out Method Calls, from Tabularized calls, from "indirect"
> calls and jumps one improves the predictability of the well organized means from the
> less well organized means.
> >
> > This is not fundamentally different from using separate target predictors except that the capacity is shared (at the cost of conflicts) — very analogous to the difference between André Seznec's TLB and using per-page-size TLBs. Using hash-rehash/multiple probes is another alternative for capacity sharing; multiple probes could also be applied with overlaid skewed associativity. A later probe could use later available information; while one bit of later arriving information might be used by doubling the read width and selecting, the cost of that technique increases exponentially.
> >
> > If the storage has non-uniform access latency, there may be optimization opportunities in the indexing functions and their mappings to different-latency locations, perhaps exploiting different timings of information availability or accuracy/criticality of prediction. (Non-uniform latency encourages more complex placement, migration, and replacement choices.)

Re: Minor idea for indirect target predictor

<sba2nd$1jsh$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18131&group=comp.arch#18131

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!XpBc3qS8ZZIa50C3RMoQkQ.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Sun, 27 Jun 2021 16:41:49 +0200
Organization: Aioe.org NNTP Server
Lines: 29
Message-ID: <sba2nd$1jsh$1@gioia.aioe.org>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<e7216926-1067-4bfe-b3fc-838c9c1cc541n@googlegroups.com>
NNTP-Posting-Host: XpBc3qS8ZZIa50C3RMoQkQ.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7.1
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Sun, 27 Jun 2021 14:41 UTC

MitchAlsup wrote:
> I think the problem is more fundamental, let us postulate that you have a simulator for a processor.
> This processor has an instruction set, and the instruction set has a few formats. One way to decode
> this ISA is to take the major OpCode and index a table of format subroutines. These format subroutines
> look at another (format specific) field and index another table of subroutines. There will be heavy use
> of the first table, with the Major OpCode touching most of the (valid) decode indexes ever hundred
> simulated instructions. The other format subroutines are used less often (on average) but with even
> more sporadic use.
> <
> This puts the predictor in limbo--unless it can somehow guess the actual instruction stream being
> processed. Thus, low prediction accuracy is expected. Lower than typical Switch prediction accuracy
> while processing a parser done in YACC.

This is why I always try to see if it makes sense to convert those code
switches to data switches, i.e. the first opcode table is used to lookup
the fields needed for the second level, which can then proceed at once.

This sometimes means that the second-level entry have to say "Whoa, that
was too far, backtrack now", but if these fixups are rare enough, it
might still be faster.

In the extreme case there are no exceptions, so everything just becomes
a long data-dependent graph with no (code) branching.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Minor idea for indirect target predictor

<Pj1CI.118763$431.56899@fx39.iad>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18132&group=comp.arch#18132

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!peer01.ams4!peer.am4.highwinds-media.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx39.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com> <17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <e7216926-1067-4bfe-b3fc-838c9c1cc541n@googlegroups.com>
In-Reply-To: <e7216926-1067-4bfe-b3fc-838c9c1cc541n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 38
Message-ID: <Pj1CI.118763$431.56899@fx39.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sun, 27 Jun 2021 16:03:59 UTC
Date: Sun, 27 Jun 2021 12:03:43 -0400
X-Received-Bytes: 3013
 by: EricP - Sun, 27 Jun 2021 16:03 UTC

MitchAlsup wrote:
> On Saturday, June 26, 2021 at 2:31:36 PM UTC-5, MitchAlsup wrote:
>> <
>> A Tabulated Call is where a field of (say) an instruction is used to index a table of
>> worker functions that perform the required semantic of the (say) instruction. The
>> index into the table is the bit pattern of the field. Should there not be a defined
>> semantic for a particular bit pattern, the call is performed to the undefined handler.
>> <
> I think the problem is more fundamental, let us postulate that you have a simulator for a processor.
> This processor has an instruction set, and the instruction set has a few formats. One way to decode
> this ISA is to take the major OpCode and index a table of format subroutines. These format subroutines
> look at another (format specific) field and index another table of subroutines. There will be heavy use
> of the first table, with the Major OpCode touching most of the (valid) decode indexes ever hundred
> simulated instructions. The other format subroutines are used less often (on average) but with even
> more sporadic use.
> <

By the way, this is how I believe the VAX 780 decoder worked.
Microcode can perform N-way branches and calls by jamming a value
into the low order bits of the uPC (micro-op program counter).

The root instruction sequence reads the first instruction byte.
The opcode byte is looked up in a table (a PLA) to get a uPC destination
address, which is then called (an N-way call).
That routine has a sequence of uOp calls to fetch operands.
The operand fetch routine reads the address mode byte, looks it up in
a table, and does an N-way call, to a routine to fetch the operand data.

The instruction sequence itself may not need to be different for
different operations as the opcode lookup can provide bits that
are routed directly to the ALU to control its operation.
All the instruction sequence has to do it tell the ALU to
"do the thing that the opcode bits say".

So a few microcode sequences can handle most of the simple arithmetic
and logic instructions, using N-way calls to handle the operands.

Re: Minor idea for indirect target predictor

<07e961f9-8036-4f44-85b9-e1dba0ba443bn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18133&group=comp.arch#18133

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a0c:be85:: with SMTP id n5mr6699442qvi.59.1624812345869;
Sun, 27 Jun 2021 09:45:45 -0700 (PDT)
X-Received: by 2002:a05:6830:33ea:: with SMTP id i10mr17379249otu.342.1624812345644;
Sun, 27 Jun 2021 09:45:45 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!usenet.pasdenom.info!usenet-fr.net!fdn.fr!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 27 Jun 2021 09:45:45 -0700 (PDT)
In-Reply-To: <Pj1CI.118763$431.56899@fx39.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8ceb:68e4:4832:bb07;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8ceb:68e4:4832:bb07
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <e7216926-1067-4bfe-b3fc-838c9c1cc541n@googlegroups.com>
<Pj1CI.118763$431.56899@fx39.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <07e961f9-8036-4f44-85b9-e1dba0ba443bn@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 27 Jun 2021 16:45:45 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Sun, 27 Jun 2021 16:45 UTC

On Sunday, June 27, 2021 at 11:04:04 AM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Saturday, June 26, 2021 at 2:31:36 PM UTC-5, MitchAlsup wrote:
> >> <
> >> A Tabulated Call is where a field of (say) an instruction is used to index a table of
> >> worker functions that perform the required semantic of the (say) instruction. The
> >> index into the table is the bit pattern of the field. Should there not be a defined
> >> semantic for a particular bit pattern, the call is performed to the undefined handler.
> >> <
> > I think the problem is more fundamental, let us postulate that you have a simulator for a processor.
> > This processor has an instruction set, and the instruction set has a few formats. One way to decode
> > this ISA is to take the major OpCode and index a table of format subroutines. These format subroutines
> > look at another (format specific) field and index another table of subroutines. There will be heavy use
> > of the first table, with the Major OpCode touching most of the (valid) decode indexes ever hundred
> > simulated instructions. The other format subroutines are used less often (on average) but with even
> > more sporadic use.
> > <
> By the way, this is how I believe the VAX 780 decoder worked.
<
So did PDP-11..........
<
> Microcode can perform N-way branches and calls by jamming a value
> into the low order bits of the uPC (micro-op program counter).
<
And this works because each microcode "word" contains its successor
address in µCode. So, you branch to an indexed entry point, do what was
specified there, and then end up in separate places as if the call was to
separate entry points
>
> The root instruction sequence reads the first instruction byte.
> The opcode byte is looked up in a table (a PLA) to get a uPC destination
> address, which is then called (an N-way call).
<
For the PDP-11, they got the instruction word and did a 8-way call to
address mode[0], then an 8-way call to address mode[1],...
<
For VAX, they did a 256-way call for the OpCode decode, the a series
of 16-way calls for each address mode.
<
> That routine has a sequence of uOp calls to fetch operands.
> The operand fetch routine reads the address mode byte, looks it up in
> a table, and does an N-way call, to a routine to fetch the operand data.
<
Where N=8 for PDP-11 and N=16 for VAX.
>
> The instruction sequence itself may not need to be different for
> different operations as the opcode lookup can provide bits that
> are routed directly to the ALU to control its operation.
> All the instruction sequence has to do it tell the ALU to
> "do the thing that the opcode bits say".
>
> So a few microcode sequences can handle most of the simple arithmetic
> and logic instructions, using N-way calls to handle the operands.
<
Which is why Maurice Wilkes was so keen of µCode.

Re: Minor idea for indirect target predictor

<k24CI.121713$Sx7.69733@fx18.iad>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18137&group=comp.arch#18137

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed9.news.xs4all.nl!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx18.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com> <17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <e7216926-1067-4bfe-b3fc-838c9c1cc541n@googlegroups.com> <Pj1CI.118763$431.56899@fx39.iad> <07e961f9-8036-4f44-85b9-e1dba0ba443bn@googlegroups.com>
In-Reply-To: <07e961f9-8036-4f44-85b9-e1dba0ba443bn@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 14
Message-ID: <k24CI.121713$Sx7.69733@fx18.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sun, 27 Jun 2021 19:10:08 UTC
Date: Sun, 27 Jun 2021 15:09:46 -0400
X-Received-Bytes: 1458
 by: EricP - Sun, 27 Jun 2021 19:09 UTC

MitchAlsup wrote:
> On Sunday, June 27, 2021 at 11:04:04 AM UTC-5, EricP wrote:
>>
>> So a few microcode sequences can handle most of the simple arithmetic
>> and logic instructions, using N-way calls to handle the operands.
> <
> Which is why Maurice Wilkes was so keen of µCode.

Ah... Sir Maurice Wilkes invented micro-programming in 1951.

The best way to design an automatic calculating machine, 1951
http://www.cs.princeton.edu/courses/archive/fall09/cos375/BestWay.pdf

Re: Minor idea for indirect target predictor

<sbaimd$umh$2@newsreader4.netcologne.de>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18138&group=comp.arch#18138

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Sun, 27 Jun 2021 19:14:21 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbaimd$umh$2@newsreader4.netcologne.de>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<e7216926-1067-4bfe-b3fc-838c9c1cc541n@googlegroups.com>
<Pj1CI.118763$431.56899@fx39.iad>
<07e961f9-8036-4f44-85b9-e1dba0ba443bn@googlegroups.com>
<k24CI.121713$Sx7.69733@fx18.iad>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 27 Jun 2021 19:14:21 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="31441"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sun, 27 Jun 2021 19:14 UTC

EricP <ThatWouldBeTelling@thevillage.com> schrieb:
> MitchAlsup wrote:
>> On Sunday, June 27, 2021 at 11:04:04 AM UTC-5, EricP wrote:
>>>
>>> So a few microcode sequences can handle most of the simple arithmetic
>>> and logic instructions, using N-way calls to handle the operands.
>> <
>> Which is why Maurice Wilkes was so keen of µCode.
>
> Ah... Sir Maurice Wilkes invented micro-programming in 1951.

So he's to blame for shenanigans like Intel deciding that your
machine is too fast, and doing something about it?

https://travisdowns.github.io/blog/2021/06/17/rip-zero-opt.html

I love that headline... "Your CPU May Have Slowed Down on Wednesday".

Re: Minor idea for indirect target predictor

<2c117d74-cea4-4846-9546-9ed80b8a0ed6n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18140&group=comp.arch#18140

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:620a:1281:: with SMTP id w1mr22251125qki.261.1624824258421;
Sun, 27 Jun 2021 13:04:18 -0700 (PDT)
X-Received: by 2002:a05:6808:a19:: with SMTP id n25mr15458035oij.0.1624824258153;
Sun, 27 Jun 2021 13:04:18 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 27 Jun 2021 13:04:17 -0700 (PDT)
In-Reply-To: <sbaimd$umh$2@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8ceb:68e4:4832:bb07;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8ceb:68e4:4832:bb07
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <e7216926-1067-4bfe-b3fc-838c9c1cc541n@googlegroups.com>
<Pj1CI.118763$431.56899@fx39.iad> <07e961f9-8036-4f44-85b9-e1dba0ba443bn@googlegroups.com>
<k24CI.121713$Sx7.69733@fx18.iad> <sbaimd$umh$2@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2c117d74-cea4-4846-9546-9ed80b8a0ed6n@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 27 Jun 2021 20:04:18 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Sun, 27 Jun 2021 20:04 UTC

On Sunday, June 27, 2021 at 2:14:23 PM UTC-5, Thomas Koenig wrote:
> EricP <ThatWould...@thevillage.com> schrieb:
> > MitchAlsup wrote:
> >> On Sunday, June 27, 2021 at 11:04:04 AM UTC-5, EricP wrote:
> >>>
> >>> So a few microcode sequences can handle most of the simple arithmetic
> >>> and logic instructions, using N-way calls to handle the operands.
> >> <
> >> Which is why Maurice Wilkes was so keen of µCode.
> >
> > Ah... Sir Maurice Wilkes invented micro-programming in 1951.
> So he's to blame for shenanigans like Intel deciding that your
> machine is too fast, and doing something about it?
>
> https://travisdowns.github.io/blog/2021/06/17/rip-zero-opt.html
>
> I love that headline... "Your CPU May Have Slowed Down on Wednesday".
<
One of the things that makes µCode so (ahem) useful is that one can add
a few "words" to the side of the µCode store and be able to alter the sequencer
and sequences after power on.
<
One can use this to patch µCode and avoid catastrophes or to create
catastrophes. Apparently Intel tries to do both.................

Re: Minor idea for indirect target predictor

<93c3554c-dbea-48d6-987a-28d95476f962n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18156&group=comp.arch#18156

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:620a:23b:: with SMTP id u27mr21007988qkm.98.1624893535022;
Mon, 28 Jun 2021 08:18:55 -0700 (PDT)
X-Received: by 2002:a9d:d09:: with SMTP id 9mr113777oti.16.1624893534759; Mon,
28 Jun 2021 08:18:54 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!ecngs!feeder2.ecngs.de!178.20.174.218.MISMATCH!feeder5.feed.usenet.farm!feeder1.feed.usenet.farm!feed.usenet.farm!news-out.netnews.com!news.alt.net!fdc3.netnews.com!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 28 Jun 2021 08:18:54 -0700 (PDT)
In-Reply-To: <17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=64.26.99.248; posting-account=6JNn0QoAAAD-Scrkl0ClrfutZTkrOS9S
NNTP-Posting-Host: 64.26.99.248
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com> <17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <93c3554c-dbea-48d6-987a-28d95476f962n@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: paaroncl...@gmail.com (Paul A. Clayton)
Injection-Date: Mon, 28 Jun 2021 15:18:55 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3189
 by: Paul A. Clayton - Mon, 28 Jun 2021 15:18 UTC

On Saturday, June 26, 2021 at 3:31:36 PM UTC-4, MitchAlsup wrote:
> On Saturday, June 26, 2021 at 1:26:31 PM UTC-5, Paul A. Clayton wrote:
>> Indirect branches are typically divided into single-target branches (at least
>> within a local execution context sufficient to provide accurate prediction
>> and multiple-target branches. The predictor for the former is best indexed
>> by a hash of the address of branch instruction, while for the multiple-target
>> branches including some history information for the predictor index is
>> desirable.
[snip interesting analysis]

> I would say that by filtering out Method Calls, from Tabularized calls, from "indirect"
> calls and jumps one improves the predictability of the well organized means from the
> less well organized means.

I think there may have been a miscommunication. The idea is not to
force all predictions to use the same indexing information (i.e., different
types of jumps would use different information), but to merge the
*storage*. As far as I know, the current best _practice_ is to use an
ordinary BTB for single-target instructions and a separate table indexed
with history information for the multiple-target instructions. My (minor)
idea is to merge these using Andé Seznec's overlaid skewed associativity
mechanism. (As I understand current practice, the indirect predictors do
not even use variable length (and variable selection) history despite TAGE
direction predictors doing so. I suspect this is, in part, because having
dedicated storage for different indexings can result in low utilization of
some storage. With overlaying, capacity is shared at the cost of sharing
some indexing information to guarantee lack of bank conflicts — infrequent
bank conflicts might be managed by reprobing, presumably with biasing
of probing priority.)

Re: Minor idea for indirect target predictor

<792c7667-8692-4c18-8c02-8727852595d8n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18158&group=comp.arch#18158

  copy link   Newsgroups: comp.arch
X-Received: by 2002:ac8:65da:: with SMTP id t26mr1739723qto.308.1624896793316;
Mon, 28 Jun 2021 09:13:13 -0700 (PDT)
X-Received: by 2002:a9d:7cd1:: with SMTP id r17mr303272otn.110.1624896793064;
Mon, 28 Jun 2021 09:13:13 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 28 Jun 2021 09:13:12 -0700 (PDT)
In-Reply-To: <93c3554c-dbea-48d6-987a-28d95476f962n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4de5:91b7:1e4c:8ba8;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4de5:91b7:1e4c:8ba8
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <93c3554c-dbea-48d6-987a-28d95476f962n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <792c7667-8692-4c18-8c02-8727852595d8n@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 28 Jun 2021 16:13:13 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Mon, 28 Jun 2021 16:13 UTC

On Monday, June 28, 2021 at 10:19:06 AM UTC-5, Paul A. Clayton wrote:
> On Saturday, June 26, 2021 at 3:31:36 PM UTC-4, MitchAlsup wrote:
> > On Saturday, June 26, 2021 at 1:26:31 PM UTC-5, Paul A. Clayton wrote:
> >> Indirect branches are typically divided into single-target branches (at least
> >> within a local execution context sufficient to provide accurate prediction
> >> and multiple-target branches. The predictor for the former is best indexed
> >> by a hash of the address of branch instruction, while for the multiple-target
> >> branches including some history information for the predictor index is
> >> desirable.
> [snip interesting analysis]
> > I would say that by filtering out Method Calls, from Tabularized calls, from "indirect"
> > calls and jumps one improves the predictability of the well organized means from the
> > less well organized means.
<
> I think there may have been a miscommunication. The idea is not to
> force all predictions to use the same indexing information (i.e., different
> types of jumps would use different information), but to merge the
> *storage*.
<
Perhaps I did not get to my point:: if you can identify more different kinds
of "branching" it is more likely that each one has an optimal predictor than
all of then having a common optimal predictor.
<
> As far as I know, the current best _practice_ is to use an
> ordinary BTB for single-target instructions and a separate table indexed
> with history information for the multiple-target instructions. My (minor)
> idea is to merge these using Andé Seznec's overlaid skewed associativity
> mechanism. (As I understand current practice, the indirect predictors do
> not even use variable length (and variable selection) history despite TAGE
> direction predictors doing so. I suspect this is, in part, because having
> dedicated storage for different indexings can result in low utilization of
> some storage. With overlaying, capacity is shared at the cost of sharing
> some indexing information to guarantee lack of bank conflicts — infrequent
> bank conflicts might be managed by reprobing, presumably with biasing
> of probing priority.)

Re: Minor idea for indirect target predictor

<sbe511$src$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18175&group=comp.arch#18175

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!h6LGChWsO9V9RgWepYFR1Q.user.gioia.aioe.org.POSTED!not-for-mail
From: andymc...@gmail.com (Andy)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Tue, 29 Jun 2021 15:45:36 +1200
Organization: Aioe.org NNTP Server
Lines: 15
Message-ID: <sbe511$src$1@gioia.aioe.org>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
NNTP-Posting-Host: h6LGChWsO9V9RgWepYFR1Q.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Content-Language: en-US
X-Notice: Filtered by postfilter v. 0.9.2
 by: Andy - Tue, 29 Jun 2021 03:45 UTC

On 27/06/21 7:31 am, MitchAlsup wrote:

> There are five cases: Switches, Returns, Method calls, Tabulated Calls, and Longjump.
> <
> Switches tend to be unpredictable much of the time.
> Returns are easily highly predictable
> Method calls tend to use few methods more than others
> Tabulated calls tend to be indexed tables of entry points
> And longjump is a category all by itself.

Ummm, so no love for Call Return Tables?, wouldn't they go some way to
reducing the number of conditional instructions spent checking for error
return codes and having otherwise unneeded instructions situated right
after the original call.

Re: Minor idea for indirect target predictor

<sbfg9g$371$1@dont-email.me>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18193&group=comp.arch#18193

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Tue, 29 Jun 2021 11:01:33 -0500
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <sbfg9g$371$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 29 Jun 2021 16:04:00 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="482124cd17f6036927ec04040c37f73d";
logging-data="3297"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/OTf5HmBRKes0P5y3aQkHi"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:Kszy6Ws5ZSw9rYCOti02Pwji48o=
In-Reply-To: <sbe511$src$1@gioia.aioe.org>
Content-Language: en-US
 by: BGB - Tue, 29 Jun 2021 16:01 UTC

On 6/28/2021 10:45 PM, Andy wrote:
> On 27/06/21 7:31 am, MitchAlsup wrote:
>
>> There are five cases: Switches, Returns, Method calls, Tabulated
>> Calls, and Longjump.
>> <
>> Switches tend to be unpredictable much of the time.
>> Returns are easily highly predictable
>> Method calls tend to use few methods more than others
>> Tabulated calls tend to be indexed tables of entry points
>> And longjump is a category all by itself.
>
> Ummm, so no love for Call Return Tables?, wouldn't they go some way to
> reducing the number of conditional instructions spent checking for error
> return codes and having otherwise unneeded instructions situated right
> after the original call.
>

A dedicated call-return table probably only really makes sense for
something like x86, which loads the return address from memory at the
same time as branching to it.

If you use a link register, or return via some other special register
(such as treating R1 as a secondary link register), then all one needs
to do is check is that this register hasn't been modified within the
pipeline, and if so use the value held in this register as the predicted
branch target.

In this case, typically one can reload the LR or "Secondary LR" several
cycles before its associated RTS or JMP. If one reloads the link
register before the others in a function epilog, then (typically) the
reloaded register will have left the pipeline before the CPU reaches the
return instruction.

Meanwhile, switch faces a problem that typically one calculates the
switch target immediately before branching to it. If one could somehow
do something like:
Calculate where the "switch()" will go;
Put some prior unrelated logic here;
Do the branch to the switch target.

Then "switch()" could potentially also be made predictable.

For small switch one might instead use linear probing or binary
subdivision or similar, in which case it is predictable if the
individual branches are predictable.

Say (number of case labels):
1-3: Linear Probe
4-11: Binary Subdivide
12-511 (and sufficiently dense): Branch Table
Else: Binary Subdivide (1)

*1: Binary subdivide happens until either it reaches the conditions for
a branch-table, or the number of targets is small enough that it falls
back to a linear probe.

....

Re: Minor idea for indirect target predictor

<fd177abd-f48e-4a29-88c0-84d4d964f01fn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18194&group=comp.arch#18194

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:6214:1921:: with SMTP id es1mr7350253qvb.61.1624983338878;
Tue, 29 Jun 2021 09:15:38 -0700 (PDT)
X-Received: by 2002:a9d:66c3:: with SMTP id t3mr5125815otm.276.1624983338641;
Tue, 29 Jun 2021 09:15:38 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Jun 2021 09:15:38 -0700 (PDT)
In-Reply-To: <792c7667-8692-4c18-8c02-8727852595d8n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=64.26.99.248; posting-account=6JNn0QoAAAD-Scrkl0ClrfutZTkrOS9S
NNTP-Posting-Host: 64.26.99.248
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <93c3554c-dbea-48d6-987a-28d95476f962n@googlegroups.com>
<792c7667-8692-4c18-8c02-8727852595d8n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <fd177abd-f48e-4a29-88c0-84d4d964f01fn@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: paaroncl...@gmail.com (Paul A. Clayton)
Injection-Date: Tue, 29 Jun 2021 16:15:38 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: Paul A. Clayton - Tue, 29 Jun 2021 16:15 UTC

On Monday, June 28, 2021 at 12:13:14 PM UTC-4, MitchAlsup wrote:
> On Monday, June 28, 2021 at 10:19:06 AM UTC-5, Paul A. Clayton wrote:
[snip]
>> I think there may have been a miscommunication. The idea
>> is not to force all predictions to use the same indexing
>> information (i.e., different types of jumps would use
>> different information), but to merge the *storage*.
>
> Perhaps I did not get to my point:: if you can identify
> more different kinds of "branching" it is more likely that
> each one has an optimal predictor than all of then having
> a common optimal predictor.

I tend to agree, though I feel the targets of direct,
single-target jumps and returns should be cached/memoized
rather than predicted. For returns, this introduces extra
tagging overhead and forces mispeculation fix-up to apply to
the cache. For direct jumps, this adds some tagging overhead
to the degree that the target cache has broader coverage
than the instruction cache (and any outer shared cache which
can compress instruction data by eliding targets cached in
the specialized target caches — just accepting redudant
storage may be easier, avoiding coherent retention of data
[dropping a target address would only be a performance
problem (and potential side channel) since the absence would
be recognized and the data would be in main memory).

(Such also implies predecoding to extract/calculate targets.
The variable storage requirements further adds complexity.)

Storing targets on the instruction side for jump-based
switch statements would allow the predictor to only store
an index into the instruction-side jump table rather than
an offset. It might also be possible to microarchitecturally
bypass proximate storage for rarely taken targets/offsets.
(Architecturally, if a collection of targets is known at
compile-time to be rarely taken, the standard cold code
optimization could be used.)

For C++-like virtual calls, if one does not have code
generation (or fix-up) across compilation units, one could
not use an inline table. Such would also presumably increase
the size of an executable text because each method will have
multiple call sites. (Direct calls have the same issue, of
course, and few suggest requiring tables so that an offset
is replaced with a smaller index.)

(I like the idea of link-time code generation, but such is
more complex and slower. Caching of compilation information
and invalidation at finer granularity than object file would
presumably reduce compile times, but such does not seem to
be commonly implemented and would add further complexity.)

If your statement, Mitch, was that one should not focus on
incremental improvements in this area but look closer at the
root of the problem, I can understand the connection with
my post, which merely incrementally (possibly) improved
common practice. Since it addressed common practice and
seemed kind of neat (and added a use for overlaid skewed
associativity: https://cs.stackexchange.com/q/108720/4577 )
I wanted to share it.

Control flow target encoding and placement does not seem to
be a solved problem. One would prefer to extract targets
that are rarely taken from the instruction fetch stream, but
retreiving even cool targets all the way from main memory
might cost more than is saved by extraction from the
instruction stream. (One intermediate between inline and
completely independent [especially data-side] target
retreival would be an offset into a local aligned storage
area. E.g., every 1 MiB code region might be enabled to
index storage directly below that region.)

With respect to setjump/longjump, such seems to be similar
to setting up an exception target and then taking an
exception. Caching such a target would not seem especially
difficult (though being per-thread — I think — when other
exception targets are typically(?) per 'process' would
increase tagging overhead). As with exceptions, I suspect
there is some way that threading support would integrate
with this, but as a non-programmer I am fairly clueless
about how such features are used.

Re: Minor idea for indirect target predictor

<45323a97-6be5-4f68-ae79-baefa3d65d6bn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18200&group=comp.arch#18200

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:6214:364:: with SMTP id t4mr8590761qvu.54.1624988358874; Tue, 29 Jun 2021 10:39:18 -0700 (PDT)
X-Received: by 2002:aca:3d56:: with SMTP id k83mr13394oia.110.1624988358662; Tue, 29 Jun 2021 10:39:18 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!tr2.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Jun 2021 10:39:18 -0700 (PDT)
In-Reply-To: <sbe511$src$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1a6:66b6:1520:df39; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1a6:66b6:1520:df39
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com> <17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <sbe511$src$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <45323a97-6be5-4f68-ae79-baefa3d65d6bn@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 29 Jun 2021 17:39:18 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 26
 by: MitchAlsup - Tue, 29 Jun 2021 17:39 UTC

On Monday, June 28, 2021 at 10:45:41 PM UTC-5, Bitter blue pill wrote:
> On 27/06/21 7:31 am, MitchAlsup wrote:
>
> > There are five cases: Switches, Returns, Method calls, Tabulated Calls, and Longjump.
> > <
> > Switches tend to be unpredictable much of the time.
> > Returns are easily highly predictable
> > Method calls tend to use few methods more than others
> > Tabulated calls tend to be indexed tables of entry points
> > And longjump is a category all by itself.
<
> Ummm, so no love for Call Return Tables?, wouldn't they go some way to
> reducing the number of conditional instructions spent checking for error
> return codes and having otherwise unneeded instructions situated right
> after the original call.
<
I am not sure I understand your 3 word phrase--can you give an example ?
<
I have seen a lot of code that does stuff like::
<
if( function( arg1, arg1, arg3 ) == SUCCESS )
if( next_function( arg4, arg5 ) == SUCCESS )
return SUCCESS;
return FAILURE;
<
And I would expect the code at the return point have a CMP-BC or BC0
depending if SUCCESS was == 0 or not.

Re: Minor idea for indirect target predictor

<e21452cb-5aef-4906-a1e2-3c347f2c2307n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18201&group=comp.arch#18201

  copy link   Newsgroups: comp.arch
X-Received: by 2002:ac8:4cd0:: with SMTP id l16mr23923559qtv.54.1624988596437;
Tue, 29 Jun 2021 10:43:16 -0700 (PDT)
X-Received: by 2002:aca:4f16:: with SMTP id d22mr35408oib.44.1624988596170;
Tue, 29 Jun 2021 10:43:16 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Jun 2021 10:43:15 -0700 (PDT)
In-Reply-To: <sbfg9g$371$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1a6:66b6:1520:df39;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1a6:66b6:1520:df39
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <sbe511$src$1@gioia.aioe.org>
<sbfg9g$371$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e21452cb-5aef-4906-a1e2-3c347f2c2307n@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 29 Jun 2021 17:43:16 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Tue, 29 Jun 2021 17:43 UTC

On Tuesday, June 29, 2021 at 11:04:03 AM UTC-5, BGB wrote:
> On 6/28/2021 10:45 PM, Andy wrote:
> > On 27/06/21 7:31 am, MitchAlsup wrote:
> >
> >> There are five cases: Switches, Returns, Method calls, Tabulated
> >> Calls, and Longjump.
> >> <
> >> Switches tend to be unpredictable much of the time.
> >> Returns are easily highly predictable
> >> Method calls tend to use few methods more than others
> >> Tabulated calls tend to be indexed tables of entry points
> >> And longjump is a category all by itself.
> >
> > Ummm, so no love for Call Return Tables?, wouldn't they go some way to
> > reducing the number of conditional instructions spent checking for error
> > return codes and having otherwise unneeded instructions situated right
> > after the original call.
> >
> A dedicated call-return table probably only really makes sense for
> something like x86, which loads the return address from memory at the
> same time as branching to it.
>
> If you use a link register, or return via some other special register
> (such as treating R1 as a secondary link register), then all one needs
> to do is check is that this register hasn't been modified within the
> pipeline, and if so use the value held in this register as the predicted
> branch target.
>
> In this case, typically one can reload the LR or "Secondary LR" several
> cycles before its associated RTS or JMP. If one reloads the link
> register before the others in a function epilog, then (typically) the
> reloaded register will have left the pipeline before the CPU reaches the
> return instruction.
>
>
> Meanwhile, switch faces a problem that typically one calculates the
> switch target immediately before branching to it. If one could somehow
> do something like:
> Calculate where the "switch()" will go;
> Put some prior unrelated logic here;
> Do the branch to the switch target.
>
> Then "switch()" could potentially also be made predictable.
>
> For small switch one might instead use linear probing or binary
> subdivision or similar, in which case it is predictable if the
> individual branches are predictable.
>
> Say (number of case labels):
> 1-3: Linear Probe
> 4-11: Binary Subdivide
> 12-511 (and sufficiently dense): Branch Table
<
My 66000 ISA directly supports tabularized dense switch regions. The
instruction checks the (now zero relative) index, and fetches an offset
to the switch case label, checks the index for being in bounds, and
transfers control to the case label or the default label (depending).
This is efficient enough to cover the dense 4-11 case, too.
<
> Else: Binary Subdivide (1)
>
> *1: Binary subdivide happens until either it reaches the conditions for
> a branch-table, or the number of targets is small enough that it falls
> back to a linear probe.
>
> ...

Re: Minor idea for indirect target predictor

<4be23524-eea4-4cb0-baad-3268d5bf4c18n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18202&group=comp.arch#18202

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:622a:149:: with SMTP id v9mr28255727qtw.144.1624989005358;
Tue, 29 Jun 2021 10:50:05 -0700 (PDT)
X-Received: by 2002:a4a:e1c5:: with SMTP id n5mr5000309oot.5.1624989005098;
Tue, 29 Jun 2021 10:50:05 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Jun 2021 10:50:04 -0700 (PDT)
In-Reply-To: <fd177abd-f48e-4a29-88c0-84d4d964f01fn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1a6:66b6:1520:df39;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1a6:66b6:1520:df39
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <93c3554c-dbea-48d6-987a-28d95476f962n@googlegroups.com>
<792c7667-8692-4c18-8c02-8727852595d8n@googlegroups.com> <fd177abd-f48e-4a29-88c0-84d4d964f01fn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <4be23524-eea4-4cb0-baad-3268d5bf4c18n@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 29 Jun 2021 17:50:05 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Tue, 29 Jun 2021 17:50 UTC

On Tuesday, June 29, 2021 at 11:15:39 AM UTC-5, Paul A. Clayton wrote:
> On Monday, June 28, 2021 at 12:13:14 PM UTC-4, MitchAlsup wrote:
> > On Monday, June 28, 2021 at 10:19:06 AM UTC-5, Paul A. Clayton wrote:
> [snip]
> >> I think there may have been a miscommunication. The idea
> >> is not to force all predictions to use the same indexing
> >> information (i.e., different types of jumps would use
> >> different information), but to merge the *storage*.
> >
> > Perhaps I did not get to my point:: if you can identify
> > more different kinds of "branching" it is more likely that
> > each one has an optimal predictor than all of then having
> > a common optimal predictor.
> I tend to agree, though I feel the targets of direct,
> single-target jumps and returns should be cached/memoized
> rather than predicted. For returns, this introduces extra
> tagging overhead and forces mispeculation fix-up to apply to
> the cache.
<
Which is why call/ret is generally done with a predictor operating
much like a cache--so it can be un-done and un-un-done when
necessary.
<
But I don't think one should mingle the call return stack with the
conditional branch predictor. Nor should the indirect predictor
be integrated. The way switches and method calls are supported
in My 66000 ISA means these need no predictors, just fast or wide
ICache access.
<
> For direct jumps, this adds some tagging overhead
> to the degree that the target cache has broader coverage
> than the instruction cache (and any outer shared cache which
> can compress instruction data by eliding targets cached in
> the specialized target caches — just accepting redudant
> storage may be easier, avoiding coherent retention of data
> [dropping a target address would only be a performance
> problem (and potential side channel) since the absence would
> be recognized and the data would be in main memory).
>
> (Such also implies predecoding to extract/calculate targets.
> The variable storage requirements further adds complexity.)
>
> Storing targets on the instruction side for jump-based
> switch statements would allow the predictor to only store
> an index into the instruction-side jump table rather than
> an offset. It might also be possible to microarchitecturally
> bypass proximate storage for rarely taken targets/offsets.
> (Architecturally, if a collection of targets is known at
> compile-time to be rarely taken, the standard cold code
> optimization could be used.)
>
> For C++-like virtual calls, if one does not have code
> generation (or fix-up) across compilation units, one could
> not use an inline table. Such would also presumably increase
> the size of an executable text because each method will have
> multiple call sites. (Direct calls have the same issue, of
> course, and few suggest requiring tables so that an offset
> is replaced with a smaller index.)
>
> (I like the idea of link-time code generation, but such is
> more complex and slower. Caching of compilation information
> and invalidation at finer granularity than object file would
> presumably reduce compile times, but such does not seem to
> be commonly implemented and would add further complexity.)
>
> If your statement, Mitch, was that one should not focus on
> incremental improvements in this area but look closer at the
> root of the problem, I can understand the connection with
> my post, which merely incrementally (possibly) improved
> common practice. Since it addressed common practice and
> seemed kind of neat (and added a use for overlaid skewed
> associativity: https://cs.stackexchange.com/q/108720/4577 )
> I wanted to share it.
>
> Control flow target encoding and placement does not seem to
> be a solved problem. One would prefer to extract targets
> that are rarely taken from the instruction fetch stream, but
> retreiving even cool targets all the way from main memory
> might cost more than is saved by extraction from the
> instruction stream. (One intermediate between inline and
> completely independent [especially data-side] target
> retreival would be an offset into a local aligned storage
> area. E.g., every 1 MiB code region might be enabled to
> index storage directly below that region.)
>
> With respect to setjump/longjump, such seems to be similar
> to setting up an exception target and then taking an
> exception. Caching such a target would not seem especially
> difficult (though being per-thread — I think — when other
> exception targets are typically(?) per 'process' would
> increase tagging overhead).
<
The issue with longjump() is that you need to call destructors as
the stack is walked backwards in C++.
<
>As with exceptions, I suspect
> there is some way that threading support would integrate
> with this, but as a non-programmer I am fairly clueless
> about how such features are used.

Re: Minor idea for indirect target predictor

<2021Jun29.193553@mips.complang.tuwien.ac.at>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18203&group=comp.arch#18203

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Tue, 29 Jun 2021 17:35:53 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 44
Message-ID: <2021Jun29.193553@mips.complang.tuwien.ac.at>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com> <17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
Injection-Info: reader02.eternal-september.org; posting-host="c65cab040ca5bcb1b4afd0898903036f";
logging-data="25206"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+sF/xES1X/O2GgkPkgZR6t"
Cancel-Lock: sha1:bwUg0zDZuNcKJhn4tsXuk5Up7DU=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Tue, 29 Jun 2021 17:35 UTC

BGB <cr88192@gmail.com> writes:
>If you use a link register, or return via some other special register
>(such as treating R1 as a secondary link register), then all one needs
>to do is check is that this register hasn't been modified within the
>pipeline, and if so use the value held in this register as the predicted
>branch target.

McKinley and the PowerPC 970 predicted indirect jumps that way. If
the target arrives a short time before the call, this causes a
misprediction; IIRC it cost 6 cycles on McKinley (which did not have a
particularly deep pipeline).

On a modern high-performance CPU, you want to be able to call a short
routine, fetch the instructions from there, and fetch the instructions
from after the call a cycle later. Modern AMD64 implementations
achieve that (thanks to a very streamlined predictor including a
return stack), but with the approach you are suggesting it would not
be possible; keeping the return address in a register helps only so
far.

>In this case, typically one can reload the LR or "Secondary LR" several
>cycles before its associated RTS or JMP.

On a wide processor, you often cannot, see below.

>If one reloads the link
>register before the others in a function epilog, then (typically) the
>reloaded register will have left the pipeline before the CPU reaches the
>return instruction.

You certainly cannot if you do the load in a function epilog (rather
than immediately after the last call). The load has 3-5 cycles of
latency for use by some execution unit, and even in short pipelines
several additional cycles for use in instruction fetching; let's say
you have a short pipeline and only 6 cycles latency from executing a
load to instruction fetch that uses its result; and you have a
moderately wide machine at 4 instructions fetched and decoded per
cycle; then you need to put the load 24 cycles before the return
instructions.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Minor idea for indirect target predictor

<sbfs7g$o15$1@dont-email.me>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18207&group=comp.arch#18207

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Tue, 29 Jun 2021 12:27:43 -0700
Organization: A noiseless patient Spider
Lines: 70
Message-ID: <sbfs7g$o15$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 29 Jun 2021 19:27:44 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="6dab9203f08465642dc0131cfddaffcf";
logging-data="24613"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/302ZzmWZKSa3tFm5V2Q+f"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:OYq8ynPsv0tBOyOsPHCfh1duTn4=
In-Reply-To: <sbfg9g$371$1@dont-email.me>
Content-Language: en-US
 by: Ivan Godard - Tue, 29 Jun 2021 19:27 UTC

On 6/29/2021 9:01 AM, BGB wrote:
> On 6/28/2021 10:45 PM, Andy wrote:
>> On 27/06/21 7:31 am, MitchAlsup wrote:
>>
>>> There are five cases: Switches, Returns, Method calls, Tabulated
>>> Calls, and Longjump.
>>> <
>>> Switches tend to be unpredictable much of the time.
>>> Returns are easily highly predictable
>>> Method calls tend to use few methods more than others
>>> Tabulated calls tend to be indexed tables of entry points
>>> And longjump is a category all by itself.
>>
>> Ummm, so no love for Call Return Tables?, wouldn't they go some way to
>> reducing the number of conditional instructions spent checking for
>> error return codes and having otherwise unneeded instructions situated
>> right after the original call.
>>
>
> A dedicated call-return table probably only really makes sense for
> something like x86, which loads the return address from memory at the
> same time as branching to it.
>
> If you use a link register, or return via some other special register
> (such as treating R1 as a secondary link register), then all one needs
> to do is check is that this register hasn't been modified within the
> pipeline, and if so use the value held in this register as the predicted
> branch target.
>
> In this case, typically one can reload the LR or "Secondary LR" several
> cycles before its associated RTS or JMP. If one reloads the link
> register before the others in a function epilog, then (typically) the
> reloaded register will have left the pipeline before the CPU reaches the
> return instruction.
>
>
> Meanwhile, switch faces a problem that typically one calculates the
> switch target immediately before branching to it. If one could somehow
> do something like:
>   Calculate where the "switch()" will go;
>   Put some prior unrelated logic here;
>   Do the branch to the switch target.
>
> Then "switch()" could potentially also be made predictable.

You can do exactly that: figure index; index-load from a table in
memory; do other stuff; indirect branch. Of course, that puts the table
in dmem, which Mitch doesn't like, or requires a load instruction from imem.

> For small switch one might instead use linear probing or binary
> subdivision or similar, in which case it is predictable if the
> individual branches are predictable.
>
> Say (number of case labels):
>   1-3: Linear Probe
>   4-11: Binary Subdivide
>   12-511 (and sufficiently dense): Branch Table
>   Else: Binary Subdivide (1)
>
> *1: Binary subdivide happens until either it reaches the conditions for
> a branch-table, or the number of targets is small enough that it falls
> back to a linear probe.
>
> ...

What matters is the distribution of indices, not the number of targets.
Think a typical lexer: you want to isolate all alphas with one compare,
not a four way binary to a particular alpha, because all the alphas go
to a single label. Once you do that, normal ifconversion will get rid of
half the internal branches of the decision tree..

Re: Minor idea for indirect target predictor

<8fda3263-0c66-4255-a625-8743a033ade7n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18210&group=comp.arch#18210

  copy link   Newsgroups: comp.arch
X-Received: by 2002:ac8:65c9:: with SMTP id t9mr13073581qto.102.1624998527407; Tue, 29 Jun 2021 13:28:47 -0700 (PDT)
X-Received: by 2002:a9d:c23:: with SMTP id 32mr5924086otr.182.1624998527154; Tue, 29 Jun 2021 13:28:47 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Jun 2021 13:28:46 -0700 (PDT)
In-Reply-To: <2021Jun29.193553@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1a6:66b6:1520:df39; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1a6:66b6:1520:df39
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com> <17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me> <2021Jun29.193553@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8fda3263-0c66-4255-a625-8743a033ade7n@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 29 Jun 2021 20:28:47 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 44
 by: MitchAlsup - Tue, 29 Jun 2021 20:28 UTC

On Tuesday, June 29, 2021 at 12:53:14 PM UTC-5, Anton Ertl wrote:
> BGB <cr8...@gmail.com> writes:
> >If you use a link register, or return via some other special register
> >(such as treating R1 as a secondary link register), then all one needs
> >to do is check is that this register hasn't been modified within the
> >pipeline, and if so use the value held in this register as the predicted
> >branch target.
> McKinley and the PowerPC 970 predicted indirect jumps that way. If
> the target arrives a short time before the call, this causes a
> misprediction; IIRC it cost 6 cycles on McKinley (which did not have a
> particularly deep pipeline).
>
> On a modern high-performance CPU, you want to be able to call a short
> routine, fetch the instructions from there, and fetch the instructions
> from after the call a cycle later. Modern AMD64 implementations
> achieve that (thanks to a very streamlined predictor including a
> return stack), but with the approach you are suggesting it would not
> be possible; keeping the return address in a register helps only so
> far.
> >In this case, typically one can reload the LR or "Secondary LR" several
> >cycles before its associated RTS or JMP.
> On a wide processor, you often cannot, see below.
> >If one reloads the link
> >register before the others in a function epilog, then (typically) the
> >reloaded register will have left the pipeline before the CPU reaches the
> >return instruction.
<
> You certainly cannot if you do the load in a function epilog (rather
> than immediately after the last call). The load has 3-5 cycles of
> latency for use by some execution unit, and even in short pipelines
> several additional cycles for use in instruction fetching; let's say
> you have a short pipeline and only 6 cycles latency from executing a
> load to instruction fetch that uses its result; and you have a
> moderately wide machine at 4 instructions fetched and decoded per
> cycle; then you need to put the load 24 cycles before the return
> instructions.
<
In My 66000, EXIT fetches the return address first then fetches the instructions
at the return point, before reloading the saved registers. So the instructions
are entering the pipeline when the preserved registers have their proper values.
>
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Minor idea for indirect target predictor

<3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18211&group=comp.arch#18211

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:620a:1281:: with SMTP id w1mr33211757qki.261.1624998798520;
Tue, 29 Jun 2021 13:33:18 -0700 (PDT)
X-Received: by 2002:a9d:7cd1:: with SMTP id r17mr5689554otn.110.1624998798363;
Tue, 29 Jun 2021 13:33:18 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Jun 2021 13:33:18 -0700 (PDT)
In-Reply-To: <sbfs7g$o15$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1a6:66b6:1520:df39;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1a6:66b6:1520:df39
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <sbe511$src$1@gioia.aioe.org>
<sbfg9g$371$1@dont-email.me> <sbfs7g$o15$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 29 Jun 2021 20:33:18 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Tue, 29 Jun 2021 20:33 UTC

On Tuesday, June 29, 2021 at 2:27:46 PM UTC-5, Ivan Godard wrote:
> On 6/29/2021 9:01 AM, BGB wrote:
> > On 6/28/2021 10:45 PM, Andy wrote:
> >> On 27/06/21 7:31 am, MitchAlsup wrote:
> >>
> >>> There are five cases: Switches, Returns, Method calls, Tabulated
> >>> Calls, and Longjump.
> >>> <
> >>> Switches tend to be unpredictable much of the time.
> >>> Returns are easily highly predictable
> >>> Method calls tend to use few methods more than others
> >>> Tabulated calls tend to be indexed tables of entry points
> >>> And longjump is a category all by itself.
> >>
> >> Ummm, so no love for Call Return Tables?, wouldn't they go some way to
> >> reducing the number of conditional instructions spent checking for
> >> error return codes and having otherwise unneeded instructions situated
> >> right after the original call.
> >>
> >
> > A dedicated call-return table probably only really makes sense for
> > something like x86, which loads the return address from memory at the
> > same time as branching to it.
> >
> > If you use a link register, or return via some other special register
> > (such as treating R1 as a secondary link register), then all one needs
> > to do is check is that this register hasn't been modified within the
> > pipeline, and if so use the value held in this register as the predicted
> > branch target.
> >
> > In this case, typically one can reload the LR or "Secondary LR" several
> > cycles before its associated RTS or JMP. If one reloads the link
> > register before the others in a function epilog, then (typically) the
> > reloaded register will have left the pipeline before the CPU reaches the
> > return instruction.
> >
> >
> > Meanwhile, switch faces a problem that typically one calculates the
> > switch target immediately before branching to it. If one could somehow
> > do something like:
> > Calculate where the "switch()" will go;
> > Put some prior unrelated logic here;
> > Do the branch to the switch target.
> >
> > Then "switch()" could potentially also be made predictable.
> You can do exactly that: figure index; index-load from a table in
> memory; do other stuff; indirect branch. Of course, that puts the table
> in dmem, which Mitch doesn't like, or requires a load instruction from imem.
<
Switch is common enough that it should be performed in an instruction
(the multiway branch) I added limit checking to this due to how it
"worked out" in my ISA.
<
Also note: the tables in my switches remain program relative, so you fetch
from ICache and then add this to the IP for the target address. This saves
table size as the majority of all tables fit in 16-bit offsets, retains program
relativity.
<
> > For small switch one might instead use linear probing or binary
> > subdivision or similar, in which case it is predictable if the
> > individual branches are predictable.
> >
> > Say (number of case labels):
> > 1-3: Linear Probe
> > 4-11: Binary Subdivide
> > 12-511 (and sufficiently dense): Branch Table
> > Else: Binary Subdivide (1)
> >
> > *1: Binary subdivide happens until either it reaches the conditions for
> > a branch-table, or the number of targets is small enough that it falls
> > back to a linear probe.
> >
> > ...
> What matters is the distribution of indices, not the number of targets.
> Think a typical lexer: you want to isolate all alphas with one compare,
> not a four way binary to a particular alpha, because all the alphas go
> to a single label. Once you do that, normal ifconversion will get rid of
> half the internal branches of the decision tree..
<
In any event, any HW is going to be able to sort through the situation
faster than any string of SW instructions.

Re: Minor idea for indirect target predictor

<sbg1u7$v63$1@dont-email.me>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18212&group=comp.arch#18212

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Tue, 29 Jun 2021 14:05:11 -0700
Organization: A noiseless patient Spider
Lines: 93
Message-ID: <sbg1u7$v63$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me>
<3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 29 Jun 2021 21:05:11 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="6dab9203f08465642dc0131cfddaffcf";
logging-data="31939"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/DxfGPW+iA4NBjikQJ8P8E"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:Jde5Ez2n6bYTCZjk/KPnjCpWIhs=
In-Reply-To: <3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Tue, 29 Jun 2021 21:05 UTC

On 6/29/2021 1:33 PM, MitchAlsup wrote:
> On Tuesday, June 29, 2021 at 2:27:46 PM UTC-5, Ivan Godard wrote:
>> On 6/29/2021 9:01 AM, BGB wrote:
>>> On 6/28/2021 10:45 PM, Andy wrote:
>>>> On 27/06/21 7:31 am, MitchAlsup wrote:
>>>>
>>>>> There are five cases: Switches, Returns, Method calls, Tabulated
>>>>> Calls, and Longjump.
>>>>> <
>>>>> Switches tend to be unpredictable much of the time.
>>>>> Returns are easily highly predictable
>>>>> Method calls tend to use few methods more than others
>>>>> Tabulated calls tend to be indexed tables of entry points
>>>>> And longjump is a category all by itself.
>>>>
>>>> Ummm, so no love for Call Return Tables?, wouldn't they go some way to
>>>> reducing the number of conditional instructions spent checking for
>>>> error return codes and having otherwise unneeded instructions situated
>>>> right after the original call.
>>>>
>>>
>>> A dedicated call-return table probably only really makes sense for
>>> something like x86, which loads the return address from memory at the
>>> same time as branching to it.
>>>
>>> If you use a link register, or return via some other special register
>>> (such as treating R1 as a secondary link register), then all one needs
>>> to do is check is that this register hasn't been modified within the
>>> pipeline, and if so use the value held in this register as the predicted
>>> branch target.
>>>
>>> In this case, typically one can reload the LR or "Secondary LR" several
>>> cycles before its associated RTS or JMP. If one reloads the link
>>> register before the others in a function epilog, then (typically) the
>>> reloaded register will have left the pipeline before the CPU reaches the
>>> return instruction.
>>>
>>>
>>> Meanwhile, switch faces a problem that typically one calculates the
>>> switch target immediately before branching to it. If one could somehow
>>> do something like:
>>> Calculate where the "switch()" will go;
>>> Put some prior unrelated logic here;
>>> Do the branch to the switch target.
>>>
>>> Then "switch()" could potentially also be made predictable.
>> You can do exactly that: figure index; index-load from a table in
>> memory; do other stuff; indirect branch. Of course, that puts the table
>> in dmem, which Mitch doesn't like, or requires a load instruction from imem.
> <
> Switch is common enough that it should be performed in an instruction
> (the multiway branch) I added limit checking to this due to how it
> "worked out" in my ISA.
> <
> Also note: the tables in my switches remain program relative, so you fetch
> from ICache and then add this to the IP for the target address. This saves
> table size as the majority of all tables fit in 16-bit offsets, retains program
> relativity.
> <
>>> For small switch one might instead use linear probing or binary
>>> subdivision or similar, in which case it is predictable if the
>>> individual branches are predictable.
>>>
>>> Say (number of case labels):
>>> 1-3: Linear Probe
>>> 4-11: Binary Subdivide
>>> 12-511 (and sufficiently dense): Branch Table
>>> Else: Binary Subdivide (1)
>>>
>>> *1: Binary subdivide happens until either it reaches the conditions for
>>> a branch-table, or the number of targets is small enough that it falls
>>> back to a linear probe.
>>>
>>> ...
>> What matters is the distribution of indices, not the number of targets.
>> Think a typical lexer: you want to isolate all alphas with one compare,
>> not a four way binary to a particular alpha, because all the alphas go
>> to a single label. Once you do that, normal ifconversion will get rid of
>> half the internal branches of the decision tree..
> <
> In any event, any HW is going to be able to sort through the situation
> faster than any string of SW instructions.
>

Arguable.

Label density matters for the tables too:
switch (<32 bit>) {
<cases: 11111, 22222, 33333, 44444, 55555, 66666, 77777, 88888, 99999>

Do you really want 100k labels in your table, or a 3-deep compare tree?

Out of curiosity, what happens when you throw that at Brian's compiler?

Re: Minor idea for indirect target predictor

<jwv5yxwfi37.fsf-monnier+comp.arch@gnu.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18216&group=comp.arch#18216

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Tue, 29 Jun 2021 18:02:32 -0400
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <jwv5yxwfi37.fsf-monnier+comp.arch@gnu.org>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me>
<3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="48dc9934e6c6b6e6adbb2f03a0f875ee";
logging-data="19142"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+/7VDjdq709FhT2AMlaee3"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Cancel-Lock: sha1:7ruffpWOAGweJBdCBDv+t9+1+zw=
sha1:MLJ5z3GbAfU5MFT4KbqtnEOmLSU=
 by: Stefan Monnier - Tue, 29 Jun 2021 22:02 UTC

> Switch is common enough that it should be performed in an instruction
> (the multiway branch) I added limit checking to this due to how it
> "worked out" in my ISA.

BTW, what kind of prediction structure do you foresee for your "multiway
branch" instructions? Do you foresee predicting the index (and then
looking up the corresponding target address in the table), so similar to
a taken/nottaken predictor but just returning more than one bit, or do
you foresee something more like a BTB (i.e. predict the actual target
address of the branch)?

Stefan

Re: Minor idea for indirect target predictor

<jwvv95we39j.fsf-monnier+comp.arch@gnu.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18217&group=comp.arch#18217

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Tue, 29 Jun 2021 18:04:45 -0400
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <jwvv95we39j.fsf-monnier+comp.arch@gnu.org>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me>
<3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="48dc9934e6c6b6e6adbb2f03a0f875ee";
logging-data="23384"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18tX+ATqvGISzhb+0mcYY4P"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Cancel-Lock: sha1:8F0Eyf/di4yLx0YvsVAbySmgU58=
sha1:OsdVeFpsY2DbX/OUfJeOnORgFbE=
 by: Stefan Monnier - Tue, 29 Jun 2021 22:04 UTC

> Switch is common enough that it should be performed in an instruction
> (the multiway branch) I added limit checking to this due to how it
> "worked out" in my ISA.

BTW, what kind of prediction structure do you foresee for your "multiway
branch" instructions? Do you foresee predicting the index (and then
looking up the corresponding target address in the table), so similar to
a taken/nottaken predictor but just returning more than one bit, or do
you foresee something more like a BTB (i.e. predict the actual target
address of the branch)?

Stefan

Re: Minor idea for indirect target predictor

<sbg6bf$rrm$1@dont-email.me>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=18218&group=comp.arch#18218

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Tue, 29 Jun 2021 15:20:31 -0700
Organization: A noiseless patient Spider
Lines: 92
Message-ID: <sbg6bf$rrm$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 29 Jun 2021 22:20:31 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="75366452937bde3736f0394a49cb7218";
logging-data="28534"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/dVrmNIjSjMbotL0wmLeJ/aVL+F20rUuQ="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:cKewl1zAUkeHP4P4/w1MRMcEdAk=
In-Reply-To: <sbfs7g$o15$1@dont-email.me>
Content-Language: en-US
 by: Stephen Fuld - Tue, 29 Jun 2021 22:20 UTC

On 6/29/2021 12:27 PM, Ivan Godard wrote:
> On 6/29/2021 9:01 AM, BGB wrote:
>> On 6/28/2021 10:45 PM, Andy wrote:
>>> On 27/06/21 7:31 am, MitchAlsup wrote:
>>>
>>>> There are five cases: Switches, Returns, Method calls, Tabulated
>>>> Calls, and Longjump.
>>>> <
>>>> Switches tend to be unpredictable much of the time.
>>>> Returns are easily highly predictable
>>>> Method calls tend to use few methods more than others
>>>> Tabulated calls tend to be indexed tables of entry points
>>>> And longjump is a category all by itself.
>>>
>>> Ummm, so no love for Call Return Tables?, wouldn't they go some way
>>> to reducing the number of conditional instructions spent checking for
>>> error return codes and having otherwise unneeded instructions
>>> situated right after the original call.
>>>
>>
>> A dedicated call-return table probably only really makes sense for
>> something like x86, which loads the return address from memory at the
>> same time as branching to it.
>>
>> If you use a link register, or return via some other special register
>> (such as treating R1 as a secondary link register), then all one needs
>> to do is check is that this register hasn't been modified within the
>> pipeline, and if so use the value held in this register as the
>> predicted branch target.
>>
>> In this case, typically one can reload the LR or "Secondary LR"
>> several cycles before its associated RTS or JMP. If one reloads the
>> link register before the others in a function epilog, then (typically)
>> the reloaded register will have left the pipeline before the CPU
>> reaches the return instruction.
>>
>>
>> Meanwhile, switch faces a problem that typically one calculates the
>> switch target immediately before branching to it. If one could somehow
>> do something like:
>>    Calculate where the "switch()" will go;
>>    Put some prior unrelated logic here;
>>    Do the branch to the switch target.
>>
>> Then "switch()" could potentially also be made predictable.
>
> You can do exactly that: figure index; index-load from a table in
> memory; do other stuff; indirect branch. Of course, that puts the table
> in dmem, which Mitch doesn't like, or requires a load instruction from
> imem.
>
>> For small switch one might instead use linear probing or binary
>> subdivision or similar, in which case it is predictable if the
>> individual branches are predictable.
>>
>> Say (number of case labels):
>>    1-3: Linear Probe
>>    4-11: Binary Subdivide
>>    12-511 (and sufficiently dense): Branch Table
>>    Else: Binary Subdivide (1)
>>
>> *1: Binary subdivide happens until either it reaches the conditions
>> for a branch-table, or the number of targets is small enough that it
>> falls back to a linear probe.
>>
>> ...
>
> What matters is the distribution of indices, not the number of targets.
> Think a typical lexer: you want to isolate all alphas with one compare,
> not a four way binary to a particular alpha, because all the alphas go
> to a single label. Once you do that, normal ifconversion will get rid of
> half the internal branches of the decision tree..

I am not a compiler guy, so forgive my ignorance. What would a compiler
do in such a situation? Would it first test for something like if char
>= "a" and <= "z", etc.?

Just thinking about how I might do it, I think I would have a 256 byte
array, indexed by the character value, with each entry having an
indicator value, say 1 for lower case letters, 2 for upper case letters,
3 for numbers, 4 for white space characters, etc. Looking up the value
shouldn't be costly as most of the table with shortly be in D-cache.
Then use the indicator value to index a jump or call table (which I
guess is fairly predictable as most characters are lower case letters.

But what do I know? :-(

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Pages:12345678
server_pubkey.txt

rocksolid light 0.9.8
clearnet tor