Message-ID:

When speculation has done its worst, two plus two still equals four. -- S. Johnson

devel / comp.arch / Re: Minor idea for indirect target predictor

Re: Minor idea for indirect target predictor

<sbg8e7$86c$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18219&group=comp.arch#18219

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Tue, 29 Jun 2021 17:53:39 -0500
Organization: A noiseless patient Spider
Lines: 121
Message-ID: <sbg8e7$86c$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<2021Jun29.193553@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 29 Jun 2021 22:56:07 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="45b8fecc7a159f0e815ab6ce141c45b4";
logging-data="8396"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18vnQb9fT5Uecrp28VGkGJG"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:RNet5rFzXGS9XFoD8iWTG3R0TFs=
In-Reply-To: <2021Jun29.193553@mips.complang.tuwien.ac.at>
Content-Language: en-US

by: BGB - Tue, 29 Jun 2021 22:53 UTC

On 6/29/2021 12:35 PM, Anton Ertl wrote:
> BGB <cr88192@gmail.com> writes:
>> If you use a link register, or return via some other special register
>> (such as treating R1 as a secondary link register), then all one needs
>> to do is check is that this register hasn't been modified within the
>> pipeline, and if so use the value held in this register as the predicted
>> branch target.
>
> McKinley and the PowerPC 970 predicted indirect jumps that way. If
> the target arrives a short time before the call, this causes a
> misprediction; IIRC it cost 6 cycles on McKinley (which did not have a
> particularly deep pipeline).
>

I am doing it this way with my BJX2 core, which has an ~ 8-stage
pipeline, eg:
PF IF ID1 ID2 EX1 EX2 EX3 WB

PF: Pass next PC to I$
IF: I$ does its thing, gives result
ID1: Decode 1, Branch Predictor does its thing.
ID2: Decode 2, Fetch register values, ...
EX1: Execute 1 (AGU)
EX2: Execute 2 (Mem 1)
EX3: Execute 3 (Mem 2)
WB: Write-Back (Store results to GPR File)

Because the branch-predictor runs in ID2 and feeds the result into PF,
correctly-predicted taken branches still have a 2 cycle latency (best
case), whereas correctly-predicted non-taken branches have a 1 cycle
latency.

> On a modern high-performance CPU, you want to be able to call a short
> routine, fetch the instructions from there, and fetch the instructions
> from after the call a cycle later. Modern AMD64 implementations
> achieve that (thanks to a very streamlined predictor including a
> return stack), but with the approach you are suggesting it would not
> be possible; keeping the return address in a register helps only so
> far.
>

Potentially, but as noted, I am not talking about a modern x86 chip (or
even necessarily OoO).

>> In this case, typically one can reload the LR or "Secondary LR" several
>> cycles before its associated RTS or JMP.
>
> On a wide processor, you often cannot, see below.
>
>> If one reloads the link
>> register before the others in a function epilog, then (typically) the
>> reloaded register will have left the pipeline before the CPU reaches the
>> return instruction.
>
> You certainly cannot if you do the load in a function epilog (rather
> than immediately after the last call). The load has 3-5 cycles of
> latency for use by some execution unit, and even in short pipelines
> several additional cycles for use in instruction fetching; let's say
> you have a short pipeline and only 6 cycles latency from executing a
> load to instruction fetch that uses its result; and you have a
> moderately wide machine at 4 instructions fetched and decoded per
> cycle; then you need to put the load 24 cycles before the return
> instructions.
>

I have a core that does a max of 3 instructions per cycle, but only 1
memory access per cycle (For the most part, lanes 2 and 3 can only do
ALU ops and similar).

There is an optional feature which does allow issuing FPU ops from Lane
2, which can sometimes help some for FPU code by allowing memory access
and the FPU to operate in parallel. This isn't really an official
feature yet though (however, this does not allow running FPU ops in
parallel, as there is still only a single FPU).

If one does something like, say:
MOV.Q (SP, 64), R1
MOV.X (SP, 0), R24
MOV.X (SP, 16), R26
MOV.X (SP, 32), R28
MOV.X (SP, 48), R30
ADD 80, SP
JMP R1

Then R1 passes through WB just before the JMP hits ID1.

Longer is safer, and shorter than this will result in a non-predicted
branch (~ 9 cycles of penalty).

Not really a good workaround (in theory, an interlock could be done in
small cases, or possibly the compiler could detect these cases and
insert a few NOPs).

Granted, a fair number of functions save more registers than this, so, eg:
MOV.Q (SP, 112), R1
MOV.X (SP, 0), R8
MOV.X (SP, 16), R10
MOV.X (SP, 32), R12
MOV.X (SP, 48), R24
MOV.X (SP, 64), R26
MOV.X (SP, 80), R28
MOV.X (SP, 96), R30
ADD 128, SP
JMP R1

In which case R1 can be predicted.

If it is a leaf function which does not save LR and uses a plain RTS,
then the body of the leaf function typically provides enough of a buffer
zone (there may still be a penalty if the function is short enough that
the original BSR does not pass through WB before the RTS hits ID1).

Re: Minor idea for indirect target predictor

<sbg9cr$ddo$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18220&group=comp.arch#18220

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Tue, 29 Jun 2021 16:12:27 -0700
Organization: A noiseless patient Spider
Lines: 115
Message-ID: <sbg9cr$ddo$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 29 Jun 2021 23:12:28 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="f6e548dafa96a501ea22ce4810482039";
logging-data="13752"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+ov3Xr94mR846kAzhzIDDQ"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:cY8rBuTNP0rg4F0gl0Zcchen0Fk=
In-Reply-To: <sbg6bf$rrm$1@dont-email.me>
Content-Language: en-US

by: Ivan Godard - Tue, 29 Jun 2021 23:12 UTC

On 6/29/2021 3:20 PM, Stephen Fuld wrote:
> On 6/29/2021 12:27 PM, Ivan Godard wrote:
...
>>
>> What matters is the distribution of indices, not the number of
>> targets. Think a typical lexer: you want to isolate all alphas with
>> one compare, not a four way binary to a particular alpha, because all
>> the alphas go to a single label. Once you do that, normal ifconversion
>> will get rid of half the internal branches of the decision tree..
>
> I am not a compiler guy, so forgive my ignorance. What would a compiler
> do in such a situation? Would it first test for something like if char
> >= "a" and <= "z", etc.?
>
> Just thinking about how I might do it, I think I would have a 256 byte
> array, indexed by the character value, with each entry having an
> indicator value, say 1 for lower case letters, 2 for upper case letters,
> 3 for numbers, 4 for white space characters, etc. Looking up the value
> shouldn't be costly as most of the table with shortly be in D-cache.
> Then use the indicator value to index a jump or call table (which I
> guess is fairly predictable as most characters are lower case letters.
>
> But what do I know? :-(

Compilers vary a lot in their handling of switches, so I can only
comment about ours. Here's the internal comment from the internalize.cc
module, which among other things converts switches from genAsm
(approximately LLVM IR) to internal representation.

********************************************************************

/* Each of the switchInfos has a home ebb (where the switch statement
appeared) and a drop with the value that governs the switch. There
are also a dense set of cases, each with an integer range of values
and a target ebb that should get control if the governing drop has
one of these values. This function builds a CFG that routes control to
the right target.

Very large and dense case spaces are best handled by a bounding
range compare followed by indirection through a jump table. Very sparse
case spaces are best handled by a cascade of explicit equality tests
controlling branches to the targets, with the default forming the
fall-through. Dense case spaces with few distinct targets are best
handled by cascaded less-than compares with their branches. A single
switchInfo may show each of these types in different portions of the
case space.

Because the Mill can have multiple conditional branches per
instruction and needs only one prediction per ebb regardless of the
number of branches contained, it is better to put a large part of a
branch cascade in a single ebb than to use a tree of ebbs following a
binary search. However, on average half of those branches will be
executed before a winner is found, and for large cascades on small Mills
the time to run down a one-ebb cascade exceeds the time to break the
cascade in half by branching at a pivot value in the middle.

Of course, pivot branches can themselves be cascaded, thus
partitioning the case space into sub-spaces which again may be
partitioned recursively using sub-cascades. The member-dependent tuning
variable switchCascadeSize governs how big a (sub-) cascade must be
before it is broken by pivoting.

Each of the ranges can be isolated by a "less than" comparison, so
the maximal number of branches is the size of the ranges vector. A
less-than compare requires one branch per range, while an equality
compare requires one branch per non-default value. Consequently equality
takes one branch for two ranges when a non-default range of extent one
is adjacent to a default range whereas less-than would take two
branches; a non-default range of extent greater than one takes one
branch using less-than and a number of branches equal to the extent
using equality; and a non-default branch of extent one adjacent to a
non-default of any size takes one branch in both less-than and equality.

The worst case is a run of extent-one ranges. If there are N
ranges, then sequential equality needs N branches and will execute N/2
of them. With less-than you need one branch to split N to 2*(N/2), 2 to
split 2*(N/2) into 4*(N/4) and so on. That is 2N-1 total to the bottom
level, and log2(N) levels. So less-than will always need more branches
than equality, because the bottom level of a less-than is the same as an
equality and there are all the upper levels on top of that.

As for latency, the dividing point is when log2(N) is smaller than
N/2; the break-even is 8. So we less-than down until the number of cases
is eight or less, and do sequential equality after that.
*/
********************************************************************

Once this analysis is done, other code recognizes where a jump table or
comb would be better in part or all of the decision tree. The compiler
does not generate a compressing lookup table as you suggest though it
could and perhaps should. It does generate comb bitmasks for switches
where a single target is very frequent but not dense. The comb removes
all the frequent case and leaves their values as don't-care, which
simplifies the decision tree of the remaining cases.

While this function does the case partitioning, it ignores the code
located at each label, which will reside in its own ebb targeted by one
or more branches in the partitioning. Post partitioning, the control
flow is a normal CFG and has the same representation as it would have
had if the source had been written as the equivalent set of if-then.

As in any Mill CFG, subsequent processing does aggressive if-conversion
and block folding, so that nearly always the case target code gets
folded into the decision tree with appropriate predication. One of the
Mill talks (https://millcomputing.com/docs/switches/) shows how this
works in practice on a user-supplied example switch.

"The talk works through the compiled code for an example with eight
cases spanning 100 values. Spoiler: a mid-range Mill does the whole
switch, including the action bodies for each case, in three instruction
cycles and no table. Which isn’t too bad, considering that hand-crafted
Mill code can do it in two."

It would be interesting to see what code My66 has for this example.

Re: Minor idea for indirect target predictor

<504cb750-7e93-4def-9e6c-0d40c2e7832dn@googlegroups.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18223&group=comp.arch#18223

copy link Newsgroups: comp.arch

X-Received: by 2002:a05:622a:149:: with SMTP id v9mr29611232qtw.144.1625010573633; Tue, 29 Jun 2021 16:49:33 -0700 (PDT)
X-Received: by 2002:a05:6830:447:: with SMTP id d7mr6483748otc.329.1625010573481; Tue, 29 Jun 2021 16:49:33 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!tr2.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Jun 2021 16:49:33 -0700 (PDT)
In-Reply-To: <sbg1u7$v63$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1a6:66b6:1520:df39; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1a6:66b6:1520:df39
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com> <17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me> <sbfs7g$o15$1@dont-email.me> <3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com> <sbg1u7$v63$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <504cb750-7e93-4def-9e6c-0d40c2e7832dn@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 29 Jun 2021 23:49:33 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 102

by: MitchAlsup - Tue, 29 Jun 2021 23:49 UTC

On Tuesday, June 29, 2021 at 4:05:13 PM UTC-5, Ivan Godard wrote:
> On 6/29/2021 1:33 PM, MitchAlsup wrote:
> > On Tuesday, June 29, 2021 at 2:27:46 PM UTC-5, Ivan Godard wrote:
> >> On 6/29/2021 9:01 AM, BGB wrote:
> >>> On 6/28/2021 10:45 PM, Andy wrote:
> >>>> On 27/06/21 7:31 am, MitchAlsup wrote:
> >>>>
> >>>>> There are five cases: Switches, Returns, Method calls, Tabulated
> >>>>> Calls, and Longjump.
> >>>>> <
> >>>>> Switches tend to be unpredictable much of the time.
> >>>>> Returns are easily highly predictable
> >>>>> Method calls tend to use few methods more than others
> >>>>> Tabulated calls tend to be indexed tables of entry points
> >>>>> And longjump is a category all by itself.
> >>>>
> >>>> Ummm, so no love for Call Return Tables?, wouldn't they go some way to
> >>>> reducing the number of conditional instructions spent checking for
> >>>> error return codes and having otherwise unneeded instructions situated
> >>>> right after the original call.
> >>>>
> >>>
> >>> A dedicated call-return table probably only really makes sense for
> >>> something like x86, which loads the return address from memory at the
> >>> same time as branching to it.
> >>>
> >>> If you use a link register, or return via some other special register
> >>> (such as treating R1 as a secondary link register), then all one needs
> >>> to do is check is that this register hasn't been modified within the
> >>> pipeline, and if so use the value held in this register as the predicted
> >>> branch target.
> >>>
> >>> In this case, typically one can reload the LR or "Secondary LR" several
> >>> cycles before its associated RTS or JMP. If one reloads the link
> >>> register before the others in a function epilog, then (typically) the
> >>> reloaded register will have left the pipeline before the CPU reaches the
> >>> return instruction.
> >>>
> >>>
> >>> Meanwhile, switch faces a problem that typically one calculates the
> >>> switch target immediately before branching to it. If one could somehow
> >>> do something like:
> >>> Calculate where the "switch()" will go;
> >>> Put some prior unrelated logic here;
> >>> Do the branch to the switch target.
> >>>
> >>> Then "switch()" could potentially also be made predictable.
> >> You can do exactly that: figure index; index-load from a table in
> >> memory; do other stuff; indirect branch. Of course, that puts the table
> >> in dmem, which Mitch doesn't like, or requires a load instruction from imem.
> > <
> > Switch is common enough that it should be performed in an instruction
> > (the multiway branch) I added limit checking to this due to how it
> > "worked out" in my ISA.
> > <
> > Also note: the tables in my switches remain program relative, so you fetch
> > from ICache and then add this to the IP for the target address. This saves
> > table size as the majority of all tables fit in 16-bit offsets, retains program
> > relativity.
> > <
> >>> For small switch one might instead use linear probing or binary
> >>> subdivision or similar, in which case it is predictable if the
> >>> individual branches are predictable.
> >>>
> >>> Say (number of case labels):
> >>> 1-3: Linear Probe
> >>> 4-11: Binary Subdivide
> >>> 12-511 (and sufficiently dense): Branch Table
> >>> Else: Binary Subdivide (1)
> >>>
> >>> *1: Binary subdivide happens until either it reaches the conditions for
> >>> a branch-table, or the number of targets is small enough that it falls
> >>> back to a linear probe.
> >>>
> >>> ...
> >> What matters is the distribution of indices, not the number of targets.
> >> Think a typical lexer: you want to isolate all alphas with one compare,
> >> not a four way binary to a particular alpha, because all the alphas go
> >> to a single label. Once you do that, normal ifconversion will get rid of
> >> half the internal branches of the decision tree..
> > <
> > In any event, any HW is going to be able to sort through the situation
> > faster than any string of SW instructions.
> >
> Arguable.
>
> Label density matters for the tables too:
> switch (<32 bit>) {
> <cases: 11111, 22222, 33333, 44444, 55555, 66666, 77777, 88888, 99999>
>
> Do you really want 100k labels in your table, or a 3-deep compare tree?
<
That is considered "not dense", and get composed from ::
<
if( value==11111)
{}
else if( value ==22222 )
{}
......
>
> Out of curiosity, what happens when you throw that at Brian's compiler?
<
Don't know:: Brian ??

Re: Minor idea for indirect target predictor

<7d7e6bd5-f82a-45c6-a79c-9551abbd04d9n@googlegroups.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18225&group=comp.arch#18225

copy link Newsgroups: comp.arch

X-Received: by 2002:a05:620a:1116:: with SMTP id o22mr18700037qkk.299.1625010732932;
Tue, 29 Jun 2021 16:52:12 -0700 (PDT)
X-Received: by 2002:a4a:98ce:: with SMTP id b14mr4719996ooj.69.1625010732739;
Tue, 29 Jun 2021 16:52:12 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Jun 2021 16:52:12 -0700 (PDT)
In-Reply-To: <jwv5yxwfi37.fsf-monnier+comp.arch@gnu.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1a6:66b6:1520:df39;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1a6:66b6:1520:df39
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <sbe511$src$1@gioia.aioe.org>
<sbfg9g$371$1@dont-email.me> <sbfs7g$o15$1@dont-email.me> <3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
<jwv5yxwfi37.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7d7e6bd5-f82a-45c6-a79c-9551abbd04d9n@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 29 Jun 2021 23:52:12 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Tue, 29 Jun 2021 23:52 UTC

On Tuesday, June 29, 2021 at 5:10:08 PM UTC-5, Stefan Monnier wrote:
> > Switch is common enough that it should be performed in an instruction
> > (the multiway branch) I added limit checking to this due to how it
> > "worked out" in my ISA.
<
> BTW, what kind of prediction structure do you foresee for your "multiway
> branch" instructions? Do you foresee predicting the index (and then
> looking up the corresponding target address in the table), so similar to
> a taken/nottaken predictor but just returning more than one bit, or do
> you foresee something more like a BTB (i.e. predict the actual target
> address of the branch)?
<
For narrow machines:: do absolutely nothing
For GBOoO machines:: try to take a good guess
For things in the middle:: do a lot of simulation
<
>
>
> Stefan

Re: Minor idea for indirect target predictor

<44917793-4650-4f7f-8908-edfbabc1fd1fn@googlegroups.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18227&group=comp.arch#18227

copy link Newsgroups: comp.arch

X-Received: by 2002:ac8:5c8c:: with SMTP id r12mr29179075qta.265.1625011046203;
Tue, 29 Jun 2021 16:57:26 -0700 (PDT)
X-Received: by 2002:a9d:7cd1:: with SMTP id r17mr6260563otn.110.1625011046001;
Tue, 29 Jun 2021 16:57:26 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Jun 2021 16:57:25 -0700 (PDT)
In-Reply-To: <sbg6bf$rrm$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1a6:66b6:1520:df39;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1a6:66b6:1520:df39
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <sbe511$src$1@gioia.aioe.org>
<sbfg9g$371$1@dont-email.me> <sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <44917793-4650-4f7f-8908-edfbabc1fd1fn@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 29 Jun 2021 23:57:26 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Tue, 29 Jun 2021 23:57 UTC

On Tuesday, June 29, 2021 at 5:20:33 PM UTC-5, Stephen Fuld wrote:
> On 6/29/2021 12:27 PM, Ivan Godard wrote:
> > On 6/29/2021 9:01 AM, BGB wrote:
> >> On 6/28/2021 10:45 PM, Andy wrote:
> >>> On 27/06/21 7:31 am, MitchAlsup wrote:
> >>>
> >>>> There are five cases: Switches, Returns, Method calls, Tabulated
> >>>> Calls, and Longjump.
> >>>> <
> >>>> Switches tend to be unpredictable much of the time.
> >>>> Returns are easily highly predictable
> >>>> Method calls tend to use few methods more than others
> >>>> Tabulated calls tend to be indexed tables of entry points
> >>>> And longjump is a category all by itself.
> >>>
> >>> Ummm, so no love for Call Return Tables?, wouldn't they go some way
> >>> to reducing the number of conditional instructions spent checking for
> >>> error return codes and having otherwise unneeded instructions
> >>> situated right after the original call.
> >>>
> >>
> >> A dedicated call-return table probably only really makes sense for
> >> something like x86, which loads the return address from memory at the
> >> same time as branching to it.
> >>
> >> If you use a link register, or return via some other special register
> >> (such as treating R1 as a secondary link register), then all one needs
> >> to do is check is that this register hasn't been modified within the
> >> pipeline, and if so use the value held in this register as the
> >> predicted branch target.
> >>
> >> In this case, typically one can reload the LR or "Secondary LR"
> >> several cycles before its associated RTS or JMP. If one reloads the
> >> link register before the others in a function epilog, then (typically)
> >> the reloaded register will have left the pipeline before the CPU
> >> reaches the return instruction.
> >>
> >>
> >> Meanwhile, switch faces a problem that typically one calculates the
> >> switch target immediately before branching to it. If one could somehow
> >> do something like:
> >> Calculate where the "switch()" will go;
> >> Put some prior unrelated logic here;
> >> Do the branch to the switch target.
> >>
> >> Then "switch()" could potentially also be made predictable.
> >
> > You can do exactly that: figure index; index-load from a table in
> > memory; do other stuff; indirect branch. Of course, that puts the table
> > in dmem, which Mitch doesn't like, or requires a load instruction from
> > imem.
> >
> >> For small switch one might instead use linear probing or binary
> >> subdivision or similar, in which case it is predictable if the
> >> individual branches are predictable.
> >>
> >> Say (number of case labels):
> >> 1-3: Linear Probe
> >> 4-11: Binary Subdivide
> >> 12-511 (and sufficiently dense): Branch Table
> >> Else: Binary Subdivide (1)
> >>
> >> *1: Binary subdivide happens until either it reaches the conditions
> >> for a branch-table, or the number of targets is small enough that it
> >> falls back to a linear probe.
> >>
> >> ...
> >
> > What matters is the distribution of indices, not the number of targets.
> > Think a typical lexer: you want to isolate all alphas with one compare,
> > not a four way binary to a particular alpha, because all the alphas go
> > to a single label. Once you do that, normal ifconversion will get rid of
> > half the internal branches of the decision tree..
<
> I am not a compiler guy, so forgive my ignorance. What would a compiler
> do in such a situation? Would it first test for something like if char
> >= "a" and <= "z", etc.?
<
table[ c ] & ALPHABETIC
or
table[ c ] & (ALPHABETIC | NUMERIC)
>
> Just thinking about how I might do it, I think I would have a 256 byte
> array, indexed by the character value, with each entry having an
> indicator value, say 1 for lower case letters, 2 for upper case letters,
<
When you get more than 8 flags per character, the table grows into 16-bit
entries......
<
# define ALPHABETIC (1<<0)
# define NUMERIC (1<<1)
# define OPERATOR (1<<2)
# define SEPARATOR (1<<3)
# define WHITE (1<<4)
......
<
> 3 for numbers, 4 for white space characters, etc. Looking up the value
> shouldn't be costly as most of the table with shortly be in D-cache.
> Then use the indicator value to index a jump or call table (which I
> guess is fairly predictable as most characters are lower case letters.
>
> But what do I know? :-(
>
>
>
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)

Re: Minor idea for indirect target predictor

<jwv1r8kdrbu.fsf-monnier+comp.arch@gnu.org>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18230&group=comp.arch#18230

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Tue, 29 Jun 2021 22:23:26 -0400
Organization: A noiseless patient Spider
Lines: 17
Message-ID: <jwv1r8kdrbu.fsf-monnier+comp.arch@gnu.org>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me>
<3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
<jwv5yxwfi37.fsf-monnier+comp.arch@gnu.org>
<7d7e6bd5-f82a-45c6-a79c-9551abbd04d9n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="48dc9934e6c6b6e6adbb2f03a0f875ee";
logging-data="5099"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+JgfPhWtZQhhxFPk4tyoLG"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Cancel-Lock: sha1:KBtO6v9U/MAqZCzVRw9UTPFxMXY=
sha1:hLczNocFuQ2DHfeEpJXJQ26WW0w=

by: Stefan Monnier - Wed, 30 Jun 2021 02:23 UTC

MitchAlsup [2021-06-29 16:52:12] wrote:
> On Tuesday, June 29, 2021 at 5:10:08 PM UTC-5, Stefan Monnier wrote:
>> > Switch is common enough that it should be performed in an instruction
>> > (the multiway branch) I added limit checking to this due to how it
>> > "worked out" in my ISA.
>> BTW, what kind of prediction structure do you foresee for your "multiway
>> branch" instructions? Do you foresee predicting the index (and then
>> looking up the corresponding target address in the table), so similar to
>> a taken/nottaken predictor but just returning more than one bit, or do
>> you foresee something more like a BTB (i.e. predict the actual target
>> address of the branch)?
> For narrow machines:: do absolutely nothing

Meaning... stall the pipeline?

Stefan

Re: Minor idea for indirect target predictor

<f8154a82-4a16-472f-a604-5dc0f34b1faan@googlegroups.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18231&group=comp.arch#18231

copy link Newsgroups: comp.arch

X-Received: by 2002:a37:a649:: with SMTP id p70mr12243794qke.225.1625020380155;
Tue, 29 Jun 2021 19:33:00 -0700 (PDT)
X-Received: by 2002:a05:6830:447:: with SMTP id d7mr6932358otc.329.1625020379921;
Tue, 29 Jun 2021 19:32:59 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Jun 2021 19:32:59 -0700 (PDT)
In-Reply-To: <jwv1r8kdrbu.fsf-monnier+comp.arch@gnu.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1a6:66b6:1520:df39;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1a6:66b6:1520:df39
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <sbe511$src$1@gioia.aioe.org>
<sbfg9g$371$1@dont-email.me> <sbfs7g$o15$1@dont-email.me> <3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
<jwv5yxwfi37.fsf-monnier+comp.arch@gnu.org> <7d7e6bd5-f82a-45c6-a79c-9551abbd04d9n@googlegroups.com>
<jwv1r8kdrbu.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f8154a82-4a16-472f-a604-5dc0f34b1faan@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 30 Jun 2021 02:33:00 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Wed, 30 Jun 2021 02:32 UTC

On Tuesday, June 29, 2021 at 9:23:29 PM UTC-5, Stefan Monnier wrote:
> MitchAlsup [2021-06-29 16:52:12] wrote:
> > On Tuesday, June 29, 2021 at 5:10:08 PM UTC-5, Stefan Monnier wrote:
> >> > Switch is common enough that it should be performed in an instruction
> >> > (the multiway branch) I added limit checking to this due to how it
> >> > "worked out" in my ISA.
> >> BTW, what kind of prediction structure do you foresee for your "multiway
> >> branch" instructions? Do you foresee predicting the index (and then
> >> looking up the corresponding target address in the table), so similar to
> >> a taken/nottaken predictor but just returning more than one bit, or do
> >> you foresee something more like a BTB (i.e. predict the actual target
> >> address of the branch)?
> > For narrow machines:: do absolutely nothing
> Meaning... stall the pipeline?
<
You need to remember My 66000 ISA has a Jump-through-table instruction
that performs dense switch ranges. So, it takes a few cycles (on the order of
a LD) and then control arrives at case label.
<
You can call this stalling of the pipeline, but SW did not have to perform the
range compare, and LD through the index, then JMP to the case label. All in
all it saves 3-4 instructions and takes (on average) a bit less time than the
SW equivalent, so at worst it is break even when compared to std RISC
architectures.
<
>
>
> Stefan

Re: Minor idea for indirect target predictor

<jwvpmw4cbrw.fsf-monnier+comp.arch@gnu.org>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18233&group=comp.arch#18233

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Tue, 29 Jun 2021 23:00:02 -0400
Organization: A noiseless patient Spider
Lines: 42
Message-ID: <jwvpmw4cbrw.fsf-monnier+comp.arch@gnu.org>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me>
<3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
<jwv5yxwfi37.fsf-monnier+comp.arch@gnu.org>
<7d7e6bd5-f82a-45c6-a79c-9551abbd04d9n@googlegroups.com>
<jwv1r8kdrbu.fsf-monnier+comp.arch@gnu.org>
<f8154a82-4a16-472f-a604-5dc0f34b1faan@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="48dc9934e6c6b6e6adbb2f03a0f875ee";
logging-data="12558"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18cK6AUqrvbxIE4B4jj9NIb"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Cancel-Lock: sha1:ZohfLVF4L4VuI+agsMxtlJhuq4k=
sha1:taMV38IImNpg7DX6yQ++Sw4TGkk=

by: Stefan Monnier - Wed, 30 Jun 2021 03:00 UTC

>> >> BTW, what kind of prediction structure do you foresee for your "multiway
>> >> branch" instructions? Do you foresee predicting the index (and then
>> >> looking up the corresponding target address in the table), so similar to
>> >> a taken/nottaken predictor but just returning more than one bit, or do
>> >> you foresee something more like a BTB (i.e. predict the actual target
>> >> address of the branch)?
>> > For narrow machines:: do absolutely nothing
>> Meaning... stall the pipeline?
> You need to remember My 66000 ISA has a Jump-through-table instruction
> that performs dense switch ranges.

Yes my question is specifically about those instructions.

> So, it takes a few cycles (on the order of a LD) and then control
> arrives at case label.
[...]
> You can call this stalling of the pipeline, but SW did not have to perform the
> range compare, and LD through the index, then JMP to the case label. All in
> all it saves 3-4 instructions and takes (on average) a bit less time than the
> SW equivalent, so at worst it is break even when compared to std RISC
> architectures.

I guess the SW equivalent has typically 2 branches (a conditional one for
out-of-bounds handling and an indirect one for the actual jump), so it
likely won't take less than 2 cycles on most systems?

You're probably right that on a narrow enough machine, stalling might be
about as good. Still, after you fetch&decode the switch instruction,
you have to:
- wait for all the data-dependencies to clear in order to get
your index. For a 10-deep pipeline, that could be what, 2 cycles?
- fetch the data from the L1$. Latency would be what? 3 cycles?
do you think you can shorten that by prefetching the cache line before
we know the index and then using the index into the cache line to
reduce this part of the latency to a single cycle?
- then finally we have the address of the next instruction and can fetch
it from L1$.
If the target is hard to predict, stalling (and working to minimize the
latency) is the better option, indeed.

Stefan

Re: Minor idea for indirect target predictor

<sbgqtl$29s$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18235&group=comp.arch#18235

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Tue, 29 Jun 2021 23:09:05 -0500
Organization: A noiseless patient Spider
Lines: 119
Message-ID: <sbgqtl$29s$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 30 Jun 2021 04:11:33 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="45b8fecc7a159f0e815ab6ce141c45b4";
logging-data="2364"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/p97PelTf2Ru6RVxXfqmt5"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:95MF9DPBSpiVsLEA8OIS/onyRXg=
In-Reply-To: <sbfs7g$o15$1@dont-email.me>
Content-Language: en-US
X-Mozilla-News-Host: news://news.albasani.net

by: BGB - Wed, 30 Jun 2021 04:09 UTC

On 6/29/2021 2:27 PM, Ivan Godard wrote:
> On 6/29/2021 9:01 AM, BGB wrote:
>> On 6/28/2021 10:45 PM, Andy wrote:
>>> On 27/06/21 7:31 am, MitchAlsup wrote:
>>>
>>>> There are five cases: Switches, Returns, Method calls, Tabulated
>>>> Calls, and Longjump.
>>>> <
>>>> Switches tend to be unpredictable much of the time.
>>>> Returns are easily highly predictable
>>>> Method calls tend to use few methods more than others
>>>> Tabulated calls tend to be indexed tables of entry points
>>>> And longjump is a category all by itself.
>>>
>>> Ummm, so no love for Call Return Tables?, wouldn't they go some way
>>> to reducing the number of conditional instructions spent checking for
>>> error return codes and having otherwise unneeded instructions
>>> situated right after the original call.
>>>
>>
>> A dedicated call-return table probably only really makes sense for
>> something like x86, which loads the return address from memory at the
>> same time as branching to it.
>>
>> If you use a link register, or return via some other special register
>> (such as treating R1 as a secondary link register), then all one needs
>> to do is check is that this register hasn't been modified within the
>> pipeline, and if so use the value held in this register as the
>> predicted branch target.
>>
>> In this case, typically one can reload the LR or "Secondary LR"
>> several cycles before its associated RTS or JMP. If one reloads the
>> link register before the others in a function epilog, then (typically)
>> the reloaded register will have left the pipeline before the CPU
>> reaches the return instruction.
>>
>>
>> Meanwhile, switch faces a problem that typically one calculates the
>> switch target immediately before branching to it. If one could somehow
>> do something like:
>>    Calculate where the "switch()" will go;
>>    Put some prior unrelated logic here;
>>    Do the branch to the switch target.
>>
>> Then "switch()" could potentially also be made predictable.
>
> You can do exactly that: figure index; index-load from a table in
> memory; do other stuff; indirect branch. Of course, that puts the table
> in dmem, which Mitch doesn't like, or requires a load instruction from
> imem.
>

It is a tradeoff...

My current implementation of switch tables is mostly based on
double-jumping, but this doesn't effect things too much. One calculates
where to jump into the jump table, rather than loading an address from a
table or similar (and adding it against another label used as a
reference point).

>> For small switch one might instead use linear probing or binary
>> subdivision or similar, in which case it is predictable if the
>> individual branches are predictable.
>>
>> Say (number of case labels):
>>    1-3: Linear Probe
>>    4-11: Binary Subdivide
>>    12-511 (and sufficiently dense): Branch Table
>>    Else: Binary Subdivide (1)
>>
>> *1: Binary subdivide happens until either it reaches the conditions
>> for a branch-table, or the number of targets is small enough that it
>> falls back to a linear probe.
>>
>> ...
>
> What matters is the distribution of indices, not the number of targets.
> Think a typical lexer: you want to isolate all alphas with one compare,
> not a four way binary to a particular alpha, because all the alphas go
> to a single label. Once you do that, normal ifconversion will get rid of
> half the internal branches of the decision tree..

It depends a fair bit on the pattern...

If we have a fairly dense table, such as ASCII characters, ideally one
wants a single jump through a table (it is compiled as a single jump
through a single table, if it fits the pattern that allows it to do so).

If it is a sparse distribution, such as TWOCC or FOURCC values, one
doesn't want a big jump table (trying to map a sparse TWOCC space to a
jump-table would waste absurd amounts of space).

Limiting the maximum size of the table also helps limit cases where a
large switch has dense regions and sparse regions, but the average
density falls just over the minimum limit. Splitting may result in the
large sparse regions effectively getting chopped off (and saving a lot
of space overall for large switch blocks).

How to partition the table optimally is non-trivial, and the current
implementation recursively divides the table roughly along the median
value (more likely to chop off sparse regions than splitting the table
at the center).

As noted, the typical worst-case number of branches is ~ log2(n_keys)
for the binary table. For direct jumps it is less, but then the table
has an ~ (key_max-key_min)*4 space overhead, which for sparse tables may
be significant.

So, I mostly fiddled with it to come up with something which seemed
acceptable.

....

Re: Minor idea for indirect target predictor

<sbgsgv$i79$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18236&group=comp.arch#18236

copy link Newsgroups: comp.arch

Path: i2pn2.org!rocksolid2!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Tue, 29 Jun 2021 23:36:27 -0500
Organization: A noiseless patient Spider
Lines: 146
Message-ID: <sbgsgv$i79$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me>
<3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
<sbg1u7$v63$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 30 Jun 2021 04:38:55 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="45b8fecc7a159f0e815ab6ce141c45b4";
logging-data="18665"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18qiwP1E4dv5udksgiHLyNZ"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:p+utCi/lb+av8v0F/qHpfPL8hL0=
In-Reply-To: <sbg1u7$v63$1@dont-email.me>
Content-Language: en-US

by: BGB - Wed, 30 Jun 2021 04:36 UTC

On 6/29/2021 4:05 PM, Ivan Godard wrote:
> On 6/29/2021 1:33 PM, MitchAlsup wrote:
>> On Tuesday, June 29, 2021 at 2:27:46 PM UTC-5, Ivan Godard wrote:
>>> On 6/29/2021 9:01 AM, BGB wrote:
>>>> On 6/28/2021 10:45 PM, Andy wrote:
>>>>> On 27/06/21 7:31 am, MitchAlsup wrote:
>>>>>
>>>>>> There are five cases: Switches, Returns, Method calls, Tabulated
>>>>>> Calls, and Longjump.
>>>>>> <
>>>>>> Switches tend to be unpredictable much of the time.
>>>>>> Returns are easily highly predictable
>>>>>> Method calls tend to use few methods more than others
>>>>>> Tabulated calls tend to be indexed tables of entry points
>>>>>> And longjump is a category all by itself.
>>>>>
>>>>> Ummm, so no love for Call Return Tables?, wouldn't they go some way to
>>>>> reducing the number of conditional instructions spent checking for
>>>>> error return codes and having otherwise unneeded instructions situated
>>>>> right after the original call.
>>>>>
>>>>
>>>> A dedicated call-return table probably only really makes sense for
>>>> something like x86, which loads the return address from memory at the
>>>> same time as branching to it.
>>>>
>>>> If you use a link register, or return via some other special register
>>>> (such as treating R1 as a secondary link register), then all one needs
>>>> to do is check is that this register hasn't been modified within the
>>>> pipeline, and if so use the value held in this register as the
>>>> predicted
>>>> branch target.
>>>>
>>>> In this case, typically one can reload the LR or "Secondary LR" several
>>>> cycles before its associated RTS or JMP. If one reloads the link
>>>> register before the others in a function epilog, then (typically) the
>>>> reloaded register will have left the pipeline before the CPU reaches
>>>> the
>>>> return instruction.
>>>>
>>>>
>>>> Meanwhile, switch faces a problem that typically one calculates the
>>>> switch target immediately before branching to it. If one could somehow
>>>> do something like:
>>>> Calculate where the "switch()" will go;
>>>> Put some prior unrelated logic here;
>>>> Do the branch to the switch target.
>>>>
>>>> Then "switch()" could potentially also be made predictable.
>>> You can do exactly that: figure index; index-load from a table in
>>> memory; do other stuff; indirect branch. Of course, that puts the table
>>> in dmem, which Mitch doesn't like, or requires a load instruction
>>> from imem.
>> <
>> Switch is common enough that it should be performed in an instruction
>> (the multiway branch) I added limit checking to this due to how it
>> "worked out" in my ISA.
>> <
>> Also note: the tables in my switches remain program relative, so you
>> fetch
>> from ICache and then add this to the IP for the target address. This
>> saves
>> table size as the majority of all tables fit in 16-bit offsets,
>> retains program
>> relativity.
>> <
>>>> For small switch one might instead use linear probing or binary
>>>> subdivision or similar, in which case it is predictable if the
>>>> individual branches are predictable.
>>>>
>>>> Say (number of case labels):
>>>> 1-3: Linear Probe
>>>> 4-11: Binary Subdivide
>>>> 12-511 (and sufficiently dense): Branch Table
>>>> Else: Binary Subdivide (1)
>>>>
>>>> *1: Binary subdivide happens until either it reaches the conditions for
>>>> a branch-table, or the number of targets is small enough that it falls
>>>> back to a linear probe.
>>>>
>>>> ...
>>> What matters is the distribution of indices, not the number of targets.
>>> Think a typical lexer: you want to isolate all alphas with one compare,
>>> not a four way binary to a particular alpha, because all the alphas go
>>> to a single label. Once you do that, normal ifconversion will get rid of
>>> half the internal branches of the decision tree..
>> <
>> In any event, any HW is going to be able to sort through the situation
>> faster than any string of SW instructions.
>>
>
> Arguable.
>
> Label density matters for the tables too:
> switch (<32 bit>) {
> <cases: 11111, 22222, 33333, 44444, 55555, 66666, 77777, 88888, 99999>
>
> Do you really want 100k labels in your table, or a 3-deep compare tree?
>
> Out of curiosity, what happens when you throw that at Brian's compiler?

Hmm, hand-simulating my algo:
Range is 88888, so split along median 55555;
{11111, 22222, 33333, 44444}
N=4, Range=33333, Median=33333 (avg=27778), So Split:
{11111, 22222}, {33333, 44444}

{55555, 66666, 77777, 88888, 99999}
N=5, Range=44444, Median=77777, So Split:
{55555, 66666}, {77777, 88888, 99999}

No jump tables are used given table density is ~ 0.001%, which is well
below the minimum...

So, in BGBCC, this would compile as if it were, roughly:
if(val>=55555)
{
if(val>=77777)
{
if(val==77777) goto case_77777;
if(val==88888) goto case_88888;
if(val==99999) goto case_99999;
goto case_default;
}else
{
if(val==55555) goto case_55555;
if(val==66666) goto case_66666;
goto case_default;
}
}else
{
if(val>=33333)
{
if(val==33333) goto case_33333;
if(val==44444) goto case_44444;
goto case_default;
}else
{
if(val==11111) goto case_11111;
if(val==22222) goto case_22222;
goto case_default;
}
}

Re: Minor idea for indirect target predictor

<sbgupg$8mr$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18238&group=comp.arch#18238

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Wed, 30 Jun 2021 00:15:09 -0500
Organization: A noiseless patient Spider
Lines: 135
Message-ID: <sbgupg$8mr$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbgqtl$29s$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 30 Jun 2021 05:17:37 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="45b8fecc7a159f0e815ab6ce141c45b4";
logging-data="8923"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Ujs9SMc29eNwE6wJAlgaG"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:PrC2szWhd9I8QbT0nw5+NKWO4nU=
In-Reply-To: <sbgqtl$29s$1@dont-email.me>
Content-Language: en-US

by: BGB - Wed, 30 Jun 2021 05:15 UTC

On 6/29/2021 11:09 PM, BGB wrote:
> On 6/29/2021 2:27 PM, Ivan Godard wrote:
>> On 6/29/2021 9:01 AM, BGB wrote:
>>> On 6/28/2021 10:45 PM, Andy wrote:
>>>> On 27/06/21 7:31 am, MitchAlsup wrote:
>>>>
>>>>> There are five cases: Switches, Returns, Method calls, Tabulated
>>>>> Calls, and Longjump.
>>>>> <
>>>>> Switches tend to be unpredictable much of the time.
>>>>> Returns are easily highly predictable
>>>>> Method calls tend to use few methods more than others
>>>>> Tabulated calls tend to be indexed tables of entry points
>>>>> And longjump is a category all by itself.
>>>>
>>>> Ummm, so no love for Call Return Tables?, wouldn't they go some way
>>>> to reducing the number of conditional instructions spent checking
>>>> for error return codes and having otherwise unneeded instructions
>>>> situated right after the original call.
>>>>
>>>
>>> A dedicated call-return table probably only really makes sense for
>>> something like x86, which loads the return address from memory at the
>>> same time as branching to it.
>>>
>>> If you use a link register, or return via some other special register
>>> (such as treating R1 as a secondary link register), then all one
>>> needs to do is check is that this register hasn't been modified
>>> within the pipeline, and if so use the value held in this register as
>>> the predicted branch target.
>>>
>>> In this case, typically one can reload the LR or "Secondary LR"
>>> several cycles before its associated RTS or JMP. If one reloads the
>>> link register before the others in a function epilog, then
>>> (typically) the reloaded register will have left the pipeline before
>>> the CPU reaches the return instruction.
>>>
>>>
>>> Meanwhile, switch faces a problem that typically one calculates the
>>> switch target immediately before branching to it. If one could
>>> somehow do something like:
>>>    Calculate where the "switch()" will go;
>>>    Put some prior unrelated logic here;
>>>    Do the branch to the switch target.
>>>
>>> Then "switch()" could potentially also be made predictable.
>>
>> You can do exactly that: figure index; index-load from a table in
>> memory; do other stuff; indirect branch. Of course, that puts the
>> table in dmem, which Mitch doesn't like, or requires a load
>> instruction from imem.
>>
>
> It is a tradeoff...
>
> My current implementation of switch tables is mostly based on
> double-jumping, but this doesn't effect things too much. One calculates
> where to jump into the jump table, rather than loading an address from a
> table or similar (and adding it against another label used as a
> reference point).
>
>
>>> For small switch one might instead use linear probing or binary
>>> subdivision or similar, in which case it is predictable if the
>>> individual branches are predictable.
>>>
>>> Say (number of case labels):
>>>    1-3: Linear Probe
>>>    4-11: Binary Subdivide
>>>    12-511 (and sufficiently dense): Branch Table
>>>    Else: Binary Subdivide (1)
>>>
>>> *1: Binary subdivide happens until either it reaches the conditions
>>> for a branch-table, or the number of targets is small enough that it
>>> falls back to a linear probe.
>>>
>>> ...
>>
>> What matters is the distribution of indices, not the number of
>> targets. Think a typical lexer: you want to isolate all alphas with
>> one compare, not a four way binary to a particular alpha, because all
>> the alphas go to a single label. Once you do that, normal ifconversion
>> will get rid of half the internal branches of the decision tree..
>
> It depends a fair bit on the pattern...
>
> If we have a fairly dense table, such as ASCII characters, ideally one
> wants a single jump through a table (it is compiled as a single jump
> through a single table, if it fits the pattern that allows it to do so).
>
>
> If it is a sparse distribution, such as TWOCC or FOURCC values, one
> doesn't want a big jump table (trying to map a sparse TWOCC space to a
> jump-table would waste absurd amounts of space).
>
> Limiting the maximum size of the table also helps limit cases where a
> large switch has dense regions and sparse regions, but the average
> density falls just over the minimum limit. Splitting may result in the
> large sparse regions effectively getting chopped off (and saving a lot
> of space overall for large switch blocks).
>
>
> How to partition the table optimally is non-trivial, and the current
> implementation recursively divides the table roughly along the median
> value (more likely to chop off sparse regions than splitting the table
> at the center).
>

Self-correction:
A value chosen from the middle of the range which approximates the
average value.

I realized after posting that the "median" would mean picking the value
from the middle of the table (and dividing it uniformly in half), which
isn't quite right.

Though, looking at it again, did go and tweak the algo used for pivot
selection here.

>
> As noted, the typical worst-case number of branches is ~ log2(n_keys)
> for the binary table. For direct jumps it is less, but then the table
> has an ~ (key_max-key_min)*4 space overhead, which for sparse tables may
> be significant.
>
>
> So, I mostly fiddled with it to come up with something which seemed
> acceptable.
>
>
> ...
>

Stephen Fuld wrote:
> On 6/29/2021 12:27 PM, Ivan Godard wrote:
> I am not a compiler guy, so forgive my ignorance. What would a compiler
> do in such a situation? Would it first test for something like if char
> >= "a" and <= "z", etc.?
>
> Just thinking about how I might do it, I think I would have a 256 byte
> array, indexed by the character value, with each entry having an
> indicator value, say 1 for lower case letters, 2 for upper case letters,
> 3 for numbers, 4 for white space characters, etc. Looking up the value
> shouldn't be costly as most of the table with shortly be in D-cache.
> Then use the indicator value to index a jump or call table (which I
> guess is fairly predictable as most characters are lower case letters.
>
> But what do I know? :-(

About the same as the standard C lib?

I.e. the classic character classification macros use just such a table.

As I noted previously, it tends to break down a bit with utf8, but you
can at least handle all 7-bit ascii very quickly.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Minor idea for indirect target predictor

<sbhofu$o28$1@newsreader4.netcologne.de>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18249&group=comp.arch#18249

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Wed, 30 Jun 2021 12:36:14 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbhofu$o28$1@newsreader4.netcologne.de>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 30 Jun 2021 12:36:14 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="24648"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Wed, 30 Jun 2021 12:36 UTC

Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
> Stephen Fuld wrote:
>> On 6/29/2021 12:27 PM, Ivan Godard wrote:
>> I am not a compiler guy, so forgive my ignorance. What would a compiler
>> do in such a situation? Would it first test for something like if char
>> >= "a" and <= "z", etc.?
>>
>> Just thinking about how I might do it, I think I would have a 256 byte
>> array, indexed by the character value, with each entry having an
>> indicator value, say 1 for lower case letters, 2 for upper case letters,
>> 3 for numbers, 4 for white space characters, etc. Looking up the value
>> shouldn't be costly as most of the table with shortly be in D-cache.
>> Then use the indicator value to index a jump or call table (which I
>> guess is fairly predictable as most characters are lower case letters.
>>
>> But what do I know? :-(
>
> About the same as the standard C lib?
>
> I.e. the classic character classification macros use just such a table.
>
> As I noted previously, it tends to break down a bit with utf8, but you
> can at least handle all 7-bit ascii very quickly.

And if you don't sanitize your input, you can get "interesting"
results, such as when the value is negative for a signed char
or when testing if 99999 is printable.

There may be a few CVEs along those lines.

Re: Minor idea for indirect target predictor

<sbhspq$6so$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18250&group=comp.arch#18250

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Wed, 30 Jun 2021 08:47:18 -0500
Organization: A noiseless patient Spider
Lines: 57
Message-ID: <sbhspq$6so$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 30 Jun 2021 13:49:46 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="45b8fecc7a159f0e815ab6ce141c45b4";
logging-data="7064"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/hKGc7di+cxdN0mYrDcP2S"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:uCN1LSZLCU8w3a9XrAorrVy3ye4=
In-Reply-To: <sbhng6$17b$1@gioia.aioe.org>
Content-Language: en-US

by: BGB - Wed, 30 Jun 2021 13:47 UTC

On 6/30/2021 7:19 AM, Terje Mathisen wrote:
> Stephen Fuld wrote:
>> On 6/29/2021 12:27 PM, Ivan Godard wrote:
>> I am not a compiler guy, so forgive my ignorance. What would a
>> compiler do in such a situation? Would it first test for something
>> like if char >= "a" and <= "z", etc.?
>>
>> Just thinking about how I might do it, I think I would have a 256 byte
>> array, indexed by the character value, with each entry having an
>> indicator value, say 1 for lower case letters, 2 for upper case
>> letters, 3 for numbers, 4 for white space characters, etc. Looking up
>> the value shouldn't be costly as most of the table with shortly be in
>> D-cache. Then use the indicator value to index a jump or call table
>> (which I guess is fairly predictable as most characters are lower case
>> letters.
>>
>> But what do I know? :-(
>
> About the same as the standard C lib?
>
> I.e. the classic character classification macros use just such a table.
>
> As I noted previously, it tends to break down a bit with utf8, but you
> can at least handle all 7-bit ascii very quickly.
>

Yeah, the "ctype.h" functions in the C library I am using worked this way.

A few functions, like toupper and tolower also used lookup tables.

In a few cases, when I rewrote some of them to have ASM special cases, I
eliminated the lookup tables in favor of direct range checks as direct
range-checks tended to be faster than the use of lookup tables.

Though, this sort of thing is not always the case, for example:
I recently tried experimentally modifying the Doom renderer to use
compressed textures and color modulation in place of 8-bpp textures and
lookups via a colormap table (with things like sprite graphics
effectively using alpha-testing in place of drawing a column in multiple
pieces).

The color-modulated logic tended to be slightly slower than the use of
colormaps (there was an ~ 20% increase in frame-times), I suspect due to
the slightly higher latency of the drawing funcs (and the loops I wrote
only doing one pixel per loop iteration rather than multiple pixels).

Note that flats were in Morton order, whereas wall patches and sprites
were in column order (with effectively 4 pixel columns being merged into
a single compressed-texture column).

Granted, there was also an certain amount of time being spent converting
from Doom's native formats into compressed-texture form, and there were
also some visible artifacts from the conversion process, ...

Re: Minor idea for indirect target predictor

<sbhukl$4a3$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18252&group=comp.arch#18252

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Wed, 30 Jun 2021 16:21:08 +0200
Organization: A noiseless patient Spider
Lines: 46
Message-ID: <sbhukl$4a3$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 30 Jun 2021 14:21:09 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="f6f66713ad55e4c61f2cd67d13e28223";
logging-data="4419"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/3GCE8euphD2rBT2+zLfC+Gf+0ldEkG2M="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Cancel-Lock: sha1:qCzyDGL4vrw66onb7a3NSWNL6fY=
In-Reply-To: <sbhofu$o28$1@newsreader4.netcologne.de>
Content-Language: en-GB

by: David Brown - Wed, 30 Jun 2021 14:21 UTC

On 30/06/2021 14:36, Thomas Koenig wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
>> Stephen Fuld wrote:
>>> On 6/29/2021 12:27 PM, Ivan Godard wrote:
>>> I am not a compiler guy, so forgive my ignorance. What would a compiler
>>> do in such a situation? Would it first test for something like if char
>>> >= "a" and <= "z", etc.?
>>>
>>> Just thinking about how I might do it, I think I would have a 256 byte
>>> array, indexed by the character value, with each entry having an
>>> indicator value, say 1 for lower case letters, 2 for upper case letters,
>>> 3 for numbers, 4 for white space characters, etc. Looking up the value
>>> shouldn't be costly as most of the table with shortly be in D-cache.
>>> Then use the indicator value to index a jump or call table (which I
>>> guess is fairly predictable as most characters are lower case letters.
>>>
>>> But what do I know? :-(
>>
>> About the same as the standard C lib?
>>
>> I.e. the classic character classification macros use just such a table.
>>
>> As I noted previously, it tends to break down a bit with utf8, but you
>> can at least handle all 7-bit ascii very quickly.

It breaks down a /lot/ with UTF-8 - the traditional <ctype.h>
classifications such as "isdigit" or "tolower" collapse pretty quickly
in the face of non-ASCII character sets. Locales can help a bit, but
you don't have to get very exotic before the answer is "it's complicated".

>
> And if you don't sanitize your input, you can get "interesting"
> results, such as when the value is negative for a signed char
> or when testing if 99999 is printable.
>
> There may be a few CVEs along those lines.
>

The C standard library functions here all take "int" parameters, with
the results defined for EOF or a value in the range of unsigned char.
Giving them anything else is undefined behaviour - it is the
programmer's fault, and the programmer's problem.

However, range testing the value is so simple and obvious that I would
not expect any C library implementation to have problems here.

Re: Minor idea for indirect target predictor

<cbb55ec4-63d5-44d0-8c29-576edeb07e3cn@googlegroups.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18255&group=comp.arch#18255

copy link Newsgroups: comp.arch

X-Received: by 2002:a05:6214:76c:: with SMTP id f12mr37097299qvz.3.1625064920558;
Wed, 30 Jun 2021 07:55:20 -0700 (PDT)
X-Received: by 2002:a4a:ba01:: with SMTP id b1mr8760187oop.0.1625064920133;
Wed, 30 Jun 2021 07:55:20 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 30 Jun 2021 07:55:19 -0700 (PDT)
In-Reply-To: <jwvpmw4cbrw.fsf-monnier+comp.arch@gnu.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:f5f4:9dac:532a:1a43;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:f5f4:9dac:532a:1a43
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <sbe511$src$1@gioia.aioe.org>
<sbfg9g$371$1@dont-email.me> <sbfs7g$o15$1@dont-email.me> <3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
<jwv5yxwfi37.fsf-monnier+comp.arch@gnu.org> <7d7e6bd5-f82a-45c6-a79c-9551abbd04d9n@googlegroups.com>
<jwv1r8kdrbu.fsf-monnier+comp.arch@gnu.org> <f8154a82-4a16-472f-a604-5dc0f34b1faan@googlegroups.com>
<jwvpmw4cbrw.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <cbb55ec4-63d5-44d0-8c29-576edeb07e3cn@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 30 Jun 2021 14:55:20 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Wed, 30 Jun 2021 14:55 UTC

On Tuesday, June 29, 2021 at 10:00:04 PM UTC-5, Stefan Monnier wrote:
> >> >> BTW, what kind of prediction structure do you foresee for your "multiway
> >> >> branch" instructions? Do you foresee predicting the index (and then
> >> >> looking up the corresponding target address in the table), so similar to
> >> >> a taken/nottaken predictor but just returning more than one bit, or do
> >> >> you foresee something more like a BTB (i.e. predict the actual target
> >> >> address of the branch)?
> >> > For narrow machines:: do absolutely nothing
> >> Meaning... stall the pipeline?
> > You need to remember My 66000 ISA has a Jump-through-table instruction
> > that performs dense switch ranges.
> Yes my question is specifically about those instructions.
> > So, it takes a few cycles (on the order of a LD) and then control
> > arrives at case label.
> [...]
> > You can call this stalling of the pipeline, but SW did not have to perform the
> > range compare, and LD through the index, then JMP to the case label. All in
> > all it saves 3-4 instructions and takes (on average) a bit less time than the
> > SW equivalent, so at worst it is break even when compared to std RISC
> > architectures.
> I guess the SW equivalent has typically 2 branches (a conditional one for
> out-of-bounds handling and an indirect one for the actual jump), so it
> likely won't take less than 2 cycles on most systems?
>
> You're probably right that on a narrow enough machine, stalling might be
> about as good. Still, after you fetch&decode the switch instruction,
> you have to:
> - wait for all the data-dependencies to clear in order to get
> your index. For a 10-deep pipeline, that could be what, 2 cycles?
<
The narrow machine is 8 cycles deep::
Fetch:Parse:Decode:Execute:Cache:Align:Wait:Write
<
> - fetch the data from the L1$. Latency would be what? 3 cycles?
<
There is a good chance the offset may be in the instruction buffer
when the index is small, otherwise the ICache access is 1 cycle shorter
than the DCache access. And with 4-word-wide fetch, you are getting 8
offsets per access So, if the index is less than 16, you already have the
offset.
<
> do you think you can shorten that by prefetching the cache line before
> we know the index and then using the index into the cache line to
> reduce this part of the latency to a single cycle?
<
My general guess is that the process is index-bound:: you are more likely
to be waiting for the index before you can get started than the other way
around.
<
> - then finally we have the address of the next instruction and can fetch
> it from L1$.
> If the target is hard to predict, stalling (and working to minimize the
> latency) is the better option, indeed.
>
>
> Stefan

Re: Minor idea for indirect target predictor

<sbi11k$m95$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18257&group=comp.arch#18257

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Wed, 30 Jun 2021 09:59:45 -0500
Organization: A noiseless patient Spider
Lines: 80
Message-ID: <sbi11k$m95$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 30 Jun 2021 15:02:13 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="45b8fecc7a159f0e815ab6ce141c45b4";
logging-data="22821"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/bmcudgGoGkuCFldsEQFLV"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:swxQiSAYs4//Ii0Bp6jzc+tr4AI=
In-Reply-To: <sbhukl$4a3$1@dont-email.me>
Content-Language: en-US

by: BGB - Wed, 30 Jun 2021 14:59 UTC

On 6/30/2021 9:21 AM, David Brown wrote:
> On 30/06/2021 14:36, Thomas Koenig wrote:
>> Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
>>> Stephen Fuld wrote:
>>>> On 6/29/2021 12:27 PM, Ivan Godard wrote:
>>>> I am not a compiler guy, so forgive my ignorance. What would a compiler
>>>> do in such a situation? Would it first test for something like if char
>>>> >= "a" and <= "z", etc.?
>>>>
>>>> Just thinking about how I might do it, I think I would have a 256 byte
>>>> array, indexed by the character value, with each entry having an
>>>> indicator value, say 1 for lower case letters, 2 for upper case letters,
>>>> 3 for numbers, 4 for white space characters, etc. Looking up the value
>>>> shouldn't be costly as most of the table with shortly be in D-cache.
>>>> Then use the indicator value to index a jump or call table (which I
>>>> guess is fairly predictable as most characters are lower case letters.
>>>>
>>>> But what do I know? :-(
>>>
>>> About the same as the standard C lib?
>>>
>>> I.e. the classic character classification macros use just such a table.
>>>
>>> As I noted previously, it tends to break down a bit with utf8, but you
>>> can at least handle all 7-bit ascii very quickly.
>
> It breaks down a /lot/ with UTF-8 - the traditional <ctype.h>
> classifications such as "isdigit" or "tolower" collapse pretty quickly
> in the face of non-ASCII character sets. Locales can help a bit, but
> you don't have to get very exotic before the answer is "it's complicated".
>

If one is feeding raw byte or char values to these functions, and using
UTF-8 or similar, then as soon as one gets outside ASCII range the
answer is basically useless.

Simple case is to ignore everything outside the ASCII range.

Secondary option is to treat everything outside the ASCII range as if it
were Codepage-1252 or Latin-1.

For wider characters, one would likely need some other functions, like
those from 'wctype.h', but then it depends on locale rather than "just
assume it is UTF-16 or something".

Or, at least assuming the locale actually does something (simpler answer
is to hard wire it).

In BGBCC, the patterns I followed are:
Narrow strings (in C) are typically Codepage-1252 by default;
Except when they are UTF-8.
Wide strings are typically UTF-16.

I went with the Windows pattern of using UTF-16 for wide-strings and
wchar_t, rather than the Linux pattern of using UTF-32, partly because
UTF-32 is impractically wasteful, and rarely useful. Similarly, one
still doesn't get "one character, one codepoint" even with UTF-32 when
it comes to things like emojis or similar, so alas...

>>
>> And if you don't sanitize your input, you can get "interesting"
>> results, such as when the value is negative for a signed char
>> or when testing if 99999 is printable.
>>
>> There may be a few CVEs along those lines.
>>
>
> The C standard library functions here all take "int" parameters, with
> the results defined for EOF or a value in the range of unsigned char.
> Giving them anything else is undefined behaviour - it is the
> programmer's fault, and the programmer's problem.
>
> However, range testing the value is so simple and obvious that I would
> not expect any C library implementation to have problems here.
>

Yep.

Re: Minor idea for indirect target predictor

<sbi3df$gg$1@newsreader4.netcologne.de>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18260&group=comp.arch#18260

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Wed, 30 Jun 2021 15:42:39 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbi3df$gg$1@newsreader4.netcologne.de>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me>
Injection-Date: Wed, 30 Jun 2021 15:42:39 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="528"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Wed, 30 Jun 2021 15:42 UTC

David Brown <david.brown@hesbynett.no> schrieb:

(ctype.h)

> However, range testing the value is so simple and obvious that I would
> not expect any C library implementation to have problems here.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78155 has the non-nice
example

$ cat foo.c
int main (void)
{ __builtin_printf ("%i\n", __builtin_isalpha (999999));
} $ gcc foo.c
$ ./a.out
Segmentation fault (core dumped)

Re: Minor idea for indirect target predictor

<sbi3hh$8q0$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18261&group=comp.arch#18261

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Wed, 30 Jun 2021 17:44:48 +0200
Organization: A noiseless patient Spider
Lines: 81
Message-ID: <sbi3hh$8q0$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi11k$m95$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 30 Jun 2021 15:44:49 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="f6f66713ad55e4c61f2cd67d13e28223";
logging-data="9024"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+zl6rl5vYOQWPZSHubkm3o4yfT5cEOBGY="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Cancel-Lock: sha1:Sfcf5KJhnugm1HX3yv+sDT49HtE=
In-Reply-To: <sbi11k$m95$1@dont-email.me>
Content-Language: en-GB

by: David Brown - Wed, 30 Jun 2021 15:44 UTC

On 30/06/2021 16:59, BGB wrote:
> On 6/30/2021 9:21 AM, David Brown wrote:
>> On 30/06/2021 14:36, Thomas Koenig wrote:
>>> Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
>>>> Stephen Fuld wrote:
>>>>> On 6/29/2021 12:27 PM, Ivan Godard wrote:
>>>>> I am not a compiler guy, so forgive my ignorance. What would a
>>>>> compiler
>>>>> do in such a situation? Would it first test for something like if
>>>>> char
>>>>> >= "a" and <= "z", etc.?
>>>>>
>>>>> Just thinking about how I might do it, I think I would have a 256 byte
>>>>> array, indexed by the character value, with each entry having an
>>>>> indicator value, say 1 for lower case letters, 2 for upper case
>>>>> letters,
>>>>> 3 for numbers, 4 for white space characters, etc. Looking up the
>>>>> value
>>>>> shouldn't be costly as most of the table with shortly be in D-cache.
>>>>> Then use the indicator value to index a jump or call table (which I
>>>>> guess is fairly predictable as most characters are lower case letters.
>>>>>
>>>>> But what do I know? :-(
>>>>
>>>> About the same as the standard C lib?
>>>>
>>>> I.e. the classic character classification macros use just such a table.
>>>>
>>>> As I noted previously, it tends to break down a bit with utf8, but you
>>>> can at least handle all 7-bit ascii very quickly.
>>
>> It breaks down a /lot/ with UTF-8 - the traditional <ctype.h>
>> classifications such as "isdigit" or "tolower" collapse pretty quickly
>> in the face of non-ASCII character sets. Locales can help a bit, but
>> you don't have to get very exotic before the answer is "it's
>> complicated".
>>
>
> If one is feeding raw byte or char values to these functions, and using
> UTF-8 or similar, then as soon as one gets outside ASCII range the
> answer is basically useless.
>
> Simple case is to ignore everything outside the ASCII range.
>
> Secondary option is to treat everything outside the ASCII range as if it
> were Codepage-1252 or Latin-1.
>
>
>
> For wider characters, one would likely need some other functions, like
> those from 'wctype.h', but then it depends on locale rather than "just
> assume it is UTF-16 or something".
>
> Or, at least assuming the locale actually does something (simpler answer
> is to hard wire it).
>
> In BGBCC, the patterns I followed are:
> Narrow strings (in C) are typically Codepage-1252 by default;
> Except when they are UTF-8.
> Wide strings are typically UTF-16.
>
> I went with the Windows pattern of using UTF-16 for wide-strings and
> wchar_t, rather than the Linux pattern of using UTF-32, partly because
> UTF-32 is impractically wasteful, and rarely useful. Similarly, one
> still doesn't get "one character, one codepoint" even with UTF-32 when
> it comes to things like emojis or similar, so alas...
>

I dislike the idea of making things OS specific. Latin-1 is really the
only good choice for an extended 8-bit character set, with wide support.
Anything beyond that (Latin-15, CP-1252, etc.) is going to come out
wrong with some programs.

Using UTF-16 in this day and age is just /wrong/. It is the worst
choice you can have. It doesn't give you a code point per code unit, so
has no advantage over UTF-8. It doesn't have ASCII compatibility and
suffers from endianness choices, giving the disadvantages of UTF-32.
The only possible reason to use it is for interfacing to other code or
API's that are currently UTF-16 - and even Windows is moving to UTF-8.

Re: Minor idea for indirect target predictor

<sbi7n8$7g9$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18263&group=comp.arch#18263

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Wed, 30 Jun 2021 18:56:07 +0200
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <sbi7n8$7g9$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi3df$gg$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 30 Jun 2021 16:56:08 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="b0e3b71792d96e0a09c27d7a4fcf832f";
logging-data="7689"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18hl2wgZIV48Z4a1f9fbPcbul4H3H3KvAM="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Cancel-Lock: sha1:QA2eSWmRw/cxzZP+RDYhleCAOEw=
In-Reply-To: <sbi3df$gg$1@newsreader4.netcologne.de>
Content-Language: en-GB

by: David Brown - Wed, 30 Jun 2021 16:56 UTC

On 30/06/2021 17:42, Thomas Koenig wrote:
> David Brown <david.brown@hesbynett.no> schrieb:
>
> (ctype.h)
>
>> However, range testing the value is so simple and obvious that I would
>> not expect any C library implementation to have problems here.
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78155 has the non-nice
> example
>
> $ cat foo.c
> int main (void)
> {
> __builtin_printf ("%i\n", __builtin_isalpha (999999));
> }
> $ gcc foo.c
> $ ./a.out
> Segmentation fault (core dumped)
>

These builtins are a different matter. These are undocumented
functions, used by libraries to implement the documented ones. (There
are plenty of other builtins that /are/ documented.) That is a bug in
the user program, not the compiler, just as derefrencing an invalid
pointer or dividing by zero is a program bug.

Actually, calling standard library "isalpha" with 999999 (assuming you
don't have 32-bit char) and getting a segmentation fault is a programmer
bug too. But it makes sense to have some extra sanity checks on library
functions to help programmers catch their bugs.

Re: Minor idea for indirect target predictor

<sbi8d7$3sa$1@newsreader4.netcologne.de>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18264&group=comp.arch#18264

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Wed, 30 Jun 2021 17:07:51 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbi8d7$3sa$1@newsreader4.netcologne.de>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi3df$gg$1@newsreader4.netcologne.de>
<sbi7n8$7g9$1@dont-email.me>
Injection-Date: Wed, 30 Jun 2021 17:07:51 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="3978"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Wed, 30 Jun 2021 17:07 UTC

David Brown <david.brown@hesbynett.no> schrieb:
> On 30/06/2021 17:42, Thomas Koenig wrote:
>> David Brown <david.brown@hesbynett.no> schrieb:
>>
>> (ctype.h)
>>
>>> However, range testing the value is so simple and obvious that I would
>>> not expect any C library implementation to have problems here.
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78155 has the non-nice
>> example
>>
>> $ cat foo.c
>> int main (void)
>> {
>> __builtin_printf ("%i\n", __builtin_isalpha (999999));
>> }
>> $ gcc foo.c
>> $ ./a.out
>> Segmentation fault (core dumped)
>>
>
> These builtins are a different matter. These are undocumented
> functions, used by libraries to implement the documented ones.

This isn't any better:

$ cat foo.c
#include <ctype.h>
#include <stdio.h>

int main (void)
{ printf ("%i\n", isalpha (999999));
} $ gcc foo.c
$ ./a.out
Segmentation fault (core dumped)

This just serves to illustrate that the arguments to isalpha() and
friends can lead to undefined behavior, and these functions should
not be called on unsanitized data.

Re: Minor idea for indirect target predictor

<sbic0k$580$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18267&group=comp.arch#18267

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Wed, 30 Jun 2021 13:06:56 -0500
Organization: A noiseless patient Spider
Lines: 156
Message-ID: <sbic0k$580$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi11k$m95$1@dont-email.me>
<sbi3hh$8q0$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 30 Jun 2021 18:09:24 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="45b8fecc7a159f0e815ab6ce141c45b4";
logging-data="5376"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+CYhrFmp3sFjbhPsLVkjcB"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:dFNlgiBP7k5dT16i0AyXmJlSKSo=
In-Reply-To: <sbi3hh$8q0$1@dont-email.me>
Content-Language: en-US

by: BGB - Wed, 30 Jun 2021 18:06 UTC

On 6/30/2021 10:44 AM, David Brown wrote:
> On 30/06/2021 16:59, BGB wrote:
>> On 6/30/2021 9:21 AM, David Brown wrote:
>>> On 30/06/2021 14:36, Thomas Koenig wrote:
>>>> Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
>>>>> Stephen Fuld wrote:
>>>>>> On 6/29/2021 12:27 PM, Ivan Godard wrote:
>>>>>> I am not a compiler guy, so forgive my ignorance. What would a
>>>>>> compiler
>>>>>> do in such a situation? Would it first test for something like if
>>>>>> char
>>>>>> >= "a" and <= "z", etc.?
>>>>>>
>>>>>> Just thinking about how I might do it, I think I would have a 256 byte
>>>>>> array, indexed by the character value, with each entry having an
>>>>>> indicator value, say 1 for lower case letters, 2 for upper case
>>>>>> letters,
>>>>>> 3 for numbers, 4 for white space characters, etc. Looking up the
>>>>>> value
>>>>>> shouldn't be costly as most of the table with shortly be in D-cache.
>>>>>> Then use the indicator value to index a jump or call table (which I
>>>>>> guess is fairly predictable as most characters are lower case letters.
>>>>>>
>>>>>> But what do I know? :-(
>>>>>
>>>>> About the same as the standard C lib?
>>>>>
>>>>> I.e. the classic character classification macros use just such a table.
>>>>>
>>>>> As I noted previously, it tends to break down a bit with utf8, but you
>>>>> can at least handle all 7-bit ascii very quickly.
>>>
>>> It breaks down a /lot/ with UTF-8 - the traditional <ctype.h>
>>> classifications such as "isdigit" or "tolower" collapse pretty quickly
>>> in the face of non-ASCII character sets. Locales can help a bit, but
>>> you don't have to get very exotic before the answer is "it's
>>> complicated".
>>>
>>
>> If one is feeding raw byte or char values to these functions, and using
>> UTF-8 or similar, then as soon as one gets outside ASCII range the
>> answer is basically useless.
>>
>> Simple case is to ignore everything outside the ASCII range.
>>
>> Secondary option is to treat everything outside the ASCII range as if it
>> were Codepage-1252 or Latin-1.
>>
>>
>>
>> For wider characters, one would likely need some other functions, like
>> those from 'wctype.h', but then it depends on locale rather than "just
>> assume it is UTF-16 or something".
>>
>> Or, at least assuming the locale actually does something (simpler answer
>> is to hard wire it).
>>
>> In BGBCC, the patterns I followed are:
>> Narrow strings (in C) are typically Codepage-1252 by default;
>> Except when they are UTF-8.
>> Wide strings are typically UTF-16.
>>
>> I went with the Windows pattern of using UTF-16 for wide-strings and
>> wchar_t, rather than the Linux pattern of using UTF-32, partly because
>> UTF-32 is impractically wasteful, and rarely useful. Similarly, one
>> still doesn't get "one character, one codepoint" even with UTF-32 when
>> it comes to things like emojis or similar, so alas...
>>
>
> I dislike the idea of making things OS specific. Latin-1 is really the
> only good choice for an extended 8-bit character set, with wide support.
> Anything beyond that (Latin-15, CP-1252, etc.) is going to come out
> wrong with some programs.
>

Pretty much, though Latin-1 and CP-1252 are close enough that the
differences can be mostly fudged over (a few more printable characters
vs a control character space "pretty much no one uses anyways").

> Using UTF-16 in this day and age is just /wrong/. It is the worst
> choice you can have. It doesn't give you a code point per code unit, so
> has no advantage over UTF-8. It doesn't have ASCII compatibility and
> suffers from endianness choices, giving the disadvantages of UTF-32.
> The only possible reason to use it is for interfacing to other code or
> API's that are currently UTF-16 - and even Windows is moving to UTF-8.
>

UTF-8 is generally a good option.

Though in this case it is more a question of what happens when someone
writes something like:
wchar_t *ws0;
...
ws0=L"Something String";

Windows went with UTF-16 here, and Linux went with UTF-32.

I went with UTF-16 as well, because UTF-32 is horribly wasteful as any
sort of default, and because in most regards BGBCC follows similar
patterns as MSVC, with the main exception of 'sizeof(long)' being 8 on
64-bit targets as keeping sizeof(long)==sizeof(void *) seemed like a
more useful property than keeping long being 32 bits.

As can be noted, not even UTF-32 gives one character per code-point,
since many emojis and CJK characters are encoded using combining
characters, so this battle is basically already lost.

And, likewise:
char *s0;
s0="Some String"; //assumes CP-1252 by default

Because, while one could use UTF-8 here, this actually goes horribly
wrong if compiling programs which assume the ability to shove raw binary
data into string literals.

The actual conversion is a little wonky:
ASCII range maps as-is;
0x80 .. 0xFF map through directly;
Any other characters in CP-1252 map to their associated spots;
Any codepoints from CP-437 also map to their associated spots;
Also some amount of PETSCII codepoints and similar (*2);
Pretty much everything else gets dumped off at 0x8F.

So, narrow string literals in C are a little bit lossy in this sense.

Internally, the compiler stores pretty much everything as UTF-8 though,
and emits the strings in their final encoding when the final binary is
produced.

*2: The built in text-mode hardware font in my BJX2 core is capable of
displaying CP-1252, CP-437, and PETSCII glyphs at the same time; though
granted 8-bit strings can't encode these at the same time (using glyphs
outside the ASCII/1252 set effectively requires the use of ANSI escape
codes or similar).

Technically, one can also do this:
char32_t *ls0;
ls0=U"Some UTF-32 String"; //explicit UTF-32
Or:
char16_t *us0;
us0=u"Some UTF-16 String"; //explicit UTF-16

Or:
char *s0;
s0=u8"Some UTF-8 String"; //explicit UTF-8

As for my "TestKern OS", generally most of the other API functions
assume UTF-8 as the default (pretty much anything involving the
filesystem or similar assumes UTF-8 strings).

....

Re: Minor idea for indirect target predictor

<578ebc77-5c53-4def-92af-945b20488248n@googlegroups.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18271&group=comp.arch#18271

copy link Newsgroups: comp.arch

X-Received: by 2002:ac8:6ec4:: with SMTP id f4mr33737973qtv.133.1625093087700;
Wed, 30 Jun 2021 15:44:47 -0700 (PDT)
X-Received: by 2002:aca:dac5:: with SMTP id r188mr19824233oig.78.1625093087456;
Wed, 30 Jun 2021 15:44:47 -0700 (PDT)
Path: i2pn2.org!i2pn.org!news.swapon.de!news.uzoreto.com!2.eu.feeder.erje.net!feeder.erje.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 30 Jun 2021 15:44:47 -0700 (PDT)
In-Reply-To: <sbi8d7$3sa$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:e858:57c1:8d44:a54;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:e858:57c1:8d44:a54
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <sbe511$src$1@gioia.aioe.org>
<sbfg9g$371$1@dont-email.me> <sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi3df$gg$1@newsreader4.netcologne.de>
<sbi7n8$7g9$1@dont-email.me> <sbi8d7$3sa$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <578ebc77-5c53-4def-92af-945b20488248n@googlegroups.com>
Subject: Re: Minor idea for indirect target predictor
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 30 Jun 2021 22:44:47 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Wed, 30 Jun 2021 22:44 UTC

On Wednesday, June 30, 2021 at 12:07:53 PM UTC-5, Thomas Koenig wrote:
> David Brown <david...@hesbynett.no> schrieb:
> > On 30/06/2021 17:42, Thomas Koenig wrote:
> >> David Brown <david...@hesbynett.no> schrieb:
> >>
> >> (ctype.h)
> >>
> >>> However, range testing the value is so simple and obvious that I would
> >>> not expect any C library implementation to have problems here.
> >>
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78155 has the non-nice
> >> example
> >>
> >> $ cat foo.c
> >> int main (void)
> >> {
> >> __builtin_printf ("%i\n", __builtin_isalpha (999999));
> >> }
> >> $ gcc foo.c
> >> $ ./a.out
> >> Segmentation fault (core dumped)
> >>
> >
> > These builtins are a different matter. These are undocumented
> > functions, used by libraries to implement the documented ones.
> This isn't any better:
>
> $ cat foo.c
> #include <ctype.h>
> #include <stdio.h>
>
> int main (void)
> {
> printf ("%i\n", isalpha (999999));
> }
> $ gcc foo.c
> $ ./a.out
> Segmentation fault (core dumped)
> This just serves to illustrate that the arguments to isalpha() and
> friends can lead to undefined behavior, and these functions should
> not be called on unsanitized data.
<
Why does isalpha NOT check its input argument range ?????

Re: Minor idea for indirect target predictor

<sbjm60$32l$1@newsreader4.netcologne.de>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18277&group=comp.arch#18277

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!news.swapon.de!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Thu, 1 Jul 2021 06:09:04 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbjm60$32l$1@newsreader4.netcologne.de>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi3df$gg$1@newsreader4.netcologne.de>
<sbi7n8$7g9$1@dont-email.me> <sbi8d7$3sa$1@newsreader4.netcologne.de>
<578ebc77-5c53-4def-92af-945b20488248n@googlegroups.com>
Injection-Date: Thu, 1 Jul 2021 06:09:04 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="3157"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Thu, 1 Jul 2021 06:09 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:

> Why does isalpha NOT check its input argument range ?????

Another of those features of C that did not age well. It made
sense in the 1970s, less so in the 2020s.

A better alternative is to use

#include <safe-ctype.h>

and then ISALPHA etc. This is a gcc extension, not sure
if LLVM has it or not.

Re: Minor idea for indirect target predictor

<sbjush$2gp$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=18281&group=comp.arch#18281

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Thu, 1 Jul 2021 10:37:36 +0200
Organization: A noiseless patient Spider
Lines: 115
Message-ID: <sbjush$2gp$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi11k$m95$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 1 Jul 2021 08:37:37 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="2ddc155e91461d5454be722e81a0b8a7";
logging-data="2585"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/nQ51GdZ57/TxgF1dSbNP0ef3viHXOlZQ="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:fzfCj533jyztFDuJqib6deB2rWA=
In-Reply-To: <sbi11k$m95$1@dont-email.me>
Content-Language: en-US

by: Marcus - Thu, 1 Jul 2021 08:37 UTC

On 2021-06-30, BGB wrote:
> On 6/30/2021 9:21 AM, David Brown wrote:
>> On 30/06/2021 14:36, Thomas Koenig wrote:
>>> Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
>>>> Stephen Fuld wrote:
>>>>> On 6/29/2021 12:27 PM, Ivan Godard wrote:
>>>>> I am not a compiler guy, so forgive my ignorance. What would a
>>>>> compiler
>>>>> do in such a situation? Would it first test for something like if
>>>>> char
>>>>> >= "a" and <= "z", etc.?
>>>>>
>>>>> Just thinking about how I might do it, I think I would have a 256 byte
>>>>> array, indexed by the character value, with each entry having an
>>>>> indicator value, say 1 for lower case letters, 2 for upper case
>>>>> letters,
>>>>> 3 for numbers, 4 for white space characters, etc. Looking up the
>>>>> value
>>>>> shouldn't be costly as most of the table with shortly be in D-cache.
>>>>> Then use the indicator value to index a jump or call table (which I
>>>>> guess is fairly predictable as most characters are lower case letters.
>>>>>
>>>>> But what do I know? :-(
>>>>
>>>> About the same as the standard C lib?
>>>>
>>>> I.e. the classic character classification macros use just such a table.
>>>>
>>>> As I noted previously, it tends to break down a bit with utf8, but you
>>>> can at least handle all 7-bit ascii very quickly.
>>
>> It breaks down a /lot/ with UTF-8 - the traditional <ctype.h>
>> classifications such as "isdigit" or "tolower" collapse pretty quickly
>> in the face of non-ASCII character sets. Locales can help a bit, but
>> you don't have to get very exotic before the answer is "it's
>> complicated".
>>
>
> If one is feeding raw byte or char values to these functions, and using
> UTF-8 or similar, then as soon as one gets outside ASCII range the
> answer is basically useless.
>
> Simple case is to ignore everything outside the ASCII range.
>
> Secondary option is to treat everything outside the ASCII range as if it
> were Codepage-1252 or Latin-1.
>
>
>
> For wider characters, one would likely need some other functions, like
> those from 'wctype.h', but then it depends on locale rather than "just
> assume it is UTF-16 or something".
>

There's a clear difference between string encoding (ASCII, Latin-1,
CP-1252, UTF-8, UTF-16, ...) and character encoding. The latter can
usually be safely represented as a 32-bit integer representing a
single Unicode code point.

In my opinion, any function dealing with a single character at a time
should use Unicode code points as input/output (rather than locale
dependent encodings that is just a big source of bugs).

For instance in the GLFW text input API, all input events carry an
unsigned int representing the Unicode code point for a single character:

https://www.glfw.org/docs/latest/input_guide.html#input_char

....whereas functions that deal with text strings use UTF-8 encoding,
like glfwSetWindowTitle():

https://www.glfw.org/docs/latest/window_guide.html#window_title

My personal preference for C/C++ applications is to use UTF-8 strings
everywhere internally, and whenever an external API (e.g. the Win32 API)
is used - do the necessary string translations in conjunction with the
API calls. On POSIX systems you usally do not have to do any
translations since most API:s accept UTF-8 strings.

I successfully used this design philosophy for BuildCache, for instance.

> Or, at least assuming the locale actually does something (simpler answer
> is to hard wire it).
>
> In BGBCC, the patterns I followed are:
> Narrow strings (in C) are typically Codepage-1252 by default;
> Except when they are UTF-8.
> Wide strings are typically UTF-16.
>
> I went with the Windows pattern of using UTF-16 for wide-strings and
> wchar_t, rather than the Linux pattern of using UTF-32, partly because
> UTF-32 is impractically wasteful, and rarely useful. Similarly, one
> still doesn't get "one character, one codepoint" even with UTF-32 when
> it comes to things like emojis or similar, so alas...
>
>
>>>
>>> And if you don't sanitize your input, you can get "interesting"
>>> results, such as when the value is negative for a signed char
>>> or when testing if 99999 is printable.
>>>
>>> There may be a few CVEs along those lines.
>>>
>>
>> The C standard library functions here all take "int" parameters, with
>> the results defined for EOF or a value in the range of unsigned char.
>> Giving them anything else is undefined behaviour - it is the
>> programmer's fault, and the programmer's problem.
>>
>> However, range testing the value is so simple and obvious that I would
>> not expect any C library implementation to have problems here.
>>
>
> Yep.

Subject	Author
Minor idea for indirect target predictor	Paul A. Clayton
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	Terje Mathisen
Re: Minor idea for indirect target predictor	EricP
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	EricP
Re: Minor idea for indirect target predictor	Thomas Koenig
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	Paul A. Clayton
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	Paul A. Clayton
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	Andy
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	Anton Ertl
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	Ivan Godard
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	Ivan Godard
Re: Minor idea for indirect target predictor	MitchAlsup
Re: sparse switch (was Minor idea for indirect target predictor)	Brian G. Lucas
Re: sparse switch (was Minor idea for indirect target predictor)	EricP
Re: sparse switch (was Minor idea for indirect target predictor)	Thomas Koenig
Re: sparse switch (was Minor idea for indirect target predictor)	Brian G. Lucas
Re: sparse switch (was Minor idea for indirect target predictor)	Thomas Koenig
Re: sparse switch (was Minor idea for indirect target predictor)	MitchAlsup
Re: sparse switch (was Minor idea for indirect target predictor)	Thomas Koenig
Re: sparse switch (was Minor idea for indirect target predictor)	Terje Mathisen
Re: sparse switch (was Minor idea for indirect target predictor)	John Levine
Re: sparse switch (was Minor idea for indirect target predictor)	MitchAlsup
Re: sparse switch (was Minor idea for indirect target predictor)	Thomas Koenig
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	Stefan Monnier
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	Stefan Monnier
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	Stefan Monnier
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	Stefan Monnier
Re: Minor idea for indirect target predictor	Stephen Fuld
Re: Minor idea for indirect target predictor	Ivan Godard
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	Terje Mathisen
Re: Minor idea for indirect target predictor	Thomas Koenig
Re: Minor idea for indirect target predictor	David Brown
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	David Brown
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	Marcus
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	Marcus
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	Terje Mathisen
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	Thomas Koenig
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	Thomas Koenig
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	Thomas Koenig
Re: Minor idea for indirect target predictor	Ivan Godard
Re: Minor idea for indirect target predictor	John Dallman
Re: Minor idea for indirect target predictor	Thomas Koenig
Re: Minor idea for indirect target predictor	George Neuner
Re: Minor idea for indirect target predictor	EricP
Re: Minor idea for indirect target predictor	Andy Valencia
Re: Minor idea for indirect target predictor	Ivan Godard
Re: Minor idea for indirect target predictor	Thomas Koenig
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	Thomas Koenig
Re: Minor idea for indirect target predictor	Ivan Godard
Re: Minor idea for indirect target predictor	Thomas Koenig
Re: Minor idea for indirect target predictor	George Neuner
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	Stefan Monnier
Re: Minor idea for indirect target predictor	George Neuner
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	David Brown
Re: Minor idea for indirect target predictor	Thomas Koenig
Re: Minor idea for indirect target predictor	John Dallman
Re: Minor idea for indirect target predictor	David Brown
Re: Minor idea for indirect target predictor	Thomas Koenig
Re: Minor idea for indirect target predictor	David Brown
Re: Minor idea for indirect target predictor	Thomas Koenig
Re: Minor idea for indirect target predictor	Stephen Fuld
Re: Minor idea for indirect target predictor	George Neuner
Re: Minor idea for indirect target predictor	Stephen Fuld
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	Thomas Koenig
Re: Minor idea for indirect target predictor	George Neuner
Re: Minor idea for indirect target predictor	MitchAlsup
Re: Minor idea for indirect target predictor	George Neuner
Re: Minor idea for indirect target predictor	Stephen Fuld
Re: Minor idea for indirect target predictor	David Brown
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	Stefan Monnier
Re: Minor idea for indirect target predictor	Ivan Godard
Re: Python performance (was: Minor idea for indirect target	Marcus
Re: Minor idea for indirect target predictor	Thomas Koenig
Re: Minor idea for indirect target predictor	David Brown
Re: Minor idea for indirect target predictor	David Brown
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	Thomas Koenig
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	BGB
Re: Minor idea for indirect target predictor	MitchAlsup