Re: Branch prediction hints

X-Received: by 2002:a37:7c02:: with SMTP id x2mr5631883qkc.483.1621905464049;
Mon, 24 May 2021 18:17:44 -0700 (PDT)
X-Received: by 2002:a4a:b389:: with SMTP id p9mr20131222ooo.71.1621905463771;
Mon, 24 May 2021 18:17:43 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 24 May 2021 18:17:43 -0700 (PDT)
In-Reply-To: <s8hf4q$mrr$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:ed6f:f412:a8c7:989c;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:ed6f:f412:a8c7:989c
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <s8cmv1$1e7$1@dont-email.me>
<s8csfm$172$1@dont-email.me> <s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me>
<s8e60c$ca6$1@dont-email.me> <s8eabb$ig$1@dont-email.me> <13fa2553-eaf2-43df-a87a-3559a45d88a0n@googlegroups.com>
<s8ev0d$t2t$1@dont-email.me> <23b797e3-2809-4cd4-a5b4-2085a35f98cen@googlegroups.com>
<s8f9bo$dup$1@dont-email.me> <1430abb7-231c-4f12-985a-44b623c2fcafn@googlegroups.com>
<s8h31l$vv$1@dont-email.me> <4d114eb0-d49c-478c-b49c-7270cb39a687n@googlegroups.com>
<s8hf4q$mrr$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <66dfbd30-34c8-48a2-9174-1eeab337e696n@googlegroups.com>
Subject: Re: Branch prediction hints
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 25 May 2021 01:17:44 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Tue, 25 May 2021 01:17 UTC

On Monday, May 24, 2021 at 7:07:57 PM UTC-5, BGB wrote:
> On 5/24/2021 4:36 PM, MitchAlsup wrote:
> > On Monday, May 24, 2021 at 3:41:28 PM UTC-5, BGB wrote:
> >> On 5/24/2021 12:52 PM, MitchAlsup wrote:
> >>> On Sunday, May 23, 2021 at 11:16:58 PM UTC-5, BGB wrote:
> >>>> On 5/23/2021 8:32 PM, MitchAlsup wrote:
> >
> >>>>> Would I be out of line to state that this sounds like a poor starting point?
> >>>> Probably.
> >>>>
> >>>> It is more akin to designing for a 16-bit ISA, where it doesn't take
> >>>> much to eat through pretty much all of it.
> >> Clarification:
> >> I had meant, "it probably was a poor starting point" rather than "it was
> >> probably out of line"...
> > <
> > I knew that.......
> >>>>> <
> >>>>> My 66000 has 1/3rd of its Major OpCode space unallocated,
> >>>>> a bit less than 1/2 of its memory reference OpCode Space allocated,
> >>>>> a bit less than 1/2 of its 2-operand OpCode Space allocated,
> >>>>> a bit less than 1/128 of its 1-operand OpCode Space allocated,
> >>>>> and 1/4 of its 3-operand OpCode Space unallocated.
> >>>> Starts looking at it a little more, and realizing encoding space may be
> >>>> a more serious problem than I realized initially...
> >>>>
> >>>>
> >>>> I can't really map BJX2 to this new space, it just doesn't fit...
> >>>>
> >>>>
> >>>> Then again, maybe it might win more points with the "RISC means small
> >>>> ISA listing" crowd... Because one runs out of encoding bits before they
> >>>> can fit all that much into it...
> >>>>
> >>>>
> >>>> "Well, Imma define some Disp9 Load/Store Ops...",
> >>>> "Oh-Noes, that was 1/4 of the encoding space!",
> >>>> "How about some 3R Load/Store ops and 3R ALU ops and 2R space",
> >>>> "Now it's at 1/2 of the opcode space!"
> >>> <
> >>> To be fair, I made a lot of these mistakes in Mc 88K, and corrected the
> >>> vast majority of them in My 66000.
> >>>>
> >>>> Then one has to struggle to fit some useful 3RI ALU ops, 2RI ops, and
> >>>> Branch ops, before realizing they are already basically out of encoding
> >>>> space...
> >>> <
> >>> The important thing to remember is that the most precious resource is
> >>> the Major OpCode space--and the reason is that this gives you access
> >>> to the other spaces.
> >>> <
> >>> In My 66000, the Major OpCode space consists of all 16-bit immediates,
> >>> the branches with IP relative offsets, and the extension OpCodes, of
> >>> which there are 6 {Predication, Shifts, 2R+Disp memory refs, 2-Operand,
> >>> 3-Operand, and 1-Operand.}
> >>> <
> >>> For all of the extended instructions, My 66000 has 3-bits to control the
> >>> signs of the operands and access to long immediates, and access to
> >>> 5-bit immediates in Src1. This supports things like 1<<k in a single instruction.
> >>> <
> >>> The second most important resource is the 3-operand space because
> >>> there are only 8 available entries and we need FMAC (single and double),
> >>> CMOV, and INSert.
> >>> <
> >>> The other spaces are so partially populated that one has a pretty free
> >>> rein.
> >> OK.
> >>
> >> In my initial layout, was starting from a 6-bit major space, with a
> >> 4-bit minor space for 3R ops, and an additional 6 bits for 2R ops.
> > <
> > 6-bit major:: check
> > I only got 3-bit 3-operand OpCode because I use 3 other bits for sign
> > control and access to immediates::
> > <
> > FMAC Rd, R1,±R2,±R3
> > So you can change the sign associated with multiplication or with adding
> > and get 4 flavors of MACing. This seems to work usefully well with the
> > bit field INSert instruction as bit-inversion rather than negation.
> >>
> >> Doing a Disp9 or Imm9 op would only have the 6-bit major opcode, which
> >> doesn't really go all that far.
> > <
> > I guess I am missing something, here, as I get Imm16 and DISP16<<2 for both
> > of these, and for unconditional branches (or CALL) I get DISP26<<2. Must have
> > something to do with packing or unpacking of WEX.....
> Making another layout attempt...
> This one keeps Imm9 where appropriate.
>
>
> Where, ppqq!=0100:
>
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0000 MOV.B Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0001 LEA.B Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0010 MOV.W Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0011 LEA.W Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0100 MOV.L Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0101 LEA.L Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0110 MOV.Q Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0111 LEA.Q Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1000 MOV.B (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1001 MOVU.B (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1010 MOV.W (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1011 MOVU.W (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1100 MOV.L (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1101 MOVU.L (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1110 MOV.Q (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1111 -
<
I know it looks like you grew up inside x86-land, but using
MOV r,(r,d) and MOV (r,d),r as LD and ST is simply confusing.
Load and Store are much more explanatory.
<
Also note, I only have 1 LEA although the index register can still be
shifted 0,1,2,3 places.
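
A minimal Python sketch of the address arithmetic behind a single LEA with a shifted index. The name `lea`, its parameters, and the 64-bit wraparound are assumptions for illustration and come from neither ISA. It also assumes the quoted LEA.B/.W/.L/.Q forms scale the index by the element size.

```python
def lea(base: int, index: int, shift: int, disp: int = 0, width: int = 64) -> int:
    """Return base + (index << shift) + disp, wrapped to `width` bits."""
    if shift not in (0, 1, 2, 3):
        raise ValueError("index shift must be 0, 1, 2 or 3")
    mask = (1 << width) - 1
    return (base + (index << shift) + disp) & mask


# One opcode plus a 2-bit shift field reaches the same four scalings
# that the quoted layout lists as four separate LEA entries.
for shift, suffix in enumerate(("B", "W", "L", "Q")):
    print(suffix, hex(lea(0x1000, 5, shift)))
```

The point of the sketch is only that the scale can live in an operand field rather than in the opcode.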
>
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0000 ADD Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0001 SUB Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0010 MULS Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0011 MULU Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0100 -
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0101 AND Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0110 OR Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0111 XOR Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1000 SHAD Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1001 SHLD Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1010 SHADQ Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1011 SHLDQ Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1100 ADC Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1101 SBB Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1110 DMULS Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1111 DMULU Rs, Rt, Rn
<
4 integer multiplies ?!? I only needed 2 {signed and unsigned}.
I access carry through a different mechanism (saving those
instruction encodings.)
>
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0000 SHAD Rs, Imm6u, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0001 SHAD Rs, Imm6n, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0010 SHADQ Rs, Imm6u, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0011 SHADQ Rs, Imm6n, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0100 SHLD Rs, Imm6u, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0101 SHLD Rs, Imm6n, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0110 SHLDQ Rs, Imm6u, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0111 SHLDQ Rs, Imm6n, Rn
<
I only have 4 shift instructions {SL and SR} in {signed and unsigned} flavors
and {constant or register} shift amounts. You have 3× as many.
>
> Then fill in FPU, ALUX, SIMD, ... operations.
>
> ...
>
> * ppqq-0011 00nn-nnnn ssss-ssoo oooo-oooo 1R and 2R spaces.
>
> ...
>
>
> This '00' block is basically where all the 3R and 2R ops go.
>
>
> * ppqq-0000 01nn-nnnn ssss-ss0i iiii-iiii MOV.B Rn, (Rs, Disp9)
> * ppqq-0000 01nn-nnnn ssss-ss1i iiii-iiii LEA.B Rn, (Rs, Disp9)
> * ppqq-0001 01nn-nnnn ssss-ss0i iiii-iiii MOV.W Rn, (Rs, Disp9)
> * ppqq-0001 01nn-nnnn ssss-ss1i iiii-iiii LEA.W Rn, (Rs, Disp9)
> * ppqq-0010 01nn-nnnn ssss-ss0i iiii-iiii MOV.L Rn, (Rs, Disp9)
> * ppqq-0010 01nn-nnnn ssss-ss1i iiii-iiii LEA.L Rn, (Rs, Disp9)
> * ppqq-0011 01nn-nnnn ssss-ss0i iiii-iiii MOV.Q Rn, (Rs, Disp9)
> * ppqq-0011 01nn-nnnn ssss-ss1i iiii-iiii LEA.Q Rn, (Rs, Disp9)
> * ppqq-0100 01nn-nnnn ssss-ss0i iiii-iiii MOV.B (Rs, Disp9), Rn
> * ppqq-0100 01nn-nnnn ssss-ss1i iiii-iiii MOVU.B (Rs, Disp9), Rn
> * ppqq-0101 01nn-nnnn ssss-ss0i iiii-iiii MOV.W (Rs, Disp9), Rn
> * ppqq-0101 01nn-nnnn ssss-ss1i iiii-iiii MOVU.W (Rs, Disp9), Rn
> * ppqq-0110 01nn-nnnn ssss-ss0i iiii-iiii MOV.L (Rs, Disp9), Rn
> * ppqq-0110 01nn-nnnn ssss-ss1i iiii-iiii MOVU.L (Rs, Disp9), Rn
> * ppqq-0111 01nn-nnnn ssss-ss0i iiii-iiii MOV.Q (Rs, Disp9), Rn
> * ppqq-0111 01nn-nnnn ssss-ss1i iiii-iiii -
> ...
> * ppqq-1111 0100-iiii iiii-iiii iiii-iiii BRA Disp20
> * ppqq-1111 0101-iiii iiii-iiii iiii-iiii BSR Disp20
>
>
> ...
>
> * ppqq-0000 10nn-nnnn ssss-ss0i iiii-iiii ADD Rs, Imm9u, Rn
> * ppqq-0000 10nn-nnnn ssss-ss1i iiii-iiii ADD Rs, Imm9n, Rn
> * ppqq-0001 10nn-nnnn ssss-ss0i iiii-iiii MULS Rs, Imm9u, Rn
> * ppqq-0001 10nn-nnnn ssss-ss1i iiii-iiii MULU Rs, Imm9n, Rn
> * ppqq-0010 10nn-nnnn ssss-ss0i iiii-iiii ADDSL Rs, Imm9u, Rn
> * ppqq-0010 10nn-nnnn ssss-ss1i iiii-iiii ADDSL Rs, Imm9n, Rn
> * ppqq-0011 10nn-nnnn ssss-ss0i iiii-iiii ADDUL Rs, Imm9u, Rn
> * ppqq-0011 10nn-nnnn ssss-ss1i iiii-iiii ADDUL Rs, Imm9n, Rn
> * ppqq-0100 -
> * ppqq-0101 10nn-nnnn ssss-ss0i iiii-iiii AND Rs, Imm9u, Rn
> * ppqq-0110 10nn-nnnn ssss-ss0i iiii-iiii OR Rs, Imm9u, Rn
> * ppqq-0111 10nn-nnnn ssss-ss0i iiii-iiii XOR Rs, Imm9u, Rn
>
> * ppqq-1000 10nn-nnnn 0000-rrii iiii-iiii CMPEQ Imm10u, Rn
> * ppqq-1000 10nn-nnnn 0001-rrii iiii-iiii CMPEQ Imm10n, Rn
> * ppqq-1000 10nn-nnnn 0010-rrii iiii-iiii CMPQEQ Imm10u, Rn
> * ppqq-1000 10nn-nnnn 0011-rrii iiii-iiii CMPQEQ Imm10n, Rn
> * ppqq-1000 10nn-nnnn 0100-rrii iiii-iiii CMPGT Imm10u, Rn
> * ppqq-1000 10nn-nnnn 0101-rrii iiii-iiii CMPGT Imm10n, Rn
> * ppqq-1000 10nn-nnnn 0110-rrii iiii-iiii CMPQGT Imm10u, Rn
> * ppqq-1000 10nn-nnnn 0111-rrii iiii-iiii CMPQGT Imm10n, Rn
>
> * ppqq-1000 10nn-nnnn 1000-rrii iiii-iiii CMPHI Imm10u, Rn
> * ppqq-1000 10nn-nnnn 1001-rrii iiii-iiii CMPHI Imm10n, Rn
> * ppqq-1000 10nn-nnnn 1010-rrii iiii-iiii CMPQHI Imm10u, Rn
> * ppqq-1000 10nn-nnnn 1011-rrii iiii-iiii CMPQHI Imm10n, Rn
> * ppqq-1000 10nn-nnnn 1100-rrii iiii-iiii CMPGE Imm10u, Rn
> * ppqq-1000 10nn-nnnn 1101-rrii iiii-iiii CMPGE Imm10n, Rn
> * ppqq-1000 10nn-nnnn 1110-rrii iiii-iiii CMPQGE Imm10u, Rn
> * ppqq-1000 10nn-nnnn 1111-rrii iiii-iiii CMPQGE Imm10n, Rn
<
I only have 2 (maybe 3) CMP instructions {int, FP<single,double>}.
I can convert the result of a CMP instruction into TRUE and FALSE
with a bit field extract when needed (very seldom) and branch on
them by using branch on bit when needed (almost always). This
supports both languages that define True as 1 and languages that
define True as -1. Even my smallest machines CoIssue CMP BB
as a single unit of work.
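
A rough Python sketch of the idea: one compare produces a word of condition bits, an unsigned 1-bit extract yields True as 1, a signed 1-bit extract yields True as -1, and a branch just tests the bit. The bit positions `EQ`, `NE`, `LT`, `GE` and all function names here are hypothetical; the actual My 66000 bit layout is not given in this post.

```python
# Hypothetical bit positions, chosen only for this illustration.
EQ, NE, LT, GE = 0, 1, 2, 3


def compare(a: int, b: int) -> int:
    """Return a word with one bit set per relation that holds."""
    flags = 0
    if a == b:
        flags |= 1 << EQ
    if a != b:
        flags |= 1 << NE
    if a < b:
        flags |= 1 << LT
    if a >= b:
        flags |= 1 << GE
    return flags


def extract_unsigned(word: int, pos: int, width: int = 1) -> int:
    """Zero-extended bit field: a set 1-bit field reads as 1."""
    return (word >> pos) & ((1 << width) - 1)


def extract_signed(word: int, pos: int, width: int = 1) -> int:
    """Sign-extended bit field: a set 1-bit field reads as -1."""
    field = extract_unsigned(word, pos, width)
    sign = 1 << (width - 1)
    return (field ^ sign) - sign


def branch_on_bit(word: int, pos: int) -> bool:
    """Taken if the selected condition bit is set."""
    return bool((word >> pos) & 1)


flags = compare(3, 7)
print(extract_unsigned(flags, LT), extract_signed(flags, LT), branch_on_bit(flags, LT))
```

With this arrangement a single compare serves every relation, so no separate CMPEQ/CMPGT/CMPHI/CMPGE opcodes are needed in the sketch.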
>
> ...
>
> * ppqq-0000 11nn-nnnn iiii-iiii iiii-iiii LDI Imm16u, Rn
> * ppqq-0001 11nn-nnnn iiii-iiii iiii-iiii LDI Imm16n, Rn
<
I don't use any instructions to place a constant in a register, but
I do have a MOV instruction that has access to constant operand.
<
> * ppqq-0010 11nn-nnnn iiii-iiii iiii-iiii ADD Imm16u, Rn
> * ppqq-0011 11nn-nnnn iiii-iiii iiii-iiii ADD Imm16n, Rn
> * ppqq-0100 11nn-nnnn iiii-iiii iiii-iiii LDISH Imm16u, Rn
> * ppqq-0101 11nn-nnnn iiii-iiii iiii-iiii FLDCH Imm16u, Rn
<
I don't have any of the 9-bit or 10-bit forms, only 16-bit ones.
<
So it looks like you are wasting ~40%-ish of the available entropy
in your ISA.
>
> ...
>
>
> Where, ppqq==0100:
>
> * 0100-00ii iiii-iiii iiii-iiii iiii-iiii BRA Disp26s
> * 0100-01ii iiii-iiii iiii-iiii iiii-iiii BSR Disp26s
> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii Jumbo24_Immed
> * 0100-1001 iiii-iiii iiii-iiii iiii-iiii Jumbo24_Opcode
> * 0100-1010 iiii-iiii iiii-iiii iiii-iiii LDIZ Imm24, R0
> * 0100-1011 iiii-iiii iiii-iiii iiii-iiii LDIN Imm24, R0
> ...
>
>
> Likewise:
> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> 0100-1010 iiii-iiii iiii-iiii iiii-iiii BRA Abs48
>
> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> 0100-1011 iiii-iiii iiii-iiii iiii-iiii BSR Abs48
>
> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> 0000-0000 11nn-nnnn iiii-iiii iiii-iiii LDI Imm64, Rn
>
>
> Or, basically, I guess it works, but this encoding space hasn't exactly
> gone very far...
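
For anyone wanting to check how fast a layout like the one quoted eats encoding space, here is a small Python sketch that reads one pattern line and reports what fraction of the 32-bit space it occupies. It treats `0` and `1` as fixed bits and every letter as a free field bit. The helper name `pattern_fraction` is made up for this example, and it ignores side conditions such as "ppqq!=0100", so the results are only approximate.

```python
from fractions import Fraction


def pattern_fraction(pattern: str) -> Fraction:
    """Fraction of the 32-bit encoding space matched by one pattern line.

    Digits 0/1 are fixed opcode bits; letters are free field bits;
    '-' and spaces are separators and are skipped.
    """
    bits = [c for c in pattern if c not in "- "]
    if len(bits) != 32:
        raise ValueError(f"expected 32 bits, got {len(bits)}")
    fixed = sum(1 for c in bits if c in "01")
    return Fraction(1, 2 ** fixed)


disp9_store = "ppqq-0000 01nn-nnnn ssss-ss0i iiii-iiii"  # MOV.B Rn, (Rs, Disp9)
bra_disp26 = "0100-00ii iiii-iiii iiii-iiii iiii-iiii"   # BRA Disp26s

print(pattern_fraction(disp9_store))  # 1/128
print(pattern_fraction(bra_disp26))   # 1/64
```

Summing this over every line of a listing gives a quick estimate of how much of the space is spoken for, which is the quantity being argued about above.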
