devel / comp.arch / Re: Branch prediction hints

Re: Branch prediction hints

<4185805d-4d21-408c-9169-f04bb7a5782cn@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=17138&group=comp.arch#17138

X-Received: by 2002:a05:620a:127b:: with SMTP id b27mr31315256qkl.104.1621902379537;
Mon, 24 May 2021 17:26:19 -0700 (PDT)
X-Received: by 2002:a9d:2ee:: with SMTP id 101mr20160534otl.76.1621902379280;
Mon, 24 May 2021 17:26:19 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 24 May 2021 17:26:19 -0700 (PDT)
In-Reply-To: <s8hf4q$mrr$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:ed6f:f412:a8c7:989c;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:ed6f:f412:a8c7:989c
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <s8cmv1$1e7$1@dont-email.me>
<s8csfm$172$1@dont-email.me> <s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me>
<s8e60c$ca6$1@dont-email.me> <s8eabb$ig$1@dont-email.me> <13fa2553-eaf2-43df-a87a-3559a45d88a0n@googlegroups.com>
<s8ev0d$t2t$1@dont-email.me> <23b797e3-2809-4cd4-a5b4-2085a35f98cen@googlegroups.com>
<s8f9bo$dup$1@dont-email.me> <1430abb7-231c-4f12-985a-44b623c2fcafn@googlegroups.com>
<s8h31l$vv$1@dont-email.me> <4d114eb0-d49c-478c-b49c-7270cb39a687n@googlegroups.com>
<s8hf4q$mrr$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <4185805d-4d21-408c-9169-f04bb7a5782cn@googlegroups.com>
Subject: Re: Branch prediction hints
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 25 May 2021 00:26:19 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Tue, 25 May 2021 00:26 UTC

On Monday, May 24, 2021 at 7:07:57 PM UTC-5, BGB wrote:
> On 5/24/2021 4:36 PM, MitchAlsup wrote:
> > On Monday, May 24, 2021 at 3:41:28 PM UTC-5, BGB wrote:
> >> On 5/24/2021 12:52 PM, MitchAlsup wrote:
> >>> On Sunday, May 23, 2021 at 11:16:58 PM UTC-5, BGB wrote:
> >>>> On 5/23/2021 8:32 PM, MitchAlsup wrote:
> >
> >>>>> Would I be out of line to state that this sounds like a poor starting point?
> >>>> Probably.
> >>>>
> >>>> It is more akin to designing for a 16-bit ISA, where it doesn't take
> >>>> much to eat through pretty much all of it.
> >> Clarification:
> >> I had meant, "it probably was a poor starting point" rather than "it was
> >> probably out of line"...
> > <
> > I knew that.......
> >>>>> <
> >>>>> My 66000 has 1/3rd of its Major OpCode space unallocated,
> >>>>> a bit less than 1/2 of its memory reference OpCode Space allocated,
> >>>>> a bit less than 1/2 of its 2-operand OpCode Space allocated,
> >>>>> a bit less than 1/128 of its 1-operand OpCode Space allocated,
> >>>>> and 1/4 of its 3-operand OpCode Space unallocated.
> >>>> Starts looking at it a little more, and realizing encoding space may be
> >>>> a more serious problem than I realized initially...
> >>>>
> >>>>
> >>>> I can't really map BJX2 to this new space, it just doesn't fit...
> >>>>
> >>>>
> >>>> Then again, maybe it might win more points with the "RISC means small
> >>>> ISA listing" crowd... Because one runs out of encoding bits before they
> >>>> can fit all that much into it...
> >>>>
> >>>>
> >>>> "Well, Imma define some Disp9 Load/Store Ops...",
> >>>> "Oh-Noes, that was 1/4 of the encoding space!",
> >>>> "How about some 3R Load/Store ops and 3R ALU ops and 2R space",
> >>>> "Now it at 1/2 of the opcode space!"
> >>> <
> >>> To be fair, I made a lot of these mistakes in Mc 88K, and corrected the
> >>> vast majority of them in My 66000.
> >>>>
> >>>> Then one has to struggle to fit some useful 3RI ALU ops, 2RI ops, and
> >>>> Branch ops, before realizing they are already basically out of encoding
> >>>> space...
> >>> <
> >>> The important thing to remember is that the most precious resource is
> >>> the Major OpCode space--and the reason is that this gives you access
> >>> to the other spaces.
> >>> <
> >>> In My 66000, the Major OpCode space consists of all 16-bit immediates,
> >>> the branches with IP relative offsets, and the extension OpCodes, of
> >>> which there are 6 {Predication, Shifts, 2R+Disp memory refs, 2-Operand,
> >>> 3-Operand, and 1-Operand.}
> >>> <
> >>> For all of the extended instructions, My 66000 has 3-bits to control the
> >>> signs of the operands and access to long immediates, and access to
> >>> 5-bit immediates in Src1. This supports things like 1<<k in a single instruction.
> >>> <
> >>> The second most important resource is the 3-operand space because
> >>> there are only 8 available entries and we need FMAC (single and double),
> >>> CMOV, and INSert.
> >>> <
> >>> The other spaces are so partially populated that one has a pretty free
> >>> rein.
> >> OK.
> >>
> >> In my initial layout, was starting from a 6-bit major space, with a
> >> 4-bit minor space for 3R ops, and an additional 6 bits for 2R ops.
> > <
> > 6-bit major:: check
> > I only got 3-bit 3-operand OpCode because I use 3 other bits for sign
> > control and access to immediates::
> > <
> > FMAC Rd, R1,±R2,±R3
> > So you can change the sign associated with multiplication or with adding
> > and get 4 flavors of MACing. This seems to work usefully well with the
> > bit field INSert instruction as bit-inversion rather than negation.
> >>
> >> Doing a Disp9 or Imm9 op would only have the 6-bit major opcode, which
> >> doesn't really go all that far.
> > <
> > I guess I am missing something, here, as I get Imm16 and DISP16<<2 for both
> > of these, and for unconditional branches (or CALL) I get DISP26<<2. Must have
> > something to do with packing or unpacking of WEX.....
> Making another layout attempt...
> This one keeps Imm9 where appropriate.
>
>
> Where, ppqq!=0100:
>
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0000 MOV.B Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0001 LEA.B Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0010 MOV.W Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0011 LEA.W Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0100 MOV.L Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0101 LEA.L Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0110 MOV.Q Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0111 LEA.Q Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1000 MOV.B (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1001 MOVU.B (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1010 MOV.W (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1011 MOVU.W (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1100 MOV.L (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1101 MOVU.L (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1110 MOV.Q (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1111 -
<
6-bit register specifiers are eating you alive.
It also appears you are using W for 16-bit containers.
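<
To make the bit budget concrete, here is a minimal Python sketch that splits a 32-bit word into the fields of the quoted 3R layout (ppqq-oooo bbnn-nnnn ssss-sstt tttt-mmmm). Only the field widths come from the listing; the field names and the MSB-first bit order are assumptions made for illustration.

```python
# Field widths of the quoted 3R layout, most significant field first.
# The names are illustrative labels, not taken from the original listing.
FIELDS_3R = [
    ("pred", 4),   # ppqq
    ("major", 4),  # e.g. 0000 = load/store group, 0001 = ALU group
    ("block", 2),  # 00 = the 3R/2R block
    ("rn", 6),
    ("rs", 6),
    ("rt", 6),
    ("minor", 4),  # selects the operation within the major group
]

REGISTER_BITS = sum(w for name, w in FIELDS_3R if name in ("rn", "rs", "rt"))
OPCODE_BITS = 32 - 4 - REGISTER_BITS  # what is left after ppqq and registers


def decode_3r(word):
    """Split a 32-bit word into the fields listed in FIELDS_3R."""
    fields = {}
    shift = 32
    for name, width in FIELDS_3R:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


def encode_3r(**values):
    """Pack named field values into a 32-bit word (missing fields are 0)."""
    word = 0
    for name, width in FIELDS_3R:
        value = values.get(name, 0)
        if value < 0 or value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


# ADD Rs, Rt, Rn from the listing: major 0001, block 00, minor 0000
word = encode_3r(major=0b0001, rn=3, rs=4, rt=5, minor=0b0000)
fields = decode_3r(word)
```

With three 6-bit specifiers, REGISTER_BITS is 18, which leaves OPCODE_BITS at 10 once the 4 ppqq bits are set aside.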
>
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0000 ADD Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0001 SUB Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0010 MULS Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0011 MULU Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0100 -
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0101 AND Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0110 OR Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0111 XOR Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1000 SHAD Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1001 SHLD Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1010 SHADQ Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1011 SHLDQ Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1100 ADC Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1101 SBB Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1110 DMULS Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1111 DMULU Rs, Rt, Rn
>
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0000 SHAD Rs, Imm6u, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0001 SHAD Rs, Imm6n, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0010 SHADQ Rs, Imm6u, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0011 SHADQ Rs, Imm6n, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0100 SHLD Rs, Imm6u, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0101 SHLD Rs, Imm6n, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0110 SHLDQ Rs, Imm6u, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0111 SHLDQ Rs, Imm6n, Rn
>
> Then fill in FPU, ALUX, SIMD, ... operations.
>
> ...
>
> * ppqq-0011 00nn-nnnn ssss-ssoo oooo-oooo 1R and 2R spaces.
>
> ...
>
>
> This '00' block is basically where all the 3R and 2R ops go.
>
>
> * ppqq-0000 01nn-nnnn ssss-ss0i iiii-iiii MOV.B Rn, (Rs, Disp9)
> * ppqq-0000 01nn-nnnn ssss-ss1i iiii-iiii LEA.B Rn, (Rs, Disp9)
> * ppqq-0001 01nn-nnnn ssss-ss0i iiii-iiii MOV.W Rn, (Rs, Disp9)
> * ppqq-0001 01nn-nnnn ssss-ss1i iiii-iiii LEA.W Rn, (Rs, Disp9)
> * ppqq-0010 01nn-nnnn ssss-ss0i iiii-iiii MOV.L Rn, (Rs, Disp9)
> * ppqq-0010 01nn-nnnn ssss-ss1i iiii-iiii LEA.L Rn, (Rs, Disp9)
> * ppqq-0011 01nn-nnnn ssss-ss0i iiii-iiii MOV.Q Rn, (Rs, Disp9)
> * ppqq-0011 01nn-nnnn ssss-ss1i iiii-iiii LEA.Q Rn, (Rs, Disp9)
> * ppqq-0100 01nn-nnnn ssss-ss0i iiii-iiii MOV.B (Rs, Disp9), Rn
> * ppqq-0100 01nn-nnnn ssss-ss1i iiii-iiii MOVU.B (Rs, Disp9), Rn
> * ppqq-0101 01nn-nnnn ssss-ss0i iiii-iiii MOV.W (Rs, Disp9), Rn
> * ppqq-0101 01nn-nnnn ssss-ss1i iiii-iiii MOVU.W (Rs, Disp9), Rn
> * ppqq-0110 01nn-nnnn ssss-ss0i iiii-iiii MOV.L (Rs, Disp9), Rn
> * ppqq-0110 01nn-nnnn ssss-ss1i iiii-iiii MOVU.L (Rs, Disp9), Rn
> * ppqq-0111 01nn-nnnn ssss-ss0i iiii-iiii MOV.Q (Rs, Disp9), Rn
> * ppqq-0111 01nn-nnnn ssss-ss1i iiii-iiii -
> ...
> * ppqq-1111 0100-iiii iiii-iiii iiii-iiii BRA Disp20
> * ppqq-1111 0101-iiii iiii-iiii iiii-iiii BSR Disp20
>
>
> ...
>
> * ppqq-0000 10nn-nnnn ssss-ss0i iiii-iiii ADD Rs, Imm9u, Rn
> * ppqq-0000 10nn-nnnn ssss-ss1i iiii-iiii ADD Rs, Imm9n, Rn
> * ppqq-0001 10nn-nnnn ssss-ss0i iiii-iiii MULS Rs, Imm9u, Rn
> * ppqq-0001 10nn-nnnn ssss-ss1i iiii-iiii MULU Rs, Imm9n, Rn
> * ppqq-0010 10nn-nnnn ssss-ss0i iiii-iiii ADDSL Rs, Imm9u, Rn
> * ppqq-0010 10nn-nnnn ssss-ss1i iiii-iiii ADDSL Rs, Imm9n, Rn
> * ppqq-0011 10nn-nnnn ssss-ss0i iiii-iiii ADDUL Rs, Imm9u, Rn
> * ppqq-0011 10nn-nnnn ssss-ss1i iiii-iiii ADDUL Rs, Imm9n, Rn
> * ppqq-0100 -
> * ppqq-0101 10nn-nnnn ssss-ss0i iiii-iiii AND Rs, Imm9u, Rn
> * ppqq-0110 10nn-nnnn ssss-ss0i iiii-iiii OR Rs, Imm9u, Rn
> * ppqq-0111 10nn-nnnn ssss-ss0i iiii-iiii XOR Rs, Imm9u, Rn
>
> * ppqq-1000 10nn-nnnn 0000-rrii iiii-iiii CMPEQ Imm10u, Rn
> * ppqq-1000 10nn-nnnn 0001-rrii iiii-iiii CMPEQ Imm10n, Rn
> * ppqq-1000 10nn-nnnn 0010-rrii iiii-iiii CMPQEQ Imm10u, Rn
> * ppqq-1000 10nn-nnnn 0011-rrii iiii-iiii CMPQEQ Imm10n, Rn
> * ppqq-1000 10nn-nnnn 0100-rrii iiii-iiii CMPGT Imm10u, Rn
> * ppqq-1000 10nn-nnnn 0101-rrii iiii-iiii CMPGT Imm10n, Rn
> * ppqq-1000 10nn-nnnn 0110-rrii iiii-iiii CMPQGT Imm10u, Rn
> * ppqq-1000 10nn-nnnn 0111-rrii iiii-iiii CMPQGT Imm10n, Rn
>
> * ppqq-1000 10nn-nnnn 1000-rrii iiii-iiii CMPHI Imm10u, Rn
> * ppqq-1000 10nn-nnnn 1001-rrii iiii-iiii CMPHI Imm10n, Rn
> * ppqq-1000 10nn-nnnn 1010-rrii iiii-iiii CMPQHI Imm10u, Rn
> * ppqq-1000 10nn-nnnn 1011-rrii iiii-iiii CMPQHI Imm10n, Rn
> * ppqq-1000 10nn-nnnn 1100-rrii iiii-iiii CMPGE Imm10u, Rn
> * ppqq-1000 10nn-nnnn 1101-rrii iiii-iiii CMPGE Imm10n, Rn
> * ppqq-1000 10nn-nnnn 1110-rrii iiii-iiii CMPQGE Imm10u, Rn
> * ppqq-1000 10nn-nnnn 1111-rrii iiii-iiii CMPQGE Imm10n, Rn
>
> ...
>
> * ppqq-0000 11nn-nnnn iiii-iiii iiii-iiii LDI Imm16u, Rn
> * ppqq-0001 11nn-nnnn iiii-iiii iiii-iiii LDI Imm16n, Rn
> * ppqq-0010 11nn-nnnn iiii-iiii iiii-iiii ADD Imm16u, Rn
> * ppqq-0011 11nn-nnnn iiii-iiii iiii-iiii ADD Imm16n, Rn
> * ppqq-0100 11nn-nnnn iiii-iiii iiii-iiii LDISH Imm16u, Rn
> * ppqq-0101 11nn-nnnn iiii-iiii iiii-iiii FLDCH Imm16u, Rn
>
> ...
>
>
> Where, ppqq==0100:
>
> * 0100-00ii iiii-iiii iiii-iiii iiii-iiii BRA Disp26s
> * 0100-01ii iiii-iiii iiii-iiii iiii-iiii BSR Disp26s
> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii Jumbo24_Immed
> * 0100-1001 iiii-iiii iiii-iiii iiii-iiii Jumbo24_Opcode
> * 0100-1010 iiii-iiii iiii-iiii iiii-iiii LDIZ Imm24, R0
> * 0100-1011 iiii-iiii iiii-iiii iiii-iiii LDIN Imm24, R0
> ...
>
>
> Likewise:
> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> 0100-1010 iiii-iiii iiii-iiii iiii-iiii BRA Abs48
>
> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> 0100-1011 iiii-iiii iiii-iiii iiii-iiii BSR Abs48
>
> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> 0000-0000 11nn-nnnn iiii-iiii iiii-iiii LDI Imm64, Rn
<
It is my opinion that you are holding on to the notion of using instructions
to paste bits together, rather than using the momentum of the instruction
stream to provide constants to operand fields, which avoids having to execute
these instructions.
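<
As a rough illustration of what "pasting bits together" costs, the Python sketch below builds a 64-bit constant from 16-bit pieces. The LDISH semantics used here (shift the register left by 16 and insert the immediate in the low bits) are an assumption; the quoted listing only gives the mnemonic and the Imm16 field. The helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def ldi(imm16):
    """Load a zero-extended 16-bit immediate (modelled on LDI Imm16u)."""
    return imm16 & 0xFFFF


def ldish(reg, imm16):
    """Assumed semantics: shift left 16 bits, then insert the immediate."""
    return ((reg << 16) | (imm16 & 0xFFFF)) & MASK64


def paste_constant(value):
    """Build a 64-bit constant and count the instructions executed."""
    chunks = [(value >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]
    reg = ldi(chunks[0])
    executed = 1
    for chunk in chunks[1:]:
        reg = ldish(reg, chunk)
        executed += 1
    return reg, executed


result, executed = paste_constant(0x0123456789ABCDEF)
```

Under these assumptions the paste sequence executes four instructions before the constant can be used, whereas a constant carried in the instruction stream as an operand needs no extra executed instructions.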
>
>
> Or, basically, I guess it works, but this encoding space hasn't exactly
> gone very far...
<
Interesting tidying up, but no real movement in entropy.
>
>
> Though, I guess, unlike BJX2, it does have space to fit in a few large
> 26-bit branch ops.
> >>
> >>
> >> Meanwhile, in this top-level space, in BJX2 it was 3+4+1 (8) bits, with
> >> 3R ops adding 4-bits, and 2R ops adding 8 bits.
> >>
> >>
> >> The new design was seriously choked in the top-level space, but could
> >> have more space for 2R ops.
> > <
> > All of my 2-operand stuff went under 1 Major OpCode ( 001010 )
> > All of the 3-agen stuff went under 1 Major OpCode ( 001001 )
> > So I have 6 (of 64) Major OpCodes burned for everything not in the Major
> > OpCode group. One can tell if it has a chance of being an extension OpCode
> > (XOP) by looking at the top 2 bits (00), then the second top bit (xx0) is
> > for operand+immediate and (001) is for operand+operand.
> I had assumed a shared space for 2R and 3R ops, with 2R carved off of
> the 3R space.
>
> There isn't a huge shortage of 2R space at least...
>
> The 2R and 3R space is OK, assuming it is given 1/4 of the top-level
> space...
<
I only allotted the 2-operand space 1 major OpCode.
I only allotted the 3-operand space 1 major OpCode.
I still have 1/3rd of the Major OpCode space unallocated.
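<
The accounting behind these fractions can be sketched in a few lines of Python. The 6-bit major field and the six extension OpCodes come from the posts above; the count of 16 Disp9 load/store/LEA slots is an assumption read off the quoted listing, and the function name is hypothetical.

```python
from fractions import Fraction

MAJOR_BITS = 6  # both layouts are described as having a 6-bit major opcode


def major_share(slots, major_bits=MAJOR_BITS):
    """Fraction of the major opcode space used by `slots` major opcodes."""
    return Fraction(slots, 2 ** major_bits)


# Assumption: 16 Disp9 load/store/LEA forms, each costing one major opcode.
disp9_ops = major_share(16)

# Six extension OpCodes cover everything outside the Major OpCode group.
extension_ops = major_share(6)

# One major opcode each for the 2-operand and 3-operand spaces.
two_and_three_operand = major_share(2)
```

Under that assumption the Disp9 forms alone take 1/4 of the major space, while the six extension OpCodes take 3/32, and the 2-operand and 3-operand spaces together take 1/32.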


Re: Branch prediction hints

<66dfbd30-34c8-48a2-9174-1eeab337e696n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=17139&group=comp.arch#17139

X-Received: by 2002:a37:7c02:: with SMTP id x2mr5631883qkc.483.1621905464049;
Mon, 24 May 2021 18:17:44 -0700 (PDT)
X-Received: by 2002:a4a:b389:: with SMTP id p9mr20131222ooo.71.1621905463771;
Mon, 24 May 2021 18:17:43 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 24 May 2021 18:17:43 -0700 (PDT)
In-Reply-To: <s8hf4q$mrr$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:ed6f:f412:a8c7:989c;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:ed6f:f412:a8c7:989c
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <s8cmv1$1e7$1@dont-email.me>
<s8csfm$172$1@dont-email.me> <s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me>
<s8e60c$ca6$1@dont-email.me> <s8eabb$ig$1@dont-email.me> <13fa2553-eaf2-43df-a87a-3559a45d88a0n@googlegroups.com>
<s8ev0d$t2t$1@dont-email.me> <23b797e3-2809-4cd4-a5b4-2085a35f98cen@googlegroups.com>
<s8f9bo$dup$1@dont-email.me> <1430abb7-231c-4f12-985a-44b623c2fcafn@googlegroups.com>
<s8h31l$vv$1@dont-email.me> <4d114eb0-d49c-478c-b49c-7270cb39a687n@googlegroups.com>
<s8hf4q$mrr$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <66dfbd30-34c8-48a2-9174-1eeab337e696n@googlegroups.com>
Subject: Re: Branch prediction hints
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 25 May 2021 01:17:44 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Tue, 25 May 2021 01:17 UTC

On Monday, May 24, 2021 at 7:07:57 PM UTC-5, BGB wrote:
> On 5/24/2021 4:36 PM, MitchAlsup wrote:
> > On Monday, May 24, 2021 at 3:41:28 PM UTC-5, BGB wrote:
> >> On 5/24/2021 12:52 PM, MitchAlsup wrote:
> >>> On Sunday, May 23, 2021 at 11:16:58 PM UTC-5, BGB wrote:
> >>>> On 5/23/2021 8:32 PM, MitchAlsup wrote:
> >
> >>>>> Would I be out of line to state that this sounds like a poor starting point?
> >>>> Probably.
> >>>>
> >>>> It is more akin to designing for a 16-bit ISA, where it doesn't take
> >>>> much to eat through pretty much all of it.
> >> Clarification:
> >> I had meant, "it probably was a poor starting point" rather than "it was
> >> probably out of line"...
> > <
> > I knew that.......
> >>>>> <
> >>>>> My 66000 has 1/3rd of its Major OpCode space unallocated,
> >>>>> a bit less than 1/2 of its memory reference OpCode Space allocated,
> >>>>> a bit less than 1/2 of its 2-operand OpCode Space allocated,
> >>>>> a bit less than 1/128 of its 1-operand OpCode Space allocated,
> >>>>> and 1/4 of its 3-operand OpCode Space unallocated.
> >>>> Starts looking at it a little more, and realizing encoding space may be
> >>>> a more serious problem than I realized initially...
> >>>>
> >>>>
> >>>> I can't really map BJX2 to this new space, it just doesn't fit...
> >>>>
> >>>>
> >>>> Then again, maybe it might win more points with the "RISC means small
> >>>> ISA listing" crowd... Because one runs out of encoding bits before they
> >>>> can fit all that much into it...
> >>>>
> >>>>
> >>>> "Well, Imma define some Disp9 Load/Store Ops...",
> >>>> "Oh-Noes, that was 1/4 of the encoding space!",
> >>>> "How about some 3R Load/Store ops and 3R ALU ops and 2R space",
> >>>> "Now it at 1/2 of the opcode space!"
> >>> <
> >>> To be fair, I made a lot of these mistakes in Mc 88K, and corrected the
> >>> vast majority of them in My 66000.
> >>>>
> >>>> Then one has to struggle to fit some useful 3RI ALU ops, 2RI ops, and
> >>>> Branch ops, before realizing they are already basically out of encoding
> >>>> space...
> >>> <
> >>> The important thing to remember is that the most precious resource is
> >>> the Major OpCode space--and the reason is that this gives you access
> >>> to the other spaces.
> >>> <
> >>> In My 66000, the Major OpCode space consists of all 16-bit immediates,
> >>> the branches with IP relative offsets, and the extension OpCodes, of
> >>> which there are 6 {Predication, Shifts, 2R+Disp memory refs, 2-Operand,
> >>> 3-Operand, and 1-Operand.}
> >>> <
> >>> For all of the extended instructions, My 66000 has 3-bits to control the
> >>> signs of the operands and access to long immediates, and access to
> >>> 5-bit immediates in Src1. This supports things like 1<<k in a single instruction.
> >>> <
> >>> The second most important resource is the 3-operand space because
> >>> there are only 8 available entries and we need FMAC (single and double),
> >>> CMOV, and INSert.
> >>> <
> >>> The other spaces are so partially populated that one has a pretty free
> >>> rein.
> >> OK.
> >>
> >> In my initial layout, was starting from a 6-bit major space, with a
> >> 4-bit minor space for 3R ops, and an additional 6 bits for 2R ops.
> > <
> > 6-bit major:: check
> > I only got 3-bit 3-operand OpCode because I use 3 other bits for sign
> > control and access to immediates::
> > <
> > FMAC Rd, R1,±R2,±R3
> > So you can change the sign associated with multiplication or with adding
> > and get 4 flavors of MACing. This seems to work usefully well with the
> > bit field INSert instruction as bit-inversion rather than negation.
> >>
> >> Doing a Disp9 or Imm9 op would only have the 6-bit major opcode, which
> >> doesn't really go all that far.
> > <
> > I guess I am missing something, here, as I get Imm16 and DISP16<<2 for both
> > of these, and for unconditional branches (or CALL) I get DISP26<<2. Must have
> > something to do with packing or unpacking of WEX.....
> Making another layout attempt...
> This one keeps Imm9 where appropriate.
>
>
> Where, ppqq!=0100:
>
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0000 MOV.B Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0001 LEA.B Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0010 MOV.W Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0011 LEA.W Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0100 MOV.L Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0101 LEA.L Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0110 MOV.Q Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0111 LEA.Q Rn, (Rs, Rt)
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1000 MOV.B (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1001 MOVU.B (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1010 MOV.W (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1011 MOVU.W (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1100 MOV.L (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1101 MOVU.L (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1110 MOV.Q (Rs, Rt), Rn
> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1111 -
<
I know it looks like you grew up inside x86-land, but using
MOV r,(r,d) and MOV (r,d),r as LD and ST is simply confusing.
Load and Store are much more explanatory.
<
Also note, I only have 1 LEA although the index register can still be
shifted 0,1,2,3 places.
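<
A minimal Python sketch of a single LEA with a selectable index shift, which stands in for the four size-specific LEA.B/W/L/Q forms in the quoted listing. The displacement term and the 64-bit wraparound are assumptions; the post only states that the index can be shifted 0, 1, 2, or 3 places.

```python
MASK64 = (1 << 64) - 1


def lea(base, index, scale, disp=0):
    """Effective address = base + (index << scale) + disp, wrapped to 64 bits."""
    if scale not in (0, 1, 2, 3):
        raise ValueError("scale must be 0, 1, 2, or 3")
    return (base + (index << scale) + disp) & MASK64


byte_addr = lea(0x1000, 5, 0)   # index counts bytes
quad_addr = lea(0x1000, 5, 3)   # index counts 8-byte elements
```

The scale lives in a 2-bit field of one instruction rather than being spread across four opcodes.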
>
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0000 ADD Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0001 SUB Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0010 MULS Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0011 MULU Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0100 -
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0101 AND Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0110 OR Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0111 XOR Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1000 SHAD Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1001 SHLD Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1010 SHADQ Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1011 SHLDQ Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1100 ADC Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1101 SBB Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1110 DMULS Rs, Rt, Rn
> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1111 DMULU Rs, Rt, Rn
<
4 integer multiplies ?!? I only needed 2 {signed and unsigned}
I access carry through a different mechanism (saving those
instruction encodings.)
>
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0000 SHAD Rs, Imm6u, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0001 SHAD Rs, Imm6n, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0010 SHADQ Rs, Imm6u, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0011 SHADQ Rs, Imm6n, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0100 SHLD Rs, Imm6u, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0101 SHLD Rs, Imm6n, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0110 SHLDQ Rs, Imm6u, Rn
> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0111 SHLDQ Rs, Imm6n, Rn
<
I only have 4 shift instructions {SL and SR} in {signed and unsigned} flavors
and {constant or register} shift amounts. You have 3× as many.
>
> Then fill in FPU, ALUX, SIMD, ... operations.
>
> ...
>
> * ppqq-0011 00nn-nnnn ssss-ssoo oooo-oooo 1R and 2R spaces.
>
> ...
>
>
> This '00' block is basically where all the 3R and 2R ops go.
>
>
> * ppqq-0000 01nn-nnnn ssss-ss0i iiii-iiii MOV.B Rn, (Rs, Disp9)
> * ppqq-0000 01nn-nnnn ssss-ss1i iiii-iiii LEA.B Rn, (Rs, Disp9)
> * ppqq-0001 01nn-nnnn ssss-ss0i iiii-iiii MOV.W Rn, (Rs, Disp9)
> * ppqq-0001 01nn-nnnn ssss-ss1i iiii-iiii LEA.W Rn, (Rs, Disp9)
> * ppqq-0010 01nn-nnnn ssss-ss0i iiii-iiii MOV.L Rn, (Rs, Disp9)
> * ppqq-0010 01nn-nnnn ssss-ss1i iiii-iiii LEA.L Rn, (Rs, Disp9)
> * ppqq-0011 01nn-nnnn ssss-ss0i iiii-iiii MOV.Q Rn, (Rs, Disp9)
> * ppqq-0011 01nn-nnnn ssss-ss1i iiii-iiii LEA.Q Rn, (Rs, Disp9)
> * ppqq-0100 01nn-nnnn ssss-ss0i iiii-iiii MOV.B (Rs, Disp9), Rn
> * ppqq-0100 01nn-nnnn ssss-ss1i iiii-iiii MOVU.B (Rs, Disp9), Rn
> * ppqq-0101 01nn-nnnn ssss-ss0i iiii-iiii MOV.W (Rs, Disp9), Rn
> * ppqq-0101 01nn-nnnn ssss-ss1i iiii-iiii MOVU.W (Rs, Disp9), Rn
> * ppqq-0110 01nn-nnnn ssss-ss0i iiii-iiii MOV.L (Rs, Disp9), Rn
> * ppqq-0110 01nn-nnnn ssss-ss1i iiii-iiii MOVU.L (Rs, Disp9), Rn
> * ppqq-0111 01nn-nnnn ssss-ss0i iiii-iiii MOV.Q (Rs, Disp9), Rn
> * ppqq-0111 01nn-nnnn ssss-ss1i iiii-iiii -
> ...
> * ppqq-1111 0100-iiii iiii-iiii iiii-iiii BRA Disp20
> * ppqq-1111 0101-iiii iiii-iiii iiii-iiii BSR Disp20
>
>
> ...
>
> * ppqq-0000 10nn-nnnn ssss-ss0i iiii-iiii ADD Rs, Imm9u, Rn
> * ppqq-0000 10nn-nnnn ssss-ss1i iiii-iiii ADD Rs, Imm9n, Rn
> * ppqq-0001 10nn-nnnn ssss-ss0i iiii-iiii MULS Rs, Imm9u, Rn
> * ppqq-0001 10nn-nnnn ssss-ss1i iiii-iiii MULU Rs, Imm9n, Rn
> * ppqq-0010 10nn-nnnn ssss-ss0i iiii-iiii ADDSL Rs, Imm9u, Rn
> * ppqq-0010 10nn-nnnn ssss-ss1i iiii-iiii ADDSL Rs, Imm9n, Rn
> * ppqq-0011 10nn-nnnn ssss-ss0i iiii-iiii ADDUL Rs, Imm9u, Rn
> * ppqq-0011 10nn-nnnn ssss-ss1i iiii-iiii ADDUL Rs, Imm9n, Rn
> * ppqq-0100 -
> * ppqq-0101 10nn-nnnn ssss-ss0i iiii-iiii AND Rs, Imm9u, Rn
> * ppqq-0110 10nn-nnnn ssss-ss0i iiii-iiii OR Rs, Imm9u, Rn
> * ppqq-0111 10nn-nnnn ssss-ss0i iiii-iiii XOR Rs, Imm9u, Rn
>
> * ppqq-1000 10nn-nnnn 0000-rrii iiii-iiii CMPEQ Imm10u, Rn
> * ppqq-1000 10nn-nnnn 0001-rrii iiii-iiii CMPEQ Imm10n, Rn
> * ppqq-1000 10nn-nnnn 0010-rrii iiii-iiii CMPQEQ Imm10u, Rn
> * ppqq-1000 10nn-nnnn 0011-rrii iiii-iiii CMPQEQ Imm10n, Rn
> * ppqq-1000 10nn-nnnn 0100-rrii iiii-iiii CMPGT Imm10u, Rn
> * ppqq-1000 10nn-nnnn 0101-rrii iiii-iiii CMPGT Imm10n, Rn
> * ppqq-1000 10nn-nnnn 0110-rrii iiii-iiii CMPQGT Imm10u, Rn
> * ppqq-1000 10nn-nnnn 0111-rrii iiii-iiii CMPQGT Imm10n, Rn
>
> * ppqq-1000 10nn-nnnn 1000-rrii iiii-iiii CMPHI Imm10u, Rn
> * ppqq-1000 10nn-nnnn 1001-rrii iiii-iiii CMPHI Imm10n, Rn
> * ppqq-1000 10nn-nnnn 1010-rrii iiii-iiii CMPQHI Imm10u, Rn
> * ppqq-1000 10nn-nnnn 1011-rrii iiii-iiii CMPQHI Imm10n, Rn
> * ppqq-1000 10nn-nnnn 1100-rrii iiii-iiii CMPGE Imm10u, Rn
> * ppqq-1000 10nn-nnnn 1101-rrii iiii-iiii CMPGE Imm10n, Rn
> * ppqq-1000 10nn-nnnn 1110-rrii iiii-iiii CMPQGE Imm10u, Rn
> * ppqq-1000 10nn-nnnn 1111-rrii iiii-iiii CMPQGE Imm10n, Rn
<
I only have 2 (maybe 3) CMP instructions {int, FP<single,double>}
I can convert the result of a CMP instruction into TRUE and FALSE
with a bit field extract when needed (very seldom) and branch on
them by using branch on bit when needed (almost always). This
supports both languages that define True as 1 and languages that
define True as -1. Even my smallest machines CoIssue CMP BB
as a single unit of work.
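<
The idea can be sketched in Python as follows, assuming the compare writes a vector of condition bits into a register. The bit positions and function names are hypothetical; the post does not give the actual assignment.

```python
# Hypothetical bit positions for the compare result vector.
EQ, NE, LT, GE = 0, 1, 2, 3


def cmp_int(a, b):
    """One compare instruction producing several condition bits at once."""
    bits = 0
    if a == b:
        bits |= 1 << EQ
    if a != b:
        bits |= 1 << NE
    if a < b:
        bits |= 1 << LT
    if a >= b:
        bits |= 1 << GE
    return bits


def extract_unsigned(value, pos):
    """1-bit unsigned field extract: True is 1."""
    return (value >> pos) & 1


def extract_signed(value, pos):
    """1-bit signed field extract: True is -1 (the bit is sign-extended)."""
    return -((value >> pos) & 1)


def branch_on_bit(value, pos):
    """Branch taken if the selected condition bit is set."""
    return bool((value >> pos) & 1)


flags = cmp_int(3, 7)
```

One compare feeds any of the conditions, so separate CMPEQ/CMPGT/CMPHI/CMPGE opcodes are not needed; the choice of condition moves into the bit index of the extract or branch.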
>
> ...
>
> * ppqq-0000 11nn-nnnn iiii-iiii iiii-iiii LDI Imm16u, Rn
> * ppqq-0001 11nn-nnnn iiii-iiii iiii-iiii LDI Imm16n, Rn
<
I don't use any instructions to place a constant in a register, but
I do have a MOV instruction that has access to constant operand.
<
> * ppqq-0010 11nn-nnnn iiii-iiii iiii-iiii ADD Imm16u, Rn
> * ppqq-0011 11nn-nnnn iiii-iiii iiii-iiii ADD Imm16n, Rn
> * ppqq-0100 11nn-nnnn iiii-iiii iiii-iiii LDISH Imm16u, Rn
> * ppqq-0101 11nn-nnnn iiii-iiii iiii-iiii FLDCH Imm16u, Rn
<
I don't have any of the 9-bit or 10-bit forms, only 16-bit ones.
<
So it looks like you are wasting ~40%-ish of the available entropy
in your ISA.
>
> ...
>
>
> Where, ppqq==0100:
>
> * 0100-00ii iiii-iiii iiii-iiii iiii-iiii BRA Disp26s
> * 0100-01ii iiii-iiii iiii-iiii iiii-iiii BSR Disp26s
> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii Jumbo24_Immed
> * 0100-1001 iiii-iiii iiii-iiii iiii-iiii Jumbo24_Opcode
> * 0100-1010 iiii-iiii iiii-iiii iiii-iiii LDIZ Imm24, R0
> * 0100-1011 iiii-iiii iiii-iiii iiii-iiii LDIN Imm24, R0
> ...
>
>
> Likewise:
> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> 0100-1010 iiii-iiii iiii-iiii iiii-iiii BRA Abs48
>
> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> 0100-1011 iiii-iiii iiii-iiii iiii-iiii BSR Abs48
>
> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> 0000-0000 11nn-nnnn iiii-iiii iiii-iiii LDI Imm64, Rn
>
>
> Or, basically, I guess it works, but this encoding space hasn't exactly
> gone very far...


Re: Branch prediction hints

<s8hl1s$mao$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=17140&group=comp.arch#17140

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
Date: Mon, 24 May 2021 20:48:39 -0500
Organization: A noiseless patient Spider
Lines: 350
Message-ID: <s8hl1s$mao$1@dont-email.me>
References: <s8c0j2$q5d$1@newsreader4.netcologne.de>
<s8cmv1$1e7$1@dont-email.me> <s8csfm$172$1@dont-email.me>
<s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me>
<s8e60c$ca6$1@dont-email.me> <s8eabb$ig$1@dont-email.me>
<13fa2553-eaf2-43df-a87a-3559a45d88a0n@googlegroups.com>
<s8ev0d$t2t$1@dont-email.me>
<23b797e3-2809-4cd4-a5b4-2085a35f98cen@googlegroups.com>
<s8f9bo$dup$1@dont-email.me>
<1430abb7-231c-4f12-985a-44b623c2fcafn@googlegroups.com>
<s8h31l$vv$1@dont-email.me>
<4d114eb0-d49c-478c-b49c-7270cb39a687n@googlegroups.com>
<s8hf4q$mrr$1@dont-email.me>
<4185805d-4d21-408c-9169-f04bb7a5782cn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 25 May 2021 01:48:44 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="2660c744ddf30ee30e8ef7bd54ded436";
logging-data="22872"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19dcwgkeOyL+0keZlsJHwz4"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.2
Cancel-Lock: sha1:5M3Rrfrb/hsIIZTWZq2fsJ/1QPE=
In-Reply-To: <4185805d-4d21-408c-9169-f04bb7a5782cn@googlegroups.com>
Content-Language: en-US
 by: BGB - Tue, 25 May 2021 01:48 UTC

On 5/24/2021 7:26 PM, MitchAlsup wrote:
> On Monday, May 24, 2021 at 7:07:57 PM UTC-5, BGB wrote:
>> On 5/24/2021 4:36 PM, MitchAlsup wrote:
>>> On Monday, May 24, 2021 at 3:41:28 PM UTC-5, BGB wrote:
>>>> On 5/24/2021 12:52 PM, MitchAlsup wrote:
>>>>> On Sunday, May 23, 2021 at 11:16:58 PM UTC-5, BGB wrote:
>>>>>> On 5/23/2021 8:32 PM, MitchAlsup wrote:
>>>
>>>>>>> Would I be out of line to state that this sounds like a poor starting point?
>>>>>> Probably.
>>>>>>
>>>>>> It is more akin to designing for a 16-bit ISA, where it doesn't take
>>>>>> much to eat through pretty much all of it.
>>>> Clarification:
>>>> I had meant, "it probably was a poor starting point" rather than "it was
>>>> probably out of line"...
>>> <
>>> I knew that.......
>>>>>>> <
>>>>>>> My 66000 has 1/3rd of its Major OpCode space unallocated,
>>>>>>> a bit less than 1/2 of its memory reference OpCode Space allocated,
>>>>>>> a bit less than 1/2 of its 2-operand OpCode Space allocated,
>>>>>>> a bit less than 1/128 of its 1-operand OpCode Space allocated,
>>>>>>> and 1/4 of its 3-operand OpCode Space unallocated.
>>>>>> Starts looking at it a little more, and realizing encoding space may be
>>>>>> a more serious problem than I realized initially...
>>>>>>
>>>>>>
>>>>>> I can't really map BJX2 to this new space, it just doesn't fit...
>>>>>>
>>>>>>
>>>>>> Then again, maybe it might win more points with the "RISC means small
>>>>>> ISA listing" crowd... Because one runs out of encoding bits before they
>>>>>> can fit all that much into it...
>>>>>>
>>>>>>
>>>>>> "Well, Imma define some Disp9 Load/Store Ops...",
>>>>>> "Oh-Noes, that was 1/4 of the encoding space!",
>>>>>> "How about some 3R Load/Store ops and 3R ALU ops and 2R space",
>>>>>> "Now it's at 1/2 of the opcode space!"
>>>>> <
>>>>> To be fair, I made a lot of these mistakes in Mc 88K, and corrected the
>>>>> vast majority of them in My 66000.
>>>>>>
>>>>>> Then one has to struggle to fit some useful 3RI ALU ops, 2RI ops, and
>>>>>> Branch ops, before realizing they are already basically out of encoding
>>>>>> space...
>>>>> <
>>>>> The important thing to remember is that the most precious resource is
>>>>> the Major OpCode space--and the reason is that this gives you access
>>>>> to the other spaces.
>>>>> <
>>>>> In My 66000, the Major OpCode space consists of all 16-bit immediates,
>>>>> the branches with IP relative offsets, and the extension OpCodes, of
>>>>> which there are 6 {Predication, Shifts, 2R+Disp memory refs, 2-Operand,
>>>>> 3-Operand, and 1-Operand.}
>>>>> <
>>>>> For all of the extended instructions, My 66000 has 3-bits to control the
>>>>> signs of the operands and access to long immediates, and access to
>>>>> 5-bit immediates in Src1. This supports things like 1<<k in a single instruction.
>>>>> <
>>>>> The second most important resource is the 3-operand space because
>>>>> there are only 8 available entries and we need FMAC (single and double),
>>>>> CMOV, and INSert.
>>>>> <
>>>>> The other spaces are so partially populated that one has a pretty free
>>>>> rein.
>>>> OK.
>>>>
>>>> In my initial layout, was starting from a 6-bit major space, with a
>>>> 4-bit minor space for 3R ops, and an additional 6 bits for 2R ops.
>>> <
>>> 6-bit major:: check
>>> I only got 3-bit 3-operand OpCode because I use 3 other bits for sign
>>> control and access to immediates::
>>> <
>>> FMAC Rd, R1,±R2,±R3
>>> So you can change the sign associated with multiplication or with adding
>>> and get 4 flavors of MACing. This seems to work usefully well with the
>>> bit field INSert instruction as bit-inversion rather than negation.
>>>>
>>>> Doing a Disp9 or Imm9 op would only have the 6-bit major opcode, which
>>>> doesn't really go all that far.
>>> <
>>> I guess I am missing something, here, as I get Imm16 and DISP16<<2 for both
>>> of these, and for unconditional branches (or CALL) I get DISP26<<2. Must have
>>> something to do with packing or unpacking of WEX.....
>> Making another layout attempt...
>> This one keeps Imm9 where appropriate.
>>
>>
>> Where, ppqq!=0100:
>>
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0000 MOV.B Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0001 LEA.B Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0010 MOV.W Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0011 LEA.W Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0100 MOV.L Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0101 LEA.L Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0110 MOV.Q Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0111 LEA.Q Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1000 MOV.B (Rs, Rt), Rn
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1001 MOVU.B (Rs, Rt), Rn
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1010 MOV.W (Rs, Rt), Rn
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1011 MOVU.W (Rs, Rt), Rn
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1100 MOV.L (Rs, Rt), Rn
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1101 MOVU.L (Rs, Rt), Rn
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1110 MOV.Q (Rs, Rt), Rn
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1111 -
> <
> 6-bit register specifiers are eating you alive.
> It also appears you are using W for 16-bit containers.

Yep.

With the combination of the 6-bit registers and ppqq field, I end up
with several bits less than I had with the 16/32 length encoding and
5-bit registers.
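
As a rough sketch of where the bits go in the 3R form quoted above (`ppqq-0000 00nn-nnnn ssss-sstt tttt-0000`): the 4-bit ppqq field and three 6-bit register specifiers take 22 of the 32 bits, leaving 10 for opcode selection. The decoder below assumes the leftmost character of the listing is bit 31; that bit numbering and the field names are assumptions for illustration.

```python
def decode_3r(word: int) -> dict:
    """Split a 32-bit word into the fields of the proposed 3R layout.

    Assumes the listing is written most-significant bit first.
    """
    return {
        "ppqq": (word >> 28) & 0xF,    # predicate / bundling bits
        "major": (word >> 24) & 0xF,   # 4-bit major opcode
        "block": (word >> 22) & 0x3,   # 00 = 3R/2R block, 01/10/11 = immediate forms
        "rn": (word >> 16) & 0x3F,     # 6-bit destination register
        "rs": (word >> 10) & 0x3F,     # 6-bit source register
        "rt": (word >> 4) & 0x3F,      # 6-bit source register
        "minor": word & 0xF,           # 4-bit minor opcode
    }


def encode_3r(ppqq, major, rn, rs, rt, minor) -> int:
    """Inverse of decode_3r for the '00' block."""
    return (ppqq << 28) | (major << 24) | (rn << 16) | (rs << 10) | (rt << 4) | minor


# ADD Rs, Rt, Rn is major 0001, minor 0000 in the listing.
add_word = encode_3r(ppqq=0, major=0b0001, rn=3, rs=1, rt=2, minor=0b0000)
fields = decode_3r(add_word)

# Bits left for opcode selection: major (4) + block (2) + minor (4).
opcode_bits = 32 - 4 - 3 * 6
```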

B: Byte
W: Word (16-bit)
L: DWord (32-bit)
Q: QWord (64-bit)
X: XWord (128-bit)

At least, assuming I didn't change the naming scheme.
I missed the XWord ops; in BJX2 these are shoved off in various off corners.

If I were to do anything like this, it would make sense to keep 128-bit
XWord operations and similar.

As noted, the idea for this ISA was to have 64 GPRs, such that in-theory
modulo loop scheduling could be used without blowing out the register
budget quite as quickly. But, not quite as absurd as the 128 registers
in IA-64 (and, with smaller instructions, so probably better code density).

It is possible I could map PC, LR, and GBR into the GPR space.
Eg:
R60=SP (Stack Pointer)
R61=LR (Link Register)
R62=GBR (Global Base Register)
R63=PC (Program Counter)

It is possible that some of the high-order bits of PC could be aliased
to several of the SR bits, and LR would save these bits. Part of this is
because these are either frequently accessed in prolog/epilog sequences
or needed as base registers.

>>
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0000 ADD Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0001 SUB Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0010 MULS Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0011 MULU Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0100 -
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0101 AND Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0110 OR Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0111 XOR Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1000 SHAD Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1001 SHLD Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1010 SHADQ Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1011 SHLDQ Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1100 ADC Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1101 SBB Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1110 DMULS Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1111 DMULU Rs, Rt, Rn
>>
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0000 SHAD Rs, Imm6u, Rn
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0001 SHAD Rs, Imm6n, Rn
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0010 SHADQ Rs, Imm6u, Rn
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0011 SHADQ Rs, Imm6n, Rn
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0100 SHLD Rs, Imm6u, Rn
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0101 SHLD Rs, Imm6n, Rn
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0110 SHLDQ Rs, Imm6u, Rn
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0111 SHLDQ Rs, Imm6n, Rn
>>
>> Then fill in FPU, ALUX, SIMD, ... operations.
>>
>> ...
>>
>> * ppqq-0011 00nn-nnnn ssss-ssoo oooo-oooo 1R and 2R spaces.
>>
>> ...
>>
>>
>> This '00' block is basically where all the 3R and 2R ops go.
>>
>>
>> * ppqq-0000 01nn-nnnn ssss-ss0i iiii-iiii MOV.B Rn, (Rs, Disp9)
>> * ppqq-0000 01nn-nnnn ssss-ss1i iiii-iiii LEA.B Rn, (Rs, Disp9)
>> * ppqq-0001 01nn-nnnn ssss-ss0i iiii-iiii MOV.W Rn, (Rs, Disp9)
>> * ppqq-0001 01nn-nnnn ssss-ss1i iiii-iiii LEA.W Rn, (Rs, Disp9)
>> * ppqq-0010 01nn-nnnn ssss-ss0i iiii-iiii MOV.L Rn, (Rs, Disp9)
>> * ppqq-0010 01nn-nnnn ssss-ss1i iiii-iiii LEA.L Rn, (Rs, Disp9)
>> * ppqq-0011 01nn-nnnn ssss-ss0i iiii-iiii MOV.Q Rn, (Rs, Disp9)
>> * ppqq-0011 01nn-nnnn ssss-ss1i iiii-iiii LEA.Q Rn, (Rs, Disp9)
>> * ppqq-0100 01nn-nnnn ssss-ss0i iiii-iiii MOV.B (Rs, Disp9), Rn
>> * ppqq-0100 01nn-nnnn ssss-ss1i iiii-iiii MOVU.B (Rs, Disp9), Rn
>> * ppqq-0101 01nn-nnnn ssss-ss0i iiii-iiii MOV.W (Rs, Disp9), Rn
>> * ppqq-0101 01nn-nnnn ssss-ss1i iiii-iiii MOVU.W (Rs, Disp9), Rn
>> * ppqq-0110 01nn-nnnn ssss-ss0i iiii-iiii MOV.L (Rs, Disp9), Rn
>> * ppqq-0110 01nn-nnnn ssss-ss1i iiii-iiii MOVU.L (Rs, Disp9), Rn
>> * ppqq-0111 01nn-nnnn ssss-ss0i iiii-iiii MOV.Q (Rs, Disp9), Rn
>> * ppqq-0111 01nn-nnnn ssss-ss1i iiii-iiii -
>> ...
>> * ppqq-1111 0100-iiii iiii-iiii iiii-iiii BRA Disp20
>> * ppqq-1111 0101-iiii iiii-iiii iiii-iiii BSR Disp20
>>
>>
>> ...
>>
>> * ppqq-0000 10nn-nnnn ssss-ss0i iiii-iiii ADD Rs, Imm9u, Rn
>> * ppqq-0000 10nn-nnnn ssss-ss1i iiii-iiii ADD Rs, Imm9n, Rn
>> * ppqq-0001 10nn-nnnn ssss-ss0i iiii-iiii MULS Rs, Imm9u, Rn
>> * ppqq-0001 10nn-nnnn ssss-ss1i iiii-iiii MULU Rs, Imm9n, Rn
>> * ppqq-0010 10nn-nnnn ssss-ss0i iiii-iiii ADDSL Rs, Imm9u, Rn
>> * ppqq-0010 10nn-nnnn ssss-ss1i iiii-iiii ADDSL Rs, Imm9n, Rn
>> * ppqq-0011 10nn-nnnn ssss-ss0i iiii-iiii ADDUL Rs, Imm9u, Rn
>> * ppqq-0011 10nn-nnnn ssss-ss1i iiii-iiii ADDUL Rs, Imm9n, Rn
>> * ppqq-0100 -
>> * ppqq-0101 10nn-nnnn ssss-ss0i iiii-iiii AND Rs, Imm9u, Rn
>> * ppqq-0110 10nn-nnnn ssss-ss0i iiii-iiii OR Rs, Imm9u, Rn
>> * ppqq-0111 10nn-nnnn ssss-ss0i iiii-iiii XOR Rs, Imm9u, Rn
>>
>> * ppqq-1000 10nn-nnnn 0000-rrii iiii-iiii CMPEQ Imm10u, Rn
>> * ppqq-1000 10nn-nnnn 0001-rrii iiii-iiii CMPEQ Imm10n, Rn
>> * ppqq-1000 10nn-nnnn 0010-rrii iiii-iiii CMPQEQ Imm10u, Rn
>> * ppqq-1000 10nn-nnnn 0011-rrii iiii-iiii CMPQEQ Imm10n, Rn
>> * ppqq-1000 10nn-nnnn 0100-rrii iiii-iiii CMPGT Imm10u, Rn
>> * ppqq-1000 10nn-nnnn 0101-rrii iiii-iiii CMPGT Imm10n, Rn
>> * ppqq-1000 10nn-nnnn 0110-rrii iiii-iiii CMPQGT Imm10u, Rn
>> * ppqq-1000 10nn-nnnn 0111-rrii iiii-iiii CMPQGT Imm10n, Rn
>>
>> * ppqq-1000 10nn-nnnn 1000-rrii iiii-iiii CMPHI Imm10u, Rn
>> * ppqq-1000 10nn-nnnn 1001-rrii iiii-iiii CMPHI Imm10n, Rn
>> * ppqq-1000 10nn-nnnn 1010-rrii iiii-iiii CMPQHI Imm10u, Rn
>> * ppqq-1000 10nn-nnnn 1011-rrii iiii-iiii CMPQHI Imm10n, Rn
>> * ppqq-1000 10nn-nnnn 1100-rrii iiii-iiii CMPGE Imm10u, Rn
>> * ppqq-1000 10nn-nnnn 1101-rrii iiii-iiii CMPGE Imm10n, Rn
>> * ppqq-1000 10nn-nnnn 1110-rrii iiii-iiii CMPQGE Imm10u, Rn
>> * ppqq-1000 10nn-nnnn 1111-rrii iiii-iiii CMPQGE Imm10n, Rn
>>
>> ...
>>
>> * ppqq-0000 11nn-nnnn iiii-iiii iiii-iiii LDI Imm16u, Rn
>> * ppqq-0001 11nn-nnnn iiii-iiii iiii-iiii LDI Imm16n, Rn
>> * ppqq-0010 11nn-nnnn iiii-iiii iiii-iiii ADD Imm16u, Rn
>> * ppqq-0011 11nn-nnnn iiii-iiii iiii-iiii ADD Imm16n, Rn
>> * ppqq-0100 11nn-nnnn iiii-iiii iiii-iiii LDISH Imm16u, Rn
>> * ppqq-0101 11nn-nnnn iiii-iiii iiii-iiii FLDCH Imm16u, Rn
>>
>> ...
>>
>>
>> Where, ppqq==0100:
>>
>> * 0100-00ii iiii-iiii iiii-iiii iiii-iiii BRA Disp26s
>> * 0100-01ii iiii-iiii iiii-iiii iiii-iiii BSR Disp26s
>> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii Jumbo24_Immed
>> * 0100-1001 iiii-iiii iiii-iiii iiii-iiii Jumbo24_Opcode
>> * 0100-1010 iiii-iiii iiii-iiii iiii-iiii LDIZ Imm24, R0
>> * 0100-1011 iiii-iiii iiii-iiii iiii-iiii LDIN Imm24, R0
>> ...
>>
>>
>> Likewise:
>> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
>> 0100-1010 iiii-iiii iiii-iiii iiii-iiii BRA Abs48
>>
>> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
>> 0100-1011 iiii-iiii iiii-iiii iiii-iiii BSR Abs48
>>
>> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
>> 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
>> 0000-0000 11nn-nnnn iiii-iiii iiii-iiii LDI Imm64, Rn
> <
> It is my opinion, that you are holding onto the notion of using instructions
> to paste bits together, rather than using the momentum of the instruction
> stream to provide constants to operand fields avoiding having to execute
> these instructions.


Re: Branch prediction hints

<s8hq5a$m75$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=17142&group=comp.arch#17142

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
Date: Mon, 24 May 2021 22:15:49 -0500
Organization: A noiseless patient Spider
Lines: 332
Message-ID: <s8hq5a$m75$1@dont-email.me>
References: <s8c0j2$q5d$1@newsreader4.netcologne.de>
<s8cmv1$1e7$1@dont-email.me> <s8csfm$172$1@dont-email.me>
<s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me>
<s8e60c$ca6$1@dont-email.me> <s8eabb$ig$1@dont-email.me>
<13fa2553-eaf2-43df-a87a-3559a45d88a0n@googlegroups.com>
<s8ev0d$t2t$1@dont-email.me>
<23b797e3-2809-4cd4-a5b4-2085a35f98cen@googlegroups.com>
<s8f9bo$dup$1@dont-email.me>
<1430abb7-231c-4f12-985a-44b623c2fcafn@googlegroups.com>
<s8h31l$vv$1@dont-email.me>
<4d114eb0-d49c-478c-b49c-7270cb39a687n@googlegroups.com>
<s8hf4q$mrr$1@dont-email.me>
<66dfbd30-34c8-48a2-9174-1eeab337e696n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 25 May 2021 03:15:54 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="2660c744ddf30ee30e8ef7bd54ded436";
logging-data="22757"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/9JYDKQjzOTQWvVPzJEJaL"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.2
Cancel-Lock: sha1:9fMqiVm54vfDaDJIjViODtYzLEk=
In-Reply-To: <66dfbd30-34c8-48a2-9174-1eeab337e696n@googlegroups.com>
Content-Language: en-US
 by: BGB - Tue, 25 May 2021 03:15 UTC

On 5/24/2021 8:17 PM, MitchAlsup wrote:
> On Monday, May 24, 2021 at 7:07:57 PM UTC-5, BGB wrote:
>> On 5/24/2021 4:36 PM, MitchAlsup wrote:
>>> On Monday, May 24, 2021 at 3:41:28 PM UTC-5, BGB wrote:
>>>> On 5/24/2021 12:52 PM, MitchAlsup wrote:
>>>>> On Sunday, May 23, 2021 at 11:16:58 PM UTC-5, BGB wrote:
>>>>>> On 5/23/2021 8:32 PM, MitchAlsup wrote:
>>>
>>>>>>> Would I be out of line to state that this sounds like a poor starting point?
>>>>>> Probably.
>>>>>>
>>>>>> It is more akin to designing for a 16-bit ISA, where it doesn't take
>>>>>> much to eat through pretty much all of it.
>>>> Clarification:
>>>> I had meant, "it probably was a poor starting point" rather than "it was
>>>> probably out of line"...
>>> <
>>> I knew that.......
>>>>>>> <
>>>>>>> My 66000 has 1/3rd of its Major OpCode space unallocated,
>>>>>>> a bit less than 1/2 of its memory reference OpCode Space allocated,
>>>>>>> a bit less than 1/2 of its 2-operand OpCode Space allocated,
>>>>>>> a bit less than 1/128 of its 1-operand OpCode Space allocated,
>>>>>>> and 1/4 of its 3-operand OpCode Space unallocated.
>>>>>> Starts looking at it a little more, and realizing encoding space may be
>>>>>> a more serious problem than I realized initially...
>>>>>>
>>>>>>
>>>>>> I can't really map BJX2 to this new space, it just doesn't fit...
>>>>>>
>>>>>>
>>>>>> Then again, maybe it might win more points with the "RISC means small
>>>>>> ISA listing" crowd... Because one runs out of encoding bits before they
>>>>>> can fit all that much into it...
>>>>>>
>>>>>>
>>>>>> "Well, Imma define some Disp9 Load/Store Ops...",
>>>>>> "Oh-Noes, that was 1/4 of the encoding space!",
>>>>>> "How about some 3R Load/Store ops and 3R ALU ops and 2R space",
>>>>>> "Now it's at 1/2 of the opcode space!"
>>>>> <
>>>>> To be fair, I made a lot of these mistakes in Mc 88K, and corrected the
>>>>> vast majority of them in My 66000.
>>>>>>
>>>>>> Then one has to struggle to fit some useful 3RI ALU ops, 2RI ops, and
>>>>>> Branch ops, before realizing they are already basically out of encoding
>>>>>> space...
>>>>> <
>>>>> The important thing to remember is that the most precious resource is
>>>>> the Major OpCode space--and the reason is that this gives you access
>>>>> to the other spaces.
>>>>> <
>>>>> In My 66000, the Major OpCode space consists of all 16-bit immediates,
>>>>> the branches with IP relative offsets, and the extension OpCodes, of
>>>>> which there are 6 {Predication, Shifts, 2R+Disp memory refs, 2-Operand,
>>>>> 3-Operand, and 1-Operand.}
>>>>> <
>>>>> For all of the extended instructions, My 66000 has 3-bits to control the
>>>>> signs of the operands and access to long immediates, and access to
>>>>> 5-bit immediates in Src1. This supports things like 1<<k in a single instruction.
>>>>> <
>>>>> The second most important resource is the 3-operand space because
>>>>> there are only 8 available entries and we need FMAC (single and double),
>>>>> CMOV, and INSert.
>>>>> <
>>>>> The other spaces are so partially populated that one has a pretty free
>>>>> rein.
>>>> OK.
>>>>
>>>> In my initial layout, was starting from a 6-bit major space, with a
>>>> 4-bit minor space for 3R ops, and an additional 6 bits for 2R ops.
>>> <
>>> 6-bit major:: check
>>> I only got 3-bit 3-operand OpCode because I use 3 other bits for sign
>>> control and access to immediates::
>>> <
>>> FMAC Rd, R1,±R2,±R3
>>> So you can change the sign associated with multiplication or with adding
>>> and get 4 flavors of MACing. This seems to work usefully well with the
>>> bit field INSert instruction as bit-inversion rather than negation.
>>>>
>>>> Doing a Disp9 or Imm9 op would only have the 6-bit major opcode, which
>>>> doesn't really go all that far.
>>> <
>>> I guess I am missing something, here, as I get Imm16 and DISP16<<2 for both
>>> of these, and for unconditional branches (or CALL) I get DISP26<<2. Must have
>>> something to do with packing or unpacking of WEX.....
>> Making another layout attempt...
>> This one keeps Imm9 where appropriate.
>>
>>
>> Where, ppqq!=0100:
>>
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0000 MOV.B Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0001 LEA.B Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0010 MOV.W Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0011 LEA.W Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0100 MOV.L Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0101 LEA.L Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0110 MOV.Q Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0111 LEA.Q Rn, (Rs, Rt)
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1000 MOV.B (Rs, Rt), Rn
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1001 MOVU.B (Rs, Rt), Rn
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1010 MOV.W (Rs, Rt), Rn
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1011 MOVU.W (Rs, Rt), Rn
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1100 MOV.L (Rs, Rt), Rn
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1101 MOVU.L (Rs, Rt), Rn
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1110 MOV.Q (Rs, Rt), Rn
>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1111 -
> <
> I know it looks like you grew up inside x86-land, but using
> MOV r,(r,d) and MOV (r,d),r as LD and ST is simply confusing.
> Load and Store are much more explanatory.
> <

This is basically a similar pattern to what SH used, including putting
the destination register as the rightmost register.

Granted, I could use LD.x / ST.x as well...

> Also note, I only have 1 LEA although the index register can still be
> shifted 0,1,2,3 places.

Yeah, the different LEA ops are for the different scale values.
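
A small sketch of what "different scale values" could mean here, assuming each LEA.x scales the index register by its element size (B=1, W=2, L=4, Q=8, matching the 0..3 shift range mentioned in the quote). The table and function names are illustrative only.

```python
# Assumed mapping from size suffix to index shift (element size = 1 << shift).
SCALE_SHIFT = {"B": 0, "W": 1, "L": 2, "Q": 3}
MASK64 = (1 << 64) - 1


def lea(size: str, base: int, index: int) -> int:
    """Effective address for LEA.<size> Rn, (Rs, Rt): base + (index << shift)."""
    return (base + (index << SCALE_SHIFT[size])) & MASK64


addr_byte = lea("B", 0x1000, 5)   # index used as a byte offset
addr_qword = lea("Q", 0x1000, 5)  # index scaled by 8
```

With a single LEA plus a 2-bit scale field, as described in the quote, the same four results come from one opcode instead of four.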

>>
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0000 ADD Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0001 SUB Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0010 MULS Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0011 MULU Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0100 -
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0101 AND Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0110 OR Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0111 XOR Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1000 SHAD Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1001 SHLD Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1010 SHADQ Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1011 SHLDQ Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1100 ADC Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1101 SBB Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1110 DMULS Rs, Rt, Rn
>> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1111 DMULU Rs, Rt, Rn
> <
> 4 integer multiplies ?!? I only needed 2 {signed and unsigned}
> I access carry through a different mechanism (saving those
> instruction encodings.)

MULS/MULU do narrow multiply, which sign/zero extends the result.
DMULS/DMULU are widening.
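
To make the narrow/widening distinction concrete, here is a sketch that assumes 32-bit source operands held in 64-bit registers. The operand width is an assumption; the post only says that the narrow forms sign/zero extend the result and the D forms widen.

```python
MASK32 = 0xFFFFFFFF


def sext32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def muls(a: int, b: int) -> int:
    """Narrow signed multiply: low 32 bits of the product, sign-extended."""
    return sext32(sext32(a) * sext32(b))


def mulu(a: int, b: int) -> int:
    """Narrow unsigned multiply: low 32 bits of the product, zero-extended."""
    return ((a & MASK32) * (b & MASK32)) & MASK32


def dmuls(a: int, b: int) -> int:
    """Widening signed multiply: full 64-bit product of 32-bit inputs."""
    return sext32(a) * sext32(b)


def dmulu(a: int, b: int) -> int:
    """Widening unsigned multiply: full 64-bit product of 32-bit inputs."""
    return (a & MASK32) * (b & MASK32)
```

The pairs differ only when the product overflows 32 bits: `muls(0x10000, 0x10000)` wraps, while `dmuls(0x10000, 0x10000)` keeps the full product.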

>>
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0000 SHAD Rs, Imm6u, Rn
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0001 SHAD Rs, Imm6n, Rn
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0010 SHADQ Rs, Imm6u, Rn
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0011 SHADQ Rs, Imm6n, Rn
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0100 SHLD Rs, Imm6u, Rn
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0101 SHLD Rs, Imm6n, Rn
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0110 SHLDQ Rs, Imm6u, Rn
>> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0111 SHLDQ Rs, Imm6n, Rn
> <
> I only have 4 shift instructions {SL and SR} in {signed and unsigned} flavors
> and {constant or register} shift amounts. You have 3× as many

Imm6u is left shift, Imm6n is right shift.
These are for arithmetic, logical, and for both 32 and 64 bit.
The 32-bit variants may also sign or zero-extend the results.
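
A sketch of the shift semantics as described, under these assumptions: Imm6n is a 6-bit field extended with ones (so it encodes -64..-1), a non-negative amount shifts left and a negative amount shifts right, SHAD/SHADQ are arithmetic and SHLD/SHLDQ are logical. Out-of-range amounts for the 32-bit forms are not modeled.

```python
def imm6n(field: int) -> int:
    """Assumed decoding of Imm6n: 6-bit field extended with ones (-64..-1)."""
    return (field & 0x3F) - 64


def _shift(value: int, amount: int, bits: int, arithmetic: bool) -> int:
    mask = (1 << bits) - 1
    value &= mask
    if arithmetic and value >> (bits - 1):
        value -= 1 << bits             # reinterpret the input as signed
    # Python's >> on a negative int is an arithmetic shift.
    result = value << amount if amount >= 0 else value >> -amount
    result &= mask
    if arithmetic and result >> (bits - 1):
        result -= 1 << bits            # sign-extend the result
    return result


def shad(value: int, amount: int) -> int:
    """32-bit arithmetic shift, result sign-extended."""
    return _shift(value, amount, 32, True)


def shld(value: int, amount: int) -> int:
    """32-bit logical shift, result zero-extended."""
    return _shift(value, amount, 32, False)


def shadq(value: int, amount: int) -> int:
    """64-bit arithmetic shift."""
    return _shift(value, amount, 64, True)


def shldq(value: int, amount: int) -> int:
    """64-bit logical shift."""
    return _shift(value, amount, 64, False)
```

Under this reading the eight immediate encodings cover left and right, arithmetic and logical, 32-bit and 64-bit with only four mnemonics.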


Re: Branch prediction hints

<s8i7nc$a26$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=17147&group=comp.arch#17147

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
Date: Tue, 25 May 2021 09:07:24 +0200
Organization: A noiseless patient Spider
Lines: 25
Message-ID: <s8i7nc$a26$1@dont-email.me>
References: <s8c0j2$q5d$1@newsreader4.netcologne.de>
<s8dcbt$7f4$1@gioia.aioe.org>
<36626ffe-f5d8-4a62-af27-310684375561n@googlegroups.com>
<jwvfsyd1jio.fsf-monnier+comp.arch@gnu.org>
<ad6c1950-c7df-4ec0-b3ab-20550baccb67n@googlegroups.com>
<12fa6b22-9cf8-4dd0-813d-1b8b21058c50n@googlegroups.com>
<s8h1je$dhj$1@dont-email.me> <s8h69l$2tt$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 25 May 2021 07:07:24 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="e896b81a3d54fde41039e063271fd774";
logging-data="10310"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/NRV7Ng/U/WiPUQPm7/6gJ6OsJbMyg0qo="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.8.1
Cancel-Lock: sha1:H+KDLtLwhE0XXSBuEo9SaDB14LI=
In-Reply-To: <s8h69l$2tt$1@newsreader4.netcologne.de>
Content-Language: en-US
 by: Marcus - Tue, 25 May 2021 07:07 UTC

On 2021-05-24, Thomas Koenig wrote:
> Marcus <m.delete@this.bitsnbites.eu> schrieb:
>
>> BTW, apart from VVM, are there any good examples of ISAs with loop
>> instructions that are easy to predict ahead of time (thus effectively
>> unrolling loops, eliminating compares/branches, and reducing branch
>> predictor load)?
>
> POWER might count with its separate count register (sorry for
> the pun).
>
> It has instructions like Decrement the CTR, then branch if the
> decremented CTR equals or does not equal zero. It is also possible
> to combine this with conditions, just to make sure the
> branch predictors still have something to do :-)
>
> Still, it is a pretty good match for Fortran's DO loops
> or any kind of loop where you know the number of iterations
> beforehand.
>

Sounds similar to the MC68000 instruction DBF Dn,label (decrement Dn and
branch to label if Dn != -1). It also had an optional condition (DBcc).

/Marcus
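
The two counted-loop styles mentioned in this exchange can be contrasted with a small simulation. This is only a sketch: it models DBF as decrementing the low 16 bits of Dn and falling through at -1, and the CTR style as "decrement, branch if nonzero". The function names are made up for illustration.

```python
def dbf_iterations(dn: int) -> int:
    """Loop-body executions for a loop closed by DBF Dn,label."""
    count = dn & 0xFFFF              # DBcc counts in the low 16 bits of Dn
    iterations = 0
    while True:
        iterations += 1              # loop body runs
        count = (count - 1) & 0xFFFF
        if count == 0xFFFF:          # counter reached -1: fall through
            return iterations


def ctr_iterations(ctr: int) -> int:
    """Loop-body executions for a 'decrement CTR, branch if nonzero' loop."""
    if ctr < 1:
        raise ValueError("this sketch does not model CTR wrap-around")
    iterations = 0
    while True:
        iterations += 1              # loop body runs
        ctr -= 1
        if ctr == 0:                 # counter hit zero: fall through
            return iterations
```

In both cases the trip count is fixed once the counter is loaded (N+1 for DBF, N for the CTR form), which is what makes the loop exit predictable ahead of time.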

Re: Branch prediction hints

<7R7rI.95638$iT.92788@fx19.iad>


https://www.novabbs.com/devel/article-flat.php?id=17149&group=comp.arch#17149

Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!newsfeed.xs4all.nl!newsfeed7.news.xs4all.nl!news-out.netnews.com!news.alt.net!fdc3.netnews.com!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx19.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <Z7tqI.61928$N%1.35599@fx28.iad> <c42f2bb0-6920-44d2-8877-cff238443ca7n@googlegroups.com> <s8earo$a9k$1@dont-email.me> <e977b069-6ce0-46bf-ac2a-f7fb85ef9f6cn@googlegroups.com> <sBRqI.613761$nn2.535143@fx48.iad> <7fb7aaf5-f3d2-42c0-b258-f670603330f5n@googlegroups.com>
In-Reply-To: <7fb7aaf5-f3d2-42c0-b258-f670603330f5n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 79
Message-ID: <7R7rI.95638$iT.92788@fx19.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Tue, 25 May 2021 14:29:23 UTC
Date: Tue, 25 May 2021 10:29:01 -0400
X-Received-Bytes: 4564
 by: EricP - Tue, 25 May 2021 14:29 UTC

MitchAlsup wrote:
> On Monday, May 24, 2021 at 12:43:55 PM UTC-5, EricP wrote:
>> MitchAlsup wrote:
>>> On Sunday, May 23, 2021 at 2:36:26 PM UTC-5, Ivan Godard wrote:
>>>> On 5/23/2021 9:07 AM, MitchAlsup wrote:
>>>>> On Sunday, May 23, 2021 at 8:54:03 AM UTC-5, EricP wrote:
>>>>>> If you don't have a hint to explicitly block speculation at the
>>>>>> branch then the design would have to use more complicated and
>>>>>> probably error prone dynamic logic to "deduce" what to do.
>>>>> <
>>>>> You do not want Naked memory refs to be used to set up or complete
>>>>> ATOMIC events. You need to "mark" their participation in the event
>>>>> so the machine knows that such an event is going on from the
>>>>> outset.
>>> <
>>>> This also permits you to have intra-transaction stores that are not part
>>>> of the transaction, say for logging and debugging, where you don't lock
>>>> the log memory.
>>> <
>>> Yes, exactly--and that is where the word "participating" came into use
>>> when doing the ASF at AMD. Since you CANNOT single step through
>>> an ATOMIC event, you need some way of figuring out what is going
>>> on; lobbing registers into a memory buffer for later printing is the
>>> prescribed means.
>>> <
>> I also did not want to checkpoint the register set at the start
>> of a transaction and roll them all back on abort.
>> I want memory protected but not registers.
>>
>> When I played about (on paper) with the ASF design I found
>> it quite inconvenient that there was no way to communicate
>> anything from inside a transaction to outside.
>>
>> In my design when a transaction starts, it remembers the start PC.
> <
> ESM does this too, records the starting IP and if interference happens
> the ATOMIC event restarts there. ESM also uses the Branch on memory
> interference instruction to reset this point as an escape point should the
> event fail later.
> <
> I admit that that was a problem in ASF.
> <
>> If an abort occurs, it tosses the protected memory changes,
>> and jumps back to that PC and sets a status into a register,
>> but any other registers already retired have their values retained.

I also thought it might be nice to pass the PC of the last retired
instruction to the fail abort code, though it requires that an extra
register be reserved to pass this to the failure handler.

> <
> ESM does nothing to the registers upon fail either, but it specifically
> states that the compiler is not allowed to use the values in the
> "participating" registers. The non-participating registers may not
> have been updated in a von Neumann order, either.
> <
>> The abort handler would then reload the registers it wants.
> <
> Yes, this is the "compiler cannot use" part.

I don't see why you would require such a rule.

I want to be able to, for example, have a high-water mark counter
register inside a transaction to indicate to a failure handler
how far we made it.
Allocate R1 to this and set it to zero before the transaction starts.
At each interior milestone increment R1. If aborted R1 indicates
how far we made it (though the abort PC could do this too).

I can also see other register values from inside the failed
transaction might be helpful, say for restarting a transaction
from an intermediate point.

For registers to be consistent after an abort it requires that the
abort be precise, which interrupts and exceptions should be anyway.
So I don't see this as a difficult requirement.
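
A toy model of the behaviour argued for above: protected memory changes are thrown away on abort, while registers (here the high-water mark in R1) keep whatever they held when the abort hit. This is only a sketch. The `STATUS` register name, the `fail_at` parameter, and the buffering scheme are assumptions for illustration, not ASF or ESM behaviour.

```python
class Abort(Exception):
    """Models interference that aborts the transaction."""


def run_transaction(memory: dict, regs: dict, steps, fail_at=None) -> bool:
    """Run (addr, value) stores as one transaction; return True on commit."""
    regs["R1"] = 0                  # high-water mark, zeroed before the start
    pending = {}                    # protected stores, invisible until commit
    try:
        for i, (addr, value) in enumerate(steps):
            if i == fail_at:
                raise Abort
            pending[addr] = value
            regs["R1"] += 1         # milestone reached
    except Abort:
        # Protected stores in `pending` are dropped; registers are NOT rolled back.
        regs["STATUS"] = 1          # hypothetical abort status register
        return False
    memory.update(pending)          # commit makes all stores visible at once
    regs["STATUS"] = 0
    return True


memory, regs = {}, {}
committed = run_transaction(memory, regs, [(0, 1), (8, 2), (16, 3)], fail_at=2)
progress_at_abort = regs["R1"]      # tells the failure handler how far we got
```

The failure handler can read `R1` to decide whether to restart from the top or from an intermediate point, which only works if the abort is precise.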

Re: Branch prediction hints

<f896ec86-770a-49f0-a2c6-6cb3ad96038en@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=17154&group=comp.arch#17154

X-Received: by 2002:a0c:c3d1:: with SMTP id p17mr37532516qvi.44.1621958837615;
Tue, 25 May 2021 09:07:17 -0700 (PDT)
X-Received: by 2002:a9d:19ed:: with SMTP id k100mr20934749otk.329.1621958837315;
Tue, 25 May 2021 09:07:17 -0700 (PDT)
Path: i2pn2.org!i2pn.org!news.uzoreto.com!feeder1.cambriumusenet.nl!feed.tweak.nl!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 25 May 2021 09:07:17 -0700 (PDT)
In-Reply-To: <s8hl1s$mao$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:34d5:dcbf:8bd2:1b38;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:34d5:dcbf:8bd2:1b38
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <s8cmv1$1e7$1@dont-email.me>
<s8csfm$172$1@dont-email.me> <s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me>
<s8e60c$ca6$1@dont-email.me> <s8eabb$ig$1@dont-email.me> <13fa2553-eaf2-43df-a87a-3559a45d88a0n@googlegroups.com>
<s8ev0d$t2t$1@dont-email.me> <23b797e3-2809-4cd4-a5b4-2085a35f98cen@googlegroups.com>
<s8f9bo$dup$1@dont-email.me> <1430abb7-231c-4f12-985a-44b623c2fcafn@googlegroups.com>
<s8h31l$vv$1@dont-email.me> <4d114eb0-d49c-478c-b49c-7270cb39a687n@googlegroups.com>
<s8hf4q$mrr$1@dont-email.me> <4185805d-4d21-408c-9169-f04bb7a5782cn@googlegroups.com>
<s8hl1s$mao$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f896ec86-770a-49f0-a2c6-6cb3ad96038en@googlegroups.com>
Subject: Re: Branch prediction hints
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 25 May 2021 16:07:17 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Tue, 25 May 2021 16:07 UTC

On Monday, May 24, 2021 at 8:48:47 PM UTC-5, BGB wrote:
> On 5/24/2021 7:26 PM, MitchAlsup wrote:

> >> Making another layout attempt...
> >> This one keeps Imm9 where appropriate.
> >>
> >>
> >> Where, ppqq!=0100:
> >>
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0000 MOV.B Rn, (Rs, Rt)
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0001 LEA.B Rn, (Rs, Rt)
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0010 MOV.W Rn, (Rs, Rt)
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0011 LEA.W Rn, (Rs, Rt)
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0100 MOV.L Rn, (Rs, Rt)
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0101 LEA.L Rn, (Rs, Rt)
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0110 MOV.Q Rn, (Rs, Rt)
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0111 LEA.Q Rn, (Rs, Rt)
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1000 MOV.B (Rs, Rt), Rn
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1001 MOVU.B (Rs, Rt), Rn
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1010 MOV.W (Rs, Rt), Rn
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1011 MOVU.W (Rs, Rt), Rn
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1100 MOV.L (Rs, Rt), Rn
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1101 MOVU.L (Rs, Rt), Rn
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1110 MOV.Q (Rs, Rt), Rn
> >> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1111 -
> > <
> > 6-bit register specifiers are eating you alive.
> > It also appears you are using W for 16-bit containers.
> Yep.
>
> With the combination of the 6-bit registers and ppqq field, I end up
> with several bits less than I had with the 16/32 length encoding and
> 5-bit registers.
>
> B: Byte
> W: Word (16-bit)
> L: DWord (32-bit)
> Q: QWord (64-bit)
> X: XWord (128-bit)
<
Why not OWord O=Oct=8
>
> At least, assuming I didn't change the naming scheme.
> I missed the XWord ops, in BJX2 these are shoved off in various off corners.
>
> If I were to do anything like this, it would make sense to keep 128-bit
> XWord operations and similar.
>
>
> As noted, the idea for this ISA was to have 64 GPRs, such that in-theory
> modulo loop scheduling could be used without blowing out the register
> budget quite as quickly. But, not quite as absurd as the 128 registers
> in IA-64 (and, with smaller instructions, so probably better code density).
>
> It is possible I could map PC, LR, and GBR into the GPR space.
> Eg:
> R60=SP (Stack Pointer)
> R61=LR (Link Register)
> R62=GBR (Global Base Register)
> R63=PC (Program Counter)
>
> It is possible that some of the high-order bits of PC could be aliased
> to several of the SR bits, and LR would save these bits. Part of this is
> because these are either frequently accessed in prolog/epilog sequences
> or needed as base registers.
> >>
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0000 ADD Rs, Rt, Rn
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0001 SUB Rs, Rt, Rn
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0010 MULS Rs, Rt, Rn
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0011 MULU Rs, Rt, Rn
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0100 -
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0101 AND Rs, Rt, Rn
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0110 OR Rs, Rt, Rn
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-0111 XOR Rs, Rt, Rn
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1000 SHAD Rs, Rt, Rn
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1001 SHLD Rs, Rt, Rn
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1010 SHADQ Rs, Rt, Rn
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1011 SHLDQ Rs, Rt, Rn
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1100 ADC Rs, Rt, Rn
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1101 SBB Rs, Rt, Rn
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1110 DMULS Rs, Rt, Rn
> >> * ppqq-0001 00nn-nnnn ssss-sstt tttt-1111 DMULU Rs, Rt, Rn
> >>
> >> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0000 SHAD Rs, Imm6u, Rn
> >> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0001 SHAD Rs, Imm6n, Rn
> >> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0010 SHADQ Rs, Imm6u, Rn
> >> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0011 SHADQ Rs, Imm6n, Rn
> >> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0100 SHLD Rs, Imm6u, Rn
> >> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0101 SHLD Rs, Imm6n, Rn
> >> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0110 SHLDQ Rs, Imm6u, Rn
> >> * ppqq-0010 00nn-nnnn ssss-ssii iiii-0111 SHLDQ Rs, Imm6n, Rn
> >>
> >> Then fill in FPU, ALUX, SIMD, ... operations.
> >>
> >> ...
> >>
> >> * ppqq-0011 00nn-nnnn ssss-ssoo oooo-oooo 1R and 2R spaces.
> >>
> >> ...
> >>
> >>
> >> This '00' block is basically where all the 3R and 2R ops go.
> >>
> >>
> >> * ppqq-0000 01nn-nnnn ssss-ss0i iiii-iiii MOV.B Rn, (Rs, Disp9)
> >> * ppqq-0000 01nn-nnnn ssss-ss1i iiii-iiii LEA.B Rn, (Rs, Disp9)
> >> * ppqq-0001 01nn-nnnn ssss-ss0i iiii-iiii MOV.W Rn, (Rs, Disp9)
> >> * ppqq-0001 01nn-nnnn ssss-ss1i iiii-iiii LEA.W Rn, (Rs, Disp9)
> >> * ppqq-0010 01nn-nnnn ssss-ss0i iiii-iiii MOV.L Rn, (Rs, Disp9)
> >> * ppqq-0010 01nn-nnnn ssss-ss1i iiii-iiii LEA.L Rn, (Rs, Disp9)
> >> * ppqq-0011 01nn-nnnn ssss-ss0i iiii-iiii MOV.Q Rn, (Rs, Disp9)
> >> * ppqq-0011 01nn-nnnn ssss-ss1i iiii-iiii LEA.Q Rn, (Rs, Disp9)
> >> * ppqq-0100 01nn-nnnn ssss-ss0i iiii-iiii MOV.B (Rs, Disp9), Rn
> >> * ppqq-0100 01nn-nnnn ssss-ss1i iiii-iiii MOVU.B (Rs, Disp9), Rn
> >> * ppqq-0101 01nn-nnnn ssss-ss0i iiii-iiii MOV.W (Rs, Disp9), Rn
> >> * ppqq-0101 01nn-nnnn ssss-ss1i iiii-iiii MOVU.W (Rs, Disp9), Rn
> >> * ppqq-0110 01nn-nnnn ssss-ss0i iiii-iiii MOV.L (Rs, Disp9), Rn
> >> * ppqq-0110 01nn-nnnn ssss-ss1i iiii-iiii MOVU.L (Rs, Disp9), Rn
> >> * ppqq-0111 01nn-nnnn ssss-ss0i iiii-iiii MOV.Q (Rs, Disp9), Rn
> >> * ppqq-0111 01nn-nnnn ssss-ss1i iiii-iiii -
> >> ...
> >> * ppqq-1111 0100-iiii iiii-iiii iiii-iiii BRA Disp20
> >> * ppqq-1111 0101-iiii iiii-iiii iiii-iiii BSR Disp20
> >>
> >>
> >> ...
> >>
> >> * ppqq-0000 10nn-nnnn ssss-ss0i iiii-iiii ADD Rs, Imm9u, Rn
> >> * ppqq-0000 10nn-nnnn ssss-ss1i iiii-iiii ADD Rs, Imm9n, Rn
> >> * ppqq-0001 10nn-nnnn ssss-ss0i iiii-iiii MULS Rs, Imm9u, Rn
> >> * ppqq-0001 10nn-nnnn ssss-ss1i iiii-iiii MULU Rs, Imm9n, Rn
> >> * ppqq-0010 10nn-nnnn ssss-ss0i iiii-iiii ADDSL Rs, Imm9u, Rn
> >> * ppqq-0010 10nn-nnnn ssss-ss1i iiii-iiii ADDSL Rs, Imm9n, Rn
> >> * ppqq-0011 10nn-nnnn ssss-ss0i iiii-iiii ADDUL Rs, Imm9u, Rn
> >> * ppqq-0011 10nn-nnnn ssss-ss1i iiii-iiii ADDUL Rs, Imm9n, Rn
> >> * ppqq-0100 -
> >> * ppqq-0101 10nn-nnnn ssss-ss0i iiii-iiii AND Rs, Imm9u, Rn
> >> * ppqq-0110 10nn-nnnn ssss-ss0i iiii-iiii OR Rs, Imm9u, Rn
> >> * ppqq-0111 10nn-nnnn ssss-ss0i iiii-iiii XOR Rs, Imm9u, Rn
> >>
> >> * ppqq-1000 10nn-nnnn 0000-rrii iiii-iiii CMPEQ Imm10u, Rn
> >> * ppqq-1000 10nn-nnnn 0001-rrii iiii-iiii CMPEQ Imm10n, Rn
> >> * ppqq-1000 10nn-nnnn 0010-rrii iiii-iiii CMPQEQ Imm10u, Rn
> >> * ppqq-1000 10nn-nnnn 0011-rrii iiii-iiii CMPQEQ Imm10n, Rn
> >> * ppqq-1000 10nn-nnnn 0100-rrii iiii-iiii CMPGT Imm10u, Rn
> >> * ppqq-1000 10nn-nnnn 0101-rrii iiii-iiii CMPGT Imm10n, Rn
> >> * ppqq-1000 10nn-nnnn 0110-rrii iiii-iiii CMPQGT Imm10u, Rn
> >> * ppqq-1000 10nn-nnnn 0111-rrii iiii-iiii CMPQGT Imm10n, Rn
> >>
> >> * ppqq-1000 10nn-nnnn 1000-rrii iiii-iiii CMPHI Imm10u, Rn
> >> * ppqq-1000 10nn-nnnn 1001-rrii iiii-iiii CMPHI Imm10n, Rn
> >> * ppqq-1000 10nn-nnnn 1010-rrii iiii-iiii CMPQHI Imm10u, Rn
> >> * ppqq-1000 10nn-nnnn 1011-rrii iiii-iiii CMPQHI Imm10n, Rn
> >> * ppqq-1000 10nn-nnnn 1100-rrii iiii-iiii CMPGE Imm10u, Rn
> >> * ppqq-1000 10nn-nnnn 1101-rrii iiii-iiii CMPGE Imm10n, Rn
> >> * ppqq-1000 10nn-nnnn 1110-rrii iiii-iiii CMPQGE Imm10u, Rn
> >> * ppqq-1000 10nn-nnnn 1111-rrii iiii-iiii CMPQGE Imm10n, Rn
> >>
> >> ...
> >>
> >> * ppqq-0000 11nn-nnnn iiii-iiii iiii-iiii LDI Imm16u, Rn
> >> * ppqq-0001 11nn-nnnn iiii-iiii iiii-iiii LDI Imm16n, Rn
> >> * ppqq-0010 11nn-nnnn iiii-iiii iiii-iiii ADD Imm16u, Rn
> >> * ppqq-0011 11nn-nnnn iiii-iiii iiii-iiii ADD Imm16n, Rn
> >> * ppqq-0100 11nn-nnnn iiii-iiii iiii-iiii LDISH Imm16u, Rn
> >> * ppqq-0101 11nn-nnnn iiii-iiii iiii-iiii FLDCH Imm16u, Rn
> >>
> >> ...
> >>
> >>
> >> Where, ppqq==0100:
> >>
> >> * 0100-00ii iiii-iiii iiii-iiii iiii-iiii BRA Disp26s
> >> * 0100-01ii iiii-iiii iiii-iiii iiii-iiii BSR Disp26s
> >> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii Jumbo24_Immed
> >> * 0100-1001 iiii-iiii iiii-iiii iiii-iiii Jumbo24_Opcode
> >> * 0100-1010 iiii-iiii iiii-iiii iiii-iiii LDIZ Imm24, R0
> >> * 0100-1011 iiii-iiii iiii-iiii iiii-iiii LDIN Imm24, R0
> >> ...
> >>
> >>
> >> Likewise:
> >> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> >> 0100-1010 iiii-iiii iiii-iiii iiii-iiii BRA Abs48
> >>
> >> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> >> 0100-1011 iiii-iiii iiii-iiii iiii-iiii BSR Abs48
> >>
> >> * 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> >> 0100-1000 iiii-iiii iiii-iiii iiii-iiii _
> >> 0000-0000 11nn-nnnn iiii-iiii iiii-iiii LDI Imm64, Rn
> > <
> > It is my opinion, that you are holding onto the notion of using instructions
> > to paste bits together, rather than using the momentum of the instruction
> > stream to provide constants to operand fields avoiding having to execute
> > these instructions.
> Jumbo would be horizontal / wide-execute rather than sequential.
>
> But, this approach does avoid the decoder needing to have special case
> handling to deal with random bits of raw data in the instruction stream.
>
>
> But, then realized after posting this that there is also a '1100'
> encoding space which I could put Jumbo into:
> * 1100-1000 iiii-iiii iiii-iiii iiii-iiii Jumbo24_Immed
> * 1100-1001 iiii-iiii iiii-iiii iiii-iiii Jumbo24_Opcode
>
> Which makes more sense...
>
> The Jumbo24_Immed would be used for "more encoding space", where needing
> to use 64-bit instructions to augment the 32-bit encoding space seems
> likely...
>
>
>
> But, thinking of it, may go with a slightly different QQ numbering, eg:
> 00: Pred 1
> 01: Pred 2
> 10: Pred 3
> 11: Fixed
>
> This would move the special/unconditional spaces to: 0111 and 1111.
> >>
> >>
> >> Or, basically, I guess it works, but this encoding space hasn't exactly
> >> gone very far...
> > <
> > Interesting tidying up, but no real movement in entropy.
> >>
> >>
> >> Though, I guess, unlike BJX2, it does have space to fit in a few large
> >> 26-bit branch ops.
> >>>>
> >>>>
> >>>> Meanwhile, in this top-level space, in BJX2 it was 3+4+1 (8) bits, with
> >>>> 3R ops adding 4-bits, and 2R ops adding 8 bits.
> >>>>
> >>>>
> >>>> The new design was seriously choked in the top-level space, but could
> >>>> have more space for 2R ops.
> >>> <
> >>> All of my 2-operand stuff went under 1 Major OpCode ( 001010 )
> >>> All of the 3-agen stuff went under 1 Major OpCode ( 001001 )
> >>> So I have 6 (of 64) Major OpCodes burned for everything not in the Major
> >>> OpCode group. One can tell if it has a chance of being an extension OpCode
> >>> (XOP) by looking at the top 2 bits (00), then the second top bit (xx0) is
> >>> for operand+immediate and (001) is for operand+operand.
> >> I had assumed a shared space for 2R and 3R ops, with 2R carved off of
> >> the 3R space.
> >>
> >> There isn't a huge shortage of 2R space at least...
> >>
> >> The 2R and 3R space is OK, assuming it is given 1/4 of the top-level
> >> space...
> > <
> > I only allotted the 2-operand space 1 major OpCode.
> > I only allotted the 3-operand space 1 major OpCode.
> > I still have 1/3rd of the Major OpCode space unallocated.
> >
> I still have an OK amount of free encoding space in BJX2 as well...
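
For readers following the quoted 3R layout (`ppqq-0000 00nn-nnnn ssss-sstt tttt-0000`), a small sketch of how the fields pack into a 32-bit word may help. It assumes the leftmost bit in the listing is bit 31; the field names `op`, `blk`, and `sub` are labels invented here, not names from the listing.

```python
# Sketch of the quoted 3R layout, assuming the leftmost listed bit is bit 31.
# Field widths: pp(2) qq(2) op(4) blk(2) rn(6) rs(6) rt(6) sub(4) = 32 bits.
# "op", "blk" and "sub" are hypothetical names for the unnamed opcode fields.

def encode_3r(pp, qq, op, blk, rn, rs, rt, sub):
    """Pack the fields into one 32-bit instruction word."""
    return ((pp << 30) | (qq << 28) | (op << 24) | (blk << 22) |
            (rn << 16) | (rs << 10) | (rt << 4) | sub)

def decode_3r(word):
    """Split a 32-bit instruction word back into its fields."""
    return {
        "pp":  (word >> 30) & 0x3,
        "qq":  (word >> 28) & 0x3,
        "op":  (word >> 24) & 0xF,
        "blk": (word >> 22) & 0x3,
        "rn":  (word >> 16) & 0x3F,
        "rs":  (word >> 10) & 0x3F,
        "rt":  (word >> 4) & 0x3F,
        "sub": word & 0xF,
    }

# "ADD Rs, Rt, Rn" in the listing: op=0001, blk=00, sub=0000.
word = encode_3r(pp=0, qq=0, op=0b0001, blk=0b00, rn=5, rs=6, rt=7, sub=0b0000)
print(hex(word), decode_3r(word))
```

Counting bits this way shows the squeeze: three 6-bit register fields plus the 4-bit ppqq field take 22 of 32 bits, leaving 10 bits for everything else in a 3R instruction.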


Re: Branch prediction hints

<7b244afc-8b29-4df3-86c3-c4e9f7a025ffn@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=17155&group=comp.arch#17155

X-Received: by 2002:a0c:ef90:: with SMTP id w16mr38342221qvr.28.1621959626691;
Tue, 25 May 2021 09:20:26 -0700 (PDT)
X-Received: by 2002:a54:4794:: with SMTP id o20mr14439665oic.99.1621959626480;
Tue, 25 May 2021 09:20:26 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 25 May 2021 09:20:26 -0700 (PDT)
In-Reply-To: <7R7rI.95638$iT.92788@fx19.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:34d5:dcbf:8bd2:1b38;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:34d5:dcbf:8bd2:1b38
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <Z7tqI.61928$N%1.35599@fx28.iad>
<c42f2bb0-6920-44d2-8877-cff238443ca7n@googlegroups.com> <s8earo$a9k$1@dont-email.me>
<e977b069-6ce0-46bf-ac2a-f7fb85ef9f6cn@googlegroups.com> <sBRqI.613761$nn2.535143@fx48.iad>
<7fb7aaf5-f3d2-42c0-b258-f670603330f5n@googlegroups.com> <7R7rI.95638$iT.92788@fx19.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7b244afc-8b29-4df3-86c3-c4e9f7a025ffn@googlegroups.com>
Subject: Re: Branch prediction hints
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 25 May 2021 16:20:26 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 97
 by: MitchAlsup - Tue, 25 May 2021 16:20 UTC

On Tuesday, May 25, 2021 at 9:29:26 AM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Monday, May 24, 2021 at 12:43:55 PM UTC-5, EricP wrote:
> >> MitchAlsup wrote:
> >>> On Sunday, May 23, 2021 at 2:36:26 PM UTC-5, Ivan Godard wrote:
> >>>> On 5/23/2021 9:07 AM, MitchAlsup wrote:
> >>>>> On Sunday, May 23, 2021 at 8:54:03 AM UTC-5, EricP wrote:
> >>>>>> If you don't have a hint to explicitly block speculation at the
> >>>>>> branch then the design would have to use more complicated and
> >>>>>> probably error prone dynamic logic to "deduce" what to do.
> >>>>> <
> >>>>> You do not want Naked memory refs to be used to setup or complete
> >>>>> ATOMIC events. You need to "mark" their participation in the event
> >>>>> so the machine knows that such an event is going on from the out-
> >>>>> set.
> >>> <
> >>>> This also permits you to have intra-transaction stores that are not part
> >>>> of the transaction, say for logging and debugging, where you don't lock
> >>>> the log memory.
> >>> <
> >>> Yes, exactly--and that is where the word "participating" came into use
> >>> when doing the ASF at AMD. Since you CANNOT single step through
> >>> an ATOMIC event, you need some way of figuring out what is going
> >>> on; lobbing registers into a memory buffer for later printing is the
> >>> prescribed means.
> >>> <
> >> I also did not want to checkpoint the register set at the start
> >> of a transaction and roll them all back on abort.
> >> I want memory protected but not registers.
> >>
> >> When I played about (on paper) with the ASF design I found
> >> it quite inconvenient that there was no way to communicate
> >> anything from inside a transaction to outside.
> >>
> >> In my design when a transaction starts, it remembers the start PC.
> > <
> > ESM does this too, records the starting IP and if interference happens
> > the ATOMIC event restarts there. ESM also uses the Branch on memory
> > interference instruction to reset this point as an escape point should the
> > event fail later.
> > <
> > I admit that that was a problem in ASF.
> > <
> >> If an abort occurs, it tosses the protected memory changes,
> >> and jumps back to that PC and sets a status into a register,
> >> but any other registers already retired have their values retained.
<
> I also thought it might be nice to pass the PC of the last retired
> instruction to the fail abort code. Though it requires that an extra
> register be reserved to pass this to the failure handler.
<
But the failure of the ATOMIC event is not associated with an instruction
in this CPU; it was caused asynchronously by an instruction in a different
CPU! All the IP from this thread gives you is which instruction was holding
up this thread. Even this is specious because it might have been running
"just fine".
> > <
> > ESM does nothing to the registers upon fail either, but it specifically
> > states that the compiler is not allowed to use the values in the
> > "participating" registers. The non-participating registers may not
> > have been updated in a von Neumann order, either.
> > <
> >> The abort handler would then reload the registers it wants.
> > <
> > Yes, this is the "compiler cannot use" part.
> I don't see why you would require such a rule.
<
At the instant of failure, one must consider all data obtained recently
from the concurrent data structure as stale.
<
Imagine that this thread is starting an ATOMIC event and that after getting
the first instruction performed, the thread gets context switched and
1000 other threads get a shot at the CDS. Now the originating thread
gets control again and continues. What is the probability that the read
performed 10M cycles ago remains useful ?
>
> I want to be able to, for example, have a high-water mark counter
> register inside a transaction to indicate to a failure handler
> how far we made it.
<
At various steps along the event path, you use a branch-on-interference
to change the control point associated with failure. Thus, when you
arrive at failure point[k] you failed in critical block[k].
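
A toy software model of that idea, under the assumption that each critical block begins by re-arming the failure target: the point you land on after interference tells you which block failed, with no counter register needed. `run_event` and `Interference` are hypothetical names, and this is not a statement of the actual ESM semantics.

```python
# Toy model: each block re-arms the failure control point before it runs,
# standing in for a branch-on-interference instruction. Names are made up.

class Interference(Exception):
    """Models another CPU interfering with the event."""

def run_event(blocks, interfere_in=None):
    """Run critical blocks in order.

    Returns None if the event completes, otherwise the index k of the
    failure point that control arrives at.
    """
    control_point = 0
    try:
        for k, block in enumerate(blocks):
            control_point = k          # re-arm: failure now lands at point[k]
            if interfere_in == k:
                raise Interference()   # simulated interference in block k
            block()
    except Interference:
        return control_point
    return None

def noop():
    pass

print(run_event([noop, noop, noop], interfere_in=2))
print(run_event([noop, noop, noop]))
```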
<
> Allocate R1 to this and set it to zero before transaction starts.
> At each interior milestone increment R1. If aborted R1 indicates
> how far we made it (though the abort PC could do this too).
>
> I can also see other register values from inside the failed
> transaction might be helpful, say for restarting a transaction
> from an intermediate point.
<
Sounds dangerous to me. At any point of failure, HW releases all of the
memory that used to be "participating" in the event.
>
> For registers to be consistent after an abort it requires that the
> abort be precise, which interrupts and exceptions should be anyway.
> So I don't see this as a difficult requirement.

Re: Branch prediction hints

<2021May25.184501@mips.complang.tuwien.ac.at>


https://www.novabbs.com/devel/article-flat.php?id=17157&group=comp.arch#17157

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
Date: Tue, 25 May 2021 16:45:01 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 26
Distribution: world
Message-ID: <2021May25.184501@mips.complang.tuwien.ac.at>
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <s8dcbt$7f4$1@gioia.aioe.org> <36626ffe-f5d8-4a62-af27-310684375561n@googlegroups.com> <jwvfsyd1jio.fsf-monnier+comp.arch@gnu.org> <ad6c1950-c7df-4ec0-b3ab-20550baccb67n@googlegroups.com> <12fa6b22-9cf8-4dd0-813d-1b8b21058c50n@googlegroups.com> <s8h1je$dhj$1@dont-email.me> <s8h69l$2tt$1@newsreader4.netcologne.de>
Injection-Info: reader02.eternal-september.org; posting-host="7775b0cecc2fbbc421d3f061b8ebcf87";
logging-data="2918"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19tE597V9nutHluDXDaGm73"
Cancel-Lock: sha1:B9W1erJBz0VNDih/Y9TD/+rxtM4=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Tue, 25 May 2021 16:45 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
>Marcus <m.delete@this.bitsnbites.eu> schrieb:
>
>> BTW, apart from VVM, are there any good examples of ISA:s with loop
>> instructions that are easy to predict ahead of time (thus effectively
>> unrolling loops, eliminating compares/branches, and reducing branch
>> predictor load)?
>
>POWER might count with its separate count register (sorry for
>the pun).

It certainly does. And the IA-32 and AMD64 LOOP instruction would
also count.

But in my measurements <2017Mar14.183125@mips.complang.tuwien.ac.at>
<2017Mar15.141411@mips.complang.tuwien.ac.at>, the measured CPUs don't
use LOOP for improving branch prediction. I think the problem is that
in OoO CPUs the instruction fetcher runs ahead of the data part, and
getting the correct count into the instruction fetcher is apparently
too hard. I expect that the same is the case for OoO Power
implementations.
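
As a rough illustration of what is at stake (an idealized model, not a description of any measured CPU): a plain 2-bit saturating counter mispredicts the loop exit on each pass through a counted loop, whereas a front end that somehow had the trip count in time would not. The function names below are invented for this sketch.

```python
# Idealized comparison of two predictors on a counted loop's back branch.
# Neither models a real CPU; function names are hypothetical.

def mispredicts_two_bit(trip_count, runs):
    """Mispredictions of a 2-bit saturating counter on the loop-back branch."""
    state = 2                      # 0..3, >= 2 predicts taken; start weakly taken
    misses = 0
    for _ in range(runs):
        for i in range(trip_count):
            taken = i < trip_count - 1      # last iteration falls through
            if (state >= 2) != taken:
                misses += 1
            state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses

def mispredicts_counted(trip_count, runs):
    """Mispredictions when the fetcher knows the remaining count exactly."""
    misses = 0
    for _ in range(runs):
        remaining = trip_count
        for i in range(trip_count):
            taken = i < trip_count - 1
            remaining -= 1
            if (remaining > 0) != taken:
                misses += 1
    return misses

print(mispredicts_two_bit(10, 5), mispredicts_counted(10, 5))
```

The hard part, as noted above, is not the arithmetic but getting the correct count to the instruction fetcher early enough, which this sketch simply assumes away.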

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Branch prediction hints

<s8jcu2$u5j$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=17158&group=comp.arch#17158

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
Date: Tue, 25 May 2021 12:42:21 -0500
Organization: A noiseless patient Spider
Lines: 121
Message-ID: <s8jcu2$u5j$1@dont-email.me>
References: <s8c0j2$q5d$1@newsreader4.netcologne.de>
<s8cmv1$1e7$1@dont-email.me> <s8csfm$172$1@dont-email.me>
<s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me>
<s8e60c$ca6$1@dont-email.me> <s8eabb$ig$1@dont-email.me>
<13fa2553-eaf2-43df-a87a-3559a45d88a0n@googlegroups.com>
<s8ev0d$t2t$1@dont-email.me>
<23b797e3-2809-4cd4-a5b4-2085a35f98cen@googlegroups.com>
<s8f9bo$dup$1@dont-email.me>
<1430abb7-231c-4f12-985a-44b623c2fcafn@googlegroups.com>
<s8h31l$vv$1@dont-email.me>
<4d114eb0-d49c-478c-b49c-7270cb39a687n@googlegroups.com>
<s8hf4q$mrr$1@dont-email.me>
<4185805d-4d21-408c-9169-f04bb7a5782cn@googlegroups.com>
<s8hl1s$mao$1@dont-email.me>
<f896ec86-770a-49f0-a2c6-6cb3ad96038en@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 25 May 2021 17:42:26 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="2660c744ddf30ee30e8ef7bd54ded436";
logging-data="30899"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/XVlYS20bhLsS1sMhsxpvA"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.2
Cancel-Lock: sha1:L6GDhl2mSd575TKdgOeOcJrrG3M=
In-Reply-To: <f896ec86-770a-49f0-a2c6-6cb3ad96038en@googlegroups.com>
Content-Language: en-US
 by: BGB - Tue, 25 May 2021 17:42 UTC

On 5/25/2021 11:07 AM, MitchAlsup wrote:
> On Monday, May 24, 2021 at 8:48:47 PM UTC-5, BGB wrote:
>> On 5/24/2021 7:26 PM, MitchAlsup wrote:
>
>>>> Making another layout attempt...
>>>> This one keeps Imm9 where appropriate.
>>>>
>>>>
>>>> Where, ppqq!=0100:
>>>>
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0000 MOV.B Rn, (Rs, Rt)
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0001 LEA.B Rn, (Rs, Rt)
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0010 MOV.W Rn, (Rs, Rt)
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0011 LEA.W Rn, (Rs, Rt)
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0100 MOV.L Rn, (Rs, Rt)
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0101 LEA.L Rn, (Rs, Rt)
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0110 MOV.Q Rn, (Rs, Rt)
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0111 LEA.Q Rn, (Rs, Rt)
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1000 MOV.B (Rs, Rt), Rn
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1001 MOVU.B (Rs, Rt), Rn
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1010 MOV.W (Rs, Rt), Rn
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1011 MOVU.W (Rs, Rt), Rn
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1100 MOV.L (Rs, Rt), Rn
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1101 MOVU.L (Rs, Rt), Rn
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1110 MOV.Q (Rs, Rt), Rn
>>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1111 -
>>> <
>>> 6-bit register specifiers are eating you alive.
>>> It also appears you are using W for 16-bit containers.
>> Yep.
>>
>> With the combination of the 6-bit registers and ppqq field, I end up
>> with several bits less than I had with the 16/32 length encoding and
>> 5-bit registers.
>>
>> B: Byte
>> W: Word (16-bit)
>> L: DWord (32-bit)
>> Q: QWord (64-bit)
>> X: XWord (128-bit)
> <
> Why not OWord O=Oct=8

While 'OWord' is possible, don't really see this name used.

There seems to be an established convention:
* Q=64 bit
* X=128 bit
* Y=256 bit
* Z=512 bit

As for why L=DWord, didn't come up with this, but:
* The predecessor ISAs did it this way;
* GAS seems to do this on many/most targets;
* ...

While x86 had frequently used 'D' rather than 'L', 'D' becomes more
ambiguous when FPU / SIMD ops come up (eg: "Double").

Current convention for FP types:
* H: Half (Binary16)
* F or S: Float (Binary32)
* D: Double (Binary64)
* X or G: Long Double (Binary128)
** F + X implies G, whereas X by itself implies 128-bit integer.

A few ops in BJX2 had used 'D' for "work on 32 bits but leave the
high-order bits in an undefined state" ops, but most were phased out,
since in many of these contexts "do 32-bit op but zero extend to 64-bit"
makes more sense and is generally what ended up being done anyway
(most alternatives to zero or sign extension cost more than zero or sign
extension). Likewise, the original purpose intended for these ops was
served well enough by the zero-extended forms.

In contrast, many 'L' ops (unless also 'Unsigned') tend to be
sign-extended by default. Some ops lack an 'L' but are 32-bit, could
consider changing this (the assembler may accept 'L' suffixed forms).
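
To illustrate the two conventions, a minimal sketch of a 32-bit add whose result is widened to 64 bits either way. The helper names are made up here and do not correspond to BJX2 mnemonics.

```python
# 32-bit add with the result widened to 64 bits by zero or sign extension.
# Helper names are hypothetical, not BJX2 instruction names.

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def zext32(x):
    """Keep the low 32 bits, clear the high 32 bits."""
    return x & MASK32

def sext32(x):
    """Keep the low 32 bits, copy bit 31 into the high 32 bits."""
    x &= MASK32
    if x & 0x80000000:
        return (x - (1 << 32)) & MASK64
    return x

def add32_zx(a, b):
    return zext32(a + b)

def add32_sx(a, b):
    return sext32(a + b)

print(hex(add32_zx(0x7FFFFFFF, 1)))  # 0x80000000
print(hex(add32_sx(0x7FFFFFFF, 1)))  # 0xffffffff80000000
```

The two forms only differ when bit 31 of the 32-bit result is set, which is the case where the choice of default matters.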

My other stuff (BGBScript VM / BGBCC / etc) had used:
* B/W/I/L/X

A compromise would likely be:
* B/W/I/Q/X

But, dunno...

I guess, one possible thing could be to try to make instruction naming
and notation in the ISA listings more consistent with my assembler.
* Listings use BRA/BSR in some cases where assembler uses JMP/JSR.
** The latter is used to reduce semantic ambiguity.
* Listings specify various 'LDI' forms;
** ASM typically uses 'MOV' here and the assembler figures it out itself.
* Some use of suffixes differ, ...
* In some contexts, in the assembler, the mnemonic also gives the
encoded instruction or immediate-value size.
** In a few contexts, it is relevant to specify this.
* ...

So, eg, "BRA Abs48" might be expressed as "JMP label" or "JMP8B label"
in ASM code (since a "label" by itself doesn't indicate its encoding,
"BRA label" will generally give a Disp8 or Disp20 form based on
branch-length optimization, and PC-relative vs absolute is sometimes
relevant, ...), but I guess it is mostly a question of how much
divergence is acceptable, given there isn't strictly a 1:1
correspondence in some cases.
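
A minimal sketch of the kind of choice the assembler makes for "BRA label", assuming signed displacements and picking the smallest form that fits. The displacement units and the fallback to an absolute form are assumptions made for illustration, not something taken from the ISA listing.

```python
# Pick the smallest branch form whose signed displacement field fits.
# Units of `disp` and the "Abs48" fallback are assumptions of this sketch.

def fits_signed(value, bits):
    """True if `value` is representable as a `bits`-wide two's complement field."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def pick_branch_form(disp):
    """Return the name of the narrowest encoding that can hold `disp`."""
    if fits_signed(disp, 8):
        return "Disp8"
    if fits_signed(disp, 20):
        return "Disp20"
    return "Abs48"   # hypothetical fallback when no PC-relative form fits

for d in (100, 128, -129, (1 << 19) - 1, 1 << 19):
    print(d, pick_branch_form(d))
```

A real assembler usually has to iterate, because changing the size of one branch moves later labels and can change which form other branches need.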

Then again, it is possible in some sense the ASM more represents a
"ground truth", as opposed to the listing distinguishing things using
notations which may not actually be relevant (or usable) in actual ASM
code. Merging some cases where the assembler can figure it out itself
(though the distinction may matter for the CPU), or splitting up cases
where absent the listing's notation hints, the actual intended behavior
of the instruction would become ambiguous (BRA vs JMP, ...).

....

Re: Branch prediction hints

<381478a5-62e0-42ab-971e-0c8a1898b613n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=17160&group=comp.arch#17160

X-Received: by 2002:ad4:5b81:: with SMTP id 1mr34785663qvp.12.1621965310219;
Tue, 25 May 2021 10:55:10 -0700 (PDT)
X-Received: by 2002:a9d:7612:: with SMTP id k18mr23847245otl.178.1621965309902;
Tue, 25 May 2021 10:55:09 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!usenet.pasdenom.info!usenet-fr.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 25 May 2021 10:55:09 -0700 (PDT)
In-Reply-To: <2021May25.184501@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:34d5:dcbf:8bd2:1b38;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:34d5:dcbf:8bd2:1b38
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <s8dcbt$7f4$1@gioia.aioe.org>
<36626ffe-f5d8-4a62-af27-310684375561n@googlegroups.com> <jwvfsyd1jio.fsf-monnier+comp.arch@gnu.org>
<ad6c1950-c7df-4ec0-b3ab-20550baccb67n@googlegroups.com> <12fa6b22-9cf8-4dd0-813d-1b8b21058c50n@googlegroups.com>
<s8h1je$dhj$1@dont-email.me> <s8h69l$2tt$1@newsreader4.netcologne.de> <2021May25.184501@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <381478a5-62e0-42ab-971e-0c8a1898b613n@googlegroups.com>
Subject: Re: Branch prediction hints
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 25 May 2021 17:55:10 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Tue, 25 May 2021 17:55 UTC

On Tuesday, May 25, 2021 at 11:58:55 AM UTC-5, Anton Ertl wrote:
> Thomas Koenig <tko...@netcologne.de> writes:
> >Marcus <m.de...@this.bitsnbites.eu> schrieb:
> >
> >> BTW, apart from VVM, are there any good examples of ISA:s with loop
> >> instructions that are easy to predict ahead of time (thus effectively
> >> unrolling loops, eliminating compares/branches, and reducing branch
> >> predictor load)?
> >
> >POWER might count with its separate count register (sorry for
> >the pun).
> It certainly does. And the IA-32 and AMD64 LOOP instruction would
> also count.
>
> But in my measurements <2017Mar1...@mips.complang.tuwien.ac.at>
> <2017Mar1...@mips.complang.tuwien.ac.at>, the measured CPUs don't
> use LOOP for improving branch prediction. I think the problem is that
> in OoO CPUs the instruction fetcher runs ahead of the data part, and
> getting the correct count into the instruction fetcher is apparently
> too hard. I expect that the same is the case for OoO Power
> implementations.
<
If the HW does not have an understanding of the ratio of cycles to
iterations, then it does not know how to calculate loop termination a
priori. This problem goes away with VEC and LOOP in My 66000: HW can
determine the above and use it to "adjust" the loop count arithmetic and
not fill the machine with instructions associated with the iterations that
will NOT happen.
>
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Branch prediction hints

<37435c71-b112-480f-9da8-770f5b8a02fan@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=17161&group=comp.arch#17161

X-Received: by 2002:a05:620a:574:: with SMTP id p20mr38373636qkp.70.1621965467373;
Tue, 25 May 2021 10:57:47 -0700 (PDT)
X-Received: by 2002:a4a:d41a:: with SMTP id n26mr23885223oos.66.1621965467163;
Tue, 25 May 2021 10:57:47 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!fdn.fr!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 25 May 2021 10:57:46 -0700 (PDT)
In-Reply-To: <s8jcu2$u5j$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:34d5:dcbf:8bd2:1b38;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:34d5:dcbf:8bd2:1b38
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <s8cmv1$1e7$1@dont-email.me>
<s8csfm$172$1@dont-email.me> <s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me>
<s8e60c$ca6$1@dont-email.me> <s8eabb$ig$1@dont-email.me> <13fa2553-eaf2-43df-a87a-3559a45d88a0n@googlegroups.com>
<s8ev0d$t2t$1@dont-email.me> <23b797e3-2809-4cd4-a5b4-2085a35f98cen@googlegroups.com>
<s8f9bo$dup$1@dont-email.me> <1430abb7-231c-4f12-985a-44b623c2fcafn@googlegroups.com>
<s8h31l$vv$1@dont-email.me> <4d114eb0-d49c-478c-b49c-7270cb39a687n@googlegroups.com>
<s8hf4q$mrr$1@dont-email.me> <4185805d-4d21-408c-9169-f04bb7a5782cn@googlegroups.com>
<s8hl1s$mao$1@dont-email.me> <f896ec86-770a-49f0-a2c6-6cb3ad96038en@googlegroups.com>
<s8jcu2$u5j$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <37435c71-b112-480f-9da8-770f5b8a02fan@googlegroups.com>
Subject: Re: Branch prediction hints
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 25 May 2021 17:57:47 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Tue, 25 May 2021 17:57 UTC

On Tuesday, May 25, 2021 at 12:42:28 PM UTC-5, BGB wrote:
> On 5/25/2021 11:07 AM, MitchAlsup wrote:
> > On Monday, May 24, 2021 at 8:48:47 PM UTC-5, BGB wrote:
> >> On 5/24/2021 7:26 PM, MitchAlsup wrote:
> >
> >>>> Making another layout attempt...
> >>>> This one keeps Imm9 where appropriate.
> >>>>
> >>>>
> >>>> Where, ppqq!=0100:
> >>>>
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0000 MOV.B Rn, (Rs, Rt)
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0001 LEA.B Rn, (Rs, Rt)
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0010 MOV.W Rn, (Rs, Rt)
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0011 LEA.W Rn, (Rs, Rt)
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0100 MOV.L Rn, (Rs, Rt)
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0101 LEA.L Rn, (Rs, Rt)
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0110 MOV.Q Rn, (Rs, Rt)
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-0111 LEA.Q Rn, (Rs, Rt)
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1000 MOV.B (Rs, Rt), Rn
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1001 MOVU.B (Rs, Rt), Rn
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1010 MOV.W (Rs, Rt), Rn
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1011 MOVU.W (Rs, Rt), Rn
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1100 MOV.L (Rs, Rt), Rn
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1101 MOVU.L (Rs, Rt), Rn
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1110 MOV.Q (Rs, Rt), Rn
> >>>> * ppqq-0000 00nn-nnnn ssss-sstt tttt-1111 -
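A rough sketch of how the layout quoted above splits into fields, assuming the listing is written most-significant bit first as one 32-bit word (the post does not say so). The names `OPS` and `decode` are made up for illustration; the mnemonics are copied from the listing as-is.

```python
# Mnemonics copied from the listing, indexed by the low 4 bits.
OPS = [
    "MOV.B Rn, (Rs, Rt)", "LEA.B Rn, (Rs, Rt)",
    "MOV.W Rn, (Rs, Rt)", "LEA.W Rn, (Rs, Rt)",
    "MOV.L Rn, (Rs, Rt)", "LEA.L Rn, (Rs, Rt)",
    "MOV.Q Rn, (Rs, Rt)", "LEA.Q Rn, (Rs, Rt)",
    "MOV.B (Rs, Rt), Rn", "MOVU.B (Rs, Rt), Rn",
    "MOV.W (Rs, Rt), Rn", "MOVU.W (Rs, Rt), Rn",
    "MOV.L (Rs, Rt), Rn", "MOVU.L (Rs, Rt), Rn",
    "MOV.Q (Rs, Rt), Rn", None,  # 1111 is listed as "-"
]


def decode(word):
    """Split a 32-bit word into ppqq, Rn, Rs, Rt and the 4-bit op.

    Assumption: bit 31 is the leftmost character of the listing.
    """
    ppqq = (word >> 28) & 0xF
    if ppqq == 0b0100:
        raise ValueError("ppqq == 0100 is excluded by the listing")
    if (word >> 22) & 0x3F:
        raise ValueError("bits 27..22 must be zero for this block")
    return {
        "ppqq": ppqq,
        "rn": (word >> 16) & 0x3F,   # 6-bit register specifier
        "rs": (word >> 10) & 0x3F,   # 6-bit register specifier
        "rt": (word >> 4) & 0x3F,    # 6-bit register specifier
        "op": OPS[word & 0xF],
    }
```

Three 6-bit specifiers plus ppqq already take 22 of the 32 bits, which is the pressure the reply below is pointing at.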
> >>> <
> >>> 6-bit register specifiers are eating you alive.
> >>> It also appears you are using W for 16-bit containers.
> >> Yep.
> >>
> >> With the combination of the 6-bit registers and ppqq field, I end up
> >> with several bits less than I had with the 16/32 length encoding and
> >> 5-bit registers.
> >>
> >> B: Byte
> >> W: Word (16-bit)
> >> L: DWord (32-bit)
> >> Q: QWord (64-bit)
> >> X: XWord (128-bit)
> > <
> > Why not OWord O=Oct=8
> While 'OWord' is possible, don't really see this name used.
>
> There seems to be an established convention:
> * Q=64 bit
> * X=128 bit
> * Y=256 bit
> * Z=512 bit
>
>
> As for why L=DWord, didn't come up with this, but:
> * The predecessor ISAs did it this way;
> * GAS seems to do this on many/most targets;
> * ...
>
> While x86 had frequently used 'D' rather than 'L', 'D' becomes more
> ambiguous when FPU / SIMD ops come up (eg: "Double").
>
> Current convention for FP types:
> * H: Half (Binary16)
> * F or S: Float (Binary32)
> * D: Double (Binary64)
> * X or G: Long Double (Binary128)
> ** F + X implies G, whereas X by itself implies 128-bit integer.
>
>
> A few ops in BJX2 had used 'D' for "work on 32 bits but leave the high
> order bits in an undefined state" ops, but most were phased out as in
> many of these contexts, "do 32-bit op but zero extend to 64-bit" makes
> more sense and is generally more often what ended up being done anyways
> (most alternatives to zero or sign extension cost more than zero or sign
> extension). Likewise, the original purpose intended for these ops was
> also served well enough by using zero extended forms.
>
> In contrast, many 'L' ops (unless also 'Unsigned') tend to be
> sign-extended by default. Some ops lack an 'L' but are 32-bit, could
> consider changing this (the assembler may accept 'L' suffixed forms).
>
>
> My other stuff (BGBScript VM / BGBCC / etc) had used:
> * B/W/I/L/X
>
> A compromise would likely be:
> * B/W/I/Q/X
>
> But, dunno...
>
>
> I guess, one possible thing could be to try to make instruction naming
> and notation in the ISA listings more consistent with my assembler.
> * Listings use BRA/BSR in some cases where assembler uses JMP/JSR.
<
Here I make a clear distinction:: branches are IP relative, JMPs are absolute.
<
> ** The latter is used to reduce semantic ambiguity.
> * Listings specify various 'LDI' forms;
> ** ASM typically uses 'MOV' here and the assembler figures it out itself.
> * Some use of suffixes differ, ...
> * In some contexts, in the assembler, the mnemonic also gives the
> encoded instruction or immediate-value size.
> ** In a few contexts, it is relevant to specify this.
> * ...
>
> So, eg, "BRA Abs48" might be expressed as "JMP label" or "JMP8B label"
> in ASM code (since a "label" by itself doesn't indicate its encoding,
> "BRA label" will generally give a Disp8 or Disp20 form based on
> branch-length optimization, and PC-relative vs absolute is sometimes
> relevant, ...), but I guess it is mostly a question of how much
> divergence is acceptable, given there isn't strictly a 1:1
> correspondence in some cases.
>
>
> Then again, it is possible in some sense the ASM more represents a
> "ground truth", as opposed to the listing distinguishing things using
> notations which may not actually be relevant (or usable) in actual ASM
> code. Merging some cases where the assembler can figure it out itself
> (though the distinction may matter for the CPU), or splitting up cases
> where absent the listing's notation hints, the actual intended behavior
> of the instruction would become ambiguous (BRA vs JMP, ...).
>
> ...
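
The BRA/JMP split discussed above can be sketched as a form-selection step in an assembler. This is only an illustration under stated assumptions: displacements are modeled as signed byte offsets from the branch's own address, and `pick_form` and `fits_signed` are hypothetical names. The real BJX2 scaling and base address may well differ; only the form names Disp8, Disp20 and Abs48 come from the post.

```python
def fits_signed(value, bits):
    """True if value is representable as a signed field of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def pick_form(mnemonic, pc, target):
    """Choose an encoding form for a control transfer.

    JMP is treated as absolute, BRA as PC-relative with the shortest
    displacement that fits (branch-length optimization).
    """
    if mnemonic == "JMP":
        if not 0 <= target < (1 << 48):
            raise ValueError("target does not fit in Abs48")
        return ("Abs48", target)          # the address itself is encoded
    disp = target - pc                    # assumed base: the branch address
    for name, bits in (("Disp8", 8), ("Disp20", 20)):
        if fits_signed(disp, bits):
            return (name, disp)
    raise ValueError("target out of range for a PC-relative branch")
```

Under this model the same label yields different encodings depending on the mnemonic and the distance, which is why the listing and the assembler notation do not map 1:1.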

Re: Branch prediction hints

<jwvk0nmr2u5.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=17168&group=comp.arch#17168

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
Date: Tue, 25 May 2021 16:14:47 -0400
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <jwvk0nmr2u5.fsf-monnier+comp.arch@gnu.org>
References: <s8c0j2$q5d$1@newsreader4.netcologne.de>
<s8cmv1$1e7$1@dont-email.me> <s8csfm$172$1@dont-email.me>
<s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me>
<s8e60c$ca6$1@dont-email.me> <s8eabb$ig$1@dont-email.me>
<13fa2553-eaf2-43df-a87a-3559a45d88a0n@googlegroups.com>
<s8ev0d$t2t$1@dont-email.me>
<23b797e3-2809-4cd4-a5b4-2085a35f98cen@googlegroups.com>
<s8f9bo$dup$1@dont-email.me>
<1430abb7-231c-4f12-985a-44b623c2fcafn@googlegroups.com>
<s8h31l$vv$1@dont-email.me>
<4d114eb0-d49c-478c-b49c-7270cb39a687n@googlegroups.com>
<s8hf4q$mrr$1@dont-email.me>
<4185805d-4d21-408c-9169-f04bb7a5782cn@googlegroups.com>
<s8hl1s$mao$1@dont-email.me>
<f896ec86-770a-49f0-a2c6-6cb3ad96038en@googlegroups.com>
<s8jcu2$u5j$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="0b4c3ff4f4e7019f60bb0b337854c91f";
logging-data="6558"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+cfn4xjocvGI7AGtRUyyeZ"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Cancel-Lock: sha1:7znpqc+AT+YQAnnAsgwIy58mOSY=
sha1:cUmJN/PRY+WiSli7u2a/t/iUMcw=
 by: Stefan Monnier - Tue, 25 May 2021 20:14 UTC

>>> B: Byte
>>> W: Word (16-bit)
>>> L: DWord (32-bit)
>>> Q: QWord (64-bit)
>>> X: XWord (128-bit)
>> <
>> Why not OWord O=Oct=8
>
> While 'OWord' is possible, don't really see this name used.
>
> There seems to be an established convention:
> * Q=64 bit
> * X=128 bit
> * Y=256 bit
> * Z=512 bit

I vote for:

1 = Byte
2 = 16bit word
3 = 32bit word
4 = 64bit word
5 = 128bit word

Much easier to remember than arbitrary Q/L/W/X soup.
Tho, admittedly, there's a case to be made for

0 = bit
1 = 2bit
2 = nibble
3 = byte
4 = 16bit word
5 = 32bit word
6 = 64bit word
7 = 128bit word
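
Both numberings are just logarithms of the width, which is what makes them easy to remember. A small sketch (the function names are invented for this example):

```python
def bits_from_byte_code(code):
    """First scheme: 1 = byte, 2 = 16 bit, ... 5 = 128 bit."""
    return 8 << (code - 1)


def bits_from_bit_code(code):
    """Second scheme: 0 = bit, 1 = 2 bit, 2 = nibble, ... 7 = 128 bit."""
    return 1 << code
```

In the second scheme the code is exactly log2 of the width in bits; in the first it is log2 of the width in bytes, plus one.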

Stefan

Re: Branch prediction hints

<s8jopp$tvt$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17170&group=comp.arch#17170

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
Date: Tue, 25 May 2021 16:04:52 -0500
Organization: A noiseless patient Spider
Lines: 52
Message-ID: <s8jopp$tvt$1@dont-email.me>
References: <s8c0j2$q5d$1@newsreader4.netcologne.de>
<s8cmv1$1e7$1@dont-email.me> <s8csfm$172$1@dont-email.me>
<s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me>
<s8e60c$ca6$1@dont-email.me> <s8eabb$ig$1@dont-email.me>
<13fa2553-eaf2-43df-a87a-3559a45d88a0n@googlegroups.com>
<s8ev0d$t2t$1@dont-email.me>
<23b797e3-2809-4cd4-a5b4-2085a35f98cen@googlegroups.com>
<s8f9bo$dup$1@dont-email.me>
<1430abb7-231c-4f12-985a-44b623c2fcafn@googlegroups.com>
<s8h31l$vv$1@dont-email.me>
<4d114eb0-d49c-478c-b49c-7270cb39a687n@googlegroups.com>
<s8hf4q$mrr$1@dont-email.me>
<4185805d-4d21-408c-9169-f04bb7a5782cn@googlegroups.com>
<s8hl1s$mao$1@dont-email.me>
<f896ec86-770a-49f0-a2c6-6cb3ad96038en@googlegroups.com>
<s8jcu2$u5j$1@dont-email.me> <jwvk0nmr2u5.fsf-monnier+comp.arch@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 25 May 2021 21:04:57 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="2660c744ddf30ee30e8ef7bd54ded436";
logging-data="30717"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/DaOf/nl0WDhSN4Ii0u2U2"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.2
Cancel-Lock: sha1:0Dqep+qUc9KDRgeczvMuE8FCluk=
In-Reply-To: <jwvk0nmr2u5.fsf-monnier+comp.arch@gnu.org>
Content-Language: en-US
 by: BGB - Tue, 25 May 2021 21:04 UTC

On 5/25/2021 3:14 PM, Stefan Monnier wrote:
>>>> B: Byte
>>>> W: Word (16-bit)
>>>> L: DWord (32-bit)
>>>> Q: QWord (64-bit)
>>>> X: XWord (128-bit)
>>> <
>>> Why not OWord O=Oct=8
>>
>> While 'OWord' is possible, don't really see this name used.
>>

Looking around some more, it seems that OWord and DQWord may be more
common than XWord in this context...

>> There seems to be an established convention:
>> * Q=64 bit
>> * X=128 bit
>> * Y=256 bit
>> * Z=512 bit
>
> I vote for:
>
> 1 = Byte
> 2 = 16bit word
> 3 = 32bit word
> 4 = 64bit word
> 5 = 128bit word
>
> Much easier to remember than arbitrary Q/L/W/X soup.
> Tho, admittedly, there's a case to be made for
>
> 0 = bit
> 1 = 2bit
> 2 = nibble
> 3 = byte
> 4 = 16bit word
> 5 = 32bit word
> 6 = 64bit word
> 7 = 128bit word
>

Dunno, not seen that much use of numbers in (typical) ASM mnemonics.

In cases I have used numbers, it has usually been for the desired size
of the instruction encoding.

Though, in any case, there isn't that much consistency for things like
naming conventions or syntax rules when it comes to ASM, just sort of
general trends.

Re: Branch prediction hints

<6166a2dd-9174-48f2-96ee-01723d86c647n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17204&group=comp.arch#17204

X-Received: by 2002:a37:6852:: with SMTP id d79mr2473064qkc.369.1622107493389;
Thu, 27 May 2021 02:24:53 -0700 (PDT)
X-Received: by 2002:aca:3644:: with SMTP id d65mr4717221oia.122.1622107493163;
Thu, 27 May 2021 02:24:53 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 27 May 2021 02:24:52 -0700 (PDT)
In-Reply-To: <s8jopp$tvt$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=99.251.79.92; posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 99.251.79.92
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <s8cmv1$1e7$1@dont-email.me>
<s8csfm$172$1@dont-email.me> <s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me>
<s8e60c$ca6$1@dont-email.me> <s8eabb$ig$1@dont-email.me> <13fa2553-eaf2-43df-a87a-3559a45d88a0n@googlegroups.com>
<s8ev0d$t2t$1@dont-email.me> <23b797e3-2809-4cd4-a5b4-2085a35f98cen@googlegroups.com>
<s8f9bo$dup$1@dont-email.me> <1430abb7-231c-4f12-985a-44b623c2fcafn@googlegroups.com>
<s8h31l$vv$1@dont-email.me> <4d114eb0-d49c-478c-b49c-7270cb39a687n@googlegroups.com>
<s8hf4q$mrr$1@dont-email.me> <4185805d-4d21-408c-9169-f04bb7a5782cn@googlegroups.com>
<s8hl1s$mao$1@dont-email.me> <f896ec86-770a-49f0-a2c6-6cb3ad96038en@googlegroups.com>
<s8jcu2$u5j$1@dont-email.me> <jwvk0nmr2u5.fsf-monnier+comp.arch@gnu.org> <s8jopp$tvt$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6166a2dd-9174-48f2-96ee-01723d86c647n@googlegroups.com>
Subject: Re: Branch prediction hints
From: robfi...@gmail.com (robf...@gmail.com)
Injection-Date: Thu, 27 May 2021 09:24:53 +0000
Content-Type: text/plain; charset="UTF-8"
 by: robf...@gmail.com - Thu, 27 May 2021 09:24 UTC

On Tuesday, May 25, 2021 at 5:05:00 PM UTC-4, BGB wrote:
> On 5/25/2021 3:14 PM, Stefan Monnier wrote:
> >>>> B: Byte
> >>>> W: Word (16-bit)
> >>>> L: DWord (32-bit)
> >>>> Q: QWord (64-bit)
> >>>> X: XWord (128-bit)
> >>> <
> >>> Why not OWord O=Oct=8
> >>
> >> While 'OWord' is possible, don't really see this name used.
> >>
> Looking around some more, it seems that OWord and DQWord may be more
> common than XWord in this context...
> >> There seems to be an established convention:
> >> * Q=64 bit
> >> * X=128 bit
> >> * Y=256 bit
> >> * Z=512 bit
> >
> > I vote for:
> >
> > 1 = Byte
> > 2 = 16bit word
> > 3 = 32bit word
> > 4 = 64bit word
> > 5 = 128bit word
> >
> > Much easier to remember than arbitrary Q/L/W/X soup.
> > Tho, admittedly, there's a case to be made for
> >
> > 0 = bit
> > 1 = 2bit
> > 2 = nibble
> > 3 = byte
> > 4 = 16bit word
> > 5 = 32bit word
> > 6 = 64bit word
> > 7 = 128bit word
> >
> Dunno, not seen that much use of numbers in (typical) ASM mnemonics.
>
> In cases I have used numbers, it has usually been for the desired size
> of the instruction encoding.
>
> Though, in any case, there isn't that much consistency for things like
> naming conventions or syntax rules when it comes to ASM, just sort of
> general trends.

I have been following the convention suggested on the MMIX page by Knuth I think.
Greek Letters IIRC except for byte, wyde.
B = Byte
W = Wyde (16 bits)
T = Tetra (32 bits)
O = Octa (64 bits)
H = Hexi (128 bits)

Re: HW Transactions [was Branch prediction hints]

<SYMrI.1324$z%.1288@fx06.iad>

https://www.novabbs.com/devel/article-flat.php?id=17212&group=comp.arch#17212

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer02.ams4!peer.am4.highwinds-media.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx06.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: HW Transactions [was Branch prediction hints]
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <Z7tqI.61928$N%1.35599@fx28.iad> <c42f2bb0-6920-44d2-8877-cff238443ca7n@googlegroups.com> <s8earo$a9k$1@dont-email.me> <e977b069-6ce0-46bf-ac2a-f7fb85ef9f6cn@googlegroups.com> <sBRqI.613761$nn2.535143@fx48.iad> <7fb7aaf5-f3d2-42c0-b258-f670603330f5n@googlegroups.com> <7R7rI.95638$iT.92788@fx19.iad> <7b244afc-8b29-4df3-86c3-c4e9f7a025ffn@googlegroups.com>
In-Reply-To: <7b244afc-8b29-4df3-86c3-c4e9f7a025ffn@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 156
Message-ID: <SYMrI.1324$z%.1288@fx06.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Thu, 27 May 2021 13:16:34 UTC
Date: Thu, 27 May 2021 09:16:23 -0400
X-Received-Bytes: 8119
 by: EricP - Thu, 27 May 2021 13:16 UTC

MitchAlsup wrote:
> On Tuesday, May 25, 2021 at 9:29:26 AM UTC-5, EricP wrote:
>> MitchAlsup wrote:
>>> On Monday, May 24, 2021 at 12:43:55 PM UTC-5, EricP wrote:
>>>> MitchAlsup wrote:
>>>>> On Sunday, May 23, 2021 at 2:36:26 PM UTC-5, Ivan Godard wrote:
>>>>>> On 5/23/2021 9:07 AM, MitchAlsup wrote:
>>>>>>> On Sunday, May 23, 2021 at 8:54:03 AM UTC-5, EricP wrote:
>>>>>>>> If you don't have a hint to explicitly block speculation at the
>>>>>>>> branch then the design would have to use more complicated and
>>>>>>>> probably error prone dynamic logic to "deduce" what to do.
>>>>>>> <
>>>>>>> You do not want Naked memory refs to be used to setup or complete
>>>>>>> ATOMIC events. You need to "mark" their participation in the event
>>>>>>> so the machine knows that such an event is going on from the out-
>>>>>>> set.
>>>>> <
>>>>>> This also permits you to have intra-transaction stores that are not part
>>>>>> of the transaction, say for logging and debugging, where you don't lock
>>>>>> the log memory.
>>>>> <
>>>>> Yes, exactly--and that is where the word "participating" came into use
>>>>> when doing the ASF at AMD. Since you CANNOT single step through
>>>>> an ATOMIC event, you need some way of figuring out what is going
>>>>> on, lobbing registers into a memory buffer for later printing is the
>>>>> prescribed means.
>>>>> <
>>>> I also did not want to checkpoint the register set at the start
>>>> of a transaction and roll them all back on abort.
>>>> I want memory protected but not registers.
>>>>
>>>> When I played about (on paper) with the ASF design I found
>>>> it quite inconvenient that there was no way to communicate
>>>> anything from inside a transaction to outside.
>>>>
>>>> In my design when a transaction starts, it remembers the start PC.
>>> <
>>> ESM does this too, records the starting IP and if interference happens
>>> the ATOMIC event restarts there. ESM also uses the Branch on memory
>>> interference instruction to reset this point as an escape point should the
>>> event fail later.
>>> <
>>> I admit that that was a problem in ASF.
>>> <
>>>> If an abort occurs, it tosses the protected memory changes,
>>>> and jumps back to that PC and sets a status into a register,
>>>> but any other registers already retired have their values retained.
> <
>> I also thought it might be nice to pass the PC of the last retired
>> instruction to the fail abort code. Though it requires that an extra
>> register be reserved to pass this to the failure handler.
> <
> But the failure of the ATOMIC event is not associated with an instruction
> in this CPU, but was caused asynchronously by an instruction in a different
> CPU! All the IP from this thread gives you is what inst was holding up this
> thread. Even this is specious because it might have been running "just
> fine".

Like an interrupt it is caused asynchronously so we don't know in advance
which transaction instructions will have retired and written their
registers or committed their non-protected stores, and which have not.

But the abort is precise in that all instructions prior to the abort IP
have retired and their register write-back and non-protected stores
have been performed, and none of the instructions at or after the
abort IP have written back or stored.

In other words, the register state is not the 360 Model 91 free-for-all.

All the abort does is toss any transaction protected memory stores
(these were being held in cache, tucked aside in a pending state).
It can't toss any non-protected stores as they were already
performed when that store instruction retired.
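
A toy software model of that abort rule, as a sketch only: protected stores sit in a pending buffer until commit, while non-protected stores go straight to memory. The class and method names are invented here and do not correspond to any real ASF or ESM interface.

```python
class Transaction:
    """Toy model: protected stores are buffered, plain stores are not."""

    def __init__(self, memory):
        self.memory = memory    # shared dict: address -> value
        self.pending = {}       # protected stores, held aside until commit

    def store_protected(self, addr, value):
        self.pending[addr] = value

    def store_plain(self, addr, value):
        # Performed when the store retires; an abort cannot undo it.
        self.memory[addr] = value

    def load(self, addr):
        # The transaction sees its own pending stores first.
        return self.pending.get(addr, self.memory.get(addr))

    def commit(self):
        self.memory.update(self.pending)
        self.pending.clear()

    def abort(self):
        # Toss only the protected stores.
        self.pending.clear()
```

After an `abort()`, a log entry written with `store_plain` is still in memory, while the protected update is gone.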

>>> <
>>> ESM does nothing to the registers upon fail either, but it specifically
>>> states that the compiler is not allowed to use the values in the
>>> "participating" registers. The non-participating registers may not
>>> have been updated in a von Neumann order, either.
>>> <
>>>> The abort handler would then reload the registers it wants.
>>> <
>>> Yes, this is the "compiler cannot use" part.
>> I don't see why you would require such a rule.
> <
> At the instant of failure, one must consider all data obtained recently
> from the concurrent data structure as stale.

Yes, but that doesn't mean the register contents are not precise
as of the abort point and therefore in an unpredictable state.
If inside the transaction it does

ADD r3=r1+r2
INC r4

then if r4 is updated we know r3 was updated.

Maybe you have a definition of "participating register"
that is different than what I expected that meant.
I assumed "participating register" means any register
updated by an instruction within a transaction.

These registers are still written in order at write-back/retire.

> <
> Imagine that this thread is starting an ATOMIC event and that after getting
> the first instruction performed, the thread gets context switched and
> 1000 other threads get a shot at the CDS. Now the originating thread
> gets control again and continues. What is the probability that the read
> performed 10M cycles ago remains useful ?

Yes, the register values MAY be stale wrt memory,
it depends on the transaction algorithm,
but the registers were written in program order.

>> I want to be able to, for example, have a high-water mark counter
>> register inside a transaction to indicate to a failure handler
>> how far we made it.
> <
> At various steps along the event path, you use a branch-on-interference
> to change the control point associated with failure. Thus, when you
> arrive at failure point[k] you failed in critical block[k].
> <

I take it you mean repeatedly updating the abort jump destination
while the transaction is running.

I didn't think of this.
Good idea.
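
As I understand it, in software terms it would look roughly like this. This is a sketch only: an exception stands in for the HW-detected failure, and `run_event`, `Interference` and `fail_point` are made-up names.

```python
class Interference(Exception):
    """Stands in for a HW-detected failure of the ATOMIC event."""


def run_event(blocks):
    """Run critical blocks in order, retargeting the failure point each time."""
    fail_point = 0                  # control point taken on failure
    try:
        for k, block in enumerate(blocks):
            fail_point = k          # analogous to a branch-on-interference
            block()
    except Interference:
        return ("failed", fail_point)
    return ("done", None)
```

If block k is interrupted, the handler is entered knowing k, without a separate counter register.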

>> Allocate R1 to this and set it to zero before transaction starts.
>> At each interior milestone increment R1. If aborted R1 indicates
>> how far we made it (though the abort PC could do this too).
>>
>> I can also see other register values from inside the failed
>> transaction might be helpful, say for restarting a transaction
>> from an intermediate point.
> <
> Sounds dangerous to me. At any point of failure, HW releases all of the
> memory that used to be "participating" in the event.

Here I'm thinking of maybe an update counter on an object.
If there is a collision and an abort, on restart one might
check the update counter to see if anything changed.

Or maybe a binary tree with free nodes only recovered at epochs.
You walk the tree to a node, try to update, and abort.
On restart if the epoch has not changed then you can restart at the
same node you previously found, knowing the object space has not
been recovered.

In both cases it is necessary to convey information from
inside an aborted transaction to outside,
and registers are preferable to non-protected stores.
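
A minimal sketch of the epoch check, with a dict lookup standing in for the tree walk. `Tree`, `walk` and `restart` are hypothetical names, and the saved epoch and node play the role of register values that survive the abort.

```python
class Tree:
    """Toy stand-in for a tree whose free nodes are recovered at epochs."""

    def __init__(self):
        self.epoch = 0      # bumped whenever free nodes are recovered
        self.nodes = {}     # key -> node; a real tree would be walked

    def walk(self, key):
        return self.nodes[key]


def restart(tree, key, saved_epoch, saved_node):
    """Reuse the node found before the abort only if no recovery happened."""
    if tree.epoch == saved_epoch:
        return saved_node       # node storage has not been recycled
    return tree.walk(key)       # epoch changed: walk again from the top
```

This only guarantees that the saved node has not been recovered; whether it is still the right node to update is a separate question.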

Re: HW Transactions [was Branch prediction hints]

<635d5d66-4824-4cee-8fec-df26f14f5d96n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17217&group=comp.arch#17217

X-Received: by 2002:a0c:99db:: with SMTP id y27mr4314824qve.19.1622134043503;
Thu, 27 May 2021 09:47:23 -0700 (PDT)
X-Received: by 2002:a4a:a5c2:: with SMTP id k2mr3475104oom.5.1622134043263;
Thu, 27 May 2021 09:47:23 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 27 May 2021 09:47:23 -0700 (PDT)
In-Reply-To: <SYMrI.1324$z%.1288@fx06.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:14ec:dfa0:a657:35a2;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:14ec:dfa0:a657:35a2
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <Z7tqI.61928$N%1.35599@fx28.iad>
<c42f2bb0-6920-44d2-8877-cff238443ca7n@googlegroups.com> <s8earo$a9k$1@dont-email.me>
<e977b069-6ce0-46bf-ac2a-f7fb85ef9f6cn@googlegroups.com> <sBRqI.613761$nn2.535143@fx48.iad>
<7fb7aaf5-f3d2-42c0-b258-f670603330f5n@googlegroups.com> <7R7rI.95638$iT.92788@fx19.iad>
<7b244afc-8b29-4df3-86c3-c4e9f7a025ffn@googlegroups.com> <SYMrI.1324$z%.1288@fx06.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <635d5d66-4824-4cee-8fec-df26f14f5d96n@googlegroups.com>
Subject: Re: HW Transactions [was Branch prediction hints]
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 27 May 2021 16:47:23 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Thu, 27 May 2021 16:47 UTC

On Thursday, May 27, 2021 at 8:16:45 AM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Tuesday, May 25, 2021 at 9:29:26 AM UTC-5, EricP wrote:
> >> MitchAlsup wrote:
> >>> On Monday, May 24, 2021 at 12:43:55 PM UTC-5, EricP wrote:
> >>>> MitchAlsup wrote:
> >>>>> On Sunday, May 23, 2021 at 2:36:26 PM UTC-5, Ivan Godard wrote:
> >>>>>> On 5/23/2021 9:07 AM, MitchAlsup wrote:
> >>>>>>> On Sunday, May 23, 2021 at 8:54:03 AM UTC-5, EricP wrote:
> >>>>>>>> If you don't have a hint to explicitly block speculation at the
> >>>>>>>> branch then the design would have to use more complicated and
> >>>>>>>> probably error prone dynamic logic to "deduce" what to do.
> >>>>>>> <
> >>>>>>> You do not want Naked memory refs to be used to setup or complete
> >>>>>>> ATOMIC events. You need to "mark" their participation in the event
> >>>>>>> so the machine knows that such an event is going on from the out-
> >>>>>>> set.
> >>>>> <
> >>>>>> This also permits you to have intra-transaction stores that are not part
> >>>>>> of the transaction, say for logging and debugging, where you don't lock
> >>>>>> the log memory.
> >>>>> <
> >>>>> Yes, exactly--and that is where the word "participating" came into use
> >>>>> when doing the ASF at AMD. Since you CANNOT single step through
> >>>>> an ATOMIC event, you need some way of figuring out what is going
> >>>>> on, lobbing registers into a memory buffer for later printing is the
> >>>>> prescribed means.
> >>>>> <
> >>>> I also did not want to checkpoint the register set at the start
> >>>> of a transaction and roll them all back on abort.
> >>>> I want memory protected but not registers.
> >>>>
> >>>> When I played about (on paper) with the ASF design I found
> >>>> it quite inconvenient that there was no way to communicate
> >>>> anything from inside a transaction to outside.
> >>>>
> >>>> In my design when a transaction starts, it remembers the start PC.
> >>> <
> >>> ESM does this too, records the starting IP and if interference happens
> >>> the ATOMIC event restarts there. ESM also uses the Branch on memory
> >>> interference instruction to reset this point as an escape point should the
> >>> event fail later.
> >>> <
> >>> I admit that that was a problem in ASF.
> >>> <
> >>>> If an abort occurs, it tosses the protected memory changes,
> >>>> and jumps back to that PC and sets a status into a register,
> >>>> but any other registers already retired have their values retained.
> > <
> >> I also thought it might be nice to pass the PC of the last retired
> >> instruction to the fail abort code. Though it requires that an extra
> >> register be reserved to pass this to the failure handler.
> > <
> > But the failure of the ATOMIC event is not associated with an instruction
> > in this CPU, but was caused asynchronously by an instruction in a different
> > CPU! All the IP from this thread gives you is what inst was holding up this
> > thread. Even this is specious because it might have been running "just
> > fine".
<
> Like an interrupt it is caused asynchronously so we don't know in advance
> which transaction instructions will have retired and written their
> registers or committed their non-protected stores, and which have not.
>
> But the abort is precise in that all instructions prior to the abort IP
> have retired and their register write-back and non-protected stores
> have been performed, and none of the instructions at or after the
> abort IP have written back or stored.
<
I thought we were using the term 'abort' when a thread gets into a transaction
and decides for itself that the transaction cannot be completed, and takes
steps to undo any damage it may have caused--whereas 'fail' is HW detected
and vectors to a point allowing no damage to the concurrent data structure.
>
> In other words, the register state is not the 360 Model 91 free-for-all.
>
> All the abort does is toss any transaction protected memory stores
> (these were being held in cache, tucked aside in a pending state).
> It can't toss any non-protected stores as they were already
> performed when that store instruction retired.
<
On this we agree: participating memory is undamaged, while stores to
non-participating memory may or may not have been performed (depending
on where fail is detected.)
> >>> <
> >>> ESM does nothing to the registers upon fail either, but it specifically
> >>> states that the compiler is not allowed to use the values in the
> >>> "participating" registers. The non-participating registers may not
> >>> have been updated in a von Neumann order, either.
> >>> <
> >>>> The abort handler would then reload the registers it wants.
> >>> <
> >>> Yes, this is the "compiler cannot use" part.
> >> I don't see why you would require such a rule.
> > <
> > At the instant of failure, one must consider all data obtained recently
> > from the concurrent data structure as stale.
<
> Yes, but that doesn't mean the register contents are not precise
> as of the abort point and therefore in an unpredictable state.
<
I should point out that this is a violation of von Neumann order, and
necessarily so.
<
> If inside the transaction it does
>
> ADD r3=r1+r2
> INC r4
>
> then if r4 is updated we know r3 was updated.
<
My point is::
<
LDD Rd,[Rcds+Rindex+offset]:LOCK
......
fail: // the value in Rd may not be the value in [Rcds+Rindex+offset] anymore
<
Thus the compiler cannot consider Rd as having a well known value.
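
The same point as a toy model, with a dict standing in for memory and a plain assignment standing in for the other CPU's store (`load_then_interfere` is a made-up name, not an ESM primitive):

```python
def load_then_interfere(memory, addr, new_value):
    """Load into a 'register', then let another agent overwrite memory."""
    rd = memory[addr]           # models the LDD ... :LOCK load
    memory[addr] = new_value    # models interference from a different CPU
    return rd, memory[addr]     # (register value, current memory value)
```

At the fail label the first element is still the old value, so it was written in order but no longer reflects memory.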
>
> Maybe you have a definition of "participating register"
> that is different than what I expected that meant.
> I assumed "participating register" means any register
> updated by an instruction within a transaction.
<
Only memory is Participating in a transaction.
>
> These registers are still written in order at write-back/retire.
<
Yes, but if you reach the fail point, the value is no longer necessarily
what was in the memory location it was loaded from.
> > <
> > Imagine that this thread is starting an ATOMIC event and that after getting
> > the first instruction performed, the thread gets context switched and
> > 1000 other threads get a shot at the CDS. Now the originating thread
> > gets control again and continues. What is the probability that the read
> > performed 10M cycles ago remains useful ?
<
> Yes, the register values MAY be stale wrt memory,
> it depends on the transaction algorithm,
> but the registers were written in program order.
<
> >> I want to be able to, for example, have a high-water mark counter
> >> register inside a transaction to indicate to a failure handler
> >> how far we made it.
> > <
> > At various steps along the event path, you use a branch-on-interference
> > to change the control point associated with failure. Thus, when you
> > arrive at failure point[k] you failed in critical block[k].
> > <
> I take it you mean repeatedly updating the abort jump destination
> while the transaction is running.
<
Yes
>
> I didn't think of this.
> Good idea.
<
> >> Allocate R1 to this and set it to zero before transaction starts.
> >> At each interior milestone increment R1. If aborted R1 indicates
> >> how far we made it (though the abort PC could do this too).
> >>
> >> I can also see other register values from inside the failed
> >> transaction might be helpful, say for restarting a transaction
> >> from an intermediate point.
> > <
> > Sounds dangerous to me. At any point of failure, HW releases all of the
> > memory that used to be "participating" in the event.
<
> Here I'm thinking of maybe an update counter on an object.
> If there is a collision and an abort, on restart one might
> check the update counter to see if anything changed.
<
Data in registers is "as when" the transfer of control took place.
Data in participating memory is "as before" the ATOMIC event began
Data in non-participating memory is "as whatever actually happened".
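
The three rules above can be modeled in a few lines. The following is a minimal Python sketch under stated assumptions, not a description of any real hardware: the `Event` class, its method names, and the choice to hold participating stores in a side buffer until commit are all invented for illustration.

```python
class Event:
    """Toy model of one ATOMIC event (hypothetical; not a real ISA)."""

    def __init__(self, memory, participating):
        self.memory = memory                    # shared memory: address -> value
        self.participating = set(participating)
        self.pending = {}                       # buffered participating stores
        self.regs = {}                          # registers are never checkpointed

    def load(self, reg, addr):
        # A load sees this event's own pending store first.
        self.regs[reg] = self.pending.get(addr, self.memory[addr])

    def store(self, addr, value):
        if addr in self.participating:
            self.pending[addr] = value          # held aside until commit
        else:
            self.memory[addr] = value           # performed immediately

    def commit(self):
        self.memory.update(self.pending)
        self.pending.clear()

    def fail(self):
        self.pending.clear()                    # participating memory: "as before"


memory = {"head": 10, "log": 0}
ev = Event(memory, participating={"head"})
ev.load("r1", "head")                   # r1 = 10
ev.regs["r2"] = ev.regs["r1"] + 1       # r2 = 11
ev.store("head", ev.regs["r2"])         # participating: buffered only
ev.store("log", 99)                     # non-participating: already visible
ev.fail()                               # interference detected

print(memory)     # head is unchanged, log was updated
print(ev.regs)    # registers keep whatever had been written
```

Note that the later store to `log` survives while the earlier store to `head` does not, which is the out-of-program-order effect an outside observer can see after a failure.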
>
> Or maybe a binary tree with free nodes only recovered at epochs.
> You walk the tree to a node, try to update, and abort.
> On restart if the epoch has not changed then you can restart at the
> same node you previously found, knowing the object space has not
> been recovered.
>
The top of the tree could have been altered and if you re-walk
the tree from the top, you could very well end up at a different node
you want to process.
>
> In both cases it is necessary to convey information from
> inside an aborted transaction to outside,
> and registers are preferable to non-protected stores.
<
I guess I have to soften my assertion that only the values loaded from
the concurrent data structure and modifications to them become stale on fail.


Re: HW Transactions [was Branch prediction hints]

<ykOsI.6077$jf1.5447@fx37.iad>

https://www.novabbs.com/devel/article-flat.php?id=17270&group=comp.arch#17270

From: ThatWoul...@thevillage.com (EricP)
Newsgroups: comp.arch
Subject: Re: HW Transactions [was Branch prediction hints]
In-Reply-To: <635d5d66-4824-4cee-8fec-df26f14f5d96n@googlegroups.com>
Message-ID: <ykOsI.6077$jf1.5447@fx37.iad>
Date: Sun, 30 May 2021 11:37:23 -0400
 by: EricP - Sun, 30 May 2021 15:37 UTC

MitchAlsup wrote:
> On Thursday, May 27, 2021 at 8:16:45 AM UTC-5, EricP wrote:
>> MitchAlsup wrote:
>>> On Tuesday, May 25, 2021 at 9:29:26 AM UTC-5, EricP wrote:
>>>> MitchAlsup wrote:
>>>>> On Monday, May 24, 2021 at 12:43:55 PM UTC-5, EricP wrote:
>>>>>> MitchAlsup wrote:
>>>>>>> On Sunday, May 23, 2021 at 2:36:26 PM UTC-5, Ivan Godard wrote:
>>>>>>>> On 5/23/2021 9:07 AM, MitchAlsup wrote:
>>>>>>>>> On Sunday, May 23, 2021 at 8:54:03 AM UTC-5, EricP wrote:
>>>>>>>>>> If you don't have a hint to explicitly block speculation at the
>>>>>>>>>> branch then the design would have to use more complicated and
>>>>>>>>>> probably error prone dynamic logic to "deduce" what to do.
>>>>>>>>> <
>>>>>>>>> You do not want Naked memory refs to be used to set up or complete
>>>>>>>>> ATOMIC events. You need to "mark" their participation in the event
>>>>>>>>> so the machine knows that such an event is going on from the
>>>>>>>>> outset.
>>>>>>> <
>>>>>>>> This also permits you to have intra-transaction stores that are not part
>>>>>>>> of the transaction, say for logging and debugging, where you don't lock
>>>>>>>> the log memory.
>>>>>>> <
>>>>>>> Yes, exactly--and that is where the word "participating" came into use
>>>>>>> when doing the ASF at AMD. Since you CANNOT single step through
>>>>>>> an ATOMIC event, you need some way of figuring out what is going
>>>>>>> on; lobbing registers into a memory buffer for later printing is the
>>>>>>> prescribed means.
>>>>>>> <
>>>>>> I also did not want to checkpoint the register set at the start
>>>>>> of a transaction and roll them all back on abort.
>>>>>> I want memory protected but not registers.
>>>>>>
>>>>>> When I played about (on paper) with the ASF design I found
>>>>>> it quite inconvenient that there was no way to communicate
>>>>>> anything from inside a transaction to outside.
>>>>>>
>>>>>> In my design when a transaction starts, it remembers the start PC.
>>>>> <
>>>>> ESM does this too, records the starting IP and if interference happens
>>>>> the ATOMIC event restarts there. ESM also uses the Branch on memory
>>>>> interference instruction to reset this point as an escape point should the
>>>>> event fail later.
>>>>> <
>>>>> I admit that that was a problem in ASF.
>>>>> <
>>>>>> If an abort occurs, it tosses the protected memory changes,
>>>>>> and jumps back to that PC and sets a status into a register,
>>>>>> but any other registers already retired have their values retained.
>>> <
>>>> I also thought it might be nice to pass the PC of the last retired
>>>> instruction to the fail abort code. Though it requires an extra
>>>> register be reserved to pass this to the failure handler.
>>> <
>>> But the failure of the ATOMIC event is not associated with an instruction
>>> in this CPU, but was caused asynchronously by an instruction in a different
>>> CPU ! All the IP from this thread gives you is what inst was holding up this
>>> thread. Even this is specious because it might have been running "just
>>> fine".
> <
>> Like an interrupt it is caused asynchronously so we don't know in advance
>> which transaction instructions will have retired and written their
>> registers or committed their non-protected stores, and which have not.
>>
>> But the abort is precise in that all instructions prior to the abort IP
>> have retired and their register write-back and non-protected stores
>> have been performed, and none of the instructions at or after the
>> abort IP have written back or stored.
> <
> I thought we were using the term 'abort' when a thread gets into transaction
> and decides for itself that the transaction cannot be completed, and takes
> steps to undo any damage it may have caused--whereas 'fail' is HW detected
> and vectors to a point allowing no damage to the concurrent data structure.

I was using 'abort' and 'fail' interchangeably since once the
rollback mechanism is invoked it works the same in both cases.
It doesn't have to work that way though.

So in the case of an AxAbort instruction, the IP is for that instruction.
In the case of a collision, it is the IP whenever the abort/fail
is injected into the instruction stream at Retire.

>> In other words, the register state is not the 370 model 91 free-for-all.
>>
>> All the abort does is toss any transaction protected memory stores
>> (these were being held in cache, tucked aside in a pending state).
>> It can't toss any non-protected stores as they were already
>> performed when that store instruction retired.
> <
> On this we agree, participating memory is undamaged, non-participating
> memory may or may not have been performed (depending on where fail
> is detected.)
>>>>> <
>>>>> ESM does nothing to the registers upon fail either, but it specifically
>>>>> states that the compiler is not allowed to use the values in the
>>>>> "participating" registers. The non-participating registers may not
>>>>> have been updated in a von Neumann order, either.
>>>>> <
>>>>>> The abort handler would then reload the registers it wants.
>>>>> <
>>>>> Yes, this is the "compiler cannot use" part.
>>>> I don't see why you would require such a rule.
>>> <
>>> At the instant of failure, one must consider all data obtained recently
>>> from the concurrent data structure as stale.
> <
>> Yes, but that doesn't mean the register contents are not precise
>> as of the abort point and therefore in an unpredictable state.
> <
> I should point out that this is a violation of vonNeumann order, and
> necessarily so........
> <

???
I was trying to clarify what you meant by your statement "the compiler
is not allowed to use the values in the "participating" registers".

I think the phrasing of your statement was too broad as it implies that
any register updated during a transaction is left in an unpredictable
state, not just those that were loaded from protected memory locations.

But you clarified your meaning below when you said
"Data in registers is "as when" the transfer of control took place"
which is what I mean too.

This allows registers to be used to pass information from inside a
transaction to the failure handler.

I see the rollback mechanism as being somewhat like an interrupt.
The AxBegin instruction remembers the IP and starts the transaction.
As the transaction proceeds it sets the equivalent of watchpoints
on protected memory locations. Once set the watchpoints can trigger
the equivalent of an interrupt any time up to the Commit.
If contention occurs and this cpu loses, then the watchpoints
are all canceled and a special interrupt is jammed into the
instruction queue at Retire which causes the IP to jump
back to the AxBegin instruction.

So the register state is consistent as of the last retired instruction
before the abort/fail interrupt was delivered.
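
As a rough illustration of that last point, here is a minimal Python sketch of in-order retirement with the fail delivered between instructions. It is a toy, not a pipeline model: the helper `retire_until_fail` and the register dictionary are assumptions made for the example, and the two instructions are the ADD/INC pair from earlier in the thread.

```python
def retire_until_fail(instructions, regs, fail_after):
    """Retire instructions in program order; stop when the fail is injected.

    Each instruction is a function that updates `regs` in place.
    Returns the number of instructions that retired.
    """
    retired = 0
    for instr in instructions:
        if retired == fail_after:      # fail "interrupt" delivered at Retire
            break
        instr(regs)
        retired += 1
    return retired


def add_r3(regs):                      # ADD r3 = r1 + r2
    regs["r3"] = regs["r1"] + regs["r2"]


def inc_r4(regs):                      # INC r4
    regs["r4"] += 1


body = [add_r3, inc_r4]
states = []
for fail_after in range(len(body) + 1):
    regs = {"r1": 5, "r2": 7, "r3": 0, "r4": 0}
    retire_until_fail(body, regs, fail_after)
    states.append((regs["r3"], regs["r4"]))

print(states)
```

In every reachable state, r4 having been incremented implies r3 holds the sum. That is what lets a register such as r4 serve as a progress marker for the failure handler.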

>> If inside the transaction it does
>>
>> ADD r3=r1+r2
>> INC r4
>>
>> then if r4 is updated we know r3 was updated.
> <
> My point is::
> <
> LDD Rd,[Rcds+Rindex+offset]:LOCK
> ......
> fail: // the value in Rd may not be the value in [Rcds+Rindex+offset] anymore
> <
> Thus the compiler cannot consider Rd as having a well known value.

Yes. My concern was the state _other_ registers were left in
when the collision abort/fail interrupt is delivered,
and you clarified that.

>> Maybe you have a definition of "participating register"
>> that is different than what I expected that meant.
>> I assumed "participating register" means any register
>> updated by an instruction within a transaction.
> <
> Only memory is Participating in a transaction.


Re: HW Transactions [was Branch prediction hints]

<17df51db-c8e1-48ba-84c9-2fad0979c4c1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17272&group=comp.arch#17272

Newsgroups: comp.arch
Date: Sun, 30 May 2021 09:09:43 -0700 (PDT)
In-Reply-To: <ykOsI.6077$jf1.5447@fx37.iad>
Message-ID: <17df51db-c8e1-48ba-84c9-2fad0979c4c1n@googlegroups.com>
Subject: Re: HW Transactions [was Branch prediction hints]
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Sun, 30 May 2021 16:09 UTC

On Sunday, May 30, 2021 at 10:39:14 AM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Thursday, May 27, 2021 at 8:16:45 AM UTC-5, EricP wrote:

> >> Yes, but that doesn't mean the register contents are not precise
> >> as of the abort point and therefore in an unpredictable state.
> > <
> > I should point out that this is a violation of vonNeumann order, and
> > necessarily so........
> > <
> ???
<
For example: there is a store to non-participating memory that completes
which is topologically AFTER a ST to participating memory that is wiped-out
by the HW fail mechanism. This will be observed as non-vonNeumann
behavior. Some instructions executed to completion and others did not.
<
> I was trying to clarify what you meant by your statement "the compiler
> is not allowed to use the values in the "participating" registers".
<
My wording is sloppy, but the registers containing data read from the CDS
and the values derived/calculated from those reads have all become stale.
Pretty much like a Compare-and-Swap which fails the compare, that
value is no longer relevant to the attempt to "perform" something ATOMIC.
<
> I think the phrasing of your statement was too broad as it implies that
> any register updated during a transaction is left in an unpredictable
> state, not just those that were loaded from protected memory locations.
<
You have no way of knowing where the failure took place (it is asynchronous)
so you don't know how many instructions were executed, all you know is that
control arrived at the failure point.
<
Certainly you can agree that the destination registers of instructions not
executed do not contain useful values ? or the memory locations of stores
that were not performed ?
<
For the myriad of ATOMIC primitives I have looked at, this pretty much
constitutes all of the instructions in that event !
>
> But you clarified your meaning below when you said
> "Data in registers is "as when" the transfer of control took place"
> which is what I mean too.
>
> This allows registers to be used to pass information from inside a
> transaction to the failure handler.
>
> I see the rollback mechanism as being somewhat like an interrupt.
> The AxBegin instruction remembers the IP and starts the transaction.
> As the transaction proceeds it sets the equivalent of watchpoints
> on protected memory locations. Once set the watchpoints can trigger
> the equivalent of an interrupt any time up to the Commit.
> If contention occurs and this cpu loses, then the watchpoints
> are all canceled and a special interrupt is jammed into the
> instruction queue at Retire which causes the IP to jump
> back to the AxBegin instruction.
<
Acceptably accurate.
>
> So the register state is consistent as of the last retired instruction
> before the abort/fail interrupt was delivered.
<
Yes, but what is the compiler allowed to use from that state ?
<
> >> If inside the transaction it does
> >>
> >> ADD r3=r1+r2
> >> INC r4
> >>
> >> then if r4 is updated we know r3 was updated.
> > <
> > My point is::
> > <
> > LDD Rd,[Rcds+Rindex+offset]:LOCK
> > ......
> > fail: // the value in Rd may not be the value in [Rcds+Rindex+offset] anymore
> > <
> > Thus the compiler cannot consider Rd as having a well known value.
<
> Yes. My concern was the state _other_ registers were left in
> when the collision abort/fail interrupt is delivered,
> and you clarified that.
<
> >> Maybe you have a definition of "participating register"
> >> that is different than what I expected that meant.
> >> I assumed "participating register" means any register
> >> updated by an instruction within a transaction.
> > <
> > Only memory is Participating in a transaction.
<
> Yes, we are sync'd up.
<
> >> These registers are still written in order at write-back/retire.
> > <
> > Yes, but if you reach the fail point, the value is no longer necessarily
> > what was in memory where it was loaded from.
<
> Yes, we are sync'd up.
<snip>
> > <
> > Data in registers is "as when" the transfer of control took place.
> > Data in participating memory is "as before" the ATOMIC event began
> > Data in non-participating memory is "as whatever actually happened".
<
> Yes, this is what I was saying.
<
That is the HW view of things, but what is the compiler allowed to know
about register and memory state ?
>
> The question of what state the registers are left in comes as a
> consequence of the specification not including the register set
> checkpoint and rollback that other HTM's have.
> And this answers it.
<
The HW side, but what about the compiler side ?
<
Is the compiler supposed to know that R17 contains useful data but
R18 does not when both were manipulated inside the failing event ?
How would the compiler figure this out?
<
Which is why I set up the rule where the compiler should forget everything
that happened inside a failing ATOMIC event.
<
> >> Or maybe a binary tree with free nodes only recovered at epochs.
> >> You walk the tree to a node, try to update, and abort.
> >> On restart if the epoch has not changed then you can restart at the
> >> same node you previously found, knowing the object space has not
> >> been recovered.
> >>
> > The top of the tree could have been altered and if you re-walk
> > the tree from the top, you could very well end up at a different node
> > you want to process.
<
> Possibly, but I don't want my usage assumptions to limit any algorithms.
<
> >> In both cases it is necessary to convey information from
> >> inside an aborted transaction to outside,
> >> and registers are preferable to non-protected stores.
> > <
> > I guess I have to soften my assertion that only the values loaded from
> > the concurrent data structure and modifications to them become stale on fail.

Re: Branch prediction hints

<86v96cbd8q.fsf@linuxsc.com>

https://www.novabbs.com/devel/article-flat.php?id=17874&group=comp.arch#17874

From: tr.17...@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
Date: Thu, 17 Jun 2021 10:41:09 -0700
Message-ID: <86v96cbd8q.fsf@linuxsc.com>
 by: Tim Rentsch - Thu, 17 Jun 2021 17:41 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:

> On 5/24/2021 1:16 PM, Marcus wrote:
>
>> On 2021-05-23, MitchAlsup wrote:
>>
>>> On Sunday, May 23, 2021 at 12:15:04 PM UTC-5, robf...@gmail.com wrote:
>>>
>>>> Speaking of the usefulness of branch hints for prediction I have to agree
>>>> that they are not that useful. As a gag though I added the ability to supply
>>>> branch predictor hints in "if" statements that also allowed the branch
>>>> predictor to be selected. How useful is it to be able to select the branch
>>>> predictor to use (assuming multiple predictors are present)?
>>>> The only case I can think of is maybe power savings.
>>>
>>> I might note that Virtual Vector Method loops do not use the branch predictor
>>> but are executed in advance of the loop iteration to effectively perform as if
>>> the branch took zero cycles when the loop terminates (and zero cycles when
>>> the loop continues.)
>>> This improves the prediction accuracy of the "rest of the branches".
>>> PREDication also does not use the branch predictor getting the HW setup
>>> to execute either then-clause or else-clause. This also improves the
>>> prediction accuracy of the "rest of the branches".
>>
>> Those are nice properties, and some of it reminds me of DSP style
>> "hardware assisted loops" (e.g. SPLOOP in TI 320C66x).
>>
>> MRISC32 style vector loops still have regular loop branch instructions,
>> and although they execute less frequently (the core of the vector loop
>> is "hardware assisted"), they still occupy slots in the branch
>> predictor.
>>
>> BTW, apart from VVM, are there any good examples of ISA:s with loop
>> instructions that are easy to predict ahead of time (thus effectively
>> unrolling loops, eliminating compares/branches, and reducing branch
>> predictor load)?
>
> I think IBM S/360 had something like this. [...]

Are you thinking of BXH/BXLE?

(It's amazing that I still remember this stuff after 50 years...)

Re: Branch prediction hints

<sag2u2$68g$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17876&group=comp.arch#17876

From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Branch prediction hints
Date: Thu, 17 Jun 2021 11:05:53 -0700
Message-ID: <sag2u2$68g$1@dont-email.me>
In-Reply-To: <86v96cbd8q.fsf@linuxsc.com>
 by: Stephen Fuld - Thu, 17 Jun 2021 18:05 UTC

On 6/17/2021 10:41 AM, Tim Rentsch wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>
>> On 5/24/2021 1:16 PM, Marcus wrote:

snip

>>> BTW, apart from VVM, are there any good examples of ISA:s with loop
>>> instructions that are easy to predict ahead of time (thus effectively
>>> unrolling loops, eliminating compares/branches, and reducing branch
>>> predictor load)?
>>
>> I think IBM S/360 had something like this. [...]
>
> Are you thinking of BXH/BXLE?
>
> (It's amazing that I still remember this stuff after 50 years...)

Could be. It has been almost 50 years for me, and I just don't
remember. :-)

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Branch prediction hints

<47552fb7-dba9-47e7-9406-a24108985564n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17880&group=comp.arch#17880

Newsgroups: comp.arch
Date: Thu, 17 Jun 2021 11:34:41 -0700 (PDT)
In-Reply-To: <86v96cbd8q.fsf@linuxsc.com>
Message-ID: <47552fb7-dba9-47e7-9406-a24108985564n@googlegroups.com>
Subject: Re: Branch prediction hints
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Thu, 17 Jun 2021 18:34 UTC

On Thursday, June 17, 2021 at 12:41:11 PM UTC-5, Tim Rentsch wrote:
> Stephen Fuld <sf...@alumni.cmu.edu.invalid> writes:
>
> > On 5/24/2021 1:16 PM, Marcus wrote:
> >
> >> On 2021-05-23, MitchAlsup wrote:
> >>
> >>> On Sunday, May 23, 2021 at 12:15:04 PM UTC-5, robf...@gmail.com wrote:
> >>>
> >>>> Speaking of the usefulness of branch hints for prediction I have to agree
> >>>> that they are not that useful. As a gag though I added the ability to supply
> >>>> branch predictor hints in "if" statements that also allowed the branch
> >>>> predictor to be selected. How useful is it to be able to select the branch
> >>>> predictor to use (assuming multiple predictors are present)?
> >>>> The only case I can think of is maybe power savings.
> >>>
> >>> I might note that Virtual Vector Method loops do not use the branch predictor
> >>> but are executed in advance of the loop iteration to effectively perform as if
> >>> the branch took zero cycles when the loop terminates (and zero cycles when
> >>> the loop continues.)
> >>> This improves the prediction accuracy of the "rest of the branches".
> >>> PREDication also does not use the branch predictor getting the HW setup
> >>> to execute either then-clause or else-clause. This also improves the
> >>> prediction accuracy of the "rest of the branches".
> >>
> >> Those are nice properties, and some of it reminds me of DSP style
> >> "hardware assisted loops" (e.g. SPLOOP in TI 320C66x).
> >>
> >> MRISC32 style vector loops still have regular loop branch instructions,
> >> and although they execute less frequently (the core of the vector loop
> >> is "hardware assisted"), they still occupy slots in the branch
> >> predictor.
> >>
> >> BTW, apart from VVM, are there any good examples of ISA:s with loop
> >> instructions that are easy to predict ahead of time (thus effectively
> >> unrolling loops, eliminating compares/branches, and reducing branch
> >> predictor load)?
> >
> > I think IBM S/360 had something like this. [...]
>
> Are you thinking of BXH/BXLE?
>
> (It's amazing that I still remember this stuff after 50 years...)
<
BXH and BXLE are control transfers for LOOPs; so it is wise to assume
that LOOP control transfers are taken (in the absence of a real predictor).
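
For readers who have not met these instructions: BXLE (Branch on Index Low or Equal) adds an increment to an index register, compares the sum with a limit, and branches while the index is still at or below the limit; BXH branches when the index is above it. A rough Python model of a loop closed by BXLE follows. The function name `bxle_loop` and the plain integer arguments are simplifying assumptions; the real instruction takes the increment and comparand from registers and works on fixed-width signed values.

```python
def bxle_loop(start, increment, limit):
    """Visit indices the way a BXLE-closed loop would.

    The body runs once before the first test, as in a do-while loop.
    """
    visited = []
    index = start
    while True:
        visited.append(index)          # loop body
        index += increment             # BXLE: add the increment to the index
        if not index <= limit:         # ...and branch back while index <= limit
            break
    return visited


print(bxle_loop(0, 4, 12))
```

The backward branch is taken on every iteration except the last, which is why a static "assume taken" guess does well for this kind of instruction.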

Re: Branch prediction hints

<ff0a5ce5-aa87-4c26-8809-78e28ff351a5n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17890&group=comp.arch#17890

Newsgroups: comp.arch
Date: Thu, 17 Jun 2021 15:19:42 -0700 (PDT)
In-Reply-To: <47552fb7-dba9-47e7-9406-a24108985564n@googlegroups.com>
Message-ID: <ff0a5ce5-aa87-4c26-8809-78e28ff351a5n@googlegroups.com>
Subject: Re: Branch prediction hints
From: jsav...@ecn.ab.ca (Quadibloc)
 by: Quadibloc - Thu, 17 Jun 2021 22:19 UTC

On Thursday, June 17, 2021 at 12:34:43 PM UTC-6, MitchAlsup wrote:
> On Thursday, June 17, 2021 at 12:41:11 PM UTC-5, Tim Rentsch wrote:
> > Stephen Fuld <sf...@alumni.cmu.edu.invalid> writes:

> > > I think IBM S/360 had something like this. [...]

> > Are you thinking of BXH/BXLE?

> > (It's amazing that I still remember this stuff after 50 years...)

> BXH and BXLE are control transfers for LOOPs; so it is wise to assume
> that LOOP control transfers are taken (in the absence of a real predictor).

Yes. Of course, they weren't originally intended to serve for branch hinting;
they simply integrated very common functions into a single instruction.

John Savard
