
Re: Branch prediction hints

<4d114eb0-d49c-478c-b49c-7270cb39a687n@googlegroups.com>


https://www.novabbs.com/computers/article-flat.php?id=17132&group=comp.arch#17132

Newsgroups: comp.arch
X-Received: by 2002:ac8:4756:: with SMTP id k22mr29203199qtp.193.1621892211350; Mon, 24 May 2021 14:36:51 -0700 (PDT)
X-Received: by 2002:aca:c64a:: with SMTP id w71mr795956oif.44.1621892211105; Mon, 24 May 2021 14:36:51 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 24 May 2021 14:36:50 -0700 (PDT)
In-Reply-To: <s8h31l$vv$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:ed6f:f412:a8c7:989c; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:ed6f:f412:a8c7:989c
References: <s8c0j2$q5d$1@newsreader4.netcologne.de> <s8cmv1$1e7$1@dont-email.me> <s8csfm$172$1@dont-email.me> <s8d40h$73f$1@dont-email.me> <s8dakc$27r$1@dont-email.me> <s8e60c$ca6$1@dont-email.me> <s8eabb$ig$1@dont-email.me> <13fa2553-eaf2-43df-a87a-3559a45d88a0n@googlegroups.com> <s8ev0d$t2t$1@dont-email.me> <23b797e3-2809-4cd4-a5b4-2085a35f98cen@googlegroups.com> <s8f9bo$dup$1@dont-email.me> <1430abb7-231c-4f12-985a-44b623c2fcafn@googlegroups.com> <s8h31l$vv$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <4d114eb0-d49c-478c-b49c-7270cb39a687n@googlegroups.com>
Subject: Re: Branch prediction hints
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 24 May 2021 21:36:51 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 222
 by: MitchAlsup - Mon, 24 May 2021 21:36 UTC

On Monday, May 24, 2021 at 3:41:28 PM UTC-5, BGB wrote:
> On 5/24/2021 12:52 PM, MitchAlsup wrote:
> > On Sunday, May 23, 2021 at 11:16:58 PM UTC-5, BGB wrote:
> >> On 5/23/2021 8:32 PM, MitchAlsup wrote:

> >>> Would I be out of line to state that this sounds like a poor starting point?
> >> Probably.
> >>
> >> It is more akin to designing for a 16-bit ISA, where it doesn't take
> >> much to eat through pretty much all of it.
> Clarification:
> I had meant, "it probably was a poor starting point" rather than "it was
> probably out of line"...
<
I knew that.......
> >>> <
> >>> My 66000 has 1/3rd of its Major OpCode space unallocated,
> >>> a bit less than 1/2 of its memory reference OpCode Space allocated,
> >>> a bit less than 1/2 of its 2-operand OpCode Space allocated,
> >>> a bit less than 1/128 of its 1-operand OpCode Space allocated,
> >>> and 1/4 of its 3-operand OpCode Space unallocated.
> >> Starts looking at it a little more, and realizing encoding space may be
> >> a more serious problem than I realized initially...
> >>
> >>
> >> I can't really map BJX2 to this new space, it just doesn't fit...
> >>
> >>
> >> Then again, maybe it might win more points with the "RISC means small
> >> ISA listing" crowd... Because one runs out of encoding bits before they
> >> can fit all that much into it...
> >>
> >>
> >> "Well, Imma define some Disp9 Load/Store Ops...",
> >> "Oh-Noes, that was 1/4 of the encoding space!",
> >> "How about some 3R Load/Store ops and 3R ALU ops and 2R space",
> >> "Now it's at 1/2 of the opcode space!"
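The "1/4 of the encoding space" arithmetic can be sketched as follows. This is a minimal illustration under loudly stated assumptions: a 6-bit major opcode (64 top-level slots), each Disp9 load/store form consuming an entire major opcode, and a hypothetical count of 16 such forms (8 loads plus 8 stores). The counts are illustrative only, not BGB's actual listing.

```python
from fractions import Fraction

# Assumption (illustrative): a 6-bit major opcode field, i.e. 64 top-level slots.
MAJOR_BITS = 6


def space_used(num_ops: int, opcode_bits: int) -> Fraction:
    """Fraction of the total encoding space consumed by num_ops instructions
    that are each identified by opcode_bits bits of opcode; every remaining
    bit of the instruction is an operand field (registers, displacement)."""
    return Fraction(num_ops, 2 ** opcode_bits)


# Hypothetical: 8 loads + 8 stores with Disp9, each needing its own major opcode.
disp9_ldst = space_used(16, MAJOR_BITS)

# Hypothetical: 16 more ops that each burn a whole major opcode as well.
running_total = disp9_ldst + space_used(16, MAJOR_BITS)

# For contrast, one 3R op behind an assumed 4-bit minor opcode is far cheaper.
one_3r_op = space_used(1, MAJOR_BITS + 4)

print(disp9_ldst)     # 1/4
print(running_total)  # 1/2
print(one_3r_op)      # 1/1024
```

The point is that an op identified only by the major opcode costs 1/64 of everything, while each extra opcode bit halves the cost.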
> > <
> > To be fair, I made a lot of these mistakes in Mc 88K, and corrected the
> > vast majority of them in My 66000.
> >>
> >> Then one has to struggle to fit some useful 3RI ALU ops, 2RI ops, and
> >> Branch ops, before realizing they are already basically out of encoding
> >> space...
> > <
> > The important thing to remember is that the most precious resource is
> > the Major OpCode space--and the reason is that this gives you access
> > to the other spaces.
> > <
> > In My 66000, the Major OpCode space consists of all the instructions with
> > 16-bit immediates, the branches with IP-relative offsets, and the extension
> > OpCodes, of which there are 6 {Predication, Shifts, 2R+Disp memory refs,
> > 2-Operand, 3-Operand, and 1-Operand}.
> > <
> > For all of the extended instructions, My 66000 has 3 bits to control the
> > signs of the operands and access to long immediates, and access to
> > 5-bit immediates in Src1. This supports things like 1<<k in a single instruction.
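Why a 5-bit immediate in Src1 helps: the constant being shifted, rather than the shift count, comes from the instruction, so 1<<k needs no separate instruction to put 1 in a register first. The sketch below models that in Python; the function names `sll` and `sll_imm5_src1`, the 64-bit width, the modulo-64 shift count, and treating the immediate as unsigned are all assumptions, not My 66000's documented semantics.

```python
MASK64 = (1 << 64) - 1


def sll(src1: int, src2: int) -> int:
    """64-bit logical shift left; the shift count is taken modulo 64 (assumed)."""
    return (src1 << (src2 & 63)) & MASK64


def sll_imm5_src1(imm5: int, rk: int) -> int:
    """Hypothetical shift whose first source is a 5-bit immediate and whose
    second source is a register, so 1 << k fits in a single instruction."""
    if not 0 <= imm5 < 32:
        raise ValueError("Src1 immediate must fit in 5 bits (unsigned, assumed)")
    return sll(imm5, rk)


k = 10
print(sll_imm5_src1(1, k))  # 1024
```

Without the Src1 immediate, the same result would take two instructions: one to materialize the constant 1, and one to shift it.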
> > <
> > The second most important resource is the 3-operand space because
> > there are only 8 available entries and we need FMAC (single and double),
> > CMOV, and INSert.
> > <
> > The other spaces are so partially populated that one has a pretty free
> > rein.
> OK.
>
> In my initial layout, I was starting from a 6-bit major space, with a
> 4-bit minor space for 3R ops, and an additional 6 bits for 2R ops.
<
6-bit major:: check
I only got 3-bit 3-operand OpCode because I use 3 other bits for sign
control and access to immediates::
<
FMAC Rd, R1,±R2,±R3
So you can change the sign associated with multiplication or with adding
and get 4 flavors of MACing. This seems to work usefully well with the
bit field INSert instruction as bit-inversion rather than negation.
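A minimal model of the two sign-control bits on FMAC Rd, R1, ±R2, ±R3, and of the "bit-inversion rather than negation" reuse of those bits for INSert. The FMAC part follows the operand form given above. The `ins` semantics (insert the low `width` bits of a source at bit `offset`, with an optional bitwise inversion of that source) are a hypothetical reading, since the post does not define INSert's operands. Python also rounds the product and sum separately, unlike a true fused multiply-add.

```python
MASK64 = (1 << 64) - 1


def fmac(r1: float, r2: float, r3: float,
         neg_r2: bool = False, neg_r3: bool = False) -> float:
    """Model of FMAC Rd, R1, +/-R2, +/-R3: Rd = R1 * (+/-R2) + (+/-R3).

    Python rounds the product and the sum separately; fused hardware rounds
    once, so low-order bits may differ for inexact inputs."""
    b = -r2 if neg_r2 else r2
    c = -r3 if neg_r3 else r3
    return r1 * b + c


def ins(dst: int, src: int, offset: int, width: int, inv_src: bool = False) -> int:
    """Hypothetical bit-field insert: place the low `width` bits of src
    (bitwise inverted when inv_src is set) into dst starting at bit `offset`."""
    field = (1 << width) - 1
    bits = (~src if inv_src else src) & field
    mask = (field << offset) & MASK64
    return (dst & ~mask & MASK64) | ((bits << offset) & MASK64)


# The four MAC flavors selected by the two sign bits.
for n2 in (False, True):
    for n3 in (False, True):
        print(n2, n3, fmac(2.0, 3.0, 1.0, n2, n3))
```

For floating point, the sign bits flip the sign of an operand; for a bit-field operand, the natural counterpart is flipping every bit, which is why inversion rather than negation fits INSert.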
>
> Doing a Disp9 or Imm9 op would only have the 6-bit major opcode, which
> doesn't really go all that far.
<
I guess I am missing something, here, as I get Imm16 and DISP16<<2 for both
of these, and for unconditional branches (or CALL) I get DISP26<<2. Must have
something to do with packing or unpacking of WEX.....
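To put numbers on DISP16<<2 and DISP26<<2, here is a small sketch of IP-relative target computation and reach. It assumes the displacement field is signed and that it is added to the address of the branch itself; both are assumptions, since the post gives only the field widths and the <<2 scaling.

```python
from typing import Tuple


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign) - sign


def branch_target(pc: int, disp_field: int, disp_bits: int) -> int:
    """IP-relative target = PC + (sign-extended displacement << 2).
    Assumption: the displacement is relative to the branch's own address."""
    return pc + (sign_extend(disp_field, disp_bits) << 2)


def reach(disp_bits: int) -> Tuple[int, int]:
    """Byte range reachable by a signed disp_bits field scaled by 4."""
    return (-(1 << (disp_bits - 1))) << 2, ((1 << (disp_bits - 1)) - 1) << 2


print(reach(16))  # (-131072, 131068): about +/-128 KiB for DISP16<<2
print(reach(26))  # (-134217728, 134217724): about +/-128 MiB for DISP26<<2
```

The <<2 costs nothing in reach because 32-bit-aligned instructions never need the two low address bits.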
>
>
> Meanwhile, in this top-level space, in BJX2 it was 3+4+1 (8) bits, with
> 3R ops adding 4-bits, and 2R ops adding 8 bits.
>
>
> The new design was seriously choked in the top-level space, but could
> have more space for 2R ops.
<
All of my 2-operand stuff went under 1 Major OpCode ( 001010 )
All of the 3-agen stuff went under 1 Major OpCode ( 001001 )
So I have 6 (of 64) Major OpCodes burned for everything not in the Major
OpCode group. One can tell if it has a chance of being an extension OpCode
(XOP) by looking at the top 2 bits (00), then the second top bit (xx0) is
for operand+immediate and (001) is for operand+operand.
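A sketch of the first decode step described above: pull out the 6-bit major opcode, check whether its top two bits are 00 (the only case where it can be an extension OpCode), and recognize the two groups whose values the post gives. Placing the major opcode in bits [31:26] is an assumption (it leaves exactly the 26 bits a DISP26 branch needs). The finer (xx0)/(001) split is deliberately not modeled because its exact bit positions are not spelled out.

```python
from fractions import Fraction

# Major opcodes named in the post.
XOP_2_OPERAND = 0b001010   # all 2-operand instructions
XOP_3_AGEN = 0b001001      # the "3-agen" group


def major_opcode(insn: int) -> int:
    """Extract the 6-bit major opcode.
    Assumption: it occupies bits [31:26] of a 32-bit instruction."""
    return (insn >> 26) & 0x3F


def may_be_xop(major: int) -> bool:
    """Per the post, only majors whose top 2 bits are 00 can be extension OpCodes."""
    return (major >> 4) == 0


def describe(insn: int) -> str:
    major = major_opcode(insn)
    if major == XOP_2_OPERAND:
        return "2-operand group"
    if major == XOP_3_AGEN:
        return "3-agen group"
    if may_be_xop(major):
        return "possible other extension group"
    return "ordinary major opcode"


print(describe(XOP_2_OPERAND << 26))  # 2-operand group
print(Fraction(6, 64))                # 3/32 of the major space burned on XOPs
```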
>
>
> The result would be to mostly drop Imm9 and use Imm6 instead, but:
> There is a lot more that doesn't fit into 6 bits that would have fit into 9;
> It basically precludes being able to encode an arbitrary 32 or 33 bit
> value in a 64-bit pair (since the Jumbo prefix also needs its chunk of
> encoding space, and a 27-bit jumbo prefix is basically no-go).
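The 27-bit figure follows from simple subtraction: if a 64-bit pair is a jumbo prefix plus one base instruction, the prefix must carry whatever the base instruction's immediate cannot. The sketch below shows that, plus splitting and rejoining a 33-bit pattern. It assumes the final immediate is formed by concatenating the prefix payload above the base immediate; the 24-bit prefix payload is derived from 33 - 9, not quoted from the BJX2 specification, and sign/zero extension to 64 bits is not modeled.

```python
from typing import Tuple


def jumbo_bits_needed(target_bits: int, imm_bits: int) -> int:
    """Payload a jumbo prefix must carry so prefix + base immediate = target_bits."""
    return target_bits - imm_bits


def split_imm(value: int, jumbo_bits: int, imm_bits: int) -> Tuple[int, int]:
    """Split an unsigned bit pattern into (prefix payload, base-instruction immediate)."""
    if not 0 <= value < (1 << (jumbo_bits + imm_bits)):
        raise ValueError("value does not fit in prefix + immediate")
    return value >> imm_bits, value & ((1 << imm_bits) - 1)


def join_imm(high: int, low: int, imm_bits: int) -> int:
    """Reassemble the immediate by concatenating the two fields (assumed order)."""
    return (high << imm_bits) | low


print(jumbo_bits_needed(33, 9))  # 24 with an Imm9 base instruction
print(jumbo_bits_needed(33, 6))  # 27 with an Imm6 base instruction

value = 0x1_2345_6789  # a 33-bit pattern
hi, lo = split_imm(value, 24, 9)
print(join_imm(hi, lo, 9) == value)  # True
```

Every bit taken from the base immediate has to come out of the prefix's own opcode bits, which is why shrinking Imm9 to Imm6 squeezes the jumbo prefix.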
>
>
>
> For my existing ISA, I did go and make a tweak:
> Some of the flag bits from SR are saved in the high-order bits of LR
> during function calls (and restored on function return);
> This means predication should now work across function calls, and also
> resolves a few potential ISA semantics issues involving WEX (the WEX
> Enable state and similar is also now preserved across function calls).
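A toy model of saving SR flag bits in the high-order bits of LR on a call and restoring them on return. The post does not say which flags are saved or where they sit, so the split below (return address in the low 48 bits, up to 16 flag bits in bits [63:48]) and the dictionary-based machine state are hypothetical.

```python
from typing import Tuple

ADDR_BITS = 48                      # assumed: return addresses fit in 48 bits
ADDR_MASK = (1 << ADDR_BITS) - 1
FLAG_BITS = 16                      # assumed: room for up to 16 SR flag bits


def pack_lr(return_addr: int, sr_flags: int) -> int:
    """Build LR on a call: SR flags in bits [63:48], return address below."""
    if not 0 <= return_addr <= ADDR_MASK:
        raise ValueError("return address wider than ADDR_BITS")
    if not 0 <= sr_flags < (1 << FLAG_BITS):
        raise ValueError("too many flag bits")
    return (sr_flags << ADDR_BITS) | return_addr


def unpack_lr(lr: int) -> Tuple[int, int]:
    """Split LR on return into (return address, saved SR flags)."""
    return lr & ADDR_MASK, lr >> ADDR_BITS


# Toy machine state: a predicate flag set by the caller survives the callee.
state = {"sr_flags": 0b1, "lr": 0}
state["lr"] = pack_lr(0x4000_1234, state["sr_flags"])  # call
state["sr_flags"] = 0                                  # callee clobbers the flag
ret_addr, state["sr_flags"] = unpack_lr(state["lr"])  # return restores it
print(hex(ret_addr), state["sr_flags"])                # 0x40001234 1
```

Because the flags ride along in LR, any code that already saves and restores LR (for nested calls) preserves them for free.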
>
>
> Though, it does result in LUT cost increasing by a few %, which isn't
> ideal. I can't really tell how much of this is due to an actual/significant
> cost increase vs random fluctuation.
>
> WNS hasn't really changed either way, and the WNS value usually
> indicates if one has poked at something serious. Likewise, the slow
> paths still seem to be mostly stuff within the memory subsystem.
> >>
> >>
> >> Yeah, a shortfall of several bits seems to make a pretty big difference...
> >>
> >>
> >> It goes a little further if one does Load/Store and 3RI ops using
> >> Disp6/Imm6 instead of Disp9/Imm9.
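What dropping from Disp9 to Disp6 costs in reach: three fewer bits means one-eighth the range. The sketch assumes the displacement is scaled by the access size (8 bytes here) and, since the post does not say whether the field is signed, shows both readings.

```python
from typing import Tuple


def disp_range(bits: int, scale: int, signed: bool = True) -> Tuple[int, int]:
    """Byte range covered by a `bits`-wide displacement scaled by `scale`."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return lo * scale, hi * scale


for bits in (9, 6):
    print(f"Disp{bits}, 8-byte scaled: signed {disp_range(bits, 8)}, "
          f"unsigned {disp_range(bits, 8, signed=False)}")
```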
> >>
> >> Not enough bits to encode an Imm33/Disp33 in a 64-bit pair, and not
> >> enough bits to encode Imm64 in 96-bits, ...
> >>
> >>
> >> Yeah, "poor starting point" is starting to seem fairly evident...
> >>>>
> >>>>
> >>>> Still not fully settled on instruction layouts yet, and don't feel
> >>>> particularly inclined at the moment to pursue this, since the main way
> >>>> to "actually take advantage of it" would likely require use of modulo loop
> >>>> scheduling or clever function inlining or similar (or, basically, one of
> >>>> the same issues which Itanium had to deal with).
> >>>>
> >>>> Some possible debate is whether code would benefit from a move from 32
> >>>> to 64 GPRs. Short of some tasks which come up in an OpenGL rasterizer
> >>>> (namely parallel edge walking over a bunch of parameters or similar), I
> >>>> have doubts.
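The encoding cost of going from 32 to 64 GPRs can be estimated by counting register-field bits. This sketch assumes a 32-bit instruction word whose only non-opcode fields are registers and an optional immediate; real layouts also spend bits on predication, bundling (WEX), and so on, which are ignored here.

```python
def reg_field_bits(num_regs: int) -> int:
    """Bits needed to name one of num_regs registers."""
    return (num_regs - 1).bit_length()


def opcode_bits_left(insn_bits: int, num_regs: int, reg_fields: int,
                     imm_bits: int = 0) -> int:
    """Bits remaining for opcode after register and immediate fields."""
    return insn_bits - reg_fields * reg_field_bits(num_regs) - imm_bits


for regs in (32, 64):
    three_r = opcode_bits_left(32, regs, 3)
    two_ri = opcode_bits_left(32, regs, 2, imm_bits=9)
    print(f"{regs} GPRs: 3R leaves {three_r} opcode bits, 2R+Imm9 leaves {two_ri}")
```

Under these assumptions, 3R forms lose 3 opcode bits (1/8 as many encodings) and 2R+Imm9 forms lose 2 (1/4 as many), which is the "shortfall of several bits" in concrete terms.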
> >>>>
> >>>>
> >>>> It is more likely to pay off for a wider core, but this would assume
> >>>> having a compiler which is effective enough to use the additional width
> >>>> (whereas, as-is, my compiler can't even really manage 3-wide effectively).
> >>>>
> >>> I have lived under the assumption that the wider cores have the HW resources
> >>> to do many of these things for themselves, so that code written, compiled, and
> >>> scheduled for the 1-wide cores runs within spitting distance of the best compiled
> >>> code one could target at the GBOoO core. I developed this assumption from the
> >>> Mc 88120 effort where we even achieved 2.0 IPC running SPEC 89 XLISP ! and
> >>> 5.99 IPC running MATRIX300.
> > <
> >> I am assuming a lack of any OoO or GBOoO capabilities, and instead a
> >> strictly in-order bundle-at-a-time core more like the existing BJX2
> >> pipeline, just possibly widened from 3 to 5 or similar.
> > <
> > Yes, you are targeting a particular chip to hold your design, while I am
> > designing from the very small (1-wide In Order) to the moderately large
> > (8-wide Out of Order).
> Granted.
>
>
> For me, significantly smaller FPGA's fall into the "not generally sold
> on FPGA dev boards on Amazon or similar" territory (1).
>
> And, bigger FPGA's in the "they are too expensive and I don't have money
> to afford them" territory.
>
> Similarly, custom ASIC's are well outside anything I am likely to be able
> to afford (and, the ability to "actually do things" is a
> limiting factor).

Thread: Branch prediction hints, started by Thomas Koenig on Sat, 22 May 2021 (72 replies)