

Compare-and-branch vs PIC

<b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29375&group=comp.arch#29375
X-Received: by 2002:ae9:dc01:0:b0:6fa:aee9:9d40 with SMTP id q1-20020ae9dc01000000b006faaee99d40mr80926160qkf.194.1670426769521;
Wed, 07 Dec 2022 07:26:09 -0800 (PST)
X-Received: by 2002:a05:6870:3383:b0:142:f72a:391d with SMTP id
w3-20020a056870338300b00142f72a391dmr42601396oae.23.1670426769242; Wed, 07
Dec 2022 07:26:09 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border-2.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 7 Dec 2022 07:26:08 -0800 (PST)
Injection-Info: google-groups.googlegroups.com; posting-host=2a02:8084:6020:5780:5569:6326:b7c6:d6a6;
posting-account=f4I3oAkAAABDSN7-E4aFhBpEX3HML7-_
NNTP-Posting-Host: 2a02:8084:6020:5780:5569:6326:b7c6:d6a6
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
Subject: Compare-and-branch vs PIC
From: russell....@gmail.com (Russell Wallace)
Injection-Date: Wed, 07 Dec 2022 15:26:09 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 21
 by: Russell Wallace - Wed, 7 Dec 2022 15:26 UTC

I've been trying to better understand the reasoning behind some of the design decisions that went into historical computer architectures. For example, I note that ARM-1 is not a purist RISC, but has a number of features designed to get more work done in each instruction. But one thing it does not have is compare-and-branch in one instruction; you have to compare separately, then branch on condition code. Yet compare-and-branch was not entirely unprecedented at the time. There doesn't seem to be any clear evidence that it would have added an extra cycle of delay:
https://retrocomputing.stackexchange.com/questions/25785/would-compare-and-branch-have-added-an-extra-cycle-on-arm-1

A possibility that occurs to me, though: maybe you can't use the ALU to compare operands and also add a displacement to the instruction pointer, in the same cycle. Maybe if you want compare-and-branch, you have to choose between an extra cycle of delay, vs the simpler model of branch instruction that provides bits that simply replace the low bits of the instruction pointer. Which is potentially awkward if you were hoping for position independent code.

Is that actually the case, assuming you are tightly constrained in die area (either historically on a micron-scale process, or today on a small embedded core)? Or am I missing the mark?
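
To make the two branch styles in the question concrete, here is a minimal sketch. It assumes a hypothetical 32-bit machine with a 16-bit branch field; both widths and both function names are chosen only for illustration and are not taken from ARM-1. The same code is "loaded" at different base addresses to show which style still reaches the intended target.

```python
FIELD_BITS = 16                      # assumed width of the branch field
FIELD_MASK = (1 << FIELD_BITS) - 1
ADDR_MASK = 0xFFFFFFFF               # assumed 32-bit address space


def branch_replace_low_bits(pc: int, field: int) -> int:
    """Keep the high bits of PC and replace the low bits (no adder needed)."""
    return (pc & ~FIELD_MASK & ADDR_MASK) | (field & FIELD_MASK)


def branch_pc_relative(pc: int, disp: int) -> int:
    """Add a signed displacement to PC (needs an adder)."""
    return (pc + disp) & ADDR_MASK


def demo(base: int) -> tuple[int, int]:
    """A branch at base+0x100 that is meant to reach base+0x140."""
    pc = base + 0x100
    # The field value was fixed at link time, assuming a load address of 0.
    replaced = branch_replace_low_bits(pc, 0x0140)
    relative = branch_pc_relative(pc, 0x40)
    return replaced, relative


for base in (0x0000_0000, 0x0001_0000, 0x0001_2000):
    replaced, relative = demo(base)
    print(f"base={base:#010x} wanted={base + 0x140:#010x} "
          f"replace-low-bits={replaced:#010x} pc-relative={relative:#010x}")
```

In this sketch the replace-low-bits form only survives relocation when the base is a multiple of the field's span (64 KiB here), while the PC-relative form works at any base.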

Re: Compare-and-branch vs PIC

<TYq*zcd5y@news.chiark.greenend.org.uk>

https://www.novabbs.com/devel/article-flat.php?id=29377&group=comp.arch#29377
Path: i2pn2.org!i2pn.org!news.nntp4.net!nntp.terraraq.uk!nntp-feed.chiark.greenend.org.uk!ewrotcd!.POSTED.chiark.greenend.org.uk!not-for-mail
From: theom+n...@chiark.greenend.org.uk (Theo)
Newsgroups: comp.arch
Subject: Re: Compare-and-branch vs PIC
Date: 07 Dec 2022 16:42:13 +0000 (GMT)
Organization: University of Cambridge, England
Message-ID: <TYq*zcd5y@news.chiark.greenend.org.uk>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
Injection-Info: chiark.greenend.org.uk; posting-host="chiark.greenend.org.uk:212.13.197.229";
logging-data="32732"; mail-complaints-to="abuse@chiark.greenend.org.uk"
User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (Linux/5.10.0-15-amd64 (x86_64))
Originator: theom@chiark.greenend.org.uk ([212.13.197.229])
 by: Theo - Wed, 7 Dec 2022 16:42 UTC

Russell Wallace <russell.wallace@gmail.com> wrote:
> I've been trying to better understand the reasoning behind some of the design decisions that went into historical computer architectures. For example, I note that ARM-1 is not a purist RISC, but has a number of features designed to get more work done in each instruction. But one thing it does not have is compare-and-branch in one instruction; you have to compare separately, then branch on condition code. Yet compare-and-branch was not entirely unprecedented at the time. There doesn't seem to be any clear evidence that it would have added an extra cycle of delay:
> https://retrocomputing.stackexchange.com/questions/25785/would-compare-and-branch-have-added-an-extra-cycle-on-arm-1
>
> A possibility that occurs to me, though: maybe you can't use the ALU to
> compare operands and also add a displacement to the instruction pointer,
> in the same cycle. Maybe if you want compare-and-branch, you have to
> choose between an extra cycle of delay, vs the simpler model of branch
> instruction that provides bits that simply replace the low bits of the
> instruction pointer. Which is potentially awkward if you were hoping for
> position independent code.

You can do PIC on 32-bit ARM, you just have to do:

ADDEQ pc,pc,#offset
(or whatever condition code)

instead of a branch (where #offset is calculated by your assembler). If the
range of an immediate constant is too short, you have to use a register:

MOV rN,#(offset & 0x3FC00) << 10
ORR rN,rN,#(offset & 0x3FC)
ADDEQ pc,pc,rN

which can also be generated by your assembler. (you know the bottom two
bits of the offset are going to be zero so don't need to set them)
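
The two-instruction split can be sanity-checked with a short sketch. It assumes the classic 32-bit ARM data-processing immediate (an 8-bit value rotated right by an even amount) and a non-negative, word-aligned offset below 2^18, and it applies the masks 0x3FC00 and 0x3FC directly to the offset. The function names are made up for illustration.

```python
def arm_imm_encodable(value: int) -> bool:
    """True if value fits ARM's 'imm8 rotated right by an even amount' form."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate-right by rot; if what is left
        # fits in 8 bits, the value is encodable with that rotation.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 0x100:
            return True
    return False


def split_offset(offset: int) -> tuple[int, int]:
    """Split a word-aligned offset below 2**18 into MOV and ORR immediates."""
    if offset < 0 or offset >= (1 << 18) or offset & 3:
        raise ValueError("sketch only handles 0 <= offset < 2**18, word aligned")
    high = offset & 0x3FC00   # bits 10..17, goes in the MOV
    low = offset & 0x3FC      # bits 2..9, goes in the ORR
    return high, low


offset = 0x12344
high, low = split_offset(offset)
print(f"MOV rN,#{high:#x}   encodable={arm_imm_encodable(high)}")
print(f"ORR rN,rN,#{low:#x} encodable={arm_imm_encodable(low)}")
print(f"whole offset encodable in one immediate: {arm_imm_encodable(offset)}")
```

Backward branches are outside this sketch; they would need a negative offset, which the MOV/ORR pair as written does not produce.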

If you want to do compare and relative-branch you need to use the ALU twice,
once for the compare and once for adding to the PC. It would have been
possible to do a compare with absolute branch
(PC <= cond ? immediate : PC+4)
without using the ALU, but that's not very useful.

> Is that actually the case, assuming you are tightly constrained in die
> area (either historically on a micron-scale process, or today on a small
> embedded core)? Or am I missing the mark?

These days addition is cheap, so pc <= pc + N isn't very costly. Back then,
it was.

Theo

Re: Compare-and-branch vs PIC

<VYq*6cd5y@news.chiark.greenend.org.uk>

https://www.novabbs.com/devel/article-flat.php?id=29378&group=comp.arch#29378
Path: i2pn2.org!i2pn.org!news.nntp4.net!nntp.terraraq.uk!nntp-feed.chiark.greenend.org.uk!ewrotcd!.POSTED.chiark.greenend.org.uk!not-for-mail
From: theom+n...@chiark.greenend.org.uk (Theo)
Newsgroups: comp.arch
Subject: Re: Compare-and-branch vs PIC
Date: 07 Dec 2022 16:44:27 +0000 (GMT)
Organization: University of Cambridge, England
Message-ID: <VYq*6cd5y@news.chiark.greenend.org.uk>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <TYq*zcd5y@news.chiark.greenend.org.uk>
Injection-Info: chiark.greenend.org.uk; posting-host="chiark.greenend.org.uk:212.13.197.229";
logging-data="32732"; mail-complaints-to="abuse@chiark.greenend.org.uk"
User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (Linux/5.10.0-15-amd64 (x86_64))
Originator: theom@chiark.greenend.org.uk ([212.13.197.229])
 by: Theo - Wed, 7 Dec 2022 16:44 UTC

Theo <theom+news@chiark.greenend.org.uk> wrote:
> MOV rN,#(offset & 0x3FC00) << 10
> ORR rN,rN,#(offset & 0x3FC)
> ADDEQ pc,pc,rN

should be:

MOV rN,#(offset & 0x3FC00)
ORR rN,rN,#(offset & 0x3FC)
ADDEQ pc,pc,rN

(the <<10 is inferred by the assembler)

Re: Compare-and-branch vs PIC

<c8932c84-9395-4d57-85eb-301461e32b77n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29384&group=comp.arch#29384
X-Received: by 2002:a05:620a:3714:b0:6fa:16fe:93f6 with SMTP id de20-20020a05620a371400b006fa16fe93f6mr78678168qkb.258.1670436413359;
Wed, 07 Dec 2022 10:06:53 -0800 (PST)
X-Received: by 2002:a05:6808:1452:b0:35a:812c:3eae with SMTP id
x18-20020a056808145200b0035a812c3eaemr45852804oiv.218.1670436413076; Wed, 07
Dec 2022 10:06:53 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 7 Dec 2022 10:06:52 -0800 (PST)
In-Reply-To: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4417:53b5:8891:65d7;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4417:53b5:8891:65d7
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c8932c84-9395-4d57-85eb-301461e32b77n@googlegroups.com>
Subject: Re: Compare-and-branch vs PIC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 07 Dec 2022 18:06:53 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3133
 by: MitchAlsup - Wed, 7 Dec 2022 18:06 UTC

On Wednesday, December 7, 2022 at 9:26:11 AM UTC-6, Russell Wallace wrote:
> I've been trying to better understand the reasoning behind some of the design decisions that went into historical computer architectures. For example, I note that ARM-1 is not a purist RISC, but has a number of features designed to get more work done in each instruction. But one thing it does not have is compare-and-branch in one instruction; you have to compare separately, then branch on condition code. Yet compare-and-branch was not entirely unprecedented at the time. There doesn't seem to be any clear evidence that it would have added an extra cycle of delay:
> https://retrocomputing.stackexchange.com/questions/25785/would-compare-and-branch-have-added-an-extra-cycle-on-arm-1
>
> A possibility that occurs to me, though: maybe you can't use the ALU to compare operands and also add a displacement to the instruction pointer, in the same cycle. Maybe if you want compare-and-branch, you have to choose between an extra cycle of delay, vs the simpler model of branch instruction that provides bits that simply replace the low bits of the instruction pointer. Which is potentially awkward if you were hoping for position independent code.
<
Even early RISC machines had a separate adder for branching and for integer calculations.
These are in different stages of the pipeline and have to be that way.
>
> Is that actually the case, assuming you are tightly constrained in die area (either historically on a micron-scale process, or today on a small embedded core)? Or am I missing the mark?
<
At about the 1.5 micron level, you get to choose to put the TLB on die or the FPU on die.
At 1 micron, you can put both on die.
At 0.7 micron a significant portion of cache comes on die.
At 0.5 micron, you can put reservation stations and make it GBOoO.

Re: Compare-and-branch vs PIC

<tmqlro$k893$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29389&group=comp.arch#29389
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Compare-and-branch vs PIC
Date: Wed, 7 Dec 2022 10:25:57 -0800
Organization: A noiseless patient Spider
Lines: 15
Message-ID: <tmqlro$k893$1@dont-email.me>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 7 Dec 2022 18:26:00 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="d349afc6cc529866331f0be86b5ce9e5";
logging-data="663843"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/PW0i8uccKaeacSeylqyIazkaDaAMgCPs="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:VluF62r8J1d20+pWWlZylWuo5jA=
In-Reply-To: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
Content-Language: en-US
 by: Stephen Fuld - Wed, 7 Dec 2022 18:25 UTC

On 12/7/2022 7:26 AM, Russell Wallace wrote:
> I've been trying to better understand the reasoning behind some of the design decisions that went into historical computer architectures. For example, I note that ARM-1 is not a purist RISC, but has a number of features designed to get more work done in each instruction. But one thing it does not have is compare-and-branch in one instruction; you have to compare separately, then branch on condition code. Yet compare-and-branch was not entirely unprecedented at the time. There doesn't seem to be any clear evidence that it would have added an extra cycle of delay:
> https://retrocomputing.stackexchange.com/questions/25785/would-compare-and-branch-have-added-an-extra-cycle-on-arm-1

Unless you are talking about a compare to some fixed value, expressed in
the op code (e.g. a compare to zero and branch instruction) it would
have required an extra field in the instruction, i.e. two values to
compare and a third to specify the branch target address. Given the
fixed 32 bit instructions, this would have been "a challenge". :-)
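
A rough bit budget illustrates the squeeze. The sketch below assumes the ARM branch layout (4-bit condition field, 4 bits for the branch opcode and link flag, 24-bit word offset) and then imagines a compare-and-branch that keeps the same overhead but adds two 4-bit register fields. That second format is purely hypothetical, not a real ARM encoding.

```python
INSTR_BITS = 32
COND_BITS = 4      # ARM condition field
OPCODE_BITS = 4    # '101' branch opcode plus the link bit
REG_BITS = 4       # 16 architectural registers


def displacement_bits(register_fields: int) -> int:
    """Bits left for the branch displacement after the fixed fields."""
    return INSTR_BITS - COND_BITS - OPCODE_BITS - register_fields * REG_BITS


def reach_bytes(disp_bits: int) -> int:
    """One-sided reach, in bytes, of a signed displacement counted in words."""
    return (1 << (disp_bits - 1)) * 4


for fields, label in ((0, "plain branch"), (2, "hypothetical compare-and-branch")):
    bits = displacement_bits(fields)
    print(f"{label}: {bits} displacement bits, about +/-{reach_bytes(bits) // 1024} KiB")
```

Under these assumptions the displacement drops from 24 bits to 16, and the condition field would also have to describe which comparison to perform.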

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Compare-and-branch vs PIC

<tmqnrp$kdei$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29390&group=comp.arch#29390
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Compare-and-branch vs PIC
Date: Wed, 7 Dec 2022 13:00:05 -0600
Organization: A noiseless patient Spider
Lines: 114
Message-ID: <tmqnrp$kdei$1@dont-email.me>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 7 Dec 2022 19:00:09 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="04276fcd79c75ec14666113b19eeede0";
logging-data="669138"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/4Sp+P3NcYY0MlcfypMufF"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:cLwPRyQf8wbpeayLVJWKT+tbCb0=
In-Reply-To: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
Content-Language: en-US
 by: BGB - Wed, 7 Dec 2022 19:00 UTC

On 12/7/2022 9:26 AM, Russell Wallace wrote:
> I've been trying to better understand the reasoning behind some of the design decisions that went into historical computer architectures. For example, I note that ARM-1 is not a purist RISC, but has a number of features designed to get more work done in each instruction. But one thing it does not have is compare-and-branch in one instruction; you have to compare separately, then branch on condition code. Yet compare-and-branch was not entirely unprecedented at the time. There doesn't seem to be any clear evidence that it would have added an extra cycle of delay:
> https://retrocomputing.stackexchange.com/questions/25785/would-compare-and-branch-have-added-an-extra-cycle-on-arm-1
>
> A possibility that occurs to me, though: maybe you can't use the ALU to compare operands and also add a displacement to the instruction pointer, in the same cycle. Maybe if you want compare-and-branch, you have to choose between an extra cycle of delay, vs the simpler model of branch instruction that provides bits that simply replace the low bits of the instruction pointer. Which is potentially awkward if you were hoping for position independent code.
>
> Is that actually the case, assuming you are tightly constrained in die area (either historically on a micron-scale process, or today on a small embedded core)? Or am I missing the mark?

Doing a general compare + branch has a risk of additional latency cost,
because now one has the "initiate a branch" logic depending directly on
the ALU compare logic.

In my case, I went with a single status bit (a True/False bit) that is
used (mostly) to predicate stuff.

So, say:
OP ... //Always Execute
OP?T ... //Execute Op if T is Set
OP?F ... //Execute Op if T is Clear

The BT/BF instructions (Branch if True or False), are currently encoded
as if they were:
BRA?T
BRA?F

There were some older dedicated BT/BF encodings, but they are
deprecated/dropped (in the 32-bit forms), and the space may later be
reclaimed for more 3R ops or similar.
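
A minimal sketch of how a single T bit can drive both predicated operations and branches. The instruction tuples, the `run` function, and the opcode names (`CMPEQ`, `MOV`, `ADD`) are invented for illustration and are not the actual ISA described here; only the idea that BT/BF behave like `BRA?T`/`BRA?F` comes from the text above.

```python
def run(program, regs):
    """Execute (pred, op, args) tuples; pred is None, 'T' or 'F'."""
    t_bit = False
    pc = 0
    while pc < len(program):
        pred, op, args = program[pc]
        pc += 1
        # A predicated op is skipped when the T bit does not match.
        if pred == "T" and not t_bit:
            continue
        if pred == "F" and t_bit:
            continue
        if op == "CMPEQ":      # writes only the T bit
            t_bit = regs[args[0]] == regs[args[1]]
        elif op == "MOV":      # register <- immediate
            regs[args[0]] = args[1]
        elif op == "ADD":      # register <- register + register
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "BRA":      # BT/BF are simply BRA?T / BRA?F
            pc = args[0]
        else:
            raise ValueError(f"unknown op {op}")
    return regs


PROGRAM = [
    (None, "CMPEQ", ("r1", "r2")),    # T = (r1 == r2)
    ("T", "MOV", ("r0", 1)),          # runs only if T is set
    ("F", "MOV", ("r0", 2)),          # runs only if T is clear
    ("F", "BRA", (5,)),               # "BF": skip the ADD when T is clear
    (None, "ADD", ("r0", "r0", "r1")),
]

print(run(PROGRAM, {"r0": 0, "r1": 7, "r2": 7}))
print(run(PROGRAM, {"r0": 0, "r1": 7, "r2": 3}))
```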

The use of a single status bit also allows fitting:
OP ... //Always, Normal
OP ... || //Always, Bundle
OP?T ... //If True
OP?F ... //If False

Into roughly 2 bits of encoding entropy.

There were a few special ops:
LDIx Imm24, R0
That only exist in scalar form, so:
LDIx Imm24, R0 //Scalar Form
Jumbo Ext24 //Jumbo Prefixes
OP?T ... || //If True + Bundle (ISA subset)
OP?F ... || //If False + Bundle (ISA subset)

This stays within the two bits of encoding entropy, though internally
3-bits are used in the pipeline.

If I were not using any status bits, such as for a more RISC-V like ISA
design, I would likely go instead with hard-wired compare-with-zero ops.

In this case, it is nearly as capable, but the checks are significantly
cheaper to perform (and one can use the extra bits from not having a
second register, to maybe have a slightly bigger displacement).

Say:
BZ Rn, Disp17 //Branch if Rn==0
BNZ Rn, Disp17 //Branch if Rn!=0
BLT Rn, Disp17 //Branch if Rn< 0 (Bit 63 is Set)
BGE Rn, Disp17 //Branch if Rn>=0 (Bit 63 is Clear)

In this case, one can have tristate outputs on the compare ops, say:
-1: A< B
0: A==B
1: A> B
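
As a sketch of why the compare-with-zero forms lose little, the following models the four branch conditions on a 64-bit register and pairs them with a tristate compare. The helper names (`cmp3`, `to_reg`, `branch_taken`, `less_than`) are assumptions made for this example.

```python
MASK64 = (1 << 64) - 1


def cmp3(a: int, b: int) -> int:
    """Tristate signed compare: -1 if a < b, 0 if a == b, 1 if a > b."""
    return (a > b) - (a < b)


def to_reg(value: int) -> int:
    """Store a Python int as a 64-bit two's-complement register value."""
    return value & MASK64


def branch_taken(op: str, rn: int) -> bool:
    """Branch decisions that inspect only Rn, never a second register."""
    sign = (rn >> 63) & 1
    if op == "BZ":
        return rn == 0
    if op == "BNZ":
        return rn != 0
    if op == "BLT":
        return sign == 1      # bit 63 set
    if op == "BGE":
        return sign == 0      # bit 63 clear
    raise ValueError(op)


def less_than(a: int, b: int) -> bool:
    """a < b expressed as a compare op followed by BLT on its result."""
    return branch_taken("BLT", to_reg(cmp3(a, b)))


print(less_than(3, 5), less_than(5, 3), less_than(-2, -2))
```

The branch itself then needs only a zero detect and a sign-bit test, which is the cheaper check the text refers to; the full comparison happens in an ordinary ALU op beforehand.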

I would also get rid of the user-specified Link Register, as:
This is rarely useful;
It adds a non-trivial cost to the "branch-with-link" mechanism.

While a second link register "can" be useful for prolog compression
and/or "millicode" functions, the relative savings (vs, say, manually
copying the link register to a different register and/or saving it to
the stack) are small enough to be mostly ignored.

An intermediate option would be to instead allow encodings for two
hard-wired link registers.

Say, for a 3b Branch space:
000 B Disp22
001 -
010 BL Disp22 //Branch with Link, Save Primary LR
011 BL2 Disp22 //Branch with Link, Save Alternate LR
100 BZ Rn, Disp17
101 BNZ Rn, Disp17
110 BLT Rn, Disp17
111 BGE Rn, Disp17

This could be fit into the same amount of encoding space as the original
RISC-V JAL instruction (if using a similar encoding scheme to RISC-V).
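
The space claim can be checked with simple arithmetic. The sketch assumes the standard RISC-V JAL layout (7-bit opcode, 5-bit rd, 20-bit immediate) and a 5-bit Rn field for the hypothetical branch formats; the register-field width is an assumption, since the post does not state it.

```python
# Bits JAL has available outside its 7-bit major opcode: rd plus immediate.
RISCV_JAL_PAYLOAD = 5 + 20

SELECT_BITS = 3   # the 3-bit branch selector from the table above
REG_BITS = 5      # assumption: 32 registers, as in RISC-V

FORMATS = {
    "B/BL/BL2 Disp22": SELECT_BITS + 22,
    "BZ/BNZ/BLT/BGE Rn, Disp17": SELECT_BITS + REG_BITS + 17,
}

for name, bits in FORMATS.items():
    verdict = "fits" if bits <= RISCV_JAL_PAYLOAD else "does not fit"
    print(f"{name}: {bits} bits, {verdict} in JAL's {RISCV_JAL_PAYLOAD}-bit payload")
```

Under these assumptions both formats come to exactly 25 bits, the same as JAL's rd plus immediate.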

I would not consider this design a win over my current ISA though, as
while it could fit more easily into a slightly smaller core, to get
similar performance would require more advanced hardware-level support
(such as superscalar and special logic to detect and handle short
forward branches as predication).

Similarly, there isn't a whole lot of point in trying to face off
against RISC-V with an ISA that is "nearly the same, just with slightly
cheaper branches and similar".

Though, beyond the base ISA (into the "extensions") space, would likely
do some more significant redesign (mostly to try to keep things more
consistent and for more aggressive cost minimization).

....

Re: Compare-and-branch vs PIC

<tmqof6$kdei$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29391&group=comp.arch#29391
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Compare-and-branch vs PIC
Date: Wed, 7 Dec 2022 13:10:26 -0600
Organization: A noiseless patient Spider
Lines: 37
Message-ID: <tmqof6$kdei$2@dont-email.me>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmqlro$k893$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 7 Dec 2022 19:10:30 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="04276fcd79c75ec14666113b19eeede0";
logging-data="669138"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+bd7qZZGvBruWIO+Tl9Sj7"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:EPXygJpDujIj0yAQQzXm5vuu08c=
In-Reply-To: <tmqlro$k893$1@dont-email.me>
Content-Language: en-US
 by: BGB - Wed, 7 Dec 2022 19:10 UTC

On 12/7/2022 12:25 PM, Stephen Fuld wrote:
> On 12/7/2022 7:26 AM, Russell Wallace wrote:
>> I've been trying to better understand the reasoning behind some of the
>> design decisions that went into historical computer architectures. For
>> example, I note that ARM-1 is not a purist RISC, but has a number of
>> features designed to get more work done in each instruction. But one
>> thing it does not have is compare-and-branch in one instruction; you
>> have to compare separately, then branch on condition code. Yet
>> compare-and-branch was not entirely unprecedented at the time. There
>> doesn't seem to be any clear evidence that it would have added an
>> extra cycle of delay:
>> https://retrocomputing.stackexchange.com/questions/25785/would-compare-and-branch-have-added-an-extra-cycle-on-arm-1
>
> Unless you are talking about a compare to some fixed value, expressed in
> the op code (e.g. a compare to zero and branch instruction) it would
> have required an extra field in the instruction, i.e. two values to
> compare and a third to specify the branch target address.  Given the
> fixed 32 bit instructions, this would have been "a challenge".  :-)
>

RISC-V compares an arbitrary pair of registers.

IMHO, this is not worth the cost (given compare-with-zero would have
been "nearly as effective", allow larger branch displacements, and would
have also been cheaper...).

For something more like ARM, one would need a fairly small branch
displacement, and it would likely do little to help over the use of
condition-codes.

Though, burning 4-bits on the condition code is a little steep, but they
didn't need to deal with variable-length instructions (so, they gain
back a few bits by not needing to be able to encode 16-bit operations or
similar).

Re: Compare-and-branch vs PIC

<53d3ba86-8dfa-4dcf-8b83-842e4a0573d2n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=29393&group=comp.arch#29393

X-Received: by 2002:a0c:cb0a:0:b0:4c7:a44:483a with SMTP id o10-20020a0ccb0a000000b004c70a44483amr35987955qvk.130.1670443426570;
Wed, 07 Dec 2022 12:03:46 -0800 (PST)
X-Received: by 2002:a05:6870:816:b0:143:af88:3b6c with SMTP id
fw22-20020a056870081600b00143af883b6cmr27652147oab.79.1670443426295; Wed, 07
Dec 2022 12:03:46 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 7 Dec 2022 12:03:46 -0800 (PST)
In-Reply-To: <tmqof6$kdei$2@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4417:53b5:8891:65d7;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4417:53b5:8891:65d7
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmqlro$k893$1@dont-email.me> <tmqof6$kdei$2@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <53d3ba86-8dfa-4dcf-8b83-842e4a0573d2n@googlegroups.com>
Subject: Re: Compare-and-branch vs PIC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 07 Dec 2022 20:03:46 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 4031
 by: MitchAlsup - Wed, 7 Dec 2022 20:03 UTC

On Wednesday, December 7, 2022 at 1:10:33 PM UTC-6, BGB wrote:
> On 12/7/2022 12:25 PM, Stephen Fuld wrote:

> > Unless you are talking about a compare to some fixed value, expressed in
> > the op code (e.g. a compare to zero and branch instruction) it would
> > have required an extra field in the instruction, i.e. two values to
> > compare and a third to specify the branch target address. Given the
> > fixed 32 bit instructions, this would have been "a challenge". :-)
> >
> RISC-V compares an arbitrary pair of registers.
<
Yes, It does.
>
> IMHO, this is not worth the cost (given compare-with-zero would have
> been "nearly as effective", allow larger branch displacements, and would
> have also been cheaper...).
>
My 66000 does not have a Compare-Branch; instead it uses a compare-to-zero-branch,
and a Comparison instruction followed by a branch-on-bit. I also have
Predication on bit and on condition. There are some codes where RISC-V wins,
some where we tie, and some where My 66000 wins. Overall, in integer codes,
it looks to be break-even (no winner either way); it is just different.
<
Since RISC-V does not allow constants in its compare-branch instruction,
every comparison against a nonzero constant costs an extra instruction.
In FP comparisons, RISC-V has to use a SET-T/F followed by a branch,
and in many cases that SET is a comparison with a constant, which also
adds to the instruction stream. Most of the FP compare codes are better
in My 66000 than in RISC-V, but there are outliers on both sides.
<
Many times My 66000 can "keep" a comparison as a persistent value
(in a register) and use that value again:: Say we compared x and y and
put the result of this comparison in R7. As long as R7 is not damaged,
we can extract any of the relations between x and y from the bit pattern
stored in R7 {==, !=, s>, s>=, s<, s<=, u>, u>=, u<, u<=, 0<x<y, 0<=x<y, 0<x<=y,
0<=x<=y} and a few more esoteric relations. Thus, My 66000 has as many
"condition codes" as the compiler has available registers. Any of the relations
can be used to start a predicated string, qualify a branch, or be extracted
into {0,+1} <unsigned> or {-1,0} <signed> -- without having SET instructions
in the instruction set, and without the limits SET instructions would impose.
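
A minimal sketch of the "comparison kept as a bit vector" idea described above, in Python. The bit positions, the names `compare` and `relation`, and the choice of which relations to include are all assumptions made for illustration; the thread does not give the actual My 66000 encoding, and the range relations (0<x<y and so on) are left out.

```python
# Hypothetical bit positions; the real encoding is not stated in the thread.
EQ, NE, SGT, SGE, SLT, SLE, UGT, UGE, ULT, ULE = range(10)
MASK64 = (1 << 64) - 1


def to_signed(v):
    """Interpret a 64-bit pattern as a two's-complement signed value."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v


def compare(x, y):
    """Compare two 64-bit register values once and return every relation as bits."""
    ux, uy = x & MASK64, y & MASK64
    sx, sy = to_signed(x), to_signed(y)
    flags = [ux == uy, ux != uy,
             sx > sy, sx >= sy, sx < sy, sx <= sy,
             ux > uy, ux >= uy, ux < uy, ux <= uy]
    return sum(int(f) << i for i, f in enumerate(flags))


def relation(vec, rel):
    """Branch-on-bit style query against a saved comparison result."""
    return (vec >> rel) & 1


r7 = compare(-1, 1)                 # compare once, keep the result in "R7"
as_unsigned = relation(r7, SLT)     # extracted into {0, +1}
as_signed = -relation(r7, SLT)      # extracted into {-1, 0}
```

The point of the sketch is only that one compare can feed several later branches or extractions, as long as the register holding the vector is not overwritten.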
>
>
> For something more like ARM, one would need a fairly small branch
> displacement, and it would likely do little to help over the use of
> condition-codes.
>
> Though, burning 4-bits on the condition code is a little steep, but they
> didn't need to deal with variable-length instructions (so, they gain
> back a few bits by not needing to be able to encode 16-bit operations or
> similar).
<
The only proper number of bits in the condition codes is 0.

Re: Compare-and-branch vs PIC

<8160b9ac-1da4-4b81-86dc-15ead2196240n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=29394&group=comp.arch#29394

X-Received: by 2002:a05:620a:4884:b0:6fb:c38e:27bf with SMTP id ea4-20020a05620a488400b006fbc38e27bfmr81113006qkb.351.1670443839669;
Wed, 07 Dec 2022 12:10:39 -0800 (PST)
X-Received: by 2002:a4a:dc9a:0:b0:4a0:5b89:cacd with SMTP id
g26-20020a4adc9a000000b004a05b89cacdmr18651406oou.40.1670443839370; Wed, 07
Dec 2022 12:10:39 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 7 Dec 2022 12:10:39 -0800 (PST)
In-Reply-To: <tmqnrp$kdei$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4417:53b5:8891:65d7;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4417:53b5:8891:65d7
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <tmqnrp$kdei$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8160b9ac-1da4-4b81-86dc-15ead2196240n@googlegroups.com>
Subject: Re: Compare-and-branch vs PIC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 07 Dec 2022 20:10:39 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 5953
 by: MitchAlsup - Wed, 7 Dec 2022 20:10 UTC

On Wednesday, December 7, 2022 at 1:00:13 PM UTC-6, BGB wrote:
> On 12/7/2022 9:26 AM, Russell Wallace wrote:
> > I've been trying to better understand the reasoning behind some of the design decisions that went into historical computer architectures. For example, I note that ARM-1 is not a purist RISC, but has a number of features designed to get more work done in each instruction. But one thing it does not have is compare-and-branch in one instruction; you have to compare separately, then branch on condition code. Yet compare-and-branch was not entirely unprecedented at the time. There doesn't seem to be any clear evidence that it would have added an extra cycle of delay:
> > https://retrocomputing.stackexchange.com/questions/25785/would-compare-and-branch-have-added-an-extra-cycle-on-arm-1
> >
> > A possibility that occurs to me, though: maybe you can't use the ALU to compare operands and also add a displacement to the instruction pointer, in the same cycle. Maybe if you want compare-and-branch, you have to choose between an extra cycle of delay, vs the simpler model of branch instruction that provides bits that simply replace the low bits of the instruction pointer. Which is potentially awkward if you were hoping for position independent code.
> >
> > Is that actually the case, assuming you are tightly constrained in die area (either historically on a micron-scale process, or today on a small embedded core)? Or am I missing the mark?
> Doing a general compare + branch has a risk of additional latency cost,
> because now one has the "initiate a branch" logic depending directly on
> the ALU compare logic.
>
>
> In my case, I went with a single status bit (a True/False bit) that is
> used (mostly) to predicate stuff.
>
> So, say:
> OP ... //Always Execute
> OP?T ... //Execute Op if T is Set
> OP?F ... //Execute Op if T is Clear
>
> The BT/BF instructions (Branch if True or False), are currently encoded
> as if they were:
> BRA?T
> BRA?F
>
> There were some older dedicated BT/BF encodings, but they are
> deprecated/dropped (in the 32-bit forms), and the space may later be
> reclaimed for more 3R ops or similar.
>
> The use of a single status bit also allows fitting:
> OP ... //Always, Normal
> OP ... || //Always, Bundle
> OP?T ... //If True
> OP?F ... //If False
>
> Into roughly 2 bits of encoding entropy.
<
I put this into 1-bit of entropy, and did not charge the consuming instruction
for that bit. That is, normal instructions do not waste a bit encoding a possibility
that occurs less than 20% of the time; instead, I move a group of these bits
over into a PRED instruction, which casts an execute/don't-execute shadow across a
number of instructions.
>
>
> There were a few special ops:
> LDIx Imm24, R0
> That only exist in scalar form, so:
> LDIx Imm24, R0 //Scalar Form
> Jumbo Ext24 //Jumbo Prefixes
> OP?T ... || //If True + Bundle (ISA subset)
> OP?F ... || //If False + Bundle (ISA subset)
>
> This stays within the two bits of encoding entropy, though internally
> 3-bits are used in the pipeline.
>
1-bit per instruction in the pipeline.
>
>
> If I were not using any status bits, such as for a more RISC-V like ISA
> design, I would likely go instead with hard-wired compare-with-zero ops.
>
> In this case, it is nearly as capable, but the checks are significantly
> cheaper to perform (and one can use the extra bits from not having a
> second register, to maybe have a slightly bigger displacement).
>
> Say:
> BZ Rn, Disp17 //Branch if Rn==0
> BNZ Rn, Disp17 //Branch if Rn!=0
> BLT Rn, Disp17 //Branch if Rn< 0 (Bit 63 is Set)
> BGE Rn, Disp17 //Branch if Rn>=0 (Bit 63 is Clear)
>
> In this case, one can have tristate outputs on the compare ops, say:
> -1: A< B
> 0: A==B
> 1: A> B
>
> I would also get rid of the user-specified Link Register, as:
> This is rarely useful;
> It adds a non-trivial cost to the "branch-with-link" mechanism.
>
> While a second link register "can" be useful for prolog compression
> and/or "millicode" functions, the relative savings (vs, say, manually
> copying the link register to a different register and/or saving it to
> the stack) are small enough to be mostly ignored.
<
You "could" add ENTER and EXIT instructions to the ISA and dispense
with the fake code density, along with the alterations of the control
flow (power and cycles).
>

Re: Compare-and-branch vs PIC

<9359de4c-279a-4a61-80a5-2be9f7a44eabn@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=29396&group=comp.arch#29396

X-Received: by 2002:ae9:dc01:0:b0:6fa:aee9:9d40 with SMTP id q1-20020ae9dc01000000b006faaee99d40mr81711092qkf.194.1670444778882;
Wed, 07 Dec 2022 12:26:18 -0800 (PST)
X-Received: by 2002:a05:6808:1247:b0:35a:8bd1:2f4d with SMTP id
o7-20020a056808124700b0035a8bd12f4dmr35790108oiv.261.1670444778659; Wed, 07
Dec 2022 12:26:18 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 7 Dec 2022 12:26:18 -0800 (PST)
In-Reply-To: <8160b9ac-1da4-4b81-86dc-15ead2196240n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2607:fea8:1dde:6a00:556d:873f:3bc6:a98a;
posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 2607:fea8:1dde:6a00:556d:873f:3bc6:a98a
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmqnrp$kdei$1@dont-email.me> <8160b9ac-1da4-4b81-86dc-15ead2196240n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <9359de4c-279a-4a61-80a5-2be9f7a44eabn@googlegroups.com>
Subject: Re: Compare-and-branch vs PIC
From: robfi...@gmail.com (robf...@gmail.com)
Injection-Date: Wed, 07 Dec 2022 20:26:18 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 2664
 by: robf...@gmail.com - Wed, 7 Dec 2022 20:26 UTC

I think the choice of whether to do compare-and-branch as one instruction may have to
do with the size of the branch displacement possible in an instruction format.
It takes more bits to represent the instruction if there are two register fields in
it, and that may limit the displacement size. If instructions are 16 bits, specifying
two registers in the instruction is probably not practical due to the small
displacement.

It is often not necessary to do the compare: in some processors the condition
codes are set by many instructions, and a branch can take place without needing
an explicit compare. In that case the instructions are just as compact as doing a
compare-and-branch in one instruction.

RISC-V has only a 12-bit displacement field, which is adequate most of the time,
but no doubt on rare occasions extra code needs to be inserted to branch further
away.

Another consideration is that compare instructions are needed in the instruction
set anyway, so it may take less hardware to use separate compare and branch
instructions. Compares are needed for expressions like X = a < b, whose results are
sometimes recorded instead of branched on.

My latest core stores the results in condition code registers like the PowerPC
rather than gprs. It is just a bit different. I manage then to get a 21-bit branch
displacement. For branch-to-subroutine 24-bit displacements are possible.

Re: Compare-and-branch vs PIC

<tmr3ft$lel7$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=29397&group=comp.arch#29397

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Compare-and-branch vs PIC
Date: Wed, 7 Dec 2022 16:18:33 -0600
Organization: A noiseless patient Spider
Lines: 237
Message-ID: <tmr3ft$lel7$1@dont-email.me>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmqnrp$kdei$1@dont-email.me>
<8160b9ac-1da4-4b81-86dc-15ead2196240n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 7 Dec 2022 22:18:38 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="04276fcd79c75ec14666113b19eeede0";
logging-data="703143"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18HxGjOQ4qmPLzgnL1HFZJC"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:o26oidZ12+eBWbvHd8kBzKJ5USQ=
Content-Language: en-US
In-Reply-To: <8160b9ac-1da4-4b81-86dc-15ead2196240n@googlegroups.com>
 by: BGB - Wed, 7 Dec 2022 22:18 UTC

On 12/7/2022 2:10 PM, MitchAlsup wrote:
> On Wednesday, December 7, 2022 at 1:00:13 PM UTC-6, BGB wrote:
>> On 12/7/2022 9:26 AM, Russell Wallace wrote:
>>> I've been trying to better understand the reasoning behind some of the design decisions that went into historical computer architectures. For example, I note that ARM-1 is not a purist RISC, but has a number of features designed to get more work done in each instruction. But one thing it does not have is compare-and-branch in one instruction; you have to compare separately, then branch on condition code. Yet compare-and-branch was not entirely unprecedented at the time. There doesn't seem to be any clear evidence that it would have added an extra cycle of delay:
>>> https://retrocomputing.stackexchange.com/questions/25785/would-compare-and-branch-have-added-an-extra-cycle-on-arm-1
>>>
>>> A possibility that occurs to me, though: maybe you can't use the ALU to compare operands and also add a displacement to the instruction pointer, in the same cycle. Maybe if you want compare-and-branch, you have to choose between an extra cycle of delay, vs the simpler model of branch instruction that provides bits that simply replace the low bits of the instruction pointer. Which is potentially awkward if you were hoping for position independent code.
>>>
>>> Is that actually the case, assuming you are tightly constrained in die area (either historically on a micron-scale process, or today on a small embedded core)? Or am I missing the mark?
>> Doing a general compare + branch has a risk of additional latency cost,
>> because now one has the "initiate a branch" logic depending directly on
>> the ALU compare logic.
>>
>>
>> In my case, I went with a single status bit (a True/False bit) that is
>> used (mostly) to predicate stuff.
>>
>> So, say:
>> OP ... //Always Execute
>> OP?T ... //Execute Op if T is Set
>> OP?F ... //Execute Op if T is Clear
>>
>> The BT/BF instructions (Branch if True or False), are currently encoded
>> as if they were:
>> BRA?T
>> BRA?F
>>
>> There were some older dedicated BT/BF encodings, but they are
>> deprecated/dropped (in the 32-bit forms), and the space may later be
>> reclaimed for more 3R ops or similar.
>>
>> The use of a single status bit also allows fitting:
>> OP ... //Always, Normal
>> OP ... || //Always, Bundle
>> OP?T ... //If True
>> OP?F ... //If False
>>
>> Into roughly 2 bits of encoding entropy.
> <
> I put this into 1-bit of entropy, and did not charge the consuming instruction
> for that bit. That is, normal instructions do not waste a bit encoding a possibility
> that occurs less than 20% of the time; instead, I move a group of these bits
> over into a PRED instruction, which casts an execute/don't-execute shadow across a
> number of instructions.

If I were to do stuff differently here, it would probably be to spend
less of the encoding space on 16-bit ops.

I might choose to stick with 32 GPRs, since the number of cases that
benefit from 64 GPRs is fairly modest.

If 64 GPRs is supported, it likely makes sense to treat these as a
special case encoding (ISA subset or 64-bit ops), since otherwise 6-bit
register fields would still be too much of an issue for encoding space
(and only sometimes bring much benefit).

Though kinda ugly, BJX2 sort of manages 64 GPRs "semi-effectively" with
5-bit register fields. But, not sure I would do it again...

Say, possible re-imagined encoding space:
zzzz-zzzz-zzzz-zzz0 //16 bit
zzzz-zzzz-zzzz-zzzz zzzz-zzzz-zzzz-zpp1 //32+ bit

Then, say, pp:
00: Always
01: WEX
10: ?T
11: ?F

At this point, assuming I stick with 5b register fields and modest
Imm/Disp fields, I would be ahead of where I started out with for BJX2
(which burns 3-bits on the 16/32 split).
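
As a rough illustration of the split sketched above (one bit for 16 versus 32+ bits, two bits for pp), here is a small Python decoder. It assumes the rightmost bit in the diagrams is bit 0 of the instruction word, which the post does not state explicitly; `classify` and `PRED_MODES` are hypothetical names.

```python
# pp field values as listed in the post.
PRED_MODES = {0b00: "always", 0b01: "WEX", 0b10: "?T", 0b11: "?F"}


def classify(word):
    """Return (length_in_bits, mode) for an instruction word.

    Assumption: bit 0 selects 16-bit (0) versus 32-bit-or-longer (1),
    and bits 2:1 hold the pp field for the longer forms.
    """
    if word & 1 == 0:
        return 16, None      # 16-bit ops carry no pp field in this layout
    return 32, PRED_MODES[(word >> 1) & 0b11]
```

Under this reading the length split and the predication mode together take 3 bits, the same budget the post says BJX2 spends on the 16/32 split alone.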

Possible Schemes, 16b:
zzzz-mmmm-nnnn-zzz0 //16b 2R (16 regs)
zzzz-iiii-nnnn-zzz0 //16b 2RI (Imm4/Disp4)
zzzm-mmmm-nnnn-nzz0 //16b 2R (32 regs)
zzzi-iiii-nnnn-nzz0 //16b 2RI (Imm5/Disp5, 32 regs)
zzzi-iiii-iiii-izz0 //16b Imm10

Would likely need to provide:
SP-relative Load/Store (R0..R31)
Constant Load (R0..R31)
MOV 2R (R0..R31)
CMPxx-2R and 2RI (R0..R31)
A few basic ALU ops (ADD/SUB);
Short-form Branch Instructions;
...

Could be more minimal than in BJX2, since there is little need for
16-bit ops to actually be able to run the system (but, exist primarily
so that one can save space).

3R or 3RI cases can be skipped as these don't really have enough of a
hit-rate to justify the level of hair they add.

For 32-bit ops, might choose to remain with primarily 9-bit
immediate and displacement fields, since with scaling, these are
reasonably effective.

Possible 32-bit layout:
zzzt-tttt-zzzz-ssss szzn-nnnn-zzzz-zpp1 //32b, 3R
iiii-iiii-izzz-ssss szzn-nnnn-zzzz-zpp1 //32b, 3RI Imm9
iiii-iiii-iiii-iiii izzn-nnnn-zzzz-zpp1 //32b, 2RI Imm17
iiii-iiii-iiii-iiii iiii-zzzz-zzzz-zpp1 //32b, Imm20 / Branch
iiii-iiii-iiii-iiii iiii-iiii-zzzz-zpp1 //32b, Imm24 (Jumbo)

In this case, The Branch-Op would not be allowed as WEX, and this would
encode the Jumbo block.

Possible:
zzzt-tttt-zzzz-ssss szzn-nnnn-zzz0-0pp1 //32b, 3R
iiii-iiii-izzz-ssss szzn-nnnn-zzz0-1pp1 //32b, 3RI Imm9

iiii-iiii-iiii-iiii izzn-nnnn-z011-1pp1 //32b, 2RI Imm17

Imm16/Imm17 Block:
iiii-iiii-iiii-iiii 000n-nnnn-0011-1pp1 MOV Imm16u, Rn
iiii-iiii-iiii-iiii 100n-nnnn-0011-1pp1 MOV Imm16n, Rn
iiii-iiii-iiii-iiii 001n-nnnn-0011-1pp1 ADD Imm16u, Rn
iiii-iiii-iiii-iiii 101n-nnnn-0011-1pp1 ADD Imm16n, Rn
iiii-iiii-iiii-iiii 010n-nnnn-0011-1pp1 LDSH Imm16u, Rn
iiii-iiii-iiii-iiii 110n-nnnn-0011-1pp1 FLDCH Imm16u, Rn
iiii-iiii-iiii-iiii 011n-nnnn-0011-1pp1 -
iiii-iiii-iiii-iiii 111n-nnnn-0011-1pp1 -

iiii-iiii-iiii-iiii 0ttn-nnnn-1011-1pp1 ? MOV.x Rn, (GBR, Disp16u)
iiii-iiii-iiii-iiii 1ttn-nnnn-1011-1pp1 ? MOV.x (GBR, Disp16u), Rn
00=L, 01=Q, 10=UL, 11=X
Scaled by 4 or 8 (First 256K or 512K of data/bss).

Branch Block:
iiii-iiii-iiii-iiii iiii-ii00-0111-1001 BRA Disp22
iiii-iiii-iiii-iiii iiii-ii01-0111-1001 MOV Imm22u, Rtmp
iiii-iiii-iiii-iiii iiii-ii10-0111-1001 BSR Disp22
iiii-iiii-iiii-iiii iiii-ii11-0111-1001 MOV Imm22n, Rtmp
iiii-iiii-iiii-iiii izzn-nnnn-1111-1001 Bcc Disp17

iiii-iiii-iiii-iiii iiii-iiii-0111-1011 Jumbo-Imm
iiii-iiii-iiii-iiii iiii-iiii-1111-1011 Jumbo-Op
iiii-iiii-iiii-iiii iiii-ii00-0111-1101 BT Disp22
iiii-iiii-iiii-iiii iiii-ii00-0111-1111 BF Disp22

Where, Rtmp is a fixed temporary register (likely the return-value
register).

64-bit:
iiii-iiii-iiii-iiii iiii-iiii-0111-1011 -
iiii-iiii-iiii-iiii iiii-iiii-0111-1001 BRA Abs48

96-bit:
iiii-iiii-iiii-iiii iiii-iiii-0111-1011 -
iiii-iiii-iiii-iiii iiii-iiii-0111-1011 -
iiii-iiii-iiii-iiii 000n-nnnn-0011-1pp1 MOV Imm64, Rn

iiii-iiii-iiii-iiii iiii-iiii-0111-1011 -
iiii-iiii-iiii-iiii iiii-iiii-0111-1011 -
iiii-iiii-iiii-iiii 001n-nnnn-0011-1pp1 ADD Imm64, Rn

...

Might still need some more thinking here...

>>
>>
>> There were a few special ops:
>> LDIx Imm24, R0
>> That only exist in scalar form, so:
>> LDIx Imm24, R0 //Scalar Form
>> Jumbo Ext24 //Jumbo Prefixes
>> OP?T ... || //If True + Bundle (ISA subset)
>> OP?F ... || //If False + Bundle (ISA subset)
>>
>> This stays within the two bits of encoding entropy, though internally
>> 3-bits are used in the pipeline.
>>
> 1-bit per instruction in the pipeline.

Possibly, but my scheme allows executing both the Then and Else branches
in parallel in many cases, and does not need an explicit "PRED" op, ...

Also doesn't lead to extra state that would still need to be saved
somehow if an interrupt occurs.

>>
>>
>> If I were not using any status bits, such as for a more RISC-V like ISA
>> design, I would likely go instead with hard-wired compare-with-zero ops.
>>
>> In this case, it is nearly as capable, but the checks are significantly
>> cheaper to perform (and one can use the extra bits from not having a
>> second register, to maybe have a slightly bigger displacement).
>>
>> Say:
>> BZ Rn, Disp17 //Branch if Rn==0
>> BNZ Rn, Disp17 //Branch if Rn!=0
>> BLT Rn, Disp17 //Branch if Rn< 0 (Bit 63 is Set)
>> BGE Rn, Disp17 //Branch if Rn>=0 (Bit 63 is Clear)
>>
>> In this case, one can have tristate outputs on the compare ops, say:
>> -1: A< B
>> 0: A==B
>> 1: A> B
>>
>> I would also get rid of the user-specified Link Register, as:
>> This is rarely useful;
>> It adds a non-trivial cost to the "branch-with-link" mechanism.
>>
>> While a second link register "can" be useful for prolog compression
>> and/or "millicode" functions, the relative savings (vs, say, manually
>> copying the link register to a different register and/or saving it to
>> the stack) are small enough to be mostly ignored.
> <
> You "could" add ENTER and EXIT instructions to the ISA and dispense
> with the fake code density, along with the alterations of the control
> flow (power and cycles).


Re: Compare-and-branch vs PIC

<tmr5ti$lpp1$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=29398&group=comp.arch#29398

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Compare-and-branch vs PIC
Date: Wed, 7 Dec 2022 16:59:58 -0600
Organization: A noiseless patient Spider
Lines: 102
Message-ID: <tmr5ti$lpp1$1@dont-email.me>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmqnrp$kdei$1@dont-email.me>
<8160b9ac-1da4-4b81-86dc-15ead2196240n@googlegroups.com>
<9359de4c-279a-4a61-80a5-2be9f7a44eabn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 7 Dec 2022 23:00:02 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="a706438a974cae5717481edf9c86a3eb";
logging-data="714529"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Jc52nKj46poWiNIcnrTkb"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:rCtX/QyrSUz9qmek83dlYTjsIAA=
Content-Language: en-US
In-Reply-To: <9359de4c-279a-4a61-80a5-2be9f7a44eabn@googlegroups.com>
 by: BGB - Wed, 7 Dec 2022 22:59 UTC

On 12/7/2022 2:26 PM, robf...@gmail.com wrote:
> I think the choice of whether to do compare-and-branch as one instruction may have to
> do with the size of the branch displacement possible in an instruction format.
> It takes more bits to represent the instruction if there are two register fields in
> it, and that may limit the displacement size. If instructions are 16 bits, specifying
> two registers in the instruction is probably not practical due to the small
> displacement.
>
> It is often not necessary to do the compare: in some processors the condition
> codes are set by many instructions, and a branch can take place without needing
> an explicit compare. In that case the instructions are just as compact as doing a
> compare-and-branch in one instruction.
>

As noted, I still like the 1-bit True/False scheme. It has some of the
advantages of CCs, mostly without the same sorts of drawbacks.

Failing this, fixed compare-with-zero is the second-place option (nearly
as powerful as the full-compare case, but cheaper and with fewer issues).

> RISC-V has only a 12-bit displacement field, which is adequate most of the time,
> but no doubt on rare occasions extra code needs to be inserted to branch further
> away.
>

Such is the issue.

The 12-bits falls into the category of "usually but not always sufficient".

For larger functions, the compiler would need to do something like:
Bcc Rs, Rt, .L0
JAL X0, lbl
.L0:

With 17 bits, this would be far less likely.

With 20 or 22 bits, it would drop essentially to zero (no functions are
this large).

OTOH, with 8 or 9 bits, it is more a case of "well, it is often sufficient"
(but much of the time, it is not).
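
The trade-off above can be sketched in a few lines of Python: check whether a target fits a signed displacement field of a given width, and fall back to the inverted-branch-over-jump sequence when it does not. The function names, the scale of 2 bytes per displacement unit, and the small inverse-condition table are assumptions for illustration, not any particular compiler's logic.

```python
def fits(offset_bytes, disp_bits, scale=2):
    """True if a signed, scaled displacement field of disp_bits reaches offset_bytes."""
    if offset_bytes % scale:
        return False                          # target not aligned to the scale
    units = offset_bytes // scale
    limit = 1 << (disp_bits - 1)
    return -limit <= units < limit


def lower_branch(cond, offset_bytes, disp_bits, scale=2):
    """Emit a direct conditional branch, or branch-over-jump if out of range."""
    if fits(offset_bytes, disp_bits, scale):
        return [f"B{cond} Rs, Rt, target"]
    inverse = {"EQ": "NE", "NE": "EQ", "LT": "GE", "GE": "LT"}[cond]
    # Skip over an unconditional jump when the original condition is false.
    return [f"B{inverse} Rs, Rt, .L0", "JAL X0, target", ".L0:"]
```

With these assumptions a 12-bit field reaches roughly 4 KB in either direction, while a 17-bit field reaches roughly 128 KB, which is why the fallback sequence becomes rare as the field widens.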

> Another consideration is that compare instructions are needed in the instruction
> set anyway, so it may take less hardware to use separate compare and branch
> instructions. Compares are needed for expressions like X = a < b, whose results are
> sometimes recorded instead of branched on.
>

Yeah.

RISC-V also has a "SLT" instruction for these cases:
SLT Rn, Rs, Rt
Does effectively:
Rn = Rs < Rt;

In my case, there are "MOVT Rn" / "MOVNT Rn", which copy the T bit into
a register (as-is, or inverted).
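
For comparison, a small Python model of the two ways of materializing a boolean mentioned above. It assumes 64-bit registers; `slt`, `movt` and `movnt` here are illustrative stand-ins modeled on the descriptions in this post, not definitions taken from either ISA manual.

```python
MASK64 = (1 << 64) - 1


def to_signed(v):
    """Interpret a 64-bit pattern as a two's-complement signed value."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v


def slt(rs, rt):
    """SLT-style result: 1 if rs < rt as signed values, else 0."""
    return int(to_signed(rs) < to_signed(rt))


def movt(t_bit):
    """Copy the T bit into a register as-is."""
    return t_bit & 1


def movnt(t_bit):
    """Copy the inverted T bit into a register."""
    return (t_bit & 1) ^ 1
```

In the first style the compare writes the boolean straight to a GPR; in the second a compare sets T and a separate move copies it out.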

> My latest core stores the results in condition code registers like the PowerPC
> rather than gprs. It is just a bit different. I manage then to get a 21-bit branch
> displacement. For branch-to-subroutine 24-bit displacements are possible.
>

In a few cases, I considered options for using a small set of P-bit
registers, say:
000= P0 (P0=1)
001=!P0 (P0=1)
010= P1 (T)
011=!P1 (T)
100= P2 (S)
101=!P2 (S)
110= P3 (U)
111=!P3 (U)

Where P0 is hard-wired as 1, and !P0 is a special case (such as WEX).

In this case, things like compare ops would be directed into one of the
P-bit registers.

But, the need for more than a single predicate bit is obscure enough to
make it not really worthwhile to spend an extra encoding bit on it
(needing to deal with multiple predicated branches or similar at the
same time being "fairly uncommon").
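
A sketch of how the 3-bit selector in the table above might be interpreted, in Python. It assumes the low bit is the invert flag and the upper two bits pick P0..P3, which matches the table's ordering; treating the !P0 special case as "always execute" is purely a placeholder, since the post only says it would be reserved for something like WEX.

```python
def predicate_enabled(sel, p):
    """Decide whether an op executes.

    sel: 3-bit selector from the table; p: dict mapping 1..3 to the current
    P1..P3 bits. P0 is hard-wired to 1 and is not stored.
    """
    index, invert = sel >> 1, sel & 1
    if index == 0:
        # 000 = always; 001 (!P0) is a special case such as WEX, modeled
        # here as "always execute" (an assumption, not from the post).
        return True
    return bool(p[index] ^ invert)
```

For example, with P1 (T) set, selector 010 executes and 011 does not.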

Had also experimented with a mechanism to treat P-bits as a small stack
machine, but there wasn't any good way to make use of this in my
compiler (enough to offset the cost of the mechanism to do so).

This could have allowed effectively handling modest-size if/else trees
using predication, but is "much less straightforward" for a compiler
(and both less versatile and harder to work with in the backend than
having a few free-form predicate-bit registers).

....

Re: Compare-and-branch vs PIC

<98da075d-b844-4919-8fef-258dbb70086en@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=29399&group=comp.arch#29399

X-Received: by 2002:ac8:60c3:0:b0:3a5:f9ba:8c68 with SMTP id i3-20020ac860c3000000b003a5f9ba8c68mr83197947qtm.192.1670482924609;
Wed, 07 Dec 2022 23:02:04 -0800 (PST)
X-Received: by 2002:a05:6870:9f0c:b0:13c:97e9:5d40 with SMTP id
xl12-20020a0568709f0c00b0013c97e95d40mr42668155oab.42.1670482924345; Wed, 07
Dec 2022 23:02:04 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 7 Dec 2022 23:02:04 -0800 (PST)
In-Reply-To: <tmqlro$k893$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2a02:8084:6020:5780:9d87:8d37:5e61:901f;
posting-account=f4I3oAkAAABDSN7-E4aFhBpEX3HML7-_
NNTP-Posting-Host: 2a02:8084:6020:5780:9d87:8d37:5e61:901f
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <tmqlro$k893$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <98da075d-b844-4919-8fef-258dbb70086en@googlegroups.com>
Subject: Re: Compare-and-branch vs PIC
From: russell....@gmail.com (Russell Wallace)
Injection-Date: Thu, 08 Dec 2022 07:02:04 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2110
 by: Russell Wallace - Thu, 8 Dec 2022 07:02 UTC

On Wednesday, December 7, 2022 at 6:26:02 PM UTC, Stephen Fuld wrote:
> Unless you are talking about a compare to some fixed value, expressed in
> the op code (e.g. a compare to zero and branch instruction) it would
> have required an extra field in the instruction, i.e. two values to
> compare and a third to specify the branch target address. Given the
> fixed 32 bit instructions, this would have been "a challenge". :-)

10 bits operands, 1 bit direction, 1 bit yes/no, 2 bits for kind of comparison (==, signed/uns <, and a <= b ⇔ !(b < a)), 2 bits for opcode, leaves 16 bits of displacement, which should be more than plenty. I think 2 bits for opcode is justified given the importance of this instruction, but could bump it all the way to something like 6 bits, and still get a reasonably ample 12 bits of displacement.
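
The bit budget above can be checked with a few lines of Python. The field widths are the ones given in this post; the names `FIELDS` and `displacement_bits` are just for illustration.

```python
WORD_BITS = 32

# Field widths as proposed in the post (everything except opcode and displacement).
FIELDS = {"operands": 10, "direction": 1, "yes_no": 1, "kind": 2}


def displacement_bits(opcode_bits):
    """Bits left for the displacement after the fixed fields and the opcode."""
    return WORD_BITS - sum(FIELDS.values()) - opcode_bits


two_bit_opcode = displacement_bits(2)   # 32 - 14 - 2
six_bit_opcode = displacement_bits(6)   # 32 - 14 - 6
```

With a 2-bit opcode this leaves 16 bits of displacement, and with a 6-bit opcode 12 bits, matching the figures in the post.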

Re: Compare-and-branch vs PIC

<eda3bd59-9a96-4c57-90d1-ca74d4e6a0f5n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=29409&group=comp.arch#29409

X-Received: by 2002:a05:622a:514d:b0:3a5:258c:d69c with SMTP id ew13-20020a05622a514d00b003a5258cd69cmr73124737qtb.279.1670529162487;
Thu, 08 Dec 2022 11:52:42 -0800 (PST)
X-Received: by 2002:a05:6830:d87:b0:66d:8b98:683f with SMTP id
bv7-20020a0568300d8700b0066d8b98683fmr49532142otb.40.1670529162207; Thu, 08
Dec 2022 11:52:42 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!3.us.feeder.erje.net!feeder.erje.net!border-1.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 8 Dec 2022 11:52:41 -0800 (PST)
In-Reply-To: <9359de4c-279a-4a61-80a5-2be9f7a44eabn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:99d5:b15:98cc:58d9;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:99d5:b15:98cc:58d9
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmqnrp$kdei$1@dont-email.me> <8160b9ac-1da4-4b81-86dc-15ead2196240n@googlegroups.com>
<9359de4c-279a-4a61-80a5-2be9f7a44eabn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <eda3bd59-9a96-4c57-90d1-ca74d4e6a0f5n@googlegroups.com>
Subject: Re: Compare-and-branch vs PIC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 08 Dec 2022 19:52:42 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 53
 by: MitchAlsup - Thu, 8 Dec 2022 19:52 UTC

On Wednesday, December 7, 2022 at 2:26:20 PM UTC-6, robf...@gmail.com wrote:
> I think deciding between compare-and-branch as one instruction may have to
> do with the size of the branch displacement possible in an instruction format.
> It takes more bits to represent the instruction if there are two register fields in
> it and that may limit the displacement size. If instructions are 16-bits specifying
> two registers in the instruction probably is not practical due to the small
> displacement.
<
With the Compare instruction and then branch model, there is 1 branch instruction
with a myriad of sub instructions {Br ==, Br !=, Br s>, BR s>=, Br s<, BR s<=, Br u<
Br u<=, Br u>, BR u>=, and more} all as 1 single Major OpCode (front 6 bits). I
use the bits where the destination register would be to encode the condition.
<
In contrast, the Compare-Branch model only has a few compares-and-branches
due to not wanting to waste too much OpCode space of the compare-branch.
<
My compare then branch model uses a bit vector condition, so branch on bit
falls out for free. Over on the compare-to-zero-and-branch side, I have a complete
set of compare-to-zero-branch instructions, some with the ability to test for
situations outside of normal programmatics--for example: test if an ATOMIC
sequence has suffered from interference.
>
> It is often not necessary to do the compare, in some processors the condition
> codes are set by many instructions and a branch can take place without needing
> an explicit compare. In that case the instructions are just as compact as doing a
> compare-and-branch in one instruction.
<
The SPARC experience indicates that 80% of this free work is non-useful.
>
> RISCV has only a 12-bit displacement field which is adequate most of the time,
> but no doubt on rare occasions extra code needs to be inserted to branch further
> away.
<
My 66000 has 16 bits of word displacement in the OpCode, which is inadequate
a whole lot less often than RISC-V's. You would need a subroutine with more
than 32767 instructions.
>
> Another consideration is that compare instructions are needed in the instruction
> set anyway, it may be less hardware then to use separate compare and branch
> instructions. Compares are needed for expressions like: X = a < b. which are
> sometimes recorded instead of branched on.
<
My 66000 compare instructions build a bit-vector of relations. Then my shift
instructions can extract the signed set {-1,0} or the unsigned set {0,+1}, so
different languages can access what is their definition of the right result.
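
A rough sketch of the idea, for readers who have not seen it: one compare produces a vector with one bit per relation, and a later extract pulls out a single bit as either {0,+1} or {-1,0}. The bit positions and function names below are invented for illustration; the actual My 66000 layout is not given in this thread.

```python
# Hypothetical bit positions for the relation vector (an assumption).
EQ, NE, SLT, SGE, ULT, UGE = range(6)
MASK64 = (1 << 64) - 1


def to_signed(x, bits=64):
    """Interpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def compare(a, b):
    """Build a bit vector holding several relations between a and b."""
    ua, ub = a & MASK64, b & MASK64
    sa, sb = to_signed(a), to_signed(b)
    relations = {
        EQ: ua == ub, NE: ua != ub,
        SLT: sa < sb, SGE: sa >= sb,
        ULT: ua < ub, UGE: ua >= ub,
    }
    vec = 0
    for bit, holds in relations.items():
        if holds:
            vec |= 1 << bit
    return vec


def extract_unsigned(vec, bit):
    """Extract one relation as 0 or +1 (C-style truth value)."""
    return (vec >> bit) & 1


def extract_signed(vec, bit):
    """Extract one relation as 0 or -1 (all-ones truth value)."""
    return -((vec >> bit) & 1)


vec = compare(-1, 1)
print(extract_unsigned(vec, SLT), extract_signed(vec, SLT), extract_unsigned(vec, ULT))
```

Branch-on-bit then only has to test one position of `vec`, which is why it "falls out for free" in this model.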
>
> My latest core stores the results in condition code registers like the PowerPC
> rather than gprs. It is just a bit different. I manage then to get a 21-bit branch
> displacement. For branch-to-subroutine 24-bit displacements are possible.
<
How often is 21-bits sufficient (where 16-bits would have been insufficient) ??
Do you know of a subroutine that is bigger than 1,000,000 instructions ?
<
As for direct BR and CALL one has 26-bits
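
To put numbers on the displacement widths mentioned in this post, here is a small sketch. It assumes a signed two's-complement displacement counted in instruction words; real ISAs differ in the unit they scale by, so treat the output as orders of magnitude only.

```python
def reach_in_words(bits):
    """Range of a signed two's-complement displacement of `bits` bits."""
    half = 1 << (bits - 1)
    return -half, half - 1


for bits in (12, 16, 21, 26):
    lowest, highest = reach_in_words(bits)
    print(f"{bits}-bit: {lowest} .. {highest} words")
```

Under that assumption the 16-bit case tops out at 32767 words forward, and the 21-bit case at a little over a million, which matches the figures quoted above.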

Re: Compare-and-branch vs PIC

<3urkL.797$cKvc.98@fx42.iad>

https://www.novabbs.com/devel/article-flat.php?id=29411&group=comp.arch#29411

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx42.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: sco...@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Compare-and-branch vs PIC
Newsgroups: comp.arch
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <tmqnrp$kdei$1@dont-email.me> <8160b9ac-1da4-4b81-86dc-15ead2196240n@googlegroups.com> <9359de4c-279a-4a61-80a5-2be9f7a44eabn@googlegroups.com> <eda3bd59-9a96-4c57-90d1-ca74d4e6a0f5n@googlegroups.com>
Lines: 15
Message-ID: <3urkL.797$cKvc.98@fx42.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Thu, 08 Dec 2022 20:07:27 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Thu, 08 Dec 2022 20:07:27 GMT
X-Received-Bytes: 1694
 by: Scott Lurndal - Thu, 8 Dec 2022 20:07 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Wednesday, December 7, 2022 at 2:26:20 PM UTC-6, robf...@gmail.com wrote:

>> My latest core stores the results in condition code registers like the PowerPC
>> rather than gprs. It is just a bit different. I manage then to get a 21-bit branch
>> displacement. For branch-to-subroutine 24-bit displacements are possible.
><
>How often is 21-bits sufficient (where 16-bits would have been insufficient) ??
>Do you know of a subroutine that is bigger than 1,000,000 instructions ?

For assembly code, as opposed to more constrained compiler
generated subroutines; particularly OS/VM level code, I can
see cases where the virtual address space is fragmented such that
21 bit offsets (or even 32-bit offsets) wouldn't be sufficient;
particularly in a 64-bit address space.

Re: Compare-and-branch vs PIC

<63fc980a-ef9e-49c2-95bc-40dc7545489bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29414&group=comp.arch#29414

Newsgroups: comp.arch
X-Received: by 2002:ad4:5849:0:b0:4c7:933:144c with SMTP id de9-20020ad45849000000b004c70933144cmr39039424qvb.80.1670532373330;
Thu, 08 Dec 2022 12:46:13 -0800 (PST)
X-Received: by 2002:a05:6808:1309:b0:359:d97b:3f6f with SMTP id
y9-20020a056808130900b00359d97b3f6fmr40558206oiv.298.1670532372805; Thu, 08
Dec 2022 12:46:12 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 8 Dec 2022 12:46:12 -0800 (PST)
In-Reply-To: <3urkL.797$cKvc.98@fx42.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:99d5:b15:98cc:58d9;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:99d5:b15:98cc:58d9
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmqnrp$kdei$1@dont-email.me> <8160b9ac-1da4-4b81-86dc-15ead2196240n@googlegroups.com>
<9359de4c-279a-4a61-80a5-2be9f7a44eabn@googlegroups.com> <eda3bd59-9a96-4c57-90d1-ca74d4e6a0f5n@googlegroups.com>
<3urkL.797$cKvc.98@fx42.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <63fc980a-ef9e-49c2-95bc-40dc7545489bn@googlegroups.com>
Subject: Re: Compare-and-branch vs PIC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 08 Dec 2022 20:46:13 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 3282
 by: MitchAlsup - Thu, 8 Dec 2022 20:46 UTC

On Thursday, December 8, 2022 at 2:07:31 PM UTC-6, Scott Lurndal wrote:
> MitchAlsup <Mitch...@aol.com> writes:
> >On Wednesday, December 7, 2022 at 2:26:20 PM UTC-6, robf...@gmail.com wrote:
>
> >> My latest core stores the results in condition code registers like the PowerPC
> >> rather than gprs. It is just a bit different. I manage then to get a 21-bit branch
> >> displacement. For branch-to-subroutine 24-bit displacements are possible.
> ><
> >How often is 21-bits sufficient (where 16-bits would have been insufficient) ??
> >Do you know of a subroutine that is bigger than 1,000,000 instructions ?
<
> For assembly code, as opposed to more constrained compiler
> generated subroutines; particularly OS/VM level code, I can
> see cases where the virtual address space is fragmented such that
> 21 bit offsets (or even 32-bit offsets) wouldn't be sufficient;
> particularly in a 64-bit address space.
<
Fair enough::
<
But when the Branch/call space is insufficient, I have access to both 32-bit
and 64-bit (using my constant extension). So, if the ASM programmer
writes::
<
BR/CALL Some_GuestOS_Common_Entry_Point
<
The assembler leaves it in 64-bit form. Later, when the linker resolves the
linkages and finds this is 32-bit or 26-bit capable, it (the linker) can shrink
the module.
<
When ASLR randomizes the VASs, different images may find different entry
points using different sized "constants" than the previous image or later image.
<
I should also note that, should the linker leave the linkage uniformly 64 bits,
this takes no more cycles to execute (98% confidence) than the smaller
forms: all you are saving is code footprint, not cycles. The 1-word BR address
takes the same cycle count as the 2-word BR and as the 3-word BR. So, the only
gain from compressing potentially large displacements is code footprint, not
cycles.
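
The shrinking step described above amounts to picking the narrowest form that still holds the resolved displacement. A minimal sketch, assuming signed displacements and only the three widths named in this post (26, 32 and 64 bits); the function names are hypothetical, and nothing here describes how an actual My 66000 linker is written.

```python
def fits_signed(value, bits):
    """True if value is representable as a signed `bits`-bit integer."""
    half = 1 << (bits - 1)
    return -half <= value < half


def smallest_form(displacement):
    """Pick the narrowest of the 26-, 32- and 64-bit forms that fits."""
    for bits in (26, 32, 64):
        if fits_signed(displacement, bits):
            return bits
    raise ValueError("displacement does not fit in 64 bits")


print(smallest_form(1000))      # 26
print(smallest_form(1 << 25))   # 32
print(smallest_form(1 << 40))   # 64
```

Since the choice depends on the final addresses, two images with differently randomized layouts can end up with different forms for the same call site.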

Re: Compare-and-branch vs PIC

<tmtmlq$uo06$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29419&group=comp.arch#29419

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Compare-and-branch vs PIC
Date: Thu, 8 Dec 2022 15:58:14 -0600
Organization: A noiseless patient Spider
Lines: 194
Message-ID: <tmtmlq$uo06$1@dont-email.me>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmqnrp$kdei$1@dont-email.me>
<8160b9ac-1da4-4b81-86dc-15ead2196240n@googlegroups.com>
<9359de4c-279a-4a61-80a5-2be9f7a44eabn@googlegroups.com>
<eda3bd59-9a96-4c57-90d1-ca74d4e6a0f5n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 8 Dec 2022 21:58:19 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="a706438a974cae5717481edf9c86a3eb";
logging-data="1007622"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/kuMeHgI+Z4YQ9N67J+wtA"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:kDym6j9sW0n3LaSYwpt8vML6CKE=
Content-Language: en-US
In-Reply-To: <eda3bd59-9a96-4c57-90d1-ca74d4e6a0f5n@googlegroups.com>
 by: BGB - Thu, 8 Dec 2022 21:58 UTC

On 12/8/2022 1:52 PM, MitchAlsup wrote:
> On Wednesday, December 7, 2022 at 2:26:20 PM UTC-6, robf...@gmail.com wrote:
>> I think deciding between compare-and-branch as one instruction may have to
>> do with the size of the branch displacement possible in an instruction format.
>> It takes more bits to represent the instruction if there are two register fields in
>> it and that may limit the displacement size. If instructions are 16-bits specifying
>> two registers in the instruction probably is not practical due to the small
>> displacement.
> <
> With the Compare instruction and then branch model, there is 1 branch instruction
> with a myriad of sub instructions {Br ==, Br !=, Br s>, BR s>=, Br s<, BR s<=, Br u<
> Br u<=, Br u>, BR u>=, and more} all as 1 single Major OpCode (front 6 bits). I
> use the bits where the destination register would be to encode the condition.
> <
> In contrast, the Compare-Branch model only has a few compares-and-branches
> due to not wanting to waste too much OpCode space of the compare-branch.
> <
> My compare then branch model uses a bit vector condition, so branch on bit
> falls out for free. Over on the compare-to-zero-and-branch side, I have a complete
> set of compare-to-zero-branch instructions, some with the ability to test for
> situations outside of normal programmatics--for example: test if an ATOMIC
> sequence has suffered from interference.

I ended up defining some RISC-V style compare-and-branch instructions.
Only real reason they are enabled is because the RISC-V mode needs them.

They end up taking up ~ 1/4 of the F1 block (mostly used for Load/Store
Disp9 ops).

IIRC, I did add compiler support for them, but they are still "pretty
meh": doing "if(a<b)" as a single op, but only when the target is within
+/- 256 bytes, is not that big of a win.

Ironically, "if(a<0)" tends to end up being more useful (these ended up
existing as separate ops in BJX2 mostly because of the lack of an
architectural zero register)

In the case of integer compare ops, the majority of cases tend to be
comparisons with a constant, which is on average better served, say, by
"CMPxx Imm10, Rn; BT/BF lbl"

>>
>> It is often not necessary to do the compare, in some processors the condition
>> codes are set by many instructions and a branch can take place without needing
>> an explicit compare. In that case the instructions are just as compact as doing a
>> compare-and-branch in one instruction.
> <
> The SPARC experience indicates that 80% of this free work is non-useful.

Most status flag updating is noise.

If one does status-flags like on x86, then far more often the status
flags will end up being stomped before one can do anything with the results.

Status flags are more useful, ironically, when only a very limited
selection of instructions modify them.

Similar for True/False bit predication:
Being useful for Then/Else blocks also implies that no instruction
inside the Then/Else block may update this bit.

>>
>> RISCV has only a 12-bit displacement field which is adequate most of the time,
>> but no doubt on rare occasions extra code needs to be inserted to branch further
>> away.
> <
> My 66000 has 16-bits of word displacement in the OpCode which is inadequate
> a whole lot less than RISC-V. You would need a subroutine with more than 32767
> instructions.

In my case, BT/BF are 20-bit, which is almost always sufficient.

But, yeah, a branch within +/- 32K words will hit a lot more often than
+/- 2K words (which will hit more often than +/- 256 words).

>>
>> Another consideration is that compare instructions are needed in the instruction
>> set anyway, it may be less hardware then to use separate compare and branch
>> instructions. Compares are needed for expressions like: X = a < b. which are
>> sometimes recorded instead of branched on.
> <
> My 66000 compare instructions build a bit-vector of relations. Then my shift
> instructions can extract the signed set {-1,0} or the unsigned set {0,+1}, so
> different languages can access what is their definition of the right result.

In my case, they come in "flavors":
EQ
GT, GE (Signed)
HI, HS (Unsigned)

BT/BF choice may be used to build the other possibilities.
Strictly speaking, one only needs EQ, GT, and HI; but this would leave
immediate-form instructions unable to directly express some cases (LT
and GE; and if these cases need a separate constant-load, this is lame).

So, one needs, say:
CMPEQ Reg, Reg
CMPEQ Imm, Reg
CMPGT Reg, Reg
CMPGT Imm, Reg
CMPGE Imm, Reg

CMPHI Reg, Reg
CMPHI Imm, Reg
CMPHS Imm, Reg
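
A toy model of why the immediate forms need the extra GE/HS compares. It assumes each compare sets a single T bit, that BT branches when T is set and BF when it is clear, and that register operands can be swapped while an immediate cannot. Only the signed cases are shown (HI/HS would mirror GT/GE), and the Python names are hypothetical.

```python
# Each compare returns the T bit for "x <relation> y".
def cmpeq(x, y):
    return x == y


def cmpgt(x, y):
    return x > y


def cmpge(x, y):
    return x >= y


# relation -> (compare, swap operands?, branch when T is set?)
REG_REG = {
    "==": (cmpeq, False, True),
    "!=": (cmpeq, False, False),
    ">": (cmpgt, False, True),
    "<=": (cmpgt, False, False),
    "<": (cmpgt, True, True),    # a < b is b > a
    ">=": (cmpgt, True, False),  # a >= b is !(b > a)
}

# With an immediate the operands cannot be swapped, so < and >= use CMPGE.
REG_IMM = {
    "==": (cmpeq, True),
    "!=": (cmpeq, False),
    ">": (cmpgt, True),
    "<=": (cmpgt, False),
    ">=": (cmpge, True),
    "<": (cmpge, False),
}


def branch_taken_reg(rel, a, b):
    compare, swap, on_true = REG_REG[rel]
    t = compare(b, a) if swap else compare(a, b)
    return t == on_true


def branch_taken_imm(rel, reg, imm):
    compare, on_true = REG_IMM[rel]
    return compare(reg, imm) == on_true


print(branch_taken_reg("<", 1, 2), branch_taken_imm("<", -5, 0))
```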

>>
>> My latest core stores the results in condition code registers like the PowerPC
>> rather than gprs. It is just a bit different. I manage then to get a 21-bit branch
>> displacement. For branch-to-subroutine 24-bit displacements are possible.
> <
> How often is 21-bits sufficient (where 16-bits would have been insufficient) ??
> Do you know of a subroutine that is bigger than 1,000,000 instructions ?
> <
> As for direct BR and CALL one has 26-bits

Thus far, none of my test programs have exceeded the existing 20-bit
limit (1MB), for the entire ".text" section.

When this happens, things like function calls will start needing to use
a bigger encoding (thus far mostly untested; and will not currently be
predicted by the branch predictor). As-is, there are both Disp33s and
Abs48 encodings as possible fallback cases.

For most sane functions, for internal branches, this is likely in
"probably not going to happen" territory.

Closest to this limit thus far is ROTT, which weighs in at a (relatively
massive) 754K of ".text" (for reference, the x86-64 builds are nearly 2MB).

Though, part of this seems to be that the people at Apogee used
excessive amounts of macros and copy/paste.

Something like a 22-bit branch might be better in the sense of "yeah,
4MB .text section is not implausible".

Though, 26 bits seems a little overkill.

Non-local calls and branches need not use relative branch instructions
though, so this case does not matter.

In other news, my experiment for Huffman coding + LZ with 4 interleaved
bitstreams was a moderate success (on my Ryzen 2700X, MSVC):
Decode speeds ~ 400 to 700 MB/sec with Huffman coding;
Decode speeds ~ 600 to 1200 MB/sec with fixed-length (byte) symbols;
Compression ratios also slightly better than Deflate.

GCC still seems to be doing its "approx 40% faster" thing, so with GCC
in a few cases was getting Huffman-coded decode speeds of around 1.0GB/sec.

General approach should also map over to BJX2.

The LZ format is a little wonky though on account of the use of parallel
buffers for encoding/decoding stuff. Other than the parallel buffer
wonkiness, the design is vaguely similar to that of LZ4 (just with tags,
literal, distance, and extra bytes, being spread into 4 parallel buffers).

This is also with a 12-bit symbol length limit (can do 13 bit, which
helps compression at the cost of decoding speed).

Still technically only a moderate improvement over a codec based on a
normal (symbol at a time) decoding with a 12-bit symbol length limit
though (vs, say, the speed improvement of a 12-bit limit vs 15 bits).

But, at least, it is not being particularly bottle-necked by the entropy
coder (in the profiler), which is something (also spends a lot of time
in handling the LZ format, and in the match copying function).

Had also designed the format to be able to deal with deltas between
buffers (partly inspired by Sierra's video format). Though this part was
not being tested in this case.

So, would at least appear to be "kinda fast-ish", at least as far as
entropy-encoded designs go.

....

Re: Compare-and-branch vs PIC

<45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29445&group=comp.arch#29445

Newsgroups: comp.arch
X-Received: by 2002:a0c:b28f:0:b0:4da:d0a:b433 with SMTP id r15-20020a0cb28f000000b004da0d0ab433mr28021qve.13.1670662299063;
Sat, 10 Dec 2022 00:51:39 -0800 (PST)
X-Received: by 2002:a05:6830:1b72:b0:66e:9633:5d64 with SMTP id
d18-20020a0568301b7200b0066e96335d64mr11230452ote.349.1670662298786; Sat, 10
Dec 2022 00:51:38 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 10 Dec 2022 00:51:38 -0800 (PST)
In-Reply-To: <tmtmlq$uo06$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2607:fea8:1dde:6a00:e893:3405:b761:b8c2;
posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 2607:fea8:1dde:6a00:e893:3405:b761:b8c2
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmqnrp$kdei$1@dont-email.me> <8160b9ac-1da4-4b81-86dc-15ead2196240n@googlegroups.com>
<9359de4c-279a-4a61-80a5-2be9f7a44eabn@googlegroups.com> <eda3bd59-9a96-4c57-90d1-ca74d4e6a0f5n@googlegroups.com>
<tmtmlq$uo06$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
Subject: Re: Compare-and-branch vs PIC
From: robfi...@gmail.com (robf...@gmail.com)
Injection-Date: Sat, 10 Dec 2022 08:51:39 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 11629
 by: robf...@gmail.com - Sat, 10 Dec 2022 08:51 UTC

On Thursday, December 8, 2022 at 4:58:22 PM UTC-5, BGB wrote:
> On 12/8/2022 1:52 PM, MitchAlsup wrote:
> > On Wednesday, December 7, 2022 at 2:26:20 PM UTC-6, robf...@gmail.com wrote:
> >> I think deciding between compare-and-branch as one instruction may have to
> >> do with the size of the branch displacement possible in an instruction format.
> >> It takes more bits to represent the instruction if there are two register fields in
> >> it and that may limit the displacement size. If instructions are 16-bits specifying
> >> two registers in the instruction probably is not practical due to the small
> >> displacement.
> > <
> > With the Compare instruction and then branch model, there is 1 branch instruction
> > with a myriad of sub instructions {Br ==, Br !=, Br s>, BR s>=, Br s<, BR s<=, Br u<
> > Br u<=, Br u>, BR u>=, and more} all as 1 single Major OpCode (front 6 bits). I
> > use the bits where the destination register would be to encode the condition.
> > <
> > In contrast, the Compare-Branch model only has a few compares-and-branches
> > due to not wanting to waste too much OpCode space of the compare-branch.
> > <
> > My compare then branch model uses a bit vector condition, so branch on bit
> > falls out for free. Over on the compare-to-zero-and-branch side, I have a complete
> > set of compare-to-zero-branch instructions, some with the ability to test for
> > situations outside of normal programmatics--for example: test if an ATOMIC
> > sequence has suffered from interference.
> I ended up defining some RISC-V style compare-and-branch instructions.
> Only real reason they are enabled is because the RISC-V mode needs them.
>
> They end up taking up ~ 1/4 of the F1 block (mostly used for Load/Store
> Disp9 ops).
>
>
> IIRC, I did add compiler support for them, but they are still "pretty
> meh": "if(a<b)" as a single op, but only if the target is within +/- 256
> bytes being not that big of a win.
>
> Ironically, "if(a<0)" tends to end up being more useful (these ended up
> existing as separate ops in BJX2 mostly because of the lack of an
> architectural zero register)
>
>
> In the case of integer compare ops, the majority of cases tend to be
> comparisons with a constant, which is on average better served, say, by
> "CMPxx Imm10, Rn; BT/BF lbl"
> >>
> >> It is often not necessary to do the compare, in some processors the condition
> >> codes are set by many instructions and a branch can take place without needing
> >> an explicit compare. In that case the instructions are just as compact as doing a
> >> compare-and-branch in one instruction.
> > <
> > The SPARC experience indicates that 80% of this free work is non-useful.
> Most status flag updating is noise.
>
> If one does status-flags like on x86, then far more often the status
> flags will end up being stomped before one can do anything with the results.
>
> Status flags are more useful, ironically, when only a very limited
> selection of instructions modify them.
>
>
> Similar for True/False bit predication:
> To be useful for Then/Else blocks, also implies that no instruction
> inside the Then/Else block may update this bit.
> >>
> >> RISCV has only a 12-bit displacement field which is adequate most of the time,
> >> but no doubt on rare occasions extra code needs to be inserted to branch further
> >> away.
> > <
> > My 66000 has 16-bits of word displacement in the OpCode which is inadequate
> > a whole lot less than RISC-V. You would need a subroutine with more than 32767
> > instructions.
> In my case, BT/BF are 20-bit, which is almost always sufficient.
>
> But, yeah, a branch within +/- 32K words will hit a lot more often than
> +/- 2K words (which will hit more often than +/- 256 words).
> >>
> >> Another consideration is that compare instructions are needed in the instruction
> >> set anyway, it may be less hardware then to use separate compare and branch
> >> instructions. Compares are needed for expressions like: X = a < b. which are
> >> sometimes recorded instead of branched on.
> > <
> > My 66000 compare instructions build a bit-vector of relations. Then my shift
> > instructions can extract the signed set {-1,0} or the unsigned set {0,+1}, so
> > different languages can access what is their definition of the right result.
> In my case, they come in "flavors":
> EQ
> GT, GE (Signed)
> HI, HS (Unsigned)
>
> BT/BF choice may be used to build the other possibilities.
> Strictly speaking, one only needs EQ, GT, and HI; but this would leave
> immediate-form instructions unable to directly express some cases (LT
> and GE; and if these cases need a separate constant-load, this is lame).
>
> So, one needs, say:
> CMPEQ Reg, Reg
> CMPEQ Imm, Reg
> CMPGT Reg, Reg
> CMPGT Imm, Reg
> CMPGE Imm, Reg
>
> CMPHI Reg, Reg
> CMPHI Imm, Reg
> CMPHS Imm, Reg
> >>
> >> My latest core stores the results in condition code registers like the PowerPC
> >> rather than gprs. It is just a bit different. I manage then to get a 21-bit branch
> >> displacement. For branch-to-subroutine 24-bit displacements are possible.
> > <
> > How often is 21-bits sufficient (where 16-bits would have been insufficient) ??
> > Do you know of a subroutine that is bigger than 1,000,000 instructions ?
> > <
> > As for direct BR and CALL one has 26-bits
> Thus far, none of my test programs have exceeded the existing 20-bit
> limit (1MB), for the entire ".text" section.
>
> When this happens, things like function calls will start needing to use
> a bigger encoding (thus far mostly untested; and will not currently be
> predicted by the branch predictor). As-is, there are both Disp33s and
> Abs48 encodings as possible fallback cases.
>
> For most sane functions, for internal branches, this is likely in
> "probably not going to happen" territory.
>
>
> Closest to this limit thus far is ROTT, which weighs in at a (relatively
> massive) 754K of ".text" (for reference, the x86-64 builds are nearly 2MB).
>
> Though, part of this though seems to be that the people at Apogee seem
> to have used excessive amounts of macros and copy/paste.
>
>
> Something like a 22-bit branch might be better in the sense of "yeah,
> 4MB .text section is not implausible".
>
> Though, 26-bits seems a little overkill though.
>
>
> Non-local calls and branches need not use relative branch instructions
> though, so this case does not matter.
>
>
>
>
> In other news, my experiment for Huffman coding + LZ with 4 interleaved
> bitstreams was a moderate success (on my Ryzen 2700X, MSVC):
> Decode speeds ~ 400 to 700 MB/sec with Huffman coding;
> Decode speeds ~ 600 to 1200 MB/sec with fixed-length (byte) symbols;
> Compression ratios also slightly better than Deflate.
>
> GCC still seems to be doing its "approx 40% faster" thing, so with GCC
> in a few cases was getting Huffman-coded decode speeds of around 1.0GB/sec.
>
> General approach should also map over to BJX2.
>
> The LZ format is a little wonky though on account of the use of parallel
> buffers for encoding/decoding stuff. Other than the parallel buffer
> wonkiness, the design is vaguely similar to that of LZ4 (just with tags,
> literal, distance, and extra bytes, being spread into 4 parallel buffers).
>
>
> This is also with a 12-bit symbol length limit (can do 13 bit, which
> helps compression at the cost of decoding speed).
>
>
> Still technically only a moderate improvement over a codec based on a
> normal (symbol at a time) decoding with a 12-bit symbol length limit
> though (vs, say, the speed improvement of a 12-bit limit vs 15 bits).
>
> But, at least, it is not being particularly bottle-necked by the entropy
> coder (in the profiler), which is something (also spends a lot of time
> in handling the LZ format, and in the match copying function).
>
>
> Had also designed the format to be able to deal with deltas between
> buffers (partly inspired by Sierra's video format). Though this part was
> not being tested in this case.
>
>
> So, would at least appear to be "kinda fast-ish", at least as far as
> entropy-encoded designs go.
>
> ...


Re: Compare-and-branch vs PIC

<kM1lL.20158$iU59.5539@fx14.iad>

https://www.novabbs.com/devel/article-flat.php?id=29447&group=comp.arch#29447

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx14.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Compare-and-branch vs PIC
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <tmqnrp$kdei$1@dont-email.me> <8160b9ac-1da4-4b81-86dc-15ead2196240n@googlegroups.com> <9359de4c-279a-4a61-80a5-2be9f7a44eabn@googlegroups.com> <eda3bd59-9a96-4c57-90d1-ca74d4e6a0f5n@googlegroups.com> <tmtmlq$uo06$1@dont-email.me> <45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
In-Reply-To: <45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 19
Message-ID: <kM1lL.20158$iU59.5539@fx14.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sat, 10 Dec 2022 15:41:04 UTC
Date: Sat, 10 Dec 2022 10:40:57 -0500
X-Received-Bytes: 1857
 by: EricP - Sat, 10 Dec 2022 15:40 UTC

robf...@gmail.com wrote:
> One complication is dealing with external interrupts that may happen. My first
> thought is to simply not allow them in the shadow of a PRED instruction. It
> would be delaying interrupt processing by seven or eight instructions max.
> Otherwise, the PRED state would need to be saved somehow and restored after
> the interrupt. Possibly by saving a copy of the interrupted instruction plus
> predicate.

An alternative is a PRED prefix that only applies to a single instruction.

Of course it won't win awards on code size but decode sees the pairing
as a single instruction and the shadow complications do go away.
Internally it still has many of the complications of predicated uOps,
but the dependency chain is simplified because it doesn't interact
with the shadow mask and its invisible global state -
it is just an optional extra source operand on each uOp.
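
As a toy illustration of "an optional extra source operand on each uOp" (my own sketch, not anyone's actual pipeline): each micro-op below carries its own predicate, so there is no shadow mask or other hidden state that would have to be saved if an interrupt arrived between two instructions. The tuple layout and register names are hypothetical.

```python
def execute(program, regs):
    """Run a list of (predicate, dest, fn, sources) micro-ops.

    predicate is None, or a (register_name, expected_bool) pair that is
    read like any other source operand.
    """
    for predicate, dest, fn, sources in program:
        if predicate is not None:
            pred_reg, expected = predicate
            if bool(regs[pred_reg]) != expected:
                continue  # predicate failed: this uOp has no effect
        regs[dest] = fn(*(regs[s] for s in sources))
    return regs


# if (p) x = a + b; else x = a - b;
program = [
    (("p", True), "x", lambda a, b: a + b, ("a", "b")),
    (("p", False), "x", lambda a, b: a - b, ("a", "b")),
]

print(execute(program, {"a": 5, "b": 3, "p": 1, "x": 0})["x"])
print(execute(program, {"a": 5, "b": 3, "p": 0, "x": 0})["x"])
```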

Re: Compare-and-branch vs PIC

<tn2kml$1n57m$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29451&group=comp.arch#29451

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Compare-and-branch vs PIC
Date: Sat, 10 Dec 2022 12:55:09 -0600
Organization: A noiseless patient Spider
Lines: 101
Message-ID: <tn2kml$1n57m$1@dont-email.me>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmqnrp$kdei$1@dont-email.me>
<8160b9ac-1da4-4b81-86dc-15ead2196240n@googlegroups.com>
<9359de4c-279a-4a61-80a5-2be9f7a44eabn@googlegroups.com>
<eda3bd59-9a96-4c57-90d1-ca74d4e6a0f5n@googlegroups.com>
<tmtmlq$uo06$1@dont-email.me>
<45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
<kM1lL.20158$iU59.5539@fx14.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 10 Dec 2022 18:55:17 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="bd1a1a94f9f96db138b9dad844043b1a";
logging-data="1807606"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+UPV7lxle6GR7k0dki3t18"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:IbbVLIL1FWaazAoBK0dcP6zB+YE=
In-Reply-To: <kM1lL.20158$iU59.5539@fx14.iad>
Content-Language: en-US
 by: BGB - Sat, 10 Dec 2022 18:55 UTC

On 12/10/2022 9:40 AM, EricP wrote:
> robf...@gmail.com wrote:
>> One complication is dealing with external interrupts that may happen.
>> My first
>> thought is to simply not allow them in the shadow of a PRED
>> instruction. It
>> would be delaying interrupt processing by seven or eight instructions
>> max.
>> Otherwise, the PRED state would need to be saved somehow and restored
>> after
>> the interrupt. Possibly by saving a copy of the interrupted
>> instruction plus
>> predicate.
>
> An alternative is a PRED prefix that only applies to a single instruction.
>
> Of course it won't win awards on code size but decode sees the pairing
> as a single instruction and the shadow complications do go away.
> Internally it still has many of the complications of predicated uOps,
> but the dependency chain is simplified because it doesn't interact
> with the shadow mask and its invisible global state -
> it is just an optional extra source operand on each uOp.
>

And, burning 2 or 3 bits in the encoding means no space (or cycles) are
wasted on a prefix.

Granted, one could argue that the predicate field and the existence of 16-bit
ops are what decide whether 6-bit register fields are practical.

Say, 6b:
zztt-tttt-zzzz-ssss ssnn-nnnn-zzzz-pp11 //10b for opcode
zztt-tttt-zzzz-ssss ssnn-nnnn-zzzz-zpp1 //11b for opcode
zztt-tttt-zzzz-ssss ssnn-nnnn-zzzz-zzzz //14b for opcode

Vs, 5b:
zzzt-tttt-zzzz-ssss szzn-nnnn-zzzz-pp11 //13b for opcode
zzzt-tttt-zzzz-ssss szzn-nnnn-zzzz-zpp1 //14b for opcode
zzzt-tttt-zzzz-ssss szzn-nnnn-zzzz-zzzz //17b for opcode

With my existing ISA (in the above scheme):
zzzz-znst-tttt-zzzz 111p-zzpz-nnnn-ssss //12b for opcode

I have noted that one doesn't really want much less opcode space than this.

Had hacked on some 32b encodings with 6b fields, say:
zzzz-znst-tttt-zzzz 0111-wnst-nnnn-ssss //9b for opcode
zzzz-znsi-iiii-iiii 1001-wnsz-nnnn-ssss //6b for opcode

Which lack predication and only cover a subset of the ISA, and then
stuff gets more hacky...

There is the Op64 encoding, which supports both 6b register fields and
predication, but not bundling, and needs 64b to encode.

There is also 2x Op40, which supports both (in a bundle), as a 96-bit
encoding, but is a bit of a hack (in the decoder, it fakes it as if
there were a pair of op64 encodings; just with a lot of the bits
hard-wired to 0).

With 6b fields, a "break even" would be to give up entirely on 16-bit
ops but keep a 2b pp field, say:
zztt-tttt-zzzz-ssss ssnn-nnnn-zzzz-zzpp //12b for opcode
Or, the RISC-V style notation of breaking up by field:
zz-tttttt-zzzz-ssssss-nnnnnn-zzzzzz-pp //12b for opcode

Code density could suffer slightly, but it could allow both uniform
6-bit register access and per-instruction predication (but not multiple
predicate bits).
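
The opcode-bit tallies in the layouts above can be checked mechanically. The sketch below is only an illustration: it assumes that `z` marks opcode bits and that `t`, `s`, `n`, `p` are the register and predicate fields, which is how the patterns read; the function name `field_widths` is made up.

```python
def field_widths(pattern: str) -> dict:
    """Count how many bits each field letter occupies in an encoding pattern.

    Separators ('-' and spaces) are ignored; literal 0/1 bits are counted
    under the key 'fixed'.
    """
    widths = {}
    for ch in pattern:
        if ch in "- ":
            continue
        key = "fixed" if ch in "01" else ch
        widths[key] = widths.get(key, 0) + 1
    return widths


patterns = {
    "6b regs, pp11":      "zztt-tttt-zzzz-ssss ssnn-nnnn-zzzz-pp11",
    "5b regs, pp11":      "zzzt-tttt-zzzz-ssss szzn-nnnn-zzzz-pp11",
    "6b regs, no 16b":    "zztt-tttt-zzzz-ssss ssnn-nnnn-zzzz-zzpp",
    "16b op, 6b regs":    "zznn-nnnn-ssss-ssz0",
}

for name, pat in patterns.items():
    w = field_widths(pat)
    print(f"{name}: {sum(w.values())} bits total, {w.get('z', 0)} opcode bits")
```

Running it over other candidate layouts is a quick way to see how each extra register-field bit or fixed bit eats into opcode space.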

Another option being to just start with 11b and try a little harder to
optimize the use of the encoding space (will still need to keep Jumbo
encodings or similar as an "escape hatch", or for less-common instructions).

Though, 16b encodings won't go very far if trying for 6b fields:
zznn-nnnn-ssss-ssz0 //3b opcode... yeah, won't go far.

Almost makes more sense to not bother if going directly for 64 GPRs, and
instead skip out on 16b ops (gaining an extra opcode bit).

Well, and/or treat the low 3 bits as a combined field:
000: 32b
001: 32b
010: 16b (Opt, 1)
011: 32b
10z: 32b
11z: 32b

1: So, there would be half the encoding space which would effectively be
disallowed from being used outside of Lane 1. This encoding space could
be left to 16b ops, which will necessarily need to operate on a subset
of the register space.

....

Re: Compare-and-branch vs PIC

<4c804d87-b941-4ea4-8180-7098078149b9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29452&group=comp.arch#29452

X-Received: by 2002:a05:620a:100f:b0:6fa:17e5:b62b with SMTP id z15-20020a05620a100f00b006fa17e5b62bmr84212561qkj.676.1670702267425;
Sat, 10 Dec 2022 11:57:47 -0800 (PST)
X-Received: by 2002:a05:6870:8882:b0:132:6f79:9ffb with SMTP id
m2-20020a056870888200b001326f799ffbmr45387140oam.61.1670702267054; Sat, 10
Dec 2022 11:57:47 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 10 Dec 2022 11:57:46 -0800 (PST)
In-Reply-To: <45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:2c58:84f7:a72d:d4bd;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:2c58:84f7:a72d:d4bd
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmqnrp$kdei$1@dont-email.me> <8160b9ac-1da4-4b81-86dc-15ead2196240n@googlegroups.com>
<9359de4c-279a-4a61-80a5-2be9f7a44eabn@googlegroups.com> <eda3bd59-9a96-4c57-90d1-ca74d4e6a0f5n@googlegroups.com>
<tmtmlq$uo06$1@dont-email.me> <45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <4c804d87-b941-4ea4-8180-7098078149b9n@googlegroups.com>
Subject: Re: Compare-and-branch vs PIC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sat, 10 Dec 2022 19:57:47 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 3787
 by: MitchAlsup - Sat, 10 Dec 2022 19:57 UTC

On Saturday, December 10, 2022 at 2:51:40 AM UTC-6, robf...@gmail.com wrote:
> On Thursday, December 8, 2022 at 4:58:22 PM UTC-5, BGB wrote:
> > On 12/8/2022 1:52 PM, MitchAlsup wrote:

> Found I can use the condition registers as complicated predicate registers
> with the use of the PRED instruction modifier. I am liking the My66000 PRED
> modifier idea. The original Thor core had 16 predicate registers which were
> basically, condition code registers that were coupled with a four-bit condition
> specifier. Thus, eight bits were tacked onto the start of every instruction to
> specify a predicate. I found most of the time the predicate was not used so a
> byte of memory was wasted for many instructions. Eventually decided to
> shelve the core. But now with My66000 style PREDs there is one extra
> instruction for the PRED, 32 bits, but it can be applied for up to the following
> seven instructions (in my core). And the bytes are used only when a predicate
> is needed. So, now there are eight condition (predicate) registers instead of
> 16. But the number of predicate conditions has expanded to allow a variety of
> floating-point conditions. So, a six-bit condition field is used. This is nine bits
> total, but it only must be present in the PRED instruction which has lots of bits
> available.
<
signed {=, !=, >, >=, <, <=} unsigned {>, >=, <, <=} float {=, !=, >, >=, <, <=, OR, UN} and
double {=, !=, >, >=, <, <=, OR, UN}
<
I count 26 which fits in 5-bits.
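
The arithmetic, as a quick Python check. The grouping is taken from the list above; the variable names are just for illustration.

```python
conditions = {
    "signed":   ["=", "!=", ">", ">=", "<", "<="],
    "unsigned": [">", ">=", "<", "<="],
    "float":    ["=", "!=", ">", ">=", "<", "<=", "OR", "UN"],
    "double":   ["=", "!=", ">", ">=", "<", "<=", "OR", "UN"],
}

total = sum(len(tests) for tests in conditions.values())
# Bits needed to number `total` distinct codes starting from 0.
bits = (total - 1).bit_length()
spare = (1 << bits) - total   # encodings left over in that field

print(total, bits, spare)
```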
<
> One complication is dealing with external interrupts that may happen. My first
> thought is to simply not allow them in the shadow of a PRED instruction. It
> would be delaying interrupt processing by seven or eight instructions max.
> Otherwise, the PRED state would need to be saved somehow and restored after
> the interrupt. Possibly by saving a copy of the interrupted instruction plus
> predicate.
>
> My core uses condition registers to store a bit vector of comparison flags in a
> manner analogous to the My66000 use of a GPR.
<
Another convert........
<
> I am hoping to have more GPRs available for other use by offloading the compare
> results to dedicated registers. The branch instruction then also need only specify
> three bits for the condition register.

Re: everything old is new again, Compare-and-branch vs PIC

<tn2u7j$288n$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=29455&group=comp.arch#29455

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: everything old is new again, Compare-and-branch vs PIC
Date: Sat, 10 Dec 2022 21:37:55 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <tn2u7j$288n$1@gal.iecc.com>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <tmtmlq$uo06$1@dont-email.me> <45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com> <kM1lL.20158$iU59.5539@fx14.iad>
Injection-Date: Sat, 10 Dec 2022 21:37:55 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="74007"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <tmtmlq$uo06$1@dont-email.me> <45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com> <kM1lL.20158$iU59.5539@fx14.iad>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Sat, 10 Dec 2022 21:37 UTC

According to EricP <ThatWouldBeTelling@thevillage.com>:
>robf...@gmail.com wrote:
>> One complication is dealing with external interrupts that may happen. My first
>> thought is to simply not allow them in the shadow of a PRED instruction. It
>> would be delaying interrupt processing by seven or eight instructions max.
>> Otherwise, the PRED state would need to be saved somehow and restored after
>> the interrupt. Possibly by saving a copy of the interrupted instruction plus
>> predicate.
>
>An alternative is a PRED prefix that only applies to a single instruction.

Fifty years ago we called those skip instructions.

On a PDP-6, to put the larger of A and B into accumulator AC:

MOVE AC,A ; put A into AC
CAMGE AC,B ; skip if AC >= B
MOVE AC,B ; put B into AC
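
For readers who have not met skip instructions, the sequence above can be modelled in a few lines of Python. This is only a sketch of the control flow, not a PDP-6 emulator; the function name `larger` and the tuple encoding of the program are made up for illustration.

```python
def larger(a: int, b: int) -> int:
    """Model MOVE / CAMGE / MOVE: the compare may skip one instruction."""
    program = [
        ("MOVE", "A"),    # AC <- A
        ("CAMGE", "B"),   # skip next instruction if AC >= B
        ("MOVE", "B"),    # AC <- B
    ]
    memory = {"A": a, "B": b}
    ac = 0
    pc = 0
    while pc < len(program):
        op, addr = program[pc]
        pc += 1                       # normal sequencing
        if op == "MOVE":
            ac = memory[addr]
        elif op == "CAMGE" and ac >= memory[addr]:
            pc += 1                   # the skip: step over exactly one instruction
    return ac
```

The only control-flow effect is the extra `pc += 1`, which is what makes the instruction a very short forward branch.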

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: everything old is new again, Compare-and-branch vs PIC

<495fd5bb-a8d7-4c41-8853-4cf5ec8d65dcn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29457&group=comp.arch#29457

X-Received: by 2002:a37:46cd:0:b0:6fb:7c45:bd5 with SMTP id t196-20020a3746cd000000b006fb7c450bd5mr83570870qka.304.1670710043266;
Sat, 10 Dec 2022 14:07:23 -0800 (PST)
X-Received: by 2002:a05:6870:d1d5:b0:144:effa:8c0f with SMTP id
b21-20020a056870d1d500b00144effa8c0fmr2536282oac.167.1670710042833; Sat, 10
Dec 2022 14:07:22 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 10 Dec 2022 14:07:22 -0800 (PST)
In-Reply-To: <tn2u7j$288n$1@gal.iecc.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2607:fea8:1dde:6a00:3d3c:d049:e645:3a7d;
posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 2607:fea8:1dde:6a00:3d3c:d049:e645:3a7d
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmtmlq$uo06$1@dont-email.me> <45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
<kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <495fd5bb-a8d7-4c41-8853-4cf5ec8d65dcn@googlegroups.com>
Subject: Re: everything old is new again, Compare-and-branch vs PIC
From: robfi...@gmail.com (robf...@gmail.com)
Injection-Date: Sat, 10 Dec 2022 22:07:23 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 3349
 by: robf...@gmail.com - Sat, 10 Dec 2022 22:07 UTC

On Saturday, December 10, 2022 at 4:37:58 PM UTC-5, John Levine wrote:
> According to EricP <ThatWould...@thevillage.com>:
> >robf...@gmail.com wrote:
> >> One complication is dealing with external interrupts that may happen. My first
> >> thought is to simply not allow them in the shadow of a PRED instruction. It
> >> would be delaying interrupt processing by seven or eight instructions max.
> >> Otherwise, the PRED state would need to be saved somehow and restored after
> >> the interrupt. Possibly by saving a copy of the interrupted instruction plus
> >> predicate.
> >
> >An alternative is a PRED prefix that only applies to a single instruction.
> Fifty years ago we called those skip instructions.
>
> On a PDP-6, to put the larger of A and B into accumulator AC:
>
> MOVE AC,A ; put A into AC
> CAMGE AC,B ; skip if AC >= B
> MOVE AC,B ; put B into AC
>
> --
> Regards,
> John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
> Please consider the environment before reading this e-mail. https://jl.ly

>signed {=, !=, >, >=, <, <=} unsigned {>, >=, <, <=} float {=, !=, >, >=, <, <=, OR, UN} and
>double {=, !=, >, >=, <, <=, OR, UN}
><
>I count 26 which fits in 5-bits.

I followed the 68k FPU's float tests, of which there are 32 just for one precision. They
went a bit crazy testing for NaNs in combination with other relations. Then there are
the 16 integer conditions based on the flags nf,vf,cf,zf. That leaves 16 of the 64
six-bit encodings still to be defined.
I was thinking of including magnitude greater than and less than too, which would take
the abs() of the numbers first. I have a low bar for including ops in the instruction set,
somewhere around 0.1% usage, as a lot can be supported.

Just added triple precision decimal-float to my 68k core. Primarily using the 68k core
to test the decimal-float instructions. They use separate compare and branch
instructions. Found an error in recent docs for the 68881/68882 FPU.

Re: everything old is new again, Compare-and-branch vs PIC

<079345c1-476d-414b-bd2e-b217543695e3n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29458&group=comp.arch#29458

X-Received: by 2002:ac8:5988:0:b0:3a5:9370:ccf4 with SMTP id e8-20020ac85988000000b003a59370ccf4mr76428769qte.376.1670717379244;
Sat, 10 Dec 2022 16:09:39 -0800 (PST)
X-Received: by 2002:a05:6870:f592:b0:144:543:c801 with SMTP id
eh18-20020a056870f59200b001440543c801mr21140635oab.201.1670717378950; Sat, 10
Dec 2022 16:09:38 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 10 Dec 2022 16:09:38 -0800 (PST)
In-Reply-To: <tn2u7j$288n$1@gal.iecc.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:2c58:84f7:a72d:d4bd;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:2c58:84f7:a72d:d4bd
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmtmlq$uo06$1@dont-email.me> <45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
<kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <079345c1-476d-414b-bd2e-b217543695e3n@googlegroups.com>
Subject: Re: everything old is new again, Compare-and-branch vs PIC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 11 Dec 2022 00:09:39 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 2834
 by: MitchAlsup - Sun, 11 Dec 2022 00:09 UTC

On Saturday, December 10, 2022 at 3:37:58 PM UTC-6, John Levine wrote:
> According to EricP <ThatWould...@thevillage.com>:
> >robf...@gmail.com wrote:
> >> One complication is dealing with external interrupts that may happen. My first
> >> thought is to simply not allow them in the shadow of a PRED instruction. It
> >> would be delaying interrupt processing by seven or eight instructions max.
> >> Otherwise, the PRED state would need to be saved somehow and restored after
> >> the interrupt. Possibly by saving a copy of the interrupted instruction plus
> >> predicate.
> >
> >An alternative is a PRED prefix that only applies to a single instruction.
> Fifty years ago we called those skip instructions.
<
Which comes back to the important point about predication ::
<
You do not want to predicate any farther than is already covered by your FETCH
momentum. Predication is entirely present to avoid disrupting the current FETCH
stream--and no farther. If you have to (or want to) disrupt the FETCH point, then
branch (and friends) is the proper tool.
>
> On a PDP-6, to put the larger of A and B into accumulator AC:
>
> MOVE AC,A ; put A into AC
> CAMGE AC,B ; skip if AC >= B
> MOVE AC,B ; put B into AC
>
There were a lot of cool things about the PDP-6 and the PDP-10 follow-ons.
> --
> Regards,
> John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
> Please consider the environment before reading this e-mail. https://jl.ly

Re: everything old is new again, Compare-and-branch vs PIC

<bwnlL.106$%os8.36@fx03.iad>

https://www.novabbs.com/devel/article-flat.php?id=29468&group=comp.arch#29468

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx03.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: everything old is new again, Compare-and-branch vs PIC
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <tmtmlq$uo06$1@dont-email.me> <45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com> <kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com>
In-Reply-To: <tn2u7j$288n$1@gal.iecc.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 48
Message-ID: <bwnlL.106$%os8.36@fx03.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sun, 11 Dec 2022 16:25:43 UTC
Date: Sun, 11 Dec 2022 11:25:10 -0500
X-Received-Bytes: 3092
 by: EricP - Sun, 11 Dec 2022 16:25 UTC

John Levine wrote:
> According to EricP <ThatWouldBeTelling@thevillage.com>:
>> robf...@gmail.com wrote:
>>> One complication is dealing with external interrupts that may happen. My first
>>> thought is to simply not allow them in the shadow of a PRED instruction. It
>>> would be delaying interrupt processing by seven or eight instructions max.
>>> Otherwise, the PRED state would need to be saved somehow and restored after
>>> the interrupt. Possibly by saving a copy of the interrupted instruction plus
>>> predicate.
>> An alternative is a PRED prefix that only applies to a single instruction.
>
> Fifty years ago we called those skip instructions.
>
> On a PDP-6, to put the larger of A and B into accumulator AC:
>
> MOVE AC,A ; put A into AC
> CAMGE AC,B ; skip if AC >= B
> MOVE AC,B ; put B into AC
>

Not really - skip is a very short branch, forward only.

Skip/Branch removes intervening instructions from the pipeline
and no resources are assigned for them.
(Consider how skip would behave in an in-order pipeline without branch
prediction - it would stall at fetch until its conditional resolved.)

Predication keeps the pipeline running, feeding the predicated
instructions into the pipeline and tentatively assigns resources
to them in case they do execute.
Later if they do not execute it must patch the result environment
so it looks like the disabled instructions never existed,
which can possibly include propagating values to different locations
and dynamically editing dependency chains accordingly,
then clean up and recover the assigned resources.
This must be done in such a way that it can take an interrupt/exception
at any point and resolve to a predictable and restartable state.
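
One way to picture the "patch the result environment" step is to treat a predicated uOp under register renaming as a select: it is always given a new physical destination, and if the predicate turns out false it copies the old value of the destination into that register. The Python below is an illustration under that assumption only, not a description of any particular core; `RenamedCore` and `predicated_add` are hypothetical names.

```python
class RenamedCore:
    """Toy rename model: architectural names map to physical registers."""

    def __init__(self, num_arch: int):
        self.phys = [0] * num_arch           # physical register file
        self.rmap = list(range(num_arch))    # arch reg -> phys reg

    def read(self, arch: int) -> int:
        return self.phys[self.rmap[arch]]

    def write(self, arch: int, value: int) -> None:
        self.phys[self.rmap[arch]] = value

    def predicated_add(self, pred: bool, dst: int, src1: int, src2: int) -> int:
        """Issue dst = src1 + src2 under a predicate; return the new phys reg."""
        old_phys = self.rmap[dst]            # old dst acts as an extra source
        result = self.read(src1) + self.read(src2)
        # A physical register is allocated whether or not the op is enabled.
        self.phys.append(result if pred else self.phys[old_phys])
        self.rmap[dst] = len(self.phys) - 1
        return self.rmap[dst]


core = RenamedCore(4)
core.write(1, 10)
core.write(2, 5)
core.write(3, 99)
disabled_reg = core.predicated_add(False, 3, 1, 2)   # r3 keeps 99
value_after_disabled = core.read(3)
enabled_reg = core.predicated_add(True, 3, 1, 2)     # r3 becomes 15
value_after_enabled = core.read(3)
```

Both calls consume a physical register; only the value written differs. A skip or branch, by contrast, would never have allocated anything for the disabled instruction.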

BTW on the simple cpu designs in days of yore, the advantage of a
conditional skip over an unconditional branch, versus a conditional branch,
is that for skip the HW instruction execute state machine always has the same
state sequence and just changes the value added to IP, whereas BRcc has to
change its whole state sequence depending on CC to perform/not-perform
the offset fetch and add. So BRcc is more complex to implement than SKIPcc,
and when you are paying per-gate for the cpu the cost difference matters.
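
A rough Python sketch of that difference follows. The state names, the word-addressed memory, and the layout with the branch offset in the word after the opcode are all assumptions for illustration, not a model of any specific machine.

```python
def skip_cc(ip: int, cond: bool):
    """SKIPcc: one fixed state sequence; only the increment differs."""
    states = ["fetch", "decode", "add_to_ip"]
    return ip + (2 if cond else 1), states


def br_cc(ip: int, cond: bool, memory: list):
    """BRcc with the offset held in the following word (assumed layout)."""
    states = ["fetch", "decode"]
    if cond:
        offset = memory[ip + 1]          # extra memory cycle only when taken
        states += ["fetch_offset", "add_offset_to_ip"]
        return ip + 2 + offset, states
    states += ["step_over_offset"]
    return ip + 2, states
```

In this model `skip_cc` returns the same state list for both outcomes, while `br_cc` returns a different list depending on the condition, which is the extra sequencing logic being paid for.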
