Implementation of condition code on OoO CPUs

<2023Mar30.180206@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=31410&group=comp.arch#31410

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Implementation of condition code on OoO CPUs
Date: Thu, 30 Mar 2023 16:02:06 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2023Mar30.180206@mips.complang.tuwien.ac.at>
 by: Anton Ertl - Thu, 30 Mar 2023 16:02 UTC

A while ago I saw a web page that said that some OOO core implements
the condition codes by hanging them on general-purpose registers; this
allows renaming the condition codes; and given that a condition code
is generated with many GPR results, managing them together makes some
sense; they are overwritten (i.e., freed) separately, which may speak
for managing them separately.

Anyway, I don't remember where I found this; does anybody have an
idea? It would be even better if I could find other references (best
scientific papers, but web sites are also of interest) that describe
the implementation of condition codes on OoO cores. Or if you know
(and can reveal) unpublished information on the topic, I am also
interested in that.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Implementation of condition code on OoO CPUs

<c43732b8-f44c-435b-85ec-80f6c119d052n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31415&group=comp.arch#31415

Newsgroups: comp.arch
Date: Thu, 30 Mar 2023 10:34:31 -0700 (PDT)
In-Reply-To: <2023Mar30.180206@mips.complang.tuwien.ac.at>
Message-ID: <c43732b8-f44c-435b-85ec-80f6c119d052n@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Thu, 30 Mar 2023 17:34 UTC

On Thursday, March 30, 2023 at 11:19:52 AM UTC-5, Anton Ertl wrote:
> A while ago I saw a web page that said that some OOO core implements
> the condition codes by hanging them on general-purpose registers; this
> allows renaming the condition codes; and given that a condition code
> is generated with many GPR results, managing them together makes some
> sense; they are overwritten (i.e., freed) separately, which may speak
> for managing them separately.
>
> Anyway, I don't remember where I found this; does anybody have an
> idea? It would be even better if I could find other references (best
> scientific papers, but web sites are also of interest) that describe
> the implementation of condition codes on OoO cores. Or if you know
> (and can reveal) unpublished information on the topic, I am also
> interested in that.
<
Athlon and Opteron kept track of Zero, deNorm, Finite, Infinite, NaN attached
to each FP register. Not a true condition code, but enough to sort out which
algorithm is appropriate on this set of Operands.
<
I have seen another microarchitecture that recorded, in the register file,
the fact that a result was accompanied by an exception. But I forgot
which.
>
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Implementation of condition code on OoO CPUs

<9e9fd2fc-cd45-45c6-bd83-ba8593b2239cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31419&group=comp.arch#31419

Newsgroups: comp.arch
Date: Thu, 30 Mar 2023 17:09:53 -0700 (PDT)
In-Reply-To: <2023Mar30.180206@mips.complang.tuwien.ac.at>
Message-ID: <9e9fd2fc-cd45-45c6-bd83-ba8593b2239cn@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: jsav...@ecn.ab.ca (Quadibloc)
 by: Quadibloc - Fri, 31 Mar 2023 00:09 UTC

On Thursday, March 30, 2023 at 10:19:52 AM UTC-6, Anton Ertl wrote:
> A while ago I saw a web page that said that some OOO core implements
> the condition codes by hanging them on general-purpose registers;

To me, this seems odd.

What do you want to do with the condition codes in a pipelined computer
that uses out-of-order architecture? You want to ensure you associate
each condition code value with the _instruction_ after which it was set.

So I don't think it would be useful to stick the condition code on a register
so that register renaming could affect it.

However, in an architecture that uses *reservation stations* instead, in
line with the original Tomasulo algorithm, then I think it would make sense
to have the condition codes moving along with the operands, since this is
what mirrors the instruction flow.

John Savard

Re: Implementation of condition code on OoO CPUs

<9cd85cc7-ee0b-4ca3-9527-46a4eccd9bcfn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31424&group=comp.arch#31424

Newsgroups: comp.arch
Date: Thu, 30 Mar 2023 18:26:52 -0700 (PDT)
In-Reply-To: <9e9fd2fc-cd45-45c6-bd83-ba8593b2239cn@googlegroups.com>
Message-ID: <9cd85cc7-ee0b-4ca3-9527-46a4eccd9bcfn@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Fri, 31 Mar 2023 01:26 UTC

On Thursday, March 30, 2023 at 7:09:55 PM UTC-5, Quadibloc wrote:
> On Thursday, March 30, 2023 at 10:19:52 AM UTC-6, Anton Ertl wrote:
> > A while ago I saw a web page that said that some OOO core implements
> > the condition codes by hanging them on general-purpose registers;
<
> To me, this seems odd.
>
> What do you want to do with the condition codes in a pipelined computer
> that uses out-of-order architecture? You want to ensure you associate
> each condition code value with the _instruction_ after which it was set.
<
The thing about renaming is that you have to rename it to SomeThing!
Renaming it to a GPR is no harder than renaming it to something unique,
and you already have to have a GPR to rename it to, so why not lob on a few
more bits? (64+5) is not much bigger than 64, and you get to reuse all the
other renaming HW............
<
The alternative is to rename it to its own file (of some sort) and then you
need read and write ports to it, and all those complications.
<
No, it seems easier to just use a GPR.
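
A minimal sketch of the idea above, not any shipping design: each physical register carries the 64-bit result plus a few flag bits, and the condition code is renamed simply by remembering which physical register last produced flags. The names (`PhysReg`, `Renamer`, `cc_map`) and the 3-bit zero/negative/carry layout are assumptions made for illustration; the post only says "64+5" bits.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1

@dataclass
class PhysReg:
    value: int = 0   # 64-bit result
    flags: int = 0   # flag bits stored next to the value (hypothetical layout)

class Renamer:
    def __init__(self, n_arch=32, n_phys=64):
        self.prf = [PhysReg() for _ in range(n_phys)]
        self.gpr_map = list(range(n_arch))       # architectural GPR -> physical register
        self.cc_map = 0                          # physical register holding the current flags
        self.free = list(range(n_arch, n_phys))  # unallocated physical registers

    def write_arch(self, r, value):
        self.prf[self.gpr_map[r]].value = value & MASK64

    def read_arch(self, r):
        return self.prf[self.gpr_map[r]].value

    def add(self, rd, ra, rb, set_cc=True):
        total = self.read_arch(ra) + self.read_arch(rb)
        result = total & MASK64
        # bit 0 = zero, bit 1 = negative, bit 2 = carry-out (assumed encoding)
        flags = int(result == 0) | ((result >> 63) << 1) | ((total >> 64) << 2)
        p = self.free.pop(0)                     # one allocation covers value and flags
        self.prf[p] = PhysReg(result, flags)
        self.gpr_map[rd] = p
        if set_cc:
            self.cc_map = p                      # the CC is renamed to the same register
        return p

    def read_flags(self):
        return self.prf[self.cc_map].flags
```

The sketch never frees a register. A real design would have to keep a physical register alive until both its GPR mapping and its CC mapping have been superseded, which is the "freed separately" concern raised in the original question.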
>
> So I don't think it would be useful to stick the condition code on a register
> so that register renaming could affect it.
<
The condition code generating calculation is already writing the register;
now you want to have the ability to write another 5 bits somewhere else
(say the CC ROB). Here, the tag you need to write the CC ROB is BIGGER
than the data being written into the CC ROB.
<
And don't get me talking about x86 condition codes and having to track
3 different groups (C, O, and ZAPS) independently. x86 uses more data
tracking logic in condition code management than in register management.
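
To illustrate why the three x86 groups have to be tracked independently, here is a small sketch that records the most recent producer of each group. The `WRITES` table is a hypothetical two-instruction subset: ADD writes all of the arithmetic flags, while INC leaves the carry flag untouched.

```python
# Which flag groups each instruction writes (illustrative subset only).
WRITES = {
    "add": {"C", "O", "ZAPS"},
    "inc": {"O", "ZAPS"},      # INC does not modify the carry flag
}

def rename_flags(program):
    """Track the last writer (instruction index) of each flag group.

    Returns (snapshots, final): snapshots[i] is the producer map that
    instruction i would see as a consumer; final is the map after the
    last instruction.
    """
    last_writer = {"C": None, "O": None, "ZAPS": None}
    snapshots = []
    for idx, op in enumerate(program):
        snapshots.append(dict(last_writer))
        for group in WRITES.get(op, ()):
            last_writer[group] = idx
    return snapshots, last_writer
```

After `["add", "inc"]`, a consumer of the carry flag depends on instruction 0 while a consumer of the zero flag depends on instruction 1, so a single renamed flags register would not be enough.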
>
> However, in an architecture that uses *reservation stations* instead, in
> line with the original Tomasulo algorithm, then I think it would make sense
> to have the condition codes moving along with the operands, since this is
> what mirrors the instruction flow.
>
> John Savard

Re: Implementation of condition code on OoO CPUs

<775989f1-7b39-492c-aee2-5ceaced6a68an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31471&group=comp.arch#31471

Newsgroups: comp.arch
Date: Sun, 2 Apr 2023 18:28:27 -0700 (PDT)
In-Reply-To: <2023Mar30.180206@mips.complang.tuwien.ac.at>
Message-ID: <775989f1-7b39-492c-aee2-5ceaced6a68an@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: luke.lei...@gmail.com (luke.l...@gmail.com)
 by: luke.l...@gmail.com - Mon, 3 Apr 2023 01:28 UTC

On Thursday, March 30, 2023 at 5:19:52 PM UTC+1, Anton Ertl wrote:
> A while ago I saw a web page that said that some OOO core implements
> the condition codes by hanging them on general-purpose registers; this
> allows renaming the condition codes; and given that a condition code
> is generated with many GPR results, managing them together makes some
> sense; they are overwritten (i.e., freed) separately, which may speak
> for managing them separately.
>
> Anyway, I don't remember where I found this; does anybody have an
> idea? It would be even better if I could find other references (best
> scientific papers, but web sites are also of interest) that describe
> the implementation of condition codes on OoO cores. Or if you know
> (and can reveal) unpublished information on the topic, I am also
> interested in that.

small / outside possibility your memory dredged up the Libre-SOC
design concept, where i place Zero-Overhead-Looping Prefixing with
register-number-offsetting onto a Suffix-instruction, where some
of those suffix-instructions in the Power ISA have "Rc=1" modes and
thus the Condition Codes also get Vectorised alongside the
corresponding Vector of results.

implementing this on a Multi-Issue OoO architecture you would
indeed get condition codes "hung off of" general-purpose registers
(in sequence) and would thus be able to cheat and do co-register
renaming of both the GPRs and their associated CC.

the big advantage of 4-bit Condition Codes is that you get effectively
4 Predicate Mask bits with which to play, on the following instructions.
Vectorise/Loop-i-fy the Condition Code instructions *as well*
(in a naturally orthogonal way) and you start to feel like the floor
opened up into a whole new world. just don't jump in feet-first:
it's a 1.5 *million* opcodes world if viewed naively.

l.

Re: Implementation of condition code on OoO CPUs

<9af75445-1bf2-4656-8759-5d565df308b6n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31472&group=comp.arch#31472

Newsgroups: comp.arch
Date: Sun, 2 Apr 2023 18:31:22 -0700 (PDT)
In-Reply-To: <9e9fd2fc-cd45-45c6-bd83-ba8593b2239cn@googlegroups.com>
Message-ID: <9af75445-1bf2-4656-8759-5d565df308b6n@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: luke.lei...@gmail.com (luke.l...@gmail.com)
 by: luke.l...@gmail.com - Mon, 3 Apr 2023 01:31 UTC

On Friday, March 31, 2023 at 1:09:55 AM UTC+1, Quadibloc wrote:

> However, in an architecture that uses *reservation stations* instead, in
> line with the original Tomasulo algorithm, then I think it would make sense
> to have the condition codes moving along with the operands, since this is
> what mirrors the instruction flow.

indeed. only problem being that Tomasulo requires multi-ported CAMs
if you want Multi-Issue execution. those get fantastically expensive *real*
fast. alternatively you can stripe them (4-way issue ==> 4 completely
independent sets of RSes and 4 completely independent sets of pipelines
connected to them)

l.

Re: Implementation of condition code on OoO CPUs

<5916e655-f53c-45fb-85d4-77fcc409130cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31473&group=comp.arch#31473

Newsgroups: comp.arch
Date: Mon, 3 Apr 2023 02:29:38 -0700 (PDT)
In-Reply-To: <775989f1-7b39-492c-aee2-5ceaced6a68an@googlegroups.com>
Message-ID: <5916e655-f53c-45fb-85d4-77fcc409130cn@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: luke.lei...@gmail.com (luke.l...@gmail.com)
 by: luke.l...@gmail.com - Mon, 3 Apr 2023 09:29 UTC

On Monday, April 3, 2023 at 2:28:29 AM UTC+1, luke.l...@gmail.com wrote:

> small / outside possibility your memory dredged up the Libre-SOC
> design concept, where i place Zero-Overhead-Looping Prefixing with
> register-number-offsetting onto a Suffix-instruction, where some
> of those suffix-instructions in the Power ISA have "Rc=1" modes and
> thus the Condition Codes also get Vectorised alongside the
> corresponding Vector of results.

clarifying:

    # based on Loop-Prefixing of Scalar Power ISA instructions
    # implementation of "sv.add" (Rc=0) and "sv.add." (Rc=1)
    # (scalar add being a degenerate case where VL=1)
    for i in range(VL):
        a = GPR[RA+i]         # looping increments *scalar* register number
        b = GPR[RB+i]         # looping increments *scalar* register number
        result = a + b        # actual add (a scalar add) here
        GPR[RT+i] = result    # but looping makes it look like a Vector Processor
        if Rc == 0: continue  # skip Condition Codes if not "sv.add."
        cr = (result == 0) | ((result > 0) << 1) | ((result < 0) << 2) # Condition Code
        CR.field[i] = cr      # also store the Vector of CR Fields
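
The loop above can be modelled as a runnable Python function. This is only a sketch: the register file is a plain list of unbounded Python integers, the function name `sv_add` and its argument order are made up for the example, and the bit positions in each CR field follow the pseudocode in this post rather than any ISA document.

```python
def sv_add(gpr, cr_fields, RT, RA, RB, VL, Rc):
    """Model of a loop-prefixed scalar add over VL consecutive registers.

    gpr       -- list of integers standing in for the GPR file
    cr_fields -- list receiving one condition code per element when Rc is set
    """
    for i in range(VL):
        result = gpr[RA + i] + gpr[RB + i]   # the underlying scalar add
        gpr[RT + i] = result
        if not Rc:
            continue                         # plain "sv.add": no condition codes
        # bit 0 = equal to zero, bit 1 = greater than zero, bit 2 = less than zero
        cr_fields[i] = (int(result == 0)
                        | (int(result > 0) << 1)
                        | (int(result < 0) << 2))

gpr = [0] * 16
gpr[0:3] = [1, -5, 3]
gpr[4:7] = [2, 5, -7]
cr = [0] * 8
sv_add(gpr, cr, RT=8, RA=0, RB=4, VL=3, Rc=1)
print(gpr[8:11], cr[:3])
```

With VL=1 the loop runs once, which is the degenerate scalar case mentioned in the comments.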

l.

Re: Implementation of condition code on OoO CPUs

<931407f7-c5b0-4049-ac86-acc8fd22f478n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31474&group=comp.arch#31474

Newsgroups: comp.arch
Date: Mon, 3 Apr 2023 03:55:09 -0700 (PDT)
In-Reply-To: <9cd85cc7-ee0b-4ca3-9527-46a4eccd9bcfn@googlegroups.com>
Message-ID: <931407f7-c5b0-4049-ac86-acc8fd22f478n@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: luke.lei...@gmail.com (luke.l...@gmail.com)
 by: luke.l...@gmail.com - Mon, 3 Apr 2023 10:55 UTC

On Friday, March 31, 2023 at 2:26:54 AM UTC+1, MitchAlsup wrote:

> The thing about renaming is that you have to rename it to SomeThing!
> Renaming it to a GPR is no harder than renaming it to something unique,
> and you already have to have a GPR to rename it to, so why not lob on a few
> more bits? (64+5) is not much bigger than 64, and you get to reuse all the
> other renaming HW............

nice thing about that is that the "problem" of hazards for CR fields
goes away. however...

> <
> The alternative is to rename it to its own file (of some sort) and then you
> need read and write ports to it, and all those complications.

indeed we need that (for SVP64).

reason being: the GPR regfiles are actually sub-divideable
into smaller elements (2x32, 4x16, 8x8) for which CR fields
*are still applicable*.

thus no longer is just 64+5 bits needed but 64+(5*8) bits
and the valid ones depend on whether the last arithmetic
operation was a 64-bit one or a 2x32 or 4x16 or 8x8?

hmmm... :)

i mean i don't actually mind that - it's quite a nice idea,
but the programming model is a lot less straightforward
(fragile)
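
A rough sketch of the arithmetic above, under stated assumptions: `cc_bits_needed` just multiplies the 5 bits per condition code by the number of sub-elements, and `lane_codes` computes a 3-bit zero/positive/negative code per lane of a 64-bit value. Both function names and the 3-bit encoding are hypothetical and only serve to show why the set of valid codes depends on the element width of the last operation.

```python
def cc_bits_needed(elwidth, bits_per_cc=5, regwidth=64):
    """Flag bits to hang off one register when it is split into elwidth-bit lanes."""
    return bits_per_cc * (regwidth // elwidth)

def lane_codes(value, elwidth, regwidth=64):
    """Per-lane condition codes, lane 0 being the least significant element.

    Encoding (assumed): bit 0 = zero, bit 1 = positive, bit 2 = negative,
    treating each lane as a two's-complement integer.
    """
    mask = (1 << elwidth) - 1
    codes = []
    for lane in range(regwidth // elwidth):
        x = (value >> (lane * elwidth)) & mask
        neg = x >> (elwidth - 1)
        zero = int(x == 0)
        pos = int(not zero and not neg)
        codes.append(zero | (pos << 1) | (neg << 2))
    return codes

print(cc_bits_needed(64), cc_bits_needed(8))
print(lane_codes(0x00FF0001, 8))
```

The same 64-bit pattern yields one code when read as a single 64-bit element and eight different codes when read as 8x8, which is the source of the fragility mentioned above.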

l.

Re: Implementation of condition code on OoO CPUs

<0e3b17c9-8524-4f32-bc14-97b8d0bb87e8n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31475&group=comp.arch#31475

Newsgroups: comp.arch
Date: Mon, 3 Apr 2023 05:51:19 -0700 (PDT)
In-Reply-To: <931407f7-c5b0-4049-ac86-acc8fd22f478n@googlegroups.com>
Message-ID: <0e3b17c9-8524-4f32-bc14-97b8d0bb87e8n@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: robfi...@gmail.com (robf...@gmail.com)
 by: robf...@gmail.com - Mon, 3 Apr 2023 12:51 UTC

I have used condition codes with an ooo design, and I just give the condition code
registers their own register tag just like the regular registers, and reuse the same
renaming hardware. Tags 0 to 31 are for GPRs, and tags 32 to 63 are for SPRs
including condition codes, vector masks, and a few other select SPRs. It is twice
as many tags to deal with, but it is the same hardware. CCs do have their own
register file though. The interesting part is being able to treat the CCs as a group
which also has its own tag.

The next step is to just put the condition codes in a general purpose register,
which is what Thor, and some others do. Branching on cc is just a
branch-on-bit-set then.
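
A small sketch of the two ideas in this post, with made-up names: `tag_of` maps GPRs and SPRs into one shared tag space (0 to 31 and 32 to 63), and `branch_on_bit_set` shows how a compare result held in an ordinary register turns a conditional branch into a single-bit test. The bit layout produced by `compare` is an assumption for the example, not Thor's actual encoding.

```python
N_GPR = 32

def tag_of(kind, index):
    """Map a register to a tag in the shared rename space."""
    if not 0 <= index < N_GPR:
        raise ValueError("index out of range")
    if kind == "gpr":
        return index            # tags 0..31
    if kind == "spr":
        return N_GPR + index    # tags 32..63 (condition codes, vector masks, ...)
    raise ValueError("unknown register kind")

def compare(a, b):
    """Write a condition code into a general-purpose register value.

    Assumed layout: bit 0 = equal, bit 1 = less than, bit 2 = greater than.
    """
    return int(a == b) | (int(a < b) << 1) | (int(a > b) << 2)

def branch_on_bit_set(reg_value, bit):
    """A branch on condition code reduces to testing one bit of a register."""
    return bool((reg_value >> bit) & 1)

cc = compare(3, 7)
print(tag_of("spr", 0), branch_on_bit_set(cc, 1))
```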

Re: Implementation of condition code on OoO CPUs

<d29ec7de-ddf8-4487-9166-18b8da50f96bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31476&group=comp.arch#31476

Newsgroups: comp.arch
Date: Mon, 3 Apr 2023 06:11:59 -0700 (PDT)
In-Reply-To: <0e3b17c9-8524-4f32-bc14-97b8d0bb87e8n@googlegroups.com>
Message-ID: <d29ec7de-ddf8-4487-9166-18b8da50f96bn@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: luke.lei...@gmail.com (luke.l...@gmail.com)
 by: luke.l...@gmail.com - Mon, 3 Apr 2023 13:11 UTC

On Monday, April 3, 2023 at 1:51:21 PM UTC+1, robf...@gmail.com wrote:
> I have used condition codes with an ooo design, and I just give the condition code
> registers their own register tag just like the regular registers, and reuse the same
> renaming hardware. Tags 0 to 31 are for GPRs, and tags 32 to 63 are for SPRs
> including condition codes, vector masks, and a few other select SPRs. It is twice
> as many tags to deal with, but it is the same hardware. CCs do have their own
> register file though. The interesting part is being able to treat the CCs as a group
> which also has its own tag.

indeed. what i like about the CDC 6600, 68000, 66000, PDP8/11 etc. is
the fact that there are completely separate register files with very little
in the way of cross-over.

CDC 6600 has A B and X, where (i think) X was auto-increment which
of course avoids having every X-instruction have to also depend on
A or B, thus reducing the size of that Dependency Matrix Hazard Cell

if the only cross-over instructions are 1-in 1-out (mv operations in effect)
then the entire DM remains extremely sparse, each group (A-related,
B-related, X-related) having a *localised* group of multi-entry DM Cells
which, at O(N^2) for each sparse group, is a big reduction.

we just went over this on the Libre-SOC mailing list with one of the
hardware-inexperienced team members who proposed complex
multi-in-from-multiple-regfiles multi-out-to-multiple-regfiles
instructions and it took several hours to get across to them that
this was not okay.

as long as you do not go above 3-in 2-out (where one of those
outs is the result and the other is the Condition Code) then it is
just about manageable from a DM Hazard perspective.

bottom line it's not whether you do a full OoO Hazard Management
system, it's whether the instructions are properly designed to
not blow up the DM Cells to completely unmanageable proportions.
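
As a back-of-envelope illustration of the O(N^2) point, assuming the number of Dependency Matrix cells grows with the square of the number of entries that can depend on each other: keeping register groups separate means summing the squares instead of squaring the sum. The function `dm_cells` and the group sizes are invented for the example and do not describe any particular scoreboard.

```python
def dm_cells(group_sizes, unified):
    """Rough cell count for a dependency matrix.

    unified=True  -- every entry may depend on every other entry
    unified=False -- dependencies stay inside each group
    """
    if unified:
        n = sum(group_sizes)
        return n * n
    return sum(n * n for n in group_sizes)

groups = [8, 8, 8]   # e.g. three register groups of eight entries each
print(dm_cells(groups, unified=True), dm_cells(groups, unified=False))
```

Under this crude model, three separate groups of eight need a third of the cells of one unified matrix; cross-over instructions add back some cells between groups.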

l.

Re: Implementation of condition code on OoO CPUs

<0c576b9c-f39c-4fc8-847c-62d9fb386183n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31479&group=comp.arch#31479

Newsgroups: comp.arch
Date: Mon, 3 Apr 2023 14:27:27 -0700 (PDT)
In-Reply-To: <9af75445-1bf2-4656-8759-5d565df308b6n@googlegroups.com>
Message-ID: <0c576b9c-f39c-4fc8-847c-62d9fb386183n@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Mon, 3 Apr 2023 21:27 UTC

On Sunday, April 2, 2023 at 8:31:24 PM UTC-5, luke.l...@gmail.com wrote:
> On Friday, March 31, 2023 at 1:09:55 AM UTC+1, Quadibloc wrote:
>
> > However, in an architecture that uses *reservation stations* instead, in
> > line with the original Tomasulo algorithm, then I think it would make sense
> > to have the condition codes moving along with the operands, since this is
> > what mirrors the instruction flow.
<
> indeed. only problem being that Tomasulo requires multi-ported CAMs
<
That has been the lore for 30 years, but Shebanow and I solved this in 1991.
<
> if you want Multi-Issue execution. those get fantastically expensive *real*
> fast. alternatively you can stripe them (4-way issue ==> 4 completely
> independent sets of RSes and 4 completely independent sets of pipelines
> connected to them)
<
So goes the lore........
>
> l.

Re: Implementation of condition code on OoO CPUs

<e1d3cee8-51cb-4da4-ab40-86459047a5cbn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31481&group=comp.arch#31481
X-Received: by 2002:ac8:5f0b:0:b0:3d7:9d03:75b0 with SMTP id x11-20020ac85f0b000000b003d79d0375b0mr14488546qta.13.1680558042989;
Mon, 03 Apr 2023 14:40:42 -0700 (PDT)
X-Received: by 2002:a05:6870:1258:b0:17e:a9eb:196b with SMTP id
24-20020a056870125800b0017ea9eb196bmr316174oao.4.1680558042723; Mon, 03 Apr
2023 14:40:42 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 3 Apr 2023 14:40:42 -0700 (PDT)
In-Reply-To: <d29ec7de-ddf8-4487-9166-18b8da50f96bn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:6c65:606e:7226:bbdb;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:6c65:606e:7226:bbdb
References: <2023Mar30.180206@mips.complang.tuwien.ac.at> <9e9fd2fc-cd45-45c6-bd83-ba8593b2239cn@googlegroups.com>
<9cd85cc7-ee0b-4ca3-9527-46a4eccd9bcfn@googlegroups.com> <931407f7-c5b0-4049-ac86-acc8fd22f478n@googlegroups.com>
<0e3b17c9-8524-4f32-bc14-97b8d0bb87e8n@googlegroups.com> <d29ec7de-ddf8-4487-9166-18b8da50f96bn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e1d3cee8-51cb-4da4-ab40-86459047a5cbn@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 03 Apr 2023 21:40:42 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 4666
 by: MitchAlsup - Mon, 3 Apr 2023 21:40 UTC

On Monday, April 3, 2023 at 8:12:02 AM UTC-5, luke.l...@gmail.com wrote:
> On Monday, April 3, 2023 at 1:51:21 PM UTC+1, robf...@gmail.com wrote:
> > I have used condition codes with an ooo design, and I just give the condition code
> > registers their own register tag just like the regular registers, and reuse the same
> > renaming hardware. Tags 0 to 31 are for GPRs, and tags 32 to 63 are for SPRs
> > including condition codes, vector masks, and a few other select SPRs. It is twice
> > as many tags to deal with, but it is the same hardware. CCs do have their own
> > register file though. The interesting part is being able to treat the CCs as a group
> > which also has its own tag.
> indeed. what i like about the CDC 6600, 68000, 66000, PDP8/11 etc. is
> the fact that there are completely separate register files with very little
> in the way of cross-over.
>
> CDC 6600 has A B and X, where (i think) X was auto-increment which
> of course avoids having every X-instruction have to also depend on
> A or B, thus reducing the size of that Dependency Matrix Hazard Cell
<
Not quite:: When you wrote to A[1..5] you got an address which was then
used to load X[1..5]. When you wrote to A[6..7] you got an address and
X[6..7] was stored at that address. This was all done in the INC unit
which calculated an 18-bit address in a single cycle, but had a latency
of 2 when successive INC instructions were executed.
>
> if the only cross-over instructions are 1-in 1-out (mv operations in effect)
> then the entire DM remains extremely sparse, each group (A-related,
> B-related, X-related) having a *localised* group of multi-entry DM Cells
> which on O(N^2) for each sparse group is a big reduction.
<
There was the SHIFT instruction which stripped the exponent off of
an X[register], delivering a FP value back to an X[register] and delivering
the exponent to a B[register]. There was also a put-it-back-together
instruction that took a B[register] and an X[register] and created
a new X[register] containing X[source]×2^B[register].
>
> we just went over this on the Libre-SOC mailing list with one of the
> hardware-inexperienced team members who proposed complex
> multi-in-from-one-multi-regfiles multi-out-to-multiple-regfiles
> instructions and it took several hours to get across to them that
> this was not okay.
<
Would have loved to have been a fly on the wall.........
>
> as long as you do not go above 3-in 2-out (where one of those
> outs is the result and the other is the Condition Code) then it is
> just about manageable from a DM Hazard perspective.
<
I managed to get N+1 in and 2-out with my CARRY strategy.......
>
> bottom line it's not whether you do a full OoO Hazard Management
> system, it's whether the instructions are properly designed to
> not blow up the DM Cells to completely unmanageable proportions.
<
Exactly--the OoO Hazard prevention is merely a tool. The less you
use/need it, the better the ISA.
>
> l.
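
The A/X coupling described above (a write to A1..A5 triggers a load into the matching X register, a write to A6..A7 triggers a store from it) can be sketched as a toy model. This is only an illustration: the class name `Cdc6600Regs`, the dictionary-backed memory, and the treatment of A0 as having no side effect are assumptions made for the sketch, not details taken from the posts, and the INC unit's timing is not modelled at all.

```python
ADDR_MASK = (1 << 18) - 1  # the INC unit computes an 18-bit address


class Cdc6600Regs:
    """Toy model of the A/X register coupling (hypothetical, for illustration)."""

    def __init__(self):
        self.A = [0] * 8   # address registers
        self.X = [0] * 8   # operand registers
        self.mem = {}      # sparse word-addressed memory (assumption)

    def set_a(self, i, addr):
        """Write A[i]; the side effect depends on which A register is written."""
        addr &= ADDR_MASK
        self.A[i] = addr
        if 1 <= i <= 5:
            # Writing A1..A5 loads X1..X5 from the new address.
            self.X[i] = self.mem.get(addr, 0)
        elif 6 <= i <= 7:
            # Writing A6..A7 stores X6..X7 at the new address.
            self.mem[addr] = self.X[i]
        # A0: no memory side effect in this sketch (assumption).
```

The point of the model is that every X-register load or store is expressed as a write to one A register, so the only cross-file dependency is the fixed pairing A[i] to X[i].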

Re: Implementation of condition code on OoO CPUs

<109771a2-dfc2-4b70-a16d-d03174565746n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31535&group=comp.arch#31535
X-Received: by 2002:a05:622a:199f:b0:3e4:db08:ae9c with SMTP id u31-20020a05622a199f00b003e4db08ae9cmr2229936qtc.8.1680775204654;
Thu, 06 Apr 2023 03:00:04 -0700 (PDT)
X-Received: by 2002:a05:6870:12d0:b0:17e:2ddf:b23c with SMTP id
16-20020a05687012d000b0017e2ddfb23cmr4090233oam.0.1680775204403; Thu, 06 Apr
2023 03:00:04 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 Apr 2023 03:00:04 -0700 (PDT)
In-Reply-To: <0c576b9c-f39c-4fc8-847c-62d9fb386183n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=92.19.80.230; posting-account=soFpvwoAAADIBXOYOBcm_mixNPAaxW9p
NNTP-Posting-Host: 92.19.80.230
References: <2023Mar30.180206@mips.complang.tuwien.ac.at> <9e9fd2fc-cd45-45c6-bd83-ba8593b2239cn@googlegroups.com>
<9af75445-1bf2-4656-8759-5d565df308b6n@googlegroups.com> <0c576b9c-f39c-4fc8-847c-62d9fb386183n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <109771a2-dfc2-4b70-a16d-d03174565746n@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: luke.lei...@gmail.com (luke.l...@gmail.com)
Injection-Date: Thu, 06 Apr 2023 10:00:04 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: luke.l...@gmail.com - Thu, 6 Apr 2023 10:00 UTC

On Monday, April 3, 2023 at 10:27:30 PM UTC+1, MitchAlsup wrote:
> On Sunday, April 2, 2023 at 8:31:24 PM UTC-5, luke.l...@gmail.com wrote:
> > indeed. only problem being that Tomasulo requires multi-ported CAMs
> <
> That has been the lore for 30 years, but Shebanow and I solved this in 1991.

care to tell? wouldn't happen to have involved splitting the ROB# into
a partial unary- partial binary- encoding (6600-style unary DMs combined
with traditional CAM) would it?

l.

Re: Implementation of condition code on OoO CPUs

<ca412eca-5891-4656-b021-ddfac8441eafn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31536&group=comp.arch#31536
X-Received: by 2002:a05:622a:1911:b0:3d7:9d03:75b0 with SMTP id w17-20020a05622a191100b003d79d0375b0mr2490813qtc.13.1680775948585;
Thu, 06 Apr 2023 03:12:28 -0700 (PDT)
X-Received: by 2002:a05:6871:8f01:b0:17f:f1f4:b006 with SMTP id
zz1-20020a0568718f0100b0017ff1f4b006mr4554431oab.11.1680775948266; Thu, 06
Apr 2023 03:12:28 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 Apr 2023 03:12:27 -0700 (PDT)
In-Reply-To: <e1d3cee8-51cb-4da4-ab40-86459047a5cbn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=92.19.80.230; posting-account=soFpvwoAAADIBXOYOBcm_mixNPAaxW9p
NNTP-Posting-Host: 92.19.80.230
References: <2023Mar30.180206@mips.complang.tuwien.ac.at> <9e9fd2fc-cd45-45c6-bd83-ba8593b2239cn@googlegroups.com>
<9cd85cc7-ee0b-4ca3-9527-46a4eccd9bcfn@googlegroups.com> <931407f7-c5b0-4049-ac86-acc8fd22f478n@googlegroups.com>
<0e3b17c9-8524-4f32-bc14-97b8d0bb87e8n@googlegroups.com> <d29ec7de-ddf8-4487-9166-18b8da50f96bn@googlegroups.com>
<e1d3cee8-51cb-4da4-ab40-86459047a5cbn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ca412eca-5891-4656-b021-ddfac8441eafn@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: luke.lei...@gmail.com (luke.l...@gmail.com)
Injection-Date: Thu, 06 Apr 2023 10:12:28 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: luke.l...@gmail.com - Thu, 6 Apr 2023 10:12 UTC

On Monday, April 3, 2023 at 10:40:44 PM UTC+1, MitchAlsup wrote:
> On Monday, April 3, 2023 at 8:12:02 AM UTC-5, luke.l...@gmail.com wrote:
> > we just went over this on the Libre-SOC mailing list with one of the
> > hardware-inexperienced team members who proposed complex
> > multi-in-from-one-multi-regfiles multi-out-to-multiple-regfiles
> > instructions and it took several hours to get across to them that
> > this was not okay.
> <
> Would have loved to have been a fly on the wall.........

ngggh if you like "Eastenders" or other high-stress "addictive"
conversations, it would have indeed been entertaining. honestly
it wasn't fun for me, being pulled out of heavy-concentration mode
from other much higher-priority tasks. sigh.

> >
> > as long as you do not go above 3-in 2-out (where one of those
> > outs is the result and the other is the Condition Code) then it is
> > just about manageable from a DM Hazard perspective.
> <
> I managed to get N+1 in and 2-out with my CARRY strategy.......

yes we're going to have to do micro-coding to avoid the
DMs getting overloaded: the 1st in the chain has to be
3-in 1-out (to read the existing reg-used-as-carry-in) and
the last in the chain has to be 2-in 2-out (to write the
reg-used-as-carry-out), but everything in between can
be 2-in 1-out *if* and only if an operand-forwarding-bus
exists.

i'm reasonably confident that CARRY would be similar?
(certainly i'd expect you to have a Carry-Op-Fwd-Bus for sure)

unless you don't store CARRY in Architectural State (SPR/CSRs)
in which case it wouldn't be possible to service an interrupt
in the middle of a chain-set, nor would it be possible to
either start with an incoming CARRY, and you'd need one
extra "digit" (one extra mul-add) with zeros in it in order
to receive the carry-out into (actual) registers?

i do really like the CARRY-has-more-than-one-bit concept.
Power ISA version 1 had it (back when it was a 32-bit-only
ISA) but they removed it. i think that was a mistake. we're
now having to propose 5 new instructions (all 3-in 2-out)
which "overload" one additional 64-bit register on RD
and one with WR, with the implicit meaning of "64-bit carry-in/out",
sigh.

intriguingly they are all remarkably similar to Intel x86
instructions that have existed for 12+ years, but are
juuust that little bit subtly different, having been designed
for biginteger arbitrary-length vector chaining.
https://libre-soc.org/openpower/sv/biginteger/analysis/

l.

Re: Implementation of condition code on OoO CPUs

<u0mgls$c4jv$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=31537&group=comp.arch#31537
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Implementation of condition code on OoO CPUs
Date: Thu, 6 Apr 2023 15:15:40 +0200
Organization: A noiseless patient Spider
Lines: 34
Message-ID: <u0mgls$c4jv$1@dont-email.me>
References: <2023Mar30.180206@mips.complang.tuwien.ac.at>
<9e9fd2fc-cd45-45c6-bd83-ba8593b2239cn@googlegroups.com>
<9cd85cc7-ee0b-4ca3-9527-46a4eccd9bcfn@googlegroups.com>
<931407f7-c5b0-4049-ac86-acc8fd22f478n@googlegroups.com>
<0e3b17c9-8524-4f32-bc14-97b8d0bb87e8n@googlegroups.com>
<d29ec7de-ddf8-4487-9166-18b8da50f96bn@googlegroups.com>
<e1d3cee8-51cb-4da4-ab40-86459047a5cbn@googlegroups.com>
<ca412eca-5891-4656-b021-ddfac8441eafn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 6 Apr 2023 13:15:40 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="e125b273b4bee778f84eb1d826cc1e86";
logging-data="397951"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/c9J8jDn6gI3x3gnnICbppS+YVFac30zhoi3GcgQCasQ=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.15
Cancel-Lock: sha1:vnjDegqiWE0DEx+4yOnFuPC12dk=
In-Reply-To: <ca412eca-5891-4656-b021-ddfac8441eafn@googlegroups.com>
 by: Terje Mathisen - Thu, 6 Apr 2023 13:15 UTC

luke.l...@gmail.com wrote:
[snip]
> i do really like the CARRY-has-more-than-one-bit concept.
> Power ISA version 1 had it (back when it was a 32-bit-only
> ISA) but they removed it. i think that was a mistake. we're
> now having to propose 5 new instructions (all 3-in 2-out)
> which "overload" one additional 64-bit register on RD
> and one with WR, with the implicit meaning of "64-bit carry-in/out",
> sigh.
>
> intriguingly they are all remarkably similar to Intel x86
> instructions that have existed for 12+ years, but are
> juuust that little bit subtly different, having been designed
> for biginteger arbitrary-length vector chaining.
> https://libre-soc.org/openpower/sv/biginteger/analysis/

Those 12+ years must refer to MULX in 2012, but in reality it goes all
the way back to 1978:

x86 have always had 2-in/2-out as the standard MUL opcode, the only
problem being that most of the (register) operands were implied: MULX
provided a way to encode more register names.

For bigint applications I've usually found that the latency of the MUL
provides enough time to do all the required ADD/ADC ops, but it does
become significantly easier, and probably much more compiler friendly
after they added both MULX and allowed a second flag to be used in a
carry chain, separately from the standard CARRY.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Implementation of condition code on OoO CPUs

<31bca76b-21a5-4aa1-8d43-3a7c57f3d8bdn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31539&group=comp.arch#31539
X-Received: by 2002:a05:622a:1a99:b0:3df:f0cf:97e with SMTP id s25-20020a05622a1a9900b003dff0cf097emr2305710qtc.13.1680792813772;
Thu, 06 Apr 2023 07:53:33 -0700 (PDT)
X-Received: by 2002:a05:6808:659:b0:386:b9bc:a2b4 with SMTP id
z25-20020a056808065900b00386b9bca2b4mr2768000oih.10.1680792813431; Thu, 06
Apr 2023 07:53:33 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 Apr 2023 07:53:33 -0700 (PDT)
In-Reply-To: <u0mgls$c4jv$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=92.19.80.230; posting-account=soFpvwoAAADIBXOYOBcm_mixNPAaxW9p
NNTP-Posting-Host: 92.19.80.230
References: <2023Mar30.180206@mips.complang.tuwien.ac.at> <9e9fd2fc-cd45-45c6-bd83-ba8593b2239cn@googlegroups.com>
<9cd85cc7-ee0b-4ca3-9527-46a4eccd9bcfn@googlegroups.com> <931407f7-c5b0-4049-ac86-acc8fd22f478n@googlegroups.com>
<0e3b17c9-8524-4f32-bc14-97b8d0bb87e8n@googlegroups.com> <d29ec7de-ddf8-4487-9166-18b8da50f96bn@googlegroups.com>
<e1d3cee8-51cb-4da4-ab40-86459047a5cbn@googlegroups.com> <ca412eca-5891-4656-b021-ddfac8441eafn@googlegroups.com>
<u0mgls$c4jv$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <31bca76b-21a5-4aa1-8d43-3a7c57f3d8bdn@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: luke.lei...@gmail.com (luke.l...@gmail.com)
Injection-Date: Thu, 06 Apr 2023 14:53:33 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3126
 by: luke.l...@gmail.com - Thu, 6 Apr 2023 14:53 UTC

On Thursday, April 6, 2023 at 2:15:44 PM UTC+1, Terje Mathisen wrote:
> luke.l...@gmail.com wrote:

> > https://libre-soc.org/openpower/sv/biginteger/analysis/
> Those 12+ years must refer to MULX in 2012,

yeah there's a really good app-note about MULX from Intel.
but we also added 3-in 2-out variants of the shift instructions
*and* a div-mod instruction that, by a really nice not-coincidence,
is the exact mathematical opposite of the 3-in 2-out mul
instruction.

> but in reality it goes all the way back to 1978:
>
> x86 have always had 2-in/2-out as the standard MUL opcode, the only
> problem being that most of the (register) operands were implied: MULX
> provided a way to encode more register names.

and dropped OV i think? because even A*B+C still does not overflow
(maximum value 0xfffe0001+0x0000ffff does not overflow)

> For bigint applications I've usually found that the latency of the MUL
> provides enough time to do all the required ADD/ADC ops, but it does
> become significantly easier, and probably much more compiler friendly
> after they added both MULX and allowed a second flag to be used in a
> carry chain, separately from the standard CARRY.

the kicker here (compared to SVP64) *is* that you had to do
2 instructions at all (per RADIX64 digit), whereas by having 3-in 2-out
it can be Vector-chain-looped by setting RC as a scalar and
(implicitly) having it as the 2nd destination for the top-half
of the 128-bit result.

l.
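
The two claims in the post above (A*B+C cannot overflow the double-width result, and a 3-in 2-out multiply-add lets a bigint-times-scalar loop use one operation per 64-bit digit) can be checked with a small sketch. The function names `mul_add_carry` and `bigint_times_scalar` are made up for this illustration and are not the proposed instruction names; digits are little-endian 64-bit words.

```python
MASK64 = (1 << 64) - 1


def mul_add_carry(a, b, carry_in):
    """One 3-in 2-out step: returns (carry_out, low) of a*b + carry_in.

    With a, b, carry_in all below 2**64 the sum is at most
    (2**64 - 1)**2 + (2**64 - 1) = 2**128 - 2**64, so it fits in 128 bits.
    """
    wide = a * b + carry_in
    return wide >> 64, wide & MASK64


def bigint_times_scalar(digits, scalar, carry_in=0):
    """Multiply a little-endian list of 64-bit digits by a 64-bit scalar.

    The carry plays the role of the scalar register that is both read and
    written by every step of the chain.
    """
    carry = carry_in
    out = []
    for d in digits:
        carry, low = mul_add_carry(d, scalar, carry)
        out.append(low)
    return out, carry


def to_int(digits):
    """Little-endian 64-bit digits to a Python integer."""
    return sum(d << (64 * i) for i, d in enumerate(digits))
```

The 16-bit case quoted in the post is the same bound at a smaller width: 0xffff*0xffff + 0xffff = 0xffff0000, which still fits in 32 bits.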

Re: Implementation of condition code on OoO CPUs

<6d654aac-3e52-4cf1-b6a6-63b1f03cab5bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31540&group=comp.arch#31540
X-Received: by 2002:a05:622a:1895:b0:3e1:e1ae:9d5c with SMTP id v21-20020a05622a189500b003e1e1ae9d5cmr2515460qtc.11.1680795112428;
Thu, 06 Apr 2023 08:31:52 -0700 (PDT)
X-Received: by 2002:a05:6870:6094:b0:17e:3201:41b0 with SMTP id
t20-20020a056870609400b0017e320141b0mr4992135oae.5.1680795112061; Thu, 06 Apr
2023 08:31:52 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 Apr 2023 08:31:51 -0700 (PDT)
In-Reply-To: <109771a2-dfc2-4b70-a16d-d03174565746n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:c8d3:9dfd:4dac:9816;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:c8d3:9dfd:4dac:9816
References: <2023Mar30.180206@mips.complang.tuwien.ac.at> <9e9fd2fc-cd45-45c6-bd83-ba8593b2239cn@googlegroups.com>
<9af75445-1bf2-4656-8759-5d565df308b6n@googlegroups.com> <0c576b9c-f39c-4fc8-847c-62d9fb386183n@googlegroups.com>
<109771a2-dfc2-4b70-a16d-d03174565746n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6d654aac-3e52-4cf1-b6a6-63b1f03cab5bn@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 06 Apr 2023 15:31:52 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2269
 by: MitchAlsup - Thu, 6 Apr 2023 15:31 UTC

On Thursday, April 6, 2023 at 5:00:06 AM UTC-5, luke.l...@gmail.com wrote:
> On Monday, April 3, 2023 at 10:27:30 PM UTC+1, MitchAlsup wrote:
> > On Sunday, April 2, 2023 at 8:31:24 PM UTC-5, luke.l...@gmail.com wrote:
> > > indeed. only problem being that Tomasulo requires multi-ported CAMs
> > <
> > That has been the lore for 30 years, but Shebanow and I solved this in 1991.
> care to tell? wouldn't happen to have involved splitting the ROB# into
> a partial unary- partial binary- encoding (6600-style unary DMs combined
> with traditional CAM) would it?
<
Nope, much more straightforward than that.
But it does require knowing which function unit (or slot) delivers the result;
which Tomasulo did not, but Thornton did.
>
> l.

Re: Implementation of condition code on OoO CPUs

<848856b2-5a2f-488c-8045-c9d35ba1ede4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31541&group=comp.arch#31541
X-Received: by 2002:a05:620a:24c4:b0:73b:7c9b:35a7 with SMTP id m4-20020a05620a24c400b0073b7c9b35a7mr2250113qkn.9.1680795333626;
Thu, 06 Apr 2023 08:35:33 -0700 (PDT)
X-Received: by 2002:a05:6871:784:b0:177:bf3e:5d4f with SMTP id
o4-20020a056871078400b00177bf3e5d4fmr4770335oap.8.1680795333328; Thu, 06 Apr
2023 08:35:33 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!panix!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 Apr 2023 08:35:33 -0700 (PDT)
In-Reply-To: <ca412eca-5891-4656-b021-ddfac8441eafn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:c8d3:9dfd:4dac:9816;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:c8d3:9dfd:4dac:9816
References: <2023Mar30.180206@mips.complang.tuwien.ac.at> <9e9fd2fc-cd45-45c6-bd83-ba8593b2239cn@googlegroups.com>
<9cd85cc7-ee0b-4ca3-9527-46a4eccd9bcfn@googlegroups.com> <931407f7-c5b0-4049-ac86-acc8fd22f478n@googlegroups.com>
<0e3b17c9-8524-4f32-bc14-97b8d0bb87e8n@googlegroups.com> <d29ec7de-ddf8-4487-9166-18b8da50f96bn@googlegroups.com>
<e1d3cee8-51cb-4da4-ab40-86459047a5cbn@googlegroups.com> <ca412eca-5891-4656-b021-ddfac8441eafn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <848856b2-5a2f-488c-8045-c9d35ba1ede4n@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 06 Apr 2023 15:35:33 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3979
 by: MitchAlsup - Thu, 6 Apr 2023 15:35 UTC

On Thursday, April 6, 2023 at 5:12:30 AM UTC-5, luke.l...@gmail.com wrote:
> On Monday, April 3, 2023 at 10:40:44 PM UTC+1, MitchAlsup wrote:

> > > as long as you do not go above 3-in 2-out (where one of those
> > > outs is the result and the other is the Condition Code) then it is
> > > just about manageable from a DM Hazard perspective.
> > <
> > I managed to get N+1 in and 2-out with my CARRY strategy.......
<
> yes we're going to have to do micro-coding to avoid the
> DMs getting overloaded: the 1st in the chain has to be
> 3-in 1-out (to read the existing reg-used-as-carry-in) and
> the last in the chain has to be 2-in 2-out (to write the
> reg-used-as-carry-out), but everything in between can
> be 2-in 1-out *if* and only if an operand-forwarding-bus
> exists.
>
> i'm reasonably confident that CARRY would be similar?
> (certainly i'd expect you to have a Carry-Op-Fwd-Bus for sure)
<
Ins and Outs are similar (likely identical) but dealing with CARRY
was done by an independent result-operand bus--like programmable
forwarding.
>
> unless you don't store CARRY in Architectural State (SPR/CSRs)
> in which case it wouldn't be possible to service an interrupt
> in the middle of a chain-set, nor would it be possible to
> either start with an incoming CARRY, and you'd need one
> extra "digit" (one extra mul-add) with zeros in it in order
> to receive the carry-out into (actual) registers?
<
It does make it to architectural state when needed.
>
> i do really like the CARRY-has-more-than-one-bit concept.
> Power ISA version 1 had it (back when it was a 32-bit-only
> ISA) but they removed it. i think that was a mistake. we're
> now having to propose 5 new instructions (all 3-in 2-out)
> which "overload" one additional 64-bit register on RD
> and one with WR, with the implicit meaning of "64-bit carry-in/out",
> sigh.
<
Sins of the past catching up.....
>
> intriguingly they are all remarkably similar to Intel x86
> instructions that have existed for 12+ years, but are
> juuust that little bit subtly different, having been designed
> for biginteger arbitrary-length vector chaining.
> https://libre-soc.org/openpower/sv/biginteger/analysis/
>
> l.

Re: Implementation of condition code on OoO CPUs

<2023Apr7.090831@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=31552&group=comp.arch#31552
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Implementation of condition code on OoO CPUs
Date: Fri, 07 Apr 2023 07:08:31 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 31
Message-ID: <2023Apr7.090831@mips.complang.tuwien.ac.at>
References: <2023Mar30.180206@mips.complang.tuwien.ac.at> <9e9fd2fc-cd45-45c6-bd83-ba8593b2239cn@googlegroups.com> <9cd85cc7-ee0b-4ca3-9527-46a4eccd9bcfn@googlegroups.com> <931407f7-c5b0-4049-ac86-acc8fd22f478n@googlegroups.com> <0e3b17c9-8524-4f32-bc14-97b8d0bb87e8n@googlegroups.com> <d29ec7de-ddf8-4487-9166-18b8da50f96bn@googlegroups.com> <e1d3cee8-51cb-4da4-ab40-86459047a5cbn@googlegroups.com> <ca412eca-5891-4656-b021-ddfac8441eafn@googlegroups.com> <u0mgls$c4jv$1@dont-email.me>
Injection-Info: dont-email.me; posting-host="47321a2d7491f360f53f7cdf88144d7a";
logging-data="793991"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+zuautRNeIOnhDhx85eWiH"
Cancel-Lock: sha1:UcPui/wBMpjY6mcrpOdcEeHJtJI=
X-newsreader: xrn 10.11
 by: Anton Ertl - Fri, 7 Apr 2023 07:08 UTC

Terje Mathisen <terje.mathisen@tmsw.no> writes:
>x86 have always had 2-in/2-out as the standard MUL opcode, the only
>problem being that most of the (register) operands were implied: MULX
>provided a way to encode more register names.

Given that register-register moves are almost for free in Intel's
performance cores, that's only a minor benefit. The major benefit of
MULX for the intended application is that it does not destroy the
carry and overflow flags.

>For bigint applications I've usually found that the latency of the MUL
>provides enough time to do all the required ADD/ADC ops,

What kind of bigint applications do you have in mind? For
multiprecision multiplication, all the component multiplications are
independent of each other, and every addition is dependent on a
multiplication; e.g., in the simplest case RSTU = AB * CD (where A, B,
C, D, R, S, T, U are 64-bit values):

BD=B*D || AD=A*D || BC=B*C || AC=A*C
U=lo(BD) || Tc0,T0=hi(BD)+lo(AD) || Sc0,S0=hi(AD)+hi(BC)
Tc,T=T0+lo(BC) || Sc1,S1=S0+lo(AC)+Tc0 || R0=hi(AC)+Sc0
Sc,S=S1+Tc || R1=R0+Sc1
R=R1+Sc

Each line contains operations that can be performed in parallel.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
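
The five-step schedule for RSTU = AB * CD in the post above can be transcribed directly and checked against arbitrary-precision multiplication. This is a sketch: the helper `add_c` and the function name `mul128` are assumptions of the illustration, and each group of statements corresponds to one line of the schedule (the operations within a group are independent of each other).

```python
MASK64 = (1 << 64) - 1


def lo(x):
    return x & MASK64


def hi(x):
    return x >> 64


def add_c(*terms):
    """Add 64-bit terms; return (carry_out, 64-bit sum)."""
    s = sum(terms)
    return s >> 64, s & MASK64


def mul128(ab, cd):
    """128x128 -> 256-bit product following the posted schedule."""
    a, b = hi(ab), lo(ab)
    c, d = hi(cd), lo(cd)

    # Step 1: four independent 64x64 -> 128 multiplications.
    bd, ad, bc, ac = b * d, a * d, b * c, a * c

    # Step 2
    u = lo(bd)
    tc0, t0 = add_c(hi(bd), lo(ad))
    sc0, s0 = add_c(hi(ad), hi(bc))

    # Step 3
    tc, t = add_c(t0, lo(bc))
    sc1, s1 = add_c(s0, lo(ac), tc0)
    r0 = hi(ac) + sc0

    # Step 4
    sc, s = add_c(s1, tc)
    r1 = r0 + sc1

    # Step 5
    r = r1 + sc

    return (r << 192) | (s << 128) | (t << 64) | u
```

Every carry out of the T column (Tc0, Tc) is consumed in the S column, and every carry out of the S column (Sc0, Sc1, Sc) is consumed in R, so no carry is lost and R itself cannot overflow because the full product fits in 256 bits.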

Re: Implementation of condition code on OoO CPUs

<901f6832-b341-4c45-9759-b2f57ec4e528n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31553&group=comp.arch#31553
X-Received: by 2002:a05:622a:152:b0:3de:d15a:847f with SMTP id v18-20020a05622a015200b003ded15a847fmr635150qtw.0.1680854573704;
Fri, 07 Apr 2023 01:02:53 -0700 (PDT)
X-Received: by 2002:aca:a9c2:0:b0:389:8dad:7832 with SMTP id
s185-20020acaa9c2000000b003898dad7832mr296217oie.8.1680854573194; Fri, 07 Apr
2023 01:02:53 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 7 Apr 2023 01:02:52 -0700 (PDT)
In-Reply-To: <2023Apr7.090831@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=92.19.80.230; posting-account=soFpvwoAAADIBXOYOBcm_mixNPAaxW9p
NNTP-Posting-Host: 92.19.80.230
References: <2023Mar30.180206@mips.complang.tuwien.ac.at> <9e9fd2fc-cd45-45c6-bd83-ba8593b2239cn@googlegroups.com>
<9cd85cc7-ee0b-4ca3-9527-46a4eccd9bcfn@googlegroups.com> <931407f7-c5b0-4049-ac86-acc8fd22f478n@googlegroups.com>
<0e3b17c9-8524-4f32-bc14-97b8d0bb87e8n@googlegroups.com> <d29ec7de-ddf8-4487-9166-18b8da50f96bn@googlegroups.com>
<e1d3cee8-51cb-4da4-ab40-86459047a5cbn@googlegroups.com> <ca412eca-5891-4656-b021-ddfac8441eafn@googlegroups.com>
<u0mgls$c4jv$1@dont-email.me> <2023Apr7.090831@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <901f6832-b341-4c45-9759-b2f57ec4e528n@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: luke.lei...@gmail.com (luke.l...@gmail.com)
Injection-Date: Fri, 07 Apr 2023 08:02:53 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2720
 by: luke.l...@gmail.com - Fri, 7 Apr 2023 08:02 UTC

On Friday, April 7, 2023 at 8:39:29 AM UTC+1, Anton Ertl wrote:
> Terje Mathisen <terje.m...@tmsw.no> writes:
> >x86 have always had 2-in/2-out as the standard MUL opcode, the only
> >problem being that most of the (register) operands were implied: MULX
> >provided a way to encode more register names.
> Given that register-register moves are almost for free in Intel's
> performance cores, that's only a minor benefit. The major benefit of
> MULX for the intended application is that it does not destroy the
> carry and overflow flags.

ahh yes that was it, i remember that in the whitepaper
oink, 404 not found
https://www.google.com/search?q=ia-large-integer-arithmetic-paper.pdf

ah HA!
https://web.archive.org/web/20150131061304/https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-large-integer-arithmetic-paper.pdf

it wasn't just the addition of MULX, it was the addition of ADCX/ADOX
that did the trick.

the diagrams in that paper are cute, make it really clear what's
going on.

l.
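
Why MULX plus ADCX/ADOX helps can be seen in a small software model of one multiply-accumulate row. The sketch below only models the flag behaviour discussed in the thread (MULX leaves the flags alone, ADCX reads and writes only the carry flag, ADOX reads and writes only the overflow flag); the `Flags` class, the function names and the loop structure are assumptions of this illustration, not the code from the Intel paper. Digits are little-endian 64-bit words.

```python
MASK64 = (1 << 64) - 1


class Flags:
    """Just the two flags used as independent carry chains."""

    def __init__(self):
        self.cf = 0
        self.of = 0


def mulx(a, b):
    """64x64 -> (hi, lo); does not touch any flag."""
    p = a * b
    return p >> 64, p & MASK64


def adcx(f, dst, src):
    """dst + src + CF; updates CF only."""
    s = dst + src + f.cf
    f.cf = s >> 64
    return s & MASK64


def adox(f, dst, src):
    """dst + src + OF; updates OF only."""
    s = dst + src + f.of
    f.of = s >> 64
    return s & MASK64


def mul_acc_row(acc, a, b):
    """acc += a * b, where a is a digit list and b a single 64-bit digit.

    acc must have at least len(a) + 2 digits and enough headroom that the
    sum fits (caller's responsibility in this sketch).
    """
    f = Flags()
    n = len(a)
    for i in range(n):
        hi, lo = mulx(a[i], b)
        acc[i] = adcx(f, acc[i], lo)          # low halves ride the CF chain
        acc[i + 1] = adox(f, acc[i + 1], hi)  # high halves ride the OF chain
    # Flush both pending carries into the top digits.
    acc[n] = adcx(f, acc[n], 0)
    acc[n + 1] = adox(f, acc[n + 1], 0)
    acc[n + 1] = adcx(f, acc[n + 1], 0)
    return acc


def to_int(digits):
    """Little-endian 64-bit digits to a Python integer."""
    return sum(d << (64 * i) for i, d in enumerate(digits))
```

Because the multiply never clobbers either flag and the two adds use different flags, the low-half chain and the high-half chain can be interleaved freely without saving or restoring carries between steps.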

Re: Implementation of condition code on OoO CPUs

<3e30278e-6e95-4665-8430-445d5ecd71f4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31554&group=comp.arch#31554
X-Received: by 2002:a05:6214:560b:b0:56f:80e:701b with SMTP id mg11-20020a056214560b00b0056f080e701bmr448457qvb.2.1680862964058;
Fri, 07 Apr 2023 03:22:44 -0700 (PDT)
X-Received: by 2002:a9d:75cc:0:b0:6a3:1eab:dbdf with SMTP id
c12-20020a9d75cc000000b006a31eabdbdfmr473181otl.6.1680862963717; Fri, 07 Apr
2023 03:22:43 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 7 Apr 2023 03:22:43 -0700 (PDT)
In-Reply-To: <901f6832-b341-4c45-9759-b2f57ec4e528n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:7805:ae79:1e0c:2dad;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:7805:ae79:1e0c:2dad
References: <2023Mar30.180206@mips.complang.tuwien.ac.at> <9e9fd2fc-cd45-45c6-bd83-ba8593b2239cn@googlegroups.com>
<9cd85cc7-ee0b-4ca3-9527-46a4eccd9bcfn@googlegroups.com> <931407f7-c5b0-4049-ac86-acc8fd22f478n@googlegroups.com>
<0e3b17c9-8524-4f32-bc14-97b8d0bb87e8n@googlegroups.com> <d29ec7de-ddf8-4487-9166-18b8da50f96bn@googlegroups.com>
<e1d3cee8-51cb-4da4-ab40-86459047a5cbn@googlegroups.com> <ca412eca-5891-4656-b021-ddfac8441eafn@googlegroups.com>
<u0mgls$c4jv$1@dont-email.me> <2023Apr7.090831@mips.complang.tuwien.ac.at> <901f6832-b341-4c45-9759-b2f57ec4e528n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3e30278e-6e95-4665-8430-445d5ecd71f4n@googlegroups.com>
Subject: Re: Implementation of condition code on OoO CPUs
From: already5...@yahoo.com (Michael S)
Injection-Date: Fri, 07 Apr 2023 10:22:44 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3385
 by: Michael S - Fri, 7 Apr 2023 10:22 UTC

On Friday, April 7, 2023 at 11:02:55 AM UTC+3, luke.l...@gmail.com wrote:
> On Friday, April 7, 2023 at 8:39:29 AM UTC+1, Anton Ertl wrote:
> > Terje Mathisen <terje.m...@tmsw.no> writes:
> > >x86 have always had 2-in/2-out as the standard MUL opcode, the only
> > >problem being that most of the (register) operands were implied: MULX
> > >provided a way to encode more register names.
> > Given that register-register moves are almost for free in Intel's
> > performance cores, that's only a minor benefit. The major benefit of
> > MULX for the intended application is that it does not destroy the
> > carry and overflow flags.
> ahh yes that was it, i remember that in the whitepaper
> oink, 404 not found
> https://www.google.com/search?q=ia-large-integer-arithmetic-paper.pdf
>
> ah HA!
> https://web.archive.org/web/20150131061304/https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-large-integer-arithmetic-paper.pdf
>
> it wasn't just the addition of MULX, it was the addition of ADCX/ADOX
> that did the trick.
>
> the diagrams in that paper are cute, make it really clear what's
> going on.
>
> l.

They forgot to make timing comparisons for implementations of multiplication
of a particular size with and without MULX/ADCX/ADOX.
I'd like to see the numbers not just for 512x512, which does not strike me as
something commonly used, but also for smaller sizes, down to 128x128, which
looks to me like the most frequently used case of extended-precision multiplication.

Re: Implementation of condition code on OoO CPUs

<2023Apr7.142207@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=31558&group=comp.arch#31558

 by: Anton Ertl - Fri, 7 Apr 2023 12:22 UTC

Michael S <already5chosen@yahoo.com> writes:
>They forgot to make timing comparisons for implementation of multiplication
>of particular size with and without MULX/ADCX/ADOX.
>I'd like to see the numbers not just for 512x512, which does not strike me as
>something used commonly, but also for smaller sizes, down to 128x128 which
>looks to me as most frequently used case of extended precision multiplication.

What makes you think that 128x128 is most-frequently used? And do you
have any numbers on usage frequency that I can cite?

My thinking is that the biggest use of multiprecision arithmetic is
in cryptography, where RSA works with 1024-bit (deemed too small by
many these days), 2048-bit, and 4096-bit keys, and taking the p-th
power of those involves a number of squarings and multiplications (and
Intel has a paper on squaring in addition to one on multiplication).

However,

1) RSA is only used for exchanging a symmetric key, and then the rest
of the connection uses a symmetric key; nevertheless, key exchange is
relevant for the startup overhead of a connection.

2) Many users seem to replace RSA by elliptic curve cryptography
(e.g., Ed25519); I don't know how important multi-precision
multiplication is for that.
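
To make "a number of squarings and multiplications" concrete, here is a minimal sketch of plain left-to-right binary (square-and-multiply) modular exponentiation that counts both kinds of step. It is an illustration only: the name `modexp_count` is made up for this example, and production RSA code commonly uses windowing, Montgomery arithmetic and the CRT rather than this simple loop.

```python
def modexp_count(base, exponent, modulus):
    """Left-to-right square-and-multiply (hypothetical helper).

    Returns (result, squarings, multiplications) for base**exponent % modulus,
    assuming exponent >= 1 and modulus >= 2.
    """
    result = 1
    squarings = 0
    multiplications = 0
    # Scan the exponent bits from most significant to least significant.
    for bit in bin(exponent)[2:]:
        # One modular squaring per exponent bit.
        result = (result * result) % modulus
        squarings += 1
        if bit == "1":
            # One extra modular multiplication per set bit.
            result = (result * base) % modulus
            multiplications += 1
    return result, squarings, multiplications


if __name__ == "__main__":
    # Small toy modulus; real RSA moduli are 2048 bits or more.
    print(modexp_count(5, 65537, 1000003))
```

For a k-bit exponent this performs k modular squarings (the first one squares 1 and is trivial) and one modular multiplication per set bit, and every one of those steps is a full-width multiprecision multiply followed by a reduction.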

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Implementation of condition code on OoO CPUs

<u0p3lt$qpcm$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=31559&group=comp.arch#31559

 by: Terje Mathisen - Fri, 7 Apr 2023 12:52 UTC

I had forgotten about how MULX skips all the flag modifications, that
does make it much easier to employ, at least for a compiler.

My own usage has typically been 4-wide, i.e. implementing 128-bit
operations with 32-bit regs. I did that both during FDIV/FPATAN2
verification and for DFC, the CNRS AES candidate.

As I've noted here previously, an IMACC operation doing (sumhi, sumlo) =
a * b + c + d is pretty much perfect, except for needing far too many
register specifiers.

The operation cannot overflow, making it even superior to IMAC with a
single addend, and as Mitch has taught us, any extra addends feeding
into the last stage of the MUL are almost free.
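
The no-overflow claim can be checked with a little arithmetic: for N-bit unsigned inputs the worst case is (2^N - 1)^2 + 2*(2^N - 1) = 2^(2N) - 1, which is exactly the largest 2N-bit value. A minimal sketch, assuming unsigned operands; `imacc` here is just a hypothetical Python model of the proposed operation, not an existing instruction or library call:

```python
def imacc(a, b, c, d, bits=64):
    """Model of (sumhi, sumlo) = a * b + c + d on unsigned `bits`-wide inputs."""
    mask = (1 << bits) - 1
    assert all(0 <= v <= mask for v in (a, b, c, d)), "inputs must fit in `bits`"
    total = a * b + c + d
    # The double-width result always fits: the maximum is 2**(2*bits) - 1.
    assert total < (1 << (2 * bits))
    return total >> bits, total & mask


if __name__ == "__main__":
    m = (1 << 64) - 1
    # Worst case: all four inputs at their maximum value.
    hi, lo = imacc(m, m, m, m)
    print(hex(hi), hex(lo))
```

With all four inputs at 2^64 - 1, both result halves come out as 2^64 - 1, so there is no room left for a third addend but also no carry out.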

Terje

Anton Ertl wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>> x86 have always had 2-in/2-out as the standard MUL opcode, the only
>> problem being that most of the (register) operands were implied: MULX
>> provided a way to encode more register names.
>
> Given that register-register moves are almost for free in Intel's
> performance cores, that's only a minor benefit. The major benefit of
> MULX for the intended application is that it does not destroy the
> carry and overflow flags.
>
>> For bigint applications I've usually found that the latency of the MUL
>> provides enough time to do all the required ADD/ADC ops,
>
> What kind of bigint applications do you have in mind? For
> multiprecision multiplication, all the component multiplications are
> independent of each other, and every addition is dependent on a
> multiplication; e.g., in the simplest case RSTU = AB * CD (where A, B,
> C, D, R, S, T, U are 64-bit values):
>
> BD=B*D || AD=A*D || BC=B*C || AC=A*C
> U=lo(BD) || Tc0,T0=hi(BD)+lo(AD) || Sc0,S0=hi(AD)+hi(BC)
> Tc,T=T0+lo(BC) || Sc1,S1=S0+lo(AC)+Tc0 || R0=hi(AC)+Sc0
> Sc,S=S1+Tc || R1=R0+Sc1
> R=R1+Sc
>
> Each line contains operations that can be performed in parallel.
>
> - anton
>
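
The quoted five-step schedule for RSTU = AB * CD can be emulated directly to confirm that the carries line up. This is a sketch only: Python integers stand in for 64-bit registers, and the helper names (`lo`, `hi`, `add`, `mul_2x2`) are invented for the illustration.

```python
MASK = (1 << 64) - 1


def lo(x):
    """Low 64 bits of a 128-bit product."""
    return x & MASK


def hi(x):
    """High 64 bits of a 128-bit product."""
    return x >> 64


def add(*terms):
    """Add 64-bit terms; return (carry_out, low 64 bits of the sum)."""
    total = sum(terms)
    return total >> 64, total & MASK


def mul_2x2(a, b, c, d):
    """Return (R, S, T, U) with RSTU = AB * CD, following the quoted schedule."""
    # Step 1: four independent 64x64->128 multiplications.
    bd, ad, bc, ac = b * d, a * d, b * c, a * c
    # Step 2: three independent operations.
    u = lo(bd)
    tc0, t0 = add(hi(bd), lo(ad))
    sc0, s0 = add(hi(ad), hi(bc))
    # Step 3.
    tc, t = add(t0, lo(bc))
    sc1, s1 = add(s0, lo(ac), tc0)
    r0 = hi(ac) + sc0
    # Step 4.
    sc, s = add(s1, tc)
    r1 = r0 + sc1
    # Step 5.
    r = r1 + sc
    assert r <= MASK  # the full product always fits in 256 bits
    return r, s, t, u


if __name__ == "__main__":
    a, b, c, d = MASK, 1, 2, MASK
    r, s, t, u = mul_2x2(a, b, c, d)
    full = ((a << 64) | b) * ((c << 64) | d)
    print(((r << 192) | (s << 128) | (t << 64) | u) == full)
```

Each of the carries Tc0, Tc, Sc0, Sc1 and Sc is at most 1, which is what lets the additions be chained through carry flags.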

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Implementation of condition code on OoO CPUs

<u0p54b$qvos$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=31560&group=comp.arch#31560

 by: Terje Mathisen - Fri, 7 Apr 2023 13:16 UTC

Michael S wrote:
> On Friday, April 7, 2023 at 11:02:55 AM UTC+3, luke.l...@gmail.com wrote:
>> On Friday, April 7, 2023 at 8:39:29 AM UTC+1, Anton Ertl wrote:
>>> Terje Mathisen <terje.m...@tmsw.no> writes:
>>>> x86 have always had 2-in/2-out as the standard MUL opcode, the only
>>>> problem being that most of the (register) operands were implied: MULX
>>>> provided a way to encode more register names.
>>> Given that register-register moves are almost for free in Intel's
>>> performance cores, that's only a minor benefit. The major benefit of
>>> MULX for the intended application is that it does not destroy the
>>> carry and overflow flags.
>> ahh yes that was it, i remember that in the whitepaper
>> oink, 404 not found
>> https://www.google.com/search?q=ia-large-integer-arithmetic-paper.pdf
>>
>> ah HA!
>> https://web.archive.org/web/20150131061304/https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-large-integer-arithmetic-paper.pdf
>>
>> it wasn't just the addition of MULX, it was the addition of ADCX/ADOX
>> that did the trick.
>>
>> the diagrams in that paper are cute, make it really clear what's
>> going on.
>>
>> l.
>
> They forgot to make timing comparisons for implementation of multiplication
> of particular size with and without MULX/ADCX/ADOX.
> I'd like to see the numbers not just for 512x512, which does not strike me as
> something used commonly, but also for smaller sizes, down to 128x128 which
> looks to me as most frequently used case of extended precision multiplication.
>
Intermediate sizes (512-2048 bits) are important for SSL/TLS setup, right?

I do agree that 128-bit operations are probably more common in
application code, but you can get reasonable performance for those with
just a regular N*N->2N mul and some add with carry intrinsics.
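
As a rough illustration of the "regular N*N->2N mul plus add with carry" approach, here is a sketch of row-by-row schoolbook multiplication over 64-bit limbs. Python integers stand in for the widening multiply and the carry chain; `mul_limbs` and `from_limbs` are made-up names, and real code would use compiler intrinsics or assembly instead.

```python
def mul_limbs(x, y, bits=64):
    """Schoolbook product of two little-endian limb lists.

    Each inner step is one N*N->2N multiply plus two N-bit addends,
    which always fits in 2N bits.
    """
    mask = (1 << bits) - 1
    out = [0] * (len(x) + len(y))
    for i, xi in enumerate(x):
        carry = 0
        for j, yj in enumerate(y):
            wide = xi * yj + out[i + j] + carry
            out[i + j] = wide & mask   # low half stays in place
            carry = wide >> bits       # high half ripples to the next limb
        out[i + len(y)] = carry        # this limb has not been written yet
    return out


def from_limbs(limbs, bits=64):
    """Reassemble a little-endian limb list into one Python integer."""
    return sum(limb << (bits * k) for k, limb in enumerate(limbs))


if __name__ == "__main__":
    m = (1 << 64) - 1
    # 128x128 -> 256 bits with two 64-bit limbs per operand.
    product = mul_limbs([m, m], [m, m])
    print(from_limbs(product) == ((1 << 128) - 1) ** 2)
```

For 128x128 this is four multiplies and a short carry chain per row, so a single carry flag is enough at this size.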

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Implementation of condition code on OoO CPUs

<6d98a54e-a6b1-4f97-a61f-f87b23b547b2n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=31562&group=comp.arch#31562

 by: Michael S - Fri, 7 Apr 2023 13:28 UTC

On Friday, April 7, 2023 at 3:37:00 PM UTC+3, Anton Ertl wrote:
> Michael S <already...@yahoo.com> writes:
> >They forgot to make timing comparisons for implementation of multiplication
> >of particular size with and without MULX/ADCX/ADOX.
> >I'd like to see the numbers not just for 512x512, which does not strike me as
> >something used commonly, but also for smaller sizes, down to 128x128 which
> >looks to me as most frequently used case of extended precision multiplication.
>
> What makes you think that 128x128 is most-frequently used? And do you
> have any numbers on usage frequency that I can cite?
>

Maybe it's just me.
As you likely know, I have a soft spot for extended-precision floating point,
esp. for variants with 128-bit and 112-bit mantissas, because they
are much faster than the rest and are wide enough to test the numeric stability
of algorithms implemented in IEEE binary64, which, at least for me, happens
to be the main use of extended-precision floating point.

> My thinking is that the biggest use of multiprecision arithmetic is
> in cryptography, where RSA works with 1024-bit (deemed too small by
> many these days), 2048-bit, and 4096-bit keys, and taking the p-th
> power of those involves a number of squarings and multiplications (and
> Intel has a paper on squaring in addition to one on multiplication).
>
> However,
>
> 1) RSA is only used for exchanging a symmetric key, and then the rest
> of the connection uses a symmetric key; nevertheless, key exchange is
> relevant for the startup overhead of a connection.
>
> 2) Many users seem to replace RSA by elliptic curve cryptography
> (e.g., Ed25519); I don't know how important multi-precision
> multiplication is for that.

I did ECC a few years ago. In my case it was an implementation of ECDSA
signature checks on a resource-limited 32-bit embedded platform.
Unfortunately, by now I have forgotten most of the details.
From memory, on the critical path there was a 256x256 multiplication with
only the low 256 bits of the result in use. But I may be wrong about that.
Maybe it was 160 bits rather than 256? Or 320?
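
A multiplication that keeps only the low half of the result can skip every partial product that lands entirely above the cut. A minimal sketch, assuming 32-bit limbs and equal-length operands; the sizes and the names `mul_low` and `from_limbs` are illustrative and not taken from the implementation recalled above.

```python
def mul_low(x, y, bits=32):
    """Low len(x) limbs of x * y for equal-length little-endian limb lists."""
    n = len(x)
    assert len(y) == n
    mask = (1 << bits) - 1
    out = [0] * n
    for i in range(n):
        carry = 0
        # Only products x[i] * y[j] with i + j < n reach the kept limbs.
        for j in range(n - i):
            wide = x[i] * y[j] + out[i + j] + carry
            out[i + j] = wide & mask
            carry = wide >> bits
        # The carry out of limb n - 1 belongs to the discarded high half.
    return out


def from_limbs(limbs, bits=32):
    """Reassemble a little-endian limb list into one Python integer."""
    return sum(limb << (bits * k) for k, limb in enumerate(limbs))


if __name__ == "__main__":
    # 256-bit operands as eight 32-bit limbs each.
    x = [0xFFFFFFFF] * 8
    y = [0x89ABCDEF, 0x01234567] * 4
    low = from_limbs(mul_low(x, y))
    print(low == (from_limbs(x) * from_limbs(y)) % (1 << 256))
```

With eight limbs per operand this needs 36 limb multiplications instead of the 64 of a full 256x256 product.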

> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>
