Compare instruction with multiple results

<trl5kb$6isc$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=30713&group=comp.arch#30713

by: Thomas Koenig - Sat, 4 Feb 2023 08:39 UTC

Mitch's My66000 has a compare instruction which produces a multitude
of result bits in a target register. This can then later be used
for a branch or predicate on bit, or shifted to get an integer
value.

Just wondering: Are there other architectures which use a
similar approach? ARM uses condition codes, MIPS/Alpha use
comparisons which have zero or one put into a general register,
POWER has several fields in its condition register, 68000 has a
flags register, ...
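
For readers unfamiliar with the idea, a minimal C sketch of a compare that deposits many relation bits into one ordinary register value follows. The bit positions and the names `cmp_all`, `cmp_bit` and `CMP_*_BIT` are assumptions made for illustration only; the thread does not give the actual My66000 encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen for this sketch only. */
enum cmp_bit_pos {
    CMP_EQ_BIT, CMP_NE_BIT,
    CMP_LT_BIT, CMP_LE_BIT, CMP_GT_BIT, CMP_GE_BIT,   /* signed   */
    CMP_LO_BIT, CMP_LS_BIT, CMP_HI_BIT, CMP_HS_BIT    /* unsigned */
};

/* One "compare instruction": every relation is decoded at compare time. */
uint64_t cmp_all(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t r = 0;
    r |= (uint64_t)(a == b)   << CMP_EQ_BIT;
    r |= (uint64_t)(a != b)   << CMP_NE_BIT;
    r |= (uint64_t)(a <  b)   << CMP_LT_BIT;
    r |= (uint64_t)(a <= b)   << CMP_LE_BIT;
    r |= (uint64_t)(a >  b)   << CMP_GT_BIT;
    r |= (uint64_t)(a >= b)   << CMP_GE_BIT;
    r |= (uint64_t)(ua <  ub) << CMP_LO_BIT;
    r |= (uint64_t)(ua <= ub) << CMP_LS_BIT;
    r |= (uint64_t)(ua >  ub) << CMP_HI_BIT;
    r |= (uint64_t)(ua >= ub) << CMP_HS_BIT;
    return r;
}

/* Shift and mask to turn one result bit into an integer 0 or 1. */
int cmp_bit(uint64_t r, unsigned pos)
{
    return (int)((r >> pos) & 1u);
}

void cmp_all_demo(void)
{
    uint64_t r = cmp_all(-1, 1);      /* a single compare ...              */
    assert(cmp_bit(r, CMP_LT_BIT));   /* ... answers the signed question   */
    assert(cmp_bit(r, CMP_HI_BIT));   /* ... and the unsigned one as well  */
}
```

A branch-on-bit would test `r` directly; `cmp_bit` corresponds to the "shifted to get an integer value" use.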

Re: Compare instruction with multiple results

<0537d531-45e1-4d35-ae06-69754bab1629n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30714&group=comp.arch#30714

by: robf...@gmail.com - Sat, 4 Feb 2023 10:55 UTC

On Saturday, February 4, 2023 at 3:39:43 AM UTC-5, Thomas Koenig wrote:
> Mitch's My66000 has a compare instruction which produces a multitude
> of result bits in a target register. This can then later be used
> for a branch or predicate on bit, or shifted to get an integer
> value.
>
> Just wondering: Are there other architectures which use a
> similar approach? ARM uses condition codes, MIPS/Alpha use
> comparisons which have zero or one put into a general register,
> POWER has several fields in its condition register, 68000 has a
> flags register, ...

I cannot think of any others. The older archs seem to use c,v,n,z flags. Decode
takes place during branch instead of compare.

The My66000 approach is a good way of doing a lot of comparisons using only
a single instruction. It is somewhat like using a GPR as a flags register. The c,v,n,z
flags could be present, but it makes more sense just to decode them at the same
time into single bit values as that is what logic ultimately operates on. The
decode is during the compare instead of during the branch. The equivalent of
SETcc is available by doing a bit extract of the compare result. I am using the
same approach for compares in the Thor2023 architecture. Thor2023 has branch
on register zero / non-zero in addition to branch on bit set.
Specifying only a single register in the branch instruction allows a nice large branch
displacement.
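
As a rough illustration of "decode during the branch", the C sketch below models a classic flags design: the compare only records N, Z, C and V, and each branch condition has to be decoded from them later. The struct and function names are hypothetical, and the carry convention (C set when there is no borrow) follows ARM; x86 uses the opposite sense.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct nzcv { bool n, z, c, v; };

/* Compare step: compute a - b and keep only the four flags. */
struct nzcv flags_from_cmp(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t diff = ua - ub;
    struct nzcv f;
    f.n = (diff >> 31) & 1u;                        /* result negative     */
    f.z = (diff == 0);                              /* result zero         */
    f.c = (ua >= ub);                               /* no borrow (ARM)     */
    f.v = (((ua ^ ub) & (ua ^ diff)) >> 31) & 1u;   /* signed overflow     */
    return f;
}

/* Branch step: each condition is decoded from the flags. */
bool cond_eq(struct nzcv f) { return f.z; }
bool cond_lt(struct nzcv f) { return f.n != f.v; }            /* signed <   */
bool cond_le(struct nzcv f) { return f.z || f.n != f.v; }     /* signed <=  */
bool cond_lo(struct nzcv f) { return !f.c; }                  /* unsigned < */
bool cond_hi(struct nzcv f) { return f.c && !f.z; }           /* unsigned > */

void flags_demo(void)
{
    struct nzcv f = flags_from_cmp(INT32_MIN, 1);
    assert(f.v);          /* the subtraction overflows ...               */
    assert(cond_lt(f));   /* ... yet N != V still yields INT32_MIN < 1   */
}
```

Moving this decoding into the compare itself gives the one-bit-per-relation register described above.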

Re: Compare instruction with multiple results

<2023Feb4.172830@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=30721&group=comp.arch#30721

by: Anton Ertl - Sat, 4 Feb 2023 16:28 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
>Mitch's My66000 has a compare instruction which produces a multitude
>of result bits in a target register. This can then later be used
>for a branch or predicate on bit, or shifted to get an integer
>value.
>
>Just wondering: Are there other architectures which use a
>similar approach?

The 88000 architecture (by Mitch Alsup) also uses this approach.

>POWER has several fields in its condition register

Power is similar in having separate bits for <,=,> (and a fourth bit,
don't remember what that does) for each comparison result rather than
having the more common NZCV collection. But because it does not have
the full complement of bits, it needs several compare instructions.
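
A small C model of one 4-bit condition register field may help here. It assumes the fourth bit is a copy of the summary-overflow (SO) bit for integer compares, which is an assumption added for this sketch and is not stated in the thread; the function names are hypothetical. Because the field holds a single <, >, = triple, the signed and unsigned relations need two separate compares.

```c
#include <assert.h>
#include <stdint.h>

/* One CR field as a 4-bit value, most significant bit first: LT GT EQ SO. */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_SO = 1 };

/* Model of a signed compare writing one CR field. */
unsigned cr_from_signed(int32_t a, int32_t b, unsigned so)
{
    unsigned cr = (a < b) ? CR_LT : (a > b) ? CR_GT : CR_EQ;
    return cr | (so ? CR_SO : 0u);
}

/* Model of an unsigned ("logical") compare writing one CR field. */
unsigned cr_from_unsigned(uint32_t a, uint32_t b, unsigned so)
{
    unsigned cr = (a < b) ? CR_LT : (a > b) ? CR_GT : CR_EQ;
    return cr | (so ? CR_SO : 0u);
}

void cr_demo(void)
{
    /* Same operand bits, two compares, two different answers. */
    assert(cr_from_signed(-1, 1, 0) == CR_LT);
    assert(cr_from_unsigned((uint32_t)-1, 1u, 0) == CR_GT);
}
```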

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Compare instruction with multiple results

<afa4cd14-9003-40fe-a9c0-943821437af0n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30723&group=comp.arch#30723

by: MitchAlsup - Sat, 4 Feb 2023 17:58 UTC

On Saturday, February 4, 2023 at 2:39:43 AM UTC-6, Thomas Koenig wrote:
> Mitch's My66000 has a compare instruction which produces a multitude
> of result bits in a target register. This can then later be used
> for a branch or predicate on bit, or shifted to get an integer
> value.
>
> Just wondering: Are there other architectures which use a
> similar approach? ARM uses condition codes, MIPS/Alpha use
> comparisons which have zero or one put into a general register,
> POWER has several fields in its condition register, 68000 has a
> flags register, ...
<
My Mc 88K architecture had something very similar.

Re: Compare instruction with multiple results

<f665001d-104f-40c8-9dd9-80404def2929n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30724&group=comp.arch#30724

by: MitchAlsup - Sat, 4 Feb 2023 18:00 UTC

On Saturday, February 4, 2023 at 4:55:23 AM UTC-6, robf...@gmail.com wrote:
> On Saturday, February 4, 2023 at 3:39:43 AM UTC-5, Thomas Koenig wrote:
> > Mitch's My66000 has a compare instruction which produces a multitude
> > of result bits in a target register. This can then later be used
> > for a branch or predicate on bit, or shifted to get an integer
> > value.
> >
> > Just wondering: Are there other architectures which use a
> > similar approach? ARM uses condition codes, MIPS/Alpha use
> > comparisons which have zero or one put into a general register,
> > POWER has several fields in its condition register, 68000 has a
> > flags register, ...
> I cannot think of any others. The older archs seem to use c,v,n,z flags. Decode
> takes place during branch instead of compare.
>
> The My66000 approach is a good way of doing a lot of comparisons using only
> a single instruction. It is somewhat like using a GPR as a flags register. The c,v,n,z
> flags could be present, but it makes more sense just to decode them at the same
> time into single bit values as that is what logic ultimately operates on. The
> decode is during the compare instead of during the branch. The equivalent of
> SETcc is available by doing a bit extract of the compare result. I am using the
> same approach for compares in the Thor2023 architecture. Thor2023 has branch
> on register zero / non-zero in addition to branch on bit set.
> Specifying only a single register in the branch instruction allows a nice large branch
> displacement.
<
In addition, Brian's compiler is capable of seeing a comparison at one
place in the subroutine, and using it multiple times -- just like any other
common sub-expression.

Re: Compare instruction with multiple results

<trnnts$8962$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=30731&group=comp.arch#30731

by: Thomas Koenig - Sun, 5 Feb 2023 08:04 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:
> On Saturday, February 4, 2023 at 2:39:43 AM UTC-6, Thomas Koenig wrote:
>> Mitch's My66000 has a compare instruction which produces a multitude
>> of result bits in a target register. This can then later be used
>> for a branch or predicate on bit, or shifted to get an integer
>> value.
>>
>> Just wondering: Are there other architectures which use a
>> similar approach? ARM uses condition codes, MIPS/Alpha use
>> comparisons which have zero or one put into a general register,
>> POWER has several fields in its condition register, 68000 has a
>> flags register, ...
><
> My Mc 88K architecture had something very similar.

Yes indeed. I should have thought to look there :-)

Re: Compare instruction with multiple results

<trnqft$8aje$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=30732&group=comp.arch#30732

by: Thomas Koenig - Sun, 5 Feb 2023 08:47 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:
> On Saturday, February 4, 2023 at 4:55:23 AM UTC-6, robf...@gmail.com wrote:
>> On Saturday, February 4, 2023 at 3:39:43 AM UTC-5, Thomas Koenig wrote:
>> > Mitch's My66000 has a compare instruction which produces a multitude
>> > of result bits in a target register. This can then later be used
>> > for a branch or predicate on bit, or shifted to get an integer
>> > value.
>> >
>> > Just wondering: Are there other architectures which use a
>> > similar approach? ARM uses condition codes, MIPS/Alpha use
>> > comparisons which have zero or one put into a general register,
>> > POWER has several fields in its condition register, 68000 has a
>> > flags register, ...
>> I cannot think of any others. The older archs seem to use c,v,n,z flags. Decode
>> takes place during branch instead of compare.
>>
>> The My66000 approach is a good way of doing a lot of comparisons using only
>> a single instruction. It is somewhat like using a GPR as a flags register. The c,v,n,z
>> flags could be present, but it makes more sense just to decode them at the same
>> time into single bit values as that is what logic ultimately operates on. The
>> decode is during the compare instead of during the branch. The equivalent of
>> SETcc is available by doing a bit extract of the compare result. I am using the
>> same approach for compares in the Thor2023 architecture. Thor2023 has branch
>> on register zero / non-zero in addition to branch on bit set.
>> Specifying only a single register in the branch instruction allows a nice large branch
>> displacement.
><
> In addition, Brian's compiler is capable of seeing a comparison at one
> place in the subroutine, and using it multiple times -- just like any other
> common sub-expression.

Sounds like the reasonable thing to do.

Regarding M88000: Looking at early releases (gcc 3.3, no less)
for the m88k machine description, I see that that port did indeed
use CCMode in general registers, so I would expect the same behavior
for that port.

What gcc does (and, I presume, LLVM as well) is to model condition
codes as a class of registers. This can be a hard-coded single
register (like on x86), several specialized registers like POWER,
or general registers like MIPS or m88k (or my66000, if anybody
should attempt a gcc port).

On architectures with more than one condition code register,
these can then be treated with the same algorithms for register
allocation, common subexpression elimination etc. as general
registers.
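
To illustrate what treating a compare result like any other common subexpression buys, here is a hedged C sketch. The names `compare`, `three_way` and the `R_*` bits are hypothetical, and the counter exists only to show that a single compare serves several later decisions.

```c
#include <assert.h>

enum { R_LT = 1, R_EQ = 2, R_GT = 4 };

int compares_executed;   /* counts how many compares actually ran */

/* Stand-in for a compare instruction that writes several result bits. */
unsigned compare(long a, long b)
{
    compares_executed++;
    return (a < b ? R_LT : 0u) | (a == b ? R_EQ : 0u) | (a > b ? R_GT : 0u);
}

/* Two decisions, one compare: r is reused like any other value. */
int three_way(long a, long b)
{
    unsigned r = compare(a, b);
    if (r & R_LT) return -1;
    if (r & R_EQ) return 0;
    return 1;
}

void three_way_demo(void)
{
    compares_executed = 0;
    assert(three_way(1, 2) == -1);
    assert(compares_executed == 1);   /* both tests shared one compare */
}
```

With a single architectural flags register, the compiler has to keep the compare adjacent to its uses or recompute it once anything else clobbers the flags.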

Re: Compare instruction with multiple results

<acd4ee5b-bf0e-480e-9e9f-f0f2098a0ebfn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30733&group=comp.arch#30733

by: Agner Fog - Sun, 5 Feb 2023 11:20 UTC

Why is nobody mentioning the x86 flags register? A compare instruction, such as cmp eax, ebx, is setting multiple flag bits in the flags register. Different conditional jump instructions can test different flag bits for <, <=, >, >=, ==, !=, signed and unsigned, carry, overflow, and parity. The same flags register is also set by many ALU instructions, such as addition and shift.

State-of-the-art implementations by both Intel and AMD are fusing the compare instruction and the subsequent branch instruction in the decoder into a single compare-and-branch micro-op. This expensive complexity in the decoder can be avoided by making combined alu-and-branch instructions in the first place (like I do in ForwardCom).

Re: Compare instruction with multiple results

<d07d064e-b03d-4374-a35a-6afa94d7d0a1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30739&group=comp.arch#30739

by: MitchAlsup - Sun, 5 Feb 2023 20:01 UTC

On Sunday, February 5, 2023 at 5:20:42 AM UTC-6, Agner Fog wrote:
> Why is nobody mentioning the x86 flags register? A compare instruction, such as cmp eax, ebx, is setting multiple flag bits in the flags register. Different conditional jump instructions can test different flag bits for <, <=, >, >=, ==, !=, signed and unsigned, carry, overflow, and parity. The same flags register is also set by many ALU instructions, such as addition and shift.
>
> State-of-the-art implementations by both Intel and AMD are fusing the compare instruction and the subsequent branch instruction in the decoder into a single compare-and-branch micro-op.
<
What makes you think that CMP-Bcnd cannot be fused into one OP that rides
down the pipeline in a single beat ?? Branch does not deliver a result, and at
the 95% level, the branch consumes the result of the CMP. So we have 2 inst
that consume 2 normal operands and deliver 1 result combined with a possible
change of control point.
<
>This expensive complexity in the decoder can be avoided by making combined alu-and-branch instructions in the first place (like I do in ForwardCom).
<
MIPS and RISC-V take that strategy. However when I survey the CMP-Bcnd-s
in EMBench and CoreMark emitted by the LLVM compiler for both My 66000
and for RISC-V, I see no useful advantage.
<
I should note (for those not paying attention):: that My 66000 has a complete
set of branches with respect to zero (including floating point.) So, all comparisons
against zero are essentially equivalent; Forward.Com versus My 66000.
<
For the remaining 40% of branches an awful lot of them are comparisons
to/with non-zero constants, of which MIPS and RISC-V do not have direct
instruction support. Maybe Forward.Com has constants with its alu-bcnd
that the other 2 do not.
<
When I started reading the code out of the LLVM RISC-V compiler and started
to compare with My 66000 compiler assembly, I honestly thought that the
Alu-Bcnd would be a place My 66000 ISA got beat up. However, after looking
at 200,000 lines of asm out of each compiler, I do not see any advantage--sure
there are cases where the MIPS strategy works (and well!) but there are other
cases where access to constants was at least as valuable, and all in all, it turns
out to be a wash.
<
I would be happy to plow through that ASM again to extract illustrative
examples where this side won and that side lost, and vice versa.

Re: Compare instruction with multiple results

<e11f94d8-5534-41c4-ad9d-76eb5e8bb75fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30742&group=comp.arch#30742

by: robf...@gmail.com - Sun, 5 Feb 2023 23:55 UTC

On Sunday, February 5, 2023 at 3:01:17 PM UTC-5, MitchAlsup wrote:
> On Sunday, February 5, 2023 at 5:20:42 AM UTC-6, Agner Fog wrote:
> > Why is nobody mentioning the x86 flags register? A compare instruction, such as cmp eax, ebx, is setting multiple flag bits in the flags register. Different conditional jump instructions can test different flag bits for <, <=, >, >=, ==, !=, signed and unsigned, carry, overflow, and parity. The same flags register is also set by many ALU instructions, such as addition and shift.
> >
> > State-of-the-art implementations by both Intel and AMD are fusing the compare instruction and the subsequent branch instruction in the decoder into a single compare-and-branch micro-op.
> <
> What makes you think that CMP-Bcnd cannot be fused into one OP that rides
> down the pipeline in a single beat ?? Branch does not deliver a result, and at
> the 95% level, the branch consumes the result of the CMP. So we have 2 inst
> that consume 2 normal operands and deliver 1 result combined with a possible
> change of control point.
> <
> >This expensive complexity in the decoder can be avoided by making combined alu-and-branch instructions in the first place (like I do in ForwardCom).
> <
> MIPS and RISC-V take that strategy. However when I survey the CMP-Bcnd-s
> in EMBench and CoreMark emitted by the LLVM compiler for both My 66000
> and for RISC-V, I see no useful advantage.
> <
> I should note (for those not paying attention):: that My 66000 has a complete
> set of branches with respect to zero (including floating point.) So, all comparisons
> against zero are essentially equivalent; Forward.Com versus My 66000.
> <
I assume this means EQ, NE, LT, LE, GT, GE compares against zero? I only
included EQ, NE in my ISA, but there is still time to change this.

> For the remaining 40% of branches an awful lot of them are comparisons
> to/with non-zero constants, of which MIPS and RISC-V do not have direct
> instruction support. Maybe Forward.Com has constants with its alu-bcnd
> that the other 2 do not.
> <
> When I started reading the code out of the LLVM RISC-V compiler and started
> to compare with My 66000 compiler assembly, I honestly thought that the
> Alu-Bcnd would be a place My 66000 ISA got beat up. However, after looking
> at 200,000 lines of asm out of each compiler, I do not see any advantage--sure
> there are cases where the MIPS strategy works (and well!) but there are other
> cases where access to constants was at least as valuable, and all in all, it turns
> out to be a wash.
> <
> I would be happy to plow through that ASM again to extract illustrative
> examples where this side won and that side lost, and vice versa.

I found many compare-to-immediate-then-branch sequences used in switch() processing. The
compiler saved some code by testing for one-hot switches and then using BBS and BBC
instructions to test for constants. The compiler can generate one-hot
case constants from enums, which are then used in switch statements, which
in turn can branch using BBS and BBC without needing a compare. I wonder whether the
My66000 compiler can do something similar.
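
A hedged C sketch of the one-hot idea: when every case constant has exactly one bit set, each case test is a single-bit test, which is what a branch-on-bit-set instruction does, so no compare against an immediate is needed. The enum values and function names below are made up for illustration and are not taken from any of the compilers discussed.

```c
#include <assert.h>

/* Hypothetical one-hot case constants: exactly one bit set in each. */
enum state {
    ST_IDLE = 1 << 0,
    ST_RUN  = 1 << 1,
    ST_WAIT = 1 << 2,
    ST_DONE = 1 << 3
};

/* True when x has exactly one bit set. */
int is_one_hot(unsigned x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Dispatch by testing single bits instead of comparing against constants.
 * Returns a handler index, or -1 for the default case. */
int dispatch(unsigned s)
{
    if (s & ST_IDLE) return 0;   /* branch if bit 0 set */
    if (s & ST_RUN)  return 1;   /* branch if bit 1 set */
    if (s & ST_WAIT) return 2;   /* branch if bit 2 set */
    if (s & ST_DONE) return 3;   /* branch if bit 3 set */
    return -1;
}

void dispatch_demo(void)
{
    assert(is_one_hot(ST_WAIT));
    assert(dispatch(ST_WAIT) == 2);
}
```

The bit tests only match a real switch when the value is known to be one-hot; for arbitrary inputs a compiler would have to check that first, for example with the `is_one_hot` test shown.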

Re: Compare instruction with multiple results

<f73ab1c6-eb9b-4ffe-b393-a3e134102669n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30743&group=comp.arch#30743

by: MitchAlsup - Mon, 6 Feb 2023 00:23 UTC

On Sunday, February 5, 2023 at 5:55:31 PM UTC-6, robf...@gmail.com wrote:
> On Sunday, February 5, 2023 at 3:01:17 PM UTC-5, MitchAlsup wrote:
> > On Sunday, February 5, 2023 at 5:20:42 AM UTC-6, Agner Fog wrote:
> > > Why is nobody mentioning the x86 flags register? A compare instruction, such as cmp eax, ebx, is setting multiple flag bits in the flags register. Different conditional jump instructions can test different flag bits for <, <=, >, >=, ==, !=, signed and unsigned, carry, overflow, and parity. The same flags register is also set by many ALU instructions, such as addition and shift.
> > >
> > > State-of-the-art implementations by both Intel and AMD are fusing the compare instruction and the subsequent branch instruction in the decoder into a single compare-and-branch micro-op.
> > <
> > What makes you think that CMP-Bcnd cannot be fused into one OP that rides
> > down the pipeline in a single beat ?? Branch does not deliver a result, and at
> > the 95% level, the branch consumes the result of the CMP. So we have 2 inst
> > that consume 2 normal operands and deliver 1 result combined with a possible
> > change of control point.
> > <
> > >This expensive complexity in the decoder can be avoided by making combined alu-and-branch instructions in the first place (like I do in ForwardCom).
> > <
> > MIPS and RISC-V take that strategy. However when I survey the CMP-Bcnd-s
> > in EMBench and CoreMark emitted by the LLVM compiler for both My 66000
> > and for RISC-V, I see no useful advantage.
> > <
> > I should note (for those not paying attention):: that My 66000 has a complete
> > set of branches with respect to zero (including floating point.) So, all comparisons
> > against zero are essentially equivalent; Forward.Com versus My 66000.
> > <
> I assume this means EQ, NE, LT, LE, GT, GE compares against zero? I only
> included EQ, NE in my ISA, but there is still time to change this.
<
You get {float, double, signed, unsigned}×{==, !=, <, <=, >, >=} against zero.
<
> > For the remaining 40% of branches an awful lot of them are comparisons
> > to/with non-zero constants, of which MIPS and RISC-V do not have direct
> > instruction support. Maybe Forward.Com has constants with its alu-bcnd
> > that the other 2 do not.
> > <
> > When I started reading the code out of the LLVM RISC-V compiler and started
> > to compare with My 66000 compiler assembly; I honestly thought that the
> > Alu-Bcnd would be a place My 66000 ISA got beat up. However, after looking
> > at 200,000 lines of asm out of each compiler, I do not see any advantage--sure
> > there are cases where the MIPS strategy works (and well!) but there are other
> > cases where access to constants was at least as valuable, and all in all, it turns
> > out to be a wash.
> > <
> > I would be happy to plow through that ASM again to extract illustrative
> > examples where this side won and that side lost, and vice versa.
<
> I found many compare-to-immediate then branch used in switch() processing. The
> compiler saved some code by testing for one-hot switches then using BBS and BBC
> instructions to test for constants. The compiler has a capability to generate one-hot
> case constants using enums. Which are then used for switch statements. Which
> then can branch using BBS and BBC without needing a compare. Wondering if
> My66000 compiler can do something similar.
<
My 66000 has a Jump-Through-Table (JTT) instruction for switch, as a 1-word instruction.
The instruction examines the table index, and if it is out of bounds, control is
transferred to default; otherwise an element in the table is accessed and used
as an offset relative to the JTT instruction, as a PIC control transfer. 1 instruction, 4 cycles,
and the table is accessed via the instruction cache.
<
Most switch offset tables are comprised of 16-bit elements (although many
could be 8-bit entries). There is another form of JTT that is appropriate for
method calling.
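
The dispatch rule described above can be modelled in a few lines. This is only a sketch of the semantics as stated in the post (bounds check, then a table entry used as an offset from the JTT instruction); the function name, the use of unscaled byte offsets, and the example addresses are assumptions for illustration, not My 66000's actual encoding.

```python
def jump_through_table(jtt_address, index, table, default_target):
    """Return the next PC for a hypothetical JTT located at jtt_address.

    table holds signed offsets relative to the JTT instruction itself,
    which is what makes the transfer position independent.
    Assumption: offsets are plain byte offsets with no scaling.
    """
    # An out-of-bounds index transfers control to the default label.
    if index < 0 or index >= len(table):
        return default_target
    # Otherwise the selected entry is added to the JTT's own address.
    return jtt_address + table[index]


# Example: a three-way switch whose JTT sits at 0x1000.
offsets = [8, 16, 24]
next_pc = jump_through_table(0x1000, 2, offsets, default_target=0x2000)
```

Because only offsets are stored, the same table works wherever the code is loaded, and narrow (8-bit or 16-bit) entries suffice when the case bodies are close to the JTT.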

Re: Compare instruction with multiple results

<fd5f2af1-1abe-4236-9bfc-0d271bee05fcn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30744&group=comp.arch#30744

X-Received: by 2002:a05:620a:165b:b0:717:a5d4:de3f with SMTP id c27-20020a05620a165b00b00717a5d4de3fmr1171180qko.157.1675649690086;
Sun, 05 Feb 2023 18:14:50 -0800 (PST)
X-Received: by 2002:a05:6870:6082:b0:169:f122:97d9 with SMTP id
t2-20020a056870608200b00169f12297d9mr1326465oae.42.1675649689800; Sun, 05 Feb
2023 18:14:49 -0800 (PST)
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 5 Feb 2023 18:14:49 -0800 (PST)
In-Reply-To: <f73ab1c6-eb9b-4ffe-b393-a3e134102669n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=99.251.79.92; posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 99.251.79.92
References: <trl5kb$6isc$1@newsreader4.netcologne.de> <0537d531-45e1-4d35-ae06-69754bab1629n@googlegroups.com>
<f665001d-104f-40c8-9dd9-80404def2929n@googlegroups.com> <trnqft$8aje$1@newsreader4.netcologne.de>
<acd4ee5b-bf0e-480e-9e9f-f0f2098a0ebfn@googlegroups.com> <d07d064e-b03d-4374-a35a-6afa94d7d0a1n@googlegroups.com>
<e11f94d8-5534-41c4-ad9d-76eb5e8bb75fn@googlegroups.com> <f73ab1c6-eb9b-4ffe-b393-a3e134102669n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <fd5f2af1-1abe-4236-9bfc-0d271bee05fcn@googlegroups.com>
Subject: Re: Compare instruction with multiple results
From: robfi...@gmail.com (robf...@gmail.com)
Injection-Date: Mon, 06 Feb 2023 02:14:50 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

by: robf...@gmail.com - Mon, 6 Feb 2023 02:14 UTC

On Sunday, February 5, 2023 at 7:23:42 PM UTC-5, MitchAlsup wrote:
> On Sunday, February 5, 2023 at 5:55:31 PM UTC-6, robf...@gmail.com wrote:
> > On Sunday, February 5, 2023 at 3:01:17 PM UTC-5, MitchAlsup wrote:
> > > On Sunday, February 5, 2023 at 5:20:42 AM UTC-6, Agner Fog wrote:
> > > > Why is nobody mentioning the x86 flags register? A compare instruction, such as cmp eax, ebx, is setting multiple flag bits in the flags register. Different conditional jump instructions can test different flag bits for <, <=, >, >=, ==, !=, signed and unsigned, carry, overflow, and parity. The same flags register is also set by many ALU instructions, such as addition and shift.
> > > >
> > > > State-of-the-art implementations by both Intel and AMD are fusing the compare instruction and the subsequent branch instruction in the decoder into a single compare-and-branch micro-op.
> > > <
> > > What makes you think that CMP-Bcnd cannot be fused into one OP that rides
> > > down the pipeline in a single beat ?? Branch does not deliver a result, and at
> > > the 95% level, the branch consumes the result of the CMP. So we have 2 inst
> > > that consume 2 normal operands and deliver 1 result combined with a possible
> > > change of control point.
> > > <
> > > >This expensive complexity in the decoder can be avoided by making combined alu-and-branch instructions in the first place (like I do in ForwardCom).
> > > <
> > > MIPS and RISC-V take that strategy. However when I survey the CMP-Bcnd-s
> > > in EMBench and CoreMark emitted by the LLVM compiler for both My 66000
> > > and for RISC-V, I see no useful advantage.
> > > <
> > > I should note (for those not paying attention):: that My 66000 has a complete
> > > set of branches with respect to zero (including floating point.) So, all comparisons
> > > against zero are essentially equivalent; Forward.Com versus My 66000.
> > > <
> > I assume this means EQ, NE, LT, LE, GT, GE compares against zero? I only
> > included EQ, NE in my ISA, but there is still time to change this.
> <
> You get {float, double, signed, unsigned}×{==, !=, <, <=, >, >=} against zero.
> <
> > > For the remaining 40% of branches an awful lot of them are comparisons
> > > to/with non-zero constants, of which MIPS and RISC-V do not have direct
> > > instruction support. Maybe Forward.Com has constants with its alu-bcnd
> > > that the other 2 do not.
> > > <
> > > When I started reading the code out of the LLVM RISC-V compiler and started
> > > to compare with My 66000 compiler assembly; I honestly thought that the
> > > Alu-Bcnd would be a place My 66000 ISA got beat up. However, after looking
> > > at 200,000 lines of asm out of each compiler, I do not see any advantage--sure
> > > there are cases where the MIPS strategy works (and well!) but there are other
> > > cases where access to constants was at least as valuable, and all in all, it turns
> > > out to be a wash.
> > > <
> > > I would be happy to plow through that ASM again to extract illustrative
> > > examples where this side won and that side lost, and vice versa.
> <
> > I found many compare-to-immediate then branch used in switch() processing. The
> > compiler saved some code by testing for one-hot switches then using BBS and BBC
> > instructions to test for constants. The compiler has a capability to generate one-hot
> > case constants using enums. Which are then used for switch statements. Which
> > then can branch using BBS and BBC without needing a compare. Wondering if
> > My66000 compiler can do something similar.
> <
> My 66000 has a Jump-Through-Table instruction (switch) as 1 word instruction.
> The instruction examines the table index, and if it is out of bounds, control is
> transferred to default: otherwise an element in the table is accessed and used
> as an offset to the JTT instruction as a PIC control transfer. 1 instruction, 4 cycles,
> and the table is accessed via the instruction cache.
> <
> Most switch offset tables are comprised of 16-bit elements (although many
> could be 8-bit entries). There is another form of JTT that is appropriate for
> method calling.

I had a similar memory indirect jump, JMPI, in a previous architecture (DSD9, I think), but it
just replaced the low order 16 or 32 bits of the program counter with a value from a table
rather than using PIC. PIC code could still be done with this restriction provided things
were kept aligned on 64k boundaries. With an MMU and paged memory system,
programs can be loaded at the same fixed address or at least on page boundaries. This
removes some of the need for PIC code.
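
As a rough illustration of the low-order replacement described above, the following sketch keeps the upper bits of the program counter and substitutes the low 16 (or 32) bits from a table entry. The function name and the example values are hypothetical; only the replace-the-low-bits behaviour comes from the description.

```python
def jmpi_target(pc, table, index, width=16):
    """Replace the low `width` bits of pc with a table entry.

    The upper bits of pc are kept, so with width=16 the target always
    stays inside the current 64k-aligned region.
    """
    mask = (1 << width) - 1
    return (pc & ~mask) | (table[index] & mask)


# With 16-bit entries the jump cannot leave the 64k block holding pc.
target = jmpi_target(0x12345678, [0xABCD], 0)
```

This is why code kept within 64k-aligned blocks can still be relocated: the table never encodes the upper address bits.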

DSD9 architecture had a full complement of integer branches on register compared to an
eight-bit immediate value. This was great for handling sparse switch() statements. This
could be squeezed into branches because a 40-bit instruction word was used.

DSD9 also used postfix words for large constants, something I have kept for the more
recent Thor2023. Because instructions are a fixed size, postfix decoding always occurs at
the same bits in the aligned cache line. I think it is almost as efficient as having bits to
decode the constant size in the first instruction word. It conserves bits in the first
instruction word though since bits are not needed for the constant size. I traded off the
constant size bits for other useful bits. Thor2023 supports up to 128-bit constants without
having to be loaded into a register. It follows the My66000 in that which operand is the
constant may be swapped for instructions that are non-commutative.

Re: Compare instruction with multiple results

<PU8EL.41452$5jd8.22079@fx05.iad>

https://www.novabbs.com/devel/article-flat.php?id=30746&group=comp.arch#30746

Path: i2pn2.org!rocksolid2!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx05.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Compare instruction with multiple results
References: <trl5kb$6isc$1@newsreader4.netcologne.de> <0537d531-45e1-4d35-ae06-69754bab1629n@googlegroups.com> <f665001d-104f-40c8-9dd9-80404def2929n@googlegroups.com> <trnqft$8aje$1@newsreader4.netcologne.de> <acd4ee5b-bf0e-480e-9e9f-f0f2098a0ebfn@googlegroups.com> <d07d064e-b03d-4374-a35a-6afa94d7d0a1n@googlegroups.com> <e11f94d8-5534-41c4-ad9d-76eb5e8bb75fn@googlegroups.com> <f73ab1c6-eb9b-4ffe-b393-a3e134102669n@googlegroups.com> <fd5f2af1-1abe-4236-9bfc-0d271bee05fcn@googlegroups.com>
In-Reply-To: <fd5f2af1-1abe-4236-9bfc-0d271bee05fcn@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 14
Message-ID: <PU8EL.41452$5jd8.22079@fx05.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 06 Feb 2023 15:20:15 UTC
Date: Mon, 06 Feb 2023 10:20:09 -0500
X-Received-Bytes: 1896

by: EricP - Mon, 6 Feb 2023 15:20 UTC

robf...@gmail.com wrote:
>
> DSD9 also used postfix words for large constants, something I have kept for the more
> recent Thor2023. Because instructions are a fixed size, postfix decoding always occurs at
> the same bits in the aligned cache line. I think it is almost as efficient as having bits to
> decode the constant size in the first instruction word. It conserves bits in the first
> instruction word though since bits are not needed for the constant size. I traded off the
> constant size bits for other useful bits. Thor2023 supports up to 128-bit constants without
> having to be loaded into a register. It follows the My66000 in that which operand is the
> constant may be swapped for instructions that are non-commutative.

How does this "postfix word" encoding work?

Re: Compare instruction with multiple results

<0e3abac9-881b-4f67-81cf-c4e25749e859n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30747&group=comp.arch#30747

X-Received: by 2002:a05:622a:4b:b0:3a8:179f:b1ba with SMTP id y11-20020a05622a004b00b003a8179fb1bamr2458259qtw.47.1675699369712;
Mon, 06 Feb 2023 08:02:49 -0800 (PST)
X-Received: by 2002:a05:6870:5b9d:b0:163:3ab5:b3f with SMTP id
em29-20020a0568705b9d00b001633ab50b3fmr19077oab.218.1675699369296; Mon, 06
Feb 2023 08:02:49 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 6 Feb 2023 08:02:49 -0800 (PST)
In-Reply-To: <PU8EL.41452$5jd8.22079@fx05.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=99.251.79.92; posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 99.251.79.92
References: <trl5kb$6isc$1@newsreader4.netcologne.de> <0537d531-45e1-4d35-ae06-69754bab1629n@googlegroups.com>
<f665001d-104f-40c8-9dd9-80404def2929n@googlegroups.com> <trnqft$8aje$1@newsreader4.netcologne.de>
<acd4ee5b-bf0e-480e-9e9f-f0f2098a0ebfn@googlegroups.com> <d07d064e-b03d-4374-a35a-6afa94d7d0a1n@googlegroups.com>
<e11f94d8-5534-41c4-ad9d-76eb5e8bb75fn@googlegroups.com> <f73ab1c6-eb9b-4ffe-b393-a3e134102669n@googlegroups.com>
<fd5f2af1-1abe-4236-9bfc-0d271bee05fcn@googlegroups.com> <PU8EL.41452$5jd8.22079@fx05.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <0e3abac9-881b-4f67-81cf-c4e25749e859n@googlegroups.com>
Subject: Re: Compare instruction with multiple results
From: robfi...@gmail.com (robf...@gmail.com)
Injection-Date: Mon, 06 Feb 2023 16:02:49 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 4467

by: robf...@gmail.com - Mon, 6 Feb 2023 16:02 UTC

On Monday, February 6, 2023 at 10:20:19 AM UTC-5, EricP wrote:
> robf...@gmail.com wrote:
> >
> > DSD9 also used postfix words for large constants, something I have kept for the more
> > recent Thor2023. Because instructions are a fixed size, postfix decoding always occurs at
> > the same bits in the aligned cache line. I think it is almost as efficient as having bits to
> > decode the constant size in the first instruction word. It conserves bits in the first
> > instruction word though since bits are not needed for the constant size. I traded off the
> > constant size bits for other useful bits. Thor2023 supports up to 128-bit constants without
> > having to be loaded into a register. It follows the My66000 in that which operand is the
> > constant may be swapped for instructions that are non-commutative.
> How does this "postfix word" encoding work?

A postfix works in a similar fashion to a prefix, except that it follows the instruction word rather
than preceding it. In ascending memory order the layout is:

| Instruction | Postfix1 | Postfix2 |

Note that the instruction is always decoded at the same position in the buffer, as are the postfix instructions.
A prefix would cause decoding to shift. The instruction buffer must be large enough to contain the
instruction and all postfixes. In terms of a classic pipeline, the pipeline regs must be wide enough
to accommodate the decoded constant. Because the instruction and postfixes can all be decoded
at the same time, the PC can increment to skip over the postfixes so they do not consume clock
cycles to fetch.

The postfix itself is an opcode combined with a numeric constant. In the Thor2023 case the format is
| 32-bit constant | 3-bit op | 5-bit opcode |. The 3-bit op determines which 32-bit piece of the full
constant is contained in the payload area. For integers, the first 32 bits would be represented by the
postfix | 32-bit constant | 0 | 31 |. The first constant sign extends to the machine width. The second
postfix constant overrides bits 32 to 63 while leaving the low order bits in place. Note that most
integer instructions may use a 16-bit immediate constant. When postfixes are present, they override
this constant, which is how a larger constant is supplied. Float postfixes are slightly more
complex, as they may be either 32-bit single precision values if only a single postfix is present or
64-bit double precision values. The values are converted to the FP width.

Thor2023 is slightly more complex in that it currently has a 96-bit width and can use three postfix
instructions. There is provision for up to four postfixes; the ISA is still in flux.

Instructions are 40 bits wide.

Should there be a branch to a postfix, the postfix is treated as a NOP instruction. The NOP opcode
and the postfix opcode are the same. They are distinguished by the op3 field.
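
The way an integer constant is assembled from the immediate and its postfixes can be sketched as follows. This is a simplified model, not the Thor2023 decoder: it assumes the op field is simply the index of the 32-bit piece (op 0 for bits 0 to 31, op 1 for bits 32 to 63, and so on), that the 16-bit immediate is sign extended when no postfix is present, and it defaults to a 64-bit width for readability even though the post mentions 96 bits. All names are hypothetical.

```python
def sign_extend(value, from_bits, to_bits):
    """Sign extend a from_bits-wide value to to_bits (result is unsigned)."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def resolve_constant(imm16, postfixes, width=64):
    """Combine a 16-bit immediate with (op, payload32) postfixes.

    Assumption: op is the index of the 32-bit piece being supplied.
    """
    if not postfixes:
        # No postfix: the instruction's own immediate is used.
        return sign_extend(imm16, 16, width)
    value = 0
    for op, payload in sorted(postfixes):
        if op == 0:
            # The first piece sign extends to the full machine width.
            value = sign_extend(payload, 32, width)
        else:
            # Later pieces override their 32-bit slice, leaving lower bits alone.
            shift = 32 * op
            mask = 0xFFFFFFFF << shift
            value = (value & ~mask) | ((payload & 0xFFFFFFFF) << shift)
    return value & ((1 << width) - 1)


# Two postfixes supply a full 64-bit constant.
constant = resolve_constant(0, [(0, 0x89ABCDEF), (1, 0x01234567)])
```

A single postfix whose payload has bit 31 set yields a negative constant after sign extension; adding the second postfix replaces the extended upper bits with explicit ones.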

Re: Compare instruction with multiple results

<1303f273-4d0c-4c5b-acbb-82a3fe44f469n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30748&group=comp.arch#30748

X-Received: by 2002:a0c:e093:0:b0:535:54ab:1c8a with SMTP id l19-20020a0ce093000000b0053554ab1c8amr24191qvk.75.1675716068169;
Mon, 06 Feb 2023 12:41:08 -0800 (PST)
X-Received: by 2002:a05:6808:8c1:b0:37a:c14e:afc2 with SMTP id
k1-20020a05680808c100b0037ac14eafc2mr80482oij.61.1675716067860; Mon, 06 Feb
2023 12:41:07 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 6 Feb 2023 12:41:07 -0800 (PST)
In-Reply-To: <trl5kb$6isc$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=108.36.229.95; posting-account=ujX_IwoAAACu0_cef9hMHeR8g0ZYDNHh
NNTP-Posting-Host: 108.36.229.95
References: <trl5kb$6isc$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1303f273-4d0c-4c5b-acbb-82a3fe44f469n@googlegroups.com>
Subject: Re: Compare instruction with multiple results
From: timcaff...@aol.com (Timothy McCaffrey)
Injection-Date: Mon, 06 Feb 2023 20:41:08 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 1948

by: Timothy McCaffrey - Mon, 6 Feb 2023 20:41 UTC

My favorite hack on the 8088:

MOV AL,<value to branch on , 0..3>
ROR AL, 1
AND AL,81h ; I can't remember if this is necessary to not...
JZ <value is zero>
JPE <value is three>
JS <value is one>
JMP <value is two>

- Tim

Re: Compare instruction with multiple results

<trsr7f$3fbl2$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=30749&group=comp.arch#30749

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Compare instruction with multiple results
Date: Tue, 7 Feb 2023 07:31:10 +0100
Organization: A noiseless patient Spider
Lines: 43
Message-ID: <trsr7f$3fbl2$1@dont-email.me>
References: <trl5kb$6isc$1@newsreader4.netcologne.de>
<1303f273-4d0c-4c5b-acbb-82a3fe44f469n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 7 Feb 2023 06:31:11 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="e68e9bef9f2a1aeda28920032ff79c19";
logging-data="3649186"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Y1A0HaBQOEbmuUX9kCRjbqgxGd/9r6lDtmgO+7uqI0g=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.15
Cancel-Lock: sha1:R551F8O5b2jp+d/hTdAihEnGpsk=
In-Reply-To: <1303f273-4d0c-4c5b-acbb-82a3fe44f469n@googlegroups.com>

by: Terje Mathisen - Tue, 7 Feb 2023 06:31 UTC

Timothy McCaffrey wrote:
> On Saturday, February 4, 2023 at 3:39:43 AM UTC-5, Thomas Koenig wrote:
>> Mitch's My66000 has a compare instruction which produces a multitude
>> of result bits in a target register. This can then later be used
>> for a branch or predicate on bit, or shifted to get an integer
>> value.
>>
>> Just wondering: Are there other architectures which use a
>> similar approach? ARM uses condition codes, MIPS/Alpha use
>> comparisons which have zero or one put into a general register,
>> POWER has several fields in its condition register, 68000 has a
>> flags register, ...
>
> My favorite hack on the 8088:
>
> MOV AL,<value to branch on , 0..3>
> ROR AL, 1
> AND AL,81h ; I can't remember if this is necessary to not...
> JZ <value is zero>
> JPE <value is three>
> JS <value is one>
> JMP <value is two>
>

Nice!

I have written code that did three-way branching quite a few times, but
never four; at that point I'd typically use jump tables instead.

The ROR should set all the flag bits afair, so the AND AL,81h can
probably be skipped. In fact, the AND will clear the Overflow Flag which
would have been set by ROR (along with the carry flag) if the starting
value was odd.

OTOH, by including the AND operation this works to branch on the bottom
two bits no matter what the rest contained, but I'd probably prefer an
initial AND AL,3 instead.
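
For anyone who wants to check the four-way dispatch without an 8088 at hand, here is a small model of the sequence. It simulates ROR AL,1 and AND AL,81h and then applies the JZ / JPE / JS / JMP chain to the flags produced by the AND. One caveat on the question of skipping the AND: as far as the Intel documentation goes, rotate instructions update only CF and OF and leave ZF, SF and PF untouched, so in this model the flags are taken from the AND result; treat that as a point to verify against the manual. The function names are made up for the sketch.

```python
def ror8(value):
    """Rotate an 8-bit value right by one bit."""
    value &= 0xFF
    return ((value >> 1) | (value << 7)) & 0xFF


def dispatch(al):
    """Model MOV/ROR/AND followed by the JZ, JPE, JS, JMP chain."""
    al = ror8(al)   # bit 0 moves to bit 7, bit 1 moves to bit 0
    al &= 0x81      # keep only those two bits; this sets ZF, SF, PF
    zf = al == 0
    sf = bool(al & 0x80)
    pf = bin(al).count("1") % 2 == 0  # PF is set on even parity
    if zf:
        return "zero"   # JZ
    if pf:
        return "three"  # JPE: both bits set, so parity is even
    if sf:
        return "one"    # JS: only bit 7 set
    return "two"        # JMP: only bit 0 set


labels = [dispatch(v) for v in range(4)]
```

Because of the AND, the model also dispatches on the bottom two bits of any byte, for example `dispatch(0xFE)` takes the same path as `dispatch(2)`.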

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Compare instruction with multiple results

<trsrpu$bj93$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=30750&group=comp.arch#30750

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-22da-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Compare instruction with multiple results
Date: Tue, 7 Feb 2023 06:41:02 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <trsrpu$bj93$1@newsreader4.netcologne.de>
References: <trl5kb$6isc$1@newsreader4.netcologne.de>
<0537d531-45e1-4d35-ae06-69754bab1629n@googlegroups.com>
<f665001d-104f-40c8-9dd9-80404def2929n@googlegroups.com>
<trnqft$8aje$1@newsreader4.netcologne.de>
<acd4ee5b-bf0e-480e-9e9f-f0f2098a0ebfn@googlegroups.com>
<d07d064e-b03d-4374-a35a-6afa94d7d0a1n@googlegroups.com>
Injection-Date: Tue, 7 Feb 2023 06:41:02 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-22da-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:22da:0:7285:c2ff:fe6c:992d";
logging-data="380195"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Tue, 7 Feb 2023 06:41 UTC

MitchAlsup <MitchAlsup@aol.com> wrote:
> On Sunday, February 5, 2023 at 5:20:42 AM UTC-6, Agner Fog wrote:
>> Why is nobody mentioning the x86 flags register? A compare instruction, such as cmp eax, ebx, is setting multiple flag bits in the flags register. Different conditional jump instructions can test different flag bits for <, <=, >, >=, ==, !=, signed and unsigned, carry, overflow, and parity. The same flags register is also set by many ALU instructions, such as addition and shift.
>>
>> State-of-the-art implementations by both Intel and AMD are fusing the compare instruction and the subsequent branch instruction in the decoder into a single compare-and-branch micro-op.
><
> What makes you think that CMP-Bcnd cannot be fused into one OP that rides
> down the pipeline in a single beat ?? Branch does not deliver a result, and at
> the 95% level, the branch consumes the result of the CMP. So we have 2 inst
> that consume 2 normal operands and deliver 1 result combined with a possible
> change of control point.
><
>>This expensive complexity in the decoder can be avoided by making combined alu-and-branch instructions in the first place (like I do in ForwardCom).
><
> MIPS and RISC-V take that strategy. However when I survey the CMP-Bcnd-s
> in EMBench and CoreMark emitted by the LLVM compiler for both My 66000
> and for RISC-V, I see no useful advantage.

That is mostly a question of encoding length, I believe (see below).

> I should note (for those not paying attention):: that My 66000 has a complete
> set of branches with respect to zero (including floating point.) So, all comparisons
> against zero are essentially equivalent; Forward.Com versus My 66000.
><
> For the remaining 40% of branches an awful lot of them are comparisons
> to/with non-zero constants, of which MIPS and RISC-V do not have direct
> instruction support. Maybe Forward.Com has constants with its alu-bcnd
> that the other 2 do not.
><
> When I started reading the code out of the LLVM RISC-V compiler and started
> to compare with My 66000 compiler assembly; I honestly thought that the
> Alu-Bcnd would be a place My 66000 ISA got beat up. However, after looking
> at 200,000 lines of asm out of each compiler, I do not see any advantage--sure
> there are cases where the MIPS strategy works (and well!) but there are other
> cases where access to constants was at least as valuable, and all in all, it turns
> out to be a wash.

If an ISA offers sufficient instruction length (or sacrifices enough
entropy) to offer branches featuring a full set of comparisons with
a reasonable offset _and_ would offer constants to compare against
as well, that could lead to an advantage in instruction count.

So, to take this to the extreme, something like

bfgt r1,#3.14159265358979323846, label

(branch if "floating point greater than") would save an instruction,
a register and an instruction size over

fcmp r1,r2,#3.14159265358979323846
bgt r2, label

at the cost of having to encode this in the ISA.

Branches being as frequent as they are, this might be a win iff
the bits to encode it are there.
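
To make the two-instruction form concrete, here is a toy model of a compare that writes several result bits into a register, followed by a branch that tests one of them. The bit positions and names below are invented for the sketch and are not My 66000's actual layout; only the idea (one compare, many result bits, branch on bit) comes from the thread.

```python
import math

# Hypothetical bit positions inside the compare result register.
EQ, NE, GT, GE, LT, LE = range(6)


def fcmp(a, b):
    """Return a bit vector holding every relation that is true for (a, b)."""
    results = {
        EQ: a == b, NE: a != b,
        GT: a > b, GE: a >= b,
        LT: a < b, LE: a <= b,
    }
    return sum(1 << bit for bit, holds in results.items() if holds)


def branch_on_bit(flags, bit):
    """Model a branch that is taken when the selected bit is set."""
    return bool((flags >> bit) & 1)


# fcmp r2 = compare(r1, pi); bgt r2, label
r1 = 4.0
r2 = fcmp(r1, math.pi)
taken = branch_on_bit(r2, GT)
```

The fused `bfgt` form would compute the same predicate without naming `r2`; the trade-off discussed here is whether the encoding has room for the constant, the condition, and the branch offset in one instruction.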

Re: Compare instruction with multiple results

<37171614-c9d1-4e8b-b3c3-7a61c156efa5n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30751&group=comp.arch#30751

X-Received: by 2002:ac8:7d4c:0:b0:3a8:12e6:67e7 with SMTP id h12-20020ac87d4c000000b003a812e667e7mr567643qtb.55.1675791707044;
Tue, 07 Feb 2023 09:41:47 -0800 (PST)
X-Received: by 2002:aca:bb43:0:b0:37b:e87:d03f with SMTP id
l64-20020acabb43000000b0037b0e87d03fmr1037370oif.79.1675791706748; Tue, 07
Feb 2023 09:41:46 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 7 Feb 2023 09:41:46 -0800 (PST)
In-Reply-To: <trsrpu$bj93$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:492d:c2bf:3f28:f3e7;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:492d:c2bf:3f28:f3e7
References: <trl5kb$6isc$1@newsreader4.netcologne.de> <0537d531-45e1-4d35-ae06-69754bab1629n@googlegroups.com>
<f665001d-104f-40c8-9dd9-80404def2929n@googlegroups.com> <trnqft$8aje$1@newsreader4.netcologne.de>
<acd4ee5b-bf0e-480e-9e9f-f0f2098a0ebfn@googlegroups.com> <d07d064e-b03d-4374-a35a-6afa94d7d0a1n@googlegroups.com>
<trsrpu$bj93$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <37171614-c9d1-4e8b-b3c3-7a61c156efa5n@googlegroups.com>
Subject: Re: Compare instruction with multiple results
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 07 Feb 2023 17:41:47 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 5364

by: MitchAlsup - Tue, 7 Feb 2023 17:41 UTC

On Tuesday, February 7, 2023 at 12:41:05 AM UTC-6, Thomas Koenig wrote:
> MitchAlsup <Mitch...@aol.com> wrote:
> > On Sunday, February 5, 2023 at 5:20:42 AM UTC-6, Agner Fog wrote:
> >> Why is nobody mentioning the x86 flags register? A compare instruction, such as cmp eax, ebx, is setting multiple flag bits in the flags register. Different conditional jump instructions can test different flag bits for <, <=, >, >=, ==, !=, signed and unsigned, carry, overflow, and parity. The same flags register is also set by many ALU instructions, such as addition and shift.
> >>
> >> State-of-the-art implementations by both Intel and AMD are fusing the compare instruction and the subsequent branch instruction in the decoder into a single compare-and-branch micro-op.
> ><
> > What makes you think that CMP-Bcnd cannot be fused into one OP that rides
> > down the pipeline in a single beat ?? Branch does not deliver a result, and at
> > the 95% level, the branch consumes the result of the CMP. So we have 2 inst
> > that consume 2 normal operands and deliver 1 result combined with a possible
> > change of control point.
> ><
> >>This expensive complexity in the decoder can be avoided by making combined alu-and-branch instructions in the first place (like I do in ForwardCom).
> ><
> > MIPS and RISC-V take that strategy. However when I survey the CMP-Bcnd-s
> > in EMBench and CoreMark emitted by the LLVM compiler for both My 66000
> > and for RISC-V, I see no useful advantage.
> That is mostly a question of encoding length, I believe (see below).
> > I should note (for those not paying attention):: that My 66000 has a complete
> > set of branches with respect to zero (including floating point.) So, all comparisons
> > against zero are essentially equivalent; Forward.Com versus My 66000.
> ><
> > For the remaining 40% of branches an awful lot of them are comparisons
> > to/with non-zero constants, of which MIPS and RISC-V do not have direct
> > instruction support. Maybe Forward.Com has constants with its alu-bcnd
> > that the other 2 do not.
> ><
> > When I started reading the code out of the LLVM RISC-V compiler and started
> > to compare with My 66000 compiler assembly; I honestly thought that the
> > Alu-Bcnd would be a place My 66000 ISA got beat up. However, after looking
> > at 200,000 lines of asm out of each compiler, I do not see any advantage--sure
> > there are cases where the MIPS strategy works (and well!) but there are other
> > cases where access to constants was at least as valuable, and all in all, it turns
> > out to be a wash.
> If an ISA offers sufficient instruction length (or sacrifices enough
> entropy) to offer branches featuring a full set of comparisons with
> a reasonable offset _and_ would offer constants to compare against
> as well, that could lead to an advantage in instruction count.
>
> So, to take this to the extreme, something like
>
> bfgt r1,#3.14159265358979323846, label
>
> (branch if "floating point greater than") would save an instruction,
> a register and an instruction size over
>
> fcmp r1,r2,#3.14159265358979323846
> bgt r2, label
>
> at the cost of having to encode this in the ISA.
<
It turns out that the CMP followed by the branch on bit can
be CoIssued, since together they produce 1 result and do not consume
more than 3 register operands. So, making this 1 long inst
ends up no different than 2 insts that can be CoIssued, and
both cases occupy the same space in the inst stream.
>
> Branches being as frequent as they are, this might be a win iff
> the bits to encode it are there.

Re: Compare instruction with multiple results

<3364dcb2-8a6a-4223-9524-534dc179357bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30778&group=comp.arch#30778

X-Received: by 2002:a0c:b3d3:0:b0:539:55cd:603a with SMTP id b19-20020a0cb3d3000000b0053955cd603amr1375584qvf.3.1676148493316;
Sat, 11 Feb 2023 12:48:13 -0800 (PST)
X-Received: by 2002:a05:6808:bd6:b0:37a:c065:520b with SMTP id
o22-20020a0568080bd600b0037ac065520bmr2232475oik.298.1676148493048; Sat, 11
Feb 2023 12:48:13 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 11 Feb 2023 12:48:12 -0800 (PST)
In-Reply-To: <acd4ee5b-bf0e-480e-9e9f-f0f2098a0ebfn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:bce7:7e67:b345:d79;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:bce7:7e67:b345:d79
References: <trl5kb$6isc$1@newsreader4.netcologne.de> <0537d531-45e1-4d35-ae06-69754bab1629n@googlegroups.com>
<f665001d-104f-40c8-9dd9-80404def2929n@googlegroups.com> <trnqft$8aje$1@newsreader4.netcologne.de>
<acd4ee5b-bf0e-480e-9e9f-f0f2098a0ebfn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3364dcb2-8a6a-4223-9524-534dc179357bn@googlegroups.com>
Subject: Re: Compare instruction with multiple results
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sat, 11 Feb 2023 20:48:13 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 4539

by: MitchAlsup - Sat, 11 Feb 2023 20:48 UTC

On Sunday, February 5, 2023 at 5:20:42 AM UTC-6, Agner Fog wrote:
> Why is nobody mentioning the x86 flags register?
<
Possibly because it represents something to be avoided rather than something
to be followed. The SMD side treats these 5-bits as 3 different registers {C, O, and ZAPS}
and then has reservation station entries for each one. That is right, more resources are
used to track these 5-bits than are used to track both registers in your typical 2-operand
instruction.
<
> A compare instruction, such as cmp eax, ebx, is setting multiple flag bits in the flags register. Different conditional jump instructions can test different flag bits for <, <=, >, >=, ==, !=, signed and unsigned, carry, overflow, and parity. The same flags register is also set by many ALU instructions, such as addition and shift.
<
SPARC taught us that condition codes are much less harmful to the implementation
when the conditions are only set ON-DEMAND (bit in the instructions) so that the
condition persists over "book keeping" instructions (loop indexes and pointer adjustments).
x86 does not have this, and does a lot of other nasties:: such as shift instructions set C
but do not modify C if the shift count was 0.
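
The shift-by-zero case can be modelled in a few lines of C. This is only a sketch of the architectural behaviour of a 32-bit SHL (count masked to 5 bits, flags left alone when the masked count is 0); the type ShlResult and the function shl32 are illustrative names, not anything from the thread.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t value;  /* shifted result */
    int      cf;     /* carry flag after the instruction */
} ShlResult;

/* Model of a 32-bit x86 SHL: the new carry depends on the OLD carry
   whenever the (masked) count turns out to be zero. */
static ShlResult shl32(uint32_t value, unsigned count, int cf_in)
{
    assert(cf_in == 0 || cf_in == 1);
    count &= 31;                        /* count is masked to 5 bits */
    if (count == 0)
        return (ShlResult){ value, cf_in };          /* flags unchanged */
    int cf = (int)((value >> (32 - count)) & 1u);    /* last bit shifted out */
    return (ShlResult){ value << count, cf };
}
```

Because the outgoing carry is a function of the incoming carry, the shift has to be treated as a reader of the flags as well as a writer, which is the kind of false dependency complained about above.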
>
> State-of-the-art implementations by both Intel and AMD are fusing the compare instruction and the subsequent branch instruction in the decoder into a single compare-and-branch micro-op.
<
Note: one can only fuse these if they are coincident in the instruction stream.
So code scheduling is contraindicated.
<
>This expensive complexity in the decoder can be avoided by making combined alu-and-branch instructions in the first place (like I do in ForwardCom).
<
Which takes just as many instructions to execute in the case of constants.
<
if( i == 7 )
<
CMP Rt,Ri,#7
BNE Rt, else-clause
< versus
MOV Rt,#7
BNE Ri,Rt,else-clause
<
leaving fewer bits in the branch displacement--which leads to other complications.....
In effect you gave more bits to the immediate which you don't need, and you ended up
taking those bits from the Bcnd where you need it for displacements.
<
And the floating point side is "really sad" in RISC-V
<
if( d >= 7.0D0 )
<
FCMP Rt,Rd,#7
BLT Rt,else-clause
< versus
LDD Ft,[lit-pool[7.0D]]
SLT Rt,Fd,Ft
BNE R0,Rt,else-clause
<
So, here, we use an F register for the constant and an R register for the
comparison value, and then we have to arithmetically compare against
0 (r0==0), and we still lost branch displacement bits (at least 5).
<
I should also note that the #7 is an (int) that gets promoted to (double),
since the FCMP is a double "operator". So, this costs 2 words of storage
(both in execute-only memory), whereas the RISC-V version
uses 3 words in execute memory and 2 words of literal-pool memory,
with the added latency of the LDD: 2.5× bigger.
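
The 2.5× figure follows directly from the word counts given above. A small C sketch of the arithmetic, assuming 32-bit (4-byte) words; WORD_BYTES and footprint_bytes are illustrative names:

```c
#include <assert.h>

enum { WORD_BYTES = 4 };  /* assumption: 32-bit instruction words */

/* Total bytes occupied by a sequence: instruction words plus any
   literal-pool words it needs. */
static int footprint_bytes(int code_words, int literal_words)
{
    assert(code_words >= 0 && literal_words >= 0);
    return (code_words + literal_words) * WORD_BYTES;
}

/* FCMP + BLT:      footprint_bytes(2, 0) ->  8 bytes
   LDD + SLT + BNE: footprint_bytes(3, 2) -> 20 bytes, i.e. 20/8 = 2.5x */
```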
<
Agner:: please make ForwardCom better than RISC-V here.

Re: Compare instruction with multiple results

<tscvj7$23uds$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=30788&group=comp.arch#30788

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Compare instruction with multiple results
Date: Mon, 13 Feb 2023 10:23:51 +0100
Organization: A noiseless patient Spider
Lines: 92
Message-ID: <tscvj7$23uds$1@dont-email.me>
References: <trl5kb$6isc$1@newsreader4.netcologne.de>
<0537d531-45e1-4d35-ae06-69754bab1629n@googlegroups.com>
<f665001d-104f-40c8-9dd9-80404def2929n@googlegroups.com>
<trnqft$8aje$1@newsreader4.netcologne.de>
<acd4ee5b-bf0e-480e-9e9f-f0f2098a0ebfn@googlegroups.com>
<3364dcb2-8a6a-4223-9524-534dc179357bn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 13 Feb 2023 09:23:51 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="69d97d05f582fbaf00ea291d7cd733b7";
logging-data="2226620"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+VEauEhMODmuDLCKEtFGjDdqGOAVZpLzWmAITEB9RIzA=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.15
Cancel-Lock: sha1:TCnPLiZuPQpVquec/DoRj4Bv9Y0=
In-Reply-To: <3364dcb2-8a6a-4223-9524-534dc179357bn@googlegroups.com>

by: Terje Mathisen - Mon, 13 Feb 2023 09:23 UTC

MitchAlsup wrote:
> On Sunday, February 5, 2023 at 5:20:42 AM UTC-6, Agner Fog wrote:
>> Why is nobody mentioning the x86 flags register?
> <
> Possibly because it represents something to be avoided rather than something
> to be followed. The SMD side treats these 5-bits as 3 different registers {C, O, and ZAPS}
> and then has reservation station entries for each one. That is right, more resources are
> used to track these 5-bits than are used to track both registers in your typical 2-operand
> instruction.
> <
>> A compare instruction, such as cmp eax, ebx, is setting multiple flag bits in the flags register. Different conditional jump instructions can test different flag bits for <, <=, >, >=, ==, !=, signed and unsigned, carry, overflow, and parity. The same flags register is also set by many ALU instructions, such as addition and shift.
> <
> SPARC taught us that condition codes are much less harmful to the implementation
> when the conditions are only set ON-DEMAND (bit in the instructions) so that the
> condition persists over "book keeping" instructions (loop indexes and pointer adjustments).
> x86 does not have this, and does a lot of other nasties:: such as shift instructions set C
> but do not modify C if the shift count was 0.

I agree that there are lots of performance warts on x86, but in the
particular instance of "book keeping" instructions, Intel very carefully
made that possible for some useful situations:

I.e. the core of my favorite TCP/IP copy + checksum loop looks like this,
after manually inverting it and combining input/output pointers:

next32:
mov [esi+edi],edx ; Save previous dword
mov edx,[esi] ; Load next

adc eax,edx ; Wrapping checksum via carry
lea esi,[esi+4] ; Update src pointer without touching the flags

dec ecx ; DEC intentionally does not touch CARRY
jnz next32

The loop above achieved perfect pairing on a Pentium, so it could run
those 6 instructions in 3 clock cycles as long as both source and
destination were in $L1.
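
For readers who do not think in x86 flags, here is a rough C model of what the ADC-based loop computes: a copy plus a 32-bit sum in which every carry out of the top bit is fed back into the next addition (end-around carry). The function name copy_and_sum32 and the explicit carry variable are illustrative; the final fold is an assumption about what code following the loop would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies n dwords from src to dst and returns the end-around-carry sum,
   starting from the running value `sum`. */
static uint32_t copy_and_sum32(uint32_t *dst, const uint32_t *src,
                               size_t n, uint32_t sum)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    unsigned carry = 0;                 /* plays the role of the CARRY flag */
    for (size_t i = 0; i < n; i++) {
        uint32_t w = src[i];            /* mov edx,[esi] */
        dst[i] = w;                     /* mov [esi+edi],edx */
        uint64_t t = (uint64_t)sum + w + carry;   /* adc eax,edx */
        sum = (uint32_t)t;
        carry = (unsigned)(t >> 32);
    }
    /* Fold the last carry back in; a second wrap can only add 1 to 0. */
    uint64_t t = (uint64_t)sum + carry;
    return (uint32_t)t + (uint32_t)(t >> 32);
}
```

In the assembly the carry variable costs nothing because LEA and DEC leave CARRY alone; in portable C it has to be carried explicitly.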

PS. It should probably be noted that real code, as in Linux, avoids all
this by simply using a larger (64-bit) accumulator and unrolling the
core: (From memory, I only looked at this once 15-20 years ago)

uint64_t acc = 0;
while (len_dwords >= 4) {
  acc += dst32[0] = src32[0];
  acc += dst32[1] = src32[1];
  acc += dst32[2] = src32[2];
  acc += dst32[3] = src32[3];
  src32 += 4; dst32 += 4;
  len_dwords -= 4;
}

Back in the 32-bit days, compilers already knew that they could
zero-extend a 32-bit variable to a 64-bit pair and add to an accumulator
pair by simply adding the low dword and ADC a (fixed) zero word to the
high half of the accumulator, so the code above could compile into

next4:
mov ebx,[esi]
mov [edi],ebx
add eax,ebx
adc edx,0
mov ebx,[esi+4]
mov [edi+4],ebx
add eax,ebx
adc edx,0
mov ebx,[esi+8]
mov [edi+8],ebx
add eax,ebx
adc edx,0
mov ebx,[esi+12]
mov [edi+12],ebx
add eax,ebx
adc edx,0
lea esi,[esi+16]
lea edi,[edi+16]
sub ecx,4
jae next4

This version could run the unrolled part in just 2 clock cycles, so
obviously better!
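
The add/adc pairs in the listing above are just a 64-bit accumulate done in two 32-bit halves. A small C sketch, with Acc64, acc_add32 and acc_value as illustrative names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t lo;  /* eax in the listing */
    uint32_t hi;  /* edx in the listing */
} Acc64;

/* Adds a zero-extended 32-bit word to the register-pair accumulator. */
static void acc_add32(Acc64 *acc, uint32_t w)
{
    assert(acc != NULL);
    uint32_t lo = acc->lo + w;   /* add eax,ebx (wraps modulo 2^32) */
    acc->hi += (lo < w);         /* adc edx,0: add the carry out of the low half */
    acc->lo = lo;
}

/* Reassembles the pair into the 64-bit value it represents. */
static uint64_t acc_value(const Acc64 *acc)
{
    assert(acc != NULL);
    return ((uint64_t)acc->hi << 32) | acc->lo;
}
```

Since each carry is absorbed immediately into the high half, no flag has to survive across the pointer updates or the loop counter.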

These days, if you still want to do this by hand in the CPU (to verify
that the network card TCP offload engine does not have hardware bugs),
then you would probably SIMD it, keeping the accumulators in a
carry-save pair of vector regs. :-)

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
