Re: Drastic Simplification of Concertina II Coming

Drastic Simplification of Concertina II Coming

<0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28015&group=comp.arch#28015

Newsgroups: comp.arch

by: Quadibloc - Wed, 12 Oct 2022 07:44 UTC

The various previous iterations of my Concertina II architecture
have all obviously been far too complex to have any chance of
being taken seriously as an alternative to RISC-V and so on.

An idea finally came my way as to how I could simplify it drastically
enough so as to have some chance of consideration. However, the
simplification comes at a cost which is probably high enough to still
leave it without much chance.

Essentially:

All 32-bit instructions start with 1.
All 16-bit instructions start with 0.
A program is a stream of aligned 32-bit instructions. A 32-bit instruction
slot may contain one 32-bit instruction, or a pair of 16-bit instructions, or
....

a 16-bit instruction, and a second half that starts with 1.

32-bit instructions that start with 111 are register-to-register operate
instructions.

32-bit instructions that start with 101111 are not decoded as 32-bit
instructions. These stub instructions serve as the second half,
or more precisely, the remaining two-thirds, of a 48-bit instruction...

that starts with the 16 bits that start with 1 which follow a single 16-bit
instruction in the preceding 32-bit instruction slot.

It's possible to follow such a 16-bit half starting with 1 with two stub
instructions in a row; this would allow for an 80-bit instruction with
two memory references. (To be aligned, the 16-bit displacement
in an address constant has to be in the last 16 bits of a 32-bit slot,
so as not to collide with the bits at the start which indicate the type
of its contents.)

This allows the processor to fetch, and decode, every 32 bits in the
instruction stream independently of every other, except for the visibly
marked stub instructions.
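
The slot rules above can be sketched as a small classifier. This is only an illustration, not part of the proposal: it assumes that "starts with" refers to the most significant bits of a slot, and the names `classify_slot` and `instruction_bits` and the returned labels are made up here.

```python
def classify_slot(word: int) -> str:
    """Classify one aligned 32-bit slot from its leading bits alone.

    Assumption: "starts with" means the most significant bits.
    """
    if not 0 <= word < (1 << 32):
        raise ValueError("slot must be a 32-bit value")
    if (word >> 31) == 0:
        # First half is a 16-bit instruction; look at the first bit of the second half.
        if ((word >> 15) & 1) == 0:
            return "two 16-bit instructions"
        return "16-bit instruction + second half"
    if (word >> 26) == 0b101111:
        # Not decoded on its own: continues an instruction begun in the previous slot.
        return "stub"
    if (word >> 29) == 0b111:
        return "register-to-register operate"
    return "other 32-bit instruction"


def instruction_bits(stubs: int) -> int:
    """Length of an instruction made of a 16-bit second half plus 0, 1 or 2 stubs."""
    if stubs not in (0, 1, 2):
        raise ValueError("the post describes at most two stubs in a row")
    return 16 + 32 * stubs


for slot in (0x00000000, 0x00008000, 0xBC000000, 0xE0000000, 0x80000000):
    print(f"{slot:08X}: {classify_slot(slot)}")
```

Every branch looks only at the slot itself, which is the property claimed: apart from the visibly marked stubs, no slot needs its neighbours to be classified.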

A 16-bit second half starting with 1 can also be followed by zero
stub instructions, and have functions such as applying predication
to the next few instructions. But predication would seem to break
being able to fetch, decode, and start executing every instruction
independently!

Still, there is a way to manage this. Although having block headers and
different types of blocks is a complication to avoid, one can _still_
divide the instruction stream into blocks of eight 32-bit instruction slots.

A second half that does predication can be required to only appear in
the last 32-bit instruction slot of the block preceding the block
containing the instructions it affects.

And _that_ means that one can also have a second half that says
'do not decode these 32-bit slots, regardless of their content', which
allows me to include immediates longer than 16 bits in the instruction
stream!

So despite the drastic simplification, I can basically offer the same
features that the overly complicated architectures did.

But the price?

Memory-reference instructions are offered in two types:

Those which can be unaligned, and indexed, but which can have only
8 of the 32 registers as their destination registers, and

those which can have any of the 32 registers as their destination
registers, but which can only refer to aligned data in memory, and which
cannot be indexed.

But, *now that I've worked out how to use the last 16 bits of a 32-bit
instruction slot, in the last slot of a block as a header*, I can do *one*
overly complicated thing... allow them to indicate 'these 32-bit
instruction slots contain full memory-reference instructions, so decode
them differently'.

Of course, though, since now the indication lies _outside_ the block,
one can't branch into a block with any immediates, with any full
memory-reference instructions, or with predication.
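
A rough sketch of the block model described above, under loud assumptions: the post does not give a header format, so the 8-bit `skip_mask` (bit k set means "slot k of the next block is raw immediate data") and all function names are hypothetical. It only shows why a block whose interpretation is set from outside cannot safely be a branch target.

```python
BLOCK_SLOTS = 8  # eight 32-bit slots per block, as in the post


def split_blocks(words):
    """Group a list of 32-bit slot values into blocks of eight."""
    if len(words) % BLOCK_SLOTS:
        raise ValueError("stream must be a whole number of blocks")
    return [words[i:i + BLOCK_SLOTS] for i in range(0, len(words), BLOCK_SLOTS)]


def mark_slots(blocks, skip_masks):
    """Label every slot as 'decode' or 'immediate'.

    skip_masks[i] is the hypothetical mask that a header in block i-1
    supplied for block i (0 when no header applies).
    """
    if len(blocks) != len(skip_masks):
        raise ValueError("need one mask per block")
    marked = []
    for block, mask in zip(blocks, skip_masks):
        marked.append([
            ("immediate" if (mask >> k) & 1 else "decode", word)
            for k, word in enumerate(block)
        ])
    return marked


def can_branch_into(skip_masks, block_index):
    """A block is a safe branch target only if nothing outside it changes its decoding."""
    return skip_masks[block_index] == 0


blocks = split_blocks(list(range(16)))
masks = [0b00000000, 0b00000110]  # block 1: slots 1 and 2 hold an immediate
for index, block in enumerate(mark_slots(blocks, masks)):
    print(index, can_branch_into(masks, index), [kind for kind, _ in block])
```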

So, sadly, even _after_ my drastic simplification, it will still be
too complicated to have any chance of acceptance.

John Savard

Re: Drastic Simplification of Concertina II Coming

<779208a3-574e-42c4-a52b-28009c92dca5n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28018&group=comp.arch#28018

Newsgroups: comp.arch

by: Quadibloc - Wed, 12 Oct 2022 17:58 UTC

On Wednesday, October 12, 2022 at 1:44:17 AM UTC-6, Quadibloc wrote:

> A second half that does predication can be required to only appear in
> the last 32-bit instruction slot of the block preceding the block
> containing the instructions it affects.

If the second halves that do header functions affect the next block,
rather than the next eight instructions, then they can be allowed to
appear in any position, thus not interfering with the desired sequence
of instructions to the same extent.

John Savard

Re: Drastic Simplification of Concertina II Coming

<65d917e8-e21c-4f22-bd11-884d167cefb9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28019&group=comp.arch#28019

Newsgroups: comp.arch

by: MitchAlsup - Wed, 12 Oct 2022 19:15 UTC

On Wednesday, October 12, 2022 at 2:44:17 AM UTC-5, Quadibloc wrote:
> The various previous iterations of my Concertina II architecture
> have all obviously been far too complex to have any chance of
> being taken seriously as an alternative to RISC-V and so on.
<
Basically you are putting too much stuff in a 5 pound bag.
This takes schedule, and RISC generation 1 showed us that schedule
is one of the most important things in designing an architecture:
if you put too much stuff in, it takes too long to get done, and this
puts you behind in the performance game.
>
> An idea finally came my way as to how I could simplify it drastically
> enough so as to have some chance of consideration. However, the
> simplification comes at a cost which is probably high enough to still
> leave it without much chance.
>
> Essentially:
>
> All 32-bit instructions start with 1.
> All 16-bit instructions start with 0.
> A program is a stream of aligned 32-bit instructions. A 32-bit instruction
> slot may contain one 32-bit instruction, or a pair of 16-bit instructions, or
> ...
>
> a 16-bit instruction, and a second half that starts with 1.
>
> 32-bit instructions that start with 111 are register-to-register operate
> instructions.
>
> 32-bit instructions that start with 101111 are not decoded as 32-bit
> instructions. These stub instructions serve as the second half,
> or more precisely, the remaining two-thirds, of a 48-bit instruction...
>
> that starts with the 16 bits that start with 1 which follow a single 16-bit
> instruction in the preceding 32-bit instruction slot.
>
> It's possible to follow such a 16-bit half starting with 1 with two stub
> instructions in a row; this would allow for an 80-bit instruction with
> two memory references. (To be aligned, the 16-bit displacement
> in an address constant has to be in the last 16 bits of a 32-bit slot,
> so as not to collide with the bits at the start which indicate the type
> of its contents.)
>
> This allows the processor to fetch, and decode, every 32 bits in the
> instruction stream independently of every other, except for the visibly
> marked stub instructions.
>
> A 16-bit second half starting with 1 can also be followed by zero
> stub instructions, and have functions such as applying predication
> to the next few instructions. But predication would seem to break
> being able to fetch, decode, and start executing every instruction
> independently!
>
> Still, there is a way to manage this. Although having block headers and
> different types of blocks is a complication to avoid, one can _still_
> divide the instruction stream into blocks of eight 32-bit instruction slots.
>
> A second half that does predication can be required to only appear in
> the last 32-bit instruction slot of the block preceding the block
> containing the instructions it affects.
>
> And _that_ means that one can also have a second half that says
> 'do not decode these 32-bit slots, regardless of their content', which
> allows me to include immediates longer than 16 bits in the instruction
> stream!
>
> So despite the drastic simplification, I can basically offer the same
> features that the overly complicated architectures did.
<
While you may have simplified encode/decode, you still have too much stuff
in the bag.
>
> But the price?
<
It will take too long to implement and thereby never catch on.
>
> Memory-reference instructions are offered in two types:
>
> Those which can be unaligned, and indexed, but which can have only
> 8 of the 32 registers as their destination registers, and
>
> those which can have any of the 32 registers as their destination
> registers, but which can only refer to aligned data in memory, and which
> cannot be indexed.
>
> But, *now that I've worked out how to use the last 16 bits of a 32-bit
> instruction slot, in the last slot of a block as a header*, I can do *one*
> overly complicated thing... allow them to indicate 'these 32-bit
> instruction slots contain full memory-reference instructions, so decode
> them differently'.
>
> Of course, though, since now the indication lies _outside_ the block,
> one can't branch into a block with any immediates, with any full
> memory-reference instructions, or with predication.
>
> So, sadly, even _after_ my drastic simplification, it will still be
> too complicated to have any chance of acceptance.
<
Note:: A bigger bag does not help in this situation...........
>
> John Savard

Re: Drastic Simplification of Concertina II Coming

<74ac086e-80c0-4684-8ea9-76793e776416n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28022&group=comp.arch#28022

Newsgroups: comp.arch

by: Quadibloc - Thu, 13 Oct 2022 04:01 UTC

On Wednesday, October 12, 2022 at 1:15:20 PM UTC-6, MitchAlsup wrote:

> Basically you are putting too much stuff in a 5 pound bag.

Well, it _is_ always possible to implement a subset.

> Note:: A bigger bag does not help in this situation...........

Then why does the fact that it's a 5-pound bag matter?

But yes, I know it's too complicated. But much less so than my previous
iterations, and so now I could finally say that the goals for Concertina II
are achieved: an almost, but not quite, practical architecture.

Perhaps someday Concertina III would be fully practical.

John Savard

Re: Drastic Simplification of Concertina II Coming

<ti8m7c$1p16i$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=28023&group=comp.arch#28023

Newsgroups: comp.arch

by: BGB - Thu, 13 Oct 2022 09:35 UTC

On 10/12/2022 11:01 PM, Quadibloc wrote:
> On Wednesday, October 12, 2022 at 1:15:20 PM UTC-6, MitchAlsup wrote:
>
>> Basically you are putting too much stuff in a 5 pound bag.
>
> Well, it _is_ always possible to implement a subset.
>

Scalability is always a fun issue.

One can try to design a minimal ISA.
* But, RISC-V is already a fairly sensible design in this space.

Namely, RV32I and RV64I are pretty sane.

Getting too much into the RISC-V extensions though, the "crap hits the
fan" at this point. Complexity quickly rises to a point where I don't
want to deal with it.

I will argue that, while the core of BJX2 is more complicated than
RISC-V, the complexity curve is shallower (with a more coherent design
among the various extensions).

Granted, I am one person, which probably helps.

There is probably some cruft that could be pruned or done differently if
I were to do it again.

Though, thinking, one possibility for a "simple" ISA...

16-bit range:
zzzz-ssss-dddd-zz00 //16-bit ops
zzzz-ssss-dddd-zz01 //16-bit ops
zzzz-ssss-dddd-zz10 //16-bit ops

16-bit forms:
zzzz-ssss-dddd-zzzz //2R
zzzz-iiii-dddd-zzzz //2RI (Imm4)
zzzz-iiii-iiii-zzzz //Imm8
zzzz-ssss-dddd-sdzz //2R (R0..R31)

32-bit:
zzzz-zzzt-tttt-ssss_szzz-dddd-dzzz-zz11 //32-bit ops (3R)
iiii-iiii-iiii-ssss_szzz-dddd-dzzz-zz11 //32-bit ops (3RI)

Thus far, similar to RISC-V.
Could potentially reuse a bunch of encodings from RISC-V.
Many ALU ops would be kept as-is.
Load Ops would be kept as-is.
Store Ops would be switched over to the Load pattern.

But, would drop the existing Imm20/Disp20 encodings (LUI/AUIPC/JAL).
Encoding space would be reused for some different encodings.

Instead (Replacing LUI and AUIPC):
iiii-iiii-iiii-iiii_izzz-dddd-d001-0111 //32-bit ops (2RI, Imm17s)
iiii-iiii-iiii-iiii_izzz-dddd-d011-0111 //32-bit ops (2RI, Imm17s)

Likely, JAL would be replaced with a B/BL instruction, say:
iiii-iiii-iiii-iiii_iiii-0000-0110-1111 //B Disp20
B: Branch (Disp20)
iiii-iiii-iiii-iiii_iiii-0000-1110-1111 //BL Disp20
BL: Branch with Link (Disp20), Fixed Link Register
Where Rd field values 2..31 would be reserved for other purposes.

Also half tempted to make the Disp20 encoding less dog-chewed.
Though, one could keep all of the dog-chewed immediate and displacement
encodings as they are arguably cheaper.

The existing Compare-and-Branch instructions (BEQ/BNE/...) would also be
dropped, replaced by a "Compare Rd with 0" instruction, with a 17-bit
branch displacement. These can do "mostly the same thing", but would be
cheaper to implement vs a general-purpose compare.

Constant loading could use the LDSH mechanism:
LDSH Rd, Imm17u //Rd=(Rd<<17)|Imm17u
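
As a rough illustration of how the LDSH mechanism could build wider constants, here is a small emulation. It assumes 64-bit registers and a sign-extending LDI with a 17-bit immediate (as in the table that follows); the helper names `ldi`, `ldsh`, `plan_constant` and `run` are hypothetical, not part of any real toolchain.

```python
MASK64 = (1 << 64) - 1
IMM17 = 0x1FFFF


def ldi(imm17s: int) -> int:
    """LDI Rd, Imm17s: load a sign-extended 17-bit immediate."""
    imm = imm17s & IMM17
    if imm & 0x10000:
        imm -= 1 << 17
    return imm & MASK64


def ldsh(rd: int, imm17u: int) -> int:
    """LDSH Rd, Imm17u: Rd = (Rd << 17) | Imm17u, truncated to 64 bits."""
    return ((rd << 17) | (imm17u & IMM17)) & MASK64


def plan_constant(value: int):
    """Return one LDI followed by as few LDSH steps as needed for a 64-bit value."""
    value &= MASK64
    signed = value - (1 << 64) if value >> 63 else value
    chunks = 1
    # n chunks cover a signed range of 17*n bits; four chunks (68 bits) always suffice.
    while chunks < 4 and not (-(1 << (17 * chunks - 1)) <= signed < (1 << (17 * chunks - 1))):
        chunks += 1
    ops = []
    for i in reversed(range(chunks)):
        op = "LDI" if i == chunks - 1 else "LDSH"
        ops.append((op, (signed >> (17 * i)) & IMM17))
    return ops


def run(ops) -> int:
    """Execute a planned sequence on a single register and return its final value."""
    rd = 0
    for op, imm in ops:
        rd = ldi(imm) if op == "LDI" else ldsh(rd, imm)
    return rd


for constant in (0x1234, 0x12345678, -1, 1 << 63):
    plan = plan_constant(constant)
    assert run(plan) == constant & MASK64
    print(hex(constant & MASK64), plan)
```

Under these assumptions a 32-bit constant takes two instructions and an arbitrary 64-bit constant takes at most four.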

Say (replacing AUIPC):
iiii-iiii-iiii-iiii_i000-dddd-d001-0111 BZ Rd, Disp17s //Rd==0
iiii-iiii-iiii-iiii_i001-dddd-d001-0111 BNZ Rd, Disp17s //Rd!=0
iiii-iiii-iiii-iiii_i010-dddd-d001-0111 BLT Rd, Disp17s //Rd< 0
iiii-iiii-iiii-iiii_i011-dddd-d001-0111 BGE Rd, Disp17s //Rd>=0
iiii-iiii-iiii-iiii_i100-dddd-d001-0111 LDI Rd, Imm17s
iiii-iiii-iiii-iiii_i101-dddd-d001-0111 LDSH Rd, Imm17u
iiii-iiii-iiii-iiii_i110-dddd-d001-0111 BGT Rd, Disp17s //Rd> 0
iiii-iiii-iiii-iiii_i111-dddd-d001-0111 BLE Rd, Disp17s //Rd<=0
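
The eight rows above can be read as a field layout: a 17-bit immediate in bits 31..15, a 3-bit selector in bits 14..12, Rd in bits 11..7 and a fixed low seven bits. A minimal encode/decode sketch under that reading follows; the function names are hypothetical, and treating the LDSH immediate as unsigned follows the "Imm17u" label.

```python
OPCODE_IMM17 = 0b0010111  # low seven bits shared by all eight rows

# Selector values 0..7 in the order listed in the table.
MNEMONICS = ["BZ", "BNZ", "BLT", "BGE", "LDI", "LDSH", "BGT", "BLE"]


def encode(mnemonic: str, rd: int, imm: int) -> int:
    """Pack mnemonic, register number and 17-bit immediate into a 32-bit word."""
    if not 0 <= rd < 32:
        raise ValueError("rd must be 0..31")
    selector = MNEMONICS.index(mnemonic)
    return ((imm & 0x1FFFF) << 15) | (selector << 12) | (rd << 7) | OPCODE_IMM17


def decode(word: int):
    """Unpack a word from this group into (mnemonic, rd, immediate)."""
    if word & 0x7F != OPCODE_IMM17:
        raise ValueError("not in the Imm17 group")
    mnemonic = MNEMONICS[(word >> 12) & 0b111]
    rd = (word >> 7) & 0b11111
    imm = (word >> 15) & 0x1FFFF
    if mnemonic != "LDSH" and imm & 0x10000:
        imm -= 1 << 17  # sign-extend everything except LDSH's unsigned immediate
    return mnemonic, rd, imm


def branch_taken(mnemonic: str, value: int) -> bool:
    """Evaluate the compare-with-zero condition on a signed register value."""
    conditions = {
        "BZ": value == 0, "BNZ": value != 0,
        "BLT": value < 0, "BGE": value >= 0,
        "BGT": value > 0, "BLE": value <= 0,
    }
    return conditions[mnemonic]


word = encode("BNZ", 5, -4)
print(f"{word:08X}", decode(word), branch_taken("BNZ", 7))
```

Each condition needs only the sign bit and a zero detect on Rd, which is the cost argument made above for dropping the general compare-and-branch forms.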

Possible Register Space:
R0: ZR (ALU) / PC (Mem/Base)
R1: RA/LR (ALU) / TP/TBR (Mem/Base)
R2: SP
R3: GP/GBR
R4 ..R7 : Scratch
R8 ..R15: Preserve
R16..R23: Scratch
R24..R31: Preserve

Args: R4..R7, R20..R23
Structs: Fits in 1 or 2 registers, pass/return by value;
Else, pass by reference, return by copy-via-pointer.
Stack will have an 8 argument "Red Zone" followed by overflow args.
Yes, basically an MS-style ABI.

This is a significant reorganization from the existing RISC-V register
space and ABI.

If an FPU is present, it would also use GPRs.

A 32-bit machine would use paired GPRs for double-precision FPU, a
64-bit machine would use a single register (nominally always Double).
Same ABI used for both Hard-FP and FPU emulation.

A select few instructions would interpret R0 and R1 as PC and TBR, with
others interpreting them as Zero and LR. TBR would be seen as Read-Only
to the normal user-mode ISA (would require privileged instructions to
modify).

Main justification: I think it is possible (if one wanted to do so), to
further reduce implementation costs vs RISC-V.

Granted, wouldn't be binary compatible with RISC-V, so any similarity or
difference is likely moot (though, keeping it similar could make it
easier to port a compiler).

Though, not likely to be all that useful for much beyond a microcontroller.

>> Note:: A bigger bag does not help in this situation...........
>
> Then why does the fact that it's a 5-pound bag matter?
>
> But yes, I know it's too complicated. But much less so than my previous
> iterations, and so now I could finally say that the goals for Concertina II
> are achieved, an almost, but not quite, practical architecture.
>
> Perhaps someday Concertina III would be fully practical.
>

My stuff is arguably still not very practical either, FWIW.

Would be better if I had a "real OS" and a more "industrial strength" C
compiler.

Both "try to port Unix or similar" and "try to write a new backend for
LLVM or similar" look like an uphill battle.

But, seemingly, much of what time "could" be going into new features or
new stuff, is mostly going into trying to hunt down obscure compiler
bugs and similar in BGBCC...

Also performance falls short of my original hopes (even if, apparently,
it seems I am generally at this point getting better
performance-per-clock if compared with a 486 or similar).

Well, and on some "vintage" stats, I seem to be beating a 486 by a fair
margin (apparently some "vintage" stats were showing Dhrystone 2.1 on a
DX2-66 as around 36000, vs my current stat of ~ 74000 at 50MHz, with
some variability).

Though, annoyingly, would still need ~ 123000 to beat the SweRV core
(and GCC) at this metric.

Still not entirely sure "how" this score is possible (seemingly this would
require executing in excess of 2 IPC on a 2-wide superscalar, based on
my current stats).

Normalizing for bundle size, I currently seem to be getting ~ 0.6 DMIPs
per native machine instruction MIPs in BJX2. Granted, compiler could
possibly still be improved more here... ( Say, getting more DMIPs per
native instruction executed. )

Not sure of other metrics, say, what the megapixels/second stat would be
for a 486 at JPEG decoding. Last test, was getting around 0.45
megapixels/second on my current core (so, roughly 100 clock-cycles per
pixel on average).
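
For reference, the arithmetic behind these figures can be written out. This sketch assumes the quoted scores are Dhrystones per second, uses the conventional divisor of 1757 (the VAX 11/780 reference) to convert to DMIPS, and assumes the 123000 target is measured at the same 50 MHz clock, which the text does not state.

```python
VAX_DHRYSTONES_PER_SEC = 1757  # conventional score defining 1 DMIPS


def dmips(dhrystones_per_sec: float) -> float:
    """Convert a Dhrystones-per-second score to DMIPS."""
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC


def dmips_per_mhz(dhrystones_per_sec: float, clock_mhz: float) -> float:
    """Normalize a Dhrystone score by clock frequency."""
    return dmips(dhrystones_per_sec) / clock_mhz


def cycles_per_pixel(clock_hz: float, pixels_per_sec: float) -> float:
    """Average clock cycles spent per decoded pixel."""
    return clock_hz / pixels_per_sec


print("BJX2 @ 50 MHz :", round(dmips_per_mhz(74000, 50), 2), "DMIPS/MHz")
print("486 DX2-66    :", round(dmips_per_mhz(36000, 66), 2), "DMIPS/MHz")
print("target @ 50MHz:", round(dmips_per_mhz(123000, 50), 2), "DMIPS/MHz")
print("JPEG decode   :", round(cycles_per_pixel(50e6, 0.45e6)), "cycles/pixel")
```

Under these assumptions the 50 MHz core comes out at about 0.84 DMIPS/MHz against about 0.31 for the DX2-66, the 123000 target corresponds to about 1.4 DMIPS/MHz, and the JPEG figure is about 111 cycles per pixel.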

But, yeah, dunno...

> John Savard

Re: Drastic Simplification of Concertina II Coming

<dc332703-8507-45a3-82e0-ebdea5d80427n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28024&group=comp.arch#28024

Newsgroups: comp.arch

by: MitchAlsup - Thu, 13 Oct 2022 18:25 UTC

On Thursday, October 13, 2022 at 4:36:48 AM UTC-5, BGB wrote:
> On 10/12/2022 11:01 PM, Quadibloc wrote:
> > On Wednesday, October 12, 2022 at 1:15:20 PM UTC-6, MitchAlsup wrote:
> >
> >> Basically you are putting too much stuff in a 5 pound bag.
> >
> > Well, it _is_ always possible to implement a subset.
> >
> Scalability is always a fun issue.
>
> One can try to design a minimal ISA.
> * But, RISC-V is already a fairly sensible design in this space.
>
>
> Namely, RV32I and RV64I are pretty sane.
<
12-bit displacements are sane ? maybe at the 85% level, but not at the 95% level.
Having the OpCode at the little end is sane ?
>
> Getting too much into the RISC-V extensions though, the "crap hits the
> fan" at this point. Complexity quickly rises to a point where I don't
> want to deal with it.
>
>
> I will argue that, while the core of BJX2 is more complicated than
> RISC-V, the complexity curve is shallower (with a more coherent design
> among the various extensions).
>
> Granted, I am one person, which probably helps.
>
> There is probably some cruft that could be pruned or done differently if
> I were to do it again.
>
>
>
>
> Though, thinking, one possibility for a "simple" ISA...
>
> 16-bit range:
> zzzz-ssss-dddd-zz00 //16-bit ops
> zzzz-ssss-dddd-zz01 //16-bit ops
> zzzz-ssss-dddd-zz10 //16-bit ops
>
> 16-bit forms:
> zzzz-ssss-dddd-zzzz //2R
> zzzz-iiii-dddd-zzzz //2RI (Imm4)
> zzzz-iiii-iiii-zzzz //Imm8
> zzzz-ssss-dddd-sdzz //2R (R0..R31)
>
>
> 32-bit:
> zzzz-zzzt-tttt-ssss_szzz-dddd-dzzz-zz11 //32-bit ops (3R)
> iiii-iiii-iiii-ssss_szzz-dddd-dzzz-zz11 //32-bit ops (3RI)
>
> Thus far, similar to RISC-V.
> Could potentially reuse a bunch of encodings from RISC-V.
> Many ALU ops would be kept as-is.
> Load Ops would be kept as-is.
> Store Ops would be switched over to the Load pattern.
>
>
> But, would drop the existing Imm20/Disp20 encodings (LUI/AUIPC/JAL).
> Encoding space would be reused for some different encodings.
>
> Instead (Replacing LUI and AUIPC):
> iiii-iiii-iiii-iiii_izzz-dddd-d001-0111 //32-bit ops (2RI, Imm17s)
> iiii-iiii-iiii-iiii_izzz-dddd-d011-0111 //32-bit ops (2RI, Imm17s)
>
> Likely, JAL would be replaced with a B/BL instruction, say:
> iiii-iiii-iiii-iiii_iiii-0000-0110-1111 //B Disp20
> B: Branch (Disp20)
> iiii-iiii-iiii-iiii_iiii-0000-1110-1111 //BL Disp20
> BL: Branch with Link (Disp20), Fixed Link Register
> Where the encodings for what would be R2..R31 are reserved for other purposes.
>
> Also half tempted to make the Disp20 encoding less dog-chewed.
> Though, one could keep all of the dog-chewed immediate and displacement
> encodings as they are arguably cheaper.
>
>
> The existing Compare-and-Branch instructions (BEQ/BNE/...) would also be
> dropped, replaced by a "Compare Rd with 0" instruction, with a 17-bit
> branch displacement. These can do "mostly the same thing", but would be
> cheaper to implement vs a general-purpose compare.
>
> Constant loading could use the LDSH mechanism:
> LDSH Rd, Imm17u //Rd=(Rd<<17)|Imm17u
>
> Say (replacing AUIPC):
> iiii-iiii-iiii-iiii_i000-dddd-d001-0111 BZ Rd, Disp17s //Rd==0
> iiii-iiii-iiii-iiii_i001-dddd-d001-0111 BNZ Rd, Disp17s //Rd!=0
> iiii-iiii-iiii-iiii_i010-dddd-d001-0111 BLT Rd, Disp17s //Rd< 0
> iiii-iiii-iiii-iiii_i011-dddd-d001-0111 BGE Rd, Disp17s //Rd>=0
> iiii-iiii-iiii-iiii_i100-dddd-d001-0111 LDI Rd, Imm17s
> iiii-iiii-iiii-iiii_i101-dddd-d001-0111 LDSH Rd, Imm17u
> iiii-iiii-iiii-iiii_i110-dddd-d001-0111 BGT Rd, Disp17s //Rd> 0
> iiii-iiii-iiii-iiii_i111-dddd-d001-0111 BLE Rd, Disp17s //Rd<=0
>
>
>
> Possible Register Space:
> R0: ZR (ALU) / PC (Mem/Base)
> R1: RA/LR (ALU) / TP/TBR (Mem/Base)
> R2: SP
> R3: GP/GBR
> R4 ..R7 : Scratch
> R8 ..R15: Preserve
> R16..R23: Scratch
> R24..R31: Preserve
>
> Args: R4..R7, R20..R23
> Structs: Fits in 1 or 2 registers, pass/return by value;
> Else, pass by reference, return by copy-via-pointer.
> Stack will have an 8 argument "Red Zone" followed by overflow args.
> Yes, basically an MS-style ABI.
>
> This is a significant reorganization from the existing RISC-V register
> space and ABI.
>
>
> If an FPU is present, it would also use GPRs.
>
> A 32-bit machine would use paired GPRs for double-precision FPU, a
> 64-bit machine would use a single register (nominally always Double).
> Same ABI used for both Hard-FP and FPU emulation.
>
> A select few instructions would interpret R0 and R1 as PC and TBR, with
> others interpreting them as Zero and LR. TBR would be seen as Read-Only
> to the normal user-mode ISA (would require privileged instructions to
> modify).
>
>
> Main justification: I think it is possible (if one wanted to do so), to
> further reduce implementation costs vs RISC-V.
>
> Granted, wouldn't be binary compatible with RISC-V, so any similarity or
> difference is likely moot (though, keeping it similar could make it
> easier to port a compiler).
>
> Though, not likely to be all that useful for much beyond a microcontroller.
> >> Note:: A bigger bag does not help in this situation...........
> >
> > Then why does the fact that it's a 5-pound bag matter?
> >
> > But yes, I know it's too complicated. But much less so than my previous
> > iterations, and so now I could finally say that the goals for Concertina II
> > are achieved, an almost, but not quite, practical architecture.
> >
> > Perhaps someday Concertina III would be fully practical.
> >
> My stuff is arguably still not very practical either, FWIW.
>
> Would be better if I had a "real OS" and a more "industrial strength" C
> compiler.
>
> Both "try to port Unix or similar" and "try to write a new backend for
> LLVM or similar" look like an uphill battle.
>
>
> But, seemingly, much of what time "could" be going into new features or
> new stuff, is mostly going into trying to hunt down obscure compiler
> bugs and similar in BGBCC...
>
Including instruction selection anomalies.
>
> Also performance falls short of my original hopes (even if, apparently,
> it seems I am generally at this point getting better
> performance-per-clock if compared with a 486 or similar).
>
>
> Well, and on some "vintage" stats, I seem to be beating a 486 by a fair
> margin (apparently some "vintage" stats were showing Dhrystone 2.1 on a
> DX2-66 as around 36000, vs my current stat of ~ 74000 at 50MHz, with
> some variability).
>
> Though, annoyingly, would still need ~ 123000 to beat the SweRV core
> (and GCC) at this metric.
>
> Still not entirely sure "how" this score is possible (seemingly this would
> require executing in excess of 2 IPC on a 2-wide superscalar, based on
> my current stats).
>
> Normalizing for bundle size, I currently seem to be getting ~ 0.6 DMIPS
> per native-instruction MIPS in BJX2. Granted, the compiler could
> possibly still be improved more here... ( Say, getting more DMIPS per
> native instruction executed. )
>
>
> Not sure of other metrics, say, what the megapixels/second stat would be
> for a 486 at JPEG decoding. Last test, was getting around 0.45
> megapixels/second on my current core (so, roughly 100 clock-cycles per
> pixel on average).
>
>
> But, yeah, dunno...
>
>
> > John Savard

Re: Drastic Simplification of Concertina II Coming

<caa8d2e7-dd58-4a45-b3c2-1af8fc9810b4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28025&group=comp.arch#28025

Newsgroups: comp.arch
Date: Thu, 13 Oct 2022 12:21:56 -0700 (PDT)
In-Reply-To: <dc332703-8507-45a3-82e0-ebdea5d80427n@googlegroups.com>
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<65d917e8-e21c-4f22-bd11-884d167cefb9n@googlegroups.com> <74ac086e-80c0-4684-8ea9-76793e776416n@googlegroups.com>
<ti8m7c$1p16i$2@dont-email.me> <dc332703-8507-45a3-82e0-ebdea5d80427n@googlegroups.com>
Message-ID: <caa8d2e7-dd58-4a45-b3c2-1af8fc9810b4n@googlegroups.com>
Subject: Re: Drastic Simplification of Concertina II Coming
From: jsav...@ecn.ab.ca (Quadibloc)

by: Quadibloc - Thu, 13 Oct 2022 19:21 UTC

On Thursday, October 13, 2022 at 12:25:17 PM UTC-6, MitchAlsup wrote:

> Having the OpCode at the little end is sane ?

I personally favor having the opcode in the front. However, I have
encountered some real-world computers that put it at the end
of the instruction, for whatever reason. If the instructions aren't
variable-length, they can get away with it.

However, I must admit that I haven't encountered _many_ computers
that do that. Just the Philco 2000, the DRTE computer, and the
Strela from the Soviet Union.

John Savard

Re: Drastic Simplification of Concertina II Coming

<0b273cf3-777e-43bc-9435-672ba31fde33n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28026&group=comp.arch#28026

Newsgroups: comp.arch
Date: Thu, 13 Oct 2022 12:31:51 -0700 (PDT)
In-Reply-To: <dc332703-8507-45a3-82e0-ebdea5d80427n@googlegroups.com>
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<65d917e8-e21c-4f22-bd11-884d167cefb9n@googlegroups.com> <74ac086e-80c0-4684-8ea9-76793e776416n@googlegroups.com>
<ti8m7c$1p16i$2@dont-email.me> <dc332703-8507-45a3-82e0-ebdea5d80427n@googlegroups.com>
Message-ID: <0b273cf3-777e-43bc-9435-672ba31fde33n@googlegroups.com>
Subject: Re: Drastic Simplification of Concertina II Coming
From: already5...@yahoo.com (Michael S)

by: Michael S - Thu, 13 Oct 2022 19:31 UTC

On Thursday, October 13, 2022 at 9:25:17 PM UTC+3, MitchAlsup wrote:
> On Thursday, October 13, 2022 at 4:36:48 AM UTC-5, BGB wrote:
> > On 10/12/2022 11:01 PM, Quadibloc wrote:
> > > On Wednesday, October 12, 2022 at 1:15:20 PM UTC-6, MitchAlsup wrote:
> > >
> > >> Basically you are putting too much stuff in a 5 pound bag.
> > >
> > > Well, it _is_ always possible to implement a subset.
> > >
> > Scalability is always a fun issue.
> >
> > One can try to design a minimal ISA.
> > * But, RISC-V is already a fairly sensible design in this space.
> >
> >
> > Namely, RV32I and RV64I are pretty sane.
> <
> 12-bit displacements are sane ? maybe at the 85% level, but not at the 95% level.

I'd guess, the motivation for 12-bit displacements was to maximize
semantic similarity between fixed-width and "compressed" variants
of the ISA.
Whether such similarity is a worthy goal is another question.

> Having the OpCode at the little end is sane ?

I fail to see why it matters, at least for the fixed-width variant.


Re: Drastic Simplification of Concertina II Coming

<ti9qj5$1s2vj$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=28027&group=comp.arch#28027

From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Drastic Simplification of Concertina II Coming
Date: Thu, 13 Oct 2022 14:56:04 -0500
Message-ID: <ti9qj5$1s2vj$2@dont-email.me>
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<65d917e8-e21c-4f22-bd11-884d167cefb9n@googlegroups.com>
<74ac086e-80c0-4684-8ea9-76793e776416n@googlegroups.com>
<ti8m7c$1p16i$2@dont-email.me>
<dc332703-8507-45a3-82e0-ebdea5d80427n@googlegroups.com>
In-Reply-To: <dc332703-8507-45a3-82e0-ebdea5d80427n@googlegroups.com>

by: BGB - Thu, 13 Oct 2022 19:56 UTC

On 10/13/2022 1:25 PM, MitchAlsup wrote:
> On Thursday, October 13, 2022 at 4:36:48 AM UTC-5, BGB wrote:
>> On 10/12/2022 11:01 PM, Quadibloc wrote:
>>> On Wednesday, October 12, 2022 at 1:15:20 PM UTC-6, MitchAlsup wrote:
>>>
>>>> Basically you are putting too much stuff in a 5 pound bag.
>>>
>>> Well, it _is_ always possible to implement a subset.
>>>
>> Scalability is always a fun issue.
>>
>> One can try to design a minimal ISA.
>> * But, RISC-V is already a fairly sensible design in this space.
>>
>>
>> Namely, RV32I and RV64I are pretty sane.
> <
> 12-bit displacements are sane ? maybe at the 85% level, but not at the 95% level.
> Having the OpCode at the little end is sane ?

It is what it is.
That is basically the biggest they can get away with, given their
instruction formats.

Having the opcode and length bits in the LSB makes sense if one wants a
16/32 ISA on a little endian machine and wants consistent bit ordering.

Vs BJX2, where I ended up with a 16/32 encoding that, while it works,
is not entirely "endian consistent".

Though, this aspect of RISC-V is partly counter-acted by the bit-pattern
layout in many of its immediate fields being a chewed up mess (so
extracting or inserting an immediate into an instruction requires an
unholy mess of bit-twiddling).
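
To illustrate the bit-twiddling, here is a sketch of pulling the immediate out of a RISC-V J-type (JAL) word versus an I-type word, following the field placement in the RISC-V base ISA (imm[20|10:1|11|19:12] in bits 31..12 for J-type, imm[11:0] in bits 31..20 for I-type). The function names are just illustrative:

```python
def decode_jal_imm(insn):
    """Reassemble the signed J-type offset from a 32-bit RISC-V JAL word."""
    imm20 = (insn >> 31) & 0x1       # imm[20]    <- insn[31]
    imm10_1 = (insn >> 21) & 0x3FF   # imm[10:1]  <- insn[30:21]
    imm11 = (insn >> 20) & 0x1       # imm[11]    <- insn[20]
    imm19_12 = (insn >> 12) & 0xFF   # imm[19:12] <- insn[19:12]
    imm = (imm20 << 20) | (imm19_12 << 12) | (imm11 << 11) | (imm10_1 << 1)
    if imm20:                        # sign-extend from bit 20
        imm -= 1 << 21
    return imm


def decode_itype_imm(insn):
    """I-type is the easy case: one contiguous 12-bit signed field."""
    imm = (insn >> 20) & 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


print(decode_jal_imm(0x008000EF))    # jal ra, 8
print(decode_jal_imm(0xFFDFF06F))    # jal x0, -4
print(decode_itype_imm(0xFFF00093))  # addi x1, x0, -1
```

The I-type case is a single shift and sign-extend; the J-type case needs four separate field extractions before the offset can be used.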

Arguably, 12-bits is bigger than 9-bits (in BJX2), but the combination
of (in BJX2):
Displacements are unsigned and scaled;
There are both register-indexed and Jumbo fallback cases;
...

Can compensate:
(9u*8) can address 4K, whereas (12s*1) can only address 2K.
LDIZ+Ld/St: +/- 16MB with Byte scale
LDIZ+Ld/St: +/- 128MB with QWord scale
Jumbo+Ld/St is up to +/- 32GB

Noting that both negative and misaligned Load/Store displacements are
rare, and that one can fall back to a 2-op sequence if needed, this works
OK. In RISC-V, this case would require 3 ops (LUI,ADD,Ld/St).
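
The 4K vs 2K comparison can be checked with a quick calculation. A minimal sketch; `reach_bytes` is a made-up helper, and the field widths and scales are the ones stated above (9-bit unsigned scaled by 8, 12-bit signed scaled by 1):

```python
def reach_bytes(bits, signed, scale):
    """Return (lowest, highest) byte offset a scaled displacement field can encode."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return lo * scale, hi * scale


bjx2_qword = reach_bytes(9, signed=False, scale=8)  # 9u * 8
rv_byte = reach_bytes(12, signed=True, scale=1)     # 12s * 1

print("9u*8 :", bjx2_qword)
print("12s*1:", rv_byte)
```

The 9-bit field scaled by 8 spans 0..4088, so a QWord access can land anywhere in a 4K window, while the 12-bit signed byte displacement reaches only 2047 bytes forward (and 2048 back).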

The argument against register-indexed addressing (like in BJX2) is that
it sets the minimum implementation as having a 3R1W register file. Though,
this is "mostly" a reasonable tradeoff IMO. One can still argue that a
microcontroller or similar does not want or need a 3R1W register file,
if 2R1W can be made to work.

As noted though, smallest I had gotten for 32-bit cores in my tests in
the past was around 5k LUT (for 32-bit cores with no TLB or FPU).

Smallest cases had minimal caches though:
Only held a single cache line at a time;
Would only allow aligned access.

Unaligned adds LUT cost and requires holding at least 2 cache lines.

Though, in this case, time spent executing extra instructions is a
lesser worry, since with these caches a vast majority of the clock
cycles would be spent dealing with cache misses.

Could go multi-core with this, but the performance of each core would be
basically garbage.

Could write up a spec if anyone were interested, though at the moment I
would probably not be likely to do much with this idea.

>>>> Note:: A bigger bag does not help in this situation...........
>>>
>>> Then why does the fact that it's a 5-pound bag matter?
>>>
>>> But yes, I know it's too complicated. But much less so than my previous
>>> iterations, and so now I could finally say that the goals for Concertina II
>>> are achieved, an almost, but not quite, practical architecture.
>>>
>>> Perhaps someday Concertina III would be fully practical.
>>>
>> My stuff is arguably still not very practical either, FWIW.
>>
>> Would be better if I had a "real OS" and a more "industrial strength" C
>> compiler.
>>
>> Both "try to port Unix or similar" and "try to write a new backend for
>> LLVM or similar" look like an uphill battle.
>>
>>
>> But, seemingly, much of what time "could" be going into new features or
>> new stuff, is mostly going into trying to hunt down obscure compiler
>> bugs and similar in BGBCC...
>>
> Including instruction selection anomalies.

Re: Drastic Simplification of Concertina II Coming

<5cca117b-2155-4e52-b9da-208db1321d32n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28028&group=comp.arch#28028

Newsgroups: comp.arch
Date: Thu, 13 Oct 2022 14:29:43 -0700 (PDT)
In-Reply-To: <65d917e8-e21c-4f22-bd11-884d167cefb9n@googlegroups.com>
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com> <65d917e8-e21c-4f22-bd11-884d167cefb9n@googlegroups.com>
Message-ID: <5cca117b-2155-4e52-b9da-208db1321d32n@googlegroups.com>
Subject: Re: Drastic Simplification of Concertina II Coming
From: jsav...@ecn.ab.ca (Quadibloc)

by: Quadibloc - Thu, 13 Oct 2022 21:29 UTC

On Wednesday, October 12, 2022 at 1:15:20 PM UTC-6, MitchAlsup wrote:

> Basically you are putting too much stuff in a 5 pound bag.

I had encountered one very serious omission in my initial vision of
this architecture that underscored the truth of your comment here.

The mechanism I used to avoid having block headers worked well
enough for predication, and for immediates. However, I lost the ability
to associate a block bit with each instruction, so that the instructions
could be grouped into clumps of multiple instructions all of which could
execute independently in parallel. This, of course, is the basic feature
which defines a modern VLIW design.

I *did* manage to find a way to squeeze that in.

John Savard

Re: Drastic Simplification of Concertina II Coming

<e625c121-ddbd-4d66-a83c-f78eb6edf6c1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28029&group=comp.arch#28029

Newsgroups: comp.arch
Date: Thu, 13 Oct 2022 15:19:40 -0700 (PDT)
In-Reply-To: <caa8d2e7-dd58-4a45-b3c2-1af8fc9810b4n@googlegroups.com>
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<65d917e8-e21c-4f22-bd11-884d167cefb9n@googlegroups.com> <74ac086e-80c0-4684-8ea9-76793e776416n@googlegroups.com>
<ti8m7c$1p16i$2@dont-email.me> <dc332703-8507-45a3-82e0-ebdea5d80427n@googlegroups.com>
<caa8d2e7-dd58-4a45-b3c2-1af8fc9810b4n@googlegroups.com>
Message-ID: <e625c121-ddbd-4d66-a83c-f78eb6edf6c1n@googlegroups.com>
Subject: Re: Drastic Simplification of Concertina II Coming
From: MitchAl...@aol.com (MitchAlsup)

by: MitchAlsup - Thu, 13 Oct 2022 22:19 UTC

On Thursday, October 13, 2022 at 2:21:58 PM UTC-5, Quadibloc wrote:
> On Thursday, October 13, 2022 at 12:25:17 PM UTC-6, MitchAlsup wrote:
>
> > Having the OpCode at the little end is sane ?
<
> I personally favor having the opcode in the front. However, I have
> encountered some real-world computers that put it at the end
> of the instruction, for whatever reason. If the instructions aren't
> variable-length, they can get away with it.
<
The reason I leave the OpCode at the big-end is that you can, then, use
OpCode to distinguish instructions from data. Consider the top 6-bits
of My 66000 instruction-specifier:: If these bits are 000000 or 111111
then the OpCode is invalid (it's an integer from 0..67,108,863 or an integer
from -67,108,864..-1). If these bits are 001111 to 010000 or 101111 to
110000 they are float ±{1/32 to 128} so most integers and many floats
cause OPERATION exceptions (The range is bigger if you run into a double
rather than a float and remains roughly centered on 0.0)
<
This is no reason not to have a bit in the PTE to specify execute-ability !
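
A small sketch of the idea, reading the ranges above as 6-bit patterns (000000/111111 for small integers, 001111..010000 and 101111..110000 for floats near ±1). The set names and `looks_like_data` are hypothetical, and nothing here reflects the actual My 66000 opcode map beyond the patterns quoted in this post:

```python
import struct


def top6(word):
    """Top 6 bits of a 32-bit word (where a big-end OpCode would sit)."""
    return (word >> 26) & 0x3F


def float_bits(x):
    """IEEE-754 single-precision bit pattern of x as an unsigned 32-bit int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


INT_LIKE = {0b000000, 0b111111}                      # small +/- integers
FLOAT_LIKE = {0b001111, 0b010000, 0b101111, 0b110000}  # floats near +/-1
DATA_LIKE = INT_LIKE | FLOAT_LIKE


def looks_like_data(word):
    """True if the would-be OpCode field matches one of the reserved patterns."""
    return top6(word) in DATA_LIKE


for value in (1000, -1000 & 0xFFFFFFFF, float_bits(1.0), float_bits(-128.0)):
    print(f"{value:#010x} top6={top6(value):06b} data-like={looks_like_data(value)}")
```

With those patterns left invalid as OpCodes, fetching a word of typical integer or float data traps instead of executing as a plausible instruction.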
>
> However, I must admit that I haven't encountered _many_ computers
> that do that. Just the Philco 2000, the DRTE computer, and the
> Strela from the Soviet Union.
>
> John Savard

Re: Drastic Simplification of Concertina II Coming

<965e2aa7-bdf4-4b4c-8a29-bb903b5a80b5n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28030&group=comp.arch#28030

Newsgroups: comp.arch
Date: Thu, 13 Oct 2022 15:33:37 -0700 (PDT)
In-Reply-To: <ti9qj5$1s2vj$2@dont-email.me>
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<65d917e8-e21c-4f22-bd11-884d167cefb9n@googlegroups.com> <74ac086e-80c0-4684-8ea9-76793e776416n@googlegroups.com>
<ti8m7c$1p16i$2@dont-email.me> <dc332703-8507-45a3-82e0-ebdea5d80427n@googlegroups.com>
<ti9qj5$1s2vj$2@dont-email.me>
Message-ID: <965e2aa7-bdf4-4b4c-8a29-bb903b5a80b5n@googlegroups.com>
Subject: Re: Drastic Simplification of Concertina II Coming
From: MitchAl...@aol.com (MitchAlsup)

by: MitchAlsup - Thu, 13 Oct 2022 22:33 UTC

On Thursday, October 13, 2022 at 2:57:28 PM UTC-5, BGB wrote:
> On 10/13/2022 1:25 PM, MitchAlsup wrote:
> > On Thursday, October 13, 2022 at 4:36:48 AM UTC-5, BGB wrote:
> >> On 10/12/2022 11:01 PM, Quadibloc wrote:
> >>> On Wednesday, October 12, 2022 at 1:15:20 PM UTC-6, MitchAlsup wrote:
> >>>
> >>>> Basically you are putting too much stuff in a 5 pound bag.
> >>>
> >>> Well, it _is_ always possible to implement a subset.
> >>>
> >> Scalability is always a fun issue.
> >>
> >> One can try to design a minimal ISA.
> >> * But, RISC-V is already a fairly sensible design in this space.
> >>
> >>
> >> Namely, RV32I and RV64I are pretty sane.
> > <
> > 12-bit displacements are sane ? maybe at the 85% level, but not at the 95% level.
> > Having the OpCode at the little end is sane ?
> It is what it is.
> That is basically the biggest they can get away with with their
> instruction formats.
<
It may be "better than nothing" but experience with SPARC (13-bit immediates)
indicates 12-bits is considerably smaller than desired. Looking at ASM code
out of MY 66000 LLVM compiler and the RISC-V LLVM compiler indicates a
useful advantage in having 16-bits over 12-bits (something like 8% of the
instructions disappear having 16-bits instead of just 12-bits)
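
To make the field sizes concrete, here is a minimal sketch of the range check a compiler back end would apply before folding a constant into an instruction. The helper name `fits_signed` is made up for illustration and is not taken from either LLVM port.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a two's-complement field of `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return lo <= value <= hi

# Ranges for the field widths mentioned above (RISC-V 12, SPARC 13, 16-bit).
for bits in (12, 13, 16):
    print(bits, -(1 << (bits - 1)), (1 << (bits - 1)) - 1)
# 12 -2048 2047
# 13 -4096 4095
# 16 -32768 32767
```

A constant such as 4000 fails the 12-bit check but passes the 16-bit one, which is the kind of case where the narrower field costs an extra instruction.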
>
> Having the opcode and length bits in the LSB makes sense if one wants a
> 16/32 ISA on a little endian machine and wants consistent bit ordering.
<
Having thought about this:: I submit that were My 66000 ISA to have a 16-bit
"instruction" I would use the major opcode and the minor OpCode of the
3-operand instructions as the pair of OpCodes residing in this 32-bit container.
The 3-operand format has four (4) 5-bit register specifiers and is a natural
for having 2 "16-bit instructions" in a 32-bit word. These register specifiers
would support 2 instructions with 2 Rd registers, and 2 Rs registers and
the 11-bits of Major and Minor OpCodes would allow a sufficient number of
"16-bit instructions".
<
But I would NOT consume 3/4's of the OpCode space for 16-bit instructions.
>
> Vs in BJX2, where I ended up with a 16/32 encoding that, while it works,
> it is not entirely "endian consistent".
>
> Though, this aspect of RISC-V is partly counter-acted by the bit-pattern
> layout in many of its immediate fields being a chewed up mess (so
> extracting or inserting an immediate into an instruction requires an
> unholy mess of bit-twiddling).
<
Some immediates are taken (and assembled) from 3 different fields within
an instruction.
>
>
>
> Arguably, 12-bits is bigger than 9-bits (in BJX2), but the combination
> of (in BJX2):
> Displacements are unsigned and scaled;
> There are both register-indexed and Jumbo fallback cases;
> ...
>
> Can compensate:
> (9u*8) can address 4K, whereas (12s*1) can only address 2K.
> LDIZ+Ld/St: +/- 16MB with Byte scale
> LDIZ+Ld/St: +/- 128MB with QWord scale
> Jumbo+Ld/St is up to +/- 32GB
>
> Noting as how both negative and misaligned Load/Store displacements are
> rare, and if one can fall back to a 2-op sequence if needed, this works
> OK. In RISC-V, this case would require 3 ops (LUI,ADD,Ld/St).
>
>
> Argument against register-index (like in BJX2), is that it sets the
> minimum implementation as having a 3R1W register file. Though this is
<
3R1W is necessary if you want to perform FMAC r = a*b+c, or bit-field
INSert, or CMOV for that matter...
<
> "mostly" a reasonable tradeoff IMO. Though one can argue that a
> microcontroller or similar does not want or need a 3R1W register file,
> if 2R1W can be made to work.
>

Re: Drastic Simplification of Concertina II Coming

<c785e00b-b537-4d56-81bc-b0f058907c76n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28032&group=comp.arch#28032

X-Received: by 2002:a05:622a:18a:b0:39a:ffaf:6c9d with SMTP id s10-20020a05622a018a00b0039affaf6c9dmr2102853qtw.253.1665705793021;
Thu, 13 Oct 2022 17:03:13 -0700 (PDT)
X-Received: by 2002:a05:6830:2b27:b0:656:f9ef:b4fa with SMTP id
l39-20020a0568302b2700b00656f9efb4famr1193717otv.31.1665705792738; Thu, 13
Oct 2022 17:03:12 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 13 Oct 2022 17:03:12 -0700 (PDT)
In-Reply-To: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:6947:3c86:73e1:a64e;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:6947:3c86:73e1:a64e
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c785e00b-b537-4d56-81bc-b0f058907c76n@googlegroups.com>
Subject: Re: Drastic Simplification of Concertina II Coming
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Fri, 14 Oct 2022 00:03:13 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 1645

by: Quadibloc - Fri, 14 Oct 2022 00:03 UTC

On Wednesday, October 12, 2022 at 1:44:17 AM UTC-6, Quadibloc wrote:
> An idea finally came my way as to how I could simplify it drastically
> enough so as to have some chance of consideration. However, the
> simplification comes at a cost which is probably high enough to still
> leave it without much chance.

I have now revised the Concertina II web pages to show this new
attempt, at

http://www.quadibloc.com/arch/ct17int.htm

John Savard

Re: Drastic Simplification of Concertina II Coming

<1c712d8e-3e0f-4e3d-ba98-852a825d2b3fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28037&group=comp.arch#28037

X-Received: by 2002:a05:620a:19a5:b0:6cf:4a24:cccb with SMTP id bm37-20020a05620a19a500b006cf4a24cccbmr2389751qkb.376.1665717345121;
Thu, 13 Oct 2022 20:15:45 -0700 (PDT)
X-Received: by 2002:aca:5808:0:b0:350:9790:7fe with SMTP id
m8-20020aca5808000000b00350979007femr6540593oib.79.1665717344873; Thu, 13 Oct
2022 20:15:44 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 13 Oct 2022 20:15:44 -0700 (PDT)
In-Reply-To: <e625c121-ddbd-4d66-a83c-f78eb6edf6c1n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:6947:3c86:73e1:a64e;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:6947:3c86:73e1:a64e
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<65d917e8-e21c-4f22-bd11-884d167cefb9n@googlegroups.com> <74ac086e-80c0-4684-8ea9-76793e776416n@googlegroups.com>
<ti8m7c$1p16i$2@dont-email.me> <dc332703-8507-45a3-82e0-ebdea5d80427n@googlegroups.com>
<caa8d2e7-dd58-4a45-b3c2-1af8fc9810b4n@googlegroups.com> <e625c121-ddbd-4d66-a83c-f78eb6edf6c1n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1c712d8e-3e0f-4e3d-ba98-852a825d2b3fn@googlegroups.com>
Subject: Re: Drastic Simplification of Concertina II Coming
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Fri, 14 Oct 2022 03:15:45 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2750

by: Quadibloc - Fri, 14 Oct 2022 03:15 UTC

On Thursday, October 13, 2022 at 4:19:42 PM UTC-6, MitchAlsup wrote:

> The reason I leave the OpCode at the big-end is that you can, then, use
> OpCode to distinguish instructions from data. Consider the top 6-bits
> of My 66000 instruction-specifier:: If these bits are 000000 or 111111
> then the OpCode is invalid (its an integer from 0..67,108,863 or an integer
> from -67,108,864..-1)

Yes, that is a good idea.

> If these bits are 001111 to 010000 or 101111 to
> 110000 they are float ±{1/32 to 128} so most integers and many floats
> cause OPERATION exceptions (The range is bigger if you run into a double
> rather than a float and remains roughly centered on 0.0)

Yes, because floats are sign-magnitude, with excess-n notation for the
exponent, one also needs to exclude a different set of bit patterns to
catch their most common values.

> This is no reason not to have a bit in the PTE to specify execute-ability !

Oh, no. Unlike something that absolutely prevents things marked as data
from being executed, it just works a lot of the time. But having it is still of
value, since sometimes features like that aren't used.
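
A small sketch of the idea, assuming only what is stated above: the top 6 bits of a 32-bit word are inspected, and all-zeros or all-ones marks a small signed integer. The function names are hypothetical and this is not the My 66000 decoder; the float check simply uses the standard IEEE 754 single-precision layout.

```python
import struct

def top6(word: int) -> int:
    """Return the top 6 bits of a 32-bit word."""
    return (word & 0xFFFFFFFF) >> 26

def float_bits(x: float) -> int:
    """Bit pattern of x as an IEEE 754 single-precision value."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def looks_like_small_int(word: int) -> bool:
    """True when the top 6 bits are all zeros or all ones,
    i.e. the word is a signed integer in -2**26 .. 2**26 - 1."""
    return top6(word) in (0b000000, 0b111111)

print(looks_like_small_int(67_108_863))    # True  (2**26 - 1)
print(looks_like_small_int(-67_108_864))   # True  (-2**26)
print(looks_like_small_int(67_108_864))    # False (one past the range)
print(format(top6(float_bits(1.0)), "06b"))   # 001111
print(format(top6(float_bits(-2.0)), "06b"))  # 110000
```

The last two lines show why floats near 1.0 need their own set of excluded patterns: sign plus biased exponent puts them at 001111/010000 and 101111/110000 rather than at the all-zeros or all-ones patterns.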

John Savard

Re: Drastic Simplification of Concertina II Coming

<tib0tn$21i5l$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=28038&group=comp.arch#28038

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Drastic Simplification of Concertina II Coming
Date: Fri, 14 Oct 2022 01:50:14 -0500
Organization: A noiseless patient Spider
Lines: 339
Message-ID: <tib0tn$21i5l$2@dont-email.me>
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<65d917e8-e21c-4f22-bd11-884d167cefb9n@googlegroups.com>
<74ac086e-80c0-4684-8ea9-76793e776416n@googlegroups.com>
<ti8m7c$1p16i$2@dont-email.me>
<dc332703-8507-45a3-82e0-ebdea5d80427n@googlegroups.com>
<ti9qj5$1s2vj$2@dont-email.me>
<965e2aa7-bdf4-4b4c-8a29-bb903b5a80b5n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 14 Oct 2022 06:51:35 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="3d6399c825085726cd5c084acd7ca12f";
logging-data="2148533"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18kjDdE0K99MU7Tq4mW8pXD"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.2.2
Cancel-Lock: sha1:S+f+J9CyGxVVOnN0pewCccPEqAU=
Content-Language: en-US
In-Reply-To: <965e2aa7-bdf4-4b4c-8a29-bb903b5a80b5n@googlegroups.com>

by: BGB - Fri, 14 Oct 2022 06:50 UTC

On 10/13/2022 5:33 PM, MitchAlsup wrote:
> On Thursday, October 13, 2022 at 2:57:28 PM UTC-5, BGB wrote:
>> On 10/13/2022 1:25 PM, MitchAlsup wrote:
>>> On Thursday, October 13, 2022 at 4:36:48 AM UTC-5, BGB wrote:
>>>> On 10/12/2022 11:01 PM, Quadibloc wrote:
>>>>> On Wednesday, October 12, 2022 at 1:15:20 PM UTC-6, MitchAlsup wrote:
>>>>>
>>>>>> Basically you are putting too much stuff in a 5 pound bag.
>>>>>
>>>>> Well, it _is_ always possible to implement a subset.
>>>>>
>>>> Scalability is always a fun issue.
>>>>
>>>> One can try to design a minimal ISA.
>>>> * But, RISC-V is already a fairly sensible design in this space.
>>>>
>>>>
>>>> Namely, RV32I and RV64I are pretty sane.
>>> <
>>> 12-bit displacements are sane ? maybe at the 85% level, but not at the 95% level.
>>> Having the OpCode at the little end is sane ?
>> It is what it is.
>> That is basically the biggest they can get away with with their
>> instruction formats.
> <
> It may be "better than nothing" but experience with SPARC (13-bit immediates)
> indicates 12-bits is considerably smaller than desired. Looking at ASM code
> out of MY 66000 LLVM compiler and the RISC-V LLVM compiler indicates a
> useful advantage in having 16-bits over 12-bits (something like 8% of the
> instructions disappear having 16-bits instead of just 12-bits)

Fitting larger immediate and displacement values into a 32-bit
instruction word becomes a problem.

Say:
Disp: 16b
Regs: 2*5b
Ld/St Type: 4b
Rest: 2b

One would burn 1/4 of the possible encoding space *just* on the
Load/Store displacement.
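
The arithmetic behind that 1/4 can be sketched as follows: every bit a format spends on operand fields doubles the number of code points it occupies in the 32-bit word. The helper `encoding_fraction` and the dictionary keys are illustrative names only.

```python
from fractions import Fraction

def encoding_fraction(field_bits: dict, word_bits: int = 32) -> Fraction:
    """Fraction of all word_bits-wide encodings consumed by a format
    whose operand fields take the given number of bits."""
    used = sum(field_bits.values())
    return Fraction(1 << used, 1 << word_bits)

# The layout from the post: 16-bit displacement, two 5-bit registers,
# 4-bit load/store type, leaving 2 bits for everything else.
with_disp16 = {"disp": 16, "regs": 2 * 5, "ldst_type": 4}
print(encoding_fraction(with_disp16))  # 1/4

# Same layout with a 12-bit displacement, for comparison.
with_disp12 = {"disp": 12, "regs": 2 * 5, "ldst_type": 4}
print(encoding_fraction(with_disp12))  # 1/64
```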

12 bits works OK for a simple RISC style ISA since it doesn't wreck
everything else.

Meanwhile, I went with 9 and 10 bits for a lot of stuff, but my ISA
allows predication, bundling, and also an encoding space for 16-bit ops.

With scaling, I get around a 98% hit-rate for Disp9u, which is similar
to the non-scaled Disp12s in RISC-V.

Granted, it makes sense, as both can address a similar size chunk of
memory for 32-bit Ld/St (2kB).
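
For reference, the window sizes being compared work out as below. This is only a sketch of the arithmetic; the function names are made up, and "reach" here means the size of the addressable window, not the largest single offset.

```python
def unsigned_scaled_reach(bits: int, scale: int) -> int:
    """Size in bytes of the window above the base register that an
    unsigned displacement of `bits` bits, multiplied by `scale`, can cover."""
    return (1 << bits) * scale

def signed_unscaled_range(bits: int) -> tuple:
    """(min, max) byte offsets of a signed, unscaled displacement."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

print(unsigned_scaled_reach(9, 4))   # 2048  Disp9u, 32-bit access
print(unsigned_scaled_reach(9, 8))   # 4096  Disp9u, 64-bit access
print(signed_unscaled_range(12))     # (-2048, 2047)  Disp12s
```

So for 32-bit loads and stores, both forms cover about 2 kB of positive offsets; the signed form spends half of its code points on negative offsets, which are reported above as rare.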

Using XGPR encodings, it is also possible to have a smaller negative
displacement, so one has +2K or +4K, and -128B or -256B. The latter are
rare, given negative displacements are fairly uncommon, which is why I
originally ended up going with a zero-extended displacement in the
first place (only around 0.7% of the displacements being negative).

There was a weak area regarding GBR, which used unscaled displacements
(for historical reasons, *1, and this couldn't be changed for the
existing encodings).

So, ended up adding a few new encodings which used scaled Disp10, so 4K
or 8K, which (if global variables are sorted by access frequency) hits
around 95% of the global variables (vs ~ 30% with the old approach).

With the remaining 5% falling back to 64-bit Disp33s encodings.

*1: Mostly it would have meant that for general-case GBR fixups, I would
have needed a significantly larger number of reloc types (this situation
is bad enough already). So, I had decided early on that PC-rel and
GBR-rel addressing modes would always use byte scale to avoid needing a
bunch of extra reloc types.

For normal immediate-form instructions, there generally seems to be
around a 94% hit rate for the 9 and 10 bit immediate values (but would
drop to around 51% for 5-bit immediate values).

This could depend a lot on the program being compiled though (but
generally seems to hold at least with the programs I am testing with).

Though, given that in many of these cases the "sign extension" is encoded
implicitly via the opcode, they are functionally analogous to 10- and
11-bit sign-extended immediate values (with the disclaimer that only a
limited subset of instructions can natively encode a negative immediate).

Realistically, beyond a limited range of instructions, negative
immediate values are rare, so it is kind of a waste to always spend a
bit as a dedicated sign bit.

Some instructions do use sign-extended displacements though (generally
things like branches and similar).

>>
>> Having the opcode and length bits in the LSB makes sense if one wants a
>> 16/32 ISA on a little endian machine and wants consistent bit ordering.
> <
> Having thought about this:: I submit that were My 66000 ISA to have a 16-bit
> "instruction" I would use the major opcode and the minor OpCode of the
> 3-operand instructions as the pair of OpCodes residing in this 32-bit container.
> The 3-operand format has four (4) 5-bit register specifiers and is a natural
> for having 2 "16-bit instructions" in a 32-bit word. These register specifiers
> would support 2 instructions with 2 Rd registers, and 2 Rs registers and
> the 11-bits of Major and Minor OpCodes would allow a sufficient number of
> "16-bit instructions".
> <
> But I would NOT consume 3/4's of the OpCode space for 16-bit instructions.

Yeah, errm, originally with BJX2 it was 7/8 of the space for 16-bit ops.

Now it is 3/4 due to the XGPR encodings, but they are not contiguous:
F, E, 7, 9

Though, at the time this was the best I could do without seriously
breaking binary compatibility.

One does sorta need to burn a lot of potential encoding space on the
16-bit space for the 16-bit encodings to be "actually usable".

Like, if all one has in 16-bit land is, say:
0: MOV Rm, Rn
1: ADD Rm, Rn
2.0: BRA Disp8s
2.1: ADD Disp8s, SP
2.2: -
2.3: -
3.00: JMP Rn
3.01: -
..
3.1F: -
4: LDI Imm5u, Rn
5: LDI Imm5n, Rn
6: ADD Imm5u, Rn
7: ADD Imm5n, Rn
8: MOV.L (SP, Disp5u), Rn
9: MOV.Q (SP, Disp5u), Rn
A: MOV.L Rn, (SP, Disp5u)
B: MOV.Q Rn, (SP, Disp5u)
C: MOV.X (SP, Disp5u), Rn
D: MOV.X Rn, (SP, Disp5u)
E: -
F: -

Maybe they can fit "all this" into 14 bits or so, but this would be
"kinda lame".

>>
>> Vs in BJX2, where I ended up with a 16/32 encoding that, while it works,
>> it is not entirely "endian consistent".
>>
>> Though, this aspect of RISC-V is partly counter-acted by the bit-pattern
>> layout in many of its immediate fields being a chewed up mess (so
>> extracting or inserting an immediate into an instruction requires an
>> unholy mess of bit-twiddling).
> <
> Some immediates are taken (and assembled) from 3 different fields within
> an instruction.

Both JAL and Bxx are 4 sub-fields.
This is really annoying IMO.
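
To show what the bit-twiddling looks like, here is a sketch of extracting the branch (B-type) immediate, following the field placement in the RISC-V base ISA: imm[12] in bit 31, imm[10:5] in bits 30:25, imm[4:1] in bits 11:8, and imm[11] in bit 7. The helper names are illustrative.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def decode_b_imm(insn: int) -> int:
    """Reassemble the 13-bit branch offset from its four sub-fields."""
    imm = (
        (((insn >> 31) & 0x1) << 12)    # imm[12]
        | (((insn >> 7) & 0x1) << 11)   # imm[11]
        | (((insn >> 25) & 0x3F) << 5)  # imm[10:5]
        | (((insn >> 8) & 0xF) << 1)    # imm[4:1]; imm[0] is always 0
    )
    return sign_extend(imm, 13)

print(decode_b_imm(0x00000463))  # 8   (beq x0, x0, +8)
print(decode_b_imm(0xFE000EE3))  # -4  (beq x0, x0, -4)
```

With a contiguous field, the same job would be a single shift and sign extension.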

The situation for the 'C' extension is worse here, which is a partial
reason I have not yet implemented this part in my attempts.

Meanwhile, in BJX2:
Imm9 / Disp9: Contiguous bits
Disp10: Contiguous bits
Imm16: Also contiguous
Disp20: Two sub-fields, technically endian reversed if seen as LE.
Imm24: ("LDIz Imm24, R0" and Jumbo prefixes), also two sub-fields.

Typically, the instructions are presented and decoded "as-if" they were
big-endian, even if the individual 16-bit instruction words are
little-endian in memory.

Most of the register encodings are not bit-contiguous. Sign extension
and Jumbo encodings get a little wonky.

Though, for Jumbo encodings, pretty much all the
Imm9u/Imm9n/Imm10u/Imm10n are unified as Imm33s.

So, say we have a Jumbo encoding:
FEaa-bbbb-F2nm-0ecc

Layout, if seen as a 64-bit little-endian value, would be:
0eccF2nmbbbbFEaa

With the immed field:
saabbbbcc
With the e.i bit interpreted as a sign-extension bit(s):
0: Value is 0 extended to 64 bits;
1: Value is 1 extended to 64 bits.
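
A sketch of the layout just described, under stated assumptions: the four 16-bit words are stored little-endian in the order given, and the immediate is the concatenation aa:bbbb:cc with s on top. Exactly which bit of the e nibble supplies s is not spelled out here, so the sketch takes s as a separate argument; the function names are hypothetical.

```python
import struct

def as_le64(words) -> int:
    """View four 16-bit instruction words (in memory order) as one
    64-bit little-endian value."""
    return struct.unpack("<Q", struct.pack("<4H", *words))[0]

def jumbo_imm33(words, s: int) -> int:
    """Assemble the s:aa:bbbb:cc immediate.
    s is passed in by the caller (its bit position is an assumption
    left open here): 0 zero-extends, 1 one-extends."""
    aa = words[0] & 0xFF    # low byte of the FEaa word
    bbbb = words[1]         # the whole bbbb word
    cc = words[3] & 0xFF    # low byte of the 0ecc word
    low32 = (aa << 24) | (bbbb << 8) | cc
    return low32 - (s << 32)

# FEaa-bbbb-F2nm-0ecc with aa=12, bbbb=3456, nm=AB, e=0, cc=78
words = [0xFE12, 0x3456, 0xF2AB, 0x0078]
print(hex(as_le64(words)))         # 0x78f2ab3456fe12
print(hex(jumbo_imm33(words, 0)))  # 0x12345678
```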

For base opcodes, whether the value is zero or one extended depends on
the instruction (effectively, the opcode encodes an implicit sign bit
for cases where both zero- and one-extended encodings exist).

IIRC, I went for this approach early on as it seemed like a more
efficient way to use encoding space (with a slightly larger usable range
in many cases) than the use of sign-extended values.
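
The "slightly larger usable range" can be illustrated with a sketch of a 9-bit field decoded two ways, depending on the opcode. The names `imm9u` and `imm9n` mirror the notation used above; the decoding itself is an assumption about what zero and one extension mean here.

```python
def imm9u(field: int) -> int:
    """Zero-extended 9-bit immediate: 0 .. 511."""
    return field & 0x1FF

def imm9n(field: int) -> int:
    """One-extended 9-bit immediate: all higher bits are 1,
    giving -512 .. -1 in two's complement."""
    return (field & 0x1FF) - (1 << 9)

print(imm9u(0x1FF), imm9n(0x1FF), imm9n(0x000))  # 511 -1 -512

# Together the two opcodes cover the same values as one signed 10-bit field.
covered = {imm9u(f) for f in range(512)} | {imm9n(f) for f in range(512)}
print(covered == set(range(-512, 512)))  # True
```

An instruction that only has the zero-extended form still gets the full 0..511 range from its 9 bits, where a sign-extended 9-bit field would stop at 255.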

Though, from a cost POV, one could argue that it could be cheaper for an
ISA to *only* provide zero-extended immediate values in most cases
(where, in a few cases, there is a separate opcode for "interpret this
value as negative"; eg, sign-magnitude rather than twos complement with
one-extension).

Though, one drawback is that it would imply that loading a negative
constant or similar would necessarily need to go through the ALU to
negate the value (and imposing a more significant limit on which sorts
of instructions could accept a negative immediate).

Granted, one could deal with negation in the register-fetch logic, but
this is likely to be more expensive than dealing with them via
"one-extension".

I could have gone either way, and it ended up going the direction it did
I guess.

Re: Drastic Simplification of Concertina II Coming

<307c39c0-35a6-4ed8-9d30-2479cebc86fan@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28040&group=comp.arch#28040

X-Received: by 2002:ac8:5716:0:b0:39c:c97e:ccea with SMTP id 22-20020ac85716000000b0039cc97ecceamr5062833qtw.192.1665768900232;
Fri, 14 Oct 2022 10:35:00 -0700 (PDT)
X-Received: by 2002:a05:6870:785:b0:131:e39c:9140 with SMTP id
en5-20020a056870078500b00131e39c9140mr3422356oab.261.1665768899963; Fri, 14
Oct 2022 10:34:59 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 14 Oct 2022 10:34:59 -0700 (PDT)
In-Reply-To: <tib0tn$21i5l$2@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:a590:769:e7a8:2e4f;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:a590:769:e7a8:2e4f
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<65d917e8-e21c-4f22-bd11-884d167cefb9n@googlegroups.com> <74ac086e-80c0-4684-8ea9-76793e776416n@googlegroups.com>
<ti8m7c$1p16i$2@dont-email.me> <dc332703-8507-45a3-82e0-ebdea5d80427n@googlegroups.com>
<ti9qj5$1s2vj$2@dont-email.me> <965e2aa7-bdf4-4b4c-8a29-bb903b5a80b5n@googlegroups.com>
<tib0tn$21i5l$2@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <307c39c0-35a6-4ed8-9d30-2479cebc86fan@googlegroups.com>
Subject: Re: Drastic Simplification of Concertina II Coming
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 14 Oct 2022 17:35:00 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 11958

by: MitchAlsup - Fri, 14 Oct 2022 17:34 UTC

On Friday, October 14, 2022 at 1:51:40 AM UTC-5, BGB wrote:
> On 10/13/2022 5:33 PM, MitchAlsup wrote:
> > On Thursday, October 13, 2022 at 2:57:28 PM UTC-5, BGB wrote:
> >> On 10/13/2022 1:25 PM, MitchAlsup wrote:
> >>> On Thursday, October 13, 2022 at 4:36:48 AM UTC-5, BGB wrote:
> >>>> On 10/12/2022 11:01 PM, Quadibloc wrote:
> >>>>> On Wednesday, October 12, 2022 at 1:15:20 PM UTC-6, MitchAlsup wrote:
> >>>>>
> >>>>>> Basically you are putting too much stuff in a 5 pound bag.
> >>>>>
> >>>>> Well, it _is_ always possible to implement a subset.
> >>>>>
> >>>> Scalability is always a fun issue.
> >>>>
> >>>> One can try to design a minimal ISA.
> >>>> * But, RISC-V is already a fairly sensible design in this space.
> >>>>
> >>>>
> >>>> Namely, RV32I and RV64I are pretty sane.
> >>> <
> >>> 12-bit displacements are sane ? maybe at the 85% level, but not at the 95% level.
> >>> Having the OpCode at the little end is sane ?
> >> It is what it is.
> >> That is basically the biggest they can get away with with their
> >> instruction formats.
> > <
> > It may be "better than nothing" but experience with SPARC (13-bit immediates)
> > indicates 12-bits is considerably smaller than desired. Looking at ASM code
> > out of MY 66000 LLVM compiler and the RISC-V LLVM compiler indicates a
> > useful advantage in having 16-bits over 12-bits (something like 8% of the
> > instructions disappear having 16-bits instead of just 12-bits)
> Fitting larger immediate and displacement values into a 32-bit
> instruction word becomes a problem.
>
> Say:
> Disp: 16b
> Regs: 2*5b
> Ld/St Type: 4b
> Rest: 2b
>
> One would burn 1/4 of the possible encoding space *just* on the
> Load/Store displacement.
<
I "waste" 3/16ths of the space on memory refs {7LDs, 4STs}
>
>
> 12 bits works OK for a simple RISC style ISA since it doesn't wreck
> everything else.
>
16-bits did not "wreck" MIPS, or Mc 88K
>
> Meanwhile, I went with 9 and 10 bits for a lot of stuff, but my ISA
> allows predication, bundling, and also an encoding space for 16-bit ops.
>
> With scaling, I get around a 98% hit-rate for Disp9u, which is similar
> to the non-scaled Disp12s in RISC-V.
<
Try EMBench and CoreMark.
<
snip
<
> >> Having the opcode and length bits in the LSB makes sense if one wants a
> >> 16/32 ISA on a little endian machine and wants consistent bit ordering.
> > <
> > Having thought about this:: I submit that were My 66000 ISA to have a 16-bit
> > "instruction" I would use the major opcode and the minor OpCode of the
> > 3-operand instructions as the pair of OpCodes residing in this 32-bit container.
> > The 3-operand format has four (4) 5-bit register specifiers and is a natural
> > for having 2 "16-bit instructions" in a 32-bit word. These register specifiers
> > would support 2 instructions with 2 Rd registers, and 2 Rs registers and
> > the 11-bits of Major and Minor OpCodes would allow a sufficient number of
> > "16-bit instructions".
> > <
> > But I would NOT consume 3/4's of the OpCode space for 16-bit instructions.
> Yeah, errm, originally with BJX2 it was 7/8 of the space for 16-bit ops.
>
> Now it is 3/4 due to the XGPR encodings, but they are not contiguous:
> F, E, 7, 9
>
> Though, at the time this was the best I could do without seriously
> breaking binary compatibility.
>
>
>
> One does sorta need to burn a lot of potential encoding space on the
> 16-bit space for the 16-bit encodings to be "actually usable".
>
> Like, if all one has in 16-bit land is, say:
> 0: MOV Rm, Rn
> 1: ADD Rm, Rn
> 2.0: BRA Disp8s
> 2.1: ADD Disp8s, SP
> 2.2: -
> 2.3: -
> 3.00: JMP Rn
> 3.01: -
> ..
> 3.1F: -
> 4: LDI Imm5u, Rn
> 5: LDI Imm5n, Rn
> 6: ADD Imm5u, Rn
> 7: ADD Imm5n, Rn
> 8: MOV.L (SP, Disp5u), Rn
> 9: MOV.Q (SP, Disp5u), Rn
> A: MOV.L Rn, (SP, Disp5u)
> B: MOV.Q Rn, (SP, Disp5u)
> C: MOV.X (SP, Disp5u), Rn
> D: MOV.X Rn, (SP, Disp5u)
> E: -
> F: -
>
> Maybe they can fit "all this" into 14 bits or so, but this would be
> "kinda lame".
<
snip
<
> > Some immediates are taken (and assembled) from 3 different fields within
> > an instruction.
> Both JAL and Bxx are 4 sub-fields.
> This is really annoying IMO.
>
> The situation for the 'C' extension is worse here, which is a partial
> reason I have not yet implemented this part in my attempts.
>
The C-extension moves the register specifiers around and mandates
a MUX between the instruction (showing up) and the register file
port decoders.
>
> Meanwhile, in BJX2:
> Imm9 / Disp9: Contiguous bits
> Disp10: Contiguous bits
> Imm16: Also contiguous
> Disp20: Two sub-fields, technically endian reversed if seen as LE.
> Imm24: ("LDIz Imm24, R0" and Jumbo prefixes), also two sub-fields.
<
In My 66000 all immediates and displacements are contiguous.
>
> Typically, the instructions are presented and decoded "as-if" they were
> big-endian, even if the individual 16-bit instruction words are
> little-endian in memory.
>
> Most of the register encodings are not bit-contiguous. Sign extension
> and Jumbo encodings get a little wonky.
>
> Though, for Jumbo encodings, pretty much all the
> Imm9u/Imm9n/Imm10u/Imm10n are unified as Imm33s.
>
>
> So, say we have a Jumbo encoding:
> FEaa-bbbb-F2nm-0ecc
>
> Layout, if seen as a 64-bit little-endian value, would be:
> 0eccF2nmbbbbFEaa
>
> With the immed field:
> saabbbbcc
> With the e.i bit interpreted as a sign-extension bit(s):
> 0: Value is 0 extended to 64 bits;
> 1: Value is 1 extended to 64 bits.
>
> For base opcodes, whether the value is zero or one extended depends on
> the instruction (effectively, the opcode encodes an implicit sign bit
> for cases where both zero- and one-extended encodings exist).
>
>
> IIRC, I went for this approach early on as it seemed like a more
> efficient way to use encoding space (with a slightly larger usable range
> in many cases) than the use of sign-extended values.
>
>
> Though, from a cost POV, one could argue that it could be cheaper for an
> ISA to *only* provide zero-extended immediate values in most cases
> (where, in a few cases, there is a separate opcode for "interpret this
> value as negative"; eg, sign-magnitude rather than twos complement with
> one-extension).
<
The compiler people will not accept this--even though the address arithmetic
is "conformable" -- that is:: takes no more instructions.
>
> Though, one drawback is that it would imply that loading a negative
> constant or similar would necessarily need to go through the ALU to
> negate the value (and imposing a more significant limit on which sorts
> of instructions could accept a negative immediate).
<
I have sign control over My 66000 operands, so one would not have to
"go through" the ALU to get negated or inverted.
>
> Granted, one could deal with negation in the register-fetch logic, but
> this is likely to be more expensive than dealing with them via
> "one-extension".
>
> I could have gone either way, and it ended up going the direction it did
> I guess.
>
> ..
> >>
> >>
> >>
> >> Arguably, 12-bits is bigger than 9-bits (in BJX2), but the combination
> >> of (in BJX2):
> >> Displacements are unsigned and scaled;
> >> There are both register-indexed and Jumbo fallback cases;
> >> ...
> >>
> >> Can compensate:
> >> (9u*8) can address 4K, whereas (12s*1) can only address 2K.
> >> LDIZ+Ld/St: +/- 16MB with Byte scale
> >> LDIZ+Ld/St: +/- 128MB with QWord scale
> >> Jumbo+Ld/St is up to +/- 32GB
> >>
> >> Noting as how both negative and misaligned Load/Store displacements are
> >> rare, and if one can fall back to a 2-op sequence if needed, this works
> >> OK. In RISC-V, this case would require 3 ops (LUI,ADD,Ld/St).
> >>
> >>
> >> Argument against register-index (like in BJX2), is that it sets the
> >> minimum implementation as having a 3R1W register file. Though this is
> > <
> > 3R1W is necessary if you want to perform FMAC r = a*b+c, or bit-field
> > INSert, or CMOV for that matter...
<
> One would generally not have any of these in a microcontroller class CPU.
<
What if the microcontroller is performing filtering ?
>
> My idea of a hacked up RISC-V would mostly be something that could
> potentially be slightly cheaper to implement in a microcontroller than a
> proper RISC-V.
>
> But, not likely to be all that useful for much outside the
> microcontroller space.
>
>
> One could argue for a CPU like, say:
> Fixed-length instructions;
> Aligned-only memory access;
> No FPU or MMU;
> Only Load/Store and basic ALU ops and similar;
> ...
>
> Many ISAs in this class also omit things like integer multiply and shift
> instructions (forcing these to be faked in software, awkwardly, using
> loops and instruction slides and similar).
>
>
> Say:
> ...
> __shll_4:
> ADD R4, R4, R4
> __shll_3:
> ADD R4, R4, R4
> __shll_2:
> ADD R4, R4, R4
> __shll_1:
> ADD R4, R4, R4
> __shll_0:
> JMP R1
>
> __shll: //variable shift
> AND R5, R5, 31
> LDI R7, 31
> SUB R7, R7, R5
> MOV R6, __shll_31
> ADD R7, R7, R7
> ADD R7, R7, R7
> ADD R6, R6, R7
> JMP R6
>
> Sucks for performance, but many microcontrollers have seemingly found
> this sort of thing to be a reasonable tradeoff.
>
>
> Though, one can make a case for having general purpose shift and
> multiply instructions (and there are ways to implement them that, while
> still slow, aren't too expensive in terms of LUTs; and are basically a
> non-issue for most FPGAs).
>
With a 4-wide medium OoO machine coming in under 2mm^2, the bonding
pads to DRAM consume more die area than the processor. And the above
has a double precision FMAC unit. (BOOM, btw)
>
> Granted, there are also cases where one wants the microcontroller to be
> at least sort of fast, rather than "as cheap as possible" (and Microchip
> and Texas Instruments already have a pretty good hold on this market
> segment).
> > <
> >> "mostly" a reasonable tradeoff IMO. Though one can argue that a
> >> microcontroller or similar does not want or need a 3R1W register file,
> >> if 2R1W can be made to work.
> >>
> >

Re: Drastic Simplification of Concertina II Coming

<tidkgt$2inp4$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=28042&group=comp.arch#28042

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Drastic Simplification of Concertina II Coming
Date: Sat, 15 Oct 2022 01:38:20 -0500
Organization: A noiseless patient Spider
Lines: 478
Message-ID: <tidkgt$2inp4$1@dont-email.me>
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<65d917e8-e21c-4f22-bd11-884d167cefb9n@googlegroups.com>
<74ac086e-80c0-4684-8ea9-76793e776416n@googlegroups.com>
<ti8m7c$1p16i$2@dont-email.me>
<dc332703-8507-45a3-82e0-ebdea5d80427n@googlegroups.com>
<ti9qj5$1s2vj$2@dont-email.me>
<965e2aa7-bdf4-4b4c-8a29-bb903b5a80b5n@googlegroups.com>
<tib0tn$21i5l$2@dont-email.me>
<307c39c0-35a6-4ed8-9d30-2479cebc86fan@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 15 Oct 2022 06:38:21 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="125ee12be3541046de484e21c0067fe2";
logging-data="2711332"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+yJ77SOo3NSGnL17MHt2Ai"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.3.2
Cancel-Lock: sha1:57hhA4FvnW6YEGdAT6hAr8qqMGM=
Content-Language: en-US
In-Reply-To: <307c39c0-35a6-4ed8-9d30-2479cebc86fan@googlegroups.com>

by: BGB - Sat, 15 Oct 2022 06:38 UTC

On 10/14/2022 12:34 PM, MitchAlsup wrote:
> On Friday, October 14, 2022 at 1:51:40 AM UTC-5, BGB wrote:
>> On 10/13/2022 5:33 PM, MitchAlsup wrote:
>>> On Thursday, October 13, 2022 at 2:57:28 PM UTC-5, BGB wrote:
>>>> On 10/13/2022 1:25 PM, MitchAlsup wrote:
>>>>> On Thursday, October 13, 2022 at 4:36:48 AM UTC-5, BGB wrote:
>>>>>> On 10/12/2022 11:01 PM, Quadibloc wrote:
>>>>>>> On Wednesday, October 12, 2022 at 1:15:20 PM UTC-6, MitchAlsup wrote:
>>>>>>>
>>>>>>>> Basically you are putting too much stuff in a 5 pound bag.
>>>>>>>
>>>>>>> Well, it _is_ always possible to implement a subset.
>>>>>>>
>>>>>> Scalability is always a fun issue.
>>>>>>
>>>>>> One can try to design a minimal ISA.
>>>>>> * But, RISC-V is already a fairly sensible design in this space.
>>>>>>
>>>>>>
>>>>>> Namely, RV32I and RV64I are pretty sane.
>>>>> <
>>>>> 12-bit displacements are sane? Maybe at the 85% level, but not at the 95% level.
>>>>> Having the OpCode at the little end is sane?
>>>> It is what it is.
>>>> That is basically the biggest they can get away with with their
>>>> instruction formats.
>>> <
>>> It may be "better than nothing" but experience with SPARC (13-bit immediates)
>>> indicates 12-bits is considerably smaller than desired. Looking at ASM code
>>> out of MY 66000 LLVM compiler and the RISC-V LLVM compiler indicates a
>>> useful advantage in having 16-bits over 12-bits (something like 8% of the
>>> instructions disappear having 16-bits instead of just 12-bits)
>> Fitting larger immediate and displacement values into a 32-bit
>> instruction word becomes a problem.
>>
>> Say:
>> Disp: 16b
>> Regs: 2*5b
>> Ld/St Type: 4b
>> Rest: 2b
>>
>> One would burn 1/4 of the possible encoding space *just* on the
>> Load/Store displacement.
> <
> I "waste" 3/16ths of the space on memory refs {7LDs, 4STs}

I was assuming a scheme like (in an RV-like notation):
LDB, LDH, LDW, LDQ
LDUB, LDUH, LDUW, LDX
STB, STH, STW, STQ
-, -, -, STX

But, yeah, 16 op spots with 2 regs and a 16-bit displacement is "pretty
steep".
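
As a rough check of the arithmetic, here is a small sketch that counts how much of a 32-bit encoding space a format takes up. The function name and the way the fields are passed in are only illustrative assumptions, not taken from any ISA spec.

```python
from fractions import Fraction

def encoding_fraction(field_bits, word_bits=32):
    """Fraction of a word_bits-wide encoding space consumed by a format
    whose fields (including any op-selector bits) total sum(field_bits)."""
    used = sum(field_bits)
    if used > word_bits:
        raise ValueError("fields do not fit in the instruction word")
    return Fraction(2 ** used, 2 ** word_bits)

# Disp16 + two 5-bit registers + 4-bit Ld/St type = 30 bits, 2 bits left.
print(encoding_fraction([16, 5, 5, 4]))  # 1/4

# Same layout with a 12-bit displacement, for comparison.
print(encoding_fraction([12, 5, 5, 4]))  # 1/64
```

So the 16 op spots with Disp16 cost 16x as much encoding space as the same ops with Disp12.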

>>
>>
>> 12 bits works OK for a simple RISC style ISA since it doesn't wreck
>> everything else.
>>
> 16-bits did not "wreck" MIPS, or Mc 88K
>>
>> Meanwhile, I went with 9 and 10 bits for a lot of stuff, but my ISA
>> allows predication, bundling, and also an encoding space for 16-bit ops.
>>
>> With scaling, I get around a 98% hit-rate for Disp9u, which is similar
>> to the non-scaled Disp12s in RISC-V.
> <
> Try EMBench and CoreMark.
> <

I briefly looked at CoreMark, but it looked more complicated to port
over, so I didn't do so at the time.

A lot of my stats are with Doom and Quake, which fit reasonably well
into 9-bit displacements.
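
To illustrate why a scaled Disp9u can come close to an unscaled Disp12s, here is a minimal sketch of the two range checks. It assumes the 9-bit field is unsigned and scaled by the access size in bytes, and that the 12-bit field is a signed byte offset; the function names are made up for the example.

```python
def fits_disp9u_scaled(offset, size):
    """True if a byte offset fits an unsigned 9-bit field scaled by the access size."""
    if offset < 0 or offset % size != 0:
        return False  # negative or misaligned offsets are not encodable
    return offset // size < (1 << 9)

def fits_disp12s(offset):
    """True if a byte offset fits a signed, unscaled 12-bit field."""
    return -(1 << 11) <= offset < (1 << 11)

for size in (1, 2, 4, 8):
    reach = ((1 << 9) - 1) * size
    print(f"size={size}: Disp9u reaches 0..{reach}, Disp12s reaches -2048..2047")
```

Under these assumptions the scaled form reaches further than Disp12s for 8-byte accesses, but has less reach for bytes and cannot express negative offsets at all.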

My BtMini3 (backport of my BGBTech3 engine to BJX2), did more frequently
blow out the Load/Store displacement (forcing the use of 64-bit encodings).

This was mostly because it uses a few fairly large structs.

It does get a higher hit-rate for global variables, mostly because it
uses fewer global variables than Doom or Quake (a lot of my own code
tends to rely more on putting context into structs than into global
variables).

> snip
> <
>>>> Having the opcode and length bits in the LSB makes sense if one wants a
>>>> 16/32 ISA on a little endian machine and wants consistent bit ordering.
>>> <
>>> Having thought about this:: I submit that were My 66000 ISA to have a 16-bit
>>> "instruction" I would use the major opcode and the minor OpCode of the
>>> 3-operand instructions as the pair of OpCodes residing in this 32-bit container.
>>> The 3-operand format has four (4) 5-bit register specifiers and is a natural
>>> for having 2 "16-bit instructions" in a 32-bit word. These register specifiers
>>> would support 2 instructions with 2 Rd registers, and 2 Rs registers and
>>> the 11-bits of Major and Minor OpCodes would allow a sufficient number of
>>> "16-bit instructions".
>>> <
>>> But I would NOT consume 3/4's of the OpCode space for 16-bit instructions.
>> Yeah, errm, originally with BJX2 it was 7/8 of the space for 16-bit ops.
>>
>> Now it is 3/4 due to the XGPR encodings, but they are not contiguous:
>> F, E, 7, 9
>>
>> Though, at the time this was the best I could do without seriously
>> breaking binary compatibility.
>>
>>
>>
>> One does sorta need to burn a lot of potential encoding space on the
>> 16-bit space for the 16-bit encodings to be "actually usable".
>>
>> Like, if all one has in 16-bit land is, say:
>> 0: MOV Rm, Rn
>> 1: ADD Rm, Rn
>> 2.0: BRA Disp8s
>> 2.1: ADD Disp8s, SP
>> 2.2: -
>> 2.3: -
>> 3.00: JMP Rn
>> 3.01: -
>> ..
>> 3.1F: -
>> 4: LDI Imm5u, Rn
>> 5: LDI Imm5n, Rn
>> 6: ADD Imm5u, Rn
>> 7: ADD Imm5n, Rn
>> 8: MOV.L (SP, Disp5u), Rn
>> 9: MOV.Q (SP, Disp5u), Rn
>> A: MOV.L Rn, (SP, Disp5u)
>> B: MOV.Q Rn, (SP, Disp5u)
>> C: MOV.X (SP, Disp5u), Rn
>> D: MOV.X Rn, (SP, Disp5u)
>> E: -
>> F: -
>>
>> Maybe they can fit "all this" into 14 bits or so, but this would be
>> "kinda lame".
> <
> snip
> <
>>> Some immediates are taken (and assembled) from 3 different fields within
>>> an instruction.
>> Both JAL and Bxx are 4 sub-fields.
>> This is really annoying IMO.
>>
>> The situation for the 'C' extension is worse here, which is a partial
>> reason I have not yet implemented this part in my attempts.
>>
> The C-extension moves the register specifiers around and mandates
> a MUX between the instruction (showing up) and the register file
> port decoders.

Yeah. Things like the 3-bit register fields on some instructions are
also annoying.

But what I like less are cases where instructions that appear to share
the same layout have different immediate fields, because bits were
effectively rotated from one side of the immediate to the other based
on operand size or similar.

So, to decode the C extension would be a bit more of an issue than I
would prefer; and a bit more of an issue than the "at least slightly
more regular" encodings used by SuperH or BJX2.
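
For reference, the "four sub-fields" point about JAL and Bxx in the base RISC-V ISA can be sketched as follows. The bit positions follow the J-type and B-type formats from the RISC-V unprivileged spec; the helper names are just for this example.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def decode_jal_imm(insn):
    """J-type: imm[20|10:1|11|19:12] sits in insn bits 31, 30:21, 20, 19:12."""
    imm = ((((insn >> 31) & 0x1) << 20)
           | (((insn >> 12) & 0xFF) << 12)
           | (((insn >> 20) & 0x1) << 11)
           | (((insn >> 21) & 0x3FF) << 1))
    return sign_extend(imm, 21)

def decode_branch_imm(insn):
    """B-type: imm[12|10:5] in insn bits 31, 30:25; imm[4:1|11] in bits 11:8, 7."""
    imm = ((((insn >> 31) & 0x1) << 12)
           | (((insn >> 7) & 0x1) << 11)
           | (((insn >> 25) & 0x3F) << 5)
           | (((insn >> 8) & 0xF) << 1))
    return sign_extend(imm, 13)

print(decode_jal_imm(0x008000EF))     # jal ra, +8
print(decode_jal_imm(0xFFDFF06F))     # jal x0, -4
print(decode_branch_imm(0x00000463))  # beq x0, x0, +8
```

A contiguous immediate would need only one shift, one mask, and the sign extension.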

Also in my case (unlike both RV-C and Thumb), BJX2 lacks any 16-bit 3R
encodings. Though, partly this was a case of noting that it didn't seem
like they would have a high enough "hit rate" to make them worthwhile
(regardless of the subset of registers chosen for the 3-bit register field).

Though, if I were to do it, likely:
R4, R5, R2, R14, R10, R11, R12, R13
Basically, some of the highest-traffic registers.
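
A minimal sketch of what such a 3-bit register field would imply, assuming the field value is simply the index into the list above (the table and function names are hypothetical, since none of this exists in the ISA):

```python
# Hypothetical 3-bit field -> register number mapping, in the order listed.
REG3_MAP = (4, 5, 2, 14, 10, 11, 12, 13)

def encode_reg3(reg):
    """Return the 3-bit field value for a register, or None if it is not reachable."""
    return REG3_MAP.index(reg) if reg in REG3_MAP else None

def can_use_16bit_3r(rd, rs, rt):
    """A 3R op only fits the 16-bit form if all three registers are in the subset."""
    return all(encode_reg3(r) is not None for r in (rd, rs, rt))

print(can_use_16bit_3r(4, 5, 2))   # True
print(can_use_16bit_3r(4, 5, 3))   # False, R3 is outside the subset
```

The "hit rate" question is then how often the compiler's register allocation lands all three operands inside this subset.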

However, adding anything new to 16-bit land is unlikely, as 16-bit land
is basically already full (would need to find something "sufficiently
low priority" to drop to make room for these).

Looks like (for 3R ops):
ADDS.L, ADDU.L, and ADD
Are the main candidates for "promotion".

4A/4B, 56/57, and 24/25, could (in theory) be used to hold them (these
spots being "basically unused").

A bigger gain would be possible by dropping and replacing the Azzz/Bzzz
blocks ("LDIz Imm12, R0"), or 8zzz (Ld/St MOV.L Disp3), and reusing this
as a dedicated 3R space.

Mostly noting as these currently seem to be some of the lowest-traffic
blocks in the 16-bit encoding space.

With replacement blocks then holding, say:
ADD, SUB, ADDS.L, ADDU.L, MULS, AND, OR, XOR

Upper limit for potential savings: 1.1% off the size of Doom (in
practice, likely to be a fair bit less).

Would require a bit more looking into to determine whether it would be
"actually worth it" (and any of these options would break binary
compatibility with existing code).

Major high-traffic areas in 16-bit land being:
MOV 2R
BRA, BT, BF (Disp8)
Ld/St (SP, Disp4), Rn
ADD Imm8s, Rn
LDI Imm8s, Rn
CMPxx
...

>>
>> Meanwhile, in BJX2:
>> Imm9 / Disp9: Contiguous bits
>> Disp10: Contiguous bits
>> Imm16: Also contiguous
>> Disp20: Two sub-fields, technically endian reversed if seen as LE.
>> Imm24: ("LDIz Imm24, R0" and Jumbo prefixes), also two sub-fields.
> <
> In My 66000 all immediates and displacements are contiguous.


Re: Drastic Simplification of Concertina II Coming

<tie0p1$1m7vi$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=28043&group=comp.arch#28043

Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-672-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Drastic Simplification of Concertina II Coming
Date: Sat, 15 Oct 2022 10:07:29 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <tie0p1$1m7vi$1@newsreader4.netcologne.de>
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<779208a3-574e-42c4-a52b-28009c92dca5n@googlegroups.com>
Injection-Date: Sat, 15 Oct 2022 10:07:29 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-672-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd6:672:0:7285:c2ff:fe6c:992d";
logging-data="1777650"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Sat, 15 Oct 2022 10:07 UTC

Quadibloc <jsavard@ecn.ab.ca> schrieb:
> On Wednesday, October 12, 2022 at 1:44:17 AM UTC-6, Quadibloc wrote:
>
>> A second half that does predication can be required to only appear in
>> the last 32-bit instruction slot of the block preceding the block
>> containing the instructions it affects.
>
> If the second halves that do header functions affect the next block,
> rather than the next eight instructions, then they can be allowed to
> appear in any position, thus not interfering with the desired sequence
> of instructions to the same extent.

Being able to bundle two 16-bit instructions on a 32-bit boundary
only will make for interesting challenges - what should be done
if there is nothing reasonable that fits in 16 bits?

And what do you do if you want to do predication, but have nothing
reasonable to fit in the first instruction half?

The answer to both would probably be "insert a nop".

A compiler could try to split 32-bit operations like

ADD Ra,Rb,Rc

into two 16-bit instructions like

MR Ra,Rb
ADD Ra,Rc

distribute them to make more 16-bit instructions and rely on OOO
implementations to elide the register move, but this is less than
elegant, decoding needs more power anyway, and compiler writers
will not love this.
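
A rough sketch of the split described above, with a tiny interpreter to check that the result is unchanged. It assumes destination-first operand order (ADD Ra,Rb,Rc means Ra = Rb + Rc) and that the 2-operand ADD accumulates into its first operand; the function names are invented for the example.

```python
def split_3op_add(rd, rs1, rs2):
    """Rewrite 'ADD rd,rs1,rs2' as a sequence of 2-operand instructions."""
    if rd == rs1:
        return [("ADD", rd, rs2)]          # already 2-operand in effect
    if rd == rs2:
        return [("ADD", rd, rs1)]          # ADD commutes; MR would clobber rs2
    return [("MR", rd, rs1), ("ADD", rd, rs2)]

def run(ops, regs):
    """Execute MR/ADD 2-operand ops on a copy of the register dict."""
    regs = dict(regs)
    for op, dst, src in ops:
        if op == "MR":
            regs[dst] = regs[src]
        elif op == "ADD":
            regs[dst] = regs[dst] + regs[src]
    return regs

start = {"Ra": 1, "Rb": 20, "Rc": 300}
print(split_3op_add("Ra", "Rb", "Rc"))
print(run(split_3op_add("Ra", "Rb", "Rc"), start)["Ra"])  # 320
```

Note the case where the destination equals the second source: a naive MR followed by ADD would give the wrong result there, and a non-commutative op such as SUB would need a scratch register.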

Re: Drastic Simplification of Concertina II Coming

<3f810617-404d-4304-9c95-f38022100f5dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28044&group=comp.arch#28044

X-Received: by 2002:a37:be02:0:b0:6ed:1b73:a5a5 with SMTP id o2-20020a37be02000000b006ed1b73a5a5mr1912122qkf.214.1665844051354;
Sat, 15 Oct 2022 07:27:31 -0700 (PDT)
X-Received: by 2002:a05:6871:81e:b0:125:66d3:dac with SMTP id
q30-20020a056871081e00b0012566d30dacmr96317oap.1.1665843948600; Sat, 15 Oct
2022 07:25:48 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 15 Oct 2022 07:25:48 -0700 (PDT)
In-Reply-To: <tie0p1$1m7vi$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:c4f5:5620:3650:b3af;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:c4f5:5620:3650:b3af
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<779208a3-574e-42c4-a52b-28009c92dca5n@googlegroups.com> <tie0p1$1m7vi$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3f810617-404d-4304-9c95-f38022100f5dn@googlegroups.com>
Subject: Re: Drastic Simplification of Concertina II Coming
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Sat, 15 Oct 2022 14:27:31 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 1793

by: Quadibloc - Sat, 15 Oct 2022 14:25 UTC

On Saturday, October 15, 2022 at 4:07:33 AM UTC-6, Thomas Koenig wrote:

> The answer to both would probably be "insert a nop".

Yes.

> A compiler could try to split 32-bit operations

I wouldn't recommend that, because it would slow down the
program. I intend it to be possible that implementations of
my architecture be simple. After all, I am including the feature
of explicitly indicating that some instructions can be executed
in parallel - to allow efficient VLIW execution even without OoO.

John Savard

Re: Drastic Simplification of Concertina II Coming

<tielif$2qj9c$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=28045&group=comp.arch#28045

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Drastic Simplification of Concertina II Coming
Date: Sat, 15 Oct 2022 11:02:23 -0500
Organization: A noiseless patient Spider
Lines: 42
Message-ID: <tielif$2qj9c$2@dont-email.me>
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<779208a3-574e-42c4-a52b-28009c92dca5n@googlegroups.com>
<tie0p1$1m7vi$1@newsreader4.netcologne.de>
<3f810617-404d-4304-9c95-f38022100f5dn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 15 Oct 2022 16:02:24 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="125ee12be3541046de484e21c0067fe2";
logging-data="2968876"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX180zVI4OiG5NXToHuY69FNQ"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.3.2
Cancel-Lock: sha1:F102AxurRIiaWZL/23wbhFuBME8=
Content-Language: en-US
In-Reply-To: <3f810617-404d-4304-9c95-f38022100f5dn@googlegroups.com>

by: BGB - Sat, 15 Oct 2022 16:02 UTC

On 10/15/2022 9:25 AM, Quadibloc wrote:
> On Saturday, October 15, 2022 at 4:07:33 AM UTC-6, Thomas Koenig wrote:
>
>> The answer to both would probably be "insert a nop".
>
> Yes.
>
>> A compiler could try to split 32-bit operations
>
> I wouldn't recommend that, because it would slow down the
> program. I intend it to be possible that implementations of
> my architecture be simple. After all, I am including the feature
> of explicitly indicating that some instructions can be executed
> in parallel - to allow efficient VLIW execution even without OoO.
>

This was a plan for my BJX2 ISA as well.

Sadly, it isn't even really solidly beating some in-order superscalar
cores, at least on some benchmarks.

Doom and Quake are still pretty slow on RISC-V.

So, despite the much higher Dhrystone number, a 33 MHz RISC-V (RV32GC)
core still can't exactly give playable framerates in Quake.

Even Doom "isn't looking too hot".

Otherwise, I made the observation that, right now, the problem with
2-way and 4-way set-associative L2 isn't necessarily that it increases
the overall miss rate in Doom (the overall L2 miss rate does appear to
go down with 4-way vs 1-way). Rather, it seems to cause a significant
increase in the miss rate for VRAM requests. This implies that, with
the 4-way cache, the running program and the display framebuffer come
into conflict more often (evicting VRAM contents and tying things up
reading the framebuffer back in from RAM).

> John Savard

Re: Drastic Simplification of Concertina II Coming

<tielvj$1mm87$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=28046&group=comp.arch#28046

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-672-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Drastic Simplification of Concertina II Coming
Date: Sat, 15 Oct 2022 16:09:23 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <tielvj$1mm87$1@newsreader4.netcologne.de>
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<779208a3-574e-42c4-a52b-28009c92dca5n@googlegroups.com>
<tie0p1$1m7vi$1@newsreader4.netcologne.de>
<3f810617-404d-4304-9c95-f38022100f5dn@googlegroups.com>
Injection-Date: Sat, 15 Oct 2022 16:09:23 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-672-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd6:672:0:7285:c2ff:fe6c:992d";
logging-data="1792263"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Sat, 15 Oct 2022 16:09 UTC

Quadibloc <jsavard@ecn.ab.ca> schrieb:
> On Saturday, October 15, 2022 at 4:07:33 AM UTC-6, Thomas Koenig wrote:
>
>> The answer to both would probably be "insert a nop".
>
> Yes.
>
>> A compiler could try to split 32-bit operations

Context restored:

# like

# ADD Ra,Rb,Rc

# into two 16-bit instructions like

# MR Ra,Rb
# ADD Ra,Rc

# distribute them to make more 16-bit instructions and rely on OOO
# implementations to elide the register move,

> I wouldn't recommend that, because it would slow down the
> program.

You snipped the remark about eliding the register move. This
would not take time on a modern implementation, register to
register moves are free (as long as you don't do too many of
them).

> I intend it to be possible that implementations of
> my architecture be simple.

A simple architecture would use 32-bit instructions only.
Adding 16-bit instructions adds complexity. Adding them in such
a way that you will have to add lots of 16-bit nops negates
the benefit for the added complexity.

> After all, I am including the feature
> of explicitly indicating that some instructions can be executed
> in parallel - to allow efficient VLIW execution even without OoO.

The Alpha 21064 managed dual-issue superscalar execution without
It was introduced in 1992, and had 1.68 million transistors...

Re: Drastic Simplification of Concertina II Coming

<tiepfk$2qvu2$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=28047&group=comp.arch#28047

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Drastic Simplification of Concertina II Coming
Date: Sat, 15 Oct 2022 12:09:08 -0500
Organization: A noiseless patient Spider
Lines: 93
Message-ID: <tiepfk$2qvu2$1@dont-email.me>
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<779208a3-574e-42c4-a52b-28009c92dca5n@googlegroups.com>
<tie0p1$1m7vi$1@newsreader4.netcologne.de>
<3f810617-404d-4304-9c95-f38022100f5dn@googlegroups.com>
<tielvj$1mm87$1@newsreader4.netcologne.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 15 Oct 2022 17:09:08 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="125ee12be3541046de484e21c0067fe2";
logging-data="2981826"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/IuM1J9N+7pF7MOAPOeoa6"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.3.2
Cancel-Lock: sha1:AAmgBUKsqqiu83GqTRxCrjBrILU=
In-Reply-To: <tielvj$1mm87$1@newsreader4.netcologne.de>
Content-Language: en-US

by: BGB - Sat, 15 Oct 2022 17:09 UTC

On 10/15/2022 11:09 AM, Thomas Koenig wrote:
> Quadibloc <jsavard@ecn.ab.ca> schrieb:
>> On Saturday, October 15, 2022 at 4:07:33 AM UTC-6, Thomas Koenig wrote:
>>
>>> The answer to both would probably be "insert a nop".
>>
>> Yes.
>>
>>> A compiler could try to split 32-bit operations
>
> Context restored:
>
> # like
>
> # ADD Ra,Rb,Rc
>
> # into two 16-bit instructions like
>
> # MR Ra,Rb
> # ADD Ra,Rc
>
> # distribute them to make more 16-bit instructions and rely on OOO
> # implementations to elide the register move,
>
>> I wouldn't recommend that, because it would slow down the
>> program.
>
> You snipped the remark about eliding the register move. This
> would not take time on a modern implementation, register to
> register moves are free (as long as you don't do too many of
> them).
>

FWIW: In my case, unbundled MOV takes 1 cycle, bundling a MOV can reduce
it to 0 cycle, but depends mostly on the compiler emitting the right
instructions in the right order.

>> I intend it to be possible that implementations of
>> my architecture be simple.
>
> A simple architecture would use 32-bit instructions only.
> Adding 16-bit instructions adds complexity. Adding them in such
> a way that you will have to add lots of 16-bit nops negates
> the benefit for the added complexity.
>

Agreed. I had partly evaluated adding (partial) alignment restrictions
to my ISA (mostly that bundles would be required to have 32-bit
alignment), but then realized that the extra padding hassles had a bad
enough impact on code density that it was not worthwhile.

So, ended up staying with everything having a 16-bit alignment.
Had also looked into 24-bit ops (with everything dropping to byte
alignment in this case), but this would have had similar issues. This
idea was later dropped (its encoding space being reassigned to the XGPR
extension).
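
The padding cost can be estimated with a sketch like the following. It assumes a stream of 16-bit and 32-bit-multiple items, that only some items (the bundles) require 32-bit alignment, and that misalignment is fixed by inserting one 16-bit NOP; the function name and the sample stream are made up.

```python
def padded_size(stream, pad_bytes=2):
    """stream: iterable of (size_in_bytes, needs_4_byte_alignment).
    Returns (total_bytes, padding_bytes) after inserting 16-bit NOPs."""
    pc = 0
    padding = 0
    for size, aligned in stream:
        if aligned and pc % 4 != 0:
            pc += pad_bytes       # one 16-bit NOP to reach a 32-bit boundary
            padding += pad_bytes
        pc += size
    return pc, padding

# 16-bit op, 64-bit bundle, 16-bit op, 32-bit op, 64-bit bundle
stream = [(2, False), (8, True), (2, False), (4, False), (8, True)]
total, pad = padded_size(stream)
print(total, pad)  # 28 4
```

Here 4 of 28 bytes are padding. The real overhead depends on how often bundles follow an odd number of 16-bit ops.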

>> After all, I am including the feature
>> of explicitly indicating that some instructions can be executed
>> in parallel - to allow efficient VLIW execution even without OoO.
>
> The Alpha 21064 managed dual-issue superscalar execution without
> It was introduced in 1992, and had 1.68 million transistors...

Superscalar can work well here.

It just sort of replaces encoding it in the instruction, with logic to
detect whether or not two instructions will alias and can be fit into
the pipeline.

If it is "conservative" it doesn't necessarily need to have full
understanding of the ISA, merely blacklist parts that are not valid
prefix instructions or valid suffix instructions; and check register
fields to make sure they don't clash (with per-block encoding of 2R vs
3R, etc).

Say (if no register clash):
VP VS //Execute in parallel
VS VP //Swap places, Execute in parallel

In this case, it is more a question of "can the compiler beat superscalar?"
In theory, it should be able to, as it has more ability to analyze and
shuffle things in this case.

Not so "cut and dry" in practice.
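
A minimal sketch of such a "conservative" pairing check, under my reading of the above: each instruction is only tagged as a valid prefix and/or valid suffix, and any register overlap involving a destination blocks pairing. The `Insn` record and its field names are hypothetical, not taken from any actual core.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Insn:
    name: str
    dst: Optional[int]          # destination register, if any
    srcs: Tuple[int, ...]       # source registers
    ok_prefix: bool = True      # may occupy the first lane
    ok_suffix: bool = True      # may occupy the second lane

def regs_clash(a, b):
    """Conservative: any overlap between one op's destination and the
    other op's registers counts as a clash."""
    a_regs = set(a.srcs) | ({a.dst} if a.dst is not None else set())
    b_regs = set(b.srcs) | ({b.dst} if b.dst is not None else set())
    return ((a.dst is not None and a.dst in b_regs)
            or (b.dst is not None and b.dst in a_regs))

def try_pair(a, b):
    """Return (prefix, suffix) if a and b can issue together, else None."""
    if regs_clash(a, b):
        return None
    if a.ok_prefix and b.ok_suffix:
        return (a, b)
    if b.ok_prefix and a.ok_suffix:
        return (b, a)           # swap places, then execute in parallel
    return None

add = Insn("ADD", 3, (1, 2))
load = Insn("LD", 4, (5,), ok_prefix=False)
dep = Insn("SUB", 6, (3, 1))
print(try_pair(load, add) == (add, load))  # True, swapped
print(try_pair(add, dep))                  # None, SUB reads ADD's result
```

The check never needs to know what the instructions do, only which lanes they are allowed in and which register fields they touch.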

Re: Drastic Simplification of Concertina II Coming

<a967ca98-cb45-4b92-8fd9-7c0d9791448cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28050&group=comp.arch#28050

X-Received: by 2002:a05:620a:178d:b0:6ee:9241:89c8 with SMTP id ay13-20020a05620a178d00b006ee924189c8mr3187530qkb.194.1665874535848;
Sat, 15 Oct 2022 15:55:35 -0700 (PDT)
X-Received: by 2002:a05:6871:68b:b0:132:9af1:62fb with SMTP id
l11-20020a056871068b00b001329af162fbmr11778282oao.23.1665874535525; Sat, 15
Oct 2022 15:55:35 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 15 Oct 2022 15:55:35 -0700 (PDT)
In-Reply-To: <tielvj$1mm87$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:a417:fcf5:a8a4:b742;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:a417:fcf5:a8a4:b742
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<779208a3-574e-42c4-a52b-28009c92dca5n@googlegroups.com> <tie0p1$1m7vi$1@newsreader4.netcologne.de>
<3f810617-404d-4304-9c95-f38022100f5dn@googlegroups.com> <tielvj$1mm87$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a967ca98-cb45-4b92-8fd9-7c0d9791448cn@googlegroups.com>
Subject: Re: Drastic Simplification of Concertina II Coming
From: already5...@yahoo.com (Michael S)
Injection-Date: Sat, 15 Oct 2022 22:55:35 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 2092

by: Michael S - Sat, 15 Oct 2022 22:55 UTC

On Saturday, October 15, 2022 at 7:09:26 PM UTC+3, Thomas Koenig wrote:

> The Alpha 21064 managed dual-issue superscalar execution without
> It was introduced in 1992, and had 1.68 million transistors...

And was one of the last, if not the last, among major players, to do so.

If I am not mistaken, the first commercial superscalar CPU was Intel
i960CA. Although it probably can't be considered general-purpose due
to absence of FPU and MMU. On a plus side, it was single-chip.
Then came IBM Power that had both MMU and FPU, but consisted
of many chips.
Then came PA-RISC 7000, SuperSPARC, Mc88110.
Then PA-RISC 7100.
Even i860 was sort-of superscalar, but more in VLIWish fashion.

Re: Drastic Simplification of Concertina II Coming

<277c9291-0e4c-4429-a067-204491607f38n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=28052&group=comp.arch#28052

X-Received: by 2002:ac8:5c8b:0:b0:39b:ff53:bb57 with SMTP id r11-20020ac85c8b000000b0039bff53bb57mr5597571qta.293.1665935924426;
Sun, 16 Oct 2022 08:58:44 -0700 (PDT)
X-Received: by 2002:a05:6808:483:b0:354:927a:b212 with SMTP id
z3-20020a056808048300b00354927ab212mr324706oid.1.1665935924186; Sun, 16 Oct
2022 08:58:44 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 16 Oct 2022 08:58:43 -0700 (PDT)
In-Reply-To: <tielvj$1mm87$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:c59:b3cc:afc7:8d26;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:c59:b3cc:afc7:8d26
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<779208a3-574e-42c4-a52b-28009c92dca5n@googlegroups.com> <tie0p1$1m7vi$1@newsreader4.netcologne.de>
<3f810617-404d-4304-9c95-f38022100f5dn@googlegroups.com> <tielvj$1mm87$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <277c9291-0e4c-4429-a067-204491607f38n@googlegroups.com>
Subject: Re: Drastic Simplification of Concertina II Coming
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Sun, 16 Oct 2022 15:58:44 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 1997

by: Quadibloc - Sun, 16 Oct 2022 15:58 UTC

On Saturday, October 15, 2022 at 10:09:26 AM UTC-6, Thomas Koenig wrote:

> Adding 16-bit instructions adds complexity. Adding them in such
> a way that you will have to add lots of 16-bit nops negates
> the benefit for the added complexity.

Setup directives can appear anywhere in a block, and there are 32-bit
instructions that can do anything a 16-bit instruction can do, so I doubt
there would be lots of nops.

Adding them the way I have means that the program is still a stream of
32-bit items that can be decoded separately. So a major benefit of not
having 16-bit instructions is preserved.

John Savard

Re: Drastic Simplification of Concertina II Coming

<g1_2L.203880$w35c.63997@fx47.iad>

https://www.novabbs.com/devel/article-flat.php?id=28055&group=comp.arch#28055

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx47.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: sco...@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Drastic Simplification of Concertina II Coming
Newsgroups: comp.arch
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com> <779208a3-574e-42c4-a52b-28009c92dca5n@googlegroups.com> <tie0p1$1m7vi$1@newsreader4.netcologne.de> <3f810617-404d-4304-9c95-f38022100f5dn@googlegroups.com> <tielvj$1mm87$1@newsreader4.netcologne.de> <a967ca98-cb45-4b92-8fd9-7c0d9791448cn@googlegroups.com>
Lines: 15
Message-ID: <g1_2L.203880$w35c.63997@fx47.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sun, 16 Oct 2022 20:42:52 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sun, 16 Oct 2022 20:42:52 GMT
X-Received-Bytes: 1498

by: Scott Lurndal - Sun, 16 Oct 2022 20:42 UTC

Michael S <already5chosen@yahoo.com> writes:
>On Saturday, October 15, 2022 at 7:09:26 PM UTC+3, Thomas Koenig wrote:
>
>> The Alpha 21064 managed dual-issue superscalar execution without
>> It was introduced in 1992, and had 1.68 million transistors...
>
>And was one of the last, if not the last, among major players, to do so.
>
>If I am not mistaken, the first commercial superscalar CPU was Intel
>i960CA.

There were earlier "superscalar" implementations in the
mainframe era (Burroughs, CDC and IBM) dating back to the 1960s.

The superscalar Motorola 88100 (speaking of delay slots) predated the i960CA by a year.
