devel / comp.arch / Re: Squeezing Those Bits: Concertina II

Re: Squeezing Those Bits: Concertina II

<110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com>

 by: MitchAlsup - Wed, 2 Jun 2021 00:10 UTC

On Tuesday, June 1, 2021 at 6:51:46 PM UTC-5, Quadibloc wrote:
> On Monday, May 31, 2021 at 11:47:20 PM UTC-6, Quadibloc wrote:
> > On Monday, May 31, 2021 at 5:46:49 PM UTC-6, MitchAlsup wrote:
>
> > > The flavor with both registers and displacement is only used on the
> > > order of 2% of the time.
>
> > Well, that may be. But I have a basic goal of allowing code to be
> > produced that is as functional, and as dense, as that of the IBM
> > System/360.
<
S/360 did not have "all that good" code density, partially due to 12-bit displacements
and base register addressing for everything. In order to even branch within a
subroutine, one has to get a hold of the PC using a BALR instruction to the next instruction.
If the subroutine is bigger than 4096 bytes, you may have to do this more than once.
......
<
> So, the idea is:
>
> A register-to-register arithmetic instruction takes only 16 bits on the IBM 360!
>
> Well, all right, there are 16-bit register-to-register instructions on this architecture,
> although they come with certain restrictions those on the 360 don't have.
>
> Memory-reference instructions with full base-index addressing take only 32 bits
> on the IBM 360!
<
Make that Rd,[base+index+disp12] in a single 32-bit instruction. RX format.
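
For anyone who has not looked at the RX format lately, here is a minimal sketch of how those fields pack into 32 bits: 8-bit opcode, 4-bit R1, 4-bit X2, 4-bit B2, 12-bit D2. The function names are made up for illustration, and the register file is modeled as a plain Python list.

```python
# Sketch of the S/360 RX layout: opcode(8) R1(4) X2(4) B2(4) D2(12).
# Function names are illustrative only, not from any real toolchain.

def encode_rx(opcode, r1, x2, b2, d2):
    """Pack the five RX fields into one 32-bit word."""
    assert 0 <= opcode < 256 and 0 <= d2 < 4096
    assert all(0 <= f < 16 for f in (r1, x2, b2))
    return (opcode << 24) | (r1 << 20) | (x2 << 16) | (b2 << 12) | d2

def decode_rx(word):
    """Unpack a 32-bit RX word into (opcode, r1, x2, b2, d2)."""
    return ((word >> 24) & 0xFF, (word >> 20) & 0xF,
            (word >> 16) & 0xF, (word >> 12) & 0xF, word & 0xFFF)

def effective_address(regs, x2, b2, d2):
    """base + index + disp12; register number 0 means 'no register'."""
    index = regs[x2] if x2 else 0
    base = regs[b2] if b2 else 0
    return (base + index + d2) & 0xFFFFFF  # 24-bit addresses on the original 360
```

With opcode 0x58 (Load), R1=1, X2=2, B2=3 and D2=0x010, `encode_rx` gives the word 0x58123010.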
>
> Load and store instructions with full base-index addressing are provided in this
> architecture, despite the fact that register banks have 32 registers instead of 15,
<
16 integer registers, 4 FP.
<
> and a 16-bit displacement (to match that used on the 68000 and many other modern
> architectures) instead of a 12-bit displacement is used.
>
> Aside from only allowing load and store instructions, other restrictions include having
> only seven possible index registers instead of fifteen, only seven possible base registers
<
S.E.L. (n.e., Gould) used this same idea, which did not work out as well as it should have.
<
> instead of fifteen, and only allowing the addresses of aligned operands to be specified
> (a trick from the SEL 32 minicomputer is used to allow this limitation to provide additional
> opcode space).
<
The trick, for the uninitiated, was that when you know data has to be aligned, the lower
order bits of the displacement have to be zeros, so you can "encode" stuff using these bits.
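
A rough sketch of the idea, not the actual S.E.L. encoding: assume a displacement field whose operand must be aligned to 2**align_bits bytes, so the low align_bits bits are free to carry something else (here a hypothetical "extra" code, such as an operand-size or sub-opcode selector). Both function names are invented for the example.

```python
# Hypothetical illustration of reusing the always-zero low bits of an
# aligned displacement. This is NOT the real SEL 32 bit assignment.

def pack_disp(disp, extra, align_bits):
    """Store 'extra' in the low bits that alignment forces to zero."""
    spare = (1 << align_bits) - 1
    if disp & spare:
        raise ValueError("displacement is not aligned")
    if not 0 <= extra <= spare:
        raise ValueError("extra code does not fit in the spare bits")
    return disp | extra

def unpack_disp(field, align_bits):
    """Recover (aligned displacement, extra code) from the field."""
    spare = (1 << align_bits) - 1
    return field & ~spare, field & spare
```

For a doubleword-aligned operand (align_bits=3), a displacement of 0x1238 with extra code 5 packs to 0x123D and unpacks back to (0x1238, 5).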
>
> There's also an extended mode, which allows only seven instruction slots instead of eight
> per 256-bit block. In that mode, some restrictions are lifted.
>
> Load and store instructions no longer are limited to only aligned operands.
<
A misaligned memory model simply works better than the aligned only memory model:
and the HW cost is low enough that this is the only sensible choice these days.
>
> Memory-reference operate instructions (for the most common and standard arithmetic
> operations) are provided. These are restricted to aligned operands, and they also have an
> additional restriction, in that the form with 16-bit displacements is not available.
>
> In addition to the seven of the general registers which may be used as base registers
> with 16-bit displacements, one register may also be used as the base register for a 15-bit
> displacement, and another seven registers may be used as base registers with 12-bit
> displacements. The remaining modes when the one with 16-bit displacements is removed
> are available for these instructions.
>
> And the extended mode also provides for instructions like the string and packed-decimal
> instructions of the 360. These have a 48-bit form which matches the length of the corresponding
> System/360 instructions, but which has the restriction that 16-bit displacements are not
> available, and they also have a 64-bit form which does allow 16-bit displacements, and
> even indexing.
>
> Attempting to approach the instruction density of the System/360 may seem like a fairly
> useless goal in itself, except possibly for marketing purposes, but attention has _also_
> been paid to minimizing the overhead impacts of meeting that goal. So effort may have been
> apparently wasted in keeping some rare instructions short, but this has not led to more common
> instructions becoming *longer*.
>
> John Savard

Re: Squeezing Those Bits: Concertina II

<3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com>

 by: Quadibloc - Wed, 2 Jun 2021 02:03 UTC

On Tuesday, June 1, 2021 at 6:10:40 PM UTC-6, MitchAlsup wrote:

> S/360 did not have "all that good" code density, partially due to 12-bit displacements
> and base register addressing for everything. In order to even branch within a
> subroutine, one has to get a hold of the PC using a BALR instruction to the next instruction.
> If the subroutine is bigger than 4096 bytes, you may have to do this more than once.

Well, if its code density is not all that good, that's a good reason not to do even worse!

But I thought if a subroutine is bigger than 4,096 bytes, you use one BALR instruction,
and just fill several base registers with BASE, BASE+4,096, BASE+8,192 and so on.
After all, if you have any forwards branches, it would be too late for another BALR
when you cross the boundary...
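
To make that concrete, a small sketch of how a branch target would be addressed under this scheme, assuming base register k holds BASE + k*4096 and the displacement field is 12 bits. The function name is just for illustration.

```python
# Sketch: pick the base register and 12-bit displacement for a target
# that lies 'offset' bytes past BASE, assuming base register number k
# (a hypothetical numbering) was loaded with BASE + k * 4096.

SPAN = 4096  # bytes reachable with a 12-bit displacement

def split_target(offset):
    """Return (which base register, displacement) for a code offset."""
    if offset < 0:
        raise ValueError("target lies before BASE")
    return offset // SPAN, offset % SPAN
```

A target 5000 bytes past BASE is reached through the second base register (k=1) with displacement 904.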

> > Load and store instructions no longer are limited to only aligned operands.

> A misaligned memory model simply works better than the aligned only memory model:
> and the HW cost is low enough that this is the only sensible choice these days.

Which is _why_ I rush to provide load and store instructions without an alignment
requirement in the 'enhanced' mode, which has a low overhead cost.

For many computations, one can get by with only aligned operands. The ability to
use an unaligned operand doesn't bring any benefits. But when it is needed, the
instructions with that ability are easily available.

So it isn't as if I'm offering an "aligned-only memory model", I'm just using the SEL
trick to increase code density, but operations on unaligned operands are an
easily-accessible option.

Of course, an assembler that doesn't require the programmer to keep track of
when a program crosses a block boundary will be complex, as it will have to process
up to eight instructions at a time before deciding what format of block to generate.

To me, _that_ is the "flaw" in my architecture, but it is a nearly unavoidable consequence
of offering VLIW features.

Aside from the affectation of allowing the System/360 to determine how long an
instruction has the right to be, its goals may be questionable. The instruction set looks
a lot like RISC, can go into a VLIW mode with instruction predication and explicit
indication of parallelism, and includes the functions found on a CISC architecture.

So it hasn't made up its mind what kind of a computer it wants to be. One can have a
great big OoO implementation, and one can have a tiny VLIW-centric implementation
that only runs efficiently if you use the VLIW mode features correctly. An architecture
that can be implemented across a range of systems, of course, is the great idea that
made the System/360 famous.

But these days, the equivalent of a microprogrammed Model 30 is hardly worth using
in a pocket calculator - and in my architecture, programs should be written differently
for efficient operation on different sized implementations, which somewhat reduces the
benefits of a common instruction set.

John Savard

Re: Squeezing Those Bits: Concertina II

<51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com>

 by: MitchAlsup - Wed, 2 Jun 2021 02:29 UTC

On Tuesday, June 1, 2021 at 9:04:00 PM UTC-5, Quadibloc wrote:
> On Tuesday, June 1, 2021 at 6:10:40 PM UTC-6, MitchAlsup wrote:
>
> > S/360 did not have "all that good" code density, partially due to 12-bit displacements
> > and base register addressing for everything. In order to even branch within a
> > subroutine, one has to get a hold of the PC using a BALR instruction to the next instruction.
> > If the subroutine is bigger than 4096 bytes, you may have to do this more than once.
<
> Well, if its code density is not all that good, that's a good reason not to do even worse!
>
> But I thought if a subroutine is bigger than 4,096 bytes, you use one BALR instruction,
> and just fill several base registers with BASE, BASE+4,096, BASE+8,192 and so on.
> After all, if you have any forwards branches, it would be too late for another BALR
> when you cross the boundary...
<
Right, so you burn up 2,3,4 registers for branch base registers, and burn up another 2,3,4
registers for large arrays, and then you remember you only have 13 usable registers, so
you are left with 6 registers in which to "do work". This is the problem of base register
machines and why modern machines have "different arithmetic" for branching [ip+disp]
as compared to memory referencing [Rbase+disp].
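
A back-of-the-envelope version of that budget, as a sketch: the 13 usable registers come from the paragraph above, while the 10,000-byte routine and the count of 4 array base registers are picked purely for illustration.

```python
# Sketch of the register budget on a base-register machine.
# The routine size and array count below are illustrative assumptions.

SPAN = 4096  # bytes covered by one base register with a 12-bit displacement

def branch_bases_needed(code_bytes):
    """Base registers needed to reach every byte of a routine."""
    return -(-code_bytes // SPAN)  # ceiling division

def registers_left(usable, branch_bases, array_bases):
    """Registers remaining for actual computation."""
    return usable - branch_bases - array_bases

branch = branch_bases_needed(10_000)        # 3 base registers for code
print(registers_left(13, branch, 4))        # base-register branching
print(registers_left(13, 0, 4))             # [ip+disp] branching needs none
```

The second call shows what [ip+disp] branching buys back: the branch base registers return to the pool.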
<
> > > Load and store instructions no longer are limited to only aligned operands.
>
> > A misaligned memory model simply works better than the aligned only memory model:
> > and the HW cost is low enough that this is the only sensible choice these days.
<
> Which is _why_ I rush to provide load and store instructions without an alignment
> requirement in the 'enhanced' mode, which has a low overhead cost.
>
> For many computations, one can get by with only aligned operands. The ability to
> use an unaligned operand doesn't bring any benefits. But when it is needed, the
> instructions with that ability are easily available.
>
> So it isn't as if I'm offering an "aligned-only memory model", I'm just using the SEL
> trick to increase code density, but operations on unaligned operands are an
> easily-accessible option.
<
I worked at S.E.L. 1980-1983 and they were adamant that it was not SEL.
>
> Of course, an assembler that doesn't require the programmer to keep track of
> when a program crosses a block boundary will be complex, as it will have to process
> up to eight instructions at a time before deciding what format of block to generate.
>
> To me, _that_ is the "flaw" in my architecture, but it is a nearly unavoidable consequence
> of offering VLIW features.
<
Which gets at the heart of my critique:: why offer VLIW features at all ??
>
> Aside from the affectation of allowing the System/360 to determine how long an
> instruction has the right to be, its goals may be questionable. The instruction set looks
> a lot like RISC, can go into a VLIW mode with instruction predication and explicit
> indication of parallelism, and includes the functions found on a CISC architecture.
<
While S/360 has warts, it is massively better than x86.
<
And LD-Ops work surprisingly well and you can even build pipelines that make the LD
parts that hit in the cache appear to operate at register speeds. In effect, you put
AGEN after DECODE and you put EXECUTE after LD-ALIGN.
>
> So it hasn't made up its mind what kind of a computer it wants to be. One can have a
> great big OoO implementation, and one can have a tiny VLIW-centric implementation
> that only runs efficiently if you use the VLIW mode features correctly. An architecture
> that can be implemented across a range of systems, of course, is the great idea that
> made the System/360 famous.
<
I contend that what makes S/360 ISA reasonable across a large scale of implementations
is NOT adding too many features !! and keeping those offered features clean. And I consider
the VLIW section to be a bridge too far.
>
> But these days, the equivalent of a microprogrammed Model 30 is hardly worth using
> in a pocket calculator - and in my architecture, programs should be written differently
> for efficient operation on different sized implementations, which somewhat reduces the
> benefits of a common instruction set.
<
Today, the smallest scale should be no smaller than R3000+FPU, and can be as big as
8-wide SuperScalar Out of Order with execution window of ~224 instructions.
<
Have you considered how you can make a machine using your ISA that can decode and
execute more than 1 bundle but fewer than 2 bundles per cycle ??
>
> John Savard

Re: Squeezing Those Bits: Concertina II

<4fb02966-46dc-4218-a26b-836ac68ecbb3n@googlegroups.com>

 by: Quadibloc - Wed, 2 Jun 2021 03:33 UTC

On Tuesday, June 1, 2021 at 8:29:28 PM UTC-6, MitchAlsup wrote:
> On Tuesday, June 1, 2021 at 9:04:00 PM UTC-5, Quadibloc wrote:

> > But I thought if a subroutine is bigger than 4,096 bytes, you use one BALR instruction,
> > and just fill several base registers with BASE, BASE+4,096, BASE+8,192 and so on.
> > After all, if you have any forwards branches, it would be too late for another BALR
> > when you cross the boundary...

> Right, so you burn up 2,3,4 registers for branch base registers, and burn up another 2,3,4
> registers for large arrays, and then you remember you only have 13 usable registers, so
> you are left with 6 registers in which to "do work". This is the problem of base register
> machines and why modern machines have "different arithmetic" for branching [ip+disp]
> as compared to memory referencing [Rbase+disp].

Ah.

Then, it's a _good_ thing that my design uses 16-bit displacements, despite the fact
that it has to make some sacrifices in order to do so.

I haven't used program-relative addressing for branches very much in the design,
on the other hand. Only the 16-bit branch instruction is program counter-relative.

But I _do_ recognize the problem of large arrays. I did _not_ like the idea that if you
have an array bigger than 64K (or 4K on the 360) you need basically one base register
per array. That's why I included "Array Mode" addressing in my design. That's where
base register 0, since it isn't used as a base register, points to a table... containing the
start addresses of arrays. So the index register gives the position in the array... basically,
it's indirect post-indexed addressing, but instead of being general, it's only possible for
the array addresses in this one table... that gets loaded into cache.
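
A minimal sketch of that address calculation, under assumptions of my own for the example: memory is modeled as a dict from address to value, table entries are 8 bytes each, and the instruction supplies a table slot number. None of those details are fixed by the description above.

```python
# Sketch of "Array Mode": base register 0 points at a table of array
# start addresses; the index register gives the position in the array.
# The 8-byte entry size and the 'slot' field are assumptions.

def array_mode_address(memory, table_base, slot, index, entry_bytes=8):
    """Effective address = table[slot] + index (indirect, post-indexed)."""
    array_start = memory[table_base + slot * entry_bytes]
    return array_start + index
```

With a table at 0x1000 whose second entry holds 0x90000, slot 1 and index 0x20 give the effective address 0x90020.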

> I worked at S.E.L. 1980-1983 and they were adamant that it was not SEL.

I'll have to take another look at some of their old advertisements then. Of course,
since they're not writing my paycheck, I'm not too worried about their feelings...

> Which gets at the heart of my critique:: why offer VLIW features at all ??

That's a good question. After all, Ivan Godard is a nice guy, so why should I
want to annoy him?

Since he has mentioned the Itanium as a promising architecture in some of his
posts, and his intent is to make a lightweight design that competes with OoO
without its overhead... the presence of VLIW features could indeed be construed
as an intent of competing with him - although I am woefully unequipped to actually
manage that.

One thing is that I noted in the VLIW and VLIW-like designs I've seen that they
just have the equivalent of a 'break' bit. They indicate that "here's a chunk of instructions
that can be executed simultaneously... and now here's the next chunk".

That's all very well, but is that enough to tell a dumb CPU what to do?

Suppose there's a dependency. If the machine has to take a cycle between starting
each instruction (one-wide decode unit) and an instruction takes X cycles to execute, then
I can deal with dependencies by coding X-1 instructions between the instructions involved in
the dependency.
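
Here is a small sketch of that spacing rule, assuming a single-issue in-order machine where every instruction has the same latency. The simulation includes an interlock only so it can show where a consumer would have to wait; instruction names and the tuple format are invented for the example.

```python
# Sketch: one instruction issued per cycle, each result ready 'latency'
# cycles after issue. A consumer placed too close to its producer stalls.

def issue_cycles(program, latency):
    """program is a list of (dest, [sources]); returns each issue cycle."""
    ready = {}      # register name -> cycle its value becomes available
    cycles = []
    next_free = 0   # earliest cycle the next instruction can issue
    for dest, sources in program:
        start = max([next_free] + [ready.get(s, 0) for s in sources])
        cycles.append(start)
        ready[dest] = start + latency
        next_free = start + 1
    return cycles

back_to_back = [("a", []), ("b", ["a"])]
spaced = [("a", []), ("x", []), ("y", []), ("b", ["a"])]
print(issue_cycles(back_to_back, 3))  # consumer waits for the producer
print(issue_cycles(spaced, 3))        # X-1 = 2 fillers hide the latency
```

With a latency of 3, the back-to-back consumer issues at cycle 3 after two idle cycles, while the version with two independent instructions in between issues every cycle.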

If I can't do that (the program may be executed on different implementations) and I want
the program to wait for only the minimum amount of time, but the implementation is dumb
and doesn't have much in the way of interlocks let alone OoO...

then I figured I would need a way to say not just 'do this bunch of instructions later' but
also to indicate exactly which instruction another instruction depends on.

So I came up with the U bit, the D bit, and the offset field as required for indicating dependency
relations. I thought this was worth putting on display as a way to more fully flesh out the VLIW
model.

> > Aside from the affectation of allowing the System/360 to determine how long an
> > instruction has the right to be, its goals may be questionable. The instruction set looks
> > a lot like RISC, can go into a VLIW mode with instruction predication and explicit
> > indication of parallelism, and includes the functions found on a CISC architecture.

> While S/360 has warts, it is massively better than x86.

Amen. But the massive success of x86 shows that the Highlander model applies: "There can be
only one". Because people want to own the machine that will run nearly every piece of available
software.

Which is why, when I started with the original Concertina architecture... I decided that for an
architecture to have any chance of being the replacement for the x86, it couldn't be missing
some feature that a lot of people would desperately want.

Hence, everything but the kitchen sink.

> And LD-Ops work surprisingly well and you can even build pipelines that make the LD
> parts that hit in the cache to appear to operate at register speeds. In effect, you put
> AGEN after DECODE and you put EXECUTE after LD-ALIGN.

This went over my head, so I'll have to look it up.

> > So it hasn't made up its mind what kind of a computer it wants to be. One can have a
> > great big OoO implementation, and one can have a tiny VLIW-centric implementation
> > that only runs efficiently if you use the VLIW mode features correctly. An architecture
> > that can be implemented across a range of systems, of course, is the great idea that
> > made the System/360 famous.

> I contend that what makes S/360 ISA reasonable across a large scale of implementations
> is NOT adding too many features !! and keeping those offered features clean. And I consider
> the VLIW section to be a bridge too far.

Although today's iteration of S/360 does have a lot of legacy cruft, and therefore a lot
of features, I certainly admit the truth of that.

It is indeed odd to try to unify VLIW with CISC, though. After all, CISC means you've got some
stuff in your architecture that has to be microcoded. That won't derive benefits from VLIW
features.

> > But these days, the equivalent of a microprogrammed Model 30 is hardly worth using
> > in a pocket calculator - and in my architecture, programs should be written differently
> > for efficient operation on different sized implementations, which somewhat reduces the
> > benefits of a common instruction set.

> Today, the smallest scale should be no smaller than R3000+FPU, and can be as big as
> 8-wide SuperScalar Out of Order with execution window of ~224 instructions.

For general-purpose computing, yes - there's no benefit to making a CPU lighter than a
$15 smartphone CPU. But they still make 8-bit CPUs for special purposes in embedded
use.

> Have you considered how you can make a machine using your ISA that can decode and
> execute more than 1 bundle but smaller than 2 bundles per cycle ??

Not really. I've assumed that _one_ bundle, which is 8-wide superscalar, is a high-end
implementation, and a low-end one would start no more than one instruction per cycle.
I've sought to keep the architecture of my various implementations open to improvements
in technology, so that it wouldn't be impossible to implement it at *two* bundles per
cycle.

But let's say the underlying economics meant that one had to offer this architecture
on a 12-wide superscalar implementation.

If a one-instruction per cycle implementation is possible, then processing the block
while not immediately decoding or executing all the instructions in the block is possible.

But in order for the pseudo-immediates to work, one does have to _fetch_ a whole
block at a time.

So one has a block-wide area into which a block is fetched, and from which the header
is decoded.

After those things are done, if the next stage, which decodes instructions and starts executing
them, were only four wide, so that it took two cycles before the next block started, this
wouldn't be unworkable - any more than a one-instruction per cycle implementation is
unworkable.

So I can now think of a way to handle your example.

I have to be able to fetch 512 bits per cycle, and buffer the 512 bits, and decode two headers
in one cycle.

Then I can feed 1 1/2 blocks to three four-wide superscalar pipelines.

Four instructions left over?

Fine. Next cycle, only fetch 256 bits and decode one header.

Nothing left over? Next cycle, decode two headers.
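
The alternating fetch pattern above can be sketched as a small simulation. It assumes, purely for illustration, that each 256-bit block supplies eight instruction slots (ignoring whatever room the header takes) and that the three four-wide pipelines always accept twelve instructions per cycle. The constant and function names are hypothetical.

```python
BLOCK_SLOTS = 8    # assumption: one 256-bit block yields eight instruction slots
ISSUE_WIDTH = 12   # three four-wide pipelines


def simulate(cycles):
    """Return how many blocks are fetched (and headers decoded) in each cycle."""
    leftover = 0
    blocks_per_cycle = []
    for _ in range(cycles):
        # One block is enough when the leftovers plus that block fill the pipelines.
        blocks = 1 if leftover + BLOCK_SLOTS >= ISSUE_WIDTH else 2
        available = leftover + blocks * BLOCK_SLOTS
        issued = min(available, ISSUE_WIDTH)
        leftover = available - issued
        blocks_per_cycle.append(blocks)
    return blocks_per_cycle


pattern = simulate(6)
print(pattern)                      # blocks fetched each cycle
print(sum(pattern) / len(pattern))  # average blocks per cycle
```

Under these assumptions the fetch width alternates between 512 and 256 bits, which averages out to one and a half blocks per cycle.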

John Savard

Re: Squeezing Those Bits: Concertina II

<237ce560-8c9f-43ed-a8b8-4d9816f5f246n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17317&group=comp.arch#17317
X-Received: by 2002:aed:2166:: with SMTP id 93mr16631301qtc.374.1622606005169;
Tue, 01 Jun 2021 20:53:25 -0700 (PDT)
X-Received: by 2002:a4a:d004:: with SMTP id h4mr23047635oor.90.1622606004871;
Tue, 01 Jun 2021 20:53:24 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!3.eu.feeder.erje.net!feeder.erje.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 1 Jun 2021 20:53:24 -0700 (PDT)
In-Reply-To: <4fb02966-46dc-4218-a26b-836ac68ecbb3n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:f8e3:d700:c9f5:9635:7b7:a86a;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:f8e3:d700:c9f5:9635:7b7:a86a
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<030135f6-d63c-4b9b-8461-0ae08cfd5912n@googlegroups.com> <93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com>
<81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com> <38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com>
<caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com> <805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com> <s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com> <4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com> <86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com>
<110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com> <3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com>
<51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com> <4fb02966-46dc-4218-a26b-836ac68ecbb3n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <237ce560-8c9f-43ed-a8b8-4d9816f5f246n@googlegroups.com>
Subject: Re: Squeezing Those Bits: Concertina II
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 02 Jun 2021 03:53:25 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Quadibloc - Wed, 2 Jun 2021 03:53 UTC

On Tuesday, June 1, 2021 at 9:33:28 PM UTC-6, Quadibloc wrote:
> On Tuesday, June 1, 2021 at 8:29:28 PM UTC-6, MitchAlsup wrote:

> > And LD-Ops work surprisingly well and you can even build pipelines that make the LD
> > parts that hit in the cache to appear to operate at register speeds. In effect, you put
> > AGEN after DECODE and you put EXECUTE after LD-ALIGN.

> This went over my head, so I'll have to look it up.

Trying to look it up turned up something unrelated, not a specific Hennessy and
Patterson coinage.

Certainly, to the limited extent that I've included memory-reference operate
instructions in the architecture, I intend to split them into load and operate
micro-ops.

Yes, address generation - adding in the base and index values - should be done
as soon as the contents of those registers are available, so that later, when the
pipeline is available for executing the calculation on the instruction's operands,
nothing is kept waiting.

And unaligned operands - they require at least dual-channel memory, so that
an operand crossing a boundary doesn't force a second fetch. Then what you
need is a circuit like a barrel shifter, except it shifts by bytes rather than bits.
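
As a rough software model of that byte-granular shifter: the sketch below assumes 8-byte aligned words and treats the two slices as the two memory channels delivering adjacent words in the same cycle. `WORD_BYTES` and `load_unaligned` are hypothetical names, not anything from the architecture.

```python
WORD_BYTES = 8  # assumption: each channel delivers one aligned 8-byte word


def load_unaligned(memory: bytes, addr: int, size: int) -> bytes:
    """Return `size` bytes starting at `addr`, which need not be aligned."""
    assert 0 < size <= WORD_BYTES
    shift = addr % WORD_BYTES          # byte offset inside the first aligned word
    base = addr - shift
    low = memory[base:base + WORD_BYTES]                     # first channel
    high = memory[base + WORD_BYTES:base + 2 * WORD_BYTES]   # second channel
    pair = low + high                  # 16 bytes side by side
    return pair[shift:shift + size]    # the "barrel shift by bytes", then truncate


memory = bytes(range(32))
print(load_unaligned(memory, 6, 4).hex())   # operand straddling the 8-byte boundary
print(load_unaligned(memory, 8, 8).hex())   # aligned case, shift amount is zero
```

Because both aligned words are on hand at once, the boundary-crossing case costs no extra fetch in this model.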

Oh, yes, I've looked up the basics of how people design computers to make
them go fast.

Even if I have the habit of referring to Wallace Tree multipliers, because I'm
including the Model 91 and the STRETCH as examples, with their tiny little
trees that aren't big enough to go all Dadda.

John Savard

Re: Squeezing Those Bits: Concertina II

<s977s0$g3j$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17318&group=comp.arch#17318
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Squeezing Those Bits: Concertina II
Date: Tue, 1 Jun 2021 23:18:39 -0700
Organization: A noiseless patient Spider
Lines: 44
Message-ID: <s977s0$g3j$1@dont-email.me>
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<030135f6-d63c-4b9b-8461-0ae08cfd5912n@googlegroups.com>
<93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com>
<81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com>
<38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com>
<caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com>
<805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com>
<s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com>
<4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com>
<86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com>
<110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com>
<3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com>
<51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com>
<4fb02966-46dc-4218-a26b-836ac68ecbb3n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 2 Jun 2021 06:18:40 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="de594bf7172e34a572c92b7dd71738fb";
logging-data="16499"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+E4Qkk7SGZ7YdmUpSH8c9g"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.1
Cancel-Lock: sha1:nVJByISNZLqM91FcLuOeM4QE6DI=
In-Reply-To: <4fb02966-46dc-4218-a26b-836ac68ecbb3n@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Wed, 2 Jun 2021 06:18 UTC

On 6/1/2021 8:33 PM, Quadibloc wrote:
> On Tuesday, June 1, 2021 at 8:29:28 PM UTC-6, MitchAlsup wrote:
>> On Tuesday, June 1, 2021 at 9:04:00 PM UTC-5, Quadibloc wrote:
>
>>> But I thought if a subroutine is bigger than 4,096 bytes, you use one BALR instruction,
>>> and just fill several base registers with BASE, BASE+4,096, BASE+8,192 and so on.
>>> After all, if you have any forwards branches, it would be too late for another BALR
>>> when you cross the boundary...
>
>> Right, so you burn up 2,3,4 registers for branch base registers, and burn up another 2,3,4
>> registers for large arrays, and then you remember you only have 13 usable registers, so
>> you are left with 6 registers in which to "do work". This is the problem of base register
>> machines and why modern machines have "different arithmetic" for branching [ip+disp]
>> as compared to memory referencing [Rbase+disp].
>
> Ah.
>
> Then, it's a _good_ thing that my design uses 16-bit displacements, despite the fact
> that it has to make some sacrifices in order to do so.
>
> I haven't used program-relative addressing for branches very much in the design,
> on the other hand. Only the 16-bit branch instruction is program counter-relative.
>
> But I _do_ recognize the problem of large arrays. I did _not_ like the idea that if you
> have an array bigger than 64K (or 4K on the 360) you need basically one base register
> per array. That's why I included "Array Mode" addressing in my design. That's where
> base register 0, since it isn't used as a base register, points to a table... containing the
> start addresses of arrays. So the index register gives the position in the array... basically,
> it's indirect post-indexed addressing, but instead of being general, it's only possible for
> the array addresses in this one table... that gets loaded into cache.
>
>> I worked at S.E.L. 1980-1983 and they were adamant that it was not SEL.
>
> I'll have to take another look at some of their old advertisements then. Of course,
> since they're not writing my paycheck, I'm not too worried about their feelings...
>
>> Which gets at the heart of my critique:: why offer VLIW features at all ??
>
> That's a good question. After all, Ivan Godard is a nice guy, so why should I
> want to annoy him?

Flattery will get you everywhere!

Re: Squeezing Those Bits: Concertina II

<aa0f0107-787c-4a6c-8e94-824b39e5507cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17319&group=comp.arch#17319
X-Received: by 2002:ae9:c30f:: with SMTP id n15mr15663526qkg.71.1622626917088;
Wed, 02 Jun 2021 02:41:57 -0700 (PDT)
X-Received: by 2002:a05:6830:1bd3:: with SMTP id v19mr25725197ota.276.1622626916809;
Wed, 02 Jun 2021 02:41:56 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!usenet.pasdenom.info!usenet-fr.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 2 Jun 2021 02:41:56 -0700 (PDT)
In-Reply-To: <237ce560-8c9f-43ed-a8b8-4d9816f5f246n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:f8e3:d700:9c9:e31d:6edf:84e7;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:f8e3:d700:9c9:e31d:6edf:84e7
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<030135f6-d63c-4b9b-8461-0ae08cfd5912n@googlegroups.com> <93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com>
<81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com> <38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com>
<caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com> <805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com> <s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com> <4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com> <86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com>
<110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com> <3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com>
<51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com> <4fb02966-46dc-4218-a26b-836ac68ecbb3n@googlegroups.com>
<237ce560-8c9f-43ed-a8b8-4d9816f5f246n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <aa0f0107-787c-4a6c-8e94-824b39e5507cn@googlegroups.com>
Subject: Re: Squeezing Those Bits: Concertina II
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 02 Jun 2021 09:41:57 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Quadibloc - Wed, 2 Jun 2021 09:41 UTC

On Tuesday, June 1, 2021 at 9:53:26 PM UTC-6, Quadibloc wrote:

> Oh, yes, I've looked up the basics of how people design computers to make
> them go fast.

But there's plenty I don't know.

Thus, I haven't managed to figure out how one would go about designing
a proper register rename unit at the gate level.

Only recently did I learn how a 2-way set associative cache is actually
built.
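
For reference, a behavioural sketch of a 2-way set-associative lookup, not a gate-level design. It assumes one LRU bit per set and made-up sizes (four sets, 16-byte lines); the class and its parameters are hypothetical.

```python
class TwoWayCache:
    """Tag store of a 2-way set-associative cache with one LRU bit per set."""

    def __init__(self, num_sets=4, line_bytes=16):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        self.tags = [[None, None] for _ in range(num_sets)]  # None means invalid
        self.lru = [0] * num_sets  # which way to evict next in each set

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        line = addr // self.line_bytes
        index = line % self.num_sets   # selects the set
        tag = line // self.num_sets    # compared against both ways
        ways = self.tags[index]
        for way in (0, 1):             # hardware does both compares in parallel
            if ways[way] == tag:
                self.lru[index] = 1 - way   # the other way is now least recent
                return True
        victim = self.lru[index]
        ways[victim] = tag
        self.lru[index] = 1 - victim
        return False


cache = TwoWayCache()
# Addresses 0, 64 and 128 all map to set 0 with these sizes.
results = [cache.access(a) for a in (0, 64, 0, 64, 128, 64, 0)]
print(results)
```

Addresses 0 and 64 can coexist in the same set here, whereas in a direct-mapped cache of this geometry they would evict each other on every access.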

John Savard

Re: Squeezing Those Bits: Concertina II

<859da8cd-bf0b-478d-8d8b-b0d11252dfe1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17320&group=comp.arch#17320
X-Received: by 2002:a05:6214:162a:: with SMTP id e10mr12338561qvw.49.1622627372003;
Wed, 02 Jun 2021 02:49:32 -0700 (PDT)
X-Received: by 2002:a9d:6743:: with SMTP id w3mr24414319otm.82.1622627371797;
Wed, 02 Jun 2021 02:49:31 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 2 Jun 2021 02:49:31 -0700 (PDT)
In-Reply-To: <51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:f8e3:d700:9c9:e31d:6edf:84e7;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:f8e3:d700:9c9:e31d:6edf:84e7
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<030135f6-d63c-4b9b-8461-0ae08cfd5912n@googlegroups.com> <93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com>
<81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com> <38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com>
<caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com> <805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com> <s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com> <4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com> <86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com>
<110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com> <3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com>
<51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <859da8cd-bf0b-478d-8d8b-b0d11252dfe1n@googlegroups.com>
Subject: Re: Squeezing Those Bits: Concertina II
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 02 Jun 2021 09:49:31 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Quadibloc - Wed, 2 Jun 2021 09:49 UTC

On Tuesday, June 1, 2021 at 8:29:28 PM UTC-6, MitchAlsup wrote:

> Which gets at the heart of my critique:: why offer VLIW features at all ??

> And I consider
> the VLIW section to be a bridge too far.

Upon reflection, I see the "real" reason that I'm offering the feature of
VLIW operation.

It's really quite simple.

In order to offer immediates for instructions, not limited to 8-bit and
16-bit immediates (and offering floating-point immediates only from
a short list, or with A-law encoding, simply did not bear thinking about
from my point of view)...

I either would have had to encode the length of instructions in an
unwieldy way, or use the block scheme that I'm using.

And _if_ I organize the program code into blocks (as an option, since
basic format code with no headers is possible) then not only is it _easy_
to offer VLIW...

but, as well, I kind of feel that if I don't offer VLIW, I don't have a good
enough excuse to be organizing program code into blocks! If I've already
_got_ a 256-bit VLIW, I should be offering the usual amenities with it.
And since blisteringly fast speed is another one of the primary objectives
of my ISA design, it's a natural fit.

John Savard

Re: Squeezing Those Bits: Concertina II

<09fd2253-5c27-4eee-9213-0fa4bbf3dc5dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17321&group=comp.arch#17321
X-Received: by 2002:a0c:a99c:: with SMTP id a28mr6471698qvb.33.1622627821531;
Wed, 02 Jun 2021 02:57:01 -0700 (PDT)
X-Received: by 2002:a9d:63cd:: with SMTP id e13mr10889862otl.206.1622627821331;
Wed, 02 Jun 2021 02:57:01 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 2 Jun 2021 02:57:01 -0700 (PDT)
In-Reply-To: <4fb02966-46dc-4218-a26b-836ac68ecbb3n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:f8e3:d700:9c9:e31d:6edf:84e7;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:f8e3:d700:9c9:e31d:6edf:84e7
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<030135f6-d63c-4b9b-8461-0ae08cfd5912n@googlegroups.com> <93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com>
<81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com> <38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com>
<caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com> <805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com> <s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com> <4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com> <86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com>
<110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com> <3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com>
<51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com> <4fb02966-46dc-4218-a26b-836ac68ecbb3n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <09fd2253-5c27-4eee-9213-0fa4bbf3dc5dn@googlegroups.com>
Subject: Re: Squeezing Those Bits: Concertina II
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 02 Jun 2021 09:57:01 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Quadibloc - Wed, 2 Jun 2021 09:57 UTC

On Tuesday, June 1, 2021 at 9:33:28 PM UTC-6, Quadibloc wrote:
> On Tuesday, June 1, 2021 at 8:29:28 PM UTC-6, MitchAlsup wrote:

> > I worked at S.E.L. 1980-1983 and they were adamant that it was not SEL.

> I'll have to take another look at some of their old advertisements then. Of course,
> since they're not writing my paycheck, I'm not too worried about their feelings...

I have now done so.

When it comes to the time of the SYSTEMS 32, their brochures and manuals
seem to abbreviate Systems Engineering Laboratories only as SYSTEMS.

But looking at an ad for the SEL 810B from 1967 that ran in Datamation, the
ad copy reads...

Anything the SEL 810A can do,

the new SEL 810B can do twice as fast.

Over 50 SEL 810A, 16-bit computers have been supplied for data acquisition
and control. Now meet the SEL 810B, with twice the speed of the A.

....and so on and so forth. SEL, not S.E.L. anywhere.

John Savard

Re: Squeezing Those Bits: Concertina II

<s97m1u$8kh$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17322&group=comp.arch#17322
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Squeezing Those Bits: Concertina II
Date: Wed, 2 Jun 2021 12:20:46 +0200
Organization: A noiseless patient Spider
Lines: 141
Message-ID: <s97m1u$8kh$1@dont-email.me>
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<030135f6-d63c-4b9b-8461-0ae08cfd5912n@googlegroups.com>
<93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com>
<81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com>
<38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com>
<caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com>
<805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com>
<s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com>
<4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<s94le0$3cr$1@dont-email.me>
<d934adf6-2832-4f14-8235-d3bddc8f0c26n@googlegroups.com>
<s963s5$4sa$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 2 Jun 2021 10:20:46 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="9e7e83445aa7e129dc4841ab895ec3f3";
logging-data="8849"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/n6K5JRehE4oFE9UzZ1hoFu+lKkDiEOfk="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.8.1
Cancel-Lock: sha1:fapggB0DH910yXTMiBYbyBnrHik=
In-Reply-To: <s963s5$4sa$1@newsreader4.netcologne.de>
Content-Language: en-US
 by: Marcus - Wed, 2 Jun 2021 10:20 UTC

On 2021-06-01, Thomas Koenig wrote:
> MitchAlsup <MitchAlsup@aol.com> schrieb:
>
>> Finally, I take sample code from Brian's compiler and see if the cooked
>> data is representative of the instructions actually produced. I have "read" the
>> ASM of his LLVM front end, Livermore Loops, and a few other choice
>> applications. I wish I had access to SPEC (of any era) so I could "read"
>> that, too.
>
> I have done a bit of disassembly on POWER and done some very
> rudimentary statistics, with a view towards what a kind of
> compressed instruction set should have.
>
> Of course, this does not help for constructs which POWER does
> not support.
>
> Here is an example output from /usr/bin/git :
>
> destructive three register op: 4327
> non-destructive three register op: 1223
> destructive register constant op: 34891
> two register op: 4864
> non-destructive register constant op: 29326
> load/store with constant offset: 93058
> fp load/store with constant offset: 26
> load/store base+index reg: 4038
> load immediate: 22058
> load immediate and shift: 2827
> move register: 31607
> move special: 8764
> compare instructions: 22160
> branch instructions: 65712
> of which bctr: 240
> of which blr: 3302
> of which bctrl: 217
> floating point operations: 61
> byte reversal: 82
> condition register set: 13
> vector scalar registers: 588
> altivec: 38
> load/store base+index fp: 0
> barrier: 0
> supervisor call: 0
> abort trap: 0
> default prefetch: 0
> nop: 23345
> service processor attention: 2
> load / store offset statistics:
> offset = 0: 7787
> 1 <= abs(offset) < 15: 2407, 616 divisible by 4
> 16 <= abs(offset) < 128: 61950, 61596 divisible by 4
> 128 <= abs(offset) < 2048: 13746, 13663 divisible by 4
> 2048 <= abs(offset): 7168, 7046 divisible by 4
> Branch statistics:
> offset < 127: 5035
> 128 <= offset < 2048: 9760
> 2048 <= offset < 4096: 1540
> 4096 <= offset < 8192: 1211
> 8192 <= offset < 32768: 1645
> 32768 <= offset < 65536: 673
> 65536 <= offset < 131072: 1389
> 131072 <= offset < 262144: 2425
> 262144 <= offset: 16191
>
> By comparison, here is a scientific code, aermod.f90 from
> the Polyhedron benchmark suite:
>
> destructive three register op: 7866
> non-destructive three register op: 1523
> destructive register constant op: 19791
> two register op: 2854
> non-destructive register constant op: 24625
> load/store with constant offset: 56247
> fp load/store with constant offset: 10309
> load/store base+index reg: 2101
> load immediate: 17422
> load immediate and shift: 2026
> move register: 5994
> move special: 2109
> compare instructions: 7099
> branch instructions: 23590
> of which bctr: 50
> of which blr: 805
> of which bctrl: 8
> floating point operations: 8255
> byte reversal: 62
> condition register set: 961
> vector scalar registers: 2523
> altivec: 93
> load/store base+index fp: 1720
> barrier: 0
> supervisor call: 0
> abort trap: 0
> default prefetch: 0
> nop: 13563
> service processor attention: 1
> load / store offset statistics:
> offset = 0: 630
> 1 <= abs(offset) < 15: 382, 260 divisible by 4
> 16 <= abs(offset) < 128: 21435, 21424 divisible by 4
> 128 <= abs(offset) < 2048: 12760, 12757 divisible by 4
> 2048 <= abs(offset): 21040, 20922 divisible by 4
> Branch statistics:
> offset < 127: 3334
> 128 <= offset < 2048: 7217
> 2048 <= offset < 4096: 1185
> 4096 <= offset < 8192: 906
> 8192 <= offset < 32768: 901
> 32768 <= offset < 65536: 620
> 65536 <= offset < 131072: 569
> 131072 <= offset < 262144: 1266
> 262144 <= offset: 6111
> Instructions: 213306
>

Thanks! Saving those numbers for future use :-)

I did a similar quick disassembly + grep for MRISC32 on the
Doom and Quake binaries (but it lacks operand configurations
and branch ranges etc):

https://github.com/mrisc32/mrisc32/issues/103#issuecomment-768143215

The scope for that investigation was code size.

A different axis is the dynamic instruction count (i.e. instruction
execution frequencies) which I think is more relevant for
performance analysis/tuning of an ISA.
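
To make the static/dynamic distinction concrete, here is a small sketch. The program and its execution counts are invented for illustration only; a real measurement would take the counts from a simulator or profiler.

```python
from collections import Counter


def static_counts(program):
    """Count each opcode once per occurrence in the binary."""
    return Counter(op for op, _ in program)


def dynamic_counts(program):
    """Weight each opcode by how many times that instruction was executed."""
    counts = Counter()
    for op, executed in program:
        counts[op] += executed
    return counts


# Hypothetical program: (opcode, times executed). Three instructions form a hot loop.
example = [
    ("load", 1), ("add", 1), ("compare", 1),
    ("load", 1000), ("fmul", 1000), ("branch", 1000),
    ("store", 1),
]

print(static_counts(example).most_common())
print(dynamic_counts(example).most_common())
```

In this made-up case "fmul" is one instruction out of seven statically, yet it accounts for about a third of everything executed.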

> By comparison, you see that there are far more floating point
> operations (obviously).
>
> Also, load and store with a constant offset seems to be a
> very frequent operation no matter what.
>
> Seeing which of the operations could have been fused into
> a load or store from register + register + offset would
> require a much more elaborate analysis than my simplistic
> Perl script.
>

Re: Squeezing Those Bits: Concertina II

<s989ik$itn$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17324&group=comp.arch#17324
Path: i2pn2.org!i2pn.org!paganini.bofh.team!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Squeezing Those Bits: Concertina II
Date: Wed, 2 Jun 2021 08:53:54 -0700
Organization: A noiseless patient Spider
Lines: 43
Message-ID: <s989ik$itn$1@dont-email.me>
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<030135f6-d63c-4b9b-8461-0ae08cfd5912n@googlegroups.com>
<93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com>
<81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com>
<38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com>
<caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com>
<805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com>
<s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com>
<4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com>
<86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com>
<110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com>
<3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com>
<51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com>
<859da8cd-bf0b-478d-8d8b-b0d11252dfe1n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 2 Jun 2021 15:53:56 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c4c5cdcda7ccfa8637fcb35ba217ffd9";
logging-data="19383"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18YjabTCoMKl1/+rdOSMA4XS29PZxVEPjg="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.2
Cancel-Lock: sha1:RBRh3qxmZbZ+6YrIFZ0KmVpS2ZQ=
In-Reply-To: <859da8cd-bf0b-478d-8d8b-b0d11252dfe1n@googlegroups.com>
Content-Language: en-US
 by: Stephen Fuld - Wed, 2 Jun 2021 15:53 UTC

On 6/2/2021 2:49 AM, Quadibloc wrote:
> On Tuesday, June 1, 2021 at 8:29:28 PM UTC-6, MitchAlsup wrote:
>
>> Which gets at the heart of my critique:: why offer VLIW features at all ??
>
>> And I consider
>> the VLIW section to be a bridge too far.
>
> Upon reflection, I see the "real" reason that I'm offering the feature of
> VLIW operation.
>
> It's really quite simple.
>
> In order to offer immediates for instructions, not limited to 8-bit and
> 16-bit immediates (and offering floating-point immediates only from
> a short list, or with A-law encoding, simply did not bear thinking about
> from my point of view)...
>
> I either would have had to encode the length of instructions in an
> unwieldy way, or use the block scheme that I'm using.
>
> And _if_ I organize the program code into blocks (as an option, since
> basic format code with no headers is possible) then not only is it _easy_
> to offer VLIW...
>
> but, as well, I kind of feel that if I don't offer VLIW, I don't have a good
> enough excuse to be organizing program code into blocks! If I've already
> _got_ a 256-bit VLIW, I should be offering the usual amenities with it.
> And since blisteringly fast speed is another one of the primary objectives
> of my ISA design, it's a natural fit.

This last part seems to be circular logic. If I do blocks, then I
should do VLIW (because it is easy), and if I don't do VLIW, I shouldn't
do blocks. But your argument for doing both seems flawed. As I
understand it, it is to offer longer immediates. But Mitch has
demonstrated a way to have fixed length "instructions" (32 bits), with
longer immediates. Why is your solution better than his?

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Squeezing Those Bits: Concertina II

<2021Jun2.183620@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=17327&group=comp.arch#17327

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Squeezing Those Bits: Concertina II
Date: Wed, 02 Jun 2021 16:36:20 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 60
Message-ID: <2021Jun2.183620@mips.complang.tuwien.ac.at>
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com> <805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com> <563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com> <s93lcf$1p1$1@dont-email.me> <2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com> <4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com> <s94le0$3cr$1@dont-email.me> <d934adf6-2832-4f14-8235-d3bddc8f0c26n@googlegroups.com> <s963s5$4sa$1@newsreader4.netcologne.de> <s97m1u$8kh$1@dont-email.me>
Injection-Info: reader02.eternal-september.org; posting-host="95fe510b8fd0bb43c149926456c0108d";
logging-data="25833"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/QagSUQrJKi05XuL/Bduuh"
Cancel-Lock: sha1:F8p+P5CfOJWzLcscVSIDXfnq9iA=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Wed, 2 Jun 2021 16:36 UTC

Marcus <m.delete@this.bitsnbites.eu> writes:
>A different axis is the dynamic instruction count (i.e. instruction
>execution frequencies) which I think is more relevant for
>performance analysis/tuning of an ISA.

One would think so, but the problem is that in a corpus of large
programs the dynamically executed instructions are mostly confined to
a few hot spots, and the rest of the large programs plays hardly
any role. And as a consequence, the most frequent dynamically
executed instruction sequences tend to be not very representative of
what happens in other programs.

That was certainly our experience in our work on virtual machine
superinstructions (a sequence of virtual instructions is combined into
one virtual superinstruction): When selecting superinstructions based
on a set of benchmarks, and then measuring the performance of a
benchmark that is not in the set, we saw better results when selecting
the statically most frequently occurring sequences than when selecting
the dynamically most frequently occurring sequences. E.g., read
Chapter 3 of

@MastersThesis{eller05,
author = {Helmut Eller},
title = {Optimizing Interpreters with Superinstructions},
school = {TU Wien},
year = {2005},
type = {Diplomarbeit},
url = {http://www.complang.tuwien.ac.at/Diplomarbeiten/eller05.ps.gz},
abstract = {Superinstructions can be used to make virtual
machine (VM) interpreters faster. A superinstruction
is a combination of simpler VM instructions which
can be executed faster than the corresponding
sequence of simpler VM instructions, because the
interpretative overhead, like instruction dispatch
and argument fetching, is reduced. This work
discusses the following three topics related to
superinstructions. First, I present some heuristics
to choose superinstructions. I evaluated the
heuristics for Forth and Java programs. If the
number of allowed superinstructions was very large,
$> 1000$, then the heuristic which chooses all
possible subsequences up to length 4 achieved the
best results. If the number of allowed
superinstructions was more limited, then a heuristic
which favors short sequences and sequences which
occur in many different programs and many different
basic blocks performed better than the
others. Second, I compare a simple greedy algorithm
and an optimal algorithm to cover a program with
superinstructions. I found that the greedy algorithm
achieves almost optimal results. Finally, I compare
superinstructions with non-sequential patterns. In
my experiments, superinstructions performed slightly
better than non-sequential patterns.}
}
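
To make the static/dynamic distinction concrete, here is a minimal sketch
(not taken from the thesis; the VM instruction names and the execution
counts are made up for illustration). It counts two-instruction sequences
once per place they appear in the code, and then again weighted by how
often the containing basic block ran:

```python
from collections import Counter

def sequences(code, n):
    """All length-n windows of consecutive VM instructions in one basic block."""
    return [tuple(code[i:i + n]) for i in range(len(code) - n + 1)]

def static_counts(blocks, n=2):
    """Count each sequence once per place it appears in the program text."""
    counts = Counter()
    for code, _exec_count in blocks:
        counts.update(sequences(code, n))
    return counts

def dynamic_counts(blocks, n=2):
    """Weight each appearance by how often its basic block was executed."""
    counts = Counter()
    for code, exec_count in blocks:
        for seq in sequences(code, n):
            counts[seq] += exec_count
    return counts

# Hypothetical profile: (basic block, execution count). One hot loop dominates.
blocks = [
    (["lit", "add", "store"], 3),
    (["lit", "add", "fetch"], 2),
    (["dup", "mul", "dup", "mul"], 1_000_000),
    (["lit", "add"], 5),
]

static_best = static_counts(blocks).most_common(1)[0][0]
dynamic_best = dynamic_counts(blocks).most_common(1)[0][0]
print("static pick: ", static_best)
print("dynamic pick:", dynamic_best)
```

In this toy profile the dynamic count is dominated by the one hot loop, so
it picks a sequence that appears in only one block, while the static count
picks the sequence that shows up in most of the code. That is only the
shape of the effect; the measurements are in the thesis.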

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Squeezing Those Bits: Concertina II

<21c9b7a3-6dbe-4f84-a3bc-e3971552e772n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17328&group=comp.arch#17328

Newsgroups: comp.arch
X-Received: by 2002:ae9:ef55:: with SMTP id d82mr28164702qkg.3.1622657411742;
Wed, 02 Jun 2021 11:10:11 -0700 (PDT)
X-Received: by 2002:aca:33d4:: with SMTP id z203mr4618127oiz.51.1622657411529;
Wed, 02 Jun 2021 11:10:11 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 2 Jun 2021 11:10:11 -0700 (PDT)
In-Reply-To: <s989ik$itn$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:f8e3:d700:f47d:a77e:92e:d2c5;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:f8e3:d700:f47d:a77e:92e:d2c5
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<030135f6-d63c-4b9b-8461-0ae08cfd5912n@googlegroups.com> <93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com>
<81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com> <38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com>
<caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com> <805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com> <s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com> <4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com> <86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com>
<110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com> <3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com>
<51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com> <859da8cd-bf0b-478d-8d8b-b0d11252dfe1n@googlegroups.com>
<s989ik$itn$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <21c9b7a3-6dbe-4f84-a3bc-e3971552e772n@googlegroups.com>
Subject: Re: Squeezing Those Bits: Concertina II
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 02 Jun 2021 18:10:11 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Quadibloc - Wed, 2 Jun 2021 18:10 UTC

On Wednesday, June 2, 2021 at 9:53:58 AM UTC-6, Stephen Fuld wrote:
> But Mitch has
> demonstrated a way to have fixed length "instructions" (32 bits), with
> longer immediates. Why is your solution better than his?

I don't know for sure what his solution is, so I can't really answer that;
and, of course, Mitch has far more knowledge and experience in CPU
design than I do.

However, IIRC, his solution actually did require _some_ sequential
decoding of instructions, whereas my scheme, whatever other flaws
it may have, provides for everything being decoded in parallel.

John Savard

Re: Squeezing Those Bits: Concertina II

<ad2a41ce-c25e-4f84-b77c-bea8550f3b7bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17329&group=comp.arch#17329

Newsgroups: comp.arch
X-Received: by 2002:aed:20e3:: with SMTP id 90mr25504267qtb.165.1622658441910; Wed, 02 Jun 2021 11:27:21 -0700 (PDT)
X-Received: by 2002:a54:4794:: with SMTP id o20mr22133490oic.99.1622658441656; Wed, 02 Jun 2021 11:27:21 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.uzoreto.com!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 2 Jun 2021 11:27:21 -0700 (PDT)
In-Reply-To: <4fb02966-46dc-4218-a26b-836ac68ecbb3n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:ecba:d651:c638:300e; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:ecba:d651:c638:300e
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com> <030135f6-d63c-4b9b-8461-0ae08cfd5912n@googlegroups.com> <93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com> <81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com> <38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com> <caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com> <805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com> <563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com> <s93lcf$1p1$1@dont-email.me> <2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com> <4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com> <7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com> <86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com> <110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com> <3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com> <51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com> <4fb02966-46dc-4218-a26b-836ac68ecbb3n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ad2a41ce-c25e-4f84-b77c-bea8550f3b7bn@googlegroups.com>
Subject: Re: Squeezing Those Bits: Concertina II
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 02 Jun 2021 18:27:21 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 186
 by: MitchAlsup - Wed, 2 Jun 2021 18:27 UTC

On Tuesday, June 1, 2021 at 10:33:28 PM UTC-5, Quadibloc wrote:
> On Tuesday, June 1, 2021 at 8:29:28 PM UTC-6, MitchAlsup wrote:
> > On Tuesday, June 1, 2021 at 9:04:00 PM UTC-5, Quadibloc wrote:
>
> > > But I thought if a subroutine is bigger than 4,096 bytes, you use one BALR instruction,
> > > and just fill several base registers with BASE, BASE+4,096, BASE+8,192 and so on.
> > > After all, if you have any forwards branches, it would be too late for another BALR
> > > when you cross the boundary...
>
> > Right, so you burn up 2,3,4 registers for branch base registers, and burn up another 2,3,4
> > registers for large arrays, and then you remember you only have 13 usable registers, so
> > you are left with 6 registers in which to "do work". This is the problem of base register
> > machines and why modern machines have "different arithmetic" for branching [ip+disp]
> > as compared to memory referencing [Rbase+disp].
> Ah.
>
> Then, it's a _good_ thing that my design uses 16-bit displacements, despite the fact
> that it has to make some sacrifices in order to do so.
<
Cuts down on the register waste, so do 32-bit and 64-bit displacements.
>
> I haven't used program-relative addressing for branches very much in the design,
> on the other hand. Only the 16-bit branch instruction is program counter-relative.
>
> But I _do_ recognize the problem of large arrays. I did _not_ like the idea that if you
> have an array bigger than 64K (or 4K on the 360) you need basically one base register
> per array. That's why I included "Array Mode" addressing in my design. That's where
> base register 0, since it isn't used as a base register, points to a table... containing the
> start addresses of arrays. So the index register gives the position in the array... basically,
> it's indirect post-indexed addressing, but instead of being general, it's only possible for
> the array addresses in this one table... that gets loaded into cache.
> > I worked at S.E.L. 1980-1983 and they were adamant that it was not SEL.
> I'll have to take another look at some of their old advertisements then. Of course,
> since they're not writing my paycheck, I'm not too worried about their feelings...
> > Which gets at the heart of my critique:: why offer VLIW features at all ??
> That's a good question. After all, Ivan Godard is a nice guy, so why should I
> want to annoy him?
>
> Since he has mentioned the Itanium as a promising architecture in some of his
<
When it began, Itanium had promise.
When the first chips taped out, it no longer did.
<
> posts, and his intent is to make a lightweight design that competes with OoO
> without its overhead... the presence of VLIW features could indeed be construed
> as an intent of competing with him - although I am woefully unequipped to actually
> manage that.
>
> One thing is that I noted in the VLIW and VLIW-like designs I've seen that they
> just have the equivalent of a 'break' bit. They indicate that "here's a chunk of instructions
> that can be executed simultaneously... and now here's the next chunk".
<
The break bit is equivalent to "the rest of the bundle is NoOps."
>
> That's all very well, but is that enough to tell a dumb CPU what to do?
<
I, personally, don't think building dumb CPUs is a worthy endeavor.
>
> Suppose there's a dependency. If the machine has to take a cycle between starting
> each instruction (one-wide decode unit) and an instruction takes X cycles to execute, then
> I can deal with dependencies by coding X-1 instructions between the instructions involved in
> the dependency.
<
And this is where VLIW breaks down. What if X is variable or X changes between implementations ?
<
Mill solves this problem by distributing applications in re-compiled intermediate format.
And Mill has a specializer (code generator) for each implementation.
Mill does not have a fixed ISA, but one tuned to each implementation.
<
So, if you want VLIW, you don't want a fixed ISA !!
>
> If I can't do that (the program may be executed on different implementations) and I want
> the program to wait for only the minimum amount of time, but the implementation is dumb
> and doesn't have much in the way of interlocks let alone OoO...
>
> then I figured I would need to have a way to say not only 'do this bunch of instructions later' but
> instead to indicate exactly which instruction another instruction depends on.
<
Operand register generally suffices.
>
> So I came up with the U bit, the D bit, and the offset field as required for indicating dependency
> relations. I thought this was worth putting on display as a way to more fully flesh out the VLIW
> model.
> > > Aside from the affectation of allowing the System/360 to determine how long an
> > > instruction has the right to be, its goals may be questionable. The instruction set looks
> > > a lot like RISC, can go into a VLIW mode with instruction predication and explicit
> > > indication of parallelism, and includes the functions found on a CISC architecture.
>
> > While S/360 has warts, it is massively better than x86.
> Amen. But the massive success of x86 shows that the Highlander model applies: "There can be
> only one". Because people want to own the machine that will run nearly every piece of available
> software.
>
> Which is why, when I started with the original Concertina architecture... I decided that for an
> architecture to have any chance of being the replacement for the x86, it couldn't be missing
> some feature that a lot of people would desperately want.
<
99% of PC users don't give a crap about the architecture of the machine, only the developers do !!
>
> Hence, everything but the kitchen sink.
<
Taking the light weight nature of VLIW and crushing it with added baggage.
<
> > And LD-Ops work surprisingly well and you can even build pipelines that make the LD
> > parts that hit in the cache to appear to operate at register speeds. In effect, you put
> > AGEN after DECODE and you put EXECUTE after LD-ALIGN.
<
> This went over my head, so I'll have to look it up.
<
FETCH-DECODE-AGEN-CACHE-LDALIGN-EXEC-WRITE
<
> > > So it hasn't made up its mind what kind of a computer it wants to be. One can have a
> > > great big OoO implementation, and one can have a tiny VLIW-centric implementation
> > > that only runs efficiently if you use the VLIW mode features correctly. An architecture
> > > that can be implemented across a range of systems, of course, is the great idea that
> > > made the System/360 famous.
>
> > I contend that what makes S/360 ISA reasonable across a large scale of implementations
> > is NOT adding too many features !! and keeping those offered features clean. And I consider
> > the VLIW section to be a bridge too far.
<
> Although today's iteration of S/360 does have a lot of legacy cruft, and therefore a lot
> of features, I certainly admit the truth of that.
<
Today's mainframes from IBM are not S/360 (or even S/370, 3080, 3090) they are another
super extended 64-bit, but sometimes 32-bit ISA with 3 kinds of floating point (HEX, binary, decimal)
and everything including the kitchen sink.
>
> It is indeed odd to try to unify VLIW with CISC, though. After all, CISC means you've got some
> stuff in your architecture that has to be microcoded. That won't derive benefits from VLIW
> features.
<
Disagree on 2 counts: CISC does not need microcode and some RISC designs should consider
being microcoded. I consider microcode a programmatic way to build sequencers, nothing more
or less.
<
> > > But these days, the equivalent of a microprogrammed Model 30 is hardly worth using
> > > in a pocket calculator - and in my architecture, programs should be written differently
> > > for efficient operation on different sized implementations, which somewhat reduces the
> > > benefits of a common instruction set.
>
> > Today, the smallest scale should be no smaller than R3000+FPU, and can be as big as
> > 8-wide SuperScalar Out of Order with execution window of ~224 instructions.
<
> For general-purpose computing, yes - there's no benefit to making a CPU lighter than a
> $15 smartphone CPU. But they still make 8-bit CPUs for special purposes in embedded
> use.
> > Have you considered how you can make a machine using your ISA that can decode and
> > execute more than 1 bundle but smaller than 2 bundles per cycle ??
<
> Not really. I've assumed that _one_ bundle, which is 8-wide superscalar, is a high-end
> implementation, and a low-end one would start no more than one instruction per cycle.
> I've sought to keep the architecture of my various implementations open to improvements
> in technology, so that it wouldn't be impossible to implement it at *two* bundles per
> cycle.
>
> But let's say the underlying economics meant that one had to offer this architecture
> on a 12-wide superscalar implementation.
>
> If a one-instruction per cycle implementation is possible, then processing the block
> while not immediately decoding or executing all the instructions in the block is possible.
>
> But in order for the pseudo-immediates to work, one does have to _fetch_ a whole
> block at a time.
>
> So one has a block-wide area into which a block is fetched, and from which the header
> is decoded.
>
> After those things are done, if the next stage to decode instructions and start executing
> them was only four wide, so it would take two cycles before the next block started, this
> wouldn't be unworkable - any more than a one-instruction per cycle implementation was
> unworkable.
>
> So I can now think of a way to handle your example.
>
> I have to be able to fetch 512 bits per cycle, and buffer the 512 bits, and decode two headers
> in one cycle.
>
> Then I can feed 1 1/2 blocks to three four-wide superscalar pipelines.
>
> Four instructions left over?
>
> Fine. Next cycle, only fetch 256 bits and decode one header.
>
> Nothing left over? Next cycle, decode two headers.
>
> John Savard
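
The alternating fetch pattern described in the quoted text above can be
sketched as a small simulation. The assumptions are mine, not from the
post: each 256-bit block is taken to supply 8 instruction slots (headers
and pseudo-immediates are ignored), the back end is 12 wide, and there are
no branches or stalls.

```python
BLOCK_INSTRS = 8    # assumed: one 256-bit block supplies 8 instruction slots
ISSUE_WIDTH = 12    # three four-wide pipelines

def fetch_schedule(cycles):
    """Return (blocks_fetched, issued, left_over) for each cycle."""
    buffered = 0
    schedule = []
    for _ in range(cycles):
        # Fetch the fewest whole blocks (at most two) that fill the issue slots.
        blocks = 0
        while blocks < 2 and buffered + blocks * BLOCK_INSTRS < ISSUE_WIDTH:
            blocks += 1
        buffered += blocks * BLOCK_INSTRS
        issued = min(buffered, ISSUE_WIDTH)
        buffered -= issued
        schedule.append((blocks, issued, buffered))
    return schedule

for cycle, (blocks, issued, left) in enumerate(fetch_schedule(4), start=1):
    print(f"cycle {cycle}: fetch {blocks} block(s), issue {issued}, left over {left}")
```

Under these assumptions the fetch unit alternates between two blocks and
one block per cycle, which averages out to the 1 1/2 blocks per cycle that
a 12-wide back end consumes.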


Re: Squeezing Those Bits: Concertina II

<420d078a-a71e-435b-ad65-f5bdaedd7cc1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17330&group=comp.arch#17330

Newsgroups: comp.arch
X-Received: by 2002:a05:620a:806:: with SMTP id s6mr18545839qks.68.1622658688997; Wed, 02 Jun 2021 11:31:28 -0700 (PDT)
X-Received: by 2002:a9d:6743:: with SMTP id w3mr26114929otm.82.1622658688766; Wed, 02 Jun 2021 11:31:28 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 2 Jun 2021 11:31:28 -0700 (PDT)
In-Reply-To: <21c9b7a3-6dbe-4f84-a3bc-e3971552e772n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:ecba:d651:c638:300e; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:ecba:d651:c638:300e
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com> <030135f6-d63c-4b9b-8461-0ae08cfd5912n@googlegroups.com> <93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com> <81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com> <38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com> <caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com> <805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com> <563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com> <s93lcf$1p1$1@dont-email.me> <2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com> <4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com> <7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com> <86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com> <110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com> <3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com> <51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com> <859da8cd-bf0b-478d-8d8b-b0d11252dfe1n@googlegroups.com> <s989ik$itn$1@dont-email.me> <21c9b7a3-6dbe-4f84-a3bc-e3971552e772n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <420d078a-a71e-435b-ad65-f5bdaedd7cc1n@googlegroups.com>
Subject: Re: Squeezing Those Bits: Concertina II
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 02 Jun 2021 18:31:28 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 13
 by: MitchAlsup - Wed, 2 Jun 2021 18:31 UTC

On Wednesday, June 2, 2021 at 1:10:12 PM UTC-5, Quadibloc wrote:
> On Wednesday, June 2, 2021 at 9:53:58 AM UTC-6, Stephen Fuld wrote:
> > But Mitch has
> > demonstrated a way to have fixed length "instructions" (32 bits), with
> > longer immediates. Why is your solution better than his?
> I don't know for sure what his solution is, so I can't really answer that;
> and, of course, Mitch has far more knowledge and experience in CPU
> design than I do.
>
> However, IIRC, his solution actually did require _some_ sequential
> decoding of instructions, whereas my scheme, whatever other flaws
> it may have, provides for everything being decoded in parallel.
>
> John Savard

Re: Squeezing Those Bits: Concertina II

<7d48604f-f7cd-43f8-be3c-ad3fc9242058n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17332&group=comp.arch#17332

Newsgroups: comp.arch
X-Received: by 2002:ae9:eb84:: with SMTP id b126mr14431723qkg.331.1622659166849; Wed, 02 Jun 2021 11:39:26 -0700 (PDT)
X-Received: by 2002:a54:4e81:: with SMTP id c1mr23725709oiy.119.1622659166612; Wed, 02 Jun 2021 11:39:26 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.uzoreto.com!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 2 Jun 2021 11:39:26 -0700 (PDT)
In-Reply-To: <21c9b7a3-6dbe-4f84-a3bc-e3971552e772n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:ecba:d651:c638:300e; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:ecba:d651:c638:300e
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com> <030135f6-d63c-4b9b-8461-0ae08cfd5912n@googlegroups.com> <93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com> <81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com> <38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com> <caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com> <805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com> <563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com> <s93lcf$1p1$1@dont-email.me> <2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com> <4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com> <7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com> <86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com> <110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com> <3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com> <51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com> <859da8cd-bf0b-478d-8d8b-b0d11252dfe1n@googlegroups.com> <s989ik$itn$1@dont-email.me> <21c9b7a3-6dbe-4f84-a3bc-e3971552e772n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7d48604f-f7cd-43f8-be3c-ad3fc9242058n@googlegroups.com>
Subject: Re: Squeezing Those Bits: Concertina II
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 02 Jun 2021 18:39:26 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 31
 by: MitchAlsup - Wed, 2 Jun 2021 18:39 UTC

On Wednesday, June 2, 2021 at 1:10:12 PM UTC-5, Quadibloc wrote:
> On Wednesday, June 2, 2021 at 9:53:58 AM UTC-6, Stephen Fuld wrote:
> > But Mitch has
> > demonstrated a way to have fixed length "instructions" (32 bits), with
> > longer immediates. Why is your solution better than his?
> I don't know for sure what his solution is, so I can't really answer that;
> and, of course, Mitch has far more knowledge and experience in CPU
> design than I do.
>
> However, IIRC, his solution actually did require _some_ sequential
> decoding of instructions, whereas my scheme, whatever other flaws
> it may have, provides for everything being decoded in parallel.
<
In 30 total gates, and in 4-gates of delay, one can decode My 66000
instructions to determine their length, and the offsets in the instruction
stream of any constants. IBM 360 does this in 2-gates of delay. Both
share the advantage that the bits decoded are all fixed over all instruction
formats. So, My 66000 is only 2-gates behind at this point.
<
From here we set up a tree of find-next-instruction logic (2-gates) and we
parse (determine starting word and all constant words) 2 then 4 then 8
then 16 instructions at 2-gates of delay each level. So parsing 16 instructions
per cycle is a 12-gate delay problem. Easy Peasy in a 16-gate cycle. The
original IBM 360 would be 2-gates shorter, the current monstrosity would
be longer.
<
So, at these issue widths, the problem is more ICache access width, and
the fact that every 9.2 instructions one "takes" a branch (10.3 words).
So the machines with "simple" front ends are FETCH limited not decode
or execute limited.
>
> John Savard
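
The doubling tree described above (2, then 4, then 8, then 16 instructions)
can be modelled in software as pointer doubling. This is only a sketch of
the idea, not the actual My 66000 logic: `lengths[i]` stands for the length
in words that the length decoder would report if an instruction started at
word `i`, and the function names are hypothetical. The gate figures are the
ones quoted in the post.

```python
def find_starts(lengths, levels=4):
    """Find instruction start words in `levels` doubling steps.

    Every word is length-decoded independently; words that turn out to be
    constants produce garbage lengths that are simply never visited.
    """
    n = len(lengths)
    # jump[i]: word reached by skipping one instruction at i; n means "past the end".
    jump = [min(i + lengths[i], n) for i in range(n)] + [n]
    starts = [0]
    for _ in range(levels):
        # Each level doubles the number of known starts: 2, 4, 8, 16.
        starts = starts + [jump[s] for s in starts]
        # Compose the jump table with itself so it skips twice as far.
        jump = [jump[jump[i]] for i in range(n + 1)]
    return sorted({s for s in starts if s < n})

def sequential_starts(lengths):
    """Reference: walk the instruction stream one instruction at a time."""
    starts, i = [], 0
    while i < len(lengths):
        starts.append(i)
        i += lengths[i]
    return starts

# Gate-delay arithmetic as stated in the post.
LENGTH_DECODE_GATES = 4
GATES_PER_LEVEL = 2
LEVELS = 4
total_gates = LENGTH_DECODE_GATES + GATES_PER_LEVEL * LEVELS

lengths = [1, 2, 1, 1, 3, 1, 1, 1]   # made-up per-word length decodes
print(find_starts(lengths), sequential_starts(lengths), total_gates)
```

Each pass of the loop corresponds to one 2-gate level of the tree, so four
levels after the 4-gate length decode gives the 12 gates of delay mentioned
for parsing 16 instructions.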

Re: Squeezing Those Bits: Concertina II

<s98o26$1e81$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=17333&group=comp.arch#17333

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!/FKOcGQMirZgkZJCo9x3IA.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Squeezing Those Bits: Concertina II
Date: Wed, 2 Jun 2021 22:01:12 +0200
Organization: Aioe.org NNTP Server
Lines: 19
Message-ID: <s98o26$1e81$1@gioia.aioe.org>
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com>
<81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com>
<38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com>
<caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com>
<805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com>
<s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com>
<4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com>
<86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com>
<110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com>
<3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com>
<51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com>
<4fb02966-46dc-4218-a26b-836ac68ecbb3n@googlegroups.com>
<ad2a41ce-c25e-4f84-b77c-bea8550f3b7bn@googlegroups.com>
NNTP-Posting-Host: /FKOcGQMirZgkZJCo9x3IA.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Wed, 2 Jun 2021 20:01 UTC

MitchAlsup wrote:
> On Tuesday, June 1, 2021 at 10:33:28 PM UTC-5, Quadibloc wrote:
>> Since he has mentioned the Itanium as a promising architecture in some of his
> <
> When it began, Itanium had promise.
> When the first chips taped out, it no longer did.

This!

When the first Itanium came out, it was 5 years late, and still only a
partial implementation. In the CPU business where all the competitors,
including AMD64, still enjoyed full Moore's law scaling, this was a
capital offense.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Squeezing Those Bits: Concertina II

<7d9b1862-5d8d-4b07-8c13-9f1caef37cden@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17336&group=comp.arch#17336

Newsgroups: comp.arch
X-Received: by 2002:a05:6214:2aa1:: with SMTP id js1mr21436143qvb.11.1622676809511;
Wed, 02 Jun 2021 16:33:29 -0700 (PDT)
X-Received: by 2002:a9d:397:: with SMTP id f23mr8411546otf.22.1622676809132;
Wed, 02 Jun 2021 16:33:29 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 2 Jun 2021 16:33:28 -0700 (PDT)
In-Reply-To: <ad2a41ce-c25e-4f84-b77c-bea8550f3b7bn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:f8e3:d700:4473:74b7:5972:fc62;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:f8e3:d700:4473:74b7:5972:fc62
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<030135f6-d63c-4b9b-8461-0ae08cfd5912n@googlegroups.com> <93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com>
<81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com> <38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com>
<caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com> <805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com> <s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com> <4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com> <86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com>
<110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com> <3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com>
<51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com> <4fb02966-46dc-4218-a26b-836ac68ecbb3n@googlegroups.com>
<ad2a41ce-c25e-4f84-b77c-bea8550f3b7bn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7d9b1862-5d8d-4b07-8c13-9f1caef37cden@googlegroups.com>
Subject: Re: Squeezing Those Bits: Concertina II
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 02 Jun 2021 23:33:29 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 21
 by: Quadibloc - Wed, 2 Jun 2021 23:33 UTC

On Wednesday, June 2, 2021 at 12:27:23 PM UTC-6, MitchAlsup wrote:
> On Tuesday, June 1, 2021 at 10:33:28 PM UTC-5, Quadibloc wrote:

> > Suppose there's a dependency. If the machine has to take a cycle between starting
> > each instruction (one-wide decode unit) and an instruction takes X cycles to execute, then
> > I can deal with dependencies by coding X-1 instructions between the instructions involved in
> > the dependency.

> And this is where VLIW breaks down. What if X is variable, or X changes between implementations?

Ah, but I guess I didn't make myself clear here.

That's how I could deal with dependencies on a RISC CPU, which doesn't have the extra
features that VLIW offers.

If I have OoO, obviously, if X changes between implementations, it's not an issue.

If I have VLIW, I can now explicitly indicate 'this instruction depends on that previous
instruction', so the processor knows to wait only for the minimum time necessary (or, at
worst, the worst-case time required by the previous instruction).
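The RISC-style spacing rule above (put X-1 instructions between a producer and its consumer) can be sketched as a tiny scheduler. This is only an illustrative model under stated assumptions: one instruction starts per cycle, every instruction has the same latency X, and the names pad_with_nops, program and latency are hypothetical, not part of any design discussed here.

```python
def pad_with_nops(program, latency):
    """Insert 'nop' slots so each consumer starts no earlier than
    `latency` slots after its producer (i.e. latency-1 slots in between).

    program: list of (dest_register, source_registers) tuples.
    Assumes a single-issue machine with a fixed latency for every instruction.
    """
    out = []
    ready_at = {}  # register -> first slot at which a consumer may start
    for dest, srcs in program:
        earliest = max((ready_at.get(r, 0) for r in srcs), default=0)
        while len(out) < earliest:
            out.append("nop")          # nothing useful to schedule: pad
        out.append((dest, srcs))
        ready_at[dest] = len(out) - 1 + latency
    return out


# A producer followed directly by its consumer, with X = 3:
schedule = pad_with_nops([("r1", ()), ("r2", ("r1",))], latency=3)
# Two padding slots (X-1) end up between the two instructions.
```

If X differs on another implementation, the padding baked into the code is wrong for that machine, which is the objection raised in the quoted text.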

John Savard

Re: Squeezing Those Bits: Concertina II

<s99cec$paq$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17339&group=comp.arch#17339
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Squeezing Those Bits: Concertina II
Date: Wed, 2 Jun 2021 18:49:00 -0700
Organization: A noiseless patient Spider
Lines: 73
Message-ID: <s99cec$paq$1@dont-email.me>
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com>
<81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com>
<38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com>
<caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com>
<805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com>
<s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com>
<4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com>
<86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com>
<110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com>
<3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com>
<51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com>
<4fb02966-46dc-4218-a26b-836ac68ecbb3n@googlegroups.com>
<ad2a41ce-c25e-4f84-b77c-bea8550f3b7bn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 3 Jun 2021 01:49:00 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c134b88f199225e8e96db49f64f2fa9d";
logging-data="25946"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18VTrl5/q0EDdhlfBjmdUjz"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.1
Cancel-Lock: sha1:R/5L3SianLpdO4xoRwiAhTt7cD0=
In-Reply-To: <ad2a41ce-c25e-4f84-b77c-bea8550f3b7bn@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Thu, 3 Jun 2021 01:49 UTC

On 6/2/2021 11:27 AM, MitchAlsup wrote:
> On Tuesday, June 1, 2021 at 10:33:28 PM UTC-5, Quadibloc wrote:
>> On Tuesday, June 1, 2021 at 8:29:28 PM UTC-6, MitchAlsup wrote:
>>> On Tuesday, June 1, 2021 at 9:04:00 PM UTC-5, Quadibloc wrote:
>>
>>>> But I thought if a subroutine is bigger than 4,096 bytes, you use one BALR instruction,
>>>> and just fill several base registers with BASE, BASE+4,096, BASE+8,192 and so on.
>>>> After all, if you have any forwards branches, it would be too late for another BALR
>>>> when you cross the boundary...
>>
>>> Right, so you burn up 2,3,4 registers for branch base registers, and burn up another 2,3,4
>>> registers for large arrays, and then you remember you only have 13 usable registers, so
>>> you are left with 6 registers in which to "do work". This is the problem of base register
>>> machines and why modern machines have "different arithmetic" for branching [ip+disp]
>>> as compared to memory referencing [Rbase+disp].
>> Ah.
>>
>> Then, it's a _good_ thing that my design uses 16-bit displacements, despite the fact
>> that it has to make some sacrifices in order to do so.
> <
> Cuts down on the register waste, so do 32-bit and 64-bit displacements.
>>
>> I haven't used program-relative addressing for branches very much in the design,
>> on the other hand. Only the 16-bit branch instruction is program counter-relative.
>>
>> But I _do_ recognize the problem of large arrays. I did _not_ like the idea that if you
>> have an array bigger than 64K (or 4K on the 360) you need basically one base register
>> per array. That's why I included "Array Mode" addressing in my design. That's where
>> base register 0, since it isn't used as a base register, points to a table... containing the
>> start addresses of arrays. So the index register gives the position in the array... basically,
>> it's indirect post-indexed addressing, but instead of being general, it's only possible for
>> the array addresses in this one table... that gets loaded into cache.
>>> I worked at S.E.L. 1980-1983 and they were adamant that it was not SEL.
>> I'll have to take another look at some of their old advertisements then. Of course,
>> since they're not writing my paycheck, I'm not too worried about their feelings...
>>> Which gets at the heart of my critique:: why offer VLIW features at all ??
>> That's a good question. After all, Ivan Godard is a nice guy, so why should I
>> want to annoy him?
>>
>> Since he has mentioned the Itanium as a promising architecture in some of his
> <
> When it began, Itanium had promise.
> When the first chips taped out, it no longer did.
> <
>> posts, and his intent is to make a lightweight design that competes with OoO
>> without its overhead... the presence of VLIW features could indeed be construed
>> as an intent of competing with him - although I am woefully unequipped to actually
>> manage that.
>>
>> One thing is that I noted in the VLIW and VLIW-like designs I've seen that they
>> just have the equivalent of a 'break' bit. They indicate that "here's a chunk of instructions
>> that can be executed simultaneously... and now here's the next chunk".
> <
> The break bit is equivalent to "the rest of the bundle is NoOps."
>>
>> That's all very well, but is that enough to tell a dumb CPU what to do?
> <
> I, personally, don't think building dumb CPUs is a worthy endeavor.
>>
>> Suppose there's a dependency. If the machine has to take a cycle between starting
>> each instruction (one-wide decode unit) and an instruction takes X cycles to execute, then
>> I can deal with dependencies by coding X-1 instructions between the instructions involved in
>> the dependency.
> <
> And this is where VLIW breaks down. What if X is variable, or X changes between implementations?
> <
> Mill solves this problem by distributing applications in re-compiled intermediate format.
> And Mill has a specializer (code generator) for each implementation.
> Mill does not have a fixed ISA, but one tuned to each implementation.
> <
> So, if you want VLIW, you don't want a fixed ISA !!

re-compiled -> pre-compiled?

Re: Squeezing Those Bits: Concertina II

<8ecd4d89-47b3-427e-be13-91cdf0476668n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17340&group=comp.arch#17340
X-Received: by 2002:a05:6214:e4d:: with SMTP id o13mr18530600qvc.19.1622687812880;
Wed, 02 Jun 2021 19:36:52 -0700 (PDT)
X-Received: by 2002:a05:6830:40a4:: with SMTP id x36mr27465637ott.342.1622687812564;
Wed, 02 Jun 2021 19:36:52 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 2 Jun 2021 19:36:52 -0700 (PDT)
In-Reply-To: <ad2a41ce-c25e-4f84-b77c-bea8550f3b7bn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:f8e3:d700:4473:74b7:5972:fc62;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:f8e3:d700:4473:74b7:5972:fc62
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<030135f6-d63c-4b9b-8461-0ae08cfd5912n@googlegroups.com> <93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com>
<81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com> <38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com>
<caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com> <805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com> <s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com> <4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com> <86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com>
<110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com> <3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com>
<51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com> <4fb02966-46dc-4218-a26b-836ac68ecbb3n@googlegroups.com>
<ad2a41ce-c25e-4f84-b77c-bea8550f3b7bn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8ecd4d89-47b3-427e-be13-91cdf0476668n@googlegroups.com>
Subject: Re: Squeezing Those Bits: Concertina II
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Thu, 03 Jun 2021 02:36:52 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Quadibloc - Thu, 3 Jun 2021 02:36 UTC

On Wednesday, June 2, 2021 at 12:27:23 PM UTC-6, MitchAlsup wrote:

> Taking the light weight nature of VLIW and crushing it with added baggage.

Yes, that's a fair criticism. But implementations can omit features. Including the option of
VLIW. (Preferred is to accept VLIW programs, but basically ignore all the hinting.)

> Today's mainframes from IBM are not S/360 (or even S/370, 3080, 3090) they are another
> super extended 64-bit, but sometimes 32-bit ISA with 3 kinds of floating point (HEX, binary, decimal)
> and everything including the kitchen sink.

Although they still have 00 -> 16 bits, 11-> 48 bits, and 01 and 10 -> 32 bits.
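As a minimal sketch of that length rule, the top two bits of the first opcode byte select the instruction length. The function name is made up for illustration; the example opcodes in the comments are the classic System/360 ones.

```python
def instruction_length_bits(first_byte):
    """Return the instruction length implied by the top two opcode bits."""
    top2 = (first_byte >> 6) & 0b11
    return {0b00: 16, 0b01: 32, 0b10: 32, 0b11: 48}[top2]


# 0x1A (AR, register-register)   -> top bits 00 -> 16 bits
# 0x5A (A, register-indexed)     -> top bits 01 -> 32 bits
# 0x90 (STM, register-storage)   -> top bits 10 -> 32 bits
# 0xD2 (MVC, storage-storage)    -> top bits 11 -> 48 bits
lengths = [instruction_length_bits(op) for op in (0x1A, 0x5A, 0x90, 0xD2)]
```

Because only two fixed bits are examined, the length is known without decoding the rest of the opcode.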

And the things they've added were all things that met the needs of their customers; hex and binary
floating point give compatibility with old software on the one hand, and the better numerical
properties of IEEE-754 on the other.

John Savard

Re: Squeezing Those Bits: Concertina II

<8c585853-8d3b-4529-bed7-c42b13695f10n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17341&group=comp.arch#17341
X-Received: by 2002:a05:620a:574:: with SMTP id p20mr30242189qkp.70.1622688084667;
Wed, 02 Jun 2021 19:41:24 -0700 (PDT)
X-Received: by 2002:a9d:19ed:: with SMTP id k100mr27679304otk.329.1622688084441;
Wed, 02 Jun 2021 19:41:24 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 2 Jun 2021 19:41:24 -0700 (PDT)
In-Reply-To: <ad2a41ce-c25e-4f84-b77c-bea8550f3b7bn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:f8e3:d700:4473:74b7:5972:fc62;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:f8e3:d700:4473:74b7:5972:fc62
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<030135f6-d63c-4b9b-8461-0ae08cfd5912n@googlegroups.com> <93c20171-88e1-4b0f-9919-2723cb3cf7dbn@googlegroups.com>
<81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com> <38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com>
<caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com> <805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com> <s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com> <4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com> <86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com>
<110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com> <3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com>
<51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com> <4fb02966-46dc-4218-a26b-836ac68ecbb3n@googlegroups.com>
<ad2a41ce-c25e-4f84-b77c-bea8550f3b7bn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8c585853-8d3b-4529-bed7-c42b13695f10n@googlegroups.com>
Subject: Re: Squeezing Those Bits: Concertina II
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Thu, 03 Jun 2021 02:41:24 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Quadibloc - Thu, 3 Jun 2021 02:41 UTC

On Wednesday, June 2, 2021 at 12:27:23 PM UTC-6, MitchAlsup wrote:
> On Tuesday, June 1, 2021 at 10:33:28 PM UTC-5, Quadibloc wrote:

> > It is indeed odd to try to unify VLIW with CISC, though. After all, CISC means you've got some
> > stuff in your architecture that has to be microcoded. That won't derive benefits from VLIW
> > features.

> Disagree on 2 counts: CISC does not need microcode and some RISC designs should consider
> being microcoded. I consider microcode a programmatic way to build sequencers, nothing more
> or less.

Oh, of course. I should have been more careful.

It's just that, in general, CISC is more likely to have instructions that cry out for some kind of
microcoding; i.e. stuff like EDMK from System/360.

John Savard

Re: Squeezing Those Bits: Concertina II

<s99v64$hsp$2@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=17345&group=comp.arch#17345
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-2e93-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Squeezing Those Bits: Concertina II
Date: Thu, 3 Jun 2021 07:08:52 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <s99v64$hsp$2@newsreader4.netcologne.de>
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com>
<38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com>
<caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com>
<805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com>
<s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com>
<4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com>
<86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com>
<110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com>
<3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com>
<51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com>
<859da8cd-bf0b-478d-8d8b-b0d11252dfe1n@googlegroups.com>
<s989ik$itn$1@dont-email.me>
<21c9b7a3-6dbe-4f84-a3bc-e3971552e772n@googlegroups.com>
<7d48604f-f7cd-43f8-be3c-ad3fc9242058n@googlegroups.com>
Injection-Date: Thu, 3 Jun 2021 07:08:52 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-2e93-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd6:2e93:0:7285:c2ff:fe6c:992d";
logging-data="18329"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Thu, 3 Jun 2021 07:08 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:

> In 30 total gates, and in 4-gates of delay, one can decode My 66000
> instructions to determine their length, and the offsets in the instruction
> stream of any constants.

How many gate delays does a modern architecture usually have per cycle?
(I know different gates have different delays, but a ballpark figure
would be very interesting).

>IBM 360 does this in 2-gates of delay. Both
> share the advantage that the bits decoded are all fixed over all instruction
> formats. So, My 66000 is only 2-gates behind at this point.
><
> From here we setup a tree of find next instruction logic (2-gates) and we
> parse (determine starting word and all constant words) 2 then 4 then 8
> then 16 instructions at 2-gates of delay each level. So parsing 16 instructions
> per cycle is a 12-gate delay problem. Easy Peasy in a 16-gate cycle. The
> original IBM 360 would be 2-gates shorter, the current monstrosity would
> be longer.

Is 16 gates per cycle particularly short?
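One reading of the quoted arithmetic is: initial length decode, plus a doubling tree that costs 2 gates per level (2, 4, 8, 16 instructions). A small sketch under that assumption follows; the function and parameter names are hypothetical, and the gate counts are simply the figures quoted above.

```python
def parse_delay_gates(n_instructions, decode_gates, gates_per_level=2):
    """Gate delays to locate n instruction starts with a doubling tree.

    Assumes one tree level per doubling of the number of parsed
    instructions, each level costing `gates_per_level` gate delays.
    """
    levels = (n_instructions - 1).bit_length()  # ceil(log2(n)) for n >= 1
    return decode_gates + gates_per_level * levels


my66000 = parse_delay_gates(16, decode_gates=4)   # 4 + 2*4 = 12
ibm360 = parse_delay_gates(16, decode_gates=2)    # 2 + 2*4 = 10
```

Under this model the quoted "12-gate delay problem" and the "2-gates shorter" figure for the original 360 both fall out directly.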

Re: Squeezing Those Bits: Concertina II

<s9a90s$g5j$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=17348&group=comp.arch#17348
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Squeezing Those Bits: Concertina II
Date: Thu, 3 Jun 2021 11:56:43 +0200
Organization: A noiseless patient Spider
Lines: 74
Message-ID: <s9a90s$g5j$1@dont-email.me>
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com>
<s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com>
<4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<s94le0$3cr$1@dont-email.me>
<d934adf6-2832-4f14-8235-d3bddc8f0c26n@googlegroups.com>
<s963s5$4sa$1@newsreader4.netcologne.de> <s97m1u$8kh$1@dont-email.me>
<2021Jun2.183620@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 3 Jun 2021 09:56:44 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="2f18e550773e128cae6d1b9719bb008c";
logging-data="16563"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+OCRUmsL8VGCihjL7dA8e0zC1eSxUqxb8="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.8.1
Cancel-Lock: sha1:hfpi6lajg+4Qj8JhWYZhQvT8g1o=
In-Reply-To: <2021Jun2.183620@mips.complang.tuwien.ac.at>
Content-Language: en-US
 by: Marcus - Thu, 3 Jun 2021 09:56 UTC

On 2021-06-02 Anton Ertl wrote:
> Marcus <m.delete@this.bitsnbites.eu> writes:
>> A different axis is the dynamic instruction count (i.e. instruction
>> execution frequencies) which I think is more relevant for
>> performance analysis/tuning of an ISA.
>
> One would think so, but the problem is that in a corpus of large
> programs the dynamically executed instructions are mostly confined to
> a few hot spots, and the rest of the large programs plays hardly
> any role. And as a consequence, the most frequent dynamically
> executed instruction sequences tend to be not very representative of
> what happens in other programs.

Interesting observation. I suspected that something similar would be the
case, since looping/hot code must give a very strong bias.

I have planned to add instruction frequency profiling to my simulator (I
already have symbol-based function profiling which has proven very
useful), but I'll keep in mind the difficulties that you pointed out.

I wonder if you could improve the situation if you used something like
different scales (logarithmic?) and some sort of filtering or
thresholding (e.g. count each memory location at most N times) to
reduce the bias from loops? Or possibly categorize counts into different
bins (e.g. "cold", "medium", "hot").
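A rough sketch of what such thresholding could look like in a simulator's profiler, assuming the simulator already records an execution count per instruction address. All names here (opcode_weights, heat_bin, the cap and bin thresholds) are made up for illustration and not taken from any existing tool.

```python
import math
from collections import Counter


def opcode_weights(exec_counts, opcode_at, cap=None, log_scale=False):
    """Aggregate per-address execution counts into per-opcode weights.

    exec_counts: {address: times executed}
    opcode_at:   {address: opcode mnemonic}
    cap:         count each address at most `cap` times (None = no cap)
    log_scale:   weight each address by log2(1 + count) instead of count
    """
    weights = Counter()
    for addr, n in exec_counts.items():
        if cap is not None:
            n = min(n, cap)
        weights[opcode_at[addr]] += math.log2(1 + n) if log_scale else n
    return weights


def heat_bin(n, cold_max=10, hot_min=10_000):
    """Classify one address by its raw execution count (thresholds arbitrary)."""
    if n <= cold_max:
        return "cold"
    return "hot" if n >= hot_min else "medium"


counts = {0: 1, 4: 1_000_000, 8: 1_000_000, 12: 1}
ops = {0: "ld", 4: "add", 8: "bne", 12: "ret"}
raw = opcode_weights(counts, ops)              # loop body dominates
capped = opcode_weights(counts, ops, cap=100)  # loop bias bounded at 100
```

Whether any of these weightings gives more representative statistics is an empirical question the sketch does not answer.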

>
> That was certainly our experience in our work on virtual machine
> superinstructions (a sequence of virtual instructions is combined into
> one virtual superinstruction): When selecting superinstructions based
> on a set of benchmarks, and then measuring the performance of a
> benchmark that is not in the set, we saw better results when selecting
> the statically most frequently occurring sequences than when selecting
> the dynamically most frequently occurring sequences. E.g., read
> Chapter 3 of
>
> @MastersThesis{eller05,
> author = {Helmut Eller},
> title = {Optimizing Interpreters with Superinstructions},
> school = {TU Wien},
> year = {2005},
> type = {Diplomarbeit},
> url = {http://www.complang.tuwien.ac.at/Diplomarbeiten/eller05.ps.gz},
> abstract = {Superinstructions can be used to make virtual
> machine (VM) interpreters faster. A superinstruction
> is a combination of simpler VM instructions which
> can be executed faster than the corresponding
> sequence of simpler VM instructions, because the
> interpretative overhead, like instruction dispatch
> and argument fetching, is reduced. This work
> discusses the following three topics related to
> superinstructions. First, I present some heuristics
> to choose superinstructions. I evaluated the
> heuristics for Forth and Java programs. If the
> number of allowed superinstructions was very large,
> $> 1000$, then the heuristic which chooses all
> possible subsequences up to length 4 achieved the
> best results. If the number of allowed
> superinstructions was more limited, then a heuristic
> which favors short sequences and sequences which
> occur in many different programs and many different
> basic blocks performed better than the
> others. Second, I compare a simple greedy algorithm
> and an optimal algorithm to cover a program with
> superinstructions. I found that the greedy algorithm
> achieves almost optimal results. Finally, I compare
> superinstructions with non-sequential patterns. In
> my experiments, superinstructions performed slightly
> better than non-sequential patterns.}
> }
>
> - anton
>
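To make the static/dynamic distinction in the quoted text concrete, here is a minimal sketch that counts instruction sequences both ways. It is not the method of the cited thesis; the data layout (basic blocks paired with execution counts) and all names are assumptions for illustration.

```python
from collections import Counter


def sequence_counts(basic_blocks, n=2):
    """Count length-n instruction sequences statically and dynamically.

    basic_blocks: list of (instruction_list, execution_count) pairs.
    Static count:  +1 per occurrence in the program text.
    Dynamic count: +execution_count per occurrence.
    """
    static, dynamic = Counter(), Counter()
    for instrs, exec_count in basic_blocks:
        for i in range(len(instrs) - n + 1):
            seq = tuple(instrs[i:i + n])
            static[seq] += 1
            dynamic[seq] += exec_count
    return static, dynamic


blocks = [
    (["lit", "add", "store"], 1),   # cold code
    (["lit", "add"], 1),            # cold code
    (["dup", "mul"], 1000),         # one hot loop
]
static, dynamic = sequence_counts(blocks)
# Statically, ("lit", "add") is the most common pair;
# dynamically, the single hot loop's ("dup", "mul") dominates.
```

A selection driven by the dynamic counts would favor the hot loop's pair, which need not appear in other programs at all.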

Re: Squeezing Those Bits: Concertina II

<2021Jun3.172822@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=17351&group=comp.arch#17351
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Squeezing Those Bits: Concertina II
Date: Thu, 03 Jun 2021 15:28:22 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 33
Message-ID: <2021Jun3.172822@mips.complang.tuwien.ac.at>
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com> <563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com> <s93lcf$1p1$1@dont-email.me> <2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com> <4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com> <s94le0$3cr$1@dont-email.me> <d934adf6-2832-4f14-8235-d3bddc8f0c26n@googlegroups.com> <s963s5$4sa$1@newsreader4.netcologne.de> <s97m1u$8kh$1@dont-email.me> <2021Jun2.183620@mips.complang.tuwien.ac.at> <s9a90s$g5j$1@dont-email.me>
Injection-Info: reader02.eternal-september.org; posting-host="c660a4d80459a6fcc440eafa0dd818d4";
logging-data="27407"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18/kg8EMQagxmUHGtMWGHwJ"
Cancel-Lock: sha1:Hg3zI2FFTKVhRQIlrrIcp2HKqig=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Thu, 3 Jun 2021 15:28 UTC

Marcus <m.delete@this.bitsnbites.eu> writes:
>I wonder if you could improve the situation if you used something like
>different scales (logarithmic?) and some sort of filtering or
>thresholding (e.g. count each memory location at most N times) to
>reduce the bias from loops? Or possibly categorize counts into different
>bins (e.g. "cold", "medium", "hot").

That's plausible, but I have no empirical results on such schemes.
One would think that this is a common idea and that there would be a
lot of empirical data on it, but I am not aware of that (but then, I
have not read as many papers in recent years as I used to).

Thinking about how such an absence (if it is one) could be explained,
I thought about application areas:

* Superinstructions/superoperators/supercombinators in interpreters.
Here we can combine many simple instructions/operators/combinators,
and selecting many unrepresentative sequences can have a significant
negative effect.

* Instruction-set design for hardware architectures: Since the RISC
revolution one tends to avoid big instructions, so the problem above
does not occur in that severity. For, e.g., selecting immediate-field
widths, there are not so many variations, so the hot code in the
benchmarks is probably not that unrepresentative of other hot code,
and may be more representative than, say, initialization code.

Anything else?

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Squeezing Those Bits: Concertina II

<44eabf62-646d-429e-a977-06c11fdfb2c4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=17353&group=comp.arch#17353
X-Received: by 2002:ac8:7c50:: with SMTP id o16mr415920qtv.153.1622739755870;
Thu, 03 Jun 2021 10:02:35 -0700 (PDT)
X-Received: by 2002:a9d:19ed:: with SMTP id k100mr197210otk.329.1622739755540;
Thu, 03 Jun 2021 10:02:35 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 3 Jun 2021 10:02:35 -0700 (PDT)
In-Reply-To: <s99v64$hsp$2@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8c6b:cdac:2e69:2811;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8c6b:cdac:2e69:2811
References: <698865df-06a6-4ec1-ae71-a36ccc30b30an@googlegroups.com>
<81deeb7a-4f9f-4e5c-95bd-64eac1fcf53cn@googlegroups.com> <38e59b03-7103-477a-957e-63ef18b72a4dn@googlegroups.com>
<caf484d6-4574-4909-bc8a-ed944fc9bddcn@googlegroups.com> <805ec395-f39c-403b-bdc3-5110653e237fn@googlegroups.com>
<563fa215-c166-4906-bf4b-e715c8b002c7n@googlegroups.com> <s93lcf$1p1$1@dont-email.me>
<2a75fedf-7f84-41df-a12f-46e70a3bd696n@googlegroups.com> <4b68e3b2-6343-429f-9afd-cb124f378817n@googlegroups.com>
<7180f6f6-d57b-4191-bddd-ef20e4f35a1dn@googlegroups.com> <86e10294-a1ce-41c3-9d56-6f73afce5dean@googlegroups.com>
<110d93f7-d8bc-4523-869d-16f4249fad00n@googlegroups.com> <3d8d0ac1-0462-4525-82fd-9dca309f038en@googlegroups.com>
<51734e5c-3a02-4079-a178-f7f46c442504n@googlegroups.com> <859da8cd-bf0b-478d-8d8b-b0d11252dfe1n@googlegroups.com>
<s989ik$itn$1@dont-email.me> <21c9b7a3-6dbe-4f84-a3bc-e3971552e772n@googlegroups.com>
<7d48604f-f7cd-43f8-be3c-ad3fc9242058n@googlegroups.com> <s99v64$hsp$2@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <44eabf62-646d-429e-a977-06c11fdfb2c4n@googlegroups.com>
Subject: Re: Squeezing Those Bits: Concertina II
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 03 Jun 2021 17:02:35 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Thu, 3 Jun 2021 17:02 UTC

On Thursday, June 3, 2021 at 2:08:55 AM UTC-5, Thomas Koenig wrote:
> MitchAlsup <Mitch...@aol.com> schrieb:
> > In 30 total gates, and in 4-gates of delay, one can decode My 66000
> > instructions to determine their length, and the offsets in the instruction
> > stream of any constants.
> How many gate delays does a modern architecture usually have per cycle?
> (I know different gates have different delays, but a ballpark figure
> would be very interesting).
<
Really fast machines 12-gates
more typical machines 16-gates
<
> >IBM 360 does this in 2-gates of delay. Both
> > share the advantage that the bits decoded are all fixed over all instruction
> > formats. So, My 66000 is only 2-gates behind at this point.
> ><
> > From here we setup a tree of find next instruction logic (2-gates) and we
> > parse (determine starting word and all constant words) 2 then 4 then 8
> > then 16 instructions at 2-gates of delay each level. So parsing 16 instructions
> > per cycle is a 12-gate delay problem. Easy Peasy in a 16-gate cycle. The
> > original IBM 360 would be 2-gates shorter, the current monstrosity would
> > be longer.
<
> Is 16 gates per cycle particularly short?
<
It is enough to perform an add, drive the result bus and make setup timing
as a forwarded result->operand. 64-bit add = 11-gates. Drive result bus
3-gates, consume as operand 2-gates.
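The budget just given can be tallied in a few lines. The gate counts are the ones stated above; the helper name and the idea of reporting leftover slack are only illustrative.

```python
# Gate delays for one back-to-back forwarded add, as quoted in the post.
ADD_64BIT = 11
DRIVE_RESULT_BUS = 3
CONSUME_AS_OPERAND = 2


def forwarding_slack(gates_per_cycle):
    """Gates left over in a cycle after add + bus drive + operand setup."""
    path = ADD_64BIT + DRIVE_RESULT_BUS + CONSUME_AS_OPERAND  # 16
    return gates_per_cycle - path


slack_16 = forwarding_slack(16)  # 0: the path exactly fills a 16-gate cycle
slack_20 = forwarding_slack(20)  # 4 gates of margin at a 20-gate cycle
```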
<
It used to be enough to perform a direct mapped cache hit and load-align
in 2 cycles, it is no longer, the minimum number is 3 cycles. Many of the
really fast machines have gone to 4 cycles to use set associative caches.
<
I prefer a 20-gate per cycle design point. This is slow enough to reuse
the bit lines in the register file reads on 1/2 cycle, writes on the other
1/2 cycle, and slow enough to "hit" cache SRAMs twice per cycle
(solving many porting problems.)
<
With a 20 gate per cycle design point, one can build a 6-wide reservation
station machine with back to back integer, 3 cycle LDs, 3 LDs per cycle,
4 cycle FMAC, 17 cycle FDIV; and 6-ported register files into a 6-7 stage
pipeline.
<
At 16 gates this necessarily becomes 9-10 stages.
<
At 12 gates this necessarily becomes 12-15 stages.
