
Subject  Author
* Benefits of a write once constraint on register values in an  JimBrakefield
+* Re: Benefits of a write once constraint on register values in an  Ivan Godard
|`* Re: Benefits of a write once constraint on register values in an instruction loo  JimBrakefield
| `* Re: Benefits of a write once constraint on register values in an instruction loo  JimBrakefield
|  `* Re: Benefits of a write once constraint on register values in an instruction loo  MitchAlsup
|   `* Re: Benefits of a write once constraint on register values in an instruction loo  JimBrakefield
|    `- Re: Benefits of a write once constraint on register values in an  MitchAlsup
+* Re: Benefits of a write once constraint on register values in an  MitchAlsup
|`* Re: Benefits of a write once constraint on register values in an instruction loo  JimBrakefield
| +* Re: Benefits of a write once constraint on register values in an  Thomas Koenig
| |+* Re: Benefits of a write once constraint on register values in an  MitchAlsup
| ||`* Re: Benefits of a write once constraint on register values in an instruction loo  JimBrakefield
| || `- Re: Benefits of a write once constraint on register values in an  MitchAlsup
| |+- Re: Benefits of a write once constraint on register values in an  Marcus
| |`- Re: Benefits of a write once constraint on register values in an  JimBrakefield
| `- Re: Benefits of a write once constraint on register values in an  MitchAlsup
`- Re: Benefits of a write once constraint on register values in an instruction loo  Anton Ertl

Benefits of a write once constraint on register values in an instruction loop

<7e6b951c-b95b-4c7f-bfa9-98fe92f4b50cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19291&group=comp.arch#19291
X-Received: by 2002:a05:622a:1704:: with SMTP id h4mr1586616qtk.346.1627509202657;
Wed, 28 Jul 2021 14:53:22 -0700 (PDT)
X-Received: by 2002:a05:6808:f02:: with SMTP id m2mr1068759oiw.0.1627509202418;
Wed, 28 Jul 2021 14:53:22 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 28 Jul 2021 14:53:22 -0700 (PDT)
Injection-Info: google-groups.googlegroups.com; posting-host=136.50.182.0; posting-account=AoizIQoAAADa7kQDpB0DAj2jwddxXUgl
NNTP-Posting-Host: 136.50.182.0
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7e6b951c-b95b-4c7f-bfa9-98fe92f4b50cn@googlegroups.com>
Subject: Benefits of a write once constraint on register values in an
instruction loop
From: jim.brak...@ieee.org (JimBrakefield)
Injection-Date: Wed, 28 Jul 2021 21:53:22 +0000
Content-Type: text/plain; charset="UTF-8"
 by: JimBrakefield - Wed, 28 Jul 2021 21:53 UTC

Consider a portion of a RISC register file used in a write-once mode per loop execution on a nominally OO design: Register renaming is not needed.
Loop instructions can be issued in any order, even simultaneously.

Issues: what are the savings from not having register renaming?
Are there enough registers (e.g. is a larger register file needed)?
For short loops mechanism needed to identify a register's loop number so multiple loop executions can exist simultaneously?
Loading of loop constant values from elsewhere in the register file into the functional units prior to loop execution?
A burst mechanism for initializing the function unit input buffers or reservation stations?
This is a somewhat novel approach? to high performance loops based on the type of iterations found in the Livermore Loops. As such the best ISA for
it is undetermined? There is considerable room for innovation?

Jim Brakefield
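
A minimal sketch of the write-once idea in the post above, under stated assumptions: the loop body is encoded as hypothetical (dest, sources) tuples, the register names are made up, and `check_write_once` and `issue_levels` are illustrative helpers, not part of any real ISA or tool. It checks that no register is written twice per iteration, then groups instructions into levels that could issue together once their operands exist.

```python
def check_write_once(body):
    """Return True if no register is written more than once in the body."""
    dests = [dest for dest, _srcs in body]
    return len(dests) == len(set(dests))


def issue_levels(body):
    """Group instruction indices into levels using data dependence only.

    Under the write-once constraint no register is overwritten within an
    iteration, so the only ordering left is producer -> consumer.
    A source with no earlier producer in the body is treated as a loop
    input (a loop constant or a value from a previous iteration).
    """
    producer = {}
    level = []
    for i, (dest, srcs) in enumerate(body):
        deps = [producer[s] for s in srcs if s in producer]
        level.append(1 + max((level[d] for d in deps), default=-1))
        producer[dest] = i
    groups = {}
    for i, lv in enumerate(level):
        groups.setdefault(lv, []).append(i)
    return [groups[lv] for lv in sorted(groups)]


# Hypothetical body for x[k] = q + y[k] * (r * z[k+10] + t * z[k+11])
body = [
    ("r10", ["r1"]),          # load y[k]
    ("r11", ["r2"]),          # load z[k+10]
    ("r12", ["r2"]),          # load z[k+11]
    ("r13", ["r3", "r11"]),   # r * z[k+10]
    ("r14", ["r4", "r12"]),   # t * z[k+11]
    ("r15", ["r13", "r14"]),  # sum of the two products
    ("r16", ["r10", "r15"]),  # y[k] * sum
    ("r17", ["r5", "r16"]),   # q + product
]

print(check_write_once(body))
print(issue_levels(body))
```

In this toy model the three loads form one level and the two multiplies the next, which is the "issued in any order, even simultaneously" property; whether real hardware would save anything by it is exactly what the thread debates.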

Re: Benefits of a write once constraint on register values in an instruction loop

<sdslob$qeh$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19294&group=comp.arch#19294
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Benefits of a write once constraint on register values in an
instruction loop
Date: Wed, 28 Jul 2021 15:29:31 -0700
Organization: A noiseless patient Spider
Lines: 51
Message-ID: <sdslob$qeh$1@dont-email.me>
References: <7e6b951c-b95b-4c7f-bfa9-98fe92f4b50cn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 28 Jul 2021 22:29:31 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="fcd68b2ae732589a86b718fcfaba22dd";
logging-data="27089"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18/cMUf/MrGF0YM3VZlZgsI"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
Cancel-Lock: sha1:xxSjGhTmt1CCH43T5BOjkipZm2s=
In-Reply-To: <7e6b951c-b95b-4c7f-bfa9-98fe92f4b50cn@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Wed, 28 Jul 2021 22:29 UTC

On 7/28/2021 2:53 PM, JimBrakefield wrote:
> Consider a portion of a RISC register file used in a write-once mode per loop execution on a nominally OO design: Register renaming is not needed.
> Loop instructions can be issued in any order, even simultaneously.

Write-once is complicated to do in a genreg machine, because the
encoding needs some way to distinguish write-once from write-many.

A belt is entirely write-once, so while I can't comment on their use in
genregs, our experience with the belt may be useful to you. YMMV.

> Issues: what are the savings from not having register renaming?

Large, and increases super-linearly as the number of ports increases.
The belt doesn't have ports in the sense of a regfile, but it does have
multiple concurrent reads, just as multiported regfiles do. IANAHG, but
the bigger win is reported to be the absence of write port equivalents.

> Are there enough registers (e.g. is a larger register file needed)?

No, although you do need cheap spill/fill.

> For short loops mechanism needed to identify a register's loop number so multiple loop executions can exist simultaneously?

You need some way to distinguish values by iteration number. In a belt
this falls out, because they are at different (and statically known)
belt positions. While putting an iteration number in the register name
seems plausible, you might get encoding problems.
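
A toy model of the point above, assuming a belt can be approximated as a fixed-length FIFO whose position 0 holds the newest value. The `Belt` class and its method names are hypothetical and say nothing about how the Mill implements it; the sketch only shows why values from different iterations end up at different, statically known positions.

```python
from collections import deque


class Belt:
    """Toy belt: a fixed-length FIFO where position 0 is the newest value."""

    def __init__(self, length):
        self.slots = deque(maxlen=length)

    def drop(self, value):
        # New results enter at the front; once full, the oldest falls off.
        self.slots.appendleft(value)

    def read(self, position):
        # Operands are named by belt position, not by a register number.
        return self.slots[position]


belt = Belt(length=4)
for iteration in range(3):
    belt.drop(10 * iteration)  # one value produced per iteration

# Position 0 is the current iteration's value, position 1 the previous one.
print(belt.read(0), belt.read(1))
```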

> Loading of loop constant values from elsewhere in the register file into the functional units prior to loop execution?

Can't say, we don't do it. Our decode is enough of a firehose that it's
simple enough to just drop them on the belt each iteration. Yes, that
contributes to belt pressure (reg pressure for you); it doesn't seem to
be a problem on our belt sizes, but I can see where it might be with regs.

> A burst mechanism for initializing the function unit input buffers or reservation stations?

Don't know.

> This is a somewhat novel approach? to high performance loops based on the type of iterations found in the Livermore Loops. As such the best ISA for
> it is undetermined?

Certainly novel, at least to me. If you push it with more thinking you
might end up with something not much like a regular ISA at all, but more like a cellular
automaton or stream machine.

> There is considerable room for innovation?

Yes :-)

Re: Benefits of a write once constraint on register values in an instruction loop

<923ed061-8ba9-4f9a-ad31-635fd9ef34ean@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19295&group=comp.arch#19295
X-Received: by 2002:a37:9a47:: with SMTP id c68mr2011245qke.37.1627511986684;
Wed, 28 Jul 2021 15:39:46 -0700 (PDT)
X-Received: by 2002:a05:6808:158a:: with SMTP id t10mr8096492oiw.175.1627511986466;
Wed, 28 Jul 2021 15:39:46 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 28 Jul 2021 15:39:46 -0700 (PDT)
In-Reply-To: <7e6b951c-b95b-4c7f-bfa9-98fe92f4b50cn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b902:cdd8:91d0:9b96;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b902:cdd8:91d0:9b96
References: <7e6b951c-b95b-4c7f-bfa9-98fe92f4b50cn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <923ed061-8ba9-4f9a-ad31-635fd9ef34ean@googlegroups.com>
Subject: Re: Benefits of a write once constraint on register values in an
instruction loop
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 28 Jul 2021 22:39:46 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Wed, 28 Jul 2021 22:39 UTC

On Wednesday, July 28, 2021 at 4:53:23 PM UTC-5, JimBrakefield wrote:
> Consider a portion of a RISC register file used in a write-once mode per loop execution on a nominally OO design: Register renaming is not needed.
<
OO is Object Oriented
OoO is Out of Order
<
> Loop instructions can be issued in any order, even simultaneously.
>
> Issues: what are the savings from not having register renaming?
<
Probably not enough to make this a key microarchitectural assumption. If you
have resourced the machine to run at peak performance for multiple
milliseconds in a row, you already have enough rename read ports. If not, you
are already up sh!t creek.
<
But note: The VEC-LOOP construct in My 66000 uses a single rename over
the entire loop (body) and concatenates a loop iteration count to this name
making it loop unique without eating lots of rename space (and allowing
the front end to be quiescent during loop execution).
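
One way to picture the "single rename plus iteration count" scheme, as a sketch only: the field width `iter_bits` and the function `loop_tag` are assumptions for illustration, and nothing here is taken from the actual My 66000 implementation.

```python
def loop_tag(rename_id, iteration, iter_bits=4):
    """Concatenate a per-loop rename id with a wrapped iteration counter."""
    mask = (1 << iter_bits) - 1
    return (rename_id << iter_bits) | (iteration & mask)


# The same rename id yields a distinct tag for each in-flight iteration.
tags = [loop_tag(5, k) for k in range(3)]
print(tags)
```

With a 4-bit counter the tags repeat after 16 iterations, so this toy encoding only works if fewer than 16 iterations are in flight at once.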
<
> Are there enough registers (e.g. is a larger register file needed)?
<
The general trick is that (logical) register names are used to set up the
data-flow dependencies. I might note that nothing in Livermore loops (*.c)
required anything more than the 32 GPRs in My 66000.
<
> For short loops mechanism needed to identify a register's loop number so multiple loop executions can exist simultaneously?
<
You need to solve the question of whether memory is dense and independent,
and at that point rewriting into SIMD is fairly easy. The second problem is to
distinguish data produced and consumed in this loop iteration from data produced
in a previous iteration and consumed in this one. I call the former vector data
and the latter loop-carried data.
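
The distinction above can be sketched as a small classifier. This is a hypothetical helper over a made-up (dest, sources) encoding, not how any compiler or My 66000 does it; the third label, "scalar", is added here for registers that are read but never written in the body.

```python
def classify_registers(body):
    """Classify registers used in one loop body, given in program order.

    'vector':       written in the body before any read, so produced and
                    consumed within the same iteration
    'loop-carried': read before being written, so the value comes from
                    the previous iteration
    'scalar':       read but never written (loop-invariant)
    """
    written = set()
    read_first = set()
    for dest, srcs in body:
        for s in srcs:
            if s not in written:
                read_first.add(s)
        written.add(dest)
    kinds = {}
    for reg in written | read_first:
        if reg not in written:
            kinds[reg] = "scalar"
        elif reg in read_first:
            kinds[reg] = "loop-carried"
        else:
            kinds[reg] = "vector"
    return kinds


# Hypothetical reduction: r3 accumulates (r1 * r2) + r4 across iterations.
body = [
    ("r5", ["r1", "r2"]),
    ("r6", ["r5", "r4"]),
    ("r3", ["r3", "r6"]),
]
print(classify_registers(body))
```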
<
> Loading of loop constant values from elsewhere in the register file into the functional units prior to loop execution?
<
My 66000 loads scalar register values into station entries during loop installation.
Subsequently, these values do not need to be read from the RF on a per loop
iteration. Each station will await its vector or loop-carried dependencies just
like any reservation station entry would.
<
> A burst mechanism for initializing the function unit input buffers or reservation stations?
<
I did not find this necessary as it speeds up only the first iteration.
<
> This is a somewhat novel approach?
<
Sounds essentially like what VVM does (or enables).
<
> to high performance loops based on the type of iterations found in the Livermore Loops. As such the best ISA for
> it is undetermined? There is considerable room for innovation?
<
Having read the code for LL spit out from Brian's compiler, I can't see room
for a lot of improvement within the realm of RISC architectures, except perhaps
in code density.
>
> Jim Brakefield

Re: Benefits of a write once constraint on register values in an instruction loop

<78c627de-2b88-456e-af21-7a4772c71ad6n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19298&group=comp.arch#19298
X-Received: by 2002:a05:622a:209:: with SMTP id b9mr2509809qtx.136.1627525171571; Wed, 28 Jul 2021 19:19:31 -0700 (PDT)
X-Received: by 2002:a9d:7353:: with SMTP id l19mr2017230otk.76.1627525171298; Wed, 28 Jul 2021 19:19:31 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 28 Jul 2021 19:19:31 -0700 (PDT)
In-Reply-To: <sdslob$qeh$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=136.50.182.0; posting-account=AoizIQoAAADa7kQDpB0DAj2jwddxXUgl
NNTP-Posting-Host: 136.50.182.0
References: <7e6b951c-b95b-4c7f-bfa9-98fe92f4b50cn@googlegroups.com> <sdslob$qeh$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <78c627de-2b88-456e-af21-7a4772c71ad6n@googlegroups.com>
Subject: Re: Benefits of a write once constraint on register values in an instruction loop
From: jim.brak...@ieee.org (JimBrakefield)
Injection-Date: Thu, 29 Jul 2021 02:19:31 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 49
 by: JimBrakefield - Thu, 29 Jul 2021 02:19 UTC

On Wednesday, July 28, 2021 at 5:29:34 PM UTC-5, Ivan Godard wrote:
> On 7/28/2021 2:53 PM, JimBrakefield wrote:
> > Consider a portion of a RISC register file used in a write-once mode per loop execution on a nominally OO design: Register renaming is not needed.
> > Loop instructions can be issued in any order, even simultaneously.
> Write-once is complicated to do in a genreg machine, because the
> encoding needs some way to distinguish write-once from write-many.
>
> A belt is entirely write-once, so while I can't comment on their use in
> genregs, our experience with the belt may be useful to you. YMMV.
> > Issues: what are the savings from not having register renaming?
> Large, and increases super-linearly as the number of ports increases.
> The belt doesn't have ports in the sense of a regfile, but it does have
> multiple concurrent reads, just as multiported regfiles do. IANAHG, but
> the bigger win is reported to be the absence of write port equivalents.
> > Are there enough registers (e.g. is a larger register file needed)?
> No, although you do need cheap spill/fill.
> > For short loops mechanism needed to identify a register's loop number so multiple loop executions can exist simultaneously?
> You need some way to distinguish values by iteration number. In a belt
> this falls out, because they are at different (and statically known)
> belt positions. While putting an iteration number in the register name
> seems plausible, you might get encoding problems.
> > Loading of loop constant values from elsewhere in the register file into the functional units prior to loop execution?
> Can't say, we don't do it. Our decode is enough of a firehose that it's
> simple enough to just drop them on the belt each iteration. Yes, that
> contributes to belt pressure (reg pressure for you); it doesn't seem to
> be a problem on our belt sizes, but I can see where it might be with regs.
> > A burst mechanism for initializing the function unit input buffers or reservation stations?
> Don't know.
> > This is a somewhat novel approach? to high performance loops based on the type of iterations found in the Livermore Loops. As such the best ISA for
> > it is undetermined?
> Certainly novel, at least to me. If you push it with more thinking you
> might end up with something not much like a regular ISA at all, but more like a cellular
> automaton or stream machine.
> > There is considerable room for innovation?
> Yes :-)

Although the belt is write once, it is logically ordered by instruction sequence, which the compiler arranges.
So there does not seem to be a simple mapping between the belt and this write-once register block idea.
I'll skip the details and state that the register numbers in the function unit reservation stations are not used to write to the registers;
instead, the register numbers describe the dependency graph of results and thus control the order of evaluation.
(I am assuming that function units wait for both operands, which are obtained individually when function units broadcast their results.)
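
A rough sketch of that assumption, with every name hypothetical: each `Station` waits for two operand tags, captures them as results are broadcast, and fires when both have arrived. Because each tag is written once, a tag names exactly one value and the model needs no renaming.

```python
import operator


class Station:
    """Toy reservation-station entry that waits for two operand tags."""

    def __init__(self, dest, op, srcs):
        self.dest, self.op, self.srcs = dest, op, srcs
        self.values = {}
        self.fired = False

    def capture(self, tag, value):
        # Snoop a broadcast result and keep it if this entry needs it.
        if tag in self.srcs:
            self.values[tag] = value

    def ready(self):
        return not self.fired and all(s in self.values for s in self.srcs)

    def fire(self):
        self.fired = True
        a, b = (self.values[s] for s in self.srcs)
        return self.dest, self.op(a, b)


def run(stations, initial):
    """Broadcast values until no station can fire; return values by tag."""
    results = dict(initial)
    pending = list(initial.items())
    while pending:
        tag, value = pending.pop()
        for st in stations:
            st.capture(tag, value)
        for st in stations:
            if st.ready():
                dest, out = st.fire()
                results[dest] = out
                pending.append((dest, out))
    return results


# Stations are listed out of program order on purpose.
stations = [
    Station("r7", operator.add, ("r5", "r6")),
    Station("r5", operator.mul, ("r1", "r2")),
    Station("r6", operator.sub, ("r3", "r1")),
]
print(run(stations, {"r1": 2, "r2": 3, "r3": 10}))
```

The evaluation order falls out of the tags alone: the add for r7 is listed first but fires last, after r5 and r6 have been broadcast.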

So the idea maps most easily to an OoO RISC architecture. There is also a mapping to a stack/accumulator machine (the writes are to the data stack).
At the end of the loop the stack pointer is reset. Does this require in-order instruction issue?

> might end up with something not much like a regular ISA at all, but more like a cellular
> automaton or stream machine.
Here the goal is to keep as many pipelined function units busy as possible.
So in that sense it is a stream machine. The "draw" is that the OoO RISC hardware, without renaming, is repurposed into a stream machine.

Re: Benefits of a write once constraint on register values in an instruction loop

<64e8363c-6c13-41f2-b695-26415af3169dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19299&group=comp.arch#19299
X-Received: by 2002:a05:622a:1810:: with SMTP id t16mr2416951qtc.272.1627525694531; Wed, 28 Jul 2021 19:28:14 -0700 (PDT)
X-Received: by 2002:a9d:491c:: with SMTP id e28mr1912241otf.342.1627525694187; Wed, 28 Jul 2021 19:28:14 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!feeder2.ecngs.de!ecngs!feeder.ecngs.de!news.uzoreto.com!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 28 Jul 2021 19:28:13 -0700 (PDT)
In-Reply-To: <923ed061-8ba9-4f9a-ad31-635fd9ef34ean@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=136.50.182.0; posting-account=AoizIQoAAADa7kQDpB0DAj2jwddxXUgl
NNTP-Posting-Host: 136.50.182.0
References: <7e6b951c-b95b-4c7f-bfa9-98fe92f4b50cn@googlegroups.com> <923ed061-8ba9-4f9a-ad31-635fd9ef34ean@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <64e8363c-6c13-41f2-b695-26415af3169dn@googlegroups.com>
Subject: Re: Benefits of a write once constraint on register values in an instruction loop
From: jim.brak...@ieee.org (JimBrakefield)
Injection-Date: Thu, 29 Jul 2021 02:28:14 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 64
 by: JimBrakefield - Thu, 29 Jul 2021 02:28 UTC

On Wednesday, July 28, 2021 at 5:39:47 PM UTC-5, MitchAlsup wrote:
> On Wednesday, July 28, 2021 at 4:53:23 PM UTC-5, JimBrakefield wrote:
> > Consider a portion of a RISC register file used in a write-once mode per loop execution on a nominally OO design: Register renaming is not needed.
> <
> OO is Object Oriented
> OoO is Out of Order
> <
> > Loop instructions can be issued in any order, even simultaneously.
> >
> > Issues: what are the savings from not having register renaming?
> <
> Probably not enough to make this a key microarchitectural assumption. If you
> have resourced the machine to run at peak performance for multiple
> milliseconds in a row, you already have enough rename read ports. If not, you
> are already up sh!t creek.
> <
> But note: The VEC-LOOP construct in My 66000 uses a single rename over
> the entire loop (body) and concatenates a loop iteration count to this name
> making it loop unique without eating lots of rename space (and allowing
> the front end to be quiescent during loop execution).

Ah, now I can understand the My 66000 looping implementation.

> <
> > Are there enough registers (e.g. is a larger register file needed)?
> <
> The general trick is that (logical) register names are used to set up the
> data-flow dependencies. I might note that nothing in Livermore loops (*.c)
> required anything more than the 32 GPRs in My 66000.
> <
> > For short loops mechanism needed to identify a register's loop number so multiple loop executions can exist simultaneously?
> <
> You need to solve the question of whether memory is dense and independent,
> and at that point rewriting into SIMD is fairly easy. The second problem is to
> distinguish data produced and consumed in this loop iteration from data produced
> in a previous iteration and consumed in this one. I call the former vector data
> and the latter loop-carried data.
> <
> > Loading of loop constant values from elsewhere in the register file into the functional units prior to loop execution?
> <
> My 66000 loads scalar register values into station entries during loop installation.
> Subsequently, these values do not need to be read from the RF on a per loop
> iteration. Each station will await its vector or loop-carried dependencies just
> like any reservation station entry would.
> <
> > A burst mechanism for initializing the function unit input buffers or reservation stations?
> <
> I did not find this necessary as it speeds up only the first iteration.
> <
> > This is a somewhat novel approach?
> <
> Sounds essentially like what VVM does (or enables).
> <
> > to high performance loops based on the type of iterations found in the Livermore Loops. As such the best ISA for
> > it is undetermined? There is considerable room for innovation?
> <
> Having read the code for LL spit out from Brian's compiler, I can't see room
> for a lot of improvement within the realm of RISC architectures, except perhaps
> in code density.

Livermore Loops code density would appear to benefit from load/store register indirect auto increment.
And from an accumulator or stack ISA (one operand source and result destination implied).

> >
> > Jim Brakefield

Re: Benefits of a write once constraint on register values in an instruction loop

<sdtval$6bd$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=19301&group=comp.arch#19301
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-c228-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Benefits of a write once constraint on register values in an
instruction loop
Date: Thu, 29 Jul 2021 10:19:01 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sdtval$6bd$1@newsreader4.netcologne.de>
References: <7e6b951c-b95b-4c7f-bfa9-98fe92f4b50cn@googlegroups.com>
<923ed061-8ba9-4f9a-ad31-635fd9ef34ean@googlegroups.com>
<64e8363c-6c13-41f2-b695-26415af3169dn@googlegroups.com>
Injection-Date: Thu, 29 Jul 2021 10:19:01 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-c228-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:c228:0:7285:c2ff:fe6c:992d";
logging-data="6509"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Thu, 29 Jul 2021 10:19 UTC

JimBrakefield <jim.brakefield@ieee.org> wrote:

> Livermore Loops code density would appear to benefit from
> load/store register indirect auto increment.

Why is there such a big concern with Livermore Loops, especially
for code density? I would assume that any CPU used for scientific
work would run these out of its L1 cache without problems.

Apart from that, any auto increment / decrement will result in two
stores, so presumably it will be cracked into two microops anyway.

If you want to do something for instruction density where it
helps many of today's users, look towards browsers, Java VMs and
JavaScript engines and video codecs.

Plus, of course, games, but I don't know of any benchmark
there.

Re: Benefits of a write once constraint on register values in an instruction loop

<2021Jul29.155657@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=19304&group=comp.arch#19304
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Benefits of a write once constraint on register values in an instruction loop
Date: Thu, 29 Jul 2021 13:56:57 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 15
Message-ID: <2021Jul29.155657@mips.complang.tuwien.ac.at>
References: <7e6b951c-b95b-4c7f-bfa9-98fe92f4b50cn@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="ee4420149de74c857bd1c499e94a7eec";
logging-data="15561"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+c1yVZuX5/HLlD82ysOg9j"
Cancel-Lock: sha1:6+eQY4PJLD7veJTzHnyCNX4cu0E=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Thu, 29 Jul 2021 13:56 UTC

JimBrakefield <jim.brakefield@ieee.org> writes:
>Consider a portion of a RISC register file used in a write-once mode per loop execution on a nominally OO design: Register renaming is not needed.
>Loop instructions can be issued in any order, even simultaneously.
>
>Issues: what are the savings from not having register renaming?

None on a modern OoO design. Register renaming is essential not only
for OoO execution of straight-line code, but also for branch
prediction and precise exceptions. That's why we don't see OoO
without register renaming nor in-order with register renaming.
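
To illustrate why renaming helps with branch misprediction recovery, here is a generic textbook-style sketch of a rename map with checkpoints. The `RenameMap` class, its sizes, and its method names are assumptions for illustration; real designs differ in how they snapshot or rebuild the map.

```python
class RenameMap:
    """Toy register alias table with branch checkpoints."""

    def __init__(self, num_arch, num_phys):
        # Architectural register rN starts out mapped to physical register N.
        self.map = {f"r{i}": i for i in range(num_arch)}
        self.free = list(range(num_arch, num_phys))
        self.checkpoints = []

    def rename_dest(self, arch_reg):
        # Each new write gets a fresh physical register.
        phys = self.free.pop(0)
        self.map[arch_reg] = phys
        return phys

    def checkpoint(self):
        # Snapshot taken when a branch is predicted.
        self.checkpoints.append((dict(self.map), list(self.free)))

    def recover(self):
        # On a misprediction, discard speculative mappings in one step.
        self.map, self.free = self.checkpoints.pop()


rm = RenameMap(num_arch=4, num_phys=8)
rm.rename_dest("r1")   # committed-path write
rm.checkpoint()        # predicted branch
rm.rename_dest("r1")   # speculative write
rm.rename_dest("r2")   # speculative write
rm.recover()           # branch was mispredicted
print(rm.map, rm.free)
```

Because speculative writes went to fresh physical registers, the older values were never overwritten and restoring the map is enough to undo them.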

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Benefits of a write once constraint on register values in an instruction loop

<5a7ddd1a-da80-4ae0-aa15-8d03f249e979n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19310&group=comp.arch#19310
X-Received: by 2002:a05:620a:1193:: with SMTP id b19mr6393297qkk.439.1627583280042;
Thu, 29 Jul 2021 11:28:00 -0700 (PDT)
X-Received: by 2002:a9d:5603:: with SMTP id e3mr4343744oti.178.1627583279824;
Thu, 29 Jul 2021 11:27:59 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!usenet.pasdenom.info!usenet-fr.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 29 Jul 2021 11:27:59 -0700 (PDT)
In-Reply-To: <64e8363c-6c13-41f2-b695-26415af3169dn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4d78:fd0f:f097:e375;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4d78:fd0f:f097:e375
References: <7e6b951c-b95b-4c7f-bfa9-98fe92f4b50cn@googlegroups.com>
<923ed061-8ba9-4f9a-ad31-635fd9ef34ean@googlegroups.com> <64e8363c-6c13-41f2-b695-26415af3169dn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5a7ddd1a-da80-4ae0-aa15-8d03f249e979n@googlegroups.com>
Subject: Re: Benefits of a write once constraint on register values in an
instruction loop
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 29 Jul 2021 18:28:00 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Thu, 29 Jul 2021 18:27 UTC

On Wednesday, July 28, 2021 at 9:28:15 PM UTC-5, JimBrakefield wrote:
> On Wednesday, July 28, 2021 at 5:39:47 PM UTC-5, MitchAlsup wrote:
> > On Wednesday, July 28, 2021 at 4:53:23 PM UTC-5, JimBrakefield wrote:
> > > Consider a portion of a RISC register file used in a write-once mode per loop execution on a nominally OO design: Register renaming is not needed.
> > <
> > OO is Object Oriented
> > OoO is Out of Order
> > <
> > > Loop instructions can be issued in any order, even simultaneously.
> > >
> > > Issues: what are the savings from not having register renaming?
> > <
> > Probably not enough to make this a key microarchitectural assumption. If you
> > have resourced the machine to run at peak performance for multiple
> > milliseconds in a row, you already have enough rename read ports. If not, you
> > are already up sh!t creek.
> > <
> > But note: The VEC-LOOP construct in My 66000 uses a single rename over
> > the entire loop (body) and concatenates a loop iteration count to this name
> > making it loop unique without eating lots of rename space (and allowing
> > the front end to be quiescent during loop execution).
> Ah, now I can understand the My 66000 looping implementation.
> > <
> > > Are there enough registers (e.g. is a larger register file needed)?
> > <
> > The general trick is that (logical) register names are used to set up the
> > data-flow dependencies. I might note that nothing in Livermore loops (*.c)
> > required anything more than the 32 GPRs in My 66000.
> > <
> > > For short loops mechanism needed to identify a register's loop number so multiple loop executions can exist simultaneously?
> > <
> > You need to solve the question of whether memory is dense and independent,
> > and at that point rewriting into SIMD is fairly easy. The second problem is to
> > distinguish data produced and consumed in this loop iteration from data produced
> > in a previous iteration and consumed in this one. I call the former vector data
> > and the latter loop-carried data.
> > <
> > > Loading of loop constant values from elsewhere in the register file into the functional units prior to loop execution?
> > <
> > My 66000 loads scalar register values into station entries during loop installation.
> > Subsequently, these values do not need to be read from the RF on a per loop
> > iteration. Each station will await its vector or loop-carried dependencies just
> > like any reservation station entry would.
> > <
> > > A burst mechanism for initializing the function unit input buffers or reservation stations?
> > <
> > I did not find this necessary as it speeds up only the first iteration.
> > <
> > > This is a somewhat novel approach?
> > <
> > Sounds essentially like what VVM does (or enables).
> > <
> > > to high performance loops based on the type of iterations found in the Livermore Loops. As such the best ISA for
> > > it is undetermined? There is considerable room for innovation?
> > <
> > Having read the code for LL spit out from Brian's compiler, I can't see room
> > for a lot of improvement within the realm of RISC architectures, except perhaps
> > in code density.
> Livermore Loops code density would appear to benefit from load/store register indirect auto increment.
> And from an accumulator or stack ISA (one operand source and result destination implied).
<
Not as much as you might think. The auto-increment part is fulfilled by the LOOP
instruction (which also does CMP and BC) and the efficient [Rbase+Rindex<<scale+Disp]
memory address generation.
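
For readers unfamiliar with the addressing form, a small sketch under assumptions: `effective_address` reads [Rbase+Rindex<<scale+Disp] as Rbase + (Rindex << scale) + Disp, and `loop_step` is a hypothetical model of a fused add, compare, and branch decision. The less-than comparison is an assumption; the post does not say which condition the LOOP instruction tests.

```python
def effective_address(rbase, rindex, scale, disp):
    """Compute Rbase + (Rindex << scale) + Disp."""
    return rbase + (rindex << scale) + disp


def loop_step(index, increment, limit):
    """Model one fused loop-control step: add, compare, decide the branch."""
    index += increment
    return index, index < limit


# Walk three 8-byte elements starting 16 bytes past a base of 0x1000.
index, taken = 0, True
while taken:
    print(hex(effective_address(0x1000, index, 3, 16)))
    index, taken = loop_step(index, 1, 3)
```

The index register is never scaled or bumped by a separate address-update instruction, which is why auto-increment addressing buys little extra density in this style.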
>
> > >
> > > Jim Brakefield

Re: Benefits of a write once constraint on register values in an instruction loop

<e6ccb18f-56ba-40b4-b949-f32dbcf8fbdbn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19311&group=comp.arch#19311
X-Received: by 2002:a05:620a:62d:: with SMTP id 13mr6509452qkv.18.1627583370845;
Thu, 29 Jul 2021 11:29:30 -0700 (PDT)
X-Received: by 2002:a4a:dd04:: with SMTP id m4mr3809802oou.69.1627583369156;
Thu, 29 Jul 2021 11:29:29 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 29 Jul 2021 11:29:28 -0700 (PDT)
In-Reply-To: <sdtval$6bd$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4d78:fd0f:f097:e375;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4d78:fd0f:f097:e375
References: <7e6b951c-b95b-4c7f-bfa9-98fe92f4b50cn@googlegroups.com>
<923ed061-8ba9-4f9a-ad31-635fd9ef34ean@googlegroups.com> <64e8363c-6c13-41f2-b695-26415af3169dn@googlegroups.com>
<sdtval$6bd$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e6ccb18f-56ba-40b4-b949-f32dbcf8fbdbn@googlegroups.com>
Subject: Re: Benefits of a write once constraint on register values in an
instruction loop
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 29 Jul 2021 18:29:30 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Thu, 29 Jul 2021 18:29 UTC

On Thursday, July 29, 2021 at 5:19:04 AM UTC-5, Thomas Koenig wrote:
> JimBrakefield <jim.bra...@ieee.org> schrieb:
> > Livermore Loops code density would appear to benefit from
> > load/store register indirect auto increment.
> Why is there such a big concern with Livermore Loops, especially
> for code density? I would assume that any CPU used for scientific
> work would run these out of its L1 cache without problems.
>
> Apart from that, any auto increment / decrement will result in two
> stores, so presumably it will be cracked into two microops anyway.
<
Two register file accesses: LD w/AI makes 2 writes; ST w/AI makes 1 write and 1 read.
>
> If you want do do something for instruction density where it
> helps many of today's users, look towards browers, java VM and
> Javascript engines and video codecs.
>
> Plus, of course, games, but I don't know of any benchmark
> there.

Re: Benefits of a write once constraint on register values in an instruction loop

<sdusmd$gfl$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19313&group=comp.arch#19313

 by: Marcus - Thu, 29 Jul 2021 18:40 UTC

On 2021-07-29 12:19, Thomas Koenig wrote:
> JimBrakefield <jim.brakefield@ieee.org> schrieb:
>
>> Livermore Loops code density would appear to benefit from
>> load/store register indirect auto increment.
>
> Why is there such a big concern with Livermore Loops, especially
> for code density? I would assume that any CPU used for scientific
> work would run these out of its L1 cache without problems.
>
> Apart from that, any auto increment / decrement will result in two
> stores, so presumably it will be cracked into two microops anyway.
>
> If you want do do something for instruction density where it
> helps many of today's users, look towards browers, java VM and
> Javascript engines and video codecs.

Browsers (C++) and JavaScript (JIT), yes. Java VM, probably not so much
these days?

>
> Plus, of course, games, but I don't know of any benchmark
> there.

Games are tricky because they typically require lots of hardware
and platform support (like OpenGL/Vulkan, audio, various forms of I/O,
threading, etc), which may not be available in a lab environment that
you may have when evaluating an ISA. They may also be developed in a
more-complex-than-C language (e.g. C++17).

That said, there are some modern open source game engines that may be
interesting to dive into to find performance-sensitive code, e.g. the
Godot Engine [1].

I have personally only gotten as far as Quake (written in C, in the
1990s), and I've found that it uses lots of Pentium/CISC-isms and
solutions that are not quite as relevant today as they were when the
game was created. So while it is an easy target, I'm very careful when
using it as a benchmark.

/Marcus

[1] https://github.com/godotengine/godot

Re: Benefits of a write once constraint on register values in an instruction loop

<f2cb3647-5275-423f-ac68-acbfc86c3919n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19329&group=comp.arch#19329

 by: JimBrakefield - Thu, 29 Jul 2021 22:33 UTC

On Thursday, July 29, 2021 at 5:19:04 AM UTC-5, Thomas Koenig wrote:
> JimBrakefield <jim.bra...@ieee.org> schrieb:
> > Livermore Loops code density would appear to benefit from
> > load/store register indirect auto increment.
> Why is there such a big concern with Livermore Loops, especially
> for code density? I would assume that any CPU used for scientific
> work would run these out of its L1 cache without problems.
>
> Apart from that, any auto increment / decrement will result in two
> stores, so presumably it will be cracked into two microops anyway.
>
> If you want do do something for instruction density where it
> helps many of today's users, look towards browers, java VM and
> Javascript engines and video codecs.
>
> Plus, of course, games, but I don't know of any benchmark
> there.

Livermore Loops are short and simple, and presumably often used in
physics. I am manic about code density. For an accumulator/stack machine
an instruction fits in under 16 bits: an ALU operation, a register reference, and an indirect
auto-increment flag.

Need to look at game engines.
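
To make the sub-16-bit accumulator/stack instruction format concrete, here is a minimal sketch of one possible packing. The field widths (6-bit operation, 5-bit register, 1-bit auto-increment flag) and the names `encode` and `decode` are assumptions chosen for illustration, not a format proposed in this thread.

```python
OP_BITS = 6    # assumed width of the ALU operation field
REG_BITS = 5   # assumed width of the register reference field
AI_BITS = 1    # indirect auto-increment flag


def encode(op, reg, auto_inc):
    """Pack (op, reg, auto_inc) into one instruction word."""
    if not 0 <= op < (1 << OP_BITS):
        raise ValueError("operation out of range")
    if not 0 <= reg < (1 << REG_BITS):
        raise ValueError("register out of range")
    return (op << (REG_BITS + AI_BITS)) | (reg << AI_BITS) | int(bool(auto_inc))


def decode(word):
    """Unpack an instruction word into (op, reg, auto_inc)."""
    auto_inc = bool(word & 1)
    reg = (word >> AI_BITS) & ((1 << REG_BITS) - 1)
    op = word >> (REG_BITS + AI_BITS)
    return op, reg, auto_inc
```

With these assumed widths the word is 12 bits, which leaves 4 bits of a 16-bit parcel free. The other operand and the destination are implied (accumulator or top of stack), which is where the density comes from.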

Re: Benefits of a write once constraint on register values in an instruction loop

<2e49f45e-a7aa-4d5d-81b0-d8896aa3dfe2n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19331&group=comp.arch#19331

 by: JimBrakefield - Thu, 29 Jul 2021 22:43 UTC

On Wednesday, July 28, 2021 at 9:19:32 PM UTC-5, JimBrakefield wrote:
> On Wednesday, July 28, 2021 at 5:29:34 PM UTC-5, Ivan Godard wrote:
> > On 7/28/2021 2:53 PM, JimBrakefield wrote:
> > > Consider a portion of a RISC register file used in a write-once mode per loop execution on a nominally OO design: Register renaming is not needed.
> > > Loop instructions can be issued in any order, even simultaneously.
> > Write-once is complicated to do in a genreg machine, because the
> > encoding needs some way to distinguish write-once from write-many.
> >
> > A belt is entirely write-once, so while I can't comment on their use in
> > genregs, our experience with the belt may be useful to you. YMMV.
> > > Issues: what are the savings from not having register renaming?
> > Large, and increases super-linearly as the number of ports increases.
> > The belt doesn't have ports in the sense of a regfile, but it does have
> > multiple concurrent reads, just as multiported regfiles do. IANAHG, but
> > the bigger win is reported to be the absence of write port equivalents.
> > > Are there enough registers (e.g. is a larger register file needed)?
> > No, although you do need cheap spill/fill
> > > For short loops mechanism needed to identify a register's loop number so multiple loop executions can exist simultaneously?
> > You need some way to distinguish values by iteration number. In a belt
> > this falls out, because they are at different (and statically known)
> > belt positions. While putting an iteration number in the register name
> > seems plausible, you migh get encoding problems.
> > > Loading of loop constant values from elsewhere in the register file into the functional units prior to loop execution?
> > Can't say, we don't do it. Our decode is enough of a firehose that it's
> > simple enough to just drop them n the belt each iteration. Yes, that
> > contributes to belt pressure (reg pressure for you); it doesn't seem to
> > be a problem on our belt sizes, but I can see where it might be with regs.
> > > A burst mechanism for initializing the function unit input buffers or reservation stations?
> > Don't know.
> > > This is a somewhat novel approach? to high performance loops based on the type of iterations found in the Livermore Loops. As such the best ISA for
> > > it is undetermined?
> > Certainly novel, at least to me. If you push it with more thinking you
> > might up not much like a regular ISA at all, but more like a cellular
> > automaton or stream machine.
> > > There is considerable room for innovation?
> > Yes :-)
> Although the belt is write once, it is logically ordered by instruction sequence. Which the compiler arranges.
> So there does not seem to be a simple mapping between the belt and this write once register block idea.
> I'll skip the details and state that the register numbers in the function unit reservation stations are not used to write to the registers,
> instead the register numbers describe the dependency graph of results and thus control the order of evaluation.
> (am assuming function units wait for both operands which are obtained individually when function units broadcast their results)
>
> So the idea maps most easily to OoO RISC architecture. There is a mapping to a stack/accumulator machine (the writes are to the data stack).
> At the end of the loop the stack pointer is reset. Requires in-order instruction issue?
> > might up not much like a regular ISA at all, but more like a cellular
> > automaton or stream machine.
> Here the goal is to keep as many pipelined function units busy as possible.
> So in that sense it is a stream machine. The "draw" is that the OoO RISC hardware, without renaming, is repurposed into a stream machine.

There is a remaining degree of freedom in the RISC mapping: within the write-once register block, the registers can be allocated sequentially.
So if the block starts at zero, the destination register fields are sequential, which helps with multiple issue. It likewise helps with mapping to
a belt machine or an accumulator/stack machine.

A bigger problem is that not all loops want a clean slate for the write-once register block. Reusing previous values can reduce memory traffic
(as in FIR filters, etc.), but then the tags associated with the write-once registers get more complicated.
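
A minimal sketch of the write-once block with sequential destination allocation, assuming a toy model in which instruction i always writes block register i and fires as soon as both of its operands exist. The tuple layout, the "w"/"s" source tags, and the function name are hypothetical; the point is only that the result does not depend on the order in which instructions are examined.

```python
import operator


def run_iteration(body, invariants, order):
    """Evaluate one loop iteration over a write-once register block.

    body[i] = (fn, src_a, src_b) writes block register i exactly once.
    A source is ("w", n) for write-once register n, or ("s", name) for a
    loop-invariant value. `order` is the order in which instructions are
    examined; an instruction fires only when both operands are available.
    """
    block = [None] * len(body)  # None means "not yet written"
    pending = list(order)

    def read(src):
        kind, key = src
        return block[key] if kind == "w" else invariants[key]

    while pending:
        progressed = False
        for i in list(pending):
            fn, a, b = body[i]
            va, vb = read(a), read(b)
            if va is None or vb is None:
                continue  # operands not produced yet, try again later
            block[i] = fn(va, vb)
            pending.remove(i)
            progressed = True
        if not progressed:
            raise RuntimeError("dependency cycle in loop body")
    return block


body = [
    (operator.mul, ("s", "a"), ("s", "x")),   # w0 = a * x
    (operator.add, ("w", 0), ("s", "y")),     # w1 = w0 + y
    (operator.mul, ("w", 1), ("w", 1)),       # w2 = w1 * w1
]
invariants = {"a": 2, "x": 3, "y": 4}
in_order = run_iteration(body, invariants, [0, 1, 2])
reversed_order = run_iteration(body, invariants, [2, 1, 0])
```

Because every block register has a single writer per iteration, the register numbers act purely as names for edges in the dependency graph, and no renaming step is modeled. Carrying values across iterations (the FIR case) is deliberately left out of this sketch.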

Re: Benefits of a write once constraint on register values in an instruction loop

<9114e0ac-4300-47d1-8527-c5fd2d5e7cd6n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19334&group=comp.arch#19334

 by: MitchAlsup - Thu, 29 Jul 2021 23:15 UTC

On Thursday, July 29, 2021 at 5:43:59 PM UTC-5, JimBrakefield wrote:
> On Wednesday, July 28, 2021 at 9:19:32 PM UTC-5, JimBrakefield wrote:

> > Although the belt is write once, it is logically ordered by instruction sequence. Which the compiler arranges.
> > So there does not seem to be a simple mapping between the belt and this write once register block idea.
> > I'll skip the details and state that the register numbers in the function unit reservation stations are not used to write to the registers,
> > instead the register numbers describe the dependency graph of results and thus control the order of evaluation.
> > (am assuming function units wait for both operands which are obtained individually when function units broadcast their results)
> >
> > So the idea maps most easily to OoO RISC architecture. There is a mapping to a stack/accumulator machine (the writes are to the data stack).
> > At the end of the loop the stack pointer is reset. Requires in-order instruction issue?
> > > might up not much like a regular ISA at all, but more like a cellular
> > > automaton or stream machine.
> > Here the goal is to keep as many pipelined function units busy as possible.
> > So in that sense it is a stream machine. The "draw" is that the OoO RISC hardware, without renaming, is repurposed into a stream machine.
> There is a remaining degree of freedom on the RISC mapping: within the write-once register block the registers can be allocated sequentially.
> So if the block starts at zero, the destination register fields are sequential. Which helps with multiple issue. Likewise helps with mapping to
<
You must arrange for the subroutine calling convention to be integrated
with the ordering of registers in the block, so that you do not require ANY added data
movement in order to enter a loop.
<
> a belt machine or an accumulator/stack machine.
>
> A bigger problem is that not all loops want a clean slate of the write-once register block. Reusing previous values can reduce memory traffic
> (such as in FIR filters, etc). The tags associated with the write-once registers get more complicated.

Re: Benefits of a write once constraint on register values in an instruction loop

<70f04f47-1a68-4a27-a8ff-fa9485a00655n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19337&group=comp.arch#19337

 by: JimBrakefield - Thu, 29 Jul 2021 23:28 UTC

On Thursday, July 29, 2021 at 6:15:20 PM UTC-5, MitchAlsup wrote:
> On Thursday, July 29, 2021 at 5:43:59 PM UTC-5, JimBrakefield wrote:
> > On Wednesday, July 28, 2021 at 9:19:32 PM UTC-5, JimBrakefield wrote:
>
> > > Although the belt is write once, it is logically ordered by instruction sequence. Which the compiler arranges.
> > > So there does not seem to be a simple mapping between the belt and this write once register block idea.
> > > I'll skip the details and state that the register numbers in the function unit reservation stations are not used to write to the registers,
> > > instead the register numbers describe the dependency graph of results and thus control the order of evaluation.
> > > (am assuming function units wait for both operands which are obtained individually when function units broadcast their results)
> > >
> > > So the idea maps most easily to OoO RISC architecture. There is a mapping to a stack/accumulator machine (the writes are to the data stack).
> > > At the end of the loop the stack pointer is reset. Requires in-order instruction issue?
> > > > might up not much like a regular ISA at all, but more like a cellular
> > > > automaton or stream machine.
> > > Here the goal is to keep as many pipelined function units busy as possible.
> > > So in that sense it is a stream machine. The "draw" is that the OoO RISC hardware, without renaming, is repurposed into a stream machine.
> > There is a remaining degree of freedom on the RISC mapping: within the write-once register block the registers can be allocated sequentially.
> > So if the block starts at zero, the destination register fields are sequential. Which helps with multiple issue. Likewise helps with mapping to
> <
> You must arrange the situation where the subroutine calling convention is integrated
> with the register in block ordering such that you do not require ANY added data
> movement in order to enter a loop.
> <
> > a belt machine or an accumulator/stack machine.
> >
> > A bigger problem is that not all loops want a clean slate of the write-once register block. Reusing previous values can reduce memory traffic
> > (such as in FIR filters, etc). The tags associated with the write-once registers get more complicated.

I would expect a RISC enter-loop instruction to indicate the write-once register block by, say, two register fields.
My 66000 uses a bit mask, which is fine?

Re: Benefits of a write once constraint on register values in an instruction loop

<6b4c0fcf-b52f-4f38-8d35-6d37dd6d8e0fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19339&group=comp.arch#19339

 by: MitchAlsup - Fri, 30 Jul 2021 00:30 UTC

On Thursday, July 29, 2021 at 6:28:29 PM UTC-5, JimBrakefield wrote:
> On Thursday, July 29, 2021 at 6:15:20 PM UTC-5, MitchAlsup wrote:
> > On Thursday, July 29, 2021 at 5:43:59 PM UTC-5, JimBrakefield wrote:
> > > On Wednesday, July 28, 2021 at 9:19:32 PM UTC-5, JimBrakefield wrote:
> >
> > > > Although the belt is write once, it is logically ordered by instruction sequence. Which the compiler arranges.
> > > > So there does not seem to be a simple mapping between the belt and this write once register block idea.
> > > > I'll skip the details and state that the register numbers in the function unit reservation stations are not used to write to the registers,
> > > > instead the register numbers describe the dependency graph of results and thus control the order of evaluation.
> > > > (am assuming function units wait for both operands which are obtained individually when function units broadcast their results)
> > > >
> > > > So the idea maps most easily to OoO RISC architecture. There is a mapping to a stack/accumulator machine (the writes are to the data stack).
> > > > At the end of the loop the stack pointer is reset. Requires in-order instruction issue?
> > > > > might up not much like a regular ISA at all, but more like a cellular
> > > > > automaton or stream machine.
> > > > Here the goal is to keep as many pipelined function units busy as possible.
> > > > So in that sense it is a stream machine. The "draw" is that the OoO RISC hardware, without renaming, is repurposed into a stream machine.
> > > There is a remaining degree of freedom on the RISC mapping: within the write-once register block the registers can be allocated sequentially.
> > > So if the block starts at zero, the destination register fields are sequential. Which helps with multiple issue. Likewise helps with mapping to
> > <
> > You must arrange the situation where the subroutine calling convention is integrated
> > with the register in block ordering such that you do not require ANY added data
> > movement in order to enter a loop.
> > <
> > > a belt machine or an accumulator/stack machine.
> > >
> > > A bigger problem is that not all loops want a clean slate of the write-once register block. Reusing previous values can reduce memory traffic
> > > (such as in FIR filters, etc). The tags associated with the write-once registers get more complicated.
<
> Would expect a RISC enter loop instruction to indicate the write-once register block by, say, two register fields.
> My 66000 uses a bit mask which is fine?
<
But this bit mask specifies the registers which carry loop-to-loop dependencies,
so that during installation of the loop the registers can be classified into scalar
(read only in the loop), vector (written in the loop), and carry (written in one loop
iteration, read in a subsequent one). The carry registers are delivered to the register file
when the loop terminates. It turns out that the vast majority of loops do not
have any carry dependencies.
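
The three-way classification can be sketched as follows, assuming a toy loop body given as (destination, sources) register numbers and a carry mask with bit i set for register Ri. The representation and the name `classify_registers` are hypothetical and say nothing about how the hardware actually does this during loop installation.

```python
def classify_registers(body, carry_mask):
    """Classify each register used in a loop body.

    body: list of (dest, srcs) tuples of register numbers.
    carry_mask: bit i set means Ri carries a loop-to-loop dependency.
    Returns {register: "scalar" | "vector" | "carry"}.
    """
    written = {dest for dest, _ in body}
    read = {src for _, srcs in body for src in srcs}
    classes = {}
    for reg in sorted(written | read):
        if (carry_mask >> reg) & 1:
            classes[reg] = "carry"    # written in one pass, read in a later one
        elif reg in written:
            classes[reg] = "vector"   # written inside the loop
        else:
            classes[reg] = "scalar"   # only read inside the loop
    return classes


# R4 = R1 * R2; R5 = R4 + R3; R6 = R6 + R5 (R6 is a running sum)
example = classify_registers([(4, (1, 2)), (5, (4, 3)), (6, (6, 5))], 1 << 6)
```

In this example only the accumulator R6 is a carry register; a loop with no reduction or recurrence would have an empty mask.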

Re: Benefits of a write once constraint on register values in an instruction loop

<5a19554c-5db5-404e-9a0b-25d4f3d0d7f2n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19911&group=comp.arch#19911

 by: JimBrakefield - Mon, 16 Aug 2021 18:42 UTC

On Thursday, July 29, 2021 at 1:29:32 PM UTC-5, MitchAlsup wrote:
> On Thursday, July 29, 2021 at 5:19:04 AM UTC-5, Thomas Koenig wrote:
> > JimBrakefield <jim.bra...@ieee.org> schrieb:
> > > Livermore Loops code density would appear to benefit from
> > > load/store register indirect auto increment.
> > Why is there such a big concern with Livermore Loops, especially
> > for code density? I would assume that any CPU used for scientific
> > work would run these out of its L1 cache without problems.
> >
> > Apart from that, any auto increment / decrement will result in two
> > stores, so presumably it will be cracked into two microops anyway.
> <
> Two register file accesses: LD-w/AI 2 writes ST-w/AI 1write 1 read.
> >
> > If you want do do something for instruction density where it
> > helps many of today's users, look towards browers, java VM and
> > Javascript engines and video codecs.
> >
> > Plus, of course, games, but I don't know of any benchmark
> > there.

Good point: auto-increment and auto-decrement addressing modes will require many register updates for Livermore Loops,
which in turn makes PDP-11 style addressing modes problematic for high performance.
The first-order solution is a dedicated index register and addressing modes that use it.
E.g. (R++) becomes (R + ix<<size), where R is the base address of the data array (or derived from it),
and (--R) becomes (R - ix<<size) for use with a descending-index (count down to zero) loop.
If one has larger/longer instructions, a register field for the index register is possible.
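
A minimal sketch comparing the two addressing styles on a Livermore-like x[k] = y[k] + z[k] loop, assuming memory modeled as a dictionary of byte addresses. The function names and the update counters are illustrative assumptions; the counters tally only address-register updates, not data-register writes.

```python
def add_autoincrement(mem, x, y, z, n, size_log2):
    """(R++) style: every array pointer is rewritten on every pass."""
    step = 1 << size_log2
    pointer_updates = 0
    for _ in range(n):
        a = mem[y]
        y += step
        b = mem[z]
        z += step
        mem[x] = a + b
        x += step
        pointer_updates += 3
    return pointer_updates


def add_indexed(mem, x, y, z, n, size_log2):
    """(R + ix<<size) style: bases stay fixed, one index register is updated."""
    index_updates = 0
    ix = 0
    while ix < n:
        offset = ix << size_log2
        mem[x + offset] = mem[y + offset] + mem[z + offset]
        ix += 1
        index_updates += 1
    return index_updates
```

For a loop touching three arrays the auto-increment form updates three address registers per pass, while the indexed form updates one, and that single update can be merged with the loop-closing compare and branch.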

Re: Benefits of a write once constraint on register values in an instruction loop

<95464bd0-3fc4-4380-bc91-047dbb159f72n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19924&group=comp.arch#19924

 by: MitchAlsup - Tue, 17 Aug 2021 01:00 UTC

On Monday, August 16, 2021 at 1:42:52 PM UTC-5, JimBrakefield wrote:
> On Thursday, July 29, 2021 at 1:29:32 PM UTC-5, MitchAlsup wrote:
> > On Thursday, July 29, 2021 at 5:19:04 AM UTC-5, Thomas Koenig wrote:
> > > JimBrakefield <jim.bra...@ieee.org> schrieb:
> > > > Livermore Loops code density would appear to benefit from
> > > > load/store register indirect auto increment.
> > > Why is there such a big concern with Livermore Loops, especially
> > > for code density? I would assume that any CPU used for scientific
> > > work would run these out of its L1 cache without problems.
> > >
> > > Apart from that, any auto increment / decrement will result in two
> > > stores, so presumably it will be cracked into two microops anyway.
> > <
> > Two register file accesses: LD-w/AI 2 writes ST-w/AI 1write 1 read.
> > >
> > > If you want do do something for instruction density where it
> > > helps many of today's users, look towards browers, java VM and
> > > Javascript engines and video codecs.
> > >
> > > Plus, of course, games, but I don't know of any benchmark
> > > there.
> Good point, auto-increment and auto-decrement addressing modes will require many register updates for Livermore loops.
> Which in turn, makes PDP11 style addressing modes problematical for high performance.
<
As if they were not already so.
<
> The first order solution is a dedicated index register and addressing modes that use it.
> E.g. (R++) becomes (R + ix<<size) where R is the base address of the data array (or derived from it)
> And (--R) becomes (R - ix<<size) for use with descending index (count down to zero) loop.
<
In K9 (my x86-64 machine) we had a HW peephole optimizer that would take a
series of stack pushes and pops and replace them with a series of STs/LDs terminated
with a single increment, which decoupled the AGEN instructions from the arithmetic
instructions.
<
> If one has larger/longer instructions, a register field for the index register is possible.
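
The push/pop folding described above can be sketched in software as a rewrite over a toy instruction list. This is only an illustration under stated assumptions: the tuple encoding, the 8-byte word size, and the name `fold_stack_ops` are hypothetical, and nothing here reflects how the K9 hardware actually implemented it.

```python
WORD = 8  # assumed stack slot size in bytes


def fold_stack_ops(ops):
    """Rewrite runs of push/pop into SP-relative ST/LD plus one SP adjustment.

    ops: list of tuples such as ("push", reg), ("pop", reg), or anything else.
    Offsets in the emitted ("st", reg, off) / ("ld", reg, off) are relative
    to the stack pointer value at the start of the current run.
    """
    out = []
    delta = 0  # pending stack pointer adjustment for the current run

    def flush():
        nonlocal delta
        if delta != 0:
            out.append(("add_sp", delta))
        delta = 0

    for op in ops:
        if op[0] == "push":
            delta -= WORD
            out.append(("st", op[1], delta))
        elif op[0] == "pop":
            out.append(("ld", op[1], delta))
            delta += WORD
        else:
            flush()        # any other instruction ends the run
            out.append(op)
    flush()
    return out


folded = fold_stack_ops([
    ("push", "rbx"), ("push", "rbp"), ("push", "r12"),
    ("call", "f"),
    ("pop", "r12"), ("pop", "rbp"), ("pop", "rbx"),
])
```

After folding, none of the stores or loads depends on a stack pointer value produced by its neighbor, so their address generation can proceed in parallel, with the single adjustment at the end of each run.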
