Split register files

<sb6s70$dip$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18085&group=comp.arch#18085

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Split register files
Date: Sat, 26 Jun 2021 09:32:16 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sb6s70$dip$1@newsreader4.netcologne.de>
Injection-Date: Sat, 26 Jun 2021 09:32:16 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="13913"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sat, 26 Jun 2021 09:32 UTC

Split register files have been proposed, for example in
http://www.owlnet.rice.edu/~elec525/projects/prf_report.pdf .
However, at least in this report the ISA was not changed.

So, how about this:

The ISA supports n register banks with m registers each. m=8
would probably be a good first guess, n could be 4 or 8
(for example).

Arithmetic instructions are restricted to a register bank,
as is address calculation for a load/store.

Communication between register banks is restricted to memory moves.
Synchronization between banks occurs when a register move occurs
(but only for the affected register) or on branches.

Some operations (like floating point or integer division) might
only be allowed for certain banks (say you have 8 banks of 8
registers, but only 4 of these can do floating point in addition
to all the integer operations).
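The two banking rules above (all operands of an arithmetic instruction live in one bank; some operations exist only in certain banks) amount to a validity check that an assembler or compiler back end might run. A minimal sketch, assuming n=8 banks of m=8 registers and, purely as a hypothetical example, floating point available only in banks 0 to 3; `Reg`, `check_instruction` and the opcode names are illustrative and not part of any real ISA.

```python
from dataclasses import dataclass

N_BANKS = 8  # n: number of register banks (example value from the proposal)
M_REGS = 8   # m: registers per bank (example value from the proposal)

# Hypothetical: only banks 0-3 have floating-point units.
FP_BANKS = frozenset(range(4))
FP_OPS = frozenset({"fadd", "fmul", "fdiv"})


@dataclass(frozen=True)
class Reg:
    bank: int   # 0 .. N_BANKS-1
    index: int  # 0 .. M_REGS-1


def check_instruction(op: str, dst: Reg, src1: Reg, src2: Reg) -> None:
    """Raise ValueError if a three-operand instruction breaks the banking rules."""
    operands = (dst, src1, src2)
    for r in operands:
        if not (0 <= r.bank < N_BANKS and 0 <= r.index < M_REGS):
            raise ValueError(f"{op}: register out of range: {r}")
    # Rule 1: arithmetic stays within a single bank.
    if len({r.bank for r in operands}) != 1:
        raise ValueError(f"{op}: operands span more than one bank")
    # Rule 2: some operations are only available in some banks.
    if op in FP_OPS and dst.bank not in FP_BANKS:
        raise ValueError(f"{op}: bank {dst.bank} has no floating-point unit")


if __name__ == "__main__":
    check_instruction("add", Reg(5, 0), Reg(5, 1), Reg(5, 2))   # accepted
    check_instruction("fadd", Reg(2, 3), Reg(2, 4), Reg(2, 5))  # accepted
    for bad in [("add", Reg(0, 0), Reg(1, 1), Reg(0, 2)),
                ("fmul", Reg(6, 0), Reg(6, 1), Reg(6, 2))]:
        try:
            check_instruction(*bad)
        except ValueError as e:
            print("rejected:", e)
```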

SIMD operations might be done in the normal register file by doing
them simultaneously on registers i*m + j, where i is from 0 to n-1
(possibly selected by a bitmap) and j is from 0 to m-1.
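To make the SIMD indexing concrete: with banks numbered i = 0..n-1 and registers j = 0..m-1 within each bank, the flat register number is i*m + j, and a bitmap selects which banks take part. A minimal sketch with n=4 and m=8; the 64-bit wrap-around lanes, the name `simd_add` and the mask convention (bit i selects bank i) are assumptions for illustration.

```python
N_BANKS = 4             # n
M_REGS = 8              # m
MASK64 = (1 << 64) - 1  # assumption: 64-bit registers, wrap-around add


def flat_index(i: int, j: int) -> int:
    """Flat register number of register j in bank i, i.e. i*m + j."""
    if not (0 <= i < N_BANKS and 0 <= j < M_REGS):
        raise IndexError(f"bank {i}, register {j} out of range")
    return i * M_REGS + j


def simd_add(regs: list[int], j_dst: int, j_a: int, j_b: int, bank_mask: int) -> None:
    """For every bank i whose bit is set in bank_mask, compute
    r[i*m + j_dst] = r[i*m + j_a] + r[i*m + j_b].

    Every lane reads and writes only its own bank, so the banks can
    work in parallel without any cross-bank traffic.
    """
    for i in range(N_BANKS):
        if (bank_mask >> i) & 1:
            a = regs[flat_index(i, j_a)]
            b = regs[flat_index(i, j_b)]
            regs[flat_index(i, j_dst)] = (a + b) & MASK64


regs = list(range(N_BANKS * M_REGS))       # r[k] = k, just as sample contents
simd_add(regs, 0, 1, 2, bank_mask=0b0101)  # banks 0 and 2 only
print(regs[0], regs[8], regs[16], regs[24])  # 3 8 35 24
```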

Potential advantages:

This would need fewer read/write ports per bank and fewer
interconnections.

Resources like (parts of) the ALU could be provided per register
bank, or per pair of register banks.

Because the compiler would only see arithmetic instructions
between registers of the same bank, it would hopefully tailor its
register allocation accordingly and expose inherent parallelism in
the algorithm (but see disadvantages below).

Register encoding for a three-operand instruction would use only
2+3+3+3=11 bits for m=8 and n=4, or 3+3+3+3=12 bits for m=8 and n=8,
instead of 15 and 18 bits for a flat file of 32 or 64 registers,
respectively.
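The bit counts follow from one bank selector of log2(n) bits per instruction plus three register indices of log2(m) bits each, versus three indices of log2(n*m) bits for a flat file. A small sketch checking that arithmetic; it assumes n and m are powers of two and that all three operands share one bank, as the proposal requires.

```python
def bits(k: int) -> int:
    """Bits needed to encode k distinct values (k a power of two >= 2)."""
    return (k - 1).bit_length()


def banked_field_bits(n_banks: int, m_regs: int, operands: int = 3) -> int:
    # One bank selector for the whole instruction, short index per operand.
    return bits(n_banks) + operands * bits(m_regs)


def flat_field_bits(n_regs: int, operands: int = 3) -> int:
    # Every operand names one of all n_regs registers.
    return operands * bits(n_regs)


for n, m in [(4, 8), (8, 8)]:
    print(f"n={n}, m={m}: banked {banked_field_bits(n, m)} bits, "
          f"flat {flat_field_bits(n * m)} bits")
# n=4, m=8: banked 11 bits, flat 15 bits
# n=8, m=8: banked 12 bits, flat 18 bits
```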

Potential disadvantages:

It is less orthogonal, which might make the compiler's job more difficult.

Not all algorithms might be suitable; a lot of the advantages could
get lost to stalls waiting for another bank (although the report
cited above seems to indicate otherwise).

This critically depends on the compiler putting instructions in
the right register bank. Current register allocation algorithms
might not be good enough for optimum performance.

Comments? Would this offer no advantages over a conventional OoO
architecture? Has it been tried before and not caught on
because...?

Re: Split register files

<sb6vfb$1ov$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18086&group=comp.arch#18086

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 03:27:56 -0700
Organization: A noiseless patient Spider
Lines: 62
Message-ID: <sb6vfb$1ov$1@dont-email.me>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 26 Jun 2021 10:27:55 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="02ab84ba7237c8d75b176d7d52cc3225";
logging-data="1823"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+8cAnNxM7BJUQD6OK+cCU3"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:PDv3bU3BheNur6dhZqQ13Qh/VfE=
In-Reply-To: <sb6s70$dip$1@newsreader4.netcologne.de>
Content-Language: en-US
 by: Ivan Godard - Sat, 26 Jun 2021 10:27 UTC

On 6/26/2021 2:32 AM, Thomas Koenig wrote:
> Split register files have been proposed, for example in
> http://www.owlnet.rice.edu/~elec525/projects/prf_report.pdf .
> However, at least in this report the ISA was not changed.
>
> So, how about this:
>
> The ISA supports n register banks with m registers each. m=8
> would probably be a good first guess, n could be 4 or 8
> (for example).
>
> Arithmetic instructions are restricted to a register bank,
> as is address calculation for a load/store.
>
> Communication between register banks is restricted to memory moves.
> Synchronization between banks occurs when a register move occurs
> (but only for the affected register) or on branches.
>
> Some operations (like floating point or integer division) might
> only be allowed for certain banks (say you have 8 banks of 8
> registers, but only 4 of these can do floating point in addition
> to all the integer operations).
>
> SIMD operations might be done in the normal register file by doing
> them simultaneously on registers i*m + j, where i is from 0 to n-1
> (possibly selected by a bitmap) and j is from 0 to m-1.
>
> Potential advantages:
>
> This would need fewer read/write ports per bank and fewer
> interconnections.
>
> Resources like (parts of) the ALU could be provided per register
> bank, or per pair of register banks.
>
> Because the compiler would only see arithmetic instructions
> between registers of the same bank, it would hopefully tailor its
> register allocation accordingly and expose inherent parallelism in
> the algorithm (but see disadvantages below).
>
> Register encoding for a three-operand instruction would use only
> 2+3+3+3=11 bits for m=8 and n=4, or 3+3+3+3=12 bits for m=8 and n=8,
> instead of 15 and 18 bits for a flat file of 32 or 64 registers,
> respectively.
>
> Potential disadvantages:
>
> It is less orthogonal, which might make the compiler's job more difficult.
>
> Not all algorithms might be suitable; a lot of the advantages could
> get lost to stalls waiting for another bank (although the report
> cited above seems to indicate otherwise).
>
> This critically depends on the compiler putting instructions in
> the right register bank. Current register allocation algorithms
> might not be good enough for optimum performance.
>
> Comments? Would this offer no advantages over a conventional OoO
> architecture? Has it been tried before and not caught on
> because...?
>

See the Texas Instruments C64

Re: Split register files

<memo.20210626112812.12384U@jgd.cix.co.uk>

https://www.novabbs.com/devel/article-flat.php?id=18087&group=comp.arch#18087

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: jgd...@cix.co.uk (John Dallman)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 11:28 +0100 (BST)
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <memo.20210626112812.12384U@jgd.cix.co.uk>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
Reply-To: jgd@cix.co.uk
Injection-Info: reader02.eternal-september.org; posting-host="66deb8a96d432ad1ede7ca255b0ed4dd";
logging-data="2226"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/OICoyEZhqJPIsWAgxtvyT6N3MISv5O/8="
Cancel-Lock: sha1:Su09/YXk/bRi7aaJPh+emn5vAzg=
 by: John Dallman - Sat, 26 Jun 2021 10:28 UTC

In article <sb6s70$dip$1@newsreader4.netcologne.de>,
tkoenig@netcologne.de (Thomas Koenig) wrote:

> Comments? Would this offer no advantages over a conventional OoO
> architecture? Has it been tried before and not caught on
> because...?

Makes saving and restoring registers on an interrupt more complicated.

Makes rules about callee-preserved registers more complicated.

Creates artificial shortages of registers for complex expressions,
increasing spills/fills to memory.

Wrong-register compiler bugs become more complicated.

My experience of something like this is with the Z80 alternate register
set, one of whose intended uses was to avoid saving registers in
interrupt service routines. However, that was only useful if you had a
hard guarantee that no interrupt would occur while another was being
serviced. That can be possible in a fixed-purpose embedded system, if you
know that all the interrupt service routines are short and can avoid
enabling interrupts within them. In any kind of general-purpose computer,
that isn't practical.

John

Re: Split register files

<sb70ed$fsg$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18088&group=comp.arch#18088

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 10:44:29 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sb70ed$fsg$1@newsreader4.netcologne.de>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<memo.20210626112812.12384U@jgd.cix.co.uk>
Injection-Date: Sat, 26 Jun 2021 10:44:29 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="16272"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sat, 26 Jun 2021 10:44 UTC

John Dallman <jgd@cix.co.uk> schrieb:
> In article <sb6s70$dip$1@newsreader4.netcologne.de>,
> tkoenig@netcologne.de (Thomas Koenig) wrote:
>
>> Comments? Would this offer no advantages over a conventional OoO
>> architecture? Has it been tried before and not caught on
>> because...?
>
> Makes saving and restoring registers on an interrupt more complicated.

OK; add one synchronizing "store multiple" instruction, or
even a "store multiple banks" instruction.

> Makes rules about callee-preserved registers more complicated.

Have certain banks caller-saved and other banks callee-saved.
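One way to read this suggestion: the calling convention assigns whole banks to the caller-saved or callee-saved class, so save/restore decisions are made per bank and could be served by a "store multiple banks" instruction like the one mentioned above. A minimal sketch, assuming a hypothetical 4-bank split (banks 0 and 1 caller-saved, banks 2 and 3 callee-saved) and a hypothetical bitmap operand for that instruction.

```python
# Hypothetical split, not part of the original proposal.
CALLER_SAVED = frozenset({0, 1})  # arguments, return values, temporaries
CALLEE_SAVED = frozenset({2, 3})  # preserved across calls


def callee_must_save(banks_written: set[int]) -> list[int]:
    """Banks a function must save in its prologue and restore in its epilogue."""
    return sorted(banks_written & CALLEE_SAVED)


def caller_must_save(banks_live_across_call: set[int]) -> list[int]:
    """Banks a caller must save around a call because the callee may clobber them."""
    return sorted(banks_live_across_call & CALLER_SAVED)


def bank_mask(banks: list[int]) -> int:
    """Bitmap operand for a hypothetical 'store multiple banks' instruction."""
    mask = 0
    for b in banks:
        mask |= 1 << b
    return mask


saves = callee_must_save({0, 2, 3})
print(saves, bin(bank_mask(saves)))  # [2, 3] 0b1100
print(caller_must_save({1, 3}))      # [1]
```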

> Creates artificial shortages of registers for complex expressions,
> increasing spills/fills to memory.

It would be possible to use a register in another bank for spilling;
I would expect a register-to-register copy between banks to
be a single-cycle operation, much faster than an L1 cache access,
which is usually four cycles these days. The main point, of course,
is how rare such cross-bank spills would be; the data in the
report I cited seem to show that they are not that frequent even
in the absence of specific optimizations.

> Wrong-register compiler bugs become more complicated.

I'd tend to think of this as a problem that can be solved rather
easily. I haven't looked at it closely, but it should be rather
straightforward to have instructions that operate only on a
subset of registers, for example in a gcc machine description.

> My experience of something like this is with the Z80 alternate register
> set, one of whose intended uses was to avoid saving registers in
> interrupt service routines. However, that was only useful if you had a
> hard guarantee that no interrupt would occur while another was being
> serviced. That can be possible in a fixed-purpose embedded system, if you
> know that all the interrupt service routines are short and can avoid
> enabling interrupts within them. In any kind of general-purpose computer,
> that isn't practical.

Hm, I don't quite see the similarity here.

Re: Split register files

<sb70q1$fsg$2@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18089&group=comp.arch#18089

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 10:50:41 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sb70q1$fsg$2@newsreader4.netcologne.de>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<sb6vfb$1ov$1@dont-email.me>
Injection-Date: Sat, 26 Jun 2021 10:50:41 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="16272"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sat, 26 Jun 2021 10:50 UTC

Ivan Godard <ivan@millcomputing.com> schrieb:
> On 6/26/2021 2:32 AM, Thomas Koenig wrote:

> See the Texas Instruments C64

That's rather interesting, thanks!

Looks rather similar to what I had in mind, except that they allow
at least some cross-operation between the different register files,
and they had 16 registers (later 32) per register file.

Re: Split register files

<memo.20210626124811.12384W@jgd.cix.co.uk>

https://www.novabbs.com/devel/article-flat.php?id=18090&group=comp.arch#18090

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: jgd...@cix.co.uk (John Dallman)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 12:48 +0100 (BST)
Organization: A noiseless patient Spider
Lines: 40
Message-ID: <memo.20210626124811.12384W@jgd.cix.co.uk>
References: <sb70ed$fsg$1@newsreader4.netcologne.de>
Reply-To: jgd@cix.co.uk
Injection-Info: reader02.eternal-september.org; posting-host="66deb8a96d432ad1ede7ca255b0ed4dd";
logging-data="31665"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/jDm3UZjDHsN5GuyyCjbI/9ekM1vbwA8U="
Cancel-Lock: sha1:2OsPuDvUStwPNS/CniW+pUkY7o8=
 by: John Dallman - Sat, 26 Jun 2021 11:48 UTC

In article <sb70ed$fsg$1@newsreader4.netcologne.de>,
tkoenig@netcologne.de (Thomas Koenig) wrote:

> > Makes rules about callee-preserved registers more complicated.
> Have certain banks caller-saved and other banks callee-saved.

That seems to argue for a designated bank for function arguments and
returns. Which is a plausible consequence of the idea.

> > Creates artificial shortages of registers for complex expressions,
> > increasing spills/fills to memory.
> It would be possible to use a register in another bank for spilling;
> I would expect a register-to-register copy between banks to
> be a single-cycle operation, much faster than an L1 cache access,

OK, but you said in the original posting:

*> Communication between register banks is restricted to memory moves.

If that is not the case, then the idea is less radical.

> > Wrong-register compiler bugs become more complicated.
> I'd tend to think of this as a problem that can be solved rather
> easily. I haven't looked at it closely, but it should be rather
> straightforward to have instructions that operate only on a
> subset of registers, for example in a gcc machine description.

I'm thinking more of bugs where the compiler has lost track of which
register bank a value is in, and references the right register number in
a bank that isn't the current bank. I've seen plenty of bugs where a
compiler gets the wrong register in a flat register file; this allows for
another layer of confusion.

> Hm, I don't quite see the similarity here.

You have limited numbers of a resource that you can only see one instance
of at a time. You need to keep track of which instance you can see,
without spending too much time using enquiry instructions.

John

Re: Split register files

<sb7575$jgr$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18091&group=comp.arch#18091

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 12:05:57 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sb7575$jgr$1@newsreader4.netcologne.de>
References: <sb70ed$fsg$1@newsreader4.netcologne.de>
<memo.20210626124811.12384W@jgd.cix.co.uk>
Injection-Date: Sat, 26 Jun 2021 12:05:57 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="19995"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sat, 26 Jun 2021 12:05 UTC

John Dallman <jgd@cix.co.uk> schrieb:
> In article <sb70ed$fsg$1@newsreader4.netcologne.de>,
> tkoenig@netcologne.de (Thomas Koenig) wrote:
>
>> > Makes rules about callee-preserved registers more complicated.
>> Have certain banks caller-saved and other banks callee-saved.
>
> That seems to argue for a designated bank for function arguments and
> returns. Which is a plausible consequence of the idea.
>
>> > Creates artificial shortages of registers for complex expressions,
>> > increasing spills/fills to memory.
>> It would be possible to use a register in another bank for spilling;
>> I would expect a register-to-register copy between banks to
>> be a single-cycle operation, much faster than an L1 cache access,
>
> OK, but you said in the original posting:
>
> *> Communication between register banks is restricted to memory moves.
>
> If that is not the case, then the idea is less radical.

Yes, probably :-) So, let me fix it to read "Communication between
register banks is restricted to register to register moves",
which makes much more sense.

>> > Wrong-register compiler bugs become more complicated.
>> I'd tend to think of this as a problem that can be solved rather
>> easily. I haven't looked at it closely, but it should be rather
>> straightforward to have instructions that operate only on a
>> subset of registers, for example in a gcc machine description.
>
> I'm thinking more of bugs where the compiler has lost track of which
> register bank a value is in, and references the right register number in
> a bank that isn't the current bank. I've seen plenty of bugs where a
> compiler gets the wrong register in a flat register file; this allows for
> another layer of confusion.

Possible.

The solution, probably, is to stand on the shoulders of giants (i.e.
use a well-established compiler; I mentioned gcc, but LLVM should
also work) rather than roll my own.

There is a danger, of course, of getting the machine description
wrong or of exposing bugs in the register allocator by exercising
rarely-used code paths, for which it might be hard to find a test
case with one of the usual architectures.

Still, if everything else fails, using a separate add (and shift,
and mul, and div, and ... ) instruction for each register bank
should do the trick.

>> Hm, I don't quite see the similarity here.
>
> You have limited numbers of a resource that you can only see one instance
> of at a time. You need to keep track of which instance you can see,
> without spending too much time using enquiry instructions.

I don't see where I would need an enquiry instruction, but I may
not have thought this through.

Re: Split register files

<memo.20210626142449.12384Y@jgd.cix.co.uk>

https://www.novabbs.com/devel/article-flat.php?id=18094&group=comp.arch#18094

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: jgd...@cix.co.uk (John Dallman)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 14:24 +0100 (BST)
Organization: A noiseless patient Spider
Lines: 38
Message-ID: <memo.20210626142449.12384Y@jgd.cix.co.uk>
References: <sb7575$jgr$1@newsreader4.netcologne.de>
Reply-To: jgd@cix.co.uk
Injection-Info: reader02.eternal-september.org; posting-host="66deb8a96d432ad1ede7ca255b0ed4dd";
logging-data="4593"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19MtWMeNYeGIv6ULUQ9KAUdnk4ysOTAIws="
Cancel-Lock: sha1:XYYuOB83oAIMOeqttPSiIo92dNc=
 by: John Dallman - Sat, 26 Jun 2021 13:24 UTC

In article <sb7575$jgr$1@newsreader4.netcologne.de>,
tkoenig@netcologne.de (Thomas Koenig) wrote:

> Yes, probably :-) So, let me fix it to read "Communication between
> register banks is restricted to register to register moves",
> which makes much more sense.

OK.

> There is a danger, of course, of getting the machine description
> wrong or of exposing bugs in the register allocator by exercising
> rarely-used code paths, for which it might be hard to find a test
> case with one of the usual architectures.

Absolutely. This unconventional register structure is likely to expose
bugs that have been quietly sleeping for years.

> Still, if everything else fails, using a separate add (and shift,
> and mul, and div, and ... ) instruction for each register bank
> should do the trick.

At that point, are you really gaining anything? You're using opcode space
for that which could otherwise go in register selection bits.
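The point can be checked with a little arithmetic: if each of k base operations gets a separate opcode per bank, the opcode field must distinguish k*n encodings, which for powers of two costs exactly as many bits as a k-way opcode plus a separate bank field. A sketch; k=64 is an arbitrary example value, not taken from the thread.

```python
def bits(k: int) -> int:
    """Bits needed to encode k distinct values (k a power of two >= 2)."""
    return (k - 1).bit_length()


def per_bank_opcode_bits(base_ops: int, n_banks: int) -> int:
    # A distinct opcode for every (operation, bank) pair.
    return bits(base_ops * n_banks)


def opcode_plus_bank_field_bits(base_ops: int, n_banks: int) -> int:
    # Generic opcode, with the bank chosen by separate register-selection bits.
    return bits(base_ops) + bits(n_banks)


for n in (4, 8):
    print(n, per_bank_opcode_bits(64, n), opcode_plus_bank_field_bits(64, n))
# 4 8 8
# 8 9 9
```

In other words, moving the bank number from the register fields into the opcode only shifts the field boundary; the total instruction bits stay the same.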

> I don't see where I would need an enquiry instruction, but I may
> not have thought this through.

Interrupt service routines that need to make sure they have the same bank
selected when they return. Applies to both hardware and software
interrupts.

Sanity checking for the calling convention, to be turned on in the
compiler when things are amiss.

In general, write-only state gives rise to problems when things are
happening asynchronously.

John

Re: Split register files

<2021Jun26.152415@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=18096&group=comp.arch#18096

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 13:24:15 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 179
Distribution: world
Message-ID: <2021Jun26.152415@mips.complang.tuwien.ac.at>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
Injection-Info: reader02.eternal-september.org; posting-host="69b538dccf69ab92f3d886d7b1af1ad0";
logging-data="19973"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/k3z/jhQCal51MmnXNaj0x"
Cancel-Lock: sha1:LlFAOz+pnlFJXD+zNdV6PRN5rpg=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sat, 26 Jun 2021 13:24 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
>So, how about this:
>
>The ISA supports n register banks with m registers each. m=8
>would probably be a good first guess, n could be 4 or 8
>(for example).
>
>Arithmetic instructions are restricted to a register bank,
>as is address calculation for a load/store.
>
>Communication between register banks is restricted to memory moves.
>Synchronization between banks occurs when a register move occurs
>(but only for the affected register) or on branches.
>
>Some operations (like floating point or integer division) might
>only be allowed for certain banks (say you have 8 banks of 8
>registers, but only 4 of these can do floating point in addition
>to all the integer operations).
>
>SIMD operations might be done in the normal register file by doing
>them simultaneously on registers i*m + j, where i is from 0 to n-1
>(possibly selected by a bitmap) and j is from 0 to m-1.
>
>Potential advantages:
>
>This would need fewer read/write ports per bank and fewer
>interconnections.
>
>Resources like (parts of) the ALU could be provided per register
>bank, or per pair of register banks.
>
>Because the compiler would only see arithmetic instructions
>between registers of the same bank, it would hopefully tailor its
>register allocation accordingly to expose the inherent parallelism in
>the algorithm (but see disadvantages below).
>
>Register encoding for a three-op encoding would only use 2+3+3+3=11
>bits for m=8 and n=4, or 3+3+3+3=12 bits for m=8 and n=8
>instead of 15 and 18 bits for 32 or 64 registers, respectively.
>
>Potential disadvantages:
>
>Less orthogonal, this might make the compiler's job more difficult.
>
>Not all algorithms might be suitable; a lot of the advantages could
>get lost by stalls waiting for another bank (although the report
>quoted above seems to indicate otherwise).
>
>This critically depends on the compiler putting instructions in
>the right register bank. Current register allocation algorithms
>might not be good enough for optimum performance.
>
>Comments?
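The register-encoding arithmetic in the quoted proposal can be checked in a few lines. This is a sketch assuming one shared bank field per instruction plus three in-bank register fields, compared with three full-width register fields; the function names are made up for illustration.

```python
import math

def banked_bits(n_banks: int, regs_per_bank: int, n_operands: int = 3) -> int:
    """One shared bank field plus an in-bank index for each operand."""
    bank_field = math.ceil(math.log2(n_banks))
    reg_field = math.ceil(math.log2(regs_per_bank))
    return bank_field + n_operands * reg_field

def flat_bits(n_regs: int, n_operands: int = 3) -> int:
    """Every operand names one of n_regs registers directly."""
    return n_operands * math.ceil(math.log2(n_regs))

m = 8
for n in (4, 8):
    print(f"n={n}, m={m}: banked {banked_bits(n, m)} bits, "
          f"flat {flat_bits(n * m)} bits")
# prints:
# n=4, m=8: banked 11 bits, flat 15 bits
# n=8, m=8: banked 12 bits, flat 18 bits
```

The saving is the difference between one bank field and repeating the bank bits in every operand field.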

If you do it architecturally, it certainly makes the compiler's task
harder.

I think the Multiflow architecture is organized like this (but I
think with more registers per slice), but my memory is getting hazy.

More recent instances are microarchitectural:

The 21264 is composed of two clusters, each with its own copy of all
registers; cross-cluster communication takes an extra cycle. The
hardware handled this automatically, but the compiler could avoid the
extra cycle through appropriate instruction scheduling.
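A toy model of that extra cycle, as a sketch: it assumes unlimited issue width, one assigned cluster per instruction, and a fixed one-cycle penalty whenever an operand comes from the other cluster. It is not a model of the actual 21264 pipeline, and the names are hypothetical.

```python
from typing import Dict, List, Tuple

# (name, latency, names of producing instructions, assigned cluster)
Instr = Tuple[str, int, List[str], int]

def schedule(instrs: List[Instr], cross_penalty: int = 1) -> Dict[str, int]:
    """Return the cycle in which each result becomes available.
    Instructions must be listed in dependence (program) order."""
    done: Dict[str, int] = {}
    cluster: Dict[str, int] = {}
    for name, latency, srcs, cl in instrs:
        start = 0
        for s in srcs:
            # An operand produced in the other cluster arrives later.
            penalty = cross_penalty if cluster[s] != cl else 0
            start = max(start, done[s] + penalty)
        done[name] = start + latency
        cluster[name] = cl
    return done

chain_same = [("a", 1, [], 0), ("b", 1, ["a"], 0), ("c", 1, ["b"], 0)]
chain_split = [("a", 1, [], 0), ("b", 1, ["a"], 1), ("c", 1, ["b"], 0)]
print(schedule(chain_same)["c"])   # 3
print(schedule(chain_split)["c"])  # 5
```

A dependence chain that bounces between clusters pays the penalty on every hop, which is what good scheduling tries to avoid.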

ILDP [kim&smith02] also works with copies of the full register file.

> Would offer no advantages over conventional OoO
>architecture? Has been tried before and didn't catch on
>because...?

ILDP has been evaluated by other researchers [salverda&zilles07], and
found to have more disadvantages than the original paper promised.

Recent OoO implementations have not used the cluster-splitting that
the 21264 used (AFAIK), so apparently the cost of having many ports to
the register file is not a big problem these days. I guess that OoO
can arrange writes to go to different banks of the register file (so
only one write port per bank is necessary), and an occasional delay
cycle in read access does not hurt that much; the time-critical stuff
happens on the forwarding paths anyway. As to why the gazillion
forwarding paths between the 10 FUs (with 20 inputs) is manageable
these days, I don't know.
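The guess about steering writes to different banks can be sketched as a rename-time allocator that picks at most one destination register per bank in each cycle. This assumes physical register p lives in bank p % n_banks and that the rename stage stalls when it cannot find enough banks with a free register; the function and its policy are hypothetical, not a description of any real core.

```python
from typing import List

def allocate_writes(free_by_bank: List[List[int]], n_writes: int) -> List[int]:
    """Choose destination physical registers for the results written in one
    cycle, at most one per bank (one write port per bank).
    free_by_bank[b] holds the free registers of bank b; updated in place."""
    # Prefer the banks with the most free registers to keep banks balanced.
    order = sorted(range(len(free_by_bank)),
                   key=lambda b: len(free_by_bank[b]), reverse=True)
    chosen: List[int] = []
    for b in order[:n_writes]:
        if not free_by_bank[b]:
            break  # all remaining banks are empty too, given the sort order
        chosen.append(free_by_bank[b].pop())
    if len(chosen) < n_writes:
        # Return the registers taken so far, then signal a stall.
        for b, p in zip(order, chosen):
            free_by_bank[b].append(p)
        raise RuntimeError("not enough banks with a free register: stall")
    return chosen

free = [[0, 4], [1], [2, 6, 10], []]  # bank b holds registers p with p % 4 == b
print(allocate_writes(free, 3))       # [10, 4, 1]: three different banks
```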

In any case, if your idea has merit, one can implement it in the front
end of an OoO CPU (probably easier than in a compiler, and, more
importantly, with no need to break backwards compatibility). I expect
that most programs have data flow that is too intertwined, and will
suffer a lot from the forced inter-cluster moves. Of course, it depends on the
other costs involved. If you can increase the clock rate by 50% with
your idea, the benefit is probably greater than the cost of the
inter-cluster moves.
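The trade-off in the last paragraph is simple arithmetic: execution time is cycles divided by clock rate. A sketch, assuming the inter-cluster moves are the only added cycles and everything else stays equal:

```python
def speedup(extra_cycles: float, clock_gain: float) -> float:
    """Speedup of a split design over the baseline.

    extra_cycles: cycles added by inter-cluster moves, as a fraction of the
                  baseline cycle count (0.2 means 20% more cycles).
    clock_gain:   relative clock-rate increase (0.5 means a 50% faster clock).
    """
    baseline_time = 1.0
    split_time = (1.0 + extra_cycles) / (1.0 + clock_gain)
    return baseline_time / split_time

# With a 50% faster clock, the split design wins as long as the moves add
# fewer than 50% extra cycles, and loses beyond that.
print(speedup(0.2, 0.5) > 1)  # True
print(speedup(0.6, 0.5) > 1)  # False
```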

@InProceedings{kim&smith02,
author = {Ho-Seop Kim and James E. Smith},
title = {An Instruction Set and Microarchitecture for
Instruction Level Distributed Processing},
crossref = {isca02},
pages = {71--81},
url = {http://www.ece.wisc.edu/~hskim/papers/kimh_ildp.pdf},
annote = {This paper addresses the problems of wide
superscalars with communication across the chip and
the number of write ports in the register file. The
authors propose an architecture (ILDP) with
general-purpose registers and with accumulators
(with instructions only accessing one accumulator
(read and/or write) and one register (read or
write); for the accumulators their death is
specified explicitly in the instructions. The
microarchitecture builds \emph{strands} from
instructions working on an accumulator; a strand
starts with an instruction writing to an accumulator
without reading from it, continues with instructions
reading from (and possibly writing to) the
accumulator and ends with an instruction that kills
the accumulator. Strands are allocated to one out of
eight processing elements (PEs) dynamically (i.e.,
accumulators are renamed). A PE consists mainly of
one ALU data path (but also a copy of the
GPRs and an L1 cache). They evaluated this
architecture by translating Alpha binaries into it,
and comparing their architecture to a 4-wide or
8-wide Alpha implementation; their architecture has
a lower L1 cache latency, though. The performance of
ILDP in clock cycles is competitive, and one can
expect faster clocks for ILDP. The paper also
presents data for other stuff, e.g. general-purpose
register writes, which have to be promoted between
strands and which are relatively few.}
}

@Proceedings{isca02,
title = "$29^\textit{th}$ Annual International Symposium on Computer Architecture",
booktitle = "$29^\textit{th}$ Annual International Symposium on Computer Architecture",
year = "2002",
key = "ISCA 29",
}

@InProceedings{salverda&zilles07,
author = {Pierre Salverda and Craig Zilles},
title = {Dependence-Based Scheduling Revisited: A Tale of Two
Baselines},
booktitle = {Sixth Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD 2007)},
year = {2007},
url = {http://www.ece.wisc.edu/~wddd/2007/papers/wddd_01.pdf},
url2 = {http://www-sal.cs.uiuc.edu/~zilles/papers/lanes.wddd-2007.pdf},
annote = {When the authors simulated the dependence-based
scheduling work by Palacharla, Kim, and Smith, they
found 30\% lower IPC than a conventional OoO
machine, whereas the original simulations only found
a 5\% lower IPC. The paper analyses the reasons for
this, and provides a number of insights into how
hardware schedulers, execution engines, and various
features in them interact, and why and how
dependence-based scheduling works. The authors'
simulation had a number of significant differences
from the simulation in the original work: it used a
memory disambiguator, 2-cycle load latency (instead
of 1-cycle), and a better branch predictor. These
changes increase the number of strands available at
the same time, and the 8-lane dependence-based
machine becomes lane-limited (and instruction fetch
stalls waiting for a free lane), so it cannot profit
from the improvements or work around the higher
latency, whereas a conventional OoO machine can. 24
lanes would be required to bring the IPC
disadvantage of the dependence-based machine down to
5\% on the authors' simulator. OTOH, by changing
these parts of their simulation to be like the
original work, the dependence-based scheduling only
had an 11\% IPC disadvantage on an 8-lane machine
(much closer to the original 5\%)}
}
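The strand formation described in the kim&smith02 annotation can be sketched as follows. This is a simplified reading of that description, not the paper's microarchitecture: each instruction names one accumulator, says whether it reads it, and carries an explicit kill bit. The Insn type and the example instruction strings are hypothetical, and assigning strands to PEs is left out.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Insn:
    text: str
    acc: int         # accumulator the instruction uses
    reads_acc: bool  # False: writes the accumulator without reading it
    kills_acc: bool  # explicit end of the accumulator's live range

def form_strands(insns: List[Insn]) -> List[List[str]]:
    """Group instructions into strands, one per accumulator live range."""
    open_strands: Dict[int, List[str]] = {}
    strands: List[List[str]] = []
    for ins in insns:
        if not ins.reads_acc:
            # Writing the accumulator without reading it starts a new strand.
            open_strands[ins.acc] = []
            strands.append(open_strands[ins.acc])
        elif ins.acc not in open_strands:
            raise ValueError(f"{ins.text!r} reads a dead accumulator")
        open_strands[ins.acc].append(ins.text)
        if ins.kills_acc:
            del open_strands[ins.acc]  # the strand ends here
    return strands

prog = [
    Insn("A0 = load [r1]", 0, False, False),
    Insn("A1 = r2 + 1", 1, False, False),
    Insn("A0 = A0 + r3", 0, True, False),
    Insn("r4 = A0 (kill A0)", 0, True, True),
    Insn("A1 = A1 * r6", 1, True, False),
    Insn("r5 = A1 (kill A1)", 1, True, True),
]
for strand in form_strands(prog):
    print(strand)  # one line for the A0 strand, one for the A1 strand
```

The two strands are independent, so they could run on different PEs; only r3, r4, r5 and r6 would need to be communicated between them.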

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Split register files

<jwvczs8ptnx.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=18098&group=comp.arch#18098

From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 10:50:42 -0400
Organization: A noiseless patient Spider
Lines: 9
Message-ID: <jwvczs8ptnx.fsf-monnier+comp.arch@gnu.org>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
 by: Stefan Monnier - Sat, 26 Jun 2021 14:50 UTC

> The ISA supports n register banks with m registers each. m=8
> would probably be a good first guess, n could be 4 or 8
> (for example).

You might like to take a look at Bernd Paysan's 4-stack forth chip
https://bernd-paysan.de/4stack.html

Stefan

Re: Split register files

<2a3f1b04-b986-46e3-b5fc-efe6218eaae7n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18104&group=comp.arch#18104

Newsgroups: comp.arch
Date: Sat, 26 Jun 2021 08:43:24 -0700 (PDT)
In-Reply-To: <sb6s70$dip$1@newsreader4.netcologne.de>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2a3f1b04-b986-46e3-b5fc-efe6218eaae7n@googlegroups.com>
Subject: Re: Split register files
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sat, 26 Jun 2021 15:43:25 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Sat, 26 Jun 2021 15:43 UTC

On Saturday, June 26, 2021 at 4:32:18 AM UTC-5, Thomas Koenig wrote:
> Split register files have been proposed, for example in
> http://www.owlnet.rice.edu/~elec525/projects/prf_report.pdf .
> However, at least in this report the ISA was not changed.
>
> So, how about this:
<><><>
>
> Comments? Would offer no advantages over conventional OoO
> architecture? Has been tried before and didn't catch on
> because...?
<
Just put me down as no. How registers are named and used is a compiler
problem not a hardware problem.
<
If you need more read ports, replicate the register file and see that both
copies get the same values written into them.
<
If you need more write ports, you are already SoL.
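The replication trick as a toy model: a sketch assuming two identical copies with two read ports each. Every write goes to all copies, so read bandwidth doubles while each copy still needs every write port, which is why more write ports cannot be bought this way. The class and its parameters are hypothetical.

```python
from typing import List

class ReplicatedRegFile:
    """Toy register file made of identical copies, each with a fixed number
    of read ports. Writes are broadcast so that all copies stay equal."""

    def __init__(self, n_regs: int, copies: int = 2, read_ports_per_copy: int = 2):
        self.copies = [[0] * n_regs for _ in range(copies)]
        self.read_ports_per_copy = read_ports_per_copy

    def write(self, reg: int, value: int) -> None:
        # Every copy receives every write: replication does not reduce the
        # number of write ports each copy needs.
        for copy in self.copies:
            copy[reg] = value

    def read_cycle(self, regs: List[int]) -> List[int]:
        """Perform the reads issued in one cycle, spreading them over copies."""
        capacity = len(self.copies) * self.read_ports_per_copy
        if len(regs) > capacity:
            raise ValueError(f"{len(regs)} reads exceed {capacity} read ports")
        return [self.copies[i // self.read_ports_per_copy][r]
                for i, r in enumerate(regs)]

rf = ReplicatedRegFile(n_regs=32)
rf.write(5, 42)
rf.write(7, 3)
print(rf.read_cycle([5, 7, 5, 7]))  # [42, 3, 42, 3]
```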

Re: Split register files

<sb7o4e$243$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18107&group=comp.arch#18107

From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 17:28:46 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sb7o4e$243$1@newsreader4.netcologne.de>
References: <sb7575$jgr$1@newsreader4.netcologne.de>
<memo.20210626142449.12384Y@jgd.cix.co.uk>
Injection-Date: Sat, 26 Jun 2021 17:28:46 -0000 (UTC)
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sat, 26 Jun 2021 17:28 UTC

John Dallman <jgd@cix.co.uk> schrieb:
> In article <sb7575$jgr$1@newsreader4.netcologne.de>,
> tkoenig@netcologne.de (Thomas Koenig) wrote:
>
>> Yes, probably :-) So, let me fix it to read "Communication between
>> register banks is restricted to register to register moves",
>> which makes much more sense.
>
> OK.
>
>> There is a danger, of course, of getting the machine description
>> wrong or of exposing bugs in the register allocator by exercising
>> rarely-used code paths, for which it might be hard to find a test
>> case with one of the usual architectures.
>
> Absolutely. This unconventional register structure is likely to expose
> bugs that have been quietly sleeping for years.
>
>> Still, if everything else fails, using a separate add (and shift,
>> and mul, and div, and ... ) instruction for each register bank
>> should do the trick.
>
> At that point, are you really gaining anything? You're using opcode space
> for that which could otherwise go in register selection bits.

It doesn't matter what it's called: whether it is part of
the opcode or part of the register selection bits, it will
be the same bits either way.
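The "same bits either way" point can be made concrete. A sketch assuming n=4 banks of m=8 registers and a hypothetical field order [bank:2][rd:3][rs1:3][rs2:3]; whether the top two bits are read as register selection or as part of the opcode, the 11-bit field is identical.

```python
from typing import Tuple

M = 8  # registers per bank (m in the proposal)
N = 4  # number of banks (n in the proposal); must be a power of two here

def encode(rd: int, rs1: int, rs2: int) -> int:
    """Pack three architectural register numbers into an 11-bit field:
    [bank:2][rd:3][rs1:3][rs2:3] (hypothetical layout)."""
    bank = rd // M
    if not 0 <= bank < N:
        raise ValueError(f"r{rd} does not exist")
    if rs1 // M != bank or rs2 // M != bank:
        raise ValueError("operands are not all in the same bank")
    return (bank << 9) | ((rd % M) << 6) | ((rs1 % M) << 3) | (rs2 % M)

def decode(word: int) -> Tuple[int, int, int]:
    """Recover the architectural register numbers from the 11-bit field."""
    bank = (word >> 9) & (N - 1)
    rd, rs1, rs2 = ((word >> shift) & (M - 1) for shift in (6, 3, 0))
    return (bank * M + rd, bank * M + rs1, bank * M + rs2)

print(encode(0, 3, 7))            # 31: add r0,r3,r7 fits
print(decode(encode(8, 11, 15)))  # (8, 11, 15)
# encode(0, 7, 13) raises ValueError: r13 is in another bank
```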

What I was talking about is how the compiler sees the instructions
(and that it doesn't get the registers wrong).

However, it is probably much easier to set up separate
register classes.
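One way a compiler could derive such register classes: values that appear together in one arithmetic instruction must share a bank, so a union-find pass over the instructions yields the groups that each need a single bank. This is a hypothetical sketch that ignores inserted moves; it also shows how intertwined data flow quickly collapses everything into one group, which is the concern raised earlier in the thread.

```python
from typing import Dict, List, Set, Tuple

def bank_groups(insns: List[Tuple[str, ...]]) -> List[Set[str]]:
    """Union-find over virtual registers: all operands of one arithmetic
    instruction (destination and sources) must end up in the same bank."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for ops in insns:
        root = find(ops[0])
        for v in ops[1:]:
            parent[find(v)] = root  # merge v's group into ops[0]'s group

    groups: Dict[str, Set[str]] = {}
    for v in list(parent):
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())

independent = [("a", "b", "c"), ("d", "e", "f")]
intertwined = independent + [("g", "c", "f")]
print(len(bank_groups(independent)))  # 2: each group can get its own bank
print(len(bank_groups(intertwined)))  # 1: everything must share one bank
```

Splitting a large group again requires inserting register-to-register moves, which is where the instruction-count cost comes from.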

>
>> I don't see where I would need an enquiry instruction, but I may
>> not have thought this through.
>
> Interrupt service routines that need to make sure they have the same bank
> selected when they return. Applies to both hardware and software
> interrupts.

Maybe my terminology was misleading.

What I meant was that r0,r1...r7 would be in a group of registers,
as would be r8, r9, ... r15, etc so there would be a way to encode

add r0,r3,r7

in an instruction but not

add r0,r7,r13

If that was desired, the instruction sequence would be

mr r1, r13
add r0,r7,r2

Re: Split register files

<sb7p66$243$2@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18108&group=comp.arch#18108

From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 17:46:46 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sb7p66$243$2@newsreader4.netcologne.de>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<2021Jun26.152415@mips.complang.tuwien.ac.at>
Injection-Date: Sat, 26 Jun 2021 17:46:46 -0000 (UTC)
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sat, 26 Jun 2021 17:46 UTC

Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
> Thomas Koenig <tkoenig@netcologne.de> writes:
>>So, how about this:
>>[...]
>
> If you do it architecturally, it certainly makes the compiler's task
> harder.
>
> I think the Multiflow architecture is organized like this (but I
> think with more registers per slice), but my memory is getting hazy.

Unfortunately, most Multiflow articles are behind paywalls (and
the Wikipedia article is written like an advertisement, with
little technical content).

Thanks for your pointers to literature, I will take
a look at them.

Re: Split register files

<sb7pe1$243$4@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18109&group=comp.arch#18109

From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 17:50:57 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sb7pe1$243$4@newsreader4.netcologne.de>
References: <sb7575$jgr$1@newsreader4.netcologne.de>
<memo.20210626142449.12384Y@jgd.cix.co.uk>
<sb7o4e$243$1@newsreader4.netcologne.de>
Injection-Date: Sat, 26 Jun 2021 17:50:57 -0000 (UTC)
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sat, 26 Jun 2021 17:50 UTC

Thomas Koenig <tkoenig@netcologne.de> schrieb:
> mr r1, r13
> add r0,r7,r2

add r0,r7,r1

of course.
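The corrected sequence is what a simple legalization pass would emit. A sketch assuming m=8, that all operands are brought into the destination's bank (a simplification: a real compiler could pick any bank), and that the caller supplies free scratch registers in that bank; the function name and the scratch list are hypothetical.

```python
from typing import List

M = 8  # registers per bank (m in the proposal)

def legalize_add(rd: int, rs1: int, rs2: int, scratch: List[int]) -> List[str]:
    """Emit 'add rd,rs1,rs2', first copying any source that lives in another
    bank into a scratch register of rd's bank with 'mr'."""
    bank = rd // M
    free = list(scratch)
    out: List[str] = []
    srcs: List[int] = []
    for r in (rs1, rs2):
        if r // M != bank:
            if not free:
                raise ValueError("no free scratch register in the destination bank")
            t = free.pop(0)
            if t // M != bank:
                raise ValueError(f"scratch register r{t} is not in bank {bank}")
            out.append(f"mr r{t}, r{r}")
            r = t
        srcs.append(r)
    out.append(f"add r{rd},r{srcs[0]},r{srcs[1]}")
    return out

print(legalize_add(0, 7, 13, scratch=[1, 2]))  # ['mr r1, r13', 'add r0,r7,r1']
print(legalize_add(0, 3, 7, scratch=[]))       # ['add r0,r3,r7']
```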

Re: Split register files

<memo.20210626201121.12384b@jgd.cix.co.uk>

https://www.novabbs.com/devel/article-flat.php?id=18112&group=comp.arch#18112

From: jgd...@cix.co.uk (John Dallman)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 20:11 +0100 (BST)
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <memo.20210626201121.12384b@jgd.cix.co.uk>
References: <sb7o4e$243$1@newsreader4.netcologne.de>
Reply-To: jgd@cix.co.uk
 by: John Dallman - Sat, 26 Jun 2021 19:11 UTC

In article <sb7o4e$243$1@newsreader4.netcologne.de>,
tkoenig@netcologne.de (Thomas Koenig) wrote:

> > Interrupt service routines that need to make sure they have the
> > same bank selected when they return. Applies to both hardware
> > and software interrupts.
>
> Maybe my terminology was misleading.
>
> What I meant was that r0,r1...r7 would be in a group of registers,
> as would be r8, r9, ... r15, etc so there would be a way to encode ...

Aha! Your use of "bank" led me to think that the registers were
bank-switched, so that only one bank was visible at a time. That would
mean that with n register banks, there would be n different pieces of
in-processor storage that could appear as r0. That causes a need for an
enquiry instruction, so that you can find out which bank is selected. You
see why I thought the Z80 alternate register set was relevant?

What you seem to really intend is that instructions that reference more
than one register are constrained in the set of registers they can
reference, such as r0...r7, or r16...r23. But you can't reference r5 and
r20 in the same instruction, unless it's a move between them. Is that
right?

If so, all my objections above are void, because I was misunderstanding
you. This scheme doesn't have any of those problems.

I'm not sure that it is worth its extra complexity, but that's a
different level of problem.

John

Re: Split register files

<69b5e50a-1dec-417b-9b59-03abd86f5070n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18115&group=comp.arch#18115

Newsgroups: comp.arch
Date: Sat, 26 Jun 2021 12:33:35 -0700 (PDT)
In-Reply-To: <sb7o4e$243$1@newsreader4.netcologne.de>
References: <sb7575$jgr$1@newsreader4.netcologne.de> <memo.20210626142449.12384Y@jgd.cix.co.uk>
<sb7o4e$243$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <69b5e50a-1dec-417b-9b59-03abd86f5070n@googlegroups.com>
Subject: Re: Split register files
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sat, 26 Jun 2021 19:33:36 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Sat, 26 Jun 2021 19:33 UTC

On Saturday, June 26, 2021 at 12:28:48 PM UTC-5, Thomas Koenig wrote:
> [...]
> Maybe my terminology was misleading.
>
> What I meant was that r0,r1...r7 would be in a group of registers,
> as would be r8, r9, ... r15, etc so there would be a way to encode
>
> add r0,r3,r7
>
> in an instruction but not
>
> add r0,r7,r13
>
> If that was desired, the instruction sequence would be
>
> mr r1, r13
> add r0,r7,r2
<
Unless you have a very tiny instruction to perform these inter-bank
MOVs, you are going to harm instruction density.
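The density point in numbers: a sketch assuming a fixed 4-byte instruction and one extra move for some fraction of instructions. The 20% fraction and the move sizes are made-up parameters, not measurements.

```python
def code_bytes(n_insns: int, cross_bank_fraction: float,
               insn_bytes: int = 4, move_bytes: int = 4) -> int:
    """Static code size when a fraction of instructions needs one extra
    inter-bank move in front of it."""
    moves = round(n_insns * cross_bank_fraction)
    return n_insns * insn_bytes + moves * move_bytes

baseline = code_bytes(1000, 0.0)                 # no cross-bank operands
full_size_move = code_bytes(1000, 0.2)           # 4-byte moves: 20% larger
tiny_move = code_bytes(1000, 0.2, move_bytes=2)  # 2-byte moves: 10% larger
print(baseline, full_size_move, tiny_move)       # 4000 4800 4400
```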

Re: Split register files

<2e3f8732-2225-42a7-a542-9d719c6ad8a9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18116&group=comp.arch#18116

Newsgroups: comp.arch
Date: Sat, 26 Jun 2021 12:35:16 -0700 (PDT)
In-Reply-To: <sb7p66$243$2@newsreader4.netcologne.de>
References: <sb6s70$dip$1@newsreader4.netcologne.de> <2021Jun26.152415@mips.complang.tuwien.ac.at> <sb7p66$243$2@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2e3f8732-2225-42a7-a542-9d719c6ad8a9n@googlegroups.com>
Subject: Re: Split register files
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sat, 26 Jun 2021 19:35:16 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 70
 by: MitchAlsup - Sat, 26 Jun 2021 19:35 UTC

On Saturday, June 26, 2021 at 12:46:48 PM UTC-5, Thomas Koenig wrote:
> Anton Ertl <an...@mips.complang.tuwien.ac.at> schrieb:
> > Thomas Koenig <tko...@netcologne.de> writes:
> >>So, how about this:
> >>[...]
> >
> > If you do it architecturally, it certainly makes the compiler's task
> > harder.
> >
> > I think the Multiflow architecture is organized like this (but I
> > think with more registers per slice), but my memory is getting hazy.
> Unfortunately, most Multiflow articles are behind paywalls (and
> the Wikipedia article is written like an advertisement, with
> little technical content).
<
Given the little success they had "way back when" and how little the
idea has been pursued recently, why bother looking down dead ends for
genuinely new ideas to make forward progress in computer architecture?
>
> Thanks for your pointers to literature, I will take
> a look at them.

Re: Split register files

<sb801m$80l$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18117&group=comp.arch#18117

From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 19:43:50 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sb801m$80l$1@newsreader4.netcologne.de>
References: <sb7o4e$243$1@newsreader4.netcologne.de>
<memo.20210626201121.12384b@jgd.cix.co.uk>
Injection-Date: Sat, 26 Jun 2021 19:43:50 -0000 (UTC)
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sat, 26 Jun 2021 19:43 UTC

John Dallman <jgd@cix.co.uk> schrieb:
> In article <sb7o4e$243$1@newsreader4.netcologne.de>,
> tkoenig@netcologne.de (Thomas Koenig) wrote:
>
>> > Interrupt service routines that need to make sure they have the
>> > same bank selected when they return. Applies to both hardware
>> > and software interrupts.
>>
>> Maybe my terminology was misleading.
>>
>> What I meant was that r0,r1...r7 would be in a group of registers,
>> as would be r8, r9, ... r15, etc so there would be a way to encode ...
>
> Aha! Your use of "bank" led me to think that the registers were
> bank-switched, so that only one bank was visible at a time. That would
> mean that with n register banks, there would be n different pieces of
> in-processor storage that could appear as r0. That causes a need for an
> enquiry instruction, so that you can find out which bank is selected. You
> see why I thought the Z80 alternate register set was relevant?
>
> What you seem to really intend is that instructions that reference more
> than one register are constrained in the set of registers they can
> reference, such as r0...r7, or r16...r23. But you can't reference r5 and
> r20 in the same instruction, unless it's a move between them. Is that
> right?

That is correct.

>
> If so, all my objections above are void, because I was misunderstanding
> you. This scheme doesn't have any of those problems.

> I'm not sure that it is worth its extra complexity, but that's a
> different level of problem.

Definitely, that's what I am trying to find out :-)

Re: Split register files

<sb8154$80l$2@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18119&group=comp.arch#18119

From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 20:02:44 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sb8154$80l$2@newsreader4.netcologne.de>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<2021Jun26.152415@mips.complang.tuwien.ac.at>
<sb7p66$243$2@newsreader4.netcologne.de>
<2e3f8732-2225-42a7-a542-9d719c6ad8a9n@googlegroups.com>
Injection-Date: Sat, 26 Jun 2021 20:02:44 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="8213"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sat, 26 Jun 2021 20:02 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:
> On Saturday, June 26, 2021 at 12:46:48 PM UTC-5, Thomas Koenig wrote:

>> Unfortunately, most Multiflow articles are behind paywalls (and
>> the Wikipedia article is written like an advertisement, with
>> little technical content).
><
> After the little success they had "way back when" and the scant pursuance
> of recent, Why bother looking down dead ends for genuinely new ideas to
> make forward computer architecture progress ?

It can be very instructive to study things from the past and learn
about what they got right and what they got wrong, and how old
ideas might work better under the changed technical circumstances,
possibly combined with new ones.

Re: Split register files

<sb89q8$ee0$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18121&group=comp.arch#18121

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sat, 26 Jun 2021 22:30:32 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sb89q8$ee0$1@newsreader4.netcologne.de>
References: <sb7575$jgr$1@newsreader4.netcologne.de>
<memo.20210626142449.12384Y@jgd.cix.co.uk>
<sb7o4e$243$1@newsreader4.netcologne.de>
<69b5e50a-1dec-417b-9b59-03abd86f5070n@googlegroups.com>
Injection-Date: Sat, 26 Jun 2021 22:30:32 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="14784"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sat, 26 Jun 2021 22:30 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:
> On Saturday, June 26, 2021 at 12:28:48 PM UTC-5, Thomas Koenig wrote:

>> mr r1, r13
>> add r0,r7,r2
><
> Unless you have a very tiny instruction to perform these inter-bank
> MOVs, you are going to harm instruction density.

Yes.

The bits needed for register addressing are somewhat reduced by this
scheme (for example, eight bits for a two-operand instruction with four
banks of eight registers each, with inter-bank register moves taking
10 register address bits).
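The register-field arithmetic can be checked with a few lines of Python. This assumes the banked encoding uses one bank selector per instruction plus a bank-local index per operand, while an inter-bank move names both registers in full. The helper names are illustrative only.

```python
def bits_for(n: int) -> int:
    """Bits needed to name one of n items (n >= 1)."""
    return (n - 1).bit_length()


def flat_bits(num_regs: int, operands: int) -> int:
    """Register-address bits with a single unified register file."""
    return operands * bits_for(num_regs)


def banked_bits(num_banks: int, bank_size: int, operands: int) -> int:
    """One bank selector shared by all operands, plus a bank-local index per operand."""
    return bits_for(num_banks) + operands * bits_for(bank_size)


# Four banks of eight registers (32 in total):
two_op = banked_bits(4, 8, 2)    # 2 + 2*3 = 8 bits
three_op = banked_bits(4, 8, 3)  # 2 + 3*3 = 11 bits
move = flat_bits(32, 2)          # inter-bank move names both registers: 2*5 = 10 bits
```

Against a flat 32-register file (10 bits for two operands, 15 for three), the banked scheme saves 2 bits on two-operand and 4 bits on three-operand instructions.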

It might be possible to design a two-operand ISA using only 16-bit
instructions and a total of 32 registers, but this could be
a stretch, or rather the opposite, a very tight fit.

Maybe an ISA with three instructions per 64-bit word would be better,
with each instruction having 21 bits.
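Three 21-bit instructions use 63 of the 64 bits, leaving one spare bit. The sketch below packs them into a word. It assumes, purely for illustration, that slot 0 occupies the least significant bits and that the top bit is left at zero.

```python
SLOT_BITS = 21
SLOTS_PER_WORD = 3
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack(slots: list[int]) -> int:
    """Pack three 21-bit instruction slots into one 64-bit word (bit 63 unused)."""
    if len(slots) != SLOTS_PER_WORD:
        raise ValueError("expected exactly three slots")
    word = 0
    for i, insn in enumerate(slots):
        if not 0 <= insn <= SLOT_MASK:
            raise ValueError(f"slot {i} does not fit in {SLOT_BITS} bits")
        word |= insn << (i * SLOT_BITS)
    return word


def unpack(word: int) -> list[int]:
    """Recover the three 21-bit slots from a 64-bit word."""
    return [(word >> (i * SLOT_BITS)) & SLOT_MASK for i in range(SLOTS_PER_WORD)]
```

Suppose a slot held a three-operand instruction with a 2-bit bank selector and three 3-bit register indices (11 bits). That would leave 10 bits for the opcode and any other fields.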

Hmm... not yet clear on how to proceed, or if it makes sense.

Re: Split register files

<sb912k$c4c$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18125&group=comp.arch#18125

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ggt...@yahoo.com (Brett)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sun, 27 Jun 2021 05:07:32 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 23
Message-ID: <sb912k$c4c$1@dont-email.me>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<sb6vfb$1ov$1@dont-email.me>
<sb70q1$fsg$2@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 27 Jun 2021 05:07:32 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="3c5bda4bb95bf088ff63c7487604f6ac";
logging-data="12428"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/+9i8y4nYFaTIztbwNdomq"
User-Agent: NewsTap/5.5 (iPad)
Cancel-Lock: sha1:8Qm8rBZqtdETCxTmjTjPGzWoPeI=
sha1:DMS3+HBdb8jM14Fvjb/RqRaYPu8=
 by: Brett - Sun, 27 Jun 2021 05:07 UTC

Thomas Koenig <tkoenig@netcologne.de> wrote:
> Ivan Godard <ivan@millcomputing.com> schrieb:
>> On 6/26/2021 2:32 AM, Thomas Koenig wrote:
>
>> See the Texas Instruments C64
>
> That's rather interesting, thanks!
>
> Looks rather similar to what I had in mind, except that they allow
> at least some cross-operation between the different register files,
> and they had 16 registers (later 32) per register file.
>

I like it; however, I would burn two instruction bits for the two banks,
so you can say bank 0, bank 1, both banks, sequential load to both, etc.

Generally you only need 16 registers; this gives another 16 registers for
loop unrolling without doubling the instructions the way the TI C64 must.

I can see why TI made that choice for the C64, and it was the correct choice
for them.
My concern is post-RISC, which is now a decade away.
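The sketch below shows one way to read the two-bit bank field, assuming two banks of 16 registers each. The enum values and function names are assumptions, as is the meaning of "sequential load to both" (consecutive memory words loaded into the same register index in bank 0 and bank 1); the post does not define any of them.

```python
from enum import IntEnum

NUM_BANKS = 2   # assumption: two banks ...
BANK_SIZE = 16  # ... of 16 registers each


class BankSel(IntEnum):
    """Hypothetical encoding of the 2-bit bank field."""
    BANK0 = 0b00
    BANK1 = 0b01
    BOTH = 0b10      # same operation in both banks (e.g. two unrolled iterations)
    SEQ_LOAD = 0b11  # loads only: consecutive words into bank 0, then bank 1


def target_banks(sel: BankSel) -> tuple[int, ...]:
    if sel == BankSel.BANK0:
        return (0,)
    if sel == BankSel.BANK1:
        return (1,)
    return (0, 1)


def exec_add(regs, sel, rd, ra, rb):
    """rd <- ra + rb in each selected bank; all indices are bank-local (0..15)."""
    if sel == BankSel.SEQ_LOAD:
        raise ValueError("SEQ_LOAD is only meaningful for loads")
    for b in target_banks(sel):
        regs[b][rd] = regs[b][ra] + regs[b][rb]


def exec_load(regs, mem, sel, rd, addr):
    """Load mem[addr] into rd; SEQ_LOAD fills bank 0 from addr and bank 1 from addr + 1."""
    if sel == BankSel.SEQ_LOAD:
        regs[0][rd] = mem[addr]
        regs[1][rd] = mem[addr + 1]
    else:
        for b in target_banks(sel):
            regs[b][rd] = mem[addr]
```

For example, two SEQ_LOADs followed by one BOTH add compute two loop iterations with only three instructions, which is the unrolling benefit described above.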

Re: Split register files

<sb99gi$1r5$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18128&group=comp.arch#18128

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sun, 27 Jun 2021 07:31:30 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sb99gi$1r5$1@newsreader4.netcologne.de>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<sb6vfb$1ov$1@dont-email.me> <sb70q1$fsg$2@newsreader4.netcologne.de>
<sb912k$c4c$1@dont-email.me>
Injection-Date: Sun, 27 Jun 2021 07:31:30 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="1893"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sun, 27 Jun 2021 07:31 UTC

Brett <ggtgp@yahoo.com> schrieb:
> Thomas Koenig <tkoenig@netcologne.de> wrote:
>> Ivan Godard <ivan@millcomputing.com> schrieb:
>>> On 6/26/2021 2:32 AM, Thomas Koenig wrote:
>>
>>> See the Texas Instruments C64
>>
>> That's rather interesting, thanks!
>>
>> Looks rather similar to what I had in mind, except that they allow
>> at least some cross-operation between the different register files,
>> and they had 16 registers (later 32) per register file.
>>
>
> I like it, however I would burn two instruction bits for the two banks.
> So you can say bank 0, bank 1, both banks, sequential load to both, etc.

What would you apply them to? Assuming a three-register instruction
(and assuming that r0-r15 are in the first bank and r16-r31 in the
second), would you then be able to write

add r2,r17,r22

(so the bank 1 ALU presumably does the calculation and sends over
the result to bank 0)?

Also, this would use one bit only if you have two banks. I was
envisioning more than two, then the advantage in encoding bits
starts to disappear.

Re: Split register files

<sb9vie$bim$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18130&group=comp.arch#18130

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: not_va...@comcast.net (James Van Buskirk)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sun, 27 Jun 2021 07:47:17 -0600
Organization: A noiseless patient Spider
Lines: 1
Message-ID: <sb9vie$bim$1@dont-email.me>
References: <sb7o4e$243$1@newsreader4.netcologne.de> <memo.20210626201121.12384b@jgd.cix.co.uk> <sb801m$80l$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain;
format=flowed;
charset="Windows-1252";
reply-type=original
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 27 Jun 2021 13:47:58 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="2d2968e4e1cd723d8c4ec63ceddaffb0";
logging-data="11862"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+btHTkTkFBPSdLQK9deJoMzWr+dIFAJ1g="
Cancel-Lock: sha1:eCLoMn+CWE1QO9m/sbLffMu7P34=
X-MimeOLE: Produced By Microsoft MimeOLE V16.4.3528.331
In-Reply-To: <sb801m$80l$1@newsreader4.netcologne.de>
X-Newsreader: Microsoft Windows Live Mail 16.4.3528.331
Importance: Normal
X-Priority: 3
X-MSMail-Priority: Normal
 by: James Van Buskirk - Sun, 27 Jun 2021 13:47 UTC

"Thomas Koenig" wrote in message
news:sb801m$80l$1@newsreader4.netcologne.de...

> John Dallman <jgd@cix.co.uk> schrieb:
> > In article <sb7o4e$243$1@newsreader4.netcologne.de>,
> > tkoenig@netcologne.de (Thomas Koenig) wrote:

> >> What I meant was that r0,r1...r7 would be in a group of registers,
> >> as would be r8, r9, ... r15, etc so there would be a way to encode ...

> > What you seem to really intend is that instructions that reference more
> > than one register are constrained in the set of registers they can
> > reference, such as r0...r7, or r15...r23. But you can't reference r5 and
> > r20 in the same instruction, unless it's a move between them. Is that
> > right?

> That is correct.

> > I'm not sure that it is worth its extra complexity, but that's a
> > different level of problem.

> Definitely, that's what I am trying to find out :-)

Do you support pivot operations, like a perhaps partial bit
reversal permutation of the register file?
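Here is one possible reading of a "partial bit reversal permutation of the register file", sketched under that assumption. Within each aligned group of 2**k registers, register j receives the contents of the register whose low k index bits are reversed. When k equals the full index width, this is the familiar FFT-style bit-reversal permutation; a smaller k gives a partial one. The function names are hypothetical.

```python
def bit_reverse(i: int, nbits: int) -> int:
    """Reverse the low `nbits` bits of i."""
    out = 0
    for _ in range(nbits):
        out = (out << 1) | (i & 1)
        i >>= 1
    return out


def pivot(regs: list[int], nbits: int) -> list[int]:
    """Return new register contents: within each aligned group of 2**nbits
    registers, entry j receives the old contents of entry bit_reverse(j)."""
    size = 1 << nbits
    if len(regs) % size:
        raise ValueError("register count must be a multiple of the group size")
    out = list(regs)
    for base in range(0, len(regs), size):
        for j in range(size):
            out[base + j] = regs[base + bit_reverse(j, nbits)]
    return out
```

Bit reversal is its own inverse, so applying the same pivot twice restores the original register contents.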

Re: Split register files

<sbadsa$r84$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18136&group=comp.arch#18136

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Sun, 27 Jun 2021 17:52:10 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbadsa$r84$1@newsreader4.netcologne.de>
References: <sb7o4e$243$1@newsreader4.netcologne.de>
<memo.20210626201121.12384b@jgd.cix.co.uk>
<sb801m$80l$1@newsreader4.netcologne.de> <sb9vie$bim$1@dont-email.me>
Injection-Date: Sun, 27 Jun 2021 17:52:10 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="27908"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sun, 27 Jun 2021 17:52 UTC

James Van Buskirk <not_valid@comcast.net> schrieb:
> "Thomas Koenig" wrote in message
> news:sb801m$80l$1@newsreader4.netcologne.de...
>
>> John Dallman <jgd@cix.co.uk> schrieb:
>> > In article <sb7o4e$243$1@newsreader4.netcologne.de>,
>> > tkoenig@netcologne.de (Thomas Koenig) wrote:
>
>> >> What I meant was that r0,r1...r7 would be in a group of registers,
>> >> as would be r8, r9, ... r15, etc so there would be a way to encode ...
>
>> > What you seem to really intend is that instructions that reference more
>> > than one register are constrained in the set of registers they can
>> > reference, such as r0...r7, or r15...r23. But you can't reference r5 and
>> > r20 in the same instruction, unless it's a move between them. Is that
>> > right?
>
>> That is correct.
>
>> > I'm not sure that it is worth its extra complexity, but that's a
>> > different level of problem.
>
>> Definitely, that's what I am trying to find out :-)
>
> Do you support pivot operations, like a perhaps partial bit
> reversal permutation of the register file?

I haven't given it any thought so far (but then again,
I am not sure what exactly this means).

Could you maybe elaborate?

Re: Split register files

<sbd1fq$b2h$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18162&group=comp.arch#18162

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Mon, 28 Jun 2021 12:36:38 -0500
Organization: A noiseless patient Spider
Lines: 140
Message-ID: <sbd1fq$b2h$1@dont-email.me>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<2a3f1b04-b986-46e3-b5fc-efe6218eaae7n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 28 Jun 2021 17:39:06 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="f171d8a10e739ea3d9f38825b6ca398d";
logging-data="11345"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/+rfpo6jVB6vgdwOXo6jvZ"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:yTD4H8ubKlTGgk+0GusOwO36T7k=
In-Reply-To: <2a3f1b04-b986-46e3-b5fc-efe6218eaae7n@googlegroups.com>
Content-Language: en-US
 by: BGB - Mon, 28 Jun 2021 17:36 UTC

On 6/26/2021 10:43 AM, MitchAlsup wrote:
> On Saturday, June 26, 2021 at 4:32:18 AM UTC-5, Thomas Koenig wrote:
>> Split register files have been proposed, for example in
>> http://www.owlnet.rice.edu/~elec525/projects/prf_report.pdf .
>> However, at least in this report the ISA was not changed.
>>
>> So, how about this:
> <><><>
>>
>> Comments? Would offer no advantages over conventional ooO
>> architecture? Has been tried before and didn't catch on
>> because...?
> <
> Just put me down as no. How registers are named and used is a compiler
> problem not a hardware problem.

Agreed.

> <
> If you need more read ports, replicate the register file and see that both
> copies get the same values written into them.
> <
> If you need more write ports, you are already SoL.
>

IME, what worked OK on Xilinx FPGAs:
Clone the register arrays for each read port;
Clone the register arrays again for each write port;
Use an array of tag bits to encode which array is up-to-date.

For 6R+3W, this effectively means 18 internal copies of the main
register array, plus a mess of internal 2-bit tag-bit registers.

Say, 2-bit tag:
00: Lane 1 holds current version;
01: Lane 2 holds current version;
10: Lane 3 holds current version;
11: Register is Invalid (Read as Zero).

Though, one could also argue for:
00: Register is Invalid (Read as Zero);
01: Lane 1 holds current version;
10: Lane 2 holds current version;
11: Lane 3 holds current version.

This was the 1st place option among the options tested.
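The 1st place scheme resembles what FPGA designers often call a live value table (LVT) register file. It can be modelled in software to show how the copies and tags interact. The model below assumes 64 registers, 6 read ports, 3 write lanes (numbered from 1), and the second tag encoding (0 = invalid, reads as zero). The class and method names are hypothetical.

```python
class LvtRegFile:
    """Register file built from per-(write lane, read port) copies plus a tag per register."""

    def __init__(self, nregs: int = 64, read_ports: int = 6, write_lanes: int = 3):
        self.read_ports = read_ports
        # copies[w][r] is written only by lane w+1 and read only by read port r,
        # so every physical copy needs just one write port and one read port.
        self.copies = [[[0] * nregs for _ in range(read_ports)]
                       for _ in range(write_lanes)]
        # tag[reg]: 0 = invalid (reads as zero); k = lane k holds the current value.
        self.tag = [0] * nregs

    def write(self, lane: int, reg: int, value: int) -> None:
        """Write through lane `lane` (1-based): update all of that lane's copies."""
        for copy in self.copies[lane - 1]:
            copy[reg] = value
        self.tag[reg] = lane

    def read(self, port: int, reg: int) -> int:
        """Read port `port` (0-based) picks the copy named by the register's tag."""
        lane = self.tag[reg]
        if lane == 0:
            return 0
        return self.copies[lane - 1][port][reg]
```

After a later write through a different lane, the stale value left in the old lane's copies is harmless, because the tag no longer points at them.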

As can be noted, making the individual register arrays smaller will not
reduce cost with this approach, and splitting up the register space into
smaller arrays would, if anything, make it more expensive.

The 2nd place being:
Every register is updated independently as its own state machine;
If ID is seen on Lane 1's Write port, update to Lane 1 result;
If ID is seen on Lane 2's Write port, update to Lane 2 result;
If ID is seen on Lane 3's Write port, update to Lane 3 result;
Else, keep either prior value / forwarded-in value (*1).

Each read port from the GPR file is basically a big "case block" to
select the desired register value (or, basically, a big multiplex stage).

*1: For GPRs, the forwarded-in value is simply the register's output (so
it keeps its value unless modified externally);
For SPRs, these may take a longer trip through the execute unit.
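Below is a behavioural sketch of the 2nd place option. It assumes the three write lanes are checked in the order listed, so Lane 1 wins if two lanes target the same register in one cycle; the post does not say which lane has priority. The function names are hypothetical.

```python
from typing import Optional

WritePort = tuple[Optional[int], int]  # (destination register ID or None, result value)


def next_value(reg_id: int, held: int, lanes: list[WritePort]) -> int:
    """Next state of one register: the first lane (Lane 1, 2, 3 order) that targets it wins."""
    for dest, result in lanes:
        if dest == reg_id:
            return result
    # No write this cycle: keep the prior value (for SPRs, a forwarded-in value).
    return held


def clock(regs: list[int], lanes: list[WritePort]) -> list[int]:
    """Advance every register's state machine by one cycle."""
    return [next_value(i, v, lanes) for i, v in enumerate(regs)]


def read_port(regs: list[int], sel: int) -> int:
    """A read port is just a wide multiplexer (the 'case block') over all registers."""
    return regs[sel]
```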

The 2nd place option is a fair bit more expensive on Xilinx FPGAs, but
was cheaper on Altera. On Altera, the original 1st place option became
absurdly expensive, whereas the 2nd place option remained roughly similar.

The cost of this approach is affected much more significantly by the
number of registers and ports, though.

Not yet tested on any Lattice devices, so not yet sure what to expect.

Another source of cost is the number of forwarding stages, where there
is a fairly steep increase in cost relative to the number of EX stages
and pipeline width (eg: 3-lane and EX1/2/3, each register read port may
potentially come from one of 9 possible places in the pipeline, one of 3
register arrays, or one of N possible SPRs).
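The mux-input count in that example is simple arithmetic. The sketch assumes every lane can forward from every EX stage, and that each read port also sees every register array and every SPR. The function names are hypothetical.

```python
def sources_per_read_port(lanes: int, ex_stages: int, reg_arrays: int, sprs: int) -> int:
    """Number of places a single register read port may have to select from."""
    forwarding = lanes * ex_stages  # results still in flight in EX1..EXn of each lane
    return forwarding + reg_arrays + sprs


def total_mux_inputs(read_ports: int, lanes: int, ex_stages: int,
                     reg_arrays: int, sprs: int) -> int:
    """Mux inputs summed over all read ports."""
    return read_ports * sources_per_read_port(lanes, ex_stages, reg_arrays, sprs)
```

With 3 lanes, EX1..EX3 and 3 register arrays, each read port chooses among 9 + 3 + N sources. The forwarding term is lanes times stages, and the read-port count typically grows with lane count as well. Widening the pipeline therefore raises the total faster than linearly.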

Though, at least in the latter case, most CRs can only be read via the
Rm port in Lane 1. Some SPRs may also give different results depending
on which port they are read from, ...

Another option was to do it naively:
Just do a single register array in the Verilog;
Have multiple writes to the same array (from each write port).

This naive option is very expensive on both FPGA families (somewhat
worse than either the 1st place or 2nd place options).

Control and special registers generally need to use a variant of the 2nd
place approach.

The register space currently uses a 7-bit scheme internally:
00..3F: R0..R63, Main Array (Excludes R0/R1/R15);
40..5F: Special Registers (R0/R1/R15 map into here);
60..7F: Control Registers (C0..C31).
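Here is a decoder-side sketch of that 7-bit map. The ranges come from the list above. The post does not say which SPR slots R0/R1/R15 are redirected to, so the sketch only flags them rather than inventing target IDs. The function names are hypothetical.

```python
REMAPPED_GPRS = {0, 1, 15}  # R0/R1/R15 live in the SPR range, not the main array


def classify(reg_id: int) -> str:
    """Classify a 7-bit internal register ID."""
    if not 0x00 <= reg_id <= 0x7F:
        raise ValueError(f"not a 7-bit register ID: {reg_id:#x}")
    if reg_id <= 0x3F:
        # R0..R63; the main array excludes the three remapped registers.
        return "gpr-remapped" if reg_id in REMAPPED_GPRS else "gpr"
    if reg_id <= 0x5F:
        return "spr"
    return "cr"


def cr_number(reg_id: int) -> int:
    """Control register index: 0x60..0x7F map to C0..C31."""
    if classify(reg_id) != "cr":
        raise ValueError(f"{reg_id:#x} is not a control register")
    return reg_id - 0x60
```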

When I moved to the current scheme, I also effectively folded the CRs
back into the same register space as the GPRs, and partially eliminated
their dedicated register ports (these still partly remain for vestigial
reasons; they are mostly still used to initiate branches and similar).

The main reason R0/R1/R15 are remapped to SPRs is that these registers
still need to be accessible and updatable via a side-channel for a few
special cases (eg: interrupt handling). Remapping these in the decoder
was also generally cheaper than trying to deal with these cases in the
register fetch and update logic.

For CRs, things are more varied. Most are implemented with a similar
approach to the SPRs (with associated side-channels).

Registers like PC are almost purely synthetic:
Trying to fetch its value effectively synthesizes it via the pipeline;
Storing to it triggers the branch-initiation logic.

If anything, it would make more sense to fold PC and the PreBranch
mechanism into the L1 I$, it basically being a unit which serves to
fetch instruction bundles and throw them into the pipeline unless acted
upon by a Hold signal or Branch Request. I haven't quite gone this far
yet though.

If I went this route, some additional special cases would be needed to
try to deal with superscalar though. A potential superscalar core would
be less of an issue with the current design which manages PC update
externally to the L1.

....
