
Re: Split register files

<sbd3sf$ns4$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18164&group=comp.arch#18164

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Mon, 28 Jun 2021 11:19:58 -0700
Organization: A noiseless patient Spider
Lines: 47
Message-ID: <sbd3sf$ns4$1@dont-email.me>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<2a3f1b04-b986-46e3-b5fc-efe6218eaae7n@googlegroups.com>
<sbd1fq$b2h$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 28 Jun 2021 18:19:59 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="cf88ae1f0bdd23fad1b092f01507761e";
logging-data="24452"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+6VpC986TD0hbiNqJaWsvTigctN9kzoRg="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:8Eiry2RtF3G9N/eQLMHmrdyoyiE=
In-Reply-To: <sbd1fq$b2h$1@dont-email.me>
Content-Language: en-US

by: Stephen Fuld - Mon, 28 Jun 2021 18:19 UTC

On 6/28/2021 10:36 AM, BGB wrote:
> On 6/26/2021 10:43 AM, MitchAlsup wrote:
>> On Saturday, June 26, 2021 at 4:32:18 AM UTC-5, Thomas Koenig wrote:
>>> Split register files have been proposed, for example in
>>> http://www.owlnet.rice.edu/~elec525/projects/prf_report.pdf .
>>> However, at least in this report the ISA was not changed.
>>>
>>> So, how about this:
>> <><><>
>>>
>>> Comments? Would offer no advantages over conventional ooO
>>> architecture? Has been tried before and didn't catch on
>>> because...?
>> <
>> Just put me down as no. How registers are named and used is a compiler
>> problem not a hardware problem.
>
> Agreed.
>
>
>> <
>> If you need more read ports, replicate the register file and see that both
>> copies get the same values written into them.
>> <
>> If you need more write ports, you are already SoL.
>>
>
> IME, what worked OK on Xilinx FPGAs:
> Clone the register arrays for each read port;
> Clone the register arrays again for each write port;
> Use an array of tag bits to encode which array is up-to-date.
>
> For 6R+3W, this effectively means 18 internal copies of the main
> register array, plus a mess of internal 2-bit tag-bit registers.

I don't understand this. Let's say you write to one register. Say it
happens to be lane 1. Subsequent reads must come from that copy, and
not any other (as they are not current), so the extra read ports and
register copies are unused. You have the same contention issues, lots
more hardware and no benefit.

What am I missing?

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Split register files

<sbdetd$q4e$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18165&group=comp.arch#18165

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Mon, 28 Jun 2021 16:25:46 -0500
Organization: A noiseless patient Spider
Lines: 99
Message-ID: <sbdetd$q4e$1@dont-email.me>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<2a3f1b04-b986-46e3-b5fc-efe6218eaae7n@googlegroups.com>
<sbd1fq$b2h$1@dont-email.me> <sbd3sf$ns4$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 28 Jun 2021 21:28:13 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="f171d8a10e739ea3d9f38825b6ca398d";
logging-data="26766"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18V1/pN5cvwFstz2iJawIvj"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:WvclT6+/eVKaaIqKVnJkDux9ix4=
In-Reply-To: <sbd3sf$ns4$1@dont-email.me>
Content-Language: en-US

by: BGB - Mon, 28 Jun 2021 21:25 UTC

On 6/28/2021 1:19 PM, Stephen Fuld wrote:
> On 6/28/2021 10:36 AM, BGB wrote:
>> On 6/26/2021 10:43 AM, MitchAlsup wrote:
>>> On Saturday, June 26, 2021 at 4:32:18 AM UTC-5, Thomas Koenig wrote:
>>>> Split register files have been proposed, for example in
>>>> http://www.owlnet.rice.edu/~elec525/projects/prf_report.pdf .
>>>> However, at least in this report the ISA was not changed.
>>>>
>>>> So, how about this:
>>> <><><>
>>>>
>>>> Comments? Would offer no advantages over conventional ooO
>>>> architecture? Has been tried before and didn't catch on
>>>> because...?
>>> <
>>> Just put me down as no. How registers are named and used is a compiler
>>> problem not a hardware problem.
>>
>> Agreed.
>>
>>
>>> <
>>> If you need more read ports, replicate the register file and see that both
>>> copies get the same values written into them.
>>> <
>>> If you need more write ports, you are already SoL.
>>>
>>
>> IME, what worked OK on Xilinx FPGAs:
>>    Clone the register arrays for each read port;
>>    Clone the register arrays again for each write port;
>>    Use an array of tag bits to encode which array is up-to-date.
>>
>> For 6R+3W, this effectively means 18 internal copies of the main
>> register array, plus a mess of internal 2-bit tag-bit registers.
>
> I don't understand this. Let's say you write to one register. Say it
> happens to be lane 1. Subsequent reads must come from that copy, and
> not any other (as they are not current), so the extra read ports and
> register copies are unused. You have the same contention issues, lots
> more hardware and no benefit.
>
> What am I missing?
>

Both LUTRAM (Distributed RAM) and Block-RAM follow a pattern where you
can generally write to it in one location and read from it in another.

LUTRAM is slightly more flexible in the sense that it can be read via
combinatorial logic, whereas Block-RAM can't (can only be read via a
clock-edge), ...

But, internally LUTRAM is fairly specific about having a 1:1 mapping
between input and output locations (it effectively only has two
unidirectional ports; so one port can only write to the array, and the
other can only read from it).

One can do multiple reads from an array in Verilog (so, write from one
location, read from multiple locations), but typically it ends up being
implicitly duplicated between each location it is accessed from when
this is done.

But, as soon as one tries to write to the same array from multiple
locations, "crap hits the fan" in terms of resource costs. Not sure what
happens exactly, but the LUT cost basically goes insane when this happens.

Hence the use of the 2-bit tag array, which at least partly contains
this issue. While the tag array itself still suffers from this issue,
given it is only 2 bits wide (so, ~ 128 bits for 64 GPRs), the relative
cost is a lot less than if it were 64 bits wide.

In order to follow the 1:1 pattern, one effectively ends up with the arrays
being duplicated both based on the write ports, as well as the read
ports, and so 3*6=18.

And, every time one stores a value via a write port, it ends up being
duplicated to 6 different arrays in parallel (one accessible via each of
the read ports), and reading from a read port reads from one of the 3
arrays it has access to, one from each write port, based on the tag bits
for the associated register.
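
A minimal behavioral sketch of the scheme described above, written in Python rather than Verilog. It is not the actual BJX2 register file: the class and method names are made up for illustration, and it ignores timing entirely. It only shows why a read port still sees current data: each read port owns one copy per write port, and the per-register tag selects which of those copies to use.

```python
class TaggedRegFile:
    """N-read, M-write register file built from arrays that each have
    exactly one writer and one reader (hypothetical model)."""

    def __init__(self, num_regs=64, read_ports=6, write_ports=3):
        self.read_ports = read_ports
        self.write_ports = write_ports
        # banks[w][r] is written only by write port w and read only by
        # read port r, so the defaults give 3 * 6 = 18 arrays.
        self.banks = [[[0] * num_regs for _ in range(read_ports)]
                      for _ in range(write_ports)]
        # tag[reg] names the write port that last wrote reg
        # (2 bits are enough to name 3 write ports).
        self.tag = [0] * num_regs

    def write(self, port, reg, value):
        # A store is duplicated into the copy behind every read port.
        for r in range(self.read_ports):
            self.banks[port][r][reg] = value
        self.tag[reg] = port

    def read(self, port, reg):
        # Pick among this read port's copies using the register's tag.
        return self.banks[self.tag[reg]][port][reg]

    def bank_count(self):
        return self.read_ports * self.write_ports


rf = TaggedRegFile()
rf.write(0, 5, 111)   # write port 0 stores R5
rf.write(2, 5, 222)   # later, write port 2 overwrites R5
print(rf.bank_count(), [rf.read(p, 5) for p in range(6)])
```

In this model the copies written by port 0 still hold the old value of R5, but no read port ever selects them, because the tag now points at port 2's copies.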

As noted, something about this pattern breaks on Altera FPGAs though,
implying there may be some significant functional difference in how they
handle LUTRAM.

In the fallback case, the 2nd place option basically replaces the LUTRAM
arrays with a bunch of state-machine registers (Flip-Flops) and a big
"case()" block to select values from the various registers for the
various register ports. This approach seems to have a lower per-port
cost, but is a lot more sensitive to the number and size of the registers.

But, yeah, if there is a cheaper way to do all this, it might be useful
to know about...

Re: Split register files

<0fdc7366-3bd7-4a69-8e22-06821ae3c5den@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18166&group=comp.arch#18166

X-Received: by 2002:a05:620a:1116:: with SMTP id o22mr12784653qkk.299.1624918586244;
Mon, 28 Jun 2021 15:16:26 -0700 (PDT)
X-Received: by 2002:aca:ed57:: with SMTP id l84mr15297413oih.119.1624918585990;
Mon, 28 Jun 2021 15:16:25 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 28 Jun 2021 15:16:25 -0700 (PDT)
In-Reply-To: <2021Jun26.152415@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:f8e3:d700:8583:19e7:44c5:f7e6;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:f8e3:d700:8583:19e7:44c5:f7e6
References: <sb6s70$dip$1@newsreader4.netcologne.de> <2021Jun26.152415@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <0fdc7366-3bd7-4a69-8e22-06821ae3c5den@googlegroups.com>
Subject: Re: Split register files
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Mon, 28 Jun 2021 22:16:26 +0000
Content-Type: text/plain; charset="UTF-8"

by: Quadibloc - Mon, 28 Jun 2021 22:16 UTC

On Saturday, June 26, 2021 at 8:03:56 AM UTC-6, Anton Ertl wrote:

> url = {http://www.ece.wisc.edu/~hskim/papers/kimh_ildp.pdf},

> url = {http://www.ece.wisc.edu/~wddd/2007/papers/wddd_01.pdf},
> url2 = {http://www-sal.cs.uiuc.edu/~zilles/papers/lanes.wddd-2007.pdf},

Sad to say, your links are broken.

John Savard

Re: Split register files

<2181bd83-d093-46e7-811f-51814f24a858n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18167&group=comp.arch#18167

X-Received: by 2002:a05:622a:147:: with SMTP id v7mr24305898qtw.246.1624919615981;
Mon, 28 Jun 2021 15:33:35 -0700 (PDT)
X-Received: by 2002:a05:6808:b0f:: with SMTP id s15mr5089831oij.30.1624919615732;
Mon, 28 Jun 2021 15:33:35 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!usenet.pasdenom.info!usenet-fr.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 28 Jun 2021 15:33:35 -0700 (PDT)
In-Reply-To: <sbdetd$q4e$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4de5:91b7:1e4c:8ba8;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4de5:91b7:1e4c:8ba8
References: <sb6s70$dip$1@newsreader4.netcologne.de> <2a3f1b04-b986-46e3-b5fc-efe6218eaae7n@googlegroups.com>
<sbd1fq$b2h$1@dont-email.me> <sbd3sf$ns4$1@dont-email.me> <sbdetd$q4e$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2181bd83-d093-46e7-811f-51814f24a858n@googlegroups.com>
Subject: Re: Split register files
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 28 Jun 2021 22:33:35 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Mon, 28 Jun 2021 22:33 UTC

On Monday, June 28, 2021 at 4:28:15 PM UTC-5, BGB wrote:
> On 6/28/2021 1:19 PM, Stephen Fuld wrote:
> > On 6/28/2021 10:36 AM, BGB wrote:
> >> On 6/26/2021 10:43 AM, MitchAlsup wrote:
> >>> On Saturday, June 26, 2021 at 4:32:18 AM UTC-5, Thomas Koenig wrote:
> >>>> Split register files have been proposed, for example in
> >>>> http://www.owlnet.rice.edu/~elec525/projects/prf_report.pdf .
> >>>> However, at least in this report the ISA was not changed.
> >>>>
> >>>> So, how about this:
> >>> <><><>
> >>>>
> >>>> Comments? Would offer no advantages over conventional ooO
> >>>> architecture? Has been tried before and didn't catch on
> >>>> because...?
> >>> <
> >>> Just put me down as no. How registers are named and used is a compiler
> >>> problem not a hardware problem.
> >>
> >> Agreed.
> >>
> >>
> >>> <
> >>> If you need more read ports, replicate the register file and see that both
> >>> copies get the same values written into them.
> >>> <
> >>> If you need more write ports, you are already SoL.
> >>>
> >>
> >> IME, what worked OK on Xilinx FPGAs:
> >> Clone the register arrays for each read port;
> >> Clone the register arrays again for each write port;
> >> Use an array of tag bits to encode which array is up-to-date.
> >>
> >> For 6R+3W, this effectively means 18 internal copies of the main
> >> register array, plus a mess of internal 2-bit tag-bit registers.
> >
> > I don't understand this. Let's say you write to one register. Say it
> > happens to be lane 1. Subsequent reads must come from that copy, and
> > not any other (as they are not current), so the extra read ports and
> > register copies are unused. You have the same contention issues, lots
> > more hardware and no benefit.
> >
> > What am I missing?
> >
> Both LUTRAM (Distributed RAM) and Block-RAM follow a pattern where you
> can generally write to it in one location and read from it in another.
<
In a single clock cycle.
>
> LUTRAM is slightly more flexible in the sense that it can be read via
> combinatorial logic, whereas Block-RAM can't (can only be read via a
> clock-edge), ...
<
By capturing the address at one clock edge and presenting the read data
by the next clock edge.
>
> But, internally LUTRAM is fairly specific about having a 1:1 mapping
> between input and output locations (it effectively only has two
> unidirectional ports; so one port can only write to the array, and the
> other can only read from it).
>
> One can do multiple reads from an array in Verilog (so, write from one
> location, read from multiple locations), but typically it ends up being
> implicitly duplicated between each location it is accessed from when
> this is done.
<
Reads can be done with a decoder selecting multiplexer outputs.
Writes are more complicated in that one needs to multiplex the write
from several write busses and coordinate with a select line. If you try
to write the same register from two ports at the same time, the logic
can actually kill itself--two current sources fighting over a single voltage.
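
A rough behavioral sketch of that point, in Python and under stated assumptions: the register file is a flat array of flip-flops, each read is a decoder-driven mux, and each clock applies the enabled write ports. The class name and the choice to raise an error on a same-register collision are illustrative only; in real hardware the collision is an electrical or arbitration problem, not an exception.

```python
class FlopRegFile:
    """Flip-flop register file with several write ports (hypothetical model)."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs

    def read(self, reg):
        # Read port: a multiplexer over all registers, selected by reg.
        return self.regs[reg]

    def clock(self, writes):
        """writes holds one (enable, reg, value) tuple per write port."""
        targets = [reg for enable, reg, _ in writes if enable]
        if len(targets) != len(set(targets)):
            # Two buses would drive the same register in the same cycle.
            raise ValueError("two write ports target the same register")
        for enable, reg, value in writes:
            if enable:
                self.regs[reg] = value


rf = FlopRegFile()
rf.clock([(True, 1, 10), (True, 2, 20), (False, 1, 99)])
print(rf.read(1), rf.read(2))
```

The conflict check runs before any register is updated, so a rejected cycle leaves the state untouched.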
>
>
> But, as soon as one tries to write to the same array from multiple
> locations, "crap hits the fan" in terms of resource costs. Not sure what
> happens exactly, but the LUT cost basically goes insane when this happens.
<
Whereas read ports are vectors of multiplexers spread over an array of
latches/flip-flops, write ports have a multiplexer from the several writing
buses to each register and "a bit more logic". Write ports are about
quadratically more gates than read ports.
>
> Hence the use of the 2-bit tag array, which at least partly contains
> this issue. While the tag array itself still suffers from this issue,
> given it is only 2 bits wide (so, ~ 128 bits for 64 GPRs), the relative
> cost is a lot less than if it were 64 bits wide.
>
>
> In order to follow the 1:1 pattern, one effectively ends up with the arrays
> being duplicated both based on the write ports, as well as the read
> ports, and so 3*6=18.
>
> And, every time one stores a value via a write port, it ends up being
> duplicated to 6 different arrays in parallel (one accessible via each of
> the read ports), and reading from a read port reads from one of the 3
> arrays it has access to, one from each write port, based on the tag bits
> for the associated register.
>
>
> As noted, something about this pattern breaks on Altera FPGAs though,
> implying there may be some significant functional difference in how they
> handle LUTRAM.
>
>
> In the fallback case, the 2nd place option basically replaces the LUTRAM
> arrays with a bunch of state-machine registers (Flip-Flops) and a big
> "case()" block to select values from the various registers for the
> various register ports. This approach seems to have a lower per-port
> cost, but is a lot more sensitive to the number and size of the registers.
>
>
> But, yeah, if there is a cheaper way to do all this, it might be useful
> to know about...
<
There is at the transistor level; there is not at the gate level.

Re: Split register files

<4bbbf7f4-04ed-46ef-bc27-8db2464c361bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18168&group=comp.arch#18168

X-Received: by 2002:a37:71c1:: with SMTP id m184mr17934506qkc.367.1624920648453;
Mon, 28 Jun 2021 15:50:48 -0700 (PDT)
X-Received: by 2002:a9d:ecf:: with SMTP id 73mr1735542otj.5.1624920648172;
Mon, 28 Jun 2021 15:50:48 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!usenet.pasdenom.info!usenet-fr.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 28 Jun 2021 15:50:47 -0700 (PDT)
In-Reply-To: <2021Jun26.152415@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:f8e3:d700:8583:19e7:44c5:f7e6;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:f8e3:d700:8583:19e7:44c5:f7e6
References: <sb6s70$dip$1@newsreader4.netcologne.de> <2021Jun26.152415@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <4bbbf7f4-04ed-46ef-bc27-8db2464c361bn@googlegroups.com>
Subject: Re: Split register files
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Mon, 28 Jun 2021 22:50:48 +0000
Content-Type: text/plain; charset="UTF-8"

by: Quadibloc - Mon, 28 Jun 2021 22:50 UTC

On Saturday, June 26, 2021 at 8:03:56 AM UTC-6, Anton Ertl wrote:

> ILDP has been evaluated by other researchers [salverda&zilles07], and
> found to have more disadvantages than the original paper promised.

Despite the link being broken, Googling the original paper turned it up.

The proposed architecture seems to work like this:

There are eight accumulators; each one is associated with its own
pipeline. There's a bank of 64 registers, shared between the eight
pipelines.

This way, each pipeline can be in-order, but the computer is kept
busy since instructions belonging to different strands are interleaved
in the program.

John Savard

Re: Split register files

<21110157-3985-42ba-a746-8aeb7b2c2554n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18169&group=comp.arch#18169

X-Received: by 2002:a05:620a:1116:: with SMTP id o22mr13153485qkk.299.1624924548392;
Mon, 28 Jun 2021 16:55:48 -0700 (PDT)
X-Received: by 2002:a05:6808:d47:: with SMTP id w7mr3625431oik.157.1624924548119;
Mon, 28 Jun 2021 16:55:48 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!usenet.pasdenom.info!usenet-fr.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 28 Jun 2021 16:55:47 -0700 (PDT)
In-Reply-To: <4bbbf7f4-04ed-46ef-bc27-8db2464c361bn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4de5:91b7:1e4c:8ba8;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4de5:91b7:1e4c:8ba8
References: <sb6s70$dip$1@newsreader4.netcologne.de> <2021Jun26.152415@mips.complang.tuwien.ac.at>
<4bbbf7f4-04ed-46ef-bc27-8db2464c361bn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <21110157-3985-42ba-a746-8aeb7b2c2554n@googlegroups.com>
Subject: Re: Split register files
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 28 Jun 2021 23:55:48 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Mon, 28 Jun 2021 23:55 UTC

On Monday, June 28, 2021 at 5:50:49 PM UTC-5, Quadibloc wrote:
> On Saturday, June 26, 2021 at 8:03:56 AM UTC-6, Anton Ertl wrote:
> > ILDP has been evaluated by other researchers [salverda&zilles07], and
> > found to have more disadvantages than the original paper promised.
> Despite the link being broken, Googling the original paper turned it up.
>
> The proposed architecture seems to work like this:
>
> There are eight accumulators; each one is associated with its own
> pipeline. There's a bank of 64 registers, shared between the eight
> pipelines.
>
> This way, each pipeline can be in-order, but the computer is kept
> busy since instructions belonging to different strands are interleaved
> in the program.
>
> John Savard
<
Make 8 into 4 and 64 registers into 32 and it describes the MC88100
pipeline.

Re: Split register files

<sbdph3$n9v$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18170&group=comp.arch#18170

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Mon, 28 Jun 2021 19:26:56 -0500
Organization: A noiseless patient Spider
Lines: 203
Message-ID: <sbdph3$n9v$1@dont-email.me>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<2a3f1b04-b986-46e3-b5fc-efe6218eaae7n@googlegroups.com>
<sbd1fq$b2h$1@dont-email.me> <sbd3sf$ns4$1@dont-email.me>
<sbdetd$q4e$1@dont-email.me>
<2181bd83-d093-46e7-811f-51814f24a858n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 29 Jun 2021 00:29:23 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="482124cd17f6036927ec04040c37f73d";
logging-data="23871"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+BHiHu37FrrS8Iv1O51dg/"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:l9qlcuZ1wzdiK1k6T+yQnMpSCgg=
In-Reply-To: <2181bd83-d093-46e7-811f-51814f24a858n@googlegroups.com>
Content-Language: en-US

by: BGB - Tue, 29 Jun 2021 00:26 UTC

On 6/28/2021 5:33 PM, MitchAlsup wrote:
> On Monday, June 28, 2021 at 4:28:15 PM UTC-5, BGB wrote:
>> On 6/28/2021 1:19 PM, Stephen Fuld wrote:
>>> On 6/28/2021 10:36 AM, BGB wrote:
>>>> On 6/26/2021 10:43 AM, MitchAlsup wrote:
>>>>> On Saturday, June 26, 2021 at 4:32:18 AM UTC-5, Thomas Koenig wrote:
>>>>>> Split register files have been proposed, for example in
>>>>>> http://www.owlnet.rice.edu/~elec525/projects/prf_report.pdf .
>>>>>> However, at least in this report the ISA was not changed.
>>>>>>
>>>>>> So, how about this:
>>>>> <><><>
>>>>>>
>>>>>> Comments? Would offer no advantages over conventional ooO
>>>>>> architecture? Has been tried before and didn't catch on
>>>>>> because...?
>>>>> <
>>>>> Just put me down as no. How registers are named and used is a compiler
>>>>> problem not a hardware problem.
>>>>
>>>> Agreed.
>>>>
>>>>
>>>>> <
>>>>> If you need more read ports, replicate the register file and see that both
>>>>> copies get the same values written into them.
>>>>> <
>>>>> If you need more write ports, you are already SoL.
>>>>>
>>>>
>>>> IME, what worked OK on Xilinx FPGAs:
>>>> Clone the register arrays for each read port;
>>>> Clone the register arrays again for each write port;
>>>> Use an array of tag bits to encode which array is up-to-date.
>>>>
>>>> For 6R+3W, this effectively means 18 internal copies of the main
>>>> register array, plus a mess of internal 2-bit tag-bit registers.
>>>
>>> I don't understand this. Let's say you write to one register. Say it
>>> happens to be lane 1. Subsequent reads must come from that copy, and
>>> not any other (as they are not current), so the extra read ports and
>>> register copies are unused. You have the same contention issues, lots
>>> more hardware and no benefit.
>>>
>>> What am I missing?
>>>
>> Both LUTRAM (Distributed RAM) and Block-RAM follow a pattern where you
>> can generally write to it in one location and read from it in another.
> <
> In a single clock cycle.

Yeah.

One can possibly do more stuff if they treat the read and write cases as
mutually exclusive.

One can generally read from and write to BRAM every cycle, though if
reading from the same location that was just written to, the data tends
to be stale (forwarding logic is generally needed to work around this case).
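
To make the stale-read case concrete, here is a small behavioral sketch in Python. It assumes a one-write, one-read RAM whose read samples the array before the same-cycle write lands; real block RAMs may behave differently depending on how they are configured. `SyncRam` and `clock_with_forwarding` are hypothetical names, not anything from a vendor library.

```python
class SyncRam:
    """One-write, one-read RAM; a same-cycle read of the written address
    returns the old contents (an assumption of this model)."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def clock(self, raddr, we=False, waddr=0, wdata=0):
        rdata = self.mem[raddr]   # sampled before the write is applied
        if we:
            self.mem[waddr] = wdata
        return rdata


def clock_with_forwarding(ram, raddr, we=False, waddr=0, wdata=0):
    """Same cycle as SyncRam.clock, but bypass the write data to the
    read output when both ports hit the same address."""
    rdata = ram.clock(raddr, we, waddr, wdata)
    if we and waddr == raddr:
        return wdata
    return rdata


ram = SyncRam(64)
ram.clock(0, we=True, waddr=3, wdata=10)
stale = ram.clock(3, we=True, waddr=3, wdata=20)                   # old value
fresh = clock_with_forwarding(ram, 3, we=True, waddr=3, wdata=30)  # bypassed
print(stale, fresh)
```

The forwarding wrapper is just an address comparator plus a mux on the read data, which is roughly what the workaround costs per read port.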

>>
>> LUTRAM is slightly more flexible in the sense that it can be read via
>> combinatorial logic, whereas Block-RAM can't (can only be read via a
>> clock-edge), ...
> <
> By capturing the address at one clock edge and presenting the read data
> by the next clock edge.

Yeah.

Looking into it, it seems that Xilinx and Altera differ here slightly:
Xilinx allows the LUTRAM to use a combinatorial access pattern.

Altera LUTRAM, for most sizes, seems instead to use a pattern of rising
and falling clock edges for input and output.

So, Xilinx LUTRAMs are:
32x3 bit;
64x2 bit;
128x1 bit.

With write-on-rising-edge / read-whenever behavior.

And, Altera is (Cyclone V):
32x20 bit (MLAB-640)

With write on falling edge, read on rising edge.

Not sure if an MLAB can be split up and used as multiple (independent)
RAMs, or if it always (only) uses the entire 20 bits.

And, apparently, if it doesn't fit into the above pattern, it is faked
using LUTs and Flip-Flops.

This sort of implies that, at best, each copy of the GPR array would need
8x MLABs, or 144 MLABs per core (where 1 MLAB ~= 10 adjacent ALMs ~= 20
LUT6's; with up to 1/4 of the device usable as MLABs).

So, best possible case, the GPR file in the BJX2 core should use ~ 14%
of the LUTRAM budget in a DE10 or similar (still unconfirmed if Quartus
will accept 64-element arrays assuming edge-timing behavior is followed,
...).
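
The MLAB count above can be reproduced with a few lines of arithmetic. This sketch assumes 64 GPRs of 64 bits each, a 32x20 MLAB that cannot be split between arrays (which is still an open question here), and the 6R+3W layout with 18 copies; the function name is made up for illustration.

```python
import math


def mlabs_per_copy(regs=64, reg_bits=64, mlab_depth=32, mlab_width=20):
    """MLABs needed for one copy of the register array, assuming whole
    MLABs are tiled in both depth and width."""
    deep = math.ceil(regs / mlab_depth)      # 64 entries / 32 per MLAB
    wide = math.ceil(reg_bits / mlab_width)  # 64 bits / 20 per MLAB
    return deep * wide


copies = 6 * 3                 # one array per (read port, write port) pair
per_copy = mlabs_per_copy()
total = copies * per_copy
print(per_copy, total)         # prints "8 144"
```

Note that 4 MLABs side by side give 80 bits of width, so 16 bits per row go unused in each copy under these assumptions.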

...

So, I guess a difference here is that the Xilinx LUTRAM is a little more
flexible and fine-grained, whereas the Altera LUTRAM is less flexible
and coarser grained.

Either way, can note that my initial tests trying to synthesize stuff
for a DE10 didn't particularly make me want to jump out and buy one
(more so when it is also "kinda expensive").

>>
>> But, internally LUTRAM is fairly specific about having a 1:1 mapping
>> between input and output locations (it effectively only has two
>> unidirectional ports; so one port can only write to the array, and the
>> other can only read from it).
>>
>> One can do multiple reads from an array in Verilog (so, write from one
>> location, read from multiple locations), but typically it ends up being
>> implicitly duplicated between each location it is accessed from when
>> this is done.
> <
> Reads can be done with a decoder selecting multiplexer outputs.
> Writes are more complicated in that one needs to multiplex the write
> from several write busses and coordinate with a select line. If you try
> to write the same register from two ports at the same time, the logic
> can actually kill itself--two current sources fighting over a single voltage.

OK.

>>
>>
>> But, as soon as one tries to write to the same array from multiple
>> locations, "crap hits the fan" in terms of resource costs. Not sure what
>> happens exactly, but the LUT cost basically goes insane when this happens.
> <
> Whereas read ports are vectors of multiplexers spread over an array of
> latches/flip-flops, Write ports have a multiplexer from the several writing
> buses to each register and "a bit more logic". Write ports are about
> quadratically more gates than read ports.

Or something like this.

In any case, the verdict on the naive strategy of writing to the same array
from multiple locations at the same time is "don't do it, it is
bad"...

Would be nice if the Verilog compiler were like "Well, I can transform
it into several LUTRAMs and some tag bits", but instead the pattern
tends to be more like "Well, it doesn't fit one of the known patterns;
fake it using an epic crap ton of LUTs and FFs instead...".
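The "several LUTRAMs and some tag bits" transform can be sketched as a behavioral model in Python. This is not the actual BJX2 Verilog, just an illustration, assuming 3 write ports and 6 read ports as in the quoted description below. The class and method names are hypothetical, and the model ignores the case of two ports writing the same register in the same cycle.

```python
class TaggedRegFile:
    """Multi-write-port register file built from 1-write/1-read arrays plus tags."""

    def __init__(self, num_regs=64, write_ports=3, read_ports=6):
        self.write_ports = write_ports
        self.read_ports = read_ports
        # banks[w][r] is the array written only by port w and read only by port r,
        # so every array keeps the 1:1 write/read pattern (3 * 6 = 18 arrays).
        self.banks = [[[0] * num_regs for _ in range(read_ports)]
                      for _ in range(write_ports)]
        # tag[i] records which write port last wrote register i (2 bits for 3 ports).
        self.tag = [0] * num_regs

    def write(self, wport, reg, value):
        # One store is duplicated into every read port's copy for this write port.
        for r in range(self.read_ports):
            self.banks[wport][r][reg] = value
        self.tag[reg] = wport

    def read(self, rport, reg):
        # The tag selects which write port's array holds the live value.
        return self.banks[self.tag[reg]][rport][reg]

rf = TaggedRegFile()
rf.write(0, 5, 111)
rf.write(2, 5, 222)   # a later write from another port supersedes the first
rf.write(1, 7, 333)
print(rf.read(4, 5), rf.read(0, 7))  # 222 333
```

Only the small tag array is written from several ports here; the wide data arrays each see exactly one writer and one reader.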

>>
>> Hence the use of the 2-bit tag array, which at least partly contains
>> this issue. While the tag array itself still suffers from this issue,
>> given it is only 2 bits wide (so, ~ 128 bits for 64 GPRs), the relative
>> cost is a lot less than if it were 64 bits wide.
>>
>>
>> In order to follow 1:1 pattern, one effectively ends up with the arrays
>> being duplicated both based on the write ports, as well as the read
>> ports, and so 3*6=18.
>>
>> And, every time one stores a value via a write port, it ends up being
>> duplicated to 6 different arrays in parallel (one accessible via each of
>> the read ports), and reading from a read port reads from one of the 3
>> arrays it has access to, one from each write port, based on the tag bits
>> for the associated register.
>>
>>
>> As noted, something about this pattern breaks on Altera FPGAs though,
>> implying there may be some significant functional difference in how they
>> handle LUTRAM.
>>
>>
>> In the fallback case, the 2nd place option basically replaces the LUTRAM
>> arrays with a bunch of state-machine registers (Flip-Flops) and a big
>> "case()" block to select values from the various registers for the
>> various register ports. This approach seems to have a lower per-port
>> cost, but is a lot more sensitive to the number and size of the registers.
>>
>>
>> But, yeah, if there is a cheaper way to do all this, it might be useful
>> to know about...
> <
> There is at the transistor level, there is not from the gate level.
>

OK.

Re: Split register files

<d85f85de-20de-42e8-b275-6b7db1c26613n@googlegroups.com>

by: Quadibloc - Tue, 29 Jun 2021 05:03 UTC

On Monday, June 28, 2021 at 5:55:49 PM UTC-6, MitchAlsup wrote:

> Make 8 into 4 and 64 registers into 32 and it describes the Mc 88100
> pipeline.

I looked into the 88100 again after that.

It is a tragic story that it got pushed aside by the PowerPC from IBM,
just like how PA-RISC got pushed aside by the Itanium.

I suppose you can say the 88100 had four pipelines - one for integer,
one for floating-point, two for the memory controllers - but that's not
what that paper was talking about. The paper was describing a system
where there were eight accumulators visible to the programmer, and
eight integer pipelines associated with them. If floating-point is added,
there would be another eight floating-point pipelines perhaps.

So they don't seem to have stolen your idea.

John Savard

Re: Split register files

<feb3d5f1-b937-457e-b20e-7c9f78837879n@googlegroups.com>

by: Quadibloc - Tue, 29 Jun 2021 05:09 UTC

On Monday, June 28, 2021 at 11:03:20 PM UTC-6, Quadibloc wrote:

> It is a tragic story that it got pushed aside by the PowerPC from IBM,
> just like how PA-RISC got pushed aside by the Itanium.

Of course, that part of the story would have been _more_ tragic had
it not been for the truly tragic part - of how Motorola couldn't price
the 88100 aggressively to give it a start because of infighting from the
68000 section of the company.

Would Lisa Su have put up with that kind of nonsense within AMD for
a second?

Now, it _is_ true that AMD might be said to have "done the same thing" in
a sense - it shelved a project to make ARM server chips in order to concentrate
on Ryzen. But while there's a similarity, in that x86 and the 68k are "present-day"
projects, and moving to ARM or to RISC with the 88100 are investments in the
future, I think it's far and away clear that what AMD did was in the best interests of
the company as a whole, with its limited development resources, while what
Motorola did was not.

John Savard

Re: Split register files

<sbebbo$h1g$1@dont-email.me>

by: Stephen Fuld - Tue, 29 Jun 2021 05:33 UTC

On 6/28/2021 10:09 PM, Quadibloc wrote:
> On Monday, June 28, 2021 at 11:03:20 PM UTC-6, Quadibloc wrote:
>
>> It is a tragic story that it got pushed aside by the PowerPC from IBM,
>> just like how PA-RISC got pushed aside by the Itanium.
>
> Of course, that part of the story would have been _more_ tragic had
> it not been for the truly tragic part - of how Motorola couldn't price
> the 88100 aggressively to give it a start because of infighting from the
> 68000 section of the company.
>
> Would Lisa Su have put up with that kind of nonsense within AMD for
> a second?
>
> Now, it _is_ true that AMD might be said to have "done the same thing" in
> a sense - it shelved a project to make ARM server chips in order to concentrate
> on Ryzen.

Also, in the mid-1990s they dropped any future development for their
existing 29000 series of RISC chips.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Split register files

<149275ba-4c6d-4219-afd5-ce71a5910856n@googlegroups.com>

by: MitchAlsup - Tue, 29 Jun 2021 17:15 UTC

On Tuesday, June 29, 2021 at 12:03:20 AM UTC-5, Quadibloc wrote:
> On Monday, June 28, 2021 at 5:55:49 PM UTC-6, MitchAlsup wrote:
>
> > Make 8 into 4 and 64 registers into 32 and it describes the Mc 88100
> > pipeline.
> I looked into the 88100 again after that.
>
> It is a tragic story that it got pushed aside by the PowerPC from IBM,
> just like how PA-RISC got pushed aside by the Itanium.
>
> I suppose you can say the 88100 had four pipelines - one for integer,
> one for floating-point, two for the memory controllers - but that's not
> what that paper was talking about.
<
4 data pipelines {integer, memory, FADD, and FMUL} and
1 inst pipeline {fetch}
<
> The paper was describing a system
> where there were eight accumulators visible to the programmer, and
> eight integer pipelines associated with them. If floating-point is added,
> there would be another eight floating-point pipelines perhaps.
>
> So they don't seem to have stolen your idea.
<
I was not complaining about a steal, I was pointing out that, like them,
there are other computers that consist of multiple in-order pipelines.
I generally reserve the word-pair "Partially ordered" to describe multiple
in-order pipelines. It is like the difference between MM Markov models
and GG Markov models.
>
> John Savard

Re: Split register files

<e8cd2e19-e6eb-42c7-b1e3-8a507e269c98n@googlegroups.com>

by: Paul A. Clayton - Tue, 29 Jun 2021 17:26 UTC

On Tuesday, June 29, 2021 at 1:09:19 AM UTC-4, Quadibloc wrote:
[snip]
> Of course, that part of the story would have been _more_ tragic had
> it not been for the truly tragic part - of how Motorola couldn't price
> the 88100 aggressively to give it a start because of infighting from the
> 68000 section of the company.

I doubt M88k could have succeeded as a merchant RISC processor.
The Unix workstation market fragmented in ISA (even more than in
software), I think, in large part because the vendors (HP, Sun, IBM, DEC,
SGI?) were large enough to do their own in-house RISCs (a RISC ISA is
easier to design and easier to implement) and were used to vertical
integration (in some cases down to fabrication).

MIPS tried to be a merchant vendor but failed.

Unix eventually developed standardization (and has seemed to move
to just 'Linux-compatible'), but even with a common parent and
initial software base HP-UX was (I have read) very different from
AIX which was different from Solaris, and so on.

One of the reasons Itanium was expected to do better was
that it was expected to be a merchant processor with Intel
selling the same product to any buyer without discrimination.
Even if the ISA/architectural concept did not have issues and
x86 had not been extended to 64 bits, Itanium would not
have matched the hope because HP was given preferential
treatment and with increasing integration (and volume
effects) system-level hardware distinctives become less
cost-effective. (EPIC issues and x86-64 did exist, so such
secondary effects were not as significant.)

The role of merchant processor vendor is not easy.
ARM avoided this by (mainly) not making chips, being a
design (and support) vendor. (This also meant that some
optimizations were hindered — e.g., cores were less tuned
to their memory systems because users would want to
configure different memory systems.)

Re: Split register files

<sbfrum$gir$3@newsreader4.netcologne.de>

by: Thomas Koenig - Tue, 29 Jun 2021 19:23 UTC

Paul A. Clayton <paaronclayton@gmail.com> schrieb:

> Unix eventually developed standardization (and has seemed to move
> to just 'Linux-compatible'), but even with a common parent and
> initial software base HP-UX was (I have read) very different from
> AIX which was different from Solaris, and so on.

In those days, compiling an open source package was a bit
of a nightmare... all those #ifdef __HPUX (or whatever you
had).

I still have a login on an AIX and a Solaris box. AIX feels...
strange:

$ ls asdfasdfasdf
ls: 0653-341 The file asdfasdfasdf does not exist.

Numbered error messages are so IBM and so not UNIX.

Re: Split register files

<memo.20210630004946.12384m@jgd.cix.co.uk>

by: John Dallman - Tue, 29 Jun 2021 23:49 UTC

In article <e8cd2e19-e6eb-42c7-b1e3-8a507e269c98n@googlegroups.com>,
paaronclayton@gmail.com (Paul A. Clayton) wrote:

> ... HP-UX was (I have read) very different from
> AIX which was different from Solaris, and so on.

Significantly different: API differences, different compiler and linker
options, different content in system headers, weird memory layouts of
their own for 32-bit processes (64-bit are much more alike), different
ideas about shared libraries. Nothing that you can't overcome, but you
need a fairly generalised set of development tooling to do it.

> Itanium would not have matched the hope because HP was given
> preferential treatment and with increasing integration (and
> volume effects) system-level hardware distinctives become less
> cost-effective.

A contributing factor was that Intel allowed HP to have a monopoly on
motherboard chipsets for small systems. That meant that ISVs who wanted
to support Itanium on a small scale had to buy HP's systems, which were
expensive (an HP-UX license was priced in) and kind of fiddly to
administer. That meant that ISVs tended to go "maybe next year?" and
little software was available.

John

Re: Split register files

<sbgc0j$qg2$1@dont-email.me>

by: James Van Buskirk - Tue, 29 Jun 2021 23:57 UTC

"Thomas Koenig" wrote in message
news:sbadsa$r84$1@newsreader4.netcologne.de...

> James Van Buskirk <not_valid@comcast.net> schrieb:

> > Do you support pivot operations, like a perhaps partial bit
> > reversal permutation of the register file?

> I haven't given it any thought so far (but then again,
> I am not sure what exactly this means).

> Could you maybe elaborate?

An FFT can be thought of as factoring the DFT matrix into matrices
with at most 2 nonzero entries per row. The conceptually simplest
realization of an FFT algorithm thus reads two values from memory
and then produces two linear combinations of them using values
loaded from the constants pool and then stores the results back
in memory. Even if everything comes from L1 cache this process
kind of blows up your load-store bandwidth.

You can minimize cache access by loading 2**N values into the
register file and then walking them through N stages (effective
factor matrix multiplies) before once again entrusting the results
to the memory subsystem.
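A minimal sketch of that idea in Python, assuming a radix-2 decimation-in-frequency formulation (the post does not say which variant is meant): 2**N values are loaded once, go through N butterfly stages entirely "in registers", and are stored once. The function name is made up, and the results are left in bit-reversed order.

```python
import cmath

def fft_in_registers(regs):
    """In-place radix-2 DIF FFT on 2**N values; output is in bit-reversed order."""
    n = len(regs)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    half = n // 2
    while half >= 1:                      # one pass per stage, N passes in total
        for start in range(0, n, 2 * half):
            for k in range(half):
                # Twiddle factor, i.e. a value from the constants pool.
                w = cmath.exp(-2j * cmath.pi * k / (2 * half))
                a, b = regs[start + k], regs[start + k + half]
                regs[start + k] = a + b
                regs[start + k + half] = (a - b) * w
        half //= 2
    return regs

memory = [complex(v) for v in (1, 2, 3, 4, 5, 6, 7, 8)]
regs = list(memory)        # 2**3 loads
fft_in_registers(regs)     # 3 stages with no memory traffic
memory = regs              # 2**3 stores
```

Compared with the two-loads-two-stores-per-butterfly version, the data touches memory once per N stages instead of once per stage.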

With the register file organized into 8 conspiracies or 'cells' of 8
registers each which can't talk to each other directly but only
via dedicated data swizzling operations you are limited to N=3
before you have to reshuffle the data between cells and even
then you have the problem of intermediate results and
constants trying to burst the scheme at its seams. The data
swizzling amounts to bit-reversed addressing so if it could be
achieved by some magic operation that would get us to N=6
or thereabouts.
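To illustrate why a bit-reversal permutation would help, here is a small Python check. It assumes 64 registers indexed with 6 bits, where the upper 3 bits select the cell and the lower 3 bits the register within the cell; that layout, and reading "bit-reversed addressing" as a full reversal of the 6-bit index, are assumptions for the sketch rather than something stated in the post.

```python
INDEX_BITS = 6   # 64 registers in total
CELL_BITS = 3    # 8 registers per cell; the upper 3 index bits pick the cell

def bit_reverse(i, bits=INDEX_BITS):
    """Reverse the low `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def same_cell(a, b):
    return (a >> CELL_BITS) == (b >> CELL_BITS)

def stage_is_local(stage, permute=lambda i: i):
    """True if every butterfly pair (i, i ^ 2**stage) sits inside one cell."""
    return all(same_cell(permute(i), permute(i ^ (1 << stage)))
               for i in range(1 << INDEX_BITS))

print([stage_is_local(s) for s in range(6)])
# [True, True, True, False, False, False]
print([stage_is_local(s, bit_reverse) for s in range(3, 6)])
# [True, True, True]
```

Stages 0 to 2 pair registers within a cell; stages 3 to 5 cross cells until the indices are bit-reversed, after which they are cell-local as well. That is the sense in which one swizzle would extend N=3 to N=6.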

Various flavors of DCTs share the core algorithm with full-
complex and real-half complex FFTs so this magic would seem
to have the potential to help all of these algorithms.

Re: Split register files

<sbh665$sht$1@dont-email.me>

by: Brett - Wed, 30 Jun 2021 07:23 UTC

Thomas Koenig <tkoenig@netcologne.de> wrote:
> Brett <ggtgp@yahoo.com> schrieb:
>> Thomas Koenig <tkoenig@netcologne.de> wrote:
>>> Ivan Godard <ivan@millcomputing.com> schrieb:
>>>> On 6/26/2021 2:32 AM, Thomas Koenig wrote:
>>>
>>>> See the Texas Instruments C64
>>>
>>> That's rather interesting, thanks!
>>>
>>> Looks rather similar to what I had in mind, except that they allow
>>> at least some cross-operation between the different register files,
>>> and they had 16 registers (later 32) per register file.
>>>
>>
>> I like it, however I would burn two instruction bits for the two banks.
>> So you can say bank 0, bank 1, both banks, sequential load to both, etc.
>
> What would you apply them to? Assuming a three-register instruction,
> (and assuming that r0-15 are in the first bank and r16-r31 in the
> second) would you then be able to write
>
> add r2,r17,r22

Add.12 r2,r17,r22 ; both banks run same instruction.

A loop unroll of one is just tagging the opcode to run on both banks.
Which gives good code density.

Add.1s r2,r17,r22 ; add bank 1 but splat result to both banks.

Also would do load pair splat, so banks get sequential items for loop
unroll use, which saves an address register in the second bank from having
to be a clone plus 1 and other grief. The saved register could then be your
loop index, etc.

You get loop unrolling without doubling your rename width or ports, both of
which are critical limits to scaling.

> (so the bank 1 ALU presumably does the calculation and sends over
> the result to bank 0)?
>
> Also, this would use one bit only if you have two banks. I was
> envisioning more than two, then the advantage in encoding bits
> starts to disappear.

You can use 2 bank bits and 1 splat or other bit, cheaper than three 6-bit
register specifiers: 2+1+(3*4) = 15, versus 3*6 = 18.
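The bit budget can be checked with a few lines of Python. This is only the arithmetic from the paragraph above; the helper name and its parameters are made up for illustration and do not describe any real encoding.

```python
def operand_bits(bank_bits, flag_bits, regs_per_bank, operands=3):
    """Instruction bits spent on bank selection, flags, and register specifiers."""
    reg_bits = (regs_per_bank - 1).bit_length()   # 16 registers -> 4 bits
    return bank_bits + flag_bits + operands * reg_bits

banked = operand_bits(bank_bits=2, flag_bits=1, regs_per_bank=16)  # 2 + 1 + 3*4
flat = operand_bits(bank_bits=0, flag_bits=0, regs_per_bank=64)    # 3*6
print(banked, flat, flat - banked)  # 15 18 3
```

The saving comes from stating the bank once per instruction rather than once per operand.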

With four banks the four common choices are: bank 1, all, alternate primary
bank, and all but primary bank. That is better than my 2 bank mask bits in
the same opcode space.

You can expand this to 3 bits, or to a full 4-bit mask (one bit per bank), which
is wasteful. There is also the issue of splat variants, if any, which may
better be handled by other opcodes. There will be a copy bank register
instruction that handles all the tough cases.

Re: Split register files

<sbh6ob$41l$1@dont-email.me>

by: Brett - Wed, 30 Jun 2021 07:33 UTC

Thomas Koenig <tkoenig@netcologne.de> wrote:
> Paul A. Clayton <paaronclayton@gmail.com> schrieb:
>
>> Unix eventually developed standardization (and has seemed to move
>> to just 'Linux-compatible'), but even with a common parent and
>> initial software base HP-UX was (I have read) very different from
>> AIX which was different from Solaris, and so on.
>
> In those days, compiling an open source package was a bit
> of a nightmare... all those #ifdef __HPUX (or whatever you
> had).
>
> I still have a login on an AIX and a Solaris box. AIX feels...
> strange:
>
> $ ls asdfasdfasdf
> ls: 0653-341 The file asdfasdfasdf does not exist.
>
> Numbered error messages are so IBM and so not UNIX.

Numbers are BETTER, I can look a number up and get more diagnostics.
Of course, not actually having used AIX, I can't say this was truly useful.

Re: Split register files

<sbubiu$unp$1@dont-email.me>

by: Brett - Mon, 5 Jul 2021 07:15 UTC

Brett <ggtgp@yahoo.com> wrote:
> Thomas Koenig <tkoenig@netcologne.de> wrote:
>> Brett <ggtgp@yahoo.com> schrieb:
>>> Thomas Koenig <tkoenig@netcologne.de> wrote:
>>>> Ivan Godard <ivan@millcomputing.com> schrieb:
>>>>> On 6/26/2021 2:32 AM, Thomas Koenig wrote:
>>>>
>>>>> See the Texas Instruments C64
>>>>
>>>> That's rather interesting, thanks!
>>>>
>>>> Looks rather similar to what I had in mind, except that they allow
>>>> at least some cross-operation between the different register files,
>>>> and they had 16 registers (later 32) per register file.
>>>>
>>>
>>> I like it, however I would burn two instruction bits for the two banks.
>>> So you can say bank 0, bank 1, both banks, sequential load to both, etc.
>>
>> What would you apply them to? Assuming a three-register instruction,
>> (and assuming that r0-15 are in the first bank and r16-r31 in the
>> second) would you then be able to write
>>
>> add r2,r17,r22
>
> Add.12 r2,r17,r22 ; both banks run same instruction.
>
> A loop unroll of one is just tagging the opcode to run on both banks.
> Which gives good code density.
>
> Add.1s r2,r17,r22 ; add bank 1 but splat result to both banks.
>
> Also would do load pair splat, so banks get sequential items for loop
> unroll use, which saves an address register in the second bank from having
> to be a clone plus 1 and other grief. The saved register could then be your
> loop index, etc.
>
> You get loop unrolling without doubling your rename width or ports, both of
> which are critical limits to scaling.
>
>> (so the bank 1 ALU presumably does the calculation and sends over
>> the result to bank 0)?
>>
>> Also, this would use one bit only if you have two banks. I was
>> envisioning more than two, then the advantage in encoding bits
>> starts to disappear.
>
> You can use 2 bank bits and 1 splat or other bit, cheaper than three 6 bit
> register specifiers. 2+1+(3*4) = 15, versus 3*6 = 18.
>
> With four banks the four common choices are; bank 1, all, alternate primary
> bank, and all but primary bank. Which is better than my 2 bank mask bits in
> the same opcode space.
>
> You can expand this to 3 bits, or the full 4 bit mask each bank which is
> wasteful. There is also the issue of splat variants if any, which may
> better be handled by other opcodes. There will be a copy bank register
> instruction that handles all the tough cases.
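
The bit count in the quoted paragraph above can be checked with a short sketch. It assumes four banks of 16 registers (64 registers in total, hence the 6-bit flat specifier) and three register operands; the function names are made up for illustration and do not come from any real toolchain.

```python
def specifier_bits(registers):
    """Bits needed to name one register out of `registers`."""
    return (registers - 1).bit_length()


def flat_encoding_bits(total_registers, operands=3):
    """Every operand carries a full-width register specifier."""
    return operands * specifier_bits(total_registers)


def banked_encoding_bits(regs_per_bank, bank_bits, flag_bits, operands=3):
    """One shared bank field plus a flag field, then short per-bank specifiers."""
    return bank_bits + flag_bits + operands * specifier_bits(regs_per_bank)


# 2 bank bits + 1 splat bit + three 4-bit specifiers
banked = banked_encoding_bits(regs_per_bank=16, bank_bits=2, flag_bits=1)
# three 6-bit specifiers over a flat 64-register file
flat = flat_encoding_bits(total_registers=64)
print(banked, flat)
```

The saving exists only because all three operands share one bank field; an instruction that mixes banks freely would need the flat form or a separate copy instruction.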

I have thought about this some more and how it overlaps vector registers
but with more flexibility and have decided that 5 banks of 16 registers is
best so that one can do away with vector registers since you have that
functionality.

All addressing is done in the first two banks, and the other banks are for
unrolling and long calculations that need more than 16 registers. Only the
last four banks support floating point and vector type operations. All
registers are 64 bits.

Basically a sort of super upgraded RISC 68000, without the crap instruction
set arch and limits. You can’t do addressing in vector registers, but can
with this arch.

You could also do a Mill variant of this arch, and with 5 belts you can
dump the evil kluge visible scratchpad resulting in a nicer cleaner
architecture.

You get 80 general registers and quad loop unrolling without quadrupling
your rename width or ports, both of which are critical limits to scaling.
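
A rough model of what "tagging the opcode to run on several banks" could mean, as a sketch only. It assumes a full 5-bit bank mask purely for clarity (the posts above suggest a smaller selector field would be used in practice), and the class and method names are hypothetical.

```python
NUM_BANKS = 5          # 5 banks of 16 registers = 80 registers
REGS_PER_BANK = 16
MASK64 = (1 << 64) - 1  # all registers are 64 bits


class BankedRegisterFile:
    def __init__(self):
        self.banks = [[0] * REGS_PER_BANK for _ in range(NUM_BANKS)]

    def add(self, bank_mask, rd, ra, rb):
        """Run 'add rd, ra, rb' in every bank whose bit is set in bank_mask."""
        for bank in range(NUM_BANKS):
            if bank_mask & (1 << bank):
                regs = self.banks[bank]
                regs[rd] = (regs[ra] + regs[rb]) & MASK64

    def add_splat(self, src_bank, rd, ra, rb):
        """Add in one bank, then write the result to rd in every bank."""
        regs = self.banks[src_bank]
        result = (regs[ra] + regs[rb]) & MASK64
        for bank_regs in self.banks:
            bank_regs[rd] = result


rf = BankedRegisterFile()
for b in range(NUM_BANKS):
    rf.banks[b][1] = 10 * b   # a different data item per bank
    rf.banks[b][2] = 1

# One opcode, four banks: the "quad unroll" case, bank 0 left untouched.
rf.add(0b11110, rd=0, ra=1, rb=2)
unrolled = [rf.banks[b][0] for b in range(NUM_BANKS)]

rf.add_splat(src_bank=1, rd=3, ra=1, rb=2)
splatted = [rf.banks[b][3] for b in range(NUM_BANKS)]
```

Each bank only ever reads and writes its own registers here, which is the property that would keep port counts per bank flat as banks are added.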

I like this so much I am going to call this approach post RISC. ;)
You are rolling the vector registers into the general register set and
getting a breakthrough in performance.

Re: Split register files

<sbudg8$aje$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18390&group=comp.arch#18390

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Mon, 5 Jul 2021 00:48:22 -0700
Organization: A noiseless patient Spider
Lines: 84
Message-ID: <sbudg8$aje$1@dont-email.me>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<sb6vfb$1ov$1@dont-email.me> <sb70q1$fsg$2@newsreader4.netcologne.de>
<sb912k$c4c$1@dont-email.me> <sb99gi$1r5$1@newsreader4.netcologne.de>
<sbh665$sht$1@dont-email.me> <sbubiu$unp$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 5 Jul 2021 07:48:24 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="4bb2466517d7a9384ece9c6eda689209";
logging-data="10862"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/U36eLujnw7bVnx2jgLUyW"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:Go1vXAO2kuUBBME0fEVTZf3lKn8=
In-Reply-To: <sbubiu$unp$1@dont-email.me>
Content-Language: en-US

by: Ivan Godard - Mon, 5 Jul 2021 07:48 UTC

On 7/5/2021 12:15 AM, Brett wrote:
> Brett <ggtgp@yahoo.com> wrote:
>> Thomas Koenig <tkoenig@netcologne.de> wrote:
>>> Brett <ggtgp@yahoo.com> schrieb:
>>>> Thomas Koenig <tkoenig@netcologne.de> wrote:
>>>>> Ivan Godard <ivan@millcomputing.com> schrieb:
>>>>>> On 6/26/2021 2:32 AM, Thomas Koenig wrote:
>>>>>
>>>>>> See the Texas Instruments C64
>>>>>
>>>>> That's rather interesting, thanks!
>>>>>
>>>>> Looks rather similar to what I had in mind, except that they allow
>>>>> at least some cross-operation between the different register files,
>>>>> and they had 16 registers (later 32) per register file.
>>>>>
>>>>
>>>> I like it, however I would burn two instruction bits for the two banks.
>>>> So you can say bank 0, bank 1, both banks, sequential load to both, etc.
>>>
>>> What would you apply them to? Assuming a three-register instruction,
>>> (and assuming that r0-15 are in the first bank and r16-r31 in the
>>> second) would you then be able to write
>>>
>>> add r2,r17,r22
>>
>> Add.12 r2,r17,r22 ; both banks run same instruction.
>>
>> A loop unroll of one is just tagging the opcode to run on both banks.
>> Which gives good code density.
>>
>> Add.1s r2,r17,r22 ; add bank 1 but splat result to both banks.
>>
>> Also would do load pair splat, so banks get sequential items for loop
>> unroll use, which saves an address register in the second bank from having
>> to be a clone plus 1 and other grief. The saved register could then be your
>> loop index, etc.
>>
>> You get loop unrolling without doubling your rename width or ports, both of
>> which are critical limits to scaling.
>>
>>> (so the bank 1 ALU presumably does the calculation and sends over
>>> the result to bank 0)?
>>>
>>> Also, this would use one bit only if you have two banks. I was
>>> envisioning more than two, then the advantage in encoding bits
>>> starts to disappear.
>>
>> You can use 2 bank bits and 1 splat or other bit, cheaper than three 6 bit
>> register specifiers. 2+1+(3*4) = 15, versus 3*6 = 18.
>>
>> With four banks the four common choices are; bank 1, all, alternate primary
>> bank, and all but primary bank. Which is better than my 2 bank mask bits in
>> the same opcode space.
>>
>> You can expand this to 3 bits, or the full 4 bit mask each bank which is
>> wasteful. There is also the issue of splat variants if any, which may
>> better be handled by other opcodes. There will be a copy bank register
>> instruction that handles all the tough cases.
>
> I have thought about this some more and how it overlaps vector registers
> but with more flexibility and have decided that 5 banks of 16 registers is
> best so that one can do away with vector registers since you have that
> functionality.
>
> All addressing is done in the first two banks, and the other banks are for
> unrolling and long calculations that need more than 16 registers. Only the
> last four banks support floating point and vector type operations. All
> registers are 64 bits.
>
> Basically a sort of super upgraded RISC 68000, without the crap instruction
> set arch and limits. You can’t do addressing in vector registers, but can
> with this arch.
>
> You could also do a Mill variant of this arch, and with 5 belts you can
> dump the evil kluge visible scratchpad resulting in a nicer cleaner
> architecture.

Code can have unbounded numbers of concurrently live operands that have
to be put somewhere. If you don't have a scratchpad then you have to be
prepared to stash to memory. Even if you do have a scratchpad you have
to be prepared for it to overflow and need stashing to memory too.
Scratchpad, like registers, is just an optimization of memory for
certain common special cases.

Re: Split register files

<sc12qv$8ka$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18402&group=comp.arch#18402

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ggt...@yahoo.com (Brett)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Tue, 6 Jul 2021 08:04:47 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 133
Message-ID: <sc12qv$8ka$1@dont-email.me>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<sb6vfb$1ov$1@dont-email.me>
<sb70q1$fsg$2@newsreader4.netcologne.de>
<sb912k$c4c$1@dont-email.me>
<sb99gi$1r5$1@newsreader4.netcologne.de>
<sbh665$sht$1@dont-email.me>
<sbubiu$unp$1@dont-email.me>
<sbudg8$aje$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 6 Jul 2021 08:04:47 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="8c5715bcef2280bd11360218a9dae5ba";
logging-data="8842"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/dKqNtkoYoFqVLTW/SY0IW"
User-Agent: NewsTap/5.5 (iPad)
Cancel-Lock: sha1:5y+YSseD4hxp61dgDkYxQg3A2t4=
sha1:N8Qvda0kI5dxByiFccFWtNBjQ4k=

by: Brett - Tue, 6 Jul 2021 08:04 UTC

Ivan Godard <ivan@millcomputing.com> wrote:
> On 7/5/2021 12:15 AM, Brett wrote:
>> Brett <ggtgp@yahoo.com> wrote:
>>> Thomas Koenig <tkoenig@netcologne.de> wrote:
>>>> Brett <ggtgp@yahoo.com> schrieb:
>>>>> Thomas Koenig <tkoenig@netcologne.de> wrote:
>>>>>> Ivan Godard <ivan@millcomputing.com> schrieb:
>>>>>>> On 6/26/2021 2:32 AM, Thomas Koenig wrote:
>>>>>>
>>>>>>> See the Texas Instruments C64
>>>>>>
>>>>>> That's rather interesting, thanks!
>>>>>>
>>>>>> Looks rather similar to what I had in mind, except that they allow
>>>>>> at least some cross-operation between the different register files,
>>>>>> and they had 16 registers (later 32) per register file.
>>>>>>
>>>>>
>>>>> I like it, however I would burn two instruction bits for the two banks.
>>>>> So you can say bank 0, bank 1, both banks, sequential load to both, etc.
>>>>
>>>> What would you apply them to? Assuming a three-register instruction,
>>>> (and assuming that r0-15 are in the first bank and r16-r31 in the
>>>> second) would you then be able to write
>>>>
>>>> add r2,r17,r22
>>>
>>> Add.12 r2,r17,r22 ; both banks run same instruction.
>>>
>>> A loop unroll of one is just tagging the opcode to run on both banks.
>>> Which gives good code density.
>>>
>>> Add.1s r2,r17,r22 ; add bank 1 but splat result to both banks.
>>>
>>> Also would do load pair splat, so banks get sequential items for loop
>>> unroll use, which saves an address register in the second bank from having
>>> to be a clone plus 1 and other grief. The saved register could then be your
>>> loop index, etc.
>>>
>>> You get loop unrolling without doubling your rename width or ports, both of
>>> which are critical limits to scaling.
>>>
>>>> (so the bank 1 ALU presumably does the calculation and sends over
>>>> the result to bank 0)?
>>>>
>>>> Also, this would use one bit only if you have two banks. I was
>>>> envisioning more than two, then the advantage in encoding bits
>>>> starts to disappear.
>>>
>>> You can use 2 bank bits and 1 splat or other bit, cheaper than three 6 bit
>>> register specifiers. 2+1+(3*4) = 15, versus 3*6 = 18.
>>>
>>> With four banks the four common choices are; bank 1, all, alternate primary
>>> bank, and all but primary bank. Which is better than my 2 bank mask bits in
>>> the same opcode space.
>>>
>>> You can expand this to 3 bits, or the full 4 bit mask each bank which is
>>> wasteful. There is also the issue of splat variants if any, which may
>>> better be handled by other opcodes. There will be a copy bank register
>>> instruction that handles all the tough cases.
>>
>> I have thought about this some more and how it overlaps vector registers
>> but with more flexibility and have decided that 5 banks of 16 registers is
>> best so that one can do away with vector registers since you have that
>> functionality.
>>
>> All addressing is done in the first two banks, and the other banks are for
>> unrolling and long calculations that need more than 16 registers. Only the
>> last four banks support floating point and vector type operations. All
>> registers are 64 bits.
>>
>> Basically a sort of super upgraded RISC 68000, without the crap instruction
>> set arch and limits. You can’t do addressing in vector registers, but can
>> with this arch.
>>
>> You could also do a Mill variant of this arch, and with 5 belts you can
>> dump the evil kluge visible scratchpad resulting in a nicer cleaner
>> architecture.
>
> Code can have unbounded numbers of concurrently live operands that have
> to be put somewhere. If you don't have a scratchpad then you have to be
> prepared to stash to memory. Even if you do have a scratchpad you have
> to be prepared for it to overflow and need stashing to memory too.
> Scratchpad, like registers, is just an optimization of memory for
> certain common special cases.

I have used architectures with scratchpads or that could configure half the
cache as scratchpad, yuck.
And I have the talent to use such; for the average programmer the idea is a
joke.

The age of two way caches that suffer aliasing issues is over, use the
cache as God intended.

Of course in your case the compiler is using the scratchpad, and it may
give you more bandwidth, and that bandwidth costs less power than the
cache.

But I still think a scratchpad is an overcomplicated kludge and thus you
are living in the past. More importantly it will scare off mediocre
programmers. The belt is scary enough as is.

On an OS call the scratchpad can leak information unless cleared, a small
performance liability.

The real problem is that you only have one virtual belt, as that makes
opcodes smaller and seemingly makes things simpler. But not all problems
fit in one belt, fat code blows apart a single belt. Thus a scratchpad to
shoehorn more code with good performance. Fat code purely from cache would
perform badly.

You also have compiler issues, which makes handling more belts difficult.
It is easy to say that one bank should be used for globals and thus rarely
rotate, another for addressing and others for compute, but convincing a
compiler to do something intelligent is really hard.

A loop has a limited number of dependent compute chains, it should be
possible to just randomly assign opcode chains to belts. Filling belts
roughly evenly over time.

You still have some secret sauce, I bet, to get to 32 ops a cycle. How do you
handle vector support spreading across units? Etc. Maybe you have lots of
belts hiding under the virtual belt, and thus are doing what I am
suggesting.

It’s been years since I read your docs, I apologize for any
mischaracterizations I have made. My mental model of your architecture is
undoubtedly wrong.

I thank you for your posts, they are enlightening.

Brett

Re: Split register files

<sc186o$gns$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18406&group=comp.arch#18406

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Tue, 6 Jul 2021 02:36:18 -0700
Organization: A noiseless patient Spider
Lines: 151
Message-ID: <sc186o$gns$1@dont-email.me>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<sb6vfb$1ov$1@dont-email.me> <sb70q1$fsg$2@newsreader4.netcologne.de>
<sb912k$c4c$1@dont-email.me> <sb99gi$1r5$1@newsreader4.netcologne.de>
<sbh665$sht$1@dont-email.me> <sbubiu$unp$1@dont-email.me>
<sbudg8$aje$1@dont-email.me> <sc12qv$8ka$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 6 Jul 2021 09:36:25 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c0df05201c996276f1a63add5d1e516f";
logging-data="17148"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18OUI5ykJRt8xfmKAPhSvQf"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:Oj1tqv+6FXCHOep1jtjbeGGX+nk=
In-Reply-To: <sc12qv$8ka$1@dont-email.me>
Content-Language: en-US

by: Ivan Godard - Tue, 6 Jul 2021 09:36 UTC

On 7/6/2021 1:04 AM, Brett wrote:
> Ivan Godard <ivan@millcomputing.com> wrote:
>> On 7/5/2021 12:15 AM, Brett wrote:
>>> Brett <ggtgp@yahoo.com> wrote:
>>>> Thomas Koenig <tkoenig@netcologne.de> wrote:
>>>>> Brett <ggtgp@yahoo.com> schrieb:
>>>>>> Thomas Koenig <tkoenig@netcologne.de> wrote:
>>>>>>> Ivan Godard <ivan@millcomputing.com> schrieb:
>>>>>>>> On 6/26/2021 2:32 AM, Thomas Koenig wrote:
>>>>>>>
>>>>>>>> See the Texas Instruments C64
>>>>>>>
>>>>>>> That's rather interesting, thanks!
>>>>>>>
>>>>>>> Looks rather similar to what I had in mind, except that they allow
>>>>>>> at least some cross-operation between the different register files,
>>>>>>> and they had 16 registers (later 32) per register file.
>>>>>>>
>>>>>>
>>>>>> I like it, however I would burn two instruction bits for the two banks.
>>>>>> So you can say bank 0, bank 1, both banks, sequential load to both, etc.
>>>>>
>>>>> What would you apply them to? Assuming a three-register instruction,
>>>>> (and assuming that r0-15 are in the first bank and r16-r31 in the
>>>>> second) would you then be able to write
>>>>>
>>>>> add r2,r17,r22
>>>>
>>>> Add.12 r2,r17,r22 ; both banks run same instruction.
>>>>
>>>> A loop unroll of one is just tagging the opcode to run on both banks.
>>>> Which gives good code density.
>>>>
>>>> Add.1s r2,r17,r22 ; add bank 1 but splat result to both banks.
>>>>
>>>> Also would do load pair splat, so banks get sequential items for loop
>>>> unroll use, which saves an address register in the second bank from having
>>>> to be a clone plus 1 and other grief. The saved register could then be your
>>>> loop index, etc.
>>>>
>>>> You get loop unrolling without doubling your rename width or ports, both of
>>>> which are critical limits to scaling.
>>>>
>>>>> (so the bank 1 ALU presumably does the calculation and sends over
>>>>> the result to bank 0)?
>>>>>
>>>>> Also, this would use one bit only if you have two banks. I was
>>>>> envisioning more than two, then the advantage in encoding bits
>>>>> starts to disappear.
>>>>
>>>> You can use 2 bank bits and 1 splat or other bit, cheaper than three 6 bit
>>>> register specifiers. 2+1+(3*4) = 15, versus 3*6 = 18.
>>>>
>>>> With four banks the four common choices are; bank 1, all, alternate primary
>>>> bank, and all but primary bank. Which is better than my 2 bank mask bits in
>>>> the same opcode space.
>>>>
>>>> You can expand this to 3 bits, or the full 4 bit mask each bank which is
>>>> wasteful. There is also the issue of splat variants if any, which may
>>>> better be handled by other opcodes. There will be a copy bank register
>>>> instruction that handles all the tough cases.
>>>
>>> I have thought about this some more and how it overlaps vector registers
>>> but with more flexibility and have decided that 5 banks of 16 registers is
>>> best so that one can do away with vector registers since you have that
>>> functionality.
>>>
>>> All addressing is done in the first two banks, and the other banks are for
>>> unrolling and long calculations that need more than 16 registers. Only the
>>> last four banks support floating point and vector type operations. All
>>> registers are 64 bits.
>>>
>>> Basically a sort of super upgraded RISC 68000, without the crap instruction
>>> set arch and limits. You can’t do addressing in vector registers, but can
>>> with this arch.
>>>
>>> You could also do a Mill variant of this arch, and with 5 belts you can
>>> dump the evil kluge visible scratchpad resulting in a nicer cleaner
>>> architecture.
>>
>> Code can have unbounded numbers of concurrently live operands that have
>> to be put somewhere. If you don't have a scratchpad then you have to be
>> prepared to stash to memory. Even if you do have a scratchpad you have
>> to be prepared for it to overflow and need stashing to memory too.
>> Scratchpad, like registers, is just an optimization of memory for
>> certain common special cases.
>
> I have used architectures with scratchpads or that could configure half the
> cache as scratchpad, yuck.
> And I have the talent to use such; for the average programmer the idea is a
> joke.
>
> The age of two way caches that suffer aliasing issues is over, use the
> cache as God intended.
>
> Of course in your case the compiler is using the scratchpad, and it may
> give you more bandwidth, and that bandwidth costs less power than the
> cache.
>
> But I still think a scratchpad is an overcomplicated kludge and thus you
> are living in the past. More importantly it will scare off mediocre
> programmers. The belt is scary enough as is.
>
> On an OS call the scratchpad can leak information unless cleared, a small
> performance liability.

Calls, to the OS or otherwise, get a whole new scratchpad (or so it
appears), courtesy of the hardware spiller.

> The real problem is that you only have one virtual belt, as that makes
> opcodes smaller and seemingly makes things simpler. But not all problems
> fit in one belt, fat code blows apart a single belt. Thus a scratchpad to
> shoehorn more code with good performance. Fat code purely from cache would
> perform badly.

Yes; hence the scratchpad.

> You also have compiler issues, which makes handling more belts difficult.
> It is easy to say that one bank should be used for globals and thus rarely
> rotate, another for addressing and others for compute, but convincing a
> compiler to do something intelligent is really hard.

There's only one belt at present. Rather than two belts, just configure
a belt twice as big.

> A loop has a limited number of dependent compute chains, it should be
> possible to just randomly assign opcode chains to belts. Filling belts
> roughly evenly over time.

Too much fanout in open code, although FP codes are better; you would
wind up doing a lot of inter-belt transfers because the dataflow is
tree-like rather than linear.

> You still have some secret sauce, I bet, to get to 32 ops a cycle. How do you
> handle vector support spreading across units? Etc. Maybe you have lots of
> belts hiding under the virtual belt, and thus are doing what I am
> suggesting.

Nope. Remember, the belt is a naming device, not a shift register.

> It’s been years since I read your docs, I apologize for any
> mischaracterizations I have made. My mental model of your architecture is
> undoubtedly wrong.
>
> I thank you for your posts, they are enlightening.

You're welcome.

> Brett
>

Re: Split register files

<jwvwnq3vc90.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=18410&group=comp.arch#18410

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Tue, 06 Jul 2021 08:57:10 -0400
Organization: A noiseless patient Spider
Lines: 19
Message-ID: <jwvwnq3vc90.fsf-monnier+comp.arch@gnu.org>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<sb6vfb$1ov$1@dont-email.me> <sb70q1$fsg$2@newsreader4.netcologne.de>
<sb912k$c4c$1@dont-email.me> <sb99gi$1r5$1@newsreader4.netcologne.de>
<sbh665$sht$1@dont-email.me> <sbubiu$unp$1@dont-email.me>
<sbudg8$aje$1@dont-email.me> <sc12qv$8ka$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="61f89c2afd5faa4df4ca2092461977cd";
logging-data="9601"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18xjTRY0hVaxL/MtDbDUXUH"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Cancel-Lock: sha1:nwlUrf2rRE2nfuxJo0H9TEpWIEM=
sha1:DUEE8vtp5LmLmWtFegaakX/Lm6o=

by: Stefan Monnier - Tue, 6 Jul 2021 12:57 UTC

> I have used architectures with scratchpads or that could configure half the
> cache as scratchpad, yuck.

I don't think these scratchpads are very much like the Mill's scratchpad.

I suspect the "yuck" above refers to the problems you end up having of
administering this scratchpad, sharing it between unrelated functions.

Mill's scratchpad is more like a CPU-supported notion of frame
activation record. Every time you enter a function you get a fresh new
scratchpad and when you return from a function, the scratchpad is thrown
away and the caller recovers instead the scratchpad it had before the
call (scratchpads get "pushed on/popped off the stack" behind the scenes).

So it's easy for programmers and compilers to use it as a kind of "slow
register file with register-windows".
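
A toy model of that behaviour, as a sketch only: it treats the scratchpad as a stack of fixed-size frames and says nothing about how the hardware spiller actually implements it. The slot count and the names `call`, `ret`, `spill` and `fill` are assumptions made for illustration.

```python
class ScratchpadStack:
    """Each function activation sees its own private scratchpad."""

    def __init__(self, slots=8):
        self.slots = slots
        self.frames = [[None] * slots]    # scratchpad of the outermost frame

    def call(self):
        # Entering a function: the callee gets a fresh, empty scratchpad.
        self.frames.append([None] * self.slots)

    def ret(self):
        # Returning: the callee's scratchpad is discarded, the caller's reappears.
        if len(self.frames) == 1:
            raise RuntimeError("return from outermost frame")
        self.frames.pop()

    def spill(self, slot, value):
        self.frames[-1][slot] = value

    def fill(self, slot):
        return self.frames[-1][slot]


pads = ScratchpadStack()
pads.spill(0, 42)            # caller stashes a long-lived value
pads.call()
callee_view = pads.fill(0)   # callee cannot see the caller's slot
pads.spill(0, 7)
pads.ret()
caller_view = pads.fill(0)   # caller's value is back, untouched
```

In this model nothing is shared between unrelated functions, so there is no allocation to administer by hand.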

Stefan

Re: Split register files

<jwvr1gbvbv2.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=18411&group=comp.arch#18411

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Tue, 06 Jul 2021 09:06:31 -0400
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <jwvr1gbvbv2.fsf-monnier+comp.arch@gnu.org>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<sb6vfb$1ov$1@dont-email.me> <sb70q1$fsg$2@newsreader4.netcologne.de>
<sb912k$c4c$1@dont-email.me> <sb99gi$1r5$1@newsreader4.netcologne.de>
<sbh665$sht$1@dont-email.me> <sbubiu$unp$1@dont-email.me>
<sbudg8$aje$1@dont-email.me> <sc12qv$8ka$1@dont-email.me>
<sc186o$gns$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="61f89c2afd5faa4df4ca2092461977cd";
logging-data="9601"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+3lEX0cF1rZcNKkaryug58"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Cancel-Lock: sha1:YN+yXu76qxjy1O5cYTqyC6FqBSI=
sha1:FWvtVWU4Ajn0VjhCDcHCof+z/38=

by: Stefan Monnier - Tue, 6 Jul 2021 13:06 UTC

> There's only one belt at present. Rather than two belts, just configure
> a belt twice as big.

I think his idea is that with 2 belts (and with control over which
results go to which belt) you can arrange to put on belt 1 the
results that are only needed very shortly, and on belt 2 the results
that are needed in the longer term. If most results are needed only
shortly then belt 1 will move faster and belt 2 will indeed store its
result longer.

Instead, you use the scratchpad which incurs a higher latency. IIUC you
don't care too much about this latency because it should be reasonably
easy to schedule your scratchpad saves&loads in advance, so as long as
"save+load" doesn't take more time than the number of cycles results
stay on the belt (i.e. the length of the belt measured in cycles), then
you can still arrange to have the results "at hand" without delay
when you need them.
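
The condition can be written down as a small check. This is a sketch with invented latencies, not Mill figures; the function names and the cycle counts in the example are all assumptions.

```python
def round_trip_hidden(save_latency, load_latency, belt_residency):
    """True if a value can be saved and reloaded within the number of
    cycles it would have stayed on the belt anyway."""
    return save_latency + load_latency <= belt_residency


def latest_load_issue(use_cycle, load_latency):
    """Latest cycle at which the load can issue and still deliver the
    value on the belt by use_cycle."""
    return use_cycle - load_latency


fits = round_trip_hidden(save_latency=1, load_latency=3, belt_residency=8)
too_slow = round_trip_hidden(save_latency=3, load_latency=6, belt_residency=8)
issue_at = latest_load_issue(use_cycle=20, load_latency=3)
```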

Stefan

Re: Split register files

<sc1n9p$938$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18414&group=comp.arch#18414

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Tue, 6 Jul 2021 06:54:02 -0700
Organization: A noiseless patient Spider
Lines: 23
Message-ID: <sc1n9p$938$1@dont-email.me>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<sb6vfb$1ov$1@dont-email.me> <sb70q1$fsg$2@newsreader4.netcologne.de>
<sb912k$c4c$1@dont-email.me> <sb99gi$1r5$1@newsreader4.netcologne.de>
<sbh665$sht$1@dont-email.me> <sbubiu$unp$1@dont-email.me>
<sbudg8$aje$1@dont-email.me> <sc12qv$8ka$1@dont-email.me>
<jwvwnq3vc90.fsf-monnier+comp.arch@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 6 Jul 2021 13:54:01 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c0df05201c996276f1a63add5d1e516f";
logging-data="9320"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/QDcpoOWr5wz9Kx8cW2vOr"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:wxQDWOdeQ1YW/YvySGUPycXcHlU=
In-Reply-To: <jwvwnq3vc90.fsf-monnier+comp.arch@gnu.org>
Content-Language: en-US

by: Ivan Godard - Tue, 6 Jul 2021 13:54 UTC

On 7/6/2021 5:57 AM, Stefan Monnier wrote:
>> I have used architectures with scratchpads or that could configure half the
>> cache as scratchpad, yuck.
>
> I don't think these scratchpads are very much like the Mill's scratchpad.
>
> I suspect the "yuck" above refers to the problems you end up having of
> administering this scratchpad, sharing it between unrelated functions.
>
> Mill's scratchpad is more like a CPU-supported notion of frame
> activation record. Every time you enter a function you get a fresh new
> scratchpad and when you return from a function, the scratchpad is thrown
> away and the caller recovers instead the scratchpad it had before the
> call (scratchpads get "pushed on/popped off the stack" behind the scenes).
>
> So it's easy for programmers and compilers to use it as a kind of "slow
> register file with register-windows".
>
>
> Stefan
>

Yes; good description.

Re: Split register files

<sc1oip$mer$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18417&group=comp.arch#18417

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Split register files
Date: Tue, 6 Jul 2021 07:15:52 -0700
Organization: A noiseless patient Spider
Lines: 43
Message-ID: <sc1oip$mer$1@dont-email.me>
References: <sb6s70$dip$1@newsreader4.netcologne.de>
<sb6vfb$1ov$1@dont-email.me> <sb70q1$fsg$2@newsreader4.netcologne.de>
<sb912k$c4c$1@dont-email.me> <sb99gi$1r5$1@newsreader4.netcologne.de>
<sbh665$sht$1@dont-email.me> <sbubiu$unp$1@dont-email.me>
<sbudg8$aje$1@dont-email.me> <sc12qv$8ka$1@dont-email.me>
<sc186o$gns$1@dont-email.me> <jwvr1gbvbv2.fsf-monnier+comp.arch@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 6 Jul 2021 14:15:53 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c0df05201c996276f1a63add5d1e516f";
logging-data="23003"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18njH2XSxT79h1kZVXXEA+B"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:MKuo5Ic0+suApiI6b7tztpP0W0E=
In-Reply-To: <jwvr1gbvbv2.fsf-monnier+comp.arch@gnu.org>
Content-Language: en-US

by: Ivan Godard - Tue, 6 Jul 2021 14:15 UTC

On 7/6/2021 6:06 AM, Stefan Monnier wrote:
>> There's only one belt at present. Rather than two belts, just configure
>> a belt twice as big.
>
> I think his idea is that with 2 belts (and with control over which
> results go to which belt) you can arrange to put on belt 1 the
> results that are only needed very shortly, and on belt 2 the results
> that are needed in the longer term. If most results are needed only
> shortly then belt 1 will move faster and belt 2 will indeed store its
> result longer.
>
> Instead, you use the scratchpad which incurs a higher latency. IIUC you
> don't care too much about this latency because it should be reasonably
> easy to schedule your scratchpad saves&loads in advance, so as long as
> "save+load" doesn't take more time than the number of cycles results
> stay on the belt (i.e. the length of the belt measured in cycles), then
> you can still arrange to have the results "at hand" without delay
> when you need them.
>
>
> Stefan
>

As the belt is just an encoding device, it can be judged on the
compactness it achieves. Belts don't have a "speed"; things last until
the more recent set exhausts the capacity. So you could partition the
belt and decide to put short-life things in one and longer-life things
in the other, by convention. You'd still need a scratchpad for things
with still longer lives, but the scratch might be less used (and hence
smaller) because there would be fewer holes in the slow belt than in a
double-sized fast belt.

However, this is just a way to selectively preserve longer-life belt
content. We do preservation now with the rescue() operation, which takes
a bitmask covering the belt space and renumbers the selected operands to
the front. It's unclear to me whether a 32-long belt with rescues for
33+ drop lives is worse than two 16-long belts with rescues for one of
them. For space they are the same: choose one of 32 (or 2X16) costs five
bits. For usage, if the average life is 16-32 drops then the 32-long is
best because with 2X16 the data falls off the quick belt and overfills
the slow belt.
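
A small model of the belt as a naming scheme with a rescue step, as a sketch under assumptions: it guesses that rescued operands keep their relative order and that the remaining operands follow them, and the class is a teaching toy, not the real encoding or hardware.

```python
class Belt:
    """Position 0 names the most recent drop; older results get higher numbers."""

    def __init__(self, length):
        self.length = length
        self.items = []

    def drop(self, value):
        # A new result takes position 0; the oldest falls off past capacity.
        self.items.insert(0, value)
        del self.items[self.length:]

    def rescue(self, mask):
        # Bit i of mask selects belt position i. Selected operands are
        # renumbered to the front; the rest follow in their old order (assumed).
        selected = [v for i, v in enumerate(self.items) if (mask >> i) & 1]
        others = [v for i, v in enumerate(self.items) if not (mask >> i) & 1]
        self.items = selected + others


def position_bits(positions):
    """Bits needed to name one of `positions` belt positions."""
    return (positions - 1).bit_length()


belt = Belt(4)
for name in "abcd":
    belt.drop(name)           # belt is now d, c, b, a
belt.rescue(0b1000)           # rescue position 3, the oldest operand "a"
belt.drop("e")                # "b" falls off instead of "a"
survivors = list(belt.items)

single_belt_bits = position_bits(32)      # one 32-long belt
split_belt_bits = 1 + position_bits(16)   # belt select bit + 16-long belt
```

Nothing moves at a "speed" in this model: an operand survives exactly as long as fewer than `length` newer results sit in front of it.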

Needs some thought.
