Elastic pipeline cpu

<4276cc4e-111c-4f2e-8d8e-c7370e0d481fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16942&group=comp.arch#16942

Newsgroups: comp.arch
Date: Wed, 19 May 2021 17:36:43 -0700 (PDT)
Message-ID: <4276cc4e-111c-4f2e-8d8e-c7370e0d481fn@googlegroups.com>
Subject: Elastic pipeline cpu
From: robfi...@gmail.com (robf...@gmail.com)
Injection-Date: Thu, 20 May 2021 00:36:43 +0000

by: robf...@gmail.com - Thu, 20 May 2021 00:36 UTC

I am trying to find information on elastic pipeline processors but not having much luck.
Were there ever any built this way, or is it just theory? I am having difficulty
with the scoreboarding of registers and how it is handled on a branch miss.
If there is an instruction that tags a register as busy in the register fetch stage, I think that instruction should set the register as available in the commit stage regardless of whether or not the instruction is still valid. Otherwise, wouldn’t the pipeline lock up waiting for the register to become available? An issue is then that invalidated instructions cannot be removed from the pipeline or they wouldn’t make registers available. On a branch miss I was simply going to clear the fifos, but it looks like that won’t work because that will leave registers marked as busy that shouldn’t be. It seems like an elastic pipeline would be slow then as invalid instructions would have to percolate through the pipeline.
I have looked at some sample code for the handling of hazards, but I think the code sample is too simple. In the commit stage the registers are flagged as available only if the instruction commits. I think this is too simple and will leave registers flagged as busy that shouldn’t be.
I think there needs to be a register busy history correlated to branch instructions. If there is a branch miss then the previous set of busy registers would be made active. It then gets complicated as busy counters would be used to avoid WAW hazards.

Re: Elastic pipeline cpu

<d94c4bf8-db29-4539-8d75-583a18225368n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16945&group=comp.arch#16945

Newsgroups: comp.arch
Date: Wed, 19 May 2021 19:08:17 -0700 (PDT)
In-Reply-To: <4276cc4e-111c-4f2e-8d8e-c7370e0d481fn@googlegroups.com>
References: <4276cc4e-111c-4f2e-8d8e-c7370e0d481fn@googlegroups.com>
Message-ID: <d94c4bf8-db29-4539-8d75-583a18225368n@googlegroups.com>
Subject: Re: Elastic pipeline cpu
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 20 May 2021 02:08:17 +0000

by: MitchAlsup - Thu, 20 May 2021 02:08 UTC

On Wednesday, May 19, 2021 at 7:36:44 PM UTC-5, robf...@gmail.com wrote:
> I am trying to find information on elastic pipeline processors but not having much luck.
> Were there ever any built this way, or is it just theory? I am having difficulty
> with the scoreboarding of registers and how it is handled on a branch miss.
<
A lot of the time we just use the pipeline to store the destination register and its
valid bit. If the valid bit is zero, that 5-bit field does not participate in the interlock.
If you keep track of the register in a storage structure, you end up having to maintain
a bit vector of outstanding results, and when a branch is recovered, these are used
to zap the valid bits.
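
A minimal Python sketch of the first scheme above, where each in-flight pipeline slot carries its destination register number and a valid bit. The names `Slot`, `must_stall`, and `recover` are hypothetical, forwarding is not modelled, and the list is assumed to be ordered oldest first.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """One in-flight pipeline entry (hypothetical layout)."""
    dest: int = 0         # 5-bit destination register number
    valid: bool = False   # False: the dest field takes no part in the interlock


def must_stall(pipeline, sources):
    """True if any source register matches a valid in-flight destination."""
    return any(slot.valid and slot.dest in sources for slot in pipeline)


def recover(pipeline, first_wrong_path):
    """Branch recovery: zap the valid bits of every slot younger than the branch.

    `pipeline` is ordered oldest first; `first_wrong_path` is the index of the
    first slot that was fetched down the mispredicted path.
    """
    for slot in pipeline[first_wrong_path:]:
        slot.valid = False
```

Because the busy state lives in the pipeline slots themselves, clearing the valid bits on recovery is all it takes to release the registers in this model.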
<
> If there is an instruction that tags a register as busy in the register fetch stage, I think that instruction should set the register as available in the commit stage regardless of whether or not the instruction is still valid.
<
You are forgetting forwarding. You want to be able to use the result register as an operand
at the instant its result is known. You cannot afford to wait for retirement or even commit.
<
> Otherwise, wouldn’t the pipeline lock up waiting for the register to become available?
<
You can drain the pipeline and not write the results, too.
<
>An issue is then that invalidated instructions cannot be removed from the pipeline or they wouldn’t make registers available.
<
Obviously, this would not fly.
<
>On a branch miss I was simply going to clear the fifos, but it looks like that won’t work because that will leave registers marked as busy that shouldn’t be.
<
Yep, your work is not done.
<
On the other hand if you are using a real CDC 6600-like scoreboard, you can tag the FU valid bits
with which branch shadow this falls under (unary of course), and when a branch is recovered,
the branch broadcasts its failure tag, and all the instructions self destruct, making the register
available.
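
A rough Python sketch of the branch-shadow tagging described above, assuming one bit per unresolved branch (the unary tag) stored alongside each functional unit's busy state. The class and function names are hypothetical and the rest of the scoreboard is left out.

```python
class FunctionalUnit:
    """Scoreboard entry for one functional unit (hypothetical fields)."""

    def __init__(self, name):
        self.name = name
        self.busy = False
        self.dest = None
        self.shadow = 0   # unary mask: one bit per unresolved branch this op sits under

    def issue(self, dest, shadow_mask):
        self.busy, self.dest, self.shadow = True, dest, shadow_mask


def branch_resolved(units, branch_bit, mispredicted):
    """Broadcast a branch outcome to every functional unit."""
    for fu in units:
        if not fu.busy or not (fu.shadow & branch_bit):
            continue
        if mispredicted:
            # The instruction self-destructs, which releases its register.
            fu.busy, fu.dest, fu.shadow = False, None, 0
        else:
            # Correct prediction: the op no longer depends on this branch.
            fu.shadow &= ~branch_bit


def busy_registers(units):
    """Registers still waiting on a result."""
    return {fu.dest for fu in units if fu.busy}
```

A unary tag lets an instruction sit under several unresolved branches at once, and the match on broadcast is a single AND per unit.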
<
>It seems like an elastic pipeline would be slow then as invalid instructions would have to percolate through the pipeline.
<
Do not confuse the elasticity with record keeping. An elastic pipe would make this problem
only a LITTLE harder. Record keeping is mandatory, and you can never be in a position where
this fails.
<
> I have looked at some sample code of the handling of hazards but I think the code sample is too simple. In the commit stage the registers are flagged as available only if the instruction commits, I think this is too simple and will leave registers flagged as busy that shouldn’t be.
> I think there needs to be a register busy history correlated to branch instructions.
<
This is the tag I mentioned above, but it can also be done with bit vectors. On one great Big
Out of Order machine I kept a history table of the valid bits of the physical register file, and
on a branch recovery simply read them all out overwriting the present data, and presto the
machine was back where it was before those 6 instructions were decoded.
<
>If there is a branch miss then the previous set of busy registers would be made active. It then gets complicated as busy counters would be used to avoid WAW hazards.
<
It is complicated, but it is also very simple.
<
At decode (or issue) time you record the current state of the physical register file valid
bits. The PRF is read by a CAM and written by a decoder. The current valid bits and the
data of the CAM have everything you need to know about the accessibility of the registers.
<
If an instruction writes to a register, you allocate a new PRF entry and write the CAM and
set the valid bit.
<
At the end of decode you create a second bit vector of the registers that become free if
this set of instructions retires.
<
So now we have the bit vector to use if the set of instructions retires and we have the
bit vector to use if the set of instructions does not retire. If we retire, registers get
dumped on the free list; if recovered, the valid bits in the PRF are overwritten and the
PRF is back where it should have been.
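
The two bit vectors can be sketched in Python as below. This is a simplified model under stated assumptions: valid bits are held in one integer, each decoded group supplies hypothetical `(old_entry, new_entry)` rename pairs, and neither the CAM nor the free list itself is modelled.

```python
class ValidBitHistory:
    """History table of PRF valid bits, one entry per decoded group (hypothetical)."""

    def __init__(self, valid_bits=0):
        self.valid = valid_bits   # bit p set: PRF entry p is the visible copy of a register
        self.history = []         # oldest first: (tag, valid bits before decode, free-on-retire bits)

    def decode_group(self, tag, renames):
        """Record both bit vectors for a group; `renames` is a list of (old, new) PRF entries."""
        snapshot, free_on_retire = self.valid, 0
        for old, new in renames:
            self.valid &= ~(1 << old)    # the old copy is no longer the visible one
            self.valid |= 1 << new       # the newly allocated entry becomes visible
            free_on_retire |= 1 << old   # the old entry can be recycled once the group retires
        self.history.append((tag, snapshot, free_on_retire))

    def retire_oldest(self):
        """Retire the oldest group and return the entries to dump on the free list."""
        _, _, free_on_retire = self.history.pop(0)
        return free_on_retire

    def recover(self, tag):
        """Discard group `tag` and everything younger, restoring the saved valid bits."""
        index = next(i for i, (t, _, _) in enumerate(self.history) if t == tag)
        self.valid = self.history[index][1]
        del self.history[index:]
```

Recovery is a single overwrite of the valid bits regardless of how many instructions are being thrown away, which is what makes the scheme simple despite the bookkeeping.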
<
Each set of instructions is issued into the execution window with a modulo number.
If a set of instructions retires, this number can be reused. If the set of instructions
is recovered, this "tag" is used to clear instructions out of the reservation stations
(and calculation units) so they can't deliver calculated values.
<
So you can issue instructions into the execution window when there is a modulo number
available. When properly resourced, the PRF is never over capacity when there is said
modulo number, and there are reservation station entries when the modulo number
is available. So if the modulo number is available you can issue instructions into the
window. Easy peasy.
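
One way to picture the modulo number is as a small circular tag allocator, sketched below in Python. The `GroupTags` class is hypothetical, and it assumes groups retire in order and that recovery discards the named group together with everything younger. Clearing the reservation stations by tag is not modelled.

```python
class GroupTags:
    """Circular allocator of modulo numbers for instruction groups (hypothetical)."""

    def __init__(self, count):
        self.count = count
        self.head = 0        # oldest tag still in flight
        self.in_flight = 0   # number of tags currently handed out

    def can_issue(self):
        """A group may enter the execution window only if a tag is free."""
        return self.in_flight < self.count

    def issue(self):
        if not self.can_issue():
            raise RuntimeError("no modulo number available; issue must stall")
        tag = (self.head + self.in_flight) % self.count
        self.in_flight += 1
        return tag

    def retire(self):
        """The oldest group retires and its tag becomes reusable."""
        self.head = (self.head + 1) % self.count
        self.in_flight -= 1

    def recover(self, tag):
        """Drop group `tag` and all younger groups."""
        age = (tag - self.head) % self.count
        if age >= self.in_flight:
            raise ValueError("tag is not in flight")
        self.in_flight = age
```

If the PRF, the stations, and the history table are each sized for `count` groups, the single `can_issue` check stands in for all three resource checks.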
<
You can get into BIG trouble if you under-resource the PRF, the Stations, or the history
table.

Re: Elastic pipeline cpu

<0166008d-695b-4b8f-b324-e4e9d1f28825n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16962&group=comp.arch#16962

Newsgroups: comp.arch
Date: Thu, 20 May 2021 03:24:22 -0700 (PDT)
In-Reply-To: <d94c4bf8-db29-4539-8d75-583a18225368n@googlegroups.com>
References: <4276cc4e-111c-4f2e-8d8e-c7370e0d481fn@googlegroups.com> <d94c4bf8-db29-4539-8d75-583a18225368n@googlegroups.com>
Message-ID: <0166008d-695b-4b8f-b324-e4e9d1f28825n@googlegroups.com>
Subject: Re: Elastic pipeline cpu
From: robfi...@gmail.com (robf...@gmail.com)
Injection-Date: Thu, 20 May 2021 10:24:23 +0000

by: robf...@gmail.com - Thu, 20 May 2021 10:24 UTC

Okay, it was a bit simpler than I thought. I was trying to follow sample code too closely.

The scoreboard now tracks which reorder buffer entry is the source for the register file data.
All the functional units write their results to the reorder buffer. So, the reorder buffer entry
can be tracked instead of the functional unit. During decode the reorder buffer id is written as the
register file source. During commit the register file is indicated as the register file source.

Fixed the operand bypassing, I think. The operand, if coming from the reorder buffer, is now
loaded when the reorder buffer entry's commit flag is set. The commit flag is set when the
functional unit has finished the instruction. So, as soon as the result is calculated it is usable
as an operand.
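
A small Python sketch of the source tracking and bypass described in the two paragraphs above. The names are hypothetical: `source[r]` holds either a reorder buffer id or the `REGFILE` sentinel, and `done` stands in for the commit flag. The check in `commit` that leaves the source alone when a younger writer has claimed the register is an added assumption.

```python
REGFILE = None  # sentinel: the current value lives in the register file


class Scoreboard:
    """Tracks which reorder buffer entry, if any, supplies each register (hypothetical)."""

    def __init__(self, num_regs, rob_size):
        self.regs = [0] * num_regs
        self.source = [REGFILE] * num_regs
        self.rob = [{"dest": None, "done": False, "value": None} for _ in range(rob_size)]

    def decode(self, rob_id, dest):
        """At decode, the reorder buffer id becomes the register's source."""
        self.rob[rob_id] = {"dest": dest, "done": False, "value": None}
        self.source[dest] = rob_id

    def complete(self, rob_id, value):
        """A functional unit writes its result to the reorder buffer."""
        self.rob[rob_id]["value"] = value
        self.rob[rob_id]["done"] = True

    def read_operand(self, reg):
        """Return (ready, value); a finished ROB entry is usable before it commits."""
        src = self.source[reg]
        if src is REGFILE:
            return True, self.regs[reg]
        entry = self.rob[src]
        return entry["done"], entry["value"]

    def commit(self, rob_id):
        """At commit, write the register file and point the source back at it."""
        entry = self.rob[rob_id]
        self.regs[entry["dest"]] = entry["value"]
        if self.source[entry["dest"]] == rob_id:  # assumption: skip if a younger writer exists
            self.source[entry["dest"]] = REGFILE
```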

Functional units are fed by queues of the associated instructions so there is no need to wait
until the functional unit is not busy. The only stall is if the queue is full. All the queues are 64
entries deep, as that is the number of entries that fit in a single LUT RAM in the FPGA. Issuing
instructions is simple. If the queue is not full and there’s an available instruction with ready
operands, then it is queued. There is a state machine for the functional unit that dequeues the
next instruction when in the idle state.
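
The queueing rule can be sketched in Python as follows. `FunctionalUnitQueue` is a hypothetical name, and the two-state machine assumes every operation takes one cycle once dequeued, which is a simplification.

```python
from collections import deque


class FunctionalUnitQueue:
    """Instruction queue feeding one functional unit (hypothetical, single-cycle ops)."""

    def __init__(self, depth=64):
        self.depth = depth
        self.queue = deque()
        self.current = None   # None means the unit is in the idle state

    def try_issue(self, instr, operands_ready):
        """Queue the instruction if its operands are ready and the queue is not full."""
        if not operands_ready or len(self.queue) >= self.depth:
            return False
        self.queue.append(instr)
        return True

    def step(self):
        """Advance one clock; return the instruction that finished, if any."""
        if self.current is None:                     # idle: dequeue the next instruction
            if self.queue:
                self.current = self.queue.popleft()
            return None
        finished, self.current = self.current, None  # busy: finish and go back to idle
        return finished
```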

Not sure branches will work properly. On a branch decode the register file source table is copied
to a history buffer indexed by the branch tag, and the branch tag is incremented. During the execute
stage, if there was a branch mispredict, the register file source table is restored from the
history buffer entry associated with the branch tag.
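
The snapshot and restore step might look like the Python sketch below. The class name and sizes are hypothetical, and what happens to the branch tag counter after a mispredict is left untouched because the post does not specify it.

```python
class SourceTableHistory:
    """Register file source table with per-branch snapshots (hypothetical)."""

    def __init__(self, num_regs, num_tags):
        self.source = [None] * num_regs   # None: register file; otherwise a reorder buffer id
        self.history = [None] * num_tags  # one saved copy of the table per branch tag
        self.branch_tag = 0

    def decode_branch(self):
        """Copy the source table into the history buffer and bump the branch tag."""
        tag = self.branch_tag
        self.history[tag] = list(self.source)
        self.branch_tag = (tag + 1) % len(self.history)
        return tag

    def mispredict(self, tag):
        """Restore the source table saved when branch `tag` was decoded."""
        self.source = list(self.history[tag])
```

One thing this model makes visible: if an instruction older than the branch commits between the snapshot and the restore, the restored table would still name its reorder buffer entry as the source, so that case probably needs separate handling.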
