Re: everything old is new again, Compare-and-branch vs PIC

Re: everything old is new again, Compare-and-branch vs PIC

<1ad973f7-2b9c-48fc-a58a-421b63fbba3en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29471&group=comp.arch#29471

by: MitchAlsup - Sun, 11 Dec 2022 18:10 UTC

On Sunday, December 11, 2022 at 10:25:47 AM UTC-6, EricP wrote:
> John Levine wrote:
> > According to EricP <ThatWould...@thevillage.com>:
> >> robf...@gmail.com wrote:
> >>> One complication is dealing with external interrupts that may happen. My first
> >>> thought is to simply not allow them in the shadow of a PRED instruction. It
> >>> would be delaying interrupt processing by seven or eight instructions max.
> >>> Otherwise, the PRED state would need to be saved somehow and restored after
> >>> the interrupt. Possibly by saving a copy of the interrupted instruction plus
> >>> predicate.
> >> An alternative is a PRED prefix that only applies to a single instruction.
> >
> > Fifty years ago we called those skip instructions.
> >
> > On a PDP-6, to put the larger of A and B into accumulator AC:
> >
> > MOVE AC,A ; put A into AC
> > CAMGE AC,B ; skip if AC >= B
> > MOVE AC,B ; put B into AC
> >
> Not really - skip is a very short branch, forward only.
>
> Skip/Branch removes intervening instructions from the pipeline
> and no resources are assigned for them.
> (Consider how skip would behave in an in-order pipeline without branch
> prediction - it would stall at fetch until its conditional resolved.)
>
> Predication keeps the pipeline running, feeding the predicated
> instructions into the pipeline and tentatively assigns resources
> to them in case they do execute.
<
There may be a short window (like 1 cycle) where the front end has
to guess whether the then-clause or the else-clause is to be executed;
by the following cycle that has resolved. And with the mask available
in the PRED instruction, it is easy for the front end to skip forward
over the then-clause to arrive at the else-clause without executing
any instruction in the then-clause (though 1 might get nullified). So,
you don't put all the instructions in the pipeline.
<
Plus fixing up the rename table is harder than skipping over
the un-executed clause.
<
> Later if they do not execute it must patch the result environment
> so it looks like the disabled instructions never existed,
> which can possibly include propagating values to different locations
> and dynamically editing dependency chains accordingly,
> then clean up and recover the assigned resources.
> This must be done in such a way that it can take an interrupt/exception
> at any point and resolve to a predictable and restartable state.
>
> BTW on the simple cpu designs in days of yore, the advantage of
> skip conditional over an unconditional branch vs a conditional branch
> is for skip the HW instruction execute state machine always has the same
> state sequence and just changes the value added to IP, whereas BRcc has to
> change its whole state sequence depending on CC to perform/not-perform
> the offset fetch and add. So BRcc is more complex to implement than SKIPcc,
> and when you are paying per-gate for the cpu the cost difference matters.
<
The Front-End has to treat branches as resets (take this address and
fetch a bunch of instructions), whereas Predication tells the Front-End
to just continue using its current fetch strategy.

Re: everything old is new again, Compare-and-branch vs PIC

<tn59pj$22ql$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=29472&group=comp.arch#29472

by: John Levine - Sun, 11 Dec 2022 19:07 UTC

According to EricP <ThatWouldBeTelling@thevillage.com>:
>> Fifty years ago we called those skip instructions.

>Not really - skip is a very short branch, forward only.
>
>Skip/Branch removes intervening instructions from the pipeline
>and no resources are assigned for them.
>(Consider how skip would behave in an in-order pipeline without branch
>prediction - it would stall at fetch until its conditional resolved.)
>
>Predication keeps the pipeline running, feeding the predicated
>instructions into the pipeline and tentatively assigns resources
>to them in case they do execute.

Seems to me that's more of an implementation decision. When it sees
the skip why couldn't it treat that as a predicate on the next instruction?
I realize that when we designed machines with skip instructions, they
didn't have pipelines.

>BTW on the simple cpu designs in days of yore, the advantage of
>skip conditional over an unconditional branch vs a conditional branch
>is for skip the HW instruction execute state machine always has the same
>state sequence and just changes the value added to IP, whereas BRcc has to
>change its whole state sequence depending on CC to perform/not-perform
>the offset fetch and add. So BRcc is more complex to implement than SKIPcc,
>and when you are paying per-gate for the cpu the cost difference matters.

I don't ever recall seeing a machine that had condition codes and skip
instructions. The skip instructions all did some kind of comparison or
bit test. If you wanted the effect of a conditional jump you'd reverse
the test and skip over the jump. The point of the skip was that the
memory address in the skip instruction was the location to test or
(on PDP-6/10) sometimes an immediate operand or bit mask. There
wasn't room for a jump address.
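
The idiom described above (reverse the test and skip over the jump) can be sketched with a toy accumulator machine. This is only an illustration: the opcode names MOVE and CAMGE are borrowed from the PDP-6 example quoted earlier in the thread, while JUMP, HALT, the tuple encoding, and the memory labels are hypothetical and do not reflect real PDP-6 instruction formats.

```python
# Toy accumulator machine. MOVE and CAMGE mirror the quoted PDP-6 example;
# JUMP, HALT, and the (op, arg) encoding are hypothetical simplifications.

def run(program, memory, max_steps=100):
    """Execute a list of (op, arg) tuples and return the accumulator."""
    ac = 0
    ip = 0
    steps = 0
    while ip < len(program) and steps < max_steps:
        op, arg = program[ip]
        ip += 1                      # every instruction advances IP by one
        if op == "MOVE":             # AC <- memory[arg]
            ac = memory[arg]
        elif op == "CAMGE":          # skip next instruction if AC >= memory[arg]
            if ac >= memory[arg]:
                ip += 1              # the skip is just one extra increment
        elif op == "JUMP":           # unconditional jump to instruction index arg
            ip = arg
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
        steps += 1
    return ac

# Larger of A and B, as in the quoted example.
MAX_PROG = [("MOVE", "A"), ("CAMGE", "B"), ("MOVE", "B")]

# "Jump if AC < B": test the reverse condition (>=) and skip over the jump.
JUMP_IF_LESS = [
    ("MOVE", "A"),
    ("CAMGE", "B"),   # AC >= B: skip the jump
    ("JUMP", 5),      # reached only when AC < B
    ("MOVE", "GE"),   # fall-through path
    ("HALT", None),
    ("MOVE", "LT"),   # jump target
]

larger = run(MAX_PROG, {"A": 3, "B": 7})
path_less = run(JUMP_IF_LESS, {"A": 3, "B": 7, "GE": 1, "LT": 2})
path_ge = run(JUMP_IF_LESS, {"A": 9, "B": 7, "GE": 1, "LT": 2})
```

Note that the skip only ever changes the amount added to `ip`, so the address field of the instruction stays free to name the operand being tested.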

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: everything old is new again, Compare-and-branch vs PIC

<d3842b13-6457-4a7b-964d-ec0ce66447e2n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29473&group=comp.arch#29473

by: MitchAlsup - Sun, 11 Dec 2022 19:14 UTC

On Sunday, December 11, 2022 at 10:25:47 AM UTC-6, EricP wrote:
> John Levine wrote:
> > According to EricP <ThatWould...@thevillage.com>:
> >> robf...@gmail.com wrote:
> >>> One complication is dealing with external interrupts that may happen. My first
> >>> thought is to simply not allow them in the shadow of a PRED instruction. It
> >>> would be delaying interrupt processing by seven or eight instructions max.
> >>> Otherwise, the PRED state would need to be saved somehow and restored after
> >>> the interrupt. Possibly by saving a copy of the interrupted instruction plus
> >>> predicate.
> >> An alternative is a PRED prefix that only applies to a single instruction.
> >
> > Fifty years ago we called those skip instructions.
> >
> > On a PDP-6, to put the larger of A and B into accumulator AC:
> >
> > MOVE AC,A ; put A into AC
> > CAMGE AC,B ; skip if AC >= B
> > MOVE AC,B ; put B into AC
> >
> Not really - skip is a very short branch, forward only.
>
> Skip/Branch removes intervening instructions from the pipeline
> and no resources are assigned for them.
> (Consider how skip would behave in an in-order pipeline without branch
> prediction - it would stall at fetch until its conditional resolved.)
>
> Predication keeps the pipeline running, feeding the predicated
> instructions into the pipeline and tentatively assigns resources
> to them in case they do execute.
<
This might be a reasonable model if every instruction has its own
condition and maybe a unique condition code. It is not a reasonable
model where one condition casts a then-shadow and an else-shadow
over a number of instructions.
<
> Later if they do not execute it must patch the result environment
> so it looks like the disabled instructions never existed,
<
Then there are the cases where the same registers are used differently
in the then and else clauses. Patching this stuff up is "non trivial".
<
> which can possibly include propagating values to different locations
> and dynamically editing dependency chains accordingly,
> then clean up and recover the assigned resources.
> This must be done in such a way that it can take an interrupt/exception
> at any point and resolve to a predictable and restartable state.
>
> BTW on the simple cpu designs in days of yore, the advantage of
> skip conditional over an unconditional branch vs a conditional branch
> is for skip the HW instruction execute state machine always has the same
> state sequence and just changes the value added to IP, whereas BRcc has to
> change its whole state sequence depending on CC to perform/not-perform
> the offset fetch and add. So BRcc is more complex to implement than SKIPcc,
> and when you are paying per-gate for the cpu the cost difference matters.

Re: everything old is new again, Compare-and-branch vs PIC

<tn7l6c$28r07$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29490&group=comp.arch#29490

by: Stephen Fuld - Mon, 12 Dec 2022 16:34 UTC

On 12/11/2022 11:07 AM, John Levine wrote:
> According to EricP <ThatWouldBeTelling@thevillage.com>:
>>> Fifty years ago we called those skip instructions.
>
>> Not really - skip is a very short branch, forward only.
>>
>> Skip/Branch removes intervening instructions from the pipeline
>> and no resources are assigned for them.
>> (Consider how skip would behave in an in-order pipeline without branch
>> prediction - it would stall at fetch until its conditional resolved.)
>>
>> Predication keeps the pipeline running, feeding the predicated
>> instructions into the pipeline and tentatively assigns resources
>> to them in case they do execute.
>
> Seems to me that's more of an implementation decision. When it sees
> the skip why couldn't it treat that as a predicate on the next instruction?
> I realize that when we designed machines with skip instructions, they
> didn't have pipelines.
>
>> BTW on the simple cpu designs in days of yore, the advantage of
>> skip conditional over an unconditional branch vs a conditional branch
>> is for skip the HW instruction execute state machine always has the same
>> state sequence and just changes the value added to IP, whereas BRcc has to
>> change its whole state sequence depending on CC to perform/not-perform
>> the offset fetch and add. So BRcc is more complex to implement than SKIPcc,
>> and when you are paying per-gate for the cpu the cost difference matters.
>
> I don't ever recall seeing a machine that had condition codes and skip
> instructions. The skip instructions all did some kind of comparison or
> bit test. If you wanted the effect of a conditional jump you'd reverse
> the test and skip over the jump. The point of the skip was that the
> memory address in the skip instruction was the location to test or
> (on PDP-6/10) sometimes an immediate operand or bit mask. There
> wasn't room for a jump address.

I didn't know much about the PDP-6/10 except that they were 36 bit
systems, so I spent a little time researching them. I was struck by the
similarity of the instruction formats, etc. to the Univac 1100 series
(as opposed to other 36 bit systems such as the IBM 704 or the Honeywell
DPS6).

That is a preface to saying that the 1108 had skip instructions (they
were called "Test" instructions in 1100 parlance) as well as conditional
jump instructions. It was pretty much as you say: conditional jumps for
things like testing a register for zero, etc., but because operations like
comparing two operands needed both operands, there were Test instructions,
since there was not room in the instruction format for the two operands
and a jump target address.

As for condition codes, there were none for "operand compares", but
there were conditional jumps for "conditions" like overflow, FP
underflow, etc., which were kept as bits in the CPU's Processor Status
Register, which was not generally accessible by user programs.

The relative timings were interesting. Most "typical" instructions took
750 ns. Note that this was before caches, and basic memory access time was
also 750 ns. (Fetching the next instruction was normally overlapped with
executing the current instruction.) A conditional jump instruction took
750 ns if not taken and 1,500 ns if taken (it required extra time to fetch
the instruction at the target address). A test instruction took 875 ns
if the next instruction was executed, but 1,625 ns if it was skipped. I
have forgotten the reason for the "extra" time for test instructions.
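
As a small worked check on those figures, the sketch below computes the average cost of each instruction type for a given taken/skip probability. The four timings come from the paragraph above; the function names and the probability parameter are assumptions added for illustration.

```python
# Timings in nanoseconds, copied from the figures quoted in the post.
JUMP_NOT_TAKEN_NS = 750
JUMP_TAKEN_NS = 1500
TEST_NO_SKIP_NS = 875
TEST_SKIP_NS = 1625

def expected_jump_ns(p_taken):
    """Average conditional-jump time when it is taken with probability p_taken."""
    return (1 - p_taken) * JUMP_NOT_TAKEN_NS + p_taken * JUMP_TAKEN_NS

def expected_test_ns(p_skip):
    """Average test (skip) time when the skip happens with probability p_skip."""
    return (1 - p_skip) * TEST_NO_SKIP_NS + p_skip * TEST_SKIP_NS

# Penalty for redirecting the instruction stream, per instruction type.
jump_penalty = JUMP_TAKEN_NS - JUMP_NOT_TAKEN_NS
skip_penalty = TEST_SKIP_NS - TEST_NO_SKIP_NS

# Extra cost of a test over a jump, in each outcome.
extra_no_redirect = TEST_NO_SKIP_NS - JUMP_NOT_TAKEN_NS
extra_redirect = TEST_SKIP_NS - JUMP_TAKEN_NS
```

On these numbers the redirect penalty is the same 750 ns (one memory access) for both instruction types, and a test costs a constant 125 ns more than a jump whether or not it skips.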

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: everything old is new again, Compare-and-branch vs PIC

<WuJlL.9382$MVg8.184@fx12.iad>

https://www.novabbs.com/devel/article-flat.php?id=29491&group=comp.arch#29491

by: EricP - Mon, 12 Dec 2022 16:26 UTC

John Levine wrote:
> According to EricP <ThatWouldBeTelling@thevillage.com>:
>>> Fifty years ago we called those skip instructions.
>
>> Not really - skip is a very short branch, forward only.
>>
>> Skip/Branch removes intervening instructions from the pipeline
>> and no resources are assigned for them.
>> (Consider how skip would behave in an in-order pipeline without branch
>> prediction - it would stall at fetch until its conditional resolved.)
>>
>> Predication keeps the pipeline running, feeding the predicated
>> instructions into the pipeline and tentatively assigns resources
>> to them in case they do execute.
>
> Seems to me that's more of an implementation decision. When it sees
> the skip why couldn't it treat that as a predicate on the next instruction?
> I realize that when we designed machines with skip instructions, they
> didn't have pipelines.

It's cheapest to do a conditional increment on the IP.

>> BTW on the simple cpu designs in days of yore, the advantage of
>> skip conditional over an unconditional branch vs a conditional branch
>> is for skip the HW instruction execute state machine always has the same
>> state sequence and just changes the value added to IP, whereas BRcc has to
>> change its whole state sequence depending on CC to perform/not-perform
>> the offset fetch and add. So BRcc is more complex to implement than SKIPcc,
>> and when you are paying per-gate for the cpu the cost difference matters.
>
> I don't ever recall seeing a machine that had condition codes and skip
> instructions. The skip instructions all did some kind of comparison or
> bit test. If you wanted the effect of a conditional jump you'd reverse
> the test and skip over the jump. The point of the skip was that the
> memory address in the skip instruction was the location to test or
> (on PDP-6/10) sometimes an immediate operand or bit mask. There
> wasn't room for a jump address.

Skip on condition code is what I've seen.

PDP-8 has a skip instruction for various combinations of zero & carry.
It also has increment memory and skip EQ/NE zero.

DG Nova has a 3-bit skip field on ALU instructions for various
carry and zero tests on the result.

RCA 1802 8-bit microprocessor has instructions that skip over 1 or 2 bytes
on various carry and zero tests. It also has branch conditional
instructions, which is strange as skip should make them unnecessary.
But the 1802 does lots of strange things so that is par for the course.
For example, the short and long branches are 2 or 3 bytes but the skip is
over 1 or 2 bytes, so it really could only skip over a short branch.
They could have made it a conditional skip over 1, 2 or 3 bytes, but instead
they added two whole sets of short and long conditional branches.
And the branches are really jumps to short or long absolute addresses.
Ah, the 1970s.

Re: everything old is new again, Compare-and-branch vs PIC

<XuJlL.9383$MVg8.8632@fx12.iad>

https://www.novabbs.com/devel/article-flat.php?id=29492&group=comp.arch#29492

by: EricP - Mon, 12 Dec 2022 17:25 UTC

MitchAlsup wrote:
> On Sunday, December 11, 2022 at 10:25:47 AM UTC-6, EricP wrote:
>>
>> Predication keeps the pipeline running, feeding the predicated
>> instructions into the pipeline and tentatively assigns resources
>> to them in case they do execute.
> <
> This might be a reasonable model if every instruction has its own
> condition and maybe a unique condition code. It is not a reasonable
> model where one condition casts a then-shadow and an else-shadow
> over a number of instructions.

An architecture with predication must have an internal execution mechanism
for enabling/disabling the various uOps based on matching a predicate
condition to a predicate value, and selecting what enable/disable actions
each uOp takes. That execution mechanism should be independent of how the
predicate conditions are distributed to the uOps.

One simple but size-costly distribution method would be a PRED prefix
for each macro instruction.

From that point of view the PRED shadow mask is a code space saving
method for mass distribution of the predicate condition to multiple uOps.
But the back end execution mechanism should remain the same.
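
To illustrate the distinction between distribution and execution, here is a minimal sketch in which a shadow mask and a per-instruction prefix both reduce to the same per-uOp enable bits. The 'T'/'E' string is an invented stand-in for the mask; the real encoding of the PRED instruction under discussion is not given in this thread, and all function names are hypothetical.

```python
def enables_from_shadow(shadow, condition):
    """Expand one shadow mask into per-instruction enable bits.

    shadow: string with one character per following instruction,
            'T' for then-clause and 'E' for else-clause (hypothetical encoding).
    condition: the resolved predicate value.
    """
    enables = []
    for slot in shadow:
        if slot == "T":
            enables.append(condition)
        elif slot == "E":
            enables.append(not condition)
        else:
            raise ValueError(f"bad shadow slot {slot!r}")
    return enables

def enables_from_prefixes(prefixed_instrs, condition):
    """Same result, but each instruction carries its own prefix sense.

    prefixed_instrs: list of (sense, instr) where sense is True for
                     "execute if predicate true" and False for the inverse.
    """
    return [condition == sense for sense, _instr in prefixed_instrs]

shadow = "TTE"
prefixed = [(True, "add"), (True, "sub"), (False, "mov")]

via_mask = enables_from_shadow(shadow, condition=False)
via_prefix = enables_from_prefixes(prefixed, condition=False)
```

Both forms hand the back end the same enable bits; the mask form simply spends one PRED instruction where the prefix form spends one prefix per instruction.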

> <
>> Later if they do not execute it must patch the result environment
>> so it looks like the disabled instructions never existed,
> <
> Then there are the cases where the same registers are used differently
> in the then and else clauses. Patching this stuff up is "non trivial".

As I see it, the only really viable predicate execute mechanism
(because the alternatives have exponential complexity expansion)
is to treat most uOps similar to a CMOV, where each has two actions:

- if the uOp is enabled it operates on its source registers
  and writes its result as usual
- if the uOp is disabled it either (for in-order) patches the scoreboard
  dependencies or (for OoO) copies the original dest register value.

Once the predicate value has resolved and is forwarded to all its
dependent uOps, those uOps can prune out source register dependencies
that are no longer relevant so they don't block execution.

The net result of the above is that all predicated instructions
always execute an action, usually either operate or copy.
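
A minimal sketch of the operate-or-copy idea for the OoO case, in Python
purely for illustration. The names (PredUop, execute) and the model of the
physical register file as a dict are assumptions made for this example,
not a description of any real core.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PredUop:
    """A predicated uOp after renaming (hypothetical model)."""
    op: Callable[[int, int], int]  # operation performed when enabled
    src1: str
    src2: str
    old_dest: str    # physical register holding the previous dest value
    new_dest: str    # freshly allocated physical register
    pred_reg: str    # register whose lsb is the predicate value
    enable_on: int   # predicate condition: 1 = enable if true, 0 = enable if false


def execute(uop: PredUop, regs: Dict[str, int]) -> None:
    """Every predicated uOp performs an action: operate, or copy the old dest."""
    enabled = (regs[uop.pred_reg] & 1) == uop.enable_on
    if enabled:
        regs[uop.new_dest] = uop.op(regs[uop.src1], regs[uop.src2])
    else:
        # Disabled: behave like a CMOV that keeps the original value, so
        # later consumers of new_dest still see a valid result.
        regs[uop.new_dest] = regs[uop.old_dest]


regs = {"p0": 0, "p1": 10, "p2": 20, "p3_old": 99}
add = PredUop(lambda a, b: a + b, "p1", "p2", "p3_old", "p3_new", "p0", 1)

execute(add, regs)
disabled_result = regs["p3_new"]   # predicate false: old value copied

regs["p0"] = 1
execute(add, regs)
enabled_result = regs["p3_new"]    # predicate true: r1 + r2 written
```

The point of the model is that new_dest is written on both paths, so
nothing downstream needs to know whether the uOp was enabled.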

> <
>> which can possibly include propagating values to different locations
>> and dynamically editing dependency chains accordingly,
>> then clean up and recover the assigned resources.
>> This must be done in such a way that it can take an interrupt/exception
>> at any point and resolve to a predictable and restartable state.

Re: everything old is new again, Compare-and-branch vs PIC

<tn7p66$1ah5$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=29493&group=comp.arch#29493

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: everything old is new again, Compare-and-branch vs PIC
Date: Mon, 12 Dec 2022 17:42:30 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <tn7p66$1ah5$1@gal.iecc.com>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <bwnlL.106$%os8.36@fx03.iad> <tn59pj$22ql$1@gal.iecc.com> <WuJlL.9382$MVg8.184@fx12.iad>
Injection-Date: Mon, 12 Dec 2022 17:42:30 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="43557"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <bwnlL.106$%os8.36@fx03.iad> <tn59pj$22ql$1@gal.iecc.com> <WuJlL.9382$MVg8.184@fx12.iad>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)

by: John Levine - Mon, 12 Dec 2022 17:42 UTC

According to EricP <ThatWouldBeTelling@thevillage.com>:
>Skip on condition code is what I've seen.
>
>PDP-8 has a skip instruction for various combination of zero & carry.
>It also has increment memory and skip EQ/NE zero.

I also spent a lot of time programming a PDP-8 and I can promise you it
did not have condition codes. It had a single accumulator, and a "link"
bit which was more or less a carry flag. The skip instructions looked
at the AC and link, e.g., skip on negative AC, skip on zero link.

ISZ (increment and skip if zero) was how you did loop counters. There
was no ISNZ.
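
For readers who never used one, the ISZ loop-counter idiom can be sketched
roughly as below. This is a toy Python model, not PDP-8 code: the function
names and the address 0o200 are made up for the example. The only facts
relied on are that words are 12 bits and that ISZ increments a memory word
and skips the next instruction when the result is zero.

```python
MASK12 = 0o7777  # PDP-8 words are 12 bits wide


def isz(memory: dict, addr: int) -> bool:
    """Increment memory[addr] modulo 2**12; return True if the skip is taken."""
    memory[addr] = (memory[addr] + 1) & MASK12
    return memory[addr] == 0


def run_loop(iterations: int) -> int:
    """Run a do-while style loop counted by ISZ; iterations must be >= 1."""
    count_addr = 0o200                                   # hypothetical location
    memory = {count_addr: (-iterations) & MASK12}        # store minus the trip count
    body_runs = 0
    while True:
        body_runs += 1                 # the loop body
        if isz(memory, count_addr):    # ISZ COUNT
            break                      # result was zero: the JMP is skipped
        # JMP LOOP was not skipped, so go around again
    return body_runs


trips = run_loop(5)
```

The counter is preloaded with the negated trip count, so the skip fires
exactly when the count wraps to zero.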

>DG Nova has a 3-bit skip field on ALU instructions for various
>carry and zero tests on the result.

The Nova's skip instructions were similar to the PDP-8, not surprising
since the same guy designed them. Skip on carry, skip on zero register.
No condition codes either.

>RCA 1802 8-bit microprocessor has skip over 1 or 2 bytes instructions
>on various carry and zero test. It also has branch conditional
>instructions which is strange as skip should make them unnecessary.

Didn't use that one.

I did use the Varian 620i which had one- and two-word instructions,
but the skips only skipped one word. That led to some amusing bugs.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: 36 bit history, everything old is new again, Compare-and-branch vs PIC

<tn7qis$1fjr$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=29495&group=comp.arch#29495

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: 36 bit history, everything old is new again, Compare-and-branch vs PIC
Date: Mon, 12 Dec 2022 18:06:20 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <tn7qis$1fjr$1@gal.iecc.com>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <bwnlL.106$%os8.36@fx03.iad> <tn59pj$22ql$1@gal.iecc.com> <tn7l6c$28r07$1@dont-email.me>
Injection-Date: Mon, 12 Dec 2022 18:06:20 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="48763"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <bwnlL.106$%os8.36@fx03.iad> <tn59pj$22ql$1@gal.iecc.com> <tn7l6c$28r07$1@dont-email.me>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)

by: John Levine - Mon, 12 Dec 2022 18:06 UTC

According to Stephen Fuld <sfuld@alumni.cmu.edu.invalid>:
>I didn't know much about the PDP-6/10 except that they were 36 bit
>systems, so I spent a little time researching them. I was struck by the
>similarity of the instruction formats, etc. to the Univac 1100 series
>(as opposed to other 36 bit systems such as the IBM 704 or the Honeywell
>DPS6).

The 704 was the ur-36 bit machine in 1954 that inspired everything
else. (The 701 was sort of the beta version and the 704 fixed many of
its mistakes like half-word addressing and Williams tube memory.) The
704 architecture was carried forward to the 709, 7090, and 7094, then
killed in favor of S/360 but not without a fight within IBM.

The DPS6 was a 16/32 bit machine so I'm guessing you mean the 6000
series which was the 36-bit GE 600 series, on which DTSS and Multics
ran. It had extremely complicated addressing but only a single AC and
MQ which in retrospect wasn't a great choice. It did have condition
codes, zero, negative, carry, overflow, and conditional branches on
them.

All of the 36 bit machines ran out of address bits. Some had extended
addressing hacks which didn't help enough.

>As for condition codes, there were none for "operand compares", but
>there were conditional jumps for "conditions" like overflow, FP
>underflow, etc., which were kept as bits in the CPU's Processor Status
>Register, which was not generally accessible by user programs.

We called them flags and didn't use them much. They weren't much like
condition codes, they were set when something like overflow happened
and stayed on until tested and reset by a JFCL or overwritten by a
JRST instruction. In all the time I programmed a PDP-10 I don't recall
writing a JFCL other than as a no-op and a JRST other than as an
unconditional jump. JSP AC,.+1 saved the flags in a register if you
wanted for some reason to test them without resetting them. Never did
that either.

>The relative timings were interesting. Most "typical" instructions took
>750ns. Note that this was before caches and basic memory access time was
>also 750 ns. (Fetch next instruction was normally overlapped with
>executing the current instruction.) A conditional jump instruction took
>750 ns if not taken and 1,500 ns if taken (Required extra time to fetch
>the instruction at the target address) A test instruction took 875 ns
>if the next instruction was executed, but 1,625 ns if it was skipped. I
>have forgotten why the "extra" time for test instructions.

Probably the same as for jump: it had to refill the prefetch buffer when
it didn't execute the skipped instruction.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: 36 bit history, everything old is new again, Compare-and-branch vs PIC

<tn7spi$29fr8$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29497&group=comp.arch#29497

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: 36 bit history, everything old is new again, Compare-and-branch
vs PIC
Date: Mon, 12 Dec 2022 10:44:00 -0800
Organization: A noiseless patient Spider
Lines: 54
Message-ID: <tn7spi$29fr8$1@dont-email.me>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<bwnlL.106$%os8.36@fx03.iad> <tn59pj$22ql$1@gal.iecc.com>
<tn7l6c$28r07$1@dont-email.me> <tn7qis$1fjr$1@gal.iecc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 12 Dec 2022 18:44:02 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="24bb577af810ee4ebff4313310e58e91";
logging-data="2408296"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+DscskY1dCwGY2e93NEmLbFAHQtHZqENM="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:GMfX0y2POxgklwRuMUhNTuJurQk=
Content-Language: en-US
In-Reply-To: <tn7qis$1fjr$1@gal.iecc.com>

by: Stephen Fuld - Mon, 12 Dec 2022 18:44 UTC

On 12/12/2022 10:06 AM, John Levine wrote:
> According to Stephen Fuld <sfuld@alumni.cmu.edu.invalid>:
>> I didn't know much about the PDP-6/10 except that they were 36 bit
>> systems, so I spent a little time researching them. I was struck by the
>> similarity of the instruction formats, etc. to the Univac 1100 series
>> (as opposed to other 36 bit systems such as the IBM 704 or the Honeywell
>> DPS6).
>
> The 704 was the ur-36 bit machine in 1954 that inspired everything
> else.

The Univac 1103 was a 36 bit machine introduced in 1953, though it was
not compatible with later 1100 series machines.

https://en.wikipedia.org/wiki/UNIVAC_1100/2200_series

> (The 701 was sort of the beta version and the 704 fixed many of
> its mistakes like half-word addressing and Williams tube memory.) The
> 704 architecture was carried forward to the 709, 7090, and 7094, then
> killed in favor of S/360 but not without a fight within IBM.
>
> The DPB6 was a 16/32 bit machine so I'm guessing you mean the 6000
> series which was the 36-bit GE 600 series, on which DTSS and Multics
> ran.

Yes. My sloppy research. :-(

> It had extremely complicated addressing but only a single AC and
> MQ which in retrospect wasn't a great choice. It did have condition
> codes, zero, negative, carry, overflow, and conditional branches on
> them.
>
> All of the 36 bit machines ran out of address bits. Some had extended
> addressing hacks which didn't help enough.

I think it is fair to say that *all* computers introduced before say the
1980s, that survived, ran out of address bits. The 1100 series
introduced their Extended Mode, that survives (albeit in emulation) to
this day. I don't think that lack of address bits was the primary
reason for its decline.

I have often said that a reason for its decline was their marketing
department's utter failure to convince the world that 36 was an integral
power of 2. :-)

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: everything old is new again, Compare-and-branch vs PIC

<tn9i9j$2g22f$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29501&group=comp.arch#29501

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: everything old is new again, Compare-and-branch vs PIC
Date: Tue, 13 Dec 2022 03:56:58 -0600
Organization: A noiseless patient Spider
Lines: 134
Message-ID: <tn9i9j$2g22f$1@dont-email.me>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmtmlq$uo06$1@dont-email.me>
<45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
<kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com>
<bwnlL.106$%os8.36@fx03.iad>
<d3842b13-6457-4a7b-964d-ec0ce66447e2n@googlegroups.com>
<XuJlL.9383$MVg8.8632@fx12.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 13 Dec 2022 09:57:07 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="8eb80b557ad796800edc26f345886394";
logging-data="2623567"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19K6xSqkrCWMnAEwuCnK3rV"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:/KSW+t5uQhTfhzvOel2KVutAqFU=
In-Reply-To: <XuJlL.9383$MVg8.8632@fx12.iad>
Content-Language: en-US

by: BGB - Tue, 13 Dec 2022 09:56 UTC

On 12/12/2022 11:25 AM, EricP wrote:
> MitchAlsup wrote:
>> On Sunday, December 11, 2022 at 10:25:47 AM UTC-6, EricP wrote:
>>>
>>> Predication keeps the pipeline running, feeding the predicated
>>> instructions into the pipeline and tentatively assigns resources to
>>> them in case they do execute.
>> <
>> This might be a reasonable model if every instruction has its own
>> condition and maybe a unique condition code. It is not a reasonable
>> model where one condition casts a then-shadow and an else-shadow
>> over a number of instructions.
>
> An architecture with predication must have an internal execution mechanism
> for enabling/disabling the various uOps based on matching a predicate
> condition to a predicate value, and selecting what enable/disable actions
> each uOp takes. That execution mechanism should be independent to how the
> predicate conditions are distributed to the uOps.
>

In my case, the predication mode travels along with the opcode, and once
it moves into EX1, it gets remapped if needed.

Most normal operations turn into a NOP operation. Branches split into
BRA and BRA_NB (Branch Non-Branch). The main purpose of BRA_NB is that
if the branch-predictor had predicted a branch as taken, then BRA_NB
will initiate a branch to the following instruction (essentially a "BRA 0").
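
A rough sketch of that remapping step, in Python for illustration only.
The Pred enum and its values are hypothetical stand-ins for the 2-bit
predication field (the actual bit assignments are not given here), and
remap_at_ex1 is an invented name.

```python
from enum import Enum


class Pred(Enum):
    """Hypothetical predication modes; the real 2-bit encoding may differ."""
    ALWAYS = 0   # unconditional
    IF_T = 1     # execute if SR.T is set   (the ?T suffix)
    IF_F = 2     # execute if SR.T is clear (the ?F suffix)


def remap_at_ex1(opcode: str, pred: Pred, sr_t: bool) -> str:
    """Return the operation EX1 actually performs for a given SR.T."""
    enabled = (
        pred is Pred.ALWAYS
        or (pred is Pred.IF_T and sr_t)
        or (pred is Pred.IF_F and not sr_t)
    )
    if enabled:
        return opcode
    # A disabled branch becomes BRA_NB so that a wrongly predicted-taken
    # branch can be redirected to the following instruction; anything
    # else simply becomes a NOP.
    return "BRA_NB" if opcode == "BRA" else "NOP"


add_when_false = remap_at_ex1("ADD", Pred.IF_T, sr_t=False)
add_when_true = remap_at_ex1("ADD", Pred.IF_T, sr_t=True)
bra_when_false = remap_at_ex1("BRA", Pred.IF_T, sr_t=False)
```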

> One simple but size-costly distribution method would be a PRED prefix
> for each macro instruction.
>
> From that point of view the PRED shadow mask is a code space saving
> method for mass distribution of the predicate condition to multiple uOps.
> But the back end execution mechanism should remain the same.
>

As noted, I went with essentially dedicating 2 bits to this in every
instruction.

This is a little less than 32-bit ARM, albeit there are 3 bits spent on
encoding the 16/32 split.

Well, apart from 7xxx and 9xxx, which are more a case of these spaces
having been "mostly" unused in the 16-bit opcode map (whereas most other
options would have broken binary compatibility).

I had originally almost considered making XGPR a special operating
mode that would have essentially reclaimed all of the 16-bit space, say:
wZnm-ZeoZ
wZnm-Zeii

Where the high E/F has the '111' bits replaced by the (inverted) high
bits of the 3 register fields.

This encoding would have mostly avoided the orthogonality issues, but I
ended up choosing the "more conservative" option.

But, faced with some other issues, I am almost left to wonder if going
that route might have been better in the long run.

It is possible that I could still consider this option.

Pros: Regains some amount of orthogonality.
Cons: Operating mode hassle, complete loss of 16-bit ops in this mode.

So, say:
ADD?T R17, 0x123, R37 | SUB?F R17, 0x145, R37

Could then be encoded as two 32-bit instructions.

Handling would be otherwise similar to that of RISC-V mode; may need a
different PE/COFF machine-ID so that the PE loader knows to start the
program in this mode. Calling between modes would be slightly less of an
issue than BJX2 <-> RISC-V, since at least both sides will (more or
less) agree on the same ABI.

Basically, it is similar to the ARM32<->Thumb situation.

A mode switch would be needed because (for fairly obvious reasons) this
would otherwise break binary compatibility in a pretty major way.

>> <
>>> Later if they do not execute it must patch the result environment so
>>> it looks like the disabled instructions never existed,
>> <
>> Then there are the cases where the same registers are used differently
>> in the then and else clauses. Patching this stuff up is "non trivial".
>
> As I see it, the only really viable predicate execute mechanism
> (because the alternatives have exponential complexity expansion)
> is to treat most uOps similar to a CMOV, where each has two actions:
>
> - if the uOp is enabled it operates on its source registers
> and writes its result as usual
> - if the uOp is disabled it either (for in-order) patches the scoreboard
> dependencies or (for OoO) copies the original dest register value.
>
> Once the predicate value has resolved and is forwarded to all its
> dependent uOps then those uOps can prune out source dependencies register
> that are no longer relevant so they don't block execution.
>
> The net result of the above is that all predicated instructions
> always execute an action, usually either operate or copy.
>

Probably depends on the design of the core.

In my design, simply turning them into NOPs is sufficient.

Granted, my core is strictly in-order, and there is no scoreboard. So,
the NOPs are just "empty spots" in the pipeline.

As with NOPs, these instructions may still eat clock-cycles.
Because interlocking happens before they enter the EX stage, they may
also still trigger interlock stalls on other instructions.

>> <
>>> which can possibly include propagating values to different locations
>>> and dynamically editing dependency chains accordingly, then clean up
>>> and recover the assigned resources. This must be done in such a way
>>> that it can take an interrupt/exception at any point and resolve to a
>>> predictable and restartable state.
>

Re: everything old is new again, Compare-and-branch vs PIC

<T44mL.71538$gGD7.48801@fx11.iad>

https://www.novabbs.com/devel/article-flat.php?id=29508&group=comp.arch#29508

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx11.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: everything old is new again, Compare-and-branch vs PIC
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <tmtmlq$uo06$1@dont-email.me> <45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com> <kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com> <bwnlL.106$%os8.36@fx03.iad> <d3842b13-6457-4a7b-964d-ec0ce66447e2n@googlegroups.com> <XuJlL.9383$MVg8.8632@fx12.iad> <tn9i9j$2g22f$1@dont-email.me>
In-Reply-To: <tn9i9j$2g22f$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 63
Message-ID: <T44mL.71538$gGD7.48801@fx11.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Tue, 13 Dec 2022 19:08:35 UTC
Date: Tue, 13 Dec 2022 14:07:29 -0500
X-Received-Bytes: 3689

by: EricP - Tue, 13 Dec 2022 19:07 UTC

BGB wrote:
> On 12/12/2022 11:25 AM, EricP wrote:
>>
>> As I see it, the only really viable predicate execute mechanism
>> (because the alternatives have exponential complexity expansion)
>> is to treat most uOps similar to a CMOV, where each has two actions:
>>
>> - if the uOp is enabled it operates on its source registers
>> and writes its result as usual
>> - if the uOp is disabled it either (for in-order) patches the scoreboard
>> dependencies or (for OoO) copies the original dest register value.
>>
>> Once the predicate value has resolved and is forwarded to all its
>> dependent uOps then those uOps can prune out source dependencies register
>> that are no longer relevant so they don't block execution.
>>
>> The net result of the above is that all predicated instructions
>> always execute an action, usually either operate or copy.
>>
>
> Probably depends on the design of the core.
>
> In my design, simply turning them into NOPs is sufficient.
>
> Granted, my core is strictly in-order, and there is no scoreboard. So,
> the NOPs are just "empty spots" in the pipeline.
>
> As with NOPs, these instructions may still eat clock-cycles.
> Because interlocking happens before they enter the EX stage, they may
> also still trigger interlock stalls on other instructions.

WRT interlocking, yes that is exactly my point - they are not really
NOPs because in-order still has to track RAW and WAW register
dependencies and diddle the tracking flip-flops.

Say I have a predicate prefix that tests if the lsb of a register
matches a predicate condition. Eg
  ET rN:   Execute True prefix enables if rN[0] == 1
  EF rN:   Execute False prefix enables if rN[0] == 0

#1         CMP r0 = r9 > 1234 // set/clear r0 lsb
#2 ET r0: ADD r3 = r2 + r1    // enable if r0[0] == 1
#3         ADD r5 = r4 + r3

In the above instruction #2 the prefix tests lsb of r0 and
enables if it matches the T==1 or F==0 predicate condition.
#2 is RAW dependent on r0 for the predicate value
and may additionally be RAW on r2 and r1 if enabled.
#3 may be RAW dependent on r3 until r0 is resolved.

Either (a) the pipeline stalls #2 at Reg Read stage until r0 resolves,
or (b) it reads r2 and r1 anyway and marks r3 as Write Pending.
If it chooses (b) then #3 will RAW stall on r3, and if later r0 resolves
and disables #2 then #2 has to reset the Write Pending flag on r3
allowing #3 to continue execution.

In both cases instruction #2 either stalls for the r0 value or
proceeds to perform work like register read and then does housekeeping
on tracking flags to patch up after.
In both cases a disabled instruction still has visible side effects
so is not a NOP.
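
Option (b) can be walked through with a small Python model. This is only a
sketch under simplifying assumptions: the set named write_pending stands
in for the per-register tracking flags, the initial register values are
arbitrary, and the function name run_option_b is invented for the example.

```python
def run_option_b(r9: int) -> dict:
    """Trace instructions #1..#3 when #2 issues before r0 has resolved."""
    regs = {"r0": 0, "r1": 1, "r2": 2, "r3": 100, "r4": 4, "r5": 0, "r9": r9}
    write_pending = set()
    log = []

    # #1 CMP issues; r0 is pending until the compare resolves.
    write_pending.add("r0")

    # #2 issues without waiting for r0: it reads r2 and r1 anyway
    # and marks its destination r3 as Write Pending.
    a, b = regs["r2"], regs["r1"]
    write_pending.add("r3")

    # #3 needs r3, which is pending, so it RAW-stalls.
    if "r3" in write_pending:
        log.append("#3 stalls on r3")

    # #1 resolves and the predicate value is forwarded to #2.
    regs["r0"] = int(regs["r9"] > 1234)
    write_pending.discard("r0")

    if regs["r0"] & 1:
        regs["r3"] = a + b                   # enabled: normal writeback
        log.append("#2 writes r3")
    else:
        log.append("#2 disabled, resets Write Pending on r3")
    write_pending.discard("r3")              # the flag is cleared on both paths

    # #3 can now continue with whichever r3 value is current.
    regs["r5"] = regs["r4"] + regs["r3"]
    log.append("#3 executes")
    return {"r5": regs["r5"], "log": log}


enabled_case = run_option_b(2000)    # r9 > 1234, so #2 is enabled
disabled_case = run_option_b(10)     # r9 <= 1234, so #2 is disabled
```

In the disabled case #3 still stalled on r3 even though #2 never wrote it,
which is the visible side effect being described.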

Re: everything old is new again, Compare-and-branch vs PIC

<tnala2$2j5mv$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29510&group=comp.arch#29510

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: everything old is new again, Compare-and-branch vs PIC
Date: Tue, 13 Dec 2022 13:54:32 -0600
Organization: A noiseless patient Spider
Lines: 84
Message-ID: <tnala2$2j5mv$1@dont-email.me>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmtmlq$uo06$1@dont-email.me>
<45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
<kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com>
<bwnlL.106$%os8.36@fx03.iad>
<d3842b13-6457-4a7b-964d-ec0ce66447e2n@googlegroups.com>
<XuJlL.9383$MVg8.8632@fx12.iad> <tn9i9j$2g22f$1@dont-email.me>
<T44mL.71538$gGD7.48801@fx11.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 13 Dec 2022 19:54:42 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="8eb80b557ad796800edc26f345886394";
logging-data="2725599"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18X+8WR+Sc2BTS4pHDnGff7"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:VF5vpkrsFu8Ru04OYWDOYTrH/gM=
Content-Language: en-US
In-Reply-To: <T44mL.71538$gGD7.48801@fx11.iad>

by: BGB - Tue, 13 Dec 2022 19:54 UTC

On 12/13/2022 1:07 PM, EricP wrote:
> BGB wrote:
>> On 12/12/2022 11:25 AM, EricP wrote:
>>>
>>> As I see it, the only really viable predicate execute mechanism
>>> (because the alternatives have exponential complexity expansion)
>>> is to treat most uOps similar to a CMOV, where each has two actions:
>>>
>>> - if the uOp is enabled it operates on its source registers
>>>    and writes its result as usual
>>> - if the uOp is disabled it either (for in-order) patches the scoreboard
>>>    dependencies or (for OoO) copies the original dest register value.
>>>
>>> Once the predicate value has resolved and is forwarded to all its
>>> dependent uOps then those uOps can prune out source dependencies
>>> register
>>> that are no longer relevant so they don't block execution.
>>>
>>> The net result of the above is that all predicated instructions
>>> always execute an action, usually either operate or copy.
>>>
>>
>> Probably depends on the design of the core.
>>
>> In my design, simply turning them into NOPs is sufficient.
>>
>> Granted, my core is strictly in-order, and there is no scoreboard. So,
>> the NOPs are just "empty spots" in the pipeline.
>>
>> As with NOPs, these instructions may still eat clock-cycles.
>> Because interlocking happens before they enter the EX stage, they may
>> also still trigger interlock stalls on other instructions.
>
> WRT interlocking, yes that is exactly my point - they are not really
> NOPs because in-order still has to track RAW and WAW register
> dependencies and diddle the tracking flip-flops.
>
> Say I have a predicate prefix that tests if the lsb of a register
> matches a predicate condition. Eg
> ET rN:   Execute True prefix enables if rN[0] == 1
> EF rN:   Execute False prefix enables if rN[0] == 0
>
> #1         CMP r0 = r9 > 1234 // set/clear r0 lsb
> #2 ET r0: ADD r3 = r2 + r1    // enable if r0[0] == 1
> #3         ADD r5 = r4 + r3
>
> In the above instruction #2 the prefix tests lsb of r0 and
> enables if it matches the T==1 or F==0 predicate condition.
> #2 is RAW dependent on r0 for the predicate value
> and may additionally be RAW on r2 and r1 if enabled.
> #3 may be RAW dependent on r3 until r0 is resolved.
>
> Either (a) the pipeline stalls #2 at Reg Read stage until r0 resolves,
> or (b) it reads r2 and r1 anyway and marks r3 as Write Pending.
> If it chooses (b) then #3 will RAW stall on r3, and if later r0 resolves
> and disables #2 then #2 has to reset the Write Pending flag on r3
> allowing #3 to continue execution.
>
> In both cases instruction #2 either stalls for the r0 value or
> proceeds to perform work like register read and then does housekeeping
> on tracking flags to patch up after.
> In both cases a disabled instruction still has visible side effects
> so is not a NOP.
>

Sorta, I guess...

It is a NOP in terms of "high level behavior" or "How EX stages see it".

Or, not necessarily a NOP if one demands that "Every NOP eats exactly 1
clock cycle". These may eat multiple clock cycles for any register
dependencies that may have otherwise occurred (since this part happens
before it enters EX1, and thus before it can be known whether or not the
instruction will execute).

In theory, it could be possible to shortcut the predication during
ID1->ID2 by looking forward and verifying that nothing in the ID2 or EX1
stages can touch SR.T, and if so using the prior value of SR.T, however,
this isn't likely to save enough clock cycles to make it worth the extra
cost.

....

Re: everything old is new again, Compare-and-branch vs PIC

<ffae19fc-caa4-47ce-b730-70389d9f8186n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29511&group=comp.arch#29511

X-Received: by 2002:a0c:fdc4:0:b0:4d8:42df:19d8 with SMTP id g4-20020a0cfdc4000000b004d842df19d8mr1014691qvs.126.1670964796564;
Tue, 13 Dec 2022 12:53:16 -0800 (PST)
X-Received: by 2002:a05:6870:2f05:b0:132:6f79:9ffb with SMTP id
qj5-20020a0568702f0500b001326f799ffbmr12943oab.61.1670964796275; Tue, 13 Dec
2022 12:53:16 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 13 Dec 2022 12:53:16 -0800 (PST)
In-Reply-To: <tnala2$2j5mv$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:d63:36fb:50ef:7088;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:d63:36fb:50ef:7088
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmtmlq$uo06$1@dont-email.me> <45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
<kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com>
<bwnlL.106$%os8.36@fx03.iad> <d3842b13-6457-4a7b-964d-ec0ce66447e2n@googlegroups.com>
<XuJlL.9383$MVg8.8632@fx12.iad> <tn9i9j$2g22f$1@dont-email.me>
<T44mL.71538$gGD7.48801@fx11.iad> <tnala2$2j5mv$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ffae19fc-caa4-47ce-b730-70389d9f8186n@googlegroups.com>
Subject: Re: everything old is new again, Compare-and-branch vs PIC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 13 Dec 2022 20:53:16 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 6203

by: MitchAlsup - Tue, 13 Dec 2022 20:53 UTC

On Tuesday, December 13, 2022 at 1:54:45 PM UTC-6, BGB wrote:
> On 12/13/2022 1:07 PM, EricP wrote:
> > BGB wrote:
> >> On 12/12/2022 11:25 AM, EricP wrote:
> >>>
> >>> As I see it, the only really viable predicate execute mechanism
> >>> (because the alternatives have exponential complexity expansion)
> >>> is to treat most uOps similar to a CMOV, where each has two actions:
> >>>
> >>> - if the uOp is enabled it operates on its source registers
> >>> and writes its result as usual
> >>> - if the uOp is disabled it either (for in-order) patches the scoreboard
> >>> dependencies or (for OoO) copies the original dest register value.
> >>>
> >>> Once the predicate value has resolved and is forwarded to all its
> >>> dependent uOps then those uOps can prune out source dependencies
> >>> register
> >>> that are no longer relevant so they don't block execution.
> >>>
> >>> The net result of the above is that all predicated instructions
> >>> always execute an action, usually either operate or copy.
> >>>
> >>
> >> Probably depends on the design of the core.
> >>
> >> In my design, simply turning them into NOPs is sufficient.
> >>
> >> Granted, my core is strictly in-order, and there is no scoreboard. So,
> >> the NOPs are just "empty spots" in the pipeline.
> >>
> >> As with NOPs, these instructions may still eat clock-cycles.
> >> Because interlocking happens before they enter the EX stage, they may
> >> also still trigger interlock stalls on other instructions.
> >
> > WRT interlocking, yes that is exactly my point - they are not really
> > NOPs because in-order still has to track RAW and WAW register
> > dependencies and diddle the tracking flip-flops.
> >
> > Say I have a predicate prefix that tests if the lsb of a register
> > matches a predicate condition. Eg
> > ET rN: Execute True prefix enables if rN[0] == 1
> > EF rN: Execute False prefix enables if rN[0] == 0
> >
> > #1 CMP r0 = r9 > 1234 // set/clear r0 lsb
> > #2 ET r0: ADD r3 = r2 + r1 // enable if r0[0] == 1
> > #3 ADD r5 = r4 + r3
> >
> > In the above instruction #2 the prefix tests lsb of r0 and
> > enables if it matches the T==1 or F==0 predicate condition.
> > #2 is RAW dependent on r0 for the predicate value
> > and may additionally be RAW on r2 and r1 if enabled.
> > #3 may be RAW dependent on r3 until r0 is resolved.
> >
> > Either (a) the pipeline stalls #2 at Reg Read stage until r0 resolves,
> > or (b) it reads r2 and r1 anyway and marks r3 as Write Pending.
> > If it chooses (b) then #3 will RAW stall on r3, and if later r0 resolves
> > and disables #2 then #2 has to reset the Write Pending flag on r3
> > allowing #3 to continue execution.
> >
> > In both cases instruction #2 either stalls for the r0 value or
> > proceeds to perform work like register read and then does housekeeping
> > on tracking flags to patch up after.
> > In both cases a disabled instruction still has visible side effects
> > so is not a NOP.
> >
> Sorta, I guess...
>
> It is a NOP in terms of "high level behavior" or "How EX stages see it".
>
> Or, not necessarily a NOP if one demands that "Every NOP eats exactly 1
> clock cycle".
<
On a 10-wide machine, does a NoOp eat exactly 1 cycle ?? and what does
1 cycle mean on a 10-wide machine ??
<
> These may eat multiple clock cycles for any register
> dependencies that may have otherwise occurred (since, this part happens
> before it enters EX1, and thus before it can be known whether or not the
> instruction will execute).
<
On GBOoO machines, many to most instructions do not appear to add any
latency to the rest of the executing instructions !! This makes the above
definition intractable.
>
>
> In theory, it could be possible to shortcut the predication during
> ID1->ID2 by looking forward and verifying that nothing in the ID2 or EX1
> stages can touch SR.T, and if so using the prior value of SR.T, however,
> this isn't likely to save enough clock cycles to make it worth the extra
> cost.
<
The way predication is defined in My 66000 is that the front end is
free to skip all instructions that are not supposed to be executed--but
this comes with a restriction that all of the instructions can be categorized
into exactly 1 of 2 groups (then-clause and else-clause). I started out by
allowing instructions to be common, but the compiler could not figure
out how to use them, so we simplified the encoding--I may further simplify
the encoding from here........
>
> ...

Re: everything old is new again, Compare-and-branch vs PIC

<tnba86$2kho9$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29512&group=comp.arch#29512

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: everything old is new again, Compare-and-branch vs PIC
Date: Tue, 13 Dec 2022 19:51:57 -0600
Organization: A noiseless patient Spider
Lines: 226
Message-ID: <tnba86$2kho9$1@dont-email.me>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmtmlq$uo06$1@dont-email.me>
<45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
<kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com>
<bwnlL.106$%os8.36@fx03.iad>
<d3842b13-6457-4a7b-964d-ec0ce66447e2n@googlegroups.com>
<XuJlL.9383$MVg8.8632@fx12.iad> <tn9i9j$2g22f$1@dont-email.me>
<T44mL.71538$gGD7.48801@fx11.iad> <tnala2$2j5mv$1@dont-email.me>
<ffae19fc-caa4-47ce-b730-70389d9f8186n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 14 Dec 2022 01:52:06 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="4bc0f3af3433c271a0af7f51f322cf2c";
logging-data="2770697"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/oEFJ/pZ3y0efK9CyEAjy0"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:okf/A27hPSBQns8N5BpJgXv7Fh0=
In-Reply-To: <ffae19fc-caa4-47ce-b730-70389d9f8186n@googlegroups.com>
Content-Language: en-US

by: BGB - Wed, 14 Dec 2022 01:51 UTC

On 12/13/2022 2:53 PM, MitchAlsup wrote:
> On Tuesday, December 13, 2022 at 1:54:45 PM UTC-6, BGB wrote:
>> On 12/13/2022 1:07 PM, EricP wrote:
>>> BGB wrote:
>>>> On 12/12/2022 11:25 AM, EricP wrote:
>>>>>
>>>>> As I see it, the only really viable predicate execute mechanism
>>>>> (because the alternatives have exponential complexity expansion)
>>>>> is to treat most uOps similar to a CMOV, where each has two actions:
>>>>>
>>>>> - if the uOp is enabled it operates on its source registers
>>>>> and writes its result as usual
>>>>> - if the uOp is disabled it either (for in-order) patches the scoreboard
>>>>> dependencies or (for OoO) copies the original dest register value.
>>>>>
>>>>> Once the predicate value has resolved and is forwarded to all its
>>>>> dependent uOps then those uOps can prune out source dependencies
>>>>> register
>>>>> that are no longer relevant so they don't block execution.
>>>>>
>>>>> The net result of the above is that all predicated instructions
>>>>> always execute an action, usually either operate or copy.
>>>>>
>>>>
>>>> Probably depends on the design of the core.
>>>>
>>>> In my design, simply turning them into NOPs is sufficient.
>>>>
>>>> Granted, my core is strictly in-order, and there is no scoreboard. So,
>>>> the NOPs are just "empty spots" in the pipeline.
>>>>
>>>> As with NOPs, these instructions may still eat clock-cycles.
>>>> Because interlocking happens before they enter the EX stage, they may
>>>> also still trigger interlock stalls on other instructions.
>>>
>>> WRT interlocking, yes that is exactly my point - they are not really
>>> NOPs because in-order still has to track RAW and WAW register
>>> dependencies and diddle the tracking flip-flops.
>>>
>>> Say I have a predicate prefix that tests if the lsb of a register
>>> matches a predicate condition. Eg
>>> ET rN: Execute True prefix enables if rN[0] == 1
>>> EF rN: Execute False prefix enables if rN[0] == 0
>>>
>>> #1 CMP r0 = r9 > 1234 // set/clear r0 lsb
>>> #2 ET r0: ADD r3 = r2 + r1 // enable if r0[0] == 1
>>> #3 ADD r5 = r4 + r3
>>>
>>> In the above instruction #2 the prefix tests lsb of r0 and
>>> enables if it matches the T==1 or F==0 predicate condition.
>>> #2 is RAW dependent on r0 for the predicate value
>>> and may additionally be RAW on r2 and r1 if enabled.
>>> #3 may be RAW dependent on r3 until r0 is resolved.
>>>
>>> Either (a) the pipeline stalls #2 at Reg Read stage until r0 resolves,
>>> or (b) it reads r2 and r1 anyway and marks r3 as Write Pending.
>>> If it chooses (b) then #3 will RAW stall on r3, and if later r0 resolves
>>> and disables #2 then #2 has to reset the Write Pending flag on r3
>>> allowing #3 to continue execution.
>>>
>>> In both cases instruction #2 either stalls for the r0 value or
>>> proceeds to perform work like register read and then does housekeeping
>>> on tracking flags to patch up after.
>>> In both cases a disabled instruction still has visible side effects
>>> so is not a NOP.
>>>
>> Sorta, I guess...
>>
>> It is a NOP in terms of "high level behavior" or "How EX stages see it".
>>
>> Or, not necessarily a NOP if one demands that "Every NOP eats exactly 1
>> clock cycle".
> <
> On a 10-wide machine, does a NoOp eat exactly 1 cycle ?? and what does
> 1 cycle mean on a 10-wide machine ??
> <

This was a response to the assertion that a NOP from a predicated-away
instruction was not a true NOP because it still has to deal with
register dependencies and similar (so, more like a phantom version of
the instruction, rather than a "true NOP").

The only real alternative then, is to assume that a NOP has a known
latency (say, 1 cycle).

But, yeah, 0 cycles is another possible interpretation, but 0 cycle NOPs
aren't really a workable definition for an in-order machine.
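A minimal sketch of the point being argued, in Python, assuming EricP's option (b): the predicated ADD marks r3 as Write Pending before r0 is known, and the flag is cleared either when the predicate resolves false or when the real result is written back. The `Scoreboard` class, the `run` helper, and the cycle numbers (predicate resolves in cycle 2, ADD latency of 1) are all made up for illustration and do not describe any of the cores discussed here.

```python
from dataclasses import dataclass, field


@dataclass
class Scoreboard:
    # Registers that currently have a write in flight (hypothetical model).
    pending: set = field(default_factory=set)

    def can_issue(self, sources):
        # RAW check: stall while any source register still has a pending write.
        return not (self.pending & set(sources))


def run(predicate_true, resolve_cycle=2, add_latency=1):
    """Return the cycle in which '#3 ADD r5 = r4 + r3' is able to issue."""
    sb = Scoreboard()
    # Cycle 0: '#2 ET r0: ADD r3 = r2 + r1' issues speculatively and
    # marks its destination as Write Pending.
    sb.pending.add("r3")
    cycle = 1
    while True:
        if cycle == resolve_cycle and not predicate_true:
            # Predicate resolved false: #2 is disabled, patch up the flag.
            sb.pending.discard("r3")
        if cycle == resolve_cycle + add_latency and predicate_true:
            # Predicate resolved true: the real result is written back.
            sb.pending.discard("r3")
        if sb.can_issue(["r4", "r3"]):
            return cycle
        cycle += 1
```

With these assumed timings, `run(False)` gives 2 and `run(True)` gives 3. A "true NOP" in slot #2 would have let #3 issue in cycle 1, so even the disabled instruction costs #3 a cycle here, which is the sense in which it is more of a phantom instruction than a NOP.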

>> These may eat multiple clock cycles for any register
>> dependencies that may have otherwise occurred (since, this part happens
>> before it enters EX1, and thus before it can be known whether or not the
>> instruction will execute).
> <
> On GBOoO machines, many to most instructions do not appear to add any
> latency to the rest of the executing instructions !! This makes the above
> definition intractable.

Possibly, I was thinking for a simple in-order machine.

Instruction latency on GBOoO is more a "throw hands in the air and say
'who knows'?" thing.

Then again, I guess it was interesting to observe that some types of
optimizations which help on BJX2 also help on x86-64, despite x86-64
not having enough registers to actually make use of the optimization as
expressed (x86-64 seems to be able to pretend as-if the stack
spills/fills were registers).

Meanwhile, I am left needing to use lots of registers, and occasional
optimizations which try to side-step a few (store then load) cases, then
recently needing to fix a bug where one of these optimizations broke due
to a type-handling issue in my compiler (was using the type of the value
stored to an array, rather than the type of the value as loaded from the
array).

Well, along with other issues, like local variables having their storage
culled while the compiler missed that some of these variables were stored
to but not loaded from, resulting in them having a wild-card offset
relative to the stack-pointer (thus leading to some annoying
stack-corruption bugs).

....

>>
>>
>> In theory, it could be possible to shortcut the predication during
>> ID1->ID2 by looking forward and verifying that nothing in the ID2 or EX1
>> stages can touch SR.T, and if so using the prior value of SR.T, however,
>> this isn't likely to save enough clock cycles to make it worth the extra
>> cost.
> <
> The way predication is defined in My 66000 is that the front end is
> free to skip all instructions that are not supposed to be executed--but
> this comes with a restriction that all of the instructions can be categorized
> into exactly 1 of 2 groups (then-clause and else-clause). I started out by
> allowing instructions to be common, but the compiler could not figure
> out how to use them, so we simplified the encoding--I may further simplify
> the encoding from here........

Nothing requires that they are not skipped in my case either, just that
this is beyond what my current implementation can manage.

Rules are mostly similar to normal instructions, just with the
difference that ?T and ?F instructions are allowed to have register
conflicts with each other in bundles on account of them being mutually
exclusive.
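As a rough illustration of that bundling rule, here is a small Python sketch of a conflict check that lets two ops share a destination only when one is ?T and the other is ?F. The `Op` tuple and both function names are hypothetical, and only destination (write-write) conflicts are modelled; the real bundler rules may well cover more cases than this.

```python
from typing import NamedTuple, Optional


class Op(NamedTuple):
    dest: str                   # destination register name
    pred: Optional[str] = None  # "T", "F", or None for unconditional


def mutually_exclusive(a, b):
    # One op runs only when SR.T is set, the other only when it is clear,
    # so at most one of them can actually write its destination.
    return {a.pred, b.pred} == {"T", "F"}


def bundle_ok(ops):
    # Reject the bundle if two ops write the same register, unless the
    # pair is ?T/?F and therefore cannot both execute.
    for i, a in enumerate(ops):
        for b in ops[i + 1:]:
            if a.dest == b.dest and not mutually_exclusive(a, b):
                return False
    return True
```

So `bundle_ok([Op("r3", "T"), Op("r3", "F")])` is allowed, while two ?T ops writing r3, or an unconditional op paired with a ?F op on the same register, would be rejected.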

BTW: I put up a vote on Twitter wanting to see what the general
sentiment was on possible ways to "resolve" the encoding orthogonality
issues with R32..R63:
https://twitter.com/cr88192/status/1602801136590782466

Partly, because "add a new CPU mode and sort of half-way address the
issue" is kind of, not exactly a clear win...

The new encoding would at least allow R32..R63 to be used
in all the same contexts as R0..R31, but still can't address some of the
other orthogonality issues, mostly because there is basically no way
possible to fit everything I would want into a 32 bit instruction word.

I have already experimented with adding some of the features needed for
the new mode to the Verilog, and everything seems to survive (mostly
involves some minor tweaks to the handling of the Link Register and
similar, increasing the number of CPU modes from 4 to 16).

Say:
0: BJX2 + Scalar (16/32)
1: BJX2 + WEX (16/32/64/96)
2: RISC-V (Generic 16/32)
3: RISC-V + WEX (TBD)
4: BJX2_XGP2 + Scalar (32)
5: BJX2_XGP2 + WEX (32/64/96)
6..15: Reserved

This will come at the cost that the status-flag bits for SIMD comparison
(PQRO) will no longer be preserved across function calls (replaced by
additional CPU-mode bits).

Was also tempted to replace '3', noting that this mode is probably DOA
(in favor of handling RISC-V as superscalar). Mode-3 would have required
a load-time bundle conversion, which in turn requires additional
metadata to do safely.

They will also have new Machine IDs in PEL4, say:
B232: Start in Mode 0|1 (32-bit pointers)
B264: Start in Mode 0|1 (64-bit pointers)
B296: Start in Mode 0|1 (but uses 128-bit pointers)
B265: Start in Mode 4|5 (64-bit pointers)
B297: Start in Mode 4|5 (128-bit pointers)
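For reference, the two lists above can be written down as lookup tables. This is only a sketch in Python: the table contents are copied from the lists, "0|1" is read as "either mode 0 or mode 1" (an assumption), and the helper names `describe` and `isa_family` are invented here rather than taken from any loader.

```python
# Mode numbers and names as listed above; modes 6..15 are reserved.
CPU_MODES = {
    0: "BJX2 + Scalar",
    1: "BJX2 + WEX",
    2: "RISC-V",
    3: "RISC-V + WEX",
    4: "BJX2_XGP2 + Scalar",
    5: "BJX2_XGP2 + WEX",
}

# Machine ID -> (allowed start modes, pointer width in bits).
MACHINE_IDS = {
    "B232": ((0, 1), 32),
    "B264": ((0, 1), 64),
    "B296": ((0, 1), 128),
    "B265": ((4, 5), 64),
    "B297": ((4, 5), 128),
}


def describe(machine_id):
    # Hypothetical helper: human-readable summary of one machine ID.
    modes, ptr_bits = MACHINE_IDS[machine_id]
    names = " or ".join(CPU_MODES[m] for m in modes)
    return f"{machine_id}: {ptr_bits}-bit pointers, starts in {names}"


def isa_family(machine_id):
    # Hypothetical helper: the two families are not binary compatible,
    # so a loader would need to tell them apart from the machine ID.
    modes, _ = MACHINE_IDS[machine_id]
    if all("XGP2" in CPU_MODES[m] for m in modes):
        return "XGP2"
    return "baseline"
```

All six defined mode numbers, and the reserved ones up to 15, fit in a 4-bit mode field, which matches the move from 4 to 16 CPU modes.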


Re: everything old is new again, Compare-and-branch vs PIC

<6d6de22f-53be-4275-aee4-44f6581ff0a4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29513&group=comp.arch#29513

X-Received: by 2002:a37:711:0:b0:6fe:c86a:c1c4 with SMTP id 17-20020a370711000000b006fec86ac1c4mr11967798qkh.518.1670991586220;
Tue, 13 Dec 2022 20:19:46 -0800 (PST)
X-Received: by 2002:a05:6870:e251:b0:148:1264:1054 with SMTP id
d17-20020a056870e25100b0014812641054mr82975oac.9.1670991585634; Tue, 13 Dec
2022 20:19:45 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 13 Dec 2022 20:19:45 -0800 (PST)
In-Reply-To: <tnba86$2kho9$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:ac13:15b9:822f:bbc8;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:ac13:15b9:822f:bbc8
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmtmlq$uo06$1@dont-email.me> <45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
<kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com>
<bwnlL.106$%os8.36@fx03.iad> <d3842b13-6457-4a7b-964d-ec0ce66447e2n@googlegroups.com>
<XuJlL.9383$MVg8.8632@fx12.iad> <tn9i9j$2g22f$1@dont-email.me>
<T44mL.71538$gGD7.48801@fx11.iad> <tnala2$2j5mv$1@dont-email.me>
<ffae19fc-caa4-47ce-b730-70389d9f8186n@googlegroups.com> <tnba86$2kho9$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6d6de22f-53be-4275-aee4-44f6581ff0a4n@googlegroups.com>
Subject: Re: everything old is new again, Compare-and-branch vs PIC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 14 Dec 2022 04:19:46 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 3784

by: MitchAlsup - Wed, 14 Dec 2022 04:19 UTC

On Tuesday, December 13, 2022 at 7:52:10 PM UTC-6, BGB wrote:
> On 12/13/2022 2:53 PM, MitchAlsup wrote:

> Then again, I guess it was interesting to observe that some types of
> optimizations which help on BJX2 also help on x86-64, despite x86-64
> having not enough registers to actually make use of the optimization as
> expressed (x86-64 seems to be able to pretend as-if the stack
> spills/fills were registers).
>
The stack on x86 is defined to always be aligned to the size of the GPRs.
>
> Meanwhile, I am left needing to use lots of registers, and occasional
> optimizations which try to side-step a few (store then load) cases, then
> recently needing to fix a bug where one of these optimizations broke due
> to a type-handling issue in my compiler (was using the type of the value
> stored to an array, rather than the type of the value as loaded from the
> array).
<
If there is only 1 set of registers, this problem vanishes.
> <snip>
> Nothing requires that they are not skipped in my case either, just that
> this is beyond what my current implementation can manage.
>
This was a discussion about NoOps and the necessity of having them in ISA.
If they are present they have to have some defined meaning, EricP and I
are trying to ascertain exactly what the prescription needs to be. Being able
to be skipped fails several of the assertions in this thread.
<
> Rules are mostly similar to normal instructions, just with the
> difference that ?T and ?F instructions are allowed to have register
> conflicts with each other in bundles on account of them being mutually
> exclusive.
>
>
>
> BTW: I put up a vote on Twitter wanting to see what the general
> sentiment was on possible ways to "resolve" the encoding orthogonality
> issues with R32..R63:
> https://twitter.com/cr88192/status/1602801136590782466
>
There was a recent poll on some site and 56% of Americans do not think
that Arabic numerals should be taught in schools, too; and 15% don't have
an opinion.
https://www.snopes.com/fact-check/teaching-arabic-numerals/
We are well on the way to Idiocracy.
>

Re: everything old is new again, Compare-and-branch vs PIC

<tnbvfc$2oj2d$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29514&group=comp.arch#29514

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: everything old is new again, Compare-and-branch vs PIC
Date: Wed, 14 Dec 2022 01:54:11 -0600
Organization: A noiseless patient Spider
Lines: 205
Message-ID: <tnbvfc$2oj2d$1@dont-email.me>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmtmlq$uo06$1@dont-email.me>
<45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
<kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com>
<bwnlL.106$%os8.36@fx03.iad>
<d3842b13-6457-4a7b-964d-ec0ce66447e2n@googlegroups.com>
<XuJlL.9383$MVg8.8632@fx12.iad> <tn9i9j$2g22f$1@dont-email.me>
<T44mL.71538$gGD7.48801@fx11.iad> <tnala2$2j5mv$1@dont-email.me>
<ffae19fc-caa4-47ce-b730-70389d9f8186n@googlegroups.com>
<tnba86$2kho9$1@dont-email.me>
<6d6de22f-53be-4275-aee4-44f6581ff0a4n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 14 Dec 2022 07:54:20 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="4bc0f3af3433c271a0af7f51f322cf2c";
logging-data="2903117"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19qk/eWVK5QqlkTmEl2fzWf"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:8wd3ChER9w5WVqj4x5ue8eIJx20=
Content-Language: en-US
In-Reply-To: <6d6de22f-53be-4275-aee4-44f6581ff0a4n@googlegroups.com>

by: BGB - Wed, 14 Dec 2022 07:54 UTC

On 12/13/2022 10:19 PM, MitchAlsup wrote:
> On Tuesday, December 13, 2022 at 7:52:10 PM UTC-6, BGB wrote:
>> On 12/13/2022 2:53 PM, MitchAlsup wrote:
>
>> Then again, I guess it was interesting to observe that some types of
>> optimizations which help on BJX2 also help on x86-64, despite x86-64
>> having not enough registers to actually make use of the optimization as
>> expressed (x86-64 seems to be able to pretend as-if the stack
>> spills/fills were registers).
>>
> The stack on x86 is defined to always be aligned to the size of the GPRs.

Yeah. As noted, current ABI rules generally enforce a 16B alignment for
the stack on BJX2 as well (with stack items generally following their
"natural alignment"). CPU isn't smart enough to treat memory as
registers though.

>>
>> Meanwhile, I am left needing to use lots of registers, and occasional
>> optimizations which try to side-step a few (store then load) cases, then
>> recently needing to fix a bug where one of these optimizations broke due
>> to a type-handling issue in my compiler (was using the type of the value
>> stored to an array, rather than the type of the value as loaded from the
>> array).
> <
> If there is only 1 set of registers, this problem vanishes.

There is only one type of register in my case, but this particular issue
was closer to the C level rather than the ISA level.

When storing something to an array, and then loading it back again, it
was reported as the type of the value that had been stored rather than
the type of the array element.

So, say, if one tries storing "int" to a "short*" array, or "void*" to a
"Foo*" array, everything is fine. But, if a load returns 'int' or
'void*' rather than the intended type, this may lead to other problems.

Options are one of:
Types match exactly, allow it through as-is;
Types are "compatible", quietly coerce the type;
Quietly convert it into a type-cast operation;
Actually do the load.
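To make the failure mode concrete, here is a small Python sketch of a "short" array backed by a bytearray, comparing a real store-then-load against a forwarded value. All of the function names are made up for illustration, and a little-endian 16-bit signed element is assumed; this is not how BGBCC represents any of it.

```python
def store_short(mem, index, value):
    # Store truncates to the 16-bit element width, as a real store would.
    mem[2 * index:2 * index + 2] = (value & 0xFFFF).to_bytes(2, "little")


def load_short(mem, index):
    # Load yields a value of the array's element type (signed 16-bit).
    return int.from_bytes(mem[2 * index:2 * index + 2], "little", signed=True)


def forward_buggy(stored_value):
    # The bug: skip the load and reuse the stored value with its own type.
    return stored_value


def forward_coerced(stored_value):
    # The fix (option 2 or 3): coerce to the element type, here by
    # sign-wrapping into the 16-bit range.
    return ((stored_value + 0x8000) & 0xFFFF) - 0x8000


def demo(value=70000):
    mem = bytearray(4)
    store_short(mem, 0, value)
    return load_short(mem, 0), forward_buggy(value), forward_coerced(value)
```

For an "int" of 70000 stored to the "short" array, the real load gives 4464, the buggy forward gives 70000, and the coerced forward again gives 4464.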

>> <snip>
>> Nothing requires that they are not skipped in my case either, just that
>> this is beyond what my current implementation can manage.
>>
> This was a discussion about NoOps and the necessity of having them in ISA.
> If they are present they have to have some defined meaning, EricP and I
> are trying to ascertain exactly what the prescription needs to be. Being able
> to be skipped fails several of the assertions in this thread.

Fair enough.

Probably depends some on the role and context:
As padding (in the ISA);
For burning clock-cycles;
Or, as an empty-spot within the pipeline (resulting either from an
interlock or not otherwise having an instruction to run in that location
at that time).

Within my pipeline, major-opcode 6'h00 is regarded as NOP.
There is another class of NOP expressed as essentially "CONV.MOV ZR, ZR".

> <
>> Rules are mostly similar to normal instructions, just with the
>> difference that ?T and ?F instructions are allowed to have register
>> conflicts with each other in bundles on account of them being mutually
>> exclusive.
>>
>>
>>
>> BTW: I put up a vote on Twitter wanting to see what the general
>> sentiment was on possible ways to "resolve" the encoding orthogonality
>> issues with R32..R63:
>> https://twitter.com/cr88192/status/1602801136590782466
>>
> There was a recent poll on some site and 56% of Americans do not think
> that Arabic numerals should be taught in schools, too; and 15% don't have
> an opinion.
> https://www.snopes.com/fact-check/teaching-arabic-numerals/
> We are well on the way to Idiocracy.

"Oh those squiggles, what do they mean. Radix-10 positional arithmetic,
what sorcery is this! We all know true numbers look like MCMXCIX!", then
proceeds to lose their crap if someone tries to bring up zero or
negative numbers...

Thus far, the dominant response seems to be that people are against
having 64 GPRs. Would have preferred better prompts, but each was
limited to 25 characters, which isn't really enough for this.

I was more just wondering what the general sentiment was, rather than
committing to follow with whatever option wins.

But, yeah, the "least effort" option is "leave everything as-is", where:
BGBCC does not use R32..R63 by default unless enabled via a command-line
option (except for the 128-bit ABI).

If not enabled, as far as BGBCC and the assembler are concerned, these
registers do not exist:
-fxgpr: Allows ASM to use these registers.
-fxgpr_ena: Allow BGBCC to use them for register allocation.
-fxgpr_abi: Allow ABI to use them for argument passing
  Increases the number of register arguments to 16 (*).
  But, also increases ABI's register spill space to 128 bytes.

*: Increasing the number of register arguments from 8 to 16 seems to
increase the number of function calls which fit entirely in registers
from around 80% to around 98%.

Using R32..R63 can help slightly for things like TKRA-GL and JPEG
decoding, but is slightly detrimental in many other cases (globally
enabling them is slightly detrimental to performance and code density
with the existing encoding scheme).

Part of the issue is likely due to the orthogonality issue:
Prevents cases where instructions could have been bundled;
Forces using 64-bit Op64 encodings in many cases (as a fallback);
...

This seems to be enough to offset the (arguably small) reduction in the
number of stack spill-and-fill (spill-and-fill is bad; but 64-bit
instruction encodings being a roadblock to the WEXifier is also bad...).

By design, the new encoding would eliminate cases where Op64 encodings
are needed to deal with XGPR (but where the op could otherwise fit into
a 32-bit encoding).

But, it does allow the "majority of all functions" to switch almost or
entirely to static-assigning all of the local variables to registers.
This mostly applies to non-leaf functions in this case; as the majority
of leaf functions are already able to go full-static with 32 GPRs, but
non-leaf functions can't use scratch registers for static assignment.

Say, 32 GPRs:
Leaf function: Has ~ 26 registers it can use in this case.
Non-Leaf: Has ~ 14 registers it can use in this case.

With 64 GPRs:
Leaf function: Has ~ 58 registers it can use.
Non-Leaf: Has ~ 30 registers it can use.

Partial reason being, in a leaf function, nothing is going to stomp the
scratch registers, but non-leaf functions need to deal with these
registers getting stomped during function calls.

On average, functions seem to have roughly 17 variables in total (in a
roughly Gaussian distribution), with the number of function arguments
seemingly following a geometric distribution.
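Putting those numbers together as a quick check, in Python. The register budgets and the 17-variable average are the approximate figures given above; the `BUDGET` table and `fits_static` helper are only an illustration, and ignore everything else a real allocator has to care about (temporaries, argument registers, and so on).

```python
# Approximate registers usable for static assignment, from the figures above.
BUDGET = {
    (32, "leaf"): 26,
    (32, "non-leaf"): 14,
    (64, "leaf"): 58,
    (64, "non-leaf"): 30,
}


def fits_static(n_vars, gprs, kind):
    # True if every local variable could be statically assigned a register.
    return n_vars <= BUDGET[(gprs, kind)]


AVERAGE_VARS = 17  # rough average number of variables per function
```

For the average function: `fits_static(17, 32, "leaf")` holds, `fits_static(17, 32, "non-leaf")` does not, and `fits_static(17, 64, "non-leaf")` does, which is the case where 64 GPRs makes the difference.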

Though, using fixed-length 32-bit ops would be bad for size-optimized
code. So, for example, the new encoding mode would be a bad idea for the
Boot ROM.

Would need to finish implementing and then test it though.

Compared to other possibilities, the current scheme was chosen to be
"fairly conservative" (trying to minimize changes needed both to my
Verilog code and to BGBCC needed to add support for it).

It will basically reuse nearly all of the existing instruction decoding
as-is, with mostly a few tweaks related to instruction-length and the
WN/WM/WI bits and similar.

It will still have Jumbo encodings, just now with the possibility of
there being 16 unique Jumbo prefixes (rather than 2). Not sure yet what
I will do with the additional Jumbo prefixes.

Implicitly, a lot of the Imm5u fields will become Imm6s (as with the
existing XGPR encodings).

Stuff in BGBCC is still going to be a mess, but nothing new here... This
is where I currently expect most of the "fight" will be.

Changes needed to my emulator should also be fairly modest.

But, yeah, an obvious drawback is that it would fracture the ISA into
two variants which will not be binary compatible with each other (hence
why different CPU modes and MachineID values are needed for this).

The new mode did require making some semantic changes to the handling of
the Link Register, but my existing code doesn't notice the change.
Mostly changing some of the high-order bits in LR, and changing it to
*always* be encoded as the Inter-ISA encoding (which will now be the
standard encoding for LR).

Re: everything old is new again, Compare-and-branch vs PIC

<tnc39r$1uf0$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=29515&group=comp.arch#29515

Path: i2pn2.org!i2pn.org!aioe.org!pcXxviI/vaijo7Aoz0JW3Q.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: everything old is new again, Compare-and-branch vs PIC
Date: Wed, 14 Dec 2022 09:59:39 +0100
Organization: Aioe.org NNTP Server
Message-ID: <tnc39r$1uf0$1@gioia.aioe.org>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmtmlq$uo06$1@dont-email.me>
<45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
<kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com>
<bwnlL.106$%os8.36@fx03.iad>
<d3842b13-6457-4a7b-964d-ec0ce66447e2n@googlegroups.com>
<XuJlL.9383$MVg8.8632@fx12.iad> <tn9i9j$2g22f$1@dont-email.me>
<T44mL.71538$gGD7.48801@fx11.iad> <tnala2$2j5mv$1@dont-email.me>
<ffae19fc-caa4-47ce-b730-70389d9f8186n@googlegroups.com>
<tnba86$2kho9$1@dont-email.me>
<6d6de22f-53be-4275-aee4-44f6581ff0a4n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="63968"; posting-host="pcXxviI/vaijo7Aoz0JW3Q.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.14
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Wed, 14 Dec 2022 08:59 UTC

MitchAlsup wrote:
> On Tuesday, December 13, 2022 at 7:52:10 PM UTC-6, BGB wrote:
>> On 12/13/2022 2:53 PM, MitchAlsup wrote:
>
>> Then again, I guess it was interesting to observe that some types of
>> optimizations which help on BJX2 also help on x86-64, despite x86-64
>> having not enough registers to actually make use of the optimization as
>> expressed (x86-64 seems to be able to pretend as-if the stack
>> spills/fills were registers).
>>
> The stack on x86 is defined to always be aligned to the size of the GPRs.

That is an OS/software convention afaik?

I.e. at least in real (16-bit) and 32-bit mode you _can_ misalign your
stack pointer as long as you don't enable the check_alignment control
word flag?

My mime-ascii executable text starts by pushing a word of 0000h and a
word of 0ffffh on the stack, then incrementing the stack pointer so that
a pop will return 00ffh.
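
The byte shuffling can be checked with a toy model in Python. This assumes a little-endian stack growing downwards with 16-bit pushes and pops, as in real mode; the `push16`, `pop16`, and `demo` helpers and the 16-byte memory are invented for the illustration.

```python
def push16(mem, sp, value):
    # Stack grows downwards; the word is stored little-endian.
    sp -= 2
    mem[sp:sp + 2] = value.to_bytes(2, "little")
    return sp


def pop16(mem, sp):
    # Read a little-endian word at SP, then move SP up by two.
    value = int.from_bytes(mem[sp:sp + 2], "little")
    return value, sp + 2


def demo():
    mem = bytearray(16)
    sp = len(mem)
    sp = push16(mem, sp, 0x0000)
    sp = push16(mem, sp, 0xFFFF)
    sp += 1  # deliberately misalign the stack pointer by one byte
    value, sp = pop16(mem, sp)
    return value
```

After the increment, the pop reads the high byte of 0ffffh as its low byte and the low byte of 0000h as its high byte, so `demo()` returns 0x00FF.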
>>
>> Meanwhile, I am left needing to use lots of registers, and occasional
>> optimizations which try to side-step a few (store then load) cases, then
>> recently needing to fix a bug where one of these optimizations broke due
>> to a type-handling issue in my compiler (was using the type of the value
>> stored to an array, rather than the type of the value as loaded from the
>> array).
> <
> If there is only 1 set of registers, this problem vanishes.
>> <snip>
>> Nothing requires that they are not skipped in my case either, just that
>> this is beyond what my current implementation can manage.
>>
> This was a discussion about NoOps and the necessity of having them in ISA.
> If they are present they have to have some defined meaning, EricP and I
> are trying to ascertain exactly what the prescription needs to be. Being able
> to be skipped fails several of the assertions in this thread.
> <
>> Rules are mostly similar to normal instructions, just with the
>> difference that ?T and ?F instructions are allowed to have register
>> conflicts with each other in bundles on account of them being mutually
>> exclusive.
>>
>>
>>
>> BTW: I put up a vote on Twitter wanting to see what the general
>> sentiment was on possible ways to "resolve" the encoding orthogonality
>> issues with R32..R63:
>> https://twitter.com/cr88192/status/1602801136590782466
>>
> There was a recent poll on some site and 56% of Americans do not think
> that Arabic numerals should be taught in schools, too; and 15% don't have
> an opinion.
> https://www.snopes.com/fact-check/teaching-arabic-numerals/
> We are well on the way to Idiocracy.

It is much worse in the US than here, but that might just be a matter of
time?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

movapd/movaps/movdqa vs movupd/movups/movdqu (was: everything old ...)

<2022Dec14.132439@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=29521&group=comp.arch#29521

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: movapd/movaps/movdqa vs movupd/movups/movdqu (was: everything old ...)
Date: Wed, 14 Dec 2022 12:24:39 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 78
Message-ID: <2022Dec14.132439@mips.complang.tuwien.ac.at>
References: <tnc39r$1uf0$1@gioia.aioe.org> <memo.20221214103401.4144U@jgd.cix.co.uk>
Injection-Info: reader01.eternal-september.org; posting-host="cfa8ac94f106378cf6eab29da2a4208e";
logging-data="2955587"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ahnMUuir9F3oShWKSk4AR"
Cancel-Lock: sha1:BUHBFFHAzyFlhVimFNwvR6CTqjk=
X-newsreader: xrn 10.11

by: Anton Ertl - Wed, 14 Dec 2022 12:24 UTC

jgd@cix.co.uk (John Dallman) writes:
>There are SSE2 instructions that deal with aligned pairs of doubles, such
>as <https://en.wikipedia.org/wiki/MOVAPD>. Those trap on addresses that
>aren't 16-byte aligned, even if alignment checking is turned off. They
>produce a General Protection trap, not an alignment trap.

And there is vmovapd that moves from/to a zmm register and that traps
if the address is not 64-byte aligned.

One difference between SSE and AVX is that the implicit memory
accesses of the load-and-op instructions of SSE require 16-byte
alignment, while the implicit memory accesses of such instructions in
AVX and later extensions do not require alignment.

>There are
>versions for unaligned addresses, but they're slower.

A common myth, and it was brought up for justifying that gcc's
auto-vectorization breaks a program by compiling movdqa instead of
movdqu, so I measured it for the routine for which the bug was
reported <http://www.complang.tuwien.ac.at/anton/autovectors/>.

It turned out that even on the K10 and the Core 2, which were
explicitly mentioned as affected CPUs, the movdqa version (16-byte
alignment required) was either the same speed (K10) as the movdqu
version (no alignment required, but present in the benchmark) or was
slower (Core 2) than the unvectorized version written in
standard-compliant C (see below). On the Core 2 the movdqa version
was faster than the movdqu version by a factor 1.0014, on the others
they were essentially the same speed.

[Side note: Rewriting the program in standard-compliant C (by using
memcpy() instead of dereferencing pointers to 64-bit values) resulted
in the program not being auto-vectorized on the same compiler and was
significantly slower on 4 out of 5 CPUs (in particular, by a factor of
1.7 on Haswell and Skylake).]
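
A minimal sketch of the standard-compliant access pattern mentioned in the side note: reading and writing a 64-bit value through memcpy() rather than through a cast pointer. This is not the benchmarked routine; load_u64 and store_u64 are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 64-bit value from any byte address without
   dereferencing a possibly misaligned uint64_t pointer. */
uint64_t load_u64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);   /* well defined for any alignment of p */
    return v;
}

/* Hypothetical helper: write a 64-bit value to any byte address. */
void store_u64(unsigned char *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}
```

The round trip works at an odd offset such as buf + 3, where dereferencing a uint64_t pointer would be undefined behaviour in standard C.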

Given the defaults for the implicit loads of AVX instructions, I
expect instructions that do not require alignment are at least as fast
as those that require alignment on CPUs that support AVX. And indeed,
an Intel performance guy writes
<https://software.intel.com/comment/1470256#comment-1470256> that
vmovdqu vs. vmovdqa is neutral for performance.

One interesting thing about the auto-vectorized code was that it uses
movdqa (requiring 16-byte alignment) for loads and movups for stores.
That is probably because even if you assume (based on the pointer
type) that both pointers are 8-byte aligned, if you load two items
aligned to a 16-byte boundary, the corresponding store may be
misaligned wrt the 16-byte requirement of movdqa/movaps/movapd.
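
The reasoning above can be sketched with plain address arithmetic. The sketch assumes 8-byte elements and 8-byte-aligned pointers, and that the loop first processes single elements until the load address reaches a 16-byte boundary; the function names are hypothetical.

```c
#include <stdint.h>

/* Number of 8-byte elements to process one at a time before `addr` reaches
   a 16-byte boundary. Assumes addr is 8-byte aligned, so the result is 0 or 1. */
unsigned peel_elements(uintptr_t addr)
{
    return (unsigned)((addr & 15u) / 8u);
}

/* After peeling so that the load address is 16-byte aligned, is the
   corresponding store address 16-byte aligned as well? */
int store_aligned_after_peel(uintptr_t load_addr, uintptr_t store_addr)
{
    uintptr_t skipped = 8u * peel_elements(load_addr);
    return ((store_addr + skipped) & 15u) == 0;
}
```

The store ends up aligned only when both pointers have the same offset modulo 16; with loads at 0x1008 and stores at 0x2000, the loads become aligned after one element while the stores land on 0x2008.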

One strange thing here is that one supposedly should not mix types:
Movdqu and movups are architecturally the same instruction, but movdqu
is for integers, while movups is for 32-bit floats, and there are
supposedly performance penalties for mixing types.

The other issue is: For code like this that performs the same number
of loads and stores, I would have chosen to align the store pointer to
16 bytes rather than the load pointer, because most CPUs can perform
more loads than stores per cycle, and the idea is that it can then
perform an unaligned load by performing two aligned loads in one
cycle.

I wondered if maybe the store buffer is used to make adjacent
unaligned stores as cheap as aligned stores, but another
microbenchmark
<http://www.complang.tuwien.ac.at/anton/unaligned-stores/> refutes
this theory.

Nevertheless, in the "other results" in
<http://www.complang.tuwien.ac.at/anton/autovectors/>, the (actually)
unaligned store variants were as fast on K10, Sandy Bridge, and
Skylake as the same benchmarks with actually aligned store pointers.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: everything old is new again, Compare-and-branch vs PIC

<8jlmL.4515$b7Kc.1476@fx39.iad>

https://www.novabbs.com/devel/article-flat.php?id=29522&group=comp.arch#29522

Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx39.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: sco...@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: everything old is new again, Compare-and-branch vs PIC
Newsgroups: comp.arch
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com> <bwnlL.106$%os8.36@fx03.iad> <d3842b13-6457-4a7b-964d-ec0ce66447e2n@googlegroups.com> <XuJlL.9383$MVg8.8632@fx12.iad> <tn9i9j$2g22f$1@dont-email.me> <T44mL.71538$gGD7.48801@fx11.iad> <tnala2$2j5mv$1@dont-email.me> <ffae19fc-caa4-47ce-b730-70389d9f8186n@googlegroups.com> <tnba86$2kho9$1@dont-email.me> <6d6de22f-53be-4275-aee4-44f6581ff0a4n@googlegroups.com>
Lines: 18
Message-ID: <8jlmL.4515$b7Kc.1476@fx39.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Wed, 14 Dec 2022 14:44:20 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Wed, 14 Dec 2022 14:44:20 GMT
X-Received-Bytes: 1772

by: Scott Lurndal - Wed, 14 Dec 2022 14:44 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Tuesday, December 13, 2022 at 7:52:10 PM UTC-6, BGB wrote:
>> On 12/13/2022 2:53 PM, MitchAlsup wrote:
>

>> BTW: I put up a vote on Twitter wanting to see what the general
>> sentiment was on possible ways to "resolve" the encoding orthogonality
>> issues with R32..R63:
>> https://twitter.com/cr88192/status/1602801136590782466
>>
>There was a recent poll on some site and 56% of Americans do not think
>that Arabic numerals should be taught in schools, too; and 15% don't have
>an opinion.
>https://www.snopes.com/fact-check/teaching-arabic-numerals/
>We are well on the way to Idiocracy.

3000 responses out of 330 million Americans. How representative is
that survey?

Re: everything old is new again, Compare-and-branch vs PIC

<tncs86$2qru3$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29523&group=comp.arch#29523

Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: everything old is new again, Compare-and-branch vs PIC
Date: Wed, 14 Dec 2022 08:05:24 -0800
Organization: A noiseless patient Spider
Lines: 36
Message-ID: <tncs86$2qru3$1@dont-email.me>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com>
<bwnlL.106$%os8.36@fx03.iad>
<d3842b13-6457-4a7b-964d-ec0ce66447e2n@googlegroups.com>
<XuJlL.9383$MVg8.8632@fx12.iad> <tn9i9j$2g22f$1@dont-email.me>
<T44mL.71538$gGD7.48801@fx11.iad> <tnala2$2j5mv$1@dont-email.me>
<ffae19fc-caa4-47ce-b730-70389d9f8186n@googlegroups.com>
<tnba86$2kho9$1@dont-email.me>
<6d6de22f-53be-4275-aee4-44f6581ff0a4n@googlegroups.com>
<8jlmL.4515$b7Kc.1476@fx39.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 14 Dec 2022 16:05:26 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="4f9c89852f0db64b5197c00f5b3a7907";
logging-data="2977731"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18v1zJjQP04qaqBVmIS1z6AXrpyzhHqxD8="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:BmVI/K88RyUQaq1qdNp3ZZn6S8c=
In-Reply-To: <8jlmL.4515$b7Kc.1476@fx39.iad>
Content-Language: en-US

by: Stephen Fuld - Wed, 14 Dec 2022 16:05 UTC

On 12/14/2022 6:44 AM, Scott Lurndal wrote:
> MitchAlsup <MitchAlsup@aol.com> writes:
>> On Tuesday, December 13, 2022 at 7:52:10 PM UTC-6, BGB wrote:
>>> On 12/13/2022 2:53 PM, MitchAlsup wrote:
>>
>
>>> BTW: I put up a vote on Twitter wanting to see what the general
>>> sentiment was on possible ways to "resolve" the encoding orthogonality
>>> issues with R32..R63:
>>> https://twitter.com/cr88192/status/1602801136590782466
>>>
>> There was a recent poll on some site and 56% of Americans do not think
>> that Arabic numerals should be taught in schools, too; and 15% don't have
>> an opinion.
>> https://www.snopes.com/fact-check/teaching-arabic-numerals/
>> We are well on the way to Idiocracy.
>
> 3000 responses out of 330 million americans. How representative is
> that survey?

The SNOPES article shows the margin of error to be 3%. So, assuming it
was a non-biased sample, and the firm doing the survey seems to be
reputable, pretty good. Would you feel better if they said it was 53%?
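
As a rough check on the sample-size question, a worked estimate under the assumption of a simple random sample, the worst case p = 0.5, and 95% confidence (these assumptions are not taken from the survey itself):

```latex
\[
\mathrm{MOE}_{95\%} \approx 1.96\sqrt{\frac{p(1-p)}{n}}
  = 1.96\sqrt{\frac{0.5 \times 0.5}{3000}} \approx 0.018
\]
```

That is about 1.8 percentage points. The population size does not appear in the formula, because the finite-population correction is negligible when n is tiny compared with 330 million. A published margin can be larger than this idealized figure once weighting or design effects are included.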

But it is interesting that they were trying to get a bias, i.e. anti
Arab, and picked a question where they expected few to realize the
origin (actually the origin was Indian, but brought to the western world
by Arabs) of the numerals. As it is a specific piece of "trivia", I am
not sure how significant it is.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: everything old is new again, Compare-and-branch vs PIC

<bc76d960-0863-49a8-8e72-0e44a684f4ben@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29524&group=comp.arch#29524

Newsgroups: comp.arch

X-Received: by 2002:ad4:4506:0:b0:4e6:ca24:d339 with SMTP id k6-20020ad44506000000b004e6ca24d339mr271298qvu.115.1671043605971;
Wed, 14 Dec 2022 10:46:45 -0800 (PST)
X-Received: by 2002:a05:6870:1b8a:b0:144:9878:46be with SMTP id
hm10-20020a0568701b8a00b00144987846bemr369038oab.245.1671043605387; Wed, 14
Dec 2022 10:46:45 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 14 Dec 2022 10:46:45 -0800 (PST)
In-Reply-To: <tnbvfc$2oj2d$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:9e8:644b:7a5b:28fa;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:9e8:644b:7a5b:28fa
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmtmlq$uo06$1@dont-email.me> <45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
<kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com>
<bwnlL.106$%os8.36@fx03.iad> <d3842b13-6457-4a7b-964d-ec0ce66447e2n@googlegroups.com>
<XuJlL.9383$MVg8.8632@fx12.iad> <tn9i9j$2g22f$1@dont-email.me>
<T44mL.71538$gGD7.48801@fx11.iad> <tnala2$2j5mv$1@dont-email.me>
<ffae19fc-caa4-47ce-b730-70389d9f8186n@googlegroups.com> <tnba86$2kho9$1@dont-email.me>
<6d6de22f-53be-4275-aee4-44f6581ff0a4n@googlegroups.com> <tnbvfc$2oj2d$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <bc76d960-0863-49a8-8e72-0e44a684f4ben@googlegroups.com>
Subject: Re: everything old is new again, Compare-and-branch vs PIC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 14 Dec 2022 18:46:45 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 9907

by: MitchAlsup - Wed, 14 Dec 2022 18:46 UTC

On Wednesday, December 14, 2022 at 1:54:24 AM UTC-6, BGB wrote:
> On 12/13/2022 10:19 PM, MitchAlsup wrote:
> > On Tuesday, December 13, 2022 at 7:52:10 PM UTC-6, BGB wrote:
> >> On 12/13/2022 2:53 PM, MitchAlsup wrote:
> >
> >> Then again, I guess it was interesting to observe that some types of
> >> optimizations which help on BJX2 also help on x86-64, despite x86-64
> >> having not enough registers to actually make use of the optimization as
> >> expressed (x86-64 seems to be able to pretend as-if the stack
> >> spills/fills were registers).
> >>
> > The stack on x86 is defined to always be aligned to the size of the GPRs.
> Yeah. As noted, current ABI rules generally enforce a 16B alignment for
> the stack on BJX2 as well (with stack items generally following their
> "natural alignment"). CPU isn't smart enough to treat memory as
> registers though.
> >>
> >> Meanwhile, I am left needing to use lots of registers, and occasional
> >> optimizations which try to side-step a few (store then load) cases, then
> >> recently needing to fix a bug where one of these optimizations broke due
> >> to a type-handling issue in my compiler (was using the type of the value
> >> stored to an array, rather than the type of the value as loaded from the
> >> array).
> > <
> > If there is only 1 set of registers, this problem vanishes.
> There is only one type of register in my case, but this particular issue
> was closer to the C level rather than the ISA level.
>
> When storing something to an array, and then loading it back again, it
> was reported as the type of the value that had been stored rather than
> the type of the array element.
>
> So, say, if one tries storing "int" to a "short*" array, or "void*" to a
> "Foo*" array, everything is fine. But, if a load returns 'int' or
> 'void*' rather than the intended type, this may lead to other problems.
<
Had your base port been from LLVM instead of GCC this likely would not
have happened.
GCC is more of a K&R (anything goes) while LLVM is more of a C-done-right
(ADA-style)
>
> Options are one of:
> Types match exactly, allow it through as-is;
> Types are "compatible", quietly coerce the type;
> Quietly convert it into a type-cast operation;
> Actually do the load.
<
What if the int was 32-bits and short* was 64-bits ? Somehow, 32-bits must
get invented prior to the store.
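
A minimal C sketch of the store-then-load case discussed above, assuming the compiler optimization in question forwards the stored value instead of reloading it. It uses uint16_t rather than short so that the narrowing is fully defined by the C standard; the function names are hypothetical and this is not BGBCC's code.

```c
#include <stdint.h>

/* Reference behaviour: really store to the array, then really load. */
int store_then_load(int value)
{
    volatile uint16_t array[1];
    array[0] = (uint16_t)value;
    return array[0];
}

/* A correct forwarding optimization skips the memory round trip but still
   converts the value to the array element type. */
int forwarded_correct(int value)
{
    return (uint16_t)value;
}

/* The bug described above: the forwarded value keeps the type of the
   stored value instead of the element type. */
int forwarded_buggy(int value)
{
    return value;
}
```

For values that fit in 16 bits all three agree, which is why such a bug can go unnoticed; for 70000 the real load yields 4464 while the buggy forwarding yields 70000.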
>
> >> <snip>
> >> BTW: I put up a vote on Twitter wanting to see what the general
> >> sentiment was on possible ways to "resolve" the encoding orthogonality
> >> issues with R32..R63:
> >> https://twitter.com/cr88192/status/1602801136590782466
> >>
> > There was a recent poll on some site and 56% of Americans do not think
> > that Arabic numerals should be taught in schools, too; and 15% don't have
> > an opinion.
> > https://www.snopes.com/fact-check/teaching-arabic-numerals/
> > We are well on the way to Idiocracy.
> "Oh those squiggles, what do they mean. Radix-10 positional arithmetic,
> what sorcery is this! We all know true numbers look like MCMXCIX!", then
> proceeds to lose their crap if someone tries to bring up zero or
> negative numbers...
<
I had a 7-th grade algebra teacher state that you cannot do Multiplication and
Division in Roman Numerals. It took but a single day to show her the fallacy of
her ways.
>
>
> Thus far, the dominant response seems to be that people are against
> having 64 GPRs. Would have preferred better prompts, but each was
> limited to 25 characters, which isn't really enough for this.
<
More than 60% of the subroutines from the LLVM front end, EMBench, and
CoreMark can be compiled <essentially> optimally with the 16-temporary
registers my ABI provides. Of the rest only 2 subroutines (from 1000+)
do any stack push/pops of temporary values (not associated with
subroutine calling), and this is without a FP RF ! and only 32 total registers.
<
I am not against 64 registers, I just don't see the "value add" of consuming
that many more instruction bits for "that fewer" instructions. That is: do
64 registers buy anything? The old data was that 16->32 registers bought 15%
while 32->64 bought only 3% and may constrain the OpCode layout. So,
is it worth it:: does it buy more than it costs? .....
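
The encoding cost can be sketched with simple arithmetic. The sketch assumes a fixed 32-bit instruction with three register operands, which is an illustrative layout and not a description of any particular ISA; the function names are hypothetical.

```c
/* Bits needed to name one of `registers` registers: ceil(log2(registers)). */
unsigned specifier_bits(unsigned registers)
{
    unsigned bits = 0;
    while (bits < 32 && (1u << bits) < registers)
        bits++;
    return bits;
}

/* Bits left for opcode, immediates, etc. in an instruction of `insn_bits`
   bits that names `operands` registers (illustrative layout only). */
unsigned remaining_bits(unsigned insn_bits, unsigned operands, unsigned registers)
{
    return insn_bits - operands * specifier_bits(registers);
}
```

Under these assumptions, going from 32 to 64 registers costs one bit per register field, leaving 14 instead of 17 bits for everything else in a three-operand instruction.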
<
One thing I will note is that having constants universally available is like
having 3-5 more registers in your file. Sort-of-like LD-Ops make the ISA
as efficient as if it were a LD-only machine with 3-6 more registers.
>
> I was more just wondering what the general sentiment was, rather than
> committing to follow with whatever option wins.
>
>
>
> But, yeah, the "least effort" option is "leave everything as-is", where:
> BGBCC does not use R32..R63 by default unless enabled via a command-line
> option (except for the 128-bit ABI).
>
> If not enabled, as far as BGBCC and the assembler are concerned, these
> registers do not exist:
> -fxgpr: Allows ASM to use these registers.
> -fxgpr_ena: Allow BGBCC to use them for register allocation.
> -fxgpr_abi: Allow ABI to use them for argument passing
> Increases the number of register arguments to 16 (*).
> But, also increases ABI's register spill space to 128 bytes.
>
> *: Increasing the number of register arguments from 8 to 16 seems to
> increase the number of function calls which fit entirely in registers
> from around 80% to around 98%.
>
Interesting data point, thanks. Any idea as to where 15 would fall ??
>
> Using R32..R63 can help slightly for things like TKRA-GL and JPEG
> decoding, but is slightly detrimental in many other cases (globally
> enabling them is slightly detrimental to performance and code density
> with the existing encoding scheme).
<
And therein lies the rub.
>
> Part of the issue is likely due to the orthogonality issue:
> Prevents cases where instructions could have been bundled;
> Forces using 64-bit Op64 encodings in many cases (as a fallback);
> ...
rub with liniment.
>
>
> This seems to be enough to offset the (arguably small) reduction in the
> number of stack spill-and-fill (spill-and-fill is bad; but 64-bit
> instruction encodings being a roadblock to the WEXifier is also bad...).
>
> By design, this would eliminate cases where Op64 encodings are needed to
> deal with XGPR (but where the op could otherwise fit into a 32-bit
> encoding).
>
>
> But, it does allow the "majority of all functions" to switch almost or
> entirely to static-assigning all of the local variables to registers.
> This mostly applies to non-leaf functions in this case; as the majority
> of leaf functions are already able to go full-static with 32 GPRs, but
> non-leaf functions can't use scratch registers for static assignment.
<
I am seeing (on different applications and not as many of them) that
90%-ish of all leaf functions are happy with 16-registers--no prologue
or epilogue (and no spills/fills because the compiler inserts prologue
if spills or fills would be required and then uses as many of the 32
registers as needed.).
>
> Say, 32 GPRs:
> Leaf function: Has ~ 26 registers it can use in this case.
<
Leaf function (not using a FP) has 30 registers it can use.
<
> Non-Leaf: Has ~ 14 registers it can use in this case.
<
Non-leaf has 15 (no FP) or 14 (FP) it can preserve across subroutine
calls.
>
> With 64 GPRs:
> Leaf function: Has ~ 58 registers it can use.
> Non-Leaf: Has ~ 30 registers it can use.
>
> Partial reason being, in a leaf function, nothing is going to stomp the
> scratch registers, but a non-leaf functions need to deal with these
> registers getting stomped during function calls.
<
There is no reason a leaf subroutine cannot dump the preserved registers
to the stack and use as many as it needs. The contract is that these must
be restored before returning.
>
> On average, functions seems to have roughly 17 variables in total (in a
> roughly Gaussian distribution). With the number of function arguments
> seemingly following a geometric distribution.
>

Re: everything old is new again, Compare-and-branch vs PIC

<5fcdfb38-6b41-43db-8232-2d5bdea663c1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29525&group=comp.arch#29525

Newsgroups: comp.arch

X-Received: by 2002:a05:620a:1b82:b0:6fc:967e:c7b4 with SMTP id dv2-20020a05620a1b8200b006fc967ec7b4mr34132034qkb.253.1671043869995;
Wed, 14 Dec 2022 10:51:09 -0800 (PST)
X-Received: by 2002:a05:6808:8cd:b0:354:4a19:f09e with SMTP id
k13-20020a05680808cd00b003544a19f09emr208858oij.61.1671043869712; Wed, 14 Dec
2022 10:51:09 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 14 Dec 2022 10:51:09 -0800 (PST)
In-Reply-To: <tnc39r$1uf0$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:9e8:644b:7a5b:28fa;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:9e8:644b:7a5b:28fa
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<tmtmlq$uo06$1@dont-email.me> <45ecc365-c1f0-4bd8-b92a-4e50209095c7n@googlegroups.com>
<kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com>
<bwnlL.106$%os8.36@fx03.iad> <d3842b13-6457-4a7b-964d-ec0ce66447e2n@googlegroups.com>
<XuJlL.9383$MVg8.8632@fx12.iad> <tn9i9j$2g22f$1@dont-email.me>
<T44mL.71538$gGD7.48801@fx11.iad> <tnala2$2j5mv$1@dont-email.me>
<ffae19fc-caa4-47ce-b730-70389d9f8186n@googlegroups.com> <tnba86$2kho9$1@dont-email.me>
<6d6de22f-53be-4275-aee4-44f6581ff0a4n@googlegroups.com> <tnc39r$1uf0$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5fcdfb38-6b41-43db-8232-2d5bdea663c1n@googlegroups.com>
Subject: Re: everything old is new again, Compare-and-branch vs PIC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 14 Dec 2022 18:51:09 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 3682

by: MitchAlsup - Wed, 14 Dec 2022 18:51 UTC

On Wednesday, December 14, 2022 at 2:59:42 AM UTC-6, Terje Mathisen wrote:
> MitchAlsup wrote:
> > On Tuesday, December 13, 2022 at 7:52:10 PM UTC-6, BGB wrote:
> >> On 12/13/2022 2:53 PM, MitchAlsup wrote:
> >
> >> Then again, I guess it was interesting to observe that some types of
> >> optimizations which help on BJX2 also help on x86-64, despite x86-64
> >> having not enough registers to actually make use of the optimization as
> >> expressed (x86-64 seems to be able to pretend as-if the stack
> >> spills/fills were registers).
> >>
> > The stack on x86 is defined to always be aligned to the size of the GPRs.
> That is an OS/software convention afaik?
<
A push byte will write a byte and adjust the stack by a doubleword.
Thus, it is partially enforced by HW !
>
> I.e. at least in real (16-bit) and 32-bit mode you _can_ misalign your
> stack pointer as long as you don't enable the check_alignment control
> word flag?
<
I am talking about x86-64, not the stuff that should have died in 1983.
But the actual HW probably has 87 different ways, depending on
what mode the CPU is currently in.
>
> My mime-ascii executable text starts by pushing a word of 0000h and a
> word of 0ffffh on the stack, then incrementing the stack pointer so that
> a pop will return 00ffh.
<
Then you are on your own.
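
The misaligned-pop trick quoted above can be checked with a toy model of a 16-bit little-endian stack. This is a sketch in plain C, not real-mode x86 code; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical toy model: a small byte array with a downward-growing stack. */
struct toy_stack {
    uint8_t  mem[16];
    unsigned sp;        /* index into mem; starts at the top */
};

void push16(struct toy_stack *s, uint16_t v)
{
    s->sp -= 2;
    s->mem[s->sp]     = (uint8_t)(v & 0xff);   /* low byte at lower address */
    s->mem[s->sp + 1] = (uint8_t)(v >> 8);
}

uint16_t pop16(struct toy_stack *s)
{
    uint16_t v = (uint16_t)(s->mem[s->sp] | (s->mem[s->sp + 1] << 8));
    s->sp += 2;
    return v;
}

/* Push 0000h, push 0ffffh, bump the stack pointer by one byte, then pop. */
uint16_t misaligned_pop_demo(void)
{
    struct toy_stack s = { {0}, 16 };
    push16(&s, 0x0000);
    push16(&s, 0xffff);
    s.sp += 1;          /* the stack pointer is now odd */
    return pop16(&s);
}
```

The pop straddles the two pushed words: its low byte is the high byte of 0ffffh and its high byte is the low byte of 0000h, giving 00ffh.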
> > There was a recent poll on some site and 56% of Americans do not think
> > that Arabic numerals should be taught in schools, too; and 15% don't have
> > an opinion.
> > https://www.snopes.com/fact-check/teaching-arabic-numerals/
> > We are well on the way to Idiocracy.
> It is much worse in the US than here, but that might just be a matter of
> time?
<
You have a population that still values an education, we have a population
that values an edumacation.
>
> Terje
>
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: everything old is new again, Compare-and-branch vs PIC

<c441a3d7-a459-48d7-ac82-15a1d213b53fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29526&group=comp.arch#29526

Newsgroups: comp.arch

X-Received: by 2002:ac8:5219:0:b0:3a7:e66a:8c0d with SMTP id r25-20020ac85219000000b003a7e66a8c0dmr13721851qtn.337.1671044170407;
Wed, 14 Dec 2022 10:56:10 -0800 (PST)
X-Received: by 2002:a05:6870:9e99:b0:143:dea4:c591 with SMTP id
pu25-20020a0568709e9900b00143dea4c591mr515880oab.106.1671044169735; Wed, 14
Dec 2022 10:56:09 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 14 Dec 2022 10:56:09 -0800 (PST)
In-Reply-To: <8jlmL.4515$b7Kc.1476@fx39.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:9e8:644b:7a5b:28fa;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:9e8:644b:7a5b:28fa
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<kM1lL.20158$iU59.5539@fx14.iad> <tn2u7j$288n$1@gal.iecc.com>
<bwnlL.106$%os8.36@fx03.iad> <d3842b13-6457-4a7b-964d-ec0ce66447e2n@googlegroups.com>
<XuJlL.9383$MVg8.8632@fx12.iad> <tn9i9j$2g22f$1@dont-email.me>
<T44mL.71538$gGD7.48801@fx11.iad> <tnala2$2j5mv$1@dont-email.me>
<ffae19fc-caa4-47ce-b730-70389d9f8186n@googlegroups.com> <tnba86$2kho9$1@dont-email.me>
<6d6de22f-53be-4275-aee4-44f6581ff0a4n@googlegroups.com> <8jlmL.4515$b7Kc.1476@fx39.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c441a3d7-a459-48d7-ac82-15a1d213b53fn@googlegroups.com>
Subject: Re: everything old is new again, Compare-and-branch vs PIC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 14 Dec 2022 18:56:10 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 3055

by: MitchAlsup - Wed, 14 Dec 2022 18:56 UTC

On Wednesday, December 14, 2022 at 8:45:16 AM UTC-6, Scott Lurndal wrote:
> MitchAlsup <Mitch...@aol.com> writes:
> >On Tuesday, December 13, 2022 at 7:52:10 PM UTC-6, BGB wrote:
> >> On 12/13/2022 2:53 PM, MitchAlsup wrote:
> >
>
> >> BTW: I put up a vote on Twitter wanting to see what the general
> >> sentiment was on possible ways to "resolve" the encoding orthogonality
> >> issues with R32..R63:
> >> https://twitter.com/cr88192/status/1602801136590782466
> >>
> >There was a recent poll on some site and 56% of Americans do not think
> >that Arabic numerals should be taught in schools, too; and 15% don't have
> >an opinion.
> >https://www.snopes.com/fact-check/teaching-arabic-numerals/
> >We are well on the way to Idiocracy.
> 3000 responses out of 330 million americans. How representative is
> that survey?
<
Exactly as representative as "any gun question asked on any media outlet"
which then is picked up by 999 of 1000 gun forums around the net, and
they, in turn send out messages to all their members asking them to
participate in the "survey" while the other 65% of people get no notice
at all and simply stumble across the survey at some kind of normal pace.
<
So, polling by the net is not representative of anything.
<
Penn and Teller did one on di-hydrogen-monoxide that is hilarious. They
got a very large majority of people to sign a petition to ban H2O.

Re: terminology, everything old is new again, Compare-and-branch vs PIC

<tndbik$2g6n$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=29527&group=comp.arch#29527

Newsgroups: comp.arch
Followup: alt.flame

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: terminology, everything old is new again, Compare-and-branch vs PIC
Followup-To: alt.flame
Date: Wed, 14 Dec 2022 20:27:00 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <tndbik$2g6n$1@gal.iecc.com>
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <6d6de22f-53be-4275-aee4-44f6581ff0a4n@googlegroups.com> <8jlmL.4515$b7Kc.1476@fx39.iad> <c441a3d7-a459-48d7-ac82-15a1d213b53fn@googlegroups.com>
Injection-Date: Wed, 14 Dec 2022 20:27:00 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="82135"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <6d6de22f-53be-4275-aee4-44f6581ff0a4n@googlegroups.com> <8jlmL.4515$b7Kc.1476@fx39.iad> <c441a3d7-a459-48d7-ac82-15a1d213b53fn@googlegroups.com>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)

by: John Levine - Wed, 14 Dec 2022 20:27 UTC

According to MitchAlsup <MitchAlsup@aol.com>:
>So, polling by the net is not representative of anything.
><
>Penn and Teller did on on di-hydrogen-monoxide that is hilarious. They
>got a very large majority of people to sign a petition to ban H2O.

But that's different. Surely you are aware that DHMO has been
scientifically shown to be the major cause of drowning, is found in
many dangerous substances including Nitric Acid, and that prolonged
exposure to its solid form causes tissue damage.

Further info here: https://www.dhmo.org/

(Please note Followup-To:)

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: terminology, everything old is new again, Compare-and-branch vs PIC

<bbcdd772-e85b-4de8-bee0-4cbfc449ebbcn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29528&group=comp.arch#29528

Newsgroups: comp.arch

X-Received: by 2002:a05:6214:5d87:b0:4c7:91ae:7458 with SMTP id mf7-20020a0562145d8700b004c791ae7458mr5938077qvb.51.1671052794801;
Wed, 14 Dec 2022 13:19:54 -0800 (PST)
X-Received: by 2002:a05:6870:d69e:b0:13c:97e9:5d40 with SMTP id
z30-20020a056870d69e00b0013c97e95d40mr511739oap.42.1671052794147; Wed, 14 Dec
2022 13:19:54 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 14 Dec 2022 13:19:53 -0800 (PST)
In-Reply-To: <tndbik$2g6n$1@gal.iecc.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:9e8:644b:7a5b:28fa;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:9e8:644b:7a5b:28fa
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<6d6de22f-53be-4275-aee4-44f6581ff0a4n@googlegroups.com> <8jlmL.4515$b7Kc.1476@fx39.iad>
<c441a3d7-a459-48d7-ac82-15a1d213b53fn@googlegroups.com> <tndbik$2g6n$1@gal.iecc.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <bbcdd772-e85b-4de8-bee0-4cbfc449ebbcn@googlegroups.com>
Subject: Re: terminology, everything old is new again, Compare-and-branch vs PIC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 14 Dec 2022 21:19:54 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 2385

by: MitchAlsup - Wed, 14 Dec 2022 21:19 UTC

On Wednesday, December 14, 2022 at 2:27:04 PM UTC-6, John Levine wrote:
> According to MitchAlsup <Mitch...@aol.com>:
> >So, polling by the net is not representative of anything.
> ><
> >Penn and Teller did on on di-hydrogen-monoxide that is hilarious. They
> >got a very large majority of people to sign a petition to ban H2O.
> But that's different. Surely you are aware that DHMO has been
> scientifically shown to be the major cause of drowning, is found in
> many dangerous substances including Nitric Acid, and that prolonged
> exposure to its solid form causes tissue damage.
<
Just imagine how safe all of us would be if we lived on a world where there
was no di-hydrogen-monoxide:: no drownings, no freezing in snow, no ...
>
> Further info here: https://www.dhmo.org/
>
> (Please note Followup-To:)
> --
> Regards,
> John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
> Please consider the environment before reading this e-mail. https://jl.ly
<
Oh Wait !
