novaBBS - comp.arch - Re: Non-RISC-ness in AMD64

Re: Non-RISC-ness in AMD64

<3efc99a9-f17b-4970-a575-899b71e6340an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22563&group=comp.arch#22563

X-Received: by 2002:a05:622a:1a87:: with SMTP id s7mr20614254qtc.304.1640796923397;
Wed, 29 Dec 2021 08:55:23 -0800 (PST)
X-Received: by 2002:a9d:206a:: with SMTP id n97mr20516409ota.142.1640796923199;
Wed, 29 Dec 2021 08:55:23 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 29 Dec 2021 08:55:23 -0800 (PST)
In-Reply-To: <jwva6gk81dz.fsf-monnier+comp.arch@gnu.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:fdd8:9d71:527b:6b7b;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:fdd8:9d71:527b:6b7b
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me>
<6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>
<1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at>
<TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org>
<f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad>
<e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com> <jwva6gk81dz.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3efc99a9-f17b-4970-a575-899b71e6340an@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 29 Dec 2021 16:55:23 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 18

by: MitchAlsup - Wed, 29 Dec 2021 16:55 UTC

On Tuesday, December 28, 2021 at 10:46:51 PM UTC-6, Stefan Monnier wrote:
> > With coherent TLBs, there are no IPI-shootdowns. In fact, no IPIs
> > at all in modifying MMU tables. You just write the MMU tables and
> > it is up to all the HW resources to "do the right thing".
> >
> > That SW prevents this is no reason HW guys should not be worried
> > about anomalous behavior.
> I still don't see what kind of anomalous behavior you're thinking of
> that's solved by special handling of (non-atomic) RMW.
<
Imagine the above scenario where SW did NOT perform the interlocking:
<
What semantics do you prescribe when an uninterrupted stream of instructions
forms 2 addresses from the same pattern, with none of the pattern registers
changing value, yet touches different memory locations because some other
piece of SW altered the MMU tables "at just the right time"?
>
>
> Stefan
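
A minimal sketch of the scenario above, assuming a toy single-level page table with 4 KiB pages. The names `PAGE_SHIFT`, `translate`, and the table contents are hypothetical and chosen only for illustration; the point is that one unchanged virtual address yields two physical addresses when the PTE is rewritten between the two translations.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

def translate(page_table, va):
    """Map a virtual address to a physical address via a toy page table."""
    vpn = va >> PAGE_SHIFT
    return (page_table[vpn] << PAGE_SHIFT) | (va & PAGE_MASK)

page_table = {0x10: 0x80}            # VPN 0x10 -> PFN 0x80 (made-up values)
va = 0x100A4                         # the address "pattern"; registers never change

pa_load = translate(page_table, va)  # first use of the address (LD)
page_table[0x10] = 0x93              # some other agent rewrites the PTE
pa_store = translate(page_table, va) # second use of the same address (ST)

print(hex(pa_load), hex(pa_store))
```

Nothing in the instruction stream changed, yet the two accesses land in different frames.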

Re: Non-RISC-ness in AMD64

<cC2zJ.104301$zF3.1172@fx03.iad>

https://www.novabbs.com/devel/article-flat.php?id=22568&group=comp.arch#22568

Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!peer03.ams4!peer.am4.highwinds-media.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx03.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me> <6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org> <1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at> <TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org> <f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad> <e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com>
In-Reply-To: <e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 54
Message-ID: <cC2zJ.104301$zF3.1172@fx03.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Wed, 29 Dec 2021 19:24:56 UTC
Date: Wed, 29 Dec 2021 14:24:46 -0500
X-Received-Bytes: 3618

by: EricP - Wed, 29 Dec 2021 19:24 UTC

MitchAlsup wrote:
> On Tuesday, December 28, 2021 at 6:43:24 PM UTC-6, EricP wrote:
>> Yes, it's not the RMW sequence that is a problem, although the shootdown-IPI
>> does prevent that because the RMW is all before or all after the interrupt.
>> But a LD OP ST sequence could be paged out after the LD and page in
>> for the ST at a different physical address and it would not be harmed.
>>
>> The purpose of the shootdown-IPI handshake is to ensure that
>> all cores agree that there is just one translation for that address.
> <
> But consider the case where the TLBs are coherent. Anyone with TLB
> permission to anyone-else's MMU tables, can write to and instantly
> modify the other-guy's TLB entries. With multi-core operations, this
> could happen....

Yes but the problem is the *instantly* part - it takes time to
notify all SMP cores and that leaves a hole where one core uses
the old mapping and a different core uses the new mapping.
Waiting for the ACK's closes that hole.
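
A rough sketch of the initiator side of that handshake, assuming a simple model in which the core changing the mapping tracks which remote cores have not yet ACKed. The class `Initiator` and its methods are hypothetical names for illustration, not any real hardware or OS interface.

```python
class Initiator:
    def __init__(self, remote_ids):
        self.pending = set(remote_ids)   # cores that have not ACKed yet
        self.pte_written = False

    def on_ack(self, core_id):
        self.pending.discard(core_id)

    def try_write_pte(self):
        """Commit the new PTE only once every remote core has ACKed."""
        if self.pending:
            return False   # hole still open: some core may use the old mapping
        self.pte_written = True
        return True

init = Initiator(remote_ids=[1, 2, 3])
init.on_ack(1)
init.on_ack(3)
early = init.try_write_pte()   # core 2 has not ACKed, so this is refused
init.on_ack(2)
late = init.try_write_pte()    # all ACKs in, the change may proceed
```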

Note also, as discussed below, that the ACK must not be sent too quickly.

> <
> With coherent TLBs, there are no IPI-shootdowns. In fact, no IPIs
> at all in modifying MMU tables. You just write the MMU tables and
> it is up to all the HW resources to "do the right thing".

I was thinking this applied just to RMW instructions but I just
realized it actually applies to *ALL* load or store instructions.

Any LSQ entry that looks up a translation and is holding the physical
address before it is used is *also effectively a cached TLB entry*.
If a RMW instruction LSQ entry holds its translation while the
OP executes then it is just a longer duration cached copy.

When a remote TLB receives a shootdown coherence msg, it removes its
own matching entries but it must not ACK until it knows there are
no other copies of the PTE (address, access protections) in the LSQ.
The easiest way is to wait until LSQ empties but that could
have other consequences (it could take a long time for the ACK).
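
A sketch of the remote side, assuming the LSQ can be modelled as a list of (VPN, PFN) pairs held by in-flight memory operations. `RemoteCore` and its fields are hypothetical. Note that this variant checks only for entries matching the shot-down page rather than waiting for the whole LSQ to empty, which is a different trade-off from the "easiest way" mentioned above.

```python
class RemoteCore:
    def __init__(self):
        self.tlb = {}            # vpn -> pfn
        self.lsq = []            # (vpn, pfn) held by in-flight loads/stores
        self.ack_pending = None  # vpn of a shootdown awaiting its ACK

    def on_shootdown(self, vpn):
        self.tlb.pop(vpn, None)  # drop the TLB's own copy
        self.ack_pending = vpn
        return self.try_ack()

    def retire_oldest(self):
        if self.lsq:
            self.lsq.pop(0)
        return self.try_ack()

    def try_ack(self):
        """ACK only when no LSQ entry still holds the stale translation."""
        if self.ack_pending is None:
            return False
        if any(vpn == self.ack_pending for vpn, _ in self.lsq):
            return False
        self.ack_pending = None
        return True

core = RemoteCore()
core.tlb[0x10] = 0x80
core.lsq = [(0x10, 0x80), (0x22, 0x41)]
first = core.on_shootdown(0x10)   # an LSQ entry still caches the old PTE
second = core.retire_oldest()     # stale entry gone, ACK can be sent
```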

Also, any ITLB copies that fetch buffers might hold need to be zapped.

The PTE value must not be allowed to change until all ACKs are received,
just like when switching a cache line from Shared to Exclusive state.

> That SW prevents this is no reason HW guys should not be worried
> about anomalous behavior.

Right, but HW taking over TLB coherence is a new responsibility,
so it presents new anomalies.

Re: Non-RISC-ness in AMD64

<ll4zJ.121128$IB7.59871@fx02.iad>

https://www.novabbs.com/devel/article-flat.php?id=22570&group=comp.arch#22570

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx02.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me> <6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org> <1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at> <TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org> <f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad> <e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com> <cC2zJ.104301$zF3.1172@fx03.iad>
In-Reply-To: <cC2zJ.104301$zF3.1172@fx03.iad>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 74
Message-ID: <ll4zJ.121128$IB7.59871@fx02.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Wed, 29 Dec 2021 21:23:29 UTC
Date: Wed, 29 Dec 2021 16:22:15 -0500
X-Received-Bytes: 4746

by: EricP - Wed, 29 Dec 2021 21:22 UTC

EricP wrote:
> MitchAlsup wrote:
>> On Tuesday, December 28, 2021 at 6:43:24 PM UTC-6, EricP wrote:
>>> Yes, it's not the RMW sequence that is a problem, although the
>>> shootdown-IPI does prevent that because the RMW is all before or all
>>> after the interrupt. But a LD OP ST sequence could be paged out after
>>> the LD and page in for the ST at a different physical address and it
>>> would not be harmed.
>>> The purpose of the shootdown-IPI handshake is to ensure that all
>>> cores agree that there is just one translation for that address.
>> <
>> But consider the case where the TLBs are coherent. Anyone with TLB
>> permission to anyone-else's MMU tables, can write to and instantly
>> modify the other-guy's TLB entries. With multi-core operations, this
>> could happen....
>
> Yes but the problem is the *instantly* part - it takes time to
> notify all SMP cores and that leaves a hole where one core uses
> the old mapping and a different core uses the new mapping.
> Waiting for the ACK's closes that hole.
>
> Note also below the ACK must not be sent too quickly.
>
>> <
>> With coherent TLBs, there are no IPI-shootdowns. In fact, no IPIs
>> at all in modifying MMU tables. You just write the MMU tables and
>> it is up to all the HW resources to "do the right thing".
>
> I was thinking this applied just to RMW instructions but I just
> realized it actually applies to *ALL* load or store instructions.
>
> Any LSQ entry that looks up a translation and is holding the physical
> address before it is used is *also effectively a cached TLB entry*.
> If a RMW instruction LSQ entry holds its translation while the
> OP executes then it is just a longer duration cached copy.
>
> When a remote TLB receives a shootdown coherence msg, it removes its
> own matching entries but it must not ACK until it knows there are
> no other copies of the PTE (address, access protections) in the LSQ.
> The easiest way is to wait until LSQ empties but that could
> have other consequences (it could take a long time for the ACK).

Actually it is more than this since in OoO a LD instruction can
bypass other LD's or ST's, access a physical address, complete,
and be removed from the LSQ.
But the value loaded is implicitly tied to the translation used
even if there is no trace of that physical address in the LSQ.

If the translation was allowed to change out of order
it could allow a younger LD to use an older translation,
and an older LD to use a younger translation.
Which would create an illegal coherence scenario.

So the TLB shootdown has to trigger a replay trap for all non-retired
instructions and wait for committed ST's to flush to cache
before sending the ACK.
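
A minimal sketch of that sequence for one remote core, assuming the reorder buffer is a list of dicts and the store buffer a list of (physical address, value) pairs. `handle_shootdown` and the data layout are hypothetical; real hardware does this in pipeline control logic, not software.

```python
def handle_shootdown(rob, store_buffer, write_to_cache):
    """Return (instructions to replay, ack_sent) for one remote core.

    rob: list of dicts with 'op' and 'retired' keys, oldest first.
    store_buffer: committed stores as (pa, value), oldest first.
    """
    replay = [insn for insn in rob if not insn["retired"]]   # replay trap
    rob[:] = [insn for insn in rob if insn["retired"]]
    while store_buffer:                  # drain committed stores to cache
        pa, value = store_buffer.pop(0)
        write_to_cache(pa, value)
    return replay, True                  # only now may the ACK go out

cache = {}
rob = [{"op": "ST", "retired": True},
       {"op": "LD", "retired": False},
       {"op": "ADD", "retired": False}]
sb = [(0x800A4, 7)]
replay, ack = handle_shootdown(rob, sb, cache.__setitem__)
```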

> Also any ITLB copies that fetch buffers might have need to be zapped.
>
> The PTE value must not be allowed to change until all ACKs are received,
> just like when switching a cache line from Shared to Exclusive state.

The PTE writer can use Exclusive ownership of the PTE cache line to prevent
remote TLB's from reloading the PTE too quickly after a shootdown.

The PTE writer acquires Exclusive ownership and invalidates all
remote cached copies of the PTE cache line. Only after that does each
remote node purge its TLB entries, trigger a replay, and send its ACK.
The replay will try to retranslate the VA and the TLB will
try to reread the PTE cache line in a Shared state,
but will stall until the PTE writer finishes its change
which only occurs after it receives ACKs from all nodes.
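
The ordering above can be sketched as a tiny model of one PTE cache line, assuming a stalled Shared request is represented by returning `None`. `PteLine` and its methods are hypothetical names; a real coherence protocol has more states and messages than this.

```python
class PteLine:
    def __init__(self, value):
        self.value = value
        self.owner = None            # core holding the line Exclusive, if any

    def acquire_exclusive(self, core_id):
        self.owner = core_id         # remote cached copies are invalidated here

    def read_shared(self, core_id):
        """Return the PTE, or None to model a stalled Shared request."""
        if self.owner is not None and self.owner != core_id:
            return None
        return self.value

    def write_and_release(self, core_id, new_value):
        assert self.owner == core_id
        self.value = new_value
        self.owner = None

line = PteLine(0x80)
line.acquire_exclusive(0)            # core 0 is the PTE writer
stalled = line.read_shared(1)        # core 1's replayed translation must wait
line.write_and_release(0, 0x93)      # done only after all ACKs have arrived
fresh = line.read_shared(1)          # core 1 now sees the new PTE only
```

Core 1 never observes the old value after its purge, because its reread cannot complete until the writer releases the line.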

Re: Non-RISC-ness in AMD64

<d3b67bf2-124c-4549-a006-27c4e9fb7793n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22573&group=comp.arch#22573

X-Received: by 2002:a05:6214:e41:: with SMTP id o1mr25275232qvc.63.1640814822899;
Wed, 29 Dec 2021 13:53:42 -0800 (PST)
X-Received: by 2002:aca:646:: with SMTP id 67mr21809176oig.175.1640814822622;
Wed, 29 Dec 2021 13:53:42 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 29 Dec 2021 13:53:42 -0800 (PST)
In-Reply-To: <ll4zJ.121128$IB7.59871@fx02.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:fdd8:9d71:527b:6b7b;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:fdd8:9d71:527b:6b7b
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me>
<6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>
<1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at>
<TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org>
<f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad>
<e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com> <cC2zJ.104301$zF3.1172@fx03.iad>
<ll4zJ.121128$IB7.59871@fx02.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d3b67bf2-124c-4549-a006-27c4e9fb7793n@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 29 Dec 2021 21:53:42 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 82

by: MitchAlsup - Wed, 29 Dec 2021 21:53 UTC

On Wednesday, December 29, 2021 at 3:23:32 PM UTC-6, EricP wrote:
> EricP wrote:
> > MitchAlsup wrote:
> >> On Tuesday, December 28, 2021 at 6:43:24 PM UTC-6, EricP wrote:
> >>> Yes, it's not the RMW sequence that is a problem, although the
> >>> shootdown-IPI does prevent that because the RMW is all before or all
> >>> after the interrupt. But a LD OP ST sequence could be paged out after
> >>> the LD and page in for the ST at a different physical address and it
> >>> would not be harmed.
> >>> The purpose of the shootdown-IPI handshake is to ensure that all
> >>> cores agree that there is just one translation for that address.
> >> <
> >> But consider the case where the TLBs are coherent. Anyone with TLB
> >> permission to anyone-else's MMU tables, can write to and instantly
> >> modify the other-guy's TLB entries. With multi-core operations, this
> >> could happen....
> >
> > Yes but the problem is the *instantly* part - it takes time to
> > notify all SMP cores and that leaves a hole where one core uses
> > the old mapping and a different core uses the new mapping.
> > Waiting for the ACK's closes that hole.
> >
> > Note also below the ACK must not be sent too quickly.
> >
> >> <
> >> With coherent TLBs, there are no IPI-shootdowns. In fact, no IPIs
> >> at all in modifying MMU tables. You just write the MMU tables and
> >> it is up to all the HW resources to "do the right thing".
> >
> > I was thinking this applied just to RMW instructions but I just
> > realized it actually applies to *ALL* load or store instructions.
> >
> > Any LSQ entry that looks up a translation and is holding the physical
> > address before it is used is *also effectively a cached TLB entry*.
> > If a RMW instruction LSQ entry holds its translation while the
> > OP executes then it is just a longer duration cached copy.
> >
> > When a remote TLB receives a shootdown coherence msg, it removes its
> > own matching entries but it must not ACK until it knows there are
> > no other copies of the PTE (address, access protections) in the LSQ.
> > The easiest way is to wait until LSQ empties but that could
> > have other consequences (it could take a long time for the ACK).
<
> Actually it is more than this since in OoO a LD instruction can
> bypass other LD's or ST's, access a physical address, complete,
> and be removed from the LSQ.
<
> But the value loaded is implicitly tied to the translation used
> even if there is no trace of that physical address in the LSQ.
<
Now you are at least catching on as to why there is a scenario to be
concerned about.
>
> If the translation was allowed to change out of order
> it could allow a younger LD to use an older translation,
> and an older LD to use a younger translation.
> Which would create an illegal coherence scenario.
>
> So the TLB shootdown has to trigger a replay trap for all non-retired
> instructions and wait for committed ST's to flush to cache
> before sending the ACK.
> > Also any ITLB copies that fetch buffers might have need to be zapped.
> >
> > The PTE value must not be allowed to change until all ACKs are received,
> > just like when switching a cache line from Shared to Exclusive state.
<
> The PTE writer can use Exclusive ownership of the PTE cache line to prevent
> remote TLB's from reloading the PTE too quickly after a shootdown.
<
I have been giving some thought to using a different state in the Dcache
for PTEs--which would convert an access response into "not right now"
to let the slings and arrows of doing these quiesce before granting access.
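
One way to picture such a state, as a sketch only: the names `Resp.NOT_NOW`, `PteCacheLine.quiescing`, and `read_with_retry` are all invented here and do not describe any existing cache protocol. The requester simply retries until the line stops answering "not right now".

```python
from enum import Enum

class Resp(Enum):
    DATA = 1
    NOT_NOW = 2          # hypothetical NAK: the requester should retry later

class PteCacheLine:
    def __init__(self, value):
        self.value = value
        self.quiescing = False   # hypothetical extra state for PTE lines

    def snoop_read(self):
        if self.quiescing:
            return Resp.NOT_NOW, None
        return Resp.DATA, self.value

def read_with_retry(line, max_tries, on_retry=lambda: None):
    """Retry a read until data arrives; return (value, attempts used)."""
    for attempt in range(1, max_tries + 1):
        resp, value = line.snoop_read()
        if resp is Resp.DATA:
            return value, attempt
        on_retry()
    return None, max_tries

line = PteCacheLine(0x93)
line.quiescing = True
countdown = [2]                  # the line settles after two refused reads

def settle():
    countdown[0] -= 1
    if countdown[0] == 0:
        line.quiescing = False

value, tries = read_with_retry(line, 5, settle)
```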
>
> The PTE writer acquires Exclusive ownership and invalidates all
> remote cached copies of the PTE cache line. Only after does it
> purge the TLB entries, trigger a replay, and send its ACK.
<
Almost starts to smell ATOMIC..........
<
> The replay will try to retranslate the VA and the TLB will
> try to reread the PTE cache line in a Shared state,
> but will stall until the PTE writer finishes its change
> which only occurs after it receives ACKs from all nodes.

Re: Non-RISC-ness in AMD64

<2X4zJ.72820$Bu7.26077@fx26.iad>

https://www.novabbs.com/devel/article-flat.php?id=22576&group=comp.arch#22576

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.de!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx26.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me> <6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org> <1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at> <TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org> <f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad> <e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com> <cC2zJ.104301$zF3.1172@fx03.iad> <ll4zJ.121128$IB7.59871@fx02.iad>
In-Reply-To: <ll4zJ.121128$IB7.59871@fx02.iad>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 88
Message-ID: <2X4zJ.72820$Bu7.26077@fx26.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Wed, 29 Dec 2021 22:03:42 UTC
Date: Wed, 29 Dec 2021 17:03:08 -0500
X-Received-Bytes: 5549

by: EricP - Wed, 29 Dec 2021 22:03 UTC

EricP wrote:
> EricP wrote:
>> MitchAlsup wrote:
>>> On Tuesday, December 28, 2021 at 6:43:24 PM UTC-6, EricP wrote:
>>>> Yes, it's not the RMW sequence that is a problem, although the
>>>> shootdown-IPI does prevent that because the RMW is all before or all
>>>> after the interrupt. But a LD OP ST sequence could be paged out
>>>> after the LD and page in for the ST at a different physical address
>>>> and it would not be harmed.
>>>> The purpose of the shootdown-IPI handshake is to ensure that all
>>>> cores agree that there is just one translation for that address.
>>> <
>>> But consider the case where the TLBs are coherent. Anyone with TLB
>>> permission to anyone-else's MMU tables, can write to and instantly
>>> modify the other-guy's TLB entries. With multi-core operations, this
>>> could happen....
>>
>> Yes but the problem is the *instantly* part - it takes time to
>> notify all SMP cores and that leaves a hole where one core uses
>> the old mapping and a different core uses the new mapping.
>> Waiting for the ACK's closes that hole.
>>
>> Note also below the ACK must not be sent too quickly.
>>
>>> <
>>> With coherent TLBs, there are no IPI-shootdowns. In fact, no IPIs
>>> at all in modifying MMU tables. You just write the MMU tables and
>>> it is up to all the HW resources to "do the right thing".
>>
>> I was thinking this applied just to RMW instructions but I just
>> realized it actually applies to *ALL* load or store instructions.
>>
>> Any LSQ entry that looks up a translation and is holding the physical
>> address before it is used is *also effectively a cached TLB entry*.
>> If a RMW instruction LSQ entry holds its translation while the
>> OP executes then it is just a longer duration cached copy.
>>
>> When a remote TLB receives a shootdown coherence msg, it removes its
>> own matching entries but it must not ACK until it knows there are
>> no other copies of the PTE (address, access protections) in the LSQ.
>> The easiest way is to wait until LSQ empties but that could
>> have other consequences (it could take a long time for the ACK).
>
> Actually it is more than this since in OoO a LD instruction can
> bypass other LD's or ST's, access a physical address, complete,
> and be removed from the LSQ.
> But the value loaded is implicitly tied to the translation used
> even if there is no trace of that physical address in the LSQ.
>
> If the translation was allowed to change out of order
> it could allow a younger LD to use an older translation,
> and an older LD to use a younger translation.
> Which would create an illegal coherence scenario.
>
> So the TLB shootdown has to trigger a replay trap for all non-retired
> instructions and wait for committed ST's to flush to cache
> before sending the ACK.

Hmmm... this could deadlock.
If it had to flush committed ST's to cache before sending the ACK then
this could deadlock with the PTE writer holding the cache line Exclusive
awaiting those ACKs if the store was to the same PTE cache line.
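
The circular wait can be made explicit with a small wait-for graph, assuming three abstract parties whose labels (`pte_writer`, `remote_ack`, `committed_store`) are made up for this sketch. `has_cycle` is an ordinary depth-first cycle check, not anything hardware-specific.

```python
def has_cycle(waits_for):
    """Detect a cycle in a wait-for graph given as {node: [nodes it waits on]}."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True              # came back to a node on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in waits_for.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(waits_for))

deadlocked = has_cycle({
    "pte_writer": ["remote_ack"],         # holds the line Exclusive until ACKs
    "remote_ack": ["committed_store"],    # ACK waits for the store to drain
    "committed_store": ["pte_writer"],    # store needs the line the writer holds
})
safe = has_cycle({
    "pte_writer": ["remote_ack"],         # same handshake, no store to that line
    "remote_ack": [],
})
```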

The reason I'm concerned is if there are committed stores queued
to write to a physical page and the translation changes,
the PTE writer might think that the old page won't be written
any more but in fact there could be many pending writes scattered
about the system in various hit-under-miss MSHR buffers.
That physical page might be reallocated for some other use
only to be stomped on by all those late writes.
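
A toy illustration of that late-write hazard, assuming memory is a dict keyed by physical address and a committed store sits in a buffer with its old physical address already attached. All values are invented for the example.

```python
memory = {}                                   # physical address -> value
store_buffer = [(0x800A4, "old-owner data")]  # committed, not yet drained

# The translation changes and the frame is handed to a new owner,
# who writes its own data there.
memory[0x800A4] = "new-owner data"

# The buffered store drains afterwards, still using the old physical address.
for pa, value in store_buffer:
    memory[pa] = value
store_buffer.clear()

print(memory[0x800A4])
```

The new owner's data is silently overwritten, which is why the pending writes have to be accounted for before the frame is reused.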

>> Also any ITLB copies that fetch buffers might have need to be zapped.
>>
>> The PTE value must not be allowed to change until all ACKs are received,
>> just like when switching a cache line from Shared to Exclusive state.
>
> The PTE writer can use Exclusive ownership of the PTE cache line to prevent
> remote TLB's from reloading the PTE too quickly after a shootdown.
>
> The PTE writer acquires Exclusive ownership and invalidates all
> remote cached copies of the PTE cache line. Only after does it
> purge the TLB entries, trigger a replay, and send its ACK.
> The replay will try to retranslate the VA and the TLB will
> try to reread the PTE cache line in a Shared state,
> but will stall until the PTE writer finishes its change
> which only occurs after it receives ACKs from all nodes.

Re: Non-RISC-ness in AMD64

<jwvsfub3vz7.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=22577&group=comp.arch#22577

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Wed, 29 Dec 2021 17:11:22 -0500
Organization: A noiseless patient Spider
Lines: 16
Message-ID: <jwvsfub3vz7.fsf-monnier+comp.arch@gnu.org>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at>
<sqbd31$jkg$1@dont-email.me>
<6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com>
<jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>
<1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com>
<2021Dec28.112737@mips.complang.tuwien.ac.at>
<TkGyJ.242585$IW4.224257@fx48.iad>
<jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org>
<f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com>
<JaOyJ.205306$3q9.102503@fx47.iad>
<e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com>
<jwva6gk81dz.fsf-monnier+comp.arch@gnu.org>
<3efc99a9-f17b-4970-a575-899b71e6340an@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="c8e3e16af424ed6fc14d57f777fdaf24";
logging-data="8035"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX185D6Sigia0pfTLQUBonjoQ"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:sxqXgzAMKn48+pi+ZK+8EFo42FQ=
sha1:5Tqh7mdVpoh2gLu1M0Vd+1ULiU8=

by: Stefan Monnier - Wed, 29 Dec 2021 22:11 UTC

>> I still don't see what kind of anomalous behavior you're thinking of
>> that's solved by special handling of (non-atomic) RMW.
> <
> Imagine the above scenario where SW did NOT perform the interlocking:
> <
> What semantic do you prescribe where a non-interrupted stream of instructions
> forms 2 addresses from the same pattern, with none of the pattern registers
> changing value, and touching different memory locations because some other
> piece of SW altered the MMU tables "at just the right time".

Trying to address this for RMW seems pointless because the same problem
affects all the cases where the same logical address is used several times
from different instructions.

Stefan

Re: Non-RISC-ness in AMD64

<d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22578&group=comp.arch#22578

X-Received: by 2002:ac8:57c2:: with SMTP id w2mr24445662qta.54.1640821742016;
Wed, 29 Dec 2021 15:49:02 -0800 (PST)
X-Received: by 2002:a05:6808:11c5:: with SMTP id p5mr22078386oiv.51.1640821741802;
Wed, 29 Dec 2021 15:49:01 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 29 Dec 2021 15:49:01 -0800 (PST)
In-Reply-To: <2X4zJ.72820$Bu7.26077@fx26.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:fdd8:9d71:527b:6b7b;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:fdd8:9d71:527b:6b7b
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me>
<6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>
<1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at>
<TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org>
<f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad>
<e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com> <cC2zJ.104301$zF3.1172@fx03.iad>
<ll4zJ.121128$IB7.59871@fx02.iad> <2X4zJ.72820$Bu7.26077@fx26.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 29 Dec 2021 23:49:02 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 110

by: MitchAlsup - Wed, 29 Dec 2021 23:49 UTC

On Wednesday, December 29, 2021 at 4:03:44 PM UTC-6, EricP wrote:
> EricP wrote:
> > EricP wrote:
> >> MitchAlsup wrote:
> >>> On Tuesday, December 28, 2021 at 6:43:24 PM UTC-6, EricP wrote:
> >>>> Yes, it's not the RMW sequence that is a problem, although the
> >>>> shootdown-IPI does prevent that because the RMW is all before or all
> >>>> after the interrupt. But a LD OP ST sequence could be paged out
> >>>> after the LD and page in for the ST at a different physical address
> >>>> and it would not be harmed.
> >>>> The purpose of the shootdown-IPI handshake is to ensure that all
> >>>> cores agree that there is just one translation for that address.
> >>> <
> >>> But consider the case where the TLBs are coherent. Anyone with TLB
> >>> permission to anyone-else's MMU tables, can write to and instantly
> >>> modify the other-guy's TLB entries. With multi-core operations, this
> >>> could happen....
> >>
> >> Yes but the problem is the *instantly* part - it takes time to
> >> notify all SMP cores and that leaves a hole where one core uses
> >> the old mapping and a different core uses the new mapping.
> >> Waiting for the ACK's closes that hole.
> >>
> >> Note also below the ACK must not be sent too quickly.
> >>
> >>> <
> >>> With coherent TLBs, there are no IPI-shootdowns. In fact, no IPIs
> >>> at all in modifying MMU tables. You just write the MMU tables and
> >>> it is up to all the HW resources to "do the right thing".
> >>
> >> I was thinking this applied just to RMW instructions but I just
> >> realized it actually applies to *ALL* load or store instructions.
> >>
> >> Any LSQ entry that looks up a translation and is holding the physical
> >> address before it is used is *also effectively a cached TLB entry*.
> >> If a RMW instruction LSQ entry holds its translation while the
> >> OP executes then it is just a longer duration cached copy.
> >>
> >> When a remote TLB receives a shootdown coherence msg, it removes its
> >> own matching entries but it must not ACK until it knows there are
> >> no other copies of the PTE (address, access protections) in the LSQ.
<
If the ST already has a valid Physical Address, why is it not allowed to complete?
<
> >> The easiest way is to wait until LSQ empties but that could
> >> have other consequences (it could take a long time for the ACK).
> >
> > Actually it is more than this since in OoO a LD instruction can
> > bypass other LD's or ST's, access a physical address, complete,
> > and be removed from the LSQ.
<
The LD should not be able to be removed from the queue until all stores
in the queue have known-physical-addresses. Yes, it can run early, but
it cannot run (bypass forward) over a store which will interfere with the
data the LD wants.
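
As a sketch of that rule, assuming conflicts are checked at exact-address granularity (a simplification) and an unresolved store address is represented by `None`. The function name `load_may_leave_queue` is hypothetical.

```python
def load_may_leave_queue(load_pa, older_stores):
    """older_stores: physical addresses of older stores, None if not yet known."""
    if any(pa is None for pa in older_stores):
        return False                 # an unresolved store might still alias the load
    return load_pa not in older_stores   # a conflicting store blocks the bypass

unknown_addr = load_may_leave_queue(0x100, [0x200, None])
no_conflict = load_may_leave_queue(0x100, [0x200, 0x300])
conflict = load_may_leave_queue(0x100, [0x100])
```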
<
> > But the value loaded is implicitly tied to the translation used
> > even if there is no trace of that physical address in the LSQ.
<
The value loaded is tied to any older store in any queue {pre AGEN in
reservation station, and post AGEN in the miss queues.}
> >
> > If the translation was allowed to change out of order
> > it could allow a younger LD to use an older translation,
> > and an older LD to use a younger translation.
> > Which would create an illegal coherence scenario.
<
This gets you into the game of asking out-of-order with respect to whom?
<
OoO wrt itself (same core)
OoO wrt any core in the system
> >
> > So the TLB shootdown has to trigger a replay trap for all non-retired
> > instructions and wait for committed ST's to flush to cache
> > before sending the ACK.
> Hmmm... this could deadlock.
> If it had to flush committed ST's to cache before sending the ACK then
> this could deadlock with the PTE writer holding the cache line Exclusive
> awaiting those ACKs if the store was to the same PTE cache line.
<
I think what we have here is a scenario where SW is not allowed to
modify MMU table entries while any thread using those tables is
in a runnable state. Threads need to be waiting for it to be legal to
alter their tables.
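
Expressed as a check, under the assumption that each thread is described by the table it uses and a state string of either "RUNNABLE" or "WAIT". The function `may_modify_tables` and the thread tuples are hypothetical, purely to pin down the rule.

```python
def may_modify_tables(threads, table_id):
    """threads: iterable of (table_id, state) pairs."""
    return all(state == "WAIT" for tid, state in threads if tid == table_id)

threads = [("A", "WAIT"), ("A", "RUNNABLE"), ("B", "WAIT")]
blocked = may_modify_tables(threads, "A")   # a user of table A is still runnable
allowed = may_modify_tables(threads, "B")   # every user of table B is waiting
```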
>
> The reason I'm concerned is if there are committed stores queued
> to write to a physical page and the translation changes,
> the PTE writer might think that the old page won't be written
> any more but in fact there could be many pending writes scattered
> about the system in various hit-under-miss MSHR buffers.
<
Under the rule above: the ST cannot enter execution because the MMU
tables are being modified and the thread(s) is(are) in WAIT states.
<
> That physical page might be reallocated for some other use
> only to be stomped on by all those late writes.
> >> Also any ITLB copies that fetch buffers might have need to be zapped.
> >>
> >> The PTE value must not be allowed to change until all ACKs are received,
> >> just like when switching a cache line from Shared to Exclusive state.
> >
> > The PTE writer can use Exclusive ownership of the PTE cache line to prevent
> > remote TLB's from reloading the PTE too quickly after a shootdown.
> >
> > The PTE writer acquires Exclusive ownership and invalidates all
> > remote cached copies of the PTE cache line. Only after does it
> > purge the TLB entries, trigger a replay, and send its ACK.
> > The replay will try to retranslate the VA and the TLB will
> > try to reread the PTE cache line in a Shared state,
> > but will stall until the PTE writer finishes its change
> > which only occurs after it receives ACKs from all nodes.
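
The handshake quoted above can be sketched as a toy model of the writer side. This is only an illustration under stated assumptions: the class `PteWriter` and its method names are hypothetical, PTEs are plain integers, and the deadlock case raised earlier in the thread (a committed store to the same PTE cache line) is not modelled.

```python
class PteWriter:
    """Toy model of the PTE writer during a shootdown (hypothetical names)."""

    def __init__(self, remote_nodes, old_pte, new_pte):
        self.pending_acks = set(remote_nodes)  # nodes that have not ACKed yet
        self.pte = old_pte
        self.new_pte = new_pte
        self.holds_exclusive = True            # PTE line held Exclusive throughout

    def on_ack(self, node):
        """A remote node has purged its TLB entry and triggered its replay."""
        self.pending_acks.discard(node)
        if not self.pending_acks:
            self.pte = self.new_pte            # change happens only after all ACKs
            self.holds_exclusive = False       # line may be read Shared again

    def remote_pte_read(self):
        """Remote table walk: None means the reader stalls on the Exclusive line."""
        return None if self.holds_exclusive else self.pte
```

A remote replay that retranslates early simply sees `None` (a stall) until the last ACK arrives, which is the property the quoted text relies on.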

Re: Non-RISC-ness in AMD64

<KeHzJ.189942$831.144461@fx40.iad>

https://www.novabbs.com/devel/article-flat.php?id=22660&group=comp.arch#22660

Path: i2pn2.org!i2pn.org!aioe.org!feeder5.feed.usenet.farm!feeder1.feed.usenet.farm!feed.usenet.farm!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx40.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me> <6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org> <1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at> <TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org> <f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad> <e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com> <cC2zJ.104301$zF3.1172@fx03.iad> <ll4zJ.121128$IB7.59871@fx02.iad> <2X4zJ.72820$Bu7.26077@fx26.iad> <d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com>
In-Reply-To: <d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 104
Message-ID: <KeHzJ.189942$831.144461@fx40.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Fri, 31 Dec 2021 17:38:50 UTC
Date: Fri, 31 Dec 2021 12:38:28 -0500
X-Received-Bytes: 6061
X-Original-Bytes: 6010

by: EricP - Fri, 31 Dec 2021 17:38 UTC

MitchAlsup wrote:
> On Wednesday, December 29, 2021 at 4:03:44 PM UTC-6, EricP wrote:
>> EricP wrote:
>>>>
>>>> Any LSQ entry that looks up a translation and is holding the physical
>>>> address before it is used is *also effectively a cached TLB entry*.
>>>> If a RMW instruction LSQ entry holds its translation while the
>>>> OP executes then it is just a longer duration cached copy.
>>>>
>>>> When a remote TLB receives a shootdown coherence msg, it removes its
>>>> own matching entries but it must not ACK until it knows there are
>>>> no other copies of the PTE (address, access protections) in the LSQ.
> <
> If the ST already has a valid Physical Address, why is it not allowed to complete?

I was looking for a way to make the delay to sending an ACK predictable.

A ST may have its physical address but it can't retire and commit
until all prior instructions have retired.

>>>> The easiest way is to wait until LSQ empties but that could
>>>> have other consequences (it could take a long time for the ACK).
>>> Actually it is more than this since in OoO a LD instruction can
>>> bypass other LD's or ST's, access a physical address, complete,
>>> and be removed from the LSQ.
> <
> The LD should not be able to be removed from the queue until all stores
> in the queue have known-physical-addresses. Yes, it can run early, but
> it cannot run (bypass forward) over a store which will interfere with the
> data the LD wants.

Oops right. There are a lot of causality balls to juggle.
Virtual addresses can resolve in any order and translate in any order.
Also it's not just unknown stores that loads should not bypass.
Loads should not bypass any unknown addresses.
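
A minimal sketch of that bypass rule, assuming a queue ordered oldest first and whole-address matching. `LsqEntry` and `load_may_bypass` are hypothetical names, and store-to-load forwarding is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LsqEntry:
    is_store: bool
    paddr: Optional[int] = None   # None until AGEN and translation have completed


def load_may_bypass(lsq: List[LsqEntry], load_idx: int) -> bool:
    """Return True if the load at load_idx may run ahead of every older entry.

    Conservative policy: never bypass an older entry whose address is still
    unknown, and never bypass an older store to the same physical address.
    """
    load = lsq[load_idx]
    if load.is_store or load.paddr is None:
        return False
    for older in lsq[:load_idx]:
        if older.paddr is None:
            return False          # unknown address: it might alias the load
        if older.is_store and older.paddr == load.paddr:
            return False          # the load needs this store's data
    return True
```

Relaxing the first check to unknown stores only gives the weaker rule from the quoted text; the version here also blocks on unknown older loads.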

>>> But the value loaded is implicitly tied to the translation used
>>> even if there is no trace of that physical address in the LSQ.
> <
> The value loaded is tied to any older store in any queue {pre AGEN in
> reservation station, and post AGEN in the miss queues.}

This would be easier if we triggered a replay and tossed all the
in-flight values and any existing translations.

Then we'd only have to deal with the values in miss queues
as those are committed values from retired stores.
We don't know which, if any, of them is a store to the old page
that is retiring so we have to assume that all of them are.

We have to ensure all pending stores are complete and have
reached their coherence points and updated local cache.
Only when there are no references to the old page is
it available for reuse.
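
Roughly, the remote side of that sequence might look like the sketch below. It assumes a single core with one miss queue and treats every committed store as a possible write to the old page. `CoreModel` and its fields are hypothetical, and the deadlock case mentioned further down is ignored.

```python
from collections import deque


class CoreModel:
    """Toy remote core receiving a TLB shootdown (hypothetical names)."""

    def __init__(self):
        self.tlb = {}               # virtual page -> physical page
        self.in_flight = []         # speculative, non-retired operations
        self.miss_queue = deque()   # committed stores as (paddr, value), oldest first
        self.cache = {}             # paddr -> value

    def handle_shootdown(self, vpage):
        self.tlb.pop(vpage, None)            # drop the stale translation
        replayed = len(self.in_flight)
        self.in_flight.clear()               # replay trap: toss in-flight work
        while self.miss_queue:               # drain ALL committed stores, in order
            paddr, value = self.miss_queue.popleft()
            self.cache[paddr] = value
        return "ACK", replayed               # only now is the old page unreferenced
```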

>>> If the translation was allowed to change out of order
>>> it could allow a younger LD to use an older translation,
>>> and an older LD to use a younger translation.
>>> Which would create an illegal coherence scenario.
> <
> This gets you into the game of asking out-of-order with respect to whom ?
> <
> OoO wrt itself (same core)
> OoO wrt any core in the system

Good question. I'm wondering if thinking about this as
versioning in a database is helpful.

>>> So the TLB shootdown has to trigger a replay trap for all non-retired
>>> instructions and wait for committed ST's to flush to cache
>>> before sending the ACK.
>> Hmmm... this could deadlock.
>> If it had to flush committed ST's to cache before sending the ACK then
>> this could deadlock with the PTE writer holding the cache line Exclusive
>> awaiting those ACKs if the store was to the same PTE cache line.
> <
> I think what we have here is a scenario where SW is not allowed to
> modify MMU table entries while any thread using those tables is
> in a runnable state. Threads need to be waiting for it to be legal to
> alter their tables.
>> The reason I'm concerned is if there are committed stores queued
>> to write to a physical page and the translation changes,
>> the PTE writer might think that the old page won't be written
>> any more but in fact there could be many pending writes scattered
>> about the system in various hit-under-miss MSHR buffers.
> <
> Under the rule above: the ST cannot enter execution because the MMU
> tables are being modified and the thread(s) is(are) in WAIT states.

I want to reset and start again from scratch.
I'm going to assume that a replay trap is triggered and
tosses all in-flight instructions, values and translations.
Then we can add more complex scenarios back in later.

There is a lot of similarity in this to an OS managing files.
The PTE is acting like an OS file handle,
the copies of the physical address are like reference counts
on OS kernel resources, and the memory values are file contents.

Updating the PTE is like closing an old file handle and opening a new
handle to the same file without losing any file updates made using
the old handle.

Re: Non-RISC-ness in AMD64

<88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22665&group=comp.arch#22665

X-Received: by 2002:ad4:5dc8:: with SMTP id m8mr32516801qvh.71.1640979580240;
Fri, 31 Dec 2021 11:39:40 -0800 (PST)
X-Received: by 2002:a9d:7443:: with SMTP id p3mr25826707otk.331.1640979580014;
Fri, 31 Dec 2021 11:39:40 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 31 Dec 2021 11:39:39 -0800 (PST)
In-Reply-To: <KeHzJ.189942$831.144461@fx40.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:44d8:5de8:1a0e:df71;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:44d8:5de8:1a0e:df71
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me>
<6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>
<1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at>
<TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org>
<f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad>
<e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com> <cC2zJ.104301$zF3.1172@fx03.iad>
<ll4zJ.121128$IB7.59871@fx02.iad> <2X4zJ.72820$Bu7.26077@fx26.iad>
<d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com> <KeHzJ.189942$831.144461@fx40.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 31 Dec 2021 19:39:40 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 113

by: MitchAlsup - Fri, 31 Dec 2021 19:39 UTC

On Friday, December 31, 2021 at 11:38:55 AM UTC-6, EricP wrote:
> MitchAlsup wrote:
> > On Wednesday, December 29, 2021 at 4:03:44 PM UTC-6, EricP wrote:
> >> EricP wrote:
> >>>>
> >>>> Any LSQ entry that looks up a translation and is holding the physical
> >>>> address before it is used is *also effectively a cached TLB entry*.
> >>>> If a RMW instruction LSQ entry holds its translation while the
> >>>> OP executes then it is just a longer duration cached copy.
> >>>>
> >>>> When a remote TLB receives a shootdown coherence msg, it removes its
> >>>> own matching entries but it must not ACK until it knows there are
> >>>> no other copies of the PTE (address, access protections) in the LSQ.
> > <
> > > If the ST already has a valid Physical Address, why is it not allowed to complete?
> I was looking for a way to make the delay to sending an ACK predictable.
>
> A ST may have its physical address but it can't retire and commit
> until all prior instructions have retired.
<
Yes, It cannot retire, but it has everything it needs to retire.
<
> >>>> The easiest way is to wait until LSQ empties but that could
> >>>> have other consequences (it could take a long time for the ACK).
> >>> Actually it is more than this since in OoO a LD instruction can
> >>> bypass other LD's or ST's, access a physical address, complete,
> >>> and be removed from the LSQ.
> > <
> > The LD should not be able to be removed from the queue until all stores
> > in the queue have known-physical-addresses. Yes, it can run early, but
> > it cannot run (bypass forward) over a store which will interfere with the
> > data the LD wants.
<
> Oops right. There are a lot of causality balls to juggle.
> Virtual addresses can resolve in any order and translate in any order.
> Also it's not just unknown stores that loads should not bypass.
> Loads should not bypass any unknown addresses.
<
In Mc 88120 we allowed LDs to bypass older LDs with unknown addresses,
but this becomes dangerous if the bypassed LD (or the LD at hand) is
to MMI/O address space (or configuration address space).
<
> >>> But the value loaded is implicitly tied to the translation used
> >>> even if there is no trace of that physical address in the LSQ.
> > <
> > The value loaded is tied to any older store in any queue {pre AGEN in
> > reservation station, and post AGEN in the miss queues.}
<
> This would be easier if we triggered a replay and tossed all the
> in-flight values and any existing translations.
>
> Then we'd only have to deal with the values in miss queues
> as those are committed values from retired stores.
> We don't know which, if any, of them is a store to the old page
> that is retiring so we have to assume that all of them are.
>
> We have to ensure all pending stores are complete and have
> reached their coherence points and updated local cache.
> Only when there are no references to the old page is
> it available for reuse.
<
> >>> If the translation was allowed to change out of order
> >>> it could allow a younger LD to use an older translation,
> >>> and an older LD to use a younger translation.
> >>> Which would create an illegal coherence scenario.
> > <
> > This gets you into the game of asking out-of-order with respect to whom ?
> > <
> > OoO wrt itself (same core)
> > OoO wrt any core in the system
<
> Good question. I'm wondering if thinking about this as
> versioning in a database is helpful.
<
> >>> So the TLB shootdown has to trigger a replay trap for all non-retired
> >>> instructions and wait for committed ST's to flush to cache
> >>> before sending the ACK.
> >> Hmmm... this could deadlock.
> >> If it had to flush committed ST's to cache before sending the ACK then
> >> this could deadlock with the PTE writer holding the cache line Exclusive
> >> awaiting those ACKs if the store was to the same PTE cache line.
> > <
> > I think what we have here is a scenario where SW is not allowed to
> > modify MMU table entries while any thread using those tables is
> > in a runnable state. Threads need to be waiting for it to be legal to
> > alter their tables.
> >> The reason I'm concerned is if there are committed stores queued
> >> to write to a physical page and the translation changes,
> >> the PTE writer might think that the old page won't be written
> >> any more but in fact there could be many pending writes scattered
> >> about the system in various hit-under-miss MSHR buffers.
> > <
> > Under the rule above: the ST cannot enter execution because the MMU
> > tables are being modified and the thread(s) is(are) in WAIT states.
<
> I want to reset and start again from scratch.
> I'm going to assume that a replay trap is triggered and
> tosses all in-flight instructions, values and translations.
> Then we can add more complex scenarios back in later.
<
Brutal, but often effective--especially if the scenario occurs seldom.
>
> There is a lot of similarity in this to an OS managing files.
> The PTE is acting like an OS file handle,
> the copies of the physical address are like reference counts
> on OS kernel resources, and the memory values are file contents.
>
> Updating the PTE is like closing an old file handle and opening a new
> handle to the same file without losing any file updates made using
> the old handle.
<
And the question at hand is, what does the "system" do with all the writes
to the file between the closing of one handle and the opening of another ??
And it is not allowed to lose the order of the writes, either.

Re: Non-RISC-ness in AMD64

<f7ca2f71-460f-42a7-91d7-14957109f3c9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22672&group=comp.arch#22672

X-Received: by 2002:a05:6214:5286:: with SMTP id kj6mr33758668qvb.74.1641000091932;
Fri, 31 Dec 2021 17:21:31 -0800 (PST)
X-Received: by 2002:a05:6830:1445:: with SMTP id w5mr26649735otp.112.1641000091683;
Fri, 31 Dec 2021 17:21:31 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 31 Dec 2021 17:21:31 -0800 (PST)
In-Reply-To: <88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2607:fea8:1de1:fb00:103e:69ab:433f:589c;
posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 2607:fea8:1de1:fb00:103e:69ab:433f:589c
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me>
<6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>
<1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at>
<TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org>
<f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad>
<e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com> <cC2zJ.104301$zF3.1172@fx03.iad>
<ll4zJ.121128$IB7.59871@fx02.iad> <2X4zJ.72820$Bu7.26077@fx26.iad>
<d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com> <KeHzJ.189942$831.144461@fx40.iad>
<88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f7ca2f71-460f-42a7-91d7-14957109f3c9n@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: robfi...@gmail.com (robf...@gmail.com)
Injection-Date: Sat, 01 Jan 2022 01:21:31 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 10

by: robf...@gmail.com - Sat, 1 Jan 2022 01:21 UTC

>> A ST may have its physical address but it can't retire and commit
>> until all prior instructions have retired.
><
>Yes, It cannot retire, but it has everything it needs to retire.

?What is meant by retire or commit? In my cores I allow the ST operation
to be completed if there is no possibility of flow control change in prior
instructions. Memory may be updated even if prior instructions have not
completed yet.

Re: Non-RISC-ness in AMD64

<Ow%zJ.268897$I%1.169134@fx36.iad>

https://www.novabbs.com/devel/article-flat.php?id=22681&group=comp.arch#22681

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx36.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me> <6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org> <1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at> <TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org> <f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad> <e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com> <cC2zJ.104301$zF3.1172@fx03.iad> <ll4zJ.121128$IB7.59871@fx02.iad> <2X4zJ.72820$Bu7.26077@fx26.iad> <d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com> <KeHzJ.189942$831.144461@fx40.iad> <88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com> <f7ca2f71-460f-42a7-91d7-14957109f3c9n@googlegroups.com>
In-Reply-To: <f7ca2f71-460f-42a7-91d7-14957109f3c9n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 27
Message-ID: <Ow%zJ.268897$I%1.169134@fx36.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sat, 01 Jan 2022 16:43:26 UTC
Date: Sat, 01 Jan 2022 11:42:34 -0500
X-Received-Bytes: 2603

by: EricP - Sat, 1 Jan 2022 16:42 UTC

robf...@gmail.com wrote:
>>> A ST may have its physical address but it can't retire and commit
>>> until all prior instructions have retired.
>> <
>> Yes, It cannot retire, but it has everything it needs to retire.
>
>
> ?What is meant by retire or commit? In my cores I allow the ST operation
> to be completed if there is no possibility of flow control change in prior
> instructions. Memory may be updated even if prior instructions have not
> completed yet.

Similar. The difference sounds like how and when exceptions are detected.

I think of Retire as the point where the oldest instruction is removed
from the queue, and if it had no exceptions updates the program state.
For a ST instruction it initiates the sequence that ultimately
writes a value to an address.

My method only looks at the oldest instruction to see if
it has an exception, and tosses any changes if it does.

Your method requires knowing that no future partially complete
instructions can trigger an exception or trigger a replay,
which allows it to initiate a non-reversible state change ASAP.
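
To make the first method concrete, here is a minimal sketch of in-order retirement that inspects only the oldest entry. The reorder-buffer layout (a deque of dicts with `done`, `exception`, `dest` and `value` keys) and the name `retire_oldest` are assumptions for illustration.

```python
from collections import deque


def retire_oldest(rob, arch_state):
    """Try to retire the oldest instruction in rob (a deque, oldest first).

    Returns "stall" if the head is not finished, "exception" if it faulted
    (all younger work is tossed), or "retired" after updating arch_state.
    """
    if not rob or not rob[0]["done"]:
        return "stall"
    head = rob.popleft()
    if head["exception"]:
        rob.clear()                              # discard every younger instruction
        return "exception"
    arch_state[head["dest"]] = head["value"]     # program state changes only here
    return "retired"
```

The second method moves the irreversible step earlier, at the cost of proving that no older instruction can still fault or replay.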

Re: Non-RISC-ness in AMD64

<f5ca9e45-b8de-4662-b53e-b56c5cd7c298n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22685&group=comp.arch#22685

X-Received: by 2002:a05:6214:2aab:: with SMTP id js11mr35831759qvb.54.1641064348084;
Sat, 01 Jan 2022 11:12:28 -0800 (PST)
X-Received: by 2002:aca:c44:: with SMTP id i4mr2120025oiy.0.1641064347842;
Sat, 01 Jan 2022 11:12:27 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 1 Jan 2022 11:12:27 -0800 (PST)
In-Reply-To: <f7ca2f71-460f-42a7-91d7-14957109f3c9n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8861:c1fc:c10c:a9f4;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8861:c1fc:c10c:a9f4
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me>
<6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>
<1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at>
<TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org>
<f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad>
<e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com> <cC2zJ.104301$zF3.1172@fx03.iad>
<ll4zJ.121128$IB7.59871@fx02.iad> <2X4zJ.72820$Bu7.26077@fx26.iad>
<d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com> <KeHzJ.189942$831.144461@fx40.iad>
<88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com> <f7ca2f71-460f-42a7-91d7-14957109f3c9n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f5ca9e45-b8de-4662-b53e-b56c5cd7c298n@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sat, 01 Jan 2022 19:12:28 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 14

by: MitchAlsup - Sat, 1 Jan 2022 19:12 UTC

On Friday, December 31, 2021 at 7:21:32 PM UTC-6, robf...@gmail.com wrote:
> >> A ST may have its physical address but it can't retire and commit
> >> until all prior instructions have retired.
> ><
> >Yes, It cannot retire, but it has everything it needs to retire.
<
> ?What is meant by retire or commit? In my cores I allow the ST operation
> to be completed if there is no possibility of flow control change in prior
> instructions. Memory may be updated even if prior instructions have not
> completed yet.
<
Commit is the point where there are no older instructions that could cause
the ST not to be performed due to raising of an exception.
<
Retire is the point where the ST can be performed to unbackupable memory.

Re: Non-RISC-ness in AMD64

<50c9f47b-2cd5-4c4c-ab64-b6cb50e42ba7n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22686&group=comp.arch#22686

X-Received: by 2002:a05:620a:1988:: with SMTP id bm8mr28835669qkb.494.1641078454420; Sat, 01 Jan 2022 15:07:34 -0800 (PST)
X-Received: by 2002:aca:646:: with SMTP id 67mr31520485oig.175.1641078454169; Sat, 01 Jan 2022 15:07:34 -0800 (PST)
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 1 Jan 2022 15:07:33 -0800 (PST)
In-Reply-To: <f5ca9e45-b8de-4662-b53e-b56c5cd7c298n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2607:fea8:1de1:fb00:5505:c070:7945:2082; posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 2607:fea8:1de1:fb00:5505:c070:7945:2082
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me> <6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org> <1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at> <TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org> <f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad> <e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com> <cC2zJ.104301$zF3.1172@fx03.iad> <ll4zJ.121128$IB7.59871@fx02.iad> <2X4zJ.72820$Bu7.26077@fx26.iad> <d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com> <KeHzJ.189942$831.144461@fx40.iad> <88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com> <f7ca2f71-460f-42a7-91d7-14957109f3c9n@googlegroups.com> <f5ca9e45-b8de-4662-b53e-b56c5cd7c298n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <50c9f47b-2cd5-4c4c-ab64-b6cb50e42ba7n@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: robfi...@gmail.com (robf...@gmail.com)
Injection-Date: Sat, 01 Jan 2022 23:07:34 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 43

by: robf...@gmail.com - Sat, 1 Jan 2022 23:07 UTC

On Saturday, January 1, 2022 at 2:12:29 PM UTC-5, MitchAlsup wrote:
> On Friday, December 31, 2021 at 7:21:32 PM UTC-6, robf...@gmail.com wrote:
> > >> A ST may have its physical address but it can't retire and commit
> > >> until all prior instructions have retired.
> > ><
> > >Yes, It cannot retire, but it has everything it needs to retire.
> <
> > ?What is meant by retire or commit? In my cores I allow the ST operation
> > to be completed if there is no possibility of flow control change in prior
> > instructions. Memory may be updated even if prior instructions have not
> > completed yet.
> <
> Commit is the point where there are no older instructions that could cause
> the ST not to be performed due to raising of an exception.
> <
> Retire is the point where the ST can be performed to unbackupable memory.

Thanks, I have some difficulty understanding the difference between commit
and retire, but I think I have got it now. Commit means the machine is dedicated
to performing the instruction. The instruction might be executed but registers
are not written yet. Retire means the machine state is updated, registers
updated. Normally stores do not update memory until they retire. I called the
retire stage in my core commit, oh well. Now I am confusing commit and issue.

There are many instructions that fall under the category of not causing
exceptions or changes of flow control. For my machine, there is a signal
coming out of the decoder (canex) that indicates such. Instructions such as
unsigned multiply and divide, add, and, shifts, etc. qualify. However, if debug
mode is on then potentially any instruction can cause an exception – single
stepping, so then stores do not update memory until they are ready to retire.
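
The gating described above could be sketched like this. The polarity of the flag is an assumption: here `True` means the older instruction can still raise an exception or change flow, which may be the inverse of the actual canex signal. The function name is hypothetical.

```python
def store_may_write_early(older_can_except, debug_mode):
    """Decide whether a store may update memory before it is the oldest instruction.

    older_can_except holds one bool per older, not-yet-retired instruction:
    True if that instruction could still raise an exception or change flow.
    """
    if debug_mode:
        # Single stepping: any instruction may trap, so wait until the store
        # is the oldest instruction (nothing older remains).
        return not older_can_except
    return not any(older_can_except)
```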

Re: Non-RISC-ness in AMD64

<5a2a84d2-5fc4-4864-9d79-772f4b18d075n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22687&group=comp.arch#22687

X-Received: by 2002:a05:622a:44e:: with SMTP id o14mr35183178qtx.369.1641079170031; Sat, 01 Jan 2022 15:19:30 -0800 (PST)
X-Received: by 2002:aca:c44:: with SMTP id i4mr2138822oiy.0.1641079169797; Sat, 01 Jan 2022 15:19:29 -0800 (PST)
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 1 Jan 2022 15:19:29 -0800 (PST)
In-Reply-To: <50c9f47b-2cd5-4c4c-ab64-b6cb50e42ba7n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8861:c1fc:c10c:a9f4; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8861:c1fc:c10c:a9f4
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me> <6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org> <1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at> <TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org> <f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad> <e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com> <cC2zJ.104301$zF3.1172@fx03.iad> <ll4zJ.121128$IB7.59871@fx02.iad> <2X4zJ.72820$Bu7.26077@fx26.iad> <d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com> <KeHzJ.189942$831.144461@fx40.iad> <88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com> <f7ca2f71-460f-42a7-91d7-14957109f3c9n@googlegroups.com> <f5ca9e45-b8de-4662-b53e-b56c5cd7c298n@googlegroups.com> <50c9f47b-2cd5-4c4c-ab64-b6cb50e42ba7n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5a2a84d2-5fc4-4864-9d79-772f4b18d075n@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sat, 01 Jan 2022 23:19:30 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 83

by: MitchAlsup - Sat, 1 Jan 2022 23:19 UTC

On Saturday, January 1, 2022 at 5:07:35 PM UTC-6, robf...@gmail.com wrote:
> On Saturday, January 1, 2022 at 2:12:29 PM UTC-5, MitchAlsup wrote:
> > On Friday, December 31, 2021 at 7:21:32 PM UTC-6, robf...@gmail.com wrote:
> > > >> A ST may have its physical address but it can't retire and commit
> > > >> until all prior instructions have retired.
> > > ><
> > > >Yes, It cannot retire, but it has everything it needs to retire.
> > <
> > > ?What is meant by retire or commit? In my cores I allow the ST operation
> > > to be completed if there is no possibility of flow control change in prior
> > > instructions. Memory may be updated even if prior instructions have not
> > > completed yet.
> > <
> > Commit is the point where there are no older instructions that could cause
> > the ST not to be performed due to raising of an exception.
> > <
> > Retire is the point where the ST can be performed to unbackupable memory.
<
> Thanks, I have some difficulty understanding the difference between commit
> and retire, but I think I have got it now. Commit means the machine is dedicated
> to performing the instruction. The instruction might be executed but registers
> are not written yet. Retire means the machine state is updated, registers
> updated. Normally stores do not update memory until they retire. I called the
> retire stage in my core commit, oh well. Now I am confusing commit and issue.
<
In the Mc 88120, stores could be written into the conditional cache where younger
loads could access them, but this was a place where stores could still be "thrown
away" due to exceptions or even branch mispredictions. These were executed even
prior to commit.
<
Between commit and retire, stores from the conditional cache would be migrated to
the <real> cache or to main memory depending on set-overload.
<
At retire, all the resources being used to track the instruction in progress are made
available to a new instruction (so it can enter the execution window).
<
This is how we (Mike Shebanow and I) defined the nomenclature in 1991. While doing
the Mc 88120 we did not understand the subtle difference until over a year into the
design.
>
> There are many instructions that fall under the category of not causing
> exceptions or changes of flow control. For my machine, there is a signal
> coming out of the decoder (canex) that indicates such. Instructions such as
> unsigned multiply and divide, add, and, shifts, etc. qualify. However, if debug
> mode is on then potentially any instruction can cause an exception – single
> stepping, so then stores do not update memory until they are ready to retire.
<
You are using the word "retire" to mean it is safe to commit state (visible outside
the execution of this thread).
This is what I use the word "commit" to indicate: it is safe to commit the store
to memory that cannot be undone (visible outside of this thread).
I (we) then use the word "retire" to denote the point where execution resources are
handed back so new instructions can use them.
<
This may not be the nomenclature used by UCB or Stanford.
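
For anyone trying to keep the terms straight, here is a rough Python sketch of the store lifecycle under the nomenclature above. The `Core` class, the `Stage` names, and the one-store-per-address simplification are all made up for illustration; it is one reading of the description, not how the Mc 88120 was built.

```python
from enum import Enum, auto


class Stage(Enum):
    EXECUTED = auto()   # value is in the conditional cache and can still be thrown away
    COMMITTED = auto()  # no older instruction can prevent the store any more


class Core:
    """Toy model: at most one in-flight store per address (a simplification)."""

    def __init__(self):
        self.conditional = {}  # addr -> value, speculative store data
        self.memory = {}       # state visible outside this thread
        self.stage = {}        # addr -> Stage of the in-flight store

    def execute_store(self, addr, value):
        self.conditional[addr] = value
        self.stage[addr] = Stage.EXECUTED

    def load(self, addr):
        # Younger loads may read conditional data before it is committed.
        if addr in self.conditional:
            return self.conditional[addr]
        return self.memory.get(addr)

    def squash(self, addr):
        # Exception or branch mispredict: only legal before commit.
        if self.stage.get(addr) is not Stage.EXECUTED:
            raise RuntimeError("store is not squashable")
        del self.conditional[addr]
        del self.stage[addr]

    def commit(self, addr):
        if self.stage.get(addr) is not Stage.EXECUTED:
            raise RuntimeError("store is not awaiting commit")
        self.stage[addr] = Stage.COMMITTED

    def retire(self, addr):
        # The post says migration happens between commit and retire;
        # this sketch folds the migration into retire for brevity.
        if self.stage.get(addr) is not Stage.COMMITTED:
            raise RuntimeError("store is not committed")
        self.memory[addr] = self.conditional.pop(addr)
        del self.stage[addr]  # tracking resources are handed back
```

The point of the model is only that a store is readable by younger loads well before it is visible outside the thread, and that it stops being squashable at commit rather than at retire.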

Re: Non-RISC-ness in AMD64

<ZynAJ.135981$QB1.40170@fx42.iad>

https://www.novabbs.com/devel/article-flat.php?id=22691&group=comp.arch#22691

Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx42.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me> <6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org> <1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at> <TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org> <f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad> <e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com> <cC2zJ.104301$zF3.1172@fx03.iad> <ll4zJ.121128$IB7.59871@fx02.iad> <2X4zJ.72820$Bu7.26077@fx26.iad> <d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com> <KeHzJ.189942$831.144461@fx40.iad> <88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com> <f7ca2f71-460f-42a7-91d7-14957109f3c9n@googlegroups.com> <f5ca9e45-b8de-4662-b53e-b56c5cd7c298n@googlegroups.com> <50c9f47b-2cd5-4c4c-ab64-b6cb50e42ba7n@googlegroups.com>
In-Reply-To: <50c9f47b-2cd5-4c4c-ab64-b6cb50e42ba7n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 56
Message-ID: <ZynAJ.135981$QB1.40170@fx42.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sun, 02 Jan 2022 20:04:09 UTC
Date: Sun, 02 Jan 2022 15:03:46 -0500
X-Received-Bytes: 4424
X-Original-Bytes: 4373

by: EricP - Sun, 2 Jan 2022 20:03 UTC

robf...@gmail.com wrote:
> On Saturday, January 1, 2022 at 2:12:29 PM UTC-5, MitchAlsup wrote:
>> On Friday, December 31, 2021 at 7:21:32 PM UTC-6, robf...@gmail.com wrote:
>>>>> A ST may have its physical address but it can't retire and commit
>>>>> until all prior instructions have retired.
>>>> <
>>>> Yes, It cannot retire, but it has everything it needs to retire.
>> <
>>> ?What is meant by retire or commit? In my cores I allow the ST operation
>>> to be completed if there is no possibility of flow control change in prior
>>> instructions. Memory may be updated even if prior instructions have not
>>> completed yet.
>> <
>> Commit is the point where there are no older instructions that could cause
>> the ST not to be performed due to raising of an exception.
>> <
>> Retire is the point where the ST can be performed to unbackupable memory.
>
> Thanks, I have some difficulty understanding the difference between commit
> and retire, but I think I have got it now. Commit means the machine is dedicated
> to performing the instruction. The instruction might be executed but registers
> are not written yet. Retire means the machine state is updated, registers
> updated. Normally stores do not update memory until they retire. I called the
> retire stage in my core commit, oh well. Now I am confusing commit and issue.
>
> There are many instructions that fall under the category of not causing
> exceptions or changes of flow control. For my machine, there is a signal
> coming out of the decoder (canex) that indicates such. Instructions such as
> unsigned multiply and divide, add, and, shifts, etc. qualify. However, if debug
> mode is on then potentially any instruction can cause an exception – single
> stepping, so then stores do not update memory until they are ready to retire.

The terms Commit and Retire are often used interchangeably.

While Retire is thought of as a single step where the new state is
made permanent and resources allocated to an instruction are recovered,
depending on the uArch and instruction it can require multiple steps.

A ST instruction in particular can linger in the Load-Store Queue
after its Instruction Queue entry has been retired and deleted,
waiting to be passed to the memory subsystem. And there could
be multiple committed ST operations waiting ahead of it.
After it is transferred to the memory subsystem the LSQ entry is recovered.

Looking at a diagram of the IBM z15 pipeline I see 10 stages
for "Completion" and "Checkpointing".

https://www.servethehome.com/ibm-z15-mainframe-processor-design/hot-chips-32-ibm-z15-processor-pipeline/

Note also that there is a difference between when a ST instruction
retires and starts to transfer a new value to memory,
and when that new value becomes coherently visible to all cores
as the "One True Value" of an address.
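
A small Python sketch of the lingering-store point, under the assumption of a simple FIFO of retired stores. The `StoreQueue` class and its method names are hypothetical, and the coherence step (when the value becomes globally visible) is not modeled at all.

```python
from collections import deque


class StoreQueue:
    """Retired stores wait here, oldest first, until memory accepts them."""

    def __init__(self):
        self.pending = deque()  # (addr, value) of retired but undrained stores
        self.memory = {}        # stand-in for the memory subsystem

    def retire_store(self, addr, value):
        # The instruction queue entry is freed at this point;
        # the LSQ entry lives on until the store is drained.
        self.pending.append((addr, value))

    def drain_one(self):
        # Hand the oldest waiting store to memory and recover its LSQ entry.
        if not self.pending:
            return False
        addr, value = self.pending.popleft()
        self.memory[addr] = value
        return True
```

After two calls to `retire_store` both instructions are gone from the instruction queue, yet `memory` is still unchanged until `drain_one` runs for each of them in order.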

Re: Non-RISC-ness in AMD64

<2022Jan3.141346@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=22700&group=comp.arch#22700

Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Mon, 03 Jan 2022 13:13:46 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 14
Message-ID: <2022Jan3.141346@mips.complang.tuwien.ac.at>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <cC2zJ.104301$zF3.1172@fx03.iad> <ll4zJ.121128$IB7.59871@fx02.iad> <2X4zJ.72820$Bu7.26077@fx26.iad> <d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com> <KeHzJ.189942$831.144461@fx40.iad> <88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com> <f7ca2f71-460f-42a7-91d7-14957109f3c9n@googlegroups.com> <f5ca9e45-b8de-4662-b53e-b56c5cd7c298n@googlegroups.com> <50c9f47b-2cd5-4c4c-ab64-b6cb50e42ba7n@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="d2ff4cf4d9565f5ed7aa38586fbd14d6";
logging-data="8985"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19+VgR3Ed8YhZG+Dz/XleRR"
Cancel-Lock: sha1:DSaG83Q5MDLQCZ+9IrGe5hCIXAQ=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Mon, 3 Jan 2022 13:13 UTC

"robf...@gmail.com" <robfi680@gmail.com> writes:
>Thanks, I have some difficulty understanding the difference between commit
>and retire

That's not surprising, because it's the same point in time, and
therefore people (including me) use these words as synonyms. Commit
is when the architectural effect of the instruction becomes permanent.
After that there is no point in keeping the instruction around, so the
instruction can retire.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Non-RISC-ness in AMD64

<3bb8d98f-9401-4019-8a10-d089ab47456dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22703&group=comp.arch#22703

Newsgroups: comp.arch

X-Received: by 2002:a05:620a:4446:: with SMTP id w6mr32954325qkp.631.1641230784529;
Mon, 03 Jan 2022 09:26:24 -0800 (PST)
X-Received: by 2002:a05:6808:2396:: with SMTP id bp22mr38172486oib.78.1641230784356;
Mon, 03 Jan 2022 09:26:24 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 3 Jan 2022 09:26:24 -0800 (PST)
In-Reply-To: <2022Jan3.141346@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:5959:7534:4159:ef20;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:5959:7534:4159:ef20
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <cC2zJ.104301$zF3.1172@fx03.iad>
<ll4zJ.121128$IB7.59871@fx02.iad> <2X4zJ.72820$Bu7.26077@fx26.iad>
<d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com> <KeHzJ.189942$831.144461@fx40.iad>
<88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com> <f7ca2f71-460f-42a7-91d7-14957109f3c9n@googlegroups.com>
<f5ca9e45-b8de-4662-b53e-b56c5cd7c298n@googlegroups.com> <50c9f47b-2cd5-4c4c-ab64-b6cb50e42ba7n@googlegroups.com>
<2022Jan3.141346@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3bb8d98f-9401-4019-8a10-d089ab47456dn@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 03 Jan 2022 17:26:24 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 15

by: MitchAlsup - Mon, 3 Jan 2022 17:26 UTC

On Monday, January 3, 2022 at 7:19:43 AM UTC-6, Anton Ertl wrote:
> "robf...@gmail.com" <robf...@gmail.com> writes:
> >Thanks, I have some difficulty understanding the difference between commit
> >and retire
> That's not surprising, because it's the same point in time, and
> therefore people (including me) use these words as synonyms. Commit
> is when the architectural effect of the instruction becomes permanent.
s/can/is allowed to/
> After that there is no point in keeping the instruction around, so the
> instruction can retire.
One may have a number of instructions that retire together and some other
instruction can be holding up the STs retirement.
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Non-RISC-ness in AMD64

<sr0umo$73n$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=22734&group=comp.arch#22734

Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Mon, 3 Jan 2022 23:58:17 -0800
Organization: A noiseless patient Spider
Lines: 14
Message-ID: <sr0umo$73n$1@dont-email.me>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at>
<cC2zJ.104301$zF3.1172@fx03.iad> <ll4zJ.121128$IB7.59871@fx02.iad>
<2X4zJ.72820$Bu7.26077@fx26.iad>
<d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com>
<KeHzJ.189942$831.144461@fx40.iad>
<88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com>
<f7ca2f71-460f-42a7-91d7-14957109f3c9n@googlegroups.com>
<f5ca9e45-b8de-4662-b53e-b56c5cd7c298n@googlegroups.com>
<50c9f47b-2cd5-4c4c-ab64-b6cb50e42ba7n@googlegroups.com>
<2022Jan3.141346@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 4 Jan 2022 07:58:16 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="de80ed7290c9d686a431169b61ac8f21";
logging-data="7287"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/sk+WkKDM6TMzyUjVP4SLI"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Cancel-Lock: sha1:ZGmJMXpwSzVo2Yq2j64dTxWB+Jo=
In-Reply-To: <2022Jan3.141346@mips.complang.tuwien.ac.at>
Content-Language: en-US

by: Ivan Godard - Tue, 4 Jan 2022 07:58 UTC

On 1/3/2022 5:13 AM, Anton Ertl wrote:
> "robf...@gmail.com" <robfi680@gmail.com> writes:
>> Thanks, I have some difficulty understanding the difference between commit
>> and retire
>
> That's not surprising, because it's the same point in time, and
> therefore people (including me) use these words as synonyms. Commit
> is when the architectural effect of the instruction becomes permanent.
> After that there is no point in keeping the instruction around, so the
> instruction can retire.
>
> - anton

Commit is when it *will* happen. Retire is when it *has* happened.

Re: Non-RISC-ness in AMD64

<KR_AJ.147898$7D4.39943@fx37.iad>

https://www.novabbs.com/devel/article-flat.php?id=22742&group=comp.arch#22742

Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!npeer.as286.net!npeer-ng0.as286.net!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx37.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me> <6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org> <1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at> <TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org> <f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad> <e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com> <cC2zJ.104301$zF3.1172@fx03.iad> <ll4zJ.121128$IB7.59871@fx02.iad> <2X4zJ.72820$Bu7.26077@fx26.iad> <d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com> <KeHzJ.189942$831.144461@fx40.iad> <88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com>
In-Reply-To: <88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 39
Message-ID: <KR_AJ.147898$7D4.39943@fx37.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Tue, 04 Jan 2022 16:46:34 UTC
Date: Tue, 04 Jan 2022 11:45:48 -0500
X-Received-Bytes: 3198

by: EricP - Tue, 4 Jan 2022 16:45 UTC

MitchAlsup wrote:
> On Friday, December 31, 2021 at 11:38:55 AM UTC-6, EricP wrote:
>> Also its not just unknown stores that loads should not bypass.
>> Loads should not bypass any unknown addresses.
> <
> In Mc 88120 we allowed LDs to bypass older LDs with unknown address,
> But this become dangerous if the bypassed LD (or the LD at hand) are
> to MMI/O address space (Or configuration address space).

It's not just MMIO; it's that loads to the same address must be
performed in program order in case the cache line is grabbed away.
A load-load bypass could allow a younger load to read an older value and
an older load to read a younger value, a coherence violation under TSO-86.

#1 LD r0,[r1] // r1 is not yet resolved
#2 LD r2,[r3] // r3 = 1234

The address for the R0 load is pending and R3 is set to address 1234.
If LD#2 is allowed to proceed, R2 reads the old value from 1234.
Then that cache line is grabbed away by another core and changed.
Later R1 resolves to 1234 and R0 loads the new value.
Result is R2 has the old value for 1234 and R0 has the new value.

It's not just unresolved addresses that cause this;
they just make the window of vulnerability wider.
An LSQ scheduler which chose resolved entries to service
in the wrong order might cause it too.

To prevent the coherence violation one can block the load from
bypassing all older unresolved load and store addresses,
AND have the LSQ scheduler enforce older-to-younger ordering to the same address
(just in case the line is grabbed away between entries).

Or the load bypass could put a watch point on the cache line
that triggers a replay trap if the line gets invalidated.
But the watch point must be removed somehow at the right time
and that is probably more complicated.
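
The two-load example above can be replayed in a few lines of Python. This is only a sketch of the interleaving, assuming a single memory location and integer "versions" (0 for old, 1 for new); the function names are made up.

```python
ADDR = 1234


def run(bypass):
    """Replay the LD#1 / LD#2 example; return the values each load observed."""
    memory = {ADDR: 0}   # 0 = old value, 1 = new value
    regs = {}

    if bypass:
        # LD#2's address is known, LD#1's is not: LD#2 goes first.
        regs["r2"] = memory[ADDR]

    # Another core grabs the line and writes a new value.
    memory[ADDR] = 1

    # LD#1's address finally resolves to the same location.
    regs["r0"] = memory[ADDR]

    if not bypass:
        # LD#2 waited behind the unresolved older load.
        regs["r2"] = memory[ADDR]

    return regs


def ordering_violated(regs):
    # The older load (r0) must not see a newer value than the younger load (r2).
    return regs["r0"] > regs["r2"]
```

With `bypass=True` the older load ends up with the newer value and `ordering_violated` reports the problem; with `bypass=False` both loads see the new value and the ordering holds.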

Re: Non-RISC-ness in AMD64

<d%_AJ.159587$Ql5.19493@fx39.iad>

https://www.novabbs.com/devel/article-flat.php?id=22743&group=comp.arch#22743

Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!aioe.org!feeder5.feed.usenet.farm!feeder1.feed.usenet.farm!feed.usenet.farm!peer03.ams4!peer.am4.highwinds-media.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx39.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <cC2zJ.104301$zF3.1172@fx03.iad> <ll4zJ.121128$IB7.59871@fx02.iad> <2X4zJ.72820$Bu7.26077@fx26.iad> <d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com> <KeHzJ.189942$831.144461@fx40.iad> <88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com> <f7ca2f71-460f-42a7-91d7-14957109f3c9n@googlegroups.com> <f5ca9e45-b8de-4662-b53e-b56c5cd7c298n@googlegroups.com> <50c9f47b-2cd5-4c4c-ab64-b6cb50e42ba7n@googlegroups.com> <2022Jan3.141346@mips.complang.tuwien.ac.at> <3bb8d98f-9401-4019-8a10-d089ab47456dn@googlegroups.com>
In-Reply-To: <3bb8d98f-9401-4019-8a10-d089ab47456dn@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 38
Message-ID: <d%_AJ.159587$Ql5.19493@fx39.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Tue, 04 Jan 2022 16:56:41 UTC
Date: Tue, 04 Jan 2022 11:56:05 -0500
X-Received-Bytes: 2948

by: EricP - Tue, 4 Jan 2022 16:56 UTC

MitchAlsup wrote:
> On Monday, January 3, 2022 at 7:19:43 AM UTC-6, Anton Ertl wrote:
>> "robf...@gmail.com" <robf...@gmail.com> writes:
>>> Thanks, I have some difficulty understanding the difference between commit
>>> and retire
>> That's not surprising, because it's the same point in time, and
>> therefore people (including me) use these words as synonyms. Commit
>> is when the architectural effect of the instruction becomes permanent.
> s/can/is allowed to/
>> After that there is no point in keeping the instruction around, so the
>> instruction can retire.
> One may have a number of instructions that retire together and some other
> instruction can be holding up the STs retirement.

Just to muddy the water, there is also some research on
Out-of-Order Commit/Retire such as this one by Gordon Bell:

Deconstructing Commit, GB Bell, MH Lipasti, 2004
https://pharm.ece.wisc.edu/papers/ispass2004gbell.pdf

The rules for OOC concern:
- WAR hazards
- unresolved older branches
- exceptions from older instructions
- replay traps from older instructions used to enforce
the memory consistency model (like load-load bypass ordering).

One of the advantages of OOC is early resource recovery but the
problem is that simple structures like circular buffers don't work.
Their proposed solution is compacting buffers, basically FIFO
shift registers that pack down to eliminate any deleted holes.

They say "collapsing the ROB does not add significant complexity"
but I think it is considerably more expensive than a circular buffer,
especially if you want to allow multiple deletions/retires at once
(it takes lots and lots of muxes to do that).
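
To make the contrast concrete, here is a behavioral Python sketch of the two freeing policies. It says nothing about the hardware cost (the muxes), and the `can_commit` predicate is a hypothetical stand-in for the OOC rules listed above, not something taken from the paper.

```python
def retire_in_order(rob):
    """Circular-buffer style: entries are freed only from the head, oldest first."""
    freed = 0
    while rob and rob[0]["done"]:
        rob.pop(0)
        freed += 1
    return freed


def retire_collapsing(rob, can_commit):
    """Collapsing style: delete every entry allowed to commit, pack the rest down."""
    before = len(rob)
    # Survivors keep their relative (age) order, as in a compacting FIFO.
    rob[:] = [entry for entry in rob if not can_commit(entry)]
    return before - len(rob)
```

With an unfinished instruction at the head and two finished ones behind it, the in-order policy frees nothing while the collapsing policy frees two slots, which is the early resource recovery being described.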

Re: Non-RISC-ness in AMD64

<ecbb983d-e688-4816-a3af-cadb5fa6da51n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22754&group=comp.arch#22754

Newsgroups: comp.arch

X-Received: by 2002:a05:6214:29c8:: with SMTP id gh8mr48323493qvb.78.1641331567208; Tue, 04 Jan 2022 13:26:07 -0800 (PST)
X-Received: by 2002:a05:6808:1248:: with SMTP id o8mr205568oiv.157.1641331566893; Tue, 04 Jan 2022 13:26:06 -0800 (PST)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 4 Jan 2022 13:26:06 -0800 (PST)
In-Reply-To: <KR_AJ.147898$7D4.39943@fx37.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4ce0:ecdb:e44f:1566; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4ce0:ecdb:e44f:1566
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me> <6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org> <1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at> <TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org> <f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad> <e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com> <cC2zJ.104301$zF3.1172@fx03.iad> <ll4zJ.121128$IB7.59871@fx02.iad> <2X4zJ.72820$Bu7.26077@fx26.iad> <d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com> <KeHzJ.189942$831.144461@fx40.iad> <88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com> <KR_AJ.147898$7D4.39943@fx37.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ecbb983d-e688-4816-a3af-cadb5fa6da51n@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 04 Jan 2022 21:26:07 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 48

by: MitchAlsup - Tue, 4 Jan 2022 21:26 UTC

On Tuesday, January 4, 2022 at 10:46:37 AM UTC-6, EricP wrote:
> MitchAlsup wrote:
> > On Friday, December 31, 2021 at 11:38:55 AM UTC-6, EricP wrote:
> >> Also its not just unknown stores that loads should not bypass.
> >> Loads should not bypass any unknown addresses.
> > <
> > In Mc 88120 we allowed LDs to bypass older LDs with unknown address,
> > But this become dangerous if the bypassed LD (or the LD at hand) are
> > to MMI/O address space (Or configuration address space).
<
> Its not just MMIO, its that loads to the same address must be
> performed in program order in case the cache line is grabbed away.
> A load-load bypass could allow a younger load to read an older value and
> an older load to read a younger value, a coherence violation under TSO-86.
>
> #1 LD r0,[r1] // r1 is not yet resolved
> #2 LD r2,[r3] // r3 = 1234
>
> R0 address pending and R3 set to address 1234.
> If LD#2 is allowed to proceed, R2 reads the old value from 1234.
> Then that cache line is grabbed away by another core and changed.
> Later R1 resolves to 1234 and R0 loads the new value.
> Result is R2 has old value for 1234 and R0 has new value.
>
> Its not just unresolved addresses that cause this,
> they just make the window of vulnerability wider.
> An LSQ scheduler which chose resolved entries to service
> in the wrong order might cause it too.
<
What we did in the Mc 88120 was: when a LD result had been delivered,
and the LD remained unRetireable AND there was a SNOOP to the cache
line the LD consumed, the window was backed up to the beginning of the
packet containing the LD instruction. In the K9 design, we did a similar
backup, but just marked all the affected instructions as unExecuted
(rather than flushing and reInserting them into the window).
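
A rough Python sketch of that snoop check, with assumptions flagged: the 64-byte line size, the `LoadTracker` class, and the use of a sequence number in place of a packet are all illustrative, not details of either design.

```python
LINE_BYTES = 64  # assumed cache line size


class LoadTracker:
    """Tracks loads whose results were delivered but which have not retired."""

    def __init__(self):
        self.inflight = []  # (seq, line) pairs

    def deliver(self, seq, addr):
        self.inflight.append((seq, addr // LINE_BYTES))

    def retire(self, seq):
        self.inflight = [(s, l) for s, l in self.inflight if s != seq]

    def snoop(self, addr):
        """Return the oldest affected load's sequence number, or None."""
        line = addr // LINE_BYTES
        hits = [s for s, l in self.inflight if l == line]
        return min(hits) if hits else None
```

A non-None result from `snoop` is where the window would be backed up to (the start of the containing packet in the scheme described above); a retired load no longer triggers anything.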
>
> To prevent the coherence violation one can block the load from
> bypassing all older unresolved load and store addresses,
> AND LSQ scheduler enforces older to younger ordering to the same address
> (just in case the line is grabbed away between entries).
<
These cases happen so seldom that using the brute hand of mispredict
seems in order. And when they do occur, doing them fast is never as good
as appearing as if the instruction stream is always in-order.
>
> Or the load bypass could put a watch point on the cache line
> that triggers a replay trap if the line gets invalidated.
> But the watch point must be removed somehow at the right time
> and that is probably more complicated.

Re: Non-RISC-ness in AMD64

<1945b86f-78b6-457e-b7e1-eee974305a3bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22756&group=comp.arch#22756

Newsgroups: comp.arch

X-Received: by 2002:a05:6214:212d:: with SMTP id r13mr7416502qvc.63.1641331941288; Tue, 04 Jan 2022 13:32:21 -0800 (PST)
X-Received: by 2002:a05:6808:1283:: with SMTP id a3mr245130oiw.110.1641331940978; Tue, 04 Jan 2022 13:32:20 -0800 (PST)
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 4 Jan 2022 13:32:20 -0800 (PST)
In-Reply-To: <d%_AJ.159587$Ql5.19493@fx39.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4ce0:ecdb:e44f:1566; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4ce0:ecdb:e44f:1566
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <cC2zJ.104301$zF3.1172@fx03.iad> <ll4zJ.121128$IB7.59871@fx02.iad> <2X4zJ.72820$Bu7.26077@fx26.iad> <d9a332f4-5bf1-4b46-a39e-7765519cddd2n@googlegroups.com> <KeHzJ.189942$831.144461@fx40.iad> <88c3e3f7-e8b8-4fa2-8f2b-3809d0b4a969n@googlegroups.com> <f7ca2f71-460f-42a7-91d7-14957109f3c9n@googlegroups.com> <f5ca9e45-b8de-4662-b53e-b56c5cd7c298n@googlegroups.com> <50c9f47b-2cd5-4c4c-ab64-b6cb50e42ba7n@googlegroups.com> <2022Jan3.141346@mips.complang.tuwien.ac.at> <3bb8d98f-9401-4019-8a10-d089ab47456dn@googlegroups.com> <d%_AJ.159587$Ql5.19493@fx39.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1945b86f-78b6-457e-b7e1-eee974305a3bn@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 04 Jan 2022 21:32:21 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 58

by: MitchAlsup - Tue, 4 Jan 2022 21:32 UTC

On Tuesday, January 4, 2022 at 10:56:46 AM UTC-6, EricP wrote:
> MitchAlsup wrote:
> > On Monday, January 3, 2022 at 7:19:43 AM UTC-6, Anton Ertl wrote:
> >> "robf...@gmail.com" <robf...@gmail.com> writes:
> >>> Thanks, I have some difficulty understanding the difference between commit
> >>> and retire
> >> That's not surprising, because it's the same point in time, and
> >> therefore people (including me) use these words as synonyms. Commit
> >> is when the architectural effect of the instruction becomes permanent.
> > s/can/is allowed to/
> >> After that there is no point in keeping the instruction around, so the
> >> instruction can retire.
> > One may have a number of instructions that retire together and some other
> > instruction can be holding up the STs retirement.
> Just to muddy the water, there is also some research on
> Out-of-Order Commit/Retire such as this one by Gordon Bell:
>
> Deconstructing Commit, GB Bell, MH Lipasti, 2004
> https://pharm.ece.wisc.edu/papers/ispass2004gbell.pdf
>
> The rules for OOC concern:
> - WAR hazards
renaming registers and memory eliminates this.
> - unresolved older branches
I never tried to retire anything still covered by the shadow of a branch.
> - exceptions from older instructions
See unresolved older branches.
> - replay traps from older instructions used to enforce
I never tried to retire anything still in the shadow of a potential exception.
> the memory consistency model (like load-load bypass ordering).
>
> One of the advantages of OOC is early resource recovery but the
> problem is that simple structures like circular buffers don't work.
> Their proposed solution is compacting buffers, basically FIFO
> shift registers that pack down to eliminate any deleted holes.
<
We used the exception flags, and some memory state/memory-ref
to decide if all instructions in a packet were ready to retire, then we
retired them en masse. The circuits I used look similar to those found
in the Bell paper.
>
> They say a "collapsing the ROB does not add significant complexity"
> but I think it is considerably more expensive than a circular buffer
> especially if you want to allow multiple deletions/retires at once
> (takes lots and lots of muxes to do that).
<
A lot of this is dependent on how long it takes to back up your execution
window. Mc 88120 could perform a branch instruction in cycle[k] and
both back up the execution window AND insert instructions from the
alternate packet in cycle[k+1]--that is zero cycles wasted on recovery.
In order to do this, one needs the source register decoders in the RF
to be CAMs......

Subject	Author
Non-RISC-ness in AMD64	Anton Ertl
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	Anton Ertl
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	BGB
Re: Non-RISC-ness in AMD64	Anton Ertl
Re: Non-RISC-ness in AMD64	BGB
Re: Non-RISC-ness in AMD64	Anton Ertl
Re: Non-RISC-ness in AMD64	Anton Ertl
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	EricP
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	Stephen Fuld
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	Stefan Monnier
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	Stefan Monnier
Re: Non-RISC-ness in AMD64	Anton Ertl
Re: Non-RISC-ness in AMD64	EricP
Re: Non-RISC-ness in AMD64	Stefan Monnier
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	EricP
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	Stefan Monnier
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	Stefan Monnier
Re: Non-RISC-ness in AMD64	EricP
Re: Non-RISC-ness in AMD64	EricP
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	EricP
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	EricP
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	robf...@gmail.com
Re: Non-RISC-ness in AMD64	EricP
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	robf...@gmail.com
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	EricP
Re: Non-RISC-ness in AMD64	Anton Ertl
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	EricP
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	Ivan Godard
Re: Non-RISC-ness in AMD64	EricP
Re: Non-RISC-ness in AMD64	MitchAlsup
Re: Non-RISC-ness in AMD64	Andy Valencia