Non-RISC-ness in AMD64

<2021Dec24.180027@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=22399&group=comp.arch#22399

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Non-RISC-ness in AMD64
Date: Fri, 24 Dec 2021 17:00:27 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 64
Message-ID: <2021Dec24.180027@mips.complang.tuwien.ac.at>
 by: Anton Ertl - Fri, 24 Dec 2021 17:00 UTC

Thinking about the architectures in Celio's talk
<https://www.youtube.com/watch?v=Ii_pEXKKYUg>
<https://arxiv.org/abs/1607.02318>
<https://arxiv.org/pdf/1607.02318.pdf> also made me think of what
CISCy problems AMD64 has.

Ok, instruction decoding is an obvious problem, but it is now pretty
well understood how to decode variable-length instructions quickly if
we throw enough transistors at it (and now even variants of RISC-V
have variable-length instructions), so I will skip this part here.

I'll focus on the central issue in RISC: AMD64 is not a load-store
architecture. And in the 1980s this made a big difference wrt. easy
and efficient implementation, but how are things now?

One problem in the VAX (especially wrt the number of bug-prone corner
cases) was reported to be the number of page translations needed by
one instruction. On a page fault, the instruction would be rerun
afterwards, but the system would have to ensure that at some point all
pages needed by the instruction are there.

The common instructions of AMD64 outside the load/store paradigm are:

load-and-operate instructions, e.g., reg += mem. I don't see that
these instructions cause any difficulty. Am I missing something?
Neither A64 nor RISC-V have added such instructions.

read-modify-write instructions, e.g., mem += reg. Here we can
translate the address once, fault in the page(s) that contain the
relevant memory, then do the reading and writing on mem. One issue
is whether the cache line can migrate to a different core between
reading and writing; AFAIK the architecture says implementations are
allowed to do it, not sure if the implementations actually do it.
Delaying the answer to a cache line request a little while the
"modify" part runs appears to be a relatively cheap way to deal with
the problem, but maybe I am missing something. In any case, this
appears more problematic than load-and-operate. At least in the K8
days AMD had a load-store microinstruction for implementing RMW
instructions.

AMD64 also supports unaligned accesses, which means that the memory
reference in the instructions above may refer to bytes in two pages;
but the same is true about modern 64-bit RISCs.

Now for the not-so-common AMD64 instructions; among those that we have
inherited from the 8086, the most extreme seems to be MOVSW (and its
32-bit variant MOVSL/MOVSD, and its 64-bit variant MOVSQ): it loads
from one memory address and stores in a different memory address;
overall it can access 4 pages (if both memory accesses are misaligned
and straddling pages); can anybody name anything worse?

In recent years Intel has added the VGATHER and VSCATTER instructions
which (in their AVX512 form) can access up to 16 independent memory
locations in one instruction (if unaligned accesses are allowed, that
would be 32 pages). Makes me wonder if accessing many pages in one
instruction is no longer considered a problem.
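
The page counts above (4 for MOVS, 32 for a 16-element gather) can be reproduced with a small sketch. It assumes 4 KiB pages and counts only data pages, not the page(s) holding the instruction itself; the function names are made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pages_touched(addr: int, size: int, page_size: int = PAGE_SIZE) -> int:
    """Number of distinct pages covered by the bytes [addr, addr + size)."""
    first = addr // page_size
    last = (addr + size - 1) // page_size
    return last - first + 1


def worst_case_pages(accesses: int, size: int, page_size: int = PAGE_SIZE) -> int:
    """Upper bound on data pages when every access straddles a page boundary."""
    # An access of 2..page_size bytes can cross at most one page boundary.
    per_access = 2 if 1 < size <= page_size else 1
    return accesses * per_access


# MOVSQ: one 8-byte load plus one 8-byte store.
movsq_pages = worst_case_pages(accesses=2, size=8)
# AVX-512 gather of sixteen 32-bit elements.
gather_pages = worst_case_pages(accesses=16, size=4)
print(movsq_pages, gather_pages)
```

An aligned 8-byte access stays within one page (`pages_touched(0, 8)` is 1), while one starting 4 bytes before a page boundary touches two.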

Apart from the memory accesses and the instruction encoding, are there
any other non-RISC properties of AMD64 that matter today?

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Non-RISC-ness in AMD64

<19adbe49-6d32-44f1-b493-e4b592e706e0n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22407&group=comp.arch#22407

Newsgroups: comp.arch
Date: Fri, 24 Dec 2021 13:21:20 -0800 (PST)
In-Reply-To: <memo.20211224205346.2376M@jgd.cix.co.uk>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <memo.20211224205346.2376M@jgd.cix.co.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <19adbe49-6d32-44f1-b493-e4b592e706e0n@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 24 Dec 2021 21:21:20 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 45
 by: MitchAlsup - Fri, 24 Dec 2021 21:21 UTC

On Friday, December 24, 2021 at 2:53:49 PM UTC-6, John Dallman wrote:
> In article <2021Dec2...@mips.complang.tuwien.ac.at>,
> an...@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>
> > load-and-operate instructions, e.g., reg += mem. I don't see that
> > these instructions cause any difficulty. Am I missing something?
> > Neither A64 nor RISC-V have added such instructions.
<
> They make dependency tracking in an OoO implementation a little bit more
> complicated, because you have a dependency on the register value that is
> not present if you're going to simply replace the register.
<
This was solved in Athlon and Opteron by making the reservation stations
fire twice--once for the LD and once for the calculation. The mem op= reg
had the stations fire 3 times, the final time used the same physical address
as the first firing--avoiding a dependence on someone else changing the
paging tables mid-instruction.
>
> However, I suspect that the real attraction of the design is simple
> instruction encoding. In a plain load/store architecture, memory
> instructions only have to address one register, freeing up bits in the
> fixed-length instructions for offsets and addressing modes. Many
> operations on registers need to specify several registers, so not needing
> the baggage of memory instructions is helpful. On x86, instructions with
> memory references grow considerably.
<
There is the cartesian product problem of LD-size × calculation-operation
which consumes big hunks of OpCode space. Here, I prefer fused decoding.
{Just don't let the code scheduler move these instructions apart due to the
inherent data dependency, keep the instructions together for ease of fusing.}
<
On the other hand, if you have only 8 (or 16) GPRs the LD-ops and
LD-op-STs give you another 50% register effective count (8->12, 16->24)
<
Moral: don't constrict yourself to 8 (or 16) Registers rather than having
LD-ops and LD-op-STs.
>
>
> John

Re: Non-RISC-ness in AMD64

<2021Dec25.174632@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=22430&group=comp.arch#22430

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Sat, 25 Dec 2021 16:46:32 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 61
Message-ID: <2021Dec25.174632@mips.complang.tuwien.ac.at>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <memo.20211224205346.2376M@jgd.cix.co.uk>
 by: Anton Ertl - Sat, 25 Dec 2021 16:46 UTC

jgd@cix.co.uk (John Dallman) writes:
>In article <2021Dec24.180027@mips.complang.tuwien.ac.at>,
>anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>
>> load-and-operate instructions, e.g., reg += mem. I don't see that
>> these instructions cause any difficulty. Am I missing something?
>> Neither A64 nor RISC-V have added such instructions.
>
>They make dependency tracking in an OoO implementation a little bit more
>complicated, because you have a dependency on the register value that is
>not present if you're going to simply replace the register.

Can you elaborate on that? In an OoO implementation, after the
register renamer such an instruction becomes

preg1 = preg2 + mem

(if the instruction is not split into two uops: load and add).

And if a three-address architecture has an instruction

reg1 = reg2 + reg3

and uses that for, say

reg4 = reg4 + reg5

the same problem exists. Such instruction usage is common and is the
basis for Thumb, MIPS16, and the RISC-V C extension providing
two-address variants of their three-address instructions.

>However, I suspect that the real attraction of the design is simple
>instruction encoding. In a plain load/store architecture, memory
>instructions only have to address one register, freeing up bits in the
>fixed-length instructions for offsets and addressing modes. Many
>operations on registers need to specify several registers, so not needing
>the baggage of memory instructions is helpful. On x86, instructions with
>memory references grow considerably.

Good point, certainly for A64. For RISC-V, it's probably more of a
philosophy thing. It has only one addressing mode with one register
specifier, so adding instructions of the form

reg1 = reg2 op [disp+reg3]

with say 2 or 3 bits for op would be doable in 32 bits, and would fit
nicely with the existing 2-read 1-write instructions. But I see
significant costs for simple implementations here: You would now have
a pipeline like

IF ID MEM1 MEM2 OP WB

and you would need more bypasses, and conditional (and, for
simplicity, probably also unconditional) branches would take more
cycles. The benefit is that the number of ops/cycle could increase,
but the additional cost of branches might easily consume this benefit.
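
The trade-off can be put into a back-of-the-envelope model. This is only a sketch under stated assumptions: an idealized in-order pipeline issuing one instruction per cycle, branches resolved in the execute stage (2 bubbles for IF ID EX MEM WB, 4 bubbles for IF ID MEM1 MEM2 OP WB), and made-up workload fractions. None of the numbers are measurements.

```python
def cycles_per_original_instruction(fused_fraction, branch_fraction,
                                    penalized_fraction, branch_penalty):
    """Average cycles per instruction of the original (unfused) program.

    fused_fraction: share of original instructions that vanish because a
    load and its consumer are merged into one load-op instruction.
    """
    issued = 1.0 - fused_fraction
    stalls = branch_fraction * penalized_fraction * branch_penalty
    return issued + stalls


def break_even_fused_fraction(branch_fraction, penalized_fraction,
                              short_penalty, long_penalty):
    """Share of instructions that must vanish to pay for the longer pipeline."""
    return branch_fraction * penalized_fraction * (long_penalty - short_penalty)


# Hypothetical workload: 20% branches, half of which pay the penalty.
b, m = 0.2, 0.5
base = cycles_per_original_instruction(0.0, b, m, branch_penalty=2)
load_op = cycles_per_original_instruction(0.1, b, m, branch_penalty=4)
print(base, load_op, break_even_fused_fraction(b, m, 2, 4))
```

With these assumed numbers, merging away 10% of the instructions does not make up for the two extra branch bubbles; roughly 20% would be needed to break even.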

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Non-RISC-ness in AMD64

<2021Dec25.185315@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=22433&group=comp.arch#22433

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Sat, 25 Dec 2021 17:53:15 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 12
Message-ID: <2021Dec25.185315@mips.complang.tuwien.ac.at>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <memo.20211224205346.2376M@jgd.cix.co.uk> <19adbe49-6d32-44f1-b493-e4b592e706e0n@googlegroups.com>
 by: Anton Ertl - Sat, 25 Dec 2021 17:53 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On the other hand, if you have only 8 (or 16) GPRs the LD-ops and
>LD-op-STs give you another 50% register effective count (8->12, 16->24)

You can replace load-op and RMW instructions with sequences employing
one additional register, so these instructions reduce the register
pressure at best by 1.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Non-RISC-ness in AMD64

<9c2fb0fe-0773-40cc-86e9-6200f61d143cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22436&group=comp.arch#22436

Newsgroups: comp.arch
Date: Sat, 25 Dec 2021 10:16:51 -0800 (PST)
In-Reply-To: <2021Dec25.185315@mips.complang.tuwien.ac.at>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <memo.20211224205346.2376M@jgd.cix.co.uk>
<19adbe49-6d32-44f1-b493-e4b592e706e0n@googlegroups.com> <2021Dec25.185315@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <9c2fb0fe-0773-40cc-86e9-6200f61d143cn@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sat, 25 Dec 2021 18:16:52 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 18
 by: MitchAlsup - Sat, 25 Dec 2021 18:16 UTC

On Saturday, December 25, 2021 at 11:56:16 AM UTC-6, Anton Ertl wrote:
> MitchAlsup <Mitch...@aol.com> writes:
> >On the other hand, if you have only 8 (or 16) GPRs the LD-ops and
> >LD-op-STs give you another 50% register effective count (8->12, 16->24)
> You can replace load-op and RMW instructions with sequences employing
> one additional register, so these instructions reduce the register
> pressure at best by 1.
<
A LD-op-ST can allow a variable to be manipulated directly in memory,
never needing to occupy a register in the CPU. Every one of these variables
saves a GPR, too.
<
These variables only get access to the simplest of integer arithmetic
{+, -, &, |, ^, ~} in typical ISAs.
<
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Non-RISC-ness in AMD64

<sq89vl$rbm$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=22450&group=comp.arch#22450

From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Sat, 25 Dec 2021 17:37:22 -0600
Organization: A noiseless patient Spider
Lines: 37
Message-ID: <sq89vl$rbm$1@dont-email.me>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at>
<memo.20211224205346.2376M@jgd.cix.co.uk>
<19adbe49-6d32-44f1-b493-e4b592e706e0n@googlegroups.com>
<2021Dec25.185315@mips.complang.tuwien.ac.at>
<9c2fb0fe-0773-40cc-86e9-6200f61d143cn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 25 Dec 2021 23:37:25 -0000 (UTC)
In-Reply-To: <9c2fb0fe-0773-40cc-86e9-6200f61d143cn@googlegroups.com>
Content-Language: en-US
 by: BGB - Sat, 25 Dec 2021 23:37 UTC

On 12/25/2021 12:16 PM, MitchAlsup wrote:
> On Saturday, December 25, 2021 at 11:56:16 AM UTC-6, Anton Ertl wrote:
>> MitchAlsup <Mitch...@aol.com> writes:
>>> On the other hand, if you have only 8 (or 16) GPRs the LD-ops and
>>> LD-op-STs give you another 50% register effective count (8->12, 16->24)
>> You can replace load-op and RMW instructions with sequences employing
>> one additional register, so these instructions reduce the register
>> pressure at best by 1.
> <
> A LD-op-ST can allow a variable to be manipulated directly in memory,
> never needing to occupy a register in the CPU. Every one of these variables
> saves a GPR, too.
> <
> These variables only get access to the simplest of integer arithmetic
> {+, -, &, | ^ ~} in typical ISAs.

If it were sufficiently restricted, it seems one could handle it similar
to a special case of a store operation (rather than replacing whatever
is in the target location, one performs an operation between the store
value and whatever was there already).

More general cases, like trying to support ModR/M operands on nearly
every instruction, as in x86, seem to be where the problems are.

Meanwhile, an ISA with 4x or 8x as many registers probably doesn't need
to worry as much about register pressure; general issue being more with
inter-instruction dependencies and data movement.

> <
>> - anton
>> --
>> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
>> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Non-RISC-ness in AMD64

<2021Dec26.184313@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=22462&group=comp.arch#22462

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Sun, 26 Dec 2021 17:43:13 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 38
Message-ID: <2021Dec26.184313@mips.complang.tuwien.ac.at>
References: <2021Dec25.174632@mips.complang.tuwien.ac.at> <memo.20211226165103.2376N@jgd.cix.co.uk>
 by: Anton Ertl - Sun, 26 Dec 2021 17:43 UTC

jgd@cix.co.uk (John Dallman) writes:
>In article <2021Dec25.174632@mips.complang.tuwien.ac.at>,
>anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>> [RISC-V] has only one addressing mode with one register specifier,
>> so adding instructions of the form
>>
>> reg1 = reg2 op [disp+reg3]
>>
>> with say 2 or 3 bits for op would be doable in 32 bits, and would
>> fit nicely with the existing 2-read 1-write instructions. But I see
>> significant costs for simple implementations here: You would now
>> have a pipeline like
>>
>> IF ID MEM1 MEM2 OP WB
>>
>> and you would need more bypasses, and conditional (and, for
>> simplicity, probably also unconditional) branches would take more
>> cycles. The benefit is that the number of ops/cycle could increase,
>> but the additional cost of branches might easily consume this
>> benefit.
>
>The RISC-V approach might be to make such instructions yet another
>extension.

Yes, they could do that. They could add load-and-op instructions with
48-bit encodings to allow significant disps. But I think the
philosophy is to rather have them as a load and an op (which can be
independently compressed) and to fuse them if there is any
microarchitectural reason to do it (there probably isn't). As for the
waste of specifying the intermediate register twice: a RISC-V 48-bit
instruction takes 6 bits just for encoding the 48-bit length, so the
additional register specifier may mostly amortize itself by not
needing that.
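
For reference, here is a sketch of how the instruction length falls out of the low bits of the first 16-bit parcel, following the length-encoding scheme described in the RISC-V unprivileged specification (the formats longer than 32 bits were not frozen, so treat that part as an assumption). The function name is made up for illustration.

```python
def rv_instruction_length(first_parcel: int) -> int:
    """Instruction length in bits, from the low bits of the first 16-bit parcel."""
    if first_parcel & 0b11 != 0b11:
        return 16  # compressed: bits [1:0] are not 11
    if first_parcel & 0b11100 != 0b11100:
        return 32  # bits [1:0] are 11, bits [4:2] are not 111
    if first_parcel & 0b111111 == 0b011111:
        return 48  # bits [5:0] are 011111: six fixed bits
    if first_parcel & 0b1111111 == 0b0111111:
        return 64  # bits [6:0] are 0111111
    raise ValueError("longer or reserved encoding")


print(rv_instruction_length(0x0001),   # c.nop
      rv_instruction_length(0x0013),   # low parcel of addi x0, x0, 0
      rv_instruction_length(0x001F))   # any 48-bit instruction
```

So a 48-bit instruction spends 6 bits on its length, one more than the 5 bits of a register specifier.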

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Non-RISC-ness in AMD64

<2021Dec26.193350@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=22464&group=comp.arch#22464

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Sun, 26 Dec 2021 18:33:50 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 29
Message-ID: <2021Dec26.193350@mips.complang.tuwien.ac.at>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <memo.20211224205346.2376M@jgd.cix.co.uk> <19adbe49-6d32-44f1-b493-e4b592e706e0n@googlegroups.com> <2021Dec25.185315@mips.complang.tuwien.ac.at> <9c2fb0fe-0773-40cc-86e9-6200f61d143cn@googlegroups.com>
 by: Anton Ertl - Sun, 26 Dec 2021 18:33 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Saturday, December 25, 2021 at 11:56:16 AM UTC-6, Anton Ertl wrote:
>> MitchAlsup <Mitch...@aol.com> writes:
>> >On the other hand, if you have only 8 (or 16) GPRs the LD-ops and
>> >LD-op-STs give you another 50% register effective count (8->12, 16->24)
>> You can replace load-op and RMW instructions with sequences employing
>> one additional register, so these instructions reduce the register
>> pressure at best by 1.
><
>A LD-op-ST can allow a variable to be manipulated directly in memory,

You can also do this without LD-op-ST, it just takes more instructions
and thus makes the cost more visible.

>never needing to occupy a register in the CPU. Every one of these variables
>saves a GPR, too.

On a load-store architecture, you need 1 extra register (compared to
the architecture with LD-op-ST) for passing the result of the load to
the op, and the result of the op to the store.
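
A small liveness count illustrates the "1 extra register" point. This is a sketch with made-up register names: ra holds the address, rb the other operand, and t is the temporary that the load-store sequence needs. Both ra and rb are assumed to stay live afterwards.

```python
def peak_live_registers(seq, live_out):
    """Peak number of simultaneously live registers in straight-line code.

    seq is a list of (defs, uses) pairs; live_out is the set of registers
    still needed after the sequence. Standard backward liveness scan.
    """
    live = set(live_out)
    peak = len(live)
    for defs, uses in reversed(seq):
        live -= set(defs)
        live |= set(uses)
        peak = max(peak, len(live))
    return peak


# mem[ra] += rb on a load-store architecture:
load_store = [
    ({"t"}, {"ra"}),        # load  t, [ra]
    ({"t"}, {"t", "rb"}),   # add   t, t, rb
    (set(), {"t", "ra"}),   # store t, [ra]
]
# The same operation as a single LD-op-ST instruction:
rmw = [
    (set(), {"ra", "rb"}),  # add [ra], rb
]

live_out = {"ra", "rb"}
print(peak_live_registers(load_store, live_out),
      peak_live_registers(rmw, live_out))
```

The load-store sequence peaks at three live registers and the LD-op-ST form at two, a difference of one.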

Of course, what you typically do is that you have an instruction set
with so and so many registers, and you try to keep as many variables
in registers as fit.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Non-RISC-ness in AMD64

<2ad05c84-5fe1-46cb-aa59-f954fbbeafc3n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22466&group=comp.arch#22466

Newsgroups: comp.arch
Date: Sun, 26 Dec 2021 12:10:40 -0800 (PST)
In-Reply-To: <2021Dec26.184313@mips.complang.tuwien.ac.at>
References: <2021Dec25.174632@mips.complang.tuwien.ac.at> <memo.20211226165103.2376N@jgd.cix.co.uk>
<2021Dec26.184313@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2ad05c84-5fe1-46cb-aa59-f954fbbeafc3n@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 26 Dec 2021 20:10:41 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 60
 by: MitchAlsup - Sun, 26 Dec 2021 20:10 UTC

On Sunday, December 26, 2021 at 11:59:51 AM UTC-6, Anton Ertl wrote:
> j...@cix.co.uk (John Dallman) writes:
> >In article <2021Dec2...@mips.complang.tuwien.ac.at>,
> >an...@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> >> [RISC-V] has only one addressing mode with one register specifier,
> >> so adding instructions of the form
> >>
> >> reg1 = reg2 op [disp+reg3]
> >>
> >> with say 2 or 3 bits for op would be doable in 32 bits, and would
> >> fit nicely with the existing 2-read 1-write instructions. But I see
> >> significant costs for simple implementations here: You would now
> >> have a pipeline like
> >>
> >> IF ID MEM1 MEM2 OP WB
> >>
> >> and you would need more bypasses, and conditional (and, for
> >> simplicity, probably also unconditional) branches would take more
> >> cycles. The benefit is that the number of ops/cycle could increase,
> >> but the additional cost of branches might easily consume this
> >> benefit.
> >
> >The RISC-V approach might be to make such instructions yet another
> >extension.
> Yes, they could do that. They could add load-and-op instructions with
> 48-bit encodings to allow significant disps. But I think the
> philosophy is to rather have them as a load and an op (which can be
> independently compressed) and to fuse them if there is any
> microarchitectural reason to do it (there probably isn't). As for the
> waste of specifying the intermediate register twice: a RISC-V 48-bit
> instruction takes 6 bits just for encoding the 48-bit length, so the
> additional register specifier may mostly amortize itself by not
> needing that.
<
In my opinion:
<
Fusing should be a microarchitectural choice (i.e., implementation)
not architectural (all implementations have to do it.)
<
There are things one can do in microarchitecture that one cannot do in
macroarchitecture:
<
I remember back in the K9 design, we would recognize 3 moves in a row
<
MOV Rt,Ry
MOV Ry,Rx
MOV Rx,Rt
<
This was changed into:
MOV Rx,Ry; MOV Ry,Rx; MOV Rt,Ry
which execute simultaneously. Or into
MOV Rx,Ry; MOV Ry,Rx;
if Rt gets reassigned before the local horizon (MOV Rt,Ry became dead code)
<
Nobody would allow this in ISA design, but it is perfectly fine in
microarchitecture design.
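
The equivalence of the two forms is easy to check with a toy register-file model. This is only a sketch: the helper names are invented, and "simultaneous" is modelled as reading every source before writing any destination.

```python
def run_sequential(regs, moves):
    """Execute MOV dst,src pairs one after another."""
    regs = dict(regs)
    for dst, src in moves:
        regs[dst] = regs[src]
    return regs


def run_parallel(regs, moves):
    """Execute all MOVs at once: every source is read before any write."""
    old = dict(regs)
    new = dict(regs)
    for dst, src in moves:
        new[dst] = old[src]
    return new


start = {"Rx": 1, "Ry": 2, "Rt": 0}
original = [("Rt", "Ry"), ("Ry", "Rx"), ("Rx", "Rt")]  # the 3 moves in a row
fused = [("Rx", "Ry"), ("Ry", "Rx"), ("Rt", "Ry")]     # the rewritten group

seq_result = run_sequential(start, original)
par_result = run_parallel(start, fused)
print(seq_result == par_result, par_result)
```

Both leave Rx and Ry swapped and Rt holding the old Ry. Dropping the third parallel move only changes Rt, which is harmless if Rt is overwritten before it is read again.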
<
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Non-RISC-ness in AMD64

<sqas63$58g$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=22470&group=comp.arch#22470

From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Sun, 26 Dec 2021 17:00:17 -0600
Organization: A noiseless patient Spider
Lines: 71
Message-ID: <sqas63$58g$1@dont-email.me>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at>
<memo.20211224205346.2376M@jgd.cix.co.uk>
<19adbe49-6d32-44f1-b493-e4b592e706e0n@googlegroups.com>
<2021Dec25.185315@mips.complang.tuwien.ac.at>
<9c2fb0fe-0773-40cc-86e9-6200f61d143cn@googlegroups.com>
<2021Dec26.193350@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 26 Dec 2021 23:00:19 -0000 (UTC)
In-Reply-To: <2021Dec26.193350@mips.complang.tuwien.ac.at>
Content-Language: en-US
 by: BGB - Sun, 26 Dec 2021 23:00 UTC

On 12/26/2021 12:33 PM, Anton Ertl wrote:
> MitchAlsup <MitchAlsup@aol.com> writes:
>> On Saturday, December 25, 2021 at 11:56:16 AM UTC-6, Anton Ertl wrote:
>>> MitchAlsup <Mitch...@aol.com> writes:
>>>> On the other hand, if you have only 8 (or 16) GPRs the LD-ops and
>>>> LD-op-STs give you another 50% register effective count (8->12, 16->24)
>>> You can replace load-op and RMW instructions with sequences employing
>>> one additional register, so these instructions reduce the register
>>> pressure at best by 1.
>> <
>> A LD-op-ST can allow a variable to be manipulated directly in memory,
>
> You can also do this without LD-op-ST, it just takes more instructions
> and thus makes the cost more visible.
>

IME the vast majority of these cases tend to be things like "loop
counter or similar got evicted". Typically less of an issue if one has
sufficient registers.

>> never needing to occupy a register in the CPU. Every one of these variables
>> saves a GPR, too.
>
> On a load-store architecture, you need 1 extra register (compared to
> the architecture with LD-op-ST) for passing the result of the load to
> the op, and the result of the op to the store.
>
> Of course, what you typically do is that you have an instruction set
> with so and so many registers, and you try to keep as many variables
> in registers as fit.
>

Pretty much.

In BJX2, there are currently: 32 GPRs in the baseline ISA, 64 GPRs with
XGPR.

Pattern seems to be, roughly:
   8 GPRs (x86-32): Nearly everything is in memory;
      Registers mostly used as temporary scratch values.
  11 GPRs (A32): Very high spill rate;
  16 GPRs (x64/SH): Can mostly stick to registers, frequent spills;
  32 GPRs: Can mostly use registers, occasional spills;
      A majority of small leaf functions can be mapped to registers.
  64 GPRs: Most functions do not need stack variables at all.

On x86-32, there seems to be "CPU magic" which makes it fast.

On 32-bit ARM, there seems to be some sort of "GCC magic" at play, as
most of my attempts at generating code for 32-bit ARM invariably perform
like total garbage (though, have generally also ended up with code that
consists almost entirely of LD/ST ops due to register pressure; but
without the "make it fast" magic that x86 CPUs seem to have).

Meanwhile, 16 GPRs works better, though there are still a lot of spills.

In my own ISA efforts, I quickly switched to 32 GPRs as this can result
in a very significant reduction in the rate of register spills. For
hand-written ASM and also for small leaf functions, it is possible to
map everything to registers and potentially skip the creation of a stack
frame.

The expansion to 64 GPRs can help with some "high register pressure"
cases, though the savings are at best "fairly modest". Many leaf
functions can run entirely in scratch registers, and a majority of
non-leaf functions need only save/restore registers but can statically
assign all the normal variables to registers.
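
As a rough illustration of why the spill rate falls off with register count: count how many values are live at the busiest point of a function and compare that with the registers available. This is only a sketch. The live ranges below are made up (they are not measured from BJX2 or any compiler), and `max_live` / `min_spills` are hypothetical helper names.

```python
def max_live(intervals):
    """Return the largest number of intervals live at the same point."""
    events = []
    for start, end in intervals:
        events.append((start, 1))    # value becomes live
        events.append((end, -1))     # value dies (end is exclusive)
    # Process deaths before births at the same point, so that
    # [0, 5) and [5, 9) are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


def min_spills(intervals, num_regs):
    """Lower bound on the values that cannot sit in registers at the peak."""
    return max(0, max_live(intervals) - num_regs)


# Hypothetical function with 20 values live across its whole body.
hypothetical_ranges = [(0, 100)] * 20
for regs in (8, 16, 32, 64):
    print(regs, "GPRs ->", min_spills(hypothetical_ranges, regs), "spilled")
```

A real allocator also has to reserve registers for the stack pointer, return address and so on, so the usable count is lower than the nominal one.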

Re: Non-RISC-ness in AMD64

<sqbd31$jkg$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=22474&group=comp.arch#22474

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Sun, 26 Dec 2021 19:48:47 -0800
Organization: A noiseless patient Spider
Lines: 87
Message-ID: <sqbd31$jkg$1@dont-email.me>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 27 Dec 2021 03:48:50 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c940ef03a76180abb07b48aef3b933ae";
logging-data="20112"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX193Kk1qEtJi1oVwKomPPsQpNxr+u/U6/M4="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Cancel-Lock: sha1:52zRV8n4/yMudHQt+xj3Bw6dM1o=
In-Reply-To: <2021Dec24.180027@mips.complang.tuwien.ac.at>
Content-Language: en-US
 by: Stephen Fuld - Mon, 27 Dec 2021 03:48 UTC

On 12/24/2021 9:00 AM, Anton Ertl wrote:
> Thinking about the architectures in Celio's talk
> <https://www.youtube.com/watch?v=Ii_pEXKKYUg>
> <https://arxiv.org/abs/1607.02318>
> <https://arxiv.org/pdf/1607.02318.pdf> also made me think of what
> CISCy problems AMD64 has.
>
> Ok, instruction decoding is an obvious problem, but it is now pretty
> well understood how to decode variable-length instructions quickly if
> we throw enough transistors at it (and now even variants of RISC-V
> have variable-length instructions), so I will skip this part here.
>
> I'll focus on the central issue in RISC: AMD64 is not a load-store
> architecture. And in the 1980s this made a big difference wrt. easy
> and efficient implementation, but how are things now?
>
> One problem in the VAX (especially wrt the number of bug-prone corner
> cases) was reported to be the number of page translations needed by
> one instruction. On a page fault, the instruction would be rerun
> afterwards, but the system would have to ensure that a some point all
> pages needed by the instruction are there.
>
> The common instructions of AMD64 outside the load/store paradigm are:
>
> load-and-operate instructions, e.g., reg += mem. I don't see that
> these instructions cause any difficulty. Am I missing something?
> Neither A64 nor RISC-V have added such instructions.

ISTM that one of the advantages of load and operate instructions is the
savings in I$ usage/bandwidth from combining what were two instructions
into one. Even if you have to make the destination be one of the
sources, i.e. A = A + mem, and you have to restrict which addressing
modes you can use (to save instruction bits), it may be worthwhile much
of the time. And if you have to precede the instruction with a
register-to-register copy to save the unmodified source, you are no worse
off space-wise, and that copy may even be done in the renaming stage
without costing a full instruction time.

But perhaps it just isn't worth the trouble.

> read-modify-write instructions, e.g., mem += reg. Here we can
> translate the address once, fault in the page(s) that contain the
> relevant memory, then do the reading and writing on mem. One issue
> is whether the cache line can migrate to a different core between
> reading and writing; AFAIK the architecture says permissions are
> allowed to do it, not sure if the implementations actually do it.
> Delaying the answer to a cache line request a little while the
> "modify" part runs appears to be a relatively cheap way to deal with
> the problem, but maybe I am missing something. In any case, this
> appears more problematic than load-and-operate. At least in the K8
> days AMD had a load-store microinstruction for implementing RMW
> instructions.

I think you have to differentiate between a plain RMW and an interlocked
RMW. The latter may be necessary for locks, etc. The former is just a
performance optimization.
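
To make the distinction concrete, here is a toy, deterministic model of two cores each adding 1 to the same memory word. It is a sketch only: the `run` function and its `schedule` parameter are invented for illustration, and it says nothing about how any real CPU implements interlocked operations. A plain RMW is modeled as three separately scheduled steps, an interlocked RMW as one indivisible step.

```python
def run(schedule, interlocked):
    """Simulate two cores each adding 1 to a shared word.

    schedule lists which core executes its next step, in order.
    """
    mem = {"x": 0}
    regs = {0: None, 1: None}
    step = {0: 0, 1: 0}          # next step each core will execute
    for core in schedule:
        if interlocked:
            if step[core] == 0:
                mem["x"] += 1    # whole RMW happens in one indivisible step
                step[core] = 3   # this core is done
            continue
        if step[core] == 0:
            regs[core] = mem["x"]        # read
        elif step[core] == 1:
            regs[core] += 1              # modify
        elif step[core] == 2:
            mem["x"] = regs[core]        # write
        step[core] += 1
    return mem["x"]


interleaved = [0, 1, 0, 1, 0, 1]
print(run(interleaved, interlocked=False))  # one update is lost
print(run(interleaved, interlocked=True))   # both updates survive
```

With the interleaved schedule the plain version loses one increment, which is harmless for a thread-private variable and fatal for a lock word.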

> Now for the not-so-common AMD64 instructions; among those that we have
> inherited from the 8086, the most extreme seems to be MOVSW (and its
> 32-bit variant MOVSL/MOVSD, and its 64-bit variant MOVSQ): it loads
> from one memory address and stores in a different memory address;
> overall it can access 4 pages (if both memory accesses are misaligned
> and straddling pages); can anybody name anything worse?
>
> In recent years Intel has added the VGATHER and VSCATTER instructions
> which (in their AVX512 form) can access up to 16 independent memory
> locations in one instruction (if unaligned accesses are allowed, that
> would be 32 pages). Makes me wonder if accessing many pages in one
> instruction is no longer considered a problem.
>
> Apart from the memory accesses and the instruction encoding, are there
> any other non-RISC properties of AMD64 that matter today?

I am not sure this counts, but even ignoring page faults, instructions
like the byte moves must be interruptible and restartable where you left
off. Original RISC required all instructions to be a single cycle, so
this never came up.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Non-RISC-ness in AMD64

<HckyJ.106971$IB7.31124@fx02.iad>

https://www.novabbs.com/devel/article-flat.php?id=22485&group=comp.arch#22485

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx02.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
References: <2021Dec25.174632@mips.complang.tuwien.ac.at> <memo.20211226165103.2376N@jgd.cix.co.uk> <2021Dec26.184313@mips.complang.tuwien.ac.at> <2ad05c84-5fe1-46cb-aa59-f954fbbeafc3n@googlegroups.com>
In-Reply-To: <2ad05c84-5fe1-46cb-aa59-f954fbbeafc3n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 64
Message-ID: <HckyJ.106971$IB7.31124@fx02.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 27 Dec 2021 14:37:27 UTC
Date: Mon, 27 Dec 2021 09:36:57 -0500
X-Received-Bytes: 3116
X-Original-Bytes: 3065
 by: EricP - Mon, 27 Dec 2021 14:36 UTC

MitchAlsup wrote:
>
> In my opinion:
>
> Fusing should be a microarchitectural choice (i.e., implementation)
> not architectural (all implementations have to do it.)
>
> There are things one can do in microarchitecture that one cannot do in
> macroarchitecture:
>
> I remember back in the K9 design, we would recognize 3 moves in a row
>
> MOV Rt,Ry
> MOV Ry,Rx
> MOV Rx,Rt
>
> Was changed into:
> MOV Rx,Ry; MOV Ry,Rx; MOV Rt,Ry
> which executes simultaneously. Or into
> MOV Rx,Ry; MOV Ry,Rx;
> if Rt gets reassigned before the local horizon (MOV Rt ,Rz became dead code)
>
> Nobody would allow this in ISA design, but it is perfectly fine in micro-
> architecture design.

I was composing a post to ask this about RISC-V fusion but
this example does just as well, replacing 3 MOV's with a SWAP.

How does one prevent the above fusion from being 'fragile'?
In that there appear to be many ways such fusion can fail
and only a couple that can succeed.

If we have 4 decoders D1..D4 and the MOV's parse into D1..D3 or D2..D4
then we can detect that
(a) they are all MOV's and
(b) the register #'s all match up correctly
(so there is some inter-decoder semantic validity checking)
then it can emit one SWAP uOp.
Otherwise they decode as separate uOps.

But if D1,D2 have other instructions and the MOV's parse into D3,D4
then what does it do?
Should it stall D3,D4 and wait to see what the next instruction is,
or skip fusion?

Or if two MOV's land in D1,D2 but then the fetch buffer is empty.
Again stall or skip fusion?

One option appears to be two simple extra lookahead decoders located after
D4 that could warn D1..D4 that fusible instructions are about to arrive.
But that requires extra parsers for variable length instructions.
And it doesn't deal with the empty fetch buffer scenario.

So it appears that fusion optimizations are laissez faire -
if it works, great, but it is probabilistic and fragile.

One idea was to add fusing into a uOp cache,
so fusion MAY be detected by decoders on the first pass,
but IS detected for subsequent usage.
Of course this assumes one can afford an expensive uOp cache,
and that kinda throws the whole risc approach under a bus.
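
The alignment problem can be sketched with a toy decoder model. Everything here is an assumption for illustration: instructions are tuples of the form `("MOV", dest, src)`, the decode width is 4, fusion is only attempted inside one decode group, and the fused SWAP assumes Rt is dead afterwards. It does not describe any real front end.

```python
def is_swap(a, b, c):
    """True for MOV Rt,Ry; MOV Ry,Rx; MOV Rx,Rt (operands are dest, src)."""
    if not all(ins[0] == "MOV" for ins in (a, b, c)):
        return False
    (_, t, y), (_, y2, x), (_, x2, t2) = a, b, c
    return (y, x, t) == (y2, x2, t2)


def decode(stream, width=4):
    """Decode in groups of `width`; fuse only when all 3 MOVs share a group."""
    uops = []
    for base in range(0, len(stream), width):
        group = stream[base:base + width]
        i = 0
        while i < len(group):
            if i + 3 <= len(group) and is_swap(*group[i:i + 3]):
                # Assumes Rt is dead afterwards, so only the swap remains.
                uops.append(("SWAP", group[i][2], group[i + 1][2]))
                i += 3
            else:
                uops.append(group[i])
                i += 1
    return uops


swap_seq = [("MOV", "Rt", "Ry"), ("MOV", "Ry", "Rx"), ("MOV", "Rx", "Rt")]
nop = ("NOP",)
print(decode(swap_seq + [nop]))        # MOVs land in D1..D3: fused
print(decode([nop] + swap_seq))        # MOVs land in D2..D4: fused
print(decode([nop, nop] + swap_seq))   # MOVs straddle two groups: not fused
```

Only two of the four possible alignments fuse, which is the fragility in question.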

Re: Non-RISC-ness in AMD64

<6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22492&group=comp.arch#22492

Newsgroups: comp.arch
X-Received: by 2002:ac8:4e96:: with SMTP id 22mr16125560qtp.76.1640630883381;
Mon, 27 Dec 2021 10:48:03 -0800 (PST)
X-Received: by 2002:a05:6808:1448:: with SMTP id x8mr13794557oiv.84.1640630883178;
Mon, 27 Dec 2021 10:48:03 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 27 Dec 2021 10:48:02 -0800 (PST)
In-Reply-To: <sqbd31$jkg$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:7907:330c:656:c2fa;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:7907:330c:656:c2fa
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 27 Dec 2021 18:48:03 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 92
 by: MitchAlsup - Mon, 27 Dec 2021 18:48 UTC

On Sunday, December 26, 2021 at 9:48:52 PM UTC-6, Stephen Fuld wrote:
> On 12/24/2021 9:00 AM, Anton Ertl wrote:
> > Thinking about the architectures in Celio's talk
> > <https://www.youtube.com/watch?v=Ii_pEXKKYUg>
> > <https://arxiv.org/abs/1607.02318>
> > <https://arxiv.org/pdf/1607.02318.pdf> also made me think of what
> > CISCy problems AMD64 has.
> >
> > Ok, instruction decoding is an obvious problem, but it is now pretty
> > well understood how to decode variable-length instructions quickly if
> > we throw enough transistors at it (and now even variants of RISC-V
> > have variable-length instructions), so I will skip this part here.
> >
> > I'll focus on the central issue in RISC: AMD64 is not a load-store
> > architecture. And in the 1980s this made a big difference wrt. easy
> > and efficient implementation, but how are things now?
> >
> > One problem in the VAX (especially wrt the number of bug-prone corner
> > cases) was reported to be the number of page translations needed by
> > one instruction. On a page fault, the instruction would be rerun
> > afterwards, but the system would have to ensure that a some point all
> > pages needed by the instruction are there.
> >
> > The common instructions of AMD64 outside the load/store paradigm are:
> >
> > load-and-operate instructions, e.g., reg += mem. I don't see that
> > these instructions cause any difficulty. Am I missing something?
> > Neither A64 nor RISC-V have added such instructions.
> ISTM that one of the advantages of load and operate instructions is the
> savings in I$ usage/bandwidth of combining what was two instructions
> into one. Even if you have to make the destination be one of the
> sources, i.e. A = A + mem, and you have to restrict which addressing
> modes you can use (to save instruction bits), it may be worth while much
> of the time. And if you have to precede the instruction with a register
> to register copy to save the unmodified source, you are no worse off
> space wise, and that instruction may even be able to be done in the
> renaming stage, not causing a full instruction time.
>
> But perhaps it just isn't worth the trouble.
> > read-modify-write instructions, e.g., mem += reg. Here we can
> > translate the address once, fault in the page(s) that contain the
> > relevant memory, then do the reading and writing on mem. One issue
> > is whether the cache line can migrate to a different core between
> > reading and writing; AFAIK the architecture says permissions are
> > allowed to do it, not sure if the implementations actually do it.
> > Delaying the answer to a cache line request a little while the
> > "modify" part runs appears to be a relatively cheap way to deal with
> > the problem, but maybe I am missing something. In any case, this
> > appears more problematic than load-and-operate. At least in the K8
> > days AMD had a load-store microinstruction for implementing RMW
> > instructions.
> I think you have to differentiate between a plain RMW and an interlocked
> RMW. The latter may be necessary for locks, etc. The former is just a
> performance optimization.
<
Err, more than that:
<
Consider the case where a RISC machine uses 3 instructions:
<
LD Rt,[somewhere]
op Rt,Rt,Rx
ST Rt,[somewhere]
<
In the case where another core modifies the MMU tables between the LD and the
ST, you store to a different location than you loaded from; whereas an RMW
machine is guaranteed to store to the same location that was loaded.
<
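
A toy model of the three-instruction case, as a sketch only. The `Machine` class, the `remap` helper and the `between` hook are all invented for illustration. The model assumes whoever remaps the page also copies its contents, and it ignores TLBs entirely. It only shows that LD and ST each translate on their own, so the two physical addresses can differ.

```python
PAGE = 4096


class Machine:
    def __init__(self):
        self.page_table = {}   # virtual page number -> physical frame number
        self.phys = {}         # physical address -> value

    def translate(self, va):
        return self.page_table[va // PAGE] * PAGE + va % PAGE

    def remap(self, vpn, new_frame):
        """Move a page to a new frame, copying its contents (an assumption)."""
        old_frame = self.page_table[vpn]
        for off in range(PAGE):
            pa = old_frame * PAGE + off
            if pa in self.phys:
                self.phys[new_frame * PAGE + off] = self.phys.pop(pa)
        self.page_table[vpn] = new_frame


def three_instruction_add(m, va, rx, between=None):
    load_pa = m.translate(va)      # LD translates the address
    rt = m.phys[load_pa]
    rt += rx                       # op
    if between:
        between()                  # page tables change here
    store_pa = m.translate(va)     # ST translates again
    m.phys[store_pa] = rt
    return load_pa, store_pa


m = Machine()
m.page_table[1] = 7
va = PAGE + 8
m.phys[m.translate(va)] = 10
load_pa, store_pa = three_instruction_add(m, va, 5, between=lambda: m.remap(1, 9))
print(load_pa != store_pa, m.phys[m.translate(va)])
```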
> > Now for the not-so-common AMD64 instructions; among those that we have
> > inherited from the 8086, the most extreme seems to be MOVSW (and its
> > 32-bit variant MOVSL/MOVSD, and its 64-bit variant MOVSQ): it loads
> > from one memory address and stores in a different memory address;
> > overall it can access 4 pages (if both memory accesses are misaligned
> > and straddling pages); can anybody name anything worse?
> >
> > In recent years Intel has added the VGATHER and VSCATTER instructions
> > which (in their AVX512 form) can access up to 16 independent memory
> > locations in one instruction (if unaligned accesses are allowed, that
> > would be 32 pages). Makes me wonder if accessing many pages in one
> > instruction is no longer considered a problem.
> >
> > Apart from the memory accesses and the instruction encoding, are there
> > any other non-RISC properties of AMD64 that matter today?
> I am not sure this counts, but even ignoring page faults, instructions
> like the byte moves must be interruptible and restartable where you left
> off. Original RISC required all instructions to be a single cycle, so
> this never came up.
>
>
>
>
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)

Re: Non-RISC-ness in AMD64

<91cc87a4-548f-4d3c-b355-e155e114f917n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22494&group=comp.arch#22494

Newsgroups: comp.arch
X-Received: by 2002:a05:620a:b0e:: with SMTP id t14mr12844383qkg.146.1640631248786;
Mon, 27 Dec 2021 10:54:08 -0800 (PST)
X-Received: by 2002:a4a:d248:: with SMTP id e8mr9054569oos.5.1640631248588;
Mon, 27 Dec 2021 10:54:08 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 27 Dec 2021 10:54:08 -0800 (PST)
In-Reply-To: <HckyJ.106971$IB7.31124@fx02.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:7907:330c:656:c2fa;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:7907:330c:656:c2fa
References: <2021Dec25.174632@mips.complang.tuwien.ac.at> <memo.20211226165103.2376N@jgd.cix.co.uk>
<2021Dec26.184313@mips.complang.tuwien.ac.at> <2ad05c84-5fe1-46cb-aa59-f954fbbeafc3n@googlegroups.com>
<HckyJ.106971$IB7.31124@fx02.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <91cc87a4-548f-4d3c-b355-e155e114f917n@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 27 Dec 2021 18:54:08 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 73
 by: MitchAlsup - Mon, 27 Dec 2021 18:54 UTC

On Monday, December 27, 2021 at 8:37:32 AM UTC-6, EricP wrote:
> MitchAlsup wrote:
> >
> > In my opinion:
> >
> > Fusing should be a microarchitectural choice (i.e., implementation)
> > not architectural (all implementations have to do it.)
> >
> > There are things one can do in microarchitecture that one cannot do in
> > macroarchitecture:
> >
> > I remember back in the K9 design, we would recognize 3 moves in a row
> >
> > MOV Rt,Ry
> > MOV Ry,Rx
> > MOV Rx,Rt
> >
> > Was changed into:
> > MOV Rx,Ry; MOV Ry,Rx; MOV Rt,Ry
> > which executes simultaneously. Or into
> > MOV Rx,Ry; MOV Ry,Rx;
> > if Rt gets reassigned before the local horizon (MOV Rt ,Rz became dead code)
> >
> > Nobody would allow this in ISA design, but it is perfectly fine in micro-
> > architecture design.
> I was composing a post to ask this about RISC-V fusion but
> this example does just as well, replacing 3 MOV's with a SWAP.
>
> How does one prevent the above fusion from being 'fragile'?
> In that there appear to be many ways such fusion can fail
> and only a couple that can succeed.
<
a) These patterns are built at packet/trace build time, not at DECODE
time.
b) During packet/trace build time, there are plenty of cycles to look at
all the patterns and decide which one to build "this time".
c) During packet/trace build, instructions are passing down the pipeline
at raw decode width which will be significantly lower than when executing
packets.
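
Points a) to c) can be pictured as a peephole pass over an already-decoded trace, where decode-slot alignment no longer matters. This is only a sketch under assumptions: the tuple format `("MOV", dest, src)` and the `build_packet` name are invented here, and it is not a description of the actual K9 mechanism.

```python
def build_packet(trace):
    """Peephole pass over a decoded trace; returns a list of issue bundles."""
    out = []
    i = 0
    while i < len(trace):
        window = trace[i:i + 3]
        if len(window) == 3 and all(op[0] == "MOV" for op in window):
            (_, t, y), (_, y2, x), (_, x2, t2) = window
            if (y, x, t) == (y2, x2, t2):
                # MOV Rt,Ry; MOV Ry,Rx; MOV Rx,Rt becomes three moves that
                # all read the old register values and can issue together.
                out.append((("MOV", x, y), ("MOV", y, x), ("MOV", t, y)))
                i += 3
                continue
        out.append((trace[i],))    # unfused instruction gets its own bundle
        i += 1
    return out


swap_seq = [("MOV", "Rt", "Ry"), ("MOV", "Ry", "Rx"), ("MOV", "Rx", "Rt")]
nop = ("NOP",)
packet = build_packet([nop, nop] + swap_seq + [nop])
print(packet)
```

Because the pass walks the whole trace, the pattern is found wherever it starts, at the cost of only getting the benefit on later executions of the packet.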
>
> If we have 4 decoders D1..D4 and the MOV's parse into D1..D3 or D2..D4
> then we can detect that
> (a) they are all MOV's and
> (b) we check that the registers # all match up correctly
> (so there is some inter-decoder semantic validity checking)
> then it can emit one SWAP uOp.
> Otherwise they decode as separate uOps.
>
> But if D1,D2 have other instructions and the MOV's parse into D3,D4
> then what does it do?
> Should it stall D3,D4 and wait to see what the next instruction is,
> or skip fusion?
<
All good questions, and why you don't do this in the DECODEr.
>
> Or if two MOV's land in D1,D2 but then the fetch buffer is empty.
> Again stall or skip fusion?
>
> One option appears to be two simple extra lookahead decoders located after
> D4 that could warn D1..D4 that fusible instructions are about to arrive.
> But that requires extra parsers for variable length instructions.
> And it doesn't deal with the empty fetch buffer scenario.
>
> So it appears that fusion optimizations are laissez faire -
> if it works, great, but it is probabilistic and fragile.
>
> One idea was to add fusing into a uOp cache,
<
Must be the word of the decade, supplanting packet (1990) and trace (2000).
<
> so fusion MAY be detected by decoders on the first pass,
> but IS detected for subsequent usage.
> Of course this assumes one can afford an expensive uOp cache,
> and that kinda throws the whole risc approach under a bus.

Re: Non-RISC-ness in AMD64

<jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=22508&group=comp.arch#22508

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Mon, 27 Dec 2021 17:17:41 -0500
Organization: A noiseless patient Spider
Lines: 19
Message-ID: <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at>
<sqbd31$jkg$1@dont-email.me>
<6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="ff231636d4a4bdf362a9d3992bc0e63a";
logging-data="1359"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/zmdi4xLMY0Znk+vPPYlPr"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:kgr9CHhJLnvKBX5HRVoEz2jbJfY=
sha1:t9NMtyE/PZhcczdDKXFcmStbBxU=
 by: Stefan Monnier - Mon, 27 Dec 2021 22:17 UTC

> Consider the case where a RISC machine uses 3 instructions:
> <
> LD Rt,[somewhere]
> op Rt,Rt,Rx
> ST Rt,[somewhere]
> <
> In the case another core modifies the MMU tables between the LD and the
> ST you store to a different location that you loaded; whereas, an RMW
> machine is guaranteed to store to the same location that was loaded.

Why is it good to guarantee the same physical address?

Furthermore, in the CISC case if the TLB is changed in the
middle of the instruction, it seems wrong to store back into the
original physical address since it might now be allocated to
a completely different process.

Stefan

Re: Non-RISC-ness in AMD64

<1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22510&group=comp.arch#22510

Newsgroups: comp.arch
X-Received: by 2002:a05:622a:19a3:: with SMTP id u35mr16183577qtc.303.1640644838338;
Mon, 27 Dec 2021 14:40:38 -0800 (PST)
X-Received: by 2002:aca:646:: with SMTP id 67mr14634022oig.175.1640644838171;
Mon, 27 Dec 2021 14:40:38 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 27 Dec 2021 14:40:37 -0800 (PST)
In-Reply-To: <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:383f:58a2:7daa:7355;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:383f:58a2:7daa:7355
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me>
<6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 27 Dec 2021 22:40:38 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 25
 by: MitchAlsup - Mon, 27 Dec 2021 22:40 UTC

On Monday, December 27, 2021 at 4:17:44 PM UTC-6, Stefan Monnier wrote:
> > Consider the case where a RISC machine uses 3 instructions:
> > <
> > LD Rt,[somewhere]
> > op Rt,Rt,Rx
> > ST Rt,[somewhere]
> > <
> > In the case another core modifies the MMU tables between the LD and the
> > ST you store to a different location that you loaded; whereas, an RMW
> > machine is guaranteed to store to the same location that was loaded.
<
> Why is it good to guarantee the same physical address?
<
The converse is: what would you ever do if the PA were allowed to change?
What meaning could SW derive from it?
>
> Furthermore, in the CISC case if the TLB is changed in the
> middle of the instruction, it seems wrong to store back into the
> original physical address since it might now be allocated to
> a completely different process.
<
It is exactly 1 instruction looking at exactly 1 set of state the core
is operating under. RISC exposes this as non-unit-instruction.
>
>
> Stefan

Re: Non-RISC-ness in AMD64

<jwvee5xaa1j.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=22516&group=comp.arch#22516

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Mon, 27 Dec 2021 18:49:27 -0500
Organization: A noiseless patient Spider
Lines: 29
Message-ID: <jwvee5xaa1j.fsf-monnier+comp.arch@gnu.org>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at>
<sqbd31$jkg$1@dont-email.me>
<6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com>
<jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>
<1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="575e1be200083089f13bac3ff8f00071";
logging-data="11221"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18bleY0KIfcjJOescSH0cZI"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:49OjX/Vv9tYuMwcv4LdE1o7cEdM=
sha1:Xk2iYvw1hiG5O20py82WTd0wSLY=
 by: Stefan Monnier - Mon, 27 Dec 2021 23:49 UTC

>> > LD Rt,[somewhere]
>> > op Rt,Rt,Rx
>> > ST Rt,[somewhere]
>> > <
>> > In the case another core modifies the MMU tables between the LD and the
>> > ST you store to a different location that you loaded; whereas, an RMW
>> > machine is guaranteed to store to the same location that was loaded.
> <
>> Why is it good to guarantee the same physical address?
> <
> The converse is what would you ever do if the PA was allowed to change ?

Nothing special: the ST operates on the new physical address, which is
what we want since that's indeed where that location now lives.

>> Furthermore, in the CISC case if the TLB is changed in the
>> middle of the instruction, it seems wrong to store back into the
>> original physical address since it might now be allocated to
>> a completely different process.
> It is exactly 1 instruction looking at exactly 1 set of state the core
> is operating under. RISC exposes this as non-unit-instruction.

Assuming the RMW is not guaranteed to be atomic, I don't see the great
benefit of treating it as a single instruction (I can see some potential
benefits in terms of efficiency, but here I'm only concerned with
semantics).

Stefan

Re: Non-RISC-ness in AMD64

<2021Dec28.112737@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=22524&group=comp.arch#22524

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Tue, 28 Dec 2021 10:27:37 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 38
Message-ID: <2021Dec28.112737@mips.complang.tuwien.ac.at>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me> <6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org> <1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="601e837a5ee8f3e908f93612dbc27cca";
logging-data="9061"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+AA4ezALweTSXJvfcaknWL"
Cancel-Lock: sha1:Ksv3D8FuTVyeEOQMO8WZ2z21NdA=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Tue, 28 Dec 2021 10:27 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Monday, December 27, 2021 at 4:17:44 PM UTC-6, Stefan Monnier wrote:
>> > Consider the case where a RISC machine uses 3 instructions:
>> > <
>> > LD Rt,[somewhere]
>> > op Rt,Rt,Rx
>> > ST Rt,[somewhere]
>> > <
>> > In the case another core modifies the MMU tables between the LD and the
>> > ST you store to a different location that you loaded; whereas, an RMW
>> > machine is guaranteed to store to the same location that was loaded.
><
>> Why is it good to guarantee the same physical address?
><
>The converse is what would you ever do if the PA was allowed to change ?
>What meaning could SW derive for it ?

There could be a context switch (based on, say, the end of the time
slot) at the op instruction. The page containing "somewhere" could be
paged out and reused for something else, resulting in having no PA for
"somewhere". On becoming active again, the ST would first produce a
page fault (so another page table change), and "somewhere" would get a
PA, but that can be quite different from the old PA.

>> Furthermore, in the CISC case if the TLB is changed in the
>> middle of the instruction, it seems wrong to store back into the
>> original physical address since it might now be allocated to
>> a completely different process.

In the CISC case the whole instruction will not be committed; on
restarting it after the context switch, the read part of the RMW
instruction will already encounter a page fault, and the instruction
will later access "somewhere" at the new address.
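
The restart behaviour can be sketched as follows. This is a toy model with invented names (`Memory`, `rmw_add`, `run_rmw`), one word per page for brevity, and a fault handler that simply pages into a caller-supplied free frame. It is not how any real OS or CPU is structured.

```python
class PageFault(Exception):
    pass


class Memory:
    def __init__(self):
        self.present = {}   # vpn -> frame, resident pages only
        self.frames = {}    # frame -> value (one word per page, for brevity)
        self.swap = {}      # vpn -> value, paged-out pages

    def page_out(self, vpn):
        self.swap[vpn] = self.frames.pop(self.present.pop(vpn))

    def page_in(self, vpn, frame):
        self.present[vpn] = frame
        self.frames[frame] = self.swap.pop(vpn)

    def rmw_add(self, vpn, value):
        """One RMW instruction: it faults before any architectural effect."""
        if vpn not in self.present:
            raise PageFault(vpn)
        frame = self.present[vpn]
        self.frames[frame] += value
        return frame


def run_rmw(mem, vpn, value, free_frame):
    while True:
        try:
            return mem.rmw_add(vpn, value)
        except PageFault:
            mem.page_in(vpn, free_frame)   # handler runs, then we restart


mem = Memory()
mem.present[3] = 10
mem.frames[10] = 40
mem.page_out(3)                      # paged out while the thread was descheduled
frame = run_rmw(mem, 3, 2, free_frame=11)
print(frame, mem.frames[frame])
```

The instruction ends up operating on frame 11 even though the page originally lived in frame 10, and nothing is ever written to the old frame.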

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Non-RISC-ness in AMD64

<TkGyJ.242585$IW4.224257@fx48.iad>

https://www.novabbs.com/devel/article-flat.php?id=22531&group=comp.arch#22531

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.de!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx48.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me> <6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org> <1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at>
In-Reply-To: <2021Dec28.112737@mips.complang.tuwien.ac.at>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 46
Message-ID: <TkGyJ.242585$IW4.224257@fx48.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Tue, 28 Dec 2021 15:48:03 UTC
Date: Tue, 28 Dec 2021 10:47:03 -0500
X-Received-Bytes: 3006
 by: EricP - Tue, 28 Dec 2021 15:47 UTC

Anton Ertl wrote:
> MitchAlsup <MitchAlsup@aol.com> writes:
>> On Monday, December 27, 2021 at 4:17:44 PM UTC-6, Stefan Monnier wrote:
>>>> Consider the case where a RISC machine uses 3 instructions:
>>>> <
>>>> LD Rt,[somewhere]
>>>> op Rt,Rt,Rx
>>>> ST Rt,[somewhere]
>>>> <
>>>> In the case another core modifies the MMU tables between the LD and the
>>>> ST you store to a different location that you loaded; whereas, an RMW
>>>> machine is guaranteed to store to the same location that was loaded.
>> <
>>> Why is it good to guarantee the same physical address?
>> <
>> The converse is what would you ever do if the PA was allowed to change ?
>> What meaning could SW derive for it ?
>
> There could be a context switch (based on, say, the end of the time
> slot) at the op instruction. The page containing "somewhere" could be
> paged out and reused for something else, resulting in having no PA for
> "somewhere". On becoming active again, the ST would first produce a
> page fault (so another page table change), and "somewhere" would get a
> PA, but that can be quite different from the old PA.

The OS ensures that these things never occur without a TLB shootdown IPI.
Whether a RMW instruction does 1 or 2 translates is a local optimization
decision.

It all works if you think of the PTE Present flag as an ownership flag:
a Not-Present PTE is owned by the OS, a Present PTE is owned by the HW MMU.
Only the owner may change a PTE. The HW TLB may change the Accessed and
Modified flags on PTEs it owns, and the TLB _never_ caches PTEs marked
Not-Present.

For an OS to make any change to a Present PTE it must
- clear the Present flag to take ownership back from MMU
- issue a TLB shootdown to all potentially affected cores
- wait for all shootdown ACK's
- make its PTE changes
- set Present flag giving ownership back to MMU
though there may be ways to optimize this sequence in some situations.
OS is responsible for coordinating its own activities with
thread mutexes and cpu spinlocks.
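
The sequence above can be sketched as a toy simulation. This is only an
illustration under stated assumptions: the names PTE, Core and change_pte
are hypothetical and not taken from any real OS, each core's TLB is
modelled as a plain dict, and the shootdown "IPI" is a synchronous call
that returns its ACK.

```python
from dataclasses import dataclass, field


@dataclass
class PTE:
    present: bool = False  # True: owned by the HW MMU; False: owned by the OS
    frame: int = 0         # physical frame number


@dataclass
class Core:
    tlb: dict = field(default_factory=dict)  # vpn -> cached frame

    def translate(self, page_table, vpn):
        if vpn in self.tlb:
            return self.tlb[vpn]
        pte = page_table[vpn]
        if not pte.present:
            # Not-Present PTEs are never cached; the access faults instead.
            raise LookupError("page fault")
        self.tlb[vpn] = pte.frame
        return pte.frame

    def shootdown(self, vpn):
        # Stand-in for handling the shootdown IPI: drop the entry, then ACK.
        self.tlb.pop(vpn, None)
        return True


def change_pte(page_table, cores, vpn, new_frame):
    pte = page_table[vpn]
    pte.present = False                             # take ownership back from the MMU
    acks = [core.shootdown(vpn) for core in cores]  # shoot down all affected cores
    if not all(acks):                               # wait for all shootdown ACKs
        raise RuntimeError("missing shootdown ACK")
    pte.frame = new_frame                           # make the PTE change
    pte.present = True                              # give ownership back to the MMU
```

Because Present is cleared before the shootdown, a core that misses in
its TLB during the change faults rather than caching a half-updated PTE.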

Re: Non-RISC-ness in AMD64

<jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=22533&group=comp.arch#22533

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Tue, 28 Dec 2021 13:27:23 -0500
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at>
<sqbd31$jkg$1@dont-email.me>
<6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com>
<jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>
<1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com>
<2021Dec28.112737@mips.complang.tuwien.ac.at>
<TkGyJ.242585$IW4.224257@fx48.iad>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="c02f2ec344c1003adc9abcaf1b023887";
logging-data="16849"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/QxvWMFfeRZlmaYAymCPSm"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:nc79Drfb5E8rdYYSBWw81fJJrC4=
sha1:Lh10SJQTzuY77sIncwLS3XbneOU=
 by: Stefan Monnier - Tue, 28 Dec 2021 18:27 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
> ... an RMW machine is
> guaranteed to store to the same location that was loaded.

EricP [2021-12-28 10:47:03] wrote:
> Whether a RMW instruction does 1 or 2 translates is a local optimization
> decision.

My understanding is also that it's an implementation's optimization
choice, but Mitch seems to say it's not just an optimization.

Stefan

Re: Non-RISC-ness in AMD64

<164071692201.3020.8987411384740989250@media.vsta.org>

https://www.novabbs.com/devel/article-flat.php?id=22534&group=comp.arch#22534

Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: van...@vsta.org (Andy Valencia)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Tue, 28 Dec 2021 10:42:02 -0800
Lines: 18
Message-ID: <164071692201.3020.8987411384740989250@media.vsta.org>
References: <TkGyJ.242585$IW4.224257@fx48.iad> <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me> <6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org> <1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at>
X-Trace: individual.net STbzj5+UJcPyadS6G2kCaw4u9nCq45sDsoqdOZryqyhQnTKuBt
X-Orig-Path: media
Cancel-Lock: sha1:vAwrKPDIUMwIj6ySIZqWr2VHndc=
User-Agent: rn.py v0.0.1
 by: Andy Valencia - Tue, 28 Dec 2021 18:42 UTC

EricP <ThatWouldBeTelling@thevillage.com> writes:
> For an OS to make any change to a Present PTE it must
> - clear the Present flag to take ownership back from MMU
> - issue a TLB shootdown to all potentially affected cores
> - wait for all shootdown ACK's

I had to fix a fundamental flaw in Sequent's Symmetry line; the root cause
was a lame duck TLB. The original dev had gone with the "almost impossible
to last long enough to matter" design, and I had finally traced a truly
subtle bug to this bad assumption.

I pooled virtual address space in a generational treatment, so that the whole
shootdown/ack dance happened fairly rarely, and thus its cost amortized
nicely.
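
One way to read the amortization idea, sketched under loud assumptions:
the post does not describe the actual Sequent fix, so the class name
GenerationalVAPool, the two-list layout and the shootdown callback below
are all hypothetical. Freed virtual pages are parked instead of being
reused at once, and a single shootdown/ACK round is paid only when the
clean pool runs dry, covering every parked page together.

```python
class GenerationalVAPool:
    def __init__(self, pages, shootdown):
        self.clean = list(pages)    # safe to hand out: no stale TLB entries possible
        self.retired = []           # freed, but stale TLB entries may still exist
        self.shootdown = shootdown  # callable performing one full shootdown/ACK round
        self.generation = 0

    def alloc(self):
        if not self.clean:
            if not self.retired:
                raise MemoryError("no virtual pages left")
            self.shootdown()        # one round covers every retired page
            self.clean, self.retired = self.retired, []
            self.generation += 1
        return self.clean.pop()

    def free(self, page):
        self.retired.append(page)   # defer reuse; no shootdown on this path
```

With a pool of N pages, this model pays roughly one shootdown per N
allocations instead of one per free.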

Andy Valencia
Home page: https://www.vsta.org/andy/
To contact me: https://www.vsta.org/contact/andy.html

Re: Non-RISC-ness in AMD64

<f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22537&group=comp.arch#22537

X-Received: by 2002:a37:8883:: with SMTP id k125mr16666073qkd.464.1640721131068;
Tue, 28 Dec 2021 11:52:11 -0800 (PST)
X-Received: by 2002:a05:6808:11c5:: with SMTP id p5mr17837150oiv.51.1640721130955;
Tue, 28 Dec 2021 11:52:10 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 28 Dec 2021 11:52:10 -0800 (PST)
In-Reply-To: <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:e4fa:e569:5e7a:5438;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:e4fa:e569:5e7a:5438
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me>
<6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>
<1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at>
<TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 28 Dec 2021 19:52:11 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 21
 by: MitchAlsup - Tue, 28 Dec 2021 19:52 UTC

On Tuesday, December 28, 2021 at 12:27:25 PM UTC-6, Stefan Monnier wrote:
> MitchAlsup <Mitch...@aol.com> writes:
> > ... an RMW machine is
> > guaranteed to store to the same location that was loaded.
> EricP [2021-12-28 10:47:03] wrote:
> > Whether a RMW instruction does 1 or 2 translates is a local optimization
> > decision.
<
> My understanding is also that it's an implementation's optimization
> choice, but Mitch seems to say it's not just an optimization.
<
Mitch is not claiming that taking an interrupt between the LD and the ST is
problematic; Mitch is pointing out that if the MMU tables of task[k] change
while task[k] is using those tables, you are unlikely to get what you wanted.
<
Putting task[k] in a wait state (or, as EricP points out, removing the PTE from
use temporarily), moving the page, reinstalling the PTE, and then allowing
task[k] to run again is de rigueur (and has been since Multics). But changing
the paging tables of an actively running task is much more problematic.
>
>
> Stefan

Re: Non-RISC-ness in AMD64

<JaOyJ.205306$3q9.102503@fx47.iad>

https://www.novabbs.com/devel/article-flat.php?id=22541&group=comp.arch#22541

Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.ams4!peer.am4.highwinds-media.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx47.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me> <6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org> <1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at> <TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org> <f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com>
In-Reply-To: <f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 45
Message-ID: <JaOyJ.205306$3q9.102503@fx47.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Wed, 29 Dec 2021 00:43:21 UTC
Date: Tue, 28 Dec 2021 19:43:14 -0500
X-Received-Bytes: 3444
 by: EricP - Wed, 29 Dec 2021 00:43 UTC

MitchAlsup wrote:
> On Tuesday, December 28, 2021 at 12:27:25 PM UTC-6, Stefan Monnier wrote:
>> MitchAlsup <Mitch...@aol.com> writes:
>>> ... an RMW machine is
>>> guaranteed to store to the same location that was loaded.
>> EricP [2021-12-28 10:47:03] wrote:
>>> Whether a RMW instruction does 1 or 2 translates is a local optimization
>>> decision.
> <
>> My understanding is also that it's an implementation's optimization
>> choice, but Mitch seems to say it's not just an optimization.
> <
> Mitch is not claiming that taking an interrupt between the LD and the ST is
> problematic; Mitch is pointing out that if the MMU tables of task[k] change
> while task[k] is using those tables, you are unlikely to get what you wanted.
> <
> Putting task[k] in a wait state (or, as EricP points out, removing the PTE from
> use temporarily), moving the page, reinstalling the PTE, and then allowing
> task[k] to run again is de rigueur (and has been since Multics). But changing
> the paging tables of an actively running task is much more problematic.

Yes, it's not the RMW sequence that is a problem, although the shootdown-IPI
does prevent that because the RMW is all before or all after the interrupt.
But a LD OP ST sequence could be paged out after the LD and paged in
for the ST at a different physical address and it would not be harmed.
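
A toy trace of that LD OP ST case, as a sketch only: the frame numbers,
the one-word-per-frame memory, the swap dict and the function name are
all made up for illustration.

```python
def demo_ld_op_st_across_page_move():
    phys = {5: 41, 9: 0}        # frame -> contents (one word per frame)
    page_table = {0: 5}         # virtual page 0 currently lives in frame 5
    swap = {}                   # backing store: vpn -> contents

    old_frame = page_table[0]
    rt = phys[old_frame]        # LD Rt,[somewhere]

    # Context switch: the page is paged out and its frame is reused.
    swap[0] = phys[old_frame]
    del page_table[0]
    phys[old_frame] = 0xDEAD    # now belongs to something else

    rt = rt + 1                 # op Rt,Rt,Rx (register-only, unaffected)

    # The ST page-faults; the page comes back in a different frame.
    page_table[0] = 9
    phys[9] = swap.pop(0)

    phys[page_table[0]] = rt    # ST Rt,[somewhere] uses the new translation
    return old_frame, page_table[0], phys[page_table[0]], phys[old_frame]
```

The store lands in the new frame with the expected value and the reused
old frame is left alone, because the data moved together with the page.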

The purpose of the shootdown-IPI handshake is to ensure that
all cores agree that there is just one translation for that address.

You wouldn't want one core to RMW the old physical address and a different
core executing the same instruction to RMW the new physical address.
The shootdown-IPI handshake acts like a write-invalidate cache
coherence protocol, but for the PTE and implemented in software.
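
What goes wrong without the handshake can be shown with a small sketch.
The model is an assumption for illustration only (dict TLBs, hypothetical
names rmw_increment and run): core A keeps a stale translation after the
page moves, so its RMW hits the old frame while core B's hits the new one.

```python
def rmw_increment(tlb, page_table, phys, vpn):
    # On a TLB hit the cached frame is used; on a miss the page table is walked.
    frame = tlb.setdefault(vpn, page_table[vpn])
    phys[frame] += 1


def run(do_shootdown):
    phys = {5: 0, 9: 0}         # frame -> counter value
    page_table = {0: 5}
    tlb_a, tlb_b = {}, {}

    rmw_increment(tlb_a, page_table, phys, 0)  # core A caches vpn 0 -> frame 5

    # The OS moves the page from frame 5 to frame 9.
    phys[9] = phys[5]
    page_table[0] = 9
    if do_shootdown:
        tlb_a.clear()
        tlb_b.clear()

    rmw_increment(tlb_a, page_table, phys, 0)  # stale entry if no shootdown
    rmw_increment(tlb_b, page_table, phys, 0)
    return phys[page_table[0]]
```

With the shootdown all three increments are visible at the current
location; without it, core A's second increment is lost to the old frame.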

One can't eliminate waiting for all TLB shootdown ACK's before
changing a PTE, but one might be able to integrate the shootdown with
cache coherence protocol Invalidates which also must ACK so that
exclusive ownership of the PTE cache line is transferred to the writer
at the same time other copies are eliminated from all caches and TLB's.
(It's not quite the same because software decides which cores get
shootdowns while hardware tracks which caches have line copies.)
This could eliminate the IPI overhead.

Re: Non-RISC-ness in AMD64

<e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22544&group=comp.arch#22544

X-Received: by 2002:a37:bd05:: with SMTP id n5mr16871111qkf.293.1640739236652;
Tue, 28 Dec 2021 16:53:56 -0800 (PST)
X-Received: by 2002:a05:6830:348f:: with SMTP id c15mr17501268otu.254.1640739236385;
Tue, 28 Dec 2021 16:53:56 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 28 Dec 2021 16:53:56 -0800 (PST)
In-Reply-To: <JaOyJ.205306$3q9.102503@fx47.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8d47:d976:4476:a5f6;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8d47:d976:4476:a5f6
References: <2021Dec24.180027@mips.complang.tuwien.ac.at> <sqbd31$jkg$1@dont-email.me>
<6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com> <jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>
<1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com> <2021Dec28.112737@mips.complang.tuwien.ac.at>
<TkGyJ.242585$IW4.224257@fx48.iad> <jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org>
<f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com> <JaOyJ.205306$3q9.102503@fx47.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com>
Subject: Re: Non-RISC-ness in AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 29 Dec 2021 00:53:56 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 53
 by: MitchAlsup - Wed, 29 Dec 2021 00:53 UTC

On Tuesday, December 28, 2021 at 6:43:24 PM UTC-6, EricP wrote:
> MitchAlsup wrote:
> > On Tuesday, December 28, 2021 at 12:27:25 PM UTC-6, Stefan Monnier wrote:
> >> MitchAlsup <Mitch...@aol.com> writes:
> >>> ... an RMW machine is
> >>> guaranteed to store to the same location that was loaded.
> >> EricP [2021-12-28 10:47:03] wrote:
> >>> Whether a RMW instruction does 1 or 2 translates is a local optimization
> >>> decision.
> > <
> >> My understanding is also that it's an implementation's optimization
> >> choice, but Mitch seems to say it's not just an optimization.
> > <
> > Mitch is not claiming that taking an interrupt between the LD and the ST is
> > problematic; Mitch is pointing out that if the MMU tables of task[k] change
> > while task[k] is using those tables, you are unlikely to get what you wanted.
> > <
> > Putting task[k] in a wait state (or, as EricP points out, removing the PTE from
> > use temporarily), moving the page, reinstalling the PTE, and then allowing
> > task[k] to run again is de rigueur (and has been since Multics). But changing
> > the paging tables of an actively running task is much more problematic.
> Yes, it's not the RMW sequence that is a problem, although the shootdown-IPI
> does prevent that because the RMW is all before or all after the interrupt.
> But a LD OP ST sequence could be paged out after the LD and paged in
> for the ST at a different physical address and it would not be harmed.
>
> The purpose of the shootdown-IPI handshake is to ensure that
> all cores agree that there is just one translation for that address.
<
But consider the case where the TLBs are coherent. Anyone with TLB
permission to anyone else's MMU tables can write to and instantly
modify the other guy's TLB entries. With multi-core operations, this
could happen....
<
With coherent TLBs, there are no IPI-shootdowns. In fact, no IPIs
at all in modifying MMU tables. You just write the MMU tables and
it is up to all the HW resources to "do the right thing".
<
That SW prevents this is no reason HW guys should not be worried
about anomalous behavior.
>
> You wouldn't want one core to RMW the old physical address and a different
> core executing the same instruction to RMW the new physical address.
> The shootdown-IPI handshake acts like a write-invalidate cache
> coherence protocol, but for the PTE and implemented in software.
>
> One can't eliminate waiting for all TLB shootdown ACK's before
> changing a PTE, but one might be able to integrate the shootdown with
> cache coherence protocol Invalidates which also must ACK so that
> exclusive ownership of the PTE cache line is transferred to the writer
> at the same time other copies are eliminated from all caches and TLB's.
> (It's not quite the same because software decides which cores get
> shootdowns while hardware tracks which caches have line copies.)
> This could eliminate the IPI overhead.

Re: Non-RISC-ness in AMD64

<jwva6gk81dz.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=22546&group=comp.arch#22546

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Non-RISC-ness in AMD64
Date: Tue, 28 Dec 2021 23:46:48 -0500
Organization: A noiseless patient Spider
Lines: 12
Message-ID: <jwva6gk81dz.fsf-monnier+comp.arch@gnu.org>
References: <2021Dec24.180027@mips.complang.tuwien.ac.at>
<sqbd31$jkg$1@dont-email.me>
<6cfb554a-cd8c-4a83-804f-7299057e1c2dn@googlegroups.com>
<jwvv8z9ae74.fsf-monnier+comp.arch@gnu.org>
<1b4f4748-8370-4c6b-97dd-423178575422n@googlegroups.com>
<2021Dec28.112737@mips.complang.tuwien.ac.at>
<TkGyJ.242585$IW4.224257@fx48.iad>
<jwvpmpg8u3h.fsf-monnier+comp.arch@gnu.org>
<f742142a-1582-4561-8eb6-e0c5443b8c58n@googlegroups.com>
<JaOyJ.205306$3q9.102503@fx47.iad>
<e8c364e1-90af-44ac-9545-81f635c96b72n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="c8e3e16af424ed6fc14d57f777fdaf24";
logging-data="29278"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX197wv4khq8zerDzX7ApcDh1"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:Szg5T0fLoUCUSyR9tdyEO6RwxMc=
sha1:Ar2xuHnNc3+rmgS9fdG0/ZawZvE=
 by: Stefan Monnier - Wed, 29 Dec 2021 04:46 UTC

> With coherent TLBs, there are no IPI-shootdowns. In fact, no IPIs
> at all in modifying MMU tables. You just write the MMU tables and
> it is up to all the HW resources to "do the right thing".
>
> That SW prevents this is no reason HW guys should not be worried
> about anomalous behavior.

I still don't see what kind of anomalous behavior you're thinking of
that's solved by special handling of (non-atomic) RMW.

Stefan
