devel / comp.arch / Re: RISC-V vs. Aarch64

Re: The type of Mill's belt's slots

<b7b60e9f-e043-4265-90bc-5e83a496ccc6n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23069&group=comp.arch#23069

X-Received: by 2002:ac8:5c89:: with SMTP id r9mr10411824qta.42.1642966484205;
Sun, 23 Jan 2022 11:34:44 -0800 (PST)
X-Received: by 2002:a05:6808:ecd:: with SMTP id q13mr4883454oiv.122.1642966484067;
Sun, 23 Jan 2022 11:34:44 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 23 Jan 2022 11:34:43 -0800 (PST)
In-Reply-To: <221708c9-f4be-4421-a680-93a3e3599bd1n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:a9ec:f6c7:4e0d:af04;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:a9ec:f6c7:4e0d:af04
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <ss858d$m0a$1@dont-email.me>
<74f730f1-e138-4124-9fbb-21f388eafeb3n@googlegroups.com> <ss9p4l$a2i$1@dont-email.me>
<34d3c262-c385-4a42-a6c1-a15c34536b28n@googlegroups.com> <ssero6$kae$1@dont-email.me>
<6cc6b294-7b58-42a8-8751-4fd808ea034fn@googlegroups.com> <ssfreu$ck$1@dont-email.me>
<666a6ae5-0bac-421e-81c6-8e3d752192e0n@googlegroups.com> <2022Jan22.144235@mips.complang.tuwien.ac.at>
<effb9090-c7c1-4e64-9a71-5b4b5d444d4en@googlegroups.com> <2022Jan23.002646@mips.complang.tuwien.ac.at>
<221708c9-f4be-4421-a680-93a3e3599bd1n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b7b60e9f-e043-4265-90bc-5e83a496ccc6n@googlegroups.com>
Subject: Re: The type of Mill's belt's slots
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 23 Jan 2022 19:34:44 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 33
 by: MitchAlsup - Sun, 23 Jan 2022 19:34 UTC

On Sunday, January 23, 2022 at 4:09:03 AM UTC-6, Michael S wrote:
> On Sunday, January 23, 2022 at 1:31:58 AM UTC+2, Anton Ertl wrote:
> > Michael S <already...@yahoo.com> writes:
> > >I don't think that Spectre-V1 (bounds check bypass) can be solved in hardware without significant performance degradation on many important workloads.
> > We have described here in January 2018 (and repeated it later) how to
> > fix Spectre (all versions): Just treat microarchitectural state like
> > they treat architectural state: only make it visible to the outside on
> > commit, and throw it away on misprediction.
> >
> > I think this can be achieved with little performance degradation for
> > most workloads; it will cost hardware (not huge amounts, but still)
> > and some design effort.
> > - anton
> > --
> > 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> > Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>
<
> I read these discussions. Found all proposals unconvincing.
> They all appear to either underestimate performance cost of not doing load
> that is based on result of speculated load
<
This is not <directly> the problem. It is absolutely OK to perform a LD that
is under the shadow of a branch that reads data which is used as an operand
to another LD (or ST). What is NOT OK is to use the loaded value in a subsequent
AGEN when the LD does not get a hit in either tag or TLB or both. It is the use of
"bad data" that opens the sideband window, and it is the pre-retirement updates
of microarchitectural state (cache tags, cache lines, TLB PTEs, ...) that further
opens the sideband channel window so it can be exploited.
<
> or underestimate area cost of circuit
> that completely undoes side-effects of such "twice-speculated" loads.
<
I submit that the overhead of closing this is nearly invisible in size and nearly
invisible in performance.

Re: The type of Mill's belt's slots

<a2c3f8cc-526b-40d0-9937-e54bb3fb833cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23070&group=comp.arch#23070

X-Received: by 2002:a05:620a:2195:: with SMTP id g21mr2183290qka.495.1642970698385;
Sun, 23 Jan 2022 12:44:58 -0800 (PST)
X-Received: by 2002:a05:6830:2b24:: with SMTP id l36mr9880504otv.333.1642970698165;
Sun, 23 Jan 2022 12:44:58 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 23 Jan 2022 12:44:57 -0800 (PST)
In-Reply-To: <b7b60e9f-e043-4265-90bc-5e83a496ccc6n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:e9bc:5e0b:e2f5:ee79;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:e9bc:5e0b:e2f5:ee79
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <ss858d$m0a$1@dont-email.me>
<74f730f1-e138-4124-9fbb-21f388eafeb3n@googlegroups.com> <ss9p4l$a2i$1@dont-email.me>
<34d3c262-c385-4a42-a6c1-a15c34536b28n@googlegroups.com> <ssero6$kae$1@dont-email.me>
<6cc6b294-7b58-42a8-8751-4fd808ea034fn@googlegroups.com> <ssfreu$ck$1@dont-email.me>
<666a6ae5-0bac-421e-81c6-8e3d752192e0n@googlegroups.com> <2022Jan22.144235@mips.complang.tuwien.ac.at>
<effb9090-c7c1-4e64-9a71-5b4b5d444d4en@googlegroups.com> <2022Jan23.002646@mips.complang.tuwien.ac.at>
<221708c9-f4be-4421-a680-93a3e3599bd1n@googlegroups.com> <b7b60e9f-e043-4265-90bc-5e83a496ccc6n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a2c3f8cc-526b-40d0-9937-e54bb3fb833cn@googlegroups.com>
Subject: Re: The type of Mill's belt's slots
From: already5...@yahoo.com (Michael S)
Injection-Date: Sun, 23 Jan 2022 20:44:58 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 37
 by: Michael S - Sun, 23 Jan 2022 20:44 UTC

On Sunday, January 23, 2022 at 9:34:45 PM UTC+2, MitchAlsup wrote:
> On Sunday, January 23, 2022 at 4:09:03 AM UTC-6, Michael S wrote:
> > On Sunday, January 23, 2022 at 1:31:58 AM UTC+2, Anton Ertl wrote:
> > > Michael S <already...@yahoo.com> writes:
> > > >I don't think that Spectre-V1 (bounds check bypass) can be solved in hardware without significant performance degradation on many important workloads.
> > > We have described here in January 2018 (and repeated it later) how to
> > > fix Spectre (all versions): Just treat microarchitectural state like
> > > they treat architectural state: only make it visible to the outside on
> > > commit, and throw it away on misprediction.
> > >
> > > I think this can be achieved with little performance degradation for
> > > most workloads; it will cost hardware (not huge amounts, but still)
> > > and some design effort.
> > > - anton
> > > --
> > > 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> > > Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>
> <
> > I read these discussions. Found all proposals unconvincing.
> > They all appear to either underestimate performance cost of not doing load
> > that is based on result of speculated load
> <
> This is not <directly> the problem. It is absolutely OK to perform a LD that
> is under the shadow of a branch that reads data which is used as an operand
> to another LD (or ST). What is NOT OK is to use the loaded value in a subsequent
> AGEN when the LD does not get a hit in either tag or TLB or both. It is the use of
> "bad data" that opens the sideband window, and it is the pre-retirement updates
> of microarchitectural state (cache tags, cache lines, TLB PTEs, ...) that further
> opens the sideband channel window so it can be exploited.

But the majority of the gain from speculation comes exactly in the situation where a "twice-speculated" load misses L1D$

> <
> > or underestimate area cost of circuit
> > that completely undoes side-effects of such "twice-speculated" loads.
> <
> I submit that the overhead of closing this is nearly invisible in size and nearly
> invisible in performance.

Re: The type of Mill's belt's slots

<66880e90-c285-4797-ab5d-0f3c1ec78101n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23072&group=comp.arch#23072

X-Received: by 2002:a37:a497:: with SMTP id n145mr2681554qke.527.1642971355888;
Sun, 23 Jan 2022 12:55:55 -0800 (PST)
X-Received: by 2002:a9d:650e:: with SMTP id i14mr9435643otl.350.1642971355562;
Sun, 23 Jan 2022 12:55:55 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 23 Jan 2022 12:55:55 -0800 (PST)
In-Reply-To: <a2c3f8cc-526b-40d0-9937-e54bb3fb833cn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:a9ec:f6c7:4e0d:af04;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:a9ec:f6c7:4e0d:af04
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <ss858d$m0a$1@dont-email.me>
<74f730f1-e138-4124-9fbb-21f388eafeb3n@googlegroups.com> <ss9p4l$a2i$1@dont-email.me>
<34d3c262-c385-4a42-a6c1-a15c34536b28n@googlegroups.com> <ssero6$kae$1@dont-email.me>
<6cc6b294-7b58-42a8-8751-4fd808ea034fn@googlegroups.com> <ssfreu$ck$1@dont-email.me>
<666a6ae5-0bac-421e-81c6-8e3d752192e0n@googlegroups.com> <2022Jan22.144235@mips.complang.tuwien.ac.at>
<effb9090-c7c1-4e64-9a71-5b4b5d444d4en@googlegroups.com> <2022Jan23.002646@mips.complang.tuwien.ac.at>
<221708c9-f4be-4421-a680-93a3e3599bd1n@googlegroups.com> <b7b60e9f-e043-4265-90bc-5e83a496ccc6n@googlegroups.com>
<a2c3f8cc-526b-40d0-9937-e54bb3fb833cn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <66880e90-c285-4797-ab5d-0f3c1ec78101n@googlegroups.com>
Subject: Re: The type of Mill's belt's slots
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 23 Jan 2022 20:55:55 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 71
 by: MitchAlsup - Sun, 23 Jan 2022 20:55 UTC

On Sunday, January 23, 2022 at 2:44:59 PM UTC-6, Michael S wrote:
> On Sunday, January 23, 2022 at 9:34:45 PM UTC+2, MitchAlsup wrote:
> > On Sunday, January 23, 2022 at 4:09:03 AM UTC-6, Michael S wrote:
> > > On Sunday, January 23, 2022 at 1:31:58 AM UTC+2, Anton Ertl wrote:
> > > > Michael S <already...@yahoo.com> writes:
> > > > >I don't think that Spectre-V1 (bounds check bypass) can be solved in hardware without significant performance degradation on many important workloads.
> > > > We have described here in January 2018 (and repeated it later) how to
> > > > fix Spectre (all versions): Just treat microarchitectural state like
> > > > they treat architectural state: only make it visible to the outside on
> > > > commit, and throw it away on misprediction.
> > > >
> > > > I think this can be achieved with little performance degradation for
> > > > most workloads; it will cost hardware (not huge amounts, but still)
> > > > and some design effort.
> > > > - anton
> > > > --
> > > > 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> > > > Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>
> > <
> > > I read these discussions. Found all proposals unconvincing.
> > > They all appear to either underestimate performance cost of not doing load
> > > that is based on result of speculated load
> > <
> > This is not <directly> the problem. It is absolutely OK to perform a LD that
> > is under the shadow of a branch that reads data which is used as an operand
> > to another LD (or ST). What is NOT OK is to use the loaded value in a subsequent
> > AGEN when the LD does not get a hit in either tag or TLB or both. It is the use of
> > "bad data" that opens the sideband window, and it is the pre-retirement updates
> > of microarchitectural state (cache tags, cache lines, TLB PTEs, ...) that further
> > opens the sideband channel window so it can be exploited.
<
> But the majority of the gain from speculation comes exactly in the situation where a "twice-speculated" load misses L1D$
<
No, it is not. What is important is that subsequent loads can be processed
to the point of retirement, awaiting the missed load to finally resolve and
allow itself and subsequents to retire.
<
Spectré and Meltdown have a side channel means to observe the states of
the cache that is ONLY visible if instructions that will never retire are allowed
to modify cache state (Meltdown) or predictor state (Spectré).
<
The elimination of a high precision clock also eliminates both of these exploits.
<
But so does deferring the microarchitectural state updates until retirement.
<
> > <
> > > or underestimate area cost of circuit
> > > that completely undoes side-effects of such "twice-speculated" loads.
> > <
> > I submit that the overhead of closing this is nearly invisible in size and nearly
> > invisible in performance.
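
As a rough illustration of the high-precision-clock point in this post: the probe side of these attacks tells a cache hit from a miss by timing a load. The toy model below is only a sketch. The latencies (`HIT_CYCLES`, `MISS_CYCLES`) are made-up numbers, and `read_clock`, `timed_probe` and `distinguishable` are hypothetical names that do not appear anywhere in the thread. It models a single probe starting on a clock tick and ignores repeated or amplified measurements.

```python
HIT_CYCLES = 4      # assumed L1 hit latency, illustrative only
MISS_CYCLES = 200   # assumed miss latency, illustrative only


def read_clock(true_cycle, granularity):
    """Hypothetical clock that only advances in steps of `granularity` cycles."""
    return (true_cycle // granularity) * granularity


def timed_probe(start_cycle, line_cached, granularity):
    """Return the load latency as an observer using read_clock would measure it."""
    latency = HIT_CYCLES if line_cached else MISS_CYCLES
    t0 = read_clock(start_cycle, granularity)
    t1 = read_clock(start_cycle + latency, granularity)
    return t1 - t0


def distinguishable(granularity, start_cycle=0):
    """True if a single probe can tell a cached line from an uncached one."""
    hit = timed_probe(start_cycle, True, granularity)
    miss = timed_probe(start_cycle, False, granularity)
    return hit != miss
```

With a cycle-accurate clock (`granularity=1`) the two cases measure as 4 and 200 cycles. With a tick of 1000 cycles and a probe that starts on a tick, both measure as 0. A probe that happens to straddle a tick boundary can still differ in this model, so the sketch shows why timer resolution matters rather than proving that a coarse clock is sufficient.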

Re: fixing Spectre (was: The type of Mill's belt's slots)

<2022Jan23.233628@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=23073&group=comp.arch#23073

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: fixing Spectre (was: The type of Mill's belt's slots)
Date: Sun, 23 Jan 2022 22:36:28 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 17
Message-ID: <2022Jan23.233628@mips.complang.tuwien.ac.at>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <ssero6$kae$1@dont-email.me> <6cc6b294-7b58-42a8-8751-4fd808ea034fn@googlegroups.com> <ssfreu$ck$1@dont-email.me> <666a6ae5-0bac-421e-81c6-8e3d752192e0n@googlegroups.com> <2022Jan22.144235@mips.complang.tuwien.ac.at> <effb9090-c7c1-4e64-9a71-5b4b5d444d4en@googlegroups.com> <2022Jan23.002646@mips.complang.tuwien.ac.at> <221708c9-f4be-4421-a680-93a3e3599bd1n@googlegroups.com> <2022Jan23.193555@mips.complang.tuwien.ac.at> <ssk8p5$o1k$1@dont-email.me>
Injection-Info: reader02.eternal-september.org; posting-host="231a70890999f0984e9e4b5f17818584";
logging-data="5052"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+r3prLE8NuWPUGuf3Fe3DP"
Cancel-Lock: sha1:blV+iti47f4tMYPyjU96Cag7+Os=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sun, 23 Jan 2022 22:36 UTC

Ivan Godard <ivan@millcomputing.com> writes:
>On 1/23/2022 10:35 AM, Anton Ertl wrote:
>> You don't undo the side-effects. Instead, you don't do the side
>> effects. You just keep the load buffers around until commit, and then
>> you update the L1 cache. And if the line is already in the L1 cache,
>> you only need to keep the info around that the line has been accessed
>> (for LRU).
>
>Yes, but that's hard to get right - the LRU can leak the reference too.

That's why you don't update the LRU info until the load has been
committed.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
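
The scheme described in this post (keep speculative loads in load buffers, and touch the L1 contents and the LRU info only at commit) can be sketched as a toy model. This is an illustration under stated assumptions, not anyone's actual design: the class `CommitTimeCache`, its method names, the fully associative organisation and the dictionary standing in for memory are all hypothetical.

```python
from collections import OrderedDict


class CommitTimeCache:
    """Toy fully associative L1 with LRU; speculative loads never modify it."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # address -> data, least recently used first
        self.load_buffer = {}        # load id -> (address, data), not yet committed

    def speculative_load(self, load_id, address, memory):
        # Read from the cache if the line is present, else from memory,
        # but record the access only in the private load buffer.
        # Plain indexing of an OrderedDict does not change its order.
        data = self.lines[address] if address in self.lines else memory[address]
        self.load_buffer[load_id] = (address, data)
        return data

    def commit(self, load_id):
        # Only now does the access become visible: fill the line, update LRU.
        address, data = self.load_buffer.pop(load_id)
        self.lines[address] = data
        self.lines.move_to_end(address)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used line

    def squash(self, load_id):
        # Misprediction: drop the buffered access; cache and LRU stay untouched.
        self.load_buffer.pop(load_id, None)
```

In this model a squashed load leaves `lines` exactly as it was, both in contents and in order, so a later probe of the cache cannot tell whether the squashed load hit, missed, or never ran.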

Re: The type of Mill's belt's slots

<ssksr0$r2a$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=23074&group=comp.arch#23074

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: The type of Mill's belt's slots
Date: Sun, 23 Jan 2022 16:45:21 -0800
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <ssksr0$r2a$1@dont-email.me>
References: <ss7ila$obl$1@dont-email.me>
<jwv7dawmmb7.fsf-monnier+comp.arch@gnu.org> <ss858d$m0a$1@dont-email.me>
<74f730f1-e138-4124-9fbb-21f388eafeb3n@googlegroups.com>
<ss9p4l$a2i$1@dont-email.me>
<34d3c262-c385-4a42-a6c1-a15c34536b28n@googlegroups.com>
<ssero6$kae$1@dont-email.me>
<6cc6b294-7b58-42a8-8751-4fd808ea034fn@googlegroups.com>
<ssfreu$ck$1@dont-email.me>
<666a6ae5-0bac-421e-81c6-8e3d752192e0n@googlegroups.com>
<2022Jan22.144235@mips.complang.tuwien.ac.at>
<4qrrugdkra9v3jipof29gfhvjcriggv1nv@4ax.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 24 Jan 2022 00:45:20 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="d3662e65be4ce2d7732efa79c9f1ae77";
logging-data="27722"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX188X8S1tijDeG4xKIidYh3E"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Cancel-Lock: sha1:PvGa7dz7GFzEdvqMqAacSRit3Lc=
In-Reply-To: <4qrrugdkra9v3jipof29gfhvjcriggv1nv@4ax.com>
Content-Language: en-US
 by: Ivan Godard - Mon, 24 Jan 2022 00:45 UTC

On 1/23/2022 4:16 PM, David W Schroth wrote:
> On Sat, 22 Jan 2022 13:42:35 GMT, anton@mips.complang.tuwien.ac.at
> (Anton Ertl) wrote:
>
>> MitchAlsup <MitchAlsup@aol.com> writes:
>>> I wonder if all of the viruses, Spectré, Meltdown, RoP attacks have reduced
>>> PC sales by more than a total of 37 chips ?
>>
>> Probably not by much, thanks also to the competitive landscape. But I
>> think that if one of the manufacturers chose to close Spectre, and
>> then made this big in their marketing, it might win them quite a lot
>> of sales (and lose the other quite a lot).
>>
>> Given that Alder Lake came out still vulnerable to Spectre 54 months
>> after Intel was informed of Spectre in June 2017, it seems that they
>> don't want to take this opportunity to have a USP for a while. We
>> will see with Zen 4 towards the end of the year whether AMD is wiser.
>>
>> We have seen with Rowhammer that the DRAM industry seems to just want
>> to wait this out. They seem to think that Rowhammer-immune DRAM won't
>> sell. Does the CPU industry think the same about Spectre?
>>
>> - anton
>
> I would be interested in this forum's reaction to the article in the
> December 2021 issue of Communications of the ACM, "Speculative Taint
> Tracking (STT): A Comprehensive Protection for Speculatively Accessed
> Data".

Available without paywall at
https://www.cs.tau.ac.il/~mad/publications/micro2019-stt.pdf

Re: The type of Mill's belt's slots

<a6637e52-b757-4ccf-ba9f-d79c17ff2b0cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23075&group=comp.arch#23075

X-Received: by 2002:a37:6313:: with SMTP id x19mr4245347qkb.367.1642986384177;
Sun, 23 Jan 2022 17:06:24 -0800 (PST)
X-Received: by 2002:aca:646:: with SMTP id 67mr8055988oig.175.1642986383981;
Sun, 23 Jan 2022 17:06:23 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 23 Jan 2022 17:06:23 -0800 (PST)
In-Reply-To: <4qrrugdkra9v3jipof29gfhvjcriggv1nv@4ax.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:34f2:4189:fa0:ff4c;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:34f2:4189:fa0:ff4c
References: <ss7ila$obl$1@dont-email.me> <jwv7dawmmb7.fsf-monnier+comp.arch@gnu.org>
<ss858d$m0a$1@dont-email.me> <74f730f1-e138-4124-9fbb-21f388eafeb3n@googlegroups.com>
<ss9p4l$a2i$1@dont-email.me> <34d3c262-c385-4a42-a6c1-a15c34536b28n@googlegroups.com>
<ssero6$kae$1@dont-email.me> <6cc6b294-7b58-42a8-8751-4fd808ea034fn@googlegroups.com>
<ssfreu$ck$1@dont-email.me> <666a6ae5-0bac-421e-81c6-8e3d752192e0n@googlegroups.com>
<2022Jan22.144235@mips.complang.tuwien.ac.at> <4qrrugdkra9v3jipof29gfhvjcriggv1nv@4ax.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a6637e52-b757-4ccf-ba9f-d79c17ff2b0cn@googlegroups.com>
Subject: Re: The type of Mill's belt's slots
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 24 Jan 2022 01:06:24 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 35
 by: MitchAlsup - Mon, 24 Jan 2022 01:06 UTC

On Sunday, January 23, 2022 at 6:14:26 PM UTC-6, David W Schroth wrote:
> On Sat, 22 Jan 2022 13:42:35 GMT, an...@mips.complang.tuwien.ac.at
> (Anton Ertl) wrote:
>
> >MitchAlsup <Mitch...@aol.com> writes:
> >>I wonder if all of the viruses, Spectré, Meltdown, RoP attacks have reduced
> >>PC sales by more than a total of 37 chips ?
> >
> >Probably not by much, thanks also to the competitive landscape. But I
> >think that if one of the manufacturers chose to close Spectre, and
> >then made this big in their marketing, it might win them quite a lot
> >of sales (and lose the other quite a lot).
> >
> >Given that Alder Lake came out still vulnerable to Spectre 54 months
> >after Intel was informed of Spectre in June 2017, it seems that they
> >don't want to take this opportunity to have a USP for a while. We
> >will see with Zen 4 towards the end of the year whether AMD is wiser.
> >
> >We have seen with Rowhammer that the DRAM industry seems to just want
> >to wait this out. They seem to think that Rowhammer-immune DRAM won't
> >sell. Does the CPU industry think the same about Spectre?
> >
> >- anton
>
> I would be interested in this forum's reaction to the article in the
> December 2021 issue of Communications of the ACM, "Speculative Taint
> Tracking (STT): A Comprehensive Protection for Speculatively Accessed
> Data".
>
First thoughts: A bunch of new features added to various (already hard to design)
circuits that lose 7%-15% performance. This is to be compared with a HW cost
of a few-ish % that loses an invisible amount of performance by buffering state
updates until safe and forwarding from the buffers while still unsafe.
<
Academic quality.

Re: The type of Mill's belt's slots

<ssloc2$dm6$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=23079&group=comp.arch#23079

Path: i2pn2.org!i2pn.org!aioe.org!To5nvU/sTaigmVbgRJ05pQ.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: The type of Mill's belt's slots
Date: Mon, 24 Jan 2022 09:35:21 +0100
Organization: Aioe.org NNTP Server
Message-ID: <ssloc2$dm6$1@gioia.aioe.org>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<ss858d$m0a$1@dont-email.me>
<74f730f1-e138-4124-9fbb-21f388eafeb3n@googlegroups.com>
<ss9p4l$a2i$1@dont-email.me>
<34d3c262-c385-4a42-a6c1-a15c34536b28n@googlegroups.com>
<ssero6$kae$1@dont-email.me>
<6cc6b294-7b58-42a8-8751-4fd808ea034fn@googlegroups.com>
<ssfreu$ck$1@dont-email.me>
<666a6ae5-0bac-421e-81c6-8e3d752192e0n@googlegroups.com>
<2022Jan22.144235@mips.complang.tuwien.ac.at>
<effb9090-c7c1-4e64-9a71-5b4b5d444d4en@googlegroups.com>
<2022Jan23.002646@mips.complang.tuwien.ac.at>
<221708c9-f4be-4421-a680-93a3e3599bd1n@googlegroups.com>
<b7b60e9f-e043-4265-90bc-5e83a496ccc6n@googlegroups.com>
<a2c3f8cc-526b-40d0-9937-e54bb3fb833cn@googlegroups.com>
<66880e90-c285-4797-ab5d-0f3c1ec78101n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="14022"; posting-host="To5nvU/sTaigmVbgRJ05pQ.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.10.2
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Mon, 24 Jan 2022 08:35 UTC

MitchAlsup wrote:
> On Sunday, January 23, 2022 at 2:44:59 PM UTC-6, Michael S wrote:
>> On Sunday, January 23, 2022 at 9:34:45 PM UTC+2, MitchAlsup wrote:
>>> On Sunday, January 23, 2022 at 4:09:03 AM UTC-6, Michael S wrote:
>>>> On Sunday, January 23, 2022 at 1:31:58 AM UTC+2, Anton Ertl wrote:
>>>>> Michael S <already...@yahoo.com> writes:
>>>>>> I don't think that Spectre-V1 (bounds check bypass) can be solved in hardware without significant performance degradation on many important workloads.
>>>>> We have described here in January 2018 (and repeated it later) how to
>>>>> fix Spectre (all versions): Just treat microarchitectural state like
>>>>> they treat architectural state: only make it visible to the outside on
>>>>> commit, and throw it away on misprediction.
>>>>>
>>>>> I think this can be achieved with little performance degradation for
>>>>> most workloads; it will cost hardware (not huge amounts, but still)
>>>>> and some design effort.
>>>>> - anton
>>>>> --
>>>>> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
>>>>> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>
>>> <
>>>> I read these discussions. Found all proposals unconvincing.
>>>> They all appear to either underestimate performance cost of not doing load
>>>> that is based on result of speculated load
>>> <
>>> This is not <directly> the problem. It is absolutely OK to perform a LD that
>>> is under the shadow of a branch that reads data which is used as an operand
>>> to another LD (or ST). What is NOT OK is to use the loaded value in a subsequent
>>> AGEN when the LD does not get a hit in either tag or TLB or both. It is the use of
>>> "bad data" that opens the sideband window, and it is the pre-retirement updates
>>> of microarchitectural state (cache tags, cache lines, TLB PTEs, ...) that further
>>> opens the sideband channel window so it can be exploited.
> <
>> But the majority of the gain from speculation comes exactly in the situation where a "twice-speculated" load misses L1D$
> <
> No, it is not. What is important is that subsequent loads can be processed
> to the point of retirement, awaiting the missed load to finally resolve and
> allow itself and subsequents to retire.
> <
> Spectré and Meltdown have a side channel means to observe the states of
> the cache that is ONLY visible if instructions that will never retire are allowed
> to modify cache state (Meltdown) or predictor state (Spectré).
> <
> The elimination of a high precision clock also eliminates both of these exploits.
> <
> But so does deferring the microarchitectural state updates until retirement.

Nothing really bad happens if you delay such updates; we have seen
examples of this previously in the case of an x86 CPU (around the
Pentium era?) which could not update the branch predictor tables unless
you had a taken-branch-free cycle. I.e. a chain of Jcc where all were
taken including the one at the end that wrapped back up would never be
predicted correctly the first time you entered the loop:

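        or   ecx,-1          ; ecx = 0FFFFFFFFh
        add  ecx,1000001h    ; wraps to 1000000h and sets carry

top:
        jc   carry_set       ; always taken: carry stays set
        ...
carry_set:
        dec  ecx             ; dec does not modify the carry flag
        jnz  top             ; taken until ecx reaches 0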

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

fixing Spectre (was: The type of Mill's belt's slots)

<2022Jan24.133212@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=23082&group=comp.arch#23082
Path: i2pn2.org!rocksolid2!news.neodome.net!news.mixmin.net!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: fixing Spectre (was: The type of Mill's belt's slots)
Date: Mon, 24 Jan 2022 12:32:12 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 84
Message-ID: <2022Jan24.133212@mips.complang.tuwien.ac.at>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <ssfreu$ck$1@dont-email.me> <666a6ae5-0bac-421e-81c6-8e3d752192e0n@googlegroups.com> <2022Jan22.144235@mips.complang.tuwien.ac.at> <effb9090-c7c1-4e64-9a71-5b4b5d444d4en@googlegroups.com> <2022Jan23.002646@mips.complang.tuwien.ac.at> <221708c9-f4be-4421-a680-93a3e3599bd1n@googlegroups.com> <b7b60e9f-e043-4265-90bc-5e83a496ccc6n@googlegroups.com> <a2c3f8cc-526b-40d0-9937-e54bb3fb833cn@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="5799183beb6a459a66ff5e341f7aa149";
logging-data="21828"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/ZCaWD9hh/0WPh2fnx3t+S"
Cancel-Lock: sha1:0EcdJRs0ShPTJ61oUYM1YEXFZ+E=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Mon, 24 Jan 2022 12:32 UTC

Michael S <already5chosen@yahoo.com> writes:
>But majority of the gain from speculation is exactly in situation when "twice-speculated" load misses L1D$

First of all, fixing Spectre the way we suggest will not slow the
execution down in that situation.

Second, I very much doubt that these situations provide the majority
of the gain from speculation. You gain from speculative execution
every time the branch predictor predicts correctly. You gain so much
that even in-order cores like the A53 and Bonnell have it. They would
all suffer heavily without branch prediction, including on code that
does not miss the D-cache. I actually expect that such code would
suffer more from removing branch prediction, because such code
performs more branches/cycle, so making branches more expensive causes
a bigger slowdown.

There have also been claims that OoO execution gains are mainly due to
being better able to deal with cache misses. That's not the case.
Workloads with few cache misses also benefit greatly from OoO
execution:

E.g., for our LaTeX benchmark:
<http://www.complang.tuwien.ac.at/franz/latex-bench>

Here are the D-cache misses measured on a CPU with 32KB D-cache
(Cortex-A72 on a RK3399 on a Rockpro64 with Debian 9):

LC_NUMERIC=en_US.utf8 taskset -c 5 perf stat -B -e cycles -e L1-dcache-loads -e L1-dcache-load-misses -e L1-dcache-stores -e L1-dcache-store-misses latex bench >/dev/null

Performance counter stats for 'latex bench':

 2,406,381,871      cycles
 1,072,495,674      L1-dcache-loads
    15,797,929      L1-dcache-load-misses     #    1.47% of all L1-dcache hits
   639,859,039      L1-dcache-stores
     1,824,796      L1-dcache-store-misses

And measurements on a Skylake (also 32KB D-cache) with Debian 11:

LC_NUMERIC=en_US.utf8 perf stat -B -e cycles -e L1-dcache-loads -e L1-dcache-load-misses -e L1-dcache-stores -e L1-dcache-store-misses -e LLC-loads -e LLC-load-misses latex bench >/dev/null

Performance counter stats for 'latex bench':

 1,567,716,026      cycles
   887,037,687      L1-dcache-loads
    15,595,603      L1-dcache-load-misses     #    1.76% of all L1-dcache accesses
   554,281,424      L1-dcache-stores
   <not supported>  L1-dcache-store-misses
     2,988,275      LLC-loads
       252,212      LLC-load-misses           #    8.44% of all LL-cache accesses

We see a similar number of D-cache load misses on both CPUs, and a
relatively low D-cache miss rate overall. The L2 cache on the Skylake
reduces the miss rate by another factor of ~5, and the L3 cache by
another factor of ~12 (unfortunately, I have no event counters for
these things on the Cortex-A72). If we assume 10 extra cycles for an
L1 miss/L2 hit, 40 extra cycles for an L1/L2 miss/L3 hit, and 200
extra cycles for an L1/L2/L3 miss (and 0 extra cycles for store
misses), the cache misses cost
10*15595603+(40-10)*2988275+(200-40)*252212=285,958,200 cycles on the
Skylake, i.e., less than 20% of the execution time.
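
The estimate above can be rechecked with a few lines of Python. This is
only a sketch of the arithmetic: the counter values are copied from the
Skylake perf output, the three latencies are the assumed ones from the
paragraph above (not measurements), and the names are made up for
illustration.

```python
# Counter values copied from the Skylake perf output above.
CYCLES = 1_567_716_026
L1_LOAD_MISSES = 15_595_603
LLC_LOADS = 2_988_275        # treated as loads that missed L1 and L2
LLC_LOAD_MISSES = 252_212    # treated as loads that missed L1, L2 and L3

# Assumed extra latencies in cycles (assumptions from the text, not measured).
L2_HIT, L3_HIT, MEMORY = 10, 40, 200

def miss_cycles(l1_misses, llc_loads, llc_misses):
    """Extra cycles spent on load misses if none of the latency is hidden."""
    return (L2_HIT * l1_misses
            + (L3_HIT - L2_HIT) * llc_loads
            + (MEMORY - L3_HIT) * llc_misses)

extra = miss_cycles(L1_LOAD_MISSES, LLC_LOADS, LLC_LOAD_MISSES)
share = extra / CYCLES
print(f"{extra:,} extra cycles, {share:.1%} of all cycles")
```

Since the model charges every miss its full latency with no overlap, the
resulting share is a pessimistic bound under these assumptions.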

Yet when we look at the performance of 2-wide in-order vs. 2-wide OoO
machines, we see that the OoO machines are roughly twice as fast; only
a small part of this can be explained with OoO papering over cache
misses:

- Intel Atom 330, 1.6GHz, 512K L2 Zotac ION A, Debian 9 64bit           2.368
- AMD E-450 1650MHz (Lenovo Thinkpad X121e), Ubuntu 11.10 64-bit        1.216
- Celeron J1900 (Silvermont) 2416MHz (Shuttle XS35V4) Ubuntu16.10       1.052

- Odroid N2 (1896MHz Cortex A53) Ubuntu 18.04                           2.488
- Odroid N2 (1800MHz Cortex A73) Ubuntu 18.04                           1.224

The Atom 330 and Cortex-A53 are in-order, while the AMD E-450,
Silvermont, and Cortex-A73 are OoO.
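
To put numbers on "roughly twice as fast", here is a small sketch that
divides the in-order results by the OoO results. It assumes the trailing
numbers in the list are run times (lower is better); the pairing of
machines and the names in the code are chosen for illustration only, and
the ratios are raw, i.e. not normalized for the differing clock rates.

```python
# (in-order machine, time, OoO machine, time); times copied from the list
# above and assumed to be run times, so lower is better.
PAIRS = [
    ("Atom 330", 2.368, "AMD E-450", 1.216),
    ("Atom 330", 2.368, "Celeron J1900", 1.052),
    ("Cortex-A53", 2.488, "Cortex-A73", 1.224),
]

def speedup(in_order_time, ooo_time):
    """How many times faster the OoO machine finishes the benchmark."""
    return in_order_time / ooo_time

ratios = {(slow, fast): speedup(t_slow, t_fast)
          for slow, t_slow, fast, t_fast in PAIRS}

for (slow, fast), ratio in ratios.items():
    print(f"{fast} vs. {slow}: {ratio:.2f}x")
```

All three raw ratios come out between about 1.9 and 2.3.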

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Avoiding Spectre (was: The type of Mill's belt's slots)

<2022Jan24.144612@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=23083&group=comp.arch#23083
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Avoiding Spectre (was: The type of Mill's belt's slots)
Date: Mon, 24 Jan 2022 13:46:12 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 41
Message-ID: <2022Jan24.144612@mips.complang.tuwien.ac.at>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <ss9p4l$a2i$1@dont-email.me> <34d3c262-c385-4a42-a6c1-a15c34536b28n@googlegroups.com> <ssero6$kae$1@dont-email.me> <6cc6b294-7b58-42a8-8751-4fd808ea034fn@googlegroups.com> <ssfreu$ck$1@dont-email.me> <666a6ae5-0bac-421e-81c6-8e3d752192e0n@googlegroups.com> <2022Jan22.144235@mips.complang.tuwien.ac.at> <effb9090-c7c1-4e64-9a71-5b4b5d444d4en@googlegroups.com> <2022Jan23.002646@mips.complang.tuwien.ac.at> <jwvlez677i7.fsf-monnier+comp.arch@gnu.org>
Injection-Info: reader02.eternal-september.org; posting-host="5799183beb6a459a66ff5e341f7aa149";
logging-data="21828"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+3Iyy6Hz4i9pwK+81D3VBd"
Cancel-Lock: sha1:Sd4zs4sT9K6NdHA9AU3ekFf0aAU=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Mon, 24 Jan 2022 13:46 UTC

Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> We have described here in January 2018 (and repeated it later) how to
>> fix Spectre (all versions): Just treat microarchitectural state like
>> they treat architectural state: only make it visible to the outside on
>> commit, and throw it away on misprediction.
>
>While I do believe it can be done I don't think it's as easy as
>it sounds. More specifically it requires:
>- A clear definition of "microarchitectural state".

That's pretty easy: All state that's not architectural. Given that
architectural state also has to be made visible to the outside only on
commit, the end result is that no speculative state must become
visible to the outside.

>- A clear definition of "the outside".

Anything that survives a branch misprediction recovery is outside.

>I think there is a lack of experience with such a viewpoint which makes
>it hard to have a clear understanding of the real costs.

Actually there has been a lot of experience with the same viewpoint
for architectural state since the mid-1990s; we only need to extend it
to all state.

There is one other potential side channel that needs to be taken care
of: resource limitations. This has been used for non-speculative
exploits, but AFAIK not for speculative ones yet. But while we are at
it, we should be thinking about that, too. E.g., the attacker might
cause the attacked program to perform a speculative access to some
secret, and then perform a data-dependent action that can be watched
by measuring resource contention (e.g., functional unit usage in a
competing SMT thread, or cache access contention on another core).
Not sure if that can really be used for a realistic attack, but if
that side channel can be closed, there's one less worry.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: The type of Mill's belt's slots

<2022Jan24.150330@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=23084&group=comp.arch#23084
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: The type of Mill's belt's slots
Date: Mon, 24 Jan 2022 14:03:30 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 21
Message-ID: <2022Jan24.150330@mips.complang.tuwien.ac.at>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <2022Jan22.144235@mips.complang.tuwien.ac.at> <effb9090-c7c1-4e64-9a71-5b4b5d444d4en@googlegroups.com> <2022Jan23.002646@mips.complang.tuwien.ac.at> <221708c9-f4be-4421-a680-93a3e3599bd1n@googlegroups.com> <b7b60e9f-e043-4265-90bc-5e83a496ccc6n@googlegroups.com> <a2c3f8cc-526b-40d0-9937-e54bb3fb833cn@googlegroups.com> <66880e90-c285-4797-ab5d-0f3c1ec78101n@googlegroups.com> <ssloc2$dm6$1@gioia.aioe.org>
Injection-Info: reader02.eternal-september.org; posting-host="5799183beb6a459a66ff5e341f7aa149";
logging-data="21828"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19U9c5qq9g4v+7kbKad76oT"
Cancel-Lock: sha1:MfZxigwol3yMaWdD8uh29pJxRc4=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Mon, 24 Jan 2022 14:03 UTC

Terje Mathisen <terje.mathisen@tmsw.no> writes:
>MitchAlsup wrote:
>> But so does deferring the microarchitectural state updates until retirement.
>
>Nothing really bad happens if you delay such updates, we have seen
>examples of this previously in the case of an x86 CPU (around the
>Pentium era?) which could not update the branch predictor tables unless
>you had a taken-branch-free cycle. I.e. a chain of Jcc where all were
>taken including the one at the end that wrapped back up would never be
>predicted correctly the first time you entered the loop:

If you delay the update until retirement (and at that time you
actually know the outcome of the branch, so update the predictor with
correct data) rather than forever, the predictor will be updated after
the first misprediction, and all subsequent instances of that branch
will be predicted correctly (until the loop is exited).
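
A toy model makes the difference between "update at retirement" and
"never update" concrete. It is a deliberately simplified sketch, not a
model of any real CPU: a single 1-bit predictor for one loop branch, a
cold state of "not taken", and the assumption that a misprediction
squashes all younger instances so the corrected entry is in place before
the branch is fetched again. All names are hypothetical.

```python
def count_mispredictions(outcomes, update_at_retirement):
    """Count mispredictions of a 1-bit predictor for one branch.

    outcomes: actual branch directions in program order (True = taken).
    update_at_retirement: if False, the predictor entry is never written,
    which models the "cannot update" case from the quoted text.
    """
    predicted_taken = False          # cold predictor: assume not taken
    mispredictions = 0
    for taken in outcomes:
        if predicted_taken != taken:
            mispredictions += 1
        if update_at_retirement:
            predicted_taken = taken  # train with the resolved outcome
    return mispredictions

loop = [True] * 1000 + [False]       # 1000 taken iterations, then the exit

with_update = count_mispredictions(loop, update_at_retirement=True)
without_update = count_mispredictions(loop, update_at_retirement=False)
print(with_update, without_update)
```

With the update at retirement only the first iteration and the loop exit
mispredict; without any update every taken iteration mispredicts.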

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Avoiding Spectre

<jwv4k5t2ncq.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=23085&group=comp.arch#23085
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Avoiding Spectre
Date: Mon, 24 Jan 2022 10:06:35 -0500
Organization: A noiseless patient Spider
Lines: 44
Message-ID: <jwv4k5t2ncq.fsf-monnier+comp.arch@gnu.org>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<ss9p4l$a2i$1@dont-email.me>
<34d3c262-c385-4a42-a6c1-a15c34536b28n@googlegroups.com>
<ssero6$kae$1@dont-email.me>
<6cc6b294-7b58-42a8-8751-4fd808ea034fn@googlegroups.com>
<ssfreu$ck$1@dont-email.me>
<666a6ae5-0bac-421e-81c6-8e3d752192e0n@googlegroups.com>
<2022Jan22.144235@mips.complang.tuwien.ac.at>
<effb9090-c7c1-4e64-9a71-5b4b5d444d4en@googlegroups.com>
<2022Jan23.002646@mips.complang.tuwien.ac.at>
<jwvlez677i7.fsf-monnier+comp.arch@gnu.org>
<2022Jan24.144612@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="40fefeb59a1e2f066fde26797ef180e8";
logging-data="15162"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19GyYloAXTeP+AnWXFNks6z"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:GGUQ3xvfgb/DsD8fNIuRMOw3FbU=
sha1:SzMtPCwi07omKyctQ1YjQl91PGk=
 by: Stefan Monnier - Mon, 24 Jan 2022 15:06 UTC

Anton Ertl [2022-01-24 13:46:12] wrote:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>> We have described here in January 2018 (and repeated it later) how to
>>> fix Spectre (all versions): Just treat microarchitectural state like
>>> they treat architectural state: only make it visible to the outside on
>>> commit, and throw it away on misprediction.
>>
>>While I do believe it can be done I don't think it's as easy as
>>it sounds. More specifically it requires:
>>- A clear definition of "microarchitectural state".
>
> That's pretty easy: All state that's not architectural.

AFAIK that includes all the latches & flip-flops, right?

> Given that architectural state also has to be made visible to the
> outside only on commit, the end result is that no speculative state
> must become visible to the outside.

But all the buffers we may add to delay exposing the microarchitectural
state are themselves part of the microarchitectural state. I think this
recursion is well-founded, but it does make thinking about it a bit
harder for me.

>>I think there is a lack of experience with such a viewpoint which makes
>>it hard to have a clear understanding of the real costs.
>
> Actually there has been a lot of experience with the same viewpoint
> for architectural state since the mid-1990s, we only need to extend it
> to all state.

But there is not the same kind of recursion for the
non-microarchitectural state.

> There is one other potential side channel that needs to be taken care
> of: resource limitations.

Hmm... I didn't realize this was considered as a different category.
Maybe that's why you think it's easy and I think it's hard: to me side
channels via resource limitations are included in the general
Spectre problem.

Stefan

Re: The type of Mill's belt's slots

<d3c08671-8382-4f4f-9748-bd90172baa61n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23087&group=comp.arch#23087
X-Received: by 2002:ac8:5908:: with SMTP id 8mr13538836qty.655.1643045284293;
Mon, 24 Jan 2022 09:28:04 -0800 (PST)
X-Received: by 2002:a05:6820:164:: with SMTP id k4mr2915593ood.18.1643045284034;
Mon, 24 Jan 2022 09:28:04 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder1.feed.usenet.farm!feed.usenet.farm!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!feeder1.cambriumusenet.nl!feed.tweak.nl!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 24 Jan 2022 09:28:03 -0800 (PST)
In-Reply-To: <2022Jan24.150330@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:3da1:fbb1:90cb:151d;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:3da1:fbb1:90cb:151d
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <2022Jan22.144235@mips.complang.tuwien.ac.at>
<effb9090-c7c1-4e64-9a71-5b4b5d444d4en@googlegroups.com> <2022Jan23.002646@mips.complang.tuwien.ac.at>
<221708c9-f4be-4421-a680-93a3e3599bd1n@googlegroups.com> <b7b60e9f-e043-4265-90bc-5e83a496ccc6n@googlegroups.com>
<a2c3f8cc-526b-40d0-9937-e54bb3fb833cn@googlegroups.com> <66880e90-c285-4797-ab5d-0f3c1ec78101n@googlegroups.com>
<ssloc2$dm6$1@gioia.aioe.org> <2022Jan24.150330@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d3c08671-8382-4f4f-9748-bd90172baa61n@googlegroups.com>
Subject: Re: The type of Mill's belt's slots
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 24 Jan 2022 17:28:04 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Mon, 24 Jan 2022 17:28 UTC

On Monday, January 24, 2022 at 8:06:39 AM UTC-6, Anton Ertl wrote:
> Terje Mathisen <terje.m...@tmsw.no> writes:
> >MitchAlsup wrote:
> >> But so does deferring the microarchitectural state updates until retirement.
> >
> >Nothing really bad happens if you delay such updates, we have seen
> >examples of this previously in the case of an x86 CPU (around the
> >Pentium era?) which could not update the branch predictor tables unless
> >you had a taken-branch-free cycle. I.e. a chain of Jcc where all were
> >taken including the one at the end that wrapped back up would never be
> >predicted correctly the first time you entered the loop:
<
> If you delay the update until retirement (and at that time you
> actually know the outcome of the branch, so update the predictor with
> correct data) rather than forever, the branch will be updated after
> the first misprediction, and all subsequent instances of that branch
> will be predicted correctly (until the loop is exited).
<
You must have missed my statement about forwarding. You have to snoop
these deferral buffers and get the most up to date prediction--just like you
have to snoop the miss buffers for data that has not made its way into the
cache.
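
One possible reading of the forwarding remark, as a rough sketch: pending
predictor updates sit in a deferral buffer until retirement, lookups
snoop that buffer newest-first before falling back to the committed
table, and a squash throws the pending entries away. The class, its
methods, and the use of a sequence number per branch are all assumptions
made for illustration; this is not a description of any actual design.

```python
class DeferredPredictor:
    """1-bit-per-branch predictor whose updates are deferred to retirement."""

    def __init__(self):
        self.table = {}    # committed state: branch pc -> predicted taken?
        self.pending = []  # deferral buffer: (seq, pc, taken), oldest first

    def predict(self, pc):
        # Snoop the deferral buffer newest-first, like a miss-buffer lookup,
        # so the most recent (not yet retired) outcome is forwarded.
        for _, pending_pc, taken in reversed(self.pending):
            if pending_pc == pc:
                return taken
        return self.table.get(pc, False)

    def resolve(self, seq, pc, taken):
        # The branch has executed, but only the buffer is written.
        self.pending.append((seq, pc, taken))

    def retire(self, seq):
        # Commit every pending update up to and including seq.
        while self.pending and self.pending[0][0] <= seq:
            _, pc, taken = self.pending.pop(0)
            self.table[pc] = taken

    def squash(self, seq):
        # Misprediction recovery: drop updates from seq and anything younger.
        self.pending = [entry for entry in self.pending if entry[0] < seq]


p = DeferredPredictor()
p.resolve(seq=1, pc=0x40, taken=True)
forwarded = p.predict(0x40)          # served from the deferral buffer
p.squash(seq=1)                      # the branch was on a wrong path
after_squash = p.predict(0x40)       # committed table was never touched
p.resolve(seq=2, pc=0x40, taken=True)
p.retire(seq=2)
after_retire = p.predict(0x40)       # now served from the committed table
```

In this sketch nothing written before retirement survives a squash, while
later lookups on the same path still see the newest outcome.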
<
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Avoiding Spectre (was: The type of Mill's belt's slots)

<5d2e7ae0-737a-4f8d-b2c9-3443ac56505dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23088&group=comp.arch#23088
X-Received: by 2002:ac8:5d93:: with SMTP id d19mr13315562qtx.191.1643045430540;
Mon, 24 Jan 2022 09:30:30 -0800 (PST)
X-Received: by 2002:a05:6808:1987:: with SMTP id bj7mr2351726oib.37.1643045430213;
Mon, 24 Jan 2022 09:30:30 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!3.eu.feeder.erje.net!feeder.erje.net!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 24 Jan 2022 09:30:30 -0800 (PST)
In-Reply-To: <2022Jan24.144612@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:3da1:fbb1:90cb:151d;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:3da1:fbb1:90cb:151d
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <ss9p4l$a2i$1@dont-email.me>
<34d3c262-c385-4a42-a6c1-a15c34536b28n@googlegroups.com> <ssero6$kae$1@dont-email.me>
<6cc6b294-7b58-42a8-8751-4fd808ea034fn@googlegroups.com> <ssfreu$ck$1@dont-email.me>
<666a6ae5-0bac-421e-81c6-8e3d752192e0n@googlegroups.com> <2022Jan22.144235@mips.complang.tuwien.ac.at>
<effb9090-c7c1-4e64-9a71-5b4b5d444d4en@googlegroups.com> <2022Jan23.002646@mips.complang.tuwien.ac.at>
<jwvlez677i7.fsf-monnier+comp.arch@gnu.org> <2022Jan24.144612@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5d2e7ae0-737a-4f8d-b2c9-3443ac56505dn@googlegroups.com>
Subject: Re: Avoiding Spectre (was: The type of Mill's belt's slots)
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 24 Jan 2022 17:30:30 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 48
 by: MitchAlsup - Mon, 24 Jan 2022 17:30 UTC

On Monday, January 24, 2022 at 8:02:35 AM UTC-6, Anton Ertl wrote:
> Stefan Monnier <mon...@iro.umontreal.ca> writes:
> >> We have described here in January 2018 (and repeated it later) how to
> >> fix Spectre (all versions): Just treat microarchitectural state like
> >> they treat architectural state: only make it visible to the outside on
> >> commit, and throw it away on misprediction.
> >
> >While I do believe it can be done I don't think it's as easy as
> >it sounds. More specifically it requires:
> >- A clear definition of "microarchitectural state".
<
> That's pretty easy: All state that's not architectural. Given that
> architectural state also has to be made visible to the outside only on
> commit, the end result is that no speculative state must become
> visible to the outside.
<
> >- A clear definition of "the outside".
<
> Anything that survives a branch misprediction recovery is outside.
<
And any 3rd party observer (another CPU or DMA device, but these have
already been restricted to memory).
<
> >I think there is a lack of experience with such a viewpoint which makes
> >it hard to have a clear understanding of the real costs.
<
> Actually there has been a lot of experience with the same viewpoint
> for architectural state since the mid-1990s, we only need to extend it
> to all state.
<
All state that eventually becomes visible.
>
> There is one other potential side channel that needs to be taken care
> of: resource limitations. This has been used for non-speculative
> exploits, but AFAIK not for speculative ones yet. But while we are at
> it, we should be thinking about that, too. E.g., if the attacker can
> cause the attacked program to perform a speculative access to some
> secret, and then perform a data-dependent action that can be watched
> by measuring resource contention (e.g., functional unit usage in a
> competing SMT thread, or cache access contention on another core).
> Not sure if that can really be used for a realistic attack, but if
> that side channel can be closed, there's one less worry.
<
It likely can.
>
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Avoiding Spectre

<42f6ba28-4235-45b8-8a56-d09585651bffn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23089&group=comp.arch#23089
X-Received: by 2002:ad4:5dcd:: with SMTP id m13mr15789085qvh.128.1643045654289;
Mon, 24 Jan 2022 09:34:14 -0800 (PST)
X-Received: by 2002:a05:6808:ecd:: with SMTP id q13mr2409512oiv.122.1643045654054;
Mon, 24 Jan 2022 09:34:14 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!3.eu.feeder.erje.net!feeder.erje.net!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 24 Jan 2022 09:34:13 -0800 (PST)
In-Reply-To: <jwv4k5t2ncq.fsf-monnier+comp.arch@gnu.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:3da1:fbb1:90cb:151d;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:3da1:fbb1:90cb:151d
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <ss9p4l$a2i$1@dont-email.me>
<34d3c262-c385-4a42-a6c1-a15c34536b28n@googlegroups.com> <ssero6$kae$1@dont-email.me>
<6cc6b294-7b58-42a8-8751-4fd808ea034fn@googlegroups.com> <ssfreu$ck$1@dont-email.me>
<666a6ae5-0bac-421e-81c6-8e3d752192e0n@googlegroups.com> <2022Jan22.144235@mips.complang.tuwien.ac.at>
<effb9090-c7c1-4e64-9a71-5b4b5d444d4en@googlegroups.com> <2022Jan23.002646@mips.complang.tuwien.ac.at>
<jwvlez677i7.fsf-monnier+comp.arch@gnu.org> <2022Jan24.144612@mips.complang.tuwien.ac.at>
<jwv4k5t2ncq.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <42f6ba28-4235-45b8-8a56-d09585651bffn@googlegroups.com>
Subject: Re: Avoiding Spectre
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 24 Jan 2022 17:34:14 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 54
 by: MitchAlsup - Mon, 24 Jan 2022 17:34 UTC

On Monday, January 24, 2022 at 9:06:38 AM UTC-6, Stefan Monnier wrote:
> Anton Ertl [2022-01-24 13:46:12] wrote:
> > Stefan Monnier <mon...@iro.umontreal.ca> writes:
> >>> We have described here in January 2018 (and repeated it later) how to
> >>> fix Spectre (all versions): Just treat microarchitectural state like
> >>> they treat architectural state: only make it visible to the outside on
> >>> commit, and throw it away on misprediction.
> >>
> >>While I do believe it can be done I don't think it's as easy as
> >>it sounds. More specifically it requires:
> >>- A clear definition of "microarchitectural state".
> >
> > That's pretty easy: All state that's not architectural.
> AFAIK that includes all the latches & flip-flops, right?
<
Not if these never become "visible"--for example, the pipeline flip-flops
do not become visible, but many times the state they carry does. Here
it is the state, not the container holding the state, that is important.
<
Notice again: it is the state not the container that matters.
<
> > Given that architectural state also has to be made visible to the
> > outside only on commit, the end result is that no speculative state
> > must become visible to the outside.
<
> But all the buffers we may add to delay exposing the microarchitectural
> state are themselves part of the microarchitectural state. I think this
> recursion is well-founded, but it does make thinking about it a bit
> harder for me.
<
No, they are like the pipeline flip-flops holding state.
<
> >>I think there is a lack of experience with such a viewpoint which makes
> >>it hard to have a clear understanding of the real costs.
> >
> > Actually there has been a lot of experience with the same viewpoint
> > for architectural state since the mid-1990s, we only need to extend it
> > to all state.
<
> But there is not the same kind of recursion for the non-micro
> architectural state.
<
Microcoded state would have to be included; pipeline and sequencing
state is mainly what we have been talking about.
<
> > There is one other potential side channel that needs to be taken care
> > of: resource limitations.
<
> Hmm... I didn't realize this was considered as a different category.
> Maybe that's why you think it's easy and I think it's hard: to me side
> channels via resource limitations is included in the general
> Spectre problem.
>
>
> Stefan

Re: RISC-V vs. Aarch64

<ssn81h$fgu$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=23090&group=comp.arch#23090
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Mon, 24 Jan 2022 14:08:49 -0800
Organization: A noiseless patient Spider
Lines: 51
Message-ID: <ssn81h$fgu$1@dont-email.me>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com>
<srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
<srpjgj$541$1@dont-email.me>
<1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com>
<ssep4k$vke$1@dont-email.me> <ssgqj2$vef$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 24 Jan 2022 22:08:49 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="b00d9fa0b848ef284705d8a657704a95";
logging-data="15902"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/gYrwekrJbVpLcRX97N963fIrTpcRd7sk="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Cancel-Lock: sha1:BBHmp9qdWTqpGOmTIyjrh0EASMU=
In-Reply-To: <ssgqj2$vef$1@gioia.aioe.org>
Content-Language: en-US
 by: Stephen Fuld - Mon, 24 Jan 2022 22:08 UTC

On 1/22/2022 3:42 AM, Terje Mathisen wrote:
> Stephen Fuld wrote:

>> Anyway, if one had this instruction, the main loop in the code above
>> could be something like
>>
>>
>> loop:
>>      LDUB        R10,[R1+R9]
>>      CARRY         R6,IO
>>      LBF        R12,R10,R2     ;I am not sure about R2,  It should be
>> the start of the packed buffer.
>>      STD        R12,[R3+R9<<3]
>>      ADD        R9,R9,#1
>>       CMP        R11,R9,R4
>>       BLT        R11,loop
>
> That is really quite nice.

Thank you!

>>
>> For a savings of about 10 instructions in the I cache, but fewer in
>> execution (but still significant) depending upon how often the
>> instructions under the predicate are executed.
>>
>>
>> Anyway, Of course, I invite comments, criticisms, etc.  One obvious
>> drawback is that this only addresses the "decompression" side.  While
>> I briefly considered a "Store Bit Field", I discarded it as it seemed
>> too complex, and presumably would be used less frequently, as
>> compression/coding happens less frequently than decompression/decoding.
>
> Encoding is almost always far easier than decoding, since there are zero
> surprises when encoding. Yes, for codecs/compression it can be a _lot_
> of work to figure out a near-optimal encoding, but the actual conversion
> of the selected option into a bit stream is easy.

Good point, that I hadn't thought of. More reason why the hypothetical
"Store Bit Field" isn't needed.

> I.e. just like writing JSON or XML is trivial, while decoding is
> somewhere between "some work" and "very hard".

:-)

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: RISC-V vs. Aarch64

<8c5a7905-3f7d-491c-9f74-d12e39288570n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23091&group=comp.arch#23091
X-Received: by 2002:ac8:5ad0:: with SMTP id d16mr14299424qtd.557.1643064792046;
Mon, 24 Jan 2022 14:53:12 -0800 (PST)
X-Received: by 2002:a05:6808:1490:: with SMTP id e16mr3390829oiw.84.1643064791871;
Mon, 24 Jan 2022 14:53:11 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 24 Jan 2022 14:53:11 -0800 (PST)
In-Reply-To: <ssn81h$fgu$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:3da1:fbb1:90cb:151d;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:3da1:fbb1:90cb:151d
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com>
<srag0i$2ed$2@dont-email.me> <00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at> <7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com> <4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com> <570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com> <5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me> <b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com> <srpjgj$541$1@dont-email.me>
<1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com> <ssep4k$vke$1@dont-email.me>
<ssgqj2$vef$1@gioia.aioe.org> <ssn81h$fgu$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8c5a7905-3f7d-491c-9f74-d12e39288570n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 24 Jan 2022 22:53:12 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 60
 by: MitchAlsup - Mon, 24 Jan 2022 22:53 UTC

On Monday, January 24, 2022 at 4:08:52 PM UTC-6, Stephen Fuld wrote:
> On 1/22/2022 3:42 AM, Terje Mathisen wrote:
> > Stephen Fuld wrote:
>
> >> Anyway, if one had this instruction, the main loop in the code above
> >> could be something like
> >>
> >>
> >> loop:
> >> LDUB R10,[R1+R9]
> >> CARRY R6,IO
> >> LBF R12,R10,R2 ;I am not sure about R2. It should be the start of the packed buffer.
> >> STD R12,[R3+R9<<3]
> >> ADD R9,R9,#1
> >> CMP R11,R9,R4
> >> BLT R11,loop
> >
> > That is really quite nice.
>
> Thank you!
>
>
> >>
> >> For a savings of about 10 instructions in the I cache, but fewer in
> >> execution (but still significant) depending upon how often the
> >> instructions under the predicate are executed.
> >>
> >>
> >> Anyway, of course, I invite comments, criticisms, etc. One obvious
> >> drawback is that this only addresses the "decompression" side. While
> >> I briefly considered a "Store Bit Field", I discarded it as it seemed
> >> too complex, and presumably would be used less frequently, as
> >> compression/coding happens less frequently than decompression/decoding.
> >
> > Encoding is almost always far easier than decoding,
<
Encoding of bit strings is easier than decoding of bit strings.
<
Encoding of instruction sets is a lot harder than decoding of instruction sets,
because the encoding directly determines how hard decoding is (how many gates
it takes); if you don't take decoding (now and in future implementations) into
consideration, you have already shot yourself in the foot. Witness x86. Q.E.D.
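
The bit-string half of that claim can be illustrated with a minimal sketch of an encoder's output stage. Everything here is an assumption made for illustration (the `bitwriter` type, the `bw_put` name, LSB-first bit order, a zero-initialised buffer); it is not taken from any codec discussed in the thread. The writer always knows the width of the next field, so there is nothing to parse or predict:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit writer; all names are illustrative only. */
typedef struct {
    uint8_t *buf;    /* output buffer, assumed zero-initialised */
    size_t   cap;    /* capacity of buf in bytes */
    size_t   bitpos; /* position of the next free bit */
} bitwriter;

/* Append the low `len` bits of `value` (1 <= len <= 32).
   Returns 0 on success, -1 if the buffer is too small. */
static int bw_put(bitwriter *w, uint32_t value, unsigned len)
{
    assert(len >= 1 && len <= 32);
    if (w->bitpos + len > w->cap * 8)
        return -1;
    for (unsigned i = 0; i < len; i++) {
        if ((value >> i) & 1u)
            w->buf[w->bitpos >> 3] |= (uint8_t)(1u << (w->bitpos & 7));
        w->bitpos++;
    }
    return 0;
}
```

The matching reader has the harder job: it must first work out how wide the next field is (from a table, a prefix code, or earlier fields) before it can extract it.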
<
> since there are zero
> > surprises when encoding. Yes, for codecs/compression it can be a _lot_
> > of work to figure out a near-optimal encoding, but the actual conversion
> > of the selected option into a bit stream is easy.
>
> Good point, that I hadn't thought of. More reason why the hypothetical
> "Store Bit Field" isn't needed.
>
> > I.e. just like writing JSON or XML is trivial, while decoding is
> > somewhere between "some work" and "very hard".
>
> :-)
>
>
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)

Re: RISC-V vs. Aarch64

<mZUHJ.15870$mS1.14076@fx10.iad>

https://www.novabbs.com/devel/article-flat.php?id=23099&group=comp.arch#23099

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx10.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <srag0i$2ed$2@dont-email.me> <00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com> <2022Jan8.101413@mips.complang.tuwien.ac.at> <7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com> <ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com> <4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com> <2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com> <570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com> <850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com> <5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com> <srnkf0$4cb$1@dont-email.me> <b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com> <srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org> <81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com> <srpjgj$541$1@dont-email.me> <1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com> <ssep4k$vke$1@dont-email.me> <ssgqj2$vef$1@gioia.aioe.org> <ssn81h$fgu$1@dont-email.me>
In-Reply-To: <ssn81h$fgu$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 67
Message-ID: <mZUHJ.15870$mS1.14076@fx10.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Tue, 25 Jan 2022 15:48:34 UTC
Date: Tue, 25 Jan 2022 10:48:24 -0500
X-Received-Bytes: 4182
X-Original-Bytes: 4131
 by: EricP - Tue, 25 Jan 2022 15:48 UTC

Stephen Fuld wrote:
> On 1/22/2022 3:42 AM, Terje Mathisen wrote:
>> Stephen Fuld wrote:
>
>>> Anyway, if one had this instruction, the main loop in the code above
>>> could be something like
>>>
>>>
>>> loop:
>>> LDUB R10,[R1+R9]
>>> CARRY R6,IO
>>> LBF R12,R10,R2 ;I am not sure about R2. It should be the start of the packed buffer.
>>> STD R12,[R3+R9<<3]
>>> ADD R9,R9,#1
>>> CMP R11,R9,R4
>>> BLT R11,loop
>>
>> That is really quite nice.
>
> Thank you!
>
>
>>>
>>> For a savings of about 10 instructions in the I cache, but fewer in
>>> execution (but still significant) depending upon how often the
>>> instructions under the predicate are executed.
>>>
>>>
>>> Anyway, of course, I invite comments, criticisms, etc. One obvious
>>> drawback is that this only addresses the "decompression" side. While
>>> I briefly considered a "Store Bit Field", I discarded it as it seemed
>>> too complex, and presumably would be used less frequently, as
>>> compression/coding happens less frequently than decompression/decoding.
>>
>> Encoding is almost always far easier than decoding, since there are
>> zero surprises when encoding. Yes, for codecs/compression it can be a
>> _lot_ of work to figure out a near-optimal encoding, but the actual
>> conversion of the selected option into a bit stream is easy.
>
> Good point, that I hadn't thought of. More reason why the hypothetical
> "Store Bit Field" isn't needed.

High-bandwidth compression happens more these days with video conferencing.
Maybe it's a chicken & egg thing - we do more because we can do more.

Unless Bit Field Insert requires significantly more hardware than
BF Extract I don't see why one would leave it out.

BFEXT requires a data and BF specifier source regs + 1 dest reg,
so fits in the standard RISC 2R 1W port model.

BFINS requires a struct source, data source, BF specifier regs, + dest reg,
so 3R 1W ports. Also, the instruction would require 4 register specifiers
unless you allow the struct source and dest reg to be the same specifier.
That bothers some, but if you have FMA then you probably have already
crossed that Rubicon. I would rather have the functionality than stick
to a somewhat arbitrary design philosophy.

There are a bunch of bit field manipulation instructions beyond those,
useful for multimedia codec encode/decode, signal processing, and crypto:
butterfly, reverse butterfly, permute, mix.

Then there is the whole area of double-wide shifts and bit fields
to facilitate bit stream processing.
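
The operand counts above can be made concrete with a rough C model of the two register-form operations. This is only a sketch: the function names and the choice to pass the BF specifier as separate `off` and `len` arguments are assumptions for illustration, not any ISA's actual definition. Extract consumes two inputs (data, specifier), while insert also needs the old container value, which is where the third read port comes from:

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `len` bits set (1 <= len <= 64). */
static uint64_t field_mask(unsigned len)
{
    assert(len >= 1 && len <= 64);
    return len == 64 ? ~UINT64_C(0) : (UINT64_C(1) << len) - 1;
}

/* Extract: two inputs (data, specifier), one result. */
static uint64_t bf_extract(uint64_t data, unsigned off, unsigned len)
{
    assert(off < 64 && off + len <= 64);
    return (data >> off) & field_mask(len);
}

/* Insert: three inputs (container, data, specifier), one result. */
static uint64_t bf_insert(uint64_t container, uint64_t data,
                          unsigned off, unsigned len)
{
    assert(off < 64 && off + len <= 64);
    uint64_t m = field_mask(len) << off;
    return (container & ~m) | ((data << off) & m);
}
```

If the container and the result share one register specifier, the insert still reads three values but needs only three register fields in the encoding.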

Re: RISC-V vs. Aarch64

<jwv5yq7agm7.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=23101&group=comp.arch#23101

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Tue, 25 Jan 2022 12:08:48 -0500
Organization: A noiseless patient Spider
Lines: 8
Message-ID: <jwv5yq7agm7.fsf-monnier+comp.arch@gnu.org>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
<srpjgj$541$1@dont-email.me>
<1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com>
<ssep4k$vke$1@dont-email.me> <ssgqj2$vef$1@gioia.aioe.org>
<ssn81h$fgu$1@dont-email.me> <mZUHJ.15870$mS1.14076@fx10.iad>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="9c8f52ad70aef5777f5bb0cfceac8296";
logging-data="10231"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/F1Fpo3f9LjHQ5Yy9tpAoB"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:UrqXiXJwOzDfUeXIa9wF0twmu68=
sha1:6Ix7CtqoqRnpC3s/mNOQj/SfDyY=
 by: Stefan Monnier - Tue, 25 Jan 2022 17:08 UTC

> Unless Bit Field Insert requires significantly more hardware than
> BF Extract I don't see why one would leave it out.

I think we know better nowadays than to use this kind of reasoning when
deciding what to put in an ISA ;-)

Stefan

Re: RISC-V vs. Aarch64

<sspc48$p9r$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=23102&group=comp.arch#23102

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Tue, 25 Jan 2022 09:30:46 -0800
Organization: A noiseless patient Spider
Lines: 80
Message-ID: <sspc48$p9r$1@dont-email.me>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
<srpjgj$541$1@dont-email.me>
<1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com>
<ssep4k$vke$1@dont-email.me> <ssgqj2$vef$1@gioia.aioe.org>
<ssn81h$fgu$1@dont-email.me> <mZUHJ.15870$mS1.14076@fx10.iad>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 25 Jan 2022 17:30:48 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="57262e48279642b28b38b0eb5671202b";
logging-data="25915"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+oMZPMgrRp0911OzHCwvddKHYftUoTB2Q="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Cancel-Lock: sha1:wWrgBdwIZyalgl6NX5XqVZ/4fS8=
In-Reply-To: <mZUHJ.15870$mS1.14076@fx10.iad>
Content-Language: en-US
 by: Stephen Fuld - Tue, 25 Jan 2022 17:30 UTC

On 1/25/2022 7:48 AM, EricP wrote:
> Stephen Fuld wrote:
>> On 1/22/2022 3:42 AM, Terje Mathisen wrote:
>>> Stephen Fuld wrote:
>>
>>>> Anyway, if one had this instruction, the main loop in the code above
>>>> could be something like
>>>>
>>>>
>>>> loop:
>>>>      LDUB       R10,[R1+R9]
>>>>      CARRY      R6,IO
>>>>      LBF        R12,R10,R2     ;I am not sure about R2. It should be the start of the packed buffer.
>>>>      STD        R12,[R3+R9<<3]
>>>>      ADD        R9,R9,#1
>>>>      CMP        R11,R9,R4
>>>>      BLT        R11,loop
>>>
>>> That is really quite nice.
>>
>> Thank you!
>>
>>
>>>>
>>>> For a savings of about 10 instructions in the I cache, but fewer in
>>>> execution (but still significant) depending upon how often the
>>>> instructions under the predicate are executed.
>>>>
>>>>
>>>> Anyway, of course, I invite comments, criticisms, etc.  One obvious
>>>> drawback is that this only addresses the "decompression" side.
>>>> While I briefly considered a "Store Bit Field", I discarded it as it
>>>> seemed too complex, and presumably would be used less frequently, as
>>>> compression/coding happens less frequently than decompression/decoding.
>>>
>>> Encoding is almost always far easier than decoding, since there are
>>> zero surprises when encoding. Yes, for codecs/compression it can be a
>>> _lot_ of work to figure out a near-optimal encoding, but the actual
>>> conversion of the selected option into a bit stream is easy.
>>
>> Good point, that I hadn't thought of.  More reason why the
>> hypothetical "Store Bit Field" isn't needed.
>
> High-bandwidth compression happens more these days with video conferencing.
> Maybe it's a chicken & egg thing - we do more because we can do more.

Interesting. I think that is a valid counterargument. But see below.

>
> Unless Bit Field Insert requires significantly more hardware than
> BF Extract I don't see why one would leave it out.

The My 66000 ISA already includes "simple" bit field insert and extract.
The main advantage of my proposed instruction is to eliminate all the
instructions needed for bookkeeping, and for detecting and handling the
case where the extracted field crosses a doubleword boundary.
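
To show the bookkeeping in question, here is a minimal C sketch of what software has to do when a field may straddle two 64-bit words of a packed buffer. The function name, the LSB-first bit numbering and the explicit `nwords` bound are assumptions for illustration; this is not a definition of the proposed instruction's semantics:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read `len` bits (1..64) starting at absolute bit position `bitpos`
   from an array of 64-bit words, numbering bits LSB-first. */
static uint64_t load_bit_field(const uint64_t *words, size_t nwords,
                               uint64_t bitpos, unsigned len)
{
    assert(len >= 1 && len <= 64);
    size_t   idx = (size_t)(bitpos >> 6);   /* which doubleword */
    unsigned off = (unsigned)(bitpos & 63); /* bit offset inside it */
    assert(idx < nwords);

    uint64_t field = words[idx] >> off;
    if (off + len > 64) {                   /* field straddles two words */
        assert(idx + 1 < nwords);
        field |= words[idx + 1] << (64 - off); /* off >= 1 on this path */
    }
    uint64_t mask = len == 64 ? ~UINT64_C(0) : (UINT64_C(1) << len) - 1;
    return field & mask;
}
```

The index and offset arithmetic, the straddle test and the conditional second load are the kind of per-field overhead a combined load-bit-field operation would absorb.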

> There are a bunch of bit field manipulation instructions beyond those
> useful for multimedia codec encode/decode, signal processing, crypto.
> butterfly, reverse butterfly, permute, mix.

Can you tell us what they are?

> Then there is the whole area of double-wide shifts and bit fields
> to facilitate bit stream processing.

Again, already included in the MY 66000 ISA via the use of the CARRY
meta instruction.

As shown in Mitch's example code, the problem posed could already be
solved with the existing ISA. My proposed additional instruction simply
made the solution more efficient.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: RISC-V vs. Aarch64

<9533b38e-39f1-432d-9a2b-fa57412d201en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23104&group=comp.arch#23104

X-Received: by 2002:a05:620a:2915:: with SMTP id m21mr5383267qkp.374.1643134417903;
Tue, 25 Jan 2022 10:13:37 -0800 (PST)
X-Received: by 2002:a05:6808:2189:: with SMTP id be9mr1427044oib.227.1643134417591;
Tue, 25 Jan 2022 10:13:37 -0800 (PST)
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 25 Jan 2022 10:13:37 -0800 (PST)
In-Reply-To: <mZUHJ.15870$mS1.14076@fx10.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8cce:80ce:44ea:2dbd;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8cce:80ce:44ea:2dbd
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com> <2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com> <ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com> <2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com> <850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com> <srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com> <srp3n0$e0a$1@dont-email.me>
<srp4lu$hhg$1@gioia.aioe.org> <81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
<srpjgj$541$1@dont-email.me> <1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com>
<ssep4k$vke$1@dont-email.me> <ssgqj2$vef$1@gioia.aioe.org>
<ssn81h$fgu$1@dont-email.me> <mZUHJ.15870$mS1.14076@fx10.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <9533b38e-39f1-432d-9a2b-fa57412d201en@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 25 Jan 2022 18:13:37 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 34
 by: MitchAlsup - Tue, 25 Jan 2022 18:13 UTC

On Tuesday, January 25, 2022 at 9:48:39 AM UTC-6, EricP wrote:
> Stephen Fuld wrote:

> > Good point, that I hadn't thought of. More reason why the hypothetical
> > "Store Bit Field" isn't needed.
<
> High-bandwidth compression happens more these days with video conferencing.
> Maybe it's a chicken & egg thing - we do more because we can do more.
>
> Unless Bit Field Insert requires significantly more hardware than
> BF Extract I don't see why one would leave it out.
>
> BFEXT requires a data and BF specifier source regs + 1 dest reg,
> so fits in the standard RISC 2R 1W port model.
>
> BFINS requires a struct source, data source, BF specifier regs, + dest reg,
> so 3R 1W ports. Also instruction would require 4 register specifiers
> unless you allow the struct source and dest reg to be the same specifier.
<
Register INS does require what you state.
<
Memory INS does not, as the memory container is updated in situ.
Thus, memory-based EXT and INS are more like LDs and STs than
register EXT and INS.
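
A rough C model of the memory forms, for contrast with the register forms. The names `mem_bf_insert` and `mem_bf_extract` are hypothetical, and a single aligned 64-bit container is assumed; the point is only that the insert has no separate destination, since the container is read, modified and written back in place, much like a store:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory-form insert: operands are an address, the data and
   the field specifier; the container is updated in place. */
static void mem_bf_insert(uint64_t *container, uint64_t data,
                          unsigned off, unsigned len)
{
    assert(len >= 1 && off < 64 && off + len <= 64);
    uint64_t low  = len == 64 ? ~UINT64_C(0) : (UINT64_C(1) << len) - 1;
    uint64_t mask = low << off;
    *container = (*container & ~mask) | ((data << off) & mask);
}

/* Hypothetical memory-form extract: behaves like a load that returns
   only the selected field. */
static uint64_t mem_bf_extract(const uint64_t *container,
                               unsigned off, unsigned len)
{
    assert(len >= 1 && off < 64 && off + len <= 64);
    uint64_t low = len == 64 ? ~UINT64_C(0) : (UINT64_C(1) << len) - 1;
    return (*container >> off) & low;
}
```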
<
> That bothers some, but if you have FMA then you probably have already
> crossed that Rubicon. I would rather have the functionality than stick
> to a somewhat arbitrary design philosophy.
>
> There are a bunch of bit field manipulation instructions beyond those
> useful for multimedia codec encode/decode, signal processing, crypto.
> butterfly, reverse butterfly, permute, mix.
>
> Then there is the whole area of double-wide shifts and bit fields
> to facilitate bit stream processing.

Re: RISC-V vs. Aarch64

<3c348c8c-e96d-43f0-8b3e-ca0b22580566n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23105&group=comp.arch#23105

X-Received: by 2002:a37:a611:: with SMTP id p17mr3020927qke.605.1643134488458;
Tue, 25 Jan 2022 10:14:48 -0800 (PST)
X-Received: by 2002:a05:6808:130b:: with SMTP id y11mr1430282oiv.309.1643134488125;
Tue, 25 Jan 2022 10:14:48 -0800 (PST)
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 25 Jan 2022 10:14:47 -0800 (PST)
In-Reply-To: <sspc48$p9r$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8cce:80ce:44ea:2dbd;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8cce:80ce:44ea:2dbd
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com> <2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com> <ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com> <2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com> <850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com> <srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com> <srp3n0$e0a$1@dont-email.me>
<srp4lu$hhg$1@gioia.aioe.org> <81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
<srpjgj$541$1@dont-email.me> <1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com>
<ssep4k$vke$1@dont-email.me> <ssgqj2$vef$1@gioia.aioe.org>
<ssn81h$fgu$1@dont-email.me> <mZUHJ.15870$mS1.14076@fx10.iad> <sspc48$p9r$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3c348c8c-e96d-43f0-8b3e-ca0b22580566n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 25 Jan 2022 18:14:48 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 72
 by: MitchAlsup - Tue, 25 Jan 2022 18:14 UTC

On Tuesday, January 25, 2022 at 11:30:50 AM UTC-6, Stephen Fuld wrote:
> On 1/25/2022 7:48 AM, EricP wrote:
> > Stephen Fuld wrote:
> >> On 1/22/2022 3:42 AM, Terje Mathisen wrote:
> >>> Stephen Fuld wrote:
> >>
> >>>> Anyway, if one had this instruction, the main loop in the code above
> >>>> could be something like
> >>>>
> >>>>
> >>>> loop:
> >>>> LDUB R10,[R1+R9]
> >>>> CARRY R6,IO
> >>>> LBF R12,R10,R2 ;I am not sure about R2. It should be the start of the packed buffer.
> >>>> STD R12,[R3+R9<<3]
> >>>> ADD R9,R9,#1
> >>>> CMP R11,R9,R4
> >>>> BLT R11,loop
> >>>
> >>> That is really quite nice.
> >>
> >> Thank you!
> >>
> >>
> >>>>
> >>>> For a savings of about 10 instructions in the I cache, but fewer in
> >>>> execution (but still significant) depending upon how often the
> >>>> instructions under the predicate are executed.
> >>>>
> >>>>
> >>>> Anyway, of course, I invite comments, criticisms, etc. One obvious
> >>>> drawback is that this only addresses the "decompression" side.
> >>>> While I briefly considered a "Store Bit Field", I discarded it as it
> >>>> seemed too complex, and presumably would be used less frequently, as
> >>>> compression/coding happens less frequently than decompression/decoding.
> >>>
> >>> Encoding is almost always far easier than decoding, since there are
> >>> zero surprises when encoding. Yes, for codecs/compression it can be a
> >>> _lot_ of work to figure out a near-optimal encoding, but the actual
> >>> conversion of the selected option into a bit stream is easy.
> >>
> >> Good point, that I hadn't thought of. More reason why the
> >> hypothetical "Store Bit Field" isn't needed.
> >
> > High-bandwidth compression happens more these days with video conferencing.
> > Maybe it's a chicken & egg thing - we do more because we can do more.
> Interesting. I think a valid counter argument. But see below.
> >
> > Unless Bit Field Insert requires significantly more hardware than
> > BF Extract I don't see why one would leave it out.
> The MY 66000 ISA already includes "simple" bit field insert and extract.
> The main advantage of my proposed instruction is to eliminate all the
> instructions needed for bookkeeping and testing for and handling when
> the extracted field crosses double word boundaries.
> > There are a bunch of bit field manipulation instructions beyond those
> > useful for multimedia codec encode/decode, signal processing, crypto.
> > butterfly, reverse butterfly, permute, mix.
> Can you tell us what they are?
> > Then there is the whole area of double-wide shifts and bit fields
> > to facilitate bit stream processing.
<
> Again, already included in the MY 66000 ISA via the use of the CARRY
> meta instruction.
<
CARRY is an instruction-modifier, not a meta instruction.
>
> As shown in Mitch's example code, the problem posed could already be
> solved with the existing ISA. My proposed additional instruction simply
> made the solution more efficient.
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)

Re: RISC-V vs. Aarch64

<wTXHJ.25684$7U.2006@fx42.iad>

https://www.novabbs.com/devel/article-flat.php?id=23107&group=comp.arch#23107

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx42.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com> <2022Jan8.101413@mips.complang.tuwien.ac.at> <7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com> <ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com> <4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com> <2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com> <570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com> <850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com> <5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com> <srnkf0$4cb$1@dont-email.me> <b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com> <srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org> <81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com> <srpjgj$541$1@dont-email.me> <1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com> <ssep4k$vke$1@dont-email.me> <ssgqj2$vef$1@gioia.aioe.org> <ssn81h$fgu$1@dont-email.me> <mZUHJ.15870$mS1.14076@fx10.iad> <sspc48$p9r$1@dont-email.me>
In-Reply-To: <sspc48$p9r$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 39
Message-ID: <wTXHJ.25684$7U.2006@fx42.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Tue, 25 Jan 2022 19:07:08 UTC
Date: Tue, 25 Jan 2022 14:06:47 -0500
X-Received-Bytes: 2914
 by: EricP - Tue, 25 Jan 2022 19:06 UTC

Stephen Fuld wrote:
> On 1/25/2022 7:48 AM, EricP wrote:
>
>> There are a bunch of bit field manipulation instructions beyond those
>> useful for multimedia codec encode/decode, signal processing, crypto.
>> butterfly, reverse butterfly, permute, mix.
>
> Can you tell us what they are?

I don't personally need such instructions as I rarely operate on
bit fields and when I do it is not performance critical,
but I note that others feel they do need them. One example paper:

A New Basis for Shifters in General-Purpose Processors for
Existing and Advanced Bit Manipulations, 2008
https://www.researchgate.net/publication/220332176_A_New_Basis_for_Shifters_in_General-Purpose_Processors_for_Existing_and_Advanced_Bit_Manipulations

describes the following bit field operations:
- rotate right & left
- shift right & left with zero or sign fill
- bit field extract & insert
- mix select right or left subwords
- butterfly and inverse butterfly
- parallel extract and insert (scatter, gather)
- popcount
- bit dot product
- bit matrix multiply

I would add:
- find first/last bit set/clear

potential usage in:

- DNA sequencing and sequence compression, alignment
- crypto
- error correction
- bit stream signal processing, multiplexing
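
For readers unfamiliar with some of the operations listed above, here are portable C reference versions of three of them: popcount, find first set, and parallel extract (gather). These are plain software loops written for clarity, with function names chosen for illustration; they say nothing about how the paper or any particular ISA implements them in hardware:

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits in x. */
static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Index of the lowest set bit, or 64 if x is zero. */
static unsigned find_first_set(uint64_t x)
{
    if (x == 0)
        return 64;
    unsigned i = 0;
    while (!(x & 1)) {
        x >>= 1;
        i++;
    }
    return i;
}

/* Parallel extract (gather): pack the bits of x selected by mask
   into the low-order bits of the result, preserving their order. */
static uint64_t parallel_extract(uint64_t x, uint64_t mask)
{
    uint64_t out = 0;
    unsigned k = 0;
    while (mask) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest mask bit */
        if (x & lowest)
            out |= UINT64_C(1) << k;
        k++;
        mask &= mask - 1;
    }
    return out;
}
```

Each loop runs once per set bit (or per bit position), which is why single-instruction versions are attractive when these operations sit on a hot path.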

Re: RISC-V vs. Aarch64

<d1959d5f-0d22-4b43-8156-0449f43d430dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23108&group=comp.arch#23108

X-Received: by 2002:a05:6214:21ae:: with SMTP id t14mr10947795qvc.59.1643138614279;
Tue, 25 Jan 2022 11:23:34 -0800 (PST)
X-Received: by 2002:a05:6808:1a1d:: with SMTP id bk29mr1624425oib.293.1643138613924;
Tue, 25 Jan 2022 11:23:33 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.mixmin.net!newsreader4.netcologne.de!news.netcologne.de!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 25 Jan 2022 11:23:33 -0800 (PST)
In-Reply-To: <wTXHJ.25684$7U.2006@fx42.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8cce:80ce:44ea:2dbd;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8cce:80ce:44ea:2dbd
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at> <7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com> <4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com> <570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com> <5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me> <b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com> <srpjgj$541$1@dont-email.me>
<1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com> <ssep4k$vke$1@dont-email.me>
<ssgqj2$vef$1@gioia.aioe.org> <ssn81h$fgu$1@dont-email.me>
<mZUHJ.15870$mS1.14076@fx10.iad> <sspc48$p9r$1@dont-email.me> <wTXHJ.25684$7U.2006@fx42.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d1959d5f-0d22-4b43-8156-0449f43d430dn@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 25 Jan 2022 19:23:34 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 4119
 by: MitchAlsup - Tue, 25 Jan 2022 19:23 UTC

On Tuesday, January 25, 2022 at 1:07:11 PM UTC-6, EricP wrote:
> Stephen Fuld wrote:
> > On 1/25/2022 7:48 AM, EricP wrote:
> >
> >> There are a bunch of bit field manipulation instructions beyond those
> >> useful for multimedia codec encode/decode, signal processing, crypto.
> >> butterfly, reverse butterfly, permute, mix.
> >
> > Can you tell us what they are?
> I don't personally need such instructions as I rarely operate on
> bit fields and when I do it is not performance critical,
> but I note that others feel they do need them. One example paper:
>
> A New Basis for Shifters in General-Purpose Processors for
> Existing and Advanced Bit Manipulations, 2008
> https://www.researchgate.net/publication/220332176_A_New_Basis_for_Shifters_in_General-Purpose_Processors_for_Existing_and_Advanced_Bit_Manipulations
>
<
My 66000 ISA encodings:
<
> describes the following bit field operations:
> - rotate right & left
CARRY Rs,{I}
SL/SR Rd,Rs,off
> - shift right & left with zero or sign fill
SL/SR Rd,Rs,off
> - bit field extract & insert
SL Rd,Rs,<len,off>
INS Rd,Rb,<len,off>
> - mix select right or left subwords
MUX Rd,Rs1,Rs2,mask // bit-level multiplex between Rs1 and Rs2 based on mask (the third source)
> - butterfly and inverse butterfly
BITR Rd,Rs,<len,off>
> - parallel extract and insert (scatter, gather)
> - popcount
POP Rd,Rs
> - bit dot product
???
> - bit matrix multiply
BMM Rd,Rb,[Rbm]
>
> I would add:
> - find first/last bit set/clear
FF1 Rd,Rs // can be from the left or from the right.
FF1 Rd,~Rs
SET = INS Rd,#~0,<len,off>
CLR = INS Rd,#0,<len,off>
>
<
Looks like I got most of them.
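
As a reading aid for two of the entries above, here is a small C sketch of a bit-level multiplex and of a rotate. Both functions are illustrative assumptions: which source a 1 bit in the mask selects is a guess, and the rotate is the usual portable C idiom rather than a model of the CARRY-plus-shift pairing.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-level multiplex: take each result bit from a where mask is 1,
   from b where mask is 0 (the polarity is an assumption). */
static uint64_t bit_mux(uint64_t a, uint64_t b, uint64_t mask)
{
    return (a & mask) | (b & ~mask);
}

/* Rotate left by n bits; n is reduced modulo 64 so that the shift
   counts stay within the range C defines. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
}
```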
<
> potential usage in:
>
> - DNA sequencing and sequence compression, alignment
> - crypto
> - error correction
> - bit stream signal processing, multiplexing

Re: RISC-V vs. Aarch64

<f405474c-1182-452c-91d9-a2c8df69a1f3n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23109&group=comp.arch#23109

X-Received: by 2002:a05:622a:1347:: with SMTP id w7mr17511870qtk.463.1643139504982;
Tue, 25 Jan 2022 11:38:24 -0800 (PST)
X-Received: by 2002:a05:6808:178e:: with SMTP id bg14mr1747235oib.84.1643139504555;
Tue, 25 Jan 2022 11:38:24 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 25 Jan 2022 11:38:24 -0800 (PST)
In-Reply-To: <d1959d5f-0d22-4b43-8156-0449f43d430dn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8cce:80ce:44ea:2dbd;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8cce:80ce:44ea:2dbd
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at> <7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com> <4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com> <570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com> <5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me> <b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com> <srpjgj$541$1@dont-email.me>
<1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com> <ssep4k$vke$1@dont-email.me>
<ssgqj2$vef$1@gioia.aioe.org> <ssn81h$fgu$1@dont-email.me>
<mZUHJ.15870$mS1.14076@fx10.iad> <sspc48$p9r$1@dont-email.me>
<wTXHJ.25684$7U.2006@fx42.iad> <d1959d5f-0d22-4b43-8156-0449f43d430dn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f405474c-1182-452c-91d9-a2c8df69a1f3n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 25 Jan 2022 19:38:24 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 61
 by: MitchAlsup - Tue, 25 Jan 2022 19:38 UTC

On Tuesday, January 25, 2022 at 1:23:35 PM UTC-6, MitchAlsup wrote:
> On Tuesday, January 25, 2022 at 1:07:11 PM UTC-6, EricP wrote:
> > Stephen Fuld wrote:
> > > On 1/25/2022 7:48 AM, EricP wrote:
> > >
> > >> There are a bunch of bit field manipulation instructions beyond those,
> > >> useful for multimedia codec encode/decode, signal processing, crypto:
> > >> butterfly, reverse butterfly, permute, mix.
> > >
> > > Can you tell us what they are?
> > I don't personally need such instructions as I rarely operate on
> > bit fields and when I do it is not performance critical,
> > but I note that others feel they do need them. One example paper:
> >
> > A New Basis for Shifters in General-Purpose Processors for
> > Existing and Advanced Bit Manipulations, 2008
> > https://www.researchgate.net/publication/220332176_A_New_Basis_for_Shifters_in_General-Purpose_Processors_for_Existing_and_Advanced_Bit_Manipulations
> >
> <
> My 66000 ISA encodings:
> <
> > describes the following bit field operations:
> > - rotate right & left
> CARRY Rs,{I}
> SL/SR Rd,Rs,off
> > - shift right & left with zero or sign fill
> SL/SR Rd,Rs,off
> > - bit field extract & insert
> SL Rd,Rs,<len,off>
> INS Rd,Rb,<len,off>
> > - mix select right or left subwords
> MUX Rd,Rs1,Rs2,mask // bit-level multiplex between Rs1 and Rs2 based on mask
> > - butterfly and inverse butterfly
> BITR Rd,Rs,<len,off>
> > - parallel extract and insert (scatter, gather)
> > - popcount
> POP Rd,Rs
> > - bit dot product
> ???
> > - bit matrix multiply
> BMM Rd,Rb,[Rbm]
> >
> > I would add:
> > - find first/last bit set/clear
> FF1 Rd,Rs // can be from the left or from the right.
> FF1 Rd,~Rs
> SET = INS Rd,#~0,<len,off>
> CLR = INS Rd,#0,<len,off>
> >
> <
> Looks like I got most of them.
<
I should also point out that many of these were available in the MC88100,
circa 1985.
<
> <
> > potential usage in:
> >
> > - dna sequencing and sequence compression, alignment
> > - crypto
> > - error correction
> > - bit stream signal processing, multiplexing
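
For anyone who doesn't know the mnemonics, here is a minimal C sketch of what MUX, POP, FF1 and a bit reverse compute. The function names are mine, not My 66000's. Which operand the mask selects in MUX, the -1 result when no bit is found, and reversing the whole word (ignoring BITR's <len,off> field) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-level multiplex: take each bit from a where mask is 1, else from b. */
static uint64_t bit_mux(uint64_t a, uint64_t b, uint64_t mask)
{
    return (a & mask) | (b & ~mask);
}

/* Population count: clear the lowest set bit until none remain. */
static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Find first one from the right: index of the lowest set bit, -1 if none. */
static int find_first_one_lsb(uint64_t x)
{
    int i = 0;
    if (x == 0)
        return -1;
    while ((x & 1) == 0) {
        x >>= 1;
        i++;
    }
    return i;
}

/* Find first one from the left: index of the highest set bit, -1 if none. */
static int find_first_one_msb(uint64_t x)
{
    int i = -1;
    while (x != 0) {
        x >>= 1;
        i++;
    }
    return i;
}

/* Reverse all 64 bits of x. */
static uint64_t bit_reverse64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}
```

Find first clear falls out of the same routine by complementing the input, as in the FF1 Rd,~Rs line: find_first_one_lsb(~x).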

Re: RISC-V vs. Aarch64

<sspr2i$8de$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=23113&group=comp.arch#23113

 by: Ivan Godard - Tue, 25 Jan 2022 21:45 UTC

On 1/25/2022 7:48 AM, EricP wrote:
> Stephen Fuld wrote:
>> On 1/22/2022 3:42 AM, Terje Mathisen wrote:
>>> Stephen Fuld wrote:
>>
>>>> Anyway, if one had this instruction, the main loop in the code above
>>>> could be something like
>>>>
>>>>
>>>> loop:
>>>>      LDUB       R10,[R1+R9]
>>>>      CARRY      R6,IO
>>>>      LBF        R12,R10,R2     ; I am not sure about R2. It should
>>>>                                ; be the start of the packed buffer.
>>>>      STD        R12,[R3+R9<<3]
>>>>      ADD        R9,R9,#1
>>>>      CMP        R11,R9,R4
>>>>      BLT        R11,loop
>>>
>>> That is really quite nice.
>>
>> Thank you!
>>
>>
>>>>
>>>> For a savings of about 10 instructions in the I cache, but fewer in
>>>> execution (but still significant) depending upon how often the
>>>> instructions under the predicate are executed.
>>>>
>>>>
>>>> Anyway, of course, I invite comments, criticisms, etc.  One obvious
>>>> drawback is that this only addresses the "decompression" side.
>>>> While I briefly considered a "Store Bit Field", I discarded it as it
>>>> seemed too complex, and presumably would be used less frequently, as
>>>> compression/coding happens less frequently than decompression/decoding.
>>>
>>> Encoding is almost always far easier than decoding, since there are
>>> zero surprises when encoding. Yes, for codecs/compression it can be a
>>> _lot_ of work to figure out a near-optimal encoding, but the actual
>>> conversion of the selected option into a bit stream is easy.
>>
>> Good point, that I hadn't thought of.  More reason why the
>> hypothetical "Store Bit Field" isn't needed.
>
> High-bandwidth compression happens more these days with video conferencing.
> Maybe it's a chicken & egg thing - we do more because we can do more.
>
> Unless Bit Field Insert requires significantly more hardware than
> BF Extract I don't see why one would leave it out.
>
> BFEXT requires a data and BF specifier source regs + 1 dest reg,
> so fits in the standard RISC 2R 1W port model.
>
> BFINS requires a struct source, data source, BF specifier regs, + dest reg,
> so 3R 1W ports. Also instruction would require 4 register specifiers
> unless you allow the struct source and dest reg to be the same specifier.
> That bothers some, but if you have FMA then you probably have already
> crossed that Rubicon. I would rather have the functionality than stick
> to a somewhat arbitrary design philosophy.
>
> There are a bunch of bit field manipulation instructions beyond those,
> useful for multimedia codec encode/decode, signal processing, crypto:
> butterfly, reverse butterfly, permute, mix.
>
> Then there is the whole area of double-wide shifts and bit fields
> to facilitate bit stream processing.
>
>

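For reference, the quoted loop can be read roughly as the C sketch below. This is only my reading, under stated assumptions: the byte array indexed by R9 holds the field lengths, the CARRY register holds a running bit position into the packed buffer, and bits are numbered LSB-first within each byte. The names load_bit_field and unpack_fields are hypothetical, and the real LBF semantics may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read len bits (0..64) starting at absolute bit position bitpos.
 * Bit order within a byte is LSB-first (an assumption). */
static uint64_t load_bit_field(const uint8_t *packed, uint64_t bitpos,
                               unsigned len)
{
    uint64_t v = 0;
    assert(len <= 64);
    for (unsigned i = 0; i < len; i++) {
        uint64_t p = bitpos + i;
        uint64_t bit = (packed[p >> 3] >> (p & 7)) & 1u;
        v |= bit << i;
    }
    return v;
}

/* One iteration per field: fetch the length, pull that many bits from
 * the packed stream, store the result as a 64-bit word, advance.
 * Returns the total number of bits consumed. */
static uint64_t unpack_fields(const uint8_t *lens, size_t n,
                              const uint8_t *packed, uint64_t *out)
{
    uint64_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned len = lens[i];
        out[i] = load_bit_field(packed, bitpos, len);
        bitpos += len;
    }
    return bitpos;
}
```

The inner bit-at-a-time loop is what the proposed instruction would replace. In software one would normally use shifts and masks across a word boundary instead.
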
You don't offer dynamic BFINS/EXT? Dynamic needs two more registers,
unless you separately build a descriptor like Mitch does.
