Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

With all the fancy scientists in the world, why can't they just once build a nuclear balm?


devel / comp.arch / RISC-V vs. Aarch64

SubjectAuthor
* RISC-V vs. Aarch64Anton Ertl
+* Re: RISC-V vs. Aarch64MitchAlsup
|+* Re: RISC-V vs. Aarch64Anton Ertl
||`* Re: RISC-V vs. Aarch64MitchAlsup
|| +- Re: RISC-V vs. Aarch64BGB
|| `- Re: RISC-V vs. Aarch64Anton Ertl
|+* Re: RISC-V vs. Aarch64Ivan Godard
||+- Re: RISC-V vs. Aarch64robf...@gmail.com
||+- Re: RISC-V vs. Aarch64MitchAlsup
||`* Re: RISC-V vs. Aarch64Quadibloc
|| `* Re: RISC-V vs. Aarch64Quadibloc
||  `- Re: RISC-V vs. Aarch64Quadibloc
|+* Re: RISC-V vs. Aarch64Marcus
||+- Re: RISC-V vs. Aarch64BGB
||`* Re: RISC-V vs. Aarch64MitchAlsup
|| +- Re: RISC-V vs. Aarch64BGB
|| `- Re: RISC-V vs. Aarch64Ivan Godard
|`- Re: RISC-V vs. Aarch64MitchAlsup
`* Re: RISC-V vs. Aarch64BGB
 +* Re: RISC-V vs. Aarch64MitchAlsup
 |+- Re: RISC-V vs. Aarch64MitchAlsup
 |+* Re: RISC-V vs. Aarch64Thomas Koenig
 ||+* Re: RISC-V vs. Aarch64Ivan Godard
 |||`* Re: RISC-V vs. Aarch64EricP
 ||| `- Re: RISC-V vs. Aarch64Ivan Godard
 ||+* Re: RISC-V vs. Aarch64MitchAlsup
 |||`* Re: RISC-V vs. Aarch64Ivan Godard
 ||| `* Re: RISC-V vs. Aarch64MitchAlsup
 |||  `* Re: RISC-V vs. Aarch64Ivan Godard
 |||   `* Re: RISC-V vs. Aarch64MitchAlsup
 |||    `- Re: RISC-V vs. Aarch64Marcus
 ||`* Re: RISC-V vs. Aarch64BGB
 || `- Re: RISC-V vs. Aarch64MitchAlsup
 |+* Re: RISC-V vs. Aarch64BGB
 ||`* Re: RISC-V vs. Aarch64MitchAlsup
 || `- Re: RISC-V vs. Aarch64Thomas Koenig
 |`* Re: RISC-V vs. Aarch64Marcus
 | `* Re: RISC-V vs. Aarch64EricP
 |  +* Re: RISC-V vs. Aarch64Marcus
 |  |+* Re: RISC-V vs. Aarch64MitchAlsup
 |  ||+* Re: RISC-V vs. Aarch64Niklas Holsti
 |  |||+* Re: RISC-V vs. Aarch64Bill Findlay
 |  ||||`- Re: RISC-V vs. Aarch64MitchAlsup
 |  |||`- Re: RISC-V vs. Aarch64Ivan Godard
 |  ||`- Re: RISC-V vs. Aarch64Thomas Koenig
 |  |+* Re: RISC-V vs. Aarch64Thomas Koenig
 |  ||+* Re: RISC-V vs. Aarch64MitchAlsup
 |  |||`- Re: RISC-V vs. Aarch64BGB
 |  ||+* Re: RISC-V vs. Aarch64Ivan Godard
 |  |||`* Re: RISC-V vs. Aarch64Thomas Koenig
 |  ||| `- Re: RISC-V vs. Aarch64Ivan Godard
 |  ||`* Re: RISC-V vs. Aarch64Marcus
 |  || +* Re: RISC-V vs. Aarch64Thomas Koenig
 |  || |`* Re: RISC-V vs. Aarch64aph
 |  || | +- Re: RISC-V vs. Aarch64Michael S
 |  || | `* Re: RISC-V vs. Aarch64Thomas Koenig
 |  || |  `* Re: RISC-V vs. Aarch64robf...@gmail.com
 |  || |   +* Re: RISC-V vs. Aarch64Ivan Godard
 |  || |   |`- Re: RISC-V vs. Aarch64Tim Rentsch
 |  || |   `* Re: RISC-V vs. Aarch64Terje Mathisen
 |  || |    `* Re: RISC-V vs. Aarch64Thomas Koenig
 |  || |     `* Re: RISC-V vs. Aarch64Marcus
 |  || |      `* Re: RISC-V vs. Aarch64Guillaume
 |  || |       `* Re: RISC-V vs. Aarch64MitchAlsup
 |  || |        +- Re: RISC-V vs. Aarch64Marcus
 |  || |        +* Re: RISC-V vs. Aarch64Ivan Godard
 |  || |        |`* Re: RISC-V vs. Aarch64MitchAlsup
 |  || |        | `* Re: RISC-V vs. Aarch64Ivan Godard
 |  || |        |  `* Re: RISC-V vs. Aarch64Thomas Koenig
 |  || |        |   `* Re: RISC-V vs. Aarch64Ivan Godard
 |  || |        |    `* Re: RISC-V vs. Aarch64EricP
 |  || |        |     +* Re: RISC-V vs. Aarch64MitchAlsup
 |  || |        |     |`* Re: RISC-V vs. Aarch64EricP
 |  || |        |     | `- Re: RISC-V vs. Aarch64MitchAlsup
 |  || |        |     `* Re: RISC-V vs. Aarch64Ivan Godard
 |  || |        |      `* Re: RISC-V vs. Aarch64EricP
 |  || |        |       +- Re: RISC-V vs. Aarch64MitchAlsup
 |  || |        |       `* Re: RISC-V vs. Aarch64Ivan Godard
 |  || |        |        +* Re: RISC-V vs. Aarch64Brett
 |  || |        |        |+* Re: RISC-V vs. Aarch64MitchAlsup
 |  || |        |        ||`- Re: RISC-V vs. Aarch64Ivan Godard
 |  || |        |        |`- Re: RISC-V vs. Aarch64Ivan Godard
 |  || |        |        `* Re: RISC-V vs. Aarch64Stephen Fuld
 |  || |        |         `* Re: RISC-V vs. Aarch64Ivan Godard
 |  || |        |          +* Re: RISC-V vs. Aarch64Stefan Monnier
 |  || |        |          |`- Re: RISC-V vs. Aarch64Ivan Godard
 |  || |        |          +* Re: RISC-V vs. Aarch64MitchAlsup
 |  || |        |          |`* Re: RISC-V vs. Aarch64Ivan Godard
 |  || |        |          | `- Re: RISC-V vs. Aarch64MitchAlsup
 |  || |        |          +* Re: RISC-V vs. Aarch64Stephen Fuld
 |  || |        |          |`- Re: RISC-V vs. Aarch64Ivan Godard
 |  || |        |          `* Re: RISC-V vs. Aarch64EricP
 |  || |        |           +* Re: RISC-V vs. Aarch64EricP
 |  || |        |           |`* Re: RISC-V vs. Aarch64Ivan Godard
 |  || |        |           | `* The type of Mill's belt's slotsStefan Monnier
 |  || |        |           |  +- Re: The type of Mill's belt's slotsMitchAlsup
 |  || |        |           |  `* Re: The type of Mill's belt's slotsIvan Godard
 |  || |        |           |   `* Re: The type of Mill's belt's slotsStefan Monnier
 |  || |        |           |    `* Re: The type of Mill's belt's slotsIvan Godard
 |  || |        |           |     +* Re: The type of Mill's belt's slotsStefan Monnier
 |  || |        |           |     |`* Re: The type of Mill's belt's slotsIvan Godard
 |  || |        |           |     `* Re: The type of Mill's belt's slotsMitchAlsup
 |  || |        |           `- Re: RISC-V vs. Aarch64Ivan Godard
 |  || |        +* Re: RISC-V vs. Aarch64Guillaume
 |  || |        `* Re: RISC-V vs. Aarch64Quadibloc
 |  || `* MRISC32 vectorization (was: RISC-V vs. Aarch64)Thomas Koenig
 |  |`* Re: RISC-V vs. Aarch64Terje Mathisen
 |  `- Re: RISC-V vs. Aarch64Quadibloc
 +* Re: RISC-V vs. Aarch64Anton Ertl
 `- Re: RISC-V vs. Aarch64aph

Pages:123456789101112131415
Re: RISC-V vs. Aarch64

<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22946&group=comp.arch#22946

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:6214:230d:: with SMTP id gc13mr4471038qvb.68.1642089240664;
Thu, 13 Jan 2022 07:54:00 -0800 (PST)
X-Received: by 2002:a9d:6289:: with SMTP id x9mr3586130otk.243.1642089240379;
Thu, 13 Jan 2022 07:54:00 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 13 Jan 2022 07:54:00 -0800 (PST)
In-Reply-To: <srp4lu$hhg$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:7c1c:e150:9e08:4b91;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:7c1c:e150:9e08:4b91
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <sql2cm$3h7$1@dont-email.me>
<sqmsqq$14kp$1@gioia.aioe.org> <VSFzJ.136700$7D4.47834@fx37.iad>
<2021Dec31.203710@mips.complang.tuwien.ac.at> <KC_zJ.59028$Ak2.12921@fx20.iad>
<86h7agvxun.fsf@linuxsc.com> <M4_BJ.140002$lz3.547@fx34.iad>
<f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com> <srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com> <2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com> <ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com> <2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com> <850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com> <srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com> <srp3n0$e0a$1@dont-email.me>
<srp4lu$hhg$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 13 Jan 2022 15:54:00 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 104
 by: MitchAlsup - Thu, 13 Jan 2022 15:54 UTC

On Thursday, January 13, 2022 at 6:07:29 AM UTC-6, Terje Mathisen wrote:
> Ivan Godard wrote:
> > On 1/12/2022 2:46 PM, MitchAlsup wrote:
> >> On Wednesday, January 12, 2022 at 4:24:36 PM UTC-6, Ivan Godard wrote:
> >>> On 1/12/2022 1:24 PM, MitchAlsup wrote:
> >>>> On Monday, January 10, 2022 at 11:36:01 AM UTC-6, Quadibloc wrote:
> >>>>> On Saturday, January 8, 2022 at 6:16:14 PM UTC-7, MitchAlsup wrote:
> >>>>>
> >>>>>> So, what does (13 << -7) mean ?
> >>>>> From the replies, I see that this is a completely different
> >>>>> can of worms. I would tend to favor the VAX interpretation,
> >>>>> but it's unclear to me that it would be worth the extra
> >>>>> run-time overhead that it might imply.
> >>>> <
> >>>> The VAX interpretation does not allow for shifts to be used as
> >>>> bit-manipulation instructions (extract signed and unsigned).
> >> <
> >>> That turns into a rotate and a shft right. But isn't that what a
> >>> hardware EXTR is going to do anyway?
> >> <
> >> Technically an extract is:
> >> <
> >> r = ((c << (containerWidth - fieldWidth - offset)) >>
> >> (containerWidth-fieldWidth));
> >> <
> >> However, what we do in HW is:
> >> <
> >> m = tableM[fieldWidth]; // mask m:: often done
> >> with table
> >> s = ~0u << (fieldWidth+offset); // sign extension bits ::
> >> often done with table
> >> t = (c >> offset) & m | ( signed || (c & ( 1 <<
> >> (fieldWidth+offset)) ? s : 0);
> >> <
> >> as this uses 1 shifter and either tables or greater-than decoders.
> >> <
> >> But the part I was hinting at is that we have a large container for
> >> this shift
> >> count and we need only a few bits on one end. So, in My 66000, the
> >> upper ½
> >> of the container is allowed to contain a value 0..64 (where both 0 and 64
> >> mean 64-bit field width).
> >> <
> >> Done this way, shifts are degenerate subsets of EXT.
> >> <
> >> If you use the sign bit<63> to control shift direction, you probably
> >> should not
> >> be using bits<37:32> as the field width because pasting the fieldWidth
> >> into
> >> such a container is simply harder.
> >> <
> >> I learned the hard way about placing both containers too close together
> >> in the Mc 88100.
> >
> > Hmm. So the container is a field descriptor. But it's not actually a
> > bitRow, because it can't cross word boundaries - more like a PDP-10 PLT
> > IIRC. Gives you some encoding entropy, but not actually any new
> > semantic, and you have to build the descriptor, which you avoid when
> > offset and width are separate arguments.
> >
> > Where ISAs really fall down is parsing a bit stream: grab a dynamic
> > number of bits off the front of a bit stream, advancing the stream; word
> > boundaries are not significant. The problem is that HW provides word
> > streams (loop/load) and mapping that to a bit stream is nasty. The logic
> > is the same as mapping a line stream into a (VL) instruction stream in
> > the decoder's instruction buffer, but how to represent that in an ISA?
> We did use to have something like that back in the PDP days with the
> variable-sized load byte opcode that would automatically step forward to
> the next work if needed.
>
> Today we would need a separate hw mechanism for that bitstream buffer,
> and probably a filling mechanism which either would need to be
> hardware-only or possibly a FILL_IF_ROOM target,offs,[src] opcode that
> would load the next (8/16/32 bits into the target register at offs bit
> offset, updating the OFFS reg and SRC pointer, but only if there was
> room, otherwise it is a NOP. The main problem is the need to update all
> three register operands. :-(
>
> Having it would however make it far easier to handle arbitrary bit
> streams, avoiding the current need for branchy code.
<
What is wrong with having a container 2× as large as the largest bit-field
then use a shift-double to position the bits at an appropriate
place for extraction and then decoding/encoding.
<
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: RISC-V vs. Aarch64

<srpjgj$541$1@dont-email.me>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22947&group=comp.arch#22947

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Thu, 13 Jan 2022 08:20:35 -0800
Organization: A noiseless patient Spider
Lines: 97
Message-ID: <srpjgj$541$1@dont-email.me>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<2021Dec31.203710@mips.complang.tuwien.ac.at>
<KC_zJ.59028$Ak2.12921@fx20.iad> <86h7agvxun.fsf@linuxsc.com>
<M4_BJ.140002$lz3.547@fx34.iad>
<f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com>
<srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 13 Jan 2022 16:20:35 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="6046e23aa8f48c74d77f2fb73876c0cc";
logging-data="5249"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+NSGoSmZPTEa72pXwv4e7X"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Cancel-Lock: sha1:kkMO6ZaDd11eqkJM4mvTqnCe4ec=
In-Reply-To: <81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Thu, 13 Jan 2022 16:20 UTC

On 1/13/2022 7:54 AM, MitchAlsup wrote:
> On Thursday, January 13, 2022 at 6:07:29 AM UTC-6, Terje Mathisen wrote:
>> Ivan Godard wrote:
>>> On 1/12/2022 2:46 PM, MitchAlsup wrote:
>>>> On Wednesday, January 12, 2022 at 4:24:36 PM UTC-6, Ivan Godard wrote:
>>>>> On 1/12/2022 1:24 PM, MitchAlsup wrote:
>>>>>> On Monday, January 10, 2022 at 11:36:01 AM UTC-6, Quadibloc wrote:
>>>>>>> On Saturday, January 8, 2022 at 6:16:14 PM UTC-7, MitchAlsup wrote:
>>>>>>>
>>>>>>>> So, what does (13 << -7) mean ?
>>>>>>> From the replies, I see that this is a completely different
>>>>>>> can of worms. I would tend to favor the VAX interpretation,
>>>>>>> but it's unclear to me that it would be worth the extra
>>>>>>> run-time overhead that it might imply.
>>>>>> <
>>>>>> The VAX interpretation does not allow for shifts to be used as
>>>>>> bit-manipulation instructions (extract signed and unsigned).
>>>> <
>>>>> That turns into a rotate and a shft right. But isn't that what a
>>>>> hardware EXTR is going to do anyway?
>>>> <
>>>> Technically an extract is:
>>>> <
>>>> r = ((c << (containerWidth - fieldWidth - offset)) >>
>>>> (containerWidth-fieldWidth));
>>>> <
>>>> However, what we do in HW is:
>>>> <
>>>> m = tableM[fieldWidth]; // mask m:: often done
>>>> with table
>>>> s = ~0u << (fieldWidth+offset); // sign extension bits ::
>>>> often done with table
>>>> t = (c >> offset) & m | ( signed || (c & ( 1 <<
>>>> (fieldWidth+offset)) ? s : 0);
>>>> <
>>>> as this uses 1 shifter and either tables or greater-than decoders.
>>>> <
>>>> But the part I was hinting at is that we have a large container for
>>>> this shift
>>>> count and we need only a few bits on one end. So, in My 66000, the
>>>> upper ½
>>>> of the container is allowed to contain a value 0..64 (where both 0 and 64
>>>> mean 64-bit field width).
>>>> <
>>>> Done this way, shifts are degenerate subsets of EXT.
>>>> <
>>>> If you use the sign bit<63> to control shift direction, you probably
>>>> should not
>>>> be using bits<37:32> as the field width because pasting the fieldWidth
>>>> into
>>>> such a container is simply harder.
>>>> <
>>>> I learned the hard way about placing both containers too close together
>>>> in the Mc 88100.
>>>
>>> Hmm. So the container is a field descriptor. But it's not actually a
>>> bitRow, because it can't cross word boundaries - more like a PDP-10 PLT
>>> IIRC. Gives you some encoding entropy, but not actually any new
>>> semantic, and you have to build the descriptor, which you avoid when
>>> offset and width are separate arguments.
>>>
>>> Where ISAs really fall down is parsing a bit stream: grab a dynamic
>>> number of bits off the front of a bit stream, advancing the stream; word
>>> boundaries are not significant. The problem is that HW provides word
>>> streams (loop/load) and mapping that to a bit stream is nasty. The logic
>>> is the same as mapping a line stream into a (VL) instruction stream in
>>> the decoder's instruction buffer, but how to represent that in an ISA?
>> We did use to have something like that back in the PDP days with the
>> variable-sized load byte opcode that would automatically step forward to
>> the next work if needed.
>>
>> Today we would need a separate hw mechanism for that bitstream buffer,
>> and probably a filling mechanism which either would need to be
>> hardware-only or possibly a FILL_IF_ROOM target,offs,[src] opcode that
>> would load the next (8/16/32 bits into the target register at offs bit
>> offset, updating the OFFS reg and SRC pointer, but only if there was
>> room, otherwise it is a NOP. The main problem is the need to update all
>> three register operands. :-(
>>
>> Having it would however make it far easier to handle arbitrary bit
>> streams, avoiding the current need for branchy code.
> <
> What is wrong with having a container 2× as large as the largest bit-field
> then use a shift-double to position the bits at an appropriate
> place for extraction and then decoding/encoding.

What's wrong is the need to conditionally refill and then merge the
loaded value into the shift pair. The condition is totally
unpredictable, so you get a miss every <word size>/<average request
size> cycles. We have predicated loads and isomorphic shifts, but have
only enough slots to do a reasonable job on a Gold.

Try it on my66: input is an array of request bit-sizes, an input array
of bits, an output array of words, and a count. Unpack the consecutive
bit-size fields from the bits into the words.

Have fun.

Re: RISC-V vs. Aarch64

<1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22949&group=comp.arch#22949

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:622a:506:: with SMTP id l6mr4563374qtx.61.1642096423860;
Thu, 13 Jan 2022 09:53:43 -0800 (PST)
X-Received: by 2002:a4a:270d:: with SMTP id l13mr3664902oof.5.1642096423562;
Thu, 13 Jan 2022 09:53:43 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 13 Jan 2022 09:53:43 -0800 (PST)
In-Reply-To: <srpjgj$541$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:7c1c:e150:9e08:4b91;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:7c1c:e150:9e08:4b91
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <2021Dec31.203710@mips.complang.tuwien.ac.at>
<KC_zJ.59028$Ak2.12921@fx20.iad> <86h7agvxun.fsf@linuxsc.com>
<M4_BJ.140002$lz3.547@fx34.iad> <f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com>
<srag0i$2ed$2@dont-email.me> <00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at> <7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com> <4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com> <570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com> <5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me> <b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com> <srpjgj$541$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 13 Jan 2022 17:53:43 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 184
 by: MitchAlsup - Thu, 13 Jan 2022 17:53 UTC

On Thursday, January 13, 2022 at 10:20:39 AM UTC-6, Ivan Godard wrote:
> On 1/13/2022 7:54 AM, MitchAlsup wrote:
> > On Thursday, January 13, 2022 at 6:07:29 AM UTC-6, Terje Mathisen wrote:
> >> Ivan Godard wrote:
> >>> On 1/12/2022 2:46 PM, MitchAlsup wrote:
> >>>> On Wednesday, January 12, 2022 at 4:24:36 PM UTC-6, Ivan Godard wrote:
> >>>>> On 1/12/2022 1:24 PM, MitchAlsup wrote:
> >>>>>> On Monday, January 10, 2022 at 11:36:01 AM UTC-6, Quadibloc wrote:
> >>>>>>> On Saturday, January 8, 2022 at 6:16:14 PM UTC-7, MitchAlsup wrote:
> >>>>>>>
> >>>>>>>> So, what does (13 << -7) mean ?
> >>>>>>> From the replies, I see that this is a completely different
> >>>>>>> can of worms. I would tend to favor the VAX interpretation,
> >>>>>>> but it's unclear to me that it would be worth the extra
> >>>>>>> run-time overhead that it might imply.
> >>>>>> <
> >>>>>> The VAX interpretation does not allow for shifts to be used as
> >>>>>> bit-manipulation instructions (extract signed and unsigned).
> >>>> <
> >>>>> That turns into a rotate and a shft right. But isn't that what a
> >>>>> hardware EXTR is going to do anyway?
> >>>> <
> >>>> Technically an extract is:
> >>>> <
> >>>> r = ((c << (containerWidth - fieldWidth - offset)) >>
> >>>> (containerWidth-fieldWidth));
> >>>> <
> >>>> However, what we do in HW is:
> >>>> <
> >>>> m = tableM[fieldWidth]; // mask m:: often done
> >>>> with table
> >>>> s = ~0u << (fieldWidth+offset); // sign extension bits ::
> >>>> often done with table
> >>>> t = (c >> offset) & m | ( signed || (c & ( 1 <<
> >>>> (fieldWidth+offset)) ? s : 0);
> >>>> <
> >>>> as this uses 1 shifter and either tables or greater-than decoders.
> >>>> <
> >>>> But the part I was hinting at is that we have a large container for
> >>>> this shift
> >>>> count and we need only a few bits on one end. So, in My 66000, the
> >>>> upper ½
> >>>> of the container is allowed to contain a value 0..64 (where both 0 and 64
> >>>> mean 64-bit field width).
> >>>> <
> >>>> Done this way, shifts are degenerate subsets of EXT.
> >>>> <
> >>>> If you use the sign bit<63> to control shift direction, you probably
> >>>> should not
> >>>> be using bits<37:32> as the field width because pasting the fieldWidth
> >>>> into
> >>>> such a container is simply harder.
> >>>> <
> >>>> I learned the hard way about placing both containers too close together
> >>>> in the Mc 88100.
> >>>
> >>> Hmm. So the container is a field descriptor. But it's not actually a
> >>> bitRow, because it can't cross word boundaries - more like a PDP-10 PLT
> >>> IIRC. Gives you some encoding entropy, but not actually any new
> >>> semantic, and you have to build the descriptor, which you avoid when
> >>> offset and width are separate arguments.
> >>>
> >>> Where ISAs really fall down is parsing a bit stream: grab a dynamic
> >>> number of bits off the front of a bit stream, advancing the stream; word
> >>> boundaries are not significant. The problem is that HW provides word
> >>> streams (loop/load) and mapping that to a bit stream is nasty. The logic
> >>> is the same as mapping a line stream into a (VL) instruction stream in
> >>> the decoder's instruction buffer, but how to represent that in an ISA?
> >> We did use to have something like that back in the PDP days with the
> >> variable-sized load byte opcode that would automatically step forward to
> >> the next work if needed.
> >>
> >> Today we would need a separate hw mechanism for that bitstream buffer,
> >> and probably a filling mechanism which either would need to be
> >> hardware-only or possibly a FILL_IF_ROOM target,offs,[src] opcode that
> >> would load the next (8/16/32 bits into the target register at offs bit
> >> offset, updating the OFFS reg and SRC pointer, but only if there was
> >> room, otherwise it is a NOP. The main problem is the need to update all
> >> three register operands. :-(
> >>
> >> Having it would however make it far easier to handle arbitrary bit
> >> streams, avoiding the current need for branchy code.
> > <
> > What is wrong with having a container 2× as large as the largest bit-field
> > then use a shift-double to position the bits at an appropriate
> > place for extraction and then decoding/encoding.
> What's wrong is the need to conditionally refill and then merge the
> loaded value into the shift pair. The condition is totally
> unpredictable, so you get a miss every <word size>/<average request
> size> cycles. We have predicated loads and isomorphic shifts, but have
> only enough slots to do a reasonable job on a Gold.
>
> Try it on my66: input is an array of request bit-sizes, an input array
> of bits, an output array of words, and a count. Unpack the consecutive
> bit-size fields from the bits into the words.
>
> Have fun.
<
I came up with this in 5 minutes::
This assumes the input bit-length selector is an vector of characters and that the
chars contain values from {1..64}
<
void unpack( uchar_t size[], uint64_t packed[], uint64_t unpacked[], uint64_t count )
{ uint64_t len,
bit=0,
word=0,
extract,
container1 = packed[0],
container2 = packed[1];

for( unsigned int i = 0; i < count; i++ )
{
len = size[i];
bit += len;
extract = ( len << 32 ) | ( bit & 0x3F );
if( word != bit >> 6 )
{
container1 = container2;
container2 = packed[++word];
}
unpacked[i] = {container2, container1} >> extract;
}
} <
This translates into pretty nice My 66000 ISA:
<
ENTRY unpack
unpack:
MOV R5,#0
MOV R6,#0
LDD R7,[R2]
LDD R8,[R2+8]
MOV R9,#0
loop:
LDUB R10,[R1+R9]
ADD R5,R5,R10
AND R11,R5,#63
SL R12,R10,#32
OR R11,R11,R12
SR R12,R6,#6
CMP R11,R6,R12
PEQ R11,{111}
ADD R6,R6,#1
MOV R7,R8
LDD R8,[R2+R6<<3]
CARRY R8,{{I}}
SL R12,R7,R11
STD R12,[R3+R9<<3]
ADD R9,R9,#1
CMP R11,R9,R4
BLT R11,loop
RET
<
Well at least straightforwardly.

Re: RISC-V vs. Aarch64

<srprfc$8ar$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22950&group=comp.arch#22950

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!To5nvU/sTaigmVbgRJ05pQ.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Thu, 13 Jan 2022 19:36:31 +0100
Organization: Aioe.org NNTP Server
Message-ID: <srprfc$8ar$1@gioia.aioe.org>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<2021Dec31.203710@mips.complang.tuwien.ac.at>
<KC_zJ.59028$Ak2.12921@fx20.iad> <86h7agvxun.fsf@linuxsc.com>
<M4_BJ.140002$lz3.547@fx34.iad>
<f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com>
<srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="8539"; posting-host="To5nvU/sTaigmVbgRJ05pQ.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.10.2
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Thu, 13 Jan 2022 18:36 UTC

MitchAlsup wrote:
> On Thursday, January 13, 2022 at 6:07:29 AM UTC-6, Terje Mathisen wrote:
>> Ivan Godard wrote:
>>> On 1/12/2022 2:46 PM, MitchAlsup wrote:
>>>> On Wednesday, January 12, 2022 at 4:24:36 PM UTC-6, Ivan Godard wrote:
>>>>> On 1/12/2022 1:24 PM, MitchAlsup wrote:
>>>>>> On Monday, January 10, 2022 at 11:36:01 AM UTC-6, Quadibloc wrote:
>>>>>>> On Saturday, January 8, 2022 at 6:16:14 PM UTC-7, MitchAlsup wrote:
>>>>>>>
>>>>>>>> So, what does (13 << -7) mean ?
>>>>>>> From the replies, I see that this is a completely different
>>>>>>> can of worms. I would tend to favor the VAX interpretation,
>>>>>>> but it's unclear to me that it would be worth the extra
>>>>>>> run-time overhead that it might imply.
>>>>>> <
>>>>>> The VAX interpretation does not allow for shifts to be used as
>>>>>> bit-manipulation instructions (extract signed and unsigned).
>>>> <
>>>>> That turns into a rotate and a shft right. But isn't that what a
>>>>> hardware EXTR is going to do anyway?
>>>> <
>>>> Technically an extract is:
>>>> <
>>>> r = ((c << (containerWidth - fieldWidth - offset)) >>
>>>> (containerWidth-fieldWidth));
>>>> <
>>>> However, what we do in HW is:
>>>> <
>>>> m = tableM[fieldWidth]; // mask m:: often done
>>>> with table
>>>> s = ~0u << (fieldWidth+offset); // sign extension bits ::
>>>> often done with table
>>>> t = (c >> offset) & m | ( signed || (c & ( 1 <<
>>>> (fieldWidth+offset)) ? s : 0);
>>>> <
>>>> as this uses 1 shifter and either tables or greater-than decoders.
>>>> <
>>>> But the part I was hinting at is that we have a large container for
>>>> this shift
>>>> count and we need only a few bits on one end. So, in My 66000, the
>>>> upper ½
>>>> of the container is allowed to contain a value 0..64 (where both 0 and 64
>>>> mean 64-bit field width).
>>>> <
>>>> Done this way, shifts are degenerate subsets of EXT.
>>>> <
>>>> If you use the sign bit<63> to control shift direction, you probably
>>>> should not
>>>> be using bits<37:32> as the field width because pasting the fieldWidth
>>>> into
>>>> such a container is simply harder.
>>>> <
>>>> I learned the hard way about placing both containers too close together
>>>> in the Mc 88100.
>>>
>>> Hmm. So the container is a field descriptor. But it's not actually a
>>> bitRow, because it can't cross word boundaries - more like a PDP-10 PLT
>>> IIRC. Gives you some encoding entropy, but not actually any new
>>> semantic, and you have to build the descriptor, which you avoid when
>>> offset and width are separate arguments.
>>>
>>> Where ISAs really fall down is parsing a bit stream: grab a dynamic
>>> number of bits off the front of a bit stream, advancing the stream; word
>>> boundaries are not significant. The problem is that HW provides word
>>> streams (loop/load) and mapping that to a bit stream is nasty. The logic
>>> is the same as mapping a line stream into a (VL) instruction stream in
>>> the decoder's instruction buffer, but how to represent that in an ISA?
>> We did use to have something like that back in the PDP days with the
>> variable-sized load byte opcode that would automatically step forward to
>> the next work if needed.
>>
>> Today we would need a separate hw mechanism for that bitstream buffer,
>> and probably a filling mechanism which either would need to be
>> hardware-only or possibly a FILL_IF_ROOM target,offs,[src] opcode that
>> would load the next (8/16/32 bits into the target register at offs bit
>> offset, updating the OFFS reg and SRC pointer, but only if there was
>> room, otherwise it is a NOP. The main problem is the need to update all
>> three register operands. :-(
>>
>> Having it would however make it far easier to handle arbitrary bit
>> streams, avoiding the current need for branchy code.
> <
> What is wrong with having a container 2× as large as the largest bit-field
> then use a shift-double to position the bits at an appropriate
> place for extraction and then decoding/encoding.

This is indeed the most obvious solution, the only reason I haven't used
it generally is because on some x86 cores, SHRD/SHLD (plus SHR/SHL of
the top half) have been 4x slower than just SHR/SHL, and that made it
faster to use a single reg and a branchful (or even branchless) update.

Make the shifter double-wide by default (as you already need for FMAC),
and that problem goes away.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: RISC-V vs. Aarch64

<srpuc5$1klu$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22953&group=comp.arch#22953

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!To5nvU/sTaigmVbgRJ05pQ.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Thu, 13 Jan 2022 20:25:59 +0100
Organization: Aioe.org NNTP Server
Message-ID: <srpuc5$1klu$1@gioia.aioe.org>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<86h7agvxun.fsf@linuxsc.com> <M4_BJ.140002$lz3.547@fx34.iad>
<f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com>
<srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
<srpjgj$541$1@dont-email.me>
<1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="53950"; posting-host="To5nvU/sTaigmVbgRJ05pQ.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.10.2
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Thu, 13 Jan 2022 19:25 UTC

MitchAlsup wrote:
> On Thursday, January 13, 2022 at 10:20:39 AM UTC-6, Ivan Godard wrote:
>> On 1/13/2022 7:54 AM, MitchAlsup wrote:
>>> On Thursday, January 13, 2022 at 6:07:29 AM UTC-6, Terje Mathisen wrote:
>>>> Ivan Godard wrote:
>>>>> On 1/12/2022 2:46 PM, MitchAlsup wrote:
>>>>>> On Wednesday, January 12, 2022 at 4:24:36 PM UTC-6, Ivan Godard wrote:
>>>>>>> On 1/12/2022 1:24 PM, MitchAlsup wrote:
>>>>>>>> On Monday, January 10, 2022 at 11:36:01 AM UTC-6, Quadibloc wrote:
>>>>>>>>> On Saturday, January 8, 2022 at 6:16:14 PM UTC-7, MitchAlsup wrote:
>>>>>>>>>
>>>>>>>>>> So, what does (13 << -7) mean ?
>>>>>>>>> From the replies, I see that this is a completely different
>>>>>>>>> can of worms. I would tend to favor the VAX interpretation,
>>>>>>>>> but it's unclear to me that it would be worth the extra
>>>>>>>>> run-time overhead that it might imply.
>>>>>>>> <
>>>>>>>> The VAX interpretation does not allow for shifts to be used as
>>>>>>>> bit-manipulation instructions (extract signed and unsigned).
>>>>>> <
>>>>>>> That turns into a rotate and a shft right. But isn't that what a
>>>>>>> hardware EXTR is going to do anyway?
>>>>>> <
>>>>>> Technically an extract is:
>>>>>> <
>>>>>> r = ((c << (containerWidth - fieldWidth - offset)) >>
>>>>>> (containerWidth-fieldWidth));
>>>>>> <
>>>>>> However, what we do in HW is:
>>>>>> <
>>>>>> m = tableM[fieldWidth]; // mask m:: often done
>>>>>> with table
>>>>>> s = ~0u << (fieldWidth+offset); // sign extension bits ::
>>>>>> often done with table
>>>>>> t = (c >> offset) & m | ( signed || (c & ( 1 <<
>>>>>> (fieldWidth+offset)) ? s : 0);
>>>>>> <
>>>>>> as this uses 1 shifter and either tables or greater-than decoders.
>>>>>> <
>>>>>> But the part I was hinting at is that we have a large container for
>>>>>> this shift
>>>>>> count and we need only a few bits on one end. So, in My 66000, the
>>>>>> upper ½
>>>>>> of the container is allowed to contain a value 0..64 (where both 0 and 64
>>>>>> mean 64-bit field width).
>>>>>> <
>>>>>> Done this way, shifts are degenerate subsets of EXT.
>>>>>> <
>>>>>> If you use the sign bit<63> to control shift direction, you probably
>>>>>> should not
>>>>>> be using bits<37:32> as the field width because pasting the fieldWidth
>>>>>> into
>>>>>> such a container is simply harder.
>>>>>> <
>>>>>> I learned the hard way about placing both containers too close together
>>>>>> in the Mc 88100.
>>>>>
>>>>> Hmm. So the container is a field descriptor. But it's not actually a
>>>>> bitRow, because it can't cross word boundaries - more like a PDP-10 PLT
>>>>> IIRC. Gives you some encoding entropy, but not actually any new
>>>>> semantic, and you have to build the descriptor, which you avoid when
>>>>> offset and width are separate arguments.
>>>>>
>>>>> Where ISAs really fall down is parsing a bit stream: grab a dynamic
>>>>> number of bits off the front of a bit stream, advancing the stream; word
>>>>> boundaries are not significant. The problem is that HW provides word
>>>>> streams (loop/load) and mapping that to a bit stream is nasty. The logic
>>>>> is the same as mapping a line stream into a (VL) instruction stream in
>>>>> the decoder's instruction buffer, but how to represent that in an ISA?
>>>> We did use to have something like that back in the PDP days with the
>>>> variable-sized load byte opcode that would automatically step forward to
>>>> the next work if needed.
>>>>
>>>> Today we would need a separate hw mechanism for that bitstream buffer,
>>>> and probably a filling mechanism which either would need to be
>>>> hardware-only or possibly a FILL_IF_ROOM target,offs,[src] opcode that
>>>> would load the next (8/16/32 bits into the target register at offs bit
>>>> offset, updating the OFFS reg and SRC pointer, but only if there was
>>>> room, otherwise it is a NOP. The main problem is the need to update all
>>>> three register operands. :-(
>>>>
>>>> Having it would however make it far easier to handle arbitrary bit
>>>> streams, avoiding the current need for branchy code.
>>> <
>>> What is wrong with having a container 2× as large as the largest bit-field
>>> then use a shift-double to position the bits at an appropriate
>>> place for extraction and then decoding/encoding.
>> What's wrong is the need to conditionally refill and then merge the
>> loaded value into the shift pair. The condition is totally
>> unpredictable, so you get a miss every <word size>/<average request
>> size> cycles. We have predicated loads and isomorphic shifts, but have
>> only enough slots to do a reasonable job on a Gold.
>>
>> Try it on my66: input is an array of request bit-sizes, an input array
>> of bits, an output array of words, and a count. Unpack the consecutive
>> bit-size fields from the bits into the words.
>>
>> Have fun.
> <
> I came up with this in 5 minutes::
> This assumes the input bit-length selector is an vector of characters and that the
> chars contain values from {1..64}
> <
> void unpack( uchar_t size[], uint64_t packed[], uint64_t unpacked[], uint64_t count )
> {
> uint64_t len,
> bit=0,
> word=0,
> extract,
> container1 = packed[0],
> container2 = packed[1];
>
> for( unsigned int i = 0; i < count; i++ )
> {
> len = size[i];
> bit += len;
> extract = ( len << 32 ) | ( bit & 0x3F );
> if( word != bit >> 6 )
> {
> container1 = container2;
> container2 = packed[++word];
> }
> unpacked[i] = {container2, container1} >> extract;
> }
> }
> <
> This translates into pretty nice My 66000 ISA:
> <
> ENTRY unpack
> unpack:
> MOV R5,#0
> MOV R6,#0
> LDD R7,[R2]
> LDD R8,[R2+8]
> MOV R9,#0
> loop:
> LDUB R10,[R1+R9]
> ADD R5,R5,R10
> AND R11,R5,#63
> SL R12,R10,#32
> OR R11,R11,R12
> SR R12,R6,#6
> CMP R11,R6,R12
> PEQ R11,{111}
> ADD R6,R6,#1
> MOV R7,R8
> LDD R8,[R2+R6<<3]
> CARRY R8,{{I}}
> SL R12,R7,R11
> STD R12,[R3+R9<<3]
> ADD R9,R9,#1
> CMP R11,R9,R4
> BLT R11,loop
> RET
> <
> Well at least straightforwardly.
>

Here you are taking full advantage of both predicate shadows to do the
branchless buffer update and your CARRY-enabled double-wide combined
shift/extract operation. The latter is somewhat broken here since you
would need a compiler intrinsic instead of directly passing the length
field in the top 32 bits of the shift count. I.e. this is NOT valid C afaik?

OTOH, it does show nicely how those two features allow very efficient
implementation of a very common but currenlty quite expensive building
block.

One small wrinkle: You very often want to have that bit picker inlined
in a lot of different locations, instead of it being a sentral function
you call, this is similar to typical C stdio libraries where the
single-byte io functions are macros which conditionally call filler
functions only when needed.

For a Huffman style decompressor you don't know how many bits you need
until after you've read them, so there you have to extract enough bits
for a subsequent table lookup which either tells you how many to keep,
along with the value of the lookup, or a flag which indicates more bits
are needed:


Click here to read the complete article
Re: RISC-V vs. Aarch64

<c43650f3-0540-4cc6-8d89-4536ebff960dn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22954&group=comp.arch#22954

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:622a:14c8:: with SMTP id u8mr4828748qtx.267.1642102616420;
Thu, 13 Jan 2022 11:36:56 -0800 (PST)
X-Received: by 2002:aca:a890:: with SMTP id r138mr1982671oie.99.1642102616129;
Thu, 13 Jan 2022 11:36:56 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 13 Jan 2022 11:36:55 -0800 (PST)
In-Reply-To: <srpuc5$1klu$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b9a8:2de0:6c5:30db;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b9a8:2de0:6c5:30db
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <86h7agvxun.fsf@linuxsc.com>
<M4_BJ.140002$lz3.547@fx34.iad> <f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com>
<srag0i$2ed$2@dont-email.me> <00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at> <7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com> <4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com> <570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com> <5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me> <b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com> <srpjgj$541$1@dont-email.me>
<1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com> <srpuc5$1klu$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c43650f3-0540-4cc6-8d89-4536ebff960dn@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 13 Jan 2022 19:36:56 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 276
 by: MitchAlsup - Thu, 13 Jan 2022 19:36 UTC

On Thursday, January 13, 2022 at 1:26:00 PM UTC-6, Terje Mathisen wrote:
> MitchAlsup wrote:
> > On Thursday, January 13, 2022 at 10:20:39 AM UTC-6, Ivan Godard wrote:
> >> On 1/13/2022 7:54 AM, MitchAlsup wrote:
> >>> On Thursday, January 13, 2022 at 6:07:29 AM UTC-6, Terje Mathisen wrote:
> >>>> Ivan Godard wrote:
> >>>>> On 1/12/2022 2:46 PM, MitchAlsup wrote:
> >>>>>> On Wednesday, January 12, 2022 at 4:24:36 PM UTC-6, Ivan Godard wrote:
> >>>>>>> On 1/12/2022 1:24 PM, MitchAlsup wrote:
> >>>>>>>> On Monday, January 10, 2022 at 11:36:01 AM UTC-6, Quadibloc wrote:
> >>>>>>>>> On Saturday, January 8, 2022 at 6:16:14 PM UTC-7, MitchAlsup wrote:
> >>>>>>>>>
> >>>>>>>>>> So, what does (13 << -7) mean ?
> >>>>>>>>> From the replies, I see that this is a completely different
> >>>>>>>>> can of worms. I would tend to favor the VAX interpretation,
> >>>>>>>>> but it's unclear to me that it would be worth the extra
> >>>>>>>>> run-time overhead that it might imply.
> >>>>>>>> <
> >>>>>>>> The VAX interpretation does not allow for shifts to be used as
> >>>>>>>> bit-manipulation instructions (extract signed and unsigned).
> >>>>>> <
> >>>>>>> That turns into a rotate and a shft right. But isn't that what a
> >>>>>>> hardware EXTR is going to do anyway?
> >>>>>> <
> >>>>>> Technically an extract is:
> >>>>>> <
> >>>>>> r = ((c << (containerWidth - fieldWidth - offset)) >>
> >>>>>> (containerWidth-fieldWidth));
> >>>>>> <
> >>>>>> However, what we do in HW is:
> >>>>>> <
> >>>>>> m = tableM[fieldWidth]; // mask m:: often done
> >>>>>> with table
> >>>>>> s = ~0u << (fieldWidth+offset); // sign extension bits ::
> >>>>>> often done with table
> >>>>>> t = (c >> offset) & m | ( signed || (c & ( 1 <<
> >>>>>> (fieldWidth+offset)) ? s : 0);
> >>>>>> <
> >>>>>> as this uses 1 shifter and either tables or greater-than decoders.
> >>>>>> <
> >>>>>> But the part I was hinting at is that we have a large container for
> >>>>>> this shift
> >>>>>> count and we need only a few bits on one end. So, in My 66000, the
> >>>>>> upper ½
> >>>>>> of the container is allowed to contain a value 0..64 (where both 0 and 64
> >>>>>> mean 64-bit field width).
> >>>>>> <
> >>>>>> Done this way, shifts are degenerate subsets of EXT.
> >>>>>> <
> >>>>>> If you use the sign bit<63> to control shift direction, you probably
> >>>>>> should not
> >>>>>> be using bits<37:32> as the field width because pasting the fieldWidth
> >>>>>> into
> >>>>>> such a container is simply harder.
> >>>>>> <
> >>>>>> I learned the hard way about placing both containers too close together
> >>>>>> in the Mc 88100.
> >>>>>
> >>>>> Hmm. So the container is a field descriptor. But it's not actually a
> >>>>> bitRow, because it can't cross word boundaries - more like a PDP-10 PLT
> >>>>> IIRC. Gives you some encoding entropy, but not actually any new
> >>>>> semantic, and you have to build the descriptor, which you avoid when
> >>>>> offset and width are separate arguments.
> >>>>>
> >>>>> Where ISAs really fall down is parsing a bit stream: grab a dynamic
> >>>>> number of bits off the front of a bit stream, advancing the stream; word
> >>>>> boundaries are not significant. The problem is that HW provides word
> >>>>> streams (loop/load) and mapping that to a bit stream is nasty. The logic
> >>>>> is the same as mapping a line stream into a (VL) instruction stream in
> >>>>> the decoder's instruction buffer, but how to represent that in an ISA?
> >>>> We did use to have something like that back in the PDP days with the
> >>>> variable-sized load byte opcode that would automatically step forward to
> >>>> the next work if needed.
> >>>>
> >>>> Today we would need a separate hw mechanism for that bitstream buffer,
> >>>> and probably a filling mechanism which either would need to be
> >>>> hardware-only or possibly a FILL_IF_ROOM target,offs,[src] opcode that
> >>>> would load the next (8/16/32 bits into the target register at offs bit
> >>>> offset, updating the OFFS reg and SRC pointer, but only if there was
> >>>> room, otherwise it is a NOP. The main problem is the need to update all
> >>>> three register operands. :-(
> >>>>
> >>>> Having it would however make it far easier to handle arbitrary bit
> >>>> streams, avoiding the current need for branchy code.
> >>> <
> >>> What is wrong with having a container 2× as large as the largest bit-field
> >>> then use a shift-double to position the bits at an appropriate
> >>> place for extraction and then decoding/encoding.
> >> What's wrong is the need to conditionally refill and then merge the
> >> loaded value into the shift pair. The condition is totally
> >> unpredictable, so you get a miss every <word size>/<average request
> >> size> cycles. We have predicated loads and isomorphic shifts, but have
> >> only enough slots to do a reasonable job on a Gold.
> >>
> >> Try it on my66: input is an array of request bit-sizes, an input array
> >> of bits, an output array of words, and a count. Unpack the consecutive
> >> bit-size fields from the bits into the words.
> >>
> >> Have fun.
> > <
> > I came up with this in 5 minutes::
> > This assumes the input bit-length selector is an vector of characters and that the
> > chars contain values from {1..64}
> > <
> > void unpack( uchar_t size[], uint64_t packed[], uint64_t unpacked[], uint64_t count )
> > {
> > uint64_t len,
> > bit=0,
> > word=0,
> > extract,
> > container1 = packed[0],
> > container2 = packed[1];
> >
> > for( unsigned int i = 0; i < count; i++ )
> > {
> > len = size[i];
> > bit += len;
> > extract = ( len << 32 ) | ( bit & 0x3F );
> > if( word != bit >> 6 )
> > {
> > container1 = container2;
> > container2 = packed[++word];
> > }
> > unpacked[i] = {container2, container1} >> extract;
> > }
> > }
> > <
> > This translates into pretty nice My 66000 ISA:
> > <
> > ENTRY unpack
> > unpack:
> > MOV R5,#0
> > MOV R6,#0
> > LDD R7,[R2]
> > LDD R8,[R2+8]
> > MOV R9,#0
> > loop:
> > LDUB R10,[R1+R9]
> > ADD R5,R5,R10
> > AND R11,R5,#63
> > SL R12,R10,#32
> > OR R11,R11,R12
> > SR R12,R6,#6
> > CMP R11,R6,R12
> > PEQ R11,{111}
> > ADD R6,R6,#1
> > MOV R7,R8
> > LDD R8,[R2+R6<<3]
> > CARRY R8,{{I}}
> > SL R12,R7,R11
> > STD R12,[R3+R9<<3]
> > ADD R9,R9,#1
> > CMP R11,R9,R4
> > BLT R11,loop
> > RET
> > <
> > Well at least straightforwardly.
> >
> Here you are taking full advantage of both predicate shadows to do the
> branchless buffer update and your CARRY-enabled double-wide combined
> shift/extract operation. The latter is somewhat broken here since you
> would need a compiler intrinsic instead of directly passing the length
> field in the top 32 bits of the shift count. I.e. this is NOT valid C afaik?
<
It is non-portable C that happens to do the right stuff in My 66000 architectures.
>
> OTOH, it does show nicely how those two features allow very efficient
> implementation of a very common but currenlty quite expensive building
> block.
<
Thanks.
>
> One small wrinkle: You very often want to have that bit picker inlined
> in a lot of different locations, instead of it being a sentral function
> you call, this is similar to typical C stdio libraries where the
> single-byte io functions are macros which conditionally call filler
> functions only when needed.
>
> For a Huffman style decompressor you don't know how many bits you need
> until after you've read them,
<
I was just following Ivan's instructions and inserted the requirement that
the length of the maximum field is 64-bits.
<
> so there you have to extract enough bits
> for a subsequent table lookup which either tells you how many to keep,
> along with the value of the lookup, or a flag which indicates more bits
> are needed:
>
> bits9 = get9bits(); // (buffer >> bitsused) & 511;
> vb = bits9table[bits9];
> if (vb.bits > 16) { // More bits needed, use the token field as a
> // secondary table index. Max token len = 16
> bitsused += 9;
> tab = sec_table[vb.token];
> bits7 = get7bits(); // (buffer >> bitsused) & 127;
> vb = tab[bits7];
> }
> token = vb.token;
> bitsused += vb.bits;
> refill(); // buffer |= *word << 32;
> // word += bitsused>>5; bitsused &= 31;
>
> Here the get9bits() and get7bits() calls are macros that blindly pick
> off the next 9/7 bits, starting at bitsused offset into the buffer
> register, but then we would need regular calls to that refill macro
> which makes sure we always have enough bits available to get the next token.
>
> By doing the refills only in one (or a few) central location(s), it
> becomes feasible to have larger/more complicated code to implement it,
> but on x86 I have never been able to gain any time by using branchless
> code (like the example above) to do it.
>
> In theory reloading the same 32-bit value and stuffing it into the high
> half of the 64-bit buffer would only take 5-6 instructions and a couple
> of cycles, but branching around still tends to win, but I haven't tried
> this on a 64-bit platform, i.e. I tended to either reload 16-bit top
> halfs of a 32-bit buffer, or use SHRD on a pair of buffer regs, but in
> the latter case register pressure could get really bad with
> SI/DX:BX/CL/DI all permanently allocated.
<
Had I reversed the field positions in My 66000 ISA encoding, I could
take 3 instructions out of the loop (but this would have made the
more typical shifting operations harder.)
<
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"


Click here to read the complete article
Re: RISC-V vs. Aarch64

<srq01m$qvi$1@newsreader4.netcologne.de>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22955&group=comp.arch#22955

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-7a42-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Thu, 13 Jan 2022 19:54:30 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <srq01m$qvi$1@newsreader4.netcologne.de>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<sql2cm$3h7$1@dont-email.me> <sqmsqq$14kp$1@gioia.aioe.org>
<VSFzJ.136700$7D4.47834@fx37.iad>
<2021Dec31.203710@mips.complang.tuwien.ac.at>
<KC_zJ.59028$Ak2.12921@fx20.iad> <86h7agvxun.fsf@linuxsc.com>
<M4_BJ.140002$lz3.547@fx34.iad>
<f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com>
<srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me>
Injection-Date: Thu, 13 Jan 2022 19:54:30 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-7a42-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:7a42:0:7285:c2ff:fe6c:992d";
logging-data="27634"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Thu, 13 Jan 2022 19:54 UTC

Ivan Godard <ivan@millcomputing.com> schrieb:

> Where ISAs really fall down is parsing a bit stream: grab a dynamic
> number of bits off the front of a bit stream, advancing the stream; word
> boundaries are not significant. The problem is that HW provides word
> streams (loop/load) and mapping that to a bit stream is nasty. The logic
> is the same as mapping a line stream into a (VL) instruction stream in
> the decoder's instruction buffer, but how to represent that in an ISA?

The same way that a vector instruction would be represented?

Vectors could be made to operate on sub-word quantities such as bytes,
with microarchitectural SIMD underneath.

I have to confess I do not know much of how compression and
decompression algorithms work. What are the operations that need
to be done with the chunk of bits that is grabbed?

Re: RISC-V vs. Aarch64

<0cb445ab-978c-4812-8685-7bee7010b1c6n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22956&group=comp.arch#22956

  copy link   Newsgroups: comp.arch
X-Received: by 2002:ac8:5c87:: with SMTP id r7mr4989725qta.575.1642105397989;
Thu, 13 Jan 2022 12:23:17 -0800 (PST)
X-Received: by 2002:a9d:664e:: with SMTP id q14mr4199258otm.331.1642105396609;
Thu, 13 Jan 2022 12:23:16 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 13 Jan 2022 12:23:16 -0800 (PST)
In-Reply-To: <srpjgj$541$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b9a8:2de0:6c5:30db;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b9a8:2de0:6c5:30db
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <2021Dec31.203710@mips.complang.tuwien.ac.at>
<KC_zJ.59028$Ak2.12921@fx20.iad> <86h7agvxun.fsf@linuxsc.com>
<M4_BJ.140002$lz3.547@fx34.iad> <f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com>
<srag0i$2ed$2@dont-email.me> <00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at> <7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com> <4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com> <570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com> <5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me> <b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com> <srpjgj$541$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <0cb445ab-978c-4812-8685-7bee7010b1c6n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 13 Jan 2022 20:23:17 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 38
 by: MitchAlsup - Thu, 13 Jan 2022 20:23 UTC

On Thursday, January 13, 2022 at 10:20:39 AM UTC-6, Ivan Godard wrote:
> On 1/13/2022 7:54 AM, MitchAlsup wrote:

> What's wrong is the need to conditionally refill and then merge the
> loaded value into the shift pair. The condition is totally
> unpredictable, so you get a miss every <word size>/<average request
> size> cycles. We have predicated loads and isomorphic shifts, but have
> only enough slots to do a reasonable job on a Gold.
>
> Try it on my66: input is an array of request bit-sizes, an input array
> of bits, an output array of words, and a count. Unpack the consecutive
> bit-size fields from the bits into the words.
>
> Have fun.
<
Minor update: had to move a unit of arithmetic so the first field is performed properly.
<
void unpack( uchar_t size[], uint64_t packed[], uint64_t unpacked[], uint64_t count )
{ uint64_t len,
bit=0,
word=0,
extract,
container1 = packed[0],
container2 = packed[1];

for( unsigned int i = 0; i < count; i++ )
{
len = size[i];
extract = ( len << 32 ) | ( bit & 0x3F );
bit += len;
if( word != bit >> 6 )
{
container1 = container2;
container2 = packed[++word];
}
unpacked[i] = {container2, container1} >> extract;
}
}

Re: RISC-V vs. Aarch64

<srq8fv$tps$1@dont-email.me>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22957&group=comp.arch#22957

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Thu, 13 Jan 2022 14:18:40 -0800
Organization: A noiseless patient Spider
Lines: 44
Message-ID: <srq8fv$tps$1@dont-email.me>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<86h7agvxun.fsf@linuxsc.com> <M4_BJ.140002$lz3.547@fx34.iad>
<f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com>
<srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
<srpjgj$541$1@dont-email.me>
<0cb445ab-978c-4812-8685-7bee7010b1c6n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 13 Jan 2022 22:18:39 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="6046e23aa8f48c74d77f2fb73876c0cc";
logging-data="30524"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18oHaxsN5J/CEShsNcAcVuP"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Cancel-Lock: sha1:lzlpJHRBSXGQ98eyThyB4Xa8iGc=
In-Reply-To: <0cb445ab-978c-4812-8685-7bee7010b1c6n@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Thu, 13 Jan 2022 22:18 UTC

On 1/13/2022 12:23 PM, MitchAlsup wrote:
> On Thursday, January 13, 2022 at 10:20:39 AM UTC-6, Ivan Godard wrote:
>> On 1/13/2022 7:54 AM, MitchAlsup wrote:
>
>> What's wrong is the need to conditionally refill and then merge the
>> loaded value into the shift pair. The condition is totally
>> unpredictable, so you get a miss every <word size>/<average request
>> size> cycles. We have predicated loads and isomorphic shifts, but have
>> only enough slots to do a reasonable job on a Gold.
>>
>> Try it on my66: input is an array of request bit-sizes, an input array
>> of bits, an output array of words, and a count. Unpack the consecutive
>> bit-size fields from the bits into the words.
>>
>> Have fun.
> <
> Minor update: had to move a unit of arithmetic so the first field is performed properly.
> <
> void unpack( uchar_t size[], uint64_t packed[], uint64_t unpacked[], uint64_t count )
> {
> uint64_t len,
> bit=0,
> word=0,
> extract,
> container1 = packed[0],
> container2 = packed[1];
>
> for( unsigned int i = 0; i < count; i++ )
> {
> len = size[i];
> extract = ( len << 32 ) | ( bit & 0x3F );
> bit += len;
> if( word != bit >> 6 )
> {
> container1 = container2;
> container2 = packed[++word];
> }
> unpacked[i] = {container2, container1} >> extract;
> }
> }

Perhaps I'm misreading your double right shift, but doesn't the second
no-reload extraction contain bits from the first? You aren't updating
the containers with a mask.

Re: RISC-V vs. Aarch64

<b9df2205-389e-48d2-ba8f-fe0175190d30n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22958&group=comp.arch#22958

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a37:9f52:: with SMTP id i79mr4747180qke.717.1642113440990;
Thu, 13 Jan 2022 14:37:20 -0800 (PST)
X-Received: by 2002:a9d:206a:: with SMTP id n97mr4967939ota.142.1642113440676;
Thu, 13 Jan 2022 14:37:20 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 13 Jan 2022 14:37:20 -0800 (PST)
In-Reply-To: <srq8fv$tps$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b9a8:2de0:6c5:30db;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b9a8:2de0:6c5:30db
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <86h7agvxun.fsf@linuxsc.com>
<M4_BJ.140002$lz3.547@fx34.iad> <f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com>
<srag0i$2ed$2@dont-email.me> <00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at> <7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com> <4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com> <570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com> <5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me> <b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com> <srpjgj$541$1@dont-email.me>
<0cb445ab-978c-4812-8685-7bee7010b1c6n@googlegroups.com> <srq8fv$tps$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b9df2205-389e-48d2-ba8f-fe0175190d30n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 13 Jan 2022 22:37:20 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 59
 by: MitchAlsup - Thu, 13 Jan 2022 22:37 UTC

On Thursday, January 13, 2022 at 4:18:43 PM UTC-6, Ivan Godard wrote:
> On 1/13/2022 12:23 PM, MitchAlsup wrote:
> > On Thursday, January 13, 2022 at 10:20:39 AM UTC-6, Ivan Godard wrote:
> >> On 1/13/2022 7:54 AM, MitchAlsup wrote:
> >
> >> What's wrong is the need to conditionally refill and then merge the
> >> loaded value into the shift pair. The condition is totally
> >> unpredictable, so you get a miss every <word size>/<average request
> >> size> cycles. We have predicated loads and isomorphic shifts, but have
> >> only enough slots to do a reasonable job on a Gold.
> >>
> >> Try it on my66: input is an array of request bit-sizes, an input array
> >> of bits, an output array of words, and a count. Unpack the consecutive
> >> bit-size fields from the bits into the words.
> >>
> >> Have fun.
> > <
> > Minor update: had to move a unit of arithmetic so the first field is performed properly.
> > <
> > void unpack( uchar_t size[], uint64_t packed[], uint64_t unpacked[], uint64_t count )
> > {
> > uint64_t len,
> > bit=0,
> > word=0,
> > extract,
> > container1 = packed[0],
> > container2 = packed[1];
> >
> > for( unsigned int i = 0; i < count; i++ )
> > {
> > len = size[i];
> > extract = ( len << 32 ) | ( bit & 0x3F );
> > bit += len;
> > if( word != bit >> 6 )
> > {
> > container1 = container2;
> > container2 = packed[++word];
> > }
> > unpacked[i] = {container2, container1} >> extract;
> > }
> > }
> Perhaps I'm misreading your double right shift, but doesn't the second
> no-reload extraction contain bits from the first? You aren't updating
> the containers with a mask.
<
That is correct. I am leaving the containers in memory format, and extracting
bit-fields from the pair of containers. When it is time to advance the pair of
containers, I copy the second to the first and load a new second (Little Endian
Order).
<
Note that while I use a shift operator (>>); I am performing a bit field extract
because bits<37:32> of the shift count operand contain the size of the field
to be extracted. This is non-portable--but how My 66000 works and how the
industrious programmer can access extract without leaving C. AND you ask
how this could be programmed in my66K (wrong spelling BTW).
<
Essentially, I keep track of the lower bit position of the extracted fields in
two (2) counters {word and bit} and paste in the length of each field on a
per loop basis.

Re: RISC-V vs. Aarch64

<srqeir$4fl$1@dont-email.me>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22959&group=comp.arch#22959

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Thu, 13 Jan 2022 16:02:37 -0800
Organization: A noiseless patient Spider
Lines: 65
Message-ID: <srqeir$4fl$1@dont-email.me>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
<srpjgj$541$1@dont-email.me>
<0cb445ab-978c-4812-8685-7bee7010b1c6n@googlegroups.com>
<srq8fv$tps$1@dont-email.me>
<b9df2205-389e-48d2-ba8f-fe0175190d30n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 14 Jan 2022 00:02:36 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="fe9b121eabd5c8db44ae8a29d2fa36e5";
logging-data="4597"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18L9Wy5EjqC6LVzwenQfbTI"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Cancel-Lock: sha1:Aoz5rUHaprOA7YCTPFsEdy94MV0=
In-Reply-To: <b9df2205-389e-48d2-ba8f-fe0175190d30n@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Fri, 14 Jan 2022 00:02 UTC

On 1/13/2022 2:37 PM, MitchAlsup wrote:
> On Thursday, January 13, 2022 at 4:18:43 PM UTC-6, Ivan Godard wrote:
>> On 1/13/2022 12:23 PM, MitchAlsup wrote:
>>> On Thursday, January 13, 2022 at 10:20:39 AM UTC-6, Ivan Godard wrote:
>>>> On 1/13/2022 7:54 AM, MitchAlsup wrote:
>>>
>>>> What's wrong is the need to conditionally refill and then merge the
>>>> loaded value into the shift pair. The condition is totally
>>>> unpredictable, so you get a miss every <word size>/<average request
>>>> size> cycles. We have predicated loads and isomorphic shifts, but have
>>>> only enough slots to do a reasonable job on a Gold.
>>>>
>>>> Try it on my66: input is an array of request bit-sizes, an input array
>>>> of bits, an output array of words, and a count. Unpack the consecutive
>>>> bit-size fields from the bits into the words.
>>>>
>>>> Have fun.
>>> <
>>> Minor update: had to move a unit of arithmetic so the first field is performed properly.
>>> <
>>> void unpack( uchar_t size[], uint64_t packed[], uint64_t unpacked[], uint64_t count )
>>> {
>>> uint64_t len,
>>> bit=0,
>>> word=0,
>>> extract,
>>> container1 = packed[0],
>>> container2 = packed[1];
>>>
>>> for( unsigned int i = 0; i < count; i++ )
>>> {
>>> len = size[i];
>>> extract = ( len << 32 ) | ( bit & 0x3F );
>>> bit += len;
>>> if( word != bit >> 6 )
>>> {
>>> container1 = container2;
>>> container2 = packed[++word];
>>> }
>>> unpacked[i] = {container2, container1} >> extract;
>>> }
>>> }
>> Perhaps I'm misreading your double right shift, but doesn't the second
>> no-reload extraction contain bits from the first? You aren't updating
>> the containers with a mask.
> <
> That is correct. I am leaving the containers in memory format, and extracting
> bit-fields from the pair of containers. When it is time to advance the pair of
> containers, I copy the second to the first and load a new second (Little Endian
> Order).
> <
> Note that while I use a shift operator (>>); I am performing a bit field extract
> because bits<37:32> of the shift count operand contain the size of the field
> to be extracted. This is non-portable--but how My 66000 works and how the
> industrious programmer can access extract without leaving C. AND you ask
> how this could be programmed in my66K (wrong spelling BTW).
> <
> Essentially, I keep track of the lower bit position of the extracted fields in
> two (2) counters {word and bit} and paste in the length of each field on a
> per loop basis.
>

Ah - I see. Whereas I would do unpacked[i] = extract(container2,
container1, bit, len); saves a shift, an and and an or, but uses two slots.

Re: RISC-V vs. Aarch64

<jwvwnj3qi3g.fsf-monnier+comp.arch@gnu.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22960&group=comp.arch#22960

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Thu, 13 Jan 2022 19:30:16 -0500
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <jwvwnj3qi3g.fsf-monnier+comp.arch@gnu.org>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<86h7agvxun.fsf@linuxsc.com> <M4_BJ.140002$lz3.547@fx34.iad>
<f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com>
<srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
<srpjgj$541$1@dont-email.me>
<1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="f8e4bfae7329adc8a7cb497be442c3e2";
logging-data="31986"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/UuYsN7COzFGdjrEvabRWH"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:WkQggSBsl58z1OKwpbW4edWaEl0=
sha1:UR8vsTfy/9xcMvCUUpKdUguDVbM=
 by: Stefan Monnier - Fri, 14 Jan 2022 00:30 UTC

> loop:
> LDUB R10,[R1+R9]
> ADD R5,R5,R10
> AND R11,R5,#63
> SL R12,R10,#32
> OR R11,R11,R12
> SR R12,R6,#6
> CMP R11,R6,R12
> PEQ R11,{111}
> ADD R6,R6,#1
> MOV R7,R8
> LDD R8,[R2+R6<<3]
> CARRY R8,{{I}}
> SL R12,R7,R11
> STD R12,[R3+R9<<3]
> ADD R9,R9,#1
> CMP R11,R9,R4
> BLT R11,loop
> RET

There's still this `PEQ` predication which seems to lengthen the critical path
significantly, whereas ideally this "load next word" should be a kind of
"background task" that runs in parallel and is not in the critical path
(as long as it's fast enough).

Stefan

Re: RISC-V vs. Aarch64

<2a291764-208a-493e-913a-1a0adb9861dbn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22961&group=comp.arch#22961

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a37:8883:: with SMTP id k125mr5097651qkd.464.1642120751199;
Thu, 13 Jan 2022 16:39:11 -0800 (PST)
X-Received: by 2002:a9d:53c6:: with SMTP id i6mr4983430oth.96.1642120750936;
Thu, 13 Jan 2022 16:39:10 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 13 Jan 2022 16:39:10 -0800 (PST)
In-Reply-To: <jwvwnj3qi3g.fsf-monnier+comp.arch@gnu.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b9a8:2de0:6c5:30db;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b9a8:2de0:6c5:30db
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <86h7agvxun.fsf@linuxsc.com>
<M4_BJ.140002$lz3.547@fx34.iad> <f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com>
<srag0i$2ed$2@dont-email.me> <00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at> <7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com> <4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com> <570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com> <5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me> <b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com> <srpjgj$541$1@dont-email.me>
<1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com> <jwvwnj3qi3g.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2a291764-208a-493e-913a-1a0adb9861dbn@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 14 Jan 2022 00:39:11 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 39
 by: MitchAlsup - Fri, 14 Jan 2022 00:39 UTC

On Thursday, January 13, 2022 at 6:30:20 PM UTC-6, Stefan Monnier wrote:
> > loop:
> > LDUB R10,[R1+R9]
> > ADD R5,R5,R10
> > AND R11,R5,#63
> > SL R12,R10,#32
> > OR R11,R11,R12
> > SR R12,R6,#6
> > CMP R11,R6,R12
> > PEQ R11,{111}
> > ADD R6,R6,#1
> > MOV R7,R8
> > LDD R8,[R2+R6<<3]
> > CARRY R8,{{I}}
> > SL R12,R7,R11
> > STD R12,[R3+R9<<3]
> > ADD R9,R9,#1
> > CMP R11,R9,R4
> > BLT R11,loop
> > RET
> There's still this `PEQ` predication which seems to lengthen the critical path
> significantly, whereas ideally this "load next word" should be a kind of
> "background task" that runs in parallel and is not in the critical path
> (as long as it's fast enough).
<
Note: when doing things like Huffman decoding, most of the extractions
are small fields, and with 64-bit registers, you get a significant number
of extractions per container--maybe 12-16 extractions per container.
<
If you want, you can load both containers continuously and get rid of the
if-statement--this gives a more constant time version. Due to cache hit rates
on continuously accessed data, it might even be faster (but at higher power).
<
But, still: the predication still uses constant time FETCH-DECODE. Predication
always inserts the instructions into the instruction stream, and then cancels
them selectively. Predication is present to avoid disturbing the F-D process
making use of the vast momentum this process contains.
>
>
> Stefan

Re: RISC-V vs. Aarch64

<9eef7290-4980-48ea-87a8-61125f1a7b90n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22962&group=comp.arch#22962

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:6214:20ea:: with SMTP id 10mr6316044qvk.94.1642120863767;
Thu, 13 Jan 2022 16:41:03 -0800 (PST)
X-Received: by 2002:a9d:ba8:: with SMTP id 37mr4864884oth.227.1642120863499;
Thu, 13 Jan 2022 16:41:03 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 13 Jan 2022 16:41:03 -0800 (PST)
In-Reply-To: <srqeir$4fl$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b9a8:2de0:6c5:30db;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b9a8:2de0:6c5:30db
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com> <2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com> <ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com> <2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com> <850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com> <srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com> <srp3n0$e0a$1@dont-email.me>
<srp4lu$hhg$1@gioia.aioe.org> <81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
<srpjgj$541$1@dont-email.me> <0cb445ab-978c-4812-8685-7bee7010b1c6n@googlegroups.com>
<srq8fv$tps$1@dont-email.me> <b9df2205-389e-48d2-ba8f-fe0175190d30n@googlegroups.com>
<srqeir$4fl$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <9eef7290-4980-48ea-87a8-61125f1a7b90n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 14 Jan 2022 00:41:03 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 67
 by: MitchAlsup - Fri, 14 Jan 2022 00:41 UTC

On Thursday, January 13, 2022 at 6:02:39 PM UTC-6, Ivan Godard wrote:
> On 1/13/2022 2:37 PM, MitchAlsup wrote:
> > On Thursday, January 13, 2022 at 4:18:43 PM UTC-6, Ivan Godard wrote:
> >> On 1/13/2022 12:23 PM, MitchAlsup wrote:
> >>> On Thursday, January 13, 2022 at 10:20:39 AM UTC-6, Ivan Godard wrote:
> >>>> On 1/13/2022 7:54 AM, MitchAlsup wrote:
> >>>
> >>>> What's wrong is the need to conditionally refill and then merge the
> >>>> loaded value into the shift pair. The condition is totally
> >>>> unpredictable, so you get a miss every <word size>/<average request
> >>>> size> cycles. We have predicated loads and isomorphic shifts, but have
> >>>> only enough slots to do a reasonable job on a Gold.
> >>>>
> >>>> Try it on my66: input is an array of request bit-sizes, an input array
> >>>> of bits, an output array of words, and a count. Unpack the consecutive
> >>>> bit-size fields from the bits into the words.
> >>>>
> >>>> Have fun.
> >>> <
> >>> Minor update: had to move a unit of arithmetic so the first field is performed properly.
> >>> <
> >>> void unpack( uchar_t size[], uint64_t packed[], uint64_t unpacked[], uint64_t count )
> >>> {
> >>> uint64_t len,
> >>> bit=0,
> >>> word=0,
> >>> extract,
> >>> container1 = packed[0],
> >>> container2 = packed[1];
> >>>
> >>> for( unsigned int i = 0; i < count; i++ )
> >>> {
> >>> len = size[i];
> >>> extract = ( len << 32 ) | ( bit & 0x3F );
> >>> bit += len;
> >>> if( word != bit >> 6 )
> >>> {
> >>> container1 = container2;
> >>> container2 = packed[++word];
> >>> }
> >>> unpacked[i] = {container2, container1} >> extract;
> >>> }
> >>> }
> >> Perhaps I'm misreading your double right shift, but doesn't the second
> >> no-reload extraction contain bits from the first? You aren't updating
> >> the containers with a mask.
> > <
> > That is correct. I am leaving the containers in memory format, and extracting
> > bit-fields from the pair of containers. When it is time to advance the pair of
> > containers, I copy the second to the first and load a new second (Little Endian
> > Order).
> > <
> > Note that while I use a shift operator (>>); I am performing a bit field extract
> > because bits<37:32> of the shift count operand contain the size of the field
> > to be extracted. This is non-portable--but how My 66000 works and how the
> > industrious programmer can access extract without leaving C. AND you ask
> > how this could be programmed in my66K (wrong spelling BTW).
> > <
> > Essentially, I keep track of the lower bit position of the extracted fields in
> > two (2) counters {word and bit} and paste in the length of each field on a
> > per loop basis.
> >
> Ah - I see. Whereas I would do unpacked[i] = extract(container2,
> container1, bit, len); saves a shift, an and and an or, but uses two slots.
<
I would argue the an extract IS a shift. But I can agree having to combine
both offset and length into a single container is not as efficient as if there
were enough register ports to do it as 1 instruction.

Re: RISC-V vs. Aarch64

<f7439017-f3df-485c-8061-2190f5a55b6bn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22963&group=comp.arch#22963

  copy link   Newsgroups: comp.arch
X-Received: by 2002:ac8:550d:: with SMTP id j13mr1935239qtq.349.1642122584348;
Thu, 13 Jan 2022 17:09:44 -0800 (PST)
X-Received: by 2002:a05:6808:238f:: with SMTP id bp15mr5481023oib.78.1642122584083;
Thu, 13 Jan 2022 17:09:44 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 13 Jan 2022 17:09:43 -0800 (PST)
In-Reply-To: <9eef7290-4980-48ea-87a8-61125f1a7b90n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b9a8:2de0:6c5:30db;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b9a8:2de0:6c5:30db
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com> <2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com> <ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com> <2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com> <850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com> <srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com> <srp3n0$e0a$1@dont-email.me>
<srp4lu$hhg$1@gioia.aioe.org> <81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
<srpjgj$541$1@dont-email.me> <0cb445ab-978c-4812-8685-7bee7010b1c6n@googlegroups.com>
<srq8fv$tps$1@dont-email.me> <b9df2205-389e-48d2-ba8f-fe0175190d30n@googlegroups.com>
<srqeir$4fl$1@dont-email.me> <9eef7290-4980-48ea-87a8-61125f1a7b90n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f7439017-f3df-485c-8061-2190f5a55b6bn@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 14 Jan 2022 01:09:44 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 42
 by: MitchAlsup - Fri, 14 Jan 2022 01:09 UTC

On Thursday, January 13, 2022 at 6:41:05 PM UTC-6, MitchAlsup wrote:
> On Thursday, January 13, 2022 at 6:02:39 PM UTC-6, Ivan Godard wrote:
> > On 1/13/2022 2:37 PM, MitchAlsup wrote:
> > >>>>
> > >>>> Try it on my66: input is an array of request bit-sizes, an input array
> > >>>> of bits, an output array of words, and a count. Unpack the consecutive
> > >>>> bit-size fields from the bits into the words.
> > >>>>
> > >>>> Have fun.
> > >>> <
> > >>> Minor update: had to move a unit of arithmetic so the first field is performed properly.
> > >>>
This one puts the work into the container update, but requires 2 more registers.
<
void unpack( uchar_t size[], uint64_t packed[], uint64_t unpacked[], uint64_t count )
{ uint64_t len,
bit=0,
word=0,
extract,
mask,
container1 = packed[0],
container2 = packed[1];

for( unsigned int i = 0; i < count; i++ )
{
len = size[i];
mask = ~0u << len;
unpacked[i] = container1 & ~mask;
bit += len;
{ container2, container1 } >>= len; // this takes another register write port.

if( bit >> 6 )
{
container2 = packed[++word] ;
container1 |= container2 << (64-(bit&63)) // more background arithmetic
container 2 = container2 >> (bits&63);
}
}
} <
The cruft n the update looks to contain more arithmetic than the other way,
but might be faster. (MIGHT).

Re: RISC-V vs. Aarch64

<srr7rr$e7d$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22964&group=comp.arch#22964

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!To5nvU/sTaigmVbgRJ05pQ.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Fri, 14 Jan 2022 08:14:10 +0100
Organization: Aioe.org NNTP Server
Message-ID: <srr7rr$e7d$1@gioia.aioe.org>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<VSFzJ.136700$7D4.47834@fx37.iad>
<2021Dec31.203710@mips.complang.tuwien.ac.at>
<KC_zJ.59028$Ak2.12921@fx20.iad> <86h7agvxun.fsf@linuxsc.com>
<M4_BJ.140002$lz3.547@fx34.iad>
<f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com>
<srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srq01m$qvi$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="14573"; posting-host="To5nvU/sTaigmVbgRJ05pQ.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.10.2
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Fri, 14 Jan 2022 07:14 UTC

Thomas Koenig wrote:
> Ivan Godard <ivan@millcomputing.com> schrieb:
>
>> Where ISAs really fall down is parsing a bit stream: grab a dynamic
>> number of bits off the front of a bit stream, advancing the stream; word
>> boundaries are not significant. The problem is that HW provides word
>> streams (loop/load) and mapping that to a bit stream is nasty. The logic
>> is the same as mapping a line stream into a (VL) instruction stream in
>> the decoder's instruction buffer, but how to represent that in an ISA?
>
> The same way that a vector instruction would be represented?
>
> Vectors could be made to operate on sub-word quantities such as bytes,
> with microarchitectural SIMD underneath.
>
> I have to confess I do not know much of how compression and
> decompression algorithms work. What are the operations that need
> to be done with the chunk of bits that is grabbed?
>
That varies a _lot_! Absolute worst case in my experience is the CABAC
option for h264: Content-Adaptive-Binary-Arithmetic-Coding

Here you normally extract single bits from the input stream, then you
immediately branch on the value of that bit (by definition completely
unpredictable, right?) to two separate code paths that really cannot be
combined into a single branchless/predicated stream.

More sw-optimized algorithms/codecs, like LZ4, will make sure that
tokens are much easier to locate and then immediately usable inline
instead of having to branch away. LZ4 additionally makes life very easy
for a branch predictor by always alternating immediate data and
back-references to stuff that has already been decoded.

For that h264 BlueRay decompressor, the alternative (CATL afair?)
encoding needs a few percent more bandwidth but it is nearly an order of
magnitude easier to decode in sw.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: RISC-V vs. Aarch64

<srr9ak$ui9$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22965&group=comp.arch#22965

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!To5nvU/sTaigmVbgRJ05pQ.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Fri, 14 Jan 2022 08:39:07 +0100
Organization: Aioe.org NNTP Server
Message-ID: <srr9ak$ui9$1@gioia.aioe.org>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
<srpjgj$541$1@dont-email.me>
<0cb445ab-978c-4812-8685-7bee7010b1c6n@googlegroups.com>
<srq8fv$tps$1@dont-email.me>
<b9df2205-389e-48d2-ba8f-fe0175190d30n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="31305"; posting-host="To5nvU/sTaigmVbgRJ05pQ.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.10.2
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Fri, 14 Jan 2022 07:39 UTC

MitchAlsup wrote:
> On Thursday, January 13, 2022 at 4:18:43 PM UTC-6, Ivan Godard wrote:
>> On 1/13/2022 12:23 PM, MitchAlsup wrote:
>>> On Thursday, January 13, 2022 at 10:20:39 AM UTC-6, Ivan Godard wrote:
>>>> On 1/13/2022 7:54 AM, MitchAlsup wrote:
>>>
>>>> What's wrong is the need to conditionally refill and then merge the
>>>> loaded value into the shift pair. The condition is totally
>>>> unpredictable, so you get a miss every <word size>/<average request
>>>> size> cycles. We have predicated loads and isomorphic shifts, but have
>>>> only enough slots to do a reasonable job on a Gold.
>>>>
>>>> Try it on my66: input is an array of request bit-sizes, an input array
>>>> of bits, an output array of words, and a count. Unpack the consecutive
>>>> bit-size fields from the bits into the words.
>>>>
>>>> Have fun.
>>> <
>>> Minor update: had to move a unit of arithmetic so the first field is performed properly.
>>> <
>>> void unpack( uchar_t size[], uint64_t packed[], uint64_t unpacked[], uint64_t count )
>>> {
>>> uint64_t len,
>>> bit=0,
>>> word=0,
>>> extract,
>>> container1 = packed[0],
>>> container2 = packed[1];
>>>
>>> for( unsigned int i = 0; i < count; i++ )
>>> {
>>> len = size[i];
>>> extract = ( len << 32 ) | ( bit & 0x3F );
>>> bit += len;
>>> if( word != bit >> 6 )
>>> {
>>> container1 = container2;
>>> container2 = packed[++word];
>>> }
>>> unpacked[i] = {container2, container1} >> extract;
>>> }
>>> }
>> Perhaps I'm misreading your double right shift, but doesn't the second
>> no-reload extraction contain bits from the first? You aren't updating
>> the containers with a mask.
> <
> That is correct. I am leaving the containers in memory format, and extracting
> bit-fields from the pair of containers. When it is time to advance the pair of
> containers, I copy the second to the first and load a new second (Little Endian
> Order).
> <
> Note that while I use a shift operator (>>); I am performing a bit field extract
> because bits<37:32> of the shift count operand contain the size of the field
> to be extracted. This is non-portable--but how My 66000 works and how the
> industrious programmer can access extract without leaving C. AND you ask
> how this could be programmed in my66K (wrong spelling BTW).
> <
> Essentially, I keep track of the lower bit position of the extracted fields in
> two (2) counters {word and bit} and paste in the length of each field on a
> per loop basis.
>

Essentially you are using exactly the same algorithm as I posted for
Huffmann decoding, except you get to write a branch around the reload
which the compiler can then turn into a predicated shadow block instead.

It might be possible to abuse the shift masking on x86 with a pair of
64-bit regs and SHRD, skipping the need to mask the bit position index:

mov rax,r8 ;; The buffer is 128 bits in R9:R8
shrd rax,r9,cl ;; Full bit count in RCX
and al,511 ;; Alternatively use a bitmask table
stosb ;; Save the token or do something with it
add rcx,9
mov rsi,rcx ;; Refill the full 128 bits here
shr rsi,6
mov r8,input[rsi]
mov r9,input[rsi+8]

Nope, this is almost certainly worse. :-(

With a full 128-bit buffer, it is faster to branch around the refill
unless it is actually needed.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: RISC-V vs. Aarch64

<srraa8$1bfk$1@gioia.aioe.org>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22966&group=comp.arch#22966

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!rocksolid2!news.neodome.net!news.mixmin.net!aioe.org!To5nvU/sTaigmVbgRJ05pQ.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Fri, 14 Jan 2022 08:55:59 +0100
Organization: Aioe.org NNTP Server
Message-ID: <srraa8$1bfk$1@gioia.aioe.org>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
<srpjgj$541$1@dont-email.me>
<0cb445ab-978c-4812-8685-7bee7010b1c6n@googlegroups.com>
<srq8fv$tps$1@dont-email.me>
<b9df2205-389e-48d2-ba8f-fe0175190d30n@googlegroups.com>
<srqeir$4fl$1@dont-email.me>
<9eef7290-4980-48ea-87a8-61125f1a7b90n@googlegroups.com>
<f7439017-f3df-485c-8061-2190f5a55b6bn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="44532"; posting-host="To5nvU/sTaigmVbgRJ05pQ.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.10.2
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Fri, 14 Jan 2022 07:55 UTC

MitchAlsup wrote:
> On Thursday, January 13, 2022 at 6:41:05 PM UTC-6, MitchAlsup wrote:
>> On Thursday, January 13, 2022 at 6:02:39 PM UTC-6, Ivan Godard wrote:
>>> On 1/13/2022 2:37 PM, MitchAlsup wrote:
>>>>>>>
>>>>>>> Try it on my66: input is an array of request bit-sizes, an input array
>>>>>>> of bits, an output array of words, and a count. Unpack the consecutive
>>>>>>> bit-size fields from the bits into the words.
>>>>>>>
>>>>>>> Have fun.
>>>>>> <
>>>>>> Minor update: had to move a unit of arithmetic so the first field is performed properly.
>>>>>>
> This one puts the work into the container update, but requires 2 more registers.
> <
> void unpack( uchar_t size[], uint64_t packed[], uint64_t unpacked[], uint64_t count )
> {
> uint64_t len,
> bit=0,
> word=0,
> extract,
> mask,
> container1 = packed[0],
> container2 = packed[1];
>
> for( unsigned int i = 0; i < count; i++ )
> {
> len = size[i];
> mask = ~0u << len;
> unpacked[i] = container1 & ~mask;
> bit += len;
> { container2, container1 } >>= len; // this takes another register write port.
>
> if( bit >> 6 )
> {
> container2 = packed[++word] ;
> container1 |= container2 << (64-(bit&63)) // more background arithmetic
> container 2 = container2 >> (bits&63);
> }
> }
> }
> <
> The cruft n the update looks to contain more arithmetic than the other way,
> but might be faster. (MIGHT).
>

At this point, with typically small tokens, so (as you noted) lots of
them in a 64-bit register, it is better (in my experience) to actually
branch past the refill.

Don't you need a "bit &= 63" inside the refill, to bring it back down
under 64? Otherwise you'll repeat the block on every iteration?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: RISC-V vs. Aarch64

<1278da74-675b-43ec-9d9f-d3b7afe70a7bn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22969&group=comp.arch#22969

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:622a:14c8:: with SMTP id u8mr8377135qtx.267.1642181969278;
Fri, 14 Jan 2022 09:39:29 -0800 (PST)
X-Received: by 2002:a05:6830:318b:: with SMTP id p11mr6731167ots.129.1642181968926;
Fri, 14 Jan 2022 09:39:28 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 14 Jan 2022 09:39:28 -0800 (PST)
In-Reply-To: <srraa8$1bfk$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:d0c1:c15:6f54:9253;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:d0c1:c15:6f54:9253
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com> <4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com> <570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com> <5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me> <b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com> <srpjgj$541$1@dont-email.me>
<0cb445ab-978c-4812-8685-7bee7010b1c6n@googlegroups.com> <srq8fv$tps$1@dont-email.me>
<b9df2205-389e-48d2-ba8f-fe0175190d30n@googlegroups.com> <srqeir$4fl$1@dont-email.me>
<9eef7290-4980-48ea-87a8-61125f1a7b90n@googlegroups.com> <f7439017-f3df-485c-8061-2190f5a55b6bn@googlegroups.com>
<srraa8$1bfk$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1278da74-675b-43ec-9d9f-d3b7afe70a7bn@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 14 Jan 2022 17:39:29 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 62
 by: MitchAlsup - Fri, 14 Jan 2022 17:39 UTC

On Friday, January 14, 2022 at 1:55:55 AM UTC-6, Terje Mathisen wrote:
> MitchAlsup wrote:
> > On Thursday, January 13, 2022 at 6:41:05 PM UTC-6, MitchAlsup wrote:
> >> On Thursday, January 13, 2022 at 6:02:39 PM UTC-6, Ivan Godard wrote:
> >>> On 1/13/2022 2:37 PM, MitchAlsup wrote:
> >>>>>>>
> >>>>>>> Try it on my66: input is an array of request bit-sizes, an input array
> >>>>>>> of bits, an output array of words, and a count. Unpack the consecutive
> >>>>>>> bit-size fields from the bits into the words.
> >>>>>>>
> >>>>>>> Have fun.
> >>>>>> <
> >>>>>> Minor update: had to move a unit of arithmetic so the first field is performed properly.
> >>>>>>
> > This one puts the work into the container update, but requires 2 more registers.
> > <
> > void unpack( uchar_t size[], uint64_t packed[], uint64_t unpacked[], uint64_t count )
> > {
> > uint64_t len,
> > bit=0,
> > word=0,
> > extract,
> > mask,
> > container1 = packed[0],
> > container2 = packed[1];
> >
> > for( unsigned int i = 0; i < count; i++ )
> > {
> > len = size[i];
> > mask = ~0u << len;
> > unpacked[i] = container1 & ~mask;
> > bit += len;
> > { container2, container1 } >>= len; // this takes another register write port.
> >
> > if( bit >> 6 )
> > {
> > container2 = packed[++word] ;
> > container1 |= container2 << (64-(bit&63)) // more background arithmetic
> > container 2 = container2 >> (bits&63);
> > }
> > }
> > }
> > <
> > The cruft n the update looks to contain more arithmetic than the other way,
> > but might be faster. (MIGHT).
> >
> At this point, with typically small tokens, so (as you noted) lots of
> them in a 64-bit register, it is better (in my experience) to actually
> branch past the refill.
>
> Don't you need a "bit &= 63" inside the refill, to bring it back down
> under 64? Otherwise you'll repeat the block on every iteration?
<
The shift-count bits not being used ARE checked for significance, and raise
an exception if present. So, yes, I did this to myself. Although it can be
disabled, but I expect that some people wanting to do decoding will want
some kind of security and protection from out of domain values.
<
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: RISC-V vs. Aarch64

<eQkEJ.26889$Q11.6344@fx33.iad>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22970&group=comp.arch#22970

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!feeder.erje.net!border1.nntp.dca1.giganews.com!nntp.giganews.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx33.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com> <2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com> <570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com> <850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com> <5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com> <srnkf0$4cb$1@dont-email.me> <b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com> <srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org> <81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com> <srpjgj$541$1@dont-email.me> <0cb445ab-978c-4812-8685-7bee7010b1c6n@googlegroups.com> <srq8fv$tps$1@dont-email.me> <b9df2205-389e-48d2-ba8f-fe0175190d30n@googlegroups.com> <srqeir$4fl$1@dont-email.me> <9eef7290-4980-48ea-87a8-61125f1a7b90n@googlegroups.com> <f7439017-f3df-485c-8061-2190f5a55b6bn@googlegroups.com> <srraa8$1bfk$1@gioia.aioe.org> <1278da74-675b-43ec-9d9f-d3b7afe70a7bn@googlegroups.com>
In-Reply-To: <1278da74-675b-43ec-9d9f-d3b7afe70a7bn@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 26
Message-ID: <eQkEJ.26889$Q11.6344@fx33.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Fri, 14 Jan 2022 20:14:02 UTC
Date: Fri, 14 Jan 2022 15:13:44 -0500
X-Received-Bytes: 2695
X-Original-Bytes: 2644
 by: EricP - Fri, 14 Jan 2022 20:13 UTC

MitchAlsup wrote:
> On Friday, January 14, 2022 at 1:55:55 AM UTC-6, Terje Mathisen wrote:
>>
>> Don't you need a "bit &= 63" inside the refill, to bring it back down
>> under 64? Otherwise you'll repeat the block on every iteration?
> <
> The shift-count bits not being used ARE checked for significance, and raise
> an exception if present. So, yes, I did this to myself. Although it can be
> disabled, but I expect that some people wanting to do decoding will want
> some kind of security and protection from out of domain values.

My 66000 INS Insert Bitfield instruction could merge the shift
field width into the offset and chop off any high order width bits.
However my copy of the instruction manual doesn't show INS accepting
immediate w and o counts as other bitfield instructions, just register.

If DST reg was srcDst then SRC3 register wouldn't be needed
and it could have a 12-bit immediate w and o like others.

Also a CARRY with INS could allow inserting into a field
straddling two registers.

This would allow "INS rShiftCtrl,rNewWidth,6,32" to overwrite the
width field at offset 32 in rShiftCtrl with a new 6 bit width.

Re: RISC-V vs. Aarch64

<58928de3-b78f-470e-b6ab-07af215cef46n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22971&group=comp.arch#22971

  copy link   Newsgroups: comp.arch
X-Received: by 2002:ad4:5e87:: with SMTP id jl7mr8571011qvb.130.1642192083216;
Fri, 14 Jan 2022 12:28:03 -0800 (PST)
X-Received: by 2002:a05:6808:1987:: with SMTP id bj7mr91919oib.37.1642192082958;
Fri, 14 Jan 2022 12:28:02 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!feeder.erje.net!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 14 Jan 2022 12:28:02 -0800 (PST)
In-Reply-To: <eQkEJ.26889$Q11.6344@fx33.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:d0c1:c15:6f54:9253;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:d0c1:c15:6f54:9253
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com> <570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com> <5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me> <b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com> <srpjgj$541$1@dont-email.me>
<0cb445ab-978c-4812-8685-7bee7010b1c6n@googlegroups.com> <srq8fv$tps$1@dont-email.me>
<b9df2205-389e-48d2-ba8f-fe0175190d30n@googlegroups.com> <srqeir$4fl$1@dont-email.me>
<9eef7290-4980-48ea-87a8-61125f1a7b90n@googlegroups.com> <f7439017-f3df-485c-8061-2190f5a55b6bn@googlegroups.com>
<srraa8$1bfk$1@gioia.aioe.org> <1278da74-675b-43ec-9d9f-d3b7afe70a7bn@googlegroups.com>
<eQkEJ.26889$Q11.6344@fx33.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <58928de3-b78f-470e-b6ab-07af215cef46n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 14 Jan 2022 20:28:03 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 42
 by: MitchAlsup - Fri, 14 Jan 2022 20:28 UTC

On Friday, January 14, 2022 at 2:14:07 PM UTC-6, EricP wrote:
> MitchAlsup wrote:
> > On Friday, January 14, 2022 at 1:55:55 AM UTC-6, Terje Mathisen wrote:
> >>
> >> Don't you need a "bit &= 63" inside the refill, to bring it back down
> >> under 64? Otherwise you'll repeat the block on every iteration?
> > <
> > The shift-count bits not being used ARE checked for significance, and raise
> > an exception if present. So, yes, I did this to myself. Although it can be
> > disabled, but I expect that some people wanting to do decoding will want
> > some kind of security and protection from out of domain values.
<
> My 66000 INS Insert Bitfield instruction could merge the shift
> field width into the offset and chop off any high order width bits.
> However my copy of the instruction manual doesn't show INS accepting
> immediate w and o counts as other bitfield instructions, just register.
<
Table 14: 3-operand Constant Specification
The above table shows when {I, S1, S2} = 0b111 that the shift count (Rs2)
constant is provided by a 64-bit immediate. These tables work across the
instruction groups to provide for the need (that is not just for FP, but for
any instructions in the group.)
<
Whether I should put another INS instruction with access to the 12-bit shift
immediates is another discussion; there is room.
>
> If DST reg was srcDst then SRC3 register wouldn't be needed
> and it could have a 12-bit immediate w and o like others.
<
But nothing else in the ISA would follow this self-modifying-register rule.
>
> Also a CARRY with INS could allow inserting into a field
> straddling two registers.
<
Perhaps.
>
> This would allow "INS rShiftCtrl,rNewWidth,6,32" to overwrite the
> width field at offset 32 in rShiftCtrl with a new 6 bit width.
<
Since the location is so fixed (bits<37:32>) one could place a constant in
a register (immed = <6:32>) and use this to insert len into the proper position.
One could also use the 64-bit immediate, and this might take 1 instruction
out of the loop.

Re: RISC-V vs. Aarch64

<33c41d40-0055-48df-bd90-7b342203d219n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22972&group=comp.arch#22972

  copy link   Newsgroups: comp.arch
X-Received: by 2002:ac8:5a07:: with SMTP id n7mr9741220qta.197.1642200715710; Fri, 14 Jan 2022 14:51:55 -0800 (PST)
X-Received: by 2002:a05:6830:318b:: with SMTP id p11mr7575777ots.129.1642200715422; Fri, 14 Jan 2022 14:51:55 -0800 (PST)
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 14 Jan 2022 14:51:55 -0800 (PST)
In-Reply-To: <1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:d0c1:c15:6f54:9253; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:d0c1:c15:6f54:9253
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <2021Dec31.203710@mips.complang.tuwien.ac.at> <KC_zJ.59028$Ak2.12921@fx20.iad> <86h7agvxun.fsf@linuxsc.com> <M4_BJ.140002$lz3.547@fx34.iad> <f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com> <srag0i$2ed$2@dont-email.me> <00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com> <2022Jan8.101413@mips.complang.tuwien.ac.at> <7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com> <ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com> <4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com> <2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com> <570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com> <850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com> <5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com> <srnkf0$4cb$1@dont-email.me> <b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com> <srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org> <81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com> <srpjgj$541$1@dont-email.me> <1e0b0ba3-e11c-4a14-a0c7-7c074f2f9ba7n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <33c41d40-0055-48df-bd90-7b342203d219n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 14 Jan 2022 22:51:55 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 194
 by: MitchAlsup - Fri, 14 Jan 2022 22:51 UTC

On Thursday, January 13, 2022 at 11:53:45 AM UTC-6, MitchAlsup wrote:
> On Thursday, January 13, 2022 at 10:20:39 AM UTC-6, Ivan Godard wrote:
> > On 1/13/2022 7:54 AM, MitchAlsup wrote:
> > > On Thursday, January 13, 2022 at 6:07:29 AM UTC-6, Terje Mathisen wrote:
> > >> Ivan Godard wrote:
> > >>> On 1/12/2022 2:46 PM, MitchAlsup wrote:
> > >>>> On Wednesday, January 12, 2022 at 4:24:36 PM UTC-6, Ivan Godard wrote:
> > >>>>> On 1/12/2022 1:24 PM, MitchAlsup wrote:
> > >>>>>> On Monday, January 10, 2022 at 11:36:01 AM UTC-6, Quadibloc wrote:
> > >>>>>>> On Saturday, January 8, 2022 at 6:16:14 PM UTC-7, MitchAlsup wrote:
> > >>>>>>>
> > >>>>>>>> So, what does (13 << -7) mean ?
> > >>>>>>> From the replies, I see that this is a completely different
> > >>>>>>> can of worms. I would tend to favor the VAX interpretation,
> > >>>>>>> but it's unclear to me that it would be worth the extra
> > >>>>>>> run-time overhead that it might imply.
> > >>>>>> <
> > >>>>>> The VAX interpretation does not allow for shifts to be used as
> > >>>>>> bit-manipulation instructions (extract signed and unsigned).
> > >>>> <
> > >>>>> That turns into a rotate and a shft right. But isn't that what a
> > >>>>> hardware EXTR is going to do anyway?
> > >>>> <
> > >>>> Technically an extract is:
> > >>>> <
> > >>>> r = ((c << (containerWidth - fieldWidth - offset)) >>
> > >>>> (containerWidth-fieldWidth));
> > >>>> <
> > >>>> However, what we do in HW is:
> > >>>> <
> > >>>> m = tableM[fieldWidth]; // mask m:: often done
> > >>>> with table
> > >>>> s = ~0u << (fieldWidth+offset); // sign extension bits ::
> > >>>> often done with table
> > >>>> t = (c >> offset) & m | ( signed || (c & ( 1 <<
> > >>>> (fieldWidth+offset)) ? s : 0);
> > >>>> <
> > >>>> as this uses 1 shifter and either tables or greater-than decoders.
> > >>>> <
> > >>>> But the part I was hinting at is that we have a large container for
> > >>>> this shift
> > >>>> count and we need only a few bits on one end. So, in My 66000, the
> > >>>> upper ½
> > >>>> of the container is allowed to contain a value 0..64 (where both 0 and 64
> > >>>> mean 64-bit field width).
> > >>>> <
> > >>>> Done this way, shifts are degenerate subsets of EXT.
> > >>>> <
> > >>>> If you use the sign bit<63> to control shift direction, you probably
> > >>>> should not
> > >>>> be using bits<37:32> as the field width because pasting the fieldWidth
> > >>>> into
> > >>>> such a container is simply harder.
> > >>>> <
> > >>>> I learned the hard way about placing both containers too close together
> > >>>> in the Mc 88100.
> > >>>
> > >>> Hmm. So the container is a field descriptor. But it's not actually a
> > >>> bitRow, because it can't cross word boundaries - more like a PDP-10 PLT
> > >>> IIRC. Gives you some encoding entropy, but not actually any new
> > >>> semantic, and you have to build the descriptor, which you avoid when
> > >>> offset and width are separate arguments.
> > >>>
> > >>> Where ISAs really fall down is parsing a bit stream: grab a dynamic
> > >>> number of bits off the front of a bit stream, advancing the stream; word
> > >>> boundaries are not significant. The problem is that HW provides word
> > >>> streams (loop/load) and mapping that to a bit stream is nasty. The logic
> > >>> is the same as mapping a line stream into a (VL) instruction stream in
> > >>> the decoder's instruction buffer, but how to represent that in an ISA?
> > >> We did use to have something like that back in the PDP days with the
> > >> variable-sized load byte opcode that would automatically step forward to
> > >> the next work if needed.
> > >>
> > >> Today we would need a separate hw mechanism for that bitstream buffer,
> > >> and probably a filling mechanism which either would need to be
> > >> hardware-only or possibly a FILL_IF_ROOM target,offs,[src] opcode that
> > >> would load the next (8/16/32 bits into the target register at offs bit
> > >> offset, updating the OFFS reg and SRC pointer, but only if there was
> > >> room, otherwise it is a NOP. The main problem is the need to update all
> > >> three register operands. :-(
> > >>
> > >> Having it would however make it far easier to handle arbitrary bit
> > >> streams, avoiding the current need for branchy code.
> > > <
> > > What is wrong with having a container 2× as large as the largest bit-field
> > > then use a shift-double to position the bits at an appropriate
> > > place for extraction and then decoding/encoding.
> > What's wrong is the need to conditionally refill and then merge the
> > loaded value into the shift pair. The condition is totally
> > unpredictable, so you get a miss every <word size>/<average request
> > size> cycles. We have predicated loads and isomorphic shifts, but have
> > only enough slots to do a reasonable job on a Gold.
> >
> > Try it on my66: input is an array of request bit-sizes, an input array
> > of bits, an output array of words, and a count. Unpack the consecutive
> > bit-size fields from the bits into the words.
> >
> > Have fun.
> <
> I came up with this in 5 minutes::
> This assumes the input bit-length selector is an vector of characters and that the
> chars contain values from {1..64}
> <
> void unpack( uchar_t size[], uint64_t packed[], uint64_t unpacked[], uint64_t count )
> {
> uint64_t len,
> bit=0,
> word=0,
> extract,
> container1 = packed[0],
> container2 = packed[1];
>
> for( unsigned int i = 0; i < count; i++ )
> {
> len = size[i];
> bit += len;
> extract = ( len << 32 ) | ( bit & 0x3F );
> if( word != bit >> 6 )
> {
> container1 = container2;
> container2 = packed[++word];
> }
> unpacked[i] = {container2, container1} >> extract;
> }
> }
> <
> This translates into pretty nice My 66000 ISA:
> <
> ENTRY unpack
> unpack:
> MOV R5,#0
> MOV R6,#0
> LDD R7,[R2]
> LDD R8,[R2+8]
> MOV R9,#0
> loop:
> LDUB R10,[R1+R9]
> ADD R5,R5,R10
> AND R11,R5,#63
> SL R12,R10,#32
> OR R11,R11,R12
> SR R12,R6,#6
> CMP R11,R6,R12
> PEQ R11,{111}
> ADD R6,R6,#1
> MOV R7,R8
> LDD R8,[R2+R6<<3]
> CARRY R8,{{I}}
> SL R12,R7,R11
<
The above should be an SR (shift right) instruction.
<
> STD R12,[R3+R9<<3]
> ADD R9,R9,#1
> CMP R11,R9,R4
> BLT R11,loop
> RET
> <
> Well at least straightforwardly.

Re: RISC-V vs. Aarch64

<srt6p1$ur9$1@dont-email.me>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22973&group=comp.arch#22973

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Fri, 14 Jan 2022 17:07:46 -0800
Organization: A noiseless patient Spider
Lines: 49
Message-ID: <srt6p1$ur9$1@dont-email.me>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srp4lu$hhg$1@gioia.aioe.org>
<81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
<srpjgj$541$1@dont-email.me>
<0cb445ab-978c-4812-8685-7bee7010b1c6n@googlegroups.com>
<srq8fv$tps$1@dont-email.me>
<b9df2205-389e-48d2-ba8f-fe0175190d30n@googlegroups.com>
<srqeir$4fl$1@dont-email.me>
<9eef7290-4980-48ea-87a8-61125f1a7b90n@googlegroups.com>
<f7439017-f3df-485c-8061-2190f5a55b6bn@googlegroups.com>
<srraa8$1bfk$1@gioia.aioe.org>
<1278da74-675b-43ec-9d9f-d3b7afe70a7bn@googlegroups.com>
<eQkEJ.26889$Q11.6344@fx33.iad>
<58928de3-b78f-470e-b6ab-07af215cef46n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 15 Jan 2022 01:07:45 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="5dc20c94e0b42d014f20cc012d9d41e9";
logging-data="31593"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+JyY3UKiIIV2zXv1UQt2+z"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Cancel-Lock: sha1:/LWj+iap9zeaI+Yi7h7KKprZkVA=
In-Reply-To: <58928de3-b78f-470e-b6ab-07af215cef46n@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Sat, 15 Jan 2022 01:07 UTC

On 1/14/2022 12:28 PM, MitchAlsup wrote:
> On Friday, January 14, 2022 at 2:14:07 PM UTC-6, EricP wrote:
>> MitchAlsup wrote:
>>> On Friday, January 14, 2022 at 1:55:55 AM UTC-6, Terje Mathisen wrote:
>>>>
>>>> Don't you need a "bit &= 63" inside the refill, to bring it back down
>>>> under 64? Otherwise you'll repeat the block on every iteration?
>>> <
>>> The shift-count bits not being used ARE checked for significance, and raise
>>> an exception if present. So, yes, I did this to myself. Although it can be
>>> disabled, but I expect that some people wanting to do decoding will want
>>> some kind of security and protection from out of domain values.
> <
>> My 66000 INS Insert Bitfield instruction could merge the shift
>> field width into the offset and chop off any high order width bits.
>> However my copy of the instruction manual doesn't show INS accepting
>> immediate w and o counts as other bitfield instructions, just register.
> <
> Table 14: 3-operand Constant Specification
> The above table shows when {I, S1, S2} = 0b111 that the shift count (Rs2)
> constant is provided by a 64-bit immediate. These tables work across the
> instruction groups to provide for the need (that is not just for FP, but for
> any instructions in the group.)
> <
> Whether I should put another INS instruction with access to the 12-bit shift
> immediates is another discussion; there is room.
>>
>> If DST reg was srcDst then SRC3 register wouldn't be needed
>> and it could have a 12-bit immediate w and o like others.
> <
> But nothing else in the ISA would follow this self-modifying-register rule.
>>
>> Also a CARRY with INS could allow inserting into a field
>> straddling two registers.
> <
> Perhaps.
>>
>> This would allow "INS rShiftCtrl,rNewWidth,6,32" to overwrite the
>> width field at offset 32 in rShiftCtrl with a new 6 bit width.
> <
> Since the location is so fixed (bits<37:32>) one could place a constant in
> a register (immed = <6:32>) and use this to insert len into the proper position.
> One could also use the 64-bit immediate, and this might take 1 instruction
> out of the loop.

Doesn't the use of combined extract arguments preclude use by your
vector facility?

Also, how does the vector interact with the projected predicate?

Re: RISC-V vs. Aarch64

<ddb8a88d-7fbf-4ab4-b796-e9c6efe70d57n@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22974&group=comp.arch#22974

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a05:620a:1a24:: with SMTP id bk36mr8395019qkb.513.1642217501608;
Fri, 14 Jan 2022 19:31:41 -0800 (PST)
X-Received: by 2002:a05:6830:409d:: with SMTP id x29mr8985773ott.112.1642217501274;
Fri, 14 Jan 2022 19:31:41 -0800 (PST)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!2.eu.feeder.erje.net!feeder.erje.net!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 14 Jan 2022 19:31:41 -0800 (PST)
In-Reply-To: <srt6p1$ur9$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8953:b3:20bc:ff2a;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8953:b3:20bc:ff2a
References: <2021Dec24.163843@mips.complang.tuwien.ac.at> <2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com> <850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com> <srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com> <srp3n0$e0a$1@dont-email.me>
<srp4lu$hhg$1@gioia.aioe.org> <81c0ddc6-4b46-4b3d-b64b-e65963889214n@googlegroups.com>
<srpjgj$541$1@dont-email.me> <0cb445ab-978c-4812-8685-7bee7010b1c6n@googlegroups.com>
<srq8fv$tps$1@dont-email.me> <b9df2205-389e-48d2-ba8f-fe0175190d30n@googlegroups.com>
<srqeir$4fl$1@dont-email.me> <9eef7290-4980-48ea-87a8-61125f1a7b90n@googlegroups.com>
<f7439017-f3df-485c-8061-2190f5a55b6bn@googlegroups.com> <srraa8$1bfk$1@gioia.aioe.org>
<1278da74-675b-43ec-9d9f-d3b7afe70a7bn@googlegroups.com> <eQkEJ.26889$Q11.6344@fx33.iad>
<58928de3-b78f-470e-b6ab-07af215cef46n@googlegroups.com> <srt6p1$ur9$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ddb8a88d-7fbf-4ab4-b796-e9c6efe70d57n@googlegroups.com>
Subject: Re: RISC-V vs. Aarch64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sat, 15 Jan 2022 03:31:41 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 61
 by: MitchAlsup - Sat, 15 Jan 2022 03:31 UTC

On Friday, January 14, 2022 at 7:07:49 PM UTC-6, Ivan Godard wrote:
> On 1/14/2022 12:28 PM, MitchAlsup wrote:
> > On Friday, January 14, 2022 at 2:14:07 PM UTC-6, EricP wrote:
> >> MitchAlsup wrote:
> >>> On Friday, January 14, 2022 at 1:55:55 AM UTC-6, Terje Mathisen wrote:
> >>>>
> >>>> Don't you need a "bit &= 63" inside the refill, to bring it back down
> >>>> under 64? Otherwise you'll repeat the block on every iteration?
> >>> <
> >>> The shift-count bits not being used ARE checked for significance, and raise
> >>> an exception if present. So, yes, I did this to myself. Although it can be
> >>> disabled, but I expect that some people wanting to do decoding will want
> >>> some kind of security and protection from out of domain values.
> > <
> >> My 66000 INS Insert Bitfield instruction could merge the shift
> >> field width into the offset and chop off any high order width bits.
> >> However my copy of the instruction manual doesn't show INS accepting
> >> immediate w and o counts as other bitfield instructions, just register.
> > <
> > Table 14: 3-operand Constant Specification
> > The above table shows when {I, S1, S2} = 0b111 that the shift count (Rs2)
> > constant is provided by a 64-bit immediate. These tables work across the
> > instruction groups to provide for the need (that is not just for FP, but for
> > any instructions in the group.)
> > <
> > Whether I should put another INS instruction with access to the 12-bit shift
> > immediates is another discussion; there is room.
> >>
> >> If DST reg was srcDst then SRC3 register wouldn't be needed
> >> and it could have a 12-bit immediate w and o like others.
> > <
> > But nothing else in the ISA would follow this self-modifying-register rule.
> >>
> >> Also a CARRY with INS could allow inserting into a field
> >> straddling two registers.
> > <
> > Perhaps.
> >>
> >> This would allow "INS rShiftCtrl,rNewWidth,6,32" to overwrite the
> >> width field at offset 32 in rShiftCtrl with a new 6 bit width.
> > <
> > Since the location is so fixed (bits<37:32>) one could place a constant in
> > a register (immed = <6:32>) and use this to insert len into the proper position.
> > One could also use the 64-bit immediate, and this might take 1 instruction
> > out of the loop.
<
> Doesn't the use of combined extract arguments preclude use by your
> vector facility?
<
No
>
> Also, how does the vector interact with the projected predicate?
<
The data dependencies that permeate the predicated block and cause the
loop to take more cycles than the required arithmetic requires. That is: the
data dependencies within the predicated block are themselves dependent
on the predicating condition.
<
But what you are getting at is to reformulate the loop so that as long as the
containers are not fully consumed, stay in one loop. Then when one needs
to switch containers, exit the vectorized loop, update the containers and
repeat until done.

Re: RISC-V vs. Aarch64

<sru612$g7c$1@newsreader4.netcologne.de>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=22975&group=comp.arch#22975

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!news.swapon.de!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd4-df7a-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: RISC-V vs. Aarch64
Date: Sat, 15 Jan 2022 10:01:06 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sru612$g7c$1@newsreader4.netcologne.de>
References: <2021Dec24.163843@mips.complang.tuwien.ac.at>
<2021Dec31.203710@mips.complang.tuwien.ac.at>
<KC_zJ.59028$Ak2.12921@fx20.iad> <86h7agvxun.fsf@linuxsc.com>
<M4_BJ.140002$lz3.547@fx34.iad>
<f91f3db8-640e-4c10-b0f7-61c7085b70c8n@googlegroups.com>
<srag0i$2ed$2@dont-email.me>
<00add816-93d7-4763-a68b-33a67db6d770n@googlegroups.com>
<2022Jan8.101413@mips.complang.tuwien.ac.at>
<7557bf3a-61ce-4500-8cf8-ced2dbed7087n@googlegroups.com>
<ad2ee700-b604-4565-9e24-3386580b90c8n@googlegroups.com>
<4d2fbc82-af69-4388-bfa5-e3b2be652744n@googlegroups.com>
<2e706405-006a-49bb-8e8a-f634d749205en@googlegroups.com>
<570acc73-a5da-497f-8ec4-810150e0a9f1n@googlegroups.com>
<850b7681-204a-4df6-9095-cd6ee816a7d5n@googlegroups.com>
<5ea00397-5572-4fbd-bfb3-85c3554f1eb9n@googlegroups.com>
<srnkf0$4cb$1@dont-email.me>
<b4e98991-4fb9-4ef7-a831-430c3fc10145n@googlegroups.com>
<srp3n0$e0a$1@dont-email.me> <srq01m$qvi$1@newsreader4.netcologne.de>
<srr7rr$e7d$1@gioia.aioe.org>
Injection-Date: Sat, 15 Jan 2022 10:01:06 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd4-df7a-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd4:df7a:0:7285:c2ff:fe6c:992d";
logging-data="16620"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sat, 15 Jan 2022 10:01 UTC

Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
> Thomas Koenig wrote:
>> Ivan Godard <ivan@millcomputing.com> schrieb:
>>
>>> Where ISAs really fall down is parsing a bit stream: grab a dynamic
>>> number of bits off the front of a bit stream, advancing the stream; word
>>> boundaries are not significant. The problem is that HW provides word
>>> streams (loop/load) and mapping that to a bit stream is nasty. The logic
>>> is the same as mapping a line stream into a (VL) instruction stream in
>>> the decoder's instruction buffer, but how to represent that in an ISA?
>>
>> The same way that a vector instruction would be represented?
>>
>> Vectors could be made to operate on sub-word quantities such as bytes,
>> with microarchitectural SIMD underneath.
>>
>> I have to confess I do not know much of how compression and
>> decompression algorithms work. What are the operations that need
>> to be done with the chunk of bits that is grabbed?
>>
> That varies a _lot_! Absolute worst case in my experience is the CABAC
> option for h264: Content-Adaptive-Binary-Arithmetic-Coding
>
> Here you normally extract single bits from the input stream, then you
> immediately branch on the value of that bit (by definition completely
> unpredictable, right?) to two separate code paths that really cannot be
> combined into a single branchless/predicated stream.

Interesting. You can do it in hardware in parallel, then select
the result. I would assume that the workloads of the two branches
are roughly the same?

A general purpose CPU could do something similar. A superscalar
in-order architecture could distribute the work between its
pipelines. If the ISA has a way to specify separate execution
units, so that (in SSA form)

if (condition) {
foo_1 = some work;
}
else {
foo_2 = some other work;
}
foo = PHI (foo_1, foo_2);

this could be implemented efficiently even on a superscalar
in-order machine.

I know this can be done with Mitch's shadow modifier for a limited
number of instructions, and I also suspect that I am just describing
a feature of the Mill.

Pages:123456789101112131415
server_pubkey.txt

rocksolid light 0.9.81
clearnet tor