

Re: Proposal for Single instructions for string library functions on My 66000

<a638bd60-15a0-4838-85b5-cfb5eaf1849cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18074&group=comp.arch#18074
X-Received: by 2002:a37:71c1:: with SMTP id m184mr4228627qkc.367.1624668973219;
Fri, 25 Jun 2021 17:56:13 -0700 (PDT)
X-Received: by 2002:a54:4004:: with SMTP id x4mr4561071oie.44.1624668972955;
Fri, 25 Jun 2021 17:56:12 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 25 Jun 2021 17:56:12 -0700 (PDT)
In-Reply-To: <sb5t0g$jrq$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:a93f:cb3c:8a11:3663;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:a93f:cb3c:8a11:3663
References: <sar8dp$d9$1@dont-email.me> <sarihq$hv5$1@dont-email.me>
<sasvif$1df$1@dont-email.me> <sb5t0g$jrq$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a638bd60-15a0-4838-85b5-cfb5eaf1849cn@googlegroups.com>
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sat, 26 Jun 2021 00:56:13 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Sat, 26 Jun 2021 00:56 UTC

On Friday, June 25, 2021 at 7:39:47 PM UTC-5, BGB wrote:
> On 6/22/2021 10:28 AM, Stephen Fuld wrote:
> > On 6/21/2021 7:39 PM, Brian G. Lucas wrote:
> >> On 6/21/21 6:47 PM, Stephen Fuld wrote:
> >
> > big snip
> >
> >>> Let me conclude by re-emphasizing that this whole idea (single
> >>> instructions for string functions) might not make sense, or might not
> >>> be worthwhile, or it might be the wrong way to implement the
> >>> functionality, etc. But I want to present it to get reactions and
> >>> potential improvements.
> >>>
> >>>
> >> The MM instruction is widely useful no matter what the source language
> >> is.
> >>
> >> IMHO implementing the C string library instructions is "preparing for
> >> the previous war". I think we need to wait until we see what happens
> >> with Rust (and perhaps Go and others) and determine what string (or
> >> array) primitives are hot spots in applications written in more modern
> >> languages.
> >
> > Certainly a valid point, although there will certainly be a huge amount
> > of C/C++ code (and Java and other popular languages that have similar
> > functions) in use for a long time.
> >
> > I spent a little time looking at the Rust book to see if what I proposed
> > was applicable. It seems it might be, though more of the functionality
> > is in the language proper rather than in a library. It would take more
> > study or someone more versed in Rust than I am to know for sure. I
> > haven't looked at Go at all.
> >
> Expecting much standardization between these languages is probably a stretch.
>
>
> Though, one possibility for strings could be (for bare character pointers):
> Pointer points at start of string data;
> String data ends in NUL byte.
>
> However, this is not the end of the story:
> str[-1]==0, Plain Null terminated string
> We are pointing at the start.
> str[-1]==01..7F, We are pointing somewhere in the string interior;
> str[-1]==80..BF, We are pointing somewhere in the string interior;
> str[-1]==C0..EF && str[0]==80..BF, String Interior.
> str[-1]==C0..EF && str[0]!=80..BF, Start of String, Reverse VLN.
> ...
>
> Reverse VLN:
> 80..BF: Length (0000..003F)
> C0..DF: Length (0040..07FF)
> E0..EF: Length (0800..FFFF)
> ...
>
> Essentially, Reverse VLN is like a UTF-8 codepoint but encoded
> backwards. The Reverse VLN is preceded either by a meta-type-tag
> or by a NUL byte.
<
This reminds me of using IBM TSS/360, where we would pad file names
with spaces ("file ") so that the disk sweeper could not come along and
remove the files (without being told to do so explicitly, with the file name
in quotation marks and the proper number of spaces!)
<
>
>
> One advantage of these strings is that they are partially backwards
> compatible with C strings, but can allow some more capabilities (such as
> scanning backwards to find the start of a string if given a pointer to
> its interior).
>
> Unlike plain C strings, they would require double-ended termination, but
> for string tables, the start and end terminators between adjacent short
> strings could be merged.
>
>
> Possibly, strings longer than a certain minimum could be encoded by
> default with a length prefix. This format will assume that the character
> data is stored as either ASCII or UTF-8.
>
> A pair of NUL bytes could also encode the start or end marker of a
> string table.
>
>
> A similar scheme can be used for UTF-16 strings but with backwards
> surrogate pairs or similar as the start-of-string length marker.
>
>
> As for whether special string instructions belong in an ISA, I don't
> personally believe so. Packed byte-compare comes close, but arguably has
> other uses as well.
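
For readers following the quoted scheme, here is a minimal C sketch of the pointer classification table and a decoder for the length prefix. The names (classify, decode_reverse_vln, enum str_pos) are made up for illustration. The byte order is an assumption: the lead byte (C0..EF) is taken to sit at str[-1], with its continuation bytes before it, most significant nearest the lead byte, i.e. a UTF-8 sequence stored in reverse. The one-byte 80..BF form is left out, because the quoted table treats str[-1] in 80..BF as string interior, and F0..FF is reported as unknown since the post elides it.

```c
#include <stddef.h>

enum str_pos { STR_START_NUL, STR_INTERIOR, STR_START_VLN, STR_UNKNOWN };

/* True for a UTF-8 style continuation byte (80..BF). */
static int is_cont(unsigned char b)
{
    return b >= 0x80 && b <= 0xBF;
}

/* Classify pointer p from p[-1] and p[0], following the quoted table.
 * The caller must guarantee that p[-1] is readable. */
static enum str_pos classify(const unsigned char *p)
{
    unsigned char prev = p[-1];

    if (prev == 0x00)
        return STR_START_NUL;            /* plain NUL-terminated string */
    if (prev <= 0xBF)
        return STR_INTERIOR;             /* 01..7F and 80..BF */
    if (prev <= 0xEF)                    /* C0..EF: depends on p[0] */
        return is_cont(p[0]) ? STR_INTERIOR : STR_START_VLN;
    return STR_UNKNOWN;                  /* F0..FF: not given in the post */
}

/* Decode the length prefix that ends at p[-1] (assumed byte order, see
 * above). Returns the length, or -1 if the bytes do not form a prefix. */
static long decode_reverse_vln(const unsigned char *p)
{
    unsigned char lead = p[-1];

    if (lead >= 0xC0 && lead <= 0xDF) {  /* two-byte form */
        if (!is_cont(p[-2]))
            return -1;
        return ((long)(lead & 0x1F) << 6) | (long)(p[-2] & 0x3F);
    }
    if (lead >= 0xE0 && lead <= 0xEF) {  /* three-byte form */
        if (!is_cont(p[-2]) || !is_cont(p[-3]))
            return -1;
        return ((long)(lead & 0x0F) << 12)
             | ((long)(p[-2] & 0x3F) << 6)
             | (long)(p[-3] & 0x3F);
    }
    return -1;
}
```

Under these assumptions, a string preceded by the bytes 00 A3 C4 would classify as STR_START_VLN and decode to length 0x123.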

Re: Proposal for Single instructions for string library functions on My 66000

<sbfq9k$9vm$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18205&group=comp.arch#18205
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Tue, 29 Jun 2021 11:54:43 -0700
Organization: A noiseless patient Spider
Lines: 66
Message-ID: <sbfq9k$9vm$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me>
<sat4ue$r57$3@newsreader4.netcologne.de> <savk6i$o5m$1@dont-email.me>
<dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me>
<77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 29 Jun 2021 18:54:44 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="5c5e1cb849a7487e5a5af0cde2059592";
logging-data="10230"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/sic0KcaFw1jEWd8MpmsmQAPlssLrjbA0="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:8VTvF8U2L0rD/qjuLGhRxtyShQI=
In-Reply-To: <77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
Content-Language: en-US
 by: Stephen Fuld - Tue, 29 Jun 2021 18:54 UTC

On 6/23/2021 4:49 PM, MitchAlsup wrote:
> On Wednesday, June 23, 2021 at 12:24:22 PM UTC-5, Stephen Fuld wrote:
>> On 6/23/2021 8:44 AM, MitchAlsup wrote:

snip

>>> Note: The LOOP instruction in My 66000 was designed to deal with both
>>> counted and null terminated loops (and a few more).
>> Yes. That was one of the things I was talking about in my original post
>> as "existing logic".
>>> It seems to me that
>>> providing all of the instructions one can synthesize with VVM and My
>>> 66000 instructions would consume a lot of space.
>> Sure, but no one is proposing that! The question is whether there is a
>> subset of "all" that is worthwhile? So far, there is, consisting of a
>> single member, MM. My question is, are there more?
>>> <
>>> Secondly: Using VVM one is running in the 8-32 I/C range on not-that-wide
>>> implementations. So it is hard to see direct HW implementations at the
>>> instruction level running "that much faster"; most of the loops are governed
>>> by cache access width (in both VVM and direct HW implementations).
> <
>> I understand. Thus my surprise that you implemented MM. The fact that
>> you did led to my trying to see if it was worthwhile to go further.
> <
> MM made the list, after much consideration, mainly because it is a much more
> compact way to move stuff around in memory without having to pass through
> registers. LDM and STM made the cut but are so underutilized it would cause
> no undue harm to remove them--the vast majority of the LDM/STM uses are
> performed with ENTER and EXIT.

I have thought about making STM/LDM "variants" of Enter/Exit, but you
need another register specifier to hold the memory address. If the goal
is to eliminate the LDM/STM op codes, and they are infrequently used, I
suppose you could precede Enter/Exit with a Carry meta instruction that
indicates what register contains the memory address. I am not sure how
much saving the two op-codes is worth.

>> As I said, ISTM that the main advantages of a single instruction are
>> lower cost "start up" (and resume after interrupt), and lower
>> memory/I-cache usage. Once you are up and going, I agree that there is
>> essentially no advantage.
> <
> MM, ENTER, and EXIT made the cut for code density reasons.

So the second of my reasons above (the memory/I-cache usage). Fine. I
think you are saying that MM would occur often enough that the savings
justify the cost. And by omitting the others, that they do not occur
often enough. You may be right. As I said in the OP, I don't have any
good statistics.

My guess is that perhaps the next most used instruction would be
essentially a variant of strchr that optionally had an n character
limit. Besides strchr and memchr, this provides strlen by making the
searched for character a null and n very large, allows the "outer loop"
of the nested loop functions to use VVM for a big savings on those, and
speeds up the first part of strcat and strncat.
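
To make the intended semantics concrete, here is a rough C model of such a primitive. It is only a sketch: the name scan_char, the stop_at_nul flag, and the wrapper names are hypothetical, and nothing here describes an actual My 66000 instruction. The point is just that one bounded scan can express strchr, memchr, and strlen.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical primitive: scan at most n bytes of s for byte c.
 * If stop_at_nul is set, the scan also ends at the first NUL.
 * Returns a pointer to the match, or NULL if there is none. */
static const char *scan_char(const char *s, int c, size_t n, bool stop_at_nul)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned char target = (unsigned char)c;

    for (size_t i = 0; i < n; i++) {
        if (p[i] == target)
            return s + i;            /* match wins, even when c is NUL */
        if (stop_at_nul && p[i] == '\0')
            return NULL;
    }
    return NULL;
}

/* strchr: unbounded count, stop at NUL. */
static const char *sketch_strchr(const char *s, int c)
{
    return scan_char(s, c, SIZE_MAX, true);
}

/* memchr: bounded count, NUL is just another byte. */
static const char *sketch_memchr(const char *s, int c, size_t n)
{
    return scan_char(s, c, n, false);
}

/* strlen: search for NUL with n "very large". */
static size_t sketch_strlen(const char *s)
{
    return (size_t)(scan_char(s, '\0', SIZE_MAX, true) - s);
}
```

The first part of strcat would then be the same scan as sketch_strlen, followed by the copy.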

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Proposal for Single instructions for string library functions on My 66000

<656cf53f-7e36-4b27-b916-2033404537d7n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18209&group=comp.arch#18209
X-Received: by 2002:a37:ad16:: with SMTP id f22mr13757826qkm.160.1624998340316;
Tue, 29 Jun 2021 13:25:40 -0700 (PDT)
X-Received: by 2002:aca:de05:: with SMTP id v5mr326175oig.157.1624998340120;
Tue, 29 Jun 2021 13:25:40 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Jun 2021 13:25:39 -0700 (PDT)
In-Reply-To: <sbfq9k$9vm$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1a6:66b6:1520:df39;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1a6:66b6:1520:df39
References: <sar8dp$d9$1@dont-email.me> <sas8g4$77t$1@newsreader4.netcologne.de>
<sat03v$55d$1@dont-email.me> <sat4ue$r57$3@newsreader4.netcologne.de>
<savk6i$o5m$1@dont-email.me> <dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me> <77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
<sbfq9k$9vm$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <656cf53f-7e36-4b27-b916-2033404537d7n@googlegroups.com>
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 29 Jun 2021 20:25:40 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Tue, 29 Jun 2021 20:25 UTC

On Tuesday, June 29, 2021 at 1:54:47 PM UTC-5, Stephen Fuld wrote:
> On 6/23/2021 4:49 PM, MitchAlsup wrote:
> > On Wednesday, June 23, 2021 at 12:24:22 PM UTC-5, Stephen Fuld wrote:
> >> On 6/23/2021 8:44 AM, MitchAlsup wrote:
> snip
> >>> Note: The LOOP instruction in My 66000 was designed to deal with both
> >>> counted and null terminated loops (and a few more).
> >> Yes. That was one of the things I was talking about in my original post
> >> as "existing logic".
> >>> It seems to me that
> >>> providing all of the instructions one can synthesize with VVM and My
> >>> 66000 instructions would consume a lot of space.
> >> Sure, but no one is proposing that! The question is whether there is a
> >> subset of "all" that is worthwhile? So far, there is, consisting of a
> >> single member, MM. My question is, are there more?
> >>> <
> >>> Secondly: Using VVM one is running in the 8-32 I/C range on not-that-wide
> >>> implementations. So it is hard to see direct HW implementations at the
> >>> instruction level running "that much faster"; most of the loops are governed
> >>> by cache access width (in both VVM and direct HW implementations).
> > <
> >> I understand. Thus my surprise that you implemented MM. The fact that
> >> you did led to my trying to see if it was worthwhile to go further.
> > <
> > MM made the list, after much consideration, mainly because it is a much more
> > compact way to move stuff around in memory without having to pass through
> > registers. LDM and STM made the cut but are so underutilized it would cause
> > no undue harm to remove them--the vast majority of the LDM/STM uses are
> > performed with ENTER and EXIT.
<
> I have thought about making STM/LDM "variants" of Enter/Exit, but you
> need another register specifier to hold the memory address. If the goal
> is to eliminate the LDM/STM op codes, and they are infrequently used, I
> suppose you could precede Enter/Exit with a Carry meta instruction that
> indicates what register contains the memory address. I am not sure how
> much saving the two op-codes is worth.
<
No, ENTER and EXIT imply the stack pointer (SP); the 2 register specifiers
are the start and stop registers (start==stop implies all 32 registers).
<
> >> As I said, ISTM that the main advantages of a single instruction are
> >> lower cost "start up" (and resume after interrupt), and lower
> >> memory/I-cache usage. Once you are up and going, I agree that there is
> >> essentially no advantage.
> > <
> > MM, ENTER, and EXIT made the cut for code density reasons.
> So the second of my reasons above (the memory/I-cache usage). Fine. I
> think you are saying that MM would occur often enough that the savings
> justify the cost. And by omitting the others, that they do not occur
> often enough. You may be right. As I said in the OP, I don't have any
> good statistics.
<
In any event MM went in and came out several times until Brian found
ways to use it for struct assignments, at which point it made the cut.
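
For context, the source construct in question is plain C struct assignment, which compilers commonly lower to a block copy. The sketch below only shows the two equivalent source forms; the struct and function names are invented for illustration, and the thread does not show the code generated for My 66000, so the mapping onto MM is an assumption.

```c
#include <string.h>

/* Illustrative struct, large enough that a compiler would likely
 * copy it as a block rather than field by field. */
struct packet {
    int    id;
    double payload[16];
    char   tag[8];
};

/* Struct assignment: the form a compiler can turn into one
 * memory-to-memory move without passing through registers. */
static void copy_by_assignment(struct packet *dst, const struct packet *src)
{
    *dst = *src;
}

/* The equivalent explicit library call. */
static void copy_by_memcpy(struct packet *dst, const struct packet *src)
{
    memcpy(dst, src, sizeof *dst);
}
```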
>
> My guess is that perhaps the next most used instruction would be
> essentially a variant of strchr that optionally had an n character
> limit. Besides strchr and memchr, this provides strlen by making the
> searched for character a null and n very large, allows the "outer loop"
> of the nested loop functions to use VVM for a big savings on those, and
> speeds up the first part of strcat and strncat.
<
I specifically architected VEC and especially LOOP to deal with the NULL
terminated C strings and the count to N loop limits simultaneously.
Basically, all of the str* and mem* that are leaf routines vectorize.
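
As a C-level illustration of a loop with both termination conditions (written in C rather than My 66000 assembly, to avoid guessing at the exact VEC/LOOP encoding), a bounded copy might look like the sketch below. The function name is hypothetical and the semantics are only roughly those of memccpy with a NUL stop byte.

```c
#include <stddef.h>

/* Copy bytes from src to dst until either n bytes have been copied
 * or a NUL has been copied, whichever comes first.
 * Returns the number of bytes written (including the NUL, if any). */
static size_t copy_until_nul_or_n(char *dst, const char *src, size_t n)
{
    size_t i = 0;

    while (i < n) {                 /* count-to-N condition */
        char ch = src[i];
        dst[i] = ch;
        i++;
        if (ch == '\0')             /* NUL-terminated condition */
            break;
    }
    return i;
}
```

Both exits belong to the same single loop, which is the shape the leaf str* and mem* routines share.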
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)

Re: Proposal for Single instructions for string library functions on My 66000

<sbg4gb$fhh$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18214&group=comp.arch#18214
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Tue, 29 Jun 2021 14:48:58 -0700
Organization: A noiseless patient Spider
Lines: 103
Message-ID: <sbg4gb$fhh$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me>
<sat4ue$r57$3@newsreader4.netcologne.de> <savk6i$o5m$1@dont-email.me>
<dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me>
<77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
<sbfq9k$9vm$1@dont-email.me>
<656cf53f-7e36-4b27-b916-2033404537d7n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 29 Jun 2021 21:48:59 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="5c5e1cb849a7487e5a5af0cde2059592";
logging-data="15921"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/jB54Nllxm6mOzSFjnZYLGS6iK/IX6vgc="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:a1ba0WfzpYXzTZbnGmYFz9aohPU=
In-Reply-To: <656cf53f-7e36-4b27-b916-2033404537d7n@googlegroups.com>
Content-Language: en-US
 by: Stephen Fuld - Tue, 29 Jun 2021 21:48 UTC

On 6/29/2021 1:25 PM, MitchAlsup wrote:
> On Tuesday, June 29, 2021 at 1:54:47 PM UTC-5, Stephen Fuld wrote:
>> On 6/23/2021 4:49 PM, MitchAlsup wrote:
>>> On Wednesday, June 23, 2021 at 12:24:22 PM UTC-5, Stephen Fuld wrote:
>>>> On 6/23/2021 8:44 AM, MitchAlsup wrote:
>> snip
>>>>> Note: The LOOP instruction in My 66000 was designed to deal with both
>>>>> counted and null terminated loops (and a few more).
>>>> Yes. That was one of the things I was talking about in my original post
>>>> as "existing logic".
>>>>> It seems to me that
>>>>> providing all of the instructions one can synthesize with VVM and My
>>>>> 66000 instructions would consume a lot of space.
>>>> Sure, but no one is proposing that! The question is whether there is a
>>>> subset of "all" that is worthwhile? So far, there is, consisting of a
>>>> single member, MM. My question is, are there more?
>>>>> <
>>>>> Secondly: Using VVM one is running in the 8-32 I/C range on not-that-wide
>>>>> implementations. So it is hard to see direct HW implementations at the
>>>>> instruction level running "that much faster"; most of the loops are governed
>>>>> by cache access width (in both VVM and direct HW implementations).
>>> <
>>>> I understand. Thus my surprise that you implemented MM. The fact that
>>>> you did led to my trying to see if it was worthwhile to go further.
>>> <
>>> MM made the list, after much consideration, mainly because it is a much more
>>> compact way to move stuff around in memory without having to pass through
>>> registers. LDM and STM made the cut but are so underutilized it would cause
>>> no undue harm to remove them--the vast majority of the LDM/STM uses are
>>> performed with ENTER and EXIT.
> <
>> I have thought about making STM/LDM "variants" of Enter/Exit, but you
>> need another register specifier to hold the memory address. If the goal
>> is to eliminate the LDM/STM op codes, and they are infrequently used, I
>> suppose you could precede Enter/Exit with a Carry meta instruction that
>> indicates what register contains the memory address. I am not sure how
>> much saving the two op-codes is worth.
> <
> No, ENTER and EXIT imply the stack pointer (SP); the 2 register specifiers
> are the start and stop registers (start==stop implies all 32 registers).

Yes, I realize that. Hence my suggestion about using Carry. The
register specified in the Carry instruction would contain the memory
address for starting the LDM/STM, and the fact that the Enter/Exit was
under the shadow of the Carry would tell the HW to use the Carry
register rather than the stack pointer. The start and stop specifiers
in the Enter/Exit would work exactly as they do now.

> <
>>>> As I said, ISTM that the main advantages of a single instruction are
>>>> lower cost "start up" (and resume after interrupt), and lower
>>>> memory/I-cache usage. Once you are up and going, I agree that there is
>>>> essentially no advantage.
>>> <
>>> MM, ENTER, and EXIT made the cut for code density reasons.
>> So the second of my reasons above (the memory/I-cache usage). Fine. I
>> think you are saying that MM would occur often enough that the savings
>> justify the cost. And by omitting the others, that they do not occur
>> often enough. You may be right. As I said in the OP, I don't have any
>> good statistics.
> <
> In any event MM went in and came out several times until Brian found
> ways to use it for struct assignments, at which point it made the cut.

So perhaps I missed this: it appears that an important factor for you is
whether the compiler generates the instruction, not whether it would
benefit a library function. Is that correct?

>>
>> My guess is that perhaps the next most used instruction would be
>> essentially a variant of strchr that optionally had an n character
>> limit. Besides strchr and memchr, this provides strlen by making the
>> searched for character a null and n very large, allows the "outer loop"
>> of the nested loop functions to use VVM for a big savings on those, and
>> speeds up the first part of strcat and strncat.
> <
> I specifically architected VEC and especially LOOP to deal with the NULL
> terminated C strings and the count to N loop limits simultaneously.

I understand the advantages of VVM, and especially its ability to deal
with compare value and count as both termination conditions. As I have
said several times, I am a big fan. :-) But the functionality provided
by MM obviously vectorizes extremely well with VVM, yet you felt it was
worthwhile to get a little bit more performance out of it.

> Basically, all of the str* and mem* that are leaf routines vectorize.

I spent some time looking at ones like strpbrk or strcspn, and the best
VVM implementation I could come up with vectorizes the search of the
presumably shorter second string/set. But you really want to
take advantage of VVM on the longer string. Hence my idea of making
what was the inner/shorter loop an instruction to allow VVM to work its
magic on the longer string. Is there a better implementation?
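
To show the loop nest I mean, here is a plain C sketch of strcspn and strpbrk with the inner search factored out. The helper byte_in_set stands in for the hypothetical single instruction; its name and interface are invented for illustration. With the inner loop collapsed into one operation, the remaining outer loop over the longer string is a simple leaf loop, the kind VVM should handle.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the hypothetical instruction: is byte c a member of
 * the NUL-terminated set? (This is the inner, shorter loop.) */
static bool byte_in_set(unsigned char c, const char *set)
{
    for (const unsigned char *q = (const unsigned char *)set; *q != '\0'; q++) {
        if (*q == c)
            return true;
    }
    return false;
}

/* Outer loop over the (presumably longer) string: length of the
 * initial segment of s containing no byte from reject. */
static size_t sketch_strcspn(const char *s, const char *reject)
{
    size_t i = 0;

    while (s[i] != '\0' && !byte_in_set((unsigned char)s[i], reject))
        i++;
    return i;
}

/* strpbrk falls out of the same scan. */
static const char *sketch_strpbrk(const char *s, const char *accept)
{
    size_t i = sketch_strcspn(s, accept);

    return s[i] != '\0' ? s + i : NULL;
}
```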

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Proposal for Single instructions for string library functions on My 66000

<54d6f3d3-b0f9-4900-8677-7b824ba9451bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18221&group=comp.arch#18221
X-Received: by 2002:a37:65c3:: with SMTP id z186mr23900536qkb.481.1625009579791;
Tue, 29 Jun 2021 16:32:59 -0700 (PDT)
X-Received: by 2002:a4a:ab07:: with SMTP id i7mr6153921oon.89.1625009579475;
Tue, 29 Jun 2021 16:32:59 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Jun 2021 16:32:59 -0700 (PDT)
In-Reply-To: <sbg4gb$fhh$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1a6:66b6:1520:df39;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1a6:66b6:1520:df39
References: <sar8dp$d9$1@dont-email.me> <sas8g4$77t$1@newsreader4.netcologne.de>
<sat03v$55d$1@dont-email.me> <sat4ue$r57$3@newsreader4.netcologne.de>
<savk6i$o5m$1@dont-email.me> <dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me> <77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
<sbfq9k$9vm$1@dont-email.me> <656cf53f-7e36-4b27-b916-2033404537d7n@googlegroups.com>
<sbg4gb$fhh$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <54d6f3d3-b0f9-4900-8677-7b824ba9451bn@googlegroups.com>
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 29 Jun 2021 23:32:59 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Tue, 29 Jun 2021 23:32 UTC

On Tuesday, June 29, 2021 at 4:49:01 PM UTC-5, Stephen Fuld wrote:
> On 6/29/2021 1:25 PM, MitchAlsup wrote:
> > On Tuesday, June 29, 2021 at 1:54:47 PM UTC-5, Stephen Fuld wrote:
> >> On 6/23/2021 4:49 PM, MitchAlsup wrote:
> >>> On Wednesday, June 23, 2021 at 12:24:22 PM UTC-5, Stephen Fuld wrote:
> >>>> On 6/23/2021 8:44 AM, MitchAlsup wrote:
> >> snip
> >>>>> Note: The LOOP instruction in My 66000 was designed to deal with both
> >>>>> counted and null terminated loops (and a few more).
> >>>> Yes. That was one of the things I was talking about in my original post
> >>>> as "existing logic".
> >>>>> It seems to me that
> >>>>> providing all of the instructions one can synthesize with VVM and My
> >>>>> 66000 instructions would consume a lot of space.
> >>>> Sure, but no one is proposing that! The question is whether there is a
> >>>> subset of "all" that is worthwhile? So far, there is, consisting of a
> >>>> single member, MM. My question is, are there more?
> >>>>> <
> >>>>> Secondly: Using VVM one is running in the 8-32 I/C range on not that wide
> >>>>> implementations. So it is hard to see direct HW implementations at the
> >>>>> instruction level running "that much faster"; most of the loops are governed
> >>>>> by cache access width (both VVM and direct HW implementation).
> >>> <
> >>>> I understand. Thus my surprise that you implemented MM. The fact that
> >>>> you did led to my trying to see if it was worthwhile to go further.
> >>> <
> >>> MM made the list, after much consideration, mainly because it is a much more
> >>> compact way to move stuff around in memory without having to pass through
> >>> registers. LDM and STM made the cut but are so underutilized it would cause
> >>> no undue harm to remove them--the vast majority of the LDM/STM uses are
> >>> performed with ENTER and EXIT.
> > <
> >> I have thought about making STM/LDM "variants" of Enter/Exit, but you
> >> need another register specifier to hold the memory address. If the goal
> >> is to eliminate the LDM/STM op codes, and they are infrequently used, I
> >> suppose you could precede Enter/Exit with a Carry meta instruction that
> >> indicates what register contains the memory address. I am not sure how
> >> much saving the two op-codes is worth.
> > <
> > No, ENTER and EXIT imply the stack pointer (SP), the 2 register specifiers
> > are the start and stop registers (start==stop implies all 32 registers).
<
> Yes, I realize that. Hence my suggestion about using Carry. The
> register specified in the Carry instruction would contain the memory
> address for starting the LDM/STM, and the fact that the Enter/Exit was
> under the shadow of the Carry would tell the HW to use the Carry
> register rather than the stack pointer. The start and stop specifiers
> in the Enter/Exit would work exactly as they do now.
<
But at this point you start losing code density.
> > <
> >>>> As I said, ISTM that the main advantages of a single instruction are
> >>>> lower cost "start up" (and resume after interrupt), and lower
> >>>> memory/I-cache usage. Once you are up and going, I agree that there is
> >>>> essentially no advantage.
> >>> <
> >>> MM, ENTER, and EXIT made the cut for code density reasons.
> >> So the second of my reasons above (the memory/I-cache usage). Fine. I
> >> think you are saying that MM would occur often enough that the savings
> >> justify the cost. And by omitting the others, that they do not occur
> >> often enough. You may be right. As I said in the OP, I don't have any
> >> good statistics.
> > <
> > In any event MM went in and came out several times until Brian found
> > ways to use it for struct assignments, at which point it made the cut.
<
> So, perhaps I missed that it appears that an important factor for you is
> whether the compiler generates the instruction, not that it would
> benefit a library function. Is that correct?
<
MM made the cut because it takes the place of a couple of setup instructions
and a loop of what would have been 5 instructions {LD/ST/ADD/CMP/BC}.
So the compiler can use it and it is cheaper and denser than a call+ret.
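
As a rough illustration of the loop that MM stands in for, here is a C sketch. It is not My 66000 code, and `copy_bytes`, `assign_packet` and `struct packet` are hypothetical names chosen for the example. The byte loop has the LD/ST/ADD/CMP/BC shape mentioned above; the struct assignment is the kind of source construct a compiler could lower to a single memory-to-memory move.

```c
#include <stddef.h>

/* Hypothetical example type; any plain struct would do. */
struct packet {
    int  id;
    char payload[60];
};

/* The explicit loop: per byte, a load, a store, an index increment,
   a compare and a conditional branch. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* A struct assignment: the compiler is free to emit a block move
   here instead of a loop or a library call. */
static void assign_packet(struct packet *dst, const struct packet *src)
{
    *dst = *src;
}
```

Both functions leave the destination with the same contents; the difference is only in how much code the compiler has to emit for them.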
> >>
> >> My guess is that perhaps the next most used instruction would be
> >> essentially a variant of strchr that optionally had an n character
> >> limit. Besides strchr and memchr, this provides strlen by making the
> >> searched for character a null and n very large, allows the "outer loop"
> >> of the nested loop functions to use VVM for a big savings on those, and
> >> speeds up the first part of strcat and strncat.
> > <
> > I specifically architected VEC and especially LOOP to deal with the NULL
> > terminated C strings and the count to N loop limits simultaneously.
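
A minimal C sketch of the strchr variant with an n character limit described above may help make the two termination conditions concrete. The names `find_char_limit` and `strlen_via_find` are hypothetical, and this is only a scalar model of the semantics, not a proposed instruction definition.

```c
#include <stddef.h>

/* Hypothetical helper: scan at most n bytes of s for the byte value ch.
   Unlike memchr it also stops at a NUL, so it has two termination
   conditions: the count and the terminator.
   Returns a pointer to the match, or NULL if none was found. */
static const char *find_char_limit(const char *s, int ch, size_t n)
{
    unsigned char want = (unsigned char)ch;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == want)
            return s + i;   /* also covers searching for the NUL itself */
        if (c == 0)
            break;          /* end of string reached before a match */
    }
    return NULL;
}

/* strlen expressed through the same primitive: search for NUL with an
   effectively unlimited count. */
static size_t strlen_via_find(const char *s)
{
    return (size_t)(find_char_limit(s, 0, (size_t)-1) - s);
}
```

The match test comes before the terminator test so that searching for the null character returns a pointer to the terminator, as strchr does.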
<
> I understand the advantages of VVM, and especially its ability to deal
> with compare value and count as both termination conditions. As I have
> said several times, I am a big fan. :-) But the functionality provided
> by MM obviously vectorizes extremely well with VVM, yet you felt it was
> worthwhile to get a little bit more performance out of it.
<
Yes, it does vectorize well, but MM is also considerably denser.
<
> > Basically, all of the str* and mem* that are leaf routines vectorize.
<
> I spent some time looking at ones like strpbrk or strcspn, and the best
> VVM implementation I could come up with vectorizes the search of the
> presumably shorter second string/set. But you really want to
> take advantage of VVM on the longer string. Hence my idea of making
> what was the inner/shorter loop an instruction to allow VVM to work its
> magic on the longer string. Is there a better implementation?
<
But the inner (shorter) loops are far from being "an instruction"
<
From the Apple library:: lightly updated to modern c::
<
char *strpbrk( const char *s1, const char *s2 )
{
    const char *scanp;
    int c, sc;

    while ((c = *s1++) != 0) {
        for (scanp = s2; (sc = *scanp++) != 0;)
            if (sc == c)
                return ((char *)(s1 - 1));
    }
    return (NULL);
}

strpbrk:
    LDSB R4,[R1]
    ADD R1,R1,#1
    BEQ0 R4,exit
    MOV R3,R2
loop:
    LDSB R5,[R3]
    ADD R3,R3,#1
    BEQ0 R5,strpbrk
    CMP R6,R4,R5
    BNE R6,loop
    ADD R1,R1,#-1
    RET
exit:
    MOV R1,#0
    RET

size_t strcspn( const char *s1, const char *s2 )
{
    register const char *p, *spanp;
    register char c, sc;

    /*
     * Stop as soon as we find any character from s2. Note that there
     * must be a NUL in s2; it suffices to stop when we find that, too.
     */
    for (p = s1;;) {
        c = *p++;
        spanp = s2;
        do {
            if ((sc = *spanp++) == c)
                return (p - 1 - s1);
        } while (sc != 0);
    }
}

strcspn:
MOV R3,R1
loop:
LDSB R5,[R3]
ADD R3,R3,#1
MOV R6,R2
doloop:
LDSB R7,[R6]
ADD R6,R6,#1
CMP R7,R5,R6
PNE R7,{3,{111}}
ADD R3,R3,#-1
ADD R1,R3,-R1
RET
BNE0 R6,doloop
BR loop

<
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)

Re: Proposal for Single instructions for string library functions on My 66000

<sbh6df$v5t$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18240&group=comp.arch#18240

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Wed, 30 Jun 2021 09:27:42 +0200
Organization: A noiseless patient Spider
Lines: 66
Message-ID: <sbh6df$v5t$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me>
<sat4ue$r57$3@newsreader4.netcologne.de> <savk6i$o5m$1@dont-email.me>
<dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me>
<77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
<sbfq9k$9vm$1@dont-email.me>
<656cf53f-7e36-4b27-b916-2033404537d7n@googlegroups.com>
<sbg4gb$fhh$1@dont-email.me>
<54d6f3d3-b0f9-4900-8677-7b824ba9451bn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 30 Jun 2021 07:27:43 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="09069882886e836735159a7d32a41321";
logging-data="31933"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/0fxSNcCD1nBmZB+X+C89hr8q7RSRvVmM="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:wJ8lV3OPosMAgmRh+bxdOtfwni8=
In-Reply-To: <54d6f3d3-b0f9-4900-8677-7b824ba9451bn@googlegroups.com>
Content-Language: en-US
 by: Marcus - Wed, 30 Jun 2021 07:27 UTC

On 2021-06-30, MitchAlsup wrote:

[snip]

> <
> But the inner (shorter) loops are far from being "an instruction"
> <
> From the Apple library:: lightly updated to modern c::
> <
> char *strpbrk( const char *s1, const char *s2 )
> {
> const char *scanp;
> int c, sc;
>
> while ((c = *s1++) != 0) {
> for (scanp = s2; (sc = *scanp++) != 0;)
> if (sc == c)
> return ((char *)(s1 - 1));
> }
> return (NULL);
> }
>
>
>
> strpbrk:
> LDSB R4,[R1]
> ADD R1,R1,#1
> BEQ0 R4,exit
> MOV R3,R2
> loop:
> LDSB R5,[R3]
> ADD R3,R3,#1
> BEQ0 R5,strpbrk
> CMP R6,R4,R5
> BNE R6,loop
> ADD R1,R1,#-1
> RET
> exit:
> MOV R1,#0
> RET
>

For kicks, the MRISC32 GCC output is:

strpbrk:
    LDUB R6,R1,#0
    ADD R1,R1,#1
    BZ R6,exit
    MOV R3,R2
loop:
    LDUB R4,R3,#0
    ADD R3,R3,#1
    SNE R5,R4,R6
    BZ R4,strpbrk
    BS R5,loop
    ADD R1,R1,#-1
    RET
exit:
    MOV R1,R6
    RET

It's remarkable how similar our ISA:s are in _certain_ areas ;-) BTW I
noticed that our C compilers use different signs for "char" - one of the
many vexing parts of the C standard.

/Marcus

Re: Proposal for Single instructions for string library functions on My 66000

<sbhdm4$ftr$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18243&group=comp.arch#18243

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions
on My 66000
Date: Wed, 30 Jun 2021 09:31:48 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbhdm4$ftr$1@newsreader4.netcologne.de>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me>
<sat4ue$r57$3@newsreader4.netcologne.de> <savk6i$o5m$1@dont-email.me>
<dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me>
<77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
<sbfq9k$9vm$1@dont-email.me>
<656cf53f-7e36-4b27-b916-2033404537d7n@googlegroups.com>
<sbg4gb$fhh$1@dont-email.me>
<54d6f3d3-b0f9-4900-8677-7b824ba9451bn@googlegroups.com>
<sbh6df$v5t$1@dont-email.me>
Injection-Date: Wed, 30 Jun 2021 09:31:48 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="16315"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Wed, 30 Jun 2021 09:31 UTC

Marcus <m.delete@this.bitsnbites.eu> schrieb:
> BTW I
> noticed that our C compilers use different signs for "char" - one of the
> many vexing parts of the C standard.

signed char deals "gracefully" with broken code like

char ch;
while ((ch = getchar()) != EOF)
putchar(ch);

which works normally as long as you don't input 0xff (if your
EOF happens to be -1).

I wonder how this particular idiom influenced ABI designers' choice
of using signed vs. unsigned char for default char.

(If you want to be warned about this, use

cc -Wall -Wextra -Werror -funsigned-char

which will issue an error if your cc is a relatively recent gcc
or clang.)
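
For contrast with the broken idiom above, here is a sketch of the usual correct form. The point is that the return value of `fgetc`/`getchar` is kept in an `int`, so EOF stays distinguishable from a data byte of 0xff regardless of whether plain char is signed. The function name `copy_stream` is made up for this example.

```c
#include <stdio.h>

/* Copy every byte from in to out. The result of fgetc is kept in an
   int so that EOF cannot be confused with a data byte of 0xff.
   Returns the number of bytes copied. */
static long copy_stream(FILE *in, FILE *out)
{
    int ch;             /* int, not char */
    long count = 0;

    while ((ch = fgetc(in)) != EOF) {
        fputc(ch, out);
        count++;
    }
    return count;
}
```

With `char ch` instead, a signed char stops early at a 0xff byte (when EOF is -1), and an unsigned char can never compare equal to EOF, so the loop never terminates.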

Re: Proposal for Single instructions for string library functions on My 66000

<sbhf9m$8n8$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18244&group=comp.arch#18244

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Wed, 30 Jun 2021 11:59:18 +0200
Organization: A noiseless patient Spider
Lines: 46
Message-ID: <sbhf9m$8n8$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me>
<sat4ue$r57$3@newsreader4.netcologne.de> <savk6i$o5m$1@dont-email.me>
<dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me>
<77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
<sbfq9k$9vm$1@dont-email.me>
<656cf53f-7e36-4b27-b916-2033404537d7n@googlegroups.com>
<sbg4gb$fhh$1@dont-email.me>
<54d6f3d3-b0f9-4900-8677-7b824ba9451bn@googlegroups.com>
<sbh6df$v5t$1@dont-email.me> <sbhdm4$ftr$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 30 Jun 2021 09:59:18 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="09069882886e836735159a7d32a41321";
logging-data="8936"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19F4oBNcRjaq2ju0AiImMKCfp8Wv6xTTWA="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:FuTZ0NmTDWZYmSimfGt1nNIp8xM=
In-Reply-To: <sbhdm4$ftr$1@newsreader4.netcologne.de>
Content-Language: en-US
 by: Marcus - Wed, 30 Jun 2021 09:59 UTC

On 2021-06-30, Thomas Koenig wrote:
> Marcus <m.delete@this.bitsnbites.eu> schrieb:
>> BTW I
>> noticed that our C compilers use different signs for "char" - one of the
>> many vexing parts of the C standard.
>
> signed char deals "gracefully" with broken code like
>
> char ch;
> while ((ch = getchar()) != EOF)
> putchar(ch);
>
> which works normally as long as you don't input 0xff (if your
> EOF happens to be -1).
>
> I wonder how this particular idiom influenced ABI designer's choice
> of using signed vs. unsigned char for default char.
>
> (If you want to be warned about this, use
>
> cc -Wall -Wextra -Werror -funsigned-char
>
> which will issue an error if your cc is a relatively recent gcc
> or clang.)
>

Usually this is not a problem, as you usually do == or != comparisons
with characters. Where you do < > comparisons you either look at ASCII
characters (where the sign bit is always 0) or you are careful to do the
right type conversions.

In some places, though, it can bite you. E.g. the original Doom source
assumed x86 style signed char, and on machines/compilers that use
unsigned char by default you got funny steering / control errors.

Fixed independently in different Doom ports:


https://github.com/mbitsnbites/mc1-doom/commit/14d04b02f199c36ebfe488799a473ff2f8af62a3


https://github.com/smunaut/doom_riscv/commit/da1dbb098e429d67dec52ebc600e2fabaf282cef

...etc.
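
To show the kind of bug meant here, a small hypothetical C sketch follows. It is not taken from the Doom source or from either of the commits above; `struct move_cmd` and both function names are invented for illustration. A negative movement value stored in a plain `char` field survives only where plain char is signed.

```c
/* Hypothetical input record, not taken from the Doom source. */
struct move_cmd {
    char        forward_plain;  /* signedness depends on the ABI */
    signed char forward_fixed;  /* always signed */
};

/* Scale a stored movement value; negative means "backwards".
   With unsigned plain char, -25 was stored as 231 and comes back
   as a large forward movement. */
static int scaled_forward_plain(const struct move_cmd *m)
{
    return m->forward_plain * 2048;
}

/* Same computation on the explicitly signed field: portable. */
static int scaled_forward_fixed(const struct move_cmd *m)
{
    return m->forward_fixed * 2048;
}
```

Declaring the field `signed char` (or `int8_t`) is the kind of one-line change that makes the behaviour the same on every target.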

/Marcus

Re: Proposal for Single instructions for string library functions on My 66000

<sbhj00$k6c$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18245&group=comp.arch#18245

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions
on My 66000
Date: Wed, 30 Jun 2021 11:02:24 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbhj00$k6c$1@newsreader4.netcologne.de>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me>
<sat4ue$r57$3@newsreader4.netcologne.de> <savk6i$o5m$1@dont-email.me>
<dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me>
<77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
<sbfq9k$9vm$1@dont-email.me>
<656cf53f-7e36-4b27-b916-2033404537d7n@googlegroups.com>
<sbg4gb$fhh$1@dont-email.me>
<54d6f3d3-b0f9-4900-8677-7b824ba9451bn@googlegroups.com>
<sbh6df$v5t$1@dont-email.me> <sbhdm4$ftr$1@newsreader4.netcologne.de>
<sbhf9m$8n8$1@dont-email.me>
Injection-Date: Wed, 30 Jun 2021 11:02:24 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="20684"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Wed, 30 Jun 2021 11:02 UTC

Marcus <m.delete@this.bitsnbites.eu> schrieb:

[signed vs. unsigned chars]

> In some places, though, it can bite you. E.g. the original Doom source
> assumed x86 style signed char, and on machines/compilers that use
> unsigned char by default you got funny steering / control errors.

Aaaaah... now I know why Apple made the char type signed for
their aarch64 platform. They obviously wanted to play Doom
without patching it.

Re: Proposal for Single instructions for string library functions on My 66000

<sbhkdj$5rm$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18246&group=comp.arch#18246

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Wed, 30 Jun 2021 13:26:43 +0200
Organization: A noiseless patient Spider
Lines: 80
Message-ID: <sbhkdj$5rm$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me>
<sat4ue$r57$3@newsreader4.netcologne.de> <savk6i$o5m$1@dont-email.me>
<dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me>
<77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
<sbfq9k$9vm$1@dont-email.me>
<656cf53f-7e36-4b27-b916-2033404537d7n@googlegroups.com>
<sbg4gb$fhh$1@dont-email.me>
<54d6f3d3-b0f9-4900-8677-7b824ba9451bn@googlegroups.com>
<sbh6df$v5t$1@dont-email.me> <sbhdm4$ftr$1@newsreader4.netcologne.de>
<sbhf9m$8n8$1@dont-email.me> <sbhj00$k6c$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 30 Jun 2021 11:26:43 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="09069882886e836735159a7d32a41321";
logging-data="6006"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Tn8zRsYDjZT4IHNUtLc4p0KsUgmGzXOY="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:UTOP33to18hT+Um4SvW9yQM/cyg=
In-Reply-To: <sbhj00$k6c$1@newsreader4.netcologne.de>
Content-Language: en-US
 by: Marcus - Wed, 30 Jun 2021 11:26 UTC

On 2021-06-30, Thomas Koenig wrote:
> Marcus <m.delete@this.bitsnbites.eu> schrieb:
>
> [signed vs. unsigned chars]
>
>> In some places, though, it can bite you. E.g. the original Doom source
>> assumed x86 style signed char, and on machines/compilers that use
>> unsigned char by default you got funny steering / control errors.
>
> Aaaaah... now I know why Apple made the char type signed for
> their aarch64 platform. They obviously wanted to play Doom
> without patching it.
>

That must be it! :-)

I did a quick grep in the GCC repo and got the following map (some of
these are controlled by ifdef:s so it's probably not the real truth):

aarch64 DEFAULT_SIGNED_CHAR = 0
alpha DEFAULT_SIGNED_CHAR = 1
arc DEFAULT_SIGNED_CHAR = 0
arm DEFAULT_SIGNED_CHAR = 0
avr DEFAULT_SIGNED_CHAR = 1
bfin DEFAULT_SIGNED_CHAR = 1
bpf DEFAULT_SIGNED_CHAR = 1
c6x DEFAULT_SIGNED_CHAR = 1
cr16 DEFAULT_SIGNED_CHAR = 1
cris DEFAULT_SIGNED_CHAR = 1
csky DEFAULT_SIGNED_CHAR = 0
epiphany DEFAULT_SIGNED_CHAR = 0
fr30 DEFAULT_SIGNED_CHAR = 1
frv DEFAULT_SIGNED_CHAR = 1
ft32 DEFAULT_SIGNED_CHAR = 1
gcn DEFAULT_SIGNED_CHAR = 1
h8300 DEFAULT_SIGNED_CHAR = 0/1
i386 DEFAULT_SIGNED_CHAR = 1
ia64 DEFAULT_SIGNED_CHAR = 1
iq2000 DEFAULT_SIGNED_CHAR = 1
lm32 DEFAULT_SIGNED_CHAR = 0
m32c DEFAULT_SIGNED_CHAR = 1
m32r DEFAULT_SIGNED_CHAR = 1
m68k DEFAULT_SIGNED_CHAR = 1
mcore DEFAULT_SIGNED_CHAR = 0
microblaze DEFAULT_SIGNED_CHAR = 1
mips DEFAULT_SIGNED_CHAR = 0/1
mmix DEFAULT_SIGNED_CHAR = 1
mn10300 DEFAULT_SIGNED_CHAR = 0
moxie DEFAULT_SIGNED_CHAR = 0
mrisc32 DEFAULT_SIGNED_CHAR = 0
msp430 DEFAULT_SIGNED_CHAR = 0
nds32 DEFAULT_SIGNED_CHAR = 1
nios2 DEFAULT_SIGNED_CHAR = 1
nvptx DEFAULT_SIGNED_CHAR = 1
or1k DEFAULT_SIGNED_CHAR = 1
pa DEFAULT_SIGNED_CHAR = 1
pdp11 DEFAULT_SIGNED_CHAR = 1
pru DEFAULT_SIGNED_CHAR = 0
riscv DEFAULT_SIGNED_CHAR = 0
rl78 DEFAULT_SIGNED_CHAR = 0
rs6000 DEFAULT_SIGNED_CHAR = 0/1
rx DEFAULT_SIGNED_CHAR = 0
s390 DEFAULT_SIGNED_CHAR = 0
sh DEFAULT_SIGNED_CHAR = 1
sparc DEFAULT_SIGNED_CHAR = 1
stormy16 DEFAULT_SIGNED_CHAR = 0
tilegx DEFAULT_SIGNED_CHAR = 1
tilepro DEFAULT_SIGNED_CHAR = 1
v850 DEFAULT_SIGNED_CHAR = 1
vax DEFAULT_SIGNED_CHAR = 1
visium DEFAULT_SIGNED_CHAR = 0
xtensa DEFAULT_SIGNED_CHAR = 0

For MRISC32 I went with ARM+AArch64+RISC-V > x86, but in the end it
really does not matter since, well, there will always be portability
problems and you will always have to specify "unsigned char" or "signed
char" (or better yet, "uint8_t" or "int8_t") if you care about
portability.
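
Rather than grepping compiler sources, the default can also be checked from C itself. The sketch below relies only on `CHAR_MIN` from `<limits.h>`, which is 0 exactly when plain char is unsigned; the function names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if plain char is signed with this compiler and these
   flags, 0 if it is unsigned. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Fixed-signedness types widen the same way on every target. */
static int widen_signed(int8_t v)    { return v; }
static int widen_unsigned(uint8_t v) { return v; }
```

Note that options such as `-funsigned-char` change the answer for a given build, so the result describes the compilation, not just the architecture.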

/Marcus

Re: Proposal for Single instructions for string library functions on My 66000

<b082a510-f4ba-4e73-a22c-fa49f1143cebn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18254&group=comp.arch#18254

X-Received: by 2002:a37:587:: with SMTP id 129mr35709396qkf.418.1625064537263;
Wed, 30 Jun 2021 07:48:57 -0700 (PDT)
X-Received: by 2002:aca:d6c2:: with SMTP id n185mr3329056oig.51.1625064536983;
Wed, 30 Jun 2021 07:48:56 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 30 Jun 2021 07:48:56 -0700 (PDT)
In-Reply-To: <sbhkdj$5rm$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:f5f4:9dac:532a:1a43;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:f5f4:9dac:532a:1a43
References: <sar8dp$d9$1@dont-email.me> <sas8g4$77t$1@newsreader4.netcologne.de>
<sat03v$55d$1@dont-email.me> <sat4ue$r57$3@newsreader4.netcologne.de>
<savk6i$o5m$1@dont-email.me> <dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me> <77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
<sbfq9k$9vm$1@dont-email.me> <656cf53f-7e36-4b27-b916-2033404537d7n@googlegroups.com>
<sbg4gb$fhh$1@dont-email.me> <54d6f3d3-b0f9-4900-8677-7b824ba9451bn@googlegroups.com>
<sbh6df$v5t$1@dont-email.me> <sbhdm4$ftr$1@newsreader4.netcologne.de>
<sbhf9m$8n8$1@dont-email.me> <sbhj00$k6c$1@newsreader4.netcologne.de> <sbhkdj$5rm$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b082a510-f4ba-4e73-a22c-fa49f1143cebn@googlegroups.com>
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 30 Jun 2021 14:48:57 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 4769
 by: MitchAlsup - Wed, 30 Jun 2021 14:48 UTC

On Wednesday, June 30, 2021 at 6:26:45 AM UTC-5, Marcus wrote:
> On 2021-06-30, Thomas Koenig wrote:
> > Marcus <m.de...@this.bitsnbites.eu> schrieb:
> >
> > [signed vs. unsigned chars]
> >
> >> In some places, though, it can bite you. E.g. the original Doom source
> >> assumed x86 style signed char, and on machines/compilers that use
> >> unsigned char by default you got funny steering / control errors.
> >
> > Aaaaah... now I know why Apple made the char type signed for
> > their aarch64 platform. They obviously wanted to play Doom
> > without patching it.
> >
> That must be it! :-)
>
> I did a quick grep in the GCC repo and got the following map (some of
> these are controlled by ifdef:s so it's probably not the real truth):
>
> aarch64 DEFAULT_SIGNED_CHAR = 0
> alpha DEFAULT_SIGNED_CHAR = 1
> arc DEFAULT_SIGNED_CHAR = 0
> arm DEFAULT_SIGNED_CHAR = 0
> avr DEFAULT_SIGNED_CHAR = 1
> bfin DEFAULT_SIGNED_CHAR = 1
> bpf DEFAULT_SIGNED_CHAR = 1
> c6x DEFAULT_SIGNED_CHAR = 1
> cr16 DEFAULT_SIGNED_CHAR = 1
> cris DEFAULT_SIGNED_CHAR = 1
> csky DEFAULT_SIGNED_CHAR = 0
> epiphany DEFAULT_SIGNED_CHAR = 0
> fr30 DEFAULT_SIGNED_CHAR = 1
> frv DEFAULT_SIGNED_CHAR = 1
> ft32 DEFAULT_SIGNED_CHAR = 1
> gcn DEFAULT_SIGNED_CHAR = 1
> h8300 DEFAULT_SIGNED_CHAR = 0/1
> i386 DEFAULT_SIGNED_CHAR = 1
> ia64 DEFAULT_SIGNED_CHAR = 1
> iq2000 DEFAULT_SIGNED_CHAR = 1
> lm32 DEFAULT_SIGNED_CHAR = 0
> m32c DEFAULT_SIGNED_CHAR = 1
> m32r DEFAULT_SIGNED_CHAR = 1
> m68k DEFAULT_SIGNED_CHAR = 1
> mcore DEFAULT_SIGNED_CHAR = 0
> microblaze DEFAULT_SIGNED_CHAR = 1
> mips DEFAULT_SIGNED_CHAR = 0/1
> mmix DEFAULT_SIGNED_CHAR = 1
> mn10300 DEFAULT_SIGNED_CHAR = 0
> moxie DEFAULT_SIGNED_CHAR = 0
> mrisc32 DEFAULT_SIGNED_CHAR = 0
> msp430 DEFAULT_SIGNED_CHAR = 0
> nds32 DEFAULT_SIGNED_CHAR = 1
> nios2 DEFAULT_SIGNED_CHAR = 1
> nvptx DEFAULT_SIGNED_CHAR = 1
> or1k DEFAULT_SIGNED_CHAR = 1
> pa DEFAULT_SIGNED_CHAR = 1
> pdp11 DEFAULT_SIGNED_CHAR = 1
> pru DEFAULT_SIGNED_CHAR = 0
> riscv DEFAULT_SIGNED_CHAR = 0
> rl78 DEFAULT_SIGNED_CHAR = 0
> rs6000 DEFAULT_SIGNED_CHAR = 0/1
> rx DEFAULT_SIGNED_CHAR = 0
> s390 DEFAULT_SIGNED_CHAR = 0
> sh DEFAULT_SIGNED_CHAR = 1
> sparc DEFAULT_SIGNED_CHAR = 1
> stormy16 DEFAULT_SIGNED_CHAR = 0
> tilegx DEFAULT_SIGNED_CHAR = 1
> tilepro DEFAULT_SIGNED_CHAR = 1
> v850 DEFAULT_SIGNED_CHAR = 1
> vax DEFAULT_SIGNED_CHAR = 1
> visium DEFAULT_SIGNED_CHAR = 0
> xtensa DEFAULT_SIGNED_CHAR = 0
<
With a list like the above, one would be foolish not to include signed
and unsigned byte LDs.
<
>
> For MRISC32 I went with ARM+AArch64+RISC-V > x86, but in the end it
> really does not matter since, well, there will always be portability
> problems and you will always have to specify "unsigned char" or "signed
> char" (or better yet, "uint8_t" or "int8_t") if you care about
> portability.
>
> /Marcus

Re: Proposal for Single instructions for string library functions on My 66000

<c1c7689e-2cd8-4648-a2e4-b489c0cf9767n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18669&group=comp.arch#18669

X-Received: by 2002:a37:ad0d:: with SMTP id f13mr1342609qkm.453.1626135443250;
Mon, 12 Jul 2021 17:17:23 -0700 (PDT)
X-Received: by 2002:a9d:5f19:: with SMTP id f25mr1335819oti.206.1626135442973;
Mon, 12 Jul 2021 17:17:22 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 12 Jul 2021 17:17:22 -0700 (PDT)
In-Reply-To: <sas8g4$77t$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=92.40.176.255; posting-account=soFpvwoAAADIBXOYOBcm_mixNPAaxW9p
NNTP-Posting-Host: 92.40.176.255
References: <sar8dp$d9$1@dont-email.me> <sas8g4$77t$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c1c7689e-2cd8-4648-a2e4-b489c0cf9767n@googlegroups.com>
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
From: luke.lei...@gmail.com (luke.l...@gmail.com)
Injection-Date: Tue, 13 Jul 2021 00:17:23 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: luke.l...@gmail.com - Tue, 13 Jul 2021 00:17 UTC

On Tuesday, June 22, 2021 at 9:54:30 AM UTC+1, Thomas Koenig wrote:
> Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:

> While C was an amazing language design for its time and especially
> for the hardware constraints of the machine it was developed for,
> some of its features have not aged well. Null-terminated strings
> are one of these features.

it isn't going away. MSRPC (aka DCE/RPC) uses length-specifiers: first heavy cost, a 16 bit integer added to every string. a zero byte string (empty) is therefore 2 bytes. second: if strings are over 65535 bytes *you can't have them*, you are forced to use a 4 byte length encoder, and now you have a *4* byte overhead for short strings...want to do both dynamically with escape sequencing? that strncpy in microcode is looking reeeal attractive by comparison, eh?

then also you are forgetting that UTF8 is the de-facto web internet encoding format, and that is a whole new ballgame for which specialist opcodes are well known (except by me, sigh, i just heard about them)
> I wouldn't try to implement those in hardware. The mem* functions,
> however, are fair game (and probably already covered by the
> MM instruction).

it is seriously worthwhile looking up "speculative load" that has been introduced into SVE, RVV and SVP64.

these are suited to Horizontal-first Vector ISAs although could likely be adapted to VVM Vertical-First.

basically they say "when doing a LOAD the length may be ARBITRARILY TRUNCATED to the number of elements that would succeed WITHOUT a page fault"

or to anything the hardware feels like as long as it's one or more.

with VL now truncated the remaining Vector ops can operate safely. some of these will be "find first zero" typically using cntlz (inverted) on a parallel cmpeqz

overall in RVV you end up with a stunning 13 instructions, where the hardware is free to choose the number of Lanes, could be 1 could be 4 could be 64 could be 10,000.

strlen, similar size, note the fail-first load:
https://github.com/gsauthof/riscv/blob/master/strlen.s

on VVM, the Vertical-First ISA, the number of elements to be SIMDified at the hardware backend would be determined by the first LD operation.

the hardware *starts out* with the *intention* of performing a parallel execution of say loading 64 bytes simultaneously, the number of parallel elements having been arbitrarily set to 64.

however at the FFIRSTed LD it goes, "wwwhoops, actually this is totally misaligned, and i can only load 12 bytes without a page fault"

so the hardware goes, "ok i know i meant to load 64 elements but actually i'm only gonna do 12".

now a quick and dirty hack way of doing this is to create a "fake" (hidden) predicate mask, which is implicitly ANDed with all VVM Loop Vector ops following the FFIRSTed LD.

it starts off at 0b111111111111...1111 and if say the 13th byte would segfault, then this fake hidden predicate mask would be set to 0b0000000111111111111.

that way, the hardware VVM loop detector need not try desperately to back out of any decisions, it simply automatically ANDs that hidden mask into all Vector operations up until the end of the loop. it also only increments the loop counter by 12 not 64.
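
a scalar C model of that hidden mask might look like the sketch below. it is only an illustration under stated assumptions: the page size, the lane count and all function names are invented here, and it truncates at every page boundary as a conservative stand-in for "would fault", where real fail-first hardware would truncate only when a fault would actually occur.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size */
#define LANES     64u     /* assumed number of byte lanes */

/* Number of lanes, out of LANES, that can be loaded starting at addr
   without crossing into the next page. Always at least 1. */
static unsigned safe_lanes(uintptr_t addr)
{
    unsigned to_page_end = PAGE_SIZE - (unsigned)(addr % PAGE_SIZE);
    return to_page_end < LANES ? to_page_end : LANES;
}

/* The "hidden" predicate mask: one bit set per surviving lane. */
static uint64_t lane_mask(unsigned lanes)
{
    return lanes >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << lanes) - 1;
}

/* strlen in truncated chunks: each pass handles only the lanes that
   are safe, and the position advances by that number, not by LANES. */
static size_t strlen_chunked(const char *s)
{
    size_t len = 0;
    for (;;) {
        unsigned n = safe_lanes((uintptr_t)(s + len));
        /* stands in for a parallel compare-with-zero + find-first */
        for (unsigned i = 0; i < n; i++)
            if (s[len + i] == '\0')
                return len + i;
        len += n;
    }
}
```

with a load starting 12 bytes before a page end, `safe_lanes` gives 12 and `lane_mask(12)` has exactly the low 12 bits set, matching the example above.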

10 years ago, Power ISA retired stringcopy instructions they added in 1994, this should tell you everything you need to know about whether they're a good idea to add in 2021.

l.

Re: Proposal for Single instructions for string library functions on My 66000

<b02d9d10-9cd4-4263-97b0-43ccf6cfd335n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18671&group=comp.arch#18671

X-Received: by 2002:a0c:ff01:: with SMTP id w1mr1947041qvt.28.1626137694550;
Mon, 12 Jul 2021 17:54:54 -0700 (PDT)
X-Received: by 2002:aca:2b08:: with SMTP id i8mr1191642oik.0.1626137693473;
Mon, 12 Jul 2021 17:54:53 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!news-out.netnews.com!news.alt.net!fdc3.netnews.com!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 12 Jul 2021 17:54:53 -0700 (PDT)
In-Reply-To: <c1c7689e-2cd8-4648-a2e4-b489c0cf9767n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:2914:c98d:387d:bfef;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:2914:c98d:387d:bfef
References: <sar8dp$d9$1@dont-email.me> <sas8g4$77t$1@newsreader4.netcologne.de>
<c1c7689e-2cd8-4648-a2e4-b489c0cf9767n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b02d9d10-9cd4-4263-97b0-43ccf6cfd335n@googlegroups.com>
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 13 Jul 2021 00:54:54 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 5413
 by: MitchAlsup - Tue, 13 Jul 2021 00:54 UTC

On Monday, July 12, 2021 at 7:17:24 PM UTC-5, luke.l...@gmail.com wrote:
> On Tuesday, June 22, 2021 at 9:54:30 AM UTC+1, Thomas Koenig wrote:
> > Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:
> > While C was an amazing language design for its time and especially
> > for the hardware constraints of the machine it was developed for,
> > some of its features have not aged well. Null-terminated strings
> > are one of these features.
> it isn't going away. MSRPC (aka DCE/RPC) uses length-specifiers: first heavy cost, a 16 bit integer added to every string. a zero byte string (empty) is therefore 2 bytes. second: if strings are over 65535 bytes *you can't have them* you are forced to use a 4 byte length encoder, now you have a *4* byte overhead for short strings...want to do both dynamically with escape sequencing? that strncpy in microcode is looking reeeal attractive by comparison, eh?
>
> then also you are forgetting that UTF8 is the de-facto web internet encoding format, and that is a whole new ballgame for which specialist opcodes are well known (except by me, sigh, i just heard about them)
>
> > I wouldn't try to implement those in hardware. The mem* functions,
> > however, are fair game (and probably already covered by the
> > MM instruction).
>
> it is seriously worthwhile looking up "speculative load" that has been introduced into SVE, RVV and SVP64.
>
> these are suited to Horizontal-first Vector ISAs although could likely be adapted to VVM Vertical-First.
>
> basically they say "when doing a LOAD the length may be ARBITRARILY TRUNCATED to the number of elements that would succeed WITHOUT a page fault"
>
> or to anything the hardware feels like as long as it's one or more.
>
> with VL now truncated the remaining Vector ops can operate safely. some of these will be "find first zero" typically using cntlz (inverted) on a parallel cmpeqz
>
> overall in RVV you end up with a stunning 13 instructions, where the hardware is free to choose the number of Lanes, could be 1 could be 4 could be 64 could be 10,000.
>
> strlen, similar size, note the fail-first load:
> https://github.com/gsauthof/riscv/blob/master/strlen.s
>
> on VVM, the Vertical-First ISA, the number of elements to be SIMDified at the hardware backend would be determined by the first LD operation.
<
My 66000 VVM compiled code:
<
GLOBAL strlen
ENTRY strlen
strlen:
MOV R2,#0
VEC R4,{R2}
LDUB R3,[R1+R2]
LOOP T,R2,R3!=#0 // LOOP type 3
MOV R1,R2
RET
>
> the hardware *starts out* with the *intention* of performing a parallel execution of say loading 64 bytes simultaneously, the number of parallel elements having been arbitrarily set to 64.
>
> however at the FFIRSTed LD it goes, "wwwhoops, actually this is totally misaligned, and i can only load 12 bytes without a page fault"
>
> so the hardware goes, "ok i know i meant to load 64 elements but actually i'm only gonna do 12".
>
> now a quick and dirty hack way of doing this is to create a "fake" (hidden) predicate mask, which is implicitly ANDed with all VVM Loop Vector ops following the FFIRSTed LD.
>
> it starts off at 0b111111111111...1111 and if say the 13th byte would segfault, then this fake hidden predicate mask would be set to 0b0000000111111111111.
<
These shenanigans are why CRAY vectors are not good for string handling..........
>
> that way, the hardware VVM loop detector need not try desperately to back out of any decisions, it simply automatically ANDs that hidden mask into all Vector operations up until the end of the loop. it also only increments the loop counter by 12 not 64.
>
> 10 years ago, Power ISA retired stringcopy instructions they added in 1994, this should tell you everything you need to know about whether they're a good idea to add in 2021.
>
> l.

Re: Proposal for Single instructions for string library functions on My 66000

<5a1b1325-c43d-474c-a79a-c3884519464fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18673&group=comp.arch#18673

 by: luke.l...@gmail.com - Tue, 13 Jul 2021 01:10 UTC

On Tuesday, July 13, 2021 at 1:54:55 AM UTC+1, MitchAlsup wrote:

> My 66000 VVM compiled code:
> <
> GLOBAL strlen
> ENTRY strlen
> strlen:
> MOV R2,#0
> VEC R4,{R2}
> LDUB R3,[R1+R2]
> LOOP T,R2,R3!=#0 // LOOP type 3
> MOV R1,R2
> RET

how many bytes can be parallel-loaded by LDUB being autovectorised?
(and, what's a loop type 3? :) )

is the possibility of performing more than one byte-load terminated by the use of R3!=0?

> > it starts off at 0b111111111111...1111 and if say the 13th byte would segfault, then this fake hidden predicate mask would be set to 0b0000000111111111111.
> <
> These shenanigans are why CRAY vectors are not good for string handling.........

the original one? absolutely agree. with SVE / RVV / SVP64 FFIRST.Load i respectfully disagree.

the idea there of a hidden predicate mask was to add speculative FFIRST.LOAD to *VVM* (Vertical-First), as a first cut, illustrating the basic principle.

for Cray-style (Horizontal-First) there are no "shenanigans": VL is truncated, there and then. subsequent instructions *automatically* operate on only the data that was LOADed.

that "fake predicate" was a first cut of an idea of how to add the same concept to VVM.

l.

Re: Proposal for Single instructions for string library functions on My 66000

<sciq9u$mgm$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18676&group=comp.arch#18676

 by: Stephen Fuld - Tue, 13 Jul 2021 01:29 UTC

On 7/12/2021 5:17 PM, luke.l...@gmail.com wrote:
> On Tuesday, June 22, 2021 at 9:54:30 AM UTC+1, Thomas Koenig wrote:
>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:
>
>> While C was an amazing language design for its time and especially
>> for the hardware constraints of the machine it was developed for,
>> some of its features have not aged well. Null-terminated strings
>> are one of these features.

First of all, you messed up the attributions. The comments above are
Thomas Koenig's, not mine.

> it isn't going away.

I tend to agree.

MSRPC (aka DCE/RPC) uses length-specifiers: first heavy cost, a 16
bit integer added to every string. a zero byte string (empty) is
therefore 2 bytes. second: if strings are over 65535 bytes *you can't
have them* you are forced to use a 4 byte length encoder, now you have a
*4* byte overhead for short strings...want to do both dynamically with
escape sequencing? that strncpy in microcode is looking reeeal
attractive by comparison, eh?

Ignoring the escape and multi-byte characters, I have seen systems that
support an encoded length to handle exactly that problem. For example,
if the first byte is zero, the length is zero. If the high order bit of
the first byte is zero, the remaining seven bits encode the length (up
to 127). If the first bit is one but the second bit is zero, the remaining
14 bits in the first two bytes specify the length, up to 16K. If the
first two bits are one, you use the remaining 22 bits in the first three
bytes for very long strings. So, compared to null terminated strings,
for pretty short strings, the additional storage overhead is zero, and
it is one extra byte for strings between 127 and 16K. Overall, pretty
negligible.
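
As a rough sketch of the scheme just described (not taken from any particular system), the header can be encoded and decoded as below. The function names and the big-endian byte order within the header are assumptions made for illustration.

```python
def encode_length(n):
    """Encode a string length as a 1-, 2- or 3-byte prefix.

    Assumed layout: 0xxxxxxx for 0..127, 10xxxxxx + 1 byte for up to
    14 bits, 11xxxxxx + 2 bytes for up to 22 bits, most significant
    bits first.
    """
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < (1 << 7):
        return bytes([n])
    if n < (1 << 14):
        return bytes([0x80 | (n >> 8), n & 0xFF])
    if n < (1 << 22):
        return bytes([0xC0 | (n >> 16), (n >> 8) & 0xFF, n & 0xFF])
    raise ValueError("length does not fit in 22 bits")


def decode_length(buf):
    """Return (length, header_size) for a buffer starting with a prefix."""
    b0 = buf[0]
    if (b0 & 0x80) == 0:                 # high bit clear: 7-bit length
        return b0, 1
    if (b0 & 0x40) == 0:                 # bits 10: 14-bit length
        return ((b0 & 0x3F) << 8) | buf[1], 2
    # bits 11: 22-bit length
    return ((b0 & 0x3F) << 16) | (buf[1] << 8) | buf[2], 3
```

An empty string costs one header byte, the same as the terminator it replaces, and the second header byte only appears once the length exceeds 127.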

Easier to use than null terminated for some operations, harder for
others. Typical engineering trade off.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Proposal for Single instructions for string library functions on My 66000

<4759e6c1-02fa-4a56-9582-d6839137f9e6n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18680&group=comp.arch#18680

 by: MitchAlsup - Tue, 13 Jul 2021 01:47 UTC

On Monday, July 12, 2021 at 8:10:22 PM UTC-5, luke.l...@gmail.com wrote:
> On Tuesday, July 13, 2021 at 1:54:55 AM UTC+1, MitchAlsup wrote:
>
> > My 66000 VVM compiled code:
> > <
> > GLOBAL strlen
> > ENTRY strlen
> > strlen:
> > MOV R2,#0
> > VEC R4,{R2}
> > LDUB R3,[R1+R2]
> > LOOP T,R2,R3!=#0 // LOOP type 3
> > MOV R1,R2
> > RET
> how many bytes can be parallel-loaded by LDUB being autovectorised?
<
Cache width per cycle. A small machine will be able to do 16 iterations of the loop
per cycle while larger machines may be able to do 32-to-64 per cycle.
<
> (and, what's a loop type 3? :) )
<
The LOOP instruction has 32 sub-variants: type 1 is a purely counted loop, type 2 is a
purely data-terminated loop with increments, and type 3 is counted and data==0 terminated.
The syntax of the LOOP instruction was a bit hard to read/parse by eye, so we had the
compiler spit out a token which simplifies the job.
>
> is the possibility of performing more than one byte-load terminated by the use of R3!=0?
<
No, each lane of the loop performs this on its byte and the loop controller assimilates it
into what smells like a VL register but it actually goes into the LOOP instruction (which
is a branch instruction) which decides if the loop terminates, and which values are to
be left in the scalar registers.
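
A loose software model of that behaviour, purely as a sketch: each "lane" tests its own byte, and a stand-in for the loop controller combines the per-lane results to decide whether to exit and what index to leave behind. The function name, the default width of 16 and the assumption that the index ends up pointing at the zero byte are illustrative only and are not a statement of My 66000 semantics.

```python
def vvm_strlen_model(mem, base=0, width=16):
    """Model a lane-parallel strlen: `width` bytes are examined per pass."""
    r2 = 0                                   # plays the role of index R2
    while True:
        addrs = range(base + r2, min(base + r2 + width, len(mem)))
        if not addrs:
            raise ValueError("no terminating zero byte in mem")
        # Each lane loads its own byte and evaluates "byte != 0" on it.
        lane_nonzero = [mem[a] != 0 for a in addrs]
        # The loop controller gathers the lane results: the first lane
        # whose condition failed ends the loop and fixes the final index.
        if not all(lane_nonzero):
            return r2 + lane_nonzero.index(False)
        r2 += len(lane_nonzero)
```

The result does not depend on width; a wider machine simply needs fewer passes.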
<
> > > it starts off at 0b111111111111...1111 and if say the 13th byte would segfault, then this fake hidden predicate mask would be set to 0b0000000111111111111.
> > <
> > These shenanigans are why CRAY vectors are not good for string handling..........
<
> the original one? absolutely agree. with SVE / RVV / SVP64 FFIRST.Load i respectfully disagree.
<
Your code size seems to be about 2× what My 66000 code size happens to be.
>
> the idea there of a hidden predicate mask was to add speculative FFIRST.LOAD to *VVM* (Vertical-First), as a first cut, illustrating the basic principle.
>
> for Cray-style (Horizontal-First) there are no "shenanigans": VL is truncated, there and then. subsequent instructions *automatically* operate on only the data that was LOADed.
>
> that "fake predicate" was a first cut of an idea of how to add the same concept to VVM.
>
> l.

Re: Proposal for Single instructions for string library functions on My 66000

<c0c797c8-2eb8-4389-88c7-8a9ed441d742n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18682&group=comp.arch#18682

 by: luke.l...@gmail.com - Tue, 13 Jul 2021 01:55 UTC

On Tuesday, July 13, 2021 at 2:29:36 AM UTC+1, Stephen Fuld wrote:

> First of all, you messed up the attributions. The comments above are
> Thomas Koenig's, not mine.

apologies, i am using a very small device and the google groups interface is awesomely dire. it converts HTML formatted reply indentation into "nothingness" (i.e. ignores them entirely). this ignoringness basically removes one level of reply attribution, without people's knowledge or consent. it's done it again to my replies.

your replies are now attributed to me, thanks to google. if you were to switch to plaintext replies rather than "rich text" it would not mess up.

> > it isn't going away.
> I tend to agree.
> MSRPC (aka DCE/RPC) uses length-specifiers: first heavy cost, a 16

(see? you used HTML reply, and now my reply is attributed to you, according to google. this is because you use an HTML formatted mailer)

> bit integer added to every string. a zero byte string (empty) is
> therefore 2 bytes. second: if strings are over 65535 bytes *you can't
> have them* you are forced to use a 4 byte length encoder, now you have a
> *4* byte overhead for short strings...want to do bith dynamicalky with
> escaoe sequencing? that strncpy in microcode is looking reeeal
> attractive by comparison, ehn?

i recognise my style (and bad phone peck spelling) so the next para must be yours.

> Ignoring the escape and multi-byte characters, I have seen systems that
> support an encoded length to handle exactly that problem. For example,
> if the first byte is zero, the length is zero. If the high order bit of
> the first byte is zero, the remaining seven bits encode the length (up
> to 127). If the first bit is one but the second bit is zero, the remaining
> 14 bits in the first two bytes specify the length, up to 16K. If the
> first two bits are one, you use the remaining 22 bits in the first three
> bytes for very long strings. So, compared to null terminated strings,
> for pretty short strings, the additional storage overhead is zero, and
> it is one extra byte for strings between 127 and 16K. Overall, pretty
> negligible.

in storage space, yes. in terms of implementing in hardware, as a special
microcoded operation, quite risky. if the hardware microcode could be
*defined* (a la Transmeta) i would say that was a route worth pursuing.

an example we are seriously considering proposing to OPF is javascript
style FP to INT rounding. it has modulo 2^32 in the actual specification.
whoever thought that it was a good idea to convert FP to INT by doing
modulo arithmetic instead of saturation (like any sensible rounding would)
needs their frickin head examined.
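
to make the difference concrete, here is a small sketch of the two behaviours. js_to_int32 follows the ECMAScript ToInt32 steps (truncate toward zero, wrap modulo 2^32, reinterpret as signed); saturating_to_int32 is a hypothetical comparison point, and its choice of 0 for NaN is an assumption, since saturating conversions differ on that.

```python
import math


def js_to_int32(x):
    """ECMAScript-style ToInt32 on a Python float."""
    if math.isnan(x) or math.isinf(x):
        return 0
    n = int(x) % (1 << 32)       # int() truncates toward zero; % wraps
    return n - (1 << 32) if n >= (1 << 31) else n


def saturating_to_int32(x):
    """Clamp to the int32 range instead of wrapping (NaN -> 0 assumed)."""
    if math.isnan(x):
        return 0
    if x >= 2**31 - 1:
        return 2**31 - 1
    if x <= -2**31:
        return -2**31
    return int(x)
```

for 1e10 the wrapping version gives 1410065408 while the saturating one gives 2147483647.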

typical implementations of FP to INT rounding are FORTY FIVE
instructions and involve FIVE branches.

in hardware however it is far simpler, and, here's the kicker, javascript
is hardly likely to drop this or change the spec any time soon.

consequently as a high profile stable and expensive operation it is
easy to justify adding.

various string routines, not standardised, very tricky.

l.

Re: Proposal for Single instructions for string library functions on My 66000

<ead1640f-ef19-44fa-8b67-949a7edd9f64n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18683&group=comp.arch#18683

 by: luke.l...@gmail.com - Tue, 13 Jul 2021 02:04 UTC

On Tuesday, July 13, 2021 at 2:47:27 AM UTC+1, MitchAlsup wrote:
> On Monday, July 12, 2021 at 8:10:22 PM UTC-5, luke.l...@gmail.com wrote:

> > is the possibility of performing more than one byte-load terminated by the use of R3!=0?
> <
> No, each lane of the loop performs this on its byte and the loop controller assimilates it
> into what smells like a VL register but it actually goes into the LOOP instruction (which
> is a branch instruction) which decides if the loop terminates, and which values are to
> be left in the scalar registers.

niice. i have to assimilate this. effectively it automatically incorporates the set-before-first capability into LOOP.

l.

Re: Proposal for Single instructions for string library functions on My 66000

<jwvk0luzz8r.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=18684&group=comp.arch#18684

 by: Stefan Monnier - Tue, 13 Jul 2021 03:22 UTC

>> > it isn't going away.

Agreed, tho zero-terminated strings are rarely important for performance
nowadays I'd think.

>> bit integer added to every string. a zero byte string (empty) is
>> therefore 2 bytes. second: if strings are over 65535 bytes *you can't
>> have them* you are forced to use a 4 byte length encoder, now you have a
>> *4* byte overhead for short strings...

Nowadays you have to accommodate strings longer than 4GB, so you'll want
64bit for the length. In Emacs, a string object comes with 2
length fields, both of them `ptrdiff_t` (one of them is for the length
in bytes, the other for the length in "characters").
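
To illustrate why the two lengths differ, here is a tiny sketch. It uses UTF-8 as a stand-in encoding, which is an assumption for the example and not a description of Emacs internals; string_lengths is a made-up name.

```python
def string_lengths(s):
    """Return (length in bytes, length in characters) for s in UTF-8."""
    return len(s.encode("utf-8")), len(s)


# "na\u00efve" has 5 characters, one of which takes 2 bytes in UTF-8.
byte_len, char_len = string_lengths("na\u00efve")
```

For pure ASCII text the two values coincide, so the second field only carries extra information once multi-byte characters appear.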

We haven't made an effort to assess whether it was the optimal/best choice,
but small strings are never the dominant factor in total heap size.

> in storage space, yes. in terms of implementing in hardware, as a special
> microcoded operation, quite risky. if the hardware microcode could be
> *defined* (a la Transmeta) i would say that was a route worth pursuing.

Agreed, I don't see much need for special hardware support for string
processing (whether zero-terminated or with a specified length).
If it's a significant part of your total processing time, there's a good
chance that the better way to speed it up is to use a different
representation rather than to try and squeeze a few extra percent from
the hardware.

> typical implementations of FP to INT rounding are FORTY FIVE
> instructions and involve FIVE branches.

But JS compilers should rarely need to generate such code, because they
should do enough analysis that in 99% of the cases they know that the
inputs were themselves integers and can do the whole computation without
any FP ops at all.

> in hardware however it is far simpler, and, here's the kicker, javascript
> is hardly likely to drop this or change the spec any time soon.

But JS compilers will still want to use integer ops instead of FP ops
whenever possible even if you can make this operation a bit faster, so
I'd expect this special hardware support to be largely unused.

Stefan

Re: Proposal for Single instructions for string library functions on My 66000

<scjd0o$dso$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18692&group=comp.arch#18692

 by: Thomas Koenig - Tue, 13 Jul 2021 06:48 UTC

luke.l...@gmail.com <luke.leighton@gmail.com> schrieb:
> On Tuesday, June 22, 2021 at 9:54:30 AM UTC+1, Thomas Koenig wrote:
>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:
>
>> While C was an amazing language design for its time and especially
>> for the hardware constraints of the machine it was developed for,
>> some of its features have not aged well. Null-terminated strings
>> are one of these features.
>
> it isn't going away. MSRPC (aka DCE/RPC) uses length-specifiers:
> first heavy cost, a 16 bit integer added to every string.

Unless a lot of your strings are very small, that should not be
an issue.

And, of course, this means that you cannot use strings to
store arbitrary data.

You need a 64-bit (or, less often these days, a 32-bit) pointer
to the string anyway.

>a zero
> byte string (empty) is therefore 2 bytes. second: if strings are
> over 65535 bytes *you can't have them* you are forced to use a 4
> byte length encoder, now you have a *4* byte overhead for short
> strings...want to do both dynamically with escape sequencing?

Naw, just put an 8-byte length on the string and be done with it.
This is what gfortran does (switched from 4-byte string length
some time ago).

If you want to play games, put the string data for short strings
into a union with the metadata, as is done for C++ strings
(minus the NULL terminator of course).

> that
> strncpy in microcode is looking reeeal attractive by comparison,
> eh?

Not really.

> then also you are forgetting that UTF8 is the de-facto web
> internet encoding format, and that is a whole new ballgame for
> which specialist opcodes are well known (except by me, sigh,
> i just heard about them)

Which specialized opcodes for which architecture?

Re: Proposal for Single instructions for string library functions on My 66000

<c8b3670c-d998-4808-87cc-ad8ee43ad100n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18704&group=comp.arch#18704

 by: MitchAlsup - Tue, 13 Jul 2021 16:12 UTC

On Monday, July 12, 2021 at 10:22:13 PM UTC-5, Stefan Monnier wrote:
> >> > it isn't going away.
> Agreed, tho zero-terminated strings are rarely important for performance
> nowadays I'd think.
> >> bit integer added to every string. a zero byte string (empty) is
> >> therefore 2 bytes. second: if strings are over 65535 bytes *you can't
> >> have them* you are forced to use a 4 byte length encoder, now you have a
> >> *4* byte overhead for short strings...
> Nowadays you have to accommodate strings longer than 4GB, so you'll want
> 64bit for the length. In Emacs, a string object comes with 2
> length fields, both of them `ptrdiff_t` (one of them is for the length
> in bytes, the other for the length in "characters").
>
> We haven't made an effort to assess whether it was the optimal/best choice,
> but small strings are never the dominant factor in total heap size.
> > in storage space, yes. in terms of implementing in hardware, as a special
> > microcoded operation, quite risky. if the hardware microcode could be
> > *defined* (a la Transmeta) i would say that was a route worth pursuing.
> Agreed, I don't see much need for special hardware support for string
> processing (whether zero-terminated or with a specified length).
> If it's a significant part of your total processing time, there's a good
> chance that the better way to speed it up is to use a different
> representation rather than to try and squeeze a few extra percent from
> the hardware.
<
> > typical implementations of FP to INT rounding are FORTY FIVE
> > instructions and involve FIVE branches.
<
A) There are 48 ways to do {float,double}FP<->INT{signed,unsigned}
B) there are instructions in My 66000 to do any and all of them.
<
> But JS compilers should rarely need to generate such code, because they
> should do enough analysis that in 99% of the cases they know that the
> inputs were themselves integers and can do the whole computation without
> any FP ops at all.
> > in hardware however it is far simpler, and, here's the kicker, javascript
> > is hardly likely to drop this or change the spec any time soon.
> But JS compilers will still want to use integer ops instead of FP ops
> whenever possible even if you can make this operation a bit faster, so
> I'd expect this special hardware support to be largely unused.
>
>
> Stefan

Re: Proposal for Single instructions for string library functions on My 66000

<sclbk4$dsb$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=18727&group=comp.arch#18727

From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Tue, 13 Jul 2021 17:37:23 -0700
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <sclbk4$dsb$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de>
<c1c7689e-2cd8-4648-a2e4-b489c0cf9767n@googlegroups.com>
<sciq9u$mgm$1@dont-email.me>
<c0c797c8-2eb8-4389-88c7-8a9ed441d742n@googlegroups.com>
<jwvk0luzz8r.fsf-monnier+comp.arch@gnu.org>
<c8b3670c-d998-4808-87cc-ad8ee43ad100n@googlegroups.com>
In-Reply-To: <c8b3670c-d998-4808-87cc-ad8ee43ad100n@googlegroups.com>
 by: Ivan Godard - Wed, 14 Jul 2021 00:37 UTC

On 7/13/2021 9:12 AM, MitchAlsup wrote:
> On Monday, July 12, 2021 at 10:22:13 PM UTC-5, Stefan Monnier wrote:
>>>>> it isn't going away.
>> Agreed, tho zero-terminated strings are rarely important for performance
>> nowadays I'd think.
>>>> bit integer added to every string. a zero byte string (empty) is
>>>> therefore 2 bytes. second: if strings are over 65535 bytes *you can't
>>>> have them* you are forced to use a 4 byte length encoder, now you have a
>>>> *4* byte overhead for short strings...
>> Nowadays you have to accommodate strings longer than 4GB, so you'll want
>> 64bit for the length. In Emacs, a string object comes with 2
>> length fields, both of them `ptrdiff_t` (one of them is for the length
>> in bytes, the other for the length in "characters").
>>
>> We haven't made an effort to assess whether it was the optimal/best choice,
>> but small strings are never the dominant factor in total heap size.
>>> in storage space, yes. in terms of implementing in hardware, as a special
>>> microcoded operation, quite risky. if the hardware microcode could be
>>> *defined* (a la Transmeta) i would say that was a route worth pursuing.
>> Agreed, I don't see much need for special hardware support for string
>> processing (whether zero-terminated or with a specified length).
>> If it's a significant part of your total processing time, there's a good
>> chance that the better way to speed it up is to use a different
>> representation rather than to try and squeeze a few extra percents from
>> the hardware.
> <
>>> typical implementations of FP to INT rounding are FORTY FIVE
>>> instructions and involve FIVE branches.
> <
> A) There are 48 ways to do {float,double}FP<->INT{signed,unsigned}
> B) there are instructions in My 66000 to do any and all of them.

48 ways? Two FP radices, five FP rounding modes, two int signednesses, four
integer overflow behaviors. That's 80. If you are changing width in the
process (we don't) then it's 4 FP widths X 4 integer widths, which
explodes to much more than 48 and even if only {float, double}->single
sized int it runs the count to 160 cases. You are probably not dealing
with decimal, so we're back to 80, but not 48.

How do you get 48?
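
For readers checking the arithmetic, the tally in this post can be written out as a product. This is only a sketch of the counting as stated here; the constant and function names are made up for illustration and do not come from any ISA document.

```c
#include <assert.h>

/* Dimensions exactly as listed in the post; the names are illustrative. */
enum {
    FP_RADICES         = 2,  /* binary and decimal */
    FP_ROUNDING_MODES  = 5,
    INT_SIGNEDNESSES   = 2,  /* signed and unsigned */
    OVERFLOW_BEHAVIORS = 4,
    FP_SOURCE_WIDTHS   = 2   /* float and double */
};

/* Same-width conversions: 2 * 5 * 2 * 4. */
unsigned same_width_cases(void) {
    return FP_RADICES * FP_ROUNDING_MODES * INT_SIGNEDNESSES * OVERFLOW_BEHAVIORS;
}

/* {float, double} -> one integer size doubles the count. */
unsigned two_source_width_cases(void) {
    return same_width_cases() * FP_SOURCE_WIDTHS;
}

/* Dropping decimal leaves a single radix, which halves the count again. */
unsigned binary_only_cases(void) {
    return two_source_width_cases() / FP_RADICES;
}

/* Checks the two figures quoted in the post. */
void check_post_figures(void) {
    assert(same_width_cases() == 80);
    assert(two_source_width_cases() == 160);
}
```

Under these assumptions the binary-only count is 80 again, which is the number the post ends up comparing against 48.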

Re: Proposal for Single instructions for string library functions on My 66000

<18312ff6-244b-4172-9da4-aa0b281cbe80n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=18729&group=comp.arch#18729

Newsgroups: comp.arch
Date: Tue, 13 Jul 2021 17:52:51 -0700 (PDT)
In-Reply-To: <4759e6c1-02fa-4a56-9582-d6839137f9e6n@googlegroups.com>
References: <sar8dp$d9$1@dont-email.me> <sas8g4$77t$1@newsreader4.netcologne.de>
<c1c7689e-2cd8-4648-a2e4-b489c0cf9767n@googlegroups.com> <b02d9d10-9cd4-4263-97b0-43ccf6cfd335n@googlegroups.com>
<5a1b1325-c43d-474c-a79a-c3884519464fn@googlegroups.com> <4759e6c1-02fa-4a56-9582-d6839137f9e6n@googlegroups.com>
Message-ID: <18312ff6-244b-4172-9da4-aa0b281cbe80n@googlegroups.com>
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
From: luke.lei...@gmail.com (luke.l...@gmail.com)
 by: luke.l...@gmail.com - Wed, 14 Jul 2021 00:52 UTC

On Tuesday, July 13, 2021 at 2:47:27 AM UTC+1, MitchAlsup wrote:

> Your code size seems to be about 2× what My 66000 code size happens to be.

yes. i am both impressed and also now challenged to unpack this. so, walking through it:

strlen:
MOV R2,#0
VEC R4,{R2}
LDUB R3,[R1+R2]
LOOP T,R2,R3!=#0 // LOOP type 3
MOV R1,R2
RET

let us assume that the hardware is a multi-issue precise-capable microarchitecture capable of doing 32 LDUBs per loop. given this is actually only 4 64-bit LDs it is not unreasonable.

let us assume that the Vec hardware attempts all 32. also that byte 17 crosses a page boundary and in speculative load terms throws a page fault

if the VVM hardware at this point says "ok, only 17 are possible, cancel the other 15 elements" by pulling their Shadow "Die" flag, we have unwound the speculative in-flight execution to the 17 element mark.

the internal loop counter can then also unwind to 17.

also, the 15 speculatively-running LOOP R3!=0 tests can *also be Shadow cancelled*.

this means that although the Loop counter started out at 32 *it is safe to truncate to 17* all based on simple application of Precise Exception Shadow handling of speculative execution.

although i probably have some details a bit fuzzy, is this basically how VVM works?

because if so it *already has FFIRST.LOAD behaviour inherently built-in* without an explicit ffirst load instruction being needed.
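
To make the truncation step concrete, here is a minimal sketch in C of the bookkeeping described above: a group of byte loads is attempted, and only the lanes below the first faulting address survive. The function name, and the idea of passing the first unmapped offset as a plain number, are assumptions for illustration; this models the counting only, not real VVM hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the lanes cover offsets base .. base+lanes-1, and
 * every offset >= first_unmapped would raise a page fault. Returns how many
 * lanes keep their results; the remaining lanes are the cancelled ones. */
size_t surviving_lanes(size_t base, size_t first_unmapped, size_t lanes) {
    assert(lanes > 0);
    if (base >= first_unmapped)
        return 0;                         /* even the first load faults */
    size_t reachable = first_unmapped - base;
    return reachable < lanes ? reachable : lanes;
}
```

With the numbers from the example, `surviving_lanes(0, 17, 32)` gives 17, so the other 15 lanes are the ones that get cancelled.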

l.

Re: Proposal for Single instructions for string library functions on My 66000

<251feb87-86b3-42f8-a250-829e321cae36n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=18731&group=comp.arch#18731

Newsgroups: comp.arch
Date: Tue, 13 Jul 2021 19:00:09 -0700 (PDT)
In-Reply-To: <sclbk4$dsb$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me> <sas8g4$77t$1@newsreader4.netcologne.de>
<c1c7689e-2cd8-4648-a2e4-b489c0cf9767n@googlegroups.com> <sciq9u$mgm$1@dont-email.me>
<c0c797c8-2eb8-4389-88c7-8a9ed441d742n@googlegroups.com> <jwvk0luzz8r.fsf-monnier+comp.arch@gnu.org>
<c8b3670c-d998-4808-87cc-ad8ee43ad100n@googlegroups.com> <sclbk4$dsb$1@dont-email.me>
Message-ID: <251feb87-86b3-42f8-a250-829e321cae36n@googlegroups.com>
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Wed, 14 Jul 2021 02:00 UTC

On Tuesday, July 13, 2021 at 7:37:27 PM UTC-5, Ivan Godard wrote:
> On 7/13/2021 9:12 AM, MitchAlsup wrote:
> > On Monday, July 12, 2021 at 10:22:13 PM UTC-5, Stefan Monnier wrote:
> >>>>> it isn't going away.
> >> Agreed, tho zero-terminated strings are rarely important for performance
> >> nowadays I'd think.
> >>>> bit integer added to every string. a zero byte string (empty) is
> >>>> therefore 2 bytes. second: if strings are over 65535 bytes *you can't
> >>>> have them* you are forced to use a 4 byte length encoder, now you have a
> >>>> *4* byte overhead for short strings...
> >> Nowadays you have to accommodate strings longer than 4GB, so you'll want
> >> 64bit for the length. In Emacs, a string object comes with 2
> >> length fields, both of them `ptrdiff_t` (one of them is for the length
> >> in bytes, the other for the length in "characters").
> >>
> >> We haven't made an effort to assess whether it was the optimal/best choice,
> >> but small strings are never the dominant factor in total heap size.
> >>> in storage space, yes. in terms of implementing in hardware, as a special
> >>> microcoded operation, quite risky. if the hardware microcode could be
> >>> *defined* (a la Transmeta) i would say that was a route worth pursuing.
> >> Agreed, I don't see much need for special hardware support for string
> >> processing (whether zero-terminated or with a specified length).
> >> If it's a significant part of your total processing time, there's a good
> >> chance that the better way to speed it up is to use a different
> >> representation rather than to try and squeeze a few extra percents from
> >> the hardware.
> > <
> >>> typical implementations of FP to INT rounding are FORTY FIVE
> >>> instructions and involve FIVE branches.
> > <
> > A) There are 48 ways to do {float,double}FP<->INT{signed,unsigned}
> > B) there are instructions in My 66000 to do any and all of them.
> 48 ways? Two FP radices, five FP rounding modes, two int signednesses, four
> integer overflow behaviors. That's 80. If you are changing width in the
> process (we don't) then it's 4 FP widths X 4 integer widths, which
> explodes to much more than 48 and even if only {float, double}->single
> sized int it runs the count to 160 cases. You are probably not dealing
> with decimal, so we're back to 80, but not 48.
>
> How do you get 48?
<
There are 5 ways to round Double into UnSigned, Signed, or Float; each.
There are 5 ways to round Float into UnSigned or Signed; each.
There are 5 ways to round UnSigned into Double or Float; each
There are 5 ways to round Signed into Double or Float; each
And there are assignment conversions:
UnSigned = Signed,
Signed = UnSigned , and
Double = Float.
<
15+10+10+10+3 = 48
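
The same tally can be reproduced by listing the conversions in a table and counting five variants for each rounded conversion and one for each assignment conversion. This is a sketch of the counting only; the struct, table, and function names are invented for illustration and say nothing about how the instructions are actually encoded.

```c
#include <assert.h>
#include <stddef.h>

#define ROUNDING_MODES 5

typedef struct {
    const char *from;
    const char *to;
    int rounded;   /* 1: one variant per rounding mode, 0: a single variant */
} conversion;

/* Conversions as enumerated in the post ("X = Y" read as Y converted to X). */
static const conversion conversions[] = {
    {"double",   "unsigned", 1}, {"double",   "signed", 1}, {"double", "float", 1},
    {"float",    "unsigned", 1}, {"float",    "signed", 1},
    {"unsigned", "double",   1}, {"unsigned", "float",  1},
    {"signed",   "double",   1}, {"signed",   "float",  1},
    {"signed",   "unsigned", 0}, {"unsigned", "signed", 0}, {"float", "double", 0},
};

/* Sum the variants over the whole table. */
unsigned count_variants(void) {
    size_t n = sizeof conversions / sizeof conversions[0];
    unsigned total = 0;
    assert(n == 12);   /* 9 rounded conversions + 3 assignment conversions */
    for (size_t i = 0; i < n; i++)
        total += conversions[i].rounded ? ROUNDING_MODES : 1;
    return total;
}
```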
<
The signed = unsigned is not a plain MOV: it saturates at SPOSMAX.
The unsigned = signed is not a plain MOV: it saturates negatives to zero.
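
Reading the two remarks above as "these conversions differ from a MOV because they saturate" (an interpretation, not something spelled out in the post), the behaviour can be sketched in C as follows. The function names are hypothetical, 64-bit registers are assumed, and SPOSMAX is taken to mean the largest positive signed value, i.e. `INT64_MAX` here.

```c
#include <assert.h>
#include <stdint.h>

/* unsigned = signed: negative inputs clamp to zero. */
uint64_t unsigned_from_signed(int64_t s) {
    return s < 0 ? 0u : (uint64_t)s;
}

/* signed = unsigned: inputs above the largest positive signed value
 * clamp to that value (assumed to be what SPOSMAX denotes). */
int64_t signed_from_unsigned(uint64_t u) {
    return u > (uint64_t)INT64_MAX ? INT64_MAX : (int64_t)u;
}

/* A few sample values showing where the result differs from a plain copy. */
void saturation_examples(void) {
    assert(unsigned_from_signed(-5) == 0);
    assert(unsigned_from_signed(7) == 7);
    assert(signed_from_unsigned(UINT64_MAX) == INT64_MAX);
    assert(signed_from_unsigned(42) == 42);
}
```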

Re: Proposal for Single instructions for string library functions on My 66000

<3e39d954-94ec-4b3a-ba12-8b5b5c7613a4n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=18732&group=comp.arch#18732

Newsgroups: comp.arch
Date: Tue, 13 Jul 2021 19:09:59 -0700 (PDT)
In-Reply-To: <18312ff6-244b-4172-9da4-aa0b281cbe80n@googlegroups.com>
References: <sar8dp$d9$1@dont-email.me> <sas8g4$77t$1@newsreader4.netcologne.de>
<c1c7689e-2cd8-4648-a2e4-b489c0cf9767n@googlegroups.com> <b02d9d10-9cd4-4263-97b0-43ccf6cfd335n@googlegroups.com>
<5a1b1325-c43d-474c-a79a-c3884519464fn@googlegroups.com> <4759e6c1-02fa-4a56-9582-d6839137f9e6n@googlegroups.com>
<18312ff6-244b-4172-9da4-aa0b281cbe80n@googlegroups.com>
Message-ID: <3e39d954-94ec-4b3a-ba12-8b5b5c7613a4n@googlegroups.com>
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Wed, 14 Jul 2021 02:09 UTC

On Tuesday, July 13, 2021 at 7:52:52 PM UTC-5, luke.l...@gmail.com wrote:
> On Tuesday, July 13, 2021 at 2:47:27 AM UTC+1, MitchAlsup wrote:
> > Your code size seems to be about 2× what My 66000 code size happens to be.
> yes. i am both impressed and also now challenged to unpack this. so, walking through it:
> strlen:
> MOV R2,#0
> VEC R4,{R2}
> LDUB R3,[R1+R2]
> LOOP T,R2,R3!=#0 // LOOP type 3
> MOV R1,R2
> RET
> let us assume that the hardware is a multi-issue precise-capable microarchitecture capable of doing 32 LDUBs per loop. given this is actually only 4 64-bit LDs it is not unreasonable.
<
32 bytes per cycle read is not unreasonable for a higher end machine--this may take 2 or even 4
cache banks to be performed in 1 cycle.
>
> let us assume that the Vec hardware attempts all 32. also that byte 17 crosses a page boundary and in speculative load terms throws a page fault
>
> if the VVM hardware at this point says "ok, only 17 are possible, cancel the other 15 elements" by pulling their Shadow "Die" flag, we have unwound the speculative in-flight execution to the 17 element mark.
<
It is more like 15 of them all detect page fault and cancel themselves, letting the vector
termination sequencer clean up the mess. The 17 which got data compare to zero,
and if detected, terminate the loop with i containing the proper strlen. If no zero was
found, the page fault control transfer is taken with IP pointing at LDUB and Ri properly set.
<
After the page fault is "handled", control returns, the 17th LDUB is performed, then the LOOP
instruction is performed, which transfers control back to the VEC instruction instead of
the LDUB instruction; the vectorized loop is set up again and vectorized running continues.
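
The sequence just described (scan the lanes that got data, take the fault only if no zero was found, then resume) can be modelled in software. What follows is a toy simulation in C under loud assumptions: `toy_mem`, `handle_fault`, and `toy_strlen` are invented names, memory is a flat buffer whose tail is "unmapped" until a fake handler pages it in, and the model restarts a whole pass at the faulting byte instead of first performing one scalar LDUB and LOOP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 32u   /* byte loads attempted per vectorized pass (assumed) */

typedef struct {
    const uint8_t *bytes;  /* backing store holding the string */
    size_t size;           /* total bytes in the backing store */
    size_t mapped;         /* offsets >= mapped fault until paged in */
    size_t page;           /* bytes mapped per handled fault */
    unsigned faults;       /* faults taken so far */
} toy_mem;

/* Toy fault handler: map one more page. Returns 0 if nothing is left. */
static int handle_fault(toy_mem *m) {
    if (m->mapped >= m->size)
        return 0;
    size_t left = m->size - m->mapped;
    m->mapped += left < m->page ? left : m->page;
    m->faults++;
    return 1;
}

/* Returns the string length, or (size_t)-1 if no zero byte is reachable. */
size_t toy_strlen(toy_mem *m) {
    assert(m->page > 0 && m->mapped <= m->size);
    size_t i = 0;
    for (;;) {
        /* Lanes past the mapped region cancel themselves. */
        size_t ok = m->mapped > i ? m->mapped - i : 0;
        if (ok > LANES)
            ok = LANES;
        /* Lanes that got data compare against zero. */
        for (size_t k = 0; k < ok; k++)
            if (m->bytes[i + k] == 0)
                return i + k;      /* zero found: no fault is ever taken */
        i += ok;
        /* No zero among the valid lanes: the fault becomes architectural. */
        if (ok < LANES && !handle_fault(m))
            return (size_t)-1;
    }
}
```

The point of the model is the ordering: a fault beyond a terminating zero byte is never delivered, because the loop exits before the fault is taken.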
>
> the internal loop counter can then also unwind to 17.
>
> also, the 15 speculatively-running LOOP R3!=0 tests can *also be Shadow cancelled*.
>
> this means that although the Loop counter started out at 32 *it is safe to truncate to 17* all based on simple application of Precise Exception Shadow handling of speculative execution.
>
> although i probably have some details a bit fuzzy, is this basically how VVM works?
<
Tolerably good guess.
>
> because if so it *already has FFIRST.LOAD behaviour inherently built-in* without an explicit ffirst load instruction being needed.
<
Precisely! Now go figure out how you don't need that instruction, either.
>
> l.
