
Proposal for Single instructions for string library functions on My 66000

<sar8dp$d9$1@dont-email.me>

From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Proposal for Single instructions for string library functions on My 66000
Date: Mon, 21 Jun 2021 16:47:03 -0700
 by: Stephen Fuld - Mon, 21 Jun 2021 23:47 UTC

I was content to assume the use of VVM for most/all of the C string
library functions on Mitch's My 66000. The capabilities, especially the
ones “underneath” the ISA level to stream values from/to memory and do
multiple byte compares in a single cycle, make a huge improvement in the
performance of these functions.

However, once Mitch introduced the MM (Memory Move) instruction, which
makes a single instruction out of what would otherwise be a short VVM
sequence of instructions, that made me try to think about the issues
involved in adding single instructions to implement (perhaps some of)
the other string functions. This is what I have so far.

A few caveats.

First, IANAHG, and this is certainly not a detailed proposal. I don’t know
enough to make one. It is just some thoughts on the issue. Many of
these ideas and statements may be wrong, and of course, I welcome
corrections.

Second, I looked a little but could not find information giving the
frequency of use of each of the functions, so any ideas there are just
my, probably poor, intuition. Again, I welcome more information.

Third, I realize that Mitch added the MM instruction to speed up
structure moves, and this means higher use of the MM instruction than
strictly as a library function replacement. But this provides the
infrastructure making the implementation of similar instructions easier.
(e.g. an arbitrary number of TLB misses per instruction,
interruptible/resumable instructions, etc.) And several of the proposed
instructions have other uses too.

So, is it worth considering?

The big pro to this is performance. While the VVM solution eliminates
the cost of fetching, decoding and executing the multiple instructions
in the loop on other than the first pass, there is still more cost to
executing the several instructions in the loop once (on the first pass)
than doing so for a single instruction. However, this is usually a
one-time cost, and may not be huge when talking about a longer running
function. But if the function is, in fact, long running, then it is
more likely to encounter an interrupt, in which case the “first pass”
cost is encountered again. Similarly, a single instruction takes less
space in the instruction stream and the I-cache than a sequence of
several instructions. (Note that this assumes in-lining the function.
If it is called, there is even more overhead.) And functions like
strcat require two VVM loops to complete, although upon an interrupt,
only one of them must incur the extra cost. Again, I didn’t think this
was huge, but the fact that Mitch thought it was worthwhile enough to
eliminate made me reconsider.
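
To make the strcat point concrete, here is a minimal C sketch of the
two loops involved (find the end of the destination, then copy). The
name two_loop_strcat is hypothetical, and this is plain C, not My 66000
VVM code; each while loop merely marks where a separate VVM loop would
sit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: strcat written as its two constituent loops. */
static char *two_loop_strcat(char *dst, const char *src)
{
    char *d = dst;

    assert(dst != NULL && src != NULL);

    /* Loop 1: scan to the terminating NUL of the destination. */
    while (*d != '\0')
        d++;

    /* Loop 2: copy the source, including its terminating NUL. */
    while ((*d++ = *src++) != '\0')
        ;

    return dst;
}
```

For example, with a 16 byte buffer holding "foo", calling
two_loop_strcat(buf, "bar") leaves "foobar" in buf.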

For some functions, there are opportunities for further substantial
performance gains. (see below)

The big cons are the cost of implementing any new instruction. First,
there is the cost of the gates to do it, although for many of the
functions, they are already there, so it adds only the logic to invoke
them.

Second is the cost of additional op-codes. While there are twenty some
functions, I don’t propose anywhere close to that many op codes. I
suspect that some of the functions, especially the ones that require an
additional potential character substitution per character (e.g. the
localization functions) aren’t good candidates for single instruction
implementation. I also probably wouldn’t do the errno lookup function,
as presumably it is infrequently called and never in the critical path.

And many of the functions are essentially small modifications of others,
e.g. strcmp and strncmp. These two can be handled using a single op
code but using the Carry meta instruction to indicate “use the n
version” and specify the register that contains n. Applying this logic
to other similar cases reduces the number of op codes further. Even
where you don’t need the additional register, you could use the presence
of the Carry indicator bit to modify the instruction, e.g. strchr and
strrchr. There are several choices for how to implement this. I don’t
pretend to know the best one.
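
A rough C model of the shared op code idea, assuming a boolean flag
stands in for the Carry modifier and a size_t stands in for the register
holding n. The name str_compare and both extra parameters are made up
for illustration; what the real encoding would look like is left open
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one compare operation covering strcmp and strncmp.
 * use_n plays the role of the modifier; n is ignored when use_n is false. */
static int str_compare(const char *a, const char *b, bool use_n, size_t n)
{
    assert(a != NULL && b != NULL);

    for (size_t i = 0; !use_n || i < n; i++) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];

        if (ca != cb)
            return (ca > cb) - (ca < cb);   /* -1 or +1 */
        if (ca == '\0')
            return 0;                       /* both strings ended together */
    }
    return 0;                               /* first n bytes were equal */
}
```

With use_n false this behaves like strcmp; with use_n true it behaves
like strncmp limited to n bytes, so the two functions differ only in the
loop bound.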

Lastly, there are a number of functions, mostly the “nested loop” ones
that would gain substantial benefit from being able to use an
instruction that implements another string function as a “building
block” to speed them up, even without a dedicated op code. See below
for an example.

Combining all of these, I think you could get down to a single digit
number of new op codes for most of the desired functionality.

The “nested loop” functions are the ones, such as strpbrk that require
you to code a nested loop, the outer loop going over the first string,
the inner loop going over the second string. The code that is in the
outer loop, but not in the inner loop is just loading the next byte and
checking it for a value of zero. This will work, but there is a
performance issue. VVM loops can’t be nested. So, assuming you use VVM
for the inner loop, the outer loop will cost relatively a lot, as it
can’t use the streaming and multi byte compare capabilities of VVM.
But, if you have the single instruction that searches for a character
match in a string (strchr), you can use this single instruction (plus
perhaps the Carry modifier), as the inner loop, thus enabling you to use
VVM for the outer loop. So while you still have essentially a nested
loop (the strchr instruction is essentially a loop), you have
substantially sped up the operation.
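
In C terms, the restructuring looks roughly like this. The call to the
standard strchr stands in for the proposed single instruction, and
my_strpbrk is a hypothetical name. Whether the outer loop can really be
vectorized by VVM once the inner loop is a single instruction is the
assumption made above, not something this sketch demonstrates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration: strpbrk with the inner loop replaced by one
 * strchr call, which models the proposed strchr-like instruction. */
static char *my_strpbrk(const char *s, const char *accept)
{
    assert(s != NULL && accept != NULL);

    /* Outer loop: load the next byte of s and check it for zero. */
    for (; *s != '\0'; s++) {
        /* Inner "loop": a single search of accept for the current byte. */
        if (strchr(accept, *s) != NULL)
            return (char *)s;
    }
    return NULL;
}
```

For example, my_strpbrk("hello, world", " ,") returns a pointer to the
comma, and the function returns NULL when no byte of s occurs in accept.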

One last thing. While thinking about this, I had another idea regarding
some of these nested loop functions. I am guessing that for a
non-trivial percentage of these, the string containing the “to be looked
for” characters is, in fact, a compile time constant. An example of
this is searching text for any “white space” character. For these
cases, perhaps we can use the features built into the My 66000, together
with some things that hardware does better than software to do better
than even the method I outlined in the previous paragraph.

The idea is as follows. Let’s use strspn as an example. If the compiler
sees that the second string is a compile time constant, it builds a 256
bit map, with a one bit set for the value in each character in the
string. It then emits an instruction giving the addresses of the first
and this newly constructed second string as the two source operands. The
result operand will be used to hold the count. When the instruction is
encountered, the hardware loads the 256 bit (32 byte) bit map into one
of the available buffers. It also loads the starting bytes of the first
string into another streaming buffer. It then starts going through the
first string, using the value of each byte as a bit index into the
second string, i.e. the bit map (this ability to index with a byte value
into a 256 bit, 32 byte, map is the thing that the hardware can do
better (faster) than software). The hardware proceeds through the first
string, looking up each byte and looking for a one bit, incrementing a
counter for each byte whose bit is set and stopping at the first one
that is not. The
presence of the carry flag could be used to reverse the sense of the
test, giving strcspn. The hope is that it can do one byte per cycle, or
perhaps one byte per two cycles. In any event, it certainly is much
faster than a nested loop as it becomes a single loop. I don’t know
if the advantages of this are worth the extra implementation cost, but I
wanted to mention it.
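
A software model of the bit map scheme, as a sketch only. The type
charset256 and the functions charset_build, charset_test, span_bitmap
and span_with_set are hypothetical names, and the invert parameter
stands in for the carry flag. In the proposal the compiler would build
the map at compile time and the hardware would do the lookup; here both
steps are ordinary C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 256-bit (32-byte) map: one bit per possible byte value. */
typedef struct { uint64_t bits[4]; } charset256;

/* Models what the compiler would do for a constant second string. */
static charset256 charset_build(const char *set)
{
    charset256 m = { {0, 0, 0, 0} };

    assert(set != NULL);
    for (; *set != '\0'; set++) {
        unsigned char c = (unsigned char)*set;
        m.bits[c >> 6] |= (uint64_t)1 << (c & 63);
    }
    return m;
}

/* Models the per-byte lookup: is the bit for value c set? */
static bool charset_test(const charset256 *m, unsigned char c)
{
    return (m->bits[c >> 6] >> (c & 63)) & 1u;
}

/* invert == false gives strspn; invert == true gives strcspn. */
static size_t span_bitmap(const char *s, const charset256 *m, bool invert)
{
    size_t n = 0;

    assert(s != NULL && m != NULL);
    while (s[n] != '\0' && charset_test(m, (unsigned char)s[n]) != invert)
        n++;
    return n;
}

/* Convenience wrapper that builds the map at run time. */
static size_t span_with_set(const char *s, const char *set, bool invert)
{
    charset256 m = charset_build(set);
    return span_bitmap(s, &m, invert);
}
```

The nested loop is gone: each byte of the first string costs one
indexed bit test, regardless of how many characters the second string
held.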

Let me conclude by re-emphasizing that this whole idea (single
instructions for string functions) might not make sense, or might not be
worthwhile, or it might be the wrong way to implement the functionality,
etc. But I want to present it to get reactions and potential improvements.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Proposal for Single instructions for string library functions on My 66000

<b000a2bf-af34-4f2e-a41b-756a987275a5n@googlegroups.com>

Newsgroups: comp.arch
Date: Mon, 21 Jun 2021 17:26:48 -0700 (PDT)
In-Reply-To: <sar8dp$d9$1@dont-email.me>
Subject: Re: Proposal for Single instructions for string library functions on My 66000
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Tue, 22 Jun 2021 00:26 UTC

Stephen has done a good job of lining up the pros and cons of converting
well known libraries into instructions; in fact, I have done so as well, in the case
of transcendental instructions.
<
I did fret about putting MM in My 66000 ISA.
<
I did fret about leaving some of the str* and mem* functions out of ISA.
<
By incorporating these functions into ISA you invoke the near necessity of
function unit microcode. Each of these functions has different sequencing
rules and necessities, and many of the sub-cases are sub-sets of each other.
For these kinds of sequences, a microcode sequencer is de rigueur. Today
the name microcode has a bad taste in the minds of the almost knowing
and almost understanding. Just the marketing would be an uphill struggle.
<
In the end, I left these out as I thought that getting 3×-16× performance benefits
via something that already did make it into ISA was enough. Architecture is as
much about what to leave out as it is about what to put in.


Re: Proposal for Single instructions for string library functions on My 66000

<sarihq$hv5$1@dont-email.me>

From: bage...@gmail.com (Brian G. Lucas)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on My 66000
Date: Mon, 21 Jun 2021 21:39:52 -0500
In-Reply-To: <sar8dp$d9$1@dont-email.me>
 by: Brian G. Lucas - Tue, 22 Jun 2021 02:39 UTC

On 6/21/21 6:47 PM, Stephen Fuld wrote:
> I was content to assume the use of VVM for most/all of the C string library
> functions on Mitch's My 66000.  The capabilities, especially the ones
> “underneath” the ISA level to stream values from/to memory and do multiple
> byte compares in a single cycle, make a huge improvement in the performance of
> these functions.
>
> However, once Mitch introduced the MM (Memory Move) instruction, which makes a
> single instruction out of what would otherwise be a short VVM sequence of
> instructions, that made me try to think about the issues involved in adding
> single instructions to implement (perhaps some of) the other string
> functions.  This is what I have so far.
>
> A few caveats.
>
> First, IANAHG, and this is certainly not a detailed proposal.  I don’t know
> enough to make one.  It is just some thoughts on the issue.  Many of these
> ideas and statements may be wrong, and of course, I welcome corrections.
>
> Second, I looked a little but could not find information giving the frequency
> of use of each of the functions, so any ideas there are just my, probably
> poor, intuition. Again, I welcome more information.
>
> Third, I realize that Mitch added the MM instruction to speed up structure
> moves, and this means higher use of the MM instruction than strictly as a
> library function replacement.  But this provides the infrastructure making the
> implementation of similar instructions easier. (e.g. an arbitrary number of TLB
> misses per instruction, interruptible/resumable instructions, etc.)  And
> several of the proposed instructions have other uses too.
>
> So, is it worth considering?
>
> The big pro to this is performance.  While the VVM solution eliminates the
> cost of fetching, decoding and executing the multiple instructions in the loop
> on other than the first pass, there is still more cost to executing the
> several instructions in the loop once (on the first pass) than doing so for a
> single instruction.  However, this is usually a one-time cost, and may not be
> huge when talking about a longer running function.  But if the function is, in
> fact, long running, then it is more likely to encounter an interrupt, in which
> case the “first pass” cost is encountered again.  Similarly, a single
> instruction takes less space in the instruction stream and the I-cache than a
> sequence of several instructions. (Note that this assumes in-lining the
> function. If it is called, there is even more overhead.)  And functions like
> strcat require two VVM loops to complete, although upon an interrupt, only one
> of them must incur the extra cost.  Again, I didn’t think this was huge, but
> the fact that Mitch thought it was worthwhile enough to eliminate made me
> reconsider.
>
> For some functions, there are opportunities for further substantial
> performance gains.  (see below)
>
> The big cons are the cost of implementing any new instruction.  First, there
> is the cost of the gates to do it, although for many of the functions, they
> are already there, so it adds only the logic to invoke them.
>
> Second is the cost of additional op-codes.  While there are twenty some
> functions, I don’t propose anywhere close to that many op codes.  I suspect
> that some of the functions, especially the ones that require an additional
> potential character substitution per character (e.g. the localization
> functions) aren’t good candidates for single instruction implementation.  I
> also probably wouldn’t do the errno lookup function, as presumably it is
> infrequently called and never in the critical path.
>
> And many of the functions are essentially small modifications of others, e.g.
> strcmp and strncmp.  These two can be handled using a single op code but using
> the Carry meta instruction to indicate “use the n version” and specify the
> register that contains n.  Applying this logic to other similar cases reduces
> the number of op codes further.  Even where you don’t need the additional
> register, you could use the presence of the Carry indicator bit to modify the
> instruction, e.g. strchr and strrchr.  There are several choices for how to
> implement this.  I don’t pretend to know the best one.
>
> Lastly, there are a number of functions, mostly the “nested loop” ones that
> would gain substantial benefit from being able to use an instruction that
> implements another string function as a “building block” to speed them up,
> even without a dedicated op code.  See below for an example.
>
> Combining all of these, I think you could get down to a single digit number of
> new op codes for most of the desired functionality.
>
> The “nested loop” functions are the ones, such as strpbrk that require you to
> code a nested loop, the outer loop going over the first string, the inner loop
> going over the second string.  The code that is in the outer loop, but not in
> the inner loop is just loading the next byte and checking it for a value of
> zero.  This will work, but there is a performance issue.  VVM loops can’t be
> nested.  So, assuming you use VVM for the inner loop,  the outer loop will
> cost relatively a lot, as it can’t use the streaming and multi byte compare
> capabilities of VVM. But, if you have the single instruction that searches for
> a character match in a string (strchr), you can use this single instruction
> (plus perhaps the Carry modifier), as the inner loop, thus enabling you to use
> VVM for the outer loop.  So while you still have essentially a nested loop
> (the strchr instruction is essentially a loop), you have substantially sped up
> the operation.
>
> One last thing.  While thinking about this, I had another idea regarding some
> of these nested loop functions.  I am guessing that for a non-trivial
> percentage of these, the string containing the “to be looked for” characters
> is, in fact, a compile time constant.  An example of this is searching text
> for any “white space” character.  For these cases, perhaps we can use the
> features built into the My 66000, together with some things that hardware does
> better than software to do better than even the method I outlined in the
> previous paragraph.
>
> The idea is as follows.  Let’s use strspn as an example. If the compiler sees
> that the second string is a compile time constant, it builds a 256 bit map,
> with a one bit set for the value in each character in the string.  It then
> emits an instruction giving the addresses of the first and this newly
> constructed second string as the two source operands. The result operand will
> be used to hold the count.  When the instruction is encountered, the hardware
> loads the 256 bit (32 byte) bit map into one of the available buffers.  It
> also loads the starting bytes of the first string into another streaming
> buffer.  It then starts going through the first string, using the value of
> each byte as an index into the second string (this ability to have a 256 bit
> index into a 32 byte string is the thing that the hardware can do better
> (faster) than software).  The hardware proceeds through the first string,
> looking up each byte and looking for a one bit, incrementing a counter for
> each byte.  The presence of the carry flag could be used to reverse the sense
> of the test, giving strcspn.  The hope is that it can do one byte per cycle,
> or perhaps one byte per two cycles.  In any event, it certainly is much faster
> than a nested loop as it becomes a single loop.    I don’t know if the
> advantages of this are worth the extra implementation cost, but I wanted to
> mention it.
>
> Let me conclude by re-emphasizing that this whole idea (single instructions
> for string functions) might not make sense, or might not be worthwhile, or it
> might be the wrong way to implement the functionality, etc.  But I want to
> present it to get reactions and potential improvements.
>
>
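To make the quoted "building block" point concrete, here is a rough C sketch of strpbrk written as an outer loop around strchr. It is only an illustration: strpbrk_sketch is a made-up name, and the library strchr call merely stands in for the proposed single instruction. Nothing here is actual My 66000 code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): strpbrk built on strchr.
   The outer loop only loads the next byte of s and tests it for zero;
   the whole inner search over `accept` is one strchr call, standing in
   for the proposed single instruction. */
static char *strpbrk_sketch(const char *s, const char *accept)
{
    for (; *s != '\0'; ++s) {
        /* Inner "loop": is the current byte anywhere in accept? */
        if (strchr(accept, *s) != NULL)
            return (char *)s;
    }
    return NULL; /* reached the terminator without a match */
}
```

With the inner search collapsed into one operation, the outer loop is a simple single-level loop, which is the shape the quoted text says VVM could then handle.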
The MM instruction is widely useful no matter what the source language is.

IMHO implementing the C string library instructions is "preparing for the
previous war". I think we need to wait until we see what happens with Rust
(and perhaps Go and others) and determine what string (or array) primitives
are hot spots in applications written in more modern languages.

Brian

Re: Proposal for Single instructions for string library functions on My 66000

<sas6bi$kau$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18026&group=comp.arch#18026

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Tue, 22 Jun 2021 10:17:54 +0200
Organization: A noiseless patient Spider
Lines: 163
Message-ID: <sas6bi$kau$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me> <sarihq$hv5$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 22 Jun 2021 08:17:54 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="a10ad00632d8df3b3831679e1f782f9a";
logging-data="20830"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Bsno9r3Ptt0ORJImIHDB2+iSl+ldO9sA="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.8.1
Cancel-Lock: sha1:vNkdPqmVkpjs+p3p6ethaEGUO6s=
In-Reply-To: <sarihq$hv5$1@dont-email.me>
Content-Language: en-US
 by: Marcus - Tue, 22 Jun 2021 08:17 UTC

On 2021-06-22, Brian G. Lucas wrote:
> On 6/21/21 6:47 PM, Stephen Fuld wrote:
>> I was content to assume the use of VVM for most/all of the C string
>> library functions on Mitch's My 66000.  The capabilities, especially
>> the ones “underneath” the ISA level to stream values from/to memory
>> and do multiple byte compares in a single cycle, make a huge
>> improvement in the performance of these functions.
>>
>> However, once Mitch introduced the MM (Memory Move) instruction, which
>> makes a single instruction out of what would otherwise be a short VVM
>> sequence of instructions, that made me try to think about the issues
>> involved in adding single instructions to implement (perhaps some of)
>> the other string functions.  This is what I have so far.
>>
>> A few caveats.
>>
>> First, IANAHG, and this is certainly not a detailed proposal.  I don’t
>> know enough to make one.  It is just some thoughts on the issue.  Many
>> of these ideas and statements may be wrong, and of course, I welcome
>> corrections.
>>
>> Second, I looked a little but could not find information giving the
>> frequency of use of each of the functions, so any ideas there are just
>> my, probably poor, intuition. Again, I welcome more information.
>>
>> Third, I realize that Mitch added the MM instruction to speed up
>> structure moves, and this means higher use of the MM instruction than
>> strictly as a library function replacement.  But this provides the
>> infrastructure making the implementation of similar instructions
>> easier. (i.e. arbitrary number of TLB misses per instruction,
>> interruptable/resumable instructions, etc.)  And several of the
>> proposed instructions have other uses too.
>>
>> So, is it worth considering?
>>
>> The big pro to this is performance.  While the VVM solution eliminates
>> the cost of fetching, decoding and executing the multiple instructions
>> in the loop on other than the first pass, there is still more cost to
>> executing the several instructions in the loop once (on the first
>> pass) than doing so for a single instruction.    However this is
>> usually a one time cost, and may not be huge when talking about a
>> longer running function.  But if the function is, in fact, long
>> running, then it is more likely to encounter an interrupt, in which
>> case the “first pass” cost is encountered again.  Similarly, a single
>> instruction takes less space in the instruction stream and the I-cache
>> than a sequence of several instructions. (Note that this assumes
>> in-lining the function. If it is called, there is even more
>> overhead.)  And functions like strcat require two VVM loops to
>> complete, although upon an interrupt, only one of them must incur the
>> extra cost.  Again, I didn’t think this was huge, but the fact that
>> Mitch thought it was worthwhile enough to eliminate made me reconsider.
>>
>> For some functions, there are opportunities for further substantial
>> performance gains.  (see below)
>>
>> The big cons are the cost of implementing any new instruction.  First,
>> there is the cost of the gates to do it, although for many of the
>> functions, they are already there, so it adds only the logic to invoke
>> them.
>>
>> Second is the cost of additional op-codes.  While there are twenty
>> some functions, I don’t propose anywhere close to that many op codes.
>> I suspect that some of the functions, especially the ones that require
>> an additional potential character substitution per character (e.g. the
>> localization functions) aren’t good candidates for single instruction
>> implementation.  I also probably wouldn’t do the errno lookup
>> function, as presumably it is infrequently called and never in the
>> critical path.
>>
>> And many of the functions are essentially small modifications of
>> others, e.g. strcmp and strncmp.  These two can be handled using a
>> single op code but using the Carry meta instruction to indicate “use
>> the n version” and specify the register that contains n.  Applying
>> this logic to other similar cases reduces the number of op codes
>> further.  Even where you don’t need the additional register, you could
>> use the presence of the Carry indicator bit to modify the instruction,
>> e.g. strchr and strrchr.  There are several choices for how to
>> implement this.  I don’t pretend to know the best one.
>>
>> Lastly, there are a number of functions, mostly the “nested loop” ones
>> that would gain substantial benefit from being able to use an
>> instruction that implements another string function as a “building
>> block” to speed them up, even without a dedicated op code.  See below
>> for an example.
>>
>> Combining all of these, I think you could get down to a single digit
>> number of new op codes for most of the desired functionality.
>>
>> The “nested loop” functions are the ones, such as strpbrk that require
>> you to code a nested loop, the outer loop going over the first string,
>> the inner loop going over the second string.  The code that is in the
>> outer loop, but not in the inner loop is just loading the next byte
>> and checking it for a value of zero.  This will work, but there is a
>> performance issue.  VVM loops can’t be nested.  So, assuming you use
>> VVM for the inner loop,  the outer loop will cost relatively a lot, as
>> it can’t use the streaming and multi byte compare capabilities of VVM.
>> But, if you have the single instruction that searches for a character
>> match in a string (strchr), you can use this single instruction (plus
>> perhaps the Carry modifier), as the inner loop, thus enabling you to
>> use VVM for the outer loop.  So while you still have essentially a
>> nested loop (the strchr instruction is essentially a loop), you have
>> substantially sped up the operation.
>>
>> One last thing.  While thinking about this, I had another idea
>> regarding some of these nested loop functions.  I am guessing that for
>> a non-trivial percentage of these, the string containing the “to be
>> looked for” characters is, in fact, a compile time constant.  An
>> example of this is searching text for any “white space” character.
>> For these cases, perhaps we can use the features built into the My
>> 66000, together with some things that hardware does better than
>> software to do better than even the method I outlined in the previous
>> paragraph.
>>
>> The idea is as follows.  Let’s use strspn as an example. If the
>> compiler sees that the second string is a compile time constant, it
>> builds a 256 bit map, with a one bit set for the value in each
>> character in the string.  It then emits an instruction giving the
>> addresses of the first and this newly constructed second string as the
>> two source operands. The result operand will be used to hold the
>> count.  When the instruction is encountered, the hardware loads the
>> 256 bit (32 byte) bit map into one of the available buffers.  It also
>> loads the starting bytes of the first string into another streaming
>> buffer.  It then starts going through the first string, using the
>> value of each byte as an index into the second string (this ability to
>> have a 256 bit index into a 32 byte string is the thing that the
>> hardware can do better (faster) than software).  The hardware proceeds
>> through the first string, looking up each byte and looking for a one
>> bit, incrementing a counter for each byte.  The presence of the carry
>> flag could be used to reverse the sense of the test, giving strcspn.
>> The hope is that it can do one byte per cycle, or perhaps one byte per
>> two cycles.  In any event, it certainly is much faster than a nested
>> loop as it becomes a single loop.    I don’t know if the advantages of
>> this are worth the extra implementation cost, but I wanted to mention it.
>>
>> Let me conclude by re-emphasizing that this whole idea (single
>> instructions for string functions) might not make sense, or might not
>> be worthwhile, or it might be the wrong way to implement the
>> functionality, etc.  But I want to present it to get reactions and
>> potential improvements.
>>
>>
> The MM instruction is widely useful no matter what the source language is.
>
> IMHO implementing the C string library instructions is "preparing for
> the previous war".  I think we need to wait until we see what happens
> with Rust (and perhaps Go and others) and determine what string (or
> array) primitives are hot spots in applications written in more modern
> languages.
>
> Brian


Re: Proposal for Single instructions for string library functions on My 66000

<sas8g4$77t$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18027&group=comp.arch#18027

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd4-da68-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions
on My 66000
Date: Tue, 22 Jun 2021 08:54:28 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sas8g4$77t$1@newsreader4.netcologne.de>
References: <sar8dp$d9$1@dont-email.me>
Injection-Date: Tue, 22 Jun 2021 08:54:28 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd4-da68-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd4:da68:0:7285:c2ff:fe6c:992d";
logging-data="7421"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Tue, 22 Jun 2021 08:54 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:

> However, once Mitch introduced the MM (Memory Move) instruction, which
> makes a single instruction out of what would otherwise be a short VVM
> sequence of instructions, that made me try to think about the issues
> involved in adding single instructions to implement (perhaps some of)
> the other string functions. This is what I have so far.

[...]

While C was an amazing language design for its time and especially
for the hardware constraints of the machine it was developed for,
some of its features have not aged well. Null-terminated strings
are one of these features.

I wouldn't try to implement those in hardware. The mem* functions,
however, are fair game (and probably already covered by the
MM instruction).

Re: Proposal for Single instructions for string library functions on My 66000

<sasg75$cj9$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=18029&group=comp.arch#18029

Path: i2pn2.org!i2pn.org!aioe.org!XpBc3qS8ZZIa50C3RMoQkQ.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Tue, 22 Jun 2021 13:06:14 +0200
Organization: Aioe.org NNTP Server
Lines: 156
Message-ID: <sasg75$cj9$1@gioia.aioe.org>
References: <sar8dp$d9$1@dont-email.me>
NNTP-Posting-Host: XpBc3qS8ZZIa50C3RMoQkQ.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7.1
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Tue, 22 Jun 2021 11:06 UTC

Stephen Fuld wrote:
> I was content to assume the use of VVM for most/all of the C string
> library functions on Mitch's My 66000.  The capabilities, especially the
> ones “underneath” the ISA level to stream values from/to memory and
> do multiple byte compares in a single cycle, make a huge improvement in
> the performance of these functions.
>
> However, once Mitch introduced the MM (Memory Move) instruction, which
> makes a single instruction out of what would otherwise be a short VVM
> sequence of instructions, that made me try to think about the issues
> involved in adding single instructions to implement (perhaps some of)
> the other string functions.  This is what I have so far.
>
> A few caveats.
>
> First, IANAHG, and this is certainly not a detailed proposal.  I don’t
> know enough to make one.  It is just some thoughts on the issue.  Many
> of these ideas and statements may be wrong, and of course, I welcome
> corrections.
>
> Second, I looked a little but could not find information giving the
> frequency of use of each of the functions, so any ideas there are just
> my, probably poor, intuition. Again, I welcome more information.
>
> Third, I realize that Mitch added the MM instruction to speed up
> structure moves, and this means higher use of the MM instruction than
> strictly as a library function replacement.  But this provides the
> infrastructure making the implementation of similar instructions easier.
> (i.e. arbitrary number of TLB misses per instruction,
> interruptable/resumable instructions, etc.)  And several of the proposed
> instructions have other uses too.
>
> So, is it worth considering?
>
> The big pro to this is performance.  While the VVM solution eliminates
> the cost of fetching, decoding and executing the multiple instructions
> in the loop on other than the first pass, there is still more cost to
> executing the several instructions in the loop once (on the first pass)
> than doing so for a single instruction.    However this is usually a one
> time cost, and may not be huge when talking about a longer running
> function.  But if the function is, in fact, long running, then it is
> more likely to encounter an interrupt, in which case the “first
> pass” cost is encountered again.  Similarly, a single instruction
> takes less space in the instruction stream and the I-cache than a
> sequence of several instructions. (Note that this assumes in-lining the
> function. If it is called, there is even more overhead.)  And functions
> like strcat require two VVM loops to complete, although upon an
> interrupt, only one of them must incur the extra cost.  Again, I
> didn’t think this was huge, but the fact that Mitch thought it was
> worthwhile enough to eliminate made me reconsider.
>
> For some functions, there are opportunities for further substantial
> performance gains.  (see below)
>
> The big cons are the cost of implementing any new instruction.  First,
> there is the cost of the gates to do it, although for many of the
> functions, they are already there, so it adds only the logic to invoke
> them.
>
> Second is the cost of additional op-codes.  While there are twenty some
> functions, I don’t propose anywhere close to that many op codes.  I
> suspect that some of the functions, especially the ones that require an
> additional potential character substitution per character (e.g. the
> localization functions) aren’t good candidates for single instruction
> implementation.  I also probably wouldn’t do the errno lookup
> function, as presumably it is infrequently called and never in the
> critical path.
>
> And many of the functions are essentially small modifications of others,
> e.g. strcmp and strncmp.  These two can be handled using a single op
> code but using the Carry meta instruction to indicate “use the n
> version” and specify the register that contains n.  Applying this
> logic to other similar cases reduces the number of op codes further.
> Even where you don’t need the additional register, you could use the
> presence of the Carry indicator bit to modify the instruction, e.g.
> strchr and strrchr.  There are several choices for how to implement
> this.  I don’t pretend to know the best one.
>
> Lastly, there are a number of functions, mostly the “nested loop”
> ones that would gain substantial benefit from being able to use an
> instruction that implements another string function as a “building
> block” to speed them up, even without a dedicated op code.  See below
> for an example.
>
> Combining all of these, I think you could get down to a single digit
> number of new op codes for most of the desired functionality.
>
> The “nested loop” functions are the ones, such as strpbrk that
> require you to code a nested loop, the outer loop going over the first
> string, the inner loop going over the second string.  The code that is
> in the outer loop, but not in the inner loop is just loading the next
> byte and checking it for a value of zero.  This will work, but there is
> a performance issue.  VVM loops can’t be nested.  So, assuming you use
> VVM for the inner loop,  the outer loop will cost relatively a lot, as
> it can’t use the streaming and multi byte compare capabilities of VVM.
> But, if you have the single instruction that searches for a character
> match in a string (strchr), you can use this single instruction (plus
> perhaps the Carry modifier), as the inner loop, thus enabling you to use
> VVM for the outer loop.  So while you still have essentially a nested
> loop (the strchr instruction is essentially a loop), you have
> substantially sped up the operation.
>
> One last thing.  While thinking about this, I had another idea regarding
> some of these nested loop functions.  I am guessing that for a
> non-trivial percentage of these, the string containing the “to be
> looked for” characters is, in fact, a compile time constant.  An
> example of this is searching text for any “white space” character.
> For these cases, perhaps we can use the features built into the My
> 66000, together with some things that hardware does better than software
> to do better than even the method I outlined in the previous paragraph.
>
> The idea is as follows.  Let’s use strspn as an example. If the
> compiler sees that the second string is a compile time constant, it
> builds a 256 bit map, with a one bit set for the value in each character
> in the string.  It then emits an instruction giving the addresses of the
> first and this newly constructed second string as the two source
> operands. The result operand will be used to hold the count.  When the
> instruction is encountered, the hardware loads the 256 bit (32 byte) bit
> map into one of the available buffers.  It also loads the starting bytes
> of the first string into another streaming buffer.  It then starts going
> through the first string, using the value of each byte as an index into
> the second string (this ability to have a 256 bit index into a 32 byte
> string is the thing that the hardware can do better (faster) than
> software).  The hardware proceeds through the first string, looking up
> each byte and looking for a one bit, incrementing a counter for each
> byte.  The presence of the carry flag could be used to reverse the sense
> of the test, giving strcspn.  The hope is that it can do one byte per
> cycle, or perhaps one byte per two cycles.  In any event, it certainly
> is much faster than a nested loop as it becomes a single loop.    I
> don’t know if the advantages of this are worth the extra
> implementation cost, but I wanted to mention it.
>
> Let me conclude by re-emphasizing that this whole idea (single
> instructions for string functions) might not make sense, or might not be
> worthwhile, or it might be the wrong way to implement the functionality,
> etc.  But I want to present it to get reactions and potential improvements.

This idea tends to break down when all text is utf8, i.e. you can still
handle all the 7-bit US ASCII chars this way but you have to exit out of
the inner loop each time you get to an extended unicode wide char.
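For reference, a minimal software sketch of the quoted bitmap scheme, treating the input strictly as bytes (so it covers the 7-bit ASCII case and has no knowledge of multi-byte utf8 sequences). The names byteset, byteset_build, byteset_has and span_sketch are invented for this illustration, and the invert argument only mimics the suggested Carry modifier; this is not the proposed hardware behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* 256-bit set: one bit per possible byte value, 32 bytes in total. */
typedef struct { uint8_t bits[32]; } byteset;

/* What the compiler would do at compile time for a constant second string:
   set the bit for each byte value that occurs in `chars`. */
static void byteset_build(byteset *set, const char *chars)
{
    for (size_t i = 0; i < 32; ++i)
        set->bits[i] = 0;
    for (const unsigned char *p = (const unsigned char *)chars; *p; ++p)
        set->bits[*p >> 3] |= (uint8_t)(1u << (*p & 7));
}

/* Use the byte value as an index into the bit map; returns 0 or 1. */
static int byteset_has(const byteset *set, unsigned char c)
{
    return (set->bits[c >> 3] >> (c & 7)) & 1;
}

/* invert == 0 behaves like strspn, invert != 0 like strcspn. */
static size_t span_sketch(const char *s, const byteset *set, int invert)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;
    /* Single loop: one table lookup per input byte, stop at the terminator. */
    while (p[n] != 0 && byteset_has(set, p[n]) != (invert != 0))
        ++n;
    return n;
}
```

In software each lookup costs a shift, a mask and a load; the quoted proposal hopes the hardware can do that indexing in about one cycle per byte.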

I try to write such code so that I can simply ignore all the utf8 issues,
but that isn't always possible.

I.e. for my next word count implementation I've already figured out
that I can count characters by skipping (setting the char count
increment to zero) all intermediate utf8 byte values, but counting words
the same way only works if all utf8 sequences that end with a given
final byte value are members of the same set: Either all space/word
separators, or all in-word chars.
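A minimal sketch of the character-counting half of that, assuming well-formed utf8 input: every continuation byte has the bit pattern 10xxxxxx, so it contributes an increment of zero, and every other byte starts a new character. The function name utf8_count_chars is made up for this example.

```c
#include <stddef.h>

/* Count characters (code points) in a NUL-terminated utf8 string by
   skipping continuation bytes. Assumes the input is valid utf8. */
static size_t utf8_count_chars(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; ++p) {
        /* (byte & 0xC0) == 0x80 marks a continuation byte: increment 0.
           Anything else (ASCII or a lead byte) increments the count by 1. */
        count += ((*p & 0xC0) != 0x80);
    }
    return count;
}
```

The increment is computed without a branch, so the loop body is the same for every byte. The word-counting case is harder for the reason given above: the decision there depends on which set the whole sequence belongs to, not just on the class of a single byte.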


Re: Proposal for Single instructions for string library functions on My 66000

<sasv5b$u9p$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18030&group=comp.arch#18030

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Tue, 22 Jun 2021 08:21:14 -0700
Organization: A noiseless patient Spider
Lines: 62
Message-ID: <sasv5b$u9p$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me>
<b000a2bf-af34-4f2e-a41b-756a987275a5n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 22 Jun 2021 15:21:16 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="123f03ae72daa51b6f4132eb384db09c";
logging-data="31033"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18JLiMNUPu6yzretXKdjuhVdQHP7X2ZmuM="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:HxcXR5vs8eyb0NhgqUw7WplAjEw=
In-Reply-To: <b000a2bf-af34-4f2e-a41b-756a987275a5n@googlegroups.com>
Content-Language: en-US
 by: Stephen Fuld - Tue, 22 Jun 2021 15:21 UTC

On 6/21/2021 5:26 PM, MitchAlsup wrote:
> On Monday, June 21, 2021 at 6:47:08 PM UTC-5, Stephen Fuld wrote:

big snip

>> Let me conclude by re-emphasizing that this whole idea (single
>> instructions for string functions) might not make sense, or might not be
>> worthwhile, or it might be the wrong way to implement the functionality,
>> etc. But I want to present it to get reactions and potential improvements.
>>
>>
> <
> Stephen has done a good job of lining up the pros and cons on converting
> well known libraries into instructions, in fact, I have done so also in the case
> of transcendental instructions.
> <
> I did fret about putting MM in My 66000 ISA.
> <
> I did fret about leaving some of the str* and mem* functions out of ISA.

I did sense some of your hesitation. Your response helps to clarify why.

> <
> By incorporating these functions into ISA you invoke the near necessity of
> function unit microcode. Each of these functions has different sequencing
> rules and necessities, and many of the sub-cases are sub-sets of each other.
> For these kinds of sequences, a microcode sequencer is de rigueur.

OK. That brings up a couple of questions. Does the MM implementation
use microcode? Do the transcendental instructions use microcode?

I'm sure you see where I am going with these questions. If MM uses
microcode, then you have already "bitten the bullet". If not, what
subset of the kinds of instructions I have been talking about can you
also implement without microcode?

> Today
> the name microcode has a bad taste in the minds of the almost knowing
> and almost understanding.

:-)

> Just the marketing would be an uphill struggle.

:-( But you are probably right.

> <
> In the end, I left these out as I thought that getting 3×-16× performance benefits
> via something that already did make it into ISA was enough. Architecture is as
> much about what to leave out as it is to what to put in.

Of course. Thanks for your thoughtful response.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Proposal for Single instructions for string library functions on My 66000

<sasvif$1df$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18031&group=comp.arch#18031

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Tue, 22 Jun 2021 08:28:13 -0700
Organization: A noiseless patient Spider
Lines: 37
Message-ID: <sasvif$1df$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me> <sarihq$hv5$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 22 Jun 2021 15:28:15 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="123f03ae72daa51b6f4132eb384db09c";
logging-data="1455"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+9CmEkJWVDj1iu/w76sY9KSePRqP+YIYM="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:bak1SPhpNaA4cd+VR6tQp+7lUsc=
In-Reply-To: <sarihq$hv5$1@dont-email.me>
Content-Language: en-US
 by: Stephen Fuld - Tue, 22 Jun 2021 15:28 UTC

On 6/21/2021 7:39 PM, Brian G. Lucas wrote:
> On 6/21/21 6:47 PM, Stephen Fuld wrote:

big snip

>> Let me conclude by re-emphasizing that this whole idea (single
>> instructions for string functions) might not make sense, or might not
>> be worthwhile, or it might be the wrong way to implement the
>> functionality, etc.  But I want to present it to get reactions and
>> potential improvements.
>>
>>
> The MM instruction is widely useful no matter what the source language is.
>
> IMHO implementing the C string library instructions is "preparing for
> the previous war".  I think we need to wait until we see what happens
> with Rust (and perhaps Go and others) and determine what string (or
> array) primitives are hot spots in applications written in more modern
> languages.

Certainly a valid point, although there will certainly be a huge amount
of C/C++ code (and Java and other popular languages that have similar
functions) in use for a long time.

I spent a little time looking at the Rust book to see if what I proposed
was applicable. It seems it might be, though more of the functionality
is in the language proper rather than in a library. It would take more
study or someone more versed in Rust than I am to know for sure. I
haven't looked at Go at all.

Thanks for your thoughts.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Proposal for Single instructions for string library functions on My 66000

<sat03v$55d$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=18032&group=comp.arch#18032

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Tue, 22 Jun 2021 08:37:34 -0700
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <sat03v$55d$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 22 Jun 2021 15:37:36 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="123f03ae72daa51b6f4132eb384db09c";
logging-data="5293"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ukhrJh5/EoM2pPlqaQf7xKq+SYmKj8Zc="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:I6yrFHbBrvk/+QiYsMko9NCrv0U=
In-Reply-To: <sas8g4$77t$1@newsreader4.netcologne.de>
Content-Language: en-US
 by: Stephen Fuld - Tue, 22 Jun 2021 15:37 UTC

On 6/22/2021 1:54 AM, Thomas Koenig wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
>
>> However, once Mitch introduced the MM (Memory Move) instruction, which
>> makes a single instruction out of what would otherwise be a short VVM
>> sequence of instructions, that made me try to think about the issues
>> involved in adding single instructions to implement (perhaps some of)
>> the other string functions. This is what I have so far.
>
> [...]
>
> While C was an amazing language design for its time and especially
> for the hardware constraints of the machine it was developed for,
> some of its features have not aged well. Null-terminated strings
> are one of these features.

I never liked null-terminated strings, but they certainly have become
popular. Is there a consensus on what to replace them with in newer,
general purpose languages?

> I wouldn't try to implement those in hardware.

Certainly a valid position.

> The mem* functions,
> however, are fair game (and probably already covered by the
> MM instruction).

I don't think so. As I understand it, the MM instruction only does mem
move (and therefore mem copy), but not the others. You are suggesting a
subset of what I suggested, and I have no problems with that.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Proposal for Single instructions for string library functions on My 66000

<sat0a0$55d$2@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=18033&group=comp.arch#18033

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Tue, 22 Jun 2021 08:40:48 -0700
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <sat0a0$55d$2@dont-email.me>
References: <sar8dp$d9$1@dont-email.me> <sasg75$cj9$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 22 Jun 2021 15:40:48 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="123f03ae72daa51b6f4132eb384db09c";
logging-data="5293"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/jfYShzrUFyY5hvVqV8VTkhAMlnAjGYwc="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:vm0ROGS+MCK1zrWsDhsYysUqnrU=
In-Reply-To: <sasg75$cj9$1@gioia.aioe.org>
Content-Language: en-US
 by: Stephen Fuld - Tue, 22 Jun 2021 15:40 UTC

On 6/22/2021 4:06 AM, Terje Mathisen wrote:
> Stephen Fuld wrote:

big snip

>> Let me conclude by re-emphasizing that this whole idea (single
>> instructions for string functions) might not make sense, or might not
>> be worthwhile, or it might be the wrong way to implement the
>> functionality, etc.  But I want to present it to get reactions and
>> potential improvements.
>
> This idea tends to break down when all text is utf8, i.e. you can still
> handle all the 7-bit US ASCII chars this way but you have to exit out of
> the inner loop each time you get to an extended unicode wide char.
>
> I try to write such code that I can simply ignore all the utf8 issues,
> but that isn't always possible.

Good point. And, of course, my proposal doesn't handle wide characters.
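To make the UTF-8 concern concrete, here is a minimal C sketch of the "exit the inner loop" point. The helper name ascii_prefix_len is hypothetical and only illustrates the idea; it is not part of the proposal.

```c
#include <stddef.h>

/* Hypothetical helper: returns the number of leading bytes in buf[0..len)
   that are 7-bit ASCII. A byte with the top bit set begins (or continues)
   a multi-byte UTF-8 sequence, so the fast loop has to stop there and
   hand over to slower, encoding-aware code. */
size_t ascii_prefix_len(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len && buf[i] < 0x80)
        i++;
    return i;
}
```

A caller would process the first ascii_prefix_len() bytes with the fast path, decode one multi-byte character separately, and then resume.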

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Proposal for Single instructions for string library functions on My 66000

<3e9ce78d-58ee-4196-9bcb-3b5c570fc523n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=18034&group=comp.arch#18034

X-Received: by 2002:a37:7c07:: with SMTP id x7mr2619383qkc.417.1624376453509; Tue, 22 Jun 2021 08:40:53 -0700 (PDT)
X-Received: by 2002:a05:6808:aa1:: with SMTP id r1mr3652809oij.157.1624376453227; Tue, 22 Jun 2021 08:40:53 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!news2.arglkargh.de!news.karotte.org!news.uzoreto.com!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 22 Jun 2021 08:40:53 -0700 (PDT)
In-Reply-To: <sasv5b$u9p$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:3578:c706:7ce0:aad0; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:3578:c706:7ce0:aad0
References: <sar8dp$d9$1@dont-email.me> <b000a2bf-af34-4f2e-a41b-756a987275a5n@googlegroups.com> <sasv5b$u9p$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3e9ce78d-58ee-4196-9bcb-3b5c570fc523n@googlegroups.com>
Subject: Re: Proposal for Single instructions for string library functions on My 66000
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 22 Jun 2021 15:40:53 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 73
 by: MitchAlsup - Tue, 22 Jun 2021 15:40 UTC

On Tuesday, June 22, 2021 at 10:21:18 AM UTC-5, Stephen Fuld wrote:
> On 6/21/2021 5:26 PM, MitchAlsup wrote:
> > On Monday, June 21, 2021 at 6:47:08 PM UTC-5, Stephen Fuld wrote:
> big snip
> >> Let me conclude by re-emphasizing that this whole idea (single
> >> instructions for string functions) might not make sense, or might not be
> >> worthwhile, or it might be the wrong way to implement the functionality,
> >> etc. But I want to present it to get reactions and potential improvements.
> >>
> >>
> > <
> > Stephen has done a good job of lining up the pros and cons on converting
> > well known libraries into instructions, in fact, I have done so also in the case
> > of transcendental instructions.
> > <
> > I did fret about putting MM in My 66000 ISA.
> > <
> > I did fret about leaving some of the str* and mem* functions out of ISA.
> I did sense some of your hesitation. Your response helps to clarify why.
> > <
> > By incorporating these functions into ISA you invoke the near necessity of
> > function unit microcode. Each of these functions has different sequencing
> > rules and necessities, and many of the sub-cases are sub-sets of each other.
> > For these kinds of sequences, a microcode sequencer is de rigueur.
<
> OK. That brings up a couple of questions. Does the MM implementation
> use microcode? Do the transcendental instructions use microcode?
<
Transcendentals did not need microcode as the body of the polynomial
evaluation is identical for each one so this part is a pure sequencer.
Then the end cases are simply a HW switch on certain data values.
This could be done with microcode, but remains inside the bounds of
what one can do with a sequencer.
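As a software analogy only (this is not how the hardware is built), the "identical body" point can be sketched in C: one Horner loop serves every function, and only the coefficient table changes. The names and the Taylor coefficients below are illustrative assumptions, accurate only for small arguments, and the end-case handling is left out.

```c
#include <stddef.h>

/* Horner evaluation: the loop body is the same for every function;
   only the coefficient table (highest degree first) changes. */
double poly_eval(const double *coef, size_t n, double x)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = acc * x + coef[i];
    return acc;
}

/* Illustrative Taylor coefficients, highest degree first. These are
   assumptions for the sketch, not the coefficients of any real unit. */
static const double EXP_COEF[] = { 1.0/120, 1.0/24, 1.0/6, 0.5, 1.0, 1.0 };

/* sin(x) = x * P(x*x), with P(t) = 1 - t/6 + t*t/120 */
static const double SIN_COEF[] = { 1.0/120, -1.0/6, 1.0 };

double exp_small(double x) { return poly_eval(EXP_COEF, 6, x); }
double sin_small(double x) { return x * poly_eval(SIN_COEF, 3, x * x); }
```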
<
str* and mem* have enough special cases that the HW switch to
the right part of the sequencer might as well be microcode. It
could be done without, but you would need to figure out a way to
effectively call a HW sequence from within a sequence, and then
return whence you started. Doable, but the times I have encountered
such sequences I had access to microcode and let it handle the
scenarios.
<
>
> I'm sure you see where I am going with these questions. If MM uses
> microcode, then you have already "bitten the bullet". If not, what
> subset of the kinds of instructions I have been talking about can you
> also implement without microcode.
<
> > Today
> > the name microcode has a bad taste in the minds of the almost knowing
> > and almost understanding.
> :-)
> > Just the marketing would be an uphill struggle.
> :-( But you are probably right.
> > <
> > In the end, I left these out as I thought that getting 3×-16× performance benefits
> > via something that already did make it into ISA was enough. Architecture is as
> > much about what to leave out as it is to what to put in.
> Of course. Thanks for your thoughtful response.
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)

Re: Proposal for Single instructions for string library functions on My 66000

<sat4ue$r57$3@newsreader4.netcologne.de>


https://www.novabbs.com/devel/article-flat.php?id=18035&group=comp.arch#18035

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd4-da68-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions
on My 66000
Date: Tue, 22 Jun 2021 16:59:58 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sat4ue$r57$3@newsreader4.netcologne.de>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me>
Injection-Date: Tue, 22 Jun 2021 16:59:58 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd4-da68-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd4:da68:0:7285:c2ff:fe6c:992d";
logging-data="27815"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Tue, 22 Jun 2021 16:59 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
> On 6/22/2021 1:54 AM, Thomas Koenig wrote:
>> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
>>
>>> However, once Mitch introduced the MM (Memory Move) instruction, which
>>> makes a single instruction out of what would otherwise be a short VVM
>>> sequence of instructions, that made me try to think about the issues
>>> involved in adding single instructions to implement (perhaps some of)
>>> the other string functions. This is what I have so far.
>>
>> [...]
>>
>> While C was an amazing language design for its time and especially
>> for the hardware constraints of the machine it was developed for,
>> some of its features have not aged well. Null-terminated strings
>> are one of these features.
>
> I never liked null-terminated strings, but they certainly have become
> popular. Is there a consensus on what to replace them with in newer,
> general purpose languages?

Let's google a bit.

Fortran: Length + data (older than C, but character variables are
newer, so I think this counts) (OK, I knew that before)

Go: It has slices, which look a bit like the array descriptors
(or dope vectors) used under the hood of Fortran, except
they are user-visible.

Rust: Strings are not null-terminated.

C#: Strings are not null-terminated.

Swift: Strings are not null-terminated.

Those are probably the major modern languages that are likely to
be compiled directly to machine code. It stands to reason
that they have to store a length somewhere.
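For illustration, a counted string of this kind can be sketched in C as a pointer plus a length. The struct and function names are hypothetical; the point is only that, with the length known up front, searching and comparing reduce to plain mem* calls.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string view: a pointer plus an explicit length,
   similar in spirit to a Go slice or a Fortran character descriptor. */
struct str_view {
    const char *data;
    size_t      len;
};

/* Find the first occurrence of c; returns its index, or v.len if absent.
   Because the length is known up front, this is a plain memchr. */
size_t str_view_find(struct str_view v, char c)
{
    const char *p = v.len ? memchr(v.data, c, v.len) : NULL;
    return p ? (size_t)(p - v.data) : v.len;
}

/* Equality is a length check followed by memcmp; embedded zero bytes
   are handled like any other byte. */
int str_view_equal(struct str_view a, struct str_view b)
{
    return a.len == b.len &&
           (a.len == 0 || memcmp(a.data, b.data, a.len) == 0);
}
```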

[...]

Re: Proposal for Single instructions for string library functions on My 66000

<saulpt$777$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=18040&group=comp.arch#18040

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Wed, 23 Jun 2021 08:53:48 +0200
Organization: A noiseless patient Spider
Lines: 51
Message-ID: <saulpt$777$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me>
<sat4ue$r57$3@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 23 Jun 2021 06:53:49 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="fb8324891469039b7eb150a32915c04c";
logging-data="7399"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+ELzqxEHjGGJfm7E1yRWkoFcJKDswkDuU="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.8.1
Cancel-Lock: sha1:VUCTQB+KxQiua3DTvZFt0t1CoOw=
In-Reply-To: <sat4ue$r57$3@newsreader4.netcologne.de>
Content-Language: en-US
 by: Marcus - Wed, 23 Jun 2021 06:53 UTC

On 2021-06-22, Thomas Koenig wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
>> On 6/22/2021 1:54 AM, Thomas Koenig wrote:
>>> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
>>>
>>>> However, once Mitch introduced the MM (Memory Move) instruction, which
>>>> makes a single instruction out of what would otherwise be a short VVM
>>>> sequence of instructions, that made me try to think about the issues
>>>> involved in adding single instructions to implement (perhaps some of)
>>>> the other string functions. This is what I have so far.
>>>
>>> [...]
>>>
>>> While C was an amazing language design for its time and especially
>>> for the hardware constraints of the machine it was developed for,
>>> some of its features have not aged well. Null-terminated strings
>>> are one of these features.
>>
>> I never liked null-terminated strings, but they certainly have become
>> popular. Is there a consensus on what to replace them with in newer,
>> general purpose languages?
>
> Let's google a bit.
>
> Fortran: Length + data (older than C, but character variables are
> newer, so I think this counts) (OK, I knew that before)
>
> Go: It has slices, which look a bit like the array descriptors
> (or dope vectors) used under the hood of Fortran, except
> they are user-visible.
>
> Rust: Strings are not null-terminated.
>
> C#: Strings are not null-terminated.
>
> Swift: Strings are not null-terminated.

C++: Length + data (but with a null terminator at data + length, in order
to be compatible(ish) with C strings).

And do not forget that major string-heavy C++ applications often have
their own string classes, that typically are not null-terminated.
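A rough C rendering of the "length plus data, with a terminator kept for C compatibility" layout might look like the following. The names are hypothetical, and this is only a sketch of the idea, not how any particular C++ library lays out its strings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string that also keeps a terminating NUL just
   past the last character, so data can be handed to C APIs that expect
   a null-terminated string. */
struct cstr_buf {
    size_t len;   /* number of characters, not counting the NUL */
    char  *data;  /* len + 1 bytes; data[len] is always '\0' */
};

/* Build a cstr_buf from len bytes at src. Returns 0 on success and -1
   on allocation failure. The caller frees out->data. */
int cstr_buf_init(struct cstr_buf *out, const char *src, size_t len)
{
    char *p = malloc(len + 1);
    if (p == NULL)
        return -1;
    if (len > 0)
        memcpy(p, src, len);
    p[len] = '\0';
    out->len = len;
    out->data = p;
    return 0;
}
```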

>
> Those are probably the major modern languages that are likely to
> be compiled directly to machine code. It stands to reason
> that they have to store a length somewhere.
>
> [...]
>

Re: Proposal for Single instructions for string library functions on My 66000

<savk6i$o5m$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=18041&group=comp.arch#18041

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Wed, 23 Jun 2021 08:32:32 -0700
Organization: A noiseless patient Spider
Lines: 74
Message-ID: <savk6i$o5m$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me>
<sat4ue$r57$3@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 23 Jun 2021 15:32:34 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="2e898a83ef4407011b829180542b7c6c";
logging-data="24758"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX182EosWzt4tqj+i9PK8N3euKWEmMtS16w8="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:FlYk5xH7UbyYQ+QQU1xyd1emCZw=
In-Reply-To: <sat4ue$r57$3@newsreader4.netcologne.de>
Content-Language: en-US
 by: Stephen Fuld - Wed, 23 Jun 2021 15:32 UTC

On 6/22/2021 9:59 AM, Thomas Koenig wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
>> On 6/22/2021 1:54 AM, Thomas Koenig wrote:
>>> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
>>>
>>>> However, once Mitch introduced the MM (Memory Move) instruction, which
>>>> makes a single instruction out of what would otherwise be a short VVM
>>>> sequence of instructions, that made me try to think about the issues
>>>> involved in adding single instructions to implement (perhaps some of)
>>>> the other string functions. This is what I have so far.
>>>
>>> [...]
>>>
>>> While C was an amazing language design for its time and especially
>>> for the hardware constraints of the machine it was developed for,
>>> some of its features have not aged well. Null-terminated strings
>>> are one of these features.
>>
>> I never liked null-terminated strings, but they certainly have become
>> popular. Is there a consensus on what to replace them with in newer,
>> general purpose languages?
>
> Let's google a bit.
>
> Fortran: Length + data (older than C, but character variables are
> newer, so I think this counts) (OK, I knew that before)
>
> Go: It has slices, which look a bit like the array descriptors
> (or dope vectors) used under the hood of Fortran, except
> they are user-visible.
>
> Rust: Strings are not null-terminated.
>
> C#: Strings are not null-terminated.
>
> Swift: Strings are not null-terminated.
>
> Those are probably the major modern languages that are likely to
> be compiled directly to machine code. It stands to reason
> that they have to store a length somewhere.

First, thank you for the research. I have been thinking about your
proposal, and I think it boils down to several questions/issues. These
assume that there are single instructions for each of the mem* functions
in the C string library, but no others.

1. How easy is it for the various compilers to recognize the idioms and
generate code to use the new instructions? In C it is easy as they are
all function calls to well known names. I am sure it varies between
languages, but it might be a little harder.

2. How much help do these provide for the "traditional" C string
functions? As I noted, there is a lot of C code out there, and will be
for a long time. For example, memchr looking for a null is equivalent
to strlen. So this could be used to speed up the first part of strcat.

3. A related question: are there any minor modifications to the mem*
instructions that would help in other cases, especially the C string
cases? For example, an option (the carry modifier?) could allow "early
termination" if a null is encountered before the count is exhausted.
How much would these "mess up" the implementation?

4. How much improvement to overall performance would these provide?

5. And last, but certainly not least, a question for Mitch, could these
be implemented within the hardware constraints you have laid out of no
microcode?
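To illustrate point 2 in C terms, here is a minimal sketch of strlen and the first part of strcat expressed through memchr and memcpy only. The function names and the explicit buffer bounds are assumptions added for the example; they are not part of the standard library or of the proposal.

```c
#include <stddef.h>
#include <string.h>

/* Length of the string in s, searching at most cap bytes.
   Returns cap if no null byte is found within the bound. */
size_t len_via_memchr(const char *s, size_t cap)
{
    const char *nul = memchr(s, '\0', cap);
    return nul ? (size_t)(nul - s) : cap;
}

/* Hypothetical bounded strcat built from mem* primitives only.
   dst is a buffer of dst_cap bytes holding a null-terminated string.
   Returns 0 on success, -1 if dst is unterminated or src does not fit. */
int cat_via_mem(char *dst, size_t dst_cap, const char *src)
{
    size_t dlen = len_via_memchr(dst, dst_cap);  /* "first part of strcat" */
    if (dlen == dst_cap)
        return -1;
    size_t room = dst_cap - dlen;                /* includes space for the null */
    size_t slen = len_via_memchr(src, room);
    if (slen == room)
        return -1;                               /* src plus its null would not fit */
    memcpy(dst + dlen, src, slen + 1);           /* copy including the null */
    return 0;
}
```

With mem* instructions available, both memchr calls and the memcpy would map onto single instructions, leaving only the bounds arithmetic as ordinary code.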

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Proposal for Single instructions for string library functions on My 66000

<dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=18042&group=comp.arch#18042

X-Received: by 2002:ac8:7d8e:: with SMTP id c14mr516081qtd.350.1624463062703; Wed, 23 Jun 2021 08:44:22 -0700 (PDT)
X-Received: by 2002:a05:6808:8c1:: with SMTP id k1mr389153oij.99.1624463062381; Wed, 23 Jun 2021 08:44:22 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 23 Jun 2021 08:44:22 -0700 (PDT)
In-Reply-To: <savk6i$o5m$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:9d3d:1a2b:5ff8:19b; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:9d3d:1a2b:5ff8:19b
References: <sar8dp$d9$1@dont-email.me> <sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me> <sat4ue$r57$3@newsreader4.netcologne.de> <savk6i$o5m$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
Subject: Re: Proposal for Single instructions for string library functions on My 66000
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 23 Jun 2021 15:44:22 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 84
 by: MitchAlsup - Wed, 23 Jun 2021 15:44 UTC

On Wednesday, June 23, 2021 at 10:32:36 AM UTC-5, Stephen Fuld wrote:
> On 6/22/2021 9:59 AM, Thomas Koenig wrote:
> > Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:
> >> On 6/22/2021 1:54 AM, Thomas Koenig wrote:
> >>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:
> >>>
> >>>> However, once Mitch introduced the MM (Memory Move) instruction, which
> >>>> makes a single instruction out of what would otherwise be a short VVM
> >>>> sequence of instructions, that made me try to think about the issues
> >>>> involved in adding single instructions to implement (perhaps some of)
> >>>> the other string functions. This is what I have so far.
> >>>
> >>> [...]
> >>>
> >>> While C was an amazing language design for its time and especially
> >>> for the hardware constraints of the machine it was developed for,
> >>> some of its features have not aged well. Null-terminated strings
> >>> are one of these features.
> >>
> >> I never liked null-terminated strings, but they certainly have become
> >> popular. Is there a consensus on what to replace them with in newer,
> >> general purpose languages?
> >
> > Let's google a bit.
> >
> > Fortran: Length + data (older than C, but character variables are
> > newer, so I think this counts) (OK, I knew that before)
> >
> > Go: It has slices, which look a bit like the array descriptors
> > (or dope vectors) used under the hood of Fortran, except
> > they are user-visible.
> >
> > Rust: Strings are not null-terminated.
> >
> > C#: Strings are not null-terminated.
> >
> > Swift: Strings are not null-terminated.
> >
> > Those are probably the major modern languages that are likely to
> > be compiled directly to machine code. It stands to reason
> > that they have to store a length somewhere.
> First, thank you for the research. I have been thinking about your
> proposal, and I think it boils down to several questions/issues. These
> assume that there are single instructions for each of the mem* functions
> in the C string library, but no others.
>
> 1. How easy is it for the various compilers to recognize the idioms and
> generate code to use the new instructions? In C it is easy as they are
> all function calls to well known names. I am sure it varies between
> languages, but it might be a little harder.
>
> 2. How much help do these provide for the "traditional" C string
> functions? As I noted, there is a lot of C code out there, and will be
> for a long time. For example, memchr looking for a null is equivalent
> to strlen. So this could be used to speed up the first part of strcat.
>
> 3. A related question is are there any minor modifications to the mem*
> instructions to aid in other cases, especially the C string cases? As
> an example, allow an option (the carry modifier?) to allow "early
> termination" if a null is encountered before the count is exhausted?
> How much would these "mess up" the implementation?
>
> 4. How much improvement to overall performance would these provide?
<
Note: The LOOP instruction in My 66000 was designed to deal with both
counted and null terminated loops (and a few more). It seems to me that
providing all of the instructions one can synthesize with VVM and My
66000 instructions would consume a lot of space.
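For readers who want the C view of a loop that is both counted and null-terminated, a minimal sketch follows. The function name and return convention are assumptions for illustration; it shows source-level semantics only and says nothing about how LOOP or VVM encode it.

```c
#include <stddef.h>

/* Copy at most n bytes from src to dst, stopping early once a null byte
   has been copied. Returns the number of bytes written (including the
   null, if one was seen). Hypothetical reference semantics only. */
size_t copy_until_nul(char *dst, const char *src, size_t n)
{
    size_t i = 0;
    while (i < n) {          /* counted exit */
        char c = src[i];
        dst[i] = c;
        i++;
        if (c == '\0')       /* data-dependent (null) exit */
            break;
    }
    return i;
}
```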
<
Secondly: Using VVM one is running in the 8-32 I/C range on not that wide
implementations. So it is hard to see direct HW implementations at the
instruction level running "that much faster"; most of the loops are governed
by cache access width (in both VVM and direct HW implementations).
>
> 5. And last, but certainly not least, a question for Mitch, could these
> be implemented within the hardware constraints you have laid out of no
> microcode?
<
There comes a time when sequences are best described in tabular form.
One can translate the table directly into microcode, or one can pass the
table through the great gate eater in Verilog synthesis and get the same
sequencer without the ability to program it after fabrication.
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)

Re: Proposal for Single instructions for string library functions on My 66000

<savqo3$8q6$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=18043&group=comp.arch#18043

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Wed, 23 Jun 2021 10:24:07 -0700
Organization: A noiseless patient Spider
Lines: 99
Message-ID: <savqo3$8q6$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me>
<sat4ue$r57$3@newsreader4.netcologne.de> <savk6i$o5m$1@dont-email.me>
<dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 23 Jun 2021 17:24:19 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="2e898a83ef4407011b829180542b7c6c";
logging-data="9030"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/0jm457aFzIGcMYMZQhRjS3yuvWbEbVgo="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:wF+2SyoTQyfrEqfkg/ghYSRFkrw=
In-Reply-To: <dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
Content-Language: en-US
 by: Stephen Fuld - Wed, 23 Jun 2021 17:24 UTC

On 6/23/2021 8:44 AM, MitchAlsup wrote:
> On Wednesday, June 23, 2021 at 10:32:36 AM UTC-5, Stephen Fuld wrote:
>> On 6/22/2021 9:59 AM, Thomas Koenig wrote:
>>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:
>>>> On 6/22/2021 1:54 AM, Thomas Koenig wrote:
>>>>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:
>>>>>
>>>>>> However, once Mitch introduced the MM (Memory Move) instruction, which
>>>>>> makes a single instruction out of what would otherwise be a short VVM
>>>>>> sequence of instructions, that made me try to think about the issues
>>>>>> involved in adding single instructions to implement (perhaps some of)
>>>>>> the other string functions. This is what I have so far.
>>>>>
>>>>> [...]
>>>>>
>>>>> While C was an amazing language design for its time and especially
>>>>> for the hardware constraints of the machine it was developed for,
>>>>> some of its features have not aged well. Null-terminated strings
>>>>> are one of these features.
>>>>
>>>> I never liked null-terminated strings, but they certainly have become
>>>> popular. Is there a consensus on what to replace them with in newer,
>>>> general purpose languages?
>>>
>>> Let's google a bit.
>>>
>>> Fortran: Length + data (older than C, but character variables are
>>> newer, so I think this counts) (OK, I knew that before)
>>>
>>> Go: It has slices, which look a bit like the array descriptors
>>> (or dope vectors) used under the hood of Fortran, except
>>> they are user-visible.
>>>
>>> Rust: Strings are not null-terminated.
>>>
>>> C#: Strings are not null-terminated.
>>>
>>> Swift: Strings are not null-terminated.
>>>
>>> Those are probably the major modern languages that are likely to
>>> be compiled directly to machine code. It stands to reason
>>> that they have to store a length somewhere.
>> First, thank you for the research. I have been thinking about your
>> proposal, and I think it boils down to several questions/issues. These
>> assume that there are single instructions for each of the mem* functions
>> in the C string library, but no others.
>>
>> 1. How easy is it for the various compilers to recognize the idioms and
>> generate code to use the new instructions? In C it is easy as they are
>> all function calls to well known names. I am sure it varies between
>> languages, but it might be a little harder.
>>
>> 2. How much help do these provide for the "traditional" C string
>> functions? As I noted, there is a lot of C code out there, and will be
>> for a long time. For example, memchr looking for a null is equivalent
>> to strlen. So this could be used to speed up the first part of strcat.
>>
>> 3. A related question is are there any minor modifications to the mem*
>> instructions to aid in other cases, especially the C string cases? As
>> an example, allow an option (the carry modifier?) to allow "early
>> termination" if a null is encountered before the count is exhausted?
>> How much would these "mess up" the implementation?
>>
>> 4. How much improvement to overall performance would these provide?
> <
> Note: The LOOP instruction in My 66000 was designed to deal with both
> counted and null terminated loops (and a few more).

Yes. That was one of the things I was talking about in my original post
as "existing logic".

> It seems to me that
> providing all of the instructions one can synthesize with VVM and My
> 66000 instructions would consume a lot of space.

Sure, but no one is proposing that! The question is whether there is a
subset of "all" that is worthwhile? So far, there is, consisting of a
single member, MM. My question is, are there more?

> <
> Secondly: Using VVM one is running in the 8-32 I/C range on not that wide
> implementations. So it is hard to see direct HW implementations at the
> instruction level running "that much faster"; most of the loops are governed
> by cache access width (in both VVM and direct HW implementations).

I understand. Thus my surprise that you implemented MM. The fact that
you did led to my trying to see if it was worthwhile to go further.

As I said, ISTM that the main advantages of a single instruction are
lower cost "start up" (and resume after interrupt), and lower
memory/I-cache usage. Once you are up and going, I agree that there is
essentially no advantage.
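
To make point 2 quoted above concrete, here is a minimal C sketch of the memchr/strlen equivalence and of how it covers the first half of strcat. The function names and the explicit `bound` / `dst_size` parameters are hypothetical, introduced only for illustration; the counted copy at the end stands in for the kind of work an MM-style instruction would do.

```c
#include <stddef.h>
#include <string.h>

/* strlen expressed as a memchr search for the NUL byte.
   `bound` is an assumed upper limit on the buffer size (hypothetical
   parameter); returns bound if no NUL is found within it. */
size_t strlen_via_memchr(const char *s, size_t bound)
{
    const char *nul = memchr(s, '\0', bound);
    return nul ? (size_t)(nul - s) : bound;
}

/* strcat split into its two halves: find the end of dst (a memchr),
   then do a counted copy of src including its terminator.
   dst_size is the total capacity of dst.
   Returns 0 on success, -1 if the result would not fit. */
int strcat_via_mem(char *dst, size_t dst_size, const char *src)
{
    size_t dlen = strlen_via_memchr(dst, dst_size);
    if (dlen == dst_size)
        return -1;                      /* dst is not terminated in range */

    size_t slen = strlen(src);
    if (slen >= dst_size - dlen)
        return -1;                      /* no room for src plus the NUL */

    memcpy(dst + dlen, src, slen + 1);  /* counted copy, NUL included */
    return 0;
}
```

Once both lengths are known, the remaining work is a plain counted move, so only the search half depends on null termination.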

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Proposal for Single instructions for string library functions on My 66000

<77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18044&group=comp.arch#18044

X-Received: by 2002:a0c:ffa2:: with SMTP id d2mr2591975qvv.50.1624492191647;
Wed, 23 Jun 2021 16:49:51 -0700 (PDT)
X-Received: by 2002:a4a:ab07:: with SMTP id i7mr1864391oon.89.1624492191371;
Wed, 23 Jun 2021 16:49:51 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 23 Jun 2021 16:49:51 -0700 (PDT)
In-Reply-To: <savqo3$8q6$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:6d56:5a70:fa31:bedd;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:6d56:5a70:fa31:bedd
References: <sar8dp$d9$1@dont-email.me> <sas8g4$77t$1@newsreader4.netcologne.de>
<sat03v$55d$1@dont-email.me> <sat4ue$r57$3@newsreader4.netcologne.de>
<savk6i$o5m$1@dont-email.me> <dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 23 Jun 2021 23:49:51 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Wed, 23 Jun 2021 23:49 UTC

On Wednesday, June 23, 2021 at 12:24:22 PM UTC-5, Stephen Fuld wrote:
> On 6/23/2021 8:44 AM, MitchAlsup wrote:
> > On Wednesday, June 23, 2021 at 10:32:36 AM UTC-5, Stephen Fuld wrote:
> >> On 6/22/2021 9:59 AM, Thomas Koenig wrote:
> >>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:
> >>>> On 6/22/2021 1:54 AM, Thomas Koenig wrote:
> >>>>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:
> >>>>>
> >>>>>> However, once Mitch introduced the MM (Memory Move) instruction, which
> >>>>>> makes a single instruction out of what would otherwise be a short VVM
> >>>>>> sequence of instructions, that made me try to think about the issues
> >>>>>> involved in adding single instructions to implement (perhaps some of)
> >>>>>> the other string functions. This is what I have so far.
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>> While C was an amazing language design for its time and especially
> >>>>> for the hardware constraints of the machine it was developed for,
> >>>>> some of its features have not aged well. Null-terminated strings
> >>>>> are one of these features.
> >>>>
> >>>> I never liked null-terminated strings, but they certainly have become
> >>>> popular. Is there a consensus on what to replace them with in newer,
> >>>> general purpose languages?
> >>>
> >>> Let's google a bit.
> >>>
> >>> Fortran: Length + data (older than C, but character variables are
> >>> newer, so I think this counts) (OK, I knew that before)
> >>>
> >>> Go: It has slices, which look a bit like array descriptors
> >>> (or dope vectors) used under the hood of Fortran, except
> >>> they are user-visible.
> >>>
> >>> Rust: Strings are not null-terminated.
> >>>
> >>> C#: Strings are not null-terminated.
> >>>
> >>> Swift: Strings are not null-terminated.
> >>>
> >>> Those are probably the major modern languages that are likely to
> >>> be compiled directly to machine code. It stands to reason
> >>> that they have to store a length somewhere.
> >> First, thank you for the research. I have been thinking about your
> >> proposal, and I think it boils down to several questions/issues. These
> >> assume that there are single instructions for each of the mem* functions
> >> in the C string library, but no others.
> >>
> >> 1. How easy is it for the various compilers to recognize the idioms and
> >> generate code to use the new instructions? In C it is easy as they are
> >> all function calls to well known names. I am sure it varies between
> >> languages, but it might be a little harder.
> >>
> >> 2. How much help do these provide for the "traditional" C string
> >> functions? As I noted, there is a lot of C code out there, and will be
> >> for a long time. For example, memchr looking for a null is equivalent
> >> to strlen. So this could be used to speed up the first part of strcat.
> >>
> >> 3. A related question is are there any minor modifications to the mem*
> >> instructions to aid in other cases, especially the C string cases? As
> >> an example, allow an option (the carry modifier?) to allow "early
> >> termination" if a null is encountered before the count is exhausted?
> >> How much would these "mess up" the implementation?
> >>
> >> 4. How much improvement to overall performance would these provide?
> > <
> > Note: The LOOP instruction in My 66000 was designed to deal with both
> > counted and null terminated loops (and a few more).
> Yes. That was one of the things I was talking about in my original post
> as "existing logic".
> > It seems to me that
> > providing all of the instructions one can synthesize with VVM and My
> > 66000 instructions would consume a lot of space.
> Sure, but no one is proposing that! The question is whether there is a
> subset of "all" that is worthwhile? So far, there is, consisting of a
> single member, MM. My question is, are there more?
> > <
> > Secondly: Using VVM one is running in the 8-32 I/C range on not that wide
> > implementations. So It is hard to see direct HW implementations at the
> > instruction level running "that much faster" most of the loops are governed
> > by cache access width (both VVM and direct HW implementation.)
<
> I understand. Thus my surprise that you implemented MM. The fact that
> you did led to my trying to see if it was worthwhile to go further.
<
MM made the list, after much consideration, mainly because it is a much more
compact way to move stuff around in memory without having to pass through
registers. LDM and STM made the cut but are so underutilized it would cause
no undue harm to remove them--the vast majority of the LDM/STM uses are
performed with ENTER and EXIT.
>
> As I said, ISTM that the main advantages of a single instruction is
> lower cost "start up" (and resume after interrupt), and lower
> memory/I-cache usage. Once you are up and going, I agree that there is
> essentially no advantage.
<
MM, ENTER, and EXIT made the cut for code density reasons.
ABS and CopySign made the cut for power reasons and more advanced
implementations can perform these in zero cycles (on the forwarding path).
<
The one I keep fretting about is BMM--bit matrix multiply--where you take
an operand (64 bits) and a matrix (64 entries of 64 bits each in memory)
and perform a BMM between these, delivering a 64-bit result. There are all
sorts of clever bit manipulations that can be so expressed, and I can see a
way of executing it in 8-ish cycles.
<
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)

Re: Proposal for Single instructions for string library functions on My 66000

<sb0pov$9uk$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18045&group=comp.arch#18045

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Wed, 23 Jun 2021 19:13:50 -0700
Organization: A noiseless patient Spider
Lines: 37
Message-ID: <sb0pov$9uk$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me>
<sat4ue$r57$3@newsreader4.netcologne.de> <savk6i$o5m$1@dont-email.me>
<dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me>
<77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 24 Jun 2021 02:13:51 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="fb5acf1dd6a479b86c5e1d6d7e6b2e26";
logging-data="10196"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19sAEDsPBzX4GN6/Y/yVjX0"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:yzbX17P+BJN+w9B6r9Nyj+7U9fw=
In-Reply-To: <77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Thu, 24 Jun 2021 02:13 UTC

On 6/23/2021 4:49 PM, MitchAlsup wrote:
> On Wednesday, June 23, 2021 at 12:24:22 PM UTC-5, Stephen Fuld wrote:

<snip>

>> I understand. Thus my surprise that you implemented MM. The fact that
>> you did led to my trying to see if it was worthwhile to go further.
> <
> MM made the list, after much consideration, mainly because it is a much more
> compact way to move stuff around in memory without having to pass through
> registers. LDM and STM made the cut but are so underutilized it would cause
> no undue harm to remove them--the vast majority of the LDM/STM uses are
> performed with ENTER and EXIT.
>>
>> As I said, ISTM that the main advantages of a single instruction is
>> lower cost "start up" (and resume after interrupt), and lower
>> memory/I-cache usage. Once you are up and going, I agree that there is
>> essentially no advantage.
> <
> MM, ENTER, and EXIT made the cut for code density reasons.
> ABS and CopySign made the cut for power reasons and more advanced
> implementations can perform these in zero cycles (on the forwarding path).
> <
> The one I keep fretting about is BMM--bit matrix multiply--where you take
> an operand (64-bits) and a matrix (64 entries of 64-bits each in memory)
> And perform a BMM between these delivering a 64-bit result. There are all
> sorts of clever bit manipulation that can be so expressed, and I can see a
> way of executing it in 8-ish cycles.

ENTER and EXIT should make the list on RAS grounds; code density is a
useful side benefit.

MM is a special case of streams. I always prefer general solutions.

BMM is attractive functionality, but is better handled by a dedicated
co-processor IMO. Though that opinion may come because I have never
found a good way to fit it into the architecture paradigm.

Re: Proposal for Single instructions for string library functions on My 66000

<sb18d3$a58$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=18046&group=comp.arch#18046

Path: i2pn2.org!i2pn.org!aioe.org!XpBc3qS8ZZIa50C3RMoQkQ.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Thu, 24 Jun 2021 08:23:30 +0200
Organization: Aioe.org NNTP Server
Lines: 56
Message-ID: <sb18d3$a58$1@gioia.aioe.org>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me>
<sat4ue$r57$3@newsreader4.netcologne.de> <savk6i$o5m$1@dont-email.me>
<dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me>
<77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
NNTP-Posting-Host: XpBc3qS8ZZIa50C3RMoQkQ.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7.1
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Thu, 24 Jun 2021 06:23 UTC

MitchAlsup wrote:
> On Wednesday, June 23, 2021 at 12:24:22 PM UTC-5, Stephen Fuld wrote:
>>> Secondly: Using VVM one is running in the 8-32 I/C range on not that wide
>>> implementations. So It is hard to see direct HW implementations at the
>>> instruction level running "that much faster" most of the loops are governed
>>> by cache access width (both VVM and direct HW implementation.)
> <
>> I understand. Thus my surprise that you implemented MM. The fact that
>> you did led to my trying to see if it was worthwhile to go further.
> <
> MM made the list, after much consideration, mainly because it is a much more
> compact way to move stuff around in memory without having to pass through
> registers. LDM and STM made the cut but are so underutilized it would cause
> no undue harm to remove them--the vast majority of the LDM/STM uses are
> performed with ENTER and EXIT.

I am equally confident MM is worthwhile, there are just so many
instances of copying going on that having a single approved method that
is both near-light-speed fast and very compact (in code size) makes a
lot of sense.

We have been following Intel and AMD's various attempts to do the same
to REP MOVS with various "fast strings" implementations, and it is slowly
getting to where it can match unrolled SSE/AVX copy blocks, without
having any of those pesky alignment/block size limitations.
>>
>> As I said, ISTM that the main advantages of a single instruction is
>> lower cost "start up" (and resume after interrupt), and lower
>> memory/I-cache usage. Once you are up and going, I agree that there is
>> essentially no advantage.
> <
> MM, ENTER, and EXIT made the cut for code density reasons.
> ABS and CopySign made the cut for power reasons and more advanced
> implementations can perform these in zero cycles (on the forwarding path).

They fall out from your fast transcendentals?
> <
> The one I keep fretting about is BMM--bit matrix multiply--where you take
> an operand (64-bits) and a matrix (64 entries of 64-bits each in memory)
> And perform a BMM between these delivering a 64-bit result. There are all
> sorts of clever bit manipulation that can be so expressed, and I can see a
> way of executing it in 8-ish cycles.

8!!!

We are talking about 512 bytes of data in the matrix, so just loading it
needs 8 64-byte cache line loads: I think you must envision a way to
stream the matrix past the operand during those cache loads, or would
this be implemented with a dedicated 512-byte matrix cache inside the
CPU so that it could be reused multiple times for an array of operands?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Proposal for Single instructions for string library functions on My 66000

<sb1dth$e8v$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18048&group=comp.arch#18048

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Thu, 24 Jun 2021 00:57:36 -0700
Organization: A noiseless patient Spider
Lines: 49
Message-ID: <sb1dth$e8v$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me>
<sat4ue$r57$3@newsreader4.netcologne.de> <savk6i$o5m$1@dont-email.me>
<dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me>
<77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
<sb18d3$a58$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 24 Jun 2021 07:57:37 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="fb5acf1dd6a479b86c5e1d6d7e6b2e26";
logging-data="14623"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Bgd1fKcBKkec0PEAU5G9j"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:A/1HfQLcF9Ol77X+YX1GDSaDS0A=
In-Reply-To: <sb18d3$a58$1@gioia.aioe.org>
Content-Language: en-US
 by: Ivan Godard - Thu, 24 Jun 2021 07:57 UTC

On 6/23/2021 11:23 PM, Terje Mathisen wrote:
> MitchAlsup wrote:
>> On Wednesday, June 23, 2021 at 12:24:22 PM UTC-5, Stephen Fuld wrote:
>>>> Secondly: Using VVM one is running in the 8-32 I/C range on not that
>>>> wide implementations. So It is hard to see direct HW implementations at
>>>> the instruction level running "that much faster" most of the loops are
>>>> governed by cache access width (both VVM and direct HW implementation.)
>> <
>>> I understand. Thus my surprise that you implemented MM. The fact that
>>> you did led to my trying to see if it was worthwhile to go further.
>> <
>> MM made the list, after much consideration, mainly because it is a much
>> more compact way to move stuff around in memory without having to pass
>> through registers. LDM and STM made the cut but are so underutilized it
>> would cause no undue harm to remove them--the vast majority of the
>> LDM/STM uses are performed with ENTER and EXIT.
>
> I am equally confident MM is worthwhile, there are just so many
> instances of copying going on that having a single approved method that
> is both near-light-speed fast and very compact (in code size) makes a
> lot of sense.
>
> We have been following Intel and AMD's various attempts to do the same
> to REP MOVS with various "fast strings" implementations, and it is slowly
> getting to where it can match unrolled SSE/AVX copy blocks, without
> having any of those pesky alignment/block size limitations.
>>>
>>> As I said, ISTM that the main advantages of a single instruction is
>>> lower cost "start up" (and resume after interrupt), and lower
>>> memory/I-cache usage. Once you are up and going, I agree that there is
>>> essentially no advantage.
>> <
>> MM, ENTER, and EXIT made the cut for code density reasons.
>> ABS and CopySign made the cut for power reasons and more advanced
>> implementations can perform these in zero cycles (on the forwarding
>> path).
>
> They fall out from your fast transcendentals?

They fall out of bit set/clear/test instructions in the ALU/shifter. I
don't see why you'd want them as FP codes, unless you were using split
int/FP regs and didn't want to pay for the moves.
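
A minimal C sketch of that point, assuming the operands are IEEE 754 binary64 doubles (sign in bit 63). The helper names are hypothetical; the only operations on the value itself are a bit clear and a bit insert, which is why these can be cheap in hardware.

```c
#include <stdint.h>
#include <string.h>

#define SIGN_BIT (UINT64_C(1) << 63)   /* assumes IEEE 754 binary64 layout */

/* Reinterpret a double as its 64-bit pattern and back (no conversion). */
static uint64_t bits_of(double d)    { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double   double_of(uint64_t u){ double d;   memcpy(&d, &u, sizeof d); return d; }

/* ABS: clear the sign bit. */
double abs_bits(double x)
{
    return double_of(bits_of(x) & ~SIGN_BIT);
}

/* CopySign: magnitude bits from mag, sign bit from sgn. */
double copysign_bits(double mag, double sgn)
{
    return double_of((bits_of(mag) & ~SIGN_BIT) | (bits_of(sgn) & SIGN_BIT));
}
```

With a unified register file the `memcpy` reinterpretations cost nothing; with split int/FP registers they become the moves mentioned above.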

Re: Proposal for Single instructions for string library functions on My 66000

<c91df5a0-afc5-493b-886a-fdeb43d3d12an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18051&group=comp.arch#18051

X-Received: by 2002:a05:620a:883:: with SMTP id b3mr6709498qka.433.1624551748163;
Thu, 24 Jun 2021 09:22:28 -0700 (PDT)
X-Received: by 2002:a4a:d781:: with SMTP id c1mr5073913oou.23.1624551747942;
Thu, 24 Jun 2021 09:22:27 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 24 Jun 2021 09:22:27 -0700 (PDT)
In-Reply-To: <sb18d3$a58$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:ad16:43cb:bd70:119d;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:ad16:43cb:bd70:119d
References: <sar8dp$d9$1@dont-email.me> <sas8g4$77t$1@newsreader4.netcologne.de>
<sat03v$55d$1@dont-email.me> <sat4ue$r57$3@newsreader4.netcologne.de>
<savk6i$o5m$1@dont-email.me> <dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me> <77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
<sb18d3$a58$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c91df5a0-afc5-493b-886a-fdeb43d3d12an@googlegroups.com>
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 24 Jun 2021 16:22:28 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Thu, 24 Jun 2021 16:22 UTC

On Thursday, June 24, 2021 at 1:23:34 AM UTC-5, Terje Mathisen wrote:
> MitchAlsup wrote:
> > On Wednesday, June 23, 2021 at 12:24:22 PM UTC-5, Stephen Fuld wrote:
> >>> Secondly: Using VVM one is running in the 8-32 I/C range on not that wide
> >>> implementations. So It is hard to see direct HW implementations at the
> >>> instruction level running "that much faster" most of the loops are governed
> >>> by cache access width (both VVM and direct HW implementation.)
> > <
> >> I understand. Thus my surprise that you implemented MM. The fact that
> >> you did led to my trying to see if it was worthwhile to go further.
> > <
> > MM made the list, after much consideration, mainly because it is a much more
> > compact way to move stuff around in memory without having to pass through
> > registers. LDM and STM made the cut but are so underutilized it would cause
> > no undue harm to remove them--the vast majority of the LDM/STM uses are
> > performed with ENTER and EXIT.
> I am equally confident MM is worthwhile, there are just so many
> instances of copying going on that having a single approved method that
> is both near-light-speed fast and very compact (in code size) makes a
> lot of sense.
>
> We have been following Intel and AMD's various attempts to do the same
> to REP MOVS with various "fast strings" implementations, and it is slowly
> getting to where it can match unrolled SSE/AVX copy blocks, without
> having any of those pesky alignment/block size limitations.
> >>
> >> As I said, ISTM that the main advantages of a single instruction is
> >> lower cost "start up" (and resume after interrupt), and lower
> >> memory/I-cache usage. Once you are up and going, I agree that there is
> >> essentially no advantage.
> > <
> > MM, ENTER, and EXIT made the cut for code density reasons.
> > ABS and CopySign made the cut for power reasons and more advanced
> > implementations can perform these in zero cycles (on the forwarding path).
<
> They fall out from your fast transcendentals?
<
The SW implementations (to look at and define what HW does) are replete
with these; however, the actual HW does them internally.
<
> > <
> > The one I keep fretting about is BMM--bit matrix multiply--where you take
> > an operand (64-bits) and a matrix (64 entries of 64-bits each in memory)
> > And perform a BMM between these delivering a 64-bit result. There are all
> > sorts of clever bit manipulation that can be so expressed, and I can see a
> > way of executing it in 8-ish cycles.
<
> 8!!!
<
Not a typo.
>
> We are talking about 512 bytes of data in the matrix, so just loading it
> needs 8 64-byte cache line loads: I think you must envision a way to
> stream the matrix past the operand during those cache loads, or would
> this be implemented with a dedicated 512-byte matrix cache inside the
> CPU so that it could be reused multiple times for an array of operands?
<
The amount of "math logic" is so small that it is entirely cache bound.
So if you make it capable of accessing a whole cache line in a cycle,
you can perform the BMM on eight (8) rows per cycle.
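
As a rough software model of that streaming scheme (a sketch under assumptions, not the My 66000 definition): the matrix is taken to be 64 row words stored contiguously, result bit r is taken to be the XOR-reduction of `operand & matrix[r]`, and each outer iteration stands for one 64-byte cache line, i.e. eight rows. The names `bmm64` and `parity64` are hypothetical.

```c
#include <stdint.h>

/* XOR-reduce a 64-bit word to a single bit. */
static unsigned parity64(uint64_t x)
{
    x ^= x >> 32; x ^= x >> 16; x ^= x >> 8;
    x ^= x >> 4;  x ^= x >> 2;  x ^= x >> 1;
    return (unsigned)(x & 1);
}

/* 64-bit operand times a 64x64 bit matrix (64 rows of 64 bits = 512 bytes).
   Assumed semantics: result bit r = parity(operand & matrix[r]). */
uint64_t bmm64(uint64_t operand, const uint64_t matrix[64])
{
    uint64_t result = 0;
    for (int line = 0; line < 8; line++) {      /* one 64-byte cache line per step */
        for (int r = 0; r < 8; r++) {           /* 8 rows per line, parallel in HW */
            int row = line * 8 + r;
            result |= (uint64_t)parity64(operand & matrix[row]) << row;
        }
    }
    return result;
}
```

The inner loop is eight independent AND-plus-parity trees, so the eight outer steps correspond to the 8-ish cycles when a full line arrives per cycle.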
<
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: Proposal for Single instructions for string library functions on My 66000

<sb3s0j$um$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=18061&group=comp.arch#18061

Path: i2pn2.org!i2pn.org!aioe.org!XpBc3qS8ZZIa50C3RMoQkQ.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Fri, 25 Jun 2021 08:10:26 +0200
Organization: Aioe.org NNTP Server
Lines: 34
Message-ID: <sb3s0j$um$1@gioia.aioe.org>
References: <sar8dp$d9$1@dont-email.me>
<sas8g4$77t$1@newsreader4.netcologne.de> <sat03v$55d$1@dont-email.me>
<sat4ue$r57$3@newsreader4.netcologne.de> <savk6i$o5m$1@dont-email.me>
<dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me>
<77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
<sb18d3$a58$1@gioia.aioe.org>
<c91df5a0-afc5-493b-886a-fdeb43d3d12an@googlegroups.com>
NNTP-Posting-Host: XpBc3qS8ZZIa50C3RMoQkQ.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7.1
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Fri, 25 Jun 2021 06:10 UTC

MitchAlsup wrote:
> On Thursday, June 24, 2021 at 1:23:34 AM UTC-5, Terje Mathisen wrote:
>> MitchAlsup wrote:
>>> The one I keep fretting about is BMM--bit matrix multiply--where you take
>>> an operand (64-bits) and a matrix (64 entries of 64-bits each in memory)
>>> And perform a BMM between these delivering a 64-bit result. There are all
>>> sorts of clever bit manipulation that can be so expressed, and I can see a
>>> way of executing it in 8-ish cycles.
> <
>> 8!!!
> <
> Not a typo.
>>
>> We are talking about 512 bytes of data in the matrix, so just loading it
>> needs 8 64-byte cache line loads: I think you must envision a way to
>> stream the matrix past the operand during those cache loads, or would
>> this be implemented with a dedicated 512-byte matrix cache inside the
>> CPU so that it could be reused multiple times for an array of operands?
> <
> The amount of "math logic" is so small that it is entirely cache bound.
> So if you make it capable of accessing a whole cache line in a cycle,
> you can perform the BMM on eight (8) rows per cycle.

I.e. my cache line/cycle streaming speed guess/suggestion.

I've been used to thinking in cache line chunks since the Larrabee tech
review. :-)

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Proposal for Single instructions for string library functions on My 66000

<76c49bf0-1993-4224-9415-f7aa9cd33ab9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18064&group=comp.arch#18064

X-Received: by 2002:ae9:e407:: with SMTP id q7mr10475696qkc.410.1624615246008;
Fri, 25 Jun 2021 03:00:46 -0700 (PDT)
X-Received: by 2002:a05:6808:919:: with SMTP id w25mr10675369oih.30.1624615245786;
Fri, 25 Jun 2021 03:00:45 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 25 Jun 2021 03:00:45 -0700 (PDT)
In-Reply-To: <fb0f4209-d134-4d71-88b6-c813b0305469n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=87.68.182.191; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 87.68.182.191
References: <sar8dp$d9$1@dont-email.me> <sas8g4$77t$1@newsreader4.netcologne.de>
<sat03v$55d$1@dont-email.me> <sat4ue$r57$3@newsreader4.netcologne.de>
<savk6i$o5m$1@dont-email.me> <dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me> <77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
<sb18d3$a58$1@gioia.aioe.org> <c91df5a0-afc5-493b-886a-fdeb43d3d12an@googlegroups.com>
<sb3s0j$um$1@gioia.aioe.org> <fb0f4209-d134-4d71-88b6-c813b0305469n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <76c49bf0-1993-4224-9415-f7aa9cd33ab9n@googlegroups.com>
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
From: already5...@yahoo.com (Michael S)
Injection-Date: Fri, 25 Jun 2021 10:00:46 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: Michael S - Fri, 25 Jun 2021 10:00 UTC

On Friday, June 25, 2021 at 12:33:00 PM UTC+3, robf...@gmail.com wrote:
> I am a little confused by the terminology??? I thought a bit-matrix multiply was multiplying bits between two 64-bit register values as 8x8 matrixes of bits. This can be done in a single clock cycle and requires only two 64-bit registers. Larger registers could be used for larger bit multiplies. A 256-bit SIMD register could hold a 16x16 bit matrix.

You seem less confused than I am.
For starters, I would like to understand whether the operations are modulo-2 or not.
Later on, I'd like to know what BMM is good for.

Re: Proposal for Single instructions for string library functions on My 66000

<7f552ee9-b040-4c98-be9e-7cd8147a881dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18065&group=comp.arch#18065

X-Received: by 2002:a05:622a:11c3:: with SMTP id n3mr9735012qtk.211.1624632008414;
Fri, 25 Jun 2021 07:40:08 -0700 (PDT)
X-Received: by 2002:a05:6830:4a:: with SMTP id d10mr10252311otp.81.1624632008138;
Fri, 25 Jun 2021 07:40:08 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 25 Jun 2021 07:40:07 -0700 (PDT)
In-Reply-To: <76c49bf0-1993-4224-9415-f7aa9cd33ab9n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:a93f:cb3c:8a11:3663;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:a93f:cb3c:8a11:3663
References: <sar8dp$d9$1@dont-email.me> <sas8g4$77t$1@newsreader4.netcologne.de>
<sat03v$55d$1@dont-email.me> <sat4ue$r57$3@newsreader4.netcologne.de>
<savk6i$o5m$1@dont-email.me> <dd4475a4-d564-4bda-9e0a-869034a60011n@googlegroups.com>
<savqo3$8q6$1@dont-email.me> <77ce766c-4255-4a77-80ae-7e5235d12482n@googlegroups.com>
<sb18d3$a58$1@gioia.aioe.org> <c91df5a0-afc5-493b-886a-fdeb43d3d12an@googlegroups.com>
<sb3s0j$um$1@gioia.aioe.org> <fb0f4209-d134-4d71-88b6-c813b0305469n@googlegroups.com>
<76c49bf0-1993-4224-9415-f7aa9cd33ab9n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7f552ee9-b040-4c98-be9e-7cd8147a881dn@googlegroups.com>
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 25 Jun 2021 14:40:08 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Fri, 25 Jun 2021 14:40 UTC

On Friday, June 25, 2021 at 5:00:47 AM UTC-5, Michael S wrote:
> On Friday, June 25, 2021 at 12:33:00 PM UTC+3, robf...@gmail.com wrote:
> > I am a little confused by the terminology??? I thought a bit-matrix multiply was multiplying bits between two 64-bit register values as 8x8 matrixes of bits. This can be done in a single clock cycle and requires only two 64-bit registers. Larger registers could be used for larger bit multiplies. A 256-bit SIMD register could hold a 16x16 bit matrix.
<
You could make an 8×8 version and then use loops to make the larger one.
But if you do this, you are going to be in the 64 cycle range for a 64×64 BMM
while my target is 8 cycles. The "math" is so simple that orchestrating the
loop in SW is just slow (and counterproductive.)
<
> You seem less confused than I am.
> For starters, I would like to understand if operations are modulo-2 or not.
<
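#include <stdint.h>

/* Bit position of element (i,j) of an 8x8 bit matrix packed into 64 bits. */
# define index(i,j) (((j)<<3)+(i))

/* S selects the accumulation: nonzero = XOR (modulo-2), zero = OR.
   S is not declared in the original post; it is assumed to be a mode flag. */
uint64_t BMM( uint64_t S1, uint64_t S2, int S )
{   uint64_t i, j, k,
             rd = 0;

    for( i = 0; i < 8; i++ )
    for( j = 0; j < 8; j++ )
    for( k = 0; k < 8; k++ )
    {   /* product of S1 element (i,k) and S2 element (k,j) */
        uint64_t p = (S1>>index(i,k)) & (S2>>index(k,j)) & 1;
        if( S )
            rd ^= p << index(i,j);
        else
            rd |= p << index(i,j);
    }
    return rd;
}
<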
> Later on, I'd like to know what BMM is good for.
<
Moving arbitrary bit positions around to arbitrary bit result locations while
also performing bit multiplication (AND) and accumulation (either XOR or OR).
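
As an illustration of the "moving bits around" use, here is a small sketch built on the 8x8 indexing from the code above, with XOR accumulation only. Multiplying by a permutation matrix on one side reverses the bits inside every byte; on the other side it reverses the byte order. The names `bmm8_xor`, `antidiagonal`, `reverse_bits_in_bytes` and `reverse_bytes` are hypothetical.

```c
#include <stdint.h>

/* Bit position of element (i,j), same layout as index() in the post. */
#define IDX(i, j) (((j) << 3) + (i))

/* 8x8 bit-matrix multiply with XOR (modulo-2) accumulation. */
uint64_t bmm8_xor(uint64_t s1, uint64_t s2)
{
    uint64_t rd = 0;
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            for (int k = 0; k < 8; k++) {
                uint64_t p = (s1 >> IDX(i, k)) & (s2 >> IDX(k, j)) & 1;
                rd ^= p << IDX(i, j);
            }
    return rd;
}

/* Permutation matrix with element (i, 7-i) set for every i. */
uint64_t antidiagonal(void)
{
    uint64_t p = 0;
    for (int i = 0; i < 8; i++)
        p |= UINT64_C(1) << IDX(i, 7 - i);
    return p;
}

/* Permutation on the left: bit i of each byte comes from bit 7-i. */
uint64_t reverse_bits_in_bytes(uint64_t x) { return bmm8_xor(antidiagonal(), x); }

/* Permutation on the right: byte j comes from byte 7-j. */
uint64_t reverse_bytes(uint64_t x) { return bmm8_xor(x, antidiagonal()); }
```

Because a permutation matrix contributes exactly one nonzero term per result bit, OR accumulation would give the same answers here; the two modes only differ when several terms land on the same result bit.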

Re: Proposal for Single instructions for string library functions on My 66000

<sb5t0g$jrq$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18073&group=comp.arch#18073

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Proposal for Single instructions for string library functions on
My 66000
Date: Fri, 25 Jun 2021 19:37:19 -0500
Organization: A noiseless patient Spider
Lines: 85
Message-ID: <sb5t0g$jrq$1@dont-email.me>
References: <sar8dp$d9$1@dont-email.me> <sarihq$hv5$1@dont-email.me>
<sasvif$1df$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 26 Jun 2021 00:39:44 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="fea811e399dfb52ece867d39a1d8c2bd";
logging-data="20346"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/KkJNGicWVBt5IZIAlx70k"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:WsJlLHztg1yOCLTLMOOUPdVpGqQ=
In-Reply-To: <sasvif$1df$1@dont-email.me>
Content-Language: en-US
 by: BGB - Sat, 26 Jun 2021 00:37 UTC

On 6/22/2021 10:28 AM, Stephen Fuld wrote:
> On 6/21/2021 7:39 PM, Brian G. Lucas wrote:
>> On 6/21/21 6:47 PM, Stephen Fuld wrote:
>
> big snip
>
>>> Let me conclude by re-emphasizing that this whole idea (single
>>> instructions for string functions) might not make sense, or might not
>>> be worthwhile, or it might be the wrong way to implement the
>>> functionality, etc.  But I want to present it to get reactions and
>>> potential improvements.
>>>
>>>
>> The MM instruction is widely useful no matter what the source language
>> is.
>>
>> IMHO implementing the C string library instructions is "preparing for
>> the previous war".  I think we need to wait until we see what happens
>> with Rust (and perhaps Go and others) and determine what string (or
>> array) primitives are hot spots in applications written in more modern
>> languages.
>
> Certainly a valid point, although there will certainly be a huge amount
> of C/C++ code (and Java and other popular languages that have similar
> functions) in use for a long time.
>
> I spent a little time looking at the Rust book to see if what I proposed
> was applicable.  It seems it might be, though more of the functionality
> is in the language proper rather than in a library.  It would take more
> study or someone more versed in Rust than I am to know for sure.  I
> haven't looked at Go at all.
>

Expecting much standardization between these languages is probably a stretch.

Though, one possibility for strings could be (for bare character pointers):
  Pointer points at start of string data;
  String data ends in NUL byte.

However, this is not the end of the story:
  str[-1]==00, Plain NUL-terminated string;
    We are pointing at the start.
  str[-1]==01..7F, We are pointing somewhere into the string interior;
  str[-1]==80..BF, We are pointing somewhere into the string interior;
  str[-1]==C0..EF && str[0]==80..BF, String interior.
  str[-1]==C0..EF && str[0]!=80..BF, Start of string, Reverse VLN.
  ...
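
The cases above can be written as a small classifier. This is only a sketch of the table as listed: the enum and the function name classify_str_ptr are made up for illustration, and F0..FF is reported as unspecified because the list trails off before covering it. The caller must guarantee that str[-1] is readable.

```c
/* Hypothetical result codes for the cases listed above. */
enum str_pos {
    STR_START_PLAIN,   /* str[-1] == 00: start of a plain NUL-terminated string */
    STR_INTERIOR,      /* pointing somewhere inside a string */
    STR_START_VLN,     /* start of a string preceded by a Reverse VLN */
    STR_UNSPECIFIED    /* str[-1] in F0..FF: not covered by the list */
};

/* Classify a bare character pointer by looking at str[-1] and str[0]. */
enum str_pos classify_str_ptr(const unsigned char *str)
{
    unsigned char prev = str[-1];

    if (prev == 0x00)
        return STR_START_PLAIN;
    if (prev <= 0xBF)                      /* 01..7F and 80..BF */
        return STR_INTERIOR;
    if (prev <= 0xEF) {                    /* C0..EF */
        /* A following 80..BF byte means prev is a UTF-8 lead byte. */
        int next_is_continuation = (str[0] & 0xC0) == 0x80;
        return next_is_continuation ? STR_INTERIOR : STR_START_VLN;
    }
    return STR_UNSPECIFIED;
}
```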

Reverse VLN:
80..BF: Length (0000..003F)
C0..DF: Length (0040..07FF)
E0..EF: Length (0800..FFFF)
...

Essentially, a Reverse VLN is like a UTF-8 codepoint, but encoded
backwards. The Reverse VLN is preceded by either a meta-type-tag or a
NUL byte.
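
A decoder for the three listed ranges might look like the sketch below. The byte layout is an assumption, since the post does not spell it out: the marker byte is taken to sit at str[-1], with any 80..BF payload bytes before it at str[-2] and str[-3], and the payload bits packed as in UTF-8 (6, 5+6 and 4+6+6 bits). The function name reverse_vln_length is hypothetical. The one-byte 80..BF form is decoded because the range list includes it, so the caller has to know already that the pointer is at a string start.

```c
/* Decode the Reverse VLN that ends at str[-1].
   Returns the length, or -1 if the bytes do not match a listed form.
   Assumed layout (not confirmed by the post): reading backwards from
   str[-1] yields the bytes in ordinary UTF-8 order. */
long reverse_vln_length(const unsigned char *str)
{
    unsigned b0 = str[-1];

    if (b0 >= 0x80 && b0 <= 0xBF)              /* 0000..003F: 6 bits */
        return (long)(b0 & 0x3F);

    if (b0 >= 0xC0 && b0 <= 0xDF) {            /* 0040..07FF: 5 + 6 bits */
        unsigned b1 = str[-2];
        if ((b1 & 0xC0) != 0x80)
            return -1;
        return (long)(((b0 & 0x1F) << 6) | (b1 & 0x3F));
    }

    if (b0 >= 0xE0 && b0 <= 0xEF) {            /* 0800..FFFF: 4 + 6 + 6 bits */
        unsigned b1 = str[-2];
        unsigned b2 = str[-3];
        if ((b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80)
            return -1;
        return (long)(((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F));
    }

    return -1;                                 /* not a listed marker */
}
```

Under that layout, a 64-byte string would be preceded by the two bytes 80 C1, and a 2048-byte string by 80 A0 E0.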

One advantage of these strings is that they are partially backwards
compatible with C strings, but can allow some more capabilities (such as
scanning backwards to find the start of a string if given a pointer to
its interior).

Unlike plain C strings, they would require double-ended termination, but
for string tables, the start and end terminators between adjacent short
strings could be merged.

Possibly, strings longer than a certain minimum could be encoded by
default with a length prefix. This format will assume that the character
data is stored as either ASCII or UTF-8.

A pair of NUL bytes could also encode the start or end marker of a
string table.

A similar scheme can be used for UTF-16 strings but with backwards
surrogate pairs or similar as the start-of-string length marker.

As for whether special string instructions belong in an ISA, I don't
personally believe so. Packed byte-compare comes close, but arguably has
other uses as well.
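
For a rough idea of what a packed byte-compare buys a string routine, here is a sketch in portable C that tests eight bytes at a time for a NUL using the well-known subtract-and-mask trick. It is a software stand-in, not any particular ISA's instruction, and the names has_zero_byte and find_nul are made up for this example. The buffer length is passed in so the 8-byte loads never run past the end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if any of the eight bytes packed in v is 0x00, else 0. */
int has_zero_byte(uint64_t v)
{
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* Index of the first NUL byte in buf[0..n), or n if there is none. */
size_t find_nul(const unsigned char *buf, size_t n)
{
    size_t i = 0;

    /* Skip whole 8-byte groups that contain no NUL. */
    while (i + 8 <= n) {
        uint64_t v;
        memcpy(&v, buf + i, 8);        /* alignment-safe load */
        if (has_zero_byte(v))
            break;
        i += 8;
    }

    /* Finish byte by byte: the group that hit, or the short tail. */
    while (i < n && buf[i] != 0)
        i++;
    return i;
}
```

The same packed test also serves for things like scanning for a delimiter byte (XOR the word with the delimiter replicated eight times first), which is one reason it has uses beyond strings.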
