
Re: FP8 (was Compact representation for common integer constants)

<s7dh0j$mj6$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=16608&group=comp.arch#16608

Path: i2pn2.org!i2pn.org!aioe.org!/FKOcGQMirZgkZJCo9x3IA.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Tue, 11 May 2021 10:58:58 +0200
Organization: Aioe.org NNTP Server
Lines: 38
Message-ID: <s7dh0j$mj6$1@gioia.aioe.org>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s73rcr$83m$1@dont-email.me> <s73u8m$tnj$1@dont-email.me>
<s75pe2$shi$1@dont-email.me> <s76as3$i7j$1@gal.iecc.com>
<jwv7dk93zcy.fsf-monnier+comp.arch@gnu.org> <s787j3$99m$1@dont-email.me>
<jwv1rafyd1g.fsf-monnier+comp.arch@gnu.org>
<jwvv97rwy3y.fsf-monnier+comp.arch@gnu.org>
<48015cc8-9326-427a-9fdd-36ed1e12939an@googlegroups.com>
<s7ai2u$ul4$1@gioia.aioe.org> <ygnzgx2fb2b.fsf@y.z>
NNTP-Posting-Host: /FKOcGQMirZgkZJCo9x3IA.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Tue, 11 May 2021 08:58 UTC

Josh Vanderhoof wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>
>> MitchAlsup wrote:
>>> On Sunday, May 9, 2021 at 1:45:06 PM UTC-5, Stefan Monnier wrote:
>>>> Stefan Monnier [2021-05-09 14:35:28] wrote:
>>>>>> Yes that makes sense. Specifically for FP8, the increased dynamic range
>>>>>> is by far the most important trait. I've even seen examples of using
>>>>>> 1:5:2 in DNNs (deep neural networks) where dynamic range is often more
>>>>>> important than precision.
>>>>> Indeed 1:5:2 is arguably more useful than 1:4:3.
>>>> An alternative of course is to use a log-base representation. So your
>>>> 8bit data is split 1:7 between a sign and an exponent, with no mantissa
>>>> at all and then you choose your dynamic range by picking the base.
>>>> Multiplication is easy, but addition takes more effort.
>>> <
>>> Really ?!?!?
>>> Both are implemented with table look ups with concatenated values as the
>>> address.
>>
>> Agreed, in a HW implementation you don't even need a full 64K (8+8
>> bits), since the sign logic can go on in parallel with the table
>> access.
>>
>> I.e. a 14-bit table for each operation is sufficient, so that is about
>> 14 KB x 4 = 56 KB of ROM space?
>
> Don't you only need half the table for the commutative ops?
>
Even for FADD/FMUL you need the full table unless you want to start by
sorting the inputs, and you can't do that for FSUB/FDIV so the savings
are relatively small.

Terje
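
[Editor's sketch] To make the "sort the inputs first" trade-off concrete, here is a minimal C sketch of triangular indexing for a commutative operation. It assumes 7-bit magnitudes (128 values, with the sign handled outside the table, as discussed above); `tri_index`, `tri_size` and `MAG_VALUES` are hypothetical names, not anything from the thread.

```c
#include <assert.h>
#include <stddef.h>

#define MAG_VALUES 128u  /* assumed: 7-bit magnitude, sign handled separately */

/* Index into a triangular table holding one entry per unordered pair {a, b}.
   Picking hi/lo is the "sort the inputs" step that a full table avoids. */
static size_t tri_index(unsigned a, unsigned b) {
    assert(a < MAG_VALUES && b < MAG_VALUES);
    unsigned hi = a > b ? a : b;
    unsigned lo = a > b ? b : a;
    return (size_t)hi * (hi + 1) / 2 + lo;
}

/* Number of entries needed for n distinct operand values. */
static size_t tri_size(unsigned n) {
    return (size_t)n * (n + 1) / 2;
}
```

With 128 magnitudes this needs 8,256 entries instead of 16,384, at the price of a compare-and-swap in front of the lookup. FSUB and FDIV are not commutative, so they would keep their full tables.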

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: FP8 (was Compact representation for common integer constants)

<jwvo8dhpd74.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=16613&group=comp.arch#16613

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Tue, 11 May 2021 10:31:22 -0400
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <jwvo8dhpd74.fsf-monnier+comp.arch@gnu.org>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s73rcr$83m$1@dont-email.me> <s73u8m$tnj$1@dont-email.me>
<s75pe2$shi$1@dont-email.me> <s76as3$i7j$1@gal.iecc.com>
<jwv7dk93zcy.fsf-monnier+comp.arch@gnu.org>
<s787j3$99m$1@dont-email.me>
<jwv1rafyd1g.fsf-monnier+comp.arch@gnu.org>
<jwvv97rwy3y.fsf-monnier+comp.arch@gnu.org>
<48015cc8-9326-427a-9fdd-36ed1e12939an@googlegroups.com>
<s7ai2u$ul4$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: reader02.eternal-september.org; posting-host="6fcdc590312139b22c872e589820db70";
logging-data="25119"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18o4YqNIiMTrOUsgJgZw9o1"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Cancel-Lock: sha1:Yb7cOw10kfXnJb8RUy2ydOeWIUs=
sha1:OUcmlhqCRYoOwQrbmeEEoBhefmw=
 by: Stefan Monnier - Tue, 11 May 2021 14:31 UTC

>>> An alternative of course is to use a log-base representation. So your
>>> 8bit data is split 1:7 between a sign and an exponent, with no mantissa
>>> at all and then you choose your dynamic range by picking the base.
>>> Multiplication is easy, but addition takes more effort.
>> <
>> Really ?!?!?
>> Both are implemented with table look ups with concatenated values as the
>> address.
>
> Agreed, in a HW implementation you don't even need a full 64K (8+8 bits),
> since the sign logic can go on in parallel with the table access.
>
> I.e. a 14-bit table for each operation is sufficient, so that is about 14 KB
> x 4 = 56 KB of ROM space?

For FMUL and FDIV, a logarithm-based representation means that these
operations are implemented as addition/subtraction of the exponents,
which I don't think deserves to be implemented via table lookup.

So we're only really talking about 2 operations (FADD and FSUB) which
reduce to a single operation after fiddling with the sign.

I'm still not sure a 32kB table is the best choice for FADD, though: with
the low precision of FP8, I'd expect that of the 32k different possible
cases, the overwhelming majority fall into the situation where
x + y ≃ max(x, y), because the smaller of the two is smaller than the
relative error of the larger (and I think we can find those cases cheaply
by subtracting the two exponents and checking whether the magnitude of
the difference is larger than the threshold).

Stefan
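
[Editor's sketch] The scheme described above can be sketched in C as follows. Everything specific here is an assumption for illustration: the base (1.25), the bias (64), the lack of a zero encoding, and the names `lns_mul`, `lns_add_same_sign` and `lns_add_corr`. The correction table is precomputed as round(log_1.25(1 + 1.25^-d)) for exponent difference d; for this base it rounds to zero once d reaches 10, which is the threshold case where x + y ≃ max(x, y).

```c
#include <assert.h>
#include <stdint.h>

#define LNS_BIAS      64  /* assumed exponent bias */
#define LNS_THRESHOLD 10  /* assumed base 1.25: differences >= 10 round to max(x, y) */

/* Precomputed round(log_1.25(1 + 1.25^-d)) for d = 0..9 (assumed base 1.25). */
static const uint8_t lns_add_corr[LNS_THRESHOLD] = {3, 3, 2, 2, 2, 1, 1, 1, 1, 1};

/* Saturate an exponent to the 7-bit field. */
static uint8_t lns_clamp(int e) {
    return (uint8_t)(e < 0 ? 0 : e > 127 ? 127 : e);
}

/* Multiply: XOR the sign bits, add the biased exponents. No table needed. */
static uint8_t lns_mul(uint8_t a, uint8_t b) {
    int e = (a & 0x7F) + (b & 0x7F) - LNS_BIAS;
    return (uint8_t)(((a ^ b) & 0x80) | lns_clamp(e));
}

/* Add two values of equal sign: larger exponent plus a small correction
   looked up by the exponent difference; zero correction past the threshold. */
static uint8_t lns_add_same_sign(uint8_t a, uint8_t b) {
    assert(((a ^ b) & 0x80) == 0);
    int ea = a & 0x7F, eb = b & 0x7F;
    int hi = ea > eb ? ea : eb;
    int d  = ea > eb ? ea - eb : eb - ea;
    int corr = d < LNS_THRESHOLD ? lns_add_corr[d] : 0;
    return (uint8_t)((a & 0x80) | lns_clamp(hi + corr));
}
```

Under these assumptions the addition table shrinks from 32k entries to 10. Mixed-sign addition (effective subtraction) would need a second correction table of similar size, which is not shown here.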

Re: The old RISC-vs-CISC

<7d9da68e-5aae-4619-9593-b7496f2c7725n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16614&group=comp.arch#16614

X-Received: by 2002:a37:404a:: with SMTP id n71mr28858235qka.330.1620748396487;
Tue, 11 May 2021 08:53:16 -0700 (PDT)
X-Received: by 2002:a4a:8311:: with SMTP id f17mr20818788oog.83.1620748396190;
Tue, 11 May 2021 08:53:16 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 11 May 2021 08:53:15 -0700 (PDT)
In-Reply-To: <s7decb$1c7j$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me> <6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me> <77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
<s747f6$5ri$1@dont-email.me> <0a355bf9-7067-4076-9365-de1c63061df1n@googlegroups.com>
<s74rp7$laf$1@dont-email.me> <s78dnf$hpi$1@dont-email.me> <f656541b-ffc7-407e-8f5f-c7ca93477f98n@googlegroups.com>
<s79khu$u4v$1@dont-email.me> <55ccad4d-7603-415e-9e37-c6db84901e33n@googlegroups.com>
<s7bt9m$iim$1@dont-email.me> <62b81886-bf04-4a4c-a89c-aacedba9a1f5n@googlegroups.com>
<f475d3b6-dc42-4063-9d3f-4f38e3de4e37n@googlegroups.com> <e1d470b1-b2d7-43df-af69-e6f6c9aff210n@googlegroups.com>
<s7ccdn$28b$1@dont-email.me> <s7decb$1c7j$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7d9da68e-5aae-4619-9593-b7496f2c7725n@googlegroups.com>
Subject: Re: The old RISC-vs-CISC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 11 May 2021 15:53:16 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Tue, 11 May 2021 15:53 UTC

On Tuesday, May 11, 2021 at 3:14:10 AM UTC-5, Terje Mathisen wrote:
> Ivan Godard wrote:
> > On 5/10/2021 2:11 PM, MitchAlsup wrote:
> >> On Monday, May 10, 2021 at 3:53:45 PM UTC-5, robf...@gmail.com wrote:
> >>> ANY1 supports a 21 bit branch displacement for branching +/-1MB. It
> >>> may also store the return address in the target register allowing
> >>> conditional branch to subroutine.
> >> <
> >> How often do you find conditional branching to a subroutine to be
> >> effective ?
> >
> > Very frequent in Mill code, due to predication of calls when EBBs are
> > folded together to let both then and else run interleaved. Probably rare
> > otherwise. Consequently asking "how often do you see?" is not very
> > informative unless also asking whether the compiler takes advantage of
> > the conditional.
> A conditional function call would be very useful whenever you write any
> code that processes a stream of data, using a local buffer which you have
> to refill whenever you reach the low-water mark.
>
> I.e.
>
> if (amount < low_limit) fill_buffer();
>
> This happens a lot in most stream/file library code, as well as when
> processing bitstreams for a codec.
<
But does the code not look more like:

if( buf->amount < low_limit ) buf->fill_pointer = fill_buffer( buf->file, buf->fill_pointer );
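
[Editor's sketch] For readers who want the whole pattern in one place, here is a small self-contained C version of the low-water-mark refill. The struct layout, the field names and the in-memory `src` array standing in for a file are all hypothetical; only the shape of the conditional call comes from the posts above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE  16
#define LOW_LIMIT 4   /* low-water mark, in bytes */

/* Hypothetical stream state; src stands in for the underlying file. */
struct stream {
    const unsigned char *src;
    size_t src_len, src_pos;
    unsigned char buf[BUF_SIZE];
    size_t head;      /* index of the next unread byte in buf */
    size_t amount;    /* unread bytes currently in buf */
    unsigned refills; /* how often the slow path ran */
};

/* Slow path: compact the unread bytes to the front, then top up the buffer. */
static void fill_buffer(struct stream *s) {
    memmove(s->buf, s->buf + s->head, s->amount);
    s->head = 0;
    size_t room = BUF_SIZE - s->amount;
    size_t left = s->src_len - s->src_pos;
    size_t n = room < left ? room : left;
    memcpy(s->buf + s->amount, s->src + s->src_pos, n);
    s->src_pos += n;
    s->amount += n;
    s->refills++;
}

/* Fast path: returns the next byte, or -1 once all data is consumed. */
static int next_byte(struct stream *s) {
    assert(s != NULL);
    if (s->amount < LOW_LIMIT && s->src_pos < s->src_len)
        fill_buffer(s);  /* the conditional call under discussion */
    if (s->amount == 0)
        return -1;
    s->amount--;
    return s->buf[s->head++];
}
```

The call takes a single pointer that is already live in the fast path, which is the favourable case for a conditional call instruction. The two-argument form shown above would need its arguments set up before the condition is evaluated.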
<
>
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<s7eep9$no2$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16620&group=comp.arch#16620

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common
integer constants)
Date: Tue, 11 May 2021 19:27:05 +0200
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <s7eep9$no2$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me>
<6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me>
<77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
<s747f6$5ri$1@dont-email.me>
<0a355bf9-7067-4076-9365-de1c63061df1n@googlegroups.com>
<s74rp7$laf$1@dont-email.me> <s78dnf$hpi$1@dont-email.me>
<f656541b-ffc7-407e-8f5f-c7ca93477f98n@googlegroups.com>
<s79khu$u4v$1@dont-email.me>
<55ccad4d-7603-415e-9e37-c6db84901e33n@googlegroups.com>
<s7bt9m$iim$1@dont-email.me>
<62b81886-bf04-4a4c-a89c-aacedba9a1f5n@googlegroups.com>
<f475d3b6-dc42-4063-9d3f-4f38e3de4e37n@googlegroups.com>
<e1d470b1-b2d7-43df-af69-e6f6c9aff210n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 11 May 2021 17:27:05 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c8679b4e2bd34864a43c7e801490a814";
logging-data="24322"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Iglo9T548ASl4EJgFyAXuDVQwqRvC6zg="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.8.1
Cancel-Lock: sha1:+JR4rkVWtyu+D7HQdDXmYlbGNjM=
In-Reply-To: <e1d470b1-b2d7-43df-af69-e6f6c9aff210n@googlegroups.com>
Content-Language: en-US
 by: Marcus - Tue, 11 May 2021 17:27 UTC

On 2021-05-10, MitchAlsup wrote:
> On Monday, May 10, 2021 at 3:53:45 PM UTC-5, robf...@gmail.com wrote:
>> ANY1 supports a 21 bit branch displacement for branching +/-1MB. It may also store the return address in the target register allowing conditional branch to subroutine.
> <
> How often do you find conditional branching to a subroutine to be effective ?

I had conditional branch-and-link in my ISA at one point (probably
inspired by POWER or some such ISA), but then I realized that it
actually implied a conditional move to the link register, which
was something I wanted to avoid (plus the instructions ate up valuable
instruction encoding space) - so I removed those instructions.

>
> {I remember using this a lot in 8085 assembly coding, but when writing in C I seldom find
> it profitable because all the arguments had to be in the proper registers before the condition
> to use the conditionality of the call.
>
> Also, do you have a conditional return, or does the epilogue of the subroutine "get in the way" ??
> <
>> Operation for BEQ:
>> Rt = IP + 8
>> If (Ra = Rb)
>> If (Rc = 63)
>> IP = IP + Displacement
>> Else
>> IP = Rc + Displacement
>>
>> There really needs to be only a small number of return address registers. So in a couple of designs I have trimmed the target register down to two bits allowing three return address registers, and then extended the constant field of the JAL instruction by three extra bits. It makes quite a difference being able to branch +-16MB for instance instead of 2MB. 2MB is not enough for some code.
> <
> I got ±128MB from my ISA.
>
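
[Editor's sketch] The quoted BEQ pseudocode can be read as the following C model. Several points are assumptions rather than anything stated in the thread: 64 registers (suggested by "Rc = 63"), an 8-byte instruction so that the not-taken path continues at IP + 8, and no hard-wired zero register. `struct cpu` and `beq` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS  64  /* assumed from "Rc = 63" being a special encoding */
#define RC_USE_IP 63  /* Rc = 63 selects IP-relative branching */

struct cpu {
    uint64_t ip;
    uint64_t r[NUM_REGS];
};

/* One BEQ step. Note that Rt is written whether or not the branch is
   taken, so no conditional move to the link register is involved. */
static void beq(struct cpu *c, unsigned rt, unsigned ra, unsigned rb,
                unsigned rc, int64_t disp) {
    assert(rt < NUM_REGS && ra < NUM_REGS && rb < NUM_REGS && rc < NUM_REGS);
    /* Read all sources before writing Rt, in case Rt aliases one of them. */
    uint64_t next  = c->ip + 8;
    uint64_t base  = (rc == RC_USE_IP) ? c->ip : c->r[rc];
    int      taken = c->r[ra] == c->r[rb];
    c->r[rt] = next;
    c->ip = taken ? base + (uint64_t)disp : next;  /* assumed fall-through */
}
```

Read this way, the return address is always stored and only the update of IP is conditional.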

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<s7em2f$7gf$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16623&group=comp.arch#16623

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common
integer constants)
Date: Tue, 11 May 2021 14:30:16 -0500
Organization: A noiseless patient Spider
Lines: 77
Message-ID: <s7em2f$7gf$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me>
<6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me>
<77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
<s747f6$5ri$1@dont-email.me>
<0a355bf9-7067-4076-9365-de1c63061df1n@googlegroups.com>
<s74rp7$laf$1@dont-email.me> <s78dnf$hpi$1@dont-email.me>
<f656541b-ffc7-407e-8f5f-c7ca93477f98n@googlegroups.com>
<s79khu$u4v$1@dont-email.me>
<55ccad4d-7603-415e-9e37-c6db84901e33n@googlegroups.com>
<s7bt9m$iim$1@dont-email.me>
<62b81886-bf04-4a4c-a89c-aacedba9a1f5n@googlegroups.com>
<f475d3b6-dc42-4063-9d3f-4f38e3de4e37n@googlegroups.com>
<e1d470b1-b2d7-43df-af69-e6f6c9aff210n@googlegroups.com>
<s7eep9$no2$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 11 May 2021 19:31:27 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="588833796df27a08939633a5f2cccbe9";
logging-data="7695"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/yndZzCoJmsnMfq5wfqbnI"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1:8ooVg9kYuVjsJV2mRjDaqy3podE=
In-Reply-To: <s7eep9$no2$1@dont-email.me>
Content-Language: en-US
 by: BGB - Tue, 11 May 2021 19:30 UTC

On 5/11/2021 12:27 PM, Marcus wrote:
> On 2021-05-10, MitchAlsup wrote:
>> On Monday, May 10, 2021 at 3:53:45 PM UTC-5, robf...@gmail.com wrote:
>>> ANY1 supports a 21 bit branch displacement for branching +/-1MB. It
>>> may also store the return address in the target register allowing
>>> conditional branch to subroutine.
>> <
>> How often do you find conditional branching to a subroutine to be
>> effective ?
>
> I had conditional branch-and-link in my ISA at one point (probably
> inspired by POWER or some such ISA), but then I realized that it
> actually implied a conditional move to the link register, which
> was something I wanted to avoid (plus the instructions ate up valuable
> instruction encoding space) - so I removed those instructions.
>

It exists indirectly in BJX2 as a result of predicated instructions, and
the ability to predicate branches.
BSR?T / BSR?F

It sort of exists in a gray area...
Neither allowed nor prohibited, will probably work with the FPGA
implementation.

>>
>> {I remember using this a lot in 8085 assembly coding, but when writing
>> in C I seldom find
>> it profitable because all the arguments had to be in the proper
>> registers before the condition
>> to use the conditionality of the call.
>>
>> Also, do you have a conditional return, or does the epilogue of the
>> subroutine "get in the way" ??
>> <
>>> Operation for BEQ:
>>> Rt = IP + 8
>>> If (Ra = Rb)
>>> If (Rc = 63)
>>> IP = IP + Displacement
>>> Else
>>> IP = Rc + Displacement
>>>
>>> There really needs to be only a small number of return address
>>> registers. So in a couple of designs I have trimmed the target
>>> register down to two bits allowing three return address registers,
>>> and then extended the constant field of the JAL instruction by three
>>> extra bits. It makes quite a difference being able to branch +-16MB
>>> for instance instead of 2MB. 2MB is not enough for some code.
>> <
>> I got ±128MB from my ISA.
>>
>

At the moment, I now have, by instruction size:
  16-bit: ± 256B  (Disp8s)
  32-bit: ± 1MB   (Disp20s)
  64-bit: ± 8GB   (Disp33s, *)
          ~ 256TB (Abs48)

*: This came after realizing a new trick that allowed me to effectively
eliminate the modular addressing issue from branches while actually
reducing cost and latency in a few areas (it also sidesteps the need for a
separate LEA, which has now been expanded to ± 32GB with LEA.Q).

Though, the latter change was basically "just make it a little wider"
(adding a few bits to the AGU width didn't hurt things too badly). Also
it allows using both "signed int" and "unsigned int" as displacements
provided they are properly extended. Using "long" or "long long" as an
array index would still require explicit ALU ops.

Hopefully, displacements shouldn't be too much of an issue anymore in
this case.
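
[Editor's sketch] The ranges quoted in this subthread follow from the displacement width and the unit it counts. A minimal C check, where `branch_reach` is a hypothetical helper and the scale factors (2-byte units for the Disp8s/Disp20s/Disp33s figures, 1-byte units for ANY1's 21-bit field) are inferred from the numbers rather than stated by the posters:

```c
#include <assert.h>
#include <stdint.h>

/* Reach in bytes on each side of the branch for a signed displacement of
   `bits` bits counted in units of `scale` bytes (the positive side is one
   unit short of this). */
static uint64_t branch_reach(unsigned bits, unsigned scale) {
    assert(bits >= 1 && bits <= 48 && scale >= 1);
    return ((uint64_t)1 << (bits - 1)) * scale;
}
```

Under those assumptions, `branch_reach(8, 2)` gives 256 bytes, `branch_reach(20, 2)` gives 1 MB, `branch_reach(33, 2)` gives 8 GB, and `branch_reach(21, 1)` gives the ±1 MB quoted for ANY1.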

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<6523662e-f897-453c-b17f-6a9e5608d5b6n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16626&group=comp.arch#16626

X-Received: by 2002:a05:622a:1049:: with SMTP id f9mr29626545qte.140.1620762235945;
Tue, 11 May 2021 12:43:55 -0700 (PDT)
X-Received: by 2002:a9d:6244:: with SMTP id i4mr26496767otk.182.1620762235727;
Tue, 11 May 2021 12:43:55 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 11 May 2021 12:43:55 -0700 (PDT)
In-Reply-To: <s7em2f$7gf$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me> <6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me> <77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
<s747f6$5ri$1@dont-email.me> <0a355bf9-7067-4076-9365-de1c63061df1n@googlegroups.com>
<s74rp7$laf$1@dont-email.me> <s78dnf$hpi$1@dont-email.me> <f656541b-ffc7-407e-8f5f-c7ca93477f98n@googlegroups.com>
<s79khu$u4v$1@dont-email.me> <55ccad4d-7603-415e-9e37-c6db84901e33n@googlegroups.com>
<s7bt9m$iim$1@dont-email.me> <62b81886-bf04-4a4c-a89c-aacedba9a1f5n@googlegroups.com>
<f475d3b6-dc42-4063-9d3f-4f38e3de4e37n@googlegroups.com> <e1d470b1-b2d7-43df-af69-e6f6c9aff210n@googlegroups.com>
<s7eep9$no2$1@dont-email.me> <s7em2f$7gf$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6523662e-f897-453c-b17f-6a9e5608d5b6n@googlegroups.com>
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common
integer constants)
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 11 May 2021 19:43:55 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Tue, 11 May 2021 19:43 UTC

On Tuesday, May 11, 2021 at 2:31:30 PM UTC-5, BGB wrote:
> On 5/11/2021 12:27 PM, Marcus wrote:
> > On 2021-05-10, MitchAlsup wrote:
> >> On Monday, May 10, 2021 at 3:53:45 PM UTC-5, robf...@gmail.com wrote:
> >>> ANY1 supports a 21 bit branch displacement for branching +/-1MB. It
> >>> may also store the return address in the target register allowing
> >>> conditional branch to subroutine.
> >> <
> >> How often do you find conditional branching to a subroutine to be
> >> effective ?
> >
> > I had conditional branch-and-link in my ISA at one point (probably
> > inspired by POWER or some such ISA), but then I realized that it
> > actually implied a conditional move to the link register, which
> > was something I wanted to avoid (plus the instructions ate up valuable
> > instruction encoding space) - so I removed those instructions.
> >
> It exists indirectly in BJX2 as a result of predicated instructions, and
> the ability to predicate branches.
> BSR?T / BSR?F
>
> It sort of exists in a gray area...
> Neither allowed nor prohibited, will probably work with the FPGA
> implementation.
> >>
> >> {I remember using this a lot in 8085 assembly coding, but when writing
> >> in C I seldom find
> >> it profitable because all the arguments had to be in the proper
> >> registers before the condition
> >> to use the conditionality of the call.
> >>
> >> Also, do you have a conditional return, or does the epilogue of the
> >> subroutine "get in the way" ??
> >> <
> >>> Operation for BEQ:
> >>> Rt = IP + 8
> >>> If (Ra = Rb)
> >>> If (Rc = 63)
> >>> IP = IP + Displacement
> >>> Else
> >>> IP = Rc + Displacement
> >>>
> >>> There really needs to be only a small number of return address
> >>> registers. So in a couple of designs I have trimmed the target
> >>> register down to two bits allowing three return address registers,
> >>> and then extended the constant field of the JAL instruction by three
> >>> extra bits. It makes quite a difference being able to branch +-16MB
> >>> for instance instead of 2MB. 2MB is not enough for some code.
> >> <
> >> I got ±128MB from my ISA.
> >>
> >
> At the moment, I now have, by instruction size:
> 16-bit: ± 256B (Disp8s)
> 32-bit: ± 1MB (Disp20s)
> 64-bit: ± 8GB (Disp33s, *)
> ~ 256TB (Abs48)
>
It appears you like the <alt>0177 sequence.......
>
> *: After realizing a new trick that allowed me to effectively eliminate
> the modular addressing issue from branches while actually reducing cost
> and latency in a few areas (and sidesteps the need for a separate LEA,
> which has now also been expanded to ± 32GB with LEA.Q).
>
> Though, the latter change was basically "just make it a little wider"
> (adding a few bits to the AGU width didn't hurt things too badly). Also
> it allows using both "signed int" and "unsigned int" as displacements
> provided they are properly extended. Using "long" or "long long" as an
> array index would still require explicit ALU ops.
>
>
> Hopefully, displacements shouldn't be too much of an issue anymore in
> this case.
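
The BEQ operation quoted above can be modelled in a few lines of C. This is only a sketch of the quoted pseudocode, not ANY1's actual definition: the struct and function names are made up, the fall-through address of IP + 8 is inferred from "Rt = IP + 8", and it assumes all source registers are read before Rt is written.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; only "Rc = 63" and "IP + 8" come from the post. */
enum { NUM_REGS = 64, RC_MEANS_IP = 63, INSN_BYTES = 8 };

struct any1_like {
    uint64_t ip;            /* address of the current instruction */
    uint64_t r[NUM_REGS];   /* general registers */
};

/* Conditional branch-and-link: Rt always receives the return address,
 * and the branch base is either IP (Rc == 63) or the register Rc. */
static void beq(struct any1_like *m, unsigned rt, unsigned ra, unsigned rb,
                unsigned rc, int64_t disp)
{
    assert(rt < NUM_REGS && ra < NUM_REGS && rb < NUM_REGS && rc < NUM_REGS);

    uint64_t next  = m->ip + INSN_BYTES;
    /* Assumption: sources are read before the link write, as hardware would. */
    uint64_t base  = (rc == RC_MEANS_IP) ? m->ip : m->r[rc];
    int      equal = (m->r[ra] == m->r[rb]);

    m->r[rt] = next;                               /* Rt = IP + 8 */
    m->ip = equal ? base + (uint64_t)disp : next;  /* taken or fall through */
}
```

Note that Rt is written on both paths, so this form never needs a conditional register write.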

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<s7eo9r$ic4$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16628&group=comp.arch#16628

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common
integer constants)
Date: Tue, 11 May 2021 15:08:15 -0500
Organization: A noiseless patient Spider
Lines: 91
Message-ID: <s7eo9r$ic4$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me>
<6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me>
<77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
<s747f6$5ri$1@dont-email.me>
<0a355bf9-7067-4076-9365-de1c63061df1n@googlegroups.com>
<s74rp7$laf$1@dont-email.me> <s78dnf$hpi$1@dont-email.me>
<f656541b-ffc7-407e-8f5f-c7ca93477f98n@googlegroups.com>
<s79khu$u4v$1@dont-email.me>
<55ccad4d-7603-415e-9e37-c6db84901e33n@googlegroups.com>
<s7bt9m$iim$1@dont-email.me>
<62b81886-bf04-4a4c-a89c-aacedba9a1f5n@googlegroups.com>
<f475d3b6-dc42-4063-9d3f-4f38e3de4e37n@googlegroups.com>
<e1d470b1-b2d7-43df-af69-e6f6c9aff210n@googlegroups.com>
<s7eep9$no2$1@dont-email.me> <s7em2f$7gf$1@dont-email.me>
<6523662e-f897-453c-b17f-6a9e5608d5b6n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 11 May 2021 20:09:31 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="588833796df27a08939633a5f2cccbe9";
logging-data="18820"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Bffvnye/JFE/G7DrBBAJe"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1:LFXdWhrBe1dIkE0u9wL6eSyHqo0=
In-Reply-To: <6523662e-f897-453c-b17f-6a9e5608d5b6n@googlegroups.com>
Content-Language: en-US
 by: BGB - Tue, 11 May 2021 20:08 UTC

On 5/11/2021 2:43 PM, MitchAlsup wrote:
> On Tuesday, May 11, 2021 at 2:31:30 PM UTC-5, BGB wrote:
>> On 5/11/2021 12:27 PM, Marcus wrote:
>>> On 2021-05-10, MitchAlsup wrote:
>>>> On Monday, May 10, 2021 at 3:53:45 PM UTC-5, robf...@gmail.com wrote:
>>>>> ANY1 supports a 21 bit branch displacement for branching +/-1MB. It
>>>>> may also store the return address in the target register allowing
>>>>> conditional branch to subroutine.
>>>> <
>>>> How often do you find conditional branching to a subroutine to be
>>>> effective ?
>>>
>>> I had conditional branch-and-link in my ISA at one point (probably
>>> inspired by POWER or some such ISA), but then I realized that it
>>> actually implied a conditional move to the link register, which
>>> was something I wanted to avoid (plus the instructions ate up valuable
>>> instruction encoding space) - so I removed those instructions.
>>>
>> It exists indirectly in BJX2 as a result of predicated instructions, and
>> the ability to predicate branches.
>> BSR?T / BSR?F
>>
>> It sort of exists in a gray area...
>> Neither allowed nor prohibited, will probably work with the FPGA
>> implementation.
>>>>
>>>> {I remember using this a lot in 8085 assembly coding, but when writing
>>>> in C I seldom find
>>>> it profitable because all the arguments had to be in the proper
>>>> registers before the condition
>>>> to use the conditionality of the call.
>>>>
>>>> Also, do you have a conditional return, or does the epilogue of the
>>>> subroutine "get in the way" ??
>>>> <
>>>>> Operation for BEQ:
>>>>> Rt = IP + 8
>>>>> If (Ra = Rb)
>>>>> If (Rc = 63)
>>>>> IP = IP + Displacement
>>>>> Else
>>>>> IP = Rc + Displacement
>>>>>
>>>>> There really needs to be only a small number of return address
>>>>> registers. So in a couple of designs I have trimmed the target
>>>>> register down to two bits allowing three return address registers,
>>>>> and then extended the constant field of the JAL instruction by three
>>>>> extra bits. It makes quite a difference being able to branch +-16MB
>>>>> for instance instead of 2MB. 2MB is not enough for some code.
>>>> <
>>>> I got ±128MB from my ISA.
>>>>
>>>
>> At the moment, I now have, by instruction size:
>> 16-bit: ± 256B (Disp8s)
>> 32-bit: ± 1MB (Disp20s)
>> 64-bit: ± 8GB (Disp33s, *)
>> ~ 256TB (Abs48)
>>
> It appears you like the <alt>0177 sequence.......

I figured since it was brought up, maybe you were getting annoyed with
me typing +/- or something...

Otherwise, I sometimes wish that usenet/pastebin/... were properly
whitespace-preserving.

I also wish that more stuff did "proper" monospace fonts...
Like, even when one specifies a monospace font, a lot of programs still
don't give a "proper" monospace display.

....

>>
>> *: After realizing a new trick that allowed me to effectively eliminate
>> the modular addressing issue from branches while actually reducing cost
>> and latency in a few areas (and sidesteps the need for a separate LEA,
>> which has now also been expanded to ± 32GB with LEA.Q).
>>
>> Though, the latter change was basically "just make it a little wider"
>> (adding a few bits to the AGU width didn't hurt things too badly). Also
>> it allows using both "signed int" and "unsigned int" as displacements
>> provided they are properly extended. Using "long" or "long long" as an
>> array index would still require explicit ALU ops.
>>
>>
>> Hopefully, displacements shouldn't be too much of an issue anymore in
>> this case.
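
As a rough check on the ranges listed above: a signed N-bit displacement reaches 2^(N-1) units in either direction. The sketch below assumes (the post does not say so) that the BJX2 displacements count 16-bit units, which is what makes Disp8s, Disp20s and Disp33s come out at 256B, 1MB and 8GB. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reach in bytes, in one direction, of a signed displacement field of
 * `bits` bits scaled by `granule` bytes. Hypothetical helper for
 * illustration; not taken from any ISA manual. */
static uint64_t disp_reach_bytes(unsigned bits, unsigned granule)
{
    assert(bits >= 1 && bits <= 48);
    assert(granule >= 1);
    /* A signed N-bit field spans [-2^(N-1), 2^(N-1) - 1] units. */
    return ((uint64_t)1 << (bits - 1)) * granule;
}
```

With a granule of 2 this gives 256 for 8 bits, 1MB for 20 bits and 8GB for 33 bits. The 21-bit, +/-1MB figure quoted for ANY1 matches a granule of 1 byte. A 26-bit field counting 4-byte words would give 128MB, but whether that is how the ±128MB figure arises is a guess.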

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<s7eoml$qc7$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16629&group=comp.arch#16629

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common
integer constants)
Date: Tue, 11 May 2021 13:16:21 -0700
Organization: A noiseless patient Spider
Lines: 31
Message-ID: <s7eoml$qc7$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me>
<6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me>
<77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
<s747f6$5ri$1@dont-email.me>
<0a355bf9-7067-4076-9365-de1c63061df1n@googlegroups.com>
<s74rp7$laf$1@dont-email.me> <s78dnf$hpi$1@dont-email.me>
<f656541b-ffc7-407e-8f5f-c7ca93477f98n@googlegroups.com>
<s79khu$u4v$1@dont-email.me>
<55ccad4d-7603-415e-9e37-c6db84901e33n@googlegroups.com>
<s7bt9m$iim$1@dont-email.me>
<62b81886-bf04-4a4c-a89c-aacedba9a1f5n@googlegroups.com>
<f475d3b6-dc42-4063-9d3f-4f38e3de4e37n@googlegroups.com>
<e1d470b1-b2d7-43df-af69-e6f6c9aff210n@googlegroups.com>
<s7eep9$no2$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 11 May 2021 20:16:22 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="8d7b9135d963dcff083c5b1cfb1a474a";
logging-data="27015"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX184J7sQLhC2KslG2hpspaQC"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1:xY5s57Wmb+VUUZiANwpQ2tR/GD0=
In-Reply-To: <s7eep9$no2$1@dont-email.me>
Content-Language: en-US
 by: Ivan Godard - Tue, 11 May 2021 20:16 UTC

On 5/11/2021 10:27 AM, Marcus wrote:
> On 2021-05-10, MitchAlsup wrote:
>> On Monday, May 10, 2021 at 3:53:45 PM UTC-5, robf...@gmail.com wrote:
>>> ANY1 supports a 21 bit branch displacement for branching +/-1MB. It
>>> may also store the return address in the target register allowing
>>> conditional branch to subroutine.
>> <
>> How often do you find conditional branching to a subroutine to be
>> effective ?
>
> I had conditional branch-and-link in my ISA at one point (probably
> inspired by POWER or some such ISA), but then I realized that it
> actually implied a conditional move to the link register, which
> was something I wanted to avoid (plus the instructions ate up valuable
> instruction encoding space) - so I removed those instructions.

Doesn't have to require conditional move.

In a belt machine, a call must update the belt in the same way whether
taken or not, so Mill conditional calls drop null results if the call is
untaken. Because of the availability of NaR as a metadata marker for
invalid/missing data, the nulls are NaRs and so will poison any
subsequent actual use of the value, but that's a QOI feature independent
of the conditional call feature.

In your case, you could have the link register unconditionally
clobbered, either by the return address if taken, or a null if untaken.
Or, if the return address can be meaningfully used whether taken or not,
set the register unconditionally to what would be the return address if
it had been taken. An untaken conditional call need not be totally free
of side effects; it just must not enter the called address.
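
A minimal sketch of the second option above, where the link register is set unconditionally to the would-be return address. The machine model, names and the 4-byte instruction size are all assumptions for illustration; this is not Mill or any real ISA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { INSN_BYTES = 4 };   /* assumed fixed instruction size */

/* Toy machine state; field names are illustrative only. */
struct cpu {
    uint64_t ip;     /* address of the current instruction */
    uint64_t link;   /* link (return address) register */
};

/* Conditional call whose link write does not depend on the condition:
 * the register is clobbered on every execution, so the untaken case has
 * a side effect but never enters the called address. */
static void cond_call(struct cpu *c, bool taken, int64_t disp)
{
    assert(c);
    uint64_t ret = c->ip + INSN_BYTES;

    c->link = ret;                                   /* unconditional write */
    c->ip = taken ? c->ip + (uint64_t)disp : ret;    /* only IP is conditional */
}
```

The "null if untaken" variant would replace the unconditional store with `c->link = taken ? ret : 0;`, which is still a write on every execution rather than a conditional move.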

Re: FP8 (was Compact representation for common integer constants)

<s7fsgs$1fag$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=16642&group=comp.arch#16642

Path: i2pn2.org!i2pn.org!aioe.org!/FKOcGQMirZgkZJCo9x3IA.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Wed, 12 May 2021 08:27:39 +0200
Organization: Aioe.org NNTP Server
Lines: 59
Message-ID: <s7fsgs$1fag$1@gioia.aioe.org>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s73rcr$83m$1@dont-email.me> <s73u8m$tnj$1@dont-email.me>
<s75pe2$shi$1@dont-email.me> <s76as3$i7j$1@gal.iecc.com>
<jwv7dk93zcy.fsf-monnier+comp.arch@gnu.org> <s787j3$99m$1@dont-email.me>
<jwv1rafyd1g.fsf-monnier+comp.arch@gnu.org>
<jwvv97rwy3y.fsf-monnier+comp.arch@gnu.org>
<48015cc8-9326-427a-9fdd-36ed1e12939an@googlegroups.com>
<s7ai2u$ul4$1@gioia.aioe.org> <jwvo8dhpd74.fsf-monnier+comp.arch@gnu.org>
NNTP-Posting-Host: /FKOcGQMirZgkZJCo9x3IA.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Wed, 12 May 2021 06:27 UTC

Stefan Monnier wrote:
>>>> An alternative of course is to use a log-base representation. So your
>>>> 8bit data is split 1:7 between a sign and an exponent, with no mantissa
>>>> at all and then you choose your dynamic range by picking the base.
>>>> Multiplication is easy but addition takes more effort.
>>> <
>>> Really ?!?!?
>>> Both are implemented with table look ups with concatenated values as the
>>> address.
>>
>> Agreed, in a HW implementation you don't even need a full 64K (8+8 bits),
>> since the sign logic can go on in parallel with the table access.
>>
>> I.e. 14-bit tables for each operation is sufficient, so that is about 14 KB
>> x 4 = 56 KB of rom space?
>
> For FMUL and FDIV, a logarithm based representation means that these
> operations are implemented as addition/subtraction of the exponent,
> which I don't think deserves to be implemented via table lookup.

OK, for a pure log with no mantissa (or an implied mantissa of 1), I
agree that FDIV/FMUL deserves to be implemented as integer sub/add. :-)

You do need to detect zero inputs and handle that separately.
>
> So we're only really talking about 2 operations (FADD and FSUB) which
> reduce to a single operation after fiddling with the sign.

I did assume that the actual sign bit fiddling had finished, so that
FSUB is two terms with opposite sign, and FADD means equal sign.

Assuming default rounding and FADD, the larger value wins unless the
smaller is equal or exactly 1 below, in which case we'll round up.

With ceil() logic the smaller value plays the role of the sticky bit, so
any non-zero value can modify (i.e. increment) the result, while floor()
is truncate unless both inputs are equal.

For FSUB the logic is similar but opposite, so I do agree that a logic
table based on a single comparison of the two inputs and the rounding
mode is sufficient: No need for any tables at all. :-)
>
> I'm still not sure if a 32kB table is the best choice for FADD, tho since
> with the low-precision of FP8, I'd expect that of the 32k different
> possible cases, the overwhelming majority of cases fall into the
> situation where x + y ≃ max(x, y) because the smaller of the two is
> smaller than the relative error of the larger (and I think we can find
> those cases cheaply by subtracting the two exponents and checking if
> its magnitude is larger than the threshold).

Yeah, it is only when you have at least two mantissa bits (one of them
hidden) that you need more complicated logic and a lookup table becomes
viable.
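
For the pure-log, base-2 case discussed above, the whole thing fits in a few lines. This is a sketch only: the struct layout is made up (it is not a packed FP8 encoding), zero is carried as a separate flag, the 7-bit exponent range is not clamped, and "default rounding" is taken to mean round-to-nearest in the log domain.

```c
#include <assert.h>
#include <stdbool.h>

/* Value is (-1)^sign * 2^exp, or zero when is_zero is set.
 * Illustrative layout, not a real FP8 format. */
struct lns {
    bool is_zero;
    bool sign;
    int  exp;
};

/* FMUL: add the exponents, xor the signs, handle zero separately. */
static struct lns lns_mul(struct lns a, struct lns b)
{
    struct lns r = { a.is_zero || b.is_zero, a.sign != b.sign, a.exp + b.exp };
    if (r.is_zero) {       /* canonical +0; a choice made for this sketch */
        r.sign = false;
        r.exp = 0;
    }
    return r;
}

/* FADD of two same-sign values: the larger operand wins unless the smaller
 * is equal or exactly one below, in which case the result rounds up.
 * 2^a + 2^a = 2^(a+1) exactly, and log2(1.5) is about 0.585, above 0.5. */
static struct lns lns_add_same_sign(struct lns a, struct lns b)
{
    assert(a.is_zero || b.is_zero || a.sign == b.sign);
    if (a.is_zero) return b;
    if (b.is_zero) return a;

    struct lns hi = (a.exp >= b.exp) ? a : b;
    int diff = (a.exp >= b.exp) ? a.exp - b.exp : b.exp - a.exp;
    if (diff <= 1)
        hi.exp += 1;
    return hi;
}
```

A difference of 2 already gives log2(1.25), about 0.32, which rounds down, so a single comparison of the two exponents decides the result.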

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: The old RISC-vs-CISC

<s7fsr9$1j6h$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=16643&group=comp.arch#16643

Path: i2pn2.org!i2pn.org!aioe.org!/FKOcGQMirZgkZJCo9x3IA.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: The old RISC-vs-CISC
Date: Wed, 12 May 2021 08:33:11 +0200
Organization: Aioe.org NNTP Server
Lines: 50
Message-ID: <s7fsr9$1j6h$1@gioia.aioe.org>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me>
<6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me>
<77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
<s747f6$5ri$1@dont-email.me>
<0a355bf9-7067-4076-9365-de1c63061df1n@googlegroups.com>
<s74rp7$laf$1@dont-email.me> <s78dnf$hpi$1@dont-email.me>
<f656541b-ffc7-407e-8f5f-c7ca93477f98n@googlegroups.com>
<s79khu$u4v$1@dont-email.me>
<55ccad4d-7603-415e-9e37-c6db84901e33n@googlegroups.com>
<s7bt9m$iim$1@dont-email.me>
<62b81886-bf04-4a4c-a89c-aacedba9a1f5n@googlegroups.com>
<f475d3b6-dc42-4063-9d3f-4f38e3de4e37n@googlegroups.com>
<e1d470b1-b2d7-43df-af69-e6f6c9aff210n@googlegroups.com>
<s7ccdn$28b$1@dont-email.me> <s7decb$1c7j$1@gioia.aioe.org>
<7d9da68e-5aae-4619-9593-b7496f2c7725n@googlegroups.com>
NNTP-Posting-Host: /FKOcGQMirZgkZJCo9x3IA.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Wed, 12 May 2021 06:33 UTC

MitchAlsup wrote:
> On Tuesday, May 11, 2021 at 3:14:10 AM UTC-5, Terje Mathisen wrote:
>> Ivan Godard wrote:
>>> On 5/10/2021 2:11 PM, MitchAlsup wrote:
>>>> On Monday, May 10, 2021 at 3:53:45 PM UTC-5, robf...@gmail.com wrote:
>>>>> ANY1 supports a 21 bit branch displacement for branching +/-1MB. It
>>>>> may also store the return address in the target register allowing
>>>>> conditional branch to subroutine.
>>>> <
>>>> How often do you find conditional branching to a subroutine to be
>>>> effective ?
>>>
>>> Very frequent in Mill code, due to predication of calls when EBBs are
>>> folded together to let both then and else run interleaved. Probably rare
>>> otherwise. Consequently asking "how often do you see?" is not very
>>> informative unless also asking whether the compiler takes advantage of
>>> the conditional.
>> A conditional function call would be very useful whenever you write any
>> code that processes a stream of data, using a local buffer which you have
>> to refill whenever you reach the low-water mark.
>>
>> I.e.
>>
>> if (amount < low_limit) fill_buffer()
>>
>> This happens a lot in most stream/file library code, as well as when
>> processing bitstreams for a codec.
> <
> But does the code not look more like::
>
> if( buf->amount < low_limit ) buf->fill_pointer = fill_buffer( buf->file, buf->fill_pointer );

Not really, when I've written code where these details matter, I am
typically writing asm, and the entire context is already implied, i.e.
in fixed register positions.

It has more in common with a sw trap to fixup a COW page access, i.e.
the function is normally very seldom called, but when it happens, it has
to either know up front or figure out on the fly what the environment is
and what to do.

On either My 66000 or Mill function calls are so cheap that inlining of
even small functions is less interesting: Saving code space becomes
relatively more important.
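
In C, the low-water-mark pattern discussed above might look like the sketch below. All names and sizes are invented for illustration, and an in-memory array stands in for the file. The point is that the fast path is one compare plus a rarely taken call, and the callee picks up its whole context from the stream state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 64, LOW_LIMIT = 8 };   /* arbitrary sizes for the sketch */

struct stream {
    const unsigned char *src;    /* backing data, standing in for a file */
    size_t src_len, src_pos;
    unsigned char buf[BUF_SIZE];
    size_t head, amount;         /* read index and bytes available in buf */
    unsigned refills;            /* counts slow-path calls */
};

/* Slow path: move the unread tail to the front and top up from the source. */
static void fill_buffer(struct stream *s)
{
    assert(s->head + s->amount <= BUF_SIZE);
    memmove(s->buf, s->buf + s->head, s->amount);
    s->head = 0;

    size_t room = BUF_SIZE - s->amount;
    size_t left = s->src_len - s->src_pos;
    size_t n = room < left ? room : left;
    memcpy(s->buf + s->amount, s->src + s->src_pos, n);
    s->src_pos += n;
    s->amount += n;
    s->refills++;
}

/* Fast path: returns the next byte, or -1 once everything is consumed. */
static int next_byte(struct stream *s)
{
    if (s->amount < LOW_LIMIT && s->src_pos < s->src_len)
        fill_buffer(s);          /* the conditional call */
    if (s->amount == 0)
        return -1;
    s->amount--;
    return s->buf[s->head++];
}
```

With 200 bytes of input and these sizes the slow path runs only four times, which is the "normally very seldom called" behaviour described above.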

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: FP8 (was Compact representation for common integer constants)

<s7g088$d28$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=16644&group=comp.arch#16644

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-2ea4-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Wed, 12 May 2021 07:31:20 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <s7g088$d28$1@newsreader4.netcologne.de>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s73rcr$83m$1@dont-email.me> <s73u8m$tnj$1@dont-email.me>
<s75pe2$shi$1@dont-email.me> <s76as3$i7j$1@gal.iecc.com>
<jwv7dk93zcy.fsf-monnier+comp.arch@gnu.org> <s787j3$99m$1@dont-email.me>
<jwv1rafyd1g.fsf-monnier+comp.arch@gnu.org>
<jwvv97rwy3y.fsf-monnier+comp.arch@gnu.org>
<48015cc8-9326-427a-9fdd-36ed1e12939an@googlegroups.com>
<s7ai2u$ul4$1@gioia.aioe.org> <jwvo8dhpd74.fsf-monnier+comp.arch@gnu.org>
<s7fsgs$1fag$1@gioia.aioe.org>
Injection-Date: Wed, 12 May 2021 07:31:20 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-2ea4-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:2ea4:0:7285:c2ff:fe6c:992d";
logging-data="13384"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Wed, 12 May 2021 07:31 UTC

Terje Mathisen <terje.mathisen@tmsw.no> schrieb:

> OK, for a pure log with no mantissa (or an implied mantissa of 1), I
> agree that FDIV/FMUL deserves to be implemented as integer sub/add. :-)
>
> You do need to detect zero inputs and handle that separately.

Unless you handle zero as a special number, which might make sense.

Re: FP8 (was Compact representation for common integer constants)

<s7ge9j$1i2d$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=16645&group=comp.arch#16645

Path: i2pn2.org!i2pn.org!aioe.org!/FKOcGQMirZgkZJCo9x3IA.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Wed, 12 May 2021 13:31:00 +0200
Organization: Aioe.org NNTP Server
Lines: 19
Message-ID: <s7ge9j$1i2d$1@gioia.aioe.org>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s73rcr$83m$1@dont-email.me> <s73u8m$tnj$1@dont-email.me>
<s75pe2$shi$1@dont-email.me> <s76as3$i7j$1@gal.iecc.com>
<jwv7dk93zcy.fsf-monnier+comp.arch@gnu.org> <s787j3$99m$1@dont-email.me>
<jwv1rafyd1g.fsf-monnier+comp.arch@gnu.org>
<jwvv97rwy3y.fsf-monnier+comp.arch@gnu.org>
<48015cc8-9326-427a-9fdd-36ed1e12939an@googlegroups.com>
<s7ai2u$ul4$1@gioia.aioe.org> <jwvo8dhpd74.fsf-monnier+comp.arch@gnu.org>
<s7fsgs$1fag$1@gioia.aioe.org> <s7g088$d28$1@newsreader4.netcologne.de>
NNTP-Posting-Host: /FKOcGQMirZgkZJCo9x3IA.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Wed, 12 May 2021 11:31 UTC

Thomas Koenig wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
>
>> OK, for a pure log with no mantissa (or an implied mantissa of 1), I
>> agree that FDIV/FMUL deserves to be implemented as integer sub/add. :-)
>>
>> You do need to detect zero inputs and handle that separately.
>
> Unless you handle zero as a special number, which might make sense.
>
That's actually what I do in my sw emulation, i.e. zero/inf/nan are all
special and handled in parallel with the normal inputs which can then
assume that all inputs are sane.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: FP8 (was Compact representation for common integer constants)

<jwv4kf8krbo.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=16647&group=comp.arch#16647

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Wed, 12 May 2021 09:42:38 -0400
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <jwv4kf8krbo.fsf-monnier+comp.arch@gnu.org>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s73rcr$83m$1@dont-email.me> <s73u8m$tnj$1@dont-email.me>
<s75pe2$shi$1@dont-email.me> <s76as3$i7j$1@gal.iecc.com>
<jwv7dk93zcy.fsf-monnier+comp.arch@gnu.org>
<s787j3$99m$1@dont-email.me>
<jwv1rafyd1g.fsf-monnier+comp.arch@gnu.org>
<jwvv97rwy3y.fsf-monnier+comp.arch@gnu.org>
<48015cc8-9326-427a-9fdd-36ed1e12939an@googlegroups.com>
<s7ai2u$ul4$1@gioia.aioe.org>
<jwvo8dhpd74.fsf-monnier+comp.arch@gnu.org>
<s7fsgs$1fag$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Injection-Info: reader02.eternal-september.org; posting-host="df6dbc34a84afabf6952402c5dd2d050";
logging-data="11451"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+EaUZeilq6L2zXe1/GpE8g"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Cancel-Lock: sha1:W/buYtV73IwNBPVzGxb5PrYxcZ8=
sha1:IlogjRrwBmPAH1KWsKpUxxgVxXA=
 by: Stefan Monnier - Wed, 12 May 2021 13:42 UTC

> Assuming default rounding and FADD, the larger value wins unless the smaller
> is equal or exactly 1 below, in which case we'll round up.

I think this is only true if the base of your logarithm is something
like 2 (which I expect wouldn't be the most common choice, it
corresponds to a "standard" FP but with 0 mantissa bits).

E.g. I think the moral equivalent of 2 mantissa bits would be to use
a logarithm with base 2^¼. When the logarithm's base is closer to
1 (as in this example), then you have to consider more cases.
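
To put a number on "more cases": with base 2^¼, same-sign addition needs a correction for exponent differences 0 through 13, versus only 0 and 1 for base 2. The sketch below hard-codes that correction table. The entries are round(4 * log2(1 + 2^(-d/4))) worked out by hand, and rounding to nearest in the log domain is an assumption; names are hypothetical.

```c
#include <assert.h>

/* Correction added to the larger exponent when adding b^hi + b^lo with
 * b = 2^(1/4), indexed by d = hi - lo. Precomputed offline as
 * round(4 * log2(1 + 2^(-d/4))); beyond the table the correction is 0. */
static const int corr_q4[] = { 4, 4, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1 };
enum { CORR_Q4_LEN = sizeof corr_q4 / sizeof corr_q4[0] };

/* Exponent of the rounded sum of two same-sign values with exponents
 * ea and eb (in quarter-octave units). Zero and range clamping are
 * left out of this sketch. */
static int lns_q4_add_exp(int ea, int eb)
{
    int hi = (ea >= eb) ? ea : eb;
    int d  = (ea >= eb) ? ea - eb : eb - ea;
    assert(d >= 0);
    return (d < CORR_Q4_LEN) ? hi + corr_q4[d] : hi;
}
```

For example, exponents 4 and 0 represent 2 and 1; the sum 3 has log base 2^¼ of about 6.34, so the result exponent is 6.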

Stefan

Re: FP8 (was Compact representation for common integer constants)

<2021May12.190836@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=16651&group=comp.arch#16651

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Wed, 12 May 2021 17:08:36 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 74
Message-ID: <2021May12.190836@mips.complang.tuwien.ac.at>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com> <2021May9.101917@mips.complang.tuwien.ac.at> <s7abfl$1k5m$1@gal.iecc.com> <2021May10.101544@mips.complang.tuwien.ac.at> <s7brr7$1da5$1@gal.iecc.com>
Injection-Info: reader02.eternal-september.org; posting-host="113c21cb184098292786791c2c20a4b3";
logging-data="29867"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX184i9jCJE/EhnCKfakjYpkt"
Cancel-Lock: sha1:R5k49U6mScIk07MFXvDTH7wZ9dI=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Wed, 12 May 2021 17:08 UTC

John Levine <johnl@taugh.com> writes:
>According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>>>On small systems, I suppose. Floating point was standard or at least a
>>>widely provided option on large computers by the early 1960s.
>>
>>But it still was much slower than integer arithmetic on most machines.
>
>I happen to have a 360/67 manual here. It and the similar /65 were
>workhorse mainframes in the early 1970s.
>
>Memory to register integer add took 1.4us, floating took 2.43 short, 2.45 double
>Integer multiply was 4.8us, 4.4 short (faster than integer), 7.6 double
>Integer divide was 8.7us, float 7.3, double 14.10

Not bad.

>I have manuals for the 386 and 486, where float arithmetic was also
>about half the speed of fixed, even though integer arithmetic was
>32x32 and float was all extended with 64 fraction bits. Doesn't seem
>much slower to me.

According to
<https://www2.math.uni-wuppertal.de/~fpf/Uebungen/GdR-SS02/opcode_f.html>,
8087-Pentium take the following number of cycles for FADD(P) and FMUL(P):

variations/
operand      8087          287      387    486   Pentium
fadd         70-100        70-100   23-34  8-20  3/1 FX
fadd mem32   90-120+EA     90-120   24-32  8-20  3/1 FX
fadd mem64   95-125+EA     95-125   29-37  8-20  3/1 FX
faddp        75-105        75-105   23-31  8-20  3/1 FX
fmul reg s   90-105        90-105   29-52  16    3/1 FX
fmul reg     130-145       130-145  46-57  16    3/1 FX
fmul mem32   (110-125)+EA  110-125  27-35  11    3/1 FX
fmul mem64   (154-168)+EA  154-168  32-57  14    3/1 FX
fmulp reg s  94-108        94-108   29-52  16    3/1 FX
fmulp reg    134-148       134-148  29-57  16    3/1 FX

and for comparison ADD and MUL:

instruction   bytes     8088   186  286  386    486    Pentium
add reg, reg  2         3      3    2    2      1      1 UV
add mem, reg  2+d(0,2)  24+EA  10   7    7      3      3 UV
add reg, mem  2+d(0,2)  13+EA  10   7    6      2      2 UV
mul r32       2         -      -    -    9-38   13-42  10 NP
mul mem32     2+d(0-2)  -      -    -    12-41  13-42  10 NP

So on the 486 "add reg, reg" is 8-20 times faster than "faddp", but
fmul is as fast or faster than mul (and imul has the same cycle counts
as mul). On the Pentium FMUL is >3 times as fast as MUL, and
pipelined (i.e., 1 fmul/cycle can be started, and I don't think that
MUL/IMUL can).

>If you can write your code using the same number of fixed instructions
>as floats, sure, it'll be faster, but if you need to add extra code for
>explicit scaling, I doubt it'll really be faster.

If the code does many adds, fixed point will be faster (no scaling
needed if all the summands and the result have the same scale), if it
does primarily muls, floating will be faster even on the 386, 486, and
Pentium, because MUL is slow, and because scaling is needed (but
scaling is ideally a shift).

>On the other hand, if you had a 286 or 386 with no 287 or 387 and
>were simulating floating point in software, *that* was slow.

On the 8087 and 287, floating point is also slow; but given the cost
of synthesized 32x32-bit multiplication on the 8086 and 80286 you
again have the same balance as above.
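
To illustrate the point about scaling: in a fixed-point format, an add of two values at the same scale is a plain integer add, while a multiply produces a result at twice the scale and has to be shifted back. The Q16.16 format and the names below are chosen only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t q16_16;              /* 16 integer bits, 16 fraction bits */
#define Q_ONE ((q16_16)1 << 16)      /* the value 1.0 in Q16.16 */

/* Operands and result share one scale, so no adjustment is needed. */
static q16_16 q_add(q16_16 a, q16_16 b)
{
    return a + b;
}

/* The raw 64-bit product carries 32 fraction bits; one shift rescales it.
 * Right-shifting a negative value is implementation-defined in C, though
 * it is an arithmetic shift on the usual compilers. */
static q16_16 q_mul(q16_16 a, q16_16 b)
{
    int64_t wide = (int64_t)a * b;
    int64_t scaled = wide >> 16;
    assert(scaled >= INT32_MIN && scaled <= INT32_MAX);
    return (q16_16)scaled;
}
```

So the extra cost of fixed-point multiplication is the widening multiply plus the shift, which is where the slow integer MUL on the 386 and 486 hurts.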

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<s7h6os$vj2$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16653&group=comp.arch#16653

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common
integer constants)
Date: Wed, 12 May 2021 20:28:44 +0200
Organization: A noiseless patient Spider
Lines: 40
Message-ID: <s7h6os$vj2$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me>
<6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me>
<77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
<s747f6$5ri$1@dont-email.me>
<0a355bf9-7067-4076-9365-de1c63061df1n@googlegroups.com>
<s74rp7$laf$1@dont-email.me> <s78dnf$hpi$1@dont-email.me>
<f656541b-ffc7-407e-8f5f-c7ca93477f98n@googlegroups.com>
<s79khu$u4v$1@dont-email.me>
<55ccad4d-7603-415e-9e37-c6db84901e33n@googlegroups.com>
<s7bt9m$iim$1@dont-email.me>
<62b81886-bf04-4a4c-a89c-aacedba9a1f5n@googlegroups.com>
<f475d3b6-dc42-4063-9d3f-4f38e3de4e37n@googlegroups.com>
<e1d470b1-b2d7-43df-af69-e6f6c9aff210n@googlegroups.com>
<s7eep9$no2$1@dont-email.me> <s7eoml$qc7$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 12 May 2021 18:28:45 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="ecac275f4481e53944ff7b6ed3b8b822";
logging-data="32354"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+xyKVDBw4tD2BhHInbHpt7028BnWHe46c="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.8.1
Cancel-Lock: sha1:Y+KgEypLU8piinI82ZAqwndoya4=
In-Reply-To: <s7eoml$qc7$1@dont-email.me>
Content-Language: en-US
 by: Marcus - Wed, 12 May 2021 18:28 UTC

On 2021-05-11, Ivan Godard wrote:
> On 5/11/2021 10:27 AM, Marcus wrote:
>> On 2021-05-10, MitchAlsup wrote:
>>> On Monday, May 10, 2021 at 3:53:45 PM UTC-5, robf...@gmail.com wrote:
>>>> ANY1 supports a 21 bit branch displacement for branching +/-1MB. It
>>>> may also store the return address in the target register allowing
>>>> conditional branch to subroutine.
>>> <
>>> How often do you find conditional branching to a subroutine to be
>>> effective ?
>>
>> I had conditional branch-and-link in my ISA at one point (probably
>> inspired by POWER or some such ISA), but then I realized that it
>> actually implied a conditional move to the link register, which
>> was something I wanted to avoid (plus the instructions ate up valuable
>> instruction encoding space) - so I removed those instructions.
>
> Doesn't have to require conditional move.
>
> In a belt machine, a call must update the belt in the same way whether
> taken or not, so Mill conditional calls drop null results if the call is
> untaken. Because of the availability of NaR as a metadata marker for
> invalid/missing data, the nulls are NaRs and so will poison any
> subsequent actual use of the value, but that's a QOI feature independent
> of the conditional call feature.
>
> In your case, you could have the link register unconditionally
> clobbered, either by the return address if taken, or a null if untaken.
> Or, if the return address can be meaningfully used whether taken or not,
> set the register unconditionally to what would be the return address if
> it had been taken. An untaken conditional call need not be totally free
> of side effects; it just must not enter the called address.

True, I did not think of that at the time. Today I don't think I'll
reintroduce those instructions since I'd have to rearrange instruction
encoding etc (and I think the ISA is in a pretty good place right now),
and honestly I don't know how much of an impact conditional
branch-and-link would have for my ISA.

/Marcus

Re: Compact representation for common integer constants

<s7h9kc$q16$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16655&group=comp.arch#16655

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Wed, 12 May 2021 14:16:21 -0500
Organization: A noiseless patient Spider
Lines: 156
Message-ID: <s7h9kc$q16$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me>
<f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me>
<9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 12 May 2021 19:17:32 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="79fb1c82076aad0ee72c7a1579bb5e55";
logging-data="26662"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/IkwDfyRGxF3EGgZ1wQGl4"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.1
Cancel-Lock: sha1:dPh03BaKxhjf98PZN3nNfe/eyC0=
In-Reply-To: <s7bo65$9gq$1@dont-email.me>
Content-Language: en-US
X-Mozilla-News-Host: news://news.albasani.net
 by: BGB - Wed, 12 May 2021 19:16 UTC

On 5/10/2021 11:49 AM, Ivan Godard wrote:
> On 5/10/2021 9:22 AM, MitchAlsup wrote:
>> On Sunday, May 9, 2021 at 10:22:49 PM UTC-5, BGB wrote:
>>> On 5/9/2021 4:28 AM, Thomas Koenig wrote:
>>>> BGB <cr8...@gmail.com> schrieb:
>>>>
>>>>> IMUL (Lane 1)
>>>>> 32*32 -> 64
>>>>
>>>> Do you have two instructions (one for signed and one for unsigned)
>>>> or three (one for the lower half, one for signed high, one for
>>>> unsigned high)? The latter version could save you some ALU
>>>> complexity and some latency in the (probably common) case where
>>>> only a 32*32 multiplication is needed, at the cost of added
>>>> instructions for the 32*32-> 64 bit case.
>>>>
>>> There are actually more cases:
>>> MULS: 32*32->32, Result is sign-extended from low 32 bits
>> IMUL
>>> MULU: 32*32->32, Result is zero-extended from low 32 bits
>> UMUL
>>> DMULS: 32*32->64, 64-bit signed result
>> CARRY; IMUL
>>> DMULU: 32*32->64, 64-bit unsigned result
>> CARRY; UMUL
>>>
>>> The former give the typical behaviors one expects in C, the latter gives
>>> the widened results.
>>>
>>> These exist as 3R forms, so:
>>> DMULU R4, R5, R7 // R7 = R4 * R5
>> <
>> All mine are 2-operand 1-result
>>>
>>>
>>> Originally, there were also multiply ops which multiplied two inputs and
>>> then stored a pair of results in R0 and R1, more like the original
>>> SuperH multiply ops, but I dropped these for various reasons.
>> <
>> Consumes way more OpCode space than is useful
>>>
>>> There are cases where DMACS or DMACU instructions could be useful:
>>> DMACU R4, R5, R7 // R7 = R4 * R5 + R7
>> <
>> IMAC and UMAC
>>>
>>> But, I don't currently have these.
>>>
>>>
>>> Eg (64-bit signed multiply):
>>> SHADQ R4, -32, R6 | SHADQ R5, -32, R7 //1c
>>> DMULS R4, R7, R16 //1c
>>> DMACS R5, R6, R16 //3c (2c penalty)
>>> SHADQ R16, 32, R2 //3c (2c penalty)
>>> DMACU R4, R5, R2 //1c
>>> RTS
>>>
>>> Though, while fewer instructions than the current form, the above
>>> construction would still be pretty bad in terms of interlock penalties.
>>>
>>> SHADQ R4, -32, R6 | SHADQ R5, -32, R7 //1c
>>> DMULS R5, R6, R16 //1c
>>> DMULS R4, R7, R17 //1c
>>> DMULU R4, R5, R18 //1c
>>> ADD R16, R17, R19 //2c (1c penalty, DMULS R17)
>>> SHADQ R19, 32, R2 //2c (1c penalty, ADD R19)
>>> ADD R18, R2 //1c
>>> RTS
>>>
>>> Both cases would have approximately the same clock-cycle count (assuming
>>> both cases have a 3-cycle latency).
>> <
>> Which is why I used CARRY; xMUL
>>>
>>> ( Where recently, I have gotten around to modifying things such that the
>>> multiplier is now fully pipelined... )
>>>
>>>
>>>
>>> Otherwise, my time recently has mostly been being consumed by
>>> debugging...
>> <
>> Sherlock we know you well...........
>>>
>>> Then trying to see if I can get stuff to pass timing at 75MHz again
>>> (hopefully without wrecking stuff quite as bad this time). This
>>> sub-effort also revealed a few bugs (*), though there are still some
>>> bugs I have yet to resolve...
>>>
>>> *: Eg, after boosting the core to 75MHz while leaving the MMIO bus at
>>> 50MHz, stuff was breaking in simulation due to the L2 Ringbus <->
>>> MMIO-Bus bridge not waiting for the MMIO-Bus to return to a READY state
>>> before returning back to an idle state.
>>>
>>> It was then possible for the response to travel the rings and get back
>>> to the L1, which then allows execution to continue, with the CPU core
>>> then issuing another MMIO request, which then travels along the rings
>>> back to the MMIO bridge, in less time than it took for the 'OK -> READY'
>>> transition to happen on the MMIO bus...
>>>
>>> The way the bridge was designed, it would then try to initiate a
>>> request, see that the MMIO-Bus state was 'OK', and use whatever result
>>> was present (losing the request or returning garbage).
>>>
>>> This may have been happening at 50MHz as well, and could have possibly
>>> been leading to some of the bugs I had seen.
>>>
>>>
>>> Or such...
>
> Do you have any saturating multiplies? Why or why not?

For BJX2?
Not currently.

There are neither saturating multiplies nor saturated ADD/SUB.

Granted, these could be useful for Fixed-point SIMD, but I was mostly
being careful with value ranges and arithmetic to limit the likelihood
of overflow/underflow.

As for why not:
These add a bit of cost and complexity.

For example, one goes from needing ops such as:
PADD.W, PMULU.W, PMULS.W, ...
To:
PADDS.W, PADDU.W, PADDSS.W, PADDUS.W,
PMULS.W, PMULU.W, PMULSS.W, PMULUS.W
...

For vector cases, it is usually possible to keep a certain amount of a
"safety zone" at the top and bottom of the range such that overflow is
unlikely, and writing calculations such that they are more likely to
undershoot than overshoot, ...

There are some "packed-compare" and "packed-select" ops which can also
help here.

For non-SIMD cases, clamping can be done like:
CMPGT R4, 0
MOV?T 0, R4
CMPGT R4, 255
MOV?F 255, R4

I had at one point considered specialized range-clamping ops, like:
CLAMPU.B R4, R4 //clamp to Unsigned Byte range
CLAMPS.S R4, R4 //clamp to Signed Short range

But, never added them...

Mostly it is an issue of not being common enough (or expensive enough in
the naive case) to justify the added cost.
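
As a rough illustration of the non-SIMD clamping sequence above, here is a minimal C sketch. The function names clamp_u8 and clamp_s16 are made up for this example; clamp_s16 is only a guess at what a CLAMPS.S-style op would compute, since that instruction was never added.

```c
#include <stdint.h>

/* Clamp a signed value to the unsigned-byte range [0, 255].
 * Each 'if' plays the role of one compare / conditional-move pair. */
int64_t clamp_u8(int64_t v)
{
    if (v < 0)   v = 0;    /* lower bound */
    if (v > 255) v = 255;  /* upper bound */
    return v;
}

/* Hypothetical counterpart for the signed 16-bit range. */
int64_t clamp_s16(int64_t v)
{
    if (v < INT16_MIN) v = INT16_MIN;
    if (v > INT16_MAX) v = INT16_MAX;
    return v;
}
```

On a machine with conditional moves, a compiler would likely turn each 'if' into a compare plus predicated move, which is about the same cost as the hand-written sequence.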

Re: Compact representation for common integer constants

<1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16656&group=comp.arch#16656

Newsgroups: comp.arch
X-Received: by 2002:ae9:e518:: with SMTP id w24mr19952001qkf.490.1620847877085; Wed, 12 May 2021 12:31:17 -0700 (PDT)
X-Received: by 2002:a9d:6743:: with SMTP id w3mr18380929otm.82.1620847876400; Wed, 12 May 2021 12:31:16 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!tr2.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 12 May 2021 12:31:16 -0700 (PDT)
In-Reply-To: <s7h9kc$q16$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com> <s6udkp$hs5$1@dont-email.me> <6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com> <s74akj$siq$1@dont-email.me> <f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com> <s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de> <s7a8u7$mui$1@dont-email.me> <9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com> <s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
Subject: Re: Compact representation for common integer constants
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 12 May 2021 19:31:17 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 191
 by: MitchAlsup - Wed, 12 May 2021 19:31 UTC

On Wednesday, May 12, 2021 at 2:17:33 PM UTC-5, BGB wrote:
> On 5/10/2021 11:49 AM, Ivan Godard wrote:
> > On 5/10/2021 9:22 AM, MitchAlsup wrote:
> >> On Sunday, May 9, 2021 at 10:22:49 PM UTC-5, BGB wrote:
> >>> On 5/9/2021 4:28 AM, Thomas Koenig wrote:
> >>>> BGB <cr8...@gmail.com> schrieb:
> >>>>
> >>>>> IMUL (Lane 1)
> >>>>> 32*32 -> 64
> >>>>
> >>>> Do you have two instructions (one for signed and one for unsigned)
> >>>> or three (one for the lower half, one for signed high, one for
> >>>> unsigned high)? The latter version could save you some ALU
> >>>> complexity and some latency in the (probably common) case where
> >>>> only a 32*32 multiplication is needed, at the cost of added
> >>>> instructions for the 32*32-> 64 bit case.
> >>>>
> >>> There are actually more cases:
> >>> MULS: 32*32->32, Result is sign-extended from low 32 bits
> >> IMUL
> >>> MULU: 32*32->32, Result is zero-extended from low 32 bits
> >> UMUL
> >>> DMULS: 32*32->64, 64-bit signed result
> >> CARRY; IMUL
> >>> DMULU: 32*32->64, 64-bit unsigned result
> >> CARRY; UMUL
> >>>
> >>> The former give the typical behaviors one expects in C, the latter gives
> >>> the widened results.
> >>>
> >>> These exist as 3R forms, so:
> >>> DMULU R4, R5, R7 // R7 = R4 * R5
> >> <
> >> All mine are 2-operand 1-result
> >>>
> >>>
> >>> Originally, there were also multiply ops which multiplied two inputs and
> >>> then stored a pair of results in R0 and R1, more like the original
> >>> SuperH multiply ops, but I dropped these for various reasons.
> >> <
> >> Consumes way more OpCode space than is useful
> >>>
> >>> There are cases where DMACS or DMACU instructions could be useful:
> >>> DMACU R4, R5, R7 // R7 = R4 * R5 + R7
> >> <
> >> IMAC and UMAC
> >>>
> >>> But, I don't currently have these.
> >>>
> >>>
> >>> Eg (64-bit signed multiply):
> >>> SHADQ R4, -32, R6 | SHADQ R5, -32, R7 //1c
> >>> DMULS R4, R7, R16 //1c
> >>> DMACS R5, R6, R16 //3c (2c penalty)
> >>> SHADQ R16, 32, R2 //3c (2c penalty)
> >>> DMACU R4, R5, R2 //1c
> >>> RTS
> >>>
> >>> Though, while fewer instructions than the current form, the above
> >>> construction would still be pretty bad in terms of interlock penalties.
> >>>
> >>> SHADQ R4, -32, R6 | SHADQ R5, -32, R7 //1c
> >>> DMULS R5, R6, R16 //1c
> >>> DMULS R4, R7, R17 //1c
> >>> DMULU R4, R5, R18 //1c
> >>> ADD R16, R17, R19 //2c (1c penalty, DMULS R17)
> >>> SHADQ R19, 32, R2 //2c (1c penalty, ADD R19)
> >>> ADD R18, R2 //1c
> >>> RTS
> >>>
> >>> Both cases would have approximately the same clock-cycle count (assuming
> >>> both cases have a 3-cycle latency).
> >> <
> >> Which is why I used CARRY; xMUL
> >>>
> >>> ( Where recently, I have gotten around to modifying things such that the
> >>> multiplier is now fully pipelined... )
> >>>
> >>>
> >>>
> >>> Otherwise, my time recently has mostly been being consumed by
> >>> debugging...
> >> <
> >> Sherlock we know you well...........
> >>>
> >>> Then trying to see if I can get stuff to pass timing at 75MHz again
> >>> (hopefully without wrecking stuff quite as bad this time). This
> >>> sub-effort also revealed a few bugs (*), though there are still some
> >>> bugs I have yet to resolve...
> >>>
> >>> *: Eg, after boosting the core to 75MHz while leaving the MMIO bus at
> >>> 50MHz, stuff was breaking in simulation due to the L2 Ringbus <->
> >>> MMIO-Bus bridge not waiting for the MMIO-Bus to return to a READY state
> >>> before returning back to an idle state.
> >>>
> >>> It was then possible for the response to travel the rings and get back
> >>> to the L1, which then allows execution to continue, with the CPU core
> >>> then issuing another MMIO request, which then travels along the rings
> >>> back to the MMIO bridge, in less time than it took for the 'OK -> READY'
> >>> transition to happen on the MMIO bus...
> >>>
> >>> The way the bridge was designed, it would then try to initiate a
> >>> request, see that the MMIO-Bus state was 'OK', and use whatever result
> >>> was present (losing the request or returning garbage).
> >>>
> >>> This may have been happening at 50MHz as well, and could have possibly
> >>> been leading to some of the bugs I had seen.
> >>>
> >>>
> >>> Or such...
> >
> > Do you have any saturating multiplies? Why or why not?
>
> For BJX2?
> Not currently.
>
> There are neither saturating multiplies nor saturated ADD/SUB.
>
> Granted, these could be useful for Fixed-point SIMD, but I was mostly
> being careful with value ranges and arithmetic to limit the likelihood
> of overflow/underflow.
>
>
> As for why not:
> These add a bit of cost and complexity.
>
> Say, one goes from needing ops for, say:
> PADD.W, PMULU.W, PMULS.W, ...
> To:
> PADDS.W, PADDU.W, PADDSS.W, PADDUS.W,
> PMULS.W, PMULU.W, PMULSS.W, PMULUS.W
> ...
>
> For vector cases, it is usually possible to keep a certain amount of a
> "safety zone" at the top and bottom of the range such that overflow is
> unlikely, and writing calculations such that they are more likely to
> undershoot than overshoot, ...
>
> There are some "packed-compare" and "packed-select" ops which can also
> help here.
>
>
> For non-SIMD cases, clamping can be done like:
> CMPGT R4, 0
> MOV?T 0, R4
> CMPGT R4, 255
> MOV?F 255, R4
>
> I had at one point considered specialized range-clamping ops, like:
> CLAMPU.B R4, R4 //clamp to Unsigned Byte range
> CLAMPS.S R4, R4 //clamp to Signed Short range
<
<
My 66000 has clamping operations to any bit width--these are a subset
of the extract instructions (which are themselves a subset of the shift instructions).
The classical cases are::
SLL R8,R8,<8:0> // unsigned char
SL R8,R8,<8:0> // signed char
<
but there are other cases
<
SLL R8,R9,<11:0> // 11-bit unsigned bit-field
SLL R8,R9,<14:13> // 14-bit field from R9<26:13>
<
These come in signed (sign extended) and unsigned (zero extended) forms.
These add exactly ZERO (nada, zilch, no) instructions to the ISA and are in fact the
basis for the shift instructions with the simple rule of::
width==0 -> width "really =" 64
>
> But, never added them...
<
I never needed to............
>
> Mostly it is an issue of not being common enough (or expensive enough in
> the naive case) to justify the added cost.
<
When you start using LLVM as a front end, it emits these clamps any time
a value in a register gets arithmetically manipulated--preventing the register from
ever holding a value outside of the type-defined value space.
<
unsigned char i = 0;
........
i++;
<
gets compiled to::
<
MOV R19,#0
........
ADD R19,R19,#1
SLL R19,R19,<8:0>
<
Yes it is ugly......but people should not ask for small types unless they need the small type.
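
For readers unfamiliar with the <width:offset> notation, the extract-as-clamp idea above can be sketched in C roughly as follows. This is only a model of the semantics as described in the post, not My 66000 code; the names extract_u and extract_s and the handling of out-of-range arguments are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned extract: take 'width' bits starting at bit 'offset' and
 * zero-extend. width == 0 is treated as 64, per the rule quoted above. */
uint64_t extract_u(uint64_t v, unsigned width, unsigned offset)
{
    assert(offset < 64);                  /* assumption: offset is 0..63 */
    if (width == 0 || width > 64)
        width = 64;
    v >>= offset;
    if (width == 64)
        return v;                         /* nothing left to mask */
    return v & ((UINT64_C(1) << width) - 1);
}

/* Signed extract: same field, but sign-extended from its top bit.
 * The final cast relies on the usual two's-complement conversion. */
int64_t extract_s(uint64_t v, unsigned width, unsigned offset)
{
    uint64_t f = extract_u(v, width, offset);
    if (width == 0 || width >= 64)
        return (int64_t)f;
    uint64_t sign = UINT64_C(1) << (width - 1);
    return (int64_t)((f ^ sign) - sign);  /* classic sign-extend trick */
}
```

With this model, the unsigned char increment shown above is just an add followed by extract_u(x, 8, 0), so 255 + 1 comes back as 0.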

Clamping. was: Compact representation for common integer constants

<s7hbo3$48e$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16658&group=comp.arch#16658

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Clamping. was: Compact representation for common integer constants
Date: Wed, 12 May 2021 12:53:39 -0700
Organization: A noiseless patient Spider
Lines: 53
Message-ID: <s7hbo3$48e$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me>
<f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me>
<9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me>
<1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 12 May 2021 19:53:39 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="2a93c134b2eab2a6d6731853ca662d4f";
logging-data="4366"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18BDeQeZfWiHFUZ10GLQvBA"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1:5o5iWcfS5DqJY4YswWa2Ucai2uQ=
In-Reply-To: <1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Wed, 12 May 2021 19:53 UTC

On 5/12/2021 12:31 PM, MitchAlsup wrote:

> My 66000 has clamping operations to any bit width--these are a subset
> of the extract instructions (which is itself a subset of the shift instruction)
> The classical cases are::
> SLL R8,R8,<8:0> // unsigned char
> SL R8,R8,<8:0> // signed char
> <
> but there are other cases
> <
> SLL R8,R9,<11:0> // 11-bit unsigned bit-field
> SLL R8,R9,<14:13> // 14-bit field from R9<26:13>
> <
> These come in signed (sign extended) and unsigned (zero extended) forms.
> These add exactly ZERO (nada, zilch, no) instructions to the ISA and are in fact the
> basis for the shift instructions with the simple rule of::
> width==0 -> width "really =" 64

But that calls for full width multiply and then a clamp, which is two
instructions' entropy and the dataflow delay, and doesn't actually clamp
the multiply - consider a signed multiply that overflows (unclamped),
and the low order looks to the clamp as a value of the opposite sign.

Or have I misunderstood?

>>
>> But, never added them...
> <
> I never needed to............
>>
>> Mostly it is an issue of not being common enough (or expensive enough in
>> the naive case) to justify the added cost.
> <
> When you start using LLVM as a front end, it throws these clamps out any time
> a value in a register gets arithmetically manipulated--preventing the register from
> ever having a value outside of the type defined value-space.
> <
> unsigned char i = 0;
> .......
> i++;
> <
> gets compiled to::
> <
> MOV R19,#0
> .......
> ADD R19,R19,#1
> SLL R19,R19,<8:0>
> <
> Yes it is ugly......but people should not ask for small types unless they need the small type.
>

Bit field select, signed and unsigned, is useful, but it's not the same
as saturating arithmetic.

Re: arithmetic fast and slow, was FP8 (was Compact representation for common integer constants)

<s7hdel$sp1$2@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=16660&group=comp.arch#16660

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.cmpublishers.com!adore2!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: arithmetic fast and slow, was FP8 (was Compact representation for common integer constants)
Date: Wed, 12 May 2021 20:22:45 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <s7hdel$sp1$2@gal.iecc.com>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com> <2021May10.101544@mips.complang.tuwien.ac.at> <s7brr7$1da5$1@gal.iecc.com> <2021May12.190836@mips.complang.tuwien.ac.at>
Injection-Date: Wed, 12 May 2021 20:22:45 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="29473"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com> <2021May10.101544@mips.complang.tuwien.ac.at> <s7brr7$1da5$1@gal.iecc.com> <2021May12.190836@mips.complang.tuwien.ac.at>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Wed, 12 May 2021 20:22 UTC

According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>So on the 486 "add reg, reg" is 8-20 times faster than "faddp", but
>fmul is as fast or faster than mul (and imul has the same cycle counts
>as mul). On the Pentium FMUL is >3 times as fast as MUL, and
>pipelined (i.e., 1 fmul/cycle can be started, and I don't think that
>MUL/IMUL can).

I looked at my 486 manual and that seems right. The 486 float and
integer units ran independently so if you could interleave instructions
a floating add was only 7 cycles average rather than 10. But it's
still a lot slower than integer if 32 bits is enough. Shifts were 3 or
4 cycles so it didn't take a lot of aligning to tip the balance.

Don't have a Pentium manual handy although I expect bitsavers has one.

It does make the assertion that he rewrote everything in fixed point to
make it faster seem a bit naive.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Clamping. was: Compact representation for common integer constants

<80d87cdf-a4dd-4d09-b34e-53db5cf449f4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16663&group=comp.arch#16663

Newsgroups: comp.arch
X-Received: by 2002:a05:6214:20e7:: with SMTP id 7mr37098159qvk.36.1620854855053;
Wed, 12 May 2021 14:27:35 -0700 (PDT)
X-Received: by 2002:aca:2107:: with SMTP id 7mr413187oiz.110.1620854854792;
Wed, 12 May 2021 14:27:34 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 12 May 2021 14:27:34 -0700 (PDT)
In-Reply-To: <s7hbo3$48e$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me> <6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me> <f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me> <9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me> <1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7hbo3$48e$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <80d87cdf-a4dd-4d09-b34e-53db5cf449f4n@googlegroups.com>
Subject: Re: Clamping. was: Compact representation for common integer constants
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 12 May 2021 21:27:35 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Wed, 12 May 2021 21:27 UTC

On Wednesday, May 12, 2021 at 2:53:41 PM UTC-5, Ivan Godard wrote:
> On 5/12/2021 12:31 PM, MitchAlsup wrote:
>
> > My 66000 has clamping operations to any bit width--these are a subset
> > of the extract instructions (which is itself a subset of the shift instruction)
> > The classical cases are::
> > SLL R8,R8,<8:0> // unsigned char
> > SL R8,R8,<8:0> // signed char
> > <
> > but there are other cases
> > <
> > SLL R8,R9,<11:0> // 11-bit unsigned bit-field
> > SLL R8,R9,<14:13> // 14-bit field from R9<26:13>
> > <
> > These come in signed (sign extended) and unsigned (zero extended) forms.
> > These add exactly ZERO (nada, zilch, no) instructions to the ISA and are in fact the
> > basis for the shift instructions with the simple rule of::
> > width==0 -> width "really =" 64
> But that calls for full width multiply and then a clamp, which is two
> instructions' entropy and the dataflow delay, and doesn't actually clamp
> the multiply - consider a signed multiply that overflows (unclamped),
> and the low order looks to the clamp as a value of the opposite sign.
>
> Or have I misunderstood?
<
Clamp means to prevent values from being outside of a range;
saturate means to prevent a calculation from delivering a value that is out of range.
So::
uint8_t x = (int64_t)y;
needs to have the assignment clamped so x has no significance beyond 8 bits,
whereas::
saturated uint8_t x,y,z;
x = y+z;
needs to have the addition saturate.
> >>
> >> But, never added them...
> > <
> > I never needed to............
> >>
> >> Mostly it is an issue of not being common enough (or expensive enough in
> >> the naive case) to justify the added cost.
> > <
> > When you start using LLVM as a front end, it throws these clamps out any time
> > a value in a register gets arithmetically manipulated--preventing the register from
> > ever having a value outside of the type defined value-space.
> > <
> > unsigned char i = 0;
> > .......
> > i++;
> > <
> > gets compiled to::
> > <
> > MOV R19,#0
> > .......
> > ADD R19,R19,#1
> > SLL R19,R19,<8:0>
> > <
> > Yes it is ugly......but people should not ask for small types unless they need the small type.
> >
> Bit field select, signed and unsigned, is useful, but it's not the same
> as saturating arithmetic.
<
Was not implying that it was or is. One has to do with preventing a value container
from containing a value outside of the allotted range; the other has to do with
preventing a calculation from delivering a value that is out of range.
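
To make the distinction concrete, here is a small C sketch. Standard C has no 'saturated' type qualifier, so the saturating add is written out by hand; assign_u8 and add_sat_u8 are names invented for this example.

```c
#include <stdint.h>

/* "Clamp" in the sense used above: the container keeps only 8 bits.
 * Conversion to an unsigned type is defined as reduction modulo 256. */
uint8_t assign_u8(int64_t y)
{
    return (uint8_t)y;
}

/* "Saturate": the calculation itself is pinned to the type's range. */
uint8_t add_sat_u8(uint8_t y, uint8_t z)
{
    unsigned sum = (unsigned)y + (unsigned)z;   /* cannot overflow */
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}
```

For y = 200 and z = 100, plain 8-bit arithmetic yields 44 (300 modulo 256), while the saturating add yields 255.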

Re: Clamping. was: Compact representation for common integer constants

<s7hh95$hap$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16664&group=comp.arch#16664

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Clamping. was: Compact representation for common integer
constants
Date: Wed, 12 May 2021 16:26:54 -0500
Organization: A noiseless patient Spider
Lines: 84
Message-ID: <s7hh95$hap$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me>
<f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me>
<9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me>
<1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7hbo3$48e$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 12 May 2021 21:28:06 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="79fb1c82076aad0ee72c7a1579bb5e55";
logging-data="17753"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/d1jO/8JG821bRiQrylHyN"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.1
Cancel-Lock: sha1:8wUcnOG0kou7mamQmnwBpJ6slf4=
In-Reply-To: <s7hbo3$48e$1@dont-email.me>
Content-Language: en-US
 by: BGB - Wed, 12 May 2021 21:26 UTC

On 5/12/2021 2:53 PM, Ivan Godard wrote:
> On 5/12/2021 12:31 PM, MitchAlsup wrote:
>
>> My 66000 has clamping operations to any bit width--these are a subset
>> of the extract instructions (which is itself a subset of the shift
>> instruction)
>> The classical cases are::
>>            SLL        R8,R8,<8:0>           // unsigned char
>>            SL          R8,R8,<8:0>           // signed char
>> <
>> but there are other cases
>> <
>>            SLL        R8,R9,<11:0>         // 11-bit unsigned bit-field
>>            SLL        R8,R9,<14:13>       // 14-bit field from R9<26:13>
>> <
>> These come in signed (sign extended) and unsigned (zero extended) forms.
>> These add exactly ZERO (nada, zilch, no) instructions to the ISA and are in
>> fact the
>> basis for the shift instructions with the simple rule of::
>>   width==0 -> width "really =" 64
>
> But that calls for full width multiply and then a clamp, which is two
> instructions' entropy and the dataflow delay, and doesn't actually clamp
> the multiply - consider a signed multiply that overflows (unclamped),
> and the low order looks to the clamp as a value of the opposite sign.
>
> Or have I misunderstood?
>

Luckily, widening multiply can't overflow...

So, say:
DMULS R4, R5, R2
MOV 2147483647, R6
MOV -2147483648, R7
CMPQGT R6, R2
MOV?T R6, R2
CMPQGT R7, R2
MOV?F R7, R2

But, one doesn't usually clamp each operation, but will rather do a more
complex operation, and then clamp the result.

But, then one is back to making sure that it can't overflow in any
intermediate calculations.

>>>
>>> But, never added them...
>> <
>> I never needed to............
>>>
>>> Mostly it is an issue of not being common enough (or expensive enough in
>>> the naive case) to justify the added cost.
>> <
>> When you start using LLVM as a front end, it throws these clamps out
>> any time
>> a value in a register gets arithmetically manipulated--preventing the
>> register from
>> ever having a value outside of the type defined value-space.
>> <
>> unsigned char i = 0;
>> .......
>> i++;
>> <
>> gets compiled to::
>> <
>>            MOV       R19,#0
>> .......
>>            ADD        R19,R19,#1
>>            SLL         R19,R19,<8:0>
>> <
>> Yes it is ugly......but people should not ask for small types unless
>> they need the small type.
>>
>
> Bit field select, signed and unsigned, is useful, but it's not the same
> as saturating arithmetic.

Yes.
There are built in instructions for sign and zero extension.

But, if (i==255), having i++ wrap to 0 is a very different behavior than
if 'i++' stays at 255.
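
The widening-multiply-then-clamp sequence above corresponds roughly to the following C sketch. The name mul_sat_s32 is made up for this example, and the code assumes only that a 32*32->64 signed multiply is available, as with DMULS.

```c
#include <stdint.h>

/* Saturating 32-bit signed multiply built from a widening multiply.
 * The 64-bit product of two 32-bit values cannot overflow, so the
 * range check on it is always meaningful. */
int32_t mul_sat_s32(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;   /* 32*32 -> 64 */
    if (p > INT32_MAX) return INT32_MAX;   /* clamp high */
    if (p < INT32_MIN) return INT32_MIN;   /* clamp low  */
    return (int32_t)p;
}
```

Clamping only the low 32 bits of the product would not work here, which is the point raised earlier in the thread: after an overflow, the low half can look like an in-range value of the wrong sign.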

Re: Clamping. was: Compact representation for common integer constants

<s7hoku$qd7$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16676&group=comp.arch#16676

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Clamping. was: Compact representation for common integer
constants
Date: Wed, 12 May 2021 16:33:50 -0700
Organization: A noiseless patient Spider
Lines: 51
Message-ID: <s7hoku$qd7$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me>
<f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me>
<9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me>
<1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7hbo3$48e$1@dont-email.me> <s7hh95$hap$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 12 May 2021 23:33:50 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="06fcc820a36a296d009e14bf251c0493";
logging-data="27047"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18BC5xdZn4ww1ZTFhRyH/a6"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1:EarSxJ2NDL33Zg2JPeQxHsa/kL4=
In-Reply-To: <s7hh95$hap$1@dont-email.me>
Content-Language: en-US
 by: Ivan Godard - Wed, 12 May 2021 23:33 UTC

On 5/12/2021 2:26 PM, BGB wrote:
> On 5/12/2021 2:53 PM, Ivan Godard wrote:
>> On 5/12/2021 12:31 PM, MitchAlsup wrote:
>>
>>> My 66000 has clamping operations to any bit width--these are a subset
>>> of the extract instructions (which is itself a subset of the shift
>>> instruction)
>>> The classical cases are::
>>>            SLL        R8,R8,<8:0>           // unsigned char
>>>            SL          R8,R8,<8:0>           // signed char
>>> <
>>> but there are other cases
>>> <
>>>            SLL        R8,R9,<11:0>         // 11-bit unsigned bit-field
>>>            SLL        R8,R9,<14:13>       // 14-bit field from R9<26:13>
>>> <
>>> These come in signed (sign extended) and unsigned (zero extended) forms.
>>> These add exactly ZERO (nada, zilch, no) instructions to the ISA and are in
>>> fact the
>>> basis for the shift instructions with the simple rule of::
>>>   width==0 -> width "really =" 64
>>
>> But that calls for full width multiply and then a clamp, which is two
>> instructions' entropy and the dataflow delay, and doesn't actually
>> clamp the multiply - consider a signed multiply that overflows
>> (unclamped), and the low order looks to the clamp as a value of the
>> opposite sign.
>>
>> Or have I misunderstood?
>>
>
> Luckily, widening multiply can't overflow...
>
> So, say:
>  DMULS   R4, R5, R2
>  MOV      2147483647, R6
>  MOV     -2147483648, R7
>  CMPQGT  R6, R2
>  MOV?T   R6, R2
>  CMPQGT  R7, R2
>  MOV?F   R7, R2
>
> But, one doesn't usually clamp each operation, but will rather do a more
> complex operation, and then clamp the result.
>
> But, then one is back to making sure that it can't overflow in any
> intermediate calculations.

The problem is that there is always a width for which widening multiply
is not available, but saturating is possible. If you have wXw->d then
you're happy until you need to do dXd->sat(d).
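
To make the dXd->sat(d) case concrete, here is a hedged C sketch for the widest native width, where no wider product type is assumed to exist. It relies on `__builtin_mul_overflow`, which is a GCC/Clang extension rather than standard C; the function name `mul_sat_i64` is hypothetical.

```c
#include <stdint.h>

/* Saturating 64x64 -> 64 signed multiply without a wider type.
   __builtin_mul_overflow (GCC/Clang extension) reports whether the
   exact product fits in the result type. */
static int64_t mul_sat_i64(int64_t a, int64_t b)
{
    int64_t r;
    if (!__builtin_mul_overflow(a, b, &r))
        return r;                              /* product fits */
    /* On overflow the true product's sign follows the operand signs. */
    return ((a < 0) != (b < 0)) ? INT64_MIN : INT64_MAX;
}
```

The saturation direction cannot be read from the wrapped low-order bits, which is why the operand signs are used instead.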

Re: Clamping. was: Compact representation for common integer constants

<s7hovl$jd$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16677&group=comp.arch#16677

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Clamping. was: Compact representation for common integer
constants
Date: Wed, 12 May 2021 16:39:33 -0700
Organization: A noiseless patient Spider
Lines: 77
Message-ID: <s7hovl$jd$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me>
<f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me>
<9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me>
<1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7hbo3$48e$1@dont-email.me>
<80d87cdf-a4dd-4d09-b34e-53db5cf449f4n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 12 May 2021 23:39:33 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="06fcc820a36a296d009e14bf251c0493";
logging-data="621"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19uVkI/ZJpTrlciqs+lrwZQ"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1:5E7liiqc6pCD6BAHi+NYri6LFa8=
In-Reply-To: <80d87cdf-a4dd-4d09-b34e-53db5cf449f4n@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Wed, 12 May 2021 23:39 UTC

On 5/12/2021 2:27 PM, MitchAlsup wrote:
> On Wednesday, May 12, 2021 at 2:53:41 PM UTC-5, Ivan Godard wrote:
>> On 5/12/2021 12:31 PM, MitchAlsup wrote:
>>
>>> My 66000 has clamping operations to any bit width--these are a subset
>>> of the extract instructions (which is itself a subset of the shift instruction)
>>> The classical cases are::
>>> SLL R8,R8,<8:0> // unsigned char
>>> SL R8,R8,<8:0> // signed char
>>> <
>>> but there are other cases
>>> <
>>> SLL R8,R9,<11:0> // 11-bit unsigned bit-field
>>> SLL R8,R9,<14:13> // 14-bit field from R9<26:13>
>>> <
>>> These come in signed (sign extended) and unsigned (zero extended) forms.
>>> These add exactly ZERO (nada, zilch, no) instructions to the ISA and are in fact the
>>> basis for the shift instructions with the simple rule of::
>>> width==0 -> width "really =" 64
>> But that calls for full width multiply and then a clamp, which is two
>> instructions' entropy and the dataflow delay, and doesn't actually clamp
>> the multiply - consider a signed multiply that overflows (unclamped),
>> and the low order looks to the clamp as a value of the opposite sign.
>>
>> Or have I misunderstood?
> <
> clamp means to prevent values from being outside of a range
> saturate means to prevent a calculation from delivering a value that is out of range.
> So::
> uint8_t x = (int64_t)y
> needs to have the assignment clamped so x has no significance beyond 8-bits.
> whereas::
> saturated uint8_t x,y,z;
> x = y+z;
> needs to have the addition saturate.

True; sorry for sloppy terminology.
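
The distinction drawn in the quoted text can be illustrated in C, which has no `saturated` qualifier, so the saturating add is written out by hand. The names `narrow_u8` and `add_sat_u8` are hypothetical; this is a sketch of the two behaviours, not of any compiler's output.

```c
#include <stdint.h>

/* "Clamp" in the sense used above: the assignment keeps only the low
   8 bits, so the container cannot hold anything outside 0..255. */
static uint8_t narrow_u8(int64_t y)
{
    return (uint8_t)y;                 /* conversion is modulo 256 */
}

/* "Saturate": the calculation itself pins an out-of-range sum to the
   largest representable value instead of wrapping. */
static uint8_t add_sat_u8(uint8_t y, uint8_t z)
{
    unsigned sum = (unsigned)y + (unsigned)z;
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}
```

For example, `narrow_u8(300)` yields 44 while `add_sat_u8(200, 100)` yields 255.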

>>>>
>>>> But, never added them...
>>> <
>>> I never needed to............
>>>>
>>>> Mostly it is an issue of not being common enough (or expensive enough in
>>>> the naive case) to justify the added cost.
>>> <
>>> When you start using LLVM as a front end, it throws these clamps out any time
>>> a value in a register gets arithmetically manipulated--preventing the register from
>>> ever having a value outside of the type defined value-space.
>>> <
>>> unsigned char i = 0;
>>> .......
>>> i++;
>>> <
>>> gets compiled to::
>>> <
>>> MOV R19,#0
>>> .......
>>> ADD R19,R19,#1
>>> SLL R19,R19,<8:0>
>>> <
>>> Yes it is ugly......but people should not ask for small types unless they need the small type.
>>>
>> Bit field select, signed and unsigned, is useful, but it's not the same
>> as saturating arithmetic.
> <
> Was not implying that it was or is. One has to do with preventing a value container
> from containing a value outside of the allotted range, the other has to do with
> preventing a calculation from delivering a value that is out of range.
>

So do you have a cheap way to do a mul and trap if the result doesn't
fit in 11 bits?

How about ensuring that it is within [-3..542]? That's needed for Ada.

Just asking, not challenging; both are problems I've wrestled with in Mill.

Re: Clamping. was: Compact representation for common integer constants

<6037be6e-a545-4e45-9a95-776c96a8f39cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16680&group=comp.arch#16680

X-Received: by 2002:a05:620a:126d:: with SMTP id b13mr14979766qkl.436.1620865317744;
Wed, 12 May 2021 17:21:57 -0700 (PDT)
X-Received: by 2002:a9d:2de1:: with SMTP id g88mr34279220otb.5.1620865317523;
Wed, 12 May 2021 17:21:57 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 12 May 2021 17:21:57 -0700 (PDT)
In-Reply-To: <s7hoku$qd7$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me> <6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me> <f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me> <9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me> <1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7hbo3$48e$1@dont-email.me> <s7hh95$hap$1@dont-email.me> <s7hoku$qd7$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6037be6e-a545-4e45-9a95-776c96a8f39cn@googlegroups.com>
Subject: Re: Clamping. was: Compact representation for common integer constants
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 13 May 2021 00:21:57 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 53
 by: MitchAlsup - Thu, 13 May 2021 00:21 UTC

On Wednesday, May 12, 2021 at 6:33:51 PM UTC-5, Ivan Godard wrote:
> On 5/12/2021 2:26 PM, BGB wrote:
> > On 5/12/2021 2:53 PM, Ivan Godard wrote:
> >> On 5/12/2021 12:31 PM, MitchAlsup wrote:
> >>
> >>> My 66000 has clamping operations to any bit width--these are a subset
> >>> of the extract instructions (which is itself a subset of the shift
> >>> instruction)
> >>> The classical cases are::
> >>> SLL R8,R8,<8:0> // unsigned char
> >>> SL R8,R8,<8:0> // signed char
> >>> <
> >>> but there are other cases
> >>> <
> >>> SLL R8,R9,<11:0> // 11-bit unsigned bit-field
> >>> SLL R8,R9,<14:13> // 14-bit field from R9<26:13>
> >>> <
> >>> These come in signed (sign extended) and unsigned (zero extended) forms.
> >>> These add exactly ZERO (nada, zilch, no) instructions to the ISA and are in
> >>> fact the
> >>> basis for the shift instructions with the simple rule of::
> >>> width==0 -> width "really =" 64
> >>
> >> But that calls for full width multiply and then a clamp, which is two
> >> instructions' entropy and the dataflow delay, and doesn't actually
> >> clamp the multiply - consider a signed multiply that overflows
> >> (unclamped), and the low order looks to the clamp as a value of the
> >> opposite sign.
> >>
> >> Or have I misunderstood?
> >>
> >
> > Luckily, widening multiply can't overflow...
> >
> > So, say:
> > DMULS R4, R5, R2
> > MOV 2147483647, R6
> > MOV -2147483648, R7
> > CMPQGT R6, R2
> > MOV?T R6, R2
> > CMPQGT R7, R2
> > MOV?F R7, R2
> >
> > But, one doesn't usually clamp each operation, but will rather do a more
> > complex operation, and then clamp the result.
> >
> > But, then one is back to making sure that it can't overflow in any
> > intermediate calculations.
> The problem is that there is always a width for which widening multiply
> is not available, but saturating is possible. If you have wXw->d then
> you're happy until you need to do dXd->sat(d).
<
Big Multiply is an easy problem in My 66000 using CARRY twice, once for
the double width multiply and the second for the double width add.
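
As a rough illustration of what a double-width multiply has to compute, here is a portable C sketch of a 64x64 -> 128 unsigned product built from 32-bit partial products, followed by a saturating use of it. This does not model the My 66000 CARRY mechanism; the names `mul_64x64_128` and `mul_sat_u64` are hypothetical.

```c
#include <stdint.h>

/* Full 128-bit product of two 64-bit unsigned values, returned as
   high and low 64-bit halves. */
static void mul_64x64_128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;         /* weight 2^0  */
    uint64_t p1 = a_lo * b_hi;         /* weight 2^32 */
    uint64_t p2 = a_hi * b_lo;         /* weight 2^32 */
    uint64_t p3 = a_hi * b_hi;         /* weight 2^64 */

    /* Sum of the three terms at weight 2^32; always below 3 * 2^32. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *lo = (mid << 32) | (uint32_t)p0;
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* Unsigned saturating multiply: any nonzero high half means overflow. */
static uint64_t mul_sat_u64(uint64_t a, uint64_t b)
{
    uint64_t hi, lo;
    mul_64x64_128(a, b, &hi, &lo);
    return hi != 0 ? UINT64_MAX : lo;
}
```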

Re: Clamping. was: Compact representation for common integer constants

<566943a2-e384-4ca9-9ff5-a8ef7c86dd8dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16681&group=comp.arch#16681

X-Received: by 2002:ac8:570a:: with SMTP id 10mr9193705qtw.360.1620865323010;
Wed, 12 May 2021 17:22:03 -0700 (PDT)
X-Received: by 2002:a9d:3623:: with SMTP id w32mr32023506otb.16.1620865322790;
Wed, 12 May 2021 17:22:02 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!usenet.pasdenom.info!usenet-fr.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 12 May 2021 17:22:02 -0700 (PDT)
In-Reply-To: <s7hovl$jd$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me> <6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me> <f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me> <9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me> <1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7hbo3$48e$1@dont-email.me> <80d87cdf-a4dd-4d09-b34e-53db5cf449f4n@googlegroups.com>
<s7hovl$jd$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <566943a2-e384-4ca9-9ff5-a8ef7c86dd8dn@googlegroups.com>
Subject: Re: Clamping. was: Compact representation for common integer constants
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 13 May 2021 00:22:03 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Thu, 13 May 2021 00:22 UTC

On Wednesday, May 12, 2021 at 6:39:34 PM UTC-5, Ivan Godard wrote:
> On 5/12/2021 2:27 PM, MitchAlsup wrote:
> > On Wednesday, May 12, 2021 at 2:53:41 PM UTC-5, Ivan Godard wrote:
> >> On 5/12/2021 12:31 PM, MitchAlsup wrote:
> >>
> >>> My 66000 has clamping operations to any bit width--these are a subset
> >>> of the extract instructions (which is itself a subset of the shift instruction)
> >>> The classical cases are::
> >>> SLL R8,R8,<8:0> // unsigned char
> >>> SL R8,R8,<8:0> // signed char
> >>> <
> >>> but there are other cases
> >>> <
> >>> SLL R8,R9,<11:0> // 11-bit unsigned bit-field
> >>> SLL R8,R9,<14:13> // 14-bit field from R9<26:13>
> >>> <
> >>> These come in signed (sign extended) and unsigned (zero extended) forms.
> >>> These add exactly ZERO (nada, zilch, no) instructions to the ISA and are in fact the
> >>> basis for the shift instructions with the simple rule of::
> >>> width==0 -> width "really =" 64
> >> But that calls for full width multiply and then a clamp, which is two
> >> instructions' entropy and the dataflow delay, and doesn't actually clamp
> >> the multiply - consider a signed multiply that overflows (unclamped),
> >> and the low order looks to the clamp as a value of the opposite sign.
> >>
> >> Or have I misunderstood?
> > <
> > clamp means to prevent values from being outside of a range
> > saturate means to prevent a calculation from delivering a value that is out of range.
> > So::
> > uint8_t x = (int64_t)y
> > needs to have the assignment clamped so x has no significance beyond 8-bits.
> > whereas::
> > saturated uint8_t x,y,z;
> > x = y+z;
> > needs to have the addition saturate.
> True; sorry for sloppy terminology.
> >>>>
> >>>> But, never added them...
> >>> <
> >>> I never needed to............
> >>>>
> >>>> Mostly it is an issue of not being common enough (or expensive enough in
> >>>> the naive case) to justify the added cost.
> >>> <
> >>> When you start using LLVM as a front end, it throws these clamps out any time
> >>> a value in a register gets arithmetically manipulated--preventing the register from
> >>> ever having a value outside of the type defined value-space.
> >>> <
> >>> unsigned char i = 0;
> >>> .......
> >>> i++;
> >>> <
> >>> gets compiled to::
> >>> <
> >>> MOV R19,#0
> >>> .......
> >>> ADD R19,R19,#1
> >>> SLL R19,R19,<8:0>
> >>> <
> >>> Yes it is ugly......but people should not ask for small types unless they need the small type.
> >>>
> >> Bit field select, signed and unsigned, is useful, but it's not the same
> >> as saturating arithmetic.
> > <
> > Was not implying that it was or is. One has to do with preventing a value container
> > from containing a value outside of the allotted range, the other has to do with
> > preventing a calculation from delivering a value that is out of range.
> >
> So do you have a cheap way to do a mul and trap if the result doesn't
> fit in 11 bits?
<
unsigned:
     MUL     R8,R13,R19
     CMP     R9,R8,#1<<11
     PCIN    R9,<1,1>
     TRAP    #where_ever
signed:
     IMUL    R8,R13,R19
     ADD     R9,R8,#1<<10    // bias the product in R8 so the valid range starts at 0
     CMP     R9,R9,#1<<11
     PCIN    R9,<1,1>
     TRAP    #where_ever
<
Not excessive, but not great, either.
<
But this is one of the reasons the CMP instruction performs the "within" boundary checks,
which come in 4 flavors:
     CIN     0<=x<max      // "C" IN
     FIN     0<x<=max      // FORTRAN IN
     RIN     0<=x<=max     // Really IN
     AIN     0<x<max       // Absolutely IN
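
Written as C predicates, the four flavors listed above differ only in which bounds are inclusive. The function names are hypothetical and the sketch assumes signed 64-bit operands; it says nothing about how CMP encodes its result.

```c
#include <stdbool.h>
#include <stdint.h>

/* CIN: 0 <= x < max, the usual C array-index check. */
static bool within_cin(int64_t x, int64_t max) { return 0 <= x && x <  max; }

/* FIN: 0 < x <= max, a 1-based (FORTRAN-style) index check. */
static bool within_fin(int64_t x, int64_t max) { return 0 <  x && x <= max; }

/* RIN: 0 <= x <= max, both bounds inclusive. */
static bool within_rin(int64_t x, int64_t max) { return 0 <= x && x <= max; }

/* AIN: 0 < x < max, both bounds exclusive. */
static bool within_ain(int64_t x, int64_t max) { return 0 <  x && x <  max; }
```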
>
> How about to ensure that it is within [-3..542]? That's needed for Ada
<
Change ADD #1<<10 to ADD #3
Change CMP to #542+3
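
The bias-then-compare trick behind both answers can be sketched in C: subtracting the lower bound in unsigned arithmetic turns a two-sided range test into one unsigned compare. The names `in_range`, `fits_signed_11` and `mul_in_range` are hypothetical, and returning `false` stands in for the trap.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when lo <= x <= hi (requires lo <= hi). Values below lo wrap to
   large unsigned numbers, so a single compare covers both bounds. */
static bool in_range(int64_t x, int64_t lo, int64_t hi)
{
    return (uint64_t)x - (uint64_t)lo <= (uint64_t)hi - (uint64_t)lo;
}

/* Signed 11-bit check: bias by 1<<10, then compare against 1<<11. */
static bool fits_signed_11(int64_t x)
{
    return (uint64_t)x + (UINT64_C(1) << 10) < (UINT64_C(1) << 11);
}

/* Multiply, then report whether the product lies in [lo..hi].
   32-bit operands are assumed so the 64-bit product cannot overflow. */
static bool mul_in_range(int32_t a, int32_t b, int64_t lo, int64_t hi,
                         int64_t *out)
{
    int64_t product = (int64_t)a * (int64_t)b;
    if (!in_range(product, lo, hi))
        return false;                  /* caller would trap here */
    *out = product;
    return true;
}
```

For [-3..542] this is a bias of 3 and an inclusive compare against 545, matching the two changes listed above.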
>
> Just asking, not challenging; both are problems I've wrestled with in Mill.
