devel / comp.arch / Re: Compact representation for common integer constants

Subject  Author
* Compact representation for common integer constants  JohnG
+* Re: Compact representation for common integer constants  Ivan Godard
|+- Re: Compact representation for common integer constants  David Brown
|`* Re: Compact representation for common integer constants  JohnG
| `* Re: Compact representation for common integer constants  BGB
|  `* Re: Compact representation for common integer constants  MitchAlsup
|   `* Re: Compact representation for common integer constants  BGB
|    `* Re: Compact representation for common integer constants  Thomas Koenig
|     +- Re: Compact representation for common integer constants  MitchAlsup
|     `* Re: Compact representation for common integer constants  BGB
|      `* Re: Compact representation for common integer constants  MitchAlsup
|       `* Re: Compact representation for common integer constants  Ivan Godard
|        +- Re: Compact representation for common integer constants  Marcus
|        +* Re: Compact representation for common integer constants  BGB
|        |`* Re: Compact representation for common integer constants  MitchAlsup
|        | +* Clamping. was: Compact representation for common integer constants  Ivan Godard
|        | |+* Re: Clamping. was: Compact representation for common integer constants  MitchAlsup
|        | ||`* Re: Clamping. was: Compact representation for common integer  Ivan Godard
|        | || `- Re: Clamping. was: Compact representation for common integer constants  MitchAlsup
|        | |`* Re: Clamping. was: Compact representation for common integer  BGB
|        | | `* Re: Clamping. was: Compact representation for common integer  Ivan Godard
|        | |  `- Re: Clamping. was: Compact representation for common integer constants  MitchAlsup
|        | +* Re: Compact representation for common integer constants  Marcus
|        | |`* Re: Compact representation for common integer constants  MitchAlsup
|        | | `* Re: Compact representation for common integer constants  David Brown
|        | |  `* Re: Compact representation for common integer constants  MitchAlsup
|        | |   +- Re: Compact representation for common integer constants  Thomas Koenig
|        | |   `* Re: Compact representation for common integer constants  David Brown
|        | |    `- Re: Compact representation for common integer constants  MitchAlsup
|        | `* Re: Compact representation for common integer constants  Thomas Koenig
|        |  +- Re: Compact representation for common integer constants  Anton Ertl
|        |  `* Re: Compact representation for common integer constants  MitchAlsup
|        |   `* Re: Compact representation for common integer constants  Thomas Koenig
|        |    +* Re: Compact representation for common integer constants  Anton Ertl
|        |    |`* Re: Compact representation for common integer constants  Brian G. Lucas
|        |    | +* Re: Compact representation for common integer constants  Thomas Koenig
|        |    | |`- Re: Compact representation for common integer constants  Brian G. Lucas
|        |    | +- Re: Compact representation for common integer constants  Stefan Monnier
|        |    | `* Re: Compact representation for common integer constants  Anton Ertl
|        |    |  `* Re: Compact representation for common integer constants  Thomas Koenig
|        |    |   +* Re: Compact representation for common integer constants  Anton Ertl
|        |    |   |`* Re: Compact representation for common integer constants  Thomas Koenig
|        |    |   | `- Re: Compact representation for common integer constants  Anton Ertl
|        |    |   `* Re: Compact representation for common integer constants  Terje Mathisen
|        |    |    `- Re: Compact representation for common integer constants  Anton Ertl
|        |    `* Re: Compact representation for common integer constants  MitchAlsup
|        |     `* Re: Compact representation for common integer constants  Thomas Koenig
|        |      `* Re: Compact representation for common integer constants  Brian G. Lucas
|        |       `* Re: Compact representation for common integer constants  Thomas Koenig
|        |        +* Re: Compact representation for common integer constants  MitchAlsup
|        |        |`- Re: Compact representation for common integer constants  Thomas Koenig
|        |        +* Re: Compact representation for common integer constants  Anton Ertl
|        |        |+* Re: Compact representation for common integer constants  Thomas Koenig
|        |        ||+* Re: Compact representation for common integer constants  MitchAlsup
|        |        |||`* Re: Compact representation for common integer constants  Thomas Koenig
|        |        ||| `- Re: Compact representation for common integer constants  MitchAlsup
|        |        ||`* Re: Compact representation for common integer constants  Anton Ertl
|        |        || +* Re: Compact representation for common integer constants  MitchAlsup
|        |        || |+* Re: Compact representation for common integer constants  EricP
|        |        || ||+* Re: Compact representation for common integer constants  Thomas Koenig
|        |        || |||+- Re: Compact representation for common integer constants  MitchAlsup
|        |        || |||+* Re: Compact representation for common integer constants  EricP
|        |        || ||||`* Re: Compact representation for common integer constants  Terje Mathisen
|        |        || |||| `* Re: Compact representation for common integer constants  David Brown
|        |        || ||||  `* Re: Compact representation for common integer constants  Terje Mathisen
|        |        || ||||   `* Re: Compact representation for common integer constants  David Brown
|        |        || ||||    `- Re: Compact representation for common integer constants  Terje Mathisen
|        |        || |||`* Re: Compact representation for common integer constants  Stephen Fuld
|        |        || ||| `* Re: Compact representation for common integer constants  MitchAlsup
|        |        || |||  +- Re: Compact representation for common integer constants  Stephen Fuld
|        |        || |||  `* Re: Compact representation for common integer constants  Bill Findlay
|        |        || |||   `* Re: Compact representation for common integer constants  MitchAlsup
|        |        || |||    `- Re: Compact representation for common integer constants  Bill Findlay
|        |        || ||+* Re: Compact representation for common integer constants  Thomas Koenig
|        |        || |||+* Re: Compact representation for common integer constants  Stephen Fuld
|        |        || ||||`- Re: Compact representation for common integer constants  Thomas Koenig
|        |        || |||`- Re: Compact representation for common integer constants  EricP
|        |        || ||`* Re: Compact representation for common integer constants  Stephen Fuld
|        |        || || +* Re: Compact representation for common integer constants  Niklas Holsti
|        |        || || |`* Re: Compact representation for common integer constants  Stephen Fuld
|        |        || || | `* Re: Compact representation for common integer constants  MitchAlsup
|        |        || || |  `* Re: Compact representation for common integer constants  Stephen Fuld
|        |        || || |   `* Re: Compact representation for common integer constants  EricP
|        |        || || |    +* Re: Compact representation for common integer constants  MitchAlsup
|        |        || || |    |`* Re: Compact representation for common integer constants  EricP
|        |        || || |    | `* Re: Compact representation for common integer constants  MitchAlsup
|        |        || || |    |  `* Re: Compact representation for common integer constants  Thomas Koenig
|        |        || || |    |   +- Re: Compact representation for common integer constants  MitchAlsup
|        |        || || |    |   `* Re: Compact representation for common integer constants  Terje Mathisen
|        |        || || |    |    `* Re: Compact representation for common integer constants  MitchAlsup
|        |        || || |    |     +- Re: Compact representation for common integer constants  Terje Mathisen
|        |        || || |    |     `* Re: Compact representation for common integer constants  MitchAlsup
|        |        || || |    |      `- Re: Compact representation for common integer constants  Terje Mathisen
|        |        || || |    `* Re: Compact representation for common integer constants  Stephen Fuld
|        |        || || |     `* Re: Compact representation for common integer constants  MitchAlsup
|        |        || || |      +- Re: Compact representation for common integer constants  Bill Findlay
|        |        || || |      +* Re: Compact representation for common integer constants  Thomas Koenig
|        |        || || |      |+* Re: Compact representation for common integer constants  Anton Ertl
|        |        || || |      ||`* Re: Compact representation for common integer constants  Thomas Koenig
|        |        || || |      || `- Re: Compact representation for common integer constants  Anton Ertl
|        |        || || |      |`* Re: Compact representation for common integer constants  MitchAlsup
|        |        || || |      +* Re: Compact representation for common integer constants  Terje Mathisen
|        |        || || |      `* Re: Compact representation for common integer constants  Stephen Fuld
|        |        || || `* Re: Compact representation for common integer constants  EricP
|        |        || |`- Re: Compact representation for common integer constants  Anton Ertl
|        |        || `* Re: Compact representation for common integer constants  Thomas Koenig
|        |        |`* Re: Compact representation for common integer constants  MitchAlsup
|        |        `* Re: Compact representation for common integer constants  Brian G. Lucas
|        `* Re: Compact representation for common integer constants  Quadibloc
+* Re: Compact representation for common integer constants  BGB
`* Re: Compact representation for common integer constants  John Levine

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<3551b3df-dc05-4aab-b097-d3a9cfffdc41n@googlegroups.com>
https://www.novabbs.com/devel/article-flat.php?id=16520&group=comp.arch#16520
Newsgroups: comp.arch
X-Received: by 2002:ac8:6799:: with SMTP id b25mr11165012qtp.165.1620419606711;
Fri, 07 May 2021 13:33:26 -0700 (PDT)
X-Received: by 2002:aca:d493:: with SMTP id l141mr2789170oig.51.1620419606480;
Fri, 07 May 2021 13:33:26 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 7 May 2021 13:33:26 -0700 (PDT)
In-Reply-To: <2021May7.195613@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me> <6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me> <77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org> <2021May7.195613@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3551b3df-dc05-4aab-b097-d3a9cfffdc41n@googlegroups.com>
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common
integer constants)
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 07 May 2021 20:33:26 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Fri, 7 May 2021 20:33 UTC

On Friday, May 7, 2021 at 1:34:44 PM UTC-5, Anton Ertl wrote:
> Stefan Monnier <mon...@iro.umontreal.ca> writes:
> >Also I think the "classic RISC" is the result of specific conditions at
> >that point in time which resulted in a particular sweet spot in the
> >design space.
<
I might note that at the time RISC was coined, it was not possible to
fit a pipelined 32-bit data path and control unit into any of the then-
current CISC architectures. For example, the 68020 needed a 68881 for
floating point and a 68851 for the MMU, whereas the MIPS R2000 only
needed the CPU + TLB chip and the FP chip. Also at the time, CISCs
were running 1 instruction every 2 clocks, best case.
<
> There is some truth to this: For CPUs that perform well on
> general-purpose code, RISC gave a big performance advantage in the
> mid-late 1980s.
<
In large part this was due to the memory performance from large SRAM
caches (sometimes 2 caches) and to the pipelined data path.
<
> That advantage eroded as Moore's law gave more
> transistors to CISCs, allowing CISCs to implement pipelining (486
> 1989), superscalar (Pentium 1993), and OoO execution (Pentium Pro
> 1995).
<
Once CISCs got pipelines and larger caches, they did indeed catch up
quite suddenly.
>
> OTOH, the CISCs did not manage to conquer low-power space, despite
> attempts by both Intel (Bonnell, Silvermont ff.) and AMD (Bobcat ff.).
<
It is no wonder, since Qualcomm had an entry that ran in the milliwatt
region, whereas the low-power x86s were targeted at the sub-10-watt region.
<
> And they also do not compete in the low-area microcontroller market.
<
Where manufacturers sell chips at $0.10 each.
<
> AFAIK every Ryzen has several ARM-based microcontrollers inside that
> manage various aspects of its operation.
> >But design constraints are quite different now, so applying the same
> >"quantitative approach" that lead to MIPS/SPARC/... will now result in
> >something quite different. One could argue that such new ISAs
> >could be called RISC-2020 ;-)
<
> Hmm, RISC-V is pretty close to MIPS (and Patterson argues that it is
> pretty close to Berkeley RISC). Maybe the argument is: If the ISA
> does not matter much for high-performance, high-power, high-area
> cores, because we can fix it in the decoder, better optimize the ISA
> for lower-power lower-area applications, where RISC still gives
> benefits (of course, if you go for lowest area, you go for something
> like the b16-small, but apparently people are not that keen on having
> something smaller than a Cortex-M3 or so.
<
An R2000 shrunk down to 7nm is smaller than a single I/O pad on the
actual R2000.
>
> The other entry in the ISA field in recent years is Aarch64, which is
> a RISC, and in certain ways closer to MIPS/SPARC than to the original
> ARM ISA, but in others it's quite different: condition codes, double
> loads and double stores, many addressing modes.
<
I would argue that "having addressing modes" does not eliminate an ISA
from the RISC camp, as long as the address mode still takes one cycle
of address calculation; so [Rbase+Rindex<<scale+Displacement] can
remain RISC, while *p++, *--p, and **p cannot.
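
In C terms, the distinction looks roughly like this (an illustrative
sketch, with made-up function names): the first access needs only the
one-cycle base+index<<scale+displacement calculation, while the
post-increment form also produces a second result, the updated pointer.

#include <stdint.h>

/* Illustrative only: the address arithmetic implied by two common C idioms. */
uint64_t indexed(uint64_t *base, uint64_t i)
{
    return base[i + 4];     /* one result: [Rbase + Ri<<3 + 32]              */
}

uint64_t post_inc(uint64_t **pp)
{
    uint64_t v = *(*pp)++;  /* two results: the loaded value AND the updated */
    return v;               /* pointer written back through *pp              */
}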
<
Secondly, there is a lot of potential code-space savings in having
things like LM and STM (or their even more powerful cousins ENTER
and EXIT). And once you have an AGEN sequencer, why not set up the
pipeline and run it at the width of the data path? Thus, My 66000 even
contains MM (memory-to-memory move).
<
Thirdly, constants: this is where most RISCs fell on their faces, what
with instructions being used to paste 32-bit constants together, and
it gets really messy pasting 64-bit constants together. No, the proper
rule here is to expend no (nada, zero, zilch) execution time pasting
constants together. These constants need to service the needs of
integers, logicals, floating point, and global/local memory access.
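
For a sense of what "pasting" costs, here is the kind of sequence a
fixed-32-bit-instruction RISC without large immediates ends up with for
one 64-bit constant, written as C where each step stands in for roughly
one instruction (an illustrative sketch, not any specific ISA):

#include <stdint.h>

/* Building 0x123456789ABCDEF0 from 16-bit pieces; each step below is
   roughly one lui/ori/shift-style instruction on such a machine. */
uint64_t paste64(void)
{
    uint64_t r;
    r  = (uint64_t)0x1234 << 16;   /* load upper halfword        */
    r |= 0x5678;                   /* OR in the next 16 bits     */
    r <<= 16;
    r |= 0x9ABC;
    r <<= 16;
    r |= 0xDEF0;                   /* six instructions in total  */
    return r;
}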
<
Finally, I will argue that having transcendental instructions that are as
fast as an FDIV is GOOD for the ISA and certainly better than having to
call subroutines of the same functionality.
<
> And from this ISA
> camp we see the A14 Firestorm core which has a significantly higher
> IPC than Intel's and AMD's offerings, admittedly at a lower peak clock
> rate, but also at a lower power consumption; an 8-wide execution
> engine with a very deep reorder buffer (630 entries compared to 352
> for Intel's Sunny Cove and 256 for AMD's Zen3) are probably a reason
> for this, but the question remains: Was the ISA helpful (or less of a
> hindrance than AMD64) for building such a wide and deep core?
>
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<0a355bf9-7067-4076-9365-de1c63061df1n@googlegroups.com>
https://www.novabbs.com/devel/article-flat.php?id=16521&group=comp.arch#16521
Newsgroups: comp.arch
X-Received: by 2002:a37:7745:: with SMTP id s66mr11422158qkc.18.1620420014228;
Fri, 07 May 2021 13:40:14 -0700 (PDT)
X-Received: by 2002:a4a:8311:: with SMTP id f17mr5988957oog.83.1620420013994;
Fri, 07 May 2021 13:40:13 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 7 May 2021 13:40:13 -0700 (PDT)
In-Reply-To: <s747f6$5ri$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me> <6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me> <77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org> <s747f6$5ri$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <0a355bf9-7067-4076-9365-de1c63061df1n@googlegroups.com>
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common
integer constants)
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 07 May 2021 20:40:14 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Fri, 7 May 2021 20:40 UTC

On Friday, May 7, 2021 at 3:20:56 PM UTC-5, BGB wrote:
> On 5/7/2021 8:15 AM, Stefan Monnier wrote:
> >> It is possible that it could be generalized, but then the ISA would be less
> >> RISC style then it is already...
> >>
> >> Then again, I did see recently that someone had listed my ISA somewhere, but
> >> then classified it as a CISC; I don't entirely agree, but alas...
> >>
> >> Then again, many peoples' definitions of "RISC" exclude largish
> >> instruction-sets with variable-length instruction encodings, so alas.
> >>
> >> But, taken at face value, then one would also need to exclude Thumb2 and
> >> similar from the RISC category.
> >
> > I think it's better not to worry about how other people label your ISA.
> >
> > The manichean RISC-vs-CISC labeling is stupid anyway: the design space
> > has many more than 2 spots.
> >
> > Also I think the "classic RISC" is the result of specific conditions at
> > that point in time which resulted in a particular sweet spot in the
> > design space.
> >
> > But design constraints are quite different now, so applying the same
> > "quantitative approach" that lead to MIPS/SPARC/... will now result in
> > something quite different. One could argue that such new ISAs
> > could be called RISC-2020 ;-)
> >
> Yeah.
>
> Classic RISC:
> Fixed size instructions
> Typically only a single addressing mode (Reg, Disp)
> Typically only supporting aligned memory access
> Aims for fixed 1 instruction per cycle
<
Mc 88K had
fixed-size instructions
[Rbase+IMM16] and [Rbase+Rindex<<scale] address modes
Aligned memory, which has been shown to be defective
aimed at 1 IPC, but we did engineer a 6-wide OoO version

My 66000
has fixed-size instruction specifiers with 1 or 2 constants.
[Rbase+IMM16], [Rbase+Rindex<<scale], and [Rbase+Rindex<<scale+disp[32,64]] address modes
Misaligned memory model
Aimed at low burden for LBIO implementations and low burden for GBOoO implementations
Inherently parallel
Never needs a NoOp
> ...
>
> BJX2:
> Variable length
> 16/32 base
> 16/32/64/96 extended
> 48 possible, but currently unused
> Original 48-bit ops can no-longer be encoded.
> Multiple addressing modes
> Unaligned memory access (8/16/32/64)
> 64/128 bit alignment for 128-bit ops.
> Supports explicitly parallel instructions
> ...
>
>
> It has a few features in common with VLIW architectures as well, but
> differs from traditional VLIW in that these typically use a fixed-size
> bundle encoding, whereas BJX2 bundles are variable-length (composed of
> 32-bit units), with the 64 and 96 bit instruction encodings being
> effectively a special-case of the bundle encoding.
>
> There are several DSP architectures which do something similar to this:
> Xtensa, Hexagon, ...
>
>
> Though, originally, inclination toward VLIW support was based on taking
> inspiration from the TMS320C6x and IA64, which, granted, use a
> fixed-size bundle encoding.
>
> Some of my earlier ideas involved using a more traditional bundle format
> (just sort of awkwardly plonked into the code-stream), but this later
> transformed into the current approach.
>
>
> A recent experiment did also test using 24-bit instructions, but as
> noted elsewhere, while in themselves they were effective at reducing the
> number of 32-bit ops in size-optimized code, because 32-bit ops are
> already the minority in the size-optimized case, the net savings were
> fairly small (and ran into issues with the baked-in assumption that
> branch-targets are 16-bit aligned, *, ...).
>
> *: One either needs to jostle around with re-encoding the past several
> instructions to re-align the instruction stream, or insert a 24-bit NOP
> (if the former fails), or use 24-bit byte-aligned branch encodings
> (which ended up costing more than they saved vs the "reencode ops to fix
> alignment to allow for using the 16-bit branch ops" strategy).
>
>
>
> As for resource cost:
> My current BJX2 core costs ~ 4x as much as a minimalist 32-bit RISC
> style core.
>
> Or, basically, if I go for:
> Fixed-length instructions
> One instruction at a time
> 16x 32-bit GPRs
> Aligned-only memory access
> No Variable-Shift or FPU ops
> No MMU
> ...
>
> It is possible to use ~ 1/4 the LUTs of the current (full-featured) BJX2
> core. The difference isn't quite drastic enough to make a strong use
> case for the minimalist core (even as a secondary "IO controller" or
> similar).
>
> I can also get a lot of this back (eg, fitting a microcontroller subset
> of BJX2 on an XC7S25), mostly by disabling WEX and the MMU and similar.
>
> By disabling the FPU and a few other things, it is also possible to
> shoe-horn it into an XC7A15 (basically, the smallest Xilinx FPGA I can
> really manage to find on FPGA dev-boards "in the wild").
>
> This in-turn creates a disincentive to have a separate 32-bit ISA, vs
> using a slightly more restrictive subset of the 64-bit ISA.
>
>
> I had also looked briefly at trying to do a 16-bit IO controller, but
> then ran into the problem that there is basically no real good way to
> plug a 16-bit core into my existing bus architecture.
>
> And, it seems, short of targeting something like an iCE40 or similar,
> there isn't much point.
>
> Not sure how this compares with the ASIC space, but I suspect with
> modern technologies this is probably "grain of sand" territory.
>
>
> Could matter more if one is having their logic printed onto a plastic
> substrate using semiconductor inks, but given these technologies (in
> their commercial form) are managing things like Cortex-M cores, it
> probably isn't too major of an issue.
> Similarly, building such a printer would currently appear to be too
> expensive for the hobbyist space.
>
>
> ...

Re: Compact representation for common integer constants

<s74akj$siq$1@dont-email.me>
https://www.novabbs.com/devel/article-flat.php?id=16522&group=comp.arch#16522
Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Fri, 7 May 2021 16:14:51 -0500
Organization: A noiseless patient Spider
Lines: 50
Message-ID: <s74akj$siq$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 7 May 2021 21:15:00 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="e2a3090e0f7b3aa018204bf333c0c384";
logging-data="29274"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18phYgF51YQnttCQaAjyq/a"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1:tWJhx7XXoDNTZhFu3pva4hQNOVc=
In-Reply-To: <6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
Content-Language: en-US
 by: BGB - Fri, 7 May 2021 21:14 UTC

On 5/7/2021 8:18 AM, JohnG wrote:
> On Wednesday, May 5, 2021 at 8:29:32 AM UTC-7, Ivan Godard wrote:
>> You wouldn't want to actually do the implied multiplies, so this would
>> become a look-up table indexed by the byte. But if you do that then you
>> could have any frequent values in the table, not just those implied by
>> the factors. That's how Mill popCons (Popular Constants) work, with the
>> nicety that they are encoded only using otherwise wasted entropy in the
>> binary ISA.
>
> Right, but in my example, there are only 8 interesting implied multiplies and would only need an 8x9b rom or equivalent logic:
>
> [5^n 2-bits][3^n 1-bit]
> [00][0] = 9'b0_0000_0001 # 5^0 * 3^0 = 1
> [00][1] = 9'b0_0000_0011 # 5^0 * 3^1 = 3
> [01][0] = 9'b0_0000_0101 # 5^1 * 3^0 = 5
> [01][1] = 9'b0_0000_1111 # 5^1 * 3^1 = 15
> [10][0] = 9'b0_0001_1001 # 5^2 * 3^0 = 25
> [10][1] = 9'b0_0100_1011 # 5^2 * 3^1 = 75
> [11][0] = 9'b0_0111_1101 # 5^3 * 3^0 = 125
> [11][1] = 9'b1_0111_0111 # 5^3 * 3^1 = 375
>

Realize that by FPGA standards, an 8b*9b ROM, presumably looking up
18-bit values or similar, is "pretty damn massive"...

Luckily at least, most FPGAs provide DSP48 units or similar which can
manage a small multiplier, but then roughly one may have to budget for
an extra clock-cycle of latency.
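
For concreteness, the lookup JohnG's 3-bit encoding implies can be
written as a plain table plus a shift (a minimal C sketch with made-up
names; how cheap that is in a given FPGA is exactly the point under
discussion):

#include <stdint.h>

/* The 8 entries from the table above: index = [5^n 2 bits][3^n 1 bit]. */
static const uint16_t popcon_rom[8] = { 1, 3, 5, 15, 25, 75, 125, 375 };

/* A separate shift-count field supplies the power-of-two factor. */
static uint64_t decode_const(unsigned code3, unsigned shift)
{
    return (uint64_t)popcon_rom[code3 & 7] << shift;
}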

> Everything else is just as many left shifts as you want to spend bits on and could even wait until the ALU if it has the ability to shift inputs.
>
> If you have the space and time to have a table of perfect constants, then yeah, that seems like the way to go.
>
> A couple other things I think are asked in other replies that I'll put here just to have them all in one place.
>

Realize, also, that shift is "kinda expensive" in terms of resources...
One doesn't really want to assume that a shift unit is available during
the decode process. ALU/EX is another matter, though (if one is already
known to be available), albeit this limits how such a thing may be used.

This is also why many smaller ISAs tend to leave out things like
variable-shift and integer multiply...

Decided to leave it out, but one can use their imagination for how
things work on an ISA which lacks both shift instructions and multiply...

Re: Compact representation for common integer constants

<f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
https://www.novabbs.com/devel/article-flat.php?id=16523&group=comp.arch#16523
Newsgroups: comp.arch
X-Received: by 2002:a37:7745:: with SMTP id s66mr11619282qkc.18.1620422918032;
Fri, 07 May 2021 14:28:38 -0700 (PDT)
X-Received: by 2002:a05:6830:40a4:: with SMTP id x36mr6835772ott.342.1620422917736;
Fri, 07 May 2021 14:28:37 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 7 May 2021 14:28:37 -0700 (PDT)
In-Reply-To: <s74akj$siq$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me> <6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
Subject: Re: Compact representation for common integer constants
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 07 May 2021 21:28:38 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Fri, 7 May 2021 21:28 UTC

On Friday, May 7, 2021 at 4:15:02 PM UTC-5, BGB wrote:
> On 5/7/2021 8:18 AM, JohnG wrote:
> > On Wednesday, May 5, 2021 at 8:29:32 AM UTC-7, Ivan Godard wrote:
> >> You wouldn't want to actually do the implied multiplies, so this would
> >> become a look-up table indexed by the byte. But if you do that then you
> >> could have any frequent values in the table, not just those implied by
> >> the factors. That's how Mill popCons (Popular Constants) work, with the
> >> nicety that they are encoded only using otherwise wasted entropy in the
> >> binary ISA.
> >
> > Right, but in my example, there are only 8 interesting implied multiplies and would only need an 8x9b rom or equivalent logic:
> >
> > [5^n 2-bits][3^n 1-bit]
> > [00][0] = 9'b0_0000_0001 # 5^0 * 3^0 = 1
> > [00][1] = 9'b0_0000_0011 # 5^0 * 3^1 = 3
> > [01][0] = 9'b0_0000_0101 # 5^1 * 3^0 = 5
> > [01][1] = 9'b0_0000_1111 # 5^1 * 3^1 = 15
> > [10][0] = 9'b0_0001_1001 # 5^2 * 3^0 = 25
> > [10][1] = 9'b0_0100_1011 # 5^2 * 3^1 = 75
> > [11][0] = 9'b0_0111_1101 # 5^3 * 3^0 = 125
> > [11][1] = 9'b1_0111_0111 # 5^3 * 3^1 = 375
> >
> Realize that by FPGA standards, an 8b*9b ROM, presumably looking up
> 18-bit values or similar, is "pretty damn massive"...
<
But brain dead easy to verify........

On the other hand, those constants may not be used all that often.
>
> Luckily at least, most FPGAs provide DSP48 units or similar which can
> manage a small multiplier, but then roughly one may have to budget for
> an extra clock-cycle of latency.
> > Everything else is just as many left shifts as you want to spend bits on and could even wait until the ALU if it has the ability to shift inputs.
> >
> > If you have the space and time to have a table of perfect constants, then yeah, that seems like the way to go.
> >
> > A couple other things I think are asked in other replies that I'll put here just to have them all in one place.
> >
> Realize, also, that shift is "kinda expensive" in terms of resources...
> One doesn't really want to assume that a shift unit is available during
> the decode process. ALU/EX is another matter though (if it is already
> known to be available), albeit limits how such a thing may be used.
<
I did a design that was 6-wide and only had 3 shifters--which were shared
with (i.e. positioned in) the LD/ST units--whereas each of the 6 FUs could
perform an integer operation (IMUL and IDIV were in the FMAC unit).

With base-plus-scaled-index addressing, the number of shifts in typical code
was down in the 2%-5% range, so while you had to have somewhat easy
access to a shifter, in a unit that is not expected to be saturated you don't
have to over-resource shift capability.
>
> This is also why many smaller ISAs tend to leave out things like
> variable-shift and integer multiply...
<
In this day and age, that is a mistake.
>
>
> Decided to leave it out, but one can use their imagination for how
> things work on an ISA which lacks both shift instructions and multiply...

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<2021May7.233426@mips.complang.tuwien.ac.at>
https://www.novabbs.com/devel/article-flat.php?id=16524&group=comp.arch#16524
Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)
Date: Fri, 07 May 2021 21:34:26 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 32
Message-ID: <2021May7.233426@mips.complang.tuwien.ac.at>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com> <s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me> <6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com> <s71imq$3b4$1@dont-email.me> <77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com> <s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org> <2021May7.195613@mips.complang.tuwien.ac.at> <3551b3df-dc05-4aab-b097-d3a9cfffdc41n@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="6a2c35625d0ae004fed12eeed502541b";
logging-data="20336"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/xMYO1BF9n7O55Bu2huCOL"
Cancel-Lock: sha1:3lLY4Bhd3rj+hDT4R0UZMPk4JGo=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Fri, 7 May 2021 21:34 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Friday, May 7, 2021 at 1:34:44 PM UTC-5, Anton Ertl wrote:
>> The other entry in the ISA field in recent years is Aarch64, which is
>> a RISC, and in certain ways closer to MIPS/SPARC than to the original
>> ARM ISA, but in others it's quite different: condition codes, double
>> loads and double stores, many addressing modes.
><
>I would argue that "having address modes" does not eliminate an ISA
>from the RISC camp.

I agree. It contributes to Aarch64 being relatively far from the
MIPS, though.

>As long as the address mode still takes one
>cycle in address calculation so [Rbase+Rindex<<scale+Displacement]
>can remain RISC, while *p++, *--p, and **p cannot.

ARM, HPPA, Power, and Aarch64 have loads and stores with
autoincrement/decrement and are considered to be RISCs by most.

>Secondly, there is a lost of potential code space savings by having
>things like LM and STM

ARM and Power have load and store multiple instructions. The Aarch64
architects apparently decided that this is not so great, and gave the
users load and store pair instructions, which are also quite useful in
cases where I don't see an ARM compiler generate load/store-multiple.
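
In C terms, a load-pair does something like the following (an
illustrative sketch, not Aarch64's actual pseudocode):

#include <stdint.h>

/* One instruction, two destination registers, one base+offset address. */
static void load_pair(uint64_t *rt1, uint64_t *rt2,
                      const uint64_t *base, int offs_words)
{
    *rt1 = base[offs_words];        /* first 64-bit value        */
    *rt2 = base[offs_words + 1];    /* the adjacent 64-bit value */
}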

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<s74epa$9jn$1@newsreader4.netcologne.de>
https://www.novabbs.com/devel/article-flat.php?id=16525&group=comp.arch#16525
Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-6262-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common
integer constants)
Date: Fri, 7 May 2021 22:25:46 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <s74epa$9jn$1@newsreader4.netcologne.de>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me>
<6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me>
<77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
<2021May7.195613@mips.complang.tuwien.ac.at>
<3551b3df-dc05-4aab-b097-d3a9cfffdc41n@googlegroups.com>
<2021May7.233426@mips.complang.tuwien.ac.at>
Injection-Date: Fri, 7 May 2021 22:25:46 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-6262-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:6262:0:7285:c2ff:fe6c:992d";
logging-data="9847"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Fri, 7 May 2021 22:25 UTC

Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:

> ARM, HPPA, Power, and Aarch64 have loads and stores with
> autoincrement/decrement and a considered to be RISCs by most.

And at least for POWER, they are a bit different (and more
powerful):

lbzu ra,D(rb)

will, for example, load ra from rb + D and store rb + D into
rb afterwards.
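
In C-like terms, the update form packs two results into one instruction
(an illustrative sketch only, not the manual's pseudocode):

#include <stdint.h>

/* Rough rendering of "lbzu ra,D(rb)": load a byte (zero-extended) and
   write the effective address back into the base register. */
static void lbzu(uint64_t *ra, uint64_t *rb, int16_t D, const uint8_t *mem)
{
    uint64_t ea = *rb + (int64_t)D;   /* effective address      */
    *ra = mem[ea];                    /* load byte, zero-extend */
    *rb = ea;                         /* update: base <- EA     */
}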

However, it is better not to use them:

# In some implementations, the Load Algebraic and
# Load with Update instructions may have greater
# latency than other types of Load instructions.
# Moreover, Load with Update instructions may take
# longer to execute in some implementations than the
# corresponding pair of a non-update Load instruction
# and an Add instruction.

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<b6d66d81-2c1d-4ccf-ba22-27f6381659cen@googlegroups.com>
https://www.novabbs.com/devel/article-flat.php?id=16526&group=comp.arch#16526
Newsgroups: comp.arch
X-Received: by 2002:a37:910:: with SMTP id 16mr8026951qkj.497.1620426671463;
Fri, 07 May 2021 15:31:11 -0700 (PDT)
X-Received: by 2002:aca:30cc:: with SMTP id w195mr8965285oiw.78.1620426671266;
Fri, 07 May 2021 15:31:11 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 7 May 2021 15:31:10 -0700 (PDT)
In-Reply-To: <s74epa$9jn$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me> <6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me> <77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
<2021May7.195613@mips.complang.tuwien.ac.at> <3551b3df-dc05-4aab-b097-d3a9cfffdc41n@googlegroups.com>
<2021May7.233426@mips.complang.tuwien.ac.at> <s74epa$9jn$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b6d66d81-2c1d-4ccf-ba22-27f6381659cen@googlegroups.com>
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common
integer constants)
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 07 May 2021 22:31:11 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Fri, 7 May 2021 22:31 UTC

On Friday, May 7, 2021 at 5:25:48 PM UTC-5, Thomas Koenig wrote:
> Anton Ertl <an...@mips.complang.tuwien.ac.at> schrieb:
> > ARM, HPPA, Power, and Aarch64 have loads and stores with
> > autoincrement/decrement and a considered to be RISCs by most.
> And at least for POWER, they are a bit different (and more
> powerful):
>
> lbzu ra,D(rb)
>
> will, for example, load from ra from rb + D and store rb + d into
> rb afterwards.
<
I think one could argue that this is causing the instruction to have
more than one result.
>
> However, it is better not to use them:
>
> # In some implementations, the Load Algebraic and
> # Load with Update instructions may have greater
> # latency than other types of Load instructions. More-
> # over, Load with Update instructions may take lon-
> # ger to execute in some implementations than the
> # corresponding pair of a non-update Load instruc-
> # tion and an Add instruction.
<
Register write collisions cause pipeline stalling behavior.

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<2091e184-b3cf-4d54-abfe-fc9dca5ea315n@googlegroups.com>
https://www.novabbs.com/devel/article-flat.php?id=16527&group=comp.arch#16527
Newsgroups: comp.arch
X-Received: by 2002:a05:620a:b1b:: with SMTP id t27mr5463901qkg.42.1620427523492; Fri, 07 May 2021 15:45:23 -0700 (PDT)
X-Received: by 2002:aca:c64a:: with SMTP id w71mr11689672oif.44.1620427523298; Fri, 07 May 2021 15:45:23 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!newsfeed.xs4all.nl!newsfeed9.news.xs4all.nl!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 7 May 2021 15:45:23 -0700 (PDT)
In-Reply-To: <2021May7.233426@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com> <s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me> <6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com> <s71imq$3b4$1@dont-email.me> <77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com> <s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org> <2021May7.195613@mips.complang.tuwien.ac.at> <3551b3df-dc05-4aab-b097-d3a9cfffdc41n@googlegroups.com> <2021May7.233426@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2091e184-b3cf-4d54-abfe-fc9dca5ea315n@googlegroups.com>
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 07 May 2021 22:45:23 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 69
 by: MitchAlsup - Fri, 7 May 2021 22:45 UTC

On Friday, May 7, 2021 at 4:47:41 PM UTC-5, Anton Ertl wrote:
> MitchAlsup <Mitch...@aol.com> writes:
> >On Friday, May 7, 2021 at 1:34:44 PM UTC-5, Anton Ertl wrote:
> >> The other entry in the ISA field in recent years is Aarch64, which is
> >> a RISC, and in certain ways closer to MIPS/SPARC than to the original
> >> ARM ISA, but in others it's quite different: condition codes, double
> >> loads and double stores, many addressing modes.
> ><
> >I would argue that "having address modes" does not eliminate an ISA
> >from the RISC camp.
> I agree. It contributes to Aarch64 being relatively far from the
> MIPS, though.
> >As long as the address mode still takes one
> >cycle in address calculation so [Rbase+Rindex<<scale+Displacement]
> >can remain RISC, while *p++, *--p, and **p cannot.
> ARM, HPPA, Power, and Aarch64 have loads and stores with
> autoincrement/decrement and a considered to be RISCs by most.
> >Secondly, there is a lost of potential code space savings by having
> >things like LM and STM
> ARM and Power have load and store multiple intructions. The Aarch64
> architects apparently decided that this is not so great, and gave the
> users load and store pair instructions, which are quite useful also in
> cases where I don't see an ARM compiler generate load/store-multiple.
<
I added these (ENTER and EXIT in particular) to avoid subroutine prologues
that looked like:
subroutine:
SUB SP,SP, 208
MOV R16,[SP+100]
MOV R17,[SP+108]
MOV R18,[SP+116]
MOV R19,[SP+124]
MOV R20,[SP+132]
MOV R21,[SP+140]
MOV R22,[SP+148]
MOV R23,[SP+156]
MOV R24,[SP+164]
MOV R25,[SP+172]
MOV R26,[SP+180]
MOV R27,[SP+188]
MOV R28,[SP+192]
MOV R29,[SP+200]
MOV R0,[SP+208]

And similar for the subroutine epilogue.

Instead, one can simply write:

subroutine:
ENTER R16,R0,100

And it performs the same amount of work. Now, since HW knows the list of
registers being saved is IN ORDER in memory, it can read the register file
4-8 registers at a time and store them into 1/4 or 1/2 of a cache line every
cycle. So on a lowly 1-wide machine the above sequence would take 4 or 5
cycles. On a more aggressive implementation this can be performed lazily and
in the background (so it might appear to take only 1 cycle). Returning from
the subroutine works similarly, but since the HW knows* that EXIT performs
a control transfer, the return address can be read and delivered to IP without
flowing through R0, and the FETCH can begin before the register reloads
complete.

(*) It is possible to perform an EXIT that does not perform a control transfer;
these are used when exiting dynamic blocks in ALGOL-like languages
and by the stack walker for TRY-CATCH-THROW.

> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Compact representation for common integer constants

<s74muh$vqf$1@dont-email.me>
https://www.novabbs.com/devel/article-flat.php?id=16532&group=comp.arch#16532
Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Fri, 7 May 2021 19:44:57 -0500
Organization: A noiseless patient Spider
Lines: 190
Message-ID: <s74muh$vqf$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me>
<f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 8 May 2021 00:45:06 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="d88d4e5616e3e48100710db326053574";
logging-data="32591"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/BOImQVEzXozWrF0a4q+Sn"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1:BQjoTzKxzgef+og0uU1PbFRSXNE=
In-Reply-To: <f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
Content-Language: en-US
 by: BGB - Sat, 8 May 2021 00:44 UTC

On 5/7/2021 4:28 PM, MitchAlsup wrote:
> On Friday, May 7, 2021 at 4:15:02 PM UTC-5, BGB wrote:
>> On 5/7/2021 8:18 AM, JohnG wrote:
>>> On Wednesday, May 5, 2021 at 8:29:32 AM UTC-7, Ivan Godard wrote:
>>>> You wouldn't want to actually do the implied multiplies, so this would
>>>> become a look-up table indexed by the byte. But if you do that then you
>>>> could have any frequent values in the table, not just those implied by
>>>> the factors. That's how Mill popCons (Popular Constants) work, with the
>>>> nicety that they are encoded only using otherwise wasted entropy in the
>>>> binary ISA.
>>>
>>> Right, but in my example, there are only 8 interesting implied multiplies and would only need an 8x9b rom or equivalent logic:
>>>
>>> [5^n 2-bits][3^n 1-bit]
>>> [00][0] = 9'b0_0000_0001 # 5^0 * 3^0 = 1
>>> [00][1] = 9'b0_0000_0011 # 5^0 * 3^1 = 3
>>> [01][0] = 9'b0_0000_0101 # 5^1 * 3^0 = 5
>>> [01][1] = 9'b0_0000_1111 # 5^1 * 3^1 = 15
>>> [10][0] = 9'b0_0001_1001 # 5^2 * 3^0 = 25
>>> [10][1] = 9'b0_0100_1011 # 5^2 * 3^1 = 75
>>> [11][0] = 9'b0_0111_1101 # 5^3 * 3^0 = 125
>>> [11][1] = 9'b1_0111_0111 # 5^3 * 3^1 = 375
>>>
>> Realize that by FPGA standards, an 8b*9b ROM, presumably looking up
>> 18-bit values or similar, is "pretty damn massive"...
> <
> But brain dead easy to verify........
>
> On the other hand, those constants may not be used all that often.
>>
>> Luckily at least, most FPGAs provide DSP48 units or similar which can
>> manage a small multiplier, but then roughly one may have to budget for
>> an extra clock-cycle of latency.
>>> Everything else is just as many left shifts as you want to spend bits on and could even wait until the ALU if it has the ability to shift inputs.
>>>
>>> If you have the space and time to have a table of perfect constants, then yeah, that seems like the way to go.
>>>
>>> A couple other things I think are asked in other replies that I'll put here just to have them all in one place.
>>>
>> Realize, also, that shift is "kinda expensive" in terms of resources...
>> One doesn't really want to assume that a shift unit is available during
>> the decode process. ALU/EX is another matter though (if it is already
>> known to be available), albeit limits how such a thing may be used.
> <
> I did a design that was 6-wide and only had 3 shifters--which were shared
> with (i.e. positioned in) the LD/ST units wherease each of the 6 FUs could
> perform an integer operation (IMUL and IDIV were in the FMAC unit).
>
> With base plus scaled index addressing the number of shifts in typical code
> was down in the 2%-5% range so while you hade to have somewhat easy
> access to a shifter, in a unit that is not expected to be saturated, you don't
> have to over resource shift capability.

A 'core' in BJX2 currently has:
  3x 'SHAD' Units
    2x "wide" funnel shift units (Lane 1/2);
    1x "narrow" shift-unit (Lane 3);
  AGU / AGEN (Lane 1)
    32 or 48-bit base address (Operating Mode)
    32-bit displacement
      (Linear addressing may break with large displacements)
  Memory Port (Lane 1)
    Partially spans Lane 2 to support 128-bit Load/Store.
    Plugs into L1 D$.
  IMUL (Lane 1)
    32*32 -> 64
  3x ALU:
    1x ALU-A (Lane 1), Full Capability (ADD / SUB / CMP / ...)
    2x ALU-B (Lane 2/3), Partial Capability (ADD / SUB)
  1x FPU (SIMD Capable, Spans Lanes 1/2/3)
    FADD / FSUB / FMUL (Double)
    Packed ADD / SUB / MUL (Single / Half)
    Contains:
      FADD/FSUB Unit
      FMUL Unit
      Converters / Misc Glue Logic
  3x CONV (Integer Conversion)
    Sign/Zero Extension, and a few other misc operations.

Then, other stages:
  L1 I$
  Decoder Unit
  GPR Register-File Unit
  PreBranch Unit (Branch Predictor)
  ...

The "wide" shift units can produce shift and rotate, and can be ganged
together to perform a 128-bit shift.

The "narrow" shift unit can perform a 64-bit arithmetic or logical shift
(but excludes rotate and similar to save cost).

AGU: Essentially, it calculates a 32-bit address in the low bits, and
may do +1/0/-1 on (47:32), with bits (63:48) being copied unchanged (and
generally ignored by the L1 cache). If the scaled displacement exceeds
the 32-bit limit, then addressing may break down.
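
Roughly, in C, that address calculation looks something like the
following (a sketch of the behavior described above; the exact field
handling is an assumption, not the actual pipeline):

#include <stdint.h>

/* Sketch of the 48-bit AGEN described above: a full add in the low 32
   bits, a +1/0/-1 nudge of bits (47:32) from the carry/borrow, and
   bits (63:48) passed through untouched. Illustrative only. */
static uint64_t agen48(uint64_t base, uint64_t index, unsigned scale, int32_t disp)
{
    int64_t  ofs   = (int64_t)(index << scale) + disp;
    uint64_t lo    = (base & 0xFFFFFFFFu) + (uint32_t)ofs;
    int      nudge = (ofs < 0) ? ((lo >> 32) ? 0 : -1)    /* borrow -> -1 */
                               : ((lo >> 32) ? +1 : 0);   /* carry  -> +1 */
    uint64_t mid   = (((base >> 32) & 0xFFFF) + (uint64_t)nudge) & 0xFFFF;
    return (base & 0xFFFF000000000000ull) | (mid << 32) | (lo & 0xFFFFFFFFu);
}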

Technically, a 33-bit scaled index with a 36-bit partial address would
be "better", it is mostly a trade-off of timing and resource cost
(partly because the AGU and also LEA are single-cycle operations).

The IMUL unit only exists in Lane 1, and performs a 32-bit widening
multiply. Typically, 64-bit multiply is faked using multiple 32-bit
multiplies (with 64-bit multiply being both infrequent and expensive).
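
For reference, "faking" a 64-bit multiply out of 32-bit widening
multiplies looks like this in C (an illustrative sketch; only the low
64 bits of the product are kept):

#include <stdint.h>

/* 64x64 -> 64 built from 32x32 -> 64 pieces: three partial products
   are enough, since the a_hi*b_hi term only affects bits above 63. */
static uint64_t mul64_via_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint64_t lo  = (uint64_t)a_lo * b_lo;
    uint64_t mid = (uint64_t)a_lo * b_hi + (uint64_t)a_hi * b_lo;
    return lo + (mid << 32);
}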

ALU: The full-featured ALU only exists in Lane 1, with Lane 2 and 3
getting shaved down ALUs (ADD/SUB/AND/OR/XOR).

The Lane 1 ALU-A unit and Lane 2's ALU-B can be ganged to support
128-bit ADD/SUB/CMP/AND/OR/XOR. Each ALU itself only operates on 64 bits
at a time, but sending a few bits between them allows the carry-select
mechanism to operate as if it were 128 bits wide.

This works mostly because 64- and 128-bit ADD/SUB/CMP have a 2-cycle
latency.
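
The carry-select idea, in C form (an illustrative sketch of the
technique, not the actual implementation): the high half computes both
possible sums, and one carry bit from the low half picks between them.

#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;

static u128 add128_carry_select(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;
    uint64_t carry = (r.lo < a.lo);      /* carry out of the low 64 bits */
    uint64_t hi_c0 = a.hi + b.hi;        /* speculative, carry-in = 0    */
    uint64_t hi_c1 = a.hi + b.hi + 1;    /* speculative, carry-in = 1    */
    r.hi = carry ? hi_c1 : hi_c0;        /* select with the single carry */
    return r;
}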

Some other misc stuff, eg: CLZ/CTZ, UTX2 (Texture Compression), ... are
also routed through ALU-A.

There is only a single FPU, but effectively it spans all 3 lanes, as
this was needed to be able to support 128-bit FP-SIMD operations. Note
that SIMD operations are implemented via pipelining within the FPU.

As noted, FADD is a 6-cycle operation, whereas PADDF (4x Single) is 10
cycles, so SIMD mostly wins here.

FP conversion ops are routed through different locations:
Double <-> Integer: routed through FADD.
Double <-> Single/Half: FPU, via converter units.
...

FCMP is implemented via the ALU.

FRCP / FDIV / FSQRT: Faked in software via the C runtime.

There was the Long-Double-Extension, which added "Truncated Binary128"
support (~ S.15.80), but this has the drawback of being "rather
expensive". Full Binary128 support currently falls well outside the FPGA
resource budget.

There were plans that the FMULX unit could be made to also be able to
perform 64-bit integer multiply, but this depends on "actually being
able to afford it". This mechanism would involve being able to
reconfigure some of the low-order units of the multiplier via "plumbing
tricks" (to switch between a "square" and "triangular" configuration).

By analogy, it would be equivalent to having a big isosceles triangle,
then being able to cut two of the corners off, and move them to the
bottom of the triangle, forming a square (with a big interior area which
does not move).

Where, the triangle represents an FMUL, and the square represents an
IMUL, and the non-moving part represents the high-order bits.

Units like the TLB aren't directly connected to the execute pipeline,
but are instead connected to the L1 caches via the ringbus. Operations
like LDTLB are internally actually special-case MMIO operations which
communicate with the TLB more as if it were an IO device. Several
registers are forwarded to the TLB though (MMCR, KRR, ...). While being
"theoretically" an MMU register, TTB and similar aren't actually visible
to the MMU (and are more intended to be used by the OS / firmware's
page-table walker).

>>
>> This is also why many smaller ISAs tend to leave out things like
>> variable-shift and integer multiply...
> <
> In this day and age, that is a mistake.

It does suck, and probably isn't ideal for anything meant for performance.

Sorta works in microcontrollers though which aren't usually meant for
performance-sensitive applications.

>>
>>
>> Decided to leave it out, but one can use their imagination for how
>> things work on an ISA which lacks both shift instructions and multiply...

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<s74rp7$laf$1@dont-email.me>
https://www.novabbs.com/devel/article-flat.php?id=16534&group=comp.arch#16534
Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common
integer constants)
Date: Fri, 7 May 2021 21:07:26 -0500
Organization: A noiseless patient Spider
Lines: 221
Message-ID: <s74rp7$laf$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me>
<6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me>
<77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
<s747f6$5ri$1@dont-email.me>
<0a355bf9-7067-4076-9365-de1c63061df1n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 8 May 2021 02:07:35 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="d88d4e5616e3e48100710db326053574";
logging-data="21839"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/gdolVuQtfhkGMgtS2w+SE"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1:7ehwcb7vPwVOJyf2rvMIzuUkTCU=
In-Reply-To: <0a355bf9-7067-4076-9365-de1c63061df1n@googlegroups.com>
Content-Language: en-US
 by: BGB - Sat, 8 May 2021 02:07 UTC

On 5/7/2021 3:40 PM, MitchAlsup wrote:
> On Friday, May 7, 2021 at 3:20:56 PM UTC-5, BGB wrote:
>> On 5/7/2021 8:15 AM, Stefan Monnier wrote:
>>>> It is possible that it could be generalized, but then the ISA would be less
>>>> RISC style then it is already...
>>>>
>>>> Then again, I did see recently that someone had listed my ISA somewhere, but
>>>> then classified it as a CISC; I don't entirely agree, but alas...
>>>>
>>>> Then again, many peoples' definitions of "RISC" exclude largish
>>>> instruction-sets with variable-length instruction encodings, so alas.
>>>>
>>>> But, taken at face value, then one would also need to exclude Thumb2 and
>>>> similar from the RISC category.
>>>
>>> I think it's better not to worry about how other people label your ISA.
>>>
>>> The manichean RISC-vs-CISC labeling is stupid anyway: the design space
>>> has many more than 2 spots.
>>>
>>> Also I think the "classic RISC" is the result of specific conditions at
>>> that point in time which resulted in a particular sweet spot in the
>>> design space.
>>>
>>> But design constraints are quite different now, so applying the same
>>> "quantitative approach" that lead to MIPS/SPARC/... will now result in
>>> something quite different. One could argue that such new ISAs
>>> could be called RISC-2020 ;-)
>>>
>> Yeah.
>>
>> Classic RISC:
>> Fixed size instructions
>> Typically only a single addressing mode (Reg, Disp)
>> Typically only supporting aligned memory access
>> Aims for fixed 1 instruction per cycle
> <
> Mc 88K had
> fixed sized instructions
> [Rbase+IMM16] and [Rbase+Rindex<<scale] address modes
> Aligned memory has been shown to be defective
> aimed at 1ipc but we did engineer a 6-wide OoO version
>
> My 66000
> has fixed sized instruction specifiers with 1 or 2 constants.
> [Rbase+IMM16] and [Rbase+Rindex<<scale] and [Rbase+Rindex<<scale+disp[32,64]] address modes
> Misaligned memory model
> Aimed at low burden for LBIO implementations and low burden for GBOoO implementations
> Inherently parallel
> Never needs a NoOp

BJX2 allows misaligned access, and also does not need NOPs.

Misaligned access may degrade performance in some cases, though, and
doesn't work with certain instructions, ...

If one triggers an interlock on a memory load or similar, there will be
a partial pipeline stall, and it will behave as if a NOP were present.
Interlocks on various other instructions can cause the same behavior.

Interlocks currently are in 2nd place (behind cache misses) for wasting
clock-cycles (improving the efficiency of memory has, in effect, also
increased the proportion of clock-cycles wasted on pipeline interlock
stalls).

I recently added logic to my C compiler's "wexifier" to shuffle
instructions around to try to reduce interlock penalties. At the moment
it is disabled by default, as the wexifier seems to be buggy in some
as-yet-undetermined way (it seems to produce bundles which don't work
correctly on the FPGA implementation, but I can't isolate the behavior
well enough to determine the cause).
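
To make the load-use case concrete, here is a minimal sketch of the
kind of hazard the shuffling tries to break up (the structures and
field names are hypothetical, not the actual compiler's):

  #include <stdbool.h>

  /* Does op B consume the destination of a load A in the very next
     slot?  If so, a scalar in-order pipeline inserts an interlock
     stall, so the scheduler tries to hoist an independent op in
     between the two. */
  struct op { bool is_load; int dst; int src1, src2; };

  static bool load_use_hazard(const struct op *a, const struct op *b)
  {
      return a->is_load && (b->src1 == a->dst || b->src2 == a->dst);
  }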

BJX2:
Rb + Imm<<Sc
Rb + Ri<<Sc

Rb: R2..R15, R18..R31
Ri: R0, R2..R14, R18..R31

R0/R1: Used to encode special-cases
R15 (SP): Only usable as a base register
R16/R17: Reserved (for the possibility of more special cases)

Sc: Typically hard-wired based on the element type

Base registers encoded via special cases:
PC, GBR, TBR
Index registers encoded via special cases:
R0, Sc=0 (Allows access to misaligned struct members)

However, since pretty much all of this is handled in the decoder, I don't
really feel these qualify as distinct addressing modes (and, as far as the
AGU is concerned, (Rb+Ri<<Sc) is the only mode).

Generally, (Rb) and (Rb,0) are treated as equivalent.
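
As a purely illustrative sketch of that point (ignoring the special
cases), every access reduces to a single AGU computation:

  #include <stdint.h>

  /* Illustrative only: base + (index << scale); (Rb) alone is just
     the Ri=0, Sc=0 case that the decoder fills in. */
  static uint64_t agu_ea(uint64_t rb, uint64_t ri, unsigned sc)
  {
      return rb + (ri << sc);
  }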

Given the lack of autoincrement modes or similar, both PC and PC&(~3)
addressing, ... BJX2 in practice actually has fewer distinct addressing
modes than SuperH.

The SuperH ISA also had a few edge cases where multiple memory accesses
would be performed by a single instruction (and other violations of
Load/Store), but was generally classified as a RISC.

One thing BJX2 does have, which many RISCs lack, is a LEA instruction.
LEA generally reuses the encodings where a zero-extended store wouldn't
make sense, so those cases are interpreted as LEA instead.

This was mostly because LEA helps in various edge cases (see the sketch
after this list), such as for:
Composing function pointers;
Long-distance branches;
Compound addressing within structures;
Eg: Accessing an element within an array within a structure.
...
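
A minimal C illustration of the compound-addressing case (the types
and field names are made up for the example):

  #include <stdint.h>

  struct entry { int32_t hdr; int16_t vals[8]; };

  /* Forming &tbl[i].vals[j] needs base + i*sizeof(struct entry)
     + offset-of-vals + j*2; a LEA can fold the final
     base + index<<scale (+disp) step instead of spending a separate
     ADD just to materialize the pointer. */
  static int16_t *elem_addr(struct entry *tbl, int i, int j)
  {
      return &tbl[i].vals[j];
  }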

>> ...
>>
>> BJX2:
>> Variable length
>> 16/32 base
>> 16/32/64/96 extended
>> 48 possible, but currently unused
>> Original 48-bit ops can no-longer be encoded.
>> Multiple addressing modes
>> Unaligned memory access (8/16/32/64)
>> 64/128 bit alignment for 128-bit ops.
>> Supports explicitly parallel instructions
>> ...
>>
>>
>> It has a few features in common with VLIW architectures as well, but
>> differs from traditional VLIW in that these typically use a fixed-size
>> bundle encoding, whereas BJX2 bundles are variable-length (composed of
>> 32-bit units), with the 64 and 96 bit instruction encodings being
>> effectively a special-case of the bundle encoding.
>>
>> There are several DSP architectures which do something similar to this:
>> Xtensa, Hexagon, ...
>>
>>
>> Though, originally, inclination toward VLIW support was based on taking
>> inspiration from the TMS320C6x and IA64, which, granted, use a
>> fixed-size bundle encoding.
>>
>> Some of my earlier ideas involved using a more traditional bundle format
>> (just sort of awkwardly plonked into the code-stream), but this later
>> transformed into the current approach.
>>
>>
>> A recent experiment did also test using 24-bit instructions, but as
>> noted elsewhere, while in themselves they were effective at reducing the
>> number of 32-bit ops in size-optimized code, because 32-bit ops are
>> already the minority in the size-optimized case, the net savings were
>> fairly small (and ran into issues with the baked-in assumption that
>> branch-targets are 16-bit aligned, *, ...).
>>
>> *: One either needs to jostle around with re-encoding the past several
>> instructions to re-align the instruction stream, or insert a 24-bit NOP
>> (if the former fails), or use 24-bit byte-aligned branch encodings
>> (which ended up costing more than they saved vs the "reencode ops to fix
>> alignment to allow for using the 16-bit branch ops" strategy).
>>
>>
>>
>> As for resource cost:
>> My current BJX2 core costs ~ 4x as much as a minimalist 32-bit RISC
>> style core.
>>
>> Or, basically, if I go for:
>> Fixed-length instructions
>> One instruction at a time
>> 16x 32-bit GPRs
>> Aligned-only memory access
>> No Variable-Shift or FPU ops
>> No MMU
>> ...
>>
>> It is possible to use ~ 1/4 the LUTs of the current (full-featured) BJX2
>> core. The difference isn't quite drastic enough to make a strong use
>> case for the minimalist core (even as a secondary "IO controller" or
>> similar).
>>
>> I can also get a lot of this back (eg, fitting a microcontroller subset
>> of BJX2 on an XC7S25), mostly by disabling WEX and the MMU and similar.
>>
>> By disabling the FPU and a few other things, it is also possible to
>> shoe-horn it into an XC7A15 (basically, the smallest Xilinx FPGA I can
>> really manage to find on FPGA dev-boards "in the wild").
>>
>> This in-turn creates a disincentive to have a separate 32-bit ISA, vs
>> using a slightly more restrictive subset of the 64-bit ISA.
>>
>>
>> I had also looked briefly at trying to do a 16-bit IO controller, but
>> then ran into the problem that there is basically no real good way to
>> plug a 16-bit core into my existing bus architecture.
>>
>> And, it seems, short of targeting something like an iCE40 or similar,
>> there isn't much point.
>>
>> Not sure how this compares with the ASIC space, but I suspect with
>> modern technologies this is probably "grain of sand" territory.
>>
>>
>> Could matter more if one is having their logic printed onto a plastic
>> substrate using semiconductor inks, but given these technologies (in
>> their commercial form) are managing things like Cortex-M cores, it
>> probably isn't too major of an issue.
>> Similarly, building such a printer would currently appear to be too
>> expensive for the hobbyist space.
>>
>>
>> ...


Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<s750aq$4u8$1@dont-email.me>

 by: BGB - Sat, 8 May 2021 03:25 UTC

On 5/7/2021 5:31 PM, MitchAlsup wrote:
> On Friday, May 7, 2021 at 5:25:48 PM UTC-5, Thomas Koenig wrote:
>> Anton Ertl <an...@mips.complang.tuwien.ac.at> schrieb:
>>> ARM, HPPA, Power, and Aarch64 have loads and stores with
>>> autoincrement/decrement and are considered to be RISCs by most.
>> And at least for POWER, they are a bit different (and more
>> powerful):
>>
>> lbzu ra,D(rb)
>>
>> will, for example, load ra from rb + D and store rb + D into
>> rb afterwards.
> <
> I think one could argue that this is causing the instruction to have
> more than one result.
>>
>> However, it is better not to use them:
>>
>> # In some implementations, the Load Algebraic and
>> # Load with Update instructions may have greater
>> # latency than other types of Load instructions. More-
>> # over, Load with Update instructions may take lon-
>> # ger to execute in some implementations than the
>> # corresponding pair of a non-update Load instruc-
>> # tion and an Add instruction.
> <
> Register write collisions cause pipeline stalling behavior.
>

Or: Why did I not keep auto-increment in my ISAs...

At best, they don't buy much, or worse, one has to produce multiple
register stores from the same instruction, which adds complexity, and
(in a scalar implementation) is likely to also require a pipeline
interlock to fake the presence of an ADD instruction...
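
A small illustration of why dropping them costs little in practice
(plain C, nothing ISA-specific): consecutive post-increment accesses
can be batched so only one pointer update survives.

  #include <stdint.h>

  /* "*p++ = a; *p++ = b;" compiles into two offset stores plus a
     single pointer ADD, so the group needs only one extra register
     write instead of one per access. */
  static uint32_t *store_pair(uint32_t *p, uint32_t a, uint32_t b)
  {
      p[0] = a;
      p[1] = b;
      return p + 2;
  }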

It is "slightly less bad" if using a hard-wired register via a
side-channel (such as PUSH/POP and SP), but as can be noted, PUSH/POP no
longer exists in my ISA either (they didn't "bring home the bacon"
enough to justify the costs of their continued existence).

Related:
"Why are there no branch delay slots?"
...

Re: FP8 (was Compact representation for common integer constants)

<s75pe2$shi$1@dont-email.me>

 by: Marcus - Sat, 8 May 2021 10:33 UTC

On 2021-05-07, Stephen Fuld wrote:
> On 5/7/2021 9:54 AM, Marcus wrote:
>> On 2021-05-07, Terje Mathisen wrote:
>
> snip
>
>>> FP8 could be either 1:3:4 or 1:4:3, but for full ieee compliance you
>>> really want at least 2n+3 bits in the mantissa when going up in size,
>>> so your choice is the best one imho.
>>>
>>> Having 2n+3 means that you never get into double rounding problems by
>>> doing any single operation in the larger size, then rounding back
>>> down to the target precision.
>>
>> Good to know - I didn't think of that. I mostly went with the
>> configuration that made the most sense: The main advantage (IMO) of
>> floating-point numbers compared to fixed point numbers is the increased
>> dynamic range,
>
> Isn't that the *only* reason to do floating point?
>

I can think of a few more advantages, such as the normalized
representation that simplifies the implementation of many algorithms
(e.g. sqrt, reciprocals, etc). You also get the nice property that
multiplication produces a result that has the same storage size as the
inputs, unlike integer multiplication where you need twice the size
for the output compared to the input (which has led to many different
special solutions in ISA:s over the years, like the HI/LO registers
in MIPS). FP became popular in DSP:s mostly to eliminate the hassles of
avoiding overflow and underflow in fixed point arithmetic. And so on
(many of the advantages are of course side effects of the increased
dynamic range - but I wouldn't go as far as to say that it's the only
reason to use or implement floating-point).

/Marcus

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<28c5e291-c7ca-42d4-b103-f83b10e98d3dn@googlegroups.com>

 by: MitchAlsup - Sat, 8 May 2021 14:17 UTC

On Friday, May 7, 2021 at 10:25:17 PM UTC-5, BGB wrote:
> On 5/7/2021 5:31 PM, MitchAlsup wrote:
> > On Friday, May 7, 2021 at 5:25:48 PM UTC-5, Thomas Koenig wrote:
> >> Anton Ertl <an...@mips.complang.tuwien.ac.at> schrieb:
> >>> ARM, HPPA, Power, and Aarch64 have loads and stores with
> >>> autoincrement/decrement and are considered to be RISCs by most.
> >> And at least for POWER, they are a bit different (and more
> >> powerful):
> >>
> >> lbzu ra,D(rb)
> >>
> >> will, for example, load ra from rb + D and store rb + D into
> >> rb afterwards.
> > <
> > I think one could argue that this is causing the instruction to have
> > more than one result.
> >>
> >> However, it is better not to use them:
> >>
> >> # In some implementations, the Load Algebraic and
> >> # Load with Update instructions may have greater
> >> # latency than other types of Load instructions. More-
> >> # over, Load with Update instructions may take lon-
> >> # ger to execute in some implementations than the
> >> # corresponding pair of a non-update Load instruc-
> >> # tion and an Add instruction.
> > <
> > Register write collisions cause pipeline stalling behavior.
> >
> Or: Why did I not keep auto-increment in my ISA's...
>
> At best, they don't buy much, or worse, one has to produce multiple
> register stores from the same instruction, which adds complexity, and
> (in a scalar implementation) is likely to also require a pipeline
> interlock to fake the presence of an ADD instruction...
<
Auto INCs/DECs often occur several in a row, and a good value
propagator can find these, perform the several LDs/STs, and
compress the auto INC/DEC arithmetic into a single add/sub.
>
> It is "slightly less bad" if using a hard-wired register via a
> side-channel (such as PUSH/POP and SP), but as can be noted, PUSH/POP no
> longer exists in my ISA either (they didn't "bring home the bacon"
> enough to justify the costs of their continued existence).
>
> Related:
> "Why are there no branch delay slots?"
<
With a 20% branch density and a single cycle of branch-target latency,
branch delay slots can save that cycle (well, closer to 50% of that
cycle, as only about 50% of BDSs are filled doing useful work, while up
to 70% are filled with something other than NoOps).
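
Spelling out the arithmetic implied by those figures, purely as an
illustration:

  savings per instruction ~= 0.20 branches/inst
                             * 1 cycle/branch
                             * 0.5 usefully-filled slots
                          ~= 0.1 cycles per instruction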

In anything other than a 1-wide simple pipeline, BDSs are a useless
complexity that gets in the way of other things one wants to do in the
pipeline. For example, by FETCHing 4-wide and scanning forward in
an instruction buffer, one can find branches long before they become
"executed" and have the target instructions arrive before the branch
is through DECODE. Thus, unconditional branches may take 0
(zero, nada, zilch, no) cycles even without BDSs. This affords many
opportunities to reduce the penalty for branches. Also, in wider
implementations BDSs are merely a complexity that does not save
execution cycles and makes the rest of the HW considerably more
complex.
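
A toy sketch of that fetch-time scan (the opcode value and structures
are invented for illustration, not from any real ISA):

  #include <stdbool.h>
  #include <stdint.h>

  #define OP_BR_UNCOND 0x2A   /* invented encoding */

  struct inst { uint8_t opcode; int32_t disp; };

  /* Scan a 4-wide fetch group; on finding an unconditional branch,
     redirect the fetch PC at once, before the branch reaches DECODE,
     so the taken branch costs no bubble. */
  static bool redirect_fetch(const struct inst grp[4], uint64_t grp_pc,
                             uint64_t *next_pc)
  {
      for (int i = 0; i < 4; i++) {
          if (grp[i].opcode == OP_BR_UNCOND) {
              *next_pc = grp_pc + 4u * (unsigned)i
                         + (uint64_t)(int64_t)grp[i].disp;
              return true;
          }
      }
      return false;   /* no branch: fetch the next sequential group */
  }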
<
> ...

Re: FP8 (was Compact representation for common integer constants)

<7a87e9f5-37ee-4689-b9b7-0d6bf774e715n@googlegroups.com>

 by: MitchAlsup - Sat, 8 May 2021 14:19 UTC

On Saturday, May 8, 2021 at 5:33:40 AM UTC-5, Marcus wrote:
> On 2021-05-07, Stephen Fuld wrote:
> > On 5/7/2021 9:54 AM, Marcus wrote:
> >> On 2021-05-07, Terje Mathisen wrote:
> >
> > snip
> >
> >>> FP8 could be either 1:3:4 or 1:4:3, but for full ieee compliance you
> >>> really want at least 2n+3 bits in the mantissa when going up in size,
> >>> so your choice is the best one imho.
> >>>
> >>> Having 2n+3 means that you never get into double rounding problems by
> >>> doing any single operation in the larger size, then rounding back
> >>> down to the target precision.
> >>
> >> Good to know - I didn't think of that. I mostly went with the
> >> configuration that made the most sense: The main advantage (IMO) of
> >> floating-point numbers compared to fixed point numbers is the increased
> >> dynamic range,
> >
> > Isn't that the *only* reason to do floating point?
> >
> I can think of a few more advantages, such as the normalized
> representation that simplifies the implementation of many algorithms
> (e.g. sqrt, reciprocals, etc). You also get the nice property that
> multiplication produces a result that has the same storage size as the
> inputs, unlike integer multiplication where you need twice the size
<
There are just about as many uses for a FP multiply to produce an
exact result as there are for integer multiplies.
<
> for the output compared to the input (which has led to many different
> special solutions in ISA:s over the years, like the HI/LO registers
> in MIPS). FP became popular in DSP:s mostly to eliminate the hassles of
> avoiding overflow and underflow in fixed point arithmetic. And so on
> (many of the advantages are of course side effects of the increased
> dynamic range - but I wouldn't go as far as to say that it's the only
> reason to use or implement floating-point).
>
> /Marcus

Re: FP8 (was Compact representation for common integer constants)

<s76as3$i7j$1@gal.iecc.com>

 by: John Levine - Sat, 8 May 2021 15:31 UTC

According to Marcus <m.delete@this.bitsnbites.eu>:
>>> Good to know - I didn't think of that. I mostly went with the
>>> configuration that made the most sense: The main advantage (IMO) of
>>> floating-point numbers compared to fixed point numbers is the increased
>>> dynamic range,
>>
>> Isn't that the *only* reason to do floating point?
>
>I can think of a few more advantages, such as the normalized
>representation that simplifies the implementation of many algorithms ...

The huge advantage is that it takes care of scaling automatically so
the programmer doesn't have to worry about it in every operation.

When people were designing early computers in the 1940s, they knew
about floating point but didn't include it because they thought that
the programmer could easily keep track of the scale so the extra
hardware complexity wasn't worth it. This was true if the programmer
was John von Neumann, not so true otherwise, so FP hardware showed up
on the 704 in the mid-1950s.

> You also get the nice property that
>multiplication produces a result that has the same storage size as the
>inputs, unlike integer multiplication where you need twice the size

Well, no. The fraction part of a product is twice as long as the
fraction part of the inputs, but FP hardware can helpfully round it if
you're lucky, truncate if you aren't. On S/360, floating multiply
took float inputs and produced a double output. S/370 added double
to quad. I guess you were on your own.
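
A quick way to see that rounding in C (illustrative, not from the
post): the exact product of two 24-bit significands needs 48 bits, so
binary32 has to drop some of them.

  #include <stdio.h>

  int main(void)
  {
      float  a = 16777215.0f;            /* 2^24 - 1, exact in binary32 */
      float  b = 16777215.0f;
      float  p = a * b;                  /* rounded to 24 bits */
      double e = (double)a * (double)b;  /* exact: fits in 53 bits */
      printf("rounded: %.0f\nexact:   %.0f\n", (double)p, e);
      return 0;
  }

This prints 281474943156224 for the rounded product versus
281474943156225 for the exact one.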

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: FP8 (was Compact representation for common integer constants)

<s76ehd$65p$1@dont-email.me>

 by: Stephen Fuld - Sat, 8 May 2021 16:33 UTC

On 5/8/2021 8:31 AM, John Levine wrote:
> According to Marcus <m.delete@this.bitsnbites.eu>:
>>>> Good to know - I didn't think of that. I mostly went with the
>>>> configuration that made the most sense: The main advantage (IMO) of
>>>> floating-point numbers compared to fixed point numbers is the increased
>>>> dynamic range,
>>>
>>> Isn't that the *only* reason to do floating point?
>>
>> I can think of a few more advantages, such as the normalized
>> representation that simplifies the implementation of many algorithms ...
>
> The huge advantage is that it takes care of scaling automatically so
> the programmer doesn't have to worry about it in every operation.
>
> When people were designing early computers in the 1940s, they knew
> about floating point but didn't include it because they thought that
> the programer could easily keep track of the scale so the extra
> hardware complexity wasn't worth it. This was true if the programmer
> was John von Neumann, not so true otherwise, so FP hardware showed up
> on the 704 in the early 1950s.

While all of that is true, there were other alternatives. COBOL
supported, and still does, automatic scaling of fixed point numbers. I
don't know if other languages support this.

Of course, the time period you were discussing was before high level
languages, and writing the code in assembler was probably "a bridge too
far" for most programmers at the time.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<2021May8.185104@mips.complang.tuwien.ac.at>

 by: Anton Ertl - Sat, 8 May 2021 16:51 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
>Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
>
>> ARM, HPPA, Power, and Aarch64 have loads and stores with
>> autoincrement/decrement and are considered to be RISCs by most.
>
>And at least for POWER, they are a bit different (and more
>powerful):
>
> lbzu ra,D(rb)
>
>will, for example, load ra from rb + D and store rb + D into
>rb afterwards.

If D is a constant, that's like Aarch64's and ARM's pre-indexed
addressing mode.

>However, it is better not to use them:
>
># In some implementations, the Load Algebraic and
># Load with Update instructions may have greater
># latency than other types of Load instructions. More-
># over, Load with Update instructions may take lon-
># ger to execute in some implementations than the
># corresponding pair of a non-update Load instruc-
># tion and an Add instruction.

Interestingly, the ARMv8 Instruction Set Overview
<http://www.cs.princeton.edu/courses/archive/spr19/cos217/reading/ArmInstructionSetOverview.pdf> says:

|3.3.1 Register Indexed Addressing
|
|The A64 instruction set extends on 32-bit T32 addressing modes,
|allowing a 64-bit index register to be added to the 64-bit base
|register, with optional scaling of the index by the access
|size. Additionally it provides for sign or zero-extension of a 32-bit
|value within an index register, again with optional scaling.
|
|These register index addressing modes provide a useful performance
|gain if they can be performed within a single cycle, and it is
|believed that at least some implementations will be able to do
|this. However, based on implementation experience with AArch32, it is
|expected that other implementations will need an additional cycle to
|execute such addressing modes.
|
|Rationale: The architects intend that implementations should be free
|to fine-tune the performance trade-offs within each implementation,
|and note that providing an instruction which in some implementations
|takes two cycles, is preferable to requiring the dynamic grouping of
|two independent instructions in an implementation that can perform
|this address arithmetic in a single cycle.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: FP8 (was Compact representation for common integer constants)

<jwv7dk93zcy.fsf-monnier+comp.arch@gnu.org>

 by: Stefan Monnier - Sat, 8 May 2021 17:34 UTC

>>>> Good to know - I didn't think of that. I mostly went with the
>>>> configuration that made the most sense: The main advantage (IMO) of
>>>> floating-point numbers compared to fixed point numbers is the increased
>>>> dynamic range,
>>> Isn't that the *only* reason to do floating point?
>>I can think of a few more advantages, such as the normalized
>>representation that simplifies the implementation of many algorithms ...
> The huge advantage is that it takes care of scaling automatically so
> the programmer doesn't have to worry about it in every operation.

While this is mostly true in general, the comments above were made in
the context of an 8bit floating point format, where the exponent's range
is sufficiently limited that it is still something the programmer very
much has to worry about.

Stefan

Re: FP8 (was Compact representation for common integer constants)

<s76mnn$4ok$1@dont-email.me>

 by: Ivan Godard - Sat, 8 May 2021 18:53 UTC

On 5/8/2021 7:19 AM, MitchAlsup wrote:
> On Saturday, May 8, 2021 at 5:33:40 AM UTC-5, Marcus wrote:
>> On 2021-05-07, Stephen Fuld wrote:
>>> On 5/7/2021 9:54 AM, Marcus wrote:
>>>> On 2021-05-07, Terje Mathisen wrote:
>>>
>>> snip
>>>
>>>>> FP8 could be either 1:3:4 or 1:4:3, but for full ieee compliance you
>>>>> really want at least 2n+3 bits in the mantissa when going up in size,
>>>>> so your choice is the best one imho.
>>>>>
>>>>> Having 2n+3 means that you never get into double rounding problems by
>>>>> doing any single operation in the larger size, then rounding back
>>>>> down to the target precision.
>>>>
>>>> Good to know - I didn't think of that. I mostly went with the
>>>> configuration that made the most sense: The main advantage (IMO) of
>>>> floating-point numbers compared to fixed point numbers is the increased
>>>> dynamic range,
>>>
>>> Isn't that the *only* reason to do floating point?
>>>
>> I can think of a few more advantages, such as the normalized
>> representation that simplifies the implementation of many algorithms
>> (e.g. sqrt, reciprocals, etc). You also get the nice property that
>> multiplication produces a result that has the same storage size as the
>> inputs, unlike integer multiplication where you need twice the size
> <
> There are just about as many uses for a FP multiply to produce an
> exact result as there are for integer multiplies.

I always liked the B6500 arithmetic, which had only one instruction for
both integer and FP: if the result fit in the integral value set you
got an integer, and an FP value if not, and none of the ops cared which
they got. IIRC, the largest integer had 41 bits, sign-magnitude.

Re: scaling, was FP8 (was Compact representation for common integer constants)

<s76o7e$jpn$1@gal.iecc.com>

 by: John Levine - Sat, 8 May 2021 19:19 UTC

According to Stephen Fuld <sfuld@alumni.cmu.edu.invalid>:
>> The huge advantage is that it takes care of scaling automatically so
>> the programmer doesn't have to worry about it in every operation.

>While all of that is true, there were other alternatives. COBOL
>supported, and still does, automatic scaling of fixed point numbers. I
>don't know if other languages support this.

COBOL gives you fixed scaling, e.g. PIC 9999V999 has four digits before
the implicit decimal point and three after. When you do arithmetic
it'll align the decimal point, but you don't get automatic scaling
unless you tell the compiler that the data is COMP-1 or COMP-2 so
it uses the internal floating point representation.
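
In C terms, the fixed-scaling idea looks roughly like this (a sketch
only, not how any COBOL runtime actually implements it):

  #include <stdint.h>

  /* A PIC 9999V999-style value: an integer counting thousandths, so
     1234.567 is stored as 1234567.  Addition needs no alignment step
     because both operands share one fixed, programmer-chosen scale;
     multiplication has to rescale explicitly. */
  typedef int32_t fix3;   /* value scaled by 1000 */

  static fix3 fix3_add(fix3 a, fix3 b) { return a + b; }
  static fix3 fix3_mul(fix3 a, fix3 b)
  {
      return (fix3)(((int64_t)a * b) / 1000);
  }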

Scaling crisis of the day:

The NASDAQ stock exchange's computers represent prices as 32 bit
unsigned integers, with units being 1/100 of a cent or 1/10000 of a
dollar. Since the largest 32 bit integer is 4294967295 the highest
price it can represnt is $429,496.7295. The price of Berkshire
Hathaway reached $437,131 this week. Oops. They say they'll have a
fix later this month.

The next-highest-priced stock is about $5000, and Berkshire's CEO Warren
Buffett has said for decades that he'll never split the shares like
everyone else does.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: FP8 (was Compact representation for common integer constants)

<s76uv7$5vp$1@dont-email.me>

 by: BGB - Sat, 8 May 2021 21:14 UTC

On 5/8/2021 12:34 PM, Stefan Monnier wrote:
>>>>> Good to know - I didn't think of that. I mostly went with the
>>>>> configuration that made the most sense: The main advantage (IMO) of
>>>>> floating-point numbers compared to fixed point numbers is the increased
>>>>> dynamic range,
>>>> Isn't that the *only* reason to do floating point?
>>> I can think of a few more advantages, such as the normalized
>>> representation that simplifies the implementation of many algorithms ...
>> The huge advantage is that it takes care of scaling automatically so
>> the programmer doesn't have to worry about it in every operation.
>
> While this is mostly true in general, the comments above were made in
> the context of an 8bit floating point format, where the exponent's range
> is sufficiently limited that it is still something the programmer very
> much has to worry about.
>

Yeah...

Exponent:
15: Inf / NaN
14: 128.0 .. 256.0
13: 64.0 .. 128.0
12: 32.0 .. 64.0
11: 16.0 .. 32.0
10: 8.0 .. 16.0
9: 4.0 .. 8.0
8: 2.0 .. 4.0
7: 1.0 .. 2.0
6: 0.5 .. 1.0
5: 0.25 .. 0.5
4: 0.125 .. 0.25
3: 0.063 .. 0.125
2: 0.031 .. 0.063
1: 0.016 .. 0.031
0: Zero / Denormal

In the FP8S variant, you have precision of, say:
...
0.500, 0.563, 0.625, 0.688, 0.750, 0.813, 0.875, 0.938
1.000, 1.125, 1.250, 1.375, 1.500, 1.625, 1.750, 1.875
2.000, 2.250, 2.500, 2.750, 3.000, 3.250, 3.500, 3.750
...
128, 144, 160, 176, 192, 208, 224, 240

Or, compared with conventional formats, its precision and dynamic range
are kinda garbage.

But, for certain types of tasks, it is still sufficient.
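
For the curious, a decoder for such a 1:4:3 format might look roughly
like this (a sketch matching the ranges above with a bias of 7; the
exact denormal/NaN handling in BJX2's FP8S is an assumption here):

  #include <math.h>

  static float fp8s_to_float(unsigned char v)
  {
      int s = (v >> 7) & 1;
      int e = (v >> 3) & 15;
      int m =  v       & 7;
      float r;
      if (e == 0)        r = ldexpf((float)m, -9);          /* denormal  */
      else if (e == 15)  r = m ? NAN : INFINITY;            /* NaN / Inf */
      else               r = ldexpf(1.0f + m / 8.0f, e - 7);
      return s ? -r : r;
  }

E.g. e=7, m=1 decodes to 1.125 and e=8, m=2 decodes to 2.5, matching
the rows listed above.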

Re: Compact representation for common integer constants

<s789v4$rv6$1@newsreader4.netcologne.de>

 by: Thomas Koenig - Sun, 9 May 2021 09:28 UTC

BGB <cr88192@gmail.com> schrieb:

> IMUL (Lane 1)
> 32*32 -> 64

Do you have two instructions (one for signed and one for unsigned)
or three (one for the lower half, one for signed high, one for
unsigned high)? The latter version could save you some ALU
complexity and some latency in the (probably common) case where
only a 32*32 multiplication is needed, at the cost of added
instructions for the 32*32-> 64 bit case.

Re: Compact representation for common integer constants

<e96a28e2-2e9a-4b4a-b179-4696d9355c87n@googlegroups.com>

 by: MitchAlsup - Sun, 9 May 2021 14:51 UTC

On Sunday, May 9, 2021 at 4:28:07 AM UTC-5, Thomas Koenig wrote:
> BGB <cr8...@gmail.com> schrieb:
> > IMUL (Lane 1)
> > 32*32 -> 64
<
> Do you have two instructions (one for signed and one for unsigned)
> or three (one for the lower half, one for signed high, one for
> unsigned high)? The latter version could save you some ALU
> complexity and some latency in the (probably common) case where
> only a 32*32 multiplication is needed, at the cost of added
> instructions for the 32*32-> 64 bit case.
<
In My 66000's case, 64×64->64 is the base function and comes in
signed and unsigned varietals. When one wants 64×64->128, one
uses the CARRY instruction-modifier. CARRY provides access to
all forms of wider-than-standard operations {Shifts, ADD, SUB, IMUL,
UMUL, IDIV, UDIV, FADD, FSUB, FMUL, and Kahan-Babuška Summation}.

So one adds but a single instruction-modifier and gets 2 handfuls
of instructions.
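
Roughly what that buys, in C terms (a sketch; GCC/Clang's unsigned
__int128 stands in for the CARRY-widened result here):

  #include <stdint.h>

  /* 64x64->128: one MUL-low plus one MUL-high on most ISAs, or a
     single CARRY-modified multiply in My 66000 terms. */
  static void mul_64x64_128(uint64_t a, uint64_t b,
                            uint64_t *lo, uint64_t *hi)
  {
      unsigned __int128 p = (unsigned __int128)a * b;
      *lo = (uint64_t)p;
      *hi = (uint64_t)(p >> 64);
  }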

Re: FP8 (was Compact representation for common integer constants)

<2021May9.101917@mips.complang.tuwien.ac.at>

 by: Anton Ertl - Sun, 9 May 2021 08:19 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>On 5/8/2021 8:31 AM, John Levine wrote:
>> When people were designing early computers in the 1940s, they knew
>> about floating point but didn't include it because they thought that
>> the programer could easily keep track of the scale so the extra
>> hardware complexity wasn't worth it. This was true if the programmer
>> was John von Neumann, not so true otherwise, so FP hardware showed up
>> on the 704 in the early 1950s.
>
>While all of that is true, there were other alternatives. COBOL
>supported, and still does, automatic scaling of fixed point numbers. I
>don't know if other languages support this.
>
>Of course, the time period you were discussing was before high level
>languages, and writing the code in assembler was probably "a bridge too
>far" for most programmers at the time.

Not really sure what you mean by the latter. In the early 1950s,
writing programs included steps that most of us don't know about these
days, such as laying out instructions on the rotating memory for
performance, where each instruction included the address of the next
one. What we see as machine code since the 1960s does not have to
deal with these complications; assembly language is even higher level,
and Fortran I higher level yet.

As for fixed vs. floating point, I guess that is a cross-cutting
concern. Sure, you can argue that if you spend a lot of time on
low-level steps such as coding layout, spending some time on range
analysis is a minor change, so fixed point is acceptable.

OTOH, if you have a nice numerical routine for a certain fixed-point
number range and notice that you need the same computation, but for a
different numeric range, do you want to repeat all the low-level work?
Ok, maybe you can change just a few immediate values and leave the
code as-is otherwise, but if the code includes optimizations based on
the knowledge of the values, that's not possible. In such a scenario,
floating point offers advantages, just as it does now.

What held back floating point for a long time was the slowness and/or
high cost of FP hardware, but at least in general-purpose computers
that's a thing of the past.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: FP8 (was Compact representation for common integer constants)

<s787j3$99m$1@dont-email.me>

 by: Marcus - Sun, 9 May 2021 08:47 UTC

On 2021-05-08, Stefan Monnier wrote:
>>>>> Good to know - I didn't think of that. I mostly went with the
>>>>> configuration that made the most sense: The main advantage (IMO) of
>>>>> floating-point numbers compared to fixed point numbers is the increased
>>>>> dynamic range,
>>>> Isn't that the *only* reason to do floating point?
>>> I can think of a few more advantages, such as the normalized
>>> representation that simplifies the implementation of many algorithms ...
>> The huge advantage is that it takes care of scaling automatically so
>> the programmer doesn't have to worry about it in every operation.
>
> While this is mostly true in general, the comments above were made in
> the context of an 8bit floating point format, where the exponent's range
> is sufficiently limited that it is still something the programmer very
> much has to worry about.
>

Yes, that makes sense. Specifically for FP8, the increased dynamic range
is by far the most important trait. I've even seen examples of using
1:5:2 in DNN:s (deep neural networks) where dynamic range is often more
important than precision.

>
> Stefan
>

/Marcus
