Compact representation for common integer constants

<44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16467&group=comp.arch#16467

Newsgroups: comp.arch

by: JohnG - Wed, 5 May 2021 07:25 UTC

Hey All,

I've had this for ages and figured I'd toss it out for people's consideration.

Instead of a complete binary representation, represent common integers as 1 times a small number of selected prime factors, followed by a left shift.

Example: an 8-bit value with 1 bit for the presence of a factor of 3, 2 bits selecting a factor of 1, 5, 25, or 125, and the remaining 5 bits for the left shift. This covers the integers 1, 3, 5, 15, 25, 75, 125, and 375, each left-shifted by between 0 and 31 bits.

It seemed like a nice way to encode a lot of useful constants in a compact opcode format. Variants of the basic idea are hopefully obvious.
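
A minimal sketch of how such a byte could be decoded, for illustration only. The post does not fix a field order, so the layout used here (bit 7 for the factor of 3, bits 6..5 for the power of 5, bits 4..0 for the shift) and the names `decode_const` and `encode_const` are assumptions.

```python
def decode_const(byte: int) -> int:
    """Decode an 8-bit factor/shift constant (field layout is an assumption)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    has_three = (byte >> 7) & 0x1   # 1 bit: multiply by 3 or not
    pow_five = (byte >> 5) & 0x3    # 2 bits: 5**0..5**3 -> 1, 5, 25, 125
    shift = byte & 0x1F             # 5 bits: left shift of 0..31
    return ((3 ** has_three) * (5 ** pow_five)) << shift


def encode_const(value: int):
    """Return the byte that encodes `value`, or None if it is not representable."""
    for byte in range(256):
        if decode_const(byte) == value:
            return byte
    return None
```

With this layout, 1000 (125 << 3) is encodable, while 10000 (625 << 4) is not, because it would need a fourth power of 5.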

-JohnG

Re: Compact representation for common integer constants

<s6udkp$hs5$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16470&group=comp.arch#16470

by: Ivan Godard - Wed, 5 May 2021 15:29 UTC

On 5/5/2021 12:25 AM, JohnG wrote:
> Hey All,
>
> I've had this for ages and figured I'd toss it out for people's consideration.
>
> Instead of a complete binary representation, represent common integers as 1 times a count of a small number of prime factors and a left shift.
>
> Example, an 8-bit value with 1 bit for the presence of a factor of 3, and 2 bits for factors of 5, 25, and 125, and the remaining 5 bits for left shift. So this would cover integers that were multiples of 1, 3, 5, 15, 25, 75, 125, and 375 then left-shifted between 0 and 31 bits.
>
> It seemed like a nice way to encode a lot of useful constants in a compact opcode format. Variants of the basic idea are hopefully obvious.
>
> -JohnG
>

You wouldn't want to actually do the implied multiplies, so this would
become a look-up table indexed by the byte. But if you do that then you
could have any frequent values in the table, not just those implied by
the factors. That's how Mill popCons (Popular Constants) work, with the
nicety that they are encoded only using otherwise wasted entropy in the
binary ISA.
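
A rough sketch of the table idea, under stated assumptions: this is not how Mill popCons are actually built or encoded, only a hypothetical profile-driven table in which the byte is an index and the entries are whatever constants occur most often in some corpus. The names `build_constant_table`, `observed` and `size` are made up for the example.

```python
from collections import Counter


def build_constant_table(observed, size=256):
    """Pick the `size` most frequent constants; the table index is the encoding."""
    table = [value for value, _ in Counter(observed).most_common(size)]
    index_of = {value: i for i, value in enumerate(table)}
    return table, index_of
```

Decoding is then a plain `table[index]` lookup, and the entries need not share any arithmetic structure.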

Re: Compact representation for common integer constants

<s6utud$keb$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16478&group=comp.arch#16478

by: BGB - Wed, 5 May 2021 20:07 UTC

On 5/5/2021 2:25 AM, JohnG wrote:
> Hey All,
>
> I've had this for ages and figured I'd toss it out for people's consideration.
>
> Instead of a complete binary representation, represent common integers as 1 times a count of a small number of prime factors and a left shift.
>
> Example, an 8-bit value with 1 bit for the presence of a factor of 3, and 2 bits for factors of 5, 25, and 125, and the remaining 5 bits for left shift. So this would cover integers that were multiples of 1, 3, 5, 15, 25, 75, 125, and 375 then left-shifted between 0 and 31 bits.
>
> It seemed like a nice way to encode a lot of useful constants in a compact opcode format. Variants of the basic idea are hopefully obvious.
>

My experiments with shifted-constants were not particularly promising:
Normal integer constants tend to be highly clustered around zero;
There are few obvious common patterns for most larger constants.

And, for this example, not much reason why multiples of 3 or 5 would be
significantly more common than non-multiples of 3 or 5, now combined
with needing hardware to scale things by a multiple of 3 or 5.

This can maybe deal slightly better with "casual" constants, like
"i=10000;" or similar, but these cases are a relative minority IME, and
probably not worth the cost of needing special handling in hardware.

The clustering near zero is also lopsided, with significantly more
positive constants than negative ones. The probability also seems to
fall off with roughly the square of the distance from zero.

The majority of constants which are not clustered near zero tend to fall
into one of several categories:
Hard-coded memory addresses or similar;
Bit-masks or similar;
Floating-point literals.

Memory Addresses: Not a whole lot one can do there, but these do make a
strong case for some sort of relative addressing, such as PC-Relative or
having a Global Pointer or similar.

In which case, the number of bits needed is proportional to
text+data+bss, which for "most binaries" (at least of the ones I am
testing) tends to fall in the range of 20..24 bits.
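
As a worked example of that range (a sketch, assuming an unsigned offset from a single base that must reach every byte of the image; `displacement_bits` is a hypothetical helper): the bit count grows with the base-2 logarithm of the total size, so a 1 MiB image needs 20 bits and a 16 MiB image needs 24.

```python
def displacement_bits(text: int, data: int, bss: int) -> int:
    """Bits needed for an unsigned offset that can reach any byte of the image."""
    total = text + data + bss
    if total <= 0:
        raise ValueError("image size must be positive")
    # Offsets run from 0 to total - 1, so size the field for total - 1.
    return max(1, (total - 1).bit_length())
```

A signed displacement around a pointer placed mid-image would need the same number of bits, counting the sign bit.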

Though, for relative addressing for structs and arrays, ~ 8..12 bits
seems to be "mostly sufficient" (I am having OK results with a 9-bit
displacement for load/store ops; but as noted, I typically need
"something bigger" to deal with PC-rel or GBR-rel).

In my case, since my C ABI is mostly built around GBR relative
addressing, the number of full-width addresses (48-bits, in theory) is
relatively small (in practice, most of these are references to the MMIO
area at 0000_F0000000, *1).

*1: I am almost half-tempted to eliminate this region if using 48-bit
addressing with the MMU enabled (JQ+MMU). This would force using 48-bit
MMIO addresses if the MMU is enabled, but would effectively give an
entirely flat userland region (vs, "mostly flat but with a random hole
just below the 4GB mark for MMIO and similar").

Bit Masks:
These are the most likely to benefit from some form of shifted encoding,
but need to be able to encode several types of patterns (to be useful).
0000_0010_0000_0000 / 0000_0000_0800_0000 / ...
0000_007F_FFFF_FFFF / 0000_0000_01FF_FFFF / ...
0000_3FE0_0000_0000 / 0000_0000_01FC_0000 / ...
FFFF_FFEF_FFFF_FFFF / FFFF_FFFF_F7FF_FFFF / ...
FFFF_FF80_0000_0000 / FFFF_FFFF_FE00_0000 / ...
FFFF_C01F_FFFF_FFFF / FFFF_FFFF_FE03_FFFF / ...
...

Here, one needs to hit "approximately" the right area, and have rules
for filling in the rest of the bits.

In an ideal case, one would need 16-bits of payload, with a shift, and a
few bits as a fill pattern selector. To save bits, the shift could be a
multiple of 4 or 8 bits.

This would take ~ 22-bits, which is a bit steep.
Cramming it down to 12 to 16 bits is possible, but makes it a lot less
useful (eg: 10-bit literal, 4-bit nybble-shift, 2-bit fill pattern).
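
A sketch of how the 16-bit variant might expand, under assumed semantics: the post does not define the fill selector, so here bit 0 of `fill` is taken to mean "ones below the literal" and bit 1 "ones above it". `decode_mask` and its parameter names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def decode_mask(literal: int, nybble_shift: int, fill: int) -> int:
    """Expand a 10-bit literal, 4-bit nybble shift and 2-bit fill selector."""
    if not (0 <= literal < 1 << 10 and 0 <= nybble_shift < 16 and 0 <= fill < 4):
        raise ValueError("field out of range")
    shift = 4 * nybble_shift
    value = (literal << shift) & MASK64
    if fill & 1:   # assumed meaning: fill everything below the literal with 1s
        value |= (1 << shift) - 1
    if fill & 2:   # assumed meaning: fill everything above the literal with 1s
        value |= (MASK64 << (shift + 10)) & MASK64
    return value
```

Under these assumptions `decode_mask(0x07F, 8, 1)` gives 0000_007F_FFFF_FFFF and `decode_mask(0x3FE, 9, 3)` gives FFFF_FFEF_FFFF_FFFF, but a pattern like FFFF_C01F_FFFF_FFFF is out of reach because its transition bits do not fit in one 10-bit window on a nybble boundary.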

Implementation is more difficult, as then one either needs to spend the
resources to handle it in the decoder, or re-purpose another unit (such
as the shift unit). Doing it in a way that leaves it "sufficiently
useful" (vs simpler options), is more difficult.

Shift-style units are kind of expensive, so aren't a good thing to put
in the instruction decoder.

Floating point literals:
If one has a few free spots with 16-bit immediate values, and a
Half-Float converter, then this actually works fairly well (a majority
of floating point constants which appear in code tend to be able to be
expressed exactly as Half-Float).

Main exceptions (can't be encoded exactly / at-all):
Powers of 10, eg: 1000.0, 0.1, 0.001, ...
Series of 9s, eg: 9999.0
Values that fall outside the exponent range
Values derived from irrational numbers
...

However, for: 1.0, 1.5, 0.75, 0.5, ...
They work pretty well.
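
One way to test whether a given literal qualifies, as a sketch: round-trip it through IEEE 754 binary16 using Python's `struct` format code `e` and compare. This assumes the plain IEEE half-precision format rather than any particular hardware converter; `fits_half_exactly` is a name made up for the example.

```python
import struct


def fits_half_exactly(x: float) -> bool:
    """True if x survives a round trip through IEEE 754 binary16 unchanged."""
    try:
        packed = struct.pack("<e", x)
    except OverflowError:   # magnitude beyond the binary16 range
        return False
    return struct.unpack("<e", packed)[0] == x
```

With plain binary16 this accepts 1.0, 1.5, 0.75 and 0.5, and rejects 0.1, 0.001 and 9999.0. It also accepts 1000.0, so among the powers of ten it is the fractional ones and those beyond the exponent range that fail.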

Even E4.F4 microfloats (FP8) or similar "aren't completely useless" (and
may provide a more compact way to express things like FP-SIMD vectors).

These cases may also be able to express a range of patterns which while
not themselves FP constants, happen to exist as a bit pattern which can
be represented exactly using packed FP16 or FP8. This includes a certain
subset of bit-masks.

The "exception" cases tend not to follow an obvious pattern (in terms of
global statistics) so are "at best" handled with a lookup table, whose
optimal contents will tend to vary from one application to another.

In this latter case, the main viable options then end up being to encode
constants inline, or fetch them from a software managed lookup table,
with the usual sorts of tradeoffs.

Re: Compact representation for common integer constants

<s6v2o1$krb$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16482&group=comp.arch#16482

by: Ivan Godard - Wed, 5 May 2021 21:29 UTC

On 5/5/2021 1:07 PM, BGB wrote:
> On 5/5/2021 2:25 AM, JohnG wrote:
>> Hey All,
>>
>> I've had this for ages and figured I'd toss it out for people's
>> consideration.
>>
>> Instead of a complete binary representation, represent common integers
>> as 1 times a count of a small number of prime factors and a left shift.
>>
>> Example, an 8-bit value with 1 bit for the presence of a factor of 3,
>> and 2 bits for factors of 5, 25, and 125, and the remaining 5 bits for
>> left shift. So this would cover integers that were multiples of 1, 3,
>> 5, 15, 25, 75, 125, and 375 then left-shifted between 0 and 31 bits.
>>
>> It seemed like a nice way to encode a lot of useful constants in a
>> compact opcode format. Variants of the basic idea are hopefully obvious.
>>
>
> My experiments with shifted-constants were not particularly promising:
> Normal integer constants tend to be highly clustered around zero;
> There are few obvious common patterns for most larger constants.
>
> And, for this example, not much reason why multiples of 3 or 5 would be
> significantly more common than non-multiples of 3 or 5, now combined
> with needing hardware to scale things by a multiple of 3 or 5.
>
> This can maybe deal slightly better with "casual" constants, like
> "i=10000;" or similar, but these cases are a relative minority IME, and
> probably not worth the cost of needing special handling in hardware.
>
>
> The clustering near zero is also lopsided, with significantly more
> positive constants than negative ones. The probability also seems to
> fall off with roughly the square of the distance from zero.
>
>
>
> The majority of constants which are not clustered near zero tend to fall
> into one of several categories:
> Hard-coded memory addresses or similar;
> Bit-masks or similar;
> Floating-point literals.
>
>
> Memory Addresses: Not a whole lot one can do there, but these do make a
> strong case for some sort of relative addressing, such as PC-Relative or
> having a Global Pointer or similar.
>
> In which case, the number of bits needed is proportional to
> text+data+bss, which for "most binaries" (at least of the ones I am
> testing) tends to fall in the range of 20..24 bits.
>
> Though, for relative addressing for structs and arrays, ~ 8..12 bits
> seems to be "mostly sufficient" (I am having OK results with a 9-bit
> displacement for load/store ops; but as noted, I typically need
> "something bigger" to deal with PC-rel or GBR-rel).
>
> In my case, since my C ABI is mostly built around GBR relative
> addressing, the number of full-width addresses (48-bits, in theory) is
> relatively small (in practice, most of these are references to the MMIO
> area at 0000_F0000000, *1).
>
> *1: I am almost half-tempted to eliminate this region if using 48-bit
> addressing with the MMU enabled (JQ+MMU). This would force using 48-bit
> MMIO addresses if the MMU is enabled, but would effectively give an
> entirely flat userland region (vs, "mostly flat but with a random hole
> just below the 4GB mark for MMIO and similar").
>
>
>
> Bit Masks:
> These are the most likely to benefit from some form of shifted encoding,
> but need to be able to encode several types of patterns (to be useful).
> 0000_0010_0000_0000 / 0000_0000_0800_0000 / ...
> 0000_007F_FFFF_FFFF / 0000_0000_01FF_FFFF / ...
> 0000_3FE0_0000_0000 / 0000_0000_01FC_0000 / ...
> FFFF_FFEF_FFFF_FFFF / FFFF_FFFF_F7FF_FFFF / ...
> FFFF_FF80_0000_0000 / FFFF_FFFF_FE00_0000 / ...
> FFFF_C01F_FFFF_FFFF / FFFF_FFFF_FE03_FFFF / ...
> ...
>
> Here, one needs to hit "approximately" the right area, and have rules
> for filling in the rest of the bits.
>
> In an ideal case, one would need 16-bits of payload, with a shift, and a
> few bits as a fill pattern selector. To save bits, the shift could be a
> multiple of 4 or 8 bits.
>
> This would take ~ 22-bits, which is a bit steep.
> Cramming it down to 12 to 16 bits is possible, but makes it a lot less
> useful (eg: 10-bit literal, 4-bit nybble-shift, 2-bit fill pattern).
>
>
> Implementation is more difficult, as then one either needs to spend the
> resources to handle it in the decoder, or re-purpose another unit (such
> as the shift unit). Doing it in a way that leaves it "sufficiently
> useful" (vs simpler options), is more difficult.
>
> Shift-style units are kind of expensive, so aren't a good thing to put
> in the instruction decoder.
>
>
> Floating point literals:
> If one has a few free spots with 16-bit immediate values, and a
> Half-Float converter, then this actually works fairly well (a majority
> of floating point constants which appear in code tend to be able to be
> expressed exactly as Half-Float).
>
> Main exceptions (can't be encoded exactly / at-all):
> Powers of 10, eg: 1000.0, 0.1, 0.001, ...
> Series of 9s, eg: 9999.0
> Values that fall outside the exponent range
> Values derived from irrational numbers
> ...
>
> However, for: 1.0, 1.5, 0.75, 0.5, ...
> They work pretty well.
>
> Even E4.F4 microfloats (FP8) or similar "aren't completely useless" (and
> may provide a more compact way to express things like FP-SIMD vectors).
>
> These cases may also be able to express a range of patterns which while
> not themselves FP constants, happen to exist as a bit pattern which can
> be represented exactly using packed FP16 or FP8. This includes a certain
> subset of bit-masks.
>
>
> The "exception" cases tend not to follow an obvious pattern (in terms of
> global statistics) so are "at best" handled with a lookup table, whose
> optimal contents will tend to vary from one application to another.
>
> In this latter case, the main viable options then end up being to encode
> constants inline, or fetch them from a software managed lookup table,
> with the usual sorts of tradeoffs.
>

This matches our experience with popCons very well. Let me add that not
only "half-float" works, but also quarter-float, eighth-float, etc. Low
precision/low range FP constants are quite common; the only full-sized
erratic (not in tabular data) floats are irrationals.

Mill code uses relative addressing throughout, so absolute addresses are
very rare. Besides 0..10, powers of two and ten seem to be the only ones
worth recognizing in our test corpus.

However, special-casing constants seems to be only a marginal gain in
general: at most 25-30% seem to be able to use the facility. The big win
is to have some way to compose a full size constant on the code-side
only, in one instruction without going through the data-side by loads
etc. Needing to compose a 128-bit constant from 23-bit literals is very
painful, but so is using a data fetch. I like Mitch's solution, although
it requires (or falls out of, if you prefer) bundle-ization in the
pre-decoder.
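
To put a number on "painful", a sketch assuming a hypothetical ISA whose only tool is a 23-bit literal combined by shift-and-OR (the helper names are made up): a 128-bit constant takes ceil(128 / 23) = 6 literals, and each step depends on the previous one.

```python
def split_into_literals(value: int, width: int = 128, chunk_bits: int = 23):
    """Split `value` into chunk_bits-sized pieces, most significant first."""
    if not 0 <= value < 1 << width:
        raise ValueError("value does not fit in the requested width")
    count = -(-width // chunk_bits)   # ceiling division
    mask = (1 << chunk_bits) - 1
    return [(value >> (i * chunk_bits)) & mask for i in reversed(range(count))]


def compose(chunks, chunk_bits: int = 23) -> int:
    """Rebuild the constant with one shift-and-OR step per literal."""
    value = 0
    for chunk in chunks:
        value = (value << chunk_bits) | chunk
    return value
```

The loop in `compose` is a serial dependency chain, which is the cost that a single instruction carrying the full-width constant avoids.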

Re: Compact representation for common integer constants

<824f2e40-93fd-42a8-9235-0371180d8c3an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16483&group=comp.arch#16483

by: MitchAlsup - Thu, 6 May 2021 00:07 UTC

On Wednesday, May 5, 2021 at 3:07:43 PM UTC-5, BGB wrote:
> On 5/5/2021 2:25 AM, JohnG wrote:
> > Hey All,
> >
> > I've had this for ages and figured I'd toss it out for people's consideration.
> >
> > Instead of a complete binary representation, represent common integers as 1 times a count of a small number of prime factors and a left shift.
> >
> > Example, an 8-bit value with 1 bit for the presence of a factor of 3, and 2 bits for factors of 5, 25, and 125, and the remaining 5 bits for left shift. So this would cover integers that were multiples of 1, 3, 5, 15, 25, 75, 125, and 375 then left-shifted between 0 and 31 bits.
> >
> > It seemed like a nice way to encode a lot of useful constants in a compact opcode format. Variants of the basic idea are hopefully obvious.
> >
> My experiments with shifted-constants were not particularly promising:
> Normal integer constants tend to be highly clustered around zero;
<
With 80%+ of them positive
<
> There are few obvious common patterns for most larger constants.
>
> And, for this example, not much reason why multiples of 3 or 5 would be
> significantly more common than non-multiples of 3 or 5, now combined
> with needing hardware to scale things by a multiple of 3 or 5.
>
> This can maybe deal slightly better with "casual" constants, like
> "i=10000;" or similar, but these cases are a relative minority IME, and
> probably not worth the cost of needing special handling in hardware.
<
Note: 10,000 fits in 14-bits.....
>
>
> The clustering near zero is also lopsided, with significantly more
> positive constants than negative ones. The probability also seems to
> fall off with roughly the square of the distance from zero.
<
4-to-6 positive constants to 1 negative constant for integer stuff
For logical masking stuff there is a bit more on the negative side.
>
>
>
> The majority of constants which are not clustered near zero tend to fall
> into one of several categories:
> Hard-coded memory addresses or similar;
> Bit-masks or similar;
> Floating-point literals.
>
>
> Memory Addresses: Not a whole lot one can do there, but these do make a
> strong case for some sort of relative addressing, such as PC-Relative or
> having a Global Pointer or similar.
<
Which My 66000 has...
>
> In which case, the number of bits needed is proportional to
> text+data+bss, which for "most binaries" (at least of the ones I am
> testing) tends to fall in the range of 20..24 bits.
<
I am seeing 2% of the addresses needing to be 64 bits in size on a
64-bit address space machine.
>
> Though, for relative addressing for structs and arrays, ~ 8..12 bits
> seems to be "mostly sufficient" (I am having OK results with a 9-bit
> displacement for load/store ops; but as noted, I typically need
> "something bigger" to deal with PC-rel or GBR-rel).
<
I remember that the 16-bit immediates on Mc 88K were a lot more useful
in SPEC{89,93} than the 13-bit immediates in SPARC.
>
> In my case, since my C ABI is mostly built around GBR relative
> addressing, the number of full-width addresses (48-bits, in theory) is
> relatively small (in practice, most of these are references to the MMIO
> area at 0000_F0000000, *1).
<
Why does each I/O device not have its own "base register" which you load
and then talk at it using small displacements ??
>
> *1: I am almost half-tempted to eliminate this region if using 48-bit
> addressing with the MMU enabled (JQ+MMU). This would force using 48-bit
> MMIO addresses if the MMU is enabled, but would effectively give an
> entirely flat userland region (vs, "mostly flat but with a random hole
> just below the 4GB mark for MMIO and similar").
>
>
>
> Bit Masks:
> These are the most likely to benefit from some form of shifted encoding,
> but need to be able to encode several types of patterns (to be useful).
> 0000_0010_0000_0000 / 0000_0000_0800_0000 / ...
> 0000_007F_FFFF_FFFF / 0000_0000_01FF_FFFF / ...
> 0000_3FE0_0000_0000 / 0000_0000_01FC_0000 / ...
> FFFF_FFEF_FFFF_FFFF / FFFF_FFFF_F7FF_FFFF / ...
> FFFF_FF80_0000_0000 / FFFF_FFFF_FE00_0000 / ...
> FFFF_C01F_FFFF_FFFF / FFFF_FFFF_FE03_FFFF / ...
> ...
<
Mc 88K has MAKe instruction which created a 000111000 kind of mask
using two 5-bit values. Offset and size of the 1s field. My 66000 dropped
this easy to perform instruction because immediates are more efficient
in execution time.
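
For illustration, a mask of the 000111000 kind can be built from an offset and a size roughly as in the C sketch below. The name make_mask and the treatment of a size of 0 are assumptions made for this example; they are not taken from the 88K documentation.

```c
#include <stdint.h>

/* Hypothetical helper: build a 32-bit mask with `size` one-bits
 * starting at bit `offset`. Both inputs are treated as 5-bit values
 * (0..31). A size of 0 is assumed here to give an empty mask, and
 * bits shifted past bit 31 are simply dropped. */
static uint32_t make_mask(unsigned offset, unsigned size)
{
    uint32_t ones;

    offset &= 31u;
    size &= 31u;
    ones = (size == 0) ? 0u : ((1u << size) - 1u);
    return ones << offset;
}
```

With these assumptions, make_mask(3, 3) gives binary 000111000. The same value can of course be written directly as an immediate, which is the trade-off described above.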
>
> Here, one needs to hit "approximately" the right area, and have rules
> for filling in the rest of the bits.
>
> In an ideal case, one would need 16-bits of payload, with a shift, and a
> few bits as a fill pattern selector. To save bits, the shift could be a
> multiple of 4 or 8 bits.
>
> This would take ~ 22-bits, which is a bit steep.
> Cramming it down to 12 to 16 bits is possible, but makes it a lot less
> useful (eg: 10-bit literal, 4-bit nybble-shift, 2-bit fill pattern).
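
A rough C sketch of how such a 16-bit form (10-bit literal, 4-bit nybble shift, 2-bit fill pattern) might expand to 64 bits is given here. The meaning of the two fill bits is not spelled out in the post, so the layout below (bit 0 fills everything below the literal with 1s, bit 1 fills everything above it) is purely an assumption, as is the name decode_shifted_mask.

```c
#include <stdint.h>

/* Hypothetical expansion of a 16-bit shifted-mask immediate:
 *   lit   : 10-bit literal
 *   shift : 4-bit shift count, in units of 4 bits (nybbles)
 *   fill  : bit 0 = set all bits below the literal,
 *           bit 1 = set all bits above the literal
 * The fill semantics are an assumption for illustration only. */
static uint64_t decode_shifted_mask(unsigned lit, unsigned shift, unsigned fill)
{
    unsigned sh  = (shift & 15u) * 4u;   /* bit position of the literal, 0..60 */
    unsigned top = sh + 10u;             /* first bit above the literal */
    uint64_t v   = (uint64_t)(lit & 0x3FFu) << sh;

    if (fill & 1u)
        v |= (UINT64_C(1) << sh) - 1u;   /* low fill */
    if ((fill & 2u) && top < 64u)
        v |= ~UINT64_C(0) << top;        /* high fill */
    return v;
}
```

Under this layout, three of the pattern families listed above come out as decode_shifted_mask(0x001, 9, 0), decode_shifted_mask(0x07F, 8, 1) and decode_shifted_mask(0x3F8, 9, 2). Patterns such as FFFF_C01F_FFFF_FFFF, which need 1s on both sides of a run of 0s, would use both fill bits.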
>
>
> Implementation is more difficult, as then one either needs to spend the
> resources to handle it in the decoder, or re-purpose another unit (such
> as the shift unit). Doing it in a way that leaves it "sufficiently
> useful" (vs simpler options), is more difficult.
>
> Shift-style units are kind of expensive, so aren't a good thing to put
> in the instruction decoder.
>
>
> Floating point literals:
> If one has a few free spots with 16-bit immediate values, and a
> Half-Float converter, then this actually works fairly well (a majority
> of floating point constants which appear in code tend to be able to be
> expressed exactly as Half-Float).
>
> Main exceptions (can't be encoded exactly / at-all):
> Powers of 10, eg: 1000.0, 0.1, 0.001, ...
> Series of 9s, eg: 9999.0
> Values that fall outside the exponent range
> Values derived from irrational numbers
> ...
>
> However, for: 1.0, 1.5, 0.75, 0.5, ...
> They work pretty well.
<
PDP-10 had a pretty workable scheme for float constants from a reasonably
sized constant.

My 66000 looked at this and decided not to "waste" the 16-bit immediate
OpCode space and save it for future use.
>
> Even E4.F4 microfloats (FP8) or similar "aren't completely useless" (and
> may provide a more compact way to express things like FP-SIMD vectors).
>
> These cases may also be able to express a range of patterns which while
> not themselves FP constants, happen to exist as a bit pattern which can
> be represented exactly using packed FP16 or FP8. This includes a certain
> subset of bit-masks.
>
>
> The "exception" cases tend not to follow an obvious pattern (in terms of
> global statistics) so are "at best" handled with a lookup table, whose
> optimal contents will tend to vary from one application to another.
>
> In this latter case, the main viable options then end up being to encode
> constants inline, or fetch them from a software managed lookup table,
> with the usual sorts of tradeoffs.

Re: Compact representation for common integer constants

<70eaad9c-6d4d-4464-8bbb-3058848acc44n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=16484&group=comp.arch#16484


X-Received: by 2002:aed:2010:: with SMTP id 16mr1312273qta.256.1620259797363;
Wed, 05 May 2021 17:09:57 -0700 (PDT)
X-Received: by 2002:aca:f587:: with SMTP id t129mr1057760oih.84.1620259797089;
Wed, 05 May 2021 17:09:57 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 5 May 2021 17:09:56 -0700 (PDT)
In-Reply-To: <824f2e40-93fd-42a8-9235-0371180d8c3an@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6utud$keb$1@dont-email.me> <824f2e40-93fd-42a8-9235-0371180d8c3an@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <70eaad9c-6d4d-4464-8bbb-3058848acc44n@googlegroups.com>
Subject: Re: Compact representation for common integer constants
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 06 May 2021 00:09:57 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

by: MitchAlsup - Thu, 6 May 2021 00:09 UTC

On Wednesday, May 5, 2021 at 7:07:39 PM UTC-5, MitchAlsup wrote:
> On Wednesday, May 5, 2021 at 3:07:43 PM UTC-5, BGB wrote:
> > On 5/5/2021 2:25 AM, JohnG wrote:
> > > Hey All,
> > >
> > > I've had this for ages and figured I'd toss it out for people's consideration.
> > >
> > > Instead of a complete binary representation, represent common integers as 1 times a count of a small number of prime factors and a left shift.
> > >
> > > Example, an 8-bit value with 1 bit for the presence of a factor of 3, and 2 bits for factors of 5, 25, and 125, and the remaining 5 bits for left shift. So this would cover integers that were multiples of 1, 3, 5, 15, 25, 75, 125, and 375 then left-shifted between 0 and 31 bits.
> > >
> > > It seemed like a nice way to encode a lot of useful constants in a compact opcode format. Variants of the basic idea are hopefully obvious.
> > >
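
As a concrete reading of the 8-bit example quoted above, a decoder might look like the C sketch below. The placement of the three fields within the byte and the name decode_factored are assumptions; the proposal only gives the field widths.

```c
#include <stdint.h>

/* Hypothetical bit layout for the 8-bit example (an assumption):
 *   bit 7    : factor of 3 present
 *   bits 6:5 : power of 5 (0..3 -> 1, 5, 25, 125)
 *   bits 4:0 : left shift (0..31) */
static uint64_t decode_factored(uint8_t enc)
{
    static const uint16_t pow5[4] = { 1, 5, 25, 125 };
    uint64_t v = pow5[(enc >> 5) & 3];

    if (enc & 0x80)
        v *= 3;
    return v << (enc & 31);   /* at most 375 << 31, which fits in 40 bits */
}
```

Under this layout 1000 is encodable as 125 << 3, while 10000 = 625 << 4 is not, because 625 is beyond what the 2-bit power-of-5 field can express.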
> > My experiments with shifted-constants were not particularly promising:
> > Normal integer constants tend to be highly clustered around zero;
> <
> With 80%+ of them positive
> <
> > There are few obvious common patterns for most larger constants.
> >
> > And, for this example, not much reason why multiples of 3 or 5 would be
> > significantly more common than non-multiples of 3 or 5, now combined
> > with needing hardware to scale things by a multiple of 3 or 5.
> >
> > This can maybe deal slightly better with "casual" constants, like
> > "i=10000;" or similar, but these cases are a relative minority IME, and
> > probably not worth the cost of needing special handling in hardware.
> <
> Note: 10,000 fits in 14-bits.....
> >
> >
> > The clustering near zero is also lopsided, with significantly more
> > positive constants than negative ones. The probability also seems to
> > fall off with roughly the square of the distance from zero.
> <
> 4-to-6 positive constants to 1 negative constant for integer stuff
> For logical masking stuff there is a bit more on the negative side.
> >
> >
> >
> > The majority of constants which are not clustered near zero tend to fall
> > into one of several categories:
> > Hard-coded memory addresses or similar;
> > Bit-masks or similar;
> > Floating-point literals.
> >
> >
> > Memory Addresses: Not a whole lot one can do there, but these do make a
> > strong case for some sort of relative addressing, such as PC-Relative or
> > having a Global Pointer or similar.
> <
> Which My 66000 has...
> >
> > In which case, the number of bits needed is proportional to
> > text+data+bss, which for "most binaries" (at least of the ones I am
> > testing) tends to fall in the range of 20..24 bits.
> <
> I am seeing 2% of the addresses needing to be 64 bits in size on a
> 64-bit address space machine.
> >
> > Though, for relative addressing for structs and arrays, ~ 8..12 bits
> > seems to be "mostly sufficient" (I am having OK results with a 9-bit
> > displacement for load/store ops; but as noted, I typically need
> > "something bigger" to deal with PC-rel or GBR-rel).
> <
> I remember that the 16-bit immediates on Mc 88K were a lot more useful
> in SPEC{89,93} than the 13-bit immediates in SPARC.
> >
> > In my case, since my C ABI is mostly built around GBR relative
> > addressing, the number of full-width addresses (48-bits, in theory) is
> > relatively small (in practice, most of these are references to the MMIO
> > area at 0000_F0000000, *1).
> <
> Why does each I/O device not have its own "base register" which you load
> and then talk at it using small displacements ??
> >
> > *1: I am almost half-tempted to eliminate this region if using 48-bit
> > addressing with the MMU enabled (JQ+MMU). This would force using 48-bit
> > MMIO addresses if the MMU is enabled, but would effectively give an
> > entirely flat userland region (vs, "mostly flat but with a random hole
> > just below the 4GB mark for MMIO and similar").
> >
> >
> >
> > Bit Masks:
> > These are the most likely to benefit from some form of shifted encoding,
> > but need to be able to encode several types of patterns (to be useful).
> > 0000_0010_0000_0000 / 0000_0000_0800_0000 / ...
> > 0000_007F_FFFF_FFFF / 0000_0000_01FF_FFFF / ...
> > 0000_3FE0_0000_0000 / 0000_0000_01FC_0000 / ...
> > FFFF_FFEF_FFFF_FFFF / FFFF_FFFF_F7FF_FFFF / ...
> > FFFF_FF80_0000_0000 / FFFF_FFFF_FE00_0000 / ...
> > FFFF_C01F_FFFF_FFFF / FFFF_FFFF_FE03_FFFF / ...
> > ...
> <
> Mc 88K has MAKe instruction which created a 000111000 kind of mask
> using two 5-bit values. Offset and size of the 1s field. My 66000 dropped
> this easy to perform instruction because immediates are more efficient
> in execution time.
> >
> > Here, one needs to hit "approximately" the right area, and have rules
> > for filling in the rest of the bits.
> >
> > In an ideal case, one would need 16-bits of payload, with a shift, and a
> > few bits as a fill pattern selector. To save bits, the shift could be a
> > multiple of 4 or 8 bits.
> >
> > This would take ~ 22-bits, which is a bit steep.
> > Cramming it down to 12 to 16 bits is possible, but makes it a lot less
> > useful (eg: 10-bit literal, 4-bit nybble-shift, 2-bit fill pattern).
> >
> >
> > Implementation is more difficult, as then one either needs to spend the
> > resources to handle it in the decoder, or re-purpose another unit (such
> > as the shift unit). Doing it in a way that leaves it "sufficiently
> > useful" (vs simpler options), is more difficult.
> >
> > Shift-style units are kind of expensive, so aren't a good thing to put
> > in the instruction decoder.
> >
> >
> > Floating point literals:
> > If one has a few free spots with 16-bit immediate values, and a
> > Half-Float converter, then this actually works fairly well (a majority
> > of floating point constants which appear in code tend to be able to be
> > expressed exactly as Half-Float).
> >
> > Main exceptions (can't be encoded exactly / at-all):
> > Powers of 10, eg: 1000.0, 0.1, 0.001, ...
> > Series of 9s, eg: 9999.0
> > Values that fall outside the exponent range
> > Values derived from irrational numbers
> > ...
> >
> > However, for: 1.0, 1.5, 0.75, 0.5, ...
> > They work pretty well.
> <
> PDP-10 had a pretty workable scheme for float constants from a reasonably
> sized constant.
>
> My 66000 looked at this and decided not to "waste" the 16-bit immediate
> OpCode space and save it for future use.
<
However, a 32-bit constant can be used to hold a 64-bit FP value when the
OpCode decoder can see the constant consumer is double precision.
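
One plausible reading of this (an assumption; the mechanism is not spelled out here) is that the 32-bit constant is a single-precision bit pattern that gets widened to double. A compiler could then test whether a given double constant qualifies with something like the C sketch below. The name fits_in_float_imm is made up for the example.

```c
#include <float.h>

/* Returns 1 if `d` survives a round trip through single precision,
 * i.e. a 32-bit float immediate could stand in for the 64-bit value.
 * NaNs and values outside the float range are reported as not fitting. */
static int fits_in_float_imm(double d)
{
    if (d > FLT_MAX || d < -FLT_MAX)
        return 0;   /* out of range; also avoids an undefined conversion */
    return (double)(float)d == d;
}
```

Constants such as 1.5 or 0.75 pass this test, while 0.1 does not, since its double-precision value has more significant bits than a float can hold.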
<
> >
> > Even E4.F4 microfloats (FP8) or similar "aren't completely useless" (and
> > may provide a more compact way to express things like FP-SIMD vectors).
> >
> > These cases may also be able to express a range of patterns which while
> > not themselves FP constants, happen to exist as a bit pattern which can
> > be represented exactly using packed FP16 or FP8. This includes a certain
> > subset of bit-masks.
> >
> >
> > The "exception" cases tend not to follow an obvious pattern (in terms of
> > global statistics) so are "at best" handled with a lookup table, whose
> > optimal contents will tend to vary from one application to another.
> >
> > In this latter case, the main viable options then end up being to encode
> > constants inline, or fetch them from a software managed lookup table,
> > with the usual sorts of tradeoffs.

Re: Compact representation for common integer constants

<s6vpfb$cvd$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=16486&group=comp.arch#16486


Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Wed, 5 May 2021 22:57:24 -0500
Organization: A noiseless patient Spider
Lines: 219
Message-ID: <s6vpfb$cvd$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6utud$keb$1@dont-email.me> <s6v2o1$krb$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 6 May 2021 03:57:31 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="96a964d1b7eb11fa5e85b3d22502b7c0";
logging-data="13293"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+JUnNHNwhVUNKZXLKI3Eqn"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1:GxlDpXlmsqUJP1c9kgGimXlUC9M=
In-Reply-To: <s6v2o1$krb$1@dont-email.me>
Content-Language: en-US

by: BGB - Thu, 6 May 2021 03:57 UTC

On 5/5/2021 4:29 PM, Ivan Godard wrote:
> On 5/5/2021 1:07 PM, BGB wrote:
>> On 5/5/2021 2:25 AM, JohnG wrote:
>>> Hey All,
>>>
>>> I've had this for ages and figured I'd toss it out for people's
>>> consideration.
>>>
>>> Instead of a complete binary representation, represent common
>>> integers as 1 times a count of a small number of prime factors and a
>>> left shift.
>>>
>>> Example, an 8-bit value with 1 bit for the presence of a factor of 3,
>>> and 2 bits for factors of 5, 25, and 125, and the remaining 5 bits
>>> for left shift. So this would cover integers that were multiples of
>>> 1, 3, 5, 15, 25, 75, 125, and 375 then left-shifted between 0 and 31
>>> bits.
>>>
>>> It seemed like a nice way to encode a lot of useful constants in a
>>> compact opcode format. Variants of the basic idea are hopefully obvious.
>>>
>>
>> My experiments with shifted-constants were not particularly promising:
>>   Normal integer constants tend to be highly clustered around zero;
>>   There are few obvious common patterns for most larger constants.
>>
>> And, for this example, not much reason why multiples of 3 or 5 would
>> be significantly more common than non-multiples of 3 or 5, now
>> combined with needing hardware to scale things by a multiple of 3 or 5.
>>
>> This can maybe deal slightly better with "casual" constants, like
>> "i=10000;" or similar, but these cases are a relative minority IME,
>> and probably not worth the cost of needing special handling in hardware.
>>
>>
>> The clustering near zero is also lopsided, with significantly more
>> positive constants than negative ones. The probability also seems to
>> fall off with roughly the square of the distance from zero.
>>
>>
>>
>> The majority of constants which are not clustered near zero tend to
>> fall into one of several categories:
>>   Hard-coded memory addresses or similar;
>>   Bit-masks or similar;
>>   Floating-point literals.
>>
>>
>> Memory Addresses: Not a whole lot one can do there, but these do make
>> a strong case for some sort of relative addressing, such as
>> PC-Relative or having a Global Pointer or similar.
>>
>> In which case, the number of bits needed is proportional to
>> text+data+bss, which for "most binaries" (at least of the ones I am
>> testing) tends to fall in the range of 20..24 bits.
>>
>> Though, for relative addressing for structs and arrays, ~ 8..12 bits
>> seems to be "mostly sufficient" (I am having OK results with a 9-bit
>> displacement for load/store ops; but as noted, I typically need
>> "something bigger" to deal with PC-rel or GBR-rel).
>>
>> In my case, since my C ABI is mostly built around GBR relative
>> addressing, the number of full-width addresses (48-bits, in theory) is
>> relatively small (in practice, most of these are references to the
>> MMIO area at 0000_F0000000, *1).
>>
>> *1: I am almost half-tempted to eliminate this region if using 48-bit
>> addressing with the MMU enabled (JQ+MMU). This would force using
>> 48-bit MMIO addresses if the MMU is enabled, but would effectively
>> give an entirely flat userland region (vs, "mostly flat but with a
>> random hole just below the 4GB mark for MMIO and similar").
>>
>>
>>
>> Bit Masks:
>> These are the most likely to benefit from some form of shifted
>> encoding, but need to be able to encode several types of patterns (to
>> be useful).
>>   0000_0010_0000_0000 / 0000_0000_0800_0000 / ...
>>   0000_007F_FFFF_FFFF / 0000_0000_01FF_FFFF / ...
>>   0000_3FE0_0000_0000 / 0000_0000_01FC_0000 / ...
>>   FFFF_FFEF_FFFF_FFFF / FFFF_FFFF_F7FF_FFFF / ...
>>   FFFF_FF80_0000_0000 / FFFF_FFFF_FE00_0000 / ...
>>   FFFF_C01F_FFFF_FFFF / FFFF_FFFF_FE03_FFFF / ...
>>   ...
>>
>> Here, one needs to hit "approximately" the right area, and have rules
>> for filling in the rest of the bits.
>>
>> In an ideal case, one would need 16-bits of payload, with a shift, and
>> a few bits as a fill pattern selector. To save bits, the shift could
>> be a multiple of 4 or 8 bits.
>>
>> This would take ~ 22-bits, which is a bit steep.
>> Cramming it down to 12 to 16 bits is possible, but makes it a lot less
>> useful (eg: 10-bit literal, 4-bit nybble-shift, 2-bit fill pattern).
>>
>>
>> Implementation is more difficult, as then one either needs to spend
>> the resources to handle it in the decoder, or re-purpose another unit
>> (such as the shift unit). Doing it in a way that leaves it
>> "sufficiently useful" (vs simpler options), is more difficult.
>>
>> Shift-style units are kind of expensive, so aren't a good thing to put
>> in the instruction decoder.
>>
>>
>> Floating point literals:
>> If one has a few free spots with 16-bit immediate values, and a
>> Half-Float converter, then this actually works fairly well (a majority
>> of floating point constants which appear in code tend to be able to be
>> expressed exactly as Half-Float).
>>
>> Main exceptions (can't be encoded exactly / at-all):
>>   Powers of 10, eg: 1000.0, 0.1, 0.001, ...
>>   Series of 9s, eg: 9999.0
>>   Values that fall outside the exponent range
>>   Values derived from irrational numbers
>>   ...
>>
>> However, for: 1.0, 1.5, 0.75, 0.5, ...
>> They work pretty well.
>>
>> Even E4.F4 microfloats (FP8) or similar "aren't completely useless"
>> (and may provide a more compact way to express things like FP-SIMD
>> vectors).
>>
>> These cases may also be able to express a range of patterns which
>> while not themselves FP constants, happen to exist as a bit pattern
>> which can be represented exactly using packed FP16 or FP8. This
>> includes a certain subset of bit-masks.
>>
>>
>> The "exception" cases tend not to follow an obvious pattern (in terms
>> of global statistics) so are "at best" handled with a lookup table,
>> whose optimal contents will tend to vary from one application to another.
>>
>> In this latter case, the main viable options then end up being to
>> encode constants inline, or fetch them from a software managed lookup
>> table, with the usual sorts of tradeoffs.
>>
>
> This matches our experience with popCons very well. Let me add that not
> only "half-float" works, but also quarter-float, eighth-float, etc. Low
> precision/low range FP constants are quite common; the only full-sized
> erratic (not in tabular data) floats are irrationals.
>

Quarter-Float is essentially what FP8 is, though in BJX2 only comes in
packed form via conversion ops (4xFP8 to 4xFP16, or 2xFP8 to 2xFP32).

These don't currently have constant-load forms, so loading a vector this
way is a multi-op sequence.

The 8-bit format also comes in both signed and unsigned variants:
FP8U: E4.F4, bias=7
FP8S: S.E4.F3, bias=7

When used for image data, FP8 has similar image quality to RGB555,
although it can express HDR images and preserves more detail in darker
parts of the image.
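
To make the two layouts concrete, here is a small C sketch that widens FP8U and FP8S values to binary32. It is only a sketch: the handling of an exponent field of 0 (decoded as zero, no subnormals) and of 15 (an ordinary value, no Inf/NaN) are assumptions, and the function names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of FP8U (E4.F4, bias=7) -> binary32.
 * Assumptions: exponent field 0 decodes to 0.0 (no subnormals),
 * exponent field 15 is an ordinary value (no Inf/NaN). */
static float fp8u_to_float(uint8_t b)
{
    uint32_t e = (b >> 4) & 15u;
    uint32_t f = b & 15u;
    uint32_t bits;
    float out;

    if (e == 0)
        return 0.0f;
    /* Rebias 7 -> 127; put the 4 fraction bits at the top of the
     * 23-bit binary32 fraction field. */
    bits = ((e - 7u + 127u) << 23) | (f << 19);
    memcpy(&out, &bits, sizeof out);
    return out;
}

/* FP8S (S.E4.F3, bias=7): same idea, with a sign bit and 3 fraction bits. */
static float fp8s_to_float(uint8_t b)
{
    float mag = fp8u_to_float((uint8_t)((b & 0x7Fu) << 1));
    return (b & 0x80u) ? -mag : mag;
}
```

Under these assumptions the FP8U bytes 0x70, 0x78, 0x68 and 0x60 decode to 1.0, 1.5, 0.75 and 0.5 respectively.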

> Mill code uses relative addressing throughout, so absolute addresses are
> very rare. Besides 0..10, powers of two and ten seem to be the only ones
> worth recognizing in our test corpus.
>
> However, special-casing constants seems to be only a marginal gain in
> general: at most 25-30% seem to be able to use the facility. The big win
> is to have some way to compose a full size constant on the code-side
> only, in one instruction without going through the data-side by loads
> etc. Needing to compose a 128-bit constant from 23-bit literals is very
> painful, but so is using a data fetch. I like Mitch's solution, although
> it requires (or falls out of, if you prefer) bundle-ization in the
> pre-decoder.

The BJX2 ISA can compose 32 or 64-bit constants inline within a single
clock-cycle (via "jumbo prefixes"), though doing so still requires using
64 or 96 bits worth of encoding space.

Only a few instructions can encode a full-width 64-bit immediate though, eg:
MOV Imm64, Rn
ADD Imm64, Rn
...

Most other operations are currently limited to a 33 bit immediate.

Though 56-bit immediate values technically exist, only certain ops can
actually use them (eg, ADD/SUB/...). Things like memory loads are
in-practice currently limited to a 33-bit displacement.


Re: Compact representation for common integer constants

<s6vpfq$cvd$2@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=16487&group=comp.arch#16487


Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Wed, 5 May 2021 22:57:40 -0500
Organization: A noiseless patient Spider
Lines: 326
Message-ID: <s6vpfq$cvd$2@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6utud$keb$1@dont-email.me>
<824f2e40-93fd-42a8-9235-0371180d8c3an@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 6 May 2021 03:57:46 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="96a964d1b7eb11fa5e85b3d22502b7c0";
logging-data="13293"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/8kuVXHAIFD8VNZy9geiwn"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1://MYHPngMhrc7DIGNs/MZy98jHE=
In-Reply-To: <824f2e40-93fd-42a8-9235-0371180d8c3an@googlegroups.com>
Content-Language: en-US

by: BGB - Thu, 6 May 2021 03:57 UTC

On 5/5/2021 7:07 PM, MitchAlsup wrote:
> On Wednesday, May 5, 2021 at 3:07:43 PM UTC-5, BGB wrote:
>> On 5/5/2021 2:25 AM, JohnG wrote:
>>> Hey All,
>>>
>>> I've had this for ages and figured I'd toss it out for people's consideration.
>>>
>>> Instead of a complete binary representation, represent common integers as 1 times a count of a small number of prime factors and a left shift.
>>>
>>> Example, an 8-bit value with 1 bit for the presence of a factor of 3, and 2 bits for factors of 5, 25, and 125, and the remaining 5 bits for left shift. So this would cover integers that were multiples of 1, 3, 5, 15, 25, 75, 125, and 375 then left-shifted between 0 and 31 bits.
>>>
>>> It seemed like a nice way to encode a lot of useful constants in a compact opcode format. Variants of the basic idea are hopefully obvious.
>>>
>> My experiments with shifted-constants were not particularly promising:
>> Normal integer constants tend to be highly clustered around zero;
> <
> With 80%+ of them positive
> <

Yes, granted.

>> There are few obvious common patterns for most larger constants.
>>
>> And, for this example, not much reason why multiples of 3 or 5 would be
>> significantly more common than non-multiples of 3 or 5, now combined
>> with needing hardware to scale things by a multiple of 3 or 5.
>>
>> This can maybe deal slightly better with "casual" constants, like
>> "i=10000;" or similar, but these cases are a relative minority IME, and
>> probably not worth the cost of needing special handling in hardware.
> <
> Note: 10,000 fits in 14-bits.....

True, but as can be noted, most 3R encodings need a jumbo prefix in my
case to hold a value larger than 9 bits.

Eg:
ADD R12, 291, R29
Can be encoded using a 32-bit form, whereas:
ADD R12, 9999, R29
Needs a 64-bit encoding.
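
As a rough sketch of the size rule implied by these two examples, assuming the 9-bit immediate is zero-extended (which is what 291 fitting suggests) and ignoring any separate handling of negative values or the larger 96-bit forms:

```c
#include <stdint.h>

/* Hypothetical size rule for a 3R ADD with an immediate: a zero-extended
 * 9-bit immediate fits the plain 32-bit form; anything else is assumed
 * here to need a jumbo prefix, for 64 bits in total. */
static unsigned add_imm_encoding_bits(int64_t imm)
{
    return (imm >= 0 && imm < (1 << 9)) ? 32u : 64u;
}
```

The function name and the exact cutoff are assumptions made for illustration. With them, 291 maps to the 32-bit form and 9999 to the 64-bit form, matching the two ADD examples.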

>>
>>
>> The clustering near zero is also lopsided, with significantly more
>> positive constants than negative ones. The probability also seems to
>> fall off with roughly the square of the distance from zero.
> <
> 4-to-6 positive constants to 1 negative constant for integer stuff
> For logical masking stuff there is a bit more on the negative side.

Yep.

>>
>>
>>
>> The majority of constants which are not clustered near zero tend to fall
>> into one of several categories:
>> Hard-coded memory addresses or similar;
>> Bit-masks or similar;
>> Floating-point literals.
>>
>>
>> Memory Addresses: Not a whole lot one can do there, but these do make a
>> strong case for some sort of relative addressing, such as PC-Relative or
>> having a Global Pointer or similar.
> <
> Which My 66000 has...

Likewise, BJX2 has these as well.

Not every ISA has them though...

>>
>> In which case, the number of bits needed is proportional to
>> text+data+bss, which for "most binaries" (at least of the ones I am
>> testing) tends to fall in the range of 20..24 bits.
> <
> I am seeing 2% of the addresses needing to be 64 bits in size on a
> 64-bit address space machine.

Pretty much.

I have a 48-bit space, but thus far pretty much everything sits in the
low 4GB.

I figure I may as well use the full pointer size though, since the
relative cost isn't too drastic, and dealing with 32/64 ABI
compatibility issues is likely to be a bigger issue in the future than
needing to use ~ 3% more RAM.

Code size isn't really affected much either way, though could be
slightly better if 48-bit Disp24 encodings or similar were reintroduced,
but this is unlikely at the moment.

I actually saw a bigger theoretical gain from the 24-bit encodings. As
can be noted, in the case where they were relevant, despite eliminating
a fair chunk of the 32-bit encodings, the overall savings were fairly
modest.

Say, program is:
70% 16-bit ops, 30% 32-bit ops.
And, we eliminate 1/3 of the 32-bit ops:
70% 16-bit ops, 20% 32-bit ops, 10% 24-bit ops.

Binary is only 4% smaller...
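
The arithmetic behind that figure can be checked with a few lines of C, assuming the percentages are fractions of the instruction count (the helper names are made up for the example):

```c
/* Average bytes per instruction for a mix of 16-, 24- and 32-bit ops,
 * with each class given as a fraction of the instruction count. */
static double avg_insn_bytes(double f16, double f24, double f32)
{
    return f16 * 2.0 + f24 * 3.0 + f32 * 4.0;
}

/* Fractional size reduction when going from one average to another. */
static double size_saving(double before, double after)
{
    return 1.0 - after / before;
}
```

Here avg_insn_bytes(0.7, 0.0, 0.3) is 2.6 bytes and avg_insn_bytes(0.7, 0.1, 0.2) is 2.5 bytes, so size_saving comes out a little under 4%.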

For comparison, LZ4 packing tends to save ~ 40-60%, though with the
tradeoff that it is necessary to unpack the code before it can be
executed (so, doesn't make much sense with a small RAM space).

Some sort of hardware-level LZ packing could be possible, in theory, but
would do evil things to the L1 I$; it would effectively need to be
multiple L1 caches glued together in order to be able to support the
in-flight LZ unpacking. The impact on the compiler would likely also be
kinda evil as it would effectively need to perform both code generation
and dictionary compression at the same time (and the constraints imposed
on it are likely to make it somewhat less effective than a traditional
LZ compressor).

>>
>> Though, for relative addressing for structs and arrays, ~ 8..12 bits
>> seems to be "mostly sufficient" (I am having OK results with a 9-bit
>> displacement for load/store ops; but as noted, I typically need
>> "something bigger" to deal with PC-rel or GBR-rel).
> <
> I remember that the 16-bit immediates on Mc 88K were a lot more useful
> in SPEC{89,93} than the 13-bit immediates in SPARC.

Could be, dunno...

As noted, a 9-bit displacement with a struct, for 32 and 64-bit
load/store, allows a struct of 2K or 4K.

Given a majority of structs tend to be less than 512 bytes, it works out
OK for the most part.
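
The 2K and 4K figures follow if the displacement is unsigned and scaled by the access size, which is an assumption in the sketch below (as is the function name):

```c
/* Bytes reachable from a base pointer with an unsigned displacement of
 * `disp_bits` bits that is scaled by the access size in bytes. */
static unsigned long disp_reach_bytes(unsigned disp_bits, unsigned access_bytes)
{
    return (1ul << disp_bits) * access_bytes;
}
```

For a 9-bit displacement this gives 2048 bytes with 32-bit accesses and 4096 bytes with 64-bit accesses.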

Albeit, large structs end up needing 64-bit jumbo encodings to deal with
the field displacements.

Some of this is partly a tradeoff of being able to support 16-bit
instructions, WEX encodings, predicated instructions, ... These things
did cut a few bits off the encoding space.

This was compensated for originally by being able to readily load
constants into registers when needed.

>>
>> In my case, since my C ABI is mostly built around GBR relative
>> addressing, the number of full-width addresses (48-bits, in theory) is
>> relatively small (in practice, most of these are references to the MMIO
>> area at 0000_F0000000, *1).

> <
> Why does each I/O device not have its own "base register" which you load
> and then talk at it using small displacements ??

Each MMIO device does have a base address and range...

But one needs to load an address to it, somewhere...

so, eg, something like:
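#include <stdint.h>

/* SOMEDEVICE_STATUS and SOMEDEVICE_DATAQ are register indices
   assumed to be defined elsewhere. */
volatile uint32_t *somedevice_regs32;
volatile uint64_t *somedevice_regs64;

void somedevice_init(void)
{
    /* MMIO base inside the region just below the 4GB mark */
    somedevice_regs32 = (volatile uint32_t *)0x0000F00CD000ULL;
    somedevice_regs64 = (volatile uint64_t *)somedevice_regs32;
}

uint64_t somedevice_access(void)
{
    uint32_t fl;
    uint64_t v;

    fl = somedevice_regs32[SOMEDEVICE_STATUS];
    /* ... check status or whatever ... */
    v = somedevice_regs64[SOMEDEVICE_DATAQ];
    return v;
}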

If I make the change I mentioned, it would be necessary to instead
write, say:
somedevice_regs32 = (volatile uint32_t *)0xF000000CD000ULL;
somedevice_regs64 = (volatile uint64_t *)somedevice_regs32;

But, they would correspond to the same address as far as the hardware is
concerned.

There is also an MMIO bypass range at 0xC00000000000...

So:
void *mmu_cant_touch_this;
mmu_cant_touch_this = (void *)0xC00001234000ULL;
But, userland code "can't touch this" either...

Though, with the tradeoff that the L1 cache won't necessarily see them as
the same memory. If a store is performed to one address and then read
from another which happens to alias to the same underlying memory, the
results of the first store may not (necessarily) be visible...

However, in simple cases, the direct-mapped L1 is likely to evict one
location before it loads the other. This pattern will fall on its face
if the MMU is used though.

In the future, though, it is possible that the caches
could be made to keep track of this stuff, and possibly the L1 caches
would signal their intentions for a cache-line to the L2, which would
keep track of things, and then notify the L1 caches about cases where
they need to flush certain cache lines.

>>
>> *1: I am almost half-tempted to eliminate this region if using 48-bit
>> addressing with the MMU enabled (JQ+MMU). This would force using 48-bit
>> MMIO addresses if the MMU is enabled, but would effectively give an
>> entirely flat userland region (vs, "mostly flat but with a random hole
>> just below the 4GB mark for MMIO and similar").
>>
>>
>>
>> Bit Masks:
>> These are the most likely to benefit from some form of shifted encoding,
>> but need to be able to encode several types of patterns (to be useful).
>> 0000_0010_0000_0000 / 0000_0000_0800_0000 / ...
>> 0000_007F_FFFF_FFFF / 0000_0000_01FF_FFFF / ...
>> 0000_3FE0_0000_0000 / 0000_0000_01FC_0000 / ...
>> FFFF_FFEF_FFFF_FFFF / FFFF_FFFF_F7FF_FFFF / ...
>> FFFF_FF80_0000_0000 / FFFF_FFFF_FE00_0000 / ...
>> FFFF_C01F_FFFF_FFFF / FFFF_FFFF_FE03_FFFF / ...
>> ...
> <
> Mc 88K has MAKe instruction which created a 000111000 kind of mask
> using two 5-bit values. Offset and size of the 1s field. My 66000 dropped
> this easy to perform instruction because immediates are more efficient
> in execution time.


Re: Compact representation for common integer constants

<s70cjc$1ue$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=16489&group=comp.arch#16489


Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Thu, 6 May 2021 11:23:55 +0200
Organization: A noiseless patient Spider
Lines: 36
Message-ID: <s70cjc$1ue$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 6 May 2021 09:23:56 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="ccb1bf6b05e170810f1a3941dd969192";
logging-data="1998"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19fZUExbUU6IH1oP/QWkswPsSOR9hPj0vE="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Cancel-Lock: sha1:Mdg6vd3WwGaKvTAAGnOYsgwGzSU=
In-Reply-To: <s6udkp$hs5$1@dont-email.me>
Content-Language: en-GB

by: David Brown - Thu, 6 May 2021 09:23 UTC

On 05/05/2021 17:29, Ivan Godard wrote:
> On 5/5/2021 12:25 AM, JohnG wrote:
>> Hey All,
>>
>> I've had this for ages and figured I'd toss it out for people's
>> consideration.
>>
>> Instead of a complete binary representation, represent common integers
>> as 1 times a count of a small number of prime factors and a left shift.
>>
>> Example, an 8-bit value with 1 bit for the presence of a factor of 3,
>> and 2 bits for factors of 5, 25, and 125, and the remaining 5 bits for
>> left shift. So this would cover integers that were multiples of 1, 3,
>> 5, 15, 25, 75, 125, and 375 then left-shifted between 0 and 31 bits.
>>
>> It seemed like a nice way to encode a lot of useful constants in a
>> compact opcode format. Variants of the basic idea are hopefully obvious.
>>
>> -JohnG
>>
>
> You wouldn't want to actually do the implied multiplies, so this would
> become a look-up table indexed by the byte. But if you do that then you
> could have any frequent values in the table, not just those implied by
> the factors. That's how Mill popCons (Popular Constants) work, with the
> nicety that they are encoded only using otherwise wasted entropy in the
> binary ISA.

That's the way to do it, IMHO. A table also keeps the size of the
constants independent from the number of bits used for the index. And
it lets you include heavily used constants such as 42 even though they
don't fit into a nice pattern.

By the time you've got perhaps -1, 0, 1, 2, 4, 8, 10, 255 (taking 3 bits
to encode), you've covered a lot. I'd guess a 5 bit table would suffice.
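
A minimal Python sketch of the table approach, for illustration only. The names `POPULAR_CONSTANTS`, `decode_index` and `encode_constant` are made up here, and the table contents are just the eight values listed above in an arbitrary order; this is not Mill's actual popCons table, which is not given in this thread.

```python
# Hypothetical 3-bit constant table (eight entries, any order would do).
POPULAR_CONSTANTS = [-1, 0, 1, 2, 4, 8, 10, 255]


def decode_index(index: int) -> int:
    """Return the constant selected by a table index."""
    if not 0 <= index < len(POPULAR_CONSTANTS):
        raise ValueError("index does not fit the table")
    return POPULAR_CONSTANTS[index]


def encode_constant(value: int):
    """Return the table index for `value`, or None if it is not in the table."""
    try:
        return POPULAR_CONSTANTS.index(value)
    except ValueError:
        return None
```

Note that the entries can be of any width: the index size only limits how many constants there are, not how large they can be.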

Re: Compact representation for common integer constants

<s713uv$707$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=16492&group=comp.arch#16492

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!xmission!usenet.csail.mit.edu!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Thu, 6 May 2021 16:02:39 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <s713uv$707$1@gal.iecc.com>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
Injection-Date: Thu, 6 May 2021 16:02:39 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="7175"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)

by: John Levine - Thu, 6 May 2021 16:02 UTC

According to JohnG <gomijacogeo@gmail.com>:
>Example, an 8-bit value with 1 bit for the presence of a factor of 3, and 2 bits for factors of 5, 25, and 125, and the remaining 5 bits
>for left shift. So this would cover integers that were multiples of 1, 3, 5, 15, 25, 75, 125, and 375 then left-shifted between 0 and 31
>bits.
>
>It seemed like a nice way to encode a lot of useful constants in a compact opcode format. Variants of the basic idea are hopefully obvious.

It's an old idea, which doesn't mean it's a bad one. It is my impression
that you can easily outsmart yourself and once you get beyond small
integers, there's no consistency in what values are popular.
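
For reference, the quoted 8-bit encoding could be decoded as in the Python sketch below. The bit layout (factor-of-3 flag in bit 7, power of 5 in bits 6:5, shift count in bits 4:0) and the name `decode_factored` are assumptions; the quoted post does not fix a field order.

```python
def decode_factored(byte: int) -> int:
    """Decode an 8-bit (3^a * 5^b) << s constant.

    Assumed layout: bit 7 = a (0..1), bits 6:5 = b (0..3), bits 4:0 = s (0..31).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    has_three = (byte >> 7) & 1
    fives = (byte >> 5) & 3
    shift = byte & 0x1F
    return ((3 ** has_three) * (5 ** fives)) << shift
```

Under this layout, 1000 would be encoded as 125 shifted left by 3, while a value such as 42 (which has a factor of 7) has no encoding at all.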

The VAX, among its zillion addressing modes, had an immediate one with
a six-bit value. Depending on the instruction, it was interpreted
as an integer between 0 and 63 or as a floating point number with
a three-bit exponent and three-bit fraction. That let them represent
integers from 1 to 16, 0.5, and a bunch of less common values.
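
A rough Python sketch of how such a six-bit literal could be interpreted. The field order (exponent in the high three bits, fraction in the low three) and the value formula (a hidden leading bit, giving a range of 0.5 to 120) are my recollection of the format and should be treated as assumptions; `vax_short_literal` is a made-up name.

```python
def vax_short_literal(bits: int, floating: bool):
    """Interpret a 6-bit short literal as an integer or a small float.

    Assumed float layout: bits 5:3 = exponent, bits 2:0 = fraction,
    value = (8 + fraction) / 16 * 2**exponent.
    """
    if not 0 <= bits <= 0x3F:
        raise ValueError("expected a 6-bit value")
    if not floating:
        return bits                      # plain integer, 0..63
    exponent = (bits >> 3) & 7
    fraction = bits & 7
    return (8 + fraction) / 16 * (2 ** exponent)
```

Under these assumptions the 64 floating encodings include 0.5 and every integer from 1 to 16, which matches the description above.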

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Compact representation for common integer constants

<s719lp$bg$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16493&group=comp.arch#16493

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Thu, 6 May 2021 12:40:01 -0500
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <s719lp$bg$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 6 May 2021 17:40:09 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="96a964d1b7eb11fa5e85b3d22502b7c0";
logging-data="368"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/LO4B+LYSlbT4SSZqaglST"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1:1ydlWS0FrB5MQJ6ivsiKDydDZmM=
In-Reply-To: <s713uv$707$1@gal.iecc.com>
Content-Language: en-US

by: BGB - Thu, 6 May 2021 17:40 UTC

On 5/6/2021 11:02 AM, John Levine wrote:
> According to JohnG <gomijacogeo@gmail.com>:
>> Example, an 8-bit value with 1 bit for the presence of a factor of 3, and 2 bits for factors of 5, 25, and 125, and the remaining 5 bits
>> for left shift. So this would cover integers that were multiples of 1, 3, 5, 15, 25, 75, 125, and 375 then left-shifted between 0 and 31
>> bits.
>>
>> It seemed like a nice way to encode a lot of useful constants in a compact opcode format. Variants of the basic idea are hopefully obvious.
>
> It's an old idea which doesn't mean it's a bad one. It is my impression
> that you can easily outsmart yourself and once you get beyond small
> integers, there's no consistency in what values are popular.
>
> The VAX among its zillion addressing modes had an immediate one with
> a six-bit value. Depending on the instruction it was interpreted
> as a integer between 0 and 63 or as a floating point number with
> a three bit exponent and three bit fraction. That let them represent
> integers from 1 to 16, 0.5, and a bunch of less common values.
>

Generally true, though a few patterns exist.
If you can efficiently express:
Small integer values;
Small FP values;
Simple bit masks;
...

This goes a long way.
Small integer values are handled in the obvious way.

Small FP values can either be a truncated form of the value (load the
high N bits), or a small FP representation. Of these, a small FP value
typically does better for representing a useful range of constants.

This is how I (eventually) ended up having an instruction to load a
half-float immediate (which is then converted to Double on load).
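
As a sketch of what widening a half-float immediate to Double by repacking bit fields can look like, in Python. The handling of the zero exponent here (flush to signed zero, dropping denormals) is purely a guess for illustration and is not necessarily what the hardware described in this post does; the name `half_bits_to_double` is made up.

```python
import struct


def half_bits_to_double(h: int) -> float:
    """Widen a 16-bit half-float bit pattern to a double by repacking fields."""
    sign = (h >> 15) & 1
    exp = (h >> 10) & 0x1F
    frac = h & 0x3FF
    if exp == 0:
        # Assumption: zero exponent becomes signed zero (denormals dropped).
        dexp, dfrac = 0, 0
    elif exp == 0x1F:
        # Inf/NaN: all-ones exponent stays all-ones.
        dexp, dfrac = 0x7FF, frac << 42
    else:
        dexp = exp - 15 + 1023           # rebias the exponent
        dfrac = frac << 42               # 10-bit fraction into the top of 52 bits
    bits = (sign << 63) | (dexp << 52) | dfrac
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

For normal values no rounding or normalization is needed, since every half-float is exactly representable as a double.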

Note that the conversion used in this case is essentially just repacking
the bits with a few special cases (the zero exponent being handled in a
"slightly non-standard" way).

Bit-masks are a slightly harder problem, not so much to express them
compactly, but rather to keep them cost-effective to decode. The only
real hope they have of being viable is if they can reuse an existing
shift unit or similar ("X SHL N" or "X ROL N" or similar, *1).

Generally, the range of values and patterns needed goes outside the
range of what is particularly viable for a lookup table.

It is also debatable whether they are particularly worthwhile in an ISA
which is able to encode larger constants directly (at the expense of a
slightly larger instruction and loss of ability to execute in parallel
with other instructions).

*1: Say, one has ops which split a 10-bit immediate into X (6 bits) and
N (4 bits), and then does the equivalent of (X ROL (N*4)), and then
comes in both a Zero and One extended variant.
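
One possible reading of the scheme in *1, sketched in Python. The placement of X in the high six bits and N in the low four bits of the immediate, the 64-bit width, and the names `rotl64` and `decode_rot_imm` are all assumptions for this example.

```python
MASK64 = (1 << 64) - 1


def rotl64(value: int, count: int) -> int:
    """Rotate a 64-bit value left by `count` bits."""
    count %= 64
    value &= MASK64
    return ((value << count) | (value >> (64 - count))) & MASK64


def decode_rot_imm(imm10: int, one_extend: bool) -> int:
    """Expand a 10-bit (X, N) immediate to 64 bits as X ROL (N * 4)."""
    x = (imm10 >> 4) & 0x3F              # assumed: X in the high 6 bits
    n = imm10 & 0xF                      # assumed: N in the low 4 bits
    if one_extend:
        x |= MASK64 & ~0x3F              # fill the upper 58 bits with ones
    return rotl64(x, n * 4)
```

With this layout, X=1, N=9 in the zero-extended form gives 0000_0010_0000_0000, and X=0x3E, N=9 in the one-extended form gives FFFF_FFEF_FFFF_FFFF.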

Re: Compact representation for common integer constants

<6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16494&group=comp.arch#16494

X-Received: by 2002:a37:ae85:: with SMTP id x127mr5240893qke.436.1620324321516;
Thu, 06 May 2021 11:05:21 -0700 (PDT)
X-Received: by 2002:a05:6830:40a4:: with SMTP id x36mr1675708ott.342.1620324321178;
Thu, 06 May 2021 11:05:21 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!fdn.fr!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 May 2021 11:05:20 -0700 (PDT)
In-Reply-To: <s719lp$bg$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
Subject: Re: Compact representation for common integer constants
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 06 May 2021 18:05:21 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Thu, 6 May 2021 18:05 UTC

On Thursday, May 6, 2021 at 12:40:11 PM UTC-5, BGB wrote:
> On 5/6/2021 11:02 AM, John Levine wrote:
> > According to JohnG <gomij...@gmail.com>:
> >> Example, an 8-bit value with 1 bit for the presence of a factor of 3, and 2 bits for factors of 5, 25, and 125, and the remaining 5 bits
> >> for left shift. So this would cover integers that were multiples of 1, 3, 5, 15, 25, 75, 125, and 375 then left-shifted between 0 and 31
> >> bits.
> >>
> >> It seemed like a nice way to encode a lot of useful constants in a compact opcode format. Variants of the basic idea are hopefully obvious.
> >
> > It's an old idea which doesn't mean it's a bad one. It is my impression
> > that you can easily outsmart yourself and once you get beyond small
> > integers, there's no consistency in what values are popular.
> >
> > The VAX among its zillion addressing modes had an immediate one with
> > a six-bit value. Depending on the instruction it was interpreted
> > as a integer between 0 and 63 or as a floating point number with
> > a three bit exponent and three bit fraction. That let them represent
> > integers from 1 to 16, 0.5, and a bunch of less common values.
> >
> Generally true, though a few patterns exist.
> If you can efficiently express:
> Small integer values;
> Small FP values;
> Simple bit masks;
> ...
>
> This goes a long way.
> Small integer values are handled in the obvious way.
>
>
> Small FP values can either be a truncated form of the value (load the
> high N bits), or a small FP representation. Of these, a small FP value
> typically does better for representing a useful range of constants.
>
> This is how I (eventually) ended up having an instruction to load a
> half-float immediate (which is then converted to Double on load).
>
> Note that the conversion used in this case is essentially just repacking
> the bits with a few special cases (the zero exponent being handled in a
> "slightly non-standard" way).
>
>
> Bit-masks are a slightly harder problem, not so much to express them
> compactly, but rather to keep them cost-effective to decode. The only
> real hope they have of being viable is if they can reuse an existing
> shift unit or similar ("X SHL N" or "X ROL N" or similar, *1).
<
But this, in general, places them in the execute domain rather than the
constant domain (along with the baggage that entails.)

Constants should never "eat" any execution time, not the packing of bits
using multiple instructions, nor loading constants from memory. No,
constants should emanate only from the instruction stream.
>
> Generally, the range of values and patterns needed goes outside the
> range of what is particularly viable for a lookup table.
>
>
> It is also debatable whether they are particularly worthwhile in an ISA
> which is able to encode larger constants directly (at the expense of a
> slightly larger instruction and loss of ability to execute in parallel
> with other instructions).
>
> *1: Say, one has ops which split a 10-bit immediate into X (6 bits) and
> N (4 bits), and then does the equivalent of (X ROL (N*4)), and then
> comes in both a Zero and One extended variant.

Re: Compact representation for common integer constants

<s71imq$3b4$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16495&group=comp.arch#16495

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Thu, 6 May 2021 15:14:10 -0500
Organization: A noiseless patient Spider
Lines: 118
Message-ID: <s71imq$3b4$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me>
<6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 6 May 2021 20:14:18 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="96a964d1b7eb11fa5e85b3d22502b7c0";
logging-data="3428"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18PPtggHkKXPBscri4TWjVv"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1:p+cKM3nhyH+DPzRXKOnvBgPiZN0=
In-Reply-To: <6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
Content-Language: en-US

by: BGB - Thu, 6 May 2021 20:14 UTC

On 5/6/2021 1:05 PM, MitchAlsup wrote:
> On Thursday, May 6, 2021 at 12:40:11 PM UTC-5, BGB wrote:
>> On 5/6/2021 11:02 AM, John Levine wrote:
>>> According to JohnG <gomij...@gmail.com>:
>>>> Example, an 8-bit value with 1 bit for the presence of a factor of 3, and 2 bits for factors of 5, 25, and 125, and the remaining 5 bits
>>>> for left shift. So this would cover integers that were multiples of 1, 3, 5, 15, 25, 75, 125, and 375 then left-shifted between 0 and 31
>>>> bits.
>>>>
>>>> It seemed like a nice way to encode a lot of useful constants in a compact opcode format. Variants of the basic idea are hopefully obvious.
>>>
>>> It's an old idea which doesn't mean it's a bad one. It is my impression
>>> that you can easily outsmart yourself and once you get beyond small
>>> integers, there's no consistency in what values are popular.
>>>
>>> The VAX among its zillion addressing modes had an immediate one with
>>> a six-bit value. Depending on the instruction it was interpreted
>>> as a integer between 0 and 63 or as a floating point number with
>>> a three bit exponent and three bit fraction. That let them represent
>>> integers from 1 to 16, 0.5, and a bunch of less common values.
>>>
>> Generally true, though a few patterns exist.
>> If you can efficiently express:
>> Small integer values;
>> Small FP values;
>> Simple bit masks;
>> ...
>>
>> This goes a long way.
>> Small integer values are handled in the obvious way.
>>
>>
>> Small FP values can either be a truncated form of the value (load the
>> high N bits), or a small FP representation. Of these, a small FP value
>> typically does better for representing a useful range of constants.
>>
>> This is how I (eventually) ended up having an instruction to load a
>> half-float immediate (which is then converted to Double on load).
>>
>> Note that the conversion used in this case is essentially just repacking
>> the bits with a few special cases (the zero exponent being handled in a
>> "slightly non-standard" way).
>>
>>
>> Bit-masks are a slightly harder problem, not so much to express them
>> compactly, but rather to keep them cost-effective to decode. The only
>> real hope they have of being viable is if they can reuse an existing
>> shift unit or similar ("X SHL N" or "X ROL N" or similar, *1).
> <
> But this, in general, places them in the execute domain rather than the
> constant domain (along with the baggage that entails.)
>
> Constants should never "eat" any execution time, not the packing of bits
> using multiple instructions, nor loading constants from memory. No,
> constants should emanate only from the instruction stream.

In this case, it is the lesser of two evils:
Spend 32 bits to load a shifted-immediate constant;
Spend 64 or 96 bits to load the constant in some other way.

In any case, there will be some execution cost to using a discrete
instruction to load a constant, whether it behaves like a Shift/Rotate
operation, or is effectively just a plain MOV operation.

Either way, one is spending a clock-cycle in this case...

This wouldn't be used in cases where some other instruction is able to
fit the value into a normal immediate field.

Similarly, not going to add ARM style shifted-immediate values to random
other instructions, as this basically opens up a big ugly mess (and
would likely require another dedicated unit for this).

Re-adding something like this as an experiment, doing it for all 3 lanes
costs ~ 2500 LUTs (unreasonably expensive); Limiting it to Lane 1
reduces cost to ~ 800 LUTs (still pretty expensive for what it is).

Most of the cost in this case seemingly goes into the logic for
supporting split-immediate values in the register file (say, the normal
33-bit immed field emitted by the decoder is split into a high-part and
a low-part for sake of the EX stages).

This is handled via special virtual registers, which can be used to
select the high or low part of the immed field.

Also note that Jumbo-Loads had been implemented by using immediate
fields from both Lane 1 and 2 as a combined immediate.

Eg:
IMM: Imm33s (sign-extended to 64 bits; already existed).
JIMM: { Imm32B, Imm32A } (already existed)
RIMMH: Imm33[32: 8] (sign-extended to 64 bits, new).
RIMML: Imm33[ 7: 0] (zero-extended to 64 bits, new).

Though, it is possible such a mechanism could have other uses.

The instruction decoder manages converting the Imm10 field into the
usual Imm33 form, but needs new logic to deal with the (6,4) fields.
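
The RIMMH/RIMML split listed above can be sketched in Python as follows. Only the bit ranges and extension rules come from the list; the function names are made up, and how the decoder actually places the (6,4) fields inside Imm33 is not stated here and is not modeled.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, width: int) -> int:
    """Sign-extend the low `width` bits of `value` to a 64-bit pattern."""
    value &= (1 << width) - 1
    sign = 1 << (width - 1)
    return ((value ^ sign) - sign) & MASK64


def split_imm33(imm33: int):
    """Return (RIMMH, RIMML) for a 33-bit immediate.

    RIMMH = Imm33[32:8], 25 bits, sign-extended to 64 bits.
    RIMML = Imm33[7:0], 8 bits, zero-extended to 64 bits.
    """
    imm33 &= (1 << 33) - 1
    rimmh = sign_extend(imm33 >> 8, 25)
    rimml = imm33 & 0xFF
    return rimmh, rimml
```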

....

>>
>> Generally, the range of values and patterns needed goes outside the
>> range of what is particularly viable for a lookup table.
>>
>>
>> It is also debatable whether they are particularly worthwhile in an ISA
>> which is able to encode larger constants directly (at the expense of a
>> slightly larger instruction and loss of ability to execute in parallel
>> with other instructions).
>>
>> *1: Say, one has ops which split a 10-bit immediate into X (6 bits) and
>> N (4 bits), and then does the equivalent of (X ROL (N*4)), and then
>> comes in both a Zero and One extended variant.

Re: Compact representation for common integer constants

<77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16497&group=comp.arch#16497

X-Received: by 2002:a0c:e50e:: with SMTP id l14mr7682753qvm.52.1620348019141;
Thu, 06 May 2021 17:40:19 -0700 (PDT)
X-Received: by 2002:a9d:60f:: with SMTP id 15mr5982917otn.81.1620348018906;
Thu, 06 May 2021 17:40:18 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 May 2021 17:40:18 -0700 (PDT)
In-Reply-To: <s71imq$3b4$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me> <6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
Subject: Re: Compact representation for common integer constants
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 07 May 2021 00:40:19 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Fri, 7 May 2021 00:40 UTC

On Thursday, May 6, 2021 at 3:14:20 PM UTC-5, BGB wrote:
> On 5/6/2021 1:05 PM, MitchAlsup wrote:
> > On Thursday, May 6, 2021 at 12:40:11 PM UTC-5, BGB wrote:
> >> On 5/6/2021 11:02 AM, John Levine wrote:
> >>> According to JohnG <gomij...@gmail.com>:
> >>>> Example, an 8-bit value with 1 bit for the presence of a factor of 3, and 2 bits for factors of 5, 25, and 125, and the remaining 5 bits
> >>>> for left shift. So this would cover integers that were multiples of 1, 3, 5, 15, 25, 75, 125, and 375 then left-shifted between 0 and 31
> >>>> bits.
> >>>>
> >>>> It seemed like a nice way to encode a lot of useful constants in a compact opcode format. Variants of the basic idea are hopefully obvious.
> >>>
> >>> It's an old idea which doesn't mean it's a bad one. It is my impression
> >>> that you can easily outsmart yourself and once you get beyond small
> >>> integers, there's no consistency in what values are popular.
> >>>
> >>> The VAX among its zillion addressing modes had an immediate one with
> >>> a six-bit value. Depending on the instruction it was interpreted
> >>> as a integer between 0 and 63 or as a floating point number with
> >>> a three bit exponent and three bit fraction. That let them represent
> >>> integers from 1 to 16, 0.5, and a bunch of less common values.
> >>>
> >> Generally true, though a few patterns exist.
> >> If you can efficiently express:
> >> Small integer values;
> >> Small FP values;
> >> Simple bit masks;
> >> ...
> >>
> >> This goes a long way.
> >> Small integer values are handled in the obvious way.
> >>
> >>
> >> Small FP values can either be a truncated form of the value (load the
> >> high N bits), or a small FP representation. Of these, a small FP value
> >> typically does better for representing a useful range of constants.
> >>
> >> This is how I (eventually) ended up having an instruction to load a
> >> half-float immediate (which is then converted to Double on load).
> >>
> >> Note that the conversion used in this case is essentially just repacking
> >> the bits with a few special cases (the zero exponent being handled in a
> >> "slightly non-standard" way).
> >>
> >>
> >> Bit-masks are a slightly harder problem, not so much to express them
> >> compactly, but rather to keep them cost-effective to decode. The only
> >> real hope they have of being viable is if they can reuse an existing
> >> shift unit or similar ("X SHL N" or "X ROL N" or similar, *1).
> > <
> > But this, in general, places them in the execute domain rather than the
> > constant domain (along with the baggage that entails.)
> >
> > Constants should never "eat" any execution time, not the packing of bits
> > using multiple instructions, nor loading constants from memory. No,
> > constants should emanate only from the instruction stream.
<
> In this case, it is the lesser of two evils:
> Spend 32 bits to load a shifted-immediate constant;
> Spend 64 or 96 bits to load the constant in some other way.
<
from the instruction stream using no instructions.
>
>
> In any case, there will be some execution cost to using a discrete
> instruction to load a constant, whether it behaves like a Shift/Rotate
> operation, or is effectively just a plain MOV operation.
<
The only operations My 66000 performs on constants are:
a) sign extension from 16 bits to 64 bits
b) sign extension from 32 bits to 64 bits
c) expansion of 32-bit FP to 64-bit FP (denorms are not allowed in a
32-bit FP constant).
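
For illustration, the three expansions listed above could be modeled in Python as below. The helper names `sext` and `expand_f32_bits` are made up, and the restriction on denormal 32-bit FP constants is not modeled; this is a sketch of the arithmetic, not of My 66000's decoder.

```python
import struct

MASK64 = (1 << 64) - 1


def sext(value: int, width: int) -> int:
    """Sign-extend a `width`-bit constant to a 64-bit pattern."""
    value &= (1 << width) - 1
    sign = 1 << (width - 1)
    return ((value ^ sign) - sign) & MASK64


def expand_f32_bits(bits32: int) -> int:
    """Widen an IEEE single-precision bit pattern to the double bit pattern."""
    as_float = struct.unpack("<f", struct.pack("<I", bits32 & 0xFFFFFFFF))[0]
    return struct.unpack("<Q", struct.pack("<d", as_float))[0]
```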
>
> Either way, one is spending a clock-cycle in this case...
<
Nope.
>
>
> This wouldn't be used in cases where some other instruction is able to
> fit the value into a normal immediate field.
>
> Similarly, not going to add ARM style shifted-immediate values to random
> other instructions, as this basically opens up a big ugly mess (and
> would likely require another dedicated unit for this).
>
Strongly agree.
>
> Re-adding something like this as an experiment, doing it for all 3 lanes
> costs ~ 2500 LUTs (unreasonably expensive); Limiting it to Lane 1
> reduces cost to ~ 800 LUTs (still pretty expensive for what it is).
>
Which is why I limited my constant manipulations to the above.
>
> Most of the cost in this case seemingly goes into the logic for
> supporting split-immediate values in the register file (say, the normal
> 33-bit immed field emitted by the decoder is split into a high-part and
> a low-part for sake of the EX stages).
>
> This is handled via special virtual registers, which can be used to
> select the high or low part of the immed field.
>
> Also note that Jumbo-Loads had been implemented by using immediate
> fields from both Lane 1 and 2 as a combined immediate.
>
> Eg:
> IMM: Imm33s (sign-extended to 64 bits; already existed).
> JIMM: { Imm32B, Imm32A } (already existed)
> RIMMH: Imm33[32: 8] (sign-extended to 64 bits, new).
> RIMML: Imm33[ 7: 0] (zero-extended to 64 bits, new).
>
> Though, it is possible such a mechanism could have other uses.
>
> The instruction decoder manages converting the Imm10 field into the
> usual Imm33 form, but needs new logic to deal with the (6,4) fields.
>
> ...
> >>
> >> Generally, the range of values and patterns needed goes outside the
> >> range of what is particularly viable for a lookup table.
> >>
> >>
> >> It is also debatable whether they are particularly worthwhile in an ISA
> >> which is able to encode larger constants directly (at the expense of a
> >> slightly larger instruction and loss of ability to execute in parallel
> >> with other instructions).
> >>
> >> *1: Say, one has ops which split a 10-bit immediate into X (6 bits) and
> >> N (4 bits), and then does the equivalent of (X ROL (N*4)), and then
> >> comes in both a Zero and One extended variant.

Re: Compact representation for common integer constants

<s72mv0$qai$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16504&group=comp.arch#16504

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Fri, 7 May 2021 01:32:56 -0500
Organization: A noiseless patient Spider
Lines: 193
Message-ID: <s72mv0$qai$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me>
<6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me>
<77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 7 May 2021 06:33:04 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="6f328b46843998533988b5b18da3b993";
logging-data="26962"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19J0mqicX+OrVG8IUYCpGyc"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1:mDpRVf3Iuzhd6U4HHXJQdGgvvwg=
In-Reply-To: <77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
Content-Language: en-US

by: BGB - Fri, 7 May 2021 06:32 UTC

On 5/6/2021 7:40 PM, MitchAlsup wrote:
> On Thursday, May 6, 2021 at 3:14:20 PM UTC-5, BGB wrote:
>> On 5/6/2021 1:05 PM, MitchAlsup wrote:
>>> On Thursday, May 6, 2021 at 12:40:11 PM UTC-5, BGB wrote:
>>>> On 5/6/2021 11:02 AM, John Levine wrote:
>>>>> According to JohnG <gomij...@gmail.com>:
>>>>>> Example, an 8-bit value with 1 bit for the presence of a factor of 3, and 2 bits for factors of 5, 25, and 125, and the remaining 5 bits
>>>>>> for left shift. So this would cover integers that were multiples of 1, 3, 5, 15, 25, 75, 125, and 375 then left-shifted between 0 and 31
>>>>>> bits.
>>>>>>
>>>>>> It seemed like a nice way to encode a lot of useful constants in a compact opcode format. Variants of the basic idea are hopefully obvious.
>>>>>
>>>>> It's an old idea which doesn't mean it's a bad one. It is my impression
>>>>> that you can easily outsmart yourself and once you get beyond small
>>>>> integers, there's no consistency in what values are popular.
>>>>>
>>>>> The VAX among its zillion addressing modes had an immediate one with
>>>>> a six-bit value. Depending on the instruction it was interpreted
>>>>> as a integer between 0 and 63 or as a floating point number with
>>>>> a three bit exponent and three bit fraction. That let them represent
>>>>> integers from 1 to 16, 0.5, and a bunch of less common values.
>>>>>
>>>> Generally true, though a few patterns exist.
>>>> If you can efficiently express:
>>>> Small integer values;
>>>> Small FP values;
>>>> Simple bit masks;
>>>> ...
>>>>
>>>> This goes a long way.
>>>> Small integer values are handled in the obvious way.
>>>>
>>>>
>>>> Small FP values can either be a truncated form of the value (load the
>>>> high N bits), or a small FP representation. Of these, a small FP value
>>>> typically does better for representing a useful range of constants.
>>>>
>>>> This is how I (eventually) ended up having an instruction to load a
>>>> half-float immediate (which is then converted to Double on load).
>>>>
>>>> Note that the conversion used in this case is essentially just repacking
>>>> the bits with a few special cases (the zero exponent being handled in a
>>>> "slightly non-standard" way).
>>>>
>>>>
>>>> Bit-masks are a slightly harder problem, not so much to express them
>>>> compactly, but rather to keep them cost-effective to decode. The only
>>>> real hope they have of being viable is if they can reuse an existing
>>>> shift unit or similar ("X SHL N" or "X ROL N" or similar, *1).
>>> <
>>> But this, in general, places them in the execute domain rather than the
>>> constant domain (along with the baggage that entails.)
>>>
>>> Constants should never "eat" any execution time, not the packing of bits
>>> using multiple instructions, nor loading constants from memory. No,
>>> constants should emanate only from the instruction stream.
> <
>> In this case, it is the lesser of two evils:
>> Spend 32 bits to load a shifted-immediate constant;
>> Spend 64 or 96 bits to load the constant in some other way.
> <
> from the instruction stream using no instructions.

As noted, at least in my case, not every operation supports immediate
fields in every context, and when they do, they may have size
limitations for other reasons.

A consequence of this is still needing to load values into registers.
The operations which have immediate forms are mostly semi-common ALU ops
and similar.

Eg: ADD/SUB/AND/OR/XOR/SHAD/SHLD/... have immediate forms.
Many other operations do not.

It is possible that it could be generalized, but then the ISA would be
less RISC-style than it already is...

Then again, I did see recently that someone had listed my ISA somewhere,
but then classified it as a CISC; I don't entirely agree, but alas...

Then again, many people's definitions of "RISC" exclude largish
instruction-sets with variable-length instruction encodings, so alas.

But, taken at face value, then one would also need to exclude Thumb2 and
similar from the RISC category.

>>
>>
>> In any case, there will be some execution cost to using a discrete
>> instruction to load a constant, whether it behaves like a Shift/Rotate
>> operation, or is effectively just a plain MOV operation.
> <
> The only operations My 66000 performs on constants are
> a) sign extension from 16-bits to 64-bits
> b) sign extension from 32-bits to 64-bits
> c) expansion of 32-bit FP to 64-bit FP (denorms not allowed in the
>    32-bit FP constant).

In this case, it is less of a constant manipulation per-se, and more a
hack to feed an immediate value into both ports of a ROTL or ROTLQ
operation, which can then masquerade as a constant load.

Similarly, the Half-Float constant load isn't so much a new type of
constant-load, so much as invoking the Half-To-Double converter but
giving it an immediate value rather than a register.

As can be noted though, pretty much none of the existing FPU ops can use
immediate values apart from the Half-Float converter.
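As a rough illustration of the "repacking" idea, here is a minimal Python sketch of a Half-to-Double conversion done purely by moving bit fields. It is not the actual converter described above: the function name is made up, and the zero-exponent case here is simply flushed to a signed zero, which is only one possible "slightly non-standard" handling and is an assumption on my part.

```python
import struct

def half_bits_to_double(h):
    """Repack a 16-bit half-float pattern (1:5:10, bias 15) as a double.

    Assumption: a zero exponent (zero or subnormal) is flushed to a
    signed zero instead of being normalized.
    """
    sign = (h >> 15) & 1
    exp = (h >> 10) & 0x1F
    frac = h & 0x3FF
    if exp == 0:
        bits = sign << 63                                    # zero / flushed subnormal
    elif exp == 0x1F:
        bits = (sign << 63) | (0x7FF << 52) | (frac << 42)   # Inf / NaN
    else:
        # Rebias the exponent (15 -> 1023) and left-align the fraction.
        bits = (sign << 63) | ((exp - 15 + 1023) << 52) | (frac << 42)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

For normal values this only widens the exponent and pads the fraction with zeros, which is why such a converter can be cheap enough to sit on an immediate path.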

>>
>> Either way, one is spending a clock-cycle in this case...
> <
> Nope.

In some cases, it might actually be able to save clock cycles vs
jumbo-loads, since jumbo ops can't be executed in parallel with other
instructions...

>>
>>
>> This wouldn't be used in cases where some other instruction is able to
>> fit the value into a normal immediate field.
>>
>> Similarly, not going to add ARM style shifted-immediate values to random
>> other instructions, as this basically opens up a big ugly mess (and
>> would likely require another dedicated unit for this).
>>
> Strongly agree.

Yeah, hence why this is not what I am doing...
Adding a whole bunch of new ALU ops with ARM-style immediate values
would be a bit much...

>>
>> Re-adding something like this as an experiment, doing it for all 3 lanes
>> costs ~ 2500 LUTs (unreasonably expensive); Limiting it to Lane 1
>> reduces cost to ~ 800 LUTs (still pretty expensive for what it is).
>>
> Which is why I limited my constant manipulations to the above.

From what I can tell, most of the cost seems to be due to adding the
R8IMML / R8IMMH pseudo-registers.

Granted, this is in a part of the code which is filled with fairly tight
timing and "lots of angry bees" (much of the register interlock and
forwarding machinery and similar also passes through here).

>>
>> Most of the cost in this case seemingly goes into the logic for
>> supporting split-immediate values in the register file (say, the normal
>> 33-bit immed field emitted by the decoder is split into a high-part and
>> a low-part for sake of the EX stages).
>>
>> This is handled via special virtual registers, which can be used to
>> select the high or low part of the immed field.
>>
>> Also note that Jumbo-Loads had been implemented by using immediate
>> fields from both Lane 1 and 2 as a combined immediate.
>>
>> Eg:
>> IMM: Imm33s (sign-extended to 64 bits; already existed).
>> JIMM: { Imm32B, Imm32A } (already existed)
>> RIMMH: Imm33[32: 8] (sign-extended to 64 bits, new).
>> RIMML: Imm33[ 7: 0] (zero-extended to 64 bits, new).
>>
>> Though, it is possible such a mechanism could have other uses.
>>
>> The instruction decoder manages converting the Imm10 field into the
>> usual Imm33 form, but needs new logic to deal with the (6,4) fields.
>>
>> ...
>>>>
>>>> Generally, the range of values and patterns needed goes outside the
>>>> range of what is particularly viable for a lookup table.
>>>>
>>>>
>>>> It is also debatable whether they are particularly worthwhile in an ISA
>>>> which is able to encode larger constants directly (at the expense of a
>>>> slightly larger instruction and loss of ability to execute in parallel
>>>> with other instructions).
>>>>
>>>> *1: Say, one has ops which split a 10-bit immediate into X (6 bits) and
>>>> N (4 bits), and then does the equivalent of (X ROL (N*4)), and then
>>>> comes in both a Zero and One extended variant.
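The footnote above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post does not say which bits of the 10-bit field hold X and which hold N, so placing X in the high 6 bits is a guess, as are the 64-bit rotate width and the reading of "One extended" as filling the bits above X with ones before rotating. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def rol64(value, amount):
    """Rotate a 64-bit value left by 'amount' bits."""
    amount %= 64
    return ((value << amount) | (value >> (64 - amount))) & MASK64

def decode_rot_imm10(imm10, one_extend=False):
    """Expand a 10-bit (X:6, N:4) immediate to 64 bits as X ROL (N*4).

    Assumed layout: X in bits 9..4, N in bits 3..0.
    """
    x = (imm10 >> 4) & 0x3F
    n = imm10 & 0xF
    if one_extend:
        x |= MASK64 & ~0x3F      # fill bits 63..6 with ones
    return rol64(x, n * 4)
```

With this layout, X = 0x3F and N = 2 yields 0x3F00, while the one-extended form with X = 0 and N = 1 yields 0xFFFFFFFFFFFFFC0F, i.e. a mask with a 6-bit hole at a nibble-aligned position.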

Re: FP8 (was Compact representation for common integer constants)

<s72qbr$ehq$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16505&group=comp.arch#16505

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Fri, 7 May 2021 09:31:07 +0200
Organization: A noiseless patient Spider
Lines: 34
Message-ID: <s72qbr$ehq$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6utud$keb$1@dont-email.me> <s6v2o1$krb$1@dont-email.me>
<s6vpfb$cvd$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 7 May 2021 07:31:07 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="22c17f3c7b2dbd5236a02e20232d8761";
logging-data="14906"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+cbpzahj4xFZoiwvdWU6y23IsX/EH2fRc="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.7.1
Cancel-Lock: sha1:LMeouBAWo00ZoLUZPC1pBkAsSYY=
In-Reply-To: <s6vpfb$cvd$1@dont-email.me>
Content-Language: en-US

by: Marcus - Fri, 7 May 2021 07:31 UTC

On 2021-05-06, BGB wrote:
> On 5/5/2021 4:29 PM, Ivan Godard wrote:

[snip]

>> This matches our experience with popCons very well. Let me add that
>> not only "half-float" works, but also quarter-float, eighth-float, etc.
>> Low precision/low range FP constants are quite common; the only
>> full-sized erratic (not in tabular data) floats are irrationals.
>>
>
> Quarter-Float is essentially what FP8 is, though in BJX2 only comes in
> packed form via conversion ops (4xFP8 to 4xFP16, or 2xFP8 to 2xFP32).
>
> These don't currently have constant-load forms, so loading a vector this
> way is a multi-op sequence.
>
>
> The 8-bit format also comes in both signed and unsigned variants:
> FP8U: E4.F4, bias=7
> FP8S: S.E4.F3, bias=7
>

In MRISC32 I have "float32", "float16" and "float8" (the shorter formats
usually come in packed form). The 32-bit and 16-bit formats are
identical to binary32 and binary16 in IEEE 754, respectively, while
I had to make a custom format for the 8-bit variant - which happens to
be identical to your "FP8S" format (i.e. S.E4.F3, bias=7).
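For readers who want to see what the two 8-bit layouts quoted above encode, here is a small Python sketch of decoders for FP8S (S.E4.F3, bias 7) and FP8U (E4.F4, bias 7). The field widths and bias come from the posts; everything else is an assumption: the sketch applies IEEE-754-style conventions (exponent 0 is zero/subnormal, all-ones exponent is Inf/NaN), which neither BJX2 nor MRISC32 is confirmed to follow, and the function names are made up.

```python
def decode_fp8s(byte):
    """Decode S.E4.F3, bias 7, assuming IEEE-style special cases."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    frac = byte & 0x7
    if exp == 0:                                   # zero or subnormal
        return sign * (frac / 8.0) * 2.0 ** (1 - 7)
    if exp == 0xF:                                 # Inf or NaN
        return sign * float("inf") if frac == 0 else float("nan")
    return sign * (1.0 + frac / 8.0) * 2.0 ** (exp - 7)

def decode_fp8u(byte):
    """Decode unsigned E4.F4, bias 7, under the same assumptions."""
    exp = (byte >> 4) & 0xF
    frac = byte & 0xF
    if exp == 0:
        return (frac / 16.0) * 2.0 ** (1 - 7)
    if exp == 0xF:
        return float("inf") if frac == 0 else float("nan")
    return (1.0 + frac / 16.0) * 2.0 ** (exp - 7)
```

Under these assumptions 0x38 decodes to 1.0 in FP8S and 0x70 decodes to 1.0 in FP8U; the unsigned variant trades the sign bit for one more fraction bit.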

I wonder: Are there any common/official 8-bit IEEE-style
(sign+exponent+fraction) FP formats that are gaining traction (ignoring
posits)?

/Marcus

Re: FP8 (was Compact representation for common integer constants)

<s73287$itf$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=16506&group=comp.arch#16506

Path: i2pn2.org!i2pn.org!aioe.org!+9JlleTFc3MOERf2LU/SVA.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Fri, 7 May 2021 11:45:45 +0200
Organization: Aioe.org NNTP Server
Lines: 31
Message-ID: <s73287$itf$1@gioia.aioe.org>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6utud$keb$1@dont-email.me> <s6v2o1$krb$1@dont-email.me>
<s6vpfb$cvd$1@dont-email.me> <s72qbr$ehq$1@dont-email.me>
NNTP-Posting-Host: +9JlleTFc3MOERf2LU/SVA.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Fri, 7 May 2021 09:45 UTC

Marcus wrote:
> In MRISC32 I have "float32", "float16" and "float8" (the shorter formats
> usually come in packed form). The 32-bit and 16-bit formats are
> identical to binary32 and binary16 in IEEE 754, respectively, while
> I had to make a custom format for the 8-bit variant - which happens to
> be identical to your "FP8S" format (i.e. S.E4.F3, bias=7).

With FP16 presumably 1:5:10 right?
>
> I wonder: Are there any common/official 8-bit IEEE-style
> (sign+exponent+fraction) FP formats that are gaining traction (ignoring
> posits)?

FP8 could be either 1:3:4 or 1:4:3, but for full ieee compliance you
really want at least 2n+3 bits in the mantissa when going up in size, so
your choice is the best one imho.

Having 2n+3 means that you never get into double rounding problems by
doing any single operation in the larger size, then rounding back down
to the target precision.

The smallest possible ieee format would need at least sign + two
exponent and two mantissa bits, so FP5 = 1:2:2.

This format would still support all of QNaN/SNaN/Inf/Normal/Sub-normal/Zero.
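The FP5 claim is easy to check by brute force. The Python sketch below enumerates all 32 encodings of a 1:2:2 format and classifies them using the usual IEEE 754 conventions (exponent 0 is zero/subnormal, all-ones exponent is Inf/NaN, top fraction bit distinguishes quiet from signaling NaN). The function name is hypothetical.

```python
def classify_fp5(bits):
    """Classify a 5-bit 1:2:2 pattern using IEEE 754 conventions."""
    exp = (bits >> 2) & 0x3
    frac = bits & 0x3
    if exp == 0:
        return "zero" if frac == 0 else "subnormal"
    if exp == 3:
        if frac == 0:
            return "inf"
        # Top fraction bit set means quiet NaN, clear means signaling.
        return "qnan" if frac & 0x2 else "snan"
    return "normal"

# Every class shows up at least once across the 32 encodings.
classes = {classify_fp5(b) for b in range(32)}
```

With only one mantissa bit there would be a single nonzero NaN payload, so quiet and signaling NaNs could not both exist; with only one exponent bit, both exponent values would be reserved and no normal numbers would remain.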

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

The old RISC-vs-CISC (was: Compact representation for common integer constants)

<jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=16509&group=comp.arch#16509

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: The old RISC-vs-CISC (was: Compact representation for common integer constants)
Date: Fri, 07 May 2021 09:15:36 -0400
Organization: A noiseless patient Spider
Lines: 28
Message-ID: <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me>
<6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me>
<77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="0e14712689c4a6779c242e0d7e730ec1";
logging-data="7800"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+1shompnsarpTaWGyXp3kV"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Cancel-Lock: sha1:IMXqVTl7Bpm9BB4o5dwRZooxnJ0=
sha1:4CdftqrgcBtV0iL9g7jvXBvRDwU=

by: Stefan Monnier - Fri, 7 May 2021 13:15 UTC

> It is possible that it could be generalized, but then the ISA would be less
> RISC style than it is already...
>
> Then again, I did see recently that someone had listed my ISA somewhere, but
> then classified it as a CISC; I don't entirely agree, but alas...
>
> Then again, many people's definitions of "RISC" exclude largish
> instruction-sets with variable-length instruction encodings, so alas.
>
> But, taken at face value, then one would also need to exclude Thumb2 and
> similar from the RISC category.

I think it's better not to worry about how other people label your ISA.

The manichean RISC-vs-CISC labeling is stupid anyway: the design space
has many more than 2 spots.

Also I think the "classic RISC" is the result of specific conditions at
that point in time which resulted in a particular sweet spot in the
design space.

But design constraints are quite different now, so applying the same
"quantitative approach" that led to MIPS/SPARC/... will now result in
something quite different. One could argue that such new ISAs
could be called RISC-2020 ;-)

Stefan

Re: Compact representation for common integer constants

<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16510&group=comp.arch#16510

X-Received: by 2002:ae9:ef55:: with SMTP id d82mr9353808qkg.3.1620393488600;
Fri, 07 May 2021 06:18:08 -0700 (PDT)
X-Received: by 2002:a4a:48c2:: with SMTP id p185mr7743236ooa.73.1620393488331;
Fri, 07 May 2021 06:18:08 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!2.eu.feeder.erje.net!feeder.erje.net!fdn.fr!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 7 May 2021 06:18:08 -0700 (PDT)
In-Reply-To: <s6udkp$hs5$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=108.91.172.125; posting-account=5gV3HwoAAAAce05MvbMFVKxb-iBCVVSr
NNTP-Posting-Host: 108.91.172.125
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com> <s6udkp$hs5$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
Subject: Re: Compact representation for common integer constants
From: gomijaco...@gmail.com (JohnG)
Injection-Date: Fri, 07 May 2021 13:18:08 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

by: JohnG - Fri, 7 May 2021 13:18 UTC

On Wednesday, May 5, 2021 at 8:29:32 AM UTC-7, Ivan Godard wrote:
> You wouldn't want to actually do the implied multiplies, so this would
> become a look-up table indexed by the byte. But if you do that then you
> could have any frequent values in the table, not just those implied by
> the factors. That's how Mill popCons (Popular Constants) work, with the
> nicety that they are encoded only using otherwise wasted entropy in the
> binary ISA.

Right, but in my example, there are only 8 interesting implied multiplies and would only need an 8x9b rom or equivalent logic:

[5^n 2-bits][3^n 1-bit]
[00][0] = 9'b0_0000_0001 # 5^0 * 3^0 = 1
[00][1] = 9'b0_0000_0011 # 5^0 * 3^1 = 3
[01][0] = 9'b0_0000_0101 # 5^1 * 3^0 = 5
[01][1] = 9'b0_0000_1111 # 5^1 * 3^1 = 15
[10][0] = 9'b0_0001_1001 # 5^2 * 3^0 = 25
[10][1] = 9'b0_0100_1011 # 5^2 * 3^1 = 75
[11][0] = 9'b0_0111_1101 # 5^3 * 3^0 = 125
[11][1] = 9'b1_0111_0111 # 5^3 * 3^1 = 375

Everything else is just as many left shifts as you want to spend bits on and could even wait until the ALU if it has the ability to shift inputs.
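A minimal Python model of that decode path, for illustration only: the ROM contents are the eight products from the table above, while the function name and the way the fields are passed in are my own choices, not part of the proposal.

```python
# 8-entry ROM indexed by {pow5[1:0], pow3}; the largest entry (375) fits in 9 bits.
FACTOR_ROM = [1, 3, 5, 15, 25, 75, 125, 375]

def decode_imm(pow5, pow3, shift):
    """Return 5**pow5 * 3**pow3 * 2**shift via one ROM lookup and a shift."""
    return FACTOR_ROM[(pow5 << 1) | pow3] << shift
```

For example, decode_imm(1, 1, 3) gives 15 << 3 = 120 (0x78), and decode_imm(2, 0, 2) gives 25 << 2 = 100.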

If you have the space and time to have a table of perfect constants, then yeah, that seems like the way to go.

A couple other things I think are asked in other replies that I'll put here just to have them all in one place.

This was intended to encode small integers for arithmetic ops - addition, subtraction, multiplication, offsets, etc. for a compressed opcode space (e.g. Thumb, RVC). If the immediate doesn't have a representation in the compressed opcode, use the full-sized instruction.

Good immediate values for logical ops - and, or, xor - are, as noted, more likely to be bit masks of various widths and offsets and would have a different encoding.

I only did positive integers as I was thinking of an arch with separate add and subtract opcodes, but one could of course burn a bit to flip the sign.

The reason I thought it might be interesting to pull out factors of 3 and 5 is that I saw quite a few instances of small data structures that were 24, 48, 96 etc bytes and a lot of human-friendly loops and arrays that were multiples of 10, 100, and 1000.

I just did a kernel build and objdump'd the .o's and looked at the immediates (but not offsets) used. And, for 6-bit immediates at least, it might be a wash. I can reach 0x40, 0x50, 0x60, 0xa0, 0x78, and 0xc0, while 0x0..0x3f can hit 0x38, 0x22, 0x7. And once you get to 7 bits, you might be pretty far down the long-tail and only a dynamic count would make sense.

38722 $0x1 <-- both
17892 $0x8 <-- both
15534 $0x10 <-- both
13068 $0x18 <-- both
6801 $0x20 <-- both
5502 $0x40 <----- mine
4796 $0x30 <-- both
4576 $0x28 <-- both
4209 $0x4 <-- both
2904 $0x38 <-- 6-bit
2595 $0x48 <-- neither
2454 $0x2 <-- both
1936 $0x50 <----- mine
1417 $0x10a0 <-- neither
1328 $0x58 <-- neither
1323 $0x60 <----- mine
1200 $0x3 <-- both
1085 $0xc <-- both
1066 $0x68 <-- neither
1012 $0x22 <-- 6-bit
936 $0x70 <-- neither
780 $0xa0 <----- mine
754 $0x78 <----- mine
724 $0xffffffffffffff80 <-- neither
704 $0x88 <-- neither
686 $0xc0 <----- mine
540 $0x90 <-- neither
526 $0x200 <-- neither
496 $0x7 <-- 6-bit
482 $0x98 <-- neither
449 $0x5 <-- both
439 $0x1000 <-- neither
433 $0x14 <-- both
396 $0xb8 <-- neither
372 $0x6 <-- both
369 $0xa8 <-- neither
319 $0xb0 <-- neither
295 $0xc8 <----- mine
272 $0xfff <-- neither
260 $0xf0 <----- mine
237 $0x9 <-- 6-bit
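The annotations in the listing can be reproduced with a short Python sketch. One assumption is needed: the width of the shift field is not restated in this post, so the sketch uses a 3-bit shift (6 bits total with the 2-bit 5^n and 1-bit 3^n fields), which matches the listing (0x40 is reachable, 0x200 is not). The function names are hypothetical.

```python
def reachable_factored(value, shift_bits=3):
    """True if value == 5**a * 3**b << s with a in 0..3, b in 0..1."""
    for shift in range(1 << shift_bits):
        for a in range(4):
            for b in range(2):
                if (5 ** a * 3 ** b) << shift == value:
                    return True
    return False

def reachable_plain(value, bits=6):
    """True if value fits a plain unsigned immediate of the given width."""
    return 0 <= value < (1 << bits)

def classify(value):
    """Label a constant the same way as the listing above."""
    factored = reachable_factored(value)
    plain = reachable_plain(value)
    if factored and plain:
        return "both"
    if factored:
        return "mine"
    return "6-bit" if plain else "neither"
```

For instance, classify(0x78) returns "mine" (15 << 3), classify(0x38) returns "6-bit" (7 is not a product of 3s and 5s), and classify(0x48) returns "neither" (72 = 9 << 3 needs 3^2).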

-JohnG

Re: FP8 (was Compact representation for common integer constants)

<s73rcr$83m$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16513&group=comp.arch#16513

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Fri, 7 May 2021 18:54:50 +0200
Organization: A noiseless patient Spider
Lines: 45
Message-ID: <s73rcr$83m$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6utud$keb$1@dont-email.me> <s6v2o1$krb$1@dont-email.me>
<s6vpfb$cvd$1@dont-email.me> <s72qbr$ehq$1@dont-email.me>
<s73287$itf$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 7 May 2021 16:54:51 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="22c17f3c7b2dbd5236a02e20232d8761";
logging-data="8310"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18IID+XZqnk4XMc2sDTIEgPzvHTrppXolU="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.8.1
Cancel-Lock: sha1:6OtwSMIoYhCzCsQkP36+T9BF7hg=
In-Reply-To: <s73287$itf$1@gioia.aioe.org>
Content-Language: en-US

by: Marcus - Fri, 7 May 2021 16:54 UTC

On 2021-05-07, Terje Mathisen wrote:
> Marcus wrote:
>> In MRISC32 I have "float32", "float16" and "float8" (the shorter formats
>> usually come in packed form). The 32-bit and 16-bit formats are
>> identical to binary32 and binary16 in IEEE 754, respectively, while
>> I had to make a custom format for the 8-bit variant - which happens to
>> be identical to your "FP8S" format (i.e. S.E4.F3, bias=7).
>
> With FP16 presumably 1:5:10 right?

Yes (1:5:10, bias 15), as in IEEE 754 (and as specified in section 1.2.4
of the MRISC32 Instruction Set Manual
https://mrisc32.bitsnbites.eu/doc/mrisc32-instruction-set-manual.pdf ).

>>
>> I wonder: Are there any common/official 8-bit IEEE-style
>> (sign+exponent+fraction) FP formats that are gaining traction (ignoring
>> posits)?
>
> FP8 could be either 1:3:4 or 1:4:3, but for full ieee compliance you
> really want at least 2n+3 bits in the mantissa when going up in size, so
> your choice is the best one imho.
>
> Having 2n+3 means that you never get into double rounding problems by
> doing any single operation in the larger size, then rounding back down
> to the target precision.

Good to know - I didn't think of that. I mostly went with the
configuration that made the most sense: The main advantage (IMO) of
floating-point numbers compared to fixed point numbers is the increased
dynamic range, and to get an increase of the dynamic range compared to
8-bit signed integers you need at least four exponent bits (and using
more than that just seemed silly).

>
> The smallest possible ieee format would need at least sign + two
> exponent and two mantissa bits, so FP5 = 1:2:2.
>
> This format would still support all of
> QNaN/SNaN/Inf/Normal/Sub-normal/Zero.
>
> Terje
>

/Marcus

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<s73s6e$dqk$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16514&group=comp.arch#16514

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common
integer constants)
Date: Fri, 7 May 2021 10:08:31 -0700
Organization: A noiseless patient Spider
Lines: 53
Message-ID: <s73s6e$dqk$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me>
<6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me>
<77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 7 May 2021 17:08:30 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="04b317aacfd0a41fc344a6881bfdfa79";
logging-data="14164"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19kn90ASqGdZJ2kT6p6dgxu3wZqHL2q950="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.1
Cancel-Lock: sha1:w5m+zk9t0XSiJT0tJzzHU90Xz4g=
In-Reply-To: <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
Content-Language: en-US

by: Stephen Fuld - Fri, 7 May 2021 17:08 UTC

On 5/7/2021 6:15 AM, Stefan Monnier wrote:
>> It is possible that it could be generalized, but then the ISA would be less
>> RISC style than it is already...
>>
>> Then again, I did see recently that someone had listed my ISA somewhere, but
>> then classified it as a CISC; I don't entirely agree, but alas...
>>
>> Then again, many people's definitions of "RISC" exclude largish
>> instruction-sets with variable-length instruction encodings, so alas.
>>
>> But, taken at face value, then one would also need to exclude Thumb2 and
>> similar from the RISC category.
>
> I think it's better not to worry about how other people label your ISA.
>
> The manichean RISC-vs-CISC labeling is stupid anyway: the design space
> has many more than 2 spots.

Amen.

> Also I think the "classic RISC" is the result of specific conditions at
> that point in time which resulted in a particular sweet spot in the
> design space.

As was "CISC", at least in the form of the 8080, which became, through
compatibility concerns, the X86.

> But design constraints are quite different now, so applying the same
> "quantitative approach" that led to MIPS/SPARC/... will now result in
> something quite different.

Yes. Note that some of the original RISC dogmas were eliminated pretty
quickly, e.g. single-cycle instructions, once chip densities increased
enough that it made sense to include multiply/divide instructions.

But the hardest part is to anticipate where the technology is going so
as to try to make your design reasonably optimal as technology evolves,
while still making something that works well enough in the short term to
be able to get to the long term.

> One could argue that such new ISAs
> could be called RISC-2020 ;-)

Or just not try to name them at all! :-)

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: FP8 (was Compact representation for common integer constants)

<s73tde$mtt$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16515&group=comp.arch#16515

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Fri, 7 May 2021 12:29:11 -0500
Organization: A noiseless patient Spider
Lines: 48
Message-ID: <s73tde$mtt$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6utud$keb$1@dont-email.me> <s6v2o1$krb$1@dont-email.me>
<s6vpfb$cvd$1@dont-email.me> <s72qbr$ehq$1@dont-email.me>
<s73287$itf$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 7 May 2021 17:29:18 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="e2a3090e0f7b3aa018204bf333c0c384";
logging-data="23485"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18twqtb90l60VjVOaPAGh5+"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1:nO1YzVlUY27oiwjEZcFAdApVOnY=
In-Reply-To: <s73287$itf$1@gioia.aioe.org>
Content-Language: en-US

by: BGB - Fri, 7 May 2021 17:29 UTC

On 5/7/2021 4:45 AM, Terje Mathisen wrote:
> Marcus wrote:
>> In MRISC32 I have "float32", "float16" and "float8" (the shorter formats
>> usually come in packed form). The 32-bit and 16-bit formats are
>> identical to binary32 and binary16 in IEEE 754, respectively, while
>> I had to make a custom format for the 8-bit variant - which happens to
>> be identical to your "FP8S" format (i.e. S.E4.F3, bias=7).
>
> With FP16 presumably 1:5:10 right?

At least in my case, yes.

S.E4.F3 has just enough exponent bits to provide a semi-useful dynamic
range.

Whereas S.E3.F4 is only really usable if the bias is explicit /
modifiable. Otherwise, with a fixed bias of 3 or similar, the dynamic
range kinda sucks too much to be useful for all that much.
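To put rough numbers on the dynamic-range difference, here is a Python sketch that computes the smallest positive and largest finite values of a small format. It assumes IEEE-style conventions (subnormals present, all-ones exponent reserved for Inf/NaN), which may not match how either format is actually implemented; the function name is made up.

```python
def fp_range(exp_bits, frac_bits, bias):
    """Return (smallest positive subnormal, largest finite value).

    Assumes IEEE-style encoding: exponent 0 holds subnormals and the
    all-ones exponent is reserved for Inf/NaN.
    """
    min_pos = 2.0 ** (1 - bias) * 2.0 ** (-frac_bits)
    max_exp = (1 << exp_bits) - 2            # largest non-reserved exponent
    max_fin = (2.0 - 2.0 ** (-frac_bits)) * 2.0 ** (max_exp - bias)
    return min_pos, max_fin

e4f3 = fp_range(4, 3, 7)   # S.E4.F3, bias 7
e3f4 = fp_range(3, 4, 3)   # S.E3.F4, bias 3
```

Under these assumptions S.E4.F3 spans 2^-9 up to 240, while S.E3.F4 with a fixed bias of 3 spans only 2^-6 up to 15.5, a ratio of about 123,000 versus about 1,000.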

There are not currently any operations which perform computations on
FP8; for now it is mostly considered a format for storage and
inline vector constants. In these cases, the actual vector computations
would generally be done using FP16 / Binary16 vectors.

In this case though, there was some bias in the choice towards what
would be useful for things like RGBA color data and similar.

Re: FP8 (was Compact representation for common integer constants)

<s73u8m$tnj$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16516&group=comp.arch#16516

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Fri, 7 May 2021 10:43:50 -0700
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <s73u8m$tnj$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6utud$keb$1@dont-email.me> <s6v2o1$krb$1@dont-email.me>
<s6vpfb$cvd$1@dont-email.me> <s72qbr$ehq$1@dont-email.me>
<s73287$itf$1@gioia.aioe.org> <s73rcr$83m$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 7 May 2021 17:43:50 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="31cd803f72e0a9db50b83817be64d3f7";
logging-data="30451"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+bC0kr4CeiTgsrUvX/zj8IeMyYQVArmvQ="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.1
Cancel-Lock: sha1:RznyRIBmIASzWDGpR6T0zOh59E4=
In-Reply-To: <s73rcr$83m$1@dont-email.me>
Content-Language: en-US

by: Stephen Fuld - Fri, 7 May 2021 17:43 UTC

On 5/7/2021 9:54 AM, Marcus wrote:
> On 2021-05-07, Terje Mathisen wrote:

snip

>> FP8 could be either 1:3:4 or 1:4:3, but for full ieee compliance you
>> really want at least 2n+3 bits in the mantissa when going up in size,
>> so your choice is the best one imho.
>>
>> Having 2n+3 means that you never get into double rounding problems by
>> doing any single operation in the larger size, then rounding back down
>> to the target precision.
>
> Good to know - I didn't think of that. I mostly went with the
> configuration that made the most sense: The main advantage (IMO) of
> floating-point numbers compared to fixed point numbers is the increased
> dynamic range,

Isn't that the *only* reason to do floating point?

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<2021May7.195613@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=16518&group=comp.arch#16518

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)
Date: Fri, 07 May 2021 17:56:13 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 48
Message-ID: <2021May7.195613@mips.complang.tuwien.ac.at>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com> <s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me> <6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com> <s71imq$3b4$1@dont-email.me> <77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com> <s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
Injection-Info: reader02.eternal-september.org; posting-host="6a2c35625d0ae004fed12eeed502541b";
logging-data="21613"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/mbg4xqSj+6ak8/WOL1csM"
Cancel-Lock: sha1:IkaFhF+5z/fuROLYQOEVDFRdvcg=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Fri, 7 May 2021 17:56 UTC

Stefan Monnier <monnier@iro.umontreal.ca> writes:
>Also I think the "classic RISC" is the result of specific conditions at
>that point in time which resulted in a particular sweet spot in the
>design space.

There is some truth to this: For CPUs that perform well on
general-purpose code, RISC gave a big performance advantage in the
mid-late 1980s. That advantage eroded as Moore's law gave more
transistors to CISCs, allowing CISCs to implement pipelining (486
1989), superscalar (Pentium 1993), and OoO execution (Pentium Pro
1995).

OTOH, the CISCs did not manage to conquer low-power space, despite
attempts by both Intel (Bonnell, Silvermont ff.) and AMD (Bobcat ff.).
And they also do not compete in the low-area microcontroller market.
AFAIK every Ryzen has several ARM-based microcontrollers inside that
manage various aspects of its operation.

>But design constraints are quite different now, so applying the same
>"quantitative approach" that led to MIPS/SPARC/... will now result in
>something quite different. One could argue that such new ISAs
>could be called RISC-2020 ;-)

Hmm, RISC-V is pretty close to MIPS (and Patterson argues that it is
pretty close to Berkeley RISC). Maybe the argument is: If the ISA
does not matter much for high-performance, high-power, high-area
cores, because we can fix it in the decoder, better optimize the ISA
for lower-power lower-area applications, where RISC still gives
benefits (of course, if you go for lowest area, you go for something
like the b16-small, but apparently people are not that keen on having
something smaller than a Cortex-M3 or so).

The other entry in the ISA field in recent years is Aarch64, which is
a RISC, and in certain ways closer to MIPS/SPARC than to the original
ARM ISA, but in others it's quite different: condition codes, double
loads and double stores, many addressing modes. And from this ISA
camp we see the A14 Firestorm core which has a significantly higher
IPC than Intel's and AMD's offerings, admittedly at a lower peak clock
rate, but also at a lower power consumption; an 8-wide execution
engine with a very deep reorder buffer (630 entries compared to 352
for Intel's Sunny Cove and 256 for AMD's Zen3) is probably a reason
for this, but the question remains: Was the ISA helpful (or less of a
hindrance than AMD64) for building such a wide and deep core?

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: The old RISC-vs-CISC (was: Compact representation for common integer constants)

<s747f6$5ri$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=16519&group=comp.arch#16519


Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: The old RISC-vs-CISC (was: Compact representation for common
integer constants)
Date: Fri, 7 May 2021 15:20:45 -0500
Organization: A noiseless patient Spider
Lines: 133
Message-ID: <s747f6$5ri$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s713uv$707$1@gal.iecc.com> <s719lp$bg$1@dont-email.me>
<6ebdf17e-3188-44d2-b946-3c2e9e104672n@googlegroups.com>
<s71imq$3b4$1@dont-email.me>
<77cd652a-a3c4-48a9-a088-58fe96562dc7n@googlegroups.com>
<s72mv0$qai$1@dont-email.me> <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 7 May 2021 20:20:54 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="e2a3090e0f7b3aa018204bf333c0c384";
logging-data="6002"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/nKU8MlfLJph0EpJznj7pX"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
Cancel-Lock: sha1:xuEE7LM+/AodL52xUA95HW4/zIU=
In-Reply-To: <jwv5yzuae2l.fsf-monnier+comp.arch@gnu.org>
Content-Language: en-US

by: BGB - Fri, 7 May 2021 20:20 UTC

On 5/7/2021 8:15 AM, Stefan Monnier wrote:
>> It is possible that it could be generalized, but then the ISA would be less
>> RISC style than it is already...
>>
>> Then again, I did see recently that someone had listed my ISA somewhere, but
>> then classified it as a CISC; I don't entirely agree, but alas...
>>
>> Then again, many people's definitions of "RISC" exclude largish
>> instruction-sets with variable-length instruction encodings, so alas.
>>
>> But, taken at face value, then one would also need to exclude Thumb2 and
>> similar from the RISC category.
>
> I think it's better not to worry about how other people label your ISA.
>
> The manichean RISC-vs-CISC labeling is stupid anyway: the design space
> has many more than 2 spots.
>
> Also I think the "classic RISC" is the result of specific conditions at
> that point in time which resulted in a particular sweet spot in the
> design space.
>
> But design constraints are quite different now, so applying the same
> "quantitative approach" that led to MIPS/SPARC/... will now result in
> something quite different. One could argue that such new ISAs
> could be called RISC-2020 ;-)
>

Yeah.

Classic RISC:
  Fixed size instructions
  Typically only a single addressing mode (Reg, Disp)
  Typically only supporting aligned memory access
  Aims for fixed 1 instruction per cycle
  ...

BJX2:
  Variable length
    16/32 base
    16/32/64/96 extended
    48 possible, but currently unused
      Original 48-bit ops can no longer be encoded.
  Multiple addressing modes
  Unaligned memory access (8/16/32/64)
    64/128 bit alignment for 128-bit ops.
  Supports explicitly parallel instructions
  ...

It has a few features in common with VLIW architectures as well, but
differs from traditional VLIW in that these typically use a fixed-size
bundle encoding, whereas BJX2 bundles are variable-length (composed of
32-bit units), with the 64 and 96 bit instruction encodings being
effectively a special-case of the bundle encoding.

There are several DSP architectures which do something similar to this:
Xtensa, Hexagon, ...

Though, originally, inclination toward VLIW support was based on taking
inspiration from the TMS320C6x and IA64, which, granted, use a
fixed-size bundle encoding.

Some of my earlier ideas involved using a more traditional bundle format
(just sort of awkwardly plonked into the code-stream), but this later
transformed into the current approach.
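
A minimal Python sketch of the difference between variable-length bundles
built from 32-bit units and a fixed-size bundle format. The flag bit used
here (bit 31 meaning "another word follows in this bundle") and the name
split_bundles are assumptions made up for illustration; this is not the
actual BJX2 encoding.

```python
# ASSUMPTION: bit 31 of each 32-bit word is a "more words follow" flag.
# This layout is hypothetical and only illustrates the general idea.
CONTINUE_BIT = 1 << 31


def split_bundles(words):
    """Group a stream of 32-bit words into variable-length bundles.

    A bundle ends at the first word whose continue flag is clear, so a
    lone 32-bit op is simply a bundle of length 1, and a 64- or 96-bit
    encoding is a bundle of 2 or 3 words.
    """
    bundles, current = [], []
    for w in words:
        current.append(w)
        if not (w & CONTINUE_BIT):
            bundles.append(current)
            current = []
    if current:
        raise ValueError("stream ends in the middle of a bundle")
    return bundles


def split_fixed(words, n):
    """Group words into fixed-size bundles of n words (traditional VLIW)."""
    if len(words) % n != 0:
        raise ValueError("stream is not a whole number of bundles")
    return [words[i:i + n] for i in range(0, len(words), n)]
```

With the fixed format every bundle occupies n words whether or not all
slots are useful; with the flagged format the bundle length follows the
amount of parallelism actually encoded.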

A recent experiment also tested 24-bit instructions. As noted
elsewhere, they were effective in themselves at reducing the number of
32-bit ops in size-optimized code, but because 32-bit ops are already
the minority in the size-optimized case, the net savings were fairly
small (and they ran into issues with the baked-in assumption that
branch targets are 16-bit aligned, *, ...).

*: One either needs to jostle around with re-encoding the past several
instructions to re-align the instruction stream, or insert a 24-bit NOP
(if the former fails), or use 24-bit byte-aligned branch encodings
(which ended up costing more than they saved vs the "reencode ops to fix
alignment to allow for using the 16-bit branch ops" strategy).
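
The alignment problem in the footnote can be sketched in Python. This
only models instruction sizes in bytes (2, 3 or 4) and the fallback
strategy of inserting a 24-bit NOP; the re-encoding strategy is not
modeled, and the function names are made up for illustration.

```python
NOP24 = 3  # size in bytes of a 24-bit NOP


def offsets(sizes):
    """Return the starting byte offset of each instruction."""
    out, pos = [], 0
    for s in sizes:
        out.append(pos)
        pos += s
    return out


def misaligned_targets(sizes, targets):
    """Return the branch-target indices that are not 16-bit aligned."""
    offs = offsets(sizes)
    return [t for t in targets if offs[t] % 2 != 0]


def pad_with_nops(sizes, targets):
    """Insert a 24-bit NOP before every target that lands on an odd byte.

    Returns the new list of sizes and the target indices remapped into
    that list. A 3-byte NOP flips the parity of the offset, so one NOP
    per misaligned target is always enough.
    """
    target_set = set(targets)
    new_sizes, new_targets, pos = [], [], 0
    for i, s in enumerate(sizes):
        if i in target_set:
            if pos % 2 != 0:
                new_sizes.append(NOP24)
                pos += NOP24
            new_targets.append(len(new_sizes))
        new_sizes.append(s)
        pos += s
    return new_sizes, new_targets
```

For example, with sizes [2, 3, 4, 3, 2] and branch targets at indices 2
and 4, only index 2 starts out misaligned, but padding it shifts index 4
onto an odd offset as well, so two NOPs (6 bytes) get inserted. That
sort of knock-on cost is one way the savings from 24-bit ops can be
eaten up.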

As for resource cost:
My current BJX2 core costs ~ 4x as much as a minimalist 32-bit RISC
style core.

Or, basically, if I go for:
  Fixed-length instructions
  One instruction at a time
  16x 32-bit GPRs
  Aligned-only memory access
  No Variable-Shift or FPU ops
  No MMU
  ...

It is possible to use ~ 1/4 the LUTs of the current (full-featured) BJX2
core. The difference isn't quite drastic enough to make a strong use
case for the minimalist core (even as a secondary "IO controller" or
similar).

I can also get a lot of this back (eg, fitting a microcontroller subset
of BJX2 on an XC7S25), mostly by disabling WEX and the MMU and similar.

By disabling the FPU and a few other things, it is also possible to
shoe-horn it into an XC7A15 (basically, the smallest Xilinx FPGA I can
really manage to find on FPGA dev-boards "in the wild").

This in turn creates a disincentive to have a separate 32-bit ISA, vs
using a slightly more restrictive subset of the 64-bit ISA.

I had also looked briefly at trying to do a 16-bit IO controller, but
then ran into the problem that there is basically no real good way to
plug a 16-bit core into my existing bus architecture.

And, it seems, short of targeting something like an iCE40 or similar,
there isn't much point.

Not sure how this compares with the ASIC space, but I suspect with
modern technologies this is probably "grain of sand" territory.

Could matter more if one is having their logic printed onto a plastic
substrate using semiconductor inks, but given these technologies (in
their commercial form) are managing things like Cortex-M cores, it
probably isn't too major of an issue.
Similarly, building such a printer would currently appear to be too
expensive for the hobbyist space.

...
