Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Extended double precision decimal floating point
Date: Wed, 27 Apr 2022 13:25:24 -0500
Organization: A noiseless patient Spider
Lines: 282
Message-ID: <t4c1qq$ukj$1@dont-email.me>
References: <d1790e73-11b1-497d-abd0-a349fedf750cn@googlegroups.com>
<26231a57-6317-4c08-8c37-4508ac3e0a6en@googlegroups.com>
<t45m4t$1057$1@gioia.aioe.org> <jwvee1kkoni.fsf-monnier+comp.arch@gnu.org>
<t48qqk$1bhh$1@gioia.aioe.org> <jwv1qxjyir3.fsf-monnier+comp.arch@gnu.org>
<bc0883ee-9ea3-404a-8a69-556988acad6cn@googlegroups.com>
<98806614-aa2d-4c7d-b6d4-081dd892c1cen@googlegroups.com>
<2022Apr26.223557@mips.complang.tuwien.ac.at>
<89119ea8-992f-41a1-9e6d-51f6df7712abn@googlegroups.com>
<2022Apr27.154724@mips.complang.tuwien.ac.at>
<bbcbad97-c906-42e6-9d11-ee3f6bb69d60n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 27 Apr 2022 18:25:30 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="9a5b4e9d202533f8d668ce79bab07cb7";
logging-data="31379"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18ILf5H+d3eNzP0oI8pmswj"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.8.0
Cancel-Lock: sha1:qLEi9Fx/eRCKYr4LXnlykH9bzaU=
In-Reply-To: <bbcbad97-c906-42e6-9d11-ee3f6bb69d60n@googlegroups.com>
Content-Language: en-US

On 4/27/2022 10:51 AM, Michael S wrote:
> On Wednesday, April 27, 2022 at 5:36:59 PM UTC+3, Anton Ertl wrote:
>> Michael S <already...@yahoo.com> writes:
>>> On Tuesday, April 26, 2022 at 11:40:55 PM UTC+3, Anton Ertl wrote:
>>>> Michael S <already...@yahoo.com> writes:
>>>> [reformatted for conventional Usenet line length]
>>>>> Another reason to prefer sign-magnitude over two-complement as
>>>>> internal format for Big Integer library is weak support for mixed
>>>>> signed-unsigned arithmetic in many instruction sets and even worse
>>>>> support in all popular HLLs that are likely candidates for
>>>>> implementation language of Big Integer libraries.
>>>> I don't see why you would need such instructions for implementing
>>>> twos-complement big-integers.
>>>>
>>>> I don't see a single reason for sign-magnitude big-integer
>>>> representation.
>> ...
>> [Again, reformatted]
>>> So, what would be your code for, say, multiplication, when numbers
>>> are stored as two-complements? Can you sketch it here? Leave away
>>> management, memory allocation, prescaling, postscaling, etc... Just
>>> show two inner loops of convolution.
>> Why would big-integer arithmetic have any scaling?
>
> Pre-scaling likely not needed, at least for multiplication.
> Post-scaling is because upfront you don't know an exact number of words in
> result. If the number is stored with LS word first then post-scaling only needs
> to update metadata. But for data stored with MS word first it can require
> a word shift.
>

Don't do this.

This is a case where low-high ordering makes more sense.

>>
>> For multiplication, multiplying a two-word integer a with a two-word
>> integer b (giving a four-word integer c) should display all the
>> interesting cases, so I'll show that.
>>
>> c1,c2,c3,c4 are two-word values
>> al and ah are the low and high words of a; likewise for bl and bh
>> umul is an unsigned multiplication of two words, with a two-word result
>
> Complication of 2c is more obvious when length of inputs and outputs is variable.
> But as example, 2x2 will do.
>

You could simplify things by making all of the bignums power-of-2
sizes, and by padding inputs to the same size in cases where they differ.

An alternative (more relevant to _BitInt) is to expand both to the
destination width first and then do the multiply.

Though, _BitInt doesn't expand to Len1+Len2, but rather Max(Len1,Len2).

In my case, the underlying storage for _BitInt(N) is, effectively:
    1..  8: int8_t/uint8_t (char)
    9.. 16: int16_t/uint16_t (short)
   17.. 32: int32_t/uint32_t (int)
   33.. 64: int64_t/uint64_t (long long)
   65..128: int128_t/uint128_t (__int128)
     129+: Pad to a multiple of 128 bits.

Sizes were not forced to powers of 2 in this case, so, e.g.:
  256, 384, 512, 640, ...

One could argue whether power-of-2 padding would be better; however, it
would be slower, and at these large sizes the memory cost is big enough
that the difference between power-of-2 and non-power-of-2 padding might
be noticeable.

Signed values remain two's complement.

At the ABI level, these larger sizes were handled similarly to structs
(pass by reference and copy as needed).

If I added a _Decimal(N) type, it would likely be similar:
    1..16: __bcd64 (64-bit storage)
   17..32: __bcd128 (128-bit storage)
      33+: Pad to a multiple of 128 bits (32 digits).

Though, a more generalized _Decimal(N,P) type would have the issue that
the type now also needs to encode a decimal-point position.

I could cut a few bits off the length field to encode the location of
the decimal point.

As-is, _BitInt is encoded (in the typesystem) like an array, just with
the array length encoding the length of the bit vector (rather than a
number of elements of the base type).

However, modest size 1D arrays are encoded directly as a bit-field
within the "type ID word", whereas 2D arrays need to be encoded via the
"type overflow table" (itself a finite size).

So, it is preferable to leave it encoded as if it were a 1D array.
Multiple base types could be used to encode how many bits are cut off
for the decimal point, say:
TY_DECIMAL_P0 //0 bits for decimal point (max=4095 digits, pt=0)
TY_DECIMAL_P4 //4 bits for decimal point (max=255 digits, pt=0..15)
TY_DECIMAL_P6 //6 bits for decimal point (max=63 digits, pt=0..63)
TY_DECIMAL_PA //Encoded as a 2D array

Note that arrays of these types would invariably need to use the
overflow table.

Sign for signed decimal numbers would most likely be 10s complement.

There is probably no compelling reason to justify storage sizes smaller
than 64 bits for this.

>>
>> c1 = umul(al,bl);
>> c2 = umul(ah,bl);
>> c3 = umul(al,bh);
>> c4 = umul(ah,bh);
>> if (signbit(ah))
>>   c4 -= b;
>> if (signbit(bh))
>>   c4 -= a;
>> c = c4<<(2*wordwidth) + (c3+c2)<<wordwidth + c1;
>>
>> I leave it to you to map the last line efficiently onto the
>> architecture at hand. And of course you want to eliminate the top
>> word of c if it is just a sign extension of the next word.
>
> Yes, that's one possible trick. Relatively simple, but not free in runtime cost.
> Esp. non-free when at least one of operands is short.
>

Say, for a 256x128 multiply: pre-expand the latter operand to 256 bits,
and then do a 256x256 multiply...

And/or special-case it...

>>
>> For comparison, here is the sign-magnitude counterpart:
>>
>> ahu = ah & ~SIGNBIT;
>> bhu = bh & ~SIGNBIT;
>
> I'd expect a sign to be stored separately rather than with MS word.
> It shouldn't be hard to find a "free" room for a single bit in header/metadata
> area.
>

That probably makes more sense for a format that is variable-length at
runtime, rather than for formats that are fixed-length at compile time.

Though, if one uses a sign byte, one might also use it to store the size.

One big/ugly issue with variable-length types though is that now one
needs to deal with memory management for them.

Granted, this is where something like an "alloca" mechanism makes sense,
rather than malloc'ing them and potentially having them leak.

Though, in my case, the alloca mechanism is built on top of malloc
internally: it puts the allocations in a linked list and automatically
frees everything in the list when the function returns.

This still leaves a problem though for big-integer types in globals or
in structures.

This is less of an issue for something like _BitInt(N) as the size is
known at compile time (and it can be stored in-place).

>> c1 = umul(al,bl);
>> c2 = umul(ahu,bl);
>> c3 = umul(al,bhu);
>> c4 = umul(ahu,bhu);
>> c = c4<<(2*wordwidth) + (c3+c2)<<wordwidth + c1;
>> c = c | (((ah^bh)&SIGNBIT)<<(3*wordwidth));
>>
>> Admittedly the overhead of sign-magnitude is independent of the width
>> of a and b,
>
> Or no overhead at all, when sign not stored with limb.
>
>> while the subtractions get more expensive with longer a
>> and longer b (but the number of multiplies grows with
>> length(a)*length(b), while the overhead of the subtractions grows only
>> with length(a)+length(b).
>
> I am more concerned with complication than with speed impact.
>
>> OTOH two's-complement addition is cheaper
>> than sign-magnitude addition. Plus, I think that for most signed
>> bigint operations the involved lengths are 1;
>
> Does not sound obvious.
>
>> one could special-case
>> that by using smul and leaving the subtractions away.
>>
>> I am too lazy to think through how a mixed signed-unsigned would help.
>> Would it be as simple as:
>>
>> c1 = uumul(al,bl);
>> c2 = sumul(ah,bl);
>> c3 = usmul(al,bh);
>> c4 = ssmul(ah,bh);
>> c = c4<<(2*wordwidth) + (c3+c2)<<wordwidth + c1;
>>
>> ?
>
> Yes, you got the idea.
> For arbitrary length, utilizing mixed multiplication would be rather
> complicated, with main complication related to handling of carry/borrow,
> but if done right it can spare you of O(n+m) overhead.
> However, why bother? Just use sign-magnitude format!
>

This is unnecessary when the former strategy (just use unsigned
multiplies and then special-case the high-order results) also works well
enough.

Sign-magnitude would also be annoyingly out of place on a system where
pretty much everything else (excluding FPU types) is twos complement.

Someone could almost make a case for a 1s complement FP format:
Positive numbers: Encoded as in IEEE-754;
Negative numbers: All bits inverted.

The advantage is that one could then use a normal signed-integer compare
on floating-point numbers, and it would make some other cases where one
works with FP values via integer ops easier, ...

But, alas, any deviation from the IEEE formats would likely not go
unnoticed by software.

>>
>> In that case, the absence of such an instruction is probably due to
>> the shortness of typical signed big integers.
>
> IMO, at least for architectures that were defined more recently (30-35 years),
> a difficulty of mapping it into HLL was more important factor.
> Another factor is absence of obvious applications. Big Integer does not count
> as application for more than a single reason.
> May be, there are cases where it helps elliptic curves cryptography?
> Few years ago I played rather intensely with ECDSA verification, but already
> forgot details.
>

The main things C is lacking here, IMO, are:
  No way to express a widening multiply;
  No way to express add/subtract with explicit carry/borrow;
  ...

Some prior extensions I had seen in these areas exposed obvious
x86isms.

So, ideally, one would need to define it in a way which does not expose
the existence/contents of the EFLAGS register or similar, ...

>
>> - anton
>> --
>> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
>> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>
