
devel / comp.arch / Re: bad float, Fantasy architecture: the 10-bit byte

Subject  (Author)
* Fantasy architecture: the 10-bit byte  (Russell Wallace)
+* Re: Fantasy architecture: the 10-bit byte  (Russell Wallace)
|`* Re: Fantasy architecture: the 10-bit byte  (MitchAlsup)
| `- Re: Fantasy architecture: the 10-bit byte  (Brett)
+* Re: Fantasy architecture: the 10-bit byte  (Stephen Fuld)
|`* Re: Fantasy architecture: the 10-bit byte  (Russell Wallace)
| `* Re: Fantasy architecture: the 10-bit byte  (Marcus)
|  `* Re: Fantasy architecture: the 10-bit byte  (BGB)
|   `- Re: Fantasy architecture: the 10-bit byte  (Russell Wallace)
+* Re: Fantasy architecture: the 10-bit byte  (Anton Ertl)
|`* Re: Fantasy architecture: the 10-bit byte  (Russell Wallace)
| +- Re: Fantasy architecture: the 10-bit byte  (MitchAlsup)
| +* Re: Fantasy architecture: the 10-bit byte  (BGB)
| |`* Re: Fantasy architecture: the 10-bit byte  (Thomas Koenig)
| | `* Re: Fantasy architecture: the 10-bit byte  (BGB)
| |  `* Re: Fantasy architecture: the 10-bit byte  (Anton Ertl)
| |   `* Re: Fantasy architecture: the 10-bit byte  (BGB)
| |    +* Re: Fantasy architecture: the 10-bit byte  (robf...@gmail.com)
| |    |+* Re: Fantasy architecture: the 10-bit byte  (Michael S)
| |    ||+- Re: Fantasy architecture: the 10-bit byte  (robf...@gmail.com)
| |    ||+- Re: Fantasy architecture: the 10-bit byte  (BGB)
| |    ||`* Re: Fantasy architecture: the 10-bit byte  (Terje Mathisen)
| |    || +- Re: Fantasy architecture: the 10-bit byte  (MitchAlsup)
| |    || +* Re: Fantasy architecture: the 10-bit byte  (Michael S)
| |    || |+* Re: Fantasy architecture: the 10-bit byte  (MitchAlsup)
| |    || ||`- Re: Fantasy architecture: the 10-bit byte  (BGB)
| |    || |`* Re: Fantasy architecture: the 10-bit byte  (Terje Mathisen)
| |    || | +- Re: Fantasy architecture: the 10-bit byte  (JimBrakefield)
| |    || | `* Re: Fantasy architecture: the 10-bit byte  (Michael S)
| |    || |  `* Re: Fantasy architecture: the 10-bit byte  (Terje Mathisen)
| |    || |   +* Re: Fantasy architecture: the 10-bit byte  (Michael S)
| |    || |   |`- Re: Fantasy architecture: the 10-bit byte  (Anton Ertl)
| |    || |   +* Re: Fantasy architecture: the 10-bit byte  (Michael S)
| |    || |   |`* Re: Fantasy architecture: the 10-bit byte  (Terje Mathisen)
| |    || |   | `* Re: Fantasy architecture: the 10-bit byte  (Michael S)
| |    || |   |  +* Re: Fantasy architecture: the 10-bit byte  (Terje Mathisen)
| |    || |   |  |+- Re: Fantasy architecture: the 10-bit byte  (MitchAlsup)
| |    || |   |  |`- Re: Fantasy architecture: the 10-bit byte  (Michael S)
| |    || |   |  `- Re: Fantasy architecture: the 10-bit byte  (MitchAlsup)
| |    || |   `- Re: Fantasy architecture: the 10-bit byte  (BGB)
| |    || `* Re: Fantasy architecture: the 10-bit byte  (BGB)
| |    ||  `- Re: Fantasy architecture: the 10-bit byte  (robf...@gmail.com)
| |    |`- Re: Fantasy architecture: the 10-bit byte  (MitchAlsup)
| |    `* Re: Fantasy architecture: the 10-bit byte  (Anton Ertl)
| |     `- Re: Fantasy architecture: the 10-bit byte  (BGB)
| `- Re: Fantasy architecture: the 10-bit byte  (Anton Ertl)
+* Re: Fantasy architecture: the 10-bit byte  (John Levine)
|+* Re: Fantasy architecture: the 10-bit byte  (MitchAlsup)
||+- Re: Fantasy architecture: the 10-bit byte  (Quadibloc)
||+* Re: Fantasy architecture: the 10-bit byte  (Russell Wallace)
|||`* Re: Fantasy architecture: the 10-bit byte  (robf...@gmail.com)
||| `* Re: old circuits, Fantasy architecture: the 10-bit byte  (John Levine)
|||  +- Re: old circuits, Fantasy architecture: the 10-bit byte  (Stephen Fuld)
|||  `- Re: old circuits, Fantasy architecture: the 10-bit byte  (BGB)
||`* Re: Fantasy architecture: the 10-bit byte  (Scott Lurndal)
|| `- Re: 12 bits, Fantasy architecture: the 10-bit byte  (John Levine)
|+- Re: Fantasy architecture: the 10-bit byte  (Anton Ertl)
|`- Re: Fantasy architecture: the 10-bit byte  (mac)
+* Re: Fantasy architecture: the 10-bit byte  (Thomas Koenig)
|+* Re: bad float, Fantasy architecture: the 10-bit byte  (John Levine)
||+- Re: bad float, Fantasy architecture: the 10-bit byte  (Tim Rentsch)
||`* Re: bad float, Fantasy architecture: the 10-bit byte  (Quadibloc)
|| +* Re: bad float, Fantasy architecture: the 10-bit byte  (Quadibloc)
|| |`- Re: bad float, Fantasy architecture: the 10-bit byte  (David Brown)
|| `* Re: bad float, Fantasy architecture: the 10-bit byte  (Anton Ertl)
||  +* Re: bad float, Fantasy architecture: the 10-bit byte  (John Levine)
||  |`* Re: bad float, Fantasy architecture: the 10-bit byte  (MitchAlsup)
||  | +- Re: bad float, Fantasy architecture: the 10-bit byte  (Quadibloc)
||  | `- Re: bad float, Fantasy architecture: the 10-bit byte  (Scott Lurndal)
||  +* Re: bad float, Fantasy architecture: the 10-bit byte  (Quadibloc)
||  |+- Re: bad float, Fantasy architecture: the 10-bit byte  (Stephen Fuld)
||  |`- Re: science and commerce, was bad float, Fantasy architecture: the 10-bit byte  (John Levine)
||  `- Re: bad float, Fantasy architecture: the 10-bit byte  (JimBrakefield)
|`* Re: Fantasy architecture: the 10-bit byte  (Quadibloc)
| `* Re: Fantasy architecture: the 10-bit byte  (Stephen Fuld)
|  `* Re: Fantasy architecture: the 10-bit byte  (Quadibloc)
|   `- Re: Fantasy architecture: the 10-bit byte  (Quadibloc)
+- Re: Fantasy architecture: the 10-bit byte  (EricP)
`- Re: Fantasy architecture: the 10-bit byte  (Paul A. Clayton)

Re: Fantasy architecture: the 10-bit byte

<tnq4um$d0c9$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29644&group=comp.arch#29644

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Fantasy architecture: the 10-bit byte
Date: Mon, 19 Dec 2022 10:53:41 -0600
Organization: A noiseless patient Spider
Lines: 110
Message-ID: <tnq4um$d0c9$1@dont-email.me>
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at>
<e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 19 Dec 2022 16:53:42 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="8aab342171ee03fe9a468bf9b13e8277";
logging-data="426377"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+jyj8ciw9TfdERKmHLc992"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.6.0
Cancel-Lock: sha1:DTXnPF8MewlxnT5j8d7bXx9bDlw=
Content-Language: en-US
In-Reply-To: <tnp38h$160bb$1@newsreader4.netcologne.de>
 by: BGB - Mon, 19 Dec 2022 16:53 UTC

On 12/19/2022 1:18 AM, Thomas Koenig wrote:
> BGB <cr88192@gmail.com> schrieb:
>> On 12/18/2022 5:35 PM, Russell Wallace wrote:
>>> On Sunday, December 18, 2022 at 6:03:50 PM UTC, Anton Ertl wrote:
>>>> Russell Wallace <russell...@gmail.com> writes:
>>>>> I saw Fred Brooks in an interview remark that one of his few regrets about
>>>>> the 8-bit byte was that 64-bit floating point is not quite enough for double
>>>>> precision.
>>>>
>>>> Those who made the IEEE 754 standard thought differently.
>>>
>>> Did they? Look at the resources Intel spent on getting 80-bit floating point into the 8087.
>>>
>>
>> For most use cases, 80 bit is overkill.
>>
>> Meanwhile:
>> 64-bit can do "nearly everything";
>> 32-bit is "usually sufficient";
>> 16-bit is "often sufficient", but more hit/miss.
>>
>>
>> I suspect likely Binary16 would have been far more popular had it been
>> popular 20 years earlier.
>>
>>
>> Not sure why Binary16 wasn't a thing in the 80s, given values were
>> (generally) smaller back then.
>
> For scientific or engineering computations, 32-bit floats are
> barely adequate, or they don't work at all. This is why a lot
> of scientific code is in 64-bit. 36-bit floats were much better.
>

But, the IBM PC was mostly intended for home and business users? ...

Or, I think it was more like: the main PC was intended for business and the
PCjr for home users, but the PCjr was a flop, with people either going for
the full PC or for PC clones.

Granted, I guess it is possible Intel could have had markets for this
other than the IBM PC ?...

> And binary16 is only useful for very limited applications, like neural
> networks. Even calculating everyday geometries will get you into
> trouble.

They are usable IME for:
Neural Nets (*1);
Pixel calculations;
Audio filtering;
...

*1: Though, in my own testing, it is necessary to fiddle with things to
prevent the net from training itself in ways where values go out of range
(in effect, the training algorithm also needs to use Binary16).

For 3D geometry, they work well enough if the 3D model isn't too large
or doesn't require too much detail.

For something like a character model, no one is likely to notice.

For something like scene geometry, one usually needs a little more than
this. For things like the transformation matrices, one really needs full
single precision if possible.

But, with ~ 3 significant figures, they should be fairly widely
applicable to many sorts of problems (along with the common tendency of
people to express typical values as x.xx*10^y or similar).

And, also all of the people that use 3.14 as their standard value for
PI, 2.72 for E, ... And also often don't want answers that are much
longer than 2 or 3 digits.

Granted, the dynamic range for Binary16 isn't particularly large.

But, as noted, for some things it wouldn't really work. For example, it
couldn't give cent-accurate totals when adding up the values on a typical
shopping receipt, ...

But, for many of the types of problems where one might otherwise use,
say, 13.3 fixed point or similar, Binary16 could have been a usable
alternative.

More just a question of why Binary32 was seemingly seen as the minimum
here, in an era where aggressive cost optimization of pretty much
everything seemed like a sensible option.

Or, for x87, why it bothered with a bunch of complex operators (such as
FSIN and FCOS, ...), when presumably the FPU could have been simpler and
cheaper had they had people to do most of these in software ?...

Or, if Binary16 itself would have added cost, why not have had an
instruction that truncated the Binary32 format to 16 bits on store, and
padded it with zeroes on load? It shouldn't have added too much
additional cost to whatever mechanism they were using to Load/Store
values to RAM.
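
As a rough editorial sketch (not part of the original post), such a
truncate-on-store / zero-pad-on-load pair would look like this in C,
assuming IEEE-754 Binary32; keeping only the top 16 bits is essentially
the "BFloat16" layout mentioned later in the thread:

  #include <stdint.h>
  #include <string.h>

  /* Store: keep only the top 16 bits of a Binary32 (sign, 8 exponent
     bits, 7 mantissa bits); the low 16 mantissa bits are dropped. */
  static uint16_t f32_store_trunc16(float f)
  {
      uint32_t bits;
      memcpy(&bits, &f, sizeof bits);   /* reinterpret the float's bits */
      return (uint16_t)(bits >> 16);    /* truncate, no rounding */
  }

  /* Load: pad the low 16 bits with zeroes to rebuild a Binary32. */
  static float f32_load_pad16(uint16_t h)
  {
      uint32_t bits = (uint32_t)h << 16;
      float f;
      memcpy(&f, &bits, sizeof f);
      return f;
  }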

Re: Fantasy architecture: the 10-bit byte

<2022Dec19.191001@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=29652&group=comp.arch#29652

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Fantasy architecture: the 10-bit byte
Date: Mon, 19 Dec 2022 18:10:01 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 52
Message-ID: <2022Dec19.191001@mips.complang.tuwien.ac.at>
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com> <2022Dec18.175256@mips.complang.tuwien.ac.at> <e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com> <tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de> <tnq4um$d0c9$1@dont-email.me>
Injection-Info: reader01.eternal-september.org; posting-host="7a5f832fb5b53f4f42cfbffecec3acfb";
logging-data="448853"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18GLi6akn++mg5zHhYTo8+B"
Cancel-Lock: sha1:eNyOLsITrrQQbMBZx58vblLu7IU=
X-newsreader: xrn 10.11
 by: Anton Ertl - Mon, 19 Dec 2022 18:10 UTC

BGB <cr88192@gmail.com> writes:
[8087]
>Granted, I guess it is possible Intel could have had markets for this
>other than the IBM PC ?...

Certainly.

Intel started the 8087 project in 1977 and the 8087 was launched in
1980, before the IBM PC (1981). And the IBM PC (and its clones) were
not an instant hit; e.g., in the early stages of the 386 project
(started 1982) the 386 was just a minor project.

Interestingly, there were earlier math coprocessors: The Am9511 (1977,
fixed point, but with, e.g., trigonometric functions) and Am9512
(1979, floating-point, compatible with draft 754; binary32 and
binary64), which Intel licensed as 8231 and 8232.

>Or, for x87, why it bothered with a bunch of complex operators (such as
>FSIN and FCOS, ...), when presumably the FPU could have been simpler and
>cheaper had they had people to do most of these in software ?...

CISC was in full swing in 1977, RISC at the time limited to the IBM
801 group.

There are also cases where a more complex instruction allows later
improved hardware implementations that are not possible in software,
although I think this kind of thinking was far more common in those
times than warranted, and it's unclear to me whether the 8087 or its
successors ever made use of that (e.g., additional precision in
intermediate results).

But thinking about it again (after reading up on the earlier
co-processors), I think the main reason was that the 8087 ran
asynchronously to the 8086/8088. So the benefit of FSIN was that the
CPU could start an FSIN operation, then do a lot of other stuff, then
deal with the result of the FSIN; by contrast, with an FSIN software
function, you call it, and the CPU is blocked until it is finished.

>Or, if Binary16 itself would have added cost, why not have had an
>instruction that truncated the Binary32 format to 16 bits on store, and
>padded it with with zeroes on load.

The 8087 converts everything into its internal 80+-bit format on
loading and back on storing (plus some optional rounding of the
mantissa to 53 or 23 bits on computations). Binary16 would certainly
have added a cost, and, at the time, provided no benefit. People were
not interested in FP16 until a few years ago.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Fantasy architecture: the 10-bit byte

<2022Dec19.194710@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=29653&group=comp.arch#29653

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Fantasy architecture: the 10-bit byte
Date: Mon, 19 Dec 2022 18:47:10 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 54
Message-ID: <2022Dec19.194710@mips.complang.tuwien.ac.at>
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com> <2022Dec18.175256@mips.complang.tuwien.ac.at> <e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
Injection-Info: reader01.eternal-september.org; posting-host="7a5f832fb5b53f4f42cfbffecec3acfb";
logging-data="448853"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX190NkEf+zDu4CwKldk6B32h"
Cancel-Lock: sha1:zxGYDzmrrrcWvNMQcHoA3/Qcig8=
X-newsreader: xrn 10.11
 by: Anton Ertl - Mon, 19 Dec 2022 18:47 UTC

Russell Wallace <russell.wallace@gmail.com> writes:
>On Sunday, December 18, 2022 at 6:03:50 PM UTC, Anton Ertl wrote:
>> Russell Wallace <russell...@gmail.com> writes:
>> >I saw Fred Brooks in an interview remark that one of his few regrets about
>> >the 8-bit byte was that 64-bit floating point is not quite enough for double
>> >precision.
>>
>> Those who made the IEEE 754 standard thought differently.
>
>Did they?

According to
<https://en.wikipedia.org/wiki/IEEE-754#Basic_and_interchange_formats>:

|The binary32 and binary64 formats are the single and double formats of
|IEEE 754-1985 respectively.

>> A 10-bit data bus is 25% more expensive than an 8-bit data bus. At
>> the same time, with your 20/40-bit instructions, a 10-bit bus requires
>> 2-4 bus cycles to load an instruction, which reduces performance
>> significantly.
>
>Compared to what?

Compared to having 10-bit instructions (plus immediate operands).
There is a reason why instruction sets with many implicit registers
like the 6502 and 8086 were successful when we still had 8-bit busses
and no I-cache.

>The 2010 can add a pair of 20-bit numbers in two cycles.

A 2010 with 10-bit instructions can take one cycle for an addition (if
you have a 20-bit ALU).

>The 6502 transistor count is much more constrained. (For good reason; they
>were explicitly aimed at a minimum-cost CPU for embedded applications.)

Maybe. But it was hugely successful in general-purpose computers for
the cost reason.

>I don't see a lot of room to do better within that transistor count.

I do. And actually it's a small-enough project that a dedicated
hobbyist could achieve it in reasonable time. But I am not taking it
on; proving that point is not important enough to me.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: bad float, Fantasy architecture: the 10-bit byte

<tnr2cn$j9b$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=29658&group=comp.arch#29658

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: bad float, Fantasy architecture: the 10-bit byte
Date: Tue, 20 Dec 2022 01:16:07 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <tnr2cn$j9b$1@gal.iecc.com>
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com> <tnov9j$15u8a$1@newsreader4.netcologne.de>
Injection-Date: Tue, 20 Dec 2022 01:16:07 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="19755"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com> <tnov9j$15u8a$1@newsreader4.netcologne.de>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Tue, 20 Dec 2022 01:16 UTC

According to Thomas Koenig <tkoenig@netcologne.de>:
>Russell Wallace <russell.wallace@gmail.com> schrieb:
>
>> I saw Fred Brooks in an interview remark that one of his few
>> regrets about the 8-bit byte was that 64-bit floating point is
>> not quite enough for double precision.
>
>No regrets about the 32-bit floating point real that was introduced?
>IBM certainly knew better, from the 704.
>
>Chosing the exponent range of real and double to coincide was not
>a great decision, either.

Indeed, but none of that was as bad as doing hex normalization, which
precluded a hidden bit, combined with no rounding. That lost three bits
of accuracy on each operation. They retrofitted guard digits in the
field, which helped, but not enough.

It is really strange that they did all sorts of simulations but
somehow missed the key fact that leading float digits are distributed
geometrically, not linearly. It's not like it's hard to figure out.
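
As a quick editorial illustration of that geometric distribution (not part
of the original post): values spread evenly on a log scale produce leading
hex digits with probability log16((d+1)/d), so digit 1 (three leading zero
bits in the mantissa) shows up roughly 25% of the time while digit 15 shows
up roughly 2%. A minimal C sketch:

  #include <math.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
      long count[16] = {0};
      for (long i = 0; i < 1000000; i++) {
          /* magnitudes spread evenly on a log scale */
          double x = pow(10.0, 6.0 * rand() / (double)RAND_MAX);
          while (x >= 1.0)                  /* hex-normalize into [1/16, 1) */
              x /= 16.0;
          count[(int)(x * 16.0)]++;         /* tally the leading hex digit */
      }
      for (int d = 1; d < 16; d++)
          printf("digit %2d: observed %5.2f%%, log16 predicts %5.2f%%\n",
                 d, 100.0 * count[d] / 1000000.0,
                 100.0 * log((d + 1.0) / d) / log(16.0));
      return 0;
  }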

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: 12 bits, Fantasy architecture: the 10-bit byte

<tnr411$qpg$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=29659&group=comp.arch#29659

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: 12 bits, Fantasy architecture: the 10-bit byte
Date: Tue, 20 Dec 2022 01:44:01 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <tnr411$qpg$1@gal.iecc.com>
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com> <tnnt73$251h$1@gal.iecc.com> <0782c181-4db6-454f-80b0-341db41b5e11n@googlegroups.com> <K9%nL.38670$t5W7.6267@fx13.iad>
Injection-Date: Tue, 20 Dec 2022 01:44:01 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="27440"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com> <tnnt73$251h$1@gal.iecc.com> <0782c181-4db6-454f-80b0-341db41b5e11n@googlegroups.com> <K9%nL.38670$t5W7.6267@fx13.iad>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Tue, 20 Dec 2022 01:44 UTC

According to Scott Lurndal <slp53@pacbell.net>:
>MitchAlsup <MitchAlsup@aol.com> writes:
>>On Sunday, December 18, 2022 at 2:29:26 PM UTC-6, John Levine wrote:
>>> According to Russell Wallace <russell...@gmail.com>:
>>> >In that scenario, what's the best that can be done? I don't think a 32-bit RISC microprocessor can be made with the
>>> >process technology of 1974, and even if it could, I suspect it would be too expensive to succeed in the market.
>>> >
>>> >My idea is to expand the byte to 10 bits, and build a CPU with a 20-bit word size and linear address space, and a 10-bit
>>> >data bus. I'll call this architecture the 2010.
>>> Repeat after me: There Are No New Bad Ideas ...
>>>
>>> In the 1970s, BBN built the C/30 to replace the 16-bit Honeywell 316
>>> machines used in Arpanet IMPs because Honeywell stopped making 316s.
>>> The C/30 was microprogrammed to emulate a 316 so it could run the
>>> existing IMP code. BBN believed (wrongly as it turned out) that they
>>> were in the computer business so they expanded it to a larger C/70
>>> just as you proposed, with 10 bit bytes and a 20 bit address space.
>>>
>>> It was a complete failure even though it was quite fast and phyiscally
>>> worked fine. I don't know all the reasons it failed but I do recall
>>> talking to one of the people working on it who told me that porting
>>> code from 8-bit byte systems was slow and painful because there were
>>> so many implicit assumptions about the byte size.
>>>
>>> There were plenty of machines with 8-bit bytes and 16 bit addresses
>>> that used a variety of mapping and bank switching hacks to handle
>>> more memory. The PDP-11 got up to 22 bit physical addresses before
>>> it died, and the 80286 had 24 bit addresses.
>>>
>>> When IBM chose 8 bit bytes in 1964 for S/360 they really hit a
>>> sweet spot. I don't think there was ever another new architecture
>>> that didn't have power of 2 addresses other than some special
>>> purpose DSPs.
>><
>>PDP-8 was surely post 360.
>
>PDP-8 (released 1965, designed 63-64) was based on the 12-bit LINC from 1962.

The PDP-8 was a reimplemented PDP-5, which in turn was a PDP-4 cut
down from 18 bits to 12. According to Gordon Bell, the PDP-4 and -5
were influenced by the LINC. But they were different enough that DEC
later made the LINC-8 which was basically a LINC and a PDP-8 lashed
together and sharing memory, and later the much smaller and cheaper
PDP-12 which had a single CPU that could switch between PDP-8 and LINC
modes. The biggest legacy of the LINC was LINCtape which in slightly
modified form became DECtape, the block addressable tape we all used
before there were floppy disks.

>>PDP-6 was surely post 1964.
>
>The 1963 36-bit PDP-6 followed the 18-bit PDP-1 from 1959.

The PDP-6 was nothing like the PDP-1. The PDP-1 had a single
accumulator and an instruction set that was clearly the predecessor of
the -4, the -5 and the rest of DEC's 18 and 12 bit machines. The PDP-6
had 16 registers and a large orthogonal instruction set. Legend says
that the PDP-6 was designed at the MIT Tech Model Railroading Club.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Fantasy architecture: the 10-bit byte

<tnrfd1$jt2c$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29660&group=comp.arch#29660

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Fantasy architecture: the 10-bit byte
Date: Mon, 19 Dec 2022 22:58:08 -0600
Organization: A noiseless patient Spider
Lines: 147
Message-ID: <tnrfd1$jt2c$1@dont-email.me>
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at>
<e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 20 Dec 2022 04:58:09 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="509700c8e11564b53bfbcf4992ddaf3a";
logging-data="652364"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19MEozkzSY27/OH8geMpIHq"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.6.0
Cancel-Lock: sha1:0DM0uWw2OeDVlnSmwmpXKXU1fNc=
In-Reply-To: <2022Dec19.191001@mips.complang.tuwien.ac.at>
Content-Language: en-US
 by: BGB - Tue, 20 Dec 2022 04:58 UTC

On 12/19/2022 12:10 PM, Anton Ertl wrote:
> BGB <cr88192@gmail.com> writes:
> [8087]
>> Granted, I guess it is possible Intel could have had markets for this
>> other than the IBM PC ?...
>
> Certainly.
>
> Intel started the 8087 project in 1977 and the 8087 was launched in
> 1980, before the IBM PC (1981). And the IBM PC (and its clones) were
> not an instant hit; e.g., in the early stages of the 386 project
> (started 1982) the 386 was just a minor project.
>

OK, wasn't aware the 8087 predated the PC. Sort of thought it was a
late-stage add-on.

> Interestingly, there were earlier math coprocessors: The Am9511 (1977,
> fixed point, but with, e.g., trigonometric functions) and Am9512
> (1979, floating-point, compatible with draft 754; binary32 and
> binary64), which Intel licensed as 8231 and 8232.
>

Yeah, don't know much about them.

>> Or, for x87, why it bothered with a bunch of complex operators (such as
>> FSIN and FCOS, ...), when presumably the FPU could have been simpler and
>> cheaper had they had people to do most of these in software ?...
>
> CISC was in full swing in 1977, RISC at the time limited to the IBM
> 801 group.
>
> There are also cases where a more complex instruction allows later
> improved hardware implementations that are not possible in software,
> although I think this kind of thinking was far more common in those
> times than waranted, and it's unclear to me whether the 8087 or its
> successors ever made use of that (e.g., additional precision in
> intermediate results).
>

Dunno there.

I think the eventual result was that SSE came along, and then by x86-64
people had mostly abandoned x87 in favor of SSE, and typically doing the
math operations in software rather than using x87 ops (given SSE doesn't
have a lot of this stuff either).

If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
leave nearly everything else to software.

Did eventually re-add FDIV, in my case, which is accurate but not
particularly fast (ended up routing it through a Shift-ADD divider,
noting that it didn't seem like too huge of a jump to make it do FP divide
in addition to integer divide).

I had also re-added an FSQRT instruction, which is neither particularly
fast nor accurate. No real advantage over doing it in software.

I have noted previously that N-R seemingly is unable to converge the
last 4 bits or so of the mantissa. Seemingly when it gets to this point,
it just sorta jumps around and doesn't actually reach the answer (it
seems one would need some sub-ULP bits to converge on an exact answer here).
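
As a minimal editorial sketch of the kind of N-R refinement being described
(not BGB's hardware; uses the well-known bit-trick seed for 1/sqrt(x) and
iterates entirely in Binary32, which is why the last few ULPs tend not to
settle):

  #include <math.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  /* Newton-Raphson refinement of 1/sqrt(x), done entirely in Binary32.
     Each step is rounded back to 24 mantissa bits, so the iteration can
     stall a few ULPs away from the correctly rounded result. */
  static float rsqrt_nr(float x)
  {
      uint32_t i;
      float y;
      memcpy(&i, &x, sizeof i);
      i = 0x5f3759dfu - (i >> 1);              /* rough initial guess */
      memcpy(&y, &i, sizeof y);
      for (int k = 0; k < 4; k++)
          y = y * (1.5f - 0.5f * x * y * y);   /* one N-R step per pass */
      return y;
  }

  int main(void)
  {
      float x = 2.0f;
      printf("x*rsqrt(x) = %.9g, sqrtf(x) = %.9g\n",
             x * rsqrt_nr(x), sqrtf(x));
      return 0;
  }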

> But thinking about it again (after reading up on the earlier
> co-processors), I think the main reason was that the 8087 ran
> asynchronously to the 8086/8088. So the benefit of FSIN was that the
> CPU could start an FSIN operation, then do a lot of other stuff, then
> deal with the result of the FSIN; by contrast, with an FSIN software
> function, you call it, and the CPU is blocked until it is finished.
>

Probably true.

There is no async aspect in my case, as the FPU is effectively glued
onto the same logic as the integer ISA.

>> Or, if Binary16 itself would have added cost, why not have had an
>> instruction that truncated the Binary32 format to 16 bits on store, and
>> padded it with with zeroes on load.
>
> The 8087 converts everything into its internal 80+-bit format on
> loading and back on storing (plus some optional rounding of the
> mantissa to 53 or 23 bits on computations). Binary16 would certainly
> have added a cost, and, at the time, provided no benefit. People were
> not interesting in FP16 until a few years ago.
>

Probably true enough.

I had been using it in my various 3D engines, mostly in the context of
vertex arrays and similar (was supported by OpenGL via GL_HALF_FLOAT and
similar).

In my BJX2 project, its role is somewhat expanded, but it is still a
non-standard extension as far as C goes.

In my case, I use non-standard conversion rules, which can save some
cost but are a little funky. Cheaper cases don't bother with rounding
(since rounding is one of the more expensive parts of a narrowing
conversion).

Say:
F32 -> F16:
  { val[31],
    ((val[30]==val[29]) ||
     ((val[29:27]!=3'b000) &&
      (val[29:27]!=3'b111))) ?
      (val[30] ? 5'h1F : 5'h00) :     // out of range: saturate to Inf / 0
      { val[30], val[26:23] },        // in range: repack the exponent
    val[22:13] }                      // keep the top 10 mantissa bits

F16 -> F32:
  { val[15:14],
    ((val[14] || (val[14:10]==5'h00)) && !(val[14:10]==5'h1F)) ?
      3'b000 : 3'b111,                // re-bias the exponent
    val[13:10],                       // low exponent bits, as stored
    val[9:0], 13'h00 }                // zero-pad the mantissa

....

Another semi-popular option is "BFloat16", or "S.E8.F7" (essentially a
truncated Binary32), but this is less well supported in my case.

For hacking something onto the x87, this could make more sense. Could be
faked in software, but on x86, this would mean wrangling the values
around in memory.

Say (assuming an Intel style syntax):
  mov ax, [bx]            ; fetch the 16-bit (BFloat16) value
  mov [sp+2], ax          ; place it in the high half of a 32-bit slot
  xor ax, ax
  mov [sp+0], ax          ; zero the low half (pad the mantissa)
  fld dword ptr [sp+0]    ; load the padded Binary32 onto the x87 stack

At first, wrote it in GAS style syntax, but then noted Intel style would
be more era appropriate...

> - anton

Re: Fantasy architecture: the 10-bit byte

<39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29661&group=comp.arch#29661

X-Received: by 2002:a37:cce:0:b0:6fe:b359:4896 with SMTP id 197-20020a370cce000000b006feb3594896mr15674876qkm.579.1671529532179;
Tue, 20 Dec 2022 01:45:32 -0800 (PST)
X-Received: by 2002:a4a:d982:0:b0:4a5:80d2:1a06 with SMTP id
k2-20020a4ad982000000b004a580d21a06mr1048487oou.21.1671529531891; Tue, 20 Dec
2022 01:45:31 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 20 Dec 2022 01:45:31 -0800 (PST)
In-Reply-To: <tnrfd1$jt2c$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=99.251.79.92; posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 99.251.79.92
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at> <e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at> <tnrfd1$jt2c$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
Subject: Re: Fantasy architecture: the 10-bit byte
From: robfi...@gmail.com (robf...@gmail.com)
Injection-Date: Tue, 20 Dec 2022 09:45:32 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 7877
 by: robf...@gmail.com - Tue, 20 Dec 2022 09:45 UTC

On Monday, December 19, 2022 at 11:58:13 PM UTC-5, BGB wrote:
> On 12/19/2022 12:10 PM, Anton Ertl wrote:
> > BGB <cr8...@gmail.com> writes:
> > [8087]
> >> Granted, I guess it is possible Intel could have had markets for this
> >> other than the IBM PC ?...
> >
> > Certainly.
> >
> > Intel started the 8087 project in 1977 and the 8087 was launched in
> > 1980, before the IBM PC (1981). And the IBM PC (and its clones) were
> > not an instant hit; e.g., in the early stages of the 386 project
> > (started 1982) the 386 was just a minor project.
> >
> OK, wasn't aware the 8087 predated the PC. Sort of thought it was a
> late-stage add-on.
> > Interestingly, there were earlier math coprocessors: The Am9511 (1977,
> > fixed point, but with, e.g., trigonometric functions) and Am9512
> > (1979, floating-point, compatible with draft 754; binary32 and
> > binary64), which Intel licensed as 8231 and 8232.
> >
> Yeah, don't know much about them.
> >> Or, for x87, why it bothered with a bunch of complex operators (such as
> >> FSIN and FCOS, ...), when presumably the FPU could have been simpler and
> >> cheaper had they had people to do most of these in software ?...
> >
> > CISC was in full swing in 1977, RISC at the time limited to the IBM
> > 801 group.
> >
> > There are also cases where a more complex instruction allows later
> > improved hardware implementations that are not possible in software,
> > although I think this kind of thinking was far more common in those
> > times than waranted, and it's unclear to me whether the 8087 or its
> > successors ever made use of that (e.g., additional precision in
> > intermediate results).
> >
> Dunno there.
>
>
> I think the eventual result was that SSE came along, and then by x86-64
> people had mostly abandoned x87 in favor of SSE, and typically doing the
> math operations in software rather than using x87 ops (given SSE doesn't
> have a lot of this stuff either).
>
> If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
> leave nearly everything else to software.
>
>
> Did eventually re-add FDIV, in my case, which is accurate but not
> particularly fast (ended up routing it though a Shift-ADD divider,
> noting as it didn't seem like too huge of a jump to make it do FP divide
> in addition to integer divide).
>
> I had also re-added an FSQRT instruction, which is neither particularly
> fast nor accurate. No real advantage over doing it in software.
>
> I have noted previously that N-R seemingly is unable to converge the
> last 4 bits or so of the mantissa. Seemingly when it gets to this point,
> it just sorta jumps around and doesn't actually reach the answer (it
> seems one would need some sub-ULP bits to converge on an exact answer here).
> > But thinking about it again (after reading up on the earlier
> > co-processors), I think the main reason was that the 8087 ran
> > asynchronously to the 8086/8088. So the benefit of FSIN was that the
> > CPU could start an FSIN operation, then do a lot of other stuff, then
> > deal with the result of the FSIN; by contrast, with an FSIN software
> > function, you call it, and the CPU is blocked until it is finished.
> >
> Probably true.
>
> There is no async aspect in my case, as the FPU is effectively glued
> onto the same logic as the integer ISA.
> >> Or, if Binary16 itself would have added cost, why not have had an
> >> instruction that truncated the Binary32 format to 16 bits on store, and
> >> padded it with with zeroes on load.
> >
> > The 8087 converts everything into its internal 80+-bit format on
> > loading and back on storing (plus some optional rounding of the
> > mantissa to 53 or 23 bits on computations). Binary16 would certainly
> > have added a cost, and, at the time, provided no benefit. People were
> > not interesting in FP16 until a few years ago.
> >
> Probably true enough.
>
> I had been using it in my various 3D engines, mostly in the context of
> vertex arrays and similar (was supported by OpenGL via GL_HALF_FLOAT and
> similar).
>
> In my BJX2 project, its role is somewhat expanded, but it is still a
> non-standard extension as far as C goes.
>
>
> In my case, I use non-standard conversion rules, which can save some
> cost but are a little funky. Cheaper cases don't bother with rounding
> (since rounding is one of the more expensive parts of a narrowing
> conversion).
>
> Say:
> F32 -> F16:
> { val[31],
> (val[30]==val[29]) ||
> ((val[29:27]!=3'b000) &&
> (val[29:27]!=3'b111)) ?
> (val[30] ? 5'h1F : 5'h00) :
> { val[30], val[26:23] },
> val[22:13] }
>
> F16 -> F32:
> { val[15:14],
> ((val[14] || (val[14:10]==5'h00)) && !(val[14:10]==5'h1F)) ?
> 3:b000 : 3'b111,
> val[9:0], 13'h00 }
>
> ...
>
>
> Another semi-popular option is "BFloat16", or "S.E8.F7" (essentially a
> truncated Binary32), but this is less well supported in my case.
>
> For hacking something onto the x87, this could make more sense. Could be
> faked in software, but on x86, this would mean wrangling the values
> around in memory.
>
> Say (assuming a GAS style syntax):
> mov ax, [bx]
> mov [sp+2], ax
> xor ax, ax
> mov [sp+2], ax
> fld dword ptr [sp+0]
>
> As first, wrote it in GAS style syntax, but then noted Intel style would
> be more era appropriate...
>
>
> > - anton
>If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
>leave nearly everything else to software.

Bare minimum is good if transistors are limited. Otherwise, might as well do
other functions in hardware. FMA (fused multiply-add) does FADD/FSUB/FMUL
in one instruction with one normalize and round, and might be less hardware
than separate FADD/FSUB/FMUL. I like to include a handful more instructions,
like FCMP, FNEG, FABS, FSCALE. FSCALE works great with decimal float in
numeric-to-string conversions, for multiplying by 10 and dividing by 10.
There is a minimum set of operations required by IEEE 754 that is maybe
worth looking into.
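
(A small C illustration of the "one normalize and round" point, added
editorially; it relies on the C99 fma() from <math.h>, so link with -lm.
Computing a*b and then adding c rounds twice, while fma(a,b,c) rounds once
at the end:)

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      /* The exact product 1 - 2^-54 needs more than 53 mantissa bits, so
         rounding it separately loses the information that fma() keeps. */
      double a = 1.0 + 0x1.0p-27;
      double b = 1.0 - 0x1.0p-27;
      double c = -1.0;

      double p = a * b;               /* rounded to double here... */
      double two_roundings = p + c;   /* ...and rounded again here */
      double one_rounding = fma(a, b, c);

      printf("a*b + c    = %.17g\n", two_roundings);  /* prints 0 */
      printf("fma(a,b,c) = %.17g\n", one_rounding);   /* -2^-54, ~ -5.6e-17 */
      return 0;
  }
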
I happen to be working on DFP right now for a 68k compatible. I have got 96-bit
triple precision working, now I am thinking of reducing that to 64-bit double
precision. I think it does not make a lot of sense to support lower precision
decimal float. Better to use binary floats for lower precision > 64 bits.

Re: Fantasy architecture: the 10-bit byte

<2022Dec20.111018@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=29663&group=comp.arch#29663

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Fantasy architecture: the 10-bit byte
Date: Tue, 20 Dec 2022 10:10:18 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 15
Message-ID: <2022Dec20.111018@mips.complang.tuwien.ac.at>
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com> <2022Dec18.175256@mips.complang.tuwien.ac.at> <e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com> <tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de> <tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at> <tnrfd1$jt2c$1@dont-email.me>
Injection-Info: reader01.eternal-september.org; posting-host="86ac993b0548a288995e829eb99d9390";
logging-data="696753"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+CT2kyW/DzrrNFZcHMewny"
Cancel-Lock: sha1:ngqwr87zzVODz0Hy+uf671G/nq8=
X-newsreader: xrn 10.11
 by: Anton Ertl - Tue, 20 Dec 2022 10:10 UTC

BGB <cr88192@gmail.com> writes:
>I think the eventual result was that SSE came along, and then by x86-64
>people had mostly abandoned x87 in favor of SSE, and typically doing the
>math operations in software rather than using x87 ops (given SSE doesn't
>have a lot of this stuff either).

They could use the 80387 instructions if they provide a benefit; they
are still there. However, I just checked this, and on a Skylake with
glibc-2.31 sin() calls __sin_fma, and I see AVX-128 instructions in
the first part of this code, so it probably does not invoke FSIN.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Fantasy architecture: the 10-bit byte

<a8d8b7cc-cdf5-44aa-917b-012509d5438en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29664&group=comp.arch#29664

X-Received: by 2002:ac8:687:0:b0:3a5:41fd:2216 with SMTP id f7-20020ac80687000000b003a541fd2216mr90360808qth.338.1671533240902;
Tue, 20 Dec 2022 02:47:20 -0800 (PST)
X-Received: by 2002:a05:6808:1a1f:b0:35e:728d:838c with SMTP id
bk31-20020a0568081a1f00b0035e728d838cmr1270787oib.118.1671533240652; Tue, 20
Dec 2022 02:47:20 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border-2.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 20 Dec 2022 02:47:20 -0800 (PST)
In-Reply-To: <39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at> <e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at>
<tnrfd1$jt2c$1@dont-email.me> <39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a8d8b7cc-cdf5-44aa-917b-012509d5438en@googlegroups.com>
Subject: Re: Fantasy architecture: the 10-bit byte
From: already5...@yahoo.com (Michael S)
Injection-Date: Tue, 20 Dec 2022 10:47:20 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 36
 by: Michael S - Tue, 20 Dec 2022 10:47 UTC

On Tuesday, December 20, 2022 at 11:45:33 AM UTC+2, robf...@gmail.com wrote:
> On Monday, December 19, 2022 at 11:58:13 PM UTC-5, BGB wrote:
> >If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
> >leave nearly everything else to software.
> Bare minimum is good if transistors are limited. Otherwise, might as well do
> other functions in hardware. FMA (fused multiply-add) does FADD/FSUB/FMUL
> in one instruction with one normalize and round. Might be less hardware than
> separate FADD/FSUB/FMUL.

Compare apples to apples.
FMA with throughput of 1 per n clocks is smaller than FADD + FMUL with throughput of 1 per n clocks *each*.
But it is bigger than an FPU that does either FADD or FMUL per n clocks.
Renormalization after FMA is a lot more costly than after FMUL and somewhat more costly than after FADD.
Also, FMA needs a full 53x53=>106-bit multiplier. FMUL, on the other hand, only needs the 54 MS bits of the result
plus a wired-OR of the rest of the bits. I am not an expert in multiplier HW, but it seems to me that
there exists potential for HW savings.

> I like to include a handful more instructions. Like
> FCMP, FNEG, FABS, FSCALE.

I see no reason not to provide a full set of common logicals: AND, OR, XOR, plus NAND or ANDN.
The HW cost is close to 0; the usefulness in the absence of shifts and of integer add/sub is not great, but above 0.

> FSCALE works great with decimal float in numeric
> to string conversions for multiplying by 10 and dividing by 10. There is a
> minimum level of supported functions required for IEEE maybe worth looking
> into.

Well, when BGB said "If it were me", he almost certainly didn't mean decimal FP.

> I happen to be working on DFP right now for a 68k compatible. I have got 96-bit
> triple precision working, now I am thinking of reducing that to 64-bit double
> precision. I think it does not make a lot of sense to support lower precision
> decimal float. Better to use binary floats for lower precision > 64 bits.

There is little reason to include DFP hardware at all, but if you nevertheless
decide to include DFP then it has to be IEEE-754 DFP.

Re: Fantasy architecture: the 10-bit byte

<0f633dd3-acf3-4f8e-b9f0-40cec1b2fbb4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29669&group=comp.arch#29669

X-Received: by 2002:a05:6214:5d87:b0:4c7:91ae:7458 with SMTP id mf7-20020a0562145d8700b004c791ae7458mr6958964qvb.51.1671556510464;
Tue, 20 Dec 2022 09:15:10 -0800 (PST)
X-Received: by 2002:a05:6870:a7aa:b0:143:af88:3b6c with SMTP id
x42-20020a056870a7aa00b00143af883b6cmr2152440oao.79.1671556510183; Tue, 20
Dec 2022 09:15:10 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 20 Dec 2022 09:15:09 -0800 (PST)
In-Reply-To: <a8d8b7cc-cdf5-44aa-917b-012509d5438en@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=99.251.79.92; posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 99.251.79.92
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at> <e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at>
<tnrfd1$jt2c$1@dont-email.me> <39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
<a8d8b7cc-cdf5-44aa-917b-012509d5438en@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <0f633dd3-acf3-4f8e-b9f0-40cec1b2fbb4n@googlegroups.com>
Subject: Re: Fantasy architecture: the 10-bit byte
From: robfi...@gmail.com (robf...@gmail.com)
Injection-Date: Tue, 20 Dec 2022 17:15:10 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 4634
 by: robf...@gmail.com - Tue, 20 Dec 2022 17:15 UTC

On Tuesday, December 20, 2022 at 5:47:22 AM UTC-5, Michael S wrote:
> On Tuesday, December 20, 2022 at 11:45:33 AM UTC+2, robf...@gmail.com wrote:
> > On Monday, December 19, 2022 at 11:58:13 PM UTC-5, BGB wrote:
> > >If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
> > >leave nearly everything else to software.
> > Bare minimum is good if transistors are limited. Otherwise, might as well do
> > other functions in hardware. FMA (fused multiply-add) does FADD/FSUB/FMUL
> > in one instruction with one normalize and round. Might be less hardware than
> > separate FADD/FSUB/FMUL.
> Compare apples to apples.
> FMA with throughput of 1 per n clocks is smaller that FADD + FMUL with throughput of 1 per n clocks *each*.
> But it is bigger than FPU that does either FADD or FMUL per n clocks.
> Renormalization after FMA is a lot more costly than after FMUL and somewhat more costly than after FADD.
> Also, FMA needs full 53x53=>106 bit multiplier. FMUL, on the other hand, only needs 54 MS bits of result
> plus wired or of the rest of the bits. I am not an expert in multipliers HW, but it seems to me that
> there exist a potential for HW savings.
> > I like to include a handful more instructions. Like
> > FCMP, FNEG, FABS, FSCALE.
> I see no reason to not provide full set of common logicals AND, OR, XOR, also NAND or ANDN.
> HW cost is close to 0, usefulness in absence of shifts and of integer add/sub is not great, but above 0.

I have these on float registers in Thor2023. Previously I had unified register files, so
all ops available for integers could also be used on floats. But I wanted to reduce
the size of the register spec field, so now I am using separate integer and float
registers. Cannot do much about the 68k core though. I suppose some of the reserved
opcodes could be used.

> > FSCALE works great with decimal float in numeric
> > to string conversions for multiplying by 10 and dividing by 10. There is a
> > minimum level of supported functions required for IEEE maybe worth looking
> > into.
> Well, when BGB said "If it were me", he almost certainly didn't mean decimal FP.
> > I happen to be working on DFP right now for a 68k compatible. I have got 96-bit
> > triple precision working, now I am thinking of reducing that to 64-bit double
> > precision. I think it does not make a lot of sense to support lower precision
> > decimal float. Better to use binary floats for lower precision > 64 bits.
> There are little reasons to include DFP hardware at all, but if nevertheless you
> decided to include DFP then it has to be IEEE-754 DFP.

The decimal floating point uses the IEEE-754 format. I would like to include the
minimum set of operations in hardware. I am not sure about the DPD digits. I am
using Mike Cowlishaw’s encoding / decoding.

See: http://speleotrove.com/decimal/DPDecimal.html
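
(Editorial aside: the point of a DPD "declet" is that three decimal digits,
000-999, fit in 10 bits since 2^10 = 1024. The sketch below just packs them
as a plain binary value to show the bit budget; it is NOT the actual DPD
encoding, which uses Cowlishaw's scheme so digits can be encoded and decoded
with a few gates instead of divisions:)

  #include <assert.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Pack three decimal digits into 10 bits as a plain binary number. */
  static uint16_t pack3(unsigned d2, unsigned d1, unsigned d0)
  {
      assert(d2 < 10 && d1 < 10 && d0 < 10);
      return (uint16_t)(d2 * 100 + d1 * 10 + d0);    /* 0..999 < 1024 */
  }

  static void unpack3(uint16_t declet, unsigned d[3])
  {
      d[0] = declet % 10;
      d[1] = (declet / 10) % 10;
      d[2] = declet / 100;
  }

  int main(void)
  {
      unsigned d[3];
      uint16_t x = pack3(9, 8, 7);
      unpack3(x, d);
      printf("0x%03X -> %u%u%u\n", x, d[2], d[1], d[0]);   /* 0x3DB -> 987 */
      return 0;
  }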

Re: Fantasy architecture: the 10-bit byte

<tnsveq$oppq$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29670&group=comp.arch#29670

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Fantasy architecture: the 10-bit byte
Date: Tue, 20 Dec 2022 12:38:16 -0600
Organization: A noiseless patient Spider
Lines: 92
Message-ID: <tnsveq$oppq$1@dont-email.me>
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at>
<e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at>
<tnrfd1$jt2c$1@dont-email.me>
<39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
<a8d8b7cc-cdf5-44aa-917b-012509d5438en@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 20 Dec 2022 18:38:18 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="509700c8e11564b53bfbcf4992ddaf3a";
logging-data="812858"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19LyfCyE1bFBw+G2w4/t79s"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.6.0
Cancel-Lock: sha1:LTcY4gubnU82TQ4CbkiyZj5aq/Q=
In-Reply-To: <a8d8b7cc-cdf5-44aa-917b-012509d5438en@googlegroups.com>
Content-Language: en-US
 by: BGB - Tue, 20 Dec 2022 18:38 UTC

On 12/20/2022 4:47 AM, Michael S wrote:
> On Tuesday, December 20, 2022 at 11:45:33 AM UTC+2, robf...@gmail.com wrote:
>> On Monday, December 19, 2022 at 11:58:13 PM UTC-5, BGB wrote:
>>> If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
>>> leave nearly everything else to software.
>> Bare minimum is good if transistors are limited. Otherwise, might as well do
>> other functions in hardware. FMA (fused multiply-add) does FADD/FSUB/FMUL
>> in one instruction with one normalize and round. Might be less hardware than
>> separate FADD/FSUB/FMUL.
>
> Compare apples to apples.
> FMA with throughput of 1 per n clocks is smaller that FADD + FMUL with throughput of 1 per n clocks *each*.
> But it is bigger than FPU that does either FADD or FMUL per n clocks.
> Renormalization after FMA is a lot more costly than after FMUL and somewhat more costly than after FADD.
> Also, FMA needs full 53x53=>106 bit multiplier. FMUL, on the other hand, only needs 54 MS bits of result
> plus wired or of the rest of the bits. I am not an expert in multipliers HW, but it seems to me that
> there exist a potential for HW savings.
>

Yeah.

Separate FADD/FSUB + FMUL units were cheaper in my case, due in large
part to the mantissa issues (FMA needing a much larger mantissa).

Also, an FMA would end up needing nearly twice the latency, so it isn't too
big of an advantage in this sense over separate MUL/ADD.

I did end up with rounding-mode support on the main FPU, though my
low-precision unit (SIMD mostly) is hard-wired for Truncate. Hard-wiring
truncate is the cheapest option.

>> I like to include a handful more instructions. Like
>> FCMP, FNEG, FABS, FSCALE.
>
> I see no reason to not provide full set of common logicals AND, OR, XOR, also NAND or ANDN.
> HW cost is close to 0, usefulness in absence of shifts and of integer add/sub is not great, but above 0.
>

Yeah, I missed mentioning FCMP/FNEG/FABS; these ones are useful to have.

FCMP is basically a hacked integer compare, so not too bad. FNEG/FABS
are just twiddling the sign bit.
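
(A small editorial C sketch of what those amount to at the bit level,
assuming IEEE-754 Binary32 and a two's-complement machine: FABS/FNEG touch
only the sign bit, and an ordered FCMP can remap the sign-magnitude encoding
onto a monotonic integer key and compare as integers; NaNs need separate
handling:)

  #include <stdint.h>
  #include <string.h>

  static uint32_t f32_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
  static float bits_f32(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

  static float fabs_f32(float f) { return bits_f32(f32_bits(f) & 0x7FFFFFFFu); }
  static float fneg_f32(float f) { return bits_f32(f32_bits(f) ^ 0x80000000u); }

  /* Map the float encoding to an integer whose order matches the float
     order (for non-NaN values); +0 and -0 both map to 0. */
  static int32_t f32_key(float f)
  {
      int32_t i = (int32_t)f32_bits(f);
      return i >= 0 ? i : INT32_MIN - i;   /* flip the ordering of negatives */
  }

  /* Returns -1, 0, or +1, like a three-way FCMP. */
  static int fcmp_f32(float a, float b)
  {
      int32_t ka = f32_key(a), kb = f32_key(b);
      return (ka > kb) - (ka < kb);
  }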

Format conversion ops are also nice to have, ...

Just would probably leave out FDIV and FSQRT (at least beyond cheap
approximate versions).

>> FSCALE works great with decimal float in numeric
>> to string conversions for multiplying by 10 and dividing by 10. There is a
>> minimum level of supported functions required for IEEE maybe worth looking
>> into.
>
> Well, when BGB said "If it were me", he almost certainly didn't mean decimal FP.
>

Yes, true.

Decimal FP is the extreme opposite of "cheap FPU"...

>> I happen to be working on DFP right now for a 68k compatible. I have got 96-bit
>> triple precision working, now I am thinking of reducing that to 64-bit double
>> precision. I think it does not make a lot of sense to support lower precision
>> decimal float. Better to use binary floats for lower precision > 64 bits.
>
> There are little reasons to include DFP hardware at all, but if nevertheless you
> decided to include DFP then it has to be IEEE-754 DFP.

Yeah.

I had at one point looked into trying to "borrow" the 128-bit format
used by .NET, but then decided against this and just sorta went with
Binary128 for the 128-bit floating point, as it ended up being both
faster and having more precision.

The MS format was, IIRC:
First 3 DWORDs, hold an integer value from 000000000 to 999999999.
Final DWORD, IIRC, holds a few more digits and the sign and exponent.

For a pure software implementation, it almost makes more sense than either
of the IEEE formats, since it can be implemented relatively
straightforwardly with 32-bit integer operations.

Albeit, it doesn't make the most efficient use of bits.

....

Re: Fantasy architecture: the 10-bit byte

<2022Dec20.191820@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=29672&group=comp.arch#29672

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Fantasy architecture: the 10-bit byte
Date: Tue, 20 Dec 2022 18:18:20 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 38
Message-ID: <2022Dec20.191820@mips.complang.tuwien.ac.at>
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com> <tnnt73$251h$1@gal.iecc.com>
Injection-Info: reader01.eternal-september.org; posting-host="86ac993b0548a288995e829eb99d9390";
logging-data="815920"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/DfbX6aXQO4RlWWzxbPG8I"
Cancel-Lock: sha1:gi5FrLSO05fA6EYylviqb3aXg4M=
X-newsreader: xrn 10.11
 by: Anton Ertl - Tue, 20 Dec 2022 18:18 UTC

John Levine <johnl@taugh.com> writes:
>When IBM chose 8 bit bytes in 1964 for S/360 they really hit a
>sweet spot. I don't think there was ever another new architecture
>that didn't have power of 2 addresses other than some special
>purpose DSPs.

My thinking has been that the 16-bit-ness of the PDP-11, NOVA and
other 16-bit minis was due to the 4-bit bit-slice ALUs like the 74181
and the memory sizes of the time (12 address bits were too little, 20
bits more than necessary).

Many of these machines (e.g., the NOVA) are word-addressed, so 8-bit
bytes played a minor role.

And even the byte-addressed PDP-11 used little-endian byte order
(unlike IBM) and ASCII (unlike IBM), so apparently they did not feel a
need to be compatible with IBM's choices.

The IBM 1130 was too early for 4-bit bit-slice ALUs, and it was
word-addressed. Apparently it chose 16 bits due to the planned memory
size (4K-32K words).

Intel chose 8 bits for the 8008, because it was intended for a
terminal for IBM.

It's not obvious why Motorola chose 8 bits for the 6800. Was
compatibility with the 8-bit world already important? Was, e.g., 14
bits for the address already too close for comfort? Was it support
for BCD arithmetic (the 6800 has a DAA instruction)?

Maybe in the end we have to thank BCD arithmetic for the swift and
complete victory of powers-of-2 byte and word sizes. IIRC it was
also an important factor in IBM's decision to go for 8-bit bytes.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Fantasy architecture: the 10-bit byte

<2e275867-9119-497a-98d5-f438f33bdd34n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29673&group=comp.arch#29673

X-Received: by 2002:a05:620a:22b9:b0:702:e09:3af with SMTP id p25-20020a05620a22b900b007020e0903afmr819976qkh.691.1671563456820;
Tue, 20 Dec 2022 11:10:56 -0800 (PST)
X-Received: by 2002:a9d:6c81:0:b0:670:9f81:2457 with SMTP id
c1-20020a9d6c81000000b006709f812457mr1949843otr.384.1671563456510; Tue, 20
Dec 2022 11:10:56 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 20 Dec 2022 11:10:56 -0800 (PST)
In-Reply-To: <39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b98a:605:cbf8:f0d2;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b98a:605:cbf8:f0d2
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at> <e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at>
<tnrfd1$jt2c$1@dont-email.me> <39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2e275867-9119-497a-98d5-f438f33bdd34n@googlegroups.com>
Subject: Re: Fantasy architecture: the 10-bit byte
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 20 Dec 2022 19:10:56 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 9310
 by: MitchAlsup - Tue, 20 Dec 2022 19:10 UTC

On Tuesday, December 20, 2022 at 3:45:33 AM UTC-6, robf...@gmail.com wrote:
> On Monday, December 19, 2022 at 11:58:13 PM UTC-5, BGB wrote:
> > On 12/19/2022 12:10 PM, Anton Ertl wrote:
> > > BGB <cr8...@gmail.com> writes:
> > > [8087]
> > >> Granted, I guess it is possible Intel could have had markets for this
> > >> other than the IBM PC ?...
> > >
> > > Certainly.
> > >
> > > Intel started the 8087 project in 1977 and the 8087 was launched in
> > > 1980, before the IBM PC (1981). And the IBM PC (and its clones) were
> > > not an instant hit; e.g., in the early stages of the 386 project
> > > (started 1982) the 386 was just a minor project.
> > >
> > OK, wasn't aware the 8087 predated the PC. Sort of thought it was a
> > late-stage add-on.
> > > Interestingly, there were earlier math coprocessors: The Am9511 (1977,
> > > fixed point, but with, e.g., trigonometric functions) and Am9512
> > > (1979, floating-point, compatible with draft 754; binary32 and
> > > binary64), which Intel licensed as 8231 and 8232.
> > >
> > Yeah, don't know much about them.
> > >> Or, for x87, why it bothered with a bunch of complex operators (such as
> > >> FSIN and FCOS, ...), when presumably the FPU could have been simpler and
> > >> cheaper had they had people to do most of these in software ?...
> > >
> > > CISC was in full swing in 1977, RISC at the time limited to the IBM
> > > 801 group.
> > >
> > > There are also cases where a more complex instruction allows later
> > > improved hardware implementations that are not possible in software,
> > > although I think this kind of thinking was far more common in those
> > > times than waranted, and it's unclear to me whether the 8087 or its
> > > successors ever made use of that (e.g., additional precision in
> > > intermediate results).
> > >
> > Dunno there.
> >
> >
> > I think the eventual result was that SSE came along, and then by x86-64
> > people had mostly abandoned x87 in favor of SSE, and typically doing the
> > math operations in software rather than using x87 ops (given SSE doesn't
> > have a lot of this stuff either).
> >
> > If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
> > leave nearly everything else to software.
> >
> >
> > Did eventually re-add FDIV, in my case, which is accurate but not
> > particularly fast (ended up routing it though a Shift-ADD divider,
> > noting as it didn't seem like too huge of a jump to make it do FP divide
> > in addition to integer divide).
> >
> > I had also re-added an FSQRT instruction, which is neither particularly
> > fast nor accurate. No real advantage over doing it in software.
> >
> > I have noted previously that N-R seemingly is unable to converge the
> > last 4 bits or so of the mantissa. Seemingly when it gets to this point,
> > it just sorta jumps around and doesn't actually reach the answer (it
> > seems one would need some sub-ULP bits to converge on an exact answer here).
> > > But thinking about it again (after reading up on the earlier
> > > co-processors), I think the main reason was that the 8087 ran
> > > asynchronously to the 8086/8088. So the benefit of FSIN was that the
> > > CPU could start an FSIN operation, then do a lot of other stuff, then
> > > deal with the result of the FSIN; by contrast, with an FSIN software
> > > function, you call it, and the CPU is blocked until it is finished.
> > >
> > Probably true.
> >
> > There is no async aspect in my case, as the FPU is effectively glued
> > onto the same logic as the integer ISA.
> > >> Or, if Binary16 itself would have added cost, why not have had an
> > >> instruction that truncated the Binary32 format to 16 bits on store, and
> > >> padded it with with zeroes on load.
> > >
> > > The 8087 converts everything into its internal 80+-bit format on
> > > loading and back on storing (plus some optional rounding of the
> > > mantissa to 53 or 23 bits on computations). Binary16 would certainly
> > > have added a cost, and, at the time, provided no benefit. People were
> > > not interesting in FP16 until a few years ago.
> > >
> > Probably true enough.
> >
> > I had been using it in my various 3D engines, mostly in the context of
> > vertex arrays and similar (was supported by OpenGL via GL_HALF_FLOAT and
> > similar).
> >
> > In my BJX2 project, its role is somewhat expanded, but it is still a
> > non-standard extension as far as C goes.
> >
> >
> > In my case, I use non-standard conversion rules, which can save some
> > cost but are a little funky. Cheaper cases don't bother with rounding
> > (since rounding is one of the more expensive parts of a narrowing
> > conversion).
> >
> > Say:
> > F32 -> F16:
> > { val[31],
> > (val[30]==val[29]) ||
> > ((val[29:27]!=3'b000) &&
> > (val[29:27]!=3'b111)) ?
> > (val[30] ? 5'h1F : 5'h00) :
> > { val[30], val[26:23] },
> > val[22:13] }
> >
> > F16 -> F32:
> > { val[15:14],
> > ((val[14] || (val[14:10]==5'h00)) && !(val[14:10]==5'h1F)) ?
> > 3'b000 : 3'b111,
> > val[13:10], val[9:0], 13'h00 }
> >
> > ...
> >
> >
> > Another semi-popular option is "BFloat16", or "S.E8.F7" (essentially a
> > truncated Binary32), but this is less well supported in my case.
> >
> > For hacking something onto the x87, this could make more sense. Could be
> > faked in software, but on x86, this would mean wrangling the values
> > around in memory.
> >
> > Say (assuming a GAS style syntax):
> > mov ax, [bx]
> > mov [sp+2], ax
> > xor ax, ax
> > mov [sp+0], ax
> > fld dword ptr [sp+0]
> >
> > As first, wrote it in GAS style syntax, but then noted Intel style would
> > be more era appropriate...
> >
> >
> > > - anton
> >If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
> >leave nearly everything else to software.
<
> Bare minimum is good if transistors are limited. Otherwise, might as well do
> other functions in hardware. FMA (fused multiply-add) does FADD/FSUB/FMUL
> in one instruction with one normalize and round. Might be less hardware than
> separate FADD/FSUB/FMUL.
<
It is fair to say FMAC is not much more HW, but it is not fair to say it is
less HW; it is not. The augend adder needs to be 52×3+50 bits long, and the
normalize shifter is bigger.
<
> I like to include a handful more instructions. Like
> FCMP, FNEG, FABS, FSCALE. FSCALE works great with decimal float in numeric
<
With a few special cases, FCMP is an integer compare. The others are 1 cycle
calculations. Probably not wise to run them through the 3-4-5 cycle FMAC unit.
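
For illustration, a software analogue of that observation (a minimal
sketch in C; the helper names are made up, and NaN, as one of the "few
special cases", is deliberately left unhandled):

  #include <stdint.h>
  #include <string.h>

  /* Map a binary32 bit pattern to a signed key whose integer ordering
     matches the floating-point ordering; +0 and -0 both map to 0. */
  static int32_t f32_key(float f)
  {
      int32_t i;
      memcpy(&i, &f, sizeof i);              /* reinterpret the bits */
      return (i < 0) ? INT32_MIN - i : i;    /* flip the negative half */
  }

  /* Returns <0, 0, >0 like strcmp; NaN inputs would need a separate
     unordered check before using the keys. */
  static int fcmp32(float a, float b)
  {
      int32_t ka = f32_key(a), kb = f32_key(b);
      return (ka > kb) - (ka < kb);
  }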
<
> to string conversions for multiplying by 10 and dividing by 10. There is a
> minimum level of supported functions required for IEEE maybe worth looking
> into.
> I happen to be working on DFP right now for a 68k compatible. I have got 96-bit
> triple precision working, now I am thinking of reducing that to 64-bit double
> precision. I think it does not make a lot of sense to support lower precision
> decimal float. Better to use binary floats for lower precision > 64 bits.

Re: Fantasy architecture: the 10-bit byte

<tnt3gu$9nn$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=29674&group=comp.arch#29674

Path: i2pn2.org!i2pn.org!aioe.org!rd9pRsUZyxkRLAEK7e/Uzw.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Fantasy architecture: the 10-bit byte
Date: Tue, 20 Dec 2022 20:47:42 +0100
Organization: Aioe.org NNTP Server
Message-ID: <tnt3gu$9nn$1@gioia.aioe.org>
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at>
<e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at>
<tnrfd1$jt2c$1@dont-email.me>
<39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
<a8d8b7cc-cdf5-44aa-917b-012509d5438en@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="9975"; posting-host="rd9pRsUZyxkRLAEK7e/Uzw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.14
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Tue, 20 Dec 2022 19:47 UTC

Michael S wrote:
> On Tuesday, December 20, 2022 at 11:45:33 AM UTC+2, robf...@gmail.com wrote:
>> On Monday, December 19, 2022 at 11:58:13 PM UTC-5, BGB wrote:
>>> If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
>>> leave nearly everything else to software.
>> Bare minimum is good if transistors are limited. Otherwise, might as well do
>> other functions in hardware. FMA (fused multiply-add) does FADD/FSUB/FMUL
>> in one instruction with one normalize and round. Might be less hardware than
>> separate FADD/FSUB/FMUL.
>
> Compare apples to apples.
> FMA with throughput of 1 per n clocks is smaller that FADD + FMUL with throughput of 1 per n clocks *each*.
> But it is bigger than FPU that does either FADD or FMUL per n clocks.
> Renormalization after FMA is a lot more costly than after FMUL and somewhat more costly than after FADD.
> Also, FMA needs full 53x53=>106 bit multiplier. FMUL, on the other hand, only needs 54 MS bits of result
> plus wired or of the rest of the bits. I am not an expert in multipliers HW, but it seems to me that
> there exist a potential for HW savings.
>
For me, a sufficient reason to include FMA is the fact that with the FMA
wide normalizer we gain the ability to handle subnormal inputs and
outputs at zero cycle cost and very marginal gate cost.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Fantasy architecture: the 10-bit byte

<7a15e6cb-f2c9-4741-91f3-b2d45ff74c69n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29675&group=comp.arch#29675

X-Received: by 2002:ac8:6896:0:b0:3a5:6aa1:7cd6 with SMTP id m22-20020ac86896000000b003a56aa17cd6mr71577809qtq.146.1671567206043;
Tue, 20 Dec 2022 12:13:26 -0800 (PST)
X-Received: by 2002:a05:6870:ebc3:b0:13c:97e9:5d40 with SMTP id
cr3-20020a056870ebc300b0013c97e95d40mr2222412oab.42.1671567205701; Tue, 20
Dec 2022 12:13:25 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 20 Dec 2022 12:13:25 -0800 (PST)
In-Reply-To: <tnt3gu$9nn$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:6d05:5e47:7756:554b;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:6d05:5e47:7756:554b
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at> <e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at>
<tnrfd1$jt2c$1@dont-email.me> <39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
<a8d8b7cc-cdf5-44aa-917b-012509d5438en@googlegroups.com> <tnt3gu$9nn$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7a15e6cb-f2c9-4741-91f3-b2d45ff74c69n@googlegroups.com>
Subject: Re: Fantasy architecture: the 10-bit byte
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 20 Dec 2022 20:13:26 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 3297
 by: MitchAlsup - Tue, 20 Dec 2022 20:13 UTC

On Tuesday, December 20, 2022 at 1:47:45 PM UTC-6, Terje Mathisen wrote:
> Michael S wrote:
> > On Tuesday, December 20, 2022 at 11:45:33 AM UTC+2, robf...@gmail.com wrote:
> >> On Monday, December 19, 2022 at 11:58:13 PM UTC-5, BGB wrote:
> >>> If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
> >>> leave nearly everything else to software.
> >> Bare minimum is good if transistors are limited. Otherwise, might as well do
> >> other functions in hardware. FMA (fused multiply-add) does FADD/FSUB/FMUL
> >> in one instruction with one normalize and round. Might be less hardware than
> >> separate FADD/FSUB/FMUL.
> >
> > Compare apples to apples.
> > FMA with throughput of 1 per n clocks is smaller that FADD + FMUL with throughput of 1 per n clocks *each*.
> > But it is bigger than FPU that does either FADD or FMUL per n clocks.
> > Renormalization after FMA is a lot more costly than after FMUL and somewhat more costly than after FADD.
> > Also, FMA needs full 53x53=>106 bit multiplier. FMUL, on the other hand, only needs 54 MS bits of result
> > plus wired or of the rest of the bits. I am not an expert in multipliers HW, but it seems to me that
> > there exist a potential for HW savings.
> >
> For me, a sufficient reason to include FMA is the fact that with the FMA
> wide normalizer we gain the ability to handle subnormal inputs and
> outputs at zero cycle cost and very marginal gate cost.
<
For me, a sufficient reason to include FMAC is that IEEE 754 demands it.
The rest, as they say, is gravy.
>
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: Fantasy architecture: the 10-bit byte

<tntio2$ql9h$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29677&group=comp.arch#29677

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Fantasy architecture: the 10-bit byte
Date: Tue, 20 Dec 2022 18:07:28 -0600
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <tntio2$ql9h$1@dont-email.me>
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at>
<e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at>
<tnrfd1$jt2c$1@dont-email.me> <2022Dec20.111018@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 21 Dec 2022 00:07:30 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="1cedb4eb961a5e82c67f3875b01dfb56";
logging-data="873777"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19dfShKLa1Eee3vt8rH8Eze"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.6.0
Cancel-Lock: sha1:IPpDtKPhU0yTn3cNGqHmk9g5rpw=
Content-Language: en-US
In-Reply-To: <2022Dec20.111018@mips.complang.tuwien.ac.at>
 by: BGB - Wed, 21 Dec 2022 00:07 UTC

On 12/20/2022 4:10 AM, Anton Ertl wrote:
> BGB <cr88192@gmail.com> writes:
>> I think the eventual result was that SSE came along, and then by x86-64
>> people had mostly abandoned x87 in favor of SSE, and typically doing the
>> math operations in software rather than using x87 ops (given SSE doesn't
>> have a lot of this stuff either).
>
> They could use the 80387 instructions if they provide a benefit; they
> are still there. However, I just checked this, and on a Skylake with
> glibc-2.31 sin() calls __sin_fma, and I see AVX-128 instructions in
> the first part of this code, so it probably does not invoke FSIN.
>

It seems that once x86-64 came along, and code moved over to it, the use
of x87 ops was typically dropped entirely (even if one could still in
principle make use of x87 for FSIN/FCOS/etc).

> - anton

Re: Fantasy architecture: the 10-bit byte

<c0fffdda-789f-49a2-9d74-569ccde10976n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29678&group=comp.arch#29678

X-Received: by 2002:ac8:72d1:0:b0:3a8:15e1:757 with SMTP id o17-20020ac872d1000000b003a815e10757mr1047276qtp.194.1671581867231;
Tue, 20 Dec 2022 16:17:47 -0800 (PST)
X-Received: by 2002:a05:6808:5d9:b0:355:4eda:47e0 with SMTP id
d25-20020a05680805d900b003554eda47e0mr1172606oij.167.1671581866949; Tue, 20
Dec 2022 16:17:46 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 20 Dec 2022 16:17:46 -0800 (PST)
In-Reply-To: <tnt3gu$9nn$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:11cd:d708:9b1:4a60;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:11cd:d708:9b1:4a60
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at> <e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at>
<tnrfd1$jt2c$1@dont-email.me> <39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
<a8d8b7cc-cdf5-44aa-917b-012509d5438en@googlegroups.com> <tnt3gu$9nn$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c0fffdda-789f-49a2-9d74-569ccde10976n@googlegroups.com>
Subject: Re: Fantasy architecture: the 10-bit byte
From: already5...@yahoo.com (Michael S)
Injection-Date: Wed, 21 Dec 2022 00:17:47 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 3345
 by: Michael S - Wed, 21 Dec 2022 00:17 UTC

On Tuesday, December 20, 2022 at 9:47:45 PM UTC+2, Terje Mathisen wrote:
> Michael S wrote:
> > On Tuesday, December 20, 2022 at 11:45:33 AM UTC+2, robf...@gmail.com wrote:
> >> On Monday, December 19, 2022 at 11:58:13 PM UTC-5, BGB wrote:
> >>> If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
> >>> leave nearly everything else to software.
> >> Bare minimum is good if transistors are limited. Otherwise, might as well do
> >> other functions in hardware. FMA (fused multiply-add) does FADD/FSUB/FMUL
> >> in one instruction with one normalize and round. Might be less hardware than
> >> separate FADD/FSUB/FMUL.
> >
> > Compare apples to apples.
> > FMA with throughput of 1 per n clocks is smaller that FADD + FMUL with throughput of 1 per n clocks *each*.
> > But it is bigger than FPU that does either FADD or FMUL per n clocks.
> > Renormalization after FMA is a lot more costly than after FMUL and somewhat more costly than after FADD.
> > Also, FMA needs full 53x53=>106 bit multiplier. FMUL, on the other hand, only needs 54 MS bits of result
> > plus wired or of the rest of the bits. I am not an expert in multipliers HW, but it seems to me that
> > there exist a potential for HW savings.
> >
> For me, a sufficient reason to include FMA is the fact that with the FMA
> wide normalizer we gain the ability to handle subnormal inputs and
> outputs at zero cycle cost and very marginal gate cost.
>

If this is indeed the fact.
I am unconvinced that it is true in all possible circumstances.
There are many sensible ways to design FMA-capable FPU.

> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: Fantasy architecture: the 10-bit byte

<tntjmb$qogm$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29679&group=comp.arch#29679

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: paaroncl...@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Subject: Re: Fantasy architecture: the 10-bit byte
Date: Tue, 20 Dec 2022 19:23:38 -0500
Organization: A noiseless patient Spider
Lines: 11
Message-ID: <tntjmb$qogm$1@dont-email.me>
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 21 Dec 2022 00:23:39 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="16ad1a4a1511737ed6851b3878784cce";
logging-data="877078"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18A7IRhXVQP9LF1FtgHvsvruecLzykbZUc="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.0
Cancel-Lock: sha1:eNkwvLmUs4pS8kg8mPaDVu6u7xE=
In-Reply-To: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
 by: Paul A. Clayton - Wed, 21 Dec 2022 00:23 UTC

Russell Wallace wrote:
[snip]
> Suppose you could go back to the early seventies and design an architecture to try to outcompete the 8080 and become the long-term basis of the personal computer industry. You get to use everything we have learned between that day and this, with two caveats:
>
> 1. You only get to bring back ideas, not equipment, so it must be technically feasible to build the first implementation with the process technology of 1974.

You might be interested in the thread:
[just fun] Time travel destination choice with (micro)architecture
knowledge
(Google Groups link:
https://groups.google.com/g/comp.arch/c/XuxpaN5fTZI/m/n4y6HkwGBAAJ )

Re: Fantasy architecture: the 10-bit byte

<16e51bdc-8a62-4d4c-b85e-3f9e54ba5e35n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29680&group=comp.arch#29680

X-Received: by 2002:a05:620a:1014:b0:6ff:c9c:dde4 with SMTP id z20-20020a05620a101400b006ff0c9cdde4mr2532887qkj.18.1671585966849;
Tue, 20 Dec 2022 17:26:06 -0800 (PST)
X-Received: by 2002:a54:4518:0:b0:359:d97b:3f6f with SMTP id
l24-20020a544518000000b00359d97b3f6fmr1881655oil.298.1671585966593; Tue, 20
Dec 2022 17:26:06 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 20 Dec 2022 17:26:06 -0800 (PST)
In-Reply-To: <c0fffdda-789f-49a2-9d74-569ccde10976n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:6d05:5e47:7756:554b;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:6d05:5e47:7756:554b
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at> <e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at>
<tnrfd1$jt2c$1@dont-email.me> <39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
<a8d8b7cc-cdf5-44aa-917b-012509d5438en@googlegroups.com> <tnt3gu$9nn$1@gioia.aioe.org>
<c0fffdda-789f-49a2-9d74-569ccde10976n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <16e51bdc-8a62-4d4c-b85e-3f9e54ba5e35n@googlegroups.com>
Subject: Re: Fantasy architecture: the 10-bit byte
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 21 Dec 2022 01:26:06 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 4583
 by: MitchAlsup - Wed, 21 Dec 2022 01:26 UTC

On Tuesday, December 20, 2022 at 6:17:48 PM UTC-6, Michael S wrote:
> On Tuesday, December 20, 2022 at 9:47:45 PM UTC+2, Terje Mathisen wrote:
> > Michael S wrote:
> > > On Tuesday, December 20, 2022 at 11:45:33 AM UTC+2, robf...@gmail.com wrote:
> > >> On Monday, December 19, 2022 at 11:58:13 PM UTC-5, BGB wrote:
> > >>> If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
> > >>> leave nearly everything else to software.
> > >> Bare minimum is good if transistors are limited. Otherwise, might as well do
> > >> other functions in hardware. FMA (fused multiply-add) does FADD/FSUB/FMUL
> > >> in one instruction with one normalize and round. Might be less hardware than
> > >> separate FADD/FSUB/FMUL.
> > >
> > > Compare apples to apples.
> > > FMA with throughput of 1 per n clocks is smaller that FADD + FMUL with throughput of 1 per n clocks *each*.
> > > But it is bigger than FPU that does either FADD or FMUL per n clocks.
> > > Renormalization after FMA is a lot more costly than after FMUL and somewhat more costly than after FADD.
> > > Also, FMA needs full 53x53=>106 bit multiplier. FMUL, on the other hand, only needs 54 MS bits of result
> > > plus wired or of the rest of the bits. I am not an expert in multipliers HW, but it seems to me that
> > > there exist a potential for HW savings.
> > >
> > For me, a sufficient reason to include FMA is the fact that with the FMA
> > wide normalizer we gain the ability to handle subnormal inputs and
> > outputs at zero cycle cost and very marginal gate cost.
> >
> If this is indeed the fact.
> I am unconvinced that it is true in all possible circumstances.
> There are many sensible ways to design FMA-capable FPU.
<
All FMACs have to be able to produce a 106-bit product and to add to it
a 53-bit fraction, handling both the case where the product lies to the
right of the fraction and the case where the fraction lies to the right
of the product, with all bits considered when rounding is performed.
<
With such requirements, the product operands only have to suppress
the hidden bit when their exponents are 0, as does the augend.
At this point the proper result is produced in all circumstances.
<
One has to be able to normalize across this 106-bit fraction, but
there is a circuit trick whereby one inserts a "fake" 1 in the position
that would become the hidden bit of the denormal (should one occur)
and then simply lets the normalization transpire. Presto, denorms
are properly created and handled. Overall cost: less than 2% gate
count.
<
So, yes, it is indeed a fact, and is not dependent on the 5-20 ways
one can build an FMAC unit, just that the unit is capable of IEEE
rounded-result accuracy. The rest, as they say, falls out for free.
<
> > Terje
> >
> > --
> > - <Terje.Mathisen at tmsw.no>
> > "almost all programming can be viewed as an exercise in caching"

Re: Fantasy architecture: the 10-bit byte

<tntnl7$r1m1$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29681&group=comp.arch#29681

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Fantasy architecture: the 10-bit byte
Date: Tue, 20 Dec 2022 19:31:17 -0600
Organization: A noiseless patient Spider
Lines: 82
Message-ID: <tntnl7$r1m1$1@dont-email.me>
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at>
<e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at>
<tnrfd1$jt2c$1@dont-email.me>
<39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
<a8d8b7cc-cdf5-44aa-917b-012509d5438en@googlegroups.com>
<tnt3gu$9nn$1@gioia.aioe.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 21 Dec 2022 01:31:20 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="1cedb4eb961a5e82c67f3875b01dfb56";
logging-data="886465"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX188vtP4rJu6iuBjbW3X9iyf"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.6.0
Cancel-Lock: sha1:xpfb1mnTr6soUG79Tg0Us0MJCF0=
Content-Language: en-US
In-Reply-To: <tnt3gu$9nn$1@gioia.aioe.org>
 by: BGB - Wed, 21 Dec 2022 01:31 UTC

On 12/20/2022 1:47 PM, Terje Mathisen wrote:
> Michael S wrote:
>> On Tuesday, December 20, 2022 at 11:45:33 AM UTC+2, robf...@gmail.com
>> wrote:
>>> On Monday, December 19, 2022 at 11:58:13 PM UTC-5, BGB wrote:
>>>> If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
>>>> leave nearly everything else to software.
>>> Bare minimum is good if transistors are limited. Otherwise, might as
>>> well do
>>> other functions in hardware. FMA (fused multiply-add) does
>>> FADD/FSUB/FMUL
>>> in one instruction with one normalize and round. Might be less
>>> hardware than
>>> separate FADD/FSUB/FMUL.
>>
>> Compare apples to apples.
>> FMA with throughput of 1 per n clocks is smaller that FADD + FMUL with
>> throughput of 1 per n clocks *each*.
>> But it is bigger than FPU that does either FADD or FMUL per n clocks.
>> Renormalization after FMA is a lot more costly than after FMUL and
>> somewhat more costly than after FADD.
>> Also, FMA needs full 53x53=>106 bit multiplier. FMUL, on the other
>> hand, only needs 54 MS bits of result
>> plus wired or of the rest of the bits. I am not an expert in
>> multipliers HW, but it seems to me that
>> there exist a potential for HW savings.
>>
> For me, a sufficient reason to include FMA is the fact that with the FMA
> wide normalizer we gain the ability to handle subnormal inputs and
> outputs at zero cycle cost and very marginal gate cost.
>

This is one possible merit.

I started at one point looking into doing a combined FMA unit, but this
effort stalled out when I noted it would be expensive. Tried out a "less
extreme" intermediate option, but it was still expensive.

Could have done it for the low-precision FPU (partly leveraging that the
DSP48 does an X*Y+Z operation already), but it would initially have still
needed ~ 4 cycles, which would not work with my existing pipeline (and
not being able to pipeline it would be worse than not having FMA, and
would kinda defeat much of the gain vs the main FPU).

Say:
Main FPU:
Scalar FADD/FSUB/FMUL: 6 cycle (6C 6T)
FMAC (double rounded): 12 cycle (12C 12T)
SIMD FADD/FSUB/FMUL: 10 cycle (10C 10T)
Low Precision (at present, *):
SIMD FADD/FSUB/FMUL: 3 cycle (3C 1T)

*: Includes both Binary16 and Binary32 operations.

Though, besides the operations themselves, one area of concern for
denormals is the converters, where one either needs to pay a full FADD
latency cost for format conversions, or the converters need to be fairly
expensive.

I had fudged the rules slightly in a way that allowed keeping converter
cost low, while still keeping some other useful properties.

Normal conversion between FP and integer types is still an issue though.
For the general case, it needs to be routed through the FADD unit, which
requires having a mantissa larger than the largest supported integer type.
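
For illustration, the classic software form of that trick (a minimal
sketch in C; it relies only on the double's 52-bit mantissa being wider
than the 32-bit integer being converted, and the function name is made up):

  #include <stdint.h>
  #include <string.h>

  /* Convert a 32-bit unsigned integer to double by dropping it into the
     mantissa of 2^52 and letting an ordinary FP subtract renormalize it. */
  static double u32_to_double_via_fadd(uint32_t n)
  {
      uint64_t bits = 0x4330000000000000ULL | n;   /* bit pattern of 2^52 + n */
      double d;
      memcpy(&d, &bits, sizeof d);
      return d - 4503599627370496.0;               /* subtract 2^52 exactly */
  }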

The SIMD converters use a scheme which avoids needing to use the FADD
units for this, but puts some steep restrictions on what can be converted.

Similarly, an Int<->FP converter that can only directly handle 16 or 20
bit values (and needing to fall back to emulation software for anything
bigger) would be... Kinda lame.

> Terje
>

Re: Fantasy architecture: the 10-bit byte

<c82583ee-2c34-4ab4-9e02-aee8e1a7dffcn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29682&group=comp.arch#29682

X-Received: by 2002:a37:a891:0:b0:6ff:9543:d534 with SMTP id r139-20020a37a891000000b006ff9543d534mr1204432qke.676.1671590134915;
Tue, 20 Dec 2022 18:35:34 -0800 (PST)
X-Received: by 2002:a05:6808:3a8d:b0:354:9da8:98a9 with SMTP id
fb13-20020a0568083a8d00b003549da898a9mr6534oib.9.1671590134596; Tue, 20 Dec
2022 18:35:34 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 20 Dec 2022 18:35:34 -0800 (PST)
In-Reply-To: <tntnl7$r1m1$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=99.251.79.92; posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 99.251.79.92
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at> <e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at>
<tnrfd1$jt2c$1@dont-email.me> <39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
<a8d8b7cc-cdf5-44aa-917b-012509d5438en@googlegroups.com> <tnt3gu$9nn$1@gioia.aioe.org>
<tntnl7$r1m1$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c82583ee-2c34-4ab4-9e02-aee8e1a7dffcn@googlegroups.com>
Subject: Re: Fantasy architecture: the 10-bit byte
From: robfi...@gmail.com (robf...@gmail.com)
Injection-Date: Wed, 21 Dec 2022 02:35:34 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 6979
 by: robf...@gmail.com - Wed, 21 Dec 2022 02:35 UTC

On Tuesday, December 20, 2022 at 8:31:23 PM UTC-5, BGB wrote:
> On 12/20/2022 1:47 PM, Terje Mathisen wrote:
> > Michael S wrote:
> >> On Tuesday, December 20, 2022 at 11:45:33 AM UTC+2, robf...@gmail.com
> >> wrote:
> >>> On Monday, December 19, 2022 at 11:58:13 PM UTC-5, BGB wrote:
> >>>> If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
> >>>> leave nearly everything else to software.
> >>> Bare minimum is good if transistors are limited. Otherwise, might as
> >>> well do
> >>> other functions in hardware. FMA (fused multiply-add) does
> >>> FADD/FSUB/FMUL
> >>> in one instruction with one normalize and round. Might be less
> >>> hardware than
> >>> separate FADD/FSUB/FMUL.
> >>
> >> Compare apples to apples.
> >> FMA with throughput of 1 per n clocks is smaller that FADD + FMUL with
> >> throughput of 1 per n clocks *each*.
> >> But it is bigger than FPU that does either FADD or FMUL per n clocks.
> >> Renormalization after FMA is a lot more costly than after FMUL and
> >> somewhat more costly than after FADD.
> >> Also, FMA needs full 53x53=>106 bit multiplier. FMUL, on the other
> >> hand, only needs 54 MS bits of result
> >> plus wired or of the rest of the bits. I am not an expert in
> >> multipliers HW, but it seems to me that
> >> there exist a potential for HW savings.
> >>
> > For me, a sufficient reason to include FMA is the fact that with the FMA
> > wide normalizer we gain the ability to handle subnormal inputs and
> > outputs at zero cycle cost and very marginal gate cost.
> >
> This is one possible merit.
>
>
> I started at one point looking into doing a combine FMA unit, but this
> effort stalled out when I noted it would be expensive. Tried out a "less
> extreme" intermediate option, but was still expensive.
>
> Could have done it for the low-precision FPU (partly leveraging that the
> DSP48 does an X*Y+Z operation already), but initially would have still
> needed ~ 4 cycles, which would not work with my existing pipeline (and
> not being able to pipeline it would be worse than not having FMA, and
> would kinda defeat much of the gain vs the main FPU).
>
>
> Say:
> Main FPU:
> Scalar FADD/FSUB/FMUL: 6 cycle (6C 6T)
> FMAC (double rounded): 12 cycle (12C 12T)
> SIMD FADD/FSUB/FMUL: 10 cycle (10C 10T)
> Low Precision (at present, *):
> SIMD FADD/FSUB/FMUL: 3 cycle (3C 1T)
>
> *: Includes both Binary16 and Binary32 operations.
>
>
>
> Though, besides the operations themselves, one are of concern for
> denormals is converters, where one either needs to pay a full FADD
> latency cost for format conversions, or the converters need to be fairly
> expensive.
>
> I had fudged the rules slightly in a way that allowed keeping converter
> cost low, while still keeping some other useful properties.
>
>
> Normal conversion between FP and integer types is still an issue though.
> For general-case, needs to be routed through the FADD unit, which
> requires having a mantissa larger than the largest supported integer type..
>
> The SIMD converters use a scheme which avoids needing to use the FADD
> units for this, but puts some steep restrictions on what can be converted..
>
> Similarly, an Int<->FP converter that can only directly handle 16 or 20
> bit values (and needing to fall back to emulation software for anything
> bigger) would be... Kinda lame.
>
>
> > Terje
> >

There are a lot of merits to the 10-bit byte. It is somewhat of a sweet spot for
character data and memory sizes. Back when programming early micros, I wished that
more programmable characters were available on-screen to allow graphics. A 40×25
screen could use 1024 character tiles.

In the past I have done some thinking, and sketched out a design, on using 10/11-bit
bytes, because ECC with single-bit error correction can be had if 16-bit-wide memory
is used to hold each byte. Wanting to get error correction with a standard 16-bit-wide
memory leads to an odd byte size.
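
For concreteness, a minimal sketch in C of that kind of code, assuming a
standard even-parity Hamming layout (check bits at positions 1, 2, 4, 8,
overall parity in bit 0; the function name is made up): an 11-bit byte
encodes into one 16-bit memory word with single-error correction and
double-error detection.

  #include <stdint.h>

  /* Encode an 11-bit "byte" into a 16-bit word: Hamming SEC with check
     bits at positions 1, 2, 4, 8, plus an overall parity bit in bit 0
     for double-error detection. Decode/correct is not shown. */
  static uint16_t hamming16_encode(uint16_t data11)
  {
      uint16_t code = 0;
      int d = 0;

      /* data bits go to the non-power-of-two positions 3,5,6,7,9..15 */
      for (int pos = 3; pos <= 15; pos++) {
          if ((pos & (pos - 1)) == 0)
              continue;                       /* skip check-bit positions */
          if (data11 & (1u << d++))
              code |= (uint16_t)(1u << pos);
      }
      /* each check bit covers the positions whose index has that bit set */
      for (int p = 1; p <= 8; p <<= 1) {
          int parity = 0;
          for (int pos = 3; pos <= 15; pos++)
              if (pos & p)
                  parity ^= (code >> pos) & 1;
          if (parity)
              code |= (uint16_t)(1u << p);
      }
      /* overall parity over bits 1..15 goes into bit 0 */
      int overall = 0;
      for (int pos = 1; pos <= 15; pos++)
          overall ^= (code >> pos) & 1;
      return (uint16_t)(code | overall);
  }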

One of the issues with the byte size is getting stuck in a rut of computers designed
around it. If 10-bit bytes had been adopted, then we might not have progressed as quickly
to 64-bit machines, because 20 bits would have been "good enough" for a much longer
time. 40-bit machines may have become popular, with little demand to upgrade from 40
to 80 bits, and would probably be adequate today. We are now likely stuck in a 64-bit
machine rut, while those who chose 10-bit bytes would be using 80-bit machines.

> A 10-bit byte will be good for a few hundred characters of extended ASCII. A 20-bit
> word will be good for Unicode, avoiding the trouble that happened OTL when the world
> tried to squeeze Unicode into a 16-bit word.

Even 20 bits may not be enough for Unicode. I would prefer a larger size for this, say
24 bits, as I think there are a lot of Unicode characters. A UTF-21 format, three
characters per 64-bit word, works okay but might be too restrictive in the long run.
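
For what it's worth, the packing itself is trivial (a minimal sketch in C;
"UTF-21" here just means three 21-bit code points per 64-bit word as
described above, and the names are made up):

  #include <stdint.h>

  /* Pack/unpack three 21-bit Unicode code points into one 64-bit word;
     the top bit of the word is left unused. */
  static uint64_t utf21_pack(uint32_t c0, uint32_t c1, uint32_t c2)
  {
      return ((uint64_t)(c0 & 0x1FFFFF))
           | ((uint64_t)(c1 & 0x1FFFFF) << 21)
           | ((uint64_t)(c2 & 0x1FFFFF) << 42);
  }

  static void utf21_unpack(uint64_t w, uint32_t cp[3])
  {
      cp[0] = (uint32_t)( w        & 0x1FFFFF);
      cp[1] = (uint32_t)((w >> 21) & 0x1FFFFF);
      cp[2] = (uint32_t)((w >> 42) & 0x1FFFFF);
  }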

Re: Fantasy architecture: the 10-bit byte

<tntuvm$uidt$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29684&group=comp.arch#29684

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Fantasy architecture: the 10-bit byte
Date: Tue, 20 Dec 2022 21:36:19 -0600
Organization: A noiseless patient Spider
Lines: 137
Message-ID: <tntuvm$uidt$2@dont-email.me>
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at>
<e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at>
<tnrfd1$jt2c$1@dont-email.me>
<39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
<a8d8b7cc-cdf5-44aa-917b-012509d5438en@googlegroups.com>
<tnt3gu$9nn$1@gioia.aioe.org>
<c0fffdda-789f-49a2-9d74-569ccde10976n@googlegroups.com>
<16e51bdc-8a62-4d4c-b85e-3f9e54ba5e35n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 21 Dec 2022 03:36:22 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="1cedb4eb961a5e82c67f3875b01dfb56";
logging-data="1001917"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+pmo2a5tiSKnr9lOi/xbdW"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.6.0
Cancel-Lock: sha1:tLBTXK3lOYBJI9j0gjOfs1YQLgQ=
Content-Language: en-US
In-Reply-To: <16e51bdc-8a62-4d4c-b85e-3f9e54ba5e35n@googlegroups.com>
 by: BGB - Wed, 21 Dec 2022 03:36 UTC

On 12/20/2022 7:26 PM, MitchAlsup wrote:
> On Tuesday, December 20, 2022 at 6:17:48 PM UTC-6, Michael S wrote:
>> On Tuesday, December 20, 2022 at 9:47:45 PM UTC+2, Terje Mathisen wrote:
>>> Michael S wrote:
>>>> On Tuesday, December 20, 2022 at 11:45:33 AM UTC+2, robf...@gmail.com wrote:
>>>>> On Monday, December 19, 2022 at 11:58:13 PM UTC-5, BGB wrote:
>>>>>> If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
>>>>>> leave nearly everything else to software.
>>>>> Bare minimum is good if transistors are limited. Otherwise, might as well do
>>>>> other functions in hardware. FMA (fused multiply-add) does FADD/FSUB/FMUL
>>>>> in one instruction with one normalize and round. Might be less hardware than
>>>>> separate FADD/FSUB/FMUL.
>>>>
>>>> Compare apples to apples.
>>>> FMA with throughput of 1 per n clocks is smaller that FADD + FMUL with throughput of 1 per n clocks *each*.
>>>> But it is bigger than FPU that does either FADD or FMUL per n clocks.
>>>> Renormalization after FMA is a lot more costly than after FMUL and somewhat more costly than after FADD.
>>>> Also, FMA needs full 53x53=>106 bit multiplier. FMUL, on the other hand, only needs 54 MS bits of result
>>>> plus wired or of the rest of the bits. I am not an expert in multipliers HW, but it seems to me that
>>>> there exist a potential for HW savings.
>>>>
>>> For me, a sufficient reason to include FMA is the fact that with the FMA
>>> wide normalizer we gain the ability to handle subnormal inputs and
>>> outputs at zero cycle cost and very marginal gate cost.
>>>
>> If this is indeed the fact.
>> I am unconvinced that it is true in all possible circumstances.
>> There are many sensible ways to design FMA-capable FPU.
> <
> All FMACs have to be able to produce a 106-bit product and to all to it
> a 53-bit fraction such that the product is to the right of the fraction,
> the fraction is to the right of the product, and that all bits are considered
> when rounding is performed.
> <
> With such requirements, the product operands only have to avoid
> producing the hidden bit when the exponents are 0 as does the augend.
> At this point the proper result is produced in all circumstances.
> <
> One has to be able to normalize across this 106-bit fraction, but
> there is a circuit trick whereby one inserts a "fake" 1 in the position
> that would become the hidden bit of the denormal (should on occur)
> and then simply let the normalization transpire. Presto, denorms
> are properly created and handled. Overall cost less than 2% gate
> count.
> <
> So, yes, it is indeed a fact, and is not dependent on the 5-20 ways
> one has to build a FMAC unit, just that the unit is capable of IEEE
> rounded result accuracy. The rest, as they say, fall out for free.
> <

Yeah, just... That first part...

This is a big reason why BJX2 does not have single-rounded FMAC, nor any
immediate plan to implement it in "hardware".

There is an optional FMAC instruction, but it is double-rounded, for the
main reason that double rounding is a lot cheaper (but the tricks used
to implement higher-precision math via FMAC ops will not work in this
case).
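
For illustration, the difference is easy to see in software (a minimal
sketch in C, assuming the platform's fma() is genuinely fused; the values
are chosen so the exact product needs more than 53 bits):

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      double a = 1.0 + 0x1p-30;
      double b = 1.0 + 0x1p-29;

      double p   = a * b;            /* rounded product */
      double err = fma(a, b, -p);    /* exact low part of a*b: the trick that
                                        needs single rounding and breaks with
                                        a double-rounded FMAC */
      double fused   = fma(a, b, -1.0);   /* one rounding */
      double twostep = a * b - 1.0;       /* two roundings */

      printf("err     = %a\n", err);      /* nonzero: 0x1p-59 */
      printf("fused   = %a\n", fused);
      printf("twostep = %a\n", twostep);  /* differs from fused by 0x1p-59 */
      return 0;
  }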

....

Otherwise, recently bought an OrangeCrab board, which I could in theory
try porting stuff to (though, IO is a big concern; it uses a Lattice ECP5
FPGA which "seems decently large on paper", but I have little idea how it
compares with an Artix or Spartan in use).

Have been trying to think out how to wire a VGA output to the thing.

Currently looks like I might need to do it RGBI style (~ 6 IO pins), but
possibly feed it through some 220pF capacitors or similar (thinking is
that 220pF could potentially allow me to PWM the output signal, while
still having a fast enough response so as to not turn the output into a
blurry mess).

Ideally, should be able to smooth out multiple pulses within a single
pixel, while allowing a "reasonably quick" transition between adjacent
pixels (otherwise, I would be left with either 16-color for lower
capacitance, or blurred output for higher capacitance; basically as part
of a low-pass RC-filter).

Though, 68 or 100pF capacitors might also work, ideally want around a 15
to 20 MHz cutoff (say, with 2K to 10K for R, then fed into a 2n3904 (as
a driver/amplifier), which then goes to the output along with the
voltage-limiting LED, with probably a 100 ohm resistor on the NPN, ...).

Could maybe also use trimpots to keep the cutoff as adjustable.

Don't really have enough IO pins for much beyond RGBI; nor enough
clock-cycles per pixel for effective PWM/PDM ...

Unclear how many mA the IO pins can drive (might need to use a few
2n3904 transistors or similar; also to step up the H/V sync from 3.3v to
5v, though could skip if the monitor will accept 3.3v H/V sync).

Could maybe use some LEDs as voltage limiters, since if the voltage goes
too far over the 0.7v white level, the LED will dissipate the excess.

Otherwise, if I could fit a PS2 connector into the mix, that would be
good as well (would need ~ 2 or 4 more data wires, 4 for mouse +
keyboard, and source for 5v).

The board seemingly also has a connection for (I assume) a LiON/LiPO
cell. Just less IO than I initially thought as the 'A' pins are
apparently input-only. Could run off this, but would only have 3.3v.

If I want batteries and 5v, could use a NiMH pack for 5v (likely 4x 1.2v
cells), but could not use the built-in charger circuit (and would need
6v to charge the pack). Would need a 6-cell pack to run a 7805 (5V)
regulator (with ~ 9v for the charger).

Though, 6v does fall within the standard +/- 25% tolerance for TTL, so
should be OK in theory to run stuff directly off the batteries (a 4-cell
NiMH pack would be ~ 4.0 to 5.6 v depending on charge level).

Still some uncertainty about hardware mapping differences, or specifics
of using the Yosys toolchain, ...

Could be an experiment though to see if I could fit the BJX2 core onto it.

>>> Terje
>>>
>>> --
>>> - <Terje.Mathisen at tmsw.no>
>>> "almost all programming can be viewed as an exercise in caching"

Re: Fantasy architecture: the 10-bit byte

<tnv23d$1vk1$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=29693&group=comp.arch#29693

Path: i2pn2.org!i2pn.org!aioe.org!rd9pRsUZyxkRLAEK7e/Uzw.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Fantasy architecture: the 10-bit byte
Date: Wed, 21 Dec 2022 14:35:40 +0100
Organization: Aioe.org NNTP Server
Message-ID: <tnv23d$1vk1$1@gioia.aioe.org>
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at>
<e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at>
<tnrfd1$jt2c$1@dont-email.me>
<39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
<a8d8b7cc-cdf5-44aa-917b-012509d5438en@googlegroups.com>
<tnt3gu$9nn$1@gioia.aioe.org>
<c0fffdda-789f-49a2-9d74-569ccde10976n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="65153"; posting-host="rd9pRsUZyxkRLAEK7e/Uzw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.14
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Wed, 21 Dec 2022 13:35 UTC

Michael S wrote:
> On Tuesday, December 20, 2022 at 9:47:45 PM UTC+2, Terje Mathisen wrote:
>> Michael S wrote:
>>> On Tuesday, December 20, 2022 at 11:45:33 AM UTC+2, robf...@gmail.com wrote:
>>>> On Monday, December 19, 2022 at 11:58:13 PM UTC-5, BGB wrote:
>>>>> If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
>>>>> leave nearly everything else to software.
>>>> Bare minimum is good if transistors are limited. Otherwise, might as well do
>>>> other functions in hardware. FMA (fused multiply-add) does FADD/FSUB/FMUL
>>>> in one instruction with one normalize and round. Might be less hardware than
>>>> separate FADD/FSUB/FMUL.
>>>
>>> Compare apples to apples.
>>> FMA with throughput of 1 per n clocks is smaller that FADD + FMUL with throughput of 1 per n clocks *each*.
>>> But it is bigger than FPU that does either FADD or FMUL per n clocks.
>>> Renormalization after FMA is a lot more costly than after FMUL and somewhat more costly than after FADD.
>>> Also, FMA needs full 53x53=>106 bit multiplier. FMUL, on the other hand, only needs 54 MS bits of result
>>> plus wired or of the rest of the bits. I am not an expert in multipliers HW, but it seems to me that
>>> there exist a potential for HW savings.
>>>
>> For me, a sufficient reason to include FMA is the fact that with the FMA
>> wide normalizer we gain the ability to handle subnormal inputs and
>> outputs at zero cycle cost and very marginal gate cost.
>>
>
> If this is indeed the fact.
> I am unconvinced that it is true in all possible circumstances.
> There are many sensible ways to design FMA-capable FPU.

This is an interesting statement!

Can you give at least one example that doesn't include a normalizer wide
enough to handle FMUL of two subnormal numbers without first normalizing
them?

Having that normalizer means that we can basically ignore normal vs
subnormal on input, and the output normalization is the same as
otherwise required for FMA with (near-)maximal cancellation, right?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Fantasy architecture: the 10-bit byte

<8e266293-fa76-4e6e-bfa8-849cc8a1eb0cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29706&group=comp.arch#29706

X-Received: by 2002:ac8:71da:0:b0:3a9:80b6:4ca0 with SMTP id i26-20020ac871da000000b003a980b64ca0mr110207qtp.304.1671659928279;
Wed, 21 Dec 2022 13:58:48 -0800 (PST)
X-Received: by 2002:a05:6870:cc81:b0:144:5572:4aeb with SMTP id
ot1-20020a056870cc8100b0014455724aebmr351320oab.186.1671659927726; Wed, 21
Dec 2022 13:58:47 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 21 Dec 2022 13:58:47 -0800 (PST)
In-Reply-To: <tnv23d$1vk1$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=136.50.14.162; posting-account=AoizIQoAAADa7kQDpB0DAj2jwddxXUgl
NNTP-Posting-Host: 136.50.14.162
References: <41a60986-1046-4318-823e-e07a9f175e70n@googlegroups.com>
<2022Dec18.175256@mips.complang.tuwien.ac.at> <e8b88b4b-409d-41af-a98b-1487a3dbf91fn@googlegroups.com>
<tnovcp$79fh$1@dont-email.me> <tnp38h$160bb$1@newsreader4.netcologne.de>
<tnq4um$d0c9$1@dont-email.me> <2022Dec19.191001@mips.complang.tuwien.ac.at>
<tnrfd1$jt2c$1@dont-email.me> <39a88253-607b-483b-87c1-4a0008235051n@googlegroups.com>
<a8d8b7cc-cdf5-44aa-917b-012509d5438en@googlegroups.com> <tnt3gu$9nn$1@gioia.aioe.org>
<c0fffdda-789f-49a2-9d74-569ccde10976n@googlegroups.com> <tnv23d$1vk1$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8e266293-fa76-4e6e-bfa8-849cc8a1eb0cn@googlegroups.com>
Subject: Re: Fantasy architecture: the 10-bit byte
From: jim.brak...@ieee.org (JimBrakefield)
Injection-Date: Wed, 21 Dec 2022 21:58:48 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 4413
 by: JimBrakefield - Wed, 21 Dec 2022 21:58 UTC

On Wednesday, December 21, 2022 at 7:35:45 AM UTC-6, Terje Mathisen wrote:
> Michael S wrote:
> > On Tuesday, December 20, 2022 at 9:47:45 PM UTC+2, Terje Mathisen wrote:
> >> Michael S wrote:
> >>> On Tuesday, December 20, 2022 at 11:45:33 AM UTC+2, robf...@gmail.com wrote:
> >>>> On Monday, December 19, 2022 at 11:58:13 PM UTC-5, BGB wrote:
> >>>>> If it were me, I would just assume an FPU that does FADD/FSUB/FMUL and
> >>>>> leave nearly everything else to software.
> >>>> Bare minimum is good if transistors are limited. Otherwise, might as well do
> >>>> other functions in hardware. FMA (fused multiply-add) does FADD/FSUB/FMUL
> >>>> in one instruction with one normalize and round. Might be less hardware than
> >>>> separate FADD/FSUB/FMUL.
> >>>
> >>> Compare apples to apples.
> >>> FMA with throughput of 1 per n clocks is smaller that FADD + FMUL with throughput of 1 per n clocks *each*.
> >>> But it is bigger than FPU that does either FADD or FMUL per n clocks.
> >>> Renormalization after FMA is a lot more costly than after FMUL and somewhat more costly than after FADD.
> >>> Also, FMA needs full 53x53=>106 bit multiplier. FMUL, on the other hand, only needs 54 MS bits of result
> >>> plus wired or of the rest of the bits. I am not an expert in multipliers HW, but it seems to me that
> >>> there exist a potential for HW savings.
> >>>
> >> For me, a sufficient reason to include FMA is the fact that with the FMA
> >> wide normalizer we gain the ability to handle subnormal inputs and
> >> outputs at zero cycle cost and very marginal gate cost.
> >>
> >
> > If this is indeed the fact.
> > I am unconvinced that it is true in all possible circumstances.
> > There are many sensible ways to design FMA-capable FPU.
> This is an interesting statement!
>
> Can you give at least one example that don't include a normalizer wide
> enough to handle FMUL of two subnormal numbers without first normalizing
> them?
>
> Having that normalizer means that we can basically ignore normal vs
> subnormal on input, and the output normalization is the same as
> otherwise required for FMA with (near-)maximal cancellation, right?
> Terje
>
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"
|>
|>Can you give at least one example that don't include a normalizer wide
|>enough to handle FMUL of two subnormal numbers without first normalizing
|>them?
Rant on:
There is a way to keep subnormal numbers normalized; it works best with round-to-odd,
using a trailing-zeros fraction/mantissa code and using the exponent to identify the value
as subnormal. The interpretation of this method is that subnormal numbers are inexact.
Rant off

Re: Fantasy architecture: the 10-bit byte

<13dd8ec8-d946-4b35-a4ed-d6dc93766e9dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29709&group=comp.arch#29709

From: already5...@yahoo.com (Michael S)
 by: Michael S - Wed, 21 Dec 2022 23:03 UTC

On Wednesday, December 21, 2022 at 3:35:45 PM UTC+2, Terje Mathisen wrote:
> [snip; quoted in full in the previous post]
> Can you give at least one example that doesn't include a normalizer wide
> enough to handle FMUL of two subnormal numbers without first normalizing
> them?
>
> Having that normalizer means that we can basically ignore normal vs
> subnormal on input, and the output normalization is the same as
> otherwise required for FMA with (near-)maximal cancellation, right?
> Terje

I can't answer your question right now.
Instead, I can give you the result of a measurement on real-world hardware.
Intel Skylake (has FMA on both FP execution pipes):
1e-307 - 1.01e-307 (a subtraction whose result is subnormal) takes ~138 clocks (latency).
That's pretty close to the ~152 clocks the same operation takes on Intel Ivy Bridge,
which does not have FMA.
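
A rough sketch of how such a number can be measured, assuming x86 with GCC or Clang; the loop structure, names and iteration count are mine, not the actual test behind the ~138/~152 clock figures. It times a serial chain of subtractions whose results are subnormal and compares it with the same chain producing normal results:

#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc() */

/* One serial chain: each iteration does a subtraction (subnormal result
   when a=1e-307, b=1.01e-307) plus an addition that feeds the result
   back into the next iteration, so iterations cannot overlap. */
static double chain(double a, double b, long n)
{
    double sink = 0.0;
    for (long i = 0; i < n; i++) {
        double d = a - b;  /* ~ -1e-309 in the subnormal case */
        sink += d;
        a = b + d;         /* restores a, but depends on d */
    }
    return sink;
}

int main(void)
{
    const long n = 1000000;

    unsigned long long t0 = __rdtsc();
    volatile double s1 = chain(1e-307, 1.01e-307, n);  /* subnormal results */
    unsigned long long t1 = __rdtsc();
    volatile double s2 = chain(1.0, 1.01, n);          /* normal results    */
    unsigned long long t2 = __rdtsc();

    printf("subnormal chain: %.1f cycles/iter\n", (double)(t1 - t0) / n);
    printf("normal chain:    %.1f cycles/iter\n", (double)(t2 - t1) / n);
    printf("(sinks: %g %g)\n", s1, s2);
    return 0;
}

Compile without -ffast-math, which would typically enable flush-to-zero/denormals-are-zero and hide the penalty; also note that __rdtsc() counts TSC ticks, which only match core clocks when the frequency is pinned.
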

