Re: Power cost of IEEE754

<70a2c19e-4757-4c5e-a4d0-b4c9055f01cfn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27480&group=comp.arch#27480

Newsgroups: comp.arch
 by: MitchAlsup - Mon, 22 Aug 2022 16:08 UTC

On Monday, August 22, 2022 at 7:39:53 AM UTC-5, luke.l...@gmail.com wrote:
> On Monday, August 22, 2022 at 3:18:19 AM UTC+1, MitchAlsup wrote:
>
> > At the time IEEE 754 was being done (the first version) there were
> > 2 kinds of FP users, those who wanted fast FP (zillions) and those
> > who wanted correct answers (5--yes 5, maybe 6).
<
> 3D GPUs as you know are still focussed on speed (and power)
> due to simple pragmatic issues that the screen resolutions simply
> don't need high-accuracy to work out where a pixel goes. i.e.: if the
> screen is 1280x800 you certainly don't need accuracy above
> ~14 bits, it is wasted silicon and power. this was why MIPS 3D ASE
> ops were designed with low-accuracy.
<
GPU FP started out as IEEE 754 containers, then as the generations
flew by, became more and more what 754 wanted in the first place.
Accuracy improved, then rounding became correct, then denormals
started to function properly, and here we are today.
>
> the *influence* of 3D GPU manufacturers although they are
> small in number is much higher (behind closed doors such
> as the Khronos Group) and so we just don't hear about
> it.
>
> Tom Forsyth on the other hand in his talk on Larrabee mentions
> that they concentrated on FP32, missed the goal of being a
> commercially-viable 3D GPU Card (performance and power)
> so attempted to target the scientific market instead, only to
> be told that the FP64 performance sucked and consequently
> they fell between two stools.
>
> wark-wark.
>
> damned if you do, damned if you don't.
>
> l.

Re: Power cost of IEEE754

<te0pu4$hv1$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=27485&group=comp.arch#27485

 by: Thomas Koenig - Mon, 22 Aug 2022 20:45 UTC

Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
> Paul A. Clayton wrote:
>> Ivan Godard wrote:
>>> On 8/21/2022 6:45 PM, Paul A. Clayton wrote:
>> [snip]
>>>> alternative. I also think that the probabilistic rounding that Nick
>>>> Maclaren championed (a little) would have been interesting and
>>>> reduced or exposed some problems with floating point.)
>>>>
>>>
>>> I originally thought stochastic rounding was a great idea and put it
>>> in. Kahan talked me out of it.
>>
>> I suspect Kahan was concerned with accuracy and provability while Nick
>> Maclaren was concerned with ordinary use where providing jitter could
>> either (rarely?) avoid some issues with practical function (getting a
>> meaningfully correct result) or make more visible the existence of
>> numerical issues with the computation. I *think* Nick Maclaren
>> considered variable results a positive feature (i.e., FP should not be
>> treated as integer with bit-exact results).
>>
>> Without hardware that supports this as an option, examining the effects
>> becomes much more difficult. Getting others to run their code in real
>> life conditions seems important for getting enough data with fewer
>> systematic 'errors'; even with a roughly equally fast and reasonably
>> available option, motivating such testing seems likely to be challenging.
>
> The best you can do these days to determine if you are close to the
> edge of stability is to either rerun the calculation with a fixed
> rounding mode (truncate/floor/ceil) which will use the exact same time,
> or you can manually truncate/jitter one or more of the bottom mantissa
> bits at regular points in the calculation, but this requires source code
> changes.

I used a Fortran compiler on a Siemens/Fujitsu mainframe clone a few
decades ago. It had an option to drop digits on each calculation,
to check for such errors.

The details are a bit hazy, and I threw away the handbook quite
some time ago.
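
The source-level variant Terje describes (truncating the bottom mantissa bits at regular points in the calculation) could look roughly like the following C sketch. It assumes IEEE 754 binary64 `double`; the names `drop_low_bits` and `sum_dropping` are hypothetical. It is not a reconstruction of the Fortran compiler option mentioned above, whose details are not given.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Clear the lowest `nbits` fraction bits of a binary64 value (0..52).
   This truncates the magnitude toward zero; sign and exponent are untouched. */
double drop_low_bits(double x, int nbits) {
    assert(nbits >= 0 && nbits <= 52);
    uint64_t u;
    memcpy(&u, &x, sizeof u);                 /* well-defined type punning */
    u &= ~((UINT64_C(1) << nbits) - 1);       /* zero the low fraction bits */
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Sum an array, dropping low bits after every addition. */
double sum_dropping(const double *v, int n, int nbits) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s = drop_low_bits(s + v[i], nbits);
    return s;
}
```

Running the same reduction with `nbits` set to 0 and then to, say, 8 or 16, and comparing the results, gives a rough feel for how sensitive the calculation is to its low-order bits.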

Re: Power cost of IEEE754

<ygntu64q7a5.fsf@y.z>

https://www.novabbs.com/devel/article-flat.php?id=27488&group=comp.arch#27488

 by: Josh Vanderhoof - Mon, 22 Aug 2022 21:48 UTC

"luke.l...@gmail.com" <luke.leighton@gmail.com> writes:

> On Monday, August 22, 2022 at 3:18:19 AM UTC+1, MitchAlsup wrote:
>
>> At the time IEEE 754 was being done (the first version) there were
>> 2 kinds of FP users, those who wanted fast FP (zillions) and those
>> who wanted correct answers (5--yes 5, maybe 6).
>
> 3D GPUs as you know are still focussed on speed (and power)
> due to simple pragmatic issues that the screen resolutions simply
> don't need high-accuracy to work out where a pixel goes. i.e.: if the
> screen is 1280x800 you certainly don't need accuracy above
> ~14 bits, it is wasted silicon and power. this was why MIPS 3D ASE
> ops were designed with low-accuracy.

Say I have a 3D model that's 1000 feet long with 14-bit vertices. At
that precision the vertices will only be accurate to around 3/4 of an
inch. It'd be a jittery mess as you move and rotate it. The screen
resolution has nothing to do with it.
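
The arithmetic behind that figure can be sketched in C, assuming the 14 bits are a fixed-point coordinate spanning the whole 1000-foot (12000-inch) model. The function name `vertex_step` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest representable coordinate step when `extent` (in any length unit)
   is divided evenly into 2^bits fixed-point positions. */
double vertex_step(double extent, int bits) {
    assert(bits >= 0 && bits < 63);
    return extent / (double)(UINT64_C(1) << bits);
}
```

For example, `vertex_step(12000.0, 14)` gives 12000 / 16384, a little under 3/4 of an inch, regardless of how many pixels the screen has.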

Re: Power cost of IEEE754

<te10gg$2p8ng$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=27490&group=comp.arch#27490

 by: BGB - Mon, 22 Aug 2022 22:38 UTC

On 8/21/2022 8:45 PM, Paul A. Clayton wrote:
> Terje Mathisen wrote:
>> Stefan Monnier wrote:
>>>> 3) RVV or any other Cray-Style Vector ISA requiring strict compliance
>>>>       with IEEE754 FP accuracy will punish you with a 400% power/area
>>>>       penalty compared to modern 3D-optimised GPUs which, as explicitly
>>>
>>> What is the origin of this extra cost?  IOW, where's the meat of the
>>> savings (I mean: which part of the computation of the last few bits
>>> costs so much)?
>>> Does the saving vary significantly between instructions?
>>
>> I think that is pretty much bogus, Mitch has shown repeatedly that
>> having FMAC makes subnormal handling nearly free, as in zero cycles
>> and single-digit percentage gates/power. 400% is simply rubbish.
>
> I am guessing that double-rounding FMADD would not provide as much
> savings. Single-rounding is typically assumed, but I think the
> original implementation on MIPS did double rounding (perhaps to
> provide the same result as unfused FMUL and FADD).
>

Double rounding allows cutting off the low bits from the multiplier with
no one being able to see that you had cut off the low-order bits (within
a statistical probability of inexact rounding).

As I see it, a small but statistically non-zero probability of incorrect
rounding is small enough to be ignored for most practical applications.

With single rounding FMAC, things like "catastrophic cancellation" will
reveal the contents of these low order bits, whereas the double-rounded
version would mostly give zeroes here. Some algorithms make use of
single-rounding behavior, so this can't be entirely glossed over (if an
FMAC operator is provided).
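
The cancellation effect can be seen with a small C sketch. It assumes IEEE 754 binary32 `float` and binary64 `double` with no excess precision, and the function names are hypothetical. The "fused" version is emulated by keeping the product exact in `double` rather than by calling a hardware FMA, so it is only a stand-in for single rounding (exact for the inputs discussed below, not in general).

```c
/* a*b + c with the product rounded to float before the add (two roundings). */
float mac_unfused(float a, float b, float c) {
    /* volatile forces the product to be rounded and stored as a float,
       so the compiler cannot contract the expression into a real FMA. */
    volatile float p = a * b;
    return p + c;
}

/* a*b + c with the product kept exact in double (24+24 bits fit in 53).
   The double add may itself round in general, so this only emulates a
   fused operation; it is not a correctly rounded FMA for all inputs. */
float mac_fused_emulated(float a, float b, float c) {
    double p = (double)a * (double)b;
    return (float)(p + (double)c);
}
```

With a = b = 1 + 2^-12 and c set to minus the float-rounded product, the exact product is 1 + 2^-11 + 2^-24. The unfused version returns 0, while the emulated fused version returns 2^-24, which is exactly the part that the first rounding discarded.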

Though, one option (albeit not ideal for performance) would be to
have the compiler fall back to software emulation for cases where this
is needed.

Then again, it is a similar issue if things like FDIV and FSQRT are
provided by the runtime library rather than by the FPU itself.

As can be noted in my case:
  FADD, FSUB, FMUL: Baseline
  FDIV, FSQRT: Extension (Default=Software)
  FMAC: Optional Extension (Double Rounded Only)

The FMAC in this case being done by having the FMUL and FADD units able
to do a "combining mecha" thing, but still double rounded because FMUL
doesn't actually produce the low-order bits, nor would the FADD be wide
enough for this.

Conceivably, this sub-mode could be made to support subnormal numbers,
but I am not sure if they are enough of a use-case on their own (and
also the format-converters don't support subnormal numbers, as the cost
of supporting them in a converter is quite significant vs the "don't
bother" case).

The FDIV and FSQRT support is "they technically exist".
  The FDIV exists, but is not fast (goes through a Shift-Add unit);
    the Shift-Add unit does seem to give an accurate result.
  The FSQRT is both slow and not very accurate;
    doing it in software is "generally better" on both fronts.

In both cases (FDIV and FSQRT), the "fast but low precision" option is
also to fake them in software (with 1 or 2 Newton-Raphson stages).

> (I kind of wonder what the costs would have been to implement
> floating point multiplication as a high power-of-two bits
> truncated result (i.e., ignoring carry-in from lower bits). I tend
> to agree that the benefit of standardization limits different
> interfaces to specialized uses, but there may have been a better
> alternative. I also think that the probabilistic rounding that Nick
> Maclaren championed (a little) would have been interesting and
> reduced or exposed some problems with floating point.)
>

As noted, creating the high order bits, but discarding the low order
results (and any carry-in from them) is kinda how it works in my case.

I am able to pull off a double precision FMUL with 6x 18*18->36
multipliers (and some extra LUTs).

The "full solution" would effectively require 16 multipliers.
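
The idea of building only the high-order partial products can be shown with a scaled-down C sketch. It uses 8-bit limbs and 24-bit operands instead of 18-bit DSP multipliers, and keeps the 6 partial products on or above the diagonal of a 3x3 limb grid. The limb layout and the function names are assumptions for illustration, not a description of the actual FPU.

```c
#include <stdint.h>

/* Exact 24x24 -> 48 bit product, for reference. */
uint64_t mul_full(uint32_t a, uint32_t b) {
    return (uint64_t)a * (uint64_t)b;
}

/* Same product built from 8-bit limbs, but keeping only the partial
   products a_i*b_j with i+j >= 2 (6 of the 9 limb multiplies). The
   dropped low-order terms and their carries are simply ignored. */
uint64_t mul_truncated(uint32_t a, uint32_t b) {
    uint64_t sum = 0;
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            if (i + j < 2)
                continue;                      /* skip the low-order corner */
            uint64_t ai = (a >> (8 * i)) & 0xFF;
            uint64_t bj = (b >> (8 * j)) & 0xFF;
            sum += (ai * bj) << (8 * (i + j));
        }
    }
    return sum;
}
```

In this toy the truncated product is never larger than the exact one, and the shortfall stays below 2^26 out of a 48-bit result, so only the last couple of bits of a 24-bit rounded result can be disturbed.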

This was considered at one point, with some internal paper folding so
that it would function as one of:
  Provide full (widened) result for Binary64 multiply;
  Provide for an S.E15.F80 "Long Double"/FP96 multiply;
  Also provide a 64-bit integer multiply.

Idea was that it would also provide FMAC and sub-normal numbers for
Binary64.

This didn't really happen mostly as it would have been too expensive.

Did almost get a feature-cut version as well (mostly would have just
given S.E15.F80 without the other features), but it was still pretty
expensive (and not likely to be useful enough to justify its costs).

The idea of the multiplier was that some of the DSPs could move between
the low-order bits (for Int64 and Binary64), and the corners of the
triangle for the FP96 format (mostly by changing around some of the
inputs and outputs).

....

This falls into the "might have happened had I been using a bigger FPGA"
category...

Full Binary128 (in hardware) still wouldn't have happened, as this was
still a bit much, and can be handled "well enough" in software.

Though, yeah, as soon as one types:
   long double x, y, z;
   ...
   z=x*y;
One can look forwards to amazing(ly bad) performance...

Luckily, "long double" is itself pretty rare in practice. It was
originally being handled as an alias to "double", and thus Binary64, but
then I noted that since nothing I was running was using it, then
switching it out to software-emulated Binary128 didn't really hurt
anything (and, if one is using "long double", it generally means that
accuracy matters more than performance).

Re: Power cost of IEEE754

<te1440$2phc4$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=27492&group=comp.arch#27492

 by: BGB - Mon, 22 Aug 2022 23:39 UTC

On 8/22/2022 4:48 PM, Josh Vanderhoof wrote:
> "luke.l...@gmail.com" <luke.leighton@gmail.com> writes:
>
>> On Monday, August 22, 2022 at 3:18:19 AM UTC+1, MitchAlsup wrote:
>>
>>> At the time IEEE 754 was being done (the first version) there were
>>> 2 kinds of FP users, those who wanted fast FP (zillions) and those
>>> who wanted correct answers (5--yes 5, maybe 6).
>>
>> 3D GPUs as you know are still focussed on speed (and power)
>> due to simple pragmatic issues that the screen resolutions simply
>> don't need high-accuracy to work out where a pixel goes. i.e.: if the
>> screen is 1280x800 you certainly don't need accuracy above
>> ~14 bits, it is wasted silicon and power. this was why MIPS 3D ASE
>> ops were designed with low-accuracy.
>
> Say I have a 3d model that's 1000 feet long with 14 bit vertices. At
> that precision the vertices will only be accurate to around 3/4 of an
> inch. It'd be a jittery mess as you move and rotate it. The screen
> resolution has nothing to do with it.

For TKRA-GL, I am currently using a format with a 16-bit mantissa, and
it is generally sufficient for stuff like Quake and similar. Much beyond
this, who knows?...

Still fares better than Binary16, which if used in the projection stages
falls more solidly into "not sufficient" territory.

Binary16 gives a jittery mess even within normal Quake maps (~ 4096 inch
cube).

Ironically, a 16 bit mantissa seems to hold up pretty OK up until around
1 or 2 km from the origin (where jitter becomes fairly obvious), which
isn't too far off from what I see with normal consumer-grade graphics cards.

Whereas, if one does actually use full Binary32 precision for
projection, this sort of jitter doesn't seem to happen until one is
*significantly* further from the origin.

Can't help but feel a little suspicious here...

....
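
The distances quoted above can be compared with the spacing between adjacent floating-point values at a given magnitude. The C sketch below assumes that "16-bit mantissa" means 16 stored fraction bits, that Binary16 and Binary32 have 10 and 23 fraction bits, and that one map unit is one inch; the name `fp_spacing` is hypothetical.

```c
#include <assert.h>

/* Gap between adjacent normal floating-point values near x (x > 0)
   for a format with `frac_bits` stored fraction bits: 2^(e - frac_bits),
   where 2^e <= x < 2^(e+1). Uses only multiply/divide by 2. */
double fp_spacing(double x, int frac_bits) {
    assert(x > 0.0 && frac_bits >= 0);
    double p = 1.0;
    while (p * 2.0 <= x)      /* scale up to the power of two at or below x */
        p *= 2.0;
    while (p > x)             /* or scale down for x < 1 */
        p /= 2.0;
    for (int i = 0; i < frac_bits; i++)
        p /= 2.0;
    return p;
}
```

Under those assumptions, a coordinate near 3000 units has a 2-unit gap in Binary16, and a coordinate near 40000 inches (about 1 km) has a half-inch gap with 16 fraction bits but only 1/256 inch in Binary32.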

Re: Power cost of IEEE754

<te1nco$5ds$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=27494&group=comp.arch#27494

 by: Thomas Koenig - Tue, 23 Aug 2022 05:08 UTC

BGB <cr88192@gmail.com> schrieb:

> Double rounding allows cutting off the low bits from the multiplier with
> no one being able to see that you had cut off the low-order bits (within
> a statistical probability of inexact rounding).

You are making the same error that Intel made when trying to explain the
FDIV bug at the time: this is not statistical, it is deterministic.

Re: Power cost of IEEE754

<te20u2$2uhk1$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=27495&group=comp.arch#27495

 by: BGB - Tue, 23 Aug 2022 07:51 UTC

On 8/23/2022 12:08 AM, Thomas Koenig wrote:
> BGB <cr88192@gmail.com> schrieb:
>
>> Double rounding allows cutting off the low bits from the multiplier with
>> no one being able to see that you had cut off the low-order bits (within
>> a statistical probability of inexact rounding).
>
> You are making the same error that Intel made when trying to explain the
> FDIV bug at the time: this is not statistical, it is deterministic.

While the rounding is deterministic, there is only a certain chance (on
average) that it will affect the result.

Of the total number of floating point values that can exist, there is a
smaller (but still very large) number of values for which rounding would
be inexact.

But, for plain FADD and FMUL, inexactly rounded results are all one is
going to see from all the bits which fell off the bottom (with only a
few extra bits being needed here).
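
Both views (deterministic, yet affecting only a fraction of inputs) can be seen in a toy C sketch on integers. It rounds a value to the nearest even by dropping 8 bits in one step, and compares that with dropping 4 bits twice. This is the classic double-rounding effect on a made-up format, not a model of the FADD/FMUL units discussed here; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Drop the low `s` bits of x, rounding to nearest, ties to even. */
uint64_t round_ne(uint64_t x, int s) {
    assert(s >= 1 && s < 64);
    uint64_t q = x >> s;
    uint64_t rem = x & ((UINT64_C(1) << s) - 1);
    uint64_t half = UINT64_C(1) << (s - 1);
    if (rem > half || (rem == half && (q & 1)))
        q++;
    return q;
}

/* Count inputs in [0, 512) where rounding in two 4-bit steps gives a
   different answer than rounding by 8 bits in one step. */
int count_double_rounding_mismatches(void) {
    int n = 0;
    for (uint64_t x = 0; x < 512; x++)
        if (round_ne(round_ne(x, 4), 4) != round_ne(x, 8))
            n++;
    return n;
}
```

The mismatching inputs are always the same ones (for example 0x178 rounds to 1 directly but to 2 in two steps), and in this toy they make up 16 of the 512 cases.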

The FADD unit has a little more internally, but mostly this is because
the mantissa needs to be large enough to hold a full-width 64-bit
integer value in order to handle full-width conversion.

....

Re: Power cost of IEEE754

<te233t$1b3a$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=27496&group=comp.arch#27496

 by: Terje Mathisen - Tue, 23 Aug 2022 08:28 UTC

luke.l...@gmail.com wrote:
> On Monday, August 22, 2022 at 3:18:19 AM UTC+1, MitchAlsup wrote:
>
>> At the time IEEE 754 was being done (the first version) there were
>> 2 kinds of FP users, those who wanted fast FP (zillions) and those
>> who wanted correct answers (5--yes 5, maybe 6).
>
> 3D GPUs as you know are still focussed on speed (and power)
> due to simple pragmatic issues that the screen resolutions simply
> don't need high-accuracy to work out where a pixel goes. i.e.: if the
> screen is 1280x800 you certainly don't need accuracy above
> ~14 bits, it is wasted silicon and power. this was why MIPS 3D ASE
> ops were designed with low-accuracy.
>
> the *influence* of 3D GPU manufacturers although they are
> small in number is much higher (behind closed doors such
> as the Khronos Group) and so we just don't hear about
> it.
>
> Tom Forsyth on the other hand in his talk on Larrabee mentions
> that they concentrated on FP32, missed the goal of being a
> commercially-viable 3D GPU Card (performance and power)
> so attempted to target the scientific market instead, only to
> be told that the FP64 performance sucked and consequently
> they fell between two stools.

Funnily enough, for the only external architecture review for Larrabee
they invited a small bunch of games programmers, plus me who advocated
strongly for better HPC usability.

I did get a significantly larger TLB, so that it at least was able to
cover the caches, but not nearly as much double performance as I wanted.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Power cost of IEEE754

<te3aot$7ln$2@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=27497&group=comp.arch#27497

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-4e56-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Power cost of IEEE754
Date: Tue, 23 Aug 2022 19:45:33 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <te3aot$7ln$2@newsreader4.netcologne.de>
References: <t6gush$p5u$1@dont-email.me>
<0a9765d5-f885-4109-9ba8-69430513d05fn@googlegroups.com>
<jwvlerncc0p.fsf-monnier+comp.arch@gnu.org> <tdiql4$96b$1@gioia.aioe.org>
<tdun4o$2ifk6$1@dont-email.me> <te10gg$2p8ng$1@dont-email.me>
<te1nco$5ds$1@newsreader4.netcologne.de> <te20u2$2uhk1$1@dont-email.me>
Injection-Date: Tue, 23 Aug 2022 19:45:33 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-4e56-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:4e56:0:7285:c2ff:fe6c:992d";
logging-data="7863"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Tue, 23 Aug 2022 19:45 UTC

BGB <cr88192@gmail.com> schrieb:
> On 8/23/2022 12:08 AM, Thomas Koenig wrote:
>> BGB <cr88192@gmail.com> schrieb:
>>
>>> Double rounding allows cutting off the low bits from the multiplier with
>>> no one being able to see that you had cut off the low-order bits (within
>>> a statistical probability of inexact rounding).
>>
>> You are making the same error that Intel made when trying to explain the
>> FDIV bug at the time - this is not statistical, it is deterministic.
>
> While the rounding is deterministic, there is only a certain chance (on
> average) that it will affect the result.

If your input is in fact random, yes.

If your input is deterministic, no.
>
> Of the total number of floating point values that can exist, there is a
> smaller (but still very large) number of values for which rounding would
> be inexact.

FDIV affected around 1e-10 of all possible combinations, if German
Wikipedia is to be believed. I have a suspicion that Terje has
the data at his fingertips or in his drawer :-)

What is the fraction of wrong rounding in your calculations?

Re: Power cost of IEEE754

<24e69b46-34b2-434b-b7cf-c86e43bf8157n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27498&group=comp.arch#27498

X-Received: by 2002:a05:622a:1350:b0:31f:1cb2:926d with SMTP id w16-20020a05622a135000b0031f1cb2926dmr21728191qtk.279.1661285201091;
Tue, 23 Aug 2022 13:06:41 -0700 (PDT)
X-Received: by 2002:a05:620a:4482:b0:6bb:c315:9597 with SMTP id
x2-20020a05620a448200b006bbc3159597mr16477680qkp.423.1661285200934; Tue, 23
Aug 2022 13:06:40 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 23 Aug 2022 13:06:40 -0700 (PDT)
In-Reply-To: <te3aot$7ln$2@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:41ee:9be:4d72:21e8;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:41ee:9be:4d72:21e8
References: <t6gush$p5u$1@dont-email.me> <0a9765d5-f885-4109-9ba8-69430513d05fn@googlegroups.com>
<jwvlerncc0p.fsf-monnier+comp.arch@gnu.org> <tdiql4$96b$1@gioia.aioe.org>
<tdun4o$2ifk6$1@dont-email.me> <te10gg$2p8ng$1@dont-email.me>
<te1nco$5ds$1@newsreader4.netcologne.de> <te20u2$2uhk1$1@dont-email.me> <te3aot$7ln$2@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <24e69b46-34b2-434b-b7cf-c86e43bf8157n@googlegroups.com>
Subject: Re: Power cost of IEEE754
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 23 Aug 2022 20:06:41 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 3305
 by: MitchAlsup - Tue, 23 Aug 2022 20:06 UTC

On Tuesday, August 23, 2022 at 2:45:36 PM UTC-5, Thomas Koenig wrote:
> BGB <cr8...@gmail.com> schrieb:
> > On 8/23/2022 12:08 AM, Thomas Koenig wrote:
> >> BGB <cr8...@gmail.com> schrieb:
> >>
> >>> Double rounding allows cutting off the low bits from the multiplier with
> >>> no one being able to see that you had cut off the low-order bits (within
> >>> a statistical probability of inexact rounding).
> >>
> >> You are making the same error that Intel made when trying to explain the
> >> FDIV bug at the time - this is not statistical, it is deterministic.
> >
> > While the rounding is deterministic, there is only a certain chance (on
> > average) that it will affect the result.
> If your input is in fact random, yes.
>
> If your input is deterministic, no.
<
The difference is why Kahan won in the IEEE discussions (1984).
<
When testing an algorithm with inputs of known accuracy, you need to
get results of known accuracy out the other side.
<
When running an algorithm with actual physically derived input,
it does not matter.
> >
> > Of the total number of floating point values that can exist, there is a
> > smaller (but still very large) number of values for which rounding would
> > be inexact.
<
> FDIV affected around 1e-10 of all possible combinations, if German
> Wikipedia is to be believed. I have a suspicion that Terje has
> the data at his fingertips or in his drawer :-)
>
> What is the fraction of wrong rounding in your calculations?
<
CRAY-1 had a similar (to BGB) multiplier tree, with various parallelograms
used to construct the tree and various parallelograms missing. Numerical
analysts tore their hair out trying to figure out how much accuracy could
be delivered when using a CRAY-1 to compute various vector and matrix
algorithms. And there were cases where it actually did matter.

Re: Power cost of IEEE754

<te3ckl$ake$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=27499&group=comp.arch#27499

Path: i2pn2.org!i2pn.org!aioe.org!y0sttPrO1OAcON/g+jAtOw.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Power cost of IEEE754
Date: Tue, 23 Aug 2022 22:17:25 +0200
Organization: Aioe.org NNTP Server
Message-ID: <te3ckl$ake$1@gioia.aioe.org>
References: <t6gush$p5u$1@dont-email.me>
<0a9765d5-f885-4109-9ba8-69430513d05fn@googlegroups.com>
<jwvlerncc0p.fsf-monnier+comp.arch@gnu.org> <tdiql4$96b$1@gioia.aioe.org>
<tdun4o$2ifk6$1@dont-email.me> <te10gg$2p8ng$1@dont-email.me>
<te1nco$5ds$1@newsreader4.netcologne.de> <te20u2$2uhk1$1@dont-email.me>
<te3aot$7ln$2@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="10894"; posting-host="y0sttPrO1OAcON/g+jAtOw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.13
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Tue, 23 Aug 2022 20:17 UTC

Thomas Koenig wrote:
> FDIV affected around 1e-10 of all possible combinations, if German
> Wikipedia is to be believed. I have a suspicion that Terje has
> the data at his fingertips or in his drawer :-)

FDIV _could_ hit 5/1024 divisors, that was the basis for our sw
workaround. In reality far less would actually fail, but many numerical
algorithms produce numbers like N.9999... and these were much more
susceptible than random mantissas.
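
The 5/1024 figure can be illustrated with a small divisor check. The sketch below is not the workaround code referred to above. It assumes the commonly published description of the flaw, in which a divisor is at risk only when the first four fraction bits of its significand are 0001, 0100, 0111, 1010 or 1101 and the next six bits are all ones. The function names and the use of IEEE doubles are illustrative choices.

```python
import struct

# Assumed (from published analyses, not from this thread): the five risky
# 4-bit patterns that follow the implicit leading 1 of the divisor.
RISKY_PREFIXES = {0b0001, 0b0100, 0b0111, 0b1010, 0b1101}

def divisor_at_risk(x: float) -> bool:
    """True if a normal, finite double has one of the at-risk bit patterns."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    top10 = (bits >> 42) & 0x3FF          # top 10 stored fraction bits
    return (top10 >> 6) in RISKY_PREFIXES and (top10 & 0x3F) == 0x3F

def count_risky_prefixes() -> int:
    """Count how many of the 1024 possible 10-bit prefixes are at risk."""
    count = 0
    for prefix in range(1024):
        bits = (1023 << 52) | (prefix << 42)   # a value in [1.0, 2.0)
        x = struct.unpack("<d", struct.pack("<Q", bits))[0]
        count += divisor_at_risk(x)
    return count
```

Under these assumptions exactly 5 of the 1024 prefixes are flagged, and the divisor of the widely circulated 4195835/3145727 example is among them. A check like this only flags candidates; as noted above, far fewer divisions actually fail.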

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Power cost of IEEE754

<te3da9$7ln$3@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=27500&group=comp.arch#27500

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-4e56-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Power cost of IEEE754
Date: Tue, 23 Aug 2022 20:28:57 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <te3da9$7ln$3@newsreader4.netcologne.de>
References: <t6gush$p5u$1@dont-email.me>
<0a9765d5-f885-4109-9ba8-69430513d05fn@googlegroups.com>
<jwvlerncc0p.fsf-monnier+comp.arch@gnu.org> <tdiql4$96b$1@gioia.aioe.org>
<tdun4o$2ifk6$1@dont-email.me> <te10gg$2p8ng$1@dont-email.me>
<te1nco$5ds$1@newsreader4.netcologne.de> <te20u2$2uhk1$1@dont-email.me>
<te3aot$7ln$2@newsreader4.netcologne.de>
<24e69b46-34b2-434b-b7cf-c86e43bf8157n@googlegroups.com>
Injection-Date: Tue, 23 Aug 2022 20:28:57 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-4e56-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:4e56:0:7285:c2ff:fe6c:992d";
logging-data="7863"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Tue, 23 Aug 2022 20:28 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:

> CRAY-1 had a similar (to BGB) multiplier tree, with various parallelograms
> used to construct the tree and various parallelograms missing. Numerical
> analysts tore their hair out trying to figure out how much accuracy could
> be delivered when using a CRAY-1 to compute various vector and matrix
> algorithms. And there were cases where it actually did matter.

I've read that Seymour Cray would have built a computer where 2.0 + 2.0
did not equal 4.0 if it made it a little faster :-)

And yes, the accuracy can matter for Krylov subspace methods of
solving large sets of equations.

Re: Power cost of IEEE754

<ygnv8qi4qtd.fsf@y.z>

https://www.novabbs.com/devel/article-flat.php?id=27501&group=comp.arch#27501

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx18.iad.POSTED!not-for-mail
From: x...@y.z (Josh Vanderhoof)
Newsgroups: comp.arch
Subject: Re: Power cost of IEEE754
References: <t6gush$p5u$1@dont-email.me>
<0a9765d5-f885-4109-9ba8-69430513d05fn@googlegroups.com>
<jwvlerncc0p.fsf-monnier+comp.arch@gnu.org>
<tdiql4$96b$1@gioia.aioe.org> <tdun4o$2ifk6$1@dont-email.me>
<3888602b-9fcb-4c6b-9cb1-b3ae8d3c4fe7n@googlegroups.com>
<55c4f336-ba24-4812-a9cc-b313a62ee5d2n@googlegroups.com>
<ygntu64q7a5.fsf@y.z> <te1440$2phc4$1@dont-email.me>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)
Reply-To: Josh Vanderhoof <jlv@mxsimulator.com>
Message-ID: <ygnv8qi4qtd.fsf@y.z>
Cancel-Lock: sha1:V7aSDG6ncUtTQ63lQD37IfwVDiA=
MIME-Version: 1.0
Content-Type: text/plain
Lines: 55
X-Complaints-To: https://www.astraweb.com/aup
NNTP-Posting-Date: Tue, 23 Aug 2022 21:01:51 UTC
Date: Tue, 23 Aug 2022 17:01:50 -0400
X-Received-Bytes: 3260
 by: Josh Vanderhoof - Tue, 23 Aug 2022 21:01 UTC

BGB <cr88192@gmail.com> writes:

> On 8/22/2022 4:48 PM, Josh Vanderhoof wrote:
>> "luke.l...@gmail.com" <luke.leighton@gmail.com> writes:
>>
>>> On Monday, August 22, 2022 at 3:18:19 AM UTC+1, MitchAlsup wrote:
>>>
>>>> At the time IEEE754 was being done (the first version) there were
>>>> 2 kinds of FP users, those who wanted fast FP (zillions) and those
>>>> who wanted correct answers (5--yes 5, maybe 6).
>>>
>>> 3D GPUs as you know are still focussed on speed (and power)
>>> due to simple pragmatic issues that the screen resolutions simply
>>> don't need high-accuracy to work out where a pixel goes. i.e.: if the
>>> screen is 1280x800 you certainly don't need accuracy above
>>> ~14 bits, it is wasted silicon and power. this was why MIPS 3D ASE
>>> ops were designed with low-accuracy.
>>
>> Say I have a 3d model that's 1000 feet long with 14 bit vertices. At
>> that precision the vertices will only be accurate to around 3/4 of an
>> inch. It'd be a jittery mess as you move and rotate it. The screen
>> resolution has nothing to do with it.
>
>
> For TKRA-GL, I am currently using a format with a 16-bit mantissa, and
> it is generally sufficient for stuff like Quake and similar. Much
> beyond this, who knows?...
>
> Still fares better than Binary16, which if used in the projection
> stages falls more solidly into "not sufficient" territory.
>
> Binary16 gives a jittery mess even within normal Quake maps (~ 4096
> inch cube).
>
>
> Ironically, a 16 bit mantissa seems to hold up pretty OK up until
> around 1 or 2 km from the origin (where jitter becomes fairly
> obvious), which isn't too far off from what I see with normal
> consumer-grade graphics cards.
>
> Whereas, if one does actually use full Binary32 precision for
> projection, this sort of jitter doesn't seem to happen until one is
> *significantly* further from the origin.
>
> Can't help but feel a little suspicious here...
>
> ...

If you suspect they're cheating on the precision, I don't think that's
the case. I once wrote an extended precision webgl mandelbrot program
that would have definitely noticed any cheating on precision and it
worked fine. Definitely not as fast as I hoped though.

FWIW, OpenGL ES defaults to 32 bit floats for the vertex shaders and 16
bit for the fragment shaders.
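
The 3/4-inch figure quoted earlier in this post can be checked with a line of arithmetic. This is only a sketch: it assumes the 14 bits are spread uniformly over the model's 1000-foot extent, and the function name is made up for illustration.

```python
def grid_step(extent: float, bits: int) -> float:
    """Smallest position step when `extent` is divided into 2**bits steps."""
    return extent / (1 << bits)

# 1000 feet expressed in inches, quantized to 14 bits.
step_inches = grid_step(1000 * 12, 14)   # 12000 / 16384 = 0.732421875
```

The step size depends only on the model extent and the bit count; the screen resolution never enters the calculation.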

Re: Power cost of IEEE754

<te4kij$39041$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=27503&group=comp.arch#27503

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Power cost of IEEE754
Date: Wed, 24 Aug 2022 02:38:54 -0500
Organization: A noiseless patient Spider
Lines: 116
Message-ID: <te4kij$39041$1@dont-email.me>
References: <t6gush$p5u$1@dont-email.me>
<0a9765d5-f885-4109-9ba8-69430513d05fn@googlegroups.com>
<jwvlerncc0p.fsf-monnier+comp.arch@gnu.org> <tdiql4$96b$1@gioia.aioe.org>
<tdun4o$2ifk6$1@dont-email.me> <te10gg$2p8ng$1@dont-email.me>
<te1nco$5ds$1@newsreader4.netcologne.de> <te20u2$2uhk1$1@dont-email.me>
<te3aot$7ln$2@newsreader4.netcologne.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 24 Aug 2022 07:38:59 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="fcfb7a1198f013838ede4db08f25325c";
logging-data="3440769"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19+AGpWNqKZZFMJgI8luehQ"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.12.0
Cancel-Lock: sha1:lRgM7s0mkbOmP8ckfTMUjPGVsVQ=
Content-Language: en-US
In-Reply-To: <te3aot$7ln$2@newsreader4.netcologne.de>
 by: BGB - Wed, 24 Aug 2022 07:38 UTC

On 8/23/2022 2:45 PM, Thomas Koenig wrote:
> BGB <cr88192@gmail.com> schrieb:
>> On 8/23/2022 12:08 AM, Thomas Koenig wrote:
>>> BGB <cr88192@gmail.com> schrieb:
>>>
>>>> Double rounding allows cutting off the low bits from the multiplier with
>>>> no one being able to see that you had cut off the low-order bits (within
>>>> a statistical probability of inexact rounding).
>>>
>>> You are making the same error that Intel made when trying to explain the
>>> FDIV bug at the time - this is not statistical, it is deterministic.
>>
>> While the rounding is deterministic, there is only a certain chance (on
>> average) that it will affect the result.
>
> If your input is in fact random, yes.
>
> If your input is deterministic, no.
>>
>> Of the total number of floating point values that can exist, there is a
>> smaller (but still very large) number of values for which rounding would
>> be inexact.
>
> FDIV affected around 1e-10 of all possible combinations, if German
> Wikipedia is to be believed. I have a suspicion that Terje has
> the data at his fingertips or in his drawer :-)
>

From what I read, the FDIV bug would have a chance to basically break
everything below around the high 4 decimal digits of the result.

Luckily, the situation seems nowhere near that bad in this case.

> What is the fraction of wrong rounding in your calculations?

For FMUL, around 1 in 4096 results should be incorrectly rounded (from
sub-ULP bits), and around 1 in 256 due to the limited carry propagation.

Double to Single also has a 1 in 256 chance of a rounding error (mostly
due to limited carry propagation).

So, say:
0x4035AAAAAAAAAAAF.F55 => 0x4035AAAAAAAAAAB0 (OK)
0x4035AAAAAAAAABFF.F55 => 0x4035AAAAAAAAABFF (hits 8-bit limit)

If the carry limit is hit, the value is left as-is (round towards zero),
since otherwise, say:
0x4035AAAAAAAAABFF.F55 => 0x4035AAAAAAAAAB00
Is worse than it would have been otherwise.

In cases where rounding is needed, long chains of ones seem to be
uncommon though.
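
The carry-limited rounding described above can be sketched roughly as follows. This is an illustration, not the actual hardware logic: it assumes 12 extra bits below the kept value (the three hex digits after the point in the examples), round-half-up, and an 8-bit incrementer. The function and parameter names are made up.

```python
def round_carry_limited(kept: int, extra: int,
                        extra_bits: int = 12, carry_bits: int = 8) -> int:
    """Round `kept` up by one if `extra` is at least half an ULP, unless
    the carry would have to propagate past the low `carry_bits` bits."""
    half = 1 << (extra_bits - 1)
    if extra < half:
        return kept                  # below half an ULP: truncate
    low_mask = (1 << carry_bits) - 1
    if kept & low_mask == low_mask:
        return kept                  # carry limit hit: leave as-is
    return kept + 1                  # carry stays inside the low bits
```

With these assumptions, `round_carry_limited(0x4035AAAAAAAAAAAF, 0xF55)` gives `0x4035AAAAAAAAAAB0`, and `round_carry_limited(0x4035AAAAAAAAABFF, 0xF55)` returns its input unchanged. Exactly one of the 256 possible low-byte patterns blocks the increment, which is where a figure of 1 in 256 would come from if the low bits were uniformly distributed.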

Unless I am significantly wrong about something, it shouldn't be much
bigger than this.

The internal adders use full carry propagation in this case.

In the main FPU, Single precision operations are computed using Double,
with the value being converted to Single when it is stored to memory
(implying a similar carry-propagation-limited rounding conversion)

For FADD, the situation is fairly similar.

Note that for integer values less than around 2^52, FMUL and FADD will
produce exact results.

FADD internally uses twos complement logic for the mantissa, so does not
depend on the final rounding step to give correct results here.

This differs slightly from the low-precision FPU, which uses bitwise NOT
for negation. This is cheaper, and the result is on-average closer to
the non-truncated answer than had one used twos-complement, but makes it
basically incapable of doing integer math and giving an integer result
(like, 5.0 - 3.0 => 1.99997, yeah...).
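
Why bitwise NOT gives a result one step short of the integer can be shown with plain integer arithmetic. The sketch below is not the FPU's datapath: the 24-bit adder width and the 15 fraction bits at the aligned exponent are assumptions picked for illustration, and the helper names are hypothetical.

```python
def sub_twos(a: int, b: int, width: int) -> int:
    """a - b using twos-complement negation (NOT, then add 1)."""
    mask = (1 << width) - 1
    return (a + ((~b + 1) & mask)) & mask

def sub_ones(a: int, b: int, width: int) -> int:
    """a - b using bitwise NOT only; the result comes out one unit low."""
    mask = (1 << width) - 1
    return (a + (~b & mask)) & mask

FRAC_BITS = 15          # assumed fraction bits after exponent alignment
WIDTH = 24              # assumed adder width

five = 5 << FRAC_BITS
three = 3 << FRAC_BITS
exact = sub_twos(five, three, WIDTH) / (1 << FRAC_BITS)    # 2.0
approx = sub_ones(five, three, WIDTH) / (1 << FRAC_BITS)   # 2 - 2**-15
```

With these assumed widths the NOT-only result is 2 - 2^-15, about 1.99997. The exact shortfall depends on how many fraction bits the real unit carries, which the post does not state.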

The latter is seemingly still good enough for things like vertex
transformation or projection, and is faster (3 cycles vs 10 cycles), so
it is a win for this case.

There is a hardware FDIV, which seems to be accurate, but it is fairly
slow, since a Shift-Add divider is not exactly speedy.

The software divider uses 6 stage Newton-Raphson, but I have run into an
issue where N-R is seemingly unable to get the low 4 bits or so of the
result right in any reasonable time frame (hitting the exact answer there
seems to be more a matter of chance).

Fast divider functions generally use 1 or 2 N-R stages (often
sufficient, though not particularly accurate).
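
For readers unfamiliar with the technique, a Newton-Raphson reciprocal divider can be sketched as below. This is not the software divider from the post: the linear seed 48/17 - 32/17 * m is a commonly used starting estimate, the stage count is a parameter, and the sketch assumes a positive, finite divisor.

```python
import math

def nr_reciprocal(d: float, stages: int) -> float:
    """Approximate 1/d for d > 0 with `stages` Newton-Raphson steps."""
    m, e = math.frexp(d)                  # d = m * 2**e with 0.5 <= m < 1
    x = 48.0 / 17.0 - (32.0 / 17.0) * m   # linear seed, error at most 1/17
    for _ in range(stages):
        x = x * (2.0 - m * x)             # each step squares the error
    return math.ldexp(x, -e)              # undo the scaling

def nr_divide(a: float, d: float, stages: int) -> float:
    """Approximate a / d as a * (1/d)."""
    return a * nr_reciprocal(d, stages)
```

Because the relative error roughly squares at each step, one or two stages already give a few decimal digits, while further stages quickly reach the limit of the working precision. Plain N-R carried out in the same precision as the result is generally not guaranteed to produce the correctly rounded last bits without a final correction step.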

For example, TKRA-GL uses mostly 1 and 2 N-R stages for the dividers in
the vertex transform process.

Note that the dividers used internally for span-drawing are generally
using multiply-by-reciprocal and lookup tables (this part being fixed
point).
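
A fixed-point multiply-by-reciprocal with a lookup table might look like the sketch below. The table size, the 16-bit shift and the names are assumptions for illustration; the tables actually used in TKRA-GL are not shown in this thread.

```python
SHIFT = 16

# RECIP[n] holds floor(2**16 / n); entry 0 is a placeholder and never used.
RECIP = [0] + [(1 << SHIFT) // n for n in range(1, 256)]

def div_by_table(x: int, n: int) -> int:
    """Approximate x // n for 0 <= x < 65536 and 1 <= n < 256
    using one multiply and one shift instead of a divide."""
    return (x * RECIP[n]) >> SHIFT
```

Because the table entries are rounded down, the result is either `x // n` or one less, which is often acceptable for stepping across a span but would not do where exact quotients are needed.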

How I will deal with floating-point shader attributes is unclear, one
idea though is to fudge things slightly (flip the sign bits), and then
pretend the floating-point values are fixed-point (a little wonky, but
generally works well enough).

....

Re: Power cost of IEEE754

<te4vbb$2m5$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=27505&group=comp.arch#27505

Path: i2pn2.org!i2pn.org!aioe.org!y0sttPrO1OAcON/g+jAtOw.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Power cost of IEEE754
Date: Wed, 24 Aug 2022 12:42:57 +0200
Organization: Aioe.org NNTP Server
Message-ID: <te4vbb$2m5$1@gioia.aioe.org>
References: <t6gush$p5u$1@dont-email.me>
<0a9765d5-f885-4109-9ba8-69430513d05fn@googlegroups.com>
<jwvlerncc0p.fsf-monnier+comp.arch@gnu.org> <tdiql4$96b$1@gioia.aioe.org>
<tdun4o$2ifk6$1@dont-email.me> <te10gg$2p8ng$1@dont-email.me>
<te1nco$5ds$1@newsreader4.netcologne.de> <te20u2$2uhk1$1@dont-email.me>
<te3aot$7ln$2@newsreader4.netcologne.de> <te4kij$39041$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="2757"; posting-host="y0sttPrO1OAcON/g+jAtOw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.13
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Wed, 24 Aug 2022 10:42 UTC

BGB wrote:
> On 8/23/2022 2:45 PM, Thomas Koenig wrote:
>> FDIV affected around 1e-10 of all possible combinations, if German
>> Wikipedia is to be believed.  I have a suspicion that Terje has
>> the data at his fingertips or in his drawer :-)
>>
>
> From what I read, the FDIV bug would have a chance to basically break
> everything below around the high 4 decimal digits of the result.

Tim Coe determined (on paper, with no PC himself!) that two 7-digit
integer values taken as FP would trigger a maximally bad situation: In
those cases only the 7 first mantissa bits would be OK, the error would
start on the 8th bit!

I.e. you could only really depend on getting the first couple of decimal
digits more or less correct.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Power cost of IEEE754

<te5he9$3bsn3$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=27506&group=comp.arch#27506

Path: i2pn2.org!i2pn.org!news.swapon.de!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Power cost of IEEE754
Date: Wed, 24 Aug 2022 10:51:32 -0500
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <te5he9$3bsn3$1@dont-email.me>
References: <t6gush$p5u$1@dont-email.me>
<0a9765d5-f885-4109-9ba8-69430513d05fn@googlegroups.com>
<jwvlerncc0p.fsf-monnier+comp.arch@gnu.org> <tdiql4$96b$1@gioia.aioe.org>
<tdun4o$2ifk6$1@dont-email.me> <te10gg$2p8ng$1@dont-email.me>
<te1nco$5ds$1@newsreader4.netcologne.de> <te20u2$2uhk1$1@dont-email.me>
<te3aot$7ln$2@newsreader4.netcologne.de> <te4kij$39041$1@dont-email.me>
<te4vbb$2m5$1@gioia.aioe.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 24 Aug 2022 15:51:37 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="fcfb7a1198f013838ede4db08f25325c";
logging-data="3535587"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+zG48KNc5jLHQTNDmXN/o1"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.12.0
Cancel-Lock: sha1:Me5T+f7mwa6+pZI6y4NE6yYnKXk=
In-Reply-To: <te4vbb$2m5$1@gioia.aioe.org>
Content-Language: en-US
 by: BGB - Wed, 24 Aug 2022 15:51 UTC

On 8/24/2022 5:42 AM, Terje Mathisen wrote:
> BGB wrote:
>> On 8/23/2022 2:45 PM, Thomas Koenig wrote:
>>> FDIV affected around 1e-10 of all possible combinations, if German
>>> Wikipedia is to be believed.  I have a suspicion that Terje has
>>> the data at his fingertips or in his drawer :-)
>>>
>>
>>  From what I read, the FDIV bug would have a chance to basically break
>> everything below around the high 4 decimal digits of the result.
>
> Tim Coe determined (on paper, with no PC himself!) that two 7-digit
> integer values taken as FP would trigger a maximally bad situation: In
> those cases only the 7 first mantissa bits would be OK, the error would
> start on the 8th bit!
>
> I.e. you could only really depend on getting the first couple of decimal
> digits more or less correct.
>

Yeah, this seems just straight up broken...

> Terje
>

Re: Power cost of IEEE754

<te78ae$3jl51$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=27509&group=comp.arch#27509

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Power cost of IEEE754
Date: Thu, 25 Aug 2022 02:28:09 -0500
Organization: A noiseless patient Spider
Lines: 77
Message-ID: <te78ae$3jl51$1@dont-email.me>
References: <t6gush$p5u$1@dont-email.me>
<0a9765d5-f885-4109-9ba8-69430513d05fn@googlegroups.com>
<jwvlerncc0p.fsf-monnier+comp.arch@gnu.org> <tdiql4$96b$1@gioia.aioe.org>
<tdun4o$2ifk6$1@dont-email.me>
<3888602b-9fcb-4c6b-9cb1-b3ae8d3c4fe7n@googlegroups.com>
<55c4f336-ba24-4812-a9cc-b313a62ee5d2n@googlegroups.com>
<ygntu64q7a5.fsf@y.z> <te1440$2phc4$1@dont-email.me> <ygnv8qi4qtd.fsf@y.z>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 25 Aug 2022 07:28:14 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="ac064cd8dbd67b1d9fbe849e61d446fe";
logging-data="3789985"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+EtrER5SP9ec8dqoXu3Rxr"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.12.0
Cancel-Lock: sha1:oUuqO9AqG3ouwHb5lhhdlfDNy8U=
Content-Language: en-US
In-Reply-To: <ygnv8qi4qtd.fsf@y.z>
 by: BGB - Thu, 25 Aug 2022 07:28 UTC

On 8/23/2022 4:01 PM, Josh Vanderhoof wrote:
> BGB <cr88192@gmail.com> writes:
>
>> On 8/22/2022 4:48 PM, Josh Vanderhoof wrote:
>>> "luke.l...@gmail.com" <luke.leighton@gmail.com> writes:
>>>
>>>> On Monday, August 22, 2022 at 3:18:19 AM UTC+1, MitchAlsup wrote:
>>>>
>>>>> At the time IEEE754 was being done (the first version) there were
>>>>> 2 kinds of FP users, those who wanted fast FP (zillions) and those
>>>>> who wanted correct answers (5--yes 5, maybe 6).
>>>>
>>>> 3D GPUs as you know are still focussed on speed (and power)
>>>> due to simple pragmatic issues that the screen resolutions simply
>>>> don't need high-accuracy to work out where a pixel goes. i.e.: if the
>>>> screen is 1280x800 you certainly don't need accuracy above
>>>> ~14 bits, it is wasted silicon and power. this was why MIPS 3D ASE
>>>> ops were designed with low-accuracy.
>>>
>>> Say I have a 3d model that's 1000 feet long with 14 bit vertices. At
>>> that precision the vertices will only be accurate to around 3/4 of an
>>> inch. It'd be a jittery mess as you move and rotate it. The screen
>>> resolution has nothing to do with it.
>>
>>
>> For TKRA-GL, I am currently using a format with a 16-bit mantissa, and
>> it is generally sufficient for stuff like Quake and similar. Much
>> beyond this, who knows?...
>>
>> Still fares better than Binary16, which if used in the projection
>> stages falls more solidly into "not sufficient" territory.
>>
>> Binary16 gives a jittery mess even within normal Quake maps (~ 4096
>> inch cube).
>>
>>
>> Ironically, a 16 bit mantissa seems to hold up pretty OK up until
>> around 1 or 2 km from the origin (where jitter becomes fairly
>> obvious), which isn't too far off from what I see with normal
>> consumer-grade graphics cards.
>>
>> Whereas, if one does actually use full Binary32 precision for
>> projection, this sort of jitter doesn't seem to happen until one is
>> *significantly* further from the origin.
>>
>> Can't help but feel a little suspicious here...
>>
>> ...
>
> If you suspect they're cheating on the precision, I don't think that's
> the case. I once wrote an extended precision webgl mandelbrot program
> that would have definitely noticed any cheating on precision and it
> worked fine. Definitely not as fast as I hoped though.
>
> FWIW, OpenGL ES defaults to 32 bit floats for the vertex shaders and 16
> bit for the fragment shaders.

Goes and starts looking stuff up, ...

It seems I lack any good explanation for the effect.
The cards I was using should have full Binary32 here.

But, this still leaves the jitter as something of a mystery, as in
theory (and based on some of my own informal testing) it shouldn't
become an issue until a significantly larger distance from the origin.

Otherwise, based on my testing, truncating the low 7 bits off a
Binary32 would be expected to cause jitter significantly closer to
the origin.

Basically, Quake should be a jittery mess if the pattern held.

Though, Quake *does* turn into a jittery mess if one tries to do the
vertex transformation and projection and similar using Binary16...
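
The expected relationship between precision and distance from the origin can be estimated from the spacing of representable values. The sketch below is a back-of-envelope aid, not a model of any particular GPU: it ignores exponent range and subnormals, treats map units as inches as in the post, and assumes the 16-bit mantissa means 16 stored fraction bits.

```python
import math

def ulp_at(x: float, fraction_bits: int) -> float:
    """Spacing between adjacent representable values near x (x != 0)
    for a binary float with `fraction_bits` stored fraction bits."""
    _, e = math.frexp(abs(x))            # abs(x) = m * 2**e, 0.5 <= m < 1
    return math.ldexp(1.0, e - 1 - fraction_bits)

INCHES_PER_KM = 39370.0

half_at_map_edge = ulp_at(2048.0, 10)            # Binary16: 10 fraction bits
custom_at_1km = ulp_at(INCHES_PER_KM, 16)        # assumed 16 fraction bits
single_at_1km = ulp_at(INCHES_PER_KM, 23)        # Binary32: 23 fraction bits
```

Under these assumptions Binary16 steps in 2-inch increments near the edge of a 4096-inch map, the 16-bit format steps in half-inch increments at 1 km, and Binary32 steps in 1/256-inch increments at the same distance.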

Re: Power cost of IEEE754

<ygnpmgonjas.fsf@y.z>

https://www.novabbs.com/devel/article-flat.php?id=27510&group=comp.arch#27510

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx09.iad.POSTED!not-for-mail
From: x...@y.z (Josh Vanderhoof)
Newsgroups: comp.arch
Subject: Re: Power cost of IEEE754
References: <t6gush$p5u$1@dont-email.me>
<0a9765d5-f885-4109-9ba8-69430513d05fn@googlegroups.com>
<jwvlerncc0p.fsf-monnier+comp.arch@gnu.org>
<tdiql4$96b$1@gioia.aioe.org> <tdun4o$2ifk6$1@dont-email.me>
<3888602b-9fcb-4c6b-9cb1-b3ae8d3c4fe7n@googlegroups.com>
<55c4f336-ba24-4812-a9cc-b313a62ee5d2n@googlegroups.com>
<ygntu64q7a5.fsf@y.z> <te1440$2phc4$1@dont-email.me>
<ygnv8qi4qtd.fsf@y.z> <te78ae$3jl51$1@dont-email.me>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)
Reply-To: Josh Vanderhoof <jlv@mxsimulator.com>
Message-ID: <ygnpmgonjas.fsf@y.z>
Cancel-Lock: sha1:n+F5p7Vk9AZIkKwx9q9UKJ6nO7s=
MIME-Version: 1.0
Content-Type: text/plain
Lines: 85
X-Complaints-To: https://www.astraweb.com/aup
NNTP-Posting-Date: Thu, 25 Aug 2022 20:46:04 UTC
Date: Thu, 25 Aug 2022 16:46:03 -0400
X-Received-Bytes: 4602
 by: Josh Vanderhoof - Thu, 25 Aug 2022 20:46 UTC

BGB <cr88192@gmail.com> writes:

> On 8/23/2022 4:01 PM, Josh Vanderhoof wrote:
>> BGB <cr88192@gmail.com> writes:
>>
>>> On 8/22/2022 4:48 PM, Josh Vanderhoof wrote:
>>>> "luke.l...@gmail.com" <luke.leighton@gmail.com> writes:
>>>>
>>>>> On Monday, August 22, 2022 at 3:18:19 AM UTC+1, MitchAlsup wrote:
>>>>>
>>>>>> At the time IEEE754 was being done (the first version) there were
>>>>>> 2 kinds of FP users, those who wanted fast FP (zillions) and those
>>>>>> who wanted correct answers (5--yes 5, maybe 6).
>>>>>
>>>>> 3D GPUs as you know are still focussed on speed (and power)
>>>>> due to simple pragmatic issues that the screen resolutions simply
>>>>> don't need high-accuracy to work out where a pixel goes. i.e.: if the
>>>>> screen is 1280x800 you certainly don't need accuracy above
>>>>> ~14 bits, it is wasted silicon and power. this was why MIPS 3D ASE
>>>>> ops were designed with low-accuracy.
>>>>
>>>> Say I have a 3d model that's 1000 feet long with 14 bit vertices. At
>>>> that precision the vertices will only be accurate to around 3/4 of an
>>>> inch. It'd be a jittery mess as you move and rotate it. The screen
>>>> resolution has nothing to do with it.
>>>
>>>
>>> For TKRA-GL, I am currently using a format with a 16-bit mantissa, and
>>> it is generally sufficient for stuff like Quake and similar. Much
>>> beyond this, who knows?...
>>>
>>> Still fares better than Binary16, which if used in the projection
>>> stages falls more solidly into "not sufficient" territory.
>>>
>>> Binary16 gives a jittery mess even within normal Quake maps (~ 4096
>>> inch cube).
>>>
>>>
>>> Ironically, a 16 bit mantissa seems to hold up pretty OK up until
>>> around 1 or 2 km from the origin (where jitter becomes fairly
>>> obvious), which isn't too far off from what I see with normal
>>> consumer-grade graphics cards.
>>>
>>> Whereas, if one does actually use full Binary32 precision for
>>> projection, this sort of jitter doesn't seem to happen until one is
>>> *significantly* further from the origin.
>>>
>>> Can't help but feel a little suspicious here...
>>>
>>> ...
>>
>> If you suspect they're cheating on the precision, I don't think that's
>> the case. I once wrote an extended precision webgl mandelbrot program
>> that would have definitely noticed any cheating on precision and it
>> worked fine. Definitely not as fast as I hoped though.
>>
>> FWIW, OpenGL ES defaults to 32 bit floats for the vertex shaders and 16
>> bit for the fragment shaders.
>
>
> Goes and starts looking stuff up, ...
>
> It seems I lack any good explanation for the effect.
> The cards I was using should have full Binary32 here.
>
>
> But, this still leaves the jitter as something of a mystery, as in
> theory (and based on some of my own informal testing) it shouldn't
> become an issue until a significantly larger distance from the origin.
>
>
> Or, otherwise, I would expect in my testing that truncating the low 7
> bits off a Binary32 would be expected to cause jitter significantly
> closer to the origin.
>
> Basically, Quake should be a jittery mess if the pattern held.
>
> Though, Quake *does* turn into a jittery mess if one tries to do the
> vertex transformation and projection and similar using Binary16...

Could be the Quake world is in world coordinates so it only has the
camera translation and rotation to lose precision on, whereas the
jittery program is made of objects that are translated and rotated into
place in addition to the camera transformation. That would require more
precision to work correctly.

Re: Power cost of IEEE754

<te94mm$3p2tt$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=27511&group=comp.arch#27511

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Power cost of IEEE754
Date: Thu, 25 Aug 2022 19:38:40 -0500
Organization: A noiseless patient Spider
Lines: 215
Message-ID: <te94mm$3p2tt$1@dont-email.me>
References: <t6gush$p5u$1@dont-email.me>
<0a9765d5-f885-4109-9ba8-69430513d05fn@googlegroups.com>
<jwvlerncc0p.fsf-monnier+comp.arch@gnu.org> <tdiql4$96b$1@gioia.aioe.org>
<tdun4o$2ifk6$1@dont-email.me>
<3888602b-9fcb-4c6b-9cb1-b3ae8d3c4fe7n@googlegroups.com>
<55c4f336-ba24-4812-a9cc-b313a62ee5d2n@googlegroups.com>
<ygntu64q7a5.fsf@y.z> <te1440$2phc4$1@dont-email.me> <ygnv8qi4qtd.fsf@y.z>
<te78ae$3jl51$1@dont-email.me> <ygnpmgonjas.fsf@y.z>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 26 Aug 2022 00:38:46 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="391e3646bfde77bc0adbb39aa3dadcdf";
logging-data="3967933"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/DrXbajDuyvs5KOw98IA0o"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.12.0
Cancel-Lock: sha1:pYnvhdTGpgX9HyydxAQpIrV397Q=
In-Reply-To: <ygnpmgonjas.fsf@y.z>
Content-Language: en-US
 by: BGB - Fri, 26 Aug 2022 00:38 UTC

On 8/25/2022 3:46 PM, Josh Vanderhoof wrote:
> BGB <cr88192@gmail.com> writes:
>
>> On 8/23/2022 4:01 PM, Josh Vanderhoof wrote:
>>> BGB <cr88192@gmail.com> writes:
>>>
>>>> On 8/22/2022 4:48 PM, Josh Vanderhoof wrote:
>>>>> "luke.l...@gmail.com" <luke.leighton@gmail.com> writes:
>>>>>
>>>>>> On Monday, August 22, 2022 at 3:18:19 AM UTC+1, MitchAlsup wrote:
>>>>>>
>>>>>>> At the time IEE754 was being done (the first version) there were
>>>>>>> 2 kinds of FP users, those who wanted fast FP (zillions) and those
>>>>>>> who wanted correct answers (5--yes 5, maybe 6).
>>>>>>
>>>>>> 3D GPUs as you know are still focussed on speed (and power)
>>>>>> due to simple pragmatic issues that the screen resolutions simply
>>>>>> don't need high-accuracy to work out where a pixel goes. i.e.: if the
>>>>>> screen is 1280x800 you certainly don't need accuracy above
>>>>>> ~14 bits, it is wasted silicon and power. this was why MIPS 3D ASE
>>>>>> ops were designed with low-accuracy.
>>>>>
>>>>> Say I have a 3d model that's 1000 feet long with 14 bit vertices. At
>>>>> that precision the vertices will only be accurate to around 3/4 of an
>>>>> inch. It'd be a jittery mess as you move and rotate it. The screen
>>>>> resolution has nothing to do with it.
>>>>
>>>>
>>>> For TKRA-GL, I am currently using a format with a 16-bit mantissa, and
>>>> it is generally sufficient for stuff like Quake and similar. Much
>>>> beyond this, who knows?...
>>>>
>>>> Still fares better than Binary16, which if used in the projection
>>>> stages falls more solidly into "not sufficient" territory.
>>>>
>>>> Binary16 gives a jittery mess even within normal Quake maps (~ 4096
>>>> inch cube).
>>>>
>>>>
>>>> Ironically, a 16 bit mantissa seems to hold up pretty OK up until
>>>> around 1 or 2 km from the origin (where jitter becomes fairly
>>>> obvious), which isn't too far off from what I see with normal
>>>> consumer-grade graphics cards.
>>>>
>>>> Whereas, if one does actually use full Binary32 precision for
>>>> projection, this sort of jitter doesn't seem to happen until one is
>>>> *significantly* further from the origin.
>>>>
>>>> Can't help but feel a little suspicious here...
>>>>
>>>> ...
>>>
>>> If you suspect they're cheating on the precision, I don't think that's
>>> the case. I once wrote an extended precision webgl mandelbrot program
>>> that would have definitely noticed any cheating on precision and it
>>> worked fine. Definitely not as fast as I hoped though.
>>>
>>> FWIW, OpenGL ES defaults to 32 bit floats for the vertex shaders and 16
>>> bit for the fragment shaders.
>>
>>
>> Goes and starts looking stuff up, ...
>>
>> It seems I lack any good explanation for the effect.
>> The cards I was using should have full Binary32 here.
>>
>>
>> But, this still leaves the jitter as something of a mystery, as in
>> theory (and based on some of my own informal testing) it shouldn't
>> become an issue until a significantly larger distance from the origin.
>>
>>
>> Or, otherwise, I would expect in my testing that truncating the low 7
>> bits off a Binary32 would be expected to cause jitter significantly
>> closer to the origin.
>>
>> Basically, Quake should be a jittery mess if the pattern held.
>>
>> Though, Quake *does* turn into a jittery mess if one tries to do the
>> vertex transformation and projection and similar using Binary16...
>
> Could be the Quake world is in world coordinates so it only has the
> camera translation and rotation to lose precision on, whereas the
> jittery program is made of objects that are translated and rotated into
> place in addition to the camera transformation. That would require more
> precision to work correctly.

Could be.

In my searching, I did run into a paper which was talking about how it
is better (from a numerical precision POV) to handle Modelview and
Projection as two separate matrix multiplies (rather than a combined
matrix), and to handle both world displacement and camera orientation
via Modelview (leaving Projection merely to handle frustum projection
and similar).

On one hand, this does imply that my 3D engines (which had typically
handled the camera by translating and rotating the Projection matrix)
are probably not ideal regarding numerical precision.

Quake does basically the same thing though, using Modelview mostly to
position entities within the scene rather than deal with camera positioning.

But, this does still leave the issue that, if Binary32 precision were
the issue here, then presumably using a combined matrix with all the
values having the low 7 bits chopped off, should not give acceptable
precision.
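For a rough sense of scale, here is a small sketch of what chopping the
low 7 bits off a Binary32 does to the spacing of representable values.
It assumes coordinates in inch-like units with 32 units per meter and
1024 meters per km (the engine scale described further down), so 2 km
is 65536 units. The helper names are made up for illustration, and this
only looks at a single stored coordinate, not at error accumulated
through a matrix multiply.

```python
import math
import struct

def truncate_low_bits(x, bits=7):
    """Store x as Binary32, then zero the low `bits` mantissa bits."""
    (raw,) = struct.unpack("<I", struct.pack("<f", x))
    raw &= 0xFFFFFFFF ^ ((1 << bits) - 1)
    (y,) = struct.unpack("<f", struct.pack("<I", raw))
    return y

def spacing(x, dropped_bits=0):
    """Gap between neighbouring values near x > 0 (normal range only)."""
    _, e = math.frexp(x)  # x = m * 2**e with 0.5 <= m < 1
    return 2.0 ** (e - 24 + dropped_bits)

two_km = 2 * 1024 * 32                  # 65536 inch-like units (assumed scale)
print(spacing(two_km))                  # 0.0078125 (full Binary32)
print(spacing(two_km, 7))               # 1.0 (low 7 bits dropped)
print(truncate_low_bits(two_km + 0.5))  # 65536.0, the half unit is lost
```

Under these assumptions, full Binary32 still resolves 1/128 of a unit
at 2 km, whereas the truncated format only resolves whole units there.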

Though, I guess one difference is that mainline OpenGL uses
perspective-correct texturing whereas TKRA-GL uses affine texturing and
dynamic tessellation. It is possible that the visible jitter could be
due to running into precision issues involving the (S,T)/W step or
similar (which is N/A in my case).

Also in TKRA-GL, most stuff beyond the front-end transform stage is
being handled using fixed-point math (mostly 16.16 and similar).

As can be noted, my normal workaround in my 3D engines for the "camera
distance from the origin" issue was effectively that chunks and similar
would mostly use region-local coordinates, with the chunks being
translated relative to the region's origin within the local coordinate
system.

So, in this case, the camera would move around within a +/- 512 meter
box, and trying to leave this box would effectively cause the local
origin to jump over by 1km or so.

With "km" being defined as 1024 "meters", and there being 32 "inches"
per meter; with accuracy to real-life being subject to interpretation
(though, this scaling is also pretty similar to the "inch" unit used in
Quake and friends).
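The origin-jump idea can be sketched per axis roughly as follows. This
is only an illustration of the scheme as described, not the engine's
code: the function name and the choice of a half-open [-512, 512) box
are assumptions.

```python
import math

METERS_PER_KM = 1024          # "km" as defined in the post
HALF_BOX = METERS_PER_KM / 2  # camera stays within +/- 512 meters

def rebase(origin_km, local_m):
    """Keep local_m inside [-512, 512) by moving the origin in 1 km steps.

    origin_km: integer position of the local origin, in "km".
    local_m:   camera coordinate relative to that origin, in "meters".
    """
    shift = math.floor((local_m + HALF_BOX) / METERS_PER_KM)
    return origin_km + shift, local_m - shift * METERS_PER_KM

print(rebase(0, 600.0))   # (1, -424.0): left the box, origin jumps by 1 km
print(rebase(3, -100.0))  # (3, -100.0): still inside, nothing changes
```

The point of the jump is that the values fed to the GPU stay small, so
the float spacing near the camera stays fine no matter how far the
camera is from the true world origin.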

In my last few engines, apart from the OpenGL facing parts, many of the
coordinates within the world were expressed using 32-bit fixed point.

In the BT2 engine (BGBTech2):
20.12 relative to a "meter" unit.
25.7 relative to an "inch" unit.
Likewise:
16.16 relative to a "chunk" unit.
12.20 relative to a "region" unit.

In the BT3 engine, it is adjusted slightly:
16.16 relative to a "meter" unit.
21.11 relative to an "inch" unit.
And:
12.20 relative to a "chunk" unit.
9.23 relative to a "region" unit.

In both engines, the fixed-point coordinates were interpreted as modulo,
so the world will effectively wrap over at the edges.

Where BT2 used a 1024x1024km world size, and BT3 is 64x64km.
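As a sanity check on the numbers, a 16.16 value in "meters" wraps at
65536 meters, which is 64 "km" at 1024 meters per km, matching the BT3
world size. A minimal sketch of the BT3-style layout follows. The
function names are hypothetical, and the 16-meter chunk and 128-meter
region sizes are only inferred from the 12.20 and 9.23 formats, not
stated in the post.

```python
FRAC_BITS = 16           # 16.16 relative to a "meter"
WORD_MASK = 0xFFFFFFFF   # 32-bit coordinates, interpreted modulo 2**32

def to_fixed(meters):
    """Convert meters to 16.16 fixed point, wrapping at the world edge."""
    return int(round(meters * (1 << FRAC_BITS))) & WORD_MASK

def to_meters(fx):
    """Convert a 16.16 value back to meters in [0, 65536)."""
    return fx / (1 << FRAC_BITS)

def chunk_index(fx):
    """Integer part when the same bits are read as 12.20 (chunk units)."""
    return fx >> 20

def region_index(fx):
    """Integer part when the same bits are read as 9.23 (region units)."""
    return fx >> 23

print(to_fixed(65536.0))              # 0: one full world width wraps around
print(to_meters(to_fixed(-1.0)))      # 65535.0: just inside the far edge
print(chunk_index(to_fixed(16.0)))    # 1
print(region_index(to_fixed(128.0)))  # 1
```

The same 32 bits serve all four unit systems; only the position of the
binary point changes, so converting between them is free.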

In BT2, entity coordinates were stored using a hacked extended precision
floating point vector format.

In BT3, entity coordinates are also stored in 32-bit fixed-point.

The BT3 engine uses a smaller region size than BT2 (8x8x8 chunks rather
than 16x16x16).

The region format is similar, though differing in a few minor ways:
BT2 used an entropy-coded LZ scheme for storing chunks;
And would also compress the region as a whole with another LZ stage.
BT3 uses my RP2 scheme for storing chunks.
No region-level compression is used.

BT3 regions added a bitmap to track which blocks are air or non-air,
which is used for speeding up raycast operations (BT3's renderer being
raycast based rather than based around using per-chunk vertex arrays).

In the BT2 engine, pretty much any chunks within the view radius would
be turned into vertex arrays, which would then be thrown at the GPU.

The raycast approach means that only blocks within line of sight of the
camera will be drawn (things outside the line of sight are implicitly
hidden). This approach works well for small draw distances, but doesn't
scale very well to large draw distances when using an actual GPU.
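A toy version of the air/non-air bitmap and a ray test against it might
look like the following. Everything here is an assumption made for
illustration: the class and function names, the one-bit-per-block
layout, and the fixed-step march, which is much cruder than a proper
grid traversal (a small step can still clip past a block corner).

```python
import math

class AirBitmap:
    """One bit per block in a size**3 grid: 1 = non-air, 0 = air."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size ** 3 + 7) // 8)

    def _index(self, x, y, z):
        return (z * self.size + y) * self.size + x

    def set_solid(self, x, y, z):
        i = self._index(x, y, z)
        self.bits[i >> 3] |= 1 << (i & 7)

    def is_solid(self, x, y, z):
        s = self.size
        if not (0 <= x < s and 0 <= y < s and 0 <= z < s):
            return False  # treat everything outside the grid as air
        i = self._index(x, y, z)
        return bool(self.bits[i >> 3] & (1 << (i & 7)))

def first_solid(bitmap, origin, direction, max_dist, step=0.25):
    """Sample along the ray and return the first non-air block, or None."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    t = 0.0
    while t <= max_dist:
        cell = (math.floor(ox + dx * t),
                math.floor(oy + dy * t),
                math.floor(oz + dz * t))
        if bitmap.is_solid(*cell):
            return cell
        t += step
    return None

grid = AirBitmap(16)
grid.set_solid(5, 0, 0)
print(first_solid(grid, (0.5, 0.5, 0.5), (1, 0, 0), 10.0))  # (5, 0, 0)
print(first_solid(grid, (0.5, 0.5, 0.5), (0, 1, 0), 10.0))  # None
```

Since only a bit test is needed to skip air, the march never has to
decode the chunk data until it actually hits something.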

However, it is a little friendlier to software rasterization and slow GPUs.

Some other things are similar, for example, both use an indexed encoding
for blocks, where, say (unique blocks per chunk):
1: Chunk is effectively skipped;
2.. 16: Chunk uses 4-bit block indices;
17..255: Chunk uses 8-bit block indices.
Else: We need 32 bits per block.
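The index-width selection above could be sketched like this. The
function names and the palette representation are hypothetical; only
the thresholds come from the list.

```python
def index_bits(unique_blocks):
    """Bits per block index for a chunk with this many unique blocks."""
    if unique_blocks <= 1:
        return 0    # uniform chunk: nothing to store per block
    if unique_blocks <= 16:
        return 4
    if unique_blocks <= 255:
        return 8
    return 32       # too many unique blocks: store full 32-bit blocks

def encode_chunk(blocks):
    """Return (bits, palette, indices) for a flat list of block words."""
    palette = sorted(set(blocks))
    bits = index_bits(len(palette))
    if bits == 32:
        return bits, None, list(blocks)  # no palette, raw blocks
    if bits == 0:
        return bits, palette, []         # single entry covers the chunk
    lookup = {block: i for i, block in enumerate(palette)}
    return bits, palette, [lookup[b] for b in blocks]

print(encode_chunk([7, 7, 7, 7]))  # (0, [7], [])
print(encode_chunk([7, 9, 7, 9]))  # (4, [7, 9], [0, 1, 0, 1])
```

The indices are left as a plain list here; packing them into 4-bit or
8-bit fields is a separate step.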


Re: Power cost of IEEE754

<ygnsfli634w.fsf@y.z>

https://www.novabbs.com/devel/article-flat.php?id=27512&group=comp.arch#27512

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx11.iad.POSTED!not-for-mail
From: x...@y.z (Josh Vanderhoof)
Newsgroups: comp.arch
Subject: Re: Power cost of IEEE754
References: <t6gush$p5u$1@dont-email.me>
<0a9765d5-f885-4109-9ba8-69430513d05fn@googlegroups.com>
<jwvlerncc0p.fsf-monnier+comp.arch@gnu.org>
<tdiql4$96b$1@gioia.aioe.org> <tdun4o$2ifk6$1@dont-email.me>
<3888602b-9fcb-4c6b-9cb1-b3ae8d3c4fe7n@googlegroups.com>
<55c4f336-ba24-4812-a9cc-b313a62ee5d2n@googlegroups.com>
<ygntu64q7a5.fsf@y.z> <te1440$2phc4$1@dont-email.me>
<ygnv8qi4qtd.fsf@y.z> <te78ae$3jl51$1@dont-email.me>
<ygnpmgonjas.fsf@y.z> <te94mm$3p2tt$1@dont-email.me>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)
Reply-To: Josh Vanderhoof <jlv@mxsimulator.com>
Message-ID: <ygnsfli634w.fsf@y.z>
Cancel-Lock: sha1:iJHDNnSOLMVSC5EPfpkFTGe7pAE=
MIME-Version: 1.0
Content-Type: text/plain
Lines: 225
X-Complaints-To: https://www.astraweb.com/aup
NNTP-Posting-Date: Fri, 26 Aug 2022 22:39:28 UTC
Date: Fri, 26 Aug 2022 18:39:27 -0400
X-Received-Bytes: 10261
 by: Josh Vanderhoof - Fri, 26 Aug 2022 22:39 UTC

BGB <cr88192@gmail.com> writes:

> On 8/25/2022 3:46 PM, Josh Vanderhoof wrote:
>> BGB <cr88192@gmail.com> writes:
>>
>>> On 8/23/2022 4:01 PM, Josh Vanderhoof wrote:
>>>> BGB <cr88192@gmail.com> writes:
>>>>
>>>>> On 8/22/2022 4:48 PM, Josh Vanderhoof wrote:
>>>>>> "luke.l...@gmail.com" <luke.leighton@gmail.com> writes:
>>>>>>
>>>>>>> On Monday, August 22, 2022 at 3:18:19 AM UTC+1, MitchAlsup wrote:
>>>>>>>
>>>>>>>> At the time IEE754 was being done (the first version) there were
>>>>>>>> 2 kinds of FP users, those who wanted fast FP (zillions) and those
>>>>>>>> who wanted correct answers (5--yes 5, maybe 6).
>>>>>>>
>>>>>>> 3D GPUs as you know are still focussed on speed (and power)
>>>>>>> due to simple pragmatic issues that the screen resolutions simply
>>>>>>> don't need high-accuracy to work out where a pixel goes. i.e.: if the
>>>>>>> screen is 1280x800 you certainly don't need accuracy above
>>>>>>> ~14 bits, it is wasted silicon and power. this was why MIPS 3D ASE
>>>>>>> ops were designed with low-accuracy.
>>>>>>
>>>>>> Say I have a 3d model that's 1000 feet long with 14 bit vertices. At
>>>>>> that precision the vertices will only be accurate to around 3/4 of an
>>>>>> inch. It'd be a jittery mess as you move and rotate it. The screen
>>>>>> resolution has nothing to do with it.
>>>>>
>>>>>
>>>>> For TKRA-GL, I am currently using a format with a 16-bit mantissa, and
>>>>> it is generally sufficient for stuff like Quake and similar. Much
>>>>> beyond this, who knows?...
>>>>>
>>>>> Still fares better than Binary16, which if used in the projection
>>>>> stages falls more solidly into "not sufficient" territory.
>>>>>
>>>>> Binary16 gives a jittery mess even within normal Quake maps (~ 4096
>>>>> inch cube).
>>>>>
>>>>>
>>>>> Ironically, a 16 bit mantissa seems to hold up pretty OK up until
>>>>> around 1 or 2 km from the origin (where jitter becomes fairly
>>>>> obvious), which isn't too far off from what I see with normal
>>>>> consumer-grade graphics cards.
>>>>>
>>>>> Whereas, if one does actually use full Binary32 precision for
>>>>> projection, this sort of jitter doesn't seem to happen until one is
>>>>> *significantly* further from the origin.
>>>>>
>>>>> Can't help but feel a little suspicious here...
>>>>>
>>>>> ...
>>>>
>>>> If you suspect they're cheating on the precision, I don't think that's
>>>> the case. I once wrote an extended precision webgl mandelbrot program
>>>> that would have definitely noticed any cheating on precision and it
>>>> worked fine. Definitely not as fast as I hoped though.
>>>>
>>>> FWIW, OpenGL ES defaults to 32 bit floats for the vertex shaders and 16
>>>> bit for the fragment shaders.
>>>
>>>
>>> Goes and starts looking stuff up, ...
>>>
>>> It seems I lack any good explanation for the effect.
>>> The cards I was using should have full Binary32 here.
>>>
>>>
>>> But, this still leaves the jitter as something of a mystery, as in
>>> theory (and based on some of my own informal testing) it shouldn't
>>> become an issue until a significantly larger distance from the origin.
>>>
>>>
>>> Or, otherwise, I would expect in my testing that truncating the low 7
>>> bits off a Binary32 would be expected to cause jitter significantly
>>> closer to the origin.
>>>
>>> Basically, Quake should be a jittery mess if the pattern held.
>>>
>>> Though, Quake *does* turn into a jittery mess if one tries to do the
>>> vertex transformation and projection and similar using Binary16...
>>
>> Could be the Quake world is in world coordinates so it only has the
>> camera translation and rotation to lose precision on, whereas the
>> jittery program is made of objects that are translated and rotated into
>> place in addition to the camera transformation. That would require more
>> precision to work correctly.
>
> Could be.
>
>
> In my searching, I did run into a paper which was talking about how it
> is better (from a numerical precision POV) to handle Modelview and
> Projection as two separate matrix multiplies (rather than a combined
> matrix), and to handle both world displacement and camera orientation
> via Modelview (leaving Projection) merely to handle frustum projection
> and similar.
>
> On one hand, this does imply that my 3D engines (which had typically
> handled camera by translating and rotating the Projection matrix), are
> probably not ideal regarding numerical precision.
>
> Quake does basically the same thing though, using Modelview mostly to
> position entities within the scene rather than deal with camera
> positioning.

Another reason is the eye-space coords are useful for lighting.

> But, this does still leave the issue that, if Binary32 precision were
> the issue here, then presumably using a combined matrix with all the
> values having the low 7 bits chopped off, should not give acceptable
> precision.
>
>
> Though, I guess one difference is that mainline OpenGL uses
> perspective-correct texturing whereas TKRA-GL uses affine texturing
> and dynamic tessellation. It is possible that the visible jitter could
> be due to running into precision issues involving the (S,T)/W step or
> similar (which is N/A in my case).

Affine might warp more but it shouldn't be any more jittery. Is it
possible you have a bug in your rasteriser stepping to the first pixel
center? If you're truncating texture coords it'll definitely jitter.
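To make the pixel-center point concrete, here is a minimal sketch of a
sub-pixel prestep for one span with affine interpolation. The function
name and the coverage rule (a pixel is drawn when its center is at or
right of the span's left edge) are assumptions, not anyone's actual
rasteriser.

```python
import math

def prestep(x_left, u_left, du_dx):
    """Return (first pixel, texture coord sampled at that pixel's center).

    x_left: exact (fractional) x where the span starts.
    u_left: texture coordinate at x_left.
    du_dx:  change in texture coordinate per pixel along the span.
    """
    first_px = math.ceil(x_left - 0.5)  # first pixel whose center >= x_left
    center = first_px + 0.5
    return first_px, u_left + (center - x_left) * du_dx

print(prestep(10.25, 0.0, 2.0))  # (10, 0.5)
print(prestep(10.75, 0.0, 2.0))  # (11, 1.5)
```

Without the (center - x_left) correction the texture would be sampled
as if the edge sat exactly on the pixel center, so it would shift by up
to a pixel's worth of du_dx as the edge moves, which shows up as
crawling or jitter.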

>
> Also in TKRA-GL, most stuff beyond the front-end transform stage is
> being handled using fixed-point math (mostly 16.16 and similar).
>
>
>
> As can be noted, my normal workaround in my 3D engines for the "camera
> distance from the origin" issue was effectively that chunks and
> similar would mostly use region-local coordinates, with the chunks
> being translated relative to the region's origin within the local
> coordinate system.
>
> So, in this case, the camera would move around within a +/- 512 meter
> box, and trying to leave this box would effectively cause the local
> origin to jump over by 1km or so.
>
> With "km" being defined as 1024 "meters", and there being 32 "inches"
> per meter; with accuracy to real-life being subject to interpretation
> (though, this scaling is also pretty similar to the "inch" unit used
> in Quake and friends).
>
>
> In my last few engines, apart from the OpenGL facing parts, many of
> the coordinates within the world were expressed using 32-bit fixed
> point.
>
> In the BT2 engine (BGBTech2):
> 20.12 relative to a "meter" unit.
> 25.7 relative to an "inch" unit.
> Likewise:
> 16.16 relative to a "chunk" unit.
> 12.20 relative to a "region" unit.
>
>
> In the BT3 engine, it is adjusted slightly:
> 16.16 relative to a "meter" unit.
> 21.11 relative to an "inch" unit.
> And:
> 12.20 relative to a "chunk" unit.
> 9.23 relative to a "region" unit.
>
> In both engines, the fixed-point coordinates were interpreted as
> modulo, so the world will effectively wrap over at the edges.
>
> Where BT2 used a 1024x1024km world size, and BT3 is 64x64km.
>
>
> In BT2, entity coordinates were stored using a hacked extended
> precision floating point vector format.
>
> In BT3, entity coordinates are also stored in 32-bit fixed-point.
>
> The BT3 engine uses a smaller region size than BT2 (8x8x8 chunks
> rather than 16x16x16).
>
> Region is similar, though differing in a few minor ways:
> BT2 used an entropy-coded LZ scheme for storing chunks;
> And would also compress the region as a whole with another LZ stage.
> BT3 uses my RP2 scheme for storing chunks.
> No region-level compression is used.
>
>
> BT3 regions added a bitmap to track which blocks are air or non-air,
> which is used for speeding up raycast operations (BT3's renderer being
> raycast based rather than based around using per-chunk vertex arrays).
>
> In the BT2 engine, pretty much any chunks within the view radius would
> be turned into vertex arrays, which would then be thrown at the GPU.
>
>
> The raycast approach means that only blocks within line of sight of
> the camera will be drawn (things outside the line of sight are
> implicitly hidden). This approach works well for small draw distances,
> but doesn't scale very well to large draw distances when using an
> actual GPU.
>
> However, it is a little friendlier to software rasterization and slow GPUs.
>
>
> Some other things are similar, for example, both use an indexed
> encoding for blocks, where, say (unique blocks per chunk):
> 1: Chunk is effectively skipped;
> 2.. 16: Chunk uses 4-bit block indices;
> 17..255: Chunk uses 8-bit block indices.
> Else: We need 32 bits per block.
>
> With a block format sort of like:
> (31:24): Edge Occlusion
> (23:20): Sky Light Level (Transparent)
> (19:16): Block Light Level (Transparent)
> (15:12): Block Light Color (Transparent)
> (11: 8): Block Attribute
> ( 7: 0): Block Type
>
>
> Neither engine gave "particularly interesting" gameplay though (it
> would be a pretty long road to catch up to something like Minecraft
> here).
>
> ...
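
The 32-bit block format quoted above can be packed and unpacked with
plain shifts and masks. A sketch, with hypothetical field names derived
from the quoted list:

```python
# (name, low bit, width), taken from the quoted bit ranges.
FIELDS = [
    ("edge_occlusion", 24, 8),  # (31:24)
    ("sky_light",      20, 4),  # (23:20)
    ("block_light",    16, 4),  # (19:16)
    ("light_color",    12, 4),  # (15:12)
    ("attribute",       8, 4),  # (11: 8)
    ("block_type",      0, 8),  # ( 7: 0)
]

def pack_block(**values):
    """Build a 32-bit block word; fields not given default to 0."""
    word = 0
    for name, shift, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} out of range: {value}")
        word |= value << shift
    return word

def unpack_block(word):
    """Split a 32-bit block word back into its named fields."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, shift, width in FIELDS}

word = pack_block(block_type=3, sky_light=15)
print(hex(word))                        # 0xf00003
print(unpack_block(word)["sky_light"])  # 15
```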


Re: Power cost of IEEE754

<tec3o9$7aq2$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=27514&group=comp.arch#27514

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Power cost of IEEE754
Date: Fri, 26 Aug 2022 22:40:50 -0500
Organization: A noiseless patient Spider
Lines: 358
Message-ID: <tec3o9$7aq2$1@dont-email.me>
References: <t6gush$p5u$1@dont-email.me>
<0a9765d5-f885-4109-9ba8-69430513d05fn@googlegroups.com>
<jwvlerncc0p.fsf-monnier+comp.arch@gnu.org> <tdiql4$96b$1@gioia.aioe.org>
<tdun4o$2ifk6$1@dont-email.me>
<3888602b-9fcb-4c6b-9cb1-b3ae8d3c4fe7n@googlegroups.com>
<55c4f336-ba24-4812-a9cc-b313a62ee5d2n@googlegroups.com>
<ygntu64q7a5.fsf@y.z> <te1440$2phc4$1@dont-email.me> <ygnv8qi4qtd.fsf@y.z>
<te78ae$3jl51$1@dont-email.me> <ygnpmgonjas.fsf@y.z>
<te94mm$3p2tt$1@dont-email.me> <ygnsfli634w.fsf@y.z>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 27 Aug 2022 03:40:57 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="42ae949326d302f94f7afdc71142f0e7";
logging-data="240450"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19BqVt4A8P4m5cERjTsgXwt"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.12.0
Cancel-Lock: sha1:jcseCtdtquUP3DOM4W+TdMRxDRE=
In-Reply-To: <ygnsfli634w.fsf@y.z>
Content-Language: en-US
 by: BGB - Sat, 27 Aug 2022 03:40 UTC

On 8/26/2022 5:39 PM, Josh Vanderhoof wrote:
> BGB <cr88192@gmail.com> writes:
>
>> On 8/25/2022 3:46 PM, Josh Vanderhoof wrote:
>>> BGB <cr88192@gmail.com> writes:
>>>
>>>> On 8/23/2022 4:01 PM, Josh Vanderhoof wrote:
>>>>> BGB <cr88192@gmail.com> writes:
>>>>>
>>>>>> On 8/22/2022 4:48 PM, Josh Vanderhoof wrote:
>>>>>>> "luke.l...@gmail.com" <luke.leighton@gmail.com> writes:
>>>>>>>
>>>>>>>> On Monday, August 22, 2022 at 3:18:19 AM UTC+1, MitchAlsup wrote:
>>>>>>>>
>>>>>>>>> At the time IEE754 was being done (the first version) there were
>>>>>>>>> 2 kinds of FP users, those who wanted fast FP (zillions) and those
>>>>>>>>> who wanted correct answers (5--yes 5, maybe 6).
>>>>>>>>
>>>>>>>> 3D GPUs as you know are still focussed on speed (and power)
>>>>>>>> due to simple pragmatic issues that the screen resolutions simply
>>>>>>>> don't need high-accuracy to work out where a pixel goes. i.e.: if the
>>>>>>>> screen is 1280x800 you certainly don't need accuracy above
>>>>>>>> ~14 bits, it is wasted silicon and power. this was why MIPS 3D ASE
>>>>>>>> ops were designed with low-accuracy.
>>>>>>>
>>>>>>> Say I have a 3d model that's 1000 feet long with 14 bit vertices. At
>>>>>>> that precision the vertices will only be accurate to around 3/4 of an
>>>>>>> inch. It'd be a jittery mess as you move and rotate it. The screen
>>>>>>> resolution has nothing to do with it.
>>>>>>
>>>>>>
>>>>>> For TKRA-GL, I am currently using a format with a 16-bit mantissa, and
>>>>>> it is generally sufficient for stuff like Quake and similar. Much
>>>>>> beyond this, who knows?...
>>>>>>
>>>>>> Still fares better than Binary16, which if used in the projection
>>>>>> stages falls more solidly into "not sufficient" territory.
>>>>>>
>>>>>> Binary16 gives a jittery mess even within normal Quake maps (~ 4096
>>>>>> inch cube).
>>>>>>
>>>>>>
>>>>>> Ironically, a 16 bit mantissa seems to hold up pretty OK up until
>>>>>> around 1 or 2 km from the origin (where jitter becomes fairly
>>>>>> obvious), which isn't too far off from what I see with normal
>>>>>> consumer-grade graphics cards.
>>>>>>
>>>>>> Whereas, if one does actually use full Binary32 precision for
>>>>>> projection, this sort of jitter doesn't seem to happen until one is
>>>>>> *significantly* further from the origin.
>>>>>>
>>>>>> Can't help but feel a little suspicious here...
>>>>>>
>>>>>> ...
>>>>>
>>>>> If you suspect they're cheating on the precision, I don't think that's
>>>>> the case. I once wrote an extended precision webgl mandelbrot program
>>>>> that would have definitely noticed any cheating on precision and it
>>>>> worked fine. Definitely not as fast as I hoped though.
>>>>>
>>>>> FWIW, OpenGL ES defaults to 32 bit floats for the vertex shaders and 16
>>>>> bit for the fragment shaders.
>>>>
>>>>
>>>> Goes and starts looking stuff up, ...
>>>>
>>>> It seems I lack any good explanation for the effect.
>>>> The cards I was using should have full Binary32 here.
>>>>
>>>>
>>>> But, this still leaves the jitter as something of a mystery, as in
>>>> theory (and based on some of my own informal testing) it shouldn't
>>>> become an issue until a significantly larger distance from the origin.
>>>>
>>>>
>>>> Or, otherwise, I would expect in my testing that truncating the low 7
>>>> bits off a Binary32 would be expected to cause jitter significantly
>>>> closer to the origin.
>>>>
>>>> Basically, Quake should be a jittery mess if the pattern held.
>>>>
>>>> Though, Quake *does* turn into a jittery mess if one tries to do the
>>>> vertex transformation and projection and similar using Binary16...
>>>
>>> Could be the Quake world is in world coordinates so it only has the
>>> camera translation and rotation to lose precision on, whereas the
>>> jittery program is made of objects that are translated and rotated into
>>> place in addition to the camera transformation. That would require more
>>> precision to work correctly.
>>
>> Could be.
>>
>>
>> In my searching, I did run into a paper which was talking about how it
>> is better (from a numerical precision POV) to handle Modelview and
>> Projection as two separate matrix multiplies (rather than a combined
>> matrix), and to handle both world displacement and camera orientation
>> via Modelview (leaving Projection) merely to handle frustum projection
>> and similar.
>>
>> On one hand, this does imply that my 3D engines (which had typically
>> handled camera by translating and rotating the Projection matrix), are
>> probably not ideal regarding numerical precision.
>>
>> Quake does basically the same thing though, using Modelview mostly to
>> position entities within the scene rather than deal with camera
>> positioning.
>
> Another reason is the eye-space coords are useful for lighting.
>

I tried to add an implementation of the OpenGL lighting stuff, not
really complete or tested as of yet (would need a test case).

My BT2 and BT3 engines were using precomputed lighting values via the
vertex colors.

My first 3D engine (BT1) had used dynamic lighting via fragment shaders,
and stencil shadows (Depth Pass; was initially Depth Fail but changed it
because apparently someone got a patent on Depth Fail).

At the time, the engine suffered from poor performance, and the only
real way to make it faster was to fall back to a simpler rendering
strategy (more like that used in my later engines).

Originally, the design of the BT1 engine started out as trying to do a
mock-up of Doom 3 (and early on fell well short of usable framerates).

Part of the problem was things like needing to limit the amount of
geometry considered for the lighting and shadowing to only that which
fell within the light radius of the light-source in question; and then
issues like how to deal with a giant "sun" light-source (which would
affect pretty much all the geometry in the world at the same time).

Then switched to using multiple strategies, mostly pre-computing the
effects of static light sources (and using vertex lighting), using
stencil shadows mostly for movable entities, and with dynamic light
sources for moving objects.

Had also experimented some with shadow-maps and similar as well, ...

For later engines, I didn't bother, and just did everything with vertex
lighting and a single big rendering pass (not really bothering with
dynamic light sources or shadows).

Also, single-pass vertex lighting works well on old/weak GPUs (~ 20
years old), as well as for software rendering.

>> But, this does still leave the issue that, if Binary32 precision were
>> the issue here, then presumably using a combined matrix with all the
>> values having the low 7 bits chopped off, should not give acceptable
>> precision.
>>
>>
>> Though, I guess one difference is that mainline OpenGL uses
>> perspective-correct texturing whereas TKRA-GL uses affine texturing
>> and dynamic tessellation. It is possible that the visible jitter could
>> be due to running into precision issues involving the (S,T)/W step or
>> similar (which is N/A in my case).
>
> Affine might warp more but it shouldn't be any more jittery. Is it
> possible you have a bug in your rasteriser stepping to the first pixel
> center? If you're truncating texture coords it'll definitely jitter.
>

I was mostly seeing the 2km jitter issue with OpenGL on actual GPUs
(such as, "GeForce GTX 260" or "Mobility Radeon R200" and similar).

The R200 (in a WinXP-era laptop) also has the other funky behavior that
for BC3 (DXT5), it cares about endpoint ordering in a similar way to BC1
(DXT1), so with the endpoints in the alpha-transparency ordering it will
draw black texels rather than interpolated texels.


Re: Power cost of IEEE754

<ygnler9e1hh.fsf@y.z>

https://www.novabbs.com/devel/article-flat.php?id=27527&group=comp.arch#27527

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx14.iad.POSTED!not-for-mail
From: x...@y.z (Josh Vanderhoof)
Newsgroups: comp.arch
Subject: Re: Power cost of IEEE754
References: <t6gush$p5u$1@dont-email.me>
<0a9765d5-f885-4109-9ba8-69430513d05fn@googlegroups.com>
<jwvlerncc0p.fsf-monnier+comp.arch@gnu.org>
<tdiql4$96b$1@gioia.aioe.org> <tdun4o$2ifk6$1@dont-email.me>
<3888602b-9fcb-4c6b-9cb1-b3ae8d3c4fe7n@googlegroups.com>
<55c4f336-ba24-4812-a9cc-b313a62ee5d2n@googlegroups.com>
<ygntu64q7a5.fsf@y.z> <te1440$2phc4$1@dont-email.me>
<ygnv8qi4qtd.fsf@y.z> <te78ae$3jl51$1@dont-email.me>
<ygnpmgonjas.fsf@y.z> <te94mm$3p2tt$1@dont-email.me>
<ygnsfli634w.fsf@y.z> <tec3o9$7aq2$1@dont-email.me>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)
Reply-To: Josh Vanderhoof <jlv@mxsimulator.com>
Message-ID: <ygnler9e1hh.fsf@y.z>
Cancel-Lock: sha1:HfDOqptFuw2r/z9rs+ffcsOb704=
MIME-Version: 1.0
Content-Type: text/plain
Lines: 99
X-Complaints-To: https://www.astraweb.com/aup
NNTP-Posting-Date: Sat, 27 Aug 2022 23:00:10 UTC
Date: Sat, 27 Aug 2022 19:00:10 -0400
X-Received-Bytes: 5543
 by: Josh Vanderhoof - Sat, 27 Aug 2022 23:00 UTC

BGB <cr88192@gmail.com> writes:

> I was mostly seeing the 2km jitter issue with OpenGL on actual GPUs
> (such as, "GeForce GTX 260" or "Mobility Radeon R200" and similar).
>
>
> The R200 (in a WinXP-era laptop) also has the other funky behavior
> that for BC3(DXT5), it cares about endpoint ordering in a similar way
> to BC1 (DXT1), so with the endpoints in the alpha-transparency
> ordering it will draw black texels rather than interpolated texels.
>
> Also seems to have a few other limitations:
> Shaders can't contain loops or branches;
> Only a single texel fetch per bound texture;
> Can only bind 4 textures at a time;
> ...
>
> Trying to violate any of this, shader compilation fails and/or the
> geometry renders as a flat white square.
>
>
> So, one of these cards is rated for ~ DirectX 10 / OpenGL 3.3, the
> other for DirectX 8 and OpenGL 1.4 (though the drivers seem to support
> OpenGL 2.1).
>
> Then there is the Intel GMA (in another laptop; came with Vista),
> where trying to use shaders just sort of immediately turns things into
> a slide-show. Also claims OpenGL 2.1 (but, can be noted that most non
> 1.x functionality is basically unusable, and there was a check-box in
> the driver configuration tool that seemingly switched it between
> reporting itself as GL 1.4 or 2.1).
>
> Runs fixed-function pretty OK though; can run Quake 3 and Half-Life
> pretty OK, Half-Life 2 was basically unplayable though (single digit
> framerates).
>
> Neither laptop can manage either Doom3 or Minecraft.

Intel's OpenGL likes to advertise features that it doesn't have. One
example is GL_SGIS_generate_mipmap. It's in the version string but if
you use it you just get black mipmaps. So instead of filtered textures
you get black textures at a distance.

> It seems neither my current GPU (GeForce GTX 980), nor seemingly the
> Intel GMA, have any significant issues at the 2km mark.
>
>
> When it happens, generally there is no real jitter below 2km, then one
> hits a point where it starts happening (initially fairly subtle), and
> it gets steadily worse the further one travels beyond this point (say,
> at 8km, the geometry is shaking around all over the place).
>
>
> But, not been able to find much information about the numerical
> properties of ~ 14 to 20 year old graphics hardware.

Actually, I thought we were talking about recent-ish hardware. It
wouldn't surprise me if 20 year old cards might have cheated on
precision.
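
To put rough numbers on the precision question, here is a small sketch that computes the spacing between adjacent representable values (the ULP) at 2 km for the formats mentioned in this thread. It assumes coordinates are in inches, as in the Binary16 remark, and it only looks at storage granularity; where rounding actually happens in a given GPU pipeline is not known here. The helper name `ulp` and the constants are made up for the example.

```python
import math

INCHES_PER_METER = 1.0 / 0.0254
BINARY16_MAX = 65504.0  # largest finite Binary16 value


def ulp(x, fraction_bits):
    """Spacing between adjacent representable values near x for a binary
    float with the given number of stored fraction bits (normal range only)."""
    exponent = math.floor(math.log2(abs(x)))
    return 2.0 ** (exponent - fraction_bits)


distance_in = 2000.0 * INCHES_PER_METER  # 2 km expressed in inches

for name, bits in (("Binary32 (S.E8.F23)", 23), ("truncated (S.E8.F16)", 16)):
    print(f"{name}: ULP at 2 km = {ulp(distance_in, bits)} inch")

print("Binary16 ULP near 4000 units:", ulp(4000.0, 10))
print("2 km in inches overflows Binary16:", distance_in > BINARY16_MAX)
```

Under these assumptions the ULP at 2 km works out to 1/128 inch for Binary32 and a full inch for the 16-bit-fraction format. The step from the previous binade happens at 2^16 inches, which is about 1.66 km, so a format with roughly 16 fraction bits somewhere in the pipeline would at least be consistent with jitter that shows up near the 2 km mark and worsens beyond it.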

> Meanwhile, TKRA-GL starts having jitter at the 2km mark if I use
> truncated floating point (S.E8.F16.z7), but it doesn't really happen
> at this point if using full precision Binary32.
>
>
> As noted, Binary16 (S.E5.F10) can work OK for coordinates and similar,
> but trying to use Binary16 vectors for the transformation/projection
> matrix or transform steps does not produce acceptable results (even at
> much smaller distances).
>
> Then again, at the extremes of a Quake 1 map, one is looking at around
> a 1 or 2 inch ULP with Binary16, so it stands to reason, ...
>
>
> Binary16 would give a separate sort of 2km issue:
> If one specifies coordinates in "inches", at 2km they will overflow
> the exponent range...
>
>
> Affine warping isn't really the issue here, and is pretty much
> unavoidable with this rasterizer design (depends mostly on how finely
> geometry is subdivided, which in turn affects performance).
>
> For 320x200, a size limit of around 56 or 64 pixels per edge seems
> like a reasonable compromise (so if the perimeter for a quad exceeds
> 224 pixels or so, it is subdivided).
>
> Finer subdivision is used for things that cross the frustum edges, and
> extra fine for geometry which crosses the near plane, mostly because
> glitches can occur otherwise (main alternative here would be to clip
> the geometry for primitives which cross the edge, but this would
> require partly redesigning this part of the process).

It seems like subdividing based on edge length would subdivide polygons
that don't need it. E.g. if the polygon is directly facing the camera,
Z doesn't change so affine is correct with no subdivision. But if you
have a polygon that's almost edge on to the camera, Z changes a lot and
it'll need lots of subdivision to not warp, even if the area in pixels
is smaller.
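
As a rough sketch of a depth-based criterion: with perspective-correct interpolation, the screen-space midpoint of an edge with endpoint depths w0 and w1 maps to the parameter w0 / (w0 + w1) along the edge, while affine interpolation uses 0.5. The difference, scaled by the texture span of the edge, estimates the warp. The function names and the one-texel threshold are assumptions for illustration, and the midpoint is used as a cheap proxy rather than the exact worst case.

```python
def affine_midpoint_error(w0, w1):
    """Fraction of the edge by which affine interpolation misplaces the
    texture coordinate at the screen-space midpoint of an edge whose
    endpoints have depths w0 and w1 (both assumed > 0)."""
    perspective_t = w0 / (w0 + w1)  # perspective-correct parameter at the midpoint
    return abs(0.5 - perspective_t)


def needs_subdivision(w0, w1, edge_texels, max_error_texels=1.0):
    """True if the estimated affine error along this edge exceeds the
    (hypothetical) tolerance, measured in texels."""
    return affine_midpoint_error(w0, w1) * edge_texels > max_error_texels


# Facing the camera: depth is constant, so affine is exact however long the edge is.
print(needs_subdivision(10.0, 10.0, edge_texels=256))
# Nearly edge-on: depth triples along the edge, so even a short edge warps.
print(needs_subdivision(1.0, 3.0, edge_texels=64))
```

A test like this could be combined with the edge-length limit rather than replacing it, since the near-plane and frustum-edge cases mentioned earlier are a separate problem.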

Re: Power cost of IEEE754

<tegfn8$n317$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=27535&group=comp.arch#27535

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Power cost of IEEE754
Date: Sun, 28 Aug 2022 14:29:35 -0500
Organization: A noiseless patient Spider
Lines: 237
Message-ID: <tegfn8$n317$1@dont-email.me>
References: <t6gush$p5u$1@dont-email.me>
<0a9765d5-f885-4109-9ba8-69430513d05fn@googlegroups.com>
<jwvlerncc0p.fsf-monnier+comp.arch@gnu.org> <tdiql4$96b$1@gioia.aioe.org>
<tdun4o$2ifk6$1@dont-email.me>
<3888602b-9fcb-4c6b-9cb1-b3ae8d3c4fe7n@googlegroups.com>
<55c4f336-ba24-4812-a9cc-b313a62ee5d2n@googlegroups.com>
<ygntu64q7a5.fsf@y.z> <te1440$2phc4$1@dont-email.me> <ygnv8qi4qtd.fsf@y.z>
<te78ae$3jl51$1@dont-email.me> <ygnpmgonjas.fsf@y.z>
<te94mm$3p2tt$1@dont-email.me> <ygnsfli634w.fsf@y.z>
<tec3o9$7aq2$1@dont-email.me> <ygnler9e1hh.fsf@y.z>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 28 Aug 2022 19:29:44 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="643ea4f46fd515dbb23e4bcf390f9d16";
logging-data="756775"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/f3N77NoSjvD4zzaZgeV7U"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.12.0
Cancel-Lock: sha1:/aTaJO5pQH9amOYTVvby1cSnqeI=
Content-Language: en-US
In-Reply-To: <ygnler9e1hh.fsf@y.z>
 by: BGB - Sun, 28 Aug 2022 19:29 UTC

On 8/27/2022 6:00 PM, Josh Vanderhoof wrote:
> BGB <cr88192@gmail.com> writes:
>
>> I was mostly seeing the 2km jitter issue with OpenGL on actual GPUs
>> (such as, "GeForce GTX 260" or "Mobility Radeon R200" and similar).
>>
>>
>> The R200 (in a WinXP-era laptop) also has the other funky behavior
>> that for BC3(DXT5), it cares about endpoint ordering in a similar way
>> to BC1 (DXT1), so with the endpoints in the alpha-transparency
>> ordering it will draw black texels rather than interpolated texels.
>>
>> Also seems to have a few other limitations:
>> Shaders can't contain loops or branches;
>> Only a single texel fetch per bound texture;
>> Can only bind 4 textures at a time;
>> ...
>>
>> Trying to violate any of this, shader compilation fails and/or the
>> geometry renders as a flat white square.
>>
>>
>> So, one of these cards is rated for ~ DirectX 10 / OpenGL 3.3, the
>> other for DirectX 8 and OpenGL 1.4 (though the drivers seem to support
>> OpenGL 2.1).
>>
>> Then there is the Intel GMA (in another laptop; came with Vista),
>> where trying to use shaders just sort of immediately turns things into
>> a slide-show. Also claims OpenGL 2.1 (but, can be noted that most non
>> 1.x functionality is basically unusable, and there was a check-box in
>> the driver configuration tool that seemingly switched it between
>> reporting itself as GL 1.4 or 2.1).
>>
>> Runs fixed-function pretty OK though; can run Quake 3 and Half-Life
>> pretty OK, Half-Life 2 was basically unplayable though (single digit
>> framerates).
>>
>> Neither laptop can manage either Doom3 or Minecraft.
>
> Intel's OpenGL likes to advertise features that it doesn't have. One
> example is GL_SGIS_generate_mipmap. It's in the version string but if
> you use it you just get black mipmaps. So instead of filtered textures
> you get black textures at a distance.
>

Hmm...

From what I read before, apparently the GMA was primarily
fixed-function, and dealt with shaders by falling back to software
rasterization.

It can be usable, but needs to be limited to fixed-function, and with a
"modest" polygon budget, ... So, basically on par with Quake or
Half-Life or similar.

Quake 3 is sort of a mixed bag, as it seems to actually be a little
easier on the GPU (on average, it pushes a smaller number of larger
polygons, has less overdraw, etc). However, it is more expensive in
terms of memory footprint and CPU requirements (my port had tried
shaving it down, but only so much is possible; compared with Quake 1 or
2, its BSP trees are huge).

For a BSP, I had noted before that a ray-sweep (from the camera) can give
less overdraw than the PVS, but walking the BSP tree for each ray-cast
is expensive (the result is slower than the overdraw from the PVS would
have been). Something like an octree or blockmap would likely be better
for raycast visibility determination, but alas.

In theory (since the Quake 1 map source was released), I could rebuild
the Quake 1 maps in terms of octrees (and then possibly use a raycast
sweep or similar rather than PVS), but it is unclear if doing so is
worth the effort.

Would probably also keep most of the lights intact, and use vertex
lighting rather than lightmaps, ...

Say, each polygon has:
   XYZ and ST coords;
   A list of static-lighting contributions:
     Multiple non-animated lights;
   A list of animated lighting contributions (intensity and sequence):
     Grouped per light animation sequence;
   A scratch list for dynamic light-sources:
     The light-source itself does the query and update.

Then there would be a periodic process which runs and adds up the color
contributions to recalculate the current vertex color. Possibly with a
10Hz tick, triggered if the surface is both visible and one of the
animated light-sources or similar has changed.
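
A minimal sketch of what that recalculation could look like, under assumptions: colours are (r, g, b) floats in 0..1, each animation sequence is a plain list of intensities stepped once per 10 Hz tick, and the sum is clamped. The class and function names are hypothetical and are not taken from any existing engine.

```python
from dataclasses import dataclass, field


@dataclass
class VertexLight:
    static: list = field(default_factory=list)    # [(r, g, b)] from non-animated lights
    animated: dict = field(default_factory=dict)  # sequence name -> [(r, g, b)]
    dynamic: list = field(default_factory=list)   # scratch [(r, g, b)], written by dynamic lights


def vertex_color(light, sequences, tick):
    """Sum all contributions for one vertex at the given 10 Hz tick count."""
    total = [0.0, 0.0, 0.0]
    # Static and dynamic contributions are added at full strength.
    for rgb in light.static + light.dynamic:
        for i in range(3):
            total[i] += rgb[i]
    # Animated contributions are scaled once per sequence, which is why
    # they are grouped by sequence rather than stored per light.
    for name, contributions in light.animated.items():
        seq = sequences[name]
        scale = seq[tick % len(seq)]  # the sequence loops
        for rgb in contributions:
            for i in range(3):
                total[i] += scale * rgb[i]
    return tuple(min(c, 1.0) for c in total)


sequences = {"flicker": [0.0, 1.0]}  # hypothetical two-step animation
v = VertexLight(static=[(0.25, 0.25, 0.25)],
                animated={"flicker": [(0.5, 0.0, 0.0)]})
print(vertex_color(v, sequences, tick=0))
print(vertex_color(v, sequences, tick=1))
```

The visibility and "has anything changed" checks would sit outside this function and decide whether it needs to be called at all for a given surface on a given tick.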

One does need to statically subdivide large polygons, mostly for sake of
not being so large that vertex lighting can't work effectively (say,
subdividing any edge larger than 2 or 3 meters or so).

Lighting algo would likely be, say (offline):
   Try to raycast from a bounding-box around the target vertex to a
   bounding box around the lightsource (it is a hit if one of the
   ray-casts passes);
   Else, cast rays off in a few directions (from the target vertex), and
   see if one of these rays hits a vertex which can ray-cast back to the
   light-source, and if so add any indirect lighting contributions;
   ....

For dynamic lights, would query a list of surfaces near the light
source, and then perform ray-casts and add the light from the dynamic
source, if the ray is not blocked by something (and skipping any cases
where the light is on the back-side of the polygon).

Where, line/polygon intersection check can be, roughly:
   Check whether line would cross polygon's surface plane.
     If no, no collision.
   Calculate line/plane intersect between line and polygon's surface.
   Check if intersection point falls within polygon's bounding box.
     If no, no collision.
   Check if intersection falls outside any of the polygon edges.
     If yes, no collision.
     Else, collision.
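
The steps above can be sketched roughly as follows, assuming a convex planar polygon given as a list of (x, y, z) tuples whose first three vertices are not collinear. The helper names and the `eps` tolerance are illustrative choices only; the tolerance is not scaled by the normal's length, which a real implementation would probably want.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def segment_hits_polygon(p0, p1, verts, eps=1e-9):
    """Return True if the segment p0..p1 passes through the convex polygon."""
    normal = cross(sub(verts[1], verts[0]), sub(verts[2], verts[0]))
    d0 = dot(normal, sub(p0, verts[0]))  # signed distances (scaled by |normal|)
    d1 = dot(normal, sub(p1, verts[0]))
    # 1. Both endpoints on the same side, or segment parallel to the plane.
    if d0 * d1 > 0.0 or d0 == d1:
        return False
    # 2. Line/plane intersection point.
    t = d0 / (d0 - d1)
    hit = tuple(p0[i] + t * (p1[i] - p0[i]) for i in range(3))
    # 3. Cheap reject against the polygon's bounding box.
    for i in range(3):
        lo = min(v[i] for v in verts) - eps
        hi = max(v[i] for v in verts) + eps
        if not lo <= hit[i] <= hi:
            return False
    # 4. The point must be on the inner side of every edge.
    n = len(verts)
    for i in range(n):
        a, b = verts[i], verts[(i + 1) % n]
        if dot(cross(sub(b, a), sub(hit, a)), normal) < -eps:
            return False
    return True


tri = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(segment_hits_polygon((0.2, 0.2, 1.0), (0.2, 0.2, -1.0), tri))
print(segment_hits_polygon((0.9, 0.9, 1.0), (0.9, 0.9, -1.0), tri))
```

The second segment lands inside the triangle's bounding box but outside its hypotenuse, so it is rejected only by the edge test. Because the normal is derived from the polygon's own winding, the edge test works for either winding order.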

Though, for ray-cast occlusion checking, would likely make sense to keep
the original CSG brushes around as well, since it would be cheaper to
check for collisions between lines and CSG brushes than between lines
and polygons (clip line to brush planes and see if it is clipped away,
along with an initial bounding-box check).
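
A sketch of the brush variant, assuming each brush is a convex volume stored as a list of (normal, dist) planes with outward-facing normals, so that a point p is inside when dot(normal, p) <= dist for every plane. The representation and names are assumptions for illustration, and the initial bounding-box check is left out.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def segment_hits_brush(p0, p1, planes):
    """Clip the segment p0..p1 against each brush plane in turn; the brush
    blocks the segment if any part of it survives the clipping."""
    t_enter, t_exit = 0.0, 1.0
    for normal, dist in planes:
        d0 = dot(normal, p0) - dist
        d1 = dot(normal, p1) - dist
        if d0 > 0.0 and d1 > 0.0:
            return False  # entirely outside this plane: clipped away
        if d0 <= 0.0 and d1 <= 0.0:
            continue      # entirely inside this half-space: nothing to clip
        t = d0 / (d0 - d1)  # signs differ here, so the divisor is non-zero
        if d0 > 0.0:
            t_enter = max(t_enter, t)  # segment enters the half-space
        else:
            t_exit = min(t_exit, t)    # segment leaves the half-space
        if t_enter > t_exit:
            return False  # remaining interval is empty
    return True


# Unit cube from (0, 0, 0) to (1, 1, 1) as six planes.
cube = [((1, 0, 0), 1), ((-1, 0, 0), 0),
        ((0, 1, 0), 1), ((0, -1, 0), 0),
        ((0, 0, 1), 1), ((0, 0, -1), 0)]
print(segment_hits_brush((-1.0, 0.5, 0.5), (2.0, 0.5, 0.5), cube))
print(segment_hits_brush((-1.0, 0.5, 0.5), (0.5, 2.5, 0.5), cube))
```

This needs one dot product pair per plane and no per-edge work, which is the reason brushes tend to be cheaper than testing every polygon face separately.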

....

>> It seems neither my current GPU (GeForce GTX 980), nor seemingly the
>> Intel GMA, have any significant issues at the 2km mark.
>>
>>
>> When it happens, generally there is no real jitter below 2km, then one
>> hits a point where it starts happening (initially fairly subtle), and
>> it gets steadily worse the further one travels beyond this point (say,
>> at 8km, the geometry is shaking around all over the place).
>>
>>
>> But, not been able to find much information about the numerical
>> properties of ~ 14 to 20 year old graphics hardware.
>
> Actually, I thought we were talking about recent-ish hardware. It
> wouldn't surprise me if 20 year old cards might have cheated on
> precision.
>

Newest GPU I have (in my main PC) was apparently released 7 years ago
(and I got it second-hand).

My laptops have older chips, and I have some older Xeon rack servers
with no real GPU whatsoever (though apparently the integrated graphics
were based on the ATI Rage, but I have not been able to get any 3D
acceleration from it).

Running Linux (CentOS) on it, can technically run Mesa LLVMpipe though...

My previous software GL was faster than LLVMpipe, but worse quality.

Aside from the affine texturing, my software GLs had typically used
nearest filtering for minification, with linear typically only used for
magnification. Partly this is because things like trilinear filtering
are computationally expensive (and would need to be applied to the whole
scene rather than only to things near the camera).

>> Meanwhile, TKRA-GL starts having jitter at the 2km mark if I use
>> truncated floating point (S.E8.F16.z7), but it doesn't really happen
>> at this point if using full precision Binary32.
>>
>>
>> As noted, Binary16 (S.E5.F10) can work OK for coordinates and similar,
>> but trying to use Binary16 vectors for the transformation/projection
>> matrix or transform steps does not produce acceptable results (even at
>> much smaller distances).
>>
>> Then again, at the extremes of a Quake 1 map, one is looking at around
>> a 1 or 2 inch ULP with Binary16, so it stands to reason, ...
>>
>>
>> Binary16 would give a separate sort of 2km issue:
>> If one specifies coordinates in "inches", at 2km they will overflow
>> the exponent range...
>>
>>
>> Affine warping isn't really the issue here, and is pretty much
>> unavoidable with this rasterizer design (depends mostly on how finely
>> geometry is subdivided, which in turn affects performance).
>>
>> For 320x200, a size limit of around 56 or 64 pixels per edge seems
>> like a reasonable compromise (so if the perimeter for a quad exceeds
>> 224 pixels or so, it is subdivided).
>>
>> Finer subdivision is used for things that cross the frustum edges, and
>> extra fine for geometry which crosses the near plane, mostly because
>> glitches can occur otherwise (main alternative here would be to clip
>> the geometry for primitives which cross the edge, but this would
>> require partly redesigning this part of the process).
>
> It seems like subdividing based on edge length would subdivide polygons
> that don't need it. E.g. if the polygon is directly facing the camera,
> Z doesn't change so affine is correct with no subdivision. But if you
> have a polygon that's almost edge on to the camera, Z changes a lot and
> it'll need lots of subdivision to not warp, even if the area in pixels
> is smaller.


Re: Misc: Idle thoughts for cheap and fast(ish) GPU.

<2022Sep1.194347@mips.complang.tuwien.ac.at>


https://www.novabbs.com/devel/article-flat.php?id=27583&group=comp.arch#27583

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Misc: Idle thoughts for cheap and fast(ish) GPU.
Date: Thu, 01 Sep 2022 17:43:47 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 39
Message-ID: <2022Sep1.194347@mips.complang.tuwien.ac.at>
References: <t6gush$p5u$1@dont-email.me> <t6htd8$q8c$1@gioia.aioe.org> <ae97dee0-35fc-4294-be1f-aca37367c1c8n@googlegroups.com> <88646c7a-3e1d-406e-b3ec-7171cfd4e235n@googlegroups.com> <9e7e834e-889e-4a61-a357-ae47061ef766n@googlegroups.com> <2869eb73-0329-49d5-8ea6-2021382bc82bn@googlegroups.com> <2022Aug17.190034@mips.complang.tuwien.ac.at> <7e3cd26f-aa39-48b7-866f-a0f04402c5a6n@googlegroups.com>
Injection-Info: reader01.eternal-september.org; posting-host="2c9f1754ab0862a5db02883727a2c504";
logging-data="2369127"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19sBnKug1HYKp7GGJeWHA5I"
Cancel-Lock: sha1:lIdagSayCUDiht/Qz9jD/rqRkrg=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Thu, 1 Sep 2022 17:43 UTC

Michael S <already5chosen@yahoo.com> writes:
>On Wednesday, August 17, 2022 at 8:33:06 PM UTC+3, Anton Ertl wrote:
>> Michael S <already...@yahoo.com> writes:
>> >According to what Anton told us few weeks ago, it does not apply to Rocket Lake
>> >Probably, does not apply to Tiger Lake either, as long as only 1 or 2 cores are
>> >crunching 512-bit stuff.
>> Actually, for the Ice Lake i5-1035G4, we see an AVX512 downclock with
>> 1 active core from 3.7GHz to 3.6GHz, but no downclock with more active
>> cores (2 cores are already at 3.6GHz without AVX512, 3 and 4 at
>> 3.3GHz)
>> <https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html>.
>>
>> I guess that the increased voltage for 3.7GHz in combination with
>> AVX512 can result in too much current draw for the core, so they
>> downclock (and lower voltage) to 3.6GHz. For the other cases the
>> voltage is already low enough, so no proactive downclocking is
>> necessary, and you can leave reactive downclocking to the power and
>> temperature limits. If my guess is correct, I expect that Tiger Lake
>> will be similar.
>
>An equivalent of i5-1035G4 in Tiger Lake family is i5-1155G7
>that has max. frequency = 4.5 GHz.

I happen to have an i5-1135G7, which has a max frequency of 4200MHz.
I ran avx-turbo there, and you can find the results on
<http://www.complang.tuwien.ac.at/anton/avx-turbo-i5-1135G7>.

Looking at the 1-core results, I see results like 4234MHz and 4133MHz;
the latter is probably due to license-based downclocking. It happens
for all the avx512 tests, and all the others are at the higher clock
frequency (on repeating, also avx256_merge_sparse).

A similar thing seems to happen with two active cores. Starting at 3
active cores, the CPU seems to run into the power or thermal limit.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Misc: Idle thoughts for cheap and fast(ish) GPU.

<b1269dba-f1d2-49c7-9dcd-28ff25e8d63an@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=27584&group=comp.arch#27584

X-Received: by 2002:a05:6214:21a6:b0:499:2e7:d7fe with SMTP id t6-20020a05621421a600b0049902e7d7femr18618152qvc.63.1662057953225;
Thu, 01 Sep 2022 11:45:53 -0700 (PDT)
X-Received: by 2002:ac8:7d84:0:b0:344:662d:278c with SMTP id
c4-20020ac87d84000000b00344662d278cmr24333690qtd.513.1662057953078; Thu, 01
Sep 2022 11:45:53 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border-2.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 1 Sep 2022 11:45:52 -0700 (PDT)
In-Reply-To: <2022Sep1.194347@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:d59:affd:e2d9:9e01;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:d59:affd:e2d9:9e01
References: <t6gush$p5u$1@dont-email.me> <t6htd8$q8c$1@gioia.aioe.org>
<ae97dee0-35fc-4294-be1f-aca37367c1c8n@googlegroups.com> <88646c7a-3e1d-406e-b3ec-7171cfd4e235n@googlegroups.com>
<9e7e834e-889e-4a61-a357-ae47061ef766n@googlegroups.com> <2869eb73-0329-49d5-8ea6-2021382bc82bn@googlegroups.com>
<2022Aug17.190034@mips.complang.tuwien.ac.at> <7e3cd26f-aa39-48b7-866f-a0f04402c5a6n@googlegroups.com>
<2022Sep1.194347@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b1269dba-f1d2-49c7-9dcd-28ff25e8d63an@googlegroups.com>
Subject: Re: Misc: Idle thoughts for cheap and fast(ish) GPU.
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 01 Sep 2022 18:45:53 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 43
 by: MitchAlsup - Thu, 1 Sep 2022 18:45 UTC

On Thursday, September 1, 2022 at 12:53:38 PM UTC-5, Anton Ertl wrote:
> Michael S <already...@yahoo.com> writes:
> >On Wednesday, August 17, 2022 at 8:33:06 PM UTC+3, Anton Ertl wrote:
> >> Michael S <already...@yahoo.com> writes:
> >> >According to what Anton told us few weeks ago, it does not apply to Rocket Lake
> >> >Probably, does not apply to Tiger Lake either, as long as only 1 or 2 cores are
> >> >crunching 512-bit stuff.
> >> Actually, for the Ice Lake i5-1035G4, we see an AVX512 downclock with
> >> 1 active core from 3.7GHz to 3.6GHz, but no downclock with more active
> >> cores (2 cores are already at 3.6GHz without AVX512, 3 and 4 at
> >> 3.3GHz)
> >> <https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html>.
> >>
> >> I guess that the increased voltage for 3.7GHz in combination with
> >> AVX512 can result in too much current draw for the core, so they
> >> downclock (and lower voltage) to 3.6GHz. For the other cases the
> >> voltage is already low enough, so no proactive downclocking is
> >> necessary, and you can leave reactive downclocking to the power and
> >> temperature limits. If my guess is correct, I expect that Tiger Lake
> >> will be similar.
> >
> >An equivalent of i5-1035G4 in Tiger Lake family is i5-1155G7
> >that has max. frequency = 4.5 GHz.
>
> I happen to have an i5-1135G7, which has a max frequency of 4200MHz.
> I ran avx-turbo there, and you can find the results on
> <http://www.complang.tuwien.ac.at/anton/avx-turbo-i5-1135G7>.
>
> Looking at the 1-core results, I see results like 4234MHz and 4133MHz;
> the latter is probably due to license-based downclocking. It happens
> for all the avx512 tests, and all the others are at the higher clock
> frequency (on repeating, also avx256_merge_sparse).
>
> A similar thing seems to happen with two active cores. Starting at 3
> active cores, the CPU seems to run into the power or thermal limit.
<
So, in effect, you can have as many cores as you are willing to buy,
but you can only use 3 of them ?!?
<
>
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>
