devel / comp.arch / Re: Approximate reciprocals

Re: Approximate reciprocals

<ca788e28-7e17-4734-a3a5-4517f16f352cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24506&group=comp.arch#24506

 by: MitchAlsup - Mon, 28 Mar 2022 15:30 UTC

On Monday, March 28, 2022 at 1:14:52 AM UTC-5, James Van Buskirk wrote:
> "MitchAlsup" wrote in message
> news:600fb4d3-5e5f-490f...@googlegroups.com...
> > First 3 iterations are SP, next iteration is DP, next 2 are QP.
> Have you investigated the potential of using variable numbers of
> terms from the series
>
> x/(1-(1-x*D)) = x*sum([((1-x*D)**n,n=0,∞)])
> x/sqrt(1-(1-x**2*D)) =
> x*sum([(gamma(n+0.5)/(sqrt(pi)*gamma(n+1.0))*(1-x**2*D)**n,n=0,∞)])
<
Cursorily. I develop HW algorithms, and these are not constrained the way SW algorithms
are. In HW, one often uses an algorithm that would be less than optimal in a SW
implementation but runs faster in HW. For example, Goldschmidt is "like" Newton-Raphson,
but its multiplies are independent rather than dependent, so if you can keep the
multiplier tree occupied, it will be faster than N-R. SW is always bound by the
latency of the multiplier function unit, HW only by the multiplier tree itself. So
Goldschmidt runs 1 iteration every 2 cycles, while N-R runs 1 iteration every
2×multiplier latency (the latency is generally 4 or 5 cycles). That is about 4× faster
per iteration. HW has no trouble converting from one precision to another, so
3 iterations in SP, 1 in DP, and 2 in QP is straightforward (no conversion instructions).
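
To make the dependency difference concrete, here is a minimal Python sketch of the two iterations. It is only an illustration in double precision, not the hardware algorithm described above: the function names and the linear starting estimate 48/17 - 32/17*d (a textbook choice for d in [0.5, 1]) are assumptions, and the mixed SP/DP/QP schedule is not modelled.

```python
def initial_estimate(d):
    """Assumed textbook first guess for 1/d with d in [0.5, 1]; |1 - d*x0| <= 1/17."""
    return 48.0 / 17.0 - (32.0 / 17.0) * d


def newton_raphson_reciprocal(d, iterations=4):
    """N-R: the second multiply needs the result of the first (dependent chain)."""
    x = initial_estimate(d)
    for _ in range(iterations):
        e = 2.0 - d * x      # multiply 1
        x = x * e            # multiply 2, must wait for multiply 1
    return x


def goldschmidt_divide(n, d, iterations=4):
    """Goldschmidt: n*f and d*f do not depend on each other, so they can overlap."""
    x0 = initial_estimate(d)
    n, d = n * x0, d * x0    # pre-scale so that d is close to 1
    for _ in range(iterations):
        f = 2.0 - d
        n = n * f            # independent of the next line
        d = d * f            # d converges to 1, n converges to the quotient
    return n


if __name__ == "__main__":
    print(newton_raphson_reciprocal(0.75))
    print(goldschmidt_divide(1.0, 0.75))
```

Both loops square the error on every pass; the difference is only in whether the two multiplies of one pass can be in flight at the same time.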
>
> to perhaps remove an iteration somewhere?
<
HW removes cycles directly; SW attempts to use tricks to remove 1 iteration.
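
For reference, the two series from the quoted question can be evaluated with a variable number of terms as in the sketch below. This is a plain double-precision illustration, not anything posted in the thread: the function names are hypothetical, `x` stands for an existing approximation of 1/D (or 1/sqrt(D)), and the coefficient recurrence c(n+1) = c(n)*(2n+1)/(2n+2) is just gamma(n+0.5)/(sqrt(pi)*gamma(n+1)) written without gamma calls.

```python
def reciprocal_series(d, x, terms):
    """1/d ~= x * sum((1 - x*d)**n for n in range(terms)), x being a rough 1/d."""
    e = 1.0 - x * d              # residual of the starting approximation
    total, power = 0.0, 1.0
    for _ in range(terms):
        total += power
        power *= e
    return x * total


def rsqrt_series(d, x, terms):
    """1/sqrt(d) ~= x * sum(c_n * (1 - x*x*d)**n), x being a rough 1/sqrt(d)."""
    e = 1.0 - x * x * d
    total, power, coeff = 0.0, 1.0, 1.0   # c_0 = 1
    for n in range(terms):
        total += coeff * power
        power *= e
        coeff *= (2 * n + 1) / (2 * n + 2)  # c_{n+1} from c_n
    return x * total


if __name__ == "__main__":
    for terms in (1, 2, 4, 8):
        print(terms, reciprocal_series(3.0, 0.33, terms), rsqrt_series(2.0, 0.7, terms))
```

With a residual e, truncating the reciprocal series after N terms leaves a relative error of e**N, so each extra term buys a fixed number of bits rather than doubling them as an iteration does.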

Re: Approximate reciprocals

<d543dc94-203b-4bda-a16f-989e6b47c4f1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24507&group=comp.arch#24507

 by: Quadibloc - Mon, 28 Mar 2022 18:02 UTC

On Friday, March 25, 2022 at 4:22:59 AM UTC-6, Elijah Stone wrote:
> The itanium had only an fp reciprocal; no division, that had to be done in
> software.

Yes, and that was a mistake when the Itanium did it, and it would be
a mistake for any other processor intended for general-purpose use.

Instructions take time to decode and fetch and so on. So if you
have no divide instruction, there's no way that you're going to be
able to start one division going in the pipeline every single cycle.
Which is what a decently fast processor ought to be able to do, at
least as a peak rate.

John Savard

Re: Approximate reciprocals

<ca22bd4f-5c84-41ac-8541-a8c414cdd27dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24508&group=comp.arch#24508

 by: Quadibloc - Mon, 28 Mar 2022 18:07 UTC

On Friday, March 25, 2022 at 4:32:10 PM UTC-6, MitchAlsup quoted, in part:
> That is basically why we don't do
> CORDIC anymore.

Hey, CORDIC was a _great_ algorithm. For computers that didn't
have a _multiply_ instruction in hardware. So of course it's not
useful on computers that can even do division and floating-point
arithmetic in hardware.

Pocket calculators, with four-bit processors that indeed don't
have hardware multiply, still do CORDIC.

John Savard
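
As an illustration of why CORDIC suits hardware without a multiplier, here is a minimal rotation-mode sketch in Python. It is an assumption-laden toy, not any calculator's firmware: the names, the 30-bit fixed-point format, and the iteration count are all made up for the example, and the constant table is built with floating-point math for convenience where a real device would hold it in ROM. The loop itself uses only shifts, adds, and subtracts.

```python
import math

FRAC_BITS = 30                      # assumed fixed-point format: 30 fraction bits
N_ITER = 30                         # one iteration per bit of result
ONE = 1 << FRAC_BITS

# atan(2**-i) in fixed point (a ROM table on a real device)
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]

# Each micro-rotation stretches the vector; start from 1/gain to compensate.
_gain = 1.0
for _i in range(N_ITER):
    _gain *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
K_FIXED = round(ONE / _gain)


def cordic_sin_cos(angle):
    """Return (sin, cos) of angle in radians, for |angle| <= pi/2."""
    x, y, z = K_FIXED, 0, round(angle * ONE)
    for i in range(N_ITER):
        if z >= 0:                  # rotate counter-clockwise by atan(2**-i)
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:                       # rotate clockwise by atan(2**-i)
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y / ONE, x / ONE


if __name__ == "__main__":
    s, c = cordic_sin_cos(0.5)
    print(s, math.sin(0.5))
    print(c, math.cos(0.5))
```

Each pass yields roughly one more bit of the result, which is why the method loses out as soon as a fast multiplier is available.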

Re: Approximate reciprocals

<f27736fb-d4f9-41af-9de3-4fbe712ba8e7n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24509&group=comp.arch#24509

 by: Michael S - Mon, 28 Mar 2022 18:59 UTC

On Monday, March 28, 2022 at 9:02:37 PM UTC+3, Quadibloc wrote:
> On Friday, March 25, 2022 at 4:22:59 AM UTC-6, Elijah Stone wrote:
> > The itanium had only an fp reciprocal; no division, that had to be done in
> > software.
>
> Yes, and that was a mistake when the Itanium did it, and it would be
> a mistake for any other processor intended for general-purpose use.
>
> Instructions take time to decode and fetch and so on. So if you
> have no divide instruction, there's no way that you're going to be
> able to start one division going in the pipeline every single cycle.
> Which is what a decently fast processor ought to be able to do, at
> least as a peak rate.
>
> John Savard

According to this definition, decently fast processors still do not exist.

Re: Approximate reciprocals

<cc1f77e5-2cfe-4d7d-b65b-4a2627818b43n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24510&group=comp.arch#24510

 by: Michael S - Mon, 28 Mar 2022 21:00 UTC

On Sunday, March 27, 2022 at 2:32:34 AM UTC+3, Michael S wrote:
> On Sunday, March 27, 2022 at 1:25:29 AM UTC+3, Thomas Koenig wrote:
> > Michael S <already...@yahoo.com> schrieb:
> > > On my system(s) I see another very strange timing effect:
> > > The speed of sqrtq() strongly depends on the range of the input and does it in unexpected manner:
> > > it is much faster (1.5x to 2x) when input is *outside* the range of double precision!
> > Source for the sqrt routine in libquadmath is here:
> >
> > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libquadmath/math/sqrtq.c;h=56ea5d3243c06df2276689849b448eebacae6367;hb=refs/heads/master
> >
> > Hmm.. they first do a test if it is within double precision range,
> > with two Newton iterations, then a test if it is within long double
> > range with a single Newton iteration, and if it is outside then
> > they pick apart the number and run two Newton iterations.
> >
> > Were the numbers outside the range of double inside the range
> > of long double?
> Yes.
> I didn't test "outside long double".
> If "long double" is 80-bit x87 format then "outside" would be mostly sub-normals, right?
> > That could explain things (and suggest
> > an improvement).
>
> I think, the whole routine needs rewrite, rather than improvements.
> The original author probably was not thinking very hard.

Hmm, on my system it's not just slow. Sometimes it rounds wrongly.
For example,
sqrtq(1.06436563353081694552652292498817875110) => 1.03168097468685393293622695023074812668
The correct answer is 1.03168097468685393293622695023074831927

I wonder if the result is also wrong on Linux.
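
One way to cross-check a case like this without relying on any quad-precision library is exact integer arithmetic. The sketch below is only a suggestion with hypothetical helper names: it assumes the input lies in [1, 2), converts the decimal string to the nearest binary128 value (112 fraction bits), and rounds the square root to nearest using `math.isqrt`. No expected output is shown here; run it and compare against the two values above.

```python
from decimal import Decimal, localcontext
from fractions import Fraction
from math import isqrt

P = 112  # fraction bits of IEEE binary128


def to_binary128_significand(text):
    """Nearest binary128 to a decimal string in [1, 2), returned as m with value m / 2**P."""
    m = round(Fraction(text) * 2 ** P)
    if not (2 ** P <= m < 2 ** (P + 1)):
        raise ValueError("this sketch only handles inputs in [1, 2)")
    return m


def sqrt_round_to_nearest(m):
    """Return k such that k / 2**P is sqrt(m / 2**P) rounded to nearest."""
    n = m << P               # sqrt(m / 2**P) == sqrt(m * 2**P) / 2**P
    k = isqrt(n)             # floor of the exact root
    if n - k * k > k:        # exact root lies above k + 1/2, so round up
        k += 1
    return k


def as_decimal(k, digits=39):
    """k / 2**P printed with the given number of significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(k) / Decimal(2 ** P)


if __name__ == "__main__":
    m = to_binary128_significand("1.06436563353081694552652292498817875110")
    print(as_decimal(sqrt_round_to_nearest(m)))
```

A tie cannot occur in the rounding step because `n` is an integer while the halfway point squared is not.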

Re: Approximate reciprocals

<373b3e6e-5e30-4fa9-a283-ad21f484363bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24511&group=comp.arch#24511

 by: MitchAlsup - Mon, 28 Mar 2022 21:39 UTC

On Monday, March 28, 2022 at 1:02:37 PM UTC-5, Quadibloc wrote:
> On Friday, March 25, 2022 at 4:22:59 AM UTC-6, Elijah Stone wrote:
> > The itanium had only an fp reciprocal; no division, that had to be done in
> > software.
>
> Yes, and that was a mistake when the Itanium did it, and it would be
> a mistake for any other processor intended for general-purpose use.
>
> Instructions take time to decode and fetch and so on. So if you
> have no divide instruction, there's no way that you're going to be
> able to start one division going in the pipeline every single cycle.
> Which is what a decently fast processor ought to be able to do, at
> least as a peak rate.
<
Err, no...........
<
Division is not used often enough to warrant 1-cycle throughput.
Division has 15-20 cycle latency (64-bit, int or fp).
<
The cost to make reciprocation fully pipelined is about the same as
6-7 FMAC units. The cost to make division fully pipelined is about
the same as 10-11 FMAC units.
>
> John Savard

Re: Approximate reciprocals

<6e7449c4-7134-44ea-a550-7c130d88b975n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24512&group=comp.arch#24512

 by: MitchAlsup - Mon, 28 Mar 2022 21:44 UTC

On Monday, March 28, 2022 at 4:00:40 PM UTC-5, Michael S wrote:
> On Sunday, March 27, 2022 at 2:32:34 AM UTC+3, Michael S wrote:
> > On Sunday, March 27, 2022 at 1:25:29 AM UTC+3, Thomas Koenig wrote:
> > > Michael S <already...@yahoo.com> schrieb:
> > > > On my system(s) I see another very strange timing effect:
> > > > The speed of sqrtq() strongly depends on the range of the input and does it in unexpected manner:
> > > > it is much faster (1.5x to 2x) when input is *outside* the range of double precision!
> > > Source for the sqrt routine in libquadmath is here:
> > >
> > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libquadmath/math/sqrtq.c;h=56ea5d3243c06df2276689849b448eebacae6367;hb=refs/heads/master
> > >
> > > Hmm.. they first do a test if it is within double precision range,
> > > with two Newton iterations, then a test if it is within long double
> > > range with a single Newton iteration, and if it is outside then
> > > they pick apart the number and run two Newton iterations.
> > >
> > > Were the numbers outside the range of double inside the range
> > > of long double?
> > Yes.
> > I didn't test "outside long double".
> > If "long double" is 80-bit x87 format then "outside" would be mostly sub-normals, right?
> > > That could explain things (and suggest
> > > an improvement).
> >
> > I think, the whole routine needs rewrite, rather than improvements.
> > The original author probably was not thinking very hard.
> Hmm, on my system it's not just slow. Sometimes it rounds wrongly.
> For example,
> sqrtq(1.06436563353081694552652292498817875110) => 1.03168097468685393293622695023074812668
> The correct answer is 1.03168097468685393293622695023074831927
<
This is a 1 ULP error, perhaps simply improperly rounded (or one of the 1-in-a-million cases that
require a second N-R iteration after you have 111 accurate fraction bits).
<
BTW: Itanium only gave 0.502 ULP FDIV accuracy (DP).
>
> I wonder if on Linux result is also wrong.

Re: Approximate reciprocals

<cfd303cb-9107-4887-a71f-dd593d7698ebn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24513&group=comp.arch#24513

 by: Michael S - Mon, 28 Mar 2022 22:49 UTC

On Tuesday, March 29, 2022 at 12:44:55 AM UTC+3, MitchAlsup wrote:
> On Monday, March 28, 2022 at 4:00:40 PM UTC-5, Michael S wrote:
> > On Sunday, March 27, 2022 at 2:32:34 AM UTC+3, Michael S wrote:
> > > On Sunday, March 27, 2022 at 1:25:29 AM UTC+3, Thomas Koenig wrote:
> > > > Michael S <already...@yahoo.com> schrieb:
> > > > > On my system(s) I see another very strange timing effect:
> > > > > The speed of sqrtq() strongly depends on the range of the input and does it in unexpected manner:
> > > > > it is much faster (1.5x to 2x) when input is *outside* the range of double precision!
> > > > Source for the sqrt routine in libquadmath is here:
> > > >
> > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libquadmath/math/sqrtq.c;h=56ea5d3243c06df2276689849b448eebacae6367;hb=refs/heads/master
> > > >
> > > > Hmm.. they first do a test if it is within double precision range,
> > > > with two Newton iterations, then a test if it is within long double
> > > > range with a single Newton iteration, and if it is outside then
> > > > they pick apart the number and run two Newton iterations.
> > > >
> > > > Were the numbers outside the range of double inside the range
> > > > of long double?
> > > Yes.
> > > I didn't test "outside long double".
> > > If "long double" is 80-bit x87 format then "outside" would be mostly sub-normals, right?
> > > > That could explain things (and suggest
> > > > an improvement).
> > >
> > > I think, the whole routine needs rewrite, rather than improvements.
> > > The original author probably was not thinking very hard.
> > Hmm, on my system it's not just slow. Sometimes it rounds wrongly.
> > For example,
> > sqrtq(1.06436563353081694552652292498817875110) => 1.03168097468685393293622695023074812668
> > The correct answer is 1.03168097468685393293622695023074831927
> <
> This is a 1ULP error, perhaps simply improperly rounded

Of course, it's 1 ULP.
But, as I understand it, sqrt() is one of the very few primitives for which IEEE-754 does not merely recommend
correct rounding, but requires it.
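
For reference, a correctly rounded square root can be produced from exact integer arithmetic, which gives something to compare a library result against. What follows is a minimal sketch and not the libquadmath code: the name `sqrt_correctly_rounded` and the (significand, exponent) interface are assumptions made for illustration, and only round-to-nearest-even is handled.

```python
from math import isqrt

def sqrt_correctly_rounded(m, e, p=113):
    """Return (r, e2) such that r * 2**e2 is sqrt(m * 2**e) rounded to
    nearest-even at p significand bits (113 for binary128, 53 for binary64).
    Hypothetical helper for illustration; requires m > 0."""
    # Scale so the integer under the root has at least 2p+2 bits
    # and the remaining exponent is even.
    shift = max(0, 2 * (p + 1) - m.bit_length())
    if (e - shift) % 2:
        shift += 1
    n = m << shift
    s = isqrt(n)                       # floor(sqrt(n)), computed exactly
    sticky = s * s != n                # nonzero remainder below s
    extra = s.bit_length() - p         # surplus low bits to round away (>= 1)
    kept = s >> extra
    guard = (s >> (extra - 1)) & 1     # first discarded bit
    sticky = sticky or (s & ((1 << (extra - 1)) - 1)) != 0
    if guard and (sticky or kept & 1):
        kept += 1
        if kept.bit_length() > p:      # rounding carried out of the top bit
            kept >>= 1
            extra += 1
    return kept, (e - shift) // 2 + extra
```

Feeding it the exact significand and exponent of a binary128 input (with the default p=113) yields the significand that a correctly rounded sqrtq() would have to return.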

> (or the 1:million cases that require the second N-R
> iteration after you have 111 accurate fraction bits).

Unlikely.

> <
> BTW: Itanium only gave 0.502 ULP FDIV accuracy (DP).

Didn't they provide a compiler option for exact division?

> >
> > I wonder if on Linux result is also wrong.

Re: Approximate reciprocals

<a2fd15f6-c496-467f-9044-b68c2d60f6dan@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24514&group=comp.arch#24514

Newsgroups: comp.arch
X-Received: by 2002:a05:622a:5d1:b0:2e0:70c7:1678 with SMTP id d17-20020a05622a05d100b002e070c71678mr24757024qtb.43.1648509193412;
Mon, 28 Mar 2022 16:13:13 -0700 (PDT)
X-Received: by 2002:a05:6808:2018:b0:2ec:c22b:15b8 with SMTP id
q24-20020a056808201800b002ecc22b15b8mr823115oiw.136.1648509193118; Mon, 28
Mar 2022 16:13:13 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 28 Mar 2022 16:13:12 -0700 (PDT)
In-Reply-To: <cfd303cb-9107-4887-a71f-dd593d7698ebn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:7408:496f:7430:392a;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:7408:496f:7430:392a
References: <t1c154$j5t$1@dont-email.me> <81bd21bb-8e02-4629-9749-d846be44ef43n@googlegroups.com>
<t1d0r8$o4v$1@newsreader4.netcologne.de> <903965ad-5226-49d5-9883-57b1bc836fd7n@googlegroups.com>
<t1dckv$u7$2@newsreader4.netcologne.de> <526d6018-1e28-44f7-86e6-89ccbda1f663n@googlegroups.com>
<t1fmss$i30$2@newsreader4.netcologne.de> <5991ffcb-7857-49ba-9204-7201850b64a6n@googlegroups.com>
<t1helc$mtc$1@newsreader4.netcologne.de> <b58e87e7-5cad-4867-835e-ea84b192b230n@googlegroups.com>
<t1i106$4jp$1@newsreader4.netcologne.de> <4a14747b-b131-4619-af63-e87caa1186cen@googlegroups.com>
<5c553807-0d0a-45f4-8b4e-a52480359c8cn@googlegroups.com> <t1mnc6$8lb$2@newsreader4.netcologne.de>
<ae3841d6-c26f-4703-99f8-9893cb7c4701n@googlegroups.com> <t1nqv2$2b2$1@newsreader4.netcologne.de>
<c98cc15b-e3bf-4636-a2fc-0232aaf544f6n@googlegroups.com> <285f1ec1-4a48-4ab7-953a-1a4a2b4f8094n@googlegroups.com>
<t1o3sm$977$1@newsreader4.netcologne.de> <b48d1398-004f-4c6e-8c27-94c635d23c52n@googlegroups.com>
<cc1f77e5-2cfe-4d7d-b65b-4a2627818b43n@googlegroups.com> <6e7449c4-7134-44ea-a550-7c130d88b975n@googlegroups.com>
<cfd303cb-9107-4887-a71f-dd593d7698ebn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a2fd15f6-c496-467f-9044-b68c2d60f6dan@googlegroups.com>
Subject: Re: Approximate reciprocals
From: already5...@yahoo.com (Michael S)
Injection-Date: Mon, 28 Mar 2022 23:13:13 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 47
 by: Michael S - Mon, 28 Mar 2022 23:13 UTC

On Tuesday, March 29, 2022 at 1:49:45 AM UTC+3, Michael S wrote:
> On Tuesday, March 29, 2022 at 12:44:55 AM UTC+3, MitchAlsup wrote:
> > On Monday, March 28, 2022 at 4:00:40 PM UTC-5, Michael S wrote:
> > > On Sunday, March 27, 2022 at 2:32:34 AM UTC+3, Michael S wrote:
> > > > On Sunday, March 27, 2022 at 1:25:29 AM UTC+3, Thomas Koenig wrote:
> > > > > Michael S <already...@yahoo.com> schrieb:
> > > > > > On my system(s) I see another very strange timing effect:
> > > > > > The speed of sqrtq() strongly depends on the range of the input and does it in unexpected manner:
> > > > > > it is much faster (1.5x to 2x) when input is *outside* the range of double precision!
> > > > > Source for the sqrt routine in libquadmath is here:
> > > > >
> > > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libquadmath/math/sqrtq.c;h=56ea5d3243c06df2276689849b448eebacae6367;hb=refs/heads/master
> > > > >
> > > > > Hmm.. they first do a test if it is within double precision range,
> > > > > with two Newton iterations, then a test if it is within long double
> > > > > range with a single Newton iteration, and if it is outside then
> > > > > they pick apart the number and run two Newton iterations.
> > > > >
> > > > > Were the numbers outside the range of double inside the range
> > > > > of long double?
> > > > Yes.
> > > > I didn't test "outside long double".
> > > > If "long double" is 80-bit x87 format then "outside" would be mostly sub-normals, right?
> > > > > That could explain things (and suggest
> > > > > an improvement).
> > > >
> > > > I think, the whole routine needs rewrite, rather than improvements.
> > > > The original author probably was not thinking very hard.
> > > Hmm, on my system it's not just slow. Sometimes it rounds wrongly.
> > > For example,
> > > sqrtq(1.06436563353081694552652292498817875110) => 1.03168097468685393293622695023074812668
> > > The correct answer is 1.03168097468685393293622695023074831927
> > <
> > This is a 1ULP error, perhaps simply improperly rounded
> Of course, it's 1ULP.
> But, according to my understanding, sqrt() is one of those very few primitives for which IEEE-754 not just recommends
> correct rounding, but requires correct rounding.
> > (or the 1:million cases that require the second N-R
> > iteration after you have 111 accurate fraction bits).
> Unlikely.

I checked it. The error in this case is 0.7499966 ULP.
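
One way such a check might be carried out is with arbitrary-precision decimal arithmetic. This is only a sketch under stated assumptions, not necessarily how the figure above was obtained: the three decimal strings are copied from the example earlier in the thread, they are printed values rather than the exact binary128 bit patterns, so the resulting ULP figures are approximate, and `error_in_ulp` is a name made up for illustration.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60   # comfortably more than the ~34 digits of binary128

x        = Decimal("1.06436563353081694552652292498817875110")
reported = Decimal("1.03168097468685393293622695023074812668")
correct  = Decimal("1.03168097468685393293622695023074831927")

ulp = Decimal(2) ** -112   # spacing of binary128 values in [1, 2)
root = x.sqrt()            # reference square root at 60 digits

def error_in_ulp(result):
    """Distance of `result` from the reference root, in binary128 ULPs."""
    return abs(result - root) / ulp

# The two candidate results should be adjacent binary128 values.
spacing_in_ulp = (correct - reported) / ulp

print(f"spacing between candidates: {spacing_in_ulp:.5f} ULP")
print(f"error of reported result:   {error_in_ulp(reported):.5f} ULP")
print(f"error of correct result:    {error_in_ulp(correct):.5f} ULP")
```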

> > <
> > BTW: Itanium only gave 0.502 ULP FDIV accuracy (DP).
> Didn't they provide a compiler option for exact division?
> > >
> > > I wonder if on Linux result is also wrong.

Re: Approximate reciprocals

<b3183ce1-5c2a-4270-8c24-2176c200e14dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24515&group=comp.arch#24515

Newsgroups: comp.arch
X-Received: by 2002:a05:6214:240a:b0:443:5288:f135 with SMTP id fv10-20020a056214240a00b004435288f135mr8072353qvb.77.1648517878389;
Mon, 28 Mar 2022 18:37:58 -0700 (PDT)
X-Received: by 2002:a05:6830:22ea:b0:5b2:35c1:de3c with SMTP id
t10-20020a05683022ea00b005b235c1de3cmr147717otc.282.1648517878117; Mon, 28
Mar 2022 18:37:58 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.mixmin.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 28 Mar 2022 18:37:57 -0700 (PDT)
In-Reply-To: <373b3e6e-5e30-4fa9-a283-ad21f484363bn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:447:ed3a:39b2:a631;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:447:ed3a:39b2:a631
References: <t1c154$j5t$1@dont-email.me> <e185d33-8bd-e6c4-867f-ed5abbd6969a@elronnd.net>
<d543dc94-203b-4bda-a16f-989e6b47c4f1n@googlegroups.com> <373b3e6e-5e30-4fa9-a283-ad21f484363bn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b3183ce1-5c2a-4270-8c24-2176c200e14dn@googlegroups.com>
Subject: Re: Approximate reciprocals
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Tue, 29 Mar 2022 01:37:58 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Quadibloc - Tue, 29 Mar 2022 01:37 UTC

On Monday, March 28, 2022 at 3:39:51 PM UTC-6, MitchAlsup wrote:
> On Monday, March 28, 2022 at 1:02:37 PM UTC-5, Quadibloc wrote:
> > On Friday, March 25, 2022 at 4:22:59 AM UTC-6, Elijah Stone wrote:
> > > The itanium had only an fp reciprocal; no division, that had to be done in
> > > software.
> >
> > Yes, and that was a mistake when the Itanium did it, and it would be
> > a mistake for any other processor intended for general-purpose use.
> >
> > Instructions take time to decode and fetch and so on. So if you
> > have no divide instruction, there's no way that you're going to be
> > able to do start one division going in the pipeline every single cycle.
> > Which is what a decently fast processor ought to be able to do, at
> > least as a peak rate.
> <
> Err, no...........
> <
> Division is not used often enough to warrant 1-cycle throughput.
> Division has 15-20 cycle latency (64-bit, int or fp)
> <
> The cost to make reciprocation fully pipelined is about the same cost
> as 6-7 FMAC units. The cost to make division fully pipelined is about
> the same cost as 10-11 FMAC units.

There are definitely some hidden assumptions in my statement.

Obviously, letting a CPU start a division every cycle is going
to use die area. If die area is a crippling constraint, such that
each feature has to be weighed against 'would I be better off
leaving out this feature, so I could put more cores on the die?',
then indeed one division per cycle is an extravagance.

I, on the other hand, am of this opinion:

Once you can put *one* core on a die, that's all you need.

If you want 16 cores, you can put sixteen sockets on your
motherboard, or 16 dies on a substrate... or four sockets
on your motherboard, and four dies on each substrate.

You can't get away with having less than _one_
core, and adequate L1 cache, on a die. That's it.

L2 cache is nice also, however.

Having multiple cores more tightly coupled by putting
them on the same die and sharing a cache produces,
I had thought, so little performance benefit that it's
far outweighed by making bigger cores capable of
doing things faster.

And if Intel and AMD stand up to Microsoft, and say that
it needs to change its licensing policy, so you
can run Windows Home on motherboards with
four or eight CPU chips on them, not just one, I think that's
overdue, and a simple way to get greater performance
without requiring Moore's Law to override the laws
of physics.

John Savard

Re: Approximate reciprocals

<t1uvnn$1tc8$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=24516&group=comp.arch#24516

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!EhtdJS5E9ITDZpJm3Uerlg.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Approximate reciprocals
Date: Tue, 29 Mar 2022 14:57:26 +0200
Organization: Aioe.org NNTP Server
Message-ID: <t1uvnn$1tc8$1@gioia.aioe.org>
References: <t1c154$j5t$1@dont-email.me>
<t1dckv$u7$2@newsreader4.netcologne.de>
<526d6018-1e28-44f7-86e6-89ccbda1f663n@googlegroups.com>
<t1fmss$i30$2@newsreader4.netcologne.de>
<5991ffcb-7857-49ba-9204-7201850b64a6n@googlegroups.com>
<t1helc$mtc$1@newsreader4.netcologne.de>
<b58e87e7-5cad-4867-835e-ea84b192b230n@googlegroups.com>
<t1i106$4jp$1@newsreader4.netcologne.de>
<4a14747b-b131-4619-af63-e87caa1186cen@googlegroups.com>
<5c553807-0d0a-45f4-8b4e-a52480359c8cn@googlegroups.com>
<t1mnc6$8lb$2@newsreader4.netcologne.de>
<ae3841d6-c26f-4703-99f8-9893cb7c4701n@googlegroups.com>
<t1nqv2$2b2$1@newsreader4.netcologne.de>
<c98cc15b-e3bf-4636-a2fc-0232aaf544f6n@googlegroups.com>
<285f1ec1-4a48-4ab7-953a-1a4a2b4f8094n@googlegroups.com>
<t1o3sm$977$1@newsreader4.netcologne.de>
<b48d1398-004f-4c6e-8c27-94c635d23c52n@googlegroups.com>
<cc1f77e5-2cfe-4d7d-b65b-4a2627818b43n@googlegroups.com>
<6e7449c4-7134-44ea-a550-7c130d88b975n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="62856"; posting-host="EhtdJS5E9ITDZpJm3Uerlg.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.11.1
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Tue, 29 Mar 2022 12:57 UTC

MitchAlsup wrote:
> On Monday, March 28, 2022 at 4:00:40 PM UTC-5, Michael S wrote:
>> On Sunday, March 27, 2022 at 2:32:34 AM UTC+3, Michael S wrote:
>>> On Sunday, March 27, 2022 at 1:25:29 AM UTC+3, Thomas Koenig wrote:
>>>> Michael S <already...@yahoo.com> schrieb:
>>>>> On my system(s) I see another very strange timing effect:
>>>>> The speed of sqrtq() strongly depends on the range of the input and does it in unexpected manner:
>>>>> it is much faster (1.5x to 2x) when input is *outside* the range of double precision!
>>>> Source for the sqrt routine in libquadmath is here:
>>>>
>>>> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libquadmath/math/sqrtq.c;h=56ea5d3243c06df2276689849b448eebacae6367;hb=refs/heads/master
>>>>
>>>> Hmm.. they first do a test if it is within double precision range,
>>>> with two Newton iterations, then a test if it is within long double
>>>> range with a single Newton iteration, and if it is outside then
>>>> they pick apart the number and run two Newton iterations.
>>>>
>>>> Were the numbers outside the range of double inside the range
>>>> of long double?
>>> Yes.
>>> I didn't test "outside long double".
>>> If "long double" is 80-bit x87 format then "outside" would be mostly sub-normals, right?
>>>> That could explain things (and suggest
>>>> an improvement).
>>>
>>> I think, the whole routine needs rewrite, rather than improvements.
>>> The original author probably was not thinking very hard.
>> Hmm, on my system it's not just slow. Sometimes it rounds wrongly.
>> For example,
>> sqrtq(1.06436563353081694552652292498817875110) => 1.03168097468685393293622695023074812668
>> The correct answer is 1.03168097468685393293622695023074831927
> <
> This is a 1ULP error, perhaps simply improperly rounded (or the 1:million cases that require the second N-R
> iteration after you have 111 accurate fraction bits).
> <
> BTW: Itanium only gave 0.502 ULP FDIV accuracy (DP).

Do you have a numeric example? Any error at all in FDIV is simply a bug,
except for subnormal results where rounding errors got grandfathered in.
(I.e. it is explicitly allowed to do the rounding before you realize
that the result is subnormal, so that you have to denormalize it,
leading to double rounding, or maybe even truncation.)

According to the intentions of the full standard, there can be no doubt
that you should first align the result properly (usually normalize it,
but for subnormal you use the temporary 1 bit to stop the normalization
process at the correct amount of shifting), then you examine the
SIGN/LSB/GUARD/STICKY bits to determine the rounding.
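
The difference between rounding once after alignment and rounding twice can be shown on a small integer significand. This is a minimal sketch, not any particular FPU's logic: `round_nearest_even` is a hypothetical helper, only round-to-nearest-even is modelled (so the SIGN bit, which matters for the directed rounding modes, does not appear), and the 5-bit value is chosen purely for illustration.

```python
def round_nearest_even(sig, drop):
    """Drop the low `drop` bits of the non-negative integer `sig`,
    rounding to nearest with ties to even. Hypothetical helper."""
    if drop <= 0:
        return sig
    kept = sig >> drop
    guard = (sig >> (drop - 1)) & 1                 # first discarded bit
    sticky = (sig & ((1 << (drop - 1)) - 1)) != 0   # OR of all lower bits
    if guard and (sticky or kept & 1):              # above half, or tie with odd LSB
        kept += 1
    return kept

exact = 0b10111   # 23/16 = 1.4375 when read in units of 1/16

# Align first, then round once: the nearest integer to 1.4375 is 1.
once = round_nearest_even(exact, 4)

# Round to an intermediate precision, then denormalize and round again:
# 0b10111 -> 0b110 (rounded up), and 0b110 is then an exact tie that
# rounds to even, giving 2.
twice = round_nearest_even(round_nearest_even(exact, 2), 2)
```

Here `once` is 1 and `twice` is 2, which is the double-rounding discrepancy described above.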

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Approximate reciprocals

<118dffef-fc0c-4607-9528-dc8afa64adafn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24519&group=comp.arch#24519

Newsgroups: comp.arch
X-Received: by 2002:a05:622a:13ca:b0:2e1:a52f:18f4 with SMTP id p10-20020a05622a13ca00b002e1a52f18f4mr28249849qtk.412.1648570831085;
Tue, 29 Mar 2022 09:20:31 -0700 (PDT)
X-Received: by 2002:a05:6870:9604:b0:de:a876:fbba with SMTP id
d4-20020a056870960400b000dea876fbbamr226043oaq.239.1648570830695; Tue, 29 Mar
2022 09:20:30 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Mar 2022 09:20:30 -0700 (PDT)
In-Reply-To: <t1uvnn$1tc8$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:3df4:c3d8:fde5:d649;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:3df4:c3d8:fde5:d649
References: <t1c154$j5t$1@dont-email.me> <t1dckv$u7$2@newsreader4.netcologne.de>
<526d6018-1e28-44f7-86e6-89ccbda1f663n@googlegroups.com> <t1fmss$i30$2@newsreader4.netcologne.de>
<5991ffcb-7857-49ba-9204-7201850b64a6n@googlegroups.com> <t1helc$mtc$1@newsreader4.netcologne.de>
<b58e87e7-5cad-4867-835e-ea84b192b230n@googlegroups.com> <t1i106$4jp$1@newsreader4.netcologne.de>
<4a14747b-b131-4619-af63-e87caa1186cen@googlegroups.com> <5c553807-0d0a-45f4-8b4e-a52480359c8cn@googlegroups.com>
<t1mnc6$8lb$2@newsreader4.netcologne.de> <ae3841d6-c26f-4703-99f8-9893cb7c4701n@googlegroups.com>
<t1nqv2$2b2$1@newsreader4.netcologne.de> <c98cc15b-e3bf-4636-a2fc-0232aaf544f6n@googlegroups.com>
<285f1ec1-4a48-4ab7-953a-1a4a2b4f8094n@googlegroups.com> <t1o3sm$977$1@newsreader4.netcologne.de>
<b48d1398-004f-4c6e-8c27-94c635d23c52n@googlegroups.com> <cc1f77e5-2cfe-4d7d-b65b-4a2627818b43n@googlegroups.com>
<6e7449c4-7134-44ea-a550-7c130d88b975n@googlegroups.com> <t1uvnn$1tc8$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <118dffef-fc0c-4607-9528-dc8afa64adafn@googlegroups.com>
Subject: Re: Approximate reciprocals
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 29 Mar 2022 16:20:31 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 55
 by: MitchAlsup - Tue, 29 Mar 2022 16:20 UTC

On Tuesday, March 29, 2022 at 7:57:35 AM UTC-5, Terje Mathisen wrote:
> MitchAlsup wrote:
> > On Monday, March 28, 2022 at 4:00:40 PM UTC-5, Michael S wrote:
> >> On Sunday, March 27, 2022 at 2:32:34 AM UTC+3, Michael S wrote:
> >>> On Sunday, March 27, 2022 at 1:25:29 AM UTC+3, Thomas Koenig wrote:
> >>>> Michael S <already...@yahoo.com> schrieb:
> >>>>> On my system(s) I see another very strange timing effect:
> >>>>> The speed of sqrtq() strongly depends on the range of the input and does it in unexpected manner:
> >>>>> it is much faster (1.5x to 2x) when input is *outside* the range of double precision!
> >>>> Source for the sqrt routine in libquadmath is here:
> >>>>
> >>>> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libquadmath/math/sqrtq.c;h=56ea5d3243c06df2276689849b448eebacae6367;hb=refs/heads/master
> >>>>
> >>>> Hmm.. they first do a test if it is within double precision range,
> >>>> with two Newton iterations, then a test if it is within long double
> >>>> range with a single Newton iteration, and if it is outside then
> >>>> they pick apart the number and run two Newton iterations.
> >>>>
> >>>> Were the numbers outside the range of double inside the range
> >>>> of long double?
> >>> Yes.
> >>> I didn't test "outside long double".
> >>> If "long double" is 80-bit x87 format then "outside" would be mostly sub-normals, right?
> >>>> That could explain things (and suggest
> >>>> an improvement).
> >>>
> >>> I think, the whole routine needs rewrite, rather than improvements.
> >>> The original author probably was not thinking very hard.
> >> Hmm, on my system it's not just slow. Sometimes it rounds wrongly.
> >> For example,
> >> sqrtq(1.06436563353081694552652292498817875110) => 1.03168097468685393293622695023074812668
> >> The correct answer is 1.03168097468685393293622695023074831927
> > <
> > This is a 1ULP error, perhaps simply improperly rounded (or the 1:million cases that require the second N-R
> > iteration after you have 111 accurate fraction bits).
> > <
> > BTW: Itanium only gave 0.502 ULP FDIV accuracy (DP).
> Do you have a numeric example? Any error at all in FDIV is simply a bug,
<
https://www.cl.cam.ac.uk/~jrh13/slides/gelato-25may05/slides.pdf
<
> except for subnormal results where rounding errors got grandfathered in.
> (I.e. it is explicitly allowed to do the rounding before you realize
> that the result is subnormal, so that you have to denormalize it,
> leading to double rounding, or maybe even truncation.)
>
> According to the intentions of the full standard, there can be no doubt
> that you should first align the result properly (usually normalize it,
> but for subnormal you use the temporary 1 bit to stop the normalization
> process at the correct amount of shifting), then you examine the
> SIGN/LSB/GUARD/STICKY bits to determine the rounding.
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: Approximate reciprocals

<23d05021-6bbf-4548-b8f4-6a2b39facdd8n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24520&group=comp.arch#24520

Newsgroups: comp.arch
X-Received: by 2002:a05:6214:2466:b0:441:2daa:4ab1 with SMTP id im6-20020a056214246600b004412daa4ab1mr27441345qvb.12.1648572105201;
Tue, 29 Mar 2022 09:41:45 -0700 (PDT)
X-Received: by 2002:a05:6808:2185:b0:2d9:ebf0:fb66 with SMTP id
be5-20020a056808218500b002d9ebf0fb66mr5319oib.69.1648572104795; Tue, 29 Mar
2022 09:41:44 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Mar 2022 09:41:44 -0700 (PDT)
In-Reply-To: <118dffef-fc0c-4607-9528-dc8afa64adafn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <t1c154$j5t$1@dont-email.me> <t1dckv$u7$2@newsreader4.netcologne.de>
<526d6018-1e28-44f7-86e6-89ccbda1f663n@googlegroups.com> <t1fmss$i30$2@newsreader4.netcologne.de>
<5991ffcb-7857-49ba-9204-7201850b64a6n@googlegroups.com> <t1helc$mtc$1@newsreader4.netcologne.de>
<b58e87e7-5cad-4867-835e-ea84b192b230n@googlegroups.com> <t1i106$4jp$1@newsreader4.netcologne.de>
<4a14747b-b131-4619-af63-e87caa1186cen@googlegroups.com> <5c553807-0d0a-45f4-8b4e-a52480359c8cn@googlegroups.com>
<t1mnc6$8lb$2@newsreader4.netcologne.de> <ae3841d6-c26f-4703-99f8-9893cb7c4701n@googlegroups.com>
<t1nqv2$2b2$1@newsreader4.netcologne.de> <c98cc15b-e3bf-4636-a2fc-0232aaf544f6n@googlegroups.com>
<285f1ec1-4a48-4ab7-953a-1a4a2b4f8094n@googlegroups.com> <t1o3sm$977$1@newsreader4.netcologne.de>
<b48d1398-004f-4c6e-8c27-94c635d23c52n@googlegroups.com> <cc1f77e5-2cfe-4d7d-b65b-4a2627818b43n@googlegroups.com>
<6e7449c4-7134-44ea-a550-7c130d88b975n@googlegroups.com> <t1uvnn$1tc8$1@gioia.aioe.org>
<118dffef-fc0c-4607-9528-dc8afa64adafn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <23d05021-6bbf-4548-b8f4-6a2b39facdd8n@googlegroups.com>
Subject: Re: Approximate reciprocals
From: already5...@yahoo.com (Michael S)
Injection-Date: Tue, 29 Mar 2022 16:41:45 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 60
 by: Michael S - Tue, 29 Mar 2022 16:41 UTC

On Tuesday, March 29, 2022 at 7:20:32 PM UTC+3, MitchAlsup wrote:
> On Tuesday, March 29, 2022 at 7:57:35 AM UTC-5, Terje Mathisen wrote:
> > MitchAlsup wrote:
> > > On Monday, March 28, 2022 at 4:00:40 PM UTC-5, Michael S wrote:
> > >> On Sunday, March 27, 2022 at 2:32:34 AM UTC+3, Michael S wrote:
> > >>> On Sunday, March 27, 2022 at 1:25:29 AM UTC+3, Thomas Koenig wrote:
> > >>>> Michael S <already...@yahoo.com> schrieb:
> > >>>>> On my system(s) I see another very strange timing effect:
> > >>>>> The speed of sqrtq() strongly depends on the range of the input and does it in unexpected manner:
> > >>>>> it is much faster (1.5x to 2x) when input is *outside* the range of double precision!
> > >>>> Source for the sqrt routine in libquadmath is here:
> > >>>>
> > >>>> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libquadmath/math/sqrtq.c;h=56ea5d3243c06df2276689849b448eebacae6367;hb=refs/heads/master
> > >>>>
> > >>>> Hmm.. they first do a test if it is within double precision range,
> > >>>> with two Newton iterations, then a test if it is within long double
> > >>>> range with a single Newton iteration, and if it is outside then
> > >>>> they pick apart the number and run two Newton iterations.
> > >>>>
> > >>>> Were the numbers outside the range of double inside the range
> > >>>> of long double?
> > >>> Yes.
> > >>> I didn't test "outside long double".
> > >>> If "long double" is 80-bit x87 format then "outside" would be mostly sub-normals, right?
> > >>>> That could explain things (and suggest
> > >>>> an improvement).
> > >>>
> > >>> I think, the whole routine needs rewrite, rather than improvements.
> > >>> The original author probably was not thinking very hard.
> > >> Hmm, on my system it's not just slow. Sometimes it rounds wrongly.
> > >> For example,
> > >> sqrtq(1.06436563353081694552652292498817875110) => 1.03168097468685393293622695023074812668
> > >> The correct answer is 1.03168097468685393293622695023074831927
> > > <
> > > This is a 1ULP error, perhaps simply improperly rounded (or the 1:million cases that require the second N-R
> > > iteration after you have 111 accurate fraction bits).
> > > <
> > > BTW: Itanium only gave 0.502 ULP FDIV accuracy (DP).
> > Do you have a numeric example? Any error at all in FDIV is simply a bug,
> <
> https://www.cl.cam.ac.uk/~jrh13/slides/gelato-25may05/slides.pdf

On which page?
So far I don't see anything suggesting that the compiler does non-perfectly rounded division.

> <
> > except for subnormal results where rounding errors got grandfathered in.
> > (I.e. it is explicitly allowed to do the rounding before you realize
> > that the result is subnormal, so that you have to denormalize it,
> > leading to double rounding, or maybe even truncation.)
> >
> > According to the intentions of the full standard, there can be no doubt
> > that you should first align the result properly (usually normalize it,
> > but for subnormal you use the temporary 1 bit to stop the normalization
> > process at the correct amount of shifting), then you examine the
> > SIGN/LSB/GUARD/STICKY bits to determine the rounding.
> > Terje
> >
> > --
> > - <Terje.Mathisen at tmsw.no>
> > "almost all programming can be viewed as an exercise in caching"

Re: Approximate reciprocals

<t1vd17$5bj$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=24521&group=comp.arch#24521

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-30bd-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Approximate reciprocals
Date: Tue, 29 Mar 2022 16:44:23 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <t1vd17$5bj$1@newsreader4.netcologne.de>
References: <t1c154$j5t$1@dont-email.me> <t1qf0u$oko$1@dont-email.me>
<t1qkql$ui0$1@newsreader4.netcologne.de>
<394168eb-53ed-49c2-a349-4035c3177361n@googlegroups.com>
<t1rm34$pg9$1@gioia.aioe.org>
<7029a173-963d-402b-a184-642120b5e1b8n@googlegroups.com>
<4bdfaba8-898f-4c1e-8ca1-234bf4d3ffc8n@googlegroups.com>
Injection-Date: Tue, 29 Mar 2022 16:44:23 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-30bd-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd6:30bd:0:7285:c2ff:fe6c:992d";
logging-data="5491"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Tue, 29 Mar 2022 16:44 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:
> On Monday, March 28, 2022 at 3:19:46 AM UTC-5, Michael S wrote:
>> On Monday, March 28, 2022 at 9:54:31 AM UTC+3, Terje Mathisen wrote:
>> > Michael S wrote:
>> > > On Monday, March 28, 2022 at 12:26:48 AM UTC+3, Thomas Koenig wrote:
>> > >> Marcus <m.de...@this.bitsnbites.eu> schrieb:
>> > >>> Today I found a pretty nice looking 2nd order polynomial for
>> > >>> approximating 1/x in the interval [1.0, 2.0):
>> > >>>
>> > >>> (2*x^2 - 9*x + 13) / 6
>> > >> Interesting. Could be useful for the start of a Newton iteration
>> > >> for a reciprocal, where the iteration formula would be
>> > >> x_n+1 = 2*x_n - a * x_n^2 (division-free), or x_n*(2-a*x_n).
>> > >>>
>> > >>> Maybe this is known stuff, and I'm not sure that it would be useful for
>> > >>> anything, but I thought I'd share it anyway.
>> > >>>
>> > >>> The accuracy that you get is roughly 6-7 bits, so it's not better than a
>> > >>> LUT.
>> > >>>
>> > >>> The coefficients are "nice" from an implementation perspective, if you
>> > >>> were to hard-wire the polynomial. It should be possible to implement the
>> > >>> dividend with a single multiplier and a few integer adders. The 3 in the
>> > >>> divisor is muddying the waters, though. Is there a fast way to divide by
>> > >>> three?
>> > >> The standard trick of dividing by three in fixed point: For
>> > >> 32-bit numbers, multiply the number by the magic number 2863311531
>> > >> (0xAAAAAAAB) and shift right by 33 bits (in practice, use a
>> > >> "multiply high" and shift right one bit). For floating point
>> > >> numbers, this would have to be adjusted somewhat.
>> > >>
>> > >
>> > > And that causes the whole calculation to be of approximately the same
>> > > computational complexity as 2nd-order polynomials with arbitrary 32-bit coefficients,
>> > > potentially much better coefficients than those suggested.
>> > > The suggestion of Marcus is very naive.
>> > > Any suggestion to do sqrt or rsqrt approximation without lookup table is a losing proposition.
>> > > Even a very small table at the first step, like 32 or 64 entries, helps a lot and saves many steps
>> > > down the road.
>> > > The only exception to that could be when you do very wide SIMD that has no tools for even small lookups.
>> > > Or when you don't care about speed, but that case is sort of off topic.
>> > If you are doing SIMD, but without a reciprocal/reciprocal square root
>> > lookup opcode, then I would use the infamous invsqrt() trick.
><
>> Magic constant and such?
>> I don't think it's a good idea if you seek top performance.
>> One thing to realize is that if you went to the trouble of using rsqrt() NR steps
>> based on custom polynomials, with different coefficients on each step, then it
>> is better to keep error one-sided on all steps except possibly the last one.
><
> Notice that the error term for N-R iterations flips the sign every iteration.
><
>> Apart from being slightly more precise, such a strategy has the advantage of being
>> simpler to do with unsigned integer math.
><
> You need to choose the approximation such that the error is uniformly positive after
> the last iteration. This allows you to choose from {r+0, r+1, and r+2} for the properly
> rounded result; which is easier than choosing {r-1, r+0, r+1} for the properly rounded
> result.
><
>> > With the
>> > latest Cheby-style tweaks to all the constants, this delivers 10+ bits
>> > after a single iteration.
>> >
>> > This is significantly better than an in-register 4-bit/16-term table lookup.
>> >
>> > Anyway, I am sure Mitch has much better polynomials to do it all
>> > directly. :-)
>> Mitch appears to like Chebychev polynomials.
>> Likely, because they are easy to find out :-)
><
> Once you figure out how to calculate them, they are, indeed, easy to determine.
>>
>> They are very close to optimal (in sense of minimizing two-sided error, but
>> probably can be easily offseted to minimize one-sided error too) when the
>> interval of input error is already small. But when the interval is large, as it
>> is a case on the first step without lookup table and, may be, on the second
>> step, when the 1st step was of low order, then Chebychev polynomials are
>> measurably suboptimal vs true equiripple polys. The later can be found by Remez
>> exchange algorithm.
><
> Cheby polynomials are within 1 ULP of optimal, where Remez is better, it is
> a lot harder to calculate, and if you are working to the last bit of precision,
> even calculating Remez requires more precision. Cheby does not have this
> property.

I've looked at the optimum Remez polynomial for 1/x in the range
of 1..2. It is 70/33 - 16/11 * x + 32/99 * x**2, with a weight of x
(so optimizing for the relative error).

The maximum relative error is 1/99, reached at four points in the
interval [1..2], so around 6.62 bits of minimum accuracy.
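
The equiripple claim can be checked by direct evaluation. The following is a minimal sketch, not code from the thread: the names `remez_recip` and `rel_err` are illustrative, and the interior points x = 1.25 and x = 1.75 are the stationary points of the cubic x*p(x), worked out by hand from 16x^2 - 48x + 35 = 0.

```c
#include <assert.h>

/* Quadratic from the post: p(x) = 70/33 - 16/11*x + 32/99*x^2 (Horner form). */
double remez_recip(double x)
{
    assert(x >= 1.0 && x <= 2.0);
    return 70.0 / 33.0 + x * (-16.0 / 11.0 + x * (32.0 / 99.0));
}

/* Relative error of p(x) as an approximation to 1/x: (p(x) - 1/x) / (1/x). */
double rel_err(double x)
{
    return x * remez_recip(x) - 1.0;
}
```

Up to rounding, `rel_err` gives -1/99, +1/99, -1/99, +1/99 at x = 1, 1.25, 1.75 and 2: the same magnitude with alternating sign, which is what an equiripple fit should show.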

Re: Approximate reciprocals

<1d99080f-3c84-4a44-b2cf-271c2f3f7e90n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24524&group=comp.arch#24524

X-Received: by 2002:a05:620a:454b:b0:67e:4202:32b8 with SMTP id u11-20020a05620a454b00b0067e420232b8mr21967504qkp.278.1648577684680;
Tue, 29 Mar 2022 11:14:44 -0700 (PDT)
X-Received: by 2002:a05:6870:538d:b0:de:aa91:898e with SMTP id
h13-20020a056870538d00b000deaa91898emr259768oan.54.1648577684337; Tue, 29 Mar
2022 11:14:44 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Mar 2022 11:14:44 -0700 (PDT)
In-Reply-To: <t1vd17$5bj$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:3df4:c3d8:fde5:d649;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:3df4:c3d8:fde5:d649
References: <t1c154$j5t$1@dont-email.me> <t1qf0u$oko$1@dont-email.me>
<t1qkql$ui0$1@newsreader4.netcologne.de> <394168eb-53ed-49c2-a349-4035c3177361n@googlegroups.com>
<t1rm34$pg9$1@gioia.aioe.org> <7029a173-963d-402b-a184-642120b5e1b8n@googlegroups.com>
<4bdfaba8-898f-4c1e-8ca1-234bf4d3ffc8n@googlegroups.com> <t1vd17$5bj$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1d99080f-3c84-4a44-b2cf-271c2f3f7e90n@googlegroups.com>
Subject: Re: Approximate reciprocals
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 29 Mar 2022 18:14:44 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 144
 by: MitchAlsup - Tue, 29 Mar 2022 18:14 UTC

On Tuesday, March 29, 2022 at 11:45:08 AM UTC-5, Thomas Koenig wrote:
> MitchAlsup <Mitch...@aol.com> schrieb:
> > On Monday, March 28, 2022 at 3:19:46 AM UTC-5, Michael S wrote:
> >> On Monday, March 28, 2022 at 9:54:31 AM UTC+3, Terje Mathisen wrote:
> >> > Michael S wrote:
> >> > > On Monday, March 28, 2022 at 12:26:48 AM UTC+3, Thomas Koenig wrote:
> >> > >> Marcus <m.de...@this.bitsnbites.eu> schrieb:
> >> > >>> Today I found a pretty nice looking 2nd order polynomial for
> >> > >>> approximating 1/x in the interval [1.0, 2.0):
> >> > >>>
> >> > >>> (2*x^2 - 9*x + 13) / 6
> >> > >> Interesting. Could be useful for the start of a Newton iteration
> >> > >> for a reciprocal, where the iteration formula would be
> >> > >> x_n+1 = 2*x_n - a * x_n^2 (division-free), or x_n*(2-a*x_n).
> >> > >>>
> >> > >>> Maybe this is known stuff, and I'm not sure that it would be useful for
> >> > >>> anything, but I thought I'd share it anyway.
> >> > >>>
> >> > >>> The accuracy that you get is roughly 6-7 bits, so it's not better than a
> >> > >>> LUT.
> >> > >>>
> >> > >>> The coefficients are "nice" from an implementation perspective, if you
> >> > >>> were to hard-wire the polynomial. It should be possible to implement the
> >> > >>> dividend with a single multiplier and a few integer adders. The 3 in the
> >> > >>> divisor is muddying the waters, though. Is there a fast way to divide by
> >> > >>> three?
> >> > >> The standard trick of dividing by three in fixed point: For
> >> > >> 32-bit numbers, multiply the number by the magic number 2863311531
> >> > >> (0xAAAAAAAB) and shift right by 33 bits (in practice, use a
> >> > >> "multiply high" and shift right one bit). For floating point
> >> > >> numbers, this would have to be adjusted somewhat.
> >> > >>
> >> > >
> >> > > And that causes the whole calculation to be of approximately the same
> >> > > computational complexity as 2nd-order polynomials with arbitrary 32-bit coefficients,
> >> > > potentially much better coefficients than those suggested.
> >> > > The suggestion of Marcus is very naive.
> >> > > Any suggestion to do sqrt or rsqrt approximation without lookup table is a loose proposition.
> >> > > Even very small table at the first step, like 32 or 64 entries, helps a lot and saves many steps
> >> > > down the road.
> >> > > The only exception to that could be when you do very wide SIMD that has no tools for even small lookups.
> >> > > Or when you don't care about speed, but that case is sort of off topic.
> >> > If you are doing SIMD, but without a reciprocal/reciprocal square root
> >> > lookup opcode, then I would use the infamous invsqrt() trick.
> ><
> >> Magic constant and such?
> >> I don't think it's a good idea if you seek top performance.
> >> One thing to realize is that if you went to trouble of using rsqrt() NR steps
> >> based on custom polynomials, with different coefficients on each step, then it
> >> is better to keep error one-sided on all steps except possibly the last one.
> ><
> > Notice that the error term for N-R iterations flip the sign every iteration.
> ><
> >> Apart from of being slightly more precise, such strategy has advantage of being
> >> simpler to do with unsigned integer math.
> ><
> > You need to choose the approximat such that the error is uniformly positive after
> > the last iteration. This allows you to choose from {r+0, r+1, and r+2} for the properly
> > rounded result; which is easier than choosing {r-1, r+0, r+1} for the properly rounded
> > result.
> ><
> >> > With the
> >> > latest Cheby-style tweaks to all the constants, this delivers 10+ bits
> >> > after a single iteration.
> >> >
> >> > This is significantly better than an in-register 4-bit/16-term table lookup.
> >> >
> >> > Anyway, I am sure Mitch has much better polynomials to do it all
> >> > directly. :-)
> >> Mitch appears to like Chebychev polynomials.
> >> Likely, because they are easy to find out :-)
> ><
> > Once you figure out how to calculate them, they are, indeed, easy to determine.
> >>
> >> They are very close to optimal (in sense of minimizing two-sided error, but
> >> probably can be easily offseted to minimize one-sided error too) when the
> >> interval of input error is already small. But when the interval is large, as it
> >> is a case on the first step without lookup table and, may be, on the second
> >> step, when the 1st step was of low order, then Chebychev polynomials are
> >> measurably suboptimal vs true equiripple polys. The later can be found by Remez
> >> exchange algorithm.
> ><
> > Cheby polynomials are within 1 ULP of optimal, where Remez is better, it is
> > a lot harder to calculate, and if you are working to the last bit of precision,
> > even calculating Remez requires more precision. Cheby does not have this
> > property.
> I've looked at the optimum Remez polynomial for 1/x in the range
> of 1..2. It is 70/33 - 16/11 * x + 32/99 * x**2, with a weight of x
> (so optimizing for the relative error).
>
> The maximum relative error is 1/99, reached at four points in the
> interval [1..2], so around 6.62 bits of minimum accuracy.
<
I stated way above::---------------------------------------------------------------------------------------
Equivalent to::
<
.33333×x^2 - 1.5×x + 2.1666
<
The Chebychev 2nd order Coefficients on the interval [1..2) are:
<
0.32323232×x^2 -0.48484848×x + 0.66666667
<
with 6.63 bits of accuracy.
<-------------------------------------------------------------------------------------------------------------------
Why does your Remez get less precision than Chebychev ?
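
For comparison, the quadratics in this exchange can be measured by brute force. This is a sketch under stated assumptions: the function name `max_rel_err` and the sampling density are illustrative choices, not anything posted in the thread, and sampling only approximates the true maximum.

```c
#include <assert.h>

/* Worst-case |x*p(x) - 1| for p(x) = c0 + c1*x + c2*x^2, sampled on [1, 2]. */
double max_rel_err(double c0, double c1, double c2, int steps)
{
    assert(steps > 0);
    double worst = 0.0;
    for (int i = 0; i <= steps; i++) {
        double x = 1.0 + (double)i / steps;
        double e = x * (c0 + x * (c1 + x * c2)) - 1.0;  /* relative error */
        if (e < 0.0)
            e = -e;
        if (e > worst)
            worst = e;
    }
    return worst;
}
```

With 4096 steps, Marcus's (2*x^2 - 9*x + 13)/6, passed as c0 = 13/6, c1 = -9/6, c2 = 2/6, comes out at about 0.0160 (just under 6 bits), while 70/33 - 16/11*x + 32/99*x^2 comes out at 1/99, about 0.0101.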

Re: Approximate reciprocals

<t1vkm4$ar2$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=24525&group=comp.arch#24525

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-30bd-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Approximate reciprocals
Date: Tue, 29 Mar 2022 18:55:00 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <t1vkm4$ar2$1@newsreader4.netcologne.de>
References: <t1c154$j5t$1@dont-email.me> <t1qf0u$oko$1@dont-email.me>
<t1qkql$ui0$1@newsreader4.netcologne.de>
<394168eb-53ed-49c2-a349-4035c3177361n@googlegroups.com>
<t1rm34$pg9$1@gioia.aioe.org>
<7029a173-963d-402b-a184-642120b5e1b8n@googlegroups.com>
<4bdfaba8-898f-4c1e-8ca1-234bf4d3ffc8n@googlegroups.com>
<t1vd17$5bj$1@newsreader4.netcologne.de>
<1d99080f-3c84-4a44-b2cf-271c2f3f7e90n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 29 Mar 2022 18:55:00 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-30bd-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd6:30bd:0:7285:c2ff:fe6c:992d";
logging-data="11106"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Tue, 29 Mar 2022 18:55 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:
> On Tuesday, March 29, 2022 at 11:45:08 AM UTC-5, Thomas Koenig wrote:

>> I've looked at the optimum Remez polynomial for 1/x in the range
>> of 1..2. It is 70/33 - 16/11 * x + 32/99 * x**2, with a weight of x
>> (so optimizing for the relative error).
>>
>> The maximum relative error is 1/99, reached at four points in the
>> interval [1..2], so around 6.62 bits of minimum accuracy.
><
> I stated way above::---------------------------------------------------------------------------------------
> Equivalent to::
><
> .33333×x^2 - 1.5×x + 2.1666
><
> The Chebychev 2nd order Coefficients on the interval [1..2) are:
><
> 0.32323232×x^2 -0.48484848×x + 0.66666667

There is something wrong with your formula. f(1) would be
0.32323232 - 0.48484848 + 0.66666667 = 0.50505051, which does
not even closely approximate 1/1 = 1. Is there some rescaling
somewhere?
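
One possible reading, not confirmed anywhere in the thread: the quoted decimals match the fractions 32/99, 48/99 and 66/99, which are the coefficients of the Remez quadratic when it is re-expanded around the interval midpoint, i.e. in t = x - 1.5 instead of x. The sketch below only checks that arithmetic; `poly_in_x` and `poly_in_t` are hypothetical names, and the shift by 1.5 is an assumption.

```c
#include <assert.h>

/* 70/33 - 16/11*x + 32/99*x^2, the weighted Remez quadratic from the thread. */
double poly_in_x(double x)
{
    assert(x >= 1.0 && x <= 2.0);
    return 70.0 / 33.0 + x * (-16.0 / 11.0 + x * (32.0 / 99.0));
}

/* 32/99*t^2 - 48/99*t + 66/99 with t = x - 1.5. The fractions round to
   0.32323232, 0.48484848 and 0.66666667; the shift itself is assumed. */
double poly_in_t(double x)
{
    double t = x - 1.5;
    return 66.0 / 99.0 + t * (-48.0 / 99.0 + t * (32.0 / 99.0));
}
```

Under that assumption both forms agree to rounding over [1, 2], and at x = 1 (t = -0.5) they give 98/99 = 0.98989899 instead of 0.50505051.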

><
> with 6.63 bits of accuracy.
><-------------------------------------------------------------------------------------------------------------------
> Why does your Remez get less precision than Chebychev ?

Absolute or relative precision? As I wrote above, I used relative
accuracy (a weight of x), and the maximum relative error is indeed
1/99 = 0.01010101.. A low bound for the relative error is better
as a starting point for Newton-Raphson, for example.

Optimizing for absolute error would have given a different result.
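
Why a bound on the relative error is the useful one for Newton-Raphson: if x_n = (1 + e)/a, then one step x_n*(2 - a*x_n) gives (1 - e^2)/a, so the relative error is squared on every iteration. A minimal sketch, seeded with the quadratic quoted above; the name `recip_nr` and the restriction of a to [1, 2] are assumptions made for illustration.

```c
#include <assert.h>

/* Approximate 1/a for a in [1, 2]: quadratic seed, then `iters` Newton steps. */
double recip_nr(double a, int iters)
{
    assert(a >= 1.0 && a <= 2.0 && iters >= 0);
    /* Seed with relative error of at most 1/99 in magnitude. */
    double x = 70.0 / 33.0 + a * (-16.0 / 11.0 + a * (32.0 / 99.0));
    for (int i = 0; i < iters; i++)
        x = x * (2.0 - a * x);  /* relative error e becomes -e*e */
    return x;
}
```

Starting from 1/99, the bound becomes 1/9801 after one step and about 1.04e-8 after two, with the error one-sided (never above the true reciprocal) after the first step.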

Re: Approximate reciprocals

<t1vnme$dhs$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=24526&group=comp.arch#24526

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!news.freedyn.de!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-30bd-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Approximate reciprocals
Date: Tue, 29 Mar 2022 19:46:23 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <t1vnme$dhs$1@newsreader4.netcologne.de>
References: <t1c154$j5t$1@dont-email.me>
<526d6018-1e28-44f7-86e6-89ccbda1f663n@googlegroups.com>
<t1fmss$i30$2@newsreader4.netcologne.de>
<5991ffcb-7857-49ba-9204-7201850b64a6n@googlegroups.com>
<t1helc$mtc$1@newsreader4.netcologne.de>
<b58e87e7-5cad-4867-835e-ea84b192b230n@googlegroups.com>
<t1i106$4jp$1@newsreader4.netcologne.de>
<4a14747b-b131-4619-af63-e87caa1186cen@googlegroups.com>
<5c553807-0d0a-45f4-8b4e-a52480359c8cn@googlegroups.com>
<t1mnc6$8lb$2@newsreader4.netcologne.de>
<ae3841d6-c26f-4703-99f8-9893cb7c4701n@googlegroups.com>
<t1nqv2$2b2$1@newsreader4.netcologne.de>
<c98cc15b-e3bf-4636-a2fc-0232aaf544f6n@googlegroups.com>
<285f1ec1-4a48-4ab7-953a-1a4a2b4f8094n@googlegroups.com>
<t1o3sm$977$1@newsreader4.netcologne.de>
<b48d1398-004f-4c6e-8c27-94c635d23c52n@googlegroups.com>
<cc1f77e5-2cfe-4d7d-b65b-4a2627818b43n@googlegroups.com>
<6e7449c4-7134-44ea-a550-7c130d88b975n@googlegroups.com>
<cfd303cb-9107-4887-a71f-dd593d7698ebn@googlegroups.com>
Injection-Date: Tue, 29 Mar 2022 19:46:23 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-30bd-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd6:30bd:0:7285:c2ff:fe6c:992d";
logging-data="13884"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Tue, 29 Mar 2022 19:46 UTC

Michael S <already5chosen@yahoo.com> schrieb:
> On Tuesday, March 29, 2022 at 12:44:55 AM UTC+3, MitchAlsup wrote:
>> On Monday, March 28, 2022 at 4:00:40 PM UTC-5, Michael S wrote:
>> > On Sunday, March 27, 2022 at 2:32:34 AM UTC+3, Michael S wrote:
>> > > On Sunday, March 27, 2022 at 1:25:29 AM UTC+3, Thomas Koenig wrote:
>> > > > Michael S <already...@yahoo.com> schrieb:
>> > > > > On my system(s) I see another very strange timing effect:
>> > > > > The speed of sqrtq() strongly depends on the range of the input and does it in unexpected manner:
>> > > > > it is much faster (1.5x to 2x) when input is *outside* the range of double precision!
>> > > > Source for the sqrt routine in libquadmath is here:
>> > > >
>> > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libquadmath/math/sqrtq.c;h=56ea5d3243c06df2276689849b448eebacae6367;hb=refs/heads/master
>> > > >
>> > > > Hmm.. they first do a test if it is within double precision range,
>> > > > with two Newton iterations, then a test if it is within long double
>> > > > range with a single Newton iteration, and if it is outside then
>> > > > they pick apart the number and run two Newton iterations.
>> > > >
>> > > > Were the numbers outside the range of double inside the range
>> > > > of long double?
>> > > Yes.
>> > > I didn't test "outside long double".
>> > > If "long double" is 80-bit x87 format then "outside" would be mostly sub-normals, right?
>> > > > That could explain things (and suggest
>> > > > an improvement).
>> > >
>> > > I think, the whole routine needs rewrite, rather than improvements.
>> > > The original author probably was not thinking very hard.
>> > Hmm, on my system it's not just slow. Sometimes it rounds wrongly.
>> > For example,
>> > sqrtq(1.06436563353081694552652292498817875110) => 1.03168097468685393293622695023074812668
>> > The correct answer is 1.03168097468685393293622695023074831927
>> <
>> This is a 1ULP error, perhaps simply improperly rounded
>
> Of course, it's 1ULP.
> But, according to my understanding, sqrt() is one of those very
> few primitives for which IEEE-754 not just recommends correct
> rounding, but requires correct rounding.

This is now https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105101 .

Re: Approximate reciprocals

<dc571956-dddd-469a-8b8e-30017e37d5bbn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24527&group=comp.arch#24527

X-Received: by 2002:ac8:5fd2:0:b0:2e1:b346:7505 with SMTP id k18-20020ac85fd2000000b002e1b3467505mr30046965qta.94.1648588636911;
Tue, 29 Mar 2022 14:17:16 -0700 (PDT)
X-Received: by 2002:a4a:3f56:0:b0:324:bc64:6713 with SMTP id
x22-20020a4a3f56000000b00324bc646713mr1785294ooe.50.1648588636626; Tue, 29
Mar 2022 14:17:16 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Mar 2022 14:17:16 -0700 (PDT)
In-Reply-To: <t1vkm4$ar2$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:cd31:c03:5819:947e;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:cd31:c03:5819:947e
References: <t1c154$j5t$1@dont-email.me> <t1qf0u$oko$1@dont-email.me>
<t1qkql$ui0$1@newsreader4.netcologne.de> <394168eb-53ed-49c2-a349-4035c3177361n@googlegroups.com>
<t1rm34$pg9$1@gioia.aioe.org> <7029a173-963d-402b-a184-642120b5e1b8n@googlegroups.com>
<4bdfaba8-898f-4c1e-8ca1-234bf4d3ffc8n@googlegroups.com> <t1vd17$5bj$1@newsreader4.netcologne.de>
<1d99080f-3c84-4a44-b2cf-271c2f3f7e90n@googlegroups.com> <t1vkm4$ar2$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <dc571956-dddd-469a-8b8e-30017e37d5bbn@googlegroups.com>
Subject: Re: Approximate reciprocals
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 29 Mar 2022 21:17:16 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 38
 by: MitchAlsup - Tue, 29 Mar 2022 21:17 UTC

On Tuesday, March 29, 2022 at 1:55:03 PM UTC-5, Thomas Koenig wrote:
> MitchAlsup <Mitch...@aol.com> schrieb:
> > On Tuesday, March 29, 2022 at 11:45:08 AM UTC-5, Thomas Koenig wrote:
>
> >> I've looked at the optimum Remez polynomial for 1/x in the range
> >> of 1..2. It is 70/33 - 16/11 * x + 32/99 * x**2, with a weight of x
> >> (so optimizing for the relative error).
> >>
> >> The maximum relative error is 1/99, reached at four points in the
> >> interval [1..2], so around 6.62 bits of minimum accuracy.
> ><
> > I stated way above::---------------------------------------------------------------------------------------
> > Equivalent to::
> ><
> > .33333×x^2 - 1.5×x + 2.1666
> ><
> > The Chebychev 2nd order Coefficients on the interval [1..2) are:
> ><
> > 0.32323232×x^2 -0.48484848×x + 0.66666667
> There is something wrong with your formula. f(1) would be
> 0.32323232 - 0.48484848 + 0.66666667 = 0.50505051, which does
> not even closely approximate 1/1 = 1. Is there some rescaling
> somewhere?
> ><
> > with 6.63 bits of accuracy.
> ><-------------------------------------------------------------------------------------------------------------------
> > Why does your Remez get less precision than Chebychev ?
> Absolute or relative precision? As I wrote above, I used relative
<
Mine was relative (too)
<
> accuracy (a weight of x), and the maximum relative error is indeed
> 1/99 = 0.01010101.. A low bound for the relative error is better
> as a starting point for Newton-Raphson, for example.
>
> Optimizing for absolute error would have given a different result.

Re: Approximate reciprocals

<ba5b15b5-0d95-4ce3-bd65-19efbdada971n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24528&group=comp.arch#24528

X-Received: by 2002:ac8:7d91:0:b0:2e0:6b65:c76c with SMTP id c17-20020ac87d91000000b002e06b65c76cmr29601967qtd.564.1648589138480;
Tue, 29 Mar 2022 14:25:38 -0700 (PDT)
X-Received: by 2002:a05:6870:9590:b0:de:27ca:c60c with SMTP id
k16-20020a056870959000b000de27cac60cmr702575oao.108.1648589138245; Tue, 29
Mar 2022 14:25:38 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Mar 2022 14:25:38 -0700 (PDT)
In-Reply-To: <t1vnme$dhs$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:7408:496f:7430:392a;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:7408:496f:7430:392a
References: <t1c154$j5t$1@dont-email.me> <526d6018-1e28-44f7-86e6-89ccbda1f663n@googlegroups.com>
<t1fmss$i30$2@newsreader4.netcologne.de> <5991ffcb-7857-49ba-9204-7201850b64a6n@googlegroups.com>
<t1helc$mtc$1@newsreader4.netcologne.de> <b58e87e7-5cad-4867-835e-ea84b192b230n@googlegroups.com>
<t1i106$4jp$1@newsreader4.netcologne.de> <4a14747b-b131-4619-af63-e87caa1186cen@googlegroups.com>
<5c553807-0d0a-45f4-8b4e-a52480359c8cn@googlegroups.com> <t1mnc6$8lb$2@newsreader4.netcologne.de>
<ae3841d6-c26f-4703-99f8-9893cb7c4701n@googlegroups.com> <t1nqv2$2b2$1@newsreader4.netcologne.de>
<c98cc15b-e3bf-4636-a2fc-0232aaf544f6n@googlegroups.com> <285f1ec1-4a48-4ab7-953a-1a4a2b4f8094n@googlegroups.com>
<t1o3sm$977$1@newsreader4.netcologne.de> <b48d1398-004f-4c6e-8c27-94c635d23c52n@googlegroups.com>
<cc1f77e5-2cfe-4d7d-b65b-4a2627818b43n@googlegroups.com> <6e7449c4-7134-44ea-a550-7c130d88b975n@googlegroups.com>
<cfd303cb-9107-4887-a71f-dd593d7698ebn@googlegroups.com> <t1vnme$dhs$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ba5b15b5-0d95-4ce3-bd65-19efbdada971n@googlegroups.com>
Subject: Re: Approximate reciprocals
From: already5...@yahoo.com (Michael S)
Injection-Date: Tue, 29 Mar 2022 21:25:38 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 60
 by: Michael S - Tue, 29 Mar 2022 21:25 UTC

On Tuesday, March 29, 2022 at 10:46:25 PM UTC+3, Thomas Koenig wrote:
> Michael S <already...@yahoo.com> schrieb:
> > On Tuesday, March 29, 2022 at 12:44:55 AM UTC+3, MitchAlsup wrote:
> >> On Monday, March 28, 2022 at 4:00:40 PM UTC-5, Michael S wrote:
> >> > On Sunday, March 27, 2022 at 2:32:34 AM UTC+3, Michael S wrote:
> >> > > On Sunday, March 27, 2022 at 1:25:29 AM UTC+3, Thomas Koenig wrote:
> >> > > > Michael S <already...@yahoo.com> schrieb:
> >> > > > > On my system(s) I see another very strange timing effect:
> >> > > > > The speed of sqrtq() strongly depends on the range of the input and does it in unexpected manner:
> >> > > > > it is much faster (1.5x to 2x) when input is *outside* the range of double precision!
> >> > > > Source for the sqrt routine in libquadmath is here:
> >> > > >
> >> > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libquadmath/math/sqrtq.c;h=56ea5d3243c06df2276689849b448eebacae6367;hb=refs/heads/master
> >> > > >
> >> > > > Hmm.. they first do a test if it is within double precision range,
> >> > > > with two Newton iterations, then a test if it is within long double
> >> > > > range with a single Newton iteration, and if it is outside then
> >> > > > they pick apart the number and run two Newton iterations.
> >> > > >
> >> > > > Were the numbers outside the range of double inside the range
> >> > > > of long double?
> >> > > Yes.
> >> > > I didn't test "outside long double".
> >> > > If "long double" is 80-bit x87 format then "outside" would be mostly sub-normals, right?
> >> > > > That could explain things (and suggest
> >> > > > an improvement).
> >> > >
> >> > > I think, the whole routine needs rewrite, rather than improvements.
> >> > > The original author probably was not thinking very hard.
> >> > Hmm, on my system it's not just slow. Sometimes it rounds wrongly.
> >> > For example,
> >> > sqrtq(1.06436563353081694552652292498817875110) => 1.03168097468685393293622695023074812668
> >> > The correct answer is 1.03168097468685393293622695023074831927
> >> <
> >> This is a 1ULP error, perhaps simply improperly rounded
> >
> > Of course, it's 1ULP.
> > But, according to my understanding, sqrt() is one of those very
> > few primitives for which IEEE-754 not just recommends correct
> > rounding, but requires correct rounding.
> This is now https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105101 .

Funny, testing against MPFR is exactly what I have been doing for the last hour.
But I didn't know about the existence of mpfr_get_float128().
I was converting back and forth via strings, like this:

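#include <quadmath.h>
#include <mpfr.h>

/* Convert via a hex-float string; exact if dst has at least 113 bits of precision. */
static void to_mpfr(mpfr_t dst, __float128 src)
{
    char buf[256];
    quadmath_snprintf(buf, sizeof(buf), "%Qa", src);
    mpfr_set_str(dst, buf, 0, MPFR_RNDN); /* base 0: detect the 0x prefix */
}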

My method is different: slower, but more robust.
I start from a pseudo-random float128 x and calculate res=sqrtq(x). Then I convert x to MPFR
with a 256-bit mantissa, calculate ref=mpfr_sqrt(), convert res to MPFR,
take the difference and compare it against 0.5 ULP.
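
The 0.5 ULP criterion can be illustrated at a smaller scale without MPFR or libquadmath. The sketch below is a swapped-in analogue, not the test described above: it checks a float result instead of a __float128 one, and uses double as the wider reference format. The midpoint between two adjacent floats has at most 25 significant bits, so its square fits exactly in a double and the comparison involves no rounding. The names `neighbour` and `sqrtf_result_is_correctly_rounded` are illustrative, and r is assumed positive and finite.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Adjacent float above (dir > 0) or below (dir <= 0) a positive, finite r. */
float neighbour(float r, int dir)
{
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits += (dir > 0) ? 1u : (uint32_t)-1;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Returns 1 if r is within 0.5 ULP of sqrt(x), i.e. sqrt(x) lies strictly
   between the midpoints separating r from its two neighbours. */
int sqrtf_result_is_correctly_rounded(float x, float r)
{
    assert(x > 0.0f && r > 0.0f);
    double lo = 0.5 * ((double)neighbour(r, -1) + (double)r);
    double hi = 0.5 * ((double)r + (double)neighbour(r, +1));
    /* Squaring avoids computing a reference square root at all. */
    return lo * lo < (double)x && (double)x < hi * hi;
}
```

A driver in the spirit of the test above would feed pseudo-random x together with the result of the routine under test, and count the inputs for which the function returns 0.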

But at this point I am quite convinced of what I said a couple of days ago: the quadmath implementation
is hopeless and should be completely rewritten.

Re: Approximate reciprocals

<t20qrb$4lp$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=24529&group=comp.arch#24529

Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-30bd-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Approximate reciprocals
Date: Wed, 30 Mar 2022 05:46:19 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <t20qrb$4lp$1@newsreader4.netcologne.de>
References: <t1c154$j5t$1@dont-email.me> <t1qf0u$oko$1@dont-email.me>
<t1qkql$ui0$1@newsreader4.netcologne.de>
<394168eb-53ed-49c2-a349-4035c3177361n@googlegroups.com>
<t1rm34$pg9$1@gioia.aioe.org>
<7029a173-963d-402b-a184-642120b5e1b8n@googlegroups.com>
<4bdfaba8-898f-4c1e-8ca1-234bf4d3ffc8n@googlegroups.com>
<t1vd17$5bj$1@newsreader4.netcologne.de>
<1d99080f-3c84-4a44-b2cf-271c2f3f7e90n@googlegroups.com>
<t1vkm4$ar2$1@newsreader4.netcologne.de>
<dc571956-dddd-469a-8b8e-30017e37d5bbn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 30 Mar 2022 05:46:19 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-30bd-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd6:30bd:0:7285:c2ff:fe6c:992d";
logging-data="4793"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Wed, 30 Mar 2022 05:46 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:
> On Tuesday, March 29, 2022 at 1:55:03 PM UTC-5, Thomas Koenig wrote:
>> MitchAlsup <Mitch...@aol.com> schrieb:
>> > On Tuesday, March 29, 2022 at 11:45:08 AM UTC-5, Thomas Koenig wrote:
>>
>> >> I've looked at the optimum Remez polynomial for 1/x in the range
>> >> of 1..2. It is 70/33 - 16/11 * x + 32/99 * x**2, with a weight of x
>> >> (so optimizing for the relative error).
>> >>
>> >> The maximum relative error is 1/99, reached at four points in the
>> >> interval [1..2], so around 6.62 bits of minimum accuracy.
>> ><
>> > I stated way above::---------------------------------------------------------------------------------------
>> > Equivalent to::
>> ><
>> > .33333×x^2 - 1.5×x + 2.1666
>> ><
>> > The Chebychev 2nd order Coefficients on the interval [1..2) are:
>> ><
>> > 0.32323232×x^2 -0.48484848×x + 0.66666667
>> There is something wrong with your formula. f(1) would be
>> 0.32323232 - 0.48484848 + 0.66666667 = 0.50505051, which does
>> not even closely approximate 1/1 = 1. Is there some rescaling
>> somewhere?
>> ><
>> > with 6.63 bits of accuracy.
>> ><-------------------------------------------------------------------------------------------------------------------
>> > Why does your Remez get less precision than Chebychev ?
>> Absolute or relative precision? As I wrote above, I used relative
><
> Mine was relative (too)

Easy enough to check. What was the actual formula you
arrived at? Not the one you wrote above, obviously.

(And for 6.63 vs. 6.62 - I rounded down log[2](1/99) not to
overstate the accuracy :-)
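
The two figures are consistent with the same quantity rounded in different directions. As a worked check of the arithmetic:

```latex
\log_2 99 \;=\; \frac{\ln 99}{\ln 2} \;\approx\; \frac{4.59512}{0.69315} \;\approx\; 6.6294
```

Rounding to nearest gives 6.63 bits; rounding down gives 6.62 bits.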

Re: Approximate reciprocals

<t20sj1$1dtk$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=24530&group=comp.arch#24530

Path: i2pn2.org!i2pn.org!aioe.org!EhtdJS5E9ITDZpJm3Uerlg.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Approximate reciprocals
Date: Wed, 30 Mar 2022 08:16:00 +0200
Organization: Aioe.org NNTP Server
Message-ID: <t20sj1$1dtk$1@gioia.aioe.org>
References: <t1c154$j5t$1@dont-email.me>
<t1fmss$i30$2@newsreader4.netcologne.de>
<5991ffcb-7857-49ba-9204-7201850b64a6n@googlegroups.com>
<t1helc$mtc$1@newsreader4.netcologne.de>
<b58e87e7-5cad-4867-835e-ea84b192b230n@googlegroups.com>
<t1i106$4jp$1@newsreader4.netcologne.de>
<4a14747b-b131-4619-af63-e87caa1186cen@googlegroups.com>
<5c553807-0d0a-45f4-8b4e-a52480359c8cn@googlegroups.com>
<t1mnc6$8lb$2@newsreader4.netcologne.de>
<ae3841d6-c26f-4703-99f8-9893cb7c4701n@googlegroups.com>
<t1nqv2$2b2$1@newsreader4.netcologne.de>
<c98cc15b-e3bf-4636-a2fc-0232aaf544f6n@googlegroups.com>
<285f1ec1-4a48-4ab7-953a-1a4a2b4f8094n@googlegroups.com>
<t1o3sm$977$1@newsreader4.netcologne.de>
<b48d1398-004f-4c6e-8c27-94c635d23c52n@googlegroups.com>
<cc1f77e5-2cfe-4d7d-b65b-4a2627818b43n@googlegroups.com>
<6e7449c4-7134-44ea-a550-7c130d88b975n@googlegroups.com>
<t1uvnn$1tc8$1@gioia.aioe.org>
<118dffef-fc0c-4607-9528-dc8afa64adafn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="47028"; posting-host="EhtdJS5E9ITDZpJm3Uerlg.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.11.1
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Wed, 30 Mar 2022 06:16 UTC

MitchAlsup wrote:
> On Tuesday, March 29, 2022 at 7:57:35 AM UTC-5, Terje Mathisen wrote:
>> MitchAlsup wrote:
>>> On Monday, March 28, 2022 at 4:00:40 PM UTC-5, Michael S wrote:
>>>> On Sunday, March 27, 2022 at 2:32:34 AM UTC+3, Michael S wrote:
>>>>> On Sunday, March 27, 2022 at 1:25:29 AM UTC+3, Thomas Koenig wrote:
>>>>>> Michael S <already...@yahoo.com> schrieb:
>>>>>>> On my system(s) I see another very strange timing effect:
>>>>>>> The speed of sqrtq() strongly depends on the range of the input and does it in unexpected manner:
>>>>>>> it is much faster (1.5x to 2x) when input is *outside* the range of double precision!
>>>>>> Source for the sqrt routine in libquadmath is here:
>>>>>>
>>>>>> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libquadmath/math/sqrtq.c;h=56ea5d3243c06df2276689849b448eebacae6367;hb=refs/heads/master
>>>>>>
>>>>>> Hmm.. they first do a test if it is within double precision range,
>>>>>> with two Newton iterations, then a test if it is within long double
>>>>>> range with a single Newton iteration, and if it is outside then
>>>>>> they pick apart the number and run two Newton iterations.
>>>>>>
>>>>>> Were the numbers outside the range of double inside the range
>>>>>> of long double?
>>>>> Yes.
>>>>> I didn't test "outside long double".
>>>>> If "long double" is 80-bit x87 format then "outside" would be mostly sub-normals, right?
>>>>>> That could explain things (and suggest
>>>>>> an improvement).
>>>>>
>>>>> I think, the whole routine needs rewrite, rather than improvements.
>>>>> The original author probably was not thinking very hard.
>>>> Hmm, on my system it's not just slow. Sometimes it rounds wrongly.
>>>> For example,
>>>> sqrtq(1.06436563353081694552652292498817875110) => 1.03168097468685393293622695023074812668
>>>> The correct answer is 1.03168097468685393293622695023074831927
>>> <
>>> This is a 1ULP error, perhaps simply improperly rounded (or one of the 1:million cases that require the second N-R
>>> iteration after you have 111 accurate fraction bits).
>>> <
>>> BTW: Itanium only gave 0.502 ULP FDIV accuracy (DP).
>> Do you have a numeric example? Any error at all in FDIV is simply a bug,
> <
> https://www.cl.cam.ac.uk/~jrh13/slides/gelato-25may05/slides.pdf

That pdf shows 0.50x accuracy for most of the transcendentals, but not a
single mention of doing the same on FDIV, rather the opposite:

> Exploiting non-atomic division
> On Itanium, there’s no atomic division operation, but frcpa returns a
> reciprocal approximation good to about 8 bits.
> Techniques largely due to Markstein allow this to be refined to an
> IEEE-correct quotient just using standard fma operations.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Approximate reciprocals

<t20t87$1k64$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=24531&group=comp.arch#24531

Path: i2pn2.org!i2pn.org!aioe.org!EhtdJS5E9ITDZpJm3Uerlg.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Approximate reciprocals
Date: Wed, 30 Mar 2022 08:27:18 +0200
Organization: Aioe.org NNTP Server
Message-ID: <t20t87$1k64$1@gioia.aioe.org>
References: <t1c154$j5t$1@dont-email.me>
<t1fmss$i30$2@newsreader4.netcologne.de>
<5991ffcb-7857-49ba-9204-7201850b64a6n@googlegroups.com>
<t1helc$mtc$1@newsreader4.netcologne.de>
<b58e87e7-5cad-4867-835e-ea84b192b230n@googlegroups.com>
<t1i106$4jp$1@newsreader4.netcologne.de>
<4a14747b-b131-4619-af63-e87caa1186cen@googlegroups.com>
<5c553807-0d0a-45f4-8b4e-a52480359c8cn@googlegroups.com>
<t1mnc6$8lb$2@newsreader4.netcologne.de>
<ae3841d6-c26f-4703-99f8-9893cb7c4701n@googlegroups.com>
<t1nqv2$2b2$1@newsreader4.netcologne.de>
<c98cc15b-e3bf-4636-a2fc-0232aaf544f6n@googlegroups.com>
<285f1ec1-4a48-4ab7-953a-1a4a2b4f8094n@googlegroups.com>
<t1o3sm$977$1@newsreader4.netcologne.de>
<b48d1398-004f-4c6e-8c27-94c635d23c52n@googlegroups.com>
<cc1f77e5-2cfe-4d7d-b65b-4a2627818b43n@googlegroups.com>
<6e7449c4-7134-44ea-a550-7c130d88b975n@googlegroups.com>
<cfd303cb-9107-4887-a71f-dd593d7698ebn@googlegroups.com>
<t1vnme$dhs$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="53444"; posting-host="EhtdJS5E9ITDZpJm3Uerlg.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.11.1
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Wed, 30 Mar 2022 06:27 UTC

Thomas Koenig wrote:
> Michael S <already5chosen@yahoo.com> schrieb:
>>
>> Of course, it's 1ULP.
>> But, according to my understanding, sqrt() is one of those very
>> few primitives for which IEEE-754 not just recommends correct
>> rounding, but requires correct rounding.
>
> This is now https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105101 .
>

Exact rounding rules are easy: All 5 core primitives
(FADD/FSUB/FMUL/FDIV/FSQRT) for which it was known back in 1978 that it
was relatively easy to always provide exact rounding, are in fact
required to do so.

Having a square root (for any supported fp size) which does not obey
this is in fact a breaking bug.

It is of course perfectly legal to provide an alternative "fastmath"
library which disobeys these rules, but at that point you are not
providing IEEE 754 math. E.g. Cray did this at some point, right?

There were similar issues with IBM hex math, but both of those predate 1978.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Approximate reciprocals

<ec70026f-a1bf-489a-ab8e-5a56cc240439n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24532&group=comp.arch#24532

X-Received: by 2002:ad4:5848:0:b0:441:4092:c385 with SMTP id de8-20020ad45848000000b004414092c385mr30393067qvb.24.1648623504186;
Tue, 29 Mar 2022 23:58:24 -0700 (PDT)
X-Received: by 2002:a05:6808:2185:b0:2d9:ebf0:fb66 with SMTP id
be5-20020a056808218500b002d9ebf0fb66mr1174242oib.69.1648623503940; Tue, 29
Mar 2022 23:58:23 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 29 Mar 2022 23:58:23 -0700 (PDT)
In-Reply-To: <t20t87$1k64$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:6947:3c86:73e1:a64e;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:6947:3c86:73e1:a64e
References: <t1c154$j5t$1@dont-email.me> <t1fmss$i30$2@newsreader4.netcologne.de>
<5991ffcb-7857-49ba-9204-7201850b64a6n@googlegroups.com> <t1helc$mtc$1@newsreader4.netcologne.de>
<b58e87e7-5cad-4867-835e-ea84b192b230n@googlegroups.com> <t1i106$4jp$1@newsreader4.netcologne.de>
<4a14747b-b131-4619-af63-e87caa1186cen@googlegroups.com> <5c553807-0d0a-45f4-8b4e-a52480359c8cn@googlegroups.com>
<t1mnc6$8lb$2@newsreader4.netcologne.de> <ae3841d6-c26f-4703-99f8-9893cb7c4701n@googlegroups.com>
<t1nqv2$2b2$1@newsreader4.netcologne.de> <c98cc15b-e3bf-4636-a2fc-0232aaf544f6n@googlegroups.com>
<285f1ec1-4a48-4ab7-953a-1a4a2b4f8094n@googlegroups.com> <t1o3sm$977$1@newsreader4.netcologne.de>
<b48d1398-004f-4c6e-8c27-94c635d23c52n@googlegroups.com> <cc1f77e5-2cfe-4d7d-b65b-4a2627818b43n@googlegroups.com>
<6e7449c4-7134-44ea-a550-7c130d88b975n@googlegroups.com> <cfd303cb-9107-4887-a71f-dd593d7698ebn@googlegroups.com>
<t1vnme$dhs$1@newsreader4.netcologne.de> <t20t87$1k64$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ec70026f-a1bf-489a-ab8e-5a56cc240439n@googlegroups.com>
Subject: Re: Approximate reciprocals
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 30 Mar 2022 06:58:24 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 15
 by: Quadibloc - Wed, 30 Mar 2022 06:58 UTC

On Wednesday, March 30, 2022 at 12:27:22 AM UTC-6, Terje Mathisen wrote:

> Exact rounding rules are easy: All 5 core primitives
> (FADD/FSUB/FMUL/FDIV/FSQRT) for which it was known back in 1978 that it
> was relatively easy to always provide exact rounding,

And the thing is, though, that for division (and, likely, square root) if you use
a fast algorithm, exact rounding is no longer easy, but requires significant extra
hardware, or a detectable amount of extra time - and, yes, one extra cycle is
detectable.

For addition, subtraction, and multiplication - only - is the extra effort for exact
rounding something that takes only a few gates _and_ a few gate delays that
will normally be swallowed up in the length of a cycle.

John Savard

Re: Approximate reciprocals

<t215f3$9o7$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=24533&group=comp.arch#24533

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-30bd-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Approximate reciprocals
Date: Wed, 30 Mar 2022 08:47:31 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <t215f3$9o7$1@newsreader4.netcologne.de>
References: <t1c154$j5t$1@dont-email.me>
<t1fmss$i30$2@newsreader4.netcologne.de>
<5991ffcb-7857-49ba-9204-7201850b64a6n@googlegroups.com>
<t1helc$mtc$1@newsreader4.netcologne.de>
<b58e87e7-5cad-4867-835e-ea84b192b230n@googlegroups.com>
<t1i106$4jp$1@newsreader4.netcologne.de>
<4a14747b-b131-4619-af63-e87caa1186cen@googlegroups.com>
<5c553807-0d0a-45f4-8b4e-a52480359c8cn@googlegroups.com>
<t1mnc6$8lb$2@newsreader4.netcologne.de>
<ae3841d6-c26f-4703-99f8-9893cb7c4701n@googlegroups.com>
<t1nqv2$2b2$1@newsreader4.netcologne.de>
<c98cc15b-e3bf-4636-a2fc-0232aaf544f6n@googlegroups.com>
<285f1ec1-4a48-4ab7-953a-1a4a2b4f8094n@googlegroups.com>
<t1o3sm$977$1@newsreader4.netcologne.de>
<b48d1398-004f-4c6e-8c27-94c635d23c52n@googlegroups.com>
<cc1f77e5-2cfe-4d7d-b65b-4a2627818b43n@googlegroups.com>
<6e7449c4-7134-44ea-a550-7c130d88b975n@googlegroups.com>
<cfd303cb-9107-4887-a71f-dd593d7698ebn@googlegroups.com>
<t1vnme$dhs$1@newsreader4.netcologne.de> <t20t87$1k64$1@gioia.aioe.org>
Injection-Date: Wed, 30 Mar 2022 08:47:31 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-30bd-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd6:30bd:0:7285:c2ff:fe6c:992d";
logging-data="9991"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Wed, 30 Mar 2022 08:47 UTC

Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
> Thomas Koenig wrote:
>> Michael S <already5chosen@yahoo.com> schrieb:
>>>
>>> Of course, it's 1ULP.
>>> But, according to my understanding, sqrt() is one of those very
>>> few primitives for which IEEE-754 not just recommends correct
>>> rounding, but requires correct rounding.
>>
>> This is now https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105101 .
>>
>
> Exact rounding rules are easy: All 5 core primitives
> (FADD/FSUB/FMUL/FDIV/FSQRT) for which it was known back in 1978 that it
> was relatively easy to always provide exact rounding, are in fact
> required to do so.
>
> Having a square root (for any supported fp size) which does not obey
> this is in fact a breaking bug.

It's a wrong-code bug (and I have marked it as such). Compilers and
libraries are known to have these, even though, in an ideal world,
they should not exist.

It is certainly too late to fix for gcc12 (regression-mode only),
but gcc13 should be doable.

Hmm... for those who are in the know about IEEE standards (which
cost money :-(): Does sqrt need to follow rounding modes?

Re: Approximate reciprocals

<03152646-b508-400f-892c-d0fda7212c85n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24535&group=comp.arch#24535

X-Received: by 2002:a05:622a:1012:b0:2e1:e7f3:5c89 with SMTP id d18-20020a05622a101200b002e1e7f35c89mr31943690qte.550.1648637520512;
Wed, 30 Mar 2022 03:52:00 -0700 (PDT)
X-Received: by 2002:a05:6870:d249:b0:dd:ada6:736b with SMTP id
h9-20020a056870d24900b000ddada6736bmr1774957oac.27.1648637520176; Wed, 30 Mar
2022 03:52:00 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 30 Mar 2022 03:51:59 -0700 (PDT)
In-Reply-To: <t215f3$9o7$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <t1c154$j5t$1@dont-email.me> <t1fmss$i30$2@newsreader4.netcologne.de>
<5991ffcb-7857-49ba-9204-7201850b64a6n@googlegroups.com> <t1helc$mtc$1@newsreader4.netcologne.de>
<b58e87e7-5cad-4867-835e-ea84b192b230n@googlegroups.com> <t1i106$4jp$1@newsreader4.netcologne.de>
<4a14747b-b131-4619-af63-e87caa1186cen@googlegroups.com> <5c553807-0d0a-45f4-8b4e-a52480359c8cn@googlegroups.com>
<t1mnc6$8lb$2@newsreader4.netcologne.de> <ae3841d6-c26f-4703-99f8-9893cb7c4701n@googlegroups.com>
<t1nqv2$2b2$1@newsreader4.netcologne.de> <c98cc15b-e3bf-4636-a2fc-0232aaf544f6n@googlegroups.com>
<285f1ec1-4a48-4ab7-953a-1a4a2b4f8094n@googlegroups.com> <t1o3sm$977$1@newsreader4.netcologne.de>
<b48d1398-004f-4c6e-8c27-94c635d23c52n@googlegroups.com> <cc1f77e5-2cfe-4d7d-b65b-4a2627818b43n@googlegroups.com>
<6e7449c4-7134-44ea-a550-7c130d88b975n@googlegroups.com> <cfd303cb-9107-4887-a71f-dd593d7698ebn@googlegroups.com>
<t1vnme$dhs$1@newsreader4.netcologne.de> <t20t87$1k64$1@gioia.aioe.org> <t215f3$9o7$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <03152646-b508-400f-892c-d0fda7212c85n@googlegroups.com>
Subject: Re: Approximate reciprocals
From: already5...@yahoo.com (Michael S)
Injection-Date: Wed, 30 Mar 2022 10:52:00 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Michael S - Wed, 30 Mar 2022 10:51 UTC

On Wednesday, March 30, 2022 at 11:47:34 AM UTC+3, Thomas Koenig wrote:
> Terje Mathisen <terje.m...@tmsw.no> schrieb:
> > Thomas Koenig wrote:
> >> Michael S <already...@yahoo.com> schrieb:
> >>>
> >>> Of course, it's 1ULP.
> >>> But, according to my understanding, sqrt() is one of those very
> >>> few primitives for which IEEE-754 not just recommends correct
> >>> rounding, but requires correct rounding.
> >>
> >> This is now https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105101 .
> >>
> >
> > Exact rounding rules are easy: All 5 core primitives
> > (FADD/FSUB/FMUL/FDIV/FSQRT) for which it was known back in 1978 that it
> > was relatively easy to always provide exact rounding, are in fact
> > required to do so.
> >
> > Having a square root (for any supported fp size) which does not obey
> > this is in fact a breaking bug.
> It's a wrong-code bug (and I have marked it as such). Compilers and
> libraries are known to have these, even though, in an ideal world,
> they should not exist.
>
> It is certainly too late to fix for gcc12 (regression-mode only),
> but gcc13 should be doable.
>
> Hmm... for those who are in the know about IEEE standards (which
> cost money :-(): Does sqrt need to follow rounding modes?

Terje is (or was until recently?) a member of the IEEE-754 standard committee,
so you can take his answer (i.e. Yes) in the above post as authoritative.
