novaBBS - comp.arch - Re: Approximate reciprocals

Re: Approximate reciprocals

<acc5d2ac-54b0-40f2-8eb8-f47877b68cb2n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24666&group=comp.arch#24666

Newsgroups: comp.arch
Date: Sun, 10 Apr 2022 07:53:30 -0700 (PDT)
In-Reply-To: <STA4K.349142$Gojc.88544@fx99.iad>
References: <16su4hdjofh949len5eha1ncb73r4av8oe@4ax.com> <memo.20220408160456.22520T@jgd.cix.co.uk>
<tmg35h5jeb295i594psbeih9dlrjik3cvs@4ax.com> <3a582037-b580-45a1-9262-c4e0a4ced2ban@googlegroups.com>
<5fd7f105-c8ce-48cf-8cfb-13e98a584649n@googlegroups.com> <2022Apr10.103214@mips.complang.tuwien.ac.at>
<STA4K.349142$Gojc.88544@fx99.iad>
Message-ID: <acc5d2ac-54b0-40f2-8eb8-f47877b68cb2n@googlegroups.com>
Subject: Re: Approximate reciprocals
From: already5...@yahoo.com (Michael S)
Injection-Date: Sun, 10 Apr 2022 14:53:31 +0000

by: Michael S - Sun, 10 Apr 2022 14:53 UTC

On Sunday, April 10, 2022 at 4:24:07 PM UTC+3, EricP wrote:
> Anton Ertl wrote:
> > Quadibloc <jsa...@ecn.ab.ca> writes:
> >> You have heard the crazy way *I* would have designed a processor. With separate
> >> pipelines for single precision and double precision and quad precision and one-and-a-half
> >> precision.
> >>
> >> But if one were to put a 60-bit floating-point number down the double precision pipeline,
> >> no, one would not have to drain it to change the mode. (The double precision pipeline
> >> would actually be designed for 72-bit floats, which would also use it. 80-bit temporary
> >> reals would go down the double precision pipeline, unless there was one for 96-bit
> >> floats.)
> >>
> >> So mixing precisions would make your programs go *faster*, because it would utilize
> >> the other pipelines that otherwise would not be used. The very opposite of Intel!
> >
> > In the last weeks I have noticed unusually many people posting
> > over-long lines (i.e., longer than 80 chars, and ideally you should
> > limit the lines to 70-72 chars to leave room for quoting). Is there
> > some new attack on Usenet conventions coming out from Google?
> >
> > Anyway, my guess for the reason for slow precision-setting is that
> > Intel and AMD microarchitects want the precision setting to be known
> > to the decoder, so it can deliver the precision as part of the uop.
> > This requires that when setting the precision, decoding of subsequent
> > instructions starts from scratch. An alternative would be to deliver
> > the precision as another input in the OoO engine, but that would
> > require additional resources in the OoO engine, and apparently they
> > thought that spending these resources elsewhere would buy more
> > performance.
> >
> > Concerning your separate-pipelines idea, in that setup it's even more
> > advantageous to know the precision early in instruction processing:
> > You can have separate queues/ports for the different precisions and
> > steer the instructions to these ports early, instead of having common
> > FP queues, and steering the instructions to the right pipelines only
> > when all the data (including precision) is in; ok, you could also have
> > two stages of queues, but that introduces additional complication,
> > area, and probably latency.
> >
> > Also, even if switching is fast, how frequent is code with mixed
> > precision? E.g., in DGEMM you only use double precision operations,
> > while in SGEMM you only use single-precision operations.
> >
> > Bottom line: There's a reason why Intel and AMD are designing their
> > CPUs the way they are.
> The x87 FP Control Word has flags to control
> Infinity, Rounding, Precision, and Exception masking.
>
> If you are changing some bits and not others, then you have to
> store the current value, mask in your changes, and load it with FLDCW.
> That store can be either the synchronous FSTCW or the asynchronous FNSTCW
> (remember x87 is a separate co-processor with long-running
> transcendentals, and FSTCW is actually an assembler macro
> instruction that emits FWAIT, FNSTCW).
>
> If any of the exception mask flags change then we may need to
> synchronize with current and pending exceptions status.
> So this ties changes in the control word into current
> and future state in the status register.
>
> Then there is the issue that the new CW comes from the
> data path, so merging the CW bits into the uOp bits in the decoder
> implies some kind of front-end delay between when the FLDCW decodes
> and when the new value propagates back to decode.
>
> An alternative design merges the FPCW flags into the uOp in the
> FPU itself, but then we have to deal with FP instructions
> launching out-of-order and make sure the right set
> of flags goes to the right uOp.
> For this I would have a small set of physical FPCW registers,
> 4 should be sufficient, and a renamer for the one logical CW register.
> This makes the CW bits a uOp data dependency like other FP operands,
> and it would require its own wake-up matrix and forwarding bus,
> though these would be much simpler than the normal operand support logic.
> The then-current (future) CW bits merge into the uOp when
> it is launched for execution.
>
> With this a write to the FPCW would only stall as long as it took
> the new CW value to arrive at its CW physical register or appear
> on its forwarding bus, so ideally allowing back-to-back execution.
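
The read-modify-write EricP describes can be sketched in Python. The sketch assumes the standard x87 control-word layout: exception masks in bits 0-5, precision control in bits 8-9, rounding control in bits 10-11, and infinity control in bit 12. The helper names are hypothetical. In real code the saved value would come from FNSTCW (or the FSTCW macro), and the result would go back through FLDCW. This sketch only does the bit manipulation.

```python
# Hypothetical helpers; field positions follow the x87 FPU control word.
EXC_MASKS = 0x003F                 # bits 0-5: IM, DM, ZM, OM, UM, PM
PC_SHIFT, PC_MASK = 8, 0x0300      # bits 8-9: precision control
RC_SHIFT, RC_MASK = 10, 0x0C00     # bits 10-11: rounding control
IC_BIT = 0x1000                    # bit 12: infinity control (only meaningful on the 287)

PC_SINGLE, PC_DOUBLE, PC_EXTENDED = 0b00, 0b10, 0b11   # 0b01 is reserved
RC_NEAREST, RC_DOWN, RC_UP, RC_ZERO = 0b00, 0b01, 0b10, 0b11

FNINIT_DEFAULT = 0x037F  # all exceptions masked, 64-bit precision, round to nearest


def replace_field(cw: int, mask: int, shift: int, value: int) -> int:
    """Clear one field of the saved CW and insert a new value (the 'mask in' step)."""
    return (cw & ~mask & 0xFFFF) | ((value << shift) & mask)


def set_precision(cw: int, pc: int) -> int:
    return replace_field(cw, PC_MASK, PC_SHIFT, pc)


def set_rounding(cw: int, rc: int) -> int:
    return replace_field(cw, RC_MASK, RC_SHIFT, rc)


def exception_masks_changed(old: int, new: int) -> bool:
    """True if the write touches mask bits, the case that may need to be
    synchronized with current and pending exception status."""
    return (old ^ new) & EXC_MASKS != 0


if __name__ == "__main__":
    saved = FNINIT_DEFAULT                       # stands in for FNSTCW
    trunc = set_rounding(saved, RC_ZERO)         # e.g. for a C float-to-int cast
    print(hex(trunc))                            # 0xf7f
    print(exception_masks_changed(saved, trunc)) # False
    print(hex(set_precision(saved, PC_DOUBLE)))  # 0x27f
```

The classic pattern is FLDCW `trunc`, then the conversion, then FLDCW `saved`. That pattern changes only the rounding field, so the exception masks are left alone.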

According to my understanding, what you described as the alternative design
is exactly what Intel did from Yonah up to Nehalem (excluding Bonnell, of
course).
In Sandy Bridge and later, the limit of 4 renaming registers seems to be gone.
Probably, by now the x87 Control Word is stored in one of the big PRFs.
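
Below is a toy Python model of the rename scheme EricP proposes. Michael S believes Yonah through Nehalem used this scheme with 4 physical registers. The class and method names are hypothetical. Values are written at rename time rather than when the FLDCW executes, and wake-up and forwarding timing are not modelled.

```python
from collections import deque


class CWRenamer:
    """Toy model (hypothetical): one architectural x87 control word renamed
    onto a small pool of physical CW registers."""

    def __init__(self, initial_cw: int, pool_size: int = 4):
        self.values = [None] * pool_size          # contents of each physical CW register
        self.free = deque(range(pool_size))       # free list
        self.current = self.free.popleft()        # physical register holding the newest CW
        self.values[self.current] = initial_cw
        self.superseded = deque()                 # old registers, oldest FLDCW first

    def rename_fldcw(self, new_cw: int):
        """Give an FLDCW a fresh physical register; None means the front end stalls."""
        if not self.free:
            return None
        phys = self.free.popleft()
        # Simplification: hardware would write this when the FLDCW executes
        # and wake up waiting uops via the CW forwarding bus.
        self.values[phys] = new_cw
        self.superseded.append(self.current)
        self.current = phys
        return phys

    def rename_fp_uop(self) -> int:
        """An FP uop records its CW source tag, just like a register operand."""
        return self.current

    def retire_oldest_fldcw(self) -> None:
        """Retiring an FLDCW frees the register it superseded; every uop that
        read that register is older and has therefore already retired.
        Precondition: at least one FLDCW is in flight."""
        self.free.append(self.superseded.popleft())

    def read(self, phys: int) -> int:
        """Value a uop with this tag merges in when it is launched."""
        return self.values[phys]


if __name__ == "__main__":
    r = CWRenamer(0x037F)
    a = r.rename_fp_uop()      # issued under the default CW
    r.rename_fldcw(0x0F7F)     # switch to round-toward-zero
    b = r.rename_fp_uop()
    r.rename_fldcw(0x037F)     # restore
    c = r.rename_fp_uop()
    # Launch order c, a, b: each uop still sees the CW that was current for it.
    print([hex(r.read(p)) for p in (c, a, b)])  # ['0x37f', '0x37f', '0xf7f']
```

With a pool of 4, one register always holds the live CW. A fourth FLDCW issued before any of the previous three has retired therefore gets None, which models running out of physical CW registers.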

AMD Bulldozer and Zen behave similarly to Sandy Bridge and its derivatives.

AMD K7/K8/K10 do not rename the Control Word, but somehow FLDCW is slow
only when the new value differs from the old one.

Pentium 4 went in the opposite direction. The x87 Control Word is not renamed,
but its new value is predicted to be the same as the value in effect before
the previous FLDCW. When the prediction fails, everything is flushed and the
CPU goes through a very slow replay. Agner Fog says 143 clocks, but these
things are hard to measure. However, in the common scenario of the late 90s
and early 00s, software temporarily changes the Control Word and then
restores the previous value, so this primitive prediction works well.
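
The two non-renaming policies described above can be compared on a trace of FLDCW values with a small Python model. It encodes only the one-line descriptions given here:

* K7/K8/K10: FLDCW is slow only when the value changes.
* Pentium 4: predict the value in effect before the previous FLDCW.

The function names are hypothetical, as is the assumption about the predictor's initial state. No cycle costs are modelled.

```python
from typing import Sequence


def k7_slow_count(initial_cw: int, writes: Sequence[int]) -> int:
    """K7/K8/K10 as described: an FLDCW is slow only when the value changes."""
    slow, cur = 0, initial_cw
    for new in writes:
        if new != cur:
            slow += 1
        cur = new
    return slow


def p4_mispredict_count(initial_cw: int, writes: Sequence[int]) -> int:
    """Pentium 4 as described: predict that each FLDCW loads the value that
    was in effect before the previous FLDCW; a miss means flush and replay."""
    misses = 0
    cur = initial_cw
    before_prev = initial_cw   # assumption: the predictor starts with the initial CW
    for new in writes:
        if new != before_prev:
            misses += 1
        before_prev, cur = cur, new
    return misses


if __name__ == "__main__":
    DEFAULT, TRUNC = 0x037F, 0x0F7F
    # Save/truncate/restore pattern, e.g. three float-to-int conversions.
    trace = [TRUNC, DEFAULT] * 3
    print(k7_slow_count(DEFAULT, trace))        # 6
    print(p4_mispredict_count(DEFAULT, trace))  # 1
```

On the toggle pattern the P4-style predictor misses only once, while the K7-style rule pays on every write. The two counts measure different penalties, though: a slow FLDCW versus a full flush and replay. They are not directly comparable in cycles.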

"Small" x86 cores (AMD Bobcat and Jaguar, Intel Silvermont and Goldmont) do
not rename the x87 Control Word and don't try to be smart about it in any way.
