
Subject / Author
* FMA in implementation of FFT - reaction to last year's post of Mitch (Michael S)
`* Re: FMA in implementation of FFT - reaction to last year's post of Mitch (MitchAlsup)
 `* Re: FMA in implementation of FFT - reaction to last year's post of Mitch (Michael S)
  `* Re: FMA in implementation of FFT - reaction to last year's post of Mitch (MitchAlsup)
   `* Re: FMA in implementation of FFT - reaction to last year's post of Mitch (Bernd Linsel)
    `- Re: FMA in implementation of FFT - reaction to last year's post of Mitch (MitchAlsup)

FMA in implementation of FFT - reaction to last year's post of Mitch

<6040c048-2d3e-4b68-80fa-8887736615a5n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=20447&group=comp.arch#20447

Newsgroups: comp.arch
 by: Michael S - Tue, 14 Sep 2021 15:22 UTC

On Friday, April 24, 2020 at 12:10:58 AM UTC+3, MitchAlsup wrote:
> On Thursday, April 23, 2020 at 2:54:54 PM UTC-5, Terje Mathisen wrote:
> > David Brown wrote:
> > > On 23/04/2020 10:25, Tom Gardner wrote:
> > >> On 23/04/20 08:44, David Brown wrote:
> > >>> On 22/04/2020 20:49, Tom Gardner wrote:
> > >>>>
> > >>>> The HPC mob have complained that is problematic for
> > >>>> them w.r.t. floating point behaviour.
> > >>>>
> > >>>
> > >>> If the language makes restrictions on floating point details, then I
> > >>> can see how that might annoy people. Some floating point users need
> > >>> highly predictable results that match IEEE specifications across
> > >>> different platforms - others want the highest possible speeds and
> > >>> will accept minor differences in results. If a language is
> > >>> specified to suit one of these groups, it is bound to annoy the other.
> > >>
> > >> There's an argument that if you are using floating point is never
> > >> exact and it does no harm to expose that inexactitude.
> > >
> > > Yes. That's my attitude for the code I write.
> >
> > Good for you!
> >
> > >
> > > There is another argument that says floating point should work exactly
> > > the same each time, and on all platforms - it is not an exact model of
> > > real numbers, but needs to be consistent and tightly specified nonetheless.
> > >
> > > Both attitudes are fine IMHO, and have their uses.
> >
> > Individual operations needs to be both consistent and reproducible, but
> > HPC style programming in the large have to accept that when the order of
> > operation is effectively random, you have to accept some limits to the
> > reproducibility of the final results.
> > >
> > >>
> > >> Caution: I am not an FP expert; so this is from memory
> > >> and is possibly misunderstood...
> > >> More importantly, if you have higher "internal precision"
> > >> during operations (e.g. 80-bit for 64-bit FP numbers), then
> > >> the end result of a calculation can be more exact - but the
> > >> IEEE spec precludes that.
> >
> > It does not! Java however do so.
> >
> > I.e. partly because the ieee754 spec was written in parallel with the
> > 8087 fpu, more or less everything Intel's 80-bit format can do is legal. :-)
> >
> >
> > OTOH, since all x64 fp math is done using the SSE regs, even that
> > platform is now in the exact reproducibility camp.
> >
> > Actually, there are still a few open slots for ulp offsets to occur
> > legally: The one I know best is related to rounding of
> > subnormal/denormal numbers where you can round first, before detecting
> > that the result is subnormal, or you can do it after normalization.
> >
> > The latter is the only "Right Way" (TM) to do it imho, but since both
> > exists in current cpus, both options are grandfathered in.
> >
> >
> > > Yes. 80-bit FP dates mainly back to floating point coprocessors, I
> > > think (I'm sure others here know vastly more of floating point history
> > > than I do), which are pretty much gone now. But you do get other cases
> > > like fused MAC that can give more accurate results than separate
> > > operations.
> >
> > FMAC is completely specified, but compilers can't really use it unless
> > language lawyers allow them to, i.e when you do
> >
> > s = a*b+c;
> >
> > without an internal rounding after the multiplication but instead keep
> > the exact product and feed it into the adder, you are explicitly
> > breaking the "as-if" rule since the results you get are different.
> Fast Fourier Transforms have an accuracy dependency that precludes
> using FMACs for the last 2 Multiply ADDs in the inner loop::
>
> DO 30 I = J, N, N1
> XT = RE(I) - RO(I)
> YT = IM(I) - IN(I)
> RE(I) = RE(I) + RO(I)
> IM(I) = IM(I) + IN(I)
> RO(I) = XT*C - YT*S // no FMAc here
> IN(I) = XT*S + YT*C // no FMAC here
> 30 CONTINUE
>
> The problem is that the COS() term and the SIN() term are balanced in
> a way* that causes FMAC to lose accuracy over the 2 MUL and 1 ADD
> version. This balance is inherent in S**2 + C**2 = 1.000000000000000

I don't understand why what's written above is a bigger problem than any other loss of accuracy that inevitably happens when implementing an FFT in fixed-width FP arithmetic.

Mitch, can you provide an example of an input vector for which FFT with FMA really breaks accuracy in a significant/unusual way?

Also, your loop appears to be of the Radix-2 Decimation-In-Frequency variety.
If the problem really exists, does it apply to other common FFT algorithms,
e.g. Radix-4 Decimation-In-Frequency or Radix-2 Decimation-In-Time?

Re: FMA in implementation of FFT - reaction to last year's post of Mitch

<f243ea19-cbad-4e5a-8bca-bd7212840b74n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=20453&group=comp.arch#20453

 by: MitchAlsup - Tue, 14 Sep 2021 17:34 UTC

On Tuesday, September 14, 2021 at 10:22:59 AM UTC-5, Michael S wrote:
> On Friday, April 24, 2020 at 12:10:58 AM UTC+3, MitchAlsup wrote:
> > [snip: re-quoted 2020 discussion of FP reproducibility and FMA contraction]
> > Fast Fourier Transforms have an accuracy dependency that precludes
> > using FMACs for the last 2 Multiply ADDs in the inner loop::
> >
> > DO 30 I = J, N, N1
> > XT = RE(I) - RO(I)
> > YT = IM(I) - IN(I)
> > RE(I) = RE(I) + RO(I)
> > IM(I) = IM(I) + IN(I)
> > RO(I) = XT*C - YT*S // no FMAc here
> > IN(I) = XT*S + YT*C // no FMAC here
> > 30 CONTINUE
> >
> > The problem is that the COS() term and the SIN() term are balanced in
> > a way* that causes FMAC to lose accuracy over the 2 MUL and 1 ADD
> > version. This balance is inherent in S**2 + C**2 = 1.000000000000000
>
> I don't understand why what's written above is bigger problem than any other loss of accuracy that inevitably happens during implementation of FFT via fixed-width FP arithmetic.
<
It has to do with the balance between the SIN terms and the COS terms.
Performing an FMUL followed by an FMAC results in the first term being
carried out in 53 bits of precision, while the second is performed with 106
bits of FMUL precision and 53 bits of FADD precision.
<
This creates a bias in the roundings that FMUL+FMUL does not.
>
> Mitch, can you provide an example of input vector for which FFT with FMA really breaks accuracy in significant/unusual way?
>
> Also, your loop appears to be of Radix2 Decimation-In-Frequency variety.
> If the problem really exist, does it apply to other common FFT algorithms,
> e.g. Radix4 Decimation-In-Frequency or Radix2 Decimation-In-Time?
<
My guess is that the bias is reduced, but intermixing FMUL and FMAC
tends to create opportunities for rounding bias.

Re: FMA in implementation of FFT - reaction to last year's post of Mitch

<05cbf436-830a-4a54-8b51-ab693a5ceb87n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=20454&group=comp.arch#20454

 by: Michael S - Tue, 14 Sep 2021 17:39 UTC

On Tuesday, September 14, 2021 at 8:34:08 PM UTC+3, MitchAlsup wrote:
> On Tuesday, September 14, 2021 at 10:22:59 AM UTC-5, Michael S wrote:
> > [snip: re-quoted 2020 thread and FFT inner loop]
> <
> It has to do with the balance between the SIN terms and to COS terms.
> Performing a FMUL followed by an FMAC results in the first term being
> carried out in 53-bit of precision, while the second is performed in 106
> bits of FMUL precision and 53-bits of FADD precision.
> <
> This created a bias in the roundings that FMUL+FMUL does not.
> >
> > Mitch, can you provide an example of input vector for which FFT with FMA really breaks accuracy in significant/unusual way?
> >
> > Also, your loop appears to be of Radix2 Decimation-In-Frequency variety.
> > If the problem really exist, does it apply to other common FFT algorithms,
> > e.g. Radix4 Decimation-In-Frequency or Radix2 Decimation-In-Time?
> <
> My guess is that the bias is reduced, but intermixing of FMUL and FMAC
> tends to create opportunity for rounding bias.

So far, I am not convinced at all.
Did you come to the conclusion that the problem exists by yourself, or were you told about it by somebody else?

Re: FMA in implementation of FFT - reaction to last year's post of Mitch

<599b8dbd-d904-4cfe-8b71-3b45646317c7n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=20466&group=comp.arch#20466

 by: MitchAlsup - Tue, 14 Sep 2021 20:53 UTC

On Tuesday, September 14, 2021 at 12:39:58 PM UTC-5, Michael S wrote:
> [snip: re-quoted thread and my explanation of the FMUL+FMAC rounding bias]
> So far, I am not convinced at all.
> Did you come to the conclusion that the problem exists by yourself, or were you told about it by somebody else?
<
I was told of the problem by someone else.

Re: FMA in implementation of FFT - reaction to last year's post of Mitch

<shr2i1$mha$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=20467&group=comp.arch#20467

 by: Bernd Linsel - Tue, 14 Sep 2021 21:00 UTC

On 14.09.2021 22:53, MitchAlsup wrote:
> On Tuesday, September 14, 2021 at 12:39:58 PM UTC-5, Michael S wrote:
> <
> <
> I was told of the problem by someone else.
>

Wouldn't a simple remedy be to start with 0 and then do two successive FMACs?

e.g.

RO(I) = 0 + XT*C - YT*S
IN(I) = 0 + XT*S + YT*C

Re: FMA in implementation of FFT - reaction to last year's post of Mitch

<ff9e2450-5e0d-463f-87b4-fc2084e6e6a8n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=20468&group=comp.arch#20468

 by: MitchAlsup - Tue, 14 Sep 2021 22:20 UTC

On Tuesday, September 14, 2021 at 4:00:51 PM UTC-5, Bernd Linsel wrote:
> On 14.09.2021 22:53, MitchAlsup wrote:
> > On Tuesday, September 14, 2021 at 12:39:58 PM UTC-5, Michael S wrote:
> > <
> > <
> > I was told of the problem by someone else.
> >
> Wouldn't a simple remedy be to start with 0 and then do 2 successive FMAC?
>
> eg.
>
> RO(I) = 0 + XT*C - YT*S
> IN(I) = 0 + XT*S + YT*C
<
No mater how the compiler inserts (effectively) parenthesis {determines order}
you are adding a 53-bit fraction to a 106-bit fraction. To avoid rounding error bias
you want to be adding 106-bit product to a 106-bit product and perform a single
rounding.
