Subject: FMA4 on Matisse?
From: Melzzzzz
Newsgroups: comp.lang.asm.x86
Organization: usenet-news.net
Date: Sun, 20 Oct 2019 08:18 UTC
FMA4 works on my 2700X. The question is: is it still supported
on Zen 2?
The instructions are not reported by cpuid at all, but they work.
I got a nice speedup with FMA4 on Zen.
I don't understand why FMA3 won when FMA4 is clearly superior.
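
[Editorial sketch] The post above says cpuid does not report FMA4 even though the instructions execute. A minimal C sketch of reading the relevant feature bits could look like the following. It assumes GCC or Clang on x86, where <cpuid.h> provides __get_cpuid; the helper names has_fma3_bit, has_fma4_bit and report_fma_support are hypothetical and not from the thread. The bit positions are the documented ones: FMA3 is leaf 1, ECX bit 12, and FMA4 is extended leaf 0x80000001, ECX bit 16.

```c
#include <stdint.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/Clang helper header (assumption) */
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* FMA3: CPUID leaf 0x00000001, ECX bit 12. */
static int has_fma3_bit(uint32_t leaf1_ecx)
{
    return (int)((leaf1_ecx >> 12) & 1u);
}

/* FMA4: CPUID extended leaf 0x80000001, ECX bit 16. */
static int has_fma4_bit(uint32_t ext_leaf1_ecx)
{
    return (int)((ext_leaf1_ecx >> 16) & 1u);
}

/* Prints what the running CPU advertises.
 * Returns 0 on success, -1 if CPUID or a leaf is unavailable. */
static int report_fma_support(void)
{
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1u, &eax, &ebx, &ecx, &edx))
        return -1;
    printf("FMA3 advertised: %d\n", has_fma3_bit(ecx));

    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return -1;
    printf("FMA4 advertised: %d\n", has_fma4_bit(ecx));
    return 0;
#else
    return -1;                  /* not an x86 build */
#endif
}
```

Calling report_fma_support() from main() prints the two flags. A cleared FMA4 bit only means the feature is not advertised; as the post describes for the 2700X, that is a separate question from whether the opcodes happen to execute.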


--
press any key to continue or any other to quit...
I enjoy nothing as much as my status as an INVALID -- Zli Zec
In the Wild West there wasn't actually that much violence, precisely
because everyone was armed. -- Mladen Gogala



Subject: Re: FMA4 on Matisse?
From: Bonita Montero
Newsgroups: comp.lang.asm.x86
Organization: albasani.net
Date: Sun, 20 Oct 2019 18:55 UTC
> FMA4 works on my 2700X. The question is: is it still supported
> on Zen 2?
> The instructions are not reported by cpuid at all, but they work.
> I got a nice speedup with FMA4 on Zen.
> I don't understand why FMA3 won when FMA4 is clearly superior.

Floating-point code consists largely of instructions with long
latencies. Even when you have parallel dependency chains that could
be pipelined, there are usually opportunities to hide the movs that
FMA3 needs to keep a register from being overwritten. And often the
old value does not need to be preserved at all. So the speedup from
FMA4 may be only slight, and sometimes there is none.
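
[Editorial sketch] The "movs to prevent overwriting registers" come from the operand forms: FMA3 writes its result over one of its three sources, while FMA4 has a separate fourth destination operand. A small C model of the two forms, as an illustration only: the function names are hypothetical, and plain a * b + c is used instead of a fused operation, so this models which operand is overwritten, not the single rounding of a real FMA.

```c
/* FMA3, "231" form (Intel syntax: vfmadd231sd acc, b, c):
 *   acc = b * c + acc
 * The destination is also a source, so the old acc is lost. */
static void fma3_231(double *acc, double b, double c)
{
    *acc = b * c + *acc;
}

/* FMA4 (Intel syntax: vfmaddsd dst, a, b, c):
 *   dst = a * b + c
 * The destination is independent of all three sources. */
static double fma4(double a, double b, double c)
{
    return a * b + c;
}

/* Compute a * b + c while a, b and c all stay live afterwards.
 * With FMA3 this needs a copy first (the extra mov), then the FMA. */
static double keep_inputs_fma3(double a, double b, double c)
{
    double tmp = c;           /* the register-register mov */
    fma3_231(&tmp, a, b);     /* tmp = a * b + tmp */
    return tmp;
}
```

When one input is dead after the operation, which is the common case the post describes, FMA3 can overwrite it directly and the copy disappears.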



Subject: Re: FMA4 on Matisse?
From: Anton Ertl
Newsgroups: comp.lang.asm.x86
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Date: Mon, 21 Oct 2019 08:36 UTC
Melzzzzz <Melzzzzz@nospicedham.zzzzz.com> writes:
> FMA4 works on my 2700X. The question is: is it still supported
> on Zen 2?
> The instructions are not reported by cpuid at all, but they work.
> I got a nice speedup with FMA4 on Zen.
> I don't understand why FMA3 won when FMA4 is clearly superior.

Intel supports FMA3, but not FMA4.

If your question is why they do that, I can only speculate:

1) Well-placed register-register moves can have latency 0 and cost no
execution unit resources, only some front-end resources, because the
register renamer eliminates them.  So the advantage of FMA4 on Intel
may be minuscule (I am not sure whether the Zen register renamer works
the same way; if so, maybe you can get similar performance with FMA3
if you arrange the move appropriately).

2) FMA4 may require complications in the instruction decoder and
register renamer that the Intel engineers were not prepared to pay
for, given the small benefit.

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html



Subject: Re: FMA4 on Matisse?
From: Bonita Montero
Newsgroups: comp.lang.asm.x86
Organization: albasani.net
Date: Mon, 21 Oct 2019 11:56 UTC
> 1) Well-placed register-register moves can have latency 0 and cost no
> execution unit resources, only some front-end resources, because the
> register renamer eliminates them.

The mov still takes one decoder slot and thereby delays other
instructions, which are pushed into the next decoded bundle.
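
[Editorial sketch] The decoder-slot argument can be put into a toy count. The model below assumes a front end that decodes a fixed number of instructions per bundle and nothing else (no op cache, no fusion); the width is a free parameter chosen for illustration, not a figure from the thread, and all function names are hypothetical.

```c
/* Decode bundles needed for n_insns instructions when each bundle
 * holds `width` instructions (ceiling division). width must be > 0. */
static unsigned bundles_needed(unsigned n_insns, unsigned width)
{
    return (n_insns + width - 1u) / width;
}

/* FMA3: every FMA whose accumulator input must survive costs one
 * extra mov, even if the renamer later eliminates it. */
static unsigned fma3_insn_count(unsigned n_fma, unsigned n_preserved)
{
    return n_fma + n_preserved;
}

/* FMA4: the separate destination operand makes the mov unnecessary. */
static unsigned fma4_insn_count(unsigned n_fma)
{
    return n_fma;
}
```

With an assumed width of 4 and 8 FMAs, FMA4 needs 2 bundles; FMA3 needs 4 bundles if every FMA must preserve its input, but only 3 if a single one must. The gap therefore depends on how often the old value is actually still needed.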


