Intel Rocket Lake FUBAR

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Intel Rocket Lake FUBAR
Date: Sat, 26 Nov 2022 23:49:43 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 47
Message-ID: <2022Nov27.004943@mips.complang.tuwien.ac.at>

We have one Xeon W-1370P machine (with a Gigabyte W480M Vision W
mainboard and 128GB ECC RAM), and two Xeon E-2388G machines (with Asus
P12R-M/10G-2T mainboards and 128GB ECC RAM). Basically these are both
Rocket Lakes with 125W TDP, with the Xeon W-1370P going up to 5.2GHz
and the Xeon E-2388G going up to 5.1GHz in theory. In practice I have
never seen more than 4.7GHz from the 2388G, but that's ok with me,
because the last few MHz consume a lot of power.

Somewhat more surprising are the following results for
single-threaded loads on otherwise idle machines (and, for comparison,
a Ryzen 5800X result); all are Debian 11 machines (originally clones
of each other, but the 2388G has a Linux 5.18 kernel instead of the
5.10 default). The upper results are for one program, the lower
results for a different program for solving the same problem.

   Xeon E-2388G   Xeon W-1370P    Ryzen 5800X
Program A
    99028539338   112558972215   117982204259 cycles
    57622936318    53595345396    53545319282 instructions
          3.298          5.197          4.830 GHz
   30.043763137   21.658751921   24.429302882 seconds time elapsed
   27.770830000   20.525982000   23.347058000 seconds user
    2.255580000    1.132109000    1.079956000 seconds sys

Program B
   117843778770   110928900080    88255917230 cycles
   205974209911   198685544708   196521369169 instructions
          4.648          5.197          4.725 GHz
   25.365717369   21.347959839   18.681791884 seconds time elapsed
   24.048509000   20.794424000   18.264130000 seconds user
    1.307810000    0.551958000    0.416002000 seconds sys
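The GHz line can be reproduced from the other counters, and instructions per cycle (IPC) follow from the same numbers. Here is a small Python sketch. It assumes that perf's GHz figure is cycles divided by task-clock, and that task-clock is close to user + sys time for these single-threaded runs. The names RUNS and derived are illustrative only.

```python
# Counters copied from the perf output above:
# (cycles, instructions, user seconds, sys seconds) per machine.
RUNS = {
    "A": {
        "Xeon E-2388G": (99028539338, 57622936318, 27.770830000, 2.255580000),
        "Xeon W-1370P": (112558972215, 53595345396, 20.525982000, 1.132109000),
        "Ryzen 5800X": (117982204259, 53545319282, 23.347058000, 1.079956000),
    },
    "B": {
        "Xeon E-2388G": (117843778770, 205974209911, 24.048509000, 1.307810000),
        "Xeon W-1370P": (110928900080, 198685544708, 20.794424000, 0.551958000),
        "Ryzen 5800X": (88255917230, 196521369169, 18.264130000, 0.416002000),
    },
}


def derived(cycles, instructions, user_s, sys_s):
    """Return (IPC, average clock in GHz over the CPU time).

    Assumption: for a single-threaded run, user + sys approximates
    perf's task-clock, so cycles / (user + sys) matches its GHz line.
    """
    ipc = instructions / cycles
    ghz = cycles / (user_s + sys_s) / 1e9
    return ipc, ghz


for prog, machines in RUNS.items():
    for name, counters in machines.items():
        ipc, ghz = derived(*counters)
        print(f"Program {prog}  {name:13s} IPC {ipc:4.2f}  {ghz:5.3f} GHz")
```

On these numbers the 2388G has the highest IPC for Program A (about 0.58, versus about 0.48 for the 1370P). That is consistent with the same memory waits costing fewer cycles at 3.3GHz.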

The interesting part here is that the 2388G runs Program A at only
3.3GHz, while the others manage to run it at full speed. The
resulting cycle count is lower (probably because this program waits a
lot for memory, and a DRAM access of fixed latency costs fewer cycles
at a lower clock); maybe this is also why whatever determines boost
runs the program at only 3.3GHz. The end result is that both programs
take more time on the 2388G box than on the 1370P box.
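The memory-wait argument can be made a little more concrete with a toy model. This is an assumption for illustration, not something measured. Suppose Program A does a fixed amount of frequency-independent work plus a fixed wall-clock time of DRAM waiting. Then cycles = core_cycles + stall_seconds * f, and the two Rocket Lake points for Program A determine both unknowns.

The model ignores the roughly 7.5% instruction-count difference and the kernel difference. A two-point fit also always fits exactly, so treat the result as a rough indication only. The function name fit_two_points is hypothetical.

```python
def fit_two_points(f1_hz, cycles1, f2_hz, cycles2):
    """Solve cycles = core_cycles + stall_seconds * f from two measurements.

    Toy model (assumption): core_cycles is frequency-independent work,
    stall_seconds is wall-clock time spent waiting for DRAM.
    """
    stall_seconds = (cycles2 - cycles1) / (f2_hz - f1_hz)
    core_cycles = cycles1 - stall_seconds * f1_hz
    return core_cycles, stall_seconds


# Program A: Xeon E-2388G (3.298 GHz) and Xeon W-1370P (5.197 GHz)
core, stall = fit_two_points(3.298e9, 99028539338, 5.197e9, 112558972215)
print(f"core work ~{core / 1e9:.1f} G cycles, DRAM waiting ~{stall:.1f} s")
for f_ghz in (3.298, 5.197):
    # The same number of seconds of waiting turns into more cycles at a higher clock.
    print(f"at {f_ghz} GHz the waiting costs ~{stall * f_ghz:.1f} G cycles")
```

Under this model, the DRAM waiting takes the same number of seconds on both machines but costs more cycles at 5.2GHz.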

Additional interesting results are that the programs execute more
instructions and consume more system time on the 2388G, possibly
because of the difference in kernels.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Intel Rocket Lake FUBAR

Newsgroups: comp.arch
Date: Sat, 26 Nov 2022 16:40:09 -0800 (PST)
In-Reply-To: <2022Nov27.004943@mips.complang.tuwien.ac.at>
References: <2022Nov27.004943@mips.complang.tuwien.ac.at>
Message-ID: <b91f5496-3878-487d-98dc-4804429a9200n@googlegroups.com>
Subject: Re: Intel Rocket Lake FUBAR
From: already5...@yahoo.com (Michael S)

On Sunday, November 27, 2022 at 2:33:15 AM UTC+2, Anton Ertl wrote:
> [...]

I had seen surprisingly low clocks on AMD server parts (Zen3-based EPYC).
The only cure appears to be running the benchmark for at least 5 seconds
locked on the same core.
But those were server parts. Xeon-W is for workstations, so I would expect
less aggressive power saving from its governor.
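This workaround can be sketched in Python for Linux: pin the process to one core, then keep that core busy for about 5 seconds before the timed run. The function name run_pinned and the busy-wait warm-up are illustrative choices, not something from the post. Whether a warm-up helps on a particular CPU and governor is exactly what is being discussed here.

```python
import os
import time


def run_pinned(benchmark, cpu=None, warmup_seconds=5.0):
    """Pin this process to one CPU, busy-wait for warmup_seconds so the
    frequency governor has time to raise the clock, then time benchmark().

    Linux-only: os.sched_setaffinity does not exist on macOS or Windows.
    """
    old_mask = os.sched_getaffinity(0)
    if cpu is None:
        cpu = min(old_mask)  # any CPU this process is allowed to run on
    os.sched_setaffinity(0, {cpu})
    try:
        deadline = time.perf_counter() + warmup_seconds
        while time.perf_counter() < deadline:
            pass  # keep the pinned core loaded during the warm-up
        start = time.perf_counter()
        result = benchmark()
        return result, time.perf_counter() - start
    finally:
        os.sched_setaffinity(0, old_mask)  # restore the original CPU set
```

For example, run_pinned(lambda: sum(range(10_000_000)), cpu=2) pins to CPU 2 and warms up for 5 s. It then returns the sum and the elapsed time. For an external program, taskset -c 2 does the pinning part from a shell.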

Re: Intel Rocket Lake FUBAR

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Intel Rocket Lake FUBAR
Date: Mon, 28 Nov 2022 16:48:16 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 57
Message-ID: <2022Nov28.174816@mips.complang.tuwien.ac.at>
References: <2022Nov27.004943@mips.complang.tuwien.ac.at> <b91f5496-3878-487d-98dc-4804429a9200n@googlegroups.com>

Michael S <already5chosen@yahoo.com> writes:
>On Sunday, November 27, 2022 at 2:33:15 AM UTC+2, Anton Ertl wrote:
[Differences in behaviour between two Rocket Lakes:]

And here I add a Tiger Lake for comparison; however, this one runs on
Ubuntu 22.04 (libc 2.35) rather than Debian 11 (libc 2.31):

   Xeon E-2388G   Xeon W-1370P      i5-1135G7    Ryzen 5800X
Program A
    99028539338   112558972215   126197607154   117982204259 cycles
    57622936318    53595345396    55972357543    53545319282 instructions
          3.298          5.197          2.549          4.830 GHz
   30.043763137   21.658751921   49.503123371   24.429302882 seconds time elapsed
   27.770830000   20.525982000   46.795417000   23.347058000 seconds user
    2.255580000    1.132109000    2.703966000    1.079956000 seconds sys

Program B
   117843778770   110928900080    94179459170    88255917230 cycles
   205974209911   198685544708   245016339386   196521369169 instructions
          4.648          5.197          4.161          4.725 GHz
   25.365717369   21.347959839   22.637903769   18.681791884 seconds time elapsed
   24.048509000   20.794424000   21.433026000   18.264130000 seconds user
    1.307810000    0.551958000    1.204057000    0.416002000 seconds sys

Looking at Program A, the 1135G7 clocks down, like the 2388G and
unlike the others. It's interesting that it's so much slower than the
others; the others have four 32GB DIMMs, while the 1135G7 has 8GB
soldered on the board plus one 32GB DIMM. The large number of cycles
despite the low clock makes it appear as if each cache miss takes a
lot longer than on the other CPUs.

Looking at the instruction counts of Program B (which spends a lot of
time in qsort()), apparently there is a difference in qsort() between
libc 2.31 and libc 2.35. Whether that or the 1135G7's larger L2 cache
is the reason for its lower cycle count is unclear.
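One way to test the qsort() hypothesis is sketched here under assumptions. Feed an identical array to each machine's qsort() and count the comparator calls. If libc 2.31 and libc 2.35 give different counts for the same input, their sorting algorithm (or the path it takes) differs.

The int array, its size, and its LCG-generated contents are hypothetical stand-ins for whatever Program B actually sorts. Because the comparator is a Python callback, only the count is meaningful, not the timing.

```python
import ctypes
import ctypes.util
import platform

# Load the C library (find_library("c") gives e.g. "libc.so.6" on Linux).
libc = ctypes.CDLL(ctypes.util.find_library("c"))
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int))
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None


def count_qsort_compares(n=20_000, seed=1):
    """Sort n pseudo-random ints with libc qsort(); return (comparator calls, result)."""
    # Simple LCG so every machine sorts exactly the same (hypothetical) input.
    values, x = [], seed
    for _ in range(n):
        x = (1103515245 * x + 12345) % 2**31
        values.append(x)
    arr = (ctypes.c_int * n)(*values)
    calls = 0

    def compare(a, b):
        nonlocal calls
        calls += 1
        av, bv = a[0], b[0]
        return (av > bv) - (av < bv)

    cmp_func = CMPFUNC(compare)  # keep a reference alive during the qsort() call
    libc.qsort(arr, n, ctypes.sizeof(ctypes.c_int), cmp_func)
    return calls, list(arr)


print("libc:", platform.libc_ver())
calls, _ = count_qsort_compares()
print("comparator calls for n=20000:", calls)
```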

>I had seen surprisingly low clocks on AMD server parts (Zen3-based EPYC).
>The only cure appears to be running the benchmark for at least 5 seconds
>locked on the same core.

That's strange. What would be the point of that? If you can wait for
5s before clocking up, why not wait forever?

>But those were server parts. Xeon-W is for workstations, so I would expect
>less aggressive power saving from its governor.

Yes, I guess that the governor notices the large number of DRAM
accesses from Program A (or maybe a large number of cycles spent
waiting for DRAM), and lowers the clock in order to save power
(waiting at a faster clock does not buy anything, although the user
time results of the 1370P compared to the 2388G and 1135G7 are
counterevidence).
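To check whether the clock really drops during DRAM-heavy phases, one can sample the frequency the kernel reports while the benchmark runs. This sketch reads the Linux cpufreq sysfs files scaling_cur_freq (in kHz) and scaling_governor. These files may be missing (no cpufreq driver, some VMs and containers), in which case the helpers return None.

Depending on the driver, the reported value may be an average or approximation rather than the instantaneous clock. The helper names read_cpufreq and sample_mhz are illustrative.

```python
import time
from pathlib import Path

CPUFREQ = "/sys/devices/system/cpu/cpu{cpu}/cpufreq/{name}"


def read_cpufreq(cpu, name):
    """Return the stripped contents of a cpufreq sysfs file, or None if unavailable."""
    try:
        return Path(CPUFREQ.format(cpu=cpu, name=name)).read_text().strip()
    except OSError:
        return None


def sample_mhz(cpu=0, seconds=1.0, interval=0.1):
    """Sample scaling_cur_freq for one CPU; return a list of MHz values (None if unavailable)."""
    samples = []
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        khz = read_cpufreq(cpu, "scaling_cur_freq")
        samples.append(int(khz) // 1000 if khz else None)
        time.sleep(interval)
    return samples


print("governor:", read_cpufreq(0, "scaling_governor"))
print("MHz samples:", sample_mhz(cpu=0, seconds=0.5))
```

For example, run Program A pinned to CPU 2 (e.g. with taskset -c 2) in one shell. From another shell, call sample_mhz(cpu=2, seconds=30) and compare the samples with the GHz figures perf reports.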

- anton