
Subject                        Author
* Cache size on Ampere         Stefan Monnier
+* Re: Cache size on Ampere    Anton Ertl
|`* Re: Cache size on Ampere   Stefan Monnier
| `- Re: Cache size on Ampere  Anton Ertl
+- Re: Cache size on Ampere    MitchAlsup
`- Re: Cache size on Ampere    Michael S

Cache size on Ampere

<jwvilw5a6cp.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=22189&group=comp.arch#22189
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Cache size on Ampere
Date: Fri, 03 Dec 2021 13:38:48 -0500
Organization: A noiseless patient Spider
Lines: 12
Message-ID: <jwvilw5a6cp.fsf-monnier+comp.arch@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="a80485fb2a6046220885a1c1c101ddca";
logging-data="21489"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19gplOG4dyI+LWQmtHAL2/f"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:J17xWxKLYLB8NJ7jMYX8Bz25NN0=
sha1:4+MzRMq26t6mY+NRwbLMU9QFRkA=
 by: Stefan Monnier - Fri, 3 Dec 2021 18:38 UTC

I just bumped into the Ampere Altra's specifications and was struck: for
a processor with 250W TDP isn't a 32MB last-level cache small?

I mean 32MB was considered large back in 2012 (e.g. on the Itanium
9500), but nowadays AMD's Epyc CPUs come with 128-256MB of cache.

IIUC the Altra does have a 1MB per-core L2 cache, which is twice as
large as current Epyc's core, IIUC, but I wonder if someone here has
some intuition about why they'd go with such a small LLC.

Stefan

Re: Cache size on Ampere

<2021Dec3.234045@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=22191&group=comp.arch#22191
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Cache size on Ampere
Date: Fri, 03 Dec 2021 22:40:45 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 25
Message-ID: <2021Dec3.234045@mips.complang.tuwien.ac.at>
References: <jwvilw5a6cp.fsf-monnier+comp.arch@gnu.org>
Injection-Info: reader02.eternal-september.org; posting-host="9dfdea17303e354f3e1ffa8a4851d029";
logging-data="20349"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX187Y0nvyBswVe/aYFu9uLvg"
Cancel-Lock: sha1:n914h49QZKpEhEQl88575pFy/Co=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Fri, 3 Dec 2021 22:40 UTC

Stefan Monnier <monnier@iro.umontreal.ca> writes:
>I just bumped into the Ampere Altra's specifications and was struck: for
>a processor with 250W TDP isn't a 32MB last-level cache small?

That's an odd relation. When I read that, I put it in relation to the
number of cores: 32MB L3 for 80 cores on the Altra Q80-33, and 16MB for 128
cores on the Altra Max M128-30.
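
As a rough illustration of that per-core view, the sketch below divides the quoted L3 sizes by the core counts. The function name is made up for illustration, and the Epyc entry (256MB of L3 on a 64-core part) is an assumption added only as a comparison point, combining figures mentioned elsewhere in this thread.

```python
def l3_per_core_mb(l3_mb: float, cores: int) -> float:
    """Shared L3 capacity divided evenly across cores, in MB."""
    return l3_mb / cores

# (L3 size in MB, core count); the Epyc row is an assumed comparison point.
parts = {
    "Altra Q80-33": (32, 80),
    "Altra Max M128-30": (16, 128),
    "64-core Epyc (assumed 256MB L3)": (256, 64),
}

for name, (l3_mb, cores) in parts.items():
    print(f"{name}: {l3_per_core_mb(l3_mb, cores):.3f} MB of L3 per core")
```

On these numbers the Altra parts have well under 1MB of L3 per core, i.e. less than their 1MB per-core L2.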

>IIUC the Altra does have a 1MB per-core L2 cache, which is twice as
>large as current Epyc's core, IIUC, but I wonder if someone here has
>some intuition about why they'd go with such a small LLC.

Apparently they think that the 1MB L2 is big enough for their
customers, and they use the L3 only as a victim cache and for
communicating between cores. There seem to be enough usage patterns
that do not need that much cache; AMD has announced that they will
offer server CPUs (Bergamo) with more cores and less cache
<https://www.anandtech.com/show/17055/amd-gives-details-on-epyc-zen4-genoa-and-bergamo-up-to-96-and-128-cores>
in addition to the ones with a similar balance to their current
offerings.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Cache size on Ampere

<jwvy2515gpn.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=22192&group=comp.arch#22192
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Cache size on Ampere
Date: Fri, 03 Dec 2021 20:09:25 -0500
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <jwvy2515gpn.fsf-monnier+comp.arch@gnu.org>
References: <jwvilw5a6cp.fsf-monnier+comp.arch@gnu.org>
<2021Dec3.234045@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="db93e9404e65a3652283b11476d20dca";
logging-data="20696"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19SWEUWTRiwe7Ak6hc0NMuh"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:SNw9r+m4LEHwaBscZmpZTPXDgG8=
sha1:13OYSlBwUN63Upt1YerbYYtMwWw=
 by: Stefan Monnier - Sat, 4 Dec 2021 01:09 UTC

>>I just bumped into the Ampere Altra's specifications and was struck: for
>>a processor with 250W TDP isn't a 32MB last-level cache small?
> That's an odd relation. When I read that, I put it in relation to the
> number of cores: 32MB L3 for 80 cores on the Altra Q80-33, and 16MB for 128
> cores on the Altra Max M128-30.

I find TDP to be a better approximation of "total work done per second"
than the number of cores or the CPU's frequency. I'm not claiming it's
perfect, by a long shot, but I think it's meaningful.

> Apparently they think that the 1MB L2 is big enough for their
> customers, and they use the L3 only as a victim cache and for
> communicating between cores. There seem to be enough usage patterns
> that do not need that much cache; AMD has announced that they will
> offer server CPUs (Bergamo) with more cores and less cache

I was wondering if their decision was also linked to the
microarchitecture of the cores themselves. E.g. they have too few
in-flight instructions to withstand an L2-miss-L3-hit without stalling,
and that in turn (somehow) makes a big LLC less beneficial (maybe not in
terms of single-thread performance but in terms of overall consumption)?

Stefan

Re: Cache size on Ampere

<2021Dec4.190631@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=22193&group=comp.arch#22193
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Cache size on Ampere
Date: Sat, 04 Dec 2021 18:06:31 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 65
Message-ID: <2021Dec4.190631@mips.complang.tuwien.ac.at>
References: <jwvilw5a6cp.fsf-monnier+comp.arch@gnu.org> <2021Dec3.234045@mips.complang.tuwien.ac.at> <jwvy2515gpn.fsf-monnier+comp.arch@gnu.org>
Injection-Info: reader02.eternal-september.org; posting-host="9dfdea17303e354f3e1ffa8a4851d029";
logging-data="2147"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/TT7mG1Dx4H76ZWaLJns+4"
Cancel-Lock: sha1:w3wvAR0P9pX1ZBKlA+9RHlGwsOo=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sat, 4 Dec 2021 18:06 UTC

Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>>I just bumped into the Ampere Altra's specifications and was struck: for
>>>a processor with 250W TDP isn't a 32MB last-level cache small?
>> That's an odd relation. When I read that, I put it in relation to the
>> number of cores: 32MB L3 for 80 cores on the Altra Q80-33, and 16MB for 128
>> cores on the Altra Max M128-30.
>
>I find TDP to be a better approximation of "total work done per second"
>than the number of cores or the CPU's frequency. I'm not claiming it's
>perfect, by a long shot, but I think it's meaningful.

I don't. E.g., consider that an 8-core Core i9-11900K can consume
more than 250W, similar to a 64-core EPYC. For a typical server
workload, I doubt that the amount of work done by the Core i9 is
anywhere near the EPYC.
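
A minimal sketch of that comparison, assuming for simplicity that both parts draw exactly 250W (the text only says "more than 250W" and "similar"); the helper name is hypothetical:

```python
def watts_per_core(package_watts: float, cores: int) -> float:
    """Package power divided evenly across cores."""
    return package_watts / cores

# Same assumed power budget, very different core counts.
for name, cores in [("8-core Core i9-11900K", 8), ("64-core EPYC", 64)]:
    print(f"{name}: {watts_per_core(250, cores):.1f} W per core")
```

The same package power is spread over eight times as many cores on the EPYC, which is why equal TDP says little about equal throughput on a many-threaded server workload.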

Of course, the other metrics also have their flaws, but at least wrt
cores used in servers they are not as far off as TDP. And that
even includes the Firestorm core (Apple); yes, it has about a factor
1.5 higher IPC (i.e., work/frequency ratio) than Intel's and AMD's
offerings, but its advantage in power consumption is higher than 1.5.

>> Apparently they think that the 1MB L2 is big enough for their
>> customers, and they use the L3 only as a victim cache and for
>> communicating between cores. There seem to be enough usage patterns
>> that do not need that much cache; AMD has announced that they will
>> offer server CPUs (Bergamo) with more cores and less cache
>
>I was wondering if their decision was also linked to the
>microarchitecture of the cores themselves. E.g. they have too few
>in-flight instructions to withstand an L2-miss-L3-hit without stalling,
>and that in turn (somehow) makes a big LLC less beneficial (maybe not in
>terms of single-thread performance but in terms of overall consumption)?

I doubt it, for two reasons:

* If there are a significant number of L2 misses, serving them from L3
is much better than serving them from main memory especially if the
core does not support many in-flight L2 misses. Conversely, for
Stream-like applications, and assuming that main memory has the same
bandwidth as L3, if you support enough in-flight accesses to cover
main memory, you don't need L3 (even if the working set would fit in
L3). But I doubt that the conditions are satisfied.

* Simultaneous in-flight instructions are important for Stream-like
stuff, and not at all for pointer-chasing. My impression is that
the typical cloud stuff is more along the lines of pointer-chasing
than HPC (e.g., serving a web request).
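
The two bullets above can be sketched with Little's law: the number of cache lines that must be in flight equals bandwidth times latency divided by line size. All latency and bandwidth figures used below are illustrative assumptions, not Altra measurements, and the function names are made up for this sketch.

```python
import math

def lines_in_flight(bandwidth_gb_s: float, latency_ns: float,
                    line_bytes: int = 64) -> int:
    """Little's law: outstanding cache-line requests needed to sustain
    the given bandwidth at the given access latency."""
    bytes_in_flight = bandwidth_gb_s * latency_ns  # GB/s * ns = bytes
    return math.ceil(bytes_in_flight / line_bytes)

def pointer_chase_ns(n_loads: int, latency_ns: float) -> float:
    """Dependent loads serialize: each one waits for the previous result,
    so extra in-flight capacity does not help."""
    return n_loads * latency_ns

# Assumed figures: 20 GB/s per core, 100 ns to DRAM, 30 ns to L3.
print("Stream-like, from DRAM:", lines_in_flight(20, 100), "lines in flight")
print("Stream-like, from L3:  ", lines_in_flight(20, 30), "lines in flight")
print("Pointer chase, DRAM:   ", pointer_chase_ns(1000, 100), "ns")
print("Pointer chase, L3:     ", pointer_chase_ns(1000, 30), "ns")
```

Under these assumptions a streaming core needs roughly three times as many outstanding misses to cover DRAM as to cover L3, while a pointer-chasing loop gains only from the lower latency, regardless of how many accesses the core can keep in flight.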

I also don't think that code sharing is the reason for this L3 size:
with the current design, the shared code would be duplicated a lot in
the L2 caches and would consume a lot of that (especially if the code
is so unlocal that the L3 plays a role). If I wanted to exploit code
sharing, I would design separate I-caches all the way out to the shared
cache; e.g. have a per-core L2 I-cache just big enough that the number of
requests to the shared L3 cache does not produce performance problems.
That leaves the L2 D-caches for keeping (typically unshared) data.

I still think that the CPU is designed for applications with few L2
misses.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Cache size on Ampere

<a5e2158e-f658-4632-ac58-d9308dc38d0cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22199&group=comp.arch#22199
X-Received: by 2002:a05:620a:3187:: with SMTP id bi7mr30046900qkb.534.1638740844491;
Sun, 05 Dec 2021 13:47:24 -0800 (PST)
X-Received: by 2002:a05:6830:1445:: with SMTP id w5mr26632183otp.112.1638740844247;
Sun, 05 Dec 2021 13:47:24 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 5 Dec 2021 13:47:24 -0800 (PST)
In-Reply-To: <jwvilw5a6cp.fsf-monnier+comp.arch@gnu.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:40fa:f6f2:2e75:c4dc;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:40fa:f6f2:2e75:c4dc
References: <jwvilw5a6cp.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a5e2158e-f658-4632-ac58-d9308dc38d0cn@googlegroups.com>
Subject: Re: Cache size on Ampere
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 05 Dec 2021 21:47:24 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 25
 by: MitchAlsup - Sun, 5 Dec 2021 21:47 UTC

On Friday, December 3, 2021 at 11:05:41 AM UTC-8, Stefan Monnier wrote:
> I just bumped into the Ampere Altra's specifications and was struck: for
> a processor with 250W TDP isn't a 32MB last-level cache small?
<
A) it depends on the workload
<
B) it depends on what the L3 cache is topologically.
<
If LLC is essentially a DRAM cache where DRAM lines are brought in, ECC
checked, fixed, written back out after merging with CPU and I/O Write
traffic, it very well may be big enough. It's not trying to decrease latency
seen at the CPU, but it is trying to reduce the latency of requests arriving
at DRAM.
<
C) there may be cost reasons unknown to us (community) as to why
this kind of decision was made.
>
> I mean 32MB was considered large back in 2012 (e.g. on the Itanium
> 9500), but nowadays AMD's Epyc CPUs come with 128-256MB of cache.
>
> IIUC the Altra does have a 1MB per-core L2 cache, which is twice as
> large as current Epyc's core, IIUC, but I wonder if someone here has
> some intuition about why they'd go with such a small LLC.
>
>
> Stefan

Re: Cache size on Ampere

<0bbfedac-9ef1-461e-981d-4f29674d2fean@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22200&group=comp.arch#22200
X-Received: by 2002:ac8:5c16:: with SMTP id i22mr34717685qti.313.1638742607417;
Sun, 05 Dec 2021 14:16:47 -0800 (PST)
X-Received: by 2002:aca:646:: with SMTP id 67mr20072489oig.175.1638742607205;
Sun, 05 Dec 2021 14:16:47 -0800 (PST)
Path: i2pn2.org!rocksolid2!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 5 Dec 2021 14:16:47 -0800 (PST)
In-Reply-To: <jwvilw5a6cp.fsf-monnier+comp.arch@gnu.org>
Injection-Info: google-groups.googlegroups.com; posting-host=87.68.182.49; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 87.68.182.49
References: <jwvilw5a6cp.fsf-monnier+comp.arch@gnu.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <0bbfedac-9ef1-461e-981d-4f29674d2fean@googlegroups.com>
Subject: Re: Cache size on Ampere
From: already5...@yahoo.com (Michael S)
Injection-Date: Sun, 05 Dec 2021 22:16:47 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 15
 by: Michael S - Sun, 5 Dec 2021 22:16 UTC

On Friday, December 3, 2021 at 9:05:41 PM UTC+2, Stefan Monnier wrote:
> I just bumped into the Ampere Altra's specifications and was struck: for
> a processor with 250W TDP isn't a 32MB last-level cache small?
>
> I mean 32MB was considered large back in 2012 (e.g. on the Itanium
> 9500), but nowadays AMD's Epyc CPUs come with 128-256MB of cache.
>
> IIUC the Altra does have a 1MB per-core L2 cache, which is twice as
> large as current Epyc's core, IIUC, but I wonder if someone here has
> some intuition about why they'd go with such a small LLC.
>
>
> Stefan

Is Ampere currently in the business of selling general-purpose server CPUs to any buyer, or in the business of selling themselves to one of the big guys (Google-Facebook-Microsoft, less likely Oracle)?
Probably more the latter than the former. If true, your question is not mighty interesting.
