Re: Misc: Cache Sizes / Observations

Newsgroups: comp.arch
Date: Wed, 2 Mar 2022 19:42:59 -0800 (PST)
In-Reply-To: <svp7va$3ga$1@dont-email.me>
References: <svj5qj$19u$1@dont-email.me> <svjak4$9kl$1@dont-email.me>
<svkeq1$phm$1@dont-email.me> <svm7r9$3ec$1@dont-email.me> <2022Mar2.101217@mips.complang.tuwien.ac.at>
<svoa5q$ud$1@dont-email.me> <2022Mar2.192515@mips.complang.tuwien.ac.at>
<svogd7$o9s$1@dont-email.me> <svokin$rtp$1@dont-email.me> <3215daef-0814-4dd1-b2ae-a00cd22d0d1dn@googlegroups.com>
<svp7va$3ga$1@dont-email.me>
Message-ID: <89d9f2fb-57a0-42bc-ae83-0bca231076bcn@googlegroups.com>
Subject: Re: Misc: Cache Sizes / Observations
From: yogaman...@yahoo.com (Scott Smader)

On Wednesday, March 2, 2022 at 6:08:45 PM UTC-8, BGB wrote:
> On 3/2/2022 5:43 PM, MitchAlsup wrote:
> > On Wednesday, March 2, 2022 at 2:37:47 PM UTC-6, Stephen Fuld wrote:
> >> On 3/2/2022 11:26 AM, BGB wrote:
> >>> On 3/2/2022 12:25 PM, Anton Ertl wrote:
> >>>> BGB writes:
> >>>>> On 3/2/2022 3:12 AM, Anton Ertl wrote:
> >>>>>> A 512KB cache with 64-byte lines has 8192 cache lines, just like a
> >>>>>> 256KB cache with 32-byte lines and a 128KB cache with 16-byte lines.
> >>>>>> Without spatial locality, I would expect similar miss rates for all of
> >>>>>> them for the same associativity; and given that programs have spatial
> >>>>>> locality, I would expect the larger among these configurations to have
> >>>>>> an advantage. Are you sure that your cache simulator has no bugs?
> >>>> ...
> >>>>> After fixing this bug:
> >>>>> 131072 2.004% 1.318% (1 way, 16 line)
> >>>>> 65536 2.851% 1.516% (1 way, 16 line)
> >>>>> 32768 3.540% 1.445% (1 way, 16 line)
> >>>>> 16384 6.604% 4.043% (1 way, 16 line)
> >>>>> 8192 9.112% 6.052% (1 way, 16 line)
> >>>>> 4096 11.310% 4.504% (1 way, 16 line)
> >>>>> 2048 14.326% 5.990% (1 way, 16 line)
> >>>>> 1024 17.632% 7.066% (1 way, 16 line)
> >>>>>
> >>>>> 131072 1.966% 0.821% (2 way, 16 line)
> >>>>> 65536 2.550% 0.513% (2 way, 16 line)
> >>>>> 32768 3.303% 0.791% (2 way, 16 line)
> >>>>> 16384 6.779% 3.842% (2 way, 16 line)
> >>>>> 8192 8.484% 1.766% (2 way, 16 line)
> >>>>> 4096 10.905% 2.773% (2 way, 16 line)
> >>>>> 2048 13.588% 3.319% (2 way, 16 line)
> >>>>> 1024 16.022% 3.229% (2 way, 16 line)
> >>>>>
> >>>>> 262144 15.748% 14.750% (1 way, 32 line)
> >>>>> 131072 15.929% 14.382% (1 way, 32 line)
> >>>>> 65536 16.274% 14.057% (1 way, 32 line)
> >>>>> 32768 16.843% 14.122% (1 way, 32 line)
> >>>>> 16384 19.819% 16.567% (1 way, 32 line)
> >>>>> 8192 22.360% 18.319% (1 way, 32 line)
> >>>>> 4096 24.583% 16.422% (1 way, 32 line)
> >>>>> 2048 27.325% 16.513% (1 way, 32 line)
> >>>>>
> >>>>> 262144 2.093% 0.872% (2 way, 32 line)
> >>>>> 131072 2.893% 0.836% (2 way, 32 line)
> >>>>> 65536 3.540% 0.949% (2 way, 32 line)
> >>>>> 32768 4.854% 1.808% (2 way, 32 line)
> >>>>> 16384 8.788% 4.997% (2 way, 32 line)
> >>>>> 8192 11.456% 3.638% (2 way, 32 line)
> >>>>> 4096 14.699% 4.352% (2 way, 32 line)
> >>>>> 2048 17.580% 4.251% (2 way, 32 line)
> >>>>>
> >>>>> 524288 10.825% 10.370% (1 way, 64 line)
> >>>>> 262144 11.019% 10.412% (1 way, 64 line)
> >>>>> 131072 11.188% 10.232% (1 way, 64 line)
> >>>>> 65536 11.652% 10.320% (1 way, 64 line)
> >>>>> 32768 12.388% 10.699% (1 way, 64 line)
> >>>>> 16384 15.806% 13.581% (1 way, 64 line)
> >>>>> 8192 18.753% 15.604% (1 way, 64 line)
> >>>>> 4096 21.158% 13.259% (1 way, 64 line)
> >>>>>
> >>>>> 524288 0.863% 0.439% (2 way, 64 line)
> >>>>> 262144 1.297% 0.535% (2 way, 64 line)
> >>>>> 131072 1.896% 0.700% (2 way, 64 line)
> >>>>> 65536 2.505% 0.979% (2 way, 64 line)
> >>>>> 32768 3.876% 1.923% (2 way, 64 line)
> >>>>> 16384 8.341% 5.535% (2 way, 64 line)
> >>>>> 8192 11.346% 3.956% (2 way, 64 line)
> >>>>> 4096 14.262% 3.722% (2 way, 64 line)
> >>>>>
> >>>>>
> >>>>> It would appear that 1-way still does poorly with larger cache lines,
> >>>>
> >>>> Probably still a bug, see above.
> >>>>
> >>>>> As for why now 1-way 32B appears to be doing worse than 1-way 64B, no
> >>>>> idea, this doesn't really make sense.
> >>>>
> >>>> For the same number of cache lines, that's expected, due to spatial
> >>>> locality.
> >>>>
> >>>
> >>> After posting this, I did find another bug:
> >>> The cache index pairs were always being calculated as-if the cache line
> >>> were 16B. After fixing this bug, all the lines got much closer together
> >>> (as shown on another graph posted to Twitter after the last graph).
> >>>
> >>> In this case, now the overall hit/miss ratio is more consistent between
> >>> cache-line sizes. The main difference is that larger cache-line sizes
> >>> still have a higher proportion of conflict misses.
> >>>
> >>>
> >>> Values following the more recent bugfix:
> >>> 131072 2.004% 1.318% (1 way, 16 line)
> >>> 65536 2.851% 1.516% (1 way, 16 line)
> >>> 32768 3.540% 1.445% (1 way, 16 line)
> >>> 16384 6.604% 4.043% (1 way, 16 line)
> >>> 8192 9.112% 6.052% (1 way, 16 line)
> >>> 4096 11.310% 4.504% (1 way, 16 line)
> >>> 2048 14.326% 5.990% (1 way, 16 line)
> >>> 1024 17.632% 7.066% (1 way, 16 line)
> >>>
> >>> 131072 1.966% 0.821% (2 way, 16 line)
> >>> 65536 2.550% 0.513% (2 way, 16 line)
> >>> 32768 3.303% 0.791% (2 way, 16 line)
> >>> 16384 6.779% 3.842% (2 way, 16 line)
> >>> 8192 8.484% 1.766% (2 way, 16 line)
> >>> 4096 10.905% 2.773% (2 way, 16 line)
> >>> 2048 13.588% 3.319% (2 way, 16 line)
> >>> 1024 16.022% 3.229% (2 way, 16 line)
> >>>
> >>> 262144 0.905% 0.630% (1 way, 32 line)
> >>> 131072 1.326% 0.912% (1 way, 32 line)
> >>> 65536 1.976% 1.160% (1 way, 32 line)
> >>> 32768 2.712% 1.504% (1 way, 32 line)
> >>> 16384 5.964% 4.456% (1 way, 32 line)
> >>> 8192 8.728% 6.671% (1 way, 32 line)
> >>> 4096 11.262% 5.347% (1 way, 32 line)
> >>> 2048 14.579% 6.659% (1 way, 32 line)
> >>>
> >>> 262144 0.671% 0.329% (2 way, 32 line)
> >>> 131072 1.148% 0.436% (2 way, 32 line)
> >>> 65536 1.514% 0.348% (2 way, 32 line)
> >>> 32768 2.259% 0.795% (2 way, 32 line)
> >>> 16384 6.041% 4.169% (2 way, 32 line)
> >>> 8192 8.102% 2.335% (2 way, 32 line)
> >>> 4096 10.690% 3.071% (2 way, 32 line)
> >>> 2048 13.354% 3.134% (2 way, 32 line)
> >>>
> >>> 524288 0.442% 0.311% (1 way, 64 line)
> >>> 262144 0.717% 0.561% (1 way, 64 line)
> >>> 131072 1.023% 0.759% (1 way, 64 line)
> >>> 65536 1.661% 1.154% (1 way, 64 line)
> >>> 32768 2.538% 1.818% (1 way, 64 line)
> >>> 16384 6.168% 5.201% (1 way, 64 line)
> >>> 8192 9.279% 7.640% (1 way, 64 line)
> >>> 4096 12.084% 6.163% (1 way, 64 line)
> >>>
> >>> 524288 0.219% 0.079% (2 way, 64 line)
> >>> 262144 0.428% 0.215% (2 way, 64 line)
> >>> 131072 0.697% 0.251% (2 way, 64 line)
> >>> 65536 0.984% 0.298% (2 way, 64 line)
> >>> 32768 1.924% 1.005% (2 way, 64 line)
> >>> 16384 6.169% 4.797% (2 way, 64 line)
> >>> 8192 8.642% 2.983% (2 way, 64 line)
> >>> 4096 10.878% 2.647% (2 way, 64 line)
> >>>
> >>>
> >>> Though, it seems 16B cache lines are no longer a clear winner in this
> >>> case...
> >> I am not sure how you determined this. To do an apples to apples
> >> comparison, as Anton explained, you have to compare the same amount of
> >> total cache. Thus, for example, a 16 byte line at a particular cache
> >> size should be compared against a 32 byte line at twice the cache size.
> >> Otherwise, you can't distinguish between the total cache size effect
> >> and the cache line size effect. To me, it seems like
> >> for equivalent cache sizes, 16B lines are a clear win. This reflects a
> >> higher proportion of temporal locality than spatial locality.
> > <
> > And then there is that bus occupancy thing.........
> I am not modelling the bus or latency in this case, only miss rates and
> similar.
> >>
> >> BTW, with a little more work, you can distinguish spatial locality hits
> >> from temporal locality hits. You have to keep the actual address that
> >> caused the miss, then on subsequent hits, see if they are to the same or
> >> a different address.
> > <
> > We figured out (around 1992) that one could model all cache sizes
> > and associativity levels simultaneously.
> I was running a bunch of instances of the cache in parallel and then
> periodically dumping the output statistics into a log file (once every
> 16-million memory accesses).
>
> Then was using the numbers in the final dump of the log file (before I
> closed the emulator).
>
>
> Annoyingly, I had to hand-edit them into a form where I could load them
> into a spreadsheet for a graph. Hadn't gotten around to writing an
> alternate set of logic that dumps the numbers in CSV form (as, for
> whatever reason, OpenOffice can't import tables correctly where the
> numbers are padded into columns by varying amounts of whitespace, ...).

Dunno about OpenOffice, but the Calc Import dialog for CSV in LibreOffice
(v7.1.8.1) allows both <Tab> and <Space> characters as separators and
has an option to merge adjacent separator characters. It seems to work
the way I think you want.
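Alternatively, a few lines of scripting could reflow the whitespace-padded dump into CSV before import. A minimal sketch, assuming the log lines look like the tables quoted above (size, miss rate, conflict-miss rate, then a parenthesized config); the field names are my guesses, not BGB's actual format:

```python
import csv
import re
import sys

# Matches lines like "131072 2.004% 1.318% (1 way, 16 line)".
# The column meanings (miss rate, conflict rate) are assumed from
# the discussion in this thread.
LINE = re.compile(
    r"(\d+)\s+([\d.]+)%\s+([\d.]+)%\s+\((\d+)\s+way,\s+(\d+)\s+line\)")

def convert(lines, out):
    """Write matching stats lines as CSV rows to the given stream."""
    w = csv.writer(out)
    w.writerow(["size", "miss_pct", "conflict_pct", "ways", "line_bytes"])
    for line in lines:
        m = LINE.search(line)
        if m:
            w.writerow(m.groups())

if __name__ == "__main__":
    convert(sys.stdin, sys.stdout)  # e.g.: python to_csv.py < stats.log
```

Lines that don't match the pattern (log headers, blank lines) are simply skipped, so the raw log can be fed in unedited.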

> >>
> >> One more suggestion. If you are not outputting a trace, but keeping the
> >> statistics "on the fly" in the emulator, this is non-optimal. While
> >> outputting the full trace (of all loads and stores) will slow the
> >> emulator down significantly, you only have to do it once. Then, by
> >> creating another program that just emulates the cache behavior, you can
> >> run lots of tests on the same trace data (different line sizes, cache
> >> sizes, LRU policies, number of ways, etc.) without rerunning the full
> >> simulator. This second program should be quite fast, as it doesn't have
> >> to emulate the whole CPU.
> >>> There is still a hump in the conflict miss estimates, I still suspect
> >>> this is likely due to the smaller caches hitting the limit of the 4-way
> >>> estimator.
> >>>
> >>> Well, either that, or I am not estimating conflict miss rate correctly...
> >> See
> >>
> >>
> >> https://en.wikipedia.org/wiki/Cache_performance_measurement_and_metric#Conflict_misses
> >> --
> >> - Stephen Fuld
> >> (e-mail address disguised to prevent spam)

Thread: Misc: Cache Sizes / Observations, started by BGB on Mon, 28 Feb 2022 (23 replies)