Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Misc: Cache Sizes / Observations
Date: Wed, 2 Mar 2022 12:37:41 -0800
Organization: A noiseless patient Spider
Lines: 189
Message-ID: <svokin$rtp$1@dont-email.me>
References: <svj5qj$19u$1@dont-email.me> <svjak4$9kl$1@dont-email.me>
<svkeq1$phm$1@dont-email.me> <svm7r9$3ec$1@dont-email.me>
<2022Mar2.101217@mips.complang.tuwien.ac.at> <svoa5q$ud$1@dont-email.me>
<2022Mar2.192515@mips.complang.tuwien.ac.at> <svogd7$o9s$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 2 Mar 2022 20:37:44 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="618ac8c3c579c6d3ff5aca798a93b9ac";
logging-data="28601"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+h0SZQeZsWNS+Fqg6nrjRnOjPMp/jWsXY="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.6.1
Cancel-Lock: sha1:dnOludjS/qakjDpTyjyZU3FJNns=
In-Reply-To: <svogd7$o9s$1@dont-email.me>
Content-Language: en-US

On 3/2/2022 11:26 AM, BGB wrote:
> On 3/2/2022 12:25 PM, Anton Ertl wrote:
>> BGB <cr88192@gmail.com> writes:
>>> On 3/2/2022 3:12 AM, Anton Ertl wrote:
>>>> A 512KB cache with 64-byte lines has 8192 cache lines, just like a
>>>> 256KB cache with 32-byte lines and a 128KB cache with 16-byte lines.
>>>> Without spatial locality, I would expect similar miss rates for all of
>>>> them for the same associativity; and given that programs have spatial
>>>> locality, I would expect the larger among these configurations to have
>>>> an advantage.  Are you sure that your cache simulator has no bugs?
>> ...
>>> After fixing this bug:
>>>   131072      2.004%     1.318% (1 way, 16 line)
>>>    65536      2.851%     1.516% (1 way, 16 line)
>>>    32768      3.540%     1.445% (1 way, 16 line)
>>>    16384      6.604%     4.043% (1 way, 16 line)
>>>     8192      9.112%     6.052% (1 way, 16 line)
>>>     4096     11.310%     4.504% (1 way, 16 line)
>>>     2048     14.326%     5.990% (1 way, 16 line)
>>>     1024     17.632%     7.066% (1 way, 16 line)
>>>
>>>   131072      1.966%     0.821% (2 way, 16 line)
>>>    65536      2.550%     0.513% (2 way, 16 line)
>>>    32768      3.303%     0.791% (2 way, 16 line)
>>>    16384      6.779%     3.842% (2 way, 16 line)
>>>     8192      8.484%     1.766% (2 way, 16 line)
>>>     4096     10.905%     2.773% (2 way, 16 line)
>>>     2048     13.588%     3.319% (2 way, 16 line)
>>>     1024     16.022%     3.229% (2 way, 16 line)
>>>
>>>   262144     15.748%    14.750% (1 way, 32 line)
>>>   131072     15.929%    14.382% (1 way, 32 line)
>>>    65536     16.274%    14.057% (1 way, 32 line)
>>>    32768     16.843%    14.122% (1 way, 32 line)
>>>    16384     19.819%    16.567% (1 way, 32 line)
>>>     8192     22.360%    18.319% (1 way, 32 line)
>>>     4096     24.583%    16.422% (1 way, 32 line)
>>>     2048     27.325%    16.513% (1 way, 32 line)
>>>
>>>   262144      2.093%     0.872% (2 way, 32 line)
>>>   131072      2.893%     0.836% (2 way, 32 line)
>>>    65536      3.540%     0.949% (2 way, 32 line)
>>>    32768      4.854%     1.808% (2 way, 32 line)
>>>    16384      8.788%     4.997% (2 way, 32 line)
>>>     8192     11.456%     3.638% (2 way, 32 line)
>>>     4096     14.699%     4.352% (2 way, 32 line)
>>>     2048     17.580%     4.251% (2 way, 32 line)
>>>
>>>   524288     10.825%    10.370% (1 way, 64 line)
>>>   262144     11.019%    10.412% (1 way, 64 line)
>>>   131072     11.188%    10.232% (1 way, 64 line)
>>>    65536     11.652%    10.320% (1 way, 64 line)
>>>    32768     12.388%    10.699% (1 way, 64 line)
>>>    16384     15.806%    13.581% (1 way, 64 line)
>>>     8192     18.753%    15.604% (1 way, 64 line)
>>>     4096     21.158%    13.259% (1 way, 64 line)
>>>
>>>   524288      0.863%     0.439% (2 way, 64 line)
>>>   262144      1.297%     0.535% (2 way, 64 line)
>>>   131072      1.896%     0.700% (2 way, 64 line)
>>>    65536      2.505%     0.979% (2 way, 64 line)
>>>    32768      3.876%     1.923% (2 way, 64 line)
>>>    16384      8.341%     5.535% (2 way, 64 line)
>>>     8192     11.346%     3.956% (2 way, 64 line)
>>>     4096     14.262%     3.722% (2 way, 64 line)
>>>
>>>
>>> It would appear that 1-way still does poorly with larger cache lines,
>>
>> Probably still a bug, see above.
>>
>>> As for why now 1-way 32B appears to be doing worse than 1-way 64B, no
>>> idea, this doesn't really make sense.
>>
>> For the same number of cache lines, that's expected, due to spatial
>> locality.
>>
>
> After posting this, I did find another bug:
> The cache index pairs were always being calculated as-if the cache line
> were 16B. After fixing this bug, all the lines got much closer together
> (as shown on another graph posted to Twitter after the last graph).
>
> In this case, now the overall hit/miss ratio is more consistent between
> cache-line sizes. The main difference is that larger cache-line sizes
> still have a higher proportion of conflict misses.
>
>
> Values following the more recent bugfix:
>  131072      2.004%     1.318% (1 way, 16 line)
>   65536      2.851%     1.516% (1 way, 16 line)
>   32768      3.540%     1.445% (1 way, 16 line)
>   16384      6.604%     4.043% (1 way, 16 line)
>    8192      9.112%     6.052% (1 way, 16 line)
>    4096     11.310%     4.504% (1 way, 16 line)
>    2048     14.326%     5.990% (1 way, 16 line)
>    1024     17.632%     7.066% (1 way, 16 line)
>
>  131072      1.966%     0.821% (2 way, 16 line)
>   65536      2.550%     0.513% (2 way, 16 line)
>   32768      3.303%     0.791% (2 way, 16 line)
>   16384      6.779%     3.842% (2 way, 16 line)
>    8192      8.484%     1.766% (2 way, 16 line)
>    4096     10.905%     2.773% (2 way, 16 line)
>    2048     13.588%     3.319% (2 way, 16 line)
>    1024     16.022%     3.229% (2 way, 16 line)
>
>  262144      0.905%     0.630% (1 way, 32 line)
>  131072      1.326%     0.912% (1 way, 32 line)
>   65536      1.976%     1.160% (1 way, 32 line)
>   32768      2.712%     1.504% (1 way, 32 line)
>   16384      5.964%     4.456% (1 way, 32 line)
>    8192      8.728%     6.671% (1 way, 32 line)
>    4096     11.262%     5.347% (1 way, 32 line)
>    2048     14.579%     6.659% (1 way, 32 line)
>
>  262144      0.671%     0.329% (2 way, 32 line)
>  131072      1.148%     0.436% (2 way, 32 line)
>   65536      1.514%     0.348% (2 way, 32 line)
>   32768      2.259%     0.795% (2 way, 32 line)
>   16384      6.041%     4.169% (2 way, 32 line)
>    8192      8.102%     2.335% (2 way, 32 line)
>    4096     10.690%     3.071% (2 way, 32 line)
>    2048     13.354%     3.134% (2 way, 32 line)
>
>  524288      0.442%     0.311% (1 way, 64 line)
>  262144      0.717%     0.561% (1 way, 64 line)
>  131072      1.023%     0.759% (1 way, 64 line)
>   65536      1.661%     1.154% (1 way, 64 line)
>   32768      2.538%     1.818% (1 way, 64 line)
>   16384      6.168%     5.201% (1 way, 64 line)
>    8192      9.279%     7.640% (1 way, 64 line)
>    4096     12.084%     6.163% (1 way, 64 line)
>
>  524288      0.219%     0.079% (2 way, 64 line)
>  262144      0.428%     0.215% (2 way, 64 line)
>  131072      0.697%     0.251% (2 way, 64 line)
>   65536      0.984%     0.298% (2 way, 64 line)
>   32768      1.924%     1.005% (2 way, 64 line)
>   16384      6.169%     4.797% (2 way, 64 line)
>    8192      8.642%     2.983% (2 way, 64 line)
>    4096     10.878%     2.647% (2 way, 64 line)
>
>
> Though, it seems 16B cache lines are no longer a clear winner in this
> case...

I am not sure how you determined this. To do an apples-to-apples
comparison, as Anton explained, you have to compare the same amount of
total cache. Thus, for example, a 16 byte line at a particular cache
size should be compared against a 32 byte line at twice the cache size.
Otherwise, you can't distinguish between the total cache size effect
and the cache line size effect. To me, it seems that for equivalent
cache sizes, 16B lines are a clear win. This reflects a higher
proportion of temporal locality than spatial locality.

BTW, with a little more work, you can distinguish spatial locality hits
from temporal locality hits. You have to keep the actual address that
caused the miss; then, on subsequent hits, see whether they are to the
same address or a different one within the same line.
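A minimal sketch of that bookkeeping might look like the following (a hypothetical direct-mapped cache with made-up geometry, not BGB's actual simulator): each line remembers the address whose miss filled it, so a later hit to that exact address counts as temporal reuse, while a hit to any other address in the line counts as spatial reuse.

```python
# Hypothetical sketch: classify hits as temporal vs. spatial by keeping,
# per cache line, the address that caused the filling miss.
# Direct-mapped, 16-byte lines, 16 KB total -- all assumed for illustration.

LINE_SIZE = 16
NUM_LINES = 1024  # 16 KB direct-mapped

class Line:
    def __init__(self):
        self.valid = False
        self.tag = 0
        self.miss_addr = 0  # the address whose miss filled this line

lines = [Line() for _ in range(NUM_LINES)]
stats = {"miss": 0, "temporal": 0, "spatial": 0}

def access(addr):
    index = (addr // LINE_SIZE) % NUM_LINES
    tag = addr // (LINE_SIZE * NUM_LINES)
    line = lines[index]
    if line.valid and line.tag == tag:
        if addr == line.miss_addr:
            stats["temporal"] += 1  # same address as the filling miss
        else:
            stats["spatial"] += 1   # different address, same line
    else:
        stats["miss"] += 1
        line.valid, line.tag, line.miss_addr = True, tag, addr

for a in [0x1000, 0x1000, 0x1008, 0x2000]:
    access(a)
# 0x1000 misses; 0x1000 again is a temporal hit;
# 0x1008 is a spatial hit (same 16B line); 0x2000 misses.
```

In a real simulator one would probably track this per-byte or per-word rather than only against the single filling address, but the idea is the same.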

One more suggestion. If you are not outputting a trace, but keeping the
statistics "on the fly" in the emulator, this is suboptimal. Yes,
outputting the full trace (of all loads and stores) will slow the
emulator down significantly, but you only have to do it once. Then, by
writing another program that just simulates the cache behavior, you can
run lots of tests on the same trace data (different line sizes, cache
sizes, LRU policies, numbers of ways, etc.) without rerunning the full
simulator. This second program should be quite fast, as it doesn't have
to emulate the whole CPU.
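The standalone replayer in that two-program split might look like the sketch below (geometry parameters and the toy trace are invented for illustration; the "trace" would really come from the file the emulator dumped once):

```python
# Hypothetical trace replayer: assume the emulator has already dumped one
# address per load/store. This small program replays that trace against
# any cache geometry without re-running the full CPU emulation.

def simulate(trace, cache_bytes, line_size, ways):
    """Replay an address trace; return (hits, misses) for an LRU cache."""
    num_sets = cache_bytes // (line_size * ways)
    sets = [[] for _ in range(num_sets)]  # each set: tags in LRU order
    hits = misses = 0
    for addr in trace:
        block = addr // line_size
        s = sets[block % num_sets]
        tag = block // num_sets
        if tag in s:
            hits += 1
            s.remove(tag)   # refresh: tag moves to the MRU end below
        else:
            misses += 1
            if len(s) >= ways:
                s.pop(0)    # evict the LRU tag
        s.append(tag)
    return hits, misses

# Two addresses that conflict in a 1 KB direct-mapped cache but
# coexist in a 2-way cache of the same total size:
trace = [0x0, 0x400, 0x0, 0x400]
print(simulate(trace, 1024, 16, 1))  # direct-mapped thrashes: (0, 4)
print(simulate(trace, 1024, 16, 2))  # 2-way keeps both lines: (2, 2)
```

Because the inner loop is just an index computation and a short list scan, sweeping dozens of (size, line, ways) combinations over one trace is cheap compared with re-running the emulator each time.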

> There is still a hump in the conflict miss estimates, I still suspect
> this is likely due to the smaller caches hitting the limit of the 4-way
> estimator.
>
> Well, either that, or I am not estimating conflict miss rate correctly...

See

https://en.wikipedia.org/wiki/Cache_performance_measurement_and_metric#Conflict_misses
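The estimate described there can be sketched as follows: replay the same trace against a fully associative LRU cache of equal total size, and count as conflict misses whatever the set-associative cache misses beyond that (the fully associative run sees only compulsory and capacity misses). The trace and sizes below are made up to show the arithmetic:

```python
# Sketch of the conflict-miss estimate: misses(set-assoc) minus
# misses(fully associative, same total size) = conflict misses.

def count_misses(trace, num_lines, line_size, ways):
    num_sets = num_lines // ways
    sets = [[] for _ in range(num_sets)]  # tags in LRU order per set
    total = 0
    for addr in trace:
        block = addr // line_size
        s = sets[block % num_sets]
        tag = block // num_sets
        if tag in s:
            s.remove(tag)                 # refresh LRU position
        else:
            total += 1
            if len(s) >= ways:
                s.pop(0)                  # evict LRU
        s.append(tag)
    return total

trace = [0x0, 0x400, 0x0, 0x400, 0x0]
direct = count_misses(trace, 64, 16, 1)   # 1 KB direct-mapped: 5 misses
full = count_misses(trace, 64, 16, 64)    # same 1 KB, fully assoc: 2 misses
conflict = direct - full                  # 3 conflict misses on this trace
```

This is only an estimate: with LRU replacement the difference can occasionally go negative on pathological traces, which may also explain odd humps in a conflict-miss estimator.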

--
- Stephen Fuld
(e-mail address disguised to prevent spam)
