

Re: Cache Hierarchy Errors and Bios problems and Cache Size ?

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: nos...@needed.invalid (Paul)
Newsgroups: alt.comp.hardware.pc-homebuilt
Subject: Re: Cache Hierarchy Errors and Bios problems and Cache Size ?
Date: Wed, 31 Aug 2022 05:44:42 -0400
Organization: A noiseless patient Spider
Lines: 72
Message-ID: <tenaib$1pdeh$1@dont-email.me>
References: <e67a5493-dd1e-4cbc-a20f-bc72d41a74e5n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 31 Aug 2022 09:44:43 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="bf4f192891370c34efba4c158dc7fd6f";
logging-data="1881553"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+gRfG8oMCGx/7rzzJEF7ZmkTpTUfOMNPE="
User-Agent: Ratcatcher/2.0.0.25 (Windows/20130802)
Cancel-Lock: sha1:HXwCX+laAN2Xwid2VZsEaNUq0m0=
In-Reply-To: <e67a5493-dd1e-4cbc-a20f-bc72d41a74e5n@googlegroups.com>
Content-Language: en-US

On 8/31/2022 3:55 AM, Skybuck Flying wrote:
> Hellllloooo,
>
> Some alarming information has reached my eyeszzz and I can't say that I am pleaszzzeedd with it hehehe.
>
> The Ryzen chips seem to be affected by Cache Hierarchy Errors and they seem to happen randomly.
>
> A document also crossed my eyezzz where these kinds of errors might be caused by high-energy particles.
>
> Since Ryzen has such a large cache, its surface area is bigger and thus has a higher chance of being hit by a high-energy particle ?!
>
> Could this be the cause of these mysterious Cache Hierarchy Errors, which are plenty on Google ? Or is it indeed caused by some BIOS malfunction which is also rumored to be the reason for the Zen 4 delay.
>
> As a potential buyer of one of these chips ?! How likely is it that these kinds of problems will happen to me and my future chip ?!
>
> Have these problems been combatted and solved by now ?!
>
> Or is it a continuing, lingering problem......
>
> Bye for now,
> Skybuck.
>

https://hardwarecanucks.com/cpu-motherboard/ecc-memory-amds-ryzen-deep-dive/5/

"While the error type might be listed as “Cache Hierarchy Error”, clicking on the
details tab reveals "ErrorType 9" which means that a memory hierarchy error has occurred."

The internals of the processor are protected by ECC on the CPU silicon die,
separate from the ECC subsystem used with the external memory DIMMs. The
issue is how errors from each of these are reported.

ECC memory protection provided by an ECC DIMM is not an end-to-end process.
It's an isolated subsystem. New ECC is calculated inside the CPU, as memory
transactions enter the processor and traverse it. Errors that happen once
the 128 bits of data get inside the CPU would be recorded as Machine Check
Exceptions (MCE) rather than as memory ECC errors. Normally, at least.
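
To make "ECC is calculated" concrete, here is a minimal sketch of a
single-error-correct, double-error-detect (SECDED) code on a 4-bit value,
using an extended Hamming code. It is only an illustration of the
principle. The word sizes and the exact codes used inside a real CPU or on
a DIMM are different and are not described in this post, and the function
names `encode` and `decode` are made up for the example.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    word = [0] * 8                 # index 0: overall parity; 1..7: Hamming(7,4)
    word[3], word[5], word[6], word[7] = d     # data bits
    word[1] = word[3] ^ word[5] ^ word[7]      # parity over positions 1,3,5,7
    word[2] = word[3] ^ word[6] ^ word[7]      # parity over positions 2,3,6,7
    word[4] = word[5] ^ word[6] ^ word[7]      # parity over positions 4,5,6,7
    word[0] = sum(word[1:]) % 2                # make the total parity even
    return word


def decode(word):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos        # XOR of the positions of all set bits
    odd_parity = sum(word) % 2     # 1 if an odd number of bits flipped
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        word[syndrome] ^= 1        # single flip; syndrome 0 means the parity bit
        status = "corrected"
    else:
        return "uncorrectable", None   # even parity, nonzero syndrome: two flips
    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return status, nibble


if __name__ == "__main__":
    codeword = encode(0b1011)
    codeword[6] ^= 1               # simulate one flipped bit
    print(decode(codeword))
    codeword[2] ^= 1               # a second flip can only be detected
    print(decode(codeword))
```

The split between "corrected" and "uncorrectable" is the same distinction
that error reporting makes between correctable and uncorrectable events.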

Normally, fault injection inside the CPU is not "exposed" as a user test
feature. I don't know of a way to generate an actual honest MCE using just
code. Presumably there is a fault injection register hiding in there
somewhere, so injection is not limited to a scan chain. Features like this
are necessary to actually verify that the reporting system in Windows or
Linux is working.
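
On the Linux side, one place to see what has been reported is the EDAC
sysfs tree. The sketch below reads the correctable and uncorrectable error
counters per memory controller. It assumes an EDAC driver is loaded for
the platform (otherwise the directory is simply absent), it does not
inject anything, and it only covers memory controller errors. Errors
inside the CPU would normally show up as machine check entries in the
kernel log instead. The function name `read_edac_counts` is made up for
the example.

```python
from pathlib import Path

# Standard location of the Linux EDAC memory controller tree.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: {'ce_count': int|None, 'ue_count': int|None}}."""
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts              # no EDAC driver loaded, or not Linux
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None  # counter missing or unreadable
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("No EDAC memory controllers found")
    for controller, entry in found.items():
        print(controller, entry)
```

A nonzero `ce_count` that keeps climbing is the sort of evidence that is
otherwise hard to get without a fault injection mechanism.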

In the case of the incoming 7000 series with "DDR5 only", the DDR5 scheme
is radically different from DDR4. Each DIMM has two channels on it, and
this will require some changes to how memory bus error detection and
reporting are done.
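
A small arithmetic sketch of the "two channels per DIMM" point, using the
commonly quoted JEDEC figures for non-ECC modules (DDR4: one 64-bit channel
with burst length 8; DDR5: two 32-bit subchannels with burst length 16).
Those figures are brought in for illustration and are not from this post,
and `burst_bytes` is a made-up helper name.

```python
def burst_bytes(data_bits, burst_length):
    """Bytes delivered by one burst on a channel of the given data width."""
    return data_bits * burst_length // 8


# Data widths exclude any ECC bits.
DIMM_LAYOUTS = {
    "DDR4": {"channels_per_dimm": 1, "data_bits": 64, "burst_length": 8},
    "DDR5": {"channels_per_dimm": 2, "data_bits": 32, "burst_length": 16},
}

if __name__ == "__main__":
    for name, layout in DIMM_LAYOUTS.items():
        size = burst_bytes(layout["data_bits"], layout["burst_length"])
        print(name, layout["channels_per_dimm"], "channel(s),",
              size, "bytes per burst per channel")
```

Under these figures each DDR5 subchannel delivers a full 64-byte cache line
on its own, so an error has to be attributed to a subchannel and not just
to a DIMM.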

All I can tell you is that one of my informal indicators of corruption
has NOT been set off on this machine. I'm typing this on a Zen 3 5600G (AM4),
and Thunderbird has not had a corrupted file since I moved stuff over
here from the old Intel machine. The Intel machine was regularly
corrupting stuff, and that was a function of the memory type, DDR2
at the time. A VIA machine (ASRock mobo) was fault-free by comparison.
I went through three sets of RAM on the Intel board, and it did not
help the error floor on my E8400. The Intel system used an X48, and
the external memory protection flavor of ECC had been disabled at
BIOS level.
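
An informal corruption indicator like this can be made a bit more
systematic by hashing files that are not supposed to change and comparing
the hashes later. This is a minimal sketch, not the method used above. The
function names are made up, and a changed hash only says the content
changed, so it is meaningful only for files nobody edits on purpose
(archives, old mail folders, installers).

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old, new):
    """Names present in both manifests whose digests differ."""
    return sorted(name for name in old if name in new and old[name] != new[name])
```

Typical use would be to call `build_manifest` once, store the result (for
example as JSON), and run `changed_files` against a fresh manifest every
few weeks.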

One sign that things are not "normal" on motherboards any more is
the lack of an "ECC scrub" setting in the BIOS. There hasn't been
a sign of an active ECC subsystem in the BIOS screens since, well,
since UEFI was invented. It's all very confusing as to exactly
what the hell they think they are doing, as the above article
attests. The article would be a LOT better if a fault injection
mechanism to test for MCE conditions was known. The picture
the article paints is not a complete picture. It's just a situation
that looks bad.

Paul
