
Re: Embedding a Checksum in an Image File

From: blockedo...@foo.invalid (Don Y)
Newsgroups: comp.arch.embedded
Subject: Re: Embedding a Checksum in an Image File
Date: Wed, 3 May 2023 00:15:27 -0700
Organization: A noiseless patient Spider
Lines: 426
Message-ID: <u2t1mh$16mg5$1@dont-email.me>
References: <116ff07e-5e25-469a-90a0-9474108aadd3n@googlegroups.com>
<pvl24i57aef4vc7bbdk9mvj7sic9dsh64t@4ax.com>
<f0afa198-e735-4da1-a16a-82764af3de4dn@googlegroups.com>
<36534il81ipvnhog6980r9ln9tdqn5cbh6@4ax.com> <u1t7eb$10gmu$3@dont-email.me>
<u1tpcr$1377f$1@dont-email.me> <u1tsm3$13ook$1@dont-email.me>
<u1u7ro$2poss$1@dont-email.me> <u1v9q8$2v4d5$1@dont-email.me>
<u20sli$3ag7l$1@dont-email.me> <u25bas$8fh0$2@dont-email.me>
<u2646q$clg2$1@dont-email.me>
In-Reply-To: <u2646q$clg2$1@dont-email.me>
 by: Don Y - Wed, 3 May 2023 07:15 UTC

On 4/24/2023 7:37 AM, David Brown wrote:
> On 24/04/2023 09:32, Don Y wrote:
>> On 4/22/2023 7:57 AM, David Brown wrote:
>>>>> However, in almost every case where CRC's might be useful, you have
>>>>> additional checks of the sanity of the data, and an all-zero or all-one
>>>>> data block would be rejected.  For example, Ethernet packets use CRC for
>>>>> integrity checking, but an attempt to send a packet type 0 from MAC
>>>>> address 00:00:00:00:00:00 to address 00:00:00:00:00:00, of length 0, would
>>>>> be rejected anyway.
>>>>
>>>> Why look at "data" -- which may be suspect -- and *then* check its CRC?
>>>> Run the CRC first.  If it fails, decide how you are going to proceed
>>>> or recover.
>>>
>>> That is usually the order, yes.  Sometimes you want "fail fast", such as
>>> dropping a packet that was not addressed to you (it doesn't matter if it was
>>> received correctly but for someone else, or it was addressed to you but the
>>> receiver address was corrupted - you are dropping the packet either way).
>>> But usually you will run the CRC then look at the data.
>>>
>>> But the order doesn't matter - either way, you are still checking for valid
>>> data, and if the data is invalid, it does not matter if the CRC only passed
>>> by luck or by all zeros.
>>
>> You're assuming the CRC is supposed to *vouch* for the data.
>> The CRC can be there simply to vouch for the *transport* of a
>> datagram.
>
> I am assuming that the CRC is there to determine the integrity of the data in
> the face of possible unintentional errors.  That's what CRC checks are for.
> They have nothing to do with the content of the data, or the type of the data
> package or image.

Exactly. And, a CRC on *a* protocol can use ANY ALGORITHM that the protocol
defines -- not some canned, one-size-fits-all approach.
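
To make that concrete, a rough sketch in C (the engine is the garden-variety
reflected CRC-32; the struct, function and parameter-set names -- and the V2
values -- are mine, purely for illustration; V1 happens to use the customary
init/xor-out).  The same dozen-line routine serves every interface; each
protocol *version* just supplies its own parameters:

#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t init;     /* version-specific initial value          */
    uint32_t xor_out;  /* version-specific final XOR ("bias")     */
} crc_params;

/* Bit-wise, reflected CRC-32 (poly 0xEDB88320) -- small, not fast. */
static uint32_t crc32_with(const crc_params *p,
                           const uint8_t *buf, size_t len)
{
    uint32_t crc = p->init;
    while (len--) {
        crc ^= *buf++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return crc ^ p->xor_out;
}

/* Two hypothetical versions of one interface, differing only in
   their initial value.                                            */
static const crc_params PROTO_XYZ_V1 = { 0xFFFFFFFFu, 0xFFFFFFFFu };
static const crc_params PROTO_XYZ_V2 = { 0xA5A5A5A5u, 0xFFFFFFFFu };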

> As an example of the use of CRC's in messaging, look at Ethernet frames:
>
> <https://en.wikipedia.org/wiki/Ethernet_frame>
>
> The CRC  does not care about the content of the data it protects.

AND, if the packet yielded an incorrect CRC, you can assume the
data was corrupt... OR, you are looking at a different protocol
and MISTAKING it for something that you *think* it might be.

If I produce a stream of data, can you tell me what the checksum
for THAT stream *should* be? You have to either be told what
it is (and have a way of knowing what the checksum SHOULD be)
*or* have to make some assumptions about it.

If you have assumed wrong *or* if the data has been corrupted, then
the CRC should fail. You don't care why it failed -- because you
can't do anything about it. You just know that you can't use the data
in the way you THOUGHT it could be used.
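
On the receiving end, all you get is a yes/no answer.  A sketch, reusing
crc32_with() and the made-up PROTO_XYZ_V1 parameters from above:

#include <stdbool.h>
#include <string.h>

/* Why the check failed (corruption? wrong protocol? wrong version?) is
   unknowable here -- and irrelevant.  The frame simply isn't usable.   */
bool frame_usable(const uint8_t *frame, size_t len)
{
    if (len < 4)
        return false;

    uint32_t rx_crc;
    memcpy(&rx_crc, frame + len - 4, sizeof rx_crc);  /* trailer; wire
                                                         order assumed */
    return crc32_with(&PROTO_XYZ_V1, frame, len - 4) == rx_crc;
}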

>> So, use a version-specific CRC on the packet.  If it fails, then
>> either the data in the packet has been corrupted (which could just
>> as easily have involved an embedded "interface version" parameter);
>> or the packet was formed with the wrong CRC.
>>
>> If the CRC is correct FOR THAT VERSION OF THE PROTOCOL, then
>> why bother looking at a "protocol version" parameter?  Would
>> you ALSO want to verify all the rest of the parameters?
>
> I'm sorry, I simply cannot see your point.  Identifying the version of a
> protocol, or other protocol type information, is a totally orthogonal task to
> ensuring the integrity of the data.  The concepts should be handled separately.

It is. A packet using protocol XYZ is delivered to port ABC.
Port ABC *only* handles protocol XYZ. Anything else arriving there,
with a potentially different checksum, is invalid. Even if, for example,
byte number 27 happens to have the correct "magic number" for that
protocol.

Because the message doesn't obey the rules defined by the protocol
FOR THAT PORT. What do I gain by insisting that byte number 27 must
be 0x5A that the CRC doesn't already tell me?

You are assuming the CRC has to identify the protocol. I didn't say that.
All I said was the CRC has to be correct for THAT protocol.

You likely don't use the same algorithm to compute the checksum of
a boot image as you do to verify the integrity of an Ethernet datagram.
So, if you were presented with a stream of data, you wouldn't
arbitrarily decide to try different CRCs to see which yielded correct
results and, from that, *infer* the nature of the message.

Why would you think I wouldn't expect *a* particular protocol to use
a particular CRC?

>>>> What term would you have me use to indicate a "bias" applied to a CRC
>>>> algorithm?
>>>
>>> Well, first I'd note that any kind of modification to the basic CRC
>>> algorithm is pointless from the viewpoint of its use as an integrity check.
>>> (There have been, mostly historically, some justifications in terms of
>>> implementation efficiency.  For example, bit and byte re-ordering could be
>>> done to suit hardware bit-wise implementations.)
>>>
>>> Otherwise I'd say you are picking a specific initial value if that is what
>>> you are doing, or modifying the final value (inverting it or xor'ing it with
>>> a fixed value).  There is, AFAIK, no specific terms for these - and I don't
>>> see any benefit in having one.  Misusing the term "salt" from cryptography
>>> is certainly not helpful.
>>
>> Salt just ensures that you can differentiate between functionally identical
>> values.  I.e., in a CRC, it differentiates between the "0x0000" that CRC-1
>> generates from the "0x0000" that CRC-2 generates.
>
> Can we agree that this is called an "initial value", not "salt" ?

It depends on how you implement it. The point is to produce
different results for the same polynomial.
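
Using the made-up parameter sets from the sketch further up:

/* Builds on crc32_with() and the two parameter sets above. */
void demo_same_data_different_bias(void)
{
    const uint8_t msg[3] = { 0x01, 0x02, 0x03 };

    uint32_t a = crc32_with(&PROTO_XYZ_V1, msg, sizeof msg);
    uint32_t b = crc32_with(&PROTO_XYZ_V2, msg, sizeof msg);

    /* Same payload, same polynomial, different initial value: the two
       results differ, so a frame built for one interface version fails
       the check when presented to the other.                            */
    (void)a; (void)b;
}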

>> You don't see the parallel to ensuring that *my* use of "Passw0rd" is
>> encoded in a different manner than *your* use of "Passw0rd"?
>
> No.  They are different things.
>
> An important difference is that adding "salt" to a password hash is an
> important security feature.  Picking a different initial value for a CRC
> instead of having appropriate protocol versioning in the data (or a surrounding
> envelope) is a misfeature.

And you don't see verifying that a packet of data received at port ABC
carries the checksum associated with protocol XYZ -- the only protocol
that port should ever see -- as being similarly related?

Why not just assume the lower level protocols are sufficient to
guarantee reliable delivery and, if something arrives at port ABC
then, by definition, it must be intact (not corrupt) and, as
nothing other than protocol XYZ *should* target that port, why
even bother checking magic numbers in a protocol packet?

You build these *superfluous* tests into products to ensure their
integrity -- by catching ANYTHING that "can't happen" (yet
somehow does).

> The second difference is the purpose of the hashing.  The CRC here is for data
> integrity - spotting mistakes in the data during transfer or storage.  The hash
> in a password is for security, avoiding the password ever being transmitted or
> stored in plain text.
>
> Any coincidence in the way these might be implemented is just that -
> coincidence.
>
>>>> See the RMI description.
>>>
>>> I'm sorry, I have no idea what "RMI" is or where it is described. You've
>>> mentioned that abbreviation twice, but I can't figure it out.
>>
>> <https://en.wikipedia.org/wiki/RMI>
>> <https://en.wikipedia.org/wiki/OCL>
>>
>> Nothing magical with either term.
>
> I looked up RMI on Wikipedia before asking, and saw nothing of relevance to
> CRC's or checksums.

How do you think the marshalled arguments get from device A to (remote)
device B? And, the result(s) from device B back to device A?

Obviously, *some* form of communication medium. So, some potential for
data to be corrupted (or altered!) in transit -- along with other
data streams competing for those endpoints.

Imagine invoking a function and, between the actual construction of the
stack frame and the first line of code in the targeted function, "something"
can interfere with the data you're trying to pass (and results you're
hoping to eventually receive) as well as the actual function being targeted!

You don't worry about this because the compiler handles all of the machinery
AND it relies on the CPU being well-behaved; nothing can sneak in and
disturb the address/data busses or alter register contents during this
process.

If, OTOH, such a possibility existed (as is the case with RPC/RMI), then
you would want the compiler to generate the machinery to ensure the
arguments get to the correct function and for the function to be able to
ensure that the arguments are actually intended for it.

If any of these things failed to happen, you'd panic() -- because there's
nothing you can do, at that point. You certainly can't fix any corrupted
values and can't deduce where they were intended to go (given that all
of that information can be just as corrupt).

With RPC/RMI, you can at least *know* that the "function linkage" failed
to operate as expected ON THIS INVOCATION. Because the RPC/RMI can
return a result indicating whether the linkage was intact *and*, if
so, the result of the actual function invocation.

If you deliver every packet to a single port, then the process listening
to that port has to demultiplex incoming messages to determine the server-side
stub to invoke for that message instance. You would likely use a standardized
protocol because you don't know anything about the incoming message -- except
that it is *supposed* to target a "remote procedure" (*local* to this node).

OTOH, if you target each particular remote function/procedure/method to
a function/procedure/method-SPECIFIC port, then how you handle "messages"
for one function need have no bearing on how you handle them for others.
And, you can exploit this as an added test to ensure the message you
are receiving at port JKL actually *appears* to be intended for port
JKL and not an accidental misdirect of a message intended for some
other port.
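
A rough sketch of the server-side shape of that (the names, and the reuse
of crc32_with()/PROTO_XYZ_V2 from the earlier sketch, are mine, just to
illustrate the idea; the generated stubs obviously look different):

#include <stdint.h>
#include <string.h>

typedef enum { RMI_OK, RMI_LINKAGE_FAILED } rmi_status;

/* Placeholder for the real unmarshal-and-invoke step (hypothetical). */
extern int32_t invoke_method_JKL(const uint8_t *args, size_t len);

rmi_status handle_port_JKL(const uint8_t *frame, size_t len,
                           int32_t *result)
{
    if (len < 4)
        return RMI_LINKAGE_FAILED;

    uint32_t rx_crc;
    memcpy(&rx_crc, frame + len - 4, sizeof rx_crc);  /* wire order assumed */

    /* Version-specific parameters: corruption, a stale client stub, or a
       misdirected message all land here -- indistinguishably.            */
    if (crc32_with(&PROTO_XYZ_V2, frame, len - 4) != rx_crc)
        return RMI_LINKAGE_FAILED;

    /* Arguments were validated client-side by the generated stub; no need
       to re-check them (or a "version" field) here.                       */
    *result = invoke_method_JKL(frame, len - 4);
    return RMI_OK;
}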

> I noticed no mention of "OCL" in your posts, and looking

You need to read more carefully.

---8<---8<---
>>>> I can't think of any use-cases where you would be passing around a block of
>>>> "pure" data that could reasonably take absolutely any value, without any
>>>> type of "envelope" information, and where you would think a CRC check is
>>>> appropriate.
>>>
>>> I append a *version specific* CRC to each packet of marshalled data
>>> in my RMIs. If the data is corrupted in transit *or* if the
>>> wrong version API ends up targeted, the operation will abend
>>> because we know the data "isn't right".
>>
>> Using a version-specific CRC sounds silly. Put the version information in
>> the packet.
>
> The packet routed to a particular interface is *supposed* to
> conform to "version X" of an interface. There are different stubs
> generated for different versions of EACH interface. The OCL for
> the interface defines (and is used to check) the form of that
> interface to that service/mechanism.
>
> The parameters are checked on the client side -- why tie up the
> transport medium with data that is inappropriate (redundant)
> to THAT interface? Why tie up the server verifying that data?
> The stub generator can perform all of those checks automatically
> and CONSISTENTLY based on the OCL definition of that version
> of that interface (because developers make mistakes).
>
> So, at the instant you schedule the marshalled data for transmission,
> you *know* the parameters are "appropriate" and compliant with
> the constraints of THAT version of THAT interface.
>
> Now, you have to ensure the packet doesn't get corrupted (altered) in
> transmission. If it remains intact, then there is no need to check
> the parameters on the server side.
>
> NONE OF THE PARAMETERS... including the (implied) "interface version" field!
>
> Yet, folks make mistakes. So, you want some additional reassurance
> that this is at least intended for this version of the interface,
> ESPECIALLY IF THAT CAN BE MADE AVAILABLE FOR ZERO COST (i.e., check
> to see if the residual is 0xDEADBEEF instead of 0xB16B00B5).
>
> Why burden the packet with a "protocol version" parameter?
---8<---8<---

> it up on Wikipedia gives no clues.

As I said, above:

"If, OTOH, such a possibility existed (as is the case with RPC/RMI),
then you would want the compiler to generate the machinery to ensure
the arguments get to the correct function and for the function to be
able to ensure that the arguments are actually intended for it."

You would want the IDL (Interface Definition Language) compiler to
generate stubs (client- and server-side) that enforced the constraints
specified in the IDL and OCL.

Again, in a perfect world, you'd not need any of these mechanisms.
Data wouldn't be corrupted on the wire. Hostiles wouldn't try to
subvert those messages. Developers would always ensure they
adhered to the contracts laid out for each API. etc.

"Yet, folks make mistakes."

> So for now, I'll assume you don't want anyone to know what you meant and I can
> safely ignore anything you write in connection with the terms.

Perhaps other folks were more careful in their reading (of the quoted passage,
above).

>>>> OTOH, "salting" the calculation so that it is expected to yield
>>>> a value of 0x13 means *those* situations will be flagged as errors
>>>> (and a different set of situations will sneak by, undetected).
>>>
>>> And that gives you exactly /zero/ benefit.
>>
>> See above.
>
> I did.  Zero benefit.

Perhaps your reading was as deficient there as you've admitted it to
be elsewhere?

> Actually, it is worse than useless - it makes it harder to identify the
> protocol, and reduces the information content of the CRC check.
>
>>> You run your hash algorithm, and check for the single value that indicates
>>> no errors.  It does not matter if that number is 0, 0x13, or - often more
>> -----------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> As you've admitted, it doesn't matter.  So, why wouldn't I opt to have
>> an algorithm for THIS interface give me a result that is EXPECTED
>> for this protocol?  What value picking "0"?
>
> A /single/ result does not matter (other than needlessly complicating things).
> Having multiple different valid results /does/ matter.

For any CRC calculation instance, you *know* what the result is expected to be.
How many different "check" algorithms do you think are operating in your
PC as you type/read (i.e., all of the protocols between devices running
in the box, all of the ROMs in those devices, the media accessed by them,
etc.)? Has EVERY developer who needed a CRC settled on the "Holy Grail"
of CRCs... because it's easiest? Or, have they each chosen schemes that
they consider appropriate to their needs?

I compute hashes of individual memory pages during reschedule()s.
And, verify that they are intact when next accessed (because they
may have been corrupted by a side-channel attack while not
actively being accessed -- by the owning task -- despite the
protections afforded by the MMU). Should I use the same "check"
algorithm that I do when sending a message to another node?
Or, that I use on the wire?

Should I use the same algorithm when checking 4K pages as I would
when checking 16MB pages? The goal isn't to *correct* errors, so
I'd want one that detects the greatest number of errors LIKELY
INDUCED BY SUCH AN ATTACK (which can differ from the types of
*burst* errors that corrupt packets on the wire or lead to
read/write disturb errors in FLASH...).
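
The *shape* of the page check is nothing exotic -- a sketch (names made
up; crc32_with() is just a stand-in here, since picking the right
algorithm for the threat model is the whole point of the above):

#define PAGE_SIZE 4096u

/* Hypothetical per-use-case parameters for the page check. */
static const crc_params PAGE_CHECK = { 0xC0FFEE00u, 0x00000000u };

struct page_guard {
    uint32_t expected;  /* recorded when the owning task is switched out */
};

/* Called from reschedule(): seal each resident page of the outgoing task. */
void page_seal(struct page_guard *g, const uint8_t *page)
{
    g->expected = crc32_with(&PAGE_CHECK, page, PAGE_SIZE);
}

/* Called before the owning task touches the page again: a mismatch means
   the page was disturbed while that task wasn't running.                  */
int page_verify(const struct page_guard *g, const uint8_t *page)
{
    return crc32_with(&PAGE_CHECK, page, PAGE_SIZE) == g->expected;
}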

As I said, up-thread: "... you don't just use CRCs (secure hashes, etc.)
on 'code images'"

>>>>> That is why you need to distinguish between the two possibilities. If you
>>>>> don't have to worry about malicious attacks, a 32-bit CRC takes a dozen
>>>>> lines of C code and a 1 KB table, all running extremely efficiently.  If
>>>>> security is an issue, you need digital signatures - an RSA-based signature
>>>>> system is orders of magnitude more effort in both development time and in
>>>>> run time.
>>>>
>>>> It's considerably more expensive AND not fool-proof -- esp if the
>>>> attacker knows you are signing binaries.  "OK, now I need to find
>>>> WHERE the signature is verified and just patch that "CALL" out
>>>> of the code".
>>>
>>> I'm not sure if that is a straw-man argument, or just showing your ignorance
>>> of the topic.  Do you really think security checks are done by the program
>>> you are trying to send securely?  That would be like trying to have building
>>> security where people entering the building look at their own security cards.
>>
>> Do YOU really think we all design applications that run in PCs where some
>> CLOSED OS performs these tests in a manner that can't be subverted?
>
> Do you bother to read my posts at all?  Or do you prefer to make up things that
> you imagine I write, so that you can make nonsensical attacks on them?
> Certainly there is no sane reading of my posts (written and sent from an /open/
> OS) where "do not rely on security by obscurity" could be taken to mean "rely
> on obscured and closed platforms".

"Do you really think security checks are done by the program you are trying
to send securely? That would be like trying to have building security where
people entering the building look at their own security cards."

Who *else* is involved in the acceptance/verification of a code image
in an embedded product? (Not all "run Linux")

>> *WE* (tend to) write ALL the code in the products developed, here.
>> So, whether it's the POST WE wrote that is performing the test or
>> the loader WE wrote, it's still *our* program.
>>
>> Yes, we ARE looking at our own security cards!
>>
>> Manufacturers *try* to hide ("obscurity") details of these mechanisms
>> in an attempt to improve effective security.  But, there's nothing
>> that makes these guarantees.
>
> Why are you trying to "persuade" me that manufacturer obscurity is a bad
> thing?  You have been promoting obscurity of algorithms as though it were
> helpful for security - I have made clear that it is not.  Are you getting your
> own position mixed up with mine?

If the manufacturer saw no benefit to obscurity, then why embrace it?

>> Give me the sources for Windows (Linux, *BSD, etc.) and I can
>> subvert all the state-of-the-art digital signing used to ensure
>> binaries aren't altered.  Nothing *outside* the box is involved
>> so, by definition, everything I need has to reside *in* the box.
>
> No, you can't.  The sources for Linux and *BSD /are/ all freely available.  The
> private signing keys used by, for example, Red Hat or Debian, are /not/ freely
> available.  You cannot make changes to a Red Hat or Debian package that will
> pass the security checks - you are unable to sign the packages.

Sure I can! If you are just signing a package to verify that it hasn't
been tampered with BUT THE CONTENTS ARE NOT ENCRYPTED, then all you have
to do is remove the signature check -- leaving the signature in the
(unchecked) executable.

This is different than *encrypting* the package (the OP said nothing
about encrypting his executable).
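
To spell that out with a toy loader (illustrative only -- the prototypes
are made up, not any real bootloader's API): the image is signed but not
encrypted, so the only thing standing between a tampered image and
execution is one branch in the loader.

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical loader-side primitives. */
extern bool signature_ok(const uint8_t *img, size_t len,
                         const uint8_t *sig, const uint8_t *pubkey);
extern void refuse_to_boot(void);
extern void jump_to_image(const uint8_t *img);

void boot(const uint8_t *img, size_t len,
          const uint8_t *sig, const uint8_t *pubkey)
{
    /* Patch this one test out of the loader binary (or NOP the call) and
       an altered, unencrypted image boots just fine -- the signature,
       still sitting untouched in the image, is never consulted.          */
    if (!signature_ok(img, len, sig, pubkey))
        refuse_to_boot();

    jump_to_image(img);
}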

> This is precisely because something /outside/ the box /is/ involved - the
> private half of the public/private key used for signing.  The public half - and
> all the details of the algorithms - is easily available to let people verify
> the signature, but the private half is kept secret.

And, if I eliminate the check that verifies the signature, then what
value signing? "Yes, I assume the risk of running an allegedly signed
executable (THAT MAY HAVE BEEN TAMPERED WITH)."

> (Sorry, but I've skipped and snipped the rest.  I simply don't have time to go
> through it in detail.  If others find it useful or interesting, that's great,
> but there has to be limits somewhere.)

The limits seem to be in your imagination. You believe there's *a* way
of doing things instead of a multitude of ways, each with different
tradeoffs. And, you think you'll always have <whatever> is needed (resources,
time, staff, expertise, etc.) to get exactly those things. The "box"
surrounding you limits what you can see.

Sad in an engineer. But, must be incredibly comforting!

Bye, David.
