https://www.novabbs.com/computers/article-flat.php?id=504&group=comp.arch.embedded#504

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: blockedo...@foo.invalid (Don Y)
Newsgroups: comp.arch.embedded
Subject: Re: 64-bit embedded computing is here and now
Date: Tue, 8 Jun 2021 16:51:56 -0700
Organization: A noiseless patient Spider
Lines: 252
Message-ID: <s9ovrk$k0b$1@dont-email.me>
References: <7eefb5db-b155-44f8-9aad-7ce25d06c602n@googlegroups.com>
<87lf7kexbp.fsf@nightsong.com> <s9n10p$t4i$1@dont-email.me>
<s9n6rb$t19$1@dont-email.me>
<5e77ea0a-ef41-4f72-a538-4edc9bfff075n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 8 Jun 2021 23:52:20 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="67e6bc2a74bcfde539b04c3da972b629";
logging-data="20491"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Utjg35MbNeIByO8zOJi65"
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
Thunderbird/52.1.1
Cancel-Lock: sha1:45H7Ihltm8PG26LKLol5UHQZfFg=
In-Reply-To: <5e77ea0a-ef41-4f72-a538-4edc9bfff075n@googlegroups.com>
Content-Language: en-US
X-Mozilla-News-Host: news://NNTP.AIOE.org

On 6/8/2021 12:38 PM, James Brakefield wrote:

> |> I contend that a good many "32b" implementations are really glorified
> |> 8/16b applications that exhausted their memory space.
>
> The only thing that will take more than 4GB is video or a day's worth of photos.

That's not true. For example, I rely on a "PC" in my current design
to support the RDBMS. Otherwise, I would have to design a "special
node" (I have a distributed system) that had the resources necessary
to process multiple concurrent queries in a timely fashion; I can
put 100GB of RAM in a PC (whereas my current nodes only have 256MB).

The alternative is to rely on secondary (disk) storage -- which is
even worse!

And "video" is incredibly nondescript. It conjures ideas of STBs.
Instead, I see a wider range of applications in terms of *vision*.

E.g., let your doorbell camera "notice motion", recognize that
motion as indicative of someone/thing approaching it (e.g.,
a visitor), recognize the face/features of the visitor and
alert you to its presence (if desired). No need to involve a
cloud service to do this.

[My "doorbell" is a camera/microphone/speaker. *If* I want to
know that you are present, *it* will tell me. Or, if told to
do so, will grant you access to the house (even in my absence).
For "undesirables", I'm mounting a coin mechanism adjacent to
the entryway (our front door is protected by a gated porch area):
"Deposit 25c to ring bell. If we want to talk to you, your
deposit will be refunded. If *not*, consider that the *cost* of
pestering us!"]
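The "notice motion" front end of such a camera can be sketched very roughly
as frame differencing. This is only an illustration, not the poster's
implementation; the frame format (8-bit grayscale as nested lists), the
thresholds, and all names are invented for the example:

```python
# Minimal sketch of the "notice motion" step an edge camera might run.
# A real node would work on DMA'd camera buffers with SIMD/DSP help;
# here frames are lists of rows of 0..255 brightness values.

def frame_delta(prev, curr, threshold=25):
    """Count pixels whose brightness changed by more than `threshold`."""
    changed = 0
    for row_p, row_c in zip(prev, curr):
        for p, c in zip(row_p, row_c):
            if abs(p - c) > threshold:
                changed += 1
    return changed

def motion_detected(prev, curr, min_fraction=0.01, threshold=25):
    """Flag motion when enough of the frame changed between captures."""
    total = sum(len(row) for row in curr)
    return frame_delta(prev, curr, threshold) > total * min_fraction

# Example: a static scene vs. one with a bright blob "approaching".
still = [[10] * 8 for _ in range(8)]
moved = [row[:] for row in still]
for r in range(2, 5):
    for c in range(2, 5):
        moved[r][c] = 200
```

Recognizing *what* moved (visitor vs. stray cat) is of course where the real
signal-processing cost lives; this only gates that heavier work.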

There are surveillance cameras discreetly placed around the exterior
of the house (don't want the place to look like a frigging *bank*!).
One of them has a clear view of the mailbox (our mail is delivered
by letter carriers riding in mail trucks). Same front door camera
hardware. But, now: detect motion; detect motion STOPPING
proximate to mailbox (for a few seconds or more); detect motion
resuming; signal "mail available". Again, no need to involve a
cloud service to accomplish this. And, when not watching for mail
delivery, it's performing "general" surveillance -- mail detection
is a "free bonus"!
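That detect-stop-resume sequence is a small state machine. A hedged sketch
(state names, dwell threshold, and the sampling interface are all invented
for illustration):

```python
# Mail-detection state machine: motion near the mailbox, motion STOPPING
# for a dwell period, then motion resuming, yields a "mail available"
# signal. One tick() per sample from the motion detector.

IDLE, MOVING, STOPPED = "idle", "moving", "stopped"

class MailWatcher:
    def __init__(self, dwell_seconds=3):
        self.state = IDLE
        self.dwell = dwell_seconds
        self.stopped_for = 0

    def tick(self, motion, dt=1):
        """Feed one sample (motion: True/False near the mailbox).
        Returns True exactly when 'mail available' should be signaled."""
        if self.state == IDLE:
            if motion:
                self.state = MOVING
        elif self.state == MOVING:
            if not motion:
                self.state = STOPPED
                self.stopped_for = 0
        elif self.state == STOPPED:
            if motion:
                # Motion resumed; only a long-enough stop counts as delivery.
                delivered = self.stopped_for >= self.dwell
                self.state = MOVING
                return delivered
            self.stopped_for += dt
        return False
```

A passer-by who never stops (or stops only briefly) falls through without a
signal, which is the point of conditioning on the dwell time.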

Imagine designing a vision-based inspection system where you "train"
the CAMERA -- instead of some box that the camera connects to. And,
the CAMERA signals accept/reject directly.

[I use a boatload of cameras, here; they are cheap sensors -- the
"cost" lies in the signal processing!]

> So there is likely to be some embedded aps that need a > 32-bit address space.
> Cost, size or storage capacity are no longer limiting factors.

No, cost, size, and storage are ALWAYS limiting factors!

E.g., each of my nodes derives power from the wired network connection.
That puts a practical limit of ~12W on what a node can dissipate.
That has to support the processing core plus any local I/Os! Note
that dissipated power == heat. So, one also has to be conscious of
how that heat will affect the devices' environs.

(Yes, there are schemes to increase this to ~100W but now the cost
of providing power -- and BACKUP power -- to a remote device starts
to be a sizeable portion of the product's cost and complexity).
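In numbers: 802.3af leaves about 12.95 W usable at the powered device, which
is where the ~12 W ceiling comes from. A back-of-the-envelope budget check
(the per-subsystem loads and the derating factor are made-up placeholders,
not figures from the design above):

```python
# Back-of-the-envelope power budget for a PoE-fed node.

POE_AF_BUDGET_W = 12.95  # usable at the powered device under 802.3af

def remaining_budget(loads_w, budget_w=POE_AF_BUDGET_W, derating=0.8):
    """Sum subsystem loads and report headroom after derating the
    budget (derating leaves margin for converter loss and heat)."""
    usable = budget_w * derating
    return usable - sum(loads_w.values())

node = {
    "cpu_core": 4.0,   # hypothetical processing core
    "local_io": 3.0,   # sensors/actuators hung off the node
    "radio":    1.5,
}
headroom = remaining_budget(node)   # what's left for everything else
```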

My devices are intended to be "invisible" to the user -- so, they
have to hide *inside* something (most commonly, the walls or
ceiling -- in standard Jboxes for accessibility and Code compliance).
So, that limits their size/volume (mine are about the volume of a
standard duplex receptacle -- 3 cu in -- so fit in even the smallest
of 1G boxes... even pancake boxes!)

They have to be inexpensive so I can justify using LOTS of them
(I will have 240 deployed, here; my industrial beta site will have
over 1000; the commercial beta site a similar number). Not only
is the cost of initial acquisition of concern, but also the *perceived*
cost of maintaining the hardware in a functional state (the customer
doesn't want to keep $10K of spares on hand for rapid incident response,
plus staff able to diagnose and repair/replace "on demand").

In my case, I sidestep the PERSISTENT storage issue by relegating that
to the RDBMS. In *that* domain, I can freely add spinning rust or
an SSD without complicating the design of the rest of the nodes.
So, "storage" becomes:
- how much do I need for a secure bootstrap
- how much do I need to contain a downloaded (from the RDBMS!) binary
- how much do I need to keep "local runtime resources"
- how much can I exploit surplus capacity *elsewhere* in the system
to address transient needs

Imagine what it would be like having to replace "worn" SD cards
at some frequency in hundreds of devices scattered around hundreds
of "invisible" places! Almost as bad as replacing *batteries* in
those devices!

[Have you ever had an SD card suddenly write protect itself?]

> Am trying to puzzle out what a 64-bit embedded processor should look like.

"Should"? That depends on what you expect it to do for you.
The nonrecurring cost of development will become an ever-increasing
portion of the device's "cost". If you sell 10K units but spend
500K on development (over its lifetime), you've justification for
spending a few more dollars on recurring costs *if* you can realize
a reduction in development/maintenance costs (because the development
is easier, bugs are fewer/easier to find, etc.)
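The amortization argument, in numbers: 10K units against $500K of
development is $50/unit of NRE, so a few extra dollars of recurring cost
is a win if it trims development/maintenance enough. (The recurring-cost
figures below are invented for illustration.)

```python
# Per-unit cost = recurring parts cost + amortized nonrecurring (NRE) cost.

def per_unit_cost(units, nre, recurring):
    return recurring + nre / units

# Cheap part, all the pain in software:
baseline = per_unit_cost(units=10_000, nre=500_000, recurring=20.0)
# A few dollars more of silicon that cuts development by $100K:
beefier  = per_unit_cost(units=10_000, nre=400_000, recurring=23.0)
```

Here the "more expensive" part is the cheaper product: $63/unit vs. $70/unit.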

Developers (and silicon vendors, as Good Business Practice)
will look at their code and see what's "hard" to do, efficiently.
Then, consider mechanisms that could make that easier or
more effective.

I see the addition of hardware features that enhance the robustness
of the software development *process*. E.g., allowing for compartmentalizing
applications and subsystems more effectively and *efficiently*.

[I put individual objects into their own address space containers
to ensure Object A can't be mangled by Client B (or Object C). As
a result, talking to an object is expensive because I have to hop
back and forth across that protection boundary. It's even worse
when the targeted object is located on some other physical node
(as now I have the transport cost to contend with).]

Similarly, making communications more robust. We already see that
with crypto accelerators. The idea of device "islands" is
obsolescent. Increasingly, devices will interact with other
devices to solve problems. More processing will move to the
edge simply because of scaling issues (I can add more CPUs
far more effectively than I can increase the performance of
a "centralized" CPU; add another sense/control point? let *it*
bring some processing abilities along with it!).

And, securing the product from tampering/counterfeiting; it seems
like most approaches, to date, have some hidden weakness. It's hard
to believe hardware can't ameliorate that. The fact that "obscurity"
is still relied upon by silicon vendors suggests an acknowledgement
of their weaknesses.

Beyond that? Likely more DSP-related support in the "native"
instruction set (so you can blend operations between conventional
computing needs and signal processing related issues).

And, graphics acceleration as many applications implement user
interfaces in the appliance.

There may be some other optimizations that help with hashing
or managing large "datasets" (without them being considered
formal datasets).

Power management (and measurement) will become increasingly
important (I spend almost as much on the "power supply"
as I do on the compute engine). Developers will want to be
able to easily ascertain what they are consuming as well
as why -- so they can (dynamically) alter their strategies.
In addition to varying CPU clock frequency, there may be
mechanisms to automatically (!) power down sections of
the die based on observed instruction sequences (instead
of me having to explicitly do so).

[E.g., I shed load when I'm running off backup power.
This involves powering down nodes as well as the "fields"
on selective nodes. How do I decide *which* load to shed to
gain the greatest benefit?]

Memory management (in the conventional sense) will likely
see more innovation. Instead of just "settling" for a couple
of page sizes, we might see "adjustable" page sizes.
Or, the ability to specify some PORTION of a *particular*
page as being "valid" -- instead of treating the entire
page as such.

Scheduling algorithms will hopefully get additional
hardware support. E.g., everything is deadline driven
in my design ("real-time"). So, schedulers are concerned
with evaluating the deadlines of "ready" tasks -- which
can vary, over time, as well as may need further qualification
based on other criteria (e.g., least-slack-time scheduling)
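The selection step a least-slack scheduler performs is simple to state
(slack = deadline - now - remaining compute; run the ready task with the
least of it); the task records and numbers below are illustrative, and the
hard part in practice is doing this evaluation cheaply and often:

```python
# Least-slack-time selection over a set of ready tasks.

def least_slack_task(ready, now):
    """Return the ready task with minimal slack at time `now`."""
    def slack(task):
        return task["deadline"] - now - task["remaining"]
    return min(ready, key=slack)

ready = [
    {"name": "motion",  "deadline": 50, "remaining": 10},  # slack 40 at t=0
    {"name": "audio",   "deadline": 20, "remaining": 15},  # slack  5 at t=0
    {"name": "logging", "deadline": 90, "remaining": 5},   # slack 85 at t=0
]
chosen = least_slack_task(ready, now=0)
```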

Everything in my system is an *opaque* object on which a
set of POSSIBLE methods can be invoked. But, each *Client*
of that object (an Actor may be multiple Clients if it possesses
multiple different Handles to the Object) is constrained as to
which methods can be invoked via a particular Handle.

So, I can (e.g.) create an Authenticator object that has methods like
"set_passphrase" and "test_passphrase" and "invalidate_passphrase".
Yet, no "disclose_passphrase" method (for obvious reasons).
I can create an Interface to one privileged Client that
allows it to *set* a new passphrase. And, all other Interfaces
(to that Client as well as others!) may all be restricted to
only *testing* the passphrase ("Is it 'foobar'?"). And, I can
limit the number of attempts that you can invoke a particular
method over a particular interface so the OS does the enforcement
instead of relying on the Server to do so.

[What's to stop a Client from hammering on the Server (Authenticator
Object) repeatedly -- invoking test_passphrase with full knowledge
that it doesn't know the correct passphrase: "Is it 'foobar'?"
"Is it 'foobar'?" "Is it 'foobar'?" "Is it 'foobar'?" "Is it 'foobar'?"
The Client has been enabled to do this; that doesn't mean he can't or
won't abuse it!

Note that unlimited access means the server has to respond to each of
those method invocations. By contrast, putting a limit on them
means the OS can block the invocation from ever reaching the Object
(and from needlessly tying up the Object's resources). A
capabilities-based system that relies on encrypted tokens means the Server has
to decrypt a token in order to determine that it is invalid;
the Server's resources are consumed instead of the Client's]
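A sketch of that per-Handle method restriction and invocation limit, with
the gatekeeping done before the method ever reaches the object (so an
abusive client burns its own quota, not the server's cycles). Class and
method names are illustrative, not the poster's actual interfaces:

```python
# An Authenticator object plus a Handle that enforces, "kernel"-side,
# which methods a client may invoke and how many times.

class Authenticator:
    def __init__(self, passphrase):
        self._secret = passphrase          # note: no disclose_passphrase!
    def set_passphrase(self, new):
        self._secret = new
    def test_passphrase(self, guess):
        return guess == self._secret

class Handle:
    """Binds a client to an object with a method whitelist and quotas."""
    def __init__(self, obj, allowed, limits=None):
        self._obj = obj
        self._allowed = set(allowed)
        self._limits = dict(limits or {})  # method -> invocations remaining

    def invoke(self, method, *args):
        if method not in self._allowed:
            raise PermissionError(f"{method} not permitted on this handle")
        if method in self._limits:
            if self._limits[method] <= 0:
                # Blocked here -- the Object's resources are never touched.
                raise PermissionError(f"{method} quota exhausted")
            self._limits[method] -= 1
        return getattr(self._obj, method)(*args)

auth = Authenticator("s3cret")
admin = Handle(auth, allowed={"set_passphrase", "test_passphrase"})
guest = Handle(auth, allowed={"test_passphrase"},
               limits={"test_passphrase": 3})
```

A hammering client exhausts its three guesses and subsequent invocations
die at the handle; the Authenticator itself never sees them.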

It takes effort (in the kernel) to verify that a Client *can* access a
particular Object (i.e., has a Handle to it) AND that the Client can
invoke THAT particular Method on that Object via this Handle (bound to
a particular Object *Interface*) as well as verifying the format of
the data, converting to a format suitable for the targeted Object
(which may use a different representational structure) for a
particular Version of the Interface...

I can either skimp on performing some of these checks (and rely
on other mechanisms to ensure the security and reliability of
the codebase -- in the presence of unvetted Actors) or hope
that some hardware mechanism in the processor makes these a bit
easier.

> At the low end, yeah, a simple RISC processor. And support for complex arithmetic
> using 32-bit floats? And support for pixel alpha blending using quad 16-bit numbers?
> 32-bit pointers into the software?

I doubt complex arithmetic will have much play. There might be support for
*building* larger data types (e.g., I use BigRationals which are incredibly
inefficient). But, the bigger bang will be for operators that allow
tedious/iterative solutions to be implemented in constant time. This,
for example, is why a hardware multiply (or other FPU capabilities)
is such a win -- consider the amount of code that is replaced by a single
op-code! Ditto things like "find first set bit", etc.
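"Find first set bit" makes the loop-vs-single-op point concretely: many ISAs
expose it as one instruction (ffs/ctz), and even in software the classic
isolate-the-lowest-bit trick `x & -x` replaces the scan. A Python sketch,
purely for illustration:

```python
# Two ways to find the index of the lowest set bit (-1 if none).

def ffs_loop(x):
    """Iterative scan -- the code a single op-code replaces."""
    i = 0
    while x:
        if x & 1:
            return i
        x >>= 1
        i += 1
    return -1

def ffs_trick(x):
    """Isolate the lowest set bit with x & -x (two's complement),
    then take its position; one instruction on most real hardware."""
    if x == 0:
        return -1
    return (x & -x).bit_length() - 1
```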

Why stick with 32b floats when you can likely implement doubles with a bit
more microcode (surely faster than trying to do wider operations built from
narrower ones)?

There's an entirely different mindset when you start thinking in
terms of "bigger processors". I.e., the folks who see 32b processors as
just *wider* 8/16b processors have typically not made this adjustment.
It's like trying to "sample the carry" in a HLL (common in ASM)
instead of concentrating on what you REALLY want to do and letting
the language make it easier for you to express that.

Expect to see people making leaps forward in terms of what they
expect from the solutions they put forth. Anything that you could
do with a PC, before, can now be done *in* a handheld flashlight!
