novaBBS - comp.arch - Hardware OS

Hardware OS

<4708b1d7-d2bf-40a4-91f0-2585b7ef7415n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23343&group=comp.arch#23343

by: robf...@gmail.com - Fri, 11 Feb 2022 06:20 UTC

I am wondering how much of the OS to put into hardware? Are there any references to OS components placed into hardware?

I put the ready queue into hardware. To help prevent the queue from overflowing, a massive FIFO is used that can handle thousands of tasks. Selecting a task to run is easy now: just pop a task id off the FIFO. Push the task id back onto the FIFO if it is to continue running.
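
A minimal software model of the pop/push-back behaviour described above may make it easier to reason about. This is only a sketch: the class name `ReadyFifo`, the helper `run_slices`, and the capacity of 4096 are assumptions for illustration, not part of the actual hardware.

```python
from collections import deque


class ReadyFifo:
    """Software model of a bounded hardware FIFO holding ready task ids."""

    def __init__(self, capacity=4096):  # capacity is an assumed figure
        self.capacity = capacity
        self.fifo = deque()

    def push(self, task_id):
        # Real hardware would assert a 'full' flag; the model raises instead.
        if len(self.fifo) >= self.capacity:
            raise OverflowError("ready FIFO full")
        self.fifo.append(task_id)

    def pop(self):
        # Returns None when nothing is ready (an 'empty' flag in hardware).
        return self.fifo.popleft() if self.fifo else None


def run_slices(ready, slices, finished=()):
    """Run up to `slices` time slices; tasks in `finished` are not re-queued."""
    order = []
    for _ in range(slices):
        task = ready.pop()
        if task is None:
            break
        order.append(task)
        if task not in finished:
            ready.push(task)  # task continues running: back of the queue
    return order
```

With tasks 1, 2 and 3 queued and task 2 finishing after its first slice, five slices run in the order 1, 2, 3, 1, 3, which is plain round-robin with no notion of priority.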

The system can handle up to 4095 tasks. I am thinking of having the task timeout decrement for waiting tasks implemented in hardware. So, an array of 4095 48-bit hardware decrementers could be used. Or rather an array of 64 hardware decrementers each responsible for 64 tasks. It would then take 64 clock cycles to do a decrement operation.
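
The banked arrangement (64 decrementers, each serving 64 tasks) can be modelled roughly as follows. This is a sketch under assumptions: the names `TimeoutArray`, `arm`, `clock` and `tick` are hypothetical, task ids are assumed to map to banks as `bank * 64 + slot`, and the model has 4096 slots even though only 4095 tasks are mentioned.

```python
BANKS = 64            # number of hardware decrementers
TASKS_PER_BANK = 64   # tasks served by each decrementer
MASK48 = (1 << 48) - 1  # timeouts are 48 bits wide


class TimeoutArray:
    """Model of 64 decrementers, each time-multiplexed over 64 tasks."""

    def __init__(self):
        self.timeout = [0] * (BANKS * TASKS_PER_BANK)
        self.slot = 0  # which task within each bank is handled this clock

    def arm(self, task_id, ticks):
        self.timeout[task_id] = ticks & MASK48

    def clock(self):
        """One clock cycle: every bank decrements one of its counters."""
        expired = []
        for bank in range(BANKS):
            tid = bank * TASKS_PER_BANK + self.slot
            if self.timeout[tid] > 0:
                self.timeout[tid] -= 1
                if self.timeout[tid] == 0:
                    expired.append(tid)  # task's wait has timed out
        self.slot = (self.slot + 1) % TASKS_PER_BANK
        return expired

    def tick(self):
        """One full decrement pass over all tasks: 64 clock cycles."""
        expired = []
        for _ in range(TASKS_PER_BANK):
            expired.extend(self.clock())
        return expired
```

The effective timeout resolution in this model is one `tick()`, that is 64 clocks, which is the cost of sharing each decrementer among 64 tasks.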

robf...@gmail.com wrote:
> I am wondering how much of the OS to put into hardware? Are there any references to OS components placed into hardware?
>
> I put the ready queue into hardware. To help prevent the queue overflowing a massive fifo is used that can handle thousands of tasks. Selecting a task to run is easy now. Just pop a task id off the fifo. Push the task id back onto the fifo if it is to continue running.
>
> The system can handle up to 4095 tasks. I am thinking of having the task timeout decrement for waiting tasks implemented in hardware. So, an array of 4095 48-bit hardware decrementers could be used. Or rather an array of 64 hardware decrementers each responsible for 64 tasks. It would then take 64 clock cycles to do a decrement operation.
>
This sounds like a perfect match for a priority queue, with no need to
update a bunch of counters?
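
A rough software sketch of the priority-queue alternative: store an absolute wake-up deadline per waiting task and only compare the earliest deadline against the current time, so no per-task counter is touched on each tick. The class and method names are hypothetical, and a binary heap (`heapq`) is just one possible priority-queue implementation.

```python
import heapq


class TimeoutQueue:
    """Waiting tasks ordered by absolute deadline; no per-task decrement."""

    def __init__(self):
        self.heap = []  # entries are (deadline, task_id)
        self.now = 0    # current time in ticks

    def sleep(self, task_id, ticks):
        heapq.heappush(self.heap, (self.now + ticks, task_id))

    def advance(self, ticks=1):
        """Advance time and return the ids of tasks whose deadline passed."""
        self.now += ticks
        woken = []
        while self.heap and self.heap[0][0] <= self.now:
            woken.append(heapq.heappop(self.heap)[1])
        return woken
```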

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Hardware OS

<j6mv4mFeqe6U1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=23349&group=comp.arch#23349

by: Niklas Holsti - Fri, 11 Feb 2022 11:20 UTC

On 2022-02-11 8:20, robf...@gmail.com wrote:
> I am wondering how much of the OS to put into hardware? Are there any
> references to OS components placed into hardware?
>
> I put the ready queue into hardware. To help prevent the queue
> overflowing a massive fifo is used that can handle thousands of
> tasks. Selecting a task to run is easy now. Just pop a task id off
> the fifo. Push the task id back onto the fifo if it is to continue
> running.
>
> The system can handle up to 4095 tasks. I am thinking of having the
> task timeout decrement for waiting tasks implemented in hardware. So,
> an array of 4095 48-bit hardware decrementers could be used. Or
> rather an array of 64 hardware decrementers each responsible for 64
> tasks. It would then take 64 clock cycles to do a decrement
> operation.

There was at least one HW "coprocessor" designed to implement the Ada
tasking facilities for real-time systems. A web search for "Ada tasking
coprocessor" finds several publications; this is one that is available
gratis:

https://www.researchgate.net/publication/234334052_AXS_An_Ada_coprocessor_based_real-time_kernel

However, just as for language-specific processors like Lisp machines, it
seems that speed increases in general-purpose processors overcame the
advantages of the Ada tasking coprocessors. And perhaps the Ada tasking
SW implementations also improved, and the Ada language evolved with new
tasking features that were not supported by the coprocessors. I don't
think any current Ada systems use tasking coprocessors.

I do believe that it makes sense to develop processor architectures to
make task/thread/process switches and inter-task communications more
efficient, and this may benefit from new HW features. The Mill is one
example of such architectures.

Re: Hardware OS

<848c866c-9b07-41fa-83eb-6714eff388b3n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23351&group=comp.arch#23351

by: MitchAlsup - Fri, 11 Feb 2022 15:55 UTC

On Friday, February 11, 2022 at 12:20:05 AM UTC-6, robf...@gmail.com wrote:
> I am wondering how much of the OS to put into hardware? Are there any references to OS components placed into hardware?
<
I am working on a design (My 66000) where the hardware contains a low-level
scheduler and simultaneously hosts up to 128 virtual machines. Each virtual
machine can have up to 512 application processes affinitized to cores in this
chip. The scheduler's table covers device and IPI interrupts, exceptions, and
privileged, OS, and application processes.
<
Events index this table, and associate the event to a process which will run on
the affinity set of desired cores at a priority.
<
Cores do not perform context switches--cores run until a context switch message
arrives. Upon arrival, the current context is messaged back to the scheduler, where
the process state is saved.
<
From the arrival of the context switch at the CPU to starting to run that process
in the CPU takes about 10 cycles. Each context switch contains a process in its
virtual machine. The scheduler operates across multiple protection domains, much
like a MMU running nested page tables. All from a few pages of DRAM overhead
per chip.
>
> I put the ready queue into hardware. To help prevent the queue overflowing a massive fifo is used that can handle thousands of tasks. Selecting a task to run is easy now. Just pop a task id off the fifo. Push the task id back onto the fifo if it is to continue running.
<
My queuing system contains 64-priority levels, and is completely managed by the
Scheduler. Insert work on queue: 1-cycle, remove work from queue: 1-cycle.
Process arriving Event: 5 cycles of latency, 1-cycle throughput. CPUs talk to
Scheduler using <yes> interrupts to deActivate a potentially running process, or to
Disable, Enable, or inValidate a process in the Event Table. One interrupt rotates
the queue entry associated with a process (time slice).
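<
A software sketch of a 64-level queue with constant-time insert and remove, in the spirit of the description above. It is not the My 66000 mechanism: the class `PriorityQueues`, the occupancy bitmap, and the convention that a larger number means higher priority are all assumptions made for illustration.

```python
from collections import deque

LEVELS = 64  # number of priority levels


class PriorityQueues:
    """One FIFO per priority level plus a 64-bit 'non-empty' bitmap."""

    def __init__(self):
        self.queues = [deque() for _ in range(LEVELS)]
        self.nonempty = 0  # bit p is set when level p holds work

    def insert(self, priority, pid):
        self.queues[priority].append(pid)
        self.nonempty |= 1 << priority

    def remove_highest(self):
        """Return (priority, pid) of the best waiting process, or None."""
        if self.nonempty == 0:
            return None
        # Assumption: larger number means higher priority.
        p = self.nonempty.bit_length() - 1  # find the highest set bit
        pid = self.queues[p].popleft()
        if not self.queues[p]:
            self.nonempty &= ~(1 << p)
        return p, pid

    def rotate(self, priority):
        """Time slice: move the head of one level to its tail."""
        q = self.queues[priority]
        if q:
            q.rotate(-1)
```

In hardware the find-highest-set-bit step would be a priority encoder over the bitmap, which is what makes single-cycle selection plausible.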
>
> The system can handle up to 4095 tasks. I am thinking of having the task timeout decrement for waiting tasks implemented in hardware. So, an array of 4095 48-bit hardware decrementers could be used. Or rather an array of 64 hardware decrementers each responsible for 64 tasks. It would then take 64 clock cycles to do a decrement operation.
<
Right now: there can be up to 128 virtual machines, each virtual machine can
have up to 1024 threads affinitized to this chip, and there can be as many as
256-chips in the system. There is a 24-bit index limit to the number of processes
a virtual machine can have alive.
<
Schedulers are managed by dedicating a certain MMIO address per chip to
receive "events". Scheduler is a function unit inside Memory Controller,
so these Events arrive in some order, and are processed in order of arrival
at memory controller. Tables used are heavily cached by scheduler and
while OS and HV can access the tables, they are marked unCacheable so
the scheduler can keep track of what the OSs and HV do to the tables.
<
A core does not have a priority, the process running on the thread does.
Scheduler keeps track of all core priorities and dispatches processes
to cores in such a way as to always have the highest priority process
running on a core of its affinity set.
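<
The dispatch rule in the paragraph above can be sketched in software as follows. The function `dispatch`, the `IDLE` marker, and the larger-number-is-higher-priority convention are assumptions; the real scheduler's data structures are not described here.

```python
IDLE = -1  # hypothetical marker: priority of a core with nothing to run


def dispatch(core_priority, affinity, priority):
    """Pick a core for a newly ready process.

    core_priority: list giving the priority currently running on each core.
    affinity: set of core indexes the process is allowed to run on.
    priority: priority of the process (assumed: larger means higher).
    Returns the chosen core, or None if every allowed core is running
    something of equal or higher priority.
    """
    if not affinity:
        return None
    # The victim is the allowed core running the lowest-priority work.
    victim = min(sorted(affinity), key=lambda core: core_priority[core])
    if core_priority[victim] < priority:
        # In a real system the preempted process would return to its queue.
        core_priority[victim] = priority
        return victim
    return None
```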
<
In effect, the cores are number crunchers while scheduler is the peripheral
processor sending Exchange Jumps to the CDC 6600 processor.
<
I looked into Ada requirements. What stopped me was the
<
select
task.entry_point( parameters );
or
delay 0.5;
end
<
Whether this stuff is used in practice, I don't know. The scheduler super-
structure certainly appears to support most of the necessities, and I have
a clean way to integrate these Entry-Point tables in my overall scheme.
<
The messages and processing in the scheduler to get all this right adds
significant burden on the memory footprint the scheduler needs, blowing
bigger than efficient scheduler caches could manage. Also note: I
want to get this Ada stuff right "across virtual machines" not just
within a single shared address space.
<
In any event, it is real-time in the sense where the real-time processes
can withstand the uncertainty and latency adders of caches and TLBs.
It is not hard-real-time.
<
Anyway, I am grinding through the <hopefully> last details and will be
in a position to expose my model soon.

Re: Hardware OS

<20ac87b8-2859-4f64-8b72-99634acb37b5n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23352&group=comp.arch#23352

by: MitchAlsup - Fri, 11 Feb 2022 15:55 UTC

On Friday, February 11, 2022 at 5:20:58 AM UTC-6, Niklas Holsti wrote:
> On 2022-02-11 8:20, robf...@gmail.com wrote:
> > I am wondering how much of the OS to put into hardware? Are there any
> > references to OS components placed into hardware?
> >
> > I put the ready queue into hardware. To help prevent the queue
> > overflowing a massive fifo is used that can handle thousands of
> > tasks. Selecting a task to run is easy now. Just pop a task id off
> > the fifo. Push the task id back onto the fifo if it is to continue
> > running.
> >
> > The system can handle up to 4095 tasks. I am thinking of having the
> > task timeout decrement for waiting tasks implemented in hardware. So,
> > an array of 4095 48-bit hardware decrementers could be used. Or
> > rather an array of 64 hardware decrementers each responsible for 64
> > tasks. It would then take 64 clock cycles to do a decrement
> > operation.
> There was at least one HW "coprocessor" designed to implement the Ada
> tasking facilities for real-time systems. A web search for "Ada tasking
> coprocessor" finds several publications; this is one that is available
> gratis:
>
> https://www.researchgate.net/publication/234334052_AXS_An_Ada_coprocessor_based_real-time_kernel
>
> However, just as for language-specific processors like Lisp machines, it
> seems that speed increases in general-purpose processors overcame the
> advantages of the Ada tasking coprocessors. And perhaps the Ada tasking
> SW implementations also improved, and the Ada language evolved with new
> tasking features that were not supported by the coprocessors. I don't
> think any current Ada systems use tasking coprocessors.
>
> I do believe that it makes sense to develop processor architectures to
> make task/thread/process switches and inter-task communications more
> efficient, and this may benefit from new HW features. The Mill is one
> example of such architectures.
<
Do you have a link that is not behind a pay wall or "sign up for free" wall?

Re: Hardware OS

<3955665c-cb26-4bc8-b546-9a644f647cc6n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23353&group=comp.arch#23353

by: MitchAlsup - Fri, 11 Feb 2022 15:58 UTC

On Friday, February 11, 2022 at 5:20:58 AM UTC-6, Niklas Holsti wrote:
> On 2022-02-11 8:20, robf...@gmail.com wrote:
> > I am wondering how much of the OS to put into hardware? Are there any
> > references to OS components placed into hardware?
> >
> > I put the ready queue into hardware. To help prevent the queue
> > overflowing a massive fifo is used that can handle thousands of
> > tasks. Selecting a task to run is easy now. Just pop a task id off
> > the fifo. Push the task id back onto the fifo if it is to continue
> > running.
> >
> > The system can handle up to 4095 tasks. I am thinking of having the
> > task timeout decrement for waiting tasks implemented in hardware. So,
> > an array of 4095 48-bit hardware decrementers could be used. Or
> > rather an array of 64 hardware decrementers each responsible for 64
> > tasks. It would then take 64 clock cycles to do a decrement
> > operation.
> There was at least one HW "coprocessor" designed to implement the Ada
> tasking facilities for real-time systems. A web search for "Ada tasking
> coprocessor" finds several publications; this is one that is available
> gratis:
>
> https://www.researchgate.net/publication/234334052_AXS_An_Ada_coprocessor_based_real-time_kernel
>
> However, just as for language-specific processors like Lisp machines, it
> seems that speed increases in general-purpose processors overcame the
> advantages of the Ada tasking coprocessors. And perhaps the Ada tasking
> SW implementations also improved, and the Ada language evolved with new
> tasking features that were not supported by the coprocessors. I don't
> think any current Ada systems use tasking coprocessors.
>
> I do believe that it makes sense to develop processor architectures to
> make task/thread/process switches and inter-task communications more
> efficient, and this may benefit from new HW features. The Mill is one
> example of such architectures.
<
OK I found a copy.
<
This system is performing a "context" switch in 20-40 microseconds.
I want to be performing something very similar in 2-4 nanoseconds.

Re: Hardware OS

<OYwNJ.1808$uW1.1027@fx27.iad>

https://www.novabbs.com/devel/article-flat.php?id=23356&group=comp.arch#23356

by: EricP - Fri, 11 Feb 2022 17:23 UTC

A single global ready list is only one possible scheduler design.

For some OSes the scheduler is selectable at boot time.
Different scheduler designs are for different purposes.
Some algorithms are FIFO, earliest deadline first, deadline-monotonic,
shortest remaining time, fixed or dynamic priority preemptive, and round-robin.

Or an OS could have multiple schedulers, selectable by a thread's class.

For example, the WinNT scheduler uses 32 priority ready queues,
with priority 0..15 being round-robin time-slice, 16..31 pseudo-real-time.
To avoid global SMP spinlocks on whole-system scheduler tables
it has two schedulers, one local, one global.

One SMP scheduler problem area is balancing thread affinity run sets
(the bit mask of cores that a thread is allowed to run on),
dynamic cache affinity (the most recent core may have a thread's cache),
the cost of moving a thread and its cache between cores,
and the desire to not scan long lists causing cache line misses.
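
As a small illustration of the affinity-mask and cache-affinity trade-off just described, here is a sketch that prefers the core a thread last ran on and otherwise falls back to any idle core in its affinity set. The function `pick_core` and its policy are hypothetical and much simpler than what a real SMP scheduler does.

```python
def pick_core(affinity_mask, idle_mask, last_core):
    """Choose a core for a thread using bit masks (bit n = core n).

    affinity_mask: cores the thread is allowed to run on.
    idle_mask: cores that are currently idle.
    last_core: core the thread most recently ran on, or None.
    Returns a core index, or None if no allowed core is idle.
    """
    usable = affinity_mask & idle_mask
    if usable == 0:
        return None
    # Prefer the most recent core: it may still hold the thread's cache lines.
    if last_core is not None and (usable >> last_core) & 1:
        return last_core
    # Otherwise take the lowest-numbered usable core (isolate lowest set bit).
    return (usable & -usable).bit_length() - 1
```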

I don't see how your hardware sorter helps with any of that.

Re: Hardware OS

<f0d9ee58-f57d-4e51-bb58-1fd4e456be13n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23358&group=comp.arch#23358

by: Paul A. Clayton - Fri, 11 Feb 2022 17:45 UTC

On Friday, February 11, 2022 at 1:20:05 AM UTC-5, robf...@gmail.com wrote:
> I am wondering how much of the OS to put into hardware?

Stating the obvious:
* separation of mechanism and policy
** how difficult can implementing a "useless" policy be?
** unimagined policy?
** how are abstraction leaks to be managed?
* communication cost
** cost of developing and establishing mindshare of an
interface (with multiple implementations)
** cost of translating across interface boundaries
*** exposing an interface with an expectation/requirement
of external use tends to increase the cost of change
* design space of 'hardware'
** what resources are managed (execution resources/run
times are common but energy/thermal with different
localities of time and space, bandwidths, cache capacity,
etc. are beginning to be managed with hardware
assistance) [While BubbleWrap suggested exploiting
destructive operation for performance, the tradeoff of
hardware lifetime, result reliability, timeliness, and
other factors does not seem to be a broadly-recognized
design consideration.]
** pure hardware, firmware on specialized processor,
firmware on generic processor [possibly with scheduling
as a generic task with specific value-to-resource priority]

Older systems, being sold as systems and not as hardware,
could have had less clear distinction between OS software,
firmware, and hardware.

Integrating the interfaces with the system as a whole also
seems desirable. E.g., communication among agents will
not have a one-size-fits-all nature; different operations will
have different value-delay curves, possibly varying with
external circumstances. A thread attempting a performance-
critical complex atomic operation might be willing to pay
for avoiding failures (e.g., via more expensive finer-grained
tracking or a limited-capacity access ordering facility that
tries to interleave operations in a valid version-like order).

An operating system abstracts and allocates hardware
resources (virtualization) and provides basic services
(as highly trusted, privileged [with possible implications
for overhead], and baseline system software). Abstraction
introduces translation overheads (translation work
and verbosity from irregular language/purpose mappings,
as well as miscommunication). While many components
seem mature (favoring fixed, hardware-ish methods) and
many operations are slow relative to computation (reducing
the cost of software handling), relative efficiency can
still be important and one would not want to paint oneself
into a corner.

The allocation aspect seems like an economic problem.
Establishing a medium of exchange and a market/network
of exchange might be a step toward a general abstraction.
(Economics is also about information retention and
communication; there would seem to be an economic
aspect to the resources for information discovery,
communication, and retention.)

Even something as "simple" as allocating a pair of
communicating threads to processing resources seems challenging.
Physical proximity would reduce communication costs and
might facilitate capacity sharing, but sharing a distant core
with a different thread might be better in delay or energy
use than waking up a nearby non-shared core.

This is less coherent than I would like, but I hope it is
still worth reading.

Re: Hardware OS

<czxNJ.30300$3jp8.29459@fx33.iad>

https://www.novabbs.com/devel/article-flat.php?id=23360&group=comp.arch#23360

by: EricP - Fri, 11 Feb 2022 18:04 UTC

MitchAlsup wrote:
> <
> I looked into Ada requirements. What stopped me was the
> <
> select
> task.entry_point( parameters );
> or
> delay 0.5;
> end
> <
> Whether this stuff is used in practice, I don't know. The scheduler super-
> structure certainly appears to support most of the necessities, and I have
> a clean way to integrate these Entry-Point tables in my overall scheme.
> <
> The messages and processing in the scheduler to get all this right adds
> significant burden on the memory footprint the scheduler needs, blowing
> bigger than efficient scheduler caches could manage. Also note: I
> want to get this Ada stuff right "across virtual machines" not just
> within a single shared address space.
> <
> In any event, it is real-time in the sense where the real-time processes
> can withstand the uncertainty and latency adders of caches and TLBs.
> It is not hard-real-time.
> <

Ada has the above timed entry construct which is the "impatient client"
and also a timed accept which is the "impatient server".

The WinNT system service WaitForMultipleObjects waits for any of
up to 64 events to be set or an optional time-out.
It might be used by an Ada run-time library to help implement rendezvous
if the Ada tasks were implemented as threads within a single process
as parameter data doesn't have to actually move between address spaces.
And if the 64 event limit was acceptable.

If this is between address spaces it gets more difficult because the
client is waiting in a queue inside the server, and if client times out
it must request the server dequeue it and wait for an ACK.
And if the server selects that client for service while the dequeue
request is in flight then it sends a NAK.
But the server has to be able to dequeue and ACK a client at any time
even if it is off processing something else and
not just when it is also waiting at the rendezvous.
So client dequeue request acts like an interrupt to the server thread.
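
The dequeue handshake described above can be sketched as a tiny model. The class `Server` and its method names are hypothetical; the sketch only captures the ACK/NAK decision, not the asynchronous delivery of the request to a busy server thread.

```python
from collections import deque


class Server:
    """Model of a server task with a queue of clients waiting on an entry."""

    def __init__(self):
        self.waiting = deque()   # clients queued at the entry
        self.in_service = None   # client currently in rendezvous, if any

    def enqueue(self, client):
        self.waiting.append(client)

    def accept(self):
        """Server selects the next waiting client for service."""
        self.in_service = self.waiting.popleft() if self.waiting else None
        return self.in_service

    def dequeue_request(self, client):
        """A timed-out client asks to be removed from the queue.

        ACK: the client was still queued and has been removed.
        NAK: the client is no longer queued, for example because it was
        already selected for service while the request was in flight.
        """
        if client in self.waiting:
            self.waiting.remove(client)
            return "ACK"
        return "NAK"
```

A client that receives NAK has lost the race with the server and must treat the call as accepted rather than timed out.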

Anyway, inter-process rendezvous looks messy.

Re: Hardware OS

<f11369b3-5e66-4667-a92f-a59a5e9c723fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23361&group=comp.arch#23361

by: MitchAlsup - Fri, 11 Feb 2022 18:16 UTC

On Friday, February 11, 2022 at 11:45:08 AM UTC-6, Paul A. Clayton wrote:
> On Friday, February 11, 2022 at 1:20:05 AM UTC-5, robf...@gmail.com wrote:
> > I am wondering how much of the OS to put into hardware?
> Stating the obvious:
> * separation of mechanism and policy
<
Mechanism is for HW
policy is for SW
<
> ** how difficult can implementing a "useless" policy be?
<
Surprisingly difficult--but you earned a place in my heart for this one.
<
> ** unimagined policy?
> ** how are abstract leaks to be managed?
<
One must not allow a guest to discover anything about host, but
must allow host to discover everything about Guest.
<
> * communication cost
<
Small messages (say 75% of the register file) should be transported
atomically from message sender to message receiver in a single
<ahem> message over the system interconnect--that is atomically
AND larger than a single cache line.
<
Message should take no longer than:
CEIL( messageLength/LineLength ) × CacheLineLatency + 5 clocks
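<
The bound above is easy to evaluate directly. A minimal sketch, with a hypothetical function name and purely illustrative numbers (64-byte lines, 10 clocks per line) that are not taken from any real design:

```python
def max_message_clocks(message_bytes, line_bytes, line_latency_clocks,
                       overhead_clocks=5):
    """CEIL(messageLength / LineLength) * CacheLineLatency + 5 clocks."""
    lines = -(-message_bytes // line_bytes)  # integer ceiling division
    return lines * line_latency_clocks + overhead_clocks


# Illustrative only: a 192-byte message is 3 lines, so 3 * 10 + 5 clocks.
bound = max_message_clocks(192, 64, 10)
```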
<
> ** cost of developing and establishing mindshare of an
> interface (with multiple implementations)
> ** cost of translating across interface boundaries
<
Including retranslation of address spaces. Device may use one
vision of the DMA request address space, HostBridge translates
this into the system view of that same address.
<
> *** exposing an interface with an expectation/requirement
> of external use tends to increase the cost of change
<
Use available standardized external interfaces (PCIe, DRAM DIMM)
whenever possible.
<
Use a standardized transport protocol (PCIe) whenever possible
EVEN WITHIN your own CHIP. You had to build the HW once
(the external interface); replicating the HW takes 0 engineering
time.
<
> * design space of 'hardware'
<
Chips are NOT just processors anymore. CPUs, GPUs, Host Bridges,
repeaters,...
<
> ** what resources are managed (execution resources/run
> times are common but energy/thermal with different
> localities of time and space, bandwidths, cache capacity,
> etc. are beginning to be managed with hardware
> assistance) [While BubbleWrap suggested exploiting
> destructive operation for performance, the tradeoff of
> hardware lifetime, result reliability, timeliness, and
> other factors does not seem to be a broadly-recognized
> design consideration.]
> ** pure hardware, firmware on specialized processor,
> firmware on generic processor [possibly with scheduling
> as a generic task with specific value-to-resource priority]
>
> Older systems, being sold as systems and not as hardware,
> could have had less clear distinction between OS software,
> firmware, and hardware.
<
Do not forget about HyperVisors, and be careful to provide
a coherent paravirtualization interface HV<->Guest OS where
the Guest OS can ask for HV service, and where the HV can ask the
Guest OS for service.
<
Do not forget about recovering from system errors:
Accounting for corrected ECC errors,
Signaling unCorrectable ECC errors,
HW function units reaching exception conditions,
Rejecting requests that--if performed as an instruction--
   would raise Operation or Operand exceptions,
Providing Translation Errors to be handled at user, super, hyper
   levels.
>
> Integrating the interfaces with the system as a whole also
> seems desirable. E.g., communication among agents will
> not have a one-size-fits-all nature; different operations will
> have different value-delay curves, possibly varying with
> external circumstances. A thread attempting a performance-
> critical complex atomic operation might be willing to pay
> for avoiding failures (e.g., via more expensive finer-grained
> tracking or a limited-capacity access ordering facility that
> tries to interleave operations in a valid version-like order).
<
The ATOMIC protocol should favor the process making forward
progress over the one interfering with the one making forward
progress. This requires a re-think of the cache coherence protocols.
>
> An operating system abstracts and allocates hardware
HyperVisor does this
> resources (virtualization) and provides basic services
> (as highly trusted, privileged [with possibly implications
> for overhead], and baseline system software). Abstraction
HyperVisor does not trust Guest OSs.
> introduces translation overheads (translation work
> and verbosity from irregular language/purpose mappings,
> as well as miscommunication). While many components
> seem mature (favoring fixed, hardware-ish methods) and
> many operations are slow relative to computation (reducing
> the cost of software handling), relative efficiency can
> still be important and one would not want to paint oneself
> into a corner.
<
That is why a context switch should be closer to 10 cycles than
1000 cycles. A complete context switch from one Thread
operating in one Guest OS to another Thread in a different
Guest OS should take that same 10 cycles. This requires
moving context switch decisions away from the "processors" and
delivering them to things that smell more like peripheral processors.
>
> The allocation aspect seems like an economic problem.
> Establishing a medium of exchange and a market/network
> of exchange might be a step toward a general abstraction.
> (Economics is also about information retention and
> communication; there would seem to be an economic
> aspect to the resources for information discovery,
> communication, and retention.)
>
> Even something as "simple" as allocating a pair of
> communicating threads to processing resources seems challenging.
> Physical proximity would reduce communication costs and
> might facilitate capacity sharing, but sharing a distant core
> with a different thread might be better in delay or energy
> use than waking up a nearby non-shared core.
>
> This is less coherent than I would like, but I hope it is
> still worth reading.

Re: Hardware OS

<5400173f-d5c4-4017-bf55-abb9fd9b52f2n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23363&group=comp.arch#23363

X-Received: by 2002:a05:622a:190b:: with SMTP id w11mr2046620qtc.186.1644603787103;
Fri, 11 Feb 2022 10:23:07 -0800 (PST)
X-Received: by 2002:a4a:9723:: with SMTP id u32mr1005311ooi.5.1644603786870;
Fri, 11 Feb 2022 10:23:06 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 11 Feb 2022 10:23:06 -0800 (PST)
In-Reply-To: <czxNJ.30300$3jp8.29459@fx33.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:84f7:6231:3421:a79e;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:84f7:6231:3421:a79e
References: <4708b1d7-d2bf-40a4-91f0-2585b7ef7415n@googlegroups.com>
<848c866c-9b07-41fa-83eb-6714eff388b3n@googlegroups.com> <czxNJ.30300$3jp8.29459@fx33.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5400173f-d5c4-4017-bf55-abb9fd9b52f2n@googlegroups.com>
Subject: Re: Hardware OS
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 11 Feb 2022 18:23:07 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 65

by: MitchAlsup - Fri, 11 Feb 2022 18:23 UTC

On Friday, February 11, 2022 at 12:05:00 PM UTC-6, EricP wrote:
> MitchAlsup wrote:
> > <
> > I looked into Ada requirements. What stopped me was the
> > <
> > select
> > task.entry_point( parameters );
> > or
> > delay 0.5;
> > end select;
> > <
> > Whether this stuff is used in practice, I don't know. The scheduler super-
> > structure certainly appears to support most of the necessities, and I have
> > a clean way to integrate these Entry-Point tables in my overall scheme.
> > <
> > The messages and processing in the scheduler to get all this right adds
> > significant burden on the memory footprint the scheduler needs, blowing
> > bigger than efficient scheduler caches could manage. Also note: I
> > want to get this Ada stuff right "across virtual machines" not just
> > within a single shared address space.
> > <
> > In any event, it is real-time in the sense where the real-time processes
> > can withstand the uncertainty and latency adders of caches and TLBs.
> > It is not hard-real-time.
> > <
> Ada has the above timed entry construct which is the "impatient client"
> and also a timed accept which is the "impatient server".
>
> The WinNT system service WaitForMultipleObjects waits for any of
> up to 64 events to be set or an optional time-out.
> It might be used by an Ada run-time library to help implement rendezvous
> if the Ada tasks were implemented as threads within a single process
> as parameter data doesn't have to actually move between address spaces.
> And if the 64 event limit was acceptable.
<
This likely takes on the order of 10,000 cycles;
we should be looking for something more on the order of 100 cycles.
<
Hardware is good at performing work that takes between 1 cycle and
20 cycles--that is, between integer ops and FP SQRT. When it starts taking
longer than this, some piece of SW will end up taking exception to the
prescribed mechanism.
<
This is GOOD
<
It keeps HW from mixing mechanism with policy.
>
> If this is between address spaces it gets more difficult because the
> client is waiting in a queue inside the server, and if client times out
> it must request the server dequeue it and wait for an ACK.
> And if the server selects that client for service while the dequeue
> request is in flight then it sends a NAK.
<
This is exactly what caused the telephone system crash 20-30
years ago in NE USA.
<
> But the server has to be able to dequeue and ACK a client at any time
> even if it is off processing something else and
> not just when it is also waiting at the rendezvous.
> So client dequeue request acts like an interrupt to the server thread.
>
> Anyway, inter-process rendezvous looks messy.
<
It is messier if the entry point is in a different Guest OS than the
caller. It is REALLY messy when the entry point is an RPC to a
different system.
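<
The dequeue handshake described in the quoted text (a timed-out client asks the server to remove it, and gets an ACK, or a NAK if it was already selected) can be sketched as below. This is a single-threaded Python illustration only; the class and method names are hypothetical, and it deliberately leaves out the hard part, which is that the request arrives asynchronously while the server is busy elsewhere.

```python
from enum import Enum


class Reply(Enum):
    ACK = "ack"  # client was still queued and has been removed
    NAK = "nak"  # too late: the server already selected this client


class Server:
    def __init__(self):
        self.waiting = []       # clients queued at the entry, oldest first
        self.in_service = None  # client currently in rendezvous, if any

    def enqueue(self, client):
        self.waiting.append(client)

    def accept(self):
        """Select the oldest waiting client, or return None if none waits."""
        if not self.waiting:
            return None
        self.in_service = self.waiting.pop(0)
        return self.in_service

    def dequeue_request(self, client):
        """Handle a timed-out client's request to leave the queue."""
        if client in self.waiting:
            self.waiting.remove(client)
            return Reply.ACK
        return Reply.NAK


server = Server()
server.enqueue("c1")
server.enqueue("c2")
early = server.dequeue_request("c2")  # c2 times out while still queued
selected = server.accept()            # server selects c1
late = server.dequeue_request("c1")   # c1's timeout lost the race
print(early, selected, late)
```

In this toy version the server answers immediately. In the inter-process case the request has to interrupt whatever the server thread is doing, which is where the mess comes from.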

Re: Hardware OS

<c735de17-4aba-4558-9d98-2fe19e352d17n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23373&group=comp.arch#23373

X-Received: by 2002:ac8:57d0:: with SMTP id w16mr2433899qta.171.1644611778950;
Fri, 11 Feb 2022 12:36:18 -0800 (PST)
X-Received: by 2002:a4a:d817:: with SMTP id f23mr1184031oov.35.1644611778632;
Fri, 11 Feb 2022 12:36:18 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!2.us.feeder.erje.net!feeder.erje.net!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 11 Feb 2022 12:36:18 -0800 (PST)
In-Reply-To: <4708b1d7-d2bf-40a4-91f0-2585b7ef7415n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=73.188.126.34; posting-account=ujX_IwoAAACu0_cef9hMHeR8g0ZYDNHh
NNTP-Posting-Host: 73.188.126.34
References: <4708b1d7-d2bf-40a4-91f0-2585b7ef7415n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c735de17-4aba-4558-9d98-2fe19e352d17n@googlegroups.com>
Subject: Re: Hardware OS
From: timcaff...@aol.com (Timothy McCaffrey)
Injection-Date: Fri, 11 Feb 2022 20:36:18 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 46

by: Timothy McCaffrey - Fri, 11 Feb 2022 20:36 UTC

On Friday, February 11, 2022 at 1:20:05 AM UTC-5, robf...@gmail.com wrote:
> I am wondering how much of the OS to put into hardware? Are there any references to OS components placed into hardware?
>
> I put the ready queue into hardware. To help prevent the queue overflowing a massive fifo is used that can handle thousands of tasks. Selecting a task to run is easy now. Just pop a task id off the fifo. Push the task id back onto the fifo if it is to continue running.
>
> The system can handle up to 4095 tasks. I am thinking of having the task timeout decrement for waiting tasks implemented in hardware. So, an array of 4095 48-bit hardware decrementers could be used. Or rather an array of 64 hardware decrementers each responsible for 64 tasks. It would then take 64 clock cycles to do a decrement operation.

As a historical note, I worked on a terminal front end in the 80s, based on an HP 21MX, which had a bunch of custom microcode that took care
of things like task scheduling, interrupts and semaphore handling. It had (IIRC) 4 priority queues and would schedule the task at the head of the highest priority queue that wasn't empty. When a task got interrupted (or got made ready because a semaphore was released or the interrupt a task had been waiting for arrived) it was put on the tail of the queue.
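
For readers who want the scheduling rule spelled out, here is a minimal Python sketch of "run the head of the highest priority non-empty queue, requeue at the tail". It is not the 21MX microcode; the class name, the task names, and the choice of index 0 as the highest priority are all assumptions for illustration.

```python
from collections import deque

NUM_PRIORITIES = 4  # the post recalls (IIRC) four priority queues


class ReadyQueues:
    def __init__(self):
        # Assumption: index 0 is the highest priority.
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]

    def make_ready(self, task_id, priority):
        """Put a task on the tail of the queue for its priority."""
        self.queues[priority].append(task_id)

    def pick_next(self):
        """Pop the head of the highest-priority non-empty queue (None if idle)."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


rq = ReadyQueues()
rq.make_ready("tty7", 2)
rq.make_ready("timer", 0)
rq.make_ready("tty3", 2)
order = [rq.pick_next() for _ in range(4)]
print(order)  # ['timer', 'tty7', 'tty3', None]
```

Tasks at the same priority run in FIFO order, and a higher-priority task always goes first regardless of when it became ready.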

Note that the microcode and the rest of the OS had to agree on things like: where the queues were located, where various fields were in the task table structure (such as task priority) and where registers (including memory mapping registers) were stored/loaded on a task switch.

The box could handle about 3000 interrupts/second, which is pretty good for a CPU whose typical instruction took 5-7 microseconds (sometimes *much* more than that).

On a related note: Timers were implemented as an array of counters, one for each task. An instruction was added that buzzed through
this array; it was executed by the timer task, which was scheduled when the timer interrupt happened. Any timer that decremented to
zero caused its task to be marked ready and put on the appropriate ready queue.

Wow, that was the wrong way to do it (a typical box usually had 100-200 active tasks). The number of tasks that are actually waiting on a timer
(and/or interrupt) is usually pretty small. (Much) later in life I implemented this as an RB-tree with deadlines. Much more efficient, and it scaled
better as well.
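
A minimal Python sketch of the deadline-ordered idea follows. Note the substitution: it uses a binary heap from the standard library (heapq) rather than an RB-tree, because Python has no built-in balanced tree. The class and task names are hypothetical, and cancelling a pending timer, which a tree handles more naturally, is not shown.

```python
import heapq


class DeadlineTimers:
    """Waiting tasks ordered by absolute deadline, measured in ticks."""

    def __init__(self):
        self._heap = []  # entries are (deadline, task_id)
        self.now = 0

    def sleep(self, task_id, ticks):
        """Register a task to be made ready `ticks` ticks from now."""
        heapq.heappush(self._heap, (self.now + ticks, task_id))

    def tick(self):
        """Advance one tick and return the tasks whose deadline has arrived."""
        self.now += 1
        ready = []
        # Only the earliest deadline is examined; counters for tasks that
        # are not yet due are never touched.
        while self._heap and self._heap[0][0] <= self.now:
            ready.append(heapq.heappop(self._heap)[1])
        return ready


timers = DeadlineTimers()
timers.sleep("a", 3)
timers.sleep("b", 1)
timers.sleep("c", 3)
fired = [timers.tick() for _ in range(3)]
print(fired)  # [['b'], [], ['a', 'c']]
```

A tick on which nothing expires costs one comparison, instead of a pass over every task's counter.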

- Tim

Re: Hardware OS

<58201692-5516-4bd1-8686-7a4742fe65cfn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23381&group=comp.arch#23381

X-Received: by 2002:a05:622a:54c:: with SMTP id m12mr3015195qtx.300.1644626212633;
Fri, 11 Feb 2022 16:36:52 -0800 (PST)
X-Received: by 2002:a05:6808:1598:: with SMTP id t24mr1307069oiw.50.1644626212386;
Fri, 11 Feb 2022 16:36:52 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!feeder.erje.net!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 11 Feb 2022 16:36:52 -0800 (PST)
In-Reply-To: <memo.20220211221709.20064E@jgd.cix.co.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:84f7:6231:3421:a79e;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:84f7:6231:3421:a79e
References: <3955665c-cb26-4bc8-b546-9a644f647cc6n@googlegroups.com> <memo.20220211221709.20064E@jgd.cix.co.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <58201692-5516-4bd1-8686-7a4742fe65cfn@googlegroups.com>
Subject: Re: Hardware OS
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sat, 12 Feb 2022 00:36:52 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 29

by: MitchAlsup - Sat, 12 Feb 2022 00:36 UTC

On Friday, February 11, 2022 at 4:17:13 PM UTC-6, John Dallman wrote:
> In article <3955665c-cb26-4bc8...@googlegroups.com>,
> Mitch...@aol.com (MitchAlsup) wrote:
>
> > On Friday, February 11, 2022 at 5:20:58 AM UTC-6, Niklas Holsti
> > wrote:
> > > I do believe that it makes sense to develop processor
> > > architectures to make task/thread/process switches and inter-task
> > > communications more efficient, and this may benefit from new HW
> > > features. The Mill is one example of such architectures.
<
> Possibly, but freezing the data structures involved forever, by having
> instructions that know about them, is unlikely to be a productive route
> in the long term. The VAX tried that.
<
In my design certain <ahem> pages have an agreed address between HW and
SW, but the actual location is set during configuration.
<
> > OK I found a copy.
> > This system is performing a "context" switch in 20-40 microseconds.
> > I want to be performing something very similar in 2-4 nanoseconds.
<
> Presumably, the only way to do that with current memory technology would
> be to switch between register sets?
<
Nope: each core contains exactly 1 register set of exactly the number
of architectural registers (low end design). Messages performing a context
switch carry only exactly that number of registers (or fewer!)
>
> John

Re: Hardware OS

<su88jb$349$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=23387&group=comp.arch#23387

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: spamj...@blueyonder.co.uk (Tom Gardner)
Newsgroups: comp.arch
Subject: Re: Hardware OS
Date: Sat, 12 Feb 2022 12:18:51 +0000
Organization: A noiseless patient Spider
Lines: 23
Message-ID: <su88jb$349$2@dont-email.me>
References: <4708b1d7-d2bf-40a4-91f0-2585b7ef7415n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 12 Feb 2022 12:18:51 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="7e1dfc9e71d7bad28d6fbbe1290c39f3";
logging-data="3209"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18FPGxsXrGCuY+/zPtw1Z2E"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
Firefox/52.0 SeaMonkey/2.49.4
Cancel-Lock: sha1:9NBW64jaG5pTsSw72Gbrnx61rsE=
In-Reply-To: <4708b1d7-d2bf-40a4-91f0-2585b7ef7415n@googlegroups.com>

by: Tom Gardner - Sat, 12 Feb 2022 12:18 UTC

On 11/02/22 06:20, robf...@gmail.com wrote:
> I am wondering how much of the OS to put into hardware? Are there any
> references to OS components placed into hardware?
>
> I put the ready queue into hardware. To help prevent the queue overflowing a
> massive fifo is used that can handle thousands of tasks. Selecting a task to
> run is easy now. Just pop a task id off the fifo. Push the task id back onto
> the fifo if it is to continue running.
>
> The system can handle up to 4095 tasks. I am thinking of having the task
> timeout decrement for waiting tasks implemented in hardware. So, an array of
> 4095 48-bit hardware decrementers could be used. Or rather an array of 64
> hardware decrementers each responsible for 64 tasks. It would then take 64
> clock cycles to do a decrement operation.

For the embedded *hard* realtime market, see the XMOS xCORE
processors.

Effectively the RTOS is in hardware, and they have a programming
environment that /presumes/ applications will be multithread/core.

It is a modern, updated, commercially successful successor to
the Transputer plus Occam. Buy them at DigiKey.

Re: Hardware OS

<DZPNJ.12962$R1C9.12134@fx22.iad>

https://www.novabbs.com/devel/article-flat.php?id=23394&group=comp.arch#23394

Path: i2pn2.org!rocksolid2!news.neodome.net!feeder1.feed.usenet.farm!feed.usenet.farm!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx22.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Hardware OS
References: <4708b1d7-d2bf-40a4-91f0-2585b7ef7415n@googlegroups.com> <f0d9ee58-f57d-4e51-bb58-1fd4e456be13n@googlegroups.com> <f11369b3-5e66-4667-a92f-a59a5e9c723fn@googlegroups.com>
In-Reply-To: <f11369b3-5e66-4667-a92f-a59a5e9c723fn@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 36
Message-ID: <DZPNJ.12962$R1C9.12134@fx22.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sat, 12 Feb 2022 15:01:55 UTC
Date: Sat, 12 Feb 2022 09:59:43 -0500
X-Received-Bytes: 2659

by: EricP - Sat, 12 Feb 2022 14:59 UTC

MitchAlsup wrote:
> On Friday, February 11, 2022 at 11:45:08 AM UTC-6, Paul A. Clayton wrote:
>> On Friday, February 11, 2022 at 1:20:05 AM UTC-5, robf...@gmail.com wrote:
>>> I am wondering how much of the OS to put into hardware?
>> ** how are abstract leaks to be managed?
> <
> One must not allow a guest to discover anything about host, but
> must allow host to discover everything about Guest.

AMD has its Secure Virtual Machine technology wherein the
host can manage guests but not find out anything about them.

AMD64 Volume 2 System Programming
"15.34 Secure Encrypted Virtualization
Secure Encrypted Virtualization (SEV) is available when the CPU is running
in guest mode utilizing AMD-V virtualization features. SEV enables running
encrypted virtual machines (VMs) in which the code and data of the virtual
machine are secured so that the decrypted version is available only within
the VM itself. Each virtual machine may be associated with a unique
encryption key so if data is accessed by a different entity using a
different key, the SEV encrypted VM's data will be decrypted with an
incorrect key, leading to unintelligible data.

It is important to note that SEV mode therefore represents a departure
from the standard x86 virtualization security model, as the hypervisor
is no longer able to inspect or alter all guest code or data.
The guest page tables, managed by the guest, may mark data memory pages
as either private or shared, thus allowing selected pages to be shared
outside the guest. Private memory is encrypted using a guest-specific key,
while shared memory is accessible to the hypervisor."

Presumably this allows traditionally physically secure operations
to be outsourced to cloud virtual machine providers with
lesser security clearances.

Re: Hardware OS

<4e130fbc-3d2d-4193-ad9a-de77a34e0330n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23396&group=comp.arch#23396

X-Received: by 2002:ac8:509:: with SMTP id u9mr4571876qtg.530.1644681240229;
Sat, 12 Feb 2022 07:54:00 -0800 (PST)
X-Received: by 2002:a4a:98c8:: with SMTP id b8mr2199463ooj.50.1644681240012;
Sat, 12 Feb 2022 07:54:00 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 12 Feb 2022 07:53:59 -0800 (PST)
In-Reply-To: <DZPNJ.12962$R1C9.12134@fx22.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:113c:643c:9fe8:1531;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:113c:643c:9fe8:1531
References: <4708b1d7-d2bf-40a4-91f0-2585b7ef7415n@googlegroups.com>
<f0d9ee58-f57d-4e51-bb58-1fd4e456be13n@googlegroups.com> <f11369b3-5e66-4667-a92f-a59a5e9c723fn@googlegroups.com>
<DZPNJ.12962$R1C9.12134@fx22.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <4e130fbc-3d2d-4193-ad9a-de77a34e0330n@googlegroups.com>
Subject: Re: Hardware OS
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sat, 12 Feb 2022 15:54:00 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 46

by: MitchAlsup - Sat, 12 Feb 2022 15:53 UTC

On Saturday, February 12, 2022 at 9:01:59 AM UTC-6, EricP wrote:
> MitchAlsup wrote:
> > On Friday, February 11, 2022 at 11:45:08 AM UTC-6, Paul A. Clayton wrote:
> >> On Friday, February 11, 2022 at 1:20:05 AM UTC-5, robf...@gmail.com wrote:
> >>> I am wondering how much of the OS to put into hardware?
> >> ** how are abstract leaks to be managed?
> > <
> > One must not allow a guest to discover anything about host, but
> > must allow host to discover everything about Guest.
> AMD has its Secure Virtual Machine technology wherein the
> host can manage guests but not find out anything about them.
>
> AMD64 Volume 2 System Programming
> "15.34 Secure Encrypted Virtualization
> Secure Encrypted Virtualization (SEV) is available when the CPU is running
> in guest mode utilizing AMD-V virtualization features. SEV enables running
> encrypted virtual machines (VMs) in which the code and data of the virtual
> machine are secured so that the decrypted version is available only within
> the VM itself. Each virtual machine may be associated with a unique
> encryption key so if data is accessed by a different entity using a
> different key, the SEV encrypted VM's data will be decrypted with an
> incorrect key, leading to unintelligible data.
>
> It is important to note that SEV mode therefore represents a departure
> from the standard x86 virtualization security model, as the hypervisor
> is no longer able to inspect or alter all guest code or data.
> The guest page tables, managed by the guest, may mark data memory pages
> as either private or shared, thus allowing selected pages to be shared
> outside the guest. Private memory is encrypted using a guest-specific key,
> while shared memory is accessible to the hypervisor."
>
> Presumably this allows traditionally physically secure operations
> to be outsourced to cloud virtual machine providers with
> lesser security clearances.
<
This seems to me a solution to a non-existent problem.
<
Setting aside the keyed encryption of page data:
<
Instead of disallowing the real HyperVisor from seeing
into its Guest OS, convert that real HyperVisor into a virtualized
HyperVisor (i.e., a Guest OS that thinks it's a HyperVisor), wherein
the real HyperVisor* is "just a program loader service" and a "message
delivery service". Now the <ahem: what was thought> real HV cannot
see into the Guest OSs.
<
(*) small enough to be proven correct.

