comp.arch: Hardware assisted message passing

Subject  (Author)
* Hardware assisted message passing  (Stephen Fuld)
+* Re: Hardware assisted message passing  (Timothy McCaffrey)
|+* Re: Hardware assisted message passing  (robf...@gmail.com)
||+* Re: Hardware assisted message passing  (Stephen Fuld)
|||`* Re: Hardware assisted message passing  (MitchAlsup)
||| `- Re: Hardware assisted message passing  (Stephen Fuld)
||`- Re: Hardware assisted message passing  (MitchAlsup)
|`- Re: Hardware assisted message passing  (Stephen Fuld)
+* Re: Hardware assisted message passing  (Theo Markettos)
|+- Re: Hardware assisted message passing  (Stephen Fuld)
|+* Re: Hardware assisted message passing  (MitchAlsup)
||`* Re: Hardware assisted message passing  (EricP)
|| `* Re: Hardware assisted message passing  (MitchAlsup)
||  `- Re: Hardware assisted message passing  (BGB)
|`* Re: Hardware assisted message passing  (EricP)
| `* Re: Hardware assisted message passing  (George Neuner)
|  `* Re: Hardware assisted message passing  (EricP)
|   +- Re: Hardware assisted message passing  (EricP)
|   `* Re: Hardware assisted message passing  (George Neuner)
|    `- Re: Hardware assisted message passing  (EricP)
`* Re: Hardware assisted message passing  (MitchAlsup)
 +* Re: Hardware assisted message passing  (Thomas Koenig)
 |`- Re: Hardware assisted message passing  (MitchAlsup)
 +* Re: Hardware assisted message passing  (Stephen Fuld)
 |+- Re: Hardware assisted message passing  (MitchAlsup)
 |`* Re: Hardware assisted message passing  (MitchAlsup)
 | `- Re: Hardware assisted message passing  (Stephen Fuld)
 +* Re: Hardware assisted message passing  (EricP)
 |+* Re: Hardware assisted message passing  (MitchAlsup)
 ||+* Re: Hardware assisted message passing  (Ivan Godard)
 |||`- Re: Hardware assisted message passing  (MitchAlsup)
 ||+* Re: Hardware assisted message passing  (Stephen Fuld)
 |||+* Re: Hardware assisted message passing  (MitchAlsup)
 ||||`- Re: Hardware assisted message passing  (MitchAlsup)
 |||`- Re: Hardware assisted message passing  (EricP)
 ||`- Re: Hardware assisted message passing  (EricP)
 |`* Re: Hardware assisted message passing  (Niklas Holsti)
 | +- Re: Hardware assisted message passing  (MitchAlsup)
 | `* Re: Hardware assisted message passing  (EricP)
 |  `* Re: Hardware assisted message passing  (Niklas Holsti)
 |   `* Re: Hardware assisted message passing  (EricP)
 |    +* Re: Hardware assisted message passing  (Niklas Holsti)
 |    |`* Re: Hardware assisted message passing  (EricP)
 |    | +* Re: Hardware assisted message passing  (Stefan Monnier)
 |    | |`- Re: Hardware assisted message passing  (EricP)
 |    | `* Re: Hardware assisted message passing  (Niklas Holsti)
 |    |  +* Re: Hardware assisted message passing  (MitchAlsup)
 |    |  |`* Re: Hardware assisted message passing  (Niklas Holsti)
 |    |  | `* Re: Hardware assisted message passing  (MitchAlsup)
 |    |  |  `* Re: Hardware assisted message passing  (Niklas Holsti)
 |    |  |   +* Re: Hardware assisted message passing  (MitchAlsup)
 |    |  |   |`- Re: Hardware assisted message passing  (Niklas Holsti)
 |    |  |   `* Re: Hardware assisted message passing  (Stephen Fuld)
 |    |  |    `* Re: Hardware assisted message passing  (Niklas Holsti)
 |    |  |     `* Re: Hardware assisted message passing  (Stephen Fuld)
 |    |  |      +* Re: Hardware assisted message passing  (MitchAlsup)
 |    |  |      |`- Re: Hardware assisted message passing  (Stephen Fuld)
 |    |  |      `* Re: Hardware assisted message passing  (Niklas Holsti)
 |    |  |       +* Re: Hardware assisted message passing  (MitchAlsup)
 |    |  |       |`* Re: Hardware assisted message passing  (Niklas Holsti)
 |    |  |       | `* Re: Hardware assisted message passing  (MitchAlsup)
 |    |  |       |  `* Re: Hardware assisted message passing  (Niklas Holsti)
 |    |  |       |   +* Re: Hardware assisted message passing  (Ivan Godard)
 |    |  |       |   |`* Re: Hardware assisted message passing  (Niklas Holsti)
 |    |  |       |   | +- Re: Hardware assisted message passing  (Ivan Godard)
 |    |  |       |   | `* Re: Hardware assisted message passing  (Stephen Fuld)
 |    |  |       |   |  `* Re: Hardware assisted message passing  (MitchAlsup)
 |    |  |       |   |   `* Re: Hardware assisted short message passing  (MitchAlsup)
 |    |  |       |   |    `* Re: Hardware assisted short message passing  (Stephen Fuld)
 |    |  |       |   |     `* Re: Hardware assisted short message passing  (MitchAlsup)
 |    |  |       |   |      `* Re: Hardware assisted short message passing  (Stephen Fuld)
 |    |  |       |   |       +- Re: Hardware assisted short message passing  (MitchAlsup)
 |    |  |       |   |       +- Re: Hardware assisted short message passing  (robf...@gmail.com)
 |    |  |       |   |       `- Re: Hardware assisted short message passing  (MitchAlsup)
 |    |  |       |   `- Re: Hardware assisted message passing  (MitchAlsup)
 |    |  |       `- Re: Hardware assisted message passing  (Ivan Godard)
 |    |  `* Re: Hardware assisted message passing  (EricP)
 |    |   `* Re: Hardware assisted message passing  (Niklas Holsti)
 |    |    `* Re: Hardware assisted message passing  (EricP)
 |    |     `- Re: Hardware assisted message passing  (Niklas Holsti)
 |    `* Re: Hardware assisted message passing  (MitchAlsup)
 |     `* Re: Hardware assisted message passing  (EricP)
 |      +* Re: Hardware assisted message passing  (Niklas Holsti)
 |      |`- Re: Hardware assisted message passing  (EricP)
 |      `- Re: Hardware assisted message passing  (Niklas Holsti)
 `* Re: Hardware assisted message passing  (Bakul Shah)
  `* Re: Hardware assisted message passing  (MitchAlsup)
   +* Re: Hardware assisted message passing  (Bakul Shah)
   |+- Re: Hardware assisted message passing  (MitchAlsup)
   |`* Re: Hardware assisted message passing  (Ivan Godard)
   | +- Re: Hardware assisted message passing  (MitchAlsup)
   | `* Re: Hardware assisted message passing  (Bakul Shah)
   |  `* Re: Hardware assisted message passing  (Ivan Godard)
   |   `* Re: Hardware assisted message passing  (Bakul Shah)
   |    `- Re: Hardware assisted message passing  (MitchAlsup)
   `* Re: Hardware assisted message passing  (Bakul Shah)
    `- Re: Hardware assisted message passing  (MitchAlsup)

Re: Hardware assisted message passing

<somlvi$8i1$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=22215&group=comp.arch#22215

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: bak...@iitbombay.org (Bakul Shah)
Newsgroups: comp.arch
Subject: Re: Hardware assisted message passing
Date: Mon, 6 Dec 2021 19:55:28 -0800
Organization: A noiseless patient Spider
Lines: 29
Message-ID: <somlvi$8i1$1@dont-email.me>
References: <so2u87$tbg$1@dont-email.me>
<e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 7 Dec 2021 03:55:30 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="a63cdf83c829c26b0a007a1c3319910a";
logging-data="8769"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19jC79sC/78W92N/adGMcgn"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.16; rv:68.0)
Gecko/20100101 Firefox/68.0 SeaMonkey/2.53.10
Cancel-Lock: sha1:z+p/EIpLKpJoUpyHRlbl4jk5i6o=
In-Reply-To: <e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com>
 by: Bakul Shah - Tue, 7 Dec 2021 03:55 UTC

On 12/5/21 1:09 PM, MitchAlsup wrote:
>>
>> So, the basic use is to send a “message” of typically up to several
>> hundred bytes (though often much less) from one process to another
>> another process that is willing to accept it. An extension is to allow
>> the hardware itself to send or receiver a message. So on to other uses.
> <
> My intended use was to explicitly make interrupts a first class citizen
> of the architecture. An interrupt causes a waiting thread to receive control
> at a specified priority and within a specified set of CPUs (affinity).

As you may know, this is typically not how interrupt handling
currently works. There is no thread *waiting* to receive control.
The interrupt handler may borrow the kernel stack of the currently
running process or may use its own stack. The handler may wake up
an "upper half" supervisory process, depending on what specific
condition it is waiting on.

The issue is that
a) the handler has to do something to make the interrupt
condition go away, and
b) at very high I/O rates it has very little time to handle
the condition, and queuing up the interrupt condition would
make it miss some input. You can use FIFOs in each
direction, but even so you'd want to service the device
fast enough.

The upper half runs in some requesting process's context,
and there may be multiple such processes waiting.
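
To make that split concrete, here is a minimal C sketch of the flow just
described: a short handler that only silences the device and wakes a
waiting "upper half" thread, which then does the slower processing. The
device accessors (dev_ack_irq, dev_read_data, process_device_data) are
hypothetical placeholders, and the POSIX-threads framing is a user-space
analogue of the idea, not real kernel interrupt code.

#include <pthread.h>
#include <stdbool.h>

/* Hypothetical device accessors, assumed to exist elsewhere. */
extern void dev_ack_irq(void);
extern int  dev_read_data(void);
extern void process_device_data(int data);

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  work_ready = PTHREAD_COND_INITIALIZER;
static bool pending = false;

void interrupt_handler(void)          /* "bottom half": keep it short */
{
    dev_ack_irq();                    /* make the interrupt condition go away */
    pthread_mutex_lock(&lock);
    pending = true;                   /* record that there is work to do */
    pthread_cond_signal(&work_ready); /* wake the upper half */
    pthread_mutex_unlock(&lock);
}

void *upper_half(void *arg)           /* runs in ordinary thread context */
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (!pending)
            pthread_cond_wait(&work_ready, &lock);
        pending = false;
        pthread_mutex_unlock(&lock);
        process_device_data(dev_read_data());  /* the slow part */
    }
    return NULL;
}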

Re: Hardware assisted message passing

<j18kaiFibh7U1@mid.individual.net>


https://www.novabbs.com/devel/article-flat.php?id=22217&group=comp.arch#22217

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: niklas.h...@tidorum.invalid (Niklas Holsti)
Newsgroups: comp.arch
Subject: Re: Hardware assisted message passing
Date: Tue, 7 Dec 2021 10:29:05 +0200
Organization: Tidorum Ltd
Lines: 75
Message-ID: <j18kaiFibh7U1@mid.individual.net>
References: <so2u87$tbg$1@dont-email.me>
<e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com>
<u1xrJ.95050$np6.11511@fx46.iad>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: individual.net 3je3RIy7uvxyMmP5RVNO9QNjdLK5EAJ4abFy3BuGZ92z8K2qc/
Cancel-Lock: sha1:EPG8zkaviabi3JVKKxs3WYqtnQw=
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:78.0)
Gecko/20100101 Thunderbird/78.14.0
In-Reply-To: <u1xrJ.95050$np6.11511@fx46.iad>
Content-Language: en-US
 by: Niklas Holsti - Tue, 7 Dec 2021 08:29 UTC

Just some notes on the discussion of Ada rendez-vous, a bit off-topic:

On 2021-12-07 1:29, EricP wrote:
> MitchAlsup wrote:
[...]
>> On Monday, November 29, 2021 at 8:14:02 AM UTC-8, Stephen Fuld wrote:
>>> When I first encountered the Elxsi 6400 system in the mid 1980s, I
>>> saw the “unifying” power of hardware (well, in their case microcode)
>>> assisted message passing. [...]
>>> So, the basic use is to send a “message” of typically up to several
>>> hundred bytes (though often much less) from one process to another
>>> another process that is willing to accept it. An extension is to
>>> allow the hardware itself to send or receiver a message. So on to
>>> other uses.
>>
>> My intended use was to explicitly make interrupts a first class
>> citizen of the architecture. An interrupt causes a waiting thread
>> to receive control at a specified priority and within a specified
>> set of CPUs (affinity). [...]
>>
>> An interrupt is associated with a thread, that thread has a priority,
>> Thread header (PSW, Root Pointer, State) and a register file.
>>
>> The receipt of an interrupt either gets queued onto that thread (if
>> it is already running) or activates that thread if is is not
>> running. The queueing guarantees that the thread is never active
>> more than once, and that no interrupts are ever lost.
[...]

>> I worked in ADA rendezvous into the system, too. They have almost
>> all of the properties of the std interrupts, but add in the address
>> space join/release, and the reactivation of caller at the end of
>> rendezvous.
>
> I have a certain amount of apprehension about this approach.
>
> Note that Ada rendezvous are inherently _synchronous_ client-server only,
> wherein the client waits until the server finishes the rendezvous.

Yes.

> There are no events, mutexes, semaphores, etc in that model. It does
> have timed entry and accept statements, but that is synchronous
> polling.

Basically it is a time-out on the wait for the synchronous
communication. But it can be used for polling, true.

> You can build things like events and mutexes out of Ada tasks and
> rendezvous but it is ridiculously expensive because they require more
> tasks each with its header and stack, etc to create "mutex servers".

There were some Ada compilers that detected such simple "mutex" or
"monitor" tasks and optimized out the thread and stack aspects. But
current Ada has instead added data-mediated communication, using a
feature called "protected objects", similar to the "monitor" concept in
some Pascal-derived multi-threading languages and also similar to
"synchronized" in Java. With this feature one can build queues and
similar shared data structures without the extra task-switching and
stacking overhead.
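
For readers who have not seen protected objects, the shape is essentially
that of a monitor: mutual exclusion plus guarded entries. A rough analogue
of a protected queue with one entry, hand-written in C with POSIX threads
(the Ada feature is declarative and compiler-checked, so this is only an
illustration of the behaviour, not of the syntax):

#include <pthread.h>

#define QSIZE 16

typedef struct {
    pthread_mutex_t lock;        /* mutual exclusion, as in a monitor         */
    pthread_cond_t  not_empty;   /* stands in for the entry barrier           */
    int  buf[QSIZE];             /* initialize lock/not_empty before first use */
    int  head, tail, count;
} queue_t;

void queue_put(queue_t *q, int item)   /* like a protected procedure */
{
    pthread_mutex_lock(&q->lock);
    if (q->count < QSIZE) {            /* overflow is silently dropped here */
        q->buf[q->tail] = item;
        q->tail = (q->tail + 1) % QSIZE;
        q->count++;
        pthread_cond_signal(&q->not_empty);
    }
    pthread_mutex_unlock(&q->lock);
}

int queue_get(queue_t *q)              /* like an entry with barrier "count > 0" */
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int item = q->buf[q->head];
    q->head = (q->head + 1) % QSIZE;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return item;
}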

> The first thing people found out about synchronous RPC is that they really
> didn't want synchronous RPC, they wanted asynchronous RPC so they could
> submit multiple concurrent requests at once then wait for all to be done.

Ada has a standardized but optional RPC mechanism that supports both
synchronous and asynchronous RPC. However, it has not been widely used
-- programmers prefer mechanisms that work across languages, such as
sockets -- and I believe it is no longer supported by any Ada compiler.

Re: Hardware assisted message passing

<cVJrJ.106596$Wkjc.102711@fx35.iad>


https://www.novabbs.com/devel/article-flat.php?id=22219&group=comp.arch#22219

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx35.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Hardware assisted message passing
References: <so2u87$tbg$1@dont-email.me> <cPf*upDAy@news.chiark.greenend.org.uk> <ONqrJ.50504$zF3.12794@fx03.iad> <4pbtqg98to3k4ib0qi4iuu7mlv7r2iafbc@4ax.com>
In-Reply-To: <4pbtqg98to3k4ib0qi4iuu7mlv7r2iafbc@4ax.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 107
Message-ID: <cVJrJ.106596$Wkjc.102711@fx35.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Tue, 07 Dec 2021 14:08:08 UTC
Date: Tue, 07 Dec 2021 09:07:06 -0500
X-Received-Bytes: 5200
 by: EricP - Tue, 7 Dec 2021 14:07 UTC

George Neuner wrote:
> On Mon, 06 Dec 2021 11:23:00 -0500, EricP
> <ThatWouldBeTelling@thevillage.com> wrote:
>
>> In the example scenario I'm thinking of there are, say, 64k processor
>> nodes in a 256*256 2-D torus network, each node with a 16-bit address.
>> Each node executes some number of neurons, say 16,
>> so each neuron has a 20-bit address.
>> Each neuron can have up to 128 synapses, arbitrarily connected,
>> Any neuron can send a "fired" message addressed from one of its
>> axons (outputs) to any different neuron dendrite (input).
>
> Torus has a lot of communication overhead with so many nodes.

Why so?

My example is a scaling-up of what the paper describes, which has
3072 FPGA barrel-processor cores running 16 threads each, for 49,152
threads in total. Cores are connected with links to the north, east,
south and west.

I added the torus part because otherwise the distance from opposite
corners is excessive.

There are still going to be traffic jams at some intersections.
One might add more interconnect links so there are shortcuts
across the core grid to lower the hop count.
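
For what it is worth, the wrap-around links cut the worst-case distance
roughly in half. A small sketch of the hop-count calculation, assuming
the 16-bit node address is just an (x, y) pair packed as (y << 8) | x:

#include <stdint.h>
#include <stdlib.h>

/* Minimum hop count between two nodes of the 256x256 2-D torus,
 * assuming links wrap around in both dimensions. Illustrative only. */
static int torus_axis_dist(int a, int b, int size)
{
    int d = abs(a - b);
    return d < size - d ? d : size - d;   /* go the shorter way around */
}

int torus_hops(uint16_t from, uint16_t to)
{
    int fx = from & 0xFF, fy = from >> 8;
    int tx = to   & 0xFF, ty = to   >> 8;
    return torus_axis_dist(fx, tx, 256) + torus_axis_dist(fy, ty, 256);
}

With the torus the worst case is 128 + 128 = 256 hops, versus
255 + 255 = 510 hops corner-to-corner on a plain 256x256 mesh, which is
the excessive distance mentioned above.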

> Connection Machines used a hypercube (SIMD) or hypertree (SPMD).
> The SIMD models implemented single "machine" cycle any->any messaging.
> [Note: "machine" cycle /not/ CPU cycle. Each CPU implemented one or
> more vCPUs, and (being SIMD) every active vCPU sent its message in the
> same cycle].
>
>
>> So there are, say, 1M neurons and if every neuron fired and was
>> maximally connected then there would be 128M messages _per iteration_.
>>
>> Since neurons can be arbitrarily connected, messages can travel
>> in any direction through the torus to get to their destination,
>> with variable routings and numbers of hops,
>> and therefore different delivery latencies.
>> Two messages can leave in the order A B and arrive in the order B A.
>
> Can happen with any topology unless you implement causal messaging.

Yes, if routing is static then the network itself can act like a FIFO
between two points and one might be tempted to take advantage of this
when solving a 'causality problem'.

So I'm just pointing out that this is not the case here.

>> To ensure experiments are repeatable the system must be globally
>> synchronous. That is, for each clock tick a set of neurons fire
>> and complete their operation. IOW it is NOT a free-running collection
>> of asynchronous nodes firing messages at each other ASAP.
>>
>> For efficiency nodes may temporarily be locally asynchronous.
>>
>> For efficiency we do not want to send messages for neurons that
>> don't fire in a clock cycle.
>
> Which is problematic as you note below.
>
> But if you can arrange that neurons output /something/ on every graph
> cycle - even if it is a null or "no change" message - then every
> neuron can process input on every graph cycle.

Right, but as I understand it, the scenario for the termination
problem, originally posed by E.W. Dijkstra in 1980 in
"Termination detection for diffusing computations",
is that messages are only sent when needed.
(Though I found Dijkstra's paper difficult to follow.)

Dijkstra–Scholten algorithm
https://en.wikipedia.org/wiki/Dijkstra%E2%80%93Scholten_algorithm

https://en.wikipedia.org/wiki/Huang%27s_algorithm

Also note the neural network connections could be _cyclic_.

Not always sending messages is what makes it an
interesting synchronization problem.

>> The problem is, how do we know when the neurons should fire if
>> we can't just count message arrivals since some messages may not
>> be sent in a particular cycle,
>> and when has a cycle has finished so we can start the next iteration?
>
> Again the answer is causal messaging.

Ok but exactly how?

If I understood Safra's solution, cited in the paper, they form the nodes
into a linear token ring and pass a token around the loop until there have
been two passes without any change in the net difference between the
nodes' sent and received message counts.
That sounds awfully expensive.
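
A rough sketch of that counting idea as paraphrased here -- this is not
Safra's actual algorithm, which additionally colours nodes to cope with
messages that overtake the token -- just to show why a full pass of the
ring per check is costly with 64k nodes:

#define NNODES 65536

typedef struct {
    long sent;       /* messages this node has sent so far     */
    long received;   /* messages this node has received so far */
} node_t;

extern node_t nodes[NNODES];          /* assumed to be maintained elsewhere */

int ring_pass_says_quiescent(void)
{
    static long last_deficit = -1;    /* value seen on the previous pass */
    long deficit = 0;                 /* the "token" accumulates sent - received */

    for (int i = 0; i < NNODES; i++)  /* one full pass of the token around the ring */
        deficit += nodes[i].sent - nodes[i].received;

    int done = (deficit == 0 && last_deficit == 0);  /* two clean passes in a row */
    last_deficit = deficit;
    return done;
}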

>> Does that basically summarize the problem space, for neural nets at least?
>> Because I had some ideas on how to do this which might be more efficient
>> than Safra but I didn't want to launch into a great long explanation if
>> I had misunderstood the problem.
>
> YMMV,
> George

Re: Hardware assisted message passing

<d9d12563-6259-42a1-965a-37428dfd4ea3n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=22221&group=comp.arch#22221

Newsgroups: comp.arch
X-Received: by 2002:a05:622a:2c9:: with SMTP id a9mr1185927qtx.28.1638901205859; Tue, 07 Dec 2021 10:20:05 -0800 (PST)
X-Received: by 2002:a4a:5a43:: with SMTP id v64mr28076702ooa.26.1638901205645; Tue, 07 Dec 2021 10:20:05 -0800 (PST)
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!feeder1.feed.usenet.farm!feed.usenet.farm!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 7 Dec 2021 10:20:05 -0800 (PST)
In-Reply-To: <somlvi$8i1$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1a0:7384:e659:3cc7; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1a0:7384:e659:3cc7
References: <so2u87$tbg$1@dont-email.me> <e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com> <somlvi$8i1$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d9d12563-6259-42a1-965a-37428dfd4ea3n@googlegroups.com>
Subject: Re: Hardware assisted message passing
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 07 Dec 2021 18:20:05 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 70
 by: MitchAlsup - Tue, 7 Dec 2021 18:20 UTC

On Monday, December 6, 2021 at 9:55:33 PM UTC-6, Bakul Shah wrote:
> On 12/5/21 1:09 PM, MitchAlsup wrote:
> >>
> >> So, the basic use is to send a “message” of typically up to several
> >> hundred bytes (though often much less) from one process to another
> >> another process that is willing to accept it. An extension is to allow
> >> the hardware itself to send or receiver a message. So on to other uses..
> > <
> > My intended use was to explicitly make interrupts a first class citizen
> > of the architecture. An interrupt causes a waiting thread to receive control
> > at a specified priority and within a specified set of CPUs (affinity).
<
> As you may know, typically this is not how (currently)
> interrupt handling works. There is no thread *waiting* to
> receive control.
<
What do you call the pointer to code in the interrupt vector table?
Is it not a waiting thread?????
<
> The interrupt handler may borrow the the
> kernel stack of the currently running process or may use its
> own stack. The handler may wakeup an "upper half" supervisory
> process depending what specific condition it is waiting on.
<
Yes, my model allows for this wakeup by sending an IPI;
after all, waking up a waiting thread on another chip with
different chip/core affinity is basically going to require an
IPI at some point anyway.
>
> The issue is that
> a) the handler has to do something to make the interrupt
> condition go away,
<
So, the faster one gets to the ISR the better !
The less context the ISR has to save, the better !!
My model has HW perform both functions.
<
The SW receiving control already has its registers in the
same state they were in when it went into a wait state (last
time), so it should have the various data needed to access
the interrupting device instantly. It already has its own stack
and does not have to borrow the system stack, nor save state on
the system stack, nor does it need ANY particular privilege.
<
> b) at very high IO rates it has very little time to handle
> the condition and queuing up the interrupt condition would
> make it miss some input. You can use FIFOs in each
> direction but even so you'd want to service the device
> fast enough.
<
My IPIs are blips on the memory BW interconnect. As explained
above, the queueing is single cycle, so an interrupt can be
accepted and queued every cycle. Context-switch dispatch
is a 5-cycle event on the interconnect and at the CPU.
<
Since the ISR already has its state, its very first instructions will
read from the control registers of the device; a few instructions
later it will write to the control registers of the device, look up
something in its tables, and IPI (1 instruction) the waiting thread
back into a running state.
<
This model should REMOVE several hundred cycles per interrupt
from how ARM or x86 do the same work.
>
> The upper-half runs in the some requesting process'es
> context, and there may be multiple such processes waiting.
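
A minimal C sketch of the ISR-thread body described above -- read the
device's control registers, clear the condition, look something up, and
IPI the waiting thread back to running. The register indices, the lookup
table and the send_ipi() primitive are invented for illustration and are
not the actual My 66000 mechanism.

#include <stdint.h>

#define DEV_STATUS 0                       /* hypothetical register indices */
#define DEV_ACK    1

extern volatile uint32_t *dev_regs;        /* assumed: mapped device registers  */
extern uint32_t waiting_thread_for[256];   /* assumed: event -> thread-id table */
extern void send_ipi(uint32_t thread_id);  /* assumed: one-instruction wakeup   */

void isr_thread_body(void)
{
    uint32_t status = dev_regs[DEV_STATUS]; /* first instructions: read the device */
    dev_regs[DEV_ACK] = status;             /* write back: the condition goes away */

    uint32_t event  = status & 0xFF;        /* look something up in its tables */
    uint32_t target = waiting_thread_for[event];

    send_ipi(target);                       /* put the waiting thread back to running */
}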

Re: Hardware assisted message passing

<303d8b2e-88fb-424f-9e85-2dfa1d7a0c2en@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=22222&group=comp.arch#22222

Newsgroups: comp.arch
X-Received: by 2002:ad4:5365:: with SMTP id e5mr890403qvv.127.1638901347900; Tue, 07 Dec 2021 10:22:27 -0800 (PST)
X-Received: by 2002:a9d:82a:: with SMTP id 39mr36657321oty.282.1638901347661; Tue, 07 Dec 2021 10:22:27 -0800 (PST)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 7 Dec 2021 10:22:27 -0800 (PST)
In-Reply-To: <j18kaiFibh7U1@mid.individual.net>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1a0:7384:e659:3cc7; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1a0:7384:e659:3cc7
References: <so2u87$tbg$1@dont-email.me> <e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com> <u1xrJ.95050$np6.11511@fx46.iad> <j18kaiFibh7U1@mid.individual.net>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <303d8b2e-88fb-424f-9e85-2dfa1d7a0c2en@googlegroups.com>
Subject: Re: Hardware assisted message passing
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 07 Dec 2021 18:22:27 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 80
 by: MitchAlsup - Tue, 7 Dec 2021 18:22 UTC

On Tuesday, December 7, 2021 at 2:29:09 AM UTC-6, Niklas Holsti wrote:
> Just some notes on the discussion of Ada rendez-vous, a bit off-topic:
>
> On 2021-12-07 1:29, EricP wrote:
> > MitchAlsup wrote:
> [...]
> >> On Monday, November 29, 2021 at 8:14:02 AM UTC-8, Stephen Fuld wrote:
> >>> When I first encountered the Elxsi 6400 system in the mid 1980s, I
> >>> saw the “unifying” power of hardware (well, in their case microcode)
> >>> assisted message passing. [...]
> >>> So, the basic use is to send a “message” of typically up to several
> >>> hundred bytes (though often much less) from one process to another
> >>> another process that is willing to accept it. An extension is to
> >>> allow the hardware itself to send or receiver a message. So on to
> >>> other uses.
> >>
> >> My intended use was to explicitly make interrupts a first class
> >> citizen of the architecture. An interrupt causes a waiting thread
> >> to receive control at a specified priority and within a specified
> >> set of CPUs (affinity). [...]
> >>
> >> An interrupt is associated with a thread, that thread has a priority,
> >> Thread header (PSW, Root Pointer, State) and a register file.
> >>
> >> The receipt of an interrupt either gets queued onto that thread (if
> >> it is already running) or activates that thread if is is not
> >> running. The queueing guarantees that the thread is never active
> >> more than once, and that no interrupts are ever lost.
> [...]
> >> I worked in ADA rendezvous into the system, too. They have almost
> >> all of the properties of the std interrupts, but add in the address
> >> space join/release, and the reactivation of caller at the end of
> >> rendezvous.
> >
> > I have a certain amount of apprehension about this approach.
> >
> > Note that Ada rendezvous are inherently _synchronous_ client-server only,
> > wherein the client waits until the server finishes the rendezvous.
> Yes.
> > There are no events, mutexes, semaphores, etc in that model. It does
> > have timed entry and accept statements, but that is synchronous
> > polling.
<
> Basically it is a time-out on the wait for the synchronous
> communication. But it can be used for polling, true.
<
> > You can build things like events and mutexes out of Ada tasks and
> > rendezvous but it is ridiculously expensive because they require more
> > tasks each with its header and stack, etc to create "mutex servers".
<
> There were some Ada compilers that detected such simple "mutex" or
> "monitor" tasks and optimized out the thread and stack aspects. But
> current Ada has instead added data-mediated communication, using a
> feature called "protected objects", similar to the "monitor" concept in
> some Pascal-derived multi-threading languages and also similar to
> "synchronized" in Java. With this feature one can build queues and
> similar shared data structures without the extra task-switching and
> stacking overhead.
<
An ISA ATOMIC event is more efficient than a rendezvous ATOMIC event,
so my model has no problem allowing the compiler to do that.
<
> > The first thing people found out about synchronous RPC is that they really
> > didn't want synchronous RPC, they wanted asynchronous RPC so they could
> > submit multiple concurrent requests at once then wait for all to be done.
> Ada has an standardized but optional RPC mechanism that supports both
> synchronous and asynchronous RPC. However, it has not been widely used
> -- programmers prefer to use mechanisms that work inter-language, such
> as sockets -- and I believe is not supported now by any Ada compiler.
<
Thanks

Re: Hardware assisted message passing

<EZOrJ.100666$Ql5.18772@fx39.iad>


https://www.novabbs.com/devel/article-flat.php?id=22224&group=comp.arch#22224

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!newsfeed.xs4all.nl!newsfeed7.news.xs4all.nl!feeder1.feed.usenet.farm!feed.usenet.farm!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx39.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Hardware assisted message passing
References: <so2u87$tbg$1@dont-email.me> <cPf*upDAy@news.chiark.greenend.org.uk> <ONqrJ.50504$zF3.12794@fx03.iad> <4pbtqg98to3k4ib0qi4iuu7mlv7r2iafbc@4ax.com> <cVJrJ.106596$Wkjc.102711@fx35.iad>
In-Reply-To: <cVJrJ.106596$Wkjc.102711@fx35.iad>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 56
Message-ID: <EZOrJ.100666$Ql5.18772@fx39.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Tue, 07 Dec 2021 19:54:12 UTC
Date: Tue, 07 Dec 2021 14:53:59 -0500
X-Received-Bytes: 3283
 by: EricP - Tue, 7 Dec 2021 19:53 UTC

EricP wrote:
> George Neuner wrote:
>> On Mon, 06 Dec 2021 11:23:00 -0500, EricP
>
>>> The problem is, how do we know when the neurons should fire if
>>> we can't just count message arrivals since some messages may not
>>> be sent in a particular cycle,
>>> and when has a cycle has finished so we can start the next iteration?
>>
>> Again the answer is causal messaging.
>
> Ok but exactly how?
>
> If I understood Safra's solution, cited in the paper, they form the nodes
> into a linear token ring and pass a token around the loop until there are
> two passes without any changes in the net difference between the
> node's send and received message counts.
> That sounds awfully expensive.

The basic idea I had for how to do this is to ask all the neurons
what they are going to do next, then watch until they do that.

1) Define the start of a cycle as the point when all neurons have received
the messages for cycle T_n but have not yet fired for cycle T_n+1.

2) A central manager thread broadcasts a GO message to all neurons
for this cycle. The signal is distributed hierarchically to groups
and subgroups, e.g. each node informs 4 others, so that after
8 steps (4^8 = 65536) all 64k nodes have received the broadcast.

3) All neurons use the values pending in their inputs to calculate their
activation function and decide whether they will fire in this cycle.
They all then report back to the manager the number of messages
that they _intend_ to send in this cycle.

The total number of messages the neurons intend to send is summed up
in the opposite direction of the broadcast, say in quads, so that after
8 steps the manager has the total number of messages to be sent in this
cycle across all 64k nodes.

4) After reporting to the manager the number of messages they intend
to send, the neurons proceed to send them until all are gone.

Neurons also count the number of messages they have received
with values for the next cycle.

5) The manager broadcasts a probe request asking all neurons how many
messages they actually received. The replies are totaled up in quads,
so that after 8 steps the current totals of all 64k nodes are summed.

When the number actually received matches the number the nodes intended
to send, the cycle is over and a new set of inputs is waiting on each
neuron for the next cycle to start.
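
As a control-flow sketch only, the manager side of the above might look
as follows in C; the broadcast and reduction fan-outs are hidden behind
assumed primitives, and a real implementation would not busy-wait in
step 5:

extern void broadcast_go(long cycle);      /* step 2: tell all neurons to evaluate    */
extern long reduce_intended_sends(void);   /* step 3: sum of messages nodes will send */
extern long reduce_received_so_far(void);  /* step 5: sum of messages received so far */

void run_cycle(long cycle)
{
    broadcast_go(cycle);                      /* fan-out, ~8 steps for 64k nodes */

    long intended = reduce_intended_sends();  /* fan-in of the per-node intents */

    /* The neurons now send their "fired" messages on their own. The cycle
     * is over once every message that was promised has actually arrived. */
    while (reduce_received_so_far() < intended)
        ;                                     /* in practice: sleep and re-probe, not spin */
}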

Re: Hardware assisted message passing

<soota5$js6$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=22227&group=comp.arch#22227

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: bak...@iitbombay.org (Bakul Shah)
Newsgroups: comp.arch
Subject: Re: Hardware assisted message passing
Date: Tue, 7 Dec 2021 16:12:51 -0800
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <soota5$js6$1@dont-email.me>
References: <so2u87$tbg$1@dont-email.me>
<e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com>
<somlvi$8i1$1@dont-email.me>
<d9d12563-6259-42a1-965a-37428dfd4ea3n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 8 Dec 2021 00:12:54 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="4e4171fbfa52a5e6148afa56f859f760";
logging-data="20358"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Wnv0xQCKDd1beg9rYNIFT"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.16; rv:68.0)
Gecko/20100101 Firefox/68.0 SeaMonkey/2.53.10
Cancel-Lock: sha1:VvaIKsr+cc1885t5Yj0dyW53SWY=
In-Reply-To: <d9d12563-6259-42a1-965a-37428dfd4ea3n@googlegroups.com>
 by: Bakul Shah - Wed, 8 Dec 2021 00:12 UTC

On 12/7/21 10:20 AM, MitchAlsup wrote:
> On Monday, December 6, 2021 at 9:55:33 PM UTC-6, Bakul Shah wrote:
>> On 12/5/21 1:09 PM, MitchAlsup wrote:
>>>>
>>>> So, the basic use is to send a “message” of typically up to several
>>>> hundred bytes (though often much less) from one process to another
>>>> another process that is willing to accept it. An extension is to allow
>>>> the hardware itself to send or receiver a message. So on to other uses.
>>> <
>>> My intended use was to explicitly make interrupts a first class citizen
>>> of the architecture. An interrupt causes a waiting thread to receive control
>>> at a specified priority and within a specified set of CPUs (affinity).
> <
>> As you may know, typically this is not how (currently)
>> interrupt handling works. There is no thread *waiting* to
>> receive control.
> <
> What do you all the pointer to code in the interrupt vector table ?
> Is it not a waiting thread ?????

No. I think of it as a procedure call. AFAIK all the processors are
still following the model Dijkstra & IBM independently came up with.
From Dijkstra's "My Recollections of Operating System Design":
https://www.cs.utexas.edu/users/EWD/transcriptions/EWD13xx/EWD1303.html

To start with I tried to convince myself that it was possible to
save and restore enough of the machine state so that, after the
servicing of the interrupt, under all circumstances the interrupted
computation could be resumed correctly.

... I designed a protocol for saving and restoring that could be
shown always to work properly. For all three of us it had been a
very instructive experience. It was clearly an experience that
others had missed because for years I would find flaws in designs
of interrupt hardware.

The interrupt vector is more like a function table - what you call
depends on the entry associated with an interrupt source.

The Amd29k had one of the simplest implementations. On an interrupt
or trap, it copied just *three* registers to backup registers and
disabled all interrupts. A few instructions were needed to do
anything useful. The "iret" instruction restored these registers
and the original code continued executing.

Something like x86 has a much more complicated system but this
basic principle remains.

You can do a complete context switch on an interrupt, but then you
are saving a lot more state than necessary. Most of the time the
same code is going to continue executing after iret, so why bother?
IMHO, it is better to think of an interrupt as an external trap
than as a signal for a thread switch.

The other thing to think about is that the OS scheduler would
want to be in control of all thread switching, so that it can
keep track of who is using what resources and follow scheduling
policies that provide overall control for smooth operation.

Re: Hardware assisted message passing

<soov52$sus$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=22228&group=comp.arch#22228

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: bak...@iitbombay.org (Bakul Shah)
Newsgroups: comp.arch
Subject: Re: Hardware assisted message passing
Date: Tue, 7 Dec 2021 16:44:18 -0800
Organization: A noiseless patient Spider
Lines: 65
Message-ID: <soov52$sus$1@dont-email.me>
References: <so2u87$tbg$1@dont-email.me>
<e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com>
<somlvi$8i1$1@dont-email.me>
<d9d12563-6259-42a1-965a-37428dfd4ea3n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 8 Dec 2021 00:44:18 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="4e4171fbfa52a5e6148afa56f859f760";
logging-data="29660"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX199HArvTPOuMHvEwHkXMLZZ"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.16; rv:68.0)
Gecko/20100101 Firefox/68.0 SeaMonkey/2.53.10
Cancel-Lock: sha1:BKWEsw+RewUgevvri8Fwx2JLSjs=
In-Reply-To: <d9d12563-6259-42a1-965a-37428dfd4ea3n@googlegroups.com>
 by: Bakul Shah - Wed, 8 Dec 2021 00:44 UTC

[Forgot to complete the response...]
On 12/7/21 10:20 AM, MitchAlsup wrote:
> On Monday, December 6, 2021 at 9:55:33 PM UTC-6, Bakul Shah wrote:
>>
>> The interrupt handler may borrow the the
>> kernel stack of the currently running process or may use its
>> own stack. The handler may wakeup an "upper half" supervisory
>> process depending what specific condition it is waiting on.
> <
> Yes, my model allows for this wakeup by sending an IPI,
> after all waking up a waiting thread on another chip with
> different chip/core affinity is going to basically require an
> IPI at some point anyway.

IPI is simply reaching out and touching some processor!
Whichever processor ultimately services an interrupt request,
it still has to do the same thing.

>>
>> The issue is that
>> a) the handler has to do something to make the interrupt
>> condition go away,
> <
> So, the faster one gets to the ISR the better !
> The less context the ISR has to save, the better !!
> My model has HW perform both functions.
> <
> The SW receiving control already has its registers in the
> same state they were when it went into a wait state (last
> time) so it should have the various data needed to access
> the interrupting device instantly. It already has its own stack
> and does not have to borrow system stack, not save state on
> system stack, nor does it need ANY particular privilege.

What I meant: what needs to be done to "make the interrupt
condition go away" is highly device- and condition-dependent.
If you have full control over the peripheral/IO device, you
can define a protocol that maximizes throughput and/or
minimizes latency and the s/w overhead, but you are stuck with
USB, UART, SPI, I2C, etc. :-)

>> b) at very high IO rates it has very little time to handle
>> the condition and queuing up the interrupt condition would
>> make it miss some input. You can use FIFOs in each
>> direction but even so you'd want to service the device
>> fast enough.
> <
> My IPIs are blips on the memory BW interconnect. As explained
> above, the queueing is single cycle, so an interrupt can be
> accepted and queued every cycle. Context-switch dispatch
> is a 5-cycle event on the interconnect and at the CPU.
> <
> Since ISR already has its state, it's very first instructions will
> read from the control registers of the device, a few instructions
> later it will write to the control registers of the device, look up
> something in its tables, and IPI (1 instruction) the waiting thread
> back into a running state.
> <
> This model should REMOVE several hundred cycles per interrupt
> from how ARM or x86 do the same work.

Perhaps. Everything you wrote above can help traditional interrupt
handling as well.

Re: Hardware assisted message passing

<80fa2f56-3c38-43e6-b13e-53dacc9cbe26n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=22229&group=comp.arch#22229

Newsgroups: comp.arch
X-Received: by 2002:ac8:6b56:: with SMTP id x22mr3863313qts.656.1638927140561;
Tue, 07 Dec 2021 17:32:20 -0800 (PST)
X-Received: by 2002:a05:6808:1141:: with SMTP id u1mr8639566oiu.30.1638927140265;
Tue, 07 Dec 2021 17:32:20 -0800 (PST)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!feeder1.cambriumusenet.nl!feed.tweak.nl!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 7 Dec 2021 17:32:20 -0800 (PST)
In-Reply-To: <soota5$js6$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:a4f6:3ac6:4901:b5ca;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:a4f6:3ac6:4901:b5ca
References: <so2u87$tbg$1@dont-email.me> <e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com>
<somlvi$8i1$1@dont-email.me> <d9d12563-6259-42a1-965a-37428dfd4ea3n@googlegroups.com>
<soota5$js6$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <80fa2f56-3c38-43e6-b13e-53dacc9cbe26n@googlegroups.com>
Subject: Re: Hardware assisted message passing
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 08 Dec 2021 01:32:20 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Wed, 8 Dec 2021 01:32 UTC

On Tuesday, December 7, 2021 at 6:12:56 PM UTC-6, Bakul Shah wrote:
> On 12/7/21 10:20 AM, MitchAlsup wrote:
> > On Monday, December 6, 2021 at 9:55:33 PM UTC-6, Bakul Shah wrote:
> >> On 12/5/21 1:09 PM, MitchAlsup wrote:
> >>>>
> >>>> So, the basic use is to send a “message” of typically up to several
> >>>> hundred bytes (though often much less) from one process to another
> >>>> another process that is willing to accept it. An extension is to allow
> >>>> the hardware itself to send or receiver a message. So on to other uses.
> >>> <
> >>> My intended use was to explicitly make interrupts a first class citizen
> >>> of the architecture. An interrupt causes a waiting thread to receive control
> >>> at a specified priority and within a specified set of CPUs (affinity)..
> > <
> >> As you may know, typically this is not how (currently)
> >> interrupt handling works. There is no thread *waiting* to
> >> receive control.
> > <
> > What do you all the pointer to code in the interrupt vector table ?
> > Is it not a waiting thread ?????
<
> No. I think of it as a procedure call.
<
After receiving control and saving state, is it not fundamentally
the embodiment of an abstract thread?
<
A thread has a collection of bit patterns that describe and limit
the activities of the thread {IP, MMU Root Pointer, Program Status
Register(s), ...}, along with program-visible state {GPRs, FPRs,
XMM registers, ...}, and is running under an OS and possibly an HV.
<
What property above does the ISR not contain??
<
Yes, it is running at system privilege (something NOT necessary)
and technically has access to the <suspended> task's virtual
memory area (also technically unnecessary).
<
So what prevents the running embodiment from smelling like a thread ?
<
> AFAIK all the processors are
> still following the model Dijkstra & IBM independently came up with.
<
Just because a model was developed 2 decades before there were
multiprocessors does not mean it should continue to be followed
5 decades later !!
<
> From Dijkstra's "My Recollections of Operating System Design":
> https://www.cs.utexas.edu/users/EWD/transcriptions/EWD13xx/EWD1303.html
>
> To start with I tried to convince myself that it was possible to
> save and restore enough of the machine state so that, after the
> servicing of the interrupt, under all circumstances the interrupted
> computation could be resumed correctly.
<
I assigned this unit of work to HW. No instructions are used to save
state or change a myriad of control registers. HW guarantees that
when control returns to the interrupted task, the time spent not
running is invisible (other than in wall-clock time).
>
> ... I designed a protocol for saving and restoring that could be
> shown always to work properly. For all three of us it had been a
> very instructive experience. It was clearly an experience that
> others had missed because for years I would find flaws in designs
> of interrupt hardware.
>
> The interrupt vector is more like a function table - what you call
> depends on the entry associated with an interrupt source.
<
The "thing" accessing the function table (above) is a thread, and
it transfers control to an entry point which (then) starts running
just like a thread !! What makes this different ??
>
> The Amd29k had one of the simplest implementations. On interrupt
> or trap, it copied just *three* register to backup registers and
> disabled all interrupts. A few instructions were needed to do
> anything useful. The "iret" instruction restored these registers
> and the original code continued executing.
<
The 88K did similarly, and there is some suspicion we were down there
first, although it can be said that a lot of this part of the 88K was
stolen from MIPS; so no credit is being given to the originators......
>
> Something like x86 has a much more complicated system but this
> basic principle remains.
<
The "suck wind and be overly complicated" part they got right in
spades !!
<
It is time to move beyond the "I am now running and I need to morph
into something/somebody else" model. That was acceptable when
there was only 1 CPU/core. Now that the minimum is 4-6-8 cores, and
especially now that there are many systems with multiple chips, each
chip containing 4-16 cores, the one-core-does-everything model is
"sub-optimal".
<
This is what I am trying to fix.
<
Thus, interrupts are not directed to a processor/core/CPU; they are
directed to a system queueing "function unit" that can perform
all of the context-switch decisions (except affinitization) without
using ANY (zero, 0.0, nada) CPU instructions.
>
> You can do a complete context switch on interrupt but then you
> are saving a lot more state than necessary.
<
But if I can save everything that IS necessary in 5 cycles, has saving
extra state hurt me in any way? One cycle saves all 8 doublewords
that represent thread-header state; another 4 cycles move the register
file to memory.
<
This does not get multiplied by 2× (a save is associated with a restore)
because the arriving data pushes out the current data as if in an exchange.
So there are 5 beats of data from the memory controller towards the core,
and 5 beats of data from the core to the memory controller, and these
10 beats of data take only 6 cycles as seen at the point of context
switching.
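
A hypothetical C layout, just to check the arithmetic, assuming 64-byte
transfer beats and the 32 x 64-bit register file of My 66000; the exact
header fields and their order are guesses, not the real format:

#include <stdint.h>

typedef struct {
    uint64_t ip;            /* instruction pointer                    */
    uint64_t root_pointer;  /* MMU root pointer                       */
    uint64_t psw;           /* program status word                    */
    uint64_t other[5];      /* remaining header doublewords           */
} thread_header_t;          /* 8 doublewords = 64 B = 1 beat          */

typedef struct {
    thread_header_t header; /* 1 beat                                 */
    uint64_t gpr[32];       /* 32 x 8 B = 256 B = 4 beats             */
} thread_state_t;           /* 5 beats total, matching the text above */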
<
> Most of the time the
> same code is going to continue executing after iret so why bother.
> IMHO, it is better to think of an interrupt as an external trap
> than as a signal for a thread switch.
<
So, how do you create a trap/exception in chip[i].core[k].thread[j] that
causes the exception handler to run in chip[m].core[p].thread[e] ?
>
> The other thing to think about is that the OS scheduler would
> want to be in control of all thread switching so that it can
> keep track of who is using what resources and follow some
> scheduling policies to provide an overall control for a smooth
> operation.
<
The OS moves threads in and out of the system queueing subsystem.
While IN the subsystem the system queue performs the switching.
While NOT in the subsystem, the thread is essentially IDLE in a wait
state.
<
There are statistics controls in the thread header that enable the system
to watch and control what and how many resources are in use.

Re: Hardware assisted message passing

<62c6a35b-2cd6-4bd4-89e3-c7c3a038b585n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=22230&group=comp.arch#22230

Newsgroups: comp.arch
X-Received: by 2002:a37:5cc:: with SMTP id 195mr3404476qkf.680.1638927685985; Tue, 07 Dec 2021 17:41:25 -0800 (PST)
X-Received: by 2002:a05:6808:1914:: with SMTP id bf20mr8950458oib.7.1638927685773; Tue, 07 Dec 2021 17:41:25 -0800 (PST)
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 7 Dec 2021 17:41:25 -0800 (PST)
In-Reply-To: <soov52$sus$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:a4f6:3ac6:4901:b5ca; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:a4f6:3ac6:4901:b5ca
References: <so2u87$tbg$1@dont-email.me> <e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com> <somlvi$8i1$1@dont-email.me> <d9d12563-6259-42a1-965a-37428dfd4ea3n@googlegroups.com> <soov52$sus$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <62c6a35b-2cd6-4bd4-89e3-c7c3a038b585n@googlegroups.com>
Subject: Re: Hardware assisted message passing
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 08 Dec 2021 01:41:25 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 85
 by: MitchAlsup - Wed, 8 Dec 2021 01:41 UTC

On Tuesday, December 7, 2021 at 6:44:21 PM UTC-6, Bakul Shah wrote:
> [Forgot to complete the response...]
> On 12/7/21 10:20 AM, MitchAlsup wrote:
> > On Monday, December 6, 2021 at 9:55:33 PM UTC-6, Bakul Shah wrote:
> >>
> >> The interrupt handler may borrow the the
> >> kernel stack of the currently running process or may use its
> >> own stack. The handler may wakeup an "upper half" supervisory
> >> process depending what specific condition it is waiting on.
> > <
> > Yes, my model allows for this wakeup by sending an IPI,
> > after all waking up a waiting thread on another chip with
> > different chip/core affinity is going to basically require an
> > IPI at some point anyway.
<
> IPI is simply reaching out and touching some processor!
<
No, some thread! That thread has been affinitized to a subset
of cores (= processors), and should a context switch be appropriate,
it is performed to the lowest-priority core on this chip. So, interrupts
are absolutely NOT directed towards some processor--they are directed
to the ISR thread, and the system queue figures out when and whom to
context switch--all without any CPU cycles being utilized (= wasted).
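
A sketch of that dispatch rule, purely for illustration: among the cores
permitted by the thread's affinity mask, pick the one currently running
at the lowest priority. In the model described here this decision is made
by the queueing hardware, not by software, and the array and the
lower-number-means-lower-priority convention below are invented.

#include <stdint.h>

#define NCORES 16

extern int core_current_priority[NCORES];   /* assumed: priority now running on each core */

int pick_target_core(uint32_t affinity_mask)
{
    int best = -1;
    for (int c = 0; c < NCORES; c++) {
        if (!(affinity_mask & (1u << c)))
            continue;                        /* core not in this thread's affinity set */
        if (best < 0 || core_current_priority[c] < core_current_priority[best])
            best = c;                        /* lower value = lower priority (assumed) */
    }
    return best;                             /* core that should take the context switch */
}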
<
> Whichever processor ultimately services an interrupt request,
> it still has to do the same thing.
> >>
> >> The issue is that
> >> a) the handler has to do something to make the interrupt
> >> condition go away,
> > <
> > So, the faster one gets to the ISR the better !
> > The less context the ISR has to save, the better !!
> > My model has HW perform both functions.
> > <
> > The SW receiving control already has its registers in the
> > same state they were when it went into a wait state (last
> > time) so it should have the various data needed to access
> > the interrupting device instantly. It already has its own stack
> > and does not have to borrow system stack, not save state on
> > system stack, nor does it need ANY particular privilege.
<
> What I meant: what needs to be done to "make the interrupt
> condition go away" is highly device and condition dependent.
> If you have full control over the peripheral/IO device, you
> can define a protocol that can maximize throughput and/or
> minimize latency and the s/w overhead but you are stuck with
> USB, UART, SPI, I2C etc. :-)
<
We seem to be talking past each other.
<
Your response misused a bunch of words that might be appropriate
when talking about yesterday's architectures, but are not appropriate
when talking about an architecture designed under the notion
of "inherent parallelism". I was redirecting your word choices to a
higher level of abstraction, one that is inherently more efficient than
yesterday's architectures--all without changing an iota of what devices
do and what ISRs do when confronted by interrupts.
<
All this architecture gets rid of is a) cycles of latency, b) cycles of
queueing overhead and the associated locks, and c) cycles of save and
restore.
<
> >> b) at very high IO rates it has very little time to handle
> >> the condition and queuing up the interrupt condition would
> >> make it miss some input. You can use FIFOs in each
> >> direction but even so you'd want to service the device
> >> fast enough.
> > <
> > My IPIs are blips on the memory BW interconnect. As explained
> > above, the queueing is single cycle, so an interrupt can be
> > accepted and queued every cycle. Context-switch dispatch
> > is a 5-cycle event on the interconnect and at the CPU.
> > <
> > Since ISR already has its state, it's very first instructions will
> > read from the control registers of the device, a few instructions
> > later it will write to the control registers of the device, look up
> > something in its tables, and IPI (1 instruction) the waiting thread
> > back into a running state.
> > <
> > This model should REMOVE several hundred cycles per interrupt
> > from how ARM or x86 do the same work.
<
> Perhaps. Everything you wrote above can help traditional interrupt
> handling as well.
<
And yet you seem to be defending the old and inefficient position.

Re: Hardware assisted message passing

<sop4c1$lqt$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=22231&group=comp.arch#22231

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Hardware assisted message passing
Date: Tue, 7 Dec 2021 18:13:21 -0800
Organization: A noiseless patient Spider
Lines: 74
Message-ID: <sop4c1$lqt$1@dont-email.me>
References: <so2u87$tbg$1@dont-email.me>
<e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com>
<somlvi$8i1$1@dont-email.me>
<d9d12563-6259-42a1-965a-37428dfd4ea3n@googlegroups.com>
<soota5$js6$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 8 Dec 2021 02:13:22 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="1e815e270007718034a0a0072fbb7457";
logging-data="22365"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18FwQlXmPqfSgTPzsMLA11H"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.3.2
Cancel-Lock: sha1:EMco/b0KQjnXsUvjHmDMX+ICDew=
In-Reply-To: <soota5$js6$1@dont-email.me>
Content-Language: en-US
 by: Ivan Godard - Wed, 8 Dec 2021 02:13 UTC

On 12/7/2021 4:12 PM, Bakul Shah wrote:
> On 12/7/21 10:20 AM, MitchAlsup wrote:
>> On Monday, December 6, 2021 at 9:55:33 PM UTC-6, Bakul Shah wrote:
>>> On 12/5/21 1:09 PM, MitchAlsup wrote:
>>>>>
>>>>> So, the basic use is to send a “message” of typically up to several
>>>>> hundred bytes (though often much less) from one process to another
>>>>> another process that is willing to accept it. An extension is to allow
>>>>> the hardware itself to send or receiver a message. So on to other
>>>>> uses.
>>>> <
>>>> My intended use was to explicitly make interrupts a first class citizen
>>>> of the architecture. An interrupt causes a waiting thread to receive
>>>> control
>>>> at a specified priority and within a specified set of CPUs (affinity).
>> <
>>> As you may know, typically this is not how (currently)
>>> interrupt handling works. There is no thread *waiting* to
>>> receive control.
>> <
>> What do you all the pointer to code in the interrupt vector table ?
>> Is it not a waiting thread ?????
>
> No. I think of it as a procedure call. AFAIK all the processors are
> still following the model Dijkstra & IBM independently came up with.
> From Dijkstra's "My Recollections of Operating System Design":
> https://www.cs.utexas.edu/users/EWD/transcriptions/EWD13xx/EWD1303.html
>
>   To start with I tried to convince myself that it was possible to
>   save and restore enough of the machine state so that, after the
>   servicing of the interrupt, under all circumstances the interrupted
>   computation could be resumed correctly.
>
>   ... I designed a protocol for saving and restoring that could be
>   shown always to work properly. For all three of us it had been a
>   very instructive experience. It was clearly an experience that
>   others had missed because for years I would find flaws in designs
>   of interrupt hardware.
>
> The interrupt vector is more like a function table - what you call
> depends on the entry associated with an interrupt source.
>
> The Amd29k had one of the simplest implementations. On interrupt
> or trap, it copied just *three* register to backup registers and
> disabled all interrupts. A few instructions were needed to do
> anything useful. The "iret" instruction restored these registers
> and the original code continued executing.
>
> Something like x86 has a much more complicated system but this
> basic principle remains.
>
> You can do a complete context switch on interrupt but then you
> are saving a lot more state than necessary. Most of the time the
> same code is going to continue executing after iret so why bother.
> IMHO, it is better to think of an interrupt as an external trap
> than as a signal for a thread switch.
>
> The other thing to think about is that the OS scheduler would
> want to be in control of all thread switching so that it can
> keep track of who is using what resources and follow some
> scheduling policies to provide an overall control for a smooth
> operation.

You can have it both ways. Mill dispatch is to a function pointer as you
describe, but that can point at a portal so the call migrates to a
different context as Mitch describes. Of course, you stay in the same
stack, which Mitch doesn't do, but Mill can interleave multiple contexts
on a single stack.

Your concern for the cost of context switch is a bit outdated. Both M66
and Mill can do a switch in the time of a legacy call, though using
different methods, and there are other ways available too. The critical
thing is to design context switch as a first class operation from the
beginning.

Re: Hardware assisted message passing

<a6b2fa2f-6c2b-4e50-b746-2bfe5bc90c23n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=22232&group=comp.arch#22232

X-Received: by 2002:a05:6214:5286:: with SMTP id kj6mr3941276qvb.50.1638931542425; Tue, 07 Dec 2021 18:45:42 -0800 (PST)
X-Received: by 2002:a05:6808:16ac:: with SMTP id bb44mr9020744oib.122.1638931530214; Tue, 07 Dec 2021 18:45:30 -0800 (PST)
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 7 Dec 2021 18:45:30 -0800 (PST)
In-Reply-To: <sop4c1$lqt$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:a4f6:3ac6:4901:b5ca; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:a4f6:3ac6:4901:b5ca
References: <so2u87$tbg$1@dont-email.me> <e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com> <somlvi$8i1$1@dont-email.me> <d9d12563-6259-42a1-965a-37428dfd4ea3n@googlegroups.com> <soota5$js6$1@dont-email.me> <sop4c1$lqt$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a6b2fa2f-6c2b-4e50-b746-2bfe5bc90c23n@googlegroups.com>
Subject: Re: Hardware assisted message passing
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 08 Dec 2021 02:45:42 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 90
 by: MitchAlsup - Wed, 8 Dec 2021 02:45 UTC

On Tuesday, December 7, 2021 at 8:13:25 PM UTC-6, Ivan Godard wrote:
> On 12/7/2021 4:12 PM, Bakul Shah wrote:
> > On 12/7/21 10:20 AM, MitchAlsup wrote:
> >> On Monday, December 6, 2021 at 9:55:33 PM UTC-6, Bakul Shah wrote:
> >>> On 12/5/21 1:09 PM, MitchAlsup wrote:
> >>>>>
> >>>>> So, the basic use is to send a “message” of typically up to several
> >>>>> hundred bytes (though often much less) from one process to another
> >>>>> another process that is willing to accept it. An extension is to allow
> >>>>> the hardware itself to send or receiver a message. So on to other
> >>>>> uses.
> >>>> <
> >>>> My intended use was to explicitly make interrupts a first class citizen
> >>>> of the architecture. An interrupt causes a waiting thread to receive
> >>>> control
> >>>> at a specified priority and within a specified set of CPUs (affinity).
> >> <
> >>> As you may know, typically this is not how (currently)
> >>> interrupt handling works. There is no thread *waiting* to
> >>> receive control.
> >> <
> >> What do you all the pointer to code in the interrupt vector table ?
> >> Is it not a waiting thread ?????
> >
> > No. I think of it as a procedure call. AFAIK all the processors are
> > still following the model Dijkstra & IBM independently came up with.
> > From Dijkstra's "My Recollections of Operating System Design":
> > https://www.cs.utexas.edu/users/EWD/transcriptions/EWD13xx/EWD1303.html
> >
> > To start with I tried to convince myself that it was possible to
> > save and restore enough of the machine state so that, after the
> > servicing of the interrupt, under all circumstances the interrupted
> > computation could be resumed correctly.
> >
> > ... I designed a protocol for saving and restoring that could be
> > shown always to work properly. For all three of us it had been a
> > very instructive experience. It was clearly an experience that
> > others had missed because for years I would find flaws in designs
> > of interrupt hardware.
> >
> > The interrupt vector is more like a function table - what you call
> > depends on the entry associated with an interrupt source.
> >
> > The Amd29k had one of the simplest implementations. On interrupt
> > or trap, it copied just *three* register to backup registers and
> > disabled all interrupts. A few instructions were needed to do
> > anything useful. The "iret" instruction restored these registers
> > and the original code continued executing.
> >
> > Something like x86 has a much more complicated system but this
> > basic principle remains.
> >
> > You can do a complete context switch on interrupt but then you
> > are saving a lot more state than necessary. Most of the time the
> > same code is going to continue executing after iret so why bother.
> > IMHO, it is better to think of an interrupt as an external trap
> > than as a signal for a thread switch.
> >
> > The other thing to think about is that the OS scheduler would
> > want to be in control of all thread switching so that it can
> > keep track of who is using what resources and follow some
> > scheduling policies to provide an overall control for a smooth
> > operation.
> You can have it both ways. Mill dispatch is to a function pointer as you
> describe, but that can point at a portal so the call migrates to a
> different context as Mitch describes. Of course, you stay in the same
> stack, which Mitch doesn't do, but Mill can interleave multiple contexts
> on a single stack.
>
> Your concern for the cost of context switch is a bit outdated. Both M66
> and Mill can do a switch in the time of a legacy call, though using
> different methods, and there are other ways available too. The critical
> thing is to design context switch as a first class operation from the
> beginning.
<
Well, I got there many years after the ISA had stabilized; sort of like
walking backwards to the starting point you should have taken originally.

Re: Hardware assisted message passing

<sopgdf$g2d$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=22233&group=comp.arch#22233

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Hardware assisted message passing
Date: Tue, 7 Dec 2021 23:38:53 -0600
Organization: A noiseless patient Spider
Lines: 202
Message-ID: <sopgdf$g2d$1@dont-email.me>
References: <so2u87$tbg$1@dont-email.me>
<cPf*upDAy@news.chiark.greenend.org.uk>
<89dad950-6ca9-4836-9d0d-a5d0d09fcc72n@googlegroups.com>
<NNqrJ.50503$zF3.50320@fx03.iad>
<27b35f1c-83dc-402f-9f5d-db0c90ca2a45n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 8 Dec 2021 05:38:55 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="0da8b3598a6c92ee98c0ce81a9b18d1b";
logging-data="16461"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19VOkPlh1YeKGtNwKLmM4eK"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.3.2
Cancel-Lock: sha1:ZG1wfdkZTW3A+S/5/vAKQPFNIvQ=
In-Reply-To: <27b35f1c-83dc-402f-9f5d-db0c90ca2a45n@googlegroups.com>
Content-Language: en-US
 by: BGB - Wed, 8 Dec 2021 05:38 UTC

On 12/6/2021 1:59 PM, MitchAlsup wrote:
> On Monday, December 6, 2021 at 10:23:12 AM UTC-6, EricP wrote:
>> MitchAlsup wrote:
>>> On Wednesday, December 1, 2021 at 2:32:54 AM UTC-8, Theo Markettos wrote:
>>>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> wrote:
>>>>> 2. Elxsi also used the message passing mechanism to process faults
>>>>> such as divide by zero. The hardware sends a message to the offending
>>>>> process, which can choose to process it or not. Thus another mechanism
>>>>> can be subsumed into message passing.
>>> <
>>>> Faults are just variations of traps, so the above also applies.
>>> <
>>> Pedantic mode=on
>>> <
>>> No Faults are not traps! A trap is something requested by an instruction
>>> (the TAP instruction, for example) while a fault is an unexpected event
>>> during the performance of an instruction.
>>> <
>>> Yes you can "map" traps and faults into the same "vector" space, but
>>> you should be very careful when doing this.
>>> <
>> BTW you left Pedantic mode on.
> <
> I can't decide if I should let this one go (or not) !!
>

Unmatched braces for pedantic mode...

I haven't said anything in this thread until now, but the interrupt
mechanism in BJX2 also passes an address around, which could in theory be
used for inter-core message passing.

It is possible that an interrupt could pass:
  A 16-bit exception code;
  A 48-bit address;
  An additional 64-bit data field:
    Overlaps with the high-order bits of an XTLB miss.

In principle, the address field could be used as a data field if no
address needs to be given.

Any request-number bits generally go into the low 16 bits, which are, in
general:
  (15:12): Exception Type
  (11: 8): Core/Node ID (0=Local, 1..F: Route to another core).
  ( 7: 0): Interrupt Number

Exception Types (Thus far):
  0..7: Unused / Non-Interrupt (Ignored)
  8: General Exception
  9: Unused
  A: TLB/MMU Faults
  B: Unused
  C: IRQs
  D: Unused (Possibly, Inter-Core IRQ)
  E: Syscall Event (Typically Local)
  F: Internal Use (Eg, 'RTE')

These types of messages are handled differently within the core:
  General Exception:
    Not in an ISR: Force ISR immediately.
    In an ISR: Immediate Deadlock (1).
  TLB Miss:
    Not in an ISR: Trigger an ISR.
    In an ISR: Ignore or Queue (ISRs don't use the MMU).
  IRQ:
    Not in an ISR: Trigger an ISR or Ignore (depends on SR).
    In an ISR: Queue or Ignore (depends on SR).

1: Other CPUs tend to auto-reboot in this case. However, deadlocking is
more useful for debugging.
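
Purely as a reader's sketch of the layout above (the type, field, and
function names below are mine, not BJX2's), the low 16 bits might be
unpacked and dispatched roughly like this in C:

/* Illustrative only: unpack the low 16 bits of a BJX2-style interrupt
   message as described above; names are invented for this sketch and
   are not from the actual BJX2 implementation. */

#include <stdint.h>

typedef struct {
    uint8_t exc_type;   /* bits 15:12, exception type               */
    uint8_t node_id;    /* bits 11: 8, 0=local, 1..F=route to core  */
    uint8_t irq_num;    /* bits  7: 0, interrupt number             */
} ExcCode;

static ExcCode decode_exc_code(uint16_t code)
{
    ExcCode c;
    c.exc_type = (uint8_t)((code >> 12) & 0xF);
    c.node_id  = (uint8_t)((code >>  8) & 0xF);
    c.irq_num  = (uint8_t)( code        & 0xFF);
    return c;
}

/* Dispatch along the lines of the table above (type values as listed:
   8=general exception, A=TLB/MMU fault, C=IRQ, E=syscall, F=internal). */
static void handle_message(uint16_t code, int in_isr)
{
    ExcCode c = decode_exc_code(code);
    if (c.node_id != 0)
        return;                /* would be routed to another core/node */
    switch (c.exc_type) {
    case 0x8: /* general exception: force ISR, or deadlock if in_isr   */ break;
    case 0xA: /* TLB miss: trigger ISR, or ignore/queue if in_isr      */ break;
    case 0xC: /* IRQ: trigger, queue, or ignore depending on SR/in_isr */ break;
    default:  /* unused or locally handled types                       */ break;
    }
    (void)in_isr;
}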

Scaling this to a larger number of cores is TBD; it would likely need to
either pull in more bits from somewhere else, or maybe add an XTRAP
instruction:
  XTRAP Xn  //Trap with 128-bit Exception Code

Where, say:
  ( 63: 16): Address Field (Low Bits)
  ( 15:  0): Exception Code (as before)
  (111: 64): Address Field (High Bits)
  (127:112): Expanded Tag / Routing Bits
    (123:112): Extended Core ID
    (127:124): Tag

But, yeah, not much notable/new ATM in BJX2 land.

Got distracted with developing another speed-oriented video codec:
  Color-cell based, no entropy coder, optional LZ stage (RP2);
  Uses an RGB555 space with differential endpoint encodings;
    Can typically shoehorn endpoint pairs down to 8 or 16 bits;
    This involved considerable use of "aggressive bit twiddling".
  Uses a 6-bit "pattern table" holding a crude imitation of a DCT (*2);
    Smaller than a 4x4x1 block (16 bits).
  Many blocks can reuse the endpoints from prior blocks if similar;
  Uses simple skip or block-offset-copy motion compensation;
  ...

*2: It is based on the observation that, when DCT does well at encoding
a block compactly, it is often because it manages to compact the block
down effectively to a single (non-zero) AC coefficient. So the premise
is: what if we set up a pattern table to mimic the superficial behavior
of a DCT block with only a single non-zero AC coefficient? (The role of
the DC coefficient and of the AC coefficient's magnitude is instead
shifted over to the endpoints.)

In the actual implementation, it mostly boils down to some
semi-convoluted logic for generating a 6-bit lookup table filled with
precomputed patterns (and can be used instead of a 4x4 block if it is
"close enough").

Design is loosely similar to another codec of mine from sometime last
year, just using differential RGB555 rather than dynamic index-color
(which had very noticeable quality issues due to "palette lag").

On my desktop PC, it is fast enough to generally operate in
gigapixel/second territory with a single-threaded decoder.

On BJX2 (at 50MHz), it is fast enough to sustain 320x200 30fps video
playback.

Albeit, this is with a color-cell/block decoder written in ASM, and
reading the whole AVI into RAM in advance (otherwise the video playback
becomes severely IO bound if the video stream is over ~ 800kbps; but at
present this is limited to playing an AVI less than 32MB).

While significantly better than CRAM, its quality still leaves
"something to be desired" if one tries to force it down to ~ 300 or 400
kbps. By comparison, at 320x200@30Hz, CRAM tends to need several Mbps to
look reasonable.

So, the CPU-side decoding cost with the new codec is very similar to
that of CRAM, but with it having a significantly lower bitrate.

If one starts with a CRAM-like design, it is possible to get significant
space savings by LZ compressing the video frames (but, LZ compressed
CRAM, by itself, still sucks pretty bad).

I had observed that the AVIs could be made around 30% smaller by using
Deflate on them (via ZIP or gzip). However, trying to throw a
Huffman-coded LZ stage (ULZ) at the frames did not yield a similar
effect (per-frame, the potential savings were much smaller).

Most of the frames, individually, were a bit too small and fell well
below the "break even" points for the use of Huffman compression (most
of the buffers were small enough that RP2 was generally giving the best
compression, followed closely by LZ4).

It is more likely that Deflate was able to get an advantage when applied
over the whole AVI in that it could fit multiple video frames into its
32K sliding window (and spread the same Huffman tables over a number of
video frames).

Did also look at throwing an MTF+Rice based LZ encoder at the problem
(BSRLZ), but it didn't beat out RP2 by "enough to matter". It was
generally able to produce the smallest output in an input size-range of
~ 200B to 1K, but only a few percent ahead of RP2.

So, for input frame-size ranges (typical smallest encoding):
  0..99B:     raw
  100B..200B: RP2
  200B..1K:   RP2 or BSRLZ
  1K..4K:     RP2 or ULZ (pretty close)
  4K..20K:    ULZ (RP2 follows within 10%)
  20K+:       ULZ (wins by ~ 10-15% vs RP2)
  50K+:       ULZ (wins by ~ 20-30% vs RP2)

Where: RP2 is a byte-oriented LZ that tends to be slightly more compact
than LZ4 in many cases (except BJX2 machine code, where LZ4 wins). RP2
decoding is also pretty fast (roughly in the same category as LZ4).

ULZ sort of resembles a hybrid of a simplified Deflate and LZ4: it uses
an LZ4-like structure for the encoded stream, a simpler scheme for
encoding the tables of symbol lengths, and limits symbols to 12 or 13
bits. Compression tends to be similar to that of Deflate.

One drawback of ULZ is its use of 3 Huffman tables. If I were to go this
route, I could also consider a variant with a single Huffman table (for
literal bytes), handling match lengths and distances with Rice-coded
prefixes; the likely option is a flag in the encoding to specify the use
of Rice-coded lengths and distances.

The remaining unaddressed issue here is the relatively high constant
cost of setting up a Huffman table (if used, every so often a frame
contains what is effectively a hand grenade in terms of performance
cost).

However, if the encoder is set to require ULZ to cross a 20% threshold,
it is pretty much unused at 320x200 or 320x180 (its probability of
"winning" is strongly correlated with frame resolution).

....

Re: Hardware assisted message passing

<96e1rg9f3ag86ulefk9iqh39hgff1irffd@4ax.com>


https://www.novabbs.com/devel/article-flat.php?id=22234&group=comp.arch#22234

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: gneun...@comcast.net (George Neuner)
Newsgroups: comp.arch
Subject: Re: Hardware assisted message passing
Date: Wed, 08 Dec 2021 09:32:16 -0500
Organization: A noiseless patient Spider
Lines: 149
Message-ID: <96e1rg9f3ag86ulefk9iqh39hgff1irffd@4ax.com>
References: <so2u87$tbg$1@dont-email.me> <cPf*upDAy@news.chiark.greenend.org.uk> <ONqrJ.50504$zF3.12794@fx03.iad> <4pbtqg98to3k4ib0qi4iuu7mlv7r2iafbc@4ax.com> <cVJrJ.106596$Wkjc.102711@fx35.iad>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: reader02.eternal-september.org; posting-host="14b4c7ff446ca3cb77e614126ec85928";
logging-data="14338"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/jKpcEdDyiYmmeANSjZhtDKjQ6uZObekw="
User-Agent: ForteAgent/8.00.32.1272
Cancel-Lock: sha1:zj5EUViq3QKiqL8ymMx8Lv/NKRw=
 by: George Neuner - Wed, 8 Dec 2021 14:32 UTC

On Tue, 07 Dec 2021 09:07:06 -0500, EricP
<ThatWouldBeTelling@thevillage.com> wrote:

>George Neuner wrote:
>> On Mon, 06 Dec 2021 11:23:00 -0500, EricP
>> <ThatWouldBeTelling@thevillage.com> wrote:
>>
>>> In the example scenario I'm thinking of there are, say, 64k processor
>>> nodes in a 256*256 2-D torus network, each node with a 16-bit address.
>>> Each node executes some number of neurons, say 16,
>>> so each neuron has a 20-bit address.
>>> Each neuron can have up to 128 synapses, arbitrarily connected,
>>> Any neuron can send a "fired" message addressed from one of its
>>> axons (outputs) to any different neuron dendrite (input).
>>
>> Torus has a lot of communication overhead with so many nodes.
>
>Why so?
>
>My example is a scaling up of what the paper describes,
>which has 3072 FPGA barrel processor cores running 16 threads each,
>49152 threads. Cores are connected with links north, east, south and west.
>
>I added the torus part because otherwise the distance from opposite
>corners is excessive.

The problem with /large/ torus networks is the network "diameter". The
diameter is defined variously either as the minimum number of relay
nodes a message will have to pass through, or the minimum number of
links a message will travel over, between the most distant nodes in
the network graph.

You obviously know this, but so we are on the same page ...

A 2D torus effectively is an MxN array (a sheet) with wrap around
connections at the edges. A 3D torus is a stack P of MxN sheets with
corresponding nodes in each sheet connected vertically plus wrap
around top to bottom.

The distance between the most distant nodes in an MxN 2D torus is
floor(M/2) + floor(N/2), roughly (M+N)/2: you can move in row or column,
but since both wrap around at the edges, the farthest you may need to
travel in any dimension is half its length.

Obviously in a real network you may have to travel further due to
routing decisions.

This generalizes to 3D: the diameter of a PxMxN torus is
floor(P/2) + floor(M/2) + floor(N/2), roughly (P+M+N)/2.

Ok. IIRC, you were talking about ~64K nodes. The simplest 3D
configuration is a (64000 node) 40x40x40 cube. The diameter of this cube
(configured as a torus) is 60: (40+40+40)/2.

Contrast with a hypercube, where the diameter is log2(# of nodes). A
65536 node cube has a diameter of 16. Message routing is also simpler,
as there is no notion of "direction" in a hypercube - routing just
computes the Hamming difference (XOR) between the cube addresses of
sender and receiver, and any link that corresponds to a set bit of that
difference gets the message closer to its receiver.

Hypertrees are more complicated but potentially can do even better.

A 65536 node hypercube has (roughly) the same diameter as a 1000 node
torus. Which do you want working on your program?

Torus networks are great for problems that have good spatial locality,
or that have very regular messaging patterns. They pretty much /suck/
for problems that need more general - or worse, arbitrary -
communication.
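
For concreteness, a quick C sketch of the arithmetic above (per-dimension
halves for the wrapped torus, and the XOR/Hamming rule for hypercube
routing); nothing here beyond what the post already states:

/* Quick check of the diameters discussed above, plus the XOR next-hop
   rule for a hypercube. Plain C, illustrative only. */

#include <stdio.h>

/* Torus: the farthest you travel in a wrapped dimension of length n is
   n/2, so the diameter is the sum of the per-dimension halves. */
static int torus3_diameter(int p, int m, int n)
{
    return p / 2 + m / 2 + n / 2;
}

static int popcount(unsigned x)
{
    int c = 0;
    while (x) { c += (int)(x & 1u); x >>= 1; }
    return c;
}

int main(void)
{
    printf("256x256 2-D torus diameter: %d\n", 256 / 2 + 256 / 2);            /* 256 */
    printf("40x40x40 3-D torus diameter: %d\n", torus3_diameter(40, 40, 40)); /* 60  */

    /* 65536-node hypercube: 16-bit node addresses, diameter 16.
       Routing: flip any one bit where current and destination addresses
       differ; each such hop gets strictly closer to the receiver. */
    unsigned cur = 0x1234, dst = 0xA2C4;          /* arbitrary example */
    printf("hops left from %04x to %04x: %d (max possible: 16)\n",
           cur, dst, popcount(cur ^ dst));
    return 0;
}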

>>> To ensure experiments are repeatable the system must be globally
>>> synchronous. That is, for each clock tick a set of neurons fire
>>> and complete their operation. IOW it is NOT a free-running collection
>>> of asynchronous nodes firing messages at each other ASAP.
>>>
>>> For efficiency nodes may temporarily be locally asynchronous.
>>>
>>> For efficiency we do not want to send messages for neurons that
>>> don't fire in a clock cycle.
>>
>> Which is problematic as you note below.
>>
>> But if you can arrange that neurons output /something/ on every graph
>> cycle - even if it is a null or "no change" message - then every
>> neuron can process input on every graph cycle.
>
>Right, but as I understand the scenario for the termination problem,
>originally posed by E.W. Dijkstra in 1980 in
>"Termination detection for diffusing computations",
>was that messages are only sent when needed.
>(Though I found Dijkstra's paper difficult to follow.)
>
>Dijkstra–Scholten algorithm
>https://en.wikipedia.org/wiki/Dijkstra%E2%80%93Scholten_algorithm
>
>https://en.wikipedia.org/wiki/Huang%27s_algorithm
>
>Also note the neural network connections could be _cyclic_.

Which is one of those "more general" patterns for which torus networks
are not great (at least when they get large).

>Not always sending messages is what makes it an
>interesting synchronization problem.

Note that Huang did not actually solve the problem ... at least not
entirely ... because he provided no way to tell whether messages were
lost or simply not sent.

His method essentially is a form of causal messaging. A peculiar form
IMO, but it works for some purposes.

>>> The problem is, how do we know when the neurons should fire if
>>> we can't just count message arrivals since some messages may not
>>> be sent in a particular cycle,
>>> and when has a cycle has finished so we can start the next iteration?
>
>> Again the answer is causal messaging.
>
>Ok but exactly how?
>
>If I understood Safra's solution, cited in the paper, they form the nodes
>into a linear token ring and pass a token around the loop until there are
>two passes without any changes in the net difference between the
>node's send and received message counts.
>That sounds awfully expensive.

I'll have to check my references and come back to this.

>>> Does that basically summarize the problem space, for neural nets at least?
>>> Because I had some ideas on how to do this which might be more efficient
>>> than Safra but I didn't want to launch into a great long explanation if
>>> I had misunderstood the problem.

George

Re: Hardware assisted message passing

<eA6sJ.101542$Ql5.99592@fx39.iad>


https://www.novabbs.com/devel/article-flat.php?id=22235&group=comp.arch#22235

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx39.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Hardware assisted message passing
References: <so2u87$tbg$1@dont-email.me> <cPf*upDAy@news.chiark.greenend.org.uk> <ONqrJ.50504$zF3.12794@fx03.iad> <4pbtqg98to3k4ib0qi4iuu7mlv7r2iafbc@4ax.com> <cVJrJ.106596$Wkjc.102711@fx35.iad> <96e1rg9f3ag86ulefk9iqh39hgff1irffd@4ax.com>
In-Reply-To: <96e1rg9f3ag86ulefk9iqh39hgff1irffd@4ax.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 36
Message-ID: <eA6sJ.101542$Ql5.99592@fx39.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Wed, 08 Dec 2021 18:12:26 UTC
Date: Wed, 08 Dec 2021 13:12:27 -0500
X-Received-Bytes: 2460
 by: EricP - Wed, 8 Dec 2021 18:12 UTC

George Neuner wrote:
> On Tue, 07 Dec 2021 09:07:06 -0500, EricP
>> Ok but exactly how?
>>
>> If I understood Safra's solution, cited in the paper, they form the nodes
>> into a linear token ring and pass a token around the loop until there are
>> two passes without any changes in the net difference between the
>> node's send and received message counts.
>> That sounds awfully expensive.
>
> I'll have to check my references and come back to this.

Here is the description of Safra that the original paper linked to. I
found it difficult to follow.

Shmuel Safra’s version of termination detection, Dijkstra, 1987
https://www.cs.utexas.edu/users/EWD/ewd09xx/EWD998.PDF

also published (but not called Safra) in
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.114.9127&rep=rep1&type=pdf

I also just found it in a book and am about to look at it -
maybe it is easier to follow.

"On a Method of Multiprogramming", Feijen, Gasteren 1999
chapter 29 Shmuel Safra's Termination Detection Algorithm

They all seem to be focused on counting the number of messages
actually sent and actually received, which are two dynamic quantities,
and trying to detect when no messages are or will be in-flight in a system.

As I suggested in another msg, asking the neurons how many messages each
intends to send, and summing that up, gives a static value for each cycle
and allows easy termination detection; the sums can also be done in
parallel. Or perhaps I misunderstood something about the problem.
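
A minimal sketch of that counting scheme as I read it (names invented):
each node announces its intended send count before the cycle, the
announcements are summed - in parallel on real hardware, sequentially
here - and the cycle terminates when the received total reaches that
fixed target:

/* Sketch of the counting idea above: before a cycle, every node reports
   how many messages it intends to send; the sum is a fixed target for
   that cycle, so the cycle is finished exactly when the global count of
   received messages reaches the target. The reductions are sequential
   here; real hardware would do a parallel (tree) sum. */

#include <stddef.h>

typedef struct {
    int intends_to_send;   /* announced before the cycle starts */
    int received;          /* incremented as messages arrive    */
} Node;

static long cycle_send_target(const Node *nodes, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += nodes[i].intends_to_send;
    return total;          /* static for the whole cycle */
}

static long cycle_received_so_far(const Node *nodes, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += nodes[i].received;
    return total;
}

/* The cycle has terminated when every announced message has arrived. */
static int cycle_done(const Node *nodes, size_t n)
{
    return cycle_received_so_far(nodes, n) == cycle_send_target(nodes, n);
}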

Re: Hardware assisted message passing

<soqvgp$1jj$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=22236&group=comp.arch#22236

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Hardware assisted message passing
Date: Wed, 8 Dec 2021 11:02:47 -0800
Organization: A noiseless patient Spider
Lines: 54
Message-ID: <soqvgp$1jj$1@dont-email.me>
References: <so2u87$tbg$1@dont-email.me>
<e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com>
<u1xrJ.95050$np6.11511@fx46.iad>
<2b4aa260-7cec-4891-9813-569d66a15626n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 8 Dec 2021 19:02:49 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c7144fc8a349811ce70f566333a06f2a";
logging-data="1651"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/WpKYq3sD9ylSlgWntMQaFvuT2D2TPHZY="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.0
Cancel-Lock: sha1:gVRkMhU+cO8s3XJRSxzANj3ka9Y=
In-Reply-To: <2b4aa260-7cec-4891-9813-569d66a15626n@googlegroups.com>
Content-Language: en-US
 by: Stephen Fuld - Wed, 8 Dec 2021 19:02 UTC

On 12/6/2021 4:37 PM, MitchAlsup wrote:
> On Monday, December 6, 2021 at 5:31:04 PM UTC-6, EricP wrote:
>> MitchAlsup wrote:

snip

>> The first thing people found out about synchronous RPC is that they really
>> didn't want synchronous RPC, they wanted asynchronous RPC so they could
>> submit multiple concurrent requests at once then wait for all to be done.

This gets into the area of what features a message passing system should
support. One of the questions is whether it is asynchronous or not. This
is related to what support for multicasting is appropriate. Just to be
sure, by asynchronous I mean the ability for an executing thread to send
a message without necessarily giving up control, so the next instruction
can execute immediately. This is related to multicasting in that a
thread can use it to send messages to multiple destinations as a sort of
"poor man's" multicasting.

> Why can an ADA asynch not be setup by having an acceptation entry point
> enqueue the rendezvous calls onto (one or more) queues that this task
> shares with a worker task ?

That can work, but it seems to imply a lot of extra overhead versus the
sender sending messages to multiple worker tasks directly.

There are lots of things to decide about multicasting, the first of
which is how much of it to include in the basic hardware.

> <
>>
>> I would want to work through some realistic examples, from soup to nuts,
>> enqueuing multiple async IO packets to various parts of the IO subsystem
>> using the rendezvous model. Just to be sure there are no gotcha's
>> and can be as efficient as hoped.
> <
> I do not see any fundamental reason why asynchronous RPC could not be worked
> into what I currently have. While it's not like the <strict> ADA rendezvous, it uses
> this piece here, and that piece there FROM the rendezvous feature set.

One of my purposes in starting this thread is to have people more
knowledgeable than me hash out what features are required/desirable,
and, for the desirable ones, work through the tradeoffs.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Hardware assisted message passing

<7e9659b0-a544-493d-b071-c626c60a3af6n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=22237&group=comp.arch#22237

X-Received: by 2002:ad4:5eca:: with SMTP id jm10mr10361928qvb.54.1638994667856;
Wed, 08 Dec 2021 12:17:47 -0800 (PST)
X-Received: by 2002:a05:6830:1445:: with SMTP id w5mr1559821otp.112.1638994667526;
Wed, 08 Dec 2021 12:17:47 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 8 Dec 2021 12:17:47 -0800 (PST)
In-Reply-To: <soqvgp$1jj$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b00f:a0c6:d327:6bf1;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b00f:a0c6:d327:6bf1
References: <so2u87$tbg$1@dont-email.me> <e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com>
<u1xrJ.95050$np6.11511@fx46.iad> <2b4aa260-7cec-4891-9813-569d66a15626n@googlegroups.com>
<soqvgp$1jj$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7e9659b0-a544-493d-b071-c626c60a3af6n@googlegroups.com>
Subject: Re: Hardware assisted message passing
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 08 Dec 2021 20:17:47 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 84
 by: MitchAlsup - Wed, 8 Dec 2021 20:17 UTC

On Wednesday, December 8, 2021 at 1:02:52 PM UTC-6, Stephen Fuld wrote:
> On 12/6/2021 4:37 PM, MitchAlsup wrote:
> > On Monday, December 6, 2021 at 5:31:04 PM UTC-6, EricP wrote:
> >> MitchAlsup wrote:
> snip
> >> The first thing people found out about synchronous RPC is that they really
> >> didn't want synchronous RPC, they wanted asynchronous RPC so they could
> >> submit multiple concurrent requests at once then wait for all to be done.
> This gets into the area of what features a message passing system should
> support. One of the questions is asynchronous or not. This is related
> to what support for multicasting is appropriate. Just to be sure, by
> asynchronous, I mean the ability for an executing thread to send a
> message without necessarily giving up control, so the next instruction
> can execute immediately. This is related to multi-casting in that a
> thread can use it to send messages to multiple destinations as a sort of
> "poor mans" multicasting.
> > Why can an ADA asynch not be setup by having an acceptation entry point
> > enqueue the rendezvous calls onto (one or more) queues that this task
> > shares with a worker task ?
> That can work, but it seems to imply a lot of extra overhead versus the
> sender sending messages to multiple worker tasks directly.
>
> There are lots of things to decide about multicasting, the first of
> which is how much of it include in the basic hardware.
> > <
> >>
> >> I would want to work through some realistic examples, from soup to nuts,
> >> enqueuing multiple async IO packets to various parts of the IO subsystem
> >> using the rendezvous model. Just to be sure there are no gotcha's
> >> and can be as efficient as hoped.
> > <
> > I do not see any fundamental reason why asynchronous RPC could not be worked
> > into what I currently have. While it's not like the <strict> ADA rendezvous, it uses
> > this piece here, and that piece there FROM the rendezvous feature set.
<
> One of my purposes in starting this thread is to have people more
> knowledgeable than me hash out what features are required/desirable,
> and, for the desirable ones, work through the tradeoffs.
<
Me, I start with what I know how to "do" in HW and then work on providing a
means to express "I want to use That HW" as a short sequence of instructions
(as short as 1 instruction). The exposed model has to work as well for the
typical model of trapping into the OS to ask for some service as for sending
a message several hops down the chip-to-chip interconnect to activate a
thread (over there) asking it to perform some service. SW should see as
little difference between these scenarios as possible (hopefully none).
<
For example::
<
a) I did not start with an assumption that interrupts were targeted at CPUs
(1950 model) but that interrupts are targeted at threads (ISRs:: 2020 model).
b) I already had the notion that HW performs context switches
{the saving and restoring of state} ((But note HW does not mean CPU or core))
c) the components of a system are INHERENTLY parallel and concurrent
1) there is more than one core
2) there may be more than one memory + DRAM controller
3) there may be more than one southbridge
4) there may be more than one chip-to-chip interconnect
d) the components of a system are blocks of logic {thousands to millions of gates}
1) a core contains several to many function units that perform arithmetic, memory
.....access, and control flow. These are well understood units-of-work.
2) there is no reason one could NOT put a function unit in the memory controller
.....should we have a well defined set of units-of-work (effectively instructions)
.....for that function unit.
3) once positioned, one needs a means for SW to access the HW functionality.
e) once accessible, these function units can remove work from the OS+HV
.....in ways that enhance system performance {similar to making faster calculating
function units of a core.}
f) one has to draw the line with a good understanding of the worker (typical threads)
and of the controller (OS, HV, ISR, I/O completion,...) in order that the new model
can be programmed under systems architected "Oh so long ago" {Unix, Linux}
<
------------------------------------------------------------------------------------------------------------------------------------
<
I know how to build HW queues that can perform typical queueing activities at
least 1 per cycle {Insert on tail, insert on front, remove from front, Move to wait state,
Move from wait state back to run state}. I also know no SW
I know how to build bus interconnect (fabrics) that can send interrupts at
these kinds of rates.
So, for example, my HW understands the concept of obeying affinity, but the OS/HV
is in charge of manipulating affinity.
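
A small software model of those five queue operations, purely to spell
out the semantics (the point above being that the hardware does each of
these in about a cycle):

/* Software model of the queue operations listed above: insert on tail,
   insert on front, remove from front, and move between wait and run
   queues. Illustrative only; the hardware does each in ~1 cycle. */

#include <stddef.h>

typedef struct Thread {
    struct Thread *next;
    int id;
} Thread;

typedef struct {
    Thread *head, *tail;
} Queue;

static void q_insert_tail(Queue *q, Thread *t)
{
    t->next = NULL;
    if (q->tail) q->tail->next = t; else q->head = t;
    q->tail = t;
}

static void q_insert_head(Queue *q, Thread *t)
{
    t->next = q->head;
    q->head = t;
    if (!q->tail) q->tail = t;
}

static Thread *q_remove_head(Queue *q)
{
    Thread *t = q->head;
    if (t) {
        q->head = t->next;
        if (!q->head) q->tail = NULL;
        t->next = NULL;
    }
    return t;
}

/* "Move to wait" / "move back to run" shown here as moving the head of
   one queue onto the tail of the other; the hardware presumably names a
   specific thread rather than always taking the head. */
static void q_move(Queue *from, Queue *to)
{
    Thread *t = q_remove_head(from);
    if (t) q_insert_tail(to, t);
}
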
<
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)

Re: Hardware assisted message passing

<27e3e923-ee81-4307-94d0-90a07110b3d9n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=22238&group=comp.arch#22238

X-Received: by 2002:a05:6214:301b:: with SMTP id ke27mr10350947qvb.68.1638996136418;
Wed, 08 Dec 2021 12:42:16 -0800 (PST)
X-Received: by 2002:aca:ac8a:: with SMTP id v132mr1923549oie.44.1638996136125;
Wed, 08 Dec 2021 12:42:16 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 8 Dec 2021 12:42:15 -0800 (PST)
In-Reply-To: <7e9659b0-a544-493d-b071-c626c60a3af6n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b00f:a0c6:d327:6bf1;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b00f:a0c6:d327:6bf1
References: <so2u87$tbg$1@dont-email.me> <e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com>
<u1xrJ.95050$np6.11511@fx46.iad> <2b4aa260-7cec-4891-9813-569d66a15626n@googlegroups.com>
<soqvgp$1jj$1@dont-email.me> <7e9659b0-a544-493d-b071-c626c60a3af6n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <27e3e923-ee81-4307-94d0-90a07110b3d9n@googlegroups.com>
Subject: Re: Hardware assisted message passing
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 08 Dec 2021 20:42:16 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 133
 by: MitchAlsup - Wed, 8 Dec 2021 20:42 UTC

On Wednesday, December 8, 2021 at 2:17:49 PM UTC-6, MitchAlsup wrote:
> On Wednesday, December 8, 2021 at 1:02:52 PM UTC-6, Stephen Fuld wrote:
> > On 12/6/2021 4:37 PM, MitchAlsup wrote:
> > > On Monday, December 6, 2021 at 5:31:04 PM UTC-6, EricP wrote:
> > >> MitchAlsup wrote:
> > snip
> > >> The first thing people found out about synchronous RPC is that they really
> > >> didn't want synchronous RPC, they wanted asynchronous RPC so they could
> > >> submit multiple concurrent requests at once then wait for all to be done.
> > This gets into the area of what features a message passing system should
> > support. One of the questions is asynchronous or not. This is related
> > to what support for multicasting is appropriate. Just to be sure, by
> > asynchronous, I mean the ability for an executing thread to send a
> > message without necessarily giving up control, so the next instruction
> > can execute immediately. This is related to multi-casting in that a
> > thread can use it to send messages to multiple destinations as a sort of
> > "poor mans" multicasting.
> > > Why can an ADA asynch not be setup by having an acceptation entry point
> > > enqueue the rendezvous calls onto (one or more) queues that this task
> > > shares with a worker task ?
> > That can work, but it seems to imply a lot of extra overhead versus the
> > sender sending messages to multiple worker tasks directly.
> >
> > There are lots of things to decide about multicasting, the first of
> > which is how much of it include in the basic hardware.
> > > <
> > >>
> > >> I would want to work through some realistic examples, from soup to nuts,
> > >> enqueuing multiple async IO packets to various parts of the IO subsystem
> > >> using the rendezvous model. Just to be sure there are no gotcha's
> > >> and can be as efficient as hoped.
> > > <
> > > I do not see any fundamental reason why asynchronous RPC could not be worked
> > > into what I currently have. While it's not like the <strict> ADA rendezvous, it uses
> > > this piece here, and that piece there FROM the rendezvous feature set.
> <
> > One of my purposes in starting this thread is to have people more
> > knowledgeable than me hash out what features are required/desirable,
> > and, for the desirable ones, work through the tradeoffs.
> <
> Me; I start with what I know how to "do" in HW and then work on providing a
> means to express "I want to use That HW" as a short sequence of instructions
> (as short as 1 instruction). The exposed model has to work as well for the
> typical model of trapping into the OS to ask for some service, and sending
> a message several hops down the chip-to-chip interconnect and activate a
> thread (over there) asking it to perform some service. SW should see al little
> difference between these scenarios as possible (hopefully none).
> <
> For example::
> <
> a) I did not start with an assumption that interrupts were targeted at CPUs
> (1950 model) but that interrupts are targeted at threads (ISRs:: 2020 model).
> b) I already had the notion that HW performs context switches
> {the saving and restoring of state} ((But note HW does not mean CPU or core))
> c) the components of a system are INHERENTLY parallel and concurrent
> 1) there are more than 1 cores
> 2) there may be more than one memory + DRAM controller
> 3) there may be more than one southbridge
> 4) there may be more than one chip-to-chip interconnect
> d) the components of a system are blocks of logic {thousands to millions of gates}
> 1) a core contains several to many function units that perform arithmetic, memory
> ....access, and control flow. These are well understood units-of-work.
> 2) there is no reason one could NOT put a function unit in the memory controller
> ....should we have a well defined set of units-of-work (effectively instructions)
> ....for that function unit.
> 3) once positioned, one needs a means for SW to access the HW functionality.
> e) once accessible, these function units can remove work from the OS+HV
> ....in ways that enhance system performance {similar to making faster calculating
> functions units of a core.}
> f) one has to draw the line with a good understanding of the worker (typical threads)
> and of the controller (OS, HV, ISR, I/O completion,...) in order that the new model
> can be programmed under systems architected "Oh so long ago" {Unix, Linux}
> <
> ------------------------------------------------------------------------------------------------------------------------------------
> <
> I know how to build HW queues that can perform typical queueing activities at
> least 1 per cycle {Insert on tail, insert on front, remove from front, Move to wait state,
> Move from wait state back to run state}.
> I know how to build bus interconnect (fabrics) that can sent interrupts at these kinds of
> rates
> So, for example, my HW understands the concept of obeying affinity, but the OS/HV
> is in charge of manipulating affinity.
<
<some control key published this before I was actually done writing>
<
Take, for example, the arrival of an interrupt on a typical modern machine::
<
a) interrupt arrives at APIC, APIC forwards it to core[k]
b) core[k] takes interrupt, several hundred cycles later, core[k] has arrived at ISR
c) ISR saves a bunch of state {which is going to have low cache hit rates}
.....and takes another hundred-odd cycles.
d) ISR has to figure out why it received control and reads a few device control
.....registers {another hundred to thousand cycles}
e) based on the data received, perform some device cleanup
f) based on the data received, schedule some I/O cleanup
g) restore a bunch of state {generally with good cache hit rates}
h) syscall to OS thread scheduler
{ h.1) OS sees that I/O cleanup handler has been affinitized to a different CPU
h.2) OS chooses one core from the I/OCH affinity list and IPIs that core.
h.3) other core takes interrupt {several hundred cycles later}
h.4) other core figures out that IOCH[l] needs to run
h.5) other core morphs itself from IPI ISR into IOCH
} i) generally arriving at I/O cleanup handler
j) I/OCH saves a bunch of state {which is going to have low cache hit rates}
.....and takes another hundred-odd cycles.
k) schedule some thread from a wait state to a run state.
l) restore a bunch of state {generally with good cache hit rates}
m) syscall to OS thread scheduler
n) .....
<
<
So, my currently working model gets rid of the saving and restoring and
the ineffective hit rates in the cache for all the above stuff. There are no
calls to the OS scheduler (this happens in HW), the placement of a task
to be performed on the OS queues is 1 instruction and takes 1 (pipelined)
cycle,
<
AND
<
the actual work of performing a complete context switch (as seen at the
core) is 7 cycles. All the cycles leading up to the start remain devoted to
the currently executing thread (with its hot cache states). After the context
switch the <ISR> thread is running in a SW environment where compiled
code runs fine {no privilege, no supervisor rights needed, no thread can run
twice} except that one would expect the caches (and TLBs) to be cold.
<
Going from doing real work to doing real work takes 7 cycles, in a
completely different VM, in a completely different OS, on a potentially
completely different core.
> <
> > --
> > - Stephen Fuld
> > (e-mail address disguised to prevent spam)

Re: Hardware assisted message passing

<nBasJ.78885$JZ3.18060@fx05.iad>


https://www.novabbs.com/devel/article-flat.php?id=22240&group=comp.arch#22240

Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx05.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Hardware assisted message passing
References: <so2u87$tbg$1@dont-email.me> <e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com> <u1xrJ.95050$np6.11511@fx46.iad> <j18kaiFibh7U1@mid.individual.net>
In-Reply-To: <j18kaiFibh7U1@mid.individual.net>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 118
Message-ID: <nBasJ.78885$JZ3.18060@fx05.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Wed, 08 Dec 2021 22:46:43 UTC
Date: Wed, 08 Dec 2021 17:43:35 -0500
X-Received-Bytes: 5912
 by: EricP - Wed, 8 Dec 2021 22:43 UTC

Niklas Holsti wrote:
> Just some notes on the discussion of Ada rendez-vous, a bit off-topic:
>
> On 2021-12-07 1:29, EricP wrote:
>> MitchAlsup wrote:
> [...]
>>> On Monday, November 29, 2021 at 8:14:02 AM UTC-8, Stephen Fuld wrote:
>>>> When I first encountered the Elxsi 6400 system in the mid 1980s, I
>>>> saw the “unifying” power of hardware (well, in their case microcode)
>>>> assisted message passing. [...]
>>>> So, the basic use is to send a “message” of typically up to several
>>>> hundred bytes (though often much less) from one process to another
>>>> another process that is willing to accept it. An extension is to
>>>> allow the hardware itself to send or receiver a message. So on to
>>>> other uses.
>>>
>>> My intended use was to explicitly make interrupts a first class
>>> citizen of the architecture. An interrupt causes a waiting thread
>>> to receive control at a specified priority and within a specified
>>> set of CPUs (affinity). [...]
>>>
>>> An interrupt is associated with a thread, that thread has a priority,
>>> Thread header (PSW, Root Pointer, State) and a register file.
>>>
>>> The receipt of an interrupt either gets queued onto that thread (if
>>> it is already running) or activates that thread if is is not
>>> running. The queueing guarantees that the thread is never active
>>> more than once, and that no interrupts are ever lost.
> [...]
>
>>> I worked in ADA rendezvous into the system, too. They have almost
>>> all of the properties of the std interrupts, but add in the address
>>> space join/release, and the reactivation of caller at the end of
>>> rendezvous.
>>
>> I have a certain amount of apprehension about this approach.
>>
>> Note that Ada rendezvous are inherently _synchronous_ client-server only,
>> wherein the client waits until the server finishes the rendezvous.
>
>
> Yes.
>
>
>> There are no events, mutexes, semaphores, etc in that model. It does
>> have timed entry and accept statements, but that is synchronous
>> polling.
>
> Basically it is a time-out on the wait for the synchronous
> communication. But it can be used for polling, true.

A non-zero delay is just a long poll.

>> You can build things like events and mutexes out of Ada tasks and
>> rendezvous but it is ridiculously expensive because they require more
>> tasks each with its header and stack, etc to create "mutex servers".
>
>
> There were some Ada compilers that detected such simple "mutex" or
> "monitor" tasks and optimized out the thread and stack aspects. But
> current Ada has instead added data-mediated communication, using a
> feature called "protected objects", similar to the "monitor" concept in
> some Pascal-derived multi-threading languages and also similar to
> "synchronized" in Java. With this feature one can build queues and
> similar shared data structures without the extra task-switching and
> stacking overhead.

The Ada run-time library builds its rendezvous mechanism out of these
more primitive operations: mutexes guarding linked lists, event flags
and wait-for operations, asynchronous timers with callbacks, and various
instructions like atomic compare-and-swap and atomic fetch-add.

I prefer to use the primitives directly and build exactly what I want.
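
For instance, a minimal one-shot event flag built directly on C11
atomics (the compare-and-swap / fetch-add level of primitive mentioned
above) - my own spin-waiting sketch, not anything from an Ada run-time:

/* Minimal one-shot event flag built directly on C11 atomics, as an
   example of using such primitives without a tasking layer on top.
   Spin-waits for brevity; real code would park the waiter instead. */

#include <stdatomic.h>

typedef struct {
    atomic_int signaled;
} Event;

static void event_init(Event *e)
{
    atomic_init(&e->signaled, 0);
}

static void event_set(Event *e)   /* release: publish prior writes */
{
    atomic_store_explicit(&e->signaled, 1, memory_order_release);
}

static void event_wait(Event *e)  /* acquire: see the setter's writes */
{
    while (!atomic_load_explicit(&e->signaled, memory_order_acquire))
        ;  /* spin; a real implementation would yield or block here */
}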

>> The first thing people found out about synchronous RPC is that they
>> really
>> didn't want synchronous RPC, they wanted asynchronous RPC so they could
>> submit multiple concurrent requests at once then wait for all to be done.
>
> Ada has an standardized but optional RPC mechanism that supports both
> synchronous and asynchronous RPC. However, it has not been widely used
> -- programmers prefer to use mechanisms that work inter-language, such
> as sockets -- and I believe is not supported now by any Ada compiler.

I was using RPC as an example of the limitations of synchronous
client server. Its good for some things, not for others.

The point being that you can always turn an asynchronous interface
into a synchronous one but not the other way around.

The DEC Ada85 compiler has task interfaces for Asynchronous System Traps
through an implementation pragma that connects an AST to an entry/accept,
such that an IO completion with an AST notification callback could
wake up an Ada task as though it had received an Accept call.
So it can be forced to work, but you are adding all kinds of tasking
overhead to what could be a simple callback routine.

task Handler is
  entry Receive_AST (astParm : Integer);
  pragma AST_ENTRY (Receive_AST);
end Handler;

task body Handler is
  accept Receive_AST (astParm : Integer) do
  ...
  end Receive_AST;
end Handler;

  ...
  QIO (..., ASTADDR => Receive_AST, ASTPRM => 33);

What is lost here is that it takes an AST which acts like a thread
interrupt and turns it into something that is delivered by polling.
These are not interchangeable concepts - you can't do a proper SIGINT
or SIGKILL that is only delivered by polling.

Re: Hardware assisted message passing

<sorjbl$50r$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=22241&group=comp.arch#22241

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: bak...@iitbombay.org (Bakul Shah)
Newsgroups: comp.arch
Subject: Re: Hardware assisted message passing
Date: Wed, 8 Dec 2021 16:41:25 -0800
Organization: A noiseless patient Spider
Lines: 96
Message-ID: <sorjbl$50r$1@dont-email.me>
References: <so2u87$tbg$1@dont-email.me>
<e6d5d393-a57e-4e5d-9632-fa2fda932dc0n@googlegroups.com>
<somlvi$8i1$1@dont-email.me>
<d9d12563-6259-42a1-965a-37428dfd4ea3n@googlegroups.com>
<soota5$js6$1@dont-email.me> <sop4c1$lqt$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 9 Dec 2021 00:41:25 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="bf2e0879a8a0487bae0fb2387ee1ab94";
logging-data="5147"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Qqs/JETnMaOoWLiLZ9ih9"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.16; rv:68.0)
Gecko/20100101 Firefox/68.0 SeaMonkey/2.53.10
Cancel-Lock: sha1:Q7BrHlnIcfl0foE8ApSjYYkLt6I=
In-Reply-To: <sop4c1$lqt$1@dont-email.me>
 by: Bakul Shah - Thu, 9 Dec 2021 00:41 UTC

On 12/7/21 6:13 PM, Ivan Godard wrote:
> On 12/7/2021 4:12 PM, Bakul Shah wrote:
>> On 12/7/21 10:20 AM, MitchAlsup wrote:
>>> On Monday, December 6, 2021 at 9:55:33 PM UTC-6, Bakul Shah wrote:
>>>> On 12/5/21 1:09 PM, MitchAlsup wrote:
>>>>>>
>>>>>> So, the basic use is to send a “message” of typically up to several
>>>>>> hundred bytes (though often much less) from one process to another
>>>>>> another process that is willing to accept it. An extension is to
>>>>>> allow
>>>>>> the hardware itself to send or receiver a message. So on to other
>>>>>> uses.
>>>>> <
>>>>> My intended use was to explicitly make interrupts a first class
>>>>> citizen
>>>>> of the architecture. An interrupt causes a waiting thread to
>>>>> receive control
>>>>> at a specified priority and within a specified set of CPUs (affinity).
>>> <
>>>> As you may know, typically this is not how (currently)
>>>> interrupt handling works. There is no thread *waiting* to
>>>> receive control.
>>> <
>>> What do you all the pointer to code in the interrupt vector table ?
>>> Is it not a waiting thread ?????
>>
>> No. I think of it as a procedure call. AFAIK all the processors are
>> still following the model Dijkstra & IBM independently came up with.
>>  From Dijkstra's "My Recollections of Operating System Design":
>> https://www.cs.utexas.edu/users/EWD/transcriptions/EWD13xx/EWD1303.html
>>
>>    To start with I tried to convince myself that it was possible to
>>    save and restore enough of the machine state so that, after the
>>    servicing of the interrupt, under all circumstances the interrupted
>>    computation could be resumed correctly.
>>
>>    ... I designed a protocol for saving and restoring that could be
>>    shown always to work properly. For all three of us it had been a
>>    very instructive experience. It was clearly an experience that
>>    others had missed because for years I would find flaws in designs
>>    of interrupt hardware.
>>
>> The interrupt vector is more like a function table - what you call
>> depends on the entry associated with an interrupt source.
>>
>> The Amd29k had one of the simplest implementations. On interrupt
>> or trap, it copied just *three* register to backup registers and
>> disabled all interrupts. A few instructions were needed to do
>> anything useful. The "iret" instruction restored these registers
>> and the original code continued executing.
>>
>> Something like x86 has a much more complicated system but this
>> basic principle remains.
>>
>> You can do a complete context switch on interrupt but then you
>> are saving a lot more state than necessary. Most of the time the
>> same code is going to continue executing after iret so why bother.
>> IMHO, it is better to think of an interrupt as an external trap
>> than as a signal for a thread switch.
>>
>> The other thing to think about is that the OS scheduler would
>> want to be in control of all thread switching so that it can
>> keep track of who is using what resources and follow some
>> scheduling policies to provide an overall control for a smooth
>> operation.
>
> You can have it both ways. Mill dispatch is to a function pointer as you
> describe, but that can point at a portal so the call migrates to a
> different context as Mitch describes. Of course, you stay in the same
> stack, which Mitch doesn't do, but Mill can interleave multiple contexts
> on a single stack.

I think of a Mill portal call as continuing in the *same* thread
but in a different turf (protection domain). This is analogous to
a system call on current OSes that continues in the same thread
but in a higher privilege domain (which makes it inherently less
secure - but that is a different discussion thread!). I do think
Mill got this exactly right. At least conceptually it is the
"same stack" but different segments are in different turfs.

> Your concern for the cost of context switch is a bit outdated. Both M66
> and Mill can do a switch in the time of a legacy call, though using
> different methods, and there are other ways available too. The critical
> thing is to design context switch as a first class operation from the
> beginning.

Agree in principle but this is somewhat independent of how
interrupts can be handled!

Does Mill allow a context switch from thread A in turf T
to an unrelated thread B in turf U? I thought it was more
of a Manhattan geometry: you switch turfs in the same thread
or switch threads in the same turf.

I should post a separate article on my mental model of ideal
interrupt handling....

Re: Hardware assisted message passing

From: niklas.h...@tidorum.invalid (Niklas Holsti)
Newsgroups: comp.arch
Message-ID: <j1dtflFiurcU1@mid.individual.net>
 by: Niklas Holsti - Thu, 9 Dec 2021 08:36 UTC

On 2021-12-09 0:43, EricP wrote:
> Niklas Holsti wrote:
>> Just some notes on the discussion of Ada rendez-vous, a bit off-topic:

[snip]

> The Ada run-time library builds its rendezvous mechanism out of these
>> more primitive operations, mutexes guarding linked lists,
> event flags and wait-for operations, asynchronous timers with callbacks,
> various instructions like atomic compare-and-swap, atomic fetch-add.
>
> I prefer to use the primitives directly and build exactly what I want.

The Ada RTS provides a portable interface to such things, which I find
is generally preferable to using the system-specific primitives directly.

> The DEC Ada85 compiler has task interfaces for Asynchronous System Traps
> through an implementation pragma that connected an AST to an entry/accept
> such that an IO completion with an AST notification callback could
> wake up an Ada task as though it received an Accept call.
> So it can be forced to work, but you are adding all kinds of tasking
> overhead to what could be a simple callback routine.

But you also get the mutex protections for task-private data
automatically, instead of an unpredictable callback at any time.

If that compiler were updated for current Ada, it would probably
offer to connect the AST to an operation of a protected object instead
of a task entry (as is now done for interrupts). Calling an operation of
a protected object only implies a mutex (or priority ceiling lift) and
no task switch.

Also, some Ada compilers implement accept statements so that the body of
the accept (the rendez-vous section in the accepting task's source code)
is executed by the calling task, without a full task switch.
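For the non-Ada readers, here is a minimal C/pthreads sketch of that
distinction (the names are mine and purely illustrative, not any
compiler's runtime): the protected-object style runs the handler body in
the notifying context under a lock, while the task-entry style merely
hands the parameter to another thread, which then costs a wakeup and a
switch.

#include <pthread.h>

/* "Protected object" style: mutual exclusion only, no scheduler involved.
   The handler body runs in whatever thread delivers the notification. */
typedef struct {
    pthread_mutex_t lock;
    int             count;          /* data guarded by the lock */
} protected_obj;

void po_signal(protected_obj *po, int ast_parm)
{
    pthread_mutex_lock(&po->lock);
    po->count += ast_parm;          /* the "handler body" */
    pthread_mutex_unlock(&po->lock);
}

/* "Task entry" style: park the parameter and wake a separate handler
   thread that is blocked in the equivalent of an accept statement. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    int             pending, parm;
} entry_queue;

void entry_call(entry_queue *q, int ast_parm)
{
    pthread_mutex_lock(&q->lock);
    q->parm = ast_parm;
    q->pending = 1;
    pthread_cond_signal(&q->ready); /* wakes the accepting thread */
    pthread_mutex_unlock(&q->lock);
}

The first form is what the protected-object binding buys you: the
completion is handled with nothing heavier than a lock.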

> task Handler is
>   entry Receive_AST (astParm : Integer);
>   pragma AST_ENTRY (Receive_AST);
> end Handler;
>
> task body Handler is
>   accept Receive_AST (astParm : Integer) do
>   ...
>   end Receive_AST;
> end Handler;
>
>   ...
>   QIO (..., ASTADDR => Receive_AST, ASTPRM => 33);
>
> What is lost here is that it takes an AST which acts like a thread
> interrupt and turns it into something that is delivered by polling.

A task waiting at an accept is not "polling", IMO, even if the accept
statement has a time-out so that the task can periodically or
sporadically do something else between waits.

> These are not interchangeable concepts - you can't do a proper SIGINT
> or SIGKILL that is only delivered by polling.

What is "proper" is subjective and depends on the requirements for
whatever you are implementing -- requirements on reaction time, clean-up
and last-wish actions.

Re: Hardware assisted message passing

From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Message-ID: <sosibd$s7v$1@dont-email.me>
 by: Ivan Godard - Thu, 9 Dec 2021 09:30 UTC

On 12/8/2021 4:41 PM, Bakul Shah wrote:
> On 12/7/21 6:13 PM, Ivan Godard wrote:
>> On 12/7/2021 4:12 PM, Bakul Shah wrote:
>>> On 12/7/21 10:20 AM, MitchAlsup wrote:
>>>> On Monday, December 6, 2021 at 9:55:33 PM UTC-6, Bakul Shah wrote:
>>>>> On 12/5/21 1:09 PM, MitchAlsup wrote:
>>>>>>>
>>>>>>> So, the basic use is to send a “message” of typically up to several
>>>>>>> hundred bytes (though often much less) from one process to another
>>>>>>> process that is willing to accept it. An extension is to allow
>>>>>>> the hardware itself to send or receive a message. So on to other
>>>>>>> uses.
>>>>>> <
>>>>>> My intended use was to explicitly make interrupts a first class
>>>>>> citizen
>>>>>> of the architecture. An interrupt causes a waiting thread to
>>>>>> receive control
>>>>>> at a specified priority and within a specified set of CPUs
>>>>>> (affinity).
>>>> <
>>>>> As you may know, typically this is not how (currently)
>>>>> interrupt handling works. There is no thread *waiting* to
>>>>> receive control.
>>>> <
>>>> What do you call the pointer to code in the interrupt vector table ?
>>>> Is it not a waiting thread ?????
>>>
>>> No. I think of it as a procedure call. AFAIK all the processors are
>>> still following the model Dijkstra & IBM independently came up with.
>>>  From Dijkstra's "My Recollections of Operating System Design":
>>> https://www.cs.utexas.edu/users/EWD/transcriptions/EWD13xx/EWD1303.html
>>>
>>>    To start with I tried to convince myself that it was possible to
>>>    save and restore enough of the machine state so that, after the
>>>    servicing of the interrupt, under all circumstances the interrupted
>>>    computation could be resumed correctly.
>>>
>>>    ... I designed a protocol for saving and restoring that could be
>>>    shown always to work properly. For all three of us it had been a
>>>    very instructive experience. It was clearly an experience that
>>>    others had missed because for years I would find flaws in designs
>>>    of interrupt hardware.
>>>
>>> The interrupt vector is more like a function table - what you call
>>> depends on the entry associated with an interrupt source.
>>>
>>> The Amd29k had one of the simplest implementations. On interrupt
>>> or trap, it copied just *three* registers to backup registers and
>>> disabled all interrupts. A few instructions were needed to do
>>> anything useful. The "iret" instruction restored these registers
>>> and the original code continued executing.
>>>
>>> Something like x86 has a much more complicated system but this
>>> basic principle remains.
>>>
>>> You can do a complete context switch on interrupt but then you
>>> are saving a lot more state than necessary. Most of the time the
>>> same code is going to continue executing after iret so why bother.
>>> IMHO, it is better to think of an interrupt as an external trap
>>> than as a signal for a thread switch.
>>>
>>> The other thing to think about is that the OS scheduler would
>>> want to be in control of all thread switching so that it can
>>> keep track of who is using what resources and follow some
>>> scheduling policies to provide an overall control for a smooth
>>> operation.
>>
>> You can have it both ways. Mill dispatch is to a function pointer as
>> you describe, but that can point at a portal so the call migrates to a
>> different context as Mitch describes. Of course, you stay in the same
>> stack, which Mitch doesn't do, but Mill can interleave multiple
>> contexts on a single stack.
>
> I think of a Mill portal call as continuing in the *same* thread
> but in a different turf (protection domain). This is analogous to
> a system call on current OSes that continues in the same thread
> but in a higher privilege domain (which makes it inherently less
> secure - but that is a different discussion thread!). I do think
> Mill got this exactly right. At least conceptually it is the
> "same stack" but different segments are in different turfs.

It's a matter of definition - just what is a "thread"? If you interrupt
a running program, are you in the same thread even though you are doing
work completely unrelated to what was being done before? Or are you in a
completely different thread that has hijacked a random stack?

Remember TSRs? Are they pending threads, or function libraries?

>> Your concern for the cost of context switch is a bit outdated. Both
>> M66 and Mill can do a switch in the time of a legacy call, though
>> using different methods, and there are other ways available too. The
>> critical thing is to design context switch as a first class operation
>> from the beginning.
>
> Agree in principle but this is somewhat independent of how
> interrupts can be handled!
>
> Does Mill allow a context switch from thread A in turf T
> to an unrelated thread B in turf U? I thought it was more
> of a Manhattan geometry: you switch turfs in the same thread
> or switch threads in the same turf.

The former; that's what portal calls do. But that doesn't switch stack
or core, just context. Or not, depending on what you mean by a thread.

If a thread is defined as a nesting call sequence, then a portal call
does not change thread - the return from a portal call puts you in the
caller. If a thread is an addressing (scope) sequence (downstack links
for example, for any language with proper static nesting of functions)
then a portal call does change thread, although it is up to the language
implementation to ensure the links are right - the Mill guarantees
correct secure visibility, but doesn't make you look :-)

> I should post a separate article on my mental model of ideal
> interrupt handling....

You should. Or at least post it to the Mill dev list :-)

Re: Hardware assisted message passing

From: ThatWoul...@thevillage.com (EricP)
Newsgroups: comp.arch
Message-ID: <66rsJ.178509$IW4.13731@fx48.iad>
 by: EricP - Thu, 9 Dec 2021 17:28 UTC

Niklas Holsti wrote:
> On 2021-12-09 0:43, EricP wrote:
>
>> task Handler is
>>   entry Receive_AST (astParm : Integer);
>>   pragma AST_ENTRY (Receive_AST);
>> end Handler;
>>
>> task body Handler is
>>   accept Receive_AST (astParm : Integer) do
>>   ...
>>   end Receive_AST;
>> end Handler;
>>
>>   ...
>>   QIO (..., ASTADDR => Receive_AST, ASTPRM => 33);
>>
>> What is lost here is that it takes an AST which acts like a thread
>> interrupt and turns it into something that is delivered by polling.
>
> A task waiting at an accept is not "polling", IMO, even if the accept
> statement has a time-out so that the task can periodically or
> sporadically do something else in between waiting.

If either the entry call or the accept statement has a time-out, I
consider that a polling interface: it checks for a condition and then
goes off to do something else, which is pretty much the definition of
busy-waiting.

Granted, the example code above does not, so the task would wait at the
accept until an AST arrived. However, a more realistic example using
async IO would do other things while the IO is outstanding, so in
practice the accept above would have a time-out and be polled.

>> These are not interchangeable concepts - you can't do a proper SIGINT
>> or SIGKILL that is only delivered by polling.
>
> What is "proper" is subjective and depends on the requirements for
> whatever you are implementing -- requirements on reaction time, clean-up
> and last-wish actions.

Those signals are expected to be delivered to the target thread using
interrupt semantics. If they are not, the behavioral difference is noticeable.

For example, if a terminal interface wants a CTRL-C handler,
and the interface for CTRL-C is through accept statements,
then this would have to be an accept with delay=0, which the
terminal handler code would have to poll
(otherwise the interface would appear to hang).

That is not the same behavior as a signal delivered to a thread via
interrupt semantics because, for example, it won't break into an
infinite loop as an interrupt would.
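A small POSIX C sketch of that behavioral difference (illustrative only,
obviously not Ada): a handler installed with interrupt semantics can
break into a loop that never polls anything, which no delay=0 accept
can do.

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static sigjmp_buf break_out;

/* Interrupt semantics: the kernel runs this in the target thread even
   while it is stuck in the loop below, and it can transfer control out. */
static void on_sigint(int signo)
{
    (void)signo;
    siglongjmp(break_out, 1);
}

int main(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);

    if (sigsetjmp(break_out, 1) == 0) {
        for (;;) {
            /* an "infinite loop" that never checks any flag or accept */
        }
    }
    puts("CTRL-C broke in; a polled delay=0 accept never would have.");
    return 0;
}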

Re: Hardware assisted message passing

From: ThatWoul...@thevillage.com (EricP)
Newsgroups: comp.arch
Message-ID: <76rsJ.178510$IW4.98289@fx48.iad>
 by: EricP - Thu, 9 Dec 2021 17:33 UTC

Stephen Fuld wrote:
> On 12/6/2021 4:37 PM, MitchAlsup wrote:
>> On Monday, December 6, 2021 at 5:31:04 PM UTC-6, EricP wrote:
>>> MitchAlsup wrote:
>
> snip
>
>>> The first thing people found out about synchronous RPC is that they
>>> really
>>> didn't want synchronous RPC, they wanted asynchronous RPC so they could
>>> submit multiple concurrent requests at once then wait for all to be
>>> done.
>
> This gets into the area of what features a message passing system should
> support. One of the questions is asynchronous or not. This is related
> to what support for multicasting is appropriate. Just to be sure, by
> asynchronous, I mean the ability for an executing thread to send a
> message without necessarily giving up control, so the next instruction
> can execute immediately. This is related to multi-casting in that a
> thread can use it to send messages to multiple destinations as a sort of
> "poor mans" multicasting.

Let's walk through how an async interface might work with rendezvous.

Normally a synchronous client-server interface is going to stall a
new client in a queue until all prior clients have at least started
their rendezvous.

If you don't want the possibility of waiting for service in a queue,
then with this style of tasking one must create a task that acts as a
queue server between the client(s) and server(s), because in this
environment there are no mutexes or events, just tasks which can
rendezvous. The sole purpose of this queue server task is to accept
work packets from client tasks and pass them to server tasks.

Completion notification messages will be sent to each client when
a server has finished its work packet. But we don't want a server
to stall waiting for a client to accept its notify msg, so each
client has a notify queue task whose job is just to pass notify
msgs back to clients.
Note that the above requires passing a handle to the client's notify
queue task in the work packet, so the server knows which reply queue
task to send notifies to; this implies some form of reference count
tracking (the OS IO subsystem normally takes care of all of this).

Delivery of completion notifications to clients is synchronous
and clients must check for any notify replies from time to time.
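A rough C/pthreads sketch of the shape being described (the names and
data structures are mine, not an actual OS interface): a work queue
stands in for the queue server task, each client owns a notify queue,
and neither side ever blocks on the other's rendezvous.

#include <pthread.h>
#include <stdlib.h>

/* A tiny thread-safe FIFO, used both for the "queue server" between
   clients and servers and for each client's notify queue. */
typedef struct node { struct node *next; void *item; } node_t;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    node_t *head, *tail;
} queue_t;

static void q_push(queue_t *q, void *item)
{
    node_t *n = malloc(sizeof *n);
    n->item = item;
    n->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* block=0 gives the "check for notify replies from time to time" side. */
static void *q_pop(queue_t *q, int block)
{
    void *item = NULL;
    pthread_mutex_lock(&q->lock);
    while (block && q->head == NULL)
        pthread_cond_wait(&q->nonempty, &q->lock);
    if (q->head) {
        node_t *n = q->head;
        q->head = n->next;
        if (q->head == NULL) q->tail = NULL;
        item = n->item;
        free(n);
    }
    pthread_mutex_unlock(&q->lock);
    return item;
}

/* The work packet carries a handle to the client's notify queue so the
   server knows where the completion notification goes. */
typedef struct {
    void   (*work)(void *arg);
    void    *arg;
    queue_t *notify;                 /* the client's reply queue */
} work_packet_t;

static queue_t work_queue = { PTHREAD_MUTEX_INITIALIZER,
                              PTHREAD_COND_INITIALIZER, NULL, NULL };

/* Server side: take work, do it, post the packet back as the notify. */
static void *server_thread(void *unused)
{
    (void)unused;
    for (;;) {
        work_packet_t *wp = q_pop(&work_queue, 1);
        wp->work(wp->arg);
        q_push(wp->notify, wp);      /* never stalls on the client */
    }
    return NULL;
}

/* Client side: submit without giving up control, then drain completions
   whenever convenient with q_pop(my_notify, 0). */
static void client_submit(queue_t *my_notify, void (*fn)(void *), void *arg)
{
    work_packet_t *wp = malloc(sizeof *wp);
    wp->work = fn;
    wp->arg = arg;
    wp->notify = my_notify;
    q_push(&work_queue, wp);
}

Here the queue-server "task" has degenerated into a mutex-guarded list,
which is exactly the kind of primitive the rendezvous-only environment
lacks.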

One important thing the above does not deal with is _cancellation_.
All outstanding async IO and usually any other async operations require
the ability to be asynchronously canceled so they don't hang any
dependent resources waiting for operations that may never complete.
This can be done with a pointer to a subroutine and some
atomic instructions.
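One possible shape for that, sketched in C11 (my names; real OS code
surely differs): each outstanding operation holds an atomic pointer to
its cancel routine, and whoever swaps it to NULL first - completer or
canceller - is the one that settles the operation.

#include <stdatomic.h>
#include <stddef.h>

struct async_op;
typedef void (*cancel_fn)(struct async_op *);

/* One outstanding async operation. The atomically-swapped cancel
   pointer is the whole protocol: NULL means "already settled". */
typedef struct async_op {
    _Atomic cancel_fn cancel;
    void *context;               /* whatever the operation needs */
} async_op_t;

/* Completion path: returns 1 if this completion won the race and the
   result should be delivered, 0 if a cancel got there first. */
static int op_complete(async_op_t *op)
{
    return atomic_exchange(&op->cancel, (cancel_fn)NULL) != NULL;
}

/* Cancellation path (e.g. a dependent resource being torn down):
   runs the cancel routine at most once, and never concurrently with
   normal completion delivery. */
static void op_cancel(async_op_t *op)
{
    cancel_fn fn = atomic_exchange(&op->cancel, (cancel_fn)NULL);
    if (fn)
        fn(op);
}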

None of the above, however, provides a way to deliver asynchronous
interrupt notifications. There is no way, for example, to deliver a
CTRL-C interrupt from a serial driver server back to a terminal
interface client. Doing that would require the ability to post a
Thread Interrupt Procedure to a thread, and/or to raise an exception
in another thread.

Re: Hardware assisted message passing

From: ThatWoul...@thevillage.com (EricP)
Newsgroups: comp.arch
Message-ID: <ZvrsJ.48263$L_2.28684@fx04.iad>
 by: EricP - Thu, 9 Dec 2021 18:00 UTC

MitchAlsup wrote:
> On Monday, December 6, 2021 at 5:31:04 PM UTC-6, EricP wrote:
>> MitchAlsup wrote:
>>>> 3. Similarly, with suitable hardware, you no longer need a separate
>>>> mechanism for inter-processor interrupts.
>>> <
>>> Exactly--and a useful byproduct. But this unification needs to be expressible
>>> at the instruction level. That is, the SATA interrupt handler needs a way to
>>> schedule the user-I/O-completion handler efficiently (1 instruction) and that
>>> thread needs an efficient way to take the waiting thread off the wait queue
>>> and drop it back on the run queue (also 1-instruction). There is no particular
>>> reason that any CPU context switches are processed by running instructions!
>>> <
>>> Indeed, having the entire thread state arrive as the interrupt message
>>> gets rid of lots of OS issues (cue EricP)
> <
>> Is this the same as the Thread Interrupt Procedures (TIP's)
>> I was talking about earlier?
> <
> Possibly, but the "gist" of the model is to remove context switching (instructions,
> visible control registers, instruction ordering) from the CPU and put decision
> making elsewhere in the system (1 per chip)
>> TIPs are handy for working around the limitations of synchronous
>> interfaces and notifying threads of async completions.
> <
> Sounds more like I unified TIPs into IPIs. Any CPU anywhere in the system can
> send an interrupt to any virtual machine running any OS running any set of
> threads in a single instruction.

Ok, I think I got ya - device interrupts are TIPs sent from devices,
IPIs are TIPs sent from other (or same!) threads.

Note that the OS TIPs I'm thinking of are packets that are queued
to a thread; each carries a pointer to a routine to call and
3 void* args to pass to that routine.
There can also be a function to call if a TIP is canceled,
passing it the same args.
The routines are responsible for deallocating the packet if necessary
(only the TIP creator knows how the packet was originally allocated).

A HW device would be passed a TIP packet pointer containing the above
when an IO is started. When the device interrupts, it passes back the
packet pointer, which is eventually queued to a device handler thread
and delivered when possible.
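To make that concrete, a hedged C sketch of what such a TIP packet and
its delivery path might look like (field and function names are mine;
this is a guess at the structure described, not VMS or any real OS):

#include <stddef.h>
#include <pthread.h>

/* A TIP packet: a routine to run in the target thread, 3 void* args,
   an optional cancel routine, and a queue link. Whichever routine runs
   is responsible for freeing the packet, since only its creator knows
   how it was allocated. */
typedef struct tip_packet {
    struct tip_packet *next;
    void (*deliver)(void *a1, void *a2, void *a3);
    void (*cancel)(void *a1, void *a2, void *a3);   /* may be NULL */
    void *arg1, *arg2, *arg3;
} tip_packet_t;

/* Per-thread TIP queue, drained at points where delivery is possible. */
typedef struct {
    pthread_mutex_t lock;
    tip_packet_t   *head;
} tip_queue_t;

/* Called, e.g., by the interrupt service code: the device hands back
   the packet pointer it was given at IO start, and it is queued to the
   device handler thread. (LIFO for brevity; a real queue keeps FIFO.) */
void tip_post(tip_queue_t *q, tip_packet_t *tip)
{
    pthread_mutex_lock(&q->lock);
    tip->next = q->head;
    q->head = tip;
    pthread_mutex_unlock(&q->lock);
}

/* Called by the target thread when it is able to take delivery. */
void tip_deliver_pending(tip_queue_t *q)
{
    pthread_mutex_lock(&q->lock);
    tip_packet_t *list = q->head;
    q->head = NULL;
    pthread_mutex_unlock(&q->lock);

    while (list) {
        tip_packet_t *tip = list;
        list = tip->next;
        tip->deliver(tip->arg1, tip->arg2, tip->arg3); /* frees tip if needed */
    }
}

The point is that the same packet pointer travels from the IO initiator,
to the device, back through the interrupt, and finally to the thread
that delivers it.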
