comp.arch: A BID for conceptually unified resource management ☺

Subject                                                       Author
* A BID for conceptually unified resource management          Paul A. Clayton
`* Re: A BID for conceptually unified resource management     MitchAlsup
 +* Re: A BID for conceptually unified resource management    JohnG
 |`* Re: A BID for conceptually unified resource management   MitchAlsup
 | `* Re: A BID for conceptually unified resource management  JohnG
 |  `- Re: A BID for conceptually unified resource management MitchAlsup
 `* Re: A BID for conceptually unified resource management    Paul A. Clayton
  +- Re: A BID for conceptually unified resource management   MitchAlsup
  `* Re: A BID for conceptually unified resource management   Stefan Monnier
   `* Re: A BID for conceptually unified resource management  Ivan Godard
    `- Re: A BID for conceptually unified resource management BGB

A BID for conceptually unified resource management ☺

<tbhj1b$2k9a$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26852&group=comp.arch#26852

 by: Paul A. Clayton - Sat, 23 Jul 2022 19:43 UTC

I have previously noted that x86's MONITOR/MWAIT (which sets up a
memory address to be monitored for invalidation/update and waits
until the update occurs or other condition ends the wait) is a
more limited form of the YIELD instruction in MIPS MT-ASE (which
can inform the core that the thread is dead, "invokes the pro-
cessor’s scheduling logic and relinquishes the CPU for any other
threads which ought to execute first according to the implemented
scheduling policy", or sets a bit vector of conditions on which to
wait — the event definition would require implementation-defined
action whereas x86's MONITOR provides an architectural method for
defining the event). I have also noted that MONITOR/MWAIT has
similarities to LL/SC.
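
(For concreteness, a minimal sketch of the MONITOR/MWAIT idiom in C,
using the pmmintrin.h intrinsics. On most x86 implementations these
instructions are privileged, so the pattern below is kernel-only; the
flag variable and the zeroed hint/extension arguments are purely
illustrative.)

#include <pmmintrin.h>  /* _mm_monitor, _mm_mwait (SSE3) */

volatile int wakeup_flag;  /* illustrative monitored location */

void wait_for_update(void)
{
    while (!wakeup_flag) {
        /* Arm the monitor on the line holding the flag. */
        _mm_monitor((const void *)&wakeup_flag, 0, 0);
        /* Re-check after arming, to close the race with a
           store that landed between the test and MONITOR. */
        if (wakeup_flag)
            break;
        /* Sleep until the line is written or another event
           ends the wait; hints/extensions left at zero. */
        _mm_mwait(0, 0);
    }
}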

With AMD adding a timer with MONITORX/MWAITX, more scheduling
functionality is exposed. In theory, a core with SoEMT could use
the timer duration to decide whether to execute in another thread
while waiting. The C-state information (particularly if expanded
in value set) could be used as a priority indicator. This also
points out the possibility of a "request time slice" operation.
This might be a performance hint where an execution phase with
significant temporal locality has a certain worst/bad case (or
expected) execution time. Such a mechanism might also be useful
for temporary interrupt blocking/diversion. This then further
connects the interface to atomic (in memory or with respect to
interrupts) operations.

(MONITORX/MWAITX is similar to the Unix poll system call, which waits on
availability of a file descriptor or until a timer runs out; where
in Unix everything is a file, processor ISAs often approximate
everything is a memory slot [with My 66000 being perhaps a little
unusual, outside of some microcontrollers, in having registers
being 'merely' a cache of memory].)
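
(A sketch of the analogy in C: waiting on a descriptor or a timeout
with poll(2), the software counterpart of MWAITX's
monitored-line-plus-timer wait. The descriptor and the 100 ms timeout
are made-up values.)

#include <poll.h>
#include <stdio.h>

/* Wait until fd is readable or 100 ms elapse -- roughly what
   MONITORX/MWAITX does for a cache line plus a timer. */
int wait_fd_or_timeout(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, 100 /* ms */);
    if (n == 0)
        printf("timer ran out first\n");
    return n;  /* >0: event; 0: timeout; <0: error */
}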

It seems that these aspects and others fit together and might
benefit from a coherent interface. While a SCHEDULE instruction
(which I have proposed earlier) might be able to join many of
these aspects, that name/concept does not seem to fit with the
atomic operation aspect (even though such is schedule-like).
"Schedule" also implies only managing time/execution resources.
"Bid" (where a value function is communicated to hardware,
presumably with a pre-established 'budget') might give a better
sense of diversity of resources (time, power, cache capacity, NoC
bandwidth, etc.) and might have a better implication of
"optimistic concurrency"/transactional memory, but "bid" does not
shout to me that it is the best name for such an interface.

Giving an interface a name is also not sufficient. One of the
problems with the YIELD instruction in MIPS MT-ASE is the amount
of meaning that is implementation dependent with no pre-defined
scheduler/resource manager interfaces. (It is a little like
providing load/store instructions with MMIO but with new device
drivers having to be written for every core implementation, but
worse because thread scheduling is more closely tied to the
processor.)

In principle, store-multiple to an MMIO-like "device" could
provide the same semantic as an instruction. A memory move
instruction with some atomicity guarantees could increase the data
available for an atomic command. Sequentially, non-atomically
storing values could also be used, but such would seem to require
either additional storage (to isolate data per thread) and
resumption capability or atomicity failure detection and retry
capability.
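
(To illustrate, a purely hypothetical sketch in C: nothing below is an
existing interface, the register layout and base address are invented,
and the comments mark exactly the atomicity problem described above.)

#include <stdint.h>

/* Invented register layout for a memory-mapped scheduler
   "device"; not an existing interface. */
struct sched_mmio {
    volatile uint64_t budget;    /* resource budget offered   */
    volatile uint64_t deadline;  /* completion-value deadline */
    volatile uint64_t command;   /* doorbell: commits request */
};

#define SCHED_BASE ((struct sched_mmio *)0x40000000u)  /* made up */

void post_request(uint64_t budget, uint64_t deadline, uint64_t cmd)
{
    /* volatile keeps the compiler from reordering these stores
       with respect to each other; a real device would also need
       the appropriate memory-ordering fences. */
    SCHED_BASE->budget   = budget;
    SCHED_BASE->deadline = deadline;
    /* Without multi-word store atomicity, the doorbell write must
       come last, and the device needs per-thread staging -- or the
       software needs failure detection and retry, as noted above. */
    SCHED_BASE->command  = cmd;
}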

It also seems that a thread should be able to "wire money" to
another thread, granting some of its resource budget to another
thread. An OS's scheduling facility could be a special case of
this where the OS has a large budget that it can delegate. This
also seems to imply a much greater utility to "user-level"
threads. Combined with something like My 66000's Port Holes;
budget as well as access permission could be transferred. (This
then implies yet another area of potential unification, where
permission is another kind of resource that can be delegated.)
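
(A sketch of the "wire money" idea, under the assumption that budgets
are simple credit counters; the types and the transfer primitive are
invented, and C11 atomics stand in for whatever hardware mechanism
would actually police the budgets.)

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct budget { _Atomic uint64_t credits; };

/* "Wire" amt credits from src to dst; fails rather than
   overdrawing the source. */
bool wire(struct budget *src, struct budget *dst, uint64_t amt)
{
    uint64_t have = atomic_load(&src->credits);
    do {
        if (have < amt)
            return false;  /* insufficient budget */
    } while (!atomic_compare_exchange_weak(&src->credits,
                                           &have, have - amt));
    atomic_fetch_add(&dst->credits, amt);  /* credit the grantee */
    return true;
}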

Resource sharing is another factor worth considering. A
multithreaded program might be more single program multiple data
(where instruction caching could benefit more than average from
sharing but data is disjoint) or more data sharing (common
read-mostly data or diversely modified common data that would
benefit from closer sharing). Different programs or concurrently
executing phases might have different core-internal component
utilization that would benefit from SMT or from a fixed
partitioning of core-internal resources. Communication patterns
would also influence where and when an execution stream should be
executed; pipeline-style communication might tolerate the latency
of farther separation (even if energy efficiency would urge closer
placement when convenient) and the bandwidth demand might not be
high (when a large portion of the data is thread local).

"Bids" also implies a value for timeliness of various types. Some
tasks might be worthless if not completed by a certain time, some
might have no extra value if completed early (and even lower value
when considering buffering overhead). In theory, a task which is
guaranteed to fail timeliness might be canceled early, freeing
resources for other tasks. The value of completing a task at a
certain time might also depend on the resource availability for
other tasks; if the work cannot be "cashed in" until a collection
of partially dependent tasks complete, then one task completing
fast may have no extra value if consuming tasks cannot run
(stalled on another data hazard or a structural hazard).
Performance- and power-heterogeneous cores complicate the
structural hazard problem. (There may be other forms of
heterogeneity within the same ISA that complicate resource
allocation. ISA subsets and heterogeneous ISAs would add further
complexity.)
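
(Two invented completion-value shapes, just to make the preceding
concrete: a firm deadline where lateness means zero value, and a
just-in-time shape where finishing early pays a buffering cost. Units
and curves are arbitrary.)

/* Completion value of a task finishing at time t, with deadline D
   and base value V (units arbitrary). */
double value_firm(double t, double D, double V)
{
    return (t <= D) ? V : 0.0;  /* worthless if late */
}

double value_just_in_time(double t, double D, double V, double hold)
{
    if (t > D)
        return 0.0;             /* late: no value */
    return V - hold * (D - t);  /* early finish pays buffering cost */
}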

I suppose this is enough tangents even for one of my posts.

Re: A BID for conceptually unified resource management ☺

<07cdc263-49c5-4991-b4a2-2f690778ef5an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26858&group=comp.arch#26858

 by: MitchAlsup - Sat, 23 Jul 2022 20:31 UTC

On Saturday, July 23, 2022 at 2:43:42 PM UTC-5, Paul A. Clayton wrote:
> I have previously noted that x86's MONITOR/MWAIT (which sets up a
> memory address to be monitored for invalidation/update and waits
> until the update occurs or other condition ends the wait) is a
> more limited form of the YIELD instruction in MIPS MT-ASE (which
> can inform the core that the thread is dead, "invokes the pro-
> cessor’s scheduling logic and relinquishes the CPU for any other
> threads which ought to execute first according to the implemented
> scheduling policy", or sets a bit vector of conditions on which to
> wait — the event definition would require implementation-defined
> action whereas x86's MONITOR provides an architectural method for
> defining the event). I have also noted that MONITOR/MWAIT has
> similarities to LL/SC.
>
> With AMD adding a timer with MONITORX/MWAITX, more scheduling
> functionality is exposed. In theory, a core with SoEMT could use
> the timer duration to decide whether to execute in another thread
> while waiting. The C-state information (particularly if expanded
> in value set) could be used as a priority indicator. This also
> points out the possibility of a "request time slice" operation.
> This might be a performance hint where an execution phase with
> significant temporal locality has a certain worst/bad case (or
> expected) execution time. Such a mechanism might also be useful
> for temporary interrupt blocking/diversion. This then further
> connects the interface to atomic (in memory or with respect to
> interrupts) operations.
>
> (MONITORX/MWAITX is similar to the Unix poll system call, which waits on
> availability of a file descriptor or until a timer runs out; where
> in Unix everything is a file, processor ISAs often approximate
> everything is a memory slot [with My 66000 being perhaps a little
> unusual, outside of some microcontrollers, in having registers
> being 'merely' a cache of memory].)
>
> It seems that these aspects and others fit together and might
> benefit from a coherent interface. While a SCHEDULE instruction
<
What makes you think SCHEDULE should be an instruction or even
performed in a CPU ? I can (and did) think of situations where k<n
CPUs all SCHED some other threads simultaneously. These threads
being SCHED probably have different affinity than the threads doing
SCHED and also different priorities.
<
So what you want at a minimum is the ability to take a thread from
a WAIT state to a RUN state, where it can contend with all other
threads on all other run states for execution resources, without
having to "grab" an ATOMIC lock; and vice versa.
>
Secondarily: you want the properties that threads only run on cores
they have affinity to; only preempt threads with which they have
higher priority; AND that at any given point in time, all of the highest
priority runnable n threads are running. Finally: you want to be able to
do these things without IPI-ing from CPU to CPU, or even necessarily
using any form of ATOMIC locking !! My 66000 arch provides such a
mechanism.
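
(For flavor only, a C11-atomics sketch of the minimal lock-free
WAIT->RUN flip; the state encoding is invented and this is not the
My 66000 mechanism, which has not been published.)

#include <stdatomic.h>
#include <stdbool.h>

enum { T_WAIT, T_RUN };  /* invented encoding */

struct thread { _Atomic int state; };

/* Move t from WAIT to RUN with a single compare-and-swap: no
   lock is held, and a concurrent waker simply loses the race
   harmlessly. */
bool make_runnable(struct thread *t)
{
    int expected = T_WAIT;
    return atomic_compare_exchange_strong(&t->state,
                                          &expected, T_RUN);
}
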
<
> (which I have proposed earlier) might be able to join many of
> these aspects, that name/concept does not seem to fit with the
> atomic operation aspect (even though such is schedule-like).
> "Schedule" also implies only managing time/execution resources.
> "Bid" (where a value function is communicated to hardware,
> presumably with a pre-established 'budget') might give a better
> sense of diversity of resources (time, power, cache capacity, NoC
> bandwidth, etc.) and might have a better implication of
> "optimistic concurrency"/transactional memory, but "bid" does not
> shout to me that it is the best name for such an interface.
>
> Giving an interface a name is also not sufficient. One of the
> problems with the YIELD instruction in MIPS MT-ASE is the amount
> of meaning that is implementation dependent with no pre-defined
> scheduler/resource manager interfaces. (It is a little like
> providing load/store instructions with MMIO but with new device
> drivers having to be written for every core implementation, but
> worse because thread scheduling is more closely tied to the
> processor.)
>
> In principle, store-multiple to an MMIO-like "device" could
> provide the same semantic as an instruction. A memory move
> instruction with some atomicity guarantees could increase the data
> available for an atomic command. Sequentially, non-atomically
> storing values could also be used, but such would seem to require
> either additional storage (to isolate data per thread) and
> resumption capability or atomicity failure detection and retry
> capability.
<
My 66000 MM (memory to memory Move instruction) has some special
cases to make use of the My 66000 interconnect architecture (which is
an extension of the PCIe data movement philosophy).
<
My 66000 interconnect philosophy is arranged such that PCIe memory
transfers are supported in the same sizes as PCIe supports--i.e., up to
a page can be transferred in a single PCIe transport command, and so
can be transported in a single My 66000 interconnect command.
<
So when a PCIe device transports a page from SATA to memory, that
page is transported in a single interconnect message. All interested
3rd parties see the state of memory either before any modifications
to that page, or after all modifications to that page. Thus these trans-
ports are ATOMIC.
<
Should a low level device driver be able to construct its message to
its I/O device such that all of the data can be moved from the driver
to the device in a single "message" that message is ATOMIC even
without grabbing the big I/O lock. This can be in the form of STM
when the data is in CPU registers, or it can be MM if the data is in
virtual memory.
>
> It also seems that a thread should be able to "wire money" to
> another thread, granting some of its resource budget to another
> thread. An OS's scheduling facility could be a special case of
> this where the OS has a large budget that it can delegate. This
> also seems to imply a much greater utility to "user-level"
> threads. Combined with something like My 66000's Port Holes;
> budget as well as access permission could be transferred. (This
> then implies yet another area of potential unification, where
> permission is another kind of resource that can be delegated.)
<
Note: My 66000 PortHoles allow one thread to access another thread's
virtual memory--which does not seem to be the semantic you are trying
to use here.
<
Maybe what you want is the ability to send a thread work as if that
thread were called from inside your thread. My 66000 messages
provide this means--and variations of this are used to perform what
most would call supervisor calls.
<
These messages identify their target using a 'method' and the unit of
work gets queued at the target thread. Thread then contends for
core cycles in order to perform the request.
<
Effectively: when the service (target) thread has an empty queue it
is in a WAIT state, and when its queue is not empty it is in a RUN state.
<
{I regret having to be somewhat vague here}
>
> Resource sharing is another factor worth considering. A
> multithreaded program might be more single program multiple data
> (where instruction caching could benefit more than average from
> sharing but data is disjoint) or more data sharing (common
> read-mostly data or diversely modified common data that would
> benefit from closer sharing). Different programs or concurrently
> executing phases might have different core-internal component
> utilization that would benefit from SMT or from a fixed
> partitioning of core-internal resources. Communication patterns
> would also influence where and when an execution stream should be
> executed; pipeline-style communication might tolerate the latency
> of farther separation (even if energy efficiency would urge closer
> placement when convenient) and the bandwidth demand might not be
> high (when a large portion of the data is thread local).
>
> "Bids" also implies a value for timeliness of various types. Some
> tasks might be worthless if not completed by a certain time, some
> might have no extra value if completed early (and even lower value
> when considering buffering overhead). In theory, a task which is
> guaranteed to fail timeliness might be canceled early, freeing
> resources for other tasks. The value of completing a task at a
> certain time might also depend on the resource availability for
> other tasks; if the work cannot be "cashed in" until a collection
> of partially dependent tasks complete, then one task completing
> fast may have no extra value if consuming tasks cannot run
> (stalled on another data hazard or a structural hazard).
<
Hard Real time is <well> hard !! All sorts of ATOMIC things cause
effective priority inversions, unless the Hard RealTime Program
exclusively uses single instruction ATOMIC requests--probably
no less than DCAS--or have blocking free means to assign work
as-if-atomically.
<
Then on top of all the HRTP problems, add in affinity, priority
queueing, and unpredictable thread state changes (RUN<->WAIT)
...............
<
It is harder than you can imagine.......
<
> Performance- and power-heterogeneous cores complicate the
> structural hazard problem. (There may be other forms of
> heterogeneity within the same ISA that complicate resource
> allocation. ISA subsets and heterogeneous ISAs would add further
> complexity.)
>
> I suppose this is enough tangents even for one of my posts.


Re: A BID for conceptually unified resource management ☺

<4c55a782-2823-4dcb-89bf-51e6fa3472b1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26865&group=comp.arch#26865

 by: JohnG - Sun, 24 Jul 2022 06:46 UTC

On Saturday, July 23, 2022 at 1:31:35 PM UTC-7, MitchAlsup wrote:
> <
> My 66000 interconnect philosophy is arranged such that PCIe memory
> transfers are supported in the same sizes as PCIe supports--i.e., up to
> a page can be transferred in a single PCIe transport command, and so
> can be transported in a single My 66000 interconnect command.
> <
> So when a PCIe device transports a page from SATA to memory, that
> page is transported in a single interconnect message. All interested
> 3rd parties see the state of memory either before any modifications
> to that page, or after all modifications to that page. Thus these trans-
> ports are ATOMIC.
> <

Just verifying that you understand that just because the PCIe spec allows a TLP of up to 4k, essentially nothing supports transfers of that length; most common max transfer sizes are 128B or 256B. Also any atomicity guarantees would have to come from your cache coherency and memory subsystem because the spec is very quiet about ordering from the POV of the host processors.

-JohnG

Re: A BID for conceptually unified resource management ☺

<tbjijh$kqo9$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26869&group=comp.arch#26869

 by: Paul A. Clayton - Sun, 24 Jul 2022 13:48 UTC

MitchAlsup wrote:
> On Saturday, July 23, 2022 at 2:43:42 PM UTC-5, Paul A. Clayton wrote:
[snip]
>> It seems that these aspects and others fit together and might
>> benefit from a coherent interface. While a SCHEDULE instruction
>
> What makes you think SCHEDULE should be an instruction or even
> performed in a CPU ?

A thread as it is running can reach a point in its execution
stream where the thread is aware that the benefits of its resource
allocations are changing. SCHEDULE/BID informs a hardware
thread scheduler/resource manager of these changes.

Wait-for-event communicates the simple information "I do not have
anything useful to do until after this event occurs".

In some cases, the response to/handling of this information would
be local (at least initially — the information could be passed on,
e.g., a core might communicate to hardware at the level of a core
cluster that the core will be idle until the event occurs).

SCHEDULE/BID does not have to be an instruction if the hardware is
presented as an MMIO device (with architectural definitions of
some behavior and a well-known address that can be recognized by
the core to quickly route the information).

(The difference between a functional unit, a coprocessor, and an
accelerator/device can be fuzzy architecturally — and fuzzier
microarchitecturally.)

By the way, MWAITX only sets a power state. While a SoEMT core
could use this as information to switch-in another thread, I do
not think any x86 has implemented SoEMT. (SoEMT could be viewed as
strong affinity hardware thread scheduling, where hardware is not
allowed to migrate a thread.)

> I can (and did) think of situations where k<n
> CPUs all SCHED some other threads simultaneously. These threads
> being SCHED probably have different affinity than the threads doing
> SCHED and also different priorities.
>
> So what you want at a minimum is the ability to take a thread from
> a WAIT state to a RUN state, where it can contend with all other
> threads on all other run states for execution resources, without
> having to "grab" an ATOMIC lock; and vice versa.
>
> Secondarily: you want the properties that threads only run on cores
> they have affinity to;

Affinity is not binary. The cost of a migration can be near zero
(e.g., a thread is changing phase so most of its cache contents
are not useful and it does not communicate significantly with
other threads [so its relative location is insignificant], etc.);
the cost can be small if the move is short (e.g., moving to
another core that shares L2 data might be quite sufficient). If
the thread has not run for quite some time and other threads have
effectively flushed the caches, affinity considerations may be
about proximity of other agents rather than the seemingly more
common cache considerations.

The cost of waiting can also vary. If a task's completion value
would be greatly diminished by waiting until a specific resource
is available, using a different resource may be preferred over
waiting. (The likely performance of the resource can also make a
difference. A task might usually have affinity for a high
performance multithreaded core that is usually only running one or
two of the potential eight threads, but if that core is running
seven threads the task might be better scheduled to a
lower-performance core that is less occupied.)

> only preempt threads with which they have
> higher priority;

Priority is not necessarily "linear". The value of completing a
task at a given time

> AND that at any given point in time, all of the highest
> priority runnable n threads are running. Finally: you want to be able to
> do these things without IPI-ing from CPU to CPU,

With idiom recognition, a software IPI could, I think, function
very much like a hardware-internal communication. Explicit
communication of intent has advantages, but if the designer of an
architecture makes a mistake, the mistake can sometimes be papered
over by microarchitecture.

[snip]
>> In principle, store-multiple to an MMIO-like "device" could
>> provide the same semantic as an instruction. A memory move
>> instruction with some atomicity guarantees could increase the data
>> available for an atomic command. Sequentially, non-atomically
>> storing values could also be used, but such would seem to require
>> either additional storage (to isolate data per thread) and
>> resumption capability or atomicity failure detection and retry
>> capability.
> <
> My 66000 MM (memory to memory Move instruction) has some special
> cases to make use of the My 66000 interconnect architecture (which is
> an extension of the PCIe data movement philosophy).
>
> My 66000 interconnect philosophy is arranged such that PCIe memory
> transfers are supported in the same sizes as PCIe supports--i.e., up to
> a page can be transferred in a single PCIe transport command, and so
> can be transported in a single My 66000 interconnect command.
>
> So when a PCIe device transports a page from SATA to memory, that
> page is transported in a single interconnect message. All interested
> 3rd parties see the state of memory either before any modifications
> to that page, or after all modifications to that page. Thus these trans-
> ports are ATOMIC.
>
> Should a low level device driver be able to construct its message to
> its I/O device such that all of the data can be moved from the driver
> to the device in a single "message" that message is ATOMIC even
> without grabbing the big I/O lock. This can be in the form of STM
> when the data is in CPU registers, or it can be MM if the data is in
> virtual memory.

Yes, I had that in mind since you had mentioned it earlier.

>> It also seems that a thread should be able to "wire money" to
>> another thread, granting some of its resource budget to another
>> thread. An OS's scheduling facility could be a special case of
>> this where the OS has a large budget that it can delegate. This
>> also seems to imply a much greater utility to "user-level"
>> threads. Combined with something like My 66000's Port Holes;
>> budget as well as access permission could be transferred. (This
>> then implies yet another area of potential unification, where
>> permission is another kind of resource that can be delegated.)
>
> Note: My 66000 PortHoles allow one thread to access another thread's
> virtual memory--which does not seem to be the semantic you are trying
> to use here.
>
> Maybe what you want is the ability to send a thread work as if that
> thread were called from inside your thread. My 66000 messages
> provide this means--and variations of this are used to perform what
> most would call supervisor calls.

I am not certain "what I want" in terms of interface.

By the way, a thread might want to transfer resources to speed
completion of other threads in gang scheduling. If a thread that
finishes early merely releases its hold on resources such as local
and global power delivery, local and global heat extraction, NoC
bandwidth — resources which are shared to some degree with other
execution resources — then another thread might replace that
thread and use those resources meaning that the other
gang-scheduled threads cannot use these shared resources. (This is
obviously a contrived case with limited benefit and application,
but if one knows a case can exist, it seems one should consider
making provision for that case.)

> These messages identify their target using a 'method' and the unit of
> work gets queued at the target thread. Thread then contends for
> core cycles in order to perform the request.
>
> Effectively: when the service (target) thread has an empty queue it
> is in a WAIT state, and when its queue is not empty it is in a RUN state.
>
> {I regret having to be somewhat vague here}

While I would like to have all your computer architecture thoughts
from twenty years from now (i.e., when you will have more fully
developed them and worked through even more tradeoff scenarios) —
or even all of them from right now — in a perfectly indexed easily
accessed form, I know both are not possible (the former being
physically impossible, unless time travel is possible ☺). You
provide more than enough to chew on for quite some time.

[snip]
> Hard Real time is <well> hard !!

'Hard real time' is poorly defined. The definition "the deadline
*must* be met" seems useless. If the deadline is extremely lax
(e.g., ten billion years), the distinction between normal programs
(or soft real time programs) seems insignificant. The criticality
of the deadline is also variable. "Does not meet specification"
can mean someone dies, the person responsible pays a financial
penalty, the person responsible serves time in prison, the person
responsible is executed (or killed by a mob), the winning move
("not to play") is not made, or the expectation of service was
violated and customer satisfaction is reduced.


Re: A BID for conceptually unified resource management ☺

<6fe594a0-aa76-4b38-9750-6d55914f32d1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26873&group=comp.arch#26873

 by: MitchAlsup - Sun, 24 Jul 2022 18:40 UTC

On Sunday, July 24, 2022 at 1:46:47 AM UTC-5, JohnG wrote:
> On Saturday, July 23, 2022 at 1:31:35 PM UTC-7, MitchAlsup wrote:
> > <
> > My 66000 interconnect philosophy is arranged such that PCIe memory
> > transfers are supported in the same sizes as PCIe supports--i.e., up to
> > a page can be transferred in a single PCIe transport command, and so
> > can be transported in a single My 66000 interconnect command.
> > <
> > So when a PCIe device transports a page from SATA to memory, that
> > page is transported in a single interconnect message. All interested
> > 3rd parties see the state of memory either before any modifications
> > to that page, or after all modifications to that page. Thus these trans-
> > ports are ATOMIC.
> > <
> Just verifying that you understand that just because the PCIe spec
> allows a TLP of up to 4k, essentially nothing supports transfers of
> that length; most common max transfer sizes are 128B or 256B.
<
Probably because typical interconnects do not support long transfers,
not because they are inefficient in any form.
<
> Also any atomicity guarantees would have to come from your cache
> coherency and memory subsystem because the spec is very quiet
> about ordering from the POV of the host processors.
<
Yes.
>
> -JohnG

Re: A BID for conceptually unified resource management ☺

<74c3833e-cda8-4599-91f1-52a5342cbc84n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26875&group=comp.arch#26875

 by: MitchAlsup - Sun, 24 Jul 2022 19:12 UTC

On Sunday, July 24, 2022 at 8:48:37 AM UTC-5, Paul A. Clayton wrote:
> MitchAlsup wrote:

> > I can (and did) think of situations where k<n
> > CPUs all SCHED some other threads simultaneously. These threads
> > being SCHED probably have different affinity than the threads doing
> > SCHED and also different priorities.
> >
> > So what you want at a minimum is the ability to take a thread from
> > a WAIT state to a RUN state, where it can contend with all other
> > threads on all other run states for execution resources, without
> > having to "grab" an ATOMIC lock; and vice versa.
> >
> > Secondarily: you want the properties that threads only run on cores
> > they have affinity to;
<
> Affinity is not binary. The cost of a migration can be near zero
> (e.g., a thread is changing phase so most of its cache contents
> are not useful and it does not communicate significantly with
> other threads [so its relative location is insignificant], etc.);
> the cost can be small if the move is short (e.g., moving to
> another core that shares L2 data might be quite sufficient). If
> the thread has not run for quite some time and other threads have
> effectively flushed the caches, affinity considerations may be
> about proximity of other agents rather than the seemingly more
> common cache considerations.
<
Yes, affinity is not binary, and a multiplicity of threads can have
a combined affinity set where the sum of the threads can execute
everywhere, but any individual thread can only execute on a small
partition of cores.
<
Affinity also interacts with priority:: Consider a system where there
are k cores and p priority levels. And we have a thread t which can
only run on core c at a priority p1 which is significantly above the
typical priority of worker threads. Now, t is runnable, but there is at
least 1 running thread on core c at a slightly higher priority level
than p1, so lower priority threads continue to run on other cores
even while thread t waits. Here, affinity* is creating a priority inversion.
(*) most would say they ask for it by using affinity as they did.
<
As far as a thread running on a core goes, whenever it decides it has
completed a job and will wait for another unit of work to show up
it can simply execute a WAIT instruction and go into a wait-state.
Whenever* more work shows up, it contends with affinity and priority
to execute on a core. (*) Whenever might even be "already happened".
>
> The cost of waiting can also vary. If a task's completion value
> would be greatly diminished by waiting until a specific resource
> is available, using a different resource may be preferred over
> waiting. (The likely performance of the resource can also make a
> difference. A task might usually have affinity for a high
> performance multithreaded core that is usually only running one or
> two of the potential eight threads, but if that core is running
> seven threads the task might be better scheduled to a
> lower-performance core that is less occupied.)
> > only preempt threads with which they have
> > higher priority;
> Priority is not necessarily "linear". The value of completing a
> task at a given time
> > AND that at any given point in time, all of the highest
> > priority runnable n threads are running. Finally: you want to be able to
> > do these things without IPI-ing from CPU to CPU,
<
> With idiom recognition, a software IPI could, I think, function
> very much like a hardware-internal communication. Explicit
> communication of intent has advantages, but if the designer of an
> architecture makes a mistake, the mistake can sometimes be papered
> over by microarchitecture.
<
But imagine core c starting up thread t at priority p within its affinity set
a without needing an IPI to perform this. Thread t begins running on
one core in the affinity set a at priority p simply because core c delivered
work to its "in box".
>
> [snip]
> >> In principle, store-multiple to an MMIO-like "device" could
> >> provide the same semantic as an instruction. A memory move
> >> instruction with some atomicity guarantees could increase the data
> >> available for an atomic command. Sequentially, non-atomically
> >> storing values could also be used, but such would seem to require
> >> either additional storage (to isolate data per thread) and
> >> resumption capability or atomicity failure detection and retry
> >> capability.
> > <
> > My 66000 MM (memory to memory Move instruction) has some special
> > cases to make use of the My 66000 interconnect architecture (which is
> > an extension of the PCIe data movement philosophy).
> >
> > My 66000 interconnect philosophy is arranged such that PCIe memory
> > transfers are supported in the same sizes as PCIe supports--i.e., up to
> > a page can be transferred in a single PCIe transport command, and so
> > can be transported in a single My 66000 interconnect command.
> >
> > So when a PCIe device transports a page from SATA to memory, that
> > page is transported in a single interconnect message. All interested
> > 3rd parties see the state of memory either before any modifications
> > to that page, or after all modifications to that page. Thus these trans-
> > ports are ATOMIC.
> >
> > Should a low level device driver be able to construct its message to
> > its I/O device such that all of the data can be moved from the driver
> > to the device in a single "message" that message is ATOMIC even
> > without grabbing the big I/O lock. This can be in the form of STM
> > when the data is in CPU registers, or it can be MM if the data is in
> > virtual memory.
<
> Yes, I had that in mind since you had mentioned it earlier.
<
> >> It also seems that a thread should be able to "wire money" to
> >> another thread, granting some of its resource budget to another
> >> thread. An OS's scheduling facility could be a special case of
> >> this where the OS has a large budget that it can delegate. This
> >> also seems to imply a much greater utility to "user-level"
> >> threads. Combined with something like My 66000's Port Holes;
> >> budget as well as access permission could be transferred. (This
> >> then implies yet another area of potential unification, where
> >> permission is another kind of resource that can be delegated.)
> >
> > Note: My 66000 PortHoles allow one thread to access another thread's
> > virtual memory--which does not seem to be the semantic you are trying
> > to use here.
> >
> > Maybe what you want is the ability to send a thread work as if that
> > thread were called from inside your thread. My 66000 messages
> > provide this means--and variations of this are used to perform what
> > most would call supervisor calls.
<
> I am not certain "what I want" in terms of interface.
>
> By the way, a thread might want to transfer resources to speed
> completion of other threads in gang scheduling. If a thread that
<
I can see thread t sending STAT device s to another thread, but I cannot
see sending core c's FPU to a thread not already running on core c.
So what kind of limitations are you self-imposing on the word resource,
here ??
<
> finishes early merely releases its hold on resources such as local
> and global power delivery, local and global heat extraction, NoC
> bandwidth — resources which are shared to some degree with other
> execution resources — then another thread might replace that
> thread and use those resources meaning that the other
> gang-scheduled threads cannot use these shared resources. (This is
> obviously a contrived case with limited benefit and application,
> but if one knows a case can exist, it seems one should consider
> making provision for that case.)
<
I think, for the most part, systems use locks to perform these transfers
(i.e., greedy)
<
> > These messages identify their target using a 'method' and the unit of
> > work gets queued at the target thread. Thread then contends for
> > core cycles in order to perform the request.
> >
> > Effectively: when the service (target) thread has an empty queue it
> > is in a WAIT state, and when its queue is not empty it is in a RUN state.
> >
> > {I regret having to be somewhat vague here}
<
> While I would like to have all your computer architecture thoughts
> from twenty years from now (i.e., when you will have more fully
> developed them and worked through even more tradeoff scenarios) —
> or even all of them from right now — in a perfectly indexed easily
> accessed form, I know both are not possible (the former being
> physically impossible, unless time travel is possible ☺). You
> provide more than enough to chew on for quite some time.
<
Bon Appétit.
>
> [snip]
> > Hard Real time is <well> hard !!
<
> 'Hard real time' is poorly defined. The definition "the deadline
> *must* be met" seems useless. If the deadline is extremely lax
> (e.g., ten billion years), the distinction between normal programs
> (or soft real time programs) seems insignificant. The criticality
> of the deadline is also variable. "Does not meet specification"
> can mean someone dies, the person responsible pays a financial
> penalty, the person responsible serves time in prison, the person
> responsible is executed (or killed by a mob), the winning move
> ("not to play") is not made, or the expectation of service was
> violated and customer satisfaction is reduced.
>
> Reliability of timing can also sometimes be interchanged with
> reliability of general function. A four-worker redundant system
> could have one worker fail timing and still provide three on-time
> completions for a majority vote.
>
> In practice, I think HRT means a tight deadline (where there is
> significant risk of not meeting the deadline if significant effort
> is not used to constrain operation and interference), but
> "significant risk" and "significant effort" are poorly defined.
> Using a processor with ten times the performance might or might
> not be considered significant effort. Ten times more likely than
> all other failure modes combined might or might not be significant
> risk.
>
> ("Effort" seems tied to other areas of the system. In some HRT
> systems, increasing processor performance is cheap; in others,
> even small changes can greatly change system cost/value.
> Organizational structure can also be important; porous
> organizations may support more flexible resource allocation at the
> cost of complexity. Knowing that a seemingly small interface
> change can greatly ease a design task can be frustrating when one
> cannot even ask about the cost of that change to other design
> tasks in the system. On the other hand, questioning a nearest
> neighbor in a design effort can result in an effective broadcast
> of the question, having a cost far greater than the asker would
> assume.)
>
> I suspect formal proofs of meeting the specification are not
> commonly used for HRT systems, so that part of failure risk is
> excluded. (Of course, specifications can fail to match
> expectation/desire or be misinterpreted/misapplied when doing
> product design validation. Production variation and testing
> constraints would also seem to add risk.)
<
> > All sorts of ATOMIC things cause
> > effective priority inversions, unless the Hard RealTime Program
> > exclusively uses single instruction ATOMIC requests--probably
> > no less than DCAS--or have blocking free means to assign work
> > as-if-atomically.
> > <
> > Then on top of all the HRTP problems, add in affinity, priority
> > queueing, and unpredictable thread state changes (RUN<->WAIT)
> > ..............
> >
> > It is harder than you can imagine.......
<
> It is certainly harder, in the general case, than I would want to
> imagine. Yet I also think it could be fun to work on such a
> system, thinking about how could this go wrong, what can we
> constrain to make the problem more tractable, what is the actual
> requirement, etc.
>
> Of course, such projects would be much less fun when the
> parameters change frequently: "it needs this functionality also",
> "its maximum power draw is now 1.53A rather 1.74A", "the design
> must be ready for system validation in three weeks rather than
> three months", etc. Frequent changes and extreme expectations
> probably increase the temptation to cheat; pressuring someone to
> lie may be even worse for an engineer.


Re: A BID for conceptually unified resource management ☺

<d23c7636-9094-49f1-9d56-c68d2bd719e8n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26880&group=comp.arch#26880

 by: JohnG - Sun, 24 Jul 2022 21:34 UTC

On Sunday, July 24, 2022 at 11:40:32 AM UTC-7, MitchAlsup wrote:
> On Sunday, July 24, 2022 at 1:46:47 AM UTC-5, JohnG wrote:
> > Just verifying that you understand that just because the PCIe spec
> > allows a TLP of up to 4k, essentially nothing supports transfers of
> > that length; most common max transfer sizes are 128B or 256B.
> <
> Probably because typical interconnects do not support long transfers,
> not because they are inefficient in any form.

PCIe as-designed has a few issues that make large transfers less desirable. The first is that it's hard not to have at least one store-and-forward penalty per hop, since the error correction covers the whole transfer and comes at the end. If you start doing cut-through, you need a way to undo the earlier part of the transfer if you find the packet was corrupt. The other is head-of-line blocking, as smaller, latency-sensitive packets might get jammed up behind the larger transfers.

This might change somewhat with PCIe 6.0's FLIT-Mode. It's a micropacket-based strategy where every 256B FLIT carries 236B of payload and the other 20B is error correction, link credits, etc. So it starts to look a bit more like other HPC interconnects (Cray's Gemini, Aries, and Slingshot; SGI's HIPPI-6400; or ATM). I'm not sure if TLPs can be interleaved though. And you still might have to take one store-and-forward penalty, as I'm guessing TLPs might still have the trailing ECRC to protect the whole transfer (though usually the rationale for FLITs is that the network is supposed to be 'reliable' to the upper-layer protocols). I haven't looked at the spec.
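
(For scale, using the numbers above: 236B of payload per 256B FLIT is 236/256, or roughly 92% link efficiency, before any TLP header overhead.)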

-JohnG

Re: A BID for conceptually unified resource management ☺

<bb3746cb-9db9-4ef0-abb3-0e965f870a5en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26887&group=comp.arch#26887

 by: MitchAlsup - Sun, 24 Jul 2022 22:57 UTC

On Sunday, July 24, 2022 at 4:34:14 PM UTC-5, JohnG wrote:
> On Sunday, July 24, 2022 at 11:40:32 AM UTC-7, MitchAlsup wrote:
> > On Sunday, July 24, 2022 at 1:46:47 AM UTC-5, JohnG wrote:
> > > Just verifying that you understand that just because the PCIe spec
> > > allows a TLP of up to 4k, essentially nothing supports transfers of
> > > that length; most common max transfer sizes are 128B or 256B.
> > <
> > Probably because typical interconnects do not support long transfers,
> > not because they are inefficient in any form.
> PCIe as-designed has a few issues that make large transfers less desirable. The first is that it's hard not to have at least one store-and-forward penalty per hop, since the error correction covers the whole transfer and comes at the end. If you start doing cut-through, you need a way to undo the earlier part of the transfer if you find the packet was corrupt. The other is head-of-line blocking, as smaller, latency-sensitive packets might get jammed up behind the larger transfers.
<
My 66000 transport is SECDED ECC protected on each 64-bit boundary.
>
> This might change somewhat with PCIe 6.0's FLIT-Mode. It's a micropacket-based strategy where every 256B FLIT carries 236B of payload and the other 20B is error correction, link credits, etc. So it starts to look a bit more like other HPC interconnects (Cray's Gemini, Aries, and Slingshot; SGI's HIPPI-6400; or ATM). I'm not sure if TLPs can be interleaved though. And you still might have to take one store-and-forward penalty, as I'm guessing TLPs might still have the trailing ECRC to protect the whole transfer (though usually the rationale for FLITs is that the network is supposed to be 'reliable' to the upper-layer protocols). I haven't looked at the spec.
>
> -JohnG

Re: A BID for conceptually unified resource management ☺

<jwvv8rmvyl0.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=26888&group=comp.arch#26888

 by: Stefan Monnier - Mon, 25 Jul 2022 00:16 UTC

> 'Hard real time' is poorly defined. The definition "the deadline *must* be
> met" seems useless.

Kinda, indeed. I usually take it to mean that a late answer is a wrong
answer ("wrong answer" has a similar set of possible consequences).

E.g. when playing a song, the decoder has hard real time constraints
because if it delivers the result too late, you hear an interruption in
your music, which is comparable to playing the wrong note.

Stefan

Re: A BID for conceptually unified resource management ☺

<tbkp23$tere$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26889&group=comp.arch#26889

 by: Ivan Godard - Mon, 25 Jul 2022 00:44 UTC

On 7/24/2022 5:16 PM, Stefan Monnier wrote:
>> 'Hard real time' is poorly defined. The definition "the deadline *must* be
>> met" seems useless.
>
> Kinda, indeed. I usually take it to mean that a late answer is a wrong
> answer ("wrong answer" has a similar set of possible consequences).
>
> E.g. when playing a song, the decoder has hard real time constraints
> because if it delivers the result too late, you hear an interruption in
> your music, which is comparable to playing the wrong note.
>
>
> Stefan

Real-time is where there is an event time for which "late" is an event
failure.

Soft RT is when an event failure is a statistic.

Hard RT is when an event failure is a systemic failure.

Re: A BID for conceptually unified resource management ☺

<tblkd1$14adq$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26893&group=comp.arch#26893

 by: BGB - Mon, 25 Jul 2022 08:31 UTC

On 7/24/2022 7:44 PM, Ivan Godard wrote:
> On 7/24/2022 5:16 PM, Stefan Monnier wrote:
>>> 'Hard real time' is poorly defined. The definition "the deadline
>>> *must* be
>>> met" seems useless.
>>
>> Kinda, indeed.  I usually take it to mean that a late answer is a wrong
>> answer ("wrong answer" has a similar set of possible consequences).
>>
>> E.g. when playing a song, the decoder has hard real time constraints
>> because if it delivers the result too late, you hear an interruption in
>> your music, which is comparable to playing the wrong note.
>>
>>
>>          Stefan
>
>
> Real-time is where there is an event time for which "late" is an event
> failure.
>
> Soft RT is when an event failure is a statistic.
>
> Hard RT is when an event failure is a systemic failure.

Agreed.

Something like multimedia playback is more soft real time: a missed
deadline may degrade the quality of the experience (dropped frames,
audio artifacts), but that is the limit of its effects.

Meanwhile, cases where a timing issue may result in hardware damage,
hardware failure, malfunction resulting in some other form of permanent
damage, or some other similarly adverse effect are more "hard real
time".

They may also impose design constraints on the system as a whole.

A soft real time system can generally tolerate being run on a
multitasking OS with virtual memory and other sources of potential
timing issues (the program will just have to deal with them).

For hard real time, this may be unacceptable, and the only really viable
option is for the controller program to run "bare metal".

....

Likewise, what one deals with in something like a 3D game or video
player is not really anything like what one needs to deal with in
something like a CNC controller, ...
