devel / comp.arch / A similarity between loop parallelism and thread parallelism?

Subject / Author
* A similarity between loop parallelism and thread parallelism? -- Paul A. Clayton
+* Re: A similarity between loop parallelism and thread parallelism? -- MitchAlsup
|`* Re: A similarity between loop parallelism and thread parallelism? -- Terje Mathisen
| `- Re: A similarity between loop parallelism and thread parallelism? -- MitchAlsup
`* Re: A similarity between loop parallelism and thread parallelism? -- MitchAlsup
 `* Re: A similarity between loop parallelism and thread parallelism? -- Ivan Godard
  `* Re: A similarity between loop parallelism and thread parallelism? -- MitchAlsup
   +- Re: A similarity between loop parallelism and thread parallelism? -- Branimir Maksimovic
   `- Re: A similarity between loop parallelism and thread parallelism? -- Ivan Godard

A similarity between loop parallelism and thread parallelism?

<7bc9d176-6e32-475e-857e-4614b91762c1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=20696&group=comp.arch#20696

 by: Paul A. Clayton - Thu, 30 Sep 2021 17:45 UTC

[This post is less complete/coherent than I had hoped but
still might be worth posting.]

In a loop three kinds of values can be distinguished: in-loop
constants, counter-like values that can be generated in a
straightforward manner for any iteration based on in-loop
constants and an iteration identifier, and values dependent
on other iterations.

(With respect to "generated in a straightforward manner" I
was thinking mainly of stride-based values, but
"straightforward" seems to be more concerned with the cost of
redundant work — including reading values — relative to the
parallelism exposed.)
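As a concrete illustration (Python used here as executable pseudocode; the
loop and names are illustrative, not from any particular ISA), all three
kinds appear in a simple strided accumulation:

```python
def strided_sum(base, stride, n):
    """Sum the addresses base, base+stride, ..., base+(n-1)*stride."""
    # In-loop constants: base and stride never change across iterations.
    total = 0
    for i in range(n):
        # Counter-like value: computable for ANY iteration directly
        # from the in-loop constants and the iteration identifier i.
        addr = base + i * stride
        # Iteration-dependent value: total depends on prior iterations.
        total += addr
    return total
```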

Mitch Alsup's My 66000's Virtual Vector Method declares
in-loop constants but does not (if I understand correctly)
indicate values of the second kind (which can be generated
in parallel with modest extra work), perhaps expecting
hardware to recognize accumulations and "unroll" them into a
parallel form (c[i+1]=c[i]+N being unrolled with c[i+2]=c[i]+2N, c[i+3]=c[i]+3N, etc.).
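That unrolling can be sketched directly (Python as executable pseudocode):
the serial recurrence and its parallel-friendly closed form produce the
same values, at the cost of one multiply of redundant work per element:

```python
def serial_recurrence(c0, N, n):
    # c[i+1] = c[i] + N: each value waits on the previous one.
    c = [c0]
    for _ in range(n - 1):
        c.append(c[-1] + N)
    return c

def unrolled(c0, N, n):
    # Closed form c[i] = c0 + i*N: every element is independently
    # computable, exposing iteration-level parallelism.
    return [c0 + i * N for i in range(n)]
```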

(Accumulations where intermediate values are not exposed may
provide other work reductions, e.g., a carry-save adder does
less hardware-work than a full adder when accumulating.
Even with intermittent exposure of intermediate values, e.g.,
conditional use or intermittent checkpointing, some
optimization might be practical.)

It seems that thread parallelism would have similar
distinctions of value types. As with a loop there may be
value-communication constraints on how much "unrolling" is
possible, but it seems that there would still be significant
opportunities. The broader decoupling of thread parallelism
increases the difficulty of software "unrolling". As with a
loop, the iteration count and parallel execution resource
availability influence cost-benefit, but with threads the
"iteration count" (expected/average/exploitable concurrent
threads performing a task (or accessing shared values/names))
can be more dynamic and the parallel execution resources can
typically scale more.

Lock elision (one use of transactional memory) seeks to
remove something like a name dependency on the lock name. It
seems this could be expanded to a broader range of name
dependencies (perhaps via versioned memory) and "iteration"
dependencies [I rather suspect there is a CS term for this].
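One software analogue of removing such a name dependence is optimistic
versioned reading (a seqlock-like pattern): readers validate against a
version counter instead of acquiring the lock name. A minimal
single-writer sketch (illustrative only; a real implementation needs
memory-ordering guarantees that plain Python does not provide):

```python
import threading

class VersionedCell:
    def __init__(self, value):
        self._version = 0          # even = stable, odd = write in progress
        self._value = value
        self._wlock = threading.Lock()

    def write(self, value):
        with self._wlock:          # writers still serialize on the lock
            self._version += 1     # mark unstable (odd)
            self._value = value
            self._version += 1     # mark stable (even)

    def read(self):
        # Readers never touch the lock: retry until a consistent snapshot.
        while True:
            v1 = self._version
            value = self._value
            if v1 % 2 == 0 and self._version == v1:
                return value       # no writer overlapped the read
```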

My 66000's Exotic (Enhanced?) Synchronization Method provides
a facility for something like iteration count to avoid
artificial conflict/dependence, though (if I understand
correctly) this is only exposed after a conflict and the
generality (no software-provided information) seems to
constrain the information hardware can provide to software
or exploit directly. (There might be circumstances where
hardware could provide an "iteration ID" or version/timestamp
that could be used to generate unique values.)

(Some generated values may be friendly to temporal tricks.
E.g., a pointer returned by a memory allocation could use a
dummy address that is updated with an actual address later.
Performance counter incrementing typically does not return
a value even if precise [non-atomic updates can be acceptably
accurate for some performance monitoring]. This also seems to
touch on the tradeoff of software partitioning a resource so
that atomic/unique updates are independent across partitions
compared with hardware increasing the speed (and/or hiding
the latency) of atomic operations.)
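The partitioning side of that tradeoff can be sketched as per-slot
counters whose updates never contend, with the cost pushed onto reads
(names illustrative):

```python
class PartitionedCounter:
    # Each thread increments its own slot, so updates are independent
    # across partitions and need no atomicity between threads; a read
    # pays the cost of summing every slot.
    def __init__(self, nthreads):
        self.slots = [0] * nthreads

    def add(self, tid, n=1):
        self.slots[tid] += n       # contention-free update

    def value(self):
        return sum(self.slots)     # reads are the expensive operation
```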

With loops, partly from programming convenience and largely
from semantic necessity, collection indexes are shared among
collections. With threads, there more often seems to be a
decoupling: single-use may be required, but there
may be no coupling among collections. (E.g., income and
expenses must be single-use and the binding has a temporal
component, but the binding is not necessarily strict [e.g.,
barring certain accounting rules] between which source of
income goes to which expense.)

I want to think on this more, but I felt some urgency to
post something. This may be something taught in Introduction
to Parallelism Theory in CS curricula, but it seemed an
interesting connection with potential applications.

Re: A similarity between loop parallelism and thread parallelism?

<220a0765-ea71-4400-b28b-eeac8617468an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=20697&group=comp.arch#20697

 by: MitchAlsup - Thu, 30 Sep 2021 19:10 UTC

On Thursday, September 30, 2021 at 12:45:23 PM UTC-5, Paul A. Clayton wrote:
> [This post is less complete/coherent than I had hoped but
> still might be worth posting.]
>
> In a loop three kinds of values can be distinguished: in-loop
> constants, counter-like values that can be generated in a
> straightforward manner for any iteration based on in-loop
> constants and an interation identifier, and values dependent
> on other iterations.
>
> (With respect to "generated in a straightforward manner" I
> was thinking mainly of stride-based values, but
> "straightforward" seems to be more concerned with the cost of
> redundant work — including reading values — relative to the
> parallelism exposed.)
>
> Mitch Alsup's My 66000's Virtual Vector Method declares
> in-loop constants but does not (if I understand correctly)
> indicate values of the second kind (which can be generated
> in parallel with modest extra work), perhaps expecting
> hardware to recognize accumulations and "unroll" them into a
> parallel form (c[i+1]=c[i]+N being unrolled with c[i+2]=
> c[i]+2N, c[i+3]=c[i]+3N, etc.).
<
The distinctions I make in VVM are: a) variables (or immediates)
that do not change while the loop is iterating--these are read
out once, placed into the reservation-station entry operands,
and provided to the function unit on each iteration of the loop;
b) vector temporaries--values that are recalculated each iteration;
c) loop-carried dependencies--values for which a successor iteration
must wait on a prior iteration to complete. The distinction between
b and c is subtle and involves how the first iteration is set up in the
stations, but also includes which values from within the loop arrive
outside of the loop. So the distinction is how loops are set up and
how loops terminate.
<
HW is allowed to perform as many loop iterations as it can,
beginning k iterations every cycle, where it is unlikely that k
will ever be larger than 8 except on byte/char loops.
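The three classes can be pinned to a concrete loop (Python as executable
pseudocode; the classification comments, not the code, are the point):

```python
def saxpy_acc(a, x, y):
    # class (a): 'a' is loop-invariant -- read once and replayed from
    #            the reservation-station entry every iteration.
    # class (b): 't' is a vector temporary -- recalculated each
    #            iteration, dead once that iteration's store retires.
    # class (c): 'acc' is a loop-carried dependency -- iteration i+1
    #            must wait on iteration i's value.
    acc = 0
    for i in range(len(x)):
        t = a * x[i]        # (b) vector temporary
        y[i] = t + y[i]     # consumes (b); y[i] is per-iteration state
        acc = acc + t       # (c) loop-carried accumulation
    return acc
```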
>
> (Accumulations where intermediate values are not exposed may
> provide other work reductions, e.g., a carry-save adder does
> less hardware-work than a full adder when accumulating.
> Even with intermitent exposure of intermediate values, e.g.,
> conditional use or intermittent checkpointing, some
> optimization might be practical.)
<
This is currently "on hold" in My 66000: I know how to "do it"
but I am failing to see the use case--though perhaps the quire of
posits will push the issue.
>
> It seems that thread parallelism would have similar
> distinctions of value types. As with a loop there may be
> value-communication constraints on how much "unrolling" is
> possible, but it seems that there would still be significant
> opportunities. The broader decoupling of thread parallelism
> increases the difficulty of software "unrolling". As with a
> loop, the iteration count and parallel execution resource
> availability influence cost-benefit, but with threads the
> "iteration count" (expected/average/exploitable concurrent
> threads performing a task (or accessing shared values/names)
> can be more dynamic and the parallel execution resources can
> typically scale more.
<
Until one can spawn threads for a similar cost as performing
a unit of integer arithmetic, and collect a dead thread at
similar cost, SW will always want to use as FEW threads as
possible, not as many.
>
> Lock elision (one use of transactional memory) seeks to
> remove something like a name dependency on the lock name. It
> seems this could be expanded to a broader range of name
> dependencies (perhaps via versioned memory) and "iteration"
> dependencies [I rather suspect there is a CS term for this].
>
> My 66000's Exotic (Enhanced?) Synchronization Method provides
Exotic
> a facility for something like iteration count to avoid
> artificial conflict/dependence, though (if I understand
> correctly) this is only exposed after a conflict and the
> generality (no software-provided information) seems to
> constrain the information hardware can provide to software
> or exploit directly. (There might be circumstances where
> hardware could provide an "iteration ID" or version/timestamp
> that could be used to generate unique values.)
<
ESM attempts to run straight through an ATOMIC event just in
case "it works" !! A failure of running straight through an event
causes HW to suspect that interference is "going on" and that
it is safer to be correct than fast. In this mode, each bundle of
memory addresses is shipped to an arbiter who looks to see
if any current events are currently working on the same address(es).
If the address list is full, the arbiter sends the OK message (value=0)
If the address is on the list, the arbiter sends back the count
maintained by the address (positive). If the list is full the arbiter sends
back the "try again later" message (negative).
<
SW can proactively use the positive number sent back by the arbiter
to perform an event on something else (deeper into the queue)
thus minimizing (near future) interference.
<
All of this was an attempt at balancing the notion that an interference-
free ATOMIC event should be no more costly than the same series of
instructions (but without any hint of ATOMIC operation). That is,
ATOMICITY is cheap in the absence of interference, with the notion
that ATOMICITY remains correct when interference is present.
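A toy software model of the reply convention described above (0 = success,
positive = the interference count on a claimed address, negative = try
again later); every name here is illustrative, and the NaK/removal
protocol is omitted:

```python
def arbiter_check(claimed, request, capacity):
    """claimed: dict mapping address -> count of events working on it."""
    for addr in request:
        if addr in claimed:
            return claimed[addr]      # positive: interference count
    if len(claimed) + len(request) > capacity:
        return -1                     # negative: "try again later"
    for addr in request:
        claimed[addr] = 1             # record the successful claim
    return 0                          # zero: success, run straight through
```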
>
> (Some generated values may be friendly to temporal tricks.
> E.g., a pointer returned by a memory allocation could use a
> dummy address that is updated with an actual address later.
> Performance counter incrementing typically does not return
> a value even if precise [non-atomic updates can be acceptably
> accurate for some performance monitoring]. This also seems to
> touch on the tradeoff of software partitioning a resource so
> that atomic/unique updates are independent across partitions
> compared with hardware increasing the speed (and/or hiding
> the latency) of atomic operations.)
>
> With loops, partially from programming convenience if
> largely from semantic necessity, collection indexes are
> shared among collections. With threads, there seems to more
> often be a decoupling, single-use may be required but there
> may be no coupling among collections. (E.g., income and
> expenses must be single-use and the binding has a temporal
> component, but the binding is not necessarily strict [e.g.,
> barring certain accounting rules] between which source of
> income goes to which expense.)
<
Andy Glew was researching this decades ago, using micro-threads
to expand the execution windows. One requires very low thread
spawn costs to achieve forward progress--and significant work in
compilers and languages IIRC.
>
> I want to think on this more, but I felt some urgency to
> post something. This may be something taught in Introduction
> to Parallelism Theory in CS ciricula, but it seemed an
> interesting connection with potential applications.

Re: A similarity between loop parallelism and thread parallelism?

<sj6al7$134t$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=20699&group=comp.arch#20699

 by: Terje Mathisen - Fri, 1 Oct 2021 06:42 UTC

MitchAlsup wrote:
> On Thursday, September 30, 2021 at 12:45:23 PM UTC-5, Paul A. Clayton wrote:
>> [This post is less complete/coherent than I had hoped but
>> still might be worth posting.]

Thanks for posting, if nothing else it brings out Mitch! :-)

>> My 66000's Exotic (Enhanced?) Synchronization Method provides
> Exotic
>> a facility for something like iteration count to avoid
>> artificial conflict/dependence, though (if I understand
>> correctly) this is only exposed after a conflict and the
>> generality (no software-provided information) seems to
>> constrain the information hardware can provide to software
>> or exploit directly. (There might be circumstances where
>> hardware could provide an "iteration ID" or version/timestamp
>> that could be used to generate unique values.)
> <
> ESM attempts to run straight through an ATOMIC event just in
> case "it works" !! A failure of running straight through an event
> causes HW to suspect that interference is "going on" and that
> is it safer to be correct than fast. In this mode, each bundle of
> memory addresses is shipped to an arbiter who looks to see
> if any current events are currently working on the same address(s).
> If the address list is full, the arbiter sends the OK message (value=0)

I am pretty sure that's a typo, you meant to say "if the address list is
empty", right?

> If the address is on the list, the abriter sends back the count main-
> tained by the address (positive). If the list is full the arbiter sends
> back the "try again later" message (negative).
> <
> SW can proactively use the positive number sent back by the arbiter
> to perform an event on something else (deeper into the queue)
> thus minimizing (near future) interference.

This is a/the key step: Provide feedback from the hw arbiter since that
is the only entity which has full visibility into everything relevant
currently going on.

I've posted previously how the exact same idea allowed us to save the
emergency alert system for Oslo's newest hospital.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: A similarity between loop parallelism and thread parallelism?

<0b1f3296-060a-4e4c-829e-c5b41e0338ean@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=20700&group=comp.arch#20700

 by: MitchAlsup - Fri, 1 Oct 2021 13:48 UTC

On Friday, October 1, 2021 at 1:42:50 AM UTC-5, Terje Mathisen wrote:
> MitchAlsup wrote:
> > On Thursday, September 30, 2021 at 12:45:23 PM UTC-5, Paul A. Clayton wrote:
> >> [This post is less complete/coherent than I had hoped but
> >> still might be worth posting.]
> Thanks for posting, if nothing else it brings out Mitch! :-)
> >> My 66000's Exotic (Enhanced?) Synchronization Method provides
> > Exotic
> >> a facility for something like iteration count to avoid
> >> artificial conflict/dependence, though (if I understand
> >> correctly) this is only exposed after a conflict and the
> >> generality (no software-provided information) seems to
> >> constrain the information hardware can provide to software
> >> or exploit directly. (There might be circumstances where
> >> hardware could provide an "iteration ID" or version/timestamp
> >> that could be used to generate unique values.)
> > <
> > ESM attempts to run straight through an ATOMIC event just in
> > case "it works" !! A failure of running straight through an event
> > causes HW to suspect that interference is "going on" and that
> > is it safer to be correct than fast. In this mode, each bundle of
> > memory addresses is shipped to an arbiter who looks to see
> > if any current events are currently working on the same address(s).
> > If the address list is full, the arbiter sends the OK message (value=0)
> I am pretty sure that's a typo, you meant to say "if the address list is
> empty", right?
<
How did I type that ?!?
<
What I meant is:: if no address on the bundled list matches any address
already on the list, the arbiter returns 0 (zero=success).
<
I should also mention that the core receiving success is allowed to NaK
requests for its address list, delaying the interferer and not the interfered.
At the end of the event the core sends out the same address list to the
arbiter who uses it to remove addresses from "the list".
<
> > If the address is on the list, the abriter sends back the count main-
> > tained by the address (positive). If the list is full the arbiter sends
> > back the "try again later" message (negative).
> > <
> > SW can proactively use the positive number sent back by the arbiter
> > to perform an event on something else (deeper into the queue)
> > thus minimizing (near future) interference.
> This is a/the key step: Provide feedback from the hw arbiter since that
> is the only entity which has full visibility into everything relevant
> currently going on.
>
> I've posted previously how the exact same idea allowed us to save the
> emergency alert system for Oslo's newest hospital.
>
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: A similarity between loop parallelism and thread parallelism?

<0bff3a03-7999-4255-b373-f8bd28299f4dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=20701&group=comp.arch#20701

 by: MitchAlsup - Fri, 1 Oct 2021 13:51 UTC

On Thursday, September 30, 2021 at 12:45:23 PM UTC-5, Paul A. Clayton wrote:
> [This post is less complete/coherent than I had hoped but
> still might be worth posting.]
>
> In a loop three kinds of values can be distinguished: in-loop
> constants, counter-like values that can be generated in a
> straightforward manner for any iteration based on in-loop
> constants and an interation identifier, and values dependent
> on other iterations.
>
> (With respect to "generated in a straightforward manner" I
> was thinking mainly of stride-based values, but
> "straightforward" seems to be more concerned with the cost of
> redundant work — including reading values — relative to the
> parallelism exposed.)
>
> Mitch Alsup's My 66000's Virtual Vector Method declares
> in-loop constants but does not (if I understand correctly)
> indicate values of the second kind (which can be generated
> in parallel with modest extra work), perhaps expecting
> hardware to recognize accumulations and "unroll" them into a
> parallel form (c[i+1]=c[i]+N being unrolled with c[i+2]=
> c[i]+2N, c[i+3]=c[i]+3N, etc.).
<
After thinking about this overnight, loop-level parallelism is
fundamentally different from thread-level parallelism. Thread-level
parallelism can only use memory to carry data from iteration
to iteration, and cannot rely on the relative performance of iteration
to iteration that loop-level parallelism can rely on.
>
> (Accumulations where intermediate values are not exposed may
> provide other work reductions, e.g., a carry-save adder does
> less hardware-work than a full adder when accumulating.
> Even with intermitent exposure of intermediate values, e.g.,
> conditional use or intermittent checkpointing, some
> optimization might be practical.)
>
> It seems that thread parallelism would have similar
> distinctions of value types. As with a loop there may be
> value-communication constraints on how much "unrolling" is
> possible, but it seems that there would still be significant
> opportunities. The broader decoupling of thread parallelism
> increases the difficulty of software "unrolling". As with a
> loop, the iteration count and parallel execution resource
> availability influence cost-benefit, but with threads the
> "iteration count" (expected/average/exploitable concurrent
> threads performing a task (or accessing shared values/names)
> can be more dynamic and the parallel execution resources can
> typically scale more.
>
> Lock elision (one use of transactional memory) seeks to
> remove something like a name dependency on the lock name. It
> seems this could be expanded to a broader range of name
> dependencies (perhaps via versioned memory) and "iteration"
> dependencies [I rather suspect there is a CS term for this].
>
> My 66000's Exotic (Enhanced?) Synchronization Method provides
> a facility for something like iteration count to avoid
> artificial conflict/dependence, though (if I understand
> correctly) this is only exposed after a conflict and the
> generality (no software-provided information) seems to
> constrain the information hardware can provide to software
> or exploit directly. (There might be circumstances where
> hardware could provide an "iteration ID" or version/timestamp
> that could be used to generate unique values.)
>
> (Some generated values may be friendly to temporal tricks.
> E.g., a pointer returned by a memory allocation could use a
> dummy address that is updated with an actual address later.
> Performance counter incrementing typically does not return
> a value even if precise [non-atomic updates can be acceptably
> accurate for some performance monitoring]. This also seems to
> touch on the tradeoff of software partitioning a resource so
> that atomic/unique updates are independent across partitions
> compared with hardware increasing the speed (and/or hiding
> the latency) of atomic operations.)
>
> With loops, partially from programming convenience if
> largely from semantic necessity, collection indexes are
> shared among collections. With threads, there seems to more
> often be a decoupling, single-use may be required but there
> may be no coupling among collections. (E.g., income and
> expenses must be single-use and the binding has a temporal
> component, but the binding is not necessarily strict [e.g.,
> barring certain accounting rules] between which source of
> income goes to which expense.)
>
> I want to think on this more, but I felt some urgency to
> post something. This may be something taught in Introduction
> to Parallelism Theory in CS ciricula, but it seemed an
> interesting connection with potential applications.

Re: A similarity between loop parallelism and thread parallelism?

<sj8h2p$q30$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=20709&group=comp.arch#20709

 by: Ivan Godard - Sat, 2 Oct 2021 02:44 UTC

On 10/1/2021 6:51 AM, MitchAlsup wrote:
> On Thursday, September 30, 2021 at 12:45:23 PM UTC-5, Paul A. Clayton wrote:
>> [This post is less complete/coherent than I had hoped but
>> still might be worth posting.]
>>
>> In a loop three kinds of values can be distinguished: in-loop
>> constants, counter-like values that can be generated in a
>> straightforward manner for any iteration based on in-loop
>> constants and an interation identifier, and values dependent
>> on other iterations.
>>
>> (With respect to "generated in a straightforward manner" I
>> was thinking mainly of stride-based values, but
>> "straightforward" seems to be more concerned with the cost of
>> redundant work — including reading values — relative to the
>> parallelism exposed.)
>>
>> Mitch Alsup's My 66000's Virtual Vector Method declares
>> in-loop constants but does not (if I understand correctly)
>> indicate values of the second kind (which can be generated
>> in parallel with modest extra work), perhaps expecting
>> hardware to recognize accumulations and "unroll" them into a
>> parallel form (c[i+1]=c[i]+N being unrolled with c[i+2]=
>> c[i]+2N, c[i+3]=c[i]+3N, etc.).
> <
> After thinking about this overnight, loop level parallelism is
> fundamentally different than thread level parallelism. Thread
> level parallelism can only use memory to carry data from iteration
> to iteration and cannot rely on the relative performance of iteration
> to iteration that loop lever parallelism can rely.

Not true.

Re: A similarity between loop parallelism and thread parallelism?

<ceccf59b-798d-46aa-82b9-0732072f0c10n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=20742&group=comp.arch#20742

Newsgroups: comp.arch
Date: Thu, 7 Oct 2021 17:16:43 -0700 (PDT)
In-Reply-To: <sj8h2p$q30$1@dont-email.me>
References: <7bc9d176-6e32-475e-857e-4614b91762c1n@googlegroups.com>
<0bff3a03-7999-4255-b373-f8bd28299f4dn@googlegroups.com> <sj8h2p$q30$1@dont-email.me>
Message-ID: <ceccf59b-798d-46aa-82b9-0732072f0c10n@googlegroups.com>
Subject: Re: A similarity between loop parallelism and thread parallelism?
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Fri, 8 Oct 2021 00:16 UTC

On Friday, October 1, 2021 at 9:44:44 PM UTC-5, Ivan Godard wrote:
> On 10/1/2021 6:51 AM, MitchAlsup wrote:
> > On Thursday, September 30, 2021 at 12:45:23 PM UTC-5, Paul A. Clayton wrote:
> >> [This post is less complete/coherent than I had hoped but
> >> still might be worth posting.]
> >>
> >> In a loop three kinds of values can be distinguished: in-loop
> >> constants, counter-like values that can be generated in a
> >> straightforward manner for any iteration based on in-loop
>> >> constants and an iteration identifier, and values dependent
> >> on other iterations.
> >>
> >> (With respect to "generated in a straightforward manner" I
> >> was thinking mainly of stride-based values, but
> >> "straightforward" seems to be more concerned with the cost of
> >> redundant work — including reading values — relative to the
> >> parallelism exposed.)
> >>
> >> Mitch Alsup's My 66000's Virtual Vector Method declares
> >> in-loop constants but does not (if I understand correctly)
> >> indicate values of the second kind (which can be generated
> >> in parallel with modest extra work), perhaps expecting
> >> hardware to recognize accumulations and "unroll" them into a
> >> parallel form (c[i+1]=c[i]+N being unrolled with c[i+2]=
> >> c[i]+2N, c[i+3]=c[i]+3N, etc.).
> > <
> > After thinking about this overnight, loop level parallelism is
> > fundamentally different than thread level parallelism. Thread
> > level parallelism can only use memory to carry data from iteration
> > to iteration and cannot rely on the relative performance of iteration
> > to iteration that loop level parallelism can rely on.
<
> Not true.
<
How do threads communicate other than through memory?
{Note: files/pipes/streams are memory in this scenario!}
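Mitch's point above can be sketched with C11 atomics (an illustrative one-shot mailbox of my own naming, not code from either architecture): the only channel between ordinary threads is a location in memory.

```c
/* Two threads carry a value from one to the other purely through
   memory: the producer publishes into `slot`, the consumer waits
   for the store to become visible. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static _Atomic long slot = -1;      /* -1 means "not yet published" */

static void *producer(void *arg)
{
    (void)arg;
    long sum = 0;
    for (long i = 1; i <= 10; i++)  /* some per-thread work */
        sum += i;
    atomic_store(&slot, sum);       /* the only channel is memory */
    return NULL;
}

/* Consumer side: spin until the producer's store lands.  A pipe,
   condvar, or futex would ultimately go through memory too. */
static long consume(void)
{
    long v;
    while ((v = atomic_load(&slot)) < 0)
        ;                           /* busy-wait */
    return v;
}
```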

Re: A similarity between loop parallelism and thread parallelism?

<xaN7J.24002$d82.17430@fx21.iad>

https://www.novabbs.com/devel/article-flat.php?id=20747&group=comp.arch#20747

Newsgroups: comp.arch
From: branimir...@icloud.com (Branimir Maksimovic)
Subject: Re: A similarity between loop parallelism and thread parallelism?
References: <7bc9d176-6e32-475e-857e-4614b91762c1n@googlegroups.com>
<0bff3a03-7999-4255-b373-f8bd28299f4dn@googlegroups.com>
<sj8h2p$q30$1@dont-email.me>
<ceccf59b-798d-46aa-82b9-0732072f0c10n@googlegroups.com>
Message-ID: <xaN7J.24002$d82.17430@fx21.iad>
Date: Fri, 08 Oct 2021 01:30:05 GMT
 by: Branimir Maksimovic - Fri, 8 Oct 2021 01:30 UTC

On 2021-10-08, MitchAlsup <MitchAlsup@aol.com> wrote:
> On Friday, October 1, 2021 at 9:44:44 PM UTC-5, Ivan Godard wrote:
>> On 10/1/2021 6:51 AM, MitchAlsup wrote:
>> > On Thursday, September 30, 2021 at 12:45:23 PM UTC-5, Paul A. Clayton
>> > wrote:
>> >> [This post is less complete/coherent than I had hoped but still might be
>> >> worth posting.]
>> >>
>> >> In a loop three kinds of values can be distinguished: in-loop constants,
>> >> counter-like values that can be generated in a straightforward manner for
>> >> any iteration based on in-loop constants and an iteration identifier,
>> >> and values dependent on other iterations.
>> >>
>> >> (With respect to "generated in a straightforward manner" I was thinking
>> >> mainly of stride-based values, but "straightforward" seems to be more
>> >> concerned with the cost of redundant work — including reading values —
>> >> relative to the parallelism exposed.)
>> >>
>> >> Mitch Alsup's My 66000's Virtual Vector Method declares in-loop constants
>> >> but does not (if I understand correctly) indicate values of the second
>> >> kind (which can be generated in parallel with modest extra work), perhaps
>> >> expecting hardware to recognize accumulations and "unroll" them into a
>> >> parallel form (c[i+1]=c[i]+N being unrolled with c[i+2]= c[i]+2N,
>> >> c[i+3]=c[i]+3N, etc.).
>> > After thinking about this overnight, loop level parallelism is
>> > fundamentally different than thread level parallelism. Thread level
>> > parallelism can only use memory to carry data from iteration to iteration
>> > and cannot rely on the relative performance of iteration to iteration that
>> > loop level parallelism can rely on.
><
>> Not true.
> How do threads communicate other than through memory?
> {Note: files/pipes/streams are memory in this scenario!}
It's a matter of whether more cores access the same memory cell; that is different....

--

7-77-777
Evil Sinner!
with software, you repeat same experiment, expecting different results...

Re: A similarity between loop parallelism and thread parallelism?

<sjoisj$mar$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=20753&group=comp.arch#20753

From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: A similarity between loop parallelism and thread parallelism?
Date: Thu, 7 Oct 2021 21:53:40 -0700
Message-ID: <sjoisj$mar$1@dont-email.me>
References: <7bc9d176-6e32-475e-857e-4614b91762c1n@googlegroups.com>
<0bff3a03-7999-4255-b373-f8bd28299f4dn@googlegroups.com>
<sj8h2p$q30$1@dont-email.me>
<ceccf59b-798d-46aa-82b9-0732072f0c10n@googlegroups.com>
In-Reply-To: <ceccf59b-798d-46aa-82b9-0732072f0c10n@googlegroups.com>
 by: Ivan Godard - Fri, 8 Oct 2021 04:53 UTC

On 10/7/2021 5:16 PM, MitchAlsup wrote:
> On Friday, October 1, 2021 at 9:44:44 PM UTC-5, Ivan Godard wrote:
>> On 10/1/2021 6:51 AM, MitchAlsup wrote:
>>> On Thursday, September 30, 2021 at 12:45:23 PM UTC-5, Paul A. Clayton wrote:
>>>> [This post is less complete/coherent than I had hoped but
>>>> still might be worth posting.]
>>>>
>>>> In a loop three kinds of values can be distinguished: in-loop
>>>> constants, counter-like values that can be generated in a
>>>> straightforward manner for any iteration based on in-loop
>>>> constants and an iteration identifier, and values dependent
>>>> on other iterations.
>>>>
>>>> (With respect to "generated in a straightforward manner" I
>>>> was thinking mainly of stride-based values, but
>>>> "straightforward" seems to be more concerned with the cost of
>>>> redundant work — including reading values — relative to the
>>>> parallelism exposed.)
>>>>
>>>> Mitch Alsup's My 66000's Virtual Vector Method declares
>>>> in-loop constants but does not (if I understand correctly)
>>>> indicate values of the second kind (which can be generated
>>>> in parallel with modest extra work), perhaps expecting
>>>> hardware to recognize accumulations and "unroll" them into a
>>>> parallel form (c[i+1]=c[i]+N being unrolled with c[i+2]=
>>>> c[i]+2N, c[i+3]=c[i]+3N, etc.).
>>> <
>>> After thinking about this overnight, loop level parallelism is
>>> fundamentally different than thread level parallelism. Thread
>>> level parallelism can only use memory to carry data from iteration
>>> to iteration and cannot rely on the relative performance of iteration
>>> to iteration that loop level parallelism can rely on.
> <
>> Not true.
> <
> How do threads communicate other than through memory?
> {Note: files/pipes/streams are memory in this scenario!}
>

To take only a widely known hardware example, through HEYU interrupts.
The hardware interrupt itself, not the RPC that is built with it.
