

Re: Vector ISA Categorisation

<ca4a2e3b-0589-4b4a-8447-39128dab7af8n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18723&group=comp.arch#18723

Newsgroups: comp.arch
Date: Tue, 13 Jul 2021 16:46:27 -0700 (PDT)
In-Reply-To: <5deba2bb-fa46-43e7-a8f2-01bc5cffb519n@googlegroups.com>
Message-ID: <ca4a2e3b-0589-4b4a-8447-39128dab7af8n@googlegroups.com>
Subject: Re: Vector ISA Categorisation
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Tue, 13 Jul 2021 23:46 UTC

On Tuesday, July 13, 2021 at 5:15:50 PM UTC-5, Quadibloc wrote:
> On Tuesday, July 13, 2021 at 10:39:42 AM UTC-6, MitchAlsup wrote:
>
> > Instead of a vector ISA, My 66000 created two bookend instruction-modifiers
> > (VEC and LOOP) which define the HW semantics of the vectorized loop. The
> > instructions being vectorized are from the std ISA. So, we get complete
> > vectorization by adding exactly 2 instructions and never need any more.
> I'm now working on laying groundwork for, at some later date, including something
> vaguely resembling this in Concertina II.
>
> http://www.quadibloc.com/arch/ct14int.htm
>
> Since Concertina II is built around fetching 256-bit chunks of code, and decoding
> the instructions in them in parallel, I had to add block header formats in which
> it was possible to label instructions as "prefixed". Essentially, this would apply
> to everything between VEC and LOOP. The idea is to delay decoding and execution
> of the indicated instructions until the VEC instruction is processed, so that the
> actions taken in response to the instructions can be modified.
<
VEC can be completely executed in DECODE stage of the pipeline. The destination
register gets the address of the successive instruction, and the immediate is used
to seed which registers carry iteration dependencies (also a DECODE thing). All of
this is known and manifest in DECODE.
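
A minimal sketch of why VEC needs nothing beyond DECODE, under assumptions that are not from the My 66000 specification: 4-byte instructions, 32 registers, and an immediate that is a plain bitmask with one bit per register. The names `decode_vec` and `LoopContext` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LoopContext:
    loop_top: int        # address the closing LOOP returns to
    carried: frozenset   # registers flagged as carrying a loop dependency

def decode_vec(pc, dest_reg, imm_mask, regs, insn_bytes=4, n_regs=32):
    """Model VEC at decode time; no execute-stage work is involved.

    Assumed encoding (hypothetical): imm_mask has bit r set when
    register r carries a value from one iteration to the next.
    """
    loop_top = pc + insn_bytes      # address of the instruction after VEC
    regs[dest_reg] = loop_top       # destination register receives that address
    carried = frozenset(r for r in range(n_regs) if (imm_mask >> r) & 1)
    return LoopContext(loop_top, carried)
```

Everything the function reads (the PC, the register specifier, the immediate) is available as soon as the instruction bits are, which is the point being made above.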
>
> Of course, in this particular case, only execution, and not decoding, is really
> changed - unless one includes conversion to micro-ops as part of decoding,
> though. Also, the mechanism is a general one.
>
> There will be plenty of differences between what VVM becomes after I get my
> sticky paws on it and what it is in your design, I must confess. So if I give you credit
> for its virtues, I will also naturally include a disclaimer that you are in no way to
> blame for its flaws as I have changed it.
>
> One I've noted: I will explicitly differentiate 'real registers' from 'dataflow nodes'
> by allocating the first 1/4 of register specifier space to the former.
<
I found no particular need to do this as it can easily be done while the loop is being
installed in the stations.
>
> Another is that a branch instruction won't terminate the loop. If one is found between
> my equivalents of VEC and LOOP, an illegal instruction exception will be thrown.
<
My hunch is that you will live to regret this decision.
>
> That's because I'm conceptualizing my version as building a dataflow machine
> inside the CPU, and so branches don't belong. Of course, there is an exit condition,
> but that will be put at the beginning of the sequence, along with the increment clause.
<
So, you are giving up on vectorizing "middle-out" exits of loops ?
>
> So the instruction you call VEC would get called something like BXDS (Build and
> Execute Dataflow Sequence) and LOOP would just become END.
<
Why not just "DO" and "END"?
>
> In other recent changes, I've made it possible to have a header that's only 16 bits
> long, so as to return a block format with mostly 16-bit instructions for extra-compact
> code; this also allowed adding a 48-bit header, so that the extended format instructions
> with arbitrary mixing of 16, 32, 48, and 64-bit instructions no longer require a full
> 64 bits of overhead.
>
> So both of these changes are aimed at allowing more compact code.
>
> John Savard

Re: Vector ISA Categorisation

<ff7dddfd-c15c-4793-a736-5622e0d8b425n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18724&group=comp.arch#18724

Newsgroups: comp.arch
Date: Tue, 13 Jul 2021 16:53:25 -0700 (PDT)
In-Reply-To: <f91f3a66-f4f6-4367-9c70-2d2ba5cf4a37n@googlegroups.com>
Message-ID: <ff7dddfd-c15c-4793-a736-5622e0d8b425n@googlegroups.com>
Subject: Re: Vector ISA Categorisation
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Tue, 13 Jul 2021 23:53 UTC

On Tuesday, July 13, 2021 at 5:25:49 PM UTC-5, Quadibloc wrote:
> On Tuesday, July 13, 2021 at 5:56:10 AM UTC-6, luke.l...@gmail.com wrote:
> > does QTY 2 128-entry 64-bit regfiles with 12 read and
> > 10 write ports sound like it's a good idea?
> Regrettably, no.
>
> Read ports are fairly easy to do. There is simple and well-known circuitry
> for having up to four or so read ports in a memory - and you can extend that
> to an arbitrarily high number just by writing to multiple copies of the memory in
> parallel.
>
> But write ports are a different matter entirely. Yes, memories have been designed
> with more than one write port, as in *two*. But that still involves having to be able
> to serialize the writes if they happen to be in conflict.
>
> A memory with 10 write ports can basically be regarded as impossible.
<
As someone who actually built a 6R6W register file in a 4-metal process (as in
design, SPICE, circuit design, layout, test, and verification), I have to agree
that a 10W file is simply not feasible. 6W was just barely doable and might not
be doable today in FinFET technology. I used every trick of transistor sizing
to make it work across process, temperature, and voltage at 6R6W.
<
Then we used 2 copies of this file to get 12R6W capability.
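
A behavioural sketch of that replication trick, not a description of the actual circuit: every write is broadcast to both copies so they stay identical, and each copy serves its own 6 reads. The class name, the 64-entry size, and the read-before-write ordering within a cycle are assumptions made for illustration.

```python
class ReplicatedRegFile:
    """Two identical 6R6W arrays presenting 12 read ports and 6 write ports."""

    def __init__(self, entries=64, copies=2, reads_per_copy=6, write_ports=6):
        self.copies = [[0] * entries for _ in range(copies)]
        self.reads_per_copy = reads_per_copy
        self.write_ports = write_ports

    def cycle(self, writes, reads):
        """writes: list of (index, value); reads: list of indices.

        Reads return the values held before this cycle's writes land
        (a modelling choice, not a claim about the real design).
        """
        if len(writes) > self.write_ports:
            raise ValueError("more writes than write ports")
        if len({idx for idx, _ in writes}) != len(writes):
            raise ValueError("two writes to the same entry in one cycle")
        if len(reads) > len(self.copies) * self.reads_per_copy:
            raise ValueError("more reads than read ports")
        # Read i is served by copy i // reads_per_copy.
        results = [self.copies[i // self.reads_per_copy][idx]
                   for i, idx in enumerate(reads)]
        # Every write goes to every copy, so the copies never diverge.
        for copy in self.copies:
            for idx, value in writes:
                copy[idx] = value
        return results
```

Read capacity scales with the number of copies; write capacity does not, because each copy still has to accept all 6 writes itself.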
>
> Of course, if a whole bunch of writes, with no intervening reads, arrive at once
> to the same location, one could decide that it doesn't really even matter which of
> those writes actually happens - and so if one declines to support that kind of
> situation, maybe one could make a memory with many write ports that just did
> all the writes in parallel, and just killed superfluous simultaneous writes.
<
This never happens once the registers are renamed: each write to architectural
register k goes to physical register rename[k], and we are guaranteed that
rename[k] is unique in this table.
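
A minimal sketch of that guarantee, assuming a simple free-list renamer (32 architectural and 128 physical registers are made-up sizes, and reclaiming old physical registers is left out). The name `Renamer` is hypothetical.

```python
class Renamer:
    def __init__(self, n_arch=32, n_phys=128):
        self.table = list(range(n_arch))          # rename[k]: arch k -> phys
        self.free = list(range(n_arch, n_phys))   # physical regs not yet mapped

    def rename_dest(self, k):
        """Give a new write to architectural register k its own physical register."""
        phys = self.free.pop(0)   # a free register is, by construction, unmapped
        self.table[k] = phys
        return phys

r = Renamer()
first = r.rename_dest(5)    # two writes to architectural register 5 ...
second = r.rename_dest(5)   # ... land in two different physical registers
```

Because each destination is drawn from the free list, two writes in flight can never target the same physical entry, so the file never sees the conflicting simultaneous writes described in the quoted text.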
>
> The problem with that is if things can be delayed enough to be out of sequence,
> but that's really a problem that is external to the memory...
<
For that to happen, you would have to have successive writes by instructions that did
not fit into the execution window simultaneously.
>
> John Savard

Re: Vector ISA Categorisation

<7dc19ec1-a43d-4868-ad8b-7f03c320012fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18730&group=comp.arch#18730

Newsgroups: comp.arch
Date: Tue, 13 Jul 2021 18:20:35 -0700 (PDT)
In-Reply-To: <ca4a2e3b-0589-4b4a-8447-39128dab7af8n@googlegroups.com>
Message-ID: <7dc19ec1-a43d-4868-ad8b-7f03c320012fn@googlegroups.com>
Subject: Re: Vector ISA Categorisation
From: jsav...@ecn.ab.ca (Quadibloc)
 by: Quadibloc - Wed, 14 Jul 2021 01:20 UTC

On Tuesday, July 13, 2021 at 5:46:29 PM UTC-6, MitchAlsup wrote:
> On Tuesday, July 13, 2021 at 5:15:50 PM UTC-5, Quadibloc wrote:

> > Since Concertina II is built around fetching 256-bit chunks of code, and decoding
> > the instructions in them in parallel, I had to add block header formats in which
> > it was possible to label instructions as "prefixed". Essentially, this would apply
> > to everything between VEC and LOOP. The idea is to delay decoding and execution
> > of the indicated instructions until the VEC instruction is processed, so that the
> > actions taken in response to the instructions can be modified.

> VEC can be completely executed in DECODE stage of the pipeline. The destination
> register gets the address of the successive instruction, and the immediate is used
> to seed which registers carry iteration dependencies (also a DECODE thing). All of
> this is known and manifest in DECODE.

Yes, but that doesn't help me if I'm decoding multiple instructions in
parallel. I still have to prevent the instructions following the VEC from being
decoded as if there is no VEC before the VEC is applied to them.

> > One I've noted: I will explicitly differentiate 'real registers' from 'dataflow nodes'
> > by allocating the first 1/4 of register specifier space to the former.

> I found no particular need to do this as it can easily be done while the loop is being
> installed in the stations.

Basically, my hope is that it will make things easier for people with fewer implementation
skills...

> > Another is that a branch instruction won't terminate the loop. If one is found between
> > my equivalents of VEC and LOOP, an illegal instruction exception will be thrown.

> My hunch is that you will live to regret this decision.

> > That's because I'm conceptualizing my version as building a dataflow machine
> > inside the CPU, and so branches don't belong. Of course, there is an exit condition,
> > but that will be put at the beginning of the sequence, along with the increment clause.

> So, you are giving up on vectorizing "middle-out" exits of loops ?

What I'm defining _isn't a loop_. And I don't want the programmer to make
the mistake of thinking of it as a loop.

However, it _would_ be useful to at least allow instruction predication; here,
the issue comes up that I can't use the existing predication mechanism,
as it's "outside" the vectorized sequence. But if I put a predication prefix
inside the instruction stream, I have no way to indicate doubly-prefixed
instructions. I don't think that's fatal, though; it could be just
a characteristic of how the VEC prefix is handled that predication
prefixes are carefully watched for.

My picture is that there's a preamble that increments index registers
and stuff like that; it's applied between 'iterations'. But once that's done,
the first vectorized instruction in the sequence immediately repeats;
and the second instruction repeats each time whatever it needs is available, and
so on.

Each instruction behaves like one ALU station in a dataflow machine, continuously
repeating, independently of everything else, as its inputs arrive.
Each of them, however, can receive a global signal, which propagates
along with their required inputs, that the sequence has ended.

So the first 20 instructions could all have repeated 10 times before the 21st instruction,
the *ninth* time around, decides it's time to exit from the loop. That's why I
can't allow for early exits.
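
A toy model of that run-ahead, with only two stations and made-up timing: the first station fires every cycle, the second fires every `consumer_period` cycles and is the one that detects the exit condition. The function name and parameters are hypothetical.

```python
from collections import deque

def run_ahead(exit_iteration, consumer_period):
    """Return (iterations the first station has done, iteration that exits)."""
    queue = deque()   # results waiting between the two stations
    produced = 0
    cycle = 0
    while True:
        cycle += 1
        queue.append(produced)   # first station fires every cycle
        produced += 1
        if cycle % consumer_period == 0:
            item = queue.popleft()   # second station fires when its turn comes
            if item == exit_iteration:
                return produced, item

# Exit detected on the ninth iteration (index 8) by a half-speed station.
print(run_ahead(exit_iteration=8, consumer_period=2))   # prints (18, 8)
```

By the time the slower station sees the ninth iteration, the faster one has already run eighteen, so anything it produced beyond the ninth has to be discarded or never made visible.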

I'm not planning to demand this feature be implemented with the kind of sophistication
required for out-of-order execution with precise exceptions.

What!? If these loops are allowed to access _memory_ what's going
to happen if there's a cache miss?

The *dataflow sequence* itself is not interruptible, but it can certainly
wait as long as it needs to for its required inputs. So if some action to
be taken to get those inputs requires an interrupt routine to be executed,
that needs to happen 'elsewhere'.

Does that mean that this feature requires SMT or multicore? No, that would be
a bad idea.

Let's say that the vector sequence does something that requires an interrupt
routine to happen. In that case, the LOOP instruction is set as the
return from the interrupt.

The LOOP instruction waits until the vectorized computation is finished
before allowing execution of the rest of the program to proceed. So after
the return from the interrupt, the needed data has been provided to
whatever point within the vector sequence required it, and the vector
execution continues.

Although the normal ALUs and the pipeline of the CPU are used, for many
purposes, things behave as if the vector sequence is running on a separate
processor, or at least on an additional SMT-style thread. It's outside of
normal program execution, because in effect the CPU is 'rewired' to become
a dataflow machine.

Of course, that's only an illusion, since the CPU works normally while the
interrupt routine runs.

John Savard

Re: Vector ISA Categorisation

<f6590a76-2e16-486d-baf8-879ebdf36266n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18733&group=comp.arch#18733

Newsgroups: comp.arch
Date: Tue, 13 Jul 2021 19:28:58 -0700 (PDT)
In-Reply-To: <7dc19ec1-a43d-4868-ad8b-7f03c320012fn@googlegroups.com>
Message-ID: <f6590a76-2e16-486d-baf8-879ebdf36266n@googlegroups.com>
Subject: Re: Vector ISA Categorisation
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Wed, 14 Jul 2021 02:28 UTC

On Tuesday, July 13, 2021 at 8:20:37 PM UTC-5, Quadibloc wrote:
> On Tuesday, July 13, 2021 at 5:46:29 PM UTC-6, MitchAlsup wrote:
> > On Tuesday, July 13, 2021 at 5:15:50 PM UTC-5, Quadibloc wrote:
>
> > > Since Concertina II is built around fetching 256-bit chunks of code, and decoding
> > > the instructions in them in parallel, I had to add block header formats in which
> > > it was possible to label instructions as "prefixed". Essentially, this would apply
> > > to everything between VEC and LOOP. The idea is to delay decoding and execution
> > > of the indicated instructions until the VEC instruction is processed, so that the
> > > actions taken in response to the instructions can be modified.
>
> > VEC can be completely executed in DECODE stage of the pipeline. The destination
> > register gets the address of the successive instruction, and the immediate is used
> > to seed which registers carry iteration dependencies (also a DECODE thing). All of
> > this is known and manifest in DECODE.
> Yes, but that doesn't help me if I'm decoding multiple instructions in
> parallel. I still have to prevent the instructions following the VEC from being
> decoded as if there is no VEC before the VEC is applied to them.
> > > One I've noted: I will explicitly differentiate 'real registers' from 'dataflow nodes'
> > > by allocating the first 1/4 of register specifier space to the former.
>
> > I found no particular need to do this as it can easily be done while the loop is being
> > installed in the stations.
> Basically, my hope is that it will make things easier for people with fewer implementation
> skills...
<
These are not the people you want building your first implementation........
<
> > > Another is that a branch instruction won't terminate the loop. If one is found between
> > > my equivalents of VEC and LOOP, an illegal instruction exception will be thrown.
>
> > My hunch is that you will live to regret this decision.
>
> > > That's because I'm conceptualizing my version as building a dataflow machine
> > > inside the CPU, and so branches don't belong. Of course, there is an exit condition,
> > > but that will be put at the beginning of the sequence, along with the increment clause.
>
> > So, you are giving up on vectorizing "middle-out" exits of loops ?
<
> What I'm defining _isn't a loop_. And I don't want the programmer to make
> the mistake of thinking of it as a loop.
>
> However, it _would_ be useful to at least allow instruction predication; here,
> the issue comes up that I can't use the existing predication mechanism,
> as it's "outside" the vectorized sequence.
<
Predication casting instructions are NOT CONTROL TRANSFER instructions!
They are conditional execution casting instruction-modifiers.
Since control is not transferred, these can be used inside vectorized loops!
<
> But if I put a predication prefix
> inside the instruction stream, I have no way to indicate doubly-prefixed
> instructions. I don't think that's fatal, though; it could be just
> a characteristic of how the VEC prefix is handled that predication
> prefixes are carefully watched for.
>
> My picture is that there's a preamble that increments index registers
> and stuff like that; it's applied between 'iterations'. But once that's done,
> the first vectorized instruction in the sequence immediately repeats;
> and the second instruction repeats each time whatever it needs is available, and
> so on.
>
> Each instruction behaves like one ALU station in a dataflow machine, continuously
> repeating, independently of everything else, as its inputs arrive.
> Each of them, however, can receive a global signal which propagates
> with their required inputs that the sequence has ended.
<
Basically correct: each instruction in a vectorized loop can execute at least once
per cycle when data-flow dependencies allow. So, even a 1-wide machine can
execute 5, 10, 20, or 40 instructions per cycle. Certainly wider machines can
perform at even higher rates.
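
A minimal sketch of that firing rule, assuming a simple chain in which each station consumes the previous station's result and a result becomes visible one cycle after it is produced. The name `simulate` and the chain topology are illustrative assumptions, not part of My 66000 or Concertina II. It counts how many instructions fire in each cycle:

```python
def simulate(num_stations, iterations):
    """Return the number of instructions fired in each cycle."""
    done = [0] * num_stations          # iterations completed by each station
    fired_per_cycle = []
    while done[-1] < iterations:
        snapshot = list(done)          # results become visible next cycle
        fired = 0
        for i in range(num_stations):
            # Station 0 has no upstream producer; the others wait for station i-1.
            upstream_ready = iterations if i == 0 else snapshot[i - 1]
            if snapshot[i] < iterations and snapshot[i] < upstream_ready:
                done[i] += 1           # this station fires once this cycle
                fired += 1
        fired_per_cycle.append(fired)
    return fired_per_cycle


counts = simulate(num_stations=4, iterations=6)
print(counts)
```

With 4 stations and 6 iterations the per-cycle counts ramp up to 4 and back down. While the chain is full, the first station is three iterations ahead of the last, which is the run-ahead behaviour behind the early-exit question.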
>
> So the first 20 instructions could all have repeated 10 times before the 21st instruction,
> the *ninth* time around, decides it's time to exit from the loop. That's why I
> can't allow for early exits.
<
They might have completed execution, but they have not delivered results.
And this is key !
>
> I'm not planning to demand this feature be implemented with the kind of sophistication
> required for out-of-order execution with precise exceptions.
<
My 66000 vectorized loops actually execute in a minimal partial order, so
cleaning up the mess goes from exponential to quadratic.
>
> What!? If these loops are allowed to access _memory_ what's going
> to happen if there's a cache miss?
<
Cache lines are fetched; if the memory reference did not complete, the
arriving data is not placed in the cache (and might have to be sent back
whence it came).
>
> The *dataflow sequence* itself is not interruptible, but it can certainly
> wait as long as it needs to for its required inputs. So if some action to
> be taken to get those inputs requires an interrupt routine to be executed,
> that needs to happen 'elsewhere'.
>
> Does that mean that this feature requires SMT or multicore? No, that would be
> a bad idea.
<
Vectorized My 66000 loops achieve a higher gain from lower capability
implementations (1-wide in order) than from GBOoO. This should take
pressure off of having to build the super GBOoO machine.
>
> Let's say that the vector sequence does something that requires an interrupt
> routine to happen. In that case, the LOOP instruction is set as the
> return from the interrupt.
<
No, the instruction causing the exception is the one pointed at by IP.
Execution returns and finishes the loop in scalar mode before setting
up the vectorized loop again.
>
> The LOOP instruction waits until the vectorized computation is finished
> before allowing execution of the rest of the program to proceed.
>
In practice, yes. Mandated: no. But the immediate associated with VEC
tells the instructions following the loop whether they are dependent on
calculations taking place within the loop (or not). So figuring out who can
run and who cannot is fairly easy. Conversely, you are going to need
many of the reservation stations to manage the iteration to iteration
characteristics of the vectorized loop. So, these resources might not
help the instructions following the loop anyway.
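
A minimal sketch of that check. The encoding of the VEC immediate is not given in this thread, so the sketch assumes one bit per register, with a set bit meaning "this register is written by the loop". The function names are hypothetical.

```python
def loop_written_registers(vec_immediate, num_registers=32):
    """Decode the assumed one-bit-per-register mask into a set of register numbers."""
    return {r for r in range(num_registers) if (vec_immediate >> r) & 1}


def can_run_ahead(source_registers, vec_immediate):
    """True if an instruction after the loop reads nothing the loop produces."""
    return not (set(source_registers) & loop_written_registers(vec_immediate))


imm = 0b10110                       # assumed mask: R1, R2 and R4 depend on the loop
print(can_run_ahead([3, 5], imm))   # reads only registers outside the mask
print(can_run_ahead([2, 7], imm))   # reads R2, which the loop produces
```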
<
> So after
> the return from the interrupt, the needed data has been provided to
> whatever point within the vector sequence required it, and the vector
> execution continues.
<
When an exception is raised, registers obtain the contents as if the
loop had been processed in scalar mode, and thus, one returns to
the loop in scalar form with everything already set up and ready to go again.
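
A minimal sketch of that behaviour, assuming results are held back until every older iteration has retired. The loop body, the `lookahead` group size, and all names are hypothetical; this models only the visible state, not how My 66000 hardware does it.

```python
class Fault(Exception):
    """Stands in for any exception raised by a loop-body instruction."""


FAULT = object()   # marker for "this iteration raised an exception"


def body(x):
    # Hypothetical loop body: doubling, with None modelling a faulting access.
    if x is None:
        raise Fault()
    return x * 2


def run_vectorized(src, dst, start=0, lookahead=4):
    """Run ahead in groups; return the index of the first unretired iteration."""
    i, n = start, len(src)
    while i < n:
        results = []                      # executed, but not yet delivered
        for j in range(i, min(i + lookahead, n)):
            try:
                results.append(body(src[j]))
            except Fault:
                results.append(FAULT)
        for value in results:             # deliver strictly in program order
            if value is FAULT:
                return i                  # younger results are thrown away
            dst[i] = value
            i += 1
    return n


def run_scalar(src, dst, start):
    """Finish the remaining iterations one at a time."""
    for i in range(start, len(src)):
        dst[i] = body(src[i])


src = [1, None, 3, 4, 5, 6]
dst = [0] * len(src)
stopped = run_vectorized(src, dst)   # iterations 2 and 3 executed, never delivered
state_at_fault = list(dst)           # looks exactly like a scalar loop stopped at 1
src[stopped] = 2                     # the handler repairs the faulting input
run_scalar(src, dst, stopped)
```

Iterations 2 and 3 were computed before the fault at iteration 1 was noticed, but because they were never delivered, `state_at_fault` is what a scalar loop would have left behind.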
>
> Although the normal ALUs and the pipeline of the CPU are used, for many
> purposes, things behave as if the vector sequence is running on a separate
> processor, or at least on an additional SMT-style thread. It's outside of
> normal program execution, because in effect the CPU is 'rewired' to become
> a dataflow machine.
<
The HW guy will see the scalar-to-SIMD transition taking place "near" the cache.
The SW guy will see performance improve but will not be enlightened as to where
all that performance came from.
>
> Of course, that's only an illusion, since the CPU works normally while the
> interrupt routine runs.
>
> John Savard

Re: Vector ISA Categorisation

<6fc3a65d-60b0-4704-a8c5-be4e5500bd82n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18734&group=comp.arch#18734

 by: Quadibloc - Wed, 14 Jul 2021 02:53 UTC

On Tuesday, July 13, 2021 at 8:28:59 PM UTC-6, MitchAlsup wrote:
> On Tuesday, July 13, 2021 at 8:20:37 PM UTC-5, Quadibloc wrote:

> > Basically, my hope is that it will make things easier for people with less implementation
> > skills...

> These are not the people you want building your first implementation........

That's true. But I want it to be capable of wide adoption...

> > > So, you are giving up on vectorizing "middle-out" exits of loops ?

> > What I'm defining _isn't a loop_. And I don't want the programmer to make
> > the mistake of thinking of it as a loop.

> > However, it _would_ be useful to at least allow instruction predication; here,
> > the issue comes up that I can't use the existing predication mechanism,
> > as it's "outside" the vectorized sequence.

> Predication casting instructions are NOT CONTROL TRANSFER instructions !
> They are conditional execution casting instruction-modifiers.
> Since control is not transferred, these can be used inside vectorized loops!

No, that's quite true, they aren't. But they allow one to, based on a condition,
execute, or not execute, some code. That's why I thought of them.

Since, however, the sequence is being executed in a dataflow manner, there
is no real control flow. I suppose I could use the sequence to imply a data
path for the condition code bits... but since I have to build an entirely new
predication mechanism anyways, I might as well build an appropriate one...
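
A minimal sketch of predication as a data dependency rather than a control transfer, assuming the predicate is just another value that flows to the instructions that need it. The tiny interpreter and its instruction tuple layout are hypothetical, not the Concertina II or My 66000 encoding.

```python
import operator


def run_predicated(program, regs):
    """Step through every instruction in order; predicated-off ones write nothing."""
    for dest, op, srcs, pred in program:
        if pred is not None:
            pred_reg, wanted = pred
            if bool(regs[pred_reg]) != wanted:
                continue               # nullified: no write, and no jump either
        regs[dest] = op(*(regs[s] for s in srcs))
    return regs


# |a - b| without a branch: both subtractions are in the sequence,
# and the predicate p decides which one delivers its result.
ABS_DIFF = [
    ("p", operator.lt,  ("a", "b"), None),          # p = a < b
    ("d", operator.sub, ("a", "b"), ("p", False)),  # if not p: d = a - b
    ("d", operator.sub, ("b", "a"), ("p", True)),   # if p:     d = b - a
]

print(run_predicated(ABS_DIFF, {"a": 3, "b": 7})["d"])
print(run_predicated(ABS_DIFF, {"a": 9, "b": 2})["d"])
```

The sequence is always walked front to back; the predicate only selects which instructions deliver a result, so it can travel along the same paths as any other operand.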

> > So the first 20 instructions could all have repeated 10 times before the 21st instruction,
> > the *ninth* time around, decides it's time to exit from the loop. That's why I
> > can't allow for early exits.

> They might have completed execution, but they have not delivered results.
> And this is key !

Oh, yes, they could, if they were memory-reference store instructions.

> In practice, yes. Mandated: no. But the immediate associated with VEC
> tells the instructions following the loop whether they are dependent on
> calculations taking place within the loop (or not). So figuring out who can
> run and who cannot is fairly easy. Conversely, you are going to need
> many of the reservation stations to manage the iteration to iteration
> characteristics of the vectorized loop. So, these resources might not
> help the instructions following the loop anyway.

In fact, I expect to need so many reservation stations that I might add
a bunch of extra ones that are too simple to be used for OoO execution.

John Savard

Re: Vector ISA Categorisation

<9cb2db22-9778-4e1b-bcc4-71005c225c73n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18735&group=comp.arch#18735

 by: Quadibloc - Wed, 14 Jul 2021 02:57 UTC

On Tuesday, July 13, 2021 at 8:28:59 PM UTC-6, MitchAlsup wrote:
> On Tuesday, July 13, 2021 at 8:20:37 PM UTC-5, Quadibloc wrote:

> > So after
> > the return from the interrupt, the needed data has been provided to
> > whatever point within the vector sequence required it, and the vector
> > execution continues.

> When an exception is raised, registers obtain the contents as if the
> loop had been processed in scalar mode, and thus, one returns to
> the loop in scalar form with everything already set up and ready to go again.

That, in a nutshell, is _precisely_ the capability that I have no intention
of requiring people implementing the Concertina II architecture to
include. Instead, implementations would normally be incapable of
unravelling things from vector mode back to scalar mode, which is why
I basically regard a vector computation of this type as going on
independently in an uninterruptible fashion.

I'm trying to keep things very simple. Despite appearances.

John Savard

Re: Vector ISA Categorisation

<a688aab7-5ba9-4f22-8817-1060c3b29375n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18736&group=comp.arch#18736

 by: MitchAlsup - Wed, 14 Jul 2021 03:04 UTC

On Tuesday, July 13, 2021 at 9:57:52 PM UTC-5, Quadibloc wrote:
> On Tuesday, July 13, 2021 at 8:28:59 PM UTC-6, MitchAlsup wrote:
> > On Tuesday, July 13, 2021 at 8:20:37 PM UTC-5, Quadibloc wrote:
>
> > > So after
> > > the return from the interrupt, the needed data has been provided to
> > > whatever point within the vector sequence required it, and the vector
> > > execution continues.
>
> > When an exception is raised, registers obtain the contents as if the
> > loop had been processed in scalar mode, and thus, one returns to
> > the loop in scalar form with everything already set up and ready to go again.
<
> That, in a nutshell, is _precisely_ the capability that I have no intention
> of requiring people implementing the Concertina II architecture to
> include. Instead, implementations would normally be incapable of
> unravelling things from vector mode back to scalar mode, which is why
> I basically regard a vector computation of this type as going on
> independently in an uninterruptible fashion.
<
How, pray tell, do you intend on mandating this ??
>
> I'm trying to keep things very simple. Despite appearances.
>
> John Savard

Re: Vector ISA Categorisation

<scm0j1$935$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18741&group=comp.arch#18741

 by: Ivan Godard - Wed, 14 Jul 2021 06:35 UTC

On 7/13/2021 6:20 PM, Quadibloc wrote:
> On Tuesday, July 13, 2021 at 5:46:29 PM UTC-6, MitchAlsup wrote:
>> On Tuesday, July 13, 2021 at 5:15:50 PM UTC-5, Quadibloc wrote:
>
>>> Since Concertina II is built around fetching 256-bit chunks of code, and decoding
>>> the instructions in them in parallel, I had to add block header formats in which
>>> it was possible to label instructions as "prefixed". Essentially, this would apply
>>> to everything between VEC and LOOP. The idea is to delay decoding and execution
>>> of the indicated instructions until the VEC instruction is processed, so that the
>>> actions taken in response to the instructions can be modified.
>
>> VEC can be completely executed in DECODE stage of the pipeline. The destination
>> register gets the address of the successive instruction, and the immediate is used
>> to seed which registers carry iteration dependencies (also a DECODE thing). All of
>> this is known and manifest in DECODE.
>
> Yes, but that doesn't help me if I'm decoding multiple instructions in
> parallel. I still have to prevent the instructions following the VEC from being
> decoded as if there is no VEC before the VEC is applied to them.
>
>>> One I've noted: I will explicitly differentiate 'real registers' from 'dataflow nodes'
>>> by allocating the first 1/4 of register specifier space to the former.
>
>> I found no particular need to do this as it can easily be done while the loop is being
>> installed in the stations.
>
> Basically, my hope is that it will make things easier for people with less implementation
> skills...
>
>>> Another is that a branch instruction won't terminate the loop. If one is found between
>>> my equivalents of VEC and LOOP, an illegal instruction exception will be thrown.
>
>> My hunch is that you will live to regret this decision.
>
>>> That's because I'm conceptualizing my version as building a dataflow machine
>>> inside the CPU, and so branches don't belong. Of course, there is an exit condition,
>>> but that will be put at the beginning of the sequence, along with the increment clause.
>
>> So, you are giving up on vectorizing "middle-out" exits of loops ?
>
> What I'm defining _isn't a loop_. And I don't want the programmer to make
> the mistake of thinking of it as a loop.
>
> However, it _would_ be useful to at least allow instruction predication; here,
> the issue comes up that I can't use the existing predication mechanism,
> as it's "outside" the vectorized sequence. But if I put a predication prefix
> inside the instruction stream, I have no way to indicate doubly-prefixed
> instructions. I don't think that's fatal, though; it could be just
> a characteristic of how the VEC prefix is handled that predication
> prefixes are carefully watched for.
>
> My picture is that there's a preamble that increments index registers
> and stuff like that; it's applied between 'iterations'. But once that's done,
> the first vectorized instruction in the sequence immediately repeats;
> and the second instruction repeats each time whatever it needs is available, and
> so on.
>
> Each instruction behaves like one ALU station in a dataflow machine, continuously
> repeating, independently of everything else, as its inputs arrive.
> Each of them, however, can receive a global signal which propagates
> with their required inputs that the sequence has ended.
>
> So the first 20 instructions could all have repeated 10 times before the 21st instruction,
> the *ninth* time around, decides it's time to exit from the loop. That's why I
> can't allow for early exits.
>
> I'm not planning to demand this feature be implemented with the kind of sophistication
> required for out-of-order execution with precise exceptions.
>
> What!? If these loops are allowed to access _memory_ what's going
> to happen if there's a cache miss?
>
> The *dataflow sequence* itself is not interruptible, but it can certainly
> wait as long as it needs to for its required inputs. So if some action to
> be taken to get those inputs requires an interrupt routine to be executed,
> that needs to happen 'elsewhere'.
>
> Does that mean that this feature requires SMT or multicore? No, that would be
> a bad idea.
>
> Let's say that the vector sequence does something that requires an interrupt
> routine to happen. In that case, the LOOP instruction is set as the
> return from the interrupt.
>
> The LOOP instruction waits until the vectorized computation is finished
> before allowing execution of the rest of the program to proceed. So after
> the return from the interrupt, the needed data has been provided to
> whatever point within the vector sequence required it, and the vector
> execution continues.
>
> Although the normal ALUs and the pipeline of the CPU are used, for many
> purposes, things behave as if the vector sequence is running on a separate
> processor, or at least on an additional SMT-style thread. It's outside of
> normal program execution, because in effect the CPU is 'rewired' to become
> a dataflow machine.
>
> Of course, that's only an illusion, since the CPU works normally while the
> interrupt routine runs.
>
> John Savard
>

How would you express this in C++ source?

Re: Vector ISA Categorisation

<0785c4c1-b891-4b0a-8fa9-d712ea178e1en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18757&group=comp.arch#18757

 by: Quadibloc - Wed, 14 Jul 2021 16:32 UTC

On Tuesday, July 13, 2021 at 9:04:16 PM UTC-6, MitchAlsup wrote:
> On Tuesday, July 13, 2021 at 9:57:52 PM UTC-5, Quadibloc wrote:

> > That, in a nutshell, is _precisely_ the capability that I have no intention
> > of requiring people implementing the Concertina II architecture to
> > include. Instead, implementations would normally be incapable of
> > unravelling things from vector mode back to scalar mode, which is why
> > I basically regard a vector computation of this type as going on
> > independently in an uninterruptible fashion.

> How, pray tell, do you intend on mandating this ??

I probably am not understanding your question.

If the spec says that implementations aren't required to be able to do the
unravelling, and there are no instructions that are provided to benefit from
it... there's no problem with a simple implementation that emulates the
full-fledged dataflow setup by just doing a loop which _is_ theoretically
fully capable of going back to the scalar mode... which it never really left.

The reason I'm saying, though, that implementations aren't required to be
able to unravel is because for somewhat higher-end implementations that
do the actual dataflow stuff in hardware - unravelling imposes additional costs.

Yes, those costs perhaps come "for free" in an OoO system - which is why
you designed VVM the way you did.

But remember: in order to implement in-line immediate values in the fashion
that I did, I organized programs into 256-bit blocks. And *then*, because I
had the blocks, I threw in VLIW features - explicit indication of parallelism,
and instruction predication, like in a DSP.

That means non-OoO implementations. Non-OoO implementations that are
still aiming at performance, so they might implement the dataflow in hardware.

So I'm saying they're allowed to do that - to implement the dataflow in minimal
hardware, without adding on the OoO capabilities they've eschewed.

*That's* the rationale, however flawed, behind my design decision.

John Savard

Re: Vector ISA Categorisation

<24c6fe0d-fc5f-4e99-bb5a-3880b7cc7df2n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18758&group=comp.arch#18758

 by: Quadibloc - Wed, 14 Jul 2021 16:35 UTC

On Wednesday, July 14, 2021 at 12:35:16 AM UTC-6, Ivan Godard wrote:
> On 7/13/2021 6:20 PM, Quadibloc wrote:

> > Although the normal ALUs and the pipeline of the CPU are used, for many
> > purposes, things behave as if the vector sequence is running on a separate
> > processor, or at least on an additional SMT-style thread. It's outside of
> > normal program execution, because in effect the CPU is 'rewired' to become
> > a dataflow machine.

> > Of course, that's only an illusion, since the CPU works normally while the
> > interrupt routine runs.

> How would you express this in C++ source?

That's not a question I've really thought about.

Presumably, a simulation in a conventional HLL, if that language had
the ability to spawn threads, could put the dataflow engine in a separate
thread when it's used.
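
A rough sketch of what such a simulation might look like, assuming nothing about the real architecture: the host program hands the vector sequence to a second thread and carries on with its own scalar work until it needs the result. The function names are hypothetical, the loop body is only a placeholder, and `std::thread` is just one way a conventional HLL could do this.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the dataflow engine: y[i] = a*x[i] + y[i].
void vector_sequence(double a, const std::vector<double>& x, std::vector<double>& y)
{
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = a * x[i] + y[i];
}

// The vector sequence gets its own thread while the caller keeps doing
// unrelated scalar work, then waits for the vector result.
long run_with_vector_thread(double a, const std::vector<double>& x, std::vector<double>& y)
{
    std::thread engine(vector_sequence, a, std::cref(x), std::ref(y));

    long scalar_work = 0;            // ordinary program execution continues
    for (long i = 1; i <= 100; ++i)
        scalar_work += i;

    engine.join();                   // y must not be touched before this point
    return scalar_work;
}
```

The caller does not read or write `y` between starting the thread and `join()`, which is the software analogue of treating the vector computation as running off to one side.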

As long as it can be expressed in VHDL or in Verilog, it can be realized
in hardware, so I hadn't really cared if everything in the architecture had
a C++ equivalent.

Of course, I may be misunderstanding your question.

John Savard

Re: Vector ISA Categorisation

<8f0d50a6-897a-4924-84f9-f790395c90e9n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=18761&group=comp.arch#18761

X-Received: by 2002:a37:9504:: with SMTP id x4mr10610259qkd.235.1626281408263; Wed, 14 Jul 2021 09:50:08 -0700 (PDT)
X-Received: by 2002:aca:5cd7:: with SMTP id q206mr7878184oib.99.1626281408028; Wed, 14 Jul 2021 09:50:08 -0700 (PDT)
Path: i2pn2.org!rocksolid2!news.neodome.net!news.theuse.net!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 14 Jul 2021 09:50:07 -0700 (PDT)
In-Reply-To: <0785c4c1-b891-4b0a-8fa9-d712ea178e1en@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:9888:255f:9776:664a; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:9888:255f:9776:664a
References: <sb6s70$dip$1@newsreader4.netcologne.de> <sb99gi$1r5$1@newsreader4.netcologne.de> <sbh665$sht$1@dont-email.me> <sbubiu$unp$1@dont-email.me> <sbudg8$aje$1@dont-email.me> <sc12qv$8ka$1@dont-email.me> <sc186o$gns$1@dont-email.me> <sc5cg5$a3p$1@dont-email.me> <sc5fh8$p7q$1@dont-email.me> <sc8pjr$8ib$1@dont-email.me> <sc8uoc$2tc$1@dont-email.me> <sc9iib$3ei$1@dont-email.me> <scac7e$ph4$1@dont-email.me> <63597d55-f5bd-42fc-bae3-38155d072128n@googlegroups.com> <scan92$k7m$1@dont-email.me> <dc5c8894-e51d-46a8-b682-9784fb8ac205n@googlegroups.com> <9020308c-08e6-4f4f-b29f-e4320c19b1c2n@googlegroups.com> <2336ffa3-df90-461a-a1cb-51147dfc504dn@googlegroups.com> <9a596e40-0c21-4b4c-83b1-56c745dd199cn@googlegroups.com> <sceb52$b4t$1@dont-email.me> <0e87d075-e620-4173-accc-e16e0adbba35n@googlegroups.com> <scgi37$u7v$1@dont-email.me> <57a0784c-b114-460e-af96-9930e94441f3n@googlegroups.com> <schqua$k02$1@dont-email.me> <999476a5-100e-49dc-9a06-4550a7c928f0n@googlegroups.com> <scjk6o$rme$1@dont-email.me> <fbf6751b-6b88-4283-92ea-1fff4b7fe200n@googlegroups.com> <863e6886-f580-4cce-aaef-ddf8d6baa4dfn@googlegroups.com> <5deba2bb-fa46-43e7-a8f2-01bc5cffb519n@googlegroups.com> <ca4a2e3b-0589
-4b4a-8447-39128dab7af8n@googlegroups.com> <7dc19ec1-a43d-4868-ad8b-7f03c320012fn@googlegroups.com> <f6590a76-2e16-486d-baf8-879ebdf36266n@googlegroups.com> <9cb2db22-9778-4e1b-bcc4-71005c225c73n@googlegroups.com> <a688aab7-5ba9-4f22-8817-1060c3b29375n@googlegroups.com> <0785c4c1-b891-4b0a-8fa9-d712ea178e1en@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8f0d50a6-897a-4924-84f9-f790395c90e9n@googlegroups.com>
Subject: Re: Vector ISA Categorisation
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 14 Jul 2021 16:50:08 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 52
 by: MitchAlsup - Wed, 14 Jul 2021 16:50 UTC

On Wednesday, July 14, 2021 at 11:32:38 AM UTC-5, Quadibloc wrote:
> On Tuesday, July 13, 2021 at 9:04:16 PM UTC-6, MitchAlsup wrote:
> > On Tuesday, July 13, 2021 at 9:57:52 PM UTC-5, Quadibloc wrote:
>
> > > That, in a nutshell, is _precisely_ the capability that I have no intention
> > > of requiring people implementing the Concertina II architecture to
> > > include. Instead, implementations would normally be incapable of
> > > unravelling things from vector mode back to scalar mode, which is why
> > > I basically regard a vector computation of this type as going on
> > > independently in an uninterruptible fashion.
>
> > How, pray tell, do you intend on mandating this ??
> I probably am not understanding your question.
<
How are you going to mandate exception free looping ? "in an uninterruptible
fashion". Either you drop back to scalar, or you have to save a lot of state.
>
> If the spec says that implementations aren't required to be able to do the
> unravelling, and there are no instructions that are provided to benefit from
> it... there's no problem with a simple implementation that emulates the
> full-fledged dataflow setup by just doing a loop which _is_ theoretically
> fully capable of going back to the scalar mode... which it never really left.
>
> The reason I'm saying, though, that implementations aren't required to be
> able to unravel is because for somewhat higher-end implementations that
> do the actual dataflow stuff in hardware - unravelling imposes additional costs.
<
Exactly the same kinds of costs that precise exceptions add.
Everyone and his brother have been willing to absorb these costs for 40 years.....
>
> Yes, those costs perhaps come "for free" in an OoO system - which is why
> you designed VVM the way you did.
<
I actually designed VVM for the 1-wide InOrder machine--that it still works well
for the wide-superScalar machines is a distinct benefit.
>
> But remember: in order to implement in-line immediate values in the fashion
> that I did, I organized programs into 256-bit blocks. And *then*, because I
> had the blocks, I threw in VLIW features - explicit indication of parallelism,
> and instruction predication, like in a DSP.
>
> That means non-OoO implementations. Non-OoO implementations that are
> still aiming at performance, so they might implement the dataflow in hardware.
<
I chose otherwise:: GB OoO is in, VLIW is not. Mostly because the history
of VLIW successes permeates the world of comp.arch:: NOT.
>
> So I'm saying they're allowed to do that - to implement the dataflow in minimal
> hardware, without adding on the OoO capabilities they've eschewed.
>
> *That's* the rationale, however flawed, behind my design decision.
>
> John Savard

Re: Vector ISA Categorisation

<3620a96a-b37b-47b1-b362-1e15f87f75b3n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=18763&group=comp.arch#18763

X-Received: by 2002:a0c:9ac7:: with SMTP id k7mr11847864qvf.49.1626283459890;
Wed, 14 Jul 2021 10:24:19 -0700 (PDT)
X-Received: by 2002:a9d:3b0:: with SMTP id f45mr9451213otf.5.1626283459643;
Wed, 14 Jul 2021 10:24:19 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 14 Jul 2021 10:24:19 -0700 (PDT)
In-Reply-To: <8f0d50a6-897a-4924-84f9-f790395c90e9n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fa3c:a000:98d2:bbe0:a241:7dc2;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fa3c:a000:98d2:bbe0:a241:7dc2
References: <sb6s70$dip$1@newsreader4.netcologne.de> <sb99gi$1r5$1@newsreader4.netcologne.de>
<sbh665$sht$1@dont-email.me> <sbubiu$unp$1@dont-email.me> <sbudg8$aje$1@dont-email.me>
<sc12qv$8ka$1@dont-email.me> <sc186o$gns$1@dont-email.me> <sc5cg5$a3p$1@dont-email.me>
<sc5fh8$p7q$1@dont-email.me> <sc8pjr$8ib$1@dont-email.me> <sc8uoc$2tc$1@dont-email.me>
<sc9iib$3ei$1@dont-email.me> <scac7e$ph4$1@dont-email.me> <63597d55-f5bd-42fc-bae3-38155d072128n@googlegroups.com>
<scan92$k7m$1@dont-email.me> <dc5c8894-e51d-46a8-b682-9784fb8ac205n@googlegroups.com>
<9020308c-08e6-4f4f-b29f-e4320c19b1c2n@googlegroups.com> <2336ffa3-df90-461a-a1cb-51147dfc504dn@googlegroups.com>
<9a596e40-0c21-4b4c-83b1-56c745dd199cn@googlegroups.com> <sceb52$b4t$1@dont-email.me>
<0e87d075-e620-4173-accc-e16e0adbba35n@googlegroups.com> <scgi37$u7v$1@dont-email.me>
<57a0784c-b114-460e-af96-9930e94441f3n@googlegroups.com> <schqua$k02$1@dont-email.me>
<999476a5-100e-49dc-9a06-4550a7c928f0n@googlegroups.com> <scjk6o$rme$1@dont-email.me>
<fbf6751b-6b88-4283-92ea-1fff4b7fe200n@googlegroups.com> <863e6886-f580-4cce-aaef-ddf8d6baa4dfn@googlegroups.com>
<5deba2bb-fa46-43e7-a8f2-01bc5cffb519n@googlegroups.com> <ca4a2e3b-0589-4b4a-8447-39128dab7af8n@googlegroups.com>
<7dc19ec1-a43d-4868-ad8b-7f03c320012fn@googlegroups.com> <f6590a76-2e16-486d-baf8-879ebdf36266n@googlegroups.com>
<9cb2db22-9778-4e1b-bcc4-71005c225c73n@googlegroups.com> <a688aab7-5ba9-4f22-8817-1060c3b29375n@googlegroups.com>
<0785c4c1-b891-4b0a-8fa9-d712ea178e1en@googlegroups.com> <8f0d50a6-897a-4924-84f9-f790395c90e9n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3620a96a-b37b-47b1-b362-1e15f87f75b3n@googlegroups.com>
Subject: Re: Vector ISA Categorisation
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 14 Jul 2021 17:24:19 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Quadibloc - Wed, 14 Jul 2021 17:24 UTC

On Wednesday, July 14, 2021 at 10:50:09 AM UTC-6, MitchAlsup wrote:

> How are you going to mandate exception free looping ? "in an uninterruptible
> fashion". Either you drop back to scalar, or you have to save a lot of state.

I intend to do neither.

Instead, if, say, an I/O device throws a high-priority interrupt, yes, the processor
handles it.

But the vector unit just keeps chugging away. The only way to stop it is by
pulling the plug.

That may be a slight exaggeration: no doubt there will be means to _abort_
what it is doing if something happened to make it no longer relevant, so that the
vector unit is freed up for another computation.

> I chose otherwise:: GB OoO is in, VLIW is not. Mostly because the history
> of VLIW successes permeates the world of comp.arch:: NOT.

And my choice is... in-order is in, VLIW is in, OoO is in, GB OoO is in.

It's up to the implementor to decide if a VLIW-centric implementation is worth
doing.

And while the Itanium is perhaps legendary as a failure, and there may be a few
other VLIW designs that have fallen by the wayside, TI continues to make and sell
the TMS320C6000... so I *think* I have reason to conclude that VLIW designs have
not _all_ been disastrous failures, despite appearances.

They have a niche - and that niche is DSP applications, _not_ cutting-edge
supercomputing.

John Savard

Re: Vector ISA Categorisation

<7d83f656-02c2-4f18-b466-0483e06c3cb0n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=18764&group=comp.arch#18764

X-Received: by 2002:a05:622a:34c:: with SMTP id r12mr10302708qtw.196.1626284123417;
Wed, 14 Jul 2021 10:35:23 -0700 (PDT)
X-Received: by 2002:a9d:4c9a:: with SMTP id m26mr8704477otf.110.1626284123172;
Wed, 14 Jul 2021 10:35:23 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 14 Jul 2021 10:35:22 -0700 (PDT)
In-Reply-To: <3620a96a-b37b-47b1-b362-1e15f87f75b3n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fa3c:a000:98d2:bbe0:a241:7dc2;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fa3c:a000:98d2:bbe0:a241:7dc2
References: <sb6s70$dip$1@newsreader4.netcologne.de> <sb99gi$1r5$1@newsreader4.netcologne.de>
<sbh665$sht$1@dont-email.me> <sbubiu$unp$1@dont-email.me> <sbudg8$aje$1@dont-email.me>
<sc12qv$8ka$1@dont-email.me> <sc186o$gns$1@dont-email.me> <sc5cg5$a3p$1@dont-email.me>
<sc5fh8$p7q$1@dont-email.me> <sc8pjr$8ib$1@dont-email.me> <sc8uoc$2tc$1@dont-email.me>
<sc9iib$3ei$1@dont-email.me> <scac7e$ph4$1@dont-email.me> <63597d55-f5bd-42fc-bae3-38155d072128n@googlegroups.com>
<scan92$k7m$1@dont-email.me> <dc5c8894-e51d-46a8-b682-9784fb8ac205n@googlegroups.com>
<9020308c-08e6-4f4f-b29f-e4320c19b1c2n@googlegroups.com> <2336ffa3-df90-461a-a1cb-51147dfc504dn@googlegroups.com>
<9a596e40-0c21-4b4c-83b1-56c745dd199cn@googlegroups.com> <sceb52$b4t$1@dont-email.me>
<0e87d075-e620-4173-accc-e16e0adbba35n@googlegroups.com> <scgi37$u7v$1@dont-email.me>
<57a0784c-b114-460e-af96-9930e94441f3n@googlegroups.com> <schqua$k02$1@dont-email.me>
<999476a5-100e-49dc-9a06-4550a7c928f0n@googlegroups.com> <scjk6o$rme$1@dont-email.me>
<fbf6751b-6b88-4283-92ea-1fff4b7fe200n@googlegroups.com> <863e6886-f580-4cce-aaef-ddf8d6baa4dfn@googlegroups.com>
<5deba2bb-fa46-43e7-a8f2-01bc5cffb519n@googlegroups.com> <ca4a2e3b-0589-4b4a-8447-39128dab7af8n@googlegroups.com>
<7dc19ec1-a43d-4868-ad8b-7f03c320012fn@googlegroups.com> <f6590a76-2e16-486d-baf8-879ebdf36266n@googlegroups.com>
<9cb2db22-9778-4e1b-bcc4-71005c225c73n@googlegroups.com> <a688aab7-5ba9-4f22-8817-1060c3b29375n@googlegroups.com>
<0785c4c1-b891-4b0a-8fa9-d712ea178e1en@googlegroups.com> <8f0d50a6-897a-4924-84f9-f790395c90e9n@googlegroups.com>
<3620a96a-b37b-47b1-b362-1e15f87f75b3n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7d83f656-02c2-4f18-b466-0483e06c3cb0n@googlegroups.com>
Subject: Re: Vector ISA Categorisation
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 14 Jul 2021 17:35:23 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Quadibloc - Wed, 14 Jul 2021 17:35 UTC

On Wednesday, July 14, 2021 at 11:24:20 AM UTC-6, Quadibloc wrote:

> And while the Itanium is perhaps legendary as a failure, and there may be a few
> other VLIW designs that have fallen by the wayside, TI continues to make and sell
> the TMS320C6000... so I *think* I have reason to conclude that VLIW designs have
> not _all_ been disastrous failures, despite appearances.

And it _may_ help that the design of the VLIW features I include owes more to
that apparent non-failure, the TMS320C6000, than the Itanium.

The TMS320C6000 basically just has a standard RISC instruction set, with the
addition of a single bit to each instruction that allows groups of instructions that
can be executed simultaneously without worries to be indicated. That's it.

Not three instruction slots with three different instruction sets designed in a
massively implementation-dependent manner that should not survive more than
one generation of hardware.
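
As a minimal sketch of how little a decoder needs for that scheme: assume, for illustration, that bit 0 of each 32-bit instruction word is the parallel bit and that a 1 means "the next instruction issues together with this one". The function name and the word layout are assumptions of the sketch, not a description of any particular chip's encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed for this sketch: bit 0 is the "parallel" bit.
constexpr std::uint32_t kParallelBit = 1u;

// Splits an instruction stream into groups that issue in the same cycle.
// Each inner vector holds the indices of the instructions in one group.
std::vector<std::vector<std::size_t>> issue_groups(const std::vector<std::uint32_t>& code)
{
    std::vector<std::vector<std::size_t>> groups;
    std::vector<std::size_t> current;
    for (std::size_t i = 0; i < code.size(); ++i) {
        current.push_back(i);
        const bool chained = (code[i] & kParallelBit) != 0;
        if (!chained || i + 1 == code.size()) {   // the group ends here
            groups.push_back(current);
            current.clear();
        }
    }
    assert(current.empty());                      // every group was flushed
    return groups;
}
```

Six words whose parallel bits read 1, 1, 0, 0, 1, 0 come out as three groups of three, one and two instructions. No dependency checking is involved; the bit is taken on trust.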

John Savard

Re: Vector ISA Categorisation

<9260f971-4fd2-487c-ab00-249ca72f29a4n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=18771&group=comp.arch#18771

X-Received: by 2002:a05:620a:1304:: with SMTP id o4mr494680qkj.366.1626305615807;
Wed, 14 Jul 2021 16:33:35 -0700 (PDT)
X-Received: by 2002:a05:6830:3108:: with SMTP id b8mr530609ots.182.1626305615544;
Wed, 14 Jul 2021 16:33:35 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 14 Jul 2021 16:33:35 -0700 (PDT)
In-Reply-To: <8f0d50a6-897a-4924-84f9-f790395c90e9n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fa3c:a000:8d1c:2589:3147:6cac;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fa3c:a000:8d1c:2589:3147:6cac
References: <sb6s70$dip$1@newsreader4.netcologne.de> <sb99gi$1r5$1@newsreader4.netcologne.de>
<sbh665$sht$1@dont-email.me> <sbubiu$unp$1@dont-email.me> <sbudg8$aje$1@dont-email.me>
<sc12qv$8ka$1@dont-email.me> <sc186o$gns$1@dont-email.me> <sc5cg5$a3p$1@dont-email.me>
<sc5fh8$p7q$1@dont-email.me> <sc8pjr$8ib$1@dont-email.me> <sc8uoc$2tc$1@dont-email.me>
<sc9iib$3ei$1@dont-email.me> <scac7e$ph4$1@dont-email.me> <63597d55-f5bd-42fc-bae3-38155d072128n@googlegroups.com>
<scan92$k7m$1@dont-email.me> <dc5c8894-e51d-46a8-b682-9784fb8ac205n@googlegroups.com>
<9020308c-08e6-4f4f-b29f-e4320c19b1c2n@googlegroups.com> <2336ffa3-df90-461a-a1cb-51147dfc504dn@googlegroups.com>
<9a596e40-0c21-4b4c-83b1-56c745dd199cn@googlegroups.com> <sceb52$b4t$1@dont-email.me>
<0e87d075-e620-4173-accc-e16e0adbba35n@googlegroups.com> <scgi37$u7v$1@dont-email.me>
<57a0784c-b114-460e-af96-9930e94441f3n@googlegroups.com> <schqua$k02$1@dont-email.me>
<999476a5-100e-49dc-9a06-4550a7c928f0n@googlegroups.com> <scjk6o$rme$1@dont-email.me>
<fbf6751b-6b88-4283-92ea-1fff4b7fe200n@googlegroups.com> <863e6886-f580-4cce-aaef-ddf8d6baa4dfn@googlegroups.com>
<5deba2bb-fa46-43e7-a8f2-01bc5cffb519n@googlegroups.com> <ca4a2e3b-0589-4b4a-8447-39128dab7af8n@googlegroups.com>
<7dc19ec1-a43d-4868-ad8b-7f03c320012fn@googlegroups.com> <f6590a76-2e16-486d-baf8-879ebdf36266n@googlegroups.com>
<9cb2db22-9778-4e1b-bcc4-71005c225c73n@googlegroups.com> <a688aab7-5ba9-4f22-8817-1060c3b29375n@googlegroups.com>
<0785c4c1-b891-4b0a-8fa9-d712ea178e1en@googlegroups.com> <8f0d50a6-897a-4924-84f9-f790395c90e9n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <9260f971-4fd2-487c-ab00-249ca72f29a4n@googlegroups.com>
Subject: Re: Vector ISA Categorisation
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 14 Jul 2021 23:33:35 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Quadibloc - Wed, 14 Jul 2021 23:33 UTC

On Wednesday, July 14, 2021 at 10:50:09 AM UTC-6, MitchAlsup wrote:

> I chose otherwise:: GB OoO is in, VLIW is not. Mostly because the history
> of VLIW successes permeates the world of comp.arch:: NOT.

On looking at the TMS320C6000 again, this led me to some reflection.

My elaborate scheme of indicating parallelism with U, D, and B bits,
so that it can be explicitly indicated which instructions depend on which
other instructions...

only makes sense in a design which _doesn't even have proper interlocks_.

Since I do expect implementations, even VLIW-oriented ones, to be able
to handle code in the other non-VLIW block formats, and since even the simplest
in-order implementation has to have interlocks to work (which is nothing like
the stuff needed for OoO), that isn't sensible.

So I should simplify down and reduce what I offer in that area to merely a single
bit per instruction to indicate groups of instructions that can be executed in
parallel without any checking.

John Savard

Re: Vector ISA Categorisation

<2ee7b3e0-21c2-4d48-ac4d-bac47df14d21n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=18813&group=comp.arch#18813

X-Received: by 2002:a05:620a:2212:: with SMTP id m18mr5514659qkh.98.1626375114700; Thu, 15 Jul 2021 11:51:54 -0700 (PDT)
X-Received: by 2002:a9d:6f84:: with SMTP id h4mr5108161otq.240.1626375114395; Thu, 15 Jul 2021 11:51:54 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 15 Jul 2021 11:51:54 -0700 (PDT)
In-Reply-To: <5c992582-0cde-4dc2-8e80-556de4f8eb26n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=217.147.94.29; posting-account=soFpvwoAAADIBXOYOBcm_mixNPAaxW9p
NNTP-Posting-Host: 217.147.94.29
References: <sb6s70$dip$1@newsreader4.netcologne.de> <sb6vfb$1ov$1@dont-email.me> <sb70q1$fsg$2@newsreader4.netcologne.de> <sb912k$c4c$1@dont-email.me> <sb99gi$1r5$1@newsreader4.netcologne.de> <sbh665$sht$1@dont-email.me> <sbubiu$unp$1@dont-email.me> <sbudg8$aje$1@dont-email.me> <sc12qv$8ka$1@dont-email.me> <sc186o$gns$1@dont-email.me> <sc5cg5$a3p$1@dont-email.me> <sc5fh8$p7q$1@dont-email.me> <sc8pjr$8ib$1@dont-email.me> <sc8uoc$2tc$1@dont-email.me> <sc9iib$3ei$1@dont-email.me> <scac7e$ph4$1@dont-email.me> <63597d55-f5bd-42fc-bae3-38155d072128n@googlegroups.com> <scan92$k7m$1@dont-email.me> <dc5c8894-e51d-46a8-b682-9784fb8ac205n@googlegroups.com> <9020308c-08e6-4f4f-b29f-e4320c19b1c2n@googlegroups.com> <2336ffa3-df90-461a-a1cb-51147dfc504dn@googlegroups.com> <9a596e40-0c21-4b4c-83b1-56c745dd199cn@googlegroups.com> <sceb52$b4t$1@dont-email.me> <0e87d075-e620-4173-accc-e16e0adbba35n@googlegroups.com> <scgi37$u7v$1@dont-email.me> <57a0784c-b114-460e-af96-9930e94441f3n@googlegroups.com> <schqua$k02$1@dont-email.me> <999476a5-100e-49dc-9a06-4550a7c928f0n@googlegroups.com> <a0e2018b-9bae-4549-855b-50ff92cacbc2n@googlegroups.com> <80be8c80-109d-4b85-9822-68fa19ee17ffn@googlegroups.com> <
5c992582-0cde-4dc2-8e80-556de4f8eb26n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2ee7b3e0-21c2-4d48-ac4d-bac47df14d21n@googlegroups.com>
Subject: Re: Vector ISA Categorisation
From: luke.lei...@gmail.com (luke.l...@gmail.com)
Injection-Date: Thu, 15 Jul 2021 18:51:54 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 55
 by: luke.l...@gmail.com - Thu, 15 Jul 2021 18:51 UTC

On Monday, July 12, 2021 at 8:55:42 PM UTC+1, MitchAlsup wrote:

> But here, the same vector code runs when there is memory aliasing as when there is not::
>
> for( i = 0; i < MAX; i++ )
> a[i] = b[i]*x+a[i-j];

> Thus the compiler does not have to solve this memory aliasing problem in order to convert
> the loop into vector form. When j ~IN( 0..MAX ) it runs at full vector speed. When j = 1, the
> loop runs at the latency of the FMAC unit plus one cycle. SAME code.
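
To make the dependence in that loop concrete, here is a software-only sketch. It is not a model of VVM hardware. It compares the loop run one element at a time with the same loop run in batches, where each batch does all its loads and multiply-adds before any of its stores. The function names, the batching model and the test data are all assumptions made for illustration. With 1 <= j < n the loop-carried dependence has distance j, so batches of up to j elements still give the scalar answer. Otherwise the batch width does not matter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reference semantics: one element at a time, in program order.
// 'a' points into the middle of a buffer so that a[i - j] stays in bounds.
void run_scalar(double* a, const double* b, double x, long j, long n)
{
    for (long i = 0; i < n; ++i)
        a[i] = b[i] * x + a[i - j];
}

// Batched model: 'width' elements are loaded and computed, then stored.
void run_batched(double* a, const double* b, double x, long j, long n, long width)
{
    assert(width >= 1);
    std::vector<double> buf(static_cast<std::size_t>(width));
    double* lane = buf.data();
    for (long i = 0; i < n; i += width) {
        const long m = std::min(width, n - i);
        for (long k = 0; k < m; ++k)          // all loads and multiply-adds first
            lane[k] = b[i + k] * x + a[i + k - j];
        for (long k = 0; k < m; ++k)          // then all stores
            a[i + k] = lane[k];
    }
}

// Largest batch width that cannot change the result (for n >= 1).
long max_safe_width(long j, long n)
{
    return (j >= 1 && j < n) ? j : n;
}

// Runs both versions on the same made-up data and compares the results.
bool batched_matches_scalar(long j, long n, long width)
{
    const long pad = 32;                      // guard elements; requires |j| <= pad
    std::vector<double> s(static_cast<std::size_t>(n + 2 * pad));
    std::vector<double> b(static_cast<std::size_t>(n));
    for (std::size_t k = 0; k < s.size(); ++k) s[k] = static_cast<double>(k);
    for (std::size_t k = 0; k < b.size(); ++k) b[k] = static_cast<double>(k + 1);
    std::vector<double> v = s;
    run_scalar(s.data() + pad, b.data(), 2.0, j, n);
    run_batched(v.data() + pad, b.data(), 2.0, j, n, width);
    return s == v;
}
```

With j = 1 only a width of 1 is safe, which corresponds to the serial case described above. With j outside 1..n-1 the whole loop can go in one batch.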

indeed. however... this has quite a cost in hardware, and, if i recall correctly,
if the program is much longer (extreme case - 10,000 or 100,000 instructions)
there's no way that an OoO in-flight scheduler can cope with that, so would
be forced to go back to the scalar execution.

> <
> > however i have an idea: an architectural "hint" instruction by which the programmer tells the hardware *how many* elements may be run Horizontally without violating hazards. hardware may run *up to* that limit but not exceed it.
> <
> Mostly the programmer does not know this unit of data.

yes but the compiler would.

> >
> > if the limit is set "unlimited" then fascinatingly it says "all elements may be processed vertically" and that by definition is the standard Cray-style Horizontal-first execution.
> <
> Which comes with the compiler HAVING to solve the "memory is not aliased" problem.

if we may reasonably assume that for simple-enough loops the compiler may
perform auto-vectorisation passes that successfully identify aliasing, there is
a huge advantage to the "hint":

high-performance VVM hardware *has* to use OoO in-flight scheduling in order
to gather multiple 'Vertical-First' stripes together for parallel execution. there
is a down-side to this: it requires quite complex hardware, and even on that
complex hardware, if the number of instructions is too great to hold in in-flight
data, the hardware has to give up and do scalar.

*however*.... what if there was a "hint" from the compiler? an instruction which
informed the hardware, "it is absolutely 100% guaranteed to be the case that
8 elements can be parallelised".

with such a hint available, even VVM LOOPs of 100,000 instructions in length
could still be performed with (up to) 8 elements at a time being thrown at the
back-end ALUs.
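
a small sketch of the arithmetic behind such a hint, with every name and the encoding invented for illustration (0 is taken to mean "unlimited"). the hardware is free to start fewer elements than the hint allows, but never more, so the number of elements per pass is the smaller of the hint and the lanes actually available.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical encoding: a hint of 0 means "unlimited".
constexpr long kUnlimited = 0;

// Elements that may start together: never more than the hint,
// never more than there are lanes, never more than remain.
long elements_this_step(long hint, long lanes, long remaining)
{
    assert(hint >= 0 && lanes >= 1 && remaining >= 0);
    const long cap = (hint == kUnlimited) ? lanes : std::min(hint, lanes);
    return std::min(cap, remaining);
}

// Number of passes over the loop body needed to cover n elements.
long passes_needed(long n, long hint, long lanes)
{
    long passes = 0;
    for (long done = 0; done < n; done += elements_this_step(hint, lanes, n - done))
        ++passes;
    return passes;
}
```

for 100 elements, a hint of 8 on 16 lanes needs 13 passes, the same hint on 4 lanes needs 25, and an unlimited hint on 16 lanes needs 7. the length of the loop body does not appear anywhere in the calculation.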

l.

Re: Vector ISA Categorisation

<96faa9e8-9024-4b01-aae6-b695161c2322n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=18815&group=comp.arch#18815

X-Received: by 2002:a37:d8c:: with SMTP id 134mr5854817qkn.433.1626378878635;
Thu, 15 Jul 2021 12:54:38 -0700 (PDT)
X-Received: by 2002:a4a:6049:: with SMTP id t9mr4795599oof.14.1626378878380;
Thu, 15 Jul 2021 12:54:38 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 15 Jul 2021 12:54:38 -0700 (PDT)
In-Reply-To: <9260f971-4fd2-487c-ab00-249ca72f29a4n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=92.40.200.80; posting-account=soFpvwoAAADIBXOYOBcm_mixNPAaxW9p
NNTP-Posting-Host: 92.40.200.80
References: <sb6s70$dip$1@newsreader4.netcologne.de> <sb99gi$1r5$1@newsreader4.netcologne.de>
<sbh665$sht$1@dont-email.me> <sbubiu$unp$1@dont-email.me> <sbudg8$aje$1@dont-email.me>
<sc12qv$8ka$1@dont-email.me> <sc186o$gns$1@dont-email.me> <sc5cg5$a3p$1@dont-email.me>
<sc5fh8$p7q$1@dont-email.me> <sc8pjr$8ib$1@dont-email.me> <sc8uoc$2tc$1@dont-email.me>
<sc9iib$3ei$1@dont-email.me> <scac7e$ph4$1@dont-email.me> <63597d55-f5bd-42fc-bae3-38155d072128n@googlegroups.com>
<scan92$k7m$1@dont-email.me> <dc5c8894-e51d-46a8-b682-9784fb8ac205n@googlegroups.com>
<9020308c-08e6-4f4f-b29f-e4320c19b1c2n@googlegroups.com> <2336ffa3-df90-461a-a1cb-51147dfc504dn@googlegroups.com>
<9a596e40-0c21-4b4c-83b1-56c745dd199cn@googlegroups.com> <sceb52$b4t$1@dont-email.me>
<0e87d075-e620-4173-accc-e16e0adbba35n@googlegroups.com> <scgi37$u7v$1@dont-email.me>
<57a0784c-b114-460e-af96-9930e94441f3n@googlegroups.com> <schqua$k02$1@dont-email.me>
<999476a5-100e-49dc-9a06-4550a7c928f0n@googlegroups.com> <scjk6o$rme$1@dont-email.me>
<fbf6751b-6b88-4283-92ea-1fff4b7fe200n@googlegroups.com> <863e6886-f580-4cce-aaef-ddf8d6baa4dfn@googlegroups.com>
<5deba2bb-fa46-43e7-a8f2-01bc5cffb519n@googlegroups.com> <ca4a2e3b-0589-4b4a-8447-39128dab7af8n@googlegroups.com>
<7dc19ec1-a43d-4868-ad8b-7f03c320012fn@googlegroups.com> <f6590a76-2e16-486d-baf8-879ebdf36266n@googlegroups.com>
<9cb2db22-9778-4e1b-bcc4-71005c225c73n@googlegroups.com> <a688aab7-5ba9-4f22-8817-1060c3b29375n@googlegroups.com>
<0785c4c1-b891-4b0a-8fa9-d712ea178e1en@googlegroups.com> <8f0d50a6-897a-4924-84f9-f790395c90e9n@googlegroups.com>
<9260f971-4fd2-487c-ab00-249ca72f29a4n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <96faa9e8-9024-4b01-aae6-b695161c2322n@googlegroups.com>
Subject: Re: Vector ISA Categorisation
From: luke.lei...@gmail.com (luke.l...@gmail.com)
Injection-Date: Thu, 15 Jul 2021 19:54:38 +0000
Content-Type: text/plain; charset="UTF-8"
 by: luke.l...@gmail.com - Thu, 15 Jul 2021 19:54 UTC

On Thursday, July 15, 2021 at 12:33:36 AM UTC+1, Quadibloc wrote:

> So I should simplify down and reduce what I offer in that area to merely a single
> bit per instruction to indicate groups of instructions that can be executed in
> parallel without any checking.

SVP64 is a "prefix" ISA. (Simple V Prefix 64 bit, totally original naming). the 32 bit prefix applies to an *existing* v3.0B Power ISA *scalar* instruction.

this is quite expensive inasmuch as "normal" Vector ISAs have explicit 32 bit vector opcodes. mind you, normal Vector ISAs don't have over a quarter of a million instructions.

i remembered a discussion on here last year about saving program size by having "shift registers" with a context and a bitlevel shift register. when the LSB is a 1, the "context" is applied to the current instruction.

if the context is repeated enough times it becomes worthwhile to separate it from the suffix instructions it applies to.
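
a minimal sketch of that shift-register idea, with all names and field widths made up for illustration: the decoder holds one shared context plus a bit mask, looks at the LSB for each instruction, and shifts the mask right by one afterwards.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical decoder state: one shared prefix ("context") plus a mask that
// is shifted right by one position for every instruction decoded.
struct PrefixContext {
    std::uint32_t context;   // prefix bits to apply
    std::uint64_t mask;      // bit 0 governs the current instruction
};

struct Decoded {
    std::uint32_t insn;      // the scalar instruction word
    bool prefixed;           // true if the context applies to it
    std::uint32_t prefix;    // the context when prefixed, otherwise 0
};

std::vector<Decoded> decode(const std::vector<std::uint32_t>& code, PrefixContext st)
{
    std::vector<Decoded> out;
    for (std::uint32_t insn : code) {
        const bool apply = (st.mask & 1u) != 0;          // LSB set: context applies
        out.push_back({insn, apply, apply ? st.context : std::uint32_t{0}});
        st.mask >>= 1;                                   // move on to the next instruction
    }
    assert(out.size() == code.size());
    return out;
}
```

with a mask of binary 0101 the first and third instructions pick up the context and the second and fourth run as plain scalar instructions, at a cost of one context word for the whole run rather than one prefix per instruction.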

i thought it might be worth reminding you about that in case it's useful here. as in, you don't have to "block everything together" (another technique i really like), there are alternatives.

l.

Re: Vector ISA Categorisation

<806b1dec-59b7-4f81-9dcd-4550fe43f5e0n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=18816&group=comp.arch#18816

X-Received: by 2002:a37:b6c5:: with SMTP id g188mr5747629qkf.92.1626381421034; Thu, 15 Jul 2021 13:37:01 -0700 (PDT)
X-Received: by 2002:a05:6808:1313:: with SMTP id y19mr5048323oiv.37.1626381420758; Thu, 15 Jul 2021 13:37:00 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder1.feed.usenet.farm!feed.usenet.farm!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 15 Jul 2021 13:37:00 -0700 (PDT)
In-Reply-To: <2ee7b3e0-21c2-4d48-ac4d-bac47df14d21n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:95b1:3c6f:d12f:872c; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:95b1:3c6f:d12f:872c
References: <sb6s70$dip$1@newsreader4.netcologne.de> <sb6vfb$1ov$1@dont-email.me> <sb70q1$fsg$2@newsreader4.netcologne.de> <sb912k$c4c$1@dont-email.me> <sb99gi$1r5$1@newsreader4.netcologne.de> <sbh665$sht$1@dont-email.me> <sbubiu$unp$1@dont-email.me> <sbudg8$aje$1@dont-email.me> <sc12qv$8ka$1@dont-email.me> <sc186o$gns$1@dont-email.me> <sc5cg5$a3p$1@dont-email.me> <sc5fh8$p7q$1@dont-email.me> <sc8pjr$8ib$1@dont-email.me> <sc8uoc$2tc$1@dont-email.me> <sc9iib$3ei$1@dont-email.me> <scac7e$ph4$1@dont-email.me> <63597d55-f5bd-42fc-bae3-38155d072128n@googlegroups.com> <scan92$k7m$1@dont-email.me> <dc5c8894-e51d-46a8-b682-9784fb8ac205n@googlegroups.com> <9020308c-08e6-4f4f-b29f-e4320c19b1c2n@googlegroups.com> <2336ffa3-df90-461a-a1cb-51147dfc504dn@googlegroups.com> <9a596e40-0c21-4b4c-83b1-56c745dd199cn@googlegroups.com> <sceb52$b4t$1@dont-email.me> <0e87d075-e620-4173-accc-e16e0adbba35n@googlegroups.com> <scgi37$u7v$1@dont-email.me> <57a0784c-b114-460e-af96-9930e94441f3n@googlegroups.com> <schqua$k02$1@dont-email.me> <999476a5-100e-49dc-9a06-4550a7c928f0n@googlegroups.com> <a0e2018b-9bae-4549-855b-50ff92cacbc2n@googlegroups.com> <80be8c80-109d-4b85-9822-68fa19ee17ffn@googlegroups.com> <
5c992582-0cde-4dc2-8e80-556de4f8eb26n@googlegroups.com> <2ee7b3e0-21c2-4d48-ac4d-bac47df14d21n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <806b1dec-59b7-4f81-9dcd-4550fe43f5e0n@googlegroups.com>
Subject: Re: Vector ISA Categorisation
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 15 Jul 2021 20:37:01 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 67
 by: MitchAlsup - Thu, 15 Jul 2021 20:37 UTC

On Thursday, July 15, 2021 at 1:51:55 PM UTC-5, luke.l...@gmail.com wrote:
> On Monday, July 12, 2021 at 8:55:42 PM UTC+1, MitchAlsup wrote:
>
> > But here, the same vector code runs when there is memory aliasing as when there is not::
> >
> > for( i = 0; i < MAX; i++ )
> > a[i] = b[i]*x+a[i-j];
> > Thus the compiler does not have to solve this memory aliasing problem in order to convert
> > the loop into vector form. When j ~IN( 0..MAX ) it runs at full vector speed. When j = 1, the
> > loop runs at the latency of the FMAC unit plus one cycle. SAME code.
<
> indeed. however... this has quite a cost in hardware, and, if i recall correctly,
> if the program is much longer (extreme case - 10,000 or 100,000 instructions)
> there's no way that an OoO in-flight scheduler can cope with that, so would
> be forced to go back to the scalar execution.
<
It depends on how the station is seeded with the operand tag that it is looking for.
Do this correctly, and the station keeps track of everything for you.
Do it incorrectly, and bad things happen.
<
> > <
> > > however i have an idea: an architectural "hint" instruction by which the programmer tells the hardware *how many* elements may be run Horizontally without violating hazards. hardware may run *up to* that limit but not exceed it.
> > <
> > Mostly the programmer does not know this unit of data.
> yes but the compiler would.
<
This could be dependent on a value in an input file.
> > >
> > > if the limit is set "unlimited" then fascinatingly it says "all elements may be processed vertically" and that by definition is the standard Cray-style Horizontal-first execution.
> > <
> > Which comes with the compiler HAVING to solve the "memory is not aliased" problem.
<
> if we may reasonably assume that for simple-enough loops the compiler may
> perform auto-vectorisation passes that successfully identify aliasing, there is
> a huge advantage to the "hint":
<
There is a huge advantage if the compiler does not have to solve the aliasing arithmetic
problem and still vectorize the loop !
>
> high-performance VVM hardware *has* to use OoO in-flight scheduling in order
<
Minimal partial order is like BigO( n^3 ) while OoO is BigO( e^n )
VVM is carefully written to specify minimal partial order.
It could be done on a real OoO machine, but this is neither necessary nor does
it add performance !!
<
> to gather multiple 'Vertical-First' stripes together for parallel execution. there
> is a down-side to this: it requires quite complex hardware, and even on that
> complex hardware, if the number of instructions is too great to hold in in-flight
> data, the hardware has to give up and do scalar.
<
The part of the loop which fits in the execution window can still run vector
and the parts which do not can run at the scalarity of the front end.
>
> *however*.... what if there was a "hint" from the compiler? an instruction which
> informed the hardware, "it is absolutely 100% guaranteed to be the case that
> 8 elements can be parallelised".
<
Mostly the HW can figure this out by examining the register dependencies at
loop install time, and by performing AGENs in program order. When memory
references are dense the problem is actually easy. The AGENs in program
order is what causes minimal partial order.
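
A rough software illustration of that point, under stated assumptions: this is not the actual VVM hardware, only a model in which each iteration issues one load and one store, addresses are generated in program order, and only read-after-write overlap is checked. The names `safe_width` and `agen_fma_loop` are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one load address and one store address per loop
 * iteration, listed in program order. Returns how many consecutive
 * iterations, starting at iteration 0, can run together before a load
 * would need a value stored by an earlier iteration of the same group.
 * Only read-after-write overlap is modelled. */
size_t safe_width(const uintptr_t *load_addr, const uintptr_t *store_addr,
                  size_t n)
{
    assert(load_addr != NULL && store_addr != NULL);
    for (size_t i = 1; i < n; i++) {
        /* Compare this iteration's load with every earlier store. */
        for (size_t k = 0; k < i; k++) {
            if (load_addr[i] == store_addr[k])
                return i;   /* iterations 0..i-1 are independent */
        }
    }
    return n;               /* no overlap found inside the window */
}

/* Addresses produced by  a[i] = b[i]*x + a[i-j]  for n iterations starting
 * at index `first`, with a[] located at element address `base`. */
void agen_fma_loop(uintptr_t base, size_t first, size_t j, size_t n,
                   uintptr_t *load_addr, uintptr_t *store_addr)
{
    for (size_t i = 0; i < n; i++) {
        store_addr[i] = base + first + i;       /* a[i]   */
        load_addr[i]  = base + first + i - j;   /* a[i-j] */
    }
}
```

With dense references the answer falls out of the address stream itself: for j = 1 the width is 1, for j = 4 it is 4, and for a j larger than the window the whole window is independent.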
>
> with such a hint available, even VVM LOOPs of 100,000 instructions in length
> could still be performed with (up to) 8 elements at a time being thrown at the
> back-end ALUs.
>
> l.

Re: Vector ISA Categorisation

<f33ed1dc-9b76-4cdc-873f-f84923451e4fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18823&group=comp.arch#18823

Newsgroups: comp.arch
Date: Thu, 15 Jul 2021 18:21:08 -0700 (PDT)
In-Reply-To: <96faa9e8-9024-4b01-aae6-b695161c2322n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f33ed1dc-9b76-4cdc-873f-f84923451e4fn@googlegroups.com>
Subject: Re: Vector ISA Categorisation
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Fri, 16 Jul 2021 01:21:09 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 22
 by: Quadibloc - Fri, 16 Jul 2021 01:21 UTC

On Thursday, July 15, 2021 at 1:54:39 PM UTC-6, luke.l...@gmail.com wrote:
> you don't have to "block everything together" (another technique i really like), there are alternatives.

Oh, yes. The reason I've organized the instructions into blocks is:

Mitch Alsup pointed out that having immediate operands in instructions is very advantageous
these days, given how expensive fetches from memory are,

and I wanted to make sure it was painfully simple to determine the length of an instruction,
but the lengths of data items vary over such a range they would make instruction lengths
complicated.

So I came up with an idea, although someone else's idea, Heidi Pan's heads-and-tails, was
an inspiration... put instructions in blocks, allow unused space, have pointers within the
block show where constant values are.

The block is fetched already to get the instructions, so it has the advantage of an
immediate, but the length of the instruction is undisturbed, as if the constant were
in a register even.

So using blocks solved one problem in a very simple way.
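
A minimal sketch of how such a block might be decoded. Everything about the layout here is an assumption made for illustration (block size, field positions, the names `block_t` and `fetch_operand`); it is not the encoding actually used in the design being discussed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_WORDS 8   /* assumed block size: eight 32-bit words */

/* Hypothetical instruction layout, invented for illustration only:
 *   bits 31..24  opcode
 *   bits 23..19  destination register
 *   bit  18      1 = source operand is a constant held in this block
 *   bits 2..0    word index of that constant within the block        */
typedef struct {
    uint32_t word[BLOCK_WORDS];  /* instructions first, constants at the tail */
} block_t;

static inline int has_block_const(uint32_t insn) { return (insn >> 18) & 1u; }
static inline unsigned const_slot(uint32_t insn) { return insn & 7u; }

/* Every instruction is one word, so locating instruction k needs no length
 * decoding; its constant, if any, comes from the already-fetched block. */
uint32_t fetch_operand(const block_t *blk, unsigned k, uint32_t reg_value)
{
    assert(blk != NULL && k < BLOCK_WORDS);
    uint32_t insn = blk->word[k];
    return has_block_const(insn) ? blk->word[const_slot(insn)] : reg_value;
}
```

The point of the arrangement is visible in `fetch_operand`: the constant costs no extra memory fetch, and the instruction's own length never changes.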

John Savard

Re: Vector ISA Categorisation

<scr62r$mlm$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18824&group=comp.arch#18824

From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Vector ISA Categorisation
Date: Fri, 16 Jul 2021 05:39:39 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <scr62r$mlm$1@newsreader4.netcologne.de>
Injection-Date: Fri, 16 Jul 2021 05:39:39 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-ef14-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:ef14:0:7285:c2ff:fe6c:992d";
logging-data="23222"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Fri, 16 Jul 2021 05:39 UTC

luke.l...@gmail.com <luke.leighton@gmail.com> schrieb:
> On Monday, July 12, 2021 at 8:55:42 PM UTC+1, MitchAlsup wrote:
>
>> But here, the same vector code runs when there is memory aliasing as when there is not::
>>
>> for( i = 0; i < MAX; i++ )
>>     a[i] = b[i]*x+a[i-j];
>
>> Thus the compiler does not have to solve this memory aliasing problem in order to convert
>> the loop into vector form. When j ~IN( 0..MAX ) it runs at full vector speed. When j = 1, the
>> loop runs at the latency of the FMAC unit plus one cycle. SAME code.
>
> indeed. however... this has quite a cost in hardware, and, if i recall correctly,
> if the program is much longer (extreme case - 10,000 or 100,000 instructions)
> there's no way that an OoO in-flight scheduler can cope with that, so would
> be forced to go back to the scalar execution.

That's a C-specific language problem, when the programmer forgets
to use restrict.

Other languages, such as Fortran, forbid hidden aliasing, and the
compiler can assume that nothing of the sort happens when it
is not visible in the source code.

I wouldn't hamper well-written programs for the benefit of
programmers who are too lazy to type an 8-character keyword in C.

[...]

> *however*.... what if there was a "hint" from the compiler? an instruction which
> informed the hardware, "it is absolutely 100% guaranteed to be the case that
> 8 elements can be parallelised".

That makes absolute sense. There is value in doing loops in VVM
even when things alias, and having a bit which specifies "go wild
and parallelize" is a good thing.

This can be a direct translation from a language construct, for
example Fortran's "DO CONCURRENT", or Cray's famous
C$DIR IVDEP directive.

Aliasing analysis is not easy. I can point out the code I
contributed quite some time ago to gfortran to help try and avoid
temporaries in Fortran code like

a(n1:n2:n3) = a(n4:n5:n6)

where the first number of the triplet is the start, the second the
end and the third the stride and the language definition prescribes
that, conceptually, the right-hand-side has to be evaluated before
the left-hand side is assigned.

It is always possible to solve the correctness problem with
a temporary, but of course it is better if the compiler gets
by without. gfortran (and, I presume, other compilers as well)
has a bag of tricks including loop reversal.
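
To make the two options concrete, here is a small C sketch. It is an illustration only and not gfortran's implementation: offsets are 0-based, strides are assumed positive, and the function names are invented. The first routine always uses a temporary; the second handles the equal-stride case in place by choosing the loop direction.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of  a(lo : lo+(count-1)*ls : ls) = a(ro : ro+(count-1)*rs : rs)
 * with 0-based offsets and positive strides. The right-hand side is copied
 * into a temporary first, so overlap cannot corrupt the result.
 * Returns 0 on success, -1 if the temporary cannot be allocated. */
int assign_section_tmp(double *a, size_t lo, size_t ls,
                       size_t ro, size_t rs, size_t count)
{
    assert(a != NULL);
    if (count == 0)
        return 0;
    double *tmp = malloc(count * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (size_t k = 0; k < count; k++)      /* evaluate the RHS completely */
        tmp[k] = a[ro + k * rs];
    for (size_t k = 0; k < count; k++)      /* then assign the LHS */
        a[lo + k * ls] = tmp[k];
    free(tmp);
    return 0;
}

/* Same assignment without a temporary, for the equal-stride case only.
 * If the destination starts above the source, the loop runs backwards
 * (loop reversal) so each element is read before it is overwritten. */
void assign_section_inplace(double *a, size_t lo, size_t ro,
                            size_t stride, size_t count)
{
    assert(a != NULL);
    if (lo <= ro) {
        for (size_t k = 0; k < count; k++)
            a[lo + k * stride] = a[ro + k * stride];
    } else {
        for (size_t k = count; k-- > 0; )
            a[lo + k * stride] = a[ro + k * stride];
    }
}
```

For the overlapping shift a(2:5) = a(1:4), both routines give the same result; a naive forward copy would smear a(1) across the section.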

One additional possibility is that the compiler could insert a run-time
check to see if an aliasing situation occurs, and then either set a
flag in the VEC instruction or call a different version of the loop.
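
A sketch of the second variant (calling a different version of the loop), assuming a simple multiply-add kernel. The function names are made up for the example, and comparing addresses of unrelated objects through `uintptr_t` is implementation-defined in C, although it behaves as expected on flat-address machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the n-element ranges starting at p and q share any storage.
 * Assumes a flat address space (implementation-defined in strict C). */
int ranges_overlap(const double *p, const double *q, size_t n)
{
    uintptr_t ps = (uintptr_t)p, qs = (uintptr_t)q;
    uintptr_t bytes = (uintptr_t)(n * sizeof(double));
    return ps < qs + bytes && qs < ps + bytes;
}

/* Version for the no-alias case: restrict lets the compiler vectorise. */
static void axpy_noalias(double *restrict dst, const double *restrict src,
                         double x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * x + dst[i];
}

/* Conservative version: plain pointers, element-by-element semantics. */
static void axpy_may_alias(double *dst, const double *src, double x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * x + dst[i];
}

/* The run-time check selects the version; returns 1 if the fast path ran. */
int axpy(double *dst, const double *src, double x, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (!ranges_overlap(dst, src, n)) {
        axpy_noalias(dst, src, x, n);
        return 1;
    }
    axpy_may_alias(dst, src, x, n);
    return 0;
}
```

The check costs a few integer comparisons per loop entry, which is negligible next to the loop body for any reasonable trip count.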

Re: Vector ISA Categorisation

<5a96aea9-fac3-4a98-939a-58bb54e493ddn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18842&group=comp.arch#18842

Newsgroups: comp.arch
Date: Fri, 16 Jul 2021 09:02:41 -0700 (PDT)
In-Reply-To: <9260f971-4fd2-487c-ab00-249ca72f29a4n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5a96aea9-fac3-4a98-939a-58bb54e493ddn@googlegroups.com>
Subject: Re: Vector ISA Categorisation
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Fri, 16 Jul 2021 16:02:41 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Quadibloc - Fri, 16 Jul 2021 16:02 UTC

On Wednesday, July 14, 2021 at 5:33:36 PM UTC-6, Quadibloc wrote:
> On Wednesday, July 14, 2021 at 10:50:09 AM UTC-6, MitchAlsup wrote:
> > I chose otherwise:: GB OoO is in, VLIW is not. Mostly because of history
> > of VLIW successes permeate the world of comp.arch:: NOT.
> On looking at the TMS320C6000 again, this led me to some reflection.
>
> My elaborate scheme of indicating parallelism with U, D, and B bits,
> so that it can be explicitly indicated which instructions depend on which
> other instructions...
>
> only makes sense in a design which _doesn't even have proper interlocks_.
>
> Since I do expect implementations, even if they're VLIW-oriented, to be able
> to handle code in the other non-VLIW block formats, and even the simplest
> in-order implementation has to have interlocks to work, so this isn't like the
> stuff needed for OoO, that isn't sensible.
>
> So I should simplify down and reduce what I offer in that area to merely a single
> bit per instruction to indicate groups of instructions that can be executed in
> parallel without any checking.

As the changes required were extensive, I have only now gotten around to
making them, and posting the updated pages to

http://www.quadibloc.com/arch/ct14int.htm

and succeeding pages.
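
For readers who want the single-bit idea in concrete form, here is a sketch in the manner of the TMS320C6000's p-bit. The bit position and the exact convention are assumptions for illustration and are not taken from the pages linked above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAR_BIT 0x1u   /* assumed position of the "parallel" bit */

/* Hypothetical convention: if PAR_BIT is set in instruction i, then
 * instruction i+1 may issue in the same cycle as instruction i without
 * any dependency checking. A clear bit ends the group.
 * Writes the length of each group to group_len[] and returns the number
 * of groups found. group_len must have room for n entries. */
size_t split_groups(const uint32_t *insn, size_t n, size_t *group_len)
{
    assert(insn != NULL && group_len != NULL);
    size_t groups = 0, len = 0;
    for (size_t i = 0; i < n; i++) {
        len++;
        /* The last instruction always closes its group. */
        if (!(insn[i] & PAR_BIT) || i + 1 == n) {
            group_len[groups++] = len;
            len = 0;
        }
    }
    return groups;
}
```

An implementation with full interlocks can ignore the bit entirely and still run the code correctly, which is why one bit is enough.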

John Savard

Re: Vector ISA Categorisation

<d464e070-2e9e-4de5-936a-c47b66209887n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18843&group=comp.arch#18843

Newsgroups: comp.arch
Date: Fri, 16 Jul 2021 09:07:12 -0700 (PDT)
In-Reply-To: <806b1dec-59b7-4f81-9dcd-4550fe43f5e0n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d464e070-2e9e-4de5-936a-c47b66209887n@googlegroups.com>
Subject: Re: Vector ISA Categorisation
From: luke.lei...@gmail.com (luke.l...@gmail.com)
Injection-Date: Fri, 16 Jul 2021 16:07:12 +0000
Content-Type: text/plain; charset="UTF-8"
 by: luke.l...@gmail.com - Fri, 16 Jul 2021 16:07 UTC

On Thursday, July 15, 2021 at 9:37:02 PM UTC+1, MitchAlsup wrote:
> On Thursday, July 15, 2021 at 1:51:55 PM UTC-5, luke.l...@gmail.com wrote:
> > *however*.... what if there was a "hint" from the compiler? an instruction which
> > informed the hardware, "it is absolutely 100% guaranteed to be the case that
> > 8 elements can be parallelised".
> <
> Mostly the HW can figure this out by examining the register dependencies at
> loop install time, and by performing AGENs in program order. When memory
> references are dense the problem is actually easy. The AGENs in program
> order is what causes minimal partial order.

ok, so i am slowly trying to narrow down the circumstances
in which hints would be useful.

let us imagine that it is 10,000 instructions into the loop before an address
read is encountered, and that the loop itself is 100,000 instructions long. this
is (deliberately) far beyond the range over which any system could successfully
identify the maximum safe element-level parallelism by looking for AGEN'd
opportunities.

so let us then imagine that the hardware goes, "hmmm, i have not yet
encountered a LD/ST but i am going to continue assuming that 64
elements in parallel is perfectly acceptable".

it also goes, "hmmm i simply cannot possibly create in-flight buffers for
10,000 instructions so i am forced to write most of the 10,000 instructions'
data into actual registers"

it then encounters the LD/ST, performs the analysis of 64 LD/ST AGENs
and finds, oh s***, 33 of them are overlapping memory addresses.

by this time it is far, far too late: it's already written to the regfile and the
"damage" is impossible to unwind.

if however there was a "hint" (in this case "31 elements may be performed in
parallel perfectly safely") it would *never hit* the 33 cases that were impossible
to detect without an in-flight buffer of size 10,000.
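
a small model of the effect, as a sketch only: the loop is the a[i] = b[i]*x + a[i-j] example from earlier in the thread, a "group" means all lanes load before any lane stores, `HW_LANES` and the `hint == 0` convention ("no hint given") are assumptions made up for the illustration, and no real instruction is being described.

```c
#include <assert.h>
#include <stddef.h>

#define HW_LANES 64   /* assumed hardware maximum, as in the example above */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Model of  a[i] = b[i]*x + a[i-j]  for i in [j, n), executed in groups.
 * Within a group every lane loads its inputs before any lane stores.
 * The group size is capped by the hint; hint == 0 means "no hint given",
 * in which case the hardware optimistically uses all of its lanes. */
void run_with_hint(double *a, const double *b, double x,
                   size_t j, size_t n, size_t hint)
{
    assert(a != NULL && b != NULL && j >= 1);
    size_t width = hint ? min_sz(hint, HW_LANES) : HW_LANES;
    double loaded[HW_LANES];

    for (size_t i = j; i < n; i += width) {
        size_t lanes = min_sz(width, n - i);
        for (size_t l = 0; l < lanes; l++)       /* all loads first */
            loaded[l] = a[i + l - j];
        for (size_t l = 0; l < lanes; l++)       /* then all stores */
            a[i + l] = b[i + l] * x + loaded[l];
    }
}
```

with j = 3, a hint of 3 reproduces the element-at-a-time result exactly, while the optimistic 64-wide run reads stale values from a[] and silently produces different numbers.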

l.

Re: Vector ISA Categorisation

<1d608893-577a-407f-93cc-beaf5669e7ccn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18844&group=comp.arch#18844

Newsgroups: comp.arch
Date: Fri, 16 Jul 2021 09:11:12 -0700 (PDT)
In-Reply-To: <scr62r$mlm$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1d608893-577a-407f-93cc-beaf5669e7ccn@googlegroups.com>
Subject: Re: Vector ISA Categorisation
From: luke.lei...@gmail.com (luke.l...@gmail.com)
Injection-Date: Fri, 16 Jul 2021 16:11:13 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 21
 by: luke.l...@gmail.com - Fri, 16 Jul 2021 16:11 UTC

On Friday, July 16, 2021 at 6:39:41 AM UTC+1, Thomas Koenig wrote:

> Aliasing analysis is not easy.

i would be very surprised if it was. however for basic things
such as static 2D arrays (with twin nested for-loops) i would
expect it to be almost trivial:

int x[32][8];
for (int i = 1; i < 32; i++)
    for (int j = 0; j < 8; j++) {
        x[i-1][j] += x[i][j];
    }

it should be pretty clear that because the memory is in fact
compactable to a contiguous 1D array, 8 element parallelism
is perfectly achievable.

thus i would expect a "hint" to be generated in this case by
the compiler "up to 8 INT elements may be performed in parallel"
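
a sketch that checks the claim on the flattened array. the grouped routine and its reverse lane order (a stand-in for "hardware may run the lanes in any order") are assumptions of the model, not a description of any real machine.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 32
#define COLS 8

/* Reference: the loop nest exactly as written, one element at a time. */
void sum_rows_scalar(int x[ROWS][COLS])
{
    for (size_t i = 1; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            x[i - 1][j] += x[i][j];
}

/* The same work on the flattened array, in groups of `width` elements.
 * Lanes inside a group run in reverse order here, standing in for
 * "any order the hardware likes". Element k reads flat[k], and element
 * k + COLS later overwrites flat[k], so a group must not reach COLS or
 * more elements ahead: width <= COLS is safe. */
void sum_rows_grouped(int *flat, size_t width)
{
    assert(flat != NULL && width >= 1);
    const size_t total = (size_t)ROWS * COLS;
    for (size_t k = COLS; k < total; k += width) {
        size_t lanes = (total - k < width) ? total - k : width;
        for (size_t l = lanes; l-- > 0; )
            flat[k + l - COLS] += flat[k + l];
    }
}
```

a width of 8 matches the scalar loop nest; a width of 9 already lets one lane overwrite a value another lane in the same group has not read yet, so 8 is the dependence distance and the largest hint that is safe for every ordering.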

l.

Re: Vector ISA Categorisation

<scscf8$oi4$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18845&group=comp.arch#18845

From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Vector ISA Categorisation
Date: Fri, 16 Jul 2021 09:34:49 -0700
Organization: A noiseless patient Spider
Lines: 28
Message-ID: <scscf8$oi4$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 16 Jul 2021 16:34:48 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="5fecf0e5eb1d9ff1bc8849f0203a63da";
logging-data="25156"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Ht4YI7Ss+LhI8vApfl6qz"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:ZTY76wvTyiGAQ+Id7DFRHt+SkE8=
In-Reply-To: <1d608893-577a-407f-93cc-beaf5669e7ccn@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Fri, 16 Jul 2021 16:34 UTC

On 7/16/2021 9:11 AM, luke.l...@gmail.com wrote:
> On Friday, July 16, 2021 at 6:39:41 AM UTC+1, Thomas Koenig wrote:
>
>> Aliasing analysis is not easy.
>
> i would be very surprised if it was. however for basic things
> such as static 2D arrays (with twin nested for-loops) i would
> expect it to be almost trivial:
>
> int x[32][8];
> for (int i = 1; i < 32; i++)
>     for (int j = 0; j < 8; j++) {
>         x[i-1][j] += x[i][j];
>     }
>
> it should be pretty clear that because the memory is in fact
> compactable to a contiguous 1D array, 8 element parallelism
> is perfectly achievable.
>
> thus i would expect a "hint" to be generated in this case by
> the compiler "up to 8 INT elements may be performed in parallel"
>
> l.
>

Start here:

https://en.wikipedia.org/wiki/Polytope_model

Re: Vector ISA Categorisation

<scsd98$hip$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18846&group=comp.arch#18846

From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Vector ISA Categorisation
Date: Fri, 16 Jul 2021 16:48:40 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <scsd98$hip$1@newsreader4.netcologne.de>
Injection-Date: Fri, 16 Jul 2021 16:48:40 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-ef14-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:ef14:0:7285:c2ff:fe6c:992d";
logging-data="18009"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Fri, 16 Jul 2021 16:48 UTC

luke.l...@gmail.com <luke.leighton@gmail.com> schrieb:
> On Friday, July 16, 2021 at 6:39:41 AM UTC+1, Thomas Koenig wrote:
>
>> Aliasing analysis is not easy.
>
> i would be very surprised if it was. however for basic things
> such as static 2D arrays (with twin nested for-loops) i would
> expect it to be almost trivial:
>
> int x[32][8];
> for (int i = 1; i < 32; i++)
>     for (int j = 0; j < 8; j++) {
>         x[i-1][j] += x[i][j];
>     }
>
> it should be pretty clear that because the memory is in fact
> compactable to a contiguous 1D array, 8 element parallelism
> is perfectly achievable.

Sure, and current compilers vectorize this. A lot of techniques
work well when compilers know the size of the arrays involved
and the bounds they work with (which is one reason why LTO or
IPO are sometimes a success).

However, I think it is safe to say that most programs these days
operate on problem sizes depending on input. Let's replace the
constant 8 by a variable.

In that case, what hint should the compiler specify? It would
probably be better just to issue loop instructions for the inner
loop in that case.

Re: Vector ISA Categorisation

<hsjII.8947$6j.5699@fx04.iad>

https://www.novabbs.com/devel/article-flat.php?id=18848&group=comp.arch#18848

From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Vector ISA Categorisation
In-Reply-To: <806b1dec-59b7-4f81-9dcd-4550fe43f5e0n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 96
Message-ID: <hsjII.8947$6j.5699@fx04.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Fri, 16 Jul 2021 17:36:13 UTC
Date: Fri, 16 Jul 2021 13:36:03 -0400
X-Received-Bytes: 6586
 by: EricP - Fri, 16 Jul 2021 17:36 UTC

MitchAlsup wrote:
> On Thursday, July 15, 2021 at 1:51:55 PM UTC-5, luke.l...@gmail.com wrote:
>> On Monday, July 12, 2021 at 8:55:42 PM UTC+1, MitchAlsup wrote:
>>
>>> But here, the same vector code runs when there is memory aliasing as when there is not::
>>>
>>> for( i = 0; i < MAX; i++ )
>>>     a[i] = b[i]*x+a[i-j];
>>> Thus the compiler does not have to solve this memory aliasing problem in order to convert
>>> the loop into vector form. When j ~IN( 0..MAX ) it runs at full vector speed. When j = 1, the
>>> loop runs at the latency of the FMAC unit plus one cycle. SAME code.
> <
>> indeed. however... this has quite a cost in hardware, and, if i recall correctly,
>> if the program is much longer (extreme case - 10,000 or 100,000 instructions)
>> there's no way that an OoO in-flight scheduler can cope with that, so would
>> be forced to go back to the scalar execution.
> <
> It depends on how the station is seeded with the operand tag that it is looking for.
> Do this correctly, and the station keeps track of everything for you.
> Do it incorrectly, and bad things happen.
> <
>>> <
>>>> however i have an idea: an architectural "hint" instruction which the programmer tells the hardware *how many* elements may be run Horizontally without violating hazards. hardware may run *up to* that limit but not exceed it.
>>> <
>>> Mostly the programmer does not know this unit of data.
>> yes but the compiler would.
> <
> This could be dependent on a value in an input file.
>>>> if the limit is set "unlimited" then fascinatingly it says "all elements may be processed vertically and that by definition is the standard Cray-style Horizontal-first execution.
>>> <
>>> Which comes with the compiler HAVING to solve the "memory is not aliased" problem.
> <
>> if we may reasonably assume that for simple-enough loops the compiler may
>> perform auto-vectorisation passes that successfully identify aliasing, there is
>> a huge advantage to the "hint":
> <
> There is a huge advantage if the compiler does not have to solve the aliasing arithmetic
> problem and still vectorize the loop !
>> high-performance VVM hardware *has* to use OoO in-flight scheduling in order
> <
> Minimal partial order is like BigO( n^3 ) while OoO is BigO( e^n )
> VVM is carefully written to specify minimal partial order.
> It could be done on a real OoO machine, but this is neither necessary nor does
> it add performance !!
> <
>> to gather multiple 'Vertical-First' stripes together for parallel execution. there
>> is a down-side to this: it requires quite complex hardware, and even on that
>> complex hardware, if the number of instructions is too great to hold in in-flight
>> data, the hardware has to give up and do scalar.
> <
> The part of the loop which fits in the execution window can still run vector
> and the parts which do not can run at the scalarity of the front end.
>> *however*.... what if there was a "hint" from the compiler? an instruction which
>> informed the hardware, "it is absolutely 100% guaranteed to be the case that
>> 8 elements can be parallelised".
> <
> Mostly the HW can figure this out by examining the register dependencies at
> loop install time, and by performing AGENs in program order. When memory
> references are dense the problem is actually easy. The AGENs in program
> order is what causes minimal partial order.

What concerns me is the amount of tracking information required
to do this in-flight alias detection. Traditional OoO needs to
track data flow dependencies, but that information is instantaneous
and can be disposed of as soon as the dependent executes.

VVM needs to correlate every original source address to a version of
a register, then track all the dependencies on that source as it
flows through the operations and merges with other dependencies,
and hold onto all that tracking info while any dependent
is still in flight.

Maybe something like...

Detecting writes to prior memory address with in-flight dependents
could be done with a TCAM. Maybe include a Bloom filter reset at
LOOP start to limit TCAM lookups and power usage.
Nicer if data addresses are all naturally aligned too.
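
As a rough software sketch of the Bloom-filter-in-front-of-a-CAM idea
(every name and size here is made up for illustration, a plain array
search stands in for the TCAM, and data is assumed to be naturally
aligned 8-byte elements):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CAM_ENTRIES 16  /* hypothetical: tracked in-flight load addresses */

typedef struct {
    uint64_t bloom;              /* 64-bit Bloom filter, cleared at LOOP start */
    uint64_t addr[CAM_ENTRIES];  /* stands in for the TCAM match array */
    bool     valid[CAM_ENTRIES];
    unsigned cam_lookups;        /* full searches performed (power proxy) */
} alias_tracker;

/* Two cheap hashes of an 8-byte-aligned address, one filter bit each. */
uint64_t bloom_mask(uint64_t addr) {
    uint64_t word = addr >> 3;
    unsigned h0 = (unsigned)(word & 63);
    unsigned h1 = (unsigned)((word * 0x9E3779B97F4A7C15ull) >> 58);
    return (1ull << h0) | (1ull << h1);
}

/* Called at LOOP start: empties both the filter and the CAM. */
void tracker_reset(alias_tracker *t) {
    memset(t, 0, sizeof *t);
}

/* Record a load address that has in-flight dependents.
   Returns false if the CAM is full. */
bool tracker_record_load(alias_tracker *t, uint64_t addr) {
    assert((addr & 7) == 0);  /* natural alignment assumed */
    for (int i = 0; i < CAM_ENTRIES; i++) {
        if (!t->valid[i]) {
            t->valid[i] = true;
            t->addr[i] = addr;
            t->bloom |= bloom_mask(addr);
            return true;
        }
    }
    return false;
}

/* Returns true if a store to addr matches a recorded load address.
   The filter rejects most non-matching stores without a CAM search. */
bool tracker_store_hits(alias_tracker *t, uint64_t addr) {
    assert((addr & 7) == 0);
    uint64_t m = bloom_mask(addr);
    if ((t->bloom & m) != m)
        return false;         /* filter says: definitely not present */
    t->cam_lookups++;         /* filter says "maybe": do the full search */
    for (int i = 0; i < CAM_ENTRIES; i++)
        if (t->valid[i] && t->addr[i] == addr)
            return true;
    return false;
}
```

Note the filter only ever accumulates bits, so it is emptied only by
the reset at LOOP start. Freeing individual entries is a separate problem.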

Something like a scoreboard but which retains historical versions
could track in-flight dependencies from source to sink.

Then we need some way to detect that there are no in-flight
dependencies remaining and recover the above resources.
So some way to backtrack through the dependency network
and determine that the last consumer has retired,
and recover all the tracking nodes and TCAM entries.
And ideally do all that resource recovery in parallel in 1 clock.
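
One way to picture the recovery step is reference counting on the
tracking nodes. The sketch below is only a sequential software model
under my own assumptions (the table size, the two-parent limit and
all names are hypothetical). It does nothing to address the
"in parallel in 1 clock" part, which is the hard bit in hardware.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 32   /* hypothetical tracking-table size */
#define NO_NODE   (-1)

typedef struct {
    int in_use;
    int refs;        /* producer plus derived values still in flight */
    int parent[2];   /* up to two source nodes this value came from */
} track_node;

typedef struct {
    track_node node[MAX_NODES];
    int live;        /* number of nodes currently allocated */
} track_table;

void table_init(track_table *t) {
    memset(t, 0, sizeof *t);
}

/* Allocate a node for a new value derived from p0 and p1 (either may
   be NO_NODE). The new node starts with one reference held by its
   producing instruction, and takes a reference on each parent.
   Returns the node index, or NO_NODE if the table is full. */
int node_alloc(track_table *t, int p0, int p1) {
    for (int n = 0; n < MAX_NODES; n++) {
        if (!t->node[n].in_use) {
            t->node[n].in_use = 1;
            t->node[n].refs = 1;
            t->node[n].parent[0] = p0;
            t->node[n].parent[1] = p1;
            if (p0 != NO_NODE) t->node[p0].refs++;
            if (p1 != NO_NODE) t->node[p1].refs++;
            t->live++;
            return n;
        }
    }
    return NO_NODE;
}

/* Drop one reference (e.g. the producer retired). When the count
   reaches zero the node is freed and the release walks back to its
   parents, which is the backtracking through the dependency network. */
void node_release(track_table *t, int n) {
    if (n == NO_NODE)
        return;
    assert(n >= 0 && n < MAX_NODES);
    assert(t->node[n].in_use && t->node[n].refs > 0);
    if (--t->node[n].refs > 0)
        return;
    t->node[n].in_use = 0;
    t->live--;
    node_release(t, t->node[n].parent[0]);
    node_release(t, t->node[n].parent[1]);
}
```

For example, two loads feeding one FMAC give three live nodes. Retiring
the loads frees nothing while the FMAC still references them, and
retiring the FMAC then frees all three in one walk.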

>> with such a hint available, even VVM LOOPs of 100,000 instructions in length
>> could still be performed with (up to) 8 elements at a time being thrown at the
>> back-end ALUs.
>>
>> l.
