Load/Store with auto-increment

<u35prk$2ssbq$1@dont-email.me>

 by: Marcus - Sat, 6 May 2023 14:56 UTC

Load/store with auto-increment/decrement can reduce the number of
instructions in many loops (especially those that mostly iterate over
arrays of data). It can also be used in function prologues and epilogues
(for push/pop functionality).

For a long time I had dismissed load/store with auto-increment for my
ISA (MRISC32). The reason is that a load operation with auto-increment
would have TWO results (the loaded value and the updated address base),
which would be a complication (all other instructions have at most one
result).

However, a couple of days ago I realized that store operations do not
have any result, so I could add instructions for store with auto-
increment, and still only have one result. I have a pretty good idea
of how to do it (instruction encoding etc), and it would fit fairly
well (the only oddity would be that the result register is not the
first register address in the instruction word, but the second register
address, which requires some more MUX:ing in the decoding stages).

The next question is: What flavors should I have?

- Post-increment (most common?)
- Post-decrement
- Pre-increment
- Pre-decrement (second most common?)

The "pre" variants would possibly add more logic to critical paths (e.g.
add more gate delay in the AGU before the address is ready for the
memory stage).
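For reference, the four flavors listed above differ only in whether the base address is adjusted before or after the access, and in which direction. A minimal C sketch, assuming 32-bit words; the helper names are hypothetical and are not MRISC32 instructions. Each helper returns the updated base, which corresponds to the single register result of a store with auto-increment:

```c
#include <stdint.h>

/* Hypothetical helpers modelling the four store flavors on a word pointer.
   Each returns the updated base address (the one register result). */

/* Post-increment: store at the old address, then advance. */
uint32_t *store_post_inc(uint32_t *base, uint32_t value) {
    *base = value;
    return base + 1;
}

/* Post-decrement: store at the old address, then step back. */
uint32_t *store_post_dec(uint32_t *base, uint32_t value) {
    *base = value;
    return base - 1;
}

/* Pre-increment: advance first, then store at the new address. */
uint32_t *store_pre_inc(uint32_t *base, uint32_t value) {
    base = base + 1;
    *base = value;
    return base;
}

/* Pre-decrement: step back first, then store (the usual "push"). */
uint32_t *store_pre_dec(uint32_t *base, uint32_t value) {
    base = base - 1;
    *base = value;
    return base;
}
```

In the "post" helpers the store address is the unmodified base, while in the "pre" helpers the addition sits in front of the access, which is the extra AGU delay mentioned above.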

Any thoughts? Is it worth it?

/Marcus

Re: Load/Store with auto-increment

<u35s66$280b$1@gal.iecc.com>

 by: John Levine - Sat, 6 May 2023 15:36 UTC

According to Marcus <m.delete@this.bitsnbites.eu>:
>Load/store with auto-increment/decrement can reduce the number of
>instructions in many loops (especially those that mostly iterate over
>arrays of data). It can also be used in function prologues and epilogues
>(for push/pop functionality). ...

>Any thoughts? Is it worth it?

Autoincrement was quite popular in the 1960s and 70s. The DEC 12- and
18-bit minis and the DG Nova had a version of it where specific
addresses would autoincrement or decrement when used as indirect
addresses. I did a fair amount of PDP-8 programming and those
autoincrement locations were precious, which said as much about the
limits of the 8's instruction set as anything else.

The PDP-11 generalized this to useful modes -(R) and (R)+ to
predecrement or postincrement any register when used as an address,
which is how it handled stacks and the simple cases of stepping
through a string or array.

It also had indirect versions of both, @(R)+ which was useful for
stepping through an array of pointers (one instruction dispatch for
threaded code or coroutines) and @-(R) which turned out to be useless
and was dropped in the VAX.
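In C terms, -(R) and (R)+ correspond to `*--p` and `*p++`. A minimal sketch of the stack and string cases described above, with hypothetical helper names and assuming a stack of 16-bit words that grows downward:

```c
#include <stdint.h>

/* -(R): decrement the register, then use it as the address (push). */
void push_word(uint16_t **sp, uint16_t value) {
    *--(*sp) = value;
}

/* (R)+: use the register as the address, then increment it (pop). */
uint16_t pop_word(uint16_t **sp) {
    return *(*sp)++;
}

/* (R)+ used to step through a zero-terminated string of bytes. */
unsigned count_bytes(const char *s) {
    unsigned n = 0;
    while (*s++ != '\0')
        n++;
    return n;
}
```

Pushing and then popping leaves the stack pointer where it started, which is why this one pair of modes was enough to treat any register as a stack pointer.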

Here it is 50 years later and they're all gone. I think the increase
in code density wasn't worth the contortions to ensure that your data
structures fit the few cases that the autoincrement modes handled. It
also made it harder to parallelize and pipeline stuff since address
modes had side effects that had to be scheduled around or potentially
unwound in a page fault.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Load/Store with auto-increment

<u3634o$2uha6$1@dont-email.me>

 by: Marcus - Sat, 6 May 2023 17:35 UTC

On 2023-05-06, John Levine wrote:
> Here it is 50 years later and they're all gone. I think the increase
> in code density wasn't worth the contortions to ensure that your data
> structures fit the few cases that the autoincrement modes handled. It
> also made it harder to parallelize and pipeline stuff since address
> modes had side effects that had to be scheduled around or potentially
> unwound in a page fault.

Actually, ARM has auto-increment (even AArch64). I think that if you
limit what you can do (not the crazy multi-memory-access instructions
that were popular in CISC, e.g. 68k), you should not have any problems
with page fault handling etc. Unless...

Does the auto-increment instruction implicitly introduce a data-
dependency that's also dependent on the memory operation to complete?
Is there any real difference compared to doing the memory operation
and the address increment in two separate instructions (in an OoO
machine)?

/Marcus

Re: Load/Store with auto-increment

<e37d732c-7f37-4631-b785-b860a1f0a6edn@googlegroups.com>

 by: MitchAlsup - Sat, 6 May 2023 18:15 UTC

On Saturday, May 6, 2023 at 9:59:03 AM UTC-5, Marcus wrote:
> Load/store with auto-increment/decrement can reduce the number of
> instructions in many loops (especially those that mostly iterate over
> arrays of data). It can also be used in function prologues and epilogues
> (for push/pop functionality).
<
Can it actually save instructions ??
<
p = <some address>;
q = <some other address>;
for( i = 0; i < max; i++ )
    *p++ = *q++;
<
LDA Rp,[IP,,displacement1]
LDA Rq,[IP,,displacement2]
MOV Ri,#0
VEC Rt,{}
top_of_loop:
LDSW Rqm,[Rq+Ri<<2]
STW Rqm,[Rp+Ri<<2]
LOOP LE,Ri,#1,Rmax
end_of_loop:
>
Which instruction can be saved in this loop??
<
> For a long time I had dismissed load/store with auto-increment for my
> ISA (MRISC32). The reason is that a load operation with auto-increment
> would have TWO results (the loaded value and the updated address base),
<
That is the first problem.
<
> which would be a complication (all other instructions have at most one
> result).
>
> However, a couple of days ago I realized that store operations do not
> have any result, so I could add instructions for store with auto-
> increment, and still only have one result. I have a pretty good idea
> of how to do it (instruction encoding etc), and it would fit fairly
> well (the only oddity would be that the result register is not the
> first register address in the instruction word, but the second register
> address, which requires some more MUX:ing in the decoding stages).
<
So, autoincrement on STs only ??
>
> The next question is: What flavors should I have?
>
> - Post-increment (most common?)
> - Post-decrement
> - Pre-increment
> - Pre-decrement (second most common?)
<
Not having these eliminates having to choose.
>
> The "pre" variants would possibly add more logic to critical paths (e.g.
> add more gate delay in the AGU before the address is ready for the
> memory stage).
>
> Any thoughts? Is it worth it?
<
In my opinion, needing autoincrements is a sign of a weak ISA and
possibly that of a less than stellar compiler.
>
> /Marcus

Re: Load/Store with auto-increment

<13a8c99c-2022-4d9d-9f03-fb80f5c25a44n@googlegroups.com>

 by: MitchAlsup - Sat, 6 May 2023 18:17 UTC

On Saturday, May 6, 2023 at 12:36:30 PM UTC-5, Marcus wrote:
> On 2023-05-06, John Levine wrote:
> > Here it is 50 years later and they're all gone. I think the increase
> > in code density wasn't worth the contortions to ensure that your data
> > structures fit the few cases that the autoincrement modes handled. It
> > also made it harder to parallelize and pipeline stuff since address
> > modes had side effects that had to be scheduled around or potentially
> > unwound in a page fault.
> Actually, ARM has auto-increment (even AArch64). I think that if you
> limit what you can do (not the crazy multi-memory-access instructions
> that were popular in CISC, e.g. 68k), you should not have any problems
> with page fault handling etc. Unless...
>
> Does the auto-increment instruction implicitly introduce a data-
> dependency that's also dependent on the memory operation to complete?
<
Not necessarily, but it does create a base-register to base-register
dependency on uses of the addressing register. So, memory is not
compromised, but use of the register can be.
<
> Is there any real difference compared to doing the memory operation
> and the address increment in two separate instructions (in an OoO
> machine)?
>
> /Marcus

Re: Load/Store with auto-increment

<u36fd2$121nc$1@newsreader4.netcologne.de>

 by: Thomas Koenig - Sat, 6 May 2023 21:04 UTC

Marcus <m.delete@this.bitsnbites.eu> schrieb:
> Load/store with auto-increment/decrement can reduce the number of
> instructions in many loops (especially those that mostly iterate over
> arrays of data). It can also be used in function prologues and epilogues
> (for push/pop functionality).

One step further: You can have something like POWER's load and
store with update. For example,

ldux rt,ra,rb

will load a doubleword from the address ra + rb and set ra to
ra + rb, or

ldu rt,num(ra)

will load rt from num + ra and set ra = ra + num.

You can simulate autoincrement/autodecrement if you write

ldu rt,8(ra)

or

ldu rt,-8(ra)

respectively.
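The update form described above can be modelled in C roughly as follows. This is only a sketch of the semantics as stated in the text (effective address = base + displacement, and the base register receives the effective address); the function name and the use of a byte pointer for the base register are assumptions made for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of a load-with-update. The function has two results: the loaded
   doubleword (return value) and the updated base register (*ra). */
uint64_t load_update(const uint8_t **ra, ptrdiff_t disp) {
    const uint8_t *ea = *ra + disp;  /* effective address = ra + disp */
    uint64_t rt;
    memcpy(&rt, ea, sizeof rt);      /* load the doubleword at ea */
    *ra = ea;                        /* write ea back to the base register */
    return rt;
}
```

Because the access uses the updated address, a displacement of 8 or -8 behaves like a pre-increment or pre-decrement by one doubleword.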

> For a long time I had dismissed load/store with auto-increment for my
> ISA (MRISC32). The reason is that a load operation with auto-increment
> would have TWO results (the loaded value and the updated address base),
> which would be a complication (all other instructions have at most one
> result).

Exactly.

> However, a couple of days ago I realized that store operations do not
> have any result, so I could add instructions for store with auto-
> increment, and still only have one result.

That would create a rather weird asymmetry between load and store.
It could also create problems for the compiler - I'm not sure that
gcc is set up to easily handle different addressing modes for load
and store.

> I have a pretty good idea
> of how to do it (instruction encoding etc), and it would fit fairly
> well (the only oddity would be that the result register is not the
> first register address in the instruction word, but the second register
> address, which requires some more MUX:ing in the decoding stages).
>
> The next question is: What flavors should I have?
>
> - Post-increment (most common?)
> - Post-decrement
> - Pre-increment
> - Pre-decrement (second most common?)

If you want to save instructions in a loop and have a "compare to zero"
instruction (which I seem to remember you do), then a negative index
could be something else to try.

Consider transforming

for (int i=0; i<n; i++)
    a[i] = b[i] + 2;

into

int *ap = a + n;
int *bp = b + n;
for (int i=-n; i != 0; i++)
    ap[i] = bp[i] + 2;

and expressing the body of the loop as

start:
ldd r1,rb,-ri
addi r1,r1,2
std r1,ra,-ri
add ri,ri,1
bne0 ri,start

Hmm... is there any ISA which allows for both negative and positive
indexing?
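The negative-index transformation can be written out as two complete C functions to check that both forms compute the same thing. This is a sketch that assumes int elements and n >= 0; the function names are made up for illustration:

```c
#include <stddef.h>

/* Reference form: the index counts up from 0 and is compared against n. */
void add2_forward(int *a, const int *b, ptrdiff_t n) {
    for (ptrdiff_t i = 0; i < n; i++)
        a[i] = b[i] + 2;
}

/* Transformed form: both pointers are moved one past the end and a
   negative index counts up to zero, so the loop test is a compare
   against zero. Requires n >= 0. */
void add2_negative_index(int *a, const int *b, ptrdiff_t n) {
    int *ap = a + n;
    const int *bp = b + n;
    for (ptrdiff_t i = -n; i != 0; i++)
        ap[i] = bp[i] + 2;
}
```

The second form needs only one induction variable, and its loop test needs no separate limit register.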

> The "pre" variants would possibly add more logic to critical paths (e.g.
> add more gate delay in the AGU before the address is ready for the
> memory stage).
>
> Any thoughts? Is it worth it?

Not sure it is - this kind of instruction will be split into two
micro-instructions on any OoO machine, and probably for in-order,
as well.

Re: Load/Store with auto-increment

<1b0d15ec-a257-483f-82ec-a751774b1d9fn@googlegroups.com>

 by: MitchAlsup - Sun, 7 May 2023 02:00 UTC

Consider a string of *p++
a = *p++;
b = *p++;
c = *p++;
<
Here we see the failure of the ++ or -- notation.
The LD of b is dependent on the ++ of a
The LD of c is dependent on the ++ of b
Whereas if the above was written::
<
a = p[0];
b = p[1];
c = p[2];
p += 3;
<
Now all three LDs are independent and can issue/execute/retire
simultaneously. Also, the add to p is independent, so we took
3 "instructions" that were serially dependent and made them into
4 instructions that are completely independent in all phases of
execution.
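The two forms can be put side by side as complete C functions to confirm that they return the same values and leave the pointer in the same place. This is a sketch with hypothetical function names; an optimizing compiler may well generate identical code for both, so the difference only matters when each `*p++` maps one-to-one onto an auto-increment instruction:

```c
/* Three post-increment loads: each address comes from the previous update. */
int sum3_postinc(const int **pp) {
    const int *p = *pp;
    int a = *p++;
    int b = *p++;
    int c = *p++;
    *pp = p;
    return a + b + c;
}

/* Three displacement loads from the same base, plus one separate add. */
int sum3_displacement(const int **pp) {
    const int *p = *pp;
    int a = p[0];
    int b = p[1];
    int c = p[2];
    p += 3;
    *pp = p;
    return a + b + c;
}
```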

Re: Load/Store with auto-increment

<u3744r$3779u$1@dont-email.me>

 by: BGB - Sun, 7 May 2023 02:58 UTC

On 5/6/2023 1:15 PM, MitchAlsup wrote:
> On Saturday, May 6, 2023 at 9:59:03 AM UTC-5, Marcus wrote:
>> Load/store with auto-increment/decrement can reduce the number of
>> instructions in many loops (especially those that mostly iterate over
>> arrays of data). It can also be used in function prologues and epilogues
>> (for push/pop functionality).
> <
> Can it actually save instructions ??
> <
> p = <some address>;
> q = <some other address>;
> for( i = 0; i < max; i++ )
> *p++ = *q++;
> <
> LDA Rp,[IP,,displacement1]
> LDA Rq,[IP,,displacement2]
> MOV Ri,#0
> VEC Rt,{}
> top_of_loop:
> LDSW Rqm,[Rq+Ri<<2]
> STW Rqm,[Rp+Ri<<2]
> LOOP LE,Ri,#1,Rmax
> end_of_loop:
>>
> Which instruction can be saved in this loop??
> <
>> For a long time I had dismissed load/store with auto-increment for my
>> ISA (MRISC32). The reason is that a load operation with auto-increment
>> would have TWO results (the loaded value and the updated address base),
> <
> That is the first problem.
> <
>> which would be a complication (all other instructions have at most one
>> result).
>>
>> However, a couple of days ago I realized that store operations do not
>> have any result, so I could add instructions for store with auto-
>> increment, and still only have one result. I have a pretty good idea
>> of how to do it (instruction encoding etc), and it would fit fairly
>> well (the only oddity would be that the result register is not the
>> first register address in the instruction word, but the second register
>> address, which requires some more MUX:ing in the decoding stages).
> <
> So, autoincrement on STs only ??
>>
>> The next question is: What flavors should I have?
>>
>> - Post-increment (most common?)
>> - Post-decrement
>> - Pre-increment
>> - Pre-decrement (second most common?)
> <
> Not having these eliminates having to choose.
>>
>> The "pre" variants would possibly add more logic to critical paths (e.g.
>> add more gate delay in the AGU before the address is ready for the
>> memory stage).
>>
>> Any thoughts? Is it worth it?
> <
> In my opinion, needing autoincrements is a sign of a weak ISA and
> possibly that of a less than stellar compiler.

I skipped auto-increment as it typically saves "hardly anything" (at
best) and adds an awkward case that needs to be decomposed into two
sub-operations (most other cases).

So, I didn't really feel it was "worth it".

It could almost make sense on a 1-wide machine, except that one needs to
add one of the main expensive parts of a 2-wide machine in order to
support it (and on a superscalar machine, the increment would likely end
up running in parallel with some other op anyways).

....

For register save/restore, maybe it makes sense:
But, one can use normal displacement loads/stores and a single big
adjustment instead;
Things like "*ptr++" could use it, but are still not common enough to
make it significant (combined with the thing of the "ptr++" part usually
just running in parallel with another op anyways).
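The register save case can be sketched in C as follows, assuming a downward-growing stack of 64-bit slots and three registers to save; the function names and the register count are illustrative only. Both versions produce the same frame layout:

```c
#include <stdint.h>

/* Save three registers with three pre-decrement stores ("push" style).
   Each store depends on the stack pointer update of the previous one. */
uint64_t *save_with_push(uint64_t *sp, uint64_t r0, uint64_t r1, uint64_t r2) {
    *--sp = r0;
    *--sp = r1;
    *--sp = r2;
    return sp;
}

/* Same layout with one stack pointer adjustment followed by plain
   displacement stores that all use the same base. */
uint64_t *save_with_displacement(uint64_t *sp, uint64_t r0, uint64_t r1, uint64_t r2) {
    sp -= 3;
    sp[2] = r0;
    sp[1] = r1;
    sp[0] = r2;
    return sp;
}
```

The second version uses one extra instruction for the adjustment, but needs no auto-increment addressing mode.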

>>
>> /Marcus

Re: Load/Store with auto-increment

<589f8ad0-afa6-455e-b896-11e988c77112n@googlegroups.com>

 by: robf...@gmail.com - Sun, 7 May 2023 07:36 UTC

On Saturday, May 6, 2023 at 10:58:39 PM UTC-4, BGB wrote:
> On 5/6/2023 1:15 PM, MitchAlsup wrote:
> > On Saturday, May 6, 2023 at 9:59:03 AM UTC-5, Marcus wrote:
> >> Load/store with auto-increment/decrement can reduce the number of
> >> instructions in many loops (especially those that mostly iterate over
> >> arrays of data). It can also be used in function prologues and epilogues
> >> (for push/pop functionality).
> > <
> > Can it actually save instructions ??
> > <
> > p = <some address>;
> > q = <some other address>;
> > for( i = 0; i < max; i++ )
> > *p++ = *q++;
> > <
> > LDA Rp,[IP,,displacement1]
> > LDA Rq,[IP,,displacement2]
> > MOV Ri,#0
> > VEC Rt,{}
> > top_of_loop:
> > LDSW Rqm,[Rq+Ri<<2]
> > STW Rqm,[Rp+Ri<<2]
> > LOOP LE,Ri,#1,Rmax
> > end_of_loop:
> >>
> > Which instruction can be saved in this loop??
> > <
> >> For a long time I had dismissed load/store with auto-increment for my
> >> ISA (MRISC32). The reason is that a load operation with auto-increment
> >> would have TWO results (the loaded value and the updated address base),
> > <
> > That is the first problem.
> > <
> >> which would be a complication (all other instructions have at most one
> >> result).
> >>
> >> However, a couple of days ago I realized that store operations do not
> >> have any result, so I could add instructions for store with auto-
> >> increment, and still only have one result. I have a pretty good idea
> >> of how to do it (instruction encoding etc), and it would fit fairly
> >> well (the only oddity would be that the result register is not the
> >> first register address in the instruction word, but the second register
> >> address, which requires some more MUX:ing in the decoding stages).
> > <
> > So, autoincrement on STs only ??
> >>
> >> The next question is: What flavors should I have?
> >>
> >> - Post-increment (most common?)
> >> - Post-decrement
> >> - Pre-increment
> >> - Pre-decrement (second most common?)
> > <
> > Not having these eliminates having to choose.
> >>
> >> The "pre" variants would possibly add more logic to critical paths (e.g.
> >> add more gate delay in the AGU before the address is ready for the
> >> memory stage).
> >>
> >> Any thoughts? Is it worth it?
> > <
> > In my opinion, needing autoincrements is a sign of a weak ISA and
> > possibly that of a less than stellar compiler.
> I skipped auto-increment as it typically saves "hardly anything" (at
> best) and adds an awkward case that needs to be decomposed into two
> sub-operations (most other cases).
>
> So, I didn't really feel it was "worth it".
>
> It could almost make sense on a 1-wide machine, except that one needs to
> add one of the main expensive parts of a 2-wide machine in order to
> support it (and on a superscalar machine, the increment would likely end
> up running in parallel with some other op anyways).
>
> ...
>
>
> For register save/restore, maybe it makes sense:
> But, one can use normal displacement loads/stores and a single big
> adjustment instead;
> Things like "*ptr++" could use it, but are still not common enough to
> make it significant (combined with the thing of the "ptr++" part usually
> just running in parallel with another op anyways).
>
>
> >>
> >> /Marcus

Auto inc/dec can be difficult for the compiler to make use of. Sometimes
the p++ will end up as a separate add anyway. If scaled indexed
addressing is available, the loop counter can often serve as the index,
and the loop increment is needed anyway:
p[n] = q[n];
n++;
I used the extra bits available in the load/store instructions to indicate
the cacheability of the data. That requires compiler support, though.
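
As an illustration of the two loop shapes being compared, here is a minimal C sketch. The function names copy_indexed and copy_bumped are made up for this example, and which form a given compiler actually emits depends on the target and optimization level.

```c
#include <stddef.h>

/* Indexed form: one induction variable n serves both arrays.
   With scaled indexed addressing each access is base + n*sizeof(int),
   and n++ is the loop increment that is needed anyway. */
void copy_indexed(int *p, const int *q, size_t max)
{
    for (size_t n = 0; n < max; n++)
        p[n] = q[n];
}

/* Pointer-bumping form: two pointers are updated per iteration,
   which is the case auto-increment addressing is aimed at. */
void copy_bumped(int *p, const int *q, size_t max)
{
    for (size_t n = 0; n < max; n++)
        *p++ = *q++;
}
```

Both functions copy the same data; the difference is only in how many address registers have to be updated per iteration.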

Having a push instruction can be handy, and good for code density if it
can push multiple registers in a single instruction.

I have multi-register loads and stores in groups of eight registers for
Thor. They are based on filling up an entire cache line with register data
and then issuing a single load or store operation.

Re: Load/Store with auto-increment

<2023May7.140701@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=32026&group=comp.arch#32026

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Load/Store with auto-increment
Date: Sun, 07 May 2023 12:07:01 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 66
Message-ID: <2023May7.140701@mips.complang.tuwien.ac.at>
References: <u35prk$2ssbq$1@dont-email.me>
Injection-Info: dont-email.me; posting-host="3e7108f4b3fcf6152a3c6f6b0d93514a";
logging-data="3556052"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19seCGWc4hzrf2WTqj5Rb9r"
Cancel-Lock: sha1:F5W5h6ujkGc/bzPUNQsQ0NuBulU=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sun, 7 May 2023 12:07 UTC

Marcus <m.delete@this.bitsnbites.eu> writes:
>Load/store with auto-increment/decrement can reduce the number of
>instructions in many loops (especially those that mostly iterate over
>arrays of data).

Yes.

If you do it only for stores, as suggested below, it could be used for
loops that read from one or more arrays and write to one array, all
with the same stride, as follows (in pseudo-C-code):

/* read from a and b, write to c */
da=a-c;
db=b-c;
for (...) {
*c = c[da] * c[db];
c+=stride;
}

the "c+=stride" could become the autoincrement of the store.
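
The pseudo-code above can be turned into compilable C under one extra assumption: a, b and c must lie in the same array object, because subtracting pointers into different objects is undefined in C. The function name and the parameters n and stride are hypothetical; this is a sketch of the idea, not a recommended way to write such a loop.

```c
#include <stddef.h>

/* Sketch only: a, b and c must all point into the SAME array object,
   otherwise the pointer subtractions below are undefined in C.
   The caller must also make sure c never moves further than one
   element past the end of that object. */
void mul_arrays_one_pointer(const int *a, const int *b, int *c,
                            size_t n, ptrdiff_t stride)
{
    ptrdiff_t da = a - c;   /* constant distance from c to a */
    ptrdiff_t db = b - c;   /* constant distance from c to b */

    for (size_t i = 0; i < n; i++) {
        *c = c[da] * c[db]; /* two loads at c+da and c+db, one store at c */
        c += stride;        /* candidate for the store's auto-increment */
    }
}
```

Only c changes inside the loop; the loads reach a and b through register-plus-register addressing, so a store-only auto-increment is enough to advance all three streams.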

>It can also be used in function prologues and epilogues
>(for push/pop functionality).

Not so great, because it introduces data dependencies between the
stores that you then have to get rid of if you want to support more
than one store per cycle. As for the pops, those are loads, and here
the autoincrement would require an additional write port to the
register file, as you point out below; plus it would introduce data
dependencies that you don't want (many cores support more than one
load per cycle).
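
The dependency argument can be made concrete with a small C model of a register-save sequence. The function names and the three-register save area are assumptions made for this sketch; real prologues are of course emitted by the compiler in assembly.

```c
#include <stdint.h>

/* Push style: every store uses the stack pointer produced by the
   previous pre-decrement, so the address computations form a chain. */
uint64_t *save_with_push(uint64_t *sp, const uint64_t regs[3])
{
    *--sp = regs[0];
    *--sp = regs[1];
    *--sp = regs[2];
    return sp;
}

/* Displacement style: one stack adjustment, then stores at fixed
   offsets from the same base; no store address depends on another. */
uint64_t *save_with_displacement(uint64_t *sp, const uint64_t regs[3])
{
    sp -= 3;
    sp[2] = regs[0];
    sp[1] = regs[1];
    sp[0] = regs[2];
    return sp;
}
```

Both versions leave the same memory image and the same final stack pointer; they differ only in whether the three store addresses depend on each other.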

>The next question is: What flavors should I have?
>
>- Post-increment (most common?)
>- Post-decrement
>- Pre-increment
>- Pre-decrement (second most common?)
>
>The "pre" variants would possibly add more logic to critical paths (e.g.
>add more gate delay in the AGU before the address is ready for the
>memory stage).

You typically have memory-access instructions that include an addition
in the address computation; in that case pre obviously has no extra
cost. The cost of the addition can be reduced (eliminated) with a
technique called sum-addressed memory. OTOH, IA-64 supports only
memory accesses of an address given in a register, so here the
architects apparently thought that sum-addressed memory is still too
slow.

Increment vs. decrement: If your store supports reading two registers
for address computation (in addition to the data register), you can
put the stride in a register, making the whole question moot. Even if
you only support reading one register in addition to the data, you can
have a sign-extended constant stride, again giving you both increment
and decrement options. Note that having a store that does not support
the sum of two registers, but does support autoincrement, and a load
that supports the sum of two registers as the address means that both
loads and stores can read two registers and write one register, which
may be useful for certain microarchitectural approaches.
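
To show why a signed stride makes the increment/decrement distinction moot, here is a small C model of a store with post-update. The names store_post_update and fill_strided are hypothetical and do not correspond to any particular ISA; the stride is counted in elements.

```c
#include <stddef.h>

/* Model of a "store with post-update": store value at *addr, then
   return the address advanced by a signed element stride.
   Hypothetical helper, not taken from any real instruction set. */
static inline int *store_post_update(int *addr, int value, ptrdiff_t stride)
{
    *addr = value;
    return addr + stride;
}

/* Store value into n elements starting at start, stepping by stride.
   A positive stride behaves like post-increment, a negative one like
   post-decrement. The caller must keep the pointer inside the array
   (or one past its end) after every step. */
void fill_strided(int *start, size_t n, ptrdiff_t stride, int value)
{
    int *p = start;
    for (size_t i = 0; i < n; i++)
        p = store_post_update(p, value, stride);
}
```

The same helper covers both directions, which is the point of putting the stride in a register or in a sign-extended immediate.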

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Load/Store with auto-increment

<2023May7.174702@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=32027&group=comp.arch#32027

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Load/Store with auto-increment
Date: Sun, 07 May 2023 15:47:02 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 31
Message-ID: <2023May7.174702@mips.complang.tuwien.ac.at>
References: <u35prk$2ssbq$1@dont-email.me> <u35s66$280b$1@gal.iecc.com>
Injection-Info: dont-email.me; posting-host="3e7108f4b3fcf6152a3c6f6b0d93514a";
logging-data="3613319"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18WlKaqQJKyGDZKl2YMo14g"
Cancel-Lock: sha1:GiHQzA0U8C3SvdKw5cZbaGu8FE8=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sun, 7 May 2023 15:47 UTC

John Levine <johnl@taugh.com> writes:
>Here it is 50 years later and they're all gone.

PowerPC and ARM A32 are still there. And there's even a new
architecture with auto-increment: ARM A64.

>I think the increase
>in code density wasn't worth the contortions to ensure that your data
>structures fit the few cases that the autoincrement modes handled.

Are you thinking of the DSPs that do not have displacement addressing,
but have auto-increment, leading to a number of papers on how the
compiler should arrange the variables to make best use of that?

With displacement addressing no such contortions are necessary.

>It
>also made it harder to parallelize and pipeline stuff since address
>modes had side effects that had to be scheduled around or potentially
>unwound in a page fault.

Pipelining was apparently no problem, as evidenced by several early
RISCs (ARM (A32), HPPA, PowerPC) having auto-increment. Just don't
write the address register before verifying the address. And
parallelizing is no problem, either: IA-64 was designed for Explicitly
Parallel Instruction Computing, and has auto-increment.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Load/Store with auto-increment

<O_P5M.569945$5S78.177648@fx48.iad>

https://www.novabbs.com/devel/article-flat.php?id=32028&group=comp.arch#32028

X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: sco...@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Load/Store with auto-increment
Newsgroups: comp.arch
References: <u35prk$2ssbq$1@dont-email.me> <u36fd2$121nc$1@newsreader4.netcologne.de> <1b0d15ec-a257-483f-82ec-a751774b1d9fn@googlegroups.com>
Lines: 24
Message-ID: <O_P5M.569945$5S78.177648@fx48.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sun, 07 May 2023 16:05:02 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sun, 07 May 2023 16:05:02 GMT
X-Received-Bytes: 1504
 by: Scott Lurndal - Sun, 7 May 2023 16:05 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>Consider a string of *p++
> a = *p++;
> b = *p++;
> c = *p++;
><
>Here we see the failure of the ++ or -- notation.
>The LD of b is dependent on the ++ of a
>The LD of c is dependent on the ++ of b
>Whereas if the above was written::
><
> a = p[0];
> b = p[1];
> c = p[2];
> p +=3;
><
>Now all three LDs are independent and can issue/execute/retire
>simultaneously. Also, the add to p is independent, so we took
>3 "instructions" that were serially dependent and make them into
>4 instructions that are completely independent in all phases of
>execution.

Can the compiler not recognize the first pattern and convert
it into the second form under the as-if rule?

Re: Load/Store with auto-increment

<u38l6v$1shq$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=32029&group=comp.arch#32029

From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: Load/Store with auto-increment
Date: Sun, 7 May 2023 16:55:59 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <u38l6v$1shq$1@gal.iecc.com>
References: <u35prk$2ssbq$1@dont-email.me> <u35s66$280b$1@gal.iecc.com> <2023May7.174702@mips.complang.tuwien.ac.at>
Injection-Date: Sun, 7 May 2023 16:55:59 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="62010"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <u35prk$2ssbq$1@dont-email.me> <u35s66$280b$1@gal.iecc.com> <2023May7.174702@mips.complang.tuwien.ac.at>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Sun, 7 May 2023 16:55 UTC

It appears that Anton Ertl <anton@mips.complang.tuwien.ac.at> said:
>PowerPC and ARM A32 are still there. And there's even a new
>architecture with auto-increment: ARM A64.

I need to take a look.

>>I think the increase
>>in code density wasn't worth the contortions to ensure that your data
>>structures fit the few cases that the autoincrement modes handled.
>
>Are you thinking of the DSPs that do not have displacement addressing,
>but have auto-increment, leading to a number of papers on how the
>compiler should arrange the variables to make best use of that?

Autoincrement only increments by the size of a single datum so it
works for strings and vectors, not for arrays of structures or 2-D
arrays. Compare it to the 360's BXLE loop closing instruction which
put the stride in a register so it could be whatever you wanted.
It also had base+index which the Vax did but the PDP-11 only sort
of did if you used absolute addresses instead of a base.
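
A C sketch of the stride point: walking one column of a row-major 2-D array needs a step of a whole row, not of one datum. The loop below is loosely modelled on the add-stride, compare-to-limit, branch pattern of a BXLE-style loop; the function name and parameters are assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Sum column col of a rows x cols row-major matrix.
   The index advances by cols elements per iteration, so an
   autoincrement fixed at "one datum" would not fit this loop. */
double sum_column(const double *m, size_t rows, size_t cols, size_t col)
{
    if (rows == 0 || cols == 0)
        return 0.0;
    assert(col < cols);

    size_t stride = cols;                   /* would live in a register */
    size_t limit = (rows - 1) * cols + col; /* index of the last element */
    double sum = 0.0;

    for (size_t idx = col; idx <= limit; idx += stride)
        sum += m[idx];
    return sum;
}
```

Because the stride is an ordinary value, the same loop shape works for arrays of structures or any other fixed step.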

On the PDP-11 autoincrement allowed a two instruction string copy loop:

c: movb (r1)+,(r2)+
bne c ; loop if the byte wasn't zero

but how useful is that now? I don't know.
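
For readers who do not know PDP-11 assembly, the C idiom that this two-instruction loop corresponds to is sketched below. The function name copy_string is made up for the example; the standard library's strcpy does the same job.

```c
/* Copy a NUL-terminated string and return dst. Both pointers are
   post-incremented on every byte, as in movb (r1)+,(r2)+, and the
   loop ends once the terminating zero byte has been copied. */
char *copy_string(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}
```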

>With displacement addressing no such contortions are necessary.

I don't see how that solves the stride problem. Or did you mean
something else?

>>It also made it harder to parallelize and pipeline stuff since address
>>modes had side effects that had to be scheduled around or potentially
>>unwound in a page fault.
>
>Pipelining was apparently no problem, as evidenced by several early
>RISCs (ARM (A32), HPPA, PowerPC) having auto-increment. Just don't
>write the address register before verifying the address. ...

Do they have the kind of hazards that the -11 and Vax did, where you could
autoincrement the same register more than once in a single instruction, or
use the incremented register as an operand? That made things messy.
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Load/Store with auto-increment

<u38o39$3f5ig$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=32030&group=comp.arch#32030

From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.arch
Subject: Re: Load/Store with auto-increment
Date: Sun, 7 May 2023 19:45:13 +0200
Organization: A noiseless patient Spider
Lines: 30
Message-ID: <u38o39$3f5ig$1@dont-email.me>
References: <u35prk$2ssbq$1@dont-email.me>
<u36fd2$121nc$1@newsreader4.netcologne.de>
<1b0d15ec-a257-483f-82ec-a751774b1d9fn@googlegroups.com>
<O_P5M.569945$5S78.177648@fx48.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 7 May 2023 17:45:13 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="421705caa6da4b74b729e5a74b7f32c2";
logging-data="3642960"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19gLZEj8qhxCZ0QXN7gZysWVaNbJ/qqax8="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.7.1
Cancel-Lock: sha1:N1Io1iZBI5D9hhBSc2PiBNTEpEI=
In-Reply-To: <O_P5M.569945$5S78.177648@fx48.iad>
Content-Language: en-GB
 by: David Brown - Sun, 7 May 2023 17:45 UTC

On 07/05/2023 18:05, Scott Lurndal wrote:
> MitchAlsup <MitchAlsup@aol.com> writes:
>> Consider a string of *p++
>> a = *p++;
>> b = *p++;
>> c = *p++;
>> <
>> Here we see the failure of the ++ or -- notation.
>> The LD of b is dependent on the ++ of a
>> The LD of c is dependent on the ++ of b
>> Whereas if the above was written::
>> <
>> a = p[0];
>> b = p[1];
>> c = p[2];
>> p +=3;
>> <
>> Now all three LDs are independent and can issue/execute/retire
>> simultaneously. Also, the add to p is independent, so we took
>> 3 "instructions" that were serially dependent and make them into
>> 4 instructions that are completely independent in all phases of
>> execution.
>
> Can the compiler not recognize the first pattern and convert
> it into the second form under the as-if rule?

Yes, and compilers have done such conversions for decades. (Of course,
that assumes you are not dealing with external data, or expressions that
could alias each other.)
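
A small C sketch of the two forms, with the no-aliasing assumption written down explicitly. The function names are hypothetical, and restrict is used here as an assumption that out does not overlap the data read through p; without such a guarantee the compiler has to keep the loads and stores in source order.

```c
/* Serial form: each load address comes from the previous increment. */
const int *load3_increment(const int *restrict p, int *restrict out)
{
    out[0] = *p++;
    out[1] = *p++;
    out[2] = *p++;
    return p;
}

/* Displacement form: three independent loads and a single add. */
const int *load3_displacement(const int *restrict p, int *restrict out)
{
    out[0] = p[0];
    out[1] = p[1];
    out[2] = p[2];
    return p + 3;
}
```

Under the as-if rule a compiler may turn the first function into the second, since both produce the same values and the same final pointer.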

Re: Load/Store with auto-increment

<u38o72$3f5ig$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=32031&group=comp.arch#32031

From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.arch
Subject: Re: Load/Store with auto-increment
Date: Sun, 7 May 2023 19:47:14 +0200
Organization: A noiseless patient Spider
Lines: 34
Message-ID: <u38o72$3f5ig$2@dont-email.me>
References: <u35prk$2ssbq$1@dont-email.me> <u35s66$280b$1@gal.iecc.com>
<2023May7.174702@mips.complang.tuwien.ac.at> <u38l6v$1shq$1@gal.iecc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 7 May 2023 17:47:15 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="421705caa6da4b74b729e5a74b7f32c2";
logging-data="3642960"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX192txeTFq3mqYdn9V/tRMnbtZQC5Z88sQ0="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.7.1
Cancel-Lock: sha1:iz5Bbh4JGmnhm3Mdg7J7TOpCJUw=
In-Reply-To: <u38l6v$1shq$1@gal.iecc.com>
Content-Language: en-GB
 by: David Brown - Sun, 7 May 2023 17:47 UTC

On 07/05/2023 18:55, John Levine wrote:
> It appears that Anton Ertl <anton@mips.complang.tuwien.ac.at> said:
>> PowerPC and ARM A32 are still there. And there's even a new
>> architecture with auto-increment: ARM A64.
>
> I need to take a look.
>
>>> I think the increase
>>> in code density wasn't worth the contortions to ensure that your data
>>> structures fit the few cases that the autoincrement modes handled.
>>
>> Are you thinking of the DSPs that do not have displacement addressing,
>> but have auto-increment, leading to a number of papers on how the
>> compiler should arrange the variables to make best use of that?
>
> Autoincrement only increments by the size of a single datum so it
> works for strings and vectors, not for arrays of structures or 2-D
> arrays. Compare it to the 360's BXLE loop closing instruction which
> put the stride in a register so it could be whatever you wanted.
> It also had base+index which the Vax did but the PDP-11 only sort
> of did if you used absolute addresses instead of a base.
>
> On the PDP-11 autoincrement allowed a two instruction string copy loop:
>
> c: movb (r1)+,(r2)+
> bnz c ; loop if the byte wasn't zero
>
> but how useful is that now? I don't know.
>

Similar instructions would be used for copying memory blocks, and that
is very useful!

Re: Load/Store with auto-increment

<u38or1$13gb2$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=32032&group=comp.arch#32032

From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Load/Store with auto-increment
Date: Sun, 7 May 2023 17:57:53 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <u38or1$13gb2$1@newsreader4.netcologne.de>
References: <u35prk$2ssbq$1@dont-email.me>
<u36fd2$121nc$1@newsreader4.netcologne.de>
<1b0d15ec-a257-483f-82ec-a751774b1d9fn@googlegroups.com>
<O_P5M.569945$5S78.177648@fx48.iad>
Injection-Date: Sun, 7 May 2023 17:57:53 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-3fec-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd6:3fec:0:7285:c2ff:fe6c:992d";
logging-data="1163618"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sun, 7 May 2023 17:57 UTC

Scott Lurndal <scott@slp53.sl.home> wrote:
> MitchAlsup <MitchAlsup@aol.com> writes:
>>Consider a string of *p++
>> a = *p++;
>> b = *p++;
>> c = *p++;
>><
>>Here we see the failure of the ++ or -- notation.
>>The LD of b is dependent on the ++ of a
>>The LD of c is dependent on the ++ of b
>>Whereas if the above was written::
>><
>> a = p[0];
>> b = p[1];
>> c = p[2];
>> p +=3;
>><
>>Now all three LDs are independent and can issue/execute/retire
>>simultaneously. Also, the add to p is independent, so we took
>>3 "instructions" that were serially dependent and make them into
>>4 instructions that are completely independent in all phases of
>>execution.
>
> Can the compiler not recognize the first pattern and convert
> it into the second form under the as-if rule?

Of course:

void bar (int a, int b, int c);

void foo (int *p)
{
    int a, b, c;
    a = *p++;
    b = *p++;
    c = *p++;
    bar (a, b, c);
}

results in

lw a2,8(a0)
lw a1,4(a0)
lw a0,0(a0)
tail bar

on RISC-V, for example (aarch64 plays games with load double,
so it's a bit harder to read).

But I believe Mitch was referring to the assembler equivalent, where
p is held in a register.

Autodecrement and autoincrement are done on the 386 and its successors.
How do they avoid the register dependency on the stack register?
Special handling? Instruction fusing?

Re: Load/Store with auto-increment

<u38plm$2b6n$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=32033&group=comp.arch#32033

From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: Load/Store with auto-increment
Date: Sun, 7 May 2023 18:12:06 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <u38plm$2b6n$1@gal.iecc.com>
References: <u35prk$2ssbq$1@dont-email.me> <2023May7.174702@mips.complang.tuwien.ac.at> <u38l6v$1shq$1@gal.iecc.com> <u38o72$3f5ig$2@dont-email.me>
Injection-Date: Sun, 7 May 2023 18:12:06 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="77015"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <u35prk$2ssbq$1@dont-email.me> <2023May7.174702@mips.complang.tuwien.ac.at> <u38l6v$1shq$1@gal.iecc.com> <u38o72$3f5ig$2@dont-email.me>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Sun, 7 May 2023 18:12 UTC

It appears that David Brown <david.brown@hesbynett.no> said:
>> On the PDP-11 autoincrement allowed a two instruction string copy loop:
>>
>> c: movb (r1)+,(r2)+
>> bnz c ; loop if the byte wasn't zero
>>
>> but how useful is that now? I don't know.
>
>Similar instructions would be used for copying memory blocks, and that
>is very useful!

Not really. On modern computers you want to copy in ways that make
best use of the multiple registers so you're more likely to do a
sequence of loads followed by a sequence of stores, maybe with shift
and mask in between if they're not aligned, then move on to the next
block. You could use autoincrement but you'll probably get better
performance with instructions that clearly don't depend on each other
so they can run in parallel, e.g.

; r8 is source, r9 is dest
loop:
ld r1,0[r8]
ld r2,8[r8]
ld r3,16[r8]
ld r4,24[r8]
; shift and mask to align if needed
st r1,0[r9]
st r2,8[r9]
st r3,16[r9]
st r4,24[r9]

addi r8,#32
addi r9,#32
branch if not done to loop
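
The same loop shape written as a C sketch, for comparison. The function name and the restriction to word counts that are a multiple of four are assumptions made to keep the example short; a real memcpy also handles tails, alignment and overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n 64-bit words, four per iteration. All four loads use fixed
   displacements from the same index, so none depends on another, and
   the index is bumped once per block rather than once per word.
   Sketch only: n must be a multiple of 4, buffers must not overlap. */
void copy_words_unrolled(uint64_t *restrict dst,
                         const uint64_t *restrict src, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        uint64_t r1 = src[i + 0];
        uint64_t r2 = src[i + 1];
        uint64_t r3 = src[i + 2];
        uint64_t r4 = src[i + 3];
        /* shift and mask to align would go here if needed */
        dst[i + 0] = r1;
        dst[i + 1] = r2;
        dst[i + 2] = r3;
        dst[i + 3] = r4;
    }
}
```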

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Load/Store with auto-increment

<7b5a372d-f4af-4c05-bc44-f30bb259f0d3n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=32034&group=comp.arch#32034

Newsgroups: comp.arch
Date: Sun, 7 May 2023 11:29:24 -0700 (PDT)
In-Reply-To: <O_P5M.569945$5S78.177648@fx48.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b114:e412:2171:9b52;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b114:e412:2171:9b52
References: <u35prk$2ssbq$1@dont-email.me> <u36fd2$121nc$1@newsreader4.netcologne.de>
<1b0d15ec-a257-483f-82ec-a751774b1d9fn@googlegroups.com> <O_P5M.569945$5S78.177648@fx48.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7b5a372d-f4af-4c05-bc44-f30bb259f0d3n@googlegroups.com>
Subject: Re: Load/Store with auto-increment
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 07 May 2023 18:29:24 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2337
 by: MitchAlsup - Sun, 7 May 2023 18:29 UTC

On Sunday, May 7, 2023 at 11:05:06 AM UTC-5, Scott Lurndal wrote:
> MitchAlsup <Mitch...@aol.com> writes:
> >Consider a string of *p++
> > a = *p++;
> > b = *p++;
> > c = *p++;
> ><
> >Here we see the failure of the ++ or -- notation.
> >The LD of b is dependent on the ++ of a
> >The LD of c is dependent on the ++ of b
> >Whereas if the above was written::
> ><
> > a = p[0];
> > b = p[1];
> > c = p[2];
> > p +=3;
> ><
> >Now all three LDs are independent and can issue/execute/retire
> >simultaneously. Also, the add to p is independent, so we took
> >3 "instructions" that were serially dependent and make them into
> >4 instructions that are completely independent in all phases of
> >execution.
> Can the compiler not recognize the first pattern and convert
> it into the second form under the as-if rule?
<
A) the compiler is so allowed
B) once the compiler is doing this, wanting auto{inc,dec} in your
ISA evaporates.

Re: Load/Store with auto-increment

<fca64efd-7308-49c9-9a30-74d6d1ddee8an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=32035&group=comp.arch#32035

Newsgroups: comp.arch
Date: Sun, 7 May 2023 11:31:50 -0700 (PDT)
In-Reply-To: <u38o72$3f5ig$2@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b114:e412:2171:9b52;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b114:e412:2171:9b52
References: <u35prk$2ssbq$1@dont-email.me> <u35s66$280b$1@gal.iecc.com>
<2023May7.174702@mips.complang.tuwien.ac.at> <u38l6v$1shq$1@gal.iecc.com> <u38o72$3f5ig$2@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <fca64efd-7308-49c9-9a30-74d6d1ddee8an@googlegroups.com>
Subject: Re: Load/Store with auto-increment
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 07 May 2023 18:31:51 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3135
 by: MitchAlsup - Sun, 7 May 2023 18:31 UTC

On Sunday, May 7, 2023 at 12:49:24 PM UTC-5, David Brown wrote:
> On 07/05/2023 18:55, John Levine wrote:
> > It appears that Anton Ertl <an...@mips.complang.tuwien.ac.at> said:
> >> PowerPC and ARM A32 are still there. And there's even a new
> >> architecture with auto-increment: ARM A64.
> >
> > I need to take a look.
> >
> >>> I think the increase
> >>> in code density wasn't worth the contortions to ensure that your data
> >>> structures fit the few cases that the autoincrement modes handled.
> >>
> >> Are you thinking of the DSPs that do not have displacement addressing,
> >> but have auto-increment, leading to a number of papers on how the
> >> compiler should arrange the variables to make best use of that?
> >
> > Autoincrement only increments by the size of a single datum so it
> > works for strings and vectors, not for arrays of structures or 2-D
> > arrays. Compare it to the 360's BXLE loop closing instruction which
> > put the stride in a register so it could be whatever you wanted.
> > It also had base+index which the Vax did but the PDP-11 only sort
> > of did if you used absolute addresses instead of a base.
> >
> > On the PDP-11 autoincrement allowed a two instruction string copy loop:
> >
> > c: movb (r1)+,(r2)+
> > bnz c ; loop if the byte wasn't zero
> >
> > but how useful is that now? I don't know.
> >
> Similar instructions would be used for copying memory blocks, and that
> is very useful!
<
Except that you are moving blocks one byte at a time, which was fine in
the PDP-11 days, when 16-bit addressing "was sufficient".

Re: Load/Store with auto-increment

<35ac8d12-b200-40a2-b61d-3d1ade489fafn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=32036&group=comp.arch#32036

Newsgroups: comp.arch
Date: Sun, 7 May 2023 11:33:38 -0700 (PDT)
In-Reply-To: <u38or1$13gb2$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b114:e412:2171:9b52;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b114:e412:2171:9b52
References: <u35prk$2ssbq$1@dont-email.me> <u36fd2$121nc$1@newsreader4.netcologne.de>
<1b0d15ec-a257-483f-82ec-a751774b1d9fn@googlegroups.com> <O_P5M.569945$5S78.177648@fx48.iad>
<u38or1$13gb2$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <35ac8d12-b200-40a2-b61d-3d1ade489fafn@googlegroups.com>
Subject: Re: Load/Store with auto-increment
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 07 May 2023 18:33:38 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3326
 by: MitchAlsup - Sun, 7 May 2023 18:33 UTC

On Sunday, May 7, 2023 at 12:58:59 PM UTC-5, Thomas Koenig wrote:
> Scott Lurndal <sc...@slp53.sl.home> wrote:
> > MitchAlsup <Mitch...@aol.com> writes:
> >>Consider a string of *p++
> >> a = *p++;
> >> b = *p++;
> >> c = *p++;
> >><
> >>Here we see the failure of the ++ or -- notation.
> >>The LD of b is dependent on the ++ of a
> >>The LD of c is dependent on the ++ of b
> >>Whereas if the above was written::
> >><
> >> a = p[0];
> >> b = p[1];
> >> c = p[2];
> >> p +=3;
> >><
> >>Now all three LDs are independent and can issue/execute/retire
> >>simultaneously. Also, the add to p is independent, so we took
> >>3 "instructions" that were serially dependent and make them into
> >>4 instructions that are completely independent in all phases of
> >>execution.
> >
> > Can the compiler not recognize the first pattern and convert
> > it into the second form under the as-if rule?
> Of course:
>
> void bar (int a, int b, int c);
>
> void foo (int *p)
> {
> int a, b, c;
> a = *p++;
> b = *p++;
> c = *p++;
> bar (a, b, c);
> }
>
> results in
>
> lw a2,8(a0)
> lw a1,4(a0)
> lw a0,0(a0)
> tail bar
>
> on RISC-V, for example (aarch64 plays games with load double,
> so it's a bit harder to read).
>
> But I believe Mitch was referring to the assembler equivalent, where
> p would be held in a register.
>
> Autodecrement and increment is done on 386ff. How do they avoid
> the register dependency of the stack register? Special handling?
> Instruction fusing?
<
They did not--they just "ate" the latency and register conflicts.
But in general, the Great-Big execution window made all those
"go away".
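
To make the quoted comparison concrete, here is a minimal C sketch of the two source forms. The function names load3_postinc and load3_indexed are hypothetical and exist only for illustration. As the RISC-V output quoted above suggests, an optimizing compiler may well emit the same code for both, so the dependency chain matters mainly when the ++ maps directly onto an auto-increment instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Form 1: each load uses the pointer value produced by the previous ++,
 * so at the instruction level the loads form a serial chain on p. */
const int *load3_postinc(const int *p, int out[3])
{
    assert(p != NULL && out != NULL);
    out[0] = *p++;
    out[1] = *p++;
    out[2] = *p++;
    return p;
}

/* Form 2: three loads off the same base value plus one separate add.
 * None of the four operations needs the result of another. */
const int *load3_indexed(const int *p, int out[3])
{
    assert(p != NULL && out != NULL);
    out[0] = p[0];
    out[1] = p[1];
    out[2] = p[2];
    p += 3;
    return p;
}
```

Both functions read the same three elements and return the same advanced pointer; only the shape of the dependencies differs.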

Re: Load/Store with auto-increment

<0b35b4ce-0669-49cb-9a12-1443805f98e1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=32037&group=comp.arch#32037
X-Received: by 2002:a05:620a:3193:b0:74e:34b:d1b8 with SMTP id bi19-20020a05620a319300b0074e034bd1b8mr2920009qkb.12.1683484583837;
Sun, 07 May 2023 11:36:23 -0700 (PDT)
X-Received: by 2002:aca:bf87:0:b0:390:7dc9:dd39 with SMTP id
p129-20020acabf87000000b003907dc9dd39mr1409198oif.10.1683484583541; Sun, 07
May 2023 11:36:23 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer03.ams4!peer.am4.highwinds-media.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 7 May 2023 11:36:23 -0700 (PDT)
In-Reply-To: <u38plm$2b6n$1@gal.iecc.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b114:e412:2171:9b52;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b114:e412:2171:9b52
References: <u35prk$2ssbq$1@dont-email.me> <2023May7.174702@mips.complang.tuwien.ac.at>
<u38l6v$1shq$1@gal.iecc.com> <u38o72$3f5ig$2@dont-email.me> <u38plm$2b6n$1@gal.iecc.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <0b35b4ce-0669-49cb-9a12-1443805f98e1n@googlegroups.com>
Subject: Re: Load/Store with auto-increment
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 07 May 2023 18:36:23 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3065
 by: MitchAlsup - Sun, 7 May 2023 18:36 UTC

On Sunday, May 7, 2023 at 1:12:10 PM UTC-5, John Levine wrote:
> It appears that David Brown <david...@hesbynett.no> said:
> >> On the PDP-11 autoincrement allowed a two instruction string copy loop:
> >>
> >> c: movb (r1)+,(r2)+
> >> bne c ; loop if the byte wasn't zero
> >>
> >> but how useful is that now? I don't know.
> >
> >Similar instructions would be used for copying memory blocks, and that
> >is very useful!
> Not really. On modern computers you want to copy in ways that make
> best use of the multiple registers so you're more likely to do a
> sequence of loads followed by a sequence of stores, maybe with shift
> and mask in between if they're not aligned, then move on to the next
> block. You could use autoincrement but you'll probably get better
> performance with instructions that clearly don't depend on each other
> so they can run in parallel, e.g.
<
Or you can (put into ISA and) use MM (memory to memory move)
<
MM Rcount,Rfrom,Rto
<
And rest assured that HW will simply do the optimal thing for that
implementation {up to 1 cache line per cycle.}
>
> ; r8 is source, r9 is dest
> loop:
> ld r1,0[r8]
> ld r2,8[r8]
> ld r3,16[r8]
> ld r4,24[r8]
> ; shift and mask to align if needed
> st r1,0[r9]
> st r2,8[r9]
> st r3,16[r9]
> st r4,24[r9]
>
> addi r8,#32
> addi r9,#32
> branch if not done to loop
> --
> Regards,
> John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
> Please consider the environment before reading this e-mail. https://jl.ly
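
For readers who prefer C to the quoted pseudo-assembly, the loads-then-stores copy loop might be sketched as follows. This is only an illustration under stated assumptions: the name copy_blocks is hypothetical, both buffers are assumed to be 8-byte aligned (so the shift-and-mask step is omitted), and the word count is assumed to be a multiple of four. A compiler is free to reorder or vectorize this, so it shows the intended instruction shape rather than guaranteed output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy nwords 64-bit words in blocks of four (32 bytes per iteration).
 * All four loads use fixed offsets from the same index, so none of them
 * depends on another; the same holds for the four stores. */
void copy_blocks(uint64_t *dst, const uint64_t *src, size_t nwords)
{
    assert(dst != NULL && src != NULL);
    assert(nwords % 4 == 0);   /* assumption: no tail handling in this sketch */

    for (size_t i = 0; i < nwords; i += 4) {
        uint64_t r1 = src[i + 0];
        uint64_t r2 = src[i + 1];
        uint64_t r3 = src[i + 2];
        uint64_t r4 = src[i + 3];

        dst[i + 0] = r1;
        dst[i + 1] = r2;
        dst[i + 2] = r3;
        dst[i + 3] = r4;
    }
}
```

The single update of i per iteration plays the role of the two addi instructions in the quoted loop.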

Re: Load/Store with auto-increment

<u38t36$3g2cj$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=32038&group=comp.arch#32038
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Load/Store with auto-increment
Date: Sun, 7 May 2023 12:10:27 -0700
Organization: A noiseless patient Spider
Lines: 19
Message-ID: <u38t36$3g2cj$1@dont-email.me>
References: <u35prk$2ssbq$1@dont-email.me> <u35s66$280b$1@gal.iecc.com>
<2023May7.174702@mips.complang.tuwien.ac.at> <u38l6v$1shq$1@gal.iecc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 7 May 2023 19:10:30 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="d4876e9109ca5939ccdc080e24c36f29";
logging-data="3672467"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/gne2j0nxXSfVo+pvYfDk6CE9AHCOPr9Y="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.10.1
Cancel-Lock: sha1:MEnnbyeZjooSWg+X0iWSl0rhG8k=
In-Reply-To: <u38l6v$1shq$1@gal.iecc.com>
Content-Language: en-US
 by: Stephen Fuld - Sun, 7 May 2023 19:10 UTC

On 5/7/2023 9:55 AM, John Levine wrote:

snip

> Autoincrement only increments by the size of a single datum so it
> works for strings and vectors, not for arrays of structures or 2-D
> arrays. Compare it to the 360's BXLE loop closing instruction which
> put the stride in a register so it could be whatever you wanted.

Or the 1108 which allowed you to specify, with an instruction bit, that
the high order half of an index register is added to the low order half
(which is all that was used for address calculation) after the memory
address is computed.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Load/Store with auto-increment

<u392pi$13o38$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=32039&group=comp.arch#32039
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-3fec-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Load/Store with auto-increment
Date: Sun, 7 May 2023 20:47:46 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <u392pi$13o38$1@newsreader4.netcologne.de>
References: <u35prk$2ssbq$1@dont-email.me>
<u36fd2$121nc$1@newsreader4.netcologne.de>
<1b0d15ec-a257-483f-82ec-a751774b1d9fn@googlegroups.com>
<O_P5M.569945$5S78.177648@fx48.iad>
<u38or1$13gb2$1@newsreader4.netcologne.de>
Injection-Date: Sun, 7 May 2023 20:47:46 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-3fec-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd6:3fec:0:7285:c2ff:fe6c:992d";
logging-data="1171560"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sun, 7 May 2023 20:47 UTC

Thomas Koenig <tkoenig@netcologne.de> wrote:

> Autodecrement and increment is done on 386ff. How do they avoid
> the register dependency of the stack register? Special handling?
> Instruction fusing?

Seems like they have a dedicated stack engine for the
purpose. Agner Fog (who else) has a nice explanation at
https://agner.org/optimize/microarchitecture.pdf . Basically,
there is an extra stage in the pipeline for handling stack pointers
and for inserting stack synchronization micro-ops.

That is one level of complexity that address + offset addressing
relative to the stack pointer solves nicely.

Re: Load/Store with auto-increment

<u3954k$3hea5$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=32040&group=comp.arch#32040
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Load/Store with auto-increment
Date: Sun, 7 May 2023 16:27:45 -0500
Organization: A noiseless patient Spider
Lines: 105
Message-ID: <u3954k$3hea5$1@dont-email.me>
References: <u35prk$2ssbq$1@dont-email.me>
<2023May7.140701@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 7 May 2023 21:27:48 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="50d01ac6052893174202701e4104c5f8";
logging-data="3717445"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/IEL6Q1+NqDW2e3tqDvEd1"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.10.1
Cancel-Lock: sha1:55WB0g1fhh21vH3TTQTlOL0ORys=
Content-Language: en-US
In-Reply-To: <2023May7.140701@mips.complang.tuwien.ac.at>
 by: BGB - Sun, 7 May 2023 21:27 UTC

On 5/7/2023 7:07 AM, Anton Ertl wrote:
> Marcus <m.delete@this.bitsnbites.eu> writes:
>> Load/store with auto-increment/decrement can reduce the number of
>> instructions in many loops (especially those that mostly iterate over
>> arrays of data).
>
> Yes.
>
> If you do it only for stores, as suggested below, it could be used for
> loops that read from one or more arrays and write to one array, all
> with the same stride, as follows (in pseudo-C-code):
>
> /* read from a and b, write to c */
> da=a-c;
> db=b-c;
> for (...) {
> *c = c[da] * c[db];
> c+=stride;
> }
>
> the "c+=stride" could become the autoincrement of the store.
>

Not all instructions are created equal.

Fewer instructions may not be a win if these instructions would result
in a higher latency.
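
For reference, the quoted pseudo-C loop can be written as compilable C along these lines. The function name mul_into_c and the double element type are assumptions for illustration. To keep the pointer differences well defined in C, this sketch assumes a, b and c all live in one allocation, so that da = a - c and db = b - c are valid.

```c
#include <assert.h>
#include <stddef.h>

/* Read from a and b, write to c, with a and b addressed relative to c.
 * Only c is updated in the loop, so a store with autoincrement could
 * absorb the "c += stride" step. */
void mul_into_c(double *c, ptrdiff_t da, ptrdiff_t db, size_t n, ptrdiff_t stride)
{
    assert(c != NULL && stride != 0);
    for (size_t i = 0; i < n; i++) {
        *c = c[da] * c[db];
        c += stride;
    }
}
```

A caller would pass mul_into_c(c, a - c, b - c, n, stride); the two loads then need only the single pointer c plus a loop-invariant offset each.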

>> It can also be used in function prologues and epilogues
>> (for push/pop functionality).
>
> Not so great, because it introduces data dependencies between the
> stores that you then have to get rid of if you want to support more
> than one store per cycle. As for the pops, those are loads, and here
> the autoincrement would require an additional write port to the
> register file, as you point out below; plus it would introduce data
> dependencies that you don't want (many cores support more than one
> load per cycle).
>

But, is kinda moot as, say:
MOV.Q R13, @-SP
MOV.Q R12, @-SP
MOV.Q R11, @-SP
MOV.Q R10, @-SP
MOV.Q R9, @-SP
MOV.Q R8, @-SP

Only saves 1 instruction vs, say:
ADD -48, SP
MOV.Q R13, (SP, 40)
MOV.Q R12, (SP, 32)
MOV.Q R11, (SP, 24)
MOV.Q R10, (SP, 16)
MOV.Q R9, (SP, 8)
MOV.Q R8, (SP, 0)

Depending on how it is implemented, the dependency issues on the shared
register could actually make the use of auto-increment slower than the
use of fixed displacement loads/stores (and, if one needs to wait the
whole latency of a load or store for the increment's write-back to
finish, using auto-increment in this way is likely "dead on arrival").

I can also note that an earlier form of BJX2 had PUSH/POP instructions,
but these were removed. Noting the above, it is probably not all that
hard to guess why...
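
The equivalence of the two prologue sequences above can be checked with a small C model. This is a sketch under assumptions: the stack is an array of 64-bit words, sp is an index into it counted in words (so ADD -48 becomes sp -= 6), regs[0] through regs[5] stand for R8 through R13, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pre-decrement form: every store first updates sp, so each store
 * depends on the sp value produced by the one before it. */
size_t save_predec(uint64_t *stack, size_t sp, const uint64_t regs[6])
{
    assert(stack != NULL && sp >= 6);
    for (int i = 5; i >= 0; i--)      /* R13 first, R8 last */
        stack[--sp] = regs[i];
    return sp;
}

/* Fixed-displacement form: one sp adjustment, then six stores that all
 * use the same sp value with different constant offsets. */
size_t save_displaced(uint64_t *stack, size_t sp, const uint64_t regs[6])
{
    assert(stack != NULL && sp >= 6);
    sp -= 6;
    for (int i = 0; i < 6; i++)       /* R8 at offset 0 ... R13 at offset 5 */
        stack[sp + i] = regs[i];
    return sp;
}
```

Both leave the same stack contents and the same final sp; the difference is one extra instruction versus a six-deep dependency chain on SP.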

>> The next question is: What flavors should I have?
>>
>> - Post-increment (most common?)
>> - Post-decrement
>> - Pre-increment
>> - Pre-decrement (second most common?)
>>
>> The "pre" variants would possibly add more logic to critical paths (e.g.
>> add more gate delay in the AGU before the address is ready for the
>> memory stage).
>
> You typically have memory-access instructions that include an addition
> in the address computation; in that case pre obviously has no extra
> cost. The cost of the addition can be reduced (eliminated) with a
> technique called sum-addressed memory. OTOH, IA-64 supports only
> memory accesses of an address given in a register, so here the
> architects apparently thought that sum-addressed memory is still too
> slow.
>
> Increment vs. decrement: If your store supports reading two registers
> for address computation (in addition to the data register), you can
> put the stride in a register, making the whole question moot. Even if
> you only support reading one register in addition to the data, you can
> have a sign-extended constant stride, again giving you both increment
> and decrement options. Note that having a store that does not support
> the sum of two registers, but does support autoincrement, and a load
> that supports the sum of two registers as address means that both
> loads and stores can read two registers and write one register, which
> may be useful for certain microarchitectural approaches.
>

Nothing to add here.

> - anton

Re: Load/Store with auto-increment

<2c034f32-5954-4c48-b650-16973aa55606n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=32041&group=comp.arch#32041
X-Received: by 2002:a05:620a:3197:b0:74a:d2b0:42cb with SMTP id bi23-20020a05620a319700b0074ad2b042cbmr3446446qkb.2.1683496063240;
Sun, 07 May 2023 14:47:43 -0700 (PDT)
X-Received: by 2002:a05:6830:1455:b0:6a6:8b7:d48 with SMTP id
w21-20020a056830145500b006a608b70d48mr2144930otp.7.1683496062979; Sun, 07 May
2023 14:47:42 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 7 May 2023 14:47:42 -0700 (PDT)
In-Reply-To: <u3954k$3hea5$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b114:e412:2171:9b52;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b114:e412:2171:9b52
References: <u35prk$2ssbq$1@dont-email.me> <2023May7.140701@mips.complang.tuwien.ac.at>
<u3954k$3hea5$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2c034f32-5954-4c48-b650-16973aa55606n@googlegroups.com>
Subject: Re: Load/Store with auto-increment
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 07 May 2023 21:47:43 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 5800
 by: MitchAlsup - Sun, 7 May 2023 21:47 UTC

On Sunday, May 7, 2023 at 4:27:52 PM UTC-5, BGB wrote:
> On 5/7/2023 7:07 AM, Anton Ertl wrote:
> > Marcus <m.de...@this.bitsnbites.eu> writes:
> >> Load/store with auto-increment/decrement can reduce the number of
> >> instructions in many loops (especially those that mostly iterate over
> >> arrays of data).
> >
> > Yes.
> >
> > If you do it only for stores, as suggested below, it could be used for
> > loops that read from one or more arrays and write to one array, all
> > with the same stride, as follows (in pseudo-C-code):
> >
> > /* read from a and b, write to c */
> > da=a-c;
> > db=b-c;
> > for (...) {
> > *c = c[da] * c[db];
> > c+=stride;
> > }
> >
> > the "c+=stride" could become the autoincrement of the store.
> >
> Not all instructions are created equal.
>
> Fewer instructions may not be a win if these instructions would result
> in a higher latency.
<
But eliminating sequential dependencies is almost always a win
because it directly addresses latency.
<
> >> It can also be used in function prologues and epilogues
> >> (for push/pop functionality).
> >
> > Not so great, because it introduces data dependencies between the
> > stores that you then have to get rid of if you want to support more
> > than one store per cycle. As for the pops, those are loads, and here
> > the autoincrement would require an additional write port to the
> > register file, as you point out below; plus it would introduce data
> > dependencies that you don't want (many cores support more than one
> > load per cycle).
> >
> But, is kinda moot as, say:
> MOV.Q R13, @-SP
> MOV.Q R12, @-SP
> MOV.Q R11, @-SP
> MOV.Q R10, @-SP
> MOV.Q R9, @-SP
> MOV.Q R8, @-SP
>
> Only saves 1 instruction vs, say:
> ADD -48, SP
> MOV.Q R13, (SP, 40)
> MOV.Q R12, (SP, 32)
> MOV.Q R11, (SP, 24)
> MOV.Q R10, (SP, 16)
> MOV.Q R9, (SP, 8)
> MOV.Q R8, (SP, 0)
<
If you actually wanted to save instructions you would::
<
MOV.Q R13:R8,@-SP
<
So the argument of saving 1 instruction becomes moot--you can save 5
instructions.
>
> Depending on how it is implemented, the dependency issues on the shared
> register could actually make the use of auto-increment slower than the
> use of fixed displacement loads/stores (and, if one needs to wait the
> whole latency of a load or store for the increment's write-back to
> finish, using auto-increment in this way is likely "dead on arrival").
>
>
> I can also note that an earlier form of BJX2 had PUSH/POP instructions,
> but these were removed. Noting the above, it is probably not all that
> hard to guess why...
> >> The next question is: What flavors should I have?
> >>
> >> - Post-increment (most common?)
> >> - Post-decrement
> >> - Pre-increment
> >> - Pre-decrement (second most common?)
> >>
> >> The "pre" variants would possibly add more logic to critical paths (e.g.
> >> add more gate delay in the AGU before the address is ready for the
> >> memory stage).
> >
> > You typically have memory-access instructions that include an addition
> > in the address computation; in that case pre obviously has no extra
> > cost. The cost of the addition can be reduced (eliminated) with a
> > technique called sum-addressed memory. OTOH, IA-64 supports only
> > memory accesses of an address given in a register, so here the
> > architects apparently thought that sum-addressed memory is still too
> > slow.
> >
> > Increment vs. decrement: If your store supports reading two registers
> > for address computation (in addition to the data register), you can
> > put the stride in a register, making the whole question moot. Even if
> > you only support reading one register in addition to the data, you can
> > have a sign-extended constant stride, again giving you both increment
> > and decrement options. Note that having a store that does not support
> > the sum of two registers, but does support autoincrement, and a load
> > that supports the sum of two registers as address means that both
> > loads and stores can read two registers and write one register, which
> > may be useful for certain microarchitectural approaches.
> >
> Nothing to add here.
>
> > - anton
