
Automatic register spill / restore?

Newsgroups: comp.arch
From: nos...@nowhere.com (Andy)
Subject: Automatic register spill / restore?
Date: Mon, 4 Jul 2022 17:48:30 +1200
Message-ID: <t9tuvg$1qgu$1@gioia.aioe.org>
 by: Andy - Mon, 4 Jul 2022 05:48 UTC

The discussions going on about register to/from stack and load/store
multiple instructions have got me vaguely remembering that there was some
talk about old mainframes that could automatically save to the stack any
registers in danger of being overwritten after a jump to subroutine or such.

Anyone else remember such a thing? Or am I just making things up by
reason of senility and/or madness? (entirely possible I'm afraid :-( )

I wonder, if they were real, whether given today's transistor core counts
those mechanisms could be revived in modern clean-sheet CPU designs, to
at least help alleviate some of the issues we see in the current state
of the art?

Or were there some really big downsides to the automagic things that I'm
not remembering well enough?

Re: Automatic register spill / restore?

 by: EricP - Mon, 4 Jul 2022 16:14 UTC

Andy wrote:
> The discussions going on about register to/from stack and load/store
> multiple instructions has got me vaguely remembering that there was some
> talk about old mainframes that could save to stack automatically any
> registers in danger of being overwritten after a jump to subroutine or
> such.
>
> Anyone else remember such a thing?, or am I just making things up by
> reason of senility and/or madness? (entirely possible I'm afraid :-( )
>
>
> I wonder, that if they were real, is it possible given today's
> transistor core counts those mechanisms could be revived in modern clean
> sheet cpu designs?, to at least help alleviate some of the issues we see
> in the current state of the art perhaps?
>
> Or were there some really big downsides to the automagic things that I'm
> not remembering well enough?

Sparc register windows. They were opaque, asynchronous lazy spill/fill
which was done by kernel mode traps. Reportedly it had... issues.

A quick search of 'sparc "register window" ' finds

[paywalled]
Error Behavior Comparison of Multiple Computing Systems: A Case Study
Using Linux on Pentium, Solaris on SPARC, and AIX on POWER, 2008
https://ieeexplore.ieee.org/abstract/document/4725314/

I found a copy available in an online viewer with a download option:

https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxkY2hlbjhyZXNlYXJjaHxneDoxNzdkNWMxOWVmOWNhMDdm

says:
"The registers introduced to manage the register window in the
SPARC architecture are particularly error-sensitive and contribute
to more than 50% of the hang cases for the Solaris System.
This result indicates that, while using the register window may
improve performance, it can create unexpected reliability problems."

Re: Automatic register spill / restore?

 by: MitchAlsup - Mon, 4 Jul 2022 17:56 UTC

On Monday, July 4, 2022 at 12:48:36 AM UTC-5, Andy wrote:
> The discussions going on about register to/from stack and load/store
> multiple instructions has got me vaguely remembering that there was some
> talk about old mainframes that could save to stack automatically any
> registers in danger of being overwritten after a jump to subroutine or such.
>
> Anyone else remember such a thing?, or am I just making things up by
> reason of senility and/or madness? (entirely possible I'm afraid :-( )
>
IBM 360 series::
caller allocated a register save area and always kept it in R13
upon any arrival (Callee, interruptee) would perform STM 12,12(13)
which would dump all 16 registers in the save area, as the first step
in properly receiving control.
<
VAX did something similar, but performed it in the CALL side of ISA
processing.
>
> I wonder, that if they were real, is it possible given today's
> transistor core counts those mechanisms could be revived in modern clean
> sheet cpu designs?, to at least help alleviate some of the issues we see
> in the current state of the art perhaps?
<
Effectively that is what ENTER and EXIT do in My 66000 architecture.
>
> Or were there some really big downsides to the automagic things that I'm
> not remembering well enough?
<
Automagic was so slow in VAX that the more modern compilers used JSR
instead of CALL and got rid of ½ of the cycles in calling/returning.

Re: Automatic register spill / restore?

 by: John Levine - Mon, 4 Jul 2022 18:56 UTC

According to MitchAlsup <MitchAlsup@aol.com>:
>> Anyone else remember such a thing?, or am I just making things up by
>> reason of senility and/or madness? (entirely possible I'm afraid :-( )
>>
>IBM 360 series::
>caller allocated a register save area and always kept it in R13
>upon any arrival (Callee, interruptee) would perform STM 12,12(13)
>which would dump all 16 registers in the save area, as the first step
>in properly receiving control.

Actually it was STM 14,12,12(13). You chained save area pointers in
R13 in other ways depending on whether your routine was reentrant or
recursive or neither. The first "12" might be a smaller number if your
routine didn't change the high numbered registers. On TSS the sequence
was messier since every routine had both a code pointer (V address) and a
data pointer (R address.)
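
As a rough illustration of the STM 14,12,12(13) form, here is a minimal Python sketch of which registers get stored and at what offsets from the address held in R13. It assumes 4-byte registers and the usual STM rule that register numbers wrap from 15 back to 0; the function names are made up for this example.

```python
def stm_registers(first, last):
    """Registers stored by STM first,last,...: ascending, wrapping 15 -> 0."""
    regs = []
    r = first
    while True:
        regs.append(r)
        if r == last:
            break
        r = (r + 1) % 16
    return regs


def save_area_offsets(first, last, displacement, word_size=4):
    """Map each stored register to its byte offset from the base register."""
    return {
        reg: displacement + word_size * i
        for i, reg in enumerate(stm_registers(first, last))
    }


offsets = save_area_offsets(14, 12, 12)
for reg, off in offsets.items():
    print(f"R{reg} -> offset {off}")
```

Under these assumptions the instruction stores 15 registers (R14, R15, then R0 to R12) at offsets 12 to 68; R13 itself is not in the range.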

For obvious reasons interrupts could not depend on register contents
and saved registers in a static place in low memory.

>VAX did something similar, but performed it in the CALL side of ISA
>processing.

The VAX calling sequence had a bit mask of registers to save at
the start of the procedure and very complex CALL/RET instructions that
did the save and set up the stack frame. They were so slow that a lot
of software used the much simpler JSB/RSB that just saved the return
address on the stack and jumped.
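
The mask-driven save can be sketched as follows. This is a deliberately simplified Python model, not the behaviour of the real CALL/RET instructions: it assumes the low 12 mask bits select R0 to R11 and leaves out the other state the real instructions handle (argument pointer, frame pointer, status word and so on). All names are hypothetical.

```python
def regs_in_mask(mask):
    """Register numbers selected by the low 12 bits of the entry mask (assumed layout)."""
    return [r for r in range(12) if mask & (1 << r)]


def call(regs, stack, entry_mask, return_pc):
    """Push the registers named in the mask, then the mask and the return address."""
    for r in reversed(regs_in_mask(entry_mask)):
        stack.append(regs[r])
    stack.append(entry_mask)   # RET needs the mask to know what to restore
    stack.append(return_pc)


def ret(regs, stack):
    """Undo call(): pop the return address and mask, then restore the saved registers."""
    return_pc = stack.pop()
    entry_mask = stack.pop()
    for r in regs_in_mask(entry_mask):
        regs[r] = stack.pop()
    return return_pc


regs = list(range(100, 116))      # R0..R15 with recognisable values
stack = []
call(regs, stack, 0b1100, 0x2000) # callee declares that it uses R2 and R3
regs[2] = regs[3] = 0             # callee clobbers them
print(hex(ret(regs, stack)), regs[2], regs[3])
```

Even in this toy form the call does a variable amount of memory traffic that depends on the mask, which hints at why a plain "push return address and jump" is cheaper.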

SPARC used the old pcc compiler which didn't do very sophisticated
register management, so the SPARC register windows were a way to
save and restore all the registers fast. The contemporary 801
project used the much better PL.8 compiler so their calling sequence
just saved the registers it needed to. The 801 did not have load
or store multiple but the ROMP put it back in so they could get
full memory speed on the load/store traffic.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Automatic register spill / restore?

 by: Niklas Holsti - Mon, 4 Jul 2022 18:57 UTC

On 2022-07-04 19:14, EricP wrote:
> Andy wrote:
>> The discussions going on about register to/from stack and load/store
>> multiple instructions has got me vaguely remembering that there was
>> some talk about old mainframes that could save to stack automatically
>> any registers in danger of being overwritten after a jump to
>> subroutine or such.
>>
>> Anyone else remember such a thing?, or am I just making things up by
>> reason of senility and/or madness? (entirely possible I'm afraid :-( )
>>
>>
>> I wonder, that if they were real, is it possible given today's
>> transistor core counts those mechanisms could be revived in modern
>> clean sheet cpu designs?, to at least help alleviate some of the
>> issues we see in the current state of the art perhaps?
>>
>> Or were there some really big downsides to the automagic things that
>> I'm not remembering well enough?
>
> Sparc register windows. They were opaque, asynchronous lazy spill/fill
> which was done by kernel mode traps. Reportedly it had... issues.

What sort of "issues"? Performance wrt alternatives? SPARC processors
are still used extensively in space systems, in particular on ESA
missions. I've written embedded SW for several such systems and I'm not
aware of any serious issues.

Register windows do complicate static WCET analysis a bit, because it is
not easy to predict exactly which call or which return can cause a trap.
But an upper bound can be calculated that is not very pessimistic. And
who uses static WCET analysis any more :-(
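
To see why the trapping call or return is hard to pin down statically, here is a small Python sketch that counts window overflow and underflow traps for a given call/return trace. It assumes a handler that spills or fills exactly one window per trap and a hypothetical count of 7 usable windows; real handlers and window counts vary.

```python
def count_window_traps(trace, usable_windows=7):
    """Return (overflows, underflows) for a sequence of "call"/"ret" events."""
    resident = 1    # windows currently held in the register file
    spilled = 0     # caller windows that have been saved to the stack
    overflows = underflows = 0
    for event in trace:
        if event == "call":
            if resident == usable_windows:
                overflows += 1      # oldest window is spilled to make room
                spilled += 1
            else:
                resident += 1
        elif event == "ret":
            if resident > 1:
                resident -= 1
            elif spilled > 0:
                underflows += 1     # caller's window is refilled from the stack
                spilled -= 1
            else:
                raise ValueError("return below the initial frame")
        else:
            raise ValueError(f"unknown event: {event!r}")
    return overflows, underflows


deep = ["call"] * 8 + ["ret"] * 8
print(count_window_traps(deep))

# Same call site executed three times at the window boundary:
boundary = ["call"] * 6 + ["call", "ret"] * 3
print(count_window_traps(boundary))
```

In the second trace only the first of the three identical calls traps, because the earlier spill left a free window. Whether a given call traps depends on the history, which is why an analysis typically settles for an upper bound.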

I have read that one port of gcc for SPARC offers a "flat"
register-usage model as an alternative to register windows. In the
"flat" model the same set of 32 registers is used at all points in the
program, and the register windows are never rotated (except for trap
handling, I assume). I suppose this "flat" option was implemented for a
reason, but the reason was not explained, IIRC.

A minor annoyance of register windows, and of the standard trap handlers
for rotating windows, is that every non-leaf subroutine must allocate 96
octets of stack for saving 24 registers. This area is used if, and only
if, the subroutine executes a call that causes a register-ring-overflow
trap. A programming style that uses many small subroutines can lead to
surprisingly large stack usage.
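
A back-of-the-envelope sketch of that stack cost in Python, taking the 96-octet figure above as given. The call depth and the per-frame size of local data are made-up inputs.

```python
WINDOW_SAVE_OCTETS = 96   # per non-leaf frame, figure quoted above


def reserved_octets(non_leaf_depth, per_frame=WINDOW_SAVE_OCTETS):
    """Stack reserved for window save areas along one call chain."""
    return non_leaf_depth * per_frame


def reserved_fraction(locals_octets, per_frame=WINDOW_SAVE_OCTETS):
    """Share of a frame taken by the save area, given the frame's own data size."""
    return per_frame / (per_frame + locals_octets)


print(reserved_octets(10))        # ten nested non-leaf calls
print(reserved_fraction(32))      # small routine with 32 octets of locals
```

For small routines the reserved area dominates the frame, which is the effect described above.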

> A quick search of 'sparc "register window" ' finds
>
> [paywalled]
> Error Behavior Comparison of Multiple Computing Systems: A Case Study
> Using Linux on Pentium, Solaris on SPARC, and AIX on POWER, 2008
> https://ieeexplore.ieee.org/abstract/document/4725314/
>
> I found a copy available in an online viewer with a download option:
>
> https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxkY2hlbjhyZXNlYXJjaHxneDoxNzdkNWMxOWVmOWNhMDdm

Thanks for this reference...

> says:
> "The registers introduced to manage the register window in the
> SPARC architecture are particularly error-sensitive and contribute
> to more than 50% of the hang cases for the Solaris System.
> This result indicates that, while using the register window may
> improve performance, it can create unexpected reliability problems."

These are not "natural", real-life errors and hangs. That study injects
single-bit errors into memory words and system registers and sees what
happens. It is to be expected that critical system registers, such as
the ones involved in register-ring management, are very sensitive to
such simulated HW errors. This is not a flaw in the logical
architecture, but HW architects may want to make these registers
especially robust.

SPARC processors used in space applications are radiation-tolerant and
use triple modular redundancy for critical registers.
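
For readers unfamiliar with triple modular redundancy, a minimal Python sketch of a bitwise majority voter over three copies of a register value. It illustrates the general technique only; how any particular SPARC implementation wires its voters is not something this sketch claims to show.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three copies: each result bit is set if at least two copies agree on 1."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(value, bit):
    """Model a single-bit upset in one copy."""
    return value ^ (1 << bit)


good = 0b1011_0010
copies = [good, flip_bit(good, 5), good]   # one copy takes a hit
print(tmr_vote(*copies) == good)           # True
```

A single flipped bit in any one copy is outvoted. The same bit flipped in two copies is not, so the scheme protects against exactly the kind of single-bit error the study injects.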

Re: Automatic register spill / restore?

 by: Marcus - Tue, 5 Jul 2022 06:32 UTC

On 2022-07-04, MitchAlsup wrote:
> On Monday, July 4, 2022 at 12:48:36 AM UTC-5, Andy wrote:
>> The discussions going on about register to/from stack and load/store
>> multiple instructions has got me vaguely remembering that there was some
>> talk about old mainframes that could save to stack automatically any
>> registers in danger of being overwritten after a jump to subroutine or such.
>>
>> Anyone else remember such a thing?, or am I just making things up by
>> reason of senility and/or madness? (entirely possible I'm afraid :-( )
>>
> IBM 360 series::
> caller allocated a register save area and always kept it in R13
> upon any arrival (Callee, interruptee) would perform STM 12,12(13)
> which would dump all 16 registers in the save area, as the first step
> in properly receiving control.

How does this differ from a regular STM to stack (except that the
allocation is done by the caller instead of the callee)? E.g. 68k style:

movem.l d0-d7/a0-a6,-(sp)

> <
> VAX did something similar, but performed it in the CALL side of ISA
> processing.
>>
>> I wonder, that if they were real, is it possible given today's
>> transistor core counts those mechanisms could be revived in modern clean
>> sheet cpu designs?, to at least help alleviate some of the issues we see
>> in the current state of the art perhaps?
> <
> Effectively that is what ENTER and EXIT do in My 66000 architecture.
>>
>> Or were there some really big downsides to the automagic things that I'm
>> not remembering well enough?
> <
> Automagic was so slow in VAX that the more modern compilers used JSR
> instead of CALL and got rid of ½ of the cycles in calling/returning.

Re: Automatic register spill / restore?

 by: Niklas Holsti - Tue, 5 Jul 2022 07:54 UTC

On 2022-07-04 21:56, John Levine wrote:
>
> SPARC used the old pcc compiler which didn't do very sophisticated
> register management, so the SPARC register windows were a way to
> save and restore all the registers fast.

Nitpick: not "all the registers". When a SPARC rotates the register
window in connection with a call, 16 registers visible to the caller are
saved (become inaccessible). These are the 8 "in" registers and the 8
"local" registers of the caller. The callee gets 16 new registers: 8 new
"local" registers and 8 new "out" registers.

The 8 "out" registers of the caller are seen in the callee as the 8 "in"
registers of the callee. Successive windows overlap on those 8
registers, which of course are used to pass parameters.

In addition there are 8 "global" registers that are not held in the
register ring and are always accessible with the same names.

IIRC...
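
The overlap described above can be sketched as a small Python model. The ring size, the slot numbering, the direction in which the window index moves, and the class and method names are all assumptions for illustration, and overflow/underflow traps are not modelled. The register numbering (r0-r7 global, r8-r15 out, r16-r23 local, r24-r31 in) follows the usual SPARC convention.

```python
class WindowedRegisters:
    """Toy model of overlapping register windows (no overflow traps)."""

    def __init__(self, nwindows=8):           # ring size is an assumption
        self.nwindows = nwindows
        self.globals = [0] * 8                 # r0..r7, shared by all windows
        self.ring = [0] * (16 * nwindows)      # 16 new slots per window
        self.cwp = 0                           # current window index

    def _slot(self, r):
        """Map a windowed register number (8..31) to a slot in the ring."""
        if 8 <= r <= 15:       # outs: the same slots as the next window's ins
            offset = 16 * (self.cwp + 1) + (r - 8)
        elif 16 <= r <= 23:    # locals: private to this window
            offset = 16 * self.cwp + 8 + (r - 16)
        elif 24 <= r <= 31:    # ins: the same slots as the previous window's outs
            offset = 16 * self.cwp + (r - 24)
        else:
            raise ValueError(f"not a windowed register: r{r}")
        return offset % len(self.ring)

    def read(self, r):
        if 0 <= r < 8:
            return 0 if r == 0 else self.globals[r]   # r0 always reads as zero
        return self.ring[self._slot(r)]

    def write(self, r, value):
        if r == 0:
            return                                    # writes to r0 are discarded
        if 0 < r < 8:
            self.globals[r] = value
        else:
            self.ring[self._slot(r)] = value

    def call(self):
        self.cwp = (self.cwp + 1) % self.nwindows

    def ret(self):
        self.cwp = (self.cwp - 1) % self.nwindows


regs = WindowedRegisters()
regs.write(8, 42)        # caller passes an argument in its first "out"
regs.write(16, 7)        # caller's first "local"
regs.call()
print(regs.read(24))     # 42: the callee sees it as its first "in"
print(regs.read(16))     # 0: the callee has its own locals
regs.ret()
print(regs.read(16))     # 7: the caller's local is intact
```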

Re: Automatic register spill / restore?

 by: Anton Ertl - Tue, 5 Jul 2022 07:54 UTC

Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
>And who uses static WCET analysis any more :-(

I would presume that those applications that needed WCET (worst-case
execution time) analysis in the past still need it. What would they
use instead?

>I have read that one port of gcc for SPARC offers a "flat"
>register-usage model as an alternative to register windows. In the
>"flat" model the same set of 32 registers is used at all points in the
>program, and the register windows are never rotated (except for trap
>handling, I assume). I suppose this "flat" option was implemented for a
>reason, but the reason was not explained, IIRC.

I read somewhere that the register windows could be used for fast task
switching instead of for fast calling. To make use of that you would
need a compiler that does not use the windows for calling; it would
also have to limit its register usage to 16 (one window per task) or
24 (two windows per task) registers.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Automatic register spill / restore?

 by: Ivan Godard - Tue, 5 Jul 2022 09:34 UTC

On 7/5/2022 12:54 AM, Anton Ertl wrote:
> Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
>> And who uses static WCET analysis any more :-(
>
> I would presume that those applications that needed WCET (worst-case
> execution time) analysis in the past still need it. What would they
> use instead?
>
>> I have read that one port of gcc for SPARC offers a "flat"
>> register-usage model as an alternative to register windows. In the
>> "flat" model the same set of 32 registers is used at all points in the
>> program, and the register windows are never rotated (except for trap
>> handling, I assume). I suppose this "flat" option was implemented for a
>> reason, but the reason was not explained, IIRC.
>
> I read somewhere that the register windows could be used for fast task
> switching instead of for fast calling. To make use of that you would
> need a compiler that does not use the windows for calling; it would
> also have to limit its register usage to 16 (one window per task) or
> 24 (two windows per task) registers.

Shouldn't need such limits. You can flush the windows by a recursive
call (making a dummy window) that stops when the end interrupt happens.
Then switch the stack and do a recursive return of the number of dummy
windows that were injected when the new stack was previously switched
out of, saved somewhere (possibly in the dummy). The flush/restore can
tell how many dummy windows there are by counting the recursive calls
before the interrupt; the overflow routine doesn't have to spill any of
the dummies, nor the restore refill them. It does have to spill/refill
the non-dummy windows, but that's inevitable.

It would go even faster if the hardware refilled lazily, assuming that the
spilled windows don't need to be window-block aligned; do you know if
that was required by the implementation?

Re: Automatic register spill / restore?

 by: MitchAlsup - Tue, 5 Jul 2022 16:10 UTC

On Tuesday, July 5, 2022 at 1:33:03 AM UTC-5, Marcus wrote:
> On 2022-07-04, MitchAlsup wrote:
> > On Monday, July 4, 2022 at 12:48:36 AM UTC-5, Andy wrote:
> >> The discussions going on about register to/from stack and load/store
> >> multiple instructions has got me vaguely remembering that there was some
> >> talk about old mainframes that could save to stack automatically any
> >> registers in danger of being overwritten after a jump to subroutine or such.
> >>
> >> Anyone else remember such a thing?, or am I just making things up by
> >> reason of senility and/or madness? (entirely possible I'm afraid :-( )
> >>
> > IBM 360 series::
> > caller allocated a register save area and always kept it in R13
> > upon any arrival (Callee, interruptee) would perform STM 12,12(13)
> > which would dump all 16 registers in the save area, as the first step
> > in properly receiving control.
> How does this differ from a regular STM to stack (except that the
> allocation is done by the caller instead of the callee)? E.g. 68k style:
<
It was a linked list not a stack.
>
> movem.l d0-d7/a0-a6,-(sp)
> > <
> > VAX did something similar, but performed it in the CALL side of ISA
> > processing.
> >>
> >> I wonder, that if they were real, is it possible given today's
> >> transistor core counts those mechanisms could be revived in modern clean
> >> sheet cpu designs?, to at least help alleviate some of the issues we see
> >> in the current state of the art perhaps?
> > <
> > Effectively that is what ENTER and EXIT do in My 66000 architecture.
> >>
> >> Or were there some really big downsides to the automagic things that I'm
> >> not remembering well enough?
> > <
> > Automagic was so slow in VAX that the more modern compilers used JSR
> > instead of CALL and got rid of ½ of the cycles in calling/returning.

Re: Automatic register spill / restore?

 by: EricP - Tue, 5 Jul 2022 16:38 UTC

Niklas Holsti wrote:
> On 2022-07-04 19:14, EricP wrote:
>> Andy wrote:
>>> The discussions going on about register to/from stack and load/store
>>> multiple instructions has got me vaguely remembering that there was
>>> some talk about old mainframes that could save to stack automatically
>>> any registers in danger of being overwritten after a jump to
>>> subroutine or such.
>>>
>>> Anyone else remember such a thing?, or am I just making things up by
>>> reason of senility and/or madness? (entirely possible I'm afraid :-( )
>>>
>>>
>>> I wonder, that if they were real, is it possible given today's
>>> transistor core counts those mechanisms could be revived in modern
>>> clean sheet cpu designs?, to at least help alleviate some of the
>>> issues we see in the current state of the art perhaps?
>>>
>>> Or were there some really big downsides to the automagic things that
>>> I'm not remembering well enough?
>>
>> Sparc register windows. They were opaque, asynchronous lazy spill/fill
>> which was done by kernel mode traps. Reportedly it had... issues.
>
>
> What sort of "issues"? Performance wrt alternatives? SPARC processors
> are still used extensively in space systems, in particular on ESA
> missions. I've written embedded SW for several such systems and I'm not
> aware of any serious issues.
>
> Register windows do complicate static WCET analysis a bit, because it is
> not easy to predict exactly which call or which return can cause a trap.
> But an upper bound can be calculated that is not very pessimistic. And
> who uses static WCET analysis any more :-(
>
> I have read that one port of gcc for SPARC offers a "flat"
> register-usage model as an alternative to register windows. In the
> "flat" model the same set of 32 registers is used at all points in the
> program, and the register windows are never rotated (except for trap
> handling, I assume). I suppose this "flat" option was implemented for a
> reason, but the reason was not explained, IIRC.
>
> A minor annoyance of register windows, and of the standard trap handlers
> for rotating windows, is that every non-leaf subroutine must allocate 96
> octets of stack for saving 24 registers. This area is used if, and only
> if, the subroutine executes a call that causes a register-ring-overflow
> trap. A programming style that uses many small subroutines can lead to
> surprisingly large stack usage.
>
>
>> A quick search of 'sparc "register window" ' finds
>>
>> [paywalled]
>> Error behavior comparison of multiple computing systems: A case study
>> using linux on pentium, solaris on sparc, and aix on power, 2008
>> https://ieeexplore.ieee.org/abstract/document/4725314/
>>
>> I found a copy available in an online viewer with download option:
>>
>> https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxkY2hlbjhyZXNlYXJjaHxneDoxNzdkNWMxOWVmOWNhMDdm
>
>
>
> Thanks for this reference...
>
>
>> says:
>> "The registers introduced to manage the register window in the
>> SPARC architecture are particularly error-sensitive and contribute
>> to more than 50% of the hang cases for the Solaris System.
>> This result indicates that, while using the register window may
>> improve performance, it can create unexpected reliability problems."
>
>
> These are not "natural", real-life errors and hangs. That study injects
> single-bit errors into memory words and system registers and sees what
> happens. It is to be expected that critical system registers, such as
> the ones involved in register-ring management, are very sensitive to
> such simulated HW errors. This is not a flaw in the logical
> architecture, but HW architects may want to make these registers
> especially robust.

Right, the errors were simulated, but as I read it they contend the
fragility is due to the register windows and the way they were manipulated.

> SPARC processors used in space applications are radiation-tolerant and
> use triple modular redundancy for critical registers.

I can't find the references just now but I have seen it
discussed in the past so I'll see what I can dig up.

IIRC the primary issue was the phase delay in window save/restore
was opaque so code could only know for sure that memory was up to
date by explicit flushing through OS traps -
something akin to manual cache coherence.
I would be paranoid about things like general OS calls, exceptions,
interrupts, user mode task switches not seeing a coherent view of memory,
and would flush the pending register window on entry to these
until each can be proved safe, on a case-by-case basis.

One issue is that even in-order implementations can require a large
number of hardware registers, of which programmers can use only a small
number, similar to the costs of bank-switched register sets like Arm32.

H&P note of Sparc:
- "Given that each window has 16 unique registers, an implementation of
SPARC can have as few as 40 physical registers and as many as 520,
although most have 128 to 136.
...
The danger of register windows is that the larger number of
registers could slow down the clock rate."

One compiler writer notes:
- "The [Sparc] architecture requires a designer to implement 120 registers,
of which only 24 to 29 are available for use by the compiler writer."

- "Setjmp is an exceptional case for saving register windows.
Because setjmp saves processor state, it is necessary for it to
force the hidden register state to the stack and to save the current
state into the jump buffer. ...
This makes setjmp an exceptionally slow operation."

- some comments in comp.compilers
https://compilers.iecc.com/comparch/article/94-02-130
https://compilers.iecc.com/comparch/article/94-02-134
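
The register counts quoted above follow from simple arithmetic if one
assumes the usual layout of 8 globals plus 16 unique registers per window.
A minimal sketch (function names are illustrative, not from any manual)
that reproduces the 40, 120, 136 and 520 figures and shows what share of
the file is visible to a program at any one time:

```python
# Sketch: size of a windowed integer register file versus window count.
# Assumes 8 global registers plus 16 unique registers per window
# (8 locals + 8 ins; a window's outs are the next window's ins).
# All names here are hypothetical and chosen for illustration.

GLOBALS = 8
UNIQUE_PER_WINDOW = 16
VISIBLE = 32  # 8 globals + 8 ins + 8 locals + 8 outs


def physical_registers(nwindows: int) -> int:
    """Total registers the hardware must implement for `nwindows` windows."""
    if not 2 <= nwindows <= 32:
        raise ValueError("window count outside the 2..32 range")
    return GLOBALS + UNIQUE_PER_WINDOW * nwindows


def visible_fraction(nwindows: int) -> float:
    """Share of the physical file addressable by the running routine."""
    return VISIBLE / physical_registers(nwindows)


if __name__ == "__main__":
    for n in (2, 7, 8, 32):
        print(n, physical_registers(n), f"{visible_fraction(n):.0%}")
```

Under these assumptions the 120-register figure corresponds to 7 windows
and the 136-register figure to 8.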

Re: Automatic register spill / restore?

<Ir_wK.281951$ssF.94316@fx14.iad>

https://www.novabbs.com/devel/article-flat.php?id=26345&group=comp.arch#26345

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx14.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Automatic register spill / restore?
References: <t9tuvg$1qgu$1@gioia.aioe.org> <OlEwK.24566$8f2.9092@fx38.iad> <jigrgdF3ft7U1@mid.individual.net> <2022Jul5.095444@mips.complang.tuwien.ac.at> <ta10jg$3m27q$1@dont-email.me>
In-Reply-To: <ta10jg$3m27q$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 55
Message-ID: <Ir_wK.281951$ssF.94316@fx14.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Tue, 05 Jul 2022 17:22:48 UTC
Date: Tue, 05 Jul 2022 13:22:12 -0400
X-Received-Bytes: 3640
 by: EricP - Tue, 5 Jul 2022 17:22 UTC

Ivan Godard wrote:
> On 7/5/2022 12:54 AM, Anton Ertl wrote:
>> Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
>>> And who uses static WCET analysis any more :-(
>>
>> I would presume that those applications that needed WCET (worst-case
>> execution time) analysis in the past still need it. What would they
>> use instead?
>>
>>> I have read that one port of gcc for SPARC offers a "flat"
>>> register-usage model as an alternative to register windows. In the
>>> "flat" model the same set of 32 registers is used at all points in the
>>> program, and the register windows are never rotated (except for trap
>>> handling, I assume). I suppose this "flat" option was implemented for a
>>> reason, but the reason was not explained, IIRC.
>>
>> I read somewhere that the register windows could be used for fast task
>> switching instead of for fast calling. To make use of that you would
>> need a compiler that does not use the windows for calling; it would
>> also have to limit its register usage to 16 (one window per task) or
>> 24 (two windows per task) registers.
>
>
> Shouldn't need such limits. You can flush the windows by a recursive
> call (making a dummy window) that stops when the end interrupt happens.
> Then switch the stack and do a recursive return of the number of dummy
> windows that were injected when the new stack was previously switched
> out of, saved somewhere (possibly in the dummy). The flush/restore can
> tell how many dummy windows there are by counting the recursive calls
> before the interrupt; the overflow routine doesn't have to spill any of
> the dummies, nor the restore refill them. It does have to spill/refill
> the non-dummy windows, but that's inevitable.
>
> It would go even faster if the hardware refilled lazy, assuming that the
> spilled windows don't need to be window-block aligned; do you know if
> that was required by the implementation?

The size of the Sparc saved HW register window is model dependent
from 2..32 so to flush it you would have to do 32 SAVE instructions,
which could potentially trigger multiple OS traps.

The hardware doesn't seem to have the tracking registers necessary
to optimize save/restore, though maybe later models did.

One improvement might be to have a model register that can be
read to find the HW window size in user mode.
Another could be a FLUSH instruction.

It looks like it just has a Valid bit per register window.
It also needs a Modified bit to track changes since last spill/fill,
and low-water and high-water spill/refill stack pointers.
That would allow it to only flush windows modified since reload,
spill-ahead when the window is 3/4 full, refill-ahead when 1/4 full,
and minimize the window management stalls.
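
A toy model of the low-water/high-water idea, to make the effect on traps
concrete. This is only a sketch under loud assumptions: one window per
call, background transfers always complete before the next call or return,
and the 3/4 and 1/4 thresholds apply to the number of resident windows.
The Modified bit is not modelled, and the class and method names are made
up for illustration.

```python
# Toy model of a register-window file with optional spill-ahead and
# refill-ahead. Not a description of any real SPARC implementation.


class WindowFile:
    def __init__(self, capacity, ahead):
        self.capacity = capacity  # hardware windows usable by the call chain
        self.ahead = ahead        # enable the water-mark policy
        self.resident = 1         # call-chain windows held in registers
        self.spilled = 0          # call-chain windows held in memory
        self.stalls = 0           # synchronous spills/fills (traps)
        self.background = 0       # transfers done ahead of need

    def call(self):
        if self.resident == self.capacity:
            # Overflow: the oldest window must be spilled before proceeding.
            self.resident -= 1
            self.spilled += 1
            self.stalls += 1
        self.resident += 1
        self._balance()

    def ret(self):
        if self.resident + self.spilled == 1:
            raise RuntimeError("return from the outermost frame")
        self.resident -= 1
        if self.resident == 0:
            # Underflow: the caller's window must be refilled first.
            self.spilled -= 1
            self.resident += 1
            self.stalls += 1
        self._balance()

    def _balance(self):
        if not self.ahead:
            return
        if self.resident * 4 >= self.capacity * 3:
            # At or above 3/4 full: spill the oldest window in the background.
            self.resident -= 1
            self.spilled += 1
            self.background += 1
        elif self.resident * 4 <= self.capacity and self.spilled > 0:
            # At or below 1/4 full: refill one window in the background.
            self.spilled -= 1
            self.resident += 1
            self.background += 1


def run(depth, capacity, ahead):
    """Call `depth` levels deep, then return all the way back."""
    wf = WindowFile(capacity, ahead)
    for _ in range(depth):
        wf.call()
    for _ in range(depth):
        wf.ret()
    return wf


if __name__ == "__main__":
    for ahead in (False, True):
        wf = run(depth=20, capacity=8, ahead=ahead)
        print(ahead, wf.stalls, wf.background)
```

In this idealised model a 20-deep call chain on 8 windows takes 26 traps
without the water marks and none with them, at the price of 32 background
transfers. Real hardware would not always hide those transfers completely.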

Re: Automatic register spill / restore?

<ta21sv$agl$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=26346&group=comp.arch#26346

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: Automatic register spill / restore?
Date: Tue, 5 Jul 2022 19:02:55 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <ta21sv$agl$1@gal.iecc.com>
References: <t9tuvg$1qgu$1@gioia.aioe.org> <bbea1a0c-c744-4cfe-86c0-d9d1281c829en@googlegroups.com> <ta0lus$3l4lo$1@dont-email.me> <52bc40b7-0cbb-4f97-8c00-ae37b397321bn@googlegroups.com>
Injection-Date: Tue, 5 Jul 2022 19:02:55 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="10773"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <t9tuvg$1qgu$1@gioia.aioe.org> <bbea1a0c-c744-4cfe-86c0-d9d1281c829en@googlegroups.com> <ta0lus$3l4lo$1@dont-email.me> <52bc40b7-0cbb-4f97-8c00-ae37b397321bn@googlegroups.com>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Tue, 5 Jul 2022 19:02 UTC

According to MitchAlsup <MitchAlsup@aol.com>:
>> > IBM 360 series::
>> > caller allocated a register save area and always kept it in R13
>> > upon any arrival (Callee, interruptee) would perform STM 12,12(13)
>> > which would dump all 16 registers in the save area, as the first step
>> > in properly receiving control.
>> How does this differ from a regular STM to stack (except that the
>> allocation is done by the caller instead of the callee)? E.g. 68k style:
><
>It was a linked list not a stack.

Depends on the details. In PL/I it was a stack, in Fortran and COBOL which
did not allow recursion, it was a linked list of static save areas.
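
The difference can be sketched in a few lines. This is a simplified model
of save-area chaining, not the exact OS/360 layout: each routine gets a
save area with a back and a forward pointer, and the callee stores the
caller's registers into the caller's area before linking in its own. All
names below are hypothetical.

```python
# Sketch of save-area chaining: static areas (one per routine) versus
# a fresh area per activation (stack style). Simplified for illustration.


class SaveArea:
    def __init__(self, owner):
        self.owner = owner
        self.back = None     # caller's save area
        self.forward = None  # save area of the routine most recently called
        self.regs = None     # caller's registers, stored by the callee


def enter(caller_area, callee_area, caller_regs):
    """Callee prologue: save the caller's registers and chain the areas."""
    caller_area.regs = list(caller_regs)  # stored where R13 points on entry
    callee_area.back = caller_area
    caller_area.forward = callee_area
    return callee_area                    # becomes the new R13


def walk_back(area):
    """Follow back pointers; report the owners seen and whether it looped."""
    seen, owners = set(), []
    while area is not None and id(area) not in seen:
        seen.add(id(area))
        owners.append(area.owner)
        area = area.back
    return owners, area is not None


def static_recursion():
    main, sub = SaveArea("MAIN"), SaveArea("SUB")  # one fixed area per routine
    r13 = enter(main, sub, [0] * 4)                # MAIN calls SUB
    r13 = enter(r13, sub, [1] * 4)                 # SUB calls SUB, same area
    return walk_back(r13)


def stacked_recursion():
    main = SaveArea("MAIN")
    r13 = enter(main, SaveArea("SUB#1"), [0] * 4)  # fresh area per activation
    r13 = enter(r13, SaveArea("SUB#2"), [1] * 4)
    return walk_back(r13)


if __name__ == "__main__":
    print(static_recursion())
    print(stacked_recursion())
```

With static areas the recursive call overwrites the back pointer, so the
chain loops on itself and the way back to MAIN is lost, which is why that
scheme only works for languages without recursion.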

>> > Automagic was so slow in VAX that the more modern compilers used JSR
>> > instead of CALL and got rid of ½ of the cycles in calling/returning.

JSB actually, but yes, many of the complex VAX instructions were so slow that
they weren't useful.

VAX appears to have been designed in an era when microcode was much faster
than RAM so everything was limited by memory speed. Unfortunately, by the
time there were actual VAX computers, that wasn't true any more.
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Automatic register spill / restore?

<jijg97Ffk0hU1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=26347&group=comp.arch#26347

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: niklas.h...@tidorum.invalid (Niklas Holsti)
Newsgroups: comp.arch
Subject: Re: Automatic register spill / restore?
Date: Tue, 5 Jul 2022 22:04:06 +0300
Organization: Tidorum Ltd
Lines: 202
Message-ID: <jijg97Ffk0hU1@mid.individual.net>
References: <t9tuvg$1qgu$1@gioia.aioe.org> <OlEwK.24566$8f2.9092@fx38.iad>
<jigrgdF3ft7U1@mid.individual.net> <SOZwK.281950$ssF.145632@fx14.iad>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: individual.net KxkMg/JAJjyaUl/ca1ZvJQwjc0VCvgMLczNOj10Yv4/n1Qki8s
Cancel-Lock: sha1:NhHVgx0dsK0RSwBE/o54CCGmi+4=
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:91.0)
Gecko/20100101 Thunderbird/91.9.1
Content-Language: en-US
In-Reply-To: <SOZwK.281950$ssF.145632@fx14.iad>
 by: Niklas Holsti - Tue, 5 Jul 2022 19:04 UTC

On 2022-07-05 19:38, EricP wrote:
> Niklas Holsti wrote:
>> On 2022-07-04 19:14, EricP wrote:
>>> Andy wrote:
>>>> The discussions going on about register to/from stack and load/store
>>>> multiple instructions has got me vaguely remembering that there was
>>>> some talk about old mainframes that could save to stack
>>>> automatically any registers in danger of being overwritten after a
>>>> jump to subroutine or such.
>>>>
>>>> Anyone else remember such a thing?, or am I just making things up by
>>>> reason of senility and/or madness? (entirely possible I'm afraid :-( )
>>>>
>>>>
>>>> I wonder, that if they were real, is it possible given today's
>>>> transistor core counts those mechanisms could be revived in modern
>>>> clean sheet cpu designs?, to at least help alleviate some of the
>>>> issues we see in the current state of the art perhaps?
>>>>
>>>> Or were there some really big downsides to the automagic things that
>>>> I'm not remembering well enough?
>>>
>>> Sparc register windows. They were opaque, asynchronous lazy spill/fill
>>> which was done by kernel mode traps. Reportedly it had... issues.
>>
>>
>> What sort of "issues"? Performance wrt alternatives? SPARC processors
>> are still used extensively in space systems, in particular on ESA
>> missions. I've written embedded SW for several such systems and I'm
>> not aware of any serious issues.
>>
>> Register windows do complicate static WCET analysis a bit, because it
>> is not easy to predict exactly which call or which return can cause a
>> trap. But an upper bound can be calculated that is not very
>> pessimistic. And who uses static WCET analysis any more :-(
>>
>> I have read that one port of gcc for SPARC offers a "flat"
>> register-usage model as an alternative to register windows. In the
>> "flat" model the same set of 32 registers is used at all points in the
>> program, and the register windows are never rotated (except for trap
>> handling, I assume). I suppose this "flat" option was implemented for
>> a reason, but the reason was not explained, IIRC.
>>
>> A minor annoyance of register windows, and of the standard trap
>> handlers for rotating windows, is that every non-leaf subroutine must
>> allocate 96 octets of stack for saving 24 registers. This area is used
>> if, and only if, the subroutine executes a call that causes a
>> register-ring-overflow trap. A programming style that uses many small
>> subroutines can lead to surprisingly large stack usage.
>>
>>
>>> A quick search of 'sparc "register window" ' finds
>>>
>>> [paywalled]
>>> Error behavior comparison of multiple computing systems: A case study
>>> using linux on pentium, solaris on sparc, and aix on power, 2008
>>> https://ieeexplore.ieee.org/abstract/document/4725314/
>>>
>>> I found a copy available in an online viewer with download option:
>>>
>>> https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxkY2hlbjhyZXNlYXJjaHxneDoxNzdkNWMxOWVmOWNhMDdm
>>
>>
>>
>>
>> Thanks for this reference...
>>
>>
>>> says:
>>> "The registers introduced to manage the register window in the
>>> SPARC architecture are particularly error-sensitive and contribute
>>> to more than 50% of the hang cases for the Solaris System.
>>> This result indicates that, while using the register window may
>>> improve performance, it can create unexpected reliability problems."
>>
>>
>> These are not "natural", real-life errors and hangs. That study
>> injects single-bit errors into memory words and system registers and
>> sees what happens. It is to be expected that critical system
>> registers, such as the ones involved in register-ring management, are
>> very sensitive to such simulated HW errors. This is not a flaw in
>> the logical architecture, but HW architects may want to make these
>> registers especially robust.
>
> Right, the errors were simulated but as I read it they contend the
> fragility is due to the register windows and the way they were manipulated.

But they also say: "Injections into the system registers indicate that
each of the three processors (the Pentium 4, UltraSPARC IIIi, and
POWER5) have two to three critical registers that are very sensitive to
errors (see Figure 9 through Figure 11), those being IDTR, GDTR and CR4
in the Pentium 4; TBA and SET_SOFTINT in the UltraSPARCIIIi; and SPRG1,
IAR, and MSR in the POWER5."

I still don't think that this paper shows an intrinsic reliability flaw
in the register-window idea, at least not one that cannot be compensated
by increasing the robustness of the critical register HW.

However, I see now that UltraSPARC IIIi, used in the paper, uses the
SPARC V9 (or JPS1) architecture, while my experience is with the earlier
SPARC versions V7 and V8. I think the register-window mechanism is the
same in V9, but the paper's description of the mechanism disagrees in
some details with my understanding -- for example, the paper says that
each window has 32 registers, not 24 (visible) or 16 (owned by the window).

>> SPARC processors used in space applications are radiation-tolerant and
>> use triple modular redundancy for critical registers.
>
> I can't find the references just now but I have seen it
> discussed in the past so I'll see what I can dig up.
>
> IIRC the primary issue was the phase delay in window save/restore
> was opaque so code could only know for sure that memory was up to
> date by explicit flushing through OS traps -
> something akin to manual cache coherence.

All the occupied windows in the register ring have to be stored into
memory (into the areas reserved for this in each stack frame) when a
thread is suspended, and at least one windowful has to be reloaded when
the thread is resumed. This is in addition to storing and reloading the
8 "global" registers and any other thread-specific context.

A debugger may also prefer to store the ring into memory so that it can
access the saved SP, FP, and register-allocated locals of upper-level
calls. But at least in SPARC V7 and V8 that is all deterministic and
program-controlled. AFAIK the SPARC processors do not autonomously store
register-ring contents into memory in the background, it has to be
programmed explicitly.

A modern OoO processor may have hundreds of working registers, not
architecturally visible to the programmer. I don't know how those are
handled on thread switch, but it is probably complicated.

> I would be paranoid about things like general OS calls, exceptions,
> interrupts, user mode task switches not seeing a coherent view of memory,
> and would flush the pending register window on entry to these
> until each can be proved safe, on a case-by-case basis.

Traps and interrupts work in the same way as subroutine calls. OS calls
are typically done by triggering a SW trap. Exceptions may have to
unwind the stack which involves both RESTORE instructions (as in normal
returns from calls) and reloading register windows that had been pushed
out of the register ring. Some work, but not horribly complex.

> One issue is even for in-order implementations it can require a large
> number of hardware registers but programmers can only use a small number,
> similar to costs in bank-switched register sets like Arm32.

But the same issue arises in OoO systems without register windows,
right? Hundreds of working registers, invisible to the programmer.

> H&P note of Sparc:
> - "Given that each window has 16 unique registers, an implementation of
>   SPARC can have as few as 40 physical registers and as many as 520,
>   although most have 128 to 136.
>   ...
>   The danger of register windows is that the larger number of
>   registers could slow down the clock rate."

I suppose the OoO processors with large numbers of registers should have
the same clock-rate risk, but perhaps the OoO parallelism compensates.

> One compiler writer notes:
> - "The [Sparc] architecture requires a designer to implement 120 registers,
>   of which only 24 to 29 are available for use by the compiler writer."

Same issue for OoO machines.

> - "Setjmp is an exceptional case for saving register windows.
>   Because setjmp saves processor state, it is necessary for it to
>   force the hidden register state to the stack and to save the current
>   state into the jump buffer. ...
>   This makes setjmp an exceptionally slow operation."


Re: Automatic register spill / restore?

<jijhqcFfscrU1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=26348&group=comp.arch#26348

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: niklas.h...@tidorum.invalid (Niklas Holsti)
Newsgroups: comp.arch
Subject: Re: Automatic register spill / restore?
Date: Tue, 5 Jul 2022 22:30:19 +0300
Organization: Tidorum Ltd
Lines: 48
Message-ID: <jijhqcFfscrU1@mid.individual.net>
References: <t9tuvg$1qgu$1@gioia.aioe.org> <OlEwK.24566$8f2.9092@fx38.iad>
<jigrgdF3ft7U1@mid.individual.net>
<2022Jul5.095444@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net jjnvHIh2fsNkWfds/tn1fwqyZTYRJXR1iwfwoIFKlv08/TDxzK
Cancel-Lock: sha1:fxFA0GClFyRN9gumI1DOSnFaOUo=
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:91.0)
Gecko/20100101 Thunderbird/91.9.1
Content-Language: en-US
In-Reply-To: <2022Jul5.095444@mips.complang.tuwien.ac.at>
 by: Niklas Holsti - Tue, 5 Jul 2022 19:30 UTC

On 2022-07-05 10:54, Anton Ertl wrote:
> Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
>> And who uses static WCET analysis any more :-(
>
> I would presume that those applications that needed WCET (worst-case
> execution time) analysis in the past still need it. What would they
> use instead?

They can seldom use /static/ WCET analysis because the execution speed
of modern processors is too unpredictable, thanks to all the
acceleration mechanisms that are ever-evolving and poorly documented, not
to mention multi-cores and their resource-contention problems, not to
mention out-of-order and speculative processing...

The only pleasure aficionados of static WCET analysis (like myself) have
nowadays is the schadenfreude we feel when all those acceleration
mechanisms turn out to be side channels leaking secrets -- Spectre and
Meltdown etc.

People who need WCETs instead use "hybrid" methods that combine
fine-grained execution-time measurements with some static control-flow
analysis to compute a probabilistic "WCET estimate". Possibly combined
with randomized HW to motivate their use of "extreme-value statistics"
for computing the reliability of that probabilistic WCET estimate.
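
To give a flavour of the statistical step, here is a minimal sketch of one
textbook variant: take block maxima of measured execution times, fit a
Gumbel distribution by the method of moments, and read off a quantile. It
is not necessarily what any particular tool does; the synthetic timings,
block size and exceedance probability are all assumptions chosen for
illustration.

```python
# Sketch: block-maxima extreme-value estimate of a probabilistic WCET.
# Uses a Gumbel fit by the method of moments; real tools are more careful
# (independence tests, goodness-of-fit checks, other distributions).
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def block_maxima(samples, block_size):
    """Maximum of each complete block of `block_size` measurements."""
    return [max(samples[i:i + block_size])
            for i in range(0, len(samples) - block_size + 1, block_size)]


def fit_gumbel(maxima):
    """Method-of-moments estimates of Gumbel location and scale."""
    beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    mu = statistics.mean(maxima) - EULER_GAMMA * beta
    return mu, beta


def pwcet(mu, beta, exceedance):
    """Gumbel quantile exceeded with probability `exceedance` per block."""
    return mu - beta * math.log(-math.log1p(-exceedance))


def demo():
    rng = random.Random(0)  # synthetic timings, purely illustrative
    timings = [1000.0 + rng.expovariate(1 / 20.0) for _ in range(5000)]
    mu, beta = fit_gumbel(block_maxima(timings, 100))
    return max(timings), pwcet(mu, beta, 1e-9)


if __name__ == "__main__":
    observed, bound = demo()
    print(f"largest observed: {observed:.1f}, pWCET estimate: {bound:.1f}")
```

The estimate is only as trustworthy as the assumption that the measured
maxima really follow the fitted distribution, which is where the
randomized HW argument comes in.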

>> I have read that one port of gcc for SPARC offers a "flat"
>> register-usage model as an alternative to register windows. In the
>> "flat" model the same set of 32 registers is used at all points in the
>> program, and the register windows are never rotated (except for trap
>> handling, I assume). I suppose this "flat" option was implemented for a
>> reason, but the reason was not explained, IIRC.
>
> I read somewhere that the register windows could be used for fast task
> switching instead of for fast calling. To make use of that you would
> need a compiler that does not use the windows for calling; it would
> also have to limit its register usage to 16 (one window per task) or
> 24 (two windows per task) registers.

That seems possible, but I haven't come across a real example.

In addition to the cases of one or two windows per task in a "flat"
model without any within-task use of SAVE/RESTORE instructions, one
could instead partition the register ring so that each task could use a
task-specific sector of the ring, containing any number of windows, with
the task using SAVE/RESTORE for call/return in the usual way (non-"flat").

Re: Automatic register spill / restore?

<jijio2Fg0ofU1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=26349&group=comp.arch#26349

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: niklas.h...@tidorum.invalid (Niklas Holsti)
Newsgroups: comp.arch
Subject: Re: Automatic register spill / restore?
Date: Tue, 5 Jul 2022 22:46:09 +0300
Organization: Tidorum Ltd
Lines: 14
Message-ID: <jijio2Fg0ofU1@mid.individual.net>
References: <t9tuvg$1qgu$1@gioia.aioe.org> <OlEwK.24566$8f2.9092@fx38.iad>
<jigrgdF3ft7U1@mid.individual.net> <SOZwK.281950$ssF.145632@fx14.iad>
<jijg97Ffk0hU1@mid.individual.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net Q8KKYzNs3z7CVN9LD4szdwbdC2njpl+yM4UhRUnGOsk7RNn15E
Cancel-Lock: sha1:IrLhnzYKpOA19FBlweoHt6OTAkk=
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:91.0)
Gecko/20100101 Thunderbird/91.9.1
Content-Language: en-US
In-Reply-To: <jijg97Ffk0hU1@mid.individual.net>
 by: Niklas Holsti - Tue, 5 Jul 2022 19:46 UTC

Replying to myself:

On 2022-07-05 22:04, Niklas Holsti wrote:

> A modern OoO processor may have hundreds of working registers, not
> architecturally visible to the programmer. I don't know how those are
> handled on thread switch, but it is probably complicated.

On second thought, the invisible working registers should be handled
fully by the OoO HW if the thread switch is implemented by normal code
that only manipulates the architecturally visible registers. So the
SPARC register windows do complicate thread switching more than these
invisible working registers do.

Re: Automatic register spill / restore?

<dad7bdde-2b36-40fd-8b48-74448f32c158n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26350&group=comp.arch#26350

Newsgroups: comp.arch
X-Received: by 2002:a05:622a:14d2:b0:31d:3a01:3db2 with SMTP id u18-20020a05622a14d200b0031d3a013db2mr19086701qtx.575.1657051652437;
Tue, 05 Jul 2022 13:07:32 -0700 (PDT)
X-Received: by 2002:a05:6214:1cc3:b0:46e:64aa:842a with SMTP id
g3-20020a0562141cc300b0046e64aa842amr34212715qvd.101.1657051652179; Tue, 05
Jul 2022 13:07:32 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 5 Jul 2022 13:07:32 -0700 (PDT)
In-Reply-To: <jijg97Ffk0hU1@mid.individual.net>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <t9tuvg$1qgu$1@gioia.aioe.org> <OlEwK.24566$8f2.9092@fx38.iad>
<jigrgdF3ft7U1@mid.individual.net> <SOZwK.281950$ssF.145632@fx14.iad> <jijg97Ffk0hU1@mid.individual.net>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <dad7bdde-2b36-40fd-8b48-74448f32c158n@googlegroups.com>
Subject: Re: Automatic register spill / restore?
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 05 Jul 2022 20:07:32 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 149
 by: MitchAlsup - Tue, 5 Jul 2022 20:07 UTC

On Tuesday, July 5, 2022 at 2:04:11 PM UTC-5, Niklas Holsti wrote:
> On 2022-07-05 19:38, EricP wrote:
> > Niklas Holsti wrote:
<snip>
> >
> > Right, the errors were simulated but as I read it they contend the
> > fragility is due to the register windows and the way they were manipulated.
> But they also say: "Injections into the system registers indicate that
> each of the three processors (the Pentium 4, UltraSPARC IIIi, and
> POWER5) have two to three critical registers that are very sensitive to
> errors (see Figure 9 through Figure 11), those being IDTR, GDTR and CR4
> in the Pentium 4; TBA and SET_SOFTINT in the UltraSPARCIIIi; and SPRG1,
> IAR, and MSR in the POWER5."
<
I don't know of any processors that can usefully withstand someone coming
along and randomly flipping bits in PSR, Root Pointers, Page table structures,
and other control registers. SPARC should be little different.
>
> I still don't think that this paper shows an intrinsic reliability flaw
> in the register-window idea, at least not one that cannot be compensated
> by increasing the robustness of the critical register HW.
<
Agreed.
<
>
> However, I see now that UltraSPARC IIIi, used in the paper, uses the
> SPARC V9 (or JPS1) architecture, while my experience is with the earlier
> SPARC versions V7 and V8. I think the register-window mechanism is the
> same in V9, but the paper's description of the mechanism disagrees in
> some details with my understanding -- for example, the paper says that
> each window has 32 registers, not 24 (visible) or 16 (owned by the window).
<
SPARC V9 added some new functionality for OS control over register windows that
was supposed to make some of the background overhead go away.
<
> >> SPARC processors used in space applications are radiation-tolerant and
> >> use triple modular redundancy for critical registers.
> >
> > I can't find the references just now but I have seen it
> > discussed in the past so I'll see what I can dig up.
> >
> > IIRC the primary issue was the phase delay in window save/restore
> > was opaque so code could only know for sure that memory was up to
> > date by explicit flushing through OS traps -
> > something akin to manual cache coherence.
<
> All the occupied windows in the register ring have to be stored into
> memory (into the areas reserved for this in each stack frame) when a
> thread is suspended, and at least one windowful has to be reloaded when
> the thread is resumed. This is in addition to storing and reloading the
> 8 "global" registers and any other thread-specific context.
<
Imagine passing a pointer to the argument which arrives in a register window.
>
> A debugger may also prefer to store the ring into memory so that it can
> access the saved SP, FP, and register-allocated locals of upper-level
> calls. But at least in SPARC V7 and V8 that is all deterministic and
> program-controlled. AFAIK the SPARC processors do not autonomously store
> register-ring contents into memory in the background, it has to be
> programmed explicitly.
>
> A modern OoO processor may have hundreds of working registers, not
> architecturally visible to the programmer. I don't know how those are
> handled on thread switch, but it is probably complicated.
<
But those OoO registers are in use, whereas the invisible windows of SPARC
cannot be "in use".
<
> > I would be paranoid about things like general OS calls, exceptions,
> > interrupts, user mode task switches not seeing a coherent view of memory,
> > and would flush the pending register window on entry to these
> > until each can be proved safe, on a case-by-case basis.
<
When we shut down a company, we had a server (SPARC V8) that had been up
for 7 years 4 months and several days. It had every one of its CPU modules
replaced, every one of its memory modules replaced, and most of the disk
farm replaced; yet the system was never out of service or "down". Thus, I don't
see how register windows did those kinds of systems "any harm".
<
> Traps and interrupts work in the same way as subroutine calls.
<
I have to fundamentally disagree with this statement. Subroutine calls know
who they are transferring control to, and that it is within its own address space.
Traps can be made to smell like subroutine calls, but are spurious in nature.
Interrupts are not like subroutine calls, because they are entirely asynchronous
with the thread being interrupted. You can make them smell like unexpected
subroutine calls if you desire.
<
> OS calls
> are typically done by triggering a SW trap. Exceptions may have to
> unwind the stack which involves both RESTORE instructions (as in normal
> returns from calls) and reloading register windows that had been pushed
> out of the register ring. Some work, but not horribly complex.
<
> > One issue is even for in-order implementations it can require a large
> > number of hardware registers but programmers can only use a small number,
> > similar to costs in bank-switched register sets like Arm32.
<
> But the same issue arises in OoO systems without register windows,
> right? Hundreds of working registers, invisible to the programmer.
<
But in OoO design, one may have hundreds of physical registers, but over the
span of 200-ish clocks, every one of them can be used, without ever crossing
a subroutine boundary. Yes, the access time will be similar due to size/area/ports,
but the registers are in use, whereas in a register window, they cannot be in use.
<
To build an OoO SPARC one would need the large window of the OoO and then
add to that the excesses of the register window. Worst of all cases (except
Itanic-like rotating registers...).
<
> > H&P note of Sparc:
> > - Given that each window has 16 unique registers, an implementation of
> > SPARC can have as few as 40 physical registers and as many as 520,
> > although most have 128 to 136.
> > ...
> > The danger of register windows is that the larger number of
> > registers could slow down the clock rate."
<
Whereas MIPS got to high frequencies rather easily, SPARC never did.
Read into that what you will.
<
> I suppose the OoO processors with large numbers of registers should have
> the same clock-rate risk, but perhaps the OoO parallelism compensates.
<
You can "pipeline" the register file access time away. What you cannot pipeline
away is the trap handling complexity.
<
> > One compiler writer notes:
> > - "The [Sparc] architecture requires a designer to implement 120 registers,
> > of which only 24 to 29 are available for use by the compiler writer."
> Same issue for OoO machines.
> > - "Setjmp is an exceptional case for saving register windows.
> > Because setjmp saves processor state, it is necessary for it to
> > force the hidden register state to the stack and to save the current
> > state into the jump buffer. ...
> > This makes setjmp an exceptionally slow operation."
> Hm. That may be necessary for some weird uses of setjmp/longjmp, such as
> implementing some cheapo threading or coroutining services, but I don't
> think it is needed for the more normal use of setjmp/longjmp as an
> exception-handling mechanism.
> > - some comments in comp.compilers
> > https://compilers.iecc.com/comparch/article/94-02-130
> > https://compilers.iecc.com/comparch/article/94-02-134
> Those discussions of setjmp/longjmp implementations reveal some
> complexity, but it certainly works on SPARCs, and IMO it can't be seen
> as a major draw-back. But I may be prejudiced because my applications
> use exceptions either never (embedded SW) or rarely (other SW).
<
So, while it is not a MAJOR DrawBack, it is onerous enough that the
designer should proceed with great caution.
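
The H&P register counts quoted above (40 to 520) and the 24-visible / 16-owned split can be checked with a little arithmetic. A minimal sketch follows, assuming the V8 convention that SAVE decrements CWP (V9 counts the other way); the function names and the physical layout are made up for illustration and are not taken from any SPARC manual.

```python
# Sketch of SPARC-style register-window arithmetic (names are illustrative).
GLOBALS = 8             # r0..r7, shared by all windows
UNIQUE_PER_WINDOW = 16  # 8 outs + 8 locals owned by each window


def physical_register_count(nwindows: int) -> int:
    """Total integer registers for an implementation with `nwindows` windows."""
    return GLOBALS + UNIQUE_PER_WINDOW * nwindows


def physical_index(cwp: int, reg: int, nwindows: int) -> int:
    """Map architectural register `reg` (0..31) in window `cwp` to a physical slot.

    Assumes SAVE decrements CWP, so a callee's ins (r24..r31)
    alias its caller's outs (r8..r15). The slot numbering is hypothetical.
    """
    if not 0 <= reg < 32:
        raise ValueError("register number out of range")
    if reg < GLOBALS:
        return reg  # globals are not windowed
    ring = UNIQUE_PER_WINDOW * nwindows
    return GLOBALS + (cwp * UNIQUE_PER_WINDOW + (reg - GLOBALS)) % ring


if __name__ == "__main__":
    for n in (2, 7, 8, 32):
        print(n, "windows ->", physical_register_count(n), "registers")
    # Caller in window 3 writes %o0 (r8); callee in window 2 reads %i0 (r24).
    print(physical_index(3, 8, 8) == physical_index(2, 24, 8))
```

Each window sees 8 + 24 = 32 register names, but only 16 of the windowed ones are new per window, which is where the paper's 32 and the 24/16 figures both come from.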

Re: Automatic register spill / restore?

<87087f76-7725-4f1b-baff-7e026b182555n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=26351&group=comp.arch#26351

X-Received: by 2002:a05:620a:1467:b0:6af:3f68:6be7 with SMTP id j7-20020a05620a146700b006af3f686be7mr24627807qkl.717.1657051886651;
Tue, 05 Jul 2022 13:11:26 -0700 (PDT)
X-Received: by 2002:ac8:5796:0:b0:31d:47a1:b640 with SMTP id
v22-20020ac85796000000b0031d47a1b640mr12438927qta.618.1657051886479; Tue, 05
Jul 2022 13:11:26 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 5 Jul 2022 13:11:26 -0700 (PDT)
In-Reply-To: <jijhqcFfscrU1@mid.individual.net>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <t9tuvg$1qgu$1@gioia.aioe.org> <OlEwK.24566$8f2.9092@fx38.iad>
<jigrgdF3ft7U1@mid.individual.net> <2022Jul5.095444@mips.complang.tuwien.ac.at>
<jijhqcFfscrU1@mid.individual.net>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <87087f76-7725-4f1b-baff-7e026b182555n@googlegroups.com>
Subject: Re: Automatic register spill / restore?
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 05 Jul 2022 20:11:26 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 30
 by: MitchAlsup - Tue, 5 Jul 2022 20:11 UTC

On Tuesday, July 5, 2022 at 2:30:24 PM UTC-5, Niklas Holsti wrote:
> On 2022-07-05 10:54, Anton Ertl wrote:
> > Niklas Holsti <niklas...@tidorum.invalid> writes:
> >> And who uses static WCET analysis any more :-(
> >
> > I would presume that those applications that needed WCET (worst-case
> > execution time) analysis in the past still need it. What would they
> > use instead?
> They can seldom use /static/ WCET analysis because the execution speed
> of modern processors is too unpredictable, thanks to all the
> acceleration mechanisms that are ever-evolving and poorly documented, not
> to mention multi-cores and their resource-contention problems, not to
> mention out-of-order and speculative processing...
>
> The only pleasure aficionados of static WCET analysis (like myself) have
> nowadays is the schadenfreude we feel when all those acceleration
> mechanisms turn out to be side channels leaking secrets -- Spectre and
> Meltdown etc.
<
Have we finally reached the point that we suspect ANY and EVERY new way
to extract ILP is LIKELY to create side-channels?
<
So, the only sane architecture point is to define the boundary between
architecture (SW model) and microarchitecture (HW model) such that
microarchitectural state is never visible even with a high precision clock.
>
> People who need WCETs instead use "hybrid" methods that combine
> fine-grained execution-time measurements with some static control-flow
> analysis to compute a probabilistic "WCET estimate". Possibly combined
> with randomized HW to motivate their use of "extreme-value statistics"
> for computing the reliability of that probabilistic WCET estimate.
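
The "extreme-value statistics" step in the hybrid methods described above can be sketched roughly as follows. This is only an illustration under stated assumptions: measurements are grouped into blocks, a Gumbel distribution is fitted to the block maxima by the method of moments, and a quantile is read off as the probabilistic WCET estimate. Real tools do considerably more (independence checks, choice of distribution, combination with control-flow analysis), and all function names here are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def block_maxima(samples, block_size):
    """Split measurements into consecutive full blocks and keep each block's maximum."""
    full = len(samples) - len(samples) % block_size
    return [max(samples[i:i + block_size]) for i in range(0, full, block_size)]


def gumbel_fit(maxima):
    """Method-of-moments Gumbel fit; returns (location, scale)."""
    scale = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    location = statistics.fmean(maxima) - EULER_GAMMA * scale
    return location, scale


def pwcet(location, scale, exceedance):
    """Execution time that a block maximum exceeds with probability `exceedance`."""
    # Gumbel quantile at cumulative probability 1 - exceedance.
    return location - scale * math.log(-math.log1p(-exceedance))


if __name__ == "__main__":
    # Made-up cycle counts, standing in for fine-grained measurements.
    times = [100, 98, 97, 102, 99, 101, 101, 100, 96, 105, 99, 98, 103, 97, 100]
    loc, scale = gumbel_fit(block_maxima(times, 3))
    print("pWCET at 1e-3:", pwcet(loc, scale, 1e-3))
    print("pWCET at 1e-9:", pwcet(loc, scale, 1e-9))
```

The estimate is only as good as the assumption that the measured maxima really follow the fitted distribution, which is the point of the randomized-HW argument mentioned above.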

Re: Automatic register spill / restore?

<677a254d-87b9-4bfa-87ce-3e96db396025n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=26352&group=comp.arch#26352

X-Received: by 2002:ac8:4e51:0:b0:31b:efe0:aa24 with SMTP id e17-20020ac84e51000000b0031befe0aa24mr29386546qtw.635.1657051983807;
Tue, 05 Jul 2022 13:13:03 -0700 (PDT)
X-Received: by 2002:a05:6214:2387:b0:462:1026:b5bb with SMTP id
fw7-20020a056214238700b004621026b5bbmr32543333qvb.38.1657051983672; Tue, 05
Jul 2022 13:13:03 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 5 Jul 2022 13:13:03 -0700 (PDT)
In-Reply-To: <jijio2Fg0ofU1@mid.individual.net>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <t9tuvg$1qgu$1@gioia.aioe.org> <OlEwK.24566$8f2.9092@fx38.iad>
<jigrgdF3ft7U1@mid.individual.net> <SOZwK.281950$ssF.145632@fx14.iad>
<jijg97Ffk0hU1@mid.individual.net> <jijio2Fg0ofU1@mid.individual.net>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <677a254d-87b9-4bfa-87ce-3e96db396025n@googlegroups.com>
Subject: Re: Automatic register spill / restore?
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 05 Jul 2022 20:13:03 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 16
 by: MitchAlsup - Tue, 5 Jul 2022 20:13 UTC

On Tuesday, July 5, 2022 at 2:46:13 PM UTC-5, Niklas Holsti wrote:
> Replying to myself:
> On 2022-07-05 22:04, Niklas Holsti wrote:
>
> > A modern OoO processor may have hundreds of working registers, not
> > architecturally visible to the programmer. I don't know how those are
> > handled on thread switch, but it is probably complicated.
<
> On second thought, the invisible working registers should be handled
> fully by the OoO HW if the thread switch is implemented by normal code
> that only manipulates the architecturally visible registers. So the
> SPARC register windows do complicate thread switching more than these
> invisible working registers do.
<
What happens when context switches are not performed by running
instructions, but by the arrival of a "context switch" message, and HW
does everything wrt saving old state and absorbing new state?

Re: Automatic register spill / restore?

<4cf362ef-a2d6-44d7-90a9-8c0b2a0e6777n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=26353&group=comp.arch#26353

X-Received: by 2002:ac8:5713:0:b0:31a:c706:50ef with SMTP id 19-20020ac85713000000b0031ac70650efmr30757037qtw.267.1657054145033;
Tue, 05 Jul 2022 13:49:05 -0700 (PDT)
X-Received: by 2002:a05:6214:2308:b0:432:e69f:5d71 with SMTP id
gc8-20020a056214230800b00432e69f5d71mr33384416qvb.19.1657054144781; Tue, 05
Jul 2022 13:49:04 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 5 Jul 2022 13:49:04 -0700 (PDT)
In-Reply-To: <dad7bdde-2b36-40fd-8b48-74448f32c158n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:65dd:bb7:48b5:b678;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:65dd:bb7:48b5:b678
References: <t9tuvg$1qgu$1@gioia.aioe.org> <OlEwK.24566$8f2.9092@fx38.iad>
<jigrgdF3ft7U1@mid.individual.net> <SOZwK.281950$ssF.145632@fx14.iad>
<jijg97Ffk0hU1@mid.individual.net> <dad7bdde-2b36-40fd-8b48-74448f32c158n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <4cf362ef-a2d6-44d7-90a9-8c0b2a0e6777n@googlegroups.com>
Subject: Re: Automatic register spill / restore?
From: already5...@yahoo.com (Michael S)
Injection-Date: Tue, 05 Jul 2022 20:49:05 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 9866
 by: Michael S - Tue, 5 Jul 2022 20:49 UTC

On Tuesday, July 5, 2022 at 11:07:33 PM UTC+3, MitchAlsup wrote:
> On Tuesday, July 5, 2022 at 2:04:11 PM UTC-5, Niklas Holsti wrote:
> > On 2022-07-05 19:38, EricP wrote:
> > > Niklas Holsti wrote:
> <snip>
> > >
> > > Right, the errors were simulated but as I read it they contend the
> > > fragility is due to the register windows and the way they were manipulated.
> > But they also say: "Injections into the system registers indicate that
> > each of the three processors (the Pentium 4, UltraSPARC IIIi, and
> > POWER5) have two to three critical registers that are very sensitive to
> > errors (see Figure 9 through Figure 11), those being IDTR, GDTR and CR4
> > in the Pentium 4; TBA and SET_SOFTINT in the UltraSPARCIIIi; and SPRG1,
> > IAR, and MSR in the POWER5."
> <
> I don't know of any processors that can usefully withstand someone coming
> along and randomly flipping bits in PSR, Root Pointers, Page table structures,
> and other control registers. SPARC should be little different.
> >
> > I still don't think that this paper shows an intrinsic reliability flaw
> > in the register-window idea, at least not one that cannot be compensated
> > by increasing the robustness of the critical register HW.
> <
> Agreed.
> <
> >
> > However, I see now that UltraSPARC IIIi, used in the paper, uses the
> > SPARC V9 (or JPS1) architecture, while my experience is with the earlier
> > SPARC versions V7 and V8. I think the register-window mechanism is the
> > same in V9, but the paper's description of the mechanism disagrees in
> > some details with my understanding -- for example, the paper says that
> > each window has 32 registers, not 24 (visible) or 16 (owned by the window).
> <
> SPARC V9 added some new functionality of OS over register windows that
> was supposed to make some of the background overhead go away.
> <
> > >> SPARC processors used in space applications are radiation-tolerant and
> > >> use triple modular redundancy for critical registers.
> > >
> > > I can't find the references just now but I have seen it
> > > discussed in the past so I'll see what I can dig up.
> > >
> > > IIRC the primary issue was the phase delay in window save/restore
> > > was opaque so code could only know for sure that memory was up to
> > > date by explicit flushing through OS traps -
> > > something akin to manual cache coherence.
> <
> > All the occupied windows in the register ring have to be stored into
> > memory (into the areas reserved for this in each stack frame) when a
> > thread is suspended, and at least one windowful has to be reloaded when
> > the thread is resumed. This is in addition to storing and reloading the
> > 8 "global" registers and any other thread-specific context.
> <
> Imagine passing a pointer to the argument which arrives in a register window.
> >
> > A debugger may also prefer to store the ring into memory so that it can
> > access the saved SP, FP, and register-allocated locals of upper-level
> > calls. But at least in SPARC V7 and V8 that is all deterministic and
> > program-controlled. AFAIK the SPARC processors do not autonomously store
> > register-ring contents into memory in the background, it has to be
> > programmed explicitly.
> >
> > A modern OoO processor may have hundreds of working registers, not
> > architecturally visible to the programmer. I don't know how those are
> > handled on thread switch, but it is probably complicated.
> <
> But those OoO registers are in use, whereas the invisible windows of SPARC
> cannot be "in use".
> <
> > > I would be paranoid about things like general OS calls, exceptions,
> > > interrupts, user mode task switches not seeing a coherent view of memory,
> > > and would flush the pending register window on entry to these
> > > until each can be proved safe, on a case-by-case basis.
> <
> When we shut down a company, we had a server (SPARC V8) that had been up
> for 7 years 4 months and several days. It had every one of its CPU modules
> replaced, every one of its memory modules replaced, and most of the disk
> farm replaced; yet the system was never out of service or "down". Thus, I don't
> see how register windows did those kinds of systems "any harm".
> <
> > Traps and interrupts work in the same way as subroutine calls.
> <
> I have to fundamentally disagree with this statement. Subroutine calls know
> who they are transferring control to, and that it is within its own address space.
> Traps can be made to smell like subroutine calls, but are spurious in nature.
> Interrupts are not like subroutine calls, because they are entirely asynchronous
> with the thread being interrupted. You can make them smell like unexpected
> subroutine calls if you desire.
> <
> > OS calls
> > are typically done by triggering a SW trap. Exceptions may have to
> > unwind the stack which involves both RESTORE instructions (as in normal
> > returns from calls) and reloading register windows that had been pushed
> > out of the register ring. Some work, but not horribly complex.
> <
> > > One issue is even for in-order implementations it can require a large
> > > number of hardware registers but programmers can only use a small number,
> > > similar to costs in bank-switched register sets like Arm32.
> <
> > But the same issue arises in OoO systems without register windows,
> > right? Hundreds of working registers, invisible to the programmer.
> <
> But in OoO design, one may have hundreds of physical registers, but over the
> span of 200-ish clocks, every one of them can be used, without ever crossing
> a subroutine boundary. Yes, the access time will be similar due to size/area/ports,
> but the registers are in use, whereas in a register window, they cannot be in use.
> <
> To build an OoO SPARC one would need the large window of the OoO and then
> add to that the excesses of the register window. Worst of all cases (except
> Itanic-like rotating registers...).
> <
> > > H&P note of Sparc:
> > > - Given that each window has 16 unique registers, an implementation of
> > > SPARC can have as few as 40 physical registers and as many as 520,
> > > although most have 128 to 136.
> > > ...
> > > The danger of register windows is that the larger number of
> > > registers could slow down the clock rate."
> <
> Whereas MIPS got to high frequencies rather easily, SPARC never did.

Actually, it did, eventually.
Fujitsu got to 4250 MHz, Oracle to 5000 MHz.
MIPS never reached quite that high. But, then again, it didn't last that long and even at the peak of its power had a much lower development budget.

> Read into that what you will.

> <
> > I suppose the OoO processors with large numbers of registers should have
> > the same clock-rate risk, but perhaps the OoO parallelism compensates.
> <
> You can "pipeline" the register file access time away. What you cannot pipeline
> away is the trap handling complexity.
> <
> > > One compiler writer notes:
> > > - "The [Sparc] architecture requires a designer to implement 120 registers,
> > > of which only 24 to 29 are available for use by the compiler writer."
> > Same issue for OoO machines.
> > > - "Setjmp is an exceptional case for saving register windows.
> > > Because setjmp saves processor state, it is necessary for it to
> > > force the hidden register state to the stack and to save the current
> > > state into the jump buffer. ...
> > > This makes setjmp an exceptionally slow operation."
> > Hm. That may be necessary for some weird uses of setjmp/longjmp, such as
> > implementing some cheapo threading or coroutining services, but I don't
> > think it is needed for the more normal use of setjmp/longjmp as an
> > exception-handling mechanism.
> > > - some comments in comp.compilers
> > > https://compilers.iecc.com/comparch/article/94-02-130
> > > https://compilers.iecc.com/comparch/article/94-02-134
> > Those discussions of setjmp/longjmp implementations reveal some
> > complexity, but it certainly works on SPARCs, and IMO it can't be seen
> > as a major draw-back. But I may be prejudiced because my applications
> > use exceptions either never (embedded SW) or rarely (other SW).
> <
> So, while it is not a MAJOR DrawBack, it is onerous enough that the
> designer should proceed with great caution.


Re: Automatic register spill / restore?

<Lc2xK.353316$vAW9.29282@fx10.iad>


https://www.novabbs.com/devel/article-flat.php?id=26354&group=comp.arch#26354

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx10.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Automatic register spill / restore?
References: <t9tuvg$1qgu$1@gioia.aioe.org> <OlEwK.24566$8f2.9092@fx38.iad> <jigrgdF3ft7U1@mid.individual.net> <SOZwK.281950$ssF.145632@fx14.iad> <jijg97Ffk0hU1@mid.individual.net> <jijio2Fg0ofU1@mid.individual.net>
In-Reply-To: <jijio2Fg0ofU1@mid.individual.net>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 48
Message-ID: <Lc2xK.353316$vAW9.29282@fx10.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Tue, 05 Jul 2022 21:39:55 UTC
Date: Tue, 05 Jul 2022 17:38:30 -0400
X-Received-Bytes: 3028
 by: EricP - Tue, 5 Jul 2022 21:38 UTC

Niklas Holsti wrote:
> Replying to myself:
>
> On 2022-07-05 22:04, Niklas Holsti wrote:
>
>> A modern OoO processor may have hundreds of working registers, not
>> architecturally visible to the programmer. I don't know how those are
>> handled on thread switch, but it is probably complicated.
>
>
> On second thought, the invisible working registers should be handled
> fully by the OoO HW if the thread switch is implemented by normal code
> that only manipulates the architecturally visible registers. So the
> SPARC register windows do complicate thread switching more than these
> invisible working registers do.

Yes, as Mitch also points out, these registers (mostly) contain
currently valid data - it's just that their 5-bit architecture
register names are prefixed by the 5-bit Window Pointer that was
current when they were written. But user mode doesn't have access
to the full 10-bit architecture register numbers.

What an OoO implementation might do is free up the physical registers
after an older window was spilled, allowing them to be reused for
in-flight instructions. But it would still need to have enough
physical registers for the worst case window save sets,
plus a minimum to issue new instructions.
So that is about 32*16+16 = 528 + say 200 for in-flight = 728.

But I would want a fully automatic spill/fill mechanism to go with this
so when Renamer, which assigns free registers, sees the free list is too
small it can trigger a spill and get back a block of 16 free physicals.

The register window is a circular buffer, with 6-bit head and tail
circular indexes (5 bits index plus a wrap bit), Valid and Modify bits
for each set, and a LDM/STM mechanism attached to automatically
spill/fill sets without trapping to kernel mode.
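
The head/tail scheme above (5 index bits plus a wrap bit) is the usual way to tell a full ring from an empty one. A minimal sketch, with the class and method names invented for illustration, and the Valid/Modify bits and the LDM/STM traffic itself left out:

```python
INDEX_BITS = 5
SETS = 1 << INDEX_BITS                  # 32 window save sets
PTR_MASK = (1 << (INDEX_BITS + 1)) - 1  # 6-bit pointer: wrap bit + 5-bit index


class WindowRing:
    """Circular buffer of window sets addressed by 6-bit head/tail pointers."""

    def __init__(self):
        self.head = 0  # next set to allocate (on a call)
        self.tail = 0  # oldest resident set (next one to spill)

    def occupancy(self):
        return (self.head - self.tail) & PTR_MASK

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        # Same 5-bit index, different wrap bit.
        return self.occupancy() == SETS

    def allocate(self):
        """Claim a set for a new call and return its 5-bit index."""
        if self.is_full():
            raise RuntimeError("ring full: spill the oldest set first")
        index = self.head & (SETS - 1)
        self.head = (self.head + 1) & PTR_MASK
        return index

    def release(self):
        """Give back the newest set (on a return)."""
        if self.is_empty():
            raise RuntimeError("ring empty: fill a set from memory first")
        self.head = (self.head - 1) & PTR_MASK

    def spill_oldest(self):
        """Retire the oldest set (its contents would be stored to the stack)."""
        if self.is_empty():
            raise RuntimeError("nothing to spill")
        index = self.tail & (SETS - 1)
        self.tail = (self.tail + 1) & PTR_MASK
        return index
```

Without the wrap bit, head == tail would be ambiguous between "no sets resident" and "all 32 resident", which is exactly the case an automatic spill trigger has to detect.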

The large number of effective architecture registers could be a problem.
Using SRAM style Rename tables would require 10-bit arch reg id index,
1024 entries of say 10 bits each for the 728 physical registers,
to hold the architecture-to-physical map.
That's a pretty big table to checkpoint for each conditional branch.
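
For what it is worth, the sizing numbers above can be reproduced with a few lines. This is just the arithmetic from the post wrapped in a function; the parameter names are made up, and the `extra` term is simply the "+16" from the estimate above.

```python
def window_ooo_sizing(windows=32, regs_per_window=16, extra=16, in_flight=200):
    """Rough register and rename-table sizing for an OoO core with register windows."""
    window_regs = windows * regs_per_window + extra   # worst-case window save sets
    physical_regs = window_regs + in_flight           # plus room to issue new work
    phys_id_bits = (physical_regs - 1).bit_length()   # bits to name a physical register
    arch_id_bits = (windows - 1).bit_length() + 5     # window pointer + 5-bit reg name
    rename_entries = 1 << arch_id_bits
    return {
        "window_regs": window_regs,
        "physical_regs": physical_regs,
        "phys_id_bits": phys_id_bits,
        "arch_id_bits": arch_id_bits,
        "rename_entries": rename_entries,
        "rename_table_bits": rename_entries * phys_id_bits,
    }


if __name__ == "__main__":
    for name, value in window_ooo_sizing().items():
        print(f"{name:18} {value}")
```

With the defaults this gives 1024 entries of 10 bits, so every branch checkpoint copies on the order of 10 Kbit of map state unless some cheaper scheme is used.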

OoO register windows get expensive real fast.
That probably explains their scarcity in the wild.

Re: Automatic register spill / restore?

<ta2jr8$1cgn$1@gioia.aioe.org>


https://www.novabbs.com/devel/article-flat.php?id=26358&group=comp.arch#26358

Path: i2pn2.org!i2pn.org!aioe.org!vxOTfS5Jn/b499FNW7Y8DA.user.46.165.242.75.POSTED!not-for-mail
From: nos...@nowhere.com (Andy)
Newsgroups: comp.arch
Subject: Re: Automatic register spill / restore?
Date: Wed, 6 Jul 2022 12:09:10 +1200
Organization: Aioe.org NNTP Server
Message-ID: <ta2jr8$1cgn$1@gioia.aioe.org>
References: <t9tuvg$1qgu$1@gioia.aioe.org>
<bbea1a0c-c744-4cfe-86c0-d9d1281c829en@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="45591"; posting-host="vxOTfS5Jn/b499FNW7Y8DA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.9.1
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-US
 by: Andy - Wed, 6 Jul 2022 00:09 UTC

On 5/07/22 05:56, MitchAlsup wrote:

> IBM 360 series::
> caller allocated a register save area and always kept it in R13
> upon any arrival (Callee, interruptee) would perform STM 14,12,12(13)
> which would dump registers R14 through R12 in the save area, as the first step
> in properly receiving control.
> <
> VAX did something similar, but performed it in the CALL side of ISA
> processing.
>>
>> I wonder, that if they were real, is it possible given today's
>> transistor core counts those mechanisms could be revived in modern clean
>> sheet cpu designs?, to at least help alleviate some of the issues we see
>> in the current state of the art perhaps?
> <
> Effectively that is what ENTER and EXIT do in My 66000 architecture.
>>
Heh, okay then, problem solved I guess. (I really should get around to
emailing you for a copy of the My 66000 manual, but I've still got a ton
of PDFs littering my desktop at the moment)

However on the other hand I'm still wondering if a fully transparent
mechanism existed/is possible.
IE a program JSRs to a subroutine and just uses the registers it needs
without any special instructions or calling conventions needed.
When it hits the return instruction all the registers it disturbed are
restored to their former values automatically (except for the nominated
resultant register/s).
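
To pin down the semantics being asked about, here is a toy software model, purely hypothetical and not a description of any real machine: each call opens a log, the first write to a register inside the call records its old value, and the return puts every logged register back except the nominated result register(s).

```python
class TransparentRegs:
    """Toy register file with automatic restore-on-return (illustrative only)."""

    def __init__(self, nregs=32, result_regs=(1,)):
        self.regs = [0] * nregs
        self.result_regs = set(result_regs)  # registers allowed to carry results out
        self.frames = []                     # one {reg: old value} log per active call

    def call(self):
        self.frames.append({})

    def write(self, reg, value):
        # Save the old value on the first write within the current call only.
        if self.frames and reg not in self.frames[-1]:
            self.frames[-1][reg] = self.regs[reg]
        self.regs[reg] = value

    def ret(self):
        log = self.frames.pop()
        for reg, old in log.items():
            if reg not in self.result_regs:
                self.regs[reg] = old


if __name__ == "__main__":
    m = TransparentRegs()
    m.write(5, 111)       # caller's live value
    m.call()
    m.write(5, 999)       # callee clobbers r5
    m.write(1, 42)        # callee's result
    m.ret()
    print(m.regs[5], m.regs[1])
```

In hardware the log would have to live somewhere (shadow registers or the stack), so the cost does not vanish; it only moves from instructions to the machine.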

>> Or were there some really big downsides to the automagic things that I'm
>> not remembering well enough?
> <
> Automagic was so slow in VAX that the more modern compilers used JSR
> instead of CALL and got rid of ½ of the cycles in calling/returning.

So hopefully there's enough transistors to go around these days that
such a thing could be made faster than programmed spill/fill maybe?

Re: Automatic register spill / restore?

<ef9984fa-00f5-4d19-a7ec-f87872886fc8n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=26359&group=comp.arch#26359

X-Received: by 2002:a37:634b:0:b0:6b4:6de9:ae4b with SMTP id x72-20020a37634b000000b006b46de9ae4bmr7124731qkb.293.1657068272725;
Tue, 05 Jul 2022 17:44:32 -0700 (PDT)
X-Received: by 2002:a05:620a:2894:b0:6ae:ba71:ea98 with SMTP id
j20-20020a05620a289400b006aeba71ea98mr24984822qkp.89.1657068272586; Tue, 05
Jul 2022 17:44:32 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 5 Jul 2022 17:44:32 -0700 (PDT)
In-Reply-To: <ta2jr8$1cgn$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <t9tuvg$1qgu$1@gioia.aioe.org> <bbea1a0c-c744-4cfe-86c0-d9d1281c829en@googlegroups.com>
<ta2jr8$1cgn$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ef9984fa-00f5-4d19-a7ec-f87872886fc8n@googlegroups.com>
Subject: Re: Automatic register spill / restore?
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 06 Jul 2022 00:44:32 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 87
 by: MitchAlsup - Wed, 6 Jul 2022 00:44 UTC

On Tuesday, July 5, 2022 at 7:09:18 PM UTC-5, Andy wrote:
> On 5/07/22 05:56, MitchAlsup wrote:
>
> > IBM 360 series::
> > caller allocated a register save area and always kept it in R13
> > upon any arrival (Callee, interruptee) would perform STM 14,12,12(13)
> > which would dump registers R14 through R12 in the save area, as the first step
> > in properly receiving control.
> > <
> > VAX did something similar, but performed it in the CALL side of ISA
> > processing.
> >>
> >> I wonder, that if they were real, is it possible given today's
> >> transistor core counts those mechanisms could be revived in modern clean
> >> sheet cpu designs?, to at least help alleviate some of the issues we see
> >> in the current state of the art perhaps?
> > <
> > Effectively that is what ENTER and EXIT do in My 66000 architecture.
> >>
> Heh, okay then, problem solved I guess. (I really should get around to
> emailing you for a copy of the My 66000 manual, but I've still got a ton
> of PDFs littering my desktop at the moment)
>
> However on the other hand I'm still wondering if a fully transparent
> mechanism existed/is possible.
<
> IE a program JSRs to a subroutine and just uses the registers it needs
> without any special instructions or calling conventions needed.
<
Effectively, Mill does this without adding instructions.
My 66000 does this with instructions.
<
On the other hand, up to ½ of all subroutine calls are to leaf subroutines
and in My 66000 ISA, if a subroutine does not touch R16..R31 it neither
has to nor expends any effort to save/restore registers. In my opinion
this is where you want the overhead to be minimal--don't do stuff that
does not need doing. For such subroutines the overhead is:
<
a) setting up arguments to the yet to be called subroutine
b) calling the subroutine
c) performing the subroutine
d) RET
e) doing something with the return value.
<
Any and all subroutines that can be performed using R0..R15 and do not
need any Local_data_area on the stack are overhead free.
<
{You can't get rid of (a) and (e) without inlining the subroutine.
And you can't get rid of (c), leaving only the 2 transfers of control}
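
The leaf-subroutine rule above can be sketched as a small check. The only fact taken from the post is that R16..R31 are the registers a subroutine must preserve; the function names are hypothetical and nothing here is taken from the My 66000 manual.

```python
PRESERVED = range(16, 32)  # R16..R31, per the description above


def registers_to_save(touched):
    """Preserved registers a subroutine must save/restore, given the registers it writes."""
    return sorted(r for r in set(touched) if r in PRESERVED)


def is_overhead_free(touched, needs_local_data=False):
    """True when no register save/restore and no stack frame are needed."""
    return not registers_to_save(touched) and not needs_local_data


if __name__ == "__main__":
    print(registers_to_save([1, 2, 3]))         # leaf using only low registers
    print(registers_to_save([3, 17, 16, 17]))   # touches two preserved registers
    print(is_overhead_free([0, 15]))
```

A compiler would derive `touched` from register allocation, so the save set is known statically and costs nothing for the common leaf case.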
<
> When it hits the return instruction all the registers it disturbed are
> restored to their former values automatically (except for the nominated
> resultant register/s).
<
You are making the assumption that values in registers cost more
to recalculate than to save and restore--which is often untrue. It is
often the case that one can recalculate a value in one cycle where
saving and restoring might cost up to 6 cycles. Then there are
registers the compiler KNOWS are not holding a value that is still
alive. Allowing these to die at certain control transfer points saves
overhead.
<
> >> Or were there some really big downsides to the automagic things that I'm
> >> not remembering well enough?
> > <
> > Automagic was so slow in VAX that the more modern compilers used JSR
> > instead of CALL and got rid of ½ of the cycles in calling/returning.
<
> So hopefully there's enough transistors to go around these days that
> such a thing could be made faster than programmed spill/fill maybe?
<
I am betting in that direction with ENTER and EXIT, and zero-instruction
context switching. With our current transistor budgets, one should be able
to deliver/receive at least 4-8 registers per cycle to/from the stack. This
saves 3-7 AGENs and consumes no more than ¼ cache line per cycle.
It is hard to imagine a series of LDs and STs that would save this much
power, cycles, accesses,... Also with ENTER and EXIT one can put preserved
registers on a stack where they cannot be destroyed and guarantee that
the ABI is preserved in both directions across the CALL boundary.
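
One way to read the "saves 3-7 AGENs" figure above: individual LDs or STs need one address generation per register, while a multi-register transfer needs one per group moved in a cycle. A small sketch under that assumption (the function names are invented, and this is an interpretation, not a statement about the actual implementation):

```python
def agens_saved_per_cycle(registers_per_cycle):
    """Address generations avoided per cycle versus one LD/ST per register."""
    return registers_per_cycle - 1


def cycles_to_move(nregs, registers_per_cycle):
    """Cycles to save or restore `nregs` registers at the given width."""
    return -(-nregs // registers_per_cycle)  # ceiling division


if __name__ == "__main__":
    for width in (4, 8):
        print(width, "regs/cycle:",
              agens_saved_per_cycle(width), "AGENs saved,",
              cycles_to_move(16, width), "cycles for 16 registers")
```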

Re: Automatic register spill / restore?

<ta2mnm$3r88k$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=26360&group=comp.arch#26360

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Automatic register spill / restore?
Date: Tue, 5 Jul 2022 17:58:29 -0700
Organization: A noiseless patient Spider
Lines: 91
Message-ID: <ta2mnm$3r88k$1@dont-email.me>
References: <t9tuvg$1qgu$1@gioia.aioe.org>
<bbea1a0c-c744-4cfe-86c0-d9d1281c829en@googlegroups.com>
<ta2jr8$1cgn$1@gioia.aioe.org>
<ef9984fa-00f5-4d19-a7ec-f87872886fc8n@googlegroups.com>
In-Reply-To: <ef9984fa-00f5-4d19-a7ec-f87872886fc8n@googlegroups.com>
 by: Ivan Godard - Wed, 6 Jul 2022 00:58 UTC

On 7/5/2022 5:44 PM, MitchAlsup wrote:
> On Tuesday, July 5, 2022 at 7:09:18 PM UTC-5, Andy wrote:
>> On 5/07/22 05:56, MitchAlsup wrote:
>>
>>> IBM 360 series::
>>> caller allocated a register save area and always kept it in R13
>>> upon any arrival (Callee, interruptee) would perform STM 12,12(13)
>>> which would dump all 16 registers in the save area, as the first step
>>> in properly receiving control.
>>> <
>>> VAX did something similar, but performed it in the CALL side of ISA
>>> processing.
>>>>
>>>> I wonder, that if they were real, is it possible given today's
>>>> transistor core counts those mechanisms could be revived in modern clean
>>>> sheet cpu designs?, to at least help alleviate some of the issues we see
>>>> in the current state of the art perhaps?
>>> <
>>> Effectively that is what ENTER and EXIT do in My 66000 architecture.
>>>>
>> Heh, okay then, problem solved I guess. (I really should get around to
>> emailing you for a copy of the My 66000 manual, but I've still got a ton
>> of PDFs littering my desktop at the moment)
>>
>> However on the other hand I'm still wondering if a fully transparent
>> mechanism existed/is possible.
> <
>> IE a program JSRs to a subroutine and just uses the registers it needs
>> without any special instructions or calling conventions needed.
> <
> Effectively, Mill does this without adding instructions.
> My 66000 does this with instructions.
> <
> On the other hand, up to ½ of all subroutine calls are to leaf subroutines
> and in My 66000 ISA, if a subroutine does not touch R16..R31 it neither
> has to nor expends any effort to save/restore registers. In my opinion
> this is where you want the overhead to be minimal--don't do stuff that
> does not need doing. For such subroutines the overhead is:
> <
> a) setting up arguments to the yet to be called subroutine
> b) calling the subroutine
> c) performing the subroutine
> d) RET
> e) doing something with the return value.
> <
> Any and all subroutines that can be performed using R0..R15 and do not
> need any Local_data_area on the stack are overhead free.
> <
> {You can't get rid of (a) and (e) without inlining the subroutine.
> And you can't get rid of (c), leaving only the 2 transfers of control}
> <
>> When it hits the return instruction all the registers it disturbed are
>> restored to their former values automatically (except for the nominated
>> resultant register/s).
> <
> You are making the assumption that values in registers cost more
> to recalculate than to save and restore--which is often untrue. It is
> often the case that one can recalculate a value in one cycle where
> saving and restoring might cost up to 6 cycles. Then there are
> registers the compiler KNOWS are not holding a value that is still
> alive. Allowing these to die at certain control transfer values, saves
> overhead.
> <
>>>> Or were there some really big downsides to the automagic things that I'm
>>>> not remembering well enough?
>>> <
>>> Automagic was so slow in VAX that the more modern compilers used JSR
>>> instead of CALL and got rid of ½ of the cycles in calling/returning.
> <
>> So hopefully there's enough transistors to go around these days that
>> such a thing could be made faster than programmed spill/fill maybe?
> <
> I am betting in that direction with ENTER and EXIT, and zero-instruction
> context switching. With our current transistor budgets, one should be able
> to deliver/receive at least 4-8 registers per cycle to/from the stack. This
> saves 3-7 AGENs and consumes no more than ¼ cache line per cycle.
> It is hard to imagine a series of LDs and STs that would save this much
> power, cycles, accesses,... Also with ENTER and EXIT one can put preserved
> registers on a stack where they cannot be destroyed and guarantee that
> the ABI is preserved in both directions across the CALL boundary.

Whereas if the HW at the call site knows which registers have live
content then it can silently do the save/restore without instruction -
and without compiler analysis. If the save/restore is inherently lazy
and the callee doesn't use the location then there's no overhead at all.
That's what Mill tries to do.

However, IMO the biggest value of Mill-style implicit save compared to
the my66 (and all others here) explicit save is future-proofing. The two
methods can be made overhead-equivalent with enough work, but I prefer
to keep the sausage-making hidden.

Re: Automatic register spill / restore?

<0b13ecd1-45ac-42bb-998f-b256c05347e9n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=26361&group=comp.arch#26361

Newsgroups: comp.arch
Date: Tue, 5 Jul 2022 19:01:57 -0700 (PDT)
In-Reply-To: <ta2mnm$3r88k$1@dont-email.me>
References: <t9tuvg$1qgu$1@gioia.aioe.org> <bbea1a0c-c744-4cfe-86c0-d9d1281c829en@googlegroups.com>
<ta2jr8$1cgn$1@gioia.aioe.org> <ef9984fa-00f5-4d19-a7ec-f87872886fc8n@googlegroups.com>
<ta2mnm$3r88k$1@dont-email.me>
Message-ID: <0b13ecd1-45ac-42bb-998f-b256c05347e9n@googlegroups.com>
Subject: Re: Automatic register spill / restore?
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Wed, 6 Jul 2022 02:01 UTC

On Tuesday, July 5, 2022 at 7:58:33 PM UTC-5, Ivan Godard wrote:
> On 7/5/2022 5:44 PM, MitchAlsup wrote:
> > On Tuesday, July 5, 2022 at 7:09:18 PM UTC-5, Andy wrote:
> >> On 5/07/22 05:56, MitchAlsup wrote:
> >>
> >>> IBM 360 series::
> >>> caller allocated a register save area and always kept it in R13
> >>> upon any arrival (Callee, interruptee) would perform STM 12,12(13)
> >>> which would dump all 16 registers in the save area, as the first step
> >>> in properly receiving control.
> >>> <
> >>> VAX did something similar, but performed it in the CALL side of ISA
> >>> processing.
> >>>>
> >>>> I wonder, that if they were real, is it possible given today's
> >>>> transistor core counts those mechanisms could be revived in modern clean
> >>>> sheet cpu designs?, to at least help alleviate some of the issues we see
> >>>> in the current state of the art perhaps?
> >>> <
> >>> Effectively that is what ENTER and EXIT do in My 66000 architecture.
> >>>>
> >> Heh, okay then, problem solved I guess. (I really should get around to
> >> emailing you for a copy of the My 66000 manual, but I've still got a ton
> >> of PDFs littering my desktop at the moment)
> >>
> >> However on the other hand I'm still wondering if a fully transparent
> >> mechanism existed/is possible.
> > <
> >> IE a program JSRs to a subroutine and just uses the registers it needs
> >> without any special instructions or calling conventions needed.
> > <
> > Effectively, Mill does this without adding instructions.
> > My 66000 does this with instructions.
> > <
> > On the other hand, up to ½ of all subroutine calls are to leaf subroutines
> > and in My 66000 ISA, if a subroutine does not touch R16..R31 it neither
> > has to nor expends any effort to save/restore registers. In my opinion
> > this is where you want the overhead to be minimal--don't do stuff that
> > does not need doing. For such subroutines the overhead is:
> > <
> > a) setting up arguments to the yet to be called subroutine
> > b) calling the subroutine
> > c) performing the subroutine
> > d) RET
> > e) doing something with the return value.
> > <
> > Any and all subroutines that can be performed using R0..R15 and do not
> > need any Local_data_area on the stack are overhead free.
> > <
> > {You can't get rid of (a) and (e) without inlining the subroutine.
> > And you can't get rid of (c), leaving only the 2 transfers of control}
> > <
> >> When it hits the return instruction all the registers it disturbed are
> >> restored to their former values automatically (except for the nominated
> >> resultant register/s).
> > <
> > You are making the assumption that values in registers cost more
> > to recalculate than to save and restore--which is often untrue. It is
> > often the case that one can recalculate a value in one cycle where
> > saving and restoring might cost up to 6 cycles. Then there are
> > registers the compiler KNOWS are not holding a value that is still
> > alive. Allowing these to die at certain control transfer values, saves
> > overhead.
> > <
> >>>> Or were there some really big downsides to the automagic things that I'm
> >>>> not remembering well enough?
> >>> <
> >>> Automagic was so slow in VAX that the more modern compilers used JSR
> >>> instead of CALL and got rid of ½ of the cycles in calling/returning.
> > <
> >> So hopefully there's enough transistors to go around these days that
> >> such a thing could be made faster than programmed spill/fill maybe?
> > <
> > I am betting in that direction with ENTER and EXIT, and zero-instruction
> > context switching. With our current transistor budgets, one should be able
> > to deliver/receive at least 4-8 registers per cycle to/from the stack. This
> > saves 3-7 AGENs and consumes no more than ¼ cache line per cycle.
> > It is hard to imagine a series of LDs and STs that would save this much
> > power, cycles, accesses,... Also with ENTER and EXIT one can put preserved
> > registers on a stack where they cannot be destroyed and guarantee that
> > the ABI is preserved in both directions across the CALL boundary.
<
> Whereas if the HW at the call site knows which registers have live
> content then it can silently do the save/restore without instruction -
> and without compiler analysis. If the save/restore is inherently lazy
> and the callee doesn't use the location then there's no overhead at all.
> That's what Mill tries to do.
<
It would be easy enough to have the ENTER instruction just associate
a stack address with the registers annotated, and then only move them
to stack if they get written; i.e., lazily. Then the EXIT instruction can
restore only those modified. However, this can cause "sequencing"
issues as the registers are not 'dense' whereas they were dense in
the encoding. I prefer not to have too many sequencing issues.
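
The lazy scheme described above can be modelled in a few lines. This is a toy
software sketch, not how any real ENTER/EXIT is implemented: the class
LazyFrame, its methods, the 8-byte slot size and the register values are all
assumptions made for illustration. ENTER only associates a stack slot with each
register in a dense range, the first write to a covered register spills its old
value, and EXIT restores only the registers that were spilled.

```python
class LazyFrame:
    """Toy model of a lazy ENTER/EXIT; every name here is hypothetical."""

    def __init__(self, regs, first, last, stack_base):
        self.regs = regs  # register file, a list of values
        # ENTER: associate a stack address with each register in the dense
        # range first..last, but do not move anything yet.
        self.slot = {r: stack_base + 8 * (r - first)
                     for r in range(first, last + 1)}
        self.stack = {}   # address -> saved value, filled lazily

    def write(self, r, value):
        # The first write to a covered register spills its old value.
        addr = self.slot.get(r)
        if addr is not None and addr not in self.stack:
            self.stack[addr] = self.regs[r]
        self.regs[r] = value

    def exit(self):
        # EXIT: restore only the registers that were actually spilled.
        restored = sorted(r for r, a in self.slot.items() if a in self.stack)
        for r in restored:
            self.regs[r] = self.stack[self.slot[r]]
        return restored


regs = list(range(100, 132))         # R0..R31 hold 100..131
frame = LazyFrame(regs, 16, 23, stack_base=0x1000)
frame.write(17, -1)                  # covered: old value spilled first
frame.write(22, -2)                  # covered: old value spilled first
frame.write(3, -3)                   # not covered: never saved
restored = frame.exit()
print(restored)                      # [17, 22]
```

The range given to ENTER (R16..R23) is dense, but the set that EXIT has to
restore (R17 and R22) is not, which is one way to picture the sequencing issue
mentioned above.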
>
> However, IMO the biggest value of Mill-style implicit save compared to
> the my66 (and all others here) explicit save is future-proofing. The two
> methods can be made overhead-equivalent with enough work, but I prefer
> to keep the sausage-making hidden.
