comp.sys.unisys: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

Subject / Author
* Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Lewis Cole
+* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Scott Lurndal
|`* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Stephen Fuld
| `* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Scott Lurndal
|  +- Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Scott Lurndal
|  `- Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Stephen Fuld
`* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Stephen Fuld
 `* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Lewis Cole
  `* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Stephen Fuld
   +* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Scott Lurndal
   |+* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Stephen Fuld
   ||`* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Scott Lurndal
   || `* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Stephen Fuld
   ||  +- Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Scott Lurndal
   ||  `- Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Don Vito Martinelli
   |`* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Lewis Cole
   | `- Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Scott Lurndal
   `* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Lewis Cole
    `* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Stephen Fuld
     +* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Scott Lurndal
     |`- Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Stephen Fuld
     +* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Lewis Cole
     |+- Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Scott Lurndal
     |`- Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Scott Lurndal
     +* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Lewis Cole
     |+- Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Scott Lurndal
     |`* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Stephen Fuld
     | `* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  David W Schroth
     |  +* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Lewis Cole
     |  |`- Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  David W Schroth
     |  `* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Stephen Fuld
     |   `* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  David W Schroth
     |    +* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Scott Lurndal
     |    |`* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  David W Schroth
     |    | `- Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Stephen Fuld
     |    `- Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Stephen Fuld
     `* Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Lewis Cole
      `- Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...  Scott Lurndal

Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<08eedc84-339f-47d8-a2c1-65a076a53ed7n@googlegroups.com>

https://www.novabbs.com/computers/article-flat.php?id=386&group=comp.sys.unisys#386

 by: Lewis Cole - Mon, 14 Aug 2023 05:18 UTC

So just for giggles, I've been thinking more about what (if anything) can be done to improve system performance by tweaks to a CPU's cache architecture.
I've always thought that having a separate cache for supervisor mode references and user mode references _SHOULD_ make things faster, but when I poked around old stuff on the Internet about caches from the beginning of time, I found that while Once Upon A Time, separate supervisor mode and user mode caches were considered something to try, they were apparently abandoned because a unified cache seemed to work better in simulations. Surprise, surprise.

This seems just so odd to me and so I've been wondering how much this result is an artifact of the toy OS that was used in the simulations (Unix) or the (by today's standards) small, single-layer caches used.
This got me to thinking about the 1110 AKA 1100/40 which had no caches but did have two different types of memory with different access speeds.
(I've always thought of the 1110 AKA 1100/40 as such an ugly machine that I've always ignored it and therefore remained ignorant of it even when I worked for the Company.)
To the extent that the faster (but smaller) memory could be viewed as a "cache" with a 100% hit rate, I've been wondering about how performance differed based on memory placement back then.
Was the Exec (whatever level it might have been ... between 27 and 33?) mostly or wholly loaded into the faster memory?
Was there special code (I think there was) that prioritized placement of certain things in memory, and if so, how?
What sort of performance gains did use of the faster memory produce, or conversely, what sort of performance penalties occurred when it wasn't used?

IOW, anyone care to dig up some old memories about the 1110 AKA 1100/40 you'd like to regale me with? Enquiring minds want to know.

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<KTrCM.59048$m8Ke.33526@fx08.iad>

https://www.novabbs.com/computers/article-flat.php?id=387&group=comp.sys.unisys#387

 by: Scott Lurndal - Mon, 14 Aug 2023 15:38 UTC

Lewis Cole <l_cole@juno.com> writes:
>So just for giggles, I've been thinking more about what (if anything) can be done to improve system performance by tweaks to a CPU's cache architecture.
>I've always thought that having a separate cache for supervisor mode references and user mode references _SHOULD_ make things faster, but when I poked around old stuff on the Internet about caches from the beginning of time, I found that while Once Upon A Time, separate supervisor mode and user mode caches were considered something to try, they were apparently abandoned because a unified cache seemed to work better in simulations. Surprise, surprise.
>
>This seems just so odd to me and so I've been wondering how much this result is an artifact of the toy OS that was used in the simulations (Unix) or th

Toy OS?

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<ubdjto$2d085$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=388&group=comp.sys.unisys#388

 by: Stephen Fuld - Mon, 14 Aug 2023 16:14 UTC

On 8/13/2023 10:18 PM, Lewis Cole wrote:
> So just for giggles, I've been thinking more about what (if anything) can be done to improve system performance by tweaks to a CPU's cache architecture.
> I've always thought that having a separate cache for supervisor mode references and user mode references _SHOULD_ make things faster, but when I poked around old stuff on the Internet about caches from the beginning of time, I found that while Once Upon A Time, separate supervisor mode and user mode caches were considered something to try, they were apparently abandoned because a unified cache seemed to work better in simulations. Surprise, surprise.

Yeah. Only using half the cache at any one time would seem to decrease
performance. :-)

> This seems just so odd to me and so I've been wondering how much this result is an artifact of the toy OS that was used in the simulations (Unix) or the (by today's standards) small, single-layer caches used.
> This got me to thinking about the 1110 AKA 1100/40 which had no caches but did have two different types of memory with different access speeds.
> (I've always thought of the 1110 AKA 1100/40 as such an ugly machine that I've always ignored it and therefore remained ignorant of it even when I worked for the Company.)
> To the extent that the faster (but smaller) memory could be viewed as a "cache" with a 100% hit rate, I've been wondering about how performance differed based on memory placement back then.

According to the 1110 system description on Bitsavers, the cycle time
for the primary memory (implemented as plated wire) was 325ns for a read
and 520ns for a write, whereas the extended memory (the same core
modules as used for the 1106 main memory) had 1,500 ns cycle time, so a
substantial difference, especially for reads.
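
To put rough numbers on that difference, here is a back-of-the-envelope sketch (plain Python; the 75/25 read/write mix is an assumption for illustration, not a figure from the post) of the average access time a program would see in each memory type:

# Cycle times from the 1110 system description on Bitsavers (nanoseconds).
PRIMARY_READ, PRIMARY_WRITE = 325.0, 520.0  # plated-wire "primary" memory
EXTENDED_CYCLE = 1500.0                     # core "extended" memory, read or write

def avg_access_ns(read_frac, read_ns, write_ns):
    """Average access time for a given read/write mix."""
    return read_frac * read_ns + (1.0 - read_frac) * write_ns

MIX = 0.75  # assumed fraction of references that are reads (illustrative)
primary = avg_access_ns(MIX, PRIMARY_READ, PRIMARY_WRITE)
extended = avg_access_ns(MIX, EXTENDED_CYCLE, EXTENDED_CYCLE)
print(f"primary:  {primary:7.2f} ns/access")   # ~373.75 ns
print(f"extended: {extended:7.2f} ns/access")  # 1500.00 ns
print(f"penalty for landing in extended: {extended / primary:.1f}x")  # ~4.0x

On that assumed mix, a bank placed in extended memory pays roughly a 4x penalty per access, which is why the placement defaults described below mattered.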

But it really wasn't a cache. While there was a way to use a
channel in a back-to-back configuration to transfer memory blocks from
one type of memory to the other (i.e., not using BT instructions), IIRC,
this was rarely used.

> Was the Exec (whatever level it might have been ... between 27 and 33?) mostly or wholly loaded into the faster memory?

IIRC, 27 was the last 1108/1106 only level. 28 was an internal
Roseville level to start the integration of 1110 support. Level 29
(again IIRC) was the second internal version, perhaps also used for
early beta site 1110 customers; 30 was the first 1110 version, released
on a limited basis primarily to 1110 customers, while 31 was the general
stable release.

> Was there special code (I think there was) that prioritized placement of certain things in memory and if so how?

There were options on the bank collector statements to specify prefer or
require either primary or extended memory. If you didn't specify, the
default was I-banks in primary, D-banks in extended. That made sense,
as all instructions required an I-bank read, but many instructions don't
require a D-bank reference (e.g. register to register, j=U or XU,
control transfer instructions), and the multiple D-bank instructions
(e.g. Search and BT) were rare. Also, since I-banks were almost
entirely reads, you took advantage of the faster read cycle time.

Also, I suspect most programs had a larger D-bank than I-bank, and since
you typically had more extended than primary memory, this allowed more
optimal use of the expensive primary memory.

I don't remember what parts of the Exec were where, but I suspect it was
the same as for user programs. Of course, the interrupt vector
instructions had to be in primary due to their hardware fixed addresses.

> What sort of performance gains did use of the faster memory produce or conversely what sort of performance penalties occur when it wasn't?

As you can see from the different cycle times, the differences were
substantial.

> IOW, anyone care to dig up some old memories about the 1110 AKA 1100/40 you'd like to regale me with? Enquiring minds want to know.

I hope this helps. :-)

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<ubdq1b$2dooc$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=389&group=comp.sys.unisys#389

 by: Stephen Fuld - Mon, 14 Aug 2023 17:59 UTC

On 8/14/2023 8:38 AM, Scott Lurndal wrote:
> Lewis Cole <l_cole@juno.com> writes:
>> So just for giggles, I've been thinking more about what (if anything) can be done to improve system performance by tweaks to a CPU's cache architecture.
>> I've always thought that having a separate cache for supervisor mode references and user mode references _SHOULD_ make things faster, but when I poked around old stuff on the Internet about caches from the beginning of time, I found that while Once Upon A Time, separate supervisor mode and user mode caches were considered something to try, they were apparently abandoned because a unified cache seemed to work better in simulations. Surprise, surprise.
>>
>> This seems just so odd to me and so I've been wondering how much this result is an artifact of the toy OS that was used in the simulations (Unix) or th
>
> Toy OS?

Back in the time frame Lewis was talking about (1970s), many mainframe
people regarded Unix as a "toy OS". No one would think that now! 🙂

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<4a2bbd4c-8e8b-4136-ae74-89b9a9ea1edcn@googlegroups.com>

https://www.novabbs.com/computers/article-flat.php?id=390&group=comp.sys.unisys#390

 by: Lewis Cole - Mon, 14 Aug 2023 20:47 UTC

On Monday, August 14, 2023 at 9:14:50 AM UTC-7, Stephen Fuld wrote:
>> On 8/13/2023 10:18 PM, Lewis Cole wrote:
>> So just for giggles, I've been thinking
>> more about what (if anything) can be
>> done to improve system performance by
>> tweaks to a CPU's cache architecture.
>> I've always thought that having a
>> separate cache for supervisor mode
>> references and user mode references
>> _SHOULD_ make things faster, but when
>> I poked around old stuff on the
>> Internet about caches from the
>> beginning of time, I found that while
>> Once Upon A Time, separate supervisor
>> mode and user mode caches were
>> considered something to try, they
>> were apparently abandoned because a
>> unified cache seemed to work better
>> in simulations. Surprise, surprise.
>
> Yeah. Only using half the cache at any one time would seem to decrease
> performance. :-)
>

Of course, the smiley face indicates that you are being facetious.
But just on the off chance that someone wandering through the group might take you seriously, let me point out that re-purposing half of a cache DOES NOT necessarily reduce performance, and may in fact increase it if the way that the "missing" half is used somehow manages to increase the overall hit rate ... such as replacing a unified cache that's used to store both code and data with a separate i-cache for holding instructions and a separate d-cache for holding data, which is _de rigueur_ on processor caches these days.

I think it should be clear from the multiple layers of cache these days, each layer being slower but larger than the one above it, that the further you go down (towards memory), the more a given cache is supposed to hold instructions/data that are "high use", though not as high-use as what's in the cache above it.
And ever since the beginning of time (well ... since real live multi-tasking OSes appeared), it has been obvious that processors tend to spend most of their time in supervisor mode (OS) code rather than in user (program) code.
From what I've read, the reason why separate supervisor and user mode caches performed worse than a unified cache was all the bouncing around throughout the OS that was done.
Back in The Good Old Days, when caches were very small and essentially single-layer, it is easy to imagine that a substantial fraction of any OS's code/data (including that of a toy) could not fit in the one and only small cache and would not stay there for very long if it somehow managed to get there.
But these days, caches are huge (especially the lower-level ones), and it doesn't seem all that unimaginable to me that you could fit and keep a substantial portion of any OS lying around in one of the L3 caches of today ... or, worse yet, in an L4 cache if a case for better performance can be made.
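
To make that concrete, here is a toy experiment (my own sketch, not from any of the simulations mentioned; the reference stream, working-set sizes, and 30% supervisor fraction are all made-up parameters) comparing one unified LRU cache against the same capacity split into supervisor and user halves:

import random
from collections import OrderedDict

class LRUCache:
    def __init__(self, nlines):
        self.nlines = nlines
        self.lines = OrderedDict()
    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)        # refresh LRU position
            return True
        if len(self.lines) >= self.nlines:
            self.lines.popitem(last=False)      # evict least-recently-used line
        self.lines[addr] = True
        return False

def synth_stream(n, sup_frac=0.30, sup_lines=2_000, user_lines=8_000, seed=1):
    """Yield (is_supervisor, line_address) pairs with skewed (hot/cold) reuse."""
    rng = random.Random(seed)
    for _ in range(n):
        sup = rng.random() < sup_frac
        span = sup_lines if sup else user_lines
        # Squaring the uniform draw skews references toward low addresses.
        addr = int(span * rng.random() ** 2)
        yield sup, addr + (0 if sup else 1 << 20)  # disjoint address spaces

total_lines, n = 1024, 200_000
unified = LRUCache(total_lines)
split = {True: LRUCache(total_lines // 2), False: LRUCache(total_lines // 2)}
hits_u = hits_s = 0
for sup, addr in synth_stream(n):
    hits_u += unified.access(addr)
    hits_s += split[sup].access(addr)
print(f"unified hit rate: {hits_u / n:.3f}")
print(f"split   hit rate: {hits_s / n:.3f}")

Depending on the mix and working-set sizes you pick, the split configuration can win or lose, which is exactly the point: halving the cache isn't automatically a loss once the supervisor and user streams stop evicting each other's lines.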

>> This seems just so odd to me and so
>> I've been wondering how much this
>> result is an artifact of the toy OS
>> that was used in the simulations
>> (Unix) or the (by today's standards)
>> small single layer caches used.
>> This got me to thinking about the
>> 1110 AKA 1100/40 which had no caches
>> but did have two different types of
>> memory with different access speeds.
>> (I've always thought of the 1110
>> AKA 1100/40 as such an ugly machine
>> that I've always ignored it and
>> therefore remained ignorant of it
>> even when I worked for the Company.)
>> To the extent that the faster (but
>> smaller) memory could be viewed as
>> a "cache" with a 100% hit rate, I've
>> been wondering about how performance
>> differed based on memory placement
>> back then.
>
> According to the 1110 system description on Bitsavers, the cycle time
> for the primary memory (implemented as plated wire) was 325ns for a read
> and 520ns for a write, whereas the extended memory (the same core
> modules as used for the 1106 main memory) had 1,500 ns cycle time, so a
> substantial difference, especially for reads.

Yes.

> But it really wasn't a cache. While there was a way to use a
> channel in a back-to-back configuration to transfer memory blocks from
> one type of memory to the other (i.e., not using BT instructions), IIRC,
> this was rarely used.

No, it wasn't a cache, which I thought I made clear in my OP.
Nevertheless, I think one can reasonably view/think of "primary" memory as if it were a slower memory that just happened to be fronted by a cache that, by some happy accident, always returned a hit.
Perhaps this seems weird to you, but it seems like a convenient tool to me to see if there might be any advantage to having separate supervisor mode and user mode caches.

>> Was the Exec (whatever level it
>> might have been ... between 27
>> and 33?) mostly or wholly loaded
>> into the faster memory?
>
> IIRC, 27 was the last 1108/1106 only level. 28 was an internal
> Roseville level to start the integration of 1110 support. Level 29
> (again IIRC) was the second internal version, perhaps also used for
> early beta site 1110 customers; 30 was the first 1110 version, released
> on a limited basis primarily to 1110 customers, while 31 was the general
> stable release.
>

Thanks for the history.

>> Was there special code (I think
>> there was) that prioritized
>> placement of certain things in
>> memory and if so how?
>
> There were options on the bank collector statements to specify prefer or
> require either primary or extended memory. If you didn't specify, the
> default was I-banks in primary, D-banks in extended. That made sense,
> as all instructions required an I-bank read, but many instructions don't
> require a D-bank reference (e.g. register to register, j=U or XU,
> control transfer instructions), and the multiple D-bank instructions
>(e.g. Search and BT) were rare. Also, since I-banks were almost
> entirely reads, you took advantage of the faster read cycle time.
>
> Also, I suspect most programs had a larger D-bank than I-bank, and since
> you typically had more extended than primary memory, this allowed more
> optimal use of the expensive primary memory.
>
> I don't remember what parts of the Exec were where, but I suspect it was
> the same as for user programs. Of course, the interrupt vector
> instructions had to be in primary due to their hardware fixed addresses.
>

For me, life started with level 36, by which time *BOOT1 et al. had given way to *BTBLK et al.
Whatever the old bootstrap did, the new one tried to place the Exec I- and D-banks at opposite ends of memory, presumably so that concurrent accesses stood a better chance of not blocking one another by virtue of being in physically different memories, which were oftentimes interleaved.
IIRC, whether or not this was actually useful, it didn't change until M-Series hit the fan with paging.
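
A minimal illustration of why opposite ends help (my sketch; the module count and the address-to-module mapping are assumptions, not the actual 1110 layout): with memory built from modules that each own a contiguous range of addresses, an I-bank at the bottom and a D-bank at the top necessarily sit in different modules, so an instruction fetch and a data reference can proceed concurrently:

NMODULES = 4          # assumed number of memory modules
MEM_WORDS = 262_144   # 262K words, a fully configured memory

def module(addr):
    """Assume each module owns a contiguous quarter of the address space."""
    return addr // (MEM_WORDS // NMODULES)

exec_ibank = (0, 40_000)                      # loaded from address 0 upward
exec_dbank = (MEM_WORDS - 40_000, MEM_WORDS)  # loaded from the top downward

for name, (lo, hi) in (("I-bank", exec_ibank), ("D-bank", exec_dbank)):
    print(f"{name}: words {lo}..{hi - 1}, modules {module(lo)}..{module(hi - 1)}")
# The I-bank lands entirely in module 0 and the D-bank entirely in module 3,
# so a concurrent I-fetch and D-reference never serialize on the same module.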

>> What sort of performance gains
>> did use of the faster memory
>> produce or conversely what sort
>> of performance penalties occur
>> when it wasn't?
>
> As you can see from the different cycle times, the differences were
> substantial.

Yes, but do you know of anything that would suggest things were faster/slower because a lot of the OS was in primary storage most of the time ... IOW something that would support/refute the idea that separate supervisor and user mode caches might now be A Good Idea?

>> IOW, anyone care to dig up some
>> old memories about the 1110 AKA
>> 1100/40 you'd like to regale me
>> with? Enquiring minds want to know.
>
> I hope this helps. :-)

Yes, it does, but feel free to add more.

> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<ubf86b$2nnjm$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=391&group=comp.sys.unisys#391

 by: Stephen Fuld - Tue, 15 Aug 2023 07:06 UTC

On 8/14/2023 1:47 PM, Lewis Cole wrote:
> On Monday, August 14, 2023 at 9:14:50 AM UTC-7, Stephen Fuld wrote:
>>> On 8/13/2023 10:18 PM, Lewis Cole wrote:
>>> So just for giggles, I've been thinking
>>> more about what (if anything) can be
>>> done to improve system performance by
>>> tweaks to a CPU's cache architecture.
>>> I've always thought that having a
>>> separate cache for supervisor mode
>>> references and user mode references
>>> _SHOULD_ make things faster, but when
>>> I poked around old stuff on the
>>> Internet about caches from the
>>> beginning of time, I found that while
>>> Once Upon A Time, separate supervisor
>>> mode and user mode caches were
>>> considered something to try, they
>>> were apparently abandoned because a
>>> unified cache seemed to work better
>>> in simulations. Surprise, surprise.
>>
>> Yeah. Only using half the cache at any one time would seem to decrease
>> performance. :-)
>>
>
> Of course, the smiley face indicates that you are being facetious.
>> But just on the off chance that someone wandering through the group might take you seriously, let me point out that re-purposing half of a cache DOES NOT necessarily reduce performance, and may in fact increase it if the way that the "missing" half is used somehow manages to increase the overall hit rate ... such as replacing a unified cache that's used to store both code and data with a separate i-cache for holding instructions and a separate d-cache for holding data, which is _de rigueur_ on processor caches these days.
>
>> I think it should be clear from the multiple layers of cache these days, each layer being slower but larger than the one above it, that the further you go down (towards memory), the more a given cache is supposed to hold instructions/data that are "high use", though not as high-use as what's in the cache above it.
>> And ever since the beginning of time (well ... since real live multi-tasking OSes appeared), it has been obvious that processors tend to spend most of their time in supervisor mode (OS) code rather than in user (program) code.

I don't want to get into an argument about caching with you, but I am
sure that the percentage of time spent in supervisor mode is very
workload dependent.

> From what I've read, the reason why separate supervisor and user mode caches performed worse than a unified cache was all the bouncing around throughout the OS that was done.
> Back in The Good Old Days, when caches were very small and essentially single-layer, it is easy to imagine that a substantial fraction of any OS's code/data (including that of a toy) could not fit in the one and only small cache and would not stay there for very long if it somehow managed to get there.
> But these days, caches are huge (especially the lower-level ones), and it doesn't seem all that unimaginable to me that you could fit and keep a substantial portion of any OS lying around in one of the L3 caches of today ... or, worse yet, in an L4 cache if a case for better performance can be made.
>
>>> This seems just so odd to me and so
>>> I've been wondering how much this
>>> result is an artifact of the toy OS
>>> that was used in the simulations
>>> (Unix) or the (by today's standards)
>>> small single layer caches used.
>>> This got me to thinking about the
>>> 1110 AKA 1100/40 which had no caches
>>> but did have two different types of
>>> memory with different access speeds.
>>> (I've always thought of the 1110
>>> AKA 1100/40 as such an ugly machine
>>> that I've always ignored it and
>>> therefore remained ignorant of it
>>> even when I worked for the Company.)
>>> To the extent that the faster (but
>>> smaller) memory could be viewed as
>>> a "cache" with a 100% hit rate, I've
>>> been wondering about how performance
>>> differed based on memory placement
>>> back then.
>>
>> According to the 1110 system description on Bitsavers, the cycle time
>> for the primary memory (implemented as plated wire) was 325ns for a read
>> and 520ns for a write, whereas the extended memory (the same core
>> modules as used for the 1106 main memory) had 1,500 ns cycle time, so a
>> substantial difference, especially for reads.
>
> Yes.
>
>> But it really wasn't a cache. While there was a way to use a
>> channel in a back-to-back configuration to transfer memory blocks from
>> one type of memory to the other (i.e., not using BT instructions), IIRC,
>> this was rarely used.
>
> No, it wasn't a cache, which I thought I made clear in my OP.
> Nevertheless, I think one can reasonably view/think of "primary" memory as if it were a slower memory that just happened to be fronted by a cache that, by some happy accident, always returned a hit.
> Perhaps this seems weird to you, but it seems like a convenient tool to me to see if there might be any advantage to having separate supervisor mode and user mode caches.

I agree that it sounds weird to me, but if it helps you, have at it.

>
>>> Was the Exec (whatever level it
>>> might have been ... between 27
>>> and 33?) mostly or wholly loaded
>>> into the faster memory?
>>
>> IIRC, 27 was the last 1108/1106 only level. 28 was an internal
>> Roseville level to start the integration of 1110 support. Level 29
>> (again IIRC) was the second internal version, perhaps also used for
>> early beta site 1110 customers; 30 was the first 1110 version, released
>> on a limited basis primarily to 1110 customers, while 31 was the general
>> stable release.
>>
>
> Thanks for the history.
>
>>> Was there special code (I think
>>> there was) that prioritized
>>> placement of certain things in
>>> memory and if so how?
>>
>> There were options on the bank collector statements to specify prefer or
>> require either primary or extended memory. If you didn't specify, the
>> default was I-banks in primary, D-banks in extended. That made sense,
>> as all instructions required an I-bank read, but many instructions don't
>> require a D-bank reference (e.g. register to register, j=U or XU,
>> control transfer instructions), and the multiple D-bank instructions
>> (e.g. Search and BT) were rare. Also, since I-banks were almost
>> entirely reads, you took advantage of the faster read cycle time.
>>
>> Also, I suspect most programs had a larger D-bank than I-bank, and since
>> you typically had more extended than primary memory, this allowed more
>> optimal use of the expensive primary memory.
>>
>> I don't remember what parts of the Exec were where, but I suspect it was
>> the same as for user programs. Of course, the interrupt vector
>> instructions had to be in primary due to their hardware fixed addresses.
>>
>
> For me, life started with level 36, by which time *BOOT1 et al. had given way to *BTBLK et al.
> Whatever the old bootstrap did, the new one tried to place the Exec I- and D-banks at opposite ends of memory, presumably so that concurrent accesses stood a better chance of not blocking one another by virtue of being in physically different memories, which were oftentimes interleaved.
> IIRC, whether or not this was actually useful, it didn't change until M-Series hit the fan with paging.

First of all, when I mentioned the interrupt vectors, I wasn't talking
about boot elements, but the code starting at address 0200 (128 decimal)
through 0377, on at least pre-1100/90 systems, which was a set of LMJ
instructions, one per interrupt type, that were the first instructions
executed after an interrupt. E.g., on an 1108, on an ISI External
Interrupt on CPU0, the hardware would transfer control to address 0200,
where the LMJ instruction would capture the address of the next
instruction to be executed in the interrupted program, then transfer
control to the ISI interrupt handler.
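
A toy model of that dispatch sequence (my sketch in Python; the register name and the handler addresses are illustrative assumptions, though the save-the-return-address-and-jump behavior of LMJ is as described above):

VECTOR_BASE = 0o200  # interrupt vector area: one LMJ per interrupt type

class CPU:
    def __init__(self):
        self.pc = 0
        self.xreg = 0  # register the vector's LMJ captures the return PC into
        # vector address -> interrupt handler entry point (made-up addresses)
        self.vectors = {0o200: 0o10000,   # e.g. ISI External Interrupt
                        0o201: 0o11000}   # next interrupt type, and so on

    def take_interrupt(self, vector):
        # Hardware: force control to the fixed vector address.
        interrupted_pc = self.pc
        # The LMJ at the vector: capture the address of the next instruction
        # of the interrupted program, then jump to the handler.
        self.xreg = interrupted_pc
        self.pc = self.vectors[vector]

cpu = CPU()
cpu.pc = 0o40123                 # user program executing somewhere
cpu.take_interrupt(VECTOR_BASE)  # ISI External Interrupt arrives
print(f"handler entry: {cpu.pc:#o}, saved return address: {cpu.xreg:#o}")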

But you did jog my memory about Exec placement. On an 1108, the Exec
I-bank was loaded starting at address 0, and extended as far as needed.
The Exec D-bank was loaded at the end of memory, i.e., ending at 262K for
a fully configured memory, extending "downward" as far as needed. This
left the largest contiguous space possible for user programs, as well as
ensuring that the Exec I and D banks were in different memory banks, to
guarantee overlapped timing for I fetch and data access. I guess that
the 1110 just did the same thing, as it didn't require changing anything
else, and maximized the contiguous space available for user banks in
both primary and extended memory.

>>> What sort of performance gains
>>> did use of the faster memory
>>> produce or conversely what sort
>>> of performance penalties occur
>>> when it wasn't?
>>
>> As you can see from the different cycle times, the differences were
>> substantial.
>
> Yes, but do you know of anything that would suggest things were faster/slower because a lot of the OS was in primary storage most of the time ... IOW something that would support/refute the idea that separate supervisor and user mode caches might now be A Good Idea?


[Remainder of article truncated in the archive]
Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<quLCM.59082$m8Ke.5166@fx08.iad>

https://www.novabbs.com/computers/article-flat.php?id=392&group=comp.sys.unisys#392

 by: Scott Lurndal - Tue, 15 Aug 2023 13:56 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>On 8/14/2023 8:38 AM, Scott Lurndal wrote:
>> Lewis Cole <l_cole@juno.com> writes:
>>> So just for giggles, I've been thinking more about what (if anything) can be done to improve system performance by tweaks to a CPU's cache architecture.
>>> I've always thought that having a separate cache for supervisor mode references and user mode references _SHOULD_ make things faster, but when I poked around old stuff on the Internet about caches from the beginning of time, I found that while Once Upon A Time, separate supervisor mode and user mode caches were considered something to try, they were apparently abandoned because a unified cache seemed to work better in simulations. Surprise, surprise.
>>>
>>> This seems just so odd to me and so I've been wondering how much this result is an artifact of the toy OS that was used in the simulations (Unix) or th
>>
>> Toy OS?
>
>Back in the time frame Lewis was talking about (1970s), many mainframe
>people regarded Unix as a "toy OS". No one would think that now!

Some people, perhaps.

Burroughs, on the other hand, had unix offerings via Convergent Technologies,
and as Unisys, developed the unix-based OPUS systems (distributed, massively parallel
intel-based systems running a custom microkernel-based distributed version of SVR4).

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<tELCM.492852$TCKc.106157@fx13.iad>

https://www.novabbs.com/computers/article-flat.php?id=393&group=comp.sys.unisys#393

 by: Scott Lurndal - Tue, 15 Aug 2023 14:07 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>On 8/14/2023 1:47 PM, Lewis Cole wrote:
>> On Monday, August 14, 2023 at 9:14:50 AM UTC-7, Stephen Fuld wrote:
>>>> On 8/13/2023 10:18 PM, Lewis Cole wrote:
>>>> So just for giggles, I've been thinking
>>>> more about what (if anything) can be
>>>> done to improve system performance by
>>>> tweaks to a CPU's cache architecture.
>>>> I've always thought that having a
>>>> separate cache for supervisor mode
>>>> references and user mode references
>>>> _SHOULD_ make things faster, but when
>>>> I poked around old stuff on the
>>>> Internet about caches from the
>>>> beginning of time, I found that while
>>>> Once Upon A Time, separate supervisor
>>>> mode and user mode caches were
>>>> considered something to try, they
>>>> were apparently abandoned because a
>>>> unified cache seemed to work better
>>>> in simulations. Surprise, surprise.
>>>
>>> Yeah. Only using half the cache at any one time would seem to decrease
>>> performance. :-)
>>>
>>
>> Of course, the smiley face indicates that you are being facetious.
>> But just on the off chance that someone wandering through the group might take you seriously, let me point out that re-purposing half of a cache DOES NOT necessarily reduce performance, and may in fact increase it if the way that the "missing" half is used somehow manages to increase the overall hit rate ... such as replacing a unified cache that's used to store both code and data with a separate i-cache for holding instructions and a separate d-cache for holding data, which is _de rigueur_ on processor caches these days.
>>
>> I think it should be clear from the multiple layers of cache these days, each layer being slower but larger than the one above it, that the further you go down (towards memory), the more a given cache is supposed to hold instructions/data that are "high use", though not as high-use as what's in the cache above it.
>> And ever since the beginning of time (well ... since real live multi-tasking OSes appeared), it has been obvious that processors tend to spend most of their time in supervisor mode (OS) code rather than in user (program) code.
>
>I don't want to get into an argument about caching with you, but I am
>sure that the percentage of time spent in supervisor mode is very
>workload dependent.
>

Indeed. On modern toy unix systems, the split is closer to 10% system, 90% user.

For example, a large parallel compilation job[*] (using half of the available 64 cores):

%Cpu(s): 28.6 us, 2.7 sy, 0.1 ni, 66.6 id, 1.8 wa, 0.0 hi, 0.3 si, 0.0 st

That's 28.6% in user mode, 2.7% in system (supervisor) mode.

Most modern server processors (intel, arm64) offer programmable cache partitioning
mechanisms that allow the OS to designate that a schedulable entity belongs to a
specific partition, and provide controls to designate that portions of the cache are
reserved to those schedulable entities (threads, processes).

Note also that in modern server grade processors, there are extensions to the
instruction set to allow the application to instruct the cache that data will
be used in the future, in which case the cache controller _may_ pre-load the
data in anticipation of future use.

Most cache subsystems also include logic to anticipate future accesses and
preload the data into the cache before the processor requires it based on
historic patterns of access.

[*] takes close to an hour on a single core system, over 9 million SLOC.

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<iVMCM.91255$VzFf.32852@fx03.iad>

https://www.novabbs.com/computers/article-flat.php?id=394&group=comp.sys.unisys#394

 by: Scott Lurndal - Tue, 15 Aug 2023 15:33 UTC

scott@slp53.sl.home (Scott Lurndal) writes:
>Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>>On 8/14/2023 8:38 AM, Scott Lurndal wrote:
>>> Lewis Cole <l_cole@juno.com> writes:
>>>> So just for giggles, I've been thinking more about what (if anything) can be done to improve system performance by tweaks to a CPU's cache architecture.
>>>> I've always thought that having a separate cache for supervisor mode references and user mode references _SHOULD_ make things faster, but when I poked around old stuff on the Internet about caches from the beginning of time, I found that while Once Upon A Time, separate supervisor mode and user mode caches were considered something to try, they were apparently abandoned because a unified cache seemed to work better in simulations. Surprise, surprise.
>>>>
>>>> This seems just so odd to me and so I've been wondering how much this result is an artifact of the toy OS that was used in the simulations (Unix) or th
>>>
>>> Toy OS?
>>
>>Back in the time frame Lewis was talking about (1970s), many mainframe
>>people regarded Unix as a "toy OS". No one would think that now!
>
>Some people, perhaps.
>
>Burroughs, on the other hand, had unix offerings via Convergent Technologies,
>and as Unisys, developed the unix-based OPUS systems (distributed, massively parallel
>intel-based systems running a custom microkernel-based distributed version of SVR4).

I'll note that Mapper was one of the applications that ran on the OPUS systems.

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<ubg6gj$2s7av$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=395&group=comp.sys.unisys#395

 by: Stephen Fuld - Tue, 15 Aug 2023 15:44 UTC

On 8/15/2023 6:56 AM, Scott Lurndal wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>> On 8/14/2023 8:38 AM, Scott Lurndal wrote:
>>> Lewis Cole <l_cole@juno.com> writes:
>>>> So just for giggles, I've been thinking more about what (if anything) can be done to improve system performance by tweaks to a CPU's cache architecture.
>>>> I've always thought that having a separate cache for supervisor mode references and user mode references _SHOULD_ make things faster, but when I poked around old stuff on the Internet about caches from the beginning of time, I found that while Once Upon A Time, separate supervisor mode and user mode caches were considered something to try, they were apparently abandoned because a unified cache seemed to work better in simulations. Surprise, surprise.
>>>>
>>>> This seems just so odd to me and so I've been wondering how much this result is an artifact of the toy OS that was used in the simulations (Unix) or th
>>>
>>> Toy OS?
>>
>> Back in the time frame Lewis was talking about (1970s), many mainframe
>> people regarded Unix as a "toy OS". No one would think that now!
>
> Some people, perhaps.

I suppose I should have been more specific. At least among the
Univac/Sperry users, which Lewis and I were both part of, that view was
pretty common.

> Burroughs, on the other hand, had unix offerings via Convergent Technologies,

But that was later, at least the 1980s.

> and as Unisys, developed the unix-based OPUS systems (distributed, massively parallel
> intel-based systems running a custom microkernel-based distributed version of SVR4).

Which, of course, was even later.

As I said, as Unix improved, the belief that it was a "toy" system
diminished.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<ubg70j$2s7av$2@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=396&group=comp.sys.unisys#396

 by: Stephen Fuld - Tue, 15 Aug 2023 15:52 UTC

On 8/15/2023 7:07 AM, Scott Lurndal wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>> On 8/14/2023 1:47 PM, Lewis Cole wrote:
>>> On Monday, August 14, 2023 at 9:14:50 AM UTC-7, Stephen Fuld wrote:
>>>>> On 8/13/2023 10:18 PM, Lewis Cole wrote:
>>>>> So just for giggles, I've been thinking
>>>>> more about what (if anything) can be
>>>>> done to improve system performance by
>>>>> tweaks to a CPU's cache architecture.
>>>>> I've always thought that having a
>>>>> separate cache for supervisor mode
>>>>> references and user mode references
>>>>> _SHOULD_ make things faster, but when
>>>>> I poked around old stuff on the
>>>>> Internet about caches from the
>>>>> beginning of time, I found that while
>>>>> Once Upon A Time, separate supervisor
>>>>> mode and user mode caches were
>>>>> considered something to try, they
>>>>> were apparently abandoned because a
>>>>> unified cache seemed to work better
>>>>> in simulations. Surprise, surprise.
>>>>
>>>> Yeah. Only using half the cache at any one time would seem to decrease
>>>> performance. :-)
>>>>
>>>
>>> Of course, the smiley face indicates that you are being facetious.
>>> But just on the off chance that someone wandering through the group might take you seriously, let me point out that re-purposing half of a cache DOES NOT necessarily reduce performance, and may in fact increase it if the way that the "missing" half is used somehow manages to increase the overall hit rate ... such as replacing a unified cache that's used to store both code and data with a separate i-cache for holding instructions and a separate d-cache for holding data, which is _de rigueur_ on processor caches these days.
>>>
>>> I think it should be clear from the multiple layers of cache these days, each layer being slower but larger than the one above it, that the further you go down (towards memory), the more a given cache is supposed to hold instructions/data that are "high use", though not as high-use as what's in the cache above it.
>>> And ever since the beginning of time (well ... since real live multi-tasking OSes appeared), it has been obvious that processors tend to spend most of their time in supervisor mode (OS) code rather than in user (program) code.
>>
>> I don't want to get into an argument about caching with you, but I am
>> sure that the percentage of time spent in supervisor mode is very
>> workload dependent.
>>
>
> Indeed. On modern toy unix systems, the split is closer to 10% system, 90% user.

That is more in line with my experience and expectations. Of course, if
you are doing OLTP, it is probably a higher percentage of system than
what you show; conversely, a highly compute-bound scientific job may be
even less.

> For example, a large parallel compilation job[*] (using half of the available 64 cores):
>
> %Cpu(s): 28.6 us, 2.7 sy, 0.1 ni, 66.6 id, 1.8 wa, 0.0 hi, 0.3 si, 0.0 st
>
> That's 28.6% in user mode, 2.7% in system (supervisor) mode.
>
> Most modern server processors (intel, arm64) offer programmable cache partitioning
> mechanisms that allow the OS to designate that a schedulable entity belongs to a
> specific partition, and provide controls to designate that portions of the cache are
> reserved to those schedulable entities (threads, processes).

I wasn't aware of this. I will have to do some research. :-)

> Note also that in modern server grade processors, there are extensions to the
> instruction set to allow the application to instruct the cache that data will
> be used in the future, in which case the cache controller _may_ pre-load the
> data in anticipation of future use.

Something beyond prefetch instructions?

> Most cache subsystems also include logic to anticipate future accesses and
> preload the data into the cache before the processor requires it based on
> historic patterns of access.

Sure - especially sequential access patterns are easy to detect.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<d9OCM.632528$TPw2.94445@fx17.iad>

https://www.novabbs.com/computers/article-flat.php?id=397&group=comp.sys.unisys#397

 by: Scott Lurndal - Tue, 15 Aug 2023 16:58 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>On 8/15/2023 7:07 AM, Scott Lurndal wrote:

>>
>> Most modern server processors (intel, arm64) offer programmable cache partitioning
>> mechanisms that allow the OS to designate that a schedulable entity belongs to a
>> specific partition, and provide controls to designate that portions of the cache are
>> reserved to those schedulable entities (threads, processes).
>
>I wasn't aware of this. I will have to do some research. :-)

For the ARM64 version, look for Memory System Resource Partitioning and Monitoring
(MPAM).

https://developer.arm.com/documentation/ddi0598/latest/

Note that this only controls "allocation", not "access" - i.e. any application
can hit a line in any partition, but new lines are only allocated in partitions
associated with the entity that caused the fill to occur.

Resources include cache allocation and memory bandwidth.
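
The allocation-versus-access distinction is easy to model in a few lines (a sketch of the general idea only, not of MPAM's actual registers or any specific vendor's mechanism): a lookup searches every way of the set, but a miss may only fill ways assigned to the requesting partition:

class PartitionedSet:
    """One set of a set-associative cache with per-partition way masks."""
    def __init__(self, nways, waymask):
        self.tags = [None] * nways
        self.waymask = waymask  # partition name -> list of ways it may fill

    def access(self, tag, partition):
        if tag in self.tags:               # any partition may hit on any line
            return "hit"
        allowed = self.waymask[partition]  # ...but fills stay in its own ways
        free = [w for w in allowed if self.tags[w] is None]
        victim = free[0] if free else allowed[0]  # crude replacement choice
        self.tags[victim] = tag
        return f"miss, filled way {victim}"

# An 8-way set: the OS partition may fill ways 0-1, applications ways 2-7.
s = PartitionedSet(8, {"os": [0, 1], "app": [2, 3, 4, 5, 6, 7]})
print(s.access(0xA, "os"))   # miss, filled way 0
print(s.access(0xA, "app"))  # hit -- access isn't restricted, only allocation
print(s.access(0xB, "app"))  # miss, filled way 2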

>
>
>> Note also that in modern server grade processors, there are extensions to the
>> instruction set to allow the application to instruct the cache that data will
>> be used in the future, in which case the cache controller _may_ pre-load the
>> data in anticipation of future use.
>
>Something beyond prefetch instructions?

Prefetch instructions are what I had in mind.

>
>
>> Most cache subsystems also include logic to anticipate future accesses and
>> preload the data into the cache before the processor requires it based on
>> historic patterns of access.
>
>Sure - especially sequential access patterns are easy to detect.

Yep, stride-based prefetches have been common for a couple of decades
now.
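The pattern such prefetchers key on is simply a constant stride. A toy C
sketch of the sort of loop they catch (after a few initial misses the
prefetcher can run ahead of the stream):

  /* Sketch: a fixed-stride walk down one matrix column; the address
   * sequence advances by 'cols' elements every iteration, which a
   * stride-based hardware prefetcher can detect and anticipate. */
  #include <stddef.h>

  long column_sum(const long *m, size_t rows, size_t cols, size_t col)
  {
      long sum = 0;
      for (size_t r = 0; r < rows; r++)
          sum += m[r * cols + col];   /* constant stride of 'cols' */
      return sum;
  }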

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<f56ecc39-a1ec-4e67-8fc4-696114f6d99cn@googlegroups.com>

https://www.novabbs.com/computers/article-flat.php?id=398&group=comp.sys.unisys#398

Newsgroups: comp.sys.unisys
 by: Lewis Cole - Tue, 15 Aug 2023 18:48 UTC

On Tuesday, August 15, 2023 at 12:06:53 AM UTC-7, Stephen Fuld wrote:
<snip>
>>> Yeah. Only using half the cache at any one time would seem to decrease
>>> performance. :-)
>>
>> Of course, the smiley face indicates
>> that you are being facetious.
>> But just on the off chance that
>> someone wandering through the group
>> might take you seriously, let me
>> point out that re-purposing half of
>> a cache DOES NOT necessarily reduce
>> performance, and may in fact increase
>> it if the way that the "missing" half
>> is used somehow manages to increase
>> the overall hit rate ... such as
>> reducing a unified cache that's used
>> to store both code and data with a
>> separate i-cache for holding
>> instructions and a separate d-cache
>> for holding data which is _de rigueur_
>> on processor caches these days.
>>
>> I think it should be clear from the
>> multiple layers of cache these days,
>> each layer being slower but larger
>> than the one above it, that the
>> further you go down (towards memory),
>> the more a given cache is supposed to
>> cache instructions/data that is "high
>> use", but not so much as what's in
>> the cache above it.
>> And even since the beginning of time
>> (well ... since real live multi-tasking
>> OS appeared), it has been obvious that
>> processors tend to spend most of their
>> time in supervisor mode (OS) code
>> rather than in user (program) code.
>
> I don't want to get into an argument about caching with you, [...]

I'm not sure what sort of argument you think I'm trying to get into WRT caching, but I assume that we are both familiar enough with it that there's really no argument to be had, so your comment makes no sense to me.

> [...] but I am sure that the percentage of time spent in supervisor mode is very
> workload dependent.

Agreed.
But to the extent that the results of the SS keyin were ... useful ... in The Good Old Days at Roseville, I recall seeing something in excess of 80 percent of the time spent in the Exec on a regular basis.

I'm sure that things have changed now in the Real World, as the fixation with server farms that run VMs no doubt means that every attempt is likely being made to stay out of the underlying OS and probably even the virtualized OSes as well.
I have no problem with this and think that it's A Good Thing.

But in my OP, I referred to "system performance", not "system throughput".
If all I wanted to do was increase system throughput, then the obvious solution would be to basically get rid of the OS entirely and link whatever OS-like services are needed into a user program that I could then magically load all by itself into system memory and run all by itself in a more or less batch-processing manner (which I gather is what MirageOS tried to do).

However, I'm assuming that an OS is necessary, not just for throughput but also for responsiveness, for example, and ISTM that the best way (that I can think of, at least) is to try to keep as much of the "high use" portions of an OS in a cache as possible.
If everything works, the percentage of time spent in the OS should drop relative to what it would otherwise be, and ideally system performance (both throughput and responsiveness) should increase. This holds even in the case of micro-kernels, where everything that doesn't absolutely require OS privilege is pushed out to user land, because (hopefully) crucially needed OS code (say, the message-passing kernel at the heart of many micro-kernels) would be able to run as fast as possible.

<snip>
>> No, it wasn't a cache, which I thought
>> I made clear in my OP.
>> Nevertheless, I think one can reasonably
>> view/think of "primary" memory as if it
>> were a slower memory that just happened
>> to be cached where just by some accident,
>> the cache would always return a hit.
>> Perhaps this seems weird to you, but it
>> seems like a convenient tool to me to
>> see if there might be any advantage to
>> having separate supervisor mode and user
>> mode caches.
>
> I agree that it sounds weird to me, but if it helps you, have at it.

Okay, well, I hope the reason I'm thinking this way is obvious.

Caches are devices that make the memory they serve look faster than it really is.
Caches are usually used to try to speed up ALL memory without regard to what's in a particular region of memory.
Caches have historically been very, very small, only a few kilo-bytes/36-bit words in size, and so couldn't hold more than a very small portion of a user program or OS at any instant in time, and so I suspect that the effect of a larger cache dedicated to OS code, say, couldn't really be accurately determined in The Good Old Days.
The primary memory of an 1110 AKA 1100/40, while small by today's standards, was huge back then -- much larger than any cache of the time -- and I suspect it often times (usually?) contained a significant portion of the code portion of the Exec because of the way things were loaded at boot time.
To the extent that the primary memory behaved like a region of much slower memory that just happened to be front-ended by a very fast cache with a good (perfect) hit rate, I'm hoping that the effort to get things (and keep things) into primary memory serves as a possible (indirect) measure of how useful having a separate OS cache might be.

<snip>
>>> I don't remember what parts of the Exec were where, but I suspect it was
>>> the same as for user programs. Of course, the interrupt vector
>>> instructions had to be in primary due to their hardware fixed addresses..
>>>
>> For me, life started with 36 level by
>> which time *BOOT1, et. al. had given
>> way to *BTBLK, et. al.
>> Whatever the old bootstrap did, the
>> new one tried to place the Exec I-
>> and D-banks at opposite ends of memory,
>> presumably so that concurrent accesses
>> stood a better chance of not blocking
>> one another due to being in a
>> physically different memory that was
>> often times interleaved.
>> IIRC, whether or not this was actually
>> useful, it didn't change until M-Series hit the fan with paging.
>
> First of all, when I mentioned the interrupt vectors, I wasn't talking
> about boot elements, but the code starting at address 0200 (128 decimal)
> through 0377 on at least pre 1100/90 systems which was a set of LMJ
> instructions, one per interrupt type, that were the first instructions
> executed after an interrupt. e.g. on an 1108, on an ISI External
> Interrupt on CPU0 the hardware would transfer control to address 0200,
> where the LMJ instruction would capture the address of the next
> instruction to be executed in the interrupted program, then transfer
> control to the ISI interrupt handler.

I understood what you meant by "interrupt vectors". Honestly.

My reference to Bootstrap elements was to point out that I'm familiar with the bootstrap process and bank placement that occurred after the 1100/80 hit the fan.
I don't recall much (if any) procing having to do with EON (1106 and 1108) or TON (1110 and 1100/80 and later 1100/60) in the bootstrap code and so I'm assuming that what occurred for the 1100/80 as far as bank placement also occurred for the 1110 AKA 1100/40.
If that's not the case, then feel free to correct me.

As for the 1100/80 boot process, the hardware loaded the bootblock (*BTBLK) starting at the address indicated by the MSR register, and once it was loaded, started executing the code therein by taking an I/O interrupt.
The vectors, being part of the bootblock code, directed execution to the initial I/O interrupt handler which IIRC was the start of *BTBLK.
I don't recall what the vectors looked like, but I suspect that they were *NOT* LMJ instructions because on the 1100/80, P started out with an ABSOLUTE address, not a relative address.
IIRC the Processor and Storage manual specifically mentioned that the vector need not be an LMJ because any relative return address captured was likely to be wrong.

> But you did jog my memory about Exec placement. On an 1108, the Exec
> I-bank was loaded starting at address 0, and extended at far as needed.
> The Exec D-bank was loaded at the end of memory i.e. ending at 262K for
> a fully configured memory, extending "downward" as far as needed. This
> left the largest contiguous space possible for user programs, as well as
> insuring that the Exec I and D banks were in different memory banks, to
> guarantee overlapped timing for I fetch and data access. I guess that
> the 1110 just did the same thing, as it didn't require changing another
> thing, and maximized the contiguous space available for user banks in
> both primary and extended memory.

FWIW, the I-bank couldn't start at 0 on an 1100/80 or 1100/80A.
The caches (Storage Interface Units AKA SIUs) made memory look like it was centered on the address 8-million (decimal) and expanded upwards and downwards from there.
The maximum amount of memory possible was (IIRC) 8-million, and so I suppose that one could theoretically get memory to go from 0 to 8-million, but AFAIK that never happened, and so memory always started at 4-million at its lowest.


[Remainder of article truncated in this archive.]
Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<7fc50542-3a47-4f75-a64d-1cb828c07893n@googlegroups.com>

https://www.novabbs.com/computers/article-flat.php?id=399&group=comp.sys.unisys#399

Newsgroups: comp.sys.unisys
 by: Lewis Cole - Wed, 16 Aug 2023 02:56 UTC

On Tuesday, August 15, 2023 at 7:07:23 AM UTC-7, Scott Lurndal wrote:
> Stephen Fuld writes:
>> I don't want to get into an argument about caching with you, but I am
>> sure that the percentage of time spent in supervisor mode is very
>> workload dependent.
>>
>
> Indeed. On modern toy unix systems, the split is closer to 10% system, 90% user.
>
> For example, a large parallel compilation job[*] (using half of the available 64 cores):
>
> %Cpu(s): 28.6 us, 2.7 sy, 0.1 ni, 66.6 id, 1.8 wa, 0.0 hi, 0.3 si, 0.0 st
>
> That's 28.6% in user mode, 2.7% in system (supervisor) mode.

That's nice.
But it doesn't speak to what would happen to system performance if the amount of time spent in the OS went down, does it?
Nor does it say anything about whether or not having a dedicated supervisor cache would help or hurt things.
If you have some papers that you can point to, I'd love to hear about them.
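For reference, the us/sy figures that top prints are derived from the
per-mode tick counters in /proc/stat. A minimal C sketch (Linux-specific,
and a single sample, so it reports the split since boot rather than over
an interval the way top does):

  /* Sketch: derive user/system time percentages from /proc/stat. */
  #include <stdio.h>

  int main(void)
  {
      unsigned long long user, nice, sys, idle, iowait, irq, softirq, steal;
      FILE *f = fopen("/proc/stat", "r");
      if (!f) { perror("/proc/stat"); return 1; }
      fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
             &user, &nice, &sys, &idle, &iowait, &irq, &softirq, &steal);
      fclose(f);

      unsigned long long total = user + nice + sys + idle + iowait
                               + irq + softirq + steal;
      printf("user %.1f%%  system %.1f%%\n",
             100.0 * user / total, 100.0 * sys / total);
      return 0;
  }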

> Most modern server processors (intel, arm64) offer programmable cache partitioning
> mechanisms that allow the OS to designate that a schedulable entity belongs to a
> specific partition, and provides controls to designate portions of the cache are
> reserved to those schedulable entities (threads, processes).

I think that "programmable cache partitioning" is what the ARM folks call some of their processors' ability to partition one of their caches.
I think that the equivalent thing for x86-64 processors by Intel (which is the dominant x86-64 server processor maker) is called "Cache Allocation Technology" (CAT); what it does is basically set limits on how much cache a thread/process/something-or-other can use so that its cache usage doesn't impact other threads/processes/something-or-others.
While this is ... amusing ... it doesn't really say much with regard to the possible impact of a supervisor mode cache ... except perhaps in the case of a micro-kernel where various user-land programs perform OS functions, in which case I would argue that, in effect, the partitions created amount to supervisor mode caches for supervisor code that doesn't happen to run with the supervisor bit set.

Just as an aside, if someone wanted to write a 2200 system simulator, say, it would be A Good Idea to dedicate one processor *CORE* to each simulated 2200 IP and basically ignore any "hyperthreaded" logical processors, as all they can do if they are allowed to execute is disrupt the cache needed by the actual core simulating the IP.
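A minimal C sketch of that idea on Linux, one pthread per simulated IP,
each pinned with pthread_setaffinity_np (the IP-number-to-core mapping
here is an assumption; real core/SMT-sibling topology should be read from
/sys/devices/system/cpu/cpu*/topology/):

  /* Sketch: pin one simulator thread per assumed physical core.
   * Build with: cc -pthread pin.c */
  #define _GNU_SOURCE
  #include <pthread.h>
  #include <sched.h>
  #include <stdio.h>

  #define NUM_IPS 4

  static void *run_ip(void *arg)
  {
      long ip = (long)arg;
      cpu_set_t set;
      CPU_ZERO(&set);
      CPU_SET((int)ip, &set);  /* core number == IP number, by assumption */
      pthread_setaffinity_np(pthread_self(), sizeof set, &set);
      /* ... fetch/decode/execute loop for simulated IP 'ip' here ... */
      printf("IP %ld pinned to CPU %ld\n", ip, ip);
      return NULL;
  }

  int main(void)
  {
      pthread_t t[NUM_IPS];
      for (long i = 0; i < NUM_IPS; i++)
          pthread_create(&t[i], NULL, run_ip, (void *)i);
      for (int i = 0; i < NUM_IPS; i++)
          pthread_join(t[i], NULL);
      return 0;
  }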

> Note also that in modern server grade processors, there are extensions to the
> instruction set to allow the application to instruct the cache that data will
> be used in the future, in which case the cache controller _may_ pre-load the
> data in anticipation of future use.

If you are referring to the various PREFETCHxxx instructions, yes, they exist, but they are usually "hints" the last time I looked and only load *DATA* into the L3 DATA cache in advance of its possible use.
So unless something's changed (and feel free to let me know if it has), you can't pre-fetch supervisor mode code for some service that user mode code might need Real Soon Now.
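For reference, the portable C spelling of those hints is
__builtin_prefetch (GCC/Clang), which compiles down to PREFETCHxxx on
x86-64 and PRFM on arm64. A minimal sketch (the look-ahead distance of 16
elements is an arbitrary choice):

  /* Sketch: software prefetch hints issued a few iterations ahead.
   * Arguments are (address, rw: 0=read, locality: 0..3); purely a hint
   * that the hardware is free to ignore. */
  #include <stddef.h>

  long sum_with_prefetch(const long *a, size_t n)
  {
      long sum = 0;
      for (size_t i = 0; i < n; i++) {
          if (i + 16 < n)
              __builtin_prefetch(&a[i + 16], 0, 3);  /* needed soon */
          sum += a[i];
      }
      return sum;
  }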

> Most cache subsystems also include logic to anticipate future accesses and
> preload the data into the cache before the processor requires it based on
> historic patterns of access.

Again, data, not code.
So if you have a request for some OS-like service that would be useful to have done as quickly as possible, the actual code might not still be in an instruction cache if a user program has been running for a long time and so has filled the cache with its working set.

> [*] takes close to an hour on a single core system, over 9 million SLOC.

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<IHqDM.417690$xMqa.412229@fx12.iad>

https://www.novabbs.com/computers/article-flat.php?id=400&group=comp.sys.unisys#400

Newsgroups: comp.sys.unisys
 by: Scott Lurndal - Thu, 17 Aug 2023 15:06 UTC

Lewis Cole <l_cole@juno.com> writes:
>On Tuesday, August 15, 2023 at 7:07:23 AM UTC-7, Scott Lurndal wrote:
>> Stephen Fuld writes:
>>> I don't want to get into an argument about caching with you, but I am
>>> sure that the percentage of time spent in supervisor mode is very
>>> workload dependent.
>>>
>>
>> Indeed. On modern toy unix systems, the split is closer to 10% system, 90% user.
>>
>> For example, a large parallel compilation job[*] (using half of the available 64 cores):
>>
>> %Cpu(s): 28.6 us, 2.7 sy, 0.1 ni, 66.6 id, 1.8 wa, 0.0 hi, 0.3 si, 0.0 st
>>
>> That's 28.6% in user mode, 2.7% in system (supervisor) mode.
>
>That's nice.
>But it doesn't speak to what would happen to system performance if the amount of time spent in the OS went down, does it?

I don't understand what you are suggesting. The time spent in the
OS is 2.7%. If the amount of time spent in the OS goes down, that's just
more time available for user mode code to execute.

On intel processors, the caches are physically tagged and physically indexed,
which means that any access to a particular address, regardless of
access mode (user, supervisor) or execution ring will hit on the cache
for any given physical address. Security access controls are in the TLBs
(at both L1 and L2).

On ARM64 processors, the caches are additionally tagged with the exception
level (User, Kernel, Hypervisor, Secure Monitor) which additionally qualifies
accesses to each cache line.

ARM does provide a mechanism to partition the caches for -allocation- only,
otherwise the normal aforementioned access constraints apply. The ARM64
mechanism (MPAM) assigns a partition identifier to entities (e.g. processes)
and any cache allocation for that entity will be allocated from the specified
partition; any accesses to the physical address corresponding to the cache
line will be satisfied from any partition so long as any security constraints
are met.

>Nor does it say anything about whether or not having a dedicated supervisor cache would help or hurt things.

It certainly implies such. If the supervisor is only running 3% of the
time, having a dedicated supervisor cache would hurt performance.

>I think that "programmable cache partitioning" is what the ARM folks call s=
>ome of their processors' ability to partition one of their caches.

It's called MPAM (and FWIW, I spent a few years on the ARM Technical Advisory
Board while the 64-bit architecture was being developed).

>I think that the equivalent thing for x86-64 processors by Intel (which is the dominant x86-64 server processor maker) is called "Cache Allocation Technology" (CAT); what it does is basically set limits on how much cache a thread/process/something-or-other can use so that its cache usage doesn't impact other threads/processes/something-or-others.

Again, this controls "allocation", not access.

>Just as an aside, if someone wanted to write a 2200 system simulator, say, it would be A Good Idea to dedicate one processor *CORE* to each simulated 2200 IP and basically ignore any "hyperthreaded" logical processors, as all they can do if they are allowed to execute is disrupt the cache needed by the actual core simulating the IP.

Well, given that my former employer (Unisys) has developed and sells a 2200 system
simulator, it may be useful to ask them for details on the implementation.

>If you are referring to the various PREFETCHxxx instructions, yes, they exist, but they are usually "hints" the last time I looked and only load *DATA* into the L3 DATA cache in advance of its possible use.

Yes, they are hints. That allows the processor vendors to best determine
the behavior for their particular microarchitecture. I can tell you that
they are honored on all the ARM64 processors we build.

>So unless something's changed (and feel free to let me know if it has), you can't pre-fetch supervisor mode code for some service that user mode code might need Real Soon Now.

Right. Given that the supervisor code isn't running, it would be difficult
for it to anticipate subsequent user mode behavior. In any case, looking
at current measurements for context switches between user and kernel modes
on modern intel and arm64 processors, it wouldn't be likely to help
performance even if such a mechanism were available; indeed, if supervisor
calls are that common, it's entirely likely that the line would already
be present in at least the LLC, if not closer to the processor, particularly
with SMT (aka hyperthreading).

>
>> Most cache subsystems also include logic to anticipate future accesses and
>> preload the data into the cache before the processor requires it based on
>> historic patterns of access.
>
>Again, data, not code.

Actually, instruction prefetching is far easier than data prefetching, and
with dedicated Icache at L1, the icache will prefetch.

>So if you have a request for some OS-like service that would be useful to have done as quickly as possible, the actual code might not still be in an instruction cache if a user program has been running for a long time and so has filled the cache with its working set.

All the major processors have performance monitoring tools to count
icache misses. The vast majority of such misses are related to
branches and function calls; fetching supervisor mode code is likely
in the noise.

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<ubo3b6$9hr9$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=402&group=comp.sys.unisys#402

Newsgroups: comp.sys.unisys
 by: Stephen Fuld - Fri, 18 Aug 2023 15:39 UTC

On 8/15/2023 11:48 AM, Lewis Cole wrote:
> On Tuesday, August 15, 2023 at 12:06:53 AM UTC-7, Stephen Fuld wrote:
> <snip>
>>>> Yeah. Only using half the cache at any one time would seem to decrease
>>>> performance. :-)
>>>
>>> Of course, the smiley face indicates
>>> that you are being facetious.

No. I wasn't. See below.

>>> But just on the off chance that
>>> someone wandering through the group
>>> might take you seriously, let me
>>> point out that re-purposing half of
>>> a cache DOES NOT necessarily reduce
>>> performance, and may in fact increase
>>> it if the way that the "missing" half
>>> is used somehow manages to increase
>>> the overall hit rate ...

Splitting a size X cache into two size X/2 caches will almost certainly
*reduce* hit rate. Think of it this way. The highest hit rate is
obtained when the number of most likely to be used blocks are exactly
evenly split between the two caches. That would make the contents of
the two half sized caches exactly the same as those of the full sized
cache. Conversely, if one of the caches has a different (which means
lesser used) block, then its hit rate would be lower. There is no way
that splitting the caches would lead to a higher hit rate. But hit rate
isn't the only thing that determines cache/system performance.

>>> such as
>>> reducing a unified cache that's used
>>> to store both code and data with a
>>> separate i-cache for holding
>>> instructions and a separate d-cache
>>> for holding data which is _de rigueur_
>>> on processor caches these days.

Separating I and D caches has other advantages. Specifically, since
they have separate (duplicated) hardware logic both for addressing and
the actual data storage, the two caches can be accessed simultaneously,
which improves performance, as the instruction fetch part of a modern
CPU is totally asynchronous with the operand fetch/store part, and they
can be overlapped. This ability, to do an instruction fetch from cache
simultaneously with handling a load/store, is enough to overcome the
lower hit rate. Note that this advantage doesn't apply to a
user/supervisor separation, as the CPU is in one mode or the other, not
both simultaneously.

>>>
>>> I think it should be clear from the
>>> multiple layers of cache these days,
>>> each layer being slower but larger
>>> than the one above it, that the
>>> further you go down (towards memory),
>>> the more a given cache is supposed to
>>> cache instructions/data that is "high
>>> use", but not so much as what's in
>>> the cache above it.

True for an exclusive cache, but not for an inclusive one.

>>> And even since the beginning of time
>>> (well ... since real live multi-tasking
>>> OS appeared), it has been obvious that
>>> processors tend to spend most of their
>>> time in supervisor mode (OS) code
>>> rather than in user (program) code.
>>
>> I don't want to get into an argument about caching with you, [...]
>
> I'm not sure what sort of argument you think I'm trying get into WRT caching, but I assume that we both are familiar enough with it so that there's really no argument to be had so your comment makes no sense to me.
>
>> [...] but I am sure that the percentage of time spent in supervisor mode is very
>> workload dependent.
>
> Agreed.
> But to the extent that the results of the SS keyin were ... useful ... in The Good Old Days at Roseville, I recall seeing something in excess of 80 percent of the time spent in the Exec on a regular basis.

It took me a while to respond to this, as I had a memory, but had to
find the manual to check. You might have had some non-released code
running in Roseville, but the standard SS keyin doesn't show what
percentage of time is spent in Exec. To me, and as supported by the
evidence Scott gave, 80% seems way too high.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<vMNDM.458005$qnnb.405087@fx11.iad>

https://www.novabbs.com/computers/article-flat.php?id=403&group=comp.sys.unisys#403

Newsgroups: comp.sys.unisys
 by: Scott Lurndal - Fri, 18 Aug 2023 17:21 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>On 8/15/2023 11:48 AM, Lewis Cole wrote:
>> On Tuesday, August 15, 2023 at 12:06:53 AM UTC-7, Stephen Fuld wrote:

>> Agreed.
>> But to the extent that the results of the SS keyin were ... useful ... in The Good Old Days at Roseville, I recall seeing something in excess of 80 percent of the time spent in the Exec on a regular basis.
>
>It took me a while to respond to this, as I had a memory, but had to
>find the manual to check. You might have had some non-released code
>running in Roseville, but the standard SS keyin doesn't show what
>percentage of time is spent in Exec. To me, and as supported by the
>evidence Scott gave, 80% seems way too high.

To be fair, one must consider functional differences in operating systems.

Back in the day, for example, the operating system was responsible for
record management, while in *nix code that is all delegated to user mode
code. So for applications that heavily used files (and in the olden days
the lack of memory was compensated for by using temporary files on mass
storage devices) there would likely be far more time spent in supervisor
code than in *nix/windows today. In the Burroughs MCP, for example,
a substantial portion of DMSII is part of the OS rather than purely user-mode
code, as it would be with e.g. Oracle on *nix.

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<uboahc$am7b$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=404&group=comp.sys.unisys#404

Newsgroups: comp.sys.unisys
 by: Stephen Fuld - Fri, 18 Aug 2023 17:42 UTC

On 8/18/2023 10:21 AM, Scott Lurndal wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>> On 8/15/2023 11:48 AM, Lewis Cole wrote:
>>> On Tuesday, August 15, 2023 at 12:06:53 AM UTC-7, Stephen Fuld wrote:
>
>>> Agreed.
>>> But to the extent that the results of the SS keyin were ... useful ... in The Good Old Days at Roseville, I recall seeing something in excess of 80 percent of the time spent in the Exec on a regular basis.
>>
>> It took me a while to respond to this, as I had a memory, but had to
>> find the manual to check. You might have had some non-released code
>> running in Roseville, but the standard SS keyin doesn't show what
>> percentage of time is spent in Exec. To me, and as supported by the
>> evidence Scott gave, 80% seems way too high.
>
> To be fair, one must consider functional differences in operating systems.

Sure, but Lewis was referring to the 1100 Exec.

> Back in the day, for example, the operating system was responsible for
> record management, while in *nix code that is all delegated to user mode
> code.

For the 1100 OS, it was never responsible for "record management". The
OS only knew about "blocks" of data. All blocking/deblocking/record
management was done in user libraries, or for the database system, by
the database system code, which ran in user mode.

> So for applications that heavily used files (and in the olden days
> the lack of memory was compensated for by using temporary files on mass
> storage devices) there would likely be far more time spent in supervisor
> code than in *nix/windows today.

While I agree about using more temporary files, etc., for the 1100 OS,
only the block I/O was done in supervisor mode. So the effect of more
files was less than for an OS that did record management in supervisor mode.

> In the Burroughs MCP, for example,
> a substantial portion of DMSII is part of the OS rather than purely user-mode
> code, as it would be with e.g. Oracle on *nix.

I certainly defer to your knowledge of MCP systems; for the 1100 OS, its
database systems, in those days, primarily DMS 1100, ran in user mode.
They only used supervisor mode (called Exec mode) for block/database page
I/O.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<uc30v8$2ga95$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=405&group=comp.sys.unisys#405

Newsgroups: comp.sys.unisys
 by: Stephen Fuld - Tue, 22 Aug 2023 19:06 UTC

On 8/15/2023 9:58 AM, Scott Lurndal wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>> On 8/15/2023 7:07 AM, Scott Lurndal wrote:
>
>>>
>>> Most modern server processors (intel, arm64) offer programmable cache partitioning
>>> mechanisms that allow the OS to designate that a schedulable entity belongs to a
>>> specific partition, and provides controls to designate portions of the cache are
>>> reserved to those schedulable entities (threads, processes).
>>
>> I wasn't aware of this. I will have to do some research. :-)
>
> For the ARM64 version, look for Memory System Resource Partitioning and Monitoring
> (MPAM).
>
> https://developer.arm.com/documentation/ddi0598/latest/
>
> Note that this only controls "allocation", not "access" - i.e. any application
> can hit a line in any partition, but new lines are only allocated in partitions
> associated with the entity that caused the fill to occur.
>
> Resources include cache allocation and memory bandwidth.

First, thanks for the link. I looked at it a little (I am not an ARM
programmer). I can appreciate its utility, particularly in something
like a cloud server environment where it is useful to prevent one
application, either inadvertently or on purpose, from "overpowering"
(i.e. monopolizing resources) another, so you can meet SLAs etc. This
is well explained in the "Overview" section of the manual.

However, I don't see its value in the situation that Lewis is talking
about, supervisor vs users. A user can't "overpower" the OS, as the OS
could simply not give it much CPU time. And if you can't rely on the OS not
to overpower or steal otherwise needed resources from the user programs,
then I think you have worse problems. :-(

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<u%8FM.178313$f7Ub.91998@fx47.iad>

https://www.novabbs.com/computers/article-flat.php?id=406&group=comp.sys.unisys#406

Newsgroups: comp.sys.unisys
 by: Scott Lurndal - Tue, 22 Aug 2023 20:36 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>On 8/15/2023 9:58 AM, Scott Lurndal wrote:
>> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>>> On 8/15/2023 7:07 AM, Scott Lurndal wrote:
>>
>>>>
>>>> Most modern server processors (intel, arm64) offer programmable cache partitioning
>>>> mechanisms that allow the OS to designate that a schedulable entity belongs to a
>>>> specific partition, and provides controls to designate portions of the cache are
>>>> reserved to those schedulable entities (threads, processes).
>>>
>>> I wasn't aware of this. I will have to do some research. :-)
>>
>> For the ARM64 version, look for Memory System Resource Partitioning and Monitoring
>> (MPAM).
>>
>> https://developer.arm.com/documentation/ddi0598/latest/
>>
>> Note that this only controls "allocation", not "access" - i.e. any application
>> can hit a line in any partition, but new lines are only allocated in partitions
>> associated with the entity that caused the fill to occur.
>>
>> Resources include cache allocation and memory bandwidth.
>
>First, thanks for the link. I looked at it a little (I am not an ARM
>programmer). I can appreciate its utility, particularly in something
>like a cloud server environment where it is useful to prevent one
>application either inadvertently or on purpose, from "overpowering"
>(i.e. monopolizing resources) another, so you can meet SLAs etc. This
>is well explained in the "Overview" section of the manual.
>
>However, I don't see its value in the situation that Lewis is talking
>about, supervisor vs users. A user can't "overpower" the OS, as the OS
>could simply not give it much CPU time. And if you can't rely on the OS not
>to overpower or steal otherwise needed resources from the user programs,
>then I think you have worse problems. :-(
>

It doesn't have any value in Lewis' situation. Modern operating systems
don't spend significant time in kernel mode, by design. Cache partitioning
by execution state (user, kernel/supervisor, hypervisor, firmware (e.g SMM or EL3))
will reduce overall performance in that type of environment.

Whether it would matter for e.g. the 2200 emulators currently shipping
(or even a CMOS 2200 were they still being designed), might be a different
matter.

In modern processors, where the caches are physically tagged, data shared
by the kernel and user mode code (e.g. when communicating between them)
will occupy a single cache line (or set thereof) - if the cache were
partitioned between them, you'll either have the same line cached in
both places (if unmodified) or the supervisor partition would need to invalidate
the usermode partition line if the supervisor modifies the line.

I can't see that type of partitioning ever having a positive effect
in the current generation of architectures.

Adding additional ways to each set helps as well.

For high-performance networking, much of the kernel functionality has
been moved to user mode anyway. See:

https://www.dpdk.org/
https://opendataplane.org/

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<uc53gu$2uqis$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=407&group=comp.sys.unisys#407

Newsgroups: comp.sys.unisys
 by: Don Vito Martinelli - Wed, 23 Aug 2023 14:02 UTC

Stephen Fuld wrote:
> On 8/15/2023 9:58 AM, Scott Lurndal wrote:
>> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>>> On 8/15/2023 7:07 AM, Scott Lurndal wrote:
>>
>>>>
>>>> Most modern server processors (intel, arm64) offer programmable
>>>> cache partitioning
>>>> mechanisms that allow the OS to designate that a schedulable entity
>>>> belongs to a
>>>> specific partition, and provides controls to designate portions of
>>>> the cache are
>>>> reserved to those schedulable entities (threads, processes).
>>>
>>> I wasn't aware of this.  I will have to do some research.  :-)
>>
>> For the ARM64 version, look for Memory System Resource Partitioning
>> and Monitoring
>> (MPAM).
>>
>> https://developer.arm.com/documentation/ddi0598/latest/
>>
>> Note that this only controls "allocation", not "access" - i.e. any
>> application
>> can hit a line in any partition, but new lines are only allocated in
>> partitions
>> associated with the entity that caused the fill to occur.
>>
>> Resources include cache allocation and memory bandwidth.
>
> First, thanks for the link.  I looked at it a little (I am not an ARM
> programmer).  I can appreciate its utility, particularly in something
> like a cloud server environment where it is useful to prevent one
> application either inadvertently or on purpose, from "overpowering"
> (i.e. monopolizing resources) another, so you can meet SLAs etc.  This
> is well explained in the "Overview" section of the manual.
>
> However, I don't see its value in the situation that Lewis is talking
> about, supervisor vs users.  A user can't "overpower" the OS, as the OS
> could simply not give it much CPU time.  And if can't rely on the OS not
> to overpower or steal otherwise needed resources from the user programs,
> then I think you have worse problems.  :-(
>
>

I ran across a problem in the late 90's where terminating DDP-PPC or
DDN/TAS (can't remember which) on a single processor machine could cause
the processor to go to 100% Realtime. The only way to stop it was to
stop the partition. Unsurprisingly, there was a PCR for this problem
and I applied it smartish.
In 2017/18 we had a somewhat similar problem where a machine with a
performance key limiting it to 5% (a guess) would be permitted to run at
100% for a while (some TIP problem) before the performance key struck
back and restricted it to well under 1% until the "mips loan" had been
"repaid". This took maybe 30 minutes, maybe longer.

Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<646745b0-5c25-44b9-a9a2-06b25fa9c240n@googlegroups.com>

https://www.novabbs.com/computers/article-flat.php?id=423&group=comp.sys.unisys#423

Newsgroups: comp.sys.unisys
 by: Lewis Cole - Mon, 9 Oct 2023 02:27 UTC

I'm up to my ass in alligators IRL, and so I don't have (and likely won't have for some time) a lot of time to read and respond to posts here.
So I'm going to respond to only Mr. Fuld's posts rather than any others.
And since I am up to my ass in alligators, I'm going to break up my response to Mr. Fuld's post into two parts so that I can get SOMETHING out Real Soon Now.
So here is the first part:

> On 8/15/2023 11:48 AM, Lewis Cole wrote:
>> On Tuesday, August 15, 2023 at 12:06:53 AM UTC-7, Stephen Fuld wrote:
>> <snip>
>>>>> Yeah. Only using half the cache at any one time would seem to decrease
>>>>> performance. :-)
>>>>
>>>> Of course, the smiley face indicates
>>>> that you are being facetious.
>
>No. I wasn't. See below.

I thought you didn't want to argue about caching? ;-)
Well, hopefully we both agree on pretty much everything, and any disagreement is likely due to us not being on the same page WRT our working assumptions, so perhaps this is A Good Thing.

However, I apologize to you for any annoyance I caused you due to my assumptions about your reply.

>>>> But just on the off chance that
>>>> someone wandering through the group
>>>> might take you seriously, let me
>>>> point out that re-purposing half of
>>>> a cache DOES NOT necessarily reduce
>>>> performance, and may in fact increase
>>>> it if the way that the "missing" half
>>>> is used somehow manages to increase
>>>> the overall hit rate ...
>
> Splitting a size X cache into two size X/2 caches will almost certainly
> *reduce* hit rate.

There are several factors that influence hit rate; one of them is cache size.
Others, such as the number of associativity ways, replacement policy, and line size, are also obvious influences.
In addition, there are other factors, such as cycle time, that can make up for a slightly reduced hit rate so that such a cache can still be competitive with a cache that has a slightly higher hit rate.

If two caches are literally identical in every way except for size, then you are *CORRECT* that the hit rate will almost certainly be lower for the smaller cache.
However, given two caches, one of which just happens to be half the size of the other, it does *NOT* follow that the smaller cache must necessarily have a lower hit rate than the other as changes to some of the other factors that affect hit rate might just make up for what was lost due to the smaller size.

> Think of it this way. The highest hit rate is
> obtained when the number of most likely to be used blocks are exactly
> evenly split between the two caches.

Ummm, no. I guess we are going to have an argument over caching after all .....

The highest hit rate is obtained when a cache manages to successfully anticipate, load up into its local storage, and then provide that which the processor needs *BEFORE* the processor actually makes a request to get it from memory. Period.
This is true regardless of whether or not we're talking about one cache or multiple caches.
From the point of view of performance, it's what makes caching work.
(Note that there may be other reasons for a cache such as to reduce bus traffic, but let's ignore these reasons for the moment.)
Whether or not, say, an I-cache happens to have the same number of likely-to-be-used blocks as the D-cache is irrelevant.
They may have the same number. They may not have the same number. I suspect, for reasons that I'll wave my arms at shortly, that they usually don't.
What matters is whether they have what's needed and can deliver it before the processor actually requests it.

Now if an I-cache is getting lots and lots of hits, then presumably it is likely filled with code loops that are being executed frequently.
The longer that the processor can continue to execute these loops, the more it will execute them at speeds that approach that which it would if main memory were as fast as the cache memory.
And the more that this happens, the more this speed offsets the much slower speed of the processor when it isn't getting constant hits.

However, running in cached loops doesn't imply much about the data that these loops are accessing.
They may be marching through long arrays of data or they may be pounding away at a small portion of data structure such as at the front of a ring buffer. It's all very much application dependent.
About the only thing that we can infer is that because the code is executing loops, there is at least one instruction (the jump to the top of a loop) which doesn't have/need a corresponding piece of data in the D-cache.
IOW, there will tend to be one fewer piece of data in the D-cache than in the I-cache.
Whether or not this translates into equal numbers of cache lines rather than data words just depends.
(And note I haven't even touched on the effect of writes to the contents of the D-cache.)

So if you think that caches will have their highest hit rate when the "number of most likely to be used blocks are exactly evenly split between the two caches", you're going to have to provide a better argument/evidence to support your reasoning before I will accept this premise, either in the form of a citation or better yet, a "typical" example.

> That would make the contents of
> the two half sized caches exactly the same as those of the full sized
> cache.

No, it wouldn't. See above.

> Conversely, if one of the caches has a different (which means
> lesser used) block, then its hit rate would be lower.

No, it wouldn't. See above.

> There is no way
> that splitting the caches would lead to a higher hit rate.

As I waved my arms at before, it is possible if more changes are made than just to its size.

For example, if a cache happens to be a direct mapped cache, then there's only one spot in the cache for a piece of data with a particular index.
If another piece of data with the same index is requested, then the old piece of data is lost/replaced with the new one.
This is basic direct mapped cache behavior 101.

OTOH, if a cache happens to be a set associative cache of any way greater than one (i.e. not direct mapped), then the new piece of data can end up in a different spot within the same set for the given index from which it can be returned if it is not lost/replaced for some other reason.
This is basic set associativity cache behavior 101.

The result is that if the processor has a direct mapped cache and just happens to make alternating accesses to two pieces of data that have the same index, the direct mapped cache will *ALWAYS* take a miss on every access (i.e. have a hit rate of 0%), while the same processor with a set associative cache of any way greater than one will *ALWAYS* take a hit (i.e. have a hit rate of 100%).
And note that nowhere in the above description is there any mention of cache size.
Cache size DOES implicitly affect the likelihood of a collision, and so "typically" you will get more collisions which will cause a direct mapped cache to perform worse than a set associative cache.
And you can theoretically (although not practically) go one step further by making a cache fully associative which will eliminate conflict misses entirely.
In short, there most certainly is "a way" that the hit rate can be higher on a smaller size cache than on a larger one, contrary to your claim.
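A toy C simulation of that 101-level behavior, as a sketch (the sizes and
the alternating two-address pattern are contrived precisely to force the
collision):

  /* Sketch: alternate between two line addresses that share an index.
   * Direct mapped: each access evicts the other line -> 0% hits.
   * 2-way set associative, same total size: both fit -> ~100% hits. */
  #include <stdio.h>

  #define LINES 64                        /* total lines in both designs */

  int main(void)
  {
      unsigned long tag_dm[LINES];        /* direct mapped tags */
      unsigned long tag_2w[LINES / 2][2]; /* 2-way: half the sets */
      int hits_dm = 0, hits_2w = 0, n = 1000;

      for (int i = 0; i < LINES; i++) tag_dm[i] = ~0UL;
      for (int i = 0; i < LINES / 2; i++) tag_2w[i][0] = tag_2w[i][1] = ~0UL;

      /* Two line addresses whose index bits are equal. */
      unsigned long addr[2] = { 0x1000, 0x1000 + LINES };

      for (int i = 0; i < n; i++) {
          unsigned long a = addr[i & 1];
          unsigned long set_dm = a % LINES, set_2w = a % (LINES / 2);
          unsigned long tg = a / LINES, tg2 = a / (LINES / 2);

          if (tag_dm[set_dm] == tg) hits_dm++; else tag_dm[set_dm] = tg;

          if (tag_2w[set_2w][0] == tg2 || tag_2w[set_2w][1] == tg2)
              hits_2w++;
          else
              tag_2w[set_2w][i & 1] = tg2;  /* trivial, non-LRU fill */
      }
      printf("direct mapped: %d/%d hits, 2-way: %d/%d hits\n",
             hits_dm, n, hits_2w, n);
      return 0;
  }

Run it and the direct mapped cache scores 0 hits out of 1000 while the
2-way cache of the same total size scores 998, which is the point:
associativity, not size, decides this access pattern.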

> But hit rate
> isn't the only thing that determines cache/system performance.

Yes. Finally, we agree on something.

>>>> such as
>>>> reducing a unified cache that's used
>>>> to store both code and data with a
>>>> separate i-cache for holding
>>>> instructions and a separate d-cache
>>>> for holding data which is _de rigueur_
>>>> on processor caches these days.
>
> Separating I and D caches has other advantages. Specifically, since
> they have separate (duplicated) hardware logic both for addressing and
> the actual data storage, the two caches can be accessed simultaneously,
> which improves performance, as the instruction fetch part of a modern
> CPU is totally asynchronous with the operand fetch/store part, and they
> can be overlapped. This ability, to do an instruction fetch from cache
> simultaneously with handling a load/store is enough to overcome the
> lower hit rate.

Having a separate I-cache and D-cache may well have other advantages besides increased hit rate.
And increased concurrency may well be one of them.
However, my point by mentioning the existence of separate I-caches and D-caches was to point out that given a sufficiently Good Reason, splitting/replacing a single cache with smaller caches may be A Good Idea.
Increased concurrency doesn't change that argument in the slightest.
Simply replace any mention of "increased hit rate" with "increased concurrency" and the result is the same.

If you want to claim that increased concurrency was the *MAIN* reason for the existence of separate I-caches and D-caches, then I await with bated breath your evidence and/or better argument to show this was the case.
And if you're wondering why I'm not presenting -- and am not going to present -- any evidence or argument to support my claim that it was due to increased hit rate, that's because we both seem to agree on the basic premise I mentioned before, namely, given a sufficiently Good Reason, splitting/replacing a single cache with smaller caches may be A Good Idea.
Any argument that you present strengthens that premise without the need for me to do anything.


Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<e8862a35-38bb-4f36-a512-f44b552c2e65n@googlegroups.com>

https://www.novabbs.com/computers/article-flat.php?id=424&group=comp.sys.unisys#424

Newsgroups: comp.sys.unisys
From: l_c...@juno.com (Lewis Cole)
 by: Lewis Cole - Mon, 9 Oct 2023 02:30 UTC

So here's the second part of my reply to Mr. Fuld's last response to me.
Considering how quickly this reply has grown, I may end up breaking it up into a third part as well.

On 8/15/2023 11:48 AM, Lewis Cole wrote:
>> On Tuesday, August 15, 2023 at 12:06:53AM UTC-7, Stephen Fuld wrote:
>> <snip>
>>>> And even since the beginning of time
>>>> (well ... since real live multi-tasking
>>>> OS appeared), it has been obvious that
>>>> processors tend to spend most of their
>>>> time in supervisor mode (OS) code
>>>> rather than in user (program) code.
>>>
>>> I don't want to get into an argument about caching with you, [...]
>>
>> I'm not sure what sort of argument you
>> think I'm trying get into WRT caching,
>> but I assume that we both are familiar
>> enough with it so that there's really
>> no argument to be had so your
>> comment makes no sense to me.
>>
>>> [...] but I am sure that the percentage of time spent in supervisor mode is very
>>> workload dependent.
>>
>> Agreed.
>> But to the extent that the results of
>> the SS keyin were ... useful .. in The
>> Good Old Days at Roseville, I recall
>> seeing something in excess of 80+
>> percent of the time was spent in the
>> Exec on regular basis.
>
> It took me a while to respond to this, as I had a memory, but had to
> find the manual to check. You might have had some non-released code
> running in Roseville, but the standard SS keyin doesn't show what
> percentage of time is spent in Exec. To me, and as supported by the
> evidence Scott gave, 80% seems way too high.

So let me get this straight: You don't believe the 80% figure I cite because it seems too high to you and because it didn't come from a "standard" SS keyin of the time.
Meanwhile, you believe the figure cited by Mr. Lurndal because it seems more believable, even though it comes from a system that's almost certainly running a different workload than the one I'm referring to, which was from decades ago.
Did I get this right?
Seriously?

What happened to the bit where *YOU* were saying that the amount of time spent in an OS is probably workload dependent?
And since when does the credibility of local code written in Roseville (by people who were likely responsible for the care and feeding of the Exec that the local code was being written for) somehow become suspect just because said code didn't make it into a release ... whether its output is consistent with what you believe or not?

FWIW, I stand by the statement about seeing CPU utilization in excess of 80+% on a regular basis because that is what I recall seeing.
You can choose to believe me or not.
(And I would like to point out that I don't appreciate being called a liar no matter how politely it is done.)

I cannot provide direct evidence to support my statement.
I don't have any console listings or demand terminal session listings where I entered an "@@cons ss", for example.
However, I can point to an old (~1981) video that clearly suggests that the 20% figure cited by Mr. Lurndal almost certainly doesn't apply to the Exec at least in some environments from way back when.
And I can wave my arms at why it is most certainly possible for a much higher figure to show up, at least theoretically, even today.

So the video I want to draw your attention to is entitled, "19th Annual Sperry Univac Spring Technical Symposium - 'Proposed Memory Management Techniques for Sperry Univac 1100 Series Systems'", and can be found here:

< https://digital.hagley.org/VID_1985261_B110_ID05?solr_nav%5Bid%5D=88d187d912cfce1a5ad1&solr_nav%5Bpage%5D=0&solr_nav%5Boffset%5D=2 >

Around timestamp [4:43] in the video, S.J. Trivedi, one of the gurus of the DA at the time, notes that the Dynamic Allocator (DA) element of the Exec had been observed to take up 33% of the total CPU time on a 3x (pronounced "three by", for those unfamiliar with Univac/Sperry Univac/Sperry notation who are still reading along) 1100/80A system with 3MW of memory.
He notes that "33%" usage effectively means that a customer who buys a 3x 1100/80A is buying a 2x 1100/80A to do useful work, with the third CPU being used to take care of memory management.
He adds (presumably jokingly) that because the DA is non-reentrant, another CPU can't be added to help out the one doing the DA work, but that if it had been re-entrant, then maybe one and a half CPUs (meaning 1.5 CPUs out of 3 CPUs, or 50% of the available computing power in the system) to up to two CPUs (meaning 2 CPUs out of 3 CPUs, or 66% of the available computing power in the system) could be "gobbled up by the DA".
(FWIW, I *THINK* that the changes he was referring to became known as the "Performance Enhancement Package" (PEP), which was integrated some time after I arrived at Roseville and which I *HOPE* significantly reduced the average amount of time chewed up by memory management, but by how much, I have no idea.)

Now 33% isn't 80+% (nor is a hypothetical 50% or 66% for that matter), but it also isn't something that's around or under 20%.
And if you make the entirely reasonable assumption that there was/is more to the time spent in the Exec than just memory allocation, then may I not so humbly suggest that something north of 20% of total available CPU time might well be spent in the Exec.

Just for giggles, though, let's say that that was then and this is now, meaning that the amount of time spent in the Exec is (and has been for some time) roughly the same as the figure that Mr. Lurndal cited ... so what?
Mr. Lurndal apparently wants to argue that since the *AVERAGE* amount of time that some systems (presumably those whose OSs' names end in the letters "ix") spend in the supervisor is "only" 20%, that means that it isn't worth having a separate supervisor cache.
After all, his reasoning goes, if the entire time spent in the supervisor were eliminated, that would mean an increase of only 20% more time to user programs.

Just on principle, this is a silly thing to say.

It obviously and incorrectly equates time with useful work, and then compounds that by treating time spent in user code as important while treating time spent in supervisor mode as waste.
It shouldn't take much to realize that this is nonsense.
Imagine a user program making a request to the OS to send a message somewhere; if the message can't be delivered for some reason (e.g. an error or some programmatic limit being exceeded), the OS should return a bit more quickly than if it could send the message.
So the user program should get a bit more time and the OS should get a bit less time.
But no one in their right mind should automatically presume the user program should be able to do something more "useful" with the extra time it has.
And even if the user program does come up with something "useful" to do, whatever it does, it isn't able to do what it originally wanted to do which was apparently worth doing or else it wouldn't have bothered at all.
Meanwhile, if the OS could send the message, and in fact could do so more quickly than usual, the time spent in the OS might well go down, at least initially.
But that may be temporary, as the time could go back up because the user program tries to send even more messages, resulting in more time being spent in the OS.
And note that time spent in the OS going up could mean that more "useful" work is getting done.

And if you need some evidence that small amounts of time spent in an OS can have a disproportionate impact on what a user is able to accomplish, I will point out that, as a percentage, the amount of time spent in an interrupt handler is generally very small compared to the time spent in the rest of an OS.

But enough of principles, let's talk about the 20% figure Mr. Lurndal cites.

Unless the system is running only a single user program on a single CPU, this figure is almost certainly the *AVERAGE* amount of time that the supervisor flag is set for the user programs that just happen to be running on the system during some particular time (i.e. it's load dependent).
And being an *AVERAGE*, that means that even on the system that Mr. Lurndal was referencing, the amount of time spent in the supervisor versus the amount of time spent in user program(s) could well be much lower or higher for shorter periods of time, which end up being averaged out in the wash.
So the amount of supervisor time that is in some sense "available" to improve (or decrease) system performance can well be more than the 20% average figure, even in this most simplistic analysis, especially if something out of the ordinary happens to happen.
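
Just to illustrate that arithmetic (the numbers below are made up for the example, not measurements):

    # Ten one-minute samples of supervisor CPU time, in percent:
    samples = [5, 5, 5, 5, 5, 5, 5, 5, 80, 80]    # eight quiet minutes, two busy ones
    print(sum(samples) / len(samples))            # 20.0 -- the same "20% average"

A 20% average is perfectly compatible with stretches where the supervisor is eating 80% of the machine.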

So let's talk about the 20% figure and workloads just for giggles to see if something out of the ordinary can just happen to happen.

Once Upon a Time, electronic digital computers used to do a lot of number crunching.
Back in these Good Old Days, user program requests were presumably few and far between (since the programs called upon the supervisor for services pretty much without regard to what was happening in the outside world).
And when a user program was in control, it had "useful work" to do.
Over time, such programs started running parts in parallel on different CPUs, and Amdahl wrote his famous paper about how the small, non-parallel part of a program limits the overall speedup no matter how much the parallel portion is sped up (the one-line formula below spells this out).
Back then, though, performance was measured in terms of CPU utilization and throughput and maybe turnaround time.
This sort of computing environment is still in existence today, and as best as I can tell, it is the sort of environment that Mr. Lurndal works in and so is familiar with.
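
For reference, Amdahl's observation reduces to a one-line formula; the figures below are purely illustrative:

    # Amdahl's law: if a fraction p of the work parallelizes perfectly
    # across n CPUs, the overall speedup is 1 / ((1 - p) + p/n).
    def amdahl(p, n):
        return 1.0 / ((1.0 - p) + p / n)

    print(amdahl(0.95, 3))      # ~2.7x on 3 CPUs, not 3x
    print(amdahl(0.95, 10**9))  # never better than ~20x, however many CPUs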


Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<106bf172-c240-42f7-b756-7de5a299e218n@googlegroups.com>

https://www.novabbs.com/computers/article-flat.php?id=425&group=comp.sys.unisys#425

Newsgroups: comp.sys.unisys
From: l_c...@juno.com (Lewis Cole)
 by: Lewis Cole - Mon, 9 Oct 2023 02:33 UTC

So here's the third part of my reply to Mr. Fuld's last response to me.

What more can I possibly say?
Well, it seems to me that Mr. Lurndal was trying to imply that a separate supervisor cache was a stupid thing to think about *TODAY*, and in doing so, he's also implying that it's a stupid thing to think about for the future.
I mean 20% is 20% no matter what.
I think this completely ignores where OS design appears to be heading, which could well make a separate supervisor cache more desirable in the future than it would be today.
So I'd like to wave my arms at what led me to thinking about a separate supervisor cache so you can see where I think things are heading and judge for yourself whether or not such a thing might make more sense in the future than now.

So let's start with a little basic history (highly simplified of course) that I hope everyone will agree with.

Once upon a time, digital electronic computers used vacuum tubes which were bulky and power hungry.
Then along came transistors which caused a major reduction in size and power consumption along with an increase in performance (AKA switching speed).
And once transistors were packaged into ICs, performance really started taking off.
CPU performance grew at a rate that basically mirrored the number of transistors on a chip and Moore's Law said this doubled every couple of years.

But then, something happened around the start of the 1990s.
Moore's Law still seemed to apply WRT transistor count, but CPU performance no longer tracked the number of transistors.
CPU performance continued to increase, but linearly rather than exponentially.
Cranking up the clock speed and increasing the number of pipeline stages no longer helped the way it did before.

So CPU designers went to multiple CPUs per chip which all accessed a common memory and the shared memory paradigm that we've all come to know and love became king.
There's only one small problem ... the caches that are needed to make shared memory work are having trouble keeping up with the growth in CPU cores, due to the need to keep the multiple caches "coherent" (i.e. looking the same) across the system.
Apparently there have been, for some time now, toy tests in which coherence overhead has caused caches to run slower than the memory they sit in front of.

Now Once Upon a Time, message passing and shared memory were considered logically equivalent approaches where the latter just happened to be faster.
But over time, it now appears that while shared memory is faster for a while, it doesn't scale as well as message passing, and so if you want to build really large systems with lots of CPUs, message passing is the way to go.
And before some *ix fanboy points out that there are systems that have thousands of CPUs, I'm talking about systems with tens of thousands of CPUs and beyond, which is territory that no *ix system has yet been able to reach.

So about 10 years ago, the boys and girls at ETH Zurich along with the boys and girls at Microsoft decided to try to come up with an OS based on a new model which became known as a "multi-kernel".
The new OS they created, called Barrelfish, treated all CPUs as if they were networked even if they were on the same chip, sharing the same common memory.
Barrelfish used explicit message passing to do everything including keeping replicated copies of OS data up-to-date (coherent) rather than relying on the underlying cache hardware to always implicitly get there from here.
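
As a toy illustration of the multi-kernel idea (my sketch, in Python, and nothing like Barrelfish's actual implementation), per-core replicas of an OS table can be kept consistent purely by messages:

    import queue, threading

    NCORES = 4
    mailboxes = [queue.Queue() for _ in range(NCORES)]
    replicas = [{} for _ in range(NCORES)]     # per-core replica of an OS table

    def kernel(core_id):
        # Each "core" touches only its own replica, applying updates
        # that arrive as explicit messages in its mailbox.
        while True:
            msg = mailboxes[core_id].get()
            if msg is None:                    # shutdown sentinel
                return
            key, value = msg
            replicas[core_id][key] = value

    threads = [threading.Thread(target=kernel, args=(i,)) for i in range(NCORES)]
    for t in threads:
        t.start()

    # "Core 0" updates an entry by broadcasting it to every core, itself
    # included -- no lock, no shared mutable structure.
    for mbox in mailboxes:
        mbox.put(("pid 42 state", "runnable"))
    for mbox in mailboxes:
        mbox.put(None)
    for t in threads:
        t.join()
    print(replicas)                            # all four replicas agree

No core ever reads or writes another core's replica directly; agreement comes from everyone applying the same stream of messages, which is the property Barrelfish leans on instead of hardware cache coherence.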
A good description of Barrelfish can be found in these videos from Microsoft Research:

< https://www.youtube.com/watch?v=gnd9LPWv1U8 >
< https://www.youtube.com/watch?v=iw1SwGNBWMU >

The first video goes over the scalability problems with current OSs starting around [19:10] and then waves its arms at how message passing is faster/more scalable for systems with more than 8 cores and messing with more than 8 cache lines around [50:43].
The second video goes over how/why Barrelfish is the way it is, waving its arms at things like message passing costs.

If you want, you can find an MS thesis paper that showed Barrelfish was slower than a monolithic kernel as well as follow-up papers that suggest that the slowness was due to some sort of locking that has since been fixed.
Generally, though, it appears that Barrelfish demonstrated what was hoped for and so even though it stopped being an actively worked on project some years ago, the ideas behind it have trickled into newer OS design thinking.
And of course this included folks who were looking into how to run OSs whose names end in "ix" on top of Barrelfish.

Of course, the Barrelfish folks worked a lot on getting message passing to be relatively fast.
Nevertheless, I thought it might be amusing to look into whether or not there were any user programs that relied on message passing and what they did to make things work as fast as possible.
(Note that I am *NOT* talking about how such software communicates with the host OS ... I'm interested only in how the user software itself worked.)
That's when I stumbled into what are (misleadingly) called "High Frequency Trading" (HFT) systems or (more accurately) "Low Latency Trading" systems, used to trade stocks and things on stock markets.
Such systems need to be able to send messages (to effect trades) Right Away (10 MICRO-seconds or less), to do so consistently (within 10 microseconds virtually all the time), and to do so with high reliability, since we're talking about billions of dollars being at stake.

There were several things that popped out at me right away.
One was that in one of the videos about such systems, the presenter said that everything is working against such systems including the networks and computers and the OSs they use because of their fixation with "fairness" and "throughput".
Another (perhaps from the same video) was that the amount of time spent in the critical "hot paths" was very small, and because most of the time was spent in other code that basically performed "administrative" work, the caches were basically always flushed of anything "hot path" related by the time the hot paths needed to be executed.

For some reason, the words "interrupt handler" came to mind when this latter point came up.
I recalled a story about how a system -- I believe an 1100/80A -- ran faster when its "bouncing ball" broke.
For those unfamiliar with 1100 Series interrupt handling, all external interrupts are broadcast to all online processors in a partition.
The processor that takes the interrupt is the one that has the "bouncing ball" (AKA the Broadcast Interrupt Eligibility flag) set (if it also happens to have its deferrable-interrupts-enabled flag set).
Once a processor takes an external interrupt, the "bouncing ball" is (hopefully) passed to another processor although if there isn't any other processor that can accept the "bouncing ball", it may well stay with the processor that last took an external interrupt.
According to the story I referred to before, the "bouncing ball" mechanism broke somehow, and so the bouncing ball remained with one processor even though there were other eligible processors lying around.
The result, according to the story, is that system performance increased, although the reason why wasn't clear.
Rumor had it that things sped up because the interrupt handler code remained in cache, or because the user or OS code and data didn't get flushed because of the interrupt handler hitting the fan ... or perhaps none of the above.
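
For anyone who'd rather see the mechanism than parse my description of it, here's a toy model (mine, in Python; the field names and the pass-to-the-first-eligible-CPU policy are simplifications, not Unisys's actual logic):

    class Cpu:
        def __init__(self, n):
            self.n = n
            self.ball = False       # Broadcast Interrupt Eligibility ("bouncing ball")
            self.deferrable = True  # deferrable interrupts enabled

    cpus = [Cpu(n) for n in range(4)]
    cpus[0].ball = True             # the ball starts somewhere

    def external_interrupt(cpus):
        # The interrupt is broadcast to all online CPUs; the one holding
        # the ball (with deferrable interrupts enabled) takes it ...
        taker = next(c for c in cpus if c.ball and c.deferrable)
        # ... and then tries to pass the ball to another eligible CPU,
        # keeping it only if no one else can accept it.  The "broken"
        # case in the story behaves as if this pass always failed.
        others = [c for c in cpus if c is not taker and c.deferrable]
        if others:
            taker.ball = False
            others[0].ball = True
        return taker.n

    print([external_interrupt(cpus) for _ in range(6)])   # e.g. [0, 1, 0, 1, 0, 1]

With the pass disabled, every external interrupt lands on the same CPU, which is the situation in the story, and which -- if the rumor is right -- kept the interrupt handler's code and data warm in that one CPU's caches.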

What I do know is that from my machinations with looking into simulating an 1100/2200 system, the "bouncing ball" logic seemed to unnecessarily complicate the simulator code.
I was tempted to skip it until I heard that the Company's emulator supposedly emulated the "bouncing ball" behavior.
But more recently, I've noticed in the model dependencies section of the processor programmer manual that the "bouncing ball" is now model dependent for some reason.
It could be for performance, or it could be because it just isn't worth it in terms of complexity now that things are emulated.

This is what got me to wondering if a separate supervisor cache could be useful to boost system performance by tweaking such a cache to retain interrupt handler code for a "long" time.
But then, it occurred to me that because the L3 cache was so large, it might be useful to dedicate some or all of it to holding supervisor code.
One would assume that in the case of something like seL4, this would be A Good Idea, as the "supervisor" (AKA OS kernel) consists only of the message passing engine and interrupt handlers, with everything else moved to user space.
In the case of Barrelfish, although OS data is replicated, the L3 cache is apparently being used to speed up passing messages between other cores on the same chip and so it seemed like a separate supervisor D-cache might be A Good Idea.
And since OS kernel code tends not to change (meaning maintaining coherence isn't much of a concern), then having a separate supervisor (I-cache) might be A Good Idea as well.
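
To show what I mean (a toy sketch of mine, not any shipping design), imagine a 4-way cache set in which two ways are reserved for supervisor lines, so user traffic can never evict the interrupt handler:

    class PartitionedSet:
        def __init__(self, user_ways=2, sup_ways=2):
            # Each partition is an LRU list plus its way budget.
            self.parts = {"user": ([], user_ways), "sup": ([], sup_ways)}

        def access(self, tag, mode):           # mode is "user" or "sup"
            lines, ways = self.parts[mode]
            hit = tag in lines
            if hit:
                lines.remove(tag)              # refresh LRU position
            elif len(lines) == ways:
                lines.pop(0)                   # evict LRU within this partition only
            lines.append(tag)
            return hit

    s = PartitionedSet()
    s.access("intr_handler", "sup")            # supervisor line brought in once
    for t in range(1000):                      # then a flood of user traffic ...
        s.access(t, "user")
    print(s.access("intr_handler", "sup"))     # True -- still resident

Real hardware already offers something in this spirit (e.g. way-partitioning of shared L3 caches), so the idea isn't pure fantasy, though pinning supervisor lines specifically is my extrapolation.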

Obviously, the best way to investigate this is by way of controlled experiments, say by instrumenting a copy of Bochs and seeing where/how time is spent.
But this was too high of a learning curve for me, and so I thought I'd look for other evidence that might show that a separate supervisor cache (or two) might be useful ... or not.
That led me to posting my question ... which has turned out to be a waste of time.
At the moment, I have better things to do than to look into this matter any further, especially while the weather is good, so I think I'm done with this thread.
Feel free to comment as you like without concern for me responding as I probably won't even be reading here for awhile.


Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

<e7WUM.19265$sqIa.15836@fx07.iad>

https://www.novabbs.com/computers/article-flat.php?id=426&group=comp.sys.unisys#426

Newsgroups: comp.sys.unisys
From: sco...@slp53.sl.home (Scott Lurndal)
 by: Scott Lurndal - Mon, 9 Oct 2023 16:46 UTC

Lewis Cole <l_cole@juno.com> writes:
>I'm up to my ass in alligators IRL and so I haven't (and likely won't for some time) have a lot of time to read and respond to posts here.
>So I'm going to respond to only Mr. Fuld's posts rather than any others.
>And since I am up to my ass in alligators, I'm going to break up my response to Mr. Fuld's post into two parts so that I can get SOMETHING out Real Soon Now.
>So here is the first part:
>
>On 8/15/2023 11:48 AM, Lewis Cole wrote:
>>> On Tuesday, August 15, 2023 at 12:06:53AM UTC-7, Stephen Fuld wrote:
>>> <snip>
>>>>>> Yeah. Only using half the cache at any one time would seem to decrease
>>>>>> performance. :-)
>>>>>
>>>>> Of course, the smiley face indicates
>>>>> that you are being facetious.
>>
>>No. I wasn't. See below.
>
>I thought you didn't want to argue about caching? ;-)
>Well, hopefully we both agree on pretty much everything and any disagreement is likely due to us not being on the same page WRT our working assumptions, so perhaps this is A Good Thing.
>
>However, I apologize to you for any annoyance I caused you due to my assumptions about your reply.
>
>>>>> But just on the off chance that
>>>>> someone wandering through the group
>>>>> might take you seriously, let me
>>>>> point out that re-purposing half of
>>>>> a cache DOES NOT necessarily reduce
>>>>> performance, and may in fact increase
>>>>> it if the way that the "missing" half
>>>>> is used somehow manages to increase
>>>>> the overall hit rate ...
>>
>> Splitting a size X cache into two sized x/2 caches will almost certainly
>> *reduce* hit rate.
>
>There are several factors that influence hit rate, one of them is cache size.
>Others, such as number of associativity ways, replacement policy, and line size are also obvious influences.
>In addition, there are other factors that can make up for a slightly reduced hit rate so that such a cache can still be competitive with a cache with a slightly higher hit rate, such as cycle time.
>
>If two caches are literally identical in every way except for size, then you are *CORRECT* that the hit rate will almost certainly be lower for the smaller cache.
>However, given two caches, one of which just happens to be half the size of the other, it does *NOT* follow that the smaller cache must necessarily have a lower hit rate than the other, as changes to some of the other factors that affect hit rate might just make up for what was lost due to the smaller size.
>
>> Think of it this way. The highest hit rate is
>> obtained when the number of most likely to be used blocks are exactly
>> evenly split between the two caches.
>
>Ummm, no. I guess we are going to have an argument over caching after all ....
>
>The highest hit rate is obtained when a cache manages to successfully anticipate, load up into its local storage, and then provide that which the processor needs *BEFORE* the processor actually makes a request to get it from memory. Period.
>This is true regardless of whether or not we're talking about one cache or multiple caches.
>From the point of view of performance, it's what makes caching work.
>(Note that there may be other reasons for a cache such as to reduce bus traffic, but let's ignore these reasons for the moment.)
>Whether or not, say, an I-cache happens to have the same number of likely-to-be-used blocks as the D-cache is irrelevant.
>They may have the same number. They may not have the same number. I suspect for reasons that I'll wave my arms at shortly that they usually aren't.
>What matters is whether they have what's needed and can deliver it before the processor actually requests it.
>
>Now if an I-cache is getting lots and lots of hits, then presumably it is likely filled with code loops that are being executed frequently.
>The longer that the processor can continue to execute these loops, the more it will execute them at speeds that approach that which it would if main memory were as fast as the cache memory.
>And the more that this happens, the more this speed offsets the much slower speed of the processor when it isn't getting constant hits.
>
>However, running in cached loops doesn't imply much about the data that these loops are accessing.
>They may be marching through long arrays of data or they may be pounding away at a small portion of a data structure such as at the front of a ring buffer. It's all very much application dependent.
><snip -- the rest of the quoted first part repeats, verbatim, the text of the post above>
>
>I will point out, however, that I think that increased concurrency seems like a pretty weak justification.
>Yes, separate caches might well allow for increased concurrency, but you have to find those things that are done during instruction execution that can be done in parallel.
>And if you manage to find that parallelism, then you need to not only be able to issue separate operations in parallel, you have to make sure that these parallel operations don't interfere with one another, which is to say that your caches remain "coherent" despite doing things like modifying the code stream currently being executed (i.e. self modifying code).
>Given the limited transistor budget In The Early Days, I doubt that dealing with these issues was something that designers were willing to mess with if they didn't have to.
>(The first caches tended to be direct mapped because they were the simplest and therefore the cheapest to implement while also having the fastest access times.
>Set associative caches performed better, but were more complicated and therefore more expensive as well as having slower access times and so came later.)
>ISTM that a more plausible reason other than hit rate would be to further reduce bus traffic which was one of the other big reasons that DEC (IIRC) got into using them In the Beginning.
>
>> Note that this advantage doesn't apply to a
>> user/supervisor separation, as the CPU is in one mode or the other, not
>> both simultaneously.
>
>Bullshit.
>
>Assuming that you have two separate caches that can be kept fed and otherwise operate concurrently, then All You Have To Do to make them both do something at the same time is to generate "select" signals for each so that they know that they should operate at the same time.
>Obviously, a processor knows whether or not it is in user mode or supervisor mode when it performs an instruction fetch and so it is trivially easy for a processor in either mode to generate the correct "select" signal for an instruction fetch from the correct instruction cache.
>It should be equally obvious that a processor in user mode or supervisor mode knows (or can know) when it is executing an instruction that should operate on data that is in the same mode as the instruction it is executing.
>And it should be obvious that you don't want a user mode instruction to ever be able to access supervisor mode data.
>The only case this leaves to address when it comes to the generation of a "select" signal is when a processor running in supervisor mode wants to do something with user mode code or data.
>
>But generating a "select" signal that will access instructions or data in either a user mode instruction cache or data in a user mode data cache is trivially easy as well, at least conceptually, especially if one is willing to make use of/exploit that which is common practice in OSs these days.
>In particular, since even before the Toy OSs grew up, there has been a fixation with dividing the logical address space into two parts, one part for user code and data and the other part for supervisor code and data.
>When the logical space is divided exactly in half (as was the case for much of the time for 32-bit machines), the result was that the high order bit of the address indicates (and therefore could be used as a select line for) user space versus supervisor space cache access.
>While things have changed a bit since 64-bit machines have become dominant, it is still at least conceptually possible to treat some part of the high order part of a logical address as such an indicator.
>
>"But wait ... ," you might be tempted to say, "... something like that doesn't work at all on a system like a 2200 ... the Exec has never had the same sort of placement fixation in either absolute or real space that the former Toy OSs had/have", which is true.
>But the thing is that the logical address of any accessible word in memory is NOT "U", but rather "(B,U)" (both explicitly in Extended mode and implicitly in Basic Mode) where B is the number of a base register, and each B-register contains an access lock field which in turn is made up of a "ring" and a "domain".
>Supervisor mode and user mode is all about degrees of trust which is a simplification of the more general "ring" and "domain" scheme where some collection of rings are supposedly for "supervisor" mode and the remaining collection are supposedly for "user" mode.
>Whether or not this is actually the way things are used, it is at least conceptually possible that an address (B,U) can be turned into a supervisor or user mode indicator that can be concatenated with U which can then be sent to the hardware to select a cache and then a particular word within that cache.
>So once again, we're back to being able to identify supervisor mode code/data versus user mode code/data by its address.
>(And yes, I know about the Processor Privilege [PP] flags in the designator register, and reconciling their use with the ring bits might be a problem, but at least conceptually, PP does not -- or at least need not -- matter when it comes to selecting a particular cache.)
>
>If you want to say no one in their right mind -- certainly no real live CPU designer -- would think in terms of using some part of an address as a "ring" together with an offset, I would point out to you that this is not the case: a real, live CPU designer *DID* choose to merge security modes with addressing and the result was a relatively successful computer.
>It was called the Data General Eclipse and Kidder's book, "Soul of a New Machine", mentions this being done.
>
>What I find ... "interesting" ... here, however, is that you would try to make an argument at all about the possible lack of concurrency WRT a possible supervisor cache.
>As I have indicated before, I assume that any such cache would be basically at the same level as current L3 caches and it is my understanding that for the most part, they're not doing any sort of concurrent operations today.
>It seems, therefore, that you're trying to present a strawman by suggesting a disadvantage that doesn't exist at all when compared to existing L3 caches.
>
>>>>>
>>>>> I think it should be clear from the
>>>>> multiple layers of cache these days,
>>>>> each layer being slower but larger
>>>>> than the one above it, that the
>>>>> further you go down (towards memory),
>>>>> the more a given cache is supposed to
>>>>> cache instructions/data that is "high
>>>>> use", but not so much as what's in
>>>>> the cache above it.
>>
>> True for an exclusive cache, but not for an inclusive one.
>
>I don't know what you mean by an "exclusive" cache versus an "inclusive" one.
>Please feel free to elaborate on what you mean.
>In every multi-layered cache in a real live processor chip that I'm aware of, each line in the L1 cache is also represented by a larger line in the L2 cache that contains the L1 line as a subset, and each line in the L2 cache is also represented by a larger line in the L3 that contains the L2 line as a subset.
>
>At this point, I'm going to end my response to Mr. Fuld's post here and go off and do other things before I get back to making a final reply to the remaining part of his post.

