novaBBS - comp.sys.unisys - Re: Speaking of the 1110 AKA 1100/40 and its two types of memory ...

On 8/14/2023 1:47 PM, Lewis Cole wrote:
> On Monday, August 14, 2023 at 9:14:50 AM UTC-7, Stephen Fuld wrote:
>>> On 8/13/2023 10:18 PM, Lewis Cole wrote:
>>> So just for giggles, I've been thinking
>>> more about what (if anything) can be
>>> done to improve system performance by
>>> tweaks to a CPU's cache architecture.
>>> I've always thought that having a
>>> separate cache for supervisor mode
>>> references and user mode references
>>> _SHOULD_ make things faster, but when
>>> I poked around old stuff on the
>>> Internet about caches from the
>>> beginning of time, I found that while
>>> Once Upon A Time, separate supervisor
>>> mode and user mode caches were
>>> considered something to try, they
>>> were apparently abandoned because a
>>> unified cache seemed to work better
>>> in simulations. Surprise, surprise.
>>
>> Yeah. Only using half the cache at any one time would seem to decrease
>> performance. :-)
>>
>
> Of course, the smiley face indicates that you are being facetious.
> But just on the off chance that someone wandering through the group might take you seriously, let me point out that re-purposing half of a cache DOES NOT necessarily reduce performance, and may in fact increase it if the way that the "missing" half is used somehow manages to increase the overall hit rate ... such as replacing a unified cache that's used to store both code and data with a separate i-cache for holding instructions and a separate d-cache for holding data, which is _de rigueur_ on processor caches these days.
>
> I think it should be clear from the multiple layers of cache these days, each layer being slower but larger than the one above it, that the further you go down (towards memory), the more a given cache is supposed to cache instructions/data that is "high use", but not so much as what's in the cache above it.
> And ever since the beginning of time (well ... since real live multi-tasking OSes appeared), it has been obvious that processors tend to spend most of their time in supervisor mode (OS) code rather than in user (program) code.

I don't want to get into an argument about caching with you, but I am
sure that the percentage of time spent in supervisor mode is very
workload dependent.

> From what I've read, the reason why separate supervisor and user mode caches performed worse than a unified cache was because of all the bouncing around throughout the OS that was done.
> Back in The Good Old Days, when caches were very small and essentially single layer, it is easy to imagine that a substantial fraction of any OS code/data (including that of a toy) could not fit in the one and only small cache and would not stay there for very long if it somehow managed to get there.
> But these days, caches are huge (especially the lower level ones) and it doesn't seem all that unimaginable to me that you could fit and keep a substantial portion of any OS lying around in one of the L3 caches of today ... or worse yet, in an L4 cache if a case for better performance can be made.
>
>>> This seems just so odd to me and so
>>> I've been wondering how much this
>>> result is an artifact of the toy OS
>>> that was used in the simulations
>>> (Unix) or the (by today's standards)
>>> small single layer caches used.
>>> This got me to thinking about the
>>> 1110 AKA 1100/40 which had no caches
>>> but did have two different types of
>>> memory with different access speeds.
>>> (I've always thought of the 1110
>>> AKA 1100/40 as such an ugly machine
>>> that I've always ignored it and
>>> therefore remained ignorant of it
>>> even when I worked for the Company.)
>>> To the extent that the faster (but
>>> smaller) memory could be viewed as
>>> a "cache" with a 100% hit rate, I've
>>> been wondering about how performance
>>> differed based on memory placement
>>> back then.
>>
>> According to the 1110 system description on Bitsavers, the cycle time
>> for the primary memory (implemented as plated wire) was 325ns for a read
>> and 520ns for a write, whereas the extended memory (the same core
>> modules as used for the 1106 main memory) had 1,500 ns cycle time, so a
>> substantial difference, especially for reads.
>
> Yes.
>
>> But it really wasn't a cache. While there was a way to use a
>> channel in a back-to-back configuration, to transfer memory blocks from
>> one type of memory to the other (i.e. not use BT instructions), IIRC,
>> this was rarely used.
>
> No, it wasn't a cache, which I thought I made clear in my OP.
> Nevertheless, I think one can reasonably view/think of "primary" memory as if it were a slower memory that just happened to be cached where just by some accident, the cache would always return a hit.
> Perhaps this seems weird to you, but it seems like a convenient tool to me to see if there might be any advantage to having separate supervisor mode and user mode caches.

I agree that it sounds weird to me, but if it helps you, have at it.

>
>>> Was the Exec (whatever level it
>>> might have been ... between 27
>>> and 33?) mostly or wholly loaded
>>> into the faster memory?
>>
>> IIRC, 27 was the last 1108/1106 only level. 28 was an internal
>> Roseville level to start the integration of 1110 support. Level 29
>> (again IIRC) was the second internal version, perhaps also used for
>> early beta site 1110 customers; 30 was the first 1110 version, released
>> on a limited basis primarily to 1110 customers, while 31 was the general
>> stable release.
>>
>
> Thanks for the history.
>
>>> Was there special code (I think
>>> there was) that prioritized
>>> placement of certain things in
>>> memory and if so how?
>>
>> There were options on the bank collector statements to specify prefer or
>> require either primary or extended memory. If you didn't specify, the
>> default was I-banks in primary, D-banks in extended. That made sense,
>> as all instructions required an I-bank read, but many instructions don't
>> require a D-bank reference (e.g. register to register, j=U or XU,
>> control transfer instructions), and the multiple D-bank instructions
>> (e.g. Search and BT) were rare. Also, since I-banks were almost
>> entirely reads, you took advantage of the faster read cycle time.
>>
>> Also, I suspect most programs had a larger D-bank than I-bank, and since
>> you typically had more extended than primary memory, this allowed more
>> optimal use of the expensive primary memory.
>>
>> I don't remember what parts of the Exec were where, but I suspect it was
>> the same as for user programs. Of course, the interrupt vector
>> instructions had to be in primary due to their hardware fixed addresses.
>>
>
> For me, life started with 36 level by which time *BOOT1, et al. had given way to *BTBLK, et al.
> Whatever the old bootstrap did, the new one tried to place the Exec I- and D-banks at opposite ends of memory, presumably so that concurrent accesses stood a better chance of not blocking one another due to being in physically different memory that was oftentimes interleaved.
> IIRC, whether or not this was actually useful, it didn't change until M-Series hit the fan with paging.

First of all, when I mentioned the interrupt vectors, I wasn't talking
about boot elements, but the code starting at address 0200 (128 decimal)
through 0377 on at least pre-1100/90 systems, which was a set of LMJ
instructions, one per interrupt type, that were the first instructions
executed after an interrupt. E.g., on an 1108, on an ISI External
Interrupt on CPU0 the hardware would transfer control to address 0200,
where the LMJ instruction would capture the address of the next
instruction to be executed in the interrupted program, then transfer
control to the ISI interrupt handler.
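
As a small illustration of the octal addresses above, here is a minimal Python sketch of the vector area. The constant and function names are made up for this example, and the assumption that each address in the range holds exactly one vector word goes beyond what the post states.

```python
# Octal bounds of the interrupt vector area as given in the post.
FIRST_VECTOR = 0o200  # 128 decimal
LAST_VECTOR = 0o377   # 255 decimal


def vector_addresses():
    """Return the addresses in the vector area (assumed one word each)."""
    return range(FIRST_VECTOR, LAST_VECTOR + 1)


if __name__ == "__main__":
    addrs = vector_addresses()
    print(f"first = 0{FIRST_VECTOR:o} octal = {FIRST_VECTOR} decimal")
    print(f"last  = 0{LAST_VECTOR:o} octal = {LAST_VECTOR} decimal")
    print(f"words in range = {len(addrs)}")
```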

But you did jog my memory about Exec placement. On an 1108, the Exec
I-bank was loaded starting at address 0, and extended as far as needed.
The Exec D-bank was loaded at the end of memory i.e. ending at 262K for
a fully configured memory, extending "downward" as far as needed. This
left the largest contiguous space possible for user programs, as well as
ensuring that the Exec I and D banks were in different memory banks, to
guarantee overlapped timing for I fetch and data access. I guess that
the 1110 just did the same thing, as it didn't require changing another
thing, and maximized the contiguous space available for user banks in
both primary and extended memory.
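
The placement described above can be sketched roughly as follows. This is only an illustration: the function name and the bank sizes in the example are hypothetical, and "262K" is taken to mean 262,144 (2**18) words.

```python
MEMORY_WORDS = 262_144  # "262K" in the post, assumed to mean 2**18 words


def exec_layout(ibank_words, dbank_words, memory_words=MEMORY_WORDS):
    """Place the I-bank upward from address 0 and the D-bank downward
    from the top of memory; return inclusive (start, end) ranges for
    the I-bank, the D-bank, and the contiguous free space between them."""
    if ibank_words + dbank_words > memory_words:
        raise ValueError("Exec banks do not fit in memory")
    ibank = (0, ibank_words - 1)
    dbank = (memory_words - dbank_words, memory_words - 1)
    free = (ibank_words, memory_words - dbank_words - 1)
    return ibank, dbank, free


if __name__ == "__main__":
    # Hypothetical bank sizes, chosen only to show the arithmetic.
    ibank, dbank, free = exec_layout(20_000, 30_000)
    print("Exec I-bank:", ibank)
    print("Exec D-bank:", dbank)
    print("Free for user banks:", free, "=", free[1] - free[0] + 1, "words")
```

Putting the two Exec banks at opposite ends leaves a single free region in the middle, which is the "largest contiguous space possible" point made above.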

>>> What sort of performance gains
>>> did use of the faster memory
>>> produce or conversely what sort
>>> of performance penalties occurred
>>> when it wasn't?
>>
>> As you can see from the different cycle times, the differences were
>> substantial.
>
> Yes, but do you know of anything that would suggest things were faster/slower because a lot of the OS was in primary storage most of the time ... IOW something that would support/refute the idea that separate supervisor and user mode caches might now be A Good Idea?

I think the "faster" was just the same as for the default in user
programs - instructions are accessed more often than data, so I don't
think it has any bearing on the separate cache issue. Remember, all of
the instructions (not "a lot") were in primary memory and none of the data.
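
To put rough numbers on the "instructions in primary, data in extended" default, here is a back-of-the-envelope Python sketch using the cycle times quoted earlier in the thread (325 ns primary read, 1,500 ns extended). It is a deliberately crude model: the fraction of instructions that make a D-bank reference is a made-up parameter, and overlap, interleaving, write cycles, and CPU execution time are all ignored.

```python
# Cycle times quoted in the thread from the 1110 system description.
PRIMARY_READ_NS = 325
EXTENDED_NS = 1500


def memory_ns_per_instruction(ifetch_ns, data_ns, data_ref_fraction):
    """Average memory time per instruction: one instruction fetch plus
    a data reference on the given fraction of instructions.
    Assumes no overlap between the fetch and the data access."""
    return ifetch_ns + data_ref_fraction * data_ns


if __name__ == "__main__":
    f = 0.5  # hypothetical fraction of instructions with a D-bank reference
    default = memory_ns_per_instruction(PRIMARY_READ_NS, EXTENDED_NS, f)
    all_ext = memory_ns_per_instruction(EXTENDED_NS, EXTENDED_NS, f)
    print(f"I in primary, D in extended: {default:.0f} ns")
    print(f"I and D both in extended:    {all_ext:.0f} ns")
```

Under these assumptions the gain comes entirely from the faster instruction fetch, which applies to Exec and user code alike.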

--
- Stephen Fuld
(e-mail address disguised to prevent spam)
