devel / comp.arch / Re: Could we build a better 6502?

Michael S <already5chosen@yahoo.com> writes:
>Chapter 1. is Tomasulo's S/360 Model 91/191. This chapter was brief. I am not sure how to call it, a failure or a limited success.

It beat the CDC6600, which was its primary purpose, so in that sense
it is a success. There was also the /195 (I guess you mean that, not
191), which is a reimplementation in a more modern circuit technology,
so it certainly was no failure.

However, while it is often mentioned as OoO, it's not the kind of
modern OoO that I meant (and described) in my posting: OoO execution
with in-order completion, giving us precise exceptions, while the /91
was infamous for imprecise exceptions.

>Chapter 2. are water-cooled S/390 Models 820 and 900. Replaced by simpler "CMOS" models.

That has eluded me up to now. I only read about some family members
later that were far behind in microarchitecture and per-processor MIPS
to what Intel did at the time (IIRC single-issue in-order while Intel
was 3-wide OoO). I thought at the time that for the legacy software
that they were running, more CPU performance is probably not that
important (and maybe the jobs were I/O-bound anyway). Looking at the
MIPS numbers of the 9672 machines, they were really far behind (but I
think it was the z900 (2500MIPS on 16 processors) that I found
underwhelming).

>Chapter 3 started with z196 in 2010 and is still going (zEC12, z13, z14, z15).

That's what I meant with the OoO S/360 descendants.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Thomas Koenig wrote:
> Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
>
> I have to say I like the structured approach that you are proposing
> below. It's much better than just randomly tossing ideas around :-)
>
>> True, the addressing modes were mostly orthogonal to the rest of the
>> instructions, but they required control circuitry and space in the
>> PLA. And overall the 6502 had a number of non-programmer-visible
>> registers (IIRC I counted 17 8-bit registers overall), so all this
>> stuff was not that cheap wrt reducing registers, either.
>>
>> An alternative architectural approach is to implement
>
>
>> 1) Registers that are necessary for the data path: program counter, an
>> address register, ALU output, maybe ALU inputs.
>
> Plus the ALU, and the incrementer for the PC (which makes sense
> to have independent of the ALU).

Incrementer/decrementer, and 16 bits too.

But there is a not-so-minor complication of how to do a R-M-W
sequence on a register like PC+1->PC if the registers are built
from latches and there is only one set of read/write bit lines.
And even if you can afford separate read and write bit lines,
the cells are latches and cannot be both read and written at once
(which is why one can go with the single bit line in the first place).

One solution is to double pump the register file,
read on the first 1/2 cycle, writes on the second 1/2 cycle.
But ALU/inc/dec latency could interact with register file clock speed.
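
A minimal sketch of the double-pumped idea, for illustration only. The class and method names are hypothetical, and the model ignores real timing: it only shows that the read happens in the first half cycle and the write in the second, so PC+1->PC never reads and writes a latch at the same moment.

```python
class DoublePumpedRegFile:
    """Toy model: latch-based register file, read phase then write phase."""

    def __init__(self, count):
        self.regs = [0] * count

    def cycle(self, read_idx, operation, write_idx):
        # First half cycle: read the selected register onto the bit lines.
        value = self.regs[read_idx]
        # The ALU or inc/dec result must settle before the second half
        # cycle begins; that is the latency concern mentioned above.
        result = operation(value) & 0xFFFF
        # Second half cycle: write the result back over the same bit lines.
        self.regs[write_idx] = result
        return result


PC = 0  # hypothetical register index for the program counter
rf = DoublePumpedRegFile(4)
rf.regs[PC] = 0x01FF
rf.cycle(PC, lambda v: v + 1, PC)  # PC+1 -> PC in one full cycle
```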

Inc/dec operation has a lot of overlap with the ALU ADD/SUB.
There are some savings in having a 16-bit data bus which can
move one 16-bit value or two 8-bit values to the ALU operands,
and having the 8-bit ALU also do 16-bit inc/dec.

This plus the above R-M-W on a single register file entry can imply
a 16-bit latch on the ALU output, then a write back of a register
file update in the next cycle of a separate result bus.

One also would like to read an 8-bit external data bus value directly
to the ALU as an operand without an extra cycle to save the operand
in a temp register.

These are all the same ideas I played around with doing an 8-bit redesign,
I just started with the RCA 1802 (5500 CMOS transistors).

> Another thing to design would be the load/store unit, which should
> be able to load or store a byte at 500 ns, which the RAM at the
> time could do. 1 MHz time for word-sized access sounds good.

It should be capable of doing a memory access cycle in 1 clock.
That is, address is latched into Memory Address Register (MAR)
and bus read/write cycle starts on rising clock edge.
Memory Ready line is sampled on rising edge of next clock,
and for a read if Ready the data is latched internally
and we proceed to the next state.

That allows you to take better advantage of faster SRAM as the
processor clock increases later on.
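
The handshake described above can be sketched roughly as follows. This is an assumption-laden toy, not a description of any real bus: `bus_read` and its arguments are made-up names, and each entry in `ready_samples` stands for the Memory Ready line as sampled on one rising clock edge after the cycle starts.

```python
def bus_read(address, memory, ready_samples):
    """Return (data, clocks_used) for one read cycle.

    Edge 0: the address is latched into the MAR and the bus cycle starts.
    Every later rising edge: Memory Ready is sampled; when it is
    asserted, the data is latched and the sequencer moves on.
    """
    mar = address & 0xFFFF  # Memory Address Register, latched at edge 0
    clocks = 0
    for ready in ready_samples:
        clocks += 1
        if ready:
            return memory[mar], clocks
    raise TimeoutError("memory never asserted Ready")


ram = {0x1234: 0xAB}
fast = bus_read(0x1234, ram, [True])                # RAM keeps up with the clock
slow = bus_read(0x1234, ram, [False, False, True])  # two wait states
```

With memory that is ready at the first sampled edge the access costs 1 clock; slower memory simply stretches the cycle, which is why the same core could benefit from faster SRAM later.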

>> 2) Add a few more to make it possible to have a complete instruction
>> set without needing extra cycles (which would require registers for
>> the intermediate results): e.g., flags, return address.
>
> Flags: Ack.
>
> Return address: You mean a link register for subroutine calls,
> and for interrupts?

Yes, link registers.
This saves putting a multi-byte push/pop sequence into the controller.
Push and pop can be single instructions but they only operate on one byte.

Or one can eliminate push and pop and use the LD/ST approach,
which implies it has a register displacement [reg+offset]
address mode (8 and 16 bit displacements).
Obviously useful, but it complicates the control sequencer.

>> The result would be a Nova-like architecture, but designed for an
>> 8-bit data bus (which implies significant changes).
>>
>> I think this could reduce the amount of control logic quite a lot
>> compared to the 6502, which had to have control for, e.g., saving the
>> return address on a JSR and restoring the address on a RET.
>
> Agreed. This is probably better implemented by BL / jump to
> link register or even by two instructions.

Yes - gets rid of lots of control logic.
My redesign had two 8-bit accumulators A and B like 6800.
One advantage is they can be used as a pair to hold 16-bit values.

That means that rather than having separate LD/ST instructions
for the BAL PC-LINK and interrupt PC-INT registers,
there are instructions to move to/from the pair A,B
then one uses the normal 8-bit LD/ST to save/restore.

That saves a whack of instructions for operating on
PC-LINK and PC-INT registers.

>> With the amount of control logic saved, one might be able to afford a
>> 16-bit data path, avoiding the need to have two cycles through the ALU
>> for 16-bit operations (and the corresponding control logic), or the
>> need to work around that in software.
>
> Agreed.

There are also bus cross-over points (routing muxes/pass gates) to consider.
e.g. one might want to read a 16-bit register, decrement it using inc/dec,
and route the result to the MAR in 1 cycle,
which is legal because those are separate latches.

This requires disconnecting the MAR from the operand bus,
connecting MAR to the inc/dec result bus,
putting the inc/dec result latches into transparent mode
so the result flows directly through them,
and disconnecting the result bus from the register file.

>> Once we are there and know how much of our budget we have spent on
>> that, we can think about whether we can afford more fancy stuff:
>>
>> * Just one or two additional registers, and/or auto-increment to avoid
>> the most glaring inefficiencies in the instruction set.
>
> Not sure I'm sold on the auto-increment. This would mean two
> register writes per instruction, and I am envisioning this processor
> as (potentially) a single-cycle machine, or at least a machine where
> every instruction takes the same number of cycles, for simplicity.

One must consider which internal buses are being used for what and when.
Also is it a pre-decrement and a post-increment,
and where the buses, MAR, and cross over pass gates are located,
and is it a data bus write, which also needs an operand register read cycle,
or a data bus read which just needs to latch the result when Memory Ready.

Not counting the instruction fetch cycle,
I would expect a byte push requires 2 cycles:
pre-decrement address reg-> MAR,
then read operand register to data bus and perform memory write cycle.
A byte pop requires 3 cycles:
address reg->MAR, post-increment -> result latch,
then result latch -> address reg, start memory read cycle,
then finish read cycle, read data bus -> dest reg.
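
To make the cycle counts concrete, here is a small sketch of the two sequences. Everything in it is hypothetical (the `Core` class, the register names, the initial stack address), and it assumes the pre-decremented value is also written back to the address register as part of the push.

```python
class Core:
    """Toy model of byte push (2 cycles) and byte pop (3 cycles)."""

    def __init__(self):
        self.mem = bytearray(65536)
        self.addr = 0x0200   # address register used as a stack pointer
        self.mar = 0         # Memory Address Register
        self.latch = 0       # inc/dec result latch
        self.cycles = 0

    def push(self, value):
        # Cycle 1: pre-decrement the address register, result -> MAR.
        self.addr = (self.addr - 1) & 0xFFFF
        self.mar = self.addr
        self.cycles += 1
        # Cycle 2: operand register -> data bus, memory write cycle.
        self.mem[self.mar] = value & 0xFF
        self.cycles += 1

    def pop(self):
        # Cycle 1: address reg -> MAR, post-increment -> result latch.
        self.mar = self.addr
        self.latch = (self.addr + 1) & 0xFFFF
        self.cycles += 1
        # Cycle 2: result latch -> address reg, memory read cycle starts.
        self.addr = self.latch
        self.cycles += 1
        # Cycle 3: read cycle finishes, data bus -> destination register.
        value = self.mem[self.mar]
        self.cycles += 1
        return value


core = Core()
core.push(0x42)
cycles_after_push = core.cycles
popped = core.pop()
```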

> One of the instructions could be the encoding of immediates
> as discussed in another thread. Another could be two to four
> bit immediates for ADD and SUB.
>
>
>> * A stack machine, like the b16-like architecture I have in mind.
>>
>> * A register machine, like the RISC Thomas Koenig has in mind.
>
> Both viable options.
>
> [...]

Re: OoO S/360 descendants (was: Could we build a better 6502?)

<3a2c9983-e999-465f-8c41-0492598e1370n@googlegroups.com>

by: Michael S - Wed, 18 Aug 2021 15:55 UTC

On Wednesday, August 18, 2021 at 5:24:16 PM UTC+3, Anton Ertl wrote:
> Michael S <already...@yahoo.com> writes:
> >Chapter 1. is Tomasulo's S/360 Model 91/191. This chapter was brief. I am not sure how to call it, a failure or a limited success.
> It beat the CDC6600, which was its primary purpose, so in that sense
> it is a success. There was also the /195 (I guess you mean that, not
> 191), which is a reimplementation in a more modern circuit technology,
> so it certainly was no failure.
>

They sold very few machines. Likely, less than 40 for /91, /95 and /195 combined.
Far fewer than CDC6600.

> However, while it is often mentioned as OoO, it's not the kind of
> modern OoO that I meant (and described) in my posting: OoO execution
> with in-order completion, giving us precise exceptions, while the /91
> was infamous for imprecise exceptions.
> >Chapter 2. are water-cooled S/390 Models 820 and 900. Replaced by simpler "CMOS" models.
> That has eluded me up to now. I only read about some family members
> later that were far behind in microarchitecture and per-processor MIPS
> to what Intel did at the time (IIRC single-issue in-order while Intel
> was 3-wide OoO). I thought at the time that for the legacy software
> that they were running, more CPU performance is probably not that
> important (and maybe the jobs were I/O-bound anyway). Looking at the
> MIPS numbers of the 9672 machines, they were really far behind (but I
> think it was the z900 (2500MIPS on 16 processors) that I found
> underwhelming).

That was 8 years later.
Model 900 was announced in 1990 and shipping in the 1st half of 1991.
Back then 200 MIPS per 6 cores was considered quite fast.

> >Chapter 3 started with z196 in 2010 and is still going (zEC12, z13, z14, z15).
> That's what I meant with the OoO S/360 descendants.
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> With the amount of control logic saved, one might be able to afford a
>> 16-bit data path, avoiding the need to have two cycles through the ALU
>> for 16-bit operations (and the corresponding control logic), or the
>> need to work around that in software.
>
>I think as long as you're stuck with an accumulator design (i.e.
>a very small number of registers), the benefits of a 16bit datapath
>would be marginal, since you still constantly need to read or write
>it to memory (in 2 steps, given the 8bit memory bus).

If you mean implementing the same architecture with an 8-bit data
path, using two cycles for a 16-bit ALU operation, yes, the slowdown
from that may be small. The question is if the increased complexity
of the control part does not outweigh the lowered complexity of the
data path.

If you mean that such an architecture would not be more efficient than
the 6502, I think you are wrong. E.g., if you follow a pointer, that
could be something like

#we start and end with the pointer in acc
addr <- acc
acc <- (addr)

Two instruction bytes, four memory cycles. Or, if you only have 8-bit
loads:

addr <- acc
acclo <- (addr), addr<-addr+1
acchi <- (addr)

Three instruction bytes, 5 memory cycles. By contrast, on the 6502 it
was something like:

#we have a zero-page word P, with the lower byte set to 0
#we start and end with lo(pointer) in Y and hi(pointer) in A
sta P+1
iny
lda (P),y
tax
dey
lda (P),y
tay
txa

11 instruction bytes, 13 memory cycles, 24 cycles. You can do better
if your before and after conditions do not need to be the same, but in
any case, this demonstrates the shortcomings of the 6502 architecture
nicely (e.g., a TXY instruction would have shortened this sequence
significantly).
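
As a cross-check of the byte and cycle totals, the sequence can be tallied against the commonly documented NMOS 6502 timings. The table below is typed in from memory of those standard figures and the helper name `tally` is made up. By this tally the sequence is 11 bytes and 23 cycles, so the 24 quoted above may include a penalty cycle or be a slight miscount; the count of memory cycles depends on what one counts and is not modelled here.

```python
# (mnemonic, addressing mode) -> (instruction bytes, base cycles)
TIMING = {
    ("sta", "zp"): (2, 3),
    ("lda", "(zp),y"): (2, 5),  # +1 cycle if indexing crosses a page
    ("iny", "implied"): (1, 2),
    ("dey", "implied"): (1, 2),
    ("tax", "implied"): (1, 2),
    ("tay", "implied"): (1, 2),
    ("txa", "implied"): (1, 2),
}

# The pointer-following sequence from the post, in order.
SEQUENCE = [
    ("sta", "zp"),
    ("iny", "implied"),
    ("lda", "(zp),y"),
    ("tax", "implied"),
    ("dey", "implied"),
    ("lda", "(zp),y"),
    ("tay", "implied"),
    ("txa", "implied"),
]


def tally(sequence):
    """Sum instruction bytes and base cycles over a sequence."""
    total_bytes = sum(TIMING[insn][0] for insn in sequence)
    total_cycles = sum(TIMING[insn][1] for insn in sequence)
    return total_bytes, total_cycles


totals = tally(SEQUENCE)
```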

Yes, if we can fit this into the budget.

>while keeping the instruction encoding compact (to reduce the
>amount of instruction memory accesses), which calls for implicit
>register references, as is the case in stack machines

Yes, that's why I was thinking of stack machines.

However, given that the technology probably does not give us many
registers anyway, half-implicit registers may be good enough; e.g., if
we have two accumulators and two address registers, we need only two
bits for the register addressing in the instructions above, which
should fit in 8-bit-wide instructions.
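
One way to picture the half-implicit encoding is the following sketch. The field layout is purely an assumption for illustration (6 opcode bits, 1 bit selecting the accumulator, 1 bit selecting the address register); nothing in the thread fixes a layout.

```python
def encode(opcode, acc, addr):
    """Pack a 6-bit opcode and two 1-bit register selectors into one byte."""
    if not 0 <= opcode < 64:
        raise ValueError("opcode must fit in 6 bits")
    if acc not in (0, 1) or addr not in (0, 1):
        raise ValueError("register selectors are single bits")
    return (opcode << 2) | (acc << 1) | addr


def decode(byte):
    """Split an instruction byte back into (opcode, acc, addr)."""
    return byte >> 2, (byte >> 1) & 1, byte & 1


word = encode(0b101010, acc=1, addr=0)
fields = decode(word)
```

With only two bits spent on register selection, 64 opcode patterns remain in a single instruction byte, which is the point about compact encodings made above.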

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: OoO S/360 descendants (was: Could we build a better 6502?)

<2021Aug18.184436@mips.complang.tuwien.ac.at>

by: Anton Ertl - Wed, 18 Aug 2021 16:44 UTC

Michael S <already5chosen@yahoo.com> writes:
>On Wednesday, August 18, 2021 at 5:24:16 PM UTC+3, Anton Ertl wrote:
>> Michael S <already...@yahoo.com> writes:
>> >Chapter 1. is Tomasulo's S/360 Model 91/191. This chapter was brief. I am not sure how to call it, a failure or a limited success.
>> It beat the CDC6600, which was its primary purpose, so in that sense
>> it is a success. There was also the /195 (I guess you mean that, not
>> 191), which is a reimplementation in a more modern circuit technology,
>> so it certainly was no failure.
>>
>
>They sold very few machines. Likely, less than 40 for /91, /95 and /195 combined.

Reading the Wikipedia pages, at most 20 /91s. 2 /95s. About 20 /195s.

>Far fewer than CDC6600.

"At least 100 were delivered in total".

So the CDC6600 may have been commercially more successful, but that
does not make the /91 a failure.

>Model 900 was announced in 1990 and shipping in the 1st half of 1991.
>Back then 200 MIPS per 6 cores was considered quite fast.

So that may have been 40MIPS or so for one core, which was
competitive.

But the rest of the industry saw both the GHz race and the
introduction of superscalar and OoO microprocessors during the next
decade, while IBM's CMOS entry in 1994 was a step back from the
earlier ECL machines; and while they also saw fast improvements, they
were still far behind in the 9672 (178 MIPS for 1 processor) in 1999
and the z900 in 2000. And the rest of the industry also passed IBM on
the multi-processor stuff, with SGI's Origin and later Altix lines,
Sun's (formerly FPS) Starfire, and HP Superdome. But it seems that
IBM have caught up in both respects in the meantime.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Could we build a better 6502?

<a3b4a7c4-517e-4a07-9e9a-1c4295469500n@googlegroups.com>

by: MitchAlsup - Wed, 18 Aug 2021 17:29 UTC

On Wednesday, August 18, 2021 at 1:16:29 AM UTC-5, Thomas Koenig wrote:
> MitchAlsup <Mitch...@aol.com> schrieb:
> > Back when I was 2 years younger than him, I made a calculator out
> > of a couple dozen transistors, nearly 60 resistors, 40-ish capacitors,
> > and 10- small light bulbs with sockets. My input device was the dial
> > off of a rotary phone, the output was the lights, and I had an 8 gang
> > switch to change from adding to subtracting. I put the thing in a box
> > the size of an attache case, you could open the lid and see the rats
> > nest of wires.
> ><
> > Won county and state science fair that year.
> Now, that _is_ impressive.
>
> I guess your professional career was foreshadowed by this :-)
<
I had built 2 HeathKit radios before this, but what sent me down this
path was my birthday in 6th grade where I got a patch board with
a variety of transistors, resistors, capacitors, a telegraph key, and
a couple more things. I built a catswhisker radio from those things.
<
From here on in I knew I wanted to do electronics--only later did
I vector off into digital circuitry.

Re: Could we build a better 6502?

<145d5eb6-095b-4b86-81ca-afcfe4742558n@googlegroups.com>

by: MitchAlsup - Wed, 18 Aug 2021 17:33 UTC

On Wednesday, August 18, 2021 at 4:19:00 AM UTC-5, Anton Ertl wrote:
> MitchAlsup <Mitch...@aol.com> writes:
> >On Tuesday, August 17, 2021 at 5:33:26 AM UTC-5, Anton Ertl wrote:
> >> That, too. With the advent of OoO, the drawbacks of CISC mostly
> >> ceased to play a role.
> ><
> >Which is what we were told when doing the 1st generation of RISC--
> >why not just hold on until the transistor budget catches up. This
> >is the "don't upset the cart" philosophy.
<
> Who told you that with what goal? Did the 68000 people try to
> convince the 88000 people that the 68000 will catch up eventually?
<
yes:: Would you like a list of names you would not know ?
<
> It
> seems to me that the 68000's problem was not the 88000, but SPARC,
> HPPA, MIPS, 29000, and eventually PowerPC (although, without these,
> the 88000 would have been a problem for the 68000).
>
> And did they refer to OoO with the transistor budget, or to
> pipelining?
<
They did not know HOW but they knew it would happen sooner or later.
<
> They certainly did the pipelining with the 68040 in 1990,
<
In large part the '040 was designed using what we figured out in 88100
and 88110.
<
> and then did superscalar with the 68060 in 1994, but never did OoO.
<
Same problem as VAX.
<
>
> What I meant with the drawbacks of CISC mostly ceasing to play a role
> is that the separation of execution and commit would have allowed to
> do complex instructions like those of the 68020 and VAX, with multiple
> memory accesses, all of which can trap relatively straightforwardly:
> just wait for all operations of the instruction to complete before
> committing any part of the instruction, and if there is an exception,
> throw all the parts away (precise exceptions rather than stack puke).
>
> Complex instructions would still come with a cost: There have to be
> enough resources available to guarantee forward progress (rather than
> trapping every time), even in the worst case of all memory accesses in
> conflicting cache lines and conflicting pages (in case of a
> non-fully-associative TLB), and page table entries in conflicting
> pages. And validating all the cases is a burden that CISCs have to
> bear.
>
> So maybe OoO would not have saved heavy CISCs like the VAX (last
> implementation NVAX in 1992) and the 68020 architecture (68060 in
> 1994). But it seems to me that the main reason for their demise was
> the decision by the computer makers to switch to RISC CPUs. IA-32 and
> S/360 descendants survived based on their software legacy, and shortly
> after (in case of IA-32) succeeded with OoO implementations (the S/360
> descendants took much longer AFAIK).
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Could we build a better 6502?

<08ebb48c-b234-4d17-aa49-c9c452cba1a3n@googlegroups.com>

by: MitchAlsup - Wed, 18 Aug 2021 17:35 UTC

On Wednesday, August 18, 2021 at 7:27:53 AM UTC-5, Michael S wrote:
> On Wednesday, August 18, 2021 at 12:19:00 PM UTC+3, Anton Ertl wrote:
> >
> > So maybe OoO would not have saved heavy CISCs like the VAX (last
> > implementation NVAX in 1992) and the 68020 architecture (68060 in
> > 1994). But it seems to me that the main reason for their demise was
> > the decision by the computer makers to switch to RISC CPUs. IA-32 and
> > S/360 descendants survived based on their software legacy, and shortly
> > after (in case of IA-32) succeeded with OoO implementations (the S/360
> > descendants took much longer AFAIK).
> > - anton
> > --
> > 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> > Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>
> It seems, OoO implementations of S/360 and descendants is [so far] a story of three chapters.
> Chapter 1. is Tomasulo's S/360 Model 91/191. This chapter was brief. I am not sure how to call it, a failure or a limited success.
<
It was a huge success--for everyone else, just not for IBM.
<
> Chapter 2. are water-cooled S/390 Models 820 and 900. Replaced by simpler "CMOS" models.
> Chapter 3 started with z196 in 2010 and is still going (zEC12, z13, z14, z15).

Re: Could we build a better 6502?

<zYcTI.560$Oz2.431@fx47.iad>

by: EricP - Wed, 18 Aug 2021 19:12 UTC

Stefan Monnier wrote:
>> With the amount of control logic saved, one might be able to afford a
>> 16-bit data path, avoiding the need to have two cycles through the ALU
>> for 16-bit operations (and the corresponding control logic), or the
>> need to work around that in software.
>
> I think as long as you're stuck with an accumulator design (i.e.
> a very small number of registers), the benefits of a 16bit datapath
> would be marginal, since you still constantly need to read or write
> it to memory (in 2 steps, given the 8bit memory bus).

It allows it to transfer the 16-bit address to the Memory Address Register
in 1 cycle not 2. Provided it can also handle a 16-bit increment PC+1->PC
that saves a cycle on every instruction fetch.

It also saves a cycle on every register indirect [reg] address (no offset)
which is one of the more frequently used addressing modes.

It can allow reading both ALU operands in 1 cycle
(it needs some muxes/pass gates to transfer 1 operand to the high byte).
For example, that might allow the accumulator to be sent to the ALU on
bits 15:8 while the external data bus is read on bits 7:0, in 1 clock.
In clock 2 the ALU calculates and the result is written back to the accumulator.
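
A rough sketch of that two-clock sequence, with made-up function names and an ADD chosen arbitrarily as the ALU operation: clock 1 places the accumulator on bits 15:8 and the external data bus value on bits 7:0 of the 16-bit operand bus, and clock 2 lets the 8-bit ALU combine the two halves.

```python
def pack_operands(accumulator, data_bus):
    """Clock 1: accumulator on bits 15:8, external data bus on bits 7:0."""
    return ((accumulator & 0xFF) << 8) | (data_bus & 0xFF)


def alu_add(operand_word):
    """Clock 2: 8-bit add of the two halves; returns (result, carry_out)."""
    a = operand_word >> 8
    b = operand_word & 0xFF
    total = a + b
    return total & 0xFF, total > 0xFF


bus_word = pack_operands(0x12, 0x34)
result, carry = alu_add(bus_word)
```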

> For the CPUs of the time, performance was dictated by the number of
> bytes going over the memory bus. So I think the main focus should be on
> maximizing the number of registers (to reduce the amount of data memory
> accesses) while keeping the instruction encoding compact (to reduce the
> amount of instruction memory accesses), which calls for implicit
> register references, as is the case in stack machines and (to a lesser
> extent) belt machines.
>
>
> Stefan

According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>They sold very few machines. Likely, less than 40 for /91, /95 and /195 combined.
>
>Reading the wikipedia pages, at most 20 /91s. 2 /95s. About 20 /195s.

The /91 was a failure in that they lost money on the series. The /95 was
a /91 with thin film memory built for NASA. The /195 probably broke even after
treating the /91 program as a sunk cost and using the cache from the /85. The /91
used ASLT with faster components and denser packaging than SLT but I don't know how
useful that was since they went to SSI and MSI on S/370.

IBM's intentions in high performance computing after the /91 were never very clear.
Everyone was surprised at how well the /85's cache worked, but that provided a speedup
to everything, not just scientific work. They never tried to build another mainframe
with high performance floating point, but they did define a bunch of vector instruction
sets including one for ESA/390 and a different one for zSeries, so someone must be
using them.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: OoO S/360 descendants

<87fsv6a2kv.fsf@localhost>

https://www.novabbs.com/devel/article-flat.php?id=19980&group=comp.arch#19980

by: Anne & Lynn Whee - Wed, 18 Aug 2021 21:02 UTC

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> That has eluded me up to now. I only read about some family members
> later that were far behind in microarchitecture and per-processor MIPS
> to what Intel did at the time (IIRC single-issue in-order while Intel
> was 3-wide OoO). I thought at the time that for the legacy software
> that they were running, more CPU performance is probably not that
> important (and maybe the jobs were I/O-bound anyway). Looking at the
> MIPS numbers of the 9672 machines, they were really far behind (but I
> think it was the z900 (2500MIPS on 16 processors) that I found
> underwhelming).

z196 seemed to have been the last where there were real live benchmark
numbers ... since then things got a lot more obfuscated ... getting
percents from previous machines. z196 documents have some statement that
1/3 to 1/2 of z10->z196 per processor performance improvement is
introduction of memory latency compensating technology (that had been in
other platforms for long time), out-of-order execution, branch
prediction, etc

z900, 16 processors, 2.5BIPS (156MIPS/proc), Dec2000
z990, 32 processors, 9BIPS, (281MIPS/proc), 2003
z9, 54 processors, 18BIPS (333MIPS/proc), July2005
z10, 64 processors, 30BIPS (469MIPS/proc), Feb2008
z196, 80 processors, 50BIPS (625MIPS/proc), Jul2010
EC12, 101 processors, 75BIPS (743MIPS/proc), Aug2012
z13, 140 processors, 100BIPS (710MIPS/proc), Jan2015
z14, 170 processors, 150BIPS (862MIPS/proc), Aug2017
z15, 190 processors, 190BIPS* (1000MIPS/proc), Sep2019

* pubs say z15 1.25 times z14 (1.25*150BIPS or 190BIPS)
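
The per-processor column is just aggregate BIPS divided by processor count. A small Python sketch that redoes the division from the figures listed above (the function name `mips_per_processor` is made up for illustration):

```python
# (model, processors, aggregate BIPS) exactly as listed in the table above
SYSTEMS = [
    ("z900", 16, 2.5),
    ("z990", 32, 9),
    ("z9", 54, 18),
    ("z10", 64, 30),
    ("z196", 80, 50),
    ("EC12", 101, 75),
    ("z13", 140, 100),
    ("z14", 170, 150),
    ("z15", 190, 190),
]


def mips_per_processor(processors: int, bips: float) -> float:
    """Aggregate BIPS converted to MIPS, divided evenly over the processors."""
    return bips * 1000 / processors


if __name__ == "__main__":
    for model, processors, bips in SYSTEMS:
        print(f"{model:5s} {mips_per_processor(processors, bips):7.1f} MIPS/proc")
```

Plain division reproduces most of the listed values after rounding; for z13 and z14 it gives about 714 and 882 rather than the listed 710 and 862, so those two entries presumably come from slightly different inputs or rounding.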

trivia: mid-70s got sucked into project to do 16 processor SMP 370, we
even sucked the 3033 processor engineers into working on it in their spare
time, lot more interesting than remapping 168-3 to 20% faster chips ...
lots thot that it was really great ... until somebody informed the
head of the POK lab that it could be decades before POK favorite son
operating system had effective 16-way support ... then some of us got
invited to never visit POK again (& the 3033 processor engineers told to
totally focus on 3033). z900 finally appears with 16-way ... more than 20yrs later.

for comparison z196 era blade was e5-2600 benchmarked at 500 BIPS
(industry standard benchmark based on number of iterations compared to
370/158-3 assumed to be 1MIP). Max configured z196 (50BIPS) had IBM
price around $30M or $600,000/BIPS while IBM base list price (before IBM
sold off that server business) was $1815 for E5-2600 blade (or
$3.60/BIPS).
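
The price/performance figures above are simple ratios. A Python sketch of the arithmetic, using only the prices and BIPS ratings quoted in this post (the function name `price_per_bips` is made up for illustration):

```python
def price_per_bips(price_dollars: float, bips: float) -> float:
    """Dollars paid per BIPS of rated throughput."""
    return price_dollars / bips


z196_cost = price_per_bips(30_000_000, 50)   # max configured z196
blade_cost = price_per_bips(1815, 500)       # E5-2600 blade, base list price

if __name__ == "__main__":
    print(f"z196:  ${z196_cost:,.0f}/BIPS")
    print(f"blade: ${blade_cost:,.2f}/BIPS")
    print(f"ratio: {z196_cost / blade_cost:,.0f}x")
```

With these inputs the blade works out to $3.63/BIPS, which the text rounds to $3.60. The comparison is only as good as the assumption that the two BIPS ratings measure the same thing.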

other trivia: 1980, IBM STL was bursting at the seams and moving 300
people from IMS DBMS development group to offsite bldg with
dataprocessing back to STL datacenter. They had tried "remote" 3270s and
found human factors totally unacceptable. I get con'ed into doing
channel-extender support, placing channel attached 3270 controllers at
offsite bldg with no difference in human factors compared to 3270s
inside STL.

Hardware vendor tried to get IBM to release my support, but a group in POK
working on some serial stuff gets it blocked (afraid if it was in the
market place, it would make it more difficult to get their stuff
released). In 1988, I'm asked to help LLNL standardize some stuff they
are playing with which quickly becomes fibre channel standard (including
some stuff I had done in 1980). The POK group finally gets their stuff
released in 1990 with ES/9000 as ESCON when it is already obsolete.

Then some POK engineers become involved in fibre channel standard and
define a heavy weight protocol that drastically cuts the native
throughput, which is eventually released as FICON. The most recent public
numbers I've been able to find is z196 "peak i/o" benchmark getting 2M
IOPS using 104 FICON (running over 104 fibre channel standard). About
the same time there was fibre channel standard announced for e5-2600
blade claiming over a million IOPS (two such FCS getting higher throughput
than 104 FICON).
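
The FICON vs. native FCS comparison can be checked with a few lines of Python. This sketch uses only the numbers quoted above and treats "over a million IOPS" as exactly one million, which is a conservative assumption; the function names are made up for illustration.

```python
import math


def iops_per_link(total_iops: float, links: int) -> float:
    """Average I/O operations per second carried by each link."""
    return total_iops / links


def links_needed(target_iops: float, per_link_iops: float) -> int:
    """Smallest number of links that reaches the target rate."""
    return math.ceil(target_iops / per_link_iops)


ficon_per_link = iops_per_link(2_000_000, 104)   # "peak i/o" benchmark
native_fcs_links = links_needed(2_000_000, 1_000_000)

if __name__ == "__main__":
    print(f"FICON: about {ficon_per_link:,.0f} IOPS per link")
    print(f"native FCS links for the same 2M IOPS: {native_fcs_links}")
```

Per link that is roughly 19 thousand IOPS for FICON against a claimed million for native FCS, which is the gap the paragraph above is pointing at.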

more trivia: IBM had been touting lots of mainframe I/O
pathlength&processing offloaded to dedicated "system assist processors"
(SAP) ... max configured z196 with max. number of 14 SAPs were all 100%
busy at 2.2M SSCH/sec (2.2M I/O operations) but recommends keeping SAPs
cpu at 70% or less (1.5M SSCH/sec) for I/O responsiveness.

--
virtualization experience starting Jan1968, online at home since Mar1970

Re: OoO S/360 descendants

<87bl5ua1it.fsf@localhost>

https://www.novabbs.com/devel/article-flat.php?id=19981&group=comp.arch#19981

by: Anne & Lynn Whee - Wed, 18 Aug 2021 21:25 UTC

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> But the rest of the industry saw both the GHz race and the
> introduction of superscalar and OoO microprocessors during the next
> decade, while IBM's CMOS entry in 1994 was a step back from the
> earlier ECL machines; and while they also saw fast improvements, they
> were still far behind in the 9672 (178 MIPS for 1 processor) in 1999
> and the Z900 in 2000. And the rest of the industry also passed IBM on
> the multi-processor stuff, with SGI's Origin and later Altix lines,
> Sun's (formerly FPS) Starfire, and HP Superdome. But it seems that
> IBM have caught up in both respects in the meantime.

other trivia: about same time as getting asked to play with LLNL on FCS
... also asked to participate with SLAC/Gustavson activity spawning SCI
... 64-way port cache interface. Sequent and Data General did 256-way
... 64 shared cache boards with four Intel processors each ... Convex
did 128-way, 64 shared cache boards with two HP snake processors each.

SCI
https://en.wikipedia.org/wiki/Scalable_Coherent_Interface

Different versions and derivatives of SCI were implemented by companies
like Dolphin Interconnect Solutions, Convex, Data General AViiON (using
cache controller and link controller chips from Dolphin), Sequent and
Cray Research. Dolphin Interconnect Solutions implemented a PCI and
PCI-Express connected derivative of SCI that provides non-coherent
shared memory access. This implementation was used by Sun Microsystems
for its high-end clusters, Thales Group and several others including
volume applications for message passing within HPC clustering and
medical imaging. SCI was often used to implement non-uniform memory
access architectures. It was also used by Sequent Computer Systems as
the processor memory bus in their NUMA-Q systems. Numascale developed a
derivative to connect with coherent HyperTransport.

... snip ...

--
virtualization experience starting Jan1968, online at home since Mar1970

Anne & Lynn Wheeler <lynn@garlic.com> writes:
>z196 seemed to have been the last where there were real live benchmark
>numbers ... since then things got a lot more obfuscated ...

Speaks against competitive performance.

>getting
>percents from previous machines. z196 documents have some statement that
>1/3 to 1/2 of z10->z196 per processor performance improvement is
>introduction of memory latency compensating technology (that had been in
>other platforms for long time), out-of-order execution, branch
>prediction, etc
>
>z900, 16 processors, 2.5BIPS (156MIPS/proc), Dec2000
>z990, 32 processors, 9BIPS, (281MIPS/proc), 2003
>z9, 54 processors, 18BIPS (333MIPS/proc), July2005
>z10, 64 processors, 30BIPS (469MIPS/proc), Feb2008
>z196, 80 processors, 50BIPS (625MIPS/proc), Jul2010
>EC12, 101 processors, 75BIPS (743MIPS/proc), Aug2012
>z13, 140 processors, 100BIPS (710MIPS/proc), Jan2015
>z14, 170 processors, 150BIPS (862MIPS/proc), Aug2017
>z15, 190 processors, 190BIPS* (1000MIPS/proc), Sep2019
>
>* pubs say z15 1.25 times z14 (1.25*150BIPS or 190BIPS)

1000MIPS/core would be really low; at 5.2GHz that would be 0.2IPC.

By contrast, on Zen3 I see (using the LaTeX benchmark) 2637M AMD64
instructions executed in 0.25s, i.e., >10,000MIPS, and 2.8IPC. Given
the kind of technology in z15, I would expect much higher MIPS
numbers, even if they are not high enough to be competitive (as
suggested by the lack of benchmark numbers).
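
The arithmetic behind these figures, as a short Python sketch. The inputs are the numbers quoted above; the clock implied for the Zen3 run is derived from them, not measured, and the function names are made up for illustration.

```python
def mips(instructions_millions: float, seconds: float) -> float:
    """Millions of instructions executed per second."""
    return instructions_millions / seconds


def ipc(mips_value: float, clock_mhz: float) -> float:
    """Instructions per cycle: MIPS divided by clock rate in MHz."""
    return mips_value / clock_mhz


def implied_clock_mhz(mips_value: float, ipc_value: float) -> float:
    """Clock rate that is consistent with a given MIPS and IPC."""
    return mips_value / ipc_value


z15_ipc = ipc(1000, 5200)                  # 1000 MIPS/core at 5.2GHz
zen3_mips = mips(2637, 0.25)               # LaTeX benchmark run
zen3_clock = implied_clock_mhz(zen3_mips, 2.8)

if __name__ == "__main__":
    print(f"z15 at 1000 MIPS/core: {z15_ipc:.2f} IPC")
    print(f"Zen3: {zen3_mips:.0f} MIPS, implied clock about {zen3_clock:.0f} MHz")
```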

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: OoO S/360 descendants

<5f1ee158-53e6-43dd-9a68-f3a1a37bb4f5n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19986&group=comp.arch#19986

by: Michael S - Thu, 19 Aug 2021 11:43 UTC

On Thursday, August 19, 2021 at 2:11:04 PM UTC+3, Anton Ertl wrote:
> Anne & Lynn Wheeler <ly...@garlic.com> writes:
> >z196 seemed to have been the last where there were real live benchmark
> >numbers ... since then things got a lot more obfuscated ...
> > Speaks against competitive performance.

I think, even in the period where, according to Lynn, there were "real live benchmark numbers",
they were always IBM mainframe vs some old IBM mainframe.
I don't believe that in the last 35-40 years IBM ever published CPU benchmarks that can be compared
directly against IBM's competitors or even vs other IBM-made computers.

It does not mean that other bodies didn't make such comparisons, but somehow nothing is public.

> >getting
> >percents from previous machines. z196 documents have some statement that
> >1/3 to 1/2 of z10->z196 per processor performance improvement is
> >introduction of memory latency compensating technology (that had been in
> >other platforms for long time), out-of-order execution, branch
> >prediction, etc
> >
> >z900, 16 processors, 2.5BIPS (156MIPS/proc), Dec2000
> >z990, 32 processors, 9BIPS, (281MIPS/proc), 2003
> >z9, 54 processors, 18BIPS (333MIPS/proc), July2005
> >z10, 64 processors, 30BIPS (469MIPS/proc), Feb2008
> >z196, 80 processors, 50BIPS (625MIPS/proc), Jul2010
> >EC12, 101 processors, 75BIPS (743MIPS/proc), Aug2012
> >z13, 140 processors, 100BIPS (710MIPS/proc), Jan2015
> >z14, 170 processors, 150BIPS (862MIPS/proc), Aug2017
> >z15, 190 processors, 190BIPS* (1000MIPS/proc), Sep2019
> >
> >* pubs say z15 1.25 times z14 (1.25*150BIPS or 190BIPS)
> 1000MIPS/core would be really low; at 5.2GHz that would be 0.2IPC.
>
> By contrast, on Zen3 I see (using the LaTeX benchmark) 2637M AMD64
> instructions executed in 0.25s, i.e., >10,000MIPS, and 2.8IPC. Given
> the kind of technology in z15, I would expect much higher MIPS
> numbers, even if they are not high enough to be competitive (as
> suggested by the lack of benchmark numbers).

Most likely "IBM MIPS" have very little to do with the number of actual instructions executed per second.

> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

According to Michael S <already5chosen@yahoo.com>:
>I think, even in period where, according to Lynn, there were "real live benchmark numbers",
>they were always IBM mainframe vs some old IBM mainframe.
>I don't believe that in last 35-40 years IBM ever published CPU benchmarks that can be compared
>directly against IBM's competitors or even vs other IBM-made computers.
>
>It does not mean that other bodies didn't make such comparisons, but somehow nothing is public.

I don't see how one could do meaningful benchmarks. IBM has lots of very complex instructions
to speed up specific tasks. If you have a bunch of data, sort it with heapsort, compress it
with gzip, and then ship it over a network using TCP checksums, it should be really fast
because they have instructions for those specific tasks.

A long time ago, DEC sold the VAX-11/780 as a one MIPS machine even though it really ran
about 500 KIPS. It was roughly as fast as a 370/158 which ran at 1 IBM MIPS.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

EricP <ThatWouldBeTelling@thevillage.com> writes:
>Incrementer/decrementer, and 16 bits too.

An incrementer is needed for the PC, so one might also use it for
other stuff. Decrement would cost extra, so I would put that into the
nice-to-have, not the must-have bin.

>But there is a not-so-minor complication of how to do a R-M-W
>sequence on a register like PC+1->PC if the registers are built
>from latches and there is only one set of read/write bit lines.
>And even if you can afford separate read and write bit lines,
>the cells are latches and cannot be both read and written at once
>(which is why one can go with the single bit line in the first place).

So you need another register for the incrementer output (or its
input). Looking at the 6502 microarchitecture
<https://i.imgur.com/BkZ9o.png>, it has four 8-bit registers for the
PC: PCL, PCH (the actual PC), and PCLS, PCHS, despite having separate
busses from the PC to the incrementer and from the incrementer to the
PC.

>One also would like to read an 8-bit external data bus value directly
>to the ALU as an operand without an extra cycle to save the operand
>in a temp register.

The 6502 passes that through the input data latch even though it has
load-and-op instructions. In a load/store architecture, this does not
make much sense.

>It should be capable of doing a memory access cycle in 1 clock.
>That is, address is latched into Memory Address Register (MAR)
>and bus read/write cycle starts on rising clock edge.
>Memory Ready line is sampled on rising edge of next clock,
>and for a read if Ready the data is latched internally
>and we proceed to the next state.
>
>That allows you to take better advantage of faster SRAM as the
>processor clock increases later on.

It also allows you to run the whole thing at the maximum speed that
RAM allows, while AFAIK the 6502 only accessed memory during one
half-cycle (which then allowed the graphics chip to access the RAM in
the other half-cycle). But OTOH you need extra transistors to support
this, and probably also some transistors to make the CPU run fast
enough that it can benefit from this capability.
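
The one-clock memory access cycle described in the quoted text (address latched into the MAR on a rising edge, Ready sampled on the next rising edge, data latched if Ready) can be sketched as a tiny Python simulation. This is only a behavioural illustration under my own assumptions: the function name `read_cycle`, the dict standing in for memory, and the list of Ready samples are all made up.

```python
def read_cycle(memory: dict[int, int], address: int, ready_samples: list[bool]) -> tuple[int, int]:
    """Simulate one bus read and return (data, clocks used).

    ready_samples[i] is the value of the Memory Ready line as seen on the
    (i+1)-th rising clock edge after the cycle starts.
    """
    mar = address          # edge 0: latch address into MAR, bus read cycle starts
    clocks = 0
    for ready in ready_samples:
        clocks += 1        # next rising edge: sample Ready
        if ready:
            return memory[mar], clocks   # latch data, proceed to next state
        # not ready: stay in this state and insert a wait clock
    raise RuntimeError("memory never asserted Ready")
```

With fast memory (`[True]`) the access completes in 1 clock; slower memory just stretches the same state with wait clocks, which is what lets the same design benefit from faster SRAM later.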

>Or one can eliminate push and pop and use the LD/ST approach,
>which implies it has a register displacement [reg+offset]
>address mode (8 and 16 bit displacements).
>Obviously useful, but complicates control sequencer.

Therefore my thinking is in the direction of not supporting an offset,
at least in the base architecture, but autoincrement.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Could we build a better 6502?

<a813eeb9-fda9-414d-b488-8dc661c7b168n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19992&group=comp.arch#19992

by: MitchAlsup - Thu, 19 Aug 2021 14:59 UTC

On Thursday, August 19, 2021 at 9:52:59 AM UTC-5, Anton Ertl wrote:
> EricP <ThatWould...@thevillage.com> writes:
> >Incrementer/decrementer, and 16 bits too.
<
> An incrementer is needed for the PC, so one might also use it for
> other stuff. Decrement would cost extra, so I would put that into the
> nice-to-have, not the must-have bin.
<
Don't forget that the PC needs an adder, too (branch targets).
<
But I am sure the 6502 shared the adder: data and fetch.
<
> >But there is a not-so-minor complication of how to do a R-M-W
> >sequence on a register like PC+1->PC if the registers are built
> >from latches and there is only one set of read/write bit lines.
<
4 phase clocking allows a latch to smell like a flip-flop.
<
> >And even if you can afford separate read and write bit lines,
> >the cells are latches and cannot be both read and written at once
> >(which is why one can go with the single bit line in the first place).
> So you need another register for the incrementer output (or its
> input). Looking at the 6502 microarchitecture
> <https://i.imgur.com/BkZ9o.png>, it has four 8-bit registers for the
> PC: PCL, PCH (the actual PC), and PCLS, PCHS, despite having separate
> busses from the PC to the incrementer and from the incrementer to the
> PC.
> >One also would like to read an 8-bit external data bus value directly
> >to the ALU as an operand without an extra cycle to save the operand
> >in a temp register.
> The 6502 passes that through the input data latch even though it has
> load-and-op instructions. In a load/store architecture, this does not
> make much sense.
> >It should be capable of doing a memory access cycle in 1 clock.
> >That is, address is latched into Memory Address Register (MAR)
> >and bus read/write cycle starts on rising clock edge.
> >Memory Ready line is sampled on rising edge of next clock,
> >and for a read if Ready the data is latched internally
> >and we proceed to the next state.
> >
> >That allows you to take better advantage of faster SRAM as the
> >processor clock increases later on.
> It also allows you to run the whole thing at the maximum speed that
> RAM allows,
<
I mentioned this way up above:: The whole design is to consume as
much memory BW as the pins allow. The ISA was designed to allow
that.
<
> while AFAIK the 6502 only accessed memory during one
> half-cycle (which then allowed the graphics chip to access the RAM in
> the other half-cycle). But OTOH you need extra transistors to support
<
All you need is the ability to Tri-State the bus.
<
> this, and probably also some transistors to make the CPU run fast
> enough that it can benefit from this capability.
> >Or one can eliminate push and pop and use the LD/ST approach,
> >which implies it has a register displacement [reg+offset]
> >address mode (8 and 16 bit displacements).
> >Obviously useful, but complicates control sequencer.
> Therefore my thinking is in the direction of not supporting an offset,
> at least in the base architecture, but autoincrement.
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

On Thursday, August 19, 2021 at 9:43:32 AM UTC-5, John Levine wrote:
> According to Michael S <already...@yahoo.com>:
> >I think, even in period where, according to Lynn, there were "real live benchmark numbers",
> >they were always IBM mainframe vs some old IBM mainframe.
> >I don't believe that in last 35-40 years IBM ever published CPU benchmarks that can be compared
> >directly against IBM's competitors or even vs other IBM-made computers.
> >
> >It does not mean that other bodies didn't make such comparisons, but somehow nothing is public.
> I don't see how one could do meaningful benchmarks. IBM has lots of very complex instructions
> to speed up specific tasks. If you have a bunch of data, sort it with heapsort, compress it
> with gzip, and then ship it over a network using TCP checksums, it should be really fast
> because they have instructions for those specific tasks.
<
Not just instructions, but function units designed and dedicated to doing these things.
>
> A long time ago, DEC sold the VAX-11/780 as a one MIPS machine even though it really ran
> about 500 KIPS. It was roughly as fast as a 370/158 which ran at 1 IBM MIPS.
<
VAX-11/780 ran at about 4 clocks per instruction.
> --
> Regards,
> John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
> Please consider the environment before reading this e-mail. https://jl.ly

John Levine <johnl@taugh.com> writes:
>I don't see how one could do meaningful benchmarks.

Run the same program on both. Linux runs on these systems, so it
should not be hard (apart from getting access to such a machine, and
maybe not being allowed to publish the results).

>IBM has lots of very complex instructions
>to speed up specific tasks.

As the VAX vs. RISC discussion taught us, just because it has very
complex instructions does not mean that these speed up the tasks. And
IIRC IBM's 801 project discovered the same even earlier wrt complex 360
instructions.

>If you have a bunch of data, sort it with heapsort, compress it
>with gzip, and then ship it over a network using TCP checksums, it should be really fast
>because they have instructions for those specific tasks.

So have that as a benchmark to show off the benefits of these
instructions.

They really have an instruction for heapsort, the sort with the worst
cache locality among the n*ln(n) sorts?

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>So you need another register for the incrementer output (or its
>input). Looking at the 6502 microarchitecture
><https://i.imgur.com/BkZ9o.png>, it has four 8-bit registers for the
>PC: PCL, PCH (the actual PC), and PCLS, PCHS, despite having separate
>busses from the PC to the incrementer and from the incrementer to the
>PC.

Looking at the dark red color of PCLS and PCHS, these are probably
logic (muxes) rather than registers. So by using separate busses, the
PC can be incremented in one cycle without an extra register.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: OoO S/360 descendants

<k9vTI.36153$Fu2.9688@fx16.iad>

https://www.novabbs.com/devel/article-flat.php?id=19996&group=comp.arch#19996

by: EricP - Thu, 19 Aug 2021 15:54 UTC

Michael S wrote:
> On Thursday, August 19, 2021 at 2:11:04 PM UTC+3, Anton Ertl wrote:
>> Anne & Lynn Wheeler <ly...@garlic.com> writes:
>>> z196 seemed to have been the last where there were real live benchmark
>>> numbers ... since then things got a lot more obfuscated ...
>> Speaks against competitive performance.
>
> I think, even in period where, according to Lynn, there were "real live benchmark numbers",
> they were always IBM mainframe vs some old IBM mainframe.
> I don't believe that in last 35-40 years IBM ever published CPU benchmarks that can be compared
> directly against IBM's competitors or even vs other IBM-made computers.

Probably because it doesn't help their sales so why do it.
Does anybody port _to_ zSeries?

> It does not mean that other bodies didn't make such comparisons, but somehow nothing is public.
>
>
>>> getting
>>> percents from previous machines. z196 documents have some statement that
>>> 1/3 to 1/2 of z10->z196 per processor performance improvement is
>>> introduction of memory latency compensating technology (that had been in
>>> other platforms for long time), out-of-order execution, branch
>>> prediction, etc
>>>
>>> z900, 16 processors, 2.5BIPS (156MIPS/proc), Dec2000
>>> z990, 32 processors, 9BIPS, (281MIPS/proc), 2003
>>> z9, 54 processors, 18BIPS (333MIPS/proc), July2005
>>> z10, 64 processors, 30BIPS (469MIPS/proc), Feb2008
>>> z196, 80 processors, 50BIPS (625MIPS/proc), Jul2010
>>> EC12, 101 processors, 75BIPS (743MIPS/proc), Aug2012
>>> z13, 140 processors, 100BIPS (710MIPS/proc), Jan2015
>>> z14, 170 processors, 150BIPS (862MIPS/proc), Aug2017
>>> z15, 190 processors, 190BIPS* (1000MIPS/proc), Sep2019
>>>
>>> * pubs say z15 1.25 times z14 (1.25*150BIPS or 190BIPS)
>> 1000MIPS/core would be really low; at 5.2GHz that would be 0.2IPC.
>>
>> By contrast, on Zen3 I see (using the LaTeX benchmark) 2637M AMD64
>> instructions executed in 0.25s, i.e., >10,000MIPS, and 2.8IPC. Given
>> the kind of technology in z15, I would expect much higher MIPS
>> numbers, even if they are not high enough to be competitive (as
>> suggested by the lack of benchmark numbers).
>
> Most likely "IBM MIPS" have very little to do with the number of actual instructions executed per second.

Like VUPs, these are ZUPs.
As the story was told, DEC originally marketed the VAX-11/780 as a
1 MIPS processor because that is what the designers thought it was.
And later there were many benchmarks like Dhrystone MIPS that
used the VAX-11/780 as their base unit of processing, assuming it was 1 MIPS.

Then someone counted the actual instruction micro states and
discovered that the 780 was actually 0.5 MIPS. Ooops.
So some marketing wizard came up with the idea of VUPs, or
"VAX Units of Performance", defined as the speed of a 780.
Reprint literature, problem solved.

So these must be ZUPs.
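
To make the point about relative units concrete, here is a minimal sketch. It assumes a VUP-style rating is simply the baseline machine's run time divided by the rated machine's run time on the same workload. The function names and run times are hypothetical and are not DEC's or IBM's actual methodology.

```python
def relative_units(baseline_seconds, machine_seconds):
    """Speed relative to the baseline machine; 1.0 means 'as fast as the baseline'."""
    if baseline_seconds <= 0 or machine_seconds <= 0:
        raise ValueError("run times must be positive")
    return baseline_seconds / machine_seconds


def implied_native_mips(units, baseline_native_mips):
    """Native rate implied by a relative rating (simplifying assumption:
    the rated machine executes the same instruction stream as the baseline)."""
    return units * baseline_native_mips


# Hypothetical run times of one workload, in seconds.
rating = relative_units(baseline_seconds=100.0, machine_seconds=12.5)
print(rating)  # 8.0 relative units

# The same rating read two ways, depending on what the baseline really does.
print(implied_native_mips(rating, 1.0))  # baseline assumed to be 1 MIPS
print(implied_native_mips(rating, 0.5))  # baseline measured at 0.5 MIPS
```

The rating itself never changes. Only its reading as "MIPS" depends on what the baseline machine actually executes per second, which is why such numbers need not match counted instructions.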

Re: OoO S/360 descendants (was: Could we build a better 6502?)

<sflvss$vq5$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19997&group=comp.arch#19997

From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: OoO S/360 descendants (was: Could we build a better 6502?)
Date: Thu, 19 Aug 2021 09:12:12 -0700
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <sflvss$vq5$1@dont-email.me>
References: <sde7eg$hgb$1@newsreader4.netcologne.de>
<2021Aug18.155716@mips.complang.tuwien.ac.at>
<3a2c9983-e999-465f-8c41-0492598e1370n@googlegroups.com>
<2021Aug18.184436@mips.complang.tuwien.ac.at> <sfjpgd$25l8$1@gal.iecc.com>
In-Reply-To: <sfjpgd$25l8$1@gal.iecc.com>

by: Stephen Fuld - Thu, 19 Aug 2021 16:12 UTC

On 8/18/2021 1:10 PM, John Levine wrote:
> According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>> They sold very few machines. Likely, less than 40 for /91, /95 and /195 combined.
>>
>> Reading the wikipedia pages, at most 20 /91s. 2 /95s. About 20 /195s.
>
> The /91 was a failure in that they lost money on the series. The /95 was
> a /91 with thin film memory built for NASA. The /195 probably broke even after
> treating the /91 program as a sunk cost and using the cache from the /85. The /91
> used ASLT with faster components and denser packaging than SLT but I don't know how
> useful that was since they went to SSI and MSI on S/370.
>
> IBM's intentions in high performance computing after the /91 were never very clear.
> Everyone was surprised at how well the /85's cache worked, but that provided a speedup
> to everything, not just scientific work. They never tried to build another mainframe
> with high performance floating point, but they did define a bunch of vector instruction
> sets including one for ESA/390 and a different one for zSeries, so someone must be
> using them.

Don't forget the vector facility for the 3090. Perhaps the new
instructions are there just for compatibility.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: OoO S/360 descendants

<71d05233-5e4c-4886-ae4d-d3375f782c0cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19999&group=comp.arch#19999

Newsgroups: comp.arch
Date: Thu, 19 Aug 2021 09:44:09 -0700 (PDT)
In-Reply-To: <k9vTI.36153$Fu2.9688@fx16.iad>
References: <sde7eg$hgb$1@newsreader4.netcologne.de> <jwvtujq6d9n.fsf-monnier+comp.arch@gnu.org> <sfb5qi$m3u$1@newsreader4.netcologne.de> <84126256-8745-4707-9445-130860761e76n@googlegroups.com> <2021Aug17.112004@mips.complang.tuwien.ac.at> <61a8609e-7cda-46cd-9723-4936e719e927n@googlegroups.com> <2021Aug18.102524@mips.complang.tuwien.ac.at> <524743a0-0c40-4543-b68e-17b2af752d7en@googlegroups.com> <2021Aug18.155716@mips.complang.tuwien.ac.at> <87fsv6a2kv.fsf@localhost> <2021Aug19.125511@mips.complang.tuwien.ac.at> <5f1ee158-53e6-43dd-9a68-f3a1a37bb4f5n@googlegroups.com> <k9vTI.36153$Fu2.9688@fx16.iad>
Message-ID: <71d05233-5e4c-4886-ae4d-d3375f782c0cn@googlegroups.com>
Subject: Re: OoO S/360 descendants
From: already5...@yahoo.com (Michael S)

by: Michael S - Thu, 19 Aug 2021 16:44 UTC

On Thursday, August 19, 2021 at 6:54:59 PM UTC+3, EricP wrote:
> Michael S wrote:
> > On Thursday, August 19, 2021 at 2:11:04 PM UTC+3, Anton Ertl wrote:
> >> Anne & Lynn Wheeler <ly...@garlic.com> writes:
> >>> z196 seemed to have been the last where there were real live benchmark
> >>> numbers ... since then things got a lot more obfuscated ...
> >> Speaks against competitive performance.
> >
> > I think, even in the period where, according to Lynn, there "were real live benchmark numbers",
> > they were always IBM mainframe vs some old IBM mainframe.
> > I don't believe that in the last 35-40 years IBM ever published CPU benchmarks that can be compared
> > directly against IBM's competitors or even vs other IBM-made computers.
> Probably because it doesn't help their sales, so why do it?
> Does anybody port _to_ zSeries?

So, bragging rights do not matter any more?
The fastest CPU on planet Earth, etc.?
I would think that there were at least a few non-IBM-specific benchmarks on which either z196
or one of its successors was, at the time of its introduction, the fastest in the world.

> > It does not mean that other bodies didn't make such comparisons, but somehow nothing is public.
> >
> >
> >>> getting
> >>> percents from previous machines. z196 documents have some statement that
> >>> 1/3 to 1/2 of z10->z196 per processor performance improvement is
> >>> introduction of memory latency compensating technology (that had been in
> >>> other platforms for long time), out-of-order execution, branch
> >>> prediction, etc
> >>>
> >>> z900, 16 processors, 2.5BIPS (156MIPS/proc), Dec2000
> >>> z990, 32 processors, 9BIPS, (281MIPS/proc), 2003
> >>> z9, 54 processors, 18BIPS (333MIPS/proc), July2005
> >>> z10, 64 processors, 30BIPS (469MIPS/proc), Feb2008
> >>> z196, 80 processors, 50BIPS (625MIPS/proc), Jul2010
> >>> EC12, 101 processors, 75BIPS (743MIPS/proc), Aug2012
> >>> z13, 140 processors, 100BIPS (710MIPS/proc), Jan2015
> >>> z14, 170 processors, 150BIPS (862MIPS/proc), Aug2017
> >>> z15, 190 processors, 190BIPS* (1000MIPS/proc), Sep2019
> >>>
> >>> * pubs say z15 1.25 times z14 (1.25*150BIPS or 190BIPS)
> >> 1000MIPS/core would be really low; at 5.2GHz that would be 0.2IPC.
> >>
> >> By contrast, on Zen3 I see (using the LaTeX benchmark) 2637M AMD64
> >> instructions executed in 0.25s, i.e., >10,000MIPS, and 2.8IPC. Given
> >> the kind of technology in z15, I would expect much higher MIPS
> >> numbers, even if they are not high enough to be competitive (as
> >> suggested by the lack of benchmark numbers).
> >
> > Most likely "IBM MIPS" have very little to do with the number of actual instructions executed per second.
> Like VUPs, these are ZUPs.
> As the story was told, DEC originally marketed the VAX-11/780 as a
> 1 MIPS processor because that is what the designers thought it was.
> And later there were many benchmarks like Dhrystone MIPS that
> used the VAX-11/780 as their base unit of processing, assuming it was 1 MIPS.
>
> Then someone counted the actual instruction micro states and
> discovered that the 780 was actually 0.5 MIPS. Ooops.
> So some marketing wizard came up with the idea of VUPs, or
> "VAX Units of Performance", defined as the speed of a 780.
> Reprint literature, problem solved.
>
> So these must be ZUPs.

Re: OoO S/360 descendants (was: Could we build a better 6502?)

<5c500666-e0da-4c5f-9eca-aecf80478b4fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=20000&group=comp.arch#20000

Newsgroups: comp.arch
Date: Thu, 19 Aug 2021 09:51:02 -0700 (PDT)
In-Reply-To: <sflvss$vq5$1@dont-email.me>
References: <sde7eg$hgb$1@newsreader4.netcologne.de> <2021Aug18.155716@mips.complang.tuwien.ac.at>
<3a2c9983-e999-465f-8c41-0492598e1370n@googlegroups.com> <2021Aug18.184436@mips.complang.tuwien.ac.at>
<sfjpgd$25l8$1@gal.iecc.com> <sflvss$vq5$1@dont-email.me>
Message-ID: <5c500666-e0da-4c5f-9eca-aecf80478b4fn@googlegroups.com>
Subject: Re: OoO S/360 descendants (was: Could we build a better 6502?)
From: already5...@yahoo.com (Michael S)

by: Michael S - Thu, 19 Aug 2021 16:51 UTC

On Thursday, August 19, 2021 at 7:12:15 PM UTC+3, Stephen Fuld wrote:
> On 8/18/2021 1:10 PM, John Levine wrote:
> > According to Anton Ertl <an...@mips.complang.tuwien.ac.at>:
> >> They sold very few machines. Likely, less than 40 for /91, /95 and /195 combined.
> >>
> >> Reading the wikipedia pages, at most 20 /91s. 2 /95s. About 20 /195s.
> >
> > The /91 was a failure in that they lost money on the series. The /95 was
> > a /91 with thin film memory built for NASA. The /195 probably broke even after
> > treating the /91 program as a sunk cost and using the cache from the /85. The /91
> > used ASLT with faster components and denser packaging than SLT but I don't know how
> > useful that was since they went to SSI and MSI on S/370.
> >
> > IBM's intentions in high performance computing after the /91 were never very clear.
> > Everyone was surprised at how well the /85's cache worked, but that provided a speedup
> > to everything, not just scientific work. They never tried to build another mainframe
> > with high performance floating point, but they did define a bunch of vector instruction
> > sets including one for ESA/390 and a different one for zSeries, so someone must be
> > using them.
> Don't forget the vector facility for the 3090. Perhaps the new
> instructions are there just for compatibility.
>

I am pretty sure that the "vector" instructions introduced in z13 are totally unrelated to the vector facilities of either the 3090 or ES/9000.
The new facility is SIMD and is rather similar to the "vector and scalar unit" of POWER8.

>
>
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)
