A year spent in artificial intelligence is enough to make one believe in God.

On 5/8/2022 9:00 AM, Anton Ertl wrote:
> BGB <cr88192@gmail.com> writes:
>> It looks like:
>> PDP-11:
>> 8x 16-bit
>> Various addressing modes
>> Looks kinda like an MSP430 with half as many registers;
>
> Both have two-address instructions where both operands can be in memory.
>
>> Also sorta resembles the M68K.
>
> The M68k does not have general-purpose registers.
> Apart from MOVE, all instructions have only one memory operand.
>

Yeah, but similar-looking ASM, apart from splitting up the registers and
some other things.

All 3 ISA designs have in common that they are built around 16-bit
instruction words, often with a similar-ish layout, eg:
op,rn,mode,rm
Or:
op,mode,rn,mode,rm

>> VAX
>> 16x 32-bit
>> Kinda similar to PDP-11.
>
> VAX has three-operand instructions, and all operands can be in memory.
>

I was having a harder time finding as much information about VAX, so not
really sure what its encoding looks like.

From the "ASM and above" POV, it kinda resembles the others in this group.

>> Contrast is x86, which appears to have emerged separately.
>> I would probably cluster x86, Z80, and 6502 as another family.
>
> The relationships are well-known:
>
> 6800 -> 6502
> 6800 -> 6809
> 6800 -> 68000
>
> (all incompatible) These are all machines with accumulators and index
> registers (accumulator machines), with the 6809 having more
> accumulators (2x8-bit=1x16-bit) and more index registers (4x16-bit),
> and the 68000 having 8x32-bit each. What was mostly kept is the one
> memory operand per instruction.
>

Though, the 6502's design more resembles the 8080 and friends than it
does the M68K.

Looking, the 6800 and 68000 appear to be significantly different.
Though, yeah, there is some similarity between the 6800 and 6502 as well.

Could probably lump them together by common features:
Similar registers: Accumulator, Index, Stack, Program Counter
8-bit instructions
...

> 8008->8080
> 8080->Z80
> 8080->8086
> 8086->IA-32
> iA-32->AMD64
>
> These started out with special-purpose registers (accumulator, index,
> counter, address) but turned more and more general-purpose over time.
> Again, one memory operand per instruction. Already in the 8086 the
> general instructions could treat all registers as register operand,
> and IA-32 finally had general-purpose registers (all registers could
> also be used for addressing). AMD64 meant that byte operands could
> reside in all registers.
>

I consider the 8086 to be a pretty big jump from the 8080.

Despite both having a common ancestor in the 8080, for example, the 8086
and Z80 are quite different from each other.

>> ARM32 seems to have some superficial similarities both to the PDP and
>> x86,
>
> Which ones?
>

ARM32 has autoincrement modes, and scaled index addressing.
Autoincrement was mostly seen in the PDP inspired ISAs.
Scaled index was seen in the x86 and M68K.

ARM32 uses page-based virtual memory, with a page-table format that
appeared very similar to that of the 386.

I would need to look around more to figure out the timeline for when
each added these features in relation to each other.

>> albeit initially built around 32-bit instruction words and
>> Load/Store, putting it more in with other RISC family ISAs.
>>
>> Eg:
>> 32-bit instruction words, often fixed-length;
>
> A32 is fixed-length. Later T32 was added, which also allows 16-bit
> instruction words (subset of same instruction set).
>

I was not counting Thumb because it was apparently added much later
(90s), and was apparently inspired by the SuperH.

Though, compared with the SuperH, its instruction format was
significantly more bit-twiddly.

Well, at least until RISC-V came along and was like "hold my beer".

So, SuperH:
zzzz-nnnn-mmmm-zzzz //typical 2R op
zzzz-nnnn-zzzz-zzzz //typical 1R op
zzzz-nnnn-mmmm-iiii //a few Ld/St ops
zzzz-nnnn-iiii-iiii //a few immediate instructions.
zzzz-zzzz-iiii-iiii //a few other instructions.
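
As a rough sketch (Python, not tied to any real decoder): all of the SuperH-style layouts above fall out of splitting the word into four nibbles, and which nibble means what then depends on the opcode bits. The function name and the sample word here are only for illustration.

```python
def split_nibbles(word):
    """Split a 16-bit instruction word into its four 4-bit fields, high to low."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit instruction word")
    return ((word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF)

# For the typical 2R form zzzz-nnnn-mmmm-zzzz, the outer nibbles are opcode
# bits and the inner two are the register numbers.
op_hi, rn, rm, op_lo = split_nibbles(0x321C)  # arbitrary sample word
```

The point being that the register fields never move, so the decoder can pull them out before it knows what the instruction is.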

Thumb shook it up some, tried fitting some 3-register forms in the mix
(with 3-bit registers).

Thumb layout was pretty variable, but at least, if bits were next to
each other, they were in the expected order.

RVC has stuff all over the place, and even in most cases where bits
"are" next to each other, they are not in a consistent order, and things
like which bits go where are prone to vary from one instruction form to
the next.

If one counts variations where the bits are in a different order, the
number of RVC instruction layouts is well into the double-digits.
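
To give one concrete case of the scrambling: the RVC C.J jump keeps its offset in instruction bits 12..2, stored in the order offset[11|4|9:8|10|6|7|3:1|5]. A sketch of undoing that in Python follows; the function name is mine, and the bit order is as I read it from the RVC encoding tables.

```python
def rvc_cj_offset(inst):
    """Recover the signed byte offset from a 16-bit C.J instruction word."""
    def bit(n):
        return (inst >> n) & 1

    imm = (
        (bit(12) << 11)              # inst[12]   -> offset[11] (sign)
        | (bit(11) << 4)             # inst[11]   -> offset[4]
        | (((inst >> 9) & 0x3) << 8) # inst[10:9] -> offset[9:8]
        | (bit(8) << 10)             # inst[8]    -> offset[10]
        | (bit(7) << 6)              # inst[7]    -> offset[6]
        | (bit(6) << 7)              # inst[6]    -> offset[7]
        | (((inst >> 3) & 0x7) << 1) # inst[5:3]  -> offset[3:1]
        | (bit(2) << 5)              # inst[2]    -> offset[5]
    )
    # Sign-extend from 12 bits.
    return imm - (1 << 12) if imm & (1 << 11) else imm
```

Eight separate pieces for one 11-bit field, and other instruction forms use different shuffles.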

This is partly why the RISC-V mode for the BJX2 core has not yet added
RVC support. I start trying to poke at it, and quickly lose my
motivation to deal with this level of bit-confetti...

It is possible I may just leave it at RV64IM levels for a while.
'C': "WTF is going on with their encodings?"
'A': Defines cache semantics which don't match my L1
'A' also diverges from strict Load/Store semantics.
As-specified, one would need to add an ALU to the L1 cache.
'F'/'D': Doesn't really match up with my existing FPU.
Supporting their FPU design would add considerable cost.
...

In terms of 16-bit ops, BJX2 has:
zzzz-zzzz-nnnn-mmmm //typical 2R ops
zzzz-nzzz-nnnn-zzzz //typical 1R ops
zzzz-ziii-nnnn-mmmm //A few LD/ST ops
zzzz-zznm-nnnn-mmmm //A few 2R ops (R0..R31)
zzzz-zzzz-nnnn-iiii //a few 2RI ops (Imm4)
zzzz-nnnn-iiii-iiii //a few immediate instructions (Imm8).
zzzz-zzzz-iiii-iiii //a few other instructions.

It is possible (in retrospect) it could have been better had BJX2 stayed
slightly closer to the original SH layout, but oh well.

However, the way 32-bit instruction formats had worked in SuperH (and
BJX1) had "sucked pretty hard" (they were all shoved into odd corners of
the 16-bit encoding space, so one would need to look at ~ 12 instruction
bits to distinguish between 16 and 32 bit encodings).

I had mostly managed to keep it consistent in BJX2, at least until I
added the XGPR extension (adds encodings for R32..R63), which kinda made
a mess of things in this area. Also adds a bit of penalty for things
like interrupt handling and task-switching as well (since they need to
be saved/restored).

So, a possibly "similar, but hopefully more consistent":
00zz-nnnn-mmmm-zzzz (16-bit, 2R)
01zz-nnnn-iiii-zzzz (16-bit, 2RI)
10zz-zzzz-iiii-iiii (16-bit, Imm8 and Misc)
11**-****-****-**** (32-bit)
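
With this split, the length check only needs the top two bits of the first word, rather than the ~ 12 bits noted above for SuperH / BJX1. A minimal sketch (the function name is made up for illustration):

```python
def instr_length_bytes(first_word):
    """Return 4 if the top two bits are 11 (32-bit encoding), else 2."""
    if not 0 <= first_word <= 0xFFFF:
        raise ValueError("expected a 16-bit instruction word")
    return 4 if ((first_word >> 14) & 0b11) == 0b11 else 2
```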

For 32-bit encodings, possibly (5b regs)
11pp-pzzz-zzzn-nnnn zzss-sssz-zzzt-tttt //3R (Rs, Rt, Rn)
11pp-pzzz-zzzn-nnnn zzss-sssi-iiii-iiii //3RI (Rs, Imm9, Rn)
11pp-pzzz-zzzn-nnnn iiii-iiii-iiii-iiii //2RI (Imm16, Rn)
11pp-pzzz-zzzz-iiii iiii-iiii-iiii-iiii //(Imm20)
11pp-pzzz-iiii-iiii iiii-iiii-iiii-iiii //(Imm24 / Jumbo)

Or 32-bit encodings (6b regs)
11pp-pzzz-zznn-nnnn zsss-sssz-zztt-tttt //3R (Rs, Rt, Rn)
11pp-pzzz-zznn-nnnn zsss-sssi-iiii-iiii //3RI (Rs, Imm9, Rn)
11pp-pzzz-zznn-nnnn iiii-iiii-iiii-iiii //2RI (Imm16, Rn)
11pp-pzzz-zzzz-iiii iiii-iiii-iiii-iiii //(Imm20)
11pp-pzzz-iiii-iiii iiii-iiii-iiii-iiii //(Imm24 / Jumbo)
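
Since these layouts are written as bit-pattern strings, a small helper can pull the fields straight out of the pattern. This is only a sketch: it assumes the first 16-bit word sits in the high half of the 32-bit value, and it just concatenates all 'z' bits in pattern order (how the opcode bits would really be grouped is not decided here).

```python
def extract_fields(layout, value):
    """Collect the bits of `value` into fields named by the letters in `layout`.

    '0'/'1' are fixed bits that must match; '-' and spaces are ignored.
    Bits sharing a letter are concatenated in left-to-right order.
    """
    pattern = [c for c in layout if c not in "- "]
    width = len(pattern)
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit the layout width")
    fields = {}
    for i, c in enumerate(pattern):
        bit = (value >> (width - 1 - i)) & 1
        if c in "01":
            if bit != int(c):
                raise ValueError("fixed bits do not match this layout")
            continue
        fields[c] = (fields.get(c, 0) << 1) | bit
    return fields

LAYOUT_3R_5B = "11pp-pzzz-zzzn-nnnn zzss-sssz-zzzt-tttt"

# ppp=100 (Normal), Rn=3, Rs=1, Rt=2, all opcode bits zero.
fields = extract_fields(LAYOUT_3R_5B, 0xE0030202)
# fields == {'p': 4, 'z': 0, 'n': 3, 's': 1, 't': 2}
```

The same helper works on the 6b-register strings unchanged, since the field widths come from the pattern itself.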

Would stick with Imm9 vs Imm12 as personally I feel Imm12 wastes too
much encoding space.

This layout also keeps Imm16 consistent with the other encodings (unlike
BJX2, where the Imm16 block uses a different/incompatible layout).

Where, ppp:
000: Op?T (Pred?T)
001: Op?F (Pred?F)
010: Op?T| (Pred?T + WEX, PrWEX)
011: Op?F| (Pred?F + WEX, PrWEX)
100: Op (Normal)
101: Special / Misc
110: Op| (WEX)
111: Jumbo

So, say:
1111-1zzz-iiii-iiii iiii-iiii-iiii-iiii (Jumbo-Prefix Instructions)
The 'zzz' field mostly defines how the prefix combines with next op.
000: Expand Immediate field by 24 bits
001: Expand the Opcode space and add 4th Reg/Imm field
...

This still allows 64-bit ops with Imm33s, or 96-bit ops with Imm64.
96-bit encodings for "MOV Imm64, Rn" and "ADD Imm64, Rn" are very often
useful.
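
The arithmetic behind those widths, as a sketch. It assumes each jumbo prefix contributes its 24 payload bits to the immediate, that the Imm33 case is a 3RI op (Imm9) plus one prefix, and that the Imm64 case is a 2RI op (Imm16) plus two prefixes; that pairing is my reading of the layouts above.

```python
JUMBO_PAYLOAD_BITS = 24  # immediate bits carried by one jumbo prefix
WORD_BITS = 32           # size of the base op and of each prefix

def jumbo_widths(base_imm_bits, n_prefixes):
    """Return (total instruction bits, total immediate bits)."""
    total_len = WORD_BITS * (1 + n_prefixes)
    imm_bits = base_imm_bits + JUMBO_PAYLOAD_BITS * n_prefixes
    return total_len, imm_bits

print(jumbo_widths(9, 1))   # (64, 33): 3RI op + one prefix
print(jumbo_widths(16, 2))  # (96, 64): 2RI op + two prefixes
```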

Would keep the Link Register hard-wired, as the costs of having a
flexible link register just "doesn't really seem worth it".

Choice between 5b and 6b regs is less obvious:
5b regs: Plenty sufficient for 95% of code
It is uncommon to find code where this is not sufficient
6b regs: Can pay off for that last 5% or so.
But, eats a lot of potential encoding space.
This would have less opcode space than in BJX2.
Or, make most of the ISA use 5b registers:
With only a subset having 6b registers;
Though, this doesn't really solve the issue.

With 5b regs, it would be pretty much break-even with BJX2 in terms of
32-bit opcode space.

The 16-bit space would be smaller, but the encoding space this frees up
for 32-bit ops is eaten by the larger 'ppp' field. This differs from BJX2,
where the PrWEX case is treated as an edge-case encoding that only applies
to a subset of the ISA.

Of course, could also do:
0zzz-nnnn-mmmm-zzzz (16-bit)
1ppp-****-****-**** (32-bit)

Where, 16-bit space is smaller still, but 32-bit space is bigger, so a
6-bit register field would at least be "slightly less" painful (but,
still not enough to reach break-even).
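
One quick way to compare these options is to count the 'z' (opcode) bits left in the 3R form of each layout. A sketch follows; the two "11ppp" strings are the 3R forms given earlier, while the two "1ppp" strings are my own guess at how the 3R form would look under this split.

```python
LAYOUTS_3R = {
    "11ppp, 5b regs": "11pp-pzzz-zzzn-nnnn zzss-sssz-zzzt-tttt",
    "11ppp, 6b regs": "11pp-pzzz-zznn-nnnn zsss-sssz-zztt-tttt",
    "1ppp, 5b regs (assumed)": "1ppp-zzzz-zzzn-nnnn zzss-sssz-zzzt-tttt",
    "1ppp, 6b regs (assumed)": "1ppp-zzzz-zznn-nnnn zsss-sssz-zztt-tttt",
}

def opcode_bits(layout):
    """Count the free opcode ('z') bits in a layout string."""
    return layout.count("z")

for name, layout in LAYOUTS_3R.items():
    bits = opcode_bits(layout)
    print(f"{name}: {bits} opcode bits, {2 ** bits} 3R encodings per ppp value")
```

By this count, going from 5b to 6b register fields costs 3 opcode bits (an 8x smaller 3R space), and the "1ppp" split only gives 1 bit back, which is why it does not reach break-even.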

Of course, one could make a case for making all of the 128-bit SIMD ops
and similar require 64-bit instruction formats, but this would be
undesirable.

The other option is that the baseline ISA would remain as 32 GPRs.
Say, if they exist, R32..R63 would require Op64 or an Op40x2 style encoding.

Or, one could take a harder stance and only support 32 GPRs (in total).

Arguably, all one "really" needs in the 16-bit space is a few
common-case ALU ops, small branches, and some common-case Load/Store ops
(does not need to be sufficient for general operation).

Within the existing 16-bit space, a few areas are "high-traffic" and
much of the rest is little-used.

This means, likely (combining the current set of thoughts):
0zzz-nnnn-mmmm-zzzz (16-bit, 2R)
0zzz-nnnn-iiii-zzzz (16-bit, 2RI)
0zzz-zzzz-iiii-iiii (16-bit, Imm8 and Misc)
1ppp-zzzz-zzzn-nnnn zzss-sssz-zzzt-tttt //3R (Rs, Rt, Rn)
1ppp-zzzz-zzzn-nnnn zzss-sssi-iiii-iiii //3RI (Rs, Imm9, Rn)
1ppp-zzzz-zzzn-nnnn iiii-iiii-iiii-iiii //2RI (Imm16, Rn)
1ppp-zzzz-zzzz-iiii iiii-iiii-iiii-iiii //(Imm20)
1ppp-zzzz-iiii-iiii iiii-iiii-iiii-iiii //(Imm24 / Jumbo)

With 16-bit ops as, say:
0000-nnnn-mmmm-0sss Store Rn,(Rm)
0000-nnnn-mmmm-1sss Load (Rm),Rn
sss=Type (SB/SS,UB/US,SL/Q,UL/X)
0001-nnnn-iiii-0nss Store Rn,(SP,Disp4)
0001-nnnn-iiii-1nss Load (SP,Disp4),Rn
ss=Type (SL/Q,UL/X)
0010-nnnn-mmmm-0zzz 2R ALU Ops (ADD,SUB,SHA,SHL,-,AND,OR,XOR)
0010-nnnn-mmmm-1nm0 MOV (R0..R31)
0010-nnnn-mmmm-1nm1 ADD (R0..R31)
0010-nnnn-iiii-0zzz 2RI ALU Ops (Imm4u,Rn)
0010-nnnn-iiii-1ns0 MOV Imm5s, Rn (R0..R31)
0010-nnnn-iiii-1ns1 ADD Imm5s, Rn (R0..R31)
0011-0000-iiii-iiii BRA Disp8s
0011-0001-iiii-iiii ADD Imm8s, SP
0011-0010-iiii-iiii BT Disp8s
0011-0011-iiii-iiii BF Disp8s
...
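
As a sketch of how the 0011 group above would decode: the second nibble selects the op and the low byte is a signed 8-bit value. The mnemonic strings are just labels, and how the displacement gets scaled is not defined here, so this returns the raw signed field.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

# Sub-opcodes of the 0011-zzzz-iiii-iiii block listed above.
GROUP_0011 = {0x0: "BRA", 0x1: "ADD_SP", 0x2: "BT", 0x3: "BF"}

def decode_group_0011(word):
    """Return (name, signed imm8); name is None for sub-opcodes not listed."""
    if (word >> 12) & 0xF != 0b0011:
        raise ValueError("not in the 0011 block")
    name = GROUP_0011.get((word >> 8) & 0xF)
    return name, sign_extend(word & 0xFF, 8)
```

For example, decode_group_0011(0x30FE) gives ("BRA", -2).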

Will probably move SP to a low-numbered register, say:
R0: Zero / PC (Depends on Context)
R1: SP (Stack Pointer)
R2: LR (Link Register)
R3: GBR (Global Register)
R4..R15: Low GPRs
R16..R31: High GPRs

Decided to leave off defining C ABI specifics at this stage (eg: the
split between scratch and preserved registers).

Would move SP to a low register number, as this would avoid a big
annoyance in the register space layout. LR and GBR would move into GPR
space, as they are used frequently.

Compared with BJX2, the DLR and DHR registers would be eliminated.

One could argue for re-introducing partial register banking for
interrupt handling (possibly only R0..R7), mostly as having a few
scratch registers to work with would make the job of the interrupt
prolog/epilog handling a little easier.

Likely, this would be swapping R4..R7 with the registers "hidden behind"
R0..R3 (during decoding). Pros/cons, this would add an annoying edge
case for implementing preemptive multitasking though (need to be able to
address these registers to save/restore them).

It is likely that 128-bit operations would operate on paired registers
using the same scheme as in BJX2.

If it still uses 96-bit addressing, it is likely that there will be a
GBH/DBH (Data High) and PCH (PC High) register. As in BJX2, these will
still be assumed off-limits to user code (a program is not normally
allowed to span multiple quadrants, but may or may not be allowed to
have data pointers outside the local quadrant).

For applications, it is possible that the 128-bit pointers could also be
treated like capabilities (so, a program is not allowed to use an
address to a location outside of its local quadrant unless it is
"blessed" by the OS kernel; but may freely compose pointers within its
own quadrant).

One likely change would be to unify the behavior of GPRs and LR
regarding branch via register, likely:
LSB=0: Always ignore high 16 bits.
LSB=1: Reload state from high 16 bits.
With LR effectively always having the LSB set during normal operation.
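
A sketch of that rule, under some assumptions that are not spelled out above: a 64-bit register value, a 48-bit address in the low bits, the saved state in bits 63..48, and the LSB acting purely as a tag (so it is cleared from the target).

```python
ADDR_BITS = 48                      # assumed address width
ADDR_MASK = (1 << ADDR_BITS) - 1

def branch_via_register(reg, current_state):
    """Return (target address, state after the branch)."""
    target = reg & ADDR_MASK & ~1   # drop the high 16 bits and the tag bit
    if reg & 1:
        new_state = (reg >> ADDR_BITS) & 0xFFFF  # LSB=1: reload state
    else:
        new_state = current_state                # LSB=0: high bits ignored
    return target, new_state
```

With this, a plain GPR holding a function pointer (LSB=0) and LR (LSB=1) go through the same path, and only the tag bit decides whether the high bits matter.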

....



computers / comp.arch / Re: Architecture comparison (was: Upcoming DFP support in clang/LLVM)