Re: Drastic Simplification of Concertina II Coming

<tn6k8i$26869$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29484&group=comp.arch#29484

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Drastic Simplification of Concertina II Coming
Date: Mon, 12 Dec 2022 01:12:09 -0600
Organization: A noiseless patient Spider
Lines: 394
Message-ID: <tn6k8i$26869$1@dont-email.me>
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<41fde666-eb3c-4c6b-9ec4-ef94527fcb31n@googlegroups.com>
<0d16f36f-c4a8-42f7-8236-e60776809363n@googlegroups.com>
<tl8gnj$3hf5b$1@newsreader4.netcologne.de> <tmm11f$24lm$1@dont-email.me>
<tn07j8$loid$3@newsreader4.netcologne.de> <tn2c1s$1melj$1@dont-email.me>
<tn4hcu$ohjk$1@newsreader4.netcologne.de> <tn5dg6$20k9t$1@dont-email.me>
<cc428b85-f8bd-40cf-bf24-63508205bafan@googlegroups.com>
<tn5qj9$21laf$1@dont-email.me>
<30e13eec-d0f8-43b9-bec7-c49c418183a3n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 12 Dec 2022 07:12:19 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="5dcfdf9bf0034ea7fac3df0a6d377a34";
logging-data="2302153"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+wsPIz7dg+nAInypILDe4P"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:M0BmnpQ3AII7VaMCBO4AmSj8FYc=
Content-Language: en-US
In-Reply-To: <30e13eec-d0f8-43b9-bec7-c49c418183a3n@googlegroups.com>
 by: BGB - Mon, 12 Dec 2022 07:12 UTC

On 12/11/2022 6:48 PM, MitchAlsup wrote:
> On Sunday, December 11, 2022 at 5:54:20 PM UTC-6, BGB wrote:
>> On 12/11/2022 3:42 PM, MitchAlsup wrote:
>>> On Sunday, December 11, 2022 at 2:10:49 PM UTC-6, BGB wrote:
>>>> On 12/11/2022 6:11 AM, Thomas Koenig wrote:
>>>>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:
>>>
>>>>> I think this would be a tiny bit harder to parse, because the position
>>>>> of the marker of the bit marking the next instruction would depend
>>>>> on the previous one. (Am I making sense here?)
>>>>>
>>>>> This would also make it harder to put the position of, let's say, the
>>>>> destination register in the same position for each instruction.
>>> <
>>>> IME, fully variable length instructions and bundling don't really mix well.
>>> <
>>> Fully variable length instructions (VAX) just don't make any sense anymore
>>> anyway !! But by creating 3 conditions one can have variable length, and
>>> easy decoding. The 3 conditions are:: a) all bits needed to determine the
>>> length of the instruction are found in the first container of the instruction,
>>> b) all of the register source operands are in the first instruction container,
>>> c) leaving only constants to occupy the variable lengths.
>>> <
>>> A minor "good to have" feature is :: d) the determination of length does not
>>> take very many gates (in my case this is 40 total gates and 4 gates of delay);
>>> which is to be compared to 10-gates of your standard flip-flop to hold a
>>> single bit.
>>> <
>> As noted, my cases are:
>> 16-bit op
>> 32-bit op
>> 32-bit ops daisy-chained together.
>>
>> The 16-bit ops are scalar-only in my case.
>>
>>
>> Early on, I had considered mixed 16/32 bundles, but it didn't seem worth
>> the cost/complexity, and I then redesigned the mechanism to only allow
>> 32-bit words in bundles.
> <
> I rejected 16-bit OpCodes because they eat too much entropy in the encoding
> space while returning so little advantage.

I started out with 16-bit opcodes.

So, the development history can be divided into several major iterations:

A:
* 16/32/48
* No predicated ops.
* Different planned design for WEX.
* The 16-bit ops were primary, with the 32-bit ops as an overflow case.

Originally, the 32-bit ops were imagined as prefix-encodings that would
extend the 16-bit op (similar to BJX1, which was imagined effectively as
a prefix-extended version of the 16-bit SuperH instructions).

Say:
Aiii-ZZnm //OP Rm, Imm12u, Rn (R0..R15 only)
Biii-ZZnm //OP Rm, Imm12n, Rn (R0..R15 only)
FZeo-ZZnm //OP Rm, Ro, Rn (R0..R31)
FZei-ZZnm //OP Rm, Imm5, Rn

WEX would have been encoded using a prefix op, followed by several
suffix instructions. One encoding would have been 80-bit bundles with 3x
logical 24-bit instructions, but there were multiple bundle formats.

B: 16/32/48.
* Redesigned the instruction encodings.
* The 32-bit encodings became the primary ISA.
* Gained predicated ops, and the current WEX encoding scheme.

C: Still 16/32/48.
* More encoding tweaks;
** F0nm-ZeoZ, F1nm-Zeii, ...
* Gained an early form of Jumbo encodings;
** Originally encoded as WEX'ed branch ops.
** So: F4ii-Ciii .. F4ii-Fiii
* Added the PrWEX encodings.
* (First working Verilog implementation was this version, ~ 3 years ago).

D: Now 16/32, dropped original 48-bit encodings.
* Reworked the encoding for Jumbo prefixes.
** FEii-iiii / FFii-iiii
** The original Op48 encoding space was eaten by the Jumbo prefixes.
* Experimented with other changes to the encoding rules.
* Gained R32..R63.

Version D effectively represents the current version of the ISA.
While it is still kind of messy (and some combinations of features can't
be encoded on the same instruction at the same time), as of yet I
haven't figured out any good way to improve on it.
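
As a rough illustration of the version-D layout (a sketch, not the actual decoder): only the FE/FF Jumbo prefix patterns come from the list above. The rules that any other first word with a top nibble of F starts a 32-bit op, and that everything else is a 16-bit op, are assumptions made for the example, as is the name `classify_first_word`.

```python
def classify_first_word(word: int) -> str:
    """Classify a 16-bit first instruction word (hypothetical rule set)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit word")
    if (word >> 8) in (0xFE, 0xFF):
        return "jumbo-prefix"  # FEii-iiii / FFii-iiii, per the list above
    if (word >> 12) == 0xF:
        return "op32"          # assumption: other F-prefixed words start a 32-bit op
    return "op16"              # assumption: everything else is a 16-bit op


if __name__ == "__main__":
    for w in (0xFE01, 0xF804, 0x1234):
        print(f"{w:04X} -> {classify_first_word(w)}")
```

The point is only that the length decision depends on the first 16-bit word alone, so it can be made before looking at any later word.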

The version letter advances whenever there is an ISA-level design change
sufficient to break binary compatibility with prior code (once this
happens, I usually pile on any other "wishlist" breaking changes along
with the changes that required breaking compatibility).

> <snip>
>>
>> As-is, the 64 and 96 bit instruction formats are basically a hacked case
>> of the bundle format, with the Jumbo Prefixes serving to augment the
>> following instruction rather than being decoded as an instruction in
>> their own right.
>>
>> So, say, 3x 32-bit decoders exist:
>> A=(31:0), B=(63:32), C=(95:64)
>> With C being fed some of the bits from B, and B being fed some of the
>> bits from A (so, B can see that A|B are a Jumbo/Op64 op, and C can see
>> that A|B|C is a Jumbo96 Op).
> <
> In my case, the 3 "parsers" do not share any bits, but after the parsers
> have identified instruction boundaries, Find First circuitry finalizes what
> instructions are fed into DECODE.
> <
> When one looks at my pipeline, one sees that SRAMs exist between flip-
> flops, so all hit and set verification logic of the cache exists AFTER the
> flip-flops; in effect, you don't even see the instructions (or data) until
> 6-gates after the flip-flops present the next set of instructions. This is too
> late to access the register file and perform forwarding on the heavily loaded
> operand and result busses. So adding the PARSE stage would have happened
> in any event.

OK.

>>
>> The outer decoder then maps these decoder outputs onto the 3 lanes,
>> depending on the length of the bundle (with some special handling for
>> 128-bit SIMD ops, ...).
> <
> My PARSE stage looks at the function units and register requirements of the
> 'present' instructions and presents up to 3 (2 most of the time) to DECODE
> which does most of what people associate with your typical decode stage.
> One special thing PARSE does is to configure the register fields to the register
> ports and considers a pending store that needs a value in the register file--
> congruent to the 3R1W configuration of the RF.

My decode stage basically maps the instruction(s) onto a 6R3W register
file, with 3x 33-bit immediate fields, ...

There are also a few RISC-V decoders thrown in the mix as well, since
RISC-V can also be mapped onto my pipeline (just the idea in this case
is to try to map RISC-V as a superscalar; which requires prefix/suffix
status flag mapping, and logic to check for register clashes).

Had originally considered handling RISC-V bundling explicitly in
software, but this had its own issues (mostly dropped).

>>
>>
>> For a 96-bit Imm64 output:
>> A outputs NOP
>> B outputs NOP, with the high 32-bits of the Immed;
>> C outputs the instruction, with the low 32 bits of the Immed.
> <
> Seems backwards for Little Endian; A should always contain the instruction.
> B and C never more than constants.

While the 16-bit words are little-endian, the ISA design itself handles
bundles and 32-bit instruction decoding as effectively big-endian.

So, A/B/C decoders, and Lane 1/2/3 ordering, are effectively in reverse
order relative to each other, and the higher bits of immediate fields
precede the low-order bits.

So, say:
MOV 0x123456789ABCDEF, R4
Would be expressed as, say:
FE01-2345-FE67-89AB-F804-CDEF

Or, as bytes:
01 FE 45 23 67 FE AB 89 04 F8 EF CD
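
A small Python sketch of the two views of the MOV example above. The byte serialization (each 16-bit word stored low byte first) follows directly from the listing. The immediate reassembly is an assumption inferred from this one example: each FE prefix is taken to carry 24 immediate bits and the final op's second word the low 16 bits. The function names are made up for illustration.

```python
def words_to_bytes(encoding: str) -> str:
    """Serialize dash-separated 16-bit hex words as little-endian byte pairs."""
    out = []
    for w in encoding.split("-"):
        value = int(w, 16)
        out.append(f"{value & 0xFF:02X}")         # low byte of the word first
        out.append(f"{(value >> 8) & 0xFF:02X}")  # then the high byte
    return " ".join(out)


def imm64_from_jumbo96(encoding: str) -> int:
    """Reassemble the 64-bit immediate (assumed 24 + 24 + 16 bit split)."""
    words = [int(w, 16) for w in encoding.split("-")]
    if len(words) != 6:
        raise ValueError("expected six 16-bit words (96 bits)")
    high = ((words[0] & 0xFF) << 16) | words[1]  # bits from the first FE prefix
    mid = ((words[2] & 0xFF) << 16) | words[3]   # bits from the second FE prefix
    low = words[5]                               # low 16 bits in the final op
    return (high << 40) | (mid << 16) | low


if __name__ == "__main__":
    enc = "FE01-2345-FE67-89AB-F804-CDEF"
    print(words_to_bytes(enc))
    print(hex(imm64_from_jumbo96(enc)))
```

Under those assumptions the high-order immediate bits come first in the instruction stream, which is the "effectively big-endian" ordering described above.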

If I were to do a new ISA design, I might consider redesigning the
instructions to be "properly" little endian.

For decoding, this is what seemed to make the most sense at the time.

>>
>> C realizes it was part of a 96-bit bundle, so it reads from a special
>> 'JIMM' register, which the Register-Fetch mechanism understands as "glue
>> the two immediate fields together" and return this as the immediate.
>>
>>
>> Dealing with a 128-bit bundle would require additional hacks, such as a
>> 'D' decoder, and some additional signal routing between the decoders, ...
>>
>> Likewise, one would also likely want:
>> Jumbo, Op2, Jumbo, Op1
>> To decode with Op1 and Op2 on Lanes 1 and 2 rather than Lanes 1 and 3;
>> so the mapping from Decoders to Lanes will also need to deal with Jumbo
>> prefixes in this case (and/or make it so that Lane 3 can execute all the
>> same instructions as Lane 2).
>>
>>
>> It is for some related reasons why I have not (as of yet) reintroduced
>> 48-bit encodings. Even if not difficult per-se, the gains (relative to
>> their costs; or relative to "just use Op64 encodings or similar") are
>> small enough, and the costs high enough, that they are not really likely
>> to be a win.
>>>>
>>>> Partly it is a case that one either needs decoders of every width
>>>> for every possible position (not cheap), and/or needs to have a lot of
>>>> extra signal routing to a smaller number of decoders (also not cheap, IME).
>>> <
>>> When it only takes 40-gates to decode an entire word, but 320-gates to
>>> hold that word, one can afford to simply have these gates hanging off
>>> the IB flip-flops. That is, it would take more gates to store the output
>>> of the 40-gate decoder than to statically just hang the gates off the already
>>> present 320-gates.
> <
>> Dunno. Decoders are "not too expensive" but "not exactly free" either.
>> So, say, 4 decoders, is a lot cheaper than, say, 15 (8x 16b + 4x 32b +
>> 3x 32b). Trying to MUX the inputs to save decoders won't help, and N-way
>> MUX'ing the outputs isn't so great either.
> <
> These are not "real" decoders, they are simple pattern recognizers that
> prepare instructions for the 'real' decoders.


Re: Drastic Simplification of Concertina II Coming

<tn6lrb$26bba$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29485&group=comp.arch#29485

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Drastic Simplification of Concertina II Coming
Date: Mon, 12 Dec 2022 01:39:14 -0600
Organization: A noiseless patient Spider
Lines: 51
Message-ID: <tn6lrb$26bba$1@dont-email.me>
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<41fde666-eb3c-4c6b-9ec4-ef94527fcb31n@googlegroups.com>
<0d16f36f-c4a8-42f7-8236-e60776809363n@googlegroups.com>
<tl8gnj$3hf5b$1@newsreader4.netcologne.de> <tmm11f$24lm$1@dont-email.me>
<tn07j8$loid$3@newsreader4.netcologne.de> <tn2c1s$1melj$1@dont-email.me>
<79f08afe-6dfc-41c6-8d10-750dd6d29dd6n@googlegroups.com>
<tn5rfi$21p1j$1@dont-email.me>
<321b3955-e217-49ff-96c4-449fa6f77ffbn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 12 Dec 2022 07:39:23 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="5dcfdf9bf0034ea7fac3df0a6d377a34";
logging-data="2305386"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/7zn0RLv1AoLnF4xIH/+2t"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:AI0IoM0YIEG/pHqvNOkCouV9Rtw=
Content-Language: en-US
In-Reply-To: <321b3955-e217-49ff-96c4-449fa6f77ffbn@googlegroups.com>
 by: BGB - Mon, 12 Dec 2022 07:39 UTC

On 12/11/2022 6:50 PM, MitchAlsup wrote:
> On Sunday, December 11, 2022 at 6:09:25 PM UTC-6, Stephen Fuld wrote:
>> On 12/11/2022 3:24 AM, Michael S wrote:
>
>>>
>>> Except for the absolute bottom end of general-purpose CPUs, the instruction
>>> fetch ports are *at least* 16B=128b wide. So, after jump to non-16B-aligned
>>> target, extra read is going to happen regardless of presence or absence of
>>> the bundles.
> <
>> IANAHG, but that depends upon whether the 16 byte transfer from the
>> cache to the instruction fetch buffer must be aligned to a 16 byte cache
>> boundary or not. I was assuming not. If that assumption is wrong, then
>> I will have to reconsider.
> <
> In the SRAM itself, they are always aligned. To get any semblance of unaligned
> access, you have to access 2 SRAMs and mux them together (somehow)

Yes.

My L1 caches are organized as "even" and "odd" pairs of 16-byte cache lines.

So, the cache figures out (for a given address), which Even and which
Odd line to fetch, then patches the bits together and extracts a value
based on the low-order bits of the address.
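
A minimal software model of that even/odd lookup, assuming a flat little-endian byte array standing in for the cache contents and a fixed 64-bit extract. The name `fetch64` and the byte-array model are illustrative only, not how the Verilog is written.

```python
LINE_BYTES = 16


def fetch64(memory: bytes, addr: int) -> int:
    """Read 64 bits at any byte address using one even and one odd line."""
    line = addr // LINE_BYTES
    # Of 'line' and 'line + 1', exactly one is even and one is odd, so the
    # two banks can be read side by side.
    even_idx = line if line % 2 == 0 else line + 1
    odd_idx = line + 1 if line % 2 == 0 else line
    even = memory[even_idx * LINE_BYTES:(even_idx + 1) * LINE_BYTES]
    odd = memory[odd_idx * LINE_BYTES:(odd_idx + 1) * LINE_BYTES]
    # Patch the lines together in address order, then extract by low bits.
    window = even + odd if line % 2 == 0 else odd + even
    offset = addr % LINE_BYTES
    return int.from_bytes(window[offset:offset + 8], "little")


if __name__ == "__main__":
    mem = bytes(range(64))
    print(hex(fetch64(mem, 12)))  # straddles an even line into an odd line
    print(hex(fetch64(mem, 28)))  # straddles an odd line into an even line
```

Either ordering of the pair works, which is why a misaligned access that crosses a line boundary costs nothing extra in this arrangement.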

For the L1 D$, for loads, the last step of this process is to sign or
zero extend the value to the requested size (say, internally, it always
fetches 64 bits for a Load, but then this is extended for everything
smaller than 64 bits).
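
The extension step, as a short sketch. It assumes the raw 64-bit fetch is already in hand and that the result is returned as a 64-bit register pattern; `extend_load` is a made-up name.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def extend_load(raw64: int, size_bytes: int, signed: bool) -> int:
    """Sign- or zero-extend the low size_bytes of a 64-bit fetch to 64 bits."""
    if size_bytes not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    bits = size_bytes * 8
    value = raw64 & ((1 << bits) - 1)      # keep only the requested width
    if signed and value & (1 << (bits - 1)):
        value -= 1 << bits                 # replicate the sign bit upward
    return value & MASK64                  # 64-bit register pattern


if __name__ == "__main__":
    raw = 0x123456789ABCDEF0
    print(hex(extend_load(raw, 1, signed=True)))
    print(hex(extend_load(raw, 1, signed=False)))
```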

For 128-bit Load/Store, one can just sort of sidestep the normal
extract/insert logic, but this imposes an alignment restriction (with
misaligned access requiring a pair of 64-bit load or store operations).

Similar goes for the instruction cache in my case.
Note that internally, the L2 cache uses 512-bit lines (accessed
essentially as 4 rows of 128 bit lines, or as 512 bits on the RAM-facing
side).

A single row of lines could deal with aligned-only access.
Main selling point is that it could make the L1 caches a little cheaper.
Obvious drawback is that, if one needs to fake misaligned memory access
in software, this is kinda lame.

Either this, or add a mechanism to internally transform a misaligned
access into two accesses inside the L1 cache (so that misaligned access
still works, but imposes a speed penalty).
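
A sketch of that last option, assuming 16-byte lines and accesses no larger than one line. `split_access` is a hypothetical helper that returns one piece when the access fits in a line and two when it straddles a boundary, which is where the speed penalty would come from.

```python
def split_access(addr: int, size: int, line_bytes: int = 16):
    """Return (line_base, offset, nbytes) pieces covering [addr, addr + size)."""
    if not 0 < size <= line_bytes:
        raise ValueError("size must be between 1 and line_bytes")
    offset = addr % line_bytes
    base = addr - offset
    if offset + size <= line_bytes:
        return [(base, offset, size)]            # fits in one aligned line
    first = line_bytes - offset                  # bytes left in the first line
    return [(base, offset, first),
            (base + line_bytes, 0, size - first)]


if __name__ == "__main__":
    print(split_access(0x100, 8))  # aligned: one access
    print(split_access(0x10C, 8))  # straddles a line: two accesses
```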

Re: Drastic Simplification of Concertina II Coming

<i_JlL.20907$vBI8.5793@fx15.iad>

https://www.novabbs.com/devel/article-flat.php?id=29494&group=comp.arch#29494

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx15.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Drastic Simplification of Concertina II Coming
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com> <41fde666-eb3c-4c6b-9ec4-ef94527fcb31n@googlegroups.com> <0d16f36f-c4a8-42f7-8236-e60776809363n@googlegroups.com> <tl8gnj$3hf5b$1@newsreader4.netcologne.de> <tmm11f$24lm$1@dont-email.me> <tn07j8$loid$3@newsreader4.netcologne.de> <tn2c1s$1melj$1@dont-email.me> <79f08afe-6dfc-41c6-8d10-750dd6d29dd6n@googlegroups.com> <tn5rfi$21p1j$1@dont-email.me> <321b3955-e217-49ff-96c4-449fa6f77ffbn@googlegroups.com>
In-Reply-To: <321b3955-e217-49ff-96c4-449fa6f77ffbn@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 44
Message-ID: <i_JlL.20907$vBI8.5793@fx15.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 12 Dec 2022 17:59:42 UTC
Date: Mon, 12 Dec 2022 12:59:32 -0500
X-Received-Bytes: 3319
 by: EricP - Mon, 12 Dec 2022 17:59 UTC

MitchAlsup wrote:
> On Sunday, December 11, 2022 at 6:09:25 PM UTC-6, Stephen Fuld wrote:
>> On 12/11/2022 3:24 AM, Michael S wrote:
>
>>> Except for the absolute bottom end of general-purpose CPUs, the instruction
>>> fetch ports are *at least* 16B=128b wide. So, after jump to non-16B-aligned
>>> target, extra read is going to happen regardless of presence or absence of
>>> the bundles.
> <
>> IANAHG, but that depends upon whether the 16 byte transfer from the
>> cache to the instruction fetch buffer must be aligned to a 16 byte cache
>> boundary or not. I was assuming not. If that assumption is wrong, then
>> I will have to reconsider.
> <
> In the SRAM itself, they are always aligned. To get any semblance of unaligned
> access, you have to access 2 SRAMs and mux them together (somehow)

For better or worse, the design I came up with for a dual instruction
parser was to have 3 16-byte aligned buffers A,B,C each with a virtual
IP address tag. The idea is that it can be fetching from two buffers due
to a fetch buffer straddle while the third buffer is filling ahead.
A little LRU tracker rotates the selection of the victim buffer.
This also allows it to execute a small loop out of two buffers.
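
A toy model of the three tagged buffers with an LRU victim, just to show why a loop spanning two buffers keeps hitting while the third is refilled. The class and method names are invented, and the tag here is simply the 16-byte line address.

```python
class FetchBuffers:
    """Three tagged fetch buffers with least-recently-used replacement."""

    def __init__(self):
        self.tags = [None, None, None]
        self.lru = [0, 1, 2]            # front = least recently used

    def _touch(self, index: int) -> None:
        self.lru.remove(index)
        self.lru.append(index)          # most recently used goes last

    def lookup(self, line_addr: int):
        """Return the buffer index holding line_addr, or None on a miss."""
        for index, tag in enumerate(self.tags):
            if tag == line_addr:
                self._touch(index)
                return index
        return None

    def fill(self, line_addr: int) -> int:
        """Load line_addr into the LRU victim unless it is already present."""
        hit = self.lookup(line_addr)
        if hit is not None:
            return hit
        victim = self.lru[0]
        self.tags[victim] = line_addr
        self._touch(victim)
        return victim


if __name__ == "__main__":
    fb = FetchBuffers()
    for line in (0x100, 0x110, 0x120):
        fb.fill(line)
    fb.lookup(0x100)
    fb.lookup(0x110)                    # loop body lives in two buffers
    print(fb.fill(0x130))               # victim is the buffer not in the loop
```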

For interrupts it allows fetch to be pre-loading a third buffer with the
handler code while it continues to draw from the other two buffers,
thereby minimizing fetch buffer wastage and pipeline bubbles.
When the handler buffer is ready, fetch injects a special Interrupt uOp
into the stream, switches to draw from that buffer, and begins to
prefetch into the other buffers.

The IP address bits [63:4] and its increment selects two 16-byte
buffers and gates them onto two buses. The low IP bits drive an 8:1 mux
to right shift the primary instruction word into the first position.
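
In software terms the selection and shift could look like the sketch below. It assumes 2-byte instruction granularity (inferred from an 8:1 mux over a 16-byte buffer), models the buffers as a dict keyed by IP bits [63:4], and requires both buffers to be present. `align_window` is a hypothetical name.

```python
BUF_BYTES = 16


def align_window(buffers: dict, ip: int) -> bytes:
    """Return 16 bytes starting at ip, drawn from two adjacent buffers."""
    tag = ip >> 4                     # IP bits [63:4] select the first buffer
    primary = buffers[tag]
    secondary = buffers[tag + 1]      # the incremented tag selects the second
    select = (ip >> 1) & 0x7          # IP bits [3:1]: one of 8 positions
    window = primary + secondary      # the two buses side by side
    start = select * 2                # each mux position is 2 bytes (assumed)
    return window[start:start + BUF_BYTES]


if __name__ == "__main__":
    bufs = {0x10: bytes(range(16)), 0x11: bytes(range(16, 32))}
    print(align_window(bufs, 0x10A).hex())
```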

The instruction length is extracted from the first instruction word
and passed to a special carry-select adder to generate the second IP.
It pre-generates two sets of high address bits, then uses the carry out
of the low bit add to select the same or incremented high bits.
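
The carry-select idea can be checked with a few lines of Python. This assumes a 64-bit IP split at bit 4 and an instruction length of at most 16 bytes, so the low add produces at most one carry. `second_ip` is an illustrative name.

```python
HIGH_MASK = (1 << 60) - 1


def second_ip(ip: int, length_bytes: int) -> int:
    """Compute ip + length using a carry-select on the high 60 bits."""
    if not 0 < length_bytes <= 16:
        raise ValueError("length must be 1..16 bytes")
    high = ip >> 4
    high_same = high                      # candidate if the low add has no carry
    high_inc = (high + 1) & HIGH_MASK     # candidate if it does (pre-generated)
    low_sum = (ip & 0xF) + length_bytes   # only the short low add is on the path
    carry = low_sum >> 4
    chosen = high_inc if carry else high_same
    return (chosen << 4) | (low_sum & 0xF)


if __name__ == "__main__":
    print(hex(second_ip(0x1008, 4)))  # stays within the same 16-byte buffer
    print(hex(second_ip(0x100C, 4)))  # carries into the next buffer
```

Both high-bit candidates exist before the length is known, so only the 4-bit add and a mux sit behind the length extraction.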

The second instruction is accessed and aligned through an identical
second set of buses and 8:1 shifter muxes.

Re: Drastic Simplification of Concertina II Coming

<tnacas$2idts$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=29507&group=comp.arch#29507

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Drastic Simplification of Concertina II Coming
Date: Tue, 13 Dec 2022 09:21:32 -0800
Organization: A noiseless patient Spider
Lines: 36
Message-ID: <tnacas$2idts$1@dont-email.me>
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<41fde666-eb3c-4c6b-9ec4-ef94527fcb31n@googlegroups.com>
<0d16f36f-c4a8-42f7-8236-e60776809363n@googlegroups.com>
<tl8gnj$3hf5b$1@newsreader4.netcologne.de> <tmm11f$24lm$1@dont-email.me>
<tn07j8$loid$3@newsreader4.netcologne.de> <tn2c1s$1melj$1@dont-email.me>
<79f08afe-6dfc-41c6-8d10-750dd6d29dd6n@googlegroups.com>
<tn5rfi$21p1j$1@dont-email.me>
<321b3955-e217-49ff-96c4-449fa6f77ffbn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 13 Dec 2022 17:21:32 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="290359e47c9972c2695ec7ddc93224db";
logging-data="2701244"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18vM6H84iHD06xxRjugNZ0ulQz0RGkkQyQ="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.5.1
Cancel-Lock: sha1:Aal7WdecuB703aCCCSG8duq6Z5U=
Content-Language: en-US
In-Reply-To: <321b3955-e217-49ff-96c4-449fa6f77ffbn@googlegroups.com>
 by: Stephen Fuld - Tue, 13 Dec 2022 17:21 UTC

On 12/11/2022 4:50 PM, MitchAlsup wrote:
> On Sunday, December 11, 2022 at 6:09:25 PM UTC-6, Stephen Fuld wrote:
>> On 12/11/2022 3:24 AM, Michael S wrote:
>
>>>
>>> Except for the absolute bottom end of general-purpose CPUs, the instruction
>>> fetch ports are *at least* 16B=128b wide. So, after jump to non-16B-aligned
>>> target, extra read is going to happen regardless of presence or absence of
>>> the bundles.
> <
>> IANAHG, but that depends upon whether the 16 byte transfer from the
>> cache to the instruction fetch buffer must be aligned to a 16 byte cache
>> boundary or not. I was assuming not. If that assumption is wrong, then
>> I will have to reconsider.
> <
> In the SRAM itself, they are always aligned. To get any semblance of unaligned
> access, you have to access 2 SRAMs and mux them together (somehow)

Thanks. So I take it that on the MY 66000, if there is a branch from
"somewhere" to what is the last instruction in a bufferful, you load
the entire buffer, including all the unused instructions prior to the
one you want into the fetch buffer. Correct?

If so, then I retract my earlier argument about instruction bundles
requiring extra data transfer.

So, then another question, given the above, and your expressed position
that 36 bits was the "ideal" instruction length, why did you choose 32
bits for the instruction size in MY 66000? I get that actually using
only 36 would cause some wasted bits, but is there some other reason?

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Drastic Simplification of Concertina II Coming

<57d9e79f-75ee-46ea-8d0e-1678e767ca6dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=29509&group=comp.arch#29509

X-Received: by 2002:a05:622a:1b25:b0:3a5:7dd1:31e0 with SMTP id bb37-20020a05622a1b2500b003a57dd131e0mr59559325qtb.57.1670961184075;
Tue, 13 Dec 2022 11:53:04 -0800 (PST)
X-Received: by 2002:aca:1114:0:b0:35b:edc8:a54a with SMTP id
20-20020aca1114000000b0035bedc8a54amr354549oir.79.1670961183826; Tue, 13 Dec
2022 11:53:03 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 13 Dec 2022 11:53:03 -0800 (PST)
In-Reply-To: <tnacas$2idts$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:d63:36fb:50ef:7088;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:d63:36fb:50ef:7088
References: <0dc2de98-4b3b-47dc-b9f6-e4f40a5ba6cbn@googlegroups.com>
<41fde666-eb3c-4c6b-9ec4-ef94527fcb31n@googlegroups.com> <0d16f36f-c4a8-42f7-8236-e60776809363n@googlegroups.com>
<tl8gnj$3hf5b$1@newsreader4.netcologne.de> <tmm11f$24lm$1@dont-email.me>
<tn07j8$loid$3@newsreader4.netcologne.de> <tn2c1s$1melj$1@dont-email.me>
<79f08afe-6dfc-41c6-8d10-750dd6d29dd6n@googlegroups.com> <tn5rfi$21p1j$1@dont-email.me>
<321b3955-e217-49ff-96c4-449fa6f77ffbn@googlegroups.com> <tnacas$2idts$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <57d9e79f-75ee-46ea-8d0e-1678e767ca6dn@googlegroups.com>
Subject: Re: Drastic Simplification of Concertina II Coming
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 13 Dec 2022 19:53:04 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 4444
 by: MitchAlsup - Tue, 13 Dec 2022 19:53 UTC

On Tuesday, December 13, 2022 at 11:21:35 AM UTC-6, Stephen Fuld wrote:
> On 12/11/2022 4:50 PM, MitchAlsup wrote:
> > On Sunday, December 11, 2022 at 6:09:25 PM UTC-6, Stephen Fuld wrote:
> >> On 12/11/2022 3:24 AM, Michael S wrote:
> >
> >>>
> >>> Except for the absolute bottom end of general-purpose CPUs, the instruction
> >>> fetch ports are *at least* 16B=128b wide. So, after jump to non-16B-aligned
> >>> target, extra read is going to happen regardless of presence or absence of
> >>> the bundles.
> > <
> >> IANAHG, but that depends upon whether the 16 byte transfer from the
> >> cache to the instruction fetch buffer must be aligned to a 16 byte cache
> >> boundary or not. I was assuming not. If that assumption is wrong, then
> >> I will have to reconsider.
> > <
> > In the SRAM itself, they are always aligned. To get any semblance of unaligned
> > access, you have to access 2 SRAMs and mux them together (somehow)
<
> Thanks. So I take it that on the MY 66000, if there is a branch from
> "somewhere" to what is the last instruction in a bufferful, you load
> the entire buffer, including all the unused instructions prior to the
> one you want into the fetch buffer. Correct?
<
Yes, the aligned buffer is read and only one container is used as an instruction.
The subsequent buffer is read in the succeeding cycle. Both are put in the IB
so that if something might loop back they can avoid a I$ fetch.
<
What this strategy does is make holes in the fetch pattern so that if one of the
fetched instructions is a branch, one can use a hole to fetch the target of that
branch before the branch is decoded--thus eliminating the perceived need of
a delay slot.
>
> If so, then I retract my earlier argument about instruction bundles
> requiring extra data transfer.
>
> So, then another question, given the above, and your expressed position
> that 36 bits was the "ideal" instruction length, why did you choose 32
> bits for the instruction size in MY 66000? I get that actually using
> only 36 would cause some wasted bits, but is there some other reason?
<
Practicality and code density. The Ideal size of 36 bits enables some encodings
that are not possible in 32-bits, but not needed "all that often"; while everyone
and his brothers have accepted 8-bit bytes ad infinitum.
<
Consider the English language but limit it to 2,000 words--most people could
converse quite normally. Now, returning to ISA encoding: is making it possible
to encode the required semantics of the application not all that is really
necessary?
<
And then there is the practicality where the instruction cache width is congruent
to the width of all subsequent caches in the memory hierarchy.
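
A quick arithmetic check of the wasted-bits point, assuming the 128-bit (16-byte) fetch width mentioned earlier in the thread. `pack` is just a helper name for the example.

```python
def pack(line_bits: int, insn_bits: int):
    """Return (whole instructions per line, leftover bits)."""
    return divmod(line_bits, insn_bits)


if __name__ == "__main__":
    for width in (32, 36):
        count, waste = pack(128, width)
        print(f"{width}-bit: {count} per 128-bit line, {waste} bits left over")
```

With 32-bit instructions a 128-bit line holds four with nothing left over. With 36-bit instructions it holds three and leaves 20 bits unused, unless instructions are allowed to straddle lines.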
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)
