devel / comp.arch / Re: Why My 66000 is and is not RISC

Re: Why My 66000 is and is not RISC

<t95rvl$245$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26085&group=comp.arch#26085
 by: Ivan Godard - Sat, 25 Jun 2022 02:30 UTC

On 6/24/2022 6:36 PM, MitchAlsup wrote:
> On Friday, June 24, 2022 at 7:55:52 PM UTC-5, BGB wrote:
>> On 6/24/2022 3:02 PM, MitchAlsup wrote:
>
>>> With <realistically> 30 64-bit registers in use by compiler and 16 of these preserved,
>>> I am not seeing very much caller-save register traffic from Brian's LLVM port. It is more
>>> like R9-R15 are simply temps used whenever and forgotten.
>> That is presumably how it is supposed to be...
>>
>>
>> In my case, it is roughly a 50/50 split between caller save (scratch)
>> and callee save (preserved) registers.
> <
> I, too, have 50%/50%:: R1-15 are temps, R16-30 are preserved.
> R0 receives Return Address, R31 is Stack Pointer. ½ of the temps
> can be used to carry arguments and results covering the 98%-ile.
>>
>> For leaf functions, one wants a lot of scratch registers, and for
>> non-leaf functions, a lot of callee-save registers.
>>
>> But, sadly, no party can be entirely happy:
>> Leaf functions wishing they could have more registers to play with,
>> without needing to save them first;
>> Non-leaf functions wishing they could have more registers for variables
>> which won't get stomped on the next call;
>> ...
>>
>>
>> Can note that, IIRC:
>> Win64 gave a bigger part of this pie to callee-save;
>> SysV/AMD64 gave a bigger part of the pie to caller-save.
> <
> CRAY-1 had only temp registers at the call/return interface. (Lee Higbe circa 1990)
> IBM 360 had only preserved registers.
> VAX had only preserved registers--both had 16 registers.

And Mill has only preserved registers (from the view of the caller), but
you don't have to preserve them. And it has only temps (from the viewpoint
of the callee), but you don't have to clear them.
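
A minimal C sketch (not from the thread) of the trade-off being discussed:
a value that must survive an opaque call either lives in a callee-saved
(preserved) register, paid for once in the prologue/epilogue, or in a
caller-saved (scratch) register, spilled and reloaded around every call.
The function names are invented for illustration.

#include <stdio.h>

int helper(int x);            /* opaque call: may clobber every scratch register */

int non_leaf(int a, int b)
{
    int live = a * 2 + b;     /* live across the call: the compiler either keeps it
                                 in a preserved register (saved once in the prologue)
                                 or spills it to the stack around the call           */
    int t = helper(a);        /* short-lived: any scratch register will do          */
    return live + t;
}

int helper(int x) { return x + 1; }   /* trivial body so the sketch links and runs */

int main(void)
{
    printf("%d\n", non_leaf(3, 4));   /* prints 14 */
    return 0;
}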

Re: Why My 66000 is and is not RISC

<t96g62$ane$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26087&group=comp.arch#26087
 by: David Brown - Sat, 25 Jun 2022 08:14 UTC

On 24/06/2022 23:57, MitchAlsup wrote:
> On Friday, June 24, 2022 at 4:54:05 PM UTC-5, gg...@yahoo.com wrote:
>> Thomas Koenig <tko...@netcologne.de> wrote:
>>> Brett <gg...@yahoo.com> schrieb:
>>>
>>>> How big is the code store needed for an IOT (Internet Of Things smart
>>>> toaster) code stack? And what is the savings for the next size down?
>>>
>>> It will be hard to beat an ARM Cortex-M - based microcontroller
>>> which are firmly embedded in the market, and for which a lot of
>>> software has been written, and which cost a bit more than four
>>> dollars per unit.
>>>
>>> And if that's too expensive and you do not need the performance,
>>> you can always use a MSP430-based one for considerably less,
>>> less than a dollar at quantity.
>>>
>>> The ROM on the latter is somewhere between 1KB and whatever you're
>>> willing to pay for, and the RAM 256 bytes or more. But of course
>>> you're still getting some analog hardware thrown in, such as an
>>> ADC or a comparator.
>>>
>>> Not a lot of savings, I'd say.
>>>
>> You are missing the I in internet, no wifi I can find in that chip.
>>
>> Talking about a network stack to talk to your phone. Smart color changing
>> lightbulbs and soon all the appliances in your home, washer, dryer, stove,
>> microwave, thermostat, security cameras, just everything.
>>
>> Plus your home router, which uses a much more powerful wifi block and CPU.
>>
>> There are markets here that will pay for better code density, assuming a
>> network stack is significant?
> <
> I don't see it:: a 10G or 100G network interface already has a memory footprint
> (for its own buffering concerns) that skimping on the CPU and ROM seems a
> waste.

Only a /very/ tiny proportion of network interfaces are 10 G or above.
Microcontrollers rarely have more than 100 Mbit Ethernet. The next big
thing in wired networking in the embedded world is two-wire Ethernet, to
get the convenience of Ethernet networking at low cost. It comes in 10
Mb and 100 Mb varieties (a 1 Gb variety may come eventually).

For wireless communication, speeds are usually even lower. Modern NB-IoT
cellular systems are designed to be extremely low power and cheap, and to
have longer range (20 km more than 3G and the rest). You send packets of up
to about 200 bytes of data, perhaps once a day, with a delivery time of
several seconds. Perfect for environmental monitoring, finding your
sheep, and many other tasks.

For local Wifi (or Zigbee, Z-Wave, etc.) devices, small and low
bandwidth is also fine. You can get away with a few hundred bytes of RAM
and still have enough to control a lightbulb, thermostat, etc.

The IoT world is /full/ of systems running on 8-bit AVRs, 16-bit
MSP430s, and other small devices. Code density matters for many of them.

(Of course it's a different matter for wireless cameras and all the
other devices that need high bandwidth.)
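
As a rough illustration of the "up to about 200 bytes, perhaps once a day"
usage pattern described above, here is a C sketch of a fixed-layout uplink
report; the field names and sizes are invented, not taken from any NB-IoT
profile, and the packing uses host byte order (a real protocol would pin
the endianness).

#include <stdint.h>
#include <string.h>
#include <stdio.h>

struct sensor_report {
    uint32_t device_id;
    uint32_t uptime_s;
    int16_t  temp_centi_c;     /* temperature in 0.01 degC steps          */
    uint16_t battery_mv;
    int32_t  lat_microdeg;     /* coarse position, e.g. for sheep tracking */
    int32_t  lon_microdeg;
};

static size_t pack_report(const struct sensor_report *r, uint8_t *buf)
{
    /* Copy field-by-field so the wire layout does not depend on the
       compiler's struct padding. */
    size_t off = 0;
    memcpy(buf + off, &r->device_id, 4);     off += 4;
    memcpy(buf + off, &r->uptime_s, 4);      off += 4;
    memcpy(buf + off, &r->temp_centi_c, 2);  off += 2;
    memcpy(buf + off, &r->battery_mv, 2);    off += 2;
    memcpy(buf + off, &r->lat_microdeg, 4);  off += 4;
    memcpy(buf + off, &r->lon_microdeg, 4);  off += 4;
    return off;                              /* 20 bytes, far under 200 */
}

int main(void)
{
    struct sensor_report r = { 1234, 86400, 2150, 3600, 59123456, 10654321 };
    uint8_t buf[64];
    printf("payload bytes: %zu\n", pack_report(&r, buf));
    return 0;
}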

Re: Why My 66000 is and is not RISC

<t97aia$a1t$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26088&group=comp.arch#26088
 by: BGB - Sat, 25 Jun 2022 15:45 UTC

On 6/25/2022 3:14 AM, David Brown wrote:
> On 24/06/2022 23:57, MitchAlsup wrote:
>> On Friday, June 24, 2022 at 4:54:05 PM UTC-5, gg...@yahoo.com wrote:
>>> Thomas Koenig <tko...@netcologne.de> wrote:
>>>> Brett <gg...@yahoo.com> schrieb:
>>>>
>>>>> How big is the code store needed for an IOT (Internet Of Things smart
>>>>> toaster) code stack? And what is the savings for the next size down?
>>>>
>>>> It will be hard to beat an ARM Cortex-M - based microcontroller
>>>> which are firmly embedded in the market, and for which a lot of
>>>> software has been written, and which cost a bit more than four
>>>> dollars per unit.
>>>>
>>>> And if that's too expensive and you do not need the performance,
>>>> you can always use a MSP430-based one for considerably less,
>>>> less than a dollar at quantity.
>>>>
>>>> The ROM on the latter is somewhere between 1KB and whatever you're
>>>> willing to pay for, and the RAM 256 bytes or more. But of course
>>>> you're still getting some analog hardware thrown in, such as an
>>>> ADC or a comparator.
>>>>
>>>> Not a lot of savings, I'd say.
>>>>
>>> You are missing the I in internet, no wifi I can find in that chip.
>>>
>>> Talking about a network stack to talk to your phone. Smart color
>>> changing
>>> lightbulbs and soon all the appliances in your home, washer, dryer,
>>> stove,
>>> microwave, thermostat, security cameras, just everything.
>>>
>>> Plus your home router, which uses a much more powerful wifi block and
>>> CPU.
>>>
>>> There are markets here that will pay for better code density, assuming a
>>> network stack is significant?
>> <
>> I don't see it:: a 10G or 100G network interface already has a memory
>> footprint
>> (for its own buffering concerns) that skimping on the CPU and ROM seems a
>> waste.
>
> Only a /very/ tiny proportion of network interfaces are 10 G or above.
> Microcontrollers rarely have more than 100 Mbit Ethernet.  The next big
> thing in wired networking in the embedded world is two-wire Ethernet, to
> get the convenience of Ethernet networking at low cost.  It comes in 10
> Mb and 100 Mb varieties (a 1 Gb variety may come eventually).
>

Seems like it would also be fairly trivial to chop 10/100 Ethernet down
to a 4-wire variant as well, probably using RJ11 plugs or similar. An
advantage of 4-wire is that this could allow for PoE (and 4-wire
phone-wire could be cheaper than CAT5E or similar).

Could also be electrically compatible with existing hubs and switches
via an RJ11 to RJ45 adapter.

> For wireless communication, speeds are usually even lower.  Modern NBIOT
> cellular systems are designed to be extremely low power, cheap, have
> longer range (20 km more than 3G and the rest).  You send packets of up
> to about 200 bytes of data, perhaps once a day, with a delivery time of
> several seconds.  Perfect for environmental monitoring, finding your
> sheep, and many other tasks.
>
> For local Wifi (or Zigbee, Z-Wave, etc.) devices, small and low
> bandwidth is also fine.  You can get away with a few hundred bytes ram
> and still have enough to control a lightbulb, thermostat, etc.
>
> The IOT world is /full/ of systems running on 8-bit AVR's, 16-bit
> MSP430's, and other small devices.  Code density matters for many of them.
>
> (Of course it's a different matter for wireless cameras and all the
> other devices that need high bandwidth.)
>

I would have figured a network stack would have been a bit much for this
class of device...

Re: Why My 66000 is and is not RISC

<t97d80$kk6$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26089&group=comp.arch#26089
 by: David Brown - Sat, 25 Jun 2022 16:30 UTC

On 25/06/2022 17:45, BGB wrote:
> On 6/25/2022 3:14 AM, David Brown wrote:
>> On 24/06/2022 23:57, MitchAlsup wrote:
>>> On Friday, June 24, 2022 at 4:54:05 PM UTC-5, gg...@yahoo.com wrote:
>>>> Thomas Koenig <tko...@netcologne.de> wrote:
>>>>> Brett <gg...@yahoo.com> schrieb:
>>>>>
>>>>>> How big is the code store needed for an IOT (Internet Of Things smart
>>>>>> toaster) code stack? And what is the savings for the next size down?
>>>>>
>>>>> It will be hard to beat an ARM Cortex-M - based microcontroller
>>>>> which are firmly embedded in the market, and for which a lot of
>>>>> software has been written, and which cost a bit more than four
>>>>> dollars per unit.
>>>>>
>>>>> And if that's too expensive and you do not need the performance,
>>>>> you can always use a MSP430-based one for considerably less,
>>>>> less than a dollar at quantity.
>>>>>
>>>>> The ROM on the latter is somewhere between 1KB and whatever you're
>>>>> willing to pay for, and the RAM 256 bytes or more. But of course
>>>>> you're still getting some analog hardware thrown in, such as an
>>>>> ADC or a comparator.
>>>>>
>>>>> Not a lot of savings, I'd say.
>>>>>
>>>> You are missing the I in internet, no wifi I can find in that chip.
>>>>
>>>> Talking about a network stack to talk to your phone. Smart color
>>>> changing
>>>> lightbulbs and soon all the appliances in your home, washer, dryer,
>>>> stove,
>>>> microwave, thermostat, security cameras, just everything.
>>>>
>>>> Plus your home router, which uses a much more powerful wifi block
>>>> and CPU.
>>>>
>>>> There are markets here that will pay for better code density,
>>>> assuming a
>>>> network stack is significant?
>>> <
>>> I don't see it:: a 10G or 100G network interface already has a memory
>>> footprint
>>> (for its own buffering concerns) that skimping on the CPU and ROM
>>> seems a
>>> waste.
>>
>> Only a /very/ tiny proportion of network interfaces are 10 G or above.
>> Microcontrollers rarely have more than 100 Mbit Ethernet.  The next
>> big thing in wired networking in the embedded world is two-wire
>> Ethernet, to get the convenience of Ethernet networking at low cost.
>> It comes in 10 Mb and 100 Mb varieties (a 1 Gb variety may come
>> eventually).
>>
>
> Seems like it would also be fairly trivial to chop 10/100 Ethernet down
> to a 4-wire variant as well, probably using RJ11 plugs or similar.
> Advantage of 4-wire as that this could allow for POE (and 4-wire
> phone-wire could be cheaper than CAT5E or similar).

10 Mbps and 100 Mbps Ethernet already only use 4 wires - one pair in
each direction. Passing (non-isolated) DC power over these wires is
extremely simple, and requires nothing more than a few diodes and an LC
filter. Unfortunately, the PoE standards were developed by a committee
of morons that produced a ridiculously over-engineered system that is
too bulky and expensive to have caught on outside a few specific use-cases.

Even easier, however, is simply to pass the power over the spare pairs
in a standard 4-pair Ethernet cable.

The two-wire Ethernet standards already include support for simpler and
cheaper PoE solutions.

> Could also be electrically compatible with existing hubs and switches
> via an RJ11 to RJ45 adapter.
>
>
>> For wireless communication, speeds are usually even lower.  Modern
>> NBIOT cellular systems are designed to be extremely low power, cheap,
>> have longer range (20 km more than 3G and the rest).  You send packets
>> of up to about 200 bytes of data, perhaps once a day, with a delivery
>> time of several seconds.  Perfect for environmental monitoring,
>> finding your sheep, and many other tasks.
>>
>> For local Wifi (or Zigbee, Z-Wave, etc.) devices, small and low
>> bandwidth is also fine.  You can get away with a few hundred bytes ram
>> and still have enough to control a lightbulb, thermostat, etc.
>>
>> The IOT world is /full/ of systems running on 8-bit AVR's, 16-bit
>> MSP430's, and other small devices.  Code density matters for many of
>> them.
>>
>> (Of course it's a different matter for wireless cameras and all the
>> other devices that need high bandwidth.)
>>
>
> I would have figured a network stack would have been a bit much for this
> class of device...

I have a book on my shelf describing a TCP/IP stack for an 8-bit PIC
microcontroller.

However, the network stack needed for small Wifi or NB-IoT systems is
vastly smaller than what you need for a full TCP/IP stack.
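
To give a feel for how small the individual building blocks are, here is
the RFC 1071 Internet checksum in portable C -- the same one's-complement
sum used by IPv4 headers, ICMP, UDP and TCP. This is a generic textbook
routine, not code from any particular small stack; the sample header bytes
are made up for the demonstration.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

static uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {                      /* sum big-endian 16-bit words */
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                               /* odd trailing byte */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)                      /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

int main(void)
{
    /* 20-byte IPv4 header with its checksum field (bytes 10..11) zeroed;
       the value printed is what would be stored there. */
    uint8_t hdr[20] = { 0x45, 0x00, 0x00, 0x54, 0x00, 0x00, 0x40, 0x00,
                        0x40, 0x01, 0x00, 0x00, 0xC0, 0xA8, 0x01, 0x01,
                        0xC0, 0xA8, 0x01, 0x02 };
    printf("checksum = 0x%04X\n", inet_checksum(hdr, sizeof hdr));
    return 0;
}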

Re: Why My 66000 is and is not RISC

<t97m27$lkt$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26091&group=comp.arch#26091
 by: BGB - Sat, 25 Jun 2022 19:01 UTC

On 6/25/2022 11:30 AM, David Brown wrote:
> On 25/06/2022 17:45, BGB wrote:
>> On 6/25/2022 3:14 AM, David Brown wrote:
>>> On 24/06/2022 23:57, MitchAlsup wrote:
>>>> On Friday, June 24, 2022 at 4:54:05 PM UTC-5, gg...@yahoo.com wrote:
>>>>> Thomas Koenig <tko...@netcologne.de> wrote:
>>>>>> Brett <gg...@yahoo.com> schrieb:
>>>>>>
>>>>>>> How big is the code store needed for an IOT (Internet Of Things
>>>>>>> smart
>>>>>>> toaster) code stack? And what is the savings for the next size down?
>>>>>>
>>>>>> It will be hard to beat an ARM Cortex-M - based microcontroller
>>>>>> which are firmly embedded in the market, and for which a lot of
>>>>>> software has been written, and which cost a bit more than four
>>>>>> dollars per unit.
>>>>>>
>>>>>> And if that's too expensive and you do not need the performance,
>>>>>> you can always use a MSP430-based one for considerably less,
>>>>>> less than a dollar at quantity.
>>>>>>
>>>>>> The ROM on the latter is somewhere between 1KB and whatever you're
>>>>>> willing to pay for, and the RAM 256 bytes or more. But of course
>>>>>> you're still getting some analog hardware thrown in, such as an
>>>>>> ADC or a comparator.
>>>>>>
>>>>>> Not a lot of savings, I'd say.
>>>>>>
>>>>> You are missing the I in internet, no wifi I can find in that chip.
>>>>>
>>>>> Talking about a network stack to talk to your phone. Smart color
>>>>> changing
>>>>> lightbulbs and soon all the appliances in your home, washer, dryer,
>>>>> stove,
>>>>> microwave, thermostat, security cameras, just everything.
>>>>>
>>>>> Plus your home router, which uses a much more powerful wifi block
>>>>> and CPU.
>>>>>
>>>>> There are markets here that will pay for better code density,
>>>>> assuming a
>>>>> network stack is significant?
>>>> <
>>>> I don't see it:: a 10G or 100G network interface already has a
>>>> memory footprint
>>>> (for its own buffering concerns) that skimping on the CPU and ROM
>>>> seems a
>>>> waste.
>>>
>>> Only a /very/ tiny proportion of network interfaces are 10 G or
>>> above. Microcontrollers rarely have more than 100 Mbit Ethernet.  The
>>> next big thing in wired networking in the embedded world is two-wire
>>> Ethernet, to get the convenience of Ethernet networking at low cost.
>>> It comes in 10 Mb and 100 Mb varieties (a 1 Gb variety may come
>>> eventually).
>>>
>>
>> Seems like it would also be fairly trivial to chop 10/100 Ethernet
>> down to a 4-wire variant as well, probably using RJ11 plugs or
>> similar. Advantage of 4-wire as that this could allow for POE (and
>> 4-wire phone-wire could be cheaper than CAT5E or similar).
>
> 10 Mbps and 100 Mbps Ethernet already only use 4 wires - one pair in
> each direction.  Passing (non-isolated) DC power over these wires is
> extremely simple, and requires nothing more than a few diodes and an LC
> filter.  Unfortunately, the PoE standards were developed by a committee
> of morons that produced a ridiculously over-engineered system that is
> too bulky and expensive to have caught on outside a few specific use-cases.
>

That is kinda the point of how it would be electrically compatible:
Use the pairs that are in-use in 10/100;
Skip the other wires;
Maybe use smaller/cheaper RJ11 (6P4C variant) rather than RJ45.

Normal twisted-pair telephone wire would probably have sufficient
electrical properties for 10/100 in many cases.

The only thing is that it would require an adapter to plug RJ11 into RJ45,
though there are other options:
Use RJ45 but with only 2 pairs (effectively a 10/100-only wire);
Use a cable which has RJ11 on one end but RJ45 on the other
(side-stepping the need for an adapter at the switch).

Probably put the pins in the plug in such a way that it doesn't have
adverse effects if someone tries to plug a telephone into it.

Say:
NC, A+, B-, B+, A-, NC

Normal phone only connecting to the B pair (vs across the A/B pairs).

With the PoE system I am imagining, if one did connect across the A/B
pairs, plugging a phone into it would result in it ringing continuously,
whereas if only the B pair is connected (probably the TX pair from the
phone's end), it would be silent and there would be zero net voltage
from the phone's end.

> Even easier, however, is simply to pass the power over the spare pairs
> in a standard 4-pair Ethernet cable.
>

I had assumed running 48 VAC or similar between the two
differential pairs.

This should work OK, but needs at least 4 wires (2 for each pair).
On the device side, there would probably be a bridge rectifier connected
to the center-taps of an isolation transformer.

For PoE with this system, an RJ11<->RJ45 adapter could also function as
the AC injector, say with a pair of isolation transformers (to let the
data through), with the center taps connected (via another transformer)
up to the mains power.

Could make sense in the PoE case to have it as a multi-port block
though, say, 4-8 simultaneous connections, rather than 1 adapter per cable.

> The two-wire Ethernet standards already include support for simpler and
> cheaper PoE solutions.
>

OK, would need to look into it.

But, would assume that a two-wire interface is not likely to be
electrically compatible with traditional Ethernet, at least not without
some additional trickery (additional isolation transformers and probably
a ground wire).

Signaling and power would maybe be done in a similar way to a 2-wire
telephone, but this wouldn't be able to be (passively) connected up to
existing hubs or switches.

>> Could also be electrically compatible with existing hubs and switches
>> via an RJ11 to RJ45 adapter.
>>
>>
>>> For wireless communication, speeds are usually even lower.  Modern
>>> NBIOT cellular systems are designed to be extremely low power, cheap,
>>> have longer range (20 km more than 3G and the rest).  You send
>>> packets of up to about 200 bytes of data, perhaps once a day, with a
>>> delivery time of several seconds.  Perfect for environmental
>>> monitoring, finding your sheep, and many other tasks.
>>>
>>> For local Wifi (or Zigbee, Z-Wave, etc.) devices, small and low
>>> bandwidth is also fine.  You can get away with a few hundred bytes
>>> ram and still have enough to control a lightbulb, thermostat, etc.
>>>
>>> The IOT world is /full/ of systems running on 8-bit AVR's, 16-bit
>>> MSP430's, and other small devices.  Code density matters for many of
>>> them.
>>>
>>> (Of course it's a different matter for wireless cameras and all the
>>> other devices that need high bandwidth.)
>>>
>>
>> I would have figured a network stack would have been a bit much for
>> this class of device...
>
> I have a book on my shelf describing a TCP/IP stack for an 8-bit PIC
> microcontroller.
>
> However, the network stack needed for small Wifi or NB-IOT systems is
> vastly smaller than you need for a full IP and TCP/IP stack.

OK.

Once (when I was much younger) I implemented a TCP/IP stack and Ethernet
card driver in a hobby OS project.

Lots of little lesser-known protocols in this mix, like ICMP and ARP and
similar, ...

In my current projects, I haven't gotten back around to this part yet.
Partly, it looks like, to do it from an FPGA, one is basically driving
out the bits oneself, and has to write one's own logic for transmitting
and receiving Ethernet frames at the level of bits going over the wires
(IIRC).

With the old card I was targeting, IIRC it was at the level of
abstraction of Ethernet frames getting transmitted and received via a
pair of ring buffers.

The card I am using does have an Ethernet port and similar at least, so
could maybe get to this eventually.
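
For concreteness, a hypothetical C sketch of the "pair of ring buffers"
style of NIC interface mentioned above (transmit side only). The slot
layout, the "owned" flag convention, the sizes and all names are invented;
real cards differ in the details.

#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define RING_SLOTS 8
#define MAX_FRAME  1518

struct slot {
    uint16_t len;
    uint8_t  owned_by_nic;          /* 1: NIC owns the slot, 0: driver owns it */
    uint8_t  frame[MAX_FRAME];
};

struct ring {
    struct slot slots[RING_SLOTS];
    unsigned head;                  /* next slot the driver will fill */
};

static int ring_send(struct ring *tx, const void *frame, uint16_t len)
{
    struct slot *s = &tx->slots[tx->head];
    if (s->owned_by_nic || len > MAX_FRAME)
        return -1;                  /* ring full (or oversized frame) */
    memcpy(s->frame, frame, len);
    s->len = len;
    s->owned_by_nic = 1;            /* hand the slot to the NIC */
    tx->head = (tx->head + 1) % RING_SLOTS;
    return 0;
}

int main(void)
{
    struct ring tx = {0};
    uint8_t dummy[64] = {0};
    printf("queued: %s\n", ring_send(&tx, dummy, sizeof dummy) == 0 ? "yes" : "no");
    return 0;
}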

Re: Why My 66000 is and is not RISC

<t97si0$bju$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26093&group=comp.arch#26093
 by: BGB - Sat, 25 Jun 2022 20:52 UTC

On 6/24/2022 8:36 PM, MitchAlsup wrote:
> On Friday, June 24, 2022 at 7:55:52 PM UTC-5, BGB wrote:
>> On 6/24/2022 3:02 PM, MitchAlsup wrote:
>
>>> With <realistically> 30 64-bit registers in use by compiler and 16 of these preserved,
>>> I am not seeing very much caller-save register traffic from Brian's LLVM port. It is more
>>> like R9-R15 are simply temps used whenever and forgotten.
>> That is presumably how it is supposed to be...
>>
>>
>> In my case, it is roughly a 50/50 split between caller save (scratch)
>> and callee save (preserved) registers.
> <
> I, too, have 50%/50%:: R1-15 are temps, R16-30 are preserved.
> R0 receives Return Address, R31 is Stack Pointer. ½ of the temps
> can be used to carry arguments and results covering the 98%-ile.

Yeah:
R0/R1: Special
R2..R7: Scratch
R8..R14: Preserved
R15: SP
R16..R23: Scratch
R24..R31: Preserved

So: 14 scratch, 15 Preserved.

ABI:
R2/R3 Return Value
R2: Struct Pointer (Struct Return)
R3: 'this'
R4..R7, R20..R23: Arguments

If XGPR:
R32..R39, R48..R55: Scratch
R40..R47, R56..R63: Preserved

If the 128-bit ABI:
R36..R39, R52..R55: Arguments
Some other registers in the ABI are moved around.

SP is at R15 mostly for historical reasons, which does result in some
cruft though.
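
A small C sketch annotated with the register assignments from the list
above (arguments in R4..R7, result in R2); the comments reflect that list,
not verified compiler output.

#include <stdio.h>

long add4(long a, long b, long c, long d)   /* a..d would arrive in R4..R7       */
{
    long sum = a + b + c + d;               /* temporaries live in scratch regs  */
    return sum;                             /* result returned in R2 (R2/R3 pair
                                               for wider returns)                */
}

int main(void)
{
    printf("%ld\n", add4(1, 2, 3, 4));      /* prints 10 */
    return 0;
}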

>>
>> For leaf functions, one wants a lot of scratch registers, and for
>> non-leaf functions, a lot of callee-save registers.
>>
>> But, sadly, no party can be entirely happy:
>> Leaf functions wishing they could have more registers to play with,
>> without needing to save them first;
>> Non-leaf functions wishing they could have more registers for variables
>> which won't get stomped on the next call;
>> ...
>>
>>
>> Can note that, IIRC:
>> Win64 gave a bigger part of this pie to callee-save;
>> SysV/AMD64 gave a bigger part of the pie to caller-save.
> <
> CRAY-1 had only temp registers at the call/return interface. (Lee Higbe circa 1990)
> IBM 360 had only preserved registers.
> VAX had only preserved registers--both had 16 registers.

OK.

IIRC, x86:
EAX: Scratch
ECX: Scratch
EDX: Scratch
EBX: "It Depends" (1)
ESP: Stack
EBP: Base-Pointer / Preserved
ESI: Preserved
EDI: Preserved

1: Compilers didn't really seem to entirely agree on whether EBX was
Scratch or Preserved, but Preserved seemed to be more common.

>>
>> A roughly even split seemed like an easy answer, lacking any good way to
>> find a general/optimal balance across a range of programs.
>>
> The choice is a lot easier 50%/50% when you have 32 registers.

Yeah.

>>
> <snip>
>>>
>> I once had PUSH/POP in BJX2, but then I dropped them (mostly for
>> cost-saving reasons; after noting that adjusting the stack-pointer and
>> then using a series of stores, or performing a series of loads and then
>> adjusting the stack pointer, could be similarly effective).
> <
> Push instructions make::
> PUSH R1
> PUSH R2
> PUSH R3
> more expensive than:
> SUB SP,SP,#12
> ST R1,[SP+8]
> ST R2,[SP+4]
> ST R3,[SP]
> due to the serial dependency.
> <
> The peep hole HW optimizer in K9 would perform this transformation.
> {Yes, the optimizer was a piece of HW the compiler knew nothing about.}

A partial issue was that, once I got around to pipelining Load/Store
operations, Push/Pop would have been generally slower as well, since
they require an extra interlock stage to deal with the SP updates.

No hardware level optimizers in my case.

It was cheaper and easier simply to drop them from the ISA.

Typically, the:
ADD Imm8s, SP
Or:
ADD Imm16s, SP
Instructions are used for stack-pointer adjustments.

Had ended up treating the "LDISH16 Imm16, SP" encoding as a special type
of breakpoint, one carrying a magic number, intended mostly to help with
debugging: if a "__debugbreak()" is hit, I can use the magic number to
figure out which debugbreak was hit, since otherwise I might not know
where exactly the loader has put the loaded program, meaning PC by
itself is not sufficient to identify the offending breakpoint.

But, this does seem like a bit of a hack.
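
A hedged sketch of the magic-number idea at the source level: each break
site carries its own identifier so it can be told apart even when the load
address is unknown. The DEBUG_BREAK macro and its portable fallback below
are hypothetical; on BJX2 the real mechanism is the repurposed encoding
described above.

#include <stdio.h>
#include <stdlib.h>

static void debug_break(unsigned id)
{
    /* In the real thing this would be a single trapping instruction that
       carries 'id' as an immediate; here we just log the id and abort so
       the sketch runs anywhere. */
    fprintf(stderr, "debug break #%u hit\n", id);
    abort();
}

#define DEBUG_BREAK(id) debug_break(id)

int main(void)
{
    DEBUG_BREAK(42);   /* each call site gets its own magic number */
    return 0;
}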

I guess I can also note that 0000 is also interpreted as a breakpoint (so
it will trigger a breakpoint if branching into zeroed memory), but this
is not the official BREAK instruction (3030 or F000_3030).

Technically, it is actually:
MOV.B DLR, (DLR, DLR)
But, like, this encoding is pointless enough that there is no issue
with interpreting it instead as a breakpoint.

....

Re: Why My 66000 is and is not RISC

<t97u0m$3kbi2$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26094&group=comp.arch#26094
 by: Brett - Sat, 25 Jun 2022 21:17 UTC

MitchAlsup <MitchAlsup@aol.com> wrote:
> On Thursday, June 23, 2022 at 6:28:53 PM UTC-5, gg...@yahoo.com wrote:
>> MitchAlsup <Mitch...@aol.com> wrote:
>>> In todays installment I touch on things about My 66000 not covered above.
>>>
>>> My 66000 ISA requires an instruction buffer and a 2 stage instruction
>>> processing pipeline I call PARSE and DECODE. Hennessey would be booing
>>> at this point. However, using this, I get branch overhead down to 0.03 cycles
>>> per taken branch without having any delay slot. {This also makes a unified
>>> L1 cache feasible. But since Fetch and MemRef are so far apart on the die
>>> My implementations have chosen not to utilize this capability.}
>>>
>>> PARSE finds the instruction boundaries (main job) and scans ahead for branches,
>>> determines which function units, and looks for CoIssue opportunities. The scan
>>> ahead branches are processes in parallel by DECODE to fetch branch targets
>>> even before the branch instruction is executed. So if a taken prediction is made
>>> the instructions on the taken path are already ready to enter execution. PARSE
>>> identifies immediates and displacements and cancels register port requests,
>>> providing opportunities for ST to read the register file..........
>>>
>>> DECODE processes the instructions from PARSE , accesses register file,
>>> computes forwarding, and starts instruction into the execution pipeline.
>>> DECODE routes immediates and displacements to the required instruction.
>>> ST instruction pass through DECODE twice, the 1st time is for AGEN, the
>>> 2nd time is for ST.data when a register file port is available.
>>>
>>> ---------------------------instruction
>>> stuff-----------------------------------------------------------
>>>
>>> The shift instructions have 2×6-bit fields dealing with the shift amount and
>>> the width of data being shifted. These are used to access odd-sized data
>>> (ala EXTRACT) and to SMASH data calculated at "machine" size back down
>>> into containers of "language" size so the container cannot contain a value
>>> outside of the range of its container. When the width field is 0 it is considered
>>> to be 64-bits. When encoded as an immediate, the 2 fields are back-to-back,
>>> when found in a register there is 26-bits separating the 2 fields, in data<38:32>
>>> both 1000000 and 0000000 are considered to be 64-bits, while 1xxxxxxx
>>> with any of the x's non-zero is considered an Operand exception.
>>>
>>> The Multiplex Instruction MPX. MPX basically allows for selecting bits from
>>> a pair of registers based on another register:: ( a & b ) | ( ~a & c ), however
>>> it has other flavors to provide ( !!a & b ) | ( !a & c ) which is CMOV and by
>>> using the immediate encodings in My 66000 provides MOV Rd,#IMM32 and
>>> MOV Rd,#IMM64 along with MOV Rd, Rs1 and MOV Rd,Rs2. They fall out
>>> for free saving MOV opcodes elsewhere.
>>>
>>> Vectorization: My 66000 ISA contains loop vectorization. This allows
>>> for performing vectorized loops at several iterations per cycle; even
>>> 1-wide machines can perform at 32+ instructions per cycle. My main
>>> (as yet unproven) hope is that this takes the pressure off of the design
>>> width. The basic argument is as follows:
>>> a) 1-wide machines operate at 0.7 IPC
>>> b) 2-wide SuperScalar machines operate at 1.0 IPC
>>> c) GBOoO machines operate at 2.0 IPC
>>> d) programs spend more than ½ their time in loops.
>>> So, if one can get a 2× performance advantage of the 1-wide machine
>>> this puts it in spitting distance of the GBOoO machine, which in turn
>>> means the Medium OoO machine can be competitive with the GBOoO
>>> machine at significantly lower {cost, design time, area, power}
>>>
>>> AND while investigating loop vectorization, I discovered that a RISC
>>> pipeline with a 3R-1W register file can perform 1.3 IPC. Branch
>>> instructions (20%) do not use the result register, ST instructions
>>> (10%) can borrow the write port AFTER cache tag and translation
>>> validations, AND in the general code I have seen there is significant
>>> opportunity to perform write-elision in the data path, freeing up even
>>> more ports. This, again takes pressure of the width of the design.
>>> So, with vectorization, a 3 (or 4)-wide machine is competitive with
>>> a 6-wide machine,.....
>>>
>>> None of this prevents or makes wide GBOoO more difficult.
>>>
>>> ----------------------instruction modifiers------------------------------------------
>>>
>>> CARRY is the first of the Instruction-Modifiers. An instruction-modifier
>>> supplies "bits" for several future instructions so that one does not need
>>> the cartesian product of a given subset encoded in the ISA. Thus, there
>>> are shift instructions and when used with CARRY these perform shifts
>>> as wide as you like 128, 256, 512,.....no need to clog up the encoding
>>> space for lightly used but necessary functionality. Even in the FP arena
>>> CARRY provides access to exact FP arithmetics.
>>>
>>> CARRY provides access to multiprecision arithmetic both integer and FP.
>>> CARRY provides a register which can be used as either/both Input and Output
>>> to a set of instructions. This provides a link from one instruction to another
>>> where data is transmitted but not encoded in the instruction itself.
>>>
>>> Since we are in the realm of power limited, My 66000 ISA has an ABS
>>> instruction. Over in the integer side, this instruction can be performed
>>> by subjugating the sign control built into the data path and be "executed"
>>> without taking any pipeline delay (executes in zero cycles). Over on the
>>> FP side this never adds any latency (executes in zero cycles). ABS always
>>> takes less power than performing the instruction in any other way.
>>>
>>> DBLE is an instruction modifier that supplies register encodings and
>>> adds 64-bits to the calculation width of the modified instruction. Applied
>>> to a FP instruction: DBLE Rd1,Rs11,Rs22,Rs33; FMAC Rd2,Rs12,Rs22,Rs33
>>> we execute: FMUL {Rd1,Rd2},{Rs11,Rs22},{Rs21,Rs22},{Rs31,Rs32}
>>> and presto: we get FP128 by adding exactly 1 instruction, the compiler
>>> can pick any 8 registers it desires alleviating register allocation concerns.
>>> DBLE is a "get by" kind of addition, frowned upon by Hennessey.
>>>
>>> I can envision a SIMD instruction modifier that defines the SIMD parameters
>>> of several subsequent instructions and allows 64-bit SIMD to transpire.
>>> I am still thinking about these. What I cannot envision is a wide SIMD
>>> register file--this is what VVM already provides.
>>>
>>> These instruction-modifiers, it seems to me, are vastly more efficient
>>> than throwing hundreds to thousands of unique instructions into ISA.
>>> Especially if those unique instructions <on average> are not used
>>> "that much".
>>>
>>> -----------------------------Safe
>>> Stack--------------------------------------------------------
>>>
>>> Safe Stack. My 66000 architecture contains the notion of a Safe Stack.
>>> Only 3 instructions have access to Safe Stack: {ENTER, EXIT, and RET}
>>> When Safe Stack is in use, the return address goes directly to the Safe
>>> Stack, and return address comes directly off safe stack. Preserved
>>> registers are placed on Safe Stack {ENTER} and their register values
>>> (conceptually) set to 0. Safe Stack is in normal thread memory but
>>> the PTEs are marked RWE = 000 so any access causes page faults.
>>> EXIT reloads the preserved registers from Safe Stack and transfers
>>> control directly back to caller. When Safe Stack is not in use, R0
>>> is used to hold the return address. Proper compiled code runs the
>>> same when safe stack is on or off, so one can share dynamic libraries
>>> between modes.
>>>
>>> Safe Stack monitors the value in SP and KILLs lines that no longer
>>> need to reach out into the cache hierarchy, Safe Stack can efficiently
>>> use Allocate memory semantics. Much/most of the time, nothing
>>> in safe stack leaves the cache hierarchy.
>>>
>>> Buffer overflows on the "stack" do not corrupt the call/return flow of
>>> control. ROP cannot happen as application has no access to Return
>>> Address. Application cannot see the values in the preserved registers
>>> augmenting safety and certainly cannot modify them.
>>>
>>> -------------------------------ABI----------------------------------------------------------------------
>>>
>>>
>>> Subroutine Calling Convention {A.K.A. ABI}:
>>> Registers R1..R8 contain the first 8 arguments to the subroutine.
>>> SP points at argument[9]
>>> R9..R15 are considered as temporary registers
>>> R16..R29 are preserved registers
>>> R30=FP is a preserved registers but used as a Frame Pointer when
>>> ..............language semantics need.
>>> R31=SP is a preserved register and used as a Stack Pointer. SP must
>>> ..............remain doubleword aligned at all times.
>>>
>>> ABI is very RISC
>>>
>>> So, let's say we want to call a subroutine that wants to allocate 1024
>>> bytes on the stack for its own local data, is long running and needs
>>> to preserve all 14 preserved registers, and is using a FP along with a
>>> SP. Let us further complicate the matter by stating this subroutine
>>> takes variable number of arguments. Entry Prologue:
>>>
>>> ENTRY subroutine_name
>>> subroutine_name:
>>> ENTER R16,R8,#(1024 | 2)
>>>
>>> At this point the register passed arguments have been saved with the
>>> memory passed arguments, FP is pointing at the "other" end of local
>>> data on the stack, after pushing the registers, 1024 bytes has been
>>> allocated onto the SP, the old FP has been saved and the new FP setup.
>>> {This works both with and without Safe Stack}
>>>
>>> Your typical RISC-only ISA would require at least 29 instructions to
>>> do this amount of work getting into the subroutine, and another 17
>>> getting out. If the ISA has both INT and FP register files 29 becomes 37.
>>>
>>> The Same happens in Epilogue: 1 instruction.
>>>
>>> While ABI is very RISC the ISA of Prologue and Epilogue is not.
>>>
>>> As a side note: My 66000 is achieving similar code density as x86-64.
>>>
>>> A few other interesting side
>>> bits:------------------------------------------------------------
>>>
>>> LDM and STM to unCacheable address are performed as if ATOMIC::
>>> that is:: as a single bus transaction. All interested 3rd parties see the
>>> memory before any writes have been performed or after all writes
>>> have been performed. A device driver can read several MMIO device
>>> control registers and know that nobody else in the system has access
>>> to the device control registers that could cause interference. A device
>>> driver can store multiple control register locations without interference.
>>>
>>> There is a page ¿in ROM? known to contain zeros. A Memory Move
>>> instruction can cause a page accessing this ¿ROM? data to be zeroed
>>> without even bothering to access ¿ROM?--and the entire page is zeroed
>>> at the target. Thus, pages being reclaimed to the free pool are but 1
>>> instruction away from being in the already zeroed page pool. Zeroing
>>> pages is performed at the DRAM end of the system (coherently). And
>>> no <deleterious> bus activity is utilized.
> <
>> X86-64 has crap code density, your one instruction stack save restore alone
>> should make you significantly better, unless perhaps you have gone 32+32.
> <
> It is a major contributor to getting as small as it got.
>>
>> Add some accumulator ops and most instructions will fit in 16 bits ops with
>> ease, and you have the extra decode stage to do it anyway.
> <
> I looked at this a few years ago and the damage to long term ISA growth
> was catastrophic. As it is I have nearly ½ of the OpCode space in each
> OpCode group left for the future. and can PARSE instructions in 31 gates
> with only 4 gates of delay. All that goes out the window with a meaningful
> 16-bit "extension". I pass.


Re: Why My 66000 is and is not RISC

<b697d701-a4f1-4028-b873-83704c25074an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26095&group=comp.arch#26095
 by: MitchAlsup - Sat, 25 Jun 2022 22:29 UTC

On Saturday, June 25, 2022 at 4:17:18 PM UTC-5, gg...@yahoo.com wrote:
> MitchAlsup <Mitch...@aol.com> wrote:
<snip>
> > I looked at this a few years ago and the damage to long term ISA growth
> > was catastrophic. As it is I have nearly ½ of the OpCode space in each
> > OpCode group left for the future. and can PARSE instructions in 31 gates
> > with only 4 gates of delay. All that goes out the window with a meaningful
> > 16-bit "extension". I pass.
<
> I don’t get why you think you need to reserve half your opcode space for
> future extensions, I would have thought we are at the end of history for
> opcode extensions.
<
a) because I have watched what happens to processors over 50 years and
how every time you turn around they have more OpCodes--mainly to address
stuff forgotten earlier.
<
b) remember I compacted everything into only 59 actual instructions.
>
> What is the cost of reserving 3 bits of one pattern and the same pattern at
> the 16 bit border, so that you can add 16 bit opcodes in the future?
<
The cost of reserving space for 16-bit is that it over-constrains the 32-bit
OpCode space. For example: I could not give the 16-bit OpCodes a typical
subgroup (6-bit Major OpCode) because the first instruction would only
have 10-bits left !! (16-6=10)
<
Also note: Where there are instructions in several formats (like ADD with 16-bit
immediate and ADD of 2 registers), in all cases, the bit pattern used to recognize
ADD remains identical.
<
There are several OpCode groups reserved in perpetuity; these were chosen such
that if one branches into data there is very little possibility of finding anything other
than an INVALID instruction decoding sitting there. From the My 66000 ISA document::
<----------------------------------------------------------------------------------------------------------------------------
A number of the Major OpCode specifiers are reserved in perpetuity as illegal
OpCodes. Transferring control into 32-bit or 64-bit integer data, or into 32-bit or
64-bit floating point data will very likely result in the decoding of an illegal
instruction and raise the OPERATION exception. In particular, small 32-bit positive
integers and small 32-bit negative integers are illegal instructions. 32-bit Floating
point values in the range ±[1/128..32) are also illegal OpCodes. Should control be
transferred into typical integer or floating point data, there is little likelihood of
executing for a long time before running into an illegal instruction. Executing data
can also be prevented in the MMU when desired (almost always.)
<----------------------------------------------------------------------------------------------------------------------------
This catches programming errors. When I looked, you can't do this with 16-bit
instructions, as basically every encoding has to be used.
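
A rough C model of the property quoted above: a 32-bit word that looks like
a small integer, or like a float whose magnitude lies in [1/128, 32), is
treated as "likely illegal". The exact cut-off for "small" integers is not
given in the post, so SMALL_LIMIT below is my assumption; this is a
plausibility check, not the real decoder.

#include <stdint.h>
#include <string.h>
#include <math.h>
#include <stdio.h>

#define SMALL_LIMIT 4096   /* assumed boundary for "small" integers */

static int likely_illegal(uint32_t w)
{
    int32_t i = (int32_t)w;
    float f;
    memcpy(&f, &w, sizeof f);           /* reinterpret the same bits as FP32 */

    if (i > -SMALL_LIMIT && i < SMALL_LIMIT)
        return 1;                       /* small positive/negative integer data */
    if (isfinite(f) && fabsf(f) >= 1.0f / 128.0f && fabsf(f) < 32.0f)
        return 1;                       /* FP data in +/-[1/128, 32) */
    return 0;
}

int main(void)
{
    uint32_t ex[3] = { 7, (uint32_t)-3, 0x3F800000u /* 1.0f */ };
    for (int k = 0; k < 3; k++)
        printf("0x%08X -> %s\n", (unsigned)ex[k],
               likely_illegal(ex[k]) ? "likely illegal" : "maybe code");
    return 0;
}
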
<
That is: some architects think of things other than code density--we think of
code safety--even when the GuestOS (or JavaScript, ...) fights us with programming
attack modes, and we design architectures that are substantially more robust than
current ones.
>
In particular, My 66000 is immune to the current laundry list of attack strategies
{Meltdown, Spectre, RowHammer, ROP, ForeShadow, ...}
<
> Besides the cost of rearranging bits the cost is nothing with half the
> opcode space reserved. If you are clever and spread the three bits to fit
> your open slots.
<
>
> This gives your customers choices, customers like choices. You could upsell
> this as a future feature like RISCV does and just never deliver, like
> RISCV. ;)
>
I give my customers a choice: Do you want a machine that is secure or not ?
>
> You have a two wide design,
<
Technically, I have a 1-wide design that can CoIssue some pairs of instructions.
It is 1-wide because there is a 3R1W register file. It can CoIssue because some
pairs of instructions do not consume certain register ports. It is not at all
SuperScalar!
<
I also have a 6-wide design. Nothing in the ISA or the rest of the architecture makes
either design harder than it needs to be. For many of the control
logic calculations My 66000 ISA requires, I went to the trouble of implementing
the gate circuitry to perform said duties to verify that the choices were good.
For example, I can look at a 32-bit word (which takes 320 gates to hold in flip-
flops) and determine the instruction length in 31 gates (4 gates of delay). Thereby,
instruction caches for machines less than 16-wide do not need predecoding bits.
FCMP is performed in the same function unit as integer CMP. The integer part
requires 55 gates (4 gates of delay); the FP addition is 17 gates (remaining at 4 gates
of delay). So adding FCMP to CMP is a small price. {Oh, and BTW, I restrict myself
to 4-in NAND gates and 3-in NOR gates.}
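As a toy illustration of why such a length decode stays small when the length is
a pure function of a few fixed bits (the field position and the mapping below are
invented for the example and are not My 66000's real format):

/* Toy illustration only: the field position (bits 31..29) and the
   length mapping are invented for the example, not My 66000's format.
   The point is that when length is a pure function of a handful of
   fixed bits, the whole decode is a small combinational function. */
#include <stdint.h>

unsigned insn_words(uint32_t first_word)
{
    switch (first_word >> 29) {         /* hypothetical 3-bit major field */
    case 6:  return 2;                  /* e.g. one 32-bit immediate follows */
    case 7:  return 3;                  /* e.g. one 64-bit immediate follows */
    default: return 1;                  /* plain 32-bit instruction */
    }
}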
<
> even once you implement 16 bit opcodes handling
> the unaligned case of long instructions is not that hard, an extra gate
> delay? Two? Depending on which of a dozen approaches you pick for size.
<
I can assure you that if I had set out to design a good 16-bit OpCode ISA
I would have succeeded. But over my 40 year career, I have found some
things more valuable, and I pursued those interests. You are free to do
differently.
>
> The resulting 13 bit opcode is plenty for one register a source ACC and a
> dest ACC, and a full set of integer operations. And some short loads/store
> offsets to/from ACC using one address register.
<
Then you end up with a Cartesian product of various ways to do the same thing.
This makes the compiler have to figure out if::
<
INC Rd ; NoOp
is better or worse than:
ADD Rd,Rd,#1
and what set of rules (or heuristics) govern the selection. I have worked inside
compilers trying to figure this kind of stuff out. It ain't straightforward.
<
I submit that compilers are best when they only need to figure out how to do
something exactly one way.
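To make the selection problem concrete, here is a rough C sketch of the kind of
rule a code generator ends up carrying once both encodings exist; the flags and
the rules below are made up for illustration, not taken from any real compiler:

/* Sketch of the selection problem, with made-up constraints:
   once INC Rd and ADD Rd,Rd,#1 both exist, every "+= 1" in the IR forces
   a choice, and the "right" answer depends on context (register class,
   whether a short form blocks predication or pairing, and so on). */
enum enc { ENC_INC16, ENC_ADD32 };

struct ctx {
    int dest_in_short_regset;   /* can the 16-bit form even name Rd?     */
    int needs_predication;      /* hypothetical: short forms can't be    */
    int in_paired_slot;         /* hypothetical: pairing prefers 32-bit  */
};

enum enc pick_increment(const struct ctx *c)
{
    if (!c->dest_in_short_regset) return ENC_ADD32;
    if (c->needs_predication)     return ENC_ADD32;
    if (c->in_paired_slot)        return ENC_ADD32;
    return ENC_INC16;             /* only now is the short form a win */
}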

Re: Why My 66000 is and is not RISC

<t998p1$3sgo4$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=26101&group=comp.arch#26101

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.arch
Subject: Re: Why My 66000 is and is not RISC
Date: Sun, 26 Jun 2022 11:26:56 +0200
Organization: A noiseless patient Spider
Lines: 169
Message-ID: <t998p1$3sgo4$1@dont-email.me>
References: <f647da4a-0617-4292-9a1b-b3be674150cbn@googlegroups.com>
<t90vhb$1m8n$1@gioia.aioe.org> <t9134k$3tg$1@dont-email.me>
<7dc3d1b2-3c75-4f38-b15e-3e9b5c2cbc0dn@googlegroups.com>
<t92sv9$jbb$1@dont-email.me> <t94kmo$li0$1@newsreader4.netcologne.de>
<t95bpp$77b$1@dont-email.me>
<745d5f3a-201e-4988-9b90-d2db7172c0a2n@googlegroups.com>
<t96g62$ane$1@dont-email.me> <t97aia$a1t$1@dont-email.me>
<t97d80$kk6$1@dont-email.me> <t97m27$lkt$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 26 Jun 2022 09:26:57 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="2a6330a4d1dc36ad689cc52acec13e1c";
logging-data="4080388"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Oox811Txcqn4I7TE5t1dnj+HtEHkSLAY="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.9.1
Cancel-Lock: sha1:jAm+DpF2nrsKTlg9661+GvI19nU=
Content-Language: en-GB
In-Reply-To: <t97m27$lkt$1@dont-email.me>
 by: David Brown - Sun, 26 Jun 2022 09:26 UTC

On 25/06/2022 21:01, BGB wrote:
> On 6/25/2022 11:30 AM, David Brown wrote:
>> On 25/06/2022 17:45, BGB wrote:

>>> Seems like it would also be fairly trivial to chop 10/100 Ethernet
>>> down to a 4-wire variant as well, probably using RJ11 plugs or
>>> similar. Advantage of 4-wire as that this could allow for POE (and
>>> 4-wire phone-wire could be cheaper than CAT5E or similar).
>>
>> 10 Mbps and 100 Mbps Ethernet already only use 4 wires - one pair in
>> each direction.  Passing (non-isolated) DC power over these wires is
>> extremely simple, and requires nothing more than a few diodes and an
>> LC filter.  Unfortunately, the PoE standards were developed by a
>> committee of morons that produced a ridiculously over-engineered
>> system that is too bulky and expensive to have caught on outside a few
>> specific use-cases.
>>
>
> That is kinda the point of how it would be electrically compatible:
>   Use the pairs that are in-use in 10/100;
>   Skip the other wires;
>   Maybe use smaller/cheaper RJ11 (6P4C variant) rather than RJ45.
>
> Normal twisted-pair telephone wire would probably have sufficient
> electrical properties to 10/100 in many cases.

RJ11 connectors are not going to be much cheaper than RJ45, if anything.
And telephone wire is not going to be good enough for anything here.
In particular, there are no real standards or quality control (except
for long-distance lines that cost more than good Ethernet cables because
no one installs new ones any more). So your cheapo bit of telephone
wire might work in one system, but a different cheapo wire won't. It
might work fine until your motors start, then the ESD interference
disrupts it. Having something that /might/ work or /usually/ works is
not very useful.

>
> Only thing is that it would require an adapter to plug RJ11 into RJ45,
> though other options:
>   Use RJ45 but with only 2 pairs (effectively a 10/100-only wire);
>   Cable which has RJ11 on one end but RJ45 on the other.
>     (Side-stepping the need for an adapter at the switch).
>
>
> Probably put the pins in the plug in such a way that it doesn't have
> adverse effects if someone tries to plug a telephone into it.
>
> Say:
>   NC, A+, B-, B+, A-, NC
>
> Normal phone only connecting to the B pair (vs across the A/B pairs).
>
> With the POE system I am imagining, if one did connect across the A/B
> pairs, plugging a phone into it would result in it ringing continuously,
> whereas if only the B pair is connected (probably the TX pair from the
> phone's end), it would be silent and there would be zero net voltage
> from the phone's end.
>
>
>> Even easier, however, is simply to pass the power over the spare pairs
>> in a standard 4-pair Ethernet cable.
>>
>
> I had assumed doing a thing of running 48 VAC or similar between the two
> differential pairs.
>

AC is expensive in embedded systems. DC is cheap.

> This should work OK, but needs at least 4 wires (2 for each pair).
> On the device side, there would probably be a bridge rectifier connected
> to the center-taps of an isolation transformer.
>
>
> For PoE with this system, an RJ11<->RJ45 adapter could also function as
> the AC injector, say with a pair of isolation transformers (to let the
> data through), with the center taps connected (via another transformer)
> up to the mains power.
>
> Could make sense in the PoE case to have it as a multi-port block
> though, say, 4-8 simultaneous connections, rather than 1 adapter per cable.
>
>
>> The two-wire Ethernet standards already include support for simpler
>> and cheaper PoE solutions.
>>
>
> OK, would need to look into it.
>
>
> But, would assume that a two-wire interface is not likely to be
> electrically compatible with traditional Ethernet, at least not without
> some additional trickery (additional isolation transformers and probably
> a ground wire).
>

It is not electrically compatible, even with such trickery. The
trickery involved to get full duplex signalling on a single twisted pair
involves a good deal more than a transformer!

Currently, two-wire PHYs are rare and expensive. But that's due to
their novelty - if they become popular and quantities go up, prices will
drop. Note that short-range two-wire 10 Mbps Ethernet is a multi-drop
bus, and does not need a switch. (I'm not sure if it supports PoE.)

>
> Signaling and power would maybe be done in a similar way to a 2-wire
> telephone, but this wouldn't be able to be (passively) connected up to
> existing hubs or switches.
>
>
>>> Could also be electrically compatible with existing hubs and switches
>>> via an RJ11 to RJ45 adapter.
>>>
>>>
>>>> For wireless communication, speeds are usually even lower.  Modern
>>>> NBIOT cellular systems are designed to be extremely low power,
>>>> cheap, have longer range (20 km more than 3G and the rest).  You
>>>> send packets of up to about 200 bytes of data, perhaps once a day,
>>>> with a delivery time of several seconds.  Perfect for environmental
>>>> monitoring, finding your sheep, and many other tasks.
>>>>
>>>> For local Wifi (or Zigbee, Z-Wave, etc.) devices, small and low
>>>> bandwidth is also fine.  You can get away with a few hundred bytes
>>>> ram and still have enough to control a lightbulb, thermostat, etc.
>>>>
>>>> The IOT world is /full/ of systems running on 8-bit AVR's, 16-bit
>>>> MSP430's, and other small devices.  Code density matters for many of
>>>> them.
>>>>
>>>> (Of course it's a different matter for wireless cameras and all the
>>>> other devices that need high bandwidth.)
>>>>
>>>
>>> I would have figured a network stack would have been a bit much for
>>> this class of device...
>>
>> I have a book on my shelf describing a TCP/IP stack for an 8-bit PIC
>> microcontroller.
>>
>> However, the network stack needed for small Wifi or NB-IOT systems is
>> vastly smaller than you need for a full IP and TCP/IP stack.
>
> OK.
>
>
> Once (when I was much younger) I implemented a TCP/IP stack and Ethernet
> card driver in a hobby OS project.
>
> Lots of little lesser-known protocols in this mix, like ICMP and ARP and
> similar, ...
>
>
>
> In my current projects, I haven't gotten back around to this part yet.
> Partly, it looks like to do it from an FPGA, one is basically driving
> out the bits themselves, and has to write their own logic for
> transmitting and receiving Ethernet frames at the level of bits going
> over the wires (IIRC).
>
>
> With the old card I was targeting, IIRC it was at the level of
> abstraction of Ethernet frames getting transmitted and received via a
> pair of ring buffers.
>
> The card I am using does have an Ethernet port and similar at least, so
> could maybe get to this eventually.
>

Re: Why My 66000 is and is not RISC

<t9a9c5$1fg$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=26118&group=comp.arch#26118

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Why My 66000 is and is not RISC
Date: Sun, 26 Jun 2022 13:43:15 -0500
Organization: A noiseless patient Spider
Lines: 279
Message-ID: <t9a9c5$1fg$1@dont-email.me>
References: <f647da4a-0617-4292-9a1b-b3be674150cbn@googlegroups.com>
<t90vhb$1m8n$1@gioia.aioe.org> <t9134k$3tg$1@dont-email.me>
<7dc3d1b2-3c75-4f38-b15e-3e9b5c2cbc0dn@googlegroups.com>
<t92sv9$jbb$1@dont-email.me> <t94kmo$li0$1@newsreader4.netcologne.de>
<t95bpp$77b$1@dont-email.me>
<745d5f3a-201e-4988-9b90-d2db7172c0a2n@googlegroups.com>
<t96g62$ane$1@dont-email.me> <t97aia$a1t$1@dont-email.me>
<t97d80$kk6$1@dont-email.me> <t97m27$lkt$1@dont-email.me>
<t998p1$3sgo4$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 26 Jun 2022 18:43:18 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="0a2edac7d7c9671ae384204f91143747";
logging-data="1520"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1885vMOBJ/LSpXOHif8JfO1"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.10.0
Cancel-Lock: sha1:j890AW3m+IYnNSZ9VA5+97tb1SU=
In-Reply-To: <t998p1$3sgo4$1@dont-email.me>
Content-Language: en-US
 by: BGB - Sun, 26 Jun 2022 18:43 UTC

On 6/26/2022 4:26 AM, David Brown wrote:
> On 25/06/2022 21:01, BGB wrote:
>> On 6/25/2022 11:30 AM, David Brown wrote:
>>> On 25/06/2022 17:45, BGB wrote:
>
>>>> Seems like it would also be fairly trivial to chop 10/100 Ethernet
>>>> down to a 4-wire variant as well, probably using RJ11 plugs or
>>>> similar. Advantage of 4-wire as that this could allow for POE (and
>>>> 4-wire phone-wire could be cheaper than CAT5E or similar).
>>>
>>> 10 Mbps and 100 Mbps Ethernet already only use 4 wires - one pair in
>>> each direction.  Passing (non-isolated) DC power over these wires is
>>> extremely simple, and requires nothing more than a few diodes and an
>>> LC filter.  Unfortunately, the PoE standards were developed by a
>>> committee of morons that produced a ridiculously over-engineered
>>> system that is too bulky and expensive to have caught on outside a
>>> few specific use-cases.
>>>
>>
>> That is kinda the point of how it would be electrically compatible:
>>    Use the pairs that are in-use in 10/100;
>>    Skip the other wires;
>>    Maybe use smaller/cheaper RJ11 (6P4C variant) rather than RJ45.
>>
>> Normal twisted-pair telephone wire would probably have sufficient
>> electrical properties to 10/100 in many cases.
>
> RJ11 connectors are not going to be much cheaper than RJ45, if anything.
>  And telephone wire is not going to be good enough for anything here.
> In particular, there are no real standards or quality control (except
> for long-distance lines that cost more than good Ethernet cables because
> no one installs new ones any more).  So your cheapo bit of telephone
> wire might work in one system, but a different cheapo wire won't.  It
> might work fine until your motors start, then the ESD interference
> disrupts it.  Having something that /might/ work or /usually/ works is
> not very useful.
>

For telephone wire, I was mostly thinking of 2-pair CAT3 (as opposed to
4 pair CAT3).

For 10/100, it should be OK, since usually the other two pairs are just
sitting around mostly doing nothing.

Apparently, 2-pair CAT3 (still with RJ45) was actually a thing at one
point for some LANs.

However, I have noticed that some short/cheap Ethernet cables have the
RJ45 connectors crimped onto pieces of flat ribbon cable,
implying that one "can" probably get by with cheaper (non-twisted /
CAT1) wire in some cases (I would guess for runs of, say, 1 or 2
meters or similar; this would likely need some testing).

Though, I guess one difference between RJ11 and RJ45 is that a lot of
the through-hole RJ45 plugs have built-in isolation transformers,
whereas a board built for RJ11 plugs might need to supply these itself.

>>
>> Only thing is that it would require an adapter to plug RJ11 into RJ45,
>> though other options:
>>    Use RJ45 but with only 2 pairs (effectively a 10/100-only wire);
>>    Cable which has RJ11 on one end but RJ45 on the other.
>>      (Side-stepping the need for an adapter at the switch).
>>
>>
>> Probably put the pins in the plug in such a way that it doesn't have
>> adverse effects if someone tries to plug a telephone into it.
>>
>> Say:
>>    NC, A+, B-, B+, A-, NC
>>
>> Normal phone only connecting to the B pair (vs across the A/B pairs).
>>
>> With the POE system I am imagining, if one did connect across the A/B
>> pairs, plugging a phone into it would result in it ringing
>> continuously, whereas if only the B pair is connected (probably the TX
>> pair from the phone's end), it would be silent and there would be zero
>> net voltage from the phone's end.
>>
>>
>>> Even easier, however, is simply to pass the power over the spare
>>> pairs in a standard 4-pair Ethernet cable.
>>>
>>
>> I had assumed doing a thing of running 48 VAC or similar between the
>> two differential pairs.
>>
>
> AC is expensive in embedded systems.  DC is cheap.
>

A small transformer and 4 diodes should be manageable.

With 48VDC, one is going to need a buck converter, which requires an
inductor plus sense and control circuitry.

One can do AC->DC with 4 diodes and a capacitor, which is not a huge
cost in any sense, and AC allows using a linear transformer to step down
to 5V or similar.
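As a back-of-envelope check on the "4 diodes and a capacitor" approach (the load
current, mains frequency, and capacitor value below are example numbers of my
own, not from anything above):

/* Full-wave rectifier ripple is roughly dV ~= I / (2 * f * C).
   The numbers below are assumed example values. */
#include <stdio.h>

int main(void)
{
    double I = 0.5;        /* load current, amps (assumed)          */
    double f = 60.0;       /* mains frequency, Hz (assumed)         */
    double C = 10000e-6;   /* smoothing capacitor, farads (assumed) */

    double ripple = I / (2.0 * f * C);
    printf("peak-to-peak ripple ~= %.2f V\n", ripple);  /* ~0.42 V */
    return 0;
}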

It would likely be simpler and cheaper than traditional DC PoE:
  Doesn't care which pair is which;
  Doesn't need any logic to detect what is on the other end.

Traditional PoE involves a pointlessly complicated "negotiation" step.

E.g.: For this, just sorta stick 48VAC on the line and assume it is good.
Though, this would not be compatible with devices assuming DC PoE.

But, in a way, this would be a potential advantage of using RJ11, so one
doesn't accidentally mix them up (not sure how well DC devices would
tolerate being plugged into AC).

>> This should work OK, but needs at least 4 wires (2 for each pair).
>> On the device side, there would probably be a bridge rectifier
>> connected to the center-taps of an isolation transformer.
>>
>>
>> For PoE with this system, an RJ11<->RJ45 adapter could also function
>> as the AC injector, say with a pair of isolation transformers (to let
>> the data through), with the center taps connected (via another
>> transformer) up to the mains power.
>>
>> Could make sense in the PoE case to have it as a multi-port block
>> though, say, 4-8 simultaneous connections, rather than 1 adapter per
>> cable.
>>
>>
>>> The two-wire Ethernet standards already include support for simpler
>>> and cheaper PoE solutions.
>>>
>>
>> OK, would need to look into it.
>>
>>
>> But, would assume that a two-wire interface is not likely to be
>> electrically compatible with traditional Ethernet, at least not
>> without some additional trickery (additional isolation transformers
>> and probably a ground wire).
>>
>
> It is not electrically compatible, even with such trickery.  The
> trickery involved to get full duplex signalling on a single twisted pair
> involves a good deal more than a transformer!
>
> Currently, two-wire PHY's are rare and expensive.  But that's due to
> their novelty - if they become popular and quantities go up, prices will
> drop.  Note that short-range two-wire 10 Mbps Ethernet is a multi-drop
> bus, and does not need a switch.  (I'm not sure if it supports PoE.)
>

OK.

Not sure how it works, haven't found much information thus far.

I guess if I were to consider designing something, a few possibilities:
  Simple DC signaling, say:
    An RX and TX line, possibly at 3.3v or similar;
    Using 8b/10b or similar;
    Would need a common ground;
    Not likely suitable for longer-distance signaling:
      Signal integrity and possible ground-loop issues.
    Likely point-to-point only (couldn't do a passive bus).
  Something sorta like a CAN bus:
    Maybe still using 8b/10b or similar;
    Basically, one has a wire that is normally pulled high;
    Pull low to send bits, half-duplex.
    Likely still subject to ground-loop and propagation delays.
  Differential signaling, likely tri-state (+/0/-):
    Would at least avoid ground loop issues and similar;
    Back to requiring isolation transformers and similar.
    Wired in a star, there might also be an issue with echoes (1)

1: Potentially, echoes would be worse with the isolation transformers, as
induction is likely to reflect an inverted version of the signal back
down the wire, which, if it hits the other transformers, might flip back
into a positive image, ... Cases where constructive interference occurs
could potentially raise the "noise floor" enough to interfere with
transmission (such as corrupting transmitted frames).

At 10 Mbps, one bit spans roughly 30 meters of cable, so echoes are
probably manageable, and the reflection should mostly "self annihilate"
near the point it is received. For a "sane" network size, the echo is
likely to almost entirely dissipate within a few bits.

At 100 Mbps, one bit spans only about 3 meters, which is potentially a
bigger issue, as echoes would persist for comparatively more bit times. A
lot would also depend on the inductive properties of the isolation
transformers.
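Where the meters-per-bit figures come from, as a small C sketch (the 30 m / 3 m
values correspond to the vacuum speed of light; the ~0.65 cable velocity factor
is an assumption, and shrinks both by about a third):

/* Distance one bit occupies on the wire = propagation speed / bit rate. */
#include <stdio.h>

int main(void)
{
    double c  = 3.0e8;     /* m/s, vacuum speed of light      */
    double vf = 0.65;      /* assumed cable velocity factor   */

    for (int i = 0; i < 2; i++) {
        double bps = (i == 0) ? 10e6 : 100e6;
        printf("%3.0f Mbps: %5.1f m/bit (vacuum), %5.1f m/bit (vf=0.65)\n",
               bps / 1e6, c / bps, c * vf / bps);
    }
    return 0;
}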

(It might take several hundred bits of "silence" for the echoes to die
down following a transmitted message on, say, a 100 meter network).

This issue would likely be significantly reduced if doing point-to-point
signaling (constructive interference could not occur). Would depend
mostly on the "sensitivity" at each end, would want to have sensitivity
low enough that it doesn't detect reflected bits, but high enough that
it does detect transmitted bits.


Re: Why My 66000 is and is not RISC

<t9afmp$vpi$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=26119&group=comp.arch#26119

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ggt...@yahoo.com (Brett)
Newsgroups: comp.arch
Subject: Re: Why My 66000 is and is not RISC
Date: Sun, 26 Jun 2022 20:31:22 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 128
Message-ID: <t9afmp$vpi$1@dont-email.me>
References: <f647da4a-0617-4292-9a1b-b3be674150cbn@googlegroups.com>
<t90vhb$1m8n$1@gioia.aioe.org>
<t9134k$3tg$1@dont-email.me>
<7dc3d1b2-3c75-4f38-b15e-3e9b5c2cbc0dn@googlegroups.com>
<t92sv9$jbb$1@dont-email.me>
<e914214f-4881-448a-94ff-6506dde833c3n@googlegroups.com>
<t97u0m$3kbi2$1@dont-email.me>
<b697d701-a4f1-4028-b873-83704c25074an@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 26 Jun 2022 20:31:22 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="ae1581073a91194c14ce3a6fc4a1caeb";
logging-data="32562"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/W0GDE7ULb9yVaCrpKVuhX"
User-Agent: NewsTap/5.5 (iPad)
Cancel-Lock: sha1:nqPillH+5PUkkAyJHnYcRvmYyM8=
sha1:gPBYfHa0vHNvRiV5sscW1DDljn8=
 by: Brett - Sun, 26 Jun 2022 20:31 UTC

MitchAlsup <MitchAlsup@aol.com> wrote:
> On Saturday, June 25, 2022 at 4:17:18 PM UTC-5, gg...@yahoo.com wrote:
>> MitchAlsup <Mitch...@aol.com> wrote:
> <snip>
>>> I looked at this a few years ago and the damage to long term ISA growth
>>> was catastrophic. As it is I have nearly ½ of the OpCode space in each
>>> OpCode group left for the future. and can PARSE instructions in 31 gates
>>> with only 4 gates of delay. All that goes out the window with a meaningful
>>> 16-bit "extension". I pass.
> <
>> I don’t get why you think you need to reserve half your opcode space for
>> future extensions, I would have thought we are at the end of history for
>> opcode extensions.
> <
> a) because I have watched what happens to processors over 50 years and
> how every time you turn around they have more OpCodes--mainly to address
> stuff forgotten earlier.
> <
> b) remember I compacted everything into only 59 actual instructions.
>>
>> What is the cost of reserving 3 bits of one pattern and the same pattern at
>> the 16 bit border, so that you can add 16 bit opcodes in the future?
> <
> The cost of reserving space for 16-bit is that it over-constrains the 32-bit
> OpCode space. For example: I could not give the 16-bit OpCodes a typical
> subgroup (6-bit Major OpCode) because the first instruction would only
> have 10-bits left !! (16-6=10)
> <
> Also note: Where there are instructions in several formats (like ADD with 16-bit
> immediate and ADD of 2 registers), in all cases, the bit pattern used to recognize
> ADD remains identical.
> <
> There are several OpCode groups reserved in perpetuity, these were chosen such
> that if one branches into data there is very little possibility to finding anything other
> than INVALID instruction decoding sitting there. From the M7 66000 ISA document::
> <----------------------------------------------------------------------------------------------------------------------------
> A number of the Major OpCode specifiers are reserved in perpetuity as illegal
> OpCodes. Transferring control into 32-bit or 64-bit integer data, or into 32-bit or
> 64-bit floating point data will very likely result in the decoding of an illegal
> instruction and raise the OPERATION exception. In particular, small 32-bit positive
> integers and small 32-bit negative integers are illegal instructions. 32-bit Floating
> point values in the range ±[1/128..32) are also illegal OpCodes. Should control be
> transferred into typical integer or floating point data, there is little likelihood of
> executing for a long time before running into an illegal instruction. Executing data
> can also be prevented in the MMU when desired (almost always.)
> <----------------------------------------------------------------------------------------------------------------------------
> This catches programming errors. When I looked, you can't do this with 16-bit inst-
> ructions; as basically every encoding has to be used.
> <
> That is: some architects think of things other than code density--we think of
> code safety--even when the GuestOS fights us (or JavaScript,...) programming
> attack modes, and design architectures that are substantially more robust than
> current ones.
>>
> In particular, My 66000 is immune to the current laundry list of attack strategies
> {Meltdown, Spectré, RowHammer, RoP, ForeShadow, ...}
> <
>> Besides the cost of rearranging bits the cost is nothing with half the
>> opcode space reserved. If you are clever and spread the three bits to fit
>> your open slots.
> <
>>
>> This gives your customers choices, customers like choices. You could upsell
>> this as a future feature like RISCV does and just never deliver, like
>> RISCV. ;)
>>
> I give my customers a choice: Do you want a machine that is secure or not ?
>>
>> You have a two wide design,
> <
> Technically, I have a 1-wide design that can CoIssue some pairs of instructions
> It is 1-wide because there is 3R1W register file. It can CoIssue because some
> pairs of instructions do not consume certain register ports. It is not at all
> SuperScalar !
> <
> I also have a 6-wide design. Nothing in ISA or the rest of the architecture makes
> either design necessarily harder than it needs to be. For many of the control
> logic calculations My 66000 ISA requires, I went to the trouble of implementing
> the gate circuitry to perform said duties to verify that the choices were good.
> For example, I can look at a 32-bit word (which takes 320 gates to hold in flip-
> flops) and determine the instruction length in 31 gates (4-gates of delay). Thereby
> Instruction caches for machines less than 16-wide do not need predecoding bits.
> FCMP is performed in the same function unit as Integer CMP. The integer part
> requires 55 gates (4 gates of delay) the FP addition is 17 gates (remains 4 gates
> of delay) So adding FCMP to CMP is a small price {Oh and BTW, I restrict myself
> to 4-in NAND gates and 3-in NOR gates}
> <
>> even once you implement 16 bit opcodes handling
>> the unaligned case of long instructions is not that hard, an extra gate
>> delay? Two? Depending on which of a dozen approaches you pick for size.
> <
> I can assure you that if I had set out to design a good 16-bit OpCode ISA
> I would have succeeded. But over my 40 year career, I have found some
> things more valuable, and I pursued those interests. You are free to do
> differently.
>>
>> The resulting 13 bit opcode is plenty for one register a source ACC and a
>> dest ACC, and a full set of integer operations. And some short loads/store
>> offsets to/from ACC using one address register.
> <
> Then you end up with a cartesian product of various ways to do the same thing.
> This makes the compiler have to figure out if::
> <
> INC Rd ; NoOp
> is better or worse than:
> ADD Rd,Rd,#1
> and what set of rules (or heuristics) govern the selection. I have worked inside
> compilers trying to figure this kind of stuff out. It ain't straightforward.
> <
> I submit that compilers are best when they only need to figure how to do
> something exactly one way.

An optimizer pass to convert all loads that are only used once to ACC ops
is pretty trivial; as a software guy, I am not asking for Itanic compiler
changes.

All the other 16-bit variants use restricted register sets and have deep
effects on register coloring, etc. I tried a dozen of these on paper and,
despite my propaganda of the time, they all sucked. ;(

ACC ops follow the KISS principle. Simple one for one substitution of 16
bit opcodes for the longer ones where the operation tree makes it possible.

To make best use, you need a bunch of 32-bit ops that use ACC, or better
yet just use a register like R1 and remove that register from normal use:
a reduction of one register for the other compiler passes.

Re: Why My 66000 is and is not RISC

<dae289ba-abf7-42b0-b94e-f7874a506391n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=26120&group=comp.arch#26120

X-Received: by 2002:a7b:c110:0:b0:39c:8270:7b95 with SMTP id w16-20020a7bc110000000b0039c82707b95mr11746205wmi.41.1656280595768;
Sun, 26 Jun 2022 14:56:35 -0700 (PDT)
X-Received: by 2002:a05:622a:1049:b0:305:2f0f:63c6 with SMTP id
f9-20020a05622a104900b003052f0f63c6mr7315446qte.331.1656280595322; Sun, 26
Jun 2022 14:56:35 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.128.88.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 26 Jun 2022 14:56:35 -0700 (PDT)
In-Reply-To: <t9afmp$vpi$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:ad77:3efc:9604:5c62;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:ad77:3efc:9604:5c62
References: <f647da4a-0617-4292-9a1b-b3be674150cbn@googlegroups.com>
<t90vhb$1m8n$1@gioia.aioe.org> <t9134k$3tg$1@dont-email.me>
<7dc3d1b2-3c75-4f38-b15e-3e9b5c2cbc0dn@googlegroups.com> <t92sv9$jbb$1@dont-email.me>
<e914214f-4881-448a-94ff-6506dde833c3n@googlegroups.com> <t97u0m$3kbi2$1@dont-email.me>
<b697d701-a4f1-4028-b873-83704c25074an@googlegroups.com> <t9afmp$vpi$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <dae289ba-abf7-42b0-b94e-f7874a506391n@googlegroups.com>
Subject: Re: Why My 66000 is and is not RISC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 26 Jun 2022 21:56:35 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Sun, 26 Jun 2022 21:56 UTC

On Sunday, June 26, 2022 at 3:31:25 PM UTC-5, gg...@yahoo.com wrote:
> MitchAlsup <Mitch...@aol.com> wrote:
> > On Saturday, June 25, 2022 at 4:17:18 PM UTC-5, gg...@yahoo.com wrote:

> > Then you end up with a cartesian product of various ways to do the same thing.
> > This makes the compiler have to figure out if::
> > <
> > INC Rd ; NoOp
> > is better or worse than:
> > ADD Rd,Rd,#1
> > and what set of rules (or heuristics) govern the selection. I have worked inside
> > compilers trying to figure this kind of stuff out. It ain't straightforward.
> > <
> > I submit that compilers are best when they only need to figure how to do
> > something exactly one way.
<
> An optimizer pass to convert all loads that are only used once to ACC ops
> is pretty trivial, as a software guy I am not asking for Itanic compiler
> changes.
<
While I can, in general, agree that you are not asking for Titanic additions
to the compiler, you are asking for Titanic alterations of the axioms and
tenets underlying the encoding philosophy of My 66000 ISA. Basically,
you are asking for a complete reset. I am not interested in starting over.
However, you are free to design whatever 16-bit instructions you want for
your architecture.
>
> All the other 16 bit variants use restricted register sets and has deep
> effects on register coloring, etc. I tried a dozen of these on paper and
> despite my propaganda of the time they all sucked. ;(
<
You and Quadriblock should get together and compare notes........maybe
you could teach him about "what to leave out" as part of your comp.arch
meeting.
>
> ACC ops follow the KISS principle. Simple one for one substitution of 16
> bit opcodes for the longer ones where the operation tree makes it possible.
<
I am well aware of how accumulator machines perform. Rather well on the
60%+ code sequences, and less well on the 40%- code sequences. Whereas
a 32-bit only ISA has but one way of expressing arithmetic and does not so
suffer.
<
Remember, I have an <essentially> pure RISC ISA that is achieving x86-64
code density--significantly better than <¿almost?> all other pure RISC ISAs.
I got here by eliminating instruction functionality that could be embodied
elsewhere in the data-path and uniformly encoded in ISA. This makes each
instruction more powerful without adding delay to its execution and makes
the compiler's job a bit easier in expressing the necessary semantics.
>
> To make best use you need a bunch of 32 bit ops that use ACC, or better yet
> just use a register like R1, and remove that register from normal use, a
> reduction of one register for the other compiler passes.
<
Yech.

Re: Why My 66000 is and is not RISC

<t9alpo$1ndd$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=26121&group=comp.arch#26121

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Why My 66000 is and is not RISC
Date: Sun, 26 Jun 2022 17:15:17 -0500
Organization: A noiseless patient Spider
Lines: 331
Message-ID: <t9alpo$1ndd$1@dont-email.me>
References: <f647da4a-0617-4292-9a1b-b3be674150cbn@googlegroups.com>
<t90vhb$1m8n$1@gioia.aioe.org> <t9134k$3tg$1@dont-email.me>
<7dc3d1b2-3c75-4f38-b15e-3e9b5c2cbc0dn@googlegroups.com>
<t92sv9$jbb$1@dont-email.me>
<e914214f-4881-448a-94ff-6506dde833c3n@googlegroups.com>
<t97u0m$3kbi2$1@dont-email.me>
<b697d701-a4f1-4028-b873-83704c25074an@googlegroups.com>
<t9afmp$vpi$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 26 Jun 2022 22:15:20 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="a75f62ff4d5028ec146574a01f589347";
logging-data="56749"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19uWAHw3ecpACrQ+RGvcr8J"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.10.0
Cancel-Lock: sha1:elWcjYmNdpRXpW7ycqPoxBsIXbw=
Content-Language: en-US
In-Reply-To: <t9afmp$vpi$1@dont-email.me>
 by: BGB - Sun, 26 Jun 2022 22:15 UTC

On 6/26/2022 3:31 PM, Brett wrote:
> MitchAlsup <MitchAlsup@aol.com> wrote:
>> On Saturday, June 25, 2022 at 4:17:18 PM UTC-5, gg...@yahoo.com wrote:
>>> MitchAlsup <Mitch...@aol.com> wrote:
>> <snip>
>>>> I looked at this a few years ago and the damage to long term ISA growth
>>>> was catastrophic. As it is I have nearly ½ of the OpCode space in each
>>>> OpCode group left for the future. and can PARSE instructions in 31 gates
>>>> with only 4 gates of delay. All that goes out the window with a meaningful
>>>> 16-bit "extension". I pass.
>> <
>>> I don’t get why you think you need to reserve half your opcode space for
>>> future extensions, I would have thought we are at the end of history for
>>> opcode extensions.
>> <
>> a) because I have watched what happens to processors over 50 years and
>> how every time you turn around they have more OpCodes--mainly to address
>> stuff forgotten earlier.
>> <
>> b) remember I compacted everything into only 59 actual instructions.
>>>
>>> What is the cost of reserving 3 bits of one pattern and the same pattern at
>>> the 16 bit border, so that you can add 16 bit opcodes in the future?
>> <
>> The cost of reserving space for 16-bit is that it over-constrains the 32-bit
>> OpCode space. For example: I could not give the 16-bit OpCodes a typical
>> subgroup (6-bit Major OpCode) because the first instruction would only
>> have 10-bits left !! (16-6=10)
>> <
>> Also note: Where there are instructions in several formats (like ADD with 16-bit
>> immediate and ADD of 2 registers), in all cases, the bit pattern used to recognize
>> ADD remains identical.
>> <
>> There are several OpCode groups reserved in perpetuity, these were chosen such
>> that if one branches into data there is very little possibility to finding anything other
>> than INVALID instruction decoding sitting there. From the M7 66000 ISA document::
>> <----------------------------------------------------------------------------------------------------------------------------
>> A number of the Major OpCode specifiers are reserved in perpetuity as illegal
>> OpCodes. Transferring control into 32-bit or 64-bit integer data, or into 32-bit or
>> 64-bit floating point data will very likely result in the decoding of an illegal
>> instruction and raise the OPERATION exception. In particular, small 32-bit positive
>> integers and small 32-bit negative integers are illegal instructions. 32-bit Floating
>> point values in the range ±[1/128..32) are also illegal OpCodes. Should control be
>> transferred into typical integer or floating point data, there is little likelihood of
>> executing for a long time before running into an illegal instruction. Executing data
>> can also be prevented in the MMU when desired (almost always.)
>> <----------------------------------------------------------------------------------------------------------------------------
>> This catches programming errors. When I looked, you can't do this with 16-bit inst-
>> ructions; as basically every encoding has to be used.
>> <
>> That is: some architects think of things other than code density--we think of
>> code safety--even when the GuestOS fights us (or JavaScript,...) programming
>> attack modes, and design architectures that are substantially more robust than
>> current ones.
>>>
>> In particular, My 66000 is immune to the current laundry list of attack strategies
>> {Meltdown, Spectré, RowHammer, RoP, ForeShadow, ...}
>> <
>>> Besides the cost of rearranging bits the cost is nothing with half the
>>> opcode space reserved. If you are clever and spread the three bits to fit
>>> your open slots.
>> <
>>>
>>> This gives your customers choices, customers like choices. You could upsell
>>> this as a future feature like RISCV does and just never deliver, like
>>> RISCV. ;)
>>>
>> I give my customers a choice: Do you want a machine that is secure or not ?
>>>
>>> You have a two wide design,
>> <
>> Technically, I have a 1-wide design that can CoIssue some pairs of instructions
>> It is 1-wide because there is 3R1W register file. It can CoIssue because some
>> pairs of instructions do not consume certain register ports. It is not at all
>> SuperScalar !
>> <
>> I also have a 6-wide design. Nothing in ISA or the rest of the architecture makes
>> either design necessarily harder than it needs to be. For many of the control
>> logic calculations My 66000 ISA requires, I went to the trouble of implementing
>> the gate circuitry to perform said duties to verify that the choices were good.
>> For example, I can look at a 32-bit word (which takes 320 gates to hold in flip-
>> flops) and determine the instruction length in 31 gates (4-gates of delay). Thereby
>> Instruction caches for machines less than 16-wide do not need predecoding bits.
>> FCMP is performed in the same function unit as Integer CMP. The integer part
>> requires 55 gates (4 gates of delay) the FP addition is 17 gates (remains 4 gates
>> of delay) So adding FCMP to CMP is a small price {Oh and BTW, I restrict myself
>> to 4-in NAND gates and 3-in NOR gates}
>> <
>>> even once you implement 16 bit opcodes handling
>>> the unaligned case of long instructions is not that hard, an extra gate
>>> delay? Two? Depending on which of a dozen approaches you pick for size.
>> <
>> I can assure you that if I had set out to design a good 16-bit OpCode ISA
>> I would have succeeded. But over my 40 year career, I have found some
>> things more valuable, and I pursued those interests. You are free to do
>> differently.
>>>
>>> The resulting 13 bit opcode is plenty for one register a source ACC and a
>>> dest ACC, and a full set of integer operations. And some short loads/store
>>> offsets to/from ACC using one address register.
>> <
>> Then you end up with a cartesian product of various ways to do the same thing.
>> This makes the compiler have to figure out if::
>> <
>> INC Rd ; NoOp
>> is better or worse than:
>> ADD Rd,Rd,#1
>> and what set of rules (or heuristics) govern the selection. I have worked inside
>> compilers trying to figure this kind of stuff out. It ain't straightforward.
>> <
>> I submit that compilers are best when they only need to figure how to do
>> something exactly one way.
>
>
> An optimizer pass to convert all loads that are only used once to ACC ops
> is pretty trivial, as a software guy I am not asking for Itanic compiler
> changes.
>
> All the other 16 bit variants use restricted register sets and has deep
> effects on register coloring, etc. I tried a dozen of these on paper and
> despite my propaganda of the time they all sucked. ;(
>

A few thoughts:
The restricted set should not be smaller than 1/2 the baseline
"full-sized" set.

So, for 32 GPRs, 4b (16 regs) is OK, 3b (8 regs) a bit less so.

I skipped 3R encodings in the 16-bit space, as they would be
"essentially useless".

For size-optimized code, one does have to bias the compiler towards
using a smaller set of registers (the set usable by 16-bit encodings),
which does typically come at a performance cost (in my case, this is
mixed with the other drawback that one can't predicate or bundle the
16-bit encodings).

So, the general result is that the program is roughly 50% bigger if
built in speed-optimized modes.

Size optimized mode: around 60% 16-bit, 40% 32-bit;
Speed optimized mode: around 20% 16-bit, 80% 32-bit.

Speed-optimized 16/32 is still generally smaller than a fixed-length
32-bit subset though (well, and more so if one disallows Jumbo
encodings, which adds an additional size penalty).
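For reference, the average encoded width implied by those mixes, as a rough
sketch (it ignores Jumbo prefixes; the rest of the reported ~50% gap would have
to come from the instruction stream itself differing between the two modes):

/* Average bytes per instruction for the 16/32-bit mixes quoted above. */
#include <stdio.h>

int main(void)
{
    double size_opt  = 0.60 * 2 + 0.40 * 4;   /* 2.8 bytes/instr */
    double speed_opt = 0.20 * 2 + 0.80 * 4;   /* 3.6 bytes/instr */

    printf("size-opt  avg: %.1f bytes\n", size_opt);
    printf("speed-opt avg: %.1f bytes (%.0f%% larger)\n",
           speed_opt, 100.0 * (speed_opt / size_opt - 1.0));   /* ~29%% */
    printf("fixed 32-bit : 4.0 bytes (%.0f%% larger than size-opt)\n",
           100.0 * (4.0 / size_opt - 1.0));                    /* ~43%% */
    return 0;
}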

> ACC ops follow the KISS principle. Simple one for one substitution of 16
> bit opcodes for the longer ones where the operation tree makes it possible.
>
> To make best use you need a bunch of 32 bit ops that use ACC, or better yet
> just use a register like R1, and remove that register from normal use, a
> reduction of one register for the other compiler passes.
>

Side note:
Why are R0 and R1 "special" in my case?...

Mostly because early on, I removed them from normal use to have
registers which the ASM stage could stomp without warning.

R0 was used typically to load temporary values into if the Immed field
was insufficient.

Say, without Jumbo:
ADD R4, 123, R5 //OK, can use an immediate form
ADD R6, 123456, R7 //Not OK, doesn't fit.


Re: Why My 66000 is and is not RISC

<t9atin$2mot$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=26122&group=comp.arch#26122

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Why My 66000 is and is not RISC
Date: Sun, 26 Jun 2022 19:28:04 -0500
Organization: A noiseless patient Spider
Lines: 91
Message-ID: <t9atin$2mot$1@dont-email.me>
References: <f647da4a-0617-4292-9a1b-b3be674150cbn@googlegroups.com>
<t90vhb$1m8n$1@gioia.aioe.org> <t9134k$3tg$1@dont-email.me>
<7dc3d1b2-3c75-4f38-b15e-3e9b5c2cbc0dn@googlegroups.com>
<t92sv9$jbb$1@dont-email.me>
<e914214f-4881-448a-94ff-6506dde833c3n@googlegroups.com>
<t97u0m$3kbi2$1@dont-email.me>
<b697d701-a4f1-4028-b873-83704c25074an@googlegroups.com>
<t9afmp$vpi$1@dont-email.me>
<dae289ba-abf7-42b0-b94e-f7874a506391n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 27 Jun 2022 00:28:07 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="a75f62ff4d5028ec146574a01f589347";
logging-data="88861"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Kk08Jx0mdc4JbzQJDETF/"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.10.0
Cancel-Lock: sha1:DDu4Xh/PVIWdr2dAOdh12f5VEZk=
Content-Language: en-US
In-Reply-To: <dae289ba-abf7-42b0-b94e-f7874a506391n@googlegroups.com>
 by: BGB - Mon, 27 Jun 2022 00:28 UTC

On 6/26/2022 4:56 PM, MitchAlsup wrote:
> On Sunday, June 26, 2022 at 3:31:25 PM UTC-5, gg...@yahoo.com wrote:
>> MitchAlsup <Mitch...@aol.com> wrote:
>>> On Saturday, June 25, 2022 at 4:17:18 PM UTC-5, gg...@yahoo.com wrote:
>
>>> Then you end up with a cartesian product of various ways to do the same thing.
>>> This makes the compiler have to figure out if::
>>> <
>>> INC Rd ; NoOp
>>> is better or worse than:
>>> ADD Rd,Rd,#1
>>> and what set of rules (or heuristics) govern the selection. I have worked inside
>>> compilers trying to figure this kind of stuff out. It ain't straightforward.
>>> <
>>> I submit that compilers are best when they only need to figure how to do
>>> something exactly one way.
> <
>> An optimizer pass to convert all loads that are only used once to ACC ops
>> is pretty trivial, as a software guy I am not asking for Itanic compiler
>> changes.
> <
> While I can, in general, agree that you are not asking for Titanic additions
> to the compiler, you are asking for Titanic alterations of the axioms and
> tenets underlying the encoding philosophy of My 66000 ISA. Basically,
> you are asking for a complete reset. I am not interested in starting over.
> However, You are free to design whatever 16-bit instructions you want for
> your architecture.

Yeah, there are only so many possible combinations.

>>
>> All the other 16 bit variants use restricted register sets and has deep
>> effects on register coloring, etc. I tried a dozen of these on paper and
>> despite my propaganda of the time they all sucked. ;(
> <
> You and Quadriblock should get together and compare notes........maybe
> you could teach him about "what to leave out" as part of your comp.arch
> meeting.

Yeah, the funky obsession with non-power-of-2 data sizes and similar
would probably be high on my list.

>>
>> ACC ops follow the KISS principle. Simple one for one substitution of 16
>> bit opcodes for the longer ones where the operation tree makes it possible.
> <
> I am well aware of how accumulator machines perform. Rather well on the
> 60%+ code sequences, and less well on the 40%- code sequences. Whereas
> a 32-bit only ISA has but one way of expressing arithmetic and does not so
> suffer.
> <
> Remember I have a <essentially> pure IRSC ISA that is achieving x86-64
> code density--significantly better than <¿almost?> all other pure RISC ISAs.
> I got here by eliminating instruction functionality that could be embodied
> elsewhere in the data-path and uniformly encoded in ISA. This makes each
> instruction more powerful without adding delay to its execution and makes
> the compilers job a bit easier in expressing the necessary semantic.

I will assume in this you *don't* mean the apparent common variation of
x86-64 which takes upwards of 1.5MB to build Doom...

>>
>> To make best use you need a bunch of 32 bit ops that use ACC, or better yet
>> just use a register like R1, and remove that register from normal use, a
>> reduction of one register for the other compiler passes.
> <
> Yech.

Agreed...

There isn't really a good reason to have hard-coded registers "in
general" with 32-bit instruction encodings, particularly not for an
accumulator.

I will make a partial exception for loading a big constant to a fixed
register, where having an instruction for a larger constant load could
partly offset the drawback of having "not particularly large" immediate
fields in other contexts.

Also, at the time, 24 bits would be sufficient in general for things like
data/bss loads and stores, whereas something more modest (such as 16 or
20 bits) would not have been sufficient (several of these programs
effectively have several MB of '.bss').
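A quick reach check on those displacement widths (plain arithmetic, nothing
assumed beyond the widths mentioned above):

/* Reach of a 16-, 20-, and 24-bit displacement, in KB. */
#include <stdio.h>

int main(void)
{
    int widths[] = { 16, 20, 24 };
    for (int i = 0; i < 3; i++)
        printf("%2d-bit displacement reaches %8ld KB\n",
               widths[i], (1L << widths[i]) / 1024);   /* 64, 1024, 16384 KB */
    return 0;
}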

....

Re: Why My 66000 is and is not RISC

<t9b0f6$3bks$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=26123&group=comp.arch#26123

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Why My 66000 is and is not RISC
Date: Sun, 26 Jun 2022 18:17:25 -0700
Organization: A noiseless patient Spider
Lines: 117
Message-ID: <t9b0f6$3bks$1@dont-email.me>
References: <f647da4a-0617-4292-9a1b-b3be674150cbn@googlegroups.com>
<t90vhb$1m8n$1@gioia.aioe.org> <t9134k$3tg$1@dont-email.me>
<7dc3d1b2-3c75-4f38-b15e-3e9b5c2cbc0dn@googlegroups.com>
<t92sv9$jbb$1@dont-email.me>
<e914214f-4881-448a-94ff-6506dde833c3n@googlegroups.com>
<t97u0m$3kbi2$1@dont-email.me>
<b697d701-a4f1-4028-b873-83704c25074an@googlegroups.com>
<t9afmp$vpi$1@dont-email.me>
<dae289ba-abf7-42b0-b94e-f7874a506391n@googlegroups.com>
<t9atin$2mot$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 27 Jun 2022 01:17:26 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="260a7ddd273fa9ac8f9423c560c2a5c8";
logging-data="110236"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18pZbUhl5stbb/1dYTr+GUd"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.10.0
Cancel-Lock: sha1:4+8QMibybZNbwil2YihJ6gis/7I=
In-Reply-To: <t9atin$2mot$1@dont-email.me>
Content-Language: en-US
 by: Ivan Godard - Mon, 27 Jun 2022 01:17 UTC

On 6/26/2022 5:28 PM, BGB wrote:
> On 6/26/2022 4:56 PM, MitchAlsup wrote:
>> On Sunday, June 26, 2022 at 3:31:25 PM UTC-5, gg...@yahoo.com wrote:
>>> MitchAlsup <Mitch...@aol.com> wrote:
>>>> On Saturday, June 25, 2022 at 4:17:18 PM UTC-5, gg...@yahoo.com wrote:
>>
>>>> Then you end up with a cartesian product of various ways to do the
>>>> same thing.
>>>> This makes the compiler have to figure out if::
>>>> <
>>>> INC Rd ; NoOp
>>>> is better or worse than:
>>>> ADD Rd,Rd,#1
>>>> and what set of rules (or heuristics) govern the selection. I have
>>>> worked inside
>>>> compilers trying to figure this kind of stuff out. It ain't
>>>> straightforward.
>>>> <
>>>> I submit that compilers are best when they only need to figure how
>>>> to do
>>>> something exactly one way.
>> <
>>> An optimizer pass to convert all loads that are only used once to ACC
>>> ops
>>> is pretty trivial, as a software guy I am not asking for Itanic compiler
>>> changes.
>> <
>> While I can, in general, agree that you are not asking for Titanic
>> additions
>> to the compiler, you are asking for Titanic alterations of the axioms and
>> tenets underlying the encoding philosophy of My 66000 ISA. Basically,
>> you are asking for a complete reset. I am not interested in starting
>> over.
>> However, You are free to design whatever 16-bit instructions you want for
>> your architecture.
>
> Yeah, there are only so many possible combinations.
>
>
>>>
>>> All the other 16 bit variants use restricted register sets and has deep
>>> effects on register coloring, etc. I tried a dozen of these on paper and
>>> despite my propaganda of the time they all sucked. ;(
>> <
>> You and Quadriblock should get together and compare notes........maybe
>> you could teach him about "what to leave out" as part of your comp.arch
>> meeting.
>
> Yeah, the funky obsession with non-power-of-2 data sizes and similar
> would probably be high on my list.
>
>
>>>
>>> ACC ops follow the KISS principle. Simple one for one substitution of 16
>>> bit opcodes for the longer ones where the operation tree makes it
>>> possible.
>> <
>> I am well aware of how accumulator machines perform. Rather well on the
>> 60%+ code sequences, and less well on the 40%- code sequences. Whereas
>> a 32-bit only ISA has but one way of expressing arithmetic and does
>> not so
>> suffer.
>> <
>> Remember I have a <essentially> pure IRSC ISA that is achieving x86-64
>> code density--significantly better than <¿almost?> all other pure RISC
>> ISAs.
>> I got here by eliminating instruction functionality that could be
>> embodied
>> elsewhere in the data-path and uniformly encoded in ISA. This makes each
>> instruction more powerful without adding delay to its execution and makes
>> the compilers job a bit easier in expressing the necessary semantic.
>
> I will assume in this you *don't* mean the apparent common variation of
> x86-64 which takes upwards of 1.5MB to build Doom...
>
>
>
>>>
>>> To make best use you need a bunch of 32 bit ops that use ACC, or
>>> better yet
>>> just use a register like R1, and remove that register from normal use, a
>>> reduction of one register for the other compiler passes.
>> <
>> Yech.
>
> Agreed...
>
> There isn't really a good reason to have hard-coded registers "in
> general" with 32-bit instruction encodings, particularly not for an
> accumulator.
>
>
> I will make a partial assumption for loading a big constant to a fixed
> register, where having an instruction for a larger constant load could
> partly offset the drawback of having "not particularly large" immediate
> fields in other contexts.
>
>
> Also at the time, 24 bits would be sufficient in-general for things like
> data/bss loads and stores, whereas something more modest (such as 16 or
> 20 bit) would not have been sufficient (several of these programs
> effectively have several MB of '.bss').
>
> ...

There are advantages to dedicated base registers, set as a side effect
of other operations. You can keep them where they are used, instead of
in a regfile, saving wire delay. You need fewer of them (Mill: 8 vs 32),
saving fan-in. If you restrict addressable regions to not cross a 4 GB
(or less) boundary, then you can have a narrower address adder (Mill: 32
bits vs 64). You avoid the loads with fat constants. You lower pressure
on the genregs. The address-using instructions need fewer bits to encode
the base (Mill: 3 vs 5). The base can be used to select a particular WKR
for range checking, avoiding the trip to the PLB. The base can be used
as the lwb in its WKR, reducing state for task switch or call.

And so on.
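A sketch of the narrow-adder point in C (purely illustrative, not Mill's actual
hardware): if a region may not cross a 4 GB boundary, only the low 32 bits
participate in the add and the upper half passes through unchanged.

/* Effective address within a region that cannot cross a 4 GB boundary:
   a 32-bit add on the low half, upper 32 bits passed through. */
#include <stdint.h>
#include <stdio.h>

uint64_t region_addr(uint64_t base, uint32_t offset)
{
    uint32_t low = (uint32_t)base + offset;          /* 32-bit add, no carry out */
    return (base & 0xFFFFFFFF00000000ull) | low;     /* upper 32 bits unchanged  */
}

int main(void)
{
    uint64_t base = 0x00007F2000100000ull;           /* example base (made up)   */
    printf("%llx\n", (unsigned long long)region_addr(base, 0x1234));
    return 0;
}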

Re: Why My 66000 is and is not RISC

<66576523-b66f-439d-be43-3054e27fa0aan@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=26124&group=comp.arch#26124

X-Received: by 2002:a05:600c:3493:b0:39c:8731:84c3 with SMTP id a19-20020a05600c349300b0039c873184c3mr17741907wmq.45.1656292724991;
Sun, 26 Jun 2022 18:18:44 -0700 (PDT)
X-Received: by 2002:a37:3d5:0:b0:6af:b6:1248 with SMTP id 204-20020a3703d5000000b006af00b61248mr6393701qkd.135.1656292724297;
Sun, 26 Jun 2022 18:18:44 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.128.87.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 26 Jun 2022 18:18:44 -0700 (PDT)
In-Reply-To: <t9atin$2mot$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:ad77:3efc:9604:5c62;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:ad77:3efc:9604:5c62
References: <f647da4a-0617-4292-9a1b-b3be674150cbn@googlegroups.com>
<t90vhb$1m8n$1@gioia.aioe.org> <t9134k$3tg$1@dont-email.me>
<7dc3d1b2-3c75-4f38-b15e-3e9b5c2cbc0dn@googlegroups.com> <t92sv9$jbb$1@dont-email.me>
<e914214f-4881-448a-94ff-6506dde833c3n@googlegroups.com> <t97u0m$3kbi2$1@dont-email.me>
<b697d701-a4f1-4028-b873-83704c25074an@googlegroups.com> <t9afmp$vpi$1@dont-email.me>
<dae289ba-abf7-42b0-b94e-f7874a506391n@googlegroups.com> <t9atin$2mot$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <66576523-b66f-439d-be43-3054e27fa0aan@googlegroups.com>
Subject: Re: Why My 66000 is and is not RISC
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 27 Jun 2022 01:18:44 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Mon, 27 Jun 2022 01:18 UTC

On Sunday, June 26, 2022 at 7:28:11 PM UTC-5, BGB wrote:
> On 6/26/2022 4:56 PM, MitchAlsup wrote:
> > On Sunday, June 26, 2022 at 3:31:25 PM UTC-5, gg...@yahoo.com wrote:
> >> MitchAlsup <Mitch...@aol.com> wrote:
> >>> On Saturday, June 25, 2022 at 4:17:18 PM UTC-5, gg...@yahoo.com wrote:
> >
> >>> Then you end up with a cartesian product of various ways to do the same thing.
> >>> This makes the compiler have to figure out if::
> >>> <
> >>> INC Rd ; NoOp
> >>> is better or worse than:
> >>> ADD Rd,Rd,#1
> >>> and what set of rules (or heuristics) govern the selection. I have worked inside
> >>> compilers trying to figure this kind of stuff out. It ain't straightforward.
> >>> <
> >>> I submit that compilers are best when they only need to figure how to do
> >>> something exactly one way.
> > <
> >> An optimizer pass to convert all loads that are only used once to ACC ops
> >> is pretty trivial, as a software guy I am not asking for Itanic compiler
> >> changes.
> > <
> > While I can, in general, agree that you are not asking for Titanic additions
> > to the compiler, you are asking for Titanic alterations of the axioms and
> > tenets underlying the encoding philosophy of My 66000 ISA. Basically,
> > you are asking for a complete reset. I am not interested in starting over.
> > However, You are free to design whatever 16-bit instructions you want for
> > your architecture.
> Yeah, there are only so many possible combinations.
<
nearly infinite permutations........
> >>
> >> All the other 16 bit variants use restricted register sets and has deep
> >> effects on register coloring, etc. I tried a dozen of these on paper and
> >> despite my propaganda of the time they all sucked. ;(
> > <
> > You and Quadriblock should get together and compare notes........maybe
> > you could teach him about "what to leave out" as part of your comp.arch
> > meeting.
> Yeah, the funky obsession with non-power-of-2 data sizes and similar
> would probably be high on my list.
<
I was wondering if anyone caught the palindrome of his thread "...life the universe
and everything." In Hitchhikers Guide to the Galaxy the answer to "life the universe
and everything" is/was 42 whereas Quadriblock's thread converges on 24 which
is the simple reverse of 42.
> >>
> >> ACC ops follow the KISS principle. Simple one for one substitution of 16
> >> bit opcodes for the longer ones where the operation tree makes it possible.
> > <
> > I am well aware of how accumulator machines perform. Rather well on the
> > 60%+ code sequences, and less well on the 40%- code sequences. Whereas
> > a 32-bit only ISA has but one way of expressing arithmetic and does not so
> > suffer.
> > <
> > Remember I have an <essentially> pure RISC ISA that is achieving x86-64
> > code density--significantly better than <¿almost?> all other pure RISC ISAs.
> > I got here by eliminating instruction functionality that could be embodied
> > elsewhere in the data-path and uniformly encoded in ISA. This makes each
> > instruction more powerful without adding delay to its execution and makes
> > the compiler's job a bit easier in expressing the necessary semantics.
<
> I will assume in this you *don't* mean the apparent common variation of
> x86-64 which takes upwards of 1.5MB to build Doom...
<
No, in general I am talking about "never having to" in terms of loading
constants, pasting constants together, negating or inverting operands,
and the prologue and epilogue handling instructions.
> >>
> >> To make best use you need a bunch of 32 bit ops that use ACC, or better yet
> >> just use a register like R1, and remove that register from normal use, a
> >> reduction of one register for the other compiler passes.
> > <
> > Yech.
> Agreed...
>
> There isn't really a good reason to have hard-coded registers "in
> general" with 32-bit instruction encodings, particularly not for an
> accumulator.
>
The only hard-coded registers are::
a) R0 receives the return address when control is delivered to a subroutine.
b) ENTER and EXIT use R31 = SP to build and tear down stack frames.
<
However; there are circumstances where the HW understands that ranges
of registers have certain properties. For example: when Safe-Stack mode
is in effect, R16-R31 are saved on Safe Stack, R0 does not receive the return
address, and R16-R30 when read before write return zeros.
>
> I will make a partial assumption for loading a big constant to a fixed
> register, where having an instruction for a larger constant load could
> partly offset the drawback of having "not particularly large" immediate
> fields in other contexts.
>
My 66000 never has to load a constant to a register. Somewhere close
to 9% of my instruction count vanishes by uniformly supplying large
constants.
>
> Also at the time, 24 bits would be sufficient in-general for things like
> data/bss loads and stores, whereas something more modest (such as 16 or
> 20 bit) would not have been sufficient (several of these programs
> effectively have several MB of '.bss').
>
In My 66000 memory mapping, a simple application such as cat can have
the .text, .data, .bss, ... each separated by GBs in the virtual address
space and yet only need 1 page of memory mapping tables!
> ...

Re: Why My 66000 is and is not RISC

<t9b6av$7hr9$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26125&group=comp.arch#26125

 by: Ivan Godard - Mon, 27 Jun 2022 02:57 UTC

On 6/26/2022 6:18 PM, MitchAlsup wrote:
> On Sunday, June 26, 2022 at 7:28:11 PM UTC-5, BGB wrote:
>> On 6/26/2022 4:56 PM, MitchAlsup wrote:
>>> On Sunday, June 26, 2022 at 3:31:25 PM UTC-5, gg...@yahoo.com wrote:
>>>> MitchAlsup <Mitch...@aol.com> wrote:
>>>>> On Saturday, June 25, 2022 at 4:17:18 PM UTC-5, gg...@yahoo.com wrote:
>>>
>>>>> Then you end up with a cartesian product of various ways to do the same thing.
>>>>> This makes the compiler have to figure out if::
>>>>> <
>>>>> INC Rd ; NoOp
>>>>> is better or worse than:
>>>>> ADD Rd,Rd,#1
>>>>> and what set of rules (or heuristics) govern the selection. I have worked inside
>>>>> compilers trying to figure this kind of stuff out. It ain't straightforward.
>>>>> <
>>>>> I submit that compilers are best when they only need to figure how to do
>>>>> something exactly one way.
>>> <
>>>> An optimizer pass to convert all loads that are only used once to ACC ops
>>>> is pretty trivial, as a software guy I am not asking for Itanic compiler
>>>> changes.
>>> <
>>> While I can, in general, agree that you are not asking for Titanic additions
>>> to the compiler, you are asking for Titanic alterations of the axioms and
>>> tenets underlying the encoding philosophy of My 66000 ISA. Basically,
>>> you are asking for a complete reset. I am not interested in starting over.
>>> However, You are free to design whatever 16-bit instructions you want for
>>> your architecture.
>> Yeah, there are only so many possible combinations.
> <
> nearly infinite permutations........
>>>>
>>>> All the other 16 bit variants use restricted register sets and has deep
>>>> effects on register coloring, etc. I tried a dozen of these on paper and
>>>> despite my propaganda of the time they all sucked. ;(
>>> <
>>> You and Quadriblock should get together and compare notes........maybe
>>> you could teach him about "what to leave out" as part of your comp.arch
>>> meeting.
>> Yeah, the funky obsession with non-power-of-2 data sizes and similar
>> would probably be high on my list.
> <
> I was wondering if anyone caught the palindrome of his thread "...life the universe
> and everything." In Hitchhikers Guide to the Galaxy the answer to "life the universe
> and everything" is/was 42 whereas Quadriblock's thread converges on 24 which
> is the simple reverse of 42.
>>>>
>>>> ACC ops follow the KISS principle. Simple one for one substitution of 16
>>>> bit opcodes for the longer ones where the operation tree makes it possible.
>>> <
>>> I am well aware of how accumulator machines perform. Rather well on the
>>> 60%+ code sequences, and less well on the 40%- code sequences. Whereas
>>> a 32-bit only ISA has but one way of expressing arithmetic and does not so
>>> suffer.
>>> <
>>> Remember I have an <essentially> pure RISC ISA that is achieving x86-64
>>> code density--significantly better than <¿almost?> all other pure RISC ISAs.
>>> I got here by eliminating instruction functionality that could be embodied
>>> elsewhere in the data-path and uniformly encoded in ISA. This makes each
>>> instruction more powerful without adding delay to its execution and makes
>>> the compiler's job a bit easier in expressing the necessary semantics.
> <
>> I will assume in this you *don't* mean the apparent common variation of
>> x86-64 which takes upwards of 1.5MB to build Doom...
> <
> No, in general I am talking about "never having to" in terms of loading
> constants, pasting constants together, negating or inverting operands,
> and the prologue and epilogue handling instructions.
>>>>
>>>> To make best use you need a bunch of 32 bit ops that use ACC, or better yet
>>>> just use a register like R1, and remove that register from normal use, a
>>>> reduction of one register for the other compiler passes.
>>> <
>>> Yech.
>> Agreed...
>>
>> There isn't really a good reason to have hard-coded registers "in
>> general" with 32-bit instruction encodings, particularly not for an
>> accumulator.
>>
> The only hard-coded registers are::
> a) R0 receives the return address when control is delivered to a subroutine.
> b) ENTER and EXIT use R31 = SP to build and tear down stack frames.
> <
> However; there are circumstances where the HW understands that ranges
> of registers have certain properties. For example: when Safe-Stack mode
> is in effect, R16-R31 are saved on Safe Stack, R0 does not receive the return
> address, and R16-R30 when read before write return zeros.
>>
>> I will make a partial assumption for loading a big constant to a fixed
>> register, where having an instruction for a larger constant load could
>> partly offset the drawback of having "not particularly large" immediate
>> fields in other contexts.
>>
> My 66000 never has to load a constant to a register. Somewhere close
> to 9% of my instruction count vanishes by uniformly supplying large
> constants.

Doesn't have to, but isn't it advisable when a constant has widespread uses?

Re: Why My 66000 is and is not RISC

<t9b6g2$7j72$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26126&group=comp.arch#26126

 by: Brett - Mon, 27 Jun 2022 03:00 UTC

MitchAlsup <MitchAlsup@aol.com> wrote:
> On Saturday, June 25, 2022 at 4:17:18 PM UTC-5, gg...@yahoo.com wrote:
>> MitchAlsup <Mitch...@aol.com> wrote:
> <snip>
>>> I looked at this a few years ago and the damage to long term ISA growth
>>> was catastrophic. As it is I have nearly ½ of the OpCode space in each
>>> OpCode group left for the future. and can PARSE instructions in 31 gates
>>> with only 4 gates of delay. All that goes out the window with a meaningful
>>> 16-bit "extension". I pass.
> <
>> I don’t get why you think you need to reserve half your opcode space for
>> future extensions, I would have thought we are at the end of history for
>> opcode extensions.
> <
> a) because I have watched what happens to processors over 50 years and
> how every time you turn around they have more OpCodes--mainly to address
> stuff forgotten earlier.
> <
> b) remember I compacted everything into only 59 actual instructions.
>>
>> What is the cost of reserving 3 bits of one pattern and the same pattern at
>> the 16 bit border, so that you can add 16 bit opcodes in the future?
> <
> The cost of reserving space for 16-bit is that it over-constrains the 32-bit
> OpCode space. For example: I could not give the 16-bit OpCodes a typical
> subgroup (6-bit Major OpCode) because the first instruction would only
> have 10-bits left !! (16-6=10)

I can work with 10 bits, ideally 11 for 59 instructions, which would be two
sub groups.

ACC opcodes only need one register field; that plus 5 bits for the opcode
covers all the common cases this extension would use, so 10 bits works fine.

Note that this extension will not slow down code the way all the other
16-bit architectures do. You will get a small boost from a smaller code
footprint; 32-bit instructions are unaffected and do all the heavy lifting.

With two opcode groups it should wire directly into your existing logic,
which actually happens anyway if you split this extension into smaller
groups spread into the open opcode spaces where appropriate.
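
To make the bit budget concrete, one hypothetical way such a 16-bit ACC
op could be carved up (purely illustrative, not a concrete encoding
proposal) would decode as:

#include <stdint.h>

/* Hypothetical 16-bit ACC-op layout: 6-bit major group, 5-bit opcode,
   5-bit register; the accumulator itself is implicit. */
typedef struct { unsigned major, op, reg; } acc16;

static acc16 decode_acc16(uint16_t insn) {
    acc16 d;
    d.major = (insn >> 10) & 0x3F;  /* selects the 16-bit sub-group      */
    d.op    = (insn >>  5) & 0x1F;  /* up to 32 ACC operations           */
    d.reg   =  insn        & 0x1F;  /* the one explicit register operand */
    return d;
}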

> Also note: Where there are instructions in several formats (like ADD with 16-bit
> immediate and ADD of 2 registers), in all cases, the bit pattern used to recognize
> ADD remains identical.
> <
> There are several OpCode groups reserved in perpetuity, these were chosen such
> that if one branches into data there is very little possibility of finding anything other
> than INVALID instruction decoding sitting there. From the My 66000 ISA document::
> <----------------------------------------------------------------------------------------------------------------------------
> A number of the Major OpCode specifiers are reserved in perpetuity as illegal
> OpCodes. Transferring control into 32-bit or 64-bit integer data, or into 32-bit or
> 64-bit floating point data will very likely result in the decoding of an illegal
> instruction and raise the OPERATION exception. In particular, small 32-bit positive
> integers and small 32-bit negative integers are illegal instructions. 32-bit Floating
> point values in the range ±[1/128..32) are also illegal OpCodes. Should control be
> transferred into typical integer or floating point data, there is little likelihood of
> executing for a long time before running into an illegal instruction. Executing data
> can also be prevented in the MMU when desired (almost always.)
> <----------------------------------------------------------------------------------------------------------------------------
> This catches programming errors. When I looked, you can't do this with 16-bit
> instructions, as basically every encoding has to be used.
> <
> That is: some architects think of things other than code density--we think of
> code safety--even when the GuestOS fights us (or JavaScript,...) programming
> attack modes, and design architectures that are substantially more robust than
> current ones.
>>
> In particular, My 66000 is immune to the current laundry list of attack strategies
> {Meltdown, Spectré, RowHammer, RoP, ForeShadow, ...}
> <
>> Besides the cost of rearranging bits the cost is nothing with half the
>> opcode space reserved. If you are clever and spread the three bits to fit
>> your open slots.
> <
>>
>> This gives your customers choices, customers like choices. You could upsell
>> this as a future feature like RISCV does and just never deliver, like
>> RISCV. ;)
>>
> I give my customers a choice: Do you want a machine that is secure or not ?
>>
>> You have a two wide design,
> <
> Technically, I have a 1-wide design that can CoIssue some pairs of instructions.
> It is 1-wide because there is a 3R1W register file. It can CoIssue because some
> pairs of instructions do not consume certain register ports. It is not at all
> SuperScalar !
> <
> I also have a 6-wide design. Nothing in ISA or the rest of the architecture makes
> either design necessarily harder than it needs to be. For many of the control
> logic calculations My 66000 ISA requires, I went to the trouble of implementing
> the gate circuitry to perform said duties to verify that the choices were good.
> For example, I can look at a 32-bit word (which takes 320 gates to hold in flip-
> flops) and determine the instruction length in 31 gates (4-gates of delay). Thereby
> Instruction caches for machines less than 16-wide do not need predecoding bits.
> FCMP is performed in the same function unit as Integer CMP. The integer part
> requires 55 gates (4 gates of delay) the FP addition is 17 gates (remains 4 gates
> of delay) So adding FCMP to CMP is a small price {Oh and BTW, I restrict myself
> to 4-in NAND gates and 3-in NOR gates}
> <
>> even once you implement 16 bit opcodes handling
>> the unaligned case of long instructions is not that hard, an extra gate
>> delay? Two? Depending on which of a dozen approaches you pick for size.
> <
> I can assure you that if I had set out to design a good 16-bit OpCode ISA
> I would have succeeded. But over my 40 year career, I have found some
> things more valuable, and I pursued those interests. You are free to do
> differently.
>>
>> The resulting 13 bit opcode is plenty for one register a source ACC and a
>> dest ACC, and a full set of integer operations. And some short loads/store
>> offsets to/from ACC using one address register.
> <
> Then you end up with a cartesian product of various ways to do the same thing.
> This makes the compiler have to figure out if::
> <
> INC Rd ; NoOp
> is better or worse than:
> ADD Rd,Rd,#1
> and what set of rules (or heuristics) govern the selection. I have worked inside
> compilers trying to figure this kind of stuff out. It ain't straightforward.
> <
> I submit that compilers are best when they only need to figure how to do
> something exactly one way.
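
To put a concrete shape on the quoted point about selection rules, the
kind of check a code generator would need might look like this minimal
C sketch (entirely illustrative; not taken from any real compiler):

/* Pick between a short INC form and the general ADD-immediate form. */
enum add_form { FORM_INC, FORM_ADD_IMM };

static enum add_form pick_add_form(long imm, int dst_equals_src,
                                   int optimize_for_size) {
    /* INC only applies when adding 1 to the register it reads, and is */
    /* only clearly a win when code size matters more than whatever    */
    /* the wider form might enable.                                    */
    if (imm == 1 && dst_equals_src && optimize_for_size)
        return FORM_INC;
    return FORM_ADD_IMM;
}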

Re: Why My 66000 is and is not RISC

<t9b7ae$7pqq$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26127&group=comp.arch#26127

 by: BGB - Mon, 27 Jun 2022 03:14 UTC

On 6/26/2022 8:18 PM, MitchAlsup wrote:
> On Sunday, June 26, 2022 at 7:28:11 PM UTC-5, BGB wrote:
>> On 6/26/2022 4:56 PM, MitchAlsup wrote:
>>> On Sunday, June 26, 2022 at 3:31:25 PM UTC-5, gg...@yahoo.com wrote:
>>>> MitchAlsup <Mitch...@aol.com> wrote:
>>>>> On Saturday, June 25, 2022 at 4:17:18 PM UTC-5, gg...@yahoo.com wrote:
>>>
>>>>> Then you end up with a cartesian product of various ways to do the same thing.
>>>>> This makes the compiler have to figure out if::
>>>>> <
>>>>> INC Rd ; NoOp
>>>>> is better or worse than:
>>>>> ADD Rd,Rd,#1
>>>>> and what set of rules (or heuristics) govern the selection. I have worked inside
>>>>> compilers trying to figure this kind of stuff out. It ain't straightforward.
>>>>> <
>>>>> I submit that compilers are best when they only need to figure how to do
>>>>> something exactly one way.
>>> <
>>>> An optimizer pass to convert all loads that are only used once to ACC ops
>>>> is pretty trivial, as a software guy I am not asking for Itanic compiler
>>>> changes.
>>> <
>>> While I can, in general, agree that you are not asking for Titanic additions
>>> to the compiler, you are asking for Titanic alterations of the axioms and
>>> tenets underlying the encoding philosophy of My 66000 ISA. Basically,
>>> you are asking for a complete reset. I am not interested in starting over.
>>> However, You are free to design whatever 16-bit instructions you want for
>>> your architecture.
>> Yeah, there are only so many possible combinations.
> <
> nearly infinite permutations........
>>>>
>>>> All the other 16 bit variants use restricted register sets and has deep
>>>> effects on register coloring, etc. I tried a dozen of these on paper and
>>>> despite my propaganda of the time they all sucked. ;(
>>> <
>>> You and Quadriblock should get together and compare notes........maybe
>>> you could teach him about "what to leave out" as part of your comp.arch
>>> meeting.
>> Yeah, the funky obsession with non-power-of-2 data sizes and similar
>> would probably be high on my list.
> <
> I was wondering if anyone caught the palindrome of his thread "...life the universe
> and everything." In Hitchhikers Guide to the Galaxy the answer to "life the universe
> and everything" is/was 42 whereas Quadriblock's thread converges on 24 which
> is the simple reverse of 42.

FWIW: 128/3 ~= 42 ...

But, yeah, in a more practical sense, unusual data sizes aren't much of
a win.

>>>>
>>>> ACC ops follow the KISS principle. Simple one for one substitution of 16
>>>> bit opcodes for the longer ones where the operation tree makes it possible.
>>> <
>>> I am well aware of how accumulator machines perform. Rather well on the
>>> 60%+ code sequences, and less well on the 40%- code sequences. Whereas
>>> a 32-bit only ISA has but one way of expressing arithmetic and does not so
>>> suffer.
>>> <
>>> Remember I have an <essentially> pure RISC ISA that is achieving x86-64
>>> code density--significantly better than <¿almost?> all other pure RISC ISAs.
>>> I got here by eliminating instruction functionality that could be embodied
>>> elsewhere in the data-path and uniformly encoded in ISA. This makes each
>>> instruction more powerful without adding delay to its execution and makes
>>> the compiler's job a bit easier in expressing the necessary semantics.
> <
>> I will assume in this you *don't* mean the apparent common variation of
>> x86-64 which takes upwards of 1.5MB to build Doom...
> <
> No, in general I am talking about "never having to" in terms of loading
> constants, pasting constants together, negating or inverting operands,
> and the prologue and epilogue handling instructions.

My case is pretty much comparable to x86-64 in what you can do with
immediate and displacement fields:
x86-64: 8 and 32; 64 via a dedicated load.
BJX2: 9 and 33, 64 via a dedicated load.

But, differs in that it has 3R, and is Load/Store.

Also, I have both more registers, and encodings that are often smaller.
Though, x86-64 has:
MOV Rn, Imm64 (80-bits)
Where:
LDI Imm64, Rn (96-bits)
So, x86-64 wins this one...

However, compared with some of the other options (eg: SH-4 or Thumb2),
x86-64's code density seems to be "pretty bad".

Even something like ARMv5 (with fixed-length 32-bit instructions) seems
to be able to beat x86-64 on this metric.

>>>>
>>>> To make best use you need a bunch of 32 bit ops that use ACC, or better yet
>>>> just use a register like R1, and remove that register from normal use, a
>>>> reduction of one register for the other compiler passes.
>>> <
>>> Yech.
>> Agreed...
>>
>> There isn't really a good reason to have hard-coded registers "in
>> general" with 32-bit instruction encodings, particularly not for an
>> accumulator.
>>
> The only hard-coded registers are::
> a) R0 receives the return address when control is delivered to a subroutine.
> b) ENTER and EXIT use R31 = SP to build and tear down stack frames.
> <
> However; there are circumstances where the HW understands that ranges
> of registers have certain properties. For example: when Safe-Stack mode
> is in effect, R16-R31 are saved on Safe Stack, R0 does not receive the return
> address, and R16-R30 when read before write return zeros.

OK.

I have R0, R1, and R15/SP hard-coded in certain contexts.

LR is nominally in CR space, but I sometimes used R1 as a "Secondary
Link-Register" or "Saved-Link-Register", mostly in the context of prolog
and epilog compression.

R0 and R1 can be used as scratch registers (with care); however:
The assembler may stomp them without warning in some cases;
For some instructions they are not allowed:
In some cases the encodings are special cases;
In others the instruction simply can't use them.
...

>>
>> I will make a partial assumption for loading a big constant to a fixed
>> register, where having an instruction for a larger constant load could
>> partly offset the drawback of having "not particularly large" immediate
>> fields in other contexts.
>>
> My 66000 never has to load a constant to a register. Somewhere close
> to 9% of my instruction count vanishes by uniformly supplying large
> constants.

Stuff like:
if(x>=100)
x=99;
Is still kind of a thing...

But, yeah:
y=x+12345678;
Can be handled with an Imm33s encoding.

But, as noted, the FAzz_zzzz and FBzz_zzzz encodings predate the
addition of Jumbo encodings. In the original form of the ISA (and in ISA
subsets without Jumbo) they are more useful.

But, as noted, these only exist in the "unconditional" subspace:
FAzz_zzzz LDIZ Imm24u, R0
FBzz_zzzz LDIN Imm24n, R0
WEX Space (Same spot, just WEX=1):
FEzz_zzzz Jumbo
FFzz_zzzz Op64
Predicate Space:
EAzz_zzzz PrWEX F0?T
EBzz_zzzz PrWEX F2?T
EEzz_zzzz PrWEX F0?F
EFzz_zzzz PrWEX F2?F

So, whether or not these are "still" useful, their "twins" elsewhere in
the encoding space allow for a few other encoding spaces to exist.

Likewise:
FFdd_dddd_FAdd_dddd BRA Abs48
FFdd_dddd_FBdd_dddd BSR Disp48
That is, the encoding is effectively bundled with itself to form the Abs48 branches.

Had I put pretty much anything else there, I would not have been able to
do this stuff in this way.

>>
>> Also at the time, 24 bits would be sufficient in-general for things like
>> data/bss loads and stores, whereas something more modest (such as 16 or
>> 20 bit) would not have been sufficient (several of these programs
>> effectively have several MB of '.bss').
>>
> In My 66000 memory mapping, a simple application such as cat can have
> the .text, .data, .bss, ... each separated by GBs in the virtual address
> space and yet only need 1 page of memory mapping tables!

OK.

I am mostly dealing with programs like Doom and Quake effectively
needing ~ 21..23 bits to be able to address across the size of their
'.bss' section.

One can eliminate a lot of bits by using GBR (Global-Base-Register) or
similar, but one still needs a lot of bits to deal with the size of the section.

One could use fewer bits by using a GOT, but this just sorta replaces the
large displacements with additional memory loads and tables (not really
a win).
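
For what it is worth, the difference can be sketched in C roughly as
below (gbr_base and got are placeholder names, not BJX2 specifics):

/* GBR-relative access needs a large displacement but no extra load;  */
/* GOT-style access trades the displacement for an extra memory load  */
/* through a table of pointers.                                       */
extern char  gbr_base[];    /* stands in for the global base register */
extern void *got[];         /* stands in for a global offset table    */

int load_via_gbr(long disp)         { return *(int *)(gbr_base + disp); }
int load_via_got(int idx, long off) { return *(int *)((char *)got[idx] + off); }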


Re: Why My 66000 is and is not RISC

<t9bffh$mg2$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=26128&group=comp.arch#26128

 by: Thomas Koenig - Mon, 27 Jun 2022 05:33 UTC

Ivan Godard <ivan@millcomputing.com> schrieb:

> There are advantages to dedicated base registers, set as a side effect
> of other operations. You can keep them where they are used, instead of
> in a regfile, saving wire delay. You need fewer of them (Mill: 8 vs 32)
> saving fan-in. If you restrict addressable regions to not cross a 4Gb
> (or less) boundary then you can have a narrower address adder (Mill: 32
> bits vs 64).

Hmm... does that mean that the Mill has to do special things to
address arrays > 4 GB?

Re: Why My 66000 is and is not RISC

<t9bfms$mg2$2@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=26129&group=comp.arch#26129

 by: Thomas Koenig - Mon, 27 Jun 2022 05:37 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:

> My 66000 never has to load a constant to a register. Somewhere close
> to 9% of my instruction count vanishes by uniformly supplying large
> constants.

It does not have to, but it makes sense to do so (and is
now done) if the same constant is stored multiple times, see
https://github.com/bagel99/llvm-my66000/issues/2 .

Re: Why My 66000 is and is not RISC

<t9bk0h$asu0$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26130&group=comp.arch#26130

 by: BGB - Mon, 27 Jun 2022 06:50 UTC

On 6/27/2022 12:37 AM, Thomas Koenig wrote:
> MitchAlsup <MitchAlsup@aol.com> schrieb:
>
>> My 66000 never has to load a constant to a register. Somewhere close
>> to 9% of my instruction count vanishes by uniformly supplying large
>> constants.
>
> It does not have to, but it makes sense to do so (and is
> now done) if the same constant is stored multiple times, see
> https://github.com/bagel99/llvm-my66000/issues/2 .

Agreed, sometimes it makes sense.

FWIW: In BGBCC, constants are generally treated like a sort of read-only
variable, and so may be pulled into a register just as a variable would be.

Though, admittedly, my compiler isn't smart enough to decide when it
might be better to use an immediate versus pulling the value into a
register (this decision would need to be made at the code-generation
level rather than the assembler level). This would likely require
adding a heuristic of some sort.
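
As a very rough sketch of the kind of heuristic meant here (purely
illustrative; the threshold and names are made up):

/* Decide whether a constant should be kept in a register or simply
   re-emitted as an immediate at each use site. */
static int keep_constant_in_register(int use_count, int fits_imm_field) {
    /* If the value fits the immediate field, re-emitting it is cheap. */
    if (fits_imm_field)
        return 0;
    /* Otherwise a register pays off once the value is reused "enough" */
    /* times; the threshold of 3 is arbitrary and would need tuning.   */
    return use_count >= 3;
}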

Though, I would suspect cases like the one shown are in the minority.

The specific case shown in the issue-tracker would not come up in BJX2,
because BJX2 doesn't support directly storing a constant to memory; it
would be forced to load the value into a register and then store it, and
by extension the register would likely be reused across the multiple
stores (since each time the value is needed again, the compiler will see
that it is already present in a register).

There are cases where this could be useful; I just don't expect they
would come up often enough to justify the encoding.

Re: Why My 66000 is and is not RISC

<t9ceo5$hs2h$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26132&group=comp.arch#26132

 by: Ivan Godard - Mon, 27 Jun 2022 14:27 UTC

On 6/26/2022 10:33 PM, Thomas Koenig wrote:
> Ivan Godard <ivan@millcomputing.com> schrieb:
>
>> There are advantages to dedicated base registers, set as a side effect
>> of other operations. You can keep them where they are used, instead of
>> in a regfile, saving wire delay. You need fewer of them (Mill: 8 vs 32)
>> saving fan-in. If you restrict addressable regions to not cross a 4Gb
>> (or less) boundary then you can have a narrower address adder (Mill: 32
>> bits vs 64).
>
> Hmm... does that mean that the Mill has to do special things to
> address arrays > 4 GB?

Arrays can be of any size that mmap is willing to give you. Generated
code for constant offsets bigger than 2^32 builds a pointer by explicit
arithmetic rather than by using the address adder. The arithmetic costs
two instructions (con, addp) and a cycle, but saves in the AA for all
offsets < 2^32. We judge that to be a worthwhile tradeoff.
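
A rough C sketch of the decision being described, assuming the 32-bit
address-adder limit above (con/addp are the Mill ops named in the text;
everything else here is made up for illustration):

#include <stdint.h>

typedef enum { USE_ADDRESS_ADDER, USE_CON_ADDP } lowering;

static lowering pick_lowering(uint64_t const_offset) {
    /* Offsets below 2^32 go through the narrow address adder for free;   */
    /* larger ones pay two ops (con, addp) and a cycle to form a pointer. */
    return (const_offset < ((uint64_t)1 << 32)) ? USE_ADDRESS_ADDER
                                                : USE_CON_ADDP;
}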

Re: Why My 66000 is and is not RISC

<DwluK.284756$70j.64197@fx16.iad>

https://www.novabbs.com/devel/article-flat.php?id=26134&group=comp.arch#26134

 by: EricP - Mon, 27 Jun 2022 17:10 UTC

Thomas Koenig wrote:
> MitchAlsup <MitchAlsup@aol.com> schrieb:
>
>> My 66000 never has to load a constant to a register. Somewhere close
>> to 9% of my instruction count vanishes by uniformly supplying large
>> constants.
>
> It does not have to, but it makes sense to do so (and is
> now done) if the same constant is stored multiple times, see
> https://github.com/bagel99/llvm-my66000/issues/2 .

Is there something odd about the assembler output at the top, where
it seems to repeatedly spill a register to the local stack frame and
then use it as a pointer? E.g.

std r26,[sp,216]
.loc 1 268 1 ; fatigue2.f90:268:1
std #-4317352126650676160,[r26]

plus it does this 5 other times.
Just checking.

Code Density Deltas (Re: Why My 66000 is and is not RISC)

<t9d0i4$mmiv$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26135&group=comp.arch#26135

 by: BGB - Mon, 27 Jun 2022 19:31 UTC

On 6/26/2022 10:14 PM, BGB wrote:
> On 6/26/2022 8:18 PM, MitchAlsup wrote:
>> On Sunday, June 26, 2022 at 7:28:11 PM UTC-5, BGB wrote:
>>> On 6/26/2022 4:56 PM, MitchAlsup wrote:
>>>> On Sunday, June 26, 2022 at 3:31:25 PM UTC-5, gg...@yahoo.com wrote:
>>>>> MitchAlsup <Mitch...@aol.com> wrote:
>>>>>> On Saturday, June 25, 2022 at 4:17:18 PM UTC-5, gg...@yahoo.com
>>>>>> wrote:
>>>>

<snip>

>
> My case is pretty much comparable to x86-64 in what you can do with
> immediate and displacement fields:
>   x86-64: 8 and 32; 64 via a dedicated load.
>   BJX2: 9 and 33, 64 via a dedicated load.
>
> But, differs in that it has 3R, and is Load/Store.
>
> Also, I have both more registers, and encodings that are often smaller.
>   Though, x86-64 has:
>     MOV Rn, Imm64 (80-bits)
>   Where:
>     LDI Imm64, Rn (96-bits)
>   So, x86-64 wins this one...
>
>
> However, compared with some of the other options (eg: SH-4 or Thumb2),
> x86-64's code density seems to be "pretty bad".
>
> Even something like ARMv5 (with fixed-length 32-bit instructions) seems
> to be able to beat x86-64 on this metric.
>
>

So, the thought is to compare a few common-case encodings for size between
x86-64 (x64) and BJX2 (mostly cases where direct analogs exist); a quick
breakdown of where the x64 byte counts come from follows the list:
MOV Rxx, Imm64 | LDI Imm64, Rn
10(x64) vs 12(BJX2), x64 wins
MOV Rxx, Imm32 | LDI Imm8, Rn
7(x64) vs 2(BJX2), BJX2 wins
(x86-64 lacking a smaller immediate than Imm32 here)

MOV Rxx, [Rb+Ri*8] | MOV.Q (Rm, Ri), Rn
4(x64) vs 4(BJX2), tie
MOV Rxx, [Rb+Disp8] | MOV.Q (Rm, Disp9u), Rn
4 (x64) vs 4(BJX2), tie

MOV Rxx, [Rb+Ri*Sc+Disp8] | MOV.Q (Rb, Ri*Sc, Disp11), Rn
5(x64) vs 8(BJX2), x64 wins
MOV Rxx, [RIP+Disp32] | MOV.Q (PC, Disp33), Rn
7(x64) vs 8(BJX2), x64 wins

ADD Rxx, Rxx | ADD Rm, Rn
3(x64) vs 2(BJX2), BJX2 wins

ADD Rn, Imm32 | ADD Imm8, Rn
7(x64) vs 2(BJX2), BJX2 wins
(another scenario where x86-64 lacks smaller immeds)
(You don't get byte values unless working on byte registers).
ADD Rn, Imm32 | ADD Imm16s, Rn
7(x64) vs 4(BJX2), BJX2 wins
ADD Rn, Imm32 | ADD Imm33s, Rn
7(x64) vs 8(BJX2), x64 wins
MOV Rt, Imm64; ADD Rn, Rt | ADD Imm64, Rn
13(x64) vs 12(BJX2), BJX2 wins
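
(For reference, the x64 byte counts above tally from the REX.W forms,
roughly as follows:
MOV Rxx, Imm64         : REX + opcode + imm64            = 1+1+8     = 10
MOV Rxx, Imm32         : REX + opcode + ModRM + imm32    = 1+1+1+4   = 7
MOV Rxx, [Rb+Ri*8]     : REX + opcode + ModRM + SIB      = 1+1+1+1   = 4
MOV Rxx, [Rb+Ri*Sc+D8] : REX + opcode + ModRM + SIB + d8 = 1+1+1+1+1 = 5
MOV Rxx, [RIP+Disp32]  : REX + opcode + ModRM + disp32   = 1+1+1+4   = 7
ADD Rxx, Rxx           : REX + opcode + ModRM            = 1+1+1     = 3
ADD Rn, Imm32          : REX + opcode + ModRM + imm32    = 1+1+1+4   = 7 )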

In many areas where there is a direct 1:1 comparison, x86-64 seems to be
ahead, except in cases where x86-64 only has a 32-bit immediate.

I am generally assuming encodings here where REX is used (typical case,
basically required to use registers as 64-bit).

I decided to allow comparing REX-prefixed encodings against 16-bit
encodings on the basis that both cases have the same number of usable
registers (16 in this case).

In general, BJX2 has 2x or 4x (XGPR) as many GPRs as x86-64.
Most ops from R16..R31, or R32..R63, will require 32-bit encodings.

In many cases, speed-optimized modes would add a penalty for BJX2, since
this mode significantly reduces the number of 16-bit encodings used.
This is partly due to a mix of register assignments (nearly always
enables R16..R31), and because it needs 32-bit encodings for the
WEXifier to be able to do its thing (instruction-level swap-and-bundle
would not be viable with 16-bit encodings thrown in the mix).

The 16-bit encodings remain fairly common in prolog/epilog sequences
though (and in a lot of the hand-written ASM).

Main differences come up in areas where things are not 1:1, eg:
MOV Rnn, Rss; ADD Rnn, Rtt | ADD Rs, Rt, Rn
6(x64) vs 4(BJX2), BJX2 wins
MOV Rxx+0, [RSP+Disp]; MOV Rxx+1, [RSP+Disp] | MOV.X (SP, disp4), Xn
10(x64) vs 2(BJX2), BJX2 wins

Would be a bigger difference for more extreme cases:
MOV Rnn0, Rss0; MOV Rnn1, Rss1;
CLC; ADC Rnn0, Rtt0; ADC Rnn1, Rtt1
Vs:
ADDX Xs, Xt, Xn
13(x64) vs 4(BJX2), BJX2 wins.

3R (Basic):
MOV Rnn, Rss; ADD Rnn, Rtt
Vs:
ADD Rs, Rt, Rn
6(x64), 4(BJX2), BJX2 wins.

3R (Load):
MOV Rnn, Rss; ADD Rnn, [Rtt, Disp8]
Vs:
MOV.Q (Rt, Disp9), Rx; ADD Rs, Rx, Rn
7(x64) vs 8(BJX2), x64 wins.

Or, predication:
CMP Rss, Rtt; JGT .L0; MOV Rnn, Rtt; .L0:
Vs:
CMPQGT Rt, Rs; MOV?F Rt, Rn
8(x64), 6(BJX2), BJX2 wins

....

Will mostly ignore SIMD / SSE here.

One other area of differences is that x86-64 allows the source or
destination to be memory.

This could give an advantage to x86-64 in cases where it operates on
memory, eg:
ADD Rn, [Rs] | MOV.Q (Rs), Rx; ADD Rx, Rn
3 (x64) vs 4 (BJX2), x64 wins

ADD Rn, [RSP+8] | MOV.Q (SP, 8), Rx; ADD Rx, Rn
5 (x64) vs 4 (BJX2), BJX2 wins

ADD [Rn], Rs | MOV.Q (Rn), Rx; ADD Rs, Rx; MOV.Q Rx, (Rn)
3 (x64) vs 6 (BJX2), x64 wins

ADD [RSP+8], Rs | MOV.Q (SP, 8), Rx; ADD Rs, Rx; MOV.Q Rx, (SP, 8)
5 (x64) vs 6 (BJX2), x64 wins

....

Though, BGBCC tries to minimize the number of loads and stores, whereas
a lot of generated x86-64 code uses memory operands fairly often.

Granted, one can make use of having a larger register space (32 or 64)
by statically-assigning commonly used variables to registers, which is
less viable with 16 registers.

Ironically, this is an area where RISC-V could have an advantage with
the 'A' extension, which does allow a limited set of operations to use
direct memory operands. But, I am less a fan of this, as I would rather
stick with plain Load/Store unless there is a good reason to do
otherwise.

In effect, doing something like this would likely involve needing to
stick an additional ALU into the L1 D$ or similar. One could argue
though that it could make sense on the basis that Load+Op and
Load+Op+Store sequences are still "not particularly rare"...

Looking at it on this level, x86-64 and BJX2 should be more comparable
in terms of code density, with x86-64 a little more competitive on this
metric than what I am often seeing.

This is not what I see in practice though, where the x86-64 binaries
tend to be quite a bit larger (assuming uncompressed binary sizes).

I suspect that quite possibly the compilers are wasting a large amount
of space somewhere (though ".text" still tends to be pretty large, IME,
even for size-optimized builds).

More so given the (often fairly large) size delta between 32 and 64 bit
binaries (somewhat beyond what could be explained away via the REX byte,
PUSH going from 1 to 3 bytes, ...).

While ABIs are different, I can note that both x64 and BJX2 tend to use
a similar ABI design in this area (passing arguments in registers, ...).

Granted, I guess the counter-point would be if other people are not
seeing x86-64 binaries that are seemingly "overly large for no
particularly obvious reason".

At first, I was thinking it was mostly an MSVC thing, but then noted
that GCC also appears to be doing this in my case.

I could speculate on possible causes, but lack anything solid at the moment.

....

