Re: a modest proposal (apologies to J. Swift)

<662f1619-a5e0-4174-a728-2c795fdbfe35n@googlegroups.com>

X-Received: by 2002:a0c:eb0c:: with SMTP id j12mr10144822qvp.3.1624614941952; Fri, 25 Jun 2021 02:55:41 -0700 (PDT)
X-Received: by 2002:a05:6830:1bf7:: with SMTP id k23mr9331016otb.206.1624614941732; Fri, 25 Jun 2021 02:55:41 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 25 Jun 2021 02:55:41 -0700 (PDT)
In-Reply-To: <sb2ukd$5k9$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=87.68.182.191; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 87.68.182.191
References: <samgjd$cbd$1@dont-email.me> <sao5ar$1cl7$1@gal.iecc.com> <sapo7q$c9v$1@dont-email.me> <saqctc$2rso$1@gal.iecc.com> <saqj72$mnk$1@dont-email.me> <7a9e6be9-e406-467a-93b7-2b14f7d07deen@googlegroups.com> <saqsq8$r75$1@dont-email.me> <d705b13b-5b6f-4943-b454-d057a310fb8en@googlegroups.com> <sat9lt$vvv$1@dont-email.me> <ca127074-6978-44ae-9b09-8a304b921026n@googlegroups.com> <sb22rk$n03$1@dont-email.me> <8b3c0cb3-400a-4987-8ef6-95a23c9abbfan@googlegroups.com> <sb2gio$rc9$1@dont-email.me> <a2f52a02-5c51-4e27-a26f-40ad1ddd08acn@googlegroups.com> <sb2ukd$5k9$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <662f1619-a5e0-4174-a728-2c795fdbfe35n@googlegroups.com>
Subject: Re: a modest proposal (apologies to J. Swift)
From: already5...@yahoo.com (Michael S)
Injection-Date: Fri, 25 Jun 2021 09:55:41 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 193
 by: Michael S - Fri, 25 Jun 2021 09:55 UTC

On Friday, June 25, 2021 at 12:49:03 AM UTC+3, BGB wrote:
> On 6/24/2021 1:44 PM, MitchAlsup wrote:
> > On Thursday, June 24, 2021 at 12:49:14 PM UTC-5, BGB wrote:
> >> On 6/24/2021 11:50 AM, MitchAlsup wrote:
> >>> On Thursday, June 24, 2021 at 8:55:02 AM UTC-5, BGB wrote:
> >>>> (Got distracted, posting got delayed...)
> >>>> On 6/22/2021 2:26 PM, MitchAlsup wrote:
> >>>
> >>>>> <
> >>>>> I hope this is for better reasons than MIPS used in R2000 and R3000.....
> >>>>> <
> >>>>> R0 is the only register with HW defined use--and that is
> >>>>> a) gets IP return address when a CALL is performed
> >>>>> b) is an alias for IP when used as a base register
> >>>>> c) is an alias for zero when used as an index register
> >>>>> <
> >>>>> R31 (SP) has a bit that changes how ENTER and EXIT work
> >>>>> R30 (FP) has a bit that changes how ENTER and EXIT work.
> >>> <
> >>>> R0: May be stomped without warning if one tries to encode an operation
> >>>> with an immediate, and the immediate form can't otherwise be encoded for
> >>>> whatever reason.
> >>> <
> >>> I got rid of any need to stomp registers other than direct instructions.
> >>> So, My 66000 does not stomp any registers, it can supply immediates
> >>> of appropriate sizes.
> >> The original design was built around the assumption of register stomping
> >> (for both R0 and R1).
> >>
> >> With the ISA as it exists now, there is a lot less of it, but it still
> >> exists as a possibility. The stomp register was made mostly off-limits
> >> mostly because a "not off-limits" stomp register was super annoying in
> >> its predecessors (SH and BJX1).
> >>
> >> I moved the stuff that would have originally used R0 and R1 in the SH
> >> ABI to use R2 and R3 in the BJX2 ABI.
> >>
> >>
> >> R1 ends up being used a lot as a "secondary stomp register" by the
> >> compiler, typically when generating an instruction sequence for a 3AC op
> >> will require a temporary register (similar for R16 and R17).
> >>
> >> But, since these aren't really allowed as memory base or displacements,
> >> some built-in ops (such as memory copies) need to use the other scratch
> >> registers, which then blows the function's status as a "pure leaf" (so
> >> it creates a stack frame and uses the old register allocator).
> >>
> >> At present, the "pure leaf" status also excludes functions containing
> >> floating point and vector types, but this may change later.
> >>>>
> >>>> Say, assuming Jumbo isn't allowed:
> >>>> ADD R4, -9999, R6
> >>>> Might be encoded as if it were:
> >>>> MOV -9999, R0
> >>>> ADD R4, R0, R6
> >>> <
> >>> You are using instructions to build and deliver constants into the instruction stream.
> >> As noted, "assuming Jumbo isn't allowed".
> >>
> >> If Jumbo is allowed, this case can be in-effect encoded as a single
> >> instruction.
> >>
> >>
> >> Jumbo isn't always allowed though:
> >> It is still an optional feature as far as the ISA is concerned:
> >> Lite profile cores are allowed to omit it (1).
> > <
> > This is a choice bordering on the poor end of the spectrum of choices.
> >>
> >>
> >> *1: The Lite profile may differ in a few ways:
> >> Scalar only;
> >> 16/32 bit ops only;
> >> Jumbo is optional (*2);
> > <
> > Note: Even VVM is mandatory in My 66000--but VVM can be performed on
> > a 1-wide in-order machine with no extra circuits other than a VVM sequencer.
> > <
> >> FPU and MMU are optional;
> > <
> > Bad choice:: My 66000 does not even come out of reset with the MMU turned off !!
> > Even I/O is defined to pass through an I/O MMU.
> The FPU and MMU are required for the Full profile, but not for the Lite
> profile.
>
> The Lite profile is intended more for use as a microcontroller, and
> these profiles are not required to be binary compatible.
>
>
> Granted, potentially a MMU could stretch KB of RAM a little further, if
> one is implementing virtual memory onto an SDcard or similar. Though,
> even 4K pages are a pretty coarse granularity when the RAM is in KB
> territory.
>
> Though it could almost make sense in this case to try devising some sort
> of "Use SDcard like non-volatile RAM" mechanism (or externally wire up a
> QSPI SRAM module or similar).
>
>
> An FPU is kinda expensive, and not really needed in a microcontroller.
> > <
> > I would not design an ISA these days, with available gate counts, that does not
> > have FP built in.
> > <
> The FPU exists in the Full profile, just it is a bit of a stretch on the
> smaller FPGAs.
>
> Similarly, the MMU adds a lot of cost for a small scalar core, and one
> might want to, say, try to keep their core under 10k LUTs so that they
> can have some space left for IO devices and similar.
> >> No 128-bit operations;
> > <
> > All access to multi-precision stuff is through CARRY in My 66000. CARRY
> > is not optional, but it is only a couple of sequences on low end processors.
> > <
> This mostly covers MOV.X and 128-bit SIMD, which as-implemented require
> multiple lanes. On a Lite core it needs to be broken up into MOV.Q ops.
> >> Uses a smaller address space
> >> 32-bit physical, split into a 30-bit (RAM) and 28-bit region (MMIO).
> > <
> > I control this through the <non optional> MMU. The programmer sees
> > either a 32-bit virtual address space or a 64-bit virtual address space.
> > Either of which may see a 40-bit-to-64-bit physical address space.
> Currently, it is 32-bit in Lite, 48 bit in Full.
>
> The current physical space is a 30-bit RAM space:
> 0000_0000 .. 3FFF_FFFF
> And, 28-bit MMIO space:
> F000_0000 .. FFFF_FFFF
>
> Could support a bigger physical space, but none of the FPGA boards I
> have, have enough RAM to make this worthwhile.
>
> For now, RAM generally goes into the area:
> 0100_0000 .. 07FF_FFFF
>
>
> On the CMod-S7 I have, it puts its RAM space into:
> 0100_0000 .. 0101_FFFF
>
> In this case, the L2 cache functions as RAM, because there is no
> external RAM on this board.
>
> Another (slightly more expensive) version of the CMod board had a 512K
> RAM module though, but the board I got didn't have any RAM.
> >>
> >> This is mostly to make it viable to use on the XC7S25 and similar (or
> >> potentially also various ECP5 devices or similar). Granted, in this area
> >> it would face more competition against RISC-V or similar.
> >>
> > I am aware of your implementation limitations.
> As can be noted though, I am mostly targeting the XC7A100...
>
>
> But, the XC7S25 is more limited than the XC7A100, as are most of the
> ECP5 chips.
>
> The ECP5's can have a decent number of LUTs, but are a little more
> limited in that it is built around LUT4's.
>
> There is also ICE40, but there is little hope of fitting a BJX2 core
> onto an ICE40. For the most part, people are doing 8/16 designs on ICE40
> though.
>
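For reference, the physical map quoted above (a 30-bit RAM space at 0000_0000 .. 3FFF_FFFF and a 28-bit MMIO space at F000_0000 .. FFFF_FFFF) can be sketched as a simple decoder. This is only an illustration of the quoted ranges; the function and region names are made up here and are not taken from BJX2.

```python
# Sketch of the quoted 32-bit physical map. The ranges come from the quoted
# text; the names classify/region_size are hypothetical, for illustration only.
RAM_BASE, RAM_LIMIT = 0x0000_0000, 0x3FFF_FFFF    # 30-bit RAM space
MMIO_BASE, MMIO_LIMIT = 0xF000_0000, 0xFFFF_FFFF  # 28-bit MMIO space


def region_size(base: int, limit: int) -> int:
    """Number of bytes in an inclusive [base, limit] range."""
    return limit - base + 1


def classify(addr: int) -> str:
    """Return which quoted region a 32-bit physical address falls into."""
    if not 0 <= addr <= 0xFFFF_FFFF:
        raise ValueError("not a 32-bit physical address")
    if RAM_BASE <= addr <= RAM_LIMIT:
        return "ram"
    if MMIO_BASE <= addr <= MMIO_LIMIT:
        return "mmio"
    return "unmapped"


# The CMod-S7 window quoted above, 0100_0000 .. 0101_FFFF, is 128 KiB.
cmod_s7_bytes = region_size(0x0100_0000, 0x0101_FFFF)
```

Everything between the two regions is treated as unmapped in this sketch, which is an assumption; the quoted text does not say what lives there.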

I have no idea who the "people" you are talking about are. I would think that most of them are enthusiasts rather than pros.

Being a pro myself, I very much prefer 32-bit soft cores supported by standard compilers (in practice that means gcc).
For the convenience of reliable tools I am fully willing to pay the cost of a pretty low MIPS/MHz, which is typical for
a resource-constrained implementation of such a core.
In particular, I often use the Nios2e core, which gives ~0.15 MIPS/MHz, fits in ~600 4-input LUTs + 2 embedded memory banks,
and hits pretty high frequencies. I would guess that on your XC7A100 it would be close to 200 MHz.
A core like that fits with ease within an iCE5LP4K, and for some less logic-intensive apps it can be useful even in an iCE5LP2K.
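
To put rough numbers on that: throughput is just MIPS/MHz times clock. A back-of-envelope sketch, where 0.15 MIPS/MHz is the figure above and 200 MHz is only my guess for the XC7A100, not a measurement (the function name is made up for illustration):

```python
# Back-of-envelope estimate only. 0.15 MIPS/MHz is the figure from the post;
# 200 MHz is a guessed clock for an XC7A100, not a measured Fmax.
def estimated_mips(mips_per_mhz: float, clock_mhz: float) -> float:
    """Throughput in MIPS for a given per-MHz efficiency and clock."""
    return mips_per_mhz * clock_mhz


NIOS2E_MIPS_PER_MHZ = 0.15
GUESSED_CLOCK_MHZ = 200.0

nios2e_mips = estimated_mips(NIOS2E_MIPS_PER_MHZ, GUESSED_CLOCK_MHZ)
print(f"about {nios2e_mips:.0f} MIPS")
```

So a tiny core of this kind would land at roughly 30 MIPS if the frequency guess holds.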

Of course, Nios2 is the property of Altera/Intel and (IANAAL) most likely can't be legally used on Lattice devices.
But it's extremely similar to the fixed-width variant of RV32, which is certainly legal to use everywhere.

I didn't check whether soft RV32 cores of this class are available, for little cost or completely free of charge, but personally
I would be very surprised if they were not, because designing such a core is ~6 weeks of dedicated work for an above-average,
but not outstanding, CA student. That does not include validation. I have no feel for the cost of validating an MMU-less
RV32, because it strongly depends on the maturity of the available tools, which I have never looked into.

I don't even think that ~0.15 MIPS/MHz in ~600 4-input LUTs is the best possible. Trading off a bit of achievable
frequency, one can do 0.25 MIPS/MHz in 600 LUTs or over 0.5 MIPS/MHz in 750-800 LUTs.
I know, because I played with these things a couple of years ago, under a few artificial constraints meant to make the game more
interesting. Generally, designing CPUs is not my passion, so after 2-3 months of playing I lost interest, but I got to the stage
where a few of my core variants were running real, useful software (one of our production utilities, mostly ECDSA validation)
and I had measured their performance and FPGA resource utilization, so I consider myself somewhat qualified to make statements
like the above.
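
For comparison, here are the three design points above expressed as MIPS/MHz per 1000 LUTs. This is a sketch: the labels are made up, and for the third point I take the conservative end of both ranges (exactly 0.5 MIPS/MHz and 800 LUTs), so the real density would be somewhat higher.

```python
# (label, MIPS/MHz, 4-input LUTs). Numbers are from the post; labels are
# illustrative. The third entry uses the pessimistic end of "over 0.5" and
# "750-800 LUTs", so it is a lower bound on that design point.
DESIGN_POINTS = [
    ("Nios2e-class baseline", 0.15, 600),
    ("same area, lower Fmax", 0.25, 600),
    ("slightly larger core", 0.50, 800),
]


def density_per_kilo_lut(mips_per_mhz: float, luts: int) -> float:
    """MIPS/MHz delivered per 1000 LUTs."""
    return 1000.0 * mips_per_mhz / luts


densities = [density_per_kilo_lut(m, l) for _, m, l in DESIGN_POINTS]

for (label, _, _), d in zip(DESIGN_POINTS, densities):
    print(f"{label}: {d:.3f} MIPS/MHz per kLUT")
```

Note that this ignores the frequency that is traded away, so it says nothing about absolute MIPS, only about how much per-clock work each LUT buys.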

>
> Though, one can argue though that even on the lower-end, the FPGAs still
> aren't particularly cost-competitive with a Teensy or Pi Pico or
> similar... But, can have more powerful IO.
