

devel / comp.arch / The Computer of the Future

Subject (Author)
* The Computer of the Future (Quadibloc)
+* Re: The Computer of the Future (MitchAlsup)
|+- Re: The Computer of the Future (Brett)
|+* Re: The Computer of the Future (Quadibloc)
||`* Re: The Computer of the Future (MitchAlsup)
|| `- Re: The Computer of the Future (Scott Smader)
|`* Re: The Computer of the Future (Quadibloc)
| `* Re: The Computer of the Future (MitchAlsup)
|  +* Re: The Computer of the Future (Quadibloc)
|  |+* Re: The Computer of the Future (Quadibloc)
|  ||+* Re: The Computer of the Future (Quadibloc)
|  |||`* Re: The Computer of the Future (Quadibloc)
|  ||| `- Re: The Computer of the Future (Quadibloc)
|  ||`* Re: The Computer of the Future (MitchAlsup)
|  || +* Re: The Computer of the Future (Quadibloc)
|  || |`* Re: The Computer of the Future (Stefan Monnier)
|  || | `- Re: The Computer of the Future (MitchAlsup)
|  || `* Re: The Computer of the Future (Ivan Godard)
|  ||  +- Re: The Computer of the Future (Quadibloc)
|  ||  +* Re: The Computer of the Future (Tim Rentsch)
|  ||  |`* Re: The Computer of the Future (Terje Mathisen)
|  ||  | `* Re: The Computer of the Future (Tim Rentsch)
|  ||  |  `* Re: The Computer of the Future (Terje Mathisen)
|  ||  |   `* Re: The Computer of the Future (Tim Rentsch)
|  ||  |    `* Re: The Computer of the Future (Terje Mathisen)
|  ||  |     +* Re: The Computer of the Future (Tim Rentsch)
|  ||  |     |`* Re: The Computer of the Future (Terje Mathisen)
|  ||  |     | +* Re: The Computer of the Future (David Brown)
|  ||  |     | |+* Re: The Computer of the Future (Michael S)
|  ||  |     | ||`- Re: The Computer of the Future (Tim Rentsch)
|  ||  |     | |`* Re: The Computer of the Future (Ivan Godard)
|  ||  |     | | +- Re: The Computer of the Future (David Brown)
|  ||  |     | | +* Re: The Computer of the Future (BGB)
|  ||  |     | | |`* Re: The Computer of the Future (Terje Mathisen)
|  ||  |     | | | +* Re: The Computer of the Future (David Brown)
|  ||  |     | | | |`* Re: The Computer of the Future (Niklas Holsti)
|  ||  |     | | | | +- Re: The Computer of the Future (David Brown)
|  ||  |     | | | | `* Re: The Computer of the Future (Terje Mathisen)
|  ||  |     | | | |  `* Re: The Computer of the Future (Thomas Koenig)
|  ||  |     | | | |   `* Re: The Computer of the Future (David Brown)
|  ||  |     | | | |    +* Re: The Computer of the Future (Thomas Koenig)
|  ||  |     | | | |    |`- Re: The Computer of the Future (David Brown)
|  ||  |     | | | |    +* Re: The Computer of the Future (Niklas Holsti)
|  ||  |     | | | |    |`- Re: The Computer of the Future (David Brown)
|  ||  |     | | | |    `* Re: The Computer of the Future (Michael S)
|  ||  |     | | | |     +* Re: The Computer of the Future (David Brown)
|  ||  |     | | | |     |`* Re: The Computer of the Future (MitchAlsup)
|  ||  |     | | | |     | `* Re: The Computer of the Future (David Brown)
|  ||  |     | | | |     |  +- Re: The Computer of the Future (Thomas Koenig)
|  ||  |     | | | |     |  `- Re: The Computer of the Future (Quadibloc)
|  ||  |     | | | |     +- Re: The Computer of the Future (Quadibloc)
|  ||  |     | | | |     `- Re: The Computer of the Future (Quadibloc)
|  ||  |     | | | `* Re: The Computer of the Future (BGB)
|  ||  |     | | |  `* Re: The Computer of the Future (MitchAlsup)
|  ||  |     | | |   `- Re: The Computer of the Future (BGB)
|  ||  |     | | `* Re: The Computer of the Future (Quadibloc)
|  ||  |     | |  `- Re: The Computer of the Future (Ivan Godard)
|  ||  |     | +* Re: The Computer of the Future (Tim Rentsch)
|  ||  |     | |`* Re: The Computer of the Future (Terje Mathisen)
|  ||  |     | | `- Re: The Computer of the Future (BGB)
|  ||  |     | `- Re: The Computer of the Future (Quadibloc)
|  ||  |     `* Re: The Computer of the Future (Bill Findlay)
|  ||  |      +- Re: The Computer of the Future (Terje Mathisen)
|  ||  |      +* Re: The Computer of the Future (Michael S)
|  ||  |      |`* Re: The Computer of the Future (Michael S)
|  ||  |      | `* Re: The Computer of the Future (David Brown)
|  ||  |      |  +- Re: The Computer of the Future (Michael S)
|  ||  |      |  +* Re: The Computer of the Future (Tom Gardner)
|  ||  |      |  |+- Re: The Computer of the Future (Thomas Koenig)
|  ||  |      |  |`* Re: The Computer of the Future (David Brown)
|  ||  |      |  | `* Re: The Computer of the Future (Tom Gardner)
|  ||  |      |  |  `* Re: The Computer of the Future (David Brown)
|  ||  |      |  |   `* Re: The Computer of the Future (Niklas Holsti)
|  ||  |      |  |    `- Re: The Computer of the Future (David Brown)
|  ||  |      |  `- Re: The Computer of the Future (Andy Valencia)
|  ||  |      +* Re: The Computer of the Future (Tim Rentsch)
|  ||  |      |`- Re: The Computer of the Future (Bill Findlay)
|  ||  |      `* Re: The Computer of the Future (Quadibloc)
|  ||  |       `* Re: The Computer of the Future (Quadibloc)
|  ||  |        `* Re: The Computer of the Future (Quadibloc)
|  ||  |         `- Re: The Computer of the Future (Quadibloc)
|  ||  +* Re: The Computer of the Future (John Levine)
|  ||  |+- Re: The Computer of the Future (Anton Ertl)
|  ||  |`* Re: The Computer of the Future (Quadibloc)
|  ||  | +* Re: The Computer of the Future (Thomas Koenig)
|  ||  | |+- Re: The Computer of the Future (JimBrakefield)
|  ||  | |+* Re: The Computer of the Future (Quadibloc)
|  ||  | ||`* Re: The Computer of the Future (BGB)
|  ||  | || `* Re: The Computer of the Future (MitchAlsup)
|  ||  | ||  `* Re: The Computer of the Future (BGB)
|  ||  | ||   `- Re: The Computer of the Future (MitchAlsup)
|  ||  | |`* Re: The Computer of the Future (Terje Mathisen)
|  ||  | | +* Re: The Computer of the Future (Stephen Fuld)
|  ||  | | |+* FPGAs (was: The Computer of the Future) (Anton Ertl)
|  ||  | | ||+- Re: FPGAs (was: The Computer of the Future) (BGB)
|  ||  | | ||+* Re: FPGAs (was: The Computer of the Future) (JimBrakefield)
|  ||  | | |||`* Re: FPGAs (was: The Computer of the Future) (Michael S)
|  ||  | | ||| `* Re: FPGAs (was: The Computer of the Future) (JimBrakefield)
|  ||  | | |||  `* Re: FPGAs (was: The Computer of the Future) (Michael S)
|  ||  | | |||   +- Re: FPGAs (was: The Computer of the Future) (BGB)
|  ||  | | |||   +* Re: FPGAs (was: The Computer of the Future) (MitchAlsup)
|  ||  | | |||   +* Re: FPGAs (Terje Mathisen)
|  ||  | | |||   `* Re: FPGAs (was: The Computer of the Future) (Quadibloc)
|  ||  | | ||+- Re: FPGAs (was: The Computer of the Future) (Michael S)
|  ||  | | ||`* Re: FPGAs (was: The Computer of the Future) (MitchAlsup)
|  ||  | | |`- Re: The Computer of the Future (Terje Mathisen)
|  ||  | | `- Re: The Computer of the Future (Thomas Koenig)
|  ||  | +- Re: The Computer of the Future (Brian G. Lucas)
|  ||  | +- Re: The Computer of the Future (Ivan Godard)
|  ||  | `- Re: The Computer of the Future (MitchAlsup)
|  ||  `* Re: The Computer of the Future (Tom Gardner)
|  |`* Re: The Computer of the Future (MitchAlsup)
|  `* Re: The Computer of the Future (Ivan Godard)
+* Re: The Computer of the Future (Thomas Koenig)
`- Re: The Computer of the Future (JimBrakefield)

Re: The Computer of the Future

<d0882172-57cb-44c9-983e-be6609d72421n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=23626&group=comp.arch#23626

 by: Quadibloc - Thu, 17 Feb 2022 14:55 UTC

On Thursday, February 17, 2022 at 6:52:27 AM UTC-7, Ivan Godard wrote:

> I think the problem is really a linguistic one: our programming
> languages have caused us to think about our programs in control flow
> terms, when they are more naturally (and more efficiently) thought of
> in dataflow terms.

That does raise another issue.

It's my opinion that the ENIAC was originally a dataflow computer.

A dataflow machine makes very efficient use of its available ALUs.

That's a good thing. But it doesn't seem to be as general-purpose as
a von Neumann machine; you set it up to process multiple streams
of data in a way somewhat analogous to what punched card machines
did with their data, so it doesn't work well with problems that don't
fit into that paradigm.

I may have already mentioned 'way back that VVM could be considered
a way to sugar-coat the description of a dataflow setup. (And I tend
to prefer making it explicit what you're really telling the computer to do.)

John Savard

Re: The Computer of the Future

<61534aff-e0f8-4e73-af27-72faeaa206aan@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=23630&group=comp.arch#23630

 by: MitchAlsup - Thu, 17 Feb 2022 15:17 UTC

On Thursday, February 17, 2022 at 3:24:20 AM UTC-6, BGB wrote:
> On 2/15/2022 3:09 PM, Thomas Koenig wrote:
> > Quadibloc <jsa...@ecn.ab.ca> schrieb:
> >> I noticed that in another thread I might have seemed to have contradicted myself.
> >>
> >> So I will clarify.
> >>
> >> In the near term, in two or three years, I think that it's entirely possible that we
> >> will have dies that combine four "GBOoO" performance cores with sixteen in-
> >> order efficiency cores, and chips that have four of those dies in a package, to
> >> give good performance on both number-crunching and database workloads.
> >
> > Who actually needs number crunching?
> >
> > I certainly do, even in the company I work in (which is in
> > the chemical industry, so rather technical) the number of people
> > actually running code which depends on floating point execution
> > speed is rather small, probably in the low single digit percent
> > range of all employees.
> >
> Probably depends. IME, integer operations tend to dominate, but in some
> workloads, floating point tends to make its presence known.
> > That does not mean that floating point is not important :-) but that
> > most users would not notice if they had a CPU with, let's say, a
> > reasonably efficient software emulation of floating point numbers.
> >
> Probably depends on "reasonably efficient".
> Say, ~ 20 cycles, probably only a minority of programs will notice.
> Say, ~ 500 cycles, probably nearly everything will notice.
>
> One big factor is having integer operations which are larger than the
> floating-point values being worked with.
>
>
> As noted, having an FPU which does ADD/SUB/MUL and a few conversion ops
> and similar is for the most part "sufficient" for most practical uses.
>
> Granted, having all the rest could be "better", but is more expensive.
> > Such a CPU would look horrible in SPECfp, and the savings from removing
> > floating point from a general purpose CPU are probably not that great so
> > it is not done, and I think that as an intensive user of floating point,
> > I have to be grateful for that.
> >
> Quick look, at least in my case, the FPU costs less than the L1 D$.
<
Point of order:
<
The actual calculation units of Opteron were, indeed, smaller than the
data storage area of DCache (or ICache).
<
But the FP register file of Opteron was larger than FMUL and FADD
combined as DECODE did some renaming to smooth the flow of
calculations.
<
Also note the LD and ST buffers of Opteron were larger than the
storage (64KB 2-banked) of Opteron.
<
So it depends if you mean "L1 D$" as L1-tag, L1-TLB, L1 data,
or whether you mean "L1 D$" as L1-tag, L1-TLB, L1 data, LD buffer, ST buffer.
And we still do not have the area of the miss buffer.
>
> For a 1-wide core, it may be tempting to omit the FPU and MMU for cost
> reasons. For a bigger core, omitting them may not be worthwhile if
> anything actually uses them.
<
Not any more. When you can fit 16 GBOoOs in a chip and an LBIO is 1/12
the size, and that LBIO already has a fully pipelined FMAC, there is no
reason to leave out this kind of functionality unless your market is
"lowest possible power"; and still, this decision causes you grief wrt IMUL and IDIV.
<
When you could put ~200 LBIOs in a die, making the LBIO 30% smaller
is like taking the air conditioning and power windows out of your car.
>
>
> Or, at least within the limits of an FPU which is cost-cut enough to
> where the LSB being correctly rounded is a bit hit or miss.
> > Hmm, come to think of it, that is the first positive thing about
> > SPEC that occurred to me in quite a few years...
> ...
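The "~20 cycles vs ~500 cycles" point above can be put in rough Amdahl-style terms. This is a sketch, not from the thread: the workload mix and the 4-cycle hardware-FPU latency below are illustrative assumptions.

```python
def slowdown(fp_fraction, hw_cycles, sw_cycles):
    """Relative runtime when each FP op goes from hw_cycles to sw_cycles,
    treating all other ops as 1 cycle (a crude model)."""
    base = (1 - fp_fraction) + fp_fraction * hw_cycles
    soft = (1 - fp_fraction) + fp_fraction * sw_cycles
    return soft / base

# 2% FP ops (the "low single digit percent" case), vs an assumed 4-cycle HW FPU:
print(round(slowdown(0.02, 4, 20), 2))   # ~20-cycle soft-float: a modest hit
print(round(slowdown(0.02, 4, 500), 2))  # ~500-cycle soft-float: a large hit
```

Even at a 2% FP mix, the model suggests a 500-cycle emulation is hard to miss, while a ~20-cycle one is in "only a minority of programs will notice" territory.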

Re: The Computer of the Future

<75f37e2c-8592-4f6e-8a73-f41bd470ef2dn@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=23631&group=comp.arch#23631

 by: MitchAlsup - Thu, 17 Feb 2022 15:18 UTC

On Thursday, February 17, 2022 at 7:43:22 AM UTC-6, Ivan Godard wrote:
> On 2/16/2022 10:25 AM, MitchAlsup wrote:

> > PCIe 6.0 uses 16 GHz clock to send 4 bits per wire per cycle using
> > double data rate and PAM4 modulation; and achieves 64GTs per wire
> > each direction. So 4 pins: true-comp out, true comp in: provide 8GB/s
> > out and 8GB/s in.
> > <
> > Now, remember from yesterday our 120 GB/s per core. You will need
> > 10 of these 4 pin wires to support inbound bandwidth and 5 to support
> > the outbound bandwidth.
> > <
> > But hey, if you want to provide 512 pins, I'm sure you can find some use
> > for this kind of bandwidth. {but try dealing with the heat.}
<
> If you don't really need the bandwidth, but have the pincount in the
> socket, can't you get less heat by say driving the pins eight at a time
> at an eighth the clock? (please forgive my HW ignorance)
<
Sure, heat is essentially proportional to GTs×pins
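The link arithmetic in the exchange above can be checked mechanically. This sketch uses only the numbers quoted in the posts; the variable names are mine.

```python
# PCIe 6.0 per-wire rate as quoted: 16 GHz clock, double data rate
# (2 transfers per clock), PAM4 (2 bits per symbol) -> 64 GT/s per wire.
gts_per_wire = 16e9 * 2 * 2
print(gts_per_wire / 1e9)   # 64.0 (GT/s)

# A 4-pin group (one differential pair out, one differential pair in)
# then carries 8 GB/s in each direction, matching the post.
gb_per_direction = gts_per_wire / 8
print(gb_per_direction / 1e9)   # 8.0 (GB/s)
```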

Re: The Computer of the Future

<4362936a-de1e-44d8-a7ed-97783b308cf8n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=23633&group=comp.arch#23633

 by: MitchAlsup - Thu, 17 Feb 2022 15:22 UTC

On Thursday, February 17, 2022 at 8:19:23 AM UTC-6, Stefan Monnier wrote:
> > Cray instructions in it will take input from memory and put output to
> > memory, while putting intermediate results in vector registers... and
> > a VVM loop would do exactly the same thing, except using ordinary
> > registers.
<
Point of misunderstanding:
<
VVM uses register specifiers to determine data flow within a loop.
It does not actually use GPRs while performing the calculations!
GPRs get written only when there is an interrupt/exception, or at
loop termination based on the "outs" clause of the VEC instruction.
<
This is the relaxation that enables SIMD VVM--not overburdening
the registers themselves.
<
> There is the important difference that Cray vector registers are
> significantly bigger than My 66000 registers, so really if you look at
> the ISA itself, VVM matches the behavior of those CPUs that had
> in-memory vectors rather than vector registers.
<
This needs a lot of justification before I'll buy it.
>
> IIRC none of those vector processors used caches, so maybe VVM can be
> compared to Cray-style in-register vectors except the vector registers
> are stored in the L1 cache (tho it might not give quite as much read
> bandwidth, admittedly).
>
>
> Stefan

Re: The Computer of the Future

<sum76a$bp9$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=23640&group=comp.arch#23640

 by: BGB - Thu, 17 Feb 2022 19:20 UTC

On 2/17/2022 9:17 AM, MitchAlsup wrote:
> On Thursday, February 17, 2022 at 3:24:20 AM UTC-6, BGB wrote:
>> On 2/15/2022 3:09 PM, Thomas Koenig wrote:
>>> Quadibloc <jsa...@ecn.ab.ca> schrieb:
>>>> I noticed that in another thread I might have seemed to have contradicted myself.
>>>>
>>>> So I will clarify.
>>>>
>>>> In the near term, in two or three years, I think that it's entirely possible that we
>>>> will have dies that combine four "GBOoO" performance cores with sixteen in-
>>>> order efficiency cores, and chips that have four of those dies in a package, to
>>>> give good performance on both number-crunching and database workloads.
>>>
>>> Who actually needs number crunching?
>>>
>>> I certainly do, even in the company I work in (which is in
>>> the chemical industry, so rather technical) the number of people
>>> actually running code which depends on floating point execution
>>> speed is rather small, probably in the low single digit percent
>>> range of all employees.
>>>
>> Probably depends. IME, integer operations tend to dominate, but in some
>> workloads, floating point tends to make its presence known.
>>> That does not mean that floating point is not important :-) but that
>>> most users would not notice if they had a CPU with, let's say, a
>>> reasonably efficient software emulation of floating point numbers.
>>>
>> Probably depends on "reasonably efficient".
>> Say, ~ 20 cycles, probably only a minority of programs will notice.
>> Say, ~ 500 cycles, probably nearly everything will notice.
>>
>> One big factor is having integer operations which are larger than the
>> floating-point values being worked with.
>>
>>
>> As noted, having an FPU which does ADD/SUB/MUL and a few conversion ops
>> and similar is for the most part "sufficient" for most practical uses.
>>
>> Granted, having all the rest could be "better", but is more expensive.
>>> Such a CPU would look horrible in SPECfp, and the savings from removing
>>> floating point from a general purpose CPU are probably not that great so
>>> it is not done, and I think that as an intensive user of floating point,
>>> I have to be grateful for that.
>>>
>> Quick look, at least in my case, the FPU costs less than the L1 D$.
> <
> Point of order:
> <
> The actual calculation units of Opteron were, indeed, smaller than the
> data storage area of DCache (or ICache).
> <
> But the FP register file of Opteron was larger than FMUL and FADD
> combined as DECODE did some renaming to smooth the flow of
> calculations.
> <
> Also note the LD and ST buffers of Opteron were larger than the
> storage (64KB 2-banked) of Opteron.
> <
> So it depends if you mean "L1 D$" as L1-tag, L1-TLB, L1 data,
> or whether you mean "L1 D$" as L1-tag, L1-TLB, L1 data, LD buffer, ST buffer.
> And we still do not have the area of the miss buffer.

Yeah, I don't have FPU registers.

I dropped them initially (along with the original FMOV instructions)
because they added a fair bit of cost.

I have more recently re-added an "FMOV.S" instruction, which in this
form is implemented as a normal Load/Store with Single<->Double
converters glued on. A case could be made for only doing it load-side,
as this is both cheaper and more common.

Otherwise, loading/storing Single values requires using explicit
conversion ops (with the FADD/FMUL units only operating natively on
Double). Well, it is either that, or add (proper) single-precision ops.

I was counting the D$ and TLB separately, because in my case they are
represented as different modules, and the Vivado netlist view also keeps
them separate.

Some modules just disappear though, like the instruction decoder tends
to get absorbed into other things.

Between FADD and FMUL, FADD is larger.
Just the TLB (by itself) is larger than the FADD (in terms of LUTs).

Or, a few costs (from the netlist, aggressively rounded):
FPU : ~ 5k (FADD/FMUL: ~ 2k each, ~ 1k other, *1).
TLB : ~ 3k (4-way, 256 x 4 x 128b)
L1 D$ : ~ 7k
L1 I$ : ~ 2k
ALU : ~ 4k (~ 1.3k per lane)
GPR RF: ~ 2k (64x64b, 6R3W)
...
L2 : ~ 10k (256K, 2-way, 64B cache lines, *2)
DDR : ~ 7k (64B cache lines)
VRAM : ~ 7k (RAM backed, structurally similar to an L1 cache)

*1: This is for Double precision with SIMD.
SIMD seems to have minimal effect on cost.
Enabling GFPX (widens FADD and FMUL to 96 bit, S.E15.M80) does add a
more significant cost increase.

The SIMD operations are internally implemented using pipelining:
Extract element from vector, convert to internal format;
Feed through FADD or FMUL;
Convert back to vector format, store into output vector;
Final result is the output vector.
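The pipelined-SIMD steps above can be sketched as follows. This is hypothetical Python, not BJX2's actual internals: the format converters are trivial stand-ins for the vector<->internal conversion stages.

```python
# Sketch of SIMD-via-pipelining: one scalar FADD/FMUL unit is walked
# across the vector elements instead of being replicated per lane.

def to_internal(x):
    return float(x)   # stand-in for "convert to internal format"

def from_internal(x):
    return x          # stand-in for "convert back to vector format"

def simd_op(vec_a, vec_b, scalar_unit):
    out = []
    for a, b in zip(vec_a, vec_b):                        # extract elements in turn
        r = scalar_unit(to_internal(a), to_internal(b))   # feed through FADD or FMUL
        out.append(from_internal(r))                      # store into output vector
    return out                                            # final result: the output vector

print(simd_op([1.0, 2.0], [3.0, 4.0], lambda a, b: a + b))  # [4.0, 6.0]
```

The design choice this models is the one stated above: the SIMD width costs almost nothing extra because the scalar unit is time-multiplexed rather than duplicated.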

*2: The size of the L2 cache and DDR PHY vary significantly based on the
use of 16B or 64B cache lines (along the L2<->DRAM interface). The
bigger lines are better for performance as they allow for (more
efficient) burst transfers.

The ringbus continues to operate with 16B logical transfers in either
case (so the L2 is presented to the ringbus in terms of 16B lines).

The RAM in this case is a DDR2 module with a 16-bit bus interface being
run at 50MHz (with 'DLL disabled'). Some other boards use RAM with LPDDR
or QSPI interfaces, but my Verilog code doesn't currently support these (*).

This build was with UTLB disabled, where UTLB adds a smaller 1-way TLB
to the L1 D$. This can (on a hit) allow requests to skip over the main
TLB and go (more directly) onto the L2 ring.

This was not done for the I$, as the I$ miss rate tends to be low enough
(relative to the D$) on average that it isn't likely to gain much.

*: One of the boards I bought, I thought was going to have QSPI RAM, but
the model I got turned out to not have any RAM (the RAM came with the
XC7A35T variant but not the XC7S25 variant).

>>
>> For a 1-wide core, it may be tempting to omit the FPU and MMU for cost
>> reasons. For a bigger core, omitting them may not be worthwhile if
>> anything actually uses them.
> <
> Not any more. When you can fit 16 GBOoOs in a chip and a LBIO is 1/12
> and that LBIO already has a FMAC fully pipelined, there is no reason to
> leave out this kind of functionality unless your market is "lowest
> possible power" and still, this decision causes you grief wrt IMUL and IDIV.
> <
> When you could put ~200 LBIOs in a die, making the LBIO 30% smaller
> is like taking the air conditioning and power windows out of your car.

Yeah.

I have little idea how many BJX2 cores could fit into an ASIC
implementation. In any case, probably a whole lot more than on an Artix-7.

It can make sense mostly for the lower-end Spartan chips (eg: XC7S25),
and is probably necessary for ICE40 based designs.

>>
>>
>> Or, at least within the limits of an FPU which is cost-cut enough to
>> where the LSB being correctly rounded is a bit hit or miss.
>>> Hmm, come to think of it, that is the first positive thing about
>>> SPEC that occurred to me in quite a few years...
>> ...

Re: The Computer of the Future

<sumbof$66t$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=23644&group=comp.arch#23644

 by: Stephen Fuld - Thu, 17 Feb 2022 20:38 UTC

On 2/17/2022 11:20 AM, BGB wrote:

big snip

> Or, a few costs (from the netlist, aggressively rounded):
>   FPU   : ~ 5k (FADD/FMUL: ~ 2k each, ~ 1k other, *1).
>   TLB   : ~ 3k (4-way, 256 x 4 x 128b)
>   L1 D$ : ~ 7k
>   L1 I$ : ~ 2k

Why does the L1 D$ take 3.5X as much as the L1 I$?

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: The Computer of the Future

<sumf70$btn$3@gioia.aioe.org>


https://www.novabbs.com/devel/article-flat.php?id=23649&group=comp.arch#23649

 by: Terje Mathisen - Thu, 17 Feb 2022 21:37 UTC

Stephen Fuld wrote:
> On 2/17/2022 11:20 AM, BGB wrote:
>
> big snip
>
>> Or, a few costs (from the netlist, aggressively rounded):
>>    FPU   : ~ 5k (FADD/FMUL: ~ 2k each, ~ 1k other, *1).
>>    TLB   : ~ 3k (4-way, 256 x 4 x 128b)
>>    L1 D$ : ~ 7k
>>    L1 I$ : ~ 2k
>
> Why does the L1 D$ take 3.5X as much as the L1 I$?

I$ is effectively read-only, vs D$ which has to handle writes all the time?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: The Computer of the Future

<sumgsf$mhl$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=23650&group=comp.arch#23650
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: The Computer of the Future
Date: Thu, 17 Feb 2022 16:06:06 -0600
Organization: A noiseless patient Spider
Lines: 73
Message-ID: <sumgsf$mhl$1@dont-email.me>
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com>
<suh4pk$3pd$4@newsreader4.netcologne.de> <sul480$jt6$1@dont-email.me>
<61534aff-e0f8-4e73-af27-72faeaa206aan@googlegroups.com>
<sum76a$bp9$1@dont-email.me> <sumbof$66t$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 17 Feb 2022 22:06:07 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="da247ea08d81cb1af9e8b533c0216bae";
logging-data="23093"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+J0+XwWQ1UQVPN9ZJBnHsz"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.6.0
Cancel-Lock: sha1:WMeoRYAJBYDrb2mew+tTTnSBILU=
In-Reply-To: <sumbof$66t$1@dont-email.me>
Content-Language: en-US
 by: BGB - Thu, 17 Feb 2022 22:06 UTC

On 2/17/2022 2:38 PM, Stephen Fuld wrote:
> On 2/17/2022 11:20 AM, BGB wrote:
>
> big snip
>
>> Or, a few costs (from the netlist, aggressively rounded):
>>    FPU   : ~ 5k (FADD/FMUL: ~ 2k each, ~ 1k other, *1).
>>    TLB   : ~ 3k (4-way, 256 x 4 x 128b)
>>    L1 D$ : ~ 7k
>>    L1 I$ : ~ 2k
>
> Why does the L1 D$ take 3.5X as much as the L1 I$?
>

Probably because it is:
Byte Addressable vs Word Addressable;
Supports a wider fetch (128-bit vs 96-bit);
Supports both Load and Store operations (vs just Fetch);
Also deals with MMIO operations (different behavior on the bus);
...

There is a lot of extra logic needed to deal with memory stores, dirty
cache lines, storing stuff back to RAM, ..., all of which is effectively
irrelevant to the I$.

The I$ has logic to deal with things like instruction length and
instruction bundles, but this seems to be comparably much smaller than
the logic needed by the D$ to deal with things like byte-aligned Load
and Store.

....

Not shown in this stat is Block-RAM cost, where:
L1 D$ is 32K, organized as 1024 x 2 x 16B.
Each 16B line has: 128-bits of data and 108* bits of metadata (tag).
L1 I$ is 16K, organized as 512 x 2 x 16B.
Each 16B line has: 128-bits of data and 72 bits of metadata (tag).

Cache lines are organized as Even/Odd pairs (or A/B), mostly to deal
with Load/Store crossing a 16B boundary. An "Aligned Only" cache could
skip this step (I$ would still need it though to be able to support
variable-length / variable-alignment bundles).

*: The 108 bits applies to a newer/experimental version of the L1 D$
which adds epoch flushing and some other tweaks (the prior version used
88 bits). However, this version still has some bugs and I haven't
released the code yet.

The D$ would actually be a lot more expensive if I did not impose an 8B
alignment restriction for 128-bit Load/Store; without this restriction,
the Extract/Insert logic would effectively double in size. The 128b
Load/Store works by bypassing the logic normally used for Extract/Insert
of values 64 bits or less.

The Extract/Insert logic fetches 16B from an 8B aligned position, then
selects a byte-aligned 8B from this, and then zero- or sign-extends it
to the requested length (for Load).

For Store, it combines this value with the value being stored (based on
the store width), inserts it (byte aligned) back into the 16B block, and
then builds a set of Store cache-lines which re-insert this block.

For 128-bit store, the stored value replaces the 16B block, which is
then inserted into the store cache lines.

If the request is Store type (and not MMIO), then the newly-updated
cache-lines may be stored back to the cache-line arrays (with the Dirty
Flag being Set). On a Cache Miss, the dirty line will first be sent out
on the bus before requesting the replacement cache line.
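The Extract step described above can be sketched in software (purely illustrative C, not the actual Verilog; the function name, interface, and little-endian host assumption are all mine):

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sketch of the Load-side Extract path: take the 16B
 * window (fetched from an 8B-aligned position), select a byte-aligned
 * 8B value from it (byte_ofs in 0..8), then zero- or sign-extend to
 * the requested length in bytes. Assumes a little-endian host. */
static uint64_t extract_load(const uint8_t win[16], unsigned byte_ofs,
                             unsigned width, int sign_ext)
{
    uint64_t v = 0;
    memcpy(&v, win + byte_ofs, 8);        /* byte-aligned 8B select */
    if (width < 8) {
        unsigned bits = width * 8;
        v &= ((uint64_t)1 << bits) - 1;   /* truncate to requested width */
        if (sign_ext && (v >> (bits - 1)) != 0)
            v |= ~(uint64_t)0 << bits;    /* sign-extend */
    }
    return v;
}
```

The Store-side Insert is the mirror image: merge the store value into the selected 8B, write it back into the 16B block, and rebuild the cache-line pair.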

....

Re: The Computer of the Future

<sumlbi$gqr$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=23655&group=comp.arch#23655
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: The Computer of the Future
Date: Thu, 17 Feb 2022 17:22:24 -0600
Organization: A noiseless patient Spider
Lines: 135
Message-ID: <sumlbi$gqr$1@dont-email.me>
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com>
<suh4pk$3pd$4@newsreader4.netcologne.de> <sul480$jt6$1@dont-email.me>
<61534aff-e0f8-4e73-af27-72faeaa206aan@googlegroups.com>
<sum76a$bp9$1@dont-email.me> <sumbof$66t$1@dont-email.me>
<sumf70$btn$3@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 17 Feb 2022 23:22:26 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="54262f9b599f1e919b67f458cfb7a066";
logging-data="17243"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18izTM9j2IlAh7M54Vl9iUX"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.6.0
Cancel-Lock: sha1:ajWdLHvkv/28txyiGEjF3pfDNfE=
In-Reply-To: <sumf70$btn$3@gioia.aioe.org>
Content-Language: en-US
 by: BGB - Thu, 17 Feb 2022 23:22 UTC

On 2/17/2022 3:37 PM, Terje Mathisen wrote:
> Stephen Fuld wrote:
>> On 2/17/2022 11:20 AM, BGB wrote:
>>
>> big snip
>>
>>> Or, a few costs (from the netlist, aggressively rounded):
>>>    FPU   : ~ 5k (FADD/FMUL: ~ 2k each, ~ 1k other, *1).
>>>    TLB   : ~ 3k (4-way, 256 x 4 x 128b)
>>>    L1 D$ : ~ 7k
>>>    L1 I$ : ~ 2k
>>
>> Why does the L1 D$ take 3.5X as much as the L1 I$?
>
> I$ is effectively read-only, vs D$ which has to handle writes all the time?
>

This is probably most of it.

There is a whole lot of logic which just goes poof and disappears when
the cache is read-only.

Also, in general, the logic in the I$ is simpler than in the D$.

Main unique thing that the I$ needs to do is look at instruction bits
and figure out how long the bundle is.

So, say (15:12):
0111: w32b=1, Wx=(11) //XGPR
1001: w32b=1, Wx=(11) //XGPR
1110: w32b=1, Wx=((11:9)==101) //PrWEX
1111: w32b=1, Wx=(10)
Else: w32b=0, Wx=0

Then look at this across several instruction words,
w32b=0,Wx=0: 16-bit
w32b=1,Wx=0: 32-bit
(w32b=1,Wx=1), (w32b=0,Wx=0): 48-bit (unused)
(w32b=1,Wx=1), (w32b=1,Wx=0): 64-bit
(w32b=1,Wx=1), (w32b=1,Wx=1), (w32b=1,Wx=0): 96-bit

The actual logic for this part, while not particularly concise, doesn't
have a particularly high LUT cost.
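The tables above can be rendered as a small C sketch (hedged: this is my reading of the rules, not BJX2's actual Verilog; I assume the classification bits live in the first 16-bit word of each 32-bit instruction):

```c
#include <stdint.h>

/* Classify one 16-bit instruction word per the (15:12) table above. */
static void classify(uint16_t w, int *w32b, int *wx)
{
    switch ((w >> 12) & 0xF) {
    case 0x7:                                  /* 0111, XGPR */
    case 0x9:                                  /* 1001, XGPR */
        *w32b = 1; *wx = (w >> 11) & 1; break; /* Wx = bit 11 */
    case 0xE:                                  /* 1110, PrWEX */
        *w32b = 1; *wx = (((w >> 9) & 7) == 5); break; /* (11:9)==101 */
    case 0xF:                                  /* 1111 */
        *w32b = 1; *wx = (w >> 10) & 1; break; /* Wx = bit 10 */
    default:
        *w32b = 0; *wx = 0; break;
    }
}

/* Walk instruction words, applying the bundle-length table above.
 * Returns the bundle length in bits (16/32/48/64/96). */
static int bundle_length_bits(const uint16_t *words)
{
    int len = 0;
    for (;;) {
        int w32b, wx;
        classify(words[0], &w32b, &wx);
        if (!w32b) return len + 16;   /* 16-bit op terminates the bundle */
        len += 32;
        words += 2;                   /* step over the 32-bit instruction */
        if (!wx) return len;          /* Wx=0: last op in bundle */
    }
}
```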

The I$ doesn't care all that much what the instructions "actually do"
(this part is for the Decode and Execute stages to figure out).
Similarly, the decode stage has to sort out all the stuff with Jumbo
Prefixes (the I$ mostly ignores these, treating them like a special case
of the normal bundle encoding).

....

Some of my other (imagined) ISA designs would have gone over to,
effectively, (15:13):
0zz: 16b
10z: 16b
110: 32b (Wx=0)
111: 32b (Wx=1)

For something like WEX3W, this would allow determining bundle length by
looking at 6-bits:
{ (15:13), (47:45) }
{ (31:29), (63:61) }
...

Though, I have gone back and forth between using the high bits of each
16-bit word (as what BJX2 currently does), or the low order bits of each
word (like RISC-V).

The RISC-V ordering makes more sense if one assumes a (consistent)
little-endian ordering, but the high-bits scheme makes more sense for
laying out the encodings in hexadecimal (even if the resultant encoding
is mixed-endian).

The bit layout doesn't make any real difference to an FPGA though.

A partial limiting factor (that has partly kept BJX2 alive) is the
issue that I can't really fit 6-bit register numbers, predication, and
WEX into a 32-bit instruction word while still having a good amount of
encoding space.

Say:
zzzz-zzzz tttt-ttss ssss-nnnn nnqq-pw11

pw:
00: 32-bit, Unconditional
01: 32-bit, WEX, Unconditional
10: 32-bit, Predicated
11: 32-bit, WEX, Predicated
qq (p=0):
00: Block 0
01: Block 1
10: Block 2
11: Block 3
qq (p=1):
00: Block 0 ?T
01: Block 0 ?F
10: Block 0 ?ST
11: Block 0 ?SF
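For illustration, a decoder for this hypothetical layout (the field positions come from the diagram above; the struct and function names are my own, and the whole encoding is, as stated, imagined rather than implemented):

```c
#include <stdint.h>

/* Hypothetical decode of: zzzz-zzzz tttt-ttss ssss-nnnn nnqq-pw11
 * (bit 31 down to bit 0; bits 1:0 are the fixed '11' marker). */
typedef struct {
    unsigned opc, rt, rs, rn, qq;
    int pred, wex;
} insn32;

static int decode32(uint32_t w, insn32 *out)
{
    if ((w & 3) != 3) return 0;       /* not a 32-bit op in this scheme */
    out->wex  = (w >> 2) & 1;         /* w bit: WEX */
    out->pred = (w >> 3) & 1;         /* p bit: predicated */
    out->qq   = (w >> 4) & 3;         /* block / predicate select */
    out->rn   = (w >> 6) & 0x3F;      /* nnnn-nn */
    out->rs   = (w >> 12) & 0x3F;     /* ssss-ss */
    out->rt   = (w >> 18) & 0x3F;     /* tttt-tt */
    out->opc  = (w >> 24) & 0xFF;     /* zzzz-zzzz */
    return 1;
}
```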

The opcode space would work out considerably smaller than what I
currently have (to compensate, would need to move over to using 40 or 48
bit instructions, which is also undesirable).

Another option being to gain one or two bits by dropping the possibility
of 16-bit instructions.
zzzz-zzzz tttt-ttss ssss-nnnn nnzz-qqpw
This could be at least roughly break-even in terms of opcode space.

The tradeoff being that, with the current ISA, one has to mix and match
a little:
Predicates + WEX, R0..R31 only.
WEX + R0..R63, no predicates.
Predicates + R0..R63, no WEX

But, did hack it more recently:
Predicates + WEX + R0..R63; 2-wide bundle (2 ops in 96 bits).
(The encoding here is a bit "questionable", but oh well).

Kinda ugly, basically works though...

A lot of this hair magically disappears if one assumes a subset which
only allows R0..R31 (all these extra special-case encodings become
"invalid" and the implementation can ignore their existence).

....

Re: The Computer of the Future

<868rtnfb1y.fsf@linuxsc.com>

https://www.novabbs.com/devel/article-flat.php?id=23987&group=comp.arch#23987
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: tr.17...@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.arch
Subject: Re: The Computer of the Future
Date: Sun, 06 Mar 2022 05:46:49 -0800
Organization: A noiseless patient Spider
Lines: 9
Message-ID: <868rtnfb1y.fsf@linuxsc.com>
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com> <858e7a72-fcc0-4bab-b087-28b9995c7094n@googlegroups.com> <2e266e3e-b633-4c2a-bd33-962cb675bb77n@googlegroups.com> <fb409a7e-e1a2-4eaf-8fbb-d697ac3f0febn@googlegroups.com> <1a8a324d-34b8-4c1e-876e-1a0cde795e3fn@googlegroups.com> <005ee5af-519a-4d45-93bd-87f4ab580c61n@googlegroups.com> <66d1cc8e-c8b9-4ecb-be59-fee1ab1da715n@googlegroups.com> <suljuo$it1$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: reader02.eternal-september.org; posting-host="13246d5653fd8faf4d64a45856686609";
logging-data="21730"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/NgCKPlqcNJxP+/iFjrn60Bmve1GZj5bQ="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:ZvuivONz1NTzBc5P8FCEWjZwYS0=
sha1:TbcO6MwKIG6tAfim+KLC7s5xyn4=
 by: Tim Rentsch - Sun, 6 Mar 2022 13:46 UTC

Ivan Godard <ivan@millcomputing.com> writes:

[...]

> I think the problem is really a linguistic one: our programming
> languages have caused us to think about our programs in control
> flow terms, [...]

Some of us. Not everyone, fortunately.

Re: The Computer of the Future

<t02s9j$360$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=23994&group=comp.arch#23994
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: The Computer of the Future
Date: Sun, 6 Mar 2022 17:50:43 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <t02s9j$360$1@gal.iecc.com>
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com> <005ee5af-519a-4d45-93bd-87f4ab580c61n@googlegroups.com> <66d1cc8e-c8b9-4ecb-be59-fee1ab1da715n@googlegroups.com> <suljuo$it1$1@dont-email.me>
Injection-Date: Sun, 6 Mar 2022 17:50:43 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="3264"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com> <005ee5af-519a-4d45-93bd-87f4ab580c61n@googlegroups.com> <66d1cc8e-c8b9-4ecb-be59-fee1ab1da715n@googlegroups.com> <suljuo$it1$1@dont-email.me>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Sun, 6 Mar 2022 17:50 UTC

According to Ivan Godard <ivan@millcomputing.com>:
>I think the problem is really a linguistic one: our programming
>languages have caused us to think about our programs in control flow
>terms, when they are more naturally (and more efficiently) thought of
>in dataflow terms.

I blame von Neumann. Look at the First Draft of a Report on the EDVAC
and it's brutally serial, one step after another, one word at a time
loaded or stored from the iconoscope.

Compare it to the lovely ENIAC where you could plug anything into
anything else, cables permitting, and the data flowed as soon as it
was ready. Yeah, it was a little harder to program and only five
people in the world could do it, but you can't have everything.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: The Computer of the Future

<2022Mar6.193807@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=23995&group=comp.arch#23995
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: The Computer of the Future
Date: Sun, 06 Mar 2022 18:38:07 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 25
Message-ID: <2022Mar6.193807@mips.complang.tuwien.ac.at>
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com> <005ee5af-519a-4d45-93bd-87f4ab580c61n@googlegroups.com> <66d1cc8e-c8b9-4ecb-be59-fee1ab1da715n@googlegroups.com> <suljuo$it1$1@dont-email.me> <t02s9j$360$1@gal.iecc.com>
Injection-Info: reader02.eternal-september.org; posting-host="835406b4f6a1a7ebab1fd1095578100a";
logging-data="4057"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/JP5ZOR7Di7eIR00mEmWvk"
Cancel-Lock: sha1:7itWwq/5F9SkAsd3U3ti8g8yIjI=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sun, 6 Mar 2022 18:38 UTC

John Levine <johnl@taugh.com> writes:
>According to Ivan Godard <ivan@millcomputing.com>:
>>I think the problem is really a linguistic one: our programming
>>languages have caused us to think about our programs in control flow
>>terms, when they are more naturally (and more efficiently) thought of
>>in dataflow terms.
>
>I blame von Neumann. Look at the First Draft of a Report on the EDVAC
>and it's brutally serial, one step after another, one word at a time
>loaded or stored from the iconoscope.
>
>Compare it to the lovely ENIAC where you could plug anything into
>anything else, cables permitting, and the data flowed as soon as it
>was ready. Yeah, it was a little harder to program and only five
>people in the world could do it, but you can't have everything.

You can, and you could a quarter century ago: You specify the code in
a sequential way (so more than five people in the world can program
it), and the magic of OoO execution means that data flows as soon as
it is ready.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: The Computer of the Future

<t04pa8$g2u$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=24008&group=comp.arch#24008
Path: i2pn2.org!i2pn.org!aioe.org!UXtAIYUgaw/fkqnS/V28xg.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: The Computer of the Future
Date: Mon, 7 Mar 2022 12:12:08 +0100
Organization: Aioe.org NNTP Server
Message-ID: <t04pa8$g2u$1@gioia.aioe.org>
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com>
<858e7a72-fcc0-4bab-b087-28b9995c7094n@googlegroups.com>
<2e266e3e-b633-4c2a-bd33-962cb675bb77n@googlegroups.com>
<fb409a7e-e1a2-4eaf-8fbb-d697ac3f0febn@googlegroups.com>
<1a8a324d-34b8-4c1e-876e-1a0cde795e3fn@googlegroups.com>
<005ee5af-519a-4d45-93bd-87f4ab580c61n@googlegroups.com>
<66d1cc8e-c8b9-4ecb-be59-fee1ab1da715n@googlegroups.com>
<suljuo$it1$1@dont-email.me> <868rtnfb1y.fsf@linuxsc.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="16478"; posting-host="UXtAIYUgaw/fkqnS/V28xg.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.10.2
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Mon, 7 Mar 2022 11:12 UTC

Tim Rentsch wrote:
> Ivan Godard <ivan@millcomputing.com> writes:
>
> [...]
>
>> I think the problem is really a linguistic one: our programming
>> languages have caused us to think about our programs in control
>> flow terms, [...]
>
> Some of us. Not everyone, fortunately.
>
I tend to think much more in data flow terms, as I've written before:

An optimal program is like a creek finding its path down from the
mountain, always searching for the path of least resistance.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: The Computer of the Future

<t04r53$m0e$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24009&group=comp.arch#24009
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: spamj...@blueyonder.co.uk (Tom Gardner)
Newsgroups: comp.arch
Subject: Re: The Computer of the Future
Date: Mon, 7 Mar 2022 11:43:31 +0000
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <t04r53$m0e$2@dont-email.me>
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com>
<858e7a72-fcc0-4bab-b087-28b9995c7094n@googlegroups.com>
<2e266e3e-b633-4c2a-bd33-962cb675bb77n@googlegroups.com>
<fb409a7e-e1a2-4eaf-8fbb-d697ac3f0febn@googlegroups.com>
<1a8a324d-34b8-4c1e-876e-1a0cde795e3fn@googlegroups.com>
<005ee5af-519a-4d45-93bd-87f4ab580c61n@googlegroups.com>
<66d1cc8e-c8b9-4ecb-be59-fee1ab1da715n@googlegroups.com>
<suljuo$it1$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 7 Mar 2022 11:43:31 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="f373fa96d5a97bb4bfcffcff4d491eda";
logging-data="22542"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18dO6kzG2/KC3FbdX9y7yJN"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
Firefox/52.0 SeaMonkey/2.49.4
Cancel-Lock: sha1:6blihWVNWitawRf3GQ72j3TPUys=
In-Reply-To: <suljuo$it1$1@dont-email.me>
 by: Tom Gardner - Mon, 7 Mar 2022 11:43 UTC

On 17/02/22 13:52, Ivan Godard wrote:
> I think the problem is really a linguistic one: our programming languages have
> caused us to think about our programs in control flow terms, when they are  more
> naturally (and more efficiently) thought of in dataflow terms.

Not all languages, of course.

The commercially and technically important class of hardware
description languages (e.g. Verilog, VHDL) has a very large
component of data flowing (streaming) through "wires" connecting
"processes/FSMs", and event-based computation. To anyone used
to "thinking in hardware", that, and its associated parallelism,
is easy and natural.

Softies find a marked "impedance mismatch" when they encounter
HDLs, and try to "write Fortran" in the HDL. Hence there are
now moves to have more procedural concepts and constructs,
e.g. System C. The impedance mismatch is lower, but it is still
easy to create something that is poorly "synthesisable", i.e.
can't be translated into hardware/dataflow.

Re: The Computer of the Future

<t04uv9$js1$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24010&group=comp.arch#24010
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: The Computer of the Future
Date: Mon, 7 Mar 2022 13:48:41 +0100
Organization: A noiseless patient Spider
Lines: 43
Message-ID: <t04uv9$js1$1@dont-email.me>
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com>
<suh4pk$3pd$4@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 7 Mar 2022 12:48:41 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="12b7af70c64a6bd31d12e34602937def";
logging-data="20353"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19wN28SP5D97iKe/hyAoRXycSoLues8Ia4="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Cancel-Lock: sha1:f3rr5BmBWDiBlRvfOt8LEhCfCoU=
In-Reply-To: <suh4pk$3pd$4@newsreader4.netcologne.de>
Content-Language: en-US
 by: Marcus - Mon, 7 Mar 2022 12:48 UTC

On 2022-02-15, Thomas Koenig wrote:
> Quadibloc <jsavard@ecn.ab.ca> schrieb:
>> I noticed that in another thread I might have seemed to have contradicted myself.
>>
>> So I will clarify.
>>
>> In the near term, in two or three years, I think that it's entirely possible that we
>> will have dies that combine four "GBOoO" performance cores with sixteen in-
>> order efficiency cores, and chips that have four of those dies in a package, to
>> give good performance on both number-crunching and database workloads.
>
> Who actually needs number crunching?
>
> I certainly do, but even in the company I work in (which is in
> the chemical industry, so rather technical), the number of people
> actually running code which depends on floating point execution
> speed is rather small, probably in the low single digit percent
> range of all employees.
>
> That does not mean that floating point is not important :-) but that
> most users would not notice if they had a CPU with, let's say, a
> reasonably efficient software emulation of floating point numbers.

The problem is that AFAIK there exists no reasonably efficient software
emulation of IEEE-754 floating-point (where "reasonable" would be
something like 10x slower than using an FPU).

I recently spent some time optimizing function argument sanity checks in
an API layer of an embedded software. For instance some arguments were
orthonormal matrices, and the API made a crude check for that. Normally
this is not a problem, but in this case the API layer ran on an ARM M4
without an FPU, and so a single argument check could take more than
0.1 ms, which is a considerable time period in a real time system where
all work must finish in less than 16 ms.

It would be interesting to see an FP standard that is optimized for
software implementation, and/or CPU:s with "software FP aid"
instructions (fast handling of NaN:s and special cases,
(de)normalization instructions - like CLZ on steroids, etc), so that
common FP operations can be implemented with 5-20 instructions, for
instance.
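Part of what Marcus asks for already maps onto a plain CLZ. A hedged sketch of just the normalization step of a software FADD (the unpacked format, field names, and use of the GCC/Clang `__builtin_clzll` intrinsic are my assumptions, not a concrete proposal):

```c
#include <stdint.h>

/* Unpacked float: significand carried with its leading 1 at bit 62.
 * normalize() renormalizes after an add/subtract using CLZ; this is
 * the spot where a "CLZ on steroids" aid would shave instructions
 * (it could fold the count, shift, and exponent adjust into one op). */
typedef struct { int sign; int exp; uint64_t sig; } sfloat;

static sfloat normalize(sfloat x)
{
    if (x.sig == 0) { x.exp = 0; return x; }    /* exact zero */
    int shift = __builtin_clzll(x.sig) - 1;     /* move leading 1 to bit 62 */
    if (shift > 0) { x.sig <<= shift;  x.exp -= shift; }
    if (shift < 0) { x.sig >>= -shift; x.exp += -shift; }
    return x;
}
```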

/Marcus

Re: The Computer of the Future

<6cd04c7b-bc39-4843-bb74-ae7babe6ccaan@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24012&group=comp.arch#24012
X-Received: by 2002:a05:6214:500f:b0:435:796b:7c62 with SMTP id jo15-20020a056214500f00b00435796b7c62mr5913829qvb.12.1646661132263;
Mon, 07 Mar 2022 05:52:12 -0800 (PST)
X-Received: by 2002:a05:6808:23c1:b0:2d7:390e:5c2a with SMTP id
bq1-20020a05680823c100b002d7390e5c2amr18308582oib.108.1646661131942; Mon, 07
Mar 2022 05:52:11 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 7 Mar 2022 05:52:11 -0800 (PST)
In-Reply-To: <t04r53$m0e$2@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=136.50.253.102; posting-account=AoizIQoAAADa7kQDpB0DAj2jwddxXUgl
NNTP-Posting-Host: 136.50.253.102
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com>
<858e7a72-fcc0-4bab-b087-28b9995c7094n@googlegroups.com> <2e266e3e-b633-4c2a-bd33-962cb675bb77n@googlegroups.com>
<fb409a7e-e1a2-4eaf-8fbb-d697ac3f0febn@googlegroups.com> <1a8a324d-34b8-4c1e-876e-1a0cde795e3fn@googlegroups.com>
<005ee5af-519a-4d45-93bd-87f4ab580c61n@googlegroups.com> <66d1cc8e-c8b9-4ecb-be59-fee1ab1da715n@googlegroups.com>
<suljuo$it1$1@dont-email.me> <t04r53$m0e$2@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6cd04c7b-bc39-4843-bb74-ae7babe6ccaan@googlegroups.com>
Subject: Re: The Computer of the Future
From: jim.brak...@ieee.org (JimBrakefield)
Injection-Date: Mon, 07 Mar 2022 13:52:12 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 21
 by: JimBrakefield - Mon, 7 Mar 2022 13:52 UTC

On Monday, March 7, 2022 at 5:43:35 AM UTC-6, Tom Gardner wrote:
> On 17/02/22 13:52, Ivan Godard wrote:
> > I think the problem is really a linguistic one: our programming languages have
> > caused us to think about our programs in control flow terms, when they are more
> > naturally (and more efficiently) thought of in dataflow terms.
> Not all languages, of course.
>
> The commercially and technically important class of hardware
> description languages (e.g. Verilog, VHDL) have a very large
> component of data flowing (streaming) through "wires" connecting
> "processes/FSMs" and event based computation. To anyone used
> to "thinking in hardware" that, and its associated parallelism
> is easy and natural.
>
> Softies find a marked "impedance mismatch" when they encounter
> HDLs, and try to "write Fortran" in the HDL. Hence there are
> now moves to have more procedural concepts and constructs,
> e.g. System C. The impedance mismatch is lower, but it is still
> easy to create something that is poorly "synthesisable", i.e.
> can't be translated into hardware/dataflow.

I call HDL writers "time lords" as they can operate across both time and space.

Re: The Computer of the Future

<t056sj$hom$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24015&group=comp.arch#24015
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: The Computer of the Future
Date: Mon, 7 Mar 2022 07:03:47 -0800
Organization: A noiseless patient Spider
Lines: 25
Message-ID: <t056sj$hom$2@dont-email.me>
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com>
<858e7a72-fcc0-4bab-b087-28b9995c7094n@googlegroups.com>
<2e266e3e-b633-4c2a-bd33-962cb675bb77n@googlegroups.com>
<fb409a7e-e1a2-4eaf-8fbb-d697ac3f0febn@googlegroups.com>
<1a8a324d-34b8-4c1e-876e-1a0cde795e3fn@googlegroups.com>
<005ee5af-519a-4d45-93bd-87f4ab580c61n@googlegroups.com>
<66d1cc8e-c8b9-4ecb-be59-fee1ab1da715n@googlegroups.com>
<suljuo$it1$1@dont-email.me> <t04r53$m0e$2@dont-email.me>
<6cd04c7b-bc39-4843-bb74-ae7babe6ccaan@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 7 Mar 2022 15:03:48 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="993e743fddc9c24b4517a045a96596b6";
logging-data="18198"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/J2K/NI2nOESSjzkY9YBdZ"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.6.1
Cancel-Lock: sha1:xp5b6Y/XFHv6cp7wvR7/fEnm+80=
In-Reply-To: <6cd04c7b-bc39-4843-bb74-ae7babe6ccaan@googlegroups.com>
Content-Language: en-US
 by: Ivan Godard - Mon, 7 Mar 2022 15:03 UTC

On 3/7/2022 5:52 AM, JimBrakefield wrote:
> On Monday, March 7, 2022 at 5:43:35 AM UTC-6, Tom Gardner wrote:
>> On 17/02/22 13:52, Ivan Godard wrote:
>>> I think the problem is really a linguistic one: our programming languages have
>>> caused us to think about our programs in control flow terms, when they are more
>>> naturally (and more efficiently) thought of in dataflow terms.
>> Not all languages, of course.
>>
>> The commercially and technically important class of hardware
>> description languages (e.g. Verilog, VHDL) have a very large
>> component of data flowing (streaming) through "wires" connecting
>> "processes/FSMs" and event based computation. To anyone used
>> to "thinking in hardware" that, and its associated parallelism
>> is easy and natural.
>>
>> Softies find a marked "impedance mismatch" when they encounter
>> HDLs, and try to "write Fortran" in the HDL. Hence there are
>> now moves to have more procedural concepts and constructs,
>> e.g. System C. The impedance mismatch is lower, but it is still
>> easy to create something that is poorly "synthesisable", i.e.
>> can't be translated into hardware/dataflow.
>
> I call HDL writers "time lords" as they can operate across both time and space.

+1!

Re: The Computer of the Future

<t05co1$163q$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=24022&group=comp.arch#24022
Path: i2pn2.org!i2pn.org!aioe.org!UXtAIYUgaw/fkqnS/V28xg.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: The Computer of the Future
Date: Mon, 7 Mar 2022 17:43:47 +0100
Organization: Aioe.org NNTP Server
Message-ID: <t05co1$163q$1@gioia.aioe.org>
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com>
<suh4pk$3pd$4@newsreader4.netcologne.de> <t04uv9$js1$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="39034"; posting-host="UXtAIYUgaw/fkqnS/V28xg.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.10.2
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Mon, 7 Mar 2022 16:43 UTC

Marcus wrote:
> On 2022-02-15, Thomas Koenig wrote:
>> Quadibloc <jsavard@ecn.ab.ca> schrieb:
>>> I noticed that in another thread I might have seemed to have
>>> contradicted myself.
>>>
>>> So I will clarify.
>>>
>>> In the near term, in two or three years, I think that it's entirely
>>> possible that we
>>> will have dies that combine four "GBOoO" performance cores with
>>> sixteen in-
>>> order efficiency cores, and chips that have four of those dies in a
>>> package, to
>>> give good performance on both number-crunching and database workloads.
>>
>> Who actually needs number crunching?
>>
>> I certainly do, but even in the company I work in (which is in
>> the chemical industry, so rather technical), the number of people
>> actually running code which depends on floating point execution
>> speed is rather small, probably in the low single digit percent
>> range of all employees.
>>
>> That does not mean that floating point is not important :-) but that
>> most users would not notice if they had a CPU with, let's say, a
>> reasonably efficient software emulation of floating point numbers.
>
> The problem is that AFAIK there exists no reasonably efficient software
> emulation of IEEE-754 floating-point (where "reasonable" would be
> something like 10x slower than using an FPU).

40-50 clock cycles for FADD/FMAC is just about doable.
>
> I recently spent some time optimizing function argument sanity checks in
> an API layer of an embedded software. For instance some arguments were
> orthonormal matrices, and the API made a crude check for that. Normally
> this is not a problem, but in this case the API layer ran on an ARM M4
> without an FPU, and so a single argument check could take more than
> 0.1 ms, which is a considerable time period in a real time system where
> all work must finish in less than 16 ms.

Could you do the checking in integer domain, or would that be meaningless?
>
> It would be interesting to see an FP standard that is optimized for
> software implementation, and/or CPU:s with "software FP aid"
> instructions (fast handling of NaN:s and special cases,
> (de)normalization instructions - like CLZ on steroids, etc), so that
> common FP operations can be implemented with 5-20 instructions, for
> instance.

We did quite a bit of work on sw/hw fp codesign for an absolute minimum
Mill, you are correct that it helps to have a small number of hw
helpers. In particular you need a saturating shift right in order to
implement the sticky bit for proper rounding.

It also helps to have a fast classifier for one or two inputs,
optionally sorting a pair of inputs by magnitude (needed for FADD/FSUB).
You do all the special cases in parallel with the normal calculations,
then select either that normal result or the special value at the very end.

Doing it this way you can get into a 5x slower speed ballpark.
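The sticky-bit helper Terje mentions can be sketched in plain C. This is my own hypothetical illustration (function name and all details assumed, not the Mill's actual helper): a software stand-in for a saturating right shift that folds everything shifted out into a sticky bit, as needed when aligning the smaller FADD operand before rounding.

```c
#include <stdint.h>

/* Hypothetical sketch: shift a fraction right by n bits, ORing a sticky
 * bit (any 1s shifted out) into the low bit of the result. A hardware
 * saturating shift would handle the n >= 64 case without the branch. */
static uint64_t shift_right_sticky(uint64_t frac, unsigned n)
{
    if (n >= 64)                      /* saturating case: all bits lost */
        return frac != 0;             /* result collapses to the sticky bit */
    uint64_t sticky = (frac & ((UINT64_C(1) << n) - 1)) != 0;
    return (frac >> n) | sticky;      /* sticky lands in the low bit */
}
```

With the sticky bit preserved this way, round-to-nearest-even can be decided from the guard/round/sticky bits alone at the very end.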

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: The Computer of the Future

<11334af8-ca5a-4eac-abd4-665f712b6e92n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24027&group=comp.arch#24027

Newsgroups: comp.arch
X-Received: by 2002:a37:afc5:0:b0:663:1033:dcc7 with SMTP id y188-20020a37afc5000000b006631033dcc7mr7774105qke.53.1646675032360;
Mon, 07 Mar 2022 09:43:52 -0800 (PST)
X-Received: by 2002:a05:6830:1b6f:b0:5af:d2f:eed9 with SMTP id
d15-20020a0568301b6f00b005af0d2feed9mr6439096ote.331.1646675031759; Mon, 07
Mar 2022 09:43:51 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 7 Mar 2022 09:43:51 -0800 (PST)
In-Reply-To: <t04r53$m0e$2@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:e9ee:3df:f03b:70f9;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:e9ee:3df:f03b:70f9
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com>
<858e7a72-fcc0-4bab-b087-28b9995c7094n@googlegroups.com> <2e266e3e-b633-4c2a-bd33-962cb675bb77n@googlegroups.com>
<fb409a7e-e1a2-4eaf-8fbb-d697ac3f0febn@googlegroups.com> <1a8a324d-34b8-4c1e-876e-1a0cde795e3fn@googlegroups.com>
<005ee5af-519a-4d45-93bd-87f4ab580c61n@googlegroups.com> <66d1cc8e-c8b9-4ecb-be59-fee1ab1da715n@googlegroups.com>
<suljuo$it1$1@dont-email.me> <t04r53$m0e$2@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <11334af8-ca5a-4eac-abd4-665f712b6e92n@googlegroups.com>
Subject: Re: The Computer of the Future
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 07 Mar 2022 17:43:52 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 27
 by: MitchAlsup - Mon, 7 Mar 2022 17:43 UTC

On Monday, March 7, 2022 at 5:43:35 AM UTC-6, Tom Gardner wrote:
> On 17/02/22 13:52, Ivan Godard wrote:
> > I think the problem is really a linguistic one: our programming languages have
> > caused us to think about our programs in control flow terms, when they are more
> > naturally (and more efficiently) thought of in dataflow terms.
> Not all languages, of course.
>
> The commercially and technically important class of hardware
> description languages (e.g. Verilog, VHDL) have a very large
> component of data flowing (streaming) through "wires" connecting
> "processes/FSMs" and event based computation. To anyone used
> to "thinking in hardware" that, and its associated parallelism
> is easy and natural.
>
> Softies find a marked "impedance mismatch" when they encounter
> HDLs, and try to "write Fortran" in the HDL.
<
Many Verilog "subroutines" can have their "assignment statements"
placed backwards and the subroutine still does the same job!
You HAVE to give up the von Neumann paradigm that one thing
happens, then another and another--they all happen based on
dependencies (some of which are not "in scope").
<
> Hence there are
> now moves to have more procedural concepts and constructs,
> e.g. System C. The impedance mismatch is lower, but it is still
> easy to create something that is poorly "synthesisable", i.e.
> can't be translated into hardware/dataflow.

Re: The Computer of the Future

<09f4dfc7-ee56-41d7-863f-0dc7dfde0464n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24028&group=comp.arch#24028

Newsgroups: comp.arch
X-Received: by 2002:a05:620a:d87:b0:67b:311c:ecbd with SMTP id q7-20020a05620a0d8700b0067b311cecbdmr3313119qkl.146.1646675324612;
Mon, 07 Mar 2022 09:48:44 -0800 (PST)
X-Received: by 2002:a05:6870:a2d0:b0:d9:ae66:b8e2 with SMTP id
w16-20020a056870a2d000b000d9ae66b8e2mr42352oak.7.1646675324307; Mon, 07 Mar
2022 09:48:44 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 7 Mar 2022 09:48:44 -0800 (PST)
In-Reply-To: <t04uv9$js1$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:e9ee:3df:f03b:70f9;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:e9ee:3df:f03b:70f9
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com>
<suh4pk$3pd$4@newsreader4.netcologne.de> <t04uv9$js1$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <09f4dfc7-ee56-41d7-863f-0dc7dfde0464n@googlegroups.com>
Subject: Re: The Computer of the Future
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 07 Mar 2022 17:48:44 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 59
 by: MitchAlsup - Mon, 7 Mar 2022 17:48 UTC

On Monday, March 7, 2022 at 6:48:44 AM UTC-6, Marcus wrote:
> On 2022-02-15, Thomas Koenig wrote:
> > Quadibloc <jsa...@ecn.ab.ca> schrieb:
> >> I noticed that in another thread I might have seemed to have contradicted myself.
> >>
> >> So I will clarify.
> >>
> >> In the near term, in two or three years, I think that it's entirely possible that we
> >> will have dies that combine four "GBOoO" performance cores with sixteen
> >> in-order efficiency cores, and chips that have four of those dies in a package, to
> >> give good performance on both number-crunching and database workloads.
> >
> > Who actually needs number crunching?
> >
> > I certainly do, even in the company I work in (which is in
> > the chemical industry, so rather technical) the number of people
> > actually running code which depends on floating point execution
> > speed is rather small, probably in the low single digit percent
> > range of all employees.
> >
> > That does not mean that floating point is not important :-) but that
> > most users would not notice if they had a CPU with, let's say, a
> > reasonably efficient software emulation of floating point numbers.
> The problem is that AFAIK there exists no reasonably efficient software
> emulation of IEEE-754 floating-point (where "reasonable" would be
> something like 10x slower than using an FPU).
>
> I recently spent some time optimizing function argument sanity checks in
> an API layer of an embedded software. For instance some arguments were
> orthonormal matrices, and the API made a crude check for that. Normally
> this is not a problem, but in this case the API layer ran on an ARM M4
> without an FPU, and so a single argument check could take more than
> 0.1 ms, which is a considerable time period in a real time system where
> all work must finish in less than 16 ms.
>
> It would be interesting to see an FP standard that is optimized for
> software implementation, and/or CPU:s with "software FP aid"
> instructions (fast handling of NaN:s and special cases,
> (de)normalization instructions - like CLZ on steroids, etc), so that
> common FP operations can be implemented with 5-20 instructions, for
> instance.
<
All you need is a native integer 2× as wide as the FP you simulate,
Find First 1 over 2×, and 2× Shifts, and some rounding instruction
also over 2×, and extract/insert exponents and fractions.
<
The problem is there is no 2× arithmetic designed to emulate FP;
they are all designed to emulate 2× integers.
>
> /Marcus
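The "Find First 1 over 2×" step Mitch lists can be illustrated with a short C sketch. This is my own assumption-laden example (hypothetical `renormalize` helper, GCC/Clang `__builtin_clzll` standing in for a CLZ instruction, sticky-bit collection omitted): after an emulated binary32 FADD/FSUB, the raw fraction can land anywhere in a 64-bit accumulator, and CLZ tells us how far to shift to put the hidden bit back at bit 23.

```c
#include <stdint.h>

/* Hypothetical sketch: renormalize a binary32 fraction held in a 64-bit
 * integer after an emulated add/subtract. The value is frac * 2^(*exp);
 * we shift so the leading 1 sits at bit 23 and adjust *exp to match.
 * (Rounding/sticky handling is omitted to keep the idea visible.) */
static uint64_t renormalize(uint64_t frac, int *exp)
{
    if (frac == 0)
        return 0;                           /* exact zero: exponent untouched */
    int shift = 40 - __builtin_clzll(frac); /* >0: shift right, <0: shift left */
    *exp += shift;                          /* preserves frac * 2^(*exp) */
    return shift >= 0 ? frac >> shift : frac << -shift;
}
```

This is exactly the "CLZ on steroids" role: one count-leading-zeros plus one variable shift replaces a normalization loop.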

Re: The Computer of the Future

<86ee3ceh3r.fsf@linuxsc.com>

https://www.novabbs.com/devel/article-flat.php?id=24052&group=comp.arch#24052

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: tr.17...@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.arch
Subject: Re: The Computer of the Future
Date: Tue, 08 Mar 2022 04:58:16 -0800
Organization: A noiseless patient Spider
Lines: 30
Message-ID: <86ee3ceh3r.fsf@linuxsc.com>
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com> <858e7a72-fcc0-4bab-b087-28b9995c7094n@googlegroups.com> <2e266e3e-b633-4c2a-bd33-962cb675bb77n@googlegroups.com> <fb409a7e-e1a2-4eaf-8fbb-d697ac3f0febn@googlegroups.com> <1a8a324d-34b8-4c1e-876e-1a0cde795e3fn@googlegroups.com> <005ee5af-519a-4d45-93bd-87f4ab580c61n@googlegroups.com> <66d1cc8e-c8b9-4ecb-be59-fee1ab1da715n@googlegroups.com> <suljuo$it1$1@dont-email.me> <868rtnfb1y.fsf@linuxsc.com> <t04pa8$g2u$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: reader02.eternal-september.org; posting-host="eb9093df1d45a8730a0059165ce9d7b9";
logging-data="23619"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18JO1UfYjcZez2trCMdvoURdCA6bZzE1h8="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:xnL41+EwDsPWqPEKp1mnnFpyVXM=
sha1:AsGRKYYf3LZSpE/6UxowVtkilKs=
 by: Tim Rentsch - Tue, 8 Mar 2022 12:58 UTC

Terje Mathisen <terje.mathisen@tmsw.no> writes:

> Tim Rentsch wrote:
>
>> Ivan Godard <ivan@millcomputing.com> writes:
>>
>> [...]
>>
>>> I think the problem is really a linguistic one: our programming
>>> languages have caused us to think about our programs in control
>>> flow terms, [...]
>>
>> Some of us. Not everyone, fortunately.
>
> I tend to think much more in data flow terms, as I've written before:
>
> An optimal program is like a creek finding its path down from the
> mountain, always searching for the path of least resistance.

It's an interesting simile. Note by the way that creeks are
greedy algorithms, in the sense of being only locally optimal,
not necessarily globally optimal.

To me the more interesting question is how does this perspective
affect how the code looks? If reading your programs, would I see
something that looks pretty much like other imperative code, or
would there be some distinguishing characteristics that would
indicate your different thought mode? Can you say anything about
what those characteristics might be? Or perhaps give an example
or two? (Short is better if that is feasible.)

Re: The Computer of the Future

<fcace6e8-30a2-40f4-bfae-4c59529c6c10n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24085&group=comp.arch#24085

Newsgroups: comp.arch
X-Received: by 2002:a0c:8e0b:0:b0:435:1779:7b22 with SMTP id v11-20020a0c8e0b000000b0043517797b22mr15498410qvb.63.1646811178697;
Tue, 08 Mar 2022 23:32:58 -0800 (PST)
X-Received: by 2002:a05:6870:d58b:b0:d2:8d1d:c12 with SMTP id
u11-20020a056870d58b00b000d28d1d0c12mr4914781oao.108.1646811178472; Tue, 08
Mar 2022 23:32:58 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 8 Mar 2022 23:32:58 -0800 (PST)
In-Reply-To: <t02s9j$360$1@gal.iecc.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:3834:c408:4bd5:ac1d;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:3834:c408:4bd5:ac1d
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com>
<005ee5af-519a-4d45-93bd-87f4ab580c61n@googlegroups.com> <66d1cc8e-c8b9-4ecb-be59-fee1ab1da715n@googlegroups.com>
<suljuo$it1$1@dont-email.me> <t02s9j$360$1@gal.iecc.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <fcace6e8-30a2-40f4-bfae-4c59529c6c10n@googlegroups.com>
Subject: Re: The Computer of the Future
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 09 Mar 2022 07:32:58 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 56
 by: Quadibloc - Wed, 9 Mar 2022 07:32 UTC

On Sunday, March 6, 2022 at 10:50:47 AM UTC-7, John Levine wrote:
> According to Ivan Godard <iv...@millcomputing.com>:
> >I think the problem is really a linguistic one: our programming
> >languages have caused us to think about our programs in control flow
> >terms, when they are more naturally (and more efficiently) thought of
> >in dataflow terms.

> I blame von Neumann. Look at the First Draft of a Report on the EDVAC
> and it's brutally serial, one step after another, one word at a time
> loaded or stored from the iconoscope.

> Compare it to the lovely ENIAC where you could plug anything into
> anything else, cables permitting, and the data flowed as soon as it
> was ready. Yeah, it was a little harder to program and only five
> people in the world could do it, but you can't have everything.

After the ENIAC, though, eventually more people learned how to
program in dataflow terms. Because that's how analog computers
were programmed.

And what about punched-card accounting machines?

Originally, the victory of the von Neumann machine came about
for a very simple reason: such a machine required far less hardware.
One ALU for a program of any length or complexity, instead of as many
arithmetic stages as there were steps in the problem.

But the other problem is that a lot of uses were found for computers.
Dataflow seems to be good for one thing: turning a stack of input numbers
into a stack of output numbers. Now that programs are written to work in
a GUI instead of from a command line, however, a paradigm that's at least
analogous to dataflow is needed on the highest level, instead of just in
the innermost loops.

Also, while a computer could be designed to be a dataflow engine on
one level, there is the brutal slowness of the interface to main memory,
to external DRAM.

So it would be simple enough to design a dataflow processor. You
put a pile of ALUs on a chip. You have a lot of switchable connections
linking them. And you also put a pile of separate memories in the chip,
so that you can link multiple inputs and outputs to the computation.

Some of those memories could be used as look-up tables instead of
input or output hoppers.

The problem is: how useful would such a processor be? Assume it
shares a die with a conventional processor, which is used to set it up
for problems. How often will setting it up for a problem be faster than
just having the serial processor do the problem?

I have tried to think of a way to define a dataflow instruction for
what is otherwise a conventional processor. Basically, the instruction
performs a vector operation, but with multiple opcodes, with all the
data forwarding specified.

John Savard

Re: The Computer of the Future

<t0a324$1uat$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=24089&group=comp.arch#24089

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!UXtAIYUgaw/fkqnS/V28xg.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: The Computer of the Future
Date: Wed, 9 Mar 2022 12:29:06 +0100
Organization: Aioe.org NNTP Server
Message-ID: <t0a324$1uat$1@gioia.aioe.org>
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com>
<858e7a72-fcc0-4bab-b087-28b9995c7094n@googlegroups.com>
<2e266e3e-b633-4c2a-bd33-962cb675bb77n@googlegroups.com>
<fb409a7e-e1a2-4eaf-8fbb-d697ac3f0febn@googlegroups.com>
<1a8a324d-34b8-4c1e-876e-1a0cde795e3fn@googlegroups.com>
<005ee5af-519a-4d45-93bd-87f4ab580c61n@googlegroups.com>
<66d1cc8e-c8b9-4ecb-be59-fee1ab1da715n@googlegroups.com>
<suljuo$it1$1@dont-email.me> <868rtnfb1y.fsf@linuxsc.com>
<t04pa8$g2u$1@gioia.aioe.org> <86ee3ceh3r.fsf@linuxsc.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="63837"; posting-host="UXtAIYUgaw/fkqnS/V28xg.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.11
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Wed, 9 Mar 2022 11:29 UTC

Tim Rentsch wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>
>> Tim Rentsch wrote:
>>
>>> Ivan Godard <ivan@millcomputing.com> writes:
>>>
>>> [...]
>>>
>>>> I think the problem is really a linguistic one: our programming
>>>> languages have caused us to think about our programs in control
>>>> flow terms, [...]
>>>
>>> Some of us. Not everyone, fortunately.
>>
>> I tend to think much more in data flow terms, as I've written before:
>>
>> An optimal program is like a creek finding its path down from the
>> mountain, always searching for the path of least resistance.
>
> It's an interesting simile. Note by the way that creeks are
> greedy algorithms, in the sense of being only locally optimal,
> not necessarily globally optimal.
>
> To me the more interesting question is how does this perspective
> affect how the code looks? If reading your programs, would I see
> something that looks pretty much like other imperative code, or
> would there be some distinguishing characteristics that would
> indicate your different thought mode? Can you say anything about
> what those characteristics might be? Or perhaps give an example
> or two? (Short is better if that is feasible.)
>
First example: Use (very) small lookup tables to get rid of branches,
typically trying to turn code state machines into data state machines.
This is driven by the fact that modern CPUs are much better at dependent
loads than unpredictable branches.
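A minimal sketch of that idea (my illustration, not Terje's code; all names assumed): a word counter whose inner loop is a data state machine -- tiny tables indexed by (state, character class) replace the data-dependent branches.

```c
#include <stdint.h>

/* Hedged sketch of a "data state machine": count whitespace-separated
 * words with no unpredictable branch in the loop body. State 0 = in a
 * gap, state 1 = inside a word; the tables encode the transitions. */
static int count_words(const char *s)
{
    /* next[state][is_word_char]: the new state; inc[state][is_word_char]:
     * 1 only on the gap -> word transition, i.e. a new word starts. */
    static const uint8_t next[2][2] = { {0, 1}, {0, 1} };
    static const uint8_t inc[2][2]  = { {0, 1}, {0, 0} };
    int state = 0, words = 0;
    for (; *s; s++) {
        int c = (unsigned char)*s;
        int w = (c != ' ') & (c != '\t') & (c != '\n'); /* branchless class */
        words += inc[state][w];
        state  = next[state][w];
    }
    return words;
}
```

The loop-carried work is a pair of dependent table loads, which modern cores handle far more gracefully than a mispredicted branch per character; the same structure also vectorizes or shards across cores cleanly.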

I typically try to write code that can run in SIMD mode, i.e. using
tables/branchless/predicated ops to handle any single-lane alternate paths.

Such code can also much more often scale across multiple cores/systems
even if that means that I have to do some redundant work across the
boundaries. I.e. when I process lidar data I can make the problem almost
embarrassingly parallelizable by splitting the input into tiles with
35-50m overlap: This is sufficient to effectively eliminate all edge
artifacts when I generate contours and vegetation classifications.

The most fun is when I find ways to remove all internal branching
related to exceptional data, i.e. anything which can impede that nicely
flowing stream of water going downhill.

BTW, looking for both local and global optimal paths/solutions is very
similar to what we do when competing in orienteering: What is my running
speed going to be along these paths vs going cross country? How is that
affected by the density of the vegetation and the amount of elevation
gain/loss? How much more time will I need to spend on the navigation
part when taking a more risky direct route without obvious intermediate
features to use for feedback? (In the latter case I also need to
incorporate the expected loss of time by not hitting the control
perfectly and then having to search and/or recover my position by
running to the nearest large/obvious feature.)

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: The Computer of the Future

<t0a3gl$5tr$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=24090&group=comp.arch#24090

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!news.freedyn.de!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd4-fef8-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: The Computer of the Future
Date: Wed, 9 Mar 2022 11:36:53 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <t0a3gl$5tr$1@newsreader4.netcologne.de>
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com>
<005ee5af-519a-4d45-93bd-87f4ab580c61n@googlegroups.com>
<66d1cc8e-c8b9-4ecb-be59-fee1ab1da715n@googlegroups.com>
<suljuo$it1$1@dont-email.me> <t02s9j$360$1@gal.iecc.com>
<fcace6e8-30a2-40f4-bfae-4c59529c6c10n@googlegroups.com>
Injection-Date: Wed, 9 Mar 2022 11:36:53 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd4-fef8-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd4:fef8:0:7285:c2ff:fe6c:992d";
logging-data="6075"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Wed, 9 Mar 2022 11:36 UTC

Quadibloc <jsavard@ecn.ab.ca> schrieb:

> So it would be simple enough to design a dataflow processor. You
> put a pile of ALUs on a chip. You have a lot of switchable connections
> linking them. And you also put a pile of separate memories in the chip,
> so that you can link multiple inputs and outputs to the computation.
>
> Some of those memories could be used as look-up tables instead of
> input or output hoppers.

Sounds like an FPGA to me.

> The problem is: how useful would such a processor be? Assume it
> shares a die with a conventional processor, which is used to set it up
> for problems. How often will setting it up for a problem be faster than
> just having the serial processor do the problem?

> I have tried to think of a way to define a dataflow instruction for
> what is otherwise a conventional processor. Basically, the instruction
> performs a vector operation, but with multiple opcodes, with all the
> data forwarding specified.

Use a HDL.

If you want, you can have a softcore for your CPU and define special
instructions for your special needs. I don't think it is easy
to modify the FPGA programming on the fly.

Re: The Computer of the Future

<72571ce4-cb0d-4fe7-81cd-91c79ea87047n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24092&group=comp.arch#24092

Newsgroups: comp.arch
X-Received: by 2002:ac8:5dca:0:b0:2de:57d8:7a89 with SMTP id e10-20020ac85dca000000b002de57d87a89mr18334719qtx.635.1646835866445;
Wed, 09 Mar 2022 06:24:26 -0800 (PST)
X-Received: by 2002:a05:6808:1912:b0:2d9:a01a:4877 with SMTP id
bf18-20020a056808191200b002d9a01a4877mr6476665oib.194.1646835866154; Wed, 09
Mar 2022 06:24:26 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 9 Mar 2022 06:24:25 -0800 (PST)
In-Reply-To: <t0a3gl$5tr$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=136.50.253.102; posting-account=AoizIQoAAADa7kQDpB0DAj2jwddxXUgl
NNTP-Posting-Host: 136.50.253.102
References: <b94f72eb-d747-47a4-85cd-d4c351cfcc5fn@googlegroups.com>
<005ee5af-519a-4d45-93bd-87f4ab580c61n@googlegroups.com> <66d1cc8e-c8b9-4ecb-be59-fee1ab1da715n@googlegroups.com>
<suljuo$it1$1@dont-email.me> <t02s9j$360$1@gal.iecc.com> <fcace6e8-30a2-40f4-bfae-4c59529c6c10n@googlegroups.com>
<t0a3gl$5tr$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <72571ce4-cb0d-4fe7-81cd-91c79ea87047n@googlegroups.com>
Subject: Re: The Computer of the Future
From: jim.brak...@ieee.org (JimBrakefield)
Injection-Date: Wed, 09 Mar 2022 14:24:26 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 26
 by: JimBrakefield - Wed, 9 Mar 2022 14:24 UTC

On Wednesday, March 9, 2022 at 5:36:57 AM UTC-6, Thomas Koenig wrote:
> Quadibloc <jsa...@ecn.ab.ca> schrieb:
> > So it would be simple enough to design a dataflow processor. You
> > put a pile of ALUs on a chip. You have a lot of switchable connections
> > linking them. And you also put a pile of separate memories in the chip,
> > so that you can link multiple inputs and outputs to the computation.
> >
> > Some of those memories could be used as look-up tables instead of
> > input or output hoppers.
> Sounds like an FPGA to me.
> > The problem is: how useful would such a processor be? Assume it
> > shares a die with a conventional processor, which is used to set it up
> > for problems. How often will setting it up for a problem be faster than
> > just having the serial processor do the problem?
>
> > I have tried to think of a way to define a dataflow instruction for
> > what is otherwise a conventional processor. Basically, the instruction
> > performs a vector operation, but with multiple opcodes, with all the
> > data forwarding specified.
> Use a HDL.
>
> If you want, you can have a softcore for your CPU and define special
> instructions for your special needs. I don't think it is easy
> to modify the FPGA programming on the fly.

It is possible to modify block RAM on the fly,
and it can be used to contain micro-code.
