Rocksolid Light


devel / comp.arch / Jason Cong's future of high performance computing

Subject  (Author)
* Jason Cong's future of high performance computing  (JimBrakefield)
+* Re: Jason Cong's future of high performance computing  (MitchAlsup)
|`* Re: Jason Cong's future of high performance computing  (Scott Lurndal)
| +- Re: Jason Cong's future of high performance computing  (MitchAlsup)
| `* Re: Jason Cong's future of high performance computing  (EricP)
|  `- Re: Jason Cong's future of high performance computing  (Thomas Koenig)
+* Re: Jason Cong's future of high performance computing  (Quadibloc)
|`* Re: Jason Cong's future of high performance computing  (MitchAlsup)
| +- Re: Jason Cong's future of high performance computing  (robf...@gmail.com)
| +* Re: Jason Cong's future of high performance computing  (Michael S)
| |`* Re: Jason Cong's future of high performance computing  (BGB)
| | `* Re: Jason Cong's future of high performance computing  (MitchAlsup)
| |  `- Re: Jason Cong's future of high performance computing  (BGB)
| `* Re: Jason Cong's future of high performance computing  (Quadibloc)
|  +* Re: Jason Cong's future of high performance computing  (MitchAlsup)
|  |`* Re: Jason Cong's future of high performance computing  (Quadibloc)
|  | `* Re: Jason Cong's future of high performance computing  (MitchAlsup)
|  |  `* Re: Jason Cong's future of high performance computing  (Scott Lurndal)
|  |   +- Re: Jason Cong's future of high performance computing  (Scott Lurndal)
|  |   `- Re: Jason Cong's future of high performance computing  (MitchAlsup)
|  +- Re: Jason Cong's future of high performance computing  (JimBrakefield)
|  `- Re: Jason Cong's future of high performance computing  (BGB)
+* Re: Jason Cong's future of high performance computing  (Terje Mathisen)
|`* Re: Jason Cong's future of high performance computing  (Michael S)
| +- Re: Jason Cong's future of high performance computing  (JimBrakefield)
| `- Re: Jason Cong's future of high performance computing  (BGB)
`* Re: Jason Cong's future of high performance computing  (Scott Lurndal)
 +* Re: Jason Cong's future of high performance computing  (Anton Ertl)
 |`- Re: Jason Cong's future of high performance computing  (MitchAlsup)
 `* Re: Jason Cong's future of high performance computing  (EricP)
  `* Re: Jason Cong's future of high performance computing  (Michael S)
   `* Re: Jason Cong's future of high performance computing  (Anton Ertl)
    +* Re: Jason Cong's future of high performance computing  (Michael S)
    |`* Re: Jason Cong's future of high performance computing  (Scott Lurndal)
    | `* Re: Jason Cong's future of high performance computing  (Scott Lurndal)
    |  `* Re: Jason Cong's future of high performance computing  (MitchAlsup)
    |   `* Re: Jason Cong's future of high performance computing  (Scott Lurndal)
    |    `* Re: Jason Cong's future of high performance computing  (MitchAlsup)
    |     `* Re: Jason Cong's future of high performance computing  (JimBrakefield)
    |      `* Re: Jason Cong's future of high performance computing  (Scott Lurndal)
    |       `* Re: Jason Cong's future of high performance computing  (Scott Lurndal)
    |        `- Re: Jason Cong's future of high performance computing  (Michael S)
    +* Re: Jason Cong's future of high performance computing  (MitchAlsup)
    |+- Re: Jason Cong's future of high performance computing  (BGB)
    |`* Re: Jason Cong's future of high performance computing  (Anton Ertl)
    | `* Re: Jason Cong's future of high performance computing  (MitchAlsup)
    |  `* Re: Jason Cong's future of high performance computing  (Scott Lurndal)
    |   `* Re: Jason Cong's future of high performance computing  (MitchAlsup)
    |    `- Re: Jason Cong's future of high performance computing  (Scott Lurndal)
    `* Re: Jason Cong's future of high performance computing  (Michael S)
     +* Re: Jason Cong's future of high performance computing  (David Brown)
     |+* Re: Jason Cong's future of high performance computing  (Niklas Holsti)
     ||+- Re: Jason Cong's future of high performance computing  (MitchAlsup)
     ||`* Re: Jason Cong's future of high performance computing  (David Brown)
     || +- Re: Jason Cong's future of high performance computing  (Niklas Holsti)
     || `* Re: Jason Cong's future of high performance computing  (MitchAlsup)
     ||  `* Re: Jason Cong's future of high performance computing  (Michael S)
     ||   +- Re: Jason Cong's future of high performance computing  (MitchAlsup)
     ||   +- Re: Jason Cong's future of high performance computing  (JimBrakefield)
     ||   `- Re: Jason Cong's future of high performance computing  (Anton Ertl)
     |+- Re: Jason Cong's future of high performance computing  (Michael S)
     |`- Re: Jason Cong's future of high performance computing  (BGB)
     `- Re: Jason Cong's future of high performance computing  (Thomas Koenig)

Jason Cong's future of high performance computing

<bdd94600-f8a0-4367-8c12-ba1f9d38821en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30577&group=comp.arch#30577

 by: JimBrakefield - Sun, 29 Jan 2023 00:11 UTC

https://www.youtube.com/watch?v=-XuMWvGUocI&t=123s
Starts at the 2 minute mark. He argues that computer performance
has plateaued and that FPGAs offer a route to higher performance.
At about 13 minutes he states his goal to turn software programmers
into FPGA application engineers. The rest of the talk is about how this
can be achieved. It's a technical talk to an advanced audience.

Given his previous accomplishments and those of the VAST group at UCLA,
https://vast.cs.ucla.edu/, they can probably pull it off.
That is, in five or ten years this is what "big iron" computing will look like?

Re: Jason Cong's future of high performance computing

<eea89e9b-5bcf-484a-aa13-4041b3331492n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30580&group=comp.arch#30580

 by: MitchAlsup - Sun, 29 Jan 2023 01:16 UTC

On Saturday, January 28, 2023 at 6:11:30 PM UTC-6, JimBrakefield wrote:
> https://www.youtube.com/watch?v=-XuMWvGUocI&t=123s
> Starts at the 2 minute mark. He argues that computer performance
> has plateaued and that FPGAs offer a route to higher performance.
<
In my best Homer Simpson voice:: "Well Duh"
<
When CPU packages are limited to 100 W of power, there is only a
maximum number of gates one can switch per unit time while staying
under that power limit. Technology advances are simply driving the
dissipation per gate down--after we maxed out IPL consumption.
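That power ceiling can be sketched with the standard dynamic-power relation P ≈ α·C·V²·f. The numbers below are purely illustrative assumptions, not figures from this thread:

```python
# Dynamic switching power: P = alpha * C * V^2 * f per gate.
# All values are assumed, for illustration only.
alpha  = 0.1        # activity factor: fraction of gates toggling each cycle
c_gate = 1e-15      # effective switched capacitance per gate, farads
v      = 1.0        # supply voltage, volts
f      = 3e9        # clock frequency, Hz
budget = 100.0      # package power limit, watts

per_gate  = alpha * c_gate * v**2 * f      # watts burned per switching gate
max_gates = budget / per_gate              # gates you may switch per cycle
print(f"{max_gates:.2e}")                  # ~3.33e+08 gates under a 100 W cap
```

Lowering the per-gate dissipation (smaller C, lower V) raises `max_gates`, which is exactly the "technology drives dissipation per gate down" point.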
<
> At about 13 minutes he states his goal to turn software programmers
> into FPGA application engineers. The rest of the talk is about how this
> can be achieved. It's a technical talk to an advanced audience.
<
Software people are trained to think like von Neumann:: do one thing, then
do one more thing; repeat until done. This is manifest in the way they debug
programs--by single stepping.
<
Hardware is not like this at all:: nothing prevents all of the first gates after
a flip-flop from sensing their inputs and generating their outputs simultaneously.
HW designers often (perversely) write Verilog code backwards, knowing that
the compiler will rearrange the code by net-list dependency.
<
I should note: you cannot debug HW by single stepping!! There is no definition
of what single stepping means at the gate level. No, HW designers use simulators
where they can stop at ½-clock intervals and then examine millions of signals--
some of them X (unknown value) and Z (high impedance).
<
I have serious doubts that one can teach 99% of software engineers to
think in ways that are truly concurrent--and the first thing that HW designers
have to come to grips with is that there is no single stepping; the minimal
advance is ½ clock--and in that interval a billion gates can change their output signals.
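The "everything fires from the same sampled state" model can be illustrated with a toy two-phase step (a hypothetical sketch, not how any real simulator is written): every next-state function reads the old flip-flop values, and all outputs commit together on the clock edge, so evaluation order is irrelevant.

```python
# Toy synchronous-logic step: snapshot the current flip-flop state, evaluate
# every next-state function against that snapshot (order does not matter),
# then commit all outputs at once -- mimicking a clock edge.
def clock_edge(state, next_state_fns):
    sampled = dict(state)                       # snapshot of "old" values
    return {name: fn(sampled) for name, fn in next_state_fns.items()}

# A 2-bit Gray-code counter: next b0 = NOT b1, next b1 = b0.
fns = {
    "b0": lambda s: 1 - s["b1"],
    "b1": lambda s: s["b0"],
}
state = {"b0": 0, "b1": 0}
seq = []
for _ in range(4):
    seq.append((state["b1"], state["b0"]))
    state = clock_edge(state, fns)
print(seq)  # [(0, 0), (0, 1), (1, 1), (1, 0)] -- the Gray sequence
```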
>
> Given his previous accomplishments and those of the VAST group at UCLA,
> https://vast.cs.ucla.edu/, they can probably pull it off.
<
> That is, in five or ten years this is what "big iron" computing will look like?
<
My bet is that if his efforts succeed, it will be 20 years before whatever it
becomes is available in a computer you buy in "Best Buy" and take home to
use.
<
Then again, there is the problem of applications to make use of the new
capabilities. Where do these come from?
<
----------------------------------------------------------------------------------------------------------------------
<
When I started as a professional in the computer world, I went to a company
sponsored lecture about Artificial intelligence, and how it would revolutionize
what computers do and how they do it and how it was only 5 years off into
the future.........this was 1982! 40 years later (8× longer than stated) we are
on the cusp of AI being useful to the average person not using Google as a
search engine. Yet the application has nothing to do with computing but with
another <nearly> daily activity:: Driving.

Re: Jason Cong's future of high performance computing

<a74b1c3c-bf8f-485d-9c43-a576555e25d5n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30582&group=comp.arch#30582

 by: Quadibloc - Sun, 29 Jan 2023 01:22 UTC

On Saturday, January 28, 2023 at 5:11:30 PM UTC-7, JimBrakefield wrote:
> Starts at the 2 minute mark. He argues that computer performance
> has plateaued and that FPGAs offer a route to higher performance.
> At about 13 minutes he states his goal to turn software programmers
> into FPGA application engineers.

At present, FPGAs can make _some_ computer programs more
efficient. If a program involves operations like bit manipulation,
which an FPGA does well but CPUs do poorly, it can be a good fit.
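Bit reversal is a stock example of that mismatch (a generic textbook routine, not something from the talk): in an FPGA it is pure wiring, while a CPU without a dedicated instruction pays five shift-and-mask stages.

```python
def reverse32(x: int) -> int:
    """Reverse the bits of a 32-bit word: five shift/mask stages in
    software, but zero logic (just routing) in an FPGA."""
    x = ((x >> 1)  & 0x55555555) | ((x & 0x55555555) << 1)   # swap bits
    x = ((x >> 2)  & 0x33333333) | ((x & 0x33333333) << 2)   # swap pairs
    x = ((x >> 4)  & 0x0F0F0F0F) | ((x & 0x0F0F0F0F) << 4)   # swap nibbles
    x = ((x >> 8)  & 0x00FF00FF) | ((x & 0x00FF00FF) << 8)   # swap bytes
    x = ((x >> 16) | (x << 16)) & 0xFFFFFFFF                 # swap halves
    return x

print(hex(reverse32(0x00000001)))  # 0x80000000
```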

It would be possible to design FPGAs that are better suited to
problems that CPUs already do well, so that they could be done
even better on the FPGA. For example, if a problem involves a lot
of 64-bit floating-point arithmetic, put a lot of double-precision
FP ALUs in the FPGA. For some reason, such parts are not available
at the moment.

John Savard

Re: Jason Cong's future of high performance computing

<c37102e3-e86a-45d9-a994-ef824720a907n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30585&group=comp.arch#30585

 by: MitchAlsup - Sun, 29 Jan 2023 01:30 UTC

On Saturday, January 28, 2023 at 7:22:47 PM UTC-6, Quadibloc wrote:
> On Saturday, January 28, 2023 at 5:11:30 PM UTC-7, JimBrakefield wrote:
> > Starts at the 2 minute mark. He argues that computer performance
> > has plateaued and that FPGAs offer a route to higher performance.
> > At about 13 minutes he states his goal to turn software programmers
> > into FPGA application engineers.
> At present, FPGAs can make _some_ computer programs more
> efficient. If a computer program involves things like bit manipulation,
> that an FPGA can do well, but CPUs do poorly, it can be a good fit.
>
> It would be possible to design FPGAs that are better suited to
> problems that CPUs already do well, so that they could be done
> even better on the FPGA. For example, if a problem involves a lot
> of 64-bit floating-point arithmetic, put a lot of double-precision
> FP ALUs in the FPGA. For some reason, such parts are not available
> at the moment.
<
I see it a bit differently:: FPGA applications should target things CPUs do
rather poorly.
<
As BGB indicates, he has had a hard time getting his FPU small enough
for his FPGA and it still runs slowly (compared to Intel or AMD CPUs).
So, in order to get speedup, you would need an FPGA that supports 10
FPUs (more likely 100 FPUs) and enough pins to feed it the BW it requires.
Some of these FPGAs are more expensive than a CPU running at decent
but not extraordinary frequency.
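The break-even arithmetic behind that point can be made explicit; the throughput and frequency figures below are assumptions for illustration, not measurements of any particular part:

```python
# How many soft FPUs does an FPGA need just to match one CPU core?
# All figures are illustrative assumptions.
cpu_hz,  cpu_fma_per_cycle  = 4.0e9, 2     # hard CPU core
fpga_hz, fpga_fma_per_cycle = 200e6, 1     # soft FPU in FPGA fabric

cpu_flops = cpu_hz * cpu_fma_per_cycle     # 8.0e9 FLOP/s for the core
fpu_flops = fpga_hz * fpga_fma_per_cycle   # 2.0e8 FLOP/s per soft FPU
print(cpu_flops / fpu_flops)               # 40.0 FPUs just to break even
```

And each of those FPUs still needs operand bandwidth, which is where the pin count comes in.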

Re: Jason Cong's future of high performance computing

<66345996-9222-44ba-a0bf-b43edb146647n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30586&group=comp.arch#30586

 by: robf...@gmail.com - Sun, 29 Jan 2023 04:37 UTC

>At about 13 minutes he states his goal to turn software programmers
>into FPGA application engineers. The rest of the talk is about how this
>can be achieved. It's a technical talk to an advanced audience.

I do not think it's a great idea to turn software programmers into FPGA app
engineers. It requires very different thinking, essentially two skill sets. I suspect
most people would want to specialize in one area or the other. It may be good to
be able to identify where an FPGA solution could work better. But that is more
like finding a better algorithm.

>I should note: you cannot debug HW by single stepping !! as there is no definition
>of what single stepping means at the gate level. No, HW designer use simulators
>where they can stop at ½ clock intervals and then examine millions of signals--
>some of them X (unknown value) and Z (high impeadence).

I single-step through FPGA logic sometimes when trying to find bugs, although
it does not always work well. As you say, output values are only valid at clock
intervals. Single-step is available in the Vivado simulator, but there are a couple
of caveats. Stepping occurs sequentially within the same always block, but once it
hits the end of the block or another exit point it may jump to a different always
block seemingly at random. The best bet is to start from a breakpoint and then
single-step for only a few lines. Also, variables are not set until the clock edge
occurs, so one must single-step through all possible steps, hit the clock edge, and
then look at the variables. One can set two breakpoints, one at each of two
successive clock edges, to see how the variables changed.
******
I think FPGAs will always be at least an order of magnitude slower than custom CPUs
for many compute tasks. Each has its own area, I think. If an FPGA task is common
enough, I think it would eventually get implemented in custom logic rather than in
lookup tables with switchable routing.
I think FPGAs are great for prototyping and one-off solutions; I am not so sure beyond that.

Re: Jason Cong's future of high performance computing

<tr5f78$2n5sq$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=30587&group=comp.arch#30587

 by: Terje Mathisen - Sun, 29 Jan 2023 09:45 UTC

JimBrakefield wrote:
> https://www.youtube.com/watch?v=-XuMWvGUocI&t=123s
> Starts at the 2 minute mark. He argues that computer performance
> has plateaued and that FPGAs offer a route to higher performance.
> At about 13 minutes he states his goal to turn software programmers
> into FPGA application engineers. The rest of the talk is about how this
> can be achieved. It's a technical talk to an advanced audience.
>
> Given his previous accomplishments and those of the VAST group at UCLA,
> https://vast.cs.ucla.edu/, they can probably pull it off.
> That is, in five or ten years this is what "big iron" computing will look like?

NO, and once again, NO. FPGA starts out with at least an order of
magnitude speed disadvantage, so you need problems where the algorithms
are really unsuited for a SW implementation.

The only way for FPGA to become relevant in the mass market is if it
turns up as a standard feature in every Intel/Apple/ARM CPU;
otherwise it will forever be relegated to the narrow valley between what
you can do by just throwing a few more OoO cores at the problem, and
when you turn to full VLSI.

I.e. FPGA is for prototyping and low count custom systems. However, even
though cell phone towers have used FPGA to allow relatively easy
(remote) upgrades to handle new radio protocols as they become
finalized, even those systems (in the 100K+ range of installations) will
put everything baseline into VLSI.

I also don't think you can make FPGA context switching even remotely
fast, further limiting possible usage scenarios.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Jason Cong's future of high performance computing

<0b70add4-6ae6-4ead-8b33-7e7ac7b47a1dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30588&group=comp.arch#30588

 by: Michael S - Sun, 29 Jan 2023 12:17 UTC

On Sunday, January 29, 2023 at 11:45:16 AM UTC+2, Terje Mathisen wrote:
> JimBrakefield wrote:
> > https://www.youtube.com/watch?v=-XuMWvGUocI&t=123s
> > Starts at the 2 minute mark. He argues that computer performance
> > has plateaued and that FPGAs offer a route to higher performance.
> > At about 13 minutes he states his goal to turn software programmers
> > into FPGA application engineers. The rest of the talk is about how this
> > can be achieved. It's a technical talk to an advanced audience.
> >
> > Given his previous accomplishments and those of the VAST group at UCLA,
> > https://vast.cs.ucla.edu/, they can probably pull it off.
> > That is, in five or ten years this is what "big iron" computing will look like?
> NO, and once again, NO. FPGA starts out with at least an order of
> magnitude speed disadvantage, so you need problems where the algorithms
> are really unsuited for a SW implementation.
>
> The only way for FPGA to become relevant in the mass market is because
> it turns up as a standard feature in every Intel/Apple/ARM cpu,
> otherwise it will forever be relegated to the narrow valley between what
> you can do by just throwing a few more OoO cores at the problem, and
> when you turn to full VLSI.
>
> I.e. FPGA is for prototyping and low count custom systems. However, even
> though cell phone towers have used FPGA to allow relatively easy
> (remote) upgrades to handle new radio protocols as they become
> finalized, even those systems (in the 100K+ range of installations) will
> put everything baseline into VLSI.
>
> I also don't think you can make FPGA context switching even remotely
> fast, further limiting possible usage scenarios.
>
> Terje
>
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

FPGAs *are* a mass market, and have been for more than two decades.
But they are a mass market not in the role of compute accelerator, and
I agree with you that they will never become a mass market in that role.
In the previous century mass-market FPGAs were called simply FPGAs.
In this century they are called "low end" or similar derogatory names.
Wall Street does not care about them, just as Wall Street does not care
about micro-controllers, but like micro-controllers they are a cornerstone
of the industry. Well, I am exaggerating a little, somewhat less
than micro-controllers, but a cornerstone nevertheless.

Re: Jason Cong's future of high performance computing

<4b8a569d-8a3f-43c8-bc32-d1514f2eccc8n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30589&group=comp.arch#30589

 by: Michael S - Sun, 29 Jan 2023 12:48 UTC

On Sunday, January 29, 2023 at 3:30:45 AM UTC+2, MitchAlsup wrote:
> On Saturday, January 28, 2023 at 7:22:47 PM UTC-6, Quadibloc wrote:
> > On Saturday, January 28, 2023 at 5:11:30 PM UTC-7, JimBrakefield wrote:
> > > Starts at the 2 minute mark. He argues that computer performance
> > > has plateaued and that FPGAs offer a route to higher performance.
> > > At about 13 minutes he states his goal to turn software programmers
> > > into FPGA application engineers.
> > At present, FPGAs can make _some_ computer programs more
> > efficient. If a computer program involves things like bit manipulation,
> > that an FPGA can do well, but CPUs do poorly, it can be a good fit.
> >
> > It would be possible to design FPGAs that are better suited to
> > problems that CPUs already do well, so that they could be done
> > even better on the FPGA. For example, if a problem involves a lot
> > of 64-bit floating-point arithmetic, put a lot of double-precision
> > FP ALUs in the FPGA. For some reason, such parts are not available
> > at the moment.
> <
> I see it a bit different:: FPGA applications should target things CPUs do
> rather poorly.
> <
> As BGB indicates, he has had a hard time getting his FPU small enough
> for his FPGA and it still runs slowly (compared to Intel or AMD CPUs).
> So, in order to get speedup, you would need an FPGA that supports 10
> FPUs (more likely 100 FPUs) and enough pins to feed it the BW it requires.

BGB plays with an Artix-7, probably the XC7A25T, which has 23,360 logic cells
and 80 DSP slices. Besides, if he can't fit something into the device, that
does not mean an experienced FPGA guy would have the same difficulties.
But let's leave that aside and concentrate on devices.
So: 23,360 logic cells and 80 DSP slices.
For comparison, the biggest device in the same decade-old Xilinx 7 series
is the Virtex XC7VH870T, with 876,160 logic cells and 2,520 DSP slices.
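Using the device figures quoted above, the spread inside the 7 series alone is easy to put in numbers:

```python
# Ratio of the largest 7-series part to the small Artix-7 discussed above.
artix_cells,  artix_dsp  = 23_360,  80     # XC7A25T
virtex_cells, virtex_dsp = 876_160, 2_520  # XC7VH870T
print(round(virtex_cells / artix_cells, 1))  # 37.5x the logic cells
print(round(virtex_dsp  / artix_dsp,  1))    # 31.5x the DSP slices
```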

Newer high-end FPGA devices are bigger by another order of magnitude,
although not in the DSP-slice department: the Virtex UltraScale+ XCVU19P has
8,938,000 logic cells and 3,840 DSP slices. But that device is also not quite new.

In more recent years Xilinx lost interest in traditional FPGAs and is trying
to push the "Adaptive Compute Acceleration Platform" (ACAP). Some of those have
a rather insane number of multipliers--so many that they stopped counting DSP
slices and now count DSP Engines. I am too lazy to dig deeper and find
out what that really means, but one thing is sure: there are a lot of compute
resources in these devices. Maybe not as much as in leading-edge GPUs,
but it's the same order of magnitude.

We are not talking about 10 or 100 or 1,000 FPUs on the high-end ACAPs.
More like 10,000.

> Some of these FPGAs are more expensive than a CPU running at decent
> but not extraordinary frequency.

That's another understatement.

Re: Jason Cong's future of high performance computing

<2avBL.592255$9sn9.484650@fx17.iad>

https://www.novabbs.com/devel/article-flat.php?id=30591&group=comp.arch#30591
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx17.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: sco...@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Jason Cong's future of high performance computing
Newsgroups: comp.arch
References: <bdd94600-f8a0-4367-8c12-ba1f9d38821en@googlegroups.com>
Lines: 15
Message-ID: <2avBL.592255$9sn9.484650@fx17.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sun, 29 Jan 2023 14:13:18 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sun, 29 Jan 2023 14:13:18 GMT
X-Received-Bytes: 1477
 by: Scott Lurndal - Sun, 29 Jan 2023 14:13 UTC

JimBrakefield <jim.brakefield@ieee.org> writes:
>https://www.youtube.com/watch?v=-XuMWvGUocI&t=123s
>Starts at the 2 minute mark. He argues that computer performance
>has plateaued and that FPGAs offer a route to higher performance.
>At about 13 minutes he states his goal to turn software programmers
>into FPGA application engineers. The rest of the talk is about how this
>can be achieved. It's a technical talk to an advanced audience.

FPGAs have been used for decades to provide higher performance
for certain workloads. That was one of the reasons both Intel and
AMD each purchased one of the big FPGA guys.

Although now, there are a number of custom ASIC houses that will
be happy to add custom logic to standard processor packages
(such as Marvell).

Re: Jason Cong's future of high performance computing

<HcvBL.592257$9sn9.321594@fx17.iad>

https://www.novabbs.com/devel/article-flat.php?id=30592&group=comp.arch#30592
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx17.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: sco...@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Jason Cong's future of high performance computing
Newsgroups: comp.arch
References: <bdd94600-f8a0-4367-8c12-ba1f9d38821en@googlegroups.com> <eea89e9b-5bcf-484a-aa13-4041b3331492n@googlegroups.com>
Lines: 23
Message-ID: <HcvBL.592257$9sn9.321594@fx17.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sun, 29 Jan 2023 14:16:07 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sun, 29 Jan 2023 14:16:07 GMT
X-Received-Bytes: 1726
 by: Scott Lurndal - Sun, 29 Jan 2023 14:16 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Saturday, January 28, 2023 at 6:11:30 PM UTC-6, JimBrakefield wrote:
>> https://www.youtube.com/watch?v=-XuMWvGUocI&t=123s
>> Starts at the 2 minute mark. He argues that computer performance
>> has plateaued and that FPGAs offer a route to higher performance.
><

><
>I should note: you cannot debug HW by single stepping !! as there is no
>definition of what single stepping means at the gate level. No, HW designers
>use simulators where they can stop at ½ clock intervals and then examine
>millions of signals--some of them X (unknown value) and Z (high impedance).

Actually, that's how we debugged the Burroughs mainframes, by stepping a
single cycle at a time (using an external "maintenance processor" to
drive the processor logic using scan chains). This was late 70's.

Modern systems include JTAG facilities for similar debugability. Yes
a lot happens each cycle, but a good scan chain will include enough
flops to make debug rather straightforward.

Re: Jason Cong's future of high performance computing

<bcf74775-d7c8-4703-bb99-a1450b0c2d69n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30593&group=comp.arch#30593
X-Received: by 2002:a05:620a:131b:b0:71c:3b4e:9a47 with SMTP id o27-20020a05620a131b00b0071c3b4e9a47mr174595qkj.416.1675011289355;
Sun, 29 Jan 2023 08:54:49 -0800 (PST)
X-Received: by 2002:a05:6870:96ab:b0:163:aa5f:4530 with SMTP id
o43-20020a05687096ab00b00163aa5f4530mr159473oaq.167.1675011289184; Sun, 29
Jan 2023 08:54:49 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 29 Jan 2023 08:54:48 -0800 (PST)
In-Reply-To: <HcvBL.592257$9sn9.321594@fx17.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:45c0:b342:13b0:20ac;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:45c0:b342:13b0:20ac
References: <bdd94600-f8a0-4367-8c12-ba1f9d38821en@googlegroups.com>
<eea89e9b-5bcf-484a-aa13-4041b3331492n@googlegroups.com> <HcvBL.592257$9sn9.321594@fx17.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <bcf74775-d7c8-4703-bb99-a1450b0c2d69n@googlegroups.com>
Subject: Re: Jason Cong's future of high performance computing
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 29 Jan 2023 16:54:49 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 2661
 by: MitchAlsup - Sun, 29 Jan 2023 16:54 UTC

On Sunday, January 29, 2023 at 8:16:10 AM UTC-6, Scott Lurndal wrote:
> MitchAlsup <Mitch...@aol.com> writes:
> >On Saturday, January 28, 2023 at 6:11:30 PM UTC-6, JimBrakefield wrote:
> >> https://www.youtube.com/watch?v=-XuMWvGUocI&t=123s
> >> Starts at the 2 minute mark. He argues that computer performance
> >> has plateaued and that FPGAs offer a route to higher performance.
> ><
>
> ><
> >I should note: you cannot debug HW by single stepping !! as there is no
> >definition of what single stepping means at the gate level. No, HW designers
> >use simulators where they can stop at ½ clock intervals and then examine
> >millions of signals--some of them X (unknown value) and Z (high impedance).
> Actually, that's how we debugged the Burroughs mainframes, by stepping a
> single cycle at a time (using an external "maintenance processor" to
> drive the processor logic using scan chains). This was late 70's.
>
> Modern systems include JTAG facilities for similar debugability. Yes
> a lot happens each cycle, but a good scan chain will include enough
> flops to make debug rather straightforward.
<
The point was that single stepping a SW program sees a single state
change per instruction, whereas single clocking a CPU sees thousands
if not tens of thousands of state changes per smallest stepping.

Re: Jason Cong's future of high performance computing

<3a9395bb-c010-48f5-b7ec-259a43c8ef79n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30594&group=comp.arch#30594
X-Received: by 2002:ac8:4b5a:0:b0:3b6:32fa:23e3 with SMTP id e26-20020ac84b5a000000b003b632fa23e3mr1433654qts.132.1675012062987;
Sun, 29 Jan 2023 09:07:42 -0800 (PST)
X-Received: by 2002:aca:1202:0:b0:354:9da8:98a9 with SMTP id
2-20020aca1202000000b003549da898a9mr1966337ois.9.1675012062679; Sun, 29 Jan
2023 09:07:42 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 29 Jan 2023 09:07:42 -0800 (PST)
In-Reply-To: <c37102e3-e86a-45d9-a994-ef824720a907n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=162.157.97.93; posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 162.157.97.93
References: <bdd94600-f8a0-4367-8c12-ba1f9d38821en@googlegroups.com>
<a74b1c3c-bf8f-485d-9c43-a576555e25d5n@googlegroups.com> <c37102e3-e86a-45d9-a994-ef824720a907n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3a9395bb-c010-48f5-b7ec-259a43c8ef79n@googlegroups.com>
Subject: Re: Jason Cong's future of high performance computing
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Sun, 29 Jan 2023 17:07:42 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 2102
 by: Quadibloc - Sun, 29 Jan 2023 17:07 UTC

On Saturday, January 28, 2023 at 6:30:45 PM UTC-7, MitchAlsup wrote:

> I see it a bit different:: FPGA applications should target things CPUs do
> rather poorly.
> <
> As BGB indicates, he has had a hard time getting his FPU small enough
> for his FPGA and it still runs slowly (compared to Intel or AMD CPUs).
> So, in order to get speedup, you would need an FPGA that supports 10
> FPUs (more likely 100 FPUs) and enough pins to feed it the BW it requires.
> Some of these FPGAs are more expensive than a CPU running at decent
> but not extraordinary frequency.

I feel that FPGAs won't really take off unless they are good at applications
that are common - and those are ones well adapted to CPUs.

So I was advocating FPGAs with real FPUs as a component, not synthesizing
an FPU on an FPGA, which is much slower.

John Savard

Re: Jason Cong's future of high performance computing

<b38f6677-5e13-4358-8a8f-9f0e70d84aedn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30595&group=comp.arch#30595
X-Received: by 2002:a0c:ef8e:0:b0:53a:8c5d:fe22 with SMTP id w14-20020a0cef8e000000b0053a8c5dfe22mr222010qvr.103.1675013503974;
Sun, 29 Jan 2023 09:31:43 -0800 (PST)
X-Received: by 2002:a05:6808:8c8:b0:35c:27c2:68a4 with SMTP id
k8-20020a05680808c800b0035c27c268a4mr3356923oij.42.1675013503761; Sun, 29 Jan
2023 09:31:43 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 29 Jan 2023 09:31:43 -0800 (PST)
In-Reply-To: <3a9395bb-c010-48f5-b7ec-259a43c8ef79n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:45c0:b342:13b0:20ac;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:45c0:b342:13b0:20ac
References: <bdd94600-f8a0-4367-8c12-ba1f9d38821en@googlegroups.com>
<a74b1c3c-bf8f-485d-9c43-a576555e25d5n@googlegroups.com> <c37102e3-e86a-45d9-a994-ef824720a907n@googlegroups.com>
<3a9395bb-c010-48f5-b7ec-259a43c8ef79n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b38f6677-5e13-4358-8a8f-9f0e70d84aedn@googlegroups.com>
Subject: Re: Jason Cong's future of high performance computing
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 29 Jan 2023 17:31:43 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 2392
 by: MitchAlsup - Sun, 29 Jan 2023 17:31 UTC

On Sunday, January 29, 2023 at 11:07:44 AM UTC-6, Quadibloc wrote:
> On Saturday, January 28, 2023 at 6:30:45 PM UTC-7, MitchAlsup wrote:
>
> > I see it a bit different:: FPGA applications should target things CPUs do
> > rather poorly.
> > <
> > As BGB indicates, he has had a hard time getting his FPU small enough
> > for his FPGA and it still runs slowly (compared to Intel or AMD CPUs).
> > So, in order to get speedup, you would need an FPGA that supports 10
> > FPUs (more likely 100 FPUs) and enough pins to feed it the BW it requires.
> > Some of these FPGAs are more expensive than a CPU running at decent
> > but not extraordinary frequency.
> I feel that FPGAs won't really take off unless they are good at applications
> that are common - and those are ones well adapted to CPUs.
>
> So I was advocating FPGAs with real FPUs as a component, not synthesizing
> an FPU on an FPGA, which is much slower.
<
The IP for those might cost even more than the FPGA it goes into.
>
> John Savard

Re: Jason Cong's future of high performance computing

<tr6ai8$2rst9$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=30596&group=comp.arch#30596
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Jason Cong's future of high performance computing
Date: Sun, 29 Jan 2023 11:31:48 -0600
Organization: A noiseless patient Spider
Lines: 224
Message-ID: <tr6ai8$2rst9$1@dont-email.me>
References: <bdd94600-f8a0-4367-8c12-ba1f9d38821en@googlegroups.com>
<a74b1c3c-bf8f-485d-9c43-a576555e25d5n@googlegroups.com>
<c37102e3-e86a-45d9-a994-ef824720a907n@googlegroups.com>
<4b8a569d-8a3f-43c8-bc32-d1514f2eccc8n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 29 Jan 2023 17:31:52 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="6a029a0cfa4c45f1877508f8fff0ff8b";
logging-data="3011497"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+KWBTlkQTgf4F+gnH1KstA"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.6.1
Cancel-Lock: sha1:pYJhSCPK0SZh9m8olGcpMB7ZxHg=
In-Reply-To: <4b8a569d-8a3f-43c8-bc32-d1514f2eccc8n@googlegroups.com>
Content-Language: en-US
 by: BGB - Sun, 29 Jan 2023 17:31 UTC

On 1/29/2023 6:48 AM, Michael S wrote:
> On Sunday, January 29, 2023 at 3:30:45 AM UTC+2, MitchAlsup wrote:
>> On Saturday, January 28, 2023 at 7:22:47 PM UTC-6, Quadibloc wrote:
>>> On Saturday, January 28, 2023 at 5:11:30 PM UTC-7, JimBrakefield wrote:
>>>> Starts at the 2 minute mark. He argues that computer performance
>>>> has plateaued and that FPGAs offer a route to higher performance.
>>>> At about 13 minutes he states his goal to turn software programmers
>>>> into FPGA application engineers.
>>> At present, FPGAs can make _some_ computer programs more
>>> efficient. If a computer program involves things like bit manipulation,
>>> that an FPGA can do well, but CPUs do poorly, it can be a good fit.
>>>
>>> It would be possible to design FPGAs that are better suited to
>>> problems that CPUs already do well, so that they could be done
>>> even better on the FPGA. For example, if a problem involves a lot
>>> of 64-bit floating-point arithmetic, put a lot of double-precision
>>> FP ALUs in the FPGA. For some reason, such parts are not available
>>> at the moment.
>> <
>> I see it a bit different:: FPGA applications should target things CPUs do
>> rather poorly.
>> <
>> As BGB indicates, he has had a hard time getting his FPU small enough
>> for his FPGA and it still runs slowly (compared to Intel or AMD CPUs).
>> So, in order to get speedup, you would need an FPGA that supports 10
>> FPUs (more likely 100 FPUs) and enough pins to feed it the BW it requires.
>
>
> BGB plays with Artix-7. Probably XC7A25T that has 23,360 logic cells
> and 80 DSP slices.

For the BJX2 core, mostly on an XC7A100T, I can fit a Double Precision
FPU (1x Binary64) and also a 4x Binary32 SIMD unit (albeit the latter
using a hard-wired truncate-only rounding mode).

Originally, had done Binary32 SIMD by pipelining it through the main
FPU, but with a dedicated low-precision SIMD unit, can get a fair bit of
a speedup: 20 MFLOPs -> 200 MFLOPs, at 50MHz.
And was able to stretch it enough to "more or less faithfully" handle
full Binary32 precision.

This is along with the MMU and 3-wide execute unit.

Don't really have enough LUTs left over for a second core or any real
sort of GPU.

I could potentially fit dual cores and boost clock speeds to 75 MHz on
an XC7A200T, but this is expensive (and boards with these have been
mostly out-of-stock for a while).

An XC7K325T or similar would possibly allow quad-core and/or a dedicated
GPU, as well as 100 or 150MHz. But, like, I don't really have the money
for something like this... (And this is basically the largest and
fastest FPGA supported by Vivado WebPack).

I can also fit the BJX2 core onto an XC7S50, but need to scale it back
slightly to make it fit.

I have used an XC7S25 as well, but generally I can only seem to fit
simpler RISC-style cores on it. Typically no FPU or MMU.

On the XC7S25 or XC7A35T, one could make a strong case for RISC-V
though, as what one can fit on these FPGAs is pretty much in-line for a
simple scalar RISC-V core or similar (and a RISC-like subset of BJX2 has
no real practical advantage over RV64I or similar).

A stronger case could probably be made for RV32I or maybe RV32IM or
similar on this class of FPGA.

And, if I were doing a GPU on an FPGA, something akin to my current
BJX2-XG2 mode could make sense as a base (possibly with wider SIMD and
also SIMD'ing the memory loads and stores, *).

*: Say, loads where the index register is a vector encoding multiple
indices, each of which is loaded into a subset of the loaded vector. The
ISA would look basically the same as it is now, except nearly everything
would be "doubled".

So, say:
ADD R16, R23, R39
Would actually add 128-bit vectors containing a pair of 64-bit values
(and the current 128-bit SIMD ops would effectively expand to being
8-wide 256-bit operations). Most Loads/Stores would also be doubled
(idea being to schedule loop iterations into each element of the vector;
with only a subset of ops being "directly aware" of the registers being
SIMD vectors; possibly with a mode flag to enable/disable side-effects
from the high-half of the vector).

But, as noted, I would likely need a Kintex or similar to have the LUT
budget for something like this...

> Besides, if he can't fit something into a device, it
> does not mean that an experienced FPGA guy will have the same difficulties.
> But let's leave it aside and concentrate on devices.
> So, 23,360 logic cells and 80 DSP slices.
> For comparison, the biggest device in the same decade old Xilinx 7
> series is Virtex XC7VH870T with 876,160 logic cells and 2520 DSP slices.
>

A Virtex-7 is also several orders of magnitude more expensive...

If a chip costs more than a typical person will have during their
lifetime, it almost may as well not exist as far as they are concerned.

....

I guess a person can "rent" access to Virtex devices via remote cloud
servers. Still not very practical though.

> Newer high-end FPGA devices are bigger by another order of magnitude,
> although not in the DSP-slice area: the Virtex UltraScale+ XCVU19P has 8,938,000
> logic cells and 3,840 DSP slices. But that device is also not quite new.
>
> In more recent years Xilinx lost interest in traditional FPGAs and is trying
> to push the "Adaptive Compute Acceleration Platform" (ACAP). Some of those have
> a rather insane number of multipliers, so many that they stopped counting DSP
> slices and are now counting DSP Engines. I am too lazy to dig deeper and find
> out what that really means, but one thing is sure - there are a lot of compute
> resources in these devices. Maybe not as many as in leading-edge GPUs,
> but it's the same order of magnitude.
>
> We are not talking about 10 or 100 or 1000 FPUs on the high-end ACAPs.
> More like 10,000.
>

Hmm...

These look, if anything, more relevant to "AI" and/or "bitcoin
mining" than to traditional FPGA use cases.

They look less relevant to me personally, as "do whole lots of FPU math"
isn't really the typical bottleneck in my projects.

Granted, if one does want "thing that does lots of FPU math", this could
make sense.

>> Some of these FPGAs are more expensive than a CPU running at decent
>> but not extraordinary frequency.
>
> That's another understatement.

General case performance even on par with a RasPi is difficult...

Ironically, it is a little easier to compete with a RasPi for software
OpenGL, as the RasPi just sorta sucks at this (it effectively "face
plants" so hard as to offset its clock-speed advantage).

In a "performance per clock" sense at this task, my BJX2 core somewhat
beats out my Ryzen 7 for this as well. For the NN tests, it almost gets
almost a little absurd...

For more "general purpose" code, the BJX2 core kinda gets its crap
handed back to it though.

Though, it is hard to get an Artix-7 anywhere near the speeds of my
desktop PC outside of contrived scenarios (ones where the cost of emulating
Binary16 FP-SIMD and similar on a PC via bit-twiddling is higher than
the clock-speed delta, ~ 74x).

This sort of thing sometimes poses issues for my emulator, as some
instructions are harder to emulate efficiently.

For example, some of the compressed texture instructions and similar
only "keep up" as they secretly cache recently decoded blocks. A direct
implementation of the approach used in the Verilog implementation would
be too slow to emulate in real time.

Things like emulating cache latency are a double-edged sword: it isn't
super cheap to evaluate cache hits and misses, but the cache misses
reduce how much work the emulator needs to do to keep up.

But, yeah, otherwise it would appear that a 150 MHz BJX2 core would be
fast enough to run Quake 3 Arena and similar with software rasterized
OpenGL at "fairly playable" framerates...

But, this will likely need to remain "in theory", as I don't have $k to
drop on a Kintex board or similar to find out...

But, say, if some company or whatever wanted to throw $k my way (both
for the FPGA board, and for "cost of living" reasons), could probably
make it happen (otherwise, I am divided in my efforts, also
needing to spend a chunk of time out in a machine shop).

Probably not going to happen though.

....

Though, compared with an "actual PC", it isn't very practical.

And, even a RasPi can run Quake 3 pretty easily (and a lot cheaper) if
one can make use of its integrated GPU (main annoyance being that it
uses GLES 2.0 rather than OpenGL 1.x).

For Quake 1, to get it as good as it is, had to resort to some trickery
like rewriting "BoxOnPlaneSide" and similar using ASM, ...

To get much more speed, would likely need a differently organized 3D engine.


Re: Jason Cong's future of high performance computing

<b36aea7d-a875-4085-b081-32b4a11733ebn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30597&group=comp.arch#30597
X-Received: by 2002:ae9:f107:0:b0:717:a5d4:de3f with SMTP id k7-20020ae9f107000000b00717a5d4de3fmr600235qkg.157.1675013914773;
Sun, 29 Jan 2023 09:38:34 -0800 (PST)
X-Received: by 2002:a05:6870:8202:b0:163:994c:9495 with SMTP id
n2-20020a056870820200b00163994c9495mr449269oae.79.1675013914406; Sun, 29 Jan
2023 09:38:34 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 29 Jan 2023 09:38:34 -0800 (PST)
In-Reply-To: <3a9395bb-c010-48f5-b7ec-259a43c8ef79n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=136.50.14.162; posting-account=AoizIQoAAADa7kQDpB0DAj2jwddxXUgl
NNTP-Posting-Host: 136.50.14.162
References: <bdd94600-f8a0-4367-8c12-ba1f9d38821en@googlegroups.com>
<a74b1c3c-bf8f-485d-9c43-a576555e25d5n@googlegroups.com> <c37102e3-e86a-45d9-a994-ef824720a907n@googlegroups.com>
<3a9395bb-c010-48f5-b7ec-259a43c8ef79n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b36aea7d-a875-4085-b081-32b4a11733ebn@googlegroups.com>
Subject: Re: Jason Cong's future of high performance computing
From: jim.brak...@ieee.org (JimBrakefield)
Injection-Date: Sun, 29 Jan 2023 17:38:34 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 2670
 by: JimBrakefield - Sun, 29 Jan 2023 17:38 UTC

On Sunday, January 29, 2023 at 11:07:44 AM UTC-6, Quadibloc wrote:
> On Saturday, January 28, 2023 at 6:30:45 PM UTC-7, MitchAlsup wrote:
>
> > I see it a bit different:: FPGA applications should target things CPUs do
> > rather poorly.
> > <
> > As BGB indicates, he has had a hard time getting his FPU small enough
> > for his FPGA and it still runs slowly (compared to Intel or AMD CPUs).
> > So, in order to get speedup, you would need an FPGA that supports 10
> > FPUs (more likely 100 FPUs) and enough pins to feed it the BW it requires.
> > Some of these FPGAs are more expensive than a CPU running at decent
> > but not extraordinary frequency.
> I feel that FPGAs won't really take off unless they are good at applications
> that are common - and those are ones well adapted to CPUs.
>
> So I was advocating FPGAs with real FPUs as a component, not synthesizing
> an FPU on an FPGA, which is much slower.
>
> John Savard

The Intel-Altera X series devices offer single-precision add/multiply, with no denorm support.
The AMD-Xilinx Versal series offers single-precision add/multiply in its SIMD/RISC cores.

Most of these chips have five-digit price tags, except the three-digit Arria X and Cyclone X GX families.

Some of these series and families offer 10K+ DSP units, 1M+ LUTs, 50+ MB of block RAM,
and HBM.

Re: Jason Cong's future of high performance computing

<a060e955-c5e7-4972-8036-29931b0b9b70n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30598&group=comp.arch#30598
X-Received: by 2002:ac8:4049:0:b0:3b8:6c12:f6d7 with SMTP id j9-20020ac84049000000b003b86c12f6d7mr5369qtl.465.1675014687979;
Sun, 29 Jan 2023 09:51:27 -0800 (PST)
X-Received: by 2002:a05:6870:c14e:b0:163:a303:fe36 with SMTP id
g14-20020a056870c14e00b00163a303fe36mr381234oad.106.1675014687744; Sun, 29
Jan 2023 09:51:27 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 29 Jan 2023 09:51:27 -0800 (PST)
In-Reply-To: <tr6ai8$2rst9$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:45c0:b342:13b0:20ac;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:45c0:b342:13b0:20ac
References: <bdd94600-f8a0-4367-8c12-ba1f9d38821en@googlegroups.com>
<a74b1c3c-bf8f-485d-9c43-a576555e25d5n@googlegroups.com> <c37102e3-e86a-45d9-a994-ef824720a907n@googlegroups.com>
<4b8a569d-8a3f-43c8-bc32-d1514f2eccc8n@googlegroups.com> <tr6ai8$2rst9$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a060e955-c5e7-4972-8036-29931b0b9b70n@googlegroups.com>
Subject: Re: Jason Cong's future of high performance computing
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 29 Jan 2023 17:51:27 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 1996
 by: MitchAlsup - Sun, 29 Jan 2023 17:51 UTC

On Sunday, January 29, 2023 at 11:31:55 AM UTC-6, BGB wrote:
> On 1/29/2023 6:48 AM, Michael S wrote:

> > BGB plays with Artix-7. Probably XC7A25T that has 23,360 logic cells
> > and 80 DSP slices.
> For the BJX2 core, mostly on an XC7A100T, I can fit a Double Precision
> FPU (1x Binary64) and also a 4x Binary32 SIMD unit (albeit the later
> using a hard-wired truncate-only rounding mode).
>
But there is some reason you don't correctly compute FMULD--like
using 3/4 of a multiplier tree or something. Was this due to a lack of
gates (LUTs), a lack of DSPs, or a perceived lack of need to get
the right answer ?

Re: Jason Cong's future of high performance computing

<46a774f8-f645-4bee-ab04-923395befd23n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=30599&group=comp.arch#30599
X-Received: by 2002:ac8:5f84:0:b0:3b8:1e02:f290 with SMTP id j4-20020ac85f84000000b003b81e02f290mr691111qta.337.1675014873754;
Sun, 29 Jan 2023 09:54:33 -0800 (PST)
X-Received: by 2002:aca:1819:0:b0:36f:2426:36ae with SMTP id
h25-20020aca1819000000b0036f242636aemr921625oih.113.1675014873322; Sun, 29
Jan 2023 09:54:33 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 29 Jan 2023 09:54:33 -0800 (PST)
In-Reply-To: <0b70add4-6ae6-4ead-8b33-7e7ac7b47a1dn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=136.50.14.162; posting-account=AoizIQoAAADa7kQDpB0DAj2jwddxXUgl
NNTP-Posting-Host: 136.50.14.162
References: <bdd94600-f8a0-4367-8c12-ba1f9d38821en@googlegroups.com>
<tr5f78$2n5sq$1@dont-email.me> <0b70add4-6ae6-4ead-8b33-7e7ac7b47a1dn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <46a774f8-f645-4bee-ab04-923395befd23n@googlegroups.com>
Subject: Re: Jason Cong's future of high performance computing
From: jim.brak...@ieee.org (JimBrakefield)
Injection-Date: Sun, 29 Jan 2023 17:54:33 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 4096
 by: JimBrakefield - Sun, 29 Jan 2023 17:54 UTC

On Sunday, January 29, 2023 at 6:17:27 AM UTC-6, Michael S wrote:
> On Sunday, January 29, 2023 at 11:45:16 AM UTC+2, Terje Mathisen wrote:
> > JimBrakefield wrote:
> > > https://www.youtube.com/watch?v=-XuMWvGUocI&t=123s
> > > Starts at the 2 minute mark. He argues that computer performance
> > > has plateaued and that FPGAs offer a route to higher performance.
> > > At about 13 minutes he states his goal to turn software programmers
> > > into FPGA application engineers. The rest of the talk is about how this
> > > can be achieved. It's a technical talk to an advanced audience.
> > >
> > > Given his previous accomplishments and those of the VAST group at UCLA,
> > > https://vast.cs.ucla.edu/, they can probably pull it off.
> > > That is, in five or ten years this is what "big iron" computing will look like?
> > NO, and once again, NO. FPGA starts out with at least an order of
> > magnitude speed disadvantage, so you need problems where the algorithms
> > are really unsuited for a SW implementation.
> >
> > The only way for FPGA to become relevant in the mass market is because
> > it turns up as a standard feature in every Intel/Apple/ARM cpu,
> > otherwise it will forever be relegated to the narrow valley between what
> > you can do by just throwing a few more OoO cores at the problem, and
> > when you turn to full VLSI.
> >
> > I.e. FPGA is for prototyping and low count custom systems. However, even
> > though cell phone towers have used FPGA to allow relatively easy
> > (remote) upgrades to handle new radio protocols as they become
> > finalized, even those systems (in the 100K+ range of installations) will
> > put everything baseline into VLSI.
> >
> > I also don't think you can make FPGA context switching even remotely
> > fast, further limiting possible usage scenarios.
> >
> > Terje
> >
> >
> > --
> > - <Terje.Mathisen at tmsw.no>
> > "almost all programming can be viewed as an exercise in caching"
> FPGAs *are* mass market for more than 2 decades.
> But they are mass market not in role of compute accelerator and
> I agree with you that they will never become a mass market in that role.
> In the previous century mass market FPGAs were called simply FPGA.
> In this century they are called "low end" or similar derogatory names.
> Wall Street does not care about them just like Wall Street does not care
> about micro-controllers, but like micro-controllers they are a cornerstone
> of the industry. Well, I am exaggerating a little, somewhat less
> than micro-controllers, but a cornerstone nevertheless.

There are some issues that microprocessors have:
The high energy cost of access to main memory;
The lack of a good way to utilize dozens of cores and threads within
most programming languages, i.e. parallelism remains unsolved and difficult.

Re: Jason Cong's future of high performance computing

<tr6c1s$2rst9$2@dont-email.me>
https://www.novabbs.com/devel/article-flat.php?id=30600&group=comp.arch#30600
 by: BGB - Sun, 29 Jan 2023 17:57 UTC

On 1/29/2023 11:07 AM, Quadibloc wrote:
> On Saturday, January 28, 2023 at 6:30:45 PM UTC-7, MitchAlsup wrote:
>
>> I see it a bit different:: FPGA applications should target things CPUs do
>> rather poorly.
>> <
>> As BGB indicates, he has had a hard time getting his FPU small enough
>> for his FPGA and it still runs slowly (compared to Intel or AMD CPUs).
>> So, in order to get speedup, you would need an FPGA that supports 10
>> FPUs (more likely 100 FPUs) and enough pins to feed it the BW it requires.
>> Some of these FPGAs are more expensive than a CPU running at decent
>> but not extraordinary frequency.
>
> I feel that FPGAs won't really take off unless they are good at applications
> that are common - and those are ones well adapted to CPUs.
>
> So I was advocating FPGAs with real FPUs as a component, not synthesizing
> an FPU on an FPGA, which is much slower.
>

Depends on what one expects from an FPU.

Binary32 units could make sense alongside (or as an extension of) the
existing DSPs. Binary64 units would likely be a little more of a stretch.

Another balancing act of this would be to have "just enough" FPUs.

But, say, if an FPGA could have:
4x Binary64 MAC
32x Binary32 MAC

In addition to, say, 80k+ LUTs, 8Mb of BRAM, ... This could be a "pretty
nice" FPGA (particularly if it had a 32-bit DRAM interface, etc, *).

*: The Artix-7 boards mostly use a 16-bit RAM interface, apart from a
few smaller boards using QSPI SRAMs and similar. Seemingly only
higher-end boards have a 32-bit RAM interface.

A rare few boards also use 8-bit DDR or SDRAM.

Could also be interesting if an FPGA board could utilize an M.2 SSD
interface or similar.

> John Savard

Re: Jason Cong's future of high performance computing

<2023Jan29.173724@mips.complang.tuwien.ac.at>
https://www.novabbs.com/devel/article-flat.php?id=30601&group=comp.arch#30601
 by: Anton Ertl - Sun, 29 Jan 2023 16:37 UTC

scott@slp53.sl.home (Scott Lurndal) writes:
>FPGAs have been used for decades to provide higher performance
>for certain workloads.

What workloads?

My impression is that if a workload is structured such that an FPGA
beats software on something like a CPU or a GPGPU, and if the
workload's performance is important enough, or there are enough
customers for the workload, people may prototype on FPGA, but they
then go for custom silicon for another speedup by an order of
magnitude and for a similar reduction in marginal cost. For FPGAs
this leaves only prototypes, and low-volume uses where performance is
not paramount. There is still enough volume there for significant
revenue for Xilinx and Altera. But the idea that you switch to FPGA
for performance is absurd.

For HPC, CPUs and GPGPUs look fine. HPC performs memory accesses,
where FPGAs provide no advantage, and FP operations (FLOPs) where the
custom logic of CPUs and GPGPUs beats FPGAs clearly. You may think
that FPGA provides an advantage in passing data from one FLOP to
another, but CPUs have optimized the case of passing the result of one
instruction to another quite well, with their bypass networks. So
even if you have a field programmable FPU array (FPFA), I doubt that
you will see an advantage over CPUs. And I expect that it's similar
for GPGPUs.

To compare the performance of lower-grade hardware (but still
full-custom rather than FPGA) to software on a high-performance CPU, I
ran the rv8-bench <https://github.com/michaeljclark/rv8-bench/> C
programs (Compiling the C++ program failed) on a VisionFive 1 (1GHz
U74 cores) and compared the results to those on the RV8 simulator
running on a Core i7-5557U (3.4GHz Broadwell) taken from
<https://michaeljclark.github.io/bench>:

            aarch64  rv64g  rv64g  rv64g    AMD64
            qemu     qemu   rv8    U74      Broadwell
aes         1.31     2.16   1.49   3.30     0.32
dhrystone   0.98     0.57   0.20   1.109    0.10
miniz       2.66     2.21   1.53   8.766    0.77
norx        0.60     1.17   0.99   1.974    0.22
primes      2.09     1.26   0.65   18.686   0.60
qsort       7.38     4.76   1.21   5.218    0.64
sha512      0.64     1.24   0.81   2.048    0.24

The U74 column contains user time (total CPU time is higher by 0-20%),
not sure what the other results are. The interesting columns here are
the rv8 column and the U74 column, but also the rv64g qemu column; they
show that both the qemu and rv8 software emulations of RV64G on a
2015-vintage high-end laptop CPU from Intel beat the custom silicon
implementation on the VisionFive 1. For an FPGA implementation you
cannot expect a
1GHz clock rate, from what I hear 200MHz would be a good number. So
with the same microarchitecture you get a result that's even slower
than the VisionFive 1 by a factor ~5.
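A quick sanity check of the clock-rate arithmetic (my own sketch; the
benchmark figures are from the table above, and the 200MHz FPGA clock is
the estimate quoted in the text):

```python
# Same U74 microarchitecture, clocked at an FPGA-realistic 200 MHz
# instead of the VisionFive 1's 1 GHz hard silicon:
u74_clock_mhz = 1000
fpga_clock_mhz = 200
print(u74_clock_mhz / fpga_clock_mhz)   # -> 5.0, the "factor ~5" above

# U74 native vs. rv8 emulation on Broadwell (seconds, lower is better;
# values copied from the table above):
rv8_broadwell = {"aes": 1.49, "primes": 0.65, "qsort": 1.21}
u74_native = {"aes": 3.30, "primes": 18.686, "qsort": 5.218}
for name in rv8_broadwell:
    ratio = u74_native[name] / rv8_broadwell[name]
    print(name, round(ratio, 1))        # aes 2.2, primes 28.7, qsort 4.3
```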

One might naively think that architecture implementation is the kind
of workload where FPGAs beat software on high-end CPUs, but that's
obviously not the case.

>That was one of the reasons both Intel and
>AMD have each purchased one of the big FPGA guys.

If so, IMO they did it for the wrong reason. I was certainly
wondering about the huge amount of money that AMD spent on Xilinx.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Jason Cong's future of high performance computing

<tr6ejm$2sif8$1@dont-email.me>
https://www.novabbs.com/devel/article-flat.php?id=30602&group=comp.arch#30602
 by: BGB - Sun, 29 Jan 2023 18:40 UTC

On 1/29/2023 11:51 AM, MitchAlsup wrote:
> On Sunday, January 29, 2023 at 11:31:55 AM UTC-6, BGB wrote:
>> On 1/29/2023 6:48 AM, Michael S wrote:
>
>>> BGB plays with Artix-7. Probably XC7A25T that has 23,360 logic cells
>>> and 80 DSP slices.
>> For the BJX2 core, mostly on an XC7A100T, I can fit a Double Precision
>> FPU (1x Binary64) and also a 4x Binary32 SIMD unit (albeit the later
>> using a hard-wired truncate-only rounding mode).
>>
> But there is some reason you don't correctly compute FMULD--like
> using 3/4 of a multiplier tree or something. Was this due to lack of
> gates (LUTs), lack of DSPs, or a perceived unnecessity of getting
> the right answer ?

Combination of factors.

I could get a full FMUL result, but it would come at the expense of
spending more LUTs, more DSPs, and having a higher latency.

Though, by themselves, the DSPs aren't as much of an issue, as the FPGA
I am using has "more than enough" DSPs, but I am basically at the limit
of timing latency (sneeze on the thing too hard and it fails timing).

The fraction of a bit of rounding error is "not worth it" if it means
making FMUL slower (and a fairly obvious increase in terms of
resource cost).

And, say, I don't really want a 7- or 8-cycle FMUL...

Likewise for the hard-wired truncate on the SIMD ops:
For most of my use-cases, it straight up "doesn't matter".
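The truncate-vs-round tradeoff can be illustrated numerically (a sketch
of the general principle, not of the actual BJX2 datapath): chopping a
product's significand toward zero costs at most one ULP, versus at most
half an ULP for round-to-nearest.

```python
import math
import struct

def round_nearest32(x):
    # IEEE binary32 round-to-nearest, via a float32 round trip
    return struct.unpack('f', struct.pack('f', x))[0]

def truncate32(x):
    # Chop the significand to 24 bits, toward zero (no rounding)
    m, e = math.frexp(x)                 # |m| in [0.5, 1)
    return math.ldexp(math.trunc(m * 2**24) / 2**24, e)

a, b = 1.1, 2.3
exact = a * b                            # double-precision product
ulp = 2.0 ** (math.floor(math.log2(exact)) - 23)  # binary32 ULP here
print(abs(round_nearest32(exact) - exact) <= ulp / 2)  # True
print(abs(truncate32(exact) - exact) < ulp)            # True
```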

I did have a reason to go from a truncated 24-bit floating point format
to full width Binary32, but mostly this is because:
Quake has physics glitches if things are calculated using a truncated
format;
This allowed making the default Binary32 SIMD faster;
My BtMini2 engine was "obviously broken" (*1) if one takes the camera
64km from the origin with 24-bit floating point (but was OK doing this
with Binary32);
....

But, rounding error doesn't really make much of a difference.

*1: At 64k from the origin, the map geometry turns into a jittering
dog-chewed mess.

But, this is not a huge surprise when the effective ULP was ~ 1 meter.
There is a lot less jitter when the ULP is closer to 0.8cm.

But, whether or not rounding was performed (or correct), the effective
ULP would still be 0.8 cm.
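The ULP figures can be reproduced with a little arithmetic (my sketch;
the 17 bits of effective precision assumed for the truncated format is
hypothetical, chosen to match the ~1 meter figure, while 24 bits is
standard binary32 precision):

```python
import math

def ulp_at(value, precision_bits):
    # For v in [2^e, 2^(e+1)), one ULP is 2^(e - precision + 1).
    e = math.floor(math.log2(value))
    return 2.0 ** (e - precision_bits + 1)

coord = 65536.0                  # "64k from the origin", in metres
print(ulp_at(coord, 17))         # truncated format: 1.0 m per ULP
print(ulp_at(coord, 24) * 100)   # binary32: ~0.78 cm per ULP
```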

Similarly for SIMD having MUL and ADD but no MAC:
While in theory, I could do a MAC, I can't do it in 3 clock cycles;
The moment I need more than 3 cycles, doing 3C/1T is broken.

....

Re: Jason Cong's future of high performance computing

<3df082c6-d28c-4463-b2c4-21d39e9750abn@googlegroups.com>
https://www.novabbs.com/devel/article-flat.php?id=30603&group=comp.arch#30603
 by: MitchAlsup - Sun, 29 Jan 2023 18:48 UTC

On Sunday, January 29, 2023 at 12:15:48 PM UTC-6, Anton Ertl wrote:
> sc...@slp53.sl.home (Scott Lurndal) writes:
> >FPGAs have been used for decades to provide higher performance
> >for certain workloads.
> What workloads?
>
> My impression is that if a workload is structured such that an FPGA
> beats software on something like a CPU or a GPGPU, and if the
> workload's performance is important enough, or there are enough
> customers for the workload, people may prototype on FPGA, but they
> then go for custom silicon for another speedup by an order of
> magnitude and for a similar reduction in marginal cost. For FPGAs
> this leaves only prototypes, and low-volume uses where performance is
> not paramount. There is still enough volume there for significant
> revenue for Xilinx and Altera. But the idea that you switch to FPGA
> for performance is absurd.
>
> For HPC, CPUs and GPGPUs look fine. HPC performs memory accesses,
> where FPGAs provide no advantage, and FP operations (FLOPs) where the
> custom logic of CPUs and GPGPUs beats FPGAs clearly. You may think
> that FPGA provides an advantage in passing data from one FLOP to
> another, but CPUs have optimized the case of passing the result of one
> instruction to another quite well, with their bypass networks. So
<
I am going to push back here.
<
Take Interpolation* performed in a GPU. Interpolation takes the {x,y,z,w}^3
coordinates of a triangle, and identifies the coordinates of a series of
pixels this triangle maps to. This takes 29 FP calculations (last time I
looked) per pixel, yet, GPUs produce these 8-to-32 pixels per cycle. For
an effective throughput of 232 spFLOPs per cycle from 1 fixed function
unit; and a GPU contains one of these interpolators per Shader Core.
<
There is no way SW is competitive, here (instruction count)--nor are FPGAs
(frequency).
<
(*) Interpolation is a part of rasterization.
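The throughput claim above is just the product of the two quoted
numbers; a trivial check of the arithmetic:

```python
# 29 FP calculations per pixel, 8 pixels per cycle (the low end of the
# 8-to-32 range quoted above):
flops_per_pixel = 29
pixels_per_cycle = 8
print(flops_per_pixel * pixels_per_cycle)   # -> 232 spFLOPs per cycle
print(flops_per_pixel * 32)                 # -> 928 at the high end
```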
<
-----------------------------------------------------------------------------------------------------------------------
<
Secondarily, integer arithmetic is but 8-11% of CPU power dissipation.
FP is even lower, leaving a majority of energy consumption in a) the
clock tree, b) fetch-decode-issue, c) schedule-execute, d) retire; none
of which add to the bottom line of performance--it is just that they
provide the infrastructure* on which the instruction can be executed
with considerable width.
<
(*) Think of a pipeline like a conveyor belt or an assembly line.
<

Re: Jason Cong's future of high performance computing

<tr6g7e$2sif8$2@dont-email.me>
https://www.novabbs.com/devel/article-flat.php?id=30604&group=comp.arch#30604
 by: BGB - Sun, 29 Jan 2023 19:08 UTC

On 1/29/2023 6:17 AM, Michael S wrote:
> On Sunday, January 29, 2023 at 11:45:16 AM UTC+2, Terje Mathisen wrote:
>> JimBrakefield wrote:
>>> https://www.youtube.com/watch?v=-XuMWvGUocI&t=123s
>>> Starts at the 2 minute mark. He argues that computer performance
>>> has plateaued and that FPGAs offer a route to higher performance.
>>> At about 13 minutes he states his goal to turn software programmers
>>> into FPGA application engineers. The rest of the talk is about how this
>>> can be achieved. It's a technical talk to an advanced audience.
>>>
>>> Given his previous accomplishments and those of the VAST group at UCLA,
>>> https://vast.cs.ucla.edu/, they can probably pull it off.
>>> That is, in five or ten years this is what "big iron" computing will look like?
>> NO, and once again, NO. FPGA starts out with at least an order of
>> magnitude speed disadvantage, so you need problems where the algorithms
>> are really unsuited for a SW implementation.
>>
>> The only way for FPGA to become relevant in the mass market is because
>> it turns up as a standard feature in every Intel/Apple/ARM cpu,
>> otherwise it will forever be relegated to the narrow valley between what
>> you can do by just throwing a few more OoO cores at the problem, and
>> when you turn to full VLSI.
>>
>> I.e. FPGA is for prototyping and low count custom systems. However, even
>> though cell phone towers have used FPGA to allow relatively easy
>> (remote) upgrades to handle new radio protocols as they become
>> finalized, even those systems (in the 100K+ range of installations) will
>> put everything baseline into VLSI.
>>

VLSI/ASIC only really makes sense if one has a mountain of money to burn.

>> I also don't think you can make FPGA context switching even remotely
>> fast, further limiting possible usage scenarios.
>>
>> Terje
>>
>>
>> --
>> - <Terje.Mathisen at tmsw.no>
>> "almost all programming can be viewed as an exercise in caching"
>
> FPGAs *are* mass market for more than 2 decades.
> But they are mass market not in role of compute accelerator and
> I agree with you that they will never become a mass market in that role.
> In the previous century mass market FPGAs were called simply FPGA.
> In this century they are called "low end" or similar derogatory names.

Makes sense.

Say:
FPGA on a standalone board with an SDcard slot and VGA port and similar;
FPGA board meant to fit a "DIP40" form factor or similar;
FPGA on a PCIe or M.2 card with no external IO interfaces.

Represent somewhat different use cases...

There are FPGA boards for PCIe and M.2, but personally, I have not as
much use for them.

For most general computational tasks, a Ryzen or similar is going to run
circles around whatever one can put on an Artix.

> Wall Street does not care about them just like Wall Street does not care
> about micro-controllers, but like micro-controllers they are a cornerstone
> of the industry. Well, I am exaggerating a little, somewhat less
> than micro-controllers, but a cornerstone nevertheless.

Yeah.

A world without microcontrollers would likely more resemble the 60s or
70s than it does the modern world. Even desktop PCs as we know them
could not exist without microcontrollers.

There is also a non-zero overlap between FPGAs and microcontrollers.

....

Re: Jason Cong's future of high performance computing

<72ABL.148955$PXw7.10070@fx45.iad>
https://www.novabbs.com/devel/article-flat.php?id=30605&group=comp.arch#30605
 by: EricP - Sun, 29 Jan 2023 19:38 UTC

Scott Lurndal wrote:
> MitchAlsup <MitchAlsup@aol.com> writes:
>> On Saturday, January 28, 2023 at 6:11:30 PM UTC-6, JimBrakefield wrote:
>>> https://www.youtube.com/watch?v=-XuMWvGUocI&t=123s
>>> Starts at the 2 minute mark. He argues that computer performance
>>> has plateaued and that FPGAs offer a route to higher performance.
>> <
>
>> <
>> I should note: you cannot debug HW by single stepping !! as there is no
>> definition of what single stepping means at the gate level. No, HW
>> designers use simulators where they can stop at ½ clock intervals and
>> then examine millions of signals--some of them X (unknown value) and
>> Z (high impedance).
>
> Actually, that's how we debugged the Burroughs mainframes, by stepping a
> single cycle at a time (using an external "maintenance processor" to
> drive the processor logic using scan chains). This was late 70's.

Luxury! I used a dual trace oscilloscope and an In-Circuit Emulator.

Re: Jason Cong's future of high performance computing

<82ABL.148956$PXw7.24310@fx45.iad>
https://www.novabbs.com/devel/article-flat.php?id=30606&group=comp.arch#30606
 by: EricP - Sun, 29 Jan 2023 19:45 UTC

Scott Lurndal wrote:
> JimBrakefield <jim.brakefield@ieee.org> writes:
>> https://www.youtube.com/watch?v=-XuMWvGUocI&t=123s
>> Starts at the 2 minute mark. He argues that computer performance
>> has plateaued and that FPGAs offer a route to higher performance.
>> At about 13 minutes he states his goal to turn software programmers
>> into FPGA application engineers. The rest of the talk is about how this
>> can be achieved. It's a technical talk to an advanced audience.
>
> FPGAs have been used for decades to provide higher performance
> for certain workloads. That was one of the reasons both Intel and
> AMD have each purchased one of the big FPGA guys.
>
> Although now, there are a number of custom ASIC houses that will
> be happy to add custom logic to standard processor packages
> (such as Marvell).

People have also been investigating C/C++-to-FPGA synthesis
for quite some time (a quick search finds references back to 1996)
in order to make them more accessible to the general market.

The result is probably not as efficient as Verilog
but if it works users may not care.

Re: Jason Cong's future of high performance computing

<4532399a-3e46-421f-81d4-ee15be3a0420n@googlegroups.com>
https://www.novabbs.com/devel/article-flat.php?id=30607&group=comp.arch#30607
 by: Michael S - Sun, 29 Jan 2023 20:09 UTC

On Sunday, January 29, 2023 at 9:46:15 PM UTC+2, EricP wrote:
> Scott Lurndal wrote:
> > JimBrakefield <jim.bra...@ieee.org> writes:
> >> https://www.youtube.com/watch?v=-XuMWvGUocI&t=123s
> >> Starts at the 2 minute mark. He argues that computer performance
> >> has plateaued and that FPGAs offer a route to higher performance.
> >> At about 13 minutes he states his goal to turn software programmers
> >> into FPGA application engineers. The rest of the talk is about how this
> >> can be achieved. It's a technical talk to an advanced audience.
> >
> > FPGAs have been used for decades to provide higher performance
> > for certain workloads. That was one of the reasons both Intel and
> > AMD have each purchased one of the big FPGA guys.
> >
> > Although now, there are a number of custom ASIC houses that will
> > be happy to add custom logic to standard processor packages
> > (such as Marvell).
> People have also been investigating C/C++-to-FPGA synthesis
> for quite some time (a quick search finds references back to 1996)
> in order to make them more accessible to the general market.
>
> The result is probably not as efficient as Verilog
> but if it works users may not care.

Note that programming FPGAs in Verilog is an almost exclusively
USA trait. The rest of the world does it in VHDL.
