

devel / comp.arch / Re: Neural Network Accelerators

Subject  Author
* Neural Network Accelerators  robf...@gmail.com
`* Re: Neural Network Accelerators  Stephen Fuld
 +- Re: Neural Network Accelerators  JohnG
 `* Re: Neural Network Accelerators  Terje Mathisen
  `* Re: Neural Network Accelerators  JimBrakefield
   `* Re: Neural Network Accelerators  MitchAlsup
    +* Re: Neural Network Accelerators  EricP
    |+* Re: Neural Network Accelerators  EricP
    ||`* Re: Neural Network Accelerators  Ivan Godard
    || +* Re: Neural Network Accelerators  Scott Smader
    || |`* Re: Neural Network Accelerators  Ivan Godard
    || | +- Re: Neural Network Accelerators  Scott Smader
    || | `* Re: Neural Network Accelerators  MitchAlsup
    || |  `- Re: Neural Network Accelerators  Ivan Godard
    || `* Re: Neural Network Accelerators  EricP
    ||  `* Re: Neural Network Accelerators  Scott Smader
    ||   `* Re: Neural Network Accelerators  EricP
    ||    +* Re: Neural Network Accelerators  MitchAlsup
    ||    |+* Re: Neural Network Accelerators  Terje Mathisen
    ||    ||`* Re: Neural Network Accelerators  Thomas Koenig
    ||    || +- Re: Neural Network Accelerators  MitchAlsup
    ||    || +- Re: Neural Network Accelerators  Stefan Monnier
    ||    || `- Re: Neural Network Accelerators  Terje Mathisen
    ||    |+- Re: Neural Network Accelerators  Stephen Fuld
    ||    |`* Re: Neural Network Accelerators  EricP
    ||    | `- Re: Neural Network Accelerators  BGB
    ||    `* Re: Neural Network Accelerators  EricP
    ||     +* Re: Neural Network Accelerators  Scott Smader
    ||     |`* Re: Neural Network Accelerators  EricP
    ||     | +* Re: Neural Network Accelerators  Stephen Fuld
    ||     | |`* Re: Neural Network Accelerators  EricP
    ||     | | +- Re: Neural Network Accelerators  Scott Smader
    ||     | | +- Re: Neural Network Accelerators  Scott Smader
    ||     | | `* Re: Neural Network Accelerators  Stephen Fuld
    ||     | |  `* Re: Neural Network Accelerators  MitchAlsup
    ||     | |   +* Re: Neural Network Accelerators  EricP
    ||     | |   |`* Re: Neural Network Accelerators  EricP
    ||     | |   | +* Re: Neural Network Accelerators  MitchAlsup
    ||     | |   | |+* Re: Neural Network Accelerators  Terje Mathisen
    ||     | |   | ||`- Re: Neural Network Accelerators  MitchAlsup
    ||     | |   | |`* Re: Neural Network Accelerators  EricP
    ||     | |   | | `* Re: Neural Network Accelerators  MitchAlsup
    ||     | |   | |  `* Re: Neural Network Accelerators  robf...@gmail.com
    ||     | |   | |   +* Re: Neural Network Accelerators  JimBrakefield
    ||     | |   | |   |`* Re: Neural Network Accelerators  Ivan Godard
    ||     | |   | |   | +- Re: Neural Network Accelerators  MitchAlsup
    ||     | |   | |   | +- Re: Neural Network Accelerators  JimBrakefield
    ||     | |   | |   | +- Re: Neural Network Accelerators  MitchAlsup
    ||     | |   | |   | `- Re: Neural Network Accelerators  MitchAlsup
    ||     | |   | |   +- Re: Neural Network Accelerators  JimBrakefield
    ||     | |   | |   `- Re: Neural Network Accelerators  Sean O'Connor
    ||     | |   | `- Re: Neural Network Accelerators  Thomas Koenig
    ||     | |   `* Re: Neural Network Accelerators  Thomas Koenig
    ||     | |    +- Re: Neural Network Accelerators  MitchAlsup
    ||     | |    `- Re: Neural Network Accelerators  JimBrakefield
    ||     | `* Re: Neural Network Accelerators  Scott Smader
    ||     |  `* Re: Neural Network Accelerators  EricP
    ||     |   `* Re: Neural Network Accelerators  Yoga Man
    ||     |    `- Re: Neural Network Accelerators  Scott Smader
    ||     `- Re: Neural Network Accelerators  Ivan Godard
    |+- Re: Neural Network Accelerators  EricP
    |+- Re: Neural Network Accelerators  JimBrakefield
    |`- Re: Neural Network Accelerators  Stephen Fuld
    `* Re: Neural Network Accelerators  BGB
     `* Re: Neural Network Accelerators  MitchAlsup
      `- Re: Neural Network Accelerators  BGB

Re: Neural Network Accelerators

 by: EricP - Thu, 18 Nov 2021 17:29 UTC

EricP wrote:
>
> The sum and trigger have enough bits to not overflow.
> For 1024 8-bit integer synapse weights the parallel adder looks
> like it requires 2048 adders varying in size from 8 to 8+10 bits
> producing an 18 bit total. Does that sound correct?

It's only 1023 adders.

In this example each neuron must sum 1024 signed weights,
implemented as a tree of adders for operand pairs,
probably pipelined because of the number of levels.

512 9-bit adders
256 10-bit adders
...
2 17-bit adders
1 18-bit adder

I was curious whether there was a better way to do this than
a tree of adders. A bit of poking about finds these are called
"multi-operand adders" and there has been a fair amount of
research on these since NN became a thing.

There are a few approaches that have different power, area, delay
properties, e.g. Array tree adder, Wallace tree adder,
Balanced delay tree adder and Overturned-stairs tree adder.

But I can't find anything significantly better than a tree of adders.
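For concreteness, a minimal Python sketch of that tree (a toy model,
nobody's hardware): enumerate the levels, halving the operand count and
widening by one bit each time, and check the totals.

def adder_tree_levels(n_operands=1024, width=8):
    # Pairwise adder tree: each level halves the operand count and
    # widens the result by one bit (w-bit + w-bit -> (w+1)-bit sum).
    levels = []
    count, w = n_operands, width
    while count > 1:
        count //= 2
        w += 1
        levels.append((count, w))
    return levels

levels = adder_tree_levels()
print(levels[0], levels[-1])         # (512, 9) ... (1, 18)
print(sum(n for n, _ in levels))     # 1023 adders in total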

Re: Neural Network Accelerators

 by: MitchAlsup - Thu, 18 Nov 2021 17:40 UTC

On Thursday, November 18, 2021 at 11:30:01 AM UTC-6, EricP wrote:
> [...]
> There are a few approaches that have different power, area, delay
> properties, e.g. Array tree adder, Wallace tree adder,
> Balanced delay tree adder and Overturned-stairs tree adder.
>
> But I can't find anything significantly better than a tree of adders.
<
Tree of carry save adders ?? !!

Re: Neural Network Accelerators

 by: Thomas Koenig - Thu, 18 Nov 2021 19:38 UTC

EricP <ThatWouldBeTelling@thevillage.com> schrieb:
> [...]
> I was curious whether there was a better way to do this than
> a tree of adders. A bit of poking about finds these are called
> "multi-operand adders" and there has been a fair amount of
> research on these since NN became a thing.

You can do it with carry-save adders until there are only two 17-bit
numbers left. Think Wallace (or Dadda) tree.

Using 4:2 compressors will reduce the number of summands by a factor of
(roughly) two at each level, without having to worry about carry
propagation until the last two numbers are added.
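A toy model of that reduction in plain Python (my own sketch; unsigned
8-bit weights and an assumed 18-bit word for simplicity): each 3:2
carry-save step takes three summands to two with no carry chain, and
only the final pair pays for carry propagation.

import random

MASK = (1 << 18) - 1  # 18 bits hold the sum of 1024 8-bit values

def csa(a, b, c):
    # 3:2 carry-save step: three summands in, two out, no carry chain.
    s = (a ^ b ^ c) & MASK                            # per-bit sums
    k = (((a & b) | (a & c) | (b & c)) << 1) & MASK   # per-bit carries
    return s, k

def reduce_to_two(vals):
    # Wallace-style reduction: keep compressing until two summands remain.
    vals = list(vals)
    while len(vals) > 2:
        s, k = csa(vals.pop(), vals.pop(), vals.pop())
        vals += [s, k]
    return vals

weights = [random.randrange(256) for _ in range(1024)]
a, b = reduce_to_two(weights)
assert (a + b) & MASK == sum(weights)  # one carry-propagate add at the end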

Re: Neural Network Accelerators

 by: Terje Mathisen - Thu, 18 Nov 2021 19:49 UTC

MitchAlsup wrote:
> On Thursday, November 18, 2021 at 11:30:01 AM UTC-6, EricP wrote:
>> [...]
>> But I can't find anything significantly better than a tree of adders.
> <
> Tree of carry save adders ?? !!
>
Indeed!

That should be the first option for any structure like this,
particularly when/if you only need to extract the final result quite rarely.

No need to suffer the final carry propagation delay more than once, right?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Neural Network Accelerators

 by: MitchAlsup - Thu, 18 Nov 2021 20:37 UTC

On Thursday, November 18, 2021 at 1:49:58 PM UTC-6, Terje Mathisen wrote:
> [...]
> Indeed!
>
> That should be the first option for any structure like this,
> particularly when/if you only need to extract the final result quite rarely.
>
> No need to suffer the final carry propagation delay more than once, right?
<
The rule of thumb for large sums is that you use one (1, uno, a single) carry chain
and as many 3-2 counters or 4-2 compressors as you need to reduce the sums
and carries to 2. All accumulation can be done in carry-save--in fact the entire
integer unit of the NEC S/X (forgot the model) was carry-save so that adds would
always be 1 cycle. The only time the sums and carries were integerized was
when they were used in address arithmetic.
<
This was also how the Goldschmidt FDIV algorithm worked on the 360/91.
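A little Python model of that accumulation style (my own toy, not the
NEC design): the running total lives as a (sum, carry) pair, every add
is a single carry-save step, and the one real carry-propagate add
happens only when the value is read out.

MASK = (1 << 18) - 1  # assumed accumulator width, for illustration

def csa_accumulate(pair, x):
    # One carry-save add: (sum, carry) + x -> (sum', carry'), no carry chain.
    s, c = pair
    s2 = (s ^ c ^ x) & MASK
    c2 = (((s & c) | (s & x) | (c & x)) << 1) & MASK
    return s2, c2

def integerize(pair):
    # The single carry-propagate add, paid only when the result is needed.
    s, c = pair
    return (s + c) & MASK

acc = (0, 0)
for w in [17, 250, 3, 99]:
    acc = csa_accumulate(acc, w)
print(integerize(acc))  # 369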

Re: Neural Network Accelerators

 by: EricP - Thu, 18 Nov 2021 22:00 UTC

MitchAlsup wrote:
> On Thursday, November 18, 2021 at 11:30:01 AM UTC-6, EricP wrote:
>> [...]
> <
> Tree of carry save adders ?? !!

Right. My brain fart there. Apologies.
I read that on FPGAs, for adders of less than 32 bits a Ripple Carry Adder
(RCA) is faster than a Carry Save Adder due to routing delays,
and then interpreted everything as an RCA tree.
Had I looked a little further I would have seen they were
actually making a case for using 4:2 compressors.

Re: Neural Network Accelerators

 by: MitchAlsup - Thu, 18 Nov 2021 22:58 UTC

On Thursday, November 18, 2021 at 4:01:06 PM UTC-6, EricP wrote:
> [...]
> Right. My brain fart there. Apologies.
> I read that on FPGAs, for adders of less than 32 bits a Ripple Carry Adder
> (RCA) is faster than a Carry Save Adder due to routing delays,
> and then interpreted everything as an RCA tree.
> Had I looked a little further I would have seen they were
> actually making a case for using 4:2 compressors.
<
Every person who wants to call themselves a computer architect should
understand the "math" behind the delay of an adder:: whether it be a
ripple-carry adder (4+bits), a carry-propagate adder (5+log4(bits)), a
carry-select adder (6+log8(bits)), and also understand that there are
special adders to better deal with the delay characteristics of multiplier
trees (Baugh-Wooley) and adders designed for "other purposes" like
Kogge-Stone adders.
<
Apparently we have a NG filled with people who don't really want to be
known as computer architects... sigh...
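Reading "log4"/"log8" as base-4 and base-8 logarithms and the constants
as unitless gate delays, a quick Python table of those rules of thumb:

import math

def ripple_carry(bits):     return 4 + bits               # carry ripples bit by bit
def carry_propagate(bits):  return 5 + math.log(bits, 4)  # 4-ary lookahead tree
def carry_select(bits):     return 6 + math.log(bits, 8)  # 8-ary select tree

for bits in (8, 18, 32, 64):
    print(bits, ripple_carry(bits),
          round(carry_propagate(bits), 1),
          round(carry_select(bits), 1))
# e.g. at 18 bits: ripple 22, carry-propagate ~7.1, carry-select ~7.4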

Re: Neural Network Accelerators

 by: robf...@gmail.com - Sun, 21 Nov 2021 22:33 UTC

For Thor the neuron is based on a weighted sum, using a multiplier and adder.
A clocked circuit is used and values are multiplied and added in sequence; the
operation iterates through a table using a counter. This has the advantage that
the values used may be partitioned, allowing the same neuron to emulate more
than one neuron. I am not sure, but a cascaded tree of 1024 adders with multipliers
in an FPGA is bound to use a lot of resources and be slow. A faster clock can be
used for a sequenced adder. I am curious what the difference in propagation delay
from input to output would be for a sequenced approach versus a direct
approach. I expect the adder tree approach to be many times faster, but it is
possible to fit many more neurons operating in parallel with a sequenced approach
because it uses fewer resources.
1 Thor Neuron uses: 2132 LUTs, 9 DSP blocks, 220 FFs and 1.5 BRAMs.
Thor currently has 8 neurons operating in parallel, but it may be possible to increase
the count to 16.
Software is going to be relied on to map a virtual neural net to the available neurons.
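A behavioral sketch of that sequenced neuron in Python (names and table
layout are illustrative guesses, not Thor's actual design): one
multiply-accumulate per clock, a counter walking the weight table, and
partition bounds letting one physical MAC serve several logical neurons.

def sequenced_neuron(weights, inputs):
    # One multiply-accumulate per "clock"; a counter walks the table.
    acc = 0
    for i in range(len(weights)):
        acc += weights[i] * inputs[i]
    return acc

def partitioned(weights, inputs, bounds):
    # Partitioning the table lets one physical MAC emulate several
    # smaller logical neurons, one after another.
    return [sequenced_neuron(weights[a:b], inputs[a:b])
            for a, b in zip(bounds, bounds[1:])]

# e.g. one 1024-entry table serving two 512-input logical neurons:
# outs = partitioned(w, x, [0, 512, 1024])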

Re: Neural Network Accelerators

 by: JimBrakefield - Mon, 22 Nov 2021 00:27 UTC

On Sunday, November 21, 2021 at 4:33:26 PM UTC-6, robf...@gmail.com wrote:
> [...]

Barrel processors would seem a natural fit for NN timing problems on FPGAs?
There is a decent Wikipedia page on barrel processors in general (with several examples).
Charles LaForest has a summary of his thesis work: http://fpgacpu.ca/octavo/
He utilizes the DSP blocks at their maximum frequency.
Another barrel processor: https://opencores.org/projects/avr_hp; probably others,
and other vector and multiple-issue designs.
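The barrel idea in miniature (a Python illustration of the general
scheme, not Octavo or the AVR core): one shared MAC datapath, N thread
contexts, and a rotor issuing one context per cycle, so the datapath
runs at full clock while each logical neuron gets every Nth cycle.

from dataclasses import dataclass

@dataclass
class Context:
    # Per-thread state: weight/input tables, an index counter, an accumulator.
    weights: list
    inputs: list
    i: int = 0
    acc: int = 0

def barrel_step(ctxs, cycle):
    # One cycle: the barrel rotates and this cycle's context issues one MAC.
    c = ctxs[cycle % len(ctxs)]
    if c.i < len(c.weights):
        c.acc += c.weights[c.i] * c.inputs[c.i]
        c.i += 1

ctxs = [Context([1, 2, 3], [4, 5, 6]), Context([7, 8], [9, 10])]
for cycle in range(8):               # round-robin, one context per cycle
    barrel_step(ctxs, cycle)
print([c.acc for c in ctxs])         # [32, 143]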

Re: Neural Network Accelerators

 by: Ivan Godard - Mon, 22 Nov 2021 01:04 UTC

On 11/21/2021 4:27 PM, JimBrakefield wrote:
> [...]
> Barrel processors would seem a natural fit for NN timing problems on FPGAs?
> [...]

CDC6600 Peripheral Processor.

Re: Neural Network Accelerators

 by: MitchAlsup - Mon, 22 Nov 2021 01:44 UTC

On Sunday, November 21, 2021 at 7:04:30 PM UTC-6, Ivan Godard wrote:
> [...]
> CDC6600 Peripheral Processor.
<
Qualcomm micro-CPU.

Re: Neural Network Accelerators

 by: JimBrakefield - Mon, 22 Nov 2021 01:45 UTC

On Sunday, November 21, 2021 at 7:04:30 PM UTC-6, Ivan Godard wrote:
> [...]
> CDC6600 Peripheral Processor.
I have some experience with it: it could read/write five 12-bit "bytes"
to/from 6600 main memory at a time.
Perhaps the best (and best-documented) example of a barrel processor.

JCB humor:
Lincoln said God must have loved the poor, he made so many of them.
Cray liked the CDC160 because he made a lot of them (10 on each 6600).

The ten "160" uPs took one of the 16 frames on a 6600. The CPU took eight?
The in-order, slow 6400 CPU took one frame. Various combinations were sold.

Re: Neural Network Accelerators

 by: MitchAlsup - Mon, 22 Nov 2021 02:19 UTC

On Sunday, November 21, 2021 at 7:45:03 PM UTC-6, JimBrakefield wrote:
> [...]
> I have some experience with it: it could read/write five 12-bit "bytes"
> to/from 6600 main memory at a time.
> Perhaps the best (and best-documented) example of a barrel processor.
> [...]
<
In a barrel processor there is one execution pipeline and n× fetch-writeback
pipeline stages.
<
When a PP read main memory, it had to do so using 5 LD instructions in a row,
each LD getting 12 bits. Should such a LD not be fetched, that slot of the R-W
pyramid would simply be skipped. So if the PP knew it was reading an address
(18 bits), it could use 2 LDs.
<
The PPs were not as complicated as a DG-NOVA.
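The arithmetic of that transfer, as a toy Python illustration (not CDC
code): a 60-bit central-memory word moves as five 12-bit pieces, and a
value known to fit in 18 bits needs only two of them, since 2 x 12 = 24 >= 18.

def split12(word60):
    # A 60-bit CM word as five 12-bit "bytes", most significant first.
    return [(word60 >> (12 * k)) & 0xFFF for k in range(4, -1, -1)]

def join12(pieces):
    # Reassemble from however many 12-bit LDs were actually issued.
    v = 0
    for p in pieces:
        v = (v << 12) | p
    return v

word = 0xABCDE0123456789                         # some 60-bit value
assert join12(split12(word)) == word
addr = 0x2F0FF                                   # an 18-bit address, here in the top of a word
assert join12(split12(addr << 36)[:2]) == addr   # 2 LDs suffice for 18 bits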

Re: Neural Network Accelerators

 by: MitchAlsup - Mon, 22 Nov 2021 02:22 UTC

Also note: although not claimed as a barrel processor::

reading the B6700 processor patents circa 1970-1985 one could
easily imagine that the B6700 was a 3-deep barrel; especially those
at the circuit design and pipeline design levels.

Re: Neural Network Accelerators

<c040cbee-e6c3-4ff4-83a7-7a9db7cc7a9en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22086&group=comp.arch#22086

 by: JimBrakefield - Mon, 22 Nov 2021 02:39 UTC

On Sunday, November 21, 2021 at 6:27:54 PM UTC-6, JimBrakefield wrote:
> On Sunday, November 21, 2021 at 4:33:26 PM UTC-6, robf...@gmail.com wrote:
> > For Thor the neuron is based on a weighted sum, using a multiplier and adder.
> > A clocked circuit is used and values are multiplied and added in sequence, the
> > operation iterates through a table using a counter. This has an advantage that
> > the values used may be partitioned allowing the same neuron to emulate more
> > than one neuron. I am not sure, but a cascaded tree of 1024 adders with multipliers
> > in an FPGA is bound to use a lot of resources and be slow. A faster clock can be
> > used for a sequenced adder. I am curious what the difference in propagation delay
> > between input to output would be for a sequenced approach versus a direct
> > approach. I expect the adder tree approach to be many times faster, but it is
> > possible to fit many more neurons operating in parallel with a sequenced approach
> > because it uses fewer resources.
> > 1 Thor Neuron uses: 2132 LUTs, 9 DSP blocks, 220 FFs, and 1.5 BRAMs.
> > Thor currently has 8 neurons operating in parallel, but it may be possible to increase
> > the count to 16.
> > Software is going to be relied on to map a virtual neural net to the available neurons.
> Barrel processors would seem to be a natural fit for NN timing problems on FPGAs?
> Decent Wikipedia page on barrel processors in general (with several examples).
> Charles LaForest has a summary of his thesis work: http://fpgacpu.ca/octavo/
> He utilizes the DSP blocks at their maximum frequency.
> Another barrel processor: https://opencores.org/projects/avr_hp; there are probably others,
> and other vector and multiple-issue designs.

Uncle! Deep NNs are too big and moving too fast for me to keep up.
https://xilinx.github.io/finn/about looks like my entry point into hacking FPGA NNs.
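
To make the trade-off in rob's quoted description concrete, here is a C
sketch of a sequenced neuron: one multiply-accumulate unit walked over a
weight table by a counter, with partition bounds letting the same physical
unit emulate several virtual neurons. This is my illustration of the general
technique, not Thor's actual design; the function and parameter names are
invented. A direct adder tree computes the whole sum in ~log2(N) adder
levels but needs N multipliers; the sequenced version takes N cycles on one
DSP block.

#include <stdio.h>

/* Sketch only, not Thor RTL: one counter pass over w[start..end) takes
   (end - start) cycles on a single multiplier + adder. */
static double neuron_seq(const double *w, const double *x,
                         int start, int end)
{
    double acc = 0.0;
    for (int i = start; i < end; i++)   /* the iterating counter */
        acc += w[i] * x[i];             /* the one shared MAC */
    return acc;
}

int main(void)
{
    double w[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    double x[8] = {1, 1, 1, 1, 1, 1, 1, 1};
    /* partitioning: one physical neuron, two virtual neurons */
    printf("virtual neuron 0: %g\n", neuron_seq(w, x, 0, 4));  /* 10 */
    printf("virtual neuron 1: %g\n", neuron_seq(w, x, 4, 8));  /* 26 */
    return 0;
}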

Re: Neural Network Accelerators

<cd27c608-6806-4d94-8b4e-54d7d6b5b84bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22290&group=comp.arch#22290

 by: Sean O'Connor - Mon, 13 Dec 2021 01:26 UTC

The fast Walsh-Hadamard transform needs only add and subtract operations. It is easy to turn into a neural network. I have a booklet for $25 US. Side-channel browser attacks are making typing very slow. Lol.
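
For the curious, the transform itself is a dozen lines: the in-place fast
Walsh-Hadamard transform does N·log2(N) additions and subtractions and no
multiplies. Using it as a fixed mixing layer between cheap nonlinearities is
one way people build multiply-free network layers; that framing is my gloss,
not a claim about Sean's booklet.

#include <stdio.h>

/* In-place, unnormalized fast Walsh-Hadamard transform.
   n must be a power of two. Only adds and subtracts. */
static void fwht(double *a, int n)
{
    for (int h = 1; h < n; h *= 2)
        for (int i = 0; i < n; i += 2 * h)
            for (int j = i; j < i + h; j++) {
                double x = a[j], y = a[j + h];
                a[j]     = x + y;     /* butterfly: sum... */
                a[j + h] = x - y;     /* ...and difference */
            }
}

int main(void)
{
    double v[8] = {1, 0, 1, 0, 0, 1, 1, 0};
    fwht(v, 8);
    for (int i = 0; i < 8; i++)
        printf("%g ", v[i]);
    printf("\n");
    return 0;
}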
