Rocksolid Light



devel / comp.arch / New IBM AI accelerator functions

Subject                                   Author
* New IBM AI accelerator functions        Thomas Koenig
`* Re: New IBM AI accelerator functions   Stephen Fuld
 `- Re: New IBM AI accelerator functions  Thomas Koenig

New IBM AI accelerator functions


https://www.novabbs.com/devel/article-flat.php?id=24795&group=comp.arch#24795

Newsgroups: comp.arch
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: New IBM AI accelerator functions
Date: Sat, 16 Apr 2022 08:26:40 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <t3dujv$er4$1@newsreader4.netcologne.de>
User-Agent: slrn/1.0.3 (Linux)

With their new z16 series and the Telum processor, IBM has put
a lot of half precision fmas into their mainframes - apparently,
they want to do AI inference in realtime on their new mainframes
(credit card fraud is cited).

Some information is at
https://www.hpcwire.com/2021/08/23/ibms-upcoming-z-series-chip-gains-on-chip-ai-acceleration-and-new-name-telum/

According to one slide, they have a new "Neural Network Processing
Assist" instruction, which is a memory-to-memory CISC instruction
doing matrix multiplication, convolution and activation functions.
They also prefetch data into the L2 cache.
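To make the idea concrete, here is a rough sketch (my own illustration in numpy terms, not the documented NNPA semantics) of what a memory-to-memory fused layer computes: operands live in ordinary memory, and a single operation covers the matmul, bias add and activation.

```python
import numpy as np

# Hypothetical sketch of a fused "matmul + bias + activation" layer,
# loosely in the spirit of a memory-to-memory instruction: inputs,
# weights and outputs are plain in-memory arrays, and one call does
# the whole layer.  This is NOT IBM's actual NNPA definition.

def fused_layer(x, w, b):
    # Common mixed-precision pattern: FP16 storage, FP32 accumulation.
    acc = x.astype(np.float32) @ w.astype(np.float32) + b.astype(np.float32)
    return np.maximum(acc, 0.0).astype(np.float16)  # ReLU activation

x = np.ones((2, 4), dtype=np.float16)
w = np.full((4, 3), 0.5, dtype=np.float16)
b = np.zeros(3, dtype=np.float16)
y = fused_layer(x, w, b)          # every entry: relu(4 * 0.5) = 2.0
```

A real accelerator would of course tile this across its compute units and, as the slide notes, prefetch the operands into cache; the point of the sketch is only the memory-in, memory-out shape of the operation.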

This certainly solves the "graphics card is too far from the CPU"
problem.

They have special fma nodes (128 processor tiles with 8-way
FP-16 SIMD) and 32 nodes specialized for different activation
functions, either in 16 or 32-bit floating point.
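One plausible reason for keeping 32-bit paths next to the FP-16 units (my guess, not IBM's stated rationale) is that long reductions go wrong in pure half precision. A quick demonstration:

```python
import numpy as np

# Summing 10,000 copies of 0.1 in pure FP16: once the running sum
# reaches 256, the FP16 spacing there (0.25) exceeds the addend,
# so every further add rounds away and the sum stalls.
vals = np.full(10000, 0.1, dtype=np.float16)
s16 = np.float16(0.0)
for v in vals:
    s16 = np.float16(s16 + v)

# The same data accumulated in FP32 stays near the true value.
s32 = vals.astype(np.float32).sum()

print(float(s16))  # 256.0 -- stalled
print(float(s32))  # about 999.76 (0.1 is itself inexact in FP16)
```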

Interesting.

It also shows that half precision is moving towards the mainstream,
or should I say mainframe?
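For reference, since not everyone works with binary16 day to day: it has a 10-bit fraction, so it runs out of exact integers at 2048 and tops out at 65504, which is roughly why it suits inference weights but little else.

```python
import numpy as np

# IEEE binary16: 10 fraction bits, so integers above 2048 are no
# longer exactly representable, and the largest finite value is 65504.
a = np.float16(2048)
print(float(a + np.float16(1)))          # 2048.0 -- the +1 is lost
print(float(np.finfo(np.float16).max))   # 65504.0
```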

Re: New IBM AI accelerator functions


https://www.novabbs.com/devel/article-flat.php?id=24800&group=comp.arch#24800

Newsgroups: comp.arch
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: New IBM AI accelerator functions
Date: Sat, 16 Apr 2022 08:49:21 -0700
Organization: A noiseless patient Spider
Lines: 35
Message-ID: <t3eoi3$8gq$1@dont-email.me>
References: <t3dujv$er4$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.8.0
In-Reply-To: <t3dujv$er4$1@newsreader4.netcologne.de>
Content-Language: en-US

On 4/16/2022 1:26 AM, Thomas Koenig wrote:
> With their new z16 series and the Telum processor, IBM has put
> a lot of half precision fmas into their mainframes - apparently,
> they want to do AI inference in realtime on their new mainframes
> (credit card fraud is cited).
>
> Some information is at
> https://www.hpcwire.com/2021/08/23/ibms-upcoming-z-series-chip-gains-on-chip-ai-acceleration-and-new-name-telum/
>
> According to one slide, they have a new "Neural Network Processing
> Assist" instruction, which is a memory-to-memory CISC instruction
> doing matrix multiplication, convolution and activation functions.
> They also prefetch data into the L2 cache.
>
> This certainly solves the "graphics card is too far from the CPU"
> problem.
>
> They have special fma nodes (128 processor tiles with 8-way
> FP-16 SIMD) and 32 nodes specialized for different activation
> functions, either in 16 or 32-bit floating point.
>
> Interesting.
>
> It also shows that half precision is moving towards the mainstream,
> or should I say mainframe?

A little more information from IBM's Hot Chips presentation

https://hc33.hotchips.org/assets/program/conference/day1/HC2021.C1.3%20IBM%20Cristian%20Jacobi%20Final.pdf

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: New IBM AI accelerator functions


https://www.novabbs.com/devel/article-flat.php?id=24808&group=comp.arch#24808

Newsgroups: comp.arch
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: New IBM AI accelerator functions
Date: Sun, 17 Apr 2022 06:15:10 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <t3gb9e$4d8$1@newsreader4.netcologne.de>
References: <t3dujv$er4$1@newsreader4.netcologne.de>
<t3eoi3$8gq$1@dont-email.me>
User-Agent: slrn/1.0.3 (Linux)

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
> On 4/16/2022 1:26 AM, Thomas Koenig wrote:
>> With their new z16 series and the Telum processor, IBM has put
>> a lot of half precision fmas into their mainframes - apparently,
>> they want to do AI inference in realtime on their new mainframes
>> (credit card fraud is cited).
>>
>> Some information is at
>> https://www.hpcwire.com/2021/08/23/ibms-upcoming-z-series-chip-gains-on-chip-ai-acceleration-and-new-name-telum/
>>
>> According to one slide, they have a new "Neural Network Processing
>> Assist" instruction, which is a memory-to-memory CISC instruction
>> doing matrix multiplication, convolution and activation functions.
>> They also prefetch data into the L2 cache.
>>
>> This certainly solves the "graphics card is too far from the CPU"
>> problem.
>>
>> They have special fma nodes (128 processor tiles with 8-way
>> FP-16 SIMD) and 32 nodes specialized for different activation
>> functions, either in 16 or 32-bit floating point.
>>
>> Interesting.
>>
>> It also shows that half precision is moving towards the mainstream,
>> or should I say mainframe?
>
> A little more information from IBM's Hot Chips presentation
>
> https://hc33.hotchips.org/assets/program/conference/day1/HC2021.C1.3%20IBM%20Cristian%20Jacobi%20Final.pdf

Ah, that was the presentation I was missing.

In the talk itself, there's a little bit more info:
https://www.youtube.com/watch?v=fUqOdu2ympk starting at 17:10.
