Three fundamental flaws of SIMD


<sf3kim$l3o$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19750&group=comp.arch#19750

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Three fundamental flaws of SIMD
Date: Thu, 12 Aug 2021 19:08:37 +0200
Organization: A noiseless patient Spider
Lines: 18
Message-ID: <sf3kim$l3o$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 12 Aug 2021 17:08:38 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="bc96407aef1feb270fd6e2d8b72fae56";
logging-data="21624"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18TRp84KSLMKfPikSC/kvcHIraKFntrVME="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1://lewm4xBDaqX5hPtbzR7kjd6p4=
Content-Language: en-US
X-Mozilla-News-Host: snews://news.eternal-september.org:563

by: Marcus - Thu, 12 Aug 2021 17:08 UTC

I posted this article on my blog:

https://www.bitsnbites.eu/three-fundamental-flaws-of-simd

...basically stating the obvious (IMO), without putting too much
emphasis on how some alternatives solve some of the issues etc.

There were quite a few replies on reddit [1] and Hacker News [2],
and judging by the comments it seems that many software developers are
actually very fond of packed SIMD. :-/

/Marcus

[1]
https://www.reddit.com/r/programming/comments/p0yn45/three_fundamental_flaws_of_simd

[2] https://news.ycombinator.com/item?id=28114934

Re: Three fundamental flaws of SIMD

<sf3lm7$khf$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=19751&group=comp.arch#19751

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-3b2f-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Thu, 12 Aug 2021 17:27:35 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sf3lm7$khf$1@newsreader4.netcologne.de>
References: <sf3kim$l3o$1@dont-email.me>
Injection-Date: Thu, 12 Aug 2021 17:27:35 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-3b2f-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd6:3b2f:0:7285:c2ff:fe6c:992d";
logging-data="21039"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Thu, 12 Aug 2021 17:27 UTC

Marcus <m.delete@this.bitsnbites.eu> schrieb:
> I posted this article on my blog:
>
> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
>
> ...basically stating the obvious (IMO), without putting too much
> emphasis on how some alternatives solve some of the issues etc.
>
> There were quite a few replies on reddit [1] and Hacker News [2],
> and judging by the comments it seems that many software developers are
> actually very fond of packed SIMD. :-/

People are familiar with SIMD, and they apparently are not willing
to entertain the thought that there could be something better.

Plus, somebody who has some skill at wringing something useful
out of SIMD, which may be a source of pride and even monetary
compensation, will shy away from thinking too hard about anything
that may be better.

(When I learned about SIMD, coming from a vector computer background,
I certainly was disappointed by its limitations.)

Re: Three fundamental flaws of SIMD

<jwvsfzewm4u.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=19753&group=comp.arch#19753

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Thu, 12 Aug 2021 14:36:38 -0400
Organization: A noiseless patient Spider
Lines: 21
Message-ID: <jwvsfzewm4u.fsf-monnier+comp.arch@gnu.org>
References: <sf3kim$l3o$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="49f1c261e423e4482e89bdca8058d30a";
logging-data="12534"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX190bi59nWsMCc6ogSVW9ua5"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Cancel-Lock: sha1:2nO7JV6fMHJi1y2stbbhZ3MCuTA=
sha1:JF741Os06x27XUsqVGlOCBcvKpE=

by: Stefan Monnier - Thu, 12 Aug 2021 18:36 UTC

Marcus [2021-08-12 19:08:37] wrote:
> I posted this article on my blog:
> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
> ...basically stating the obvious (IMO), without putting too much
> emphasis on how some alternatives solve some of the issues etc.
> There were quite a few replies on reddit [1] and Hacker News [2],
> and judging by the comments it seems that many software developers are
> actually very fond of packed SIMD. :-/

Most software developers are basically alien to computer architecture
as an engineering discipline.

The ISA is imposed on them from outside because it is decided by the
machines they can use/buy. From that point of view, they can only
understand your article as "don't use the SIMD primitives provided by
your hardware" ;-)

Stefan

Re: Three fundamental flaws of SIMD

<sf3t31$k3r$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19756&group=comp.arch#19756

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Thu, 12 Aug 2021 14:33:44 -0500
Organization: A noiseless patient Spider
Lines: 31
Message-ID: <sf3t31$k3r$1@dont-email.me>
References: <sf3kim$l3o$1@dont-email.me>
<jwvsfzewm4u.fsf-monnier+comp.arch@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 12 Aug 2021 19:33:53 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="2846ba4a2fc8b3940ce8f6bd0028e07a";
logging-data="20603"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18dhoVcaeNGxuwPVH85UAbN"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
Cancel-Lock: sha1:PHnB16g0cx0xLjG/26yCSTZ8GPE=
In-Reply-To: <jwvsfzewm4u.fsf-monnier+comp.arch@gnu.org>
Content-Language: en-US

by: BGB - Thu, 12 Aug 2021 19:33 UTC

On 8/12/2021 1:36 PM, Stefan Monnier wrote:
> Marcus [2021-08-12 19:08:37] wrote:
>> I posted this article on my blog:
>> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
>> ...basically stating the obvious (IMO), without putting too much
>> emphasis on how some alternatives solve some of the issues etc.
>> There were quite a few replies on reddit [1] and Hacker News [2],
>> and judging by the comments it seems that many software developers are
>> actually very fond of packed SIMD. :-/
>
> Most software developers are basically alien to computer architecture
> as an engineering discipline.
>
> The ISA is imposed on them from outside because it is decided by the
> machines they can use/buy. From that point of view, they can only
> understand your article as "don't use the SIMD primitives provided by
> your hardware" ;-)
>

And, from the hardware design front, SIMD is comparably cheap and easy
to implement, more so if it can be done in a way which mostly reuses
parts of the CPU core that one already has lying around for non-SIMD
operations.

Doing vectors by using more advanced forms of pipelining is not exactly
ideal in some cases, particularly if it would require increasing the
effective number of pipeline stages or the number of register file ports.

Though, I will admit that SIMD isn't necessarily the best option in an
ISA design elegance sense...

Re: Three fundamental flaws of SIMD

<br0bhgd34iru4cnttn3m0cvnbrjjk0n59o@4ax.com>

https://www.novabbs.com/devel/article-flat.php?id=19757&group=comp.arch#19757

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: gneun...@comcast.net (George Neuner)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Thu, 12 Aug 2021 16:33:11 -0400
Organization: A noiseless patient Spider
Lines: 8
Message-ID: <br0bhgd34iru4cnttn3m0cvnbrjjk0n59o@4ax.com>
References: <sf3kim$l3o$1@dont-email.me> <sf3lm7$khf$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Info: reader02.eternal-september.org; posting-host="3ef64917554faca371f27a2392d6e4c0";
logging-data="12838"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18p2wrd/MKzdYMqXYiHsP3v7Q6uQgrUxDM="
User-Agent: ForteAgent/8.00.32.1272
Cancel-Lock: sha1:3BDPQoyPAFPJXdKq4p/EbJDmC9k=

by: George Neuner - Thu, 12 Aug 2021 20:33 UTC

On Thu, 12 Aug 2021 17:27:35 -0000 (UTC), Thomas Koenig
<tkoenig@netcologne.de> wrote:

>(When I learned about SIMD, coming from a vector computer background,
>I certainly was disappointed by its limitations.)

Coming from Connection Machine I was even /more/ disappointed.
POD vectors are nice, but vectors of pointers are a lot nicer.

Re: Three fundamental flaws of SIMD

<66291660-bbe9-47cf-ba2a-a2c9d5301992n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19760&group=comp.arch#19760

X-Received: by 2002:ad4:438e:: with SMTP id s14mr5931576qvr.26.1628802561615;
Thu, 12 Aug 2021 14:09:21 -0700 (PDT)
X-Received: by 2002:a05:6830:1dd0:: with SMTP id a16mr4923907otj.22.1628802561362;
Thu, 12 Aug 2021 14:09:21 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!peer02.ams4!peer.am4.highwinds-media.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 12 Aug 2021 14:09:21 -0700 (PDT)
In-Reply-To: <sf3lm7$khf$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <sf3kim$l3o$1@dont-email.me> <sf3lm7$khf$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <66291660-bbe9-47cf-ba2a-a2c9d5301992n@googlegroups.com>
Subject: Re: Three fundamental flaws of SIMD
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 12 Aug 2021 21:09:21 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 2757

by: MitchAlsup - Thu, 12 Aug 2021 21:09 UTC

On Thursday, August 12, 2021 at 12:27:37 PM UTC-5, Thomas Koenig wrote:
> Marcus <m.de...@this.bitsnbites.eu> schrieb:
> > I posted this article on my blog:
> >
> > https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
> >
> > ...basically stating the obvious (IMO), without putting too much
> > emphasis on how some alternatives solve some of the issues etc.
> >
> > There were quite a few replies on reddit [1] and Hacker News [2],
> > and judging by the comments it seems that many software developers are
> > actually very fond of packed SIMD. :-/
<
> People are familiar with SIMD, and they apparently are not willing
> to entertain the thought that there could be something better.
<
My personal guess is that people simply write code in HLLs and
let the compiler do its thing, not really caring about the SIMDness
of the code.
<
Compiler writers are well aware of SIMDness: they are caught between
the ever-growing SIMD complexity and the screwy things they have to
do to get good code spit out the other end.
<
Operating System people are pretty ignorant, except for the size of the
stuff they have to save/restore around context switches, and when they
get inside various system library functions that the compiler spit out in
SIMD form.
>
> Plus, somebody who has some skill at wringing something useful
> out of SIMD, which may be a source of pride and even monetary
> compensation, will shy away from thinking too hard about anything
> that may be better.
>
> (When I learned about SIMD, coming from a vector computer background,
> I certainly was disappointed by its limitations.)

Re: Three fundamental flaws of SIMD

<e1ef6d0d-cd1d-4046-b638-3b65d4a12693n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19761&group=comp.arch#19761

X-Received: by 2002:a37:9e4f:: with SMTP id h76mr7028251qke.24.1628809147024;
Thu, 12 Aug 2021 15:59:07 -0700 (PDT)
X-Received: by 2002:aca:f089:: with SMTP id o131mr5120645oih.37.1628809146634;
Thu, 12 Aug 2021 15:59:06 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 12 Aug 2021 15:59:06 -0700 (PDT)
In-Reply-To: <sf3kim$l3o$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=87.68.183.224; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 87.68.183.224
References: <sf3kim$l3o$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e1ef6d0d-cd1d-4046-b638-3b65d4a12693n@googlegroups.com>
Subject: Re: Three fundamental flaws of SIMD
From: already5...@yahoo.com (Michael S)
Injection-Date: Thu, 12 Aug 2021 22:59:06 +0000
Content-Type: text/plain; charset="UTF-8"

by: Michael S - Thu, 12 Aug 2021 22:59 UTC

On Thursday, August 12, 2021 at 8:08:41 PM UTC+3, Marcus wrote:
> I posted this article on my blog:
>
> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
>
> ...basically stating the obvious (IMO), without putting too much
> emphasis on how some alternatives solve some of the issues etc.
>
> There were quite a few replies on reddit [1] and Hacker News [2],
> and judging by the comments it seems that many software developers are
> actually very fond of packed SIMD. :-/
>
> /Marcus
>
>
> [1]
> https://www.reddit.com/r/programming/comments/p0yn45/three_fundamental_flaws_of_simd
>
> [2] https://news.ycombinator.com/item?id=28114934

You could spend your time better than writing it.
I could spend my time better than reading it.

Re: Three fundamental flaws of SIMD

<fdc22e32-9142-4b8a-b8a2-df9a4e8a852fn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19763&group=comp.arch#19763

X-Received: by 2002:a05:622a:1007:: with SMTP id d7mr6520065qte.158.1628813135787;
Thu, 12 Aug 2021 17:05:35 -0700 (PDT)
X-Received: by 2002:aca:b656:: with SMTP id g83mr5401420oif.84.1628813135556;
Thu, 12 Aug 2021 17:05:35 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.mixmin.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 12 Aug 2021 17:05:35 -0700 (PDT)
In-Reply-To: <sf3kim$l3o$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <sf3kim$l3o$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <fdc22e32-9142-4b8a-b8a2-df9a4e8a852fn@googlegroups.com>
Subject: Re: Three fundamental flaws of SIMD
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 13 Aug 2021 00:05:35 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Fri, 13 Aug 2021 00:05 UTC

On Thursday, August 12, 2021 at 12:08:41 PM UTC-5, Marcus wrote:
> I posted this article on my blog:
>
> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd

The fundamental flaw in SIMD is

a) one should be able to encode a vectorizable loop once and have it
run at the maximum performance of each future generation machine.
One should never have to spit out different instructions just because
the width of the SIMD registers was widened.

b) one should be able to encode a SIMD instruction such that it performs
as wide as the implementation has resources (or as narrow) using the
same OpCodes.
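
A minimal sketch of points (a) and (b) in plain C, not taken from any poster's ISA. The loop asks the machine on every pass how many elements it may process, in the spirit of a vector-length register. The names `hw_max_lanes` and `set_vl` are hypothetical stand-ins for hardware state, and the inner loop only models what would be a single vector instruction.

```c
#include <stddef.h>

/* Hypothetical: how many elements this implementation handles per vector
   operation. On a real vector machine this is a property of the hardware,
   not something the program is compiled against. */
static size_t hw_max_lanes = 4;

/* Hypothetical stand-in for a "set vector length" operation: request
   `remaining` elements and get back how many will be done this pass. */
static size_t set_vl(size_t remaining) {
    return remaining < hw_max_lanes ? remaining : hw_max_lanes;
}

/* y[i] += a * x[i], strip-mined by whatever length the hardware grants. */
static void saxpy(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; ) {
        size_t vl = set_vl(n - i);
        for (size_t j = 0; j < vl; j++)   /* models one vector instruction */
            y[i + j] += a * x[i + j];
        i += vl;
    }
}
```

Changing `hw_max_lanes` from 4 to 16 changes how many passes the outer loop makes, but neither the code nor the result, which is the property asked for above.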

Re: Three fundamental flaws of SIMD

<f22f22d7-bd99-4d94-bc51-f230bf7428c8n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19768&group=comp.arch#19768

X-Received: by 2002:ac8:7207:: with SMTP id a7mr141397qtp.32.1628824823497;
Thu, 12 Aug 2021 20:20:23 -0700 (PDT)
X-Received: by 2002:a05:6830:1dd0:: with SMTP id a16mr429218otj.22.1628824823105;
Thu, 12 Aug 2021 20:20:23 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 12 Aug 2021 20:20:22 -0700 (PDT)
In-Reply-To: <sf3kim$l3o$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:f39d:2c00:6044:f930:adec:10c9;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:f39d:2c00:6044:f930:adec:10c9
References: <sf3kim$l3o$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f22f22d7-bd99-4d94-bc51-f230bf7428c8n@googlegroups.com>
Subject: Re: Three fundamental flaws of SIMD
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Fri, 13 Aug 2021 03:20:23 +0000
Content-Type: text/plain; charset="UTF-8"

by: Quadibloc - Fri, 13 Aug 2021 03:20 UTC

On Thursday, August 12, 2021 at 11:08:41 AM UTC-6, Marcus wrote:

> and judging by the comments it seems that many software developers are
> actually very fond of packed SIMD. :-/

Well, if it's all you can get, it's better than nothing.

On my web page, there's a discussion of an imaginary computer architecture
which began as a simple example of how computers work, but which then
grew to include every feature that existed on some historical computer
somewhere.

So it included both packed SIMD, like MMX and its successors, and
unpacked SIMD like the Cray-1, and illustrated how the two differed;
for example, there are two different ways to provide hardware assist
for FFT, and for some reason one was more suitable to packed SIMD
and the other to unpacked SIMD, or at least so I thought when I
prepared the page.

http://www.quadibloc.com/arch/ar0102.htm

et seq. ...

John Savard

Re: Three fundamental flaws of SIMD

<sf4olc$1o8$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19769&group=comp.arch#19769

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Thu, 12 Aug 2021 20:24:28 -0700
Organization: A noiseless patient Spider
Lines: 28
Message-ID: <sf4olc$1o8$1@dont-email.me>
References: <sf3kim$l3o$1@dont-email.me>
<f22f22d7-bd99-4d94-bc51-f230bf7428c8n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 13 Aug 2021 03:24:28 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="a8cd4a1f7d9781973b00cf666e1db0d9";
logging-data="1800"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Trc/xavUe4w6q8gCKI/wH"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
Cancel-Lock: sha1:giqZN6YSsKeFvZcYq2t07t9hbb4=
In-Reply-To: <f22f22d7-bd99-4d94-bc51-f230bf7428c8n@googlegroups.com>
Content-Language: en-US

by: Ivan Godard - Fri, 13 Aug 2021 03:24 UTC

On 8/12/2021 8:20 PM, Quadibloc wrote:
> On Thursday, August 12, 2021 at 11:08:41 AM UTC-6, Marcus wrote:
>
>> and judging by the comments it seems that many software developers are
>> actually very fond of packed SIMD. :-/
>
> Well, if it's all you can get, it's better than nothing.
>
> On my web page, there's a discussion of an imaginary computer architecture
> which began as a simple example of how computers work, but which then
> grew to include every feature that existed on some historical computer
> somewhere.
>
> So it included both packed SIMD, like MMX and its successors, and
> unpacked SIMD like the Cray-1, and illustrated how the two differed;
> for example, there are two different ways to provide hardware assist
> for FFT, and for some reason one was more suitable to packed SIMD
> and the other to unpacked SIMD, or at least so I thought when I
> prepared the page.
>
> http://www.quadibloc.com/arch/ar0102.htm
>
> et seq. ...
>
> John Savard
>

Fairly described as "ISA Stockholm Syndrome"?

Re: Three fundamental flaws of SIMD

<sf54se$t9b$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19770&group=comp.arch#19770

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Fri, 13 Aug 2021 08:53:01 +0200
Organization: A noiseless patient Spider
Lines: 34
Message-ID: <sf54se$t9b$1@dont-email.me>
References: <sf3kim$l3o$1@dont-email.me>
<jwvsfzewm4u.fsf-monnier+comp.arch@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 13 Aug 2021 06:53:02 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="efb81c87f9ea9d52da321be3cff0e0f3";
logging-data="29995"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18m3uMIkDyor9nJQHex5DvsgDTg9nQ3gL4="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:L3hEkdz5B2jjyt98DDhq4Hvai0A=
In-Reply-To: <jwvsfzewm4u.fsf-monnier+comp.arch@gnu.org>
Content-Language: en-US

by: Marcus - Fri, 13 Aug 2021 06:53 UTC

On 2021-08-12 20:36, Stefan Monnier wrote:
> Marcus [2021-08-12 19:08:37] wrote:
>> I posted this article on my blog:
>> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
>> ...basically stating the obvious (IMO), without putting too much
>> emphasis on how some alternatives solve some of the issues etc.
>> There were quite a few replies on reddit [1] and Hacker News [2],
>> and judging by the comments it seems that many software developers are
>> actually very fond of packed SIMD. :-/
>
> Most software developers are basically alien to computer architecture
> as an engineering discipline.
>
> The ISA is imposed on them from outside because it is decided by the
> machines they can use/buy. From that point of view, they can only
> understand your article as "don't use the SIMD primitives provided by
> your hardware" ;-)
>
>
> Stefan

Funny thing is that I come from a software developer background, and I
have hated those aspects mentioned in the article for as long as I can
remember.

When I first learned about how Cray-1 worked (some four years ago I
think) I was like "Wow! It must be a joy to program that thing".

Then I created my own vector ISA and wrote some demos, and I was like
"Wow! It's really a joy to program this thing".

I naively thought that others would have similar feelings.

/Marcus

Re: Three fundamental flaws of SIMD

<sf5ags$tpv$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19772&group=comp.arch#19772

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Fri, 13 Aug 2021 10:29:16 +0200
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <sf5ags$tpv$1@dont-email.me>
References: <sf3kim$l3o$1@dont-email.me>
<e1ef6d0d-cd1d-4046-b638-3b65d4a12693n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 13 Aug 2021 08:29:16 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="efb81c87f9ea9d52da321be3cff0e0f3";
logging-data="30527"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ZzxJYY2umDaIbwO1On2HiRIeFn5hRpNM="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:dM/3nce27k52YIZpD/HS7guZNYs=
In-Reply-To: <e1ef6d0d-cd1d-4046-b638-3b65d4a12693n@googlegroups.com>
Content-Language: en-US

by: Marcus - Fri, 13 Aug 2021 08:29 UTC

On 2021-08-13 00:59, Michael S wrote:
> On Thursday, August 12, 2021 at 8:08:41 PM UTC+3, Marcus wrote:
>> I posted this article on my blog:
>>
>> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
>>
>> ...basically stating the obvious (IMO), without putting too much
>> emphasis on how some alternatives solve some of the issues etc.
>>
>> There were quite a few replies on reddit [1] and Hacker News [2],
>> and judging by the comments it seems that many software developers are
>> actually very fond of packed SIMD. :-/
>>
>> /Marcus
>>
>>
>> [1]
>> https://www.reddit.com/r/programming/comments/p0yn45/three_fundamental_flaws_of_simd
>>
>> [2] https://news.ycombinator.com/item?id=28114934
>
> You could spend your time better than writing it.
> I could spend my time better than reading it.
>

Feel free to skip reading it :-)

I think that some people need to read it, though, since apparently it's
far from obvious to most people. I wrote it since I found myself
explaining the same things over and over in different forums, and it's
easier to have a text to refer to.

/Marcus

Re: Three fundamental flaws of SIMD

<sf5as4$fv$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19773&group=comp.arch#19773

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Fri, 13 Aug 2021 10:35:15 +0200
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <sf5as4$fv$1@dont-email.me>
References: <sf3kim$l3o$1@dont-email.me>
<fdc22e32-9142-4b8a-b8a2-df9a4e8a852fn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 13 Aug 2021 08:35:16 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="efb81c87f9ea9d52da321be3cff0e0f3";
logging-data="511"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19HGRPp+U6u8Xw5cROEZ/zQRJWKGgaD9Dk="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:8BBPabdxWVSTptiLdLpDQL6/ah4=
In-Reply-To: <fdc22e32-9142-4b8a-b8a2-df9a4e8a852fn@googlegroups.com>
Content-Language: en-US

by: Marcus - Fri, 13 Aug 2021 08:35 UTC

On 2021-08-13 02:05, MitchAlsup wrote:
> On Thursday, August 12, 2021 at 12:08:41 PM UTC-5, Marcus wrote:
>> I posted this article on my blog:
>>
>> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
>
> The fundamental flaw in SIMD is
>
> a) one should be able to encode a vectorizable loop once and have it
> run at the maximum performance of each future generation machine.
> One should never have to spit out different instructions just because
> the width of the SIMD registers was widened.
>
> b) one should be able to encode a SIMD instruction such that it performs
> as wide as the implementation has resources (or as narrow) using the
> same OpCodes.
>

Yep!

I also added:

c) Packed SIMD is just as sensitive to data hazards as scalar code is,
so you either need proper OoO HW, or you need to unroll all SIMD loops
in SW. And since /some/ implementations are in-order (at least for the
SIMD part), the compiler /always/ unrolls loops -> code size bloat ->
I$ penalty.

d) Tail handling for data sets that are not multiples of the SIMD
width...
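
To make (d) concrete, here is a rough sketch in plain C of the shape a
packed-SIMD loop ends up with. No intrinsics are used; LANES and the
function name are made up for illustration, and the inner lane loop
merely stands in for one packed instruction. The second loop exists
only to mop up the n % LANES leftover elements.

```c
#include <stddef.h>

/* Hypothetical fixed "SIMD width": 4 floats, standing in for one
   128-bit packed register. */
#define LANES 4

/* y[i] += a * x[i], structured the way a packed-SIMD loop is. */
void saxpy_packed_style(float a, const float *x, float *y, size_t n)
{
    size_t i = 0;

    /* Main loop: whole groups of LANES elements. On real hardware the
       inner lane loop would be a single packed multiply-add. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t lane = 0; lane < LANES; lane++)
            y[i + lane] += a * x[i + lane];
    }

    /* Tail loop: the remaining n % LANES elements, one at a time. */
    for (; i < n; i++)
        y[i] += a * x[i];
}
```

With n = 7 the main loop runs once (elements 0..3) and the tail loop
handles elements 4..6, so the loop body is effectively encoded twice.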

/Marcus

Re: Three fundamental flaws of SIMD

<sf5c0r$4vq$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=19775&group=comp.arch#19775

Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!aioe.org!pIhVuqI7njB9TMV+aIPpbg.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Fri, 13 Aug 2021 10:54:50 +0200
Organization: Aioe.org NNTP Server
Message-ID: <sf5c0r$4vq$1@gioia.aioe.org>
References: <sf3kim$l3o$1@dont-email.me>
<fdc22e32-9142-4b8a-b8a2-df9a4e8a852fn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="5114"; posting-host="pIhVuqI7njB9TMV+aIPpbg.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.8.1
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Fri, 13 Aug 2021 08:54 UTC

MitchAlsup wrote:
> On Thursday, August 12, 2021 at 12:08:41 PM UTC-5, Marcus wrote:
>> I posted this article on my blog:
>>
>> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
>
> The fundamental flaw in SIMD is
>
> a) one should be able to encode a vectorizable loop once and have it
> run at the maximum performance of each future generation machine.
> One should never have to spit out different instructions just because
> the width of the SIMD registers was widened.
>
> b) one should be able to encode a SIMD instruction such that it performs
> as wide as the implementation has resources (or as narrow) using the
> same OpCodes.

This has been done well in C#, they have vector operations that will
turn into optimal SIMD instructions during the final JIT/AOT stage, this
way they can optimize for the local CPU, and update the compiler for
future platforms.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Three fundamental flaws of SIMD

<2021Aug13.101107@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=19777&group=comp.arch#19777

Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Fri, 13 Aug 2021 08:11:07 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 38
Message-ID: <2021Aug13.101107@mips.complang.tuwien.ac.at>
References: <sf3kim$l3o$1@dont-email.me>
Injection-Info: reader02.eternal-september.org; posting-host="0fd349d651368433610776fd56ad7a3b";
logging-data="12157"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/pt9LNUYEmO849YsGpq+H4"
Cancel-Lock: sha1:Z4fpahpvWFWWNzrOHoHcLRKgneo=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Fri, 13 Aug 2021 08:11 UTC

Marcus <m.delete@this.bitsnbites.eu> writes:
>I posted this article on my blog:
>
>https://www.bitsnbites.eu/three-fundamental-flaws-of-simd

Flaw 1 is addressed with SVE. Interestingly, SVE has a flaw that
becomes apparent in the recently announced CPUs: processes cannot
migrate between cores with different register width, so they choose
128 bit wide registers for both small and large cores. How could SVE
have been modified to avoid this flaw? Does RVV have it?
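
For readers who have not seen a vector-length-agnostic loop, the idea
can be sketched in plain C as below. This is only an illustration, not
SVE or RVV code: set_vl() and hw_max_lanes are hypothetical stand-ins
for "ask the hardware how many elements it will process this
iteration" (compare RVV's vsetvli, or predicate generation in SVE).

```c
#include <stddef.h>

/* Hypothetical stand-in for the hardware choosing the vector length
   for this iteration: at most hw_max_lanes, never more than remain. */
static size_t set_vl(size_t remaining, size_t hw_max_lanes)
{
    return remaining < hw_max_lanes ? remaining : hw_max_lanes;
}

/* c[i] = a[i] + b[i]; the same code works for any hw_max_lanes > 0. */
void add_vla_style(const int *a, const int *b, int *c,
                   size_t n, size_t hw_max_lanes)
{
    for (size_t i = 0; i < n; ) {
        size_t vl = set_vl(n - i, hw_max_lanes);

        /* Stands in for one vector add over vl elements. */
        for (size_t lane = 0; lane < vl; lane++)
            c[i + lane] = a[i + lane] + b[i + lane];

        i += vl;
    }
}
```

The loop never names a register width, so the same binary gives the
same result with hw_max_lanes = 4 or 16. The sketch says nothing about
the migration problem, which concerns state held in the vector
registers while the width changes underneath the process.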

Flaw 2 starts out with a false claim. There is no requirement that
execution units are as wide as the registers, and AMD has delivered a
lot of CPUs where the execution units were half as wide as the
registers: Palomino and K8 implemented SSE (and, for K8, SSE2) with
64-bit wide functional units, Jaguar and later cats, Bulldozer and its
descendents, and Zen 1 implemented AVX-256 with 128-bit-wide
functional units. Could be a solution for ARM's SVE flaw.

In-order CPUs don't play a role for the AMD64 architecture, and
compilers certainly don't unroll AVX code for that (there is no
AVX-capable in-order CPU). Compilers unroll loops in order to reduce
loop overhead, and for additional optimization options.

Flaw 3: In the comments section, you reveal that tail handling is a
problem for auto-vectorization. My take is that auto-vectorization is
a flawed concept, mainly because it is unreliable: you rub the
compiler the wrong way, and it will stop auto-vectorizing the loop
without giving any warning. Manual vectorization is a better
approach, and when done right, tail handling is no problem. See
Sections 2.2-2.4 of

http://www.complang.tuwien.ac.at/papers/ertl18manlang.pdf

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Three fundamental flaws of SIMD

<a31cb882-3f8a-4f44-b4de-971a4c27af29n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19778&group=comp.arch#19778

Newsgroups: comp.arch

X-Received: by 2002:a05:622a:15c4:: with SMTP id d4mr1380039qty.350.1628852213012;
Fri, 13 Aug 2021 03:56:53 -0700 (PDT)
X-Received: by 2002:aca:59c6:: with SMTP id n189mr1728716oib.44.1628852212715;
Fri, 13 Aug 2021 03:56:52 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 13 Aug 2021 03:56:52 -0700 (PDT)
In-Reply-To: <2021Aug13.101107@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=87.68.183.224; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 87.68.183.224
References: <sf3kim$l3o$1@dont-email.me> <2021Aug13.101107@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a31cb882-3f8a-4f44-b4de-971a4c27af29n@googlegroups.com>
Subject: Re: Three fundamental flaws of SIMD
From: already5...@yahoo.com (Michael S)
Injection-Date: Fri, 13 Aug 2021 10:56:53 +0000
Content-Type: text/plain; charset="UTF-8"

by: Michael S - Fri, 13 Aug 2021 10:56 UTC

On Friday, August 13, 2021 at 12:05:43 PM UTC+3, Anton Ertl wrote:
> Marcus <m.de...@this.bitsnbites.eu> writes:
> >I posted this article on my blog:
> >
> >https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
> Flaw 1 is addressed with SVE. Interestingly, SVE has a flaw that
> becomes apparent in the recently announced CPUs: processes cannot
> migrate between cores with different register width, so they choose
> 128 bit wide registers for both small and large cores. How could SVE
> have been modified to avoid this flaw? Does RVV have it?
>
> Flaw 2 starts out with a false claim. There is no requirement that
> execution units are as wide as the registers, and AMD has delivered a
> lot of CPUs where the execution units were half as wide as the
> registers: Palomino and K8 implemented SSE (and, for K8, SSE2) with
> 64-bit wide functional units, Jaguar and later cats, Bulldozer and its
> descendents, and Zen 1 implemented AVX-256 with 128-bit-wide
> functional units. Could be a solution for ARM's SVE flaw.
>

Intel had 64-bit FP EUs for SSE/SSE2/SSE3 in their main CPU line from the introduction of SSE up until Merom,
i.e. ~8 years later.
On the Atom side, single-precision EUs were 128-bit, but double-precision was half-width for many generations.
Maybe it still is; I didn't check.

I am a fan of a >1 ratio between register width and EU width myself.
But the reasonable ratios nowadays, i.e. in the presence of
 - caches,
 - wide superscalar cores,
 - several cores sharing at least part of the on-chip cache,
 - power consumption as the main bottleneck,
 - no fewer than 16 software-visible VRs,
 - other things I forgot,
are 1, 2 or 4.
A CRAY-like ratio today is a horrible idea.
IIRC, the Alpha Tarantula proposal argued for 8, but its cache subsystem was very different from how it's done today,
and power consumption was low on its list of priorities.

> In-order CPUs don't play a role for the AMD64 architecture, and
> compilers certainly don't unroll AVX code for that (there is no
> AVX-capable in-order CPU). Compilers unroll loops in order to reduce
> loop overhead, and for additional optimization options.
>

One reason to unroll, even on OoO, is a true (RaW) dependency through the accumulator,
which is very typical in inner-product loops.
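
A scalar C sketch of what I mean (function names are made up for
illustration; the same reasoning applies when each accumulator is a
SIMD register). In the first version every add has to wait for the
previous one. In the second, four independent accumulators give the
core four dependency chains to overlap.

```c
#include <stddef.h>

/* One accumulator: each add depends on the result of the previous
   add, so the loop is bound by add latency (RaW chain). */
double dot_single(const double *x, const double *y, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}

/* Unrolled by 4 with independent accumulators: four separate chains
   that can be in flight at the same time. */
double dot_unrolled4(const double *x, const double *y, size_t n)
{
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc0 += x[i]     * y[i];
        acc1 += x[i + 1] * y[i + 1];
        acc2 += x[i + 2] * y[i + 2];
        acc3 += x[i + 3] * y[i + 3];
    }

    /* Leftover elements go into the first accumulator. */
    for (; i < n; i++)
        acc0 += x[i] * y[i];

    return (acc0 + acc1) + (acc2 + acc3);
}
```

The second version sums in a different order, so with general
floating-point data the result can differ in the last bits. That is
also why a compiler will not make this transformation on its own
unless told that reassociation is allowed.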

> Flaw 3: In the comments section, you reveal that tail handling is a
> problem for auto-vectorization. My take is that auto-vectorization is
> a flawed concept, mainly because it is unreliable: you rub the
> compiler the wrong way, and it will stop auto-vectorizing the loop
> without giving any warning. Manual vectorization is a better
> approach, and when done right, tail handling is no problem.

Very true.

> See
> Sections 2.2-2.4 of
>
> http://www.complang.tuwien.ac.at/papers/ertl18manlang.pdf
>
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Three fundamental flaws of SIMD

<sf5m4o$9hs$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19780&group=comp.arch#19780

Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Fri, 13 Aug 2021 13:47:36 +0200
Organization: A noiseless patient Spider
Lines: 36
Message-ID: <sf5m4o$9hs$1@dont-email.me>
References: <sf3kim$l3o$1@dont-email.me>
<fdc22e32-9142-4b8a-b8a2-df9a4e8a852fn@googlegroups.com>
<sf5c0r$4vq$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 13 Aug 2021 11:47:37 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="efb81c87f9ea9d52da321be3cff0e0f3";
logging-data="9788"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Xo+baq8Q9hOarCKFKXQjyPWleP+3FATM="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:mIVaQutpClZchqothrNaTd6bVw4=
In-Reply-To: <sf5c0r$4vq$1@gioia.aioe.org>
Content-Language: en-US

by: Marcus - Fri, 13 Aug 2021 11:47 UTC

On 2021-08-13 10:54, Terje Mathisen wrote:
> MitchAlsup wrote:
>> On Thursday, August 12, 2021 at 12:08:41 PM UTC-5, Marcus wrote:
>>> I posted this article on my blog:
>>>
>>> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
>>
>> The fundamental flaw in SIMD is
>>
>> a) one should be able to encode a vectorizable loop once and have it
>> run at the maximum performance of each future generation machine.
>> One should never have to spit out different instructions just because
>> the width of the SIMD registers was widened.
>>
>> b) one should be able to encode a SIMD instruction such that it performs
>> as wide as the implementation has resources (or as narrow) using the
>> same OpCodes.
>
> This has been done well in C#, they have vector operations that will
> turn into optimal SIMD instructions during the final JIT/AOT stage, this
> way they can optimize for the local CPU, and update the compiler for
> future platforms.

...which probably means that they will be able to map well to other
kinds of vector architectures too.

Even if a compiler / language can hide the details of the underlying
vector architecture, packed SIMD still suffers from code bloat though:
Essentially you need to describe things like pipelining and tail
handling in code rather than having the HW take care of it, which hurts
code density /and/ increases front end load.

>
> Terje
>

Re: Three fundamental flaws of SIMD

<sf5mbg$a8d$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19781&group=comp.arch#19781

Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Fri, 13 Aug 2021 13:51:12 +0200
Organization: A noiseless patient Spider
Lines: 9
Message-ID: <sf5mbg$a8d$1@dont-email.me>
References: <sf3kim$l3o$1@dont-email.me>
<f22f22d7-bd99-4d94-bc51-f230bf7428c8n@googlegroups.com>
<sf4olc$1o8$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 13 Aug 2021 11:51:12 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="efb81c87f9ea9d52da321be3cff0e0f3";
logging-data="10509"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/2nT7/LNhjxiJa7WXzHI0MRwEW52BAiJQ="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:CKCZzNAhrI+Ef+do0gXzBEFgbO8=
In-Reply-To: <sf4olc$1o8$1@dont-email.me>
Content-Language: en-US

by: Marcus - Fri, 13 Aug 2021 11:51 UTC

On 2021-08-13 05:24, Ivan Godard wrote:

[snip]

> Fairly described as "ISA Stockholm Syndrome"?

I'll have to remember that! :-D

/Marcus

Re: Three fundamental flaws of SIMD

<sf64tc$kfe$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19787&group=comp.arch#19787

Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Fri, 13 Aug 2021 10:59:38 -0500
Organization: A noiseless patient Spider
Lines: 171
Message-ID: <sf64tc$kfe$1@dont-email.me>
References: <sf3kim$l3o$1@dont-email.me>
<2021Aug13.101107@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 13 Aug 2021 15:59:40 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="31b55375a95f58d3f44c305fc14e5da9";
logging-data="20974"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19fcMVVl6yalJN9+yWmkp7n"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
Cancel-Lock: sha1:WAw+JImz500CT3i4hQgR7nuHXA8=
In-Reply-To: <2021Aug13.101107@mips.complang.tuwien.ac.at>
Content-Language: en-US

by: BGB - Fri, 13 Aug 2021 15:59 UTC

On 8/13/2021 3:11 AM, Anton Ertl wrote:
> Marcus <m.delete@this.bitsnbites.eu> writes:
>> I posted this article on my blog:
>>
>> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
>
> Flaw 1 is addressed with SVE. Interestingly, SVE has a flaw that
> becomes apparent in the recently announced CPUs: processes cannot
> migrate between cores with different register width, so they choose
> 128 bit wide registers for both small and large cores. How could SVE
> have been modified to avoid this flaw? Does RVV have it?
>

IME, moving to vectors larger than 128 bits doesn't tend to gain much
over 128 bits. As the register size gets larger, there is less that can
utilize it effectively.

It is like integer size:
16 bits: can be used heavily, may or may not be sufficient;
32 bits: used heavily, usually sufficient;
64 bits: used occasionally, often useful;
128 bits: used rarely, sometimes useful;
256 bits: novelty size...

For SIMD vectors, there is roughly a factor of 4 relation:
64 bits: can be used heavily, frequently insufficient;
128 bits: used heavily, usually sufficient;
256 bits: used occasionally, sometimes useful;
512 bits: rarely useful.

Once the SIMD vector exceeds the length of the data one would likely
express using vectors, its utility drops off sharply.

The march to endlessly bigger SIMD vectors would make sense if each step
up gave a linear improvement, but makes less sense if it is in
diminishing returns territory.

So, this is part of why my current policy for BJX2 is doing 64 and 128
bit "vectors" and calling it good enough.

Not going to go 256 bits, at least not on a 3-wide core.

Similar tricks to what I used for 128-bit vectors could be used to allow
256-bit vectors on a 6-wide / SMT capable core, since most of the
mechanisms and plumbing would already be in place. Such a vector would
effectively use groups of 4 registers in parallel (and ganging the
memory ports from both SMT threads to allow 256-bit load/store).

So, implicitly, if I had a GSVY, it would implicitly depend on the
existence of WEX-6W...

But, then one might ask: OK, so why not then just issue a 128-bit SIMD
operation on Lane1A and Lane1B?...

And, I could respond: What exactly do you think it is that these 256-bit
operations would be doing?...

For related reasons, one can't have 128-bit operations on a core that
doesn't already support WEX-3W.

> Flaw 2 starts out with a false claim. There is no requirement that
> execution units are as wide as the registers, and AMD has delivered a
> lot of CPUs where the execution units were half as wide as the
> registers: Palomino and K8 implemented SSE (and, for K8, SSE2) with
> 64-bit wide functional units, Jaguar and later cats, Bulldozer and its
> descendents, and Zen 1 implemented AVX-256 with 128-bit-wide
> functional units. Could be a solution for ARM's SVE flaw.
>

FWIW: Despite having 128-bit SIMD, my BJX2 core doesn't actually
(currently) have any units which work natively with 128-bit data.

The register file still mostly operates as 64 bits, and many 128 bit
operations are effectively performed by running two 64-bit operations in
parallel. In many cases, it is simply expanding a single SIMD
instruction over multiple lanes (effectively, a bundle of virtual
operations).

In the ALUX extensions, in a few cases, the ALUs effectively combine
"Voltron style" by feeding bits between each other.

The 128-bit shift operations are effectively two 64-bit funnel shift
operations "in disguise", ...
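
As a rough software analogy (this is not the BJX2 logic, just a C
sketch with made-up names u128, funnel_shl64 and shl128), a 128-bit
left shift can be composed from two 64-bit funnel shifts like so:

```c
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128;

/* Funnel shift: treat hi:lo as one 128-bit value, shift it left by
   s (0..63), and return the upper 64 bits. The s == 0 case is split
   out because shifting a 64-bit value by 64 is undefined in C. */
static uint64_t funnel_shl64(uint64_t hi, uint64_t lo, unsigned s)
{
    return s == 0 ? hi : (hi << s) | (lo >> (64 - s));
}

/* 128-bit left shift (shift count taken modulo 128), built from two
   64-bit funnel shifts. */
u128 shl128(u128 v, unsigned s)
{
    u128 r;
    s &= 127;
    if (s >= 64) {
        /* The low half moves entirely into the high half. */
        r.hi = funnel_shl64(v.lo, 0, s - 64);
        r.lo = 0;
    } else {
        r.hi = funnel_shl64(v.hi, v.lo, s);  /* bits cross from lo to hi */
        r.lo = funnel_shl64(v.lo, 0, s);     /* zeros shift in from below */
    }
    return r;
}
```

Each output half is one funnel shift over a pair of 64-bit inputs, so
no 128-bit-wide shifter is needed.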

Well, also having 64-bit units which can occasionally combine for
128-bit operations is a lot less wasteful than having a 128-bit unit
where 99% of the time its capabilities go unused.

And, if one's data doesn't fit nicely into a 128-bit vector... they can
often use two different 64-bit vector ops in parallel.

Packed Integer SIMD isn't done by adding more ALUs, but rather by
splitting up the sub-units within a carry-select adder:
If you select the results where each 16-bit element had a carry-in of
zero, a packed-word ADD magically appears.
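
The same "don't let the carry cross the element boundary" idea can be
shown in software. This is a common SWAR trick in C, not the adder
logic itself; padd16 is a made-up name for illustration:

```c
#include <stdint.h>

/* Add four packed 16-bit elements held in one 64-bit word, with no
   carry propagating from one element into the next. */
uint64_t padd16(uint64_t a, uint64_t b)
{
    /* Top bit of each 16-bit element. */
    const uint64_t H = 0x8000800080008000ULL;

    /* Add the low 15 bits of every element. The largest possible sum
       (0x7FFF + 0x7FFF) still fits in 16 bits, so nothing spills into
       the neighbouring element. */
    uint64_t low = (a & ~H) + (b & ~H);

    /* Fold in the top bits with XOR, i.e. an add whose carry-out is
       simply dropped at each element boundary. */
    return low ^ ((a ^ b) & H);
}
```

For example, the element 0xFFFF plus 0x0001 wraps to 0x0000 without
disturbing the element above it, whereas a plain 64-bit add would
carry into it.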

For FPU SIMD, there is only one FPU, so SIMD operations internally just
do 4 operations in a row (by sequentially feeding the values into the
FADD or FMUL units and capturing the result out the other end).

I did it the way I did not necessarily because it was "best", but
because I could do it cheaply...

There are ways it could be made better/faster/..., but they would add
resource cost.

Similarly, "going narrower" and presenting vector operations in terms of
scalar operations on individual elements would either:
Slow everything down by essentially turning it back into scalar code;
Require adding a large number of execute lanes and register ports.

Neither of these options is desirable.

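> In-order CPUs don't play a role for the AMD64 architecture, and
> compilers certainly don't unroll AVX code for that (there is no
> AVX-capable in-order CPU). Compilers unroll loops in order to reduce
> loop overhead, and for additional optimization options.
>

Yep.

> Flaw 3: In the comments section, you reveal that tail handling is a
> problem for auto-vectorization. My take is that auto-vectorization is
> a flawed concept, mainly because it is unreliable: you rub the
> compiler the wrong way, and it will stop auto-vectorizing the loop
> without giving any warning. Manual vectorization is a better
> approach, and when done right, tail handling is no problem. See
> Sections 2.2-2.4 of
>
> http://www.complang.tuwien.ac.at/papers/ertl18manlang.pdf
>

Generally agreed.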
I am actively against autovectorization on BJX2.
To what extent vectors are usable, they are in the form of explicit
language extensions.

I have done things in a way which I feel doesn't suck nearly as
bad as the "xmmintrin.h" system, and also allows a subset of the GCC
vector extensions.

My approach also tries to be much less of an "ugly wart" on the C
language, so does things in ways that I feel are more consistent with
traditional C semantics (can use native operators, cast conversions, ...).
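
For reference, this is roughly what the GCC-style vector extensions
look like (standard GCC/Clang syntax; vec4f and madd4 are names made
up for the example, and which parts of this fall inside the subset
supported on BJX2 is not something this sketch tries to show):

```c
/* GCC/Clang vector extension: a 16-byte vector of four floats. */
typedef float vec4f __attribute__((vector_size(16)));

/* Element-wise a * b + c, written with ordinary C operators rather
   than intrinsic calls. */
vec4f madd4(vec4f a, vec4f b, vec4f c)
{
    return a * b + c;
}
```

Vectors can be initialized with braces, e.g. vec4f a = {1, 2, 3, 4};
and individual elements read back with a[0], so the code reads much
like normal C arithmetic.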

Though, this does mean some amount of expansion to the typesystem, but
as I feel it, expanding out the C typesystem and numeric tower is much
less evil than something like auto-vectorization, which implies
interpreting C code as if it were something very unlike what was
actually written (and for the compiler to essentially perform magic
tricks with the semantics), or "xmmintrin.h" which is more like thinly
veiled x86 ASM shoehorned into C's existing syntax.

Granted, it does also mean that, unless written to do so, existing
portable C code will not make any use of these sorts of vector extensions.

...

Re: Three fundamental flaws of SIMD

<f4dea063-fb39-48ea-a9d6-cf6663060ae5n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19792&group=comp.arch#19792

Newsgroups: comp.arch

X-Received: by 2002:a37:96c2:: with SMTP id y185mr3035469qkd.6.1628871612435;
Fri, 13 Aug 2021 09:20:12 -0700 (PDT)
X-Received: by 2002:a9d:266a:: with SMTP id a97mr1295987otb.114.1628871612195;
Fri, 13 Aug 2021 09:20:12 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 13 Aug 2021 09:20:12 -0700 (PDT)
In-Reply-To: <sf54se$t9b$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <sf3kim$l3o$1@dont-email.me> <jwvsfzewm4u.fsf-monnier+comp.arch@gnu.org>
<sf54se$t9b$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f4dea063-fb39-48ea-a9d6-cf6663060ae5n@googlegroups.com>
Subject: Re: Three fundamental flaws of SIMD
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 13 Aug 2021 16:20:12 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Fri, 13 Aug 2021 16:20 UTC

On Friday, August 13, 2021 at 1:53:04 AM UTC-5, Marcus wrote:
> On 2021-08-12 20:36, Stefan Monnier wrote:
> > Marcus [2021-08-12 19:08:37] wrote:
> >> I posted this article on my blog:
> >> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
> >> ...basically stating the obvious (IMO), without putting too much
> >> emphasis on how some alternatives solve some of the issues etc.
> >> There were quite a few replies on reddit [1] and Hacker News [2],
> >> and judging by the comments it seems that many software developers are
> >> actually very fond of packed SIMD. :-/
> >
> > Most software developers are basically alien to computer architecture
> > as an engineering discipline.
> >
> > The ISA is imposed on them from outside because it is decided by the
> > machines they can use/buy. From that point of view, they can only
> > understand your article as "don't use the SIMD primitives provided by
> > your hardware" ;-)
> >
> >
> > Stefan
> Funny thing is that I come from a software developer background, and I
> have hated those aspects mentioned in the article for as long as I can
> remember.
>
> When I first learned about how Cray-1 worked (some four years ago I
> think) I was like "Wow! It must be a joy to program that thing".
>
> Then I created my own vector ISA and wrote some demos, and I was like
> "Wow! It's really a joy to program this thing".
<
You would enjoy programming in My 66000 ISA.
>
> I naively thought that others would have similar feelings.
>
> /Marcus

Re: Three fundamental flaws of SIMD

<d4469965-242d-4e4d-bc17-5f695b4acb18n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19793&group=comp.arch#19793

Newsgroups: comp.arch

X-Received: by 2002:ac8:57c4:: with SMTP id w4mr2678136qta.39.1628872174312; Fri, 13 Aug 2021 09:29:34 -0700 (PDT)
X-Received: by 2002:a05:6808:1509:: with SMTP id u9mr2757460oiw.119.1628872173896; Fri, 13 Aug 2021 09:29:33 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!newsfeed.xs4all.nl!newsfeed9.news.xs4all.nl!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 13 Aug 2021 09:29:33 -0700 (PDT)
In-Reply-To: <sf64tc$kfe$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <sf3kim$l3o$1@dont-email.me> <2021Aug13.101107@mips.complang.tuwien.ac.at> <sf64tc$kfe$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d4469965-242d-4e4d-bc17-5f695b4acb18n@googlegroups.com>
Subject: Re: Three fundamental flaws of SIMD
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 13 Aug 2021 16:29:34 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 174

by: MitchAlsup - Fri, 13 Aug 2021 16:29 UTC

On Friday, August 13, 2021 at 10:59:42 AM UTC-5, BGB wrote:
> On 8/13/2021 3:11 AM, Anton Ertl wrote:
> > Marcus <m.de...@this.bitsnbites.eu> writes:
> >> I posted this article on my blog:
> >>
> >> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
> >
> > Flaw 1 is addressed with SVE. Interestingly, SVE has a flaw that
> > becomes apparent in the recently announced CPUs: processes cannot
> > migrate between cores with different register width, so they choose
> > 128 bit wide registers for both small and large cores. How could SVE
> > have been modified to avoid this flaw? Does RVV have it?
> >
> IME, moving to vectors larger than 128 bits doesn't tend to gain much
> over 128 bits. As the register size gets larger, there is less that can
> utilize it effectively.
>
> It is like integer size:
> 16 bits: can be used heavily, may or may not be sufficient;
> 32 bits: used heavily, usually sufficient;
> 64 bits: used occasionally, often useful;
> 128 bits: used rarely, sometimes useful;
> 256 bits: novelty size...
>
> For SIMD vectors, there is roughly a factor of 4 relation:
> 64 bits: can be used heavily, frequently insufficient;
> 128 bits: used heavily, usually sufficient;
> 256 bits: used occasionally, sometimes useful;
> 512 bits: rarely useful.
>
> Once the SIMD vector exceeds the length of the data one would likely
> express using vectors, its utility drops off sharply.
>
>
> The march to endlessly bigger SIMD vectors would make sense if each step
> up gave a linear improvement, but makes less sense if it is in
> diminishing returns territory.
<
I am afraid I have to disagree with you here:: The march to endlessly bigger
SIMD vectors is because the root of the necessity was never addressed
correctly.
<
For example, all of the SIMD instructions in x86-64 are "performed"
by the addition of exactly 2 My 66000 instructions, and many more
than x86-64 offers are possible to express. ISA explosions should be proactively
prevented, not embraced.
>
>
> So, this is part of why my current policy for BJX2 is doing 64 and 128
> bit "vectors" and calling it good enough.
>
> Not going to go 256 bits, at least not on a 3-wide core.
<
Not going to 256-bits wide EVER.
>
>
> Similar tricks to what I used for 128-bit vectors could be used to allow
> 256-bit vectors on a 6-wide / SMT capable core, since most of the
> mechanisms and plumbing would already be in place. Such a vector would
> effectively use groups of 4 registers in parallel (and ganging the
> memory ports from both SMT threads to allow 256-bit load/store).
>
> So, implicitly, if I had a GSVY, it would implicitly depend on the
> existence of WEX-6W...
>
>
> But, then one might ask: OK, so why not then just issue a 128-bit SIMD
> operation on Lane1A and Lane1B?...
>
> And, I could respond: What exactly do you think it is that these 256-bit
> operations would be doing?...
>
> For related reasons, one can't have 128-bit operations on a core that
> doesn't already support WEX-3W.
> > Flaw 2 starts out with a false claim. There is no requirement that
> > execution units are as wide as the registers, and AMD has delivered a
> > lot of CPUs where the execution units were half as wide as the
> > registers: Palomino and K8 implemented SSE (and, for K8, SSE2) with
> > 64-bit wide functional units, Jaguar and later cats, Bulldozer and its
> > descendents, and Zen 1 implemented AVX-256 with 128-bit-wide
> > functional units. Could be a solution for ARM's SVE flaw.
> >
> FWIW: Despite having 128-bit SIMD, my BJX2 core doesn't actually
> (currently) have any units which work natively with 128-bit data.
>
> The register file still mostly operates as 64 bits, and many 128 bit
> operations are effectively performed by running two 64-bit operations in
> parallel. In many cases, it is simply expanding a single SIMD
> instruction over multiple lanes (effectively, a bundle of virtual
> operations).
>
> In the ALUX extensions, in a few cases, the ALUs effectively combine
> "Voltron style" by feeding bits between each other.
>
> The 128-bit shift operations are effectively two 64-bit funnel shift
> operations "in disguise", ...
>
>
> Well, also having 64-bit units which can occasionally combine for
> 128-bit operations is a lot less wasteful than having a 128-bit unit
> where 99% of the time its capabilities go unused.
>
> And, if one's data doesn't fit nicely into a 128-bit vector... they can
> often use two different 64-bit vector ops in parallel.
<
All of these wide things are performed using the CARRY instruction.
>
>
>
> Packed Integer SIMD isn't done by adding more ALUs, but rather by
> splitting up the sub-units within a carry-select adder:
> If you select the results where each 16-bit element had a carry-in of
> zero, a packed-word ADD magically appears.
>
>
> For FPU SIMD, there is only one FPU, so SIMD operations internally just
> do 4 operations in a row (by sequentially feeding the values into the
> FADD or FMUL units and capturing the result out the other end).
>
>
> I did it the way I did not necessarily because it was "best", but
> because I could do it cheaply...
>
> There are ways it could be made better/faster/..., but they would add
> resource cost.
>
>
> Similarly, "going narrower" and presenting vector operations in terms of
> scalar operations on individual elements would either:
> Slow everything down by essentially turning it back into scalar code;
> Require adding a large number of execute lanes and register ports.
>
> Neither of these options is desirable.
> > In-order CPUs don't play a role for the AMD64 architecture, and
> > compilers certainly don't unroll AVX code for that (there is no
> > AVX-capable in-order CPU). Compilers unroll loops in order to reduce
> > loop overhead, and for additional optimization options.
> >
> Yep.
> > Flaw 3: In the comments section, you reveal that tail handling is a
> > problem for auto-vectorization. My take is that auto-vectorization is
> > a flawed concept, mainly because it is unreliable: you rub the
> > compiler the wrong way, and it will stop auto-vectorizing the loop
> > without giving any warning. Manual vectorization is a better
> > approach, and when done right, tail handling is no problem. See
> > Sections 2.2-2.4 of
> >
> > http://www.complang.tuwien.ac.at/papers/ertl18manlang.pdf
> >
> Generally agreed.
> I am actively against autovectorization on BJX2.
> To what extent vectors are usable, they are in the form of explicit
> language extensions.
>
> I have done things in a way which I feel doesn't suck nearly as
> bad as the "xmmintrin.h" system, and also allows a subset of the GCC
> vector extensions.
>
> My approach also tries to be much less of an "ugly wart" on the C
> language, so does things in ways that I feel are more consistent with
> traditional C semantics (can use native operators, cast conversions, ...).
>
>
> Though, this does mean some amount of expansion to the typesystem, but
> as I feel it, expanding out the C typesystem and numeric tower is much
> less evil than something like auto-vectorization, which implies
> interpreting C code as if it were something very unlike what was
> actually written (and for the compiler to essentially perform magic
> tricks with the semantics), or "xmmintrin.h" which is more like thinly
> veiled x86 ASM shoehorned into C's existing syntax.
>
> Granted, it does also mean that, unless written to do so, existing
> portable C code will not make any use of these sorts of vector extensions.
>
> ...

Re: Three fundamental flaws of SIMD

<sf67go$bsd$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=19795&group=comp.arch#19795

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-3b2f-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Fri, 13 Aug 2021 16:44:08 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sf67go$bsd$1@newsreader4.netcologne.de>
References: <sf3kim$l3o$1@dont-email.me>
<2021Aug13.101107@mips.complang.tuwien.ac.at> <sf64tc$kfe$1@dont-email.me>
Injection-Date: Fri, 13 Aug 2021 16:44:08 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-3b2f-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd6:3b2f:0:7285:c2ff:fe6c:992d";
logging-data="12173"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Fri, 13 Aug 2021 16:44 UTC

BGB <cr88192@gmail.com> schrieb:
> On 8/13/2021 3:11 AM, Anton Ertl wrote:
>> Marcus <m.delete@this.bitsnbites.eu> writes:
>>> I posted this article on my blog:
>>>
>>> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
>>
>> Flaw 1 is addressed with SVE. Interestingly, SVE has a flaw that
>> becomes apparent in the recently announced CPUs: processes cannot
>> migrate between cores with different register width, so they choose
>> 128 bit wide registers for both small and large cores. How could SVE
>> have been modified to avoid this flaw? Does RVV have it?
>>
>
> IME, moving to vectors larger than 128 bits doesn't tend to gain much
> over 128 bits. As the register size gets larger, there is less that can
> utilize it effectively.

I grant you that there is a decreasing return, which currently
levels off at 256 bits - AVX2 is being used a lot in video codecs.

AVX512 is a fiasco, but mainly because Intel overspent its heat
budget and has to clock down the CPU a _lot_ to use it, destroying
most, if not all, of the advantage in using it.

> Generally agreed.
> I am actively against autovectorization on BJX2.
> To what extent vectors are usable, they are in the form of explicit
> language extensions.

Count me out for using your architecture, then.

Explicit language extensions

- lock in the user to a specific architecture and compiler
- expose architecture details which should not be visible
- make code hard to write, read and thus maintain

[...]

> My approach also tries to be much less of an "ugly wart" on the C
> language, so does things in ways that I feel are more consistent with
> traditional C semantics (can use native operators, cast conversions, ...).

There are other programming languages than C. What would you
propose for Fortran, for example?

Re: Three fundamental flaws of SIMD

<sf67ie$bsd$2@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=19796&group=comp.arch#19796

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-3b2f-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Fri, 13 Aug 2021 16:45:02 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sf67ie$bsd$2@newsreader4.netcologne.de>
References: <sf3kim$l3o$1@dont-email.me>
<jwvsfzewm4u.fsf-monnier+comp.arch@gnu.org> <sf54se$t9b$1@dont-email.me>
<f4dea063-fb39-48ea-a9d6-cf6663060ae5n@googlegroups.com>
Injection-Date: Fri, 13 Aug 2021 16:45:02 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-3b2f-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd6:3b2f:0:7285:c2ff:fe6c:992d";
logging-data="12173"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Fri, 13 Aug 2021 16:45 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:

> You would enjoy programming in My 66000 ISA.

So would I.

Any chance of this happening in the foreseeable future?

Re: Three fundamental flaws of SIMD

<sf6b3e$2n6$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19800&group=comp.arch#19800

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Fri, 13 Aug 2021 12:45:16 -0500
Organization: A noiseless patient Spider
Lines: 97
Message-ID: <sf6b3e$2n6$1@dont-email.me>
References: <sf3kim$l3o$1@dont-email.me>
<2021Aug13.101107@mips.complang.tuwien.ac.at> <sf64tc$kfe$1@dont-email.me>
<sf67go$bsd$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 13 Aug 2021 17:45:18 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="31b55375a95f58d3f44c305fc14e5da9";
logging-data="2790"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/je0DoNBxffuXHVCdg1XN6"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
Cancel-Lock: sha1:oNOtaoGgqSnyo569PTZ60Wm+QvA=
In-Reply-To: <sf67go$bsd$1@newsreader4.netcologne.de>
Content-Language: en-US

by: BGB - Fri, 13 Aug 2021 17:45 UTC

On 8/13/2021 11:44 AM, Thomas Koenig wrote:
> BGB <cr88192@gmail.com> schrieb:
>> On 8/13/2021 3:11 AM, Anton Ertl wrote:
>>> Marcus <m.delete@this.bitsnbites.eu> writes:
>>>> I posted this article on my blog:
>>>>
>>>> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
>>>
>>> Flaw 1 is addressed with SVE. Interestingly, SVE has a flaw that
>>> becomes apparent in the recently announced CPUs: processes cannot
>>> migrate between cores with different register width, so they choose
>>> 128 bit wide registers for both small and large cores. How could SVE
>>> have been modified to avoid this flaw? Does RVV have it?
>>>
>>
>> IME, moving to vectors larger than 128 bits doesn't tend to gain much
>> over 128 bits. As the register size gets larger, there is less that can
>> utilize it effectively.
>
> I grant you that there is a decreasing return, which currently
> levels off at 256 bits - AVX2 is being used a lot in video codecs.
>
> AVX512 is a fiasco, but mainly because Intel overspent its heat
> budget and has to clock down the CPU a _lot_ to use it, destroying
> most, if not all, of the advantage in using it.
>
>
>> Generally agreed.
>> I am actively against autovectorization on BJX2.
>> To what extent vectors are usable, they are in the form of explicit
>> language extensions.
>
> Count me out for using your architecture, then.
>
> Explicit language extensions
>
> - lock in the user to a specific architecture and compiler
> - expose architecture details which should not be visible
> - make code hard to write, read and thus maintain
>

The traditional solution to this is to use ifdefs.

One already has to do this if they want the same code to use SSE and
NEON in a semi-effective manner.

Or use the common subset that exists with GCC's
"__attribute__((vector_size(N)))" system.

Otherwise, it is like saying that no one can use inline ASM, or that one
will refuse to use a compiler which supports using inline ASM.

One can just write traditional scalar code, and have it perform as such.
Its performance may suck in comparison, but it isn't precluded.
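
A minimal sketch of how those options fit together, assuming a GCC-compatible compiler for the vector path. The helper name add4 and the type name vec4f are made up for illustration and are not taken from BJX2 or any real codebase. The same four-float add goes through GCC's vector_size extension when available, and through plain scalar code otherwise:

```c
#include <assert.h>
#include <string.h>

/* Assumption for this sketch: GCC and Clang define __GNUC__ and accept
   the vector_size attribute; anything else takes the scalar path. */
#if defined(__GNUC__)

typedef float vec4f __attribute__((vector_size(16)));

/* add4 is a hypothetical helper: out[i] = a[i] + b[i] for i in 0..3. */
static void add4(const float *a, const float *b, float *out)
{
    vec4f va, vb, vr;
    memcpy(&va, a, sizeof va);   /* load without assuming alignment */
    memcpy(&vb, b, sizeof vb);
    vr = va + vb;                /* native operator, applied element-wise */
    memcpy(out, &vr, sizeof vr);
}

#else

/* Portable fallback: same result, just plain scalar code. */
static void add4(const float *a, const float *b, float *out)
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}

#endif
```

Callers only ever see add4(a, b, out), so the ifdef stays confined to one place. Whether the vector path is actually faster depends on the target and the compiler.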

IME, auto vectorization only sometimes helps on traditional targets, and
frequently (if the optimizer is overzealous) can turn into a wrecking
ball on the performance front (performing worse, sometimes
significantly, than its non vectorized counterpart).

If it were up to me though, there would be explicit attributes to tell
the compiler what it should or should not vectorize.

Or, maybe, if some standardized way were defined to specify vector types
in C (preferably more concise than GCC's notation).

Say, for example, if the compiler allowed:
float[[vector(4)]] vec; //define a 4-element floating point vector.
Or:
[[vector(4)]] float vec; //basically equivalent.

> [...]
>
>> My approach also tries to be much less of an "ugly wart" on the C
>> language, so does things in ways that I feel are more consistent with
>> traditional C semantics (can use native operators, cast conversions, ...).
>
> There are other programming languages than C. What would you
> propose for Fortran, for example?
>

Dunno, not enough overlap between use-cases.

BJX2 is not intended for supercomputers or scientific computing; rather,
I was more intending it for robot and machine control tasks. Originally
this was partly to address some annoyances I had with using ARM based
controllers for these tasks.

This part is an uphill battle, though. Despite its much slower
clock speeds, it fares relatively well at some of the tasks I intended
it for, but sadly some tasks are seriously hindered by the available
memory bandwidth.

Re: Three fundamental flaws of SIMD

<sf6dfu$k7k$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19801&group=comp.arch#19801

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Three fundamental flaws of SIMD
Date: Fri, 13 Aug 2021 13:26:05 -0500
Organization: A noiseless patient Spider
Lines: 234
Message-ID: <sf6dfu$k7k$1@dont-email.me>
References: <sf3kim$l3o$1@dont-email.me>
<2021Aug13.101107@mips.complang.tuwien.ac.at> <sf64tc$kfe$1@dont-email.me>
<d4469965-242d-4e4d-bc17-5f695b4acb18n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 13 Aug 2021 18:26:06 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="31b55375a95f58d3f44c305fc14e5da9";
logging-data="20724"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/adTL2byTzOIxD7RHBa7Id"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
Cancel-Lock: sha1:mwZI3ZFukto2S5FMzA8q8ueVPQo=
In-Reply-To: <d4469965-242d-4e4d-bc17-5f695b4acb18n@googlegroups.com>
Content-Language: en-US

by: BGB - Fri, 13 Aug 2021 18:26 UTC

On 8/13/2021 11:29 AM, MitchAlsup wrote:
> On Friday, August 13, 2021 at 10:59:42 AM UTC-5, BGB wrote:
>> On 8/13/2021 3:11 AM, Anton Ertl wrote:
>>> Marcus <m.de...@this.bitsnbites.eu> writes:
>>>> I posted this article on my blog:
>>>>
>>>> https://www.bitsnbites.eu/three-fundamental-flaws-of-simd
>>>
>>> Flaw 1 is addressed with SVE. Interestingly, SVE has a flaw that
>>> becomes apparent in the recently announced CPUs: processes cannot
>>> migrate between cores with different register width, so they choose
>>> 128 bit wide registers for both small and large cores. How could SVE
>>> have been modified to avoid this flaw? Does RVV have it?
>>>
>> IME, moving to vectors larger than 128 bits doesn't tend to gain much
>> over 128 bits. As the register size gets larger, there is less that can
>> utilize it effectively.
>>
>> It is like integer size:
>> 16 bits: can be used heavily, may or may not be sufficient;
>> 32 bits: used heavily, usually sufficient;
>> 64 bits: used occasionally, often useful;
>> 128 bits: used rarely, sometimes useful;
>> 256 bits: novelty size...
>>
>> For SIMD vectors, there is roughly a factor of 4 relation:
>> 64 bits: can be used heavily, frequently insufficient;
>> 128 bits: used heavily, usually sufficient;
>> 256 bits: used occasionally, sometimes useful;
>> 512 bits: rarely useful.
>>
>> Once the SIMD vector exceeds the length of the data one would likely
>> express using vectors, its utility drops off sharply.
>>
>>
>> The march to endlessly bigger SIMD vectors would make sense if each step
>> up gave a linear improvement, but makes less sense if it is in
>> diminishing returns territory.
> <
> I am afraid I have to disagree with you here:: The march to endlessly bigger
> SIMD vectors is because the root of the necessity was never addressed
> correctly.
> <
> For example, all of the SIMD instructions in x86-64 are "performed"
> by the addition of exactly 2 My 66000 instructions, and many more than
> x86-64 has are possible to express. ISA explosions should be proactively
> prevented, not embraced.

The ISA explosion can be contained.

x86 and ARM just sorta did a particularly bad job at it, as they sort of
awkwardly hacked it onto an ISA design where it didn't really fit, so
pretty much the entire ISA needs to be duplicated several times over.

Well, that, and NEON integrating format conversions within its operations
is kinda absurd.

>>
>>
>> So, this is part of why my current policy for BJX2 is doing 64 and 128
>> bit "vectors" and calling it good enough.
>>
>> Not going to go 256 bits, at least not on a 3-wide core.
> <
> Not going to 256-bits wide EVER.

That is also an option.

On a core which could support it, something like:
PADDYF R24, R36, R44
Would be functionally equivalent to:
PADDXF R26, R38, R46 | PADDXF R24, R36, R44

And, one could just define the latter form as the one to use, or add
256-bit vectors as a kind of syntactic sugar in the assembler.
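
A rough C analogue of the "syntactic sugar" idea, assuming GCC-style vector extensions. The names vec8f and vec8f_add are hypothetical, and the sketch does not model BJX2 register pairing or bundle encoding. A 256-bit value is just a pair of 128-bit halves, and the wide add is two independent narrow adds:

```c
#include <assert.h>
#include <string.h>

/* One 128-bit vector of four floats (GCC/Clang vector extension). */
typedef float vec4f __attribute__((vector_size(16)));

/* Hypothetical 256-bit "vector", modelled as two 128-bit halves. */
typedef struct { vec4f lo, hi; } vec8f;

/* The wide add is nothing more than two narrow adds with no data
   dependency between them, so a wide enough core could issue both
   in the same cycle. */
static vec8f vec8f_add(vec8f a, vec8f b)
{
    vec8f r;
    r.lo = a.lo + b.lo;   /* first 128-bit packed add */
    r.hi = a.hi + b.hi;   /* second, independent 128-bit packed add */
    return r;
}
```

Nothing in vec8f_add needs a 256-bit datapath. That is the sense in which the wide form adds naming convenience rather than capability.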

>>
>>
>> Similar tricks to what I used for 128-bit vectors could be used to allow
>> 256-bit vectors on a 6-wide / SMT capable core, since most of the
>> mechanisms and plumbing would already be in place. Such a vector would
>> effectively use groups of 4 registers in parallel (and ganging the
>> memory ports from both SMT threads to allow 256-bit load/store).
>>
>> So, implicitly, if I had a GSVY, it would implicitly depend on the
>> existence of WEX-6W...
>>
>>
>> But, then one might ask: OK, so why not then just issue a 128-bit SIMD
>> operation on Lane1A and Lane1B?...
>>
>> And, I could respond: What exactly do you think it is that these 256-bit
>> operations would be doing?...
>>
>> For related reasons, one can't have 128-bit operations on a core that
>> doesn't already support WEX-3W.
>>> Flaw 2 starts out with a false claim. There is no requirement that
>>> execution units are as wide as the registers, and AMD has delivered a
>>> lot of CPUs where the execution units were half as wide as the
>>> registers: Palomino and K8 implemented SSE (and, for K8, SSE2) with
>>> 64-bit wide functional units, Jaguar and later cats, Bulldozer and its
>>> descendents, and Zen 1 implemented AVX-256 with 128-bit-wide
>>> functional units. Could be a solution for ARMs SVE flaw.
>>>
>> FWIW: Despite having 128-bit SIMD, my BJX2 core doesn't actually
>> (currently) have any units which work natively with 128-bit data.
>>
>> The register file still mostly operates as 64 bits, and many 128 bit
>> operations are effectively performed by running two 64-bit operations in
>> parallel. In many cases, it is simply expanding a single SIMD
>> instruction over multiple lanes (effectively, a bundle of virtual
>> operations).
>>
>> In the ALUX extensions, in a few cases, the ALUs effectively combine
>> "Voltron style" by feeding bits between each other.
>>
>> The 128-bit shift operations are effectively two 64-bit funnel shift
>> operations "in disguise", ...
>>
>>
>> Well, also having 64-bit units which can occasionally combine for
>> 128-bit operations is a lot less wasteful than having a 128-bit unit
>> where 99% of the time its capabilities go unused.
>>
>> And, if one's data doesn't fit nicely into a 128-bit vector... they can
>> often use two different 64-bit vector ops in parallel.
> <
> All of these wide things are performed using CARRY instruction.

Something like CARRY doesn't really map onto how things are done in
BJX2, where I prefer to avoid contextual encodings.

Granted, prefixes like Jumbo or Op64 could be considered as turning the
following instruction into a contextual encoding, but this can be
sidestepped if the Jumbo or Op64 prefix is seen as composing a new
widened encoding.

Similarly, the fact that they are generally required to be adjacent sort of
avoids "spooky action at a distance" behaviors.

Sadly, this sorta breaks down with the WEX-6W "interleave" semantics,
but trying to define a 6-wide core in terms of two overlapping 3-wide
pipelines introduces a bit of hair. Off-hand, there wasn't a good
alternative that was less awful, aside from limiting the use of Jumbo
prefixes to match their use in the 3-wide profile.

This is how I defined the 5-wide profile to behave, which loses some
potential performance, with the trade-off of "less hair" as it allows
ignoring the use of pipeline interleaving.

Whether or not a 6W core is viable is still "yet to be seen".

Another option could be to support SMT without supporting 6W operation.
In this case, the two pipelines would be independent, and the second
pipeline would simply go unused if not running in SMT. However, this could
sidestep some of the issues with trying to support a unified 12R/6W
register file (by essentially just running two 6R/3W register files in
parallel with a little bit of additional plumbing trickery).

>>
>>
>>
>> Packed Integer SIMD isn't done by adding more ALUs, but rather by
>> splitting up the sub-units within a carry-select adder:
>> If you select the results where each 16-bit element had a carry-in of
>> zero, a packed-word ADD magically appears.
>>
>>
>> For FPU SIMD, there is only one FPU, so SIMD operations internally just
>> do 4 operations in a row (by sequentially feeding the values into the
>> FADD or FMUL units and capturing the result out the other end).
>>
>>
>> I did it the way I did not necessarily because it was "best", but
>> because I could do it cheaply...
>>
>> There are ways it could be made better/faster/..., but they would add
>> resource cost.
>>
>>
>> Similarly, "going narrower" and presenting vector operations in terms of
>> scalar operations on individual elements would either:
>> Slow everything down by essentially turning it back into scalar code;
>> Require adding a large number of execute lanes and register ports.
>>
>> Neither of these options is desirable.
>>> In-order CPUs don't play a role for the AMD64 architecture, and
>>> compilers certainly don't unroll AVX code for that (there is no
>>> AVX-capable in-order CPU). Compilers unroll loops in order to reduce
>>> loop overhead, and for additional optimization options.
>>>
>> Yep.
>>> Flaw 3: In the comments section, you reveal that tail handling is a
>>> problem for auto-vectorization. My take is that auto-vectorization is
>>> a flawed concept, mainly because it is unreliable: you rub the
>>> compiler the wrong way, and it will stop auto-vectorizing the loop
>>> without giving any warning. Manual vectorization is a better
>>> approach, and when done right, tail handling is no problem. See
>>> Sections 2.2-2.4 of
>>>
>>> http://www.complang.tuwien.ac.at/papers/ertl18manlang.pdf
>>>
>> Generally agreed.
>> I am actively against autovectorization on BJX2.
>> To what extent vectors are usable, they are in the form of explicit
>> language extensions.
>>
>> I have done things in a way which I feel doesn't suck nearly as
>> bad as the "xmmintrin.h" system, and also allows a subset of the GCC
>> vector extensions.
>>
>> My approach also tries to be much less of an "ugly wart" on the C
>> language, so does things in ways that I feel are more consistent with
>> traditional C semantics (can use native operators, cast conversions, ...).
>>
>>
>> Though, this does mean some amount of expansion to the typesystem, but
>> as I feel it, expanding out the C typesystem and numeric tower is much
>> less evil than something like auto-vectorization, which implies
>> interpreting C code as if it were something very unlike what was
>> actually written (and for the compiler to essentially perform magic
>> tricks with the semantics), or "xmmintrin.h" which is more like thinly
>> veiled x86 ASM shoehorned into C's existing syntax.
>>
>> Granted, it does also mean that, unless written to do so, existing
>> portable C code will not make any use of these sorts of vector extensions.
>>
>> ...
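
For readers who want to see the packed-word ADD idea quoted above in concrete form, here is a software analogue in C. This is only a sketch: it is not the carry-select hardware described in the quote, which picks precomputed carry-in-zero results per element. The function name padd16 is made up here. It gets the same effect by keeping carries from crossing 16-bit element boundaries inside a single 64-bit add:

```c
#include <assert.h>
#include <stdint.h>

/* Add four 16-bit elements packed in a 64-bit word, with no carry
   propagating from one element into the next (wraparound per element). */
static uint64_t padd16(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8000800080008000ULL;  /* top bit of each element */

    /* Add the low 15 bits of every element. Any carry lands in that
       element's own top bit, never in the neighbouring element. */
    uint64_t low = (a & ~H) + (b & ~H);

    /* Fold the original top bits back in with XOR, which adds them
       modulo 2 and therefore discards the carry out of each element. */
    return low ^ ((a ^ b) & H);
}
```

For example, padd16(0x0001FFFF0000FFFFULL, 0x0001000100010001ULL) gives 0x0002000000010000ULL: the two 0xFFFF elements wrap to zero without disturbing their neighbours.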
