novaBBS - comp.arch - Re: VVM question


<sg29ht$rbs$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=20084&group=comp.arch#20084

Path: i2pn2.org!i2pn.org!aioe.org!T3F9KNSTSM9ffyC31YXeHw.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: VVM question
Date: Tue, 24 Aug 2021 10:10:36 +0200
Organization: Aioe.org NNTP Server
Message-ID: <sg29ht$rbs$1@gioia.aioe.org>
References: <sftuaa$but$1@newsreader4.netcologne.de>
<5fd4c976-d72c-46f3-9fb4-584e72b628a2n@googlegroups.com>
<sfvckb$bok$2@newsreader4.netcologne.de>
<3ae800da-d7d8-4437-b5bb-ec651b5f5700n@googlegroups.com>
<sg0gr4$pet$1@dont-email.me> <sg0n5u$74p$2@newsreader4.netcologne.de>
<sg0omd$itq$1@dont-email.me>
<65bad170-8d27-4ad4-bf8f-69157e6869f2n@googlegroups.com>
<sg13vd$hom$1@newsreader4.netcologne.de> <sg1mru$h6d$1@dont-email.me>
<sg23h5$5dj$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="28028"; posting-host="T3F9KNSTSM9ffyC31YXeHw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.8.1
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Tue, 24 Aug 2021 08:10 UTC

Thomas Koenig wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
>> On 8/23/2021 2:29 PM, Thomas Koenig wrote:
>>> MitchAlsup <MitchAlsup@aol.com> schrieb:
>>>> Back to the posed question:
>>>> <
>>>> If the programmer unrolled the loop by hand (like DGEMM without transposes):
>>>> The LDs would need to be coded using offsets from the index register to be
>>>> recognized as dense::
>>>>
>>>> MOV Ri,#0
>>>> VEC R8,{}
>>>> LDD R4,[R2,Ri<<3]
>>>> LDD R5,[R2,Ri<<3+8]
>>>> LDD R6,[R2,Ri<<3+16]
>>>> LDD R7,[R2,Ri<<3+24]
>>>> ...
>>>> LOOP LT,Ri,#4,Rmax
>>>> <
>>>> The above code would be recognized as dense.
>>>> <
>>>> MOV Ri,#0
>>>> ADD R9,R2,#8
>>>> ADD R10,R2,#16
>>>> ADD R11,R2,#24
>>>> VEC R8,{}
>>>> LDD R4,[R2,Ri<<3]
>>>> LDD R5,[R9,Ri<<3]
>>>> LDD R6,[R10,Ri<<3]
>>>> LDD R7,[R11,Ri<<3]
>>>> ...
>>>> LOOP LT,Ri,#4,Rmax
>>>> <
>>>> This loop is harder to recognize as dense--even though the number of words
>>>> in the loop is less.
>>>
> >>> Hm... all possible, but less elegant than it could be. All the
>>> manual unrolling and autovectorization and... rears its ugly
>>> head again.
>>>
>>> With all the mechanisms that VVM already offers, a way for the
>>> programmer or a programming language to specify that operations
>>> such as summation can be done in any order would be a very useful
>>> addition.
>>
>> I am probably missing something here.
>
> Or, equivalently, I have been explaining things badly :-)
>
>> To me the main advantage of
>> allowing out of order summations (using summations here as shorthand for
>> other similar type operations), was to allow the hardware to make use of
>> multiple functional units.
>
> Yes.
>
>> That is, a core with two adders could, if
>> allowed, complete the summation in about half the time.
>
> Yes.
>
>> Without that, I
>> don't see any advantage of out of order summations on VVM. If I am
>> wrong, please explain. If I am right, see below.
>
> Seeing below.
>
>>
>>
>>
>>> Suggestion:
>>>
>>> A variant of the VEC instruction, which does not specify a special
>>> register to keep the address in (which can be hardwired if there
>>> is no space in the thread header). This leaves five bits for
>>> "reduction" registters, which specify that operations on that
>>> register can be done in any order in the loop.
>>
>> Doing the operations in a different order isn't the problem.
>
> It's one half of the problem.
>
> The way VVM is currently specified, it has strictly in-order semantics:
> you write down a C loop, and the hardware delivers the results
> exactly in the order you wrote down. This would have to be
> changed.
>
>
>> You need a
>> way to allow/specify the two partial sums to be added together in the
>> end.
>
> That as well.
>
>> I don't see your proposal as doing that.
>
> I thought I had implied it, but it was obviously not clear enough.
>
>
>> And, of course, it is
>> limited to five registers which must be specified in the hardware design.
>
> Five reductions in a loop would be plenty; it is usually one, or more
> rarely two.
>
>>> This would be a perfect match for OpenMP's reduction clause or
>>> for the planned REDUCTION addition to Fortran's DO CONCURRENT.
>>
>> I am not an OpenMP person, and my knowledge of Fortran is old, so could
>> you please give a brief explanation of what these two things do? Thanks.
>
> #pragma omp simd reduction(+:var)
>
> before a loop will tell the compiler that it can go wild
> with the order of the loop iterations, but that "var" will be used
> in a summation reduction.
>
> DO CONCURRENT also runs loops in an unspecified order,
> the REDUCTION clause would then allow one to, for example,
> sum up all elements.
>
> One problem with C and similar languages is that you have
> to specify an ordering of the loop explicitly, which shapes
> programmers' thinking and also shapes intermediate languages
> for compilers...
>
The eventual solution for all this will be similar to Mitch's FMAC
accumulator, i.e. a form of super-accumulator which allows one or more
elements to be added per cycle, while delaying all inexact/rounding to
the very end.

A carry-save exact accumulator with ~1100 paired bits would only use a
single full adder (2 or 3 gate delays?) to accept a new input, right?

I am not sure what the best way is for such a beast to handle both
additions and subtractions: Do you need to invert/negate the value to be
subtracted?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
