novaBBS - comp.arch - Re: Compact representation for common integer constants

Re: FP8 (was Compact representation for common integer constants)

<s7bk62$m23$1@reader1.panix.com>

https://www.novabbs.com/devel/article-flat.php?id=16689&group=comp.arch#16689

Path: i2pn2.org!i2pn.org!aioe.org!goblin3!goblin.stu.neva.ru!panix!not-for-mail
From: pw...@panix.com (paul wallich)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Mon, 10 May 2021 11:40:50 -0400
Organization: PANIX Public Access Internet and UNIX, NYC
Lines: 33
Message-ID: <s7bk62$m23$1@reader1.panix.com>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s76as3$i7j$1@gal.iecc.com> <s76ehd$65p$1@dont-email.me>
<2021May9.101917@mips.complang.tuwien.ac.at> <s7abfl$1k5m$1@gal.iecc.com>
<41cb95a5-13ed-46c9-a884-667cd8a5d001n@googlegroups.com>
NNTP-Posting-Host: localhost
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: reader1.panix.com 1620661250 22595 127.0.0.1 (10 May 2021 15:40:50 GMT)
X-Complaints-To: abuse@panix.com
NNTP-Posting-Date: Mon, 10 May 2021 15:40:50 +0000 (UTC)
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0)
Gecko/20100101 Thunderbird/78.10.1
In-Reply-To: <41cb95a5-13ed-46c9-a884-667cd8a5d001n@googlegroups.com>
Content-Language: en-US

by: paul wallich - Mon, 10 May 2021 15:40 UTC

On 5/10/21 2:10 AM, Quadibloc wrote:
> On Sunday, May 9, 2021 at 10:06:15 PM UTC-6, John Levine wrote:
>
>> Even once they switched to core, tubes burned out, components went out
>> of spec, solder joints cracked, and uptime was measured in hours if
>> you were lucky, minutes if you weren't.
>
>> So a compelling reason to leave out floating point was that all the extra
>> logic made the computer even less reliable. It's impressive that they
>> risked it on the 704.
>
>> I have read that the practical limit on the length
> >> of a FORTRAN program on the 704 was set by the fact that the compiler
>> had to compile it in one run between hardware failures.
>
> That may be. But aside from the 704 using core memory, which didn't
> have the pattern sensitivity issues of Williams tubes, a lot of effort
> went into making it reliable by IBM. One thing was that the tubes
> were run at a lower-than-normal voltage, since they were being used
> for digital amplification in a computer with thousands of them, instead
> of in a radio with five of them.

Interesting. IIRC the SAGE computers took a converse approach, where
during scheduled maintenance time they would run the tubes briefly at
higher-than-normal voltage, causing all the ones that were close to
failure to burn out when they could be replaced easily without affecting
computation.

(Meanwhile, my mother used to run computations that stressed the local
7094, so if her multiple regressions came out bogus in the morning it
usually meant that the machine would fail diagnostics in the afternoon...)

paul

Re: Compact representation for common integer constants

<s7ir12$p9u$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16696&group=comp.arch#16696

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Thu, 13 May 2021 11:20:33 +0200
Organization: A noiseless patient Spider
Lines: 199
Message-ID: <s7ir12$p9u$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me>
<f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me>
<9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me>
<1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 13 May 2021 09:20:34 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="46af5385d054e079ff2316dbeaf4723f";
logging-data="25918"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18RGrlhsA8gUVBmWFxQlkhE0oAS4WoLf9s="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.8.1
Cancel-Lock: sha1:ol2b5IbYKTTpbmNkVuruzQWZLVY=
In-Reply-To: <1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
Content-Language: en-US

by: Marcus - Thu, 13 May 2021 09:20 UTC

On 2021-05-12, MitchAlsup wrote:
> On Wednesday, May 12, 2021 at 2:17:33 PM UTC-5, BGB wrote:
>> On 5/10/2021 11:49 AM, Ivan Godard wrote:
>>> On 5/10/2021 9:22 AM, MitchAlsup wrote:
>>>> On Sunday, May 9, 2021 at 10:22:49 PM UTC-5, BGB wrote:
>>>>> On 5/9/2021 4:28 AM, Thomas Koenig wrote:
>>>>>> BGB <cr8...@gmail.com> schrieb:
>>>>>>
>>>>>>> IMUL (Lane 1)
>>>>>>> 32*32 -> 64
>>>>>>
>>>>>> Do you have two instructions (one for signed and one for unsigned)
>>>>>> or three (one for the lower half, one for signed high, one for
>>>>>> unsigned high)? The latter version could save you some ALU
>>>>>> complexity and some latency in the (probably common) case where
>>>>>> only a 32*32 multiplication is needed, at the cost of added
>>>>>> instructions for the 32*32-> 64 bit case.
>>>>>>
>>>>> There are actually more cases:
>>>>> MULS: 32*32->32, Result is sign-extended from low 32 bits
>>>> IMUL
>>>>> MULU: 32*32->32, Result is zero-extended from low 32 bits
>>>> UMUL
>>>>> DMULS: 32*32->64, 64-bit signed result
>>>> CARRY; IMUL
>>>>> DMULU: 32*32->64, 64-bit unsigned result
>>>> CARRY; UMUL
>>>>>
>>>>> The former give the typical behaviors one expects in C, the latter gives
>>>>> the widened results.
>>>>>
>>>>> These exist as 3R forms, so:
>>>>> DMULU R4, R5, R7 // R7 = R4 * R5
>>>> <
>>>> All mine are 2-operand 1-result
>>>>>
>>>>>
>>>>> Originally, there were also multiply ops which multiplied two inputs and
>>>>> then stored a pair of results in R0 and R1, more like the original
>>>>> SuperH multiply ops, but I dropped these for various reasons.
>>>> <
>>>> Consumes way more OpCode space than is useful
>>>>>
>>>>> There are cases where DMACS or DMACU instructions could be useful:
>>>>> DMACU R4, R5, R7 // R7 = R4 * R5 + R7
>>>> <
>>>> IMAC and UMAC
>>>>>
>>>>> But, I don't currently have these.
>>>>>
>>>>>
>>>>> Eg (64-bit signed multiply):
>>>>> SHADQ R4, -32, R6 | SHADQ R5, -32, R7 //1c
>>>>> DMULS R4, R7, R16 //1c
>>>>> DMACS R5, R6, R16 //3c (2c penalty)
>>>>> SHADQ R16, 32, R2 //3c (2c penalty)
>>>>> DMACU R4, R5, R2 //1c
>>>>> RTS
>>>>>
>>>>> Though, while fewer instructions than the current form, the above
>>>>> construction would still be pretty bad in terms of interlock penalties.
>>>>>
>>>>> SHADQ R4, -32, R6 | SHADQ R5, -32, R7 //1c
>>>>> DMULS R5, R6, R16 //1c
>>>>> DMULS R4, R7, R17 //1c
>>>>> DMULU R4, R5, R18 //1c
>>>>> ADD R16, R17, R19 //2c (1c penalty, DMULS R17)
>>>>> SHADQ R19, 32, R2 //2c (1c penalty, ADD R19)
>>>>> ADD R18, R2 //1c
>>>>> RTS
>>>>>
>>>>> Both cases would have approximately the same clock-cycle count (assuming
>>>>> both cases have a 3-cycle latency).
>>>> <
>>>> Which is why I used CARRY; xMUL
>>>>>
>>>>> ( Where recently, I have gotten around to modifying things such that the
>>>>> multiplier is now fully pipelined... )
>>>>>
>>>>>
>>>>>
>>>>> Otherwise, my time recently has mostly been consumed by
>>>>> debugging...
>>>> <
>>>> Sherlock we know you well...........
>>>>>
>>>>> Then trying to see if I can get stuff to pass timing at 75MHz again
>>>>> (hopefully without wrecking stuff quite as bad this time). This
>>>>> sub-effort also revealed a few bugs (*), though there are still some
>>>>> bugs I have yet to resolve...
>>>>>
>>>>> *: Eg, after boosting the core to 75MHz while leaving the MMIO bus at
>>>>> 50MHz, stuff was breaking in simulation due to the L2 Ringbus <->
>>>>> MMIO-Bus bridge not waiting for the MMIO-Bus to return to a READY state
>>>>> before returning back to an idle state.
>>>>>
>>>>> It was then possible for the response to travel the rings and get back
>>>>> to the L1, which then allows execution to continue, with the CPU core
>>>>> then issuing another MMIO request, which then travels along the rings
>>>>> back to the MMIO bridge, in less time than it took for the 'OK -> READY'
>>>>> transition to happen on the MMIO bus...
>>>>>
>>>>> The way the bridge was designed, it would then try to initiate a
>>>>> request, see that the MMIO-Bus state was 'OK', and use whatever result
>>>>> was present (losing the request or returning garbage).
>>>>>
>>>>> This may have been happening at 50MHz as well, and could have possibly
>>>>> been leading to some of the bugs I had seen.
>>>>>
>>>>>
>>>>> Or such...
>>>
>>> Do you have any saturating multiplies? Why or why not?
>>
>> For BJX2?
>> Not currently.
>>
>> There are neither saturating multiplies nor saturated ADD/SUB.
>>
>> Granted, these could be useful for Fixed-point SIMD, but I was mostly
>> being careful with value ranges and arithmetic to limit the likelihood
>> of overflow/underflow.
>>
>>
>> As for why not:
>> These add a bit of cost and complexity.
>>
>> Say, one goes from needing ops for, say:
>> PADD.W, PMULU.W, PMULS.W, ...
>> To:
>> PADDS.W, PADDU.W, PADDSS.W, PADDUS.W,
>> PMULS.W, PMULU.W, PMULSS.W, PMULUS.W
>> ...
>>
>> For vector cases, it is usually possible to keep a certain amount of a
>> "safety zone" at the top and bottom of the range such that overflow is
>> unlikely, and writing calculations such that they are more likely to
>> undershoot than overshoot, ...
>>
>> There are some "packed-compare" and "packed-select" ops which can also
>> help here.
>>
>>
>> For non-SIMD cases, clamping can be done like:
>> CMPGT R4, 0
>> MOV?T 0, R4
>> CMPGT R4, 255
>> MOV?F 255, R4
>>
>> I had at one point considered specialized range-clamping ops, like:
>> CLAMPU.B R4, R4 //clamp to Unsigned Byte range
>> CLAMPS.S R4, R4 //clamp to Signed Short range
> <
> <
> My 66000 has clamping operations to any bit width--these are a subset
> of the extract instructions (which are themselves a subset of the shift instructions).
> The classical cases are::
> SLL R8,R8,<8:0> // unsigned char
> SL R8,R8,<8:0> // signed char
> <
> but there are other cases
> <
> SLL R8,R9,<11:0> // 11-bit unsigned bit-field
> SLL R8,R9,<14:13> // 14-bit field from R9<26:13>
> <
> These come in signed (sign extended) and unsigned (zero extended) forms.
> These add exactly ZERO (nada, zilch, no) instructions to the ISA and are in fact the
> basis for the shift instructions with the simple rule of::
> width==0 -> width "really =" 64
>>
>> But, never added them...
> <
> I never needed to............
>>
>> Mostly it is an issue of not being common enough (or expensive enough in
>> the naive case) to justify the added cost.
> <
> When you start using LLVM as a front end, it emits these clamps any time
> a value in a register gets arithmetically manipulated--preventing the register from
> ever having a value outside of the type defined value-space.
> <
> unsigned char i = 0;
> .......
> i++;
> <
> gets compiled to::
> <
> MOV R19,#0
> .......
> ADD R19,R19,#1
> SLL R19,R19,<8:0>
> <
> Yes it is ugly......but people should not ask for small types unless they need the small type.
>
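
For reference, the 64-bit multiply that BGB assembles from 32x32->64
products (the second sequence quoted above) corresponds roughly to the
following C. This is only a sketch under stated assumptions: the function
name mul64_from_32 is made up for illustration, and it uses unsigned cross
products where the quoted sequence uses DMULS. That gives the same low 64
bits, because only the low 32 bits of the cross sum survive the shift.

```c
#include <stdint.h>

/* Low 64 bits of a 64x64 multiply, built from 32x32->64 products.
   Hypothetical helper; mirrors the three-multiply sequence quoted above. */
uint64_t mul64_from_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* low * low: the one product whose full 64 bits are needed (DMULU) */
    uint64_t lo_lo = (uint64_t)a_lo * b_lo;

    /* cross products: only their low 32 bits matter after the shift,
       so unsigned wraparound of the sum is harmless */
    uint64_t cross = (uint64_t)a_hi * b_lo + (uint64_t)a_lo * b_hi;

    /* a_hi * b_hi would land entirely above bit 63 and is dropped */
    return lo_lo + (cross << 32);
}
```

The three multiplies are independent of each other, which is why a fully
pipelined multiplier helps here: only the final add and shift wait on them.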

Re: FP8 (was Compact representation for common integer constants)

<s7j38u$1aog$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=16697&group=comp.arch#16697

Path: i2pn2.org!i2pn.org!aioe.org!+9JlleTFc3MOERf2LU/SVA.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Thu, 13 May 2021 13:41:19 +0200
Organization: Aioe.org NNTP Server
Lines: 25
Message-ID: <s7j38u$1aog$1@gioia.aioe.org>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s73rcr$83m$1@dont-email.me> <s73u8m$tnj$1@dont-email.me>
<s75pe2$shi$1@dont-email.me> <s76as3$i7j$1@gal.iecc.com>
<jwv7dk93zcy.fsf-monnier+comp.arch@gnu.org> <s787j3$99m$1@dont-email.me>
<jwv1rafyd1g.fsf-monnier+comp.arch@gnu.org>
<jwvv97rwy3y.fsf-monnier+comp.arch@gnu.org>
<48015cc8-9326-427a-9fdd-36ed1e12939an@googlegroups.com>
<s7ai2u$ul4$1@gioia.aioe.org> <jwvo8dhpd74.fsf-monnier+comp.arch@gnu.org>
<s7fsgs$1fag$1@gioia.aioe.org> <jwv4kf8krbo.fsf-monnier+comp.arch@gnu.org>
NNTP-Posting-Host: +9JlleTFc3MOERf2LU/SVA.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Thu, 13 May 2021 11:41 UTC

Stefan Monnier wrote:
>> Assuming default rounding and FADD, the larger value wins unless the smaller
>> is equal or exactly 1 below, in which case we'll round up.
>
> I think this is only true if the base of your logarithm is something
> like 2 (which I expect wouldn't be the most common choice, it
> corresponds to a "standard" FP but with 0 mantissa bits).
>
> E.g. I think the moral equivalent of 2 mantissa bits would be to use
> a logarithm of base of 2^�. When the logarithm's base is closer to
> 1 (as in this example), then you have to consider more cases.

I could not read your suggested exponent, but due to the need to combine
such exponent-only (i.e. log) numbers with regular binary FP, I'm pretty
sure that you would in fact use base 2 log instead of base sqrt(2) or
natural log base e.

I do agree that something like base 2^(1/4) would correspond to a base 2
number with a few mantissa bits.
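
The base-2 rule quoted at the top (the larger value wins unless the smaller
is equal or exactly 1 below) can be checked with a few lines of C. This is
a sketch under assumptions not stated in the thread: values are pure powers
of two, and the sum is rounded to the nearest exponent in the log domain,
so the result is bumped when 1 + 2^-d is at least sqrt(2). The name
lns2_add is made up for illustration.

```c
#include <stdint.h>

/* Exponent of 2^a + 2^b in a base-2, exponent-only format, with the
   logarithm of the sum rounded to the nearest integer.
   Hypothetical helper; assumes small exponents such as those of an
   8-bit format. */
int lns2_add(int a, int b)
{
    int hi = a > b ? a : b;
    int lo = a > b ? b : a;
    int d  = hi - lo;                 /* distance between the exponents */

    if (d > 30)                       /* smaller term is far below rounding */
        return hi;

    /* Bump iff (1 + 2^-d)^2 >= 2, i.e. (2^d + 1)^2 >= 2^(2d+1);
       done in integers to avoid any floating-point rounding. */
    uint64_t t = (UINT64_C(1) << d) + 1;
    return (t * t >= (UINT64_C(1) << (2 * d + 1))) ? hi + 1 : hi;
}
```

With d = 0 and d = 1 the comparison succeeds and the exponent goes up by
one; from d = 2 onward the larger operand is returned unchanged. With a
base closer to 1, such as 2^(1/4), the same test has to be made against
more thresholds, which is the extra case analysis mentioned in the quoted
text.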

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: FP8 (was Compact representation for common integer constants)

<s7j3g6$1eik$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=16698&group=comp.arch#16698

Path: i2pn2.org!i2pn.org!aioe.org!+9JlleTFc3MOERf2LU/SVA.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Thu, 13 May 2021 13:45:11 +0200
Organization: Aioe.org NNTP Server
Lines: 83
Message-ID: <s7j3g6$1eik$1@gioia.aioe.org>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<2021May9.101917@mips.complang.tuwien.ac.at> <s7abfl$1k5m$1@gal.iecc.com>
<2021May10.101544@mips.complang.tuwien.ac.at> <s7brr7$1da5$1@gal.iecc.com>
<2021May12.190836@mips.complang.tuwien.ac.at>
NNTP-Posting-Host: +9JlleTFc3MOERf2LU/SVA.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Thu, 13 May 2021 11:45 UTC

Anton Ertl wrote:
> John Levine <johnl@taugh.com> writes:
>> According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>>>> On small systems, I suppose. Floating point was standard or at least a
>>>> widely provided option on large computers by the early 1960s.
>>>
>>> But it still was much slower than integer arithmetic on most machines.
>>
>> I happen to have a 360/67 manual here. It and the similar /65 were
>> workhorse mainframes in the early 1970s.
>>
>> Memory to register integer add took 1.4us, floating took 2.43 short, 2.45 double
>> Integer multiply was 4.8us, 4.4 short (faster than integer), 7.6 double
>> Integer divide was 8.7us, float 7.3, double 14.10
>
> Not bad.
>
>> I have manuals for the 386 and 486, where float arithmetic was also
>> about half the speed of fixed, even though integer arithmetic was
>> 32x32 and float was all extended with 64 fraction bits. Doesn't seem
>> much slower to me.
>
> According to
> <https://www2.math.uni-wuppertal.de/~fpf/Uebungen/GdR-SS02/opcode_f.html>,
> 8087-Pentium take the following number of cycles for FADD(P) and FMUL(P):
>
> variations/
> operand        8087           287       387     486    Pentium
> fadd           70-100         70-100    23-34   8-20   3/1 FX
> fadd  mem32    90-120+EA      90-120    24-32   8-20   3/1 FX
> fadd  mem64    95-125+EA      95-125    29-37   8-20   3/1 FX
> faddp          75-105         75-105    23-31   8-20   3/1 FX
> fmul  reg s    90-105         90-105    29-52   16     3/1 FX
> fmul  reg      130-145        130-145   46-57   16     3/1 FX
> fmul  mem32    (110-125)+EA   110-125   27-35   11     3/1 FX
> fmul  mem64    (154-168)+EA   154-168   32-57   14     3/1 FX
> fmulp reg s    94-108         94-108    29-52   16     3/1 FX
> fmulp reg      134-148        134-148   29-57   16     3/1 FX
>
> and for comparison ADD and MUL:
>
> add operands  bytes      8088    186  286  386    486    Pentium
> add reg, reg  2          3       3    2    2      1      1 UV
> add mem, reg  2+d(0,2)   24+EA   10   7    7      3      3 UV
> add reg, mem  2+d(0,2)   13+EA   10   7    6      2      2 UV
> mul r32       2          -       -    -    9-38   13-42  10 NP
> mul mem32     2+d(0-2)   -       -    -    12-41  13-42  10 NP
>
> So on the 486 "add reg, reg" is 8-20 times faster than "faddp", but
> fmul is as fast or faster than mul (and imul has the same cycle counts
> as mul). On the Pentium FMUL is >3 times as fast as MUL, and
> pipelined (i.e., 1 fmul/cycle can be started, and I don't think that
> MUL/IMUL can).

They were rightly proud of the Pentium FMUL unit, so they decided to
reuse it for all integer MULs as well: The move back & forth between cpu
domains added the 7 extra clock cycles. :-(
>
>> If you can write your code using the same number of fixed instructions
>> as floats, sure, it'll be faster, but if you need to add extra code for
>> explicit scaling, I doubt it'll really be faster.
>
> If the code does many adds, fixed point will be faster (no scaling
> needed if all the summands and the result have the same scale), if it
> does primarily muls, floating will be faster even on the 386, 486, and
> Pentium, because MUL is slow, and because scaling is needed (but
> scaling is ideally a shift).
>
>> On the other hand, if you had a 286 or 386 with no 287 or 387 and
>> were simulating floating point in software, *that* was slow.
>
> On the 8087 and 287, floating point is also slow; but given the cost
> of synthesized 32x32-bit multiplication on the 8086 and 80286 you
> again have the same balance as above.

I agree.
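
The add-versus-multiply asymmetry for fixed point can be illustrated with
a hypothetical Q16.16 format (16 integer bits, 16 fraction bits). The type
and function names below are assumptions for illustration and do not come
from the thread. Adding two values of the same scale is a plain integer
add, while a multiply needs a widening product followed by a rescaling
shift.

```c
#include <stdint.h>

typedef int32_t q16_16;   /* hypothetical Q16.16: value = raw / 65536 */

/* Same scale on both sides, so no rescaling is needed. */
q16_16 q_add(q16_16 a, q16_16 b)
{
    return a + b;
}

/* The raw product carries scale 2^32; shift right by 16 to get back to
   2^16. Right-shifting a negative value is implementation-defined in C,
   though mainstream compilers use an arithmetic shift. */
q16_16 q_mul(q16_16 a, q16_16 b)
{
    int64_t wide = (int64_t)a * b;    /* 32x32 -> 64 widening multiply */
    return (q16_16)(wide >> 16);
}
```

For example, 1.5 is 0x18000 and 2.25 is 0x24000 in this format;
q_add returns 0x3C000 (3.75) and q_mul returns 0x36000 (3.375).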

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: FP8 (was Compact representation for common integer constants)

<2021May13.163046@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=16702&group=comp.arch#16702

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Thu, 13 May 2021 14:30:46 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 17
Message-ID: <2021May13.163046@mips.complang.tuwien.ac.at>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com> <2021May9.101917@mips.complang.tuwien.ac.at> <s7abfl$1k5m$1@gal.iecc.com> <2021May10.101544@mips.complang.tuwien.ac.at> <s7brr7$1da5$1@gal.iecc.com> <2021May12.190836@mips.complang.tuwien.ac.at> <s7j3g6$1eik$1@gioia.aioe.org>
Injection-Info: reader02.eternal-september.org; posting-host="9b6fdafab5a0ed867dc02613f2ce9d52";
logging-data="23030"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+erfJCflGmACQ51sH+6Qr9"
Cancel-Lock: sha1:qnFUJxOQwpyDmSUTknZPdFLLyGQ=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Thu, 13 May 2021 14:30 UTC

Terje Mathisen <terje.mathisen@tmsw.no> writes:
>They were rightly proud of the Pentium FMUL unit, so they decided to
>reuse it for all integer MULs as well: The move back & forth between cpu
>domains added the 7 extra clock cycles. :-(

The Willamette (Pentium 4) with its 2GHz clock rate divided the die
into clock domains (and had similar cycle counts for FMUL and MUL,
because it did MUL in the FPU). I really doubt that the Pentium
(originally 66MHz, shrunk models 200MHz) was divided into clock
domains, certainly not enough to account for 7 cycles (if so, you
would expect 70 cycles for Willamette). My guess is that there was
some microcode slowness that made MUL slow.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: FP8 (was Compact representation for common integer constants)

<s7jjm9$vtr$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16708&group=comp.arch#16708

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Thu, 13 May 2021 18:21:28 +0200
Organization: A noiseless patient Spider
Lines: 23
Message-ID: <s7jjm9$vtr$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<2021May9.101917@mips.complang.tuwien.ac.at> <s7abfl$1k5m$1@gal.iecc.com>
<2021May10.101544@mips.complang.tuwien.ac.at> <s7brr7$1da5$1@gal.iecc.com>
<2021May12.190836@mips.complang.tuwien.ac.at> <s7j3g6$1eik$1@gioia.aioe.org>
<2021May13.163046@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 13 May 2021 16:21:29 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="46af5385d054e079ff2316dbeaf4723f";
logging-data="32699"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/U7F1I/Xs5v1d0hEQqMPzfiznAUus839c="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.8.1
Cancel-Lock: sha1:ETX6eMYncbjhpTSsvRIMxXcAHN0=
In-Reply-To: <2021May13.163046@mips.complang.tuwien.ac.at>
Content-Language: en-US

by: Marcus - Thu, 13 May 2021 16:21 UTC

On 2021-05-13, Anton Ertl wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>> They were rightly proud of the Pentium FMUL unit, so they decided to
>> reuse it for all integer MULs as well: The move back & forth between cpu
>> domains added the 7 extra clock cycles. :-(
>
> The Willamette (Pentium 4) with its 2GHz clock rate divided the die
> into clock domains (and had similar cycle counts for FMUL and MUL,
> because it did MUL in the FPU). I really doubt that the Pentium
> (originally 66MHz, shrunk models 200MHz) was divided into clock
> domains, certainly not enough to account for 7 cycles (if so, you
> would expect 70 cycles for Willamette). My guess is that there was
> some microcode slowness that made MUL slow.

Was that clock domains, or did Terje refer to different CPU domains as
in integer pipelines & scheduling vs FP pipelines & scheduling (many
architectures with separate integer and FP register files see a large
penalty when moving values between the two domains)?

>
> - anton
>

Re: FP8 (was Compact representation for common integer constants)

<s7jmmh$1lv5$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=16710&group=comp.arch#16710

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!aioe.org!+9JlleTFc3MOERf2LU/SVA.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Thu, 13 May 2021 19:12:50 +0200
Organization: Aioe.org NNTP Server
Lines: 22
Message-ID: <s7jmmh$1lv5$1@gioia.aioe.org>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<2021May9.101917@mips.complang.tuwien.ac.at> <s7abfl$1k5m$1@gal.iecc.com>
<2021May10.101544@mips.complang.tuwien.ac.at> <s7brr7$1da5$1@gal.iecc.com>
<2021May12.190836@mips.complang.tuwien.ac.at> <s7j3g6$1eik$1@gioia.aioe.org>
<2021May13.163046@mips.complang.tuwien.ac.at>
NNTP-Posting-Host: +9JlleTFc3MOERf2LU/SVA.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Thu, 13 May 2021 17:12 UTC

Anton Ertl wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>> They were rightly proud of the Pentium FMUL unit, so they decided to
>> reuse it for all integer MULs as well: The move back & forth between cpu
>> domains added the 7 extra clock cycles. :-(
>
> The Willamette (Pentium 4) with its 2GHz clock rate divided the die
> into clock domains (and had similar cycle counts for FMUL and MUL,
> because it did MUL in the FPU). I really doubt that the Pentium
> (originally 66MHz, shrunk models 200MHz) was divided into clock
> domains, certainly not enough to account for 7 cycles (if so, you
> would expect 70 cycles for Willamette). My guess is that there was
> some microcode slowness that made MUL slow.

I was in fact told by my Intel contacts that the slowdown was entirely
due to moving the data from/to integer regs.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: FP8 (was Compact representation for common integer constants)

<s7jmv7$1lv5$2@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=16712&group=comp.arch#16712

Path: i2pn2.org!i2pn.org!aioe.org!+9JlleTFc3MOERf2LU/SVA.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Thu, 13 May 2021 19:17:28 +0200
Organization: Aioe.org NNTP Server
Lines: 30
Message-ID: <s7jmv7$1lv5$2@gioia.aioe.org>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<2021May9.101917@mips.complang.tuwien.ac.at> <s7abfl$1k5m$1@gal.iecc.com>
<2021May10.101544@mips.complang.tuwien.ac.at> <s7brr7$1da5$1@gal.iecc.com>
<2021May12.190836@mips.complang.tuwien.ac.at> <s7j3g6$1eik$1@gioia.aioe.org>
<2021May13.163046@mips.complang.tuwien.ac.at> <s7jjm9$vtr$1@dont-email.me>
NNTP-Posting-Host: +9JlleTFc3MOERf2LU/SVA.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Thu, 13 May 2021 17:17 UTC

Marcus wrote:
> On 2021-05-13, Anton Ertl wrote:
>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>> They were rightly proud of the Pentium FMUL unit, so they decided to
>>> reuse it for all integer MULs as well: The move back & forth between cpu
>>> domains added the 7 extra clock cycles. :-(
>>
>> The Willamette (Pentium 4) with its 2GHz clock rate divided the die
>> into clock domains (and had similar cycle counts for FMUL and MUL,
>> because it did MUL in the FPU). I really doubt that the Pentium
>> (originally 66MHz, shrunk models 200MHz) was divided into clock
>> domains, certainly not enough to account for 7 cycles (if so, you
>> would expect 70 cycles for Willamette). My guess is that there was
>> some microcode slowness that made MUL slow.
>
> Was that clock domains, or did Terje refer to different CPU domains as
> in integer pipelines & scheduling vs FP pipelines & scheduling (many
> architectures with separate integer and FP register files see a large
> penalty when moving values between the two domains)?

Exactly.

As I noted in my response to Anton, it was these moves that caused the
horrible slowdown.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Compact representation for common integer constants

<e0345ee6-d311-4813-b7cc-09e46af1aa12n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16716&group=comp.arch#16716

X-Received: by 2002:ac8:6b11:: with SMTP id w17mr20772853qts.143.1620928377006;
Thu, 13 May 2021 10:52:57 -0700 (PDT)
X-Received: by 2002:a9d:6743:: with SMTP id w3mr22567352otm.82.1620928376747;
Thu, 13 May 2021 10:52:56 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 13 May 2021 10:52:56 -0700 (PDT)
In-Reply-To: <s7ir12$p9u$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me> <6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me> <f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me> <9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me> <1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7ir12$p9u$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e0345ee6-d311-4813-b7cc-09e46af1aa12n@googlegroups.com>
Subject: Re: Compact representation for common integer constants
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Thu, 13 May 2021 17:52:57 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Thu, 13 May 2021 17:52 UTC

On Thursday, May 13, 2021 at 4:20:36 AM UTC-5, Marcus wrote:
> On 2021-05-12, MitchAlsup wrote:
<snip>
> > When you start using LLVM as a front end, it emits these clamps any time
> > a value in a register gets arithmetically manipulated--preventing the register from
> > ever having a value outside of the type defined value-space.
> > <
> > unsigned char i = 0;
> > .......
> > i++;
> > <
> > gets compiled to::
> > <
> > MOV R19,#0
> > .......
> > ADD R19,R19,#1
> > SLL R19,R19,<8:0>
> > <
> > Yes it is ugly......but people should not ask for small types unless they need the small type.
> >
> Unfortunately they do. Partly because small types take up less memory
> (when using large arrays/matrices), and partly because people do not
> understand what happens on the machine code level, so they make
> incorrect assumptions about what types to use in different situations.
<
In the past, one was allowed to hoist a small container into a larger register
and let the size of the register override the size of the container. Not so in
the LLVM architecture.
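
Two different operations are both called "clamping" in this thread, and
a short C sketch may help keep them apart. The extract form follows the
<width:offset> notation and the width==0 rule from the quoted My 66000
description; the saturating form follows the compare-and-move sequence
BGB gave earlier in the thread. The function names are made up for
illustration, and the code models semantics only, not any real
instruction encoding.

```c
#include <stdint.h>

/* Unsigned bit-field extract: `width` bits starting at bit `offset`,
   zero-extended. width == 0 is treated as 64, per the quoted rule.
   Assumes offset < 64 and offset + width <= 64. */
uint64_t extract_u(uint64_t x, unsigned width, unsigned offset)
{
    if (width == 0 || width >= 64)
        return x >> offset;
    return (x >> offset) & ((UINT64_C(1) << width) - 1);
}

/* Saturating clamp to the unsigned byte range 0..255. */
int64_t clamp_u8(int64_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* "unsigned char i; i++" carried out in a 64-bit register:
   ADD R19,R19,#1 followed by SLL R19,R19,<8:0>. */
uint64_t inc_u8(uint64_t r)
{
    return extract_u(r + 1, 8, 0);
}
```

Here inc_u8(255) wraps to 0, which is what C requires for unsigned char,
whereas clamp_u8(256) saturates to 255. The extract after every arithmetic
operation is the price of keeping the register inside the value range of
the small type.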

Re: FP8 (was Compact representation for common integer constants)

<2021May14.114422@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=16741&group=comp.arch#16741

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Fri, 14 May 2021 09:44:22 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 29
Message-ID: <2021May14.114422@mips.complang.tuwien.ac.at>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com> <2021May9.101917@mips.complang.tuwien.ac.at> <s7abfl$1k5m$1@gal.iecc.com> <2021May10.101544@mips.complang.tuwien.ac.at> <s7brr7$1da5$1@gal.iecc.com> <2021May12.190836@mips.complang.tuwien.ac.at> <s7j3g6$1eik$1@gioia.aioe.org> <2021May13.163046@mips.complang.tuwien.ac.at> <s7jmmh$1lv5$1@gioia.aioe.org>
Injection-Info: reader02.eternal-september.org; posting-host="5d5f7c2fd86717a462d647ead1d1489d";
logging-data="17175"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19SQ3rPLH3KjQsUcyVbK5t7"
Cancel-Lock: sha1:BkOmVJ5a5I2bqq1hXEdbAqf2OPA=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Fri, 14 May 2021 09:44 UTC

Terje Mathisen <terje.mathisen@tmsw.no> writes:
>Anton Ertl wrote:
>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>> They were rightly proud of the Pentium FMUL unit, so they decided to
>>> reuse it for all integer MULs as well: The move back & forth between cpu
>>> domains added the 7 extra clock cycles. :-(
>>
>> The Willamette (Pentium 4) with its 2GHz clock rate divided the die
>> into clock domains (and had similar cycle counts for FMUL and MUL,
>> because it did MUL in the FPU). I really doubt that the Pentium
>> (originally 66MHz, shrunk models 200MHz) was divided into clock
>> domains, certainly not enough to account for 7 cycles (if so, you
>> would expect 70 cycles for Willamette). My guess is that there was
>> some microcode slowness that made MUL slow.
>
>I was in fact told by my Intel contacts that the slowdown was entirely
>due to moving the data from/to integer regs.

Sure, but why? I remember that FP-integer register moves tended to be
slow on a lot of hardware. Some architectures required you to go
explicitly through memory, some had an instruction for doing that
move, but the only microarchitectural connection between integer and
FP units was the load/store unit, resulting in similar performance.
Maybe that was the case for the Pentium, too.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: FP8 (was Compact representation for common integer constants)

<s7m2cs$7g6$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=16751&group=comp.arch#16751

Path: i2pn2.org!i2pn.org!aioe.org!+9JlleTFc3MOERf2LU/SVA.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: FP8 (was Compact representation for common integer constants)
Date: Fri, 14 May 2021 16:44:44 +0200
Organization: Aioe.org NNTP Server
Lines: 45
Message-ID: <s7m2cs$7g6$1@gioia.aioe.org>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<2021May9.101917@mips.complang.tuwien.ac.at> <s7abfl$1k5m$1@gal.iecc.com>
<2021May10.101544@mips.complang.tuwien.ac.at> <s7brr7$1da5$1@gal.iecc.com>
<2021May12.190836@mips.complang.tuwien.ac.at> <s7j3g6$1eik$1@gioia.aioe.org>
<2021May13.163046@mips.complang.tuwien.ac.at> <s7jmmh$1lv5$1@gioia.aioe.org>
<2021May14.114422@mips.complang.tuwien.ac.at>
NNTP-Posting-Host: +9JlleTFc3MOERf2LU/SVA.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Fri, 14 May 2021 14:44 UTC

Anton Ertl wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>> Anton Ertl wrote:
>>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>>> They were rightly proud of the Pentium FMUL unit, so they decided to
>>>> reuse it for all integer MULs as well: The move back & forth between cpu
>>>> domains added the 7 extra clock cycles. :-(
>>>
>>> The Willamette (Pentium 4) with its 2GHz clock rate divided the die
>>> into clock domains (and had similar cycle counts for FMUL and MUL,
>>> because it did MUL in the FPU). I really doubt that the Pentium
>>> (originally 66MHz, shrunk models 200MHz) was divided into clock
>>> domains, certainly not enough to account for 7 cycles (if so, you
>>> would expect 70 cycles for Willamette). My guess is that there was
>>> some microcode slowness that made MUL slow.
>>
>> I was in fact told by my Intel contacts that the slowdown was entirely
>> due to moving the data from/to integer regs.
>
> Sure, but why? I remember that FP-integer register moves tended to be
> slow on a lot of hardware. Some architectures required you to go
> explicitly through memory, some had an instruction for doing that
> move, but the only microarchitectural connection between integer and
> FP units was the load/store unit, resulting in similar performance.
> Maybe that was the case for the Pentium, too.

You did need to go via RAM if you wanted a sw copy between integer &
float registers, i.e. no MOV to/from fpu.

Even though the cache was effectively single cycle, you still had two
extra cycles on the integer side and on the fpu side you needed a cycle
for conversion (probably both ways), an extra cycle because the fpu
pipeline ran one cycle behind the integer one, and then you had to
convert/extract the resulting mantissa.

At the time, it looked to me like they effectively did more or less this
in an internal sequencer, so the only thing they avoided was the update
of the cache.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: FP8 (was Compact representation for common integer constants)

<5073b88d-cb36-45c9-8920-fd3232b8d99cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16754&group=comp.arch#16754

X-Received: by 2002:a37:a44d:: with SMTP id n74mr27220006qke.367.1621005659212;
Fri, 14 May 2021 08:20:59 -0700 (PDT)
X-Received: by 2002:a9d:664c:: with SMTP id q12mr41100702otm.76.1621005659034;
Fri, 14 May 2021 08:20:59 -0700 (PDT)
Path: i2pn2.org!rocksolid2!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 14 May 2021 08:20:58 -0700 (PDT)
In-Reply-To: <2021May14.114422@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<2021May9.101917@mips.complang.tuwien.ac.at> <s7abfl$1k5m$1@gal.iecc.com>
<2021May10.101544@mips.complang.tuwien.ac.at> <s7brr7$1da5$1@gal.iecc.com>
<2021May12.190836@mips.complang.tuwien.ac.at> <s7j3g6$1eik$1@gioia.aioe.org>
<2021May13.163046@mips.complang.tuwien.ac.at> <s7jmmh$1lv5$1@gioia.aioe.org> <2021May14.114422@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5073b88d-cb36-45c9-8920-fd3232b8d99cn@googlegroups.com>
Subject: Re: FP8 (was Compact representation for common integer constants)
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Fri, 14 May 2021 15:20:59 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Fri, 14 May 2021 15:20 UTC

On Friday, May 14, 2021 at 4:49:34 AM UTC-5, Anton Ertl wrote:
> Terje Mathisen <terje.m...@tmsw.no> writes:
> >Anton Ertl wrote:
> >> Terje Mathisen <terje.m...@tmsw.no> writes:
> >>> They were rightly proud of the Pentium FMUL unit, so they decided to
> >>> reuse it for all integer MULs as well: The move back & forth between cpu
> >>> domains added the 7 extra clock cycles. :-(
> >>
> >> The Willamette (Pentium 4) with its 2GHz clock rate divided the die
> >> into clock domains (and had similar cycle counts for FMUL and MUL,
> >> because it did MUL in the FPU). I really doubt that the Pentium
> >> (originally 66MHz, shrunk models 200MHz) was divided into clock
> >> domains, certainly not enough to account for 7 cycles (if so, you
> >> would expect 70 cycles for Willamette). My guess is that there was
> >> some microcode slowness that made MUL slow.
> >
> >I was in fact told by my Intel contacts that the slowdown was entirely
> >due to moving the data from/to integer regs.
<
> Sure, but why?
<
The FP unit was separately microcoded from the x86, and the instruction
was not recognized as FP by the FP microcode, so the integer microcode
had to negotiate with FP microcode to send the data.
<
> I remember that FP-integer register moves tended to be
> slow on a lot of hardware. Some architectures required you to go
> explicitly through memory, some had an instruction for doing that
> move, but the only microarchitectural connection between integer and
> FP units was the load/store unit, resulting in similar performance.
> Maybe that was the case for the Pentium, too.
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Compact representation for common integer constants

<s7qlup$dm0$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=16806&group=comp.arch#16806

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2a0a-a540-2862-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Sun, 16 May 2021 08:43:05 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <s7qlup$dm0$1@newsreader4.netcologne.de>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me>
<f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me>
<9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me>
<1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
Injection-Date: Sun, 16 May 2021 08:43:05 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2a0a-a540-2862-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2a0a:a540:2862:0:7285:c2ff:fe6c:992d";
logging-data="14016"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Sun, 16 May 2021 08:43 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:

> When you start using LLVM as a front end, it throws these clamps out any time
> a value in a register gets arithmetically manipulated--preventing the register from
> ever having a value outside of the type defined value-space.
><
> Unsigned char i = 0;
> .......
> i++
><
> gets compiled to::
><
> MOV R19,#0
> .......
> ADD R19,R19,#1
> SLL R19,R19,<8:0>
><
> Yes it is ugly......but people should not ask for small types unless they need the small type.

Does the "small types" include the C standard unsigned type?
If so, that is definitely worth a bug report with clang. I have
just filed one for gcc for POWER which shows the same problem, at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622 .

Re: Compact representation for common integer constants

<2021May16.145732@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=16812&group=comp.arch#16812

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Sun, 16 May 2021 12:57:32 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 92
Distribution: world
Message-ID: <2021May16.145732@mips.complang.tuwien.ac.at>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com> <s74akj$siq$1@dont-email.me> <f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com> <s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de> <s7a8u7$mui$1@dont-email.me> <9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com> <s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me> <1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com> <s7qlup$dm0$1@newsreader4.netcologne.de>
Injection-Info: reader02.eternal-september.org; posting-host="a54cb75c6ea932a4a501c10fb0f02607";
logging-data="32123"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/4VUNA11zW3mVCGpYAjJ9w"
Cancel-Lock: sha1:1ELXHRcSDgDGj4JR8vcLEEYnJwA=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Sun, 16 May 2021 12:57 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
>I have
>just filed one for gcc for POWER which shows the same problem, at
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622 .

This is a fun example; let's look only at foo:

unsigned int foo(unsigned int *a, int n)
{
  int i;
  unsigned int res = 0;
  for (i=0; i<n; i++)
    res += a[i];

  return res;
}

The inner loop is compiled to:

18: 04 00 0a 85 lwzu r8,4(r10)
1c: 14 1a 68 7c add r3,r8,r3
20: 20 00 63 78 clrldi r3,r3,32
24: ff ff 29 39 addi r9,r9,-1
28: 20 00 29 79 clrldi r9,r9,32
2c: ec ff 00 42 bdnz 18 <foo+0x18>

So gcc uses a zero-extending load, adds the result to res (in r3), and
then zero-extends the result with clrldi. This could be pushed out of
the loop with some analysis: On the loop-back edge the result is only
used by an addition (the same one) which does not care about the upper
32 bits. I don't know whether the calling convention requires an
unsigned int to be returned in zero-extended form; if so, a clrldi
would be needed after the loop.
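
A minimal sketch of that argument in portable C, assuming a 64-bit
register is modelled by uint64_t and clrldi by masking with 0xFFFFFFFF.
The function names are made up for illustration; this is not the code
gcc generates, only a model of why one zero-extension after the loop
should be enough:

```c
#include <stddef.h>
#include <stdint.h>

/* Models the loop quoted above: the running sum lives in a 64-bit
   register and is zero-extended after every addition. */
uint64_t sum_mask_each_iteration(const uint32_t *a, size_t n)
{
    uint64_t res = 0;
    for (size_t i = 0; i < n; i++) {
        res += a[i];          /* 64-bit add, like "add r3,r8,r3" */
        res &= 0xFFFFFFFFu;   /* like "clrldi r3,r3,32" */
    }
    return res;
}

/* Same loop with the zero-extension hoisted out. Carries pile up in
   the upper 32 bits, but an addition never lets upper bits influence
   lower bits, so the low 32 bits are identical. */
uint64_t sum_mask_once(const uint32_t *a, size_t n)
{
    uint64_t res = 0;
    for (size_t i = 0; i < n; i++)
        res += a[i];          /* no masking inside the loop */
    return res & 0xFFFFFFFFu; /* one zero-extension after the loop */
}
```

Both functions return the same value for any input, because the masked
bits only ever feed back into the same addition.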

The other clrldi is there for updating a loop counter; this is
strange. I see no point in keeping a loop counter in r9 at all. The
real loop counter used by bdnz is in the ctr register. Maybe gcc
keeps a shadow copy of ctr in r9, because it cannot model the ctr
register. But ctr is not zero-extended by bdnz, so r9 should not be,
either.

On AMD64 with gcc-10.2 -O1 this gives the following inner loop:

14: 03 10 add (%rax),%edx
16: 48 83 c0 04 add $0x4,%rax
1a: 48 39 c8 cmp %rcx,%rax
1d: 75 f5 jne 14 <foo+0x14>

The 32-bit add instruction in 14 automatically zero-extends %edx into
%rdx. gcc uses the address of the current element as the loop counter
in this loop.

On Aarch64 with gcc-7.5 -O1 this gives the following inner loop:

1c: b8404441 ldr w1, [x2], #4
20: 0b010000 add w0, w0, w1
24: eb03005f cmp x2, x3
28: 54ffffa1 b.ne 1c <foo+0x1c> // b.any

Again, a 32-bit add that is defined to zero-extend the result, and the
address of the current element is used as loop counter.

This reminds me of DEC's C compiler for Alpha in 1995 or a little
later (IIRC gcc did not fare any better at the time); when compiling IIRC:

int foo(int a, int b)
{
  return a|b;
}

it produced something like

addl a0, zero, a0 #sign-extend a0
addl a1, zero, a1 #sign-extend a1
orl a0, a1, a0 #do the actual or with sign extension
addl a0, zero, a0 #sign-extend a0
ret

(the result register is probably not a0, but you get the idea).

Every one of these addls was unnecessary. I repeated the experiment a
few years later (when the compiler was called Compaq C compiler), and
by that time this had been fixed. Still, it's remarkable that it was
so bad at that more than three years after Alpha had been launched.
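
Why the extensions are redundant can be sketched in portable C. This
is only a model: register contents are represented as uint64_t bit
patterns, sext32 stands in for "addl x, zero, x", and the sketch
assumes that int arguments arrive already sign-extended. All names
are invented for illustration:

```c
#include <stdint.h>

typedef uint64_t reg64;  /* raw contents of a 64-bit register */

/* Sign-extend the low 32 bits into the full register,
   using only well-defined unsigned arithmetic. */
reg64 sext32(reg64 x)
{
    reg64 low = x & 0xFFFFFFFFu;
    return (low ^ 0x80000000u) - 0x80000000u;  /* wraps modulo 2^64 */
}

/* True if the register already holds a sign-extended 32-bit value. */
int is_sext32(reg64 x)
{
    return sext32(x) == x;
}

/* The sequence described above: extend, extend, or, extend. */
reg64 or_with_extensions(reg64 a, reg64 b)
{
    a = sext32(a);
    b = sext32(b);
    return sext32(a | b);
}

/* What suffices when both inputs are already sign-extended. */
reg64 or_plain(reg64 a, reg64 b)
{
    return a | b;
}
```

In a sign-extended value, bits 63..31 are all equal; OR-ing two such
values bitwise again yields bits 63..31 that are all equal, so the
result needs no further extension.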

It also shows that I32LP64 is a bad idea, even 30 years later. They
should have gone for ILP64.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Compact representation for common integer constants

<e29a79f4-80ba-4dcb-8079-cf2f87a86b3en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16815&group=comp.arch#16815

X-Received: by 2002:a37:9e12:: with SMTP id h18mr53445612qke.483.1621183524151;
Sun, 16 May 2021 09:45:24 -0700 (PDT)
X-Received: by 2002:aca:6286:: with SMTP id w128mr40690837oib.119.1621183523934;
Sun, 16 May 2021 09:45:23 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 16 May 2021 09:45:23 -0700 (PDT)
In-Reply-To: <s7qlup$dm0$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:5bd:f925:f7c0:f4cd;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:5bd:f925:f7c0:f4cd
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me> <6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me> <f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me> <9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me> <1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7qlup$dm0$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e29a79f4-80ba-4dcb-8079-cf2f87a86b3en@googlegroups.com>
Subject: Re: Compact representation for common integer constants
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 16 May 2021 16:45:24 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Sun, 16 May 2021 16:45 UTC

On Sunday, May 16, 2021 at 3:43:08 AM UTC-5, Thomas Koenig wrote:
> MitchAlsup <Mitch...@aol.com> schrieb:
> > When you start using LLVM as a front end, it throws these clamps out any time
> > a value in a register gets arithmetically manipulated--preventing the register from
> > ever having a value outside of the type defined value-space.
> ><
> > Unsigned char i = 0;
> > .......
> > i++
> ><
> > gets compiled to::
> ><
> > MOV R19,#0
> > .......
> > ADD R19,R19,#1
> > SLL R19,R19,<8:0>
> ><
> > Yes it is ugly......but people should not ask for small types unless they need the small type.
> Does the "small types" include the C standard unsigned type?
<
It appears to be attached to all integral types smaller than 64 bits.
<
> If so, that is definitely worth a bug report with clang. I have
> just filed one for gcc for POWER which shows the same problem, at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622 .

Re: Compact representation for common integer constants

<s7svtl$qlt$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=16823&group=comp.arch#16823

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2a0a-a540-2862-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Mon, 17 May 2021 05:45:25 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <s7svtl$qlt$1@newsreader4.netcologne.de>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me>
<f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me>
<9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me>
<1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7qlup$dm0$1@newsreader4.netcologne.de>
<e29a79f4-80ba-4dcb-8079-cf2f87a86b3en@googlegroups.com>
Injection-Date: Mon, 17 May 2021 05:45:25 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2a0a-a540-2862-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2a0a:a540:2862:0:7285:c2ff:fe6c:992d";
logging-data="27325"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Mon, 17 May 2021 05:45 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:
> On Sunday, May 16, 2021 at 3:43:08 AM UTC-5, Thomas Koenig wrote:
>> MitchAlsup <Mitch...@aol.com> schrieb:
>> > When you start using LLVM as a front end, it throws these clamps out any time
>> > a value in a register gets arithmetically manipulated--preventing the register from
>> > ever having a value outside of the type defined value-space.
>> ><
>> > Unsigned char i = 0;
>> > .......
>> > i++
>> ><
>> > gets compiled to::
>> ><
>> > MOV R19,#0
>> > .......
>> > ADD R19,R19,#1
>> > SLL R19,R19,<8:0>
>> ><
>> > Yes it is ugly......but people should not ask for small types unless they need the small type.
>> Does the "small types" include the C standard unsigned type?
><
> It appears to be attached to all integral types smaller than 64 bits.

That's bad.

There is an implicit assumption that people who use the default
types in languages like C or Fortran get code that is not
sub-optimal. This violates that assumption.

Normally, I would not suggest using hardware to get around
software limits; it should be the other way around.

However, if you cannot get this fixed, would it be possible to
fuse these two instructions into one that both performs the operation
and zeroes out the upper bits, depending on type size? It would only
need to cover the most common cases, i.e. masking to 8, 16 and 32 bits.
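
To make the idea concrete, here is a small C model of what such a
fused operation would compute. It is purely hypothetical: the name
add_clamped and the width parameter are invented for illustration and
do not correspond to any existing instruction.

```c
#include <stdint.h>

/* Hypothetical fused "add and clamp": add two register values and
   zero all bits above the given type width (8, 16, 32 or 64). */
uint64_t add_clamped(uint64_t a, uint64_t b, unsigned width)
{
    uint64_t sum = a + b;            /* full 64-bit add */
    if (width >= 64)
        return sum;                  /* nothing to clear */
    return sum & ((UINT64_C(1) << width) - 1);  /* keep low bits only */
}
```

With this, the unsigned char increment from the example would be a
single add_clamped(i, 1, 8) instead of an ADD followed by a separate
masking instruction.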

>> If so, that is definitely worth a bug report with clang. I have
>> just filed one for gcc for POWER which shows the same problem, at
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622 .

I will report back on what happens in that PR, to see how other
people handle this.

Re: Compact representation for common integer constants

<s7tgfv$eht$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16824&group=comp.arch#16824

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Mon, 17 May 2021 12:28:15 +0200
Organization: A noiseless patient Spider
Lines: 67
Message-ID: <s7tgfv$eht$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me>
<f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me>
<9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me>
<1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7ir12$p9u$1@dont-email.me>
<e0345ee6-d311-4813-b7cc-09e46af1aa12n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 May 2021 10:28:15 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="a50370a2f680129f89cd1f4bf4673aa8";
logging-data="14909"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Iv/F9bQjRlyYvDwH+hXE6Qx/I64Df2OQ="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Cancel-Lock: sha1:/Zx3XQeZwsjQ5S+k4K7pNohgTZQ=
In-Reply-To: <e0345ee6-d311-4813-b7cc-09e46af1aa12n@googlegroups.com>
Content-Language: en-GB

by: David Brown - Mon, 17 May 2021 10:28 UTC

On 13/05/2021 19:52, MitchAlsup wrote:
> On Thursday, May 13, 2021 at 4:20:36 AM UTC-5, Marcus wrote:
>> On 2021-05-12, MitchAlsup wrote:
> <snip>
>>> When you start using LLVM as a front end, it throws these clamps out any time
>>> a value in a register gets arithmetically manipulated--preventing the register from
>>> ever having a value outside of the type defined value-space.
>>> <
>>> Unsigned char i = 0;
>>> .......
>>> i++
>>> <
>>> gets compiled to::
>>> <
>>> MOV R19,#0
>>> .......
>>> ADD R19,R19,#1
>>> SLL R19,R19,<8:0>
>>> <
>>> Yes it is ugly......but people should not ask for small types unless they need the small type.
>>>
>> Unfortunately they do. Partly because small types take up less memory
>> (when using large arrays/matrices), and partly because people do not
>> understand what happens on the machine code level, so they make
>> incorrect assumptions about what types to use in different situations.
> <
> In the past, one was allowed to hoist a small container into a larger register
> and let the size of the register override the size of the container. Not so in
> the LLVM architecture.
>

What language are you referring to here?

In C, the compiler is welcome to put the container into any register
that will hold it - or a smaller register, if it can be sure the results
will be the same (according to the language and implementation
specifications).

An "int" in C is supposed to be of a size that can be efficiently
implemented and used for arithmetic on the architecture, traditionally
matching the width of the ALU and/or general purpose registers - but
with a minimum of 16-bit. (In practice, it is usually limited to 32-bit
even on 64-bit systems.) So it is important that arithmetic on "int"
and "unsigned int" is efficient - less so for other types.

C does not even define arithmetic on smaller types. There is no such
operation as "increment of unsigned char" in C. When you write
"unsigned char i = 0; i++;", to C this means precisely the same as:

unsigned char i = 0;

int tmp = i;
tmp = tmp + 1;
i = tmp;

(If "int" and "char" are the same size, the promotion is to "unsigned int".)

The compiler can generate any code it likes in order to get the same
results, but that is the logical operation.

That means that the compiler could generate the "add r19, r19, #1"
instruction but omit the "sll r19, r19, <8:0>" instruction at the time -
as long as it remembers that there may be extra data in the higher bits
of r19. Maybe there will be more operations on "i" later, or a "byte
store" operation that renders the masking operation unnecessary.
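
A minimal sketch of both points in C, assuming an 8-bit unsigned char
and using uint64_t to stand in for a 64-bit register such as r19. The
function names are invented for illustration:

```c
#include <stdint.h>

/* The increment written the short way. */
unsigned char increment_short(unsigned char i)
{
    i++;
    return i;
}

/* The same increment spelled out as the abstract machine sees it:
   promote to int, add, convert back (reduces modulo UCHAR_MAX + 1). */
unsigned char increment_spelled_out(unsigned char i)
{
    int tmp = i;
    tmp = tmp + 1;
    return (unsigned char)tmp;
}

/* Masking after every step, as in the ADD + SLL pair. */
unsigned char add_n_masked(unsigned char i, unsigned n)
{
    for (unsigned k = 0; k < n; k++)
        i++;
    return i;
}

/* Masking deferred: the wide register may hold extra data in its
   upper bits until the value is finally narrowed, e.g. by a byte store. */
unsigned char add_n_deferred(unsigned char i, unsigned n)
{
    uint64_t r = i;
    for (unsigned k = 0; k < n; k++)
        r += 1;                  /* no masking here */
    return (unsigned char)r;     /* single truncation at the end */
}
```

The deferred version is only valid as long as nothing observes the
upper bits in between, for example a comparison or a right shift.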

Re: Compact representation for common integer constants

<2021May17.144318@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=16825&group=comp.arch#16825

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Mon, 17 May 2021 12:43:18 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 33
Distribution: world
Message-ID: <2021May17.144318@mips.complang.tuwien.ac.at>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com> <s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de> <s7a8u7$mui$1@dont-email.me> <9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com> <s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me> <1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com> <s7qlup$dm0$1@newsreader4.netcologne.de> <e29a79f4-80ba-4dcb-8079-cf2f87a86b3en@googlegroups.com> <s7svtl$qlt$1@newsreader4.netcologne.de>
Injection-Info: reader02.eternal-september.org; posting-host="d3db7e93078b67cb7e86fcef5a342202";
logging-data="12917"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19iygY2KtAS+yK0HkuTpIPh"
Cancel-Lock: sha1:apc1JZPhyQCS8II3zJDOXnwSYAM=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Mon, 17 May 2021 12:43 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
[Extension woes:]
>There is an implicit assumption that people who use the default
>types in languages like C or Fortran get code that is not
>sub-optimal. This violates that assumption.

Choosing a 32-bit default type on a 64-bit machine is unwise.

But it's interesting how stuff was adapted to this decision (of using
I32LP64 for C on 64-bit machines):

C defines modulo arithmetic for unsigned overflow, but not for signed
overflow, so instead of sign-extending the result, some people got the
idea to pretend that overflows don't happen, and "optimize" the
sign-extension away.

For unsigned numbers, the behaviour is defined by C, and it is
apparently important enough to
merit architectural workarounds for this problem: AMD64 and Aarch64
include 32-bit instructions that zero-extend their results (and on
AMD64 the 32-bit versions can sometimes be encoded with fewer bytes),
but no sign-extending variant of the 32-bit instructions; at first I
wondered about that, but now I think that this is because the
sign-extending variants would not provide a benefit in benchmarketing
given C compilers that "optimize" as described above.

By contrast, Alpha has only sign-extending versions of 32-bit ALU
instructions, AFAIK mainly to make porting VAX software
easier.
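
The two conventions can be modelled in portable C as follows. This is
a sketch only: registers are represented as uint64_t bit patterns, and
the function names are invented for illustration rather than taken
from any instruction set manual.

```c
#include <stdint.h>

/* 32-bit add in the AMD64/Aarch64 style:
   the upper 32 bits of the destination are cleared. */
uint64_t add32_zero_extend(uint64_t a, uint64_t b)
{
    return (a + b) & 0xFFFFFFFFu;
}

/* 32-bit add in the Alpha style:
   bit 31 of the result is copied into bits 63..32. */
uint64_t add32_sign_extend(uint64_t a, uint64_t b)
{
    uint64_t low = (a + b) & 0xFFFFFFFFu;
    return (low ^ 0x80000000u) - 0x80000000u;  /* wraps modulo 2^64 */
}
```

The low 32 bits agree in both cases; only the upper half differs, and
it differs exactly when bit 31 of the result is set. For example,
0x7FFFFFFF + 1 leaves 0x0000000080000000 in the first convention and
0xFFFFFFFF80000000 in the second.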

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Compact representation for common integer constants

<bcd3dd9a-cfcd-4691-9163-842ddf1f483dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16828&group=comp.arch#16828

X-Received: by 2002:a05:6214:2243:: with SMTP id c3mr126629qvc.21.1621265185245;
Mon, 17 May 2021 08:26:25 -0700 (PDT)
X-Received: by 2002:a9d:3623:: with SMTP id w32mr135740otb.16.1621265184933;
Mon, 17 May 2021 08:26:24 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 17 May 2021 08:26:24 -0700 (PDT)
In-Reply-To: <s7svtl$qlt$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:e8e8:91fc:a9b8:ab12;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:e8e8:91fc:a9b8:ab12
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me> <6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me> <f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me> <9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me> <1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7qlup$dm0$1@newsreader4.netcologne.de> <e29a79f4-80ba-4dcb-8079-cf2f87a86b3en@googlegroups.com>
<s7svtl$qlt$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <bcd3dd9a-cfcd-4691-9163-842ddf1f483dn@googlegroups.com>
Subject: Re: Compact representation for common integer constants
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 17 May 2021 15:26:25 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 49

by: MitchAlsup - Mon, 17 May 2021 15:26 UTC

On Monday, May 17, 2021 at 12:45:27 AM UTC-5, Thomas Koenig wrote:
> MitchAlsup <Mitch...@aol.com> schrieb:
> > On Sunday, May 16, 2021 at 3:43:08 AM UTC-5, Thomas Koenig wrote:
> >> MitchAlsup <Mitch...@aol.com> schrieb:
> >> > When you start using LLVM as a front end, it throws these clamps out any time
> >> > a value in a register gets arithmetically manipulated--preventing the register from
> >> > ever having a value outside of the type defined value-space.
> >> ><
> >> > Unsigned char i = 0;
> >> > .......
> >> > i++
> >> ><
> >> > gets compiled to::
> >> ><
> >> > MOV R19,#0
> >> > .......
> >> > ADD R19,R19,#1
> >> > SLL R19,R19,<8:0>
> >> ><
> >> > Yes it is ugly......but people should not ask for small types unless they need the small type.
> >> Does the "small types" include the C standard unsigned type?
> ><
> > It appears to be attached to all integral types smaller than 64 bits.
> That's bad.
>
> There is an implicit assumption that people who use the default
> types in languages like C or Fortran get code that is not
> sub-optimal. This violates that assumption.
>
> Normally, I would not suggest using hardware to get around
> software limits, it should be the other way around.
>
> However, if you cannot get this fixed,
<
This permeates the entire set of front ends LLVM supports; and
apparently is required for C to call/get-called from ADA.
<
> would it be possible to
> fuse these two instructions to both do the operation itself and zero out
> the upper bits, depending on type size?
<
Only by making all integer instructions other than 64-bits take 2 cycles.
<
> It would only need the
> most common cases, i.e. masking to 8, 16 and 32 bits.
> >> If so, that is definitely worth a bug report with clang. I have
> >> just filed one for gcc for POWER which shows the same problem, at
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622 .
> I will report back on what happens in that PR, to see how other
> people handle this.

Re: Compact representation for common integer constants

<75ce2084-577e-4273-9d31-174ca4372479n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16829&group=comp.arch#16829

X-Received: by 2002:a05:620a:b1b:: with SMTP id t27mr424156qkg.42.1621265350735; Mon, 17 May 2021 08:29:10 -0700 (PDT)
X-Received: by 2002:aca:2107:: with SMTP id 7mr330219oiz.110.1621265350504; Mon, 17 May 2021 08:29:10 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 17 May 2021 08:29:10 -0700 (PDT)
In-Reply-To: <s7tgfv$eht$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:e8e8:91fc:a9b8:ab12; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:e8e8:91fc:a9b8:ab12
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com> <s6udkp$hs5$1@dont-email.me> <6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com> <s74akj$siq$1@dont-email.me> <f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com> <s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de> <s7a8u7$mui$1@dont-email.me> <9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com> <s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me> <1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com> <s7ir12$p9u$1@dont-email.me> <e0345ee6-d311-4813-b7cc-09e46af1aa12n@googlegroups.com> <s7tgfv$eht$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <75ce2084-577e-4273-9d31-174ca4372479n@googlegroups.com>
Subject: Re: Compact representation for common integer constants
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 17 May 2021 15:29:10 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 69

by: MitchAlsup - Mon, 17 May 2021 15:29 UTC

On Monday, May 17, 2021 at 5:28:18 AM UTC-5, David Brown wrote:
> On 13/05/2021 19:52, MitchAlsup wrote:
> > On Thursday, May 13, 2021 at 4:20:36 AM UTC-5, Marcus wrote:
> >> On 2021-05-12, MitchAlsup wrote:
> > <snip>
> >>> When you start using LLVM as a front end, it throws these clamps out any time
> >>> a value in a register gets arithmetically manipulated--preventing the register from
> >>> ever having a value outside of the type defined value-space.
> >>> <
> >>> Unsigned char i = 0;
> >>> .......
> >>> i++
> >>> <
> >>> gets compiled to::
> >>> <
> >>> MOV R19,#0
> >>> .......
> >>> ADD R19,R19,#1
> >>> SLL R19,R19,<8:0>
> >>> <
> >>> Yes it is ugly......but people should not ask for small types unless they need the small type.
> >>>
> >> Unfortunately they do. Partly because small types take up less memory
> >> (when using large arrays/matrices), and partly because people do not
> >> understand what happens on the machine code level, so they make
> >> incorrect assumptions about what types to use in different situations.
> > <
> > In the past, one was allowed to hoist a small container into a larger register
> > and let the size of the register override the size of the container. Not so in
> > the LLVM architecture.
> >
> What language are you referring to here?
>
> In C, the compiler is welcome to put the container into any register
> that will hold it - or a smaller register, if it can be sure the results
> will be the same (according to the language and implementation
> specifications).
>
> An "int" in C is supposed to be of a size that can be efficiently
> implemented and used for arithmetic on the architecture, traditionally
> matching the width of the ALU and/or general purpose registers - but
> with a minimum of 16-bit. (In practice, it is usually limited to 32-bit
> even on 64-bit systems.) So it is important that arithmetic on "int"
> and "unsigned int" is efficient - less so for other types.
>
> C does not even define arithmetic on smaller types. There is no such
> operation as "increment of unsigned char" in C. When you write
> "unsigned char i = 0; i++;", to C this means precisely the same as:
>
> unsigned char i = 0;
>
> int tmp = i;
> tmp = tmp + 1;
> i = tmp;
>
> (If "int" and "char" are the same size, the promotion is to "unsigned int".)
>
>
> The compiler can generate any code it likes in order to get the same
> results, but that is the logical operation.
<
And apparently, the container containing shorter than full register width data
needs to get smashed back to the defined container significant size after
each operation.
>
> That means that the compiler could generate the "add r19, r19, #1"
> instruction but omit the "sll r19, r19, <8:0>" instruction at the time -
> as long as it remembers that there may be extra data in the higher bits
> of r19. Maybe there will be more operations on "i" later, or a "byte
> store" operation that renders the masking operation unnecessary.
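
The promotion rule quoted above can be checked directly. A minimal sketch, assuming a platform where int is wider than char (the usual case); the function names are made up for illustration and do not come from any of the posts:

```c
#include <assert.h>
#include <limits.h>

/* i++ written on the narrow type directly. */
unsigned char inc_direct(unsigned char i)
{
    i++;
    return i;
}

/* The same operation spelled out the way C defines it: promote to int,
   add in int, convert back to unsigned char (reduced modulo UCHAR_MAX + 1). */
unsigned char inc_promoted(unsigned char i)
{
    int tmp = i;
    tmp = tmp + 1;
    i = (unsigned char)tmp;
    return i;
}

/* Assert that both spellings agree for every unsigned char value. */
void check_promotion(void)
{
    for (int v = 0; v <= UCHAR_MAX; v++)
        assert(inc_direct((unsigned char)v) == inc_promoted((unsigned char)v));
}
```

Calling check_promotion() passes silently; the only place the narrow type shows up is the conversion on the way back into i, which is where a compiler is free to decide when (or whether) to emit the clamp.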

Re: Compact representation for common integer constants

<s7u338$cek$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=16831&group=comp.arch#16831

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2a0a-a540-2862-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Mon, 17 May 2021 15:45:44 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <s7u338$cek$1@newsreader4.netcologne.de>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me>
<f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me>
<9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me>
<1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7qlup$dm0$1@newsreader4.netcologne.de>
<e29a79f4-80ba-4dcb-8079-cf2f87a86b3en@googlegroups.com>
<s7svtl$qlt$1@newsreader4.netcologne.de>
<bcd3dd9a-cfcd-4691-9163-842ddf1f483dn@googlegroups.com>
Injection-Date: Mon, 17 May 2021 15:45:44 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2a0a-a540-2862-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2a0a:a540:2862:0:7285:c2ff:fe6c:992d";
logging-data="12756"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Mon, 17 May 2021 15:45 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:
> On Monday, May 17, 2021 at 12:45:27 AM UTC-5, Thomas Koenig wrote:
>> MitchAlsup <Mitch...@aol.com> schrieb:
>> > On Sunday, May 16, 2021 at 3:43:08 AM UTC-5, Thomas Koenig wrote:
>> >> MitchAlsup <Mitch...@aol.com> schrieb:
>> >> > When you start using LLVM as a front end, it throws these clamps out any time
>> >> > a value in a register gets arithmetically manipulated--preventing the register from
>> >> > ever having a value outside of the type defined value-space.
>> >> ><
>> >> > Unsigned char i = 0;
>> >> > .......
>> >> > i++
>> >> ><
>> >> > gets compiled to::
>> >> ><
>> >> > MOV R19,#0
>> >> > .......
>> >> > ADD R19,R19,#1
>> >> > SLL R19,R19,<8:0>
>> >> ><
>> >> > Yes it is ugly......but people should not ask for small types unless they need the small type.
>> >> Does the "small types" include the C standard unsigned type?
>> ><
>> > It appears to be attached to all integral types smaller than 64-bits.
>> That's bad.
>>
>> There is an implicit assumption that people who use the default
>> types in languages like C or Fortran get code that is not
>> sub-optimal. This violates that assumption.
>>
>> Normally, I would not suggest using hardware to get around
>> software limits, it should be the other way around.
>>
>> However, if you cannot get this fixed,
><
> This permeates the entire set of front ends LLVM supports; and
> apparently is required for C to call/get-called from ADA.

Here is the test case from the PR, which gcc compiles suboptimally
on POWER (which, like your architecture, has only 64-bit
operations).

$ cat foo.c
unsigned int foo(unsigned int *a, int n)
{
  int i;
  unsigned int res = 0;
  for (i=0; i<n; i++)
    res += a[i];

  return res;
}
$ clang -O1 -c foo.c && objdump --disassemble foo.o

foo.o: file format elf64-powerpcle

Disassembly of section .text:

0000000000000000 <foo>:
0: 01 00 04 2c cmpwi r4,1
4: 30 00 80 41 blt 34 <foo+0x34>
8: 20 00 85 78 clrldi r5,r4,32
c: fc ff 83 38 addi r4,r3,-4
10: 00 00 60 38 li r3,0
14: a6 03 a9 7c mtctr r5
18: 00 00 00 60 nop
1c: 00 00 00 60 nop
20: 04 00 a4 84 lwzu r5,4(r4)
24: 14 1a 65 7c add r3,r5,r3
28: f8 ff 00 42 bdnz 20 <foo+0x20>
2c: 20 00 63 78 clrldi r3,r3,32
30: 20 00 80 4e blr
34: 00 00 60 38 li r3,0
38: 20 00 80 4e blr

which has one mask instruction, at the end.

This is with clang version 10.0.0-4ubuntu1.

>> fuse these two instructions to both do the operation itself and zero out
>> the upper bits, depending on type size?
><
> Only by making all integer instructions other than 64-bits take 2 cycles.

So that is not an option :-)

What do you get if you run the test case above through your
compiler? If you get the mask instruction in the loop,
then something is very wrong.
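
Why one mask after the loop is enough: addition modulo 2^32 depends only on the low 32 bits of its operands, so whatever piles up in the upper half of a 64-bit register never leaks downward. A minimal sketch of that reasoning in C, with uint64_t standing in for a 64-bit register; the function names are hypothetical, and this models the idea rather than the actual POWER code:

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the sum as the C source specifies it, in 32-bit unsigned arithmetic. */
uint32_t sum_u32(const uint32_t *a, int n)
{
    uint32_t res = 0;
    for (int i = 0; i < n; i++)
        res += a[i];
    return res;
}

/* Model of a machine with only 64-bit adds: accumulate in a full-width
   register and clear the upper 32 bits once, after the loop. */
uint32_t sum_mask_once(const uint32_t *a, int n)
{
    uint64_t reg = 0;
    for (int i = 0; i < n; i++)
        reg += a[i];                        /* zero-extending load, 64-bit add */
    return (uint32_t)(reg & 0xFFFFFFFFu);   /* the single mask */
}

/* Pessimistic variant: mask after every add. Same result, more work per iteration. */
uint32_t sum_mask_each(const uint32_t *a, int n)
{
    uint64_t reg = 0;
    for (int i = 0; i < n; i++) {
        reg += a[i];
        reg &= 0xFFFFFFFFu;
    }
    return (uint32_t)reg;
}
```

All three functions return the same value for any input, which is why a mask inside the loop would be pure overhead.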

Re: Compact representation for common integer constants

<s7u51d$ds5$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=16832&group=comp.arch#16832

Path: i2pn2.org!i2pn.org!aioe.org!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2a0a-a540-2862-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Mon, 17 May 2021 16:18:53 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <s7u51d$ds5$1@newsreader4.netcologne.de>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me>
<f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me>
<9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me>
<1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7ir12$p9u$1@dont-email.me>
<e0345ee6-d311-4813-b7cc-09e46af1aa12n@googlegroups.com>
<s7tgfv$eht$1@dont-email.me>
<75ce2084-577e-4273-9d31-174ca4372479n@googlegroups.com>
Injection-Date: Mon, 17 May 2021 16:18:53 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2a0a-a540-2862-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2a0a:a540:2862:0:7285:c2ff:fe6c:992d";
logging-data="14213"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Mon, 17 May 2021 16:18 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:

>> The compiler can generate any code it likes in order to get the same
>> results, but that is the logical operation.
><
> And apparently, the container containing shorter than full register width data
> needs to get smashed back to the defined container significant size after
> each operation.

It should not, and it does not do so for POWER with LLVM 10.

My guess would be that either your LLVM is too old, or that
there is something wrong with the machine description.

(gcc is indeed buggy in that respect).

Re: Compact representation for common integer constants

<s7ulc4$1bo$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16836&group=comp.arch#16836

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Mon, 17 May 2021 22:57:40 +0200
Organization: A noiseless patient Spider
Lines: 35
Message-ID: <s7ulc4$1bo$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me>
<f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me>
<9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me>
<1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7ir12$p9u$1@dont-email.me>
<e0345ee6-d311-4813-b7cc-09e46af1aa12n@googlegroups.com>
<s7tgfv$eht$1@dont-email.me>
<75ce2084-577e-4273-9d31-174ca4372479n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 May 2021 20:57:40 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="a50370a2f680129f89cd1f4bf4673aa8";
logging-data="1400"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/5X7xu0tXDgz21XItCTYDlFOpsrIA3YUE="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Cancel-Lock: sha1:LihaC9cnSo+Px6ZBGwkt8C4Y37I=
In-Reply-To: <75ce2084-577e-4273-9d31-174ca4372479n@googlegroups.com>
Content-Language: en-GB

by: David Brown - Mon, 17 May 2021 20:57 UTC

On 17/05/2021 17:29, MitchAlsup wrote:
> On Monday, May 17, 2021 at 5:28:18 AM UTC-5, David Brown wrote:

>> C does not even define arithmetic on smaller types. There is no such
>> operation as "increment of unsigned char" in C. When you write
>> "unsigned char i = 0; i++;", to C this means precisely the same as:
>>
>> unsigned char i = 0;
>>
>> int tmp = i;
>> tmp = tmp + 1;
>> i = tmp;
>>
>> (If "int" and "char" are the same size, the promotion is to "unsigned int".)
>>
>>
>> The compiler can generate any code it likes in order to get the same
>> results, but that is the logical operation.
> <
> And apparently, the container containing shorter than full register width data
> needs to get smashed back to the defined container significant size after
> each operation.

No, it does not have to be. A compiler /might/ do that - it's perhaps
easier to write compilers that put the masking in after a statement like
"i++;" here. But compilers certainly don't /have/ to do it. It comes
down to the quality of the compiler's optimisations and code generation.

>>
>> That means that the compiler could generate the "add r19, r19, #1"
>> instruction but omit the "sll r19, r19, <8:0>" instruction at the time -
>> as long as it remembers that there may be extra data in the higher bits
>> of r19. Maybe there will be more operations on "i" later, or a "byte
>> store" operation that renders the masking operation unnecessary.
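
The bookkeeping described here (remembering that there may be extra data in the higher bits) can be modelled in C. A minimal sketch, with uint64_t standing in for r19 and hypothetical helper names: add, subtract, multiply, left shift and the bitwise operations can run on the "dirty" register, while anything that reads the upper bits (comparison, right shift, division, widening) needs the clamp first.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 64-bit register to the value range of unsigned char. */
static uint64_t mask8(uint64_t reg)
{
    return reg & 0xFFu;
}

/* Eager: clamp after every operation. */
uint8_t step_eager(uint8_t start, int count)
{
    uint64_t r19 = start;
    for (int k = 0; k < count; k++) {
        r19 = r19 * 3u + 1u;
        r19 = mask8(r19);
    }
    return (uint8_t)r19;
}

/* Lazy: leave stale upper bits in place; the final narrowing conversion
   (the "byte store") keeps only the low 8 bits. */
uint8_t step_lazy(uint8_t start, int count)
{
    uint64_t r19 = start;
    for (int k = 0; k < count; k++)
        r19 = r19 * 3u + 1u;   /* may wrap modulo 2^64; low 8 bits stay exact */
    return (uint8_t)r19;
}

/* A comparison reads the whole register, so here the clamp is required. */
int wraps_to_zero(uint8_t start, int count)
{
    uint64_t r19 = start;
    for (int k = 0; k < count; k++)
        r19 += 1u;
    return mask8(r19) == 0;
}
```

step_eager and step_lazy agree for every start value and iteration count; dropping mask8 from wraps_to_zero would give the wrong answer for start = 255, count = 1, since the register then holds 256 rather than 0.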

Re: Compact representation for common integer constants

<2beb943b-4dbe-4a48-9b26-2bb19af01b1an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16837&group=comp.arch#16837

X-Received: by 2002:a0c:dc08:: with SMTP id s8mr1817084qvk.12.1621286788023;
Mon, 17 May 2021 14:26:28 -0700 (PDT)
X-Received: by 2002:a9d:1918:: with SMTP id j24mr22678ota.329.1621286787771;
Mon, 17 May 2021 14:26:27 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 17 May 2021 14:26:27 -0700 (PDT)
In-Reply-To: <s7ulc4$1bo$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:e8e8:91fc:a9b8:ab12;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:e8e8:91fc:a9b8:ab12
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me> <6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me> <f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me> <9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me> <1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7ir12$p9u$1@dont-email.me> <e0345ee6-d311-4813-b7cc-09e46af1aa12n@googlegroups.com>
<s7tgfv$eht$1@dont-email.me> <75ce2084-577e-4273-9d31-174ca4372479n@googlegroups.com>
<s7ulc4$1bo$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2beb943b-4dbe-4a48-9b26-2bb19af01b1an@googlegroups.com>
Subject: Re: Compact representation for common integer constants
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 17 May 2021 21:26:28 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Mon, 17 May 2021 21:26 UTC

On Monday, May 17, 2021 at 3:57:45 PM UTC-5, David Brown wrote:
> On 17/05/2021 17:29, MitchAlsup wrote:
> > On Monday, May 17, 2021 at 5:28:18 AM UTC-5, David Brown wrote:
>
> >> C does not even define arithmetic on smaller types. There is no such
> >> operation as "increment of unsigned char" in C. When you write
> >> "unsigned char i = 0; i++;", to C this means precisely the same as:
> >>
> >> unsigned char i = 0;
> >>
> >> int tmp = i;
> >> tmp = tmp + 1;
> >> i = tmp;
> >>
> >> (If "int" and "char" are the same size, the promotion is to "unsigned int".)
> >>
> >>
> >> The compiler can generate any code it likes in order to get the same
> >> results, but that is the logical operation.
> > <
> > And apparently, the container containing shorter than full register width data
> > needs to get smashed back to the defined container significant size after
> > each operation.
<
> No, it does not have to be. A compiler /might/ do that - it's perhaps
> easier to write compilers that put the masking in after a statement like
> "i++;" here. But compilers certainly don't /have/ to do it. It comes
> down to the quality of the compiler's optimisations and code generation.
<
Other than this single fault, the quality of the code is outstanding !
> >>
> >> That means that the compiler could generate the "add r19, r19, #1"
> >> instruction but omit the "sll r19, r19, <8:0>" instruction at the time -
> >> as long as it remembers that there may be extra data in the higher bits
> >> of r19. Maybe there will be more operations on "i" later, or a "byte
> >> store" operation that renders the masking operation unnecessary.

Re: Compact representation for common integer constants

<s7v6j1$qns$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=16844&group=comp.arch#16844

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: bage...@gmail.com (Brian G. Lucas)
Newsgroups: comp.arch
Subject: Re: Compact representation for common integer constants
Date: Mon, 17 May 2021 20:51:27 -0500
Organization: A noiseless patient Spider
Lines: 117
Message-ID: <s7v6j1$qns$1@dont-email.me>
References: <44003c05-8b05-4e0e-acb8-bb252be14d26n@googlegroups.com>
<s6udkp$hs5$1@dont-email.me>
<6a45a966-9d86-40ed-9b16-67766956d46fn@googlegroups.com>
<s74akj$siq$1@dont-email.me>
<f94d31bd-ae99-4d0a-84d3-d16e9ba71c6fn@googlegroups.com>
<s74muh$vqf$1@dont-email.me> <s789v4$rv6$1@newsreader4.netcologne.de>
<s7a8u7$mui$1@dont-email.me>
<9f36daff-8b8f-4550-80ad-2f75dd98f319n@googlegroups.com>
<s7bo65$9gq$1@dont-email.me> <s7h9kc$q16$1@dont-email.me>
<1b0596d8-5d9a-4a33-a4e3-ff9d34fd0fc2n@googlegroups.com>
<s7qlup$dm0$1@newsreader4.netcologne.de>
<e29a79f4-80ba-4dcb-8079-cf2f87a86b3en@googlegroups.com>
<s7svtl$qlt$1@newsreader4.netcologne.de>
<bcd3dd9a-cfcd-4691-9163-842ddf1f483dn@googlegroups.com>
<s7u338$cek$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 18 May 2021 01:51:29 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="24eebaa98e64fea5131a005490b3e74d";
logging-data="27388"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/TuWeaaPKnIu3GvX7PzB6+"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.8.1
Cancel-Lock: sha1:Cf0GIZkxYzu/q0DedxpXNRNgkJk=
In-Reply-To: <s7u338$cek$1@newsreader4.netcologne.de>
Content-Language: en-US

by: Brian G. Lucas - Tue, 18 May 2021 01:51 UTC

On 5/17/21 10:45 AM, Thomas Koenig wrote:
> MitchAlsup <MitchAlsup@aol.com> schrieb:
>> On Monday, May 17, 2021 at 12:45:27 AM UTC-5, Thomas Koenig wrote:
>>> MitchAlsup <Mitch...@aol.com> schrieb:
>>>> On Sunday, May 16, 2021 at 3:43:08 AM UTC-5, Thomas Koenig wrote:
>>>>> MitchAlsup <Mitch...@aol.com> schrieb:
>>>>>> When you start using LLVM as a front end, it throws these clamps out any time
>>>>>> a value in a register gets arithmetically manipulated--preventing the register from
>>>>>> ever having a value outside of the type defined value-space.
>>>>>> <
>>>>>> Unsigned char i = 0;
>>>>>> .......
>>>>>> i++
>>>>>> <
>>>>>> gets compiled to::
>>>>>> <
>>>>>> MOV R19,#0
>>>>>> .......
>>>>>> ADD R19,R19,#1
>>>>>> SLL R19,R19,<8:0>
>>>>>> <
>>>>>> Yes it is ugly......but people should not ask for small types unless they need the small type.
>>>>> Does the "small types" include the C standard unsigned type?
>>>> <
>>>> It appears to be attached to all integral types smaller than 64-bits.
>>> That's bad.
>>>
>>> There is an implicit assumption that people who use the default
>>> types in languages like C or Fortran get code that is not
>>> sub-optimal. This violates that assumption.
>>>
>>> Normally, I would not suggest using hardware to get around
>>> software limits, it should be the other way around.
>>>
>>> However, if you cannot get this fixed,
>> <
>> This permeates the entire set of front ends LLVM supports; and
>> apparently is required for C to call/get-called from ADA.
>
> Here is the test case from the PR, which gcc suboptimizes, on
> POWER (which shares only having 64-bit operations with your
> architecture).
>
> $ cat foo.c
> unsigned int foo(unsigned int *a, int n)
> {
> int i;
> unsigned int res = 0;
> for (i=0; i<n; i++)
> res += a[i];
>
> return res;
> }
> $ clang -O1 -c foo.c && objdump --disassemble foo.o
>
> foo.o: file format elf64-powerpcle
>
>
> Disassembly of section .text:
>
> 0000000000000000 <foo>:
> 0: 01 00 04 2c cmpwi r4,1
> 4: 30 00 80 41 blt 34 <foo+0x34>
> 8: 20 00 85 78 clrldi r5,r4,32
> c: fc ff 83 38 addi r4,r3,-4
> 10: 00 00 60 38 li r3,0
> 14: a6 03 a9 7c mtctr r5
> 18: 00 00 00 60 nop
> 1c: 00 00 00 60 nop
> 20: 04 00 a4 84 lwzu r5,4(r4)
> 24: 14 1a 65 7c add r3,r5,r3
> 28: f8 ff 00 42 bdnz 20 <foo+0x20>
> 2c: 20 00 63 78 clrldi r3,r3,32
> 30: 20 00 80 4e blr
> 34: 00 00 60 38 li r3,0
> 38: 20 00 80 4e blr
>
> which has one mask instruction, at the end.
>
> This is with clang version 10.0.0-4ubuntu1.
>
>>> fuse these two instructions to both do the operation itself and zero out
>>> the upper bits, depending on type size?
>> <
>> Only by making all integer instructions other than 64-bits take 2 cycles.
>
> So that is not an option :-)
>
> What do you get if you run the test case above through your
> compiler? If you get the mask instruction in the loop,
> then something is very wrong.
>
The compiler is currently based on LLVM 9. There is no mask in the loop.
(I do wish that the loop counter and limit were "unsigned", but that also
seems to be a problem with legacy code.) However, there is some other
stupidness in the compiled code (compiled without VVM enabled):
foo: ; @foo
sra r3,r2,<32:0>
cmp r3,r3,#1
blt r3,.LBB0_1
srl r3,r2,<32:0>
mov r2,#0
.LBB0_3: ; =>This Inner Loop Header: Depth=1
lduw r4,[r1]
add r2,r4,r2
add r3,r3,#-1
add r1,r1,#4
bne0 r3,.LBB0_3
mov r1,r2
ret
.LBB0_1:
mov r2,#0
mov r1,r2
ret

brian

"Love may fail, but courtesy will prevail." -- A Kurt Vonnegut fan
