devel / comp.arch / Re: INC and DEC on AMD64

Subject                                                      Author
* Re: Predictive Store Forwarding                            Thomas Koenig
`* INC and DEC on AMD64 (was: Predictive Store Forwarding)   Anton Ertl
 `* Re: INC and DEC on AMD64                                 EricP
  `* Re: INC and DEC on AMD64                                MitchAlsup
   +* Re: INC and DEC on AMD64                               Terje Mathisen
   |`- Re: INC and DEC on AMD64                              MitchAlsup
   `- Re: INC and DEC on AMD64                               EricP

Re: Predictive Store Forwarding

<s68tme$9cv$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=16399&group=comp.arch#16399

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-deec-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Predictive Store Forwarding
Date: Tue, 27 Apr 2021 11:48:30 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <s68tme$9cv$1@newsreader4.netcologne.de>
References: <2021Apr10.132712@mips.complang.tuwien.ac.at>
<D4ieI.27704$N_4.8675@fx36.iad>
<2021Apr16.192352@mips.complang.tuwien.ac.at>
<FmkeI.14410$lyv9.13421@fx35.iad> <XOneI.3805$8O4.2396@fx16.iad>
<2021Apr18.120331@mips.complang.tuwien.ac.at>
<IRXeI.17446$%W6.245@fx44.iad>
<2021Apr18.184815@mips.complang.tuwien.ac.at>
<2021Apr18.191909@mips.complang.tuwien.ac.at>
<2021Apr26.100345@mips.complang.tuwien.ac.at>
<60868900.3704375@news.eternal-september.org>
<s66330$h8i$1@newsreader4.netcologne.de>
<2021Apr26.184059@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 27 Apr 2021 11:48:30 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-deec-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:deec:0:7285:c2ff:fe6c:992d";
logging-data="9631"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Tue, 27 Apr 2021 11:48 UTC

Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> Thomas Koenig <tkoenig@netcologne.de> writes:
>>INC and DEC are egregiously misdesigned. They only modify part
>>of the flags registers, which creates all kinds of dependencies
>>on previous writes to the flag registers.
>>
>>Just don't use them.
>
> These instructions cause complications during CPU design. These
> complications are in the CPU and paid for, so there is no reason not
> to use these instructions.

I think Agner Fog explains it better than I can.

From his "Optimizing subroutines in assembly language - An
optimization guide for x86 platforms" in the version of
2021-01-31:

# The INC and DEC instructions do not modify the carry flag but
# they do modify the other arithmetic flags. Writing to only part
# of the flags register costs an extra μop on some CPUs. It can
# cause a partial flags stalls on some older Intel processors if
# a subsequent instruction reads the carry flag or all the flag
# bits. On all processors, it can cause a false dependence on the
# carry flag from a previous instruction.

# Use ADD and SUB when optimizing for speed. Use INC and DEC when
# optimizing for size or when no penalty is expected.

> And if you think that Intel/AMD will be able to get rid of these
> instructions if VFX Forth does not use them, think again.

It will surely be kept, but if I were king, I would ban generating
these instructions in all compilers unless explicitly optimizing
for size.
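
To make the quoted behaviour concrete, here is a small sketch of the flag semantics on 8-bit values. It is only an illustrative model, not taken from the thread: the function names add8 and inc8 are made up, and AF and PF are left out for brevity. ADD writes every arithmetic flag, while INC computes the same result but passes the old carry flag through unchanged.

```python
def add8(a, b):
    """Model of an 8-bit ADD: returns (result, flags), writing all modelled flags."""
    full = a + b
    res = full & 0xFF
    flags = {
        "CF": full > 0xFF,                                  # unsigned carry out
        "ZF": res == 0,
        "SF": bool(res & 0x80),
        "OF": bool((~(a ^ b) & (a ^ res)) & 0x80),          # signed overflow
    }
    return res, flags


def inc8(a, old_flags):
    """Model of an 8-bit INC: like ADD a,1 except that CF keeps its old value."""
    res, flags = add8(a, 1)
    flags["CF"] = old_flags["CF"]   # INC does not modify the carry flag
    return res, flags


old = {"CF": False, "ZF": False, "SF": False, "OF": False}
print(add8(0xFF, 1))       # wraps to 0 and sets CF
print(inc8(0xFF, old))     # wraps to 0 but CF stays as it was
```

The pass-through of CF in inc8 is exactly the "writing to only part of the flags register" that the quoted guide refers to.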

INC and DEC on AMD64 (was: Predictive Store Forwarding)

<2021Apr28.173710@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=16410&group=comp.arch#16410

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: INC and DEC on AMD64 (was: Predictive Store Forwarding)
Date: Wed, 28 Apr 2021 15:37:10 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 50
Distribution: world
Message-ID: <2021Apr28.173710@mips.complang.tuwien.ac.at>
References: <2021Apr10.132712@mips.complang.tuwien.ac.at> <XOneI.3805$8O4.2396@fx16.iad> <2021Apr18.120331@mips.complang.tuwien.ac.at> <IRXeI.17446$%W6.245@fx44.iad> <2021Apr18.184815@mips.complang.tuwien.ac.at> <2021Apr18.191909@mips.complang.tuwien.ac.at> <2021Apr26.100345@mips.complang.tuwien.ac.at> <60868900.3704375@news.eternal-september.org> <s66330$h8i$1@newsreader4.netcologne.de> <2021Apr26.184059@mips.complang.tuwien.ac.at> <s68tme$9cv$1@newsreader4.netcologne.de>
Injection-Info: reader02.eternal-september.org; posting-host="9fd2443fbf7ae773f80ea9725d79f347";
logging-data="17531"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19xDTQV+WQRjzkh3xaaMIPJ"
Cancel-Lock: sha1:LpyMjiaIwowMsaRSxzNGTzCXGm0=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Wed, 28 Apr 2021 15:37 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
>Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>> Thomas Koenig <tkoenig@netcologne.de> writes:
>>>INC and DEC are egregiously misdesigned. They only modify part
>>>of the flags registers, which creates all kinds of dependencies
>>>on previous writes to the flag registers.
>>>
>>>Just don't use them.
>>
>> These instructions cause complications during CPU design. These
>> complications are in the CPU and paid for, so there is no reason not
>> to use these instructions.
>
>I think Agner Fog explains it better than I can.
>
>From his "Optimizing subroutines in assembly language - An
>optimization guide for x86 platforms" in the version of
>2021-01-31:
>
># The INC and DEC instructions do not modify the carry flag but
># they do modify the other arithmetic flags. Writing to only part
># of the flags register costs an extra μop on some CPUs. It can
># cause a partial flags stalls on some older Intel processors if
># a subsequent instruction reads the carry flag or all the flag
># bits.

Unfortunately, this does not tell me which CPUs are affected.

># On all processors, it can cause a false dependence on the
># carry flag from a previous instruction.

That seems to presume that INC and DEC work by producing a single
merged flags result, so INC/DEC would need to merge the flags they
change with the carry flag from some earlier instruction. AFAIK in
modern CPUs (probably going back quite a while) there are three
individual flag parts (IIRC C,V, and the rest). Intel demonstrates
with ADX that Intel CPUs don't have such a performance gotcha, and
IIRC Mitch Alsup mentioned that AMD also does it that way (don't
remember when they started with that).
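
The difference between the two designs can be sketched as a toy dependency model. This is only an illustration under stated assumptions: the group names "C", "V" and "rest" follow the split mentioned above, the table and function names are invented, and no real microarchitecture is being described.

```python
# Flag groups each instruction writes (simplified; AF and PF folded into "rest").
ALL_GROUPS = {"C", "V", "rest"}
WRITES = {
    "add": {"C", "V", "rest"},
    "inc": {"V", "rest"},        # INC leaves the carry flag alone
}


def flag_inputs(op, split_groups):
    """Flag groups an instruction must read in order to produce its flags output."""
    if split_groups:
        # Each group is renamed on its own, so an unwritten group simply
        # keeps its previous mapping and nothing has to be read.
        return set()
    # One merged flags register: every group that is not written must be
    # copied from the previous flags value, which creates a dependence.
    return ALL_GROUPS - WRITES[op]


print(flag_inputs("inc", split_groups=False))   # {'C'}
print(flag_inputs("inc", split_groups=True))    # set()
```

In the merged model INC depends on whichever earlier instruction produced the carry; in the split model that dependence disappears, which is the point being argued here.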

>It will surely be kept, but if I were king, I would ban generating
>these instructions in all compilers unless explicitly optimizing
>for size.

You are king, at least in name:-)

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: INC and DEC on AMD64

<dXgiI.257142$2N3.116343@fx33.iad>

https://www.novabbs.com/devel/article-flat.php?id=16411&group=comp.arch#16411

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.uzoreto.com!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx33.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: INC and DEC on AMD64
References: <2021Apr10.132712@mips.complang.tuwien.ac.at> <XOneI.3805$8O4.2396@fx16.iad> <2021Apr18.120331@mips.complang.tuwien.ac.at> <IRXeI.17446$%W6.245@fx44.iad> <2021Apr18.184815@mips.complang.tuwien.ac.at> <2021Apr18.191909@mips.complang.tuwien.ac.at> <2021Apr26.100345@mips.complang.tuwien.ac.at> <60868900.3704375@news.eternal-september.org> <s66330$h8i$1@newsreader4.netcologne.de> <2021Apr26.184059@mips.complang.tuwien.ac.at> <s68tme$9cv$1@newsreader4.netcologne.de> <2021Apr28.173710@mips.complang.tuwien.ac.at>
In-Reply-To: <2021Apr28.173710@mips.complang.tuwien.ac.at>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 28
Message-ID: <dXgiI.257142$2N3.116343@fx33.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Wed, 28 Apr 2021 17:28:41 UTC
Date: Wed, 28 Apr 2021 13:28:31 -0400
X-Received-Bytes: 2288
X-Received-Body-CRC: 109765577
 by: EricP - Wed, 28 Apr 2021 17:28 UTC

Anton Ertl wrote:
> Thomas Koenig <tkoenig@netcologne.de> writes:
>
>> # On all processors, it can cause a false dependence on the
>> # carry flag from a previous instruction.
>
> That seems to presume that INC and DEC work by producing a single
> merged flags result, so INC/DEC would need to merge the flags they
> change with the carry flag from some earlier instruction. AFAIK in
> modern CPUs (probably going back quite a while) there are three
> individual flag parts (IIRC C,V, and the rest). Intel demonstrates
> with ADX that Intel CPUs don't have such a performance gotcha, and
> IIRC Mitch Alsup mentioned that AMD also does it that way (don't
> remember when they started with that).

There are many other instructions that do partial flags updates,
e.g. BTC (Bit Test and Complement) and CMPXCHGxxx (Compare and Exchange).

The various shifts SAL/SAR/SHL/SHR and rotates RCL/RCR/ROL/ROR
only update the CF flag if the masked count is != 0.
Rotates update OF if masked count == 0 but shifts do not.
Other flags are unaffected.
So the shift unit has to read the old CF and OF and decide whether
to overwrite or propagate them based on count and operation.
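
A minimal sketch of the CF part of this, for a 32-bit SHL only. The function name shl32 is hypothetical, and the model assumes the 5-bit count mask that x86 applies to 32-bit operands. It shows why the old CF is an input: with a masked count of zero the flag has to be passed through untouched.

```python
MASK32 = 0xFFFFFFFF


def shl32(value, count, old_cf):
    """Model of a 32-bit SHL: returns (result, cf)."""
    value &= MASK32
    count &= 0x1F                     # the count is masked to 5 bits
    if count == 0:
        return value, old_cf          # flags unaffected: old CF propagates
    cf = bool((value >> (32 - count)) & 1)   # last bit shifted out of the top
    return (value << count) & MASK32, cf


print(shl32(0x80000000, 1, False))    # top bit moves into CF
print(shl32(0x80000000, 32, False))   # count masks to 0: nothing changes
```

Because the count is often only known at execution time (shift by CL), the hardware cannot decide at decode whether the instruction writes CF or merely forwards it.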

Re: INC and DEC on AMD64

<76afaa5d-6c79-4b1a-9ad4-13caa306b7d1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16412&group=comp.arch#16412

X-Received: by 2002:a37:7745:: with SMTP id s66mr29023937qkc.18.1619631562376;
Wed, 28 Apr 2021 10:39:22 -0700 (PDT)
X-Received: by 2002:a05:6830:1d56:: with SMTP id p22mr24373340oth.329.1619631562176;
Wed, 28 Apr 2021 10:39:22 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.mixmin.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 28 Apr 2021 10:39:21 -0700 (PDT)
In-Reply-To: <dXgiI.257142$2N3.116343@fx33.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:e877:396b:dffa:f3cf;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:e877:396b:dffa:f3cf
References: <2021Apr10.132712@mips.complang.tuwien.ac.at> <XOneI.3805$8O4.2396@fx16.iad>
<2021Apr18.120331@mips.complang.tuwien.ac.at> <IRXeI.17446$%W6.245@fx44.iad>
<2021Apr18.184815@mips.complang.tuwien.ac.at> <2021Apr18.191909@mips.complang.tuwien.ac.at>
<2021Apr26.100345@mips.complang.tuwien.ac.at> <60868900.3704375@news.eternal-september.org>
<s66330$h8i$1@newsreader4.netcologne.de> <2021Apr26.184059@mips.complang.tuwien.ac.at>
<s68tme$9cv$1@newsreader4.netcologne.de> <2021Apr28.173710@mips.complang.tuwien.ac.at>
<dXgiI.257142$2N3.116343@fx33.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <76afaa5d-6c79-4b1a-9ad4-13caa306b7d1n@googlegroups.com>
Subject: Re: INC and DEC on AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 28 Apr 2021 17:39:22 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Wed, 28 Apr 2021 17:39 UTC

On Wednesday, April 28, 2021 at 12:28:45 PM UTC-5, EricP wrote:
> Anton Ertl wrote:
> > Thomas Koenig <tko...@netcologne.de> writes:
> >
> >> # On all processors, it can cause a false dependence on the
> >> # carry flag from a previous instruction.
> >
> > That seems to presume that INC and DEC work by producing a single
> > merged flags result, so INC/DEC would need to merge the flags they
> > change with the carry flag from some earlier instruction. AFAIK in
> > modern CPUs (probably going back quite a while) there are three
> > individual flag parts (IIRC C,V, and the rest). Intel demonstrates
> > with ADX that Intel CPUs don't have such a performance gotcha, and
> > IIRC Mitch Alsup mentioned that AMD also does it that way (don't
> > remember when they started with that).
> There are many other instructions that do partial flags updates.
> e.g. BTC Bit Test and Complement, CMPXCHGxxx Compare and Exchange,
>
> The various shift SAL/SAR/SHL/SHR and rotate RCL/RCR/ROL/ROR
> only updates CF flag if the masked count is != 0.
> Rotates update OF if masked count == 0 but shifts do not.
> Other flags are unaffected.
> So shift unit has to read the old CF and OF and decide whether
> to overwrite or propagate them based on count and operation.
<
The architect who designed this flag stuff should be taken out and shot,
then hanged, drawn and quartered, and dropped in boiling oil.

Re: INC and DEC on AMD64

<s6ccp0$1tt7$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=16413&group=comp.arch#16413

Path: i2pn2.org!i2pn.org!aioe.org!+9JlleTFc3MOERf2LU/SVA.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: INC and DEC on AMD64
Date: Wed, 28 Apr 2021 21:24:17 +0200
Organization: Aioe.org NNTP Server
Lines: 59
Message-ID: <s6ccp0$1tt7$1@gioia.aioe.org>
References: <2021Apr10.132712@mips.complang.tuwien.ac.at>
<XOneI.3805$8O4.2396@fx16.iad> <2021Apr18.120331@mips.complang.tuwien.ac.at>
<IRXeI.17446$%W6.245@fx44.iad> <2021Apr18.184815@mips.complang.tuwien.ac.at>
<2021Apr18.191909@mips.complang.tuwien.ac.at>
<2021Apr26.100345@mips.complang.tuwien.ac.at>
<60868900.3704375@news.eternal-september.org>
<s66330$h8i$1@newsreader4.netcologne.de>
<2021Apr26.184059@mips.complang.tuwien.ac.at>
<s68tme$9cv$1@newsreader4.netcologne.de>
<2021Apr28.173710@mips.complang.tuwien.ac.at>
<dXgiI.257142$2N3.116343@fx33.iad>
<76afaa5d-6c79-4b1a-9ad4-13caa306b7d1n@googlegroups.com>
NNTP-Posting-Host: +9JlleTFc3MOERf2LU/SVA.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7
X-Notice: Filtered by postfilter v. 0.9.2
 by: Terje Mathisen - Wed, 28 Apr 2021 19:24 UTC

MitchAlsup wrote:
> On Wednesday, April 28, 2021 at 12:28:45 PM UTC-5, EricP wrote:
>> Anton Ertl wrote:
>>> Thomas Koenig <tko...@netcologne.de> writes:
>>>
>>>> # On all processors, it can cause a false dependence on the
>>>> # carry flag from a previous instruction.
>>>
>>> That seems to presume that INC and DEC work by producing a single
>>> merged flags result, so INC/DEC would need to merge the flags they
>>> change with the carry flag from some earlier instruction. AFAIK in
>>> modern CPUs (probably going back quite a while) there are three
>>> individual flag parts (IIRC C,V, and the rest). Intel demonstrates
>>> with ADX that Intel CPUs don't have such a performance gotcha, and
>>> IIRC Mitch Alsup mentioned that AMD also does it that way (don't
>>> remember when they started with that).
>> There are many other instructions that do partial flags updates.
>> e.g. BTC Bit Test and Complement, CMPXCHGxxx Compare and Exchange,
>>
>> The various shift SAL/SAR/SHL/SHR and rotate RCL/RCR/ROL/ROR
>> only updates CF flag if the masked count is != 0.
>> Rotates update OF if masked count == 0 but shifts do not.
>> Other flags are unaffected.
>> So shift unit has to read the old CF and OF and decide whether
>> to overwrite or propagate them based on count and operation.
> <
> The architect who designed this flag stuff should be taken out and shot,
> then hanged, drawn and quartered, and dropped in boiling oil.
>
Hmmm...

To me it almost sounds like you don't particularly like it, tell me that
can't be true!

More seriously, with perfect 20-20 hindsight it would have been better,
starting about 20 years after the original 8086 design, to have had a
separate set of flag-updating instructions.

OTOH, I really cannot blame them too much given how many times I have
taken advantage of the various flag quirks in my asm code, and as I
wrote a few days ago, some of these were crucially important in getting
the 8088 in the original PC to run usefully fast.

The main Achilles' heel back then was the extremely low bandwidth of
just 1/4 byte per cycle, and it had to be shared by code & data
combined, so minimizing the number of executed bytes as well as taken
branches were the only really important rules.
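
As a back-of-the-envelope illustration of that rule: at 1/4 byte per cycle, each instruction byte costs about four bus cycles to fetch. The sketch below compares a one-byte INC AX with a three-byte ADD AX,1 in 16-bit code. The names are made up for the example, and the model deliberately ignores prefetch-queue overlap and competing data accesses.

```python
CYCLES_PER_BYTE = 4    # 1/4 byte per cycle on the 8088 bus, as stated above


def fetch_cycles(instruction_bytes):
    """Rough number of bus cycles needed just to fetch an instruction."""
    return instruction_bytes * CYCLES_PER_BYTE


inc_ax = fetch_cycles(1)     # INC AX encodes in one byte
add_ax_1 = fetch_cycles(3)   # ADD AX,1 encodes in three bytes
print(inc_ax, add_ax_1)      # 4 12
```

On a machine that is fetch-bound most of the time, the shorter encoding wins regardless of what the flags do.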

In real code the only common instructions that ran slower than their
load time were probably just MUL and the repeated string ops. I know I
used a MUL to allow the prefetch queue to fill up when I wanted to
measure the size of it to check if I was running on an 8088 or a 16-bit
bus 8086 (which had 2 more prefetch buffer bytes).

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: INC and DEC on AMD64

<10abb271-ff64-4d00-a835-a589c2135a79n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16414&group=comp.arch#16414

X-Received: by 2002:a37:8b86:: with SMTP id n128mr30183016qkd.151.1619638655989;
Wed, 28 Apr 2021 12:37:35 -0700 (PDT)
X-Received: by 2002:aca:355:: with SMTP id 82mr1538641oid.155.1619638655697;
Wed, 28 Apr 2021 12:37:35 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 28 Apr 2021 12:37:35 -0700 (PDT)
In-Reply-To: <s6ccp0$1tt7$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:e877:396b:dffa:f3cf;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:e877:396b:dffa:f3cf
References: <2021Apr10.132712@mips.complang.tuwien.ac.at> <XOneI.3805$8O4.2396@fx16.iad>
<2021Apr18.120331@mips.complang.tuwien.ac.at> <IRXeI.17446$%W6.245@fx44.iad>
<2021Apr18.184815@mips.complang.tuwien.ac.at> <2021Apr18.191909@mips.complang.tuwien.ac.at>
<2021Apr26.100345@mips.complang.tuwien.ac.at> <60868900.3704375@news.eternal-september.org>
<s66330$h8i$1@newsreader4.netcologne.de> <2021Apr26.184059@mips.complang.tuwien.ac.at>
<s68tme$9cv$1@newsreader4.netcologne.de> <2021Apr28.173710@mips.complang.tuwien.ac.at>
<dXgiI.257142$2N3.116343@fx33.iad> <76afaa5d-6c79-4b1a-9ad4-13caa306b7d1n@googlegroups.com>
<s6ccp0$1tt7$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <10abb271-ff64-4d00-a835-a589c2135a79n@googlegroups.com>
Subject: Re: INC and DEC on AMD64
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 28 Apr 2021 19:37:35 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Wed, 28 Apr 2021 19:37 UTC

On Wednesday, April 28, 2021 at 2:24:21 PM UTC-5, Terje Mathisen wrote:
> MitchAlsup wrote:
> > On Wednesday, April 28, 2021 at 12:28:45 PM UTC-5, EricP wrote:
> >> Anton Ertl wrote:
> >>> Thomas Koenig <tko...@netcologne.de> writes:
> >>>
> >>>> # On all processors, it can cause a false dependence on the
> >>>> # carry flag from a previous instruction.
> >>>
> >>> That seems to presume that INC and DEC work by producing a single
> >>> merged flags result, so INC/DEC would need to merge the flags they
> >>> change with the carry flag from some earlier instruction. AFAIK in
> >>> modern CPUs (probably going back quite a while) there are three
> >>> individual flag parts (IIRC C,V, and the rest). Intel demonstrates
> >>> with ADX that Intel CPUs don't have such a performance gotcha, and
> >>> IIRC Mitch Alsup mentioned that AMD also does it that way (don't
> >>> remember when they started with that).
> >> There are many other instructions that do partial flags updates.
> >> e.g. BTC Bit Test and Complement, CMPXCHGxxx Compare and Exchange,
> >>
> >> The various shift SAL/SAR/SHL/SHR and rotate RCL/RCR/ROL/ROR
> >> only updates CF flag if the masked count is != 0.
> >> Rotates update OF if masked count == 0 but shifts do not.
> >> Other flags are unaffected.
> >> So shift unit has to read the old CF and OF and decide whether
> >> to overwrite or propagate them based on count and operation.
> > <
> > The architect who designed this flag stuff should be taken out and shot,
> > then hanged, drawn and quartered, and dropped in boiling oil.
> >
> Hmmm...
>
> To me it almost sounds like you don't particularly like it, tell me that
> can't be true!
<
I jest not.
<
>
> More seriously, with perfect 20-20 hindsight it would have been better,
> starting about 20 years after the original 8086 design, to have had a
> separate set of flag-updating instructions.
>
No flags at all is better still. The PDP-8 showed the way (excepting for that
link bit thingie.)
>
>
> OTOH, I really cannot blame them too much given how many times I have
> taken advantage of the various flag quirks in my asm code, and as I
> wrote a few days ago, some of these were crucially important in getting
> the 8088 in the original PC to run usefully fast.
>
> The main Achilles' heel back then was the extremely low bandwidth of
> just 1/4 byte per cycle, and it had to be shared by code & data
> combined, so minimizing the number of executed bytes as well as taken
> branches were the only really important rules.
>
> In real code the only common instructions that ran slower than their
> load time were probably just MUL and the repeated string ops. I know I
> used a MUL to allow the prefetch queue to fill up when I wanted to
> measure the size of it to check if I was running on an 8088 or a 16-bit
> bus 8086 (which had 2 more prefetch buffer bytes).
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: INC and DEC on AMD64

<QEyiI.13703$5%7.12018@fx13.iad>

https://www.novabbs.com/devel/article-flat.php?id=16420&group=comp.arch#16420

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!fdcspool3.netnews.com!news-out.netnews.com!news.alt.net!fdc3.netnews.com!peer04.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx13.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: INC and DEC on AMD64
References: <2021Apr10.132712@mips.complang.tuwien.ac.at> <XOneI.3805$8O4.2396@fx16.iad> <2021Apr18.120331@mips.complang.tuwien.ac.at> <IRXeI.17446$%W6.245@fx44.iad> <2021Apr18.184815@mips.complang.tuwien.ac.at> <2021Apr18.191909@mips.complang.tuwien.ac.at> <2021Apr26.100345@mips.complang.tuwien.ac.at> <60868900.3704375@news.eternal-september.org> <s66330$h8i$1@newsreader4.netcologne.de> <2021Apr26.184059@mips.complang.tuwien.ac.at> <s68tme$9cv$1@newsreader4.netcologne.de> <2021Apr28.173710@mips.complang.tuwien.ac.at> <dXgiI.257142$2N3.116343@fx33.iad> <76afaa5d-6c79-4b1a-9ad4-13caa306b7d1n@googlegroups.com>
In-Reply-To: <76afaa5d-6c79-4b1a-9ad4-13caa306b7d1n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 52
Message-ID: <QEyiI.13703$5%7.12018@fx13.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Thu, 29 Apr 2021 13:37:52 UTC
Date: Thu, 29 Apr 2021 09:36:43 -0400
X-Received-Bytes: 3724
 by: EricP - Thu, 29 Apr 2021 13:36 UTC

MitchAlsup wrote:
> On Wednesday, April 28, 2021 at 12:28:45 PM UTC-5, EricP wrote:
>> Anton Ertl wrote:
>>> Thomas Koenig <tko...@netcologne.de> writes:
>>>
>>>> # On all processors, it can cause a false dependence on the
>>>> # carry flag from a previous instruction.
>>> That seems to presume that INC and DEC work by producing a single
>>> merged flags result, so INC/DEC would need to merge the flags they
>>> change with the carry flag from some earlier instruction. AFAIK in
>>> modern CPUs (probably going back quite a while) there are three
>>> individual flag parts (IIRC C,V, and the rest). Intel demonstrates
>>> with ADX that Intel CPUs don't have such a performance gotcha, and
>>> IIRC Mitch Alsup mentioned that AMD also does it that way (don't
>>> remember when they started with that).
>> There are many other instructions that do partial flags updates.
>> e.g. BTC Bit Test and Complement, CMPXCHGxxx Compare and Exchange,
>>
>> The various shift SAL/SAR/SHL/SHR and rotate RCL/RCR/ROL/ROR
>> only updates CF flag if the masked count is != 0.
>> Rotates update OF if masked count == 0 but shifts do not.
>> Other flags are unaffected.
>> So shift unit has to read the old CF and OF and decide whether
>> to overwrite or propagate them based on count and operation.
> <
> The architect who designed this flag stuff should be taken out and shot,
> then hanged, drawn and quartered, and dropped in boiling oil.

INC, DEC and RCL/RCR/ROL/ROR are courtesy of the 8008.
The rotates were 1 bit only and it had no shifts.
It has 4 flags (carry, zero, sign and parity),
and the partial flag updates start here
(e.g. rotates only update CF, not the others).

The auxiliary carry flag, aka half/nibble carry, comes from the 8080.

The overflow flag and multi-bit shifts and rotates are from 8086.

The real kicker is that for shifts, the value of the OF flag
is only set correctly if the shift count == 1.
For all other count values, the OF flag is undefined
(and the 8086 manual notes this).
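
A small sketch of that rule for a 32-bit SHL, with None standing in for "undefined". The function name shl32_of is hypothetical, the 5-bit count mask is the one x86 applies to 32-bit operands, and the count == 1 case uses the documented definition for left shifts (OF is the XOR of the result's top bit and CF).

```python
MASK32 = 0xFFFFFFFF


def shl32_of(value, count, old_of):
    """Model of OF after a 32-bit SHL; None means architecturally undefined."""
    value &= MASK32
    count &= 0x1F
    if count == 0:
        return old_of                 # flags unaffected
    if count != 1:
        return None                   # documented as undefined
    result = (value << 1) & MASK32
    cf = (value >> 31) & 1            # bit shifted out
    return bool((result >> 31) ^ cf)  # top bit of result XOR CF


print(shl32_of(0x40000000, 1, False))   # sign changed: overflow
print(shl32_of(0x00000001, 2, False))   # multi-bit shift: undefined
```

Three different outcomes (pass through, undefined, computed) hang off a count that may only be known at run time, which is what makes the renaming logic awkward.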

So to summarize, in x86 and x64, WRT shifts and the overflow flag,
the purpose of all that separate flag rename and value forwarding
and OoO wake-up network logic everyone worked so diligently to create
is to maintain compatibility back to the 8086
with a value that, for shift count != 1, is documented as garbage.

Have a nice day! :-)
