
computers / comp.os.vms / Re: x86-64 data aligment / faulting

Subject (Author)
* x86-64 data aligment / faulting (Mark Daniel)
+* Re: x86-64 data aligment / faulting (Mark Daniel)
|`* Re: x86-64 data aligment / faulting (Arne Vajhøj)
| `* Re: x86-64 data aligment / faulting (Bob Gezelter)
|  `* Re: x86-64 data aligment / faulting (Arne Vajhøj)
|   +* Re: x86-64 data aligment / faulting (Bob Gezelter)
|   |`* Re: x86-64 data aligment / faulting (Arne Vajhøj)
|   | +* Re: x86-64 data aligment / faulting (Arne Vajhøj)
|   | |`* Re: x86-64 data aligment / faulting (Joukj)
|   | | `- Re: x86-64 data aligment / faulting (Arne Vajhøj)
|   | `* Re: x86-64 data aligment / faulting (Simon Clubley)
|   |  `* Re: x86-64 data aligment / faulting (Simon Clubley)
|   |   `* Re: x86-64 data aligment / faulting (Arne Vajhøj)
|   |    `- Re: x86-64 data aligment / faulting (abrsvc)
|   `* Re: x86-64 data aligment / faulting (Simon Clubley)
|    `* Re: x86-64 data aligment / faulting (Bill Gunshannon)
|     +- Re: x86-64 data aligment / faulting (Scott Dorsey)
|     `* Re: x86-64 data aligment / faulting (gah4)
|      +* Re: x86-64 data aligment / faulting (Jan-Erik Söderholm)
|      |`- Re: x86-64 data aligment / faulting (gah4)
|      `* Re: x86-64 data aligment / faulting (Hein RMS van den Heuvel)
|       `- Re: x86-64 data aligment / faulting (gah4)
+- Re: x86-64 data aligment / faulting (Stephen Hoffman)
+- Re: x86-64 data aligment / faulting (gah4)
+* Re: x86-64 data aligment / faulting (Mark Daniel)
|+- Re: x86-64 data alignment / faulting (Scott Dorsey)
|+* Re: x86-64 data alignment / faulting (Arne Vajhøj)
||`- Re: x86-64 data alignment / faulting (Scott Dorsey)
|+* Re: x86-64 data aligment / faulting (John Reagan)
||`* Re: x86-64 data aligment / faulting (John Reagan)
|| `- Re: x86-64 data aligment / faulting (Arne Vajhøj)
|`- Re: x86-64 data alignment / faulting (Michael S)
`* Re: x86-64 data aligment / faulting (Jon Schneider)
 +- Re: x86-64 data aligment / faulting (Scott Dorsey)
 `- Re: x86-64 data aligment / faulting (Michael S)

x86-64 data aligment / faulting

<GdcSJ.1239823$X81f.965693@fx14.ams4>

https://www.novabbs.com/computers/article-flat.php?id=21038&group=comp.os.vms#21038

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!news.freedyn.de!newsreader4.netcologne.de!news.netcologne.de!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!fx14.ams4.POSTED!not-for-mail
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:91.0)
Gecko/20100101 Thunderbird/91.6.1
Newsgroups: comp.os.vms
Reply-To: mark.daniel@wasd.vsm.com.au
Content-Language: en-US
From: mark.dan...@wasd.vsm.com.au (Mark Daniel)
Subject: x86-64 data aligment / faulting
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 8
Message-ID: <GdcSJ.1239823$X81f.965693@fx14.ams4>
X-Complaints-To: abuse@eweka.nl
NNTP-Posting-Date: Fri, 25 Feb 2022 21:53:42 UTC
Organization: Eweka Internet Services
Date: Sat, 26 Feb 2022 08:23:39 +1030
X-Received-Bytes: 1106
 by: Mark Daniel - Fri, 25 Feb 2022 21:53 UTC

Alpha and Itanium had data alignment requirements with penalties for
faulting. Does x86-64? Is sys$start_align_fault_report() et al. still
relevant?

--
Anyone who, using social media, forms an opinion regarding anything
other than the relative cuteness of this or that puppy-dog needs
seriously to examine their critical thinking.

Re: x86-64 data aligment / faulting

<s1eSJ.2188479$_xc2.746813@fx02.ams4>

https://www.novabbs.com/computers/article-flat.php?id=21039&group=comp.os.vms#21039

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!fx02.ams4.POSTED!not-for-mail
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:91.0)
Gecko/20100101 Thunderbird/91.6.1
Reply-To: mark.daniel@wasd.vsm.com.au
Subject: Re: x86-64 data aligment / faulting
Content-Language: en-US
Newsgroups: comp.os.vms
References: <GdcSJ.1239823$X81f.965693@fx14.ams4>
From: mark.dan...@wasd.vsm.com.au (Mark Daniel)
In-Reply-To: <GdcSJ.1239823$X81f.965693@fx14.ams4>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 12
Message-ID: <s1eSJ.2188479$_xc2.746813@fx02.ams4>
X-Complaints-To: abuse@eweka.nl
NNTP-Posting-Date: Fri, 25 Feb 2022 23:57:12 UTC
Organization: Eweka Internet Services
Date: Sat, 26 Feb 2022 10:27:11 +1030
X-Received-Bytes: 1379
 by: Mark Daniel - Fri, 25 Feb 2022 23:57 UTC

On 26/2/22 8:23 am, Mark Daniel wrote:
> Alpha and Itanium had data alignment requirements with penalties for
> faulting.  Does x86-64?  Is sys$start_align_fault_report() et al. still
> relevant?

Hmmm. Using an alignment-fault generator and reporter, I'm seeing plenty
of faults on Alpha and Itanium, and zero on x86-64.

Re: x86-64 data aligment / faulting

<62197084$0$701$14726298@news.sunsite.dk>

https://www.novabbs.com/computers/article-flat.php?id=21040&group=comp.os.vms#21040

Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!dotsrc.org!filter.dotsrc.org!news.dotsrc.org!not-for-mail
Date: Fri, 25 Feb 2022 19:12:44 -0500
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.6.1
Subject: Re: x86-64 data aligment / faulting
Content-Language: en-US
Newsgroups: comp.os.vms
References: <GdcSJ.1239823$X81f.965693@fx14.ams4>
<s1eSJ.2188479$_xc2.746813@fx02.ams4>
From: arn...@vajhoej.dk (Arne Vajhøj)
In-Reply-To: <s1eSJ.2188479$_xc2.746813@fx02.ams4>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 19
Message-ID: <62197084$0$701$14726298@news.sunsite.dk>
Organization: SunSITE.dk - Supporting Open source
NNTP-Posting-Host: 705b9eae.news.sunsite.dk
X-Trace: 1645834372 news.sunsite.dk 701 arne@vajhoej.dk/68.9.63.232:57683
X-Complaints-To: staff@sunsite.dk
 by: Arne Vajhøj - Sat, 26 Feb 2022 00:12 UTC

On 2/25/2022 6:57 PM, Mark Daniel wrote:
> On 26/2/22 8:23 am, Mark Daniel wrote:
>> Alpha and Itanium had data alignment requirements with penalties for
>> faulting.  Does x86-64?  Is sys$start_align_fault_report() et al.
>> still relevant?
>
> Hmmm.  Using an alignment fault generator and reporter I'm seeing plenty
> on Alpha and Itanium; zero on x86-64.

I had an old Fortran program that tests alignment overhead, and
I just ran it on Windows x86-64. It showed absolutely
no overhead for misaligned REAL*8 arrays (whereas there is
a lot of overhead on VMS Alpha and Itanium).

I guess we can say welcome back to CISC. :-)

Arne

Re: x86-64 data aligment / faulting

<ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>

https://www.novabbs.com/computers/article-flat.php?id=21041&group=comp.os.vms#21041

X-Received: by 2002:a05:620a:15c1:b0:649:1a2b:4850 with SMTP id o1-20020a05620a15c100b006491a2b4850mr6601703qkm.525.1645850240524;
Fri, 25 Feb 2022 20:37:20 -0800 (PST)
X-Received: by 2002:ac8:118a:0:b0:2d8:a3b2:7501 with SMTP id
d10-20020ac8118a000000b002d8a3b27501mr9481440qtj.354.1645850240369; Fri, 25
Feb 2022 20:37:20 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.os.vms
Date: Fri, 25 Feb 2022 20:37:20 -0800 (PST)
In-Reply-To: <62197084$0$701$14726298@news.sunsite.dk>
Injection-Info: google-groups.googlegroups.com; posting-host=100.2.113.217; posting-account=r2_qcwoAAACbIdit5Eka3ivGvrYZz7UQ
NNTP-Posting-Host: 100.2.113.217
References: <GdcSJ.1239823$X81f.965693@fx14.ams4> <s1eSJ.2188479$_xc2.746813@fx02.ams4>
<62197084$0$701$14726298@news.sunsite.dk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>
Subject: Re: x86-64 data aligment / faulting
From: gezel...@rlgsc.com (Bob Gezelter)
Injection-Date: Sat, 26 Feb 2022 04:37:20 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 47
 by: Bob Gezelter - Sat, 26 Feb 2022 04:37 UTC

On Friday, February 25, 2022 at 7:12:55 PM UTC-5, Arne Vajhøj wrote:
> On 2/25/2022 6:57 PM, Mark Daniel wrote:
> > On 26/2/22 8:23 am, Mark Daniel wrote:
> >> Alpha and Itanium had data alignment requirements with penalties for
> >> faulting. Does x86-64? Is sys$start_align_fault_report() et al.
> >> still relevant?
> >
> > Hmmm. Using an alignment fault generator and reporter I'm seeing plenty
> > on Alpha and Itanium; zero on x86-64.
> I had an old Fortran program testing alignment overhead and
> I just ran it on Windows x86-64 and it showed absolutely
> no overhead for bad alignment of REAL*8 arrays (and there is
> a lot of overhead on VMS Alpha and Itanium).
>
> I guess we can say welcome back to CISC. :-)
>
> Arne
Arne,

With all due respect: as one who did a lot of work on non-faulting IBM System/370 processors back in the day, I can say the performance penalty for non-aligned references is still very real. The same was true of VAX CPUs. They did not fault, but they paid a performance penalty.

One thing is different from the days of the System/370 and the VAX: large, multi-level caches.

First, the caches close to the processing core are very fast, which obscures the performance lost to non-aligned references. Second, all loads/stores to/from a cache are, almost by definition, aligned, since the cache moves whole, aligned lines.

A program designed to produce alignment faults is also very unlikely to stress the memory system in a way that exposes the misaligned-data penalty. Faults, which are synchronous interrupts, have overhead orders of magnitude greater than a double memory fetch, particularly when sequential elements are referenced (sequential elements may well be in the same cache line, even if not aligned on the proper boundary).
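The parenthetical about sequential elements sharing a cache line can be made concrete with a little arithmetic. The sketch below assumes 64-byte cache lines (typical of current x86-64 parts, but an assumption here, not something stated in the thread) and counts how many 8-byte elements of a misaligned array actually cross a line boundary; the helper names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if an access of `size` bytes at byte
 * address `addr` spans two cache lines of `line` bytes (the line size
 * is an assumed parameter, not queried from the hardware). */
static int straddles_line(size_t addr, size_t size, size_t line)
{
    return (addr % line) + size > line;
}

/* Hypothetical helper: counts how many of `count` consecutive
 * `elem`-byte elements, the first at byte address `start`, straddle a
 * cache-line boundary and so would need two line fetches. */
static size_t count_straddles(size_t start, size_t elem, size_t count,
                              size_t line)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++)
        n += (size_t)straddles_line(start + i * elem, elem, line);
    return n;
}
```

With a 1-byte offset, 128 of 1024 consecutive 8-byte elements straddle a 64-byte line, one in eight; the other seven in eight are served from a single line, which is one reason a sequential sweep can hide most of the cost.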

If I had the spare time to play with it, I would write a program that randomly addresses a storage area larger than the total cache size, so that every memory reference is a cache miss, then run it with aligned and unaligned data references and compare the results.
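A minimal C sketch of that experiment, under stated assumptions: the caller is assumed to pass a buffer larger than the last-level cache, reads go through `memcpy` because dereferencing a misaligned `double *` is undefined behaviour in C, and the xorshift generator and all function names are illustrative choices, not anything proposed in the thread.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Reads a double from a possibly misaligned address. memcpy is the
 * portable way to express this; casting to (double *) is undefined
 * behaviour when the address is not suitably aligned. */
static double load_double(const unsigned char *p)
{
    double d;
    memcpy(&d, p, sizeof d);
    return d;
}

/* xorshift64 PRNG (an illustrative choice) giving a reproducible
 * sequence of random slots. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Hypothetical benchmark kernel: performs `reps` reads at random 8-byte
 * slots of `buf`, each displaced by `offset` bytes (0 = aligned).
 * `buf` must hold nslots * 8 + offset bytes (offset <= 8 assumed).
 * Stores elapsed CPU seconds in *secs and returns the sum so the loads
 * cannot be optimised away. */
static double random_reads(const unsigned char *buf, size_t nslots,
                           size_t offset, size_t reps, double *secs)
{
    uint64_t state = 0x9E3779B97F4A7C15ULL; /* same sequence every call */
    double sum = 0.0;
    clock_t t0 = clock();
    for (size_t i = 0; i < reps; i++) {
        size_t slot = (size_t)(xorshift64(&state) % nslots);
        sum += load_double(buf + slot * 8 + offset);
    }
    *secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return sum;
}
```

Calling `random_reads` on a buffer of, say, a few hundred megabytes with `offset` 0 and then 1 (same seed, so the same slot sequence) and comparing `secs` approximates the proposed comparison; `clock()` measures processor time, which is adequate for a rough single-threaded test.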

It is easy for a benchmark to measure the wrong phenomenon.

- Bob Gezelter, http://www.rlgsc.com

Re: x86-64 data aligment / faulting

<svce4c$1ub$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=21042&group=comp.os.vms#21042

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: seaoh...@hoffmanlabs.invalid (Stephen Hoffman)
Newsgroups: comp.os.vms
Subject: Re: x86-64 data aligment / faulting
Date: Sat, 26 Feb 2022 00:34:04 -0500
Organization: HoffmanLabs LLC
Lines: 39
Message-ID: <svce4c$1ub$1@dont-email.me>
References: <GdcSJ.1239823$X81f.965693@fx14.ams4>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: reader02.eternal-september.org; posting-host="8e880ef3e0f3bbd42d6a4f6c73b33655";
logging-data="1995"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+L9KvxUQI2kSFjUsHC+YGQuk8IMXocLi4="
User-Agent: Unison/2.2
Cancel-Lock: sha1:llCfzcB5cwLj7avLouzEdtiFkdM=
 by: Stephen Hoffman - Sat, 26 Feb 2022 05:34 UTC

On 2022-02-25 22:23:39 +0000, Mark Daniel said:

> Alpha and Itanium had data alignment requirements with penalties for
> faulting. Does x86-64? Is sys$start_align_fault_report() et al. still
> relevant?

I suspect the VSI answer will be "It depends". Some instruction subsets
can either greatly benefit from or require alignment. Others, not so
much. With recent processor generations, alignment is not so much of an
issue, save for how much cache it might use.

VSI will be using LLVM code generation for the foreseeable future, too,
which mostly avoids having to deal with this for compiled code, at least
until VSI has ~finished the port and starts profiling VSI and app code.
A lot of folks have looked at that code generation.

"On the Sandy Bridge, there is no performance penalty for reading or
writing misaligned memory operands, except for the fact that it uses
more cache banks so that the risk of cache conflicts is higher when the
operand is misaligned. Store-to-load forwarding also works with
misaligned operands in most cases."

Page-table swapping will be interesting to profile, as OpenVMS has
~twice the usual number of page tables in use with the four-mode
"emulation" work.

Some related reading...

https://www.agner.org/optimize/
https://www.agner.org/forum/viewtopic.php?f=1&t=75&sid=9aec4d9491a7ec1f7d01e971651c13be

https://pzemtsov.github.io/2016/11/06/bug-story-alignment-on-x86.html
https://community.intel.com/t5/Software-Tuning-Performance/Why-should-data-be-aligned-to-16-bytes-for-SSE-instructions/m-p/1164004

https://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/

--
Pure Personal Opinion | HoffmanLabs LLC

Re: x86-64 data aligment / faulting

<621a9d77$0$701$14726298@news.sunsite.dk>

https://www.novabbs.com/computers/article-flat.php?id=21048&group=comp.os.vms#21048

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!dotsrc.org!filter.dotsrc.org!news.dotsrc.org!not-for-mail
Date: Sat, 26 Feb 2022 16:36:46 -0500
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.6.1
Subject: Re: x86-64 data aligment / faulting
Content-Language: en-US
Newsgroups: comp.os.vms
References: <GdcSJ.1239823$X81f.965693@fx14.ams4>
<s1eSJ.2188479$_xc2.746813@fx02.ams4>
<62197084$0$701$14726298@news.sunsite.dk>
<ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>
From: arn...@vajhoej.dk (Arne Vajhøj)
In-Reply-To: <ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 97
Message-ID: <621a9d77$0$701$14726298@news.sunsite.dk>
Organization: SunSITE.dk - Supporting Open source
NNTP-Posting-Host: 604b035f.news.sunsite.dk
X-Trace: 1645911415 news.sunsite.dk 701 arne@vajhoej.dk/68.9.63.232:59394
X-Complaints-To: staff@sunsite.dk
 by: Arne Vajhøj - Sat, 26 Feb 2022 21:36 UTC

On 2/25/2022 11:37 PM, Bob Gezelter wrote:
> On Friday, February 25, 2022 at 7:12:55 PM UTC-5, Arne Vajhøj wrote:
>> [snip]
>
> With all due respect, the performance penalty for non-aligned
> references is still very real, speaking as one who did a lot of work
> on non-faulting IBM System/370 processors back in the day. The same
> was true with VAX CPUs. They did not fault, but they paid a
> performance penalty.
>
> There is a difference in context from the days of the System/370 and
> the VAX: multi-level large caches.
>
> The caches close to the processing core are very fast. This obscures
> the loss of performance due to non-aligned references. Second, all
> loads/stores to/from a cache are, almost by definition, aligned.
>
> A program designed to produce alignment faults is also very likely to
> not abuse the memory system in a way to detect the mis-aligned data
> penalty. Faults, which are synchronous interrupts, have overhead
> orders of magnitude more than a double memory fetch, particularly
> when sequential elements are referenced (sequential elements may well
> be in the same cache line, even if not aligned on the proper
> boundary).
>
> If I had the spare time to play with it, I would write a program to
> randomly address a storage area beyond total cache size, so that
> every memory reference is a cache miss. Run aligned and unaligned
> data references and compare the result.
>
> It is easy for a benchmark to measure the incorrect phenomenon.

There are lies, damn lies and benchmarks.

:-)

I tested on a 2 MB array.

And I admit that the results can be due to many things.

But the numbers sure show a big difference!

Fortran/VMS/Itanium:

OFFSET 0 : 590 ms
OFFSET 1 : 197510 ms
OFFSET 2 : 197510 ms
OFFSET 3 : 197520 ms
OFFSET 4 : 197510 ms
OFFSET 5 : 197510 ms
OFFSET 6 : 197510 ms
OFFSET 7 : 197510 ms
OFFSET 8 : 590 ms
OFFSET 9 : 197510 ms
OFFSET 10 : 197520 ms
OFFSET 11 : 197520 ms
OFFSET 12 : 197520 ms
OFFSET 13 : 197520 ms
OFFSET 14 : 197520 ms
OFFSET 15 : 197520 ms
OFFSET 16 : 580 ms

GFortran/Windows/x86-64 (100x more reps):

OFFSET 0 : 7473 ms
OFFSET 1 : 7285 ms
OFFSET 2 : 7301 ms
OFFSET 3 : 7301 ms
OFFSET 4 : 7269 ms
OFFSET 5 : 7208 ms
OFFSET 6 : 7191 ms
OFFSET 7 : 7192 ms
OFFSET 8 : 7519 ms
OFFSET 9 : 7285 ms
OFFSET 10 : 7270 ms
OFFSET 11 : 7285 ms
OFFSET 12 : 7270 ms
OFFSET 13 : 7207 ms
OFFSET 14 : 7176 ms
OFFSET 15 : 7176 ms
OFFSET 16 : 7473 ms

Arne
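Arne's Fortran source is not shown, so the following is only a hypothetical C analogue of an offset sweep like the one whose results are posted above: it sums doubles that begin `offset` bytes into a buffer and times repeated passes. Reads go through `memcpy` so misaligned access is well defined in C; note that on a strict-alignment target a compiler may then emit safe byte-assembling code instead of faulting loads, so this does not reproduce the Itanium fault-and-fixup cost.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical kernel: sums n consecutive doubles starting `offset`
 * bytes into buf. buf must hold offset + n * sizeof(double) bytes. */
static double sum_at_offset(const unsigned char *buf, size_t offset, size_t n)
{
    double sum = 0.0, d;
    for (size_t i = 0; i < n; i++) {
        memcpy(&d, buf + offset + i * sizeof d, sizeof d);
        sum += d;
    }
    return sum;
}

/* Runs `reps` passes at one offset; returns elapsed CPU milliseconds and
 * stores the accumulated total in *checksum so the work is not elided. */
static double time_offset(const unsigned char *buf, size_t offset, size_t n,
                          int reps, double *checksum)
{
    double total = 0.0;
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        total += sum_at_offset(buf, offset, n);
    *checksum = total;
    return 1000.0 * (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

For a 2 MiB buffer as in Arne's test, `n` would be 262144 doubles with 16 bytes of slack, and a driver would call `time_offset` for offsets 0 through 16.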

Re: x86-64 data aligment / faulting

<19918859-b710-4f86-8f07-f26e9442ad5fn@googlegroups.com>

https://www.novabbs.com/computers/article-flat.php?id=21050&group=comp.os.vms#21050

X-Received: by 2002:ac8:5b82:0:b0:2cf:232d:b1f8 with SMTP id a2-20020ac85b82000000b002cf232db1f8mr12549969qta.58.1645937451507;
Sat, 26 Feb 2022 20:50:51 -0800 (PST)
X-Received: by 2002:a05:622a:94:b0:2de:88ff:438a with SMTP id
o20-20020a05622a009400b002de88ff438amr12226522qtw.60.1645937451352; Sat, 26
Feb 2022 20:50:51 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.os.vms
Date: Sat, 26 Feb 2022 20:50:51 -0800 (PST)
In-Reply-To: <621a9d77$0$701$14726298@news.sunsite.dk>
Injection-Info: google-groups.googlegroups.com; posting-host=100.2.113.217; posting-account=r2_qcwoAAACbIdit5Eka3ivGvrYZz7UQ
NNTP-Posting-Host: 100.2.113.217
References: <GdcSJ.1239823$X81f.965693@fx14.ams4> <s1eSJ.2188479$_xc2.746813@fx02.ams4>
<62197084$0$701$14726298@news.sunsite.dk> <ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>
<621a9d77$0$701$14726298@news.sunsite.dk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <19918859-b710-4f86-8f07-f26e9442ad5fn@googlegroups.com>
Subject: Re: x86-64 data aligment / faulting
From: gezel...@rlgsc.com (Bob Gezelter)
Injection-Date: Sun, 27 Feb 2022 04:50:51 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 108
 by: Bob Gezelter - Sun, 27 Feb 2022 04:50 UTC

On Saturday, February 26, 2022 at 4:36:58 PM UTC-5, Arne Vajhøj wrote:
> On 2/25/2022 11:37 PM, Bob Gezelter wrote:
> > [snip]
> There are lies, damn lies and benchmarks.
>
> :-)
>
> I tested on a 2 MB array.
>
> And I admit that the results can be due to many things.
>
> But the numbers sure show a big difference!
>
> [snip]
>
> Arne
Arne,

One needs to analyze beyond raw performance. In this case, I start with the cache organization and related structure. If you "break" the cache by forcing every reference to be a cache miss, you will essentially see the 2x performance loss.

If the cache is able to help at all, it will skew the numbers.

Re: x86-64 data aligment / faulting

<svft4t$2m8$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=21054&group=comp.os.vms#21054

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: club...@remove_me.eisner.decus.org-Earth.UFP (Simon Clubley)
Newsgroups: comp.os.vms
Subject: Re: x86-64 data aligment / faulting
Date: Sun, 27 Feb 2022 13:08:45 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <svft4t$2m8$1@dont-email.me>
References: <GdcSJ.1239823$X81f.965693@fx14.ams4> <s1eSJ.2188479$_xc2.746813@fx02.ams4> <62197084$0$701$14726298@news.sunsite.dk> <ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com> <621a9d77$0$701$14726298@news.sunsite.dk>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 27 Feb 2022 13:08:45 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="f848d0abf5312a98f41dad97a12c59bd";
logging-data="2760"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+XcwnLI7rX+m08wNKYRwSC81uyDd+sySs="
User-Agent: slrn/0.9.8.1 (VMS/Multinet)
Cancel-Lock: sha1:JEZt2/5t4XJt9J8sSGuu6bgdEoM=
 by: Simon Clubley - Sun, 27 Feb 2022 13:08 UTC

On 2022-02-26, Arne Vajhøj <arne@vajhoej.dk> wrote:
>
> GFortran/Windows/x86-64 (100x more reps):
>
> OFFSET 0 : 7473 ms
> OFFSET 1 : 7285 ms
> OFFSET 2 : 7301 ms
> OFFSET 3 : 7301 ms
> OFFSET 4 : 7269 ms
> OFFSET 5 : 7208 ms
> OFFSET 6 : 7191 ms
> OFFSET 7 : 7192 ms
> OFFSET 8 : 7519 ms
> OFFSET 9 : 7285 ms
> OFFSET 10 : 7270 ms
> OFFSET 11 : 7285 ms
> OFFSET 12 : 7270 ms
> OFFSET 13 : 7207 ms
> OFFSET 14 : 7176 ms
> OFFSET 15 : 7176 ms
> OFFSET 16 : 7473 ms
>

In situations like this Arne, it is _always_ a good idea to have
a look at the generated code just to make sure that the compiler
has not done anything unexpected.
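Following that advice is cheap. As a hypothetical example, a small C function can be compiled to assembly with the `-S` option of GCC or Clang (the GFortran driver accepts the same flag) and the listing checked to see what the compiler actually emitted for a possibly misaligned load.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function to inspect. Build with:
 *     gcc -O2 -S load.c      (writes load.s; clang -O2 -S works too)
 * then read load.s. On x86-64 this memcpy is typically compiled to a
 * single 8-byte load with no alignment handling; the listing, not this
 * comment, is what confirms it for a given compiler and target. */
double load_double_at(const unsigned char *base, size_t byte_offset)
{
    double d;
    memcpy(&d, base + byte_offset, sizeof d);
    return d;
}
```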

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.

Re: x86-64 data aligment / faulting

<j81h0pFk89bU1@mid.individual.net>

https://www.novabbs.com/computers/article-flat.php?id=21056&group=comp.os.vms#21056

Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: bill.gun...@gmail.com (Bill Gunshannon)
Newsgroups: comp.os.vms
Subject: Re: x86-64 data aligment / faulting
Date: Sun, 27 Feb 2022 09:43:37 -0500
Lines: 10
Message-ID: <j81h0pFk89bU1@mid.individual.net>
References: <GdcSJ.1239823$X81f.965693@fx14.ams4>
<s1eSJ.2188479$_xc2.746813@fx02.ams4>
<62197084$0$701$14726298@news.sunsite.dk>
<ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>
<621a9d77$0$701$14726298@news.sunsite.dk> <svft4t$2m8$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net oCV45wpFYA2UReW8idS/QwF3d20+n5bBvYlYcEt8VtUgd73GFw
Cancel-Lock: sha1:0DL6HWOtTVn36+u5dme6N7Ad0Xc=
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.5.0
Content-Language: en-US
In-Reply-To: <svft4t$2m8$1@dont-email.me>
 by: Bill Gunshannon - Sun, 27 Feb 2022 14:43 UTC

When the Alpha first came out, its detractors used to provide
example code that performed very badly to prove their own machines
were better. Wasn't this one of the methods used to write those
really badly performing programs? This, and forcing repeated
flushing of the instruction pipeline?

bill

Re: x86-64 data aligment / faulting

<svgih1$p25$1@panix2.panix.com>

https://www.novabbs.com/computers/article-flat.php?id=21062&group=comp.os.vms#21062

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!panix!.POSTED.panix2.panix.com!panix2.panix.com!not-for-mail
From: klu...@panix.com (Scott Dorsey)
Newsgroups: comp.os.vms
Subject: Re: x86-64 data aligment / faulting
Date: 27 Feb 2022 19:13:37 -0000
Organization: Former users of Netcom shell (1989-2000)
Lines: 21
Message-ID: <svgih1$p25$1@panix2.panix.com>
References: <GdcSJ.1239823$X81f.965693@fx14.ams4> <621a9d77$0$701$14726298@news.sunsite.dk> <svft4t$2m8$1@dont-email.me> <j81h0pFk89bU1@mid.individual.net>
Injection-Info: reader1.panix.com; posting-host="panix2.panix.com:166.84.1.2";
logging-data="26696"; mail-complaints-to="abuse@panix.com"
 by: Scott Dorsey - Sun, 27 Feb 2022 19:13 UTC

Bill Gunshannon <bill.gunshannon@gmail.com> wrote:
>When the Alpha first came out its attackers used to provide
>example code that performed very badly to prove they were
>better. Wasn't this one of the methods used to write those
>really bad performing programs? This and forcing repeated
>dumping of the pipelined instructions?

Yes, although, as has been pointed out, if you have a large enough array and do
random accesses so that nearly every one needs a pull from main memory,
you'll find the x86_64 also performs very badly.

But having a whole lot of cache helps a whole lot in speeding up clean,
well-written code with unaligned accesses, as noted.

Unfortunately the code that people use to demonstrate defects in system
architectures is frequently little different than the code written by
inexperienced programmers.
--scott

--
"C'est un Nagra. C'est suisse, et tres, tres precis."

Re: x86-64 data aligment / faulting

<f36424d5-82da-4455-8d3c-491f60241ff7n@googlegroups.com>

https://www.novabbs.com/computers/article-flat.php?id=21064&group=comp.os.vms#21064

X-Received: by 2002:a37:785:0:b0:54e:39dc:2c28 with SMTP id 127-20020a370785000000b0054e39dc2c28mr9740975qkh.109.1645993002204;
Sun, 27 Feb 2022 12:16:42 -0800 (PST)
X-Received: by 2002:a05:620a:f0e:b0:507:d59b:a928 with SMTP id
v14-20020a05620a0f0e00b00507d59ba928mr9477390qkl.617.1645993002081; Sun, 27
Feb 2022 12:16:42 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.os.vms
Date: Sun, 27 Feb 2022 12:16:41 -0800 (PST)
In-Reply-To: <j81h0pFk89bU1@mid.individual.net>
Injection-Info: google-groups.googlegroups.com; posting-host=2601:602:9700:4689:9d5f:f86:4938:adb0;
posting-account=gLDX1AkAAAA26M5HM-O3sVMAXdxK9FPA
NNTP-Posting-Host: 2601:602:9700:4689:9d5f:f86:4938:adb0
References: <GdcSJ.1239823$X81f.965693@fx14.ams4> <s1eSJ.2188479$_xc2.746813@fx02.ams4>
<62197084$0$701$14726298@news.sunsite.dk> <ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>
<621a9d77$0$701$14726298@news.sunsite.dk> <svft4t$2m8$1@dont-email.me> <j81h0pFk89bU1@mid.individual.net>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f36424d5-82da-4455-8d3c-491f60241ff7n@googlegroups.com>
Subject: Re: x86-64 data aligment / faulting
From: gah...@u.washington.edu (gah4)
Injection-Date: Sun, 27 Feb 2022 20:16:42 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 24
 by: gah4 - Sun, 27 Feb 2022 20:16 UTC

On Sunday, February 27, 2022 at 6:43:41 AM UTC-8, Bill Gunshannon wrote:
> When the Alpha first came out its attackers used to provide
> example code that performed very badly to prove they were
> better. Wasn't this one of the methods used to write those
> really bad performing programs? This and forcing repeated
> dumping of the pipelined instructions?

As far as I remember, Alpha only has 32-bit and 64-bit aligned
load/store instructions. If you want something else, you do it with
other instructions: shifts and such.

If the compiler does all that for you, it will be pretty slow.
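The shift-based approach described above can be sketched in portable C: read the two aligned 64-bit words that cover the wanted bytes and splice them with shifts. This is only in the spirit of Alpha's `LDQ_U`/`EXTQL`/`EXTQH` sequence, not a transcription of compiler output; it assumes little-endian byte order (as on Alpha and x86-64), and the byte-assembling helper is a hypothetical stand-in for one aligned load instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one aligned 64-bit load: assembles the
 * 8 bytes at p as a little-endian value. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Reads the 64-bit little-endian value at byte offset `off` of `mem`
 * using only "aligned" 8-byte loads plus shifts. When off % 8 != 0 the
 * buffer must extend to the end of the next 8-byte word. */
static uint64_t load_unaligned_via_aligned(const uint8_t *mem, size_t off)
{
    size_t base = off & ~(size_t)7;            /* round down to 8 bytes */
    unsigned shift = (unsigned)(off & 7) * 8;  /* bit offset in the word */
    uint64_t lo = load_le64(mem + base);
    if (shift == 0)
        return lo;                             /* aligned: one load */
    uint64_t hi = load_le64(mem + base + 8);
    return (lo >> shift) | (hi << (64 - shift));
}
```

For an aligned offset this costs one load; otherwise it takes two loads plus two shifts and an OR, which is why having the compiler do this for every access is slow.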

In the Fortran 66 days, it was believed that Fortran COMMON didn't
allow for padding. Some systems had complicated mechanisms
to avoid problems. The ones I remember trapped the alignment
exception, copied the data, reran the instruction, and maybe
copied the data back.

But then came processors with imprecise interrupts, which
didn't allow finding the offending instruction.

But now COMMON and C's struct allow for padding, so it usually
isn't a problem.
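How C's struct padding works can be seen with `offsetof` and `sizeof`. In this hypothetical example, the compiler pads after a leading `char` so that the `double` that follows is suitably aligned, and ordering members from largest to smallest alignment reduces the padding. Exact sizes are ABI-dependent.

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical layout in declared order: padding follows `tag` so that
 * `value` is aligned, and trailing padding rounds the size up to a
 * multiple of alignof(double). */
struct sample_loose {
    char   tag;
    double value;
    char   flag;
};

/* Same members, largest alignment first: less padding is needed. */
struct sample_tight {
    double value;
    char   tag;
    char   flag;
};

/* Padding bytes the compiler inserted into struct sample_loose. */
static size_t loose_padding(void)
{
    return sizeof(struct sample_loose) - (2 * sizeof(char) + sizeof(double));
}
```

On the common x86-64 ABIs (System V and Windows x64), `struct sample_loose` is typically 24 bytes and `struct sample_tight` 16.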

Re: x86-64 data aligment / faulting

<svgol9$ng2$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=21066&group=comp.os.vms#21066

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: jan-erik...@telia.com (Jan-Erik Söderholm)
Newsgroups: comp.os.vms
Subject: Re: x86-64 data aligment / faulting
Date: Sun, 27 Feb 2022 21:58:17 +0100
Organization: A noiseless patient Spider
Lines: 14
Message-ID: <svgol9$ng2$1@dont-email.me>
References: <GdcSJ.1239823$X81f.965693@fx14.ams4>
<s1eSJ.2188479$_xc2.746813@fx02.ams4>
<62197084$0$701$14726298@news.sunsite.dk>
<ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>
<621a9d77$0$701$14726298@news.sunsite.dk> <svft4t$2m8$1@dont-email.me>
<j81h0pFk89bU1@mid.individual.net>
<f36424d5-82da-4455-8d3c-491f60241ff7n@googlegroups.com>
In-Reply-To: <f36424d5-82da-4455-8d3c-491f60241ff7n@googlegroups.com>
 by: Jan-Erik Söderholm - Sun, 27 Feb 2022 20:58 UTC

On 2022-02-27 at 21:16, gah4 wrote:
> On Sunday, February 27, 2022 at 6:43:41 AM UTC-8, Bill Gunshannon wrote:
>> When the Alpha first came out its attackers used to provide
>> example code that performed very badly to prove they were
>> better. Wasn't this one of the methods used to write those
>> really bad performing programs? This and forcing repeated
>> dumping of the pipelined instructions?
>
> As well as I remember it, Alpha only has 32 bit and 64 bit aligned
> load/store instructions.

https://en.wikipedia.org/wiki/DEC_Alpha#Byte-Word_Extensions_(BWX)

Re: x86-64 data aligment / faulting

https://www.novabbs.com/computers/article-flat.php?id=21073&group=comp.os.vms#21073

Date: Sun, 27 Feb 2022 20:56:15 -0500
Subject: Re: x86-64 data aligment / faulting
Newsgroups: comp.os.vms
References: <GdcSJ.1239823$X81f.965693@fx14.ams4>
<s1eSJ.2188479$_xc2.746813@fx02.ams4>
<62197084$0$701$14726298@news.sunsite.dk>
<ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>
<621a9d77$0$701$14726298@news.sunsite.dk>
<19918859-b710-4f86-8f07-f26e9442ad5fn@googlegroups.com>
From: arn...@vajhoej.dk (Arne Vajhøj)
In-Reply-To: <19918859-b710-4f86-8f07-f26e9442ad5fn@googlegroups.com>
Message-ID: <621c2bc3$0$693$14726298@news.sunsite.dk>
Organization: SunSITE.dk - Supporting Open source
 by: Arne Vajhøj - Mon, 28 Feb 2022 01:56 UTC

On 2/26/2022 11:50 PM, Bob Gezelter wrote:
> On Saturday, February 26, 2022 at 4:36:58 PM UTC-5, Arne Vajhøj wrote:
>> On 2/25/2022 11:37 PM, Bob Gezelter wrote:
>>> It is easy for a benchmark to measure the incorrect phenomenon.
>> There are lies, damn lies and benchmarks.
>>
>> :-)
>>
>> I tested on a 2 MB array.
>>
>> And I admit that the results can be due to many things.
>>
>> But the numbers sure show a big difference!
>>
>> Fortran/VMS/Itanium:
>>
>> OFFSET 0 : 590 ms
>> OFFSET 1 : 197510 ms
>> OFFSET 2 : 197510 ms
>> OFFSET 3 : 197520 ms
>> OFFSET 4 : 197510 ms
>> OFFSET 5 : 197510 ms
>> OFFSET 6 : 197510 ms
>> OFFSET 7 : 197510 ms
>> OFFSET 8 : 590 ms
>> OFFSET 9 : 197510 ms
>> OFFSET 10 : 197520 ms
>> OFFSET 11 : 197520 ms
>> OFFSET 12 : 197520 ms
>> OFFSET 13 : 197520 ms
>> OFFSET 14 : 197520 ms
>> OFFSET 15 : 197520 ms
>> OFFSET 16 : 580 ms
>>
>> GFortran/Windows/x86-64 (100x more reps):
>>
>> OFFSET 0 : 7473 ms
>> OFFSET 1 : 7285 ms
>> OFFSET 2 : 7301 ms
>> OFFSET 3 : 7301 ms
>> OFFSET 4 : 7269 ms
>> OFFSET 5 : 7208 ms
>> OFFSET 6 : 7191 ms
>> OFFSET 7 : 7192 ms
>> OFFSET 8 : 7519 ms
>> OFFSET 9 : 7285 ms
>> OFFSET 10 : 7270 ms
>> OFFSET 11 : 7285 ms
>> OFFSET 12 : 7270 ms
>> OFFSET 13 : 7207 ms
>> OFFSET 14 : 7176 ms
>> OFFSET 15 : 7176 ms
>> OFFSET 16 : 7473 ms

> One needs to analyze beyond raw performance. In this case, I start
> with the cache organization and related structure. If you "break" the
> cache, by forcing every reference to be a cache miss, one will
> essentially see the 2x performance loss.
>
> If the cache is able to gain anything, it will skew the numbers.

I modified the program to test with different data sizes, verified
that the code was indeed working on unaligned addresses, and
tried both sequential and random access to the array.

I simply can't get a big difference between aligned and unaligned access.

Data at the bottom.

I am not saying that there are no cases where unaligned data access
makes a significant difference.

But I have not been able to come up with a case.

Arne

DATA SIZE = 2500 ( 20000 BYTES), REP = 100000, SEQUENTIAL ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 717 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 718 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 718 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 717 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 718 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 733 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 718 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 717 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 718 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 717 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 718 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 718 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 717 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 733 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 718 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 733 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 718 ms
DATA SIZE = 25000 ( 200000 BYTES), REP = 10000, SEQUENTIAL ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 717 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 702 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 702 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 718 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 702 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 702 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 718 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 702 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 717 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 702 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 718 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 686 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 718 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 702 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 702 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 702 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 718 ms
DATA SIZE = 250000 ( 2000000 BYTES), REP = 1000, SEQUENTIAL ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 718 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 702 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 717 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 718 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 702 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 717 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 718 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 718 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 717 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 702 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 718 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 718 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 702 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 717 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 718 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 717 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 718 ms
DATA SIZE = 2500000 (20000000 BYTES), REP = 100, SEQUENTIAL ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 718 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 702 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 717 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 718 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 717 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 718 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 718 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 717 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 718 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 717 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 718 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 702 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 718 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 733 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 717 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 718 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 702 ms
DATA SIZE = 4096 ( 32768 BYTES), REP = 16666, RANDOM ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 733 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 733 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 734 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 733 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 733 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 733 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 733 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 734 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 717 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 733 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 734 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 748 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 734 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 733 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 733 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 733 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 718 ms
DATA SIZE = 32768 ( 262144 BYTES), REP = 1666, RANDOM ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 561 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 593 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 609 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 577 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 593 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 577 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 593 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 593 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 577 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 577 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 608 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 593 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 593 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 577 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 593 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 577 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 577 ms
DATA SIZE = 262144 ( 2097152 BYTES), REP = 166, RANDOM ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 609 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 624 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 639 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 624 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 640 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 624 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 640 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 624 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 608 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 624 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 624 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 639 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 624 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 624 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 640 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 624 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 608 ms
DATA SIZE = 2097152 (16777216 BYTES), REP = 16, RANDOM ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 733 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 749 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 749 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 749 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 749 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 748 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 749 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 749 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 733 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 749 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 748 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 749 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 749 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 749 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 749 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 764 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 718 ms
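For readers who want to repeat the measurement outside Fortran, here is a hedged C analogue of the sequential-access part of Arne's test. All function names are hypothetical. The misaligned stores go through `memcpy`, and timing uses `clock()`, so the results are CPU time rather than wall-clock time. As Simon suggests in the thread, check the generated code before trusting the numbers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Store the values 1..n as doubles starting at byte address base, which
   need not be 8-byte aligned. memcpy is the portable C way to express an
   unaligned access; on x86-64, optimizing compilers typically turn each
   call into a single move instruction. */
static void store_doubles(unsigned char *base, size_t n) {
    for (size_t i = 0; i < n; i++) {
        double v = (double)(i + 1);
        memcpy(base + i * sizeof v, &v, sizeof v);
    }
}

/* Read back element i, again without assuming alignment. */
static double load_double(const unsigned char *base, size_t i) {
    double v;
    memcpy(&v, base + i * sizeof v, sizeof v);
    return v;
}

/* The quantity reported above as ADDRESS MOD 8. */
static unsigned misalignment(const void *p) {
    return (unsigned)((uintptr_t)p % 8);
}

/* CPU time in ms for rep passes of store_doubles at byte offset off
   into buf; buf must hold at least off + 8*n bytes. */
static long time_offset(unsigned char *buf, size_t off, size_t n, int rep) {
    clock_t t1 = clock();
    for (int r = 0; r < rep; r++)
        store_doubles(buf + off, n);
    clock_t t2 = clock();
    return (long)((t2 - t1) * 1000 / CLOCKS_PER_SEC);
}
```

A driver would call `time_offset(buf, off, 250000, 1000)` for `off` from 0 to 16 on a buffer of at least 16 + 8*250000 bytes. It would print `misalignment(buf + off)` alongside each time. That corresponds roughly to the 2,000,000-byte sequential rows above.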

Re: x86-64 data aligment / faulting

https://www.novabbs.com/computers/article-flat.php?id=21074&group=comp.os.vms#21074

Date: Sun, 27 Feb 2022 21:27:13 -0500
Subject: Re: x86-64 data aligment / faulting
Newsgroups: comp.os.vms
References: <GdcSJ.1239823$X81f.965693@fx14.ams4>
<s1eSJ.2188479$_xc2.746813@fx02.ams4>
<62197084$0$701$14726298@news.sunsite.dk>
<ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>
<621a9d77$0$701$14726298@news.sunsite.dk>
<19918859-b710-4f86-8f07-f26e9442ad5fn@googlegroups.com>
<621c2bc3$0$693$14726298@news.sunsite.dk>
From: arn...@vajhoej.dk (Arne Vajhøj)
In-Reply-To: <621c2bc3$0$693$14726298@news.sunsite.dk>
Message-ID: <621c3300$0$692$14726298@news.sunsite.dk>
Organization: SunSITE.dk - Supporting Open source
 by: Arne Vajhøj - Mon, 28 Feb 2022 02:27 UTC

On 2/27/2022 8:56 PM, Arne Vajhøj wrote:
> I simply can't get a big difference between aligned and unaligned access.
>
> Data at the bottom.
>
> I am not saying that there are no cases where unaligned data access
> makes a significant difference.
>
> But I have not been able to come up with a case.

Code below in case anybody wonders WTF I am doing.

Arne

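      PROGRAM TEST_ALIGN
C     Overlay a BYTE array on the REAL*8 array X, so that DUMMY(I) sits
C     I-1 bytes past the start of X. Passing DUMMY(I) to TEST hands it
C     a REAL*8 array that starts at that (possibly misaligned) address.
      INTEGER*4 N,REP,OFFSET
      PARAMETER (N=2500000,REP=100,OFFSET=17)
      BYTE DUMMY(OFFSET)
      REAL*8 X(N+(OFFSET-1)/8+1)
      EQUIVALENCE (DUMMY,X)
      INTEGER*4 I,J,SCALE(4)
      DATA SCALE/1000,100,10,1/
C     Untimed warm-up: write the whole array REP times before testing
      DO 200 J=1,REP
        DO 100 I=1,N+1
          X(I)=I
  100   CONTINUE
  200 CONTINUE
C     Sequential access: 4 array sizes, byte offsets 0..16
      DO 400 J=1,4
        WRITE(6,700) N/SCALE(J),8*N/SCALE(J),REP*SCALE(J),
     +    'SEQUENTIAL ACCESS'
        DO 300 I=1,OFFSET
          CALL TEST(I,DUMMY(I),N/SCALE(J),REP*SCALE(J),.FALSE.)
  300   CONTINUE
  400 CONTINUE
C     Random access: power-of-two sizes 2**12 .. 2**21, offsets 0..16
      DO 600 J=1,4
        WRITE(6,700) 2**(9+3*J),8*2**(9+3*J),REP*SCALE(J)/6,
     +    'RANDOM ACCESS'
        DO 500 I=1,OFFSET
          CALL TEST(I,DUMMY(I),2**(9+3*J),REP*SCALE(J)/6,.TRUE.)
  500   CONTINUE
  600 CONTINUE
  700 FORMAT(1X,'DATA SIZE = ',I7,' (',I8,' BYTES), REP = ',I6,', ',A)
      END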

and:

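      SUBROUTINE TEST(IX,X,N,REP,RANACC)
C     Time REP passes of N REAL*8 stores into X. The caller passes
C     DUMMY(IX), so X starts IX-1 bytes into the original array.
C     (A BYTE actual for a REAL*8 dummy is a legacy trick; recent
C     gfortran may need -fallow-argument-mismatch to accept it.)
      INTEGER*4 IX,N,REP
      REAL*8 X(*)
      LOGICAL*4 RANACC
      INTEGER*4 I,J,T1,T2,RATE,RANIX
      CALL SYSTEM_CLOCK(T1,RATE)
      DO 200 J=1,REP
        IF(RANACC) RANIX=0
        DO 100 I=1,N
          IF(RANACC) THEN
C           Pseudo-random index: RANIX = (401*RANIX+1) mod N
            RANIX=MOD(401*RANIX+1,N)
            X(RANIX+1)=I
          ELSE
            X(I)=I
          ENDIF
  100   CONTINUE
  200 CONTINUE
      CALL SYSTEM_CLOCK(T2)
C     Convert clock ticks to ms; the tick rate is not always 1000/s
      WRITE(6,300) IX-1,MOD(LOC(X),8),NINT(1000.0D0*(T2-T1)/RATE)
      RETURN
  300 FORMAT(1X,'OFFSET ',I2,' (ADDRESS MOD 8 = ',I1,'): ',I6,' ms')
      END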
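The RANDOM ACCESS case relies on the index sequence RANIX = MOD(401*RANIX+1, N) touching every element of the array. By the Hull-Dobell theorem, that holds when N is a power of two: the increment 1 is odd and 401 - 1 = 400 is a multiple of 4. This fits the power-of-two sizes 2**12 through 2**21 used in the random-access runs. The small C check below confirms it by brute force; its function name is hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Brute-force check that RANIX = MOD(401*RANIX+1, N), starting from
   RANIX = 0 as in SUBROUTINE TEST, visits each of 0..n-1 exactly once
   in n steps. Assumes 401*(n-1) fits in a long, as the Fortran code
   assumes for INTEGER*4. */
static bool lcg_is_permutation(long n) {
    unsigned char *seen = calloc((size_t)n, 1);
    if (seen == NULL)
        return false;
    long ix = 0;
    bool ok = true;
    for (long i = 0; i < n && ok; i++) {
        ix = (401 * ix + 1) % n;
        if (seen[ix])
            ok = false;   /* an index repeated before all were visited */
        seen[ix] = 1;
    }
    free(seen);
    return ok;
}
```

For a size such as 3, where 400 is not a multiple of 3, the sequence repeats early and the check returns false.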

Re: x86-64 data aligment / faulting

https://www.novabbs.com/computers/article-flat.php?id=21075&group=comp.os.vms#21075

Newsgroups: comp.os.vms
Date: Mon, 28 Feb 2022 01:47:55 -0800 (PST)
In-Reply-To: <GdcSJ.1239823$X81f.965693@fx14.ams4>
References: <GdcSJ.1239823$X81f.965693@fx14.ams4>
Message-ID: <6842ad85-244f-42c4-8e13-b44fb0004d7cn@googlegroups.com>
Subject: Re: x86-64 data aligment / faulting
From: gah...@u.washington.edu (gah4)
 by: gah4 - Mon, 28 Feb 2022 09:47 UTC

On Friday, February 25, 2022 at 1:53:45 PM UTC-8, Mark Daniel wrote:
> Alpha and Itanium had data alignment requirements with penalties for
> faulting. Does x86-64? Is sys$start_align_fault_report() et al. still
> relevant?

Ordinary instructions for x86 and x86-64 allow unaligned access.

Some interlocked instructions, such as CMPXCHG16B, require aligned
memory operands, in this case on a 16-byte boundary.

Interlocked operations allow for read-modify-write without allowing
any other task or thread, on any processor, in between.
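As a hedged illustration in portable C11 (not VMS-specific), the sketch below performs an interlocked read-modify-write with `<stdatomic.h>`. It also uses `_Alignas(16)` to give a hypothetical `pair16` type the 16-byte alignment that a CMPXCHG16B operand needs. The code does not issue CMPXCHG16B itself; 16-byte atomics in GCC or Clang may need extra options such as `-mcx16` or libatomic. On x86-64, the 8-byte compare-exchange is typically compiled to LOCK CMPXCHG.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 16-byte operand for a double-width CAS such as
   CMPXCHG16B, which faults unless the operand is 16-byte aligned.
   _Alignas on the first member raises the whole struct to 16. */
typedef struct {
    _Alignas(16) uint64_t lo;
    uint64_t hi;
} pair16;

/* Interlocked add built from compare-and-swap: the update succeeds only
   if *ctr still holds the value that was read, so no other thread's
   write can be lost in between. */
static uint64_t cas_add(_Atomic uint64_t *ctr, uint64_t delta) {
    uint64_t old = atomic_load(ctr);
    /* On failure, old is refreshed with the current value; retry. */
    while (!atomic_compare_exchange_weak(ctr, &old, old + delta))
        ;
    return old + delta;
}
```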

Re: x86-64 data aligment / faulting

https://www.novabbs.com/computers/article-flat.php?id=21082&group=comp.os.vms#21082

Newsgroups: comp.os.vms
Date: Mon, 28 Feb 2022 07:43:18 -0800 (PST)
In-Reply-To: <f36424d5-82da-4455-8d3c-491f60241ff7n@googlegroups.com>
References: <GdcSJ.1239823$X81f.965693@fx14.ams4> <s1eSJ.2188479$_xc2.746813@fx02.ams4>
<62197084$0$701$14726298@news.sunsite.dk> <ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>
<621a9d77$0$701$14726298@news.sunsite.dk> <svft4t$2m8$1@dont-email.me>
<j81h0pFk89bU1@mid.individual.net> <f36424d5-82da-4455-8d3c-491f60241ff7n@googlegroups.com>
Message-ID: <d85a5f3a-0f95-4cc1-bc07-935b39f2d488n@googlegroups.com>
Subject: Re: x86-64 data aligment / faulting
From: heinvand...@gmail.com (Hein RMS van den Heuvel)
 by: Hein RMS van den Heuvel - Mon, 28 Feb 2022 15:43 UTC

On Sunday, February 27, 2022 at 3:16:43 PM UTC-5, gah4 wrote:
> As well as I remember it, Alpha only has 32 bit and 64 bit aligned
> load/store instructions. If you want something else, you do it with
> some other instructions. Shifts and such.
>
> If the compiler does all that for you, it will be pretty slow.

Pretty slow is subjective, of course, but really it wasn't too bad, IMHO.

As long as the compiler knew exactly what to expect (unaligned data), it fetched a larger aligned chunk, masked and shifted as needed, and moved on.

That was just a few clock cycles extra, MUCH less than the dozens of cycles the actual memory access would take.
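Here is a rough C model of the technique Hein describes. It assumes memory is viewed as an array of aligned 64-bit quadwords with little-endian byte numbering, as on Alpha. The hypothetical `load_unaligned_q` fetches the two aligned quadwords that contain the eight wanted bytes and combines them with shifts. This is loosely what an Alpha compiler does with LDQ_U plus EXTQL/EXTQH. It is a sketch of the idea, not the code any particular compiler emits.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte address a lives in mem[a >> 3] at bit position 8*(a & 7).
   Load the 8 bytes starting at a, which may be unaligned, using only
   aligned quadword loads, shifts and an OR. */
static uint64_t load_unaligned_q(const uint64_t *mem, size_t a) {
    uint64_t lo = mem[a >> 3];        /* quadword holding the first byte */
    uint64_t hi = mem[(a + 7) >> 3];  /* quadword holding the last byte  */
    unsigned sh = (unsigned)(a & 7) * 8;
    if (sh == 0)
        return lo;                    /* aligned: lo and hi are the same word */
    return (lo >> sh) | (hi << (64 - sh));
}
```

The extra work is one more load plus a few shift and OR operations. That is small next to a memory access that misses the cache, and tiny next to an alignment trap.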

If it had to check a suspect address, it would take a bit longer.

Now, if the compiler did not know, or rather was misinformed (about the address of a dynamic data structure), then an alignment trap would fire and thousands of instructions would execute, some of them with system-wide serialization. Ouch.

fwiw,
Hein.

Re: x86-64 data aligment / faulting

https://www.novabbs.com/computers/article-flat.php?id=21083&group=comp.os.vms#21083

From: club...@remove_me.eisner.decus.org-Earth.UFP (Simon Clubley)
Newsgroups: comp.os.vms
Subject: Re: x86-64 data aligment / faulting
Date: Mon, 28 Feb 2022 18:29:42 -0000 (UTC)
Organization: A noiseless patient Spider
Message-ID: <svj4al$e7p$1@dont-email.me>
References: <GdcSJ.1239823$X81f.965693@fx14.ams4> <s1eSJ.2188479$_xc2.746813@fx02.ams4> <62197084$0$701$14726298@news.sunsite.dk> <ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com> <621a9d77$0$701$14726298@news.sunsite.dk> <19918859-b710-4f86-8f07-f26e9442ad5fn@googlegroups.com> <621c2bc3$0$693$14726298@news.sunsite.dk>
 by: Simon Clubley - Mon, 28 Feb 2022 18:29 UTC

On 2022-02-27, Arne Vajhøj <arne@vajhoej.dk> wrote:
>
> I modified the program to test with different data sizes, verified
> that the code was indeed working on unaligned addresses, and
> tried both sequential and random access to the array.
>
> I simply can't get a big difference between aligned and unaligned access.
>

Have you looked at the generated code to verify that unaligned access
is really occurring?

Another possibility, depending on how smart the compiler is, is that
it could always do aligned access lookups and then only extract the
data it needs out of each lookup.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.

Re: x86-64 data aligment / faulting

https://www.novabbs.com/computers/article-flat.php?id=21084&group=comp.os.vms#21084

From: club...@remove_me.eisner.decus.org-Earth.UFP (Simon Clubley)
Newsgroups: comp.os.vms
Subject: Re: x86-64 data aligment / faulting
Date: Mon, 28 Feb 2022 18:55:06 -0000 (UTC)
Organization: A noiseless patient Spider
Message-ID: <svj5qa$t20$1@dont-email.me>
References: <GdcSJ.1239823$X81f.965693@fx14.ams4> <s1eSJ.2188479$_xc2.746813@fx02.ams4> <62197084$0$701$14726298@news.sunsite.dk> <ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com> <621a9d77$0$701$14726298@news.sunsite.dk> <19918859-b710-4f86-8f07-f26e9442ad5fn@googlegroups.com> <621c2bc3$0$693$14726298@news.sunsite.dk> <svj4al$e7p$1@dont-email.me>
 by: Simon Clubley - Mon, 28 Feb 2022 18:55 UTC

On 2022-02-28, Simon Clubley <clubley@remove_me.eisner.decus.org-Earth.UFP> wrote:
> On 2022-02-27, Arne Vajhøj <arne@vajhoej.dk> wrote:
>>
>> I modified the program to test with different data sizes, verified
>> that the code was indeed working on unaligned addresses, and
>> tried both sequential and random access to the array.
>>
>> I simply can't get a big difference between aligned and unaligned access.
>>
>
> Have you looked at the generated code to verify that unaligned access
> is really occurring?
>
> Another possibility, depending on how smart the compiler is, is that
> it could always do aligned access lookups and then only extract the
> data it needs out of each lookup.
>

Also, the optimisation settings and level might make a difference.
Try varying them to see if the behaviour changes, especially when
you switch between optimising for time versus space (and yes, include
-O0 in the list of things you try. :-)).

It's for reasons like this that I suggested, at the start of this
thread, looking at the generated code, just in case the compiler is
doing something you are not expecting.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.

Re: x86-64 data aligment / faulting

https://www.novabbs.com/computers/article-flat.php?id=21088&group=comp.os.vms#21088

Newsgroups: comp.os.vms
Date: Mon, 28 Feb 2022 15:39:55 -0800 (PST)
In-Reply-To: <d85a5f3a-0f95-4cc1-bc07-935b39f2d488n@googlegroups.com>
References: <GdcSJ.1239823$X81f.965693@fx14.ams4> <s1eSJ.2188479$_xc2.746813@fx02.ams4>
<62197084$0$701$14726298@news.sunsite.dk> <ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>
<621a9d77$0$701$14726298@news.sunsite.dk> <svft4t$2m8$1@dont-email.me>
<j81h0pFk89bU1@mid.individual.net> <f36424d5-82da-4455-8d3c-491f60241ff7n@googlegroups.com>
<d85a5f3a-0f95-4cc1-bc07-935b39f2d488n@googlegroups.com>
Message-ID: <674b65f0-c83f-41b2-8af4-0cb2b9935569n@googlegroups.com>
Subject: Re: x86-64 data aligment / faulting
From: gah...@u.washington.edu (gah4)
 by: gah4 - Mon, 28 Feb 2022 23:39 UTC

On Monday, February 28, 2022 at 7:43:21 AM UTC-8, Hein RMS van den Heuvel wrote:
> On Sunday, February 27, 2022 at 3:16:43 PM UTC-5, gah4 wrote:
> :
> > As well as I remember it, Alpha only has 32 bit and 64 bit aligned
> > load/store instructions. If you want something else, you do it with
> > some other instructions. Shifts and such.
> >
> > If the compiler does all that for you, it will be pretty slow.

> Pretty slow is subjective, of course, but really it wasn't too bad, IMHO.
> As long as the compiler knew exactly what to expect (unaligned data), it fetched a larger aligned chunk, masked and shifted as needed, and moved on.
(snip)

> Now, if the compiler did not know, or rather was misinformed (about the address of a dynamic data structure), then an alignment trap would fire and thousands of instructions would execute, some of them with system-wide serialization. Ouch.

Fortunately, Fortran now allows for padding, such that COMMON variables can be aligned.

In the olden days, I believe before Fortran 90, padding wasn't allowed.
People then learned to order variables appropriately, putting double precision
variables, which usually have the strictest alignment, first.
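The same habit carries over to C structs. In this sketch the struct names are hypothetical. Both structs hold the same three members, but putting the double first leaves less room for padding.

```c
#include <stddef.h>
#include <stdint.h>

/* char first: the compiler typically pads after tag so value is aligned */
struct small_first {
    char    tag;
    double  value;
    int32_t count;
};

/* strictest alignment first, as with double precision in COMMON */
struct big_first {
    double  value;
    int32_t count;
    char    tag;
};
```

On typical x86-64 ABIs, the first layout occupies 24 bytes and the second 16.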

Since you could take a COMMON array (whose alignment the compiler should know)
and pass it to a subroutine (where it isn't known), the misinformed-compiler case could still happen.
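When a callee cannot know how its argument is aligned, one defensive option is to check once at run time. Hein notes that such a check costs a little extra. The C sketch below uses hypothetical helpers `is_aligned` and `sum_doubles`. Aligned data is read directly, and anything else goes through `memcpy`, which is well defined for any address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if p is a multiple of align (align must be a power of two). */
static int is_aligned(const void *p, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Sum n doubles stored at p, whose alignment the callee does not know. */
static double sum_doubles(const void *p, size_t n) {
    double s = 0.0;
    if (is_aligned(p, _Alignof(double))) {
        const double *d = p;                /* fast path: direct loads */
        for (size_t i = 0; i < n; i++)
            s += d[i];
    } else {
        const unsigned char *b = p;         /* slow path: byte copies */
        for (size_t i = 0; i < n; i++) {
            double v;
            memcpy(&v, b + i * sizeof v, sizeof v);
            s += v;
        }
    }
    return s;
}
```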

I first learned about this on the IBM 360/91, which could not fix them up
at run time. Because of the pipelining, and especially out-of-order retirement, it could not
report the address of the instruction that caused the alignment exception. Other S/360 models could
fix it up, though slowly, while processing the exception.

One problem for some years was that the 8087, with a 16-bit bus,
needed only 16-bit alignment. Then the 80486 could use 4-byte alignment,
and compilers (and linkers) provided that. When the Pentium turned out to be much slower
with misaligned 8-byte operands (that is, x87 double precision), it took some
years for compilers to adjust.

But since many RISC processors require alignment, people should be used
to it by now.

Re: x86-64 data aligment / faulting

https://www.novabbs.com/computers/article-flat.php?id=21090&group=comp.os.vms#21090

Newsgroups: comp.os.vms
Date: Mon, 28 Feb 2022 15:54:56 -0800 (PST)
In-Reply-To: <svgol9$ng2$1@dont-email.me>
References: <GdcSJ.1239823$X81f.965693@fx14.ams4> <s1eSJ.2188479$_xc2.746813@fx02.ams4>
<62197084$0$701$14726298@news.sunsite.dk> <ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>
<621a9d77$0$701$14726298@news.sunsite.dk> <svft4t$2m8$1@dont-email.me>
<j81h0pFk89bU1@mid.individual.net> <f36424d5-82da-4455-8d3c-491f60241ff7n@googlegroups.com>
<svgol9$ng2$1@dont-email.me>
Message-ID: <9a3f3e60-a93b-4e56-a763-2e160226309cn@googlegroups.com>
Subject: Re: x86-64 data aligment / faulting
From: gah...@u.washington.edu (gah4)
 by: gah4 - Mon, 28 Feb 2022 23:54 UTC

On Sunday, February 27, 2022 at 12:58:20 PM UTC-8, Jan-Erik Söderholm wrote:

(snip, I wrote)

> > As well as I remember it, Alpha only has 32 bit and 64 bit aligned
> > load/store instructions.

> https://en.wikipedia.org/wiki/DEC_Alpha#Byte-Word_Extensions_(BWX)

But does it really do 8 and 16 bit load/store, or just move the previous logic
into the instructions?

Well, as noted by others, in most cases the only question is access to the cache.
Cache will load/store some larger unit. But x86 needs the ability to do
byte and word I/O operations, without the read/modify/write cycle it can
use on memory. (Though with some complications on multi processor
systems.)

In the case of bytes and (16 bit) words, and before BWX, the load instruction
ignores the low bits, and loads the whole 32 or 64 bit unit. (As well as I
know, Alpha still calls 16 bits 'word'). Then there are instructions to move
around bytes and 16 bit words between registers, that ignore the high bits.
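Here is a rough C model of the pre-BWX approach gah4 describes. It again treats memory as aligned little-endian quadwords; that layout and the helper names are assumptions. A byte load is an aligned quadword load plus a shift. A byte store has to read the containing quadword, clear the byte, insert the new value and write the quadword back, loosely mirroring Alpha's EXTBL, MSKBL and INSBL.

```c
#include <stddef.h>
#include <stdint.h>

/* Load the byte at byte address addr: aligned quadword load, then shift. */
static uint8_t load_byte(const uint64_t *mem, size_t addr) {
    uint64_t q = mem[addr >> 3];                 /* low 3 address bits ignored */
    return (uint8_t)(q >> ((addr & 7) * 8));     /* cf. EXTBL */
}

/* Store a byte by read-modify-write of the containing quadword.
   NOT atomic: a concurrent store to another byte of the same quadword
   can be lost between the read and the write. */
static void store_byte(uint64_t *mem, size_t addr, uint8_t v) {
    unsigned sh = (unsigned)(addr & 7) * 8;
    uint64_t q = mem[addr >> 3];
    q &= ~((uint64_t)0xFF << sh);                /* cf. MSKBL */
    q |= (uint64_t)v << sh;                      /* cf. INSBL */
    mem[addr >> 3] = q;
}
```

That lost-update window is the multiprocessor complication mentioned above. For shared data, Alpha code would wrap the sequence in a load-locked/store-conditional (LDQ_L/STQ_C) retry loop.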

But misaligned larger data units take somewhat more work.

Does BWX allow, and especially process fast, load or store of a 64 bit
double from/to an odd address?

Re: x86-64 data aligment / faulting

https://www.novabbs.com/computers/article-flat.php?id=21150&group=comp.os.vms#21150

From: jou...@hrem.nano.tudelft.nl (Joukj)
Newsgroups: comp.os.vms
Subject: Re: x86-64 data aligment / faulting
Date: Thu, 03 Mar 2022 15:02:11 +0100
Organization: Aioe.org NNTP Server
Message-ID: <svqhp4$1b3r$1@gioia.aioe.org>
References: <GdcSJ.1239823$X81f.965693@fx14.ams4> <s1eSJ.2188479$_xc2.746813@fx02.ams4> <62197084$0$701$14726298@news.sunsite.dk> <ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com> <621a9d77$0$701$14726298@news.sunsite.dk> <19918859-b710-4f86-8f07-f26e9442ad5fn@googlegroups.com> <621c2bc3$0$693$14726298@news.sunsite.dk> <621c3300$0$692$14726298@news.sunsite.dk>
 by: Joukj - Thu, 3 Mar 2022 14:02 UTC

Arne Vajhøj wrote:
> On 2/27/2022 8:56 PM, Arne Vajhøj wrote:
>> I simply can't get a big difference between aligned and unaligned access.
>>
>> Data at the bottom.
>>
>> I am not saying that there are no cases where unaligned data access
>> makes a significant difference.
>>
>> But I have not been able to come up with a case.
>
> Code below in case anybody wonders WTF I am doing.
>
> Arne
>
> (snip: PROGRAM TEST_ALIGN and SUBROUTINE TEST, quoted in full)
>
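A note on the RANDOM ACCESS branch of TEST quoted above: RANIX=MOD(401*RANIX+1,N) is a linear congruential generator. Every N the main program passes there is a power of two (2**12 up to 2**21), the increment 1 is odd, and 401-1=400 is divisible by 4, so by the Hull-Dobell conditions the sequence should visit every element exactly once per pass. The worst-case intermediate value, 401*(2**21-1)+1, should also fit in INTEGER*4. The Python sketch below checks both claims numerically; lcg_indices and check are hypothetical helpers, not part of the original code.

```python
INT4_MAX = 2**31 - 1  # largest value an INTEGER*4 can hold


def lcg_indices(n, a=401, c=1):
    """Hypothetical helper: the 0-based RANIX values TEST visits in one pass over n elements."""
    ranix = 0
    seq = []
    for _ in range(n):
        ranix = (a * ranix + c) % n  # same recurrence as RANIX=MOD(401*RANIX+1,N)
        seq.append(ranix)
    return seq


def check(n, a=401, c=1):
    """Return (full period?, does a*RANIX+c stay within INTEGER*4?) for array size n."""
    full_period = len(set(lcg_indices(n, a, c))) == n  # every X(RANIX+1) written once
    fits_int4 = a * (n - 1) + c <= INT4_MAX            # worst case before the MOD
    return full_period, fits_int4


for j in range(1, 5):
    n = 2 ** (9 + 3 * j)  # RANDOM ACCESS sizes used by TEST_ALIGN: 4096 up to 2097152
    print(n, check(n))
```

For small N the pattern is less random than it looks: 401 MOD 8 is 1, so with N=8 the "random" walk is simply sequential.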

As others commented: the compiler may have optimized the alignment away
by calling subroutine TEST. Maybe you should force the misalignment by
using COMMON blocks instead of passing the data as arguments.
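One caveat on the COMMON-block route, assuming gfortran is the compiler (as in the listing later in this thread): by default gfortran pads COMMON blocks so that each member is properly aligned, so a leading one-byte variable would not by itself push a REAL*8 array off alignment unless -fno-align-commons is given. Python's struct module shows the same two layouts: '<' packs with no padding, '@' uses native alignment. The buffer below is a hypothetical stand-in for such a block (PAD and N are illustrative), and its offsets are relative to the start of the buffer, not absolute addresses.

```python
import struct

N = 4    # number of REAL*8 elements in X (hypothetical)
PAD = 1  # bytes in front of X, like a leading BYTE variable in the COMMON block

# '<' = standard sizes with no padding (the layout as written in the source);
# '@' = native alignment (what an aligning compiler/ABI would lay out).
packed_size = struct.calcsize(f"<b{N}d")   # 33: 1 pad byte + 4 * 8
native_size = struct.calcsize(f"@b{N}d")   # larger on typical platforms: X is padded up to alignment

buf = bytearray(packed_size)
for i in range(1, N + 1):
    # X(i) lives at byte offset PAD + 8*(i-1) from the start of the block.
    struct.pack_into("<d", buf, PAD + 8 * (i - 1), float(i))

x = [struct.unpack_from("<d", buf, PAD + 8 * k)[0] for k in range(N)]
offsets_mod8 = [(PAD + 8 * k) % 8 for k in range(N)]
print(packed_size, native_size, x, offsets_mod8)  # x == [1.0, 2.0, 3.0, 4.0], offsets_mod8 == [1, 1, 1, 1]
```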

Jouk

Re: x86-64 data aligment / faulting

Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!dotsrc.org!filter.dotsrc.org!news.dotsrc.org!not-for-mail
Date: Thu, 3 Mar 2022 10:08:59 -0500
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.6.1
Subject: Re: x86-64 data aligment / faulting
Content-Language: en-US
Newsgroups: comp.os.vms
References: <GdcSJ.1239823$X81f.965693@fx14.ams4>
<s1eSJ.2188479$_xc2.746813@fx02.ams4>
<62197084$0$701$14726298@news.sunsite.dk>
<ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>
<621a9d77$0$701$14726298@news.sunsite.dk>
<19918859-b710-4f86-8f07-f26e9442ad5fn@googlegroups.com>
<621c2bc3$0$693$14726298@news.sunsite.dk>
<621c3300$0$692$14726298@news.sunsite.dk> <svqhp4$1b3r$1@gioia.aioe.org>
From: arn...@vajhoej.dk (Arne Vajhøj)
In-Reply-To: <svqhp4$1b3r$1@gioia.aioe.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 92
Message-ID: <6220da12$0$706$14726298@news.sunsite.dk>
Organization: SunSITE.dk - Supporting Open source
NNTP-Posting-Host: 82773313.news.sunsite.dk
X-Trace: 1646320146 news.sunsite.dk 706 arne@vajhoej.dk/68.9.63.232:56186
X-Complaints-To: staff@sunsite.dk
 by: Arne Vajhøj - Thu, 3 Mar 2022 15:08 UTC

On 3/3/2022 9:02 AM, Joukj wrote:
> Arne Vajhøj wrote:
>> Code below in case anybody wonders WTF I am doing.
>> [snip]
>
> As others commented: the compiler may have optimized the alignment away
> by calling subroutine TEST.

They are compiled separately.

In the main program, TEST is called with a byte array. The compiler
does not know that the subroutine treats it as a floating-point array.

In the subroutine, the compiler believes it receives a floating-point
array. It does not know that it is actually being called with a byte
array.

Let us say that the subroutine is called with the address 1028 and
array size 3.

I consider it valid compiler behavior to:
- update 3 FP values at 1028, 1036 and 1044
- throw an error for unaligned access

I consider it a compiler bug to:
- update 3 FP values at 1024, 1032 and 1040
- update 3 FP values at 1032, 1040 and 1048
- update 2 FP values at 1032 and 1040

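The address arithmetic behind the example above can be checked mechanically. This sketch assumes 8-byte REAL*8 elements and Fortran's 1-based indexing; element_addresses, round_down, round_up and aligned_slots_within are hypothetical helpers. The three "bug" cases correspond to silently rounding the base down, rounding it up, or keeping only the aligned 8-byte slots that fit inside the 24 bytes actually passed.

```python
ELEM = 8  # bytes per REAL*8


def element_addresses(base, n, elem=ELEM):
    """Hypothetical helper: byte addresses of X(1)..X(n) when X starts at `base`."""
    return [base + elem * (i - 1) for i in range(1, n + 1)]


def round_down(addr, align=ELEM):
    return addr - addr % align


def round_up(addr, align=ELEM):
    return round_down(addr + align - 1, align)


def aligned_slots_within(base, n, elem=ELEM):
    """Aligned elem-byte slots that lie entirely inside [base, base + elem*n)."""
    end = base + elem * n
    return list(range(round_up(base, elem), end - elem + 1, elem))


base, n = 1028, 3
correct = element_addresses(base, n)               # [1028, 1036, 1044]
misalign = [a % ELEM for a in correct]             # [4, 4, 4]: every element is 4 bytes off
bug_down = element_addresses(round_down(base), n)  # [1024, 1032, 1040]
bug_up = element_addresses(round_up(base), n)      # [1032, 1040, 1048]
bug_shrink = aligned_slots_within(base, n)         # [1032, 1040]
print(correct, misalign, bug_down, bug_up, bug_shrink)
```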
Arne

Re: x86-64 data aligment / faulting

Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!dotsrc.org!filter.dotsrc.org!news.dotsrc.org!not-for-mail
Date: Mon, 7 Mar 2022 21:41:09 -0500
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.6.1
Subject: Re: x86-64 data aligment / faulting
Content-Language: en-US
Newsgroups: comp.os.vms
References: <GdcSJ.1239823$X81f.965693@fx14.ams4>
<s1eSJ.2188479$_xc2.746813@fx02.ams4>
<62197084$0$701$14726298@news.sunsite.dk>
<ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>
<621a9d77$0$701$14726298@news.sunsite.dk>
<19918859-b710-4f86-8f07-f26e9442ad5fn@googlegroups.com>
<621c2bc3$0$693$14726298@news.sunsite.dk> <svj4al$e7p$1@dont-email.me>
<svj5qa$t20$1@dont-email.me>
From: arn...@vajhoej.dk (Arne Vajhøj)
In-Reply-To: <svj5qa$t20$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 189
Message-ID: <6226c249$0$705$14726298@news.sunsite.dk>
Organization: SunSITE.dk - Supporting Open source
NNTP-Posting-Host: 4d205270.news.sunsite.dk
X-Trace: 1646707274 news.sunsite.dk 705 arne@vajhoej.dk/68.9.63.232:54404
X-Complaints-To: staff@sunsite.dk
 by: Arne Vajhøj - Tue, 8 Mar 2022 02:41 UTC

On 2/28/2022 1:55 PM, Simon Clubley wrote:
> On 2022-02-28, Simon Clubley <clubley@remove_me.eisner.decus.org-Earth.UFP> wrote:
>> On 2022-02-27, Arne Vajhøj <arne@vajhoej.dk> wrote:
>>>
>>> I modified the program to test with different data sizes, verified
>>> that the code was indeed working on unaligned addresses and
>>> tried both sequential and random access to the array.
>>>
>>> I simply can't get a big difference between aligned and unaligned access.
>>>
>>
>> Have you looked at the generated code to verify that unaligned access
>> is really occurring?
>>
>> Another possibility, depending on how smart the compiler is, is that
>> it could always do aligned access lookups and then only extract the
>> data it needs out of each lookup.
>>
>
> Also, the optimisation settings and level might make a difference.
> Try varying them to see if the behaviour changes, especially when
> you switch between optimising for time versus space (and yes, include
> -O0 in the list of things you try. :-)).

No alignment effect with the default, -O0 or -O3.

> It's for reasons like this that I suggested at the start of this
> to look at the generated code, just in case the compiler is doing
> something you are not expecting.

I am not good enough in x86-64 assembler to see if it does
anything unexpected.

      SUBROUTINE TEST(IX,X,N,REP)
      INTEGER*4 IX,N,REP
      REAL*8 X(*)
      INTEGER*4 I,T1,T2,DUMMY
      CALL SYSTEM_CLOCK(T1,DUMMY,DUMMY)
      DO 200 J=1,REP
        DO 100 I=1,N
          X(I)=I
100     CONTINUE
200   CONTINUE
      CALL SYSTEM_CLOCK(T2,DUMMY,DUMMY)
      WRITE(6,300) IX-1,T2-T1
      RETURN
300   FORMAT(1X,'OFFSET ',I2,' : ',I6,' ms')
      END

becomes:

.seh_proc test_
test_:
pushq %r13
.seh_pushreg %r13
pushq %r12
.seh_pushreg %r12
pushq %rbp
.seh_pushreg %rbp
pushq %rdi
.seh_pushreg %rdi
pushq %rsi
.seh_pushreg %rsi
pushq %rbx
.seh_pushreg %rbx
subq $600, %rsp
.seh_stackalloc 600
.seh_endprologue
movq %r9, %rbp
movq %rcx, %rbx
movq %rdx, %rsi
movq %r8, %r12
leaq 40(%rsp), %rdx
leaq 36(%rsp), %rcx
leaq 44(%rsp), %r8
call _gfortran_system_clock_4
movl 0(%rbp), %r11d
movl 36(%rsp), %edi
testl %r11d, %r11d
jle .L2
movl (%r12), %r8d
testl %r8d, %r8d
jle .L2
movl %r8d, %eax
movl %r8d, %ebp
addl $1, %r11d
movl $1, %ecx
shrl $2, %eax
andl $-4, %ebp
movdqa .LC1(%rip), %xmm3
leal -1(%r8), %r12d
subl $1, %eax
leal 1(%rbp), %r13d
salq $5, %rax
leaq 32(%rsi,%rax), %rdx
.p2align 4,,10
.p2align 3
.L6:
cmpl $2, %r12d
jbe .L7
movdqa .LC0(%rip), %xmm1
movq %rsi, %rax
.p2align 4,,10
.p2align 3
.L4:
movdqa %xmm1, %xmm0
addq $32, %rax
paddd %xmm3, %xmm1
cvtdq2pd %xmm0, %xmm2
pshufd $238, %xmm0, %xmm0
movups %xmm2, -32(%rax)
cvtdq2pd %xmm0, %xmm0
movups %xmm0, -16(%rax)
cmpq %rdx, %rax
jne .L4
movl %r13d, %eax
cmpl %ebp, %r8d
je .L5
.L3:
pxor %xmm0, %xmm0
movslq %eax, %r9
leal 1(%rax), %r10d
cvtsi2sdl %eax, %xmm0
leaq (%rsi,%r9,8), %r9
movsd %xmm0, -8(%r9)
cmpl %r10d, %r8d
jl .L5
pxor %xmm0, %xmm0
addl $2, %eax
cvtsi2sdl %r10d, %xmm0
movsd %xmm0, (%r9)
cmpl %eax, %r8d
jl .L5
pxor %xmm0, %xmm0
cvtsi2sdl %eax, %xmm0
movsd %xmm0, 8(%r9)
.L5:
addl $1, %ecx
cmpl %r11d, %ecx
jne .L6
.L2:
leaq 52(%rsp), %rdx
leaq 56(%rsp), %r8
leaq 48(%rsp), %rcx
leaq 64(%rsp), %r12
call _gfortran_system_clock_4
leaq .LC2(%rip), %rax
movq %r12, %rcx
movl 48(%rsp), %esi
movq %rax, 72(%rsp)
leaq .LC3(%rip), %rax
leaq 60(%rsp), %r13
movq %rax, 144(%rsp)
movq .LC4(%rip), %rax
subl %edi, %esi
movl $12, 80(%rsp)
movq %rax, 64(%rsp)
movq $32, 152(%rsp)
call _gfortran_st_write
movl (%rbx), %eax
movq %r13, %rdx
movq %r12, %rcx
movl $4, %r8d
subl $1, %eax
movl %eax, 60(%rsp)
call _gfortran_transfer_integer_write
movq %r13, %rdx
movq %r12, %rcx
movl %esi, 60(%rsp)
movl $4, %r8d
call _gfortran_transfer_integer_write
movq %r12, %rcx
call _gfortran_st_write_done
nop
addq $600, %rsp
popq %rbx
popq %rsi
popq %rdi
popq %rbp
popq %r12
popq %r13
ret
.L7:
movl $1, %eax
jmp .L3
.seh_endproc
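One hedged reading of the listing above: gfortran vectorized the inner loop. The .L4 loop converts four integers per iteration (cvtdq2pd, with pshufd bringing the upper two down) and writes them with two 16-byte movups stores; movups is the SSE store that does not require 16-byte alignment, so the same code runs for every offset. The only alignment-requiring instruction, movdqa, touches the compiler's own constants and registers, not the array. The iteration count comes from shrl $2 (N/4), and the scalar code at .L3 finishes the remaining N mod 4 elements (at most three) with 8-byte movsd stores; for N of 3 or less, jbe .L7 skips the vector loop. The Python model below reproduces that split and, assuming 64-byte cache lines (typical of current x86-64 parts), counts stores that straddle a line boundary, which is one plausible place for an unaligned-access cost to show up. store_plan and line_splits are hypothetical names.

```python
LINE = 64  # assumed cache-line size in bytes


def store_plan(n):
    """Hypothetical model: (byte offset from X(1), size) of each store the loop issues."""
    stores = []
    if n <= 0:
        return stores
    if n <= 3:                                  # jbe .L7: scalar path only
        first_scalar = 1
    else:
        for k in range(n // 4):                 # shrl $2: n/4 vector iterations
            stores.append((32 * k, 16))         # movups %xmm2, -32(%rax)
            stores.append((32 * k + 16, 16))    # movups %xmm0, -16(%rax)
        first_scalar = (n & -4) + 1             # leal 1(%rbp): first leftover element
    for i in range(first_scalar, n + 1):        # .L3 tail: at most 3 movsd stores
        stores.append((8 * (i - 1), 8))
    return stores


def line_splits(base_offset, n, line=LINE):
    """Count stores crossing a cache-line boundary when X(1) sits at base_offset mod line."""
    return sum(1 for off, size in store_plan(n)
               if (base_offset + off) % line + size > line)


for offset in (0, 1, 4, 8, 16):
    print(offset, line_splits(offset, 1000))
```

In this model any starting offset that is not a multiple of 16 yields the same number of line-crossing stores, including offset 8, which TEST would report as ADDRESS MOD 8 = 0.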

Arne

Re: x86-64 data aligment / faulting

X-Received: by 2002:a05:622a:189b:b0:2de:4b91:b1a8 with SMTP id v27-20020a05622a189b00b002de4b91b1a8mr12361473qtc.601.1646709717278;
Mon, 07 Mar 2022 19:21:57 -0800 (PST)
X-Received: by 2002:a05:620a:c55:b0:67d:1721:5956 with SMTP id
u21-20020a05620a0c5500b0067d17215956mr717294qki.218.1646709717109; Mon, 07
Mar 2022 19:21:57 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.os.vms
Date: Mon, 7 Mar 2022 19:21:56 -0800 (PST)
In-Reply-To: <6226c249$0$705$14726298@news.sunsite.dk>
Injection-Info: google-groups.googlegroups.com; posting-host=96.230.211.194; posting-account=Ysq9BAoAAACGX1EcMMPkdNg4YcTg0TxG
NNTP-Posting-Host: 96.230.211.194
References: <GdcSJ.1239823$X81f.965693@fx14.ams4> <s1eSJ.2188479$_xc2.746813@fx02.ams4>
<62197084$0$701$14726298@news.sunsite.dk> <ff173611-62b5-410b-a0f3-3826b6d56f83n@googlegroups.com>
<621a9d77$0$701$14726298@news.sunsite.dk> <19918859-b710-4f86-8f07-f26e9442ad5fn@googlegroups.com>
<621c2bc3$0$693$14726298@news.sunsite.dk> <svj4al$e7p$1@dont-email.me>
<svj5qa$t20$1@dont-email.me> <6226c249$0$705$14726298@news.sunsite.dk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <bf047120-7aa6-4515-a18f-abc24b941cd3n@googlegroups.com>
Subject: Re: x86-64 data aligment / faulting
From: dansabrs...@yahoo.com (abrsvc)
Injection-Date: Tue, 08 Mar 2022 03:21:57 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 194
 by: abrsvc - Tue, 8 Mar 2022 03:21 UTC

On Monday, March 7, 2022 at 9:41:17 PM UTC-5, Arne Vajhøj wrote:
> On 2/28/2022 1:55 PM, Simon Clubley wrote:
> [snip]
> I am not good enough in x86-64 assembler to see if it does
> anything unexpected.
> [snip: subroutine TEST and its generated x86-64 assembly]
> Arne
Looks to me like the memory references are longwords and the required portion is obtained via shifting. You won't see alignment issues here.
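The technique Simon raised earlier in the thread and Dan describes here (read only aligned words, then assemble the wanted bytes with shifts) can be sketched in Python on 64-bit little-endian words, roughly what Alpha's LDQ_U, EXTQL and EXTQH sequence does for data known to be unaligned. pack_quadwords and load_unaligned_q are hypothetical helpers; this models the technique in general and is not a decoding of the listing above.

```python
import struct

MASK64 = (1 << 64) - 1


def pack_quadwords(data):
    """Hypothetical helper: split a buffer (length a multiple of 8) into little-endian 64-bit words."""
    return [int.from_bytes(data[i:i + 8], "little") for i in range(0, len(data), 8)]


def load_unaligned_q(words, byte_offset):
    """Return the 8 bytes at byte_offset using only aligned word reads plus shifts."""
    q, r = divmod(byte_offset, 8)
    lo = words[q]
    if r == 0:                 # already aligned: a single load suffices
        return lo
    hi = words[q + 1]          # next aligned word holds the remaining upper bytes
    shift = 8 * r
    return ((lo >> shift) | (hi << (64 - shift))) & MASK64


# Place a REAL*8 at byte offset 5 of a 24-byte buffer, then read it back.
buf = bytearray(24)
struct.pack_into("<d", buf, 5, 3.5)
words = pack_quadwords(bytes(buf))
value = struct.unpack("<d", load_unaligned_q(words, 5).to_bytes(8, "little"))[0]
print(value)  # 3.5
```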

Dan

Re: x86-64 data aligment / faulting

Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!fx12.ams4.POSTED!not-for-mail
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:91.0)
Gecko/20100101 Thunderbird/91.7.0
Reply-To: mark.daniel@wasd.vsm.com.au
Subject: Re: x86-64 data aligment / faulting
Content-Language: en-US
Newsgroups: comp.os.vms
References: <GdcSJ.1239823$X81f.965693@fx14.ams4>
From: mark.dan...@wasd.vsm.com.au (Mark Daniel)
In-Reply-To: <GdcSJ.1239823$X81f.965693@fx14.ams4>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 19
Message-ID: <pB2YJ.292172$zX2.209805@fx12.ams4>
X-Complaints-To: abuse@eweka.nl
NNTP-Posting-Date: Tue, 15 Mar 2022 15:50:45 UTC
Organization: Eweka Internet Services
Date: Wed, 16 Mar 2022 02:20:42 +1030
X-Received-Bytes: 1804
 by: Mark Daniel - Tue, 15 Mar 2022 15:50 UTC

On 26/2/22 8:23 am, Mark Daniel wrote:
> Alpha and Itanium had data alignment requirements with penalties for
> faulting.  Does x86-64?  Is sys$start_align_fault_report() et al. still
> relevant?
Recent reply under the VSI Engineering imprimatur:

"On Alpha and IA64, accessing unaligned data results in an exception and
an exception handler fixes up the read and records the alignment fault.
On X86, there is no such thing as an alignment fault nor is there a
performance penalty for accessing unaligned data. The various tools to
report alignment faults are still present on X86, but they are not
relevant and will never report any alignment faults."

https://forum.vmssoftware.com/viewtopic.php?f=37&t=8475&p=17487#p17486

--
Anyone, who using social-media, forms an opinion regarding anything
other than the relative cuteness of this or that puppy-dog, needs
seriously to examine their critical thinking.
