Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

The heart is not a logical organ. -- Dr. Janet Wallace, "The Deadly Years", stardate 3479.4


devel / comp.arch / Re: Assembling span-dependent instructions

SubjectAuthor
* Assembling span-dependent instructionsAnton Ertl
`* Re: Assembling span-dependent instructionsKaz Kylheku
 +* Re: Assembling span-dependent instructionsMitchAlsup
 |`* Re: Assembling span-dependent instructionsAnton Ertl
 | +* Re: Assembling span-dependent instructionsAnton Ertl
 | |`- Re: Assembling span-dependent instructionsThomas Koenig
 | `- Re: Assembling span-dependent instructionsJohn Levine
 +- Re: Assembling span-dependent instructionsAnton Ertl
 `* Re: Assembling span-dependent instructionsantispam
  `- Re: Assembling span-dependent instructionsKaz Kylheku

1
Assembling span-dependent instructions

<22-07-049@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=26917&group=comp.arch#26917

  copy link   Newsgroups: comp.compilers comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers,comp.arch
Subject: Assembling span-dependent instructions
Date: Wed, 27 Jul 2022 16:32:50 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 154
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-049@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="57987"; mail-complaints-to="abuse@iecc.com"
Keywords: code, optimize
Posted-Date: 27 Jul 2022 13:57:17 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Anton Ertl - Wed, 27 Jul 2022 16:32 UTC

Over the years I have seen several discussions of the topic of
selecting between long and short versions of instructions in
assembling and linking. After one such discussion in 2019 I wrote

https://www.complang.tuwien.ac.at/anton/assembling-span-dependent.html

and after the recent discussion (e.g., <tan76v$qnd$1@gal.iecc.com>) I
finally proofread it and created a publically visible link to it. To
save you from the throuble of switching to a web browser, here's the
contents in plain text.

Crossposted to comp.compilers and comp.arch; maybe you want to reduce
this on followups.

Assembling Span-Dependent Instructions

Some instruction sets support encoding immediate operands with
different sizes. E.g., on AMD64 the assembly language statement

movl 8(%rdi),%rax

can be encoded as

8b 47 08

or as

8b 87 08 00 00 00

The first form can encode only offsets in the range -128..127, while
the latter can encode offsets in the range -2147483648..2147483647.

Architectures with fixed-size instructions tend to split operations
with large immediate operands into more instructions than operations
with small immediate operands, so they following also applies to them.

If the immediate operand is a constant in the assembly language source
code, it is straightforward to just select the smallest encoding for
which the operand fits.

However, if the immediate operand refers to labels in the assembly
program, e.g., for a branch instruction, things become more
complicated: the minimum size of the operand may depend on how the
program is assembled, and how the program is assembled depends on the
minimum size of the operand.

The common case is a relative branch to some label, with some code
between. In this case, making the code larger can only make the minimum
size of the operand larger, and this is relatively benign, as we will
see below.

However, one can also construct cases where making the code larger can
reduce the minimum size of the immediate operand, e.g.:

foo:
movl foo+133-bar(%rdi),%eax
bar:

If you choose the large encoding, the offset is 127 and would seem to
fit in the small encoding. But if you choose the small encoding, the
offset is 130 and does not fit in the small encoding. So in this case,
only the large encoding is a valid solution.

In general, such expressions involving labels may depend on how other
instructions are encoded, which may depend on yet other instructions,
with possible dependency cycles. In general, as Szymanski proved in
1978, this problem is NP-complete.

Now, as soon as anybody mentions NP-completeness, many people become
very gullible; at least I see no other explanation for the amount of
misunderstandings propagated about this topic, which prompted me to
write this web page.

* Szymanski's algorithm

Szymanski himself mentions the approaches discussed below, then
describes a more sophisticated graph-based algorithm (read the paper
for details). For dealing with cases like the second movl instruction
above, he gives criteria for identifying whether an instruction may
suffer from this condition, and suggests that they should be
unconditionally compiled in the long encoding (and gas actually does
that, even if foo+133-bar is replaced with foo-bar, for which the
shorter encoding is sufficient in either case). Given that such cases
are exceedingly rare, that's a good approach.

* Start big and shrink

A simpler algorithm is to start by using the largest encoding of all
instructions, determine the values of the labels, then make those
instructions smaller that can be made smaller. This can be repeated
until a steady-state is reached, or it can be stopped earlier to reduce
assembling time.

This algorithm can fail if the program contains nasty instructions like
our second movl: after the first pass, it decides to use the smaller
encoding for the movl, but that does not work.

If you then want to make this instruction larger again, the algorithm
is no longer guaranteed to terminate (see the end of section 2 for an
example that starts small).

One solution is to identify the nasty instructions as described by
Szymanski, and always keep these big.

A disadvantage of this algorithm is that its steady state may be worse
than for the algorithm below.

* Start small and grow

Another simple algorithm is to start by using the smallest encoding of
all instructions, determine the values of the labels, then make those
instructions larger that need to be larger. Repeat until a steady-state
is reached. Note that you never must shrink an instruction again,
otherwise the algorithm is no longer guaranteed to terminate.

Compared to the start-big approach, this algorithm is simpler (no need
to identify nasty instructions, the algorithm grows them just like the
others) and can produce better results, but typically need one
additional pass.

Compared to Szymanski's algorithm, this algorithm is simpler (no need
to identify nasty instructions) and runs slower (one pass more).
Compared to Szymanski's algorithm with a gas-like imprecise
identification of nasty instructions, the algorithm may produce smaller
code; but given that such instructions do not really occur in real
code, that's not a practical advantage.

* Misunderstandings

Over the years, a number of misunderstandings of this topic have been
posted. Comparing a recent one[1] to the paper, I can imagine how the
paper spawned it: The paper presents the start-big algorithm as just
working, then presents a start-small-then-grow-or-shrink algorithm, and
an example that does not converge (i.e., it does not terminate). It
does not say that the same example will either not terminate or fail if
you start big. The paper also explains that the general problem is
NP-complete, and also that this is not practically relevant. So if you
do not think about it too much, you might think that start-big is the
safe choice, while start-small has problems with termination (and a few
decades later that is conflated with the NP-completeness).

* References

Thomas G. Szymanski: Assembling Code for Machines with Span-Dependent
Instructions, CACM 21(4), 1978, p. 300-308

[1] <863dd999-bd97-49eb-977e-b70e9afed2d1@googlegroups.com>
in the web as <http://al.howardknight.net/?ID=157410050400>

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Assembling span-dependent instructions

<22-07-052@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=26920&group=comp.arch#26920

  copy link   Newsgroups: comp.compilers comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: 480-992-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.compilers,comp.arch
Subject: Re: Assembling span-dependent instructions
Date: Wed, 27 Jul 2022 22:52:40 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 15
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-052@comp.compilers>
References: <22-07-049@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="7304"; mail-complaints-to="abuse@iecc.com"
Keywords: code, question
Posted-Date: 27 Jul 2022 19:00:36 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Kaz Kylheku - Wed, 27 Jul 2022 22:52 UTC

On 2022-07-27, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> However, one can also construct cases where making the code larger can
> reduce the minimum size of the immediate operand, e.g.:
>
> foo:
> movl foo+133-bar(%rdi),%eax
> bar:

That's weird; what is accessed this way, relative to the code,
and does it occur in compiler output?

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

Re: Assembling span-dependent instructions

<d3d0b8ef-d647-4772-82dc-f1390870fa1fn@googlegroups.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=26922&group=comp.arch#26922

  copy link   Newsgroups: comp.arch
X-Received: by 2002:a0c:f28c:0:b0:474:6b84:930 with SMTP id k12-20020a0cf28c000000b004746b840930mr8271110qvl.96.1658964203685;
Wed, 27 Jul 2022 16:23:23 -0700 (PDT)
X-Received: by 2002:ad4:5ded:0:b0:472:f1e0:9562 with SMTP id
jn13-20020ad45ded000000b00472f1e09562mr21222049qvb.85.1658964203520; Wed, 27
Jul 2022 16:23:23 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border-2.nntp.ord.giganews.com!border-1.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 27 Jul 2022 16:23:23 -0700 (PDT)
In-Reply-To: <22-07-052@comp.compilers>
Injection-Info: google-groups.googlegroups.com; posting-host=104.59.204.55; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 104.59.204.55
References: <22-07-049@comp.compilers> <22-07-052@comp.compilers>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d3d0b8ef-d647-4772-82dc-f1390870fa1fn@googlegroups.com>
Subject: Re: Assembling span-dependent instructions
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 27 Jul 2022 23:23:23 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 19
 by: MitchAlsup - Wed, 27 Jul 2022 23:23 UTC

On Wednesday, July 27, 2022 at 6:00:39 PM UTC-5, Kaz Kylheku wrote:
> On 2022-07-27, Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> > However, one can also construct cases where making the code larger can
> > reduce the minimum size of the immediate operand, e.g.:
> >
> > foo:
> > movl foo+133-bar(%rdi),%eax
> > bar:
<
> That's weird; what is accessed this way, relative to the code,
> and does it occur in compiler output?
<
It is not something that a compiler would produce.
In addition one needs read access to the code section in order to avoid faults.
Thirdly, if/when an ISA has IP relative addressing, you don't even need this kind
of access "arithmetic"; ever.
>
> --
> TXR Programming Language: http://nongnu.org/txr
> Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

Re: Assembling span-dependent instructions

<2022Jul28.083805@mips.complang.tuwien.ac.at>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=26927&group=comp.arch#26927

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Assembling span-dependent instructions
Date: Thu, 28 Jul 2022 06:38:05 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 48
Message-ID: <2022Jul28.083805@mips.complang.tuwien.ac.at>
References: <22-07-049@comp.compilers> <22-07-052@comp.compilers> <d3d0b8ef-d647-4772-82dc-f1390870fa1fn@googlegroups.com>
Injection-Info: reader01.eternal-september.org; posting-host="07ab17fa9b519e4e698b9d31051ddb6b";
logging-data="3049370"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19KKIdc0fHocuPWs+9WCcHn"
Cancel-Lock: sha1:FeneJw81v3rMIjWWpgDcyw1peA0=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Thu, 28 Jul 2022 06:38 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Wednesday, July 27, 2022 at 6:00:39 PM UTC-5, Kaz Kylheku wrote:
>> On 2022-07-27, Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> > However, one can also construct cases where making the code larger can
>> > reduce the minimum size of the immediate operand, e.g.:
>> >
>> > foo:
>> > movl foo+133-bar(%rdi),%eax
>> > bar:
><
>> That's weird; what is accessed this way, relative to the code,
>> and does it occur in compiler output?
><
>It is not something that a compiler would produce.
>In addition one needs read access to the code section in order to avoid faults.

I have yet to see a Unix system that does not provide read access to
the code section. Maybe in the 16-bit days, with (on the 80286) the
small rather than tiny code model, and similar stuff on the PDP-11,
but not since VAX/68000/386 times.

>Thirdly, if/when an ISA has IP relative addressing, you don't even need this kind
>of access "arithmetic"; ever.

With IP-relative addressing, it's just less obvious what is happening:

[/tmp:131673] cat xxx.s
foo:
movl foo+133(%rip),%eax
[/tmp:131674] gcc -c xxx.s
[/tmp:131675] objdump -d xxx.o

xxx.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
0: 8b 05 7f 00 00 00 mov 0x7f(%rip),%eax # 85 <foo+0x85>

Note the 0x7f in the instruction, but you cannot use a short one
instead, because then the offset would not fit into the -128..127
range.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Assembling span-dependent instructions

<2022Jul28.104552@mips.complang.tuwien.ac.at>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=26928&group=comp.arch#26928

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Assembling span-dependent instructions
Date: Thu, 28 Jul 2022 08:45:52 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 67
Message-ID: <2022Jul28.104552@mips.complang.tuwien.ac.at>
References: <22-07-049@comp.compilers> <22-07-052@comp.compilers> <d3d0b8ef-d647-4772-82dc-f1390870fa1fn@googlegroups.com> <2022Jul28.083805@mips.complang.tuwien.ac.at>
Injection-Info: reader01.eternal-september.org; posting-host="07ab17fa9b519e4e698b9d31051ddb6b";
logging-data="3127072"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18nlmyZspv5yGuNRsYCrn7w"
Cancel-Lock: sha1:LyppoLqKQBssCKpyOo0m1eEJCBo=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Thu, 28 Jul 2022 08:45 UTC

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>MitchAlsup <MitchAlsup@aol.com> writes:
>>On Wednesday, July 27, 2022 at 6:00:39 PM UTC-5, Kaz Kylheku wrote:
>>> On 2022-07-27, Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>> > However, one can also construct cases where making the code larger can
>>> > reduce the minimum size of the immediate operand, e.g.:
>>> >
>>> > foo:
>>> > movl foo+133-bar(%rdi),%eax
>>> > bar:
>><
>>> That's weird; what is accessed this way, relative to the code,
>>> and does it occur in compiler output?
>><
>>It is not something that a compiler would produce.
>>In addition one needs read access to the code section in order to avoid faults.
>
>I have yet to see a Unix system that does not provide read access to
>the code section. Maybe in the 16-bit days, with (on the 80286) the
>small rather than tiny code model, and similar stuff on the PDP-11,
>but not since VAX/68000/386 times.
>
>>Thirdly, if/when an ISA has IP relative addressing, you don't even need this kind
>>of access "arithmetic"; ever.
>
>With IP-relative addressing, it's just less obvious what is happening:
>
>[/tmp:131673] cat xxx.s
>foo:
>movl foo+133(%rip),%eax
>[/tmp:131674] gcc -c xxx.s
>[/tmp:131675] objdump -d xxx.o
>
>xxx.o: file format elf64-x86-64
>
>
>Disassembly of section .text:
>
>0000000000000000 <foo>:
> 0: 8b 05 7f 00 00 00 mov 0x7f(%rip),%eax # 85 <foo+0x85>
>
>Note the 0x7f in the instruction, but you cannot use a short one
>instead, because then the offset would not fit into the -128..127
>range.

Concerning "not something that a compiler would produce":

[/tmp:131691] cat yyy.c
char const * const foo="bla";

char main(long i)
{ return foo[-3673];
}

[/tmp:131692] gcc -O yyy.c #on Debian 11
[/tmp:131693] objdump -d a.out
....
0000000000001125 <main>:
1125: 0f b6 05 7f 00 00 00 movzbl 0x7f(%rip),%eax # 11ab <_fini+0x17>
112c: c3 retq
....

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Assembling span-dependent instructions

<tbuq2l$tn2$1@newsreader4.netcologne.de>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=26931&group=comp.arch#26931

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-dd1e-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Assembling span-dependent instructions
Date: Thu, 28 Jul 2022 20:03:33 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <tbuq2l$tn2$1@newsreader4.netcologne.de>
References: <22-07-049@comp.compilers> <22-07-052@comp.compilers>
<d3d0b8ef-d647-4772-82dc-f1390870fa1fn@googlegroups.com>
<2022Jul28.083805@mips.complang.tuwien.ac.at>
<2022Jul28.104552@mips.complang.tuwien.ac.at>
Injection-Date: Thu, 28 Jul 2022 20:03:33 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-dd1e-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:dd1e:0:7285:c2ff:fe6c:992d";
logging-data="30434"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Thu, 28 Jul 2022 20:03 UTC

Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

> Concerning "not something that a compiler would produce":
>
> [/tmp:131691] cat yyy.c
> char const * const foo="bla";
>
> char main(long i)
> {
> return foo[-3673];
> }

Nice example of nasal daemons (in Fortran, it would be
"Start World War Three, provided the right operational
hardware has been installed".

Re: Assembling span-dependent instructions

<tbuss7$2d2k$1@gal.iecc.com>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=26933&group=comp.arch#26933

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: Assembling span-dependent instructions
Date: Thu, 28 Jul 2022 20:51:19 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <tbuss7$2d2k$1@gal.iecc.com>
References: <22-07-049@comp.compilers> <22-07-052@comp.compilers> <d3d0b8ef-d647-4772-82dc-f1390870fa1fn@googlegroups.com> <2022Jul28.083805@mips.complang.tuwien.ac.at>
Injection-Date: Thu, 28 Jul 2022 20:51:19 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="78932"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <22-07-049@comp.compilers> <22-07-052@comp.compilers> <d3d0b8ef-d647-4772-82dc-f1390870fa1fn@googlegroups.com> <2022Jul28.083805@mips.complang.tuwien.ac.at>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Thu, 28 Jul 2022 20:51 UTC

According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>I have yet to see a Unix system that does not provide read access to
>the code section. Maybe in the 16-bit days, with (on the 80286) the
>small rather than tiny code model, and similar stuff on the PDP-11,
>but not since VAX/68000/386 times.

On a PDP-11/45 or /70 it had split instruction and data space so there
was literally no way to make a data reference to the instruction space.

I agree that was quite a while ago. My recollection is that someone came
up with a contrived example which did super-SDI optimization when trying
to shorten a long jump by looking for other jumps to the same target
that might be close enough for a short jump, and that had the problem
that shortening one jump could make another out of range.

R's,
John
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Assembling span-dependent instructions

<22-07-053@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=26940&group=comp.arch#26940

  copy link   Newsgroups: comp.compilers comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers,comp.arch
Subject: Re: Assembling span-dependent instructions
Date: Thu, 28 Jul 2022 05:56:28 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 26
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-053@comp.compilers>
References: <22-07-049@comp.compilers> <22-07-052@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="12712"; mail-complaints-to="abuse@iecc.com"
Keywords: code, optimize
Posted-Date: 29 Jul 2022 16:29:15 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Anton Ertl - Thu, 28 Jul 2022 05:56 UTC

Kaz Kylheku <480-992-1380@kylheku.com> writes:
>On 2022-07-27, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>> However, one can also construct cases where making the code larger can
>> reduce the minimum size of the immediate operand, e.g.:
>>
>> foo:
>> movl foo+133-bar(%rdi),%eax
>> bar:
>
>That's weird; what is accessed this way, relative to the code,
>and does it occur in compiler output?

I designed this example based on the examples in the paper, not as
something occuring in practice.

The examples in the paper involved several span-dependent instructions
and looked unusual to me, but I convinced myself that something like
that could occur in practice, with data close to code addressed using
some pointer into the code (like %rip-relative on AMD64, or the global
pointer on various position-independent ABIs for other architectures).

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Re: Assembling span-dependent instructions

<22-07-055@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=26941&group=comp.arch#26941

  copy link   Newsgroups: comp.compilers comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: antis...@math.uni.wroc.pl
Newsgroups: comp.compilers,comp.arch
Subject: Re: Assembling span-dependent instructions
Date: Thu, 28 Jul 2022 12:15:14 -0000 (UTC)
Organization: Aioe.org NNTP Server
Lines: 58
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-055@comp.compilers>
References: <22-07-049@comp.compilers> <22-07-052@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="49930"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, assembler
Posted-Date: 29 Jul 2022 16:38:22 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: antis...@math.uni.wroc.pl - Thu, 28 Jul 2022 12:15 UTC

In comp.arch Kaz Kylheku <480-992-1380@kylheku.com> wrote:
> On 2022-07-27, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> > However, one can also construct cases where making the code larger can
> > reduce the minimum size of the immediate operand, e.g.:
> >
> > foo:
> > movl foo+133-bar(%rdi),%eax
> > bar:
>
> That's weird; what is accessed this way, relative to the code,
> and does it occur in compiler output?

Code like this may appear due to alignment, say jump to page or cache
line boundary. In realistic situation one is faced in much more
compilex problem. Namely on many architectures best way to provide
constant arguments is by storing constants in memory. This leads to
"constant pools" and problem where to place them. One wants constant
pools as close as possible to code, to use short offsets accessing
them. But for performance reasons it is desirable to put constants in
separate cache lines. Also, one needs jumps to jump around constant
pools. Some jumps occur naturally in program, it is good to re-use
them. But there are possible unused parts of cache lines (both for
code and constant pools). So there is need to balance loss due to
unused parts of cache lines (probably dominant factor), length of
instructions and possible overhead due to extra instructions.

There is extra complication when machine has limited range of offsets
which can be used in single intstruction: when needed offset exceeds
allowed range one has to change to indirect form which needs free
register. So there is extra interaction with registes allocation. This
is particularly nasty if one needs free register and wants to spill
some other register, but address constants exceed length allowed in
instruction, so in order to free register one already needs free
register to put there address constant.

i386 allows rather easy solution to such problems because one can
reach any locations from loads and jumps and most instructions accept
32-bit constant (immediate) arguments. On x86_64 (that is in 64 bit
mode) situation is more interesting because immediates and offsets are
limited to 32-bits, so one can no longer reach whole memory. But
current practice make it easy but wasetful: each part of program (main
executable and shared libraries) is limited to 2G so that 32-bit
offsets are enough and accesses to other parts are indirect. 32 bit
ARM has most of such problems: offsets are quite limited and mosts
constants need to be loaded from memory. ARM intructions are of fixed
length, but when offset is too big to fit into single instruction one
has to use alternative sequence of several instructions (and deal with
register allocation). Z architecture (modern versions of IBM 360) has
such problems too: there are variants of instruction having different
lengths but even longest variant have limited range of available
offsets. At least some versions of Z architecture had severe penalty
for simultaneusly accessing the same cache line for instruction fetch
and data access, so putting constant pools in separate cache line was
very important.

--
Waldek Hebisch

Re: Assembling span-dependent instructions

<22-07-060@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=26946&group=comp.arch#26946

  copy link   Newsgroups: comp.compilers comp.arch
Followup: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: 480-992-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.compilers,comp.arch
Subject: Re: Assembling span-dependent instructions
Followup-To: comp.compilers
Date: Fri, 29 Jul 2022 22:36:03 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 53
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-060@comp.compilers>
References: <22-07-049@comp.compilers> <22-07-052@comp.compilers> <22-07-055@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="48733"; mail-complaints-to="abuse@iecc.com"
Keywords: code, optimize
Posted-Date: 29 Jul 2022 19:01:02 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Kaz Kylheku - Fri, 29 Jul 2022 22:36 UTC

["Followup-To:" header set to comp.compilers.]
On 2022-07-28, antispam@math.uni.wroc.pl <antispam@math.uni.wroc.pl> wrote:
> In comp.arch Kaz Kylheku <480-992-1380@kylheku.com> wrote:
>> On 2022-07-27, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>> > However, one can also construct cases where making the code larger can
>> > reduce the minimum size of the immediate operand, e.g.:
>> >
>> > foo:
>> > movl foo+133-bar(%rdi),%eax
>> > bar:
>>
>> That's weird; what is accessed this way, relative to the code,
>> and does it occur in compiler output?
>
> Code like this may appear due to alignment, say jump to page or cache
> line boundary. In realistic situation one is faced in much more
> compilex problem. Namely on many architectures best way to provide
> constant arguments is by storing constants in memory. This leads to
> "constant pools" and problem where to place them. One wants constant
> pools as close as possible to code, to use short offsets accessing
> them. But for performance reasons it is desirable to put constants in
> separate cache lines. Also, one needs jumps to jump around constant
> pools. Some jumps occur naturally in program, it is good to re-use
> them. But there are possible unused parts of cache lines (both for
> code and constant pools). So there is need to balance loss due to
> unused parts of cache lines (probably dominant factor), length of
> instructions and possible overhead due to extra instructions.

OK; so that makes sense. Say we have a program segment with
some PC relative references to a cache-aligned constant pool
which is farther down the program.

If this program segment shrinks, while the cache-aligned
constant pool stays where it is (to remain aligned),
then that pool will be farther away from some of those
instructions.

This could even happen due to smaller alignments like 8 bytes.
Some instruction loses 4 bytes; then another one experiences
the data being 4 bytes farther than before, which now
doesn't fit.

> There is extra complication when machine has limited range of offsets
> which can be used in single intstruction: when needed offset exceeds
> allowed range one has to change to indirect form which needs free
> register. So there is extra interaction with registes allocation.

On an architecture that is not so register starved, you just
dedicate a register for that. MIPS calls it "at" ("assembler temporary").

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

1
server_pubkey.txt

rocksolid light 0.9.81
clearnet tor