Welcome to novaBBS (click a section below)

mail files register newsreader groups login

Message-ID:

"The hands that help are better far than the lips that pray." -- Robert G. Ingersoll

Assembling span-dependent instructions

Subject	Author
Assembling span-dependent instructions	Anton Ertl
Re: Assembling span-dependent instructions	Kaz Kylheku
Re: Assembling span-dependent instructions	Anton Ertl
Re: Assembling span-dependent instructions	antispam
Re: Assembling span-dependent instructions	gah4
Re: Assembling span-dependent instructions	Kaz Kylheku
RE: Assembling span-dependent instructions	Christopher F Clark
RE: Assembling span-dependent instructions	Anton Ertl
RE: Assembling span-dependent instructions	Christopher F Clark

Assembling span-dependent instructions

<22-07-049@comp.compilers>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=519&group=comp.compilers#519

copy link Newsgroups: comp.compilers comp.arch

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers,comp.arch
Subject: Assembling span-dependent instructions
Date: Wed, 27 Jul 2022 16:32:50 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 154
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-049@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="57987"; mail-complaints-to="abuse@iecc.com"
Keywords: code, optimize
Posted-Date: 27 Jul 2022 13:57:17 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com

by: Anton Ertl - Wed, 27 Jul 2022 16:32 UTC

Over the years I have seen several discussions of the topic of
selecting between long and short versions of instructions in
assembling and linking. After one such discussion in 2019 I wrote

https://www.complang.tuwien.ac.at/anton/assembling-span-dependent.html

and after the recent discussion (e.g., <tan76v$qnd$1@gal.iecc.com>) I
finally proofread it and created a publically visible link to it. To
save you from the throuble of switching to a web browser, here's the
contents in plain text.

Crossposted to comp.compilers and comp.arch; maybe you want to reduce
this on followups.

Assembling Span-Dependent Instructions

Some instruction sets support encoding immediate operands with
different sizes. E.g., on AMD64 the assembly language statement

movl 8(%rdi),%rax

can be encoded as

8b 47 08

or as

8b 87 08 00 00 00

The first form can encode only offsets in the range -128..127, while
the latter can encode offsets in the range -2147483648..2147483647.

Architectures with fixed-size instructions tend to split operations
with large immediate operands into more instructions than operations
with small immediate operands, so they following also applies to them.

If the immediate operand is a constant in the assembly language source
code, it is straightforward to just select the smallest encoding for
which the operand fits.

However, if the immediate operand refers to labels in the assembly
program, e.g., for a branch instruction, things become more
complicated: the minimum size of the operand may depend on how the
program is assembled, and how the program is assembled depends on the
minimum size of the operand.

The common case is a relative branch to some label, with some code
between. In this case, making the code larger can only make the minimum
size of the operand larger, and this is relatively benign, as we will
see below.

However, one can also construct cases where making the code larger can
reduce the minimum size of the immediate operand, e.g.:

foo:
movl foo+133-bar(%rdi),%eax
bar:

If you choose the large encoding, the offset is 127 and would seem to
fit in the small encoding. But if you choose the small encoding, the
offset is 130 and does not fit in the small encoding. So in this case,
only the large encoding is a valid solution.

In general, such expressions involving labels may depend on how other
instructions are encoded, which may depend on yet other instructions,
with possible dependency cycles. In general, as Szymanski proved in
1978, this problem is NP-complete.

Now, as soon as anybody mentions NP-completeness, many people become
very gullible; at least I see no other explanation for the amount of
misunderstandings propagated about this topic, which prompted me to
write this web page.

* Szymanski's algorithm

Szymanski himself mentions the approaches discussed below, then
describes a more sophisticated graph-based algorithm (read the paper
for details). For dealing with cases like the second movl instruction
above, he gives criteria for identifying whether an instruction may
suffer from this condition, and suggests that they should be
unconditionally compiled in the long encoding (and gas actually does
that, even if foo+133-bar is replaced with foo-bar, for which the
shorter encoding is sufficient in either case). Given that such cases
are exceedingly rare, that's a good approach.

* Start big and shrink

A simpler algorithm is to start by using the largest encoding of all
instructions, determine the values of the labels, then make those
instructions smaller that can be made smaller. This can be repeated
until a steady-state is reached, or it can be stopped earlier to reduce
assembling time.

This algorithm can fail if the program contains nasty instructions like
our second movl: after the first pass, it decides to use the smaller
encoding for the movl, but that does not work.

If you then want to make this instruction larger again, the algorithm
is no longer guaranteed to terminate (see the end of section 2 for an
example that starts small).

One solution is to identify the nasty instructions as described by
Szymanski, and always keep these big.

A disadvantage of this algorithm is that its steady state may be worse
than for the algorithm below.

* Start small and grow

Another simple algorithm is to start by using the smallest encoding of
all instructions, determine the values of the labels, then make those
instructions larger that need to be larger. Repeat until a steady-state
is reached. Note that you never must shrink an instruction again,
otherwise the algorithm is no longer guaranteed to terminate.

Compared to the start-big approach, this algorithm is simpler (no need
to identify nasty instructions, the algorithm grows them just like the
others) and can produce better results, but typically need one
additional pass.

Compared to Szymanski's algorithm, this algorithm is simpler (no need
to identify nasty instructions) and runs slower (one pass more).
Compared to Szymanski's algorithm with a gas-like imprecise
identification of nasty instructions, the algorithm may produce smaller
code; but given that such instructions do not really occur in real
code, that's not a practical advantage.

* Misunderstandings

Over the years, a number of misunderstandings of this topic have been
posted. Comparing a recent one[1] to the paper, I can imagine how the
paper spawned it: The paper presents the start-big algorithm as just
working, then presents a start-small-then-grow-or-shrink algorithm, and
an example that does not converge (i.e., it does not terminate). It
does not say that the same example will either not terminate or fail if
you start big. The paper also explains that the general problem is
NP-complete, and also that this is not practically relevant. So if you
do not think about it too much, you might think that start-big is the
safe choice, while start-small has problems with termination (and a few
decades later that is conflated with the NP-completeness).

* References

Thomas G. Szymanski: Assembling Code for Machines with Span-Dependent
Instructions, CACM 21(4), 1978, p. 300-308

[1] <863dd999-bd97-49eb-977e-b70e9afed2d1@googlegroups.com>
in the web as <http://al.howardknight.net/?ID=157410050400>

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Assembling span-dependent instructions

<22-07-052@comp.compilers>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=522&group=comp.compilers#522

copy link Newsgroups: comp.compilers comp.arch

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: 480-992-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.compilers,comp.arch
Subject: Re: Assembling span-dependent instructions
Date: Wed, 27 Jul 2022 22:52:40 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 15
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-052@comp.compilers>
References: <22-07-049@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="7304"; mail-complaints-to="abuse@iecc.com"
Keywords: code, question
Posted-Date: 27 Jul 2022 19:00:36 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com

by: Kaz Kylheku - Wed, 27 Jul 2022 22:52 UTC

On 2022-07-27, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> However, one can also construct cases where making the code larger can
> reduce the minimum size of the immediate operand, e.g.:
>
> foo:
> movl foo+133-bar(%rdi),%eax
> bar:

That's weird; what is accessed this way, relative to the code,
and does it occur in compiler output?

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

Re: Assembling span-dependent instructions

<22-07-053@comp.compilers>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=523&group=comp.compilers#523

copy link Newsgroups: comp.compilers comp.arch

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers,comp.arch
Subject: Re: Assembling span-dependent instructions
Date: Thu, 28 Jul 2022 05:56:28 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 26
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-053@comp.compilers>
References: <22-07-049@comp.compilers> <22-07-052@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="12712"; mail-complaints-to="abuse@iecc.com"
Keywords: code, optimize
Posted-Date: 29 Jul 2022 16:29:15 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com

by: Anton Ertl - Thu, 28 Jul 2022 05:56 UTC

Kaz Kylheku <480-992-1380@kylheku.com> writes:
>On 2022-07-27, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>> However, one can also construct cases where making the code larger can
>> reduce the minimum size of the immediate operand, e.g.:
>>
>> foo:
>> movl foo+133-bar(%rdi),%eax
>> bar:
>
>That's weird; what is accessed this way, relative to the code,
>and does it occur in compiler output?

I designed this example based on the examples in the paper, not as
something occuring in practice.

The examples in the paper involved several span-dependent instructions
and looked unusual to me, but I convinced myself that something like
that could occur in practice, with data close to code addressed using
some pointer into the code (like %rip-relative on AMD64, or the global
pointer on various position-independent ABIs for other architectures).

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Re: Assembling span-dependent instructions

<22-07-055@comp.compilers>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=525&group=comp.compilers#525

copy link Newsgroups: comp.compilers comp.arch

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: antis...@math.uni.wroc.pl
Newsgroups: comp.compilers,comp.arch
Subject: Re: Assembling span-dependent instructions
Date: Thu, 28 Jul 2022 12:15:14 -0000 (UTC)
Organization: Aioe.org NNTP Server
Lines: 58
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-055@comp.compilers>
References: <22-07-049@comp.compilers> <22-07-052@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="49930"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, assembler
Posted-Date: 29 Jul 2022 16:38:22 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com

by: antis...@math.uni.wroc.pl - Thu, 28 Jul 2022 12:15 UTC

In comp.arch Kaz Kylheku <480-992-1380@kylheku.com> wrote:
> On 2022-07-27, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> > However, one can also construct cases where making the code larger can
> > reduce the minimum size of the immediate operand, e.g.:
> >
> > foo:
> > movl foo+133-bar(%rdi),%eax
> > bar:
>
> That's weird; what is accessed this way, relative to the code,
> and does it occur in compiler output?

Code like this may appear due to alignment, say jump to page or cache
line boundary. In realistic situation one is faced in much more
compilex problem. Namely on many architectures best way to provide
constant arguments is by storing constants in memory. This leads to
"constant pools" and problem where to place them. One wants constant
pools as close as possible to code, to use short offsets accessing
them. But for performance reasons it is desirable to put constants in
separate cache lines. Also, one needs jumps to jump around constant
pools. Some jumps occur naturally in program, it is good to re-use
them. But there are possible unused parts of cache lines (both for
code and constant pools). So there is need to balance loss due to
unused parts of cache lines (probably dominant factor), length of
instructions and possible overhead due to extra instructions.

There is extra complication when machine has limited range of offsets
which can be used in single intstruction: when needed offset exceeds
allowed range one has to change to indirect form which needs free
register. So there is extra interaction with registes allocation. This
is particularly nasty if one needs free register and wants to spill
some other register, but address constants exceed length allowed in
instruction, so in order to free register one already needs free
register to put there address constant.

i386 allows rather easy solution to such problems because one can
reach any locations from loads and jumps and most instructions accept
32-bit constant (immediate) arguments. On x86_64 (that is in 64 bit
mode) situation is more interesting because immediates and offsets are
limited to 32-bits, so one can no longer reach whole memory. But
current practice make it easy but wasetful: each part of program (main
executable and shared libraries) is limited to 2G so that 32-bit
offsets are enough and accesses to other parts are indirect. 32 bit
ARM has most of such problems: offsets are quite limited and mosts
constants need to be loaded from memory. ARM intructions are of fixed
length, but when offset is too big to fit into single instruction one
has to use alternative sequence of several instructions (and deal with
register allocation). Z architecture (modern versions of IBM 360) has
such problems too: there are variants of instruction having different
lengths but even longest variant have limited range of available
offsets. At least some versions of Z architecture had severe penalty
for simultaneusly accessing the same cache line for instruction fetch
and data access, so putting constant pools in separate cache line was
very important.

--
Waldek Hebisch

RE: Assembling span-dependent instructions

<22-07-056@comp.compilers>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=526&group=comp.compilers#526

copy link Newsgroups: comp.compilers

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: christop...@compiler-resources.com (Christopher F Clark)
Newsgroups: comp.compilers
Subject: RE: Assembling span-dependent instructions
Date: Thu, 28 Jul 2022 16:02:16 +0300
Organization: Compilers Central
Lines: 39
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-056@comp.compilers>
References: <22-07-049@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="56136"; mail-complaints-to="abuse@iecc.com"
Keywords: code, optimize
Posted-Date: 29 Jul 2022 16:39:52 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com

by: Christopher F Clark - Thu, 28 Jul 2022 13:02 UTC

Anton,

Curiously, I recently used that example (limited to jumps) to explain
something on Quora, how two different approaches can yield different
answers to the same question.

You can make both, the large-and-shrink, and small-and-grow approaches
converge (just ban doing the opposite, once shrunk an instruction cannot
regrow or once grown an instruction cannot reshrink). And under certain
assumptions, you can show that the small-and-grow result will always be
"more optimal" (smaller) than the large-and-shrink result. However, one of
those assumptions is that shrinking some instruction will not require a
different instruction to grow--the results must be monotonic. That is not
always true in real-life instruction sets.

More relevant is that the order or growing or shrinking instructions can
change which other instructions grow or shrink and that means picking which
one to grow or shrink is important and that contributes to the
NP-completeness of the problem. You potentially have to try all possible
orders of growing/shrinking to find the optimal set.

There are even other aspects that impact real-life implementations, such as
aligned instructions possibly executing faster, such that one isn't just
looking for the smallest size, but the smallest size that runs fastest.

Nice to see someone else remembering the same issues.

Kind regards,
Chris

--
******************************************************************************

Chris Clark email:
christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------

Re: Assembling span-dependent instructions

<22-07-058@comp.compilers>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=528&group=comp.compilers#528

copy link Newsgroups: comp.compilers

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah...@u.washington.edu (gah4)
Newsgroups: comp.compilers
Subject: Re: Assembling span-dependent instructions
Date: Fri, 29 Jul 2022 14:22:01 -0700 (PDT)
Organization: Compilers Central
Lines: 29
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-058@comp.compilers>
References: <22-07-049@comp.compilers> <22-07-052@comp.compilers> <22-07-055@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="55525"; mail-complaints-to="abuse@iecc.com"
Keywords: assembler, optimize
Posted-Date: 29 Jul 2022 17:50:26 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-07-055@comp.compilers>

by: gah4 - Fri, 29 Jul 2022 21:22 UTC

On Friday, July 29, 2022 at 1:38:25 PM UTC-7, anti...@math.uni.wroc.pl wrote:

(snip)

> Z architecture (modern versions of IBM 360) has
> such problems too: there are variants of instruction having different
> lengths but even longest variant have limited range of available
> offsets. At least some versions of Z architecture had severe penalty
> for simultaneusly accessing the same cache line for instruction fetch
> and data access, so putting constant pools in separate cache line was
> very important.

I presume this is true for any system with separate data/instuction
cache. It might be more of a problem for z/, with especially long
cache lines.

From the S/360 days, it was usual for data, even variable data, to be
close to code. That is, for non-reentrant programs. Most assembly
code and Fortran did that.

Otherwise, I believe the original question comes up on any machine
with variable sized branch instructions. Many of the stories I
remember are from the PDP-8.

A similar question comes up generating the Table of Contents
with LaTeX. When you run it, it generates the file used to make
the ToC next time. You run it again to generate the ToC, and
it will tell you if anything moved since the time before.
Hopefully it converges.

Re: Assembling span-dependent instructions

<22-07-060@comp.compilers>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=530&group=comp.compilers#530

copy link Newsgroups: comp.compilers comp.arch
Followup: comp.compilers

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: 480-992-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.compilers,comp.arch
Subject: Re: Assembling span-dependent instructions
Followup-To: comp.compilers
Date: Fri, 29 Jul 2022 22:36:03 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 53
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-060@comp.compilers>
References: <22-07-049@comp.compilers> <22-07-052@comp.compilers> <22-07-055@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="48733"; mail-complaints-to="abuse@iecc.com"
Keywords: code, optimize
Posted-Date: 29 Jul 2022 19:01:02 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com

by: Kaz Kylheku - Fri, 29 Jul 2022 22:36 UTC

["Followup-To:" header set to comp.compilers.]
On 2022-07-28, antispam@math.uni.wroc.pl <antispam@math.uni.wroc.pl> wrote:
> In comp.arch Kaz Kylheku <480-992-1380@kylheku.com> wrote:
>> On 2022-07-27, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>> > However, one can also construct cases where making the code larger can
>> > reduce the minimum size of the immediate operand, e.g.:
>> >
>> > foo:
>> > movl foo+133-bar(%rdi),%eax
>> > bar:
>>
>> That's weird; what is accessed this way, relative to the code,
>> and does it occur in compiler output?
>
> Code like this may appear due to alignment, say jump to page or cache
> line boundary. In realistic situation one is faced in much more
> compilex problem. Namely on many architectures best way to provide
> constant arguments is by storing constants in memory. This leads to
> "constant pools" and problem where to place them. One wants constant
> pools as close as possible to code, to use short offsets accessing
> them. But for performance reasons it is desirable to put constants in
> separate cache lines. Also, one needs jumps to jump around constant
> pools. Some jumps occur naturally in program, it is good to re-use
> them. But there are possible unused parts of cache lines (both for
> code and constant pools). So there is need to balance loss due to
> unused parts of cache lines (probably dominant factor), length of
> instructions and possible overhead due to extra instructions.

OK; so that makes sense. Say we have a program segment with
some PC relative references to a cache-aligned constant pool
which is farther down the program.

If this program segment shrinks, while the cache-aligned
constant pool stays where it is (to remain aligned),
then that pool will be farther away from some of those
instructions.

This could even happen due to smaller alignments like 8 bytes.
Some instruction loses 4 bytes; then another one experiences
the data being 4 bytes farther than before, which now
doesn't fit.

> There is extra complication when machine has limited range of offsets
> which can be used in single intstruction: when needed offset exceeds
> allowed range one has to change to indirect form which needs free
> register. So there is extra interaction with registes allocation.

On an architecture that is not so register starved, you just
dedicate a register for that. MIPS calls it "at" ("assembler temporary").

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

RE: Assembling span-dependent instructions

<22-07-061@comp.compilers>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=531&group=comp.compilers#531

copy link Newsgroups: comp.compilers

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: RE: Assembling span-dependent instructions
Date: Sat, 30 Jul 2022 09:28:06 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 55
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-061@comp.compilers>
References: <22-07-049@comp.compilers> <22-07-056@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="40817"; mail-complaints-to="abuse@iecc.com"
Keywords: code, optimize, comment
Posted-Date: 30 Jul 2022 14:09:36 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com

by: Anton Ertl - Sat, 30 Jul 2022 09:28 UTC

Christopher F Clark <christopher.f.clark@compiler-resources.com> writes:
>You can make both, the large-and-shrink, and small-and-grow approaches
>converge (just ban doing the opposite, once shrunk an instruction cannot
>regrow or once grown an instruction cannot reshrink).

My example

foo:
movl foo+133-bar(%rdi),%eax
bar:

is one where start-large-and-shrink fails, unless you can regrow: Once
you have shrunk the instruction, the resulting offset 130 does not fit
into the -128..127 range allowed for a byte offset, so the result
would be a failure. Szymanski's approach is to never shrink
instructions that might have this problem (and he has a way to
distinguish those from others that are guaranteed not to have this
problem, and that he allows to shrink).

By contrast, start-small-and-grow never fails and always converges,
without needing to identify problematic instructions.

>More relevant is that the order or growing or shrinking instructions can
>change which other instructions grow or shrink and that means picking which
>one to grow or shrink is important and that contributes to the
>NP-completeness of the problem. You potentially have to try all possible
>orders of growing/shrinking to find the optimal set.

It's not about order, it's about sets. If one wants to do better than
start-small-and-grow, one could take the set of n problematic
instructions and try out all 2^n assignments of small/large to this
set (the sizes of unproblematic instructions are then determined by
start-small-and-grow).

>There are even other aspects that impact real-life implementations, such as
>aligned instructions possibly executing faster, such that one isn't just
>looking for the smallest size, but the smallest size that runs fastest.

Alignment was not considered in Szymanski's paper (and probably not in
the assembler he worked on), so the problem can occur without .align
directives. Of course, with .align there are additional reasons for
the non-monotonic behaviour.

Concerning assemblers that select longer instructions for alignment
considerations, I have never heard of that, and I doubt that an
assembler does that, except for the filler instructions produced by
the .align directive when it occurs in code.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[Szymanski was working on a PDP-11 where all instructions are
an integral number or words so there's no alignment to do. -John]

RE: Assembling span-dependent instructions

<22-07-062@comp.compilers>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=532&group=comp.compilers#532

copy link Newsgroups: comp.compilers

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: christop...@compiler-resources.com (Christopher F Clark)
Newsgroups: comp.compilers
Subject: RE: Assembling span-dependent instructions
Date: Sun, 31 Jul 2022 02:28:46 +0300
Organization: Compilers Central
Lines: 46
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-062@comp.compilers>
References: <22-07-049@comp.compilers> <22-07-056@comp.compilers> <22-07-061@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="94800"; mail-complaints-to="abuse@iecc.com"
Keywords: assembler, optimize
Posted-Date: 30 Jul 2022 20:32:56 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com

by: Christopher F Clark - Sat, 30 Jul 2022 23:28 UTC

In the large-and-shrink model, you are not allowed to shrink an instruction
which causes the program to become incorrect, it's a test you make before
shrinking it, so you never shrink your example instruction and have to
regrow it. I suppose you could do it by shrink-test-abort
(regrow)/commit. However, that doesn't fix the problem where shrinking one
instruction depends upon shrinking another to result in a valid program
(and the instructions are mutually dependent that way).

And, that's one of the reasons why the two results converge to different
programs. There can be cliques of instructions that need to be all shrunk
before the program is legal. (Or in the more realistic case, graphs where
you have to color (small, large) just right to get the smallest legal
program, which is why it becomes NP-complete and reduces to a
satisfiability problem.)

If you disallow shrinking instructions which don't produce legal programs
(without any other changes, i.e. you cannot shrink your example
instruction), the large-and-shrink model converges. However, there are
combinations of shrunk instructions it never tries, because they are
mutually dependent.

And as far as I know, the first presentations of this idea only dealt with
shrinking jump instructions, where all distances were positive, so that you
could guarantee that the process was monotonic, i.e. all shrinking of
instructions made more shrinking possible. And that's where the proofs
that both algorithms could be shown to converge (but to different code
sequences) applies.

And the large-and-shrink algorithm was shown first, because if the
algorithm was halted mid-way through, it still resulted in a legal
program. The small-and-grow algorithm starts with an invalid program and
needs to proceed to reach a state where the program is legal. If you don't
let it converge, you have a broken instruction sequence.

Kind regards,
Chris

--
******************************************************************************

server_pubkey.txt

rocksolid light 0.9.8
clearnet tor