
Over the shoulder supervision is more a need of the manager than the programming task.

Bart <bc@freeuk.com> wrote:
> On 31/07/2021 20:18, antispam@math.uni.wroc.pl wrote:
> > Bart <bc@freeuk.com> wrote:
>
> > Reasonably modern
> > versions of gcc support the '-ffunction-sections' option, which means
> > that each function will be a separate "unit" for the linker (otherwise
> > the whole .o file would be a unit). You need to compile all files
> > going into the library using this option. When linking the program
> > you need to use the '--gc-sections' option. Both options together
> > work and may give the desired result.
>
> So presumably that wasn't done with that libraylib_static.a library that
> I used?

Possibly. As I wrote, there are traps and even if you use
proper compiler and linker options the result may be disappointing.
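
One way such a trap shows up even in plain C is a table of function pointers. In the sketch below all names are hypothetical (they are not from raylib or any real library). A caller that only ever wants the area still goes through square_ops, and the table references every method, so '--gc-sections' has to keep all of them, including the printf dependency of the describe method.

```c
#include <stdio.h>

/* Hypothetical "class": a table of methods, as OO code typically has. */
struct shape_ops {
    double (*area)(double side);
    double (*perimeter)(double side);
    void (*describe)(double side);
};

static double square_area(double side) { return side * side; }
static double square_perimeter(double side) { return 4.0 * side; }

/* Pulls in printf, even if the caller never asks for a description. */
static void square_describe(double side) { printf("square, side %g\n", side); }

/* The table references all three methods, so keeping the table
 * keeps every method reachable from the linker's point of view. */
const struct shape_ops square_ops = {
    square_area, square_perimeter, square_describe
};

/* A caller that needs only one method still goes through the table.
 * In a real library this caller would live in a different file. */
double use_only_area(double side) { return square_ops.area(side); }
```

Whether a given toolchain manages to drop anything here depends on optimization settings, so this is an illustration of the dependency chain rather than a measurement.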

> See, you don't have control over the libraries that other people distribute.
>
>
> > But there are traps. Probably the biggest trap is OO code. In such
> > code typically each object has a table of methods. If you use
> > an object, the linker normally will pull the table of methods and
> > consequently will pull _all_ methods for the given object, even if
> > you use only one (possibly trivial) method. Each method will pull
> > the objects that it uses. If you have an inheritance chain and use
> > the most specialized object, the linker will almost surely pull the
> > whole inheritance chain. So in an OO library with say 100 classes and
> > 5000 methods as few as 10 methods may easily pull the majority
> > of the library. Another trap is that a function that you need
> > may have extra functionality that you do not need but which
> > will pull other things as dependencies. A notable example is
> > that a function may detect errors. Even if your usage never leads
> > to errors such a function will pull error handling support (for
> > example 'printf' to print an error message). I have limited
> > knowledge about the Windows API, but it is reasonable to suspect
> > that using just a few functions could pull quite a lot
> > of code.
> >
> >> including options that will generate the position-independent code that
> >> would be necessary to extract and relocate individual functions. (How
> >> does it even work out how big a function is?)
> >
> > Here you completely messed up things: static linking does not
> > require position-independent code.
>
> Hmm, yes it does, at least on x64 architecture.
>
> A CALL to another function normally uses a relative offset. You can't
> just extract functions and move them about without invalidating those
> offsets. And doing that requires being able to precisely track what is
> code and what is data (even in executable memory).

Do you know what position-independent code means? And what static
linking means? With static linking relative positions of various
parts of executable are fixed at link time. Frequently also
absolute positions are fixed at link time, but for example DOS
exe format contained relocation info so absolute positions were
fixed only at load time. And yes, after static linking
you can not move code around, unless (like DOS) you keep
relocation tables. During the link you use relocation tables, so
the linker moves sections around and adjusts addresses using
relocation tables.
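
As a rough illustration of what "adjusts addresses using relocation tables" means, here is a toy model. The struct reloc layout and the apply_relocs function are invented for this sketch and do not match any real object format: each entry just names a word in the section image that holds a section-relative address, and the "linker" adds the base address it chose for the section.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation entry: index of a word in the image
 * that holds an address relative to the start of the section. */
struct reloc {
    size_t word_index;
};

/* Toy "link" step: once the section's final base address is known,
 * patch every word listed in the relocation table. Words that are
 * not listed (plain code or data) are left untouched. */
void apply_relocs(uint32_t *image, const struct reloc *relocs,
                  size_t nrelocs, uint32_t base)
{
    for (size_t i = 0; i < nrelocs; i++)
        image[relocs[i].word_index] += base;
}
```

After this step the image contains final addresses, which is why it can not be moved again unless the table is kept around, as in the DOS exe case.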

FYI, position-independent code is different: after link you
get a blob which uses only relative addresses inside the blob.
If for some reason relative addressing is inadequate (say
the machine does not support a PC-relative addressing mode),
then position-independent code uses an extra register containing
its location in memory. So for internal references you can
put the blob anywhere in address space and it will work the
same. There is trouble with external references. External
references are all replaced by indirect references via a
table. This table is filled by the dynamic linker, based on
_dynamic_ relocation tables. The table of external references
is usually much smaller than the code, so the gain with
position-independent code is that you can keep
one copy of the executable part in memory and use it for
many programs (variables and the table of external references
are separate in each program). You need position-independent
code when you want shared libraries or dynamic linking
without unpleasant restrictions. The main point here is that
for sane sharing you do not want to touch the executable
part of the library, so you can not do relocations on load.
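
The indirection through a table of external references can be imitated in ordinary C. This is only an analogy with made-up names (extern_table, fill_table, shared_length); a real toolchain generates the table and the indirect calls itself. The point it tries to show is that the "library" code never contains the address of the external function, only a slot lookup, so the code stays identical for every program while each program gets its own filled table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the table of external references. */
struct extern_table {
    size_t (*strlen_slot)(const char *);
};

/* Stand-in for the dynamic linker: fill the slots at "load time".
 * Each program would have its own copy of the table. */
void fill_table(struct extern_table *t)
{
    t->strlen_slot = strlen;
}

/* Stand-in for shared library code: the external function is reached
 * only through the table, so this code needs no patching on load. */
size_t shared_length(const struct extern_table *t, const char *s)
{
    return t->strlen_slot(s);
}
```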

With static linking addresses are fixed, either after
link proper or after applying relocations (which
modifies code). Since addresses are fixed there
is no need for position-independent code.

> Another factor is being able to extract global variables, which are
> usually accessed by absolute addresses. Put them somewhere else, and now
> you /need/ relocation data to fix up all the references.

With static linking addresses are known when the whole linking process
is finished (logically on DOS the linking process is finished only
after load). In particular, after the linking process you can no longer
"Put them somewhere else". Relative addresses of global variables
can not change after link proper; on Linux absolute addresses
can not change after link either.

> (DLL files will have that info, but are intended for moving the entirely
> library, not individual blocks of functions or data.)

Yes, Windows i386 DLLs used static linking with relocation info
for adjusting the start address of the library at load time. And relocation
was applied to the whole library. This is very unlike what ELF
on Linux is doing: Linux shared libraries need position-independent
code, unlike i386 Windows DLLs. IIUC 64-bit Windows DLLs also
work like Linux ones and need position-independent code.
>
> > In fact, one motivation
> > for static linking is that position-dependent code is a few
> > percent faster, so with static linking you can squeeze out some
> > extra performance when it matters.
> >
> > Concerning function size and relocations: that is contained
> > in the .o file (appropriate tables are built by the assembler).
> > I find it curious that somebody boasts about writing an
> > assembler, but apparently does not know what should go
> > into .o file...
>
> This is nonsense, at least for COFF files. Tell me whereabouts function
> sizes are stored in that file format. It can only infer the sizes by
> analysing the start addresses of all the functions, and then it might be
> stuck on the last one, or where data is intermingled.

I do not understand your fixation on function size. The linker works
with sections and relocates sections as a whole. If you want
functions, then use '-ffunction-sections'; then section size
will be the same as function size. Concerning COFF, I do not
know if it supports multiple sections in a single .o (or .obj if
you prefer) file. If not, then we are back to splitting files
so that a single .o file contains exactly one function. I would
hope that Windows COFF (there are a lot of variants of COFF,
but you probably only care about Windows) has adapted to modern
trends, but I had no reason to check this.

The bottom line is that the linker works on sections, which in primitive
systems may be limited to a whole .o file. Each section contains
a table of exported addresses, a table of imported ones and a table of
needed relocations. The linker moves _sections_. That is why without
'-ffunction-sections' (explicit or possibly as the default for some
target) the linker can not eliminate part of a section: this is an
all-or-nothing business.
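
The all-or-nothing behaviour follows from how unused sections get discarded: the decision is made per section, by reachability. Below is a sketch of that idea with a made-up data structure (struct section, mark); it is not how any particular linker is implemented. Starting from the section holding the entry point, everything referenced is kept, and whatever is never reached can be dropped as a whole.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 4

/* Hypothetical linker-side view of one input section. */
struct section {
    const char *name;
    size_t nrefs;            /* number of valid entries in refs[] */
    size_t refs[MAX_REFS];   /* indices of sections this one references */
    bool keep;               /* set once the section is known to be needed */
};

/* Mark section idx and, recursively, every section it references.
 * Sections still unmarked afterwards are candidates for removal. */
void mark(struct section *secs, size_t idx)
{
    if (secs[idx].keep)
        return;              /* already visited, also stops on cycles */
    secs[idx].keep = true;
    for (size_t i = 0; i < secs[idx].nrefs; i++)
        mark(secs, secs[idx].refs[i]);
}
```

If all functions of a .o file share one .text section, that section is a single node in this graph, so one used function keeps the rest alive too.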

> That is too
> haphazard an approach to use reliably.

Well, for reliable results one does not use random binary libraries
from the Internet. One either fetches sources and compiles them
with the needed options or at least inspects binaries (objdump -h
will tell you which sections are in a library). For example,
when I created a file containing 3 functions and compiled it
using '-ffunction-sections' I got from objdump:

.....
  3 .text.f_id    00000003  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  4 .text.f_a1    00000003  0000000000000000  0000000000000000  00000043  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  5 .text.f_a2    00000003  0000000000000000  0000000000000000  00000046  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
.....

so there are 3 sections, 1 for each function. When I omit
'-ffunction-sections' I get

  0 .text         00000009  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
.....

that is, a single code section.
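
For reference, a file of the kind described above might look like the sketch below. The names f_id, f_a1 and f_a2 come from the section names in the objdump output; the function bodies and the file name sections.c are guesses, since the original file is not shown.

```c
/* sections.c: hypothetical reconstruction. Only the function names
 * are taken from the objdump listing; the bodies are assumptions.
 *
 * Compile and inspect (GNU toolchain assumed):
 *   gcc -O2 -ffunction-sections -c sections.c
 *   objdump -h sections.o
 *
 * When linking a program, unused sections can then be discarded by
 * passing --gc-sections to the linker, e.g. via gcc -Wl,--gc-sections.
 */
int f_id(int x) { return x; }
int f_a1(int x) { return x + 1; }
int f_a2(int x) { return x + 2; }
```

Section sizes depend on the compiler, target and optimization level, so they need not match the 3-byte sizes in the listing.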

> David Brown creates applications for small devices so he has to pull out
> all the stops, use all possible options to keep things compact.
>
> But my desktop PC already has nearly 40,000 DLL files, totalling 25GB,
> belong to the system or to installed programs.
>
> If static linking of third party libraries was such a great idea, then I
> would hardly ever see DLL files!

I do not advocate using static linking, but in some (rare) situations
it can be helpful. Certainly it is not the normal approach on a desktop system.

>
> Archive files have lots of problems:
>
> * Each library must issue them for multiple languages/compilers/formats
> (raylib provides seprate versions for tcc and gcc, plus .dll that works
> with anything)

That is an independent issue having little to do with static versus
dynamic. If you "standardize" on "one true toolchain" then you
can use a single format. OTOH, AFAIK you can install alternative
DLL loaders in Windows, so if you wish you can have as many
DLL formats as you wish.

> * Each language implementation must come with 1000s of archive files for
> misc libraries

Convince your vendors to use single standard format. Drop vendors
that do not agree.

> * Any updates (breaking changes is a separate topic) means generating
> multiple new versions, and updating every language installation that
> bundles .a files

Any change requires proper management. Again, choose your vendors...

> * Whether a smaller executable is actually achieved depends on how the
> library vendor built the library

I hope that we agree that a small static executable is not
_excluded_, but size considerations usually favor dynamic linking.

> * If most of a library is used anyway, then multiple applications using
> the same library will duplicate it on disk and in memory

Ditto.

> DLLs solve a lot of those:
>
> * The vendor provides one file that works with any compiler and any language

If that is true, then fine. However, what about situations where
"the same" library (but in subtly different versions) is bundled
with several packages? Or where the lovely Windows Update decided to
install a "new improved" version of some library?

> * No need to bundle with every compiler

But if you distribute binary you need to bundle non-standard
libraries.

> * Only one copy of the library on any one machine

Yes.

> * Updates can benefit applications without recompiling or reinstalling

Well, experience seems to indicate otherwise: "fixed" versions of
library functions may cause breakage. Better shared library
systems make sure that after an incompatible change a program still
uses the old version of a function.

--
Waldek Hebisch

Subject	Replies	Author
Why does C allow structs to have a tag? By: James Harris on Sun, 6 Jun 2021	391	James Harris


devel / comp.lang.c / Re: Why does C allow structs to have a tag?