Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bencoll...@tilde.pink (Ben Collver)
Newsgroups: comp.os.msdos.djgpp
Subject: Running GNU on DOS with DJGPP
Date: Sun, 18 Feb 2024 17:20:23 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 734
Message-ID: <slrnut4eot.3kd.bencollver@svadhyaya.localdomain>
Injection-Date: Sun, 18 Feb 2024 17:20:23 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="ff8fa32ef33e13080dca2ac2fc4b99b0";
logging-data="1443304"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18UsIXUw8316+HfYmIPS7Si7F66OyKkh2k="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:7W1mF5O5CNAEO+j5VytHc/OzoNU=

# Running GNU on DOS with DJGPP

Peeking under the covers to see how DJGPP manages to run GCC on DOS

by Julio Merino
Feb 14, 2024

The recent deep dive into the IDEs of the DOS times 30 years ago made
me reminisce about DJGPP, a distribution of the GNU development tools
for DOS.

[Cover image consisting of a tiny portion of DJGPP's dosexec.c
source file, with a big MS-DOS logo in the center surrounded by the
logos of GNU, GCC, Bash, and Emacs.]

I remember using DJGPP back in the 1990s before I had been exposed to
Linux and feeling that it was a strange beast. Compared to the
Microsoft C Compiler and Turbo C++, the tooling was bloated and alien
to DOS, and the resulting binaries were huge. But DJGPP provided a
complete development environment for free, which I got from a monthly
magazine, and I could even look at its source code if I wished. You
can't imagine what a big deal that was at the time.

But even if I could look under the cover, I never did. I never really
understood why DJGPP was so strange, slow, and huge, or why it even
existed. Until now. As I'm in the mood for looking back, I've spent
the last couple of months figuring out what the foundations of this
software were and how it actually worked. Part of this research has
resulted in the previous two posts on DOS memory management. And part
of this research is this article. Let's take a look!

Special thanks go to DJ Delorie himself for reviewing a draft of this
article. Make sure to visit his website for DJGPP and a lot more cool
stuff!

<https://delorie.com/>

# What is DJGPP?

Simply put, DJGPP is a port of the GNU development tools to DOS. You
would think that this was an easy feat to achieve given that other
compilers did exist for DOS. However... you should know that Richard
Stallman (RMS)--the creator of GNU and GCC--thought that GCC, a
32-bit compiler, was too big to run on a 16-bit operating system
restricted to 1 MB of memory. DJ Delorie took this as a challenge in
1989 and, with all the contortions that we shall see below, made GCC
and other tools like GDB and Emacs work on DOS.

To a DOS and Windows user, DJGPP was, and still is, an alien
development environment: the tools' behavior is strange compared to
other DOS compilers, and that's primarily due to their Unix heritage.
For example, as soon as you start using DJGPP, you realize that flags
are prefixed by a dash instead of a slash, paths use forward slashes
instead of backward slashes, and the files don't ship in a flat
directory structure like most other programs did. But hey, all the
tools worked and, best of all, they were free!

In fact, from reading about the historical goals of the project, I
gather that a secondary goal was for DJ to evangelize free software
to as many people as possible, meeting them where they already were:
PC users with a not-very-powerful machine that ran DOS. Mind you,
this plan worked on some of us as we ended up moving to Linux and the
free software movement later on.

<https://www.delorie.com/djgpp/doc/eli-m17n99.html#Introduction>

In any case, being a free alien development environment doesn't
explain why it had to be huge and slow compared to other compilers.
To explain this, we need to look at the "32-bit compiler" part.

# DOS and hardware constraints

As we saw in a previous article, Intel PCs based on the 80386 have
two main modes of operation: real mode and protected mode. In real
mode, the processor behaves like a fast 16-bit 8086, limiting
programs to a 1 MB address space but giving them free rein to access
memory and hardware peripherals. In protected mode, programs are
32-bit, have access to a 4 GB address space, and are subject to
protection rules when accessing memory and hardware.

<https://blogsystem5.substack.com/p/from-0-to-1-mb-in-dos>
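
To make the 1 MB limit concrete, here is a minimal sketch of how real
mode turns a segment:offset pair into a physical address. The function
name real_mode_linear is made up for illustration; the arithmetic is
the standard 8086 rule of shifting the segment left by four bits and
adding the offset.

```c
#include <stdint.h>

/* Real-mode address translation: the 16-bit segment is multiplied by
 * 16 (shifted left by four bits) and the 16-bit offset is added.
 * The name of this helper is hypothetical. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;
}
```

For example, real_mode_linear(0xB800, 0x0000) is 0xB8000, and the
largest possible pair, FFFF:FFFF, is 0x10FFEF, which is just under
64 KB past the 1 MB mark. Nothing beyond that can be named by a
real-mode address, no matter how much RAM is installed.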

DOS was a 16-bit operating system that ran in real mode. Applications
that ran on DOS leveraged DOS' services for things like disk access,
were limited to addressing 1 MB of memory, and had complete control
of the computer. Contrary to that, GCC was a 32-bit program that had
been designed to run on Unix (oops sorry, GNU is Not Unix) and
produce binaries for Unix, and Unix required virtual memory from the
ground up to support multiprocessing. (I know that's not totally
accurate but it's easier to think about it that way.)

<https://www.gnu.org/gnu/about-gnu.html>

<https://unix.stackexchange.com/questions/332699/how-the-original-unix-kernel-adressed-memory>

Intel-native compilers for DOS, such as the Microsoft C compiler and
Turbo C++, targeted the 8086's weird segmented architecture and
generated code accordingly. Those compilers had to deal with short,
near, and far jumps--which is to say I have extra research to do and
another article to write on ancient DOS memory models. GCC, on the other
hand, assumes the full address space is available to programs and
generates code making such assumptions.
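
As a rough illustration of why segmented code generation differs from
GCC's flat model, the sketch below models near and far addresses with
plain integers. It is not taken from any of those compilers; the
far_ptr struct and the helper names are hypothetical, and the 20-bit
mask assumes an original 8086 with 20 address lines.

```c
#include <stdbool.h>
#include <stdint.h>

/* A far address as a 16-bit compiler sees it: segment plus offset. */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Physical address for a far pointer, assuming 20 address lines. */
uint32_t far_to_linear(struct far_ptr p) {
    return (((uint32_t)p.segment << 4) + p.offset) & 0xFFFFFu;
}

/* Two far pointers can differ bit for bit and still name the same byte. */
bool far_same_byte(struct far_ptr a, struct far_ptr b) {
    return far_to_linear(a) == far_to_linear(b);
}

/* A near address is only the offset, so stepping past 0xFFFF wraps
 * around inside the current 64 KB segment. */
uint16_t near_advance(uint16_t offset, uint16_t bytes) {
    return (uint16_t)(offset + bytes);
}
```

Here 1234:0010 and 1235:0000 both resolve to 0x12350, and advancing a
near offset of 0xFFFF by one byte lands on 0x0000 rather than on the
next byte in memory. A flat 32-bit pointer has neither quirk, which is
the assumption GCC's generated code relies on.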

GCC was not only a 32-bit program, though: it was also big. In order
to compile itself and other programs, GCC needed more physical memory
than PCs had back then. This means that, in order to port GCC to DOS,
GCC needed virtual memory. In turn, this means that GCC had to run in
protected mode. Yet... DOS is a real mode operating system, and
calling into DOS services to access files and the like requires the
processor to be in real mode.

To address this conundrum, DJ had to find a way to make GCC and the
programs it compiles integrate with DOS. After all, if you have a C
program that opens a file and you compile said program with GCC, you
want the program to open the file via the DOS file system for
interoperability reasons.

Here, witness this. The following silly program, headself.c, goes out
of its way to allocate a buffer above the 2 MB mark and then uses
said buffer to read itself into it, printing the very first line of
its source code:

#include <fcntl.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFMINBASE (2 * 1024 * 1024)
#define BUFSIZE (1 * 1024 * 1024)

int main(void) {
    // Allocate a buffer until its base address is past the 2MB boundary.
    // Earlier allocations are leaked on purpose.
    char* buf = NULL;
    while (buf < (char*)(BUFMINBASE))
        buf = (char*)malloc(BUFSIZE);
    printf("Read buffer base is at %ld KB\n", (long)((intptr_t)buf / 1024));

    // Open this source file and print its first line. Really unsafe.
    int fd = open("headself.c", O_RDONLY);
    read(fd, buf, BUFSIZE);
    char *ptr = buf; while (*ptr != '\n') ptr++; *(ptr + 1) = '\0';
    printf("%s", buf);

    return EXIT_SUCCESS;
}

Yes, yes, I know the above code is really unsafe and lacks error
handling throughout. But that's not important here. Watch what
happens when we compile and run this program with DJGPP on DOS:

D:\>head -n1 headself.c
#include <fcntl.h>

D:\>gcc -o headself.exe headself.c

D:\>.\headself.exe
Read buffer base is at 2673 KB
#include <fcntl.h>

D:\>_

Note two things. The first is that the program has to have run in
protected mode because it successfully allocated a buffer above the
1 MB mark and used it without extraneous API calls. The second is
that the program is invoking file operations, and those operations
interact with files managed by DOS.

And here is where the really cool stuff begins. On the one hand, we
have DOS as a real mode operating system. On the other hand, we have
programs that want to interoperate with DOS but that also want to
take advantage of protected mode to leverage the larger address space
and virtual memory. Unfortunately, code running in protected mode
cannot call DOS services directly because those require real mode.

The accepted solution to this issue is the use of a DOS Extender, as
we already saw in the previous article, but such technology was in
its infancy at the time. DJ actually went through three different
iterations to fully resolve this problem in DJGPP:

<https://blogsystem5.substack.com/p/beyond-the-1-mb-barrier-in-dos>

1. The first prototype used Phar Lap's DOS Extender but it didn't get
very far because it didn't support virtual memory.

2. Then, the first real version of DJGPP used DJ's own DOS Extender
called go32, a big hack that I'm not going to talk about here.

3. And then, the second major version of DJGPP--almost a full rewrite
of the first one--switched to using the DOS Protected Mode
Interface (DPMI).

At this point, DJGPP was able to run inside existing DPMI hosts such
as Windows or the many memory managers that already existed for DOS
and it didn't have to carry the hacks that previously existed in go32
(although the go32 code went on to live inside CWSDPMI). The
remainder of this article only talks about the last of these
versions.
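
To give a feel for what going through DPMI looks like from a program's
point of view, here is a small sketch that asks DOS for its version
(INT 21h, AH=30h) from protected mode. It assumes DJGPP's <dpmi.h>
and its __dpmi_int() call, which asks the DPMI host to run a real-mode
interrupt on the program's behalf. The dos_version wrapper is a
made-up name, and the non-DJGPP branch is only a stub so that the file
still compiles elsewhere.

```c
#include <string.h>

#ifdef __DJGPP__
#include <dpmi.h>   /* __dpmi_regs, __dpmi_int */
#endif

/* Hypothetical helper: returns 0 and fills major/minor on success,
 * or -1 on failure (always -1 when not built with DJGPP). */
int dos_version(int *major, int *minor) {
#ifdef __DJGPP__
    __dpmi_regs regs;
    memset(&regs, 0, sizeof regs);
    regs.h.ah = 0x30;                  /* DOS function: get version */
    if (__dpmi_int(0x21, &regs) != 0)  /* real-mode INT 21h via DPMI */
        return -1;
    *major = regs.h.al;
    *minor = regs.h.ah;
    return 0;
#else
    (void)major;
    (void)minor;
    return -1;                         /* no DPMI host to talk to */
#endif
}
```

The point of the sketch is the shape of the call: the program never
leaves protected mode itself. It fills in a register structure, hands
it to the DPMI host, and reads the results back once the real-mode
interrupt has run.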

# Large buffers

One thing you may have noticed in the code of the headself.c example
above is that I'm using a buffer for the file read that's 1 MB long.
That's intentional: for such a large buffer to even exist (no
matter our attempts to push it above 2 MB), the buffer must be
allocated in extended memory. But if it is allocated in extended
memory, how can the file read operations that we send to DOS actually
address such memory? After all, even if we used unreal mode, the DOS
APIs wouldn't understand it.
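
One general technique for this kind of mismatch is a bounce buffer: a
small buffer placed where the limited API can reach it, through which
the large transfer is done in chunks. The sketch below is a portable
simulation of that idea and not DJGPP's actual code. fake_file,
low_memory_read, and read_via_bounce are hypothetical names, and the
"file" is just an array in memory.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a file served by an API that can only fill a buffer
 * in low memory. All names here are hypothetical. */
struct fake_file {
    const unsigned char *data;
    size_t size;
    size_t pos;
};

/* Simulated limited read: copies at most len bytes into low_buf. */
size_t low_memory_read(struct fake_file *f, unsigned char *low_buf,
                       size_t len) {
    size_t left = f->size - f->pos;
    size_t n = len < left ? len : left;
    memcpy(low_buf, f->data + f->pos, n);
    f->pos += n;
    return n;
}

/* Fill a large destination by reading one chunk at a time into the
 * small bounce buffer and copying each chunk to its final place. */
size_t read_via_bounce(struct fake_file *f, unsigned char *dest,
                       size_t want, unsigned char *bounce,
                       size_t bounce_size) {
    size_t total = 0;
    while (total < want) {
        size_t chunk = want - total;
        if (chunk > bounce_size)
            chunk = bounce_size;
        size_t got = low_memory_read(f, bounce, chunk);
        if (got == 0)
            break;                     /* end of file */
        memcpy(dest + total, bounce, got);
        total += got;
    }
    return total;
}
```

With a 4 KB bounce buffer, a 1 MB read turns into 256 small reads
followed by 256 copies. The caller still sees a single call that fills
its big buffer, at the cost of touching every byte twice.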

