comp.lang.c / Re: 32-bit pointers, 64-bit longs

On 18/05/2021 11:43, Malcolm McLean wrote:
> On Tuesday, 18 May 2021 at 09:14:37 UTC+1, muta...@gmail.com wrote:
>>
>>> There are "just a number" uses - a counter, an index, etc. The type for
>>> this needs to be big enough that it is not going to overflow - the
>>> programmer can treat it as though it were an unlimited size mathematical
>>> integer. These days, that really means 64-bit - or an unlimited integer
>>> type that grows as needed. The type name here should reflect that -
>>> "number" would be good. In C, this is "int".
>> But int is almost always 32-bits "these days". How do you
>> reconcile that?
>>
> You only rarely have 2 billion data points. A 1024 * 1024 image is quite large,
> for example, but it's only a million pixels.

Loads of everyday figures can exceed two billion:

* The size of a file (eg. a video file) expressed as bytes

* The capacity of a disk

* The amount of memory in a machine as bytes

* The world population

* How many seconds someone has been alive

* The number of views of a youtube video (and the total number of videos)

* The number of grains of rice on a square you will have when you're
only halfway along that chessboard

Now take one of these figures, and try and use it in a calculation.

i64 or u64 will handle all of these with ease (except for the final
square on the chessboard).
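A minimal sketch of two of the figures from the list above, assuming C99 <stdint.h>. An 80-year lifetime is about 2.52e9 seconds, already past INT32_MAX. Square 32 of the chessboard holds 2^31 grains, one more than INT32_MAX. The final square holds 2^63 grains, which is one past INT64_MAX but still fits in a uint64_t. The function names are hypothetical.

```c
#include <stdint.h>

/* Seconds in a lifetime of `years` years, ignoring leap days.
   Done in int64_t because 80 years (2,522,880,000 s) exceeds INT32_MAX. */
int64_t seconds_alive(int64_t years)
{
    return years * 365 * 24 * 60 * 60;
}

/* Grains on chessboard square n (1..64): 2^(n-1).
   Square 64 holds 2^63: representable in uint64_t, not in int64_t. */
uint64_t grains_on_square(unsigned n)
{
    return (uint64_t)1 << (n - 1);
}
```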

Weren't you advocating for 64-bit ints everywhere a few years ago? (And
didn't we have the same discussion even more recently!)

> Whilst modern CPU operations with 64 bits will be as fast as 32 bit
> operations, you can store more 32 bit integers in the cache, and it's
> cache misses which are the main determiner of performance.
>

That's a decision you make when deciding the size of array and struct
elements. If i32, u32 or smaller is sufficient, then use that. But
calculations on those can be done on widened 64-bit versions in the
machine's registers.
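A small sketch of "store narrow, calculate wide", assuming C99 <stdint.h>. The array elements stay uint32_t, so more of them fit in each cache line. The running total is kept in a uint64_t, though, so it does not wrap at 2^32. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum compactly stored 32-bit values into a 64-bit accumulator. */
uint64_t sum_u32(const uint32_t *values, size_t count)
{
    uint64_t total = 0;
    size_t i;
    for (i = 0; i < count; i++)
        total += values[i];   /* each element is widened to 64 bits here */
    return total;
}
```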

On 18/05/2021 12:23, Bart wrote:
> On 18/05/2021 09:14, muta...@gmail.com wrote:
>
>> But int is almost always 32-bits "these days". How do you
>> reconcile that?
>
> Because most existing C programs would likely break if 'int' was
> suddenly 64 bits.

Would they? That is not my impression at all. Some would have trouble
if it was changed to 16-bit on their systems, because people assume
"int" is big enough to support numbers larger than 32767. But it will
rarely be a problem that "int" can store larger numbers.

>
>
>>> At no point do I see any purpose in a type whose name says "this is a
>>> bit shorter than general-purpose numbers" or "this is a bit longer than
>>> general-purpose numbers", /especially/ when these descriptions might not
>>> actually be accurate.
>>
>> That certainly makes sense. I have almost never
>> coded "short".
>>
>> Although I would say that "long" would mean "give
>> me whatever you're capable of, even if it is slow".
>>
>> I don't think you should cheat by assuming only
>> 64-bit processors.
>
> Why is it any more cheating than assuming 32-bit processors over 16-bits
> or 8-bits?

It isn't cheating, nor is it assuming anything - except perhaps that
integers of the size you pick for a general "number" are reasonably
efficient.

>
> What is magic about 80386?

Good question!

>
>> So on a 16-bit processor, don't
>> you want to distinguish between fast 16-bit
>> integers and slow 32-bit file offsets?
>>
>> What else are you going to call a file offset? You
>> think off_t is the appropriate thing to use and
>> fseek() should have used that from day 1?
>
> These are the fstat functions that Microsoft ended up with for
> information about a file's size and creation time:
>
> fstat, fstat32, fstat64, fstati64, fstat32i64, fstat64i32
>
> Six functions, but there are two variations of each of fstat and fstat32
> depending on some global macro that determined whether the time was a
> 32- or 64-bit type.
>
> So in the end, there are 8 combinations, but the end result is that time
> might be 32 or 64 bits, and file size might be 32 or 64 bits. (There are
> further complications, as I think some used a struct of two ints to
> represent 64 bits, others used an actual 64-bit int.)
>
> This is the mess that MS got into because of the transition from
> 16/32-bit systems to full 64 bits, which didn't quite coincide with
> the development of large file systems.
>

These problems can be made less bad (though not eliminated) by /not/
giving exact type sizes when possible. You do need known sizes at
interfaces to libraries or kernel calls, but when you can abstract them
with an unsized name, it can help. Application code can use an "fstat"
function and "stat_t" type (or whatever name you prefer), where the
types are defined in headers for a static library. The implementation
in the static library is then responsible for converting sizes and
calling the exact function in the system library or kernel, according to
the sizes supported. Someone still needs to do the work on that static
library, but at least most users are isolated from the issue.
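One way the wrapper idea above might look, as a sketch only. It assumes a POSIX system where stat() and struct stat come from <sys/stat.h>, which is not part of ISO C, and it assumes C99 for long long. The type app_file_info and the function app_get_file_info are hypothetical names. Application code sees one wide type, and only the wrapper knows how wide off_t and time_t really are.

```c
#include <sys/stat.h>   /* POSIX stat(); not part of ISO C */

/* Hypothetical application-facing type: always wide enough,
   whatever widths the underlying system uses. */
typedef struct {
    long long size;    /* file size in bytes */
    long long mtime;   /* last modification time, seconds since the epoch */
} app_file_info;

/* Hypothetical wrapper: the only code that touches the system's struct stat.
   Returns 0 on success, -1 on failure. */
int app_get_file_info(const char *path, app_file_info *info)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    info->size  = (long long)st.st_size;   /* off_t may be 32 or 64 bits */
    info->mtime = (long long)st.st_mtime;  /* time_t may be 32 or 64 bits */
    return 0;
}
```

If the system library later grows a stat64-style variant, only the body of app_get_file_info would need to change.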

> You've decided to go back in time just so we can have all the same
> problems again!
>
> For a 'time', just use 64 bits; for a file offset, just use 64 bits. You
> might need to decide whether to use i64 or u64, but the range of i64 is
> so wide, it doesn't really matter. i64 has equivalent positive range to
> a u63 type.

These are likely to be fine. Having gone through stages of "16-bit is
good enough", "32-bit will be big enough for anyone", we really are at
the stage of integer sizes that /are/ going to be big enough for some
use-cases. 64-bit signed integer timestamps in microsecond resolution
will do for up to almost 300,000 years - file times beyond that are an SEP.
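As a quick check of the figure above: INT64_MAX is about 9.22e18, and a year of 365.25 days is about 3.16e13 microseconds. A signed 64-bit microsecond count therefore spans roughly 292,000 years on each side of its epoch. A sketch, assuming C99 <stdint.h>:

```c
#include <stdint.h>

/* Years covered by a signed 64-bit microsecond count, on each side of
   its epoch, using an average year of 365.25 days. */
double int64_microsecond_range_years(void)
{
    const double us_per_year = 365.25 * 24.0 * 60.0 * 60.0 * 1e6;
    return (double)INT64_MAX / us_per_year;
}
```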

>
> Nobody who's ever going to use your system is going to thank you for
> reinventing that same mess.
>
>
> https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fstat-fstat32-fstat64-fstati64-fstat32i64-fstat64i32?view=msvc-160
>
>
> DB:
>>> In my own C (and C++) code, I use size-specific types (int32_t,
>>> uint16_t, etc.), some purpose-specific types (such as uintptr_t,
>>> size_t), and "int" for general numbers. I use "char" strictly for
>>> letters, string data and other character data, and of course "bool" for
>>> true/false flags. To me, "long" and "short" are as outdated as
>>> non-prototype function declarations.
>>
>> I'm wondering whether int32_t is the thing that has
>> any meaning.
>>
>> Is there really something magical about 32 bits?
>
> In the computer world, then yes, since practically every machine is
> based on power-of-two word sizes.
>
>

David Brown <david.brown@hesbynett.no> writes:

> We have a far greater range of languages now, with a wide selection of
> balances between features, simplicity, run-time efficiency, developer
> efficiency, ease-of-use, safety, etc. Some are more minimal, with
> perhaps just a single integer type at 64-bit, others have a selection of
> different sizes for different purposes.
>
> IME and IMHO, I would say there are three basic kinds of integers,
> depending on their usage.
>
> There are low-level uses - required for interaction with hardware,
> connection to data outside the program (file formats, network protocols,
> foreign-function interfaces, etc.), or for when you want accurate
> control such as for getting maximal efficiency from large data
> structures. In these cases, you want size-specific types. Whether you
> call them int32_t, i32, Int<32>, etc., is a matter of taste. But they
> should be size-specific and explicit.

I don't see why you need size-specific integers for that. To avoid a
lot of pain, you need an integer type that is big enough for the values
you might have to do arithmetic on, but I don't see the need for
explicit sized integer types.

--
Ben.

On Tuesday, May 18, 2021 at 8:23:17 PM UTC+10, Bart wrote:

> > I don't think you should cheat by assuming only
> > 64-bit processors.

> Why is it any more cheating than assuming 32-bit processors over 16-bits
> or 8-bits?

I'm not. I want to cover them all, or understand why
they can't be covered.

> What is magic about 80386?

I wouldn't say it is magic. It's just the point at which
I paused. I do have an expectation that if someone
is willing to write 400,000 lines of C code to produce
a 3 MB executable, we do need processors that can
execute that. I don't want to hold monkeys back.

But expecting hardware engineers to provide 4 GiB
of memory is ridiculous. The fact that the hardware
folk are busy performing miracles shouldn't mean
that monkeys start running processes that use
300 MB (I keep clicking on Google Chrome
processes using more than that and choosing "End
task" to try to bring some sanity to my Windows
system, which is still slow in 2021 despite the hardware miracles).

But 32-bit is the transition point. I've had the nightmare
of being able to type (as a monkey) more data than a
*computer* (XT) can handle, and I can also see the
theoretical possibility a system will exceed 4 GiB of
memory and then what are you going to do?

I used to check memory prices to see if I could
personally afford 4 GiB of memory to test the limits
of PDOS/386. I can remember when it was the cost
of a car, but motherboards couldn't cope with that.

> This is the mess that MS got into because of the transition from
> 16/32-bit systems to full 64 bits, which didn't quite coincide with
> the development of large file systems.
>
> You've decided to go back in time just so we can have all the same
> problems again!

Well, my question is more - with foresight, could these
problems have been avoided? Did Microsoft really just
blindly thrash around?

Is it technically impossible to do it cleanly? What is the
technical barrier?

I keep coming back to C90. I don't see a problem. It
covers everything. It always has.

You mentioned the mess of fstat() - that's beyond what
Ritchie and ISO C90 provided. Maybe that's the root
problem. I normally don't go beyond C90. I'll bend
everything else, but not C90.

Did Ritchie need to have a talk to Gates and say "don't
do it son"?

> For a 'time', just use 64 bits; for a file offset, just use 64 bits. You
> might need to decide whether to use i64 or u64, but the range of i64 is
> so wide, it doesn't really matter. i64 has equivalent positive range to
> a u63 type.
>
> Nobody who's ever going to use your system is going to thank you for
> reinventing that same mess.

My OS currently produces a warning if you use any DLL
other than msvcrt.dll. And my msvcrt.dll only provides
C90.

It's true that you can ignore that warning, and then you
have something resembling MSDOS functionality. So
you mentioned fstat(). I provide something similar.

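/* assumed: a DOS-style <dos.h> declaring union REGS, struct SREGS,
   int86x(), FP_SEG() and FP_OFF() */
#include <dos.h>

/* Function to get all the file attributes from the filename.
   Uses INT 21h, AH=43h, AL=00h (get file attributes).
   Returns 0 on success, or the DOS error code on failure. */
int PosGetFileAttributes(const char *fnm, int *attr)
{
    union REGS regsin;
    union REGS regsout;
    struct SREGS sregs;

    regsin.h.ah = 0x43;            /* get/set file attributes */
    regsin.h.al = 0x00;            /* subfunction 0: get */
#ifdef __32BIT__
    regsin.d.edx = (int)fnm;       /* 32-bit flat address of the filename */
#else
    sregs.ds = FP_SEG(fnm);        /* DS:DX -> ASCIIZ filename */
    regsin.x.dx = FP_OFF(fnm);
#endif
    int86x(0x21, &regsin, &regsout, &sregs);
    *attr = regsout.x.cx;          /* attribute bits are returned in CX */

    if (!regsout.x.cflag)          /* carry clear: success */
    {
        regsout.x.ax = 0;
    }
    return (regsout.x.ax);         /* carry set: AX holds the error code */
}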

That interface looks like it's going to work fine when
I produce a PDOS/x64.

Maybe Gates got it almost right with MSDOS, he just
failed to provide a C wrapper?

> > I'm wondering whether int32_t is the thing that has
> > any meaning.
> >
> > Is there really something magical about 32 bits?

> In the computer world, then yes, since practically every machine is
> based on power-of-two word sizes.

Yes, but aren't these things meant to be tied back to
some business purpose?

I don't think any businessman is ever going to say
"ok, nobody at my bank is allowed to have more than
4 billion cents in their account, so please set a hard
limit on that - it's a government regulation anyway".

Or let's put it another way - did Ritchie fundamentally
fuck up by not providing int32_t 50 years ago and
it required ISO to clean up the mess in 1999? I'm not
convinced. I think it might be ISO who fundamentally
fucked up and Ritchie was right all along.

If we take things back to first principles, and use the
benefit of hindsight, maybe we can answer that
question. After 50 years I would hope we have
enough experimental data.

BFN. Paul.

On 18/05/2021 14:15, Ben Bacarisse wrote:
> David Brown <david.brown@hesbynett.no> writes:
>
>> We have a far greater range of languages now, with a wide selection of
>> balances between features, simplicity, run-time efficiency, developer
>> efficiency, ease-of-use, safety, etc. Some are more minimal, with
>> perhaps just a single integer type at 64-bit, others have a selection of
>> different sizes for different purposes.
>>
>> IME and IMHO, I would say there are three basic kinds of integers,
>> depending on their usage.
>>
>> There are low-level uses - required for interaction with hardware,
>> connection to data outside the program (file formats, network protocols,
>> foreign-function interfaces, etc.), or for when you want accurate
>> control such as for getting maximal efficiency from large data
>> structures. In these cases, you want size-specific types. Whether you
>> call them int32_t, i32, Int<32>, etc., is a matter of taste. But they
>> should be size-specific and explicit.
>
> I don't see why you need size-specific integers for that. To avoid a
> lot of pain, you need an integer type that is big enough for the values
> you might have to do arithmetic on, but I don't see the need for
> explicit sized integer types.
>

As has been mentioned, you can use bytes to access file formats or
network protocols. Size-specific integers can make it simpler and more
efficient (given appropriate endianness and alignment), but they are not
strictly necessary.
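To illustrate the point above that plain bytes are enough, here is a sketch that decodes and encodes a 32-bit big-endian ("network order") field. It uses only unsigned char and unsigned long, and C90 guarantees that unsigned long is at least 32 bits. No exact-width type is needed, and the result does not depend on the host's endianness or alignment. The function names are hypothetical.

```c
/* Decode a 32-bit big-endian field from a byte buffer. */
unsigned long read_be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) |
           ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |
            (unsigned long)p[3];
}

/* Store the low 32 bits of v as four big-endian bytes. */
void write_be32(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)((v >> 24) & 0xFF);
    p[1] = (unsigned char)((v >> 16) & 0xFF);
    p[2] = (unsigned char)((v >> 8) & 0xFF);
    p[3] = (unsigned char)(v & 0xFF);
}
```

The convenience that exact-width types add comes from reading the whole field with a single memcpy into a uint32_t. That version is only correct when the host byte order matches the format's byte order.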

For hardware access, you need size-specific accesses. Most hardware on
most systems involves memory-mapped interfaces, but usually you need to
access the registers with the right size - if you have a 32-bit
register, you can't get the right results with 4 8-bit accesses or a
64-bit access.

Ultimately, what you need is a set of functions like "write_volatile_8",
"read_volatile_16", etc.
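A sketch of accessors like the ones named above, assuming C99 <stdint.h> and that uintptr_t exists on the target. Each one reads or writes through a volatile pointer of the stated width. Note that the C standard says what constitutes an access to a volatile object is implementation-defined. So "one access of exactly this width" is what mainstream compilers do in practice, not a language guarantee. The names, and the use of an integer address, are assumptions.

```c
#include <stdint.h>

/* Hypothetical fixed-width volatile accessors. `addr` would normally be a
   memory-mapped register address taken from the hardware documentation. */
static inline void write_volatile_8(uintptr_t addr, uint8_t value)
{
    *(volatile uint8_t *)addr = value;
}

static inline uint16_t read_volatile_16(uintptr_t addr)
{
    return *(const volatile uint16_t *)addr;
}

static inline void write_volatile_32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

static inline uint32_t read_volatile_32(uintptr_t addr)
{
    return *(const volatile uint32_t *)addr;
}
```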

Size-specific types make low-level programming a great deal easier. You
can well argue that you don't /need/ them - but you do /want/ them.

On Tuesday, May 18, 2021 at 8:35:34 PM UTC+10, Bart wrote:

> > Note that I am interested in supporting 16-bit, 32-bit and 64-bit
> > systems, so I'm after recommendations for that.
> > Possibly 8-bit in the future too.

> Then you are creating extra problems. You may need a special type that
> is different things across those 4 architectures (pointers will already
> be different).
>
> You need to decide whether each of those systems will only ever be used
> with matching, contemporary hardware, so that an 8-bit system will only
> ever see a 100MB drive, for example, which means a file-size type can
> be specific to it.
>
> Or whether you plan to allow an 8-bit system to access a 2TB drive,
> in which case you will need the same 64-bit file-size type as used on
> current machines.

Any reason why the language shouldn't accommodate
both of those?

I'm happy to recompile.
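On a C90 compiler with no 64-bit type, the second option above (a 2TB drive on a small machine) can still be handled with the "struct of two ints" approach mentioned earlier in the thread. The sketch below stores the value as two unsigned longs of 32 bits each, which C90 always permits, and propagates a carry by hand. The names fsize64 and fsize64_add are hypothetical. It is slow on an 8-bit CPU, but it works unchanged when recompiled for any target.

```c
/* A 64-bit file size built from two 32-bit halves (C90-compatible). */
typedef struct {
    unsigned long hi;   /* upper 32 bits */
    unsigned long lo;   /* lower 32 bits */
} fsize64;

#define LOW32 0xFFFFFFFFUL

/* Add two sizes; the low half wrapped (carry) iff the result is smaller
   than one of its operands. Masking keeps halves at 32 bits even where
   unsigned long is wider. */
fsize64 fsize64_add(fsize64 a, fsize64 b)
{
    fsize64 r;
    r.lo = (a.lo + b.lo) & LOW32;
    r.hi = (a.hi + b.hi + (r.lo < a.lo ? 1UL : 0UL)) & LOW32;
    return r;
}
```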

> > I'm willing for the 8-bit and 16-bit to be *slow*, in deference
> > to 32-bit and 64-bit, but not fail to work *at all*.

> Another approach then is for 8- and 16-bit machines to run emulators for
> larger machines. That will be extra slow.

Actually I was thinking that PDOS/86 should install
interrupt handlers to enable it to run 80386 code on
an 8086.

But I think it is more appropriate to simply recompile
the application for the 8086.

I have an inherent assumption that I have the source
code and it is C90-compliant. But I'm willing to budge
on the C90 bit if someone can explain exactly how
Ritchie fucked up and there was nothing ISO could
do about it in 1990. But all indications to date are that
Ritchie and ANSI/ISO in 1990 were near-infallible.

It's everyone else who fucked up. Especially with crap
like Java.

BFN. Paul.

On Tuesday, 18 May 2021 at 12:06:47 UTC+1, Bart wrote:
> On 18/05/2021 11:43, Malcolm McLean wrote:
> > On Tuesday, 18 May 2021 at 09:14:37 UTC+1, muta...@gmail.com wrote:
> >>
> >>> There are "just a number" uses - a counter, an index, etc. The type for
> >>> this needs to be big enough that it is not going to overflow - the
> >>> programmer can treat it as though it were an unlimited size mathematical
> >>> integer. These days, that really means 64-bit - or an unlimited integer
> >>> type that grows as needed. The type name here should reflect that -
> >>> "number" would be good. In C, this is "int".
> >> But int is almost always 32-bits "these days". How do you
> >> reconcile that?
> >>
> > You only rarely have 2 billion data points. A 1024 * 1024 image is quite large,
> > for example, but it's only a million pixels.
> Loads of everyday figures can exceed two billion:
>
> * The size of a file (eg. a video file) expressed as bytes
> * The capacity of a disk
> * The amount of memory in a machine as bytes
> * The world population
> * How many seconds someone has been alive
> * The number of views of a youtube video (and the total number of videos)
> * The number of grains of rice on a square you will have when you're
> only halfway along that chessboard
>
> Now take one of these figures, and try and use it in a calculation.
>
> i64 or u64 will handle all of these with ease (except for the final
> square on the chessboard).
>
"Rarely" means that "you can find some counter-examples".
>
> Weren't you advocating for 64-bit ints everywhere a few years ago? (And
> didn't we have the same discussion even more recently!)
>
Yes. I was advocating for a simplified language and architecture in which
floating point values and integers are both 64 bits. However most people
have decided to use 32 bits as the default for an integer, and there are
reasons for that.

On Tuesday, May 18, 2021 at 10:32:46 PM UTC+10, David Brown wrote:

> For hardware access, you need size-specific accesses. Most hardware on

And perhaps that's the proper answer - that is where
it should stay.

> most systems involves memory-mapped interfaces, but usually you need to
> access the registers with the right size - if you have a 32-bit
> register, you can't get the right results with 4 8-bit accesses or a
> 64-bit access.

Writing PDOS/86 and PDOS/386 allowed me to see what
the issues were.

And I don't see an issue with C90. The interface between
applications and OS can be given in C90 terms.

All basic functions - editing C code using an ANSI terminal,
compiling that code, running that code - C90 is all that is
required.

If you want to add a stack of #ifdefs to do some extra
things not covered by C90, go right ahead. But keep those
extras non-default and everything is fine.

It's true that I don't directly deal with the hardware - I
rely on the BIOS, so if you say you need int32_t to
do that job properly, fine, BIOS writers are free to use
int32_t. I'm basically only interested in the OS and
applications. An assumption I failed to mention, sorry.

But it's a bit odd that ISO didn't relegate that to some
sort of "BIOS appendix".

BFN. Paul.

David Brown <david.brown@hesbynett.no> writes:

> On 18/05/2021 14:15, Ben Bacarisse wrote:
>> David Brown <david.brown@hesbynett.no> writes:
>>
>>> We have a far greater range of languages now, with a wide selection of
>>> balances between features, simplicity, run-time efficiency, developer
>>> efficiency, ease-of-use, safety, etc. Some are more minimal, with
>>> perhaps just a single integer type at 64-bit, others have a selection of
>>> different sizes for different purposes.
>>>
>>> IME and IMHO, I would say there are three basic kinds of integers,
>>> depending on their usage.
>>>
>>> There are low-level uses - required for interaction with hardware,
>>> connection to data outside the program (file formats, network protocols,
>>> foreign-function interfaces, etc.), or for when you want accurate
>>> control such as for getting maximal efficiency from large data
>>> structures. In these cases, you want size-specific types. Whether you
>>> call them int32_t, i32, Int<32>, etc., is a matter of taste. But they
>>> should be size-specific and explicit.
>>
>> I don't see why you need size-specific integers for that. To avoid a
>> lot of pain, you need an integer type that is big enough for the values
>> you might have to do arithmetic on, but I don't see the need for
>> explicit sized integer types.
>
> As has been mentioned, you can use bytes to access file formats or
> network protocols. Size-specific integers can make it simpler and more
> efficient (given appropriate endianness and alignment), but they are not
> strictly necessary.

Right. You seemed to be saying they were needed for this purpose.

And I don't see why they would necessarily be more efficient. On some
architectures, arithmetic on shorter integers is slower than (or at
least no faster than) arithmetic on longer ones.
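The observation above is the reason C99 also provides "at least N bits" types next to the exact-width ones. int_leastN_t is the smallest type with at least N bits. int_fastN_t is the type the implementation considers fastest among those with at least N bits, and the implementation is free to make int_fast16_t 32 or 64 bits wide. A sketch, with a hypothetical function name, that stores data compactly as int16_t but counts in int_fast32_t:

```c
#include <stdint.h>

/* Storage uses an exact-width type; the loop counter and result use a
   "fast" type whose actual width the implementation chooses. */
int_fast32_t count_matches(const int16_t *data, int_fast32_t n, int16_t key)
{
    int_fast32_t i, hits = 0;
    for (i = 0; i < n; i++)
        if (data[i] == key)
            hits++;
    return hits;
}
```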

> For hardware access, you need size-specific accesses.

Yes. And alignment. And representation. Giving the language
size-specific integers is not the solution to this problem, but it's a
cheap one and has caught on at the expense of doing it "properly".

> Most hardware on
> most systems involves memory-mapped interfaces, but usually you need to
> access the registers with the right size - if you have a 32-bit
> register, you can't get the right results with 4 8-bit accesses or a
> 64-bit access.

That's what high(ish)-level programming languages and compilers are
for.

> Ultimately, what you need is a set of functions like "write_volatile_8",
> "read_volatile_16", etc.

I thought you were talking about languages in general. In the general
case, you need a notation to describe formats, and a syntax for mapping
fields to values. Those values need not come from, or be stored in, a
type of any particular size, other than one that is large enough. A
language with a single integer type could manage perfectly well.
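A rough sketch of the approach described above. The record format is a table of (bit offset, bit width) descriptors. Every field is extracted into one wide integer type, whatever its size on the wire. The descriptor layout, the names, and the choice to number bits from the most significant bit of byte 0 are all assumptions of the sketch. unsigned long long needs C99.

```c
/* Hypothetical field descriptor: where a field lives in a byte record. */
typedef struct {
    const char   *name;
    unsigned long offset;   /* in bits from the start of the record */
    unsigned      width;    /* in bits, 1..64 */
} field_desc;

/* Read a field bit by bit into a single wide type; bit 0 is the most
   significant bit of rec[0]. */
unsigned long long get_field(const unsigned char *rec, const field_desc *f)
{
    unsigned long long v = 0;
    unsigned i;
    for (i = 0; i < f->width; i++) {
        unsigned long bit = f->offset + i;
        unsigned b = (rec[bit / 8] >> (7 - bit % 8)) & 1u;
        v = (v << 1) | b;
    }
    return v;
}
```

For example, the start of an IPv4 header can be described as {"version", 0, 4}, {"ihl", 4, 4} and {"total_length", 16, 16}. No C type of width 4 or 16 is involved.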

> Size-specific types make low-level programming a great deal easier.
> You can well argue that you don't /need/ them - but you do /want/
> them.

I disagree. Every time I use them for this purpose, I realise I
actually want something else, but I have to make do with what C
provides.

If C's bit fields were not so very implementation defined, they would be
a step closer to what I'm talking about.

--
Ben.

On Tuesday, 18 May 2021 at 13:53:43 UTC+1, Ben Bacarisse wrote:
> David Brown <david...@hesbynett.no> writes:
>
> That's what high(ish)-level programming languages and compilers are
> for.
> > Ultimately, what you need is a set of functions like "write_volatile_8",
> > "read_volatile_16", etc.
> I thought you were talking about languages in general. In the general
> case, you need a notation to describe formats, and a syntax for mapping
> fields to values. Those values need not come from, or be stored in, a
> type of any particular size, other than one that is large enough. A
> language with a single integer type could manage perfectly well.
>
A lot of data is defined in terms of bits and bytes. Either because it's a
binary format, like MPEG, or because it's compressed to save memory,
or because it needs to be passed to and from systems outside the program,
like graphics display systems that expect a certain image format.

On 18/05/2021 12:12, David Brown wrote:
> On 18/05/2021 12:23, Bart wrote:
>> On 18/05/2021 09:14, muta...@gmail.com wrote:
>>
>>> But int is almost always 32-bits "these days". How do you
>>> reconcile that?
>>
>> Because most existing C programs would likely break if 'int' was
>> suddenly 64 bits.
>
> Would they? That is not my impression at all. Some would have trouble
> if it was changed to 16-bit on their systems, because people assume
> "int" is big enough to support numbers larger than 32767. But it will
> rarely be a problem that "int" can store larger numbers.

Look at it from a FFI point of view. This is a declaration from the STB
image library:

int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x,
int *y, int *comp);

The compiler that processes the implementation changes overnight from i32
to i64 for 'int'. Used with external tools and languages, this might
stop it working. Even if they know about it, how do they tell which
version of int is in use?
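One partial answer to the question above, as a sketch. A library can refuse to build when int is not the width its published bindings assume. It can also export a query so a foreign caller can check at load time. The function name lib_int_size is hypothetical, and nothing in the STB library is claimed to do this.

```c
#include <limits.h>

/* Refuse to build if int is not the 32-bit type that foreign bindings
   were written against. */
#if INT_MAX != 2147483647
#error "this library's binary interface assumes a 32-bit int"
#endif

/* Hypothetical query a binding could call at load time. */
int lib_int_size(void)
{
    return (int)sizeof(int);
}
```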

What about precompiled libraries? This is not like 32-vs-64-bit code
which will not link.

As for code which is entirely recompiled, I tried converting a large
program to see what would happen, by replacing 'int' with 'Int', the
latter initially a typedef for 'int' to check it still worked, before
switching it to a wider type.

But then I hadn't reckoned on 'long long Int' and 'unsigned Int' (or
'Int unsigned') so C's multi-token types makes such a simple process harder.
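The difficulty described above comes from a rule in C: a typedef name cannot be combined with other type specifiers. So 'unsigned Int', 'Int unsigned' and 'long long Int' are all compile errors, and each signedness and width combination needs its own name. A sketch with hypothetical names and a hypothetical USE_WIDE_INT switch:

```c
/* One switch point; every use site names Int or UInt, never 'unsigned Int'. */
#ifdef USE_WIDE_INT
typedef long long          Int;
typedef unsigned long long UInt;
#else
typedef int                Int;
typedef unsigned int       UInt;
#endif

/* Code written against Int/UInt follows whichever definition is active. */
UInt to_unsigned(Int x)
{
    return (UInt)x;
}
```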

I'd have another go later, but this program still needs to interact with
a standard library where ints are still 32 bits, so it might not work
for that reason. Testing is going to be difficult unless done on a 100%
64-bit-int implementation.

Any code that has different behaviour based on sizeof(int) having a
certain value or certain relationship with other types, might execute
different code paths than previously, ones less tested.

In short, I can't tell you whether any current program will work or not;
obviously you have a lot more confidence than I have, even with other
people's code!

I only know that my generated C code will probably still work - if I
update the FFI declarations pertaining to the C library - but only
because I never use unadorned 'int', only int32 or int64.

This is an example that won't:

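#include <stdio.h>

typedef int Int;    /* make this 64 bits and the output breaks */

/* 'low' and 'high' overlay the two halves of 'a' only while Int is
   32 bits; anonymous structs/unions need C11 */
struct {
    union {
        long long int a;
        struct {Int low, high;};
    };
} X;

int main(void) {
    X.a=0x1122334400667788;
    printf("%16llX\n",X.a);
    printf("%8X%08X\n",X.high, X.low);
}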

This works as expected when Int is 32 bits, so that 'low' and 'high'
correspond to the two halves of 'a' (ignoring endian issues). But make
Int 64 bits, and it no longer works.
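For comparison, the split done in the example above can be written with shifts and masks instead of a union overlay. Written that way, the result depends neither on the width of int nor on the machine's byte order. A sketch assuming C99 <stdint.h>, with a hypothetical function name:

```c
#include <stdint.h>

/* Split a 64-bit value into its upper and lower 32-bit halves. */
void split_u64(uint64_t a, uint32_t *high, uint32_t *low)
{
    *high = (uint32_t)(a >> 32);
    *low  = (uint32_t)(a & 0xFFFFFFFFu);
}
```

To print the halves portably, <inttypes.h> provides PRIX32, e.g. printf("%08" PRIX32 "%08" PRIX32 "\n", high, low).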

Are there really so few programs that make such assumptions?

On Tuesday, May 18, 2021 at 8:53:43 AM UTC-4, Ben Bacarisse wrote:
> David Brown <david...@hesbynett.no> writes:
>
> > On 18/05/2021 14:15, Ben Bacarisse wrote:
> >> David Brown <david...@hesbynett.no> writes:
> >>
> >>> We have a far greater range of languages now, with a wide selection of
> >>> balances between features, simplicity, run-time efficiency, developer
> >>> efficiency, ease-of-use, safety, etc. Some are more minimal, with
> >>> perhaps just a single integer type at 64-bit, others have a selection of
> >>> different sizes for different purposes.
> >>>
> >>> IME and IMHO, I would say there are three basic kinds of integers,
> >>> depending on their usage.
> >>>
> >>> There are low-level uses - required for interaction with hardware,
> >>> connection to data outside the program (file formats, network protocols,
> >>> foreign-function interfaces, etc.), or for when you want accurate
> >>> control such as for getting maximal efficiency from large data
> >>> structures. In these cases, you want size-specific types. Whether you
> >>> call them int32_t, i32, Int<32>, etc., is a matter of taste. But they
> >>> should be size-specific and explicit.
> >>
> >> I don't see why you need size-specific integers for that. To avoid a
> >> lot of pain, you need an integer type that is big enough for the values
> >> you might have to do arithmetic on, but I don't see the need for
> >> explicit sized integer types.
> >
> > As has been mentioned, you can use bytes to access file formats or
> > network protocols. Size-specific integers can make it simpler and more
> > efficient (given appropriate endianness and alignment), but they are not
> > strictly necessary.
> Right. You seemed to be saying they were needed for this purpose.
>
> And I don't see why they would necessarily be more efficient. On some
> architectures, arithmetic on shorter integers is slower than (or at
> least no faster than) arithmetic on longer ones.
> > For hardware access, you need size-specific accesses.
> Yes. And alignment. And representation. Giving the language
> size-specific integers is not the solution to this problem, but it's a
> cheap one and has caught on at the expense of doing it "properly".
> > Most hardware on
> > most systems involves memory-mapped interfaces, but usually you need to
> > access the registers with the right size - if you have a 32-bit
> > register, you can't get the right results with 4 8-bit accesses or a
> > 64-bit access.
> That's what high(ish)-level programming languages and compilers are
> for.
> > Ultimately, what you need is a set of functions like "write_volatile_8",
> > "read_volatile_16", etc.
> I thought you were talking about languages in general. In the general
> case, you need a notation to describe formats, and a syntax for mapping
> fields to values. Those values need not come from, or be stored in, a
> type of any particular size, other than one that is large enough. A
> language with a single integer type could manage perfectly well.
> > Size-specific types make low-level programming a great deal easier.
> > You can well argue that you don't /need/ them - but you do /want/
> > them.
> I disagree. Every time I use them for this purpose, I realise I
> actually want something else, but I have to make do with what C
> provides.

I vaguely remember you having this opinion before on not needing
to use size and encoding specific types for I/O.

I typically have hard constraints on the binary format of data in a file
or over network packets. If I want a 2's complement encoded
integer in C, I go to the stdint.h and declare it that way, and that is
the interface used in any API to read and write structs of that
specification. For any kind of bit or byte orientation issues, that
typically is handled from the build environment like a configure
style script creating a code sample to validate the behavior against
the compiler and set of compiler flags. Or it's handled by always
running the software on a single hardware architecture that locks in
any of those orientation issues.
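A sketch of the kind of build-environment check described above, assuming C99 <stdint.h>. C99 defines INT32_MAX if and only if int32_t exists. int32_t is required to have exactly 32 bits, no padding bits and a two's complement representation, so its presence is already the guarantee wanted. Byte order cannot be tested in the preprocessor portably, so it is checked at run time here. The function name is hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Break the build unless bytes are 8 bits and int32_t exists. */
#if !defined(INT32_MAX) || CHAR_BIT != 8
#error "need 8-bit bytes and an exact-width two's-complement int32_t"
#endif

/* Run-time byte-order probe: 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```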

I can't see how I/O using the common integer types to a rigidly
defined specification is in any way useful without a lot of configure
type tests in the build environment. Why would I use 'int' as part of
an API to read or write a 32-bit 2's complement integer when
int32_t was specifically designed for this purpose?

If I need to send binary encoded IEEE-754 float or double over a
network packet, I have the build environment run a batch of tests
that will break the build if the 'float' or 'double' type fails to meet
IEEE-754 criteria.
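A sketch of build-breaking checks of the kind described above, using only <float.h>. The checked values are radix 2, 24 and 53 mantissa digits, and maximum exponents of 128 and 1024. These match IEEE-754 binary32 and binary64. They are necessary but not sufficient: rounding, NaNs and subnormals are not covered. A C99 implementation that conforms to Annex F (IEC 60559) defines __STDC_IEC_559__, so a test suite can also require that macro. The function name is hypothetical.

```c
#include <float.h>

/* Break the build if float/double do not have IEEE-754 binary32/64 shape. */
#if FLT_RADIX != 2 || FLT_MANT_DIG != 24 || FLT_MAX_EXP != 128 || \
    DBL_MANT_DIG != 53 || DBL_MAX_EXP != 1024
#error "float/double do not look like IEEE-754 binary32/binary64"
#endif

/* Report whether the implementation claims Annex F conformance. */
int has_annex_f(void)
{
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}
```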

I'm not going to get approval to write software to fly an aircraft
with such ill-specified types I can tell you that right now.

Can you elaborate more on what you mean?

Best regards,
John D.

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On Tuesday, 18 May 2021 at 13:53:43 UTC+1, Ben Bacarisse wrote:
>> David Brown <david...@hesbynett.no> writes:
>>
>> That's what high(ish)-level programming languages and compilers are
>> for.
>> > Ultimately, what you need is a set of functions like "write_volatile_8",
>> > "read_volatile_16", etc.
>> I thought you were talking about languages in general. In the general
>> case, you need a notation to describe formats, and a syntax for mapping
>> fields to values. Those values need not come from, or be stored in, a
>> type of any particular size, other than one that is large enough. A
>> language with a single integer type could manage perfectly well.
>>
> A lot of data is defined in terms of bits and bytes. Either because it's a
> binary format, like MPEG, or because it's compressed to save memory,
> or because it needs to be passed to and from systems outside the program,
> like graphics display systems that expect a certain image format.

Yes. When the data are raw bits, there may be no need to map to and
from a numerical value, but a language with a notation to describe
formats would presumably also permit testing, copying and setting raw
fields.

--
Ben.

On 18/05/2021 15:06, Bart wrote:
> On 18/05/2021 12:12, David Brown wrote:
>> On 18/05/2021 12:23, Bart wrote:
>>> On 18/05/2021 09:14, muta...@gmail.com wrote:
>>>
>>>> But int is almost always 32-bits "these days". How do you
>>>> reconcile that?
>>>
>>> Because most existing C programs would likely break if 'int' was
>>> suddenly 64 bits.
>>
>> Would they? That is not my impression at all. Some would have trouble
>> if it was changed to 16-bit on their systems, because people assume
>> "int" is big enough to support numbers larger than 32767. But it will
>> rarely be a problem that "int" can store larger numbers.
>
> Look at it from a FFI point of view.

I know that an FFI is important for you, but for the majority of
programmers it is not something that needs emphasis. It is needed by
language implementers, and people making wrappers to allow access to
external libraries from within that language. But usually programmers
of a given language do not use an FFI directly. Thus looking at things
from an FFI viewpoint merely means it has to be possible to make it work
- there is no need for it to be simple or easy.

> This is a declaration from the STB
> image library:
>
> int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x,
> int *y, int *comp);
>
> The compiler that processes the implementation changes overnight from i32
> to i64 for 'int'. Used with external tools and languages, this might
> stop it working. Even if they know about it, how do they tell which
> version of int is in use?
>

Fair enough. That will need changes.

> What about precompiled libraries? This is not like 32-vs-64-bit code
> which will not link.
>
> As for code which is entirely recompiled, I tried converting a large
> program to see what would happen, by replacing 'int' with 'Int', the
> latter initially a typedef for 'int' to check it still worked, before
> switching it to a wider type.
>
> But then I hadn't reckoned on 'long long Int' and 'unsigned Int' (or
> 'Int unsigned') so C's multi-token types makes such a simple process
> harder.
>
> I'd have another go later, but this program still needs to interact with
> a standard library where ints are still 32 bits, so it might not work
> for that reason. Testing is going to be difficult unless done on a 100%
> 64-bit-int implementation.
>
> Any code that has different behaviour based on sizeof(int) having a
> certain value or certain relationship with other types, might execute
> different code paths than previously, ones less tested.
>
> In short, I can't tell you whether any current program will work or not;
> obviously you have a lot more confidence than I have, even with other
> people's code!
>
> I only know that my generated C code will probably still work - if I
> update the FFI declarations pertaining to the C library - but only
> because I never use unadorned 'int', only int32 or int64.
>
> This is an example that won't:
>
> #include <stdio.h>
>
> typedef int Int;
>
> struct {
>      union {
>          long long int a;
>          struct {Int low, high;};
>      };
> } X;
>
>
>
> int main(void) {
>       X.a=0x1122334400667788;
>       printf("%16llX\n",X.a);
>       printf("%8X%08X\n",X.high, X.low);
> }
>
> This works as expected when Int is 32 bits, so that 'low' and 'high'
> correspond to the two halves of 'a' (ignoring endian issues). But make
> Int 64 bits, and it no longer works.
>
> Are there really so few programs that make such assumptions?

It is always risky trying to guess how many programs do or do not do
something. But I would not expect to see an assumption that "int" is
always exactly 32 bits happening very often - and there would be fewer
problems with making it 64-bit than with making it 16-bit.

But I accept that interfaces to code that you are not re-compiling, and
which has "int *" parameters, is going to be a complicating factor.
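For comparison, here is a sketch of the split in Bart's union example written so that it does not depend on the width of `int` or on byte order. It assumes `<stdint.h>` and `<inttypes.h>` are available; `split_u64` and `print_halves` are hypothetical names, not taken from any of the posts.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Split a 64-bit value into 32-bit halves with shifts and masks rather
   than overlaying it with a union of int-sized members. */
static void split_u64(uint64_t a, uint32_t *high, uint32_t *low)
{
    *high = (uint32_t)(a >> 32);
    *low  = (uint32_t)(a & 0xFFFFFFFFu);
}

/* Print the value and its halves the way the original example did, using
   format macros that match the exact-width types. */
static void print_halves(uint64_t a)
{
    uint32_t high, low;
    split_u64(a, &high, &low);
    printf("%16" PRIX64 "\n", a);
    printf("%8" PRIX32 "%08" PRIX32 "\n", high, low);
}
```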

John Dill <jadill33@gmail.com> writes:

> On Tuesday, May 18, 2021 at 8:53:43 AM UTC-4, Ben Bacarisse wrote:
>> David Brown <david...@hesbynett.no> writes:
>>
>> > On 18/05/2021 14:15, Ben Bacarisse wrote:
>> >> David Brown <david...@hesbynett.no> writes:
>> >>
>> >>> We have a far greater range of languages now, with a wide selection of
>> >>> balances between features, simplicity, run-time efficiency, developer
>> >>> efficiency, ease-of-use, safety, etc. Some are more minimal, with
>> >>> perhaps just a single integer type at 64-bit, others have a selection of
>> >>> different sizes for different purposes.
>> >>>
>> >>> IME and IMHO, I would say there are three basic kinds of integers,
>> >>> depending on their usage.
>> >>>
>> >>> There are low-level uses - required for interaction with hardware,
>> >>> connection to data outside the program (file formats, network protocols,
>> >>> foreign-function interfaces, etc.), or for when you want accurate
>> >>> control such as for getting maximal efficiency from large data
>> >>> structures. In these cases, you want size-specific types. Whether you
>> >>> call them int32_t, i32, Int<32>, etc., is a matter of taste. But they
>> >>> should be size-specific and explicit.
>> >>
>> >> I don't see why you need size-specific integers for that. To avoid a
>> >> lot of pain, you need an integer type that is big enough for the values
>> >> you might have to do arithmetic on, but I don't see the need for
>> >> explicit sized integer types.
>> >
>> > As has been mentioned, you can use bytes to access file formats or
>> > network protocols. Size-specific integers can make it simpler and more
>> > efficient (given appropriate endianness and alignment), but they are not
>> > strictly necessary.
>> Right. You seemed to be saying they were needed for this purpose.
>>
>> And I don't see why they would necessarily be more efficient. On some
>> architectures, arithmetic on shorter integers is slower (or at least no
>> faster) than on long ones.
>> > For hardware access, you need size-specific accesses.
>> Yes. And alignment. And representation. Giving the language
>> size-specific integers is not the solution to this problem, but it's a
>> cheap one and has caught on at the expense of doing it "properly".
>> > Most hardware on
>> > most systems involves memory-mapped interfaces, but usually you need to
>> > access the registers with the right size - if you have a 32-bit
>> > register, you can't get the right results with 4 8-bit accesses or a
>> > 64-bit access.
>> That's what high(ish)-level programming languages and compilers are
>> for.
>> > Ultimately, what you need is a set of functions like "write_volatile_8",
>> > "read_volatile_16", etc.
>> I thought you were talking about languages in general. In the general
>> case, you need a notation to describe formats, and a syntax for mapping
>> fields to values. Those values need not come from, or be stored in, a
>> type of any particular size, other than one that is large enough. A
>> language with a single integer type could manage perfectly well.
>> > Size-specific types make low-level programming a great deal easier.
>> > You can well argue that you don't /need/ them - but you do /want/
>> > them.
>> I disagree. Every time I use them for this purpose, I realise I
>> actually want something else, but I have to make do with what C
>> provides.
>
> I vaguely remember you having this opinion before on not needing
> to use size and encoding specific types for I/O.

That does not sound like what I had hoped to convey. You absolutely
need a way to describe bits and bytes that map to the outside world.

> I typically have hard constraints on the binary format of data in a file
> or over network packets. If I want a 2's complement encoded
> integer in C, I go to the stdint.h and declare it that way, and that is
> the interface used in any API to read and write structs of that
> specification. For any kind of bit or byte orientation issues, that
> typically is handled from the build environment like a configure
> style script creating a code sample to validate the behavior against
> the compiler and set of compiler flags. Or it's handled by always
> running the software on a single hardware architecture that locks in
> any of those orientation issues.

Yes, this is how it's usually done in C. There are problems, but as you
say, they can all be overcome. (Well, the absence of the known-size
types would cause a problem, but the world is C-friendly these days.)

> I can't see how I/O using the common integer types to a rigidly
> defined specification is in any way useful without a lot of configure
> type tests in the build environment. Why would I use 'int' as part of
> an API to read or write a 32-bit 2's complement integer when
> int32_t was specifically designed for this purpose.

I agree. I get the feeling you think I am saying "use int".

Part of the problem is that I am not talking about C. C is what it is,
and we get round the problem as best we can. Mostly they are trivial
problems like endianness.

But I got the impression that David was describing the use of integer
types in programming in general and was saying that a range of known-size
integer types are a required feature of languages that interface with
hardware.

> If I need to send binary encoded IEEE-754 float or double over a
> network packet, I have the build environment run a batch of tests
> that will break the build if the 'float' or 'double' types fail to meet
> IEEE-754 criteria.
>
> I'm not going to get approval to write software to fly an aircraft
> with such ill-specified types I can tell you that right now.
>
> Can you elaborate more on what you mean?

First off, I'm not talking about C. C manages, and when there are
problems, we work round them, usually using external tools like the
build system. This method is unlikely to come unstuck because you
probably won't come across a machine with 36-bit sign and magnitude
integers (and thus no int32_t).

C has, perhaps, led people to think that the solution must be lots of
integer types, ideally all with known sizes. I am saying that what you
really need is a notation in the language to describe data formats.

Obviously there would be ways to get and set the bits without caring how
they map to a value. For example, a network address might simply be
copied from one place to another. But sometimes you need to get or set
the numeric value of a field. A language with, say, only one unbounded
integer type could do this just fine. If 'field' has been defined as
being 24-bits wide using sign and magnitude big-endian representation,
then

header.field.asInteger

will get the value for us to do arithmetic on.

Such a notation would also probably include explicit alignment and
padding, so there would be no need for the external format to match what
the compiler produces.
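In today's C, lacking such a notation, the equivalent of `header.field.asInteger` for that 24-bit field has to be written by hand. A minimal sketch, assuming the field occupies three consecutive bytes with the sign in the top bit of the first byte; `field24_sm_be_as_integer` is a hypothetical name.

```c
/* Decode a 24-bit sign-and-magnitude big-endian field: bit 7 of p[0] is
   the sign, the remaining 23 bits are the magnitude (most significant
   byte first). The magnitude is at most 0x7FFFFF, so it fits in a long.
   Sign-and-magnitude has a "negative zero"; it decodes to plain 0 here. */
static long field24_sm_be_as_integer(const unsigned char p[3])
{
    long magnitude = ((long)(p[0] & 0x7F) << 16)
                   | ((long)p[1] << 8)
                   | (long)p[2];
    return (p[0] & 0x80) ? -magnitude : magnitude;
}
```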

--
Ben.

On 18/05/2021 14:53, Ben Bacarisse wrote:
> David Brown <david.brown@hesbynett.no> writes:
>
>> On 18/05/2021 14:15, Ben Bacarisse wrote:
>>> David Brown <david.brown@hesbynett.no> writes:
>>>
>>>> We have a far greater range of languages now, with a wide selection of
>>>> balances between features, simplicity, run-time efficiency, developer
>>>> efficiency, ease-of-use, safety, etc. Some are more minimal, with
>>>> perhaps just a single integer type at 64-bit, others have a selection of
>>>> different sizes for different purposes.
>>>>
>>>> IME and IMHO, I would say there are three basic kinds of integers,
>>>> depending on their usage.
>>>>
>>>> There are low-level uses - required for interaction with hardware,
>>>> connection to data outside the program (file formats, network protocols,
>>>> foreign-function interfaces, etc.), or for when you want accurate
>>>> control such as for getting maximal efficiency from large data
>>>> structures. In these cases, you want size-specific types. Whether you
>>>> call them int32_t, i32, Int<32>, etc., is a matter of taste. But they
>>>> should be size-specific and explicit.
>>>
>>> I don't see why you need size-specific integers for that. To avoid a
>>> lot of pain, you need an integer type that is big enough for the values
>>> you might have to do arithmetic on, but I don't see the need for
>>> explicit sized integer types.
>>
>> As has been mentioned, you can use bytes to access file formats or
>> network protocols. Size-specific integers can make it simpler and more
>> efficient (given appropriate endianness and alignment), but they are not
>> strictly necessary.
>
> Right. You seemed to be saying they were needed for this purpose.

Wanted, rather than needed. But I would say that it is something you
/really/ want. It's a little like saying C doesn't /need/ the "for"
statement - "if" and "goto" can cover your needs. But you /want/ "for".

>
> And I don't see why they would necessarily be more efficient. On some
> architectures, arithmetic on shorter integers is slower (or at least no
> faster) than on long ones.

You use them primarily for accessing data, rather than for arithmetic.
Code for handling externally-defined fixed structures is just much
simpler and clearer to write when you can define a "struct" full of
size-specific types that map directly to the defined format. And it is
not unlikely that the results will be more efficient at run-time as well.

It is with good reason that most compilers (IME) have pre-defined macros
that tell you the endianness of the target, many have extensions to let
you have explicitly big-endian or little-endian types, and (IME) all
have extensions letting you have "packed" structures. This means you
read your file data or network packet directly into a struct and access
the fields (checking for sanity, of course - never trust external data
to be in the right format!).

>
>> For hardware access, you need size-specific accesses.
>
> Yes. And alignment. And representation.

Alignment is usually the smaller of the CPU bit width and the width of
the type, but there are exceptions. However, it's easy to check (a
static assertion on the size of the struct compared to the known size of
the externally defined structure is simple and reliable).
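A sketch of that kind of static assertion, assuming a C11 compiler and a hypothetical 12-byte external header (16-bit type, 16-bit flags, 32-bit length, 32-bit sequence number). The struct name and layout are illustrative only. Note that this checks size and placement, not byte order: multi-byte fields are still in host order after a `memcpy` from the wire.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical externally defined layout:
   bytes 0-1 type, 2-3 flags, 4-7 length, 8-11 sequence. */
struct wire_header {
    uint16_t type;
    uint16_t flags;
    uint32_t length;
    uint32_t sequence;
};

/* If the compiler inserted padding anywhere, these fail at compile time
   instead of producing a silently wrong layout at run time. */
static_assert(sizeof(struct wire_header) == 12, "unexpected padding");
static_assert(offsetof(struct wire_header, length) == 4, "length misplaced");
static_assert(offsetof(struct wire_header, sequence) == 8, "sequence misplaced");
```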

Representation is two's complement for signed integers, IEEE for
floating point, no padding bits. There hasn't been anything else made
for the last 40 years or so, except a few Burroughs mainframes.

If you need the absolute most portable code, then you have to do things
long-hand with reading chars, building up your bigger types as you go.
Certainly it can be done - but it is nice not to have to do it.
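For reference, the long-hand approach usually looks something like the sketch below. It deliberately avoids `uint32_t` (using `unsigned long`, which is guaranteed to hold at least 32 bits) and masks each byte in case `CHAR_BIT` is larger than 8. The names `read_be32` and `write_be32` are hypothetical.

```c
/* Assemble a 32-bit big-endian (network order) value from four bytes.
   Independent of host endianness and alignment. */
static unsigned long read_be32(const unsigned char *p)
{
    return ((unsigned long)(p[0] & 0xFF) << 24)
         | ((unsigned long)(p[1] & 0xFF) << 16)
         | ((unsigned long)(p[2] & 0xFF) << 8)
         |  (unsigned long)(p[3] & 0xFF);
}

/* The inverse: store the low 32 bits of v as four big-endian bytes. */
static void write_be32(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)((v >> 24) & 0xFF);
    p[1] = (unsigned char)((v >> 16) & 0xFF);
    p[2] = (unsigned char)((v >> 8) & 0xFF);
    p[3] = (unsigned char)(v & 0xFF);
}
```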

> Giving the language
> size-specific integers is not the solution to this problem, but it's a
> cheap one and has caught on at the expense of doing it "properly".
>

Define "properly". Maximal portability is a requirement for some code -
usually, it is not. If you are already using POSIX network functions or
Windows file functions to get the data, then you can safely use
<stdint.h> fixed-size types and you can make assumptions about various
other implementation-specific features. There is no point in making
code needlessly non-portable, but equally there is no point in making it
needlessly portable.

(I am a big fan of having compile-time failures if portability
assumptions do not hold - thus I am happy to use int32_t knowing that if
you try to compile it for a Burroughs mainframe, you'll get a compiler
error rather than mysterious problems at run-time.)
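A minimal sketch of making that assumption explicit. `int32_t` is optional in C, and `<stdint.h>` defines `INT32_MAX` only when the type is provided, so the check below turns its absence into a clear error message. `sample_t` is a hypothetical name.

```c
#include <stdint.h>

/* On an implementation without an exact-width 32-bit two's complement
   type (e.g. a 36-bit sign-and-magnitude machine), int32_t and INT32_MAX
   are both absent; stop the build with a clear message. */
#ifndef INT32_MAX
#error "this code requires an exact-width int32_t"
#endif

typedef int32_t sample_t;   /* hypothetical use of the exact-width type */
```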

>> Most hardware on
>> most systems involves memory-mapped interfaces, but usually you need to
>> access the registers with the right size - if you have a 32-bit
>> register, you can't get the right results with 4 8-bit accesses or a
>> 64-bit access.
>
> That's what high(ish)-level programming languages and compilers are
> for.

For such low-level accesses, you need low-level control (which you can
get in C).

>
>> Ultimately, what you need is a set of functions like "write_volatile_8",
>> "read_volatile_16", etc.
>
> I thought you were talking about languages in general.

I think things are getting a little mixed up here - I can try to be clearer.

You need something like these access functions or other primitives,
regardless of the language, in order to control memory-mapped hardware.
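In C such primitives are typically thin wrappers around volatile pointers of the right width. This is a sketch only: the names follow the ones mentioned above, but the signatures (taking the address as a `uintptr_t`) are assumptions. The C standard leaves "what constitutes an access" to a volatile object implementation-defined, so the single-access behaviour relies on the compiler, which in practice honours it for naturally aligned objects.

```c
#include <stdint.h>

/* Each wrapper performs one volatile access of exactly the named width. */
static inline void write_volatile_8(uintptr_t addr, uint8_t value)
{
    *(volatile uint8_t *)addr = value;
}

static inline uint16_t read_volatile_16(uintptr_t addr)
{
    return *(volatile const uint16_t *)addr;
}

static inline void write_volatile_32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

static inline uint32_t read_volatile_32(uintptr_t addr)
{
    return *(volatile const uint32_t *)addr;
}
```

On real hardware the addresses would come from the device's register map; ordinary variables stand in for registers when testing on a host.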

> In the general
> case, you need a notation to describe formats, and a syntax for mapping
> fields to values. Those values need not come from, or be stored in, a
> type of any particular size, other than one that is large enough. A
> language with a single integer type could manage perfectly well.

Yes. Size-specific integers make it easier and more efficient, however,
at least in C or a C-like language. Using types that don't match the
fields in the fixed format, you will often need to do "packing" and
"unpacking" steps that are unnecessary when you can use the data directly.

>
>> Size-specific types make low-level programming a great deal easier.
>> You can well argue that you don't /need/ them - but you do /want/
>> them.
>
> I disagree. Every time I use them for this purpose, I realise I
> actually want something else, but I have to make do with what C
> provides.
>
> If C's bit fields were not so very implementation defined, they would be
> a step closer to what I'm talking about.
>

Can you give an example of what you would like here?

On 18/05/2021 16:33, David Brown wrote:
> On 18/05/2021 15:06, Bart wrote:

>> This is an example that won't [survive int becoming 64 bits]:
>>
>> #include <stdio.h>
>>
>> typedef int Int;
>>
>> struct {
>>      union {
>>          long long int a;
>>          struct {Int low, high;};
>>      };
>> } X;
>>
>>
>>
>> int main(void) {
>>       X.a=0x1122334400667788;
>>       printf("%16llX\n",X.a);
>>       printf("%8X%08X\n",X.high, X.low);
>> }
>>
>> This works as expected when Int is 32 bits, so that 'low' and 'high'
>> correspond to the two halves of 'a' (ignoring endian issues). But make
>> Int 64 bits, and it no longer works.
>>
>> Are there really so few programs that make such assumptions?
>
> It is always risky trying to guess how many programs do or do not do
> something. But I would not expect to see an assumption that "int" is
> always exactly 32 bits happening very often - and there would be fewer
> problems with making it 64-bit than with making it 16-bit.
>
> But I accept that interfaces to code that you are not re-compiling, and
> which has "int *" parameters, is going to be a complicating factor.

I simply don't know what will happen. The program I was playing with was
sqlite3.c which contains this:

#ifndef UINT32_TYPE
# ifdef HAVE_UINT32_T
# define UINT32_TYPE uint32_t
# else
# define UINT32_TYPE unsigned int
# endif
#endif

UINT32_TYPE is then used to define:

typedef UINT32_TYPE u32; /* 4-byte unsigned integer */

Notice that under some circumstances, u32 ends up defined on top of
'int'. That same program has:

#if defined(HAVE_STDINT_H)
typedef uintptr_t uptr;
#elif SQLITE_PTRSIZE==4
typedef u32 uptr;
#else
typedef u64 uptr;
#endif

u32 is also used inside structs. If such a struct is used to
interface to external code that doesn't use this definition of u32, then
the layout will be wrong.

This is just one detail of one program. Here's part of another (bzip.c):

#define BZ_VERSION "1.0.2, 30-Dec-2001"

typedef char Char;
typedef unsigned char Bool;
typedef unsigned char UChar;
typedef int Int32;
typedef unsigned int UInt32;
typedef short Int16;
typedef unsigned short UInt16;

It defines Int32 (with Int32* etc used everywhere) on top of 'int'.

I left the date in to show it's quite old, not using stdint.h, but there
is quite a lot of this kind of code about. This is why I have less
confidence than you that programs would mostly still work with a 64-bit int.

"muta...@gmail.com" <mutazilah@gmail.com> writes:

> On Tuesday, May 18, 2021 at 4:54:40 PM UTC+10, David Brown wrote:
>
>> For a new language, there
>> are many possibilities but calling things "short" or "long" is unlikely
>> to be a good choice.
>
> Why do you say that?
>
> What are the first principles you are working from?
> (that Ritchie was apparently not aware of - and I
> certainly am not)

Unexpected numbers of bytes in words have caused fifty years of anguish
for programmers caught unawares. Much, much better to have specified
the size of the variables in a way similar to the 'int32_t' and related
types that have been grafted on after the fact now.

John Dill <jadill33@gmail.com> writes:
>On Tuesday, May 18, 2021 at 8:53:43 AM UTC-4, Ben Bacarisse wrote:
>> David Brown <david...@hesbynett.no> writes:
>>

>> >>> There are low-level uses - required for interaction with hardware,
>> >>> connection to data outside the program (file formats, network protocols,
>> >>> foreign-function interfaces, etc.), or for when you want accurate
>> >>> control such as for getting maximal efficiency from large data
>> >>> structures. In these cases, you want size-specific types. Whether you
>> >>> call them int32_t, i32, Int<32>, etc., is a matter of taste. But they
>> >>> should be size-specific and explicit.
>> >>
>> >> I don't see why you need size-specific integers for that. To avoid a
>> >> lot of pain, you need an integer type that is big enough for the values
>> >> you might have to do arithmetic on, but I don't see the need for
>> >> explicit sized integer types.

>> I disagree. Every time I use them for this purpose, I realise I
>> actually want something else, but I have to make do with what C
>> provides.
>
>I vaguely remember you having this opinion before on not needing
>to use size and encoding specific types for I/O.

There are many places where an access must be exactly X bits.

Take, for example, pretty much any I/O device that provides
a memory mapped register space (or, for that matter,
the PCI configuration space). The size of the I/O request
on the bus must exactly match the I/O register size for the access
to succeed. Can't be misaligned, cannot (except in some specific
and well-defined cases[*]) be larger than the register size, and
very often cannot be smaller (i.e. you can't do two 8-bit
accesses to read a 16-bit register, leaving aside any atomicity
constraints).

If the memory mapped register is 16-bits, a 32-bit store will
never reach the register.

[*] Generally for devices that support both 32-bit and 64-bit
processors, 64-bit address registers will allow accesses
to each 32-bit half.

On Tuesday, 18 May 2021 at 19:14:29 UTC+3, Joe Pfeiffer wrote:
> "muta...@gmail.com" <muta...@gmail.com> writes:
>
> > On Tuesday, May 18, 2021 at 4:54:40 PM UTC+10, David Brown wrote:
> >
> >> For a new language, there
> >> are many possibilities but calling things "short" or "long" is unlikely
> >> to be a good choice.
> >
> > Why do you say that?
> >
> > What are the first principles you are working from?
> > (that Ritchie was apparently not aware of - and I
> > certainly am not)
> Unexpected numbers of bytes in words have caused fifty years of anguish
> for programmers caught unawares. Much, much better to have specified
> the size of the variables in a way similar to the 'int32_t' and related
> types that have been grafted on after the fact now.

The portable format specifiers for these in <inttypes.h> make the printfs
quite ugly, and the fact that arithmetic operations for some of those
types are missing can confuse. Otherwise they are indeed better.

On Tuesday, 18 May 2021 at 16:50:20 UTC+1, Ben Bacarisse wrote:
>
> First off, I'm not talking about C. C manages, and when there are
> problems, we work round them, usually using external tools like the
> build system. This method is unlikely to come unstuck because you
> probably won't come across a machine with 36-bit sign and magnitude
> integers (and thus no int32_t).
>
> C has, perhaps, led people to think that the solution must be lots of
> integer types, ideally all with known sizes. I am saying that what you
> really need is a notation in the language to describe data formats.
>
> Obviously there would be ways to get and set the bits without caring how
> they map to a value. For example, a network address might simply by
> copied from one place to another. But sometimes you need to get or set
> the numeric value of a field. A language with, say, only one unbounded
> integer type could do this just fine. If 'field' has been defined as
> being 24-bits wide using sign and magnitude big-endian representation,
> then
>
> header.field.asInteger
>
> will get the value for us to do arithmetic on.
>
> Such a notation would also probably include explicit alignment and
> padding, so there would be no need for the external format to match what
> the compiler produces.
>
This was my idea for B64. You'd have only two basic types, a 64 bit integer and
a 64 bit floating point type. Strings would be zero-padded multiples of 8 bytes.

Then you'd have "bit buffers" for talking to non-B64 routines, and for specifying
higher-level structures. I never worked out an acceptable bit buffer description
language, however.

On 2021-05-18, Joe Pfeiffer <pfeiffer@cs.nmsu.edu> wrote:
> "muta...@gmail.com" <mutazilah@gmail.com> writes:
>
>> On Tuesday, May 18, 2021 at 4:54:40 PM UTC+10, David Brown wrote:
>>
>>> For a new language, there
>>> are many possibilities but calling things "short" or "long" is unlikely
>>> to be a good choice.
>>
>> Why do you say that?
>>
>> What are the first principles you are working from?
>> (that Ritchie was apparently not aware of - and I
>> certainly am not)
>
> Unexpected numbers of bytes in words have caused fifty years of anguish
> for programmers caught unawares. Much, much better to have specified
> the size of the variables in a way similar to the 'int32_t' and related
> types that have been grafted on after the fact now.

I simply cannot agree with:

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define NITERATIONS ((int32_t) 5)
#define TESTFILE "test.txt"

int32_t main(int32_t argc, char **argv)
{
    for (int32_t i = 0; i < NITERATIONS; i++) {
        int32_t fd = open(TESTFILE, O_RDONLY);
        ...

    }
}

There are all kinds of situations where I just want an abstract int, just for
the sake of not writing in a silly language full of low-level,
irrelevant clutter.

This is not a strawman example. When you say that it's much better to
specify the sizes of variables, this is what you mean.

If you do not mean all variables, then you recognize that there
is a need for non-fixed-size integers.

What if that thinking had been applied thirty years ago?
Then we would be stuck with:

int main(int16 argc, char **argv)

C would be famous for not handling more than 32767 program arguments on
any platform.

The type int has nicely scaled with increasing platform capabilities.
On a platform with 16 or 18 bit int, you wouldn't have that many
arguments to a program. On a 32 bit platform you conceivably could; and
by golly, your "int argc" has scaled up to handle that, without a
code change having been required.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

On 18/05/2021 17:50, Ben Bacarisse wrote:

> But I got the impression that David was describing the use of integer
> types in programming in general and was saying that a range of known-size
> integer types are a required feature of languages that interface with
> hardware.

It is certainly possible to have alternative solutions that don't
involve known-size integer types. But known-size integer types are very
useful for interfacing to hardware, and feature heavily in embedded code
that has hardware interaction.

You mentioned bitfields, and I think it would be fair to say that a
bitfield structure syntax which gave precise control over sizes (and
ideally also alignments, padding and endianness) of fields would work as
well as fixed size types for many purposes.

It's also common for embedded systems to have to balance the range of
integer types and their efficiency. On modern PCs you can use 64-bit
types with no concern for efficiency (unless you are using them in large
arrays) and few worries about the range. On small embedded systems, you
might find that 16-bit or even 8-bit types are faster, so you pick your
sizes carefully - fixed size types make that easier.
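For reference, `<stdint.h>` already has names for this trade-off alongside the exact-width types: `int_least16_t` is the smallest type with at least 16 bits (suited to large arrays), and `int_fast16_t` is the type with at least 16 bits that the implementation considers fastest (suited to locals and loop counters). What "fastest" means is up to the implementation: with glibc on x86-64, for instance, `int_fast16_t` is 64 bits wide. The sketch below is illustrative; the buffer and function names are hypothetical.

```c
#include <stdint.h>

/* Storage: smallest type that holds the range, to keep the array compact. */
static int_least16_t samples[1000];   /* hypothetical sample buffer */

/* Computation: "fast" types for the counter and accumulator. */
static int_fast32_t sum_samples(const int_least16_t *s, int_fast16_t n)
{
    int_fast32_t total = 0;
    for (int_fast16_t i = 0; i < n; i++) {
        total += s[i];
    }
    return total;
}
```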

(I'd be happy to have a good way to define integer types with specific
ranges, like you can in Pascal or Ada, but that's more about stronger
types and better checking than interfacing to hardware.)

"muta...@gmail.com" <mutazilah@gmail.com> writes:

> On Tuesday, May 18, 2021 at 8:26:24 AM UTC+10, Bart wrote:
>
>> > I'm interested in the 32-bit 80386, so several instructions
>> > are required to do 64-bit operations.
>> >
>> > It seems to me that Ritchie covered everything with
>> > short/int/long. If you're on a 32-bit machine and you
>> > want to do 64-bit arithmetic, then make "long" 64-bit.
>> > It will be slow, but that's what it's for. If you want to
>> > be fast, and can live with 32 bits, use "int".
>> >
>> > There's no need for a "long long" and there's no need
>> > for a fpos_t type and new functions.
>
>> If starting from scratch then it makes sense to settle on:
>>
>> char 8 bits
>> short 16
>> int 32
>> long 64
>
> I'm guessing that's what Ritchie had in mind. He had
> everything covered already for a very long time. Well
> past 2021.

I don't see any sign in K&R that they intended anything of
the sort. As it says, "[t]he intent is that short and long should
provide different lengths of integers where practical; int will normally
reflect the most "natural" size for a particular machine. As you can
see, each compiler is free to interpret short and long as appropriate
for its own hardware. About all you should count on is that short is no
longer than long".

The "as you can see" is referring to a table immediately before that
text giving the sizes of short, int, and long for several different
computers; on PDP-11 short and int are 16 bits while long is 32; on
Honeywell they're all 36 bits; on IBM 370 and Interdata they're
16-32-32.

The current 16-32-64 is perfectly reasonable within the context of what
they wrote, and probably the most straightforward way to advance from 32
bit machines to 64 bit machines while breaking as little code as
possible, but to say it was anticipated by 1978 is a real stretch.
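The latitude described in that passage is easy to check on any given implementation. The sketch below is an illustration, not something from the thread: it reports the storage width chosen for each type. On 64-bit Linux this typically shows 16/32/64/64, while on 64-bit Windows `long` stays at 32 bits. Standard C guarantees only minimum ranges and that the ranges do not shrink from `short` to `int` to `long` to `long long`. `print_widths` is a hypothetical name.

```c
#include <limits.h>
#include <stdio.h>

/* Print the storage width, in bits, of each basic integer type. This
   counts storage bits, which would include any padding bits. */
static void print_widths(void)
{
    printf("short %zu, int %zu, long %zu, long long %zu bits\n",
           sizeof(short) * CHAR_BIT, sizeof(int) * CHAR_BIT,
           sizeof(long) * CHAR_BIT, sizeof(long long) * CHAR_BIT);
}
```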

>> 'long long' could be reserved for 128 bits but that's going to be
>> difficult on a 32-bit processor.
>
> Well, if we ever reach that stage, rather than add yet
> more fundamental types, it's probably time to retire
> 16-bit "short" and shift everything up. Processors
> probably don't even have the ability to do 16-bit
> arithmetic anyway. The S/370 doesn't have such
> instructions.

That's a *really* bad idea. Anyone using those types (especially short)
is using them because they have an expectation regarding how wide they
are. Violating that expectation is a great way to break code.

On 18/05/2021 21:29, Joe Pfeiffer wrote:
> "muta...@gmail.com" <mutazilah@gmail.com> writes:

>> I'm guessing that's what Ritchie had in mind. He had
>> everything covered already for a very long time. Well
>> past 2021.
>
> I don't see any sign in K&R that they intended anything of
> the sort. As it says, "[t]he intent is that short and long should
> provide different lengths of integers where practical; int will normally
> reflect the most "natural" size for a particular machine. As you can
> see, each compiler is free to interpret short and long as appropriate
> for its own hardware. About all you should count on is that short is no
> longer than long".
>
> The "as you can see" is referring to a table immediately before that
> text giving the sizes of short, int, and long for several different
> computers; on PDP-11 short and int are 16 bits while long is 32; on
> Honeywell they're all 36 bits; on IBM 370 and Interdata they're
> 16-32-32.
>
> The current 16-32-64

The current state of affairs appears to be 16-32-64 for 64-bit Linux,
and 16-32-32 for 32-bit Linux and 32/64-bit Windows.

> is perfectly reasonable within the context of what
> they wrote, and probably the most straightforward way to advance from 32
> bit machines to 64 bit machines while breaking as little code as
> possible, but to say it was anticipated by 1978 is a real stretch.
>
>>> 'long long' could be reserved for 128 bits but that's going to be
>>> difficult on a 32-bit processor.
>>
>> Well, if we ever reach that stage, rather than add yet
>> more fundamental types, it's probably time to retire
>> 16-bit "short" and shift everything up. Processors
>> probably don't even have the ability to do 16-bit
>> arithmetic anyway. The S/370 doesn't have such
>> instructions.
>
> That's a *really* bad idea. Anyone using those types (especially short)
> is using them because they have an expectation regarding how wide they
> are. Violating that expectation is a great way to break code.

It's also a bad idea because, if you need an array of elements that are
too big for 8 bits but fit within 16, being forced to use 32 bits means
using twice the memory.

Plus it makes it impossible to interface to anything else that has
arrays, structs and pointers that involve 16-bit types.
