devel / comp.lang.c / UTF-8 files and fopen mode.

Subject                                 Author
* UTF-8 files and fopen mode.           Malcolm McLean
+* Re: UTF-8 files and fopen mode.      fir
|`- Re: UTF-8 files and fopen mode.     fir
+* Re: UTF-8 files and fopen mode.      Bart
|+* Re: UTF-8 files and fopen mode.     fir
||`* Re: UTF-8 files and fopen mode.    Bart
|| `* Re: UTF-8 files and fopen mode.   fir
||  `- Re: UTF-8 files and fopen mode.  fir
|`* Re: UTF-8 files and fopen mode.     Michael S
| +- Re: UTF-8 files and fopen mode.    Bart
| `- Re: UTF-8 files and fopen mode.    Vir Campestris
+* Re: UTF-8 files and fopen mode.      Ben Bacarisse
|`* Re: UTF-8 files and fopen mode.     Malcolm McLean
| +* Re: UTF-8 files and fopen mode.    fir
| |+- Re: UTF-8 files and fopen mode.   fir
| |`- Re: UTF-8 files and fopen mode.   fir
| +- Re: UTF-8 files and fopen mode.    Ben Bacarisse
| +- Re: UTF-8 files and fopen mode.    Spiros Bousbouras
| `- Re: UTF-8 files and fopen mode.    Richard Damon
+- Re: UTF-8 files and fopen mode.      Kaz Kylheku
`- Re: UTF-8 files and fopen mode.      Richard Damon

UTF-8 files and fopen mode.

<695264dd-9ded-4686-bc37-829a9f3c141en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26985&group=comp.lang.c#26985
 by: Malcolm McLean - Fri, 4 Aug 2023 09:49 UTC

Should UTF-8 files be opened in binary ("rb") or text ("r") mode when calling fopen()? Are they "text" or a binary representation of text?

Re: UTF-8 files and fopen mode.

<a0847a66-5fb6-41d5-8d21-8534ef676c8an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26987&group=comp.lang.c#26987
 by: fir - Fri, 4 Aug 2023 10:15 UTC

On Friday, 4 August 2023 at 11:49:18 UTC+2, Malcolm McLean wrote:
> Should UTF-8 files be opened in binary ("rb") or text ("r") mode when calling fopen()? Are they "text" or a binary representation of text?

It should be text, as far as I know. As far as I remember, what opening in text mode does is translate 0x0d 0x0a into just 0x0d in RAM; Unicode has nothing to do with it. (By the way, it is a flaw of the OS that it needs 0x0a 0x0d instead of just 0x0d.)

Re: UTF-8 files and fopen mode.

<uaijva$182h3$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26988&group=comp.lang.c#26988
 by: Bart - Fri, 4 Aug 2023 10:30 UTC

On 04/08/2023 10:49, Malcolm McLean wrote:
> Should UTF-8 files be opened in binary ("rb") or text ("r") mode when calling fopen()? Are they "text" or a binary representation of text?

I open all files in binary mode.

It means that if processing text files on Windows, you may encounter
either CRLF or LF-only line endings. But that's easy enough to deal
with: just ignore CR.

(Doubtless someone will be along soon to point out that some 1970s
mainframes had completely different arrangements, and that any such code
will not work on any such machines that are still running.)
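
A minimal sketch of the "open in binary, ignore CR" approach described above, assuming an ASCII-based platform where '\n' is byte 0x0A and '\r' is byte 0x0D. The helper name getc_skip_cr is hypothetical and does not come from the thread; it wraps getc on a stream opened with "rb" and drops every CR, so CRLF and LF-only files read the same.

```c
#include <stdio.h>

/* Return the next byte from a binary-mode stream, skipping every
 * CR (0x0D). A CRLF pair therefore reaches the caller as a single
 * '\n', and an LF-only file is passed through unchanged.
 * Returns EOF at end of file or on a read error, like getc. */
static int getc_skip_cr(FILE *fp)
{
    int c;

    do {
        c = getc(fp);
    } while (c == '\r');
    return c;
}
```

A caller would open the file with fopen(path, "rb") and use getc_skip_cr wherever it would otherwise call getc. This also discards a lone CR in the middle of a line, which is the trade-off of simply ignoring CR.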

Re: UTF-8 files and fopen mode.

<144320f6-abd1-4786-8b00-405264310784n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26989&group=comp.lang.c#26989
 by: fir - Fri, 4 Aug 2023 10:34 UTC

On Friday, 4 August 2023 at 12:30:20 UTC+2, Bart wrote:
> On 04/08/2023 10:49, Malcolm McLean wrote:
> > Should UTF-8 files be opened in binary ("rb") or text ("r") mode when calling fopen()? Are they "text" or a binary representation of text?
> I open all files in binary mode.
>
> It means that if processing text files on Windows, you may encounter
> either CRLF or LF-only line endings. But that's easy enough to deal
> with: just ignore CR.
>
> (Doubtless someone will be along soon to point out that some 1970s
> mainframes had completely different arrangements, and that any such code
> will not work on any such machines that are still running.)

And if you save your strings like "\nsome\nhere", will the result work as
a proper Windows text file?
I open files in text mode, but if you write your own library you can cover such
things in there (LoadTextFile("some.asm"); LoadBinaryFile("some.bmp");), so it is no big deal.
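
A rough sketch of what such a pair of library wrappers could look like in C. The names load_file, load_text_file and load_binary_file are hypothetical, modelled on the LoadTextFile and LoadBinaryFile calls mentioned above; the only difference between the two wrappers is the mode string passed to fopen.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a malloc'd, NUL-terminated buffer.
 * mode is passed straight to fopen ("r" or "rb").
 * Returns NULL on failure; the caller frees the result. */
static char *load_file(const char *path, const char *mode, size_t *len_out)
{
    FILE *fp = fopen(path, mode);
    char *buf = NULL;
    size_t len = 0, cap = 0, got;

    if (fp == NULL)
        return NULL;
    for (;;) {
        if (len + 1 >= cap) {            /* keep room for the terminator */
            size_t new_cap = cap ? cap * 2 : 4096;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        got = fread(buf + len, 1, cap - len - 1, fp);
        if (got == 0)
            break;
        len += got;
    }
    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    buf[len] = '\0';
    if (len_out != NULL)
        *len_out = len;
    return buf;
}

/* Text mode: the C library may translate line endings. */
static char *load_text_file(const char *path, size_t *len_out)
{
    return load_file(path, "r", len_out);
}

/* Binary mode: bytes arrive exactly as stored. */
static char *load_binary_file(const char *path, size_t *len_out)
{
    return load_file(path, "rb", len_out);
}
```

The loop grows the buffer instead of asking for the file size up front, because a size obtained with fseek/ftell is not guaranteed to match the number of characters a text-mode stream delivers.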

Re: UTF-8 files and fopen mode.

<ca53be57-4bc7-49ab-b6e0-dd476490e3c5n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26990&group=comp.lang.c#26990
 by: Michael S - Fri, 4 Aug 2023 10:47 UTC

On Friday, August 4, 2023 at 1:30:20 PM UTC+3, Bart wrote:
> On 04/08/2023 10:49, Malcolm McLean wrote:
> > Should UTF-8 files be opened in binary ("rb") or text ("r") mode when calling fopen()? Are they "text" or a binary representation of text?
> I open all files in binary mode.
>
> It means that if processing text files on Windows, you may encounter
> either CRLF or LF-only line endings. But that's easy enough to deal
> with: just ignore CR.
>
> (Doubtless someone will be along soon to point out that some 1970s
> mainframes had completely different arrangements, and that any such code
> will not work on any such machines that are still running.)

A far more recent and far less obscure example is classic Mac OS.
Of course, in practice it is even less likely to be used today by real users (as
opposed to computer history enthusiasts) than "some 1970s mainframes".

Re: UTF-8 files and fopen mode.

<uaipo6$18the$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26992&group=comp.lang.c#26992
 by: Bart - Fri, 4 Aug 2023 12:08 UTC

On 04/08/2023 11:47, Michael S wrote:
> On Friday, August 4, 2023 at 1:30:20 PM UTC+3, Bart wrote:
>> On 04/08/2023 10:49, Malcolm McLean wrote:
>>> Should UTF-8 files be opened in binary ("rb") or text ("r") mode when calling fopen()? Are they "text" or a binary representation of text?
>> I open all files in binary mode.
>>
>> It means that if processing text files on Windows, you may encounter
>> either CRLF or LF-only line endings. But that's easy enough to deal
>> with: just ignore CR.
>>
>> (Doubtless someone will be along soon to point out that some 1970s
>> mainframes had completely different arrangements, and that any such code
>> will not work on any such machines that are still running.)
>
> Far more recent and far less obscure example is Mac OS classic.
> Of course, in practice it's even less likely to be used today by real users (as
> opposed to computer history enthusiasts) than "some 1970s mainframes"

Do you mean CR-only line endings?

I don't think I've encountered a file using that since the 1990s. It can
probably be accommodated too (treat CR as though it was LF when not
followed by LF), but I don't bother.

A file with mixed line-endings though would not work, and would anyway
be ambiguous: CRLF could mean one newline, or two with an intervening
blank line.
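
The rule mentioned above (treat CR as though it were LF when it is not followed by LF) can be sketched roughly as follows. The function name getc_any_newline is hypothetical, and the sketch assumes a stream opened with "rb" on an ASCII-based platform. It resolves the ambiguity by always counting CR LF as one newline.

```c
#include <stdio.h>

/* Return the next byte from a binary-mode stream, folding all three
 * line-ending conventions into '\n':
 *   CR LF -> '\n'   (one newline, not two)
 *   CR    -> '\n'
 *   LF    -> '\n'
 * Returns EOF at end of file or on a read error. */
static int getc_any_newline(FILE *fp)
{
    int c = getc(fp);

    if (c == '\r') {
        int next = getc(fp);
        /* Push back anything that is not the LF of a CRLF pair. */
        if (next != '\n' && next != EOF)
            ungetc(next, fp);
        return '\n';
    }
    return c;
}
```

Only one byte is ever pushed back between reads, which is the amount of pushback ungetc is guaranteed to support.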

Re: UTF-8 files and fopen mode.

<uaiqjl$194pq$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=26993&group=comp.lang.c#26993
 by: Bart - Fri, 4 Aug 2023 12:23 UTC

On 04/08/2023 11:34, fir wrote:
> On Friday, 4 August 2023 at 12:30:20 UTC+2, Bart wrote:
>> On 04/08/2023 10:49, Malcolm McLean wrote:
>>> Should UTF-8 files be opened in binary ("rb") or text ("r") mode
>>> when calling fopen()? Are they "text" or a binary representation of text?
>> I open all files in binary mode.
>>
>> It means that if processing text files on Windows, you may encounter
>> either CRLF or LF-only line endings. But that's easy enough to deal
>> with: just ignore CR.
>>
>> (Doubtless someone will be along soon to point out that some 1970s
>> mainframes had completely different arrangements, and that any such code
>> will not work on any such machines that are still running.)
>
> and if you will save yours strings like "\nsome\nhere" it will work as
> a proper windows text file?

TBH I hardly know whether my strings (even from my language) use CRLF or
LF, or whether text files are opened for writing as binary or text.

Because whether the resulting files contain CRLF or only LF, doesn't
matter any more because my programs that read them are oblivious to that
detail.

However, my script language uses a 'createfile' function, a wrapper
around 'fopen', and the default file mode is "wb". My text editor
creates files using that, so most text files from it use LF-only.

But I didn't even know until I checked it just now.

Where 'fopen' is called directly for text output, sometimes it uses "w",
and sometimes "wb".

> i use opening in text mode but if you will write your own library you covered such
> things in there (LoadTextFile("some.asm"); LoadBinaryFile("some.bmp"); so no big deal

If you open an image file in the PPM 'P6' format, it uses an ASCII
header, which has text separated by newlines, with the last newline
followed by binary data.

This needs to be opened in binary mode, but it means any CRLF line
endings are exposed as 13, 10 bytes, which may confuse a C program
expecting only '\n' characters.
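
As an illustration of reading such a header in binary mode without being thrown by CRLF, here is a hedged sketch. The function names ppm_read_field and ppm_read_header are hypothetical. The header layout assumed is the usual "P6", width, height and maximum sample value separated by whitespace, with '#' comments allowed, then one whitespace byte before the pixel data. Swallowing an LF after a trailing CR is a heuristic of this sketch, not something the format promises.

```c
#include <ctype.h>
#include <stdio.h>

/* Read the next unsigned decimal field of a PPM header from a stream
 * opened with "rb". Leading whitespace (CR counts as whitespace) and
 * '#' comments are skipped. Returns the value, or -1 on error. */
static long ppm_read_field(FILE *fp)
{
    long value = 0;
    int c = getc(fp);

    for (;;) {
        while (c != EOF && isspace(c))
            c = getc(fp);
        if (c != '#')
            break;
        while (c != EOF && c != '\n')   /* comment runs to end of line */
            c = getc(fp);
    }
    if (c == EOF || !isdigit(c))
        return -1;
    while (c != EOF && isdigit(c)) {
        value = value * 10 + (c - '0');
        if (value > 1000000L)           /* arbitrary sanity limit */
            return -1;
        c = getc(fp);
    }
    if (c != EOF)
        ungetc(c, fp);
    return value;
}

/* Parse "P6 <width> <height> <maxval>" and leave the stream positioned
 * at the first byte of pixel data. Returns 0 on success, -1 on error. */
static int ppm_read_header(FILE *fp, long *width, long *height, long *maxval)
{
    int c;

    if (getc(fp) != 'P' || getc(fp) != '6')
        return -1;
    *width = ppm_read_field(fp);
    *height = ppm_read_field(fp);
    *maxval = ppm_read_field(fp);
    if (*width <= 0 || *height <= 0 || *maxval <= 0 || *maxval > 65535)
        return -1;

    c = getc(fp);                       /* the one separator byte */
    if (c == EOF || !isspace(c))
        return -1;
    if (c == '\r') {
        /* Assumption: a CR here is half of a CRLF added by some tool,
         * so a following LF is swallowed too. A pixel byte equal to
         * 0x0A directly after a lone CR would be misread. */
        int next = getc(fp);
        if (next != '\n' && next != EOF)
            ungetc(next, fp);
    }
    return 0;
}
```

After ppm_read_header succeeds, the pixel data can be read with fread from the same stream, since binary mode leaves those bytes untouched.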

Re: UTF-8 files and fopen mode.

<871qgjueyv.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=26994&group=comp.lang.c#26994
 by: Ben Bacarisse - Fri, 4 Aug 2023 13:14 UTC

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> Should UTF-8 files be opened in binary ("rb") or text ("r") mode when
> calling fopen()? Are they "text" or a binary representation of text?

On Unix-like systems it does not matter. On Windows I would say it
depends on whether you want the \n <=> \r\n translation to be done for
you (but it's been a while since I used Windows).
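
One way to observe the translation described above is to write through a text-mode stream and then count the bytes that actually reached the file. This is only a sketch; the function name bytes_written_for_newline and the scratch file path passed to it are hypothetical.

```c
#include <stdio.h>

/* Write the two characters "a\n" through a text-mode stream, then
 * count the bytes that landed in the file by reading it back in
 * binary mode. Returns the byte count, or -1 on an I/O error. */
static long bytes_written_for_newline(const char *path)
{
    FILE *fp = fopen(path, "w");        /* text mode */
    long count = 0;

    if (fp == NULL)
        return -1;
    if (fputs("a\n", fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, "rb");             /* binary mode: no translation */
    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)
        count++;
    fclose(fp);
    return count;
}
```

On Unix-like systems the two modes behave the same, so the count should be 2. With the usual Windows C runtimes the '\n' is written as CR LF, so the count should be 3.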

--
Ben.

Re: UTF-8 files and fopen mode.

<414d94b3-aa7d-4a32-990b-132291bcd533n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26996&group=comp.lang.c#26996
 by: fir - Fri, 4 Aug 2023 13:25 UTC

On Friday, 4 August 2023 at 14:23:33 UTC+2, Bart wrote:
> On 04/08/2023 11:34, fir wrote:
> > On Friday, 4 August 2023 at 12:30:20 UTC+2, Bart wrote:
> >> On 04/08/2023 10:49, Malcolm McLean wrote:
> >>> Should UTF-8 files be opened in binary ("rb") or text ("r") mode
> >>> when calling fopen()? Are they "text" or a binary representation of text?
> >> I open all files in binary mode.
> >>
> >> It means that if processing text files on Windows, you may encounter
> >> either CRLF or LF-only line endings. But that's easy enough to deal
> >> with: just ignore CR.
> >>
> >> (Doubtless someone will be along soon to point out that some 1970s
> >> mainframes had completely different arrangements, and that any such code
> >> will not work on any such machines that are still running.)
> >
> > and if you will save yours strings like "\nsome\nhere" it will work as
> > a proper windows text file?
> TBH I hardly know whether my strings (even from my language) use CRLF or
> LF, or whether text files are opened for writing as binary or text.
>
> Because whether the resulting files contain CRLF or only LF, doesn't
> matter any more because my programs that read them are oblivious to that
> detail.
>
> However, my script language uses a 'createfile' function, a wrapper
> around 'fopen', and the default file mode is "wb". My text editor
> creates files using that, so most text files from it use LF-only.
>
> But I didn't even known until I checked it just now.
>
> Where 'fopen' is called directly for text output, sometimes it uses "w",
> and sometimes "wb".
> > i use opening in text mode but if you will write your own library you covered such
> > things in there (LoadTextFile("some.asm"); LoadBinaryFile("some.bmp"); so no big deal
> If you open an image file in the PPM 'P6' format, it uses an ASCII
> header, which has text separated by newlines, with the last newline
> followed by binary data.
>
> This needs to be opened in binary mode, but it means any CRLF line
> endings are exposed as 13, 10 bytes, which may confuse a C program
> expecting only '\n' characters.

It seems I had not seen this clearly either. For some reason I thought, for example, that "\n" in C is 13, maybe because I coded a lot on the Commodore 64 as a 13-year-old; on the Commodore it was said to be 13, and I seem to have remembered that.

I also checked, and Windows in fact seems to have no trouble with Unix LF-only files.
With some surprise I discovered that I save Unix-style asm files in furia, but that is in fact what I want. I also read input files in org-asm in binary mode, so they may contain CRLF, but
that was also what I wanted (I optionally wanted to do some text statistics, etc.).

But it seems that in normal circumstances it is probably best to open in text mode (to get only LF) and to save in binary mode (also to get only LF, if C is LF-based internally too).

Re: UTF-8 files and fopen mode.

<e39a88db-1e38-4185-9edb-f68685e4ad83n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26997&group=comp.lang.c#26997
 by: fir - Fri, 4 Aug 2023 13:27 UTC

On Friday, 4 August 2023 at 12:15:29 UTC+2, fir wrote:
> On Friday, 4 August 2023 at 11:49:18 UTC+2, Malcolm McLean wrote:
> > Should UTF-8 files be opened in binary ("rb") or text ("r") mode when calling fopen()? Are they "text" or a binary representation of text?
> it should be text afait ..what opening in text mode do it translates 0x0d 0x0a into only 0x0d in ram afair - unicode has nothing to do it (its btw the flop of os that needs 0x0a 0x0d instead of 0xd only)

In UTF-8 all the non-ASCII Unicode characters are encoded with byte values of 128 and above, as far as I know, so the choice between text mode and binary mode does not interfere with Unicode/ANSI.
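
The byte ranges can be checked with a small encoder. This is a sketch of the standard UTF-8 bit layout; the function name utf8_encode is hypothetical. For every code point from U+0080 upward, each output byte has its top bit set, so the bytes 0x0D and 0x0A only ever appear in a UTF-8 file as real CR and LF characters.

```c
#include <stddef.h>

/* Encode one Unicode code point as UTF-8 into out[0..3].
 * Returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate or lies beyond U+10FFFF. */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* ASCII: one byte below 0x80 */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 11110xxx and three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U+20AC (the euro sign) comes out as the three bytes 0xE2 0x82 0xAC, none of which a text-mode stream would mistake for part of a line ending.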

Re: UTF-8 files and fopen mode.

<72f67c48-4d1b-41a3-a796-81ccfa260a82n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26998&group=comp.lang.c#26998
 by: fir - Fri, 4 Aug 2023 13:29 UTC

On Friday, 4 August 2023 at 15:25:27 UTC+2, fir wrote:
> On Friday, 4 August 2023 at 14:23:33 UTC+2, Bart wrote:
> > On 04/08/2023 11:34, fir wrote:
> > > On Friday, 4 August 2023 at 12:30:20 UTC+2, Bart wrote:
> > >> On 04/08/2023 10:49, Malcolm McLean wrote:
> > >>> Should UTF-8 files be opened in binary ("rb") or text ("r") mode
> > >>> when calling fopen()? Are they "text" or a binary representation of text?
> > >> I open all files in binary mode.
> > >>
> > >> It means that if processing text files on Windows, you may encounter
> > >> either CRLF or LF-only line endings. But that's easy enough to deal
> > >> with: just ignore CR.
> > >>
> > >> (Doubtless someone will be along soon to point out that some 1970s
> > >> mainframes had completely different arrangements, and that any such code
> > >> will not work on any such machines that are still running.)
> > >
> > > and if you will save yours strings like "\nsome\nhere" it will work as
> > > a proper windows text file?
> > TBH I hardly know whether my strings (even from my language) use CRLF or
> > LF, or whether text files are opened for writing as binary or text.
> >
> > Because whether the resulting files contain CRLF or only LF, doesn't
> > matter any more because my programs that read them are oblivious to that
> > detail.
> >
> > However, my script language uses a 'createfile' function, a wrapper
> > around 'fopen', and the default file mode is "wb". My text editor
> > creates files using that, so most text files from it use LF-only.
> >
> > But I didn't even known until I checked it just now.
> >
> > Where 'fopen' is called directly for text output, sometimes it uses "w",
> > and sometimes "wb".
> > > i use opening in text mode but if you will write your own library you covered such
> > > things in there (LoadTextFile("some.asm"); LoadBinaryFile("some.bmp"); so no big deal
> > If you open an image file in the PPM 'P6' format, it uses an ASCII
> > header, which has text separated by newlines, with the last newline
> > followed by binary data.
> >
> > This needs to be opened in binary mode, but it means any CRLF line
> > endings are exposed as 13, 10 bytes, which may confuse a C program
> > expecting only '\n' characters.
> it seemd i was also not clearly seen this (for some reason for example i thought "\n" in c is 13 - maybe becouse i was coding on commodore64 a lot as 13 year old, and on commodere its said it was 13 and i seem ro remember that)
>
> i also checked in fact windows seem to have no trouble with unix LF only files
> with some surprise i discovered i save unix asm files in furia..but that is what i in fact want..i also read input files in org-asm in binary to have eventually crlf but
> it was also what i wanted (i optionally wanted do soem text statistics etc)
>
> but it seem in normal cirucumstances its probably to open in text mode (to have only LF and save in binary - also to have only LF if c is LF based internally to)

Besides, this CRLF is an annoyance, so it is really better to use LF if C supports that and Windows works with it (however, I am also sure that most files I have seen on Windows used CRLF, not LF).

Re: UTF-8 files and fopen mode.

<72d30e83-0ecd-480b-aeb7-45e0c46070fcn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=26999&group=comp.lang.c#26999
 by: Malcolm McLean - Fri, 4 Aug 2023 13:29 UTC

On Friday, 4 August 2023 at 14:14:48 UTC+1, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>
> > Should UTF-8 files be opened in binary ("rb") or text ("r") mode when
> > calling fopen()? Are they "text" or a binary representation of text?
> On Unix-like systems it does not matter. On Windows I would say it
> depends on whether you want the \n <=> \r\n translation to be done for
> you (but it's been a while since I used Windows).
>
I tested the Baby X resource compiler on Windows. The project is hosted
on github, and it contains a short example UTF-8 file, as well as an example
UTF-16 file. The UTF-8 file has the extension ".txt". On my Apple Mac,
both the UTF-8 file and the UTF-16 file, when translated to UTF-16 and output,
output a single newline. On Windows, the UTF-8 file has a carriage return,
whilst the UTF-16 file does not. So what must have happened is that git has
added the carriage return when downloading to Windows.
I can fix the problem by opening the UTF-8 file with "r" rather than "rb", but
I'm not sure this is the right thing to do. The idea of UTF-8 is surely one
representation?

Re: UTF-8 files and fopen mode.

<46fd7896-f294-4a52-b08e-3362f7affc84n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27000&group=comp.lang.c#27000

X-Received: by 2002:a05:622a:1a9b:b0:40f:b15e:cd08 with SMTP id s27-20020a05622a1a9b00b0040fb15ecd08mr5579qtc.1.1691156182692;
Fri, 04 Aug 2023 06:36:22 -0700 (PDT)
X-Received: by 2002:a05:6808:128c:b0:3a7:3497:2d28 with SMTP id
a12-20020a056808128c00b003a734972d28mr2552859oiw.7.1691156182490; Fri, 04 Aug
2023 06:36:22 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 4 Aug 2023 06:36:22 -0700 (PDT)
In-Reply-To: <72d30e83-0ecd-480b-aeb7-45e0c46070fcn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=5.172.255.203; posting-account=Sb6m8goAAABbWsBL7gouk3bfLsuxwMgN
NNTP-Posting-Host: 5.172.255.203
References: <695264dd-9ded-4686-bc37-829a9f3c141en@googlegroups.com>
<871qgjueyv.fsf@bsb.me.uk> <72d30e83-0ecd-480b-aeb7-45e0c46070fcn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <46fd7896-f294-4a52-b08e-3362f7affc84n@googlegroups.com>
Subject: Re: UTF-8 files and fopen mode.
From: profesor...@gmail.com (fir)
Injection-Date: Fri, 04 Aug 2023 13:36:22 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: fir - Fri, 4 Aug 2023 13:36 UTC

On Friday, 4 August 2023 at 15:30:13 UTC+2, Malcolm McLean wrote:
> On Friday, 4 August 2023 at 14:14:48 UTC+1, Ben Bacarisse wrote:
> > Malcolm McLean <malcolm.ar...@gmail.com> writes:
> >
> > > Should UTF-8 files be opened in binary ("rb") or text ("r") mode when
> > > calling fopen()? Are they "text" or a binary representation of text?
> > On Unix-like systems it does not matter. On Windows I would say it
> > depends on whether you want the \n <=> \r\n translation to be done for
> > you (but it's been a while since I used Windows).
> >
> I tested the Baby X resource compiler on Windows. The project is hosted
> on github, and it contains a short example UTF-8 file, as well as an example
> UTF-16 file. The UTF-8 file has the extension ".txt". On my Apple Mac,
> both the UTF-8 file and the UTF-16 file, when translated to UTF-16 and output,
> output a single newline. On Windows, the UTF-8 file has a carriage return,
> whilst the UTF-16 file does not. So what must have happened is that git has
> added the carriage return when downloading to Windows.
> I can fix the problem by opening the UTF-8 file with "r" rather than "rb", but
> I'm not sure this is the right thing to do. The idea of UTF-8 is surely one
> representation?

ANSI text can use CR (Mac/Commodore 64), LF (Unix/Amiga), or CRLF (DOS/Windows),
so I think UTF is exactly the same. Native for Windows is CRLF, but as all newer Mac,
Windows, and Unix systems use LF (and "\n" in C is also LF), the best is IMO to use LF.

So open in text, save in binary (if you take the same approach as I do: I only code for the system I know I am coding on, not in a generalized way). The more generalized way would probably be to open as text and save as text.

Re: UTF-8 files and fopen mode.

<4b5da98d-5980-4bd5-b834-01189b3415ecn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27001&group=comp.lang.c#27001

X-Received: by 2002:ae9:de06:0:b0:76c:af53:39fd with SMTP id s6-20020ae9de06000000b0076caf5339fdmr4738qkf.7.1691156869216;
Fri, 04 Aug 2023 06:47:49 -0700 (PDT)
X-Received: by 2002:a05:6808:2225:b0:3a3:a8d1:1aa1 with SMTP id
bd37-20020a056808222500b003a3a8d11aa1mr2499063oib.2.1691156869018; Fri, 04
Aug 2023 06:47:49 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 4 Aug 2023 06:47:48 -0700 (PDT)
In-Reply-To: <46fd7896-f294-4a52-b08e-3362f7affc84n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=5.172.255.203; posting-account=Sb6m8goAAABbWsBL7gouk3bfLsuxwMgN
NNTP-Posting-Host: 5.172.255.203
References: <695264dd-9ded-4686-bc37-829a9f3c141en@googlegroups.com>
<871qgjueyv.fsf@bsb.me.uk> <72d30e83-0ecd-480b-aeb7-45e0c46070fcn@googlegroups.com>
<46fd7896-f294-4a52-b08e-3362f7affc84n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <4b5da98d-5980-4bd5-b834-01189b3415ecn@googlegroups.com>
Subject: Re: UTF-8 files and fopen mode.
From: profesor...@gmail.com (fir)
Injection-Date: Fri, 04 Aug 2023 13:47:49 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3684
 by: fir - Fri, 4 Aug 2023 13:47 UTC

On Friday, 4 August 2023 at 15:36:31 UTC+2, fir wrote:
> On Friday, 4 August 2023 at 15:30:13 UTC+2, Malcolm McLean wrote:
> > On Friday, 4 August 2023 at 14:14:48 UTC+1, Ben Bacarisse wrote:
> > > Malcolm McLean <malcolm.ar...@gmail.com> writes:
> > >
> > > > Should UTF-8 files be opened in binary ("rb") or text ("r") mode when
> > > > calling fopen()? Are they "text" or a binary representation of text?
> > > On Unix-like systems it does not matter. On Windows I would say it
> > > depends on whether you want the \n <=> \r\n translation to be done for
> > > you (but it's been a while since I used Windows).
> > >
> > I tested the Baby X resource compiler on Windows. The project is hosted
> > on github, and it contains a short example UTF-8 file, as well as an example
> > UTF-16 file. The UTF-8 file has the extension ".txt". On my Apple Mac,
> > both the UTF-8 file and the UTF-16 file, when translated to UTF-16 and output,
> > output a single newline. On Windows, the UTF-8 file has a carriage return,
> > whilst the UTF-16 file does not. So what must have happened is that git has
> > added the carriage return when downloading to Windows.
> > I can fix the problem by opening the UTF-8 file with "r" rather than "rb", but
> > I'm not sure this is the right thing to do. The idea of UTF-8 is surely one
> > representation?
> ANSI text can use CR (Mac/Commodore 64), LF (Unix/Amiga), or CRLF (DOS/Windows),
> so I think UTF is exactly the same. Native for Windows is CRLF, but as all newer Mac,
> Windows, and Unix systems use LF (and "\n" in C is also LF), the best is IMO to use LF.
>
> So open in text, save in binary (if you take the same approach as I do: I only code for the system I know I am coding on, not in a generalized way). The more generalized way would probably be to open as text and save as text.

I assumed that saving as text adds CRLF, and I tested it and it does. It is a waste of disk space: I tested it on my furia compilation to assembly, and "wb" produces 42 578 bytes where "wt" produces 44 678, a notable waste of space (this shows, by the way, that my assumption that a line of code is about 20 bytes is OK, though here the code is asm, not C).

An easy way to count: save in both "wb" and "wt"; the difference is the number of lines. Compare it to the size to get the average line length.
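
The arithmetic above can be sketched in C. This is only an illustration with hypothetical helper names (`lines_from_sizes`, `average_line_length`); it assumes the "wt" file differs from the "wb" file by exactly one extra CR per line, which is what text mode does on Windows.

```c
#include <assert.h>

/* Hypothetical helper: number of lines inferred from the sizes of the same
   text saved with "wt" (CRLF on Windows) and with "wb" (LF only). */
long lines_from_sizes(long size_text, long size_binary)
{
    assert(size_text >= size_binary);
    return size_text - size_binary;   /* one extra CR byte per line */
}

/* Hypothetical helper: average bytes per line in the LF-only file,
   counting the LF terminator itself. */
double average_line_length(long size_binary, long lines)
{
    assert(lines > 0);
    return (double)size_binary / (double)lines;
}
```

With the sizes quoted above, `lines_from_sizes(44678, 42578)` gives 2100 lines, and `average_line_length(42578, 2100)` comes out a little over 20 bytes per line.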

Re: UTF-8 files and fopen mode.

<2f245a49-3eb4-4de3-b333-643117227e45n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=27002&group=comp.lang.c#27002

X-Received: by 2002:a05:622a:8113:b0:40f:91be:b62a with SMTP id jx19-20020a05622a811300b0040f91beb62amr9696qtb.6.1691157326261;
Fri, 04 Aug 2023 06:55:26 -0700 (PDT)
X-Received: by 2002:a05:6870:5a98:b0:1bb:826a:742b with SMTP id
dt24-20020a0568705a9800b001bb826a742bmr1970108oab.3.1691157325823; Fri, 04
Aug 2023 06:55:25 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 4 Aug 2023 06:55:25 -0700 (PDT)
In-Reply-To: <46fd7896-f294-4a52-b08e-3362f7affc84n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=5.172.255.203; posting-account=Sb6m8goAAABbWsBL7gouk3bfLsuxwMgN
NNTP-Posting-Host: 5.172.255.203
References: <695264dd-9ded-4686-bc37-829a9f3c141en@googlegroups.com>
<871qgjueyv.fsf@bsb.me.uk> <72d30e83-0ecd-480b-aeb7-45e0c46070fcn@googlegroups.com>
<46fd7896-f294-4a52-b08e-3362f7affc84n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2f245a49-3eb4-4de3-b333-643117227e45n@googlegroups.com>
Subject: Re: UTF-8 files and fopen mode.
From: profesor...@gmail.com (fir)
Injection-Date: Fri, 04 Aug 2023 13:55:26 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3621
 by: fir - Fri, 4 Aug 2023 13:55 UTC

On Friday, 4 August 2023 at 15:36:31 UTC+2, fir wrote:
> On Friday, 4 August 2023 at 15:30:13 UTC+2, Malcolm McLean wrote:
> > On Friday, 4 August 2023 at 14:14:48 UTC+1, Ben Bacarisse wrote:
> > > Malcolm McLean <malcolm.ar...@gmail.com> writes:
> > >
> > > > Should UTF-8 files be opened in binary ("rb") or text ("r") mode when
> > > > calling fopen()? Are they "text" or a binary representation of text?
> > > On Unix-like systems it does not matter. On Windows I would say it
> > > depends on whether you want the \n <=> \r\n translation to be done for
> > > you (but it's been a while since I used Windows).
> > >
> > I tested the Baby X resource compiler on Windows. The project is hosted
> > on github, and it contains a short example UTF-8 file, as well as an example
> > UTF-16 file. The UTF-8 file has the extension ".txt". On my Apple Mac,
> > both the UTF-8 file and the UTF-16 file, when translated to UTF-16 and output,
> > output a single newline. On Windows, the UTF-8 file has a carriage return,
> > whilst the UTF-16 file does not. So what must have happened is that git has
> > added the carriage return when downloading to Windows.
> > I can fix the problem by opening the UTF-8 file with "r" rather than "rb", but
> > I'm not sure this is the right thing to do. The idea of UTF-8 is surely one
> > representation?
> ANSI text can use CR (Mac/Commodore 64), LF (Unix/Amiga), or CRLF (DOS/Windows),
> so I think UTF is exactly the same. Native for Windows is CRLF, but as all newer Mac,
> Windows, and Unix systems use LF (and "\n" in C is also LF), the best is IMO to use LF.
>
> So open in text, save in binary (if you take the same approach as I do: I only code for the system I know I am coding on, not in a generalized way). The more generalized way would probably be to open as text and save as text.

From another point of view, one could say that some systems
could in fact support those "printer scripts", where you could send CR and type one text
over another, and not only this but also the rest of the codes, like bell etc.

In that view, the incompatibility was created by the people who changed things
so that LF performs a CR automatically (which, I guess, breaks
compatibility with the original scripts).

Re: UTF-8 files and fopen mode.

<87pm42ucnh.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=27005&group=comp.lang.c#27005

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.c
Subject: Re: UTF-8 files and fopen mode.
Date: Fri, 04 Aug 2023 15:04:34 +0100
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <87pm42ucnh.fsf@bsb.me.uk>
References: <695264dd-9ded-4686-bc37-829a9f3c141en@googlegroups.com>
<871qgjueyv.fsf@bsb.me.uk>
<72d30e83-0ecd-480b-aeb7-45e0c46070fcn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="82be12c07f28e99381fe1dae5b0b0603";
logging-data="1378238"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/iMXek5NuX4FfsZvPKWm9S+I1dHrFNXxE="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:4XmmaenckYClO/JDZEa/E4IEoP0=
sha1:GVEYmXY04SaOBVbsxOaxD1el3pk=
X-BSB-Auth: 1.8f7546098606a0c5b9c5.20230804150434BST.87pm42ucnh.fsf@bsb.me.uk
 by: Ben Bacarisse - Fri, 4 Aug 2023 14:04 UTC

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On Friday, 4 August 2023 at 14:14:48 UTC+1, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>>
>> > Should UTF-8 files be opened in binary ("rb") or text ("r") mode when
>> > calling fopen()? Are they "text" or a binary representation of text?
>> On Unix-like systems it does not matter. On Windows I would say it
>> depends on whether you want the \n <=> \r\n translation to be done for
>> you (but it's been a while since I used Windows).
>>
> I tested the Baby X resource compiler on Windows. The project is hosted
> on github, and it contains a short example UTF-8 file, as well as an example
> UTF-16 file. The UTF-8 file has the extension ".txt". On my Apple Mac,
> both the UTF-8 file and the UTF-16 file, when translated to UTF-16 and output,
> output a single newline. On Windows, the UTF-8 file has a carriage return,
> whilst the UTF-16 file does not. So what must have happened is that git has
> added the carriage return when downloading to Windows.
> I can fix the problem by opening the UTF-8 file with "r" rather than "rb", but
> I'm not sure this is the right thing to do. The idea of UTF-8 is surely one
> representation?

Ditto ASCII. But the problem is that

line\n

and

line\r\n

are different things so they need to be represented as different
streams. Unlike UTF-8, ASCII does define some meanings, but that never
stopped a proliferation of line endings.
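
To make the difference between the two byte streams concrete, here is a minimal sketch that tallies the terminator styles in a buffer read in binary mode. The struct and function names are hypothetical, and a CR followed by LF is counted as one CRLF terminator rather than as a CR plus an LF.

```c
#include <assert.h>
#include <stddef.h>

/* Tallies of each line-terminator style found in a byte buffer. */
struct eol_counts { size_t lf, crlf, cr; };

/* Scan buf[0..len) and count LF, CRLF and lone-CR terminators. */
struct eol_counts count_eols(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    struct eol_counts n = {0, 0, 0};
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                n.crlf++;
                i++;             /* skip the LF that belongs to this pair */
            } else {
                n.cr++;
            }
        } else if (buf[i] == '\n') {
            n.lf++;
        }
    }
    return n;
}
```

For the two examples above, `count_eols("line\n", 5)` reports one LF, while `count_eols("line\r\n", 6)` reports one CRLF and no LF.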

Now, I may have missed something, and /Unicode/ might have decreed how a
line is terminated, but I really doubt they could have got away with it.

--
Ben.

Re: UTF-8 files and fopen mode.

<03FhHv1fxLw7X7gsa@bongo-ra.co>

https://www.novabbs.com/devel/article-flat.php?id=27009&group=comp.lang.c#27009

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: spi...@gmail.com (Spiros Bousbouras)
Newsgroups: comp.lang.c
Subject: Re: UTF-8 files and fopen mode.
Date: Fri, 4 Aug 2023 14:34:15 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 22
Message-ID: <03FhHv1fxLw7X7gsa@bongo-ra.co>
References: <695264dd-9ded-4686-bc37-829a9f3c141en@googlegroups.com> <871qgjueyv.fsf@bsb.me.uk> <72d30e83-0ecd-480b-aeb7-45e0c46070fcn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 4 Aug 2023 14:34:15 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="81519468f44a5edc45a1ab94af4e2bce";
logging-data="1385976"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ok3OOHl1OUmks8kyu6HsB"
Cancel-Lock: sha1:e/LQC27GS3CZ8YDacI+n8UHAWFY=
X-Organisation: Weyland-Yutani
X-Server-Commands: nowebcancel
In-Reply-To: <72d30e83-0ecd-480b-aeb7-45e0c46070fcn@googlegroups.com>
 by: Spiros Bousbouras - Fri, 4 Aug 2023 14:34 UTC

On Fri, 4 Aug 2023 06:29:59 -0700 (PDT)
Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:
> I tested the Baby X resource compiler on Windows. The project is hosted
> on github, and it contains a short example UTF-8 file, as well as an example
> UTF-16 file. The UTF-8 file has the extension ".txt". On my Apple Mac,
> both the UTF-8 file and the UTF-16 file, when translated to UTF-16 and output,
> output a single newline. On Windows, the UTF-8 file has a carriage return,
> whilst the UTF-16 file does not. So what must have happened is that git has
> added the carriage return when downloading to Windows.

git has a specific option on how to handle line endings and whether to do
automatic conversions based on operating system. For specifics sreng (my
own verb ; it means "use a search engine") for "git line endings".

> I can fix the problem by opening the UTF-8 file with "r" rather than "rb", but
> I'm not sure this is the right thing to do. The idea of UTF-8 is surely one
> representation?

UTF-8 is an algorithm/specification for mapping a Unicode codepoint (actually
it works for any integer N with 0 <= N < 2**31) to a sequence of octets. It
says nothing about line endings in operating systems. Whether Unicode as a
whole says anything about line endings , I don't know.
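
As an illustration of that mapping, here is a minimal sketch of an encoder for the original 1-to-6-octet scheme covering 0 <= N < 2**31. The function name `utf8_encode` is hypothetical. Note that the current UTF-8 definition (RFC 3629) is narrower: it stops at U+10FFFF (at most 4 octets) and excludes surrogates, and this sketch does not enforce those limits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode n (0 <= n < 2^31) with the original 1-to-6-octet UTF-8 scheme.
   Writes into out (room for 6 octets) and returns the number of octets
   written, or 0 if n is out of range. */
size_t utf8_encode(uint32_t n, unsigned char out[6])
{
    assert(out != NULL);
    if (n >= 0x80000000u)
        return 0;
    if (n < 0x80) {                  /* ASCII range, including CR and LF */
        out[0] = (unsigned char)n;
        return 1;
    }

    /* Octets needed: payload capacity is 11, 16, 21, 26 or 31 bits. */
    size_t len = n < 0x800 ? 2 : n < 0x10000 ? 3 :
                 n < 0x200000 ? 4 : n < 0x4000000 ? 5 : 6;

    /* Fill the continuation octets (10xxxxxx) from the last one backwards. */
    for (size_t i = len - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (n & 0x3F));
        n >>= 6;
    }

    /* Lead octet: len one-bits, a zero bit, then the remaining payload. */
    out[0] = (unsigned char)((0xFF << (8 - len)) | n);
    return len;
}
```

CR (13) and LF (10) each encode as a single unchanged octet, which is why the encoding itself has nothing to say about which of them ends a line.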

Re: UTF-8 files and fopen mode.

<20230804113241.127@kylheku.com>

https://www.novabbs.com/devel/article-flat.php?id=27030&group=comp.lang.c#27030

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 864-117-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: UTF-8 files and fopen mode.
Date: Fri, 4 Aug 2023 18:41:16 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 47
Message-ID: <20230804113241.127@kylheku.com>
References: <695264dd-9ded-4686-bc37-829a9f3c141en@googlegroups.com>
Injection-Date: Fri, 4 Aug 2023 18:41:16 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="42a4086d1e45385610f8b9f4428b75ac";
logging-data="1458209"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+ZcrsCCu/3raUv/NQ8e2XoljNYS8vfcnU="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:yla7TsP7Mm5+cxvXUKAOdmy6EBw=
 by: Kaz Kylheku - Fri, 4 Aug 2023 18:41 UTC

On 2023-08-04, Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:
> Should UTF-8 files be opened in binary ("rb") or text ("r") mode when
> calling fopen()? Are they "text" or a binary representation of text?

They are text.

Practically speaking, if a UTF-8 file comes from a CR-LF system, and you
open it in binary mode, you will see the \r and \n characters.

The main reason we deal with text versus binary streams on mainstream
platforms is the line ending representation.

That is orthogonal to whether there are UTF-8 characters.

On the other hand, it's possible that the values 0x80 to 0xFF do not
correspond to characters of text on some system.

If we are concerned about the broadest possible portability,
then there is pretty ominous wording in the Standard,
which has these kinds of things to say about text streams:

  Characters may have to be added, altered, or deleted on input and
  output to conform to differing conventions for representing text in
  the host environment.

  Thus, there need not be a one-to-one correspondence between the
  characters in a stream and those in the external representation.

  Data read in from a text stream will necessarily compare equal to the
  data that were earlier written out to that stream only if: the data
  consist only of printing characters and the control characters
  horizontal tab and new-line; no new-line character is immediately
  preceded by space characters; and the last character is a new-line
  character.

That's from C99, and the paragraph division is mine.

So text streams are allowed to trash UTF-8 according to the Standard.

In practical terms, though, I reiterate that it behooves you to treat
them as text, so that you get the line ending conversions on common
platforms.
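
If a file has to be read in binary mode anyway, the conversion that a Windows text stream performs on input can be approximated by hand. The following is a sketch with a hypothetical function name; it assumes the buffer holds the raw bytes of the file and only collapses CR LF pairs, leaving the UTF-8 multi-byte sequences untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Collapse every CR LF pair in buf[0..len) to a single LF, in place.
   A CR that is not followed by LF is kept. Returns the new length. */
size_t crlf_to_lf(char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t w = 0;                         /* write index */
    for (size_t r = 0; r < len; r++) {    /* read index */
        if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
            continue;                     /* drop the CR; the LF is copied next */
        buf[w++] = buf[r];
    }
    return w;
}
```

Because every octet of a multi-byte UTF-8 sequence is 0x80 or above, a scan for the octets 0x0D and 0x0A cannot land in the middle of an encoded character.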

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: UTF-8 files and fopen mode.

<z9ezM.481786$TPw2.329676@fx17.iad>

https://www.novabbs.com/devel/article-flat.php?id=27041&group=comp.lang.c#27041

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!newsfeed.hasname.com!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx17.iad.POSTED!not-for-mail
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: UTF-8 files and fopen mode.
Content-Language: en-US
Newsgroups: comp.lang.c
References: <695264dd-9ded-4686-bc37-829a9f3c141en@googlegroups.com>
From: Rich...@Damon-Family.org (Richard Damon)
In-Reply-To: <695264dd-9ded-4686-bc37-829a9f3c141en@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 12
Message-ID: <z9ezM.481786$TPw2.329676@fx17.iad>
X-Complaints-To: abuse@easynews.com
Organization: Forte - www.forteinc.com
X-Complaints-Info: Please be sure to forward a copy of ALL headers otherwise we will be unable to process your complaint properly.
Date: Fri, 4 Aug 2023 17:34:23 -0400
X-Received-Bytes: 1473
 by: Richard Damon - Fri, 4 Aug 2023 21:34 UTC

On 8/4/23 5:49 AM, Malcolm McLean wrote:
> Should UTF-8 files be opened in binary ("rb") or text ("r") mode when calling fopen()? Are they "text" or a binary representation of text?

Since the distinction is whether the data should be interpreted as using
the local line-ending method, I would use text. The fact that it is
UTF-8 encoded means it will almost certainly use some combination of new
line and/or carriage return for line ending. If you open as text, you
know that you will get the \n code for line endings.

Of course, if you don't want to affect the nature of line endings in the
file (or want to process them specially yourself) you want to open as
binary.
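
A small sketch of the text-mode case: when the stream is opened with "r", code that processes lines only has to look for '\n'. The function name `count_lines` is hypothetical, and a final line without a terminator is counted as a line, which is a design choice of this sketch rather than anything the Standard requires.

```c
#include <assert.h>
#include <stdio.h>

/* Count the lines in a stream. With a text-mode stream each line ending
   arrives as a single '\n', whatever the host convention is. */
long count_lines(FILE *fp)
{
    assert(fp != NULL);
    long lines = 0;
    int c, last = '\n';
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            lines++;
        last = c;
    }
    if (last != '\n')
        lines++;                 /* final line without a terminator */
    return lines;
}
```

A caller would open the file with something like `fopen("example.txt", "r")` (the filename is only a placeholder), check the result for NULL, and pass the stream to `count_lines`.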

Re: UTF-8 files and fopen mode.

<OcezM.482808$TPw2.179020@fx17.iad>

https://www.novabbs.com/devel/article-flat.php?id=27042&group=comp.lang.c#27042

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx17.iad.POSTED!not-for-mail
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: UTF-8 files and fopen mode.
Content-Language: en-US
Newsgroups: comp.lang.c
References: <695264dd-9ded-4686-bc37-829a9f3c141en@googlegroups.com>
<871qgjueyv.fsf@bsb.me.uk>
<72d30e83-0ecd-480b-aeb7-45e0c46070fcn@googlegroups.com>
From: Rich...@Damon-Family.org (Richard Damon)
In-Reply-To: <72d30e83-0ecd-480b-aeb7-45e0c46070fcn@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 24
Message-ID: <OcezM.482808$TPw2.179020@fx17.iad>
X-Complaints-To: abuse@easynews.com
Organization: Forte - www.forteinc.com
X-Complaints-Info: Please be sure to forward a copy of ALL headers otherwise we will be unable to process your complaint properly.
Date: Fri, 4 Aug 2023 17:37:50 -0400
X-Received-Bytes: 2400
 by: Richard Damon - Fri, 4 Aug 2023 21:37 UTC

On 8/4/23 9:29 AM, Malcolm McLean wrote:
> On Friday, 4 August 2023 at 14:14:48 UTC+1, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>>
>>> Should UTF-8 files be opened in binary ("rb") or text ("r") mode when
>>> calling fopen()? Are they "text" or a binary representation of text?
>> On Unix-like systems it does not matter. On Windows I would say it
>> depends on whether you want the \n <=> \r\n translation to be done for
>> you (but it's been a while since I used Windows).
>>
> I tested the Baby X resource compiler on Windows. The project is hosted
> on github, and it contains a short example UTF-8 file, as well as an example
> UTF-16 file. The UTF-8 file has the extension ".txt". On my Apple Mac,
> both the UTF-8 file and the UTF-16 file, when translated to UTF-16 and output,
> output a single newline. On Windows, the UTF-8 file has a carriage return,
> whilst the UTF-16 file does not. So what must have happened is that git has
> added the carriage return when downloading to Windows.
> I can fix the problem by opening the UTF-8 file with "r" rather than "rb", but
> I'm not sure this is the right thing to do. The idea of UTF-8 is surely one
> representation?

Yes, it is one representation, but doesn't define what a "line ending"
is. That is an orthogonal definition of the MEANING of the control
characters, which is beyond the definition of the basic character set.

Re: UTF-8 files and fopen mode.

<ualc23$1nktu$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=27072&group=comp.lang.c#27072

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: vir.camp...@invalid.invalid (Vir Campestris)
Newsgroups: comp.lang.c
Subject: Re: UTF-8 files and fopen mode.
Date: Sat, 5 Aug 2023 12:33:23 +0100
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <ualc23$1nktu$1@dont-email.me>
References: <695264dd-9ded-4686-bc37-829a9f3c141en@googlegroups.com>
<uaijva$182h3$1@dont-email.me>
<ca53be57-4bc7-49ab-b6e0-dd476490e3c5n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 5 Aug 2023 11:33:23 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="e40b1971f919edac597c786a691593e0";
logging-data="1823678"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+6+TkO2C2PZIkqCrf81ud0xGZiHFuLnuw="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.13.0
Cancel-Lock: sha1:52L/j9pweK6NB7DiKMnBZuNRYXg=
Content-Language: en-GB
In-Reply-To: <ca53be57-4bc7-49ab-b6e0-dd476490e3c5n@googlegroups.com>
 by: Vir Campestris - Sat, 5 Aug 2023 11:33 UTC

On 04/08/2023 11:47, Michael S wrote:
> Far more recent and far less obscure example is Mac OS classic.
> Of course, in practice it's even less likely to be used today by real users (as
> opposed to computer history enthusiasts) than "some 1970s mainframes"

I've seen cr only, lf only, lf-cr and cr-lf. Don't ask me where...

(Not to mention the 1970s mainframes, where a file was a set of records
rather than a stream of bytes and needed to be treated as such. There
might even be embedded line breaks, form feeds, etc inside one record.
It's not possible to accurately map such files onto the C text file)

Andy
