comp.lang.c: "Some sanity for C and C++ development on Windows" by Chris Wellons

Subject [Author]
* "Some sanity for C and C++ development on Windows" by Chris Wellons [Lynn McGuire]
`* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
 `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Vir Campestris]
  `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Scott Lurndal]
   +- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Kaz Kylheku]
   `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
    +- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Scott Lurndal]
    `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [james...@alumni.caltech.edu]
     +- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Guillaume]
     `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      +- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Malcolm McLean]
      +* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [james...@alumni.caltech.edu]
      |+* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Malcolm McLean]
      ||+* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Bart]
      |||`* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Malcolm McLean]
      ||| `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Bart]
      |||  `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Malcolm McLean]
      |||   `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Bart]
      |||    `- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Malcolm McLean]
      ||`- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [james...@alumni.caltech.edu]
      |+* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||`* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [james...@alumni.caltech.edu]
      || `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||  +* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Malcolm McLean]
      ||  |`* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||  | `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Mateusz Viste]
      ||  |  +* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||  |  |+* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Mateusz Viste]
      ||  |  ||+- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Malcolm McLean]
      ||  |  ||`* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [David Brown]
      ||  |  || `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Richard Damon]
      ||  |  ||  +* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Malcolm McLean]
      ||  |  ||  |+* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Richard Damon]
      ||  |  ||  ||`- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [David Brown]
      ||  |  ||  |`* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Ben Bacarisse]
      ||  |  ||  | `- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [David Brown]
      ||  |  ||  `- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [David Brown]
      ||  |  |`- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Bart]
      ||  |  +- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Manfred]
      ||  |  +- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Guillaume]
      ||  |  `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Ben Bacarisse]
      ||  |   +* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Richard Damon]
      ||  |   |`* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Ben Bacarisse]
      ||  |   | `- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Malcolm McLean]
      ||  |   `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Mateusz Viste]
      ||  |    +- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Ben Bacarisse]
      ||  |    `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||  |     +* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Mateusz Viste]
      ||  |     |`- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Malcolm McLean]
      ||  |     `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Richard Damon]
      ||  |      `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||  |       `- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Malcolm McLean]
      ||  `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [james...@alumni.caltech.edu]
      ||   `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||    `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Manfred]
      ||     `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||      `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Manfred]
      ||       `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||        +* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Richard Damon]
      ||        |`* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||        | +* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Richard Damon]
      ||        | |`* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||        | | +- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Richard Damon]
      ||        | | `- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||        | `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Vir Campestris]
      ||        |  `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Scott Lurndal]
      ||        |   `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||        |    `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Vir Campestris]
      ||        |     `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||        |      `- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Kaz Kylheku]
      ||        +* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Manfred]
      ||        |+- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Richard Damon]
      ||        |`- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||        `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [james...@alumni.caltech.edu]
      ||         `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      ||          +- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [james...@alumni.caltech.edu]
      ||          `- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      |`* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Po Lu]
      | `* Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [James Kuyper]
      |  `- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Öö Tiib]
      `- Re: "Some sanity for C and C++ development on Windows" by Chris Wellons [Richard Damon]

"Some sanity for C and C++ development on Windows" by Chris Wellons

https://www.novabbs.com/devel/article-flat.php?id=19826&group=comp.lang.c#19826

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: lynnmcgu...@gmail.com (Lynn McGuire)
Newsgroups: comp.lang.c
Subject: "Some sanity for C and C++ development on Windows" by Chris Wellons
Date: Tue, 4 Jan 2022 00:36:01 -0600
Organization: A noiseless patient Spider
Lines: 4
Message-ID: <sr0psj$g2d$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 4 Jan 2022 06:36:03 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c7a6919a3c4bb8e118841e608221e466";
logging-data="16461"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19quf9fRyH36PwdF+unepz2"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Cancel-Lock: sha1:S9uDbM0/8mewmPbyV56kpsfYfVs=
Content-Language: en-US
 by: Lynn McGuire - Tue, 4 Jan 2022 06:36 UTC

"Some sanity for C and C++ development on Windows" by Chris Wellons
https://nullprogram.com/blog/2021/12/30/

Lynn

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

https://www.novabbs.com/devel/article-flat.php?id=19845&group=comp.lang.c#19845

X-Received: by 2002:ac8:4c9b:: with SMTP id j27mr44879984qtv.656.1641321557813;
Tue, 04 Jan 2022 10:39:17 -0800 (PST)
X-Received: by 2002:a05:620a:199f:: with SMTP id bm31mr34855895qkb.450.1641321557691;
Tue, 04 Jan 2022 10:39:17 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Tue, 4 Jan 2022 10:39:17 -0800 (PST)
In-Reply-To: <sr0psj$g2d$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=94.246.251.164; posting-account=pysjKgkAAACLegAdYDFznkqjgx_7vlUK
NNTP-Posting-Host: 94.246.251.164
References: <sr0psj$g2d$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
From: oot...@hot.ee (Öö Tiib)
Injection-Date: Tue, 04 Jan 2022 18:39:17 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 14
 by: Öö Tiib - Tue, 4 Jan 2022 18:39 UTC

On Tuesday, 4 January 2022 at 08:36:13 UTC+2, Lynn McGuire wrote:
> "Some sanity for C and C++ development on Windows" by Chris Wellons
> https://nullprogram.com/blog/2021/12/30/

The whole difference that std::string on other platforms is UTF-8.
It is something that standard of C or C++ do not support in any way.
On the contrary, the standards add obfuscation bullshit like:

const char crap[] = u8"Öö Tiib 😀";

And when to ask why then oh but maybe there is EBCDIK character
set. Shove that ebc-dick where sun doesn't shine, morons.
Let MS add its w1252 prefix if it likes that character set too lot?
But no language lawyer in committee does have balls, and there it
ends.
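An aside on the u8 literal above: since C11, a u8 string literal is guaranteed to be UTF-8 encoded whatever the execution character set is, so its bytes are the same on an ASCII or an EBCDIC implementation. Here is a minimal sketch, assuming a C11 or later compiler. The array `name` and the helper `utf8_count_code_points` are illustrative names and do not come from the thread.

```c
#include <stddef.h>

/* A u8 literal is UTF-8 whatever the execution character set is.
   unsigned char is used so this compiles under C11/C17 (where u8""
   has type char[]) and C23 (where it has type char8_t[]). */
static const unsigned char name[] = u8"\u00D6\u00F6 Tiib";

/* Count code points in a null-terminated UTF-8 string: every byte that
   is not a continuation byte (10xxxxxx) starts a new code point.
   Assumes the input is well-formed UTF-8. */
size_t utf8_count_code_points(const unsigned char *s)
{
    size_t n = 0;
    for (; *s != 0; ++s)
        if ((*s & 0xC0) != 0x80)
            ++n;
    return n;
}
```

For the literal shown, `sizeof name` is 10: two bytes each for Ö and ö, five ASCII bytes, and the terminating null. `utf8_count_code_points(name)` returns 7.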

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

https://www.novabbs.com/devel/article-flat.php?id=19863&group=comp.lang.c#19863

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: vir.camp...@invalid.invalid (Vir Campestris)
Newsgroups: comp.lang.c
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris
Wellons
Date: Wed, 5 Jan 2022 21:50:16 +0000
Organization: A noiseless patient Spider
Lines: 22
Message-ID: <sr53qo$vbl$1@dont-email.me>
References: <sr0psj$g2d$1@dont-email.me>
<761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 5 Jan 2022 21:50:16 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="9af68d0a3f305f3e81c32297d3cfa647";
logging-data="32117"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19PTx92plGjpXRbC+GCqlRQ4sDH7JcOBLg="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
Cancel-Lock: sha1:rZUESVTBV+SdYm1o5lpV3DzWJtM=
In-Reply-To: <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
Content-Language: en-GB
 by: Vir Campestris - Wed, 5 Jan 2022 21:50 UTC

On 04/01/2022 18:39, Öö Tiib wrote:
> On Tuesday, 4 January 2022 at 08:36:13 UTC+2, Lynn McGuire wrote:
>> "Some sanity for C and C++ development on Windows" by Chris Wellons
>> https://nullprogram.com/blog/2021/12/30/
>
> The whole difference that std::string on other platforms is UTF-8.
> It is something that standard of C or C++ do not support in any way.
> On the contrary, the standards add obfuscation bullshit like:
>
> const char crap[] = u8"Öö Tiib 😀";
>
> And when to ask why then oh but maybe there is EBCDIK character
> set. Shove that ebc-dick where sun doesn't shine, morons.
> Let MS add its w1252 prefix if it likes that character set too lot?
> But no language lawyer in committee does have balls, and there it
> ends.
>
My first job was writing assembler on a mainframe with an EBCDIC
character set. The operating system I worked on has been dead for 40
years now, but I dare say IBM's mainframes still use it.

Andy

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

https://www.novabbs.com/devel/article-flat.php?id=19864&group=comp.lang.c#19864

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx97.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: sco...@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
Newsgroups: comp.lang.c
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com> <sr53qo$vbl$1@dont-email.me>
Lines: 24
Message-ID: <_mpBJ.219710$qz4.56726@fx97.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Wed, 05 Jan 2022 22:56:58 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Wed, 05 Jan 2022 22:56:58 GMT
X-Received-Bytes: 1955
 by: Scott Lurndal - Wed, 5 Jan 2022 22:56 UTC

Vir Campestris <vir.campestris@invalid.invalid> writes:
>On 04/01/2022 18:39, Öö Tiib wrote:
>> On Tuesday, 4 January 2022 at 08:36:13 UTC+2, Lynn McGuire wrote:
>>> "Some sanity for C and C++ development on Windows" by Chris Wellons
>>> https://nullprogram.com/blog/2021/12/30/
>>
>> The whole difference that std::string on other platforms is UTF-8.
>> It is something that standard of C or C++ do not support in any way.
>> On the contrary, the standards add obfuscation bullshit like:
>>
>> const char crap[] = u8"Öö Tiib 😀";
>>
>> And when to ask why then oh but maybe there is EBCDIK character
>> set. Shove that ebc-dick where sun doesn't shine, morons.
>> Let MS add its w1252 prefix if it likes that character set too lot?
>> But no language lawyer in committee does have balls, and there it
>> ends.
>>
>My first job was writing assembler on a mainframe with an EBCDIC
>character set. The operating system I worked on has been dead for 40
>years now, but I dare say IBM's mainframes still use it.

The Unisys mainframes (from the Burroughs side) still use EBCDIC;
I believe the Sperry side systems support EBCDIC, if not use it natively.

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

https://www.novabbs.com/devel/article-flat.php?id=19865&group=comp.lang.c#19865

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: 480-992-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris
Wellons
Date: Wed, 5 Jan 2022 23:34:14 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 15
Message-ID: <20220105153245.370@kylheku.com>
References: <sr0psj$g2d$1@dont-email.me>
<761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
Injection-Date: Wed, 5 Jan 2022 23:34:14 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="cbbe571dc5c7e4064c97d068fd7a263b";
logging-data="9026"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19bP/VI9waMyL4KSEy8Tzzu89VXidxKNwE="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:UieVi3+U+HA57CY+CEsW/EXzDmk=
 by: Kaz Kylheku - Wed, 5 Jan 2022 23:34 UTC

On 2022-01-05, Scott Lurndal <scott@slp53.sl.home> wrote:
> Vir Campestris <vir.campestris@invalid.invalid> writes:
>>My first job was writing assembler on a mainframe with an EBCDIC
>>character set. The operating system I worked on has been dead for 40
>>years now, but I dare say IBM's mainframes still use it.
>
> The Unisys mainframes (from the Burroughs side) still use EBCDIC;
> I believe the sperry side systems support EBCDIC, if not use it natively.

My EBCDIC experience is that over the years, I had at least one
manager who was one.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

https://www.novabbs.com/devel/article-flat.php?id=19866&group=comp.lang.c#19866

X-Received: by 2002:ac8:7650:: with SMTP id i16mr50296753qtr.220.1641443732695;
Wed, 05 Jan 2022 20:35:32 -0800 (PST)
X-Received: by 2002:ac8:5b90:: with SMTP id a16mr51545929qta.300.1641443732559;
Wed, 05 Jan 2022 20:35:32 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 5 Jan 2022 20:35:32 -0800 (PST)
In-Reply-To: <_mpBJ.219710$qz4.56726@fx97.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=94.246.251.164; posting-account=pysjKgkAAACLegAdYDFznkqjgx_7vlUK
NNTP-Posting-Host: 94.246.251.164
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com>
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
From: oot...@hot.ee (Öö Tiib)
Injection-Date: Thu, 06 Jan 2022 04:35:32 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 30
 by: Öö Tiib - Thu, 6 Jan 2022 04:35 UTC

On Thursday, 6 January 2022 at 00:57:09 UTC+2, Scott Lurndal wrote:
> Vir Campestris <vir.cam...@invalid.invalid> writes:
> >On 04/01/2022 18:39, Öö Tiib wrote:
> >> On Tuesday, 4 January 2022 at 08:36:13 UTC+2, Lynn McGuire wrote:
> >>> "Some sanity for C and C++ development on Windows" by Chris Wellons
> >>> https://nullprogram.com/blog/2021/12/30/
> >>
> >> The whole difference that std::string on other platforms is UTF-8.
> >> It is something that standard of C or C++ do not support in any way.
> >> On the contrary, the standards add obfuscation bullshit like:
> >>
> >> const char crap[] = u8"Öö Tiib 😀";
> >>
> >> And when to ask why then oh but maybe there is EBCDIK character
> >> set. Shove that ebc-dick where sun doesn't shine, morons.
> >> Let MS add its w1252 prefix if it likes that character set too lot?
> >> But no language lawyer in committee does have balls, and there it
> >> ends.
> >>
> >My first job was writing assembler on a mainframe with an EBCDIC
> >character set. The operating system I worked on has been dead for 40
> >years now, but I dare say IBM's mainframes still use it.
>
> The Unisys mainframes (from the Burroughs side) still use EBCDIC;
> I believe the sperry side systems support EBCDIC, if not use it natively.

Even programmer of such system might want to upgrade to compiler
whose standard library supports UTF-8. But it is not possible as
standard library is defined not to.

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

https://www.novabbs.com/devel/article-flat.php?id=19867&group=comp.lang.c#19867

Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx02.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: sco...@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
Newsgroups: comp.lang.c
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com> <sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad> <36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com>
Lines: 35
Message-ID: <GzEBJ.133292$IB7.47845@fx02.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Thu, 06 Jan 2022 16:14:30 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Thu, 06 Jan 2022 16:14:30 GMT
X-Received-Bytes: 2627
 by: Scott Lurndal - Thu, 6 Jan 2022 16:14 UTC

Öö Tiib <ootiib@hot.ee> writes:
>On Thursday, 6 January 2022 at 00:57:09 UTC+2, Scott Lurndal wrote:
>> Vir Campestris <vir.cam...@invalid.invalid> writes:
>> >On 04/01/2022 18:39, Öö Tiib wrote:
>> >> On Tuesday, 4 January 2022 at 08:36:13 UTC+2, Lynn McGuire wrote:
>> >>> "Some sanity for C and C++ development on Windows" by Chris Wellons
>> >>> https://nullprogram.com/blog/2021/12/30/
>> >>
>> >> The whole difference that std::string on other platforms is UTF-8.
>> >> It is something that standard of C or C++ do not support in any way.
>> >> On the contrary, the standards add obfuscation bullshit like:
>> >>
>> >> const char crap[] = u8"Öö Tiib 😀";
>> >>
>> >> And when to ask why then oh but maybe there is EBCDIK character
>> >> set. Shove that ebc-dick where sun doesn't shine, morons.
>> >> Let MS add its w1252 prefix if it likes that character set too lot?
>> >> But no language lawyer in committee does have balls, and there it
>> >> ends.
>> >>
>> >My first job was writing assembler on a mainframe with an EBCDIC
>> >character set. The operating system I worked on has been dead for 40
>> >years now, but I dare say IBM's mainframes still use it.
>>
>> The Unisys mainframes (from the Burroughs side) still use EBCDIC;
>> I believe the sperry side systems support EBCDIC, if not use it natively.
>
>Even programmer of such system might want to upgrade to compiler
>whose standard library supports UTF-8. But it is not possible as
>standard library is defined not to.

Well, the Burroughs programmers use COBOL and ALGOL, not C, and both
of those have supported I18N and L10N since the 1980s.

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

https://www.novabbs.com/devel/article-flat.php?id=19868&group=comp.lang.c#19868

X-Received: by 2002:a05:6214:2622:: with SMTP id gv2mr55504992qvb.128.1641490119566;
Thu, 06 Jan 2022 09:28:39 -0800 (PST)
X-Received: by 2002:a05:620a:28c1:: with SMTP id l1mr2425210qkp.362.1641490119365;
Thu, 06 Jan 2022 09:28:39 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Thu, 6 Jan 2022 09:28:39 -0800 (PST)
In-Reply-To: <36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=108.48.119.9; posting-account=Ix1u_AoAAAAILVQeRkP2ENwli-Uv6vO8
NNTP-Posting-Host: 108.48.119.9
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad> <36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
From: jameskuy...@alumni.caltech.edu (james...@alumni.caltech.edu)
Injection-Date: Thu, 06 Jan 2022 17:28:39 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 48
 by: james...@alumni.calt - Thu, 6 Jan 2022 17:28 UTC

On Wednesday, January 5, 2022 at 11:35:38 PM UTC-5, Öö Tiib wrote:
> On Thursday, 6 January 2022 at 00:57:09 UTC+2, Scott Lurndal wrote:
....
> > The Unisys mainframes (from the Burroughs side) still use EBCDIC;
> > I believe the sperry side systems support EBCDIC, if not use it natively.
> Even programmer of such system might want to upgrade to compiler
> whose standard library supports UTF-8. But it is not possible as
> standard library is defined not to.

Could you cite the text from the C and C++ standards that prohibits the
standard library from supporting UTF-8?
Are you saying that it's prohibited specifically on platforms that normally
use EBCDIC? I'm not aware of any requirement that an implementation of
C follow the conventions for the target platform: an implementation that
emulates working on a completely different platform (such as one where
UTF-8 is the norm) is always allowed.
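The execution character set is the implementation's choice, so portable C code cannot assume ASCII. The sketch below shows what the standard does and does not promise. The helper names are hypothetical. The EBCDIC values in the comments are those of code page 037.

```c
/* The C standard guarantees only that '0'..'9' have consecutive values.
   Letters need not be contiguous, and in EBCDIC they are not. */

/* Always 1 on a conforming implementation. */
int digits_are_contiguous(void)
{
    return '9' - '0' == 9;
}

/* 1 if 'Z' - 'A' is 25, as in ASCII. In EBCDIC (CP037) 'A' is 0xC1 and
   'Z' is 0xE9, so the difference is 40: the alphabet is split into the
   runs A-I, J-R and S-Z. */
int upper_letters_are_contiguous(void)
{
    return 'Z' - 'A' == 25;
}

/* 1 if a few anchor characters sit at their ASCII code points.
   In EBCDIC (CP037) 'A' is 0xC1, 'a' is 0x81 and '0' is 0xF0. */
int execution_charset_looks_like_ascii(void)
{
    return 'A' == 0x41 && 'a' == 0x61 && '0' == 0x30;
}
```

On an ASCII-based implementation all three functions return 1. On an EBCDIC one, only `digits_are_contiguous()` does. This is why idioms such as `c - 'A'` for a letter's position in the alphabet are not portable.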

To the best of my understanding:
* The encoding of source files and the narrow execution character set
are both implementation-defined multibyte encodings, and
there's nothing that prohibits either encoding from being UTF-8.
* Implementations are explicitly permitted to allow extended characters in
source files for identifiers, string literals, character constants, comments
and header names.
* The current versions of both standards allow string literals and character
constants prefixed with u8, which are required to have UTF-8 encoding.
* Even the latest version of C doesn't mandate any conversion routines for
UTF-8, but on a platform which makes UTF-8 the encoding for its
execution character set, the conversion routines that contain "mb" in their
name will interpret plain char as having UTF-8 encoding.
* C++ (since C++20) mandates mbrtoc8() and c8rtomb(), which convert between the
native encoding and UTF-8. It also mandates mbrtowc() and wcrtomb(),
as well as the C standard library routines for converting between the
native encodings and char16_t or char32_t in <cuchar> (<uchar.h> in C); the C standard does
not require that those types have UTF-16 and UTF-32 encoding
respectively, but the C++ standard does. It also mandates codecvt facets
for converting between UTF-8, UTF-16, and UTF-32, so conversion between
UTF-8 and any of the other four character encodings is mandated,
though conversion to and from wchar_t is a two-step process unless the
native narrow character set uses UTF-8 encoding.
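The C side of the point above can be checked in code. C11 and C17 let an implementation announce that char16_t and char32_t values are UTF-16 and UTF-32 by defining `__STDC_UTF_16__` and `__STDC_UTF_32__` to 1. Here is a small sketch; the function and array names are illustrative.

```c
#include <stddef.h>
#include <uchar.h>

/* In C11/C17, char16_t and char32_t values are UTF-16 / UTF-32 encoded
   only if the implementation defines these macros as 1. */
int char16_is_utf16(void)
{
#if defined(__STDC_UTF_16__) && __STDC_UTF_16__ == 1
    return 1;
#else
    return 0;
#endif
}

int char32_is_utf32(void)
{
#if defined(__STDC_UTF_32__) && __STDC_UTF_32__ == 1
    return 1;
#else
    return 0;
#endif
}

/* U+1F600 lies outside the Basic Multilingual Plane: one code unit in
   UTF-32, but a surrogate pair (two code units) in UTF-16. */
static const char16_t grin16[] = u"\U0001F600";
static const char32_t grin32[] = U"\U0001F600";

/* Code units in each literal, excluding the terminating null. */
size_t grin16_units(void) { return sizeof grin16 / sizeof grin16[0] - 1; }
size_t grin32_units(void) { return sizeof grin32 / sizeof grin32[0] - 1; }
```

Where both macros are defined, `grin16_units()` returns 2 and `grin32_units()` returns 1.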

While C, in particular, doesn't mandate quite as much support for UTF-8 as
I'd like, both standards allow the fullest possible support for UTF-8 that I
could imagine. Why do you think otherwise?
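The "mb" routines mentioned above decode plain char according to the current LC_CTYPE locale. Whether they understand UTF-8 therefore depends on the locale selected at run time, not on the compiler. Here is a minimal sketch using C11's mbrtoc32(); the function name `mb_count_chars` is hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Count the char32_t values produced when the current LC_CTYPE locale
   decodes the null-terminated narrow string s. Returns (size_t)-1 if s
   holds an invalid or truncated multibyte sequence for that locale. */
size_t mb_count_chars(const char *s)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);      /* initial conversion state */
    size_t left = strlen(s);
    size_t count = 0;

    while (left > 0) {
        char32_t c;
        size_t r = mbrtoc32(&c, s, left, &state);
        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;            /* invalid or incomplete sequence */
        if (r == (size_t)-3) {            /* extra char32_t, no bytes consumed */
            ++count;
            continue;
        }
        if (r == 0)                       /* null character: cannot occur here */
            break;
        s += r;
        left -= r;
        ++count;
    }
    return count;
}
```

In the default "C" locale, ASCII input such as "Tiib" counts as 4. Suppose `setlocale(LC_CTYPE, "C.UTF-8")` succeeds. Then `mb_count_chars("\xC3\x96\xC3\xB6 Tiib")` should report 7, because the two-byte sequences are decoded as Ö and ö. The locale name "C.UTF-8" is common on Linux, but it is an assumption, not something the C standard provides.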

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

https://www.novabbs.com/devel/article-flat.php?id=19869&group=comp.lang.c#19869

Path: i2pn2.org!i2pn.org!aioe.org!UgLt14+w9tVHe1BtIa3HDQ.user.46.165.242.75.POSTED!not-for-mail
From: mess...@bottle.org (Guillaume)
Newsgroups: comp.lang.c
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris
Wellons
Date: Thu, 6 Jan 2022 19:42:20 +0100
Organization: Aioe.org NNTP Server
Message-ID: <sr7d6l$1s6b$1@gioia.aioe.org>
References: <sr0psj$g2d$1@dont-email.me>
<761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com>
<b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="61643"; posting-host="UgLt14+w9tVHe1BtIa3HDQ.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: fr
 by: Guillaume - Thu, 6 Jan 2022 18:42 UTC

On 06/01/2022 at 18:28, james...@alumni.caltech.edu wrote:
> While C, in particular, doesn't mandate quite as much support for UTF-8 as
> I'd like, both standards allow the fullest possible support for UTF-8 that I
> could imagine. Why do you think otherwise?

Agreed.

And yes, as support is not that complete, we usually have to use
third-party (or our own) libraries just for that. And to be fair, UTF-8
support, overall, is still a bit shaky.

And on Windows, where MS focused on Unicode UCS-2 rather than
UTF-8, things are no better, even if you use the Windows API instead of
the C standard library.
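The "third-party (or our own) libraries" mentioned above mostly come down to small routines like the one sketched here. It turns UTF-8 into the UTF-16 that the wide-character Windows APIs take. On Windows itself, MultiByteToWideChar with CP_UTF8 does this job. The portable version below only illustrates what such a routine has to check, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available). On success, store
   the code point in *cp and return the sequence length (1-4). Return 0 for
   malformed, truncated or overlong input, surrogates, or values above
   U+10FFFF. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    if (n == 0)
        return 0;

    unsigned char b = s[0];
    size_t len;
    uint32_t c, min;

    if (b < 0x80)                { *cp = b; return 1; }
    else if ((b & 0xE0) == 0xC0) { len = 2; c = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { len = 3; c = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { len = 4; c = b & 0x07; min = 0x10000; }
    else                         return 0;   /* stray continuation or bad lead byte */

    if (n < len)
        return 0;                             /* truncated sequence */
    for (size_t i = 1; i < len; ++i) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                         /* expected a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                             /* overlong, out of range, or surrogate */
    *cp = c;
    return len;
}

/* Convert n bytes of UTF-8 to UTF-16, writing at most cap code units to
   out. Returns the number of code units the full conversion needs (larger
   than cap means the output was truncated), or (size_t)-1 if the input
   is not valid UTF-8. */
size_t utf8_to_utf16(const unsigned char *s, size_t n, uint16_t *out, size_t cap)
{
    size_t used = 0;
    while (n > 0) {
        uint32_t cp;
        size_t len = utf8_decode(s, n, &cp);
        if (len == 0)
            return (size_t)-1;
        s += len;
        n -= len;
        if (cp < 0x10000) {                   /* BMP: one code unit */
            if (used + 1 <= cap)
                out[used] = (uint16_t)cp;
            used += 1;
        } else {                              /* encode as a surrogate pair */
            cp -= 0x10000;
            if (used + 2 <= cap) {
                out[used]     = (uint16_t)(0xD800 | (cp >> 10));
                out[used + 1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
            }
            used += 2;
        }
    }
    return used;
}
```

Calling it first with cap = 0 returns the number of code units needed, so the caller can size a buffer and convert in a second pass. MultiByteToWideChar supports the same two-pass pattern when its output size argument is 0.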

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

https://www.novabbs.com/devel/article-flat.php?id=19871&group=comp.lang.c#19871

X-Received: by 2002:a37:b7c3:: with SMTP id h186mr42339604qkf.691.1641557178113;
Fri, 07 Jan 2022 04:06:18 -0800 (PST)
X-Received: by 2002:a37:315:: with SMTP id 21mr43724784qkd.52.1641557177973;
Fri, 07 Jan 2022 04:06:17 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 7 Jan 2022 04:06:17 -0800 (PST)
In-Reply-To: <b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=84.50.190.130; posting-account=pysjKgkAAACLegAdYDFznkqjgx_7vlUK
NNTP-Posting-Host: 84.50.190.130
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com> <b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com>
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
From: oot...@hot.ee (Öö Tiib)
Injection-Date: Fri, 07 Jan 2022 12:06:18 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 94
 by: Öö Tiib - Fri, 7 Jan 2022 12:06 UTC

On Thursday, 6 January 2022 at 19:28:46 UTC+2, james...@alumni.caltech.edu wrote:
> On Wednesday, January 5, 2022 at 11:35:38 PM UTC-5, Öö Tiib wrote:
> > On Thursday, 6 January 2022 at 00:57:09 UTC+2, Scott Lurndal wrote:
> ...
> > > The Unisys mainframes (from the Burroughs side) still use EBCDIC;
> > > I believe the Sperry side systems support EBCDIC, if not use it natively.
> > Even a programmer of such a system might want to upgrade to a compiler
> > whose standard library supports UTF-8. But it is not possible, as the
> > standard library is defined not to.
> Could you cite the text from the C and C++ standards that prohibits the
> standard library from supporting UTF-8?

Are you pretending that you did not understand what I meant?
AFAIK you have rather decent knowledge of standards. The
standards allow implementations to have a wide array of whatever
obscure extensions.

However, some essential things, like adding 128-bit integers,
UTF-8 support, or even stopping that nonsense with newline characters,
are made tricky. Plus it is obscured by random half-baked extensions,
prefixes and types that promise little, tend to be deprecated, and
confuse people. That in a world where 98% of plain text on the internet is UTF-8.

> Are you saying that it's prohibited specifically on platforms that normally
> use EBCDIC? I'm not aware of any requirement that an implementation of
> C follow the conventions for the target platform: an implementation that
> emulates working on a completely different platform (such as one where
> UTF-8 is the norm) is always allowed.

I am saying that when people ask why the default string can't be UTF-8,
EBCDIC is usually mentioned, even though there is probably not much C, let
alone C++, used on the few surviving EBCDIC platforms.

>
> To the best of my understanding:
> * The encoding of source files and the narrow execution character set
> are both implementation-defined multibyte encodings, and
> there's nothing that prohibits either encoding from being UTF-8.
> * Implementations are explicitly permitted to allow extended characters in
> source files for identifiers, string literals, character constants, comments
> and header names.
> * The current versions of both standards allow string literals and character
> constants prefixed with u8, which are required to have UTF-8 encoding.
> * Even the latest version of C doesn't mandate any conversion routines for
> UTF-8, but on a platform which makes UTF-8 the encoding for its
> execution character set, the conversion routines that contain "mb" in their
> name will interpret plain char as having UTF-8 encoding.
> * C++ mandates mbrtoc8() and c8rtomb(), which convert between the
> native encoding and UTF-8. It also mandates mbrtowc() and wcrtomb(),
> as well as the C standard library routines for converting between the
> native encodings and char16_t or char32_t in <uchar.h>; the C standard does
> not require that those types have UTF-16 and UTF-32 encoding
> respectively, but the C++ standard does.

UTF-16 and UTF-32 are also among those 2% of obscure text encodings.
OK, UTF-16 can be useful to communicate with mis-designed programming
languages like Java or C# or operating systems like Windows. But UTF-32
is rather exotic.
> It also mandates codecvt facets
> for converting between UTF-8, UTF-16, and UTF-32, so conversion between
> UTF-8 and any of the other four character encodings is mandated,
> though conversion to and from wchar_t is a two-step process unless the
> native narrow character set uses UTF-8 encoding.

The most useful of those facets, the ones that converted to and from UTF-8
in char arrays, were deprecated in C++17.

> While C, in particular, doesn't mandate quite as much support for UTF-8 as
> I'd like, both standards allow the fullest possible support for UTF-8 that I
> could imagine. Why do you think otherwise?

I think that having all text streams and plain non-prefixed "strings" as
UTF-8 is both possible and most logical. A UTF-8 code unit is guaranteed
to fit into a char by both standards, so it is possible. UTF-8 is the most
widespread text format, so it is logical. Other, obscure encodings like
Windows-1252 or EBCDIC (and the functions using or filling those) should
have weirdo prefixes and special character types and whatnot.
Why do you think otherwise?

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

<a157953d-d9b8-4b9e-8fa4-47ef614f4380n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19872&group=comp.lang.c#19872

X-Received: by 2002:ad4:5ba3:: with SMTP id 3mr56611632qvq.59.1641560917972;
Fri, 07 Jan 2022 05:08:37 -0800 (PST)
X-Received: by 2002:a05:622a:104e:: with SMTP id f14mr4487166qte.376.1641560917820;
Fri, 07 Jan 2022 05:08:37 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 7 Jan 2022 05:08:37 -0800 (PST)
In-Reply-To: <27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a00:23a8:400a:5601:71e1:9758:79:fd3;
posting-account=Dz2zqgkAAADlK5MFu78bw3ab-BRFV4Qn
NNTP-Posting-Host: 2a00:23a8:400a:5601:71e1:9758:79:fd3
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com> <b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
<27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a157953d-d9b8-4b9e-8fa4-47ef614f4380n@googlegroups.com>
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Fri, 07 Jan 2022 13:08:37 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 1972
 by: Malcolm McLean - Fri, 7 Jan 2022 13:08 UTC

On Friday, 7 January 2022 at 12:06:26 UTC, Öö Tiib wrote:
>
> OK, UTF-16 can be useful to communicate with mis-designed programming
> languages like Java or C# or operating systems like Windows. But UTF-32
> is rather exotic.
>
Yes and no. You can pass strings around as UTF-8, but they're hard to manipulate.
Often it's easier to convert to UTF-32 to actually access the content of
a string and use it, and then convert back.

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

<884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19877&group=comp.lang.c#19877

X-Received: by 2002:ac8:5994:: with SMTP id e20mr57923046qte.75.1641576849879;
Fri, 07 Jan 2022 09:34:09 -0800 (PST)
X-Received: by 2002:a05:6214:5286:: with SMTP id kj6mr58879681qvb.74.1641576849694;
Fri, 07 Jan 2022 09:34:09 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 7 Jan 2022 09:34:09 -0800 (PST)
In-Reply-To: <27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=108.48.119.9; posting-account=Ix1u_AoAAAAILVQeRkP2ENwli-Uv6vO8
NNTP-Posting-Host: 108.48.119.9
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com> <b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
<27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
From: jameskuy...@alumni.caltech.edu (james...@alumni.caltech.edu)
Injection-Date: Fri, 07 Jan 2022 17:34:09 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 124
 by: james...@alumni.calt - Fri, 7 Jan 2022 17:34 UTC

On Friday, January 7, 2022 at 7:06:26 AM UTC-5, Öö Tiib wrote:
> On Thursday, 6 January 2022 at 19:28:46 UTC+2, james...@alumni.caltech.edu wrote:
> > On Wednesday, January 5, 2022 at 11:35:38 PM UTC-5, Öö Tiib wrote:
...
> > > Even a programmer of such a system might want to upgrade to a compiler
> > > whose standard library supports UTF-8. But it is not possible, as the
> > > standard library is defined not to.
> > Could you cite the text from the C and C++ standards that prohibits the
> > standard library from supporting UTF-8?
> Are you pretending that you did not understand what I meant?

No, I am quite accurately and honestly expressing my confusion. You object to
something being prohibited by the standards that is, to the best of my understanding,
allowed. It would make more sense if you were objecting to the fact that it isn't
mandatory, and if you were making such claims, I would disagree with you about
whether it would be a good idea to make it mandatory - but as far as I can tell, you're
claiming it isn't allowed.
If you could, as requested, cite the relevant text that prohibits such compilers, you
might convince me that I'm wrong. If not, the citation would at least enable me to try
to convince you that you're misinterpreting that text. Neither possibility can happen
until you actually honor that request.

...
> However, some essential things, like adding 128-bit integers,
> UTF-8 support, or even stopping that nonsense with newline characters,
> are made tricky.

Neither 128-bit integers nor UTF-8 support is essential. Lots of people have no
need of either (I've never needed either one, though I have used UTF-8 since it was
available). If you need such things on a platform where no existing implementation
of C provides them, your complaint is with the implementors, not the standard,
because the standard says nothing to prohibit such things.

...
> I am saying that when people ask why the default string can't be UTF-8,
> EBCDIC is usually mentioned, even though there is probably not much C, let
> alone C++, used on the few surviving EBCDIC platforms.

That seems reasonable to me. The only places where that logic applies are
implementations of C targeting EBCDIC platforms, and however rare such
implementations are, they would become substantially rarer because users would abandon
them if they switched to UTF-8.

...
> UTF-16 and UTF-32 are also among those 2% of obscure text encodings.
> OK, UTF-16 can be useful to communicate with mis-designed programming
> languages like Java or C# or operating systems like Windows. But UTF-32
> is rather exotic.

UTF-16 is, as I understand it, the default in the Windows world. I'm not that familiar
with it; I've only done a few years of programming targeting that platform, and all of
the text that came up in the work I was doing there was simple English, with no need
to make any use of extended characters, so I can't vouch for any of the details about
how text was encoded. However, Windows is a very common platform, whether or not
you approve of its design (I share your disapproval for it), so calling UTF-16 obscure
seems very odd.

UTF-32 shares an important property with the implementation-defined encoding used
for wchar_t: every character takes up one and only one element in the array. I have
written a lot of code over several decades for parsing strings that assumes that every
character takes up one char, a valid assumption in the contexts in which I wrote it. When
I think about how I would have to re-write such code to work with multi-byte encodings
such as UTF-8, then the simplicity of replacing char with wchar_t or char32_t, leaving my
logic unchanged, starts looking pretty attractive. However, since I have little experience
writing code to work with extended characters using any encoding, my preferences don't
carry much weight.

> > While C, in particular, doesn't mandate quite as much support for UTF-8 as
> > I'd like, both standards allow the fullest possible support for UTF-8 that I
> > could imagine. Why do you think otherwise?
> I think that having all text streams and plain non-prefixed "strings" as
> UTF-8 is both possible and most logical.

Yes, and that is allowed by both standards, and is the norm, not the exception, on most
Unix-like platforms that I'm familiar with. That's why your claim that it's not allowed
confuses me. If you don't work on Unix systems where the encoding of non-prefixed
strings is UTF-8, and you don't work on Windows systems where UTF-16 is the norm, and
you don't work on systems where EBCDIC is the norm, what kinds of systems do you
work on? I'm not saying that there aren't any other types of systems, there's a great
many, but most of the others are substantially less common, so I am just curious which
one(s) you use.

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

<d381398d-580b-449c-bb67-21a88c9ed106n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19878&group=comp.lang.c#19878

X-Received: by 2002:a05:620a:22ed:: with SMTP id p13mr43853526qki.768.1641581072886;
Fri, 07 Jan 2022 10:44:32 -0800 (PST)
X-Received: by 2002:ac8:764a:: with SMTP id i10mr5204166qtr.580.1641581072734;
Fri, 07 Jan 2022 10:44:32 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 7 Jan 2022 10:44:32 -0800 (PST)
In-Reply-To: <884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a00:23a8:400a:5601:5cfd:d1e0:7cc4:f7bb;
posting-account=Dz2zqgkAAADlK5MFu78bw3ab-BRFV4Qn
NNTP-Posting-Host: 2a00:23a8:400a:5601:5cfd:d1e0:7cc4:f7bb
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com> <b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
<27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com> <884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d381398d-580b-449c-bb67-21a88c9ed106n@googlegroups.com>
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Fri, 07 Jan 2022 18:44:32 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 24
 by: Malcolm McLean - Fri, 7 Jan 2022 18:44 UTC

On Friday, 7 January 2022 at 17:34:16 UTC, james...@alumni.caltech.edu wrote:
>
> UTF-16 is, as I understand it, the default in the Windows world. I'm not that familiar
> with it, I've only done a few years of programming targeting that platform, and all of
> the text that came up in the work I was doing there was simple English, with no need
> to make any use of extended characters, so I can't vouch for any of the details about
> how text was encoded. However, Windows is a very common platform, whether or not
> you approve of its design (I share your disapproval for it), so calling UTF-16 obscure
> seems very odd.
>
Every Windows API call that takes text comes in an A-suffix or a W-suffix call.
The A-suffix takes ascii strings, the W-suffix takes near UTF-16, actually Microsoft's
slightly incompatible version. If you don't provide a suffix at all, the compiler
selects a version which depends on how you have set it up. I'm not sure what the
default is or exactly how you play with the settings.
>
> > I think that having all text streams and plain non-prefixed "strings" as
> > UTF-8 is both possible and most logical.
> Yes, and that is allowed by both standards, and is the norm, not the exception, on most
> Unix-like platforms that I'm familiar with. That's why your claim that it's not allowed
> confuses me.

My experience is that passing utf-8 to printf() or fopen() doesn't work. But I rarely
need to do so, and the situation might have changed recently.

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

<sra3mm$974$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19879&group=comp.lang.c#19879

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: bc...@freeuk.com (Bart)
Newsgroups: comp.lang.c
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris
Wellons
Date: Fri, 7 Jan 2022 19:18:46 +0000
Organization: A noiseless patient Spider
Lines: 47
Message-ID: <sra3mm$974$1@dont-email.me>
References: <sr0psj$g2d$1@dont-email.me>
<761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com>
<b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
<27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com>
<884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>
<d381398d-580b-449c-bb67-21a88c9ed106n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 7 Jan 2022 19:18:46 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="7d75b15abc98a18e211b332600021ef9";
logging-data="9444"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18t6c68DJo9JaJZFJPqHq6QIJeVQY11N6A="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Cancel-Lock: sha1:54weEmcp1Cnsdh4Tzanr4MqHqhQ=
In-Reply-To: <d381398d-580b-449c-bb67-21a88c9ed106n@googlegroups.com>
 by: Bart - Fri, 7 Jan 2022 19:18 UTC

On 07/01/2022 18:44, Malcolm McLean wrote:
> On Friday, 7 January 2022 at 17:34:16 UTC, james...@alumni.caltech.edu wrote:
>>
>> UTF-16 is, as I understand it, the default in the Windows world. I'm not that familiar
>> with it, I've only done a few years of programming targeting that platform, and all of
>> the text that came up in the work I was doing there was simple English, with no need
>> to make any use of extended characters, so I can't vouch for any of the details about
>> how text was encoded. However, Windows is a very common platform, whether or not
>> you approve of its design (I share your disapproval for it), so calling UTF-16 obscure
>> seems very odd.
>>
> Every Windows API call that takes text comes in an A-suffix or a W-suffix call.
> The A-suffix takes ascii strings, the W-suffix takes near UTF-16, actually Microsoft's
> slightly incompatible version. If you don't provide a suffix at all, the compiler
> selects a version which depends on how you have set it up. I'm not sure what the
> default is or exactly how you play with the settings.
>>
>>> I think that having all text streams and plain non-prefixed "strings" as
>>> UTF-8 is both possible and most logical.
>> Yes, and that is allowed by both standards, and is the norm, not the exception, on most
>> Unix-like platforms that I'm familiar with. That's why your claim that it's not allowed
>> confuses me.
>
> My experience is that passing utf-8 to printf() or fopen() doesn't work. But I rarely
> need to do so, and the situation might have changed recently.
>

This program, which contains UTF8 sequences:

#include <stdio.h>

int main(void) {
printf("ø°PÇ€\n");
}

works OK if compiled with bcc or tcc and run with codepage 65001 active.

However it doesn't work if compiled with gcc; I don't know why.

Calling a Windows -A function (e.g. MessageBoxA) with UTF8 strings
doesn't work either.

There are also wider aspects than sending output via stdout as mentioned
in the article, such as command line input.

So it's still a mess on Windows from what I can see.

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

<b79341ed-50d2-4f32-aff3-96837b4a8d1an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19880&group=comp.lang.c#19880

X-Received: by 2002:a05:620a:22ed:: with SMTP id p13mr44116701qki.768.1641587265450;
Fri, 07 Jan 2022 12:27:45 -0800 (PST)
X-Received: by 2002:ac8:5c03:: with SMTP id i3mr60500396qti.107.1641587265313;
Fri, 07 Jan 2022 12:27:45 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 7 Jan 2022 12:27:45 -0800 (PST)
In-Reply-To: <sra3mm$974$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2a00:23a8:400a:5601:5cfd:d1e0:7cc4:f7bb;
posting-account=Dz2zqgkAAADlK5MFu78bw3ab-BRFV4Qn
NNTP-Posting-Host: 2a00:23a8:400a:5601:5cfd:d1e0:7cc4:f7bb
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com> <b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
<27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com> <884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>
<d381398d-580b-449c-bb67-21a88c9ed106n@googlegroups.com> <sra3mm$974$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b79341ed-50d2-4f32-aff3-96837b4a8d1an@googlegroups.com>
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Fri, 07 Jan 2022 20:27:45 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 64
 by: Malcolm McLean - Fri, 7 Jan 2022 20:27 UTC

On Friday, 7 January 2022 at 19:18:57 UTC, Bart wrote:
> On 07/01/2022 18:44, Malcolm McLean wrote:
> > On Friday, 7 January 2022 at 17:34:16 UTC, james...@alumni.caltech.edu wrote:
> >>
> >> UTF-16 is, as I understand it, the default in the Windows world. I'm not that familiar
> >> with it, I've only done a few years of programming targeting that platform, and all of
> >> the text that came up in the work I was doing there was simple English, with no need
> >> to make any use of extended characters, so I can't vouch for any of the details about
> >> how text was encoded. However, Windows is a very common platform, whether or not
> >> you approve of its design (I share your disapproval for it), so calling UTF-16 obscure
> >> seems very odd.
> >>
> > Every Windows API call that takes text comes in an A-suffix or a W-suffix call.
> > The A-suffix takes ascii strings, the W-suffix takes near UTF-16, actually Microsoft's
> > slightly incompatible version. If you don't provide a suffix at all, the compiler
> > selects a version which depends on how you have set it up. I'm not sure what the
> > default is or exactly how you play with the settings.
> >>
> >>> I think that having all text streams and plain non-prefixed "strings" as
> >>> UTF-8 is both possible and most logical.
> >> Yes, and that is allowed by both standards, and is the norm, not the exception, on most
> >> Unix-like platforms that I'm familiar with. That's why your claim that it's not allowed
> >> confuses me.
> >
> > My experience is that passing utf-8 to printf() or fopen() doesn't work. But I rarely
> > need to do so, and the situation might have changed recently.
> >
> This program, which contains UTF8 sequences:
>
> #include <stdio.h>
>
> int main(void) {
> printf("ø°PÇ€\n");
> }
>
> works OK if compiled with bcc or tcc and run with codepage 65001 active.
>
Are you sure that's compiling to utf-8?
A better test would be to build the utf-8 sequence explicitly, and see if the output
is as specified.
>
> So it's still a mess on Windows from what I can see.
>
It's not just Windows. On my platform, there are several different types and C++
classes that are supposed to hold Unicode. We need a suite of little functions
to do ad hoc conversions between them. Most of these are binary no-ops.

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

<srac11$8kq$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19881&group=comp.lang.c#19881

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: bc...@freeuk.com (Bart)
Newsgroups: comp.lang.c
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris
Wellons
Date: Fri, 7 Jan 2022 21:40:49 +0000
Organization: A noiseless patient Spider
Lines: 46
Message-ID: <srac11$8kq$1@dont-email.me>
References: <sr0psj$g2d$1@dont-email.me>
<761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com>
<b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
<27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com>
<884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>
<d381398d-580b-449c-bb67-21a88c9ed106n@googlegroups.com>
<sra3mm$974$1@dont-email.me>
<b79341ed-50d2-4f32-aff3-96837b4a8d1an@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 7 Jan 2022 21:40:49 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="7d75b15abc98a18e211b332600021ef9";
logging-data="8858"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18uVWkP2k1C4bfMwj8ZWtAcrp5aGGoTuB0="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Cancel-Lock: sha1:q0T1VH+WMrDdBsbGEWHtjIYUUJc=
In-Reply-To: <b79341ed-50d2-4f32-aff3-96837b4a8d1an@googlegroups.com>
 by: Bart - Fri, 7 Jan 2022 21:40 UTC

On 07/01/2022 20:27, Malcolm McLean wrote:
> On Friday, 7 January 2022 at 19:18:57 UTC, Bart wrote:
>> On 07/01/2022 18:44, Malcolm McLean wrote:
>>> On Friday, 7 January 2022 at 17:34:16 UTC, james...@alumni.caltech.edu wrote:
>>>>
>>>> UTF-16 is, as I understand it, the default in the Windows world. I'm not that familiar
>>>> with it, I've only done a few years of programming targeting that platform, and all of
>>>> the text that came up in the work I was doing there was simple English, with no need
>>>> to make any use of extended characters, so I can't vouch for any of the details about
>>>> how text was encoded. However, Windows is a very common platform, whether or not
>>>> you approve of its design (I share your disapproval for it), so calling UTF-16 obscure
>>>> seems very odd.
>>>>
>>> Every Windows API call that takes text comes in an A-suffix or a W-suffix call.
>>> The A-suffix takes ascii strings, the W-suffix takes near UTF-16, actually Microsoft's
>>> slightly incompatible version. If you don't provide a suffix at all, the compiler
>>> selects a version which depends on how you have set it up. I'm not sure what the
>>> default is or exactly how you play with the settings.
>>>>
>>>>> I think that having all text streams and plain non-prefixed "strings" as
>>>>> UTF-8 is both possible and most logical.
>>>> Yes, and that is allowed by both standards, and is the norm, not the exception, on most
>>>> Unix-like platforms that I'm familiar with. That's why your claim that it's not allowed
>>>> confuses me.
>>>
>>> My experience is that passing utf-8 to printf() or fopen() doesn't work. But I rarely
>>> need to do so, and the situation might have changed recently.
>>>
>> This program, which contains UTF8 sequences:
>>
>> #include <stdio.h>
>>
>> int main(void) {
>> printf("ø°PÇ€\n");
>> }
>>
>> works OK if compiled with bcc or tcc and run with codepage 65001 active.
>>
> Are you sure that's compiling to utf-8?
> A better test would be to build the utf-8 sequence explicitly, and see if the output
> is as specified.

Notepad was told to save as UTF8. Codepage 65001 is UTF8 (and it didn't
work with 1252). And I just checked the source file to confirm the
sequences are the correct UTF8.

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

<986d6844-5092-4f95-b605-80e10addd24an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19882&group=comp.lang.c#19882

X-Received: by 2002:a05:620a:4542:: with SMTP id u2mr16296760qkp.605.1641592987659;
Fri, 07 Jan 2022 14:03:07 -0800 (PST)
X-Received: by 2002:ac8:5c03:: with SMTP id i3mr60789451qti.107.1641592987484;
Fri, 07 Jan 2022 14:03:07 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 7 Jan 2022 14:03:07 -0800 (PST)
In-Reply-To: <srac11$8kq$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2a00:23a8:400a:5601:5cfd:d1e0:7cc4:f7bb;
posting-account=Dz2zqgkAAADlK5MFu78bw3ab-BRFV4Qn
NNTP-Posting-Host: 2a00:23a8:400a:5601:5cfd:d1e0:7cc4:f7bb
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com> <b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
<27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com> <884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>
<d381398d-580b-449c-bb67-21a88c9ed106n@googlegroups.com> <sra3mm$974$1@dont-email.me>
<b79341ed-50d2-4f32-aff3-96837b4a8d1an@googlegroups.com> <srac11$8kq$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <986d6844-5092-4f95-b605-80e10addd24an@googlegroups.com>
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Fri, 07 Jan 2022 22:03:07 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 70
 by: Malcolm McLean - Fri, 7 Jan 2022 22:03 UTC

On Friday, 7 January 2022 at 21:41:00 UTC, Bart wrote:
> On 07/01/2022 20:27, Malcolm McLean wrote:
> > On Friday, 7 January 2022 at 19:18:57 UTC, Bart wrote:
> >> On 07/01/2022 18:44, Malcolm McLean wrote:
> >>> On Friday, 7 January 2022 at 17:34:16 UTC, james...@alumni.caltech.edu wrote:
> >>>>
> >>>> UTF-16 is, as I understand it, the default in the Windows world. I'm not that familiar
> >>>> with it, I've only done a few years of programming targeting that platform, and all of
> >>>> the text that came up in the work I was doing there was simple English, with no need
> >>>> to make any use of extended characters, so I can't vouch for any of the details about
> >>>> how text was encoded. However, Windows is a very common platform, whether or not
> >>>> you approve of its design (I share your disapproval for it), so calling UTF-16 obscure
> >>>> seems very odd.
> >>>>
> >>> Every Windows API call that takes text comes in an A-suffix or a W-suffix call.
> >>> The A-suffix takes ascii strings, the W-suffix takes near UTF-16, actually Microsoft's
> >>> slightly incompatible version. If you don't provide a suffix at all, the compiler
> >>> selects a version which depends on how you have set it up. I'm not sure what the
> >>> default is or exactly how you play with the settings.
> >>>>
> >>>>> I think that having all text streams and plain non-prefixed "strings" as
> >>>>> UTF-8 is both possible and most logical.
> >>>> Yes, and that is allowed by both standards, and is the norm, not the exception, on most
> >>>> Unix-like platforms that I'm familiar with. That's why your claim that it's not allowed
> >>>> confuses me.
> >>>
> >>> My experience is that passing UTF-8 to printf() or fopen() doesn't work. But I rarely
> >>> need to do so, and the situation might have changed recently.
> >>>
> >> This program, which contains UTF8 sequences:
> >>
> >> #include <stdio.h>
> >>
> >> int main(void) {
> >> printf("ø°PÇ€\n");
> >> }
> >>
> >> works OK if compiled with bcc or tcc and run with codepage 65001 active.
> >>
> > Are you sure that's compiling to UTF-8?
> > A better test would be to build the UTF-8 sequence explicitly, and see if the output
> > is as specified.
> Notepad was told to save as UTF8. Codepage 65001 is UTF8 (and it didn't
> work with 1252). And I just checked the source file to confirm the
> sequences are the correct UTF8.
>
So almost certainly the compiler is compiling it to UTF-8.
But it would have been easier to just specify one non-ASCII character as raw bytes,
forcing it to use UTF-8 whatever the source or execution character set.
If it displays correctly, then you know that UTF-8 is supported by the printf / terminal
combination.
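Malcolm's suggested check can be sketched in C. This is an illustration, not code from the thread; `euro_utf8` and `print_euro` are hypothetical names. Spelling the euro sign as the hex escapes E2 82 AC means the literal holds those three bytes however the source file was saved, so what appears on screen depends only on the C runtime and the terminal.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names, for illustration only.
   U+20AC EURO SIGN written as its three UTF-8 bytes (E2 82 AC).
   Each hex escape yields one byte with exactly that value, so the
   result does not depend on the encoding of the source file. */
static const char euro_utf8[] = "\xE2\x82\xAC";

/* Writes the euro sign and a newline to `out`.
   Returns 0 on success, EOF if the write failed. */
int print_euro(FILE *out)
{
    assert(out != NULL);
    return fprintf(out, "%s\n", euro_utf8) < 0 ? EOF : 0;
}
```

If `print_euro(stdout)` run under code page 65001 shows `â‚¬` instead of `€`, the three bytes were most likely reinterpreted in code page 1252 somewhere between the program and the screen.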

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

<srae3l$nd8$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19883&group=comp.lang.c#19883

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: bc...@freeuk.com (Bart)
Newsgroups: comp.lang.c
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris
Wellons
Date: Fri, 7 Jan 2022 22:16:21 +0000
Organization: A noiseless patient Spider
Lines: 62
Message-ID: <srae3l$nd8$1@dont-email.me>
References: <sr0psj$g2d$1@dont-email.me>
<761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com>
<b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
<27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com>
<884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>
<d381398d-580b-449c-bb67-21a88c9ed106n@googlegroups.com>
<sra3mm$974$1@dont-email.me>
<b79341ed-50d2-4f32-aff3-96837b4a8d1an@googlegroups.com>
<srac11$8kq$1@dont-email.me>
<986d6844-5092-4f95-b605-80e10addd24an@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 7 Jan 2022 22:16:21 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="7d75b15abc98a18e211b332600021ef9";
logging-data="23976"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+A+ycWlHZ/2RrYRk8K1ZsjoBoSJzsTPzE="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Cancel-Lock: sha1:HSnhQzC+5Hq+Ye7b3eN5UWVWH0c=
In-Reply-To: <986d6844-5092-4f95-b605-80e10addd24an@googlegroups.com>
 by: Bart - Fri, 7 Jan 2022 22:16 UTC

On 07/01/2022 22:03, Malcolm McLean wrote:
> On Friday, 7 January 2022 at 21:41:00 UTC, Bart wrote:
>> On 07/01/2022 20:27, Malcolm McLean wrote:
>>> On Friday, 7 January 2022 at 19:18:57 UTC, Bart wrote:
>>>> On 07/01/2022 18:44, Malcolm McLean wrote:
>>>>> On Friday, 7 January 2022 at 17:34:16 UTC, james...@alumni.caltech.edu wrote:
>>>>>>
>>>>>> UTF-16 is, as I understand it, the default in the Windows world. I'm not that familiar
>>>>>> with it, I've only done a few years of programming targeting that platform, and all of
>>>>>> the text that came up in the work I was doing there was simple English, with no need
>>>>>> to make any use of extended characters, so I can't vouch for any of the details about
>>>>>> how text was encoded. However, Windows is a very common platform, whether or not
>>>>>> you approve of its design (I share your disapproval of it), so calling UTF-16 obscure
>>>>>> seems very odd.
>>>>>>
>>>>> Every Windows API call that takes text comes in an A-suffix or a W-suffix call.
>>>>> The A-suffix takes ASCII strings; the W-suffix takes near UTF-16, actually Microsoft's
>>>>> slightly incompatible version. If you don't provide a suffix at all, the compiler
>>>>> selects a version which depends on how you have set it up. I'm not sure what the
>>>>> default is or exactly how you play with the settings.
>>>>>>
>>>>>>> I think that having all text streams and plain non-prefixed "strings" as
>>>>>>> UTF-8 is both possible and most logical.
>>>>>> Yes, and that is allowed by both standards, and is the norm, not the exception, on most
>>>>>> Unix-like platforms that I'm familiar with. That's why your claim that it's not allowed
>>>>>> confuses me.
>>>>>
>>>>> My experience is that passing UTF-8 to printf() or fopen() doesn't work. But I rarely
>>>>> need to do so, and the situation might have changed recently.
>>>>>
>>>> This program, which contains UTF8 sequences:
>>>>
>>>> #include <stdio.h>
>>>>
>>>> int main(void) {
>>>> printf("ø°PÇ€\n");
>>>> }
>>>>
>>>> works OK if compiled with bcc or tcc and run with codepage 65001 active.
>>>>
>>> Are you sure that's compiling to UTF-8?
>>> A better test would be to build the UTF-8 sequence explicitly, and see if the output
>>> is as specified.
>> Notepad was told to save as UTF8. Codepage 65001 is UTF8 (and it didn't
>> work with 1252). And I just checked the source file to confirm the
>> sequences are the correct UTF8.
>>
> So almost certainly the compiler is compiling it to UTF-8.

Actually, my compiler at least is doing nothing at all. It knows nothing
about UTF8; it's just a sequence of bytes forming a string literal. The
E2 82 AC sequence representing € is output to the binary as an E2 82 AC
sequence, just as 41 42 43 is passed through as 41 42 43 ("ABC").

That's the advantage of UTF8.
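Bart's point that the compiler just copies the bytes can be checked by dumping a literal's contents. A minimal sketch, assuming a hosted C99 implementation; `hex_bytes` is a hypothetical helper, not a library function:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   Writes the bytes of NUL-terminated string `s` into `out` as
   space-separated uppercase hex pairs, e.g. "41 42 43" for "ABC".
   Returns the number of characters written (excluding the NUL).
   If `out` is too small, output stops after the last complete pair. */
size_t hex_bytes(const char *s, char *out, size_t outsz)
{
    size_t used = 0;
    assert(s != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        int n = snprintf(out + used, outsz - used,
                         used == 0 ? "%02X" : " %02X", (unsigned)*p);
        if (n < 0 || (size_t)n >= outsz - used) {
            out[used] = '\0';   /* drop the partially written pair */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

For a compiler that passes source bytes through as Bart describes, a literal typed as `€` in a file saved as UTF-8 should dump as `E2 82 AC`, the same as the escaped form `"\xE2\x82\xAC"`.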

It is the editor that needs to be aware of it: it has to handle input,
display, and writing to a file with the correct encoding.

And the runtime or OS must also display UTF8 sequences correctly.
It's that bit that is going wrong with gcc.

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

<6297c8c7-4be0-4a5a-92dc-5fae321adf7an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19884&group=comp.lang.c#19884

X-Received: by 2002:a05:6214:212d:: with SMTP id r13mr19865993qvc.63.1641594680076;
Fri, 07 Jan 2022 14:31:20 -0800 (PST)
X-Received: by 2002:a05:620a:440d:: with SMTP id v13mr44831106qkp.597.1641594679842;
Fri, 07 Jan 2022 14:31:19 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 7 Jan 2022 14:31:19 -0800 (PST)
In-Reply-To: <srae3l$nd8$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2a00:23a8:400a:5601:5cfd:d1e0:7cc4:f7bb;
posting-account=Dz2zqgkAAADlK5MFu78bw3ab-BRFV4Qn
NNTP-Posting-Host: 2a00:23a8:400a:5601:5cfd:d1e0:7cc4:f7bb
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com> <b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
<27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com> <884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>
<d381398d-580b-449c-bb67-21a88c9ed106n@googlegroups.com> <sra3mm$974$1@dont-email.me>
<b79341ed-50d2-4f32-aff3-96837b4a8d1an@googlegroups.com> <srac11$8kq$1@dont-email.me>
<986d6844-5092-4f95-b605-80e10addd24an@googlegroups.com> <srae3l$nd8$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6297c8c7-4be0-4a5a-92dc-5fae321adf7an@googlegroups.com>
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Fri, 07 Jan 2022 22:31:20 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 89
 by: Malcolm McLean - Fri, 7 Jan 2022 22:31 UTC

On Friday, 7 January 2022 at 22:16:33 UTC, Bart wrote:
> On 07/01/2022 22:03, Malcolm McLean wrote:
> > On Friday, 7 January 2022 at 21:41:00 UTC, Bart wrote:
> >> On 07/01/2022 20:27, Malcolm McLean wrote:
> >>> On Friday, 7 January 2022 at 19:18:57 UTC, Bart wrote:
> >>>> On 07/01/2022 18:44, Malcolm McLean wrote:
> >>>>> On Friday, 7 January 2022 at 17:34:16 UTC, james...@alumni.caltech.edu wrote:
> >>>>>>
> >>>>>> UTF-16 is, as I understand it, the default in the Windows world. I'm not that familiar
> >>>>>> with it, I've only done a few years of programming targeting that platform, and all of
> >>>>>> the text that came up in the work I was doing there was simple English, with no need
> >>>>>> to make any use of extended characters, so I can't vouch for any of the details about
> >>>>>> how text was encoded. However, Windows is a very common platform, whether or not
> >>>>>> you approve of its design (I share your disapproval of it), so calling UTF-16 obscure
> >>>>>> seems very odd.
> >>>>>>
> >>>>> Every Windows API call that takes text comes in an A-suffix or a W-suffix call.
> >>>>> The A-suffix takes ASCII strings; the W-suffix takes near UTF-16, actually Microsoft's
> >>>>> slightly incompatible version. If you don't provide a suffix at all, the compiler
> >>>>> selects a version which depends on how you have set it up. I'm not sure what the
> >>>>> default is or exactly how you play with the settings.
> >>>>>>
> >>>>>>> I think that having all text streams and plain non-prefixed "strings" as
> >>>>>>> UTF-8 is both possible and most logical.
> >>>>>> Yes, and that is allowed by both standards, and is the norm, not the exception, on most
> >>>>>> Unix-like platforms that I'm familiar with. That's why your claim that it's not allowed
> >>>>>> confuses me.
> >>>>>
> >>>>> My experience is that passing UTF-8 to printf() or fopen() doesn't work. But I rarely
> >>>>> need to do so, and the situation might have changed recently.
> >>>>>
> >>>> This program, which contains UTF8 sequences:
> >>>>
> >>>> #include <stdio.h>
> >>>>
> >>>> int main(void) {
> >>>> printf("ø°PÇ€\n");
> >>>> }
> >>>>
> >>>> works OK if compiled with bcc or tcc and run with codepage 65001 active.
> >>>>
> >>> Are you sure that's compiling to UTF-8?
> >>> A better test would be to build the UTF-8 sequence explicitly, and see if the output
> >>> is as specified.
> >> Notepad was told to save as UTF8. Codepage 65001 is UTF8 (and it didn't
> >> work with 1252). And I just checked the source file to confirm the
> >> sequences are the correct UTF8.
> >>
> > So almost certainly the compiler is compiling it to UTF-8.
> Actually, my compiler at least is doing nothing at all. It knows nothing
> about UTF8; it's just a sequence of bytes forming a string literal. The
> E2 82 AC sequence representing € is output to the binary as an E2 82 AC
> sequence, just as 41 42 43 is passed through as 41 42 43 ("ABC").
>
> That's the advantage of UTF8.
>
> It is the editor that needs to be aware of it: it has to handle input,
> display, and writing to a file with the correct encoding.
>
Yes, the majority of a C source file is going to be pure ASCII, with only
a few extended string literals embedded. So UTF-8 is a good choice. The
compiler needs no modification, and file size remains about the same.
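The file-size point follows from how UTF-8 is constructed: code points below U+0080 are encoded as a single byte equal to their ASCII value, and only other characters take two to four bytes. A sketch of the encoder (`utf8_encode` is a hypothetical name, not a standard function) makes the byte counts explicit:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Encodes Unicode code point `cp` as UTF-8 into `out`.
   Returns the number of bytes written (1 to 4), or 0 if `cp` is not a
   Unicode scalar value (a UTF-16 surrogate or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                  /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                 /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                     /* surrogates are not characters */
    if (cp < 0x10000) {               /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {             /* 11110xxx + three continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, € (U+20AC) comes out as the E2 82 AC sequence Bart mentions, while every ASCII character stays one byte.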
>
> And the runtime or OS must also display UTF8 sequences correctly.
> It's that bit that is going wrong with gcc.
>
The terminal is the same, so gcc must be linking a printf() that doesn't
treat UTF-8 correctly, though it's hard to see what could be going wrong
if the terminal takes UTF-8 in 8-bit bytes.
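Bart's working case relied on code page 65001 being active in the console. A Windows program can request that itself with the Win32 function `SetConsoleOutputCP` and the constant `CP_UTF8` (65001). The sketch below (hypothetical name `use_utf8_console`) is offered only as the usual first thing to rule out; it is not a diagnosis of the gcc behaviour described above, which the thread leaves unexplained.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper, for illustration only.
   Asks the Windows console to treat the program's output bytes as UTF-8,
   the in-program counterpart of running "chcp 65001" first.
   Returns 1 on success, 0 if SetConsoleOutputCP failed.
   On other systems the terminal's encoding is configured outside the
   program, so this does nothing and returns 1. */
int use_utf8_console(void)
{
    fflush(stdout);   /* write anything already buffered before switching */
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) ? 1 : 0;
#else
    return 1;
#endif
}
```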

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

<c52c7902-0ce0-4db2-af97-1f9fc5c2a9fan@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19885&group=comp.lang.c#19885

X-Received: by 2002:a05:6214:2622:: with SMTP id gv2mr61114994qvb.128.1641612725139;
Fri, 07 Jan 2022 19:32:05 -0800 (PST)
X-Received: by 2002:a05:620a:28c1:: with SMTP id l1mr6707347qkp.362.1641612724996;
Fri, 07 Jan 2022 19:32:04 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 7 Jan 2022 19:32:04 -0800 (PST)
In-Reply-To: <884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=94.246.251.164; posting-account=pysjKgkAAACLegAdYDFznkqjgx_7vlUK
NNTP-Posting-Host: 94.246.251.164
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com> <b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
<27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com> <884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c52c7902-0ce0-4db2-af97-1f9fc5c2a9fan@googlegroups.com>
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
From: oot...@hot.ee (Öö Tiib)
Injection-Date: Sat, 08 Jan 2022 03:32:05 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 188
 by: Öö Tiib - Sat, 8 Jan 2022 03:32 UTC

On Friday, 7 January 2022 at 19:34:16 UTC+2, james...@alumni.caltech.edu wrote:
> On Friday, January 7, 2022 at 7:06:26 AM UTC-5, Öö Tiib wrote:
> > On Thursday, 6 January 2022 at 19:28:46 UTC+2, james...@alumni.caltech.edu wrote:
> > > On Wednesday, January 5, 2022 at 11:35:38 PM UTC-5, Öö Tiib wrote:
> ...
> > > > Even a programmer on such a system might want to upgrade to a compiler
> > > > whose standard library supports UTF-8. But that is not possible, as the
> > > > standard library is defined not to.
> > > Could you cite the text from the C and C++ standards that prohibits the
> > > standard library from supporting UTF-8?
> > Are you pretending that you did not understand what I meant?
> No, I am quite accurately and honestly expressing my confusion. You object to
> something being prohibited by the standards that is, to the best of my understanding,
> allowed. It would make more sense if you were objecting to the fact that it isn't
> mandatory, and if you were making such claims, I would disagree with you about
> whether it would be a good idea to make it mandatory - but as far as I can tell, you're
> claiming it isn't allowed.

It is allowed. Almost anything is allowed. But you yourself listed all that distracting
and confusing half-support, all those char8_t-s, added and deprecated codecvt-s
and u8 prefixes. Every possible thing designed to avoid adding actual support
to the standard.

> If you could, as requested, cite the relevant text that prohibits such compilers, you
> might convince me that I'm wrong. If not, the citation would at least enable me to try
> to convince you that you're misinterpreting that text. Neither possibility can happen
> until you actually honor that request.

I cannot possibly cite that. And now I'm confused about how you could snip
"The standards allow implementations to have wide array of whatever obscure
extensions." Didn't that already say it? If I worded it unclearly, then it is my fault.
Why must UTF-8 be an extension?

>
> ...
> > However, some essential things, like adding 128-bit integers, UTF-8 support,
> > or even stopping that nonsense with newline characters, are made tricky.
>
> Neither 128-bit integers nor UTF-8 support are essential. Lots of people have no
> need of either (I've never needed either one, though I have used UTF-8 since it was
> available). If you need such things on a platform where no existing implementation
> of C provides them, your complaint is with the implementors, not the standard,
> because the standard says nothing to prohibit such things.
>

In a world where 98% of text communication uses UTF-8, there is of course
some shrinking 2% of the market left.
> ...
> > I am saying that when one asks why the default string can't be UTF-8,
> > EBCDIC is usually mentioned, even though there is probably not much C,
> > let alone C++, used on the few EBCDIC platforms still alive.
> That seems reasonable to me. The only places where that logic applies are
> implementations of C targeting EBCDIC platforms, and regardless of how rare such
> implementations are, they would become substantially rarer because users would abandon
> them if they switched to UTF-8.

A hypothetical C programmer on an EBCDIC platform (I've never heard of one) most likely
wants to exchange information with the rest of the world. So why would he not want
to upgrade to a compiler that supports UTF-8? I am purely speculating, as I have no
experience with those devices.

>
> ...
> > UTF-16 and UTF-32 are also among those 2% of obscure text encodings.
> > OK, UTF-16 can be useful to communicate with mis-designed programming
> > languages like Java or C# or operating systems like Windows. But UTF-32
> > is rather exotic.
> UTF-16 is, as I understand it, the default in the Windows world. I'm not that familiar
> with it, I've only done a few years of programming targeting that platform, and all of
> the text that came up in the work I was doing there was simple English, with no need
> to make any use of extended characters, so I can't vouch for any of the details about
> how text was encoded. However, Windows is a very common platform, whether or not
> you approve of its design (I share your disapproval of it), so calling UTF-16 obscure
> seems very odd.

It has uses, as I already confirmed. I called it obscure because it combines the
overhead and need for BOMs of UTF-32 with the inconveniences of UTF-8 (characters
of varying length), without any of the benefits.
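On the BOM remark: a byte-order mark (U+FEFF) at the start of a UTF-16 or UTF-32 stream tells the reader which byte order was used. UTF-8 has no byte order to signal, although some Windows tools prepend EF BB BF anyway. A sketch of BOM sniffing (hypothetical `detect_bom`; it inspects only the first bytes and does not validate the rest of the data):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE, BOM_UTF32LE, BOM_UTF32BE };

/* Reports which byte-order mark, if any, the buffer p[0..n) starts with.
   UTF-32LE is tested before UTF-16LE because its BOM (FF FE 00 00) begins
   with the UTF-16LE one (FF FE); a UTF-16LE stream whose first character
   is U+0000 is therefore misreported as UTF-32LE. */
enum bom_kind detect_bom(const unsigned char *p, size_t n)
{
    assert(p != NULL || n == 0);
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return BOM_UTF32LE;
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return BOM_UTF32BE;
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return BOM_UTF8;
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return BOM_UTF16LE;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return BOM_UTF16BE;
    return BOM_NONE;
}
```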

>
> UTF-32 shares an important property with the implementation-defined encoding used
> for wchar_t: every character takes up one and only one element in the array. I have
> written a lot of code over several decades for parsing strings that assumes that every
> character takes up one char, a valid assumption in the contexts in which I wrote it. When
> I think about how I would have to re-write such code to work with multi-byte encodings
> such as UTF-8, then the simplicity of replacing char with wchar_t or char32_t, leaving my
> logic unchanged, starts looking pretty attractive. However, since I have little experience
> writing code to work with extended characters using any encoding, my preferences don't
> carry much weight.

That is a good point. UTF-8 converts trivially to UTF-32 and back. So where such
an exactly-one guarantee makes some algorithm more robust and simpler, we can of
course easily convert. But I see no reason to keep (or to transfer) UTF-32 outside
of such a context.
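The conversion is simple bit manipulation, but a robust one also has to reject malformed input. A sketch of a validating UTF-8 to UTF-32 decoder (hypothetical `utf8_to_utf32`, using `uint32_t` elements rather than `char32_t` so it needs only C99) that yields the one-element-per-code-point layout James describes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Decodes the UTF-8 bytes s[0..n) into code points stored in out[0..cap).
   Returns the number of code points written, or (size_t)-1 if the input
   is malformed (invalid lead byte, missing continuation byte, overlong
   form, surrogate, value above U+10FFFF) or `out` is too small. */
size_t utf8_to_utf32(const unsigned char *s, size_t n, uint32_t *out, size_t cap)
{
    size_t i = 0, k = 0;
    assert(s != NULL || n == 0);
    while (i < n) {
        unsigned char b = s[i];
        uint32_t cp, min;   /* min: smallest value allowed for this length */
        size_t len;
        if (b < 0x80)                    { cp = b;        len = 1; min = 0; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; len = 2; min = 0x80; }
        else if (b >= 0xE0 && b <= 0xEF) { cp = b & 0x0F; len = 3; min = 0x800; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; len = 4; min = 0x10000; }
        else return (size_t)-1;          /* stray continuation, C0/C1, F5..FF */
        if (len > n - i || k == cap)
            return (size_t)-1;           /* truncated sequence or no room */
        for (size_t j = 1; j < len; ++j) {
            if ((s[i + j] & 0xC0) != 0x80)
                return (size_t)-1;       /* expected a 10xxxxxx byte */
            cp = (cp << 6) | (uint32_t)(s[i + j] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;           /* overlong, out of range, or surrogate */
        out[k++] = cp;
        i += len;
    }
    return k;
}
```

Going back to UTF-8 is the per-code-point encoding rule applied in a loop.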

> > > While C, in particular, doesn't mandate quite as much support for UTF-8 as
> > > I'd like, both standards allow the fullest possible support for UTF-8 that I
> > > could imagine. Why do you think otherwise?
> > I think that having all text streams and plain non-prefixed "strings" as
> > UTF-8 is both possible and most logical.
> Yes, and that is allowed by both standards, and is the norm, not the exception, on most
> Unix-like platforms that I'm familiar with. That's why your claim that it's not allowed
> confuses me.

If I appeared to make that claim, then it is probably my fault. I meant that it is made
difficult by adding things that look like attempts to support it one day, while there
really is no UTF-8 support in the standards even though we have used it everywhere for decades.

> If you don't work on Unix systems where the encoding of non-prefixed
> strings is UTF-8, and you don't work on Windows systems where UTF-16 is the norm, and
> you don't work on systems where EBCDIC is the norm, what kinds of systems do you
> work on? I'm not saying that there aren't any other types of systems; there are a great
> many, but most of the others are substantially less common, so I am just curious which
> one(s) you use.

I have written C and C++ for systems and peripherals of things like cash dispensers,
point of sale terminals, taximeters, spectrometers, thermostats, frequency converters
and mobile phones. Also I have participated in projects of writing utility software and
services on Unix and Windows. Of the systems I've programmed, if the pointless
obfuscation garbage were removed from the standards and UTF-8 were required,
then perhaps only MS would have to do anything at all.

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

<g08CJ.61192$Ak2.25570@fx20.iad>

https://www.novabbs.com/devel/article-flat.php?id=19886&group=comp.lang.c#19886

Path: i2pn2.org!rocksolid2!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx20.iad.POSTED!not-for-mail
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0)
Gecko/20100101 Thunderbird/91.4.1
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris
Wellons
Content-Language: en-US
Newsgroups: comp.lang.c
References: <sr0psj$g2d$1@dont-email.me>
<761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com>
<b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
<27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com>
From: Rich...@Damon-Family.org (Richard Damon)
In-Reply-To: <27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 33
Message-ID: <g08CJ.61192$Ak2.25570@fx20.iad>
X-Complaints-To: abuse@easynews.com
Organization: Forte - www.forteinc.com
X-Complaints-Info: Please be sure to forward a copy of ALL headers otherwise we will be unable to process your complaint properly.
Date: Fri, 7 Jan 2022 23:01:16 -0500
X-Received-Bytes: 2887
X-Original-Bytes: 2754
 by: Richard Damon - Sat, 8 Jan 2022 04:01 UTC

On 1/7/22 7:06 AM, Öö Tiib wrote:
> UTF-16 and UTF-32 are also among those 2% of obscure text encodings.
> OK, UTF-16 can be useful to communicate with mis-designed programming
> languages like Java or C# or operating systems like Windows. But UTF-32
> is rather exotic.
>

One key thing to remember is that most UTF-16 (and especially Windows' use
of it) goes back to a too-early adoption of Unicode and UCS-2 as 'The
Standard' for text, when 16-bit 'Unicode' was claimed to be the answer to
the problem of all those code pages.

Then, after it had been adopted and baked into ABIs/APIs, it was realized
that Unicode was going to need more than 16 bits, so UCS-2 gave way to
UTF-16 and UCS-4 became the real 'wide character' type (except that in some
ways it still wasn't, due to combining code points).
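The UCS-2 to UTF-16 change is visible in the encoding rule: code points up to U+FFFF still fit in one 16-bit unit, but anything above takes two units (a surrogate pair). That is why a 16-bit `wchar_t`, as on Windows, cannot hold every character in a single element. A sketch (hypothetical `utf16_encode`, using `uint16_t` rather than `wchar_t` so it behaves the same everywhere):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Encodes code point `cp` as UTF-16 into `out`. Returns the number of
   16-bit units written: 1 for U+0000..U+FFFF (what UCS-2 could represent),
   2 (a surrogate pair) for U+10000..U+10FFFF, or 0 for surrogate code
   points and values above U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high (lead) surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low (trail) surrogate */
    return 2;
}
```

For instance, U+1F600 (an emoji) needs the two units D83D DE00, so code that treats one `wchar_t` as one character breaks on Windows for such characters.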

For C, this basically means that the 'wide character' system is
practically broken, and is really broken on Windows machines: the standard
says wchar_t must be able to represent every character of the largest
supported extended character set in a single unit, but on Windows the ABI
requires it to be 16 bits. And 32 bits isn't really correct either, even
on systems which do use that for wide characters, due to the combining
codes issue.
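The combining-code issue in concrete terms: "é" can be stored as the single code point U+00E9, or as "e" followed by U+0301 COMBINING ACUTE ACCENT. In the second form even a 32-bit element holds only part of what the reader sees as one character, so splitting a wide string between those two elements breaks it. The deliberately crude counter below (hypothetical, not a library function) shows the difference; real user-perceived characters require the grapheme-cluster rules of Unicode Standard Annex #29.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, rough illustration only.
   Counts code points in s[0..n) that are NOT in the Combining Diacritical
   Marks block (U+0300..U+036F), i.e. it assumes every such mark attaches
   to the preceding character. Full grapheme segmentation (UAX #29)
   covers many more cases than this. */
size_t count_base_chars(const uint32_t *s, size_t n)
{
    size_t count = 0;
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        if (!(s[i] >= 0x0300 && s[i] <= 0x036F))
            ++count;
    return count;
}
```

Both spellings of "é" count as one character here, even though the decomposed one occupies two 32-bit elements.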

Basically, for a system that wants to really conform to both Unicode and
the C standard, the implementation is put in a spot where it is actually
impossible to do so, at least if you want to keep the full intent of the
standard (that wide strings can be arbitrarily split and joined without
issue).

Then there is the fact that Unicode is not actually stateless once you add
things like the various left-to-right codes or emoji modifier characters.

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

<74dd4f1f-c5ff-4c9e-9a04-3616a978fb04n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19887&group=comp.lang.c#19887

X-Received: by 2002:a05:620a:4050:: with SMTP id i16mr8412825qko.274.1641617546279;
Fri, 07 Jan 2022 20:52:26 -0800 (PST)
X-Received: by 2002:a05:622a:11ce:: with SMTP id n14mr59123954qtk.432.1641617546061;
Fri, 07 Jan 2022 20:52:26 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 7 Jan 2022 20:52:25 -0800 (PST)
In-Reply-To: <c52c7902-0ce0-4db2-af97-1f9fc5c2a9fan@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=108.48.119.9; posting-account=Ix1u_AoAAAAILVQeRkP2ENwli-Uv6vO8
NNTP-Posting-Host: 108.48.119.9
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com> <b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
<27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com> <884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>
<c52c7902-0ce0-4db2-af97-1f9fc5c2a9fan@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <74dd4f1f-c5ff-4c9e-9a04-3616a978fb04n@googlegroups.com>
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
From: jameskuy...@alumni.caltech.edu (james...@alumni.caltech.edu)
Injection-Date: Sat, 08 Jan 2022 04:52:26 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 139
 by: james...@alumni.calt - Sat, 8 Jan 2022 04:52 UTC

On Friday, January 7, 2022 at 10:32:13 PM UTC-5, Öö Tiib wrote:
> On Friday, 7 January 2022 at 19:34:16 UTC+2, james...@alumni.caltech.edu wrote:
> > On Friday, January 7, 2022 at 7:06:26 AM UTC-5, Öö Tiib wrote:
....
> > No, I am quite accurately and honestly expressing my confusion. You object to
> > something being prohibited by the standards that is, to the best of my understanding,
> > allowed. It would make more sense if you were objecting to the fact that it isn't
> > mandatory, and if you were making such claims, I would disagree with you about
> > whether it would be a good idea to make it mandatory - but as far as I can tell, you're
> > claiming it isn't allowed.
> It is allowed. Almost anything is allowed. But you yourself listed all that distracting
> and confusing half-support, all those char8_t-s, added and deprecated codecvt-s
> and u8 prefixes. Every possible thing designed to avoid adding actual support
> to the standard.

So, you are arguing that it should be mandatory to have UTF-8 as the encoding for
unprefixed string literals, even for implementations targeting platforms where that's
contrary to the conventions for that platform?

> > If you could, as requested, cite the relevant text that prohibits such compilers, you
> > might convince me that I'm wrong. If not, the citation would at least enable me to try
> > to convince you that you're misinterpreting that text. Neither possibility can happen
> > until you actually honor that request.
> I cannot possibly cite that. And now I'm confused about how you could snip
> "The standards allow implementations to have wide array of whatever obscure
> extensions." Didn't that already say it? If I worded it unclearly, then it is my fault.
> Why must UTF-8 be an extension?

It didn't occur to me that you meant "obscure extensions" to refer to UTF-8 support. It
isn't an extension. "The values of the members of the execution character set are
implementation-defined." (5.2.1p1). That puts choosing UTF-8 for that encoding in the
same category as choosing 8 as the value for CHAR_BIT or setting the values for the
macros that are #defined in <limits.h>.
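That implementation-defined choice can be observed from inside a program. The sketch below (hypothetical `plain_literals_are_utf8`) writes the euro sign as the universal character name `\u20AC`, which the compiler must translate into the execution character set, and compares the result with the UTF-8 bytes. With gcc the execution character set can be chosen with `-fexec-charset=UTF-8`, and with MSVC with `/execution-charset:utf-8` or `/utf-8`; the function merely reports which choice was made.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   Returns 1 if this implementation encodes plain (unprefixed) string
   literals as UTF-8, judged by how it translates U+20AC EURO SIGN,
   and 0 otherwise. */
int plain_literals_are_utf8(void)
{
    static const char euro[] = "\u20AC";   /* translated to the execution charset */
    return strcmp(euro, "\xE2\x82\xAC") == 0;
}
```

With gcc or clang on a typical Linux system this returns 1; a compiler whose execution character set is, say, code page 1252 would produce a different byte and return 0.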

The term "extension" is not normally used for implementation-defined behavior. Note that
4p9 requires that "An implementation shall be accompanied by a document that defines
all implementation-defined and locale-specific characteristics and all extensions." If
implementation-defined behavior were considered to qualify as an extension, that
specification would be redundant, something the committee generally tries to avoid.

.....
> In a world where 98% of text communication uses UTF-8, there is of course
> some shrinking 2% of the market left.

Do you have sources for those numbers, or are you just pulling them out of your hat?
I'm not saying you're wrong, just that I don't know of any easy way to determine what
those numbers are.

> > ...
> > > I am saying that when one asks why the default string can't be UTF-8,
> > > EBCDIC is usually mentioned, even though there is probably not much C,
> > > let alone C++, used on the few EBCDIC platforms still alive.
> > That seems reasonable to me. The only places where that logic applies are
> > implementations of C targeting EBCDIC platforms, and regardless of how rare such
> > implementations are, they would become substantially rarer because users would abandon
> > them if they switched to UTF-8.
> A hypothetical C programmer on an EBCDIC platform (I've never heard of one) most likely
> wants to exchange information with the rest of the world. So why would he not want
> to upgrade to a compiler that supports UTF-8? I am purely speculating, as I have no
> experience with those devices.

Unlike your hypothetical C programmer on an EBCDIC platform, real programmers of that type
have access to conversion routines for use when the need to communicate outside the
EBCDIC world comes up. If UTF-8 were mandatory for unprefixed string literals, an
implementation targeting such platforms that conformed to such a mandate could add an
extension to create EBCDIC-encoded string literals. If so, developers for such platforms
would have to routinely use that extension for most of their string literals. Such developers
would find that very inconvenient, and would therefore make sure that any implementation
targeting that platform had an option that would make it fail to conform to such a mandate.

Imposing that mandate would fail to make UTF-8 any more widely used. The reason that
there do exist platforms where UTF-8 is not the encoding used for unprefixed string literals
is because their users want some other encoding to be used for that purpose. If that weren't
the case, someone would have already created a UTF-8 implementation for that platform.
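To give a sense of what such conversion routines deal with, here is a toy sketch covering only the characters whose positions are the same across the common EBCDIC code pages (such as 037 and 1047): letters, digits and space. The name `ebcdic_to_ascii` is hypothetical; a real converter would use a complete code-page table or a library such as iconv.

```c
#include <assert.h>

/* Hypothetical helper, toy illustration only.
   Maps one EBCDIC byte to ASCII for letters, digits and space; every other
   byte becomes '?'. ASCII results are written as numeric codes so the
   mapping does not depend on the host's own character set. */
char ebcdic_to_ascii(unsigned char e)
{
    if (e == 0x40) return (char)0x20;                                /* space */
    if (e >= 0x81 && e <= 0x89) return (char)(0x61 + (e - 0x81));   /* a-i */
    if (e >= 0x91 && e <= 0x99) return (char)(0x6A + (e - 0x91));   /* j-r */
    if (e >= 0xA2 && e <= 0xA9) return (char)(0x73 + (e - 0xA2));   /* s-z */
    if (e >= 0xC1 && e <= 0xC9) return (char)(0x41 + (e - 0xC1));   /* A-I */
    if (e >= 0xD1 && e <= 0xD9) return (char)(0x4A + (e - 0xD1));   /* J-R */
    if (e >= 0xE2 && e <= 0xE9) return (char)(0x53 + (e - 0xE2));   /* S-Z */
    if (e >= 0xF0 && e <= 0xF9) return (char)(0x30 + (e - 0xF0));   /* 0-9 */
    return '?';
}
```

Note the gaps between the letter runs (for example, nothing between 0x89 and 0x91), which is one reason EBCDIC text cannot be handled with simple range arithmetic on 'a'..'z'.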

....
> > how text was encoded. However, Windows is a very common platform, whether or not
> > you approve of its design (I share your disapproval of it), so calling UTF-16 obscure
> > seems very odd.
> It has uses, as I already confirmed. I called it obscure because it combines the
> overhead and need for BOMs of UTF-32 with the inconveniences of UTF-8 (characters
> of varying length), without any of the benefits.

It doesn't matter how strongly you disapprove of it - what matters is how many people want
to use it despite your disapproval.

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

<91478709-56f3-4696-b4ed-fd24e6335a9en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19888&group=comp.lang.c#19888

X-Received: by 2002:a05:6214:1c8b:: with SMTP id ib11mr30788468qvb.82.1641618819152;
Fri, 07 Jan 2022 21:13:39 -0800 (PST)
X-Received: by 2002:a05:620a:2911:: with SMTP id m17mr9062745qkp.151.1641618818981;
Fri, 07 Jan 2022 21:13:38 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Fri, 7 Jan 2022 21:13:38 -0800 (PST)
In-Reply-To: <d381398d-580b-449c-bb67-21a88c9ed106n@googlegroups.com>
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com> <b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
<27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com> <884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>
<d381398d-580b-449c-bb67-21a88c9ed106n@googlegroups.com>
Message-ID: <91478709-56f3-4696-b4ed-fd24e6335a9en@googlegroups.com>
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
From: jameskuy...@alumni.caltech.edu (james...@alumni.caltech.edu)
Injection-Date: Sat, 08 Jan 2022 05:13:39 +0000
 by: james...@alumni.calt - Sat, 8 Jan 2022 05:13 UTC

On Friday, January 7, 2022 at 1:44:39 PM UTC-5, Malcolm McLean wrote:
> On Friday, 7 January 2022 at 17:34:16 UTC, james...@alumni.caltech.edu wrote:
....
> > > I think that having all text streams and plain non-prefixed "strings" as
> > > UTF-8 is both possible and most logical.
> > Yes, and that is allowed by both standards, and is the norm, not the exception, on most
> > Unix-like platforms that I'm familiar with. That's why your claim that it's not allowed
> > confuses me.
> My experience is that passing utf-8 to printf() or fopen() doesn't work. But I rarely
> need to do so, and the situation might have changed recently.

My wife was born in Taiwan, and our kids are bilingual, so every computer in our house has
been set up to handle Chinese text properly. If yours isn't, you might not see the right
characters in the string literals below. You could try using

"\u5929\u5B89\u95E8\u5E7F\u573A"

instead - the encoding of the character arrays should be unchanged.

#include <inttypes.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <uchar.h>

int main(void)
{
    /* Use the environment's native locale; mbrtoc32() depends on it. */
    setlocale(LC_ALL, "");
    const char location[] = "天安门广场";
    const char location8[] = u8"天安门广场";
    const char *p;

    /* Dump the bytes of the unprefixed literal and of the u8 literal. */
    printf("Location :");
    for (p = location; *p; p++)
        printf("%#X ", (unsigned)*(const unsigned char *)p);

    printf("\nLocation8:");
    for (p = location8; *p; p++)
        printf("%#X ", (unsigned)*(const unsigned char *)p);

    printf("\n\"%s\"\n", location);

    /* Decode the unprefixed literal using the locale's multibyte encoding. */
    p = location;
    const char * const end = location + sizeof location;
    mbstate_t state = {0};
    while (*p)
    {
        char32_t c32;
        size_t bytes = mbrtoc32(&c32, p, (size_t)(end - p), &state);
        switch (bytes)
        {
        case (size_t)(-3):  /* extra char32_t from an earlier call; no input consumed */
            printf("%#" PRIXLEAST32 " ", c32);
            break;
        case (size_t)(-2):  /* input ended in the middle of a character */
            fprintf(stderr, "incomplete character\n");
            return EXIT_FAILURE;
        case (size_t)(-1):  /* invalid sequence for this locale */
            fprintf(stderr, "%td:Encoding error\n", p - location);
            return EXIT_FAILURE;
        default:            /* one complete character of 'bytes' bytes */
            printf("%#" PRIXLEAST32 " ", c32);
            p += bytes;
            break;
        }
    }

    printf("\n");
    return EXIT_SUCCESS;
}

That program produces the following output on my system:
Location :0XE5 0XA4 0XA9 0XE5 0XAE 0X89 0XE9 0X97 0XA8 0XE5 0XB9 0XBF 0XE5 0X9C 0XBA
Location8:0XE5 0XA4 0XA9 0XE5 0XAE 0X89 0XE9 0X97 0XA8 0XE5 0XB9 0XBF 0XE5 0X9C 0XBA
"天安门广场"
0X5929 0X5B89 0X95E8 0X5E7F 0X573A

Note that the u8 string is encoded exactly the same way as the unprefixed string literal,
confirming that, with this implementation, UTF-8 is the encoding used for unprefixed string literals.
The setlocale() call is not needed to correctly display the string, but it is needed for
mbrtoc32() to work. In the default "C" locale, mbrtoc32() reports an encoding error. The
"" locale is the implementation-defined default locale - I'm not sure what gcc defines that
default to be, but I suspect that it's the value of my LANG environment variable, which is
"en_US.UTF-8". Virtually every locale supported on my system has UTF-8 or utf8 in its name.
The "C" locale is one of the few exceptions, but a "C.UTF-8" locale is also supported.

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

<4a405512-8c50-479a-9928-857fc7d5fac4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19889&group=comp.lang.c#19889

Newsgroups: comp.lang.c
Date: Sat, 8 Jan 2022 08:17:44 -0800 (PST)
In-Reply-To: <74dd4f1f-c5ff-4c9e-9a04-3616a978fb04n@googlegroups.com>
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com> <b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
<27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com> <884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>
<c52c7902-0ce0-4db2-af97-1f9fc5c2a9fan@googlegroups.com> <74dd4f1f-c5ff-4c9e-9a04-3616a978fb04n@googlegroups.com>
Message-ID: <4a405512-8c50-479a-9928-857fc7d5fac4n@googlegroups.com>
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
From: oot...@hot.ee (Öö Tiib)
Injection-Date: Sat, 08 Jan 2022 16:17:45 +0000
 by: Öö Tiib - Sat, 8 Jan 2022 16:17 UTC

On Saturday, 8 January 2022 at 06:52:33 UTC+2, james...@alumni.caltech.edu wrote:
> On Friday, January 7, 2022 at 10:32:13 PM UTC-5, Öö Tiib wrote:
> > On Friday, 7 January 2022 at 19:34:16 UTC+2, james...@alumni.caltech.edu wrote:
> > > On Friday, January 7, 2022 at 7:06:26 AM UTC-5, Öö Tiib wrote:
> ...
> > > No, I am quite accurately and honestly expressing my confusion. You object to
> > > something being prohibited by the standards that is, to the best of my understanding,
> > > allowed. It would make more sense if you were objecting the fact that it isn't
> > > mandatory, and if you were making such claims, I would disagree with you about
> > > whether it would be a good idea to make it mandatory - but as far as I can tell, you're
> > > claiming it isn't allowed.
> > It is allowed. Almost whatever is allowed. But you yourself listed all that distracting
> > and confusing half-support, all those char8_t-s, added and deprecated codecvt-s
> > and u8 prefixes. Every possible thing designed to avoid adding actual support
> > to standard.
>
> So, you are arguing that it should be mandatory to have UTF-8 as the encoding for
> unprefixed string literals, even for implementations targeting platforms where that's
> contrary to the conventions for that platform?

Yes, and the vast majority would be happy. What char* text other than UTF-8 is needed?
Why? On what? For what? It must be some odd corner case. Every trashcan, smoke sensor or
microwave oven out there wants to communicate with whatever Siris, Alexas,
Google Homes and Skynets it serves. All of those use UTF-8 text. If it has an
LCD or LED panel, then it wants to show text understandable to the local desperate
housewife, low-salary technician or taxi driver. If that text is char*, then it is UTF-8.

> > > If you could, as requested, cite the relevant text that prohibits such compilers, you
> > > might convince me that I'm wrong. If not, the citation would at least enable me to try
> > > to convince you that you're misinterpreting that text. Neither possibility can happen
> > > until you actually honor that request.
> > I can not possibly cite that. And now I'm confused how can be you snipped that
> > "The standards allow implementations to have wide array of whatever obscure
> > extensions." That already told it? So if I worded it unclear, then it is my fault.
> > Why must UTF-8 be extension?
>
> It didn't occur to me that you meant "obscure extensions" to refer to UTF-8 support. It
> isn't an extension. "The values of the members of the execution character set are
> implementation-defined." (5.2.1p1). That puts choosing UTF-8 for that encoding in the
> same category as choosing 8 as the value for CHAR_BIT or setting the values for the
> macros that are #defined in <limits.h>.

CHAR_BIT can't be less than 8, so a UTF-8 code unit is guaranteed to fit in a char. The
flexibility to have CHAR_BIT greater than 8 can stay, since char also has to serve as the byte.

> The term "extension" is not normally used for implementation-defined behavior. Note that
> 4p9 requires that "An implementation shall be accompanied by a document that defines
> all implementation-defined and locale-specific characteristics and all extensions." If
> implementation-defined behavior were considered to qualify as an extension, that
> specification would be redundant, something the committee generally tries to avoid.

Whether or not it is implementation-defined behavior, in my experience, when text is passed
as char*, it points at UTF-8, and the programmer has to fight with that implementation-defined
garbage because he needs it to be UTF-8. And I'm complaining about attempts to lie to
novices that UTF-8 should be char8_t* or something else like that. A practical example:
FILE *f = fopen("Foo😀Bar.txt", "w");
That should work unless the underlying file system does not support files
named "Foo😀Bar.txt". If it supports them but the code does not work, that indicates
a bad standard that allows implementations to weasel out. No garbage like
u8fopen(u8"Foo😀Bar.txt", "w") arriving maybe in C35 or so is needed,
since it already works as in my example on the vast majority of systems.

> ....
> > In world where 98% of text communication goes with UTF-8 there of course
> > is some shrinking 2% of market left.
> Do you have sources for those numbers, or are you just pulling them out of your hat?
> I'm not saying you're wrong, just that I don't know of any easy way to determine what
> those numbers are.

There is no easy way, but some organizations diligently gather statistics on what can be
monitored, such as the character encodings used by websites:
<https://w3techs.com/technologies/history_overview/character_encoding>
Legacy encodings are still there but shrinking. If a new C position opens where text has
to be accessed through char*, the chance that it is anything other than UTF-8 is
close to 0.
> > > ...
> > > > I am saying that when to ask why default string can't be UTF-8 then that
> > > > EBCDIC is usually mentioned. Despite there probably are no much C, let
> > > > alone C++ used on few alive EBCDIC platforms.
> > > That seems reasonable to me. The only places where that logic applies are
> > > implementations of C targeting EBCDIC platforms, and regardless of how rare such
> > > implementations, they would become substantially rarer because users would abandon
> > > them if they switched to UTF-8.
> > Hypothetical C programmer on EBCDIC platform (never heard of one) most likely
> > wants to exchange information with rest of the world. So why he would not want
> > to upgrade to compiler that supports UTF-8? I am purely speculating as I got no
> > experience with those devices.
>
> Unlike your hypothetical C programmer on EBCDIC platform, real programmers of that type
> have access to conversion routines for use when the need to communicate outside the
> EBCDIC world comes up. If UTF-8 were mandatory for unprefixed string literals, an
> implementation targeting such platforms that conformed to such a mandate could add an
> extension to create EBCDIC-encoded string literals. If so, developers for such platforms
> would have to routinely use that extension for most of their string literals. Such developers
> would find that very inconvenient, and would therefore make sure that any implementation
> targeting that platform had an option that would make it fail to conform to such a mandate.

You never answered why they should use obscure extensions for what they need in the
majority of cases. Why must UTF-8 be an obscure extension?

> Imposing that mandate would fail to make UTF-8 any more widely used. The reason that
> there do exist platforms where UTF-8 is not the encoding used for unprefixed string literals
> is because their users want some other encoding to be used for that purpose. If that weren't
> the case, someone would have already created a UTF-8 implementation for that platform.

It is used in close to 100% of cases anyway. What I object to is that it is deliberately
standardized (or rather pseudo-standardized/non-standardized) to be
inconvenient to use.

> ...
> > > how text was encoded. However, Windows is a very common platform, whether or not
> > > you approve of its design (I share your disapproval for it), so calling UTF-16 obscure
> > > seems very odd.
> > It has usages like I confirmed already. Obscure I said because it has merged the
> > overhead and need for BOMs of UTF-32 with inconveniences of UTF-8 without
> > any benefits.
> It doesn't matter how strongly you disapprove of it - what matters is how many people want
> to use it despite your disapproval.

Agreed. So do you have numbers on how many C programmers *want* to use UTF-16? I
think it is few, but I do not have any sources. They may *need* to for the legacy
reasons I already mentioned, but even there it is most likely a small number. Their
pain with support for u"string", L"string" and \x, \u, \U escape sequences
might need relieving too, but that is a somewhat different topic.

Re: "Some sanity for C and C++ development on Windows" by Chris Wellons

<314f4088-9ea3-4117-b034-356d77a705cen@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19890&group=comp.lang.c#19890

Newsgroups: comp.lang.c
Date: Sat, 8 Jan 2022 09:17:12 -0800 (PST)
In-Reply-To: <4a405512-8c50-479a-9928-857fc7d5fac4n@googlegroups.com>
References: <sr0psj$g2d$1@dont-email.me> <761b391e-f071-484e-8507-f58eeb44a8e9n@googlegroups.com>
<sr53qo$vbl$1@dont-email.me> <_mpBJ.219710$qz4.56726@fx97.iad>
<36c23681-a90b-4de4-8451-e31e74f6c838n@googlegroups.com> <b13c9427-f475-4bcc-98c8-5de476b4e75bn@googlegroups.com>
<27fc916b-9aee-4a76-85e8-6d4a2281b74bn@googlegroups.com> <884c9725-5b12-4727-98a1-6b7c46efb4aen@googlegroups.com>
<c52c7902-0ce0-4db2-af97-1f9fc5c2a9fan@googlegroups.com> <74dd4f1f-c5ff-4c9e-9a04-3616a978fb04n@googlegroups.com>
<4a405512-8c50-479a-9928-857fc7d5fac4n@googlegroups.com>
Message-ID: <314f4088-9ea3-4117-b034-356d77a705cen@googlegroups.com>
Subject: Re: "Some sanity for C and C++ development on Windows" by Chris Wellons
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Sat, 08 Jan 2022 17:17:12 +0000
 by: Malcolm McLean - Sat, 8 Jan 2022 17:17 UTC

On Saturday, 8 January 2022 at 16:17:53 UTC, Öö Tiib wrote:
> On Saturday, 8 January 2022 at 06:52:33 UTC+2, james...@alumni.caltech.edu wrote:
>
> > So, you are arguing that it should be mandatory to have UTF-8 as the encoding for
> > unprefixed string literals, even for implementations targeting platforms where that's
> > contrary to the conventions for that platform?
> Yes, and vast majority would be happy. What other char* text is needed than UTF-8?
> Why? On what? For what? Must be odd corner case.
>
Where you've got an 8-bit character-mapped display that supports ASCII plus some
extended characters. That used to be almost every microcomputer, and it still lives
on to a degree in modern PCs.
