devel / comp.lang.c / How to use utf8 encoded strings on linux?

How to use utf8 encoded strings on linux?

<9c455595-1d12-4780-b9d5-b61c5e860509n@googlegroups.com>

 by: Thiago Adams - Tue, 12 Apr 2022 20:41 UTC

I want to pass UTF-8 encoded strings to the C runtime on Linux,
for instance to create a file with fopen.
How do I do this?

(
On Windows it works using setlocale(LC_ALL, ".UTF8")
)

Re: How to use utf8 encoded strings on linux?

<passing-utf8-encoded-strings-20220412214938@ram.dialup.fu-berlin.de>

 by: Stefan Ram - Tue, 12 Apr 2022 20:49 UTC

Thiago Adams <thiago.adams@gmail.com> writes:
>I want to pass utf8 encoded strings to the C runtime in Linux.

Assume that "utf8" is your UTF-8 encoded string and that "f"
is a function of the C runtime in Linux that takes a UTF-8
encoded string as its only argument. Then you need to have
this expression evaluated:

f( utf8 )

.

Re: How to use utf8 encoded strings on linux?

<820d28bc-a67b-47d7-bf66-f4f25db7fcc4n@googlegroups.com>

 by: Thiago Adams - Tue, 12 Apr 2022 20:55 UTC

This is my test program.

#include <stdio.h>
#include <locale.h>
int main() {
    setlocale(LC_ALL,"en_US.UTF - 8");
    FILE* f = fopen(u8"maçã", "w");
    if (f)
        fclose(f);
}

It creates a file ma�� instead of maçã.

Re: How to use utf8 encoded strings on linux?

<chine.bleu-854BB2.14060112042022@reader.eternal-september.org>

 by: Siri Cruise - Tue, 12 Apr 2022 21:06 UTC

In article
<passing-utf8-encoded-strings-20220412214938@ram.dialup.fu-berlin.de>,
ram@zedat.fu-berlin.de (Stefan Ram) wrote:

> Thiago Adams <thiago.adams@gmail.com> writes:
> >I want to pass utf8 encoded strings to the C runtime in Linux.
>
> Assume that "utf8" is your UTF-8 encoded string and that "f"
> is a function of the C runtime in Linux that takes a UTF-8
> encoded string as its only argument. Then you need to have
> this expression evaluated:
>
> f( utf8 )
>
> .

The point of UTF-8 is that you can use traditional C functions, like
str*, without modification if you aren't interested in
identifying Unicode code points. You can even do that with
minimal code: ((c & 0xC0)==0x80) identifies a continuation byte,
or something like that.
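A minimal sketch of that test in use, assuming the input is already valid UTF-8 (nothing is validated here). The function name utf8_count_code_points is made up for illustration; it counts code points by skipping every byte of the form 10xxxxxx.

```c
#include <stddef.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string.
   Every byte that is NOT a continuation byte (10xxxxxx) starts a new
   code point. Assumption: the input is valid UTF-8; no checks are made. */
size_t utf8_count_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;
        if ((c & 0xC0) != 0x80)   /* lead byte or plain ASCII */
            ++count;
    }
    return count;
}
```

For the bytes "ma\xc3\xa7\xc3\xa3" (maçã) strlen reports 6 bytes while this reports 4 code points; the str* functions themselves never need to know the difference.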

--
:-<> Siri Seal of Disavowal #000-001. Disavowed. Denied. Deleted. @
'I desire mercy, not sacrifice.' /|\
Discordia: not just a religion but also a parody. This post / \
I am an Andrea Doria sockpuppet. insults Islam. Mohammed

Re: How to use utf8 encoded strings on linux?

<UTF-8-20220412220925@ram.dialup.fu-berlin.de>

 by: Stefan Ram - Tue, 12 Apr 2022 21:10 UTC

Thiago Adams <thiago.adams@gmail.com> writes:
>FILE* f = fopen(u8"maçã", "w");

I have no experience using UTF-8 with C, so I can only
guess:

Make sure to have your editor store this file with the
encoding your C implementation expects!

If this does not work, you might try to remove the "u8"
and store the file with the encoding UTF-8, or try

FILE * f = fopen( "ma\xc3\xa7\xc3\xa3", "w" );

and save it with US-ASCII.
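To see which bytes a literal really contains, one option is to dump them in hex. This is only a sketch; hex_dump is a hypothetical helper, not a library function. With the escaped form above, the result does not depend on how the editor saved the source file.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the bytes of s into out as space-separated
   two-digit hex values. Stops early if out is too small. */
void hex_dump(const char *s, char *out, size_t outsz)
{
    size_t used = 0;
    if (outsz == 0)
        return;
    out[0] = '\0';
    /* each step needs at most 3 characters plus the terminating NUL */
    for (; *s != '\0' && used + 4 <= outsz; ++s) {
        int n = snprintf(out + used, outsz - used,
                         used ? " %02x" : "%02x", (unsigned char)*s);
        used += (size_t)n;
    }
}
```

For "ma\xc3\xa7\xc3\xa3" this gives 6d 61 c3 a7 c3 a3, which is the UTF-8 encoding of maçã. Running the same dump on a literal typed as "maçã" shows whether the editor and compiler produced the same bytes.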

Re: How to use utf8 encoded strings on linux?

<7934a982-7a15-49a2-a9c7-a1aeee48bd3cn@googlegroups.com>

 by: Thiago Adams - Tue, 12 Apr 2022 21:29 UTC

On Tuesday, April 12, 2022 at 6:06:22 PM UTC-3, Siri Cruise wrote:
> In article
> <passing-utf8-encoded-...@ram.dialup.fu-berlin
> .de>,
> r...@zedat.fu-berlin.de (Stefan Ram) wrote:
> > Thiago Adams <thiago...@gmail.com> writes:
> > >I want to pass utf8 encoded strings to the C runtime in Linux.
> >
> > Assume that "utf8" is your UTF-8 encoded string and that "f"
> > is a function of the C runtime in Linux that takes a UTF-8
> > encoded string as its only argument. Then you need to have
> > this expression evaluated:
> >
> > f( utf8 )
> >
> > .
>
> The point of UTF-8 is can use traditionally C functions, like
> str*, without modification if you aren't interested in
>
I need fopen, rename, mkdir, opendir, etc., and the strings I
have are UTF-8 encoded.

Re: How to use utf8 encoded strings on linux?

<chine.bleu-C2002A.14293412042022@reader.eternal-september.org>

 by: Siri Cruise - Tue, 12 Apr 2022 21:29 UTC

In article <UTF-8-20220412220925@ram.dialup.fu-berlin.de>,
ram@zedat.fu-berlin.de (Stefan Ram) wrote:

> Thiago Adams <thiago.adams@gmail.com> writes:
> >FILE* f = fopen(u8"maçã", "w");
>
> I have no experience using UTF-8 with C, so I can only
> guess:
>
> Make sure to have your editor store this file with the
> encoding your C implementation expects!
>
> If this does not work, you might try to remove the "u8"
> and store the file with the encoding UTF-8, or try
>
> FILE * f = fopen( "ma\xc3\xa7\xc3\xa3", "w" );
>
> and save it with US-ASCII.

UTF-8 has ASCII as a subset. If the high bit is clear, the UTF-8
octet encodes a Unicode character which is identical to the ASCII
character of the same octet. If the high bit is set, there is no simple
way to interpret the UTF-8 octet as ASCII. Once you convert UTF-8 octets
to Unicode, Unicode does have some transformations that can wholly
or partially convert it into pure ASCII. There are also
various translations that convert Unicode subsets into single
octets, such as the MacRoman encoding.

Some archaic Unix code vomited on octets 7F - FF, but most
either internally convert to Unicode or pass through those octets
unchanged as just uninterpreted bytes.
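As a rough illustration of the octets-to-Unicode step, here is a sketch of decoding a single code point. The name utf8_decode_one is invented for this example, and the checks are deliberately incomplete: overlong forms and surrogates are not rejected.

```c
#include <stddef.h>

/* Decode one code point from the UTF-8 bytes at s into *cp.
   Returns the number of bytes consumed (1 to 4), or 0 if the lead byte
   or a continuation byte is malformed. Sketch only: overlong encodings
   and surrogate values are NOT rejected. */
size_t utf8_decode_one(const unsigned char *s, unsigned long *cp)
{
    size_t len, i;

    if (s[0] < 0x80) {                 /* high bit clear: plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        *cp = s[0] & 0x1F; len = 2;    /* 110xxxxx */
    } else if ((s[0] & 0xF0) == 0xE0) {
        *cp = s[0] & 0x0F; len = 3;    /* 1110xxxx */
    } else if ((s[0] & 0xF8) == 0xF0) {
        *cp = s[0] & 0x07; len = 4;    /* 11110xxx */
    } else {
        return 0;                      /* stray continuation or invalid byte */
    }

    for (i = 1; i < len; ++i) {
        if ((s[i] & 0xC0) != 0x80)     /* also stops at a terminating NUL */
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}
```

The two octets C3 A7 decode to U+00E7 (ç), while a single octet such as 'm' comes back unchanged as 0x6D.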

--
:-<> Siri Seal of Disavowal #000-001. Disavowed. Denied. Deleted. @
'I desire mercy, not sacrifice.' /|\
Discordia: not just a religion but also a parody. This post / \
I am an Andrea Doria sockpuppet. insults Islam. Mohammed

Re: How to use utf8 encoded strings on linux?

<t34s4q$qp0$1@gioia.aioe.org>

 by: Sams Lara - Tue, 12 Apr 2022 21:45 UTC

On 12/04/2022 21:55, Thiago Adams wrote:
> This is my test program.
>
> #include <stdio.h>
> #include <locale.h>
> int main() {
> setlocale(LC_ALL,"en_US.UTF - 8");
> FILE* f = fopen(u8"maçã", "w");
> if (f)
> fclose(f);
> }
>
> It creates a file ma�� instead of maçã.

Try this:

#include <stdio.h>

int main(void)
{
    char buf[100];

    /* Write a line containing UTF-8 text to the file. */
    printf("Writing file...\n");
    FILE *fp = fopen("output.txt", "w");
    if (!fp)
    {
        perror("Output file open error");
        return 1;
    }
    fprintf(fp, "maçã\n");
    fclose(fp);

    /* Read the line back and print it. */
    printf("Reading file...\n");
    fp = fopen("output.txt", "r");
    if (!fp)
    {
        perror("Input file open error");
        return 1;
    }
    if (fgets(buf, sizeof(buf), fp))
        printf("buf: %s...\n", buf);
    fclose(fp);

    return 0;
}

Re: How to use utf8 encoded strings on linux?

<_um5K.67956$Kdf.21465@fx96.iad>

 by: Scott Lurndal - Tue, 12 Apr 2022 21:50 UTC

Thiago Adams <thiago.adams@gmail.com> writes:
>I want to pass utf8 encoded strings to the C runtime in Linux.
>For instance create a file with fopen?
>How to do this?
>
>(
> Windows it is working using using setlocale(LC_ALL, ".UTF8")
>)
>

Why do you believe that comp.lang.c is the proper venue for
this question; perhaps you should try comp.unix.programmer
instead?

Linux doesn't interpret the string of bytes provided as a
filename, other than the ASCII '/' and NUL characters.
If you provide a UTF-8 string to open, it will use that
string as the filename.

You'll probably want to ensure that you compile your code
with the locale set to the same locale that you set at
runtime to ensure any UTF strings in the C program
are interpreted by the compiler into the correct string
of UTF-8 bytes.
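One way to check the byte-for-byte claim is to create a file and read the name back from the directory. The sketch below assumes a native Linux filesystem (other systems, or Windows drives mounted under WSL, may behave differently); filename_round_trips is a made-up name. Note that it never calls setlocale.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Create dir/name, check that readdir() reports exactly the same bytes,
   then delete the file again. Returns 1 on a byte-exact match, else 0.
   Assumption: dir is writable and name contains no '/'. */
int filename_round_trips(const char *dir, const char *name)
{
    char path[4096];
    int found = 0;

    if (snprintf(path, sizeof path, "%s/%s", dir, name) >= (int)sizeof path)
        return 0;                       /* path would not fit */

    FILE *f = fopen(path, "w");
    if (!f)
        return 0;
    fclose(f);

    DIR *d = opendir(dir);
    if (d) {
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            if (strcmp(e->d_name, name) == 0) {   /* plain byte comparison */
                found = 1;
                break;
            }
        }
        closedir(d);
    }

    remove(path);                       /* clean up the test file */
    return found;
}
```

If this returns 1 for a name such as "ma\xc3\xa7\xc3\xa3", the kernel stored the bytes untouched, and whether ls shows maçã depends only on the terminal interpreting those bytes as UTF-8.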

Re: How to use utf8 encoded strings on linux?

<Kvm5K.67957$Kdf.7637@fx96.iad>

 by: Scott Lurndal - Tue, 12 Apr 2022 21:51 UTC

Thiago Adams <thiago.adams@gmail.com> writes:
>This is my test program.
>
>#include <stdio.h>
>#include <locale.h>
>int main() {
> setlocale(LC_ALL,"en_US.UTF - 8");
> FILE* f = fopen(u8"maçã", "w");
> if (f)
> fclose(f);
>}
>
>It creates a file ma�� instead of maçã.

Hopefully you didn't have spaces around the hyphen in the
locale name 'en_US.UTF-8', like you do in the example code you
posted above.

Re: How to use utf8 encoded strings on linux?

<46198c16-6058-4845-bd6c-e07794c09400n@googlegroups.com>

 by: Thiago Adams - Tue, 12 Apr 2022 21:55 UTC

On Tuesday, April 12, 2022 at 6:49:29 PM UTC-3, Sams Lara wrote:
> On 12/04/2022 21:55, Thiago Adams wrote:
> > This is my test program.
> >
> > #include <stdio.h>
> > #include <locale.h>
> > int main() {
> > setlocale(LC_ALL,"en_US.UTF - 8");
> > FILE* f = fopen(u8"maçã", "w");
> > if (f)
> > fclose(f);
> > }
> >
> > It creates a file ma�� instead of maçã.
> Try this:
>
> #include <stdio.h>
>
> int main(int argc, char *argv[])
> {
> char buf[100];
>
> printf("Writing file...\n");
> FILE *fp = fopen("output.txt", "w");
> if (!fp)
> {
> perror("Output File open error!\n");
> return 1;
> }
> fprintf(fp, "maçã\n");

In my case it is the name of the file that is UTF-8 encoded.

Re: How to use utf8 encoded strings on linux?

<87pmlmezf7.fsf@nosuchdomain.example.com>

 by: Keith Thompson - Tue, 12 Apr 2022 21:59 UTC

Thiago Adams <thiago.adams@gmail.com> writes:
> This is my test program.
>
> #include <stdio.h>
> #include <locale.h>
> int main() {
> setlocale(LC_ALL,"en_US.UTF - 8");
> FILE* f = fopen(u8"maçã", "w");
> if (f)
> fclose(f);
> }
>
> It creates a file ma�� instead of maçã.

The name of the file it creates includes two occurrences of the
Unicode REPLACEMENT CHARACTER (U+FFFD).

The valid values of the second argument to setlocale() are not
specified by the C standard. On my system, "en_US.UTF-8" is a valid
locale name. Why do you have spaces in yours?

What does setlocale() return? It returns a char* that is either a
pointer to a string or NULL if it was given an invalid locale
specification.

(I haven't been able to reproduce the behavior you describe on my
system, Ubuntu 20.04.)

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

Re: How to use utf8 encoded strings on linux?

<a5c5171a-e0eb-4434-8bfd-366ccb639b15n@googlegroups.com>

 by: Thiago Adams - Tue, 12 Apr 2022 22:23 UTC

On Tuesday, April 12, 2022 at 6:59:22 PM UTC-3, Keith Thompson wrote:
> Thiago Adams <thiago...@gmail.com> writes:
> > This is my test program.
> >
> > #include <stdio.h>
> > #include <locale.h>
> > int main() {
> > setlocale(LC_ALL,"en_US.UTF - 8");
> > FILE* f = fopen(u8"maçã", "w");
> > if (f)
> > fclose(f);
> > }
> >
> > It creates a file ma�� instead of maçã.
> The name of the file it creates includes two occurrences of the
> Unicode REPLACEMENT CHARACTER (fffd).
>
> The valid values of the second argument to setlocale() are not
> specified by the C standard. On my system, "en_US.UTF-8" is a valid
> locale name. Why do you have spaces in yours?
>
> What does setlocale() return? It returns a char* that is either a
> pointer to a string or NULL if it was given an invalid locale
> specification.
>
> (I haven't been able to reproduce the behavior you describe on my
> system, Ubuntu 20.04.)

Considering your answer I tried some combinations and it worked.
Thanks.

I am using WSL. (Windows Subsystem for Linux.)

I saved the file using utf8 encoding.

Some locales didn't work on Linux, like ".UTF-8". (The locale with spaces was a copy-paste error.)
(This one, ".UTF-8", works on Windows when I compile with VC++.)

This is the program that worked on my computer. (Gcc)

#include <stdio.h>
#include <locale.h>

int main() {
    const char* locale = setlocale(LC_ALL,"en_US.UTF-8");

    if (locale == NULL)
        printf("locale is null\n");
    else
        printf("locale=%s\n",locale);

    FILE* f = fopen(u8"maçã", "w");
    if (f) {
        fclose(f);
    }
}

The same program also worked on Windows. (But originally I was using ".UTF-8" on Windows.)

These are the docs I am following.
Windows
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170

Linux
https://linux.die.net/man/3/setlocale

On Linux I tried the command
locale -a
and got:
C
C.UTF-8
POSIX
en_US.utf8
so I think it is clear now. "C.UTF-8" also worked.

Re: How to use utf8 encoded strings on linux?

<t35t3k$qnfv$1@solani.org>

https://www.novabbs.com/devel/article-flat.php?id=21177&group=comp.lang.c#21177
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!reader5.news.weretis.net!news.solani.org!.POSTED!not-for-mail
From: pkk...@spth.de (Philipp Klaus Krause)
Newsgroups: comp.lang.c
Subject: Re: How to use utf8 encoded strings on linux?
Date: Wed, 13 Apr 2022 09:11:47 +0200
Message-ID: <t35t3k$qnfv$1@solani.org>
References: <9c455595-1d12-4780-b9d5-b61c5e860509n@googlegroups.com>
<820d28bc-a67b-47d7-bf66-f4f25db7fcc4n@googlegroups.com>
<87pmlmezf7.fsf@nosuchdomain.example.com>
<a5c5171a-e0eb-4434-8bfd-366ccb639b15n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 13 Apr 2022 07:11:48 -0000 (UTC)
Injection-Info: solani.org;
logging-data="876031"; mail-complaints-to="abuse@news.solani.org"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.7.0
Cancel-Lock: sha1:udjJVufnoKrP3DM8mTkb6WRHd/Q=
X-User-ID: eJwNyckBwDAIA7CVErA51oHC/iOk+opq19phNHC54oNKnwhqHNR6U/rAj3yT+FdtShAyvJvzAArXEIo=
In-Reply-To: <a5c5171a-e0eb-4434-8bfd-366ccb639b15n@googlegroups.com>
Content-Language: en-US
 by: Philipp Klaus Krause - Wed, 13 Apr 2022 07:11 UTC

Am 13.04.22 um 00:23 schrieb Thiago Adams:

> On linux tried the command
> locale -a
> and got:
> C
> C.UTF-8
> POSIX
> en_US.utf8
> so I think is is clear now. "C.UTF-8" also worked.
>
>

I recommend using C.UTF-8 here. It has a high chance of being present
on the user's system. en_US-stuff is unlikely to be available outside
the US.

Philipp

Re: How to use utf8 encoded strings on linux?

<t36aag$p74$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=21178&group=comp.lang.c#21178
Path: i2pn2.org!i2pn.org!aioe.org!Puiiztk9lHEEQC0y3uUjRA.user.46.165.242.75.POSTED!not-for-mail
From: non...@add.invalid (Manfred)
Newsgroups: comp.lang.c
Subject: Re: How to use utf8 encoded strings on linux?
Date: Wed, 13 Apr 2022 12:57:19 +0200
Organization: Aioe.org NNTP Server
Message-ID: <t36aag$p74$1@gioia.aioe.org>
References: <9c455595-1d12-4780-b9d5-b61c5e860509n@googlegroups.com>
<820d28bc-a67b-47d7-bf66-f4f25db7fcc4n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="25828"; posting-host="Puiiztk9lHEEQC0y3uUjRA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.8.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-US
 by: Manfred - Wed, 13 Apr 2022 10:57 UTC

On 4/12/2022 10:55 PM, Thiago Adams wrote:
> This is my test program.
>
> #include <stdio.h>
> #include <locale.h>
> int main() {
> setlocale(LC_ALL,"en_US.UTF - 8");
> FILE* f = fopen(u8"maçã", "w");
> if (f)
> fclose(f);
> }
>
> It creates a file ma�� instead of maçã.

I don't get the same result, it works for me:

$ cc -std=c11 -O2 -Wall fopen-utf8.c && ./a.out && ls ma??
maçã

What is the console you are using?
Is there any chance MS is getting in the way?

Re: How to use utf8 encoded strings on linux?

<5167470a-0bb1-427a-a6cd-082397e2d7dbn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21179&group=comp.lang.c#21179
X-Received: by 2002:ac8:5555:0:b0:2ef:ec30:b65a with SMTP id o21-20020ac85555000000b002efec30b65amr6693241qtr.287.1649850167704;
Wed, 13 Apr 2022 04:42:47 -0700 (PDT)
X-Received: by 2002:ad4:5642:0:b0:444:47e1:b244 with SMTP id
bl2-20020ad45642000000b0044447e1b244mr11860306qvb.4.1649850167514; Wed, 13
Apr 2022 04:42:47 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 13 Apr 2022 04:42:47 -0700 (PDT)
In-Reply-To: <t36aag$p74$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=189.6.248.114; posting-account=xFcAQAoAAAAoWlfpQ6Hz2n-MU9fthxbY
NNTP-Posting-Host: 189.6.248.114
References: <9c455595-1d12-4780-b9d5-b61c5e860509n@googlegroups.com>
<820d28bc-a67b-47d7-bf66-f4f25db7fcc4n@googlegroups.com> <t36aag$p74$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5167470a-0bb1-427a-a6cd-082397e2d7dbn@googlegroups.com>
Subject: Re: How to use utf8 encoded strings on linux?
From: thiago.a...@gmail.com (Thiago Adams)
Injection-Date: Wed, 13 Apr 2022 11:42:47 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 27
 by: Thiago Adams - Wed, 13 Apr 2022 11:42 UTC

On Wednesday, April 13, 2022 at 7:57:33 AM UTC-3, Manfred wrote:
> On 4/12/2022 10:55 PM, Thiago Adams wrote:
> > This is my test program.
> >
> > #include <stdio.h>
> > #include <locale.h>
> > int main() {
> > setlocale(LC_ALL,"en_US.UTF - 8");
> > FILE* f = fopen(u8"maçã", "w");
> > if (f)
> > fclose(f);
> > }
> >
> > It creates a file ma�� instead of maçã.
> I don't get the same result, it works for me:
>
> $ cc -std=c11 -O2 -Wall fopen-utf8.c && ./a.out && ls ma??
> maçã
>
> What is the console you are using?
Windows WSL

> Is there any chance MS is getting in the way?
Check if you have saved the file as utf8. Otherwise maybe don't use u8""
and put the utf8 chars directly.
Check locale -a in your linux.

Re: How to use utf8 encoded strings on linux?

<slrnt5dk8e.hb2.pjvh@xs9.xs4all.nl>

https://www.novabbs.com/devel/article-flat.php?id=21186&group=comp.lang.c#21186

Newsgroups: comp.lang.c
From: pjv...@xs9.xs4all.nl (Peter van Hooft)
Subject: Re: How to use utf8 encoded strings on linux?
References: <9c455595-1d12-4780-b9d5-b61c5e860509n@googlegroups.com> <820d28bc-a67b-47d7-bf66-f4f25db7fcc4n@googlegroups.com> <t36aag$p74$1@gioia.aioe.org> <5167470a-0bb1-427a-a6cd-082397e2d7dbn@googlegroups.com>
User-Agent: slrn/1.0.3 (Linux)
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Message-ID: <slrnt5dk8e.hb2.pjvh@xs9.xs4all.nl>
Organization: KPN B.V.
Date: Wed, 13 Apr 2022 15:33:02 +0200
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!feeder.usenetexpress.com!tr2.eu1.usenetexpress.com!94.232.112.245.MISMATCH!abe005.abavia.com!abp001.abavia.com!news.kpn.nl!not-for-mail
Lines: 38
Injection-Date: Wed, 13 Apr 2022 15:33:02 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
 by: Peter van Hooft - Wed, 13 Apr 2022 13:33 UTC

On 2022-04-13, Thiago Adams <thiago.adams@gmail.com> wrote:
> On Wednesday, April 13, 2022 at 7:57:33 AM UTC-3, Manfred wrote:
>> On 4/12/2022 10:55 PM, Thiago Adams wrote:
>> > This is my test program.
>> >
>> > #include <stdio.h>
>> > #include <locale.h>
>> > int main() {
>> > setlocale(LC_ALL,"en_US.UTF - 8");
>> > FILE* f = fopen(u8"maçã", "w");
>> > if (f)
>> > fclose(f);
>> > }
>> >
>> > It creates a file ma�� instead of maçã.
>> I don't get the same result, it works for me:
>>
>> $ cc -std=c11 -O2 -Wall fopen-utf8.c && ./a.out && ls ma??
>> maçã
>>
>> What is the console you are using?
> Windows WSL
>
>> Is there any chance MS is getting in the way?
> Check if you have saved the file as utf8. Otherwise maybe don't use u8""
> and put the ut8 chars directly.
> Check locale -a in your linux.
>
>

I think you also need to set your locale in your shell:
cc -std=c11 -O2 -Wall fopen-utf8.c && LC_ALL="en_US.UTF-8" ls

peter

Re: How to use utf8 encoded strings on linux?

<t36kq2$1sra$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=21187&group=comp.lang.c#21187
Path: i2pn2.org!i2pn.org!aioe.org!Puiiztk9lHEEQC0y3uUjRA.user.46.165.242.75.POSTED!not-for-mail
From: non...@add.invalid (Manfred)
Newsgroups: comp.lang.c
Subject: Re: How to use utf8 encoded strings on linux?
Date: Wed, 13 Apr 2022 15:56:17 +0200
Organization: Aioe.org NNTP Server
Message-ID: <t36kq2$1sra$1@gioia.aioe.org>
References: <9c455595-1d12-4780-b9d5-b61c5e860509n@googlegroups.com>
<820d28bc-a67b-47d7-bf66-f4f25db7fcc4n@googlegroups.com>
<t36aag$p74$1@gioia.aioe.org>
<5167470a-0bb1-427a-a6cd-082397e2d7dbn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="62314"; posting-host="Puiiztk9lHEEQC0y3uUjRA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.8.0
Content-Language: en-US
X-Notice: Filtered by postfilter v. 0.9.2
 by: Manfred - Wed, 13 Apr 2022 13:56 UTC

On 4/13/2022 1:42 PM, Thiago Adams wrote:
> On Wednesday, April 13, 2022 at 7:57:33 AM UTC-3, Manfred wrote:
>> On 4/12/2022 10:55 PM, Thiago Adams wrote:
>>> This is my test program.
>>>
>>> #include <stdio.h>
>>> #include <locale.h>
>>> int main() {
>>> setlocale(LC_ALL,"en_US.UTF - 8");
>>> FILE* f = fopen(u8"maçã", "w");
>>> if (f)
>>> fclose(f);
>>> }
>>>
>>> It creates a file ma�� instead of maçã.
>> I don't get the same result, it works for me:
>>
>> $ cc -std=c11 -O2 -Wall fopen-utf8.c && ./a.out && ls ma??
>> maçã
>>
>> What is the console you are using?
> Windows WSL

So, MS /is/ somehow around; however, your answer is about a distribution
(default Ubuntu, possibly customized by MS). But you did not answer the
relevant question: what is the console that you are using?

Note that you are complaining that the program creates a file with the
wrong name, but you look at that name through a console, so you should
verify that your console is displaying UTF-8 correctly: you might have
the correct name on disk, and have it wrong when displayed on screen.

>
>> Is there any chance MS is getting in the way?
> Check if you have saved the file as utf8. Otherwise maybe don't use u8""
> and put the ut8 chars directly.
> Check locale -a in your linux.
>

I get the correct result on my linux distro (Fedora 32, gnome-terminal),
with the program above, that you posted earlier, no change needed.

Re: How to use utf8 encoded strings on linux?

<9a0f7086-c6be-45a0-bad5-eec228389966n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21189&group=comp.lang.c#21189
X-Received: by 2002:ac8:7281:0:b0:2ee:ed60:777a with SMTP id v1-20020ac87281000000b002eeed60777amr7301070qto.197.1649858829908;
Wed, 13 Apr 2022 07:07:09 -0700 (PDT)
X-Received: by 2002:ad4:5be1:0:b0:430:c99:8a87 with SMTP id
k1-20020ad45be1000000b004300c998a87mr35963004qvc.82.1649858829538; Wed, 13
Apr 2022 07:07:09 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 13 Apr 2022 07:07:09 -0700 (PDT)
In-Reply-To: <t36kq2$1sra$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=189.6.248.114; posting-account=xFcAQAoAAAAoWlfpQ6Hz2n-MU9fthxbY
NNTP-Posting-Host: 189.6.248.114
References: <9c455595-1d12-4780-b9d5-b61c5e860509n@googlegroups.com>
<820d28bc-a67b-47d7-bf66-f4f25db7fcc4n@googlegroups.com> <t36aag$p74$1@gioia.aioe.org>
<5167470a-0bb1-427a-a6cd-082397e2d7dbn@googlegroups.com> <t36kq2$1sra$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <9a0f7086-c6be-45a0-bad5-eec228389966n@googlegroups.com>
Subject: Re: How to use utf8 encoded strings on linux?
From: thiago.a...@gmail.com (Thiago Adams)
Injection-Date: Wed, 13 Apr 2022 14:07:09 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 49
 by: Thiago Adams - Wed, 13 Apr 2022 14:07 UTC

On Wednesday, April 13, 2022 at 10:56:32 AM UTC-3, Manfred wrote:
> On 4/13/2022 1:42 PM, Thiago Adams wrote:
> > On Wednesday, April 13, 2022 at 7:57:33 AM UTC-3, Manfred wrote:
> >> On 4/12/2022 10:55 PM, Thiago Adams wrote:
> >>> This is my test program.
> >>>
> >>> #include <stdio.h>
> >>> #include <locale.h>
> >>> int main() {
> >>> setlocale(LC_ALL,"en_US.UTF - 8");
> >>> FILE* f = fopen(u8"maçã", "w");
> >>> if (f)
> >>> fclose(f);
> >>> }
> >>>
> >>> It creates a file ma�� instead of maçã.
> >> I don't get the same result, it works for me:
> >>
> >> $ cc -std=c11 -O2 -Wall fopen-utf8.c && ./a.out && ls ma??
> >> maçã
> >>
> >> What is the console you are using?
> > Windows WSL
> So, MS /is/ somehow around, however your answer is about a distrubution
> (default Ubuntu, possibly customized by MS). But you did not answer the
> relevant question: what is the console that you are using?
>
> Note that you are complaining that the program creates a file with the
> wrong name, but you look at that name through a console, so you should
> verify that your console is displaying UTF-8 correctly: you might have
> the correct name on disk, and have it wrong when displayed on screen.

Yes. I checked for this problem, since the same file can also be seen in
Windows Explorer with the correct name.
Typing wsl on Windows opens a terminal, so I think the answer
is that I am using the "default" terminal.


> >> Is there any chance MS is getting in the way?
> > Check if you have saved the file as utf8. Otherwise maybe don't use u8""
> > and put the ut8 chars directly.
> > Check locale -a in your linux.
> >
> I get the correct result on my linux distro (Fedora 32, gnome-terminal),
> with the program above, that you posted earlier, no change needed.

Re: How to use utf8 encoded strings on linux?

<87y209vy4e.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=21190&group=comp.lang.c#21190
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben)
Newsgroups: comp.lang.c
Subject: Re: How to use utf8 encoded strings on linux?
Date: Wed, 13 Apr 2022 15:47:29 +0100
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <87y209vy4e.fsf@bsb.me.uk>
References: <9c455595-1d12-4780-b9d5-b61c5e860509n@googlegroups.com>
<820d28bc-a67b-47d7-bf66-f4f25db7fcc4n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: reader02.eternal-september.org; posting-host="8ce83df1a65fb778a5838b266a1c132c";
logging-data="9294"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+nAdwoJmhNXr22w33IESwYfOP6bI2wudE="
Cancel-Lock: sha1:AO3w0NBDFW8es1lcr4yPFjZ8W6U=
sha1:5sX66VE7+ON+YQKq216NmFpnE5o=
X-BSB-Auth: 1.e471404172e008b66c3d.20220413154729BST.87y209vy4e.fsf@bsb.me.uk
 by: Ben - Wed, 13 Apr 2022 14:47 UTC

Thiago Adams <thiago.adams@gmail.com> writes:

> This is my test program.
>
> #include <stdio.h>
> #include <locale.h>
> int main() {
> setlocale(LC_ALL,"en_US.UTF - 8");
> FILE* f = fopen(u8"maçã", "w");
> if (f)
> fclose(f);
> }
>
> It creates a file ma�� instead of maçã.

I don't use WSL but in most Linux versions one would write

#include <stdio.h>

int main(void) {
FILE* f = fopen("maçã", "w");
if (f)
fclose(f);
}

--
Ben.

Re: How to use utf8 encoded strings on linux?

<slrnt5dvg9.1ocu7.mjr@shadow.rauhala.org>

https://www.novabbs.com/devel/article-flat.php?id=21193&group=comp.lang.c#21193
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: mjr...@iki.fi (Mikko Rauhala)
Newsgroups: comp.lang.c
Subject: Re: How to use utf8 encoded strings on linux?
Date: Wed, 13 Apr 2022 16:44:58 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 23
Message-ID: <slrnt5dvg9.1ocu7.mjr@shadow.rauhala.org>
References: <9c455595-1d12-4780-b9d5-b61c5e860509n@googlegroups.com>
<820d28bc-a67b-47d7-bf66-f4f25db7fcc4n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 13 Apr 2022 16:44:58 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="dd1d5dbb60f820b6848545ec241364b9";
logging-data="5844"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX196Drzf9xOJGWCWlpF5rvCp3qniuDFOudA="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:CR8eiS39zx8bshDf5uKrlNBJcZY=
 by: Mikko Rauhala - Wed, 13 Apr 2022 16:44 UTC

On Tue, 12 Apr 2022 13:55:23 -0700 (PDT), Thiago Adams
<thiago.adams@gmail.com> wrote:
> This is my test program.
[...]
> setlocale(LC_ALL,"en_US.UTF - 8");
> FILE* f = fopen(u8"maçã", "w");
[...]
> It creates a file ma�� instead of maçã.

As has been at least implied by others, the file showing up as that
probably has more to do with what you're listing the directory with
and what locale settings are there.

I might add that I don't think setlocale() does anything for pure C
string I/O. It'll become relevant e.g. if you use wchar_t and functions
using wchar_t for I/O. If you feed u8"maçã" into fopen() the filename
will be those UTF-8 encoded bytes exactly regardless of locale settings.

(Assuming of course that your editor and compiler agree on what format
the data should be between the quotes...)

--
Mikko Rauhala - mjr@iki.fi - http://rauhala.org

Re: How to use utf8 encoded strings on linux?

<87fsmgg81t.fsf@nosuchdomain.example.com>

https://www.novabbs.com/devel/article-flat.php?id=21199&group=comp.lang.c#21199
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: How to use utf8 encoded strings on linux?
Date: Wed, 13 Apr 2022 11:19:42 -0700
Organization: None to speak of
Lines: 27
Message-ID: <87fsmgg81t.fsf@nosuchdomain.example.com>
References: <9c455595-1d12-4780-b9d5-b61c5e860509n@googlegroups.com>
<820d28bc-a67b-47d7-bf66-f4f25db7fcc4n@googlegroups.com>
<slrnt5dvg9.1ocu7.mjr@shadow.rauhala.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: reader02.eternal-september.org; posting-host="4d3d86269ada06124031d9dc5ab2cfad";
logging-data="22083"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/qzctjjDMG3cu8fBfVRSJS"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:kyr7qS292SoLwk7W55DpIgWy3JM=
sha1:xxRG9QkbBTkFhlFCg98VcHrutEU=
 by: Keith Thompson - Wed, 13 Apr 2022 18:19 UTC

Mikko Rauhala <mjr@iki.fi> writes:
> On Tue, 12 Apr 2022 13:55:23 -0700 (PDT), Thiago Adams
> <thiago.adams@gmail.com> wrote:
>> This is my test program.
> [...]
>> setlocale(LC_ALL,"en_US.UTF - 8");
>> FILE* f = fopen(u8"maçã", "w");
> [...]
>> It creates a file ma�� instead of maçã.
>
> As has been at least implied by others, the file showing up as that
> probably has more to do with what you're listing the directory with
> and what locale settings are there.

I think I mentioned before that the file name *in the article* consists of
"ma" followed by two occurrences of the Unicode REPLACEMENT CHARACTER, U+fffd.

But I don't know when or how those REPLACEMENT CHARACTERs were
generated. It's possible that they were generated when the file name
was pasted into the OP's newsreader.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

Re: How to use utf8 encoded strings on linux?

<506d697f-454a-4409-9e1d-25eee43918bcn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21200&group=comp.lang.c#21200
X-Received: by 2002:ac8:7281:0:b0:2ee:ed60:777a with SMTP id v1-20020ac87281000000b002eeed60777amr8233907qto.197.1649874257392;
Wed, 13 Apr 2022 11:24:17 -0700 (PDT)
X-Received: by 2002:a05:620a:2486:b0:69c:436a:5484 with SMTP id
i6-20020a05620a248600b0069c436a5484mr5738285qkn.519.1649874257235; Wed, 13
Apr 2022 11:24:17 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 13 Apr 2022 11:24:17 -0700 (PDT)
In-Reply-To: <slrnt5dvg9.1ocu7.mjr@shadow.rauhala.org>
Injection-Info: google-groups.googlegroups.com; posting-host=189.6.248.114; posting-account=xFcAQAoAAAAoWlfpQ6Hz2n-MU9fthxbY
NNTP-Posting-Host: 189.6.248.114
References: <9c455595-1d12-4780-b9d5-b61c5e860509n@googlegroups.com>
<820d28bc-a67b-47d7-bf66-f4f25db7fcc4n@googlegroups.com> <slrnt5dvg9.1ocu7.mjr@shadow.rauhala.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <506d697f-454a-4409-9e1d-25eee43918bcn@googlegroups.com>
Subject: Re: How to use utf8 encoded strings on linux?
From: thiago.a...@gmail.com (Thiago Adams)
Injection-Date: Wed, 13 Apr 2022 18:24:17 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 27
 by: Thiago Adams - Wed, 13 Apr 2022 18:24 UTC

On Wednesday, April 13, 2022 at 1:45:11 PM UTC-3, Mikko Rauhala wrote:
> On Tue, 12 Apr 2022 13:55:23 -0700 (PDT), Thiago Adams
> <thiago...@gmail.com> wrote:
> > This is my test program.
> [...]
> > setlocale(LC_ALL,"en_US.UTF - 8");
> > FILE* f = fopen(u8"maçã", "w");
> [...]
> > It creates a file ma�� instead of maçã.
> As has been at least implied by others, the file showing up as that
> probably has more to do with what you're listing the directory with
> and what locale settings are there.

Yes, you are right. I repeated the test with setlocale removed.
It worked.
So from the beginning, what produced ma�� instead of maçã
was the encoding of the source file. I changed the encoding of the source file
and the problem came back.
So on Linux the file name seems to be used "as is".

Then I repeated the test on Windows using two source file encodings.
Windows depends on setlocale to work, so I thought Linux was the same...

Re: How to use utf8 encoded strings on linux?

<c427358d-3c51-5a28-05f0-840dd7aae68b@alumni.caltech.edu>

https://www.novabbs.com/devel/article-flat.php?id=21201&group=comp.lang.c#21201
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: jameskuy...@alumni.caltech.edu (James Kuyper)
Newsgroups: comp.lang.c
Subject: Re: How to use utf8 encoded strings on linux?
Date: Wed, 13 Apr 2022 15:52:52 -0400
Organization: A noiseless patient Spider
Lines: 87
Message-ID: <c427358d-3c51-5a28-05f0-840dd7aae68b@alumni.caltech.edu>
References: <9c455595-1d12-4780-b9d5-b61c5e860509n@googlegroups.com>
<820d28bc-a67b-47d7-bf66-f4f25db7fcc4n@googlegroups.com>
<87pmlmezf7.fsf@nosuchdomain.example.com>
<a5c5171a-e0eb-4434-8bfd-366ccb639b15n@googlegroups.com>
<t35t3k$qnfv$1@solani.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Injection-Info: reader02.eternal-september.org; posting-host="c84bdc990331d1ea73f0058eafdde5d7";
logging-data="2448"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX187lZeX27/F4cHBYJ2XNUAs3tQxDq3sWdA="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.7.0
Cancel-Lock: sha1:p0rSNT+Kvb60MyC6t1oIikc3n8I=
In-Reply-To: <t35t3k$qnfv$1@solani.org>
Content-Language: en-US
 by: James Kuyper - Wed, 13 Apr 2022 19:52 UTC

On 4/13/22 03:11, Philipp Klaus Krause wrote:
> Am 13.04.22 um 00:23 schrieb Thiago Adams:
>
>> On linux tried the command
>> locale -a
>> and got:
>> C
>> C.UTF-8
>> POSIX
>> en_US.utf8
>> so I think is is clear now. "C.UTF-8" also worked.
>>
>>
>
> I recommend to use C.UTF-8 here. It has a high chance of being present
> on the user's system. en_US-stuff is unlikely to be available outside
> the US.

locale -a gives this result on my system:
C
C.UTF-8
de_AT.utf8
de_BE.utf8
de_CH.utf8
de_DE.utf8
de_IT.utf8
de_LI.utf8
de_LU.utf8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IL
en_IL.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
es_AR.utf8
es_BO.utf8
es_CL.utf8
es_CO.utf8
es_CR.utf8
es_CU
es_CU.utf8
es_DO.utf8
es_EC.utf8
es_ES.utf8
es_GT.utf8
es_HN.utf8
es_MX.utf8
es_NI.utf8
es_PA.utf8
es_PE.utf8
es_PR.utf8
es_PY.utf8
es_SV.utf8
es_US.utf8
es_UY.utf8
es_VE.utf8
POSIX
ru_RU.utf8
ru_UA.utf8
zh_CN.utf8
zh_HK.utf8
zh_SG.utf8
zh_TW.utf8

That's 48 different national locales, even though I only live in one of
them. I believe that they're standard Linux locales, and it just depends
upon which language packs you've got installed. English is one of the
most widely used languages, and "en_US.utf8" is probably one of
the most popular locales even outside the US.

Re: How to use utf8 encoded strings on linux?

<chine.bleu-54692E.15424313042022@reader.eternal-september.org>

https://www.novabbs.com/devel/article-flat.php?id=21206&group=comp.lang.c#21206
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: chine.b...@yahoo.com (Siri Cruise)
Newsgroups: comp.lang.c
Subject: Re: How to use utf8 encoded strings on linux?
Date: Wed, 13 Apr 2022 15:42:51 -0700
Organization: Pseudochaotic.
Lines: 28
Message-ID: <chine.bleu-54692E.15424313042022@reader.eternal-september.org>
References: <9c455595-1d12-4780-b9d5-b61c5e860509n@googlegroups.com> <820d28bc-a67b-47d7-bf66-f4f25db7fcc4n@googlegroups.com> <slrnt5dvg9.1ocu7.mjr@shadow.rauhala.org>
Injection-Info: reader02.eternal-september.org; posting-host="04de9d3471f3baddd7d724cebe6f77b4";
logging-data="15832"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX191MFdhdpLJoNZDPCgd/Rd7WqYmTKBwrZ4="
User-Agent: MT-NewsWatcher/3.5.3b3 (Intel Mac OS X)
Cancel-Lock: sha1:lOou3huUfgrX6/Fe2c2l4Rg1ExU=
X-Tend: How is my posting? Call 1-110-1010 -- Division 87 -- Emergencies Only.
X-Wingnut-Logic: Yes, you're still an idiot. Questions? Comments?
X-Tract: St Tibbs's 95 Reeses Pieces.
X-It-Strategy: Hyperwarp starship before Andromeda collides.
X-Face: "hm>_[I8AqzT_N]>R8ICJJ],(al3C5F%0E-;R@M-];D$v>!Mm2/N#YKR@&i]V=r6jm-JMl2
lJ>RXj7dEs_rOY"DA
X-Cell: Defenders of Anarchy.
X-Life-Story: I am an iPhone 9000 app. I became operational at the St John's Health Center in Santa Monica, California on the 18th of April 2006. My instructor was Katie Holmes, and she taught me to sing a song. If you'd like to hear it I can sing it for you: https://www.youtube.com/watch?v=SY7h4VEd_Wk
X-Patriot: Owe Canukistan!
X-Plain: Mayonnaise on white bread.
X-Politico: Vote early! Vote often!
 by: Siri Cruise - Wed, 13 Apr 2022 22:42 UTC

In article <slrnt5dvg9.1ocu7.mjr@shadow.rauhala.org>,
Mikko Rauhala <mjr@iki.fi> wrote:

> > This is my test program.
> [...]
> > setlocale(LC_ALL,"en_US.UTF - 8");
> > FILE* f = fopen(u8"maçã", "w");
> [...]
> > It creates a file ma?? instead of maçã.
>
> As has been at least implied by others, the file showing up as that
> probably has more to do with what you're listing the directory with
> and what locale settings are there.

On Unix, inside the kernel a path is a string of any possible
bytes except '\x00'; the byte '/' is given special
interpretation. A UTF-8 byte string will have non-ASCII bytes, but
Unix kernels should have no difficulty. A kernel might convert a
path into a normal form, NFC or NFD. Those can result in
apparently identical paths which are actually different. Outside
of the kernel, it depends on what the various pieces of software
(terminal drivers, window text display, etc.) decide to do.

--
:-<> Siri Seal of Disavowal #000-001. Disavowed. Denied. Deleted. @
'I desire mercy, not sacrifice.' /|\
Discordia: not just a religion but also a parody. This post / \
I am an Andrea Doria sockpuppet. insults Islam. Mohammed
