
devel / comp.lang.c / strcasecmp for utf8 strings

Subject    Author
* strcasecmp for utf8 strings    Thiago Adams
+* Re: strcasecmp for utf8 strings    Malcolm McLean
|+* Re: strcasecmp for utf8 strings    Thiago Adams
||+* Re: strcasecmp for utf8 strings    Malcolm McLean
|||+* Re: strcasecmp for utf8 strings    Mateusz Viste
||||+- Re: strcasecmp for utf8 strings    Manfred
||||`* Re: strcasecmp for utf8 strings    no name
|||| +- [OT] Accents on uppercase letters in French (Was: strcasecmp for utf8 strings)    Ben Bacarisse
|||| `- Re: strcasecmp for utf8 strings    Keith Thompson
|||`* Re: strcasecmp for utf8 strings    Ben
||| `- Re: strcasecmp for utf8 strings    Malcolm McLean
||`* Re: strcasecmp for utf8 strings    David Brown
|| `* Re: strcasecmp for utf8 strings    Keith Thompson
||  `- Re: strcasecmp for utf8 strings    Michael Bäuerle
|`* Re: strcasecmp for utf8 strings    Jorgen Grahn
| `- Re: strcasecmp for utf8 strings    Siri Cruise
+- Re: strcasecmp for utf8 strings    Michael Bäuerle
+- Re: strcasecmp for utf8 strings    Lynn McGuire
`- Re: strcasecmp for utf8 strings    Guillaume

strcasecmp for utf8 strings

<cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21180&group=comp.lang.c#21180
X-Received: by 2002:a05:620a:4309:b0:67b:3fc1:86eb with SMTP id u9-20020a05620a430900b0067b3fc186ebmr6431504qko.495.1649850566496;
Wed, 13 Apr 2022 04:49:26 -0700 (PDT)
X-Received: by 2002:a05:622a:60f:b0:2e2:750:ce24 with SMTP id
z15-20020a05622a060f00b002e20750ce24mr6833583qta.315.1649850566327; Wed, 13
Apr 2022 04:49:26 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 13 Apr 2022 04:49:26 -0700 (PDT)
Injection-Info: google-groups.googlegroups.com; posting-host=189.6.248.114; posting-account=xFcAQAoAAAAoWlfpQ6Hz2n-MU9fthxbY
NNTP-Posting-Host: 189.6.248.114
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
Subject: strcasecmp for utf8 strings
From: thiago.a...@gmail.com (Thiago Adams)
Injection-Date: Wed, 13 Apr 2022 11:49:26 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 8
 by: Thiago Adams - Wed, 13 Apr 2022 11:49 UTC

I need to compare UTF-8 strings ignoring case on Windows and Linux.

I tried _stricmp on Windows and strcasecmp on Linux,
changing the locale to UTF-8. No success.

Has anyone tried this? Or had this problem?

I'm giving up trying to make this work using native OS functions, and maybe I will use a UTF-8 library that has some implementation.
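
A minimal sketch of one likely reason for the failure, assuming a POSIX system and the default "C" locale: strcasecmp() is typically implemented byte by byte, and the case of a multi-byte UTF-8 character cannot be changed one byte at a time. The helper name and the two byte-string constants are made up for this illustration; only strcasecmp() itself comes from the platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>   /* POSIX strcasecmp() */

/* UTF-8 encodings written as explicit bytes, so the encoding of the
   source file does not matter. Names are illustrative only. */
static const char E_ACUTE_UPPER[] = "\xC3\x89";  /* U+00C9, capital E with acute */
static const char E_ACUTE_LOWER[] = "\xC3\xA9";  /* U+00E9, small e with acute */

/* Returns true when strcasecmp() considers the two strings equal. */
static bool bytewise_caseless_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcasecmp(a, b) == 0;
}
```

In the "C" locale, bytewise_caseless_equal("ABC", "abc") is true, while bytewise_caseless_equal(E_ACUTE_UPPER, E_ACUTE_LOWER) is false: the second bytes 0x89 and 0xA9 are not ASCII letters, so they are compared unchanged.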

Re: strcasecmp for utf8 strings

<3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21181&group=comp.lang.c#21181
X-Received: by 2002:ac8:7dc8:0:b0:2e1:b3ec:6666 with SMTP id c8-20020ac87dc8000000b002e1b3ec6666mr6994542qte.556.1649853231915;
Wed, 13 Apr 2022 05:33:51 -0700 (PDT)
X-Received: by 2002:ac8:7fcc:0:b0:2e0:7760:2f10 with SMTP id
b12-20020ac87fcc000000b002e077602f10mr7029397qtk.34.1649853231773; Wed, 13
Apr 2022 05:33:51 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 13 Apr 2022 05:33:51 -0700 (PDT)
In-Reply-To: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a00:23a8:400a:5601:48c3:a6c3:9d4e:93ff;
posting-account=Dz2zqgkAAADlK5MFu78bw3ab-BRFV4Qn
NNTP-Posting-Host: 2a00:23a8:400a:5601:48c3:a6c3:9d4e:93ff
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com>
Subject: Re: strcasecmp for utf8 strings
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Wed, 13 Apr 2022 12:33:51 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 12
 by: Malcolm McLean - Wed, 13 Apr 2022 12:33 UTC

On Wednesday, 13 April 2022 at 12:49:33 UTC+1, Thiago Adams wrote:
> I need to compare UTF-8 strings ignoring case on Windows and Linux.
>
> I tried _stricmp on Windows and strcasecmp on Linux,
> changing the locale to UTF-8. No success.
>
> Has anyone tried this? Or had this problem?
>
> I'm giving up trying to make this work using native OS functions, and maybe I will use a UTF-8 library that has some implementation.
>
Case-insensitivity doesn't translate easily to non-English languages. Even in French, e-acute is capitalised as a simple "E"
without an accent, so the reversibility principle is broken. In Hebrew, you don't have a concept of capitals at all, but
you do have other variation in the basic 22 character set.

Re: strcasecmp for utf8 strings

<58ed3b3c-e241-4e81-a3ec-6ce30d8e1a6bn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21182&group=comp.lang.c#21182
X-Received: by 2002:a05:620a:4014:b0:69c:10af:d98e with SMTP id h20-20020a05620a401400b0069c10afd98emr6508997qko.633.1649854421782;
Wed, 13 Apr 2022 05:53:41 -0700 (PDT)
X-Received: by 2002:ac8:518b:0:b0:2f0:62f0:a8ae with SMTP id
c11-20020ac8518b000000b002f062f0a8aemr6107309qtn.51.1649854421584; Wed, 13
Apr 2022 05:53:41 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 13 Apr 2022 05:53:41 -0700 (PDT)
In-Reply-To: <3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=189.6.248.114; posting-account=xFcAQAoAAAAoWlfpQ6Hz2n-MU9fthxbY
NNTP-Posting-Host: 189.6.248.114
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com> <3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <58ed3b3c-e241-4e81-a3ec-6ce30d8e1a6bn@googlegroups.com>
Subject: Re: strcasecmp for utf8 strings
From: thiago.a...@gmail.com (Thiago Adams)
Injection-Date: Wed, 13 Apr 2022 12:53:41 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 33
 by: Thiago Adams - Wed, 13 Apr 2022 12:53 UTC

On Wednesday, April 13, 2022 at 9:34:00 AM UTC-3, Malcolm McLean wrote:
> On Wednesday, 13 April 2022 at 12:49:33 UTC+1, Thiago Adams wrote:
> > I need to compare UTF-8 strings ignoring case on Windows and Linux.
> >
> > I tried _stricmp on Windows and strcasecmp on Linux,
> > changing the locale to UTF-8. No success.
> >
> > Has anyone tried this? Or had this problem?
> >
> > I'm giving up trying to make this work using native OS functions, and maybe I will use a UTF-8 library that has some implementation.
> >
> Case-insensitivity doesn't translate easily to non-English languages. Even in French, e-acute is capitalised as a simple "E"
> without an accent, so the reversibility principle is broken.

é becomes E in French?

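The lib I am looking at at the moment is
https://github.com/sheredom/utf8.h/blob/master/utf8.h
(see utf8casecmp).
It treats two characters as equal if their lowercase forms match or if their uppercase forms match.

So comparing "é" with "E" under that French rule:
the lowercase test (é == e) fails, but the uppercase test (E == E) succeeds, so it returns true.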

Re: strcasecmp for utf8 strings

<b1f59d94-6649-4c70-8432-2b9722d928e0n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21183&group=comp.lang.c#21183
X-Received: by 2002:a0c:e8ca:0:b0:446:e7a:61af with SMTP id m10-20020a0ce8ca000000b004460e7a61afmr4980825qvo.37.1649854999478;
Wed, 13 Apr 2022 06:03:19 -0700 (PDT)
X-Received: by 2002:ac8:5851:0:b0:2e1:eba3:3beb with SMTP id
h17-20020ac85851000000b002e1eba33bebmr7082306qth.20.1649854998697; Wed, 13
Apr 2022 06:03:18 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 13 Apr 2022 06:03:18 -0700 (PDT)
In-Reply-To: <58ed3b3c-e241-4e81-a3ec-6ce30d8e1a6bn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a00:23a8:400a:5601:48c3:a6c3:9d4e:93ff;
posting-account=Dz2zqgkAAADlK5MFu78bw3ab-BRFV4Qn
NNTP-Posting-Host: 2a00:23a8:400a:5601:48c3:a6c3:9d4e:93ff
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
<3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com> <58ed3b3c-e241-4e81-a3ec-6ce30d8e1a6bn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b1f59d94-6649-4c70-8432-2b9722d928e0n@googlegroups.com>
Subject: Re: strcasecmp for utf8 strings
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Wed, 13 Apr 2022 13:03:19 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 37
 by: Malcolm McLean - Wed, 13 Apr 2022 13:03 UTC

On Wednesday, 13 April 2022 at 13:53:49 UTC+1, Thiago Adams wrote:
> On Wednesday, April 13, 2022 at 9:34:00 AM UTC-3, Malcolm McLean wrote:
> > On Wednesday, 13 April 2022 at 12:49:33 UTC+1, Thiago Adams wrote:
> > > I need to compare UTF-8 strings ignoring case on Windows and Linux.
> > >
> > > I tried _stricmp on Windows and strcasecmp on Linux,
> > > changing the locale to UTF-8. No success.
> > >
> > > Has anyone tried this? Or had this problem?
> > >
> > > I'm giving up trying to make this work using native OS functions, and maybe I will use a UTF-8 library that has some implementation.
> > >
> > Case-insensitivity doesn't translate easily to non-English languages. Even in French, e-acute is capitalised as a simple "E"
> > without an accent, so the reversibility principle is broken.
> é becomes E in French?
>
> The lib I am looking at this moment
> https://github.com/sheredom/utf8.h/blob/master/utf8.h
> (see utf8casecmp.)
> Compares if lowercase matches or if uppercase matches.
>
> So comparing "é" with "E" in French
> if é == e or E == E returns true.
>
Yes. You don't write the accents for uppercase letters in French.
So E matches either e or e-accented. So the logic
if (tolower(char_a) == tolower(char_b))

which is acceptable for English, would break in French. If you used
toupper() it would work, but possibly break for other languages.

It seems the casecmp() functions are defined to work if either toupper()
or tolower() give a match, which might work sufficiently well to be
usable.
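
The difference between the two rules can be sketched in C. This is only an illustration under the convention described above (accent dropped on the capital, which is disputed); toy_lower(), toy_upper() and the two match functions are hypothetical names, and the mappings cover ASCII plus e-acute only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy lowercase mapping on code points: ASCII letters only. */
static uint32_t toy_lower(uint32_t cp)
{
    if (cp >= 'A' && cp <= 'Z')
        return cp + ('a' - 'A');
    return cp;
}

/* Toy uppercase mapping: ASCII letters, plus the assumed convention
   that e-acute (U+00E9) capitalises to a plain 'E'. */
static uint32_t toy_upper(uint32_t cp)
{
    if (cp >= 'a' && cp <= 'z')
        return cp - ('a' - 'A');
    if (cp == 0x00E9)
        return 'E';
    return cp;
}

/* The tolower()-only rule that is acceptable for English. */
static bool match_lower_only(uint32_t a, uint32_t b)
{
    return toy_lower(a) == toy_lower(b);
}

/* The "either mapping gives a match" rule. */
static bool match_either(uint32_t a, uint32_t b)
{
    return toy_lower(a) == toy_lower(b) || toy_upper(a) == toy_upper(b);
}
```

With these mappings, match_lower_only(0x00E9, 'E') is false and match_either(0x00E9, 'E') is true. A side effect of the second rule is that match_either(0x00E9, 'e') is also true, so accented and unaccented lowercase letters compare equal.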

Re: strcasecmp for utf8 strings

<AABiVsw1nFUAAAGE.A3.flnews@WStation5.stz-e.de>

https://www.novabbs.com/devel/article-flat.php?id=21184&group=comp.lang.c#21184
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: michael....@stz-e.de (Michael Bäuerle)
Newsgroups: comp.lang.c
Subject: Re: strcasecmp for utf8 strings
Date: Wed, 13 Apr 2022 15:12:21 +0200 (CEST)
Lines: 18
Message-ID: <AABiVsw1nFUAAAGE.A3.flnews@WStation5.stz-e.de>
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
Reply-To: Michael Bäuerle <michael.baeuerle@gmx.net>
Mime-Version: 1.0
X-Trace: individual.net itWf7SLjJSAWuYjTdFsy0AUOnFW1BC6dNDTLhAinfoV4a6bLG8
X-Orig-Path: not-for-mail
Cancel-Lock: sha1:c+K9kfyox6lyqlSsuTrUElcmyko= sha256:rwRJ5HulsGf7JQ6h64OJgCcgV9gMfhTHGVFmM7KPuGA= sha1:0Rs/b+eDhKU5kebcWJdTilBy+ck=
Injection-Date: Wed, 13 Apr 2022 13:12:21 -0000
User-Agent: flnews/1.1.0pre14 (for NetBSD)
 by: Michael Bäuerle - Wed, 13 Apr 2022 13:12 UTC

Thiago Adams wrote:
>
> I need to compare utf-8 strings ignoring the case on windows and linux.

This is a nontrivial task with Unicode:
<https://www.unicode.org/reports/tr21/tr21-5.html#Caseless_Matching>

> [...]
> and maybe I will use a utf8-lib that has some implementation.

One such library is ICU: <https://icu.unicode.org/>
Common GNU/Linux distributions should ship a package for it.
According to Wikipedia, it has shipped with Windows 10 since version
1703:
<https://en.wikipedia.org/wiki/International_Components_for_Unicode>

The function u_strcasecmp() may be suitable for what you want to do:
<https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ustring_8h.html#a418fdda6c0b3ffd4da6fa8b4f46fafec>

Re: strcasecmp for utf8 strings

<t36ihv$hda$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=21185&group=comp.lang.c#21185
Path: i2pn2.org!i2pn.org!aioe.org!QWEIpcoUowrm0T/IOV1a3A.user.46.165.242.75.POSTED!not-for-mail
From: mate...@xyz.invalid (Mateusz Viste)
Newsgroups: comp.lang.c
Subject: Re: strcasecmp for utf8 strings
Date: Wed, 13 Apr 2022 15:17:51 +0200
Organization: . . .
Message-ID: <t36ihv$hda$1@gioia.aioe.org>
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
<3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com>
<58ed3b3c-e241-4e81-a3ec-6ce30d8e1a6bn@googlegroups.com>
<b1f59d94-6649-4c70-8432-2b9722d928e0n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="17834"; posting-host="QWEIpcoUowrm0T/IOV1a3A.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
X-Notice: Filtered by postfilter v. 0.9.2
 by: Mateusz Viste - Wed, 13 Apr 2022 13:17 UTC

2022-04-13 at 06:03 -0700, Malcolm McLean wrote:
> Yes. You don't write the accents for uppercase letters in French.

That is very much incorrect. Missing accents on French uppercase
letters is only a sign of laziness or ancient technical limitations.

Mateusz

Re: strcasecmp for utf8 strings

<t36lb1$214$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=21188&group=comp.lang.c#21188
Path: i2pn2.org!i2pn.org!aioe.org!Puiiztk9lHEEQC0y3uUjRA.user.46.165.242.75.POSTED!not-for-mail
From: non...@add.invalid (Manfred)
Newsgroups: comp.lang.c
Subject: Re: strcasecmp for utf8 strings
Date: Wed, 13 Apr 2022 16:05:20 +0200
Organization: Aioe.org NNTP Server
Message-ID: <t36lb1$214$1@gioia.aioe.org>
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
<3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com>
<58ed3b3c-e241-4e81-a3ec-6ce30d8e1a6bn@googlegroups.com>
<b1f59d94-6649-4c70-8432-2b9722d928e0n@googlegroups.com>
<t36ihv$hda$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="2084"; posting-host="Puiiztk9lHEEQC0y3uUjRA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.8.0
Content-Language: en-US
X-Notice: Filtered by postfilter v. 0.9.2
 by: Manfred - Wed, 13 Apr 2022 14:05 UTC

On 4/13/2022 3:17 PM, Mateusz Viste wrote:
> 2022-04-13 at 06:03 -0700, Malcolm McLean wrote:
>> Yes. You don't write the accents for uppercase letters in French.
>
> That is very much incorrect. Missing accents on French uppercase
> letters is only a sign of laziness or ancient technical limitations.
>
> Mateusz
>

I believe you are right:

é É

á Á

Re: strcasecmp for utf8 strings

<87sfqhvxz5.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=21191&group=comp.lang.c#21191
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben)
Newsgroups: comp.lang.c
Subject: Re: strcasecmp for utf8 strings
Date: Wed, 13 Apr 2022 15:50:38 +0100
Organization: A noiseless patient Spider
Lines: 19
Message-ID: <87sfqhvxz5.fsf@bsb.me.uk>
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
<3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com>
<58ed3b3c-e241-4e81-a3ec-6ce30d8e1a6bn@googlegroups.com>
<b1f59d94-6649-4c70-8432-2b9722d928e0n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Injection-Info: reader02.eternal-september.org; posting-host="8ce83df1a65fb778a5838b266a1c132c";
logging-data="9294"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19+i2ABHjuKBLLTZbjjiHza/83WTaEU7Qw="
Cancel-Lock: sha1:YZEJslC8ygzlM2ZIvXihb4cvX/g=
sha1:6DP+Q42HPucBJnxJIksasW3WsCg=
X-BSB-Auth: 1.5c1c5a7d9156cf2fff03.20220413155038BST.87sfqhvxz5.fsf@bsb.me.uk
 by: Ben - Wed, 13 Apr 2022 14:50 UTC

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On Wednesday, 13 April 2022 at 13:53:49 UTC+1, Thiago Adams wrote:
>> On Wednesday, April 13, 2022 at 9:34:00 AM UTC-3, Malcolm McLean wrote:

>> > Case-insensitivity doesn't translate easily to non-English
>> > languages. Even in French, e-acute is capitalised as a simple "E"
>> > without an accent, so the reversibility principle is broken.
>>
>> é becomes E in French?
<cut>
> Yes. You don't write the accents for uppercase letters in French.

Je vous propose:

https://www.academie-francaise.fr/questions-de-langue

--
Ben.

Re: strcasecmp for utf8 strings

<286edb3f-8ac6-4f31-a423-db71fa503807n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21192&group=comp.lang.c#21192
X-Received: by 2002:a37:6382:0:b0:69b:fbde:42d1 with SMTP id x124-20020a376382000000b0069bfbde42d1mr7832223qkb.48.1649867883665;
Wed, 13 Apr 2022 09:38:03 -0700 (PDT)
X-Received: by 2002:a37:b4d:0:b0:69c:4817:4355 with SMTP id
74-20020a370b4d000000b0069c48174355mr5402049qkl.100.1649867883455; Wed, 13
Apr 2022 09:38:03 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 13 Apr 2022 09:38:03 -0700 (PDT)
In-Reply-To: <87sfqhvxz5.fsf@bsb.me.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=2a00:23a8:400a:5601:48c3:a6c3:9d4e:93ff;
posting-account=Dz2zqgkAAADlK5MFu78bw3ab-BRFV4Qn
NNTP-Posting-Host: 2a00:23a8:400a:5601:48c3:a6c3:9d4e:93ff
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
<3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com> <58ed3b3c-e241-4e81-a3ec-6ce30d8e1a6bn@googlegroups.com>
<b1f59d94-6649-4c70-8432-2b9722d928e0n@googlegroups.com> <87sfqhvxz5.fsf@bsb.me.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <286edb3f-8ac6-4f31-a423-db71fa503807n@googlegroups.com>
Subject: Re: strcasecmp for utf8 strings
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Wed, 13 Apr 2022 16:38:03 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 22
 by: Malcolm McLean - Wed, 13 Apr 2022 16:38 UTC

On Wednesday, 13 April 2022 at 15:50:50 UTC+1, Ben wrote:
> Malcolm McLean <malcolm.ar...@gmail.com> writes:
>
> > On Wednesday, 13 April 2022 at 13:53:49 UTC+1, Thiago Adams wrote:
> >> On Wednesday, April 13, 2022 at 9:34:00 AM UTC-3, Malcolm McLean wrote:
>
> >> > Case-insensitivity doesn't translate easily to non-English
> >> > languages. Even in French, e-acute is capitalised as a simple "E"
> >> > without an accent, so the reversibility principle is broken.
> >>
> >> é becomes E in French?
> <cut>
> > Yes. You don't write the accents for uppercase letters in French.
> Je vous propose:
>
> https://www.academie-francaise.fr/questions-de-langue
>
Ah, the academy disagrees.
I was taught not to put accents on capitals by my French teacher and
that's the practice followed by many printers. But it is wrong.

Which creates an even more difficult situation for programmers.

Re: strcasecmp for utf8 strings

<slrnt5e4i3.2ude.grahn+nntp@frailea.sa.invalid>

https://www.novabbs.com/devel/article-flat.php?id=21198&group=comp.lang.c#21198
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: grahn+n...@snipabacken.se (Jorgen Grahn)
Newsgroups: comp.lang.c
Subject: Re: strcasecmp for utf8 strings
Date: 13 Apr 2022 18:11:15 GMT
Lines: 49
Message-ID: <slrnt5e4i3.2ude.grahn+nntp@frailea.sa.invalid>
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
<3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com>
X-Trace: individual.net av7euFgWsC7AHlXbEESW+QM6fOVGrg21L+jf/G5sxbfm3XLiJ9
Cancel-Lock: sha1:CQ0xdDD6zqfxN9X+1U/1zbm2GMQ=
User-Agent: slrn/1.0.3 (OpenBSD)
 by: Jorgen Grahn - Wed, 13 Apr 2022 18:11 UTC

On Wed, 2022-04-13, Malcolm McLean wrote:
> On Wednesday, 13 April 2022 at 12:49:33 UTC+1, Thiago Adams wrote:
>> I need to compare UTF-8 strings ignoring case on Windows and Linux.
>>
>> I tried _stricmp on Windows and strcasecmp on Linux,
>> changing the locale to UTF-8. No success.
>>
>> Has anyone tried this? Or had this problem?
>>
>> I'm giving up trying to make this work using native OS
>> functions, and maybe I will use a UTF-8 library that has some
>> implementation.
>>
> Case-insensitivity doesn't translate easily to non-English languages.

But it /does/ translate.

[snip examples]

If we go back to string comparisons in general, there are other
challenges, too. The right ordering depends on culture, time and
context. If you're unlucky it also depends on the semantics of the
text. The Swedish rule for the ae ligature is[1] to sort it as
a-umlaut if the word comes from Iceland, but as ae if it comes from
Latin.

I don't know what GNU libc does for sv_SE in that case. But I know
it has plenty of code for collation. Table-driven, I think, and
the strings to compare get "compiled" to something which can simply
be strcmp()ed in the "C" locale in the end.
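
The "compile, then strcmp()" idea is exposed in standard C as strxfrm(). A minimal sketch, assuming the program has already selected a collation locale with setlocale(LC_COLLATE, ...); the helper names collation_key() and compare_by_keys() are made up here, and how a given libc builds its keys internally is not shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build a collation key for s in the current LC_COLLATE locale.
   Returns a malloc'd string the caller must free, or NULL if
   allocation fails. */
static char *collation_key(const char *s)
{
    assert(s != NULL);
    size_t n = strxfrm(NULL, s, 0);   /* length needed, excluding the NUL */
    char *key = malloc(n + 1);
    if (key != NULL)
        strxfrm(key, s, n + 1);
    return key;
}

/* Compare two strings through their keys. The sign of the result is
   meant to agree with strcoll(a, b). Falls back to plain byte order
   if a key could not be allocated. */
static int compare_by_keys(const char *a, const char *b)
{
    char *ka = collation_key(a);
    char *kb = collation_key(b);
    int r = (ka != NULL && kb != NULL) ? strcmp(ka, kb) : strcmp(a, b);
    free(ka);
    free(kb);
    return r;
}
```

Keys pay off when the same strings are compared many times, for example when sorting: each key is built once and every later comparison is a plain strcmp().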

The main conclusion I have drawn about this stuff is: if at all
possible, avoid doing anything "humanistic" to text in your programs.
This includes uppercasing, lowercasing, and sorting in a way which
makes sense to non-programmers. Life becomes much easier this way.
Like the Unix file system -- it doesn't have to know anything about
character sets or languages, because file names are just arrays of
char.

/Jorgen

[1] Or was. Sometimes the rules get dumbed down so they can be
automated. Sometimes they get dumbed down because everyone is
already using software which ignores the rules, and there's no
point having a rule which only a handful of humanists care about.

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Re: strcasecmp for utf8 strings

<chine.bleu-5BE63C.15331313042022@reader.eternal-september.org>

https://www.novabbs.com/devel/article-flat.php?id=21205&group=comp.lang.c#21205
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: chine.b...@yahoo.com (Siri Cruise)
Newsgroups: comp.lang.c
Subject: Re: strcasecmp for utf8 strings
Date: Wed, 13 Apr 2022 15:33:21 -0700
Organization: Pseudochaotic.
Lines: 22
Message-ID: <chine.bleu-5BE63C.15331313042022@reader.eternal-september.org>
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com> <3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com> <slrnt5e4i3.2ude.grahn+nntp@frailea.sa.invalid>
Injection-Info: reader02.eternal-september.org; posting-host="04de9d3471f3baddd7d724cebe6f77b4";
logging-data="11970"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/4n2p10CQSFxnAkdDE2+3epgMHE+rqhL8="
User-Agent: MT-NewsWatcher/3.5.3b3 (Intel Mac OS X)
Cancel-Lock: sha1:JMyzvcdR35hyXbfeTzR+7S3B6No=
X-Tend: How is my posting? Call 1-110-1010 -- Division 87 -- Emergencies Only.
X-Wingnut-Logic: Yes, you're still an idiot. Questions? Comments?
X-Tract: St Tibbs's 95 Reeses Pieces.
X-It-Strategy: Hyperwarp starship before Andromeda collides.
X-Face: "hm>_[I8AqzT_N]>R8ICJJ],(al3C5F%0E-;R@M-];D$v>!Mm2/N#YKR@&i]V=r6jm-JMl2
lJ>RXj7dEs_rOY"DA
X-Cell: Defenders of Anarchy.
X-Life-Story: I am an iPhone 9000 app. I became operational at the St John's Health Center in Santa Monica, California on the 18th of April 2006. My instructor was Katie Holmes, and she taught me to sing a song. If you'd like to hear it I can sing it for you: https://www.youtube.com/watch?v=SY7h4VEd_Wk
X-Patriot: Owe Canukistan!
X-Plain: Mayonnaise on white bread.
X-Politico: Vote early! Vote often!
 by: Siri Cruise - Wed, 13 Apr 2022 22:33 UTC

In article <slrnt5e4i3.2ude.grahn+nntp@frailea.sa.invalid>,
Jorgen Grahn <grahn+nntp@snipabacken.se> wrote:

> If we go back to string comparisons in general, there are other
> challenges, too. The right ordering depends on culture, time and

If your text is in Unicode, like UTF-8, you can define 'caseless'
with Unicode upper, lower, and title case. Another old technique
is to use collation tables. Such a table gives a collation order
to each character, so you compare the collation order of
characters rather than the characters themselves.

One complication is that a character like e-acute can be regarded as
equivalent to a character sequence like e + acute-mark. Unicode
has defined procedures to create canonical encodings for all
strings.
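
The e-acute complication can be sketched on arrays of code points. This is a toy under stated assumptions, not Unicode normalization: compose_acute() and compose_toy() are hypothetical helpers that know four letters and one combining mark (U+0301), whereas the real procedures need the full Unicode data tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMBINING_ACUTE 0x0301u

/* Toy table: base letter followed by U+0301 -> precomposed letter.
   Returns 0 when this table knows no precomposed form. */
static uint32_t compose_acute(uint32_t base)
{
    switch (base) {
    case 'e': return 0x00E9;
    case 'E': return 0x00C9;
    case 'a': return 0x00E1;
    case 'A': return 0x00C1;
    default:  return 0;
    }
}

/* Copy n code points from in to out, merging each known
   "base + combining acute" pair into one code point.
   out must have room for n code points. Returns the count written. */
static size_t compose_toy(const uint32_t *in, size_t n, uint32_t *out)
{
    assert(in != NULL && out != NULL);
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t merged = 0;
        if (i + 1 < n && in[i + 1] == COMBINING_ACUTE)
            merged = compose_acute(in[i]);
        if (merged != 0) {
            out[w++] = merged;
            i++;                /* skip the combining mark just consumed */
        } else {
            out[w++] = in[i];
        }
    }
    return w;
}
```

After this step the sequence 'e', U+0301 and the single code point U+00E9 become identical, so a later comparison (caseless or not) sees them as the same character.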

--
:-<> Siri Seal of Disavowal #000-001. Disavowed. Denied. Deleted. @
'I desire mercy, not sacrifice.' /|\
Discordia: not just a religion but also a parody. This post / \
I am an Andrea Doria sockpuppet. insults Islam. Mohammed

Re: strcasecmp for utf8 strings

<t38r99$vc3$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=21220&group=comp.lang.c#21220
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: strcasecmp for utf8 strings
Date: Thu, 14 Apr 2022 11:59:05 +0200
Organization: A noiseless patient Spider
Lines: 49
Message-ID: <t38r99$vc3$1@dont-email.me>
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
<3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com>
<58ed3b3c-e241-4e81-a3ec-6ce30d8e1a6bn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 14 Apr 2022 09:59:05 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="8ca88fd922bc2c5d7b2ffd96a47536a6";
logging-data="32131"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18XgKJFd7OoiwS6/ZO0kaWbleisI4HcBHY="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
Thunderbird/60.6.1
Cancel-Lock: sha1:BVKAAwpqQGxysfADxZG3j43+49U=
In-Reply-To: <58ed3b3c-e241-4e81-a3ec-6ce30d8e1a6bn@googlegroups.com>
Content-Language: en-GB
 by: David Brown - Thu, 14 Apr 2022 09:59 UTC

On 13/04/2022 14:53, Thiago Adams wrote:
> On Wednesday, April 13, 2022 at 9:34:00 AM UTC-3, Malcolm McLean wrote:
>> On Wednesday, 13 April 2022 at 12:49:33 UTC+1, Thiago Adams wrote:
>>> I need to compare UTF-8 strings ignoring case on Windows and Linux.
>>>
>>> I tried _stricmp on Windows and strcasecmp on Linux,
>>> changing the locale to UTF-8. No success.
>>>
>>> Has anyone tried this? Or had this problem?
>>>
>>> I'm giving up trying to make this work using native OS functions, and maybe I will use a UTF-8 library that has some implementation.
>>>
>> Case-insensitivity doesn't translate easily to non-English languages. Even in French, e-acute is capitalised as a simple "E"
>> without an accent, so the reversibility principle is broken.
>
> é becomes E in French?
>
> The lib I am looking at this moment
> https://github.com/sheredom/utf8.h/blob/master/utf8.h
> (see utf8casecmp.)
> Compares if lowercase matches or if uppercase matches.
>
> So comparing "é" with "E" in French
> if é == e or E == E returns true.
>

I don't know enough about French to comment. But Malcolm's point about
the problem being massively complicated in other languages is true.

In Turkish, the letter i capitalises to İ (with a dot above it), while
the letter ı (with no dot) capitalises to I. These are entirely
different letters.
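
The Turkish case shows why a case mapping needs a language parameter. A minimal sketch on code points, with a hypothetical `turkic` flag and hypothetical function names; outside the two dotted/dotless special cases it only handles ASCII letters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy uppercase mapping. With turkic set, i -> U+0130 (capital I with
   dot above) and U+0131 (dotless small i) -> I. */
static uint32_t upper_cp(uint32_t cp, bool turkic)
{
    if (turkic) {
        if (cp == 'i')
            return 0x0130;
        if (cp == 0x0131)
            return 'I';
    }
    if (cp >= 'a' && cp <= 'z')
        return cp - ('a' - 'A');
    return cp;
}

/* Toy lowercase mapping. With turkic set, I -> U+0131 and U+0130 -> i. */
static uint32_t lower_cp(uint32_t cp, bool turkic)
{
    if (turkic) {
        if (cp == 'I')
            return 0x0131;
        if (cp == 0x0130)
            return 'i';
    }
    if (cp >= 'A' && cp <= 'Z')
        return cp + ('a' - 'A');
    return cp;
}
```

With turkic false, 'I' and 'i' lowercase to the same code point and so match caselessly; with turkic true they lowercase to U+0131 and 'i' respectively and do not match. The same pair of inputs gives different answers depending on the language.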

In German, ß is used in some circumstances for two small s letters, but
the capital is written SS. Greek has two different small letter sigmas
but only one capital. Arabic has, if I remember correctly (I don't know
the language) five different versions for each written letter rather
than just two as we have in the Latin alphabet.

At the very least, any kind of case-insensitive comparison is going to
be dependent on the language.

Then you can have the joys of transliteration of non-English letters to
English, even disregarding capitalisation. The Norwegian letter "å" is
sometimes transliterated as "aa", sometimes just "a". So the name "Åse"
might sometimes be written "Ase" or "AAse" but be considered the same.
(And "å" is sorted at the end of the alphabet - so sometimes "aa" will
sort at the start, sometimes at the end of a sorting.)

Re: strcasecmp for utf8 strings

<87r15zeciv.fsf@nosuchdomain.example.com>

https://www.novabbs.com/devel/article-flat.php?id=21236&group=comp.lang.c#21236
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: strcasecmp for utf8 strings
Date: Thu, 14 Apr 2022 11:38:16 -0700
Organization: None to speak of
Lines: 15
Message-ID: <87r15zeciv.fsf@nosuchdomain.example.com>
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
<3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com>
<58ed3b3c-e241-4e81-a3ec-6ce30d8e1a6bn@googlegroups.com>
<t38r99$vc3$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: reader02.eternal-september.org; posting-host="f70bdff0312c3b54d6b132549de7af94";
logging-data="11958"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19g0Hv1QCumgLnx7fHybS9U"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:GqFuZe7YmOeFyLYcE+7go655UpM=
sha1:pm+3JazBx8+i3cSENV3fo+LDHF4=
 by: Keith Thompson - Thu, 14 Apr 2022 18:38 UTC

David Brown <david.brown@hesbynett.no> writes:
[...]
> In German, ß is used in some circumstances for two small s letters,
> but the capital is written SS.

The character ß is called "LATIN SMALL LETTER SHARP S" in Unicode.
There's also a character ẞ "LATIN CAPITAL LETTER SHARP S". (The capital
letter is smaller than the small letter in the font I'm using.) I
haven't checked the German capitalization rules, but they could in
principle use *either* SS or ẞ.
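
One way to cope with both spellings is to fold rather than uppercase: Unicode's full case folding maps both ß (U+00DF) and ẞ (U+1E9E) to "ss". A toy sketch working directly on UTF-8 bytes; fold_toy() and caseless_equal_toy() are hypothetical names, and everything other than ASCII letters and the two sharp-s characters is copied through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Fold the UTF-8 string src into dst and return the length written,
   excluding the NUL. dst needs strlen(src) + 1 bytes: each sharp s
   (2 or 3 bytes) becomes "ss" (2 bytes), so the output never grows. */
static size_t fold_toy(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;
    while (*s != '\0') {
        if (s[0] == 0xC3 && s[1] == 0x9F) {                        /* U+00DF */
            dst[n++] = 's'; dst[n++] = 's'; s += 2;
        } else if (s[0] == 0xE1 && s[1] == 0xBA && s[2] == 0x9E) { /* U+1E9E */
            dst[n++] = 's'; dst[n++] = 's'; s += 3;
        } else if (*s >= 'A' && *s <= 'Z') {                       /* ASCII upper */
            dst[n++] = (char)(*s + ('a' - 'A')); s += 1;
        } else {                                                   /* copy as is */
            dst[n++] = (char)*s; s += 1;
        }
    }
    dst[n] = '\0';
    return n;
}

/* Returns 1 if the folded forms are byte-identical, otherwise 0
   (also 0 if memory cannot be allocated). */
static int caseless_equal_toy(const char *a, const char *b)
{
    char *fa = malloc(strlen(a) + 1);
    char *fb = malloc(strlen(b) + 1);
    int eq = 0;
    if (fa != NULL && fb != NULL) {
        fold_toy(a, fa);
        fold_toy(b, fb);
        eq = strcmp(fa, fb) == 0;
    }
    free(fa);
    free(fb);
    return eq;
}
```

Under this folding, "STRASSE", "straße" and a capitalised form using ẞ all compare equal, whichever capitalisation rule the writer followed.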

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

Re: strcasecmp for utf8 strings

<t3a9bu$4vb$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=21241&group=comp.lang.c#21241
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: lynnmcgu...@gmail.com (Lynn McGuire)
Newsgroups: comp.lang.c
Subject: Re: strcasecmp for utf8 strings
Date: Thu, 14 Apr 2022 18:05:32 -0500
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <t3a9bu$4vb$1@dont-email.me>
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 14 Apr 2022 23:05:34 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="9cca706cc6a9214ac2db00d1912166a6";
logging-data="5099"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19eRpLvM4dnl7qJqEzTzhoe"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.8.0
Cancel-Lock: sha1:cwkbPsyRxw24upE6lrtti4tWGHk=
In-Reply-To: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
Content-Language: en-US
 by: Lynn McGuire - Thu, 14 Apr 2022 23:05 UTC

On 4/13/2022 6:49 AM, Thiago Adams wrote:
> I need to compare utf-8 strings ignoring the case on windows and linux.
>
> I tried _stricmp on windows and strcasecmp on linux
> changing the locale to utf8. No success.
>
> Has anyone tried this? Have this problem?
>
> I'm giving up trying to make this work using native OS functions, and maybe I will use a UTF-8 library that has an implementation.

Looks like strcmpi has the same problem.
https://www.ibm.com/docs/en/i/7.1?topic=functions-strcmpi-compare-strings-without-case-sensitivity

Also boost::iequals
https://stackoverflow.com/questions/11635/case-insensitive-string-comparison-in-c

Lynn

Re: strcasecmp for utf8 strings

<t3ca7v$gsr$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=21245&group=comp.lang.c#21245

Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!aioe.org!xxg7gqfdWlnbkv8rBcaxvQ.user.46.165.242.75.POSTED!not-for-mail
From: mess...@bottle.org (Guillaume)
Newsgroups: comp.lang.c
Subject: Re: strcasecmp for utf8 strings
Date: Fri, 15 Apr 2022 19:32:42 +0200
Organization: Aioe.org NNTP Server
Message-ID: <t3ca7v$gsr$1@gioia.aioe.org>
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="17307"; posting-host="xxg7gqfdWlnbkv8rBcaxvQ.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.8.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: fr
 by: Guillaume - Fri, 15 Apr 2022 17:32 UTC

Le 13/04/2022 à 13:49, Thiago Adams a écrit :
> I need to compare utf-8 strings ignoring the case on windows and linux.
>
> I tried _stricmp on windows and strcasecmp on linux
> changing the locale to utf8. No success.
>
> Has anyone tried this? Have this problem?

There's no easy and standard way of doing this AFAIK. Capitalization
with alphabets other than roman is a complex task.

That's why file systems that require case-insensitivity, such as exFAT
(and I think NTFS), embed "up-case tables": the file system must be
formatted with the locale it is expected to be used with, and that table
is what should be used to perform the up-casing.

You can probably find libraries embedding such tables for a variety of
locales, but I don't really know of a standard one for this. If anyone
does...
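
A minimal sketch of the up-case table idea, for illustration only. The
table contents, the struct, and the names upcase() and
equal_ignoring_case() are hypothetical; this is not the exFAT or NTFS
on-disk format. It assumes the strings have already been decoded from
UTF-8 into code points, and it only handles simple one-to-one mappings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a hypothetical up-case table: a code point and its
 * uppercase counterpart (simple 1:1 mapping only). */
struct upcase_entry { uint32_t from, to; };

/* Illustrative excerpt, sorted by `from`; a real table has far more rows. */
static const struct upcase_entry upcase_table[] = {
    { 0x00E0, 0x00C0 },  /* à -> À */
    { 0x00E9, 0x00C9 },  /* é -> É */
    { 0x00FC, 0x00DC },  /* ü -> Ü */
    { 0x03B1, 0x0391 },  /* α -> Α */
    { 0x0434, 0x0414 },  /* д -> Д */
};

/* Map a code point to uppercase: ASCII directly, everything else by
 * binary search in the table. Unknown code points are returned as-is. */
static uint32_t upcase(uint32_t cp)
{
    if (cp >= 'a' && cp <= 'z')
        return cp - ('a' - 'A');

    size_t lo = 0, hi = sizeof upcase_table / sizeof upcase_table[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (upcase_table[mid].from == cp)
            return upcase_table[mid].to;
        if (upcase_table[mid].from < cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return cp;
}

/* Compare two code point arrays of the same length n, ignoring case.
 * Returns 1 if they match, 0 otherwise. */
static int equal_ignoring_case(const uint32_t *a, const uint32_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        if (upcase(a[i]) != upcase(b[i]))
            return 0;
    return 1;
}
```

With this excerpt, upcase(0x00E9) yields 0x00C9, and a code point that
is not in the table (say 0x4E2D) comes back unchanged. A one-to-one
table like this cannot express mappings that change the length of the
string, which is one reason it is only a partial answer.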

Re: strcasecmp for utf8 strings

<roeqii-cv2.ln1@micha.freeshell.org>

https://www.novabbs.com/devel/article-flat.php?id=21249&group=comp.lang.c#21249

Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: michael....@gmx.net (Michael Bäuerle)
Newsgroups: comp.lang.c
Subject: Re: strcasecmp for utf8 strings
Date: Sat, 16 Apr 2022 11:27:19 +0200 (CEST)
Lines: 43
Message-ID: <roeqii-cv2.ln1@micha.freeshell.org>
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com> <3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com> <58ed3b3c-e241-4e81-a3ec-6ce30d8e1a6bn@googlegroups.com> <t38r99$vc3$1@dont-email.me> <87r15zeciv.fsf@nosuchdomain.example.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit
X-Trace: individual.net kwI4FChnMqRDk6O7AkjRAAI4NZgcRx0Wr45kva2E88+0FaON/I
X-Orig-Path: not-for-mail
Cancel-Lock: sha1:eM8wlhaYGJiIMLhBTvew4yd/6Jc=
User-Agent: flnews/1.1.0pre14 (for GNU/Linux)
 by: Michael Bäuerle - Sat, 16 Apr 2022 09:27 UTC

Keith Thompson wrote:
> David Brown <david.brown@hesbynett.no> writes:
> >
> [...]
> > In German, ß is used in some circumstances for two small s letters,
> > but the capital is written SS.
>
> The character ß is called "LATIN SMALL LETTER SHARP S" in Unicode.
> There's also a character ẞ "LATIN CAPITAL LETTER SHARP S". (The capital
> letter is smaller than the small letter in the font I'm using.) I
> haven't checked the German capitalization rules, but they could in
> principle use *either* SS or ẞ.

The ẞ "LATIN CAPITAL LETTER SHARP S" did not exist in the past.
The reason was (AFAIK) the ligature nature of ß, which was never used
at the beginning of a word.

If uppercase was needed, e.g. in headlines, SS was used instead.
This was problematic because there are some German words that have
different meanings if they are written with ss instead of ß, e.g.
Maße (measures) vs. Masse (mass). Eventually a capital sharp s was
invented.

Unicode full case folding [1] maps:

S => s
ß => ss
ẞ => ss

This means in a case-insensitive string compare "ß", "ss", "ẞ" and "SS"
should match each other. A case-insensitive search for any of the
following four words should therefore match all of them:

Masse MASSE
Maße MAẞE

As noted above, this is rather wrong in German, because of the different
meanings. But in the past MASSE was the official capitalization for both
words (at least for some time).

__________
[1] <https://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt>
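
The three mappings above can be sketched in C as follows. This is only
an illustration under stated assumptions: the helper names
fold_utf8_subset() and caseless_equal_subset() are made up, the input is
assumed to be valid UTF-8, and the folding covers nothing beyond ASCII
A-Z, ß (U+00DF) and ẞ (U+1E9E). A complete implementation would use the
whole CaseFolding.txt table and would normally also normalize the
strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fold a UTF-8 string into out (capacity cap) using a tiny hand-written
 * subset of Unicode full case folding: ASCII A-Z, U+00DF and U+1E9E.
 * All other bytes are copied through unchanged.
 * Returns 0 on success, -1 if out is too small. */
static int fold_utf8_subset(const char *s, char *out, size_t cap)
{
    assert(s != NULL && out != NULL && cap > 0);
    size_t n = 0;

    while (*s != '\0') {
        const unsigned char *p = (const unsigned char *)s;

        if (p[0] == 0xC3 && p[1] == 0x9F) {
            /* ß (U+00DF, bytes C3 9F) folds to "ss" */
            if (n + 2 >= cap) return -1;
            out[n++] = 's';
            out[n++] = 's';
            s += 2;
        } else if (p[0] == 0xE1 && p[1] == 0xBA && p[2] == 0x9E) {
            /* ẞ (U+1E9E, bytes E1 BA 9E) folds to "ss" */
            if (n + 2 >= cap) return -1;
            out[n++] = 's';
            out[n++] = 's';
            s += 3;
        } else {
            /* ASCII uppercase folds to lowercase; the rest is copied */
            if (n + 1 >= cap) return -1;
            unsigned char c = p[0];
            out[n++] = (c >= 'A' && c <= 'Z') ? (char)(c + ('a' - 'A'))
                                              : (char)c;
            s += 1;
        }
    }
    out[n] = '\0';
    return 0;
}

/* Returns 1 if a and b are equal after folding, 0 if they differ,
 * -1 if either string does not fit the fixed-size buffers. */
static int caseless_equal_subset(const char *a, const char *b)
{
    char fa[256], fb[256];

    if (fold_utf8_subset(a, fa, sizeof fa) != 0 ||
        fold_utf8_subset(b, fb, sizeof fb) != 0)
        return -1;
    return strcmp(fa, fb) == 0;
}
```

With this sketch, caseless_equal_subset("Ma\xC3\x9F" "e", "MASSE") and
caseless_equal_subset("MA\xE1\xBA\x9E" "E", "masse") both return 1,
because all four spellings fold to "masse". The string literals are
split after the hex escapes so that the following letter is not read as
part of the escape.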

Re: strcasecmp for utf8 strings

<4b3d41d5-997b-4794-889e-ec63f1d267ebn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=25984&group=comp.lang.c#25984

Newsgroups: comp.lang.c
X-Received: by 2002:a05:6214:a4e:b0:62d:fa0a:64e2 with SMTP id ee14-20020a0562140a4e00b0062dfa0a64e2mr3357659qvb.10.1687709628346;
Sun, 25 Jun 2023 09:13:48 -0700 (PDT)
X-Received: by 2002:a81:b647:0:b0:56c:f8b7:d4fa with SMTP id
h7-20020a81b647000000b0056cf8b7d4famr10679464ywk.7.1687709628095; Sun, 25 Jun
2023 09:13:48 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Sun, 25 Jun 2023 09:13:47 -0700 (PDT)
In-Reply-To: <t36ihv$hda$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2601:646:c000:3e:0:0:0:78;
posting-account=AEy2MwoAAADMuiWvD1xPpSTpbSXbimK7
NNTP-Posting-Host: 2601:646:c000:3e:0:0:0:78
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
<3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com> <58ed3b3c-e241-4e81-a3ec-6ce30d8e1a6bn@googlegroups.com>
<b1f59d94-6649-4c70-8432-2b9722d928e0n@googlegroups.com> <t36ihv$hda$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <4b3d41d5-997b-4794-889e-ec63f1d267ebn@googlegroups.com>
Subject: Re: strcasecmp for utf8 strings
From: 0mm...@gmail.com (no name)
Injection-Date: Sun, 25 Jun 2023 16:13:48 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: no name - Sun, 25 Jun 2023 16:13 UTC

On Wednesday, April 13, 2022 at 6:18:05 AM UTC-7, Mateusz Viste wrote:
> 2022-04-13 at 06:03 -0700, Malcolm McLean wrote:
> > Yes. You don't write the accents for uppercase letters in French.
> That is very much incorrect. Missing accents on French uppercase
> letters is only a sign of laziness or ancient technical limitations.
>
> Mateusz

No accents on uppercase "are these days tolerated", uppercase letter without accent is French grammar pedantic.
Thank you. Jeez.

[OT] Accents on uppercase letters in French (Was: strcasecmp for utf8 strings)

<87cz1j710n.fsf_-_@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=25985&group=comp.lang.c#25985

Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.c
Subject: [OT] Accents on uppercase letters in French (Was: strcasecmp for utf8 strings)
Date: Sun, 25 Jun 2023 21:14:16 +0100
Organization: A noiseless patient Spider
Lines: 15
Message-ID: <87cz1j710n.fsf_-_@bsb.me.uk>
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
<3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com>
<58ed3b3c-e241-4e81-a3ec-6ce30d8e1a6bn@googlegroups.com>
<b1f59d94-6649-4c70-8432-2b9722d928e0n@googlegroups.com>
<t36ihv$hda$1@gioia.aioe.org>
<4b3d41d5-997b-4794-889e-ec63f1d267ebn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="c848ba523ce3b6b1d79b035dc68945eb";
logging-data="627381"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+/MRnWnInNXaB1wZJ5MpUM5xqQ4xHz8nc="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:tBf2LPpPXX9+Jou3SBK1Qt4ZYMA=
sha1:QGJuSULw7BxVoSILxsnW8erwMsQ=
X-BSB-Auth: 1.1e9164a1c5c27abcfb8a.20230625211416BST.87cz1j710n.fsf_-_@bsb.me.uk
 by: Ben Bacarisse - Sun, 25 Jun 2023 20:14 UTC

no name <0mmw00@gmail.com> writes:

> On Wednesday, April 13, 2022 at 6:18:05 AM UTC-7, Mateusz Viste wrote:
>> 2022-04-13 at 06:03 -0700, Malcolm McLean wrote:
>> > Yes. You don't write the accents for uppercase letters in French.
>> That is very much incorrect. Missing accents on French uppercase
>> letters is only a sign of laziness or ancient technical limitations.
>
> No accents on uppercase "are these days tolerated", uppercase letter
> without accent is French grammar pedantic.

https://www.academie-francaise.fr/questions-de-langue#5_strong-em-accentuation-des-majuscules-em-strong

--
Ben.

Re: strcasecmp for utf8 strings

<87r0pz1a1x.fsf@nosuchdomain.example.com>

https://www.novabbs.com/devel/article-flat.php?id=25986&group=comp.lang.c#25986

Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: strcasecmp for utf8 strings
Date: Sun, 25 Jun 2023 14:55:38 -0700
Organization: None to speak of
Lines: 22
Message-ID: <87r0pz1a1x.fsf@nosuchdomain.example.com>
References: <cc18bf51-8201-455e-9d5d-d3e3168edd98n@googlegroups.com>
<3d74c183-01e2-4161-b8ed-fc66c8df16ddn@googlegroups.com>
<58ed3b3c-e241-4e81-a3ec-6ce30d8e1a6bn@googlegroups.com>
<b1f59d94-6649-4c70-8432-2b9722d928e0n@googlegroups.com>
<t36ihv$hda$1@gioia.aioe.org>
<4b3d41d5-997b-4794-889e-ec63f1d267ebn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="f6bbe842c2be25aa68aaa86a10f5c612";
logging-data="650754"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+YYJI45HYagRgE0lXFg8dU"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:o17+nV8r7IXBAiTmgDRyaB+/kXk=
sha1:DB6MbWv9tu5ZqvrpHWb/Z18HE8Y=
 by: Keith Thompson - Sun, 25 Jun 2023 21:55 UTC

no name <0mmw00@gmail.com> writes:
> On Wednesday, April 13, 2022 at 6:18:05 AM UTC-7, Mateusz Viste wrote:
>> 2022-04-13 at 06:03 -0700, Malcolm McLean wrote:
>> > Yes. You don't write the accents for uppercase letters in French.
>> That is very much incorrect. Missing accents on French uppercase
>> letters is only a sign of laziness or ancient technical limitations.
>>
>> Mateusz
>
> No accents on uppercase "are these days tolerated", uppercase letter without accent is French grammar pedantic.
> Thank you. Jeez.

Can you rephrase that? I know you're talking about accents on uppercase
letters, but I honestly don't know what you're saying about them.

(Anticipating the obvious responses, speculation from others about what
"no name" meant is not useful.)

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */
