
aus+uk / uk.comp.sys.mac / Apple and Unicode

Subject                       Author
* Apple and Unicode           TimS
+* Re: Apple and Unicode      Chris Ridd
|`* Re: Apple and Unicode     TimS
| `* Re: Apple and Unicode    Chris Ridd
|  `- Re: Apple and Unicode   TimS
`- Re: Apple and Unicode      Richard Tobin

Apple and Unicode

<j935p4F3jpeU1@mid.individual.net>

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!2.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: timstrea...@greenbee.net (TimS)
Newsgroups: uk.comp.sys.mac
Subject: Apple and Unicode
Date: 12 Mar 2022 09:00:20 GMT
Lines: 13
Message-ID: <j935p4F3jpeU1@mid.individual.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
X-Trace: individual.net Pg3Lb8jEw5ZVFPykLaf4NAzF8g0tZf/FQLRYBPF943840skPCI
Cancel-Lock: sha1:GUVLeypo4vIsbiBnbVLY6AJbTqY=
X-No-Archive: Yes
User-Agent: Usenapp/1.18/l for MacOS - Full License
 by: TimS - Sat, 12 Mar 2022 09:00 UTC

Anyone here know why Apple seems to use decomposed UTF8 when making characters
with accents, such as é ?

The letter é can be made either composed (U+00E9 or C3 A9 in UTF8) or
decomposed (U+0065 followed by U+0301, or 65 CC 81 in UTF8). As you see, the
composed é is shorter (two bytes instead of three). Both will render as é.
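
As a quick check of the byte sequences above, here is a minimal Python sketch using only the standard library. The helper name utf8_hex is made up for this illustration; unicodedata.normalize is the standard call for converting between the composed (NFC) and decomposed (NFD) forms.

```python
import unicodedata

composed = "\u00e9"      # é as one precomposed code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT


def utf8_hex(s: str) -> str:
    """Return the UTF-8 bytes of s as space-separated upper-case hex."""
    return " ".join(f"{b:02X}" for b in s.encode("utf-8"))


print(utf8_hex(composed))     # C3 A9
print(utf8_hex(decomposed))   # 65 CC 81

# The two strings are different sequences of code points ...
print(composed == decomposed)                                  # False
# ... but normalising either one to the other's form makes them equal.
print(unicodedata.normalize("NFC", decomposed) == composed)    # True
print(unicodedata.normalize("NFD", composed) == decomposed)    # True
```

So a program that compares or searches text usually wants to normalise both sides to the same form first, whichever form the input arrived in.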

Do some Asian languages require the notion of using decomposed fragments of
letters which are then composed to form a glyph (is that the word)?
Otherwise, why does Unicode have combining characters?

--
Tim

Re: Apple and Unicode

<t0hqjb$vkc$1@dont-email.me>

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: chrisr...@mac.com (Chris Ridd)
Newsgroups: uk.comp.sys.mac
Subject: Re: Apple and Unicode
Date: Sat, 12 Mar 2022 09:53:46 +0000
Organization: A noiseless patient Spider
Lines: 11
Message-ID: <t0hqjb$vkc$1@dont-email.me>
References: <j935p4F3jpeU1@mid.individual.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 12 Mar 2022 09:53:47 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="8240628fcb0e004b4642e8e01263be85";
logging-data="32396"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Ziph31wKQ/eZroWiisWAWLS2xQzVTWDs="
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0)
Gecko/20100101 Thunderbird/91.7.0
Cancel-Lock: sha1:WgE66iN5zVQQjswaSMXftLDfTMM=
In-Reply-To: <j935p4F3jpeU1@mid.individual.net>
 by: Chris Ridd - Sat, 12 Mar 2022 09:53 UTC

On 12/03/2022 09:00, TimS wrote:
> Anyone here know why Apple seems to use decomposed UTF8 when making characters
> with accents, such as é ?

I've no idea, but if you're using a dead key to produce your accents
then it might be "natural" for that to produce the initial non-spacing
diacritical. If you've got a physical French/German/etc keyboard and are
striking an accented/umlauted letter perhaps it does something different?

--
Chris

Re: Apple and Unicode

<j93cvlF4ts9U1@mid.individual.net>

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.szaf.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: timstrea...@greenbee.net (TimS)
Newsgroups: uk.comp.sys.mac
Subject: Re: Apple and Unicode
Date: 12 Mar 2022 11:03:17 GMT
Lines: 19
Message-ID: <j93cvlF4ts9U1@mid.individual.net>
References: <j935p4F3jpeU1@mid.individual.net> <t0hqjb$vkc$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
X-Trace: individual.net zdukn8GnjYIBf5OBqREDdw2jlaxyYfsjLlxuwAWHryKk2H0e9w
Cancel-Lock: sha1:csSD9xB33o1MQTy1Rlb8X8O0Yss=
X-No-Archive: Yes
User-Agent: Usenapp/1.18/l for MacOS - Full License
 by: TimS - Sat, 12 Mar 2022 11:03 UTC

On 12 Mar 2022 at 09:53:46 GMT, Chris Ridd <chrisridd@mac.com> wrote:

> On 12/03/2022 09:00, TimS wrote:
>> Anyone here know why Apple seems to use decomposed UTF8 when making characters
>> with accents, such as é ?
>
> I've no idea, but if you're using a dead key to produce your accents
> then it might be "natural" for that to produce the initial non-spacing
> diacritical. If you've got a physical French/German/etc keyboard and are
> striking an accented/umlauted letter perhaps it does something different?

Mmm. I believe this is being tested by someone who has a Swiss-French
keyboard. If he reports back I'll let you know. But I'm sure I've read on one
of the 1.0E99 websites that claim to be omniscient in such matters, that
macOS filenames (i.e. what is actually in the directory) are always
decomposed. BICBW.
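
Whether or not that claim about filenames is right, code that compares names does not have to depend on it. A minimal Python sketch, assuming the names are already available as strings (for example from os.listdir); the helpers same_name and match_name are hypothetical names invented for this example:

```python
import unicodedata
from typing import Iterable, Optional


def same_name(a: str, b: str) -> bool:
    """Compare two names after bringing both to the same normal form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def match_name(names: Iterable[str], wanted: str) -> Optional[str]:
    """Return the first entry equal to `wanted` up to normalisation, else None."""
    for name in names:
        if same_name(name, wanted):
            return name
    return None


# A listing that happens to contain a decomposed name, searched for
# with a composed one (as it might be typed or pasted by a user).
listing = ["cafe\u0301.txt", "notes.txt"]
found = match_name(listing, "caf\u00e9.txt")
print(found is not None)   # True
```

The entry is returned exactly as it appeared in the listing, so it can be passed back to the filesystem unchanged.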

--
Tim

Re: Apple and Unicode

<t0i13u$2qc0$1@macpro.inf.ed.ac.uk>

Path: i2pn2.org!i2pn.org!news.nntp4.net!nntp.terraraq.uk!nntp-feed.chiark.greenend.org.uk!ewrotcd!feeds.news.ox.ac.uk!news.ox.ac.uk!usenet.inf.ed.ac.uk!.POSTED!not-for-mail
From: rich...@cogsci.ed.ac.uk (Richard Tobin)
Newsgroups: uk.comp.sys.mac
Subject: Re: Apple and Unicode
Date: Sat, 12 Mar 2022 11:45:02 +0000 (UTC)
Organization: Language Technology Group, University of Edinburgh
Lines: 27
Message-ID: <t0i13u$2qc0$1@macpro.inf.ed.ac.uk>
References: <j935p4F3jpeU1@mid.individual.net>
NNTP-Posting-Host: macaroni.inf.ed.ac.uk
X-Trace: macpro.inf.ed.ac.uk 1647085502 92544 129.215.197.42 (12 Mar 2022 11:45:02 GMT)
X-Complaints-To: usenet@macpro.inf.ed.ac.uk
NNTP-Posting-Date: Sat, 12 Mar 2022 11:45:02 +0000 (UTC)
X-Newsreader: trn 4.0-test76 (Apr 2, 2001)
Originator: richard@cogsci.ed.ac.uk (Richard Tobin)
 by: Richard Tobin - Sat, 12 Mar 2022 11:45 UTC

In article <j935p4F3jpeU1@mid.individual.net>,
TimS <timstreater@greenbee.net> wrote:

>Anyone here know why Apple seems to use decomposed UTF8 when making
>characters with accents

I don't know what Apple's policy is, but there's no good single
solution to this.

One problem is that there are a vast number of possible combinations
of letters and accents (more generally, diacritics), and only some of
them exist as precomposed characters in Unicode. This means that to
produce the composed normal form a program (or library) has to know
which ones exist as composed characters.
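
A small Python sketch of that point, using the standard unicodedata module (which carries the table of precomposed characters). The helper nfc_length is a made-up name for this example, and x with an acute accent is used as a combination that has no precomposed character:

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def nfc_length(base: str, mark: str) -> int:
    """Number of code points left after NFC-normalising base + mark."""
    return len(unicodedata.normalize("NFC", base + mark))


print(nfc_length("e", ACUTE))   # 1: composes to the single character U+00E9
print(nfc_length("x", ACUTE))   # 2: no precomposed x-acute, so the pair stays
```

So even "composed" normal form text can still contain combining characters, and software has to cope with both.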

On the other hand, there can be characters with more than one
diacritic. With decomposed characters, in which order should the
accents appear? Some pairs of diacritics have a visible order - in
some languages a character may have both a circumflex and an acute
accent, and it's necessary to specify which way round they go. In other
cases they don't, such as a circumflex and a cedilla which by their
nature go on the top and bottom (I doubt that particular example
really occurs). Unicode therefore defines combining classes and a
canonical ordering for combining characters to make it easier to
compare two strings.
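
To illustrate the combining classes and canonical ordering, a minimal Python sketch with the standard unicodedata module. The example strings are arbitrary, and the helper nfd is just a shorthand defined here:

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT, sits above the letter
ACUTE = "\u0301"       # COMBINING ACUTE ACCENT, also above
CEDILLA = "\u0327"     # COMBINING CEDILLA, attached below

# Each combining character has a numeric canonical combining class.
for mark in (CIRCUMFLEX, ACUTE, CEDILLA):
    print(unicodedata.name(mark), unicodedata.combining(mark))


def nfd(s: str) -> str:
    """Decompose s and put its combining marks into canonical order."""
    return unicodedata.normalize("NFD", s)


# Different classes (above vs. below): the typed order carries no meaning,
# so normalisation sorts the marks and the two spellings compare equal.
print(nfd("c" + CIRCUMFLEX + CEDILLA) == nfd("c" + CEDILLA + CIRCUMFLEX))  # True

# Same class (both above): the order is visible, so it is preserved
# and the two spellings stay different.
print(nfd("a" + CIRCUMFLEX + ACUTE) == nfd("a" + ACUTE + CIRCUMFLEX))      # False
```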

-- Richard

Re: Apple and Unicode

<t0if8s$qj8$1@dont-email.me>

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: chrisr...@mac.com (Chris Ridd)
Newsgroups: uk.comp.sys.mac
Subject: Re: Apple and Unicode
Date: Sat, 12 Mar 2022 15:46:36 +0000
Organization: A noiseless patient Spider
Lines: 22
Message-ID: <t0if8s$qj8$1@dont-email.me>
References: <j935p4F3jpeU1@mid.individual.net> <t0hqjb$vkc$1@dont-email.me>
<j93cvlF4ts9U1@mid.individual.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 12 Mar 2022 15:46:36 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="8240628fcb0e004b4642e8e01263be85";
logging-data="27240"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Mni1bmfVFYW3wRl/ehxbga/QTE6w7IWQ="
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0)
Gecko/20100101 Thunderbird/91.7.0
Cancel-Lock: sha1:Y63jmgc1B0wBvlBGGEzuHd0Sqtw=
In-Reply-To: <j93cvlF4ts9U1@mid.individual.net>
 by: Chris Ridd - Sat, 12 Mar 2022 15:46 UTC

On 12/03/2022 11:03, TimS wrote:
> On 12 Mar 2022 at 09:53:46 GMT, Chris Ridd <chrisridd@mac.com> wrote:
>
>> On 12/03/2022 09:00, TimS wrote:
>>> Anyone here know why Apple seems to use decomposed UTF8 when making characters
>>> with accents, such as é ?
>>
>> I've no idea, but if you're using a dead key to produce your accents
>> then it might be "natural" for that to produce the initial non-spacing
>> diacritical. If you've got a physical French/German/etc keyboard and are
>> striking an accented/umlauted letter perhaps it does something different?
>
> Mmm. I believe this is being tested by someone who has a Swiss-French
> keyboard. If he reports back I'll let you know. But I'm sure I've read on one
> of the 1.0E99 websites that claims to be omniscient in such matters, that
> macOS filenames (i.e. what is actually in the directory) are always
> decomposed. BICBW.

Is this causing you any problems?

--
Chris

Re: Apple and Unicode

<j9424mF8r5rU1@mid.individual.net>

Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: timstrea...@greenbee.net (TimS)
Newsgroups: uk.comp.sys.mac
Subject: Re: Apple and Unicode
Date: 12 Mar 2022 17:04:22 GMT
Lines: 30
Message-ID: <j9424mF8r5rU1@mid.individual.net>
References: <j935p4F3jpeU1@mid.individual.net> <t0hqjb$vkc$1@dont-email.me> <j93cvlF4ts9U1@mid.individual.net> <t0if8s$qj8$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
X-Trace: individual.net OFZ8R3X2kpMN8r9rNCc4tAt5xyD5zoY+fqC+X4BfBCKRRcdtas
Cancel-Lock: sha1:pwU3CSh/8PdhuLX6W98pBZLkvjs=
X-No-Archive: Yes
User-Agent: Usenapp/1.18/l for MacOS - Full License
 by: TimS - Sat, 12 Mar 2022 17:04 UTC

On 12 Mar 2022 at 15:46:36 GMT, Chris Ridd <chrisridd@mac.com> wrote:

> On 12/03/2022 11:03, TimS wrote:
>> On 12 Mar 2022 at 09:53:46 GMT, Chris Ridd <chrisridd@mac.com> wrote:
>>
>>> On 12/03/2022 09:00, TimS wrote:
>>>> Anyone here know why Apple seems to use decomposed UTF8 when making characters
>>>> with accents, such as é ?
>>>
>>> I've no idea, but if you're using a dead key to produce your accents
>>> then it might be "natural" for that to produce the initial non-spacing
>>> diacritical. If you've got a physical French/German/etc keyboard and are
>>> striking an accented/umlauted letter perhaps it does something different?
>>
>> Mmm. I believe this is being tested by someone who has a Swiss-French
>> keyboard. If he reports back I'll let you know. But I'm sure I've read on one
>> of the 1.0E99 websites that claims to be omniscient in such matters, that
>> macOS filenames (i.e. what is actually in the directory) are always
>> decomposed. BICBW.
>
> Is this causing you any problems?

Not me, no. There have been a couple of threads on the Xojo Forum where someone,
probably with little programming experience, is running into issues.

Odd how some folks post "It doesn't work" and then expect everyone to intuit
WTF they are talking about.

--
Tim

