Determine size demand of (Unicode-)characters on terminal from shell

<sqcdr2$k3p$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=4822&group=comp.unix.shell#4822
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Determine size demand of (Unicode-)characters on terminal from shell
Date: Mon, 27 Dec 2021 14:07:46 +0100
Organization: A noiseless patient Spider
Lines: 15
Message-ID: <sqcdr2$k3p$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 27 Dec 2021 13:07:46 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="5d001ba20e4d055c18e6e6107730c9f3";
logging-data="20601"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Su3l06YcsNC53K/9WYs5s"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:ABudOptczdFBnevj1Z78t4jBrpM=
X-Mozilla-News-Host: news://news.eternal-september.org:119
 by: Janis Papanagnou - Mon, 27 Dec 2021 13:07 UTC

I'm using ANSI escape codes ("\033[%d;%dH") to position Unicode
characters on a terminal window. The indices to provide for %d
are suited for (e.g.) the Latin character sets, but not for
character sets where characters require more than one unit for
the displayed glyph, e.g. like the Chinese characters. So with
a Latin character set I'd use indices 1, 2, 3, ... and for the
Asian sets I'd use 1, 3, 5, ... to position the characters at
the screen. My question:

Is the size that the character glyphs need for representation
on a terminal somehow retrievable, so that I get, say, for
Unicode character \U0041 a value of 1 and for \U30ee a value
of 2, so that I can automatize the displaying on a terminal?

Janis

Re: Determine size demand of (Unicode-)characters on terminal from shell

<61c9c228$0$23915$65785112@news.neostrada.pl>

https://www.novabbs.com/devel/article-flat.php?id=4824&group=comp.unix.shell#4824
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.de!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!newsfeed.neostrada.pl!unt-exc-02.news.neostrada.pl!unt-spo-a-02.news.neostrada.pl!news.neostrada.pl.POSTED!not-for-mail
Subject: Re: Determine size demand of (Unicode-)characters on terminal from
shell
Newsgroups: comp.unix.shell
References: <sqcdr2$k3p$1@dont-email.me>
From: marr...@address.invalid (marrgol)
Date: Mon, 27 Dec 2021 14:39:52 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.14.0
MIME-Version: 1.0
In-Reply-To: <sqcdr2$k3p$1@dont-email.me>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Lines: 21
Message-ID: <61c9c228$0$23915$65785112@news.neostrada.pl>
Organization: Telekomunikacja Polska
NNTP-Posting-Host: 176.111.237.144
X-Trace: 1640612392 unt-rea-a-01.news.neostrada.pl 23915 176.111.237.144:58216
X-Complaints-To: abuse@news.neostrada.pl
X-Received-Bytes: 1922
 by: marrgol - Mon, 27 Dec 2021 13:39 UTC

On 27/12/2021 at 14.07, Janis Papanagnou wrote:
> I'm using ANSI escape codes ("\033[%d;%dH") to position Unicode
> characters on a terminal window. The indices to provide for %d
> are suited for (e.g.) the Latin character sets, but not for
> character sets where characters require more than one unit for
> the displayed glyph, e.g. like the Chinese characters. So with
> a Latin character set I'd use indices 1, 2, 3, ... and for the
> Asian sets I'd use 1, 3, 5, ... to position the characters at
> the screen. My question:
>
> Is the size that the character glyphs need for representation
> on a terminal somehow retrievable, so that I get, say, for
> Unicode character \U0041 a value of 1 and for \U30ee a value
> of 2, so that I can automatize the displaying on a terminal?

Quick search reveals:
https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters

--
mrg

Re: Determine size demand of (Unicode-)characters on terminal from shell

<sqck6b$tqj$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=4825&group=comp.unix.shell#4825
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: Determine size demand of (Unicode-)characters on terminal from
shell
Date: Mon, 27 Dec 2021 15:56:11 +0100
Organization: A noiseless patient Spider
Lines: 30
Message-ID: <sqck6b$tqj$1@dont-email.me>
References: <sqcdr2$k3p$1@dont-email.me>
<61c9c228$0$23915$65785112@news.neostrada.pl>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 27 Dec 2021 14:56:11 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="5d001ba20e4d055c18e6e6107730c9f3";
logging-data="30547"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/bKQjQQN1Oswoo1VqmbFhL"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:oWXA/4559dX09mMWnl/s0Gw1lJE=
In-Reply-To: <61c9c228$0$23915$65785112@news.neostrada.pl>
 by: Janis Papanagnou - Mon, 27 Dec 2021 14:56 UTC

On 27.12.2021 14:39, marrgol wrote:
> On 27/12/2021 at 14.07, Janis Papanagnou wrote:
>> I'm using ANSI escape codes ("\033[%d;%dH") to position Unicode
>> characters on a terminal window. The indices to provide for %d
>> are suited for (e.g.) the Latin character sets, but not for
>> character sets where characters require more than one unit for
>> the displayed glyph, e.g. like the Chinese characters. So with
>> a Latin character set I'd use indices 1, 2, 3, ... and for the
>> Asian sets I'd use 1, 3, 5, ... to position the characters at
>> the screen. My question:
>>
>> Is the size that the character glyphs need for representation
>> on a terminal somehow retrievable, so that I get, say, for
>> Unicode character \U0041 a value of 1 and for \U30ee a value
>> of 2, so that I can automatize the displaying on a terminal?
>
> Quick search reveals:
> https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters

Interesting, Stephane asked that question. And wc -L seems to be
the solution; non-standard but at least works on my system. Thanks!

$ printf "\U30ee" | wc -L
2
$ printf "\U0041" | wc -L
1

Janis
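
A minimal sketch of how the `wc -L` result could drive the cursor positioning from the original question. It assumes a `wc` that supports the non-standard `-L` option and reports display columns in the current locale; the function names `display_width`, `put_at` and `place_row` are made up for illustration.

```shell
#!/bin/sh
# display_width TEXT: print the number of terminal columns TEXT needs,
# as reported by `wc -L` (non-standard option; width-aware wc assumed).
display_width() {
    # A trailing newline is added so the text always forms a complete line.
    # tr removes padding that some wc implementations print around the number.
    printf '%s\n' "$1" | wc -L | tr -d ' \t'
}

# put_at ROW COL TEXT: move the cursor with ESC[row;colH, then print TEXT.
put_at() {
    printf '\033[%d;%dH%s' "$1" "$2" "$3"
}

# place_row ROW GLYPH...: print the glyphs left to right on ROW,
# advancing the column by the width reported for each glyph.
place_row() {
    row=$1; shift
    col=1
    for glyph in "$@"; do
        put_at "$row" "$col" "$glyph"
        col=$((col + $(display_width "$glyph")))
    done
    printf '\n'
}
```

Called as `place_row 5 A B C`, this uses columns 1, 2, 3; with double-width glyphs and a width-aware `wc` it would use 1, 3, 5. If `wc -L` reports 0 for a glyph, the column does not advance and the next glyph overwrites it.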

Re: Determine size demand of (Unicode-)characters on terminal from shell

<87wnjp3ezv.fsf@nosuchdomain.example.com>

https://www.novabbs.com/devel/article-flat.php?id=4826&group=comp.unix.shell#4826
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.unix.shell
Subject: Re: Determine size demand of (Unicode-)characters on terminal from shell
Date: Mon, 27 Dec 2021 13:38:28 -0800
Organization: None to speak of
Lines: 39
Message-ID: <87wnjp3ezv.fsf@nosuchdomain.example.com>
References: <sqcdr2$k3p$1@dont-email.me>
<61c9c228$0$23915$65785112@news.neostrada.pl>
<sqck6b$tqj$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="35ea2551f1694fd90830c28389bd7752";
logging-data="31519"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18d3xttDLnsjEOW2oqVMs0v"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:uTzhxIKLZN+xzK+RYLYkq+BRJWg=
sha1:Z/t8AEN5EZ72eA5064aRnS/88LY=
 by: Keith Thompson - Mon, 27 Dec 2021 21:38 UTC

Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
> On 27.12.2021 14:39, marrgol wrote:
>> On 27/12/2021 at 14.07, Janis Papanagnou wrote:
>>> I'm using ANSI escape codes ("\033[%d;%dH") to position Unicode
>>> characters on a terminal window. The indices to provide for %d
>>> are suited for (e.g.) the Latin character sets, but not for
>>> character sets where characters require more than one unit for
>>> the displayed glyph, e.g. like the Chinese characters. So with
>>> a Latin character set I'd use indices 1, 2, 3, ... and for the
>>> Asian sets I'd use 1, 3, 5, ... to position the characters at
>>> the screen. My question:
>>>
>>> Is the size that the character glyphs need for representation
>>> on a terminal somehow retrievable, so that I get, say, for
>>> Unicode character \U0041 a value of 1 and for \U30ee a value
>>> of 2, so that I can automatize the displaying on a terminal?
>>
>> Quick search reveals:
>> https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters
>
> Interesting, Stephane asked that question. And wc -L seems to be
> the solution; non-standard but at least works on my system. Thanks!
>
> $ printf "\U30ee" | wc -L
> 2
> $ printf "\U0041" | wc -L
> 1

Internally, `wc -L` uses the POSIX `wcwidth()` function.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/wcwidth.html

I'm not 100% clear on how the number of column positions for a given
character is defined.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

Re: Determine size demand of (Unicode-)characters on terminal from shell

<sqdfm3$n1v$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=4827&group=comp.unix.shell#4827
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: Determine size demand of (Unicode-)characters on terminal from
shell
Date: Mon, 27 Dec 2021 23:45:22 +0100
Organization: A noiseless patient Spider
Lines: 46
Message-ID: <sqdfm3$n1v$1@dont-email.me>
References: <sqcdr2$k3p$1@dont-email.me>
<61c9c228$0$23915$65785112@news.neostrada.pl> <sqck6b$tqj$1@dont-email.me>
<87wnjp3ezv.fsf@nosuchdomain.example.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 27 Dec 2021 22:45:23 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="5d001ba20e4d055c18e6e6107730c9f3";
logging-data="23615"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18/DWRanY03O7XXC+aA8FsH"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:oaqNPmLqECperuJWKYdfI6+FKYU=
In-Reply-To: <87wnjp3ezv.fsf@nosuchdomain.example.com>
 by: Janis Papanagnou - Mon, 27 Dec 2021 22:45 UTC

On 27.12.2021 22:38, Keith Thompson wrote:
> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>> On 27.12.2021 14:39, marrgol wrote:
>>> On 27/12/2021 at 14.07, Janis Papanagnou wrote:
>>>> I'm using ANSI escape codes ("\033[%d;%dH") to position Unicode
>>>> characters on a terminal window. The indices to provide for %d
>>>> are suited for (e.g.) the Latin character sets, but not for
>>>> character sets where characters require more than one unit for
>>>> the displayed glyph, e.g. like the Chinese characters. So with
>>>> a Latin character set I'd use indices 1, 2, 3, ... and for the
>>>> Asian sets I'd use 1, 3, 5, ... to position the characters at
>>>> the screen. My question:
>>>>
>>>> Is the size that the character glyphs need for representation
>>>> on a terminal somehow retrievable, so that I get, say, for
>>>> Unicode character \U0041 a value of 1 and for \U30ee a value
>>>> of 2, so that I can automatize the displaying on a terminal?
>>>
>>> Quick search reveals:
>>> https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters
>>
>> Interesting, Stephane asked that question. And wc -L seems to be
>> the solution; non-standard but at least works on my system. Thanks!
>>
>> $ printf "\U30ee" | wc -L
>> 2
>> $ printf "\U0041" | wc -L
>> 1
>
> Internally, `wc -L` uses the POSIX `wcwidth()` function.

Yes, that function seems to be the standard base for a couple tools.
It's good to have access to that function on Linux in such a simple
way. (Not sure how reliable that is, though; see below.)

> https://pubs.opengroup.org/onlinepubs/9699919799/functions/wcwidth.html
>
> I'm not 100% clear on how the number of column positions for a given
> character is defined.

The issue seems to be quite a mess. In the SE thread Stephane gave a link
to an article on the Unicode topic that I found interesting and amusing:
https://eev.ee/blog/2015/09/12/dark-corners-of-unicode/#combining-characters-and-character-width

Janis

Re: Determine size demand of (Unicode-)characters on terminal from shell

<rafQvPvVXfdGowE8l@bongo-ra.co>

https://www.novabbs.com/devel/article-flat.php?id=4829&group=comp.unix.shell#4829
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: spi...@gmail.com (Spiros Bousbouras)
Newsgroups: comp.unix.shell
Subject: Re: Determine size demand of (Unicode-)characters on terminal from shell
Date: Tue, 28 Dec 2021 03:49:35 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 50
Message-ID: <rafQvPvVXfdGowE8l@bongo-ra.co>
References: <sqcdr2$k3p$1@dont-email.me> <61c9c228$0$23915$65785112@news.neostrada.pl> <sqck6b$tqj$1@dont-email.me>
<87wnjp3ezv.fsf@nosuchdomain.example.com> <sqdfm3$n1v$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 28 Dec 2021 03:49:35 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="f101e5c2821a2e879a29ad70a686ec98";
logging-data="4609"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19T/lYg6chrcY/VMQbgtSDL"
Cancel-Lock: sha1:+BKMwNbkJEaYdzKGA7nmbtwIduI=
In-Reply-To: <sqdfm3$n1v$1@dont-email.me>
X-Organisation: Weyland-Yutani
X-Server-Commands: nowebcancel
 by: Spiros Bousbouras - Tue, 28 Dec 2021 03:49 UTC

On Mon, 27 Dec 2021 23:45:22 +0100
Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
> On 27.12.2021 22:38, Keith Thompson wrote:
> > Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
> >> On 27.12.2021 14:39, marrgol wrote:
> >>> Quick search reveals:
> >>> https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters
> >>
> >> Interesting, Stephane asked that question. And wc -L seems to be
> >> the solution; non-standard but at least works on my system. Thanks!
> >>
> >> $ printf "\U30ee" | wc -L
> >> 2
> >> $ printf "\U0041" | wc -L
> >> 1
> >
> > Internally, `wc -L` uses the POSIX `wcwidth()` function.
>
> Yes, that function seems to be the standard base for a couple tools.
> It's good to have access to that function on Linux in such a simple
> way. (Not sure how reliable that is, though; see below.)
>
> > https://pubs.opengroup.org/onlinepubs/9699919799/functions/wcwidth.html
> >
> > I'm not 100% clear on how the number of column positions for a given
> > character is defined.

I was wondering about that myself. I'm sure the Unicode standard has something
to say about it but whether there are other factors, I don't know. Then this
made me wonder whether there are newsgroups discussing Unicode. I was only
able to find fr.comp.normes.unicode which doesn't have discussion.

> The issue seems to be quite a mess. In the SE thread Stephane gave a link
> to an article on the Unicode topic that I found interesting and amusing:
> https://eev.ee/blog/2015/09/12/dark-corners-of-unicode/#combining-characters-and-character-width

I had a look at that link myself. I know little about Unicode so I didn't
see anything to doubt that page. But exploring further on the site I found
https://eev.ee/blog/2012/04/09/php-a-fractal-of-bad-design :

In C, functions like strpos return -1 if the item isn't found. If you
don't check for that case and try to use that as an index, you'll hit
junk memory and your program will blow up. (Probably. It's C. Who the
fuck knows. I'm sure there are tools for this, at least.)
[...]
For those not down with the C: INT_MAX is the biggest integer that will
fit in a variable, ever.

This kind of sloppiness and his style of posting and the fact that comments
are only allowed through Disqus make me wary.

Re: Determine size demand of (Unicode-)characters on terminal from shell

<srr1im$tmi$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=4849&group=comp.unix.shell#4849
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: Determine size demand of (Unicode-)characters on terminal from
shell
Date: Fri, 14 Jan 2022 06:26:46 +0100
Organization: A noiseless patient Spider
Lines: 34
Message-ID: <srr1im$tmi$1@dont-email.me>
References: <sqcdr2$k3p$1@dont-email.me>
<61c9c228$0$23915$65785112@news.neostrada.pl> <sqck6b$tqj$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 14 Jan 2022 05:26:46 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="a870b32ab8ee0054ceb7dd078a69656e";
logging-data="30418"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19WoX6vza55NpCtRMmpxBOY"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:xqkH4I4wpIlAWPQvuBsWE609ciQ=
In-Reply-To: <sqck6b$tqj$1@dont-email.me>
 by: Janis Papanagnou - Fri, 14 Jan 2022 05:26 UTC

On 27.12.2021 15:56, Janis Papanagnou wrote:
> On 27.12.2021 14:39, marrgol wrote:
>> On 27/12/2021 at 14.07, Janis Papanagnou wrote:
>>> I'm using ANSI escape codes ("\033[%d;%dH") to position Unicode
>>> characters on a terminal window. The indices to provide for %d
>>> are suited for (e.g.) the Latin character sets, but not for
>>> character sets where characters require more than one unit for
>>> the displayed glyph, e.g. like the Chinese characters. So with
>>> a Latin character set I'd use indices 1, 2, 3, ... and for the
>>> Asian sets I'd use 1, 3, 5, ... to position the characters at
>>> the screen. My question:
>>>
>>> Is the size that the character glyphs need for representation
>>> on a terminal somehow retrievable, so that I get, say, for
>>> Unicode character \U0041 a value of 1 and for \U30ee a value
>>> of 2, so that I can automatize the displaying on a terminal?
>>
>> Quick search reveals:
>> https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters
>
> Interesting, Stephane asked that question. And wc -L seems to be
> the solution; non-standard but at least works on my system. Thanks!
>
> $ printf "\U30ee" | wc -L
> 2
> $ printf "\U0041" | wc -L
> 1

Just tried that for the Unicode-smileys starting in the Unicode tables
from position U+1F600 (128512), but for these symbols 'wc -L' returns
0, as if these symbols wouldn't require any space. - Too bad.

Janis

Re: Determine size demand of (Unicode-)characters on terminal from shell

<87lezi5ac4.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=4850&group=comp.unix.shell#4850
Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.unix.shell
Subject: Re: Determine size demand of (Unicode-)characters on terminal from shell
Date: Fri, 14 Jan 2022 20:30:35 +0000
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <87lezi5ac4.fsf@bsb.me.uk>
References: <sqcdr2$k3p$1@dont-email.me>
<61c9c228$0$23915$65785112@news.neostrada.pl>
<sqck6b$tqj$1@dont-email.me> <srr1im$tmi$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="206e5a0bdb8f1dd829c262f68c5b7238";
logging-data="15810"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/9Biv1WQs9iiEiuGwFGmir576iIJsrScM="
Cancel-Lock: sha1:ImqVqKzuBt9MDQ3at+WxCulGVoE=
sha1:339O4KvpEh79LyNUNmX43/eL784=
X-BSB-Auth: 1.b49b6590433e2e73ef79.20220114203035GMT.87lezi5ac4.fsf@bsb.me.uk
 by: Ben Bacarisse - Fri, 14 Jan 2022 20:30 UTC

Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

> On 27.12.2021 15:56, Janis Papanagnou wrote:
>> On 27.12.2021 14:39, marrgol wrote:
>>> On 27/12/2021 at 14.07, Janis Papanagnou wrote:
>>>> I'm using ANSI escape codes ("\033[%d;%dH") to position Unicode
>>>> characters on a terminal window. The indices to provide for %d
>>>> are suited for (e.g.) the Latin character sets, but not for
>>>> character sets where characters require more than one unit for
>>>> the displayed glyph, e.g. like the Chinese characters. So with
>>>> a Latin character set I'd use indices 1, 2, 3, ... and for the
>>>> Asian sets I'd use 1, 3, 5, ... to position the characters at
>>>> the screen. My question:
>>>>
>>>> Is the size that the character glyphs need for representation
>>>> on a terminal somehow retrievable, so that I get, say, for
>>>> Unicode character \U0041 a value of 1 and for \U30ee a value
>>>> of 2, so that I can automatize the displaying on a terminal?
>>>
>>> Quick search reveals:
>>> https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters
>>
>> Interesting, Stephane asked that question. And wc -L seems to be
>> the solution; non-standard but at least works on my system. Thanks!
>>
>> $ printf "\U30ee" | wc -L
>> 2
>> $ printf "\U0041" | wc -L
>> 1
>
> Just tried that for the Unicode-smileys starting in the Unicode tables
> from position U+1F600 (128512), but for these symbols 'wc -L' returns
> 0, as if these symbols wouldn't require any space. - Too bad.

$ printf "\U1f600" | wc -L
2

Maybe a locale setting?

--
Ben.

Re: Determine size demand of (Unicode-)characters on terminal from shell

<srt35d$e0k$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=4851&group=comp.unix.shell#4851
Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: janis_pa...@hotmail.com (Janis Papanagnou)
Newsgroups: comp.unix.shell
Subject: Re: Determine size demand of (Unicode-)characters on terminal from
shell
Date: Sat, 15 Jan 2022 01:06:05 +0100
Organization: A noiseless patient Spider
Lines: 23
Message-ID: <srt35d$e0k$1@dont-email.me>
References: <sqcdr2$k3p$1@dont-email.me>
<61c9c228$0$23915$65785112@news.neostrada.pl> <sqck6b$tqj$1@dont-email.me>
<srr1im$tmi$1@dont-email.me> <87lezi5ac4.fsf@bsb.me.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 15 Jan 2022 00:06:05 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="1042b20652bc95d9a1ea5fece84486f5";
logging-data="14356"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/omNi6eaBRr3S7fd3G7scX"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:Q0vX4ISHSt2ZDJNZkX8FltiWhBk=
In-Reply-To: <87lezi5ac4.fsf@bsb.me.uk>
X-Enigmail-Draft-Status: N1110
 by: Janis Papanagnou - Sat, 15 Jan 2022 00:06 UTC

On 14.01.2022 21:30, Ben Bacarisse wrote:
> Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
>>
>> Just tried that for the Unicode-smileys starting in the Unicode tables
>> from position U+1F600 (128512), but for these symbols 'wc -L' returns
>> 0, as if these symbols wouldn't require any space. - Too bad.
>
> $ printf "\U1f600" | wc -L
> 2

Hmm..

> Maybe a locale setting?

I tried a couple UTF-8 locales along with plain C locale; all return 0
in my environment.

Given your post I now tried it also on a machine with newer OS version.
And there it works as expected. - It seems that the locale definitions
in the system files of that older Linux version are broken?

Janis
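
One way to repeat the locale comparison described above, as a rough sketch. It assumes `locale -a` is available and that `wc` supports `-L`; the helper name `width_in_locale` is hypothetical. The emoji is written as octal escapes so that the snippet does not depend on `printf` understanding `\U`.

```shell
#!/bin/sh
# width_in_locale LOCALE: print what `wc -L` reports for U+1F600 under LOCALE.
# \360\237\230\200 is the UTF-8 encoding of U+1F600.
width_in_locale() {
    printf '\360\237\230\200\n' | LC_ALL=$1 wc -L | tr -d ' \t'
}

# Compare the C locale with the first few installed UTF-8 locales.
# If `locale -a` is missing, only the C locale is checked.
for loc in C $(locale -a 2>/dev/null | grep -i 'utf-\{0,1\}8' | head -n 5); do
    printf '%-20s %s\n' "$loc" "$(width_in_locale "$loc")"
done
```

If every UTF-8 locale prints 0 here while a newer system prints 2, that would point at the system's width data rather than at the choice of locale.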

Re: Determine size demand of (Unicode-)characters on terminal from shell

<eli$2201151651@qaz.wtf>

https://www.novabbs.com/devel/article-flat.php?id=4856&group=comp.unix.shell#4856
Path: i2pn2.org!i2pn.org!news.niel.me!tncsrv06.tnetconsulting.net!2.us.feeder.erje.net!feeder.erje.net!panix!.POSTED.panix5.panix.com!qz!not-for-mail
From: *...@eli.users.panix.com (Eli the Bearded)
Newsgroups: comp.unix.shell
Subject: Re: Determine size demand of (Unicode-)characters on terminal from shell
Date: Sat, 15 Jan 2022 22:01:10 -0000 (UTC)
Organization: Some absurd concept
Message-ID: <eli$2201151651@qaz.wtf>
References: <sqcdr2$k3p$1@dont-email.me> <sqck6b$tqj$1@dont-email.me> <87wnjp3ezv.fsf@nosuchdomain.example.com> <sqdfm3$n1v$1@dont-email.me>
Injection-Date: Sat, 15 Jan 2022 22:01:10 -0000 (UTC)
Injection-Info: reader1.panix.com; posting-host="panix5.panix.com:166.84.1.5";
logging-data="5366"; mail-complaints-to="abuse@panix.com"
User-Agent: Vectrex rn 2.1 (beta)
X-Liz: It's actually happened, the entire Internet is a massive game of Redcode
X-Motto: "Erosion of rights never seems to reverse itself." -- kenny@panix
X-US-Congress: Moronic Fucks.
X-Attribution: EtB
XFrom: is a real address
Encrypted: double rot-13
 by: Eli the Bearded - Sat, 15 Jan 2022 22:01 UTC

In comp.unix.shell, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
> On 27.12.2021 22:38, Keith Thompson wrote:
> > Internally, `wc -L` uses the POSIX `wcwidth()` function.
> Yes, that function seems to be the standard base for a couple tools.
> It's good to have access to that function on Linux in such a simple
> way. (Not sure how reliable that is, though; see below.)

I tried it out on NetBSD 9.2 today and found an interesting quirk. Since
it's not GNU, there is no --version, but I pulled this out of wc with `strings`:

GCC: (NetBSD nb4 20200810) 7.5.0
$NetBSD: crt0.S,v 1.4 2018/11/26 17:37:46 joerg Exp $
$NetBSD: crt0-common.c,v 1.23 2018/12/28 20:12:35 christos Exp $
$NetBSD: crti.S,v 1.1 2010/08/07 18:01:35 joerg Exp $
$NetBSD: crtbegin.S,v 1.2 2010/11/30 18:37:59 joerg Exp $
$NetBSD: wc.c,v 1.35 2011/09/16 15:39:30 joerg Exp $
$NetBSD: crtend.S,v 1.1 2010/08/07 18:01:34 joerg Exp $
$NetBSD: crtn.S,v 1.1 2010/08/07 18:01:35 joerg Exp $
@(#) Copyright (c) 1980, 1987, 1991, 1993 The Regents of the University of California. All rights reserved.

$ printf "\U30ee" | wc -L
0
$ printf "\U30ee\n" | wc -L
1
$

Compared with a GNU wc:

$ printf "\U30ee" | gwc -L
2
$ printf "\U30ee\n" | gwc -L
2
$

So not only does the NetBSD one not detect "wide" characters, its
line length also only counts complete lines.

Elijah
------
has been looking for a good way to find terminal display length
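
Given the differences above, a script could probe which `wc` to trust before relying on it. The following is only a sketch: `pick_wc` is a hypothetical helper, and the assumption that a GNU `wc` may be installed as `gwc` is taken from the transcript above. The probe appends a newline so that implementations counting only complete lines are not penalised for that alone.

```shell
#!/bin/sh
# pick_wc: print the name of the first candidate command whose -L option
# reports 2 columns for U+30EE; print nothing and return 1 if none does.
pick_wc() {
    for cmd in wc gwc; do
        command -v "$cmd" >/dev/null 2>&1 || continue
        # \343\203\256 is the UTF-8 encoding of U+30EE.
        w=$(printf '\343\203\256\n' | "$cmd" -L 2>/dev/null | tr -d ' \t')
        if [ "$w" = 2 ]; then
            printf '%s\n' "$cmd"
            return 0
        fi
    done
    return 1
}

if WC=$(pick_wc); then
    echo "width-aware: $WC"
else
    echo "no width-aware wc found in this locale" >&2
fi
```

The result depends on the current locale as well as on the `wc` implementation, so the probe should run in the same environment as the script that uses it.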
