Dealing with encodings

<20230224152409.0233668d@lud1.home>

https://www.novabbs.com/devel/article-flat.php?id=10796&group=comp.lang.tcl#10796

From: luc...@sep.invalid (Luc)
Newsgroups: comp.lang.tcl
Subject: Dealing with encodings
Date: Fri, 24 Feb 2023 15:24:09 -0300
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <20230224152409.0233668d@lud1.home>
 by: Luc - Fri, 24 Feb 2023 18:24 UTC

I have a basic text editor that, well, edits text files.

I've been using it for a long time without ever giving
a thought about encodings. I just open, edit and save.
I never knew or cared about what encodings were involved.

I want to change that.

I know how to tell Tcl to write with a certain encoding.
But I never implemented that and I've been thinking that
I should probably keep the existing encoding in most cases,
and for that I have to be able to tell what encoding is
there already.

I believe Tcl cannot do that. I've been researching and
it seems that we need external software to do that, namely
'file' and 'enca', neither of which is super reliable.

What experience do you have with that? Can you share any
suggestions or recommendations?

--
Luc

Re: Dealing with encodings

<ttc8oc$2gdkp$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=10798&group=comp.lang.tcl#10798

From: ric...@example.invalid (Rich)
Newsgroups: comp.lang.tcl
Subject: Re: Dealing with encodings
Date: Sat, 25 Feb 2023 06:10:20 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 29
Message-ID: <ttc8oc$2gdkp$1@dont-email.me>
References: <20230224152409.0233668d@lud1.home>
 by: Rich - Sat, 25 Feb 2023 06:10 UTC

Luc <luc@sep.invalid> wrote:
> I know how to tell Tcl to write with a certain encoding. But I never
> implemented that and I've been thinking that I should probably keep
> the existing encoding in most cases, and for that I have to be able
> to tell what encoding is there already.
>
> I believe Tcl cannot do that. I've been researching and it seems
> that we need external software to do that, namely 'file' and 'enca',
> neither of which is super reliable.

Actually, absent side-channel information, it is impossible to tell
with 100% certainty what 'encoding' a given file has been encoded with.

The best you can do is verify that a given file does not contain any
illegal sequences for the expected encoding. These kinds of
heuristics will get you 95% there, but it will always be possible for
something to slip through.
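
To make the "no illegal sequences" check concrete, here is a minimal sketch in Tcl. The proc names looksLikeUtf8 and fileLooksLikeUtf8 are made up for illustration, and the pattern is the commonly used byte-level description of well-formed UTF-8, not anything built into Tcl. A pass only means the bytes *could* be UTF-8 (plain ASCII passes too); it does not prove it.

```tcl
# Hypothetical helper: returns 1 if $bytes (a binary string, one character
# per byte) consists only of well-formed UTF-8 sequences, 0 otherwise.
proc looksLikeUtf8 {bytes} {
    return [regexp -expanded {\A(?:
          [\x00-\x7F]
        | [\xC2-\xDF][\x80-\xBF]
        | \xE0[\xA0-\xBF][\x80-\xBF]
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
        | \xED[\x80-\x9F][\x80-\xBF]
        | \xF0[\x90-\xBF][\x80-\xBF]{2}
        | [\xF1-\xF3][\x80-\xBF]{3}
        | \xF4[\x80-\x8F][\x80-\xBF]{2}
    )*\Z} $bytes]
}

# Hypothetical wrapper: read the raw bytes of a file and run the check.
proc fileLooksLikeUtf8 {path} {
    set ch [open $path rb]
    try {
        return [looksLikeUtf8 [read $ch]]
    } finally {
        close $ch
    }
}
```

A lone Latin-1 byte such as 0xE9 fails the check, which is the kind of file where a legacy encoding is the more likely guess.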

> What experience do you have with that? Can you share any suggestions
> or recommendations?

For reading, if you assume UTF-8, you'll be right more often than wrong
for anything modern. The older the "text file" you plan to edit, the
greater the probability that UTF-8 is the wrong choice. And there
will always end up being a few where you just have to make a guess and
see if it looks like it worked.
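
The "assume UTF-8, otherwise guess and look" approach might be sketched like this. readTextAs is a hypothetical helper name, and the choice of iso8859-1 as a second guess in the usage note is only an assumption. Depending on the Tcl version, bytes that do not fit the chosen encoding may raise an error or be silently substituted, so treat the result as something to eyeball, not as proof.

```tcl
# Hypothetical helper: decode the file at $path using the named encoding
# (UTF-8 unless told otherwise) and return the resulting text.
proc readTextAs {path {enc utf-8}} {
    # Reject names Tcl does not know, rather than failing later.
    if {$enc ni [encoding names]} {
        error "unknown encoding \"$enc\""
    }
    set ch [open $path r]
    try {
        fconfigure $ch -encoding $enc
        return [read $ch]
    } finally {
        close $ch
    }
}
```

An editor could call readTextAs $path first and, if the text looks wrong, offer a "reopen as..." menu filled from [lsort [encoding names]], for example retrying with readTextAs $path iso8859-1.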

For writing, just create everything as UTF-8 unless you have a *very*
good reason to do otherwise.
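
A minimal sketch of the writing side, assuming a hypothetical helper named writeText. It defaults to UTF-8 and only uses another encoding when the caller asks for one explicitly (for instance, to preserve the encoding a file was opened with).

```tcl
# Hypothetical helper: write $text to $path, encoded as UTF-8 unless the
# caller passes a different encoding name on purpose.
proc writeText {path text {enc utf-8}} {
    set ch [open $path w]
    try {
        fconfigure $ch -encoding $enc
        puts -nonewline $ch $text
    } finally {
        close $ch
    }
}
```

Setting the encoding explicitly matters because a channel otherwise uses the system encoding, which is not guaranteed to be UTF-8 on every machine.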

