Re: tail

From: ros...@gmail.com (Chris Angelico)
Newsgroups: comp.lang.python
Subject: Re: tail
Date: Sun, 8 May 2022 06:12:01 +1000
Message-ID: <mailman.343.1651954334.20749.python-list@python.org>
 by: Chris Angelico - Sat, 7 May 2022 20:12 UTC

On Sun, 8 May 2022 at 04:37, Marco Sulla <Marco.Sulla.Python@gmail.com> wrote:
>
> On Sat, 7 May 2022 at 19:02, MRAB <python@mrabarnett.plus.com> wrote:
> >
> > On 2022-05-07 17:28, Marco Sulla wrote:
> > > On Sat, 7 May 2022 at 16:08, Barry <barry@barrys-emacs.org> wrote:
> > >> You need to handle the file in bin mode and do the handling of line endings and encodings yourself. It’s not that hard for the cases you wanted.
> > >
> > > >>> "\n".encode("utf-16")
> > > b'\xff\xfe\n\x00'
> > > >>> "".encode("utf-16")
> > > b'\xff\xfe'
> > > >>> "a\nb".encode("utf-16")
> > > b'\xff\xfea\x00\n\x00b\x00'
> > > >>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
> > > b'\n\x00'
> > >
> > > Can I use the last trick to get the encoding of a LF or a CR in any encoding?
> >
> > In the case of UTF-16, it's 2 bytes per code unit, but those 2 bytes
> > could be little-endian or big-endian.
> >
> > As you didn't specify which you wanted, it defaulted to little-endian
> > and added a BOM (U+FEFF).
> >
> > If you specify which endianness you want with "utf-16le" or "utf-16be",
> > it won't add the BOM:
> >
> > >>> # Little-endian.
> > >>> "\n".encode("utf-16le")
> > b'\n\x00'
> > >>> # Big-endian.
> > >>> "\n".encode("utf-16be")
> > b'\x00\n'
>
> Well, ok, but I need a generic method to get LF and CR for any
> encoding a user can input.
> Do you think that
>
> "\n".encode(encoding).lstrip("".encode(encoding))
>
> is good for any encoding?

No, because it is only useful for stateless encodings. Any encoding
which uses "shift bytes" that cause subsequent bytes to be interpreted
differently will simply not work with this naive technique. Also,
you're assuming that the byte(s) you get from encoding LF will *only*
represent LF, which is also not true for a number of other encodings -
they might always encode LF to the same byte sequence, but could use
that same byte sequence as part of a multi-byte encoding. So, no, for
arbitrarily chosen encodings, this is not dependable.
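
To illustrate the second point, here is a minimal sketch using UTF-16LE. The sample text is an arbitrary choice for demonstration and is not from the thread: it contains no LF at all, yet the two bytes that encode LF show up in it, straddling the boundary between two code units.

```python
# Encoded form of LF in UTF-16 little-endian (no BOM): bytes 0A 00.
lf = "\n".encode("utf-16le")

# Two characters, neither of which is LF (illustrative sample only).
text = "\u0a05\u0100"
data = text.encode("utf-16le")      # bytes: 05 0A 00 01

# A plain byte search ignores code-unit alignment and reports a match.
naive_hit = lf in data
# Decoding first shows there is no line feed in the text.
real_hit = "\n" in data.decode("utf-16le")

print(naive_hit, real_hit)          # prints: True False
```

A byte-level search would have to respect the encoding's unit alignment to avoid this, and that is only one of the restrictions an encoding can impose.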

> Furthermore, is there a way to get the
> encoding of an opened file object?

Nope. That's fundamentally not possible. Unless you mean in the
trivial sense of "what was the parameter passed to the open() call?",
in which case f.encoding will give it to you; but to find out the
actual encoding, no, you can't.
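
A small sketch of that distinction, assuming a temporary file written as Latin-1 and then opened as UTF-8 (the file name and contents are made up for the example): f.encoding only echoes what open() was told, whatever the bytes on disk really are.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")   # hypothetical file name

    # Write bytes that are valid Latin-1 but not valid UTF-8.
    with open(path, "wb") as f:
        f.write("caf\u00e9\n".encode("latin-1"))

    # Open with the wrong encoding on purpose.
    with open(path, encoding="utf-8") as f:
        reported = f.encoding                # what open() was given
        try:
            f.read()
            decoded_ok = True
        except UnicodeDecodeError:
            decoded_ok = False

print(reported, decoded_ok)
```

Here the file object reports utf-8 even though reading fails, so the attribute says nothing about the file's real encoding.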

The ONLY way to 100% reliably decode arbitrary text is to know, from
external information, what encoding it is in. Every other scheme
imposes restrictions. Trying to do something that works for absolutely
any encoding is a doomed project.
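
As a quick illustration (the sample string is arbitrary), the same bytes decode without any error under two different encodings and give two different texts, so nothing in the bytes alone says which reading is right:

```python
raw = "\u00e9\n".encode("utf-8")       # 'é' plus LF, as bytes C3 A9 0A

as_utf8 = raw.decode("utf-8")          # one accented letter, then LF
as_latin1 = raw.decode("latin-1")      # two unrelated characters, then LF

# Both decodes succeed, but they disagree about what the text is.
print(repr(as_utf8), repr(as_latin1))
```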

ChrisA
