Re: tail

<mailman.354.1652039285.20749.python-list@python.org>


https://www.novabbs.com/devel/article-flat.php?id=18195&group=comp.lang.python#18195

From: Marco.Su...@gmail.com (Marco Sulla)
Newsgroups: comp.lang.python
Subject: Re: tail
Date: Sun, 8 May 2022 21:47:18 +0200
Lines: 100
Message-ID: <mailman.354.1652039285.20749.python-list@python.org>
References: <CABbU2U99Jpa6nuYg0sXw6=GjBEKVk9u-_oyxSoL8hLrW_2FoBA@mail.gmail.com>
<A3773CDA-B6FE-4A51-8D75-362397220F67@barrys-emacs.org>
<CABbU2U_J7HdUjDV8TLjHJkUb7xBTUes6rG0F17sDJNFX13-SNg@mail.gmail.com>
<3848780F-83B8-4B5F-BFAD-157390288C15@barrys-emacs.org>
<CABbU2U92=Xz3d2jesNZi83stnGG1XWFg7ig-=tjE5_4b_XSyzQ@mail.gmail.com>
 by: Marco Sulla - Sun, 8 May 2022 19:47 UTC

On Sun, 8 May 2022 at 20:31, Barry Scott <barry@barrys-emacs.org> wrote:
>
> > On 8 May 2022, at 17:05, Marco Sulla <Marco.Sulla.Python@gmail.com> wrote:
> >
> > def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
> >     n_chunk_size = n * chunk_size
>
> Why use tiny chunks? You can read 4 KiB as fast as 100 bytes, as it's typically the smallest size the file system will allocate.
> I tend to read in multiples of MiB, as it's near instant.

Well, I tested on a small file, a list of my favourite pizzas, so...

> >     pos = os.stat(filepath).st_size
>
> You cannot mix POSIX API with text mode.
> pos is in bytes from the start of the file.
> Text mode works in code points. bytes != code points.
>
> >     chunk_line_pos = -1
> >     lines_not_found = n
> >
> >     with open(filepath, newline=newline, encoding=encoding) as f:
> >         text = ""
> >
> >         hard_mode = False
> >
> >         if newline == None:
> >             newline = _lf
> >         elif newline == "":
> >             hard_mode = True
> >
> >         if hard_mode:
> >             while pos != 0:
> >                 pos -= n_chunk_size
> >
> >                 if pos < 0:
> >                     pos = 0
> >
> >                 f.seek(pos)
>
> In text mode you can only seek to a value returned by f.tell(); otherwise the behaviour is undefined.

Why? I don't see any recommendation about it in the docs:
https://docs.python.org/3/library/io.html#io.IOBase.seek
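
For reference, the linked entry is for io.IOBase. The io.TextIOBase.seek entry on the same page says that with SEEK_SET the offset must be zero or a value returned by TextIOBase.tell(), and that any other offset produces undefined behaviour. A minimal sketch of what can go wrong, assuming a UTF-8 file; the temporary file and the sample text are made up for illustration:

```python
import os
import tempfile

text = "più pizza\n"  # 'ù' takes two bytes in UTF-8

# Write the sample text to a throwaway file (newline="" avoids translation).
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", newline="", suffix=".txt", delete=False
) as tmp:
    tmp.write(text)
    path = tmp.name

size_in_bytes = os.stat(path).st_size  # what os.stat() reports
size_in_chars = len(text)              # what text mode hands back

# Byte offset 3 is the second byte of 'ù', so decoding starts mid-character.
with open(path, encoding="utf-8", newline="") as f:
    f.seek(3)
    try:
        result = f.read()
    except UnicodeDecodeError as exc:
        result = exc

os.remove(path)
print(size_in_bytes, size_in_chars, type(result).__name__)
```

With UTF-8 the bad offset at least fails loudly; other encodings may not.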

> >                 text = f.read()
>
> You have no limit on the amount of data read.

I explained that previously. Anyway, chunk_size is small, so it's not
a great problem.

> >                 lf_after = False
> >
> >                 for i, char in enumerate(reversed(text)):
>
> Simply use text.rindex('\n') or text.rfind('\n') for speed.

I can't use them when I have to find either \n or \r. So I preferred to
simplify the code and use the for loop every time. Keep in mind
anyway that this is a prototype for a Python C API implementation
(builtin I hope, or a C extension if not).
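
If the Python-level loop ever matters for the prototype, one option is to call rfind once per terminator and keep the larger index. A small sketch; the helper name last_line_break is hypothetical, and it treats "\r\n" as ending at the "\n":

```python
def last_line_break(text, end=None):
    """Return the index of the last '\\n' or '\\r' in text[:end], or -1."""
    if end is None:
        end = len(text)
    # rfind returns -1 when nothing is found, so max() still works.
    return max(text.rfind("\n", 0, end), text.rfind("\r", 0, end))
```

Passing the previous result as end lets the caller walk backwards one line break at a time.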

> > In short, the file is always opened in text mode. The file is read from the end in
> > bigger and bigger chunks, until the file is finished or all the lines are
> > found.
>
> It will fail if the content is not ASCII.

Why?

> > Why? Because in encodings that use more than 1 byte per character, reading
> > a chunk of n bytes, then reading the previous chunk, can end up splitting
> > a character's bytes across the two chunks.
>
> No it cannot. text mode only knows how to return code points. Now if you are in
> binary it could be split, but you are not in binary mode so it cannot.

From the docs:

seek(offset, whence=SEEK_SET)
Change the stream position to the given byte offset.

> > Do you think there are chances to get this function as a method of the file
> > object in CPython? The method for a file object opened in bytes mode is
> > simpler, since there's no encoding and newline is only \n in that case.
>
> State your requirements. Then see if your implementation meets them.

The method should return the last n lines from a file object.
If the file object is in text mode, the newline parameter must be honored.
If the file object is in binary mode, a newline is always b"\n", to be
consistent with readline.

I suppose the current implementation of tail satisfies the
requirements for text mode. The previous one satisfied binary mode.
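
The earlier binary-mode version is not quoted in this message. As a rough sketch only, a function meeting the binary-mode requirement above could look like the following; the name tail_bytes, its parameters, and the 4096-byte default are assumptions, not the code discussed in the thread:

```python
import io
import os


def tail_bytes(f, n=10, chunk_size=4096):
    """Return the last n lines of a seekable binary file object.

    Only b"\\n" ends a line, as with readline() in binary mode.
    """
    if n <= 0:
        return []
    f.seek(0, os.SEEK_END)
    pos = f.tell()
    buf = b""
    # Read backwards until more than n newlines are buffered (so the first
    # of the n lines is known to be complete) or the start is reached.
    while pos > 0 and buf.count(b"\n") <= n:
        step = min(chunk_size, pos)
        pos -= step
        f.seek(pos)
        buf = f.read(step) + buf
    parts = buf.split(b"\n")
    last = parts.pop()  # bytes after the final newline, possibly empty
    lines = [part + b"\n" for part in parts]
    if last:
        lines.append(last)
    return lines[-n:]


sample = io.BytesIO(b"a\nb\r\nc\nd")
print(tail_bytes(sample, 2, chunk_size=3))
```

Working on bytes sidesteps the text-mode seek question, but leaves decoding and newline translation to the caller.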

Anyway, apart from my implementation, I'm curious whether you think a tail
method is worth adding to the builtin file objects in
CPython.

Re: tail

<jdr3p9F2n46U1@mid.individual.net>


https://www.novabbs.com/devel/article-flat.php?id=18200&group=comp.lang.python#18200

From: greg.ew...@canterbury.ac.nz (Greg Ewing)
Newsgroups: comp.lang.python
Subject: Re: tail
Date: Mon, 9 May 2022 11:58:32 +1200
Lines: 17
Message-ID: <jdr3p9F2n46U1@mid.individual.net>
References: <CABbU2U99Jpa6nuYg0sXw6=GjBEKVk9u-_oyxSoL8hLrW_2FoBA@mail.gmail.com>
<A3773CDA-B6FE-4A51-8D75-362397220F67@barrys-emacs.org>
<CABbU2U_J7HdUjDV8TLjHJkUb7xBTUes6rG0F17sDJNFX13-SNg@mail.gmail.com>
<3848780F-83B8-4B5F-BFAD-157390288C15@barrys-emacs.org>
<CABbU2U92=Xz3d2jesNZi83stnGG1XWFg7ig-=tjE5_4b_XSyzQ@mail.gmail.com>
<mailman.354.1652039285.20749.python-list@python.org>
In-Reply-To: <mailman.354.1652039285.20749.python-list@python.org>
 by: Greg Ewing - Sun, 8 May 2022 23:58 UTC

On 9/05/22 7:47 am, Marco Sulla wrote:
>> It will fail if the content is not ASCII.
>
> Why?

For some encodings, if you seek to an arbitrary byte position and
then read, it may *appear* to succeed but give you complete gibberish.
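
A small illustration of that silent failure, using UTF-16-LE purely as an example encoding (it is not named in the thread):

```python
data = "abcd".encode("utf-16-le")  # b'a\x00b\x00c\x00d\x00'

# Start one byte too late, and drop the last byte so the length stays even.
shifted = data[1:-1].decode("utf-16-le")

# No exception is raised, but the result is three unrelated CJK characters.
print(repr(shifted))
```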

Your method might work for a certain subset of encodings (those that
are self-synchronising) but it won't work for arbitrary encodings.
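
UTF-8 is one such self-synchronising encoding: continuation bytes always have the bit pattern 10xxxxxx, so a reader landing at an arbitrary byte can step back to the start of the character. A minimal sketch, with align_to_char_start as a hypothetical helper name:

```python
def align_to_char_start(data: bytes, pos: int) -> int:
    """Move pos back until it no longer points at a UTF-8 continuation byte."""
    while 0 < pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


data = "più pizza".encode("utf-8")
pos = align_to_char_start(data, 3)  # offset 3 is the second byte of 'ù'
tail_text = data[pos:].decode("utf-8")
print(pos, tail_text)
```

Nothing comparable works for encodings where a byte's role cannot be told from its value alone.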

Given that limitation, I don't think it's reliable enough to include
in the standard library.

--
Greg
