Re: tail

From: ros...@gmail.com (Chris Angelico)
Newsgroups: comp.lang.python
Subject: Re: tail
Date: Sun, 24 Apr 2022 07:15:58 +1000
 by: Chris Angelico - Sat, 23 Apr 2022 21:15 UTC

On Sun, 24 Apr 2022 at 07:13, Marco Sulla <Marco.Sulla.Python@gmail.com> wrote:
>
> On Sat, 23 Apr 2022 at 23:00, Chris Angelico <rosuav@gmail.com> wrote:
> > > > This is quite inefficient in general.
> > >
> > > Why inefficient? I think that readlines() will be much slower, not
> > > only more time consuming.
> >
> > It depends on which is more costly: reading the whole file (cost
> > depends on size of file) or reading chunks and splitting into lines
> > (cost depends on how well you guess at chunk size). If the lines are
> > all *precisely* the same number of bytes each, you can pick a chunk
> > size and step backwards with near-perfect efficiency (it's still
> > likely to be less efficient than reading a file forwards, on most file
> > systems, but it'll be close); but if you have to guess, adjust, and
> > keep going, then you lose efficiency there.
>
> Emh, why chunks? My function simply reads byte by byte and compares each
> byte to b"\n". When it finds one, it stops and does a readline():
>
> def tail(filepath):
>     """
>     @author Marco Sulla
>     @date May 31, 2016
>     """
>
>     try:
>         filepath.is_file
>         fp = str(filepath)
>     except AttributeError:
>         fp = filepath
>
>     with open(fp, "rb") as f:
>         size = os.stat(fp).st_size
>         start_pos = 0 if size - 1 < 0 else size - 1
>
>         if start_pos != 0:
>             f.seek(start_pos)
>             char = f.read(1)
>
>             if char == b"\n":
>                 start_pos -= 1
>                 f.seek(start_pos)
>
>             if start_pos == 0:
>                 f.seek(start_pos)
>             else:
>                 for pos in range(start_pos, -1, -1):
>                     f.seek(pos)
>
>                     char = f.read(1)
>
>                     if char == b"\n":
>                         break
>
>         return f.readline()
>
> This is only for one line and in utf8, but it can be generalised.
>

Ah. Well, then, THAT is why it's inefficient: you're seeking back one
single byte at a time, then reading forwards. That is NOT going to
play nicely with file systems or buffers.

Compare reading line by line over the file with readlines() and you'll
see how abysmal this is.
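
For reference, the forward-reading baseline could be as simple as the sketch below. The name last_line_forward is made up for illustration, and it reads the whole file into memory, so it is only meant as something to time the byte-at-a-time version against.

```python
def last_line_forward(path):
    """Return the last line of the file as bytes, reading the file forwards."""
    with open(path, "rb") as f:
        # readlines() reads the entire file and splits it on b"\n".
        lines = f.readlines()
    # An empty file has no lines at all.
    return lines[-1] if lines else b""
```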

If you really only need one line (which isn't what your original post
suggested), I would recommend starting with a chunk that is likely to
include a full line, and expanding the chunk until you have that
newline. Much more efficient than one byte at a time.
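
A minimal sketch of that chunk-and-expand idea follows. The function name last_line, the 4096-byte starting chunk, and the doubling strategy are all assumptions chosen for illustration, not something taken from the thread. It works on bytes and only handles the single-last-line case.

```python
import os


def last_line(path, chunk_size=4096):
    """Return the last line of the file as bytes, reading backwards in chunks."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new position
        if size == 0:
            return b""
        while True:
            # Read the final chunk_size bytes, or the whole file if it is smaller.
            start = max(0, size - chunk_size)
            f.seek(start)
            data = f.read(size - start)
            # Ignore one trailing newline so the search finds the newline
            # that precedes the last line, not the one that ends it.
            body = data[:-1] if data.endswith(b"\n") else data
            cut = body.rfind(b"\n")
            if cut != -1:
                return data[cut + 1:]
            if start == 0:
                # No earlier newline anywhere: the whole file is one line.
                return data
            # The guess was too small; double it and try again.
            chunk_size *= 2
```

Each retry is one seek and one read, so even a badly underestimated chunk size costs only a handful of system calls rather than one per byte.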

ChrisA

