From: ros...@gmail.com (Chris Angelico)
Newsgroups: comp.lang.python
Subject: Re: tail
Date: Sun, 24 Apr 2022 08:11:39 +1000
Message-ID: <mailman.226.1650751912.20749.python-list@python.org>
 by: Chris Angelico - Sat, 23 Apr 2022 22:11 UTC

On Sun, 24 Apr 2022 at 08:03, Peter J. Holzer <hjp-python@hjp.at> wrote:
>
> On 2022-04-24 04:57:20 +1000, Chris Angelico wrote:
> > On Sun, 24 Apr 2022 at 04:37, Marco Sulla <Marco.Sulla.Python@gmail.com> wrote:
> > > What about introducing a method for text streams that reads the lines
> > > from the bottom? Java has also a ReversedLinesFileReader with Apache
> > > Commons IO.
> >
> > It's fundamentally difficult to get precise. In general, there are
> > three steps to reading the last N lines of a file:
> >
> > 1) Find out the size of the file (currently, if it's being grown)
> > 2) Seek to the end of the file, minus some threshold that you hope
> > will contain a number of lines
> > 3) Read from there to the end of the file, split it into lines, and
> > keep the last N
> [...]
> > This is quite inefficient in general. It would be far FAR easier to do
> > this instead:
> >
> > 1) Read the entire file and decode bytes to text
> > 2) Split into lines
> > 3) Iterate backwards over the lines
>
> Which one is more efficient depends very much on the size of the file.
> For a file of a few kilobytes, the second solution is probably more
> efficient. But for a few gigabytes, that's almost certainly not the
> case.

Yeah. I said "easier", not necessarily more efficient. Which is more
efficient is a virtually unanswerable question (will you need to
iterate over the whole file or stop part way? Is the file stored
contiguously? Can you memory map it in some way?), so it's going to
depend a lot on your use-case.
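
For reference, the "easier" version could be sketched roughly like
this. The function name is just something picked for illustration,
and it assumes the file fits comfortably in memory:

```python
def lines_reversed(path, encoding="utf-8"):
    """Yield the lines of a text file from last to first.

    Reads the whole file into memory first, so this is only
    reasonable when the file comfortably fits in RAM.
    """
    with open(path, encoding=encoding) as f:
        lines = f.readlines()
    # The file is closed by now; everything we need is in `lines`.
    for line in reversed(lines):
        yield line.rstrip("\n")
```

Simple, but the full read and decode happen before the first line
comes out, no matter how few lines you end up consuming.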

> > Tada! Done. And in Python, quite easy. The downside, of course, is
> > that you have to store the entire file in memory.
>
> Not just memory. You have to read the whole file in the first place. Which is
> hardly efficient if you only need a tiny fraction.

Right - if that's the case, then the chunked form, even though it's
harder, would be worth doing.
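
A minimal sketch of the chunked form, not a definitive
implementation. The name tail() and the chunk_size default are
arbitrary choices of mine, and it assumes an ASCII-compatible
encoding such as UTF-8 with "\n" or "\r\n" line endings (it won't
work for UTF-16, for instance):

```python
import os

def tail(path, n, chunk_size=8192, encoding="utf-8"):
    """Return the last n lines of a file as a list of str."""
    if n <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        buf = b""
        # Prepend chunks until the buffer holds more than n newlines
        # (so the earliest of the n lines is complete) or we reach
        # the start of the file.
        while pos > 0 and buf.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf
    # Split on bytes first, decode afterwards: a chunk boundary may
    # fall inside a multi-byte character, but only in the leading
    # partial line, which is discarded by the slice below.
    lines = buf.splitlines()
    return [line.decode(encoding) for line in lines[-n:]]
```

If the file is being appended to while you read, the size found by
the initial seek is only a snapshot, so the result reflects the file
as it was at that moment (more or less).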

> > Personally, unless the file is tremendously large and I know for sure
> > that I'm not going to end up iterating over it all, I would pay the
> > memory price.
>
> Me, too. Problem with a library function (as Marco proposes) is that you
> don't know how it will be used.
>

Yup. And there may be other options worth considering, like
maintaining an index (a bunch of "line 142857 is at byte position
3141592" entries) which would allow random access... but at some
point, if your file is that big, you probably shouldn't be storing it
as a file of lines of text. Use a database instead.
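
For what it's worth, the index idea could be sketched like this.
Both function names are made up for the example, line numbers are
0-based, and the index goes stale as soon as the file changes:

```python
def build_line_index(path):
    """Return a list where entry i is the byte offset of line i."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        # Iterating a binary file yields lines split on b"\n".
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets

def read_line(path, offsets, lineno, encoding="utf-8"):
    """Fetch a single line by seeking straight to its offset."""
    with open(path, "rb") as f:
        f.seek(offsets[lineno])
        return f.readline().decode(encoding).rstrip("\r\n")
```

Building the index still costs one full pass over the file, so it
only pays off if you're going to do many lookups afterwards.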

Reading a text file backwards by lines is, by definition, hard. Every
file format I know of that involves starting at the end of the file is
defined in binary, so you can actually seek, and is usually defined
with fixed-size structures (so you just go "read the last 768 bytes of
the file" or something).
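
To illustrate the fixed-size case, here's a sketch that reads a
trailer from the end of a binary file. The layout (a "DEMO" magic,
a version, and a record count) is entirely hypothetical and not any
real format:

```python
import os
import struct

# Hypothetical trailer layout, invented for this example:
# 4-byte magic, 4-byte version, 8-byte record count, little-endian.
TRAILER = struct.Struct("<4sIQ")

def read_trailer(path):
    """Return (version, record_count) from the file's trailer."""
    with open(path, "rb") as f:
        # Binary mode allows seeking relative to the end of the file.
        f.seek(-TRAILER.size, os.SEEK_END)
        magic, version, count = TRAILER.unpack(f.read(TRAILER.size))
    if magic != b"DEMO":
        raise ValueError("not a DEMO file")
    return version, count
```

No scanning, no guessing at thresholds: the size is known up front,
so one seek and one read are enough.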

ChrisA
