Re: tail

From: PythonL...@DancesWithMice.info (dn)
Newsgroups: comp.lang.python
Subject: Re: tail
Date: Sun, 24 Apr 2022 10:04:35 +1200
Organization: DWM
Lines: 74
Message-ID: <mailman.225.1650751490.20749.python-list@python.org>
 by: dn - Sat, 23 Apr 2022 22:04 UTC

On 24/04/2022 09.15, Chris Angelico wrote:
> On Sun, 24 Apr 2022 at 07:13, Marco Sulla <Marco.Sulla.Python@gmail.com> wrote:
>>
>> On Sat, 23 Apr 2022 at 23:00, Chris Angelico <rosuav@gmail.com> wrote:
>>>>> This is quite inefficient in general.
>>>>
>>>> Why inefficient? I think that readlines() will be much slower, not
>>>> only more time consuming.
>>>
>>> It depends on which is more costly: reading the whole file (cost
>>> depends on size of file) or reading chunks and splitting into lines
>>> (cost depends on how well you guess at chunk size). If the lines are
>>> all *precisely* the same number of bytes each, you can pick a chunk
>>> size and step backwards with near-perfect efficiency (it's still
>>> likely to be less efficient than reading a file forwards, on most file
>>> systems, but it'll be close); but if you have to guess, adjust, and
>>> keep going, then you lose efficiency there.
>>
>> Emh, why chunks? My function simply reads byte per byte and compares it to b"\n". When it find it, it stops and do a readline():
....

> Ah. Well, then, THAT is why it's inefficient: you're seeking back one
> single byte at a time, then reading forwards. That is NOT going to
> play nicely with file systems or buffers.
>
> Compare reading line by line over the file with readlines() and you'll
> see how abysmal this is.
>
> If you really only need one line (which isn't what your original post
> suggested), I would recommend starting with a chunk that is likely to
> include a full line, and expanding the chunk until you have that
> newline. Much more efficient than one byte at a time.

Disagreeing with @Chris in the sense that I use tail very frequently,
and usually in the context of server logs - but I'm talking about the
Linux implementation, not Python code!

Agree with @Chris' assessment of the (in)efficiency. More likely than
not, you will have a good idea of the length of each line. Even if the
line length is highly variable (thinking of some of my applications of
the Python logging module!), one can still 'take a stab at it' (a "thumb
suck" as an engineer-colleague used to say - apparently not an
electrical engineer!) by remembering that lines exceeding 80 characters
become less readable and thus have likely (hopefully) been split into
two.

Thus,

N*(80+p)

where N is the number of lines desired and p is a reasonable
'safety'/over-estimation allowance per line (to apply it as a
percentage instead, use N*80*(1+p/100)), would give a good chunk size.
Open the file in binary mode, grab that much from the end of the file,
split on line-endings, and take the last N elements from that list (with
'recovery code' just in case the 'chunk' wasn't large enough).
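
A minimal sketch of that idea, not a definitive implementation. The
function name tail_chunked and its parameters are made up for
illustration, and p is treated here as extra bytes per line (an
assumption). The 'recovery code' simply doubles the chunk and tries
again until enough newlines have been seen or the whole file has been
read.

```python
import os


def tail_chunked(path, n, line_len=80, margin=20):
    """Return the last n lines of the file as a list of bytes objects.

    line_len and margin play the roles of 80 and p in N*(80+p);
    both are guesses, not measured values.
    """
    if n <= 0:
        return []
    chunk = max(1, n * (line_len + margin))  # first guess at the chunk size
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new position
        while True:
            start = max(0, size - chunk)
            f.seek(start)
            data = f.read(size - start)
            # More than n newlines means the first of the last n lines
            # is complete, not cut off at the start of the chunk.
            if start == 0 or data.count(b"\n") > n:
                break
            chunk *= 2  # 'recovery code': the guess was too small
    # Note: bytes.splitlines() also splits on a bare b"\r".
    return data.splitlines()[-n:]
```

Usage would be along the lines of tail_chunked("server.log", 10), where
the file name is only an example. The lines come back without their
line-endings.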

Adding to the efficiency (of the algorithm, but not the dev-time),
consider that shorter files are likely to be more easily handled by
reading serially from the beginning. To side-step @Chris' criticism, use
a generator to produce the individual lines (lazy evaluation and low
storage requirement) and feed them into a circular queue which is
limited to N entries. QED, as fast as the machine's I/O, and undemanding
of storage space!
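
One way this might look, as a sketch only. The name tail_serial is
hypothetical. The open file object is used as the lazy line producer (it
yields one line at a time, as a generator would), and
collections.deque with maxlen stands in for the circular queue.

```python
from collections import deque


def tail_serial(path, n):
    """Return the last n lines of the file, read serially from the start."""
    with open(path, "rb") as f:
        # Iterating over f yields one line at a time; the deque silently
        # discards the oldest entry once it holds n lines.
        return list(deque(f, maxlen=n))
```

Unlike a split on line-endings, each returned line keeps its trailing
b"\n" here. At most n lines are held in memory at any moment, however
long the file is.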

Running a few timing trials should reveal the 'sweet spot', at which one
algorithm takes over from the other!
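
Such trials could be sketched with timeit, roughly as follows. All the
names, the synthetic log lines, and the file sizes are assumptions for
illustration; compact versions of both approaches are repeated so that
the script runs on its own. No results are claimed here, since they will
depend on the machine and the file system.

```python
import os
import tempfile
import timeit
from collections import deque


def tail_serial(path, n):
    """Read forwards, keeping only the last n lines in a bounded deque."""
    with open(path, "rb") as f:
        return [line.rstrip(b"\n") for line in deque(f, maxlen=n)]


def tail_chunked(path, n, guess=100):
    """Read a chunk from the end, doubling it until it holds n full lines."""
    chunk = max(1, n * guess)
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)
        while True:
            start = max(0, size - chunk)
            f.seek(start)
            data = f.read(size - start)
            if start == 0 or data.count(b"\n") > n:
                break
            chunk *= 2
    return data.splitlines()[-n:]


def trial(num_lines, n=10, repeats=3):
    """Return (serial_seconds, chunked_seconds) for a synthetic file."""
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        for i in range(num_lines):
            tmp.write(b"log entry %d: something happened\n" % i)
        path = tmp.name
    try:
        # Both approaches must agree before timing means anything.
        assert tail_serial(path, n) == tail_chunked(path, n)
        serial = min(timeit.repeat(lambda: tail_serial(path, n),
                                   number=10, repeat=repeats))
        chunked = min(timeit.repeat(lambda: tail_chunked(path, n),
                                    number=10, repeat=repeats))
    finally:
        os.unlink(path)
    return serial, chunked


if __name__ == "__main__":
    for num_lines in (10, 1_000, 100_000):
        serial, chunked = trial(num_lines)
        print(f"{num_lines:>7} lines: serial {serial:.5f}s, "
              f"chunked {chunked:.5f}s")
```

Varying num_lines (and the line length) until the two timings cross
would give an estimate of where to switch algorithms.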

NB quite a few of IBM's (extensively researched) algorithms, which formed
utility program[me]s on mainframes, made similar algorithmic choices in
the pursuit of efficiency.
--
Regards,
=dn
