Re: tail

From: Marco.Su...@gmail.com (Marco Sulla)
Newsgroups: comp.lang.python
Subject: Re: tail
Date: Thu, 19 May 2022 19:50:16 +0200
Lines: 62
Message-ID: <mailman.450.1652982656.20749.python-list@python.org>
References: <CABbU2U_DbdMt7578cqAuHjyRpe=3cW29aog0=OJJZRkX8-Xi6g@mail.gmail.com>
<YoVlbKuF62gisDjt@cskk.homeip.net>
<CABbU2U-fXhEogR=54iUepzNw-e0pq+AACpD46mSmYyemJ6E2dA@mail.gmail.com>
 by: Marco Sulla - Thu, 19 May 2022 17:50 UTC

On Wed, 18 May 2022 at 23:32, Cameron Simpson <cs@cskk.id.au> wrote:
>
> On 17May2022 22:45, Marco Sulla <Marco.Sulla.Python@gmail.com> wrote:
> >Well, I've done a benchmark.
> >>>> timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail}, number=100000)
> >1.5963431186974049
> >>>> timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail":tail}, number=100000)
> >2.5240604374557734
> >>>> timeit.timeit("tail('/home/marco/lorem.txt', chunk_size=1000)", globals={"tail":tail}, number=100000)
> >1.8944984432309866
>
> This suggests that the file size does not dominate your runtime.

Yes, this is what I wanted to test and it seems good.

> Ah.
> _Or_ that there are similar numbers of newlines vs text in the files so
> reading similar amounts of data from the end. If the "line density" of
> the files were similar you would hope that the runtimes would be
> similar.

No, well, small.txt has very short lines. lorem.txt is lorem ipsum text,
so it has really long lines. Indeed, I get better results by tuning chunk_size.
Anyway, even with the default value the performance is not bad at all.
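
The tail() being benchmarked is not shown in this message. As a rough illustration of why chunk_size matters for files with long lines, here is a minimal sketch of a tail that reads backwards from the end of the file in fixed-size chunks. The parameter n, the default values, and the whole body are assumptions made for illustration, not the code that was timed above.

```python
import os


def tail(path, n=10, chunk_size=100):
    """Return the last n lines of the file at path, as bytes.

    Hypothetical sketch: n and the defaults are illustrative assumptions.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Walk backwards one chunk at a time. Stop once more than n
        # newlines have been seen (so the first of the last n lines is
        # complete) or once the start of the file is reached.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    # Keep only the last n lines, joined without a trailing newline.
    return b"\n".join(data.splitlines()[-n:])
```

With short lines a few small chunks already contain enough newlines. With long lines a small chunk_size means many seek/read round trips, so a larger chunk_size would be expected to help, which matches the lorem.txt timings quoted above.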

> >But the time of Linux tail surprises me:
> >
> >marco@buzz:~$ time tail lorem.txt
> >[text]
> >
> >real 0m0.004s
> >user 0m0.003s
> >sys 0m0.001s
> >
> >It's strange that it's so slow. I thought it was because it decodes
> >and prints the result, but I timed
>
> You're measuring different things. timeit() tries hard to measure just
> the code snippet you provide. It doesn't measure the startup cost of the
> whole python interpreter. Try:
>
> time python3 your-tail-prog.py /home/marco/lorem.txt

Well, I'll try it, but isn't it a bit unfair to compare Python startup with C?

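
To put the two kinds of measurement side by side: the timeit figure quoted above for lorem.txt works out to about 25 microseconds per call (2.52 s / 100000), while `time tail` reports 4 ms for a whole process, startup included. The sketch below measures both sides for Python: the wall-clock cost of starting an interpreter that does nothing, and the per-call cost that timeit reports inside an already running interpreter. The function names and the placeholder statement are assumptions for illustration only.

```python
import subprocess
import sys
import time
import timeit


def startup_seconds(runs=5):
    """Best wall-clock time to start the interpreter, run nothing, and exit."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


def in_process_seconds(stmt="sum(range(100))", number=1000):
    """Average time per call as timeit sees it, with no startup included."""
    return timeit.timeit(stmt, number=number) / number


if __name__ == "__main__":
    print(f"interpreter startup: {startup_seconds():.4f} s")
    print(f"one in-process call: {in_process_seconds():.7f} s")
```

Replacing the placeholder statement with the real tail() call would show how much of a `time python3 ...` run is startup and how much is the tail operation itself.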
> BTW, does your `tail()` print output? If not, again not measuring the
> same thing.
> [...]
> Also: does tail(1) do character set / encoding stuff? Does your Python
> code do that? Might be apples and oranges.

Well, as I wrote I also timed

timeit.timeit("print(tail('/home/marco/lorem.txt').decode('utf-8'))",
globals={"tail":tail}, number=100000)

and I got ~36 seconds.
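
For scale, ~36 seconds over 100000 iterations is roughly 360 microseconds per call, against about 25 microseconds for tail() alone. One way to see where the extra time goes is to time the stages separately. The sketch below times decoding on its own, then decoding plus print() with stdout redirected to an in-memory buffer. The payload is a stand-in for the bytes that tail() returns, and the helper name is hypothetical; output written to a real terminal is usually much slower than this in-memory sink.

```python
import io
import timeit
from contextlib import redirect_stdout


def time_stages(payload, number=1000):
    """Return (decode_seconds, decode_and_print_seconds) per call."""
    # Stage 1: decoding the bytes only.
    decode = timeit.timeit(lambda: payload.decode("utf-8"), number=number)

    # Stage 2: decoding and printing, with stdout sent to an in-memory
    # buffer so that terminal speed is not part of the measurement.
    sink = io.StringIO()
    with redirect_stdout(sink):
        printed = timeit.timeit(
            lambda: print(payload.decode("utf-8")), number=number
        )
    return decode / number, printed / number


if __name__ == "__main__":
    # Stand-in for the output of tail(): ten long lines of text.
    sample = ("lorem ipsum dolor sit amet " * 40 + "\n").encode("utf-8") * 10
    d, p = time_stages(sample)
    print(f"decode only:      {d * 1e6:.1f} us per call")
    print(f"decode and print: {p * 1e6:.1f} us per call")
```

If both stages come out far below the per-call figure seen with real printing, the terminal is the likely bottleneck rather than decode() or tail().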

> If you have the source of tail(1) to hand, consider getting to the core
> and measuring `time()` immediately before and immediately after the
> central tail operation and printing the result.

IMHO this is a very good idea, but I have to find the time(). Ahah. Emh.
