Re: tail

<mailman.358.1652042986.20749.python-list@python.org>

From: Marco.Su...@gmail.com (Marco Sulla)
Newsgroups: comp.lang.python
Subject: Re: tail
Date: Sun, 8 May 2022 22:48:32 +0200
Lines: 70
Message-ID: <mailman.358.1652042986.20749.python-list@python.org>
References: <CABbU2U92=Xz3d2jesNZi83stnGG1XWFg7ig-=tjE5_4b_XSyzQ@mail.gmail.com>
<C2969F2F-E2C9-45FE-B076-19D179E27868@barrys-emacs.org>
<CABbU2U8u-arOsEO=JfRUPeNQS68TYWVUYHvp6StiNpp_xaotZQ@mail.gmail.com>
 by: Marco Sulla - Sun, 8 May 2022 20:48 UTC

On Sun, 8 May 2022 at 22:34, Barry <barry@barrys-emacs.org> wrote:
>
> > On 8 May 2022, at 20:48, Marco Sulla <Marco.Sulla.Python@gmail.com> wrote:
> >
> > On Sun, 8 May 2022 at 20:31, Barry Scott <barry@barrys-emacs.org> wrote:
> >>
> >>>> On 8 May 2022, at 17:05, Marco Sulla <Marco.Sulla.Python@gmail.com> wrote:
> >>>
> >>> def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):
> >>>     n_chunk_size = n * chunk_size
> >>
> >> Why use tiny chunks? You can read 4KiB as fast as 100 bytes, as it's typically the smallest size the file system will allocate.
> >> I tend to read in multiples of MiB, as it's near instant.
> >
> > Well, I tested on a little file, a list of my preferred pizzas, so....
>
> Try it on a very big file.

I'm not saying it's a good idea, it's only the value that I needed for my tests.
Anyway, it's not a problem with big files. The problem is with files
with long lines.

> >> In text mode you can only seek to a value returned from f.tell(); otherwise the behaviour is undefined.
> >
> > Why? I don't see any recommendation about it in the docs:
> > https://docs.python.org/3/library/io.html#io.IOBase.seek
>
> What does adding 1 to a pos mean?
> If it’s binary it means 1 byte further down the file, but in text mode it may need to
> move the pointer 1, 2 or 3 bytes down the file.

Emh. I re-quote

seek(offset, whence=SEEK_SET)
Change the stream position to the given byte offset.

And so on. No mention of differences between text and binary mode.

> >> You have on limit on the amount of data read.
> >
> > I explained that previously. Anyway, chunk_size is small, so it's not
> > a great problem.
>
> Typo, I meant you have no limit.
>
> You read all the data till the end of the file, which might be megabytes of data.

Yes, I already explained why and how it could be optimized. I quote myself:

In short, the file is always opened in text mode. The file is read from
the end in bigger and bigger chunks, until the whole file has been read
or all the lines are found.

Why? Because in encodings that use more than 1 byte per character,
reading a chunk of n bytes and then reading the previous chunk can
end up splitting a character across the two chunks, with some of its
bytes in each.

I think one can read chunk by chunk and check for the chunk junction
problem. I suppose the code would be faster this way. Anyway, it seems
that this trick is quite fast and it's a lot simpler.
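
The chunk-by-chunk idea can be sketched roughly as below. This is only an illustration, not the code discussed in this thread: the names tail_bytes and tail, and the 4096 default, are assumptions. It opens the file in binary mode (so any byte offset is a valid seek target) and relies on an ASCII-compatible encoding such as UTF-8, where byte 0x0A can only be a line feed. Decoding happens per complete line, so a character split at a chunk junction is never decoded on its own.

```python
import os


def tail_bytes(filepath, n=10, chunk_size=4096):
    """Return the last n lines of a file as a list of bytes objects.

    Hypothetical helper: reads backwards from the end in fixed-size
    chunks, in binary mode.
    """
    if n <= 0:
        return []
    with open(filepath, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Keep reading earlier chunks until more than n newlines were
        # seen (so the first kept line is complete) or the start of the
        # file is reached.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    return data.splitlines()[-n:]


def tail(filepath, n=10, encoding="utf-8", chunk_size=4096):
    """Decode only complete lines, so no character is ever split."""
    return [line.decode(encoding)
            for line in tail_bytes(filepath, n, chunk_size)]
```

Calling tail("pizzas.txt", 3) (a made-up file name) would return the last three lines without their line endings. The amount of data read is bounded by roughly the size of the last n lines plus one chunk, which also addresses the "no limit" concern for files with short lines.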

Re: tail

<phrg7hls04tnkgsdaifa372q0s4egjdnm1@4ax.com>

From: wlfr...@ix.netcom.com (Dennis Lee Bieber)
Newsgroups: comp.lang.python
Subject: Re: tail
Date: Sun, 08 May 2022 21:56:46 -0400
Organization: IISS Elusive Unicorn
Message-ID: <phrg7hls04tnkgsdaifa372q0s4egjdnm1@4ax.com>
References: <CABbU2U92=Xz3d2jesNZi83stnGG1XWFg7ig-=tjE5_4b_XSyzQ@mail.gmail.com> <C2969F2F-E2C9-45FE-B076-19D179E27868@barrys-emacs.org> <CABbU2U8u-arOsEO=JfRUPeNQS68TYWVUYHvp6StiNpp_xaotZQ@mail.gmail.com> <mailman.358.1652042986.20749.python-list@python.org>
 by: Dennis Lee Bieber - Mon, 9 May 2022 01:56 UTC

On Sun, 8 May 2022 22:48:32 +0200, Marco Sulla
<Marco.Sulla.Python@gmail.com> declaimed the following:

>
>Emh. I re-quote
>
>seek(offset, whence=SEEK_SET)
>Change the stream position to the given byte offset.
>
>And so on. No mention of differences between text and binary mode.

You ignore that, underneath, Python is just wrapping the C API... And
the documentation for C explicitly specifies that, other than SEEK_END with
offset 0 and SEEK_SET with offset 0, for a text file one can only rely
upon SEEK_SET using an offset previously obtained with (C) ftell() /
(Python) .tell() .

https://docs.python.org/3/library/io.html
"""
class io.IOBase

The abstract base class for all I/O classes.
"""
seek(offset, whence=SEEK_SET)

Change the stream position to the given byte offset. offset is
interpreted relative to the position indicated by whence. The default value
for whence is SEEK_SET. Values for whence are:
"""

Applicable to BINARY MODE I/O: For UTF-8 and any other multibyte
encoding, this means you could end up positioning into the middle of a
"character" and subsequently read garbage. It is on you to handle
synchronizing on a valid character position, and also to handle different
line ending conventions.
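
As a small illustration of the point above (a sketch, with io.BytesIO standing in for a file opened with "rb", and skip_continuation_bytes being a made-up helper name): seeking to an arbitrary byte offset can land inside a multi-byte UTF-8 character, and one way to resynchronize is to skip any leading continuation bytes, which always have the bit pattern 10xxxxxx.

```python
import io

data = "caff\u00e8\n".encode("utf-8")  # the e-grave is two bytes: 0xC3 0xA8
f = io.BytesIO(data)                   # stands in for open(path, "rb")

f.seek(5)                              # byte 5 is the second byte of the e-grave
chunk = f.read()
try:
    chunk.decode("utf-8")
    landed_mid_character = False
except UnicodeDecodeError:
    landed_mid_character = True


def skip_continuation_bytes(chunk):
    """Drop leading UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    i = 0
    while i < len(chunk) and (chunk[i] & 0xC0) == 0x80:
        i += 1
    return chunk[i:]


# After skipping, the remaining bytes start on a character boundary.
resynced = skip_continuation_bytes(chunk).decode("utf-8")
```

This trick is specific to UTF-8; other multibyte encodings need their own way of finding a character boundary.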

"""
class io.TextIOBase

Base class for text streams. This class provides a character and line
based interface to stream I/O. It inherits IOBase.
"""
seek(offset, whence=SEEK_SET)

Change the stream position to the given offset. Behaviour depends on
the whence parameter. The default value for whence is SEEK_SET.

SEEK_SET or 0: seek from the start of the stream (the default);
offset must either be a number returned by TextIOBase.tell(), or zero. Any
other offset value produces undefined behaviour.

SEEK_CUR or 1: “seek” to the current position; offset must be zero,
which is a no-operation (all other values are unsupported).

SEEK_END or 2: seek to the end of the stream; offset must be zero
(all other values are unsupported).
"""

EMPHASIS: "offset must either be a number returned by TextIOBase.tell(), or
zero."
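
A quick way to see these rules in practice, as a sketch under the assumption of CPython 3 (the file name is made up): a value from tell() round-trips through seek(), while a nonzero end-relative seek on a text stream is rejected with io.UnsupportedOperation.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "pizzas.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("margherita\ncaprese\n")

    with open(path, encoding="utf-8") as f:    # text mode
        f.readline()
        cookie = f.tell()       # opaque value, only meaningful to seek()
        f.seek(0)               # zero is always allowed
        f.seek(cookie)          # a value returned by tell() is allowed
        second_line = f.readline()

        f.seek(0, os.SEEK_END)  # offset zero from the end is allowed
        try:
            f.seek(-1, os.SEEK_END)
            rejected = False
        except io.UnsupportedOperation:
            rejected = True
```

Note that an arbitrary positive offset with SEEK_SET is not rejected in the same way; it is simply not guaranteed to do anything sensible, which is what "undefined behaviour" means here.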

TEXT I/O, with a specified encoding, will return Unicode code points,
and will handle converting line endings to the internal (<lf> represents
new-line) format.
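
For illustration only (io.BytesIO again stands in for a real file, and the sample text is invented), the same bytes read through a text wrapper and read raw:

```python
import io

raw = "pizza \u20ac5\r\nfine\r\n".encode("utf-8")

# Text layer: decodes UTF-8 and, with the default newline=None,
# translates "\r\n" to "\n" on input.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
lines = text.readlines()

# Binary layer: bytes come back untouched, "\r\n" included.
binary_lines = io.BytesIO(raw).readlines()
```

The text lines come back as str with "\n" endings, while the binary lines keep the three-byte euro sign and the original "\r\n".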

Since your code does not specify BINARY mode in the open statement,
Python should be using TEXT mode.

--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/
