Re: imaplib: is this really so unwieldy?

From: cs...@cskk.id.au (Cameron Simpson)
Newsgroups: comp.lang.python
Subject: Re: imaplib: is this really so unwieldy?
Date: Tue, 25 May 2021 19:38:35 +1000
Lines: 142
Message-ID: <mailman.331.1621937989.3087.python-list@python.org>
References: <21fb6c5f-97a4-654b-887f-2c31a549bcbe@adminart.net>
<YKzFm7gR+5eKzov7@cskk.homeip.net>
 by: Cameron Simpson - Tue, 25 May 2021 09:38 UTC

On 25May2021 10:23, hw <hw@adminart.net> wrote:
>I'm about to do stuff with emails on an IMAP server and wrote a program
>using imaplib which, so far, gets the UIDs of the messages in the
>inbox:
>
>
>#!/usr/bin/python

I'm going to assume you're using Python 3.

>import imaplib
>import re
>
>imapsession = imaplib.IMAP4_SSL('imap.example.com', port = 993)
>
>status, data = imapsession.login('user', 'password')
>if status != 'OK':
> print('Login failed')
> exit

Your "exit" won't do what you want. A bare "exit" is just an expression:
it names the "exit" object which the site module normally installs for
interactive use, but it does not call it, so nothing happens and the
programme carries on after the failed login. (If the site module is not
loaded, eg with "python -S", you'd get a NameError instead.) You
probably want:

sys.exit(1)

You'll need to import "sys".
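
A minimal sketch of that, with one extra wrinkle: imaplib's login() raises
imaplib.IMAP4.error when the server refuses the login rather than handing
back a non-OK status, so catching the exception is the more likely path.
The helper name login_or_exit is made up for illustration; it is not part
of imaplib.

```python
import imaplib
import sys

def login_or_exit(session, user, password):
    """Log in, or exit with status 1 if the server refuses.

    login_or_exit is a hypothetical helper, not part of imaplib.
    """
    try:
        session.login(user, password)
    except imaplib.IMAP4.error as e:
        print('Login failed:', e, file=sys.stderr)
        sys.exit(1)   # raises SystemExit(1); a bare "exit" does nothing
```

You'd call it as login_or_exit(imapsession, 'user', 'password') right
after creating the session.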

>messages = imapsession.select(mailbox = 'INBOX', readonly = True)
>typ, msgnums = imapsession.search(None, 'ALL')

I've done little with IMAP. What's in msgnums here? Eg:

print(type(msgnums), repr(msgnums))

just so we all know what we're dealing with here.

>message_uuids = []
>for number in str(msgnums)[3:-2].split():

This is very strange. Did you see the example at the end of the module
docs? It has this code:

import getpass, imaplib

M = imaplib.IMAP4()
M.login(getpass.getuser(), getpass.getpass())
M.select()
typ, data = M.search(None, 'ALL')
for num in data[0].split():
    typ, data = M.fetch(num, '(RFC822)')
    print('Message %s\n%s\n' % (num, data[0][1]))
M.close()
M.logout()

It is just breaking apart data[0] into the pieces which were separated by
whitespace in the response. And then using those same pieces as keys
for the .fetch() call. That doesn't seem complex, and in fact is blind
to the format of the "message numbers" returned. It just takes what it
is handed and uses those to fetch each message.
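
To see the shape of things without a server: the sample value below is
made up, but it has the layout the docs example relies on, a list whose
first element is one bytes object holding the numbers separated by spaces.

```python
# A made-up value with the same shape as the "data" part of
#     typ, data = M.search(None, 'ALL')
data = [b'1 2 3 17']

nums = data[0].split()      # split the bytes on whitespace
print(nums)                 # [b'1', b'2', b'3', b'17']

# Each piece is an opaque token: hand it straight back to fetch(), or
# decode it if you want a str for printing. No int() conversion needed.
labels = [num.decode('ascii') for num in nums]
print(labels)               # ['1', '2', '3', '17']
```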

> status, data = imapsession.fetch(number, '(UID)')
> if status == 'OK':
> match = re.match('.*\(UID (\d+)\)', str(data))
[...]
>It's working (with Cyrus), but I have the feeling I'm doing it all
>wrong because it seems so unwieldy.

IMAP's quite complex. Have you read RFC2060? (RFC3501 has since replaced
it, but RFC2060 is the one the imaplib docs cite.)

https://datatracker.ietf.org/doc/html/rfc2060.html

The imaplib library is probably a fairly basic wrapper for the
underlying protocol which provides methods for the basic client requests
and conceals the asynchronicity from the user for ease of (basic) use.

>Apparently the functions of imaplib return some kind of bytes while
>expecting strings as arguments, like message numbers must be strings.
>The documentation doesn't seem to say if message UIDs are supposed to
>be integers or strings.

You can go a long way by pretending that they are opaque strings. That
they may be numeric in content can be irrelevant to you. Treat them as
strings.

>So I'm forced to convert stuff from bytes to strings (which is weird
>because bytes are bytes)

"bytes are bytes" is tautological. You're getting bytes for a few
reasons:

- the IMAP protocol largely talks about octets (bytes), but says they're
text. For this reason a lot of the stuff you pass as client parameters
is strings, because strings are text.

- text may be encoded as bytes in many ways, and without knowing the
encoding, you can't extract text (strings) from bytes

- the imaplib library may date from Python 2, where the str type was
essentially a byte sequence. In Python 3 a str is a sequence of
Unicode code points, and you translate to/from bytes if you need to
work with bytes.
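
The second point is easy to see in isolation. This sketch uses only the
built-in str and bytes methods; the sample text is arbitrary.

```python
text = 'caf\u00e9'                  # "cafe" with an acute accent on the e

as_utf8 = text.encode('utf-8')      # b'caf\xc3\xa9'  (5 bytes)
as_latin1 = text.encode('latin-1')  # b'caf\xe9'      (4 bytes)

# Same text, different bytes: the bytes alone don't say which encoding.
assert as_utf8 != as_latin1

# Decoding with the right encoding round-trips ...
assert as_utf8.decode('utf-8') == text

# ... decoding with the wrong one gives different text, or an error.
assert as_utf8.decode('latin-1') == 'caf\u00c3\u00a9'
try:
    as_latin1.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print('latin-1 bytes decodable as utf-8?', decoded_ok)   # False
```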

Anyway, the IMAP responses are bytes containing text. You get a lot of
bytes.

When you go:

text = str(data)

that does not decode anything. In Python 3, str() of a bytes object (or
of a list of them) with no encoding just gives you its repr, "b'...'"
quoting and all, which is why you needed that [3:-2] slice. You really
ought to decode with an explicit encoding instead. If you've not
specified the CHARSET for things, 'ascii' would be a conservative choice.
The IMAP RFC talks about what to expect in section 4 (Data Formats).
There's quite a lot of possible response formats and I can understand
imaplib not getting deeply into decoding these.
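
A small sketch of the difference between str() and an explicit decode.
The sample value is made up, shaped like the "data" list imaplib hands
back from a search.

```python
# Made-up value shaped like the "data" list imaplib hands back.
data = [b'1 2 3']

as_repr = str(data)
print(as_repr)               # [b'1 2 3']  <- the repr, brackets and b'' included
print(as_repr[3:-2])         # 1 2 3       <- what the [3:-2] slice was peeling off

text = data[0].decode('ascii')   # explicit decode: bytes -> str
print(text)                  # 1 2 3

# str() also decodes, but only if you give it an encoding.
assert str(data[0], 'ascii') == text
```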

>and to use regular expressions to extract the message-uids from what
>the functions return (which I shouldn't have to because when I'm asking
>a function to give me a uid, I expect it to return a uid).

No, you're asking the IMAP _protocol_ to return you UIDs. The module
itself doesn't parse what you ask for in the fetch results, and
therefore it can't decode the response (data bytes) into some higher
level thing (such as UIDs in your case, but you can ask for all sorts of
weird stuff with IMAP).

So having passed '(UID)' to the FETCH request, you now need to parse
the response.
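
A sketch of that parsing step, matching on the bytes directly instead of
going via str(). The helper name uid_from_fetch and the sample response
b'1 (UID 4827)' are assumptions for illustration; check what your server
really sends with print(repr(data)) first.

```python
import re

# A bytes pattern, because the data from imaplib is bytes.
UID_RE = re.compile(rb'\(UID (\d+)\)')

def uid_from_fetch(data):
    """Return the UID (as a str) found in a FETCH (UID) result, or None.

    uid_from_fetch is a hypothetical helper, not part of imaplib.
    """
    for item in data:
        if isinstance(item, bytes):
            m = UID_RE.search(item)
            if m:
                return m.group(1).decode('ascii')
    return None

print(uid_from_fetch([b'1 (UID 4827)']))   # 4827
```

imaplib also has a uid() method which runs a command in terms of UIDs
instead of message numbers. Something like imapsession.uid('SEARCH',
None, 'ALL') should hand back the UIDs in the same space separated shape
as search() does, which would avoid the FETCH per message altogether.
That is untested against Cyrus here.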

>This so totally awkward and unwieldy and involves so much overhead
>that I must be doing this wrong. But am I? How would I do this right?

Well, you _could_ get immersed in the nitty gritty of the IMAP protocol
and the imaplib module, _or_ you could see if someone else has done some
work to make this easier by writing a higher level library. A search at
pypi.org for "imap" found a lot of stuff. The package named "imap-tools"
looks promising. Try that.

Cheers,
Cameron Simpson <cs@cskk.id.au>
