RE: What to use for finding as many syntax errors as possible.

<mailman.599.1665376892.20444.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=19737&group=comp.lang.python#19737

Newsgroups: comp.lang.python
Subject: RE: What to use for finding as many syntax errors as possible.
Date: Mon, 10 Oct 2022 00:41:28 -0400
Message-ID: <mailman.599.1665376892.20444.python-list@python.org>
References: <8ba966ed-a935-84ea-f65d-fec0dc71403a@vub.be>
<Y0NO+FsJ4oeCNO1T@cskk.homeip.net>
<00bd01d8dc62$900ebba0$b02c32e0$@gmail.com>
Return-Path: <avi.e.gross@gmail.com>

by: - Mon, 10 Oct 2022 04:41 UTC

Cameron,

Your suggestion makes me shudder!

Removing all earlier lines of code is almost guaranteed to generate errors, as
variables you are using are no longer declared or initialized, modules are not
imported, and so on.

Removing just the line or three where the previous error happened would also
have a good chance of invalidating something.

Someone who really wants to isolate large parts of their code, so
that an error in one part does not compromise lots of remaining code, might
build their code in small units on the level of a single function per file
and do lots of imports. They can then ask for all the files to be
pseudo-compiled to byte-code and that might provide lots of errors to look
at in one pass.
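
A minimal sketch of that one-pass idea, assuming the code lives as .py files
under a single directory. The name check_files is hypothetical. The built-in
compile() only parses and byte-compiles, so nothing is executed, and a failure
in one file does not stop the others from being checked. It reports syntax
errors only, not problems such as undefined names.

```python
from pathlib import Path


def check_files(root):
    """Compile every .py file under root without running it.

    Returns a list of (path, line number, message), one per file that
    fails to compile. Hypothetical helper, for illustration only.
    """
    problems = []
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        try:
            # "exec" mode compiles a whole module; the result is discarded.
            compile(source, str(path), "exec")
        except SyntaxError as exc:
            # Only the first syntax error in each file is reported.
            problems.append((str(path), exc.lineno, exc.msg))
    return problems
```

The standard library's compileall module offers a similar batch check from the
command line, as in `python -m compileall some_directory`.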

But asking a tool to take a one-file version, find errors, somehow go past them
and look for more is more daunting. It can of course be done, but with partial
accuracy and usefulness at best.

As an analogy, if tolerated, think of a spell-checker on a document that can
find oodles of words spelled wrong. Unfortunately, a spell corrector can
drive us nuts if it knows little about context. If it sees a word like
"reid" should it just change it to "read" or "red" or perhaps "reed" or look
to see if the real problem is it is supposed to be unified (no space) with a
word before or after? Will it know if the word appears in a context where a
language like Latin or French or German or Hungarian is being quoted and
perhaps it is spelled right, or if wrong, has other more likely corrections?

Now if you add a grammar detector, and it knows you are looking for an
adjective or a verb or a noun, it may do better.

I use Google Translate quite a bit as a tool, as I often have to type in
various languages and it provides a handy keyboard or lets me check if I
used the right grammar, especially in languages with silly ideas that objects
can have two or even three genders. So putting in phrases like "this xyz" can
result in language-specific text that tells me if it is masculine or
feminine or perhaps neuter. But the reason I mention it is how often it is
WRONG. I mean many languages have multiple words that are spelled the same
but used and pronounced differently in various contexts. The English word
"read" can sound like "reed" or like "red", so the past tense sounds different,
as in "I read that book last week" versus "please read it to me now". But some
languages such as Hebrew, which generally may not show the vowels, can totally
confuse this program, as humans often need lots of context to
figure out whether the current short word is in a context where it means
"you" (feminine and singular) and is pronounced "aht", or is a way of showing
that what follows is a direct object, loosely means "the" in a redundant way,
and is pronounced "eht". Quite a few words have three or more possible
ways to pronounce the same letters and without vowel guides need context and
sometimes some spreadsheet-like ingenuity, as multiple other words are also
in limbo and once resolved can impact what other words may now mean.
Obviously adding back the vowels makes things clear, so people who are used
to seeing books written that old way can get hopelessly lost reading a
modern newspaper.

End of digression, just assume I could have gone on for many pages
describing my annoyances at what Google translate does to many other
languages that show the imperfections in what is really a great and powerful
tool.

Well, parsing any program in most languages can be equally complex and
require lots of context. For example, you can often use the same identifier
to be the name of a regular variable or the name of a function and sometimes
other things such as the name of a module. They can often be disambiguated
in context. Perhaps the same name followed by parentheses should be a
function call, while a name followed by :: or ::: might in that language
require it to be the name of a module/package. If followed by [ it might
need to be something indexable such as an array or list and so on. So say
there is an error in the variable. Can the interpreter or linter figure out
what the error is and almost repair it? Can it see a variable name like
"alpXha" and note there is no such identifier in the current namespace but
there is one called "alpha" that might be the one without the X? But what if
what is missing is an open paren or maybe the matching close paren? Does it
know if the problem is a bad variable name or a bad function invocation or
one of many other possible problems? Code with a random blemish is often not
easily figured out. If I type the name of a function without parentheses, it
could be an attempt to call the function with no arguments (an error though
in many languages) or it could be I want to pass the function itself as an
argument in functional programming. But if I have another variable of type
array, might it not be parentheses missing but square brackets?
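
The "alpXha" versus "alpha" guess can be sketched with the standard library's
difflib module. The helper name suggest_name and the sample namespace are
hypothetical, and a similarity score is only a guess: it says nothing about
whether the real mistake was the name, a missing paren, or something else.

```python
import difflib


def suggest_name(unknown, known_names):
    """Return up to three known identifiers that look similar to `unknown`.

    Hypothetical helper: uses difflib's similarity ratio with its default
    cutoff of 0.6, so unrelated names are filtered out.
    """
    return difflib.get_close_matches(unknown, known_names, n=3, cutoff=0.6)


# A made-up namespace for illustration.
namespace = ["alpha", "beta", "gamma"]
print(suggest_name("alpXha", namespace))
```

With several similar names in scope (say "alpha" and "alphabet") more than one
candidate can come back, which is exactly the ambiguity described above.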

The compiler or interpreter often cannot fix it, so it often tries to skip
forward till it finds something unambiguous that marks the beginning of a new
section. That might be something like an unquoted semicolon at the end of a
line or a matching close bracket. Depending on such choices, again, varying
amounts of the program may be ignored in evaluating what follows. But this
is not the same as a human speedreading or daydreaming who misses a bit here
and there and just hopes it was not crucial and that what follows probably
remains worthy and valid. I have sometimes missed something like a name and
then seen pages of pronouns like "she" and eventually given up as no more
hints arrive and I have to go back or ask someone lest a big bunch of the
text makes no sense to me.
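
As a rough illustration of that skip-forward idea applied to Python source,
here is a sketch that assumes the next non-blank, unindented line after an
error is a safe place to resume. The function name and the resynchronization
rule are assumptions for illustration, not what the interpreter does. Skipped
lines are blanked rather than deleted so reported line numbers stay the same.

```python
def find_syntax_errors(source, filename="<string>"):
    """Collect (line number, message) for several syntax errors in one run.

    Sketch only: after each error, everything up to the next unindented
    line is blanked out and the remainder is compiled again.
    """
    lines = source.splitlines()
    errors = []
    start = 0  # index of the first line still being checked
    while start < len(lines):
        # Replace already-handled lines with blanks to preserve numbering.
        candidate = "\n".join([""] * start + lines[start:]) + "\n"
        try:
            compile(candidate, filename, "exec")
            break  # the rest compiles cleanly
        except SyntaxError as exc:
            lineno = exc.lineno or (start + 1)
            errors.append((lineno, exc.msg))
            # Index of the line after the error; max() guarantees progress.
            nxt = max(lineno, start + 1)
            # Skip blank and indented lines to reach the next top-level line.
            while nxt < len(lines) and (
                not lines[nxt].strip() or lines[nxt][0].isspace()
            ):
                nxt += 1
            start = nxt
    return errors
```

The weakness is the one discussed above: the resume point is a guess. If it
lands on, say, an `else:` whose `if` was skipped, the next reported error is a
red herring, and where an unclosed bracket gets reported varies between Python
versions.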

Someone is wanting to treat code from a spelling checker perspective and
wants all possible mistakes thrown at them at once. As I pointed out, in
real life many kinds of context can matter and a really good checker might
even consult a personal list of words it has learned you want ignored, like
people's names or some abbreviations like LOL. It may even read marked-up
text in, say, HTML or XML or similar formats that is marked with the language
it supposedly contains and call up a spell-checker appropriate for each
region.

But if they want a really intelligent program that recovers well enough from
errors to reliably continue, that may not be easy.

They have explained and amended that they understand some of these issues
and are willing to get lots of false positives or red herrings and their
real goal is to have a chance to detect and maybe fix a few things per round
rather than just one. Not a bad wish. Just not a trivial wish to grant and
satisfy.

-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On
Behalf Of Cameron Simpson
Sent: Sunday, October 9, 2022 6:45 PM
To: python-list@python.org
Subject: Re: What to use for finding as many syntax errors as possible.

On 09Oct2022 21:46, Antoon Pardon <antoon.pardon@vub.be> wrote:
>>Is it that onerous to fix one thing and run it again? It was once when
>>you handed in punch cards and waited a day or on very busy machines.
>
>Yes I find it onerous, especially since I have a pipeline with unit
>tests and other tools that all have to redo their work each time a bug
>is corrected.

It is easy to get the syntax right before submitting to such a pipeline.
I usually run a linter on my code for serious commits, and I've got a
`lint1` alias which basically runs the short fast flavour of that, which does a
syntax check and the very fast less thorough lint phase.

I say this just to ease your write/run-tests cycle.

Regarding your main request, had you considered writing your own wrapper
tool? Something which ran something like:

python -We:invalid -m py_compile your_python_file.py

If there's an error, report it, then make a new file commencing with the
next unindented line after the error, with all preceding lines commented
out (to keep the line numbers the same). Then run the check again. Repeat
until the file's empty or there are no errors.

This doesn't sound very complex.

Cheers,
Cameron Simpson <cs@cskk.id.au>
--
https://mail.python.org/mailman/listinfo/python-list
