RE: Extract lines from file, add to new files

https://www.novabbs.com/devel/article-flat.php?id=24816&group=comp.lang.python#24816

From: avi.e.gross@gmail.com
Newsgroups: comp.lang.python
Subject: RE: Extract lines from file, add to new files
Date: Fri, 12 Jan 2024 09:42:18 -0500
Message-ID: <mailman.18.1705070542.15798.python-list@python.org>
References: <5a6d88e-46b1-a3ea-333-d053cbe5654d@appl-ecosys.com>,
<24aeb00e-41fd-4809-ae96-d429645cbc07@mrabarnett.plus.com>,
<4a215b7d-f1af-49d7-1496-96e290255314@appl-ecosys.com>
<65A0E32A.2458.26DB92@RealGrizzlyAdams.vivaldi.net>
<002401da4565$8b477360$a1d65a20$@gmail.com>
Return-Path: <avi.e.gross@gmail.com>

If the data in the input file is exactly as described, consisting of
alternating lines containing a name and an email address, perhaps with
optional blank lines in between, then many solutions are possible with many
tools, including Python programs.

But is such a solution a good one for a given purpose? The two output files
may end up out of sync for all kinds of reasons. For example, one of many
possible "errors" occurs if several lines in a row lack an "@", or if a
person's name contains one. What if someone supplied more than one email
address separated by a comma? That may not be expected, but it could cause
problems.
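To make those failure modes concrete, here is a rough heuristic checker in
Python, offered only as a sketch. It assumes the rule quoted further down (a
line with "@" is an address, any other non-blank line is a name), and the
name `find_anomalies` is hypothetical. It reports where the name/address
alternation breaks and where a line might hold several addresses; a name that
itself contains "@" shows up indirectly, as two address lines in a row.

```python
def find_anomalies(lines):
    """Report (line_number, problem) pairs where name/email alternation breaks."""
    problems = []
    previous_kind = None  # "name" or "email" for the last non-blank line seen
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank or whitespace-only lines are ignored
        kind = "email" if "@" in line else "name"
        if kind == previous_kind:
            problems.append((number, f"two {kind} lines in a row"))
        if kind == "email" and ("," in line or line.count("@") > 1):
            problems.append((number, "possibly more than one address"))
        previous_kind = kind
    return problems


print(find_anomalies(["Alice Smith", "alice@example.com", "", "bob@example.com"]))
```

Here Bob's name line is missing, so the call reports line 4 as a second
address line in a row. A clean run is no proof the data are valid, only that
this particular pattern was not broken.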

Some of the other tools mentioned would not care and would produce garbage.
Grep, for example, could be run twice: once asking for lines with an "@" and
once for lines without. In this case, that would be trivial. Blank lines, or
lines containing only whitespace, might need another pass to be omitted.
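The grep approach translates almost line for line into Python. This sketch
(the function name `split_like_grep` is hypothetical) keeps lines with an "@"
in one list and the rest in another, dropping blank and whitespace-only lines
along the way. It also shows the weakness discussed above: nothing ties
position N in one list to position N in the other.

```python
def split_like_grep(lines):
    """Rough equivalent of `grep '@'` plus `grep -v '@'`, minus blank lines."""
    non_blank = [line.strip() for line in lines if line.strip()]
    emails = [line for line in non_blank if "@" in line]
    names = [line for line in non_blank if "@" not in line]
    return names, emails


# Bob's name line is missing, so the two lists silently fall out of step.
sample = [
    "Alice Smith", "alice@example.com", "",
    "bob@example.com",
    "Carol White", "carol@example.com",
]
names, emails = split_like_grep(sample)
print(len(names), len(emails))  # 2 3
print(list(zip(names, emails)))
# [('Alice Smith', 'alice@example.com'), ('Carol White', 'bob@example.com')]
```

Pairing the lists back up with `zip` quietly attaches Carol's name to Bob's
address and drops Carol's real address, with no error raised anywhere.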

But a real challenge would be to parse the file in a language like Python,
find all VALID stretches in the data, and build a data structure that pairs
each email address with either a valid name or a placeholder such as
"ANONYMOUS". Each pair can be written out as soon as it is judged valid, or
collected in something like a list. Collecting them lets you do further
processing, such as sorting the results or removing duplicates and bad email
addresses. In that scenario, the two files would be written out at the end.
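One way to sketch that parser in Python, under a few assumptions not spelled
out above: a name is any non-blank line without "@", an address line may hold
several comma-separated addresses, and a deliberately loose regular
expression stands in for real address validation. The names
`parse_contacts`, `dedupe_by_email`, and `write_two_files` are hypothetical.

```python
import re

# Deliberately loose: one "@", no spaces or commas, a dot in the domain.
# Real address validation is far more involved than this.
EMAIL_RE = re.compile(r"^[^@\s,]+@[^@\s,]+\.[^@\s,]+$")


def parse_contacts(lines, placeholder="ANONYMOUS"):
    """Pair each valid address with the name line before it, or a placeholder."""
    records = []
    pending_name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank or whitespace-only lines carry no data
        if "@" in line:
            for addr in (part.strip() for part in line.split(",")):
                if EMAIL_RE.match(addr):
                    records.append((pending_name or placeholder, addr))
            pending_name = None  # a name is used for one address line only
        else:
            pending_name = line  # an earlier name with no address is dropped
    return records


def dedupe_by_email(records):
    """Keep the first record for each address, comparing case-insensitively."""
    seen = set()
    unique = []
    for name, email in records:
        if email.lower() not in seen:
            seen.add(email.lower())
            unique.append((name, email))
    return unique


def write_two_files(records, names_path, emails_path):
    """Write both parallel files in one pass, after all checks are done."""
    with open(names_path, "w", encoding="utf-8") as names_file:
        with open(emails_path, "w", encoding="utf-8") as emails_file:
            for name, email in records:
                names_file.write(name + "\n")
                emails_file.write(email + "\n")


sample = [
    "Alice Smith", "alice@example.com", "",
    "bob@example.com",
    "Carol White", "carol@example.com, c.white@example.org",
    "Alice S.", "ALICE@example.com",
]
contacts = sorted(dedupe_by_email(parse_contacts(sample)))
print(contacts)
```

Calling `write_two_files(contacts, "names.txt", "emails.txt")` (file names
hypothetical) then produces the two files originally asked for, but from data
that have already been checked, paired, de-duplicated, and sorted.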

Python can do all of the above, while some of the other tools mentioned are
not really designed for it. Further, many of those tools are not available on
every platform.

Another question is why it makes sense to produce two output files holding
data that are not explicitly linked and that would be hard to edit and keep
synchronized, for example when removing or adding entries. There are many
ways to save the data that might be more robust for many purposes. It looks
like the intended application is a sort of form-letter merge, in which
individual emails containing a personalized greeting will be sent. Unless
that application has already been written, there are many other approaches
that make sense. One obvious one is to save the data in a database as columns
in a table. Another is to write a single file whose entries are easily parsed,
such as:

NAME: name | EMAIL: email

Whatever the exact design, the receiving software could parse each entry as
needed simply by reading one line at a time.
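A minimal Python sketch of that one-line-per-entry format, assuming the exact
"NAME: name | EMAIL: email" layout shown above and that no name contains the
separator " | ". The helper names are hypothetical. Reading goes line by line,
so even a large file is never loaded into memory all at once.

```python
NAME_TAG = "NAME: "
EMAIL_TAG = "EMAIL: "
SEPARATOR = " | "


def format_record(name, email):
    """Render one entry; assumes the name does not contain the separator."""
    return f"{NAME_TAG}{name}{SEPARATOR}{EMAIL_TAG}{email}"


def parse_record(line):
    """Return (name, email) for a well-formed line, or None otherwise."""
    name_part, sep, email_part = line.rstrip("\n").partition(SEPARATOR)
    if not sep or not name_part.startswith(NAME_TAG) or not email_part.startswith(EMAIL_TAG):
        return None
    return name_part[len(NAME_TAG):], email_part[len(EMAIL_TAG):]


def read_records(path):
    """Yield entries one line at a time, skipping lines that do not parse."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = parse_record(line)
            if record is not None:
                yield record


line = format_record("Alice Smith", "alice@example.com")
print(line)                # NAME: Alice Smith | EMAIL: alice@example.com
print(parse_record(line))  # ('Alice Smith', 'alice@example.com')
```

Because each line carries both fields, adding or deleting an entry is a
single-line edit, and the name and address cannot drift apart.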

And, of course, there are endless storage options, such as a CSV file, or
serializing your list of objects to a file so that the next program can load
them and work in memory on whichever ones it wants. The two-file solution may
seem simpler, but it harks back to how some computing was done in the early
days, when a list of objects might be represented by multiple arrays, each
holding one attribute of the objects, and any update required remembering to
touch every array the same way. That can still be a useful technique when
operations done in a vectorized manner are faster than looping over an array
of objects, but it is more often a sign of poor code.
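For comparison, here is a sketch of the list-of-objects approach saved as
CSV with Python's standard `csv` module. The `Contact` dataclass and the
helper names are hypothetical. The `csv` module handles quoting, so a name
such as "Smith, Alice" that contains a comma survives the round trip; `json`
or `pickle` could play the same role for serializing the list.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    email: str


def write_contacts(contacts, stream):
    """Write a header row plus one row per contact to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow(["name", "email"])
    for contact in contacts:
        writer.writerow([contact.name, contact.email])


def read_contacts(stream):
    """Rebuild Contact objects from CSV text produced by write_contacts."""
    return [Contact(row["name"], row["email"]) for row in csv.DictReader(stream)]


contacts = [
    Contact("Smith, Alice", "alice@example.com"),  # the comma gets quoted by csv
    Contact("Bob Jones", "bob@example.com"),
]

# Removing a person is one operation on one list, unlike the parallel-array
# style where the names and the emails would each need the same edit.
contacts = [c for c in contacts if c.email != "bob@example.com"]

buffer = io.StringIO()  # stands in for a real file opened for writing
write_contacts(contacts, buffer)
buffer.seek(0)
print(read_contacts(buffer))
```

With a real file, only the stream changes: open it with
`open(path, "w", newline="", encoding="utf-8")`, since the `csv`
documentation recommends `newline=""` for files used with its readers and
writers.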

-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On
Behalf Of Grizzy Adams via Python-list
Sent: Friday, January 12, 2024 1:59 AM
To: Rich Shepard via Python-list <python-list@python.org>; Rich Shepard
<rshepard@appl-ecosys.com>
Subject: Re: Extract lines from file, add to new files

Thursday, January 11, 2024 at 10:44, Rich Shepard via Python-list wrote:
Re: Extract lines from file, add to (at least in part)

>On Thu, 11 Jan 2024, MRAB via Python-list wrote:

>> From the look of it:
>> 1. If the line is empty, ignore it.
>> 2. If the line contains "@", it's an email address.
>> 3. Otherwise, it's a name.

If that is it all? a simple Grep would do (and save on the blank line)
--
https://mail.python.org/mailman/listinfo/python-list
