novaBBS - comp.lang.python - Re: Extract lines from file, add to new files

Re: Extract lines from file, add to new files

<mailman.39.1706983149.3227.python-list@python.org>

https://www.novabbs.com/devel/article-flat.php?id=25006&group=comp.lang.python#25006

From: lis...@tompassin.net (Thomas Passin)
Newsgroups: comp.lang.python
Subject: Re: Extract lines from file, add to new files
Date: Sat, 3 Feb 2024 12:58:57 -0500

by: Thomas Passin - Sat, 3 Feb 2024 17:58 UTC

In my view this whole thread became murky and complicated because the OP
did not write down the requirements for the program. Requirements are
needed to communicate with other people. An individual may not need to
actually write down the requirements - depending on their complexity -
but they always exist even if only vaguely in a person's mind. The
requirements may include what tools or languages the person wants to use
and why.

If you are asking for help, you need to communicate the requirements to
the people you are asking.

The OP may have thought the original post(s) contained enough of the
requirements, but as we know by now, they didn't.

The person asking for help may not realize they don't know enough to
write down all the requirements; making the effort to do so may reveal
that gap.

Mailing lists like this one have a drawback: it is hard, if not
impossible, for someone not involved in a thread to learn anything
general from it. We can write over and over again to please state
clearly what you want to do and where the sticking points are, but
newcomers post new questions without ever reading these pleas. Then
good-hearted people who want to be helpful end up spending a lot of time
trying to guess what is actually being asked for, and may never find out
with enough clarity. Others take a guess and then spend time working up
a solution that may or may not be on target.

So please! Before posting a request for help, write down the
requirements as best you can figure them out, and then make sure that
they are expressed so that the readers can understand them.

On 2/3/2024 11:33 AM, avi.e.gross@gmail.com wrote:
> Thomas,
>
> I have been thinking about the concept of being stingy with information as
> this is a fairly common occurrence when people ask for help. They often ask
> for what they think they want while people like us keep asking why they want
> that and perhaps offer guidance on how to get closer to what they NEED or a
> better way.
>
> In retrospect, Rich did give all the info he thought he needed. It boiled
> down to saying that he wants to distribute data into two files in such a way
> that finding an item in file A then lets him find the corresponding item in
> file B. He was not worried about how to make the files or what to do with
> the info afterward. He had those covered and was missing what he considered
> a central piece. And, it seems he programs in multiple languages and
> environments as needed and is not exactly a newbie. He just wanted a way to
> implement his overall design.
>
> We threw many solutions and ideas at him but some of us (like me) also got
> frustrated as some ideas were not received due to one objection or another
> that had not been mentioned earlier when it was not seen as important.
>
> I particularly notice a disconnect some of us had. Was this supposed to be a
> search that read only as much as needed to find something and stopped
> reading, or a sort of filter that returned zero or more matches and went to
> the end, or perhaps something that read entire files and swallowed them into
> data structures in memory and then searched and found corresponding entries,
> or maybe something else?
>
> All the above approaches could work but some designs not so much. For
> example, some files are too large. We, as programmers, often consciously or
> unconsciously look at many factors to try to zoom in on what approaches we
> might use. To be given minimal amounts of info can be frustrating. We worry
> about making a silly design. But the OP may want something minimal and not
> worry as long as it is fairly easy to program and works.
>
> We could have suggested something very simple like:
>
> Open both files A and B
> In a loop get a line from each. If the line from A is a match, do something
> with the current line from B.
> If you are getting only one, exit the loop.
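
For what it's worth, the lockstep loop described above could be sketched
roughly like this. It is only a sketch, assuming line n of file A
corresponds to line n of file B and that a "match" means the stripped
line equals the search string. The name find_in_parallel and its
parameters are hypothetical, made up for illustration.

```python
from typing import Optional


def find_in_parallel(path_a: str, path_b: str, wanted: str) -> Optional[str]:
    """Return the line of B paired with the first line of A equal to `wanted`."""
    with open(path_a, encoding="utf-8") as file_a, \
            open(path_b, encoding="utf-8") as file_b:
        # zip() reads one line from each file per iteration and stops
        # as soon as either file runs out.
        for line_a, line_b in zip(file_a, file_b):
            if line_a.strip() == wanted:
                # Only one match is wanted, so stop reading here.
                return line_b.strip()
    return None
```

Because it returns on the first match, this reads only as much of either
file as it needs to.
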
>
> Or, if willing, we could have suggested any other file format, such as a
> CSV, in which the algorithm is similar but different as in:
>
> Open file A
> Read a line in a loop
> Split it in parts
> If the party of the first part matches something, use the party of the
> second part
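
The single-file variant might look something like the sketch below. It
assumes a two-column CSV with the key in the first field and the value
in the second; the function name matching_values is hypothetical. Unlike
a stop-at-first-match search, this one behaves as a filter: it reads to
the end and yields zero or more matches.

```python
import csv
from typing import Iterator


def matching_values(path: str, wanted: str) -> Iterator[str]:
    """Yield the second field of every row whose first field equals `wanted`."""
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or incomplete rows
            if row[0] == wanted:
                yield row[1]
```
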
>
> Or, of course, suggest they read the entire file, into a list of lines or a
> data.frame and use some tools that search all of it and produce results.
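
A rough sketch of the read-it-all-first approach, using a plain dict
from the standard library rather than a data.frame (that substitution is
mine, not something proposed in the thread). It assumes the same
two-column CSV layout and a file small enough to fit in memory; the name
load_lookup is hypothetical.

```python
import csv
from typing import Dict


def load_lookup(path: str) -> Dict[str, str]:
    """Read the whole file once and map first field -> second field."""
    with open(path, newline="", encoding="utf-8") as handle:
        # If a key appears more than once, the last row wins.
        return {row[0]: row[1] for row in csv.reader(handle) if len(row) >= 2}
```

After the one-time load, each search is a dictionary lookup such as
lookup.get("name1"), which returns None when there is no such key.
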
>
> I find I personally now often lean toward the latter approach but ages ago
> when memory and CPU were considerations and maybe garbage collection was not
> automatic, ...
>
>
> -----Original Message-----
> From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On
> Behalf Of Thomas Passin via Python-list
> Sent: Wednesday, January 31, 2024 7:25 AM
> To: python-list@python.org
> Subject: Re: Extract lines from file, add to new files
>
> On 1/30/2024 11:25 PM, avi.e.gross@gmail.com wrote:
>> Thomas, on some points we may see it differently.
>
> I'm mostly going by what the OP originally asked for back on Jan 11.
> He's been too stingy with information since then to be worth spending
> much time on, IMHO.
>
>> Some formats can be done simply but are maybe better done in somewhat
>> standard ways.
>>
>> Some of what the OP has is already tables in a database and that can
>> trivially be exported into a CSV file or other formats like your TSV file
>> and more. They can also import from there. As I mentioned, many spreadsheets
>> and all kinds of statistical programs tend to support some formats making it
>> quite flexible.
>>
>> Python has all kinds of functionality, such as in the pandas module, to read
>> in a CSV or write it out. And once you have the data structure in memory, all
>> kinds of queries and changes can be made fairly straightforwardly. As one
>> example, Rich has mentioned wanting finer control in selecting who gets some
>> version of the email based on concepts like market segmentation. He already
>> may have info like the STATE (as in Arizona) in his database. He might at
>> some point enlarge his schema so each entry is placed in one or more
>> categories and thus his CSV, once imported, can do the usual tasks of
>> selecting various rows and columns or doing joins or whatever.
>>
>> Mind you, another architecture could place quite a bit of work completely on
>> the back end and he could send SQL queries to the database from python and
>> get back his results into python which would then make the email messages
>> and pass them on to other functionality to deliver. This would remove any
>> need for files and just rely on the DB.
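
To make the database-only route concrete, here is a minimal sketch. The
thread never says which database or schema is involved, so everything
specific here is an assumption: SQLite via the standard sqlite3 module,
a hypothetical contacts table with name, email and state columns, and
the hypothetical function name recipients_in_state.

```python
import sqlite3
from typing import List, Tuple


def recipients_in_state(conn: sqlite3.Connection, state: str) -> List[Tuple[str, str]]:
    """Return (name, email) pairs for one state, sorted by name."""
    # Table and column names are assumed, not taken from the thread.
    query = "SELECT name, email FROM contacts WHERE state = ? ORDER BY name"
    # The ? placeholder makes sqlite3 bind the value instead of
    # pasting it into the SQL text.
    return conn.execute(query, (state,)).fetchall()
```

The selection happens in the database, and Python only sees the rows it
needs to build messages from.
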
>>
>> There are, as usual, too many choices and not necessarily one best answer. Of
>> course if this was a major product that would be heavily used, sure, you
>> could tweak and optimize. As it is, Rich is getting a chance to improve his
>> python skills no matter which way he goes.
>>
>>
>>
>> -----Original Message-----
>> From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On
>> Behalf Of Thomas Passin via Python-list
>> Sent: Tuesday, January 30, 2024 10:37 PM
>> To: python-list@python.org
>> Subject: Re: Extract lines from file, add to new files
>>
>> On 1/30/2024 12:21 PM, Rich Shepard via Python-list wrote:
>>> On Tue, 30 Jan 2024, Thomas Passin via Python-list wrote:
>>>
>>>> Fine, my toy example will still be applicable. But, you know, you haven't
>>>> told us enough to give you help. Do you want to replace text from values
>>>> in a file? That's been covered. Do you want to send the messages using
>>>> those libraries? You haven't said what you don't know how to do. Something
>>>> else? What is it that you want to do that you don't know how?
>>>
>>> Thomas,
>>>
>>> For 30 years I've used a bash script using mailx to send messages to a list
>>> of recipients. They have no salutation to personalize each one. Since I want
>>> to add that personalized salutation I decided to write a python script to
>>> replace the bash script.
>>>
>>> I have collected 11 docs explaining the smtplib and email modules and
>>> providing example scripts to apply them to send multiple individual messages
>>> with salutations and attachments.
>>
>> If I had a script that's been working for 30 years, I'd probably just
>> use Python to do the personalizing and let the rest of the bash script
>> do the rest, like it always has. The Python program would pipe or send
>> the personalized messages to the rest of the bash program. Something in
>> that ballpark, anyway.
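
One way that division of labor might look, as a sketch only: Python does
nothing except put a salutation in front of the message text, and works
on file-like streams so the shell can feed it standard input and pipe
its standard output onward. The function name personalize and the
"Hi, <name>" wording are assumptions for illustration.

```python
from typing import TextIO


def personalize(name: str, source: TextIO, dest: TextIO) -> None:
    """Copy the message from `source` to `dest` with a salutation on top."""
    dest.write(f"Hi, {name}\n\n")
    for line in source:
        # The body is passed through unchanged, one line at a time.
        dest.write(line)
```

A small wrapper script could call personalize(sys.argv[1], sys.stdin,
sys.stdout), so the existing bash loop runs it once per recipient and
pipes the result into the mailx command it already uses.
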
>>
>>> Today I'm going to be reading these. They each recommend using .csv input
>>> files for names and addresses. My first search is learning whether I can
>>> write a single .csv file such as:
>>> "name1","address1"
>>> "name2","address2"
>>> which I believe will work; and by inserting at the top of the message block
>>> Hi, {yourname}
>>> the name in the .csv file will replace the bracketed place holder
>>
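
The placeholder idea can be sketched with str.format, which fills
{yourname} from a keyword argument. This is an illustration, not the
method from any of the documents mentioned: the TEMPLATE text and the
function name personalized_messages are hypothetical, and the CSV is
passed in as a string to keep the example self-contained.

```python
import csv
import io
from typing import Iterator, Tuple

# Hypothetical message text; only the {yourname} placeholder comes from the thread.
TEMPLATE = "Hi, {yourname}\n\nThe rest of the message goes here.\n"


def personalized_messages(csv_text: str, template: str = TEMPLATE) -> Iterator[Tuple[str, str]]:
    """Yield (address, message) for each "name","address" row in `csv_text`."""
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        name, address = row[0], row[1]
        yield address, template.format(yourname=name)
```

One caveat: str.format treats every pair of braces as a placeholder, so
a message body containing literal { or } would need them doubled.
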
>> If the file contents are going to be people's names and email addresses,
>> I would just tab separate them and split each line on the tab. Names
>> aren't going to include tabs so that would be safe. Email addresses
>> might theoretically include a tab inside a quoted name but that would be
>> extremely obscure and unlikely. No need for CSV, it would just add
>> complexity.
>>
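>> data = f.readlines()  # f is an already-open text file
>> for line in data:
>>     # Drop the trailing newline, then split on the tab; blank lines give empty fields.
>>     name, addr = line.rstrip('\n').split('\t') if line.strip() else ('', '')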
>>
>>> Still much to learn and the batch of downloaded PDF files should educate
>>> me.
>>>
>>> Regards,
>>>
>>> Rich
>>
>
