
Re: How to replace a cell value with each of its contour cells and yield the corresponding datasets seperately in a list according to a Pandas-way?

From: lis...@tompassin.net (Thomas Passin)
Newsgroups: comp.lang.python
Subject: Re: How to replace a cell value with each of its contour cells and
yield the corresponding datasets seperately in a list according to a
Pandas-way?
Date: Sun, 21 Jan 2024 22:57:02 -0500

by: Thomas Passin - Mon, 22 Jan 2024 03:57 UTC

On 1/21/2024 1:25 PM, marc nicole wrote:
> It is part of a larger project aiming at processing data according to a
> given algorithm
> Do you have any comments or any enhancing recommendations on the code?

I'm afraid I don't know enough about either pandas or NumPy to comment; I have
only used them at a very basic level. Someone else will probably pitch in.

> Thanks.
>
> On Sun, 21 Jan 2024 at 18:28, Thomas Passin via Python-list
> <python-list@python.org> wrote:
>
> On 1/21/2024 11:54 AM, marc nicole wrote:
> > Thanks for the reply,
> >
> > I think using a Pandas (or a Numpy) approach would optimize the
> > execution of the program.
> >
> > Target cells could be up to 10% the size of the dataset, a good
> > example to start with would have from 10 to 100 values.
>
> Thanks for the reformatted code. It's much easier to read and think
> about.
>
> For say 100 points, it doesn't seem that "optimization" would be
> much of an issue. On my laptop machine and Python 3.12, your example
> takes around 5 seconds to run and print(). OTOH if you think you will
> go to much larger datasets, certainly execution time could become a
> factor.
>
> I would think that NumPy arrays and/or matrices would have good
> potential.
>
> Is this some kind of a cellular automaton, or an image filtering
> process?
>
> > Let me know your thoughts, here's a reproducible example which I
> > formatted:
> >
> >
> >
> > from numpy import random
> > import pandas as pd
> > import numpy as np
> > import operator
> > import math
> > from collections import deque
> > from queue import *
> > from queue import Queue
> > from itertools import product
> >
> >
> > def select_target_values(dataframe, number_of_target_values):
> >     target_cells = []
> >     for _ in range(number_of_target_values):
> >         row_x = random.randint(0, len(dataframe.columns) - 1)
> >         col_y = random.randint(0, len(dataframe) - 1)
> >         target_cells.append((row_x, col_y))
> >     return target_cells
> >
> >
> > def select_contours(target_cells):
> >     contour_coordinates = [(0, 1), (1, 0), (0, -1), (-1, 0)]
> >     contour_cells = []
> >     for target_cell in target_cells:
> >         # random contour count for each cell
> >         contour_cells_count = random.randint(1, 4)
> >         try:
> >             contour_cells.append(
> >                 [
> >                     tuple(
> >                         map(
> >                             lambda i, j: i + j,
> >                             (target_cell[0], target_cell[1]),
> >                             contour_coordinates[iteration_],
> >                         )
> >                     )
> >                     for iteration_ in range(contour_cells_count)
> >                 ]
> >             )
> >         except IndexError:
> >             continue
> >     return contour_cells
> >
> >
> > def create_zipf_distribution():
> >     zipf_dist = random.zipf(2, size=(50, 5)).reshape((50, 5))
> >
> >     zipf_distribution_dataset = pd.DataFrame(zipf_dist).round(3)
> >
> >     return zipf_distribution_dataset
> >
> >
> > def apply_contours(target_cells, contour_cells):
> >     target_cells_with_contour = []
> >     # create one single list of cells
> >     for idx, target_cell in enumerate(target_cells):
> >         target_cell_with_contour = [target_cell]
> >         target_cell_with_contour.extend(contour_cells[idx])
> >         target_cells_with_contour.append(target_cell_with_contour)
> >     return target_cells_with_contour
> >
> >
> > def create_possible_datasets(dataframe, target_cells_with_contour):
> >     all_datasets_final = []
> >     dataframe_original = dataframe.copy()
> >
> >     list_tuples_idx_cells_all_datasets = list(
> >         filter(
> >             lambda x: x,
> >             [list(tuples) for tuples in
> >              list(product(*target_cells_with_contour))],
> >         )
> >     )
> >     target_original_cells_coordinates = list(
> >         map(
> >             lambda x: x[0],
> >             [
> >                 target_and_contour_cell
> >                 for target_and_contour_cell in
> >                 target_cells_with_contour
> >             ],
> >         )
> >     )
> >     for dataset_index_values in list_tuples_idx_cells_all_datasets:
> >         all_datasets = []
> >         for idx_cell in range(len(dataset_index_values)):
> >             dataframe_cpy = dataframe.copy()
> >             dataframe_cpy.iat[
> >                 target_original_cells_coordinates[idx_cell][1],
> >                 target_original_cells_coordinates[idx_cell][0],
> >             ] = dataframe_original.iloc[
> >                 dataset_index_values[idx_cell][1],
> >                 dataset_index_values[idx_cell][0]
> >             ]
> >             all_datasets.append(dataframe_cpy)
> >         all_datasets_final.append(all_datasets)
> >     return all_datasets_final
> >
> >
> > def main():
> >     zipf_dataset = create_zipf_distribution()
> >
> >     target_cells = select_target_values(zipf_dataset, 5)
> >     print(target_cells)
> >     contour_cells = select_contours(target_cells)
> >     print(contour_cells)
> >     target_cells_with_contour = apply_contours(target_cells,
> >                                                contour_cells)
> >     datasets = create_possible_datasets(zipf_dataset,
> >                                         target_cells_with_contour)
> >     print(datasets)
> >
> >
> > main()
> >
> > On Sun, 21 Jan 2024 at 16:33, Thomas Passin via Python-list
> > <python-list@python.org> wrote:
> >
> > On 1/21/2024 7:37 AM, marc nicole via Python-list wrote:
> > > Hello,
> > >
> > > I have an initial dataframe with a random list of target cells
> > > (each cell being identified with a couple (x,y)).
> > > I want to yield four different dataframes each containing the
> > > value of one of the contour (surrounding) cells of each specified
> > > target cell.
> > >
> > > The surrounding cells to consider for a specific target cell are:
> > > (x-1,y), (x,y-1), (x+1,y), (x,y+1). Specifically, I randomly
> > > choose 1 to 4 cells from these and consider them as replacements
> > > for the target cell.
> > >
> > > I want to do that through a pandas-specific approach without
> > > having to define the contour cells separately and then apply the
> > > changes on the dataframe
> >
> > 1. Why do you want a Pandas-specific approach? Many people would
> > rather keep code independent of special libraries if possible;
> >
> > 2. How big can these collections of target cells be, roughly
> > speaking? The size could make a big difference in picking a design;
> >
> > 3. You really should work on formatting code for this list. Your
> > code below is very complex and would take a lot of work to reformat
> > to the point where it is readable, especially with the nearly
> > impenetrable arguments in some places. Probably all that is needed
> > is to replace all tabs by (say) three spaces, and to make sure you
> > intentionally break lines well before they might get word-wrapped.
> > Here is one example I have reformatted (I hope I got this right):
> >
> > list_tuples_idx_cells_all_datasets = list(filter(
> >    lambda x: utils_tuple_list_not_contain_nan(x),
> >    [list(tuples) for tuples in list(
> >       itertools.product(*target_cells_with_contour))
> >    ]))
> >
> > 4. As an aside, it doesn't look like you need to convert all those
> > sequences and iterators to lists all over the place;
> >
> >
> > > (but rather using an all in one approach):
> > > for now I have written this example which I think is not Pandas
> > > specific:
> > [snip]
> >
> > --
> > https://mail.python.org/mailman/listinfo/python-list
>
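
On point 4 of the quoted reply (converting sequences and iterators to
lists): a minimal sketch, assuming input shaped like
target_cells_with_contour in the quoted example, where each inner list is
a target cell followed by its contour cells. The sample data and the names
cells_with_contour, targets, count and first are made up for illustration.
itertools.product already returns a lazy iterator, so the combinations can
be consumed one at a time without building the intermediate lists.

```python
from itertools import product

# Hypothetical sample input: each group is [target_cell, contour_cell, ...].
cells_with_contour = [
    [(1, 1), (1, 2), (2, 1)],
    [(0, 0), (0, 1)],
]

# The first entry of every group is the original target cell.
# A plain comprehension replaces list(map(lambda x: x[0], [...])).
targets = [group[0] for group in cells_with_contour]

# product() is lazy: nothing is materialised until it is iterated.
count = 0
first = None
for combo in product(*cells_with_contour):
    if first is None:
        first = combo  # one cell picked from each group
    count += 1

print(targets)
print(first, count)
```

Whether this matters in practice depends on the number of combinations; for
a handful of target cells the saving is likely to be small.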

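On the suggestion in the thread that NumPy arrays could work well here: a
rough sketch under stated assumptions, not the original poster's algorithm.
It assumes cells are (x, y) pairs with x the column and y the row, as in
the quoted code, and it simplifies the task to one output array per
direction instead of a random choice of 1 to 4 contour cells per target.
Neighbours that fall outside the grid are skipped. The names OFFSETS and
neighbour_variants are hypothetical.

```python
import numpy as np

# Offsets in (dx, dy) order, same as contour_coordinates in the quoted code.
OFFSETS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def neighbour_variants(data, targets):
    """Return one copy of `data` per offset.

    In each copy, every target cell holds the value of its neighbour in
    that direction; targets whose neighbour is off the grid are unchanged.
    """
    data = np.asarray(data)
    n_rows, n_cols = data.shape
    xs = np.array([x for x, _ in targets], dtype=int)  # column indices
    ys = np.array([y for _, y in targets], dtype=int)  # row indices

    variants = []
    for dx, dy in OFFSETS:
        nx, ny = xs + dx, ys + dy
        # Keep only neighbours that lie inside the grid.
        ok = (nx >= 0) & (nx < n_cols) & (ny >= 0) & (ny < n_rows)
        out = data.copy()
        # Values are read from the untouched `data`, written into the copy.
        out[ys[ok], xs[ok]] = data[ny[ok], nx[ok]]
        variants.append(out)
    return variants


grid = np.arange(12).reshape(3, 4)
variants = neighbour_variants(grid, [(1, 1), (0, 0)])
for variant in variants:
    print(variant)
```

With a DataFrame df, the same function could be given df.to_numpy(), and
each result wrapped back with pd.DataFrame(variant, columns=df.columns,
index=df.index).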