novaBBS - comp.lang.python - Pandas: How does df.apply(lambda work to create a result

Pandas: How does df.apply(lambda work to create a result

<s8lmum$7qg$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=13358&group=comp.lang.python#13358

From: vee...@foo.com (Veek M)
Newsgroups: comp.lang.python
Subject: Pandas: How does df.apply(lambda work to create a result
Date: Wed, 26 May 2021 14:45:42 -0000 (UTC)

by: Veek M - Wed, 26 May 2021 14:45 UTC

t = pd.DataFrame([[4,9],]*3, columns=['a', 'b'])
   a  b
0  4  9
1  4  9
2  4  9

t.apply(lambda x: [x]) gives
a [[1, 2, 2]]
b [[1, 2, 2]]
How?? When you 't' within console the entire data frame is dumped but how are
the individual elements passed into .apply()? I can't do lambda x,y: [x,y]
because only 1 arg is passed so how does [4] generate [[ ]]

Also - this:
t.apply(lambda x: [x], axis=1)
0 [[139835521287488, 139835521287488]]
1 [[139835521287488, 139835521287488]]
2 [[139835521287488, 139835521287488]]
very weird - what just happened??

In addition, how do I filter data, e.g. t[x] = t[x].apply(lambda x: x*72.72)? I'd
like to remove numeric -1 contained in the column output of t[x]. 'filter' only
works with labels of indices, and I can't do t[ t[x] != -1 ] because that will then
generate all the rows and I have no idea how that will translate to
within a .apply(lambda x... (hence my Q on what's going on internally)

Could someone clarify please.

(could someone also tell me briefly the best way to use NNTP and filter
out the SPAM - 'pan' and 'tin' don't work anymore afaik
[eternal-september] and I'm using slrn currently - the SLang regex is
weird within the kill file - couldn't get it to work - wound up killing
everything when I did
Subject: [A-Z][A-Z][A-Z]+
)

Re: Pandas: How does df.apply(lambda work to create a result

<mailman.388.1622163356.3087.python-list@python.org>


https://www.novabbs.com/devel/article-flat.php?id=13394&group=comp.lang.python#13394

From: cs...@cskk.id.au (Cameron Simpson)
Newsgroups: comp.lang.python
Subject: Re: Pandas: How does df.apply(lambda work to create a result
Date: Fri, 28 May 2021 10:45:04 +1000

by: Cameron Simpson - Fri, 28 May 2021 00:45 UTC

Disclaimer: I haven't actually used pandas.

On 26May2021 14:45, Veek M <veekm@foo.com> wrote:
>t = pd.DataFrame([[4,9],]*3, columns=['a', 'b'])
>   a  b
>0  4  9
>1  4  9
>2  4  9

I presume you've printed "t" here. So the above table is str(t). Or
possibly repr(t) if you were at the interactive prompt. It is a human
readable printout of "t".

>t.apply(lambda x: [x]) gives
>a [[1, 2, 2]]
>b [[1, 2, 2]]
>How?? When you 't' within console the entire data frame is dumped but how are
>the individual elements passed into .apply()?

The doco for .apply seems to be here:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

When you go "t.apply(....)", the class implementing "t" has its .apply()
method called - this is what does the work. So "t" is a DataFrame, so
you're calling DataFrame.apply as documented at the above link.

From the output, which has one entry per column label ("a" and "b"), I
expect that it takes each column in the DataFrame and passes it to the
lambda function as a Series, and produces a single value per column
from the result. The docs suggest you can do more than that, also.

>I can't do lambda x,y: [x,y]
>because only 1 arg is passed so how does [4] generate [[ ]]

Because your lambda:

lambda x: [x]

is passed the whole column, which is a list-like Series. You're returning
a single element list containing that Series.

If you want the rows instead, pass axis=1 to .apply. Then, if you know
the rows have exactly the 2 values "a" and "b", you could do this:

lambda row: [row["a"]*2, row["b"]*3]

to get, for each row, the "a" value multiplied by 2 and the "b" value
multiplied by 3.

You might do better to write your original lambda like this:

lambda col: [col]

just so that it is clear that you're getting the whole column rather than
some single element from it.
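
To see what .apply actually hands to the function, here is a minimal sketch, assuming a reasonably recent pandas. The helper name "record" and the list "seen" are made up for illustration; they are not part of pandas.

```python
import pandas as pd

t = pd.DataFrame([[4, 9]] * 3, columns=["a", "b"])

seen = []  # records what .apply passes to the function on each call


def record(x):
    # x is a whole column (a pandas Series), not a single cell
    seen.append((type(x).__name__, x.name, x.tolist()))
    return x.sum()


# Default axis=0: the function is called once per column
result = t.apply(record)

print(seen)
print(result)
```

Each entry in "seen" should show a Series named after a column label, which is why a one-argument lambda is enough: the single argument is the entire column.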

>Also - this:
> t.apply(lambda x: [x], axis=1)
>0 [[139835521287488, 139835521287488]]
>1 [[139835521287488, 139835521287488]]
>2 [[139835521287488, 139835521287488]]
>very weird - what just happened??

See the docs above for the effect the axis parameter has on _how_ .apply
does its work.
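
As a rough illustration of the axis parameter (a sketch only, using the same toy frame as in the question; the variable names "totals" and "expanded" are mine):

```python
import pandas as pd

t = pd.DataFrame([[4, 9]] * 3, columns=["a", "b"])

# axis=1: the function receives one row at a time,
# as a Series indexed by the column labels "a" and "b"
totals = t.apply(lambda row: row["a"] * 2 + row["b"] * 3, axis=1)

# result_type="expand" turns a list-like return value into columns
expanded = t.apply(
    lambda row: [row["a"] * 2, row["b"] * 3],
    axis=1,
    result_type="expand",
)

print(totals)
print(expanded)
```

With axis=1 the labels inside the lambda are column names, so row["a"] and row["b"] pick out the individual cells of that row.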

>In addition, how do I filter data, e.g. t[x] = t[x].apply(lambda x: x*72.72)? I'd
>like to remove numeric -1 contained in the column output of t[x]. 'filter' only
>works with labels of indices, and I can't do t[ t[x] != -1 ] because that will then
>generate all the rows and I have no idea how that will translate to
>within a .apply(lambda x... (hence my Q on what's going on internally)

It looks like the .filter method accepts an items parameter indicating
which axis labels to keep. Use axis=0 to filter on the rows instead of
the columns. Maybe something shaped like this?

    t.filter(axis=0, items=[
        label for label in t.index if t.loc[label, "a"] != -1
    ]).apply(.....)

That looks pretty cumbersome. It uses "t.index" to get the row labels
of "t" and "t.loc[label, "a"]" to get the cell value to test against -1,
where "a" stands in for whichever column you mean.

I expect there's a cleaner way to do this.
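
One candidate for that cleaner way is a boolean mask, which is the same t[ t[x] != -1 ] shape mentioned in the question. This is a sketch under assumptions: the sample data with -1 values is invented for illustration, and "a" stands in for the column of interest.

```python
import pandas as pd

# Invented sample data containing some -1 entries
t = pd.DataFrame({"a": [4, -1, 4], "b": [9, 9, -1]})

# Comparing a column to a value gives a boolean Series, one entry per row
mask = t["a"] != -1

# Indexing with the mask keeps only the rows where it is True
kept = t[mask]

# Arithmetic on a column is applied element by element; no .apply needed
scaled = kept["a"] * 72.72

print(kept)
print(scaled)
```

The mask keeps only the rows where the condition holds, so the -1 rows are gone before any further calculation runs.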

>(could someone also tell me briefly the best way to use NNTP and filter
>out the SPAM - 'pan' and 'tin' don't work anymore afaik
>[eternal-september] and I'm using slrn currently - the SLang regex is
>weird within the kill file - couldn't get it to work - wound up killing
>everything when I did
>Subject: [A-Z][A-Z][A-Z]+
>)

I confess I subscribe to the python-list mailing list, not the
newsgroup. It has much much less spam, and the two are gatewayed so you
can participate either way. For example, you've posted to the newsgroup
and I'm seeing your post in the mailing list. Likewise my reply will be
going to the mailing list and copied to the newsgroup.

Come on over to the mailing list. It is rumoured to be much quieter.

Cheers,
Cameron Simpson <cs@cskk.id.au>
