Re: C API PyObject_CallFunctionObjArgs returns incorrect result

From: pyt...@mrabarnett.plus.com (MRAB)
Newsgroups: comp.lang.python
Subject: Re: C API PyObject_CallFunctionObjArgs returns incorrect result
Date: Mon, 7 Mar 2022 01:42:05 +0000
 by: MRAB - Mon, 7 Mar 2022 01:42 UTC

On 2022-03-07 00:32, Jen Kris via Python-list wrote:
> I am using the C API in Python 3.8 with the nltk library, and I have a problem with the return from a library call implemented with PyObject_CallFunctionObjArgs.
>
> This is the relevant Python code:
>
> import nltk
> from nltk.corpus import gutenberg
> fileids = gutenberg.fileids()
> sentences = gutenberg.sents(fileids[0])
> sentence = sentences[0]
> sentence = " ".join(sentence)
> pt = nltk.word_tokenize(sentence)
>
> I run this at the Python command prompt to show how it works:
> >>> sentence = " ".join(sentence)
> >>> pt = nltk.word_tokenize(sentence)
> >>> print(pt)
> ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']
> >>> type(pt)
> <class 'list'>
>
> This is the relevant part of the C API code:
>
> PyObject* str_sentence = PyObject_Str(pSentence);
> // nltk.word_tokenize(sentence)
> PyObject* pNltk_WTok = PyObject_GetAttrString(pModule_mstr, "word_tokenize");
> PyObject* pWTok = PyObject_CallFunctionObjArgs(pNltk_WTok, str_sentence, 0);
>
> (where pModule_mstr is the nltk library).
>
> That should produce a list with a length of 7 that looks like it does on the command line version shown above:
>
> ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']
>
> But instead the C API produces a list with a length of 24, and the REPR looks like this:
>
> '[\'[\', "\'", \'[\', "\'", \',\', "\'Emma", "\'", \',\', "\'by", "\'", \',\', "\'Jane", "\'", \',\', "\'Austen", "\'", \',\', "\'1816", "\'", \',\', "\'", \']\', "\'", \']\']'
>
> I also tried this with PyObject_CallMethodObjArgs and PyObject_Call without success.
>
> Thanks for any help on this.
>
What is pSentence? Is it what you think it is?
To me it looks like it's either the list:

['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']

or that list as a string:

"['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']"

and that's what you're tokenising.
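
A quick way to check that guess without touching the C code or nltk: PyObject_Str(x) is the C-level equivalent of str(x), so calling it on a list gives you the list's printed form, not the words joined by spaces. The sketch below is only an illustration. The names words, as_str, joined and reported are mine, and reported is the 24 tokens copied by hand from the repr quoted above.

```python
# The 7-word list from the interactive session in the question.
words = ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']

# str() on a list (what PyObject_Str does at the C level) gives its printed form.
as_str = str(words)

# The Python version of the code passes the words joined by spaces instead.
joined = " ".join(words)

print(repr(as_str))   # "['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']"
print(repr(joined))   # '[ Emma by Jane Austen 1816 ]'

# The 24 tokens reported from the C API call, copied from the quoted repr.
reported = ['[', "'", '[', "'", ',', "'Emma", "'", ',', "'by", "'", ',',
            "'Jane", "'", ',', "'Austen", "'", ',', "'1816", "'", ',',
            "'", ']', "'", ']']

print(len(reported))  # 24

# Gluing the tokens back together reproduces str(words) minus the spaces,
# which is what you would expect if str(words) was the text being tokenised.
print("".join(reported) == as_str.replace(" ", ""))  # True
```

If that is what is happening, the C side probably needs the joined string (the equivalent of " ".join(sentence), for which PyUnicode_Join exists) rather than PyObject_Str of the list. That is a guess until we know what pSentence actually holds.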
