Urllib.request vs. Requests.get

From: juliusha...@gmail.com (Julius Hamilton)
Newsgroups: comp.lang.python
Subject: Urllib.request vs. Requests.get
Date: Tue, 7 Dec 2021 12:35:06 +0100
 by: Julius Hamilton - Tue, 7 Dec 2021 11:35 UTC

Hey,

I am currently working on a simple program which scrapes text from a webpage
given its URL, then segments it (with spaCy).

I’m trying to refine my program to use just the right tools for the job,
for each of the steps.

requests.get works great, but I’ve seen people use urllib.request.urlopen()
in some examples. It appealed to me because it seemed lower level than
requests.get, so it makes the program feel leaner, purer, and more
direct.

However, requests.get works fine on this url:

https://juno.sh/direct-connection-to-jupyter-server/

But urllib returns a “403 Forbidden”.

Could anyone please comment on what the fundamental differences are between
urllib and requests, why this would happen, and whether urllib has any option
to prevent this and get the page source?
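
For reference, here is a minimal sketch of the urllib side of this comparison.
It assumes, without confirmation for this particular site, that the 403 comes
from the server rejecting urllib's default User-Agent ("Python-urllib/x.y").
The USER_AGENT value and the helper names are hypothetical, chosen only for
illustration; only the URL comes from the post.

```python
import urllib.request

URL = "https://juno.sh/direct-connection-to-jupyter-server/"

# Hypothetical value: any descriptive or browser-like string could go here.
USER_AGENT = "Mozilla/5.0 (compatible; example-scraper/0.1)"


def default_urllib_user_agent() -> str:
    """Return the User-Agent that urllib's default opener sends."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs.
    return dict(opener.addheaders)["User-agent"]


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Build a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch the page source as text.

    urlopen() raises urllib.error.HTTPError for 4xx/5xx responses,
    which is how a "403 Forbidden" shows up on the urllib side.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling fetch(URL) sends the request with the custom header. Whether that
clears the 403 for this site depends on what the server actually checks, so
treat it as something to try rather than a confirmed fix.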

Thanks,
Julius
