How would one scrap property listing website like this?

<d4de0692-30ac-40b2-8092-db839bd3998bn@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=19577&group=comp.lang.python#19577

X-Received: by 2002:a37:348:0:b0:6cf:f30:9f0b with SMTP id 69-20020a370348000000b006cf0f309f0bmr2822570qkd.735.1663864607693;
Thu, 22 Sep 2022 09:36:47 -0700 (PDT)
X-Received: by 2002:a05:6870:6125:b0:126:c619:2b68 with SMTP id
s37-20020a056870612500b00126c6192b68mr2469551oae.284.1663864607409; Thu, 22
Sep 2022 09:36:47 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.python
Date: Thu, 22 Sep 2022 09:36:47 -0700 (PDT)
Injection-Info: google-groups.googlegroups.com; posting-host=102.89.32.121; posting-account=vE4rmAoAAADc1B6gQVwMLSDaz1LAeKj8
NNTP-Posting-Host: 102.89.32.121
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d4de0692-30ac-40b2-8092-db839bd3998bn@googlegroups.com>
Subject: How would one scrap property listing website like this?
From: tripdarl...@gmail.com (tripd...@gmail.com)
Injection-Date: Thu, 22 Sep 2022 16:36:47 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 1153
 by: tripd...@gmail.com - Thu, 22 Sep 2022 16:36 UTC

https://nigeriapropertycentre.com/
Has anyone scraped something like this before?
Maybe I should try Power BI first to see if it can do this?

Re: How would one scrap property listing website like this?

<tglelf$2n04r$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=19583&group=comp.lang.python#19583

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: use...@gkbrk.com (Leo)
Newsgroups: comp.lang.python
Subject: Re: How would one scrap property listing website like this?
Date: Fri, 23 Sep 2022 23:14:56 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <tglelf$2n04r$1@dont-email.me>
References: <d4de0692-30ac-40b2-8092-db839bd3998bn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 23 Sep 2022 23:14:56 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="5404dbeaef6a87ea1e74e1e631b92d7f";
logging-data="2850971"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+T540OqvovP/nDu7R+ZD4G"
User-Agent: Pan/0.151 (Butcha; a6f6327)
Cancel-Lock: sha1:tXx3ziGk3XhwEJlJJVRUMxc9Brg=
 by: Leo - Fri, 23 Sep 2022 23:14 UTC

On Thu, 22 Sep 2022 09:36:47 -0700 (PDT), tripd...@gmail.com wrote:

> https://nigeriapropertycentre.com/
> Has anyone scraped something like this before?
> Maybe I should try Power BI first to see if it can do this?

You can try something like this.

import urllib.request
from bs4 import BeautifulSoup

URL = "https://nigeriapropertycentre.com/for-rent/lagos"
UA = "Mozilla/5.0 (X11; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0"

def fetch_url(url):
    # Send a browser-like User-Agent with the request.
    headers = {"User-Agent": UA}
    req = urllib.request.Request(url, headers=headers)
    resp = urllib.request.urlopen(req)
    return resp.read().decode("utf-8")

html = fetch_url(URL)
soup = BeautifulSoup(html, "html.parser")

# Each listing on the page is marked up as a schema.org ListItem.
for item in soup.find_all(itemtype="https://schema.org/ListItem"):
    row = {}
    row["name"] = item.find(itemprop="name").text
    row["url"] = item.find(itemprop="url").get("href", "")
    row["image"] = item.find(itemprop="image").get("src", "")
    row["content-title"] = item.find(class_="content-title").text
    row["address"] = item.find("address").text.strip()
    row["description"] = item.find(itemprop="description").text.strip()
    row["added-on"] = item.find("span", class_="added-on").text.strip()
    row["price"] = item.find("span", class_="price").parent.text.strip()

    # Extra details (beds, baths, etc.) come as a list of <li> items.
    row["aux"] = []

    for li in item.find("ul", class_="aux-info").find_all("li"):
        row["aux"].append(li.text.strip())

    print(row)
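
If the goal is to load the results into a spreadsheet or a tool like Power BI, the rows could be written to CSV instead of printed. Below is a minimal sketch using only the standard library. The rows_to_csv helper, the FIELDS list, and the sample row are made up for illustration; it assumes each row is a dict shaped like the ones built above, with "aux" being a list of strings.

```python
import csv
import io

# Column order for the CSV; assumed to match the keys used in the loop above.
FIELDS = ["name", "url", "image", "content-title", "address",
          "description", "added-on", "price", "aux"]

def rows_to_csv(rows, out):
    """Write row dicts to a file-like object as CSV (hypothetical helper)."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        flat = dict(row)
        # "aux" is a list, so join it into a single cell.
        flat["aux"] = "; ".join(row.get("aux", []))
        # Keys missing from a row are written as empty cells.
        writer.writerow(flat)

# Made-up sample row, not real data from the site.
sample = [{"name": "Example flat", "price": "1,000", "aux": ["2 beds", "1 bath"]}]
buf = io.StringIO()
rows_to_csv(sample, buf)
print(buf.getvalue())
```

To write to disk, pass a file opened with open("listings.csv", "w", newline="", encoding="utf-8") instead of the StringIO buffer, and collect the rows in a list inside the loop rather than printing them.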
