Rocksolid Light



Subject                          Author
* wget or curl question          pH
+* Re: wget or curl question     Andreas Kohlbach
|`* Re: wget or curl question    pH
| `- Re: wget or curl question   Andreas Kohlbach
`- Re: wget or curl question     Computer Nerd Kev

wget or curl question

https://www.novabbs.com/computers/article-flat.php?id=5059&group=comp.os.linux.misc#5059
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: wNOSP...@gmail.org (pH)
Newsgroups: comp.os.linux.misc
Subject: wget or curl question
Date: Tue, 18 May 2021 02:17:11 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 21
Message-ID: <s7v837$81l$1@dont-email.me>
Injection-Date: Tue, 18 May 2021 02:17:11 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="0182e46dda5982e486566893ade6c202";
logging-data="8245"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+cDZsBwiBcWDqdY9lfEBiCGhPbSvUOKoI="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:kBJ/xNL58frH0yBVFC6l8rvYSpM=
 by: pH - Tue, 18 May 2021 02:17 UTC

An acquaintance showed me a website that has some neat old technology
magazines.

He wanted to download all the magazines. I thought that wget or curl could
do something like that.

Here's the url:

https://www.inventionandtech.com/magazine/archive

It's been some time since I tried but I do remember that I failed miserably
with curl and with wget I was able to get the website directory/tree
structure, but no content.

I came across his email and thought I might try again... any suggestions for
what to try?
I think he just wants the magazines, no web structure needed. I looked
through one or two of them; they're cool, but who has the time to read
everything sent our way these days?

pH in Aptos

Re: wget or curl question

https://www.novabbs.com/computers/article-flat.php?id=5066&group=comp.os.linux.misc#5066
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ank...@spamfence.net (Andreas Kohlbach)
Newsgroups: comp.os.linux.misc
Subject: Re: wget or curl question
Date: Tue, 18 May 2021 06:21:40 -0400
Organization: https://news-commentaries.blogspot.com/
Lines: 41
Message-ID: <87pmxotk97.fsf@usenet.ankman.de>
References: <s7v837$81l$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="786135baa80becf4f70f73cb05a32886";
logging-data="12689"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/v2jsZRys7Eb4UOlo7GQco"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Cancel-Lock: sha1:xc2scPD7+iDi+5rTg5ZuXu6Z3o0=
sha1:eyOxrvChxU5OhMGyqIKHaCTmt2A=
X-Face: '#R~-oJz-_!iXhczPJ;=w1(`5-uQ2$0qHB7KKDV,]VoAC!P?swaa#m|eB<DkOt*XH=~9C[g S^w)b,)1q,{P\7Z3H,N(^m.YKuYM//B{X:PvbDk.|:g:$wVr*3*)[K6F+k\z-s32+oB]YJPy11wuGGz'bQAk~1.b1[;M{^A2@bboIENBB:Wd:<Fm~r7OuiJA1g}7KC-T'>Du+
X-Face-What-Is-It: Capture Bee from Galaga
 by: Andreas Kohlbach - Tue, 18 May 2021 10:21 UTC

On Tue, 18 May 2021 02:17:11 -0000 (UTC), pH wrote:
>
> An acquaintance showed me a website that has some neat old technology
> magazines.
>
> He wanted to download all the magazines. I thought that wget or curl could
> do something like that.
>
> Here's the url:
>
> https://www.inventionandtech.com/magazine/archive
>
> It's been some time since I tried but I do remember that I failed miserably
> with curl and with wget I was able to get the website directory/tree
> structure, but no content.
>
> I came across his email and thought I might try again... any suggestions for
> what to try?
> I think he just wants the magazines, no web structure needed. I looked
> through one or two of them; they're cool, but who has the time to read
> everything sent our way these days?

Try something like

wget -r -l 2 -A "*.pdf" https://www.inventionandtech.com/magazine/archive

which recurses two levels deep, hoping the magazines are
there. Omitting -l will go five levels (the default).

-l 0 would go as deep as it can. Chances are you end up downloading the
whole internet. ;-)

The -A should only accept PDF files, assuming the files you're after are
in PDF format.

It might still download a lot of rubbish though. I never figured out a
way to just get what I wanted.
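Spelled out with each option commented, the suggestion above looks like this (a sketch of the same GNU Wget invocation; the extra -np is my own precaution, not part of the original command):

```shell
# -r        recurse into links found on each page
# -l 2      ...but at most two levels deep (wget's default is 5; "-l 0" = unlimited)
# -np       never ascend above the starting directory (extra precaution)
# -A pdf    keep only files matching *.pdf; wget still fetches HTML pages
#           to scan them for links, then deletes the ones -A rejects
wget -r -l 2 -np -A "*.pdf" \
    https://www.inventionandtech.com/magazine/archive
```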
--
Andreas

PGP fingerprint 952B0A9F12C2FD6C9F7E68DAA9C2EA89D1A370E0

Re: wget or curl question

https://www.novabbs.com/computers/article-flat.php?id=5077&group=comp.os.linux.misc#5077
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: wNOSP...@gmail.org (pH)
Newsgroups: comp.os.linux.misc
Subject: Re: wget or curl question
Date: Tue, 18 May 2021 19:49:25 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 44
Message-ID: <s815o4$8si$1@dont-email.me>
References: <s7v837$81l$1@dont-email.me> <87pmxotk97.fsf@usenet.ankman.de>
Injection-Date: Tue, 18 May 2021 19:49:25 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="0182e46dda5982e486566893ade6c202";
logging-data="9106"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/9twX4U9PIPeesb+zy2sTaamxbGTffc1Y="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:8F5gBVM81Elrk3WsOLxy1nmQL5Y=
 by: pH - Tue, 18 May 2021 19:49 UTC

On 2021-05-18, Andreas Kohlbach <ank@spamfence.net> wrote:
> On Tue, 18 May 2021 02:17:11 -0000 (UTC), pH wrote:
>>
>> An acquaintance showed me a website that has some neat old technology
>> magazines.
>>
>> He wanted to download all the magazines. I thought that wget or curl could
>> do something like that.
>>
>> Here's the url:
>>
>> https://www.inventionandtech.com/magazine/archive
>>
>> It's been some time since I tried but I do remember that I failed miserably
>> with curl and with wget I was able to get the website directory/tree
>> structure, but no content.
>>
>> I came across his email and thought I might try again... any suggestions for
>> what to try?
>> I think he just wants the magazines, no web structure needed. I looked
>> through one or two of them; they're cool, but who has the time to read
>> everything sent our way these days?
>
> Try something like
>
> wget -r -l 2 -A "*.pdf" https://www.inventionandtech.com/magazine/archive
>
> which recurses two levels deep, hoping the magazines are
> there. Omitting -l will go five levels (the default).
>
> -l 0 would go as deep as it can. Chances are you end up downloading the
> whole internet. ;-)
>
> The -A should only accept PDF files, assuming the files you're after are
> in PDF format.
>
> It might still download a lot of rubbish though. I never figured out a
> way to just get what I wanted.

Thanks for that suggested command line. I'll report back when I get a
chance to give it a whirl later on tonight. (They do look like very
interesting magazines.)

pH

Re: wget or curl question

https://www.novabbs.com/computers/article-flat.php?id=5079&group=comp.os.linux.misc#5079
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ank...@spamfence.net (Andreas Kohlbach)
Newsgroups: comp.os.linux.misc
Subject: Re: wget or curl question
Date: Tue, 18 May 2021 16:52:40 -0400
Organization: https://news-commentaries.blogspot.com/
Lines: 12
Message-ID: <87wnrvsr1j.fsf@usenet.ankman.de>
References: <s7v837$81l$1@dont-email.me> <87pmxotk97.fsf@usenet.ankman.de>
<s815o4$8si$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="786135baa80becf4f70f73cb05a32886";
logging-data="32319"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18md/ufwLhlzzM8htL6LbZT"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Cancel-Lock: sha1:+kig8KHm7wGunpshByrodlh2zEY=
sha1:s/4DYb7O87LVgCCmJ1KHy9wXMvU=
X-No-Archive: Yes
X-Face: '#R~-oJz-_!iXhczPJ;=w1(`5-uQ2$0qHB7KKDV,]VoAC!P?swaa#m|eB<DkOt*XH=~9C[g S^w)b,)1q,{P\7Z3H,N(^m.YKuYM//B{X:PvbDk.|:g:$wVr*3*)[K6F+k\z-s32+oB]YJPy11wuGGz'bQAk~1.b1[;M{^A2@bboIENBB:Wd:<Fm~r7OuiJA1g}7KC-T'>Du+
X-Face-What-Is-It: Capture Bee from Galaga
 by: Andreas Kohlbach - Tue, 18 May 2021 20:52 UTC

On Tue, 18 May 2021 19:49:25 -0000 (UTC), pH wrote:
>
> Thanks for that suggested command line. I'll report back when I get a
> chance to give it a whirl later on tonight. (They do look like very
> interesting magazines.)

I've now looked at some. The texts themselves appear to be plain HTML, not
PDF as I assumed (most documentation comes in PDF today).
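If the articles really are plain HTML, the -A "*.pdf" filter from the earlier command would match nothing; a sketch of a variant for HTML content (all standard GNU Wget options, untested against this particular site):

```shell
# -k   rewrite links in the downloaded pages so they work when browsed locally
# -p   also fetch each page's requisites (inline images, CSS)
# -np  never climb above the starting directory
wget -r -l 2 -np -k -p \
    https://www.inventionandtech.com/magazine/archive
```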

Good luck.
--
Andreas

Re: wget or curl question

https://www.novabbs.com/computers/article-flat.php?id=5081&group=comp.os.linux.misc#5081
Path: i2pn2.org!i2pn.org!aioe.org!KEGN5yBX2eVvY4ATKFOU2Q.user.gioia.aioe.org.POSTED!not-for-mail
From: not...@telling.you.invalid (Computer Nerd Kev)
Newsgroups: comp.os.linux.misc
Subject: Re: wget or curl question
Date: Tue, 18 May 2021 23:09:46 +0000 (UTC)
Organization: Aioe.org NNTP Server
Lines: 42
Message-ID: <s81hfq$tek$1@gioia.aioe.org>
References: <s7v837$81l$1@dont-email.me>
NNTP-Posting-Host: KEGN5yBX2eVvY4ATKFOU2Q.user.gioia.aioe.org
X-Complaints-To: abuse@aioe.org
User-Agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i586))
X-Notice: Filtered by postfilter v. 0.9.2
 by: Computer Nerd Kev - Tue, 18 May 2021 23:09 UTC

pH <wNOSPAMp@gmail.org> wrote:
> An acquaintance showed me a website that has some neat old technology
> magazines.
>
> He wanted to download all the magazines. I thought that wget or curl could
> do something like that.
>
> Here's the url:
>
> https://www.inventionandtech.com/magazine/archive
>
> It's been some time since I tried but I do remember that I failed miserably
> with curl and with wget I was able to get the website directory/tree
> structure, but no content.

A "website directory/tree structure" implies to me something more
like this:
http://archive.debian.org

The page at the URL you provided has lots of unwanted navigation
links, and the magazine pages don't descend the directory tree but
jump between various top-level directories including "/volume",
"/content", and "/node".

In my experience wget's recursive mode can work very well on plain
server-generated directory indexes like the Debian archive, _if_
you mess about with the command arguments a fair bit to start with
(a bit like regular expressions, there's always some edge case that
you forgot in the first version). For a complex site like that, I
think it would take quite a while to narrow down the items that
wget downloads to just what you require, and to present them in a
browsable way. I think you'd need to do at least a bit of scripting
to process the results, if not to run the whole process itself.
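The sort of scripting mentioned above might start out like this minimal sketch: a helper that pulls candidate issue URLs out of saved HTML, which you could then feed back to wget. The /volume, /content and /node path prefixes come from the observation above; whether they actually capture every magazine page on the real site is an assumption.

```shell
# Read HTML on stdin; print absolute URLs for links whose paths start
# with /volume, /content or /node (the prefixes observed on the site).
extract_issue_urls() {
    grep -oE 'href="/(volume|content|node)[^"]*"' \
        | sed -e 's/^href="//' -e 's/"$//' \
              -e 's#^#https://www.inventionandtech.com#'
}

# Usage (commented out here, since it needs network access):
#   wget -q -O - https://www.inventionandtech.com/magazine/archive \
#       | extract_issue_urls | sort -u > issue-urls.txt
#   wget -i issue-urls.txt -P magazines/
```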

Using a dedicated website downloader program such as HTTrack would
probably be a quicker route.
https://www.httrack.com/
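A sketch of what an HTTrack run might look like (the filter pattern is a guess at the +/- whitelist syntax from HTTrack's documentation, not something verified against this site):

```shell
# Mirror the archive into ./magazines, restricting the crawl to the
# same host; "+" patterns whitelist URLs, "-" patterns would exclude.
httrack "https://www.inventionandtech.com/magazine/archive" \
    -O ./magazines \
    "+www.inventionandtech.com/*"
```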

--
__ __
#_ < |\| |< _#
