Rocksolid Light

Subject: I can't believe it, the baidu spider is scanning def3 on tor
From: trw@i2pmail.org (trw)
Newsgroups: rocksolid.shared.offtopic
Organization: Dancing elephants
Date: Fri, 18 Oct 2019 23:18 UTC
Seems like a strange thing, and maybe it is a fake: baidu shows no hits when searching for the name or the onion address.
Or maybe they just harvest the info, and censor it right after...

trw
Posted on def3


Subject: Re: I can't believe it, the baidu spider is scanning def3 on tor
From: AnonUser@rslight.i2p (AnonUser)
Newsgroups: rocksolid.shared.offtopic
Organization: Rocksolid Light
Date: Fri, 18 Oct 2019 23:34 UTC
trw wrote:

> Seems like a strange thing, and maybe it is a fake: baidu shows no hit when
> searching for the name or the onion address.
> Or maybe they just harvest the info, and censor it right after...

I see the baidu spider in my logs also. I assume it's someone running the spider themselves (or spoofing the user-agent), so it may not be Baidu at all. Often they disable delays between page requests and ignore robots.txt, which most real search engines won't do.
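The "no delay between page requests" pattern is something you can look for in your own logs. Below is a minimal sketch, assuming the access log has already been parsed into `(unix_timestamp_seconds, user_agent)` tuples (parsing is not shown, and the thresholds are arbitrary). It only groups by user-agent string, which anyone can spoof, so it tells you how a client behaves, not who it really is.

```python
from collections import defaultdict
from statistics import median


def flag_impatient_agents(records, min_gap_seconds=2.0, min_requests=5):
    """Flag user agents whose median gap between requests is below min_gap_seconds.

    records: iterable of (unix_timestamp_seconds, user_agent) tuples, assumed to
    be already parsed out of an access log.
    Returns {user_agent: median_gap_seconds} for the flagged agents.
    """
    times_by_agent = defaultdict(list)
    for ts, agent in records:
        times_by_agent[agent].append(ts)

    flagged = {}
    for agent, times in times_by_agent.items():
        if len(times) < min_requests:
            continue  # too few requests to judge
        times.sort()
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        typical_gap = median(gaps)
        if typical_gap < min_gap_seconds:
            flagged[agent] = typical_gap
    return flagged


if __name__ == "__main__":
    # A Baiduspider-style user-agent string; the header proves nothing by itself.
    baidu_ua = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
    records = [(t, baidu_ua) for t in range(10)]                # one hit per second
    records += [(t, "SomeBrowser") for t in range(0, 600, 60)]  # one hit per minute
    print(flag_impatient_agents(records))
```

Using the median gap rather than the minimum keeps one accidental double-click from flagging an ordinary visitor.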


--
Posted on Rocksolid Light



Subject: Re: I can't believe it, the baidu spider is scanning def3 on tor
From: anonuser@retrobbs.rocksolidbbs.com.remove-kn3-this (AnonUser)
Newsgroups: rocksolid.shared.offtopic
Organization: RetroBBS
Date: Sat, 19 Oct 2019 19:15 UTC
  To: AnonUser
this seems most likely.

people think nothing of it when googlebot or one of the other major search engine bots crawls their site without a delay, or even ignores robots.txt. or you simply don't want any attention.

source: i've written crawlers in the past and spoofed the user agent because of those reasons.
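For contrast, this is roughly what the "polite" behaviour looks like from the crawler's side: honour `Disallow` rules and pause between requests. A minimal sketch using Python's standard `urllib.robotparser`; the `fetch` function is a hypothetical caller-supplied wrapper (for example around an HTTP client pointed at a Tor or I2P proxy), and the 5-second fallback delay is an arbitrary assumption.

```python
import time
from urllib.robotparser import RobotFileParser


def load_robots(robots_lines):
    """Parse robots.txt content given as a list of lines (no network access here)."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


def polite_crawl(urls, parser, user_agent, fetch, default_delay=5.0, sleep=time.sleep):
    """Fetch only URLs robots.txt allows, pausing between consecutive requests.

    fetch: hypothetical caller-supplied function url -> content.
    """
    delay = parser.crawl_delay(user_agent)
    if delay is None:
        delay = default_delay  # no Crawl-delay given: fall back to our own pause
    results = {}
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # respect Disallow rules
        if results:
            sleep(delay)  # wait before every request after the first
        results[url] = fetch(url)
    return results


if __name__ == "__main__":
    parser = load_robots(["User-agent: *", "Crawl-delay: 10", "Disallow: /private/"])
    pauses = []  # record the pauses instead of actually sleeping
    pages = polite_crawl(
        ["http://example.onion/", "http://example.onion/private/x", "http://example.onion/about"],
        parser,
        "ExampleCrawler/0.1",
        fetch=lambda url: "<html>" + url + "</html>",
        sleep=pauses.append,
    )
    print(sorted(pages), pauses)
```

A crawler that spoofs a big-name user agent and skips both checks is exactly the kind of traffic described above.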
--
Posted on RetroBBS



Subject: Re: I can't believe it, the baidu spider is scanning def3 on tor
From: anonuser@retrobbs.rocksolidbbs.com.remove-9vm-this (AnonUser)
Newsgroups: rocksolid.shared.offtopic
Organization: RetroBBS
Date: Sat, 19 Oct 2019 19:17 UTC
  To: AnonUser
googlebot and the other well-known bots usually also got preferential treatment.
--
Posted on RetroBBS



Subject: Re: Re I cant believe it the baidu spider is scanning def3 on tor
From: trw@i2pmail.org (trw)
Newsgroups: rocksolid.shared.offtopic
Organization: Dancing elephants
Date: Sat, 19 Oct 2019 20:08 UTC
> source: i've written crawlers in the past and spoofed the user agent because of those reasons.

were those for clearnet or for darknets (or both)? i don't mind bots as long as they do not consume too many resources...

cheers

trw
Posted on def3


Subject: Re: Re I cant believe it the baidu spider is scanning def3 on tor
From: anonuser@retrobbs.rocksolidbbs.com.remove-3rb-this (AnonUser)
Newsgroups: rocksolid.shared.offtopic
Organization: RetroBBS
Date: Sat, 19 Oct 2019 22:10 UTC
  To: trw
mainly clearnet. i did also crawl the darknet without a delay back then, simply because the network was really slow.

years ago the darknets were even slower than they are now. one could even say they are "fast" nowadays.

you can always rate limit your eepsite if bots become an issue. it generally isn't worth it for a crawler to re-create tunnels very often, as that is computationally expensive and new tunnels also take a while to warm up. i also do not think most people would bother with that.
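A per-client rate limit along these lines could be sketched as a token bucket: each client may burst up to `capacity` requests, then gets `rate` new requests per second. This assumes the eepsite can see some stable per-client ID, such as the client's I2P destination; how you obtain it depends on your tunnel setup, so treat `client_id` as hypothetical. Since re-creating tunnels is expensive, keying on the destination makes the limit costly to dodge. The numbers are arbitrary.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate, now):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """One token bucket per client ID (e.g. an I2P destination, assumed available)."""

    def __init__(self, capacity=10, rate=0.5, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets = {}

    def allow(self, client_id):
        now = self.clock()
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = self.buckets[client_id] = TokenBucket(self.capacity, self.rate, now)
        return bucket.allow(now)


if __name__ == "__main__":
    limiter = PerClientLimiter(capacity=3, rate=1.0)
    print([limiter.allow("hypothetical-client") for _ in range(5)])
```

The `buckets` dict grows with every new client; a real service would also evict buckets that have been idle for a while.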
--
Posted on RetroBBS



Subject: Re: Re I cant believe it the baidu spider is scanning def3 on tor
From: anonuser@retrobbs.rocksolidbbs.com.remove-ob-this (AnonUser)
Newsgroups: rocksolid.shared.offtopic
Organization: RetroBBS
Date: Sat, 19 Oct 2019 22:13 UTC
  To: AnonUser
it just came to my mind that back then there were clearnet websites which provided access to darknet websites, so clearnet search engines did index the darknets for a while.

i do not know if any such website is still active, though i seriously doubt it, given the possible legal issues and constant dmca/takedown requests.
--
Posted on RetroBBS



Subject: Re: Re I cant believe it the baidu spider is scanning def3 on tor
From: guest@retrobbs.rocksolidbbs.com (Guest)
Newsgroups: rocksolid.shared.offtopic
Organization: Dancing elephants
Date: Sat, 19 Oct 2019 22:24 UTC
> i do not know if any such website is still active,

oh yes, there are many of them that are active. best to be avoided, imo, both as a client and as a server.
Posted on def3


Subject: Re: Re I cant believe it the baidu spider is scanning def3 on tor
From: trw@i2pmail.org (trw)
Newsgroups: rocksolid.shared.offtopic
Organization: Dancing elephants
Date: Sat, 19 Oct 2019 22:31 UTC
> you can always rate limit your eepsite if bots become an issue

on i2p everything is dandy, and the java package allows fine-tuning of these service settings to a very high extent.
it is the tor side that usually causes trouble, and that one has no such measures (unless you add your own code to the service or play around with iptables).
anyway, bots and spiders are just something that any service has to deal with somehow. it is kind of a stress test sometimes, so it helps to find weak spots in the server.
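On the tor side, connections to an onion service typically arrive from the local tor process, so your own code cannot tell clients apart by IP address. One option when adding code to the service is a global limit. Here is a minimal sliding-window-log sketch, assuming a single-threaded server and arbitrary example numbers: at most `max_requests` are accepted within any `window` seconds.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Global limit: at most `max_requests` accepted within any `window` seconds."""

    def __init__(self, max_requests, window, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.stamps = deque()  # times of accepted requests, oldest first

    def allow(self):
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.max_requests:
            self.stamps.append(now)
            return True
        return False  # rejected requests are not recorded


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=2, window=10.0)
    print([limiter.allow() for _ in range(3)])
```

The trade-off of a global cap is that it protects the server but lets one aggressive bot crowd out human visitors.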

i did some crawling of both tor and i2p some years ago, only by using wget in a script. this was semi-successful...
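A script like that might build its wget call roughly as follows. This is a sketch, not the script described above. It assumes `torsocks` is installed for reaching .onion sites and that the I2P router's HTTP proxy listens on `127.0.0.1:4444` (a common default; check your own config). Depth and wait values are arbitrary.

```python
import shlex


def build_wget_command(start_url, wait_seconds=5, depth=2, i2p_proxy=None):
    """Build an argv list for a polite recursive wget crawl.

    Without i2p_proxy the command is wrapped in torsocks (assumed installed)
    so wget's traffic goes through Tor; with i2p_proxy ("host:port") wget is
    pointed at that HTTP proxy instead.
    """
    cmd = [
        "wget",
        "--recursive",
        f"--level={depth}",       # limit recursion depth
        "--no-parent",            # stay below the start URL
        f"--wait={wait_seconds}",
        "--random-wait",          # vary the pause around wait_seconds
        "-e", "robots=on",        # honour robots.txt (wget's default, made explicit)
    ]
    if i2p_proxy:
        cmd += ["-e", f"http_proxy=http://{i2p_proxy}"]
    else:
        cmd = ["torsocks"] + cmd
    cmd.append(start_url)
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them; pass a list to
    # subprocess.run(cmd, check=True) to actually start a crawl.
    print(shlex.join(build_wget_command("http://example.onion/")))
    print(shlex.join(build_wget_command("http://example.i2p/", i2p_proxy="127.0.0.1:4444")))
```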


cheers

trw
Posted on def3
