(OT) safe to mechanize?
Anonymous Monk
created: 2006-08-03 09:39:13
I want to parse someone's site, but I don't want them to know that I'm parsing it.

Can a site track down who is parsing it? My program will need to log in. Also, will the webmaster notice that someone is parsing the site (i.e. lots of 'hits' on the site)?

thanks

2006-08-03 Retitled by planetscape, as per Monastery guidelines ( keep:1 edit:10 reap:6 )
Original title: 'safe to mechanize?'

Re: (OT) safe to mechanize?
created: 2006-08-03 09:44:16

It sounds like you are planning to do something immoral and/or illegal - just don't. (And the site administrator will probably have the data they need to track you down should they want to.)

Re^2: (OT) safe to mechanize?
created: 2006-08-03 12:27:57
What about privacy concerns? What if he means he's a human rights researcher and wants to avoid arrest by a repressive government?
Re^3: (OT) safe to mechanize?
created: 2006-08-03 14:15:42

What if he's an agent of a repressive government who wants to surreptitiously research human rights organizations in order to find dissidents to arrest?

If what they're trying to do is above board, state the objective up front rather than coming at it obliquely. There have been posters in the past who asked questions in a similar vein and turned out to be trying to defraud someone (there've been other script kiddies; that's just the one that sticks in my mind due to the persistence and the astonishing, utter ignorance of how networks work (OK, and the presence of one of my all-time highest rep nodes :) )).

Re^4: (OT) safe to mechanize?
created: 2006-08-03 14:29:07
OK I can see that point of view.

But consider this: if you are a bit naive and scrupulous, you might not think about being perceived as dishonest. The possibilities might just never cross your mind. I often find myself puzzling over why certain regulations exist, and I have to shift to a sneaky frame of mind to untangle the logic.

Or maybe the OP is not comfortable stating his reasoning for fear of drawing attention.
Continue my hypothetical: if you were trying to avoid a dictator's torture chamber, would you get on a message board and announce what you're trying to do and why?

Re: (OT) safe to mechanize?
created: 2006-08-03 09:48:04

While I can think of several reasons you would want to do this, I can't think of any that make me want to help you. This isn't really a Perl question, either ... you might want to update your node to demonstrate that (a) you are doing something that is legal and moral, and (b) you are attempting something in Perl.


No good deed goes unpunished. -- (attributed to) Oscar Wilde
Re: (OT) safe to mechanize?
imp
created: 2006-08-03 09:49:54
If you want a better chance of going undetected you could pause between requests, perhaps by using WWW::Mechanize::Sleepy.

It would also be wise to set your user agent to something normal.

This might be sufficient to avoid suspicion for most sites, but some sites will be more paranoid (I'd imagine safari would be one of these).

Re^2: (OT) safe to mechanize?
imp
created: 2006-08-03 10:02:49
As I mentioned, a sufficiently paranoid site will likely catch you, by detecting behaviour that is not typical of humans.

A couple of suspicious activities:

  • Opening every link of a page in sequence
  • Opening links that are not visible. This could be due to style settings, or being in an invisible block

If it's a commercial site with a monetary interest to protect, you will likely be caught.
Re: (OT) safe to mechanize?
created: 2006-08-03 09:54:03

Of course they can tell. Webmasters have eerie powers. And legions of flying monkeys which they'll send out over the Intarweb to track you down and give you such a wedgie.

Not to mention the monkeys' eerie powers. Well, not really eerie; more . . . preternatural. I mean they can fly and give superwedgies, but that's about it. Above and beyond what one would normally expect from monkeys, at least.

Oh, and if you've ever logged in to the site from anywhere you'd better make sure to check the name on your waistband now (because that's how the monkeys check they've got the right guy; if you don't have your name there, they'll take your wallet to first check your ID and give you the aforementioned wedgie).

Re: (OT) safe to mechanize?
created: 2006-08-03 14:06:38
Would it be wrong/immoral to bring up the legions of free proxy servers on the net? It would take some complex code, but you could take a list of 100 proxy servers and randomly choose one for each link you open. Or, if you don't care whether they know they were parsed and your only concern is that they not know YOU parsed them, you could use just one proxy. Keep in mind that proxies protect you from a mad webmaster; they do nothing to protect you from the law.
Re^2: (OT) safe to mechanize?
created: 2006-08-03 14:19:56

Yeah, that'll really help to hit the web site from a bazillion different anonymous proxies . . . so he can log into the site to spider it.

Re^3: (OT) safe to mechanize?
created: 2006-08-04 05:59:54
it helps
Re^4: (OT) safe to mechanize?
created: 2006-08-04 08:57:08

I'll say this again, slowly. Look at the original question and read again:

IT DOES NOT HELP TO USE AN ANONYMOUS PROXY TO HIT A SITE TO WHICH YOU MUST SUPPLY A LOGIN TO ACCESS.

(Presuming of course you don't also have some sort of . . . well I don't want to colour things with the wrong adjective for those hypothetical human rights researchers but I can't think of another, but some sort of "fraudulent" means of obtaining an authentication token which can't be traced back to you).

perlmonks.org content © perlmonks.org and andyford, Anonymous Monk, dorward, Fletch, imp, perlmonkey2, ptum
