Can a site track down who is parsing it? My program will need to log in. Also, will the webmaster notice that someone is parsing the site (i.e. lots of 'hits' to the site)?
thanks
2006-08-03 Retitled by planetscape, as per Monastery guidelines
Original title: 'safe to mechanize?'
It sounds like you are planning to do something immoral and/or illegal - just don't. (And the site administrator will probably have the data they need to track you down should they want to.)
What if he's an agent of a repressive government who wants to surreptitiously research human rights organizations in order to find dissidents to arrest?
If what they're trying to do is above board, state the objective up front rather than coming at it obliquely. There've been posters in the past with questions in a similar vein who turned out to be trying to defraud someone (there've been other script kiddies; that's just the one that sticks in my mind due to the persistence and the astonishing utter ignorance of how networks work (OK, and the presence of one of my all-time highest rep nodes :) )).
But consider this: if you are a bit naive and being scrupulous, you might not think about being perceived as dishonest. The possibilities might just never cross your mind. I often find myself having to puzzle over why certain regulations exist while I shift to a sneaky frame of mind to untangle the logic.
Or maybe the OP is not comfortable stating his reasoning for fear of drawing attention.
Continue my hypothetical: if you were trying to avoid a dictator's torture chamber, would you get on a message board and announce what you're trying to do and why?
While I can think of several reasons you would want to do this, I can't think of any that make me want to help you. This isn't really a Perl question, either ... you might want to update your node to demonstrate that (a) you are doing something that is legal and moral, and (b) you are attempting something in Perl.
It would also be wise to set your user agent to something normal.
This might be sufficient to avoid suspicion for most sites, but some sites will be more paranoid (I'd imagine safari would be one of these).
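For what it's worth, here is a rough sketch of what "setting your user agent" looks like in practice. It is written in Python using only the standard library; the user agent string, the delay, and the function names are all made up for illustration, and none of them come from this thread. The pause between requests is my own assumption about how to keep the number of 'hits' down. None of this hides who you are once you log in.

```python
import time
import urllib.request

# Hypothetical values for illustration; neither appears in the thread.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleClient/0.1"
DELAY_SECONDS = 2.0


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_all(urls, delay=DELAY_SECONDS):
    """Fetch each URL in turn, pausing between requests to keep the hit rate low."""
    pages = []
    for i, url in enumerate(urls):
        if i:
            # Sleep before every request except the first.
            time.sleep(delay)
        with urllib.request.urlopen(build_request(url)) as response:
            pages.append(response.read())
    return pages
```

Calling fetch_all with a list of URLs returns the raw page bodies in order. Without the explicit header, urllib sends its own default user agent, which identifies the client as a Python script.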
A couple suspicious activities:
Of course they can tell. Webmasters have eerie powers. And legions of flying monkeys which they'll send out over the Intarweb to track you down and give you such a wedgie.
Not to mention the monkeys' eerie powers. Well, not really eerie; more . . . preternatural. I mean they can fly and give superwedgies, but that's about it. Above and beyond what one would normally expect from monkeys, at least.
Oh, and if you've ever logged in to the site from anywhere you'd better make sure to check the name on your waistband now (because that's how the monkeys check they've got the right guy; if you don't have your name there, they'll take your wallet to first check your ID and give you the aforementioned wedgie).
Yeah, that'll really help to hit the web site from a bazillion different anonymous proxies . . . so he can log into the site to spider it.
I'll say this again, slowly. Look at the original question and read again:
IT DOES NOT HELP TO USE AN ANONYMOUS PROXY TO HIT A SITE TO WHICH YOU MUST SUPPLY A LOGIN TO ACCESS.
(Presuming of course you don't also have some sort of . . . well I don't want to colour things with the wrong adjective for those hypothetical human rights researchers but I can't think of another, but some sort of "fraudulent" means of obtaining an authentication token which can't be traced back to you).
perlmonks.org content © perlmonks.org and andyford, Anonymous Monk, dorward, Fletch, imp, perlmonkey2, ptum