Extracting Pages via proxy server using WWW::Mechanize
Anonymous Monk
created: 2006-01-04 02:24:07

Hi Monks,

I have a problem fetching a web page through a proxy server with WWW::Mechanize.
The same request works with LWP::UserAgent, but not with WWW::Mechanize.
Where do I give my proxy username and password? Can anyone help me?

Note: I am on Windows, so $mech->env_proxy(); does not work for me.

Here is what I tried:
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();

$mech->proxy(['http', 'ftp'], 'http://10.16.5.11:3030');
#$mech->env_proxy();
my $url = 'http://search.cpan.org';
my $response = $mech->get( $url );

my $html = $response->content;
print $html;
Thanks, PerlUser

Edited by planetscape: replaced <pre> tags with code tags

Re: Extracting Pages via proxy server using WWW::Mechanize
created: 2006-01-04 02:30:23
PerlUser,
Perhaps Re: WWW:::Mechanize and credentials will be of help. If not, more details are required to help.

Cheers - L~R

Re^2: Extracting Pages via proxy server using WWW::Mechanize
created: 2006-01-04 04:37:55

Hi Cheers,

Thanks for your reply.
I tried that code, but it is not what I was expecting.
I got the extraction working with LWP::UserAgent, and now I want to do the same with WWW::Mechanize.
With LWP::UserAgent I used a statement like '$req->proxy_authorization_basic($username, $password);' to authenticate to the proxy.
Where do I give my proxy username and password in WWW::Mechanize?

The code I tried with LWP::UserAgent:

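use LWP::UserAgent;
use HTTP::Request;

# Placeholder credentials for the proxy
my ($username, $password) = ('username', 'password');

my $ua = LWP::UserAgent->new;
$ua->proxy(['http'], 'http://11.12.5.20:3350');

my $url = 'http://www.yahoo.com';

my $req = HTTP::Request->new(GET => $url);
# Adds a "Proxy-Authorization: Basic ..." header to this request
$req->proxy_authorization_basic($username, $password);
my $res = $ua->request($req);

if ($res->is_success) {
	print $res->content;
}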
Thanks,
PerlUser
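
For reference, a minimal sketch of carrying the same idea over to WWW::Mechanize. WWW::Mechanize is a subclass of LWP::UserAgent, so the Proxy-Authorization header can be set once as a default header instead of on each request object. The proxy address and the credentials are placeholders, and this assumes the proxy accepts Basic authentication (it will not help with NTLM).

```perl
use strict;
use warnings;
use WWW::Mechanize;
use MIME::Base64 qw(encode_base64);

# Placeholder credentials and proxy address; substitute your own.
my ( $username, $password ) = ( 'username', 'password' );

my $mech = WWW::Mechanize->new();
$mech->proxy( ['http'], 'http://11.12.5.20:3350' );

# Build the same kind of header that proxy_authorization_basic() sets,
# and have it sent with every request made through $mech.
# The second argument to encode_base64 suppresses the trailing newline.
my $token = encode_base64( "$username:$password", '' );
$mech->default_header( 'Proxy-Authorization' => "Basic $token" );
```

After this, $mech->get($url) should go out through the proxy with the credentials attached. A default header is sent on every request, so this only makes sense when all traffic really goes via the proxy.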
Re: Extracting Pages via proxy server using WWW::Mechanize
created: 2006-01-04 04:49:15
I've used the following solution before, which extends WWW::Mechanize. I'm pretty sure I found it on PerlMonks, but I can't find the original node; apologies to the monk I'm ripping off :)
#!/usr/bin/perl -w
my $mech = WWW::MechanizeCustom->new();
$mech->proxy('http','http://your-proxy:80');
$mech->set_basic_credentials( 'username', 'password' );


$mech->get("http://perlmonks.com");
print $mech->content;



#================================================================
package WWW::MechanizeCustom;

use base 'WWW::Mechanize';

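# Add a set_basic_credentials method, using a closure to remember the
# credentials. LWP::UserAgent calls get_basic_credentials itself when a
# server or proxy asks for Basic authentication.
{
    my ( $username, $password );
    sub set_basic_credentials { ( $username, $password ) = @_[1..2] }
    sub get_basic_credentials { return ( $username, $password ) }
}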

---
my name's not Keith, and I'm not reasonable.
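
A variation on the subclass above, as a sketch only. LWP::UserAgent calls get_basic_credentials with ($realm, $uri, $isproxy), so the override can answer proxy challenges only and leave ordinary site logins to the default lookup. The package name My::ProxyMech, the method set_proxy_credentials and the hash key are made up for this example; the proxy address and credentials are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::ProxyMech;    # hypothetical name, not a CPAN module
use base 'WWW::Mechanize';

# Remember the proxy credentials per object instead of in a
# file-wide closure, so two objects can use different logins.
sub set_proxy_credentials {
    my ( $self, $user, $pass ) = @_;
    $self->{my_proxy_credentials} = [ $user, $pass ];
    return $self;
}

# Called by LWP::UserAgent when a 401 or 407 response arrives.
sub get_basic_credentials {
    my ( $self, $realm, $uri, $isproxy ) = @_;
    if ( $isproxy && $self->{my_proxy_credentials} ) {
        return @{ $self->{my_proxy_credentials} };
    }
    # Not a proxy challenge: fall back to the inherited behaviour.
    return $self->SUPER::get_basic_credentials( $realm, $uri, $isproxy );
}

package main;

my $mech = My::ProxyMech->new();
$mech->proxy( ['http'], 'http://10.16.5.11:3030' );
$mech->set_proxy_credentials( 'username', 'password' );
```

With this in place, a later $mech->get(...) that hits a proxy asking for Basic authentication should be retried with these credentials.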
Re: Extracting Pages via proxy server using WWW::Mechanize
created: 2006-01-04 05:47:05
When the net ops turned on proxy authentication (using NTLM) at our work, our Solaris systems had problems. Firefox etc. can be set up to authenticate, but other programs rely on environment variables like http_proxy.

The solution was to put the credentials into that variable. It's a bit of a bummer, since we need to update it every 90 days, but in most cases it does work.

Maybe you can do something similar in your proxy setup line:

...
$mech->proxy(['http', 'ftp'], 'http://user:password@10.16.5.11:3030');
...
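
A sketch of building such a proxy URL safely. In a URL the credentials go before the host, in the form user:password@host:port, and characters like '@' or ':' inside the password need percent-encoding. As far as I know, recent libwww-perl releases read the user info from the proxy URL and send it as Basic proxy credentials, but treat that as an assumption to check against your installed version. All values below are placeholders.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape);
use WWW::Mechanize;

# Placeholder values; substitute your own.
my ( $user, $pass ) = ( 'username', 'p@ss:word' );
my ( $host, $port ) = ( '10.16.5.11', 3030 );

# Percent-encode user and password so that '@' and ':' inside them
# are not mistaken for the separators of the URL itself.
my $proxy_url = sprintf 'http://%s:%s@%s:%d/',
    uri_escape($user), uri_escape($pass), $host, $port;

my $mech = WWW::Mechanize->new();
$mech->proxy( [ 'http', 'ftp' ], $proxy_url );
```

Keep in mind that the password then sits in plain text in the script, just as it does in the http_proxy environment variable.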
Re: Extracting Pages via proxy server using WWW::Mechanize
created: 2006-01-04 08:56:06

My boilerplate advice is this:

I would try using a module such as HTTP::Recorder or WWW::Mechanize::Shell to record a successful manual form submission. The output of HTTP::Recorder, for instance, can be "dropped" right into your WWW::Mechanize scripts.

I am also on Win32; I have had no problems using HTTP::Proxy, so that may be a possibility for you.

Another important tool for finding out what is really happening behind the scenes between server and browser is a protocol analyzer such as Ethereal.
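
If a protocol analyzer is not at hand, a rough substitute is to have the user agent print what it sends and what it gets back. This is only a sketch: it assumes a libwww-perl recent enough to have add_handler, and headers that the protocol layer adds at send time may not appear in the output.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0 so that a failed request does not die before
# the response has been inspected.
my $mech = WWW::Mechanize->new( autocheck => 0 );

# Print every outgoing request (request line and headers) to STDERR.
$mech->add_handler(
    request_send => sub {
        my ($request) = @_;
        print STDERR $request->as_string, "\n";
        return;    # returning nothing lets the request go ahead
    }
);

# Print the status line and headers of every response to STDERR.
$mech->add_handler(
    response_done => sub {
        my ($response) = @_;
        print STDERR $response->status_line, "\n",
            $response->headers->as_string, "\n";
        return;
    }
);
```

A 407 status in that output would mean the proxy is still asking for credentials.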

Don't forget that Super Search is your friend here on PM. Many questions such as yours have been asked recently...

HTH,

planetscape

perlmonks.org content © perlmonks.org and Anonymous Monk, Limbic~Region, planetscape, reasonablekeith, tweetiepooh
