Browsing php pages properly?
TacoVendor
created: 2006-02-01 11:13:13
I haven't figured out the easiest way to describe what I need to do so that it is clear, but I will do my best here. Easiest to start with explaining what works I guess.

Using a web browser, I am browsing a site that serves its pages using php. Some links trigger a server side script that pushes a url back down to the browser with an instant page refresh. When I call 'server.com/page1.php?script', the browser next displays a url of 'server.com/results.php=xxx'.

I can get perl to request web pages and such without a problem; I just have no idea how to get perl to see the results that are pushed down like this from a server side script.

This specific script is running right now on ActivePerl 5.8.x on Win32. If I can get it working properly here I can figure out how to port it to the other versions/os's that I need to work with.

If more clarification is needed then ask in a reply and I will give whatever info I can.

Thanks.
Re: Browsing php pages properly?
created: 2006-02-01 11:21:43

If I understand correctly, you're trying to download web pages. One (or more) of them redirects you to another page, and you want to know how to download the page to which you've been redirected.

LWP should already do this for you. Compare the request and simple_request methods of LWP::UserAgent.

However, LWP will only do this if the redirect is an HTTP redirect. If the redirect is done via HTML's META tags or via JavaScript, LWP cannot help you.
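
A minimal sketch of the difference between the two methods, for HTTP redirects only. The helper names first_hop_location and final_url are made up for illustration, and the URL is whatever you pass on the command line; treat this as a starting point rather than a finished tool.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical helper: fetch $url WITHOUT following redirects.
# Returns the Location header if the server answered with an HTTP
# redirect, otherwise undef.
sub first_hop_location {
    my ($ua, $url) = @_;
    my $response = $ua->simple_request(HTTP::Request->new(GET => $url));
    return $response->is_redirect ? $response->header('Location') : undef;
}

# Hypothetical helper: fetch $url and let LWP follow HTTP redirects.
# Returns the URL of the request that produced the final response.
sub final_url {
    my ($ua, $url) = @_;
    my $response = $ua->request(HTTP::Request->new(GET => $url));
    return $response->request->uri->as_string;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $ua  = LWP::UserAgent->new;
    my $url = shift @ARGV;

    my $location = first_hop_location($ua, $url);
    print "simple_request: ",
        defined $location ? "redirect to $location" : "no HTTP redirect", "\n";
    print "request:        ended up at ", final_url($ua, $url), "\n";
}
```

If both lines report the URL you started with, the server is not sending an HTTP redirect, and the redirect is being done in the page body instead.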

WWW::Mechanize might process HTML META tags, but it definitely will not do JavaScript either.

If the PHP emits JavaScript to perform the redirection, you might want to consider Win32::IE::Mechanize.

Re^2: Browsing php pages properly?
created: 2006-02-01 11:40:08
It is JavaScript that is doing the redirect.

My issue is focused around being able to see what url the server has pushed. If the only option is to use the IE::Mechanize module to launch an IE instance to see the pushed url then so be it, but is there by chance a way to have perl 'sit back and wait' like a browser to accept data from a server side push like this?
Re^3: Browsing php pages properly?
created: 2006-02-01 11:58:46

No, because LWP doesn't look at the response body, and because WWW::Mechanize doesn't pass the JavaScript to a JavaScript engine. I think there's a project to create a JavaScript engine in Perl, but I don't know its status.

Now, if you're only dealing with one .php (or a set that behave identically), you could search the JavaScript code for the URL using a simple regexp, then fetch that page yourself.
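
A rough sketch of that regexp approach, assuming the page sets location, window.location or location.href to a quoted string. The sample body and the helper name js_redirect_target are invented for illustration; the real page's JavaScript may be shaped differently, in which case the pattern needs adjusting.

```perl
use strict;
use warnings;

# Hypothetical response body; in practice this would be the content
# returned for the first page.
my $body = <<'HTML';
<html><head><script type="text/javascript">
window.location = "results.php?id=abc123";
</script></head><body></body></html>
HTML

# Hypothetical helper: return the first quoted URL assigned to
# location / window.location / document.location (with or without
# .href), or nothing if no such assignment is found.
sub js_redirect_target {
    my ($html) = @_;
    if ($html =~ m{
            \b(?:window\.|document\.)?location(?:\.href)?  # the property being set
            \s*=\s*                                        # assignment
            (["'])(.*?)\1                                  # quoted URL, same quote both ends
        }xs) {
        return $2;
    }
    return;
}

my $target = js_redirect_target($body);
print defined $target ? "Redirect target: $target\n" : "No redirect found\n";
```

The pattern is fixed, but the value it captures is whatever the server put into that particular response. A relative result would still have to be resolved against the URL of the page it came from before fetching it.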

Re^4: Browsing php pages properly?
created: 2006-02-01 12:05:39
I cannot search the JavaScript code for the url since the returned url is generated at the time of calling the link from the 1st page. The data after the '=' in the url is used to generate the page itself when called by the browser.

After looking over the IE::Mech module, that looks like it will work for what I need. The integration of the OLE module will let me grab the pushed url and I should be ok at that point.

ikegami, thanks for the info, especially the reference to IE::Mech. I hadn't come across that one before and would not have even thought about doing this that way.

perlmonks.org content © perlmonks.org and ikegami, TacoVendor
