Size of a webpage
js1
created: 2004-06-16 03:58:04

Hi,

I want to write a Perl script to measure the size of a web page. Apparently the page is using too much bandwidth (34 KB), although by my calculations it's only 25 KB.

I had a look at LWP, but I'm not sure that's the right module for this job because I need to take the headers and GIF files into account. Does anyone know which module I should use?

Thanks for any help.

js1.

Re: Size of a webpage
created: 2004-06-16 04:09:56
You could use the HTTP::Size module. Take a look at its POD for the get_sizes() method. The POD states that the method "fetches all of the images then sums the sizes of the original page and image sizes. It returns a total download size."

Here's an untested example:

use HTTP::Size;
my $total = 
    HTTP::Size::get_sizes( 'http://www.perlmonks.org' );
print "$total\n";

Hope this helps...

UPDATE: OK, now I've installed the module and tested the snippet above. It works, and on at least one of the test runs the total size of the PerlMonks front page was 87652 bytes. That will change depending on the amount of chatterbox text, the number and size of front-paged articles, etc.
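
The original question also asks about headers. As a rough sketch of how header overhead could be estimated for a single response, the following adds up the status line, the header lines and the body. It assumes the core module HTTP::Tiny (not mentioned in this thread) and a hypothetical helper named response_size; the result is an approximation, since the client library normalizes header names and decodes chunked transfer encoding before the script sees the data.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Estimate the bytes on the wire for one response: status line,
# header lines, the blank separator line, and the body.
# $res is a hashref in the shape returned by HTTP::Tiny->get.
# response_size is a hypothetical name used for this sketch.
sub response_size {
    my ($res) = @_;
    my $bytes = length("HTTP/1.1 $res->{status} $res->{reason}\r\n");
    for my $name (keys %{ $res->{headers} }) {
        my $value = $res->{headers}{$name};
        # Repeated headers arrive as an array reference.
        for my $v (ref $value eq 'ARRAY' ? @$value : $value) {
            $bytes += length("$name: $v\r\n");
        }
    }
    $bytes += 2;                          # blank line that ends the headers
    $bytes += length($res->{content});    # body as handed over by HTTP::Tiny
    return $bytes;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $res = HTTP::Tiny->new->get($ARGV[0]);
    die "Fetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
    print response_size($res), " bytes (approximate)\n";
}
```

Run it with a URL as the only argument. Images, style sheets and scripts are separate requests, so each of them would need the same treatment and their sizes would have to be added to the total.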


Dave

Re: Size of a webpage
created: 2004-06-16 04:21:26

Have a look at Apache::Dynagzip. You can use it on static as well as dynamic content. By compressing content before you send it, a 50KB page becomes a 5-10KB bandwidth transmission. Almost all modern browsers will accept compressed content, so it is win-win. Google, Slashdot and almost all the biggies use compression. You should also look at removing all the extraneous whitespace from the docs you serve. View source on Google, for example: no spare spaces get sent. Compress::LeadingBlankSpaces (works with Dynagzip) will do this for you.
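
To get a feel for how much compression and whitespace removal might save on a given page, here is a minimal offline sketch. It assumes the core module IO::Compress::Gzip and uses two hypothetical helpers, gzipped_size and strip_leading_blanks; the sample page is made up, and the stripping is only in the spirit of what is described above, not the actual behaviour of Compress::LeadingBlankSpaces.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Return the size in bytes of a string after gzip compression.
sub gzipped_size {
    my ($data) = @_;
    my $compressed;
    gzip(\$data => \$compressed)
        or die "gzip failed: $GzipError\n";
    return length($compressed);
}

# Remove leading spaces and tabs from every line.
# Note: this would also alter text inside <pre> blocks.
sub strip_leading_blanks {
    my ($html) = @_;
    $html =~ s/^[ \t]+//mg;
    return $html;
}

# Hypothetical sample page: repetitive, indented markup.
my $page = "<html>\n  <body>\n"
         . ("      <p>Hello, world!</p>\n" x 500)
         . "  </body>\n</html>\n";

my $stripped = strip_leading_blanks($page);
printf "original: %d bytes, gzipped: %d bytes\n",
    length($page), gzipped_size($page);
printf "stripped: %d bytes, gzipped: %d bytes\n",
    length($stripped), gzipped_size($stripped);
```

Real pages are far less repetitive than this sample, so the ratio you see on your own content will differ.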

cheers

tachyon

Re^2: Size of a webpage
created: 2004-06-17 05:22:58
Be aware that MSIE (5.5 SP1/2 and 6 without SP) corrupts cached compressed pages. A fresh install of XP (IE6), for example, shows corrupted pages the second time you access a site that gzips its pages and allows them to be cached (by sending Last-Modified headers, etc.).
See some of the MS articles about this: Q313712 and Q312496.
Sadly, thanks to this widespread bug, we have to choose between compressing pages and allowing caching.
Slashdot and Google don't allow caching, I think.
José
Re^3: Size of a webpage
created: 2004-06-17 19:04:43

Hey, this is Perl... If you read the docs you will see:

It is strongly recommended to use Apache::CompressClientFixup handler in order to avoid compression for known buggy browsers. Apache::CompressClientFixup package can be found on CPAN

This works with any of the gzip compression modules and does not serve gzip-compressed content to buggy browsers. The articles you quote are cited in its docs. So you can have your cake and eat it.
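
The general idea of skipping compression for known buggy clients can be sketched as a check on two request headers. This is only an illustration: the function name should_compress and the MSIE pattern are assumptions made for this example, not the rule set of Apache::CompressClientFixup, and Accept-Encoding q-values are ignored.

```perl
use strict;
use warnings;

# Decide whether to gzip a response, given two request header values.
# The MSIE pattern below is an illustrative assumption, not the logic
# of Apache::CompressClientFixup. Accept-Encoding q-values are ignored.
sub should_compress {
    my ($user_agent, $accept_encoding) = @_;

    # The client must advertise gzip support at all.
    return 0 unless defined $accept_encoding
                 && $accept_encoding =~ /\bgzip\b/i;

    # Skip the MSIE versions reported as corrupting cached gzip pages.
    return 0 if defined $user_agent
             && $user_agent =~ /\bMSIE (?:5\.5|6\.)/;

    return 1;
}

for my $ua ('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
            'Mozilla/5.0 (X11; U; Linux i686) Gecko/20040614 Firefox/0.9') {
    printf "%-62s => %s\n", $ua,
        should_compress($ua, 'gzip, deflate') ? 'compress' : 'plain';
}
```

A check this simple cannot tell a patched MSIE 6 from an unpatched one, so it trades bandwidth for safety on every client that matches.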

cheers

tachyon

Re^4: Size of a webpage
created: 2004-06-18 03:48:52
As of version 0.07 of CompressClientFixup, the documentation states that "This version of the handler does not restrict compression for MSIE over HTTP".
So this version still sends compressed content to buggy browsers.
Anyway, it would make no sense to change the module to restrict compression for MSIE 5.5 and 6 (maybe 85% of users? or 95%?); in that case, simply don't use compression.
I warn about this because playing with gzipped pages gave me a big big headache.
Regards,
José.
Re: Size of a webpage
js1
created: 2004-06-16 04:43:20

That module doesn't take into account the CSS style sheet and page referral, unfortunately.

js1.

Re^2: Size of a webpage
created: 2004-06-16 04:59:13

Stop complaining! Why not just patch it? It is trivial to patch. See Re: Who wants to help me adjust LinkExtor::Simple? for details of how to make LinkExtor::Simple extract any links you like. Then call the style and extjs methods you just created in the same way that $extor->img gets looped over in HTTP::Size.
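
For anyone who just wants to see which extra URLs such a patch would need to pick up, here is a crude stand-alone sketch. It uses regular expressions rather than a real HTML parser, so treat it as an illustration only; the function name extract_resources and the sample markup are made up for this example, and it is not how LinkExtor::Simple or HTTP::Size work internally.

```perl
use strict;
use warnings;

# Collect the URLs of external style sheets, scripts and images.
# A crude regex scan for illustration; a real HTML parser is more robust.
sub extract_resources {
    my ($html) = @_;
    my %found = (style => [], script => [], img => []);

    while ($html =~ /<(link|script|img)\b([^>]*)>/gi) {
        my ($tag, $attrs) = (lc($1), $2);
        if ($tag eq 'link') {
            # Only <link rel="stylesheet" ...> counts as a style sheet.
            next unless $attrs =~ /\brel\s*=\s*["']?stylesheet\b/i;
            push @{ $found{style} }, $1
                if $attrs =~ /\bhref\s*=\s*["']([^"']+)["']/i;
        }
        else {
            push @{ $found{$tag} }, $1
                if $attrs =~ /\bsrc\s*=\s*["']([^"']+)["']/i;
        }
    }
    return \%found;
}

# Hypothetical sample markup.
my $html = <<'HTML';
<html><head>
<link rel="stylesheet" type="text/css" href="/main.css">
<link rel="icon" href="/favicon.ico">
<script type="text/javascript" src="/app.js"></script>
</head><body>
<img src="/logo.gif" alt="logo"><img src="/button.gif" alt="go">
</body></html>
HTML

my $found = extract_resources($html);
print "$_: @{ $found->{$_} }\n" for sort keys %$found;
```

Each URL found this way could then be fetched and its size added to the page total, in the same spirit as the image loop mentioned above.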

Although you can easily do it, remember that CSS, JS, icons, button images, etc. will generally get cached locally, so while they might be part of a page they are typically reusable elements.
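
The effect of local caching on the numbers can be shown with a little arithmetic. The sizes below are invented purely for illustration, and page_weight is a hypothetical helper: it sums every resource on a first visit and skips the cacheable ones on a repeat visit.

```perl
use strict;
use warnings;

# Hypothetical resources with made-up sizes in bytes.
my @resources = (
    { url => '/index.html', bytes => 12_000, cacheable => 0 },
    { url => '/main.css',   bytes =>  3_000, cacheable => 1 },
    { url => '/app.js',     bytes =>  5_000, cacheable => 1 },
    { url => '/logo.gif',   bytes =>  8_000, cacheable => 1 },
);

# Sum the bytes transferred for one page view. On a repeat visit,
# resources marked cacheable are assumed to come from the local cache.
sub page_weight {
    my ($resources, $repeat_visit) = @_;
    my $total = 0;
    for my $r (@$resources) {
        next if $repeat_visit && $r->{cacheable};
        $total += $r->{bytes};
    }
    return $total;
}

printf "first visit:  %d bytes\n", page_weight(\@resources, 0);  # 28000
printf "repeat visit: %d bytes\n", page_weight(\@resources, 1);  # 12000
```

Which figure matters depends on whether you care about what a first-time visitor downloads or about the average bandwidth per page view.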

The ultimate way to do it is to use a logging proxy of some sort. HTTP::Proxy with HTTP::Recorder might be an option.

cheers

tachyon

Re^2: Size of a webpage
created: 2004-06-16 05:34:51
Could you explain what page referral has to do with the page size? I'm getting confused.

J.

Re: Size of a webpage
created: 2004-06-16 04:59:04
There was a very similar question yesterday: determine web page size, w/ and w/o images.

pelagic

perlmonks.org content © perlmonks.org and davido, dont_you, Joost, js1, pelagic, tachyon
