#!/usr/bin/perl -w
use strict;
use File::Copy qw(copy);
use LWP::Simple qw(mirror is_error);

my $url  = 'http://...';
my $file = '/home/juerd/tmp/foo.html';

copy $file, "$file.old" or warn $!;
my $status = mirror $url, $file;
warn "HTTP $status" if is_error $status;
system qw(diff -u), "$file.old", $file;
 #!/usr/bin/perl -w
 use strict;
+use File::Copy qw(copy);
-use LWP::Simple qw(mirror is_success);
+use LWP::Simple qw(mirror is_error);
 my $url = 'http://...';
 my $file = '/home/juerd/tmp/foo.html';
-rename $file, "$file.old" or warn $!;
+copy $file, "$file.old" or warn $!;
 my $status = mirror $url, $file;
-warn "HTTP $status" unless is_success $status;
+warn "HTTP $status" if is_error $status;
 system qw(diff -u), "$file.old", $file;
In this snippet, you actually download the whole file each time the script runs (from cron, for example). Wouldn't it be nicer to just ask for a HEAD, check the "Last-Modified" header, and do some local testing on that?
$ HEAD http://www.server.tld/page.htm | grep "Last-Modified"
In this snippet, you actually download the whole file each time the script runs (from cron, for example).
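In Perl, the same check might look roughly like the sketch below. It assumes LWP::Simple is installed (its head function returns the modification time as the third value of the list). The helper name remote_is_newer and the command-line handling are made up for illustration and are not part of the snippet.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the remote copy looks newer than the local file.
# $remote_mtime is in epoch seconds (undef if there was no Last-Modified header).
sub remote_is_newer {
    my ($remote_mtime, $file) = @_;
    return 1 unless -e $file;               # nothing local yet: fetch
    return 1 unless defined $remote_mtime;  # no header: cannot tell, so fetch
    my $local_mtime = (stat $file)[9];
    return $remote_mtime > $local_mtime ? 1 : 0;
}

# Only touch the network when a URL and a file name are given.
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    require LWP::Simple;
    # In list context head() returns
    # (content type, length, modified time, expires, server),
    # or an empty list on failure.
    my (undef, undef, $modified) = LWP::Simple::head($url)
        or die "HEAD $url failed\n";
    print remote_is_newer($modified, $file) ? "changed\n" : "unchanged\n";
}
```

Run it as, say, `perl check.pl URL FILE` (the script name is a placeholder). Whether the Last-Modified value can be trusted depends on the server.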
Not true.
From the documentation of LWP::UserAgent, which LWP::Simple uses under the hood:
$ua->mirror( $url, $filename )

This method will get the document identified by $url and store it in file called $filename. If the file already exists, then the request will contain an "If-Modified-Since" header matching the modification time of the file. If the document on the server has not changed since this time, then nothing happens. If the document has been updated, it will be downloaded again. The modification time of the file will be forced to match that of the server.
Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
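Put differently, the status that mirror returns already says whether anything was fetched: 304 means the local copy is still current. A minimal sketch, assuming LWP::Simple is installed; the helper describe_status and the command-line handling are hypothetical, added only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn the HTTP status from mirror() into a short verdict.
sub describe_status {
    my ($status) = @_;
    return 'unchanged' if $status == 304;                   # Not Modified
    return 'updated'   if $status >= 200 && $status < 300;  # fresh copy stored
    return 'error'     if $status >= 400;
    return 'other';
}

# Only touch the network when a URL and a file name are given.
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    require LWP::Simple;
    # mirror() sends If-Modified-Since when $file already exists.
    my $status = LWP::Simple::mirror($url, $file);
    print "HTTP $status: ", describe_status($status), "\n";
}
```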
Aren't you defeating the mirror check by renaming $file to "$file.old" before giving $file to the mirror call, by which time it won't exist?
Aren't you defeating the mirror check by renaming $file to "$file.old" before giving $file to the mirror call, by which time it won't exist?
Oops; yes. Updated.
Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
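The difference between the two calls is easy to reproduce without any network access. The sketch below uses only core modules; the temporary directory and the file names are placeholders, not the paths from the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/foo.html";

# Create a stand-in for the previously mirrored page.
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "old content\n";
close $out or die "Cannot close $file: $!";

# copy leaves the original in place, so mirror can still read its mtime.
copy $file, "$file.old" or die "copy failed: $!";
my $exists_after_copy = -e $file ? 1 : 0;

# rename moves it away: mirror would find no file and download everything.
rename $file, "$file.bak" or die "rename failed: $!";
my $exists_after_rename = -e $file ? 1 : 0;

print "after copy:   $exists_after_copy\n";
print "after rename: $exists_after_rename\n";
```

The first line reports 1 and the second 0: only after copy is there still a file whose modification time mirror can send along.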
I'd be even happier if you used Text::Diff or something equivalent instead of a system call. :-)
ihb
I'd be even happier if you used Text::Diff or something equivalent instead of a system call. :-)
For something that runs once per day, it's not worth the trouble. I even use `cat foo` in scripts like this one.
Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
In short, my point is that when sharing it with other monks I'd be happier to see a portable snippet since it doesn't require much work to make it that. Of course, it's better to share a non-portable snippet than not share at all; that's why I said "happier" and not "happy".
It's not worth the trouble for you when you use it, but since this post isn't targeted to you I just figured it would be nice if you patched it so that more could benefit from it. Just as you'd do with any CPAN module you publish.
ihb
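One way to get that portability without giving up the external diff entirely is sketched below. It assumes Text::Diff may or may not be installed (it is a CPAN module, not core) and falls back to the system diff -u when it is missing. The function name unified_diff is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a unified diff of two files as a string.
sub unified_diff {
    my ($old, $new) = @_;

    # Prefer the pure-Perl CPAN module when it is available.
    if (eval { require Text::Diff; 1 }) {
        # Plain strings are treated as file names by Text::Diff.
        return Text::Diff::diff($old, $new, { STYLE => 'Unified' });
    }

    # Otherwise fall back to the external program, as the snippet does.
    open my $pipe, '-|', 'diff', '-u', $old, $new
        or die "Cannot run diff: $!";
    my $output = do { local $/; <$pipe> };
    close $pipe;    # diff exits with 1 when the files differ; not an error here
    return defined $output ? $output : '';
}

# Print the diff when two file names are given on the command line.
if (@ARGV == 2) {
    print unified_diff(@ARGV);
}
```

In the snippet, the system call would then become something like print unified_diff("$file.old", $file).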
perl -le "print unpack'N', pack'B32', '00000000000000000000001011100100'"
I wanted to use it as a replacement for Personal Nodelet, so it has a special (undocumented) feature: links to Perl Monks are internally converted to links to the appropriate The Pen pages.
By the way, most current web browsers can notify you about changes to pages in your bookmarks.
By the way, most current web browsers can notify you about changes to pages in your bookmarks.
I don't just want to know that it changed, I want to know exactly which lines were added and removed. There are numerous scripts that do something like this, but creating a new one is MUCH easier than reading manuals of other scripts, because they're all bloated with features I don't need right now.
Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
It is much more accurate and safe than only using a diff.
Accuracy is irrelevant for text documents. Either a line is the same, or it is not. Besides that, I'm especially interested in *which* lines are different, and how they changed. diff tells me exactly that.
the last modified date. This value is not safe/trustworthy.
It has proven to be worthy of my trust.
Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
perlmonks.org content © perlmonks.org and b10m, BrowserUk, danielcid, ihb, Juerd, rob_au, zby