Random Darwin Award in plain text
codeacrobat
created: 2006-03-22 16:19:31
When I'm hacking in the terminal, I need a break from time to time. One thing I love to do then is read a random Darwin Award.

The text seems to be pretty well hidden in the HTML tree, so I decided on an empirical approach that filters out the surrounding material.
#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;

my $agent = WWW::Mechanize->new( autocheck => 1 );
   $agent->get('http://cgi.darwinawards.com/cgi/random.pl');
my $content = $agent->content( format => "text" );
my $cr = chr 169;
$content =~ s/.*\d\d\s+Urban Legend//s;
$content =~ s/.*\d\d\s+Personal Account//s;
$content =~ s/.*Reader Submission\s+Pending Acceptance//s;
$content =~ s/\s*DarwinAwards\.com\s*$cr.*//s;
$content =~ s/.*?\([^\)]*?\d{2}[^\)]*\) //s;
$content =~ s/.*Darwin\s?Award\s?Nominee//si;
$content =~ s/.*Confirmed \S+\s?by Darwin//si;
$content =~ s/.*Honorable Mentions//s;
$content =~ s/submitted by.*//si;
$content =~ s/109876543210.*//s;
$content =~ s/^\s+//;

print $content;
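
To see what the substitutions do without hitting the network, here is a minimal sketch that applies a subset of them to a hard-coded string. The sample text and the helper name strip_wrapping are invented for illustration; the real page layout may well differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical page text, invented for illustration only.
my $sample = join "\n",
    'Site navigation',
    '2005 Darwin Award Nominee',
    'Confirmed True by Darwin',
    '(12 March 2005, Somewhere) A man tried something unwise.',
    'Submitted by: a reader',
    'DarwinAwards.com footer';

# Hypothetical helper wrapping a few of the substitutions from the script.
sub strip_wrapping {
    my ($text) = @_;
    # Greedy .* with /s swallows everything up to the LAST marker match.
    $text =~ s/.*Darwin\s?Award\s?Nominee//si;
    $text =~ s/.*Confirmed \S+\s?by Darwin//si;
    # Drop the credit line and everything after it.
    $text =~ s/submitted by.*//si;
    # Trim leading and trailing whitespace.
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;
    return $text;
}

print strip_wrapping($sample), "\n";
```

Because each header pattern starts with a greedy .* under /s, it removes the marker and everything before it, while the "submitted by" pattern removes the marker and everything after it.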
Re: Random Darwin Award in plain text
created: 2006-03-23 08:19:33
A valuable replacement for your usual fortune cookies :)
Re^2: Random Darwin Award in plain text
created: 2006-03-23 16:48:12
fortune cookies? who needs fortune cookies ;-)
perl -MLWP::Simple -e '@_ = split/\%\n/, get(q(http://phd.pp.ru/Texts/fun/signatures.txt));print splice @_, @_*rand,1'
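
For readers who want the one-liner unpacked, here is a rough sketch of the same idea on hard-coded data: split the text on lines holding only "%" and pick one record at random. The sample string and the helper name random_record are assumptions for illustration, and it indexes the array instead of using splice.

```perl
use strict;
use warnings;

# Placeholder data in fortune-file style: records separated by a "%" line.
my $text = "First quip\n%\nSecond quip\n%\nThird quip\n";

# Hypothetical helper: return one randomly chosen record.
sub random_record {
    my ($data) = @_;
    my @records = split /%\n/, $data;
    # rand(@records) gives a float in [0, count); int() truncates it
    # to a valid array index.
    return $records[ int rand @records ];
}

print random_record($text);
```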
Re: Random Darwin Award in plain text
created: 2006-03-29 11:35:21

I quite like this, thank you. I find that not all entries have a trailing newline, so I added this to the code before the print statement: $content = $content . "\n";
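
A small variation, sketched here as an assumption rather than something from the original script, appends the newline only when the text does not already end with one, so entries that have a trailing newline are not given a second. The helper name ensure_trailing_newline is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: append a newline only when the text lacks one.
sub ensure_trailing_newline {
    my ($text) = @_;
    $text .= "\n" unless $text =~ /\n\z/;
    return $text;
}

print ensure_trailing_newline("no newline yet");
print ensure_trailing_newline("already terminated\n");
```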

Is there a way to run fmt -72 or something similar on this text block to have it break the lines neatly?

Re^2: Random Darwin Award in plain text
created: 2006-03-29 18:15:37
Ah, good idea. As for the formatting:
perl darwin.pl | fmt -72
would do it.

A Perl-only solution could use Text::Wrap:
#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;
use Text::Wrap qw(wrap);

my $agent = WWW::Mechanize->new( autocheck => 1 );
   $agent->get('http://cgi.darwinawards.com/cgi/random.pl');
my $content = $agent->content( format => "text" );
my $cr = chr 169;
$content =~ s/.*\d\d\s+Urban Legend//s;
$content =~ s/.*\d\d\s+Personal Account//s;
$content =~ s/.*Reader Submission\s+Pending Acceptance//s;
$content =~ s/\s*DarwinAwards\.com\s*$cr.*//s;
$content =~ s/.*?\([^\)]*?\d{2}[^\)]*\) //s;
$content =~ s/.*Darwin\s?Award\s?Nominee//si;
$content =~ s/.*Confirmed \S+\s?by Darwin//si;
$content =~ s/.*Honorable Mentions//s;
$content =~ s/submitted by.*//si;
$content =~ s/109876543210.*//s;
$content =~ s/^\s+//;

print wrap("\t", "", "$content\n");
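
To get closer to fmt -72, the line width can be set through $Text::Wrap::columns. The sketch below uses placeholder text in place of a fetched story and drops the tab indent so the line lengths are easy to check; both choices are assumptions for illustration.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Text::Wrap keeps each line to at most $columns - 1 characters,
# so 73 yields lines of at most 72 characters.
$Text::Wrap::columns = 73;

# Placeholder text standing in for a fetched story.
my $content = join ' ', ('An unwise plan met a predictable end.') x 6;

# No indent on the first or on subsequent lines.
my $wrapped = wrap('', '', "$content\n");
print $wrapped;
```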

perlmonks.org content © perlmonks.org and codeacrobat, wazoox, willyyam
