Archive mail into a database
jkva
created: 2006-01-11 09:59:16
Previously, archived e-mail was kept in one huge mailbox file (roughly 3 GB). To make searching it, and manipulating it in the future, easier, I wrote a little script that reads the e-mail files dumped by the Exim MTA and writes their data into a MySQL database.

#!/usr/bin/perl

use strict;
use warnings;

use DBI;

my $dbh = undef;
my $sth = undef;

my %const = (
   dbhost => 'localhost',
   dbname => 'mail',
   dblogin => '',
   dbpassword => '',
   dbhandler => \$dbh,
   statementhandler => \$sth,
   statement => '',
   maildir => '/usr/db_mail/',
   currentfile => '',
   mail_datetime => '',
   mail_headers => '',
   mail_from => '',
   mail_to => '',
   mail_cc => '',
   mail_subject => '',
   mail_body => '',
);

sub dbconnect {
  $const{dbhandler} = DBI->connect("DBI:mysql:$const{dbname}:$const{dbhost}",$const{dblogin},$const{dbpassword})
    or die "Can't connect to $const{dbname}: $DBI::errstr\n";
}

sub dbdisconnect {
  if($const{dbhandler}) {
    $const{statementhandler}->finish() if $const{statementhandler};
    $const{dbhandler}->disconnect();
  }
}

sub insert {
  $const{mail_body} = substr $const{mail_body}, 0, 1000000; #first MB(well, almost) of body
  $const{statement} = qq[INSERT INTO archive VALUES(?,?,?,?,?,?,?,?)];
  dbconnect();
  $const{statementhandler} = $const{dbhandler}->prepare($const{statement});
  $const{statementhandler}->execute(undef,$const{mail_datetime},$const{mail_from},$const{mail_to},$const{mail_cc},$const{mail_subject},$const{mail_headers},$const{mail_body});
  dbdisconnect();
}

sub parse {
  return unless (-f $_[0]) && ($_[0] =~ /^\Q$const{maildir}\E/); # regular files in maildir only
  open(FILE, '<', $_[0]) or return;
  my $mail = do { local $/ = undef; <FILE> }; # slurp the whole file at once
  close(FILE);
  return if !$mail;
  $const{mail_datetime} = $1 if $mail =~ m/Delivery-date: (.*?)\n/s;
  $const{mail_from} = $1 if $mail =~ m/From: (.*?)\n/s;
  $const{mail_to} = $1 if $mail =~ m/To: (.*?)\n/s;
  $const{mail_cc} = $1 if $mail =~ m/Cc: (.*?)\n/s;
  $const{mail_subject} = $1 if $mail =~ m/Subject: (.*?)\n/s;
  $const{mail_headers} = $1 if $mail =~ m/^(.*?)\n\n/s;
  $const{mail_body} = $1 if $mail =~ m/\n\n(.*?)$/s;
  insert();
  unlink $_[0] if system("mv",$_[0],"$const{maildir}parsed/$const{currentfile}") != 0;
}

opendir(MAILDIR, $const{maildir}) or exit 10; #Can't open maildir
foreach my $thisfile (readdir(MAILDIR)) {
  $const{currentfile} = $thisfile;
  parse("$const{maildir}$thisfile");
}
closedir(MAILDIR);

exit 0;
Re: Archive mail into a database
created: 2006-01-11 11:01:09

Putting all your variables into a hash completely defeats the benefits of using strict: a misspelled hash key is not caught at compile time the way a misspelled variable name is.
You may as well make them all global variables and turn off strict. Your code would be shorter and easier to read.

Now, if you really are tied :-) to the hash-based variable approach, you could at least use something like Tie::StrictHash, or the possibly more namespace-scalable Tie::SecureHash.
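Both Tie:: modules come from CPAN. As an alternative that ships with Perl, the core Hash::Util module's lock_keys gives a similar safety net: once the key set is locked, touching a key that isn't in it dies. Note that this is a run-time check, not the compile-time check that strict gives for variable names. The sketch below uses a cut-down %const with only a few of the script's keys, and the misspelled key mial_to is made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);   # core module since Perl 5.8

# A cut-down version of the script's %const hash (only a few keys shown).
my %const = (
    maildir      => '/usr/db_mail/',
    currentfile  => '',
    mail_to      => '',
    mail_subject => '',
);

# Freeze the set of allowed keys: values can still change,
# but reading or writing a key that isn't listed dies at run time.
lock_keys(%const);

$const{mail_to} = 'someone@example.com';        # fine: existing key

my $ok = eval { $const{mial_to} = 'typo'; 1 };  # misspelled key
print $ok ? "typo accepted\n" : "typo rejected: $@";
```

Unlike a tied hash, a locked hash is still a plain hash, so there is no per-access tie overhead.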

We're building the house of the future together.
Re: Archive mail into a database
created: 2006-01-11 11:26:24

I don't believe that you really need to care about that close() failing, since the file is open for reading. (Of course it doesn't hurt to check.)
But I'd be much more concerned about the possible failure of those move and unlink calls.

Btw... Why are you using external 'mv'? You could use the move function of the standard File::Copy module; it is both portable and 'smart'.
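A minimal sketch of acting on both points, using File::Copy's move and checking its result. One design choice here is an assumption: on failure it warns and leaves the message where it is, whereas the original unlink deletes the file exactly when mv fails. archive_file is a hypothetical helper name, and the demo runs in a throwaway temporary directory rather than /usr/db_mail/.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(move);          # core module
use File::Spec;
use File::Temp qw(tempdir);

# Move a processed mail file into the "parsed" subdirectory.
# Returns true on success; on failure it warns and leaves the
# original file where it is, so the message is not lost.
sub archive_file {
    my ($file, $parsed_dir) = @_;
    my (undef, undef, $name) = File::Spec->splitpath($file);
    my $target = File::Spec->catfile($parsed_dir, $name);
    unless (move($file, $target)) {
        warn "Could not move $file to $target: $!\n";
        return 0;
    }
    return 1;
}

# Demonstration in a throwaway directory.
my $maildir = tempdir(CLEANUP => 1);
my $parsed  = File::Spec->catdir($maildir, 'parsed');
mkdir $parsed or die "mkdir $parsed: $!";

my $mail = File::Spec->catfile($maildir, 'msg1');
open(my $fh, '>', $mail) or die "open $mail: $!";
print {$fh} "Subject: test\n\nbody\n";
close($fh) or die "close $mail: $!";

my $moved = archive_file($mail, $parsed);
print $moved ? "moved\n" : "not moved\n";
```

Because move reports failure through its return value and $!, the caller can decide what to do (retry, log, leave the file) instead of relying on the exit status of an external command.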

Re: Archive mail into a database
zby
created: 2006-01-11 11:40:57
How about using an existing module to parse the e-mails instead of regexes? A quick search on CPAN turned up Email::Simple and Email::Abstract, though I don't know whether they are of any use here.
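To show what such a module takes care of, here is a core-Perl-only sketch (not Email::Simple's API) of the basic header handling: the headers end at the first empty line, folded continuation lines are joined, and field names are matched case-insensitively and only at the start of a header line. That means a "From:" quoted in the body, or a "Resent-From:" header, is not mistaken for the sender as it can be with the script's unanchored regexes. parse_message is a hypothetical name, and MIME and encoded-word handling, where a real module earns its keep, is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a raw message into headers and body: headers end at the first
# empty line, folded (continued) header lines are joined, and field
# names are matched case-insensitively at the start of a header line.
sub parse_message {
    my ($raw) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    $head = '' unless defined $head;
    $body = '' unless defined $body;

    $head =~ s/\r?\n[ \t]+/ /g;          # unfold continuation lines

    my %field;
    for my $line (split /\r?\n/, $head) {
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/ or next;
        $name = lc $name;
        # keep the first occurrence of each field
        $field{$name} = $value unless exists $field{$name};
    }
    return (\%field, $head, $body);
}

my $raw = join "\n",
    'Delivery-date: Wed, 11 Jan 2006 09:59:16 +0100',
    'From: alice@example.com',
    'To: bob@example.com,',
    '    carol@example.com',
    'Subject: test',
    '',
    'Forwarded text:',
    'From: mallory@example.com',
    '';

my ($field, $headers, $body) = parse_message($raw);
print "from: $field->{from}\n";
print "to:   $field->{to}\n";
print "cc:   ", (defined $field->{cc} ? $field->{cc} : '(none)'), "\n";
```

The "From:" line in the body is ignored, and the folded "To:" header comes back as a single value.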
Re: Archive mail into a database
created: 2006-01-11 12:03:31

It seems to me that you could get "unexpected" results by doing

  $const{mail_to} = $1 if $mail =~ m/To: (.*?)\n/s;
That only sets the variable if the match succeeds; if it fails, the variable keeps its current value, which, because of the way you're using global variables, could be the value found in the previous file. I think I'd do it this way:
  ($const{mail_to}) = $mail =~ m/To: (.*?)\n/s;
The parentheses put the match in list context, so it returns the captured text (or an empty list if there is no match) instead of a true/false flag. That way, the variable properly gets undef if no match is made. Personally, I'd be inclined to pass the values as arguments to insert, rather than using global variables.
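A minimal sketch of both suggestions together: list-context captures, and values passed around in a fresh lexical hash rather than through %const. extract_fields and bind_values are hypothetical names. bind_values only builds the list of values, in the column order of the script's INSERT INTO archive VALUES(?,?,?,?,?,?,?,?) statement, and does not touch the database. The captures are also restricted to the header block and anchored at line starts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Every call builds a fresh lexical hash, so nothing can leak over
# from the previous message.
sub extract_fields {
    my ($mail) = @_;
    my ($headers, $body) = split /\n\n/, $mail, 2;
    $headers = '' unless defined $headers;
    my %f = (headers => $headers, body => $body);
    # List assignment: the capture (or undef on no match) is stored,
    # not the match's true/false result.
    ($f{datetime}) = $headers =~ m/^Delivery-date: (.*)$/m;
    ($f{from})     = $headers =~ m/^From: (.*)$/m;
    ($f{to})       = $headers =~ m/^To: (.*)$/m;
    ($f{cc})       = $headers =~ m/^Cc: (.*)$/m;
    ($f{subject})  = $headers =~ m/^Subject: (.*)$/m;
    return \%f;
}

# Values in the order used by the 8-placeholder INSERT:
# id (undef), datetime, from, to, cc, subject, headers, body.
sub bind_values {
    my ($f) = @_;
    return (undef, @{$f}{qw(datetime from to cc subject headers body)});
}

my $first  = extract_fields("From: a\@example.com\nCc: c\@example.com\n\nhello\n");
my $second = extract_fields("From: b\@example.com\n\nhi\n");

my @row = bind_values($second);
print scalar(@row), " values; cc is ",
      (defined $row[4] ? $row[4] : 'undef'), "\n";
```

Even though the first message had a Cc header, the second message's cc comes out undef. In the script, the resulting list could be handed straight to $sth->execute(@row).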

perlmonks.org content © perlmonks.org and jdporter, jkva, zby
