# ------------------------------------------------------------------------------------------------------------------------------
# This perl script is designed to run through all of the mailboxes in /mnt/mail on
# mail.visp.co.nz removing all corrupted data from the start of any corrupted
# mailboxes.
#
# Coded By : Oliver Sneyd
# When : February 2006
# Contact : oliver.sneyd@mail.iconz.net
# -----------------------------------------------------------------------------------------------------------------------------
# Include the file statistics object so that the script can check the filesizes of each mailbox
use File::stat;
# Path to the maildir, /mnt/mail for mail.visp.co.nz
$path = "./mail/";
$backupPath = "./backup/";
# Open up a directory handle
print "\n\tGenerating mailbox list ...\n";
opendir(MAILDIR, $path);
# Read the names of each entry in the maildir into an array
@filenames = readdir(MAILDIR);
# Create an array to hold the mailboxes
my @mailboxes = ();
# Loop through the results returned by the directory handle
for($i = 0; $i < @filenames; $i++)
{
# If the result returned by directory handle is NOT a directory
if(not(-d ($path . $filenames[$i])))
{
# Work out the file-size of the current mailbox
$size = stat($path . $filenames[$i]);
# If the filesize is greater than 0, add it to the mailbox list
if($size->size > 0)
{
push(@mailboxes, $filenames[$i]);
}
}
}
# Loop through the mailboxes
print "\tChecking for corrupt mailboxes ...\n\n";
while(@mailboxes > 0)
{
$mailbox = pop(@mailboxes);
checkMailbox($mailbox);
}
print "\n\tDone.\n\n";
# Close the directory handle
closedir(MAILDIR);
# ------------------------------------------------------------ FUNCTIONS ---------------------------------------------------
sub checkMailbox
{
# Set a corrupt variable to be true
$corrupt = 1;
# Loop until corrupt is false
$initial = 0;
while($corrupt == 1)
{
# Open up the mailbox
open(MAILBOX, ($path . $_[0]));
# Read in the first line of the mailbox
$line = <MAILBOX>;
# Get the index of the string "From"
$idx = index($line, "From");
# If the index of "From" is 0, the mailbox isn't corrupted any more
if($idx == 0)
{
# So set corrupted to false
$corrupt = 0;
}
else
{
# Make a backup of the corrupted mailbox, just in case
if($initial == 0)
{
print "\tFixing $_[0] ...\n";
system("cp $path" . $_[0] . " " . $backupPath . ".");
$initial = 1;
}
# And remove the first line of the mailbox
system("sed -e '1d' $path" . "$_[0] | more > $path" . $_[0]);
}
# Close the mailbox
close(MAILBOX);
}
}
This doesn't explain the problem you're having, but have you considered the possibility that the 'garbage' at the start of your mailbox files might span multiple lines? Your code doesn't account for that possibility.
More to the point, you don't check for failure on open, yet later you use system to overwrite the file. While it doesn't seem likely that you would be able to overwrite a file that you couldn't open, it makes me nervous to see you opening a file without checking the result of that open.
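As a minimal sketch of what checking the open could look like (the helper name first_line is hypothetical and not part of the original script), something along these lines reports the failure instead of carrying on with an unopened handle:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line of a mailbox, or nothing
# (undef in scalar context) if the file cannot be opened.
sub first_line {
    my ($file) = @_;

    # Three-argument open with a lexical handle; bail out on failure
    # rather than reading from a handle that was never opened.
    open(my $fh, '<', $file) or do {
        warn "Can't open $file: $!\n";
        return;
    };

    my $line = <$fh>;   # undef if the file is empty
    close($fh);
    return $line;
}
```

The caller can then skip the mailbox when the result is undefined, so the later rewrite step never runs against a file that could not be read.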
Wouldn't sed -i -e '1d' $file be a more idiomatic way to write it?
I believe that piping output into the file you're stream-editing is not the most reliable thing to do. In fact, I'm pretty sure that's why your buffering by "|more" prevents the file from being clobbered.
-i extension
Edit files in-place, saving backups with the specified extension.
If a zero-length extension is given, no backup will be saved. It
is not recommended to give a zero-length extension when in-place
editing files, as you risk corruption or partial content in situations
where disk space is exhausted, etc.
sed: illegal option -- i
usage: sed script [-an] [file ...]
sed [-an] [-e script] ... [-f script_file] ... [file ...]
system("sed -e '1d' $path" . "$_[0] | more > $path" . $_[0]);
Oops! You want to edit a file "in place", but by redirecting output to the same location as your input is supposed to be, you effectively truncate that file before it can be processed.
When updating file contents you should make sure input and output don't interfere. One approach could be like this: Since you already use perl to read the first line, why don't you just read on until you find a "From" line, and then start copying that and what follows to another file. Finally you can move the result back to the original location.
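The approach just described could be sketched roughly as follows. The helper name strip_leading_garbage and the temporary file name are assumptions made for illustration; the pattern /^From/ mirrors the check in the original script.

```perl
use strict;
use warnings;
use File::Copy qw(move);

# Hypothetical helper: drop everything before the first line that starts
# with "From", writing to a temporary file and then moving it over the
# original. Returns the number of lines dropped.
sub strip_leading_garbage {
    my ($file) = @_;
    my $tmp = "$file.tmp.$$";   # assumed temp name, in the same directory

    open(my $in,  '<', $file) or die "Can't read $file: $!";
    open(my $out, '>', $tmp)  or die "Can't write $tmp: $!";

    my ($seen_from, $dropped) = (0, 0);
    while (my $line = <$in>) {
        # Start copying at the first "From" line and keep everything after it
        $seen_from = 1 if !$seen_from && $line =~ /^From/;
        if ($seen_from) { print {$out} $line }
        else            { $dropped++ }
    }
    close($in);
    close($out) or die "Can't finish writing $tmp: $!";

    # Input and output never overlap; the original is replaced only at the end
    move($tmp, $file) or die "Can't replace $file: $!";
    return $dropped;
}
```

Note that a mailbox containing no "From" line at all ends up empty with this sketch, so keeping the backup copy the original script makes is still a good idea.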
Of course, perl has builtins that can do most of the work for you. Like, for example:
perl -n -i.bak -e 'print if /^From/..-1' mail_file
This snippet removes all lines before the first occurrence of a line starting with the four letters F, r, o, m from mail_file, leaving a backup of the original in mail_file.bak.
You should also make sure no mails are delivered while you are working on real life mailbox hierarchies.
If only one process were involved, the outcome would be quite predictable. However, since you constructed a pipeline of two processes, there is a chance that the first one wins the race and reads a portion of the file before the second one destroys it. As you already observed, though, you cannot rely on that.
To solve that problem you can use a temporary file (like perl -i does behind the scenes) or read and write to the file through a single file handle in a single process, which may prove somewhat more difficult to get right.
If you are interested anyway you may want to look up file access modes in perlopentut, specifically +<. You also might find the truncate function useful. node 21664 has excellent explanations of the different techniques.
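For illustration only, here is one way the single-handle technique might look, using the +< mode together with seek and truncate. The helper name strip_in_place is hypothetical, and the sketch assumes the mailbox is small enough to hold in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: remove leading garbage through one read/write handle.
# Returns the number of lines kept.
sub strip_in_place {
    my ($file) = @_;

    # '+<' opens for reading and writing without clobbering the file
    open(my $fh, '+<', $file) or die "Can't open $file: $!";

    # Read everything first, then discard lines until one starts with "From"
    my @lines = <$fh>;
    shift @lines while @lines && $lines[0] !~ /^From/;

    # Rewind, write the kept lines, and cut the file off at the new end
    seek($fh, 0, 0)          or die "Can't rewind $file: $!";
    print {$fh} @lines;
    truncate($fh, tell($fh)) or die "Can't truncate $file: $!";
    close($fh)               or die "Can't close $file: $!";

    return scalar @lines;
}
```

Unlike the temporary-file approach, a crash between the write and the truncate can leave the mailbox in a mixed state, which is part of why this variant is harder to get right.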
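my $path = "./mail";
my $bkup = "./backup";
# readdir needs a directory handle, so use opendir rather than open
opendir MAILDIR, $path or die "Can't open $path: $!";
# Keep only plain files with a non-zero size
for my $mbox ( grep { -f "$path/$_" and -s _ } readdir MAILDIR )
{
# Move the original aside, then write the cleaned copy back in its place
rename "$path/$mbox", "$bkup/$mbox" or die "Can't move $mbox: $!";
system( "perl -ne 'print if /^From/..-1' $bkup/$mbox > $path/$mbox" );
}
closedir MAILDIR;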
(That assumes that the backup directory is not on a distinct disk volume.)
perlmonks.org content © perlmonks.org and capoeiraolly, graff, martin, ptum, rhesa