Strange Problem
capoeiraolly
created: 2006-02-02 16:46:08
Hey all, I'm writing a Perl script to fix corrupted sendmail mailboxes. Basically, the very first thing in a mailbox has to be the word From, otherwise users can't log in. From time to time bits of garbage get entered at the start of the mailbox files, so I've written a Perl script to remove those lines of garbage. The script works fine on my machine (Debian), but when I stick it on my mail server (BSD) it throws a hissy fit: instead of removing the garbage, it will sometimes work and sometimes wipe the entire mailbox file. It's kind of driving me nuts, so any help would be really appreciated. Here's the script:
# ------------------------------------------------------------------------------------------------------------------------------
# This perl script is designed to run through all of the mailboxes in /mnt/mail on
# mail.visp.co.nz removing all corrupted data from the start of any corrupted
# mailboxes.
#
# Coded By : Oliver Sneyd
# When : February 2006
# Contact : oliver.sneyd@mail.iconz.net
# -----------------------------------------------------------------------------------------------------------------------------

# Include the file statistics object so that the script can check the filesizes of each mailbox
use File::stat;

# Path to the maildir, /mnt/mail for mail.visp.co.nz
$path = "./mail/";
$backupPath = "./backup/";

# Open up a directory handle
print "\n\tGenerating mailbox list ...\n";
opendir(MAILDIR, $path);

# Read the names of each entry in the maildir into an array
@filenames = readdir(MAILDIR);

# Create an array to hold the mailboxes
my @mailboxes = ();

# Loop through the results returned by the directory handle
for($i = 0; $i < @filenames; $i++)
{
	# If the result returned by directory handle is NOT a directory 
	if(not(-d ($path . $filenames[$i])))
	{
		# Work out the file-size of the current mailbox
		$size = stat($path . $filenames[$i]);
		
		# If the filesize is greater than 0, add it to the mailbox list
		if($size->size > 0)
		{
			push(@mailboxes, $filenames[$i]);
		}
	}
}

# Loop through the mailboxes
print "\tChecking for corrupt mailboxes ...\n\n";
while(@mailboxes > 0)
{
	$mailbox = pop(@mailboxes);
	checkMailbox($mailbox);
}
print "\n\tDone.\n\n";

# Close the directory handle
closedir(MAILDIR);


# ------------------------------------------------------------ FUNCTIONS ---------------------------------------------------


sub checkMailbox
{
	# Set a corrupt variable to be true
	$corrupt = 1;
	
	# Loop until corrupt is false
	$initial = 0;
	while($corrupt == 1)
	{
		# Open up the mailbox
		open(MAILBOX, ($path . $_[0]));
		
		# Read in the first line of the mailbox 
		$line = <MAILBOX>;
		
		# Get the index of the string "From"
		$idx = index($line, "From");
		
		# If the index of "From" is 0, the mailbox isn't corrupted any more
		if($idx == 0)
		{
			# So set corrupted to false
			$corrupt = 0;
		}
		else
		{
			# Make a backup of the corrupted mailbox, just in case
			if($initial == 0)
			{
				print "\tFixing $_[0] ...\n";
				system("cp $path" . $_[0] . " " . $backupPath . ".");
				$initial = 1;
			}
			
			# And remove the first line of the mailbox
			system("sed -e '1d' $path" . "$_[0] | more > $path" .  $_[0]);
		}
		
		# Close the mailbox
		close(MAILBOX);
	}
	
}
Re: Strange Problem
created: 2006-02-02 17:17:05

This doesn't explain the problem you're having, but have you considered the possibility that the 'garbage' at the start of your mailbox files might span multiple lines? Your code doesn't account for that possibility.

More to the point, you don't check for failure on open, yet later you use system to overwrite the file. While it doesn't seem likely that you would be able to overwrite a file that you couldn't open, it makes me nervous to see you opening a file without checking the result of that open.
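
A minimal sketch of the kind of check being suggested here, assuming a hypothetical helper named first_line (it is not part of the original script). It returns the first line of a file, or undef with a warning when the open fails, so the caller can skip that mailbox rather than go on to overwrite it.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line of $file, or undef if the
# file cannot be opened (an empty file also yields undef).
sub first_line {
    my ($file) = @_;
    open(my $fh, '<', $file) or do {
        warn "Cannot open $file: $!\n";
        return;    # leaves first_line, not just the do block
    };
    my $line = <$fh>;
    close($fh);
    return $line;
}
```

A caller could then write something like `my $line = first_line($mbox); next unless defined $line;` before deciding whether the mailbox needs fixing.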

Re^2: Strange Problem
created: 2006-02-02 17:37:59
Ok the problem is that the script seems to randomly empty the mailbox files (on the BSD box) instead of just removing single/multiple lines of garbage. Sometimes it will work on a mailbox, sometimes it won't... I haven't seen any pattern to it yet.

Haven't bothered with checking the file open yet because I'm still just testing. It's a contained environment that I'm testing this in, not the actual mailboxes.

The code should account for multiple lines of garbage: the while loop reads the file one line at a time until the file is either empty or the index of "From" is 0.

The test I'm running on BSD (errors) is exactly the same as the one that works running on my Debian (no errors) machine. I'm only running it on about 10 mailboxes, with a mixture of corrupted and non corrupted files.
Re: Strange Problem
created: 2006-02-02 17:48:38
Why the pipe through more in your system call to sed?
I'd also prefer to close MAILBOX before calling system commands on the file, but that may be irrelevant.
Re^2: Strange Problem
created: 2006-02-02 17:54:29
If you don't pipe the sed through more it simply wipes the file. Good idea to close the file handle first. Doesn't do anything for the problem though.
Re^3: Strange Problem
created: 2006-02-02 18:09:04
I'm not a sed expert by any stretch of the imagination, but wouldn't
  sed -i -e '1d' $file
be a more idiomatic way to write it?

I believe that piping output into the file you're stream-editing is not the most reliable thing to do. In fact, I'm pretty sure that's why your buffering by "|more" prevents the file from being clobbered.

Re^4: Strange Problem
created: 2006-02-02 18:42:39
I'm no sed expert either, in fact this is the first time I've used it.

That's a much nicer way of doing things, cheers :)
Re^4: Strange Problem
created: 2006-02-02 18:45:10
Damn, sed on BSD doesn't have the -i option. I'll have a look at the man page to see if there is another way to do an in-place edit.
Re^5: Strange Problem
created: 2006-02-02 22:29:56
Um, what sort of BSD are you using? When I do "man sed" on freebsd (and the bsd-based darwin on my mac), I see:
  -i extension
      Edit files in-place, saving backups with the specified extension.
      If a zero-length extension is given, no backup will be saved.  It
      is not recommended to give a zero-length extension when in-place
      editing files, as you risk corruption or partial content in situ-
      ations where disk space is exhausted, etc.
Re^6: Strange Problem
created: 2006-02-07 15:16:31
Not sure which flavour of BSD is installed on this box as I didn't set it up myself, however sed -i produces :
sed: illegal option -- i
usage: sed script [-an] [file ...]
       sed [-an] [-e script] ... [-f script_file] ... [file ...]
Re: Strange Problem
created: 2006-02-02 18:09:12
The main culprit seems to be this line:
system("sed -e '1d' $path" . "$_[0] | more > $path" .  $_[0]);

Oops! You want to edit a file "in place", but by redirecting output to the same location as your input is supposed to be, you effectively truncate that file before it can be processed.

When updating file contents you should make sure input and output don't interfere. One approach could be like this: Since you already use perl to read the first line, why don't you just read on until you find a "From" line, and then start copying that and what follows to another file. Finally you can move the result back to the original location.

Of course, perl has builtins that can do most of the work for you. Like, for example:

perl -n -i.bak -e 'print if /^From/..-1' mail_file
This snippet removes all lines before the first occurrence of a line starting with the four letters F, r, o, m from mail_file, leaving a backup of the original in mail_file.bak.
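
For readers unfamiliar with the `/^From/..-1` part: it is Perl's range (flip-flop) operator. It turns true at the first line matching /^From/, and because a constant right-hand operand is compared against the current line number `$.`, which is never -1, it stays true for the rest of the input. A rough equivalent with an explicit flag might look like this; the function name strip_leading_garbage is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every line that comes before the first line
# starting with "From", and keep everything from that line onwards.
sub strip_leading_garbage {
    my @lines = @_;
    my $seen_from = 0;
    my @kept;
    for my $line (@lines) {
        $seen_from = 1 if $line =~ /^From/;
        push @kept, $line if $seen_from;
    }
    return @kept;
}
```

Note that, like the one-liner, this keeps nothing at all if no line starts with "From".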

You should also make sure no mails are delivered while you are working on real life mailbox hierarchies.

Re^2: Strange Problem
created: 2006-02-02 18:40:40
I'll give the perl command a go, but the sed command does actually work... give it a go. If you have a text file with say three lines in :

line 1
line 2
line 3

The result of that system call is (I've tried it on both Debian and BSD) :

line 2
line 3

Of course I will make sure that no mail is delivered to the mailbox while I'm messing around with it :)
Re^3: Strange Problem
created: 2006-02-02 22:11:03
Your shell command line might sometimes work but the problem is precisely that it is not guaranteed to do so. The reason is that the >file part clobbers the very same file that is supposed to be read by the sed -e '1d' file part.

If there was only one process involved, the outcome would be quite predictable. However, since you constructed a pipeline of two processes there is a chance that the first one wins the race and catches a portion of the file before the file is destroyed by the second one. However, as you already observed, you can not rely on that.

To solve that problem you can use a temporary file (like perl -i does behind the scenes) or read and write to the file through a single file handle in a single process, which may prove somewhat more difficult to get right.
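
The temporary-file route could be sketched roughly as follows. The helper name clean_mailbox and the temp-file template are assumptions for illustration; the point is only that the input is fully read before anything replaces it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: remove everything before the first "From" line of
# $file. The cleaned copy is written to a temporary file in the same
# directory, then renamed over the original.
sub clean_mailbox {
    my ($file) = @_;
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    my ($out, $tmp) = tempfile('mboxXXXXXX', DIR => dirname($file));

    my $seen_from = 0;
    while (my $line = <$in>) {
        $seen_from = 1 if $line =~ /^From/;
        print {$out} $line if $seen_from;
    }
    close($in);
    close($out) or die "Cannot write $tmp: $!";
    rename($tmp, $file) or die "Cannot rename $tmp to $file: $!";
    return;
}
```

Keeping the temporary file in the same directory means the rename stays on one filesystem. The replacement file gets fresh ownership and permissions, which would need restoring on real mailboxes.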

If you are interested anyway you may want to look up file access modes in perlopentut, specifically +<. You also might find the truncate function useful. node 21664 has excellent explanations of the different techniques.
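
A rough sketch of the single-handle variant, assuming a hypothetical helper called clean_in_place. It opens the file with +<, reads it through, rewinds, writes back only the lines to keep and truncates the remainder. It holds the kept lines in memory, so it only suits mailboxes that fit in RAM.

```perl
use strict;
use warnings;

# Hypothetical helper: strip everything before the first "From" line of
# $file, reading and writing through one read/write handle.
sub clean_in_place {
    my ($file) = @_;
    open(my $fh, '+<', $file) or die "Cannot open $file: $!";

    my $seen_from = 0;
    my @kept;
    while (my $line = <$fh>) {
        $seen_from = 1 if $line =~ /^From/;
        push @kept, $line if $seen_from;
    }

    seek($fh, 0, 0) or die "Cannot seek in $file: $!";   # back to the start
    print {$fh} @kept;                                   # overwrite from the top
    truncate($fh, tell($fh)) or die "Cannot truncate $file: $!";
    close($fh) or die "Cannot close $file: $!";
    return;
}
```

Unlike the temporary-file approach, a crash halfway through the write leaves a damaged file, so a backup beforehand is still a good idea.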

Re^2: Strange Problem
created: 2006-02-02 21:41:10
Works beautifully. Thank you for the help :)
Re: Strange problem trying to clean garbage from start of mailbox file
created: 2006-02-02 22:53:15
Now that [martin] has solved your basic problem, I'd just like to point out how short your code could be:
my $path = "./mail";
my $bkup = "./backup";

opendir MAILDIR, $path or die "Cannot open $path: $!";

for my $mbox ( grep { -f "$path/$_" and -s _ } readdir MAILDIR )
{
    rename "$path/$mbox", "$bkup/$mbox";
    system( "perl -ne 'print if /^From/..-1' $bkup/$mbox > $path/$mbox" );
}
(That assumes that the backup directory is not on a distinct disk volume.)
Re^2: Strange problem trying to clean garbage from start of mailbox file
created: 2006-02-02 23:19:25
Cheers for that...

Won't this code create a backup of every mailbox instead of just the corrupted ones?

perlmonks.org content © perlmonks.org and capoeiraolly, graff, martin, ptum, rhesa