Copying a file to a temporary file
shay
created: 2004-06-15 09:03:30
What is the best way to copy a file to a temporary file?

(The purpose of the temporary file is to read data from, process it in some way, and then write it back over the top of the original file; i.e. I effectively want to edit the original file "in-place", but not all in memory at once in case it is too large.)

I was intending to use the File::Temp::tempfile() function to get a temporary file, and File::Copy to do the copying, but both of the following obvious ideas have problems:

  1. my $tmpfh = File::Temp::tempfile();
    File::Copy::copy($file, $tmpfh);
    
  2. my($tmpfh, $tmpfile) = File::Temp::tempfile(UNLINK => 1);
    File::Copy::copy($file, $tmpfile);
    
The first idea is risky according to File::Copy's manpage (it says that passing filehandles instead of filenames to copy() may lead to loss of information on some systems), while the second idea is not the recommended practice in File::Temp's manpage (it is safer to get only the temporary file's filehandle, and not its name as well).
Re: Copying a file to a temporary file
created: 2004-06-15 09:15:30

For in-place editing, I suggest you look at the -i switch, or use Super Search to look for 'inplace edit':

perl -pi.bak -e 's/this stuff/that stuff/g' some files

As for creating a temporary file, there is lots on that in Super Search too, but once you have a temporary filehandle you can just use open and <>, or read and print.
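
A minimal sketch of the read/print approach mentioned above, assuming an anonymous temporary file from File::Temp::tempfile() in scalar context. The helper name copy_to_temp and the 64 KB chunk size are hypothetical choices for illustration, not something from this thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: copy $file into an anonymous temporary file
# in fixed-size chunks, so the whole file is never held in memory.
sub copy_to_temp {
    my ($file) = @_;

    my $tmpfh = tempfile();    # scalar context: filehandle only, no name
    binmode $tmpfh;

    open my $in, '<', $file or die "Cannot open $file: $!";
    binmode $in;

    my $buf;
    while (1) {
        my $n = read $in, $buf, 64 * 1024;    # chunk size is an arbitrary choice
        die "Read error on $file: $!" unless defined $n;
        last if $n == 0;                      # end of file
        print {$tmpfh} $buf or die "Write to temporary file failed: $!";
    }
    close $in or die "Cannot close $file: $!";

    # Rewind so the caller can read the copy back from the start.
    seek $tmpfh, 0, 0 or die "Cannot rewind temporary file: $!";
    return $tmpfh;
}
```

The caller gets back a handle positioned at the start of the copy and can then reopen the original file for writing.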

You will find the guts of File::Temp in a 15-line function here: [id://334072]

cheers

tachyon

Re: Copying a file to a temporary file
created: 2004-06-15 09:26:00
According to the docs of File::Temp, if you use File::Temp::tempfile in scalar context, it returns only the filehandle. You can then use this filehandle as an argument to File::Copy::copy, which accepts both filenames and filehandles. Update: You'll have to find out whether the warning in File::Copy's docs about using filehandles applies to your situation.
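
A short sketch of that suggestion, assuming the File::Copy caveat about filehandles does not matter on the platform in question. The helper name copy_via_handle is hypothetical.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempfile);

# Hypothetical helper: copy $file into an anonymous temporary file
# by passing the temporary filehandle straight to File::Copy::copy().
sub copy_via_handle {
    my ($file) = @_;

    my $tmpfh = tempfile();    # scalar context: filehandle only
    binmode $tmpfh;

    copy($file, $tmpfh) or die "Copy of $file failed: $!";

    # copy() leaves a handle it was given open; rewind it for reading.
    seek $tmpfh, 0, 0 or die "Cannot rewind temporary file: $!";
    return $tmpfh;
}
```

Only the file's contents are copied this way; anything else the platform associates with the file is not carried over, which is the kind of loss the File::Copy warning is about.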

Arjen

Re: Copying a file to a temporary file
created: 2004-06-15 09:26:15

Is there a reason why you want to copy the original file to process it rather than say, renaming it to some temporary name and outputting the results of your munging to a new file with the original name?

For example: Is it necessary that the original file be available to other processes whilst the munging is in progress?

The nice thing about renaming is that it is (under most circumstances) an atomic operation at the OS level, which closes many possibilities for problems.
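
A rough sketch of the rename-first idea, assuming a line-oriented file. The helper name edit_via_rename, the ".orig" scratch suffix and the $process callback are all hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: move the original aside, then write the
# processed lines to a new file that takes the original name.
sub edit_via_rename {
    my ($file, $process) = @_;

    my $old = "$file.orig";    # assumed scratch name, not from the thread
    # Note: this existence check is not atomic; another process could
    # create $old between the check and the rename.
    die "$old already exists" if -e $old;
    rename $file, $old or die "Cannot rename $file to $old: $!";

    open my $in,  '<', $old  or die "Cannot open $old: $!";
    open my $out, '>', $file or die "Cannot create $file: $!";
    while (my $line = <$in>) {
        print {$out} $process->($line) or die "Write to $file failed: $!";
    }
    close $out or die "Cannot close $file: $!";
    close $in  or die "Cannot close $old: $!";

    unlink $old or die "Cannot remove $old: $!";
    return;
}
```

While the munging is in progress, other processes see either no file or a partially written file under the original name, which is the availability question raised above.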


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Re^2: Copying a file to a temporary file
created: 2004-06-16 07:53:35
What I want to achieve is editing a file in-place (via a temporary file) and optionally creating a backup file as well if a backup filename is given.

My idea was therefore to use the backup file as the temporary file, where a backup filename is given, or else use File::Temp to create one for me. I then copy to the temporary file, read from it and write back over the original, and then leave File::Temp to clean up the temporary file if one was created. (If a specified backup file was used instead, then it gets left afterwards, of course.)

I could start by renaming the original to the backup/temporary name instead, as you suggest, but where do I get the temporary name from? File::Temp returns an open filehandle - no good for renaming my original file to, hence I was looking to copy to it instead.

Actually, having read Re-runnably editing a file in place, I'm now thinking something along those lines would be better:

I could get a temporary filehandle, read from the original file, process the data and write to the temporary filehandle. Then I'd want to rename the temporary file to the original filename, but I don't know the temporary filename unless I ignore File::Temp's advice and pick up both the handle and the name. Maybe that's safe enough since I wouldn't be doing anything with the temporary filename except renaming it (and I therefore wouldn't want File::Temp to try to delete the temporary file either). (I'd have to create the backup file separately, rather than using it as the temporary file, in this scheme, of course.)

- Steve

Re^3: Copying a file to a temporary file
created: 2004-06-16 09:05:14

I think I'd use something simple like:

my $file = ...;
my $n=0;
if( -e $file ) { ## Stop endless loop if $file doesn't exist
    $n++ until rename $file, "$file.bak$n";
}
else {
    die "$file doesn't exist";
}
my $backup = "$file.bak$n";

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re^4: Copying a file to a temporary file
created: 2004-06-16 12:01:04
That doesn't look like a great way to choose a backup filename - the rename will succeed even for candidate backup filenames that exist (permissions permitting), so the backup has potentially just clobbered another file! (Or have I misunderstood you?)
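
One way to avoid clobbering an existing backup, sketched under the assumption that sysopen with O_CREAT | O_EXCL fails atomically when the name already exists (true on local filesystems, less reliable on some network filesystems). The helper name claim_backup_name, the ".bakN" naming and the 0..999 limit are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Hypothetical helper: find the first "$file.bakN" that does not yet
# exist and claim it by creating an empty placeholder file.
sub claim_backup_name {
    my ($file) = @_;

    for my $n (0 .. 999) {    # upper bound is an arbitrary choice
        my $candidate = "$file.bak$n";
        if (sysopen(my $fh, $candidate, O_WRONLY | O_CREAT | O_EXCL)) {
            close $fh or die "Cannot close $candidate: $!";
            return $candidate;    # we created it, so it is ours to replace
        }
        # Any failure other than "already exists" is a real error.
        die "Cannot create $candidate: $!" unless $!{EEXIST};
    }
    die "No free backup name found for $file";
}
```

The original can then be renamed over the returned placeholder, which only replaces a file this process created itself.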

Anyway, as I said, the backup filename is supplied by the caller of this code if a backup file is required. My real concern is what the best way to achieve the in-place edit via a temporary file is, possibly taking advantage of the given backup filename if one is given.

I like the idea of writing the processed data to a temporary file and then moving that back (either (1) by a rename or (2) by copying the contents), rather than my original idea of moving/copying the file to be edited and then writing the processed data back to it, so that the process can be easily re-run if it failed the first time.

However, both options (1) and (2) above have problems:

Option (1) goes something like this (return values obviously need checking, and there are some chmod games that can be played too, but this is the bare bones of it):

use File::Temp qw(tempfile);
my $file = 'test.txt';
my($tmpfh, $tmpfile) = tempfile();
open my $fh, '<', $file;
binmode $fh;
while (<$fh>) {
  # Process $_ here
  print $tmpfh $_;
}
close $fh;
close $tmpfh;
rename $tmpfile, $file;

I can see two problems with that. Firstly, tempfile() was called in list context rather than scalar context, so the temporary file will not be cleaned up if the program is interrupted or killed. (A $SIG{INT} handler could arrange for it to be cleaned up if the program is interrupted, but not if it is killed.) Secondly, while the rename itself is (normally) atomic, there is a race condition between the close and the rename: somebody else could potentially modify the file in between.
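
A hedged variation on option (1) that covers part of the first problem: it removes the temporary file when the processing dies, though not when the program is killed outright, and it does nothing about the race described above. It also creates the temporary file in the same directory as the original (an assumption added here, since rename usually does not work across filesystem boundaries). The helper name rewrite_via_tempfile and the $process callback are hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical helper: write processed lines to a named temporary file
# next to the original, then rename it over the original.
sub rewrite_via_tempfile {
    my ($file, $process) = @_;

    my ($tmpfh, $tmpfile) = tempfile(DIR => dirname($file), UNLINK => 0);

    my $ok = eval {
        binmode $tmpfh;
        open my $in, '<', $file or die "Cannot open $file: $!\n";
        binmode $in;
        while (my $line = <$in>) {
            print {$tmpfh} $process->($line)
                or die "Write to $tmpfile failed: $!\n";
        }
        close $in    or die "Cannot close $file: $!\n";
        close $tmpfh or die "Cannot close $tmpfile: $!\n";
        rename $tmpfile, $file
            or die "Cannot rename $tmpfile to $file: $!\n";
        1;
    };
    unless ($ok) {
        my $err = $@;
        unlink $tmpfile;    # best-effort cleanup on failure
        die $err;
    }
    return;
}
```

The replacement file carries the temporary file's permissions rather than the original's, which is where the chmod games mentioned above come in.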

Option (2) looks like this (with the same caveats as before):

use Fcntl qw(:seek);
use File::Temp qw(tempfile);
my $file = 'test.txt';
my $tmpfh = tempfile();
open my $fh, '<', $file;
binmode $fh;
while (<$fh>) {
  # Process $_ here
  print $tmpfh $_;
}
close $fh;
seek $tmpfh, 0, SEEK_SET;
open my $fh2, '>', $file;
binmode $fh2;
print $fh2 $_ while <$tmpfh>;
close $fh2;
close $tmpfh;

This time, the temporary file's contents are written back to the original file without the temporary file having been closed, so there is no close/rename race condition. Also, tempfile() was called in scalar context so the temporary file will be cleaned up even if the program is killed (on Win32, at least, via the O_TEMPORARY flag that is used when opening the file). However, the process of copying the temporary file's contents back to the original file is no longer atomic, so if the program is interrupted during the final while loop then the original file will be left partially written.

So neither option is perfect. Which approach is the lesser of the two evils? Is there another approach with none of these pitfalls?

Re^5: Copying a file to a temporary file
created: 2004-06-16 13:53:33
"That doesn't look like a great way to choose a backup filename - the rename will succeed even for candidate backup filenames that exist (permissions permitting),..."

Really? I'm pretty certain that I have never used a filesystem that, regardless of permissions, would allow you to rename one file on top of an existing one. Which filesystem are you using?


Re^6: Copying a file to a temporary file
created: 2004-06-17 03:43:50
I don't think the filesystem is relevant. It is true that on my Windows NTFS filesystem the shell command "rename" will not rename OLDNAME to NEWNAME if NEWNAME already exists, but we're talking about Perl...

The perlfunc manpage entry for Perl's built-in rename() function says:

"Changes the name of a file; an existing file NEWNAME will be clobbered."

and it's quite correct (I just tried it to make sure!).
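
A small check along the lines of "I just tried it", for anyone who wants to repeat the experiment. The function name rename_clobbers and the file names are hypothetical, and the behaviour can differ on unusual platforms or filesystems.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical demonstration: rename one file on top of another and
# return whatever the target contains afterwards.
sub rename_clobbers {
    my $dir = tempdir(CLEANUP => 1);
    my ($old, $new) = ("$dir/old.txt", "$dir/new.txt");

    for my $spec ([$old, "old contents\n"], [$new, "new contents\n"]) {
        open my $fh, '>', $spec->[0] or die "Cannot create $spec->[0]: $!";
        print {$fh} $spec->[1];
        close $fh or die "Cannot close $spec->[0]: $!";
    }

    rename $old, $new or die "rename failed: $!";

    open my $fh, '<', $new or die "Cannot open $new: $!";
    my $contents = do { local $/; <$fh> };
    close $fh;
    return $contents;
}

print rename_clobbers();
```

If the output is the old file's text, the existing target was silently replaced.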

Any more thoughts on my temporary file issue?

- Steve

Re^7: Copying a file to a temporary file
created: 2004-06-17 07:35:41

I really never knew that. How dumb. Both my assumption in not checking what I knew could never be so and the logic that makes me wrong. You'll have to decide for yourself which is dumber:)

It will be a while before I stop thinking about the logic that allows a [rename] function to become a "delete target and then copy over" command.

You could consider this.

#! perl -slw
use strict;
use Win32::API::Prototype;

ApiLink( 
	'kernel32',
	'UINT GetTempFileName(
	  LPCTSTR lpPathName,
	  LPCTSTR lpPrefixString,
	  UINT uUnique,
	  LPTSTR lpTempFileName
	)'
) or die $^E;

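# The receiving buffer should hold MAX_PATH (260) characters.
my $tempFileName = "\0" x 260;
my $path = '.';
# Only the first three characters of the prefix are used.
my $prefix = 'temp0000';

GetTempFileName( $path, $prefix, 0, $tempFileName ) or die $^E;

# Trim the buffer at the terminating NUL before using the name.
$tempFileName =~ s/\0.*//s;

print $tempFileName;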


After the above code has been run, an empty file with the name returned will have been created. You can then open and use it as you need to.



perlmonks.org content © perlmonks.org and Aragorn, BrowserUk, shay, tachyon
