#!/usr/bin/perl
use warnings;
use File::Find;
#takes list of dirs on commandline
#must give one or get an error
finddepth sub {
return if $_ eq "." or $_ eq "..";
return if -d;
unlink($_);
# print "$File::Find::name\n"; #if you want printout
}, @ARGV;
__END__
So a couple of the shell gurus, who like to "bash" Perl, said this is faster:
find . -type f -exec rm {} \;
So I decided to test it on a deeply nested directory tree, of 80 Megs in size, and timed them.
$ time find . -type f -exec rm {} \;
real 0m2.987s
user 0m0.785s
sys 0m2.184s
$ time ./zdelfiles Gtk3
real 0m0.384s
user 0m0.076s
sys 0m0.308s
The Perl script was in the range of 10 times faster. :-) Comments, improvements, and edifications welcome.
sub {
    -f _ or return;
    unlink $_;
}
You really only want to unlink "regular" files; and this makes the
comparison apples-and-apples with the shell version.
Also, explicitly testing '.' and '..' is superfluous, because they'd be caught by -d.
Fore!
sub { unlink if -f }
:-)
[id://149675|Do not rebuke them with harsh words ... but rather lead them gently - with URLs - so that they may learn wisdom.]
Yes, my reply originally looked like that; but as the OP said, you may want to do additional things, such as reporting.
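For what it's worth, here is a minimal sketch of the "do additional things, such as reporting" variant. The helper name delete_files and the throwaway demo tree are my own inventions for illustration, not anything from the posts above. It skips symlinks so that it matches what find's -type f would select.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: unlink every regular file under @dirs, leave the
# directories in place, and return (deleted names, failure messages).
sub delete_files {
    my @dirs = @_;
    my (@deleted, @failed);
    finddepth(sub {
        return if -l $_ or !-f $_;    # regular files only, no symlinks
        if (unlink $_) { push @deleted, $File::Find::name }
        else           { push @failed,  "$File::Find::name: $!" }
    }, @dirs);
    return (\@deleted, \@failed);
}

# Demo on a throwaway tree so nothing real is touched.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
for my $name ("$root/a.txt", "$root/sub/b c.txt") {
    open my $fh, '>', $name or die "open $name: $!";
    close $fh;
}
my ($deleted, $failed) = delete_files($root);
print "deleted: $_\n" for sort @$deleted;
warn  "failed: $_\n"  for @$failed;
```

The one-liner is shorter, but once you want to know which unlink failed and why, the callback needs about this much.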
time find . -type f -print | xargs rm
I don't know how much that will affect your timings, though.
it starts complaining AND skipping filenames with spaces in them
...And that is precisely why find has the -print0 switch and xargs has the -0 (or --null) switch.
find . -type f -print0 | xargs -0 rm
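To see why the NUL separator matters, here is a small Perl sketch of the two ways of splitting a stream of names. It is a simplification (a real xargs also interprets quotes and backslashes in its default mode), and the file names are made up.

```perl
use strict;
use warnings;

my @names = ('plain.txt', 'my notes.txt', "odd\nname.txt");

# Roughly what a plain `find | xargs` pipeline sees: names joined by
# newlines, then split again on any whitespace.
my @by_space = split ' ', join("\n", @names);

# What -print0 / -0 do: join and split on the NUL byte, which cannot
# occur inside a file name.
my @by_nul = split /\0/, join("\0", @names);

printf "whitespace split: %d pieces\n", scalar @by_space;
printf "NUL split:        %d pieces\n", scalar @by_nul;
```

The whitespace split turns three names into five pieces, none of which need exist on disk; the NUL split gives back exactly the three names.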
Flavio
perl -ple'$_=reverse' <<
xargs is smart enough to deal with that. If the argument list would exceed certain limits (which may depend on the system and the build of xargs; for some versions of xargs, limits can be set as parameters), as many copies of the program will be fired up as necessary. Limits exist on the number of arguments, and the total length of the arguments.
OT: I'm quite happy about my wrong answer. It taught me that I had wrong expectations about xargs, and that it will work fine on my Linux box. And others will possibly benefit :)
Flavio
perl -ple'$_=reverse' <<
I remembered working on a Sun some time ago where a pipe with xargs gave problems with too many arguments - but I might have dreamt it.
If so, then it would be a bug, as that would mean its xargs would not be POSIX compliant. I can't recall ever running into this problem using Solaris (but that doesn't prove anything - my memory isn't perfect). Here's the relevant quote from the POSIX docs:
The generated command line length shall be the sum of the size in bytes of the utility name and
each argument treated as strings, including a null byte terminator for each of these strings. The
xargs utility shall limit the command line length such that when the command line is invoked,
the combined argument and environment lists (see the exec family of functions in the System
Interfaces volume of IEEE Std 1003.1-2001) shall not exceed {ARG_MAX}-2048 bytes. Within
this constraint, if neither the -n nor the -s option is specified, the default command line length
shall be at least {LINE_MAX}.
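The accounting in that quote can be sketched in a few lines of Perl. This is only an illustration of the counting rule (each string plus one NUL terminator), not how any real xargs is implemented: the helper name batch_args and the tiny 16-byte limit are assumptions of mine, the environment size is ignored, and the names are assumed to be plain byte strings.

```perl
use strict;
use warnings;

# Hypothetical helper: group @args into batches whose total size stays
# within $limit bytes, counting the utility name and every argument as
# its length plus one NUL terminator.
sub batch_args {
    my ($limit, $utility, @args) = @_;
    my $base = length($utility) + 1;
    my (@batches, @current);
    my $size = $base;
    for my $arg (@args) {
        my $cost = length($arg) + 1;
        # Start a new command line when this argument would not fit.
        if (@current and $size + $cost > $limit) {
            push @batches, [@current];
            @current = ();
            $size    = $base;
        }
        push @current, $arg;
        $size += $cost;
    }
    push @batches, [@current] if @current;
    return @batches;
}

my @batches = batch_args(16, 'rm', qw(aaaa bbbb cccc dddd eeee));
print "rm @$_\n" for @batches;
```

With these made-up numbers rm would be run three times: twice with two names and once with the last one.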
time perl -MFile::Find -e'finddepth sub { unlink if -f }, @ARGV' /tmp
real 0m3.111s
user 0m0.821s
sys 0m2.233s
time find /tmp -type f | xargs rm
real 0m3.312s
user 0m0.760s
sys 0m2.511s
And the timings varied widely - anywhere up to 5 seconds.
return if -d; unlink $_
against the golfed
unlink $_ if -f;
and the golfed -f test seems to be a bit slower. Maybe because unlink somehow gets called for each directory then stopped? Whereas the 'return if -d ' returns immediately. BUT the improved shell with null
find . -type f -print0 | xargs -0 rm
seems to win :-(
time -d-test Gtk3
real 0m0.412s
user 0m0.074s
sys 0m0.337s
time -f-test Gtk3
real 0m0.478s
user 0m0.076s
sys 0m0.388s
time find . -type f -print0 | xargs -0 rm
real 0m0.334s
user 0m0.012s
sys 0m0.321s
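If anyone wants to repeat the -d versus -f comparison without sacrificing a real directory, something along these lines might do. It is only a sketch: make_tree and the tree size (20 directories of 50 empty files) are my assumptions, and on a tree this small the numbers will mostly be noise from the filesystem cache, so treat the output as a sanity check rather than a benchmark.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use Time::HiRes qw(time);

# Build a throwaway tree: $dirs directories with $files empty files each.
sub make_tree {
    my ($dirs, $files) = @_;
    my $root = tempdir(CLEANUP => 1);
    for my $d (1 .. $dirs) {
        mkdir "$root/d$d" or die "mkdir: $!";
        for my $f (1 .. $files) {
            open my $fh, '>', "$root/d$d/f$f" or die "open: $!";
            close $fh;
        }
    }
    return $root;
}

# The two callbacks under discussion.
my %variant = (
    'return if -d' => sub { return if -d; unlink $_ },
    'unlink if -f' => sub { unlink $_ if -f },
);

my %left;    # files remaining per variant, as a sanity check
for my $name (sort keys %variant) {
    my $root  = make_tree(20, 50);
    my $start = time;
    finddepth($variant{$name}, $root);
    my $elapsed = time - $start;
    my $count   = 0;
    find(sub { $count++ if -f }, $root);
    $left{$name} = $count;
    printf "%-14s %.4fs, %d files left\n", $name, $elapsed, $count;
}
```

Each variant gets its own freshly built tree, so neither one benefits from the other having already emptied it.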
Also, the optimized shell script only beat the Perl version by a nose. Considering how much more flexible the Perl script is in processing the files as they are found, run-of-the-mill Perl is likely to be faster than run-of-the-mill shell doing some equivalent task. Shell, with its constant spawning of awk and sed, etc., is probably harder to do at optimized speed, compared to Perl.
I'm a bit surprised, however, that no-one so far has piped in with the "programmer time is more costly than running time" mantra. Surely, the 2 seconds running time difference is dwarfed by all the extra typing you need in your Perl solution. Or are Perl programmers cheap, and shell programmers expensive?
I would always go for the shell solution. I'll have deleted all the files even before you've finished typing your Perl program.
One way of doing system administration is to write a little program for every minor task you want. A small change, a different program. And then, everyone has to carry disks with their personal libraries around. Granted, it's workable.
I myself prefer the Unix/POSIX solution. Lots of small tools, that can be stacked like legos. Tools that are everywhere, like find and xargs. When I sit down at a Unix system, I can type
find . -type f -print0 | xargs rm
to delete files, and leave the directory structure as is. I don't have to remember whether I installed a program doing this for me on the box, and if I did, how it's called. And I don't need to write a new program if I want to delete all files older than a week - just add an extra option to find. (Sure, you could enhance your program that it takes all kinds of options, but if you have to type as many options to your program as to find, you might as well have used find in the first place).
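For comparison, the "delete all files older than a week" case in Perl comes down to one extra test in the File::Find callback. A rough sketch, assuming a helper I have called delete_older_than; note that -M measures age relative to when the script started and compares fractional days, so it is close to, but not exactly, what find's -mtime +7 selects.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: unlink regular files under $dir that were last
# modified more than $days days before this script started.
sub delete_older_than {
    my ($days, $dir) = @_;
    my @deleted;
    find(sub {
        return if -l $_ or !-f $_;       # regular files only
        return unless -M $_ > $days;     # age in days, may be fractional
        push @deleted, $File::Find::name if unlink $_;
    }, $dir);
    return @deleted;
}

# Demo: one fresh file and one back-dated by ten days.
my $root = tempdir(CLEANUP => 1);
for my $name (qw(fresh.log stale.log)) {
    open my $fh, '>', "$root/$name" or die "open: $!";
    close $fh;
}
my $ten_days_ago = time - 10 * 24 * 60 * 60;
utime $ten_days_ago, $ten_days_ago, "$root/stale.log" or die "utime: $!";

my @removed = delete_older_than(7, $root);
print "removed $_\n" for @removed;
```

Whether that counts as "just an extra option" or as a new program is, of course, the argument this thread is having.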
I'm not a monoculturist programmer. For anything complex, I write a Perl or a C program (preferably Perl, but that isn't always available - if all you have is a few Mb of RAM and a dozen or so Mb on disk, there's no Perl, but busybox stacks a lot of goodies in just a few kb). But I don't bother writing programs for tasks that I don't do that often and that only require a few simple commands. That's not efficient.
So you shell guys have a point: you rely on some standard GNU utilities, and say it is fast in the broadest sense. But my original point was that a lot of the shell 1-liners that are thrown out as quick solutions are not necessarily faster than a Perl script, just because they are C chained together in a pipe. And I do see the shell guys making this claim in the newsgroups, without showing any proof. Thus my original post.
Personally, I would find it COST efficient to put all my Perl utilities on a USB-keyring-drive, rather than spend the time to learn "arcane" shell syntax. Every time I look at the way bash shell is done, it blows my mind as being the most confusing syntax that I've ever seen. So I could spend hours trying to confuse myself with shell, where a $25 USB-keyring-drive would let me carry my Perl utilities with me. Efficiency is measured in more than just typing time, there is the economics and mental strain of learning multiple languages that have conflicting syntax styles. Perl, C, PHP, Python, etc. all have 'compatible' syntax, Bash shell is definitely odd.
I find it admirable that some hackers use different languages according to what is easier to do, but how many syntax errors do they make, when they are juggling shells? Personally I think it is better to try and learn 1 language, and become good with it......yes I only ride a bicycle and I only use Perl. ;-)
Personally, I would find it COST efficient to put all my Perl utilities on a USB-keyring-drive, rather than spend the time to learn "arcane" shell syntax.
Well, I could understand a Java or a Python coder complaining another language has an "arcane" syntax. But a Perl programmer complaining shell has "arcane" syntax, I can't take seriously. And for cost, let's see, you move from one box to the other. First, you have to umount the USB device from the one box, crawl under the table to remove the device, crawl under another table to put the stick in the different box, become super user on the new box, edit /etc/vfstab or /etc/sudoers so a regular user can mount a USB device, log off as root, mount the USB device, become root again, fix the syntax error in the file you just edited, log off as root, mount the USB device, and then you're ready to remove the files. No thanks, I just type in the handful of characters on the command line - it's faster, and hence, more cost efficient.
Every time I look at the way bash shell is done, it blows my mind as being the most confusing syntax that I've ever seen.
Well, we're talking about
find . -type f -print0 | xargs -0 rm
and we have a Perl programmer complaining the syntax is hard to understand.
Perl, C, PHP, Python, etc. all have 'compatible' syntax,
I think many Python programmers would be deeply insulted by that statement.
Bash shell is definitely odd.
Really? The Bourne shell (which is what you ought to use for scripts) has loops, functions, and conditions, just like C, Python and Perl have. Its interpolation options are vastly superior to Perl's. Perhaps the oddest things shells have are redirections (>, >>, <, |), but Perl has them as well in its open statement. And Perl6 will have ==> and <== acting as pipes.
I find it admirable that some hackers use different languages according to what is easier to do, but how many syntax errors do they make, when they are juggling shells?
Why would you juggle shells? The Bourne shell (or a compatible shell) is available on every Unix or Unix-like OS - it's a POSIX requirement. Any sane shell programmer will write his shell scripts in the Bourne shell. No 'juggling' needed. I generally have fewer problems in the shell going from one OS to another than in Perl - where one box will have a thread enabled 5.8.7 perl with 64 bit integers, the other will have 5.6.0 with no threads and only 32 bit integers. Furthermore, the line being discussed will work on any mainstream shell (sh, bash, csh, tcsh, ksh, ash, zsh, ...)
But a Perl programmer complaining shell has "arcane" syntax, I can't take seriously.
I've edited cross-platform Makefiles. Shell has multiple incompatible ugly arcane syntaxes.
then you suggest using a USB keyring drive - as if file systems are all that portable
Well you could have 2, 1 vfat and one ext2; that would get you onto most systems. You could also carry around a bootable cd like knoppix, so you can boot systems anyway you want.
crawl under the table to remove the device, crawl under another table to put the stick in the different box,
Hackers need exercise.
we're talking about find . -type f -print0 | xargs -0 rm
So I'm supposed to memorize that to save me .1 seconds? The shell programmers who threw that out as a solution didn't even remember the nulls right, and I'm supposed to? Let's see, was that -print0 | xargs -0 OR -print 0 | -xargs0 ? Did I need that funny {} or not? Damn, I could have finished already if I just wrote it in Perl!
Why would you juggle shells?
Why do you climb under tables?
I'm just having fun with you, sorry. I, like probably most computer hackers, deal with 1 or 2 computers under my control: a desktop and a laptop. If you have a job, or the need to be moving from one machine to another, without root privileges, then it pays to have memorized those bash shell 1-liners. But for most of us, we can use our ~/bin directories to store our perl scripts.
You could also carry around a bootable cd like knoppix, so you can boot systems anyway you want.
And you complain that shell isn't portable?
Makeshifts last the longest.
Personally, I would find it COST efficient to put all my Perl utilities on a USB-keyring-drive, rather than spend the time to learn "arcane" shell syntax.
I know this has already been discussed (what with all the crawling around under desks) but I'd like to point out that I do not have easy physical access to most of the computers I use due to them being mounted in racks two floors below my office, and occasionally in other buildings. Using any form of portable media is pretty much a non-starter, so I would need to get any tools I need over the network or use those that are installed already.
That said, I have a load of tools on shared filesystems that can be accessed from most hosts, so usually it doesn't matter.
I would always go for the shell solution. I'll have deleted all the files even before you've finished typing your Perl program.
Well, since you are being snarky I'll respond in kind: I doubt it. I reckon you'll still be fighting with the shell syntax, and double-checking that the switches and utilities you got so used to in bash are actually present in the shell you need to run it on. And even then you still won't be 100% confident that it will all work as expected.
Which to me is the reason that perl scripts beat shell scripts hands down pretty well every time. I can use the same perl script on pretty much every shell and OS I can find. Your shell script will only work on a small subset of them, and will require massive changes for some of them.
Shell scripts are only worth thinking about if you are a monoculture programmer. Since I'm not I view them mostly with contempt. Who needs shell scripts when you have perl scripts instead?
I reckon you'll still be fighting with the shell syntax
I can type find | xargs pipes in my sleep.
double-checking that the switches and utilities you got so used to in bash are actually present in the shell you need to run it on
Present in the shell? They're external binaries; which shell you're using is irrelevant. Maybe present on the system, except that if find, xargs and rm are not present, that is one very broken system. And the -print0/-0 switches are available on these commands on all Unixoid systems where I cared to look.
And all that is far more likely to be around than perl, in any case.
If your portability argument concerns moving between Windows and Unix, well, I can see how someone working on Windows would prefer to always use Perl :-)
Makeshifts last the longest.
Well, since you are being snarky I'll respond in kind: I doubt it. I reckon you'll still be fighting with the shell syntax, and double-checking that the switches and utilities you got so used to in bash are actually present in the shell you need to run it on. And even then you still won't be 100% confident that it will all work as expected.
Bollocks. find | xargs has worked on every Unix system I've used for the last 30 years. Out of the box. In any shell, as the only 'shell' thing here is the pipe, which is universal. It has worked long before Larry released perl1.0, and it will continue to work long after perl5 will be a distant memory.
Which to me is the reason that perl scripts beat shell scripts hands down pretty well every time. I can use the same perl script on pretty much every shell and OS I can find. Your shell script will only work on a small subset of them, and will require massive changes for some of them.
The shell solution will work on at least anything that's POSIX compliant. Will your Perl program work in perl6? How would you know - it may work on today's version of perl6, but maybe not on next week's. As for Perl being present on the OS by default, for many OSes, it's only quite recently that their OS came with some version of perl5 installed.
Shell scripts are only worth thinking about if you are a monoculture programmer. Since I'm not I view them mostly with contempt. Who needs shell scripts when you have perl scripts instead?
So, you do everything with Perl scripts, so you're not a monoculture programmer? Interesting. What's your definition of monoculture then?
But you're right. Once you have a truck, you have no need for a bicycle. It's much easier to start up the truck and find a parking spot, just to get a newspaper from the shop around the corner. It's cheaper as well. Bicyclists are monoculture traffic participants - none of them know how to drive a car.
So, you do everything with Perl scripts, so you're not a monoculture programmer? Interesting. What's your definition of monoculture then?
Yes, pretty well anything I write that has to be run on multiple environments (which in theory is most of what I do) is written in perl.
Monoculture to me is writing code that expects to be run on/in a certain OS/Shell/Architecture.
But I will say that your bike/truck point is a powerful one.
"Monoculture" should not include "shell" any more than it should include "perl". Both are just interpreters that one may use to write the programs we're referring to.
Besides, I write perl to run on only three more platforms than I write for shell. I write cross-platform shell at work for AIX (4, 5), Sun (5-9), HP/PARISC (10+), HP/ia64 (11.23+), and Linux for ia32, ia64, x86-64, ppc, and s390/s390x. Add Windows for ia32, ia64, and x86-64 to get my perl list. I write both production and tooling in both languages for all platforms. And I have over 10,000 lines of production code that I've written and/or maintain in each language.
I dispute any attempt to claim that cross-platform problems in shells are significantly different from those in perl. Core tools, much like core perl, are pretty much ubiquitous. Non-core tools, which perl would often need to use system() to call anyway, won't be any more difficult in shell than perl.
In fact, the only reason why we don't actually ship perl scripts (we only use them for development) but we do ship shell scripts is the difficulty of convincing management to rely on a decent version of perl being installed everywhere. We have no problems with being convinced of Bourne Shell syntax everywhere (although we actually use bash on Linux). The version dependencies in shell are way fewer than we've had with perl, where multiple times we've needed upgrades to handle things. Shell has just worked.
For minor, one-off scripts, I still fall back to shell. For complex things, I move over to perl. But even then, there are just some limitations. Such as trying to have a simple command to effect great changes to one's environment. Perl just can't do it without spawning a subshell, or running inside a shell 'eval' statement, both of which have annoyances/limitations. Contrast with the shell's limitation on compile-time checking (there is none) or lexical variables (again, no such thing), and you end up with reasons to keep both tools in one's toolchest.
For me, learning both tools (shell and perl) has made me more efficient on all platforms. Especially unix/linux. Knowing their limitations allows me better access to each of their strengths.
[zentara], try this one. I wrote this many years ago to clean up 100's of MB of source code (meaning 100's of 1000's of files) and it seems pretty fast. Way faster than rm -rf, for example. However, my goal wasn't to remove just the files, but the whole tree. I'll comment out the part that removes directories just to make it do what yours does. Granted ... this is a bit more complex. But it can't easily be duplicated in shell.
use strict;
use warnings;
$|=1;
foreach my $d (@ARGV)
{
    remove_dir($d);
    rmdir $d;
}
print "\nDone.\n";
sub remove_dir
{
    my $d = shift;
    if ( -f $d or -l $d )
    {
        unlink $d;
        return;
    }
    # must be a directory?
    my (@sfiles, @sdirs);
    local *DIR;
    opendir(DIR, $d) || do { print "Can't open $d: $!\n"; return };
    foreach (readdir(DIR))
    {
        next if ($_ eq '.');
        next if ($_ eq '..');
        my $sd = "$d/$_";
        if ( -l $sd ) { push(@sfiles, $sd);}
        elsif ( -d $sd ) { push(@sdirs, $sd); }
        else { push(@sfiles, $sd); }
    }
    closedir(DIR);
    print ".";
    # process subdirectories via fork
    my $count;
    foreach my $sd (@sdirs)
    {
        my $pid;
        if ($pid = fork())
        {
            # parent
            ++$count;
        }
        elsif (defined $pid)
        {
            # child
            remove_dir($sd);
            exit;
        }
        else
        {
            # failure - try again in a bit
            sleep 5;
            redo;
        }
        while ($count > 2) {
            wait();
            $count--;
        }
    }
    while (wait() != -1) {}
    #foreach (@sdirs) {
    #    rmdir $_ || do {
    #        warn "$0: Unable to remove directory $_: $!\n";
    #    };
    #}
    my @cannot = grep {!unlink($_)} @sfiles;
    if (@cannot) {
        warn "$0: cannot unlink @cannot\n";
    }
}
I'll also add that the difference in speed between .4s and 3s is quite negligible when compared to the amount of time it takes to remember and write them. This example above is ludicrously expensive to write, but it is something I do enough that I call it "RD" (yes, upper-case - it's too dangerous to get a short lower-case name) and put it in /usr/local/bin on all machines, all platforms, that I have access to (primarily as a symlink to a shared NFS partition). We really do use it that much ;-)
grep "^function" *.4gl | sed "s/\(.*\):function \(.*\)(.*/\2 \1 \/^function \2(/"
But the above was wrong, so I rewrote a "correct" perl version:
/^\s*function\s+(\w+)\s*\(/i # and then use hashes to save data so there's no s///
I rewrote the new perl version in shell (grep/sed) for kicks, and it was slower than the perl version (and much uglier).
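The full script isn't shown above, so the following is only my guess at the general shape of "match with the regex, save in a hash". The helper name index_functions, the in-memory sample lines and the file names are all invented for illustration; the output format just mirrors the replacement side of the sed command.

```perl
use strict;
use warnings;

# Hypothetical helper: given file name => array ref of lines, return a
# hash mapping each function name to the file that defines it.
sub index_functions {
    my (%source) = @_;
    my %where;
    for my $file (sort keys %source) {
        for my $line (@{ $source{$file} }) {
            # Case-insensitive, tolerant of leading and inner whitespace.
            next unless $line =~ /^\s*function\s+(\w+)\s*\(/i;
            $where{$1} = $file;
        }
    }
    return %where;
}

my %where = index_functions(
    'orders.4gl' => [ 'FUNCTION add_order(id)', '  let x = 1',
                      'function  del_order ( id )' ],
    'util.4gl'   => [ '# function commented(out)', ' function trim_all(s)' ],
);

# Emit tags-style lines: name, file, search pattern.
print "$_ $where{$_} /^function $_(/\n" for sort keys %where;
```

Unlike the grep/sed version, this picks up FUNCTION in upper case and indented definitions, and the commented-out line is skipped because it does not start with the keyword.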
perlmonks.org content © perlmonks.org and Anonymous Monk, Aristotle, chromatic, demerphq, frodo72, itub, jdhedden, jdporter, NodeReaper, Perl Mouse, robharper, Roy Johnson, runrig, Tanktalus, VSarkiss, zentara