how to do effective sort in perl
Anonymous Monk
created: 2006-05-03 09:23:04
i am doing sorting on array using sort function available in perl
opendir( DIR, '.' )||die "Can't open $dir";
my(@filenames)=readdir ( DIR );
closedir ( DIR );
@filenames=sort {$a cmp $b} @filenames;
i want to optimize this piece of code. is there some way i can do this

Code tags added by [Arunbear]

Re: how to do effective sort in perl
created: 2006-05-03 09:28:33

Well, aside from calling my @filenames = sort readdir( DIR ) in one step (since by default sort uses effectively the comparitor you've shown, and that copies the results around one less time) what more do you think could be optimized? Doesn't get much more bare bones than that. If you're not interested in every filename that readdir is going to return you might stick a grep in between there, but there's not really much more to be pared down.

Re: how to do effective sort in perl
created: 2006-05-03 09:37:17

sort with no explicit comparison is already optimized away and does the same as what you've shown, so just remove the block. Oh, and BTW: you probably already knew, but "premature optimization is the root of all evil".

Re: how to do effective sort in perl
created: 2006-05-03 10:44:33

You didn't say what you wanted to optimize, so here's another take on it:

my @filenames = do {
   opendir my $d, "." or die;
   sort readdir $d
};
Note that this only works on modern perls.

[duff]

Re: how to do effective sort in perl
created: 2006-05-03 10:51:18

On my OS I'd do

my @files = glob '*';

as glob always produces a sorted list on my system. I'm not sure if that is true on other systems?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: how to do effective sort in perl
created: 2006-05-03 11:06:47

On some OSs that doesn't include hidden files:

$ perl -le 'opendir my $fd, "."; my @files = readdir $fd; closedir $fd; print scalar @files'
48
$ perl -le 'my @files = <*>; print scalar @files'
7
$ _

But including them is not difficult:

$ perl -le 'my @files = (<*>,<.*>); print scalar @files'
48
$ _

It seems that [doc://readdir] is faster, though:

use Benchmark qw(:all) ;
cmpthese(20000, {
  readdir => sub { opendir my $fd, "."; my @files = readdir $fd; closedir $fd; },
  glob    => sub { my @files = (<*>,<.*>); }
});
__END__
           Rate    glob readdir
glob     3170/s      --    -75%
readdir 12739/s    302%      --

Update: Of course [doc://glob] is slower, since in this computer it also sorts the result.

--
David Serrano

Re: how to do effective sort in perl
created: 2006-05-03 11:15:45

If it's an efficient sort algorithm you're after, you should note that "In 5.7, the quicksort implementation was replaced with a stable mergesort algorithm whose worst-case behavior is O(NlogN)." (perldoc). According to Numerical Recipes, "For the basic task of sorting N elements, the best algorithms require on the order of several times N log2 N operations." That means that mergesort is just about as efficient as you can get.

Incidentally, Numerical Recipes is an astonishingly good book. Knuth may be the definitive reference, and nag may have many algorithms fine-tuned to specific cases, but NR is exceptionally good value for money. You can even download free source code in FORTRAN or C. If you're new to Perl, I'd see the lack of a Perl version as a good thing. I think it's a useful exercise in learning a language to try translating into it.

Update: Of course, all of this is overkill for the number of items likely to be found in any sensible directory!

Polonius
Re^2: how to do effective sort in perl
created: 2006-05-03 22:35:05
Of course, all of this is overkill for the number of items likely to be found in any sensible directory!

heh. I've actually had to scold some of my co-workers, telling them they really should not be putting 300,000 files in a single directory, and there are easy enough ways to avoid this... Even 50,000 seems like too much to me (but then, I learned programming on a pdp 11-10, and my sense of scale may be a bit old-fashioned).

Re^3: how to do effective sort in perl
created: 2006-05-04 10:57:09

heh. I've actually had to scold some of my co-workers, telling them they really should not be putting 300,000 files in a single directory, and there are easy enough ways to avoid this...

Anything more than a screenful seems excessive to me! But then, I grew up on a Data General S-250, IIRC (Think PDP 11-70 - that sorta era, but less successful.)

Polonius
OT: Re^3: how to do effective sort in perl
created: 2006-05-04 13:37:21

Been a while since I have referred anyone to this, but it is a pretty good introductory discussion <shamelessplug>(and response)</shamelessplug> on why some directory structures slow down after a certain number of documents are included. Not necessarily useful for those Old System Administrators1.

1 - OSA's never die, they just put down roots.

--MidLifeXis

Re: how to do effective sort in perl
QM
created: 2006-05-03 15:35:16
As other have alluded, there are several paths to optimization.

Do you want it to run faster?

Do you want to use less code?

Do you want it to be more readable, maintainable, portable, or more robust?

glob or similar will be less code. What you've given (other than the lack of codetags) is almost as good as you can do for balancing all of the criteria above.

Incidentally, IIRC Perl has builtin optimizations for the following sort subs:

{$a cmp $b} # default
{$b cmp $a}
{$a <=> $b}
{$b <=> $a}
These are the most common. So using any one of these should be indistinguishable from the default (no sort sub) case.

-QM
--
Quantum Mechanics: The dreams stuff is made of

Re: how to do effective sort in perl
created: 2006-05-03 16:59:24
perl -e '@filenames = sort <*>'

perlmonks.org content © perlmonks.org and Anonymous Monk, blazar, BrowserUk, duff, Fletch, graff, Hue-Bond, MidLifeXis, Polonius, QM, smokemachine

prlmnks.org © 2006 edmund von der burg (eccles & toad)

v 0.03