0/1/2/3 0 0/4/5/6 0/4 0/1 0/1/2 0/4/5 0/10/111/145 0/10/111 0/10Now I'm trying to sort them. The correct order is:
0 0/1 0/1/2 0/1/2/3 0/4 0/4/5 0/4/5/6 0/10 0/10/111 0/10/111/145Perl's default sort using [cmp] comes very close, but it places 0/10 before 0/4, and it should not. I've found a really ugly solution involving a couple of for loops and such things, but I was really hoping for anything faster and cleaner. Any ideas?
sub do_the_sort {
my @aa = split /\//, $a;
my @bb = split /\//, $b;
for ( 0 .. $#aa ) {
return 1 unless defined $bb[$_];
my $zz = $aa[$_] <=> $bb[$_];
return $zz if $zz;
}
-1;
}
my @y = sort do_the_sort @x;
my @data = map {chomp; [ split '/' ] } ;
@data = map { join '/', @$_ } sort my_sort @data;
sub my_sort {
my $i = shift || 0;
if ( defined $a->[$i] && defined $b->[$i] ) {
return $a->[$i] <=> $b->[$i] || my_sort(++$i);
}
return defined $a->[$i] ? 1 : -1;
}
Cheers - [Limbic~Region|L~R]
If your numeric values are all less than 255, then this works. You could swap the ST to a GRT or OM if performance is a concern. If your numbers get larger than 255 you could probably use unicode pack 'U*', but I haven't tested that.
@data = qw[
0/1/2/3 0 0/4/5/6 0/4 0/1 0/1/2
0/4/5 0/10/111/145 0/10/111 0/10
];;
print for map{
$_->[0]
} sort{
$a->[1] cmp $b->[1]
} map{
[ $_, pack 'C*', split '/', ]
} @data;;
0
0/1
0/1/2
0/1/2/3
0/4
0/4/5
0/4/5/6
0/10
0/10/111
0/10/111/145
My perl shell has the -l switch on the shebang line which means amongst other things) that print statements have a newline appended to them automatically. See perlrun for the details.
And after thinking I found it:
print for map{
$_->[0] . "\n"
}
etc.
I should think a lot more before posting a trivial questions. :-)
Are you sure that non-C locales wont cause this to sort in orders other than ASCIIbetical (and thus wrongly), particularly if you add Unicode to the mix?
Makeshifts last the longest.
Are you sure that non-C locales won’t cause this to sort in orders other than ASCIIbetical
Um, no. Did I imply that I was?
I assume that anyone using non-C locales will know they are and know that they will have to take special steps to get sorts to operate correctly. This assumption is based upon the fact that many uses of GRT (including yours elsewhere in the thread), and as far as I can tell from looking, many of the various sorting modules on CPAN will be similarly affected.
#!/usr/bin/perl
use strict;
use warnings;
# parse the paths into lists
my @data;
while ( ) {
chomp;
push @data, [ split m:/:, $_ ];
}
sub compare_lists_lexicographically {
# make copies of the input lists
my @left = @$a;
my @right = @$b;
# chew threw the lists until
# a) we discover they're different lengths, or
# b) the elements are different, or
# c) neither
while ( 1 ) {
if ( !@left && !@right ) {
# ran out at the same time: they're equal
return 0;
}
elsif ( !@left ) {
# left is shorter than right: left is first
return -1;
}
elsif ( !@right ) {
# right is shorter than left: right is first
return 1;
}
else {
# check the next element from each list
my ( $l, $r ) = ( shift @left, shift @right );
# NB: use < if fields are numeric, lt otherwise
if ( $l < $r ) {
# next element of left comes first
return -1;
}
elsif ( $r < $l ) {
# next element of right comes first
return 1;
}
# else equal, keep checking
}
}
}
# display the output
printf "%s\n", join '/', @$_ for sort compare_lists_lexicographically @data;
__DATA__
0/1/2/3
0
0/4/5/6
0/4
0/1
0/1/2
0/4/5
0/10/111/145
0/10/111
0/10
HTH,
perl -le 'print for map { join"/",map ord,split"",$_}
sort map {chomp;join"",map chr,@x=split"/",$_}<>'
get the right order?
Id use a GRT variant.
use List::Util qw( max );
my @padded = do {
my @copy = @path;
my @segment_length;
while( 1 ) {
my $len = max map { s!([^/]*/|[^/]+)!! ? length( $1 ) : -1 } @copy;
last if $len == -1;
push @segment_length, $len;
}
my $format = join '', map "%${_}s", @segment_length;
map {;
no warnings 'uninitialized';
sprintf $format, split m!/!, $_, -1;
} @path;
};
@path = @path[ sort { $padded[ $a ] cmp $padded[ $b ] } 0 .. $#padded ];
Makeshifts last the longest.
foreach (@in) {s|(\d+)|sprintf '%10.10d', $1|eg};
@in = sort @in;
foreach (@in) {s|(\d+)|sprintf '%d', $1|eg};
This assumes that your numbers have up to 10 digits.
perlmonks.org content © perlmonks.org and Aristotle, arkturuz, athomason, borisz, BrowserUk, BUU, educated_foo, Limbic~Region, NiJo
prlmnks.org © 2006 edmund von der burg (eccles & toad)
v 0.03