$seq="IIIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOMMMMMMMMMMMMMIIIIIMMMMMMMMMOOOOOOOOOOOOOOMMMMMMMMMMMMMIIIMMMMMMMMMMMOOOOOOOOOOOOOOOMMMMMMMMMMMIIIIIIMMMMMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMMMIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOOMMMMMMMMI";
Using [doc://index] would look something like the following:
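sub using_index {
    our $seq; *seq = \$_[0];   # alias the argument to avoid copying the string
    my @groups;
    my $pos   = 0;    # where the next search begins
    my $start = -1;   # start of the current run, or -1 if no run is open
    for (;;) {
        my $new_pos = index($seq, 'M', $pos);
        if ($new_pos < 0) {
            # No more M's: close the run in progress, if any.
            if ($start >= 0) {
                push(@groups, [ $start, $pos ]);
            }
            last;
        }
        if ($start < 0) {
            $start = $new_pos;
        }
        elsif ($new_pos > $pos) {
            # A gap of one or more characters ended the previous run.
            push(@groups, [ $start, $pos ]);
            $start = $new_pos;
        }
        $pos = $new_pos + 1;
    }
    return @groups;
}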
It would be simpler if there were a function that returned the position of the next character that isn't 'M'.
As you might guess, the index method is much slower than the regexp approach: on the input you provided, the regexp approach is about 160% faster (i.e. 2.6 times the speed).
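For what it's worth, here is a rough sketch of what the loop could look like with such a helper. Perl has no builtin for this, so `index_not` below is a hypothetical function I made up for illustration (and it cheats by using a small regexp internally). It is not part of the benchmark.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Perl builtin): position of the first character
# at or after $pos that is not $char, or -1 if there is none.
sub index_not {
    my ($str, $char, $pos) = @_;
    pos($str) = $pos;                      # start the search at $pos
    return $str =~ /[^\Q$char\E]/g ? $-[0] : -1;
}

# Return [start, end] pairs (0-based, end exclusive) for every run of 'M'.
sub runs_with_index_not {
    my ($seq) = @_;
    my @groups;
    my $start = index($seq, 'M');
    while ($start >= 0) {
        my $end = index_not($seq, 'M', $start);
        $end = length($seq) if $end < 0;   # run extends to the end of the string
        push @groups, [ $start, $end ];
        $start = index($seq, 'M', $end);
    }
    return @groups;
}

# Short illustrative input, not the original sequence.
printf("%d to %d\n", @$_) for runs_with_index_not("IIMMMOMMI");
```

With the helper, each run needs only two searches (one for its start, one for its end) instead of one `index` call per 'M'.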
Benchmark code:
use strict;
use warnings;
use Benchmark qw( cmpthese );
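sub using_index {
    our $seq; *seq = \$_[0];   # alias the argument to avoid copying the string
    my @groups;
    my $pos   = 0;    # where the next search begins
    my $start = -1;   # start of the current run, or -1 if no run is open
    for (;;) {
        my $new_pos = index($seq, 'M', $pos);
        if ($new_pos < 0) {
            # No more M's: close the run in progress, if any.
            if ($start >= 0) {
                push(@groups, [ $start, $pos ]);
            }
            last;
        }
        if ($start < 0) {
            $start = $new_pos;
        }
        elsif ($new_pos > $pos) {
            # A gap of one or more characters ended the previous run.
            push(@groups, [ $start, $pos ]);
            $start = $new_pos;
        }
        $pos = $new_pos + 1;
    }
    return @groups;
}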
sub using_regexp {
    our $seq; *seq = \$_[0];
    my @groups;
    push(@groups, [ $-[0], $+[0] ]) while $seq =~ /M+/g;
    return @groups;
}
{
    my $seq = "IIIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOMMMMMMMMMMMMMIIIIIMMMMMMMMMOOOOOOOOOOOOOOMMMMMMMMMMMMMIIIMMMMMMMMMMMOOOOOOOOOOOOOOOMMMMMMMMMMMIIIIIIMMMMMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMMMIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOOMMMMMMMMI";
    print("using_index\n");
    print("-----------\n");
    printf("%d to %d\n", @$_) foreach using_index($seq);
    print("\n");
    print("using_regexp\n");
    print("------------\n");
    printf("%d to %d\n", @$_) foreach using_regexp($seq);
    print("\n");
    cmpthese(-3, {
        using_index  => sub { my @groups = using_index $seq; 1; },
        using_regexp => sub { my @groups = using_regexp $seq; 1; },
    });
}
Benchmark results:
               Rate using_index using_regexp
using_index  2295/s          --         -62%
using_regexp 5995/s        161%           --
my $seq = "...";
my @groups;
push @groups, [$-[0], $+[0]] while $seq =~ /M+/g;
print "$_->[0] to $_->[1]\n" for @groups;
This gives me different values than you've shown, but I believe it's correct.
I have come up with this. I don't know whether using $& is efficient or not.
use strict;
use warnings;
my $seq="IIIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOMMMMMMMMMMMMMIIIIIMMMMMMMMMOOOOOOOOOOOOOOMMMMMMMMMMMMMIIIMMMMMMMMMMMOOOOOOOOOOOOOOOMMMMMMMMMMMIIIIIIMMMMMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMMMIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOOMMMMMMMMI";
while ($seq=~/(M+)/g) {
my $l = pos($seq);
print $l-length($&)+1," to ",$l,$/;
}
Regards,
Murugesan Kandasamy
use perl for(;;);
I don't know whether using $& is efficient or not.
Using $& is not efficient, and usually to be avoided. See the entry in perlvar for details.
That's not exactly true. $& is only inefficient if you have another regexp in your program which doesn't capture.
However, its use is discouraged, since captures can perform the same task without the "effect at a distance" of $&.
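As a minimal sketch of that last point, the loop posted earlier can use the capture group it already has ($1) in place of $&. The sub name `runs_with_capture` and the short test string are mine, chosen only for illustration. Positions are 1-based and inclusive, as in the earlier post.

```perl
use strict;
use warnings;

# Return [start, end] pairs (1-based, inclusive) for every run of 'M',
# measuring the run with the capture group $1 instead of $&.
sub runs_with_capture {
    my ($seq) = @_;
    my @groups;
    while ($seq =~ /(M+)/g) {
        my $end = pos($seq);                       # 1-based position of the last 'M'
        push @groups, [ $end - length($1) + 1, $end ];
    }
    return @groups;
}

# Short illustrative input, not the original sequence.
print "$_->[0] to $_->[1]\n" for runs_with_capture("IIMMMOOMMI");
```

On that sample input this prints "3 to 5" and "8 to 9".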
On the off chance that you actually meant 6 or more M's, try this modification of [japhy]'s solution:
my $seq = "...";
my @groups;
push @groups, [$-[0], $+[0]-1] while $seq =~ /M{6,}/g;
print "$_->[0] to $_->[1]\n" for @groups;
(where the displayed positions are 0-based.)
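In case the position conventions are the source of confusion, here is a small sketch that makes them explicit. `$-[0]` is the 0-based offset where the match starts, and `$+[0]` is the offset just past its end, hence the `-1` for an inclusive end. The sub `runs_at_least` and its `$base` parameter are my own names for illustration, not anything from the posts above.

```perl
use strict;
use warnings;

# Return runs of at least $min 'M's as [start, end] pairs, both inclusive.
# $base is 0 for 0-based positions or 1 for 1-based positions.
sub runs_at_least {
    my ($seq, $min, $base) = @_;
    my @groups;
    # $-[0]: 0-based offset of the match start.
    # $+[0]: offset just past the match end.
    push @groups, [ $-[0] + $base, $+[0] - 1 + $base ]
        while $seq =~ /M{$min,}/g;
    return @groups;
}

# Short illustrative input, not the original sequence.
my $sample = "IIMMMMMMOOMMMI";
print "0-based: $_->[0] to $_->[1]\n" for runs_at_least($sample, 6, 0);
print "1-based: $_->[0] to $_->[1]\n" for runs_at_least($sample, 6, 1);
```

On the sample this reports "2 to 7" in 0-based terms and "3 to 8" in 1-based terms; the run of three M's is skipped because it is shorter than 6.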
my @pos;
my $start = index($str, 'M');
while ($start != -1) {
my $pos;
my $i = 0;
1 while ($pos = index($str, 'M', $start + $i)) == $start + $i++;
push @pos, [$start, $start + $i-2];
$start = $pos;
}
$seq="IIIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOMMMMMMMMMMMMMIIIIIMMMMMMMMMOOOOOOOOOOOOOOMMMMMMMMMMMMMIIIMMMMMMMMMMMOOOOOOOOOOOOOOOMMMMMMMMMMMIIIIIIMMMMMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOMMMMMMMIIIMMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOOOOMMMMMMMMMIIIMMMMMMMMMMMOOOOOOOOOOOOOOOOOMMMMMMMMI";
$i=0;
$_= $seq;
s/([^M]*)(M*)/{ my $j=$i+length($1); $i=$j+length($2); $j==$i ? "" : "pos $j-$i\n" }/ge;
print $seq, "\n", $_, "\n";
perlmonks.org content © perlmonks.org and Anonymous Monk, Cristoforo, ikegami, japhy, kwaping, murugu, revdiablo, Skeeve, ysth