Hi monks,
I have following problem ... and it is no HOMEWORK (I am 38 years old) eg:
my $string ="EICHENBAUMSCHULE"; my @query = qw(EIC BAUM UMS CHU LE);
1. task
find matches
2. task
make longest query_string out off single short overlapping
(eg. BAUM and UMS) or touching (eg. CHU and LE) matches
e.g.
EICHENBAUMSCHULE
EIC
BAUM
UMS
CHU
LE
concatenate now
BAUMSCHULE
result should be
@new_query = qw(EIC, BAUMSCHULE);
Thanks in advance
Murcia
Updated - [Steve_p] made changes to the formatting -- [flux8] too.
Sounds like homework to me. I don't think having your homework done by the community is the right way to get through school - or life for that matter.
Tell us your thoughts on that matter and tell us your difficulties or problems, how you tried to solve them, where your insecurities are: we will help you along then.
Cheers, Sören
my $string ="EICHENBAUMSCHULE"; my @query = qw(EIC BAUM UMS CHU LE);would the following also match
BAUMSLECHU # the CHU and LE are not in the same order as indicated in @query
This doesn't handle any error conditions, like if the query contains non-matches, but it may give you a starting point.
#! perl -slw
use strict;
sub display{
my $string = shift;
return $string . $/,
map{ ' ' x index( $string, $_ ) . "$_\n" } @_;
}
my $string ="EICHENBAUMSCHULE";
my @query = qw( EIC BAUM UMS CHU LE );
print display( $string, @query );
__END__
P:\test>366569
EICHENBAUMSCHULE
EIC
BAUM
UMS
CHU
LE
As for code ... you still don't have all the requirements laid out yet. Specifically:
I'm sure there are other possibilities that I didn't think of in the first five minutes of looking at the problem. Remember - if you can't explain the problem to a teddy bear, you can't explain the problem to a computer.
------
We are the carpenters and bricklayers of the Information Age.
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose
I shouldn't have to say this, but any code, unless otherwise stated, is untested
In other words, it doesn't sound like you can explain the requirements to a teddybear. I would spend some time with pencil and paper before going much further. Then, when you have pencil and paper down, convert those into test cases before writing any code.
------
We are the carpenters and bricklayers of the Information Age.
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose
I shouldn't have to say this, but any code, unless otherwise stated, is untested
#!perl
use strict;
use warnings;
my $string = "EICHENBAUMSCHULE";
my @query = qw/EIC BAUM UMS CHU LE/;
@query = sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
grep { $_->[1] > -1 }
map { [$_, index($string, $_), index($string, $_) + length($_) - 1] }
@query;
my @matches;
@query
or die "No matches";
my $i = -1;
my $p = -2;
for (@query) {
if ($p + 1 < $_->[1]) {
++$i;
}
push @{$matches[$i]}, $_;
$p = $_->[2];
}
@matches = sort { $b->[0] <=> $a->[0] }
map { [$_->[-1]->[2] - $_->[0]->[1] + 1, $_] }
@matches;
print "Longest continuous match has length $matches[0]->[0]\n\n";
print "$string\n";
for (@{$matches[0]->[1]}) {
print ' ' x $_->[1] . $_->[0] . "\n";
}
Sorry for the false start, hopefully this does the job.
#! perl -slw
use strict;
use List::Util qw[ reduce ];
sub longestComposite{
my( $string, $qRef ) = @_;
my $copy = ' ' x length $string;
$string =~ s[($_)]{
my $len = length $1;
substr( $copy, pos( $string ), $len ) = $1;
$1;
}ge for @$qRef;
return reduce{
length $a > length $b ? $a : $b
} split ' ', $copy;
}
my $string ="EICHENBAUMSCHULE";
my @query = qw( EIC BAUM UMS CHU LE );
print longestComposite( $string, \@query );
__END__
P:\test>366569
BAUMSCHULE
No promises about efficiency if you're doing this on large volumes of data.
#!/usr/bin/perl use strict; use warnings; $\ = $/; my $string = 'eichenbaumschule'; my @query = qw(baum ums eic chu le); $string =~ s/$_/uc $_/ieg for @query; @query = $string =~ /([A-Z]+)/g; print for @query; __OUTPUT__ EIC BAUMSCHULE
my $string = 'eichenbaumschule';
my @query = qw(baum ums eic chu le);
@query = map { qr/($_)/i } @query;
$string =~ s/$_/\U$1/g for @query;
(I get ~6000 iterations/sec this way vs. ~3000 your way).
Update: Just realized, this is only a factor if you do the same substitutions multiple times in the same program...if you're only doing this once per run, then you may as well do it [delirium]'s way :-)
This is really neat++.
As the target is genome work which usually involves large volumes of big strings, using split '[a-z]+', string; to avoid the capture brackets will save a little time.
sub delirium{
my( $string, $qRef ) = @_;
$string =~ tr[A-Z][a-z];
$string =~ s/$_/uc $_/ieg for @$qRef;
return reduce{
length $a > length $b ? $a : $b
} split '[a-z]+',;
}
my $string ="EICHENBAUMSCHULE";
my @query = qw(EIC BAUM UMS CHU LE);
my $matchline = '@' x length $string;
for $match(@query) {
my $spacer = index $string, $match;
$matchline |= (('@' x $spacer) . $match) if $spacer > -1;
}
print for sort { length($b) <=> length($a) } $matchline =~ /[^@]+/g;
perlmonks.org content © perlmonks.org and BrowserUk, CombatSquirrel, davidj, delirium, dragonchild, Happy-the-monk, Jasper, Murcia, runrig
prlmnks.org © 2006 edmund von der burg (eccles & toad)
v 0.03