I have a file with a really long string in it (it is actually XML, but for some reason it is stored on one line). What I need to do is a substring search of the file and print out the "word" that contains the substring. This "word" might be a URL, a description, etc. For coding and extraction purposes, the "word" is delimited by whitespace. So I need to back up to the beginning of the "word" and print out to the end of the "word."
Here is the code I have so far. As you can see, it uses a fixed substring size, and I need it to be dynamic:
while (<>) {
    my $istr = lc($_);
    my $offset = index($istr,"cesi");
    print $offset."\n";
    if ($offset > -1) {
        my $str = substr($istr, $offset-20, 100);
        print $str."\n";
    }
}
Thanks in advance for any input.
my $string = 'cesi';
# \Q...\E quotes any regex metacharacters in the search term
while ($istr =~ /(\S*\Q$string\E\S*)/gi) {
    print "$1\n";
}
Not tested, but it should work. The i flag makes the match case-insensitive, and the g flag lets the loop match repeatedly, so it catches all occurrences. Lowercasing the string ahead of time may help the speed, especially if you want the output to be lowercase (though you probably don't if you have things like URLs).
If you want to know the location of the word in the source string, the special arrays @- and @+ should come in handy.
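As a rough sketch of how @- and @+ could be used here: after a successful match, $-[1] holds the start offset of capture group 1 and $+[1] holds the offset just past its end. The sample string and the names $search and @hits are made up for illustration and are not from the original file.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the one-line XML file.
my $istr   = 'see http://example.com/CESI-docs for cesi info';
my $search = 'cesi';

my @hits;
while ($istr =~ /(\S*\Q$search\E\S*)/gi) {
    # $-[1] is where capture group 1 starts; $+[1] is one past where it ends.
    push @hits, [ $1, $-[1], $+[1] ];
    print "$1 (offsets $-[1] to $+[1])\n";
}
```

The length of each word is then $+[1] - $-[1], so substr($istr, $-[1], $+[1] - $-[1]) gives back the same text as $1.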
Perfect; that was the missing piece. I knew that I most likely had to use a regex, but that is admittedly a weak point for me. This does just what I am looking for.
Thanks for the help and the rapid reply.
Your description doesn't completely tally with your code. If the file contains a single long string, then your while loop will only iterate once. However, to print out all whitespace-delimited words that either match or contain a given search term, you could use:
$string = 'this is a really long string (no really, it is!) that contains a whitespace delimited word';
print $1 while $string =~ m[(\b\S*limit\S*\b)]gi; ## All words, case insensitive.
delimited
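One thing worth noting about the \b anchors: they trim the match back to word boundaries, so punctuation attached to either end of a token is dropped. The following is a small sketch comparing the two patterns on a made-up sample string ($text, @plain and @bounded are illustrative names only).

```perl
use strict;
use warnings;

# Hypothetical sample text; the parentheses and comma are attached to the word.
my $text = 'a whitespace (delimited), word';

# In list context, m//g with a capture group returns every captured match.
my @plain   = $text =~ /(\S*limit\S*)/gi;      # whole whitespace-delimited token
my @bounded = $text =~ /(\b\S*limit\S*\b)/gi;  # token trimmed to word boundaries

print "without \\b: @plain\n";
print "with \\b:    @bounded\n";
```

Here the pattern without \b captures "(delimited)," while the one with \b captures "delimited". For data such as URLs, where a leading or trailing non-word character may matter, leaving out the \b anchors is probably the safer choice.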
perlmonks.org content © perlmonks.org and bfdi533, BrowserUk, suaveant