foreach(loop through each $word in search phrase) {
    $answer =~ s/$word/<b><u>$word<\/u><\/b>/g;
}
Any tips or information would be greatly appreciated.
Perl noob
Edit: [g0n] - OP has edited node, removing context of some of the replies. Original content follows:
Hi Perl Monks, I have a regular expression issue that I am having trouble with. I am writing a script that underlines and bolds text based on the user's search phrase. I have this code already, but I need to change it so that it does NOT change text that is inside HTML tags.
foreach(loop through each $word in search phrase) {
$answer =~ tr/$word/$word;
}
Any tips or information would be greatly appreciated.
Perl noob
A) tr/// isn't what you want (even if you'd used a syntactically correct trailing slash). s/// is for substitutions.
B) Unless you can guarantee very strict formatting in the HTML you're operating on, you don't want to use a regexp to manipulate HTML. Use HTML::TokeParser or HTML::TreeBuilder or the like.
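Point A can be seen directly: tr/// maps single characters one-for-one, while s/// matches and replaces a whole string. A minimal comparison (the sample string is made up for illustration):

```perl
use strict;
use warnings;

my $text = "hello world";

# tr/// maps single characters: every 'l' becomes '0' and every 'o' becomes '1'.
# It never looks for the string "lo", and it does not interpolate variables,
# so tr/$word/.../ would act on the characters '$', 'w', 'o', 'r', 'd'.
( my $transliterated = $text ) =~ tr/lo/01/;

# s/// matches the whole string "world" and replaces it.
( my $substituted = $text ) =~ s/world/<b><u>world<\/u><\/b>/g;

print "$transliterated\n";    # he001 w1r0d
print "$substituted\n";       # hello <b><u>world</u></b>
```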
[ The OP silently updated his question. This post is now obsolete. ]
Result:
foreach my $word (split ' ', $search_phrase) {   # $search_phrase: the user's input
    $answer =~ s{\Q$word\E}{<b><u>$word</u></b>}g;
}
Read [doc://perlop] for details on s/// and on \Q...\E, which escapes any regex metacharacters in $word.
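Why the \Q...\E matters: a search phrase typed by a user can contain regex metacharacters. This small sketch (the sample strings are hypothetical) counts matches with and without quoting, and shows that an unbalanced parenthesis is a fatal regex error unless it is quoted:

```perl
use strict;
use warnings;

my $haystack = "cat cut c.t";
my $word     = "c.t";    # a "search word" containing a regex metacharacter

# Without \Q...\E the '.' is a wildcard, so "cat", "cut" and "c.t" all match.
my $unquoted = () = $haystack =~ /$word/g;

# With \Q...\E the word is matched literally: only "c.t" matches.
my $quoted = () = $haystack =~ /\Q$word\E/g;

# An unbalanced '(' in the search word makes the regex fail to compile
# (a run-time die), unless it is quoted.
my $paren   = "(";
my $died    = !eval { "a(b" =~ /$paren/; 1 };
my $literal = "a(b" =~ /\Q$paren\E/ ? 1 : 0;

print "unquoted: $unquoted, quoted: $quoted, died: ", ( $died ? 1 : 0 ),
    ", literal: $literal\n";
```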
So, let me get this straight. You have a user-entered search phrase and you want to highlight HTML content where it matches those words.
First, let me recommend that when you change your node, you mark it as Update: and either use strike notation or simply post your updated material in a separate paragraph, leaving your original post content alone.
Second, you want to parse out a search phrase into words and put them in an array -- use [perldoc://split|split] to accomplish this.
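A minimal sketch of that step, assuming words are separated by whitespace (the variable $search_phrase is hypothetical):

```perl
use strict;
use warnings;

my $search_phrase = "  perl   regex html ";    # hypothetical user input

# split ' ' (a single-space string) is a special case: it splits on runs of
# whitespace and ignores leading whitespace, so there is no empty first word.
my @search_words = split ' ', $search_phrase;

print join( ',', @search_words ), "\n";    # perl,regex,html
```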
Third, you'll want to step through that array, using a construct like this (untested):
for my $word (@search_words) {
    $html =~ s/(\Q$word\E)/<b><u>$1<\/u><\/b>/g;
}
This acts on the HTML in $html and replaces each match of $word with a highlighted version of itself ($1 holds the text captured by the parentheses). Because of the /g modifier, it replaces every occurrence in $html, not just the first.
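Used on raw HTML, a loop like the one above also rewrites matches inside tags, which is exactly what the original question wants to avoid. A small demonstration (the HTML and the search word are hypothetical):

```perl
use strict;
use warnings;

my $html         = '<a href="docs.html">href attribute</a>';
my @search_words = ('href');    # hypothetical search word

# The substitution loop from the reply above, with the word quoted via \Q...\E.
for my $word (@search_words) {
    $html =~ s/(\Q$word\E)/<b><u>$1<\/u><\/b>/g;
}

# The attribute name inside the <a> tag was rewritten too, breaking the tag.
print "$html\n";
```

Later iterations have the same problem with markup added by earlier ones: a search word such as u would also match inside the <u> tags already inserted.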
Fourth, if you want to parse out sections or tags of HTML, applying your substitution to some while ignoring others, you'll probably want to use a CPAN module to do that. I've used [cpan://HTML::TreeBuilder] for such things before, but you may want to search around a little for something that suits your needs.
Update: Ah, I see that [Fletch] already recommended this. Well, now you've heard it from two people! :)
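Without a parser, tags can be skipped only under the strict-formatting condition in Fletch's point B. The following is a core-Perl sketch of that special case, not a replacement for a parser; the sub name highlight_naive is hypothetical, and it assumes no '>' inside attribute values and no comments, CDATA, or script/style bodies:

```perl
use strict;
use warnings;

# Minimal sketch: highlight search words only in the text between tags.
# ASSUMPTION: every tag is a "<...>" run with no '>' inside attribute
# values, and there are no comments, CDATA sections, or <script>/<style>
# bodies. Real-world HTML often breaks these rules; a parser such as
# HTML::TokeParser does not depend on them.
sub highlight_naive {
    my ( $html, @words ) = @_;
    return $html unless @words;

    # Escape metacharacters and join the words into one alternation.
    my $pattern = join '|', map { quotemeta } @words;

    # With a capturing group, split returns text, tag, text, tag, ...
    # so even indices are text and odd indices are tags.
    my @parts = split /(<[^>]*>)/, $html;
    for my $i ( grep { $_ % 2 == 0 } 0 .. $#parts ) {
        $parts[$i] =~ s{($pattern)}{<b><u>$1</u></b>}g;
    }
    return join '', @parts;
}

my $out = highlight_naive( '<a href="perl.html">Learn perl</a>', 'perl' );
print "$out\n";    # <a href="perl.html">Learn <b><u>perl</u></b></a>
```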
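use strict;
use warnings;
use HTML::TokeParser::Simple;
use Getopt::Std;
my $Usage = "Usage: $0 -f word[,word...] file.html > highlighted.html\n";
my %opts;
( getopts( 'f:', \%opts ) and $opts{f} and @ARGV == 1 )
    or die $Usage;
# Turn the comma-separated word list into a regex alternation,
# escaping any regex metacharacters in the words themselves.
my $pattern = join '|', map { quotemeta } split /,/, $opts{f};
my $p = HTML::TokeParser::Simple->new( $ARGV[0] )
    or die "Can't open $ARGV[0]: $!\n";
# This applies a regex substitution to the text part
# and leaves the tags unmodified:
while ( my $token = $p->get_token ) {
    if ( $token->is_text ) {
        $_ = $token->as_is;
        s{($pattern)}{<b><u>$1</u></b>}g;
        print;
    }
    else {
        print $token->as_is;
    }
}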
(update: forgot to include the assignment to $_)
perlmonks.org content © perlmonks.org and axl163, Fletch, graff, ikegami, ptum