Perl regex
axl163
created: 2006-05-02 12:43:06
Hi Perl Monks, I have a regular expression issue that I am having trouble with. I am writing a script that underlines and bolds string text based on the user's search phrase. I have this code already, but I need to change it so that it does NOT modify text that is inside the following HTML tags:

foreach my $word (@words) {    # @words: each word of the search phrase
   $answer =~ s/$word/<b><u>$word<\/u><\/b>/g;
}
Any tips or information would be greatly appreciated. Perl noob

Edit by g0n: the OP has edited this node, removing the context of some of the replies. The original content follows:

Hi Perl Monks, I have a regular expression issue that I am having trouble with. I am writing a script that underlines and bolds string text based on the user's search phrase. I have this code already, but I need to change it so that it does NOT modify text that is inside HTML tags.

foreach(loop through each $word in search phrase) {
   $answer =~ tr/$word/<b><u>$word</u></b>;
}
Any tips or information would be greatly appreciated. Perl noob
Re: Perl regex
created: 2006-05-02 12:51:43

A) tr/// isn't what you want (even if you'd used a syntactically correct trailing slash). s/// is for substitutions.
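A quick way to see the difference between the two operators (a minimal sketch; the sample string is made up for illustration): tr/// maps characters one-for-one and never treats its search list as a string or pattern, while s/// replaces whole regex matches.

```perl
use strict;
use warnings;

# Illustrative input only.
my $text = 'a cat sat';

# tr/// is a character-by-character translation: c->d, a->o, t->g.
# It does not look for the string "cat" as a unit.
(my $translated = $text) =~ tr/cat/dog/;

# s/// substitutes regex matches: every "cat" becomes "dog".
(my $substituted = $text) =~ s/cat/dog/g;

print "$translated\n";     # prints "o dog sog"
print "$substituted\n";    # prints "a dog sat"
```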

B) Unless you can guarantee very strict formatting in the HTML you're operating on, you don't want to use a regexp to manipulate HTML. Use HTML::TokeParser, HTML::TreeBuilder, or the like.
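If you really can guarantee that strict formatting, here is a minimal core-Perl sketch of the idea (a swapped-in technique, not one of the modules named above): split the document into tags and text, and substitute only in the text pieces. It assumes there is no '>' inside attribute values and no comments, script blocks, or CDATA; the sample $html and $word are hypothetical. For real-world HTML, a parser is still the safer choice.

```perl
use strict;
use warnings;

# Hypothetical input. Assumes simple HTML: no '>' inside attribute values
# and no comments, <script> blocks, or CDATA sections.
my $html = '<a href="perl.html">perl</a> and more perl';
my $word = 'perl';

# A capturing group in the split pattern keeps each tag as its own element.
my @pieces = split /(<[^>]*>)/, $html;

for my $piece (@pieces) {
    next if $piece =~ /^</;    # leave tags and their attributes untouched
    $piece =~ s{(\Q$word\E)}{<b><u>$1</u></b>}g;    # $piece aliases the array element
}

my $result = join '', @pieces;

# The "perl" inside href="perl.html" is left alone; both text occurrences are wrapped.
print "$result\n";
```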

Re: Perl regex
created: 2006-05-02 12:52:05

[ The OP silently updated his question. This post is now obsolete. ]

  • tr/// does not do what you think it does. Replace tr with s.
  • Add the missing /.
  • You have unescaped / characters in your replace string. Either escape them, or change the delimiter.
  • $word contains text, not a regexp. Its content needs to be escaped.
  • You probably also want to add the g modifier, which indicates you want to replace all occurrences (instead of just the first one).

Result:

foreach my $word (@words) {    # @words: each word of the search phrase
   $answer =~ s{\Q$word\E}{<b><u>$word</u></b>}g;
}

Read perlop for details.
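To see why the point about escaping $word matters, here is a small sketch with a hypothetical search word containing the regex metacharacter '.'. It uses square brackets instead of the <b><u> tags only to keep the output short.

```perl
use strict;
use warnings;

# Hypothetical search word containing a regex metacharacter ('.').
my $word = '3.5';
my $text = 'Version 3x5 and version 3.5';

# Unescaped: '.' matches any character, so "3x5" is marked too.
(my $unescaped = $text) =~ s{($word)}{[$1]}g;

# Escaped with \Q...\E (the same as quotemeta): only the literal "3.5" matches.
(my $escaped = $text) =~ s{(\Q$word\E)}{[$1]}g;

print "$unescaped\n";    # prints "Version [3x5] and version [3.5]"
print "$escaped\n";      # prints "Version 3x5 and version [3.5]"
```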

Re: Perl regex
created: 2006-05-02 15:11:32

So, let me get this straight. You have a user-entered search phrase and you want to highlight HTML content where it matches those words.

First, let me recommend that when you change your node, you mark it as Update: and either use strike notation or simply post your updated material in a separate paragraph, leaving your original post content alone.

Second, you want to parse out a search phrase into words and put them in an array; use split to accomplish this.
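A minimal sketch of that split step, assuming the words are separated by whitespace (the sample phrase is hypothetical). The single-space string ' ' is a special pattern for split: it splits on runs of whitespace and ignores leading whitespace, like awk.

```perl
use strict;
use warnings;

# Hypothetical user input with irregular spacing.
my $search_phrase = '  perl   regex html ';

# ' ' splits on any run of whitespace and skips leading whitespace;
# trailing empty fields are dropped by split's default behavior.
my @search_words = split ' ', $search_phrase;

print join(',', @search_words), "\n";    # prints "perl,regex,html"
```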

Third, you'll want to step through that array, using a construct like this (untested):

  for my $word (@search_words) {
    $html =~ s/(\Q$word\E)/<b><u>$1<\/u><\/b>/g;
  }

This acts on the HTML in $html and replaces each match of $word with a highlighted copy of the matched text (that's what the $1, the text captured by the parentheses, accomplishes). The /g modifier makes it replace every occurrence in $html, not just the first one.
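One caveat with looping word by word: a later word can match inside the tags an earlier pass inserted (a search word "b" would hit the <b> tags themselves). A minimal sketch of an alternative, assuming the words should be matched literally: escape each word, join them into one alternation, and make a single pass so the inserted tags are never re-scanned. The sample words and text are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical search words and text for illustration.
my @search_words = ('perl', 'b');
my $text = 'perl is bold';

# Escape each word, then sort longest first so that when two words could
# match at the same position, the longer one is tried first.
my $alternation = join '|',
    map { quotemeta } sort { length $b <=> length $a } @search_words;

# One pass: each match is wrapped once and the new tags are not re-scanned.
(my $highlighted = $text) =~ s{($alternation)}{<b><u>$1</u></b>}g;

print "$highlighted\n";    # prints "<b><u>perl</u></b> is <b><u>b</u></b>old"
```

With the word-by-word loop, the second pass for "b" would also rewrite the b inside the <b> and </b> tags added for "perl".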

Fourth, if you want to parse out sections or tags of HTML, applying your substitution to some while ignoring others, you'll probably want to use a CPAN module to do that. I've used HTML::TreeBuilder for such things before, but you may want to search around a little for something that suits your needs.

Update: Ah, I see that Fletch already recommended this. Well, now you've heard it from two people! :)


No good deed goes unpunished. -- (attributed to) Oscar Wilde
Re: Perl regex
created: 2006-05-02 21:37:43
This is the sort of case where HTML::TokeParser::Simple really is simple:
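use strict;
use warnings;
use HTML::TokeParser::Simple;
use Getopt::Std;

my $Usage = "Usage: $0 -f word file.html > highlighted.html\n";

my %opts;
( getopts( 'f:', \%opts ) and $opts{f} and @ARGV == 1 )
    or die $Usage;

# Allow a comma-separated list: "foo,bar" becomes the regex
# alternation "foo|bar" (the words are used as regex patterns).
$opts{f} =~ tr/,/\|/;

# Open the HTML file named on the command line.
my $p = HTML::TokeParser::Simple->new( $ARGV[0] )
    or die "Can't open $ARGV[0]: $!\n";

# This applies a regex substitution to the text part
# and leaves the tags unmodified:

while ( my $token = $p->get_token ) {
    if ( $token->is_text ) {
        $_ = $token->as_is;
        s{($opts{f})}{<b><u>$1</u></b>}g;
        print;
    }
    else {
        print $token->as_is;
    }
}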

(update: forgot to include the assignment to $_)

perlmonks.org content © perlmonks.org and axl163, Fletch, graff, ikegami, ptum
