HTML::Strip question--stripping only certain tags?
Anonymous Monk
created: 2006-02-01 16:49:21
I need to take some HTML text and strip it of only certain HTML tags (the rest can stay). I took a look at HTML::Strip and it seems I can either strip all HTML tags or only a list of certain HTML tags. Is there some way to get it to strip everything EXCEPT a certain list of tags? Is there some other module I should be looking at instead?
Re: HTML::Strip question--stripping only certain tags?
created: 2006-02-01 17:01:28

If you only want to remove certain parts, you could take a look at HTML::TreeBuilder and friends and use them to selectively pull out the elements you want from what's there. Alternatively, if your HTML is well-formed enough, you could use XML::Twig to do something similar.
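
A minimal sketch of the HTML::TreeBuilder route, for what it's worth. The whitelist and the strip_except name are made up for illustration; the idea is to unwrap every element that is not on the list, so its text and any allowed children stay put.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Entities qw(encode_entities);

# Hypothetical whitelist of tags to keep; everything else is unwrapped.
my %keep = map { $_ => 1 } qw(p b i u hr br);

# strip_except is an illustrative helper name, not part of any module.
sub strip_except {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $body = $tree->find_by_tag_name('body');

    # Replace each disallowed element with its own children, so the
    # text (and any allowed tags inside it) survives.
    for my $el ( $body->descendants ) {
        next if $keep{ $el->tag };
        $el->replace_with_content->delete;
    }

    # Serialize only the children of <body>, leaving out the
    # <html>/<body> wrapper that TreeBuilder adds around fragments.
    my $out = '';
    for my $node ( $body->content_list ) {
        if ( ref $node ) {
            $out .= $node->as_HTML( '<>&', undef, {} );
        }
        else {
            # Plain text nodes are stored decoded, so re-encode them.
            $out .= encode_entities( $node, '<>&' );
        }
    }

    $tree->delete;
    return $out;
}

print strip_except('see <a href="#">link</a> and <b>bold</b>'), "\n";
```

Note that this keeps the text inside removed elements, and that TreeBuilder may normalize whitespace, so the result is not byte-for-byte the input minus tags.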

Another source of inspiration might be the Slashcode source: look at its comment filtering, since that sounds like what you're trying to do.

Re: HTML::Strip question--stripping only certain tags?
created: 2006-02-01 18:05:23
HTML::Scrubber lets you allow only selected tags. I lifted the following from the POD (slightly modified):

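#!/usr/bin/perl -w
use strict;
use HTML::Scrubber;

# Sample input. The markup was eaten when this post was rendered, so the
# tags below are reconstructed from the labels.
my $html = q[
    a  => <a href="#">link</a>
    br => <br>
    b  => <b>bold</b>
    u  => <u>UNDERLINE</u>
];

# only allow the following tags
my $scrubber = HTML::Scrubber->new( allow => [ qw[ p b i u hr br ] ] );

print $scrubber->scrub($html);

__END__

Output:
    a  => link
    br => <br>
    b  => <b>bold</b>
    u  => <u>UNDERLINE</u>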
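
HTML::Scrubber can also whitelist attributes per tag through its rules option. A rough sketch, assuming you want links to survive but only with an http(s) href; the sample input and the href pattern are invented for illustration:

```perl
use strict;
use warnings;
use HTML::Scrubber;

my $scrubber = HTML::Scrubber->new(
    # these tags are allowed, but without any attributes
    allow => [ qw[ p b i u hr br ] ],
    rules => [
        # <a> is allowed; href must match the pattern, and every
        # other attribute ('*') is dropped
        a => { href => qr{^https?://}i, '*' => 0 },
    ],
);

# Illustrative input only
my $dirty = join ' ',
    '<a href="http://example.com/" onmouseover="evil()">link</a>',
    '<b onclick="evil()">bold</b>',
    '<a href="javascript:alert(1)">bad</a>';

my $clean = $scrubber->scrub($dirty);
print $clean, "\n";
```

The event-handler attributes and the javascript: href should be gone from the result, while the tags themselves and the plain http link stay.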
Re: HTML::Strip question--stripping only certain tags?
created: 2006-02-01 19:24:26

There's a module based on HTML::Parser for this on my pages (not released to CPAN) that allows just that and more. It lets you specify not just the list of tags to allow, but also the attributes. So no more unexpected onMouseOvers and onLoads ;-)

Just the module name is a bit silly ...
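
To show the general idea only (this is not the module mentioned above, just a bare-bones sketch written directly against HTML::Parser), a tag and attribute whitelist might look roughly like this. The %allowed table and the filter_html name are assumptions for illustration:

```perl
use strict;
use warnings;
use HTML::Parser;
use HTML::Entities qw(encode_entities);

# Hypothetical whitelist: tag => { attribute => 1, ... }
my %allowed = (
    a  => { href => 1 },
    p  => {}, b  => {}, i  => {}, u => {},
    br => {}, hr => {},
);

# filter_html is an illustrative name, not an existing API.
sub filter_html {
    my ($html) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Start tags: emit only allowed tags with allowed attributes.
        start_h => [ sub {
            my ( $tag, $attr ) = @_;
            my $ok = $allowed{$tag} or return;
            $out .= "<$tag";
            for my $name ( sort keys %$attr ) {
                next unless $ok->{$name};
                $out .= sprintf ' %s="%s"', $name,
                    encode_entities( $attr->{$name}, '<>&"' );
            }
            $out .= '>';
        }, 'tagname, attr' ],
        # End tags: emit only for allowed tags.
        end_h => [ sub {
            my ($tag) = @_;
            $out .= "</$tag>" if $allowed{$tag};
        }, 'tagname' ],
        # Text passes through as it appeared in the source.
        text_h => [ sub { $out .= $_[0] }, 'text' ],
        # Comments, declarations etc. have no handler and are dropped.
    );

    $parser->parse($html);
    $parser->eof;
    return $out;
}

print filter_html('<b onclick="x()">bold</b> <a href="/x" onmouseover="y()">link</a>'), "\n";
```

One caveat with this sketch: the text inside a dropped script or style element still comes through as plain text, so a real filter would want to track and skip those.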

Jenda

XML sucks. Badly. SOAP on the other hand is the most powerful vacuum pump ever invented.

perlmonks.org content © perlmonks.org and Anonymous Monk, bmann, Fletch, Jenda

prlmnks.org © 2006 edmund von der burg (eccles & toad)
