Parse with XML::Simple: how to keep some tags "unparsed"?
dda
created: 2004-07-01 06:08:21
Hi Monks!

I need to parse a simple XML file which contains HTML tags, for example:


  
    This is 
some HTML text
Is there a way to tell XML::Simple that everything inside the <content> element should not be parsed? If not, which module should I use instead?

Thank you for your help.

--dda

Re: Parse with XML::Simple: how to keep some tags "unparsed"?
created: 2004-07-01 06:13:39
If I understand you correctly, you don't want to parse the <content> tags because you want to save time.
Then it would probably be better to use XML::Parser.
Also have a look at http://perl-xml.sourceforge.net/ for the FAQ and examples.
Re^2: Parse with XML::Simple: how to keep some tags "unparsed"?
dda
created: 2004-07-01 06:34:25
No, it is not a matter of time. I need to keep the entire XHTML contents of the <content> tags in one place, not parsed into Perl data structures.

--dda

Re^3: Parse with XML::Simple: how to keep some tags "unparsed"?
created: 2004-07-01 06:46:59
If you are embedding data that has >'s and <'s, then you probably want to use a CDATA section within your XML to denote "this is data of the XML document, not part of the XML structure".

If that's beyond your control, you can always write a SAX handler that does just what you want.

Or you can write some XSLT that transforms the content nested data into what I described above.
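The "turn the nested data into CDATA first" idea can also be sketched without XSLT. Below is a minimal Perl version that swaps in a plain regex rewrite for the XSLT step; it is a shortcut, not a real XML parser, and it assumes <content> elements are not nested and do not already contain CDATA sections. The function names and the sample document are hypothetical.

```perl
use strict;
use warnings;

# Return $text as a CDATA section. A literal "]]>" inside the text would
# close the section early, so it is split across two adjacent sections.
sub cdata_section {
    my ($text) = @_;
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return "<![CDATA[$text]]>";
}

# Wrap the raw body of every <content>...</content> element in a CDATA
# section. Assumptions: <content> elements are not nested, their bodies
# hold no CDATA sections already, and self-closing <content/> tags are
# left alone (the (?<!/) lookbehind skips them).
sub wrap_content_in_cdata {
    my ($xml) = @_;
    $xml =~ s{(<content\b[^>]*(?<!/)>)(.*?)(</content>)}{
        my ($open, $body, $close) = ($1, $2, $3);
        $open . cdata_section($body) . $close;
    }gse;
    return $xml;
}

# Hypothetical sample document.
my $sample = '<doc><content>This is <b>some</b> HTML text</content></doc>';
print wrap_content_in_cdata($sample), "\n";
```

This prints `<doc><content><![CDATA[This is <b>some</b> HTML text]]></content></doc>`. Handing the rewritten string to XML::Simple's XMLin should then yield the body of <content> as one plain string. Comments or markup-like text elsewhere in the file can fool the regex, which is why the SAX or XSLT route is the sturdier option.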

Bart: God, Schmod. I want my monkey-man.

Re^4: Parse with XML::Simple: how to keep some tags "unparsed"?
created: 2004-07-01 10:30:45

Or you can write some XSLT that transforms the content nested data into what I described above.

Show the XSLT - you know what happens if you don't ;-)

[XSLT stylesheet not preserved]

/J\

Re^5: Parse with XML::Simple: how to keep some tags "unparsed"?
created: 2004-07-01 10:39:21
I'll retort with a joke.

In three places across the country, fires break out in three bedrooms. The fires aren't large yet, but the smoke wakes up the people inside those rooms. Sleeping in them are a chemist, a physicist and a computer scientist (or mathematician, works both ways).

The chemist wakes up, measures the fire's dimensions and heat, grabs a large container, measures the water exactly and puts out the fire with a fairly even distribution of water, not wasting a drop.

The physicist wakes up, grabs a container, fills it as quickly as he can, douses the fire and it's out.

The computer scientist wakes up, sees the fire, declares it an already solved problem and goes back to sleep.

Lesson learned? I'm lazy :) (The other excuse is that I know it's possible, but there were too many hurdles to get through: finding my XSLT book, getting an XSLT processor installed, etc.) Doesn't make it right. Bad monk, I know.


Re: Parse with XML::Simple: how to keep some tags "unparsed"?
created: 2004-07-01 09:03:48

This is not valid XML. You have a tag, "<content>", that contains both a value, "This is ", and a child tag wrapping "some HTML text". Pick one, or hide the < and > characters with &lt; and &gt;, or do the right thing and use CDATA.

Update: I stand corrected. I just checked the XML spec (http://www.w3.org/TR/2004/REC-xml-20040204) and gellyfish and ktingle are correct. Sorry.
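The entity route from this reply (hiding < and > from the parser) is easy when you generate the XML yourself: escape the HTML before writing it into <content>, and any XML parser decodes the entities again on reading, so the element's text value is the original HTML. A minimal Perl sketch; the function name xml_escape and the sample fragment are hypothetical.

```perl
use strict;
use warnings;

# Escape the characters that matter in XML character data.
# "&" must go first: otherwise the "&" in a freshly inserted "&lt;"
# would itself be escaped again to "&amp;lt;".
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical HTML fragment.
my $html = 'This is <b>some</b> HTML text';
print '<content>', xml_escape($html), "</content>\n";
```

This prints `<content>This is &lt;b&gt;some&lt;/b&gt; HTML text</content>`. Compared with CDATA, escaping never has to worry about a stray "]]>", at the cost of a less readable file.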

Re^2: Parse with XML::Simple: how to keep some tags "unparsed"?
created: 2004-07-01 09:40:31

It actually is valid - the <content> element is allowed to have mixed content. This snippet will give rise to a schema like:

[XML Schema not preserved]

However, this is certainly not what was intended - if the contents of <content> are to be taken literally, they should be in a CDATA section.

/J\

Re^2: Parse with XML::Simple: how to keep some tags "unparsed"?
created: 2004-07-01 10:23:59
An element can have both a text value and child elements; check the XML spec. It's awkward, but valid XML.
Re: Parse with XML::Simple: how to keep some tags "unparsed"?
created: 2004-07-01 10:30:07
Others have said to use CDATA but haven't given an illustration, so here is one:

  some HTML text
]]>
Re: Parse with XML::Simple: how to keep some tags "unparsed"?
created: 2004-07-01 15:51:51

Did you take a look at XML::Twig? AFAIK it's very flexible in parsing/filtering tags.

Re^2: Parse with XML::Simple: how to keep some tags "unparsed"?
qq
created: 2004-07-05 05:08:47
use XML::Twig;

my $xml = '

  
      This is 
some HTML text
';

# Print each <content> element, markup intact, as soon as it has been parsed.
my $twig = XML::Twig->new(twig_handlers => { content => sub { $_->print; print "\n"; } });
$twig->parse($xml);
$twig->purge;
Re: Parse with XML::Simple: how to keep some tags "unparsed"?
created: 2004-07-04 20:26:43
Heya dda
Using CDATA is a good choice, but you might also want to check out XML::Smart.

--
Kyoichi

perlmonks.org content © perlmonks.org and abclex, bageler, dda, exussum0, gellyfish, ktingle, Kyoichi, pbeckingham, qq, tinita

prlmnks.org © 2006 edmund von der burg (eccles & toad)
