I don't remember regex seeming this hard before
danderson
created: 2004-06-14 20:08:55
Hello Monks,

I am thoroughly stuck.

Suppose one had a string formatted like so: " a " and wanted to translate every character inside <>s to upper case. (It's a much-simplified version of what I'm actually doing; I figured there was no point in cluttering up the question with [^\(\)-]s etc. Oops, too late.)

How do you go about this? Non-greedy matching won't work, because <>s can nest. That is, "< a> a " should have every char except the third 'a' modified. Non-greedy matching would modify only the first and last 'a'; greedy matching would modify all of them.
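To make the greedy/non-greedy difference concrete, here is a small sketch. The input '<<a> a> a <a>' is an example string of my own (an assumption), chosen to fit the description: the first two a's sit inside an outer group, the third is outside, and the fourth is in its own group.

```perl
use strict;
use warnings;

# Example input (an assumption): <a> nested inside an outer <...>, then text, then <a>.
my $input = '<<a> a> a <a>';

# Non-greedy: each match stops at the first '>' after its '<'.
(my $lazy = $input) =~ s/(<.*?>)/\U$1/g;

# Greedy: a single match runs from the first '<' to the last '>'.
(my $greedy = $input) =~ s/(<.*>)/\U$1/g;

print "$lazy\n";      # <<A> a> a <A>
print "$greedy\n";    # <<A> A> A <A>
```

Neither gives the wanted '<<A> A> a <A>': the non-greedy match closes the outer group too early, and the greedy one swallows the text between the two groups.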

So I've been thinking that the solution must be to iterate or recurse over the string char-by-char, but I'm loath to sink to C-array-style string parsing. Heck, I'm not even sure how to handle strings char-by-char in Perl.
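Walking a string one character at a time doesn't need C-style indexing in Perl: `split //, $str` returns the list of its characters. A minimal sketch that keeps a nesting-depth counter while walking them; the helper name `upcase_bracketed` and the example input are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: upcase every character inside at least one pair of <>.
sub upcase_bracketed {
    my ($str) = @_;
    my $depth = 0;
    my $out   = '';
    for my $ch (split //, $str) {                       # one character at a time
        if    ($ch eq '<') { $depth++ }
        elsif ($ch eq '>') { $depth-- if $depth > 0 }   # ignore a stray '>'
        $out .= $depth > 0 ? uc $ch : $ch;
    }
    return $out;
}

print upcase_bracketed('<<a> a> a <a>'), "\n";   # <<A> A> a <A>
```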

Is that the only solution? Or is there a regex trick that I don't know of that makes this simple?
Re: I don't remember regex seeming this hard before
created: 2004-06-14 20:20:52

You want a parser, not a regex. But I am in a weird mood ...

my $test = "< a> a ";

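for ($test) {      # alias $_ to $test so s/// edits it in place
  my $count = 0;   # current nesting depth
  # For each character, adjust the depth on < and >, then upcase the
  # character only while the depth is non-zero.
  s{(.)}{
    $count++ if $1 eq '<';
    $count-- if $1 eq '>';
    $count ? uc $1 : $1;
  }ge;
}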

print $test;

print "Just another Perl ${\(trickster and hacker)},"
The Sidhekin proves Sidhe did it!

Re^2: I don't remember regex seeming this hard before
created: 2004-06-14 20:33:38
Yes, you're right: it's closer to a parser (the real, non-example version is going to run a bit of regex on the contents of the <>s).

That's pretty spiffy, though. Thanks!
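For the real case, where the contents of each group get some regex treatment, one way to structure it is to hand the text inside each top-level group to a callback. A sketch under that assumption; `map_top_level_groups` and the '-' to '_' callback are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: pass the text inside each top-level <...> group
# (nested groups included) to $cb and splice the result back in.
# The outer brackets themselves are kept.
sub map_top_level_groups {
    my ($str, $cb) = @_;
    my $depth = 0;
    my $out   = '';    # finished output
    my $buf   = '';    # contents of the current top-level group
    for my $ch (split //, $str) {
        if ($depth == 0) {
            $out .= $ch;                 # outside any group: copy as-is
            $depth = 1 if $ch eq '<';    # a top-level group opens
            next;
        }
        $depth++ if $ch eq '<';
        $depth-- if $ch eq '>';
        if ($depth == 0) {
            $out .= $cb->($buf) . '>';   # group closed: transform its contents
            $buf = '';
        }
        else {
            $buf .= $ch;
        }
    }
    return $out . $buf;    # an unclosed group is passed through untransformed
}

# Upper-casing, as in the question:
print map_top_level_groups('<<a> a> a <a>', sub { uc $_[0] }), "\n";
# A regex on the contents instead:
print map_top_level_groups('x <foo-bar> y', sub { (my $s = $_[0]) =~ s/-/_/g; $s }), "\n";
```

The first call prints '<<A> A> a <A>' and the second 'x <foo_bar> y'. Keeping the depth tracking separate from the callback means the per-group regexes never have to deal with nesting themselves.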
Re: I don't remember regex seeming this hard before
created: 2004-06-14 22:11:44
To handle the nesting properly with a regex, you'd need to use (??{}), which is probably not the simplest answer for this problem. Use one of the other solutions.
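For reference, later perls (5.10 onward) added a recursion construct, `(?1)`, which re-enters capture group 1 and does the same job as `(??{})` without a separate variable. A sketch assuming perl 5.10 or later (the possessive `++` is also a 5.10 feature); the input string is an assumed example:

```perl
use strict;
use warnings;
use 5.010;    # (?1) recursion, possessive ++ and say need perl 5.10+

my $str = '<<a> a> a <a>';

# Group 1 matches one balanced <...> group; (?1) re-enters group 1
# to match a group nested inside it.
(my $out = $str) =~ s/(<(?:[^<>]++|(?1))*>)/\U$1/g;

say $out;    # <<A> A> a <A>
```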
Re: I don't remember regex seeming this hard before
created: 2004-06-15 05:32:01

Here's a simple sed-like solution. This is probably not the fastest way, though.

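$string = "< a> a \n";
# Replace a <...> group containing no '<' with its upper-cased contents
# (dropping the brackets), and redo the block until nothing matches.
{ $string=~s/<([^<]*)>/\U$1/g and redo; }
print $string;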
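To see how the redo loop converges, here is a variant that records the string after each pass. It uses `[^<>]*` instead of `[^<]*` (my change, so each pass rewrites only innermost groups) and an assumed example string; like the original, it drops the brackets:

```perl
use strict;
use warnings;

my $string = '<<a> a> a <a>';
my @passes;

# s///g returns the number of substitutions, so the loop stops once
# no bracket-free <...> group is left.
while ($string =~ s/<([^<>]*)>/\U$1/g) {
    push @passes, $string;
}

print "$_\n" for @passes;
# <A a> a A
# A A a A
```

Each pass peels off one level of nesting, so the number of passes equals the maximum depth.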
Re: I don't remember regex seeming this hard before
created: 2004-06-15 07:42:19
This article may provide some additional enlightenment.

We're not really tightening our belts, it just feels that way because we're getting fatter.
Re: I don't remember regex seeming this hard before
created: 2004-06-15 07:59:16
Use a regex that calls itself.
#!/usr/bin/perl

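my $qr;    # declared first so the (??{ }) block below can refer to it
# (??{$qr}) is evaluated at match time and returns this same pattern, so a
# nested <...> is matched by recursing into it; (?>...) is an atomic group,
# so a run of non-bracket characters is never backtracked into.
$qr = qr!(?:\<(?:(?>[^\<\>]+)|(??{$qr}))*\>)!;

$_  = "< a> a ";
s/($qr)/\U$1/g;    # upcase each outermost <...> group, brackets included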
print;
Boris
Re: I don't remember regex seeming this hard before
created: 2004-06-15 08:34:36

An uglier solution, just to be complete:

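# $k is the nesting depth: each (?{ }) block runs as a < or > is matched.
# Only the captured text is put back, so the brackets are dropped.
$k = 0; $string=~s@(?:<(?{++$k})|>(?{--$k}))*([^<>]*)@$k?uc($1):$1@ge;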
Re^2: I don't remember regex seeming this hard before
created: 2004-06-15 12:30:56
Or, to be (arguably) less ugly and do a little less work,
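# Each run is one or more brackets followed by text: add its count of '<'
# minus '>' to $k, then upcase the run (brackets included) if still nested.
s/([<>]+[^<>]+)/($k+=($1=~y#<##)-($1=~y#>##)) ? uc $1 : $1/ge;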
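The same run-by-run depth counting can also be written as a small tokenizer, using `\G` with `/gc` to step through the string one bracket or one run of text at a time. A sketch; `upcase_nested` is a hypothetical name and the example input is an assumption:

```perl
use strict;
use warnings;

# Hypothetical helper: tokenize into single brackets and runs of text,
# tracking depth and upcasing text runs that sit inside <>.
sub upcase_nested {
    my ($str) = @_;
    my ($depth, $out) = (0, '');
    while ($str =~ /\G(?:([<>])|([^<>]+))/gc) {
        if (defined $1) {                      # a bracket: adjust the depth
            $depth += $1 eq '<' ? 1 : -1;
            $out   .= $1;
        }
        else {                                 # a run of text
            $out .= $depth > 0 ? uc $2 : $2;
        }
    }
    return $out;
}

print upcase_nested('<<a> a> a <a>'), "\n";   # <<A> A> a <A>
```

Unlike the one-liner above, this version does not depend on a global $k being reset before each call.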

Re: I don't remember regex seeming this hard before
created: 2004-06-15 09:40:47
If you have the means, you may wish to check out Recipe 6.17 in The Perl Cookbook, 2nd Ed. I'm not typing out the code here in case that would be a copyright infringement.

perlmonks.org content © perlmonks.org and ambrus, borisz, bsb, danderson, orderthruchaos, Roy Johnson, runrig, Sidhekin

prlmnks.org © 2006 edmund von der burg (eccles & toad)
