Why is this regex greedy?
pbeckingham
created: 2004-07-02 16:16:16

Given a string that looks suspiciously like a path name:

  '/a/b/c/d/e/abc00000'
yet is not a path name, I am trying to extract the abc00000 identifier from the end. My regex doesn't work, and while I know how to write a better one that works, I don't know specifically why this one does not:
  my $s = '/a/b/c/d/e/abc00000';
  my ($id) = $s =~ m{/(.+?\d+)$};
  print $id, "\n";

  __OUTPUT__
  a/b/c/d/e/abc00000
I can fix it easily by using:
  my ($id) = $s =~ m{/([^/]+\d+)$};
Which I understand. It's just that I don't understand why the first version is greedy.

Re: Why is this regex greedy?
created: 2004-07-02 16:28:02
Your regex is matching:
/ followed by one or more of any character, then a run of digits that must reach the end of the line

updated: removed regex example since you just want it explained
Re^2: Why is this regex greedy?
created: 2004-07-02 16:38:46

Thanks, but I have no shortage of correctly functioning regexes - I want to know precisely why the one listed doesn't work. chromatic knows.

Re^3: Why is this regex greedy?
created: 2004-07-06 09:21:28
It's simple: the .+? is capturing everything.
Re: Why is this regex greedy?
created: 2004-07-02 16:32:19

The regex engine prefers leftmost, longest matches. Nothing in your regex prevents it from matching everything between the first slash and the end of line.
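A minimal sketch of that point, reusing the string and both patterns from the original post. @- is Perl's built-in array of match start offsets; the variable names are only illustrative. It shows where each match begins, which is the part the lazy quantifier has no say over.

```perl
use strict;
use warnings;

my $s = '/a/b/c/d/e/abc00000';

# Original pattern: the engine tries the earliest start position first,
# and the first slash (offset 0) can lead to a successful match.
my ($lazy_id) = $s =~ m{/(.+?\d+)$};
my $lazy_start = $-[0];    # offset where the whole match began

# Fixed pattern: [^/]+ cannot cross a slash, so every start before the
# last slash fails and the match begins at the last slash instead.
my ($fixed_id) = $s =~ m{/([^/]+\d+)$};
my $fixed_start = $-[0];

print "lazy : start=$lazy_start id=$lazy_id\n";
print "fixed: start=$fixed_start id=$fixed_id\n";
```

The first line should report a start offset of 0 and the second a start offset of 10, the position of the last slash.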

Re^2: Why is this regex greedy?
created: 2004-07-02 16:34:29

So it is not treating that .+? the way I expected? Even though there are more-minimal matches?

I guess leftmost-longest trumps non-greedy.

Re^3: Why is this regex greedy?
created: 2004-07-02 16:55:20

Right and right.

I don't understand why people expect non-greedy matching to actually mean "globally shortest match". Perhaps it's in the language we use. Just keep the left-to-rightness as the most prominent feature of your mental model of how perl's RE engine works and you shouldn't go wrong though.
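To see what non-greedy does and does not change, here is a small sketch that splits the original capture into two groups. The two-group patterns and variable names are my own illustration, not something from the original post. Both matches start at the first slash; the quantifier only decides how the text after that point is divided up.

```perl
use strict;
use warnings;

my $s = '/a/b/c/d/e/abc00000';

# Lazy: .+? grows one character at a time until \d+$ can succeed,
# so it stops as early as it can and leaves all the digits to \d+.
my ($lazy_head, $lazy_digits) = $s =~ m{/(.+?)(\d+)$};

# Greedy: .+ grabs everything, then gives back just enough
# (a single digit) for \d+$ to succeed.
my ($greedy_head, $greedy_digits) = $s =~ m{/(.+)(\d+)$};

print "lazy  : head=$lazy_head digits=$lazy_digits\n";
print "greedy: head=$greedy_head digits=$greedy_digits\n";
```

The lazy version hands all five digits to the second group and the greedy version hands it only the last one, yet both heads still begin with a/b/c, because neither quantifier can move the starting point.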

Re: Why is this regex greedy?
created: 2004-07-02 19:32:13
If you are running perl 5.6 or newer (and you are, right?) you might be able to insert use re 'debug'; which will give you detailed output from the regex engine. That's a good way to debug your regular expressions.
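A rough sketch of how that might look with the regex from the original post. The enclosing block is my own choice: the pragma is lexically scoped, so wrapping it keeps the trace limited to this one match. The trace itself goes to STDERR and its exact format varies between perl versions.

```perl
use strict;
use warnings;

my $s = '/a/b/c/d/e/abc00000';
my $id;

{
    # Only regexes compiled inside this block are traced.
    use re 'debug';
    ($id) = $s =~ m{/(.+?\d+)$};
}

print "$id\n";
```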

2share!2flame...
Re: Why is this regex greedy?
created: 2004-07-02 20:15:40

Short answer, the regex engine works left to right.

Long answer, go read Friedl's articles in The Perl Journal or his book on Mastering Regular Expressions.

Re: Why is this regex greedy?
created: 2004-07-02 21:51:48
Here's a good tutorial by chromatic that might be of help, if you haven't already checked it out.
~hb
Re: Why is this regex greedy?
created: 2004-07-02 22:50:11
I'd suggest you use m{.*/(.*)} to get whatever is after the last '/' (assuming there are no newlines in your data). Or perhaps use a module like File::Basename.
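Both suggestions, sketched against the string from the original post. The variable names are illustrative, and whether basename() is appropriate is an assumption on my part, since the original poster says the string only looks like a path.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my $s = '/a/b/c/d/e/abc00000';

# The leading greedy .* runs to the end, then backs off to the last
# slash, so the capture holds whatever follows that slash.
my ($tail) = $s =~ m{.*/(.*)};

# basename() returns the last component of a path-like string.
my $base = basename($s);

print "regex   : $tail\n";
print "basename: $base\n";
```

Both lines should print abc00000 for this input.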
_____________________________________________________
Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
Re: Why is this regex greedy?
created: 2004-07-04 13:04:27
Using a regular expression alone isn't necessarily what you want.

Remember: Perl is more than just regular expressions :)

  my $str = '/a/b/c/d/e/abc00000';
  my $end = (split '/', $str)[-1]; # pull the last element split() returns
  print "we wanted '$end'\n";
--Stevie-O

perlmonks.org content © perlmonks.org and chromatic, dpavlin, duff, ercparker, heroin_bob, japhy, kscaldef, mcogan1966, pbeckingham, Stevie-O

prlmnks.org © 2006 edmund von der burg (eccles & toad)