Given a string that looks suspiciously like a path name:
'/a/b/c/d/e/abc00000' yet is not a path name, I am trying to extract the abc00000 identifier from the end. My regex doesn't work. I know how to write a better one that does, but I don't know specifically why this one fails:
my $s = '/a/b/c/d/e/abc00000';
my ($id) = $s =~ m{/(.+?\d+)$};
print $id, "\n";
__OUTPUT__
a/b/c/d/e/abc00000
I can fix it easily by using:
my ($id) = $s =~ m{/([^/]+\d+)$};
I understand why that one works. What I don't understand is why the first version behaves greedily.
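One way to check whether greediness is really the culprit is to swap the lazy .+? for a greedy .+ and compare two things: where each match starts (Perl's @- array, where $-[0] is the start offset of the whole match) and what group 1 captures. The helper start_and_capture is hypothetical, written only for this sketch.

```perl
use strict;
use warnings;

my $s = '/a/b/c/d/e/abc00000';

# Hypothetical helper: returns the offset where the overall match starts
# ($-[0]) and the text of capture group 1, or an empty list on failure.
sub start_and_capture {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    return ($-[0], $1);
}

# Lazy and greedy versions of the same pattern.
my ($lazy_at,   $lazy)   = start_and_capture($s, qr{/(.+?\d+)$});
my ($greedy_at, $greedy) = start_and_capture($s, qr{/(.+\d+)$});

print "lazy:   starts at $lazy_at, captured '$lazy'\n";
print "greedy: starts at $greedy_at, captured '$greedy'\n";
```

Both versions start at offset 0 and capture the same text. The first slash has already been chosen as the starting point before the quantifier's greediness matters at all.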
Thanks, but I have no shortage of correctly functioning regexes. I want to know precisely why the one listed doesn't work. chromatic knows.
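The regex engine prefers the leftmost match. It tries each starting position from left to right and returns the first one at which the whole pattern succeeds. Non-greediness only controls how much .+? consumes once a starting position has been chosen. It never makes the engine skip ahead to a later start. The first slash is already a viable start, because .+? can keep expanding across the remaining slashes until \d+$ matches. So nothing in your regex prevents it from matching everything between the first slash and the end of the line.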
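The leftmost rule can be sketched as a simplified model of the engine's outer loop: try each start offset in turn and stop at the first one where the pattern can match. This is an assumption for illustration only. perl's real engine adds optimizations that this model ignores. first_start is a hypothetical helper.

```perl
use strict;
use warnings;

# Simplified model (not perl's actual implementation) of the outer loop:
# try each start offset from left to right and stop at the first offset
# where the pattern can match.
sub first_start {
    my ($str, $re) = @_;
    for my $start (0 .. length $str) {
        # \A pins this attempt to the current start offset.
        return $start if substr($str, $start) =~ /\A(?:$re)/;
    }
    return;    # no start offset works
}

my $s = '/a/b/c/d/e/abc00000';
print first_start($s, qr{/.+?\d+$}), "\n";     # 0: the first slash already works
print first_start($s, qr{/[^/]+\d+$}), "\n";   # 10: only the last slash works
```

With .+?, offset 0 succeeds, so later slashes are never considered. [^/]+ cannot cross a slash, so every start before offset 10 fails and the loop moves on.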
So it isn't treating that .+? the way I expected, even though shorter matches exist?
I guess leftmost trumps non-greedy.
Right and right.
I don't understand why people expect non-greedy matching to mean "globally shortest match". Perhaps it's the terminology we use. Make left-to-right scanning the most prominent feature of your mental model of how Perl's regex engine works, and you shouldn't go wrong.
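A tiny example shows the difference between non-greedy and globally shortest. leftmost_match returns what Perl actually matches. globally_shortest is a hypothetical brute-force search that checks every substring, shortest first, for the "globally shortest match" people sometimes expect. Both helpers are illustrative and not from the thread.

```perl
use strict;
use warnings;

# What Perl's engine returns: the match at the leftmost viable start.
sub leftmost_match {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    # @- and @+ hold the start and end offsets of the whole match.
    return substr $str, $-[0], $+[0] - $-[0];
}

# Hypothetical brute force: the shortest substring the pattern matches
# in its entirety, regardless of where it starts.
sub globally_shortest {
    my ($str, $re) = @_;
    for my $len (1 .. length $str) {              # shortest lengths first
        for my $start (0 .. length($str) - $len) {
            my $cand = substr $str, $start, $len;
            return $cand if $cand =~ /\A(?:$re)\z/;
        }
    }
    return;
}

print leftmost_match('xaab', qr/a+?b/), "\n";     # aab
print globally_shortest('xaab', qr/a+?b/), "\n";  # ab
```

Perl returns 'aab' because the attempt starting at offset 1 succeeds once a+? has expanded to two a's. The shorter 'ab' at offset 2 is never tried.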
Short answer: the regex engine works from left to right.
Long answer: read Friedl's articles in The Perl Journal or his book Mastering Regular Expressions.
Remember: Perl is more than just regular expressions :)
my $str = '/a/b/c/d/e/abc00000';
my $end = (split '/', $str)[-1];   # pull the last element split() returns
print "we wanted '$end'\n";
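For comparison with the split approach, here are two other ways to pull out the last slash-separated piece. The rindex/substr version and the leading-.* regex are alternatives not taken from the thread. If the string contains no slash, rindex returns -1, so substr then yields the whole string.

```perl
use strict;
use warnings;

my $s = '/a/b/c/d/e/abc00000';

# 1. split on '/' and take the last field (as in the post above).
my $via_split = (split m{/}, $s)[-1];

# 2. rindex/substr: find the last slash and take everything after it.
my $via_rindex = substr $s, rindex($s, '/') + 1;

# 3. A greedy .* in front first runs to the end of the string and then
#    backtracks, so the '/' that follows it ends up being the LAST slash.
my ($via_prefix) = $s =~ m{.*/(.+?\d+)$};

print "$_\n" for $via_split, $via_rindex, $via_prefix;
```

All three yield abc00000. The third still starts matching at offset 0, but the greedy .* moves the literal slash to the last one, which the original pattern had no way to do.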
perlmonks.org content © perlmonks.org and chromatic, dpavlin, duff, ercparker, heroin_bob, japhy, kscaldef, mcogan1966, pbeckingham, Stevie-O