File name regex
dev2dev
created: 2006-01-20 01:18:22
Hi monks, I need your expertise. I have a regex to match a file name, and I am interested in either ^\d{8}(_(\d)+)*\.ilf or ^\d{8}(\s(\d)+)*\.sht. Is there any shortcut for this?
if (($file=~m/^\d{8}(_(\d)+)*\.ilf/i) || ($file=~/^\d{8}(\s(\d)+)*\.sht/i)){

#blah blah
}

Thanks in advance
Re: File name regex
created: 2006-01-20 01:52:33

It's not possible to combine those regexps if the parens are truly meant to be captures ((...)) and not groupings ((?:...)). Any joining would change the variables in which the data is captured.
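To illustrate how joining shifts the capture variables, here is a minimal sketch. The sample file name is made up for the demonstration, and the joined pattern is just one naive way of merging the two originals.

```perl
use strict;
use warnings;

# Hypothetical sample name, chosen only for illustration.
my $file = '20060120 7.sht';

# Original .sht pattern: the outer group is $1, the inner (\d) is $2.
my @separate = $file =~ /^\d{8}(\s(\d)+)*\.sht/i;

# Naively joined pattern: the .ilf groups now occupy $1 and $2,
# so the .sht groups are pushed to $3 and $4.
my @joined = $file =~ /^\d{8}(?:(_(\d)+)*\.ilf|(\s(\d)+)*\.sht)/i;

# Render a capture list, showing undefined slots explicitly.
sub show { join ', ', map { defined $_ ? "'$_'" : 'undef' } @_ }

print "separate: ", show(@separate), "\n";
print "joined:   ", show(@joined),   "\n";
```

With the separate pattern the data lands in the first two captures; with the joined one the first two slots are undefined and the same data appears in the third and fourth.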

Let's assume that you only used the parens for grouping (and that you forgot $ at the end of each regexp), then you'd be starting with the following:

if ($file =~ /^\d{8}(?:_\d+)*\.ilf$/i ||
    $file =~ /^\d{8}(?:\s\d+)*\.sht$/i
) {
   ...
}

The possibilities for joining are still quite limited since there is a mixture of common and not common elements:

      vvvvvvvvv  vvv vvv   vvvv  common
     /^\d{8}(?:_ \d+)*\.ilf$/ix
     /^\d{8}(?:\s\d+)*\.sht$/ix
               ^^       ^^^      not common

The best that can be done, as far as I can tell, is to join the common beginning and the common ending, as seen in the following code snippet:

if ($file =~ /^\d{8}(?:(?:_\d+)*\.ilf|(?:\s\d+)*\.sht)$/i) {
   ...
}

This is definitely less readable.
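A quick way to check that the joined regexp accepts and rejects the same names as the two separate ones is to run both over a few samples. This is only a sketch: the sub names and the sample file names are invented for the test.

```perl
use strict;
use warnings;

# The two separate tests (hypothetical helper name).
sub matches_separate {
    my ($file) = @_;
    return ( $file =~ /^\d{8}(?:_\d+)*\.ilf$/i
          || $file =~ /^\d{8}(?:\s\d+)*\.sht$/i ) ? 1 : 0;
}

# The joined pattern (hypothetical helper name).
sub matches_joined {
    my ($file) = @_;
    return $file =~ /^\d{8}(?:(?:_\d+)*\.ilf|(?:\s\d+)*\.sht)$/i ? 1 : 0;
}

# Made-up sample names; none come from the original post.
my @names = (
    '20060120.ilf',
    '20060120_1_22.ilf',
    '20060120 1 22.sht',
    '20060120_1.sht',      # wrong separator for .sht
    '20060120 1.ilf',      # wrong separator for .ilf
    '20060120.ilf.bak',    # extra suffix, rejected by the trailing $
);

for my $file (@names) {
    printf "%-22s separate=%d joined=%d\n",
        "'$file'", matches_separate($file), matches_joined($file);
}
```

Both columns should agree on every line, which is the property that matters here.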

Re: File name regex
created: 2006-01-20 01:59:45

This should do for you:

if ($file =~ /^\d{8}((_\d+)*\.ilf|(\s\d+)*\.sht)/i) { ... }

You don't need parens around \d.

Update: What ikegami said about captures vs. grouping. I assumed you were only using the parens for grouping.

Re: File name regex
created: 2006-01-20 02:02:12

Collect the similar terms (^\d{8}, \d+\.), and use alternations for the dissimilar ones (_, \s, sht, ilf):

$file =~ m/ ^ \d{8}
            (
              (?: _ | \s )
              (\d)+
            )*
            \.
            (?: ilf | sht )
          /xi ;
Re^2: File name regex
created: 2006-01-20 02:05:19

That would work if he can live with the false positives, i.e., your pattern will also match files named 12345678_1.sht and 12345678 1.ilf
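The false positives are easy to reproduce. The sketch below compares the loose pattern with a stricter one that ties each separator to its extension; the variable names are made up, and the strict pattern is the non-capturing form of the alternation posted elsewhere in this thread.

```perl
use strict;
use warnings;

# Loose pattern: either separator may be followed by either extension.
my $loose = qr/ ^ \d{8}
                (
                  (?: _ | \s )
                  (\d)+
                )*
                \.
                (?: ilf | sht )
              /xi;

# Stricter pattern: "_" only goes with .ilf, whitespace only with .sht.
my $strict = qr/^\d{8}(?:(?:_\d+)*\.ilf|(?:\s\d+)*\.sht)/i;

for my $file ('12345678_1.sht', '12345678 1.ilf', '12345678_1.ilf') {
    printf "%-18s loose=%d strict=%d\n", "'$file'",
        ($file =~ $loose  ? 1 : 0),
        ($file =~ $strict ? 1 : 0);
}
```

The first two names match only the loose pattern; the third matches both.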

Re^2: File name regex
created: 2006-01-20 09:23:38

Erm, and now that you've deleted the previous contents we'll never know what was wrong. It's better form to add an update (possibly bracketing the incorrect parts with <strike> tags) stating that something's wrong. Now the existing reply to your node makes no sense because it has no context.

Re: File name regex
created: 2006-01-20 02:53:34

I believe this will do what you ask, where if the option '_nnn' is present then the extension should be '.ilf', or if it is ' nnn' then the extension should be '.sht'; whilst capturing the entire optional part to $1 and the last digit of that optional part to $2:

    m[ ^ \d{8} ( (?(?=.* \. ilf) _ | \s ) (\d)+ )* \. (?:ilf|sht) ]x

Whether you would call that a 'shortcut' is debatable.
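For anyone who wants to see the conditional pattern in action, here is a small sketch that runs it over a few made-up names and prints $1 and $2. The helper name and sample names are hypothetical. Note that the pattern as posted has no /i flag, so it is case sensitive.

```perl
use strict;
use warnings;

# Returns a list of ($1, $2) on a match, or an empty list otherwise.
# check() is a hypothetical helper name used only in this sketch.
sub check {
    my ($file) = @_;
    if ( $file =~ m[ ^ \d{8} ( (?(?=.* \. ilf) _ | \s ) (\d)+ )* \. (?:ilf|sht) ]x ) {
        return ( $1, $2 );
    }
    return;
}

for my $file ('12345678_1_23.ilf', '12345678 1.sht',
              '12345678_1.sht',    '12345678 1.ilf') {
    my @caps = check($file);
    if (@caps) {
        printf "%-20s match, \$1=%s \$2=%s\n", "'$file'",
            map { defined $_ ? "'$_'" : 'undef' } @caps;
    }
    else {
        printf "%-20s no match\n", "'$file'";
    }
}
```

The first two names match, with $1 holding the last repetition of the optional part and $2 its last digit; the last two pair the wrong separator with the extension and are rejected.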


Re: File name regex
created: 2006-01-20 03:56:26

(Warning: more Regexp::Assemble pimping ahead).

Assuming you do not really care about those captures, if I "unroll" the patterns, it looks like you're interested in matching against the following patterns:

^\d{8}\.ilf
^\d{8}_\d+\.ilf
^\d{8}\.sht
^\d{8}\s\d+\.sht

Using the assemble script from the above module, or something like

my $re = Regexp::Assemble->new(flags=>'i')->add(@list)

it produces the following pattern:

^\d{8}(?:\.(?:ilf|sht)|\s\d+\.sht|_\d+\.ilf)

... which looks roughly like what the other people came up with manually. If you do need the captures, the pattern becomes

^\d{8}(?:(\s(\d+))\.sht|(_(\d+))\.ilf|\.(?:ilf|sht))

in which case you can perform the match and get the captures with something like

my @result = grep {defined} ($file =~ /$re/);
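As a self-contained sketch of that last step, the pattern with captures can be written out by hand instead of being generated. The variable names and sample file names are invented here, and grep is what drops the undefined slots left by the alternatives that did not match.

```perl
use strict;
use warnings;

# The pattern with captures, typed in literally for this sketch.
my $re = qr/^\d{8}(?:(\s(\d+))\.sht|(_(\d+))\.ilf|\.(?:ilf|sht))/i;

# Made-up sample names.
my @with_suffix = grep { defined } ('20060120_42.ilf' =~ /$re/);
my @no_suffix   = grep { defined } ('20060120.sht'    =~ /$re/);
my @no_match    = grep { defined } ('20060120.txt'    =~ /$re/);

print "with suffix: @with_suffix\n";
print "no suffix:   ", scalar(@no_suffix), " captures\n";
print "no match:    ", scalar(@no_match),  " captures\n";
```

One caveat with this approach: a name with no optional part matches but leaves every capture undefined, so an empty @result does not by itself tell a match from a failure. Test the match separately if that distinction matters.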


perlmonks.org content © perlmonks.org and BrowserUk, dev2dev, duff, Fletch, grinder, ikegami, parv
