Glob filespec
bangers
created: 2006-05-03 06:39:16
I appreciate this is a little off topic, but a colleague and I have googled for most of this morning with no joy. This is my last port of call, so any help would be greatly appreciated.

I have a process running on Debian which uses glob to scan for incoming files of a certain type. Currently we use a file spec something like *_{process,read}_*. This works well on files with names like:
abc_read_today.dat

The trouble is that, for operational reasons, the format is going to change so that a file could now be called:
abc_leave_today+abc_read_today.dat

We only want to process a file if it has 'read' or 'process' before the first '+'. Does anyone have any ideas on this?

We have looked into doing a glob on the file spec, then splitting the file name on the '+' and doing a Perl regex match on the first part. This is the plan of last resort, as I am not 100% happy that we can reliably convert the file specs to Perl regexes.

As I said, sorry that this isn’t strictly Perl, but it is in relation to a Perl process.
Re: Glob filespec
created: 2006-05-03 07:16:54

glob plainly emulates shell globbing, and does not work with regexen. You can either use File::Find (or its relatives File::Find::Rule and File::Finder) even if you do not need to recurse, or just opendir, readdir and grep on filenames yourself.
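
A minimal sketch of the opendir/readdir/grep route, assuming the rule from the question (a "_read_" or "_process_" token must appear before the first '+'). The sub name wanted_files is hypothetical, and the regex is hard-coded here rather than derived from an existing file spec:

```perl
use strict;
use warnings;

# Hypothetical helper: list the entries of $dir whose name has
# "_process_" or "_read_" somewhere before the first '+'.
sub wanted_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    # ^[^+]* cannot cross a '+', so the token must sit in the part
    # of the name that precedes the first '+'.
    my @names = sort grep { /^[^+]*_(?:process|read)_/ } readdir $dh;
    closedir $dh;
    return @names;
}
```

Called as wanted_files('/some/incoming/dir'), this returns bare file names (no directory prefix), so the caller has to prepend the directory before opening them.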

Or else, now that I think of it, shouldn't *_{process,read}[+_]* work? Well, not exactly, because it would still give false positives whenever the match comes after the first "+". Maybe it's enough for you, anyway...
Re^2: Glob filespec
created: 2006-05-03 12:19:33
Thanks for your suggestions. Unfortunately File::Find etc won’t work as we don’t want to change several 1,000 file specs (sorry if some of the restrictions seem arbitrary, but there are good reasons for them).

In the end we decided to use the file spec to pull back a superset of what we wanted. We then converted any '*' into '(.*?)' and did a regex match. If $1 contains a '+' then we exclude the file, e.g.
my $spec = ‘*_{process,read}_*’;

my $reg = $spec;
$reg =~ s/\*/(.*?)/g;

my @use;
for my $file ( glob $spec )  {
    $file =~ m/$reg/;
    push @use, $file unless $1 =~ /\+/;
}
Note: that’s a simplification of our real code, which works; I haven’t tested or run the code above. It’s just for illustration here.
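
One way the '*' to '(.*?)' idea could be made concrete is sketched below. It assumes the specs only use '*' and single-level brace groups such as {process,read}; '?' and '[...]' are not handled. Note that a brace group is not alternation inside a Perl regex, so the sketch rewrites it as (?:process|read). The sub names spec_to_regex and wanted are hypothetical, not the code actually in use:

```perl
use strict;
use warnings;

# Hypothetical helper: translate a simple file spec into an anchored regex.
sub spec_to_regex {
    my ($spec) = @_;
    my $re = '';
    # Tokens: '*', a '{...}' group, a run of literal text, or a stray char.
    for my $tok ( $spec =~ /(\*|\{[^{}]*\}|[^*{]+|.)/gs ) {
        if ( $tok eq '*' ) {
            $re .= '(.*?)';    # each '*' becomes a non-greedy capture
        }
        elsif ( my ($alts) = $tok =~ /\A\{(.*)\}\z/s ) {
            # {process,read} becomes (?:process|read)
            $re .= '(?:' . join( '|', map { quotemeta($_) } split /,/, $alts, -1 ) . ')';
        }
        else {
            $re .= quotemeta $tok;    # everything else matches literally
        }
    }
    return qr/\A$re\z/s;
}

# Keep a name only if it matches and the text captured by the
# first '*' contains no '+'.
sub wanted {
    my ( $name, $regex ) = @_;
    return 0 unless $name =~ $regex;
    return ( defined $1 && $1 =~ /\+/ ) ? 0 : 1;
}
```

With my $re = spec_to_regex('*_{process,read}_*'), the filter over the glob results would be grep { wanted( $_, $re ) } glob $spec. This only checks the first '*', which is enough for specs shaped like the one above.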

I suppose in the end it was a PERL question after all.
Re^3: Glob filespec
created: 2006-05-04 06:18:42
Thanks for your suggestions. Unfortunately File::Find etc won’t work as we don’t want to change several 1,000 file specs (sorry if some of the restrictions seem arbitrary, but there are good reasons for them).

To be fair, I don't understand your concerns, since I don't have the slightest idea what you mean by "to change several 1,000 file specs". I suspect that you, in turn, misunderstood the suggestion about File::Find.

my $spec = ‘*_{process,read}_*’;

Please use real single quotes: what are you using as an editor?!?

my @use;
for my $file ( glob $spec )  {
    $file =~ m/$reg/;
    push @use, $file unless $1 =~ /\+/;
}

This won't work, since {process,read} does not do what you seem to think it does in a regex. You probably want

my @use = grep /^[^+]*_(?:process|read)_/, glob $spec;

But then you should be aware that you're duplicating your efforts, performing two very similar pattern matches one after the other. Although I'm a big advocate of glob, and I often see people do unnecessary opendirs and readdirs, in this case I feel like suggesting that you follow that path...

I suppose in the end it was a PERL question after all.

No, it was not a "PERL" question, since there's no such thing. Check

perldoc -q 'difference between "perl" and "Perl"'

and while you're there, see also node 510594.

perlmonks.org content © perlmonks.org and bangers, blazar
