sub lexer {
my($parser)=shift;
my $s = $parser->YYData->{INPUT}; # reference to the string to lex
$$s =~ m/\G\s+/gc; # skip any spaces
return ('INT', $1) if $$s =~ m/\G(\d+)/gc;
return ('ID', $1) if $$s =~ m/\G([A-Z]\w*)/gc;
... # and it goes on for many tentative matches
}
I know that I always match on $$s, so why should I restate it at each match?
I _had_ to remove these useless $$s!
It took me a long time to realize that I could do it with a typeglob trick:
*_ = $parser->YYData->{INPUT}; # reference to the string to lex
Now $_ is an alias to the string to lex,
so I can match on it and don't need the =~ operator anymore.
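For anyone who wants to try the trick in isolation, here is a minimal sketch. It assumes the lexer receives a plain scalar reference instead of a parser object (the `$input_ref` argument is a stand-in for `$parser->YYData->{INPUT}`), and it only knows the two token types shown above. Note that it overwrites the global $_ for the caller as well.

```perl
use strict;
use warnings;

# Hypothetical simplified lexer: takes a scalar ref rather than a parser object.
sub lexer {
    my ($input_ref) = @_;
    *_ = $input_ref;                      # $_ now aliases the string to lex
    m/\G\s+/gc;                           # skip any spaces
    return ('INT', $1) if m/\G(\d+)/gc;
    return ('ID',  $1) if m/\G([A-Z]\w*)/gc;
    return;                               # no token matched
}

my $text   = "42 Foo";
my @first  = lexer(\$text);               # match position is kept on $text itself
my @second = lexer(\$text);
print "@first | @second\n";
```

Because pos() belongs to the aliased string, the second call picks up where the first one stopped.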
--
stefp
Clever. I usually put a local in there though, just to avoid trouble.
sub lexer { (*_) = @_; print $1 if m/\G(A)/gc || m/\G(B)/gc ; }
my $a = "AB"; lexer \$a; lexer \$a;
This prints "A" then "B".
If I add a local *_ or a local $_ at the entry of the lexer routine,
it does not work anymore.
So much for a cool trick.
--
stefp
Not that I like to generalize things, because I usually end up un-generalizing them a few months later (stupid shifting requirements!), but to me it seems easier than playing with symbol table manipulations just to save a few keystrokes... am I missing something?
I'm thinking of something roughly along these lines... completely untested and possibly wrong code is below. ;-)
# make a table of regular expression patterns
my %table = ( qr/(\d+)/ => 'INT',
qr/([A-Z]\w*)/ => 'ID',
# .... more tokens here
);
my ($parser) = shift;
my $s = $parser->YYData->{INPUT};
my @matches; # any matches found by our re go in here
foreach my $re ( keys %table ) {
# for each regexp, check to see if it matches, and
# put all the captured values in @matches if it does
@matches = ( $$s =~ m/\G$re/gc );
# return the appropriate token, and captures...
return( $table{$re}, @matches) if (@matches);
} # end search for a token match
# token not found... put error handling here ...
--
Ytrew
--
stefp
for( $$s ) {
....
}
- [tye]
--
stefp
with($$s) {
...
}
But in the meantime, I've trained myself to actually read/see
for(SCALAR) { ... }
as
with(SCALAR) { ... }
Chalk it up as another Perl idiom.
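A minimal sketch of the for-as-with idiom applied to the lexer from the top of the thread. The `$input_ref` argument is an assumption standing in for `$parser->YYData->{INPUT}`, and only the two token types from the original post are handled.

```perl
use strict;
use warnings;

# Hypothetical simplified lexer: takes a scalar ref rather than a parser object.
sub lexer {
    my ($input_ref) = @_;
    for ($$input_ref) {                   # $_ aliases the input inside the block
        m/\G\s+/gc;                       # skip any spaces
        return ('INT', $1) if m/\G(\d+)/gc;
        return ('ID',  $1) if m/\G([A-Z]\w*)/gc;
    }
    return;                               # no token matched
}

my $text = "Foo 42";
while (my ($type, $value) = lexer(\$text)) {
    print "$type: $value\n";
}
```

Since foreach aliases rather than copies, the match position stays on the caller's string between calls, and $_ is restored automatically when the block exits.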
Perl 6 also has syntactic relief for the m/\G.../gc monstrosity. That turns into m:p/.../, where the :p tells it to start matching at the current position. (But generally you don't even need that since subrules in a grammar always anchor to the current position anyway.)
Much like in English, you can use Perl's for() for iterating over a list, iterating via initialization + check + step, or associating a single topic with a block of syntax. So I, without apology, use for() for topicalizing. For you, I won't stop doing this. (: Excuse me for not demonstrating the use of English "for" analogous to init + check + step.
- tye
You can also use a single regexp with all alternatives and \G and the g flag but without the c flag. Then you can decide which alternative matched by checking the definedness of $1 and other match variables.
I sometimes use that idiom instead of many regexps with a gc flag. A nice example is the glob_to_re function in cgrep (snapshot) (which is btw an improved version of my Egrep clone with function name display). A simpler example is in Re: Logic trouble parsing a formatted text file into hashes of hashes (of hashes, etc.).
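A small sketch of the single-regexp idiom, reusing the INT and ID tokens from the top of the thread. The token labels and the sample input are made up for illustration; this is not the code from cgrep.

```perl
use strict;
use warnings;

my $text = "42 Foo 7";
my @tokens;

# One pattern, one capture group per alternative; \G keeps matches contiguous.
while ($text =~ m/\G\s*(?:(\d+)|([A-Z]\w*))/g) {
    if    (defined $1) { push @tokens, "INT:$1" }
    elsif (defined $2) { push @tokens, "ID:$2" }
}

print "@tokens\n";
```

Only the capture group belonging to the alternative that matched is defined, so the defined checks tell the token types apart. The loop simply stops at the first position where no alternative matches.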
Why not just store $$s in a local copy of $_?
# Either
local $_ = $$s;
# Or
s//$$s/; # tricky.. ;-)
Actually, in this case, I'd be tempted to alter your approach altogether and use a regex table.
sub lexer {
my ($parser) = shift;
my $s = $parser->YYData->{INPUT};
# I don't get your line: 'm/\G\s+/gc; skip any spaces'
my %dispatch = (
INT => qr/\G(\d+)/,
ID => qr/\G([A-Z]\w*)/,
#.. and so on ..
);
while (my ($key, $regex) = each %dispatch) {
# qr// does not accept the g and c flags, so apply them at match time
return ($key, $1) if $$s =~ m/$regex/gc;
}
}
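One thing to keep in mind with a hash-based table is that Perl does not guarantee the order in which hash entries are visited, which matters as soon as two token patterns can match at the same position. A possible variant, sketched here with an array of pairs so the rules are tried in the order written; the `@rules` name and the scalar-ref argument are assumptions for illustration.

```perl
use strict;
use warnings;

# Ordered rule table: earlier rules win when several could match.
my @rules = (
    [ INT => qr/(\d+)/      ],
    [ ID  => qr/([A-Z]\w*)/ ],
);

# Hypothetical simplified lexer: takes a scalar ref rather than a parser object.
sub lexer {
    my ($input_ref) = @_;
    $$input_ref =~ m/\G\s+/gc;            # skip any spaces
    for my $rule (@rules) {
        my ($type, $re) = @$rule;
        return ($type, $1) if $$input_ref =~ m/\G$re/gc;
    }
    return;                               # no token matched
}

my $text = "12 Ab";
while (my ($type, $value) = lexer(\$text)) {
    print "$type: $value\n";
}
```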
Your second solution is no solution:
$_ = '!'; $s = \'No'; s//$$s/; print;
You'd need to empty out $_ first, so the local $_ is the way.
Absolutely. That's the tricky part. ;-) It will work when $_ is undefined, but not otherwise. Of course, you could always change it to s/.*/$$s/, but still not advisable. More of an obfu trick...
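A caveat worth checking when using a local copy inside a lexer: the copy gets its own match position, so \G starts from the beginning on every call. The sketch below contrasts a copy with an alias; both helper names are made up for illustration.

```perl
use strict;
use warnings;

sub next_char_copy {
    my ($input_ref) = @_;
    local $_ = $$input_ref;               # copy: pos() starts fresh on every call
    return m/\G([A-Z])/gc ? $1 : undef;
}

sub next_char_alias {
    my ($input_ref) = @_;
    for ($$input_ref) {                   # alias: pos() lives on the caller's string
        return m/\G([A-Z])/gc ? $1 : undef;
    }
}

my $text_copy  = "AB";
my @copied     = (next_char_copy(\$text_copy), next_char_copy(\$text_copy));
my $text_alias = "AB";
my @aliased    = (next_char_alias(\$text_alias), next_char_alias(\$text_alias));

print "copy: @copied, alias: @aliased\n";
```

The copying version returns the first character both times, while the aliasing version advances through the string.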
--
stefp
perlmonks.org content © perlmonks.org and ambrus, Anonymous Monk, bart, chromatic, Corion, radiantmatrix, stefp, TimToady, tye