[id://357449] kicked off a quest to see if I could apply the stuff I learned in [http://www.perl.com/pub/a/2002/05/07/optree.html|Hacking The Op Tree For Fun And...]. The following code is the result of a two weeks of hacking at perl to automatically either report ( the default - a safe choice ) or fix your code that doesn't check the result of system calls. The following snippet will automatically fix your entire program so that there are no unchecked io, file, or directory calls. It even gets into print() and printf() which are both normally impossible to override.
I am posting this code because it may be a while before I release another, better version. Along the way I found that it would be easy to abstract the `compiled_program() =~ s///`-like portions of the module off and so I expect to create a Devel::Macro later so I can rewrite this as a proper perl-macro using the impending macro system. In the interim I'm thinking I may just end up really busy with other, more important stuff and instead of just letting this sit around, collecting dust I thought I'd share it and let you all at least try it out.
All the user-configurable code is handled via the import() call so where the documentation isn't clear read that bit of source. See also the %CHECK_DICT for a categorized list of opcodes that can be matched. By default the module looks for everything appropriate from io, sockets, file, directory, eval, and miscellaneous categories.
I had to submit a patch to B::Utils and to B::Generate to get this to work. I have included both patches as responses to this node (mostly so the code I actually want to post isn't cluttered up with these related patches).
use Devel::UncheckedOps
( fix => 1,
functions => [ map @$_,
@Devel::UncheckedOps::CHECK_DICT{ qw( io file directory ) } ] );
print "Hello world!\n";
# Becomes
# print "Hello world!\n" or die $!;
package Devel::UncheckedOps;
use strict;
use warnings FATAL => 'all';
use B qw( OPf_WANT_VOID class );
use B::Utils qw( walkallops_filtered opgrep );
use vars qw( $REPORT_CALLBACK %CHECK_DICT @CHECK_OPS @TERMINAL_OPS $VERSION
$OPCODE_NAME $FIX_OPCODES $O_PM @QUEUED_FIXES_TO_APPLY $DEBUG );
# %ALLOPS %OPMAP
use Carp qw( carp );
$VERSION = '0.01';
# This is a largish list of stuff I think can be validated by this module.
# The default list of opcodes that will be checked is defined in @CHECK_OPS
# immediately following and normal users specify the list of ops to validate
# by passing in a reference to an array to the 'check' parameter of the use()
# call.
%CHECK_DICT = (
# 'write' would be a nice op to add but I do not yet know how
# it works in code.
io =>
[ qw[open close binmode dbmclose dbmopen fcntl flock getc ioctl
pipe_op tie read print prtf seek send sysopen sysread
sysseek syswrite recv tell truncate ] ],
sockets =>
[ qw[accept bind connect listen shutdown sockpair ] ],
file =>
[ qw[ chdir chmod chown chroot link mkdir readlink rename rmdir
symlink unlink utime rmdir ] ],
directory =>
[ qw[ closedir open_dir readdir rewinddir seekdir telldir ] ],
# process =>
# TODO:
# Check backtick
# die $?, not $!
# [ qw[ exec fork kill system ] ],
# I do not know how to validate the semaphore, shared memory,
# or message passing code. Thi
shared_memory =>
[ qw[ shmctl shmget shmread shmwrite ] ],
message_passing =>
[ qw[ msgctl msgget msgrcv msgsnd ] ],
semaphores =>
[ qw[ semctl semget semop ] ],
eval => [ qw[ dofile ] ],
miscellaneous =>
[ qw[ syscall ] ]
);
@CHECK_OPS = ( map @$_,
@CHECK_DICT{ qw( io
sockets
file
directory
eval
miscellaneous ) } );
# I started with just nextstate and leavesub but while reading opcode.pl
# went "eh, what the heck. Why not?" and just included the raft of related
# opcodes.
@TERMINAL_OPS = ( qw[ method
entersub
leavesub
leavesublv
caller
reset
lineseq
nextstate
dbstate
unstack
enter
leave
scope
enteriter
iter
enterloop
leaveloop
return
last
next
redo
dump
goto
exit ] );
$REPORT_CALLBACK = \ &default_report;
CHECK
{
check();
}
# Create an alias so that when fixing, a person can say fix() instead. This
# might only be interesting when the normal CHECK call wasn't called.
*fix = \✓
sub check
{
if ( $FIX_OPCODES )
{
walkallops_filtered( \ &find_unchecked_system_call,
\ &queue_fix_opcode );
fix_opcode( $_ ) for @QUEUED_FIXES_TO_APPLY;
}
else
{
walkallops_filtered( \ &find_unchecked_system_call,
$REPORT_CALLBACK );
}
if ( $O_PM )
{
eval "use O '$O_PM'";
}
return 1;
}
sub import
{
my $class = shift;
my %p = @_;
# Ethier take a callback from the user via
# use Devel::UncheckedOps ( callback => sub { ... } );
# or supply a default.
$REPORT_CALLBACK = $p{'report_callback'}
if $p{'report'};
# Allow both `use Devel::UncheckedOps( check => 'print' )` or
# `use Devel::UncheckedOps( check => [ 'print' ] )`. This is the
# parameter I most expect people to specify.
if ( $p{'function'} )
{
@CHECK_OPS = $p{'function'};
}
elsif ( $p{'functions'} )
{
@CHECK_OPS = @{$p{'functions'}};
}
# This is a boolean value.
$FIX_OPCODES = !! $p{'fix'};
if ( $FIX_OPCODES )
{
eval q[
use B::Generate ();
use Internals ();
1;
]
or carp( $@ );
}
# This is a boolean value. Various guts will be displayed if you pass
# in a true value. The guts that are displayed are entirely up to my
# most recent needs.
$DEBUG = !! $p{'debug'};
# This is passed to `use O '$O_PM'` so the user of this module can say
# `use Devel::UncheckedOps( O => 'Deparse' )` to see what the
# code looks like after deparsing. The parameter is any module in the B::
# namespace that has already been designed to be called by O.pm in this
# way. This includes Bblock, Bytecode, C, CC, Concise, Debug, Deparse,
# Showlex, Stackobj, Stash, Terse, Xref or any other module you might
# get from CPAN like B::Deobfuscate.
$O_PM = $p{'O'};
# I seriously doubt that anyone is going to need to specify these. I
# include this solely for debugging purposes and perhaps the eventual
# need for it.
@TERMINAL_OPS = @{$p{'terminal_ops'}}
if $p{'terminals'};
return 1;
}
sub find_unchecked_system_call
{
# This is used by B::Utils::*_filtered to grep for opcodes that need to
# be reported or fixed.
my $op = shift;
# I am going to fix/report this in another function immediately following.
$OPCODE_NAME = $op->oldname;
# if ( $FIX_OPCODES )
# {
# my $addr = $$op;
# for my $m ( qw( sibling
# first
# last ) )
# {
# my $to = eval { ${ $op->$m } };
# next unless $to;
# push @{$OPMAP{ $to }}, [ $op, $m ];
# }
# }
# B::Utils::opgrep test to decide if this opcode is one that is desirable.
return ( opgrep( { name => \ @CHECK_OPS,
flags => OPf_WANT_VOID },
$op )
or
opgrep( { name => \ @CHECK_OPS,
next => { name => \ @TERMINAL_OPS } },
$op ) );
}
sub default_report
{
# This is the default callback for reporting that something has gone awry.
# It may be overriden by saying
# use Devel::UncheckedOps( report => \ &other_sub );
carp( "Unchecked $OPCODE_NAME"
. " call at $B::Utils::file line $B::Utils::line" );
}
sub queue_fix_opcode
{
# This function puts fixes into a to-do list so that they are only
# altered when the tree is not being currently walked.
my $op = shift;
push @QUEUED_FIXES_TO_APPLY, { op => $op,
file => $B::Utils::file,
line => $B::Utils::line };
return 1;
}
sub fix_opcode
{
# This function accepts a 'fix' as previously queued by queue_fix_opcode().
my $fix = shift;
my $op = $fix->{'op'};
my $file = $fix->{'file'};
my $line = $fix->{'line'};
printf( __PACKAGE__
. " FIXING %s at %s line %s\n",
op_to_text( $op ),
$file,
$line )
if $DEBUG;
# This is the in-memory address of the opcode. It is used
my $orig_next = $op->next;
my $orig_sibling = $op->sibling;
printf( __PACKAGE__
. " SIBLING %s\n",
op_to_text( $orig_sibling ) )
if $DEBUG;
my $orig_parent = $op->parent;
printf( __PACKAGE__
. " PARENT %s\n",
op_to_text( $orig_parent ) )
if $DEBUG;
my $orig_reverse_first;
$orig_reverse_first = $orig_parent
if ${$orig_parent->first} == $$op;
printf( __PACKAGE__
. " PARENT->FIRST %s\n",
op_to_text( $orig_parent->first ) )
if $DEBUG;
my $orig_reverse_last;
$orig_reverse_last = $orig_parent
if ${$orig_parent->last} == $$op;
printf( __PACKAGE__
. " PARENT->LAST %s\n",
op_to_text( $orig_parent->last ) )
if $DEBUG;
# Maybe find the opcode that thinks this opcode is its sibling by going
# to this opcode's parent and walking over the list of siblings until this
# one is reached. The previously visited opcode is the one we're after.
my @siblings = $orig_parent->kids;
my $orig_reverse_sibling = ( grep ${$siblings[$_]->sibling} == $$op,
0 .. $#siblings - 1 )[0];
$orig_reverse_sibling = $siblings[ $orig_reverse_sibling ]
if defined $orig_reverse_sibling;
printf( __PACKAGE__
. " REVERSE SIBLING %s\n",
op_to_text( $orig_reverse_sibling ) )
if ( $DEBUG
and $orig_reverse_sibling
and ${$orig_reverse_sibling->sibling} == $$op );
# Construct the new program fragment in reverse order so parent nodes
# can point to child nodes. This alters the original node so it is
# now inside the new fragment.
# or
# ORIGINAL
# die
# pushmark
# gvsv
use Devel::Peek;
my $gvsv = B::SVOP->new( 'gvsv' => 2, '$!' );
# Now inflate the reference count for *! because this is a sneaky way
# to take a reference that doesn't inform the variable's refcnt.
Internals::SetRefCount( \*!, 1 + Internals::GetRefCount( \*! ) );
my $pushmark = B::OP->new( 'pushmark' => 2 );
my $die = B::LISTOP->new( 'die' => 5, $pushmark, $gvsv );
$die->targ( 1 );
$die->private( 1 );
my $or_root = B::LOGOP->new( 'or' => 2, $op, $die );
my $or_op = $or_root->first;
$or_op->private( 1 );
# Insert this fragment into the appropriate place in the tree. Every place
# that the ORIGINAL node was, this new node has to replace it.
# PARENT
# ->first( ORIGINAL )
# ->last( ORIGINAL )
$orig_reverse_first->first( $or_root ) if $orig_reverse_first;
$orig_reverse_last->last( $or_root ) if $orig_reverse_last;
# PARENT
# KID
# ->sibling( ORIGINAL )
$orig_reverse_sibling->sibling( $or_root ) if $orig_reverse_sibling;
# PARENT
# ORIGINAL
# ->sibling( KID )
$or_root->sibling( $orig_sibling ) if $orig_sibling;
# Now thread the execution order.
# EXT -> $op
# -> OR
# -> $orig_next
# ...
# -> $pushmark
# -> gvsv
# -> die
# Insert the OR into the execution
$op->next( $or_op );
# Continue as normal if $or succeeds
$or_op->next( $orig_next );
# Otherwise detour and then reroute back to the normal place
$or_op->other( $pushmark );
$pushmark->next( $gvsv );
$gvsv->next( $die );
$die->next( $orig_next );
1;
}
sub op_to_text
{
my $op = shift;
return 'undef' if not defined $op;
my $class = class $op;
my $name;
eval {
$name = $op->oldname;
1;
} or do {
$name = '';
};
my $addr = sprintf '(0x%07x)', $$op ;
join( '=',
grep length(),
$class, $name, $addr );
}
1;
__END__
=head1 NAME
Devel::UncheckedOps - Perl extension to warp your mind
=head1 SYNOPSIS
use Devel::UncheckedOps ( functions => [ 'print', 'prtf' ],
fix => 1 );
=head1 DESCRIPTION
This module examines the compiled perl program and either reports or fixes
unchecked system calls.
=head1 USE PARAMETERS
=over4
=item function => NAME
This parameter specifies a single function name to search for. Do remember
to document %CHECK_DICT which has a big, categorized list of op codes.
use Devel::UncheckedOps ( function => 'print' );
=item functions => \ @NAMES
This parameter specifies a list of function names to search for. Do remember
to document %CHECK_DICT which has a big, categorized list of op codes.
=item report => \ &CALLBACK
If the program is not fixing then it is reporting. This allows the user to
specify and alternate reporting function. It is passed the opcode that is
in error.
=item fix => BOOLEAN
A boolean value that triggers all the really cool guts so even non-overridable
stuff like print and printf are fixed up.
=item debug => BOOLEAN
A boolean to get some additional information. Generally this is useful when
debugging the operation of the fixing code.
=item O => B:: backend name
Put stuff like 'Deparse', 'Concise', 'Terse', 'Debug', etc. here. This just
arranges to have the program passed to the appropriate B:: backend after it
has been altered. It is like saying -MO=Deparse to a command-line script.
=item terminal_ops => \ @NAMES
Go read the source.
=back
=head1 MOD_PERL?
Mention that everything can be called directly from check() though everything
normally happens during the normal CHECK routine. This may not even be valid
to bring up.
=head1 SEE ALSO
See... what? That request that wished for this?
=head1 AUTHOR
Me.
=head1 COPYRIGHT AND LICENSE
Same as perl, etc,.
=cut
This is a patch to [cpan://B::Utils] 0.04 so that the ->parent call works correctly. [http://rt.cpan.org/NoAuth/Bug.html?id=6623]
--- .cpan/build/B-Utils-0.04/Utils.pm 2002-05-02 15:55:59.000000000 -0500
+++ perl5.8.3/lib/site_perl/5.8.3/B/Utils.pm 2004-06-14 10:40:46.730869456 -0500
@@ -3,6 +3,7 @@
use 5.006;
use strict;
use warnings;
+use vars '$DEBUG';
our @EXPORT_OK = qw(all_starts all_roots anon_subs
walkoptree_simple walkoptree_filtered
walkallops_simple walkallops_filtered
@@ -32,6 +33,11 @@
our @bad_stashes = qw(B Carp Exporter warnings Cwd Config CORE blib strict DynaLoader vars XSLoader AutoLoader base);
+sub null {
+ my $op = shift;
+ class( $op ) eq 'NULL';
+}
+
{ my $_subsdone=0;
sub _init { # To ensure runtimeness.
return if $_subsdone;
@@ -168,31 +174,112 @@
sub B::OP::parent {
my $target = shift;
+ printf( "parent %s %s=(0x%07x)\n",
+ B::class( $target),
+ $target->oldname,
+ $$target )
+ if $DEBUG;
+
die "I'm not sure how to do this yet. I'm sure there is a way. If you know, please email me."
if (!$target->seq);
- my (%deadend, $search);
- $search = sub {
- my $node = shift || return undef;
+ my (%deadend, $search_kids);
+ $search_kids = sub {
+ my $node = shift || return undef;
+
+ printf( "Searching from %s %s=(0x%07x)\n",
+ class($node)||'?',
+ $node->oldname,
+ $$node )
+ if $DEBUG;
+
# Go up a level if we've got stuck, and search (for the same
# $target) from a higher vantage point.
- return $search->($node->parent) if exists $deadend{$node};
-
- # Test the immediate children
- return $node if scalar grep {$_ == $target} $node->kids;
-
- # Recurse
- my $x;
- defined($x = $search->($_)) and return $x for $node->kids;
-
+ if ( exists $deadend{ $node } )
+ {
+ printf( " search parent %s %s=(0x%07x)\n",
+ B::class( $node ),
+ $node->oldname,
+ $$node )
+ if $DEBUG;
+ return $search_kids->( $node->parent );
+ }
+
+ # Test the immediate children, but only children we haven't visited
+ # already.
+ my @new_kids = ( grep !$deadend{ $_ },
+ $node->kids );
+ if ( scalar grep $$_ == $$target, @new_kids )
+ {
+ return $node;
+ }
+
+ # Recurse and examine each child, in turn.
+ print( " search kids\n"
+ . join( "",
+ map sprintf( " %s %s=(0x%07x)\n",
+ B::class( $_ ),
+ $_->oldname,
+ $$_ ),
+ @new_kids ) )
+ if $DEBUG and @new_kids;
+
+ for ( @new_kids )
+ {
+ my $x = $search_kids->( $_ );
+ return $x if $x;
+ }
+
# Not in this subtree.
$deadend{$node}++;
return undef;
- };
- my $result;
- my $start = $target;
- $result = $search->($start) and return $result while $start = $start->next;
- return $search->($start);
+ };
+ my $start = $target;
+
+ # Skip to the farthest sibling and make a list of each with the most
+ # recent at the beginning of the list.
+
+ # I am planning ahead for the day when it turns out that the parent
+ # cannot be found in the last sibling somewhere. Maybe it is just a
+ # null? I would like to be able to back track up the tree to find a
+ # ->next node that will bring us to northeast of (or even better,
+ # directly to) the parent.
+ my @siblings = $start;
+ while ( $start and
+ ${$start->sibling} )
+ {
+ $start = $start->sibling;
+ unshift @siblings, $start;
+ printf( "->sibling %s %s=(0x%07x)\n",
+ class($start)||'null',
+ $start->oldname,
+ $$start )
+ if $DEBUG;
+ }
+
+ # Now search each sibling as noted from above.
+ for $start ( @siblings )
+ {
+ my $next = $start;
+ while ( $$next )
+ {
+ printf( "->next %s %s=(0x%07x)\n",
+ B::class( $next ),
+ $next->oldname,
+ $$next )
+ if $DEBUG;
+
+ my $result = $search_kids->( $next );
+ return $result if $result;
+ }
+ continue
+ {
+ $next = $next->next;
+ }
+ }
+
+ # Having reached here... I give up?
+ undef;
}
=item C<< $op->previous >>
This is a patch to [cpan://B::Generate] 1.07 so that the instantiation of the various B::SPECIAL objects don't trigger segfaults ([cpan://B] got it right - B::Generate just forgot to include this). [http://rt.cpan.org/NoAuth/Bug.html?id=6595]
--- .cpan/build/B-Generate-1.06/lib/B/Generate.xs 2002-07-28 11:45:48.000000000 -0500
+++ .cpan/build/B-Generate-1.06-fix/lib/B/Generate.xs 2004-06-11 13:34:27.000000000 -0500
@@ -425,6 +425,18 @@
MODULE = B::Generate PACKAGE = B PREFIX = B_
+BOOT:
+{
+ specialsv_list[0] = Nullsv;
+ specialsv_list[1] = &PL_sv_undef;
+ specialsv_list[2] = &PL_sv_yes;
+ specialsv_list[3] = &PL_sv_no;
+ specialsv_list[4] = pWARN_ALL;
+ specialsv_list[5] = pWARN_NONE;
+ specialsv_list[6] = pWARN_STD;
+}
+
+
void
B_fudge()
CODE:
Just a quick question for you [diotalevi]. What is the reasoning behind altering every print or printf statement and not just those associated with a filehandle other than STDOUT? This question popped into my head when I saw:
print "Hello world!\n"; # Becomes # print "Hello world!\n" or die $!;
I have an idea as to why you did that. Rather than show ignorance and spout off some possibly totally stupid idea (I am still a novice perlguts/[cpan://B] spelunker after all), I think I'll just wait for a response. :)
I still wonder if that would error if the buffer weren't flushed though. It at least makes it more likely especially as the default also puts an `or die $!` on the close() op. There isn't anything to be done about implicit closes though.
An update. It has been suggested that I patch perl so that failures during implicit closes throw warnings, probably in the 'io' class.
I think you were thinking of throwing errors on STDOUT not being a valid handle. If someone had closed it before or something. This generalizes the error checking to any error including such favorites as ENOSPC. It'll still throw errors if the handle being written to isn't valid. Heck, this is how I did most of my testing - checking against a print to an invalid handle. It was too inconvenient to keep a full file system around just for testing.
print BAR "Hi!" or die $!; # Bad file descriptor at ...
perlmonks.org content © perlmonks.org and bobtfish, diotalevi, Mr. Muskrat
prlmnks.org © 2006 edmund von der burg (eccles & toad)
v 0.03