Initial Devel::UncheckedOps, a macro for perl
diotalevi
created: 2004-06-16 20:01:04

[id://357449] kicked off a quest to see if I could apply the stuff I learned in [http://www.perl.com/pub/a/2002/05/07/optree.html|Hacking The Op Tree For Fun And...]. The following code is the result of a two weeks of hacking at perl to automatically either report ( the default - a safe choice ) or fix your code that doesn't check the result of system calls. The following snippet will automatically fix your entire program so that there are no unchecked io, file, or directory calls. It even gets into print() and printf() which are both normally impossible to override.

I am posting this code because it may be a while before I release another, better version. Along the way I found that it would be easy to abstract the `compiled_program() =~ s///`-like portions of the module off and so I expect to create a Devel::Macro later so I can rewrite this as a proper perl-macro using the impending macro system. In the interim I'm thinking I may just end up really busy with other, more important stuff and instead of just letting this sit around, collecting dust I thought I'd share it and let you all at least try it out.

All the user-configurable code is handled via the import() call so where the documentation isn't clear read that bit of source. See also the %CHECK_DICT for a categorized list of opcodes that can be matched. By default the module looks for everything appropriate from io, sockets, file, directory, eval, and miscellaneous categories.

I had to submit a patch to B::Utils and to B::Generate to get this to work. I have included both patches as responses to this node (mostly so the code I actually want to post isn't cluttered up with these related patches).

use Devel::UncheckedOps
    ( fix => 1,
      functions => [ map @$_,
                     @Devel::UncheckedOps::CHECK_DICT{ qw( io file directory ) } ] );

print "Hello world!\n";
# Becomes
# print "Hello world!\n" or die $!;
package Devel::UncheckedOps;
use strict;
use warnings FATAL => 'all';
use B qw( OPf_WANT_VOID class );
use B::Utils qw( walkallops_filtered opgrep );
use vars qw( $REPORT_CALLBACK %CHECK_DICT @CHECK_OPS @TERMINAL_OPS $VERSION
             $OPCODE_NAME $FIX_OPCODES $O_PM @QUEUED_FIXES_TO_APPLY $DEBUG );
# %ALLOPS %OPMAP
use Carp qw( carp );

$VERSION = '0.01';

# This is a largish list of stuff I think can be validated by this module.
# The default list of opcodes that will be checked is defined in @CHECK_OPS
# immediately following and normal users specify the list of ops to validate
# by passing in a reference to an array to the 'check' parameter of the use()
# call.
%CHECK_DICT = (
               # 'write' would be a nice op to add but I do not yet know how
               # it works in code.
               io =>
               [ qw[open close binmode dbmclose dbmopen fcntl flock getc ioctl
                    pipe_op tie read print prtf seek send sysopen sysread
                    sysseek syswrite recv tell truncate ] ],
               
               sockets =>
               [ qw[accept bind connect listen shutdown sockpair ] ],
               
               file =>
               [ qw[ chdir chmod chown chroot link mkdir readlink rename rmdir
                     symlink unlink utime rmdir ] ],
               
               directory =>
               [ qw[ closedir open_dir readdir rewinddir seekdir telldir ] ],
               
               # process =>
               # TODO:
               #   Check backtick
               #   die $?, not $!
               # [ qw[ exec fork kill system ] ],
               
               # I do not know how to validate the semaphore, shared memory,
               # or message passing code. Thi
               shared_memory =>
               [ qw[ shmctl shmget shmread shmwrite ] ],

               message_passing =>
               [ qw[ msgctl msgget msgrcv msgsnd ] ],

               semaphores =>
               [ qw[ semctl semget semop ] ],
               
               eval => [ qw[ dofile ] ],
               
               miscellaneous =>
               [ qw[ syscall ] ]
               
               );

@CHECK_OPS = ( map @$_,
               @CHECK_DICT{ qw( io
                                sockets
                                file
                                directory
                                eval
                                miscellaneous ) } );

# I started with just nextstate and leavesub but while reading opcode.pl
# went "eh, what the heck. Why not?" and just included the raft of related
# opcodes.
@TERMINAL_OPS = ( qw[ method
                      entersub
                      leavesub
                      leavesublv
                      caller
                      reset
                      
                      lineseq
                      nextstate
                      dbstate
                      unstack
                      enter
                      leave
                      scope
                      enteriter
                      iter
                      enterloop
                      leaveloop
                      return
                      last
                      next
                      redo
                      dump
                      goto
                      exit ] );

$REPORT_CALLBACK = \ &default_report;

CHECK
{
    check();
}

# Create an alias so that when fixing, a person can say fix() instead. This
# might only be interesting when the normal CHECK call wasn't called.
*fix = \✓
sub check
{
    if ( $FIX_OPCODES )
    {
        walkallops_filtered( \ &find_unchecked_system_call,
                             \ &queue_fix_opcode );
        fix_opcode( $_ ) for @QUEUED_FIXES_TO_APPLY;
    }
    else
    {
        walkallops_filtered( \ &find_unchecked_system_call,
                             $REPORT_CALLBACK );
    }
    
    if ( $O_PM )
    {
        eval "use O '$O_PM'";
    }

    return 1;
}

sub import
{
    my $class = shift;
    my %p = @_;
    
    # Ethier take a callback from the user via 
    #   use Devel::UncheckedOps ( callback => sub { ... } );
    # or supply a default.
    $REPORT_CALLBACK = $p{'report_callback'}
        if $p{'report'};

    # Allow both `use Devel::UncheckedOps( check => 'print' )` or
    # `use Devel::UncheckedOps( check => [ 'print' ] )`. This is the
    # parameter I most expect people to specify.
    if ( $p{'function'} )
    {
        @CHECK_OPS = $p{'function'};
    }
    elsif ( $p{'functions'} )
    {
        @CHECK_OPS = @{$p{'functions'}};
    }
    
    # This is a boolean value.
    $FIX_OPCODES = !! $p{'fix'};
    if ( $FIX_OPCODES )
    {
        eval q[
               use B::Generate ();
               use Internals ();
               1;
               ]
                   or carp( $@ );
    }
    
    # This is a boolean value. Various guts will be displayed if you pass
    # in a true value. The guts that are displayed are entirely up to my
    # most recent needs.
    $DEBUG = !! $p{'debug'};
    
    # This is passed to `use O '$O_PM'` so the user of this module can say
    # `use Devel::UncheckedOps( O => 'Deparse' )` to see what the
    # code looks like after deparsing. The parameter is any module in the B::
    # namespace that has already been designed to be called by O.pm in this
    # way. This includes Bblock, Bytecode, C, CC, Concise, Debug, Deparse,
    # Showlex, Stackobj, Stash, Terse, Xref or any other module you might
    # get from CPAN like B::Deobfuscate.
    $O_PM = $p{'O'};

    # I seriously doubt that anyone is going to need to specify these. I
    # include this solely for debugging purposes and perhaps the eventual
    # need for it.
    @TERMINAL_OPS = @{$p{'terminal_ops'}}
        if $p{'terminals'};

    return 1;
}

sub find_unchecked_system_call
{
    # This is used by B::Utils::*_filtered to grep for opcodes that need to
    # be reported or fixed.

    my $op = shift;

    # I am going to fix/report this in another function immediately following.
    $OPCODE_NAME = $op->oldname;
    
#     if ( $FIX_OPCODES )
#     {
#         my $addr = $$op;
#         for my $m ( qw( sibling
#                         first
#                         last ) )
#         {
#             my $to = eval { ${ $op->$m } };
#             next unless $to;
            
#             push @{$OPMAP{ $to }}, [ $op, $m ];
#         }
#     }
    
    # B::Utils::opgrep test to decide if this opcode is one that is desirable.
    return ( opgrep( { name => \ @CHECK_OPS,
                       flags => OPf_WANT_VOID },
                     $op )
             or
             opgrep( { name => \ @CHECK_OPS,
                       next => { name => \ @TERMINAL_OPS } },
                     $op ) );
}

sub default_report
{
    # This is the default callback for reporting that something has gone awry.
    # It may be overriden by saying 
    # use Devel::UncheckedOps( report => \ &other_sub );
    carp( "Unchecked $OPCODE_NAME"
          . " call at $B::Utils::file line $B::Utils::line" );
}

sub queue_fix_opcode
{
    # This function puts fixes into a to-do list so that they are only
    # altered when the tree is not being currently walked.
    my $op = shift;
    push @QUEUED_FIXES_TO_APPLY, { op => $op,
                                   file => $B::Utils::file,
                                   line => $B::Utils::line };
    
    return 1;
}

sub fix_opcode
{
    # This function accepts a 'fix' as previously queued by queue_fix_opcode().
    my $fix = shift;
    my $op   = $fix->{'op'};
    my $file = $fix->{'file'};
    my $line = $fix->{'line'};

    printf( __PACKAGE__
            . " FIXING %s at %s line %s\n",
            op_to_text( $op ),
            $file,
            $line )
        if $DEBUG;
    
    # This is the in-memory address of the opcode. It is used
    my $orig_next = $op->next;
    my $orig_sibling = $op->sibling;
    printf( __PACKAGE__
            . " SIBLING %s\n",
            op_to_text( $orig_sibling ) )
        if $DEBUG;


    my $orig_parent = $op->parent;
    printf( __PACKAGE__
            . " PARENT %s\n",
            op_to_text( $orig_parent ) )
        if $DEBUG;
    
    my $orig_reverse_first;
    $orig_reverse_first = $orig_parent
        if ${$orig_parent->first} == $$op;
    printf( __PACKAGE__
            . " PARENT->FIRST %s\n",
            op_to_text( $orig_parent->first ) )
        if $DEBUG;
    
    my $orig_reverse_last;
    $orig_reverse_last = $orig_parent
        if ${$orig_parent->last} == $$op;
    printf( __PACKAGE__
            . " PARENT->LAST %s\n",
            op_to_text( $orig_parent->last ) )
        if $DEBUG;
    
    # Maybe find the opcode that thinks this opcode is its sibling by going
    # to this opcode's parent and walking over the list of siblings until this
    # one is reached. The previously visited opcode is the one we're after.
    my @siblings = $orig_parent->kids;
    my $orig_reverse_sibling = ( grep ${$siblings[$_]->sibling} == $$op,
                                 0 .. $#siblings - 1 )[0];
    $orig_reverse_sibling = $siblings[ $orig_reverse_sibling ]
        if defined $orig_reverse_sibling;
    
    printf( __PACKAGE__
            . " REVERSE SIBLING %s\n",
            op_to_text( $orig_reverse_sibling ) )
        if ( $DEBUG
             and $orig_reverse_sibling
             and ${$orig_reverse_sibling->sibling} == $$op );
    
    # Construct the new program fragment in reverse order so parent nodes
    # can point to child nodes. This alters the original node so it is
    # now inside the new fragment.
    
    # or
    #    ORIGINAL
    #    die
    #       pushmark
    #       gvsv
    
    use Devel::Peek;
    my $gvsv = B::SVOP->new( 'gvsv' => 2, '$!' );
    
    # Now inflate the reference count for *! because this is a sneaky way
    # to take a reference that doesn't inform the variable's refcnt.
    Internals::SetRefCount( \*!, 1 + Internals::GetRefCount( \*! ) );

    my $pushmark = B::OP->new( 'pushmark' => 2 );
    my $die = B::LISTOP->new( 'die' => 5, $pushmark, $gvsv );
    $die->targ( 1 );
    $die->private( 1 );
    my $or_root = B::LOGOP->new( 'or' => 2, $op, $die );
    my $or_op = $or_root->first;
    $or_op->private( 1 );
    
    
    
    # Insert this fragment into the appropriate place in the tree. Every place
    # that the ORIGINAL node was, this new node has to replace it.
    
    # PARENT
    #   ->first( ORIGINAL )
    #   ->last(  ORIGINAL )
    $orig_reverse_first->first( $or_root ) if $orig_reverse_first;
    $orig_reverse_last->last( $or_root ) if $orig_reverse_last;

    # PARENT
    #    KID
    #       ->sibling( ORIGINAL )
    $orig_reverse_sibling->sibling( $or_root ) if $orig_reverse_sibling;
    
    # PARENT
    #    ORIGINAL
    #       ->sibling( KID )
    $or_root->sibling( $orig_sibling ) if $orig_sibling;


    # Now thread the execution order.
    # EXT -> $op
    #     -> OR
    #        -> $orig_next
    #           ...
    #        -> $pushmark
    #           -> gvsv
    #           -> die
    # Insert the OR into the execution
    $op->next( $or_op );
    
    # Continue as normal if $or succeeds
    $or_op->next( $orig_next );
    
    # Otherwise detour and then reroute back to the normal place
    $or_op->other( $pushmark );
    $pushmark->next( $gvsv );
    $gvsv->next( $die );
    $die->next( $orig_next );

    1;
}

sub op_to_text
{
    my $op = shift;
    
    return 'undef' if not defined $op;
    my $class = class $op;
    my $name;
    eval {
        $name = $op->oldname;
        1;
    } or do {
        $name = '';
    };
    
    my $addr = sprintf '(0x%07x)', $$op ;
    
    join( '=',
          grep length(),
          $class, $name, $addr );
}

1;

__END__

=head1 NAME

Devel::UncheckedOps - Perl extension to warp your mind

=head1 SYNOPSIS

  use Devel::UncheckedOps ( functions => [ 'print', 'prtf' ],
                                  fix => 1 );

=head1 DESCRIPTION

This module examines the compiled perl program and either reports or fixes
unchecked system calls. 

=head1 USE PARAMETERS

=over4

=item function => NAME

This parameter specifies a single function name to search for. Do remember
to document %CHECK_DICT which has a big, categorized list of op codes.

 use Devel::UncheckedOps ( function => 'print' );

=item functions => \ @NAMES

This parameter specifies a list of function names to search for. Do remember
to document %CHECK_DICT which has a big, categorized list of op codes.

=item report => \ &CALLBACK

If the program is not fixing then it is reporting. This allows the user to
specify and alternate reporting function. It is passed the opcode that is
in error.

=item fix => BOOLEAN

A boolean value that triggers all the really cool guts so even non-overridable
stuff like print and printf are fixed up.

=item debug => BOOLEAN

A boolean to get some additional information. Generally this is useful when
debugging the operation of the fixing code.

=item O => B:: backend name

Put stuff like 'Deparse', 'Concise', 'Terse', 'Debug', etc. here. This just
arranges to have the program passed to the appropriate B:: backend after it
has been altered. It is like saying -MO=Deparse to a command-line script.

=item terminal_ops => \ @NAMES

Go read the source.

=back

=head1 MOD_PERL?

Mention that everything can be called directly from check() though everything
normally happens during the normal CHECK routine. This may not even be valid
to bring up.

=head1 SEE ALSO

See... what? That request that wished for this?

=head1 AUTHOR

Me.

=head1 COPYRIGHT AND LICENSE

Same as perl, etc,. 

=cut
Re: Initial Devel::UncheckedOps, a macro for perl
created: 2004-06-16 20:02:23

This is a patch to [cpan://B::Utils] 0.04 so that the ->parent call works correctly. [http://rt.cpan.org/NoAuth/Bug.html?id=6623]

--- .cpan/build/B-Utils-0.04/Utils.pm	2002-05-02 15:55:59.000000000 -0500
+++ perl5.8.3/lib/site_perl/5.8.3/B/Utils.pm	2004-06-14 10:40:46.730869456 -0500
@@ -3,6 +3,7 @@
 use 5.006;
 use strict;
 use warnings;
+use vars '$DEBUG';
 our @EXPORT_OK = qw(all_starts all_roots anon_subs
                     walkoptree_simple walkoptree_filtered
                     walkallops_simple walkallops_filtered
@@ -32,6 +33,11 @@
 
 our @bad_stashes = qw(B Carp Exporter warnings Cwd Config CORE blib strict DynaLoader vars XSLoader AutoLoader base);
 
+sub null {
+    my $op = shift;
+    class( $op ) eq 'NULL';
+}
+
 { my $_subsdone=0;
 sub _init { # To ensure runtimeness.
     return if $_subsdone;
@@ -168,31 +174,112 @@
 
 sub B::OP::parent {
     my $target = shift;
+    printf( "parent %s %s=(0x%07x)\n",
+	    B::class( $target),
+	    $target->oldname,
+	    $$target )
+	if $DEBUG;
+
     die "I'm not sure how to do this yet. I'm sure there is a way. If you know, please email me."
         if (!$target->seq);
-    my (%deadend, $search);
-    $search = sub {
-        my $node = shift || return undef;
 
+    my (%deadend, $search_kids);
+    $search_kids = sub {
+        my $node = shift || return undef;
+	
+	printf( "Searching from %s %s=(0x%07x)\n",
+		class($node)||'?',
+		$node->oldname,
+		$$node )
+	    if $DEBUG;
+		
         # Go up a level if we've got stuck, and search (for the same
         # $target) from a higher vantage point.
-        return $search->($node->parent) if exists $deadend{$node};
-
-        # Test the immediate children
-        return $node if scalar grep {$_ == $target} $node->kids;
-
-        # Recurse
-        my $x;
-        defined($x = $search->($_)) and return $x for $node->kids;
-
+        if ( exists $deadend{ $node } )
+	{
+	    printf( "   search parent %s %s=(0x%07x)\n",
+		    B::class( $node ),
+		    $node->oldname,
+		    $$node )
+		if $DEBUG;
+	    return $search_kids->( $node->parent );
+	}
+	
+        # Test the immediate children, but only children we haven't visited
+	# already.
+	my @new_kids = ( grep !$deadend{ $_ },
+			 $node->kids );
+	if ( scalar grep $$_ == $$target, @new_kids )
+	{
+	    return $node;
+	}
+	
+        # Recurse and examine each child, in turn.
+	print( "   search kids\n"
+	       . join( "",
+		       map sprintf( "      %s %s=(0x%07x)\n",
+				    B::class( $_ ),
+				    $_->oldname,
+				    $$_ ),
+		       @new_kids ) )
+	    if $DEBUG and @new_kids;
+	
+	for ( @new_kids )
+	{
+	    my $x = $search_kids->( $_ );
+	    return $x if $x;
+	}
+	
         # Not in this subtree.
         $deadend{$node}++;
         return undef;
-   };
-   my $result;
-   my $start = $target;
-   $result = $search->($start) and return $result while $start = $start->next;
-   return $search->($start);
+    };
+    my $start = $target;
+    
+    # Skip to the farthest sibling and make a list of each with the most
+    # recent at the beginning of the list.
+    
+    # I am planning ahead for the day when it turns out that the parent
+    # cannot be found in the last sibling somewhere. Maybe it is just a
+    # null? I would like to be able to back track up the tree to find a
+    # ->next node that will bring us to northeast of (or even better,
+    # directly to) the parent.
+    my @siblings = $start;
+    while ( $start and
+	    ${$start->sibling} )
+    {
+	$start = $start->sibling;
+	unshift @siblings, $start;
+	printf( "->sibling %s %s=(0x%07x)\n",
+		class($start)||'null',
+		$start->oldname,
+		$$start )
+	    if $DEBUG;
+    }
+    
+    # Now search each sibling as noted from above.
+    for $start ( @siblings )
+    {
+	my $next = $start;
+	while ( $$next )
+	{
+	    printf( "->next %s %s=(0x%07x)\n",
+		    B::class( $next ),
+		    $next->oldname,
+		    $$next )
+		if $DEBUG;
+	
+	    my $result = $search_kids->( $next );
+	    return $result if $result;
+	}
+	continue
+	{
+	    $next = $next->next;
+	}
+    }
+    
+    # Having reached here... I give up?
+    undef;
 }
 
 =item C<< $op->previous >>
Re: Initial Devel::UncheckedOps, a macro for perl
created: 2004-06-16 20:03:49

This is a patch to [cpan://B::Generate] 1.07 so that the instantiation of the various B::SPECIAL objects don't trigger segfaults ([cpan://B] got it right - B::Generate just forgot to include this). [http://rt.cpan.org/NoAuth/Bug.html?id=6595]

--- .cpan/build/B-Generate-1.06/lib/B/Generate.xs	2002-07-28 11:45:48.000000000 -0500
+++ .cpan/build/B-Generate-1.06-fix/lib/B/Generate.xs	2004-06-11 13:34:27.000000000 -0500
@@ -425,6 +425,18 @@
 
 MODULE = B::Generate	PACKAGE = B	PREFIX = B_
 
+BOOT:
+{
+    specialsv_list[0] = Nullsv;
+    specialsv_list[1] = &PL_sv_undef;
+    specialsv_list[2] = &PL_sv_yes;
+    specialsv_list[3] = &PL_sv_no;
+    specialsv_list[4] = pWARN_ALL;
+    specialsv_list[5] = pWARN_NONE;
+    specialsv_list[6] = pWARN_STD;
+}
+
+
 void
 B_fudge()
     CODE:
Re: Initial Devel::UncheckedOps, a macro for perl
created: 2004-06-17 13:13:36

Just a quick question for you [diotalevi]. What is the reasoning behind altering every print or printf statement and not just those associated with a filehandle other than STDOUT? This question popped into my head when I saw:

print "Hello world!\n";
# Becomes
# print "Hello world!\n" or die $!;

I have an idea as to why you did that. Rather than show ignorance and spout off some possibly totally stupid idea (I am still a novice perlguts/[cpan://B] spelunker after all), I think I'll just wait for a response. :)

Re^2: Initial Devel::UncheckedOps, a macro for perl
created: 2004-06-17 14:02:59
perl -e 'print "'ello world\n" or die $!' > file_on_fs_that_is_already_full

I still wonder if that would error if the buffer weren't flushed though. It at least makes it more likely especially as the default also puts an `or die $!` on the close() op. There isn't anything to be done about implicit closes though.


An update. It has been suggested that I patch perl so that failures during implicit closes throw warnings, probably in the 'io' class.

Re^3: Initial Devel::UncheckedOps, a macro for perl
created: 2004-06-18 11:41:24
I did not even think of that one. I was thinking that if someone used select to set the default filehandle then you might need to make a second pass through to figure out what filehandle is really being used.
Re^4: Initial Devel::UncheckedOps, a macro for perl
created: 2004-06-18 14:10:06
What does it matter which filehandle is being written to if an error is throw? I don't follow why you care whether the handle is named STDIN or is default.
Re^5: Initial Devel::UncheckedOps, a macro for perl
created: 2004-06-19 01:01:40
Because not being able to write to STDOUT usually means that there is something more serious than a script error. Right? Or has my brain gone soft this week of vacation?
Re^6: Initial Devel::UncheckedOps, a macro for perl
created: 2004-06-19 08:11:13

I think you were thinking of throwing errors on STDOUT not being a valid handle. If someone had closed it before or something. This generalizes the error checking to any error including such favorites as ENOSPC. It'll still throw errors if the handle being written to isn't valid. Heck, this is how I did most of my testing - checking against a print to an invalid handle. It was too inconvenient to keep a full file system around just for testing.

print BAR "Hi!" or die $!;
# Bad file descriptor at ...
Re: Initial Devel::UncheckedOps, a macro for perl
created: 2004-06-18 05:52:11
Sorry for the line-noise comment, however just wanted to say I'm SO impressed.

I'll be testing this and once it gets stable/in CPAN etc we'll probably be using it as a tool for checking our code.

perlmonks.org content © perlmonks.org and bobtfish, diotalevi, Mr. Muskrat

prlmnks.org © 2006 edmund von der burg (eccles & toad)

v 0.03