Perl Golf 101
chargrill
created: 2006-05-26 10:43:59

I was recently trying to golf down some code to post as an obfuscation, and in a flash of inspiration happened upon some golfing techniques that I've never been able to find listed in a tidy package, so I thought I'd post my tips and tricks (relatively basic though they are) for anyone on a similar quest to shave characters.

This isn't meant as an exhaustive treatise on golfing, there are others much better at it than I. However, I was unable to find a good "howto" anywhere, and though I'd make a little one here. It's my hope that others better at this sport might share some tidbits of their own.

Note: It's my hope to polish this up and have it moved to the tutorial section. I'll leave this as a readmore until then.

Important: NO tip or technique presented here has ANY reason to show up in production code - golf is fine for fun and games, but can and will cause production code to fail to run properly, fail to be easy to maintain, etc.

  1. Reduce all variable and subroutine names down to single characters. Eliminate as much whitespace as possible. These might be obvious, but it's certainly a starting point. For purposes of readability of this tutorial, whitespace is NOT fully eliminated.
  2. Do you really need strict compliance? Get rid of variable declarations where possible. Dropping this:
    my(%s,$y,$l,$t,$d);
    
    turns something like this:
    my $k = k(\%s);
    
    into this:
    $k = k();
    
    and this:
    my $l = $d / int(length($t) / $k) / 100;
    
    into this:
    $l = $d / int(length($t) / $k) / 100;
    
    Hint: $_ doesn't need to be declared as a my variable.
  3. Do you actually need lexically scoped variables? If you do, take as much advantage of any "my" declarations as you can:
    sub i{
      my ($g,$l,$t) = @_;
      my @c;
      ...
    }
    
    1. $l and $t are now globals, so those get dropped, but I still need a lexically scoped $g. Also, the code calling this sub is shortened to also take advantage of the globals.
    2. I still need a lexically scoped array @c, and I know that sub i will never be called with more than one parameter, so I save some characters:
    sub i{
      my ($g,@c) = @_;
      ...
    }
    
  4. my($o)=@_; is 1 less character than my$0=shift;. Furthermore, if you know that subroutine will always get called with a valid parameter, you can save even more: ($o)=@_;.
  5. Rearrange while and for loops in postfix notation where possible (one statement per iteration):
    for(3..6){u($_)}
    
    becomes:
    u($_)for 3..6;
    
  6. Taking advantage of default expressions for functions (generally $_), remembering other defaults for operators and other perl idioms (<>, for example), along with the above hint turns this:
    while(){$t.=lc$_}
    
    into this:
    $t.=lc for<>;
    
    If you need to rotate an array, try it in a subroutine so you can take advantage of @_:
    push @_, shift;
    
    If it's an array of arrays that you want to rotate:
    push@$_, shift@$_ for @AoA;
    
  7. 'Inline' subroutine and function calls. Dropping a global $k out of the call and dropping "my" where it isn't necessary turn this:
    my $n = t($_,$k);
    my %n = f($n);
    
    into:
    %n = f(t($_));
    
  8. When you can't postfix a for loop, try to use map in void context. This, combined with dropping globals, dropping lexical scoping, and adding inline functions takes something like this:
    for(0..($k-1)){
      my $n = t($_,$k);
      my %n = f($n);
      my @g = b(1,\%n);
      $y   .= i(\@g,$l,\%t)
    }
    
    and turns it into this:
    map{
      %n = f(t($_));
      @g = b(1,\%n);
      $y .= i(\@g)
    } 0..$k-1;
    
  9. Drop temporary variables where possible. Cast function/subroutine return types. Combining this idea with inline functions in a simple example:
    @array = routine1($param1,$param2);
    $result = join ':', @array;
    
    becomes:
    $result = join ':', @{routine1($param1,$param2)}
    
    Combining this idea with using map in a void context, inline functions takes this code:
    for((split//,$t)){
      my $l = (split //,$y)[$c];
      my $s = join '', @$l;
      my $p = index($s,$_);
      $o   .= $p >= 0 ? $a[$p] : $_;
      $c   += $c < $k-1 ? 1 : -$k+1
    }
    
    and turns it into:
    map{
      $p  = index((join '', @{(split //,$y)[$c]} ), $_);
      $o .= $p >= 0 ? $a[$p] : $_;
      $c += $c < $k-1 ? 1 : 1-$k
    } split //, $t;
    
  10. Drop "if" checks when iterating when possible. Don't forget the ||= operator. Combining this with postfix for loop notation:
    for(a..z){
      if( ! $d{$_} ){
        $d{$_} = 0
      }
    }
    
    becomes:
    $d{$_} ||= 0 for a..z;
    
  11. Remember how your data is used. In the original fuller version, I was formatting the return code for display. In my golfed version, I no longer display intermediary values, so I can drop the formatting:
    my @k = map { $_->[1] } sort { $b->[0] <=> $a->[0] } @c;
    my $r = $k[0];
    $r =~ s/([a-z])\s\(.*/$1/;
    return $r
    
    with dropping the intermediary variable @k, becomes:
    return( map{ $_->[1] } sort { $b->[0] <=> $a->[0] } @c )[0]
    

More examples:

  1. Taking advantage of subroutine parameter initialization to get extra lexical variables, postfix for notation, new globals:
    sub b{
      my( $e, $l ) = @_;
      my @g;
      for(sort keys %$l){
        push @g, [ $_, '=', (split //,'#' x int($$l{$_} * $e))]
      }
      return @g
    }
    
    becomes:
    sub b {
      my( $e, $l, @g) = @_;
      push @g, [ $_, (split //,'#' x int($$l{$_} * $e))] for sort keys %$l;
      return @g
    }
    
  2. Populating a hash with a default values for all desired keys if the array passed in doesn't already contain them:
    sub f{
      my %d;
      $d{$_}++ for grep /[a-z]/, split //, shift;
      for(a..z){
        if( ! $d{$_}){
          $d{$_} = 0
        }
      }
      return %d
    }
    
    becomes:
    sub f{
      my %d;
      $d{$_}++ for grep /[a-z]/, split //, shift;
      $d{$_} ||= 0 for a..z;
      return %d
    }
    
  3. Using globals, taking advantage of subroutine parameter initialization, using map in void context instead of a for loop (twice), dropping temporary variables, ignoring display formatting:
    sub i{
      my ($g,$l,$t) = @_;
      my @c;
      for(0..25){
        my $v = v($g,$l,$t);
        push @c, o($v,$$g[0][0]);
        w($g)
      }
      my @k = map{ $_->[1] }
             sort{ $b->[0]<=>$a->[0] } @c;
      my $r = $k[0];
      $r =~ s/([a-z])\s\(.*/$1/;
      return $r
    }
    
    becomes:
    sub i{
      my ($g,@c) = @_;
      map{ push @c, o(v($g), $$g[0][0]); w($g) } 0..25;
      return( map{ $_->[1] } sort { $b->[0] <=> $a->[0] } @c )[0]
    }
    
  4. Using a global (no longer passing a hashref), inlining functions:
    sub k{
      my $s = shift;
      my @g;
      for( sort{ $$s{$b} <=> $$s{$a} || $a cmp $b } keys %$s ){
        last if $$s{$_} < 3;
        next unless $_ =~ y/a-z// > 2;
        my @f;
        push @f, ( pos($t)-3 ) while $t =~ /$_/g;
        my $g = c(n(@f));
        $g > 2 && push @g, $g
      }
      return c(@g)
    }
    
    becomes:
    sub k{
      my @g;
      for( sort{ $s{$b} <=> $s{$a} } keys %s ){
        last if $s{$_} < 3;
        next unless y/a-z// > 2;
        my @f;
        push @f, (pos($t)-3) while $t =~ /$_/g;
        my $g = c(n(@f));
        $g > 2 && push @g, $g
      }
      return c(@g)
    }
    
  5. Rolling two for loops into maps shaved a few characters:
    sub o{
      my ($g,$w) = @_;
      my $c = 0;
      for( @$g ){
        for( @$_ ){
          /\+/ && $c++;
          /\-/ && $c--
        }
      }
      return [$c,$w]
    }
    
    becomes:
    sub o{
      my ($g,$w) = @_;
      my $c = 0;
      map{
        map{ /\+/ && $c++; /\-/ && $c--} @$_
      } @$g;
      return [$c,$w]
    }
    
  6. Using a global, rolling a for loop into a map:
    sub t{
      my ($o,$k) = @_;
      my $c = 0;
      my $r;
      for(split //,$t){
        $r .= $_ unless(($c+($k-$o)) % $k);
        $c++
      }
      $r =~ s/[^a-z]//g;
      return $r
    }
    
    becomes:
    sub t{
      my ($o) = @_;
      my $c = 0;
      my $r;
      map{ $r .= $_ unless($k-$o+$c) % $k; $c++ } split //,$t;
      $r =~ s/[^a-z]//g;
      return $r
    }
    
  7. Using globals, skipping a my declaration (I know the sub only be called with a single, valid parameter, so $s will always be unique), rolling two for loops in maps in void context, skipping a long if(){}elsif(){} by using the ternary ?: with the same net effect:
    sub v {
      my ($m,$l,$t) = @_;
      my @g = b($l,$t);
      my $s = \@g;
      my $z = 0;
      for( @$m ){
        my $x = 0;
        for( @$_ ){
          if( $$m[$z][$x] eq '#' && $$s[$z][$x] eq '#' ){
            $$s[$z][$x] = '+'
          }
          elsif( $$m[$z][$x] eq '#'&&$$s[$z][$x] ne '#' ){
            $$s[$z][$x] = '-'
          }
          $x++
        }
        $z++
      }
      return $s
    }
    
    becomes:
    sub v{
      my ($m) = @_;
      my @g = b($l,\%t);
      $s = \@g;
      $z = 0;
      map{ $x=0;
           map{
               $$s[$z][$x] = $$m[$z][$x] eq '#' && $$s[$z][$x] eq '#' ? '+' : '-';
               $x++
              } @$_;
           $z++
         } @$m;
      return$s
    }
    

Well, those are all the tips and tricks I have used in a recent obfuscation, where I shaved about 200 characters off of 1785 or so. Could it have been golfed further? Probably. Let me know if there's anything I've missed!



--chargrill
$,=42;for(34,0,-3,9,-11,11,-17,7,-5){$*.=pack'c'=>$,+=$_}for(reverse split//=>$*
){$%++?$ %%2?push@C,$_,$":push@c,$_,$":(push@C,$_,$")&&push@c,$"}$C[$#C]=$/;($#C
>$#c)?($ c=\@C)&&($ C=\@c):($ c=\@c)&&($C=\@C);$%=$|;for(@$c){print$_^$$C[$%++]}
Re: Perl Golf 101
created: 2006-05-26 10:57:59
This isn't meant as an exhaustive treatise on golfing, there are others much better at it than I. However, I was unable to find a good "howto" anywhere, and though I'd make a little one here. It's my hope that others better at this sport might share some tidbits of their own.

There used to be a nice and very extensive reference at http://terje2.perlgolf.org/~golf-info/Book.html, but the domain seem to have expired. I should have a local copy at home.

Re^2: Perl Golf 101
created: 2006-05-26 11:57:08

Yes, in fact I specifically recall that exact URL when I was looking for hints and tips, but couldn't even convince google to spit out a cache. If you could post your local copy somewhere, that would be awesome, as a relative newcomer to golf, I know I'm already starting at a handicap. :)



--chargrill
$,=42;for(34,0,-3,9,-11,11,-17,7,-5){$*.=pack'c'=>$,+=$_}for(reverse split//=>$*
){$%++?$ %%2?push@C,$_,$":push@c,$_,$":(push@C,$_,$")&&push@c,$"}$C[$#C]=$/;($#C
>$#c)?($ c=\@C)&&($ C=\@c):($ c=\@c)&&($C=\@C);$%=$|;for(@$c){print$_^$$C[$%++]}
Re^3: Perl Golf 101
created: 2006-05-26 12:05:37
There might be another site that caches that sort of stuff - I'm trying to remember the URL tho. It's kind of an archive of pages/changes to sites over the years. If I come up with the site I'll post it and let you know (might be in my notes from a class I took a few years ago).
Re^4: Perl Golf 101
created: 2006-05-27 05:26:20
You are probably thinking of this site and this site and the mailing list archives.
Re^2: Perl Golf 101
created: 2006-05-26 14:15:22

< mtve> we with km forgot to pay for it in time, then it was freed and now it's taken away by those mo***s.

From a conversation at #perlgolf, talking about perlgolf.org yesterday.

Re^2: Perl Golf 101
created: 2006-05-28 10:02:33

Yep, we've lost this domain :(

Book is here http://terje2.frox25.no-ip.org/~golf-info/Book.html

Re: Perl Golf 101
created: 2006-05-26 11:01:49
A very nice guide!  [chargrill]++.

One note, don't forget that the return of a subroutine is the last statement of the subroutine.  So instead of:

sub o{
  my ($g,$w) = @_;
  my $c = 0;
  map{
    map{ /\+/ && $c++; /\-/ && $c--} @$_
  } @$g;
  return [$c,$w]
}
You can just do:
sub o{
  my ($g,$w) = @_;
  my $c = 0;
  map{
    map{ /\+/ && $c++; /\-/ && $c--} @$_
  } @$g;
  [$c,$w]
}

s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
Re^2: Perl Golf 101
created: 2006-05-26 12:00:52

You know, the funny thing is that I know that, and that was one of the first changes I made in going from my full, fairly properly coded version to the golfed version. However, I specifically recall that there was at least one of my subs that refused to function correctly after the change. Was it from removing the explicit return, or some other change I made? I don't know, but I do know that restoring the return (and probably some other subtle changes) restored the functionality, so from that point on, dropping the explicit return was left out of my golf bag.



--chargrill
$,=42;for(34,0,-3,9,-11,11,-17,7,-5){$*.=pack'c'=>$,+=$_}for(reverse split//=>$*
){$%++?$ %%2?push@C,$_,$":push@c,$_,$":(push@C,$_,$")&&push@c,$"}$C[$#C]=$/;($#C
>$#c)?($ c=\@C)&&($ C=\@c):($ c=\@c)&&($C=\@C);$%=$|;for(@$c){print$_^$$C[$%++]}
Re^3: Perl Golf 101
sgt
created: 2006-05-30 18:17:27

actually a few months ago I remember a discussion on p5p on return. with no explicit return and with control structures it's actually kind of a lottery to guess the actual *return*

the sad thing was that it was no easy to fix... and if I recall correctly PBP says always use return ...orthogonal to golf

Re^4: Perl Golf 101
created: 2006-05-30 20:10:01
...orthogonal to golf

Indeed - I dare anyone to run perlcritic* against any golfed code :-) (Which is why I posted the big notice about not letting any of these "techniques" make it into production code ;)

I've gotten in the habit of running perlcritic against pretty much all of my production level code these days, and taking the output with a grain of salt. My .perlcriticrc does contain quite a few things from PBP that I don't particularly agree with...

* For those unfamiliar with it, perlcritic is a commandline utility that (by default) critiques your code against a set of rules as outlined by Perl Best Practices.



--chargrill
$,=42;for(34,0,-3,9,-11,11,-17,7,-5){$*.=pack'c'=>$,+=$_}for(reverse split//=>$*
){$%++?$ %%2?push@C,$_,$":push@c,$_,$":(push@C,$_,$")&&push@c,$"}$C[$#C]=$/;($#C
>$#c)?($ c=\@C)&&($ C=\@c):($ c=\@c)&&($C=\@C);$%=$|;for(@$c){print$_^$$C[$%++]}
Re: Perl Golf 101
created: 2006-05-26 21:32:31
A lot of times, golf is about shaving off a few characters here and there. I'm sure these have been pointed out before, but:
  • y///c is one character shorter than length
  • ($x=~/./g) is two characters shorter than (split//,$x)
  • bareword=> can save a character over 'bareword',
For control flow, you want for, map, and punctuation. map and for are largely interchangeable. while is almost never necessary. Don't forget about the C-language-style for(;;), although it's rarely a gain. Postfix your operators; statement-modifier for doesn't require the parens or the braces that make a BLOCK. Replace branching if() statements with ternary or logical operators. Precedence rules will help here; also, remember that in Perl, true is '1' and false is the empty string. If you don't need the short-circuiting, you can use the bitwise flavor of & and |. Know which builtin variables are used and what special effects they have. ($- and $=, I'm looking at you.)

From the snippets you have posted:

$o .= $p >= 0 ? $a[$p] : $_;
could become $o .= $p < 0 ? $_ : $a[$p] and
$r =~ s/[^a-z]//g;
could become $r =~ y/a-z//cd;
Re^2: Perl Golf 101
created: 2006-05-27 02:45:35

Excellent tips! I'll have to incorporate those into my golfed obfu, though I shan't post an updated version because its format is quite well fixed with the existing number of characters it had prior to the application of some of the tips received since posting. Since the first round of feedback from this node, I've already cut down another ~60 characters, and with yours I'm sure I'll cut down quite a few as well, as the original makes a fair amount of use of some of the idioms you're suggesting which can be golfed further.

I sort of hinted at postfixing for not requiring the parens of braces without stating it explicitly, along with order of precedence.

As far as builtin variables, while I know what $- and $= are (and default to), I don't have anything in my notes about special effects they have - just a note for $* making any numerical value assigned to it an implicit int (which gives me an idea... ;-)

I will more than likely incorporate some of your suggestions (and especially your explanations!) into my original post, with of course your permission (and proper attribution on my part).



--chargrill
$,=42;for(34,0,-3,9,-11,11,-17,7,-5){$*.=pack'c'=>$,+=$_}for(reverse split//=>$*
){$%++?$ %%2?push@C,$_,$":push@c,$_,$":(push@C,$_,$")&&push@c,$"}$C[$#C]=$/;($#C
>$#c)?($ c=\@C)&&($ C=\@c):($ c=\@c)&&($C=\@C);$%=$|;for(@$c){print$_^$$C[$%++]}
Re^3: Perl Golf 101
created: 2006-05-27 03:36:51
As you point out, $* converts numerical values to int. $= is similar except that it converts everything to int. $- is the same as $= except it can't go negative, or higher than INT_MAX. ($= seems to be able to go up to UINT_MAX.)

perlmonks.org content © perlmonks.org and blazar, chargrill, jwkrahn, liverpole, mtve, nimdokk, sgt, szbalint, whio

prlmnks.org © 2006 edmund von der burg (eccles & toad)

v 0.03