Sunday, January 24, 2010

Recursion makes my brain hurt...

"So the other day I was at the local bar, working with recursion and..." I know we've all heard this joke before and it always ends in new and more imaginative ways to reduce a system to a quivering pile of uselessness. This got me wondering if there might be some standard nuts and bolts standards to avoid the pitfalls of writing recursive code. Most of the problems that I have encountered either stem from not giving my perl code a set of brakes or not understanding the data enough to know how my code going to react. So I have managed to waste more than a few hours and caused myself just as many headaches when it comes to recursion.

I remember Blake Williford, a colleague of mine, saying "...with recursive functions you should always process the degenerate case first...", this makes sense and might be what I'm looking for. The two pitfalls (degenerate cases I think) that come to mind is "run away code" and "circular references in the data". Both of these can end up with a "the never ending boat ride" or a "quivering pile of uselessness". So what do I mean by "run away code"? This is code that has no brakes:
# My run away boat 
   sub recurse{
      my $num = shift; 
      $num++; 
      print $num."\n"; 
      recurse($num); 
   }
In this case, the code will continue to add 1 to itself till it sails off the edge of the world. This is generally the point where I will hit the largest integer the system can handle or the recursion times out. So as my perl code sails over the edge and dies, we should be able see the perl interpreter complaining about large integers and/or deep recursion. As a cautionary note, if you were to couple the above code with a fork(), system(), and/or exec() like operations, the code would run till it consumes some critical system resource. At which time your system will suddenly burst into flames and crash into a quivering pile(well maybe not the flames part). At any case I need to throw the anchor out and stop my code before it gets to far away:
# my degenerate function 
   sub recurse{
      my $num = shift;

      # set how our limit 
      my $biggest_num = 999999999999; 

      # throw out the anchor if over ther limit
      # AKA test the degenerate case  
      return if($num > $biggest_num); 

      $num++;
      print $num."\n"; 
      recurse($num);
   } 
I can see two advantages with this approach, first only the least amount code is executed before returning. Secondly my code won't be sailing for the edge, it now knows where to drop anchor. But I do wonder if this will be so obvious when you start adding data and more logic to my perl code. So lets make things a little more interesting and try to process a bit more realistic data:
my @family_tree = (
      {
         'name' => 'warren',
         'relation' => 'self',
         'parents' => 'bob,barbara' 
      },
      {
         'name' => 'bob',
         'relation' => 'father',
         'parents' => 'robert,barb' 
      },
      {
         'name' => 'robert',
         'relation' => 'grandfather',
         'parents' => 'gert,william' 
      },
      {
         'name' => 'barb',
         'relation' => 'grandmother',
         'parents' => 'julie,sam' 
      },
      {
         'name' => 'barbara',
         'relation' => 'mother',
         'parents' => 'warren,marry' 
      },
      {
         # my namesake
         'name' => 'warren',
         'relation' => 'grandfather',
         'parents' => 'jack,jill' 
      },
      {
         'name' => 'marry',
         'relation' => 'grandmother',
         'parents' => 'oliver,dorothy' 
      },
      {
         'name' => 'john',
         'relation' => 'son',
         'parents' => 'warren,samatha' 
      } 
   ); 
Next I would like to watch how a recursive function might explore my family tree. Since we have a small data set my code can't go to far before it runs out of things to process so I'm only validating the data for my degenerate cases . But if the data set was much larger I would need to puts some limits on it(bring the boat anchor).
# get my all of sons hash references and feed it to the tree walker
   walk_tree(grep($_->{'relation'} eq 'son',@family_tree)); 

   sub walk_tree{
      my $person = shift;

      # return not a valid person
      return unless(defined $person->{'name'} 
      && defined $person->{'relation'});

      # tell the world who I am
      print "\n";
      print 'Hello my name is '.$person->{'name'};
      print ' and I\'m Warren\'s '.$person->{'relation'}."\n"; 

      # process the parents or return there is none
      if(defined $person->{'parents'}){
         my @parent_names = ();
         @parent_names = split /,/, $person->{'parents'};
         print 'My parents are: '.join(" and ",@parent_names)."\n";

         foreach my $parent_name(@parent_names){
            print 'I\'m checking '.$parent_name."'s lineage\n";
            walk_tree(grep($_->{'name'} eq $parent_name,@family_tree)); 
         } 
      }    
   }
If you actually run the code, you will see this is actually an example of my second case "circular references in the data", and as you will also see has determined that I'm my own grandfather. I have also created a logical and maybe even a temporal paradox that my perl script is unable to resolve without manual intervention(^C). But despite my cataclysmic coding error there's still some good news, this code is only juggling the same pointers around and not creating new ones, so all I have is run away code that is going around the world to the left(maybe right). So I need to figure some other degenerate case and so to speak throw a bucket of water on this type of behaviour.
# get my all of sons hash references and feed it to the tree walker
   walk_tree(grep($_->{'relation'} eq 'son',@family_tree)); 

   sub walk_tree{
      my $person = shift;
      my $state = shift;

      # return not a valid person
      return unless(defined $person->{'name'} 
      && defined $person->{'relation'});

      # tokenize some data that makes me unique 
      # and store it's state  
      push(@{$state},$person->{'name'}.$person->{'relation'});

      # tell the world who I am
      print "\n";
      print 'Hello my name is '.$person->{'name'};
      print ' and I\'m Warren\'s '.$person->{'relation'}."\n"; 

      # process the parents or return there is none
      if(defined $person->{'parents'}){
         my @parent_names = ();
         @parent_names = split /,/, $person->{'parents'};
         print 'My parents are: '.join(" and ",@parent_names)."\n";

         foreach my $parent_name(@parent_names){

            # get list of all people with same name as parent.
            my @people =();
            push @people, grep($_->{'name'} eq $parent_name,@family_tree);
            
            foreach my $me(@people){

               # see if we've already checked this person
               my $used = scalar
                  grep($me->{'name'}.$me->{'relation'} eq $_,@{$state});

               unless($used){
                  print 'I\'m checking '.$me->{'name'}."'s lineage\n";
                  walk_tree($me,$state);
               }
            }
         } 
      }    
   } 
We can now track the state of who's already been looked at and make sure we don't recurse on them again. Now that my script works as expected it is far from over, consider we can't always guarantee our data is clean or in the right order. This leave the possibility of more degenerate cases but I'm not going to entertain them now, because as mentioned this stuff makes my brain hurt. However for though that wish to go swimming these waters, please comment and show me some examples of what your thinking of, for as of right now I've only entertained the one standard.

And for all you more experienced coders that are wondering why such a rant over this trivial aspect of coding? I'll just say I've seen a few fledgling system administrators render some very large production systems useless (myself included) with this exact weapon. This post I hope can also be a cautionary old fishermen's yarn for all those newbie perl developers and system administrators out there that might be playing with fire. Hopefully my ranting on recursion will be useful for anyone who is thinking about jumping into the deep end of the recursion pool.

1 comment:

  1. In your sub walk_tree() you start off doing two shifts...
    my $person = shift;
    my $state = shift;

    Then you run your code.

    To be true to the recursive form you should do this before your next call to walk_tree();

    my $person = shift;
    my $state = shift;

    print 'I\'m checking '.$me->{'name'}."'s lineage\n";
    walk_tree($me,$state);

    Then your check for the degenerate case is the first thing done in your system call.

    sub walk_tree {
    # return not a valid person
    return unless(defined $person->{'name'}
    && defined $person->{'relation'});

    ReplyDelete