Sunday, April 21, 2013

Learning Perl5i #3 revisted

I was somewhat uncomfortable that my solution to the combinatorics task, which involved a single, not very long routine, became so long with the addition of command line processing and POD  documentation. For one thing, while it had some information about how to invoke the routine, it did not have tests.

I still feel that a book on programming, say, should have a fully-implemented solution once a chapter, perhaps, to show the way things should be done. But that is too long for every single bit of code. Full documentation and sample implementations stretch on too long, compared to the few lines a snippet actually deserves.

Providing tests may still be appropriate. They take up less space than full documentation, yet explain how the various components work, by the very act of stress-testing those lines of code. Tests provide examples of how the code should be invoked.

Revising the combinatorics task into a module, this generates:

use perl5i::2;

# ----------------------------------------
# generate combinations of length $n consisting of characters
# from the sorted set @set, using each character once in a
# combination, with sorted strings in sorted order.
#
# Returns a list of array references, each containing one 
# combination.
#
func combine($n, @set) {
    return unless @set;
    return map { [ $_ ] } @set if $n == 1;

    my ($head) = shift @set;
    my @result = combine( $n-1, @set );
    for my $subarray ( @result ) {
$subarray->unshift( $head );
    }
    return ( @result, combine( $n, @set ) );
}

# ------------------------------------------------------------------
# Tests
#
func runtests() {
    eval { use Test::More }; # only include if testing

    test_no_set();
    test_length_zero();
    test_length_one();
    test_longer();
    done_testing();
}

func test_no_set() {
  is( combine( 1, () ), undef, 
      'set == empty list returns empty array' );
  is( combine( 1 ), undef, 
      'set not provided returns empty array' );
}

func test_length_zero() {
  is( combine( 0, 'a' ), undef, 'zero length returns empty array' );
}

func test_length_one() {
  my ( @results ) = combine( 1, 'a' );
  is_deeply( \@results, [['a']],
    'length 1 returns unary set as array element');

  @results = combine( 1, 'a', 'b', 'c' );
  is_deeply( \@results, [['a'], ['b'], ['c']],
    'length 1 returns larger set as array elements');
}
func test_longer() {
  my ( @results ) = combine( 2, 'a', 'b', 'c' );
  is_deeply( \@results, [['a', 'b'], ['a', 'c'], ['b', 'c']],
    'longer length generates list of  combinations');
}
# ------------------------------------------------------------------
# Run tests if invoked as a program, rather than in cluded as a 
# module.
#
runtests() unless caller();

Output from invoking the module as a program:

bash-3.2$ perl combinations.pli
ok 1 - set == empty list returns empty array
ok 2 - set not provided returns empty array
ok 3 - zero length returns empty array
ok 4 - length 1 returns unary set as array element
ok 5 - length 1 returns larger set as array elements
ok 6 - longer length generates list of  combinations
1..6

Thursday, April 18, 2013

Learning Perl5i #3

I've long admired the way functional languages specify multiple implementations of a routine for different sets of argument values, to detect terminal conditions. Converting a Perl6 solution  of a combinatorics task to Perl5i, I realized the plain Perl solution isn't necessarily all that bad, either.

The problem is to list the different ways you can select M elements from a set of N. Select 2 elements from a set of 4 would lead to the values ( 1, 2 ); ( 1, 3 ); ( 1, 4 ); ( 2, 3 ); ( 2, 4 ); ( 3, 4 ). Oh yeah, they want the characters in each combination in sorted order, and  the list of combinations sorted, too.

So it all consists of a single routine which gets invoked as combine(3, ['a'..'e']).

Ignore the first couple of lines, for a moment. If the first character of the set exists in a combination, it must be in the first position, since the characters are in sorted order. That means the remainder of the combination must be one character shorter, made of the remaining characters. So that part calls for a recursive solution:

* pull off the first character from the alphabet;
* recursively determine combinations of length -1 from the remainder of the set;
* reassemble be prefixing the first character onto each solution.

The other possibility is that the first character is not in the solution. That calls for recursively invoking the routine with the original length, but a smaller alphabet. If we do that often enough, we reach a situation where the alphabet is empty. This situation is handled by the first line we skipped over. In fact, we could optimize and abort if the size of the alphabet is smaller than the length of string we are requesting.

The more normal end case is when we are asking for a single character, the last in the sequence. In this situation, each character in the set is a candidate, so we return each character as an element in an anonymous array. Returning to the calling level, we then prefix the previous head of the alphabet onto the sequence using unshift. When we reach the top level, we have a list of arrays, each array being one combination.


While I was satisfied with the solution, I was unhappy with hard-coding the invocation. If I'm trying to demonstrate good coding, I should allow the program to be invoked with command-line arguments, to vary the way it is used.

So I added my favourite command-line options module, Getopt::Long, and modified the main section to fetch and use options:

my $options = parse_options();
say @$_ for combine( $options->{length}, 
                     @{$options->{alphabet}} );

The routine to read options looks like this:

func parse_options() {
    my %options = ( length => 3,
   set => 5,
 );
    GetOptions( \%options, 'length=s', 'set=s' )
or die( 'some usage message';

    $options{alphabet} = 
        [ ('a'..'z')[0..$options{set}-1] ];
    return \%options;
}

Getopt::Long has a  rich set of options to support single character or long argument names, to allow flags, options with string or numeric values, hashes and even multiple values, as well as supporting multiple names and abbreviations. You can specify a parameter, and then specify a reference to a scalar in which to store the value, but I prefer to pass a reference to a hash as the first argument, which gets all the values.

By setting some values in the hash, you can specify default values.

In this case I allow two arguments, the length of the strings, M in combinatorics talk, and the size of the alphabet, N in combinatorics talk. I then create an alphabet of set N by taking an appropriately-sized subset of the sequence a to z.



But if GetOptions fails because the argument list is totally unusable, you need to report an error. And it should probably provide a usage message, describing how to invoke the program. My favourite solution is Pod::Usage, which makes it easy to display the POD contained in the file ... but that means documenting the program with POD.

So the question is, is this getting silly? Or is it an appropriate example to present to people wanting to learn modern Perl?


use perl5i::2;
use Getopt::Long;
use Pod::Usage;

# ----------------------------------------
# Generate combinations of length $len consisting
of characters from the sorted set @chars, using 
each character once in a combination,with sorted
# strings in sorted order.
#
# We will use a recursive solution:
# * If we run out of eligable characters, we've
#   gone too far, and won't find a solution along
#   this path.
# * If we are looking for a single character, each 
#   character in @set is eligible, so return each
#   as the single element of an array.
# * We have not yet reached the last character, so
#   there are two possibilities:
#   1) push the first element onto the front of an 
#      N-1 length combination from the remainder of
#      the set.
#   2) skip the current element, and generate an  
#      N-length combination from the remainder
#
func combine($len, @chars) {
  return unless @chars;
  return map { [ $_ ] } @chars if $len == 1;

  my ($head) = shift @chars;
  my @result = combine( $len-1, @chars );
  for my $subarray ( @result ) {
    $subarray->unshift( $head );
  }
  return ( @result, combine( $len, @chars ) );
}

func parse_options() {
    my %options = ( length => 3,
   set => 5,
 );
    GetOptions( \%options, 'length=s', 'set=s',
                'man', 'help' )
or pod2usage(2);
    pod2usage(1)  if $options{help};
    pod2usage(-verbose => 2)  if $options{man};

    $options{alphabet} =
        [ ('a'..'z')[0..$options{set}-1] ];
    return \%options;
}

my $options = parse_options();
say @$_
    for combine( $options->{length},
                 @{$options->{alphabet}} );
__END__


=pod

=head1 NAME

combinations.pli - determine possible combinations of M elements from a set of N

=head1 VERSION

This documentation applies to combinations.pli version 0.0.1

=head1 SYNOPSYS

    combinations.pli [--length M] [--set N] \
                     [-man] [-help]

=head1 OPTIONS

=over 4

=item length M

Create strings of this length

=item set N

Use an alphabet of this many characters

=item help

Display a brief summary of information about the program.

=item man

Display the full documentation.

=back

=head1 DESCRIPTION

Generate combinations of length C<length> consisting of characters from the sorted set C<alphabet>, using each character once in a combination, with sorted strings in sorted order.

=cut

Tuesday, April 16, 2013

Learning Perl5i #2

Continuing the exploration of Perl5i beyond part one, I decided to treat perl5i as a language and use it to implement tasks at Rosetta Code.

Since Moose is a complete OO solution, I use a lightweight approach to OO, where used.

The first RC task I implemented is 100 doors: see the full implementation.

In terms of documentation, I gave each routine a small header in which I show how the routine is invoked, specify the inputs and outputs, and summarize the primary actions of the routine.

Since the task is defined in terms of a number of doors, and operations on those doors, it seemed appropriate to use an object. "some object" + "operations on that object" ==> "segregate into an OO implementation".

I used a simple constructor with initialization performed in _init() to simplify subclassing. The constructor doesn't need to be modified, just a new _init() written to handle additional attributes or arguments, if any. The new implementation can handle the duties of the original routine, or better, invoke the original as $self->SUPER::_init( @_ ).

The initializer stores how many doors we are dealing with, then then creates an array of doors. We are interested in doors 1..N, but we actually create one more, 0..N, since that's the way Perl arrays work. We'll simply ignore door zero, it is permanently closed. One subtle detail about the x multiplier: $CLOSED x $N is 0000..., string repetition; list repetition uses parentheses: ($CLOSED) x $N to generate ( 0, 0, 0, ... ).

The instructions say there are N passes, in which you toggle every door, every other door, every Nth door. I created toggle() to toggle a single door, toggle_n() to process a single pass, and toggle_all() to handle all the passes.

Besides eliminating duplication and excessive typing, one advantage of the ternary operator ( ? : ) over an if-block is that it is clear a single l-value is the target.

Simple statement modifier loops like these are, I believe, quicker than more general loops, and they offer simpler optimization in the future into parallel processing, but they also communicate that we are dealing with a set of data. Too often, people coming to Perl from C and Java think in terms of indices: I get the first index; I get the first element in the array; I do something ... I prefer to think in terms of sets, like a Unix pipeline: Take this array of elements, do something to them, ...

Note in toggle_n(), that $n * int( N / $n ) is the largest multiple of $n les than N. Taking the sequence 1.. int($self->{N} / $n ) and multiplying each element by $n produces $n, 2 * $n, 3 * $n, ... $n * int( N / $n ), the list of doors to toggle on that pass.

The final requirement is to print which doors are open.  grep() takes a code block and a list of values. Values are passed if they generate a true value in the block, and barred if they generate a false value.

Putting the routines together into a demonstration of the 100 door task becomes a simple three-line sequence.