Friday, June 5, 2015

Node JS As The Universal Shell Scripting System

For the past 20 years, I've focused on programming in real languages for real applications, as opposed to web stuff, :-). When I have dealt with the web, it's been using server-side scripts. Definitely not that Javascript crud. Mind you, back when I worked for a porn website, they also came up with a JS based website, Simple.com, long before other people were doing all-JS interfaces.

But the time has come to re-evaluate ... well, everything. Javascript can be checked for validity and style using JSHint & JSLint; it can be debugged. There are frameworks and layers that make it possible to do useful things easily, treating old JS as a sort of widely-supported low-level language on which to implement more elegant systems.

One of these new paradigms is the implementation of node.js to run JS programs on the server, rather than in a web page. Node uses Google's V8 Javascript engine from the Chrome browser, which means a large corporation ensures high performance on Windows, Mac and Linux platforms powered by Intel 32- or 64-bit processors as well as ARM and MIPS processors. Node only has to handle the difference between a process and a web component: no window or document objects to access, but instead new process and global objects.

And with the coming release of ECMAScript 6, the language itself will become far more modern and sensible.

So if the language is tolerable to use, and available everywhere, perhaps it's time to start using it. Bash is great, but you need Cygwin to use it on Windows. How does Node compare to Bash, or to Perl?

I went to RosettaCode to look for implementations of problems in the required languages, and selected matrix multiplication as the first challenge. I expect this to perform badly in a Unix shell script, but frankly, I don't generally use arrays in shell scripts, so we'll see how their solution works out.

I used the matrix transpose and matrix multiplication implementations to create a Matrix.js file. The one change is that I added:

    exports.Matrix = Matrix;

to make the constructor available to the importing module; the other functions become available as methods on an object.
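
As a rough idea of the shape such a module takes, here is a minimal sketch. It is not the RosettaCode code: the `mtx`, `height` and `width` property names and the method bodies are my assumptions for illustration. Only the export of `Matrix` and the `mult` method name come from this post.

```javascript
// Matrix.js (illustrative sketch, not the RosettaCode implementation)

// Wrap a two-dimensional array; the property names are assumptions.
function Matrix(ary) {
    this.mtx = ary;
    this.height = ary.length;
    this.width = ary.length > 0 ? ary[0].length : 0;
}

// Return a new Matrix with rows and columns swapped.
Matrix.prototype.transpose = function () {
    var result = [], r, c;
    for (c = 0; c < this.width; c++) {
        result[c] = [];
        for (r = 0; r < this.height; r++) {
            result[c][r] = this.mtx[r][c];
        }
    }
    return new Matrix(result);
};

// Return the product this * other as a new Matrix (naive triple loop).
Matrix.prototype.mult = function (other) {
    if (this.width !== other.height) {
        throw new Error("incompatible sizes");
    }
    var result = [], i, j, k, sum;
    for (i = 0; i < this.height; i++) {
        result[i] = [];
        for (j = 0; j < other.width; j++) {
            sum = 0;
            for (k = 0; k < this.width; k++) {
                sum += this.mtx[i][k] * other.mtx[k][j];
            }
            result[i][j] = sum;
        }
    }
    return new Matrix(result);
};

// Make the constructor visible to code that calls require('./Matrix.js').
// The typeof guard only keeps the sketch loadable outside of Node.
if (typeof exports !== "undefined") {
    exports.Matrix = Matrix;
}
```

An importing script would then write something like `var mat = require('./Matrix.js');` followed by `new mat.Matrix([[1, 2], [3, 4]])`.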

I added a readMatrix() routine to import data from a file. My code may well be inefficient, but it isn't part of what I'm measuring: I only time the core multiplication. Here's my routine:


// Run the program as 
//     "DEBUGGING=1 node program args"
// to get debugging output

exports.readMatrix = function(err, data) {
   if (err) {
      throw err;
   }
   var row = 0,
       col = 0,
       i, len, num = 0,
       matrix = [[]];
   for ( i = 0, len = data.length; i < len; i++) {
      switch ( data[i] ) {
         case ' ' :
            if ( num > 0 ) {
               matrix[row][col] = num;
               num = 0;
            }
            col++;
            break;

         case "\n" :
            if ( num > 0 ) {
               matrix[row][col] = num;
               num = 0;
            }
            row++;
            col = 0;           // start the new row at the first column
            matrix[row] = [];
            break;

         case '1': case '2': case '3': case '4': case '5':
         case '6': case '7': case '8': case '9': case '0':
            num = num * 10 + (data[i] - '0');
            break;
      }
   }
   if ( num > 0 ) { // trailing value due to no final \n
      matrix[row][col] = num;
   }
   else {           // extra empty array due to final \n
      if (matrix[row].length === 0 ) {
         matrix.splice(row, 1);
      }
   }
   return matrix;
}
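
For comparison, the same input format (space-separated integers, one row per line) can also be parsed with string splitting instead of a character loop. This is only a sketch of an alternative, not what the timings in this post used; `parseMatrixText` is a hypothetical name. Unlike readMatrix(), it keeps zero entries and skips blank lines.

```javascript
// Parse text such as "1 2\n3 4\n" into [[1, 2], [3, 4]].
// parseMatrixText is a hypothetical helper, not part of the original program.
function parseMatrixText(text) {
    return text
        .split("\n")
        .map(function (line) { return line.trim(); })          // drop stray spaces
        .filter(function (line) { return line.length > 0; })   // skip blank lines
        .map(function (line) {
            // Split each row on runs of whitespace and convert to integers.
            return line.split(/\s+/).map(function (token) {
                return parseInt(token, 10);
            });
        });
}

console.log(parseMatrixText("1 2\n3 4\n"));
```

readMatrix() above remains the routine used in the rest of this post.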

readMatrix() takes the same (err, data) arguments as a callback to the FileSystem readFile() function, and is called from inside such a callback. It receives either an error or the contents of the file. Errors are dealt with at the top; otherwise the data is processed character by character.

Numeric values are separated by spaces, so when a space is seen, the accumulated value, num, is stored in the current matrix cell. At the end of a line, a new empty row is begun, after storing the accumulator if necessary. For digits, the current contents of the accumulator are shifted over one decimal place, and the new digit is added.

Clearly, this only works with positive integers (a zero is never stored, because of the num > 0 tests), and no error-proofing is provided, other than handling a trailing space at the end of a line. Superfluous blank lines would result in empty row arrays being added to the matrix, but we'll simply make sure that circumstance doesn't arise.

The top-level code is simple: read in the above modules and the FileSystem module, read in the command line arguments, read in the actual matrices, and determine the time to perform the matrix multiplication reps times:

var rm  = require('./readMatrix.js');
var fs  = require('fs');
var mat = require('./Matrix.js');

var reps  = parseInt(process.argv[2], 10),  // argv entries are strings
    file1 = process.argv[3],
    file2 = process.argv[4];

fs.readFile(file1, "utf-8", function readM(err, data) {
    var M = new mat.Matrix(rm.readMatrix(err, data));
    fs.readFile(file2, "utf-8", function readN(err, data) {
        var N = new mat.Matrix(rm.readMatrix(err, data));

        var t1 = process.hrtime(); 
        var prod, i;
        for ( i = 0; i < reps; i++ ) {
            prod = M.mult(N);
        }
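        var delta = process.hrtime(t1);
        // delta is [seconds, nanoseconds]; combine them before printing
        var seconds = delta[0] + delta[1] / 1e9;
        console.log( reps + " repetitions took " + 
                     seconds.toFixed(3) + " seconds");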
    });
});
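
A note on the timing calls: process.hrtime() returns a [seconds, nanoseconds] pair, and the nanoseconds part is not zero-padded, so joining the two parts with a "." only reads correctly when the nanosecond field happens to have all nine digits. A small sketch of converting the pair to fractional seconds follows; `hrtimeToSeconds` and `timeReps` are hypothetical helper names, not part of the program above.

```javascript
// Convert a [seconds, nanoseconds] pair from process.hrtime() to seconds.
function hrtimeToSeconds(pair) {
    return pair[0] + pair[1] / 1e9;
}

// Call fn "reps" times and return the elapsed time in seconds.
function timeReps(fn, reps) {
    var start = process.hrtime(), i;
    for (i = 0; i < reps; i++) {
        fn();
    }
    // Passing the earlier reading back in yields the difference.
    return hrtimeToSeconds(process.hrtime(start));
}

var elapsed = timeReps(function () { return Math.sqrt(2); }, 1000);
console.log("1000 repetitions took " + elapsed.toFixed(6) + " seconds");
```

With this conversion, a reading of [0, 5000000] is reported as 0.005 seconds rather than as "0.5000000".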

To begin, I multiplied two simple 2x2 matrices:

    [ 1 2 ] [ 5 6 ]  =  [ 19 22 ]
    [ 3 4 ] [ 7 8 ]     [ 43 50 ]

The process took 0.11 seconds, but most of that is overhead. Doing it 10 times, or 100, or 1,000, or even 1,000,000 times took between 0.11 and 0.47 seconds. Ten million iterations took 4.7 seconds, 100 million took 49 seconds, and one billion took 522 seconds. So that's pretty linear, and pretty fast. Of course, the multiplication itself is insignificant at this size; the time is mostly overhead: function calls, creating and discarding variables, garbage collection.

Since matrix multiplication is an N^3 operation, 2x2 matrices require 8 multiplications, while 5x5 matrices require 125. Increasing to a 5x5 matrix results in 1/8 the performance of a 2x2 matrix. 10x10 requires 1,000 multiplications and takes 30 times as long as 2x2. 32x32 matrices require 32,768 multiplications to process, while 100x100 involve a million scalar multiplications.
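
The arithmetic behind those figures is easy to check. The sketch below counts the scalar multiplications for the naive N^3 algorithm and works out the average cost per 2x2 product from the timings quoted in this post; the function names are mine, chosen for illustration.

```javascript
// Scalar multiplications needed for the naive product of two n x n matrices.
function scalarMultiplications(n) {
    return n * n * n;
}

[2, 5, 10, 32, 100].forEach(function (n) {
    console.log(n + "x" + n + ": " + scalarMultiplications(n) + " multiplications");
});

// Average nanoseconds per repetition, given a total time in seconds.
function nanosPerRep(seconds, reps) {
    return seconds * 1e9 / reps;
}

// Timings reported in this post for the 2x2 case.
console.log(Math.round(nanosPerRep(4.7, 1e7)) + " ns per product at ten million reps");
console.log(Math.round(nanosPerRep(522, 1e9)) + " ns per product at one billion reps");
```

Both of those work out to roughly half a microsecond per 2x2 product, which is what "pretty linear" looks like in numbers.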

My conclusion, even prior to having other language versions to compare with, is that Node JS provides excellent performance. You may want another language to implement MATLAB or a nuclear power plant controller, but it is clear why Node is being used for real applications, among them several IDE editors.
