Wednesday, June 24, 2015

Bash Inefficient Matrix Multiplication

Bash can only handle integer arithmetic. To process floats, you need to shell out to 'bc' or some other external utility. The core multiplication changes to:

    result=`echo  "$result + $m * $n" | bc`;

Boy, does that slow things down! For each multiplication, you're launching a shell, invoking a program, and returning the result. The upshot is 800 multiplications a second. Note that this test still runs on integer matrices; there remain unresolved problems processing actual floats.
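
In context, the inner loop of the multiply() function from the previous post would look roughly like this (a sketch, not the exact code I ran; the nameref aliases are my addition, assuming bash 4.3+, since float strings can't pass through bash's integer-only arithmetic expansion):

    local -n Mref=$M Nref=$N;    # aliases for the actual matrix arrays
    for ((k=0; k < ROWS; k++)); do
        m=${Mref[i*ROWS+k]};
        n=${Nref[k*COLS+j]};
        # one subshell and one bc process per scalar multiplication
        result=`echo "$result + $m * $n" | bc`;
    done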


One obvious problem is shelling out for every multiplication. More efficient is to accumulate the expression to calculate and invoke the external calculator just once per element. Now we only shell out N^2 times instead of N^3, so increasing matrix size leads to better relative performance, up to a limit. The 32x32 matrix gives 18x the performance of the simpler version, which implies the 32x reduction in shelling out is partly offset by the cost of building the expression. By the time we reach 100x100, overhead costs are overwhelming the gains: a 100x reduction in shelling out yields only a 16x improvement in performance.
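
Here's a minimal sketch of that batched approach (the function name multiplyBatched and the bash 4.3+ namerefs are my additions for illustration; the code I actually ran may differ in detail):

    function multiplyBatched()
    {
        local -n Mref=$1 Nref=$2;
        declare -a product;
        local i j k expr;
        for ((i=0; i < ROWS; i++)); do
            for ((j=0; j < COLS; j++)); do
                expr='0';
                for ((k=0; k < ROWS; k++)); do
                    # build up "+ a * b" terms; no processes launched here
                    expr+=" + ${Mref[i*ROWS+k]} * ${Nref[k*COLS+j]}";
                done
                # one echo|bc pipeline per element: N^2 launches, not N^3
                product[i*COLS+j]=`echo "$expr" | bc`;
            done
        done
    }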


Tuesday, June 23, 2015

BASH Matrix Multiplication

tl;dr Bash is not the language for math-intensive operations.





REPS=$1;
FILE_1=$2;
FILE_2=$3
OUTFILENAME=$4;

readonly COLS=`head -1 $FILE_1 | wc -w`;
readonly ROWS=`cat $FILE_1 | wc -l`;
# echo "rows is $ROWS; cols is $COLS"
if [[ $ROWS != $COLS ]]; then
    echo "Expecting square matrices, " \
         "but rows = $ROWS, cols = $COLS\n";
    exit 1;
fi

# --------------------------------------------------
# SUBROUTINES
#
function outputMatrix()
{
    local matrixname=$1;
    local matrix;
    local elem;
    echo "matrix is '$matrixname'.";
    eval matrix=\( \${${matrixname}[@]} \);
    local i=0;
    for elem in "${matrix[@]}"; do
        echo -n "$elem ";
        if (( ++i == $COLS )); then
            echo '';
            i=0;
        fi
    done
}

function multiply()
{
    declare -a product;
    local M=$1 N=$2;
    local i j k idx1 idx2 idx3;
    for ((i=0; i < $ROWS; i++ )); do
        for ((j=0; j<$COLS; j++)); do
            local result=0;
            for ((k=0; k<$ROWS; k++)); do
                idx1=$((i*$ROWS+k));
                idx2=$((k*$COLS+j));
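                # ${M} expands to the array *name* (e.g. matrix1), so the
                # arithmetic expansion below evaluates matrix1[idx1] by name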
                result=$((result+${M}[idx1]*${N}[idx2]));
            done
            idx3=$((i*$COLS+j));
            product[$idx3]=$result;
        done
    done
#    outputMatrix product;
}
# --------------------------------------------------
#

matrix1=($(cat $FILE_1));
matrix2=($(cat $FILE_2));


start=`date  +%s.%N`;
for ((repeat = 0; repeat < $REPS; repeat++)); do
    multiply matrix1 matrix2 
done
end=`date  +%s.%N`;
echo "$end - $start" | bc

My core multiplication routine is similar to the one at RosettaCode. Results are drastically inferior to Perl or JavaScript: a factor of 200x to 1000x slower.


Notice that the 100x100 matrix has only a single data point, at 55,000 multiplications per second. That took 547 seconds; I wasn't going to wait around 2 hours for more repetitions.

Part of the problem comes from flattening a two-dimensional matrix into a single one-dimensional array. Converting row and column indices into a single array index overwhelms the actual matrix multiplication: two index multiplications accompany every data multiplication in the inner loop, for 3N^3 + N^2 multiplications in all. These all happen in "user space", while Perl and JS have the opportunity to carry out array indexing in "language space", which is potentially far more efficient.

Actually, simplifying the index calculations only improved runtime by about 10%, not a whole lot. Possibly the cost of variable access overwhelms the difference between addition and multiplication.


    local idx3=0;
    for ((i=0; i < $ROWS; i++ )); do
        for ((j=0; j < $COLS; j++)); do
            local result=0;
            local idx1=$((i*$ROWS));
            local idx2=$j;

            for ((k=0; k < $ROWS; k++)); do
                result=$((result+${M}[idx1]*${N}[idx2]));
                ((idx1 += 1))
                ((idx2 += COLS))
            done
            product[$idx3]=$result;
            ((idx3++))
        done
    done



Wednesday, June 17, 2015

Better Performance Through Transposition

Both Perl and Javascript implement only one-dimensional arrays. To represent a matrix you need to nest arrays within arrays, typically one array for each row, within the overall matrix array.

Each element of the result matrix is the dot product of one row from the first matrix and one column from the second. Going from one element to the next in the same row is much quicker than going to another row. The first matrix only changes rows once for each output row, N changes in all, but the second matrix changes rows N^3 times, once for each multiplication.

If we were to transpose the second matrix, turning rows into columns and columns into rows, the multiplication could be carried out by scanning along a row instead of jumping from row to row. That reduces the number of row changes to N^2. How much will that save us?

I transposed the second array while reading it in:

for my $size ( keys %vars ) {
   my $filestub = q{M_} . $size . q{x} . $size . q{.};
   my ( $f1, $f2 ) = ( $filestub . '1', $filestub . '2' );
   $vars{$size}[0] = readMatrix( $f1 );
   $vars{$size}[1] = transpose( readMatrix( $f2 ));
}

and used pairwise() from List::MoreUtils to multiply corresponding elements of the rows, and sum0() from List::Util to add up the products:


for my $i ( 0 .. $rows - 1 ) {
    for my $j ( 0 .. $cols - 1 ) {
        $prod->[$i][$j]
            = sum0 pairwise { $a * $b } @{ $M->[$i] }, @{ $N->[$j] };
    }
}

The results were much the same for both integers and floats.

-➤   perl ./int_benchmark.pl
Processing for 5 seconds each size will take 25 seconds.
Wed Jun 17 15:25:14 2015
              Rate M_100x100   M_32x32   M_10x10     M_5x5     M_2x2
M_100x100   5.54/s        --      -97%     -100%     -100%     -100%
M_32x32      174/s     3046%        --      -97%     -100%     -100%
M_10x10     5114/s    92135%     2832%        --      -85%      -98%
M_5x5      34909/s   629461%    19913%      583%        --      -87%
M_2x2     274124/s  4943582%   157052%     5260%      685%        --
Wed Jun 17 15:25:46 2015

-➤   perl -E '
say "100 => ", 100**3 * 5.54   / 10**6;
say " 32 => ",  32**3 * 174    / 10**6;
say " 10 => ",  10**3 * 5114   / 10**6;
say "  5 => ",   5**3 * 34909  / 10**6;
say "  2 => ",   2**3 * 274124 / 10**6;
'
100 => 5.54
 32 => 5.701632
 10 => 5.114
  5 => 4.363625
  2 => 2.192992

Previous float results were 2.1, 4.1, 4.9, 5.2 and 5.4 MFLOPS for the 2x2 ... 100x100 sized matrices. So whether the second matrix changes rows 10,000 times or 1,000,000 times, it doesn't make much difference.

I modified the Javascript program to operate on a transposed second matrix, but since Javascript doesn't provide a pairwise() routine, I wrote a dot-product routine which takes two rows. But both integer and float matrices produce much the same results as before ... +/- 10%. So transposing is irrelevant when there's so much other overhead in a scripting language.

Monday, June 15, 2015

Linpack FLOPS Benchmark

The numbers I've been reporting for Javascript and Perl performance need context. Are they running on a Commodore 64, or a supercomputer? Nope, it's my laptop, and here's what Linpack reports about it: 4 cores, 8 threads (Intel Core i7) at 3.389 GHz, with a peak performance of 46 GFLOPS.

Javascript and Perl are both single-threaded, so at best you get 1/8 of peak: 46 / 8 = 5.75 GFLOPS. That's still about 1000 times the performance we actually achieve in these more convenient languages; you sacrifice some things to gain the benefits. On the optimistic side, a moderately fast modern computer running Perl or Javascript is almost as fast as a Sparc-10 running LINPACK.

Intel(R) LINPACK data

Current date/time: Fri Jun 12 01:19:06 2015

CPU frequency:    3.389 GHz
Number of CPUs: 1
Number of cores: 4
Number of threads: 8

Parameters are set to:

Number of tests                             : 15
Number of equations to solve (problem size) : 1000  2000  5000  10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array                  : 1000  2000  5008  10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run                     : 4     2     2     2     2     2     2     2     2     2     1     1     1     1     1    
Data alignment value (in Kbytes)            : 4     4     4     4     4     4     4     4     4     4     4     1     1     1     1    

Maximum memory requested that can be used = 16200901024, at the size = 45000

============= Timing linear equation system solver =================

Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm)
1000   1000   4      0.019      35.5628  1.029343e-12 3.510325e-02
1000   1000   4      0.018      36.2016  1.029343e-12 3.510325e-02
1000   1000   4      0.018      36.1857  1.029343e-12 3.510325e-02
1000   1000   4      0.018      36.2075  1.029343e-12 3.510325e-02
2000   2000   4      0.136      39.2932  4.298950e-12 3.739560e-02
2000   2000   4      0.135      39.4407  4.298950e-12 3.739560e-02
5000   5008   4      1.923      43.3503  2.581643e-11 3.599893e-02
5000   5008   4      1.925      43.3232  2.581643e-11 3.599893e-02
10000  10000  4      14.781     45.1176  9.603002e-11 3.386116e-02
10000  10000  4      14.825     44.9833  9.603002e-11 3.386116e-02
15000  15000  4      49.465     45.4959  2.042799e-10 3.217442e-02
15000  15000  4      49.541     45.4258  2.042799e-10 3.217442e-02
18000  18008  4      85.142     45.6727  2.894987e-10 3.170367e-02
18000  18008  4      85.978     45.2284  2.894987e-10 3.170367e-02
20000  20016  4      115.975    45.9938  4.097986e-10 3.627616e-02
20000  20016  4      116.138    45.9292  4.097986e-10 3.627616e-02
22000  22008  4      156.203    45.4512  4.548092e-10 3.331299e-02
22000  22008  4      156.091    45.4840  4.548092e-10 3.331299e-02
25000  25000  4      227.453    45.8025  6.089565e-10 3.462917e-02
25000  25000  4      227.769    45.7390  6.089565e-10 3.462917e-02
26000  26000  4      254.249    46.0913  6.669421e-10 3.506981e-02
26000  26000  4      254.840    45.9845  6.669421e-10 3.506981e-02
27000  27000  4      284.795    46.0804  6.672171e-10 3.253690e-02
30000  30000  1      414.983    43.3796  8.421348e-10 3.319704e-02
35000  35000  1      732.317    39.0347  1.085509e-09 3.151068e-02
40000  40000  1      1479.396   28.8428  1.466774e-09 3.262155e-02
45000  45000  1      2764.282   21.9782  1.711494e-09 3.011194e-02

Performance Summary (GFlops)

Size   LDA    Align.  Average  Maximal
1000   1000   4       36.0394  36.2075 
2000   2000   4       39.3669  39.4407 
5000   5008   4       43.3367  43.3503 
10000  10000  4       45.0504  45.1176 
15000  15000  4       45.4608  45.4959 
18000  18008  4       45.4506  45.6727 
20000  20016  4       45.9615  45.9938 
22000  22008  4       45.4676  45.4840 
25000  25000  4       45.7708  45.8025 
26000  26000  4       46.0379  46.0913 
27000  27000  4       46.0804  46.0804 
30000  30000  1       43.3796  43.3796 
35000  35000  1       39.0347  39.0347 
40000  40000  1       28.8428  28.8428 
45000  45000  1       21.9782  21.9782 

End of tests



Perl Floating Point-Multiplication Benchmark

I was worried whether I was making basic errors in testing the Perl version, so I decided to use the Benchmark module to get the numbers. I copied the matmult.pl file and added "use Benchmark ':all';" to the header of the file. The main() routine changed to:

    my $time = $ARGV[0] || 5;
    my %vars = (   2 => [],
                   5 => [],
                  10 => [],
                  32 => [],
                 100 => [],
    );
    for my $size ( keys %vars ) {
       my $filestub = q{F_} . $size . q{x} . $size . q{.};
       $vars{$size}[0] = readMatrix( $filestub . '1' );
       $vars{$size}[1] = readMatrix( $filestub . '2' );
    }
    
    say "Processing for $time seconds each size ",
        "will take @{[5 * $time]} seconds."; 
    say scalar localtime;
    cmpthese( -$time, {
       'F_2x2'     => sub { matmult( $vars{2}[0],
                                     $vars{2}[1] ); },
       'F_5x5'     => sub { matmult( $vars{5}[0],
                                     $vars{5}[1] ); },
       'F_10x10'   => sub { matmult( $vars{10}[0],
                                     $vars{10}[1] ); },
       'F_32x32'   => sub { matmult( $vars{32}[0],
                                     $vars{32}[1] ); },
       'F_100x100' => sub { matmult( $vars{100}[0],
                                     $vars{100}[1] ); },
       });
    say scalar localtime;


The user can specify how long each variant should be run, with a default of 5 seconds for each size if no arg is provided.

-➤   perl ./benchmark.pl 2
Processing for 2 seconds each size will take 10 seconds.
Mon Jun 15 17:11:08 2015
              Rate F_100x100   F_32x32   F_10x10     F_5x5     F_2x2
F_100x100   5.39/s        --      -97%     -100%     -100%     -100%
F_32x32      163/s     2931%        --      -97%     -100%     -100%
F_10x10     4909/s    90934%     2904%        --      -85%      -98%
F_5x5      32731/s   606904%    19929%      567%        --      -88%
F_2x2     264660/s  4908131%   161856%     5292%      709%        --
Mon Jun 15 17:11:22 2015

Adjusting for the number of multiplications involved in each size of matrix --- 8 for the 2x2 up to 1,000,000 for the 100x100 --- produces results very similar to what we've already seen (millions of multiplications per second):

perl -E'say  "100 => ", 100**3 * 5.39   / 10**6;
        say  " 32 => ",  32**3 * 163    / 10**6;
        say  " 10 => ",  10**3 * 4909   / 10**6;
        say  "  5 => ",   5**3 * 32731  / 10**6;
        say  "  2 => ",   2**3 * 264660 / 10**6;
       '

100 => 5.39
 32 => 5.341184
 10 => 4.909
  5 => 4.091375
  2 => 2.11728

Rerunning the benchmarks with integer matrices and use integer enabled generated only fractionally better numbers, and I wonder whether the pragma affected Benchmark's own calculations.

Friday, June 12, 2015

Perl Float Matrix Multiplication

Floating point performance compared to integer multiplication on Perl


This is much the same as Perl integer performance, and disappointingly inferior to Javascript. Not a whole lot worse, comparable really, but MY language should be better than the OTHER language! :-)


Monday, June 8, 2015

Perl Matrix Multiplication

Here's the Perl script for matrix multiplication. It's derived from the RosettaCode site with slight modifications ... I didn't like some of their choices, but it's much the same in the end.

#! /usr/bin/perl5.20.2

use 5.020;
use warnings; 
use strict;
use English;
use Time::HiRes qw(gettimeofday tv_interval);
use autodie;
use Data::Dumper;

# -------------------------------------------------------
# SUBROUTINES
#
# ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯
# matmult - multiply matrices. Copied from
#    RosettaCode.org/wiki/Matrix_multiplication#Perl.
#
sub matmult {
    my ( $M, $N ) = @_;

    my ( $rows, $cols ) = (scalar @$M, scalar @{$N->[0]} );
    my $prod = [];
    my $k = @$N - 1;
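    # $k is the last index into the rows of $N: the shared inner dimension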

    for my $i ( 0..$rows - 1 ) {
        for my $j ( 0 .. $cols - 1 ) {
            $prod->[$i][$j] += $M->[$i][$_]
                                * $N->[$_][$j]
                for 0..$k;
        }
    }
    return $prod;
}
# ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯
# readMatrix - read file data -> array of array of int.
#
sub readMatrix {
    my ( $filename ) = @_;

    open my $infile, '<', $filename;
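    # slurp mode: undef $/ so <$infile> reads the whole file at once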
    local $/ = undef;
    my ($matrix) = [map { [split / +/, $_] }
                      split / *\n/, <$infile>
                   ];

    return $matrix;
}
# ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯
# main line of processing
#
sub main {
   die( "USAGE: $PROGRAM_NAME reps file1 file1" )
      unless 3 == @ARGV 
          && $ARGV[0] > 0
          && -e $ARGV[1] && -r _
          && -e $ARGV[2] && -r _;

   my ($M, $N) = (readMatrix($ARGV[1]),
                  readMatrix($ARGV[2]));

#    say Data::Dumper::Dumper [$M, $N];

    my $t0 = [ gettimeofday ];
    my $prod;
    for my $reps ( 1 .. $ARGV[0] ) {
        $prod = matmult( $M, $N );
    }
    my $delta = tv_interval($t0);
    say "Multiplying matrices $ARGV[0] times takes ",
        "$delta seconds.";
#    say Data::Dumper::Dumper [$prod];
}
# -----------------------------------------------------

main();



Inverting the numbers to look at millions of multiplications per second, we see that overhead on small matrices degrades performance, but larger matrices enable 5.5 million multiplications per second.


Sunday, June 7, 2015

Node JS Multiplying Float Matrices

I don't know why I wrote such stupid code for readMatrix.js. I guess when someone uses a new language, their brain cells seize up and effective function becomes impossible.

After the tests I ran on Friday, I wanted to see how Javascript does with floating-point numbers. JS only supports 64-bit floating-point numbers, but there must be special code in there to make integers look like integers ... unless there isn't.

When I went to update my code to support numbers with fractions, my first thought was to flag whether a decimal point had been seen and then keep track of the value to divide by, multiplying by ten for each fractional digit. I quickly realized I was sinking into the mire of coding stupidity; JS must have functions to convert strings into integers or floats, and indeed it has parseInt() and parseFloat(). But in my research I discovered (or re-discovered) that there's no need for the conversion at all. Like Perl, JS allows arithmetic on the string representation of numbers, at least for operators like * and -; the + operator concatenates strings instead:

    "1" * "2" ⇛ 2
    "1" - "2" ⇛ -1
    "1" + "2" ⇛ "12"   (string concatenation, not 3)

This made me think about how I would parse the file in Perl, using split(). I knew JS had map(), so quite likely it would have split() as well. And it does! So all that manual processing went in the trash, and I simplified readMatrix.js to:


   exports.readMatrix = function (err, data) {
      if (err) {
         throw err;
      }
      var rows = data.split(/ *\n/)
                     .filter(function(val) {
                         return val.length > 0;
                      });
      var matrix = [], i, len;
   for ( i = 0, len = rows.length; i < len; i++ ) {
         matrix[i] = rows[i].split(/ +/);
      }
      return matrix;                  
   }; 

I think I had the two chunks merged in an online TryIt editor I was experimenting with, but this is compact enough for the moment.


{Correction - Although the original JS used only floats, including for integer values, Google's V8 JS engine, the one used in Node.js, has separate 32-bit integers.}

Although integers are represented by 64-bit IEEE floating-point numbers, integer multiplication is still 3 times as fast as floating-point multiplication. There are clearly optimizations being made under the hood to improve performance.

Calculating average operations per second shows that small matrices have their performance limited by other overhead, but larger matrices are pumping along at 7.8 million multiplications per second.



Friday, June 5, 2015

Node JS As The Universal Shell Scripting System

For the past 20 years, I've focused on programming in real languages for real applications, as opposed to web stuff :-). When I have dealt with the web, it's been using server-side scripts. Definitely not that Javascript crud. Mind you, back when I worked for a porn website, they also came up with a JS-based website, Simple.com, long before other people were doing all-JS interfaces.

But the time has come to re-evaluate ... well, everything. Javascript can be checked for validity and style using JSHint & JSLint; it can be debugged. There are frameworks and layers that make it possible to do useful things easily, treating old JS as a sort of widely-supported low-level language on which to implement more elegant systems.

One of these new paradigms is node.js, which runs JS programs on the server rather than in a web page. Node uses Google's V8 Javascript engine from the Chrome browser, which means a large corporation ensures high performance on Windows, Mac and Linux platforms, on Intel 32- and 64-bit processors as well as ARM and MIPS. Node only has to handle the differences between a process and a web component: no window or document objects to access, but new process and global objects instead.

And with the coming release of EcmaScript 6, the language itself will become far more modern and sensible.

So if the language is tolerable to use, and available everywhere, perhaps it's time to start using it. BASH is great, but you need Cygwin to use it on Windows. How does Node compare to Bash, or to Perl?

I went to RosettaCode to look for implementations of problems in the required languages, and selected matrix multiplication as the first challenge. I expect this to perform badly in a Unix shell script, but frankly, I don't generally use arrays in shell scripts, so we'll see how their solution works out.

I used the matrix transpose and matrix multiplication implementations to create a Matrix.js file. The one change is that I added:

    exports.Matrix = Matrix;

to make the constructor available to the importing site; the other functions become available as methods on the object.

I added a readMatrix() routine to import data from a file. My code there may be inefficient, but it isn't being measured; I only time the core multiplication. Here's my routine:


// Run the program as 
//     "DEBUGGING=1 node program args"
// to get debugging output

exports.readMatrix = function(err, data) {
   if (err) {
      throw err;
   }
   var row = 0,
       col = 0,
       i, len, num = 0,
       matrix = [[]];
   for ( i = 0, len = data.length; i < len; i++) {
      switch ( data[i] ) {
         case ' ' :
            if ( num > 0 ) {
               matrix[row][col] = num;
               num = 0;
            }
            col++;
            break;

         case "\n" :
            if ( num > 0 ) {
               matrix[row][col] = num;
               num = 0;
            }
            row++;
            matrix[row] = [];
            break;

         case '1': case '2': case '3': case '4': case '5':
         case '6': case '7': case '8': case '9': case '0':
            num = num * 10 + (data[i] - '0');
            break;
      }
   }
   if ( num > 0 ) { // trailing value due to no final \n
      matrix[row][col] = num;
   }
   else {           // extra empty array due to final \n
      if ( matrix[row].length === 0 ) {
         matrix.splice(row, 1);
      }
   }
   return matrix;
};

This function will be used as the callback to a FileSystem readFile() call. It receives either an error or the contents of the file. Errors are dealt with at the top; otherwise the data is processed character by character. Numeric values are separated by spaces, so when a space is seen the accumulated value, num, is stored in the current matrix cell. At the end of each line a new empty row is begun, after storing the accumulator if necessary. For digits, the current contents of the accumulator are shifted over one decimal place and the new digit is added. Clearly this only works with integers, and no error-proofing is provided other than handling a file that lacks a final newline. Superfluous blank lines would result in empty row arrays being added to the matrix, but we'll simply make sure that circumstance doesn't arise.

The top-level code is simple: read in the above modules and the FileSystem module, read the command-line arguments, read in the actual matrices, and determine the time taken to perform the matrix multiplication reps times:

var rm  = require('./readMatrix.js');
var fs  = require('fs');
var mat = require('./Matrix.js');

var reps  = process.argv[2],
    file1 = process.argv[3],
    file2 = process.argv[4];

fs.readFile(file1, "utf-8", function readM(err, data) {
    var M = new mat.Matrix(rm.readMatrix(err, data));
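    // the second read is nested so both matrices exist before timing begins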
    fs.readFile(file2, "utf-8", function readN(err, data) {
        var N = new mat.Matrix(rm.readMatrix(err, data));

        var t1 = process.hrtime(); 
        var prod, i;
        for ( i = 0; i < reps; i++ ) {
            prod = M.mult(N);
        }
        var delta = process.hrtime(t1);
        console.log( reps + " repetitions took " + 
                     delta[0] +"."+delta[1]+" seconds");
    });
});

To begin, I multiplied two simple 2x2 matrices:

    [ 1 2 ] [ 5 6 ] __\ [ 19 22 ]
    [ 3 4 ] [ 7 8 ]   / [ 43 50 ]

The process took 0.11 seconds, but most of that is overhead. Doing it 10 times, or 100, or 1,000, or even 1,000,000 times took 0.11 ~ 0.47 seconds. Ten million iterations took 4.7 seconds, 100 million took 49 seconds, and one billion iterations took 522 seconds. So that's pretty linear, and pretty fast. Of course the multiplication itself is insignificant; it's mostly overhead: function calls, creating and discarding variables, garbage collection.

Since matrix multiplication is an N^3 operation, 2x2 matrices require 8 multiplications, while 5x5 matrices require 125. Increasing to a 5x5 matrix results in 1/8 the performance of a 2x2 matrix. 10x10 produces 1,000 multiplications in 30 times as much time as 2x2. 32x32 matrices require 32,768 multiplications to process, while 100x100 involves a million scalar multiplications.





My conclusion, even prior to having other language versions to compare with, is that Node JS provides excellent performance. You may want another language to implement MatLab or a nuclear power plant controller, but it's clear why Node is being used for real applications, among them several IDE editors.