You are here

Choosing the Best C# Array Type for Matrix Multiplication

Implementing 2D arrays in C# involves some decisions as to how to represent the array. C# has two different types of arrays. This is somewhat different than the Java programming language, which supports only a single type of array. In C# you must choose between rectangular and jagged arrays. There are very important considerations for each of these array types. There are also many articles about the differences between array handling in Java and C#. This article focuses on one thing-- performance. Particularly how to implement a matrix with the best performance. In a nutshell, I found that I got 50% worse performance from the rectangular array than I did the ragged one.

As the creator of the Encog Artificial Intelligence Framework for Java and C#, I do a fair amount of matrix programming in both Java and C#. Neural networks are mathematically intensive and it is important to get the maximum performance from the arrays. I found that Encog for C# was operating at about one third of Encog for Java. I decided the performance of the two languages can't be that far off. I had to be doing something wrong. This turned out to be the case. I was using the slower of C#'s two array types to implement Encog.

First, lets look at the two array types supported by C#. Consider the following Java code.

double[][] matrix = new double[n][m];

This code won't compile in C#. To translate this code to C# you need to think about which of C#'s two array types you want to use. C# supports a rectangular array, which uses the following syntax.

double[,] rectangular = new double[n,m];

These arrays must be perfectly rectangular, each row must have exactly the same number of columns. The second type of array supported by C# is the jagged array. A jagged array is seen here.

double[][]jagged = new double[10][];
Random random = new Random();
for (int i = 0; i < jagged.Length; i++)
  jagged[i] = new double[random.Next(20)];

Here an array with 10 rows and a random number columns is created.

You would think that the rectangular array would perform faster than the jagged. This is what I thought as well, and it led to terrible performance for Encog. I ended up using jagged arrays, that just so happened to have the same number of columns for each row. This implementing my matrix.

I will now show how I came to this conclusion. To get the best matrix performance it is important if you access the matrix in a column or row-centric way. There is a very good article that explores this in the following article.

http://www.cs.princeton.edu/introcs/95linear/MatrixMultiplication.java.html

The source code I used to experiment with C# arrays is a Java translation of the above source code. The following code shows a basic implementation of matrix multiplication, in Java.

for (int i = 0; i < N; i++)
  for (int j = 0; j < N; j++)
    for (int k = 0; k < N; k++)
      C[i][j] += A[i][k] * B[k][j];

Here the matrix named A is being multiplied against the matrix B, resulting in the Matrix C. The order of the loops can be changed and the program still produces the same results. However, the order of the i, j and k loop has a great deal of impact on the performance of the program. The following Java program tests this and times this. This will give us some base numbers to compare the two different C# array types to.

Listing 1: Comparing Java Matrix Multiplication Techniques

public class MatrixMultiplication {
    public static void show(double[][] a) {
        int N = a.length;
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                System.out.printf("%6.4f ", a[i][j]);
            }
            System.out.println();
        }
        System.out.println();
    }


    public static void main(String[] args) {
        int N = 500;
        long start, stop;
        double elapsed;


        // generate input
        start = System.currentTimeMillis(); 

        double[][] A = new double[N][N];
        double[][] B = new double[N][N];
        double[][] C;

        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                A[i][j] = Math.random();

        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                B[i][j] = Math.random();

        stop = System.currentTimeMillis();
        elapsed = (stop - start) / 1000.0;
        System.out.println("Generating input:  " + elapsed + " seconds");

        // order 1: ijk = dot product version
        C = new double[N][N];
        start = System.currentTimeMillis(); 
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    C[i][j] += A[i][k] * B[k][j];
        stop = System.currentTimeMillis();
        elapsed = (stop - start) / 1000.0;
        System.out.println("Order ijk:  " + elapsed + " seconds");
        if (N < 10) show(C);

        // order 2: ikj
        C = new double[N][N];
        start = System.currentTimeMillis(); 
        for (int i = 0; i < N; i++)
            for (int k = 0; k < N; k++)
                for (int j = 0; j < N; j++)
                    C[i][j] += A[i][k] * B[k][j];
        stop = System.currentTimeMillis();
        elapsed = (stop - start) / 1000.0;
        System.out.println("Order ikj:  " + elapsed + " seconds");
        if (N < 10) show(C);

        // order 3: jik
        C = new double[N][N];
        start = System.currentTimeMillis(); 
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                for (int k = 0; k < N; k++)
                    C[i][j] += A[i][k] * B[k][j];
        stop = System.currentTimeMillis();
        elapsed = (stop - start) / 1000.0;
        System.out.println("Order jik:  " + elapsed + " seconds");
        if (N < 10) show(C);

        // order 4: jki = GAXPY version
        C = new double[N][N];
        start = System.currentTimeMillis(); 
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                for (int i = 0; i < N; i++)
                    C[i][j] += A[i][k] * B[k][j];
        stop = System.currentTimeMillis();
        elapsed = (stop - start) / 1000.0;
        System.out.println("Order jki:  " + elapsed + " seconds");
        if (N < 10) show(C);

        // order 5: kij
        C = new double[N][N];
        start = System.currentTimeMillis(); 
        for (int k = 0; k < N; k++)
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    C[i][j] += A[i][k] * B[k][j];
        stop = System.currentTimeMillis();
        elapsed = (stop - start) / 1000.0;
        System.out.println("Order kij:  " + elapsed + " seconds");
        if (N < 10) show(C);

        // order 6: kji = outer product version
        C = new double[N][N];
        start = System.currentTimeMillis(); 
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                for (int i = 0; i < N; i++)
                    C[i][j] += A[i][k] * B[k][j];
        stop = System.currentTimeMillis();
        elapsed = (stop - start) / 1000.0;
        System.out.println("Order kji:  " + elapsed + " seconds");
        if (N < 10) show(C);


        // order 7: jik optimized ala JAMA 
        C = new double[N][N];
        start = System.currentTimeMillis(); 
        double[] bcolj = new double[N];
        for (int j = 0; j < N; j++) {
            for (int k = 0; k < N; k++) bcolj[k] = B[k][j];
            for (int i = 0; i < N; i++) {
                double[] arowi = A[i];
                double sum = 0.0;
                for (int k = 0; k < N; k++) {
                    sum += arowi[k] * bcolj[k];
                }
                C[i][j] = sum;
            }
        }
        stop = System.currentTimeMillis();
        elapsed = (stop - start) / 1000.0;
        System.out.println("Order jik JAMA optimized:  " + elapsed + " seconds");
        if (N < 10) show(C);

        // order 8: ikj pure row
        C = new double[N][N];
        start = System.currentTimeMillis(); 
        for (int i = 0; i < N; i++) {
            double[] arowi = A[i];
            double[] crowi = C[i];
            for (int k = 0; k < N; k++) {
                double[] browk = B[k];
                double aik = arowi[k];
                for (int j = 0; j < N; j++) {
                    crowi[j] += aik * browk[j];
                }
            }
        }
        stop = System.currentTimeMillis();
        elapsed = (stop - start) / 1000.0;
        System.out.println("Order ikj pure row:  " + elapsed + " seconds");
        if (N < 10) show(C);

    }
}

Running this program will give the following output on one of my computers. I used an Intel Core 2 Duo, running Windows Vista.

Java Array Times

Generating input:  0.059 seconds
Order ijk:  1.755 seconds
Order ikj:  1.082 seconds
Order jik:  1.711 seconds
Order jki:  2.229 seconds
Order kij:  1.08 seconds
Order kji:  2.206 seconds
Order jik JAMA optimized:  0.377 seconds
Order ikj pure row:  0.374 seconds

As you can see, from the above times, the order of the loops matters greatly. The order of ikj or kij seem to perform the best. There are also two special optimizations provided by the last two that provide even better results. I was seeking to make Encog C# perform as well as Encog Java. I now had my baseline Java numbers. I wanted to see which of C#'s array types would be the closest to Java's performance.

Using a C# Rectangular Array

I began by using C#'s rectangular array. I translated the above application to a C# application that uses rectangular arrays. I came up with the following code.

Listing 2: C# Matrix Multiplication with Rectangular Arrays ([,])

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Matrix1
{
    class Program
    {
        static void Main(string[] args)
        {
            int N = 500;
            long start, stop;
            double elapsed;


            // generate input
            start = DateTime.Now.Ticks;

            double[,] A = new double[N, N];
            double[,] B = new double[N, N];
            double[,] C;

            Random rand = new Random();

            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    A[i, j] = rand.NextDouble();

            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    B[i, j] = rand.NextDouble();

            stop = DateTime.Now.Ticks;
            elapsed = (stop - start) / 1000.0 / 10000;
            Console.WriteLine("Generating input:  " + elapsed + " seconds");

            // order 1: ijk = dot product version
            C = new double[N, N];
            start = DateTime.Now.Ticks;
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    for (int k = 0; k < N; k++)
                        C[i, j] += A[i, k] * B[k, j];
            stop = DateTime.Now.Ticks;
            elapsed = (stop - start) / 1000.0 / 10000;
            Console.WriteLine("Order ijk:  " + elapsed + " seconds");

            // order 2: ikj
            C = new double[N, N];
            start = DateTime.Now.Ticks;
            for (int i = 0; i < N; i++)
                for (int k = 0; k < N; k++)
                    for (int j = 0; j < N; j++)
                        C[i, j] += A[i, k] * B[k, j];
            stop = DateTime.Now.Ticks;
            elapsed = (stop - start) / 1000.0 / 10000;
            Console.WriteLine("Order ikj:  " + elapsed + " seconds");

            // order 3: jik
            C = new double[N, N];
            start = DateTime.Now.Ticks;
            for (int j = 0; j < N; j++)
                for (int i = 0; i < N; i++)
                    for (int k = 0; k < N; k++)
                        C[i, j] += A[i, k] * B[k, j];
            stop = DateTime.Now.Ticks;
            elapsed = (stop - start) / 1000.0 / 10000;
            Console.WriteLine("Order jik:  " + elapsed + " seconds");

            // order 4: jki = GAXPY version
            C = new double[N, N];
            start = DateTime.Now.Ticks;
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    for (int i = 0; i < N; i++)
                        C[i, j] += A[i, k] * B[k, j];
            stop = DateTime.Now.Ticks;
            elapsed = (stop - start) / 1000.0 / 10000;
            Console.WriteLine("Order jki:  " + elapsed + " seconds");

            // order 5: kij
            C = new double[N, N];
            start = DateTime.Now.Ticks;
            for (int k = 0; k < N; k++)
                for (int i = 0; i < N; i++)
                    for (int j = 0; j < N; j++)
                        C[i, j] += A[i, k] * B[k, j];
            stop = DateTime.Now.Ticks;
            elapsed = (stop - start) / 1000.0 / 10000;
            Console.WriteLine("Order kij:  " + elapsed + " seconds");

            // order 6: kji = outer product version
            C = new double[N, N];
            start = DateTime.Now.Ticks;
            for (int k = 0; k < N; k++)
                for (int j = 0; j < N; j++)
                    for (int i = 0; i < N; i++)
                        C[i, j] += A[i, k] * B[k, j];
            stop = DateTime.Now.Ticks;
            elapsed = (stop - start) / 1000.0 / 10000;
            Console.WriteLine("Order kji:  " + elapsed + " seconds");
        }
    }
}

This produced the following result.

C# Rectangular Array Times

Generating input:  0.034 seconds
Order ijk:  2.085 seconds
Order ikj:  1.424 seconds
Order jik:  2.067 seconds
Order jki:  2.361 seconds
Order kij:  1.439 seconds
Order kji:  2.36 seconds

As you can see, the performance is significantly worse than Java. The ikj result in Java was 1.082 seconds. In C#, using a rectangular array the result is 1.424 seconds. Almost 50% higher. Many online articles about C# arrays indicate that the rectangular array has the highest performance of the two. And this is what you might initially think. But using DotNet 3.5, I did not find this to be the case. It just did not make sense that DotNet would be 50% slower than Java for basic matrix operations.

Also, you will notice that the above application dropped the two special algorithms used in the Java example. Specifically the JAMA and pure row techniques. These could not be translated to C# rectangular arrays because the rectangular arrays do NOT allow you to only specify the first array element and obtain an entire row. For example, in Java, the following gives you the first row of a 2D array.

double[] x = matrix[0] ;

This would give you the first row of an array in Java. In C#, using a rectangular array, this would give you an error. C# ragged arrays allow the above access. This provides another performance boost to ragged arrays, as you will see in the next section.

Using a C# Ragged Array

I then translated the above program to make use of ragged arrays. This can be seen here.

Listing 3: C# Matrix Multiplication with Rectangular Arrays ([,])

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        public static double[][] allocate(int n)
        {
            double[][] result = new double[n][];
            for (int i = 0; i < n; i++)
            {
                result[i] = new double[n];
            }
            return result;
        }

        static void Main(string[] args)
        {
            int N = 500;
            long start, stop;
            double elapsed;


            // generate input
            start = DateTime.Now.Ticks;

            double[][] A = allocate(N);
            double[][] B = allocate(N);
            double[][] C;

            Random rand = new Random();

            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    A[i][j] = rand.NextDouble();

            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    B[i][j] = rand.NextDouble();

            stop = DateTime.Now.Ticks;
            elapsed = (stop - start) / 1000.0 / 10000;
            Console.WriteLine("Generating input:  " + elapsed + " seconds");

            // order 1: ijk = dot product version
            C = allocate(N);
            start = DateTime.Now.Ticks;
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    for (int k = 0; k < N; k++)
                        C[i][j] += A[i][k] * B[k][j];
            stop = DateTime.Now.Ticks;
            elapsed = (stop - start) / 1000.0 / 10000;
            Console.WriteLine("Order ijk:  " + elapsed + " seconds");

            // order 2: ikj
            C = allocate(N);
            start = DateTime.Now.Ticks;
            for (int i = 0; i < N; i++)
                for (int k = 0; k < N; k++)
                    for (int j = 0; j < N; j++)
                        C[i][j] += A[i][k] * B[k][j];
            stop = DateTime.Now.Ticks;
            elapsed = (stop - start) / 1000.0 / 10000;
            Console.WriteLine("Order ikj:  " + elapsed + " seconds");

            // order 3: jik
            C = allocate(N);
            start = DateTime.Now.Ticks;
            for (int j = 0; j < N; j++)
                for (int i = 0; i < N; i++)
                    for (int k = 0; k < N; k++)
                        C[i][j] += A[i][k] * B[k][j];
            stop = DateTime.Now.Ticks;
            elapsed = (stop - start) / 1000.0 / 10000;
            Console.WriteLine("Order jik:  " + elapsed + " seconds");

            // order 4: jki = GAXPY version
            C = allocate(N);
            start = DateTime.Now.Ticks;
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    for (int i = 0; i < N; i++)
                        C[i][j] += A[i][k] * B[k][j];
            stop = DateTime.Now.Ticks;
            elapsed = (stop - start) / 1000.0 / 10000;
            Console.WriteLine("Order jki:  " + elapsed + " seconds");

            // order 5: kij
            C = allocate(N);
            start = DateTime.Now.Ticks;
            for (int k = 0; k < N; k++)
                for (int i = 0; i < N; i++)
                    for (int j = 0; j < N; j++)
                        C[i][j] += A[i][k] * B[k][j];
            stop = DateTime.Now.Ticks;
            elapsed = (stop - start) / 1000.0 / 10000;
            Console.WriteLine("Order kij:  " + elapsed + " seconds");

            // order 6: kji = outer product version
            C = allocate(N);
            start = DateTime.Now.Ticks;
            for (int k = 0; k < N; k++)
                for (int j = 0; j < N; j++)
                    for (int i = 0; i < N; i++)
                        C[i][j] += A[i][k] * B[k][j];
            stop = DateTime.Now.Ticks;
            elapsed = (stop - start) / 1000.0 / 10000;
            Console.WriteLine("Order kji:  " + elapsed + " seconds");

            // order 7: jik optimized ala JAMA 
            C = allocate(N);
            start = DateTime.Now.Ticks;  
            double[] bcolj = new double[N];
            for (int j = 0; j < N; j++) {
                for (int k = 0; k < N; k++) bcolj[k] = B[k][j];
                for (int i = 0; i < N; i++) {
                    double[] arowi = A[i];
                    double sum = 0.0;
                    for (int k = 0; k < N; k++) {
                        sum += arowi[k] * bcolj[k];
                    }
                    C[i][j] = sum;
                }
            }
            stop = DateTime.Now.Ticks; 
            elapsed = (stop - start) / 1000.0 / 10000.0;
            Console.WriteLine("Order jik JAMA optimized:  " + elapsed + " seconds");
                
            // order 8: ikj pure row
            C = allocate(N);
            start = DateTime.Now.Ticks;  
            for (int i = 0; i < N; i++) {
                double[] arowi = A[i];
                double[] crowi = C[i];
                for (int k = 0; k < N; k++) {
                    double[] browk = B[k];
                    double aik = arowi[k];
                    for (int j = 0; j < N; j++) {
                        crowi[j] += aik * browk[j];
                    }
                }
            }
            stop = DateTime.Now.Ticks; 
            elapsed = (stop - start) / 1000.0 / 10000.0;
            Console.WriteLine("Order ikj pure row:  " + elapsed + " seconds");
        }
    }
}

This produced the following output.

C# Ragged Array Times

Generating input:  0.033 seconds
Order ijk:  1.821 seconds
Order ikj:  0.964 seconds
Order jik:  1.579 seconds
Order jki:  2.268 seconds
Order kij:  1.043 seconds
Order kji:  2.149 seconds
Order jik JAMA optimized:  0.601 seconds
Order ikj pure row:  0.407 seconds

As you can see, the performance, even using the simple non-optimized techniques is considerably better. The ikj ordering is now performing at just under a second (0.964). This is much better than the rectangular application's time of 1.424. Even slightly faster than the Java application's time of 1.082.

My conclusion is that using a ragged array gives the best performance in C# for matrix operations. The resulting performance causes the C# and Java matrix operations to have nearly the same performance.

Calais Document Category: 
Programming Language: 
Technology: 
Events Facts: 

Comments

Paul Russell's picture

Similarly, I found that, in C++ programming, use of a 2D rectangular array gave poorer performance than use of a rectangular dynamically allocated 1D array of dynamically allocated 1D arrays. The dynamically allocated 1D array of dynamically allocated 1D arrays is equivalent to the 2D ragged array in C#.

dsblank's picture

I tried the same tests under Linux with Mono, and got the opposite results.

Rectangular matrix:

Generating input:  0.030143 seconds
Order ijk:  4.267592 seconds
Order ikj:  2.567762 seconds
Order jik:  3.973856 seconds
Order jki:  4.492291 seconds
Order kij:  2.60914 seconds
Order kji:  4.382744 seconds

And the ragged matrix:

Generating input:  0.036513 seconds
Order ijk:  8.68827 seconds
Order ikj:  1.33076 seconds
Order jik:  8.839093 seconds
Order jki:  15.293305 seconds
Order kij:  1.349857 seconds
Order kji:  15.180922 seconds
Order jik JAMA optimized:  1.063982 seconds
Order ikj pure row:  0.812425 seconds
rishka's picture

Hi,
ragged Arrays? In my poor knowledge, these arrays a[][] are called jagged.

Theme by Danetsoft and Danang Probo Sayekti inspired by Maksimer