Choosing the Best C# Array Type for Matrix Multiplication
Implementing 2D arrays in C# involves some decisions as to how to represent the array. C# has two different types of arrays. This is somewhat different than the Java programming language, which supports only a single type of array. In C# you must choose between rectangular and jagged arrays. There are very important considerations for each of these array types. There are also many articles about the differences between array handling in Java and C#. This article focuses on one thing-- performance. Particularly how to implement a matrix with the best performance. In a nutshell, I found that I got 50% worse performance from the rectangular array than I did the ragged one.
As the creator of the Encog Artificial Intelligence Framework for Java and C#, I do a fair amount of matrix programming in both Java and C#. Neural networks are mathematically intensive and it is important to get the maximum performance from the arrays. I found that Encog for C# was operating at about one third of Encog for Java. I decided the performance of the two languages can't be that far off. I had to be doing something wrong. This turned out to be the case. I was using the slower of C#'s two array types to implement Encog.
First, lets look at the two array types supported by C#. Consider the following Java code.
double[][] matrix = new double[n][m];
This code won't compile in C#. To translate this code to C# you need to think about which of C#'s two array types you want to use. C# supports a rectangular array, which uses the following syntax.
double[,] rectangular = new double[n,m];
These arrays must be perfectly rectangular, each row must have exactly the same number of columns. The second type of array supported by C# is the jagged array. A jagged array is seen here.
double[][]jagged = new double[10][]; Random random = new Random(); for (int i = 0; i < jagged.Length; i++) jagged[i] = new double[random.Next(20)];
Here an array with 10 rows and a random number columns is created.
You would think that the rectangular array would perform faster than the jagged. This is what I thought as well, and it led to terrible performance for Encog. I ended up using jagged arrays, that just so happened to have the same number of columns for each row. This implementing my matrix.
I will now show how I came to this conclusion. To get the best matrix performance it is important if you access the matrix in a column or row-centric way. There is a very good article that explores this in the following article.
http://www.cs.princeton.edu/introcs/95linear/MatrixMultiplication.java.html
The source code I used to experiment with C# arrays is a Java translation of the above source code. The following code shows a basic implementation of matrix multiplication, in Java.
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
for (int k = 0; k < N; k++)
C[i][j] += A[i][k] * B[k][j];
Here the matrix named A is being multiplied against the matrix B, resulting in the Matrix C. The order of the loops can be changed and the program still produces the same results. However, the order of the i, j and k loop has a great deal of impact on the performance of the program. The following Java program tests this and times this. This will give us some base numbers to compare the two different C# array types to.
Listing 1: Comparing Java Matrix Multiplication Techniques
public class MatrixMultiplication {
public static void show(double[][] a) {
int N = a.length;
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
System.out.printf("%6.4f ", a[i][j]);
}
System.out.println();
}
System.out.println();
}
public static void main(String[] args) {
int N = 500;
long start, stop;
double elapsed;
// generate input
start = System.currentTimeMillis();
double[][] A = new double[N][N];
double[][] B = new double[N][N];
double[][] C;
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
A[i][j] = Math.random();
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
B[i][j] = Math.random();
stop = System.currentTimeMillis();
elapsed = (stop - start) / 1000.0;
System.out.println("Generating input: " + elapsed + " seconds");
// order 1: ijk = dot product version
C = new double[N][N];
start = System.currentTimeMillis();
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
for (int k = 0; k < N; k++)
C[i][j] += A[i][k] * B[k][j];
stop = System.currentTimeMillis();
elapsed = (stop - start) / 1000.0;
System.out.println("Order ijk: " + elapsed + " seconds");
if (N < 10) show(C);
// order 2: ikj
C = new double[N][N];
start = System.currentTimeMillis();
for (int i = 0; i < N; i++)
for (int k = 0; k < N; k++)
for (int j = 0; j < N; j++)
C[i][j] += A[i][k] * B[k][j];
stop = System.currentTimeMillis();
elapsed = (stop - start) / 1000.0;
System.out.println("Order ikj: " + elapsed + " seconds");
if (N < 10) show(C);
// order 3: jik
C = new double[N][N];
start = System.currentTimeMillis();
for (int j = 0; j < N; j++)
for (int i = 0; i < N; i++)
for (int k = 0; k < N; k++)
C[i][j] += A[i][k] * B[k][j];
stop = System.currentTimeMillis();
elapsed = (stop - start) / 1000.0;
System.out.println("Order jik: " + elapsed + " seconds");
if (N < 10) show(C);
// order 4: jki = GAXPY version
C = new double[N][N];
start = System.currentTimeMillis();
for (int j = 0; j < N; j++)
for (int k = 0; k < N; k++)
for (int i = 0; i < N; i++)
C[i][j] += A[i][k] * B[k][j];
stop = System.currentTimeMillis();
elapsed = (stop - start) / 1000.0;
System.out.println("Order jki: " + elapsed + " seconds");
if (N < 10) show(C);
// order 5: kij
C = new double[N][N];
start = System.currentTimeMillis();
for (int k = 0; k < N; k++)
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
C[i][j] += A[i][k] * B[k][j];
stop = System.currentTimeMillis();
elapsed = (stop - start) / 1000.0;
System.out.println("Order kij: " + elapsed + " seconds");
if (N < 10) show(C);
// order 6: kji = outer product version
C = new double[N][N];
start = System.currentTimeMillis();
for (int k = 0; k < N; k++)
for (int j = 0; j < N; j++)
for (int i = 0; i < N; i++)
C[i][j] += A[i][k] * B[k][j];
stop = System.currentTimeMillis();
elapsed = (stop - start) / 1000.0;
System.out.println("Order kji: " + elapsed + " seconds");
if (N < 10) show(C);
// order 7: jik optimized ala JAMA
C = new double[N][N];
start = System.currentTimeMillis();
double[] bcolj = new double[N];
for (int j = 0; j < N; j++) {
for (int k = 0; k < N; k++) bcolj[k] = B[k][j];
for (int i = 0; i < N; i++) {
double[] arowi = A[i];
double sum = 0.0;
for (int k = 0; k < N; k++) {
sum += arowi[k] * bcolj[k];
}
C[i][j] = sum;
}
}
stop = System.currentTimeMillis();
elapsed = (stop - start) / 1000.0;
System.out.println("Order jik JAMA optimized: " + elapsed + " seconds");
if (N < 10) show(C);
// order 8: ikj pure row
C = new double[N][N];
start = System.currentTimeMillis();
for (int i = 0; i < N; i++) {
double[] arowi = A[i];
double[] crowi = C[i];
for (int k = 0; k < N; k++) {
double[] browk = B[k];
double aik = arowi[k];
for (int j = 0; j < N; j++) {
crowi[j] += aik * browk[j];
}
}
}
stop = System.currentTimeMillis();
elapsed = (stop - start) / 1000.0;
System.out.println("Order ikj pure row: " + elapsed + " seconds");
if (N < 10) show(C);
}
}
Running this program will give the following output on one of my computers. I used an Intel Core 2 Duo, running Windows Vista.
Java Array Times
Generating input: 0.059 seconds Order ijk: 1.755 seconds Order ikj: 1.082 seconds Order jik: 1.711 seconds Order jki: 2.229 seconds Order kij: 1.08 seconds Order kji: 2.206 seconds Order jik JAMA optimized: 0.377 seconds Order ikj pure row: 0.374 seconds
As you can see, from the above times, the order of the loops matters greatly. The order of ikj or kij seem to perform the best. There are also two special optimizations provided by the last two that provide even better results. I was seeking to make Encog C# perform as well as Encog Java. I now had my baseline Java numbers. I wanted to see which of C#'s array types would be the closest to Java's performance.
Using a C# Rectangular Array
I began by using C#'s rectangular array. I translated the above application to a C# application that uses rectangular arrays. I came up with the following code.
Listing 2: C# Matrix Multiplication with Rectangular Arrays ([,])
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Matrix1
{
class Program
{
static void Main(string[] args)
{
int N = 500;
long start, stop;
double elapsed;
// generate input
start = DateTime.Now.Ticks;
double[,] A = new double[N, N];
double[,] B = new double[N, N];
double[,] C;
Random rand = new Random();
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
A[i, j] = rand.NextDouble();
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
B[i, j] = rand.NextDouble();
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000;
Console.WriteLine("Generating input: " + elapsed + " seconds");
// order 1: ijk = dot product version
C = new double[N, N];
start = DateTime.Now.Ticks;
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
for (int k = 0; k < N; k++)
C[i, j] += A[i, k] * B[k, j];
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000;
Console.WriteLine("Order ijk: " + elapsed + " seconds");
// order 2: ikj
C = new double[N, N];
start = DateTime.Now.Ticks;
for (int i = 0; i < N; i++)
for (int k = 0; k < N; k++)
for (int j = 0; j < N; j++)
C[i, j] += A[i, k] * B[k, j];
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000;
Console.WriteLine("Order ikj: " + elapsed + " seconds");
// order 3: jik
C = new double[N, N];
start = DateTime.Now.Ticks;
for (int j = 0; j < N; j++)
for (int i = 0; i < N; i++)
for (int k = 0; k < N; k++)
C[i, j] += A[i, k] * B[k, j];
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000;
Console.WriteLine("Order jik: " + elapsed + " seconds");
// order 4: jki = GAXPY version
C = new double[N, N];
start = DateTime.Now.Ticks;
for (int j = 0; j < N; j++)
for (int k = 0; k < N; k++)
for (int i = 0; i < N; i++)
C[i, j] += A[i, k] * B[k, j];
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000;
Console.WriteLine("Order jki: " + elapsed + " seconds");
// order 5: kij
C = new double[N, N];
start = DateTime.Now.Ticks;
for (int k = 0; k < N; k++)
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
C[i, j] += A[i, k] * B[k, j];
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000;
Console.WriteLine("Order kij: " + elapsed + " seconds");
// order 6: kji = outer product version
C = new double[N, N];
start = DateTime.Now.Ticks;
for (int k = 0; k < N; k++)
for (int j = 0; j < N; j++)
for (int i = 0; i < N; i++)
C[i, j] += A[i, k] * B[k, j];
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000;
Console.WriteLine("Order kji: " + elapsed + " seconds");
}
}
}
This produced the following result.
C# Rectangular Array Times
Generating input: 0.034 seconds Order ijk: 2.085 seconds Order ikj: 1.424 seconds Order jik: 2.067 seconds Order jki: 2.361 seconds Order kij: 1.439 seconds Order kji: 2.36 seconds
As you can see, the performance is significantly worse than Java. The ikj result in Java was 1.082 seconds. In C#, using a rectangular array the result is 1.424 seconds. Almost 50% higher. Many online articles about C# arrays indicate that the rectangular array has the highest performance of the two. And this is what you might initially think. But using DotNet 3.5, I did not find this to be the case. It just did not make sense that DotNet would be 50% slower than Java for basic matrix operations.
Also, you will notice that the above application dropped the two special algorithms used in the Java example. Specifically the JAMA and pure row techniques. These could not be translated to C# rectangular arrays because the rectangular arrays do NOT allow you to only specify the first array element and obtain an entire row. For example, in Java, the following gives you the first row of a 2D array.
double[] x = matrix[0] ;
This would give you the first row of an array in Java. In C#, using a rectangular array, this would give you an error. C# ragged arrays allow the above access. This provides another performance boost to ragged arrays, as you will see in the next section.
Using a C# Ragged Array
I then translated the above program to make use of ragged arrays. This can be seen here.
Listing 3: C# Matrix Multiplication with Rectangular Arrays ([,])
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
public static double[][] allocate(int n)
{
double[][] result = new double[n][];
for (int i = 0; i < n; i++)
{
result[i] = new double[n];
}
return result;
}
static void Main(string[] args)
{
int N = 500;
long start, stop;
double elapsed;
// generate input
start = DateTime.Now.Ticks;
double[][] A = allocate(N);
double[][] B = allocate(N);
double[][] C;
Random rand = new Random();
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
A[i][j] = rand.NextDouble();
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
B[i][j] = rand.NextDouble();
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000;
Console.WriteLine("Generating input: " + elapsed + " seconds");
// order 1: ijk = dot product version
C = allocate(N);
start = DateTime.Now.Ticks;
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
for (int k = 0; k < N; k++)
C[i][j] += A[i][k] * B[k][j];
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000;
Console.WriteLine("Order ijk: " + elapsed + " seconds");
// order 2: ikj
C = allocate(N);
start = DateTime.Now.Ticks;
for (int i = 0; i < N; i++)
for (int k = 0; k < N; k++)
for (int j = 0; j < N; j++)
C[i][j] += A[i][k] * B[k][j];
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000;
Console.WriteLine("Order ikj: " + elapsed + " seconds");
// order 3: jik
C = allocate(N);
start = DateTime.Now.Ticks;
for (int j = 0; j < N; j++)
for (int i = 0; i < N; i++)
for (int k = 0; k < N; k++)
C[i][j] += A[i][k] * B[k][j];
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000;
Console.WriteLine("Order jik: " + elapsed + " seconds");
// order 4: jki = GAXPY version
C = allocate(N);
start = DateTime.Now.Ticks;
for (int j = 0; j < N; j++)
for (int k = 0; k < N; k++)
for (int i = 0; i < N; i++)
C[i][j] += A[i][k] * B[k][j];
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000;
Console.WriteLine("Order jki: " + elapsed + " seconds");
// order 5: kij
C = allocate(N);
start = DateTime.Now.Ticks;
for (int k = 0; k < N; k++)
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
C[i][j] += A[i][k] * B[k][j];
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000;
Console.WriteLine("Order kij: " + elapsed + " seconds");
// order 6: kji = outer product version
C = allocate(N);
start = DateTime.Now.Ticks;
for (int k = 0; k < N; k++)
for (int j = 0; j < N; j++)
for (int i = 0; i < N; i++)
C[i][j] += A[i][k] * B[k][j];
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000;
Console.WriteLine("Order kji: " + elapsed + " seconds");
// order 7: jik optimized ala JAMA
C = allocate(N);
start = DateTime.Now.Ticks;
double[] bcolj = new double[N];
for (int j = 0; j < N; j++) {
for (int k = 0; k < N; k++) bcolj[k] = B[k][j];
for (int i = 0; i < N; i++) {
double[] arowi = A[i];
double sum = 0.0;
for (int k = 0; k < N; k++) {
sum += arowi[k] * bcolj[k];
}
C[i][j] = sum;
}
}
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000.0;
Console.WriteLine("Order jik JAMA optimized: " + elapsed + " seconds");
// order 8: ikj pure row
C = allocate(N);
start = DateTime.Now.Ticks;
for (int i = 0; i < N; i++) {
double[] arowi = A[i];
double[] crowi = C[i];
for (int k = 0; k < N; k++) {
double[] browk = B[k];
double aik = arowi[k];
for (int j = 0; j < N; j++) {
crowi[j] += aik * browk[j];
}
}
}
stop = DateTime.Now.Ticks;
elapsed = (stop - start) / 1000.0 / 10000.0;
Console.WriteLine("Order ikj pure row: " + elapsed + " seconds");
}
}
}
This produced the following output.
C# Ragged Array Times
Generating input: 0.033 seconds Order ijk: 1.821 seconds Order ikj: 0.964 seconds Order jik: 1.579 seconds Order jki: 2.268 seconds Order kij: 1.043 seconds Order kji: 2.149 seconds Order jik JAMA optimized: 0.601 seconds Order ikj pure row: 0.407 seconds
As you can see, the performance, even using the simple non-optimized techniques is considerably better. The ikj ordering is now performing at just under a second (0.964). This is much better than the rectangular application's time of 1.424. Even slightly faster than the Java application's time of 1.082.
My conclusion is that using a ragged array gives the best performance in C# for matrix operations. The resulting performance causes the C# and Java matrix operations to have nearly the same performance.




Comments
Choosing the Best C# Array Type
Similarly, I found that, in C++ programming, use of a 2D rectangular array gave poorer performance than use of a rectangular dynamically allocated 1D array of dynamically allocated 1D arrays. The dynamically allocated 1D array of dynamically allocated 1D arrays is equivalent to the 2D ragged array in C#.
Interesting, and different results
I tried the same tests under Linux with Mono, and got the opposite results.
Rectangular matrix:
And the ragged matrix: