I’ve been tasked with presenting a training session at work on the new concurrent programming features in the .NET Framework 4.0, so I’ve been playing around with a Mandelbrot set visualiser for which I have written three routines: a sequential routine, a second where (potentially) each line can be calculated in parallel, and a third where (potentially) each pixel can be calculated in parallel. I say “potentially” because the resource manager underneath the Task Parallel Library (TPL) actually manages the threading for you. In essence, when using a technique like Parallel.For you write the loop body as a lambda function, any single iteration of which could be executed in parallel with any other iteration, and the TPL takes care of scheduling these iterations across some magically optimized number of threads.
My sequential code is a nested for loop where the outer loop iterates y and the inner loop iterates x. In the Parallel.For version of the visualiser I replace the outer loop with a Parallel.For like so:
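To make the idea concrete before getting to the Mandelbrot code, here is a minimal sketch of the same loop written sequentially and with Parallel.For. The class and method names (ParallelForSketch, SquaresSequential, SquaresParallel) are made up for illustration and are not part of the visualiser.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelForSketch
{
    // Hypothetical helper: plain for loop, one iteration after another.
    public static long[] SquaresSequential(int count)
    {
        var result = new long[count];
        for (int i = 0; i < count; ++i)
        {
            result[i] = (long)i * i;
        }
        return result;
    }

    // Hypothetical helper: same loop body, handed to the TPL as a lambda.
    public static long[] SquaresParallel(int count)
    {
        var result = new long[count];
        // Each iteration writes only to its own slot, so no locking is needed.
        Parallel.For(0, count, i =>
        {
            result[i] = (long)i * i;
        });
        return result;
    }

    public static void Main()
    {
        long[] a = SquaresSequential(1000);
        long[] b = SquaresParallel(1000);
        bool same = true;
        for (int i = 0; i < a.Length; ++i)
        {
            if (a[i] != b[i]) same = false;
        }
        Console.WriteLine(same ? "results match" : "results differ");
    }
}
```

The important property is that no iteration depends on another. If the loop body read or wrote shared state, the parallel version would need synchronisation or a different design.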
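Parallel.For(0, parameters.imageHeight, y =>
{
    // Each row (y) may run on a different thread; the pixels within a row are still computed in order.
    double c_im = parameters.maxIm - y * parameters.imFactor;
    for (uint x = 0; x < parameters.imageWidth; ++x)
        mandelbrotPixel(bmd, (uint)y, c_im, x);
});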
What this means is that the TPL will try to run as many iterations of the body of the lambda function in parallel as is sensible.
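The third routine, where each pixel is a candidate for parallel execution, can be sketched by turning the inner loop into a Parallel.For as well. This is only a rough, self-contained approximation and not the code from the visualiser: EscapeTime is a hypothetical stand-in for mandelbrotPixel, the view window constants are assumed, and the result goes into a plain int array rather than bitmap data.

```csharp
using System;
using System.Threading.Tasks;

public static class NestedParallelForSketch
{
    // Hypothetical stand-in for mandelbrotPixel: returns the number of iterations
    // before the point c = (cRe, cIm) escapes, capped at maxIterations.
    public static int EscapeTime(double cRe, double cIm, int maxIterations)
    {
        double zRe = 0.0, zIm = 0.0;
        int n = 0;
        while (n < maxIterations && zRe * zRe + zIm * zIm <= 4.0)
        {
            double nextRe = zRe * zRe - zIm * zIm + cRe;
            zIm = 2.0 * zRe * zIm + cIm;
            zRe = nextRe;
            ++n;
        }
        return n;
    }

    public static int[] Render(int width, int height, int maxIterations)
    {
        // Assumed view window: real axis [-2, 1], imaginary axis [-1.5, 1.5].
        const double minRe = -2.0, maxRe = 1.0, minIm = -1.5, maxIm = 1.5;
        double reFactor = (maxRe - minRe) / (width - 1);
        double imFactor = (maxIm - minIm) / (height - 1);
        var pixels = new int[width * height];

        Parallel.For(0, height, y =>
        {
            double cIm = maxIm - y * imFactor;
            // The inner loop is a Parallel.For too, so every pixel may run in parallel.
            Parallel.For(0, width, x =>
            {
                double cRe = minRe + x * reFactor;
                // Each (x, y) pair writes to its own array slot, so no locking is needed.
                pixels[y * width + x] = EscapeTime(cRe, cIm, maxIterations);
            });
        });
        return pixels;
    }

    public static void Main()
    {
        int[] pixels = Render(64, 64, 100);
        Console.WriteLine("Rendered " + pixels.Length + " pixels");
    }
}
```

Nesting the loops this way hands the scheduler many more, much smaller work items than the row-per-iteration version does.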
So, onto the results. I ran the same program on my work machine (Core2 Duo, 4GB RAM, Vista 32) and my home machine (Core2 Quad, 2GB RAM, Windows 7 64), and got the following results:
Test                | Time (work) | Time (home)
Sequential          | 9.480s      | 9.373s
Parallel.For        | 4.540s      | 2.451s
Nested Parallel.For | 4.300s      | 2.403s
As you can see, simply running the code on a machine with two more cores roughly halved the parallel run time (4.540s down to 2.451s for Parallel.For), while the sequential time barely changed.
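For anyone who wants the numbers as speedups rather than raw times, the arithmetic is just the sequential time divided by the parallel time. A small sketch, using the timings from the table above (the SpeedupSketch class and Speedup method are names I made up for this example):

```csharp
using System;

public static class SpeedupSketch
{
    // Speedup = sequential run time / parallel run time.
    public static double Speedup(double sequentialSeconds, double parallelSeconds)
    {
        return sequentialSeconds / parallelSeconds;
    }

    public static void Main()
    {
        // Timings copied from the results table.
        Console.WriteLine("Work, Parallel.For:        {0:F2}x", Speedup(9.480, 4.540));
        Console.WriteLine("Work, nested Parallel.For: {0:F2}x", Speedup(9.480, 4.300));
        Console.WriteLine("Home, Parallel.For:        {0:F2}x", Speedup(9.373, 2.451));
        Console.WriteLine("Home, nested Parallel.For: {0:F2}x", Speedup(9.373, 2.403));
    }
}
```

That works out to roughly 2.1x on the dual-core machine and roughly 3.8x on the quad-core for the Parallel.For version, which is close to the number of cores in each case.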
To get the run times large enough for changes in performance to be meaningful, I made the application generate a 4096 by 4096 image.
Take a look at the source code here: ParallelProgramming.zip. It’s a Visual Studio 2010 solution, but the code is fairly straightforward.