Sunday, July 25, 2010

Task Parallel Library - Refresher - Stop an Iteration

I talked about the Task Parallel Library (TPL) two years ago, when it was still in CTP. I was taking a class on High Performance Computing yesterday, and I realized that I have forgotten all about this library :).

This library has now been officially released with .NET 4.0, and Microsoft recommends using it wherever possible when writing concurrent programs, so that your program can take maximum advantage of the number of processors that you have.

I thought of posting some sample code as a refresher.
Here is the example: I have a list of Customer objects and I need to get the object that matches a specific criterion, let's say the Name property should be "F".

Here is what my Customer object looks like.

public class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}

Let's assume that the Name property is unique.

If I were to write the algorithm for this in .NET 1.1 or .NET 2.0, my logic would look like this.

foreach (Customer c in dataSource)
{
    if (c.Name == "F")
    {
        result = c;
        break;
    }
}

This code will run on a single thread, unless you write a partitioning algorithm yourself and hand chunks of the data source to different threads.
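
To give a feel for what that hand-written partitioning involves, here is a minimal sketch. The ManualPartitionDemo class, the FindByName and BuildSample helpers, and the thread count of 4 are all my own assumptions for illustration. To keep it short, a thread only stops searching its own chunk once it finds a match; the other threads run their chunks to completion.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class ManualPartitionDemo
{
    // Hypothetical helper: one chunk per thread, each thread scans only its own chunk.
    public static Customer FindByName(IList<Customer> dataSource, string name, int threadCount)
    {
        Customer result = null;
        int chunkSize = (dataSource.Count + threadCount - 1) / threadCount; // ceiling division
        var threads = new List<Thread>();

        for (int t = 0; t < threadCount; t++)
        {
            // Locals declared inside the loop, so each closure gets its own copy.
            int start = t * chunkSize;
            int end = Math.Min(start + chunkSize, dataSource.Count);

            var worker = new Thread(() =>
            {
                for (int i = start; i < end; i++)
                {
                    if (dataSource[i].Name == name)
                    {
                        result = dataSource[i]; // Name is unique, so only one thread writes
                        break;                  // stops this chunk only
                    }
                }
            });
            threads.Add(worker);
            worker.Start();
        }

        // Join waits for every chunk and makes the write to result visible here.
        foreach (Thread worker in threads)
        {
            worker.Join();
        }
        return result;
    }

    // Hypothetical sample data: ten customers named "A" to "J".
    public static IList<Customer> BuildSample()
    {
        var list = new List<Customer>();
        for (int i = 0; i < 10; i++)
        {
            list.Add(new Customer { ID = i + 1, Name = ((char)('A' + i)).ToString(), Age = 20 + i });
        }
        return list;
    }

    public static void Main()
    {
        Customer found = FindByName(BuildSample(), "F", 4);
        Console.WriteLine(found.Name); // prints F
    }
}
```

All of the chunk arithmetic, thread bookkeeping and joining here is what you would have to maintain yourself.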

This is where the TPL comes into play. If I were using the TPL, I would write the code like this.

Customer result = null;
IList<Customer> dataSource = GetMockDataSource(); // Get the data

Parallel.For(0, dataSource.Count, (i, state) =>
{
    // Skip the work if another iteration has already called Stop()
    if (!state.IsStopped)
    {
        Customer c = dataSource[i];
        if (c.Name == "F")
        {
            result = c;
            state.Stop(); // Ask the runtime not to start any further iterations
        }
        Console.WriteLine(c.Name);
    }
});

Here is what happens under the covers: the TPL runtime partitions the index range (in our case we loop over plain numbers and access each Customer object through its index) and hands chunks of the indexes to worker threads that it manages. The runtime can schedule those threads on different cores according to resource availability. Compared with the single-threaded loop, this can improve performance on a multi-core machine, because the Customer list is divided into chunks and each chunk is processed by a different thread managed by the runtime.
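
If you want to see that distribution for yourself, a small sketch like the one below records which managed thread handled each index. The PartitionPeek class and its Run method are names I made up for this illustration, and the grouping you see will differ from run to run and from machine to machine.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class PartitionPeek
{
    // Hypothetical helper: maps each loop index to the ID of the thread that ran it.
    public static ConcurrentDictionary<int, int> Run(int count)
    {
        var threadByIndex = new ConcurrentDictionary<int, int>();
        Parallel.For(0, count, i =>
        {
            threadByIndex[i] = Thread.CurrentThread.ManagedThreadId;
        });
        return threadByIndex;
    }

    public static void Main()
    {
        ConcurrentDictionary<int, int> threadByIndex = Run(20);

        // Print the indexes grouped by the thread that processed them.
        foreach (var group in threadByIndex.GroupBy(pair => pair.Value))
        {
            var indexes = group.Select(pair => pair.Key).OrderBy(index => index);
            Console.WriteLine("Thread {0}: {1}", group.Key, string.Join(", ", indexes));
        }
    }
}
```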

Let's examine the code.
The first two lines speak for themselves: they declare the result variable and fetch the data. The Parallel.For call that follows is where we use the TPL.
The Parallel class is in the System.Threading.Tasks namespace, and its static For method has many overloads. In the one that we used, the first parameter specifies the index the loop should start from (inclusive) and the second parameter specifies where the loop should end (exclusive).
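
As a quick check of those bounds, here is a tiny sketch that counts how many times the loop body runs. The RangeDemo class and CountIterations method are hypothetical names of mine; the count is incremented with Interlocked because the iterations may run on several threads at once.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RangeDemo
{
    // Hypothetical helper: returns how many iterations Parallel.For executes for a range.
    public static int CountIterations(int fromInclusive, int toExclusive)
    {
        int count = 0;
        Parallel.For(fromInclusive, toExclusive, i =>
        {
            Interlocked.Increment(ref count); // thread-safe increment
        });
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountIterations(0, 10)); // prints 10: indexes 0 to 9
        Console.WriteLine(CountIterations(5, 5));  // prints 0: an empty range runs nothing
    }
}
```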

The third parameter takes an Action delegate of type Action<int, ParallelLoopState>; for simplicity I have implemented it as a lambda function.
Within the lambda function, I check whether the current Customer object satisfies our criterion. If it does, I call ParallelLoopState.Stop() to signal to the runtime that it should now stop all iterations, as we have found what we were looking for.

When you call the Stop method on the ParallelLoopState object, the runtime will not start any more iterations. However, it cannot stop iterations that have already started, so we explicitly check whether some other thread has signaled a stop by checking the IsStopped property of the ParallelLoopState object.
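
Parallel.For also returns a ParallelLoopResult that lets you tell afterwards whether the loop was cut short. The sketch below is a minimal illustration under my own assumptions: the StopDemo class, the RunAndStopAt method and the index 500 are made up for the example.

```csharp
using System;
using System.Threading.Tasks;

public static class StopDemo
{
    // Hypothetical helper: runs a loop over [0, count) and calls Stop() at one index.
    public static ParallelLoopResult RunAndStopAt(int count, int stopIndex)
    {
        return Parallel.For(0, count, (i, state) =>
        {
            if (i == stopIndex)
            {
                state.Stop(); // no new iterations will be started after this
            }
        });
    }

    public static void Main()
    {
        ParallelLoopResult stopped = RunAndStopAt(1000, 500);
        Console.WriteLine(stopped.IsCompleted);                   // False: Stop() was called
        Console.WriteLine(stopped.LowestBreakIteration.HasValue); // False: it is only set by Break()

        ParallelLoopResult finished = RunAndStopAt(1000, -1);     // -1 never matches an index
        Console.WriteLine(finished.IsCompleted);                  // True: every iteration ran
    }
}
```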

Although this example could have been done more concisely using PLINQ, I chose the task library to show the underlying basics.
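
For comparison, a PLINQ version might look roughly like the sketch below. The PlinqDemo class and its FindByName and BuildSample helpers are hypothetical names of mine; AsParallel and FirstOrDefault are the standard PLINQ operators from System.Linq. Because Name is unique, there is at most one match for the query to return.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class PlinqDemo
{
    // Hypothetical helper: lets PLINQ partition the list and search it in parallel.
    public static Customer FindByName(IList<Customer> dataSource, string name)
    {
        return dataSource.AsParallel().FirstOrDefault(c => c.Name == name);
    }

    // Hypothetical sample data: ten customers named "A" to "J".
    public static IList<Customer> BuildSample()
    {
        var list = new List<Customer>();
        for (int i = 0; i < 10; i++)
        {
            list.Add(new Customer { ID = i + 1, Name = ((char)('A' + i)).ToString(), Age = 20 + i });
        }
        return list;
    }

    public static void Main()
    {
        Customer found = FindByName(BuildSample(), "F");
        Console.WriteLine(found.Name); // prints F
    }
}
```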
