Monday, July 26, 2010

Parallel LINQ (PLINQ) - Intro

.NET 4.0 supports Parallel LINQ, or PLINQ, which is a parallel implementation of LINQ.
PLINQ has the same characteristics as LINQ, in that it executes queries in a deferred manner.
The main difference is that with PLINQ your data source gets partitioned and each chunk is processed by a different worker thread (taking into account the number of processor cores you have), which can make your query execute much faster in some cases.

Running a query in parallel is just a matter of calling the AsParallel() method on the data source. This returns a ParallelQuery<T>, and the rest of your query will execute in parallel.

Let's look at a code sample...

    var query = from num in source.AsParallel()
                where num % 3 == 0
                select ProcessNumber(num);

Now, when this query is iterated over in a foreach loop, or when you call ToList() etc., the query will be run on different worker threads.
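To make the deferred part concrete, here is a minimal runnable sketch of the query above. The ProcessNumber body (it just doubles the number) and the PlinqIntro class name are my own placeholders, not anything PLINQ provides.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlinqIntro
{
    // Hypothetical stand-in for ProcessNumber: simply doubles the input.
    public static int ProcessNumber(int num)
    {
        return num * 2;
    }

    public static List<int> RunQuery(IEnumerable<int> source)
    {
        // Building the query does no work yet (deferred execution).
        var query = from num in source.AsParallel()
                    where num % 3 == 0
                    select ProcessNumber(num);

        // ToList() is what actually triggers execution on the worker threads.
        return query.ToList();
    }

    public static void Main()
    {
        List<int> results = RunQuery(Enumerable.Range(1, 20));

        // Result order is not guaranteed without AsOrdered(), so sort before printing.
        results.Sort();
        Console.WriteLine(string.Join(", ", results)); // prints "6, 12, 18, 24, 30, 36"
    }
}
```

If you need the results in source order, adding AsOrdered() after AsParallel() should preserve it, at some cost in speed.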

Although you have parallelized the query execution, if you then do something with the results inside a loop, that processing will happen serially even though the query itself executed in parallel.

You can process the results in parallel too, either by running the loop with Parallel.ForEach() or by using the ForAll() method like this...

    var query = from num in source.AsParallel()
                where num % 3 == 0
                select ProcessNumber(num);

    query.ForAll(x => { /*Do Something*/ });

In the code above, the query runs in parallel and the results are processed in parallel as well.
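Here is a small sketch of both approaches side by side. Since the delegate runs on several threads at once, anything it writes to has to be thread-safe, so I am collecting into a ConcurrentBag<int>. The class name, the method names, and the doubling ProcessNumber are assumptions made up for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class ForAllSketch
{
    // Hypothetical stand-in for ProcessNumber.
    public static int ProcessNumber(int num)
    {
        return num * 2;
    }

    private static ParallelQuery<int> BuildQuery(int count)
    {
        return from num in Enumerable.Range(1, count).AsParallel()
               where num % 3 == 0
               select ProcessNumber(num);
    }

    public static ConcurrentBag<int> CollectWithForAll(int count)
    {
        var bag = new ConcurrentBag<int>();
        // The delegate runs on the PLINQ worker threads, so the target must be thread-safe.
        BuildQuery(count).ForAll(x => bag.Add(x));
        return bag;
    }

    public static ConcurrentBag<int> CollectWithParallelForEach(int count)
    {
        var bag = new ConcurrentBag<int>();
        // Parallel.ForEach treats the query as a plain IEnumerable<int>.
        Parallel.ForEach(BuildQuery(count), x => bag.Add(x));
        return bag;
    }

    public static void Main()
    {
        Console.WriteLine(CollectWithForAll(30).Count);          // prints "10"
        Console.WriteLine(CollectWithParallelForEach(30).Count); // prints "10"
    }
}
```

As far as I understand it, Parallel.ForEach() has to pull the merged results out of the query and partition them again, so ForAll() is usually the more direct choice when the source is already a ParallelQuery<T>.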

Running LINQ queries in parallel does not always give you the best performance; this is basically because the cost of initialization and partitioning can outweigh the benefit of actually running the query in parallel.
Hence, you need to measure which option is better for your query: LINQ or PLINQ.
MSDN documents that PLINQ first checks whether the query can be run in parallel and then weighs the cost of running it in parallel against running it sequentially; if the parallel cost is higher, the runtime runs the query sequentially.
I tried it out, but could not actually see the difference :)
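One rough way to compare the two is to time the same query both ways with a Stopwatch. This is only a sketch: the CPU-bound ProcessNumber body, the input size, and the helper names are all my own assumptions, and the numbers will vary from machine to machine. WithExecutionMode(ParallelExecutionMode.ForceParallelism) asks PLINQ not to fall back to sequential execution, which helps when you want to measure the parallel path specifically.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class LinqVsPlinq
{
    // Hypothetical CPU-bound stand-in for ProcessNumber.
    public static int ProcessNumber(int num)
    {
        int acc = num;
        for (int i = 0; i < 1000; i++)
        {
            acc = (acc * 31 + i) % 1000003;
        }
        return acc;
    }

    public static long SumSequential(int[] source)
    {
        return source.Where(n => n % 3 == 0)
                     .Select(n => ProcessNumber(n))
                     .Sum(x => (long)x);
    }

    public static long SumParallel(int[] source)
    {
        return source.AsParallel()
                     // Ask PLINQ not to fall back to sequential execution.
                     .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
                     .Where(n => n % 3 == 0)
                     .Select(n => ProcessNumber(n))
                     .Sum(x => (long)x);
    }

    // Runs the delegate once and returns the elapsed wall-clock milliseconds.
    public static long TimeMs(Func<long> run)
    {
        Stopwatch sw = Stopwatch.StartNew();
        run();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        int[] source = Enumerable.Range(1, 200000).ToArray();

        Console.WriteLine("LINQ:  {0} ms", TimeMs(() => SumSequential(source)));
        Console.WriteLine("PLINQ: {0} ms", TimeMs(() => SumParallel(source)));
    }
}
```

A single run like this includes JIT and thread pool warm-up, so it is worth running each version a few times before trusting the numbers.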

Another option worth knowing about, particularly if you are weighing a foreach loop against ForAll(), is running the parallel query with a ParallelMergeOptions value.
By default, although the query executes in parallel, the runtime has to merge the results from the different worker threads into a single result when you iterate the query with a foreach loop or call ToList(); this sometimes causes partial buffering.

However, if you consume the query with ForAll(), you can skip that buffering and process each result as soon as its thread returns it. Here is a code sample showing how to do this...

    var query = from num in source.AsParallel().WithMergeOptions(ParallelMergeOptions.NotBuffered)
                where num % 3 == 0
                select ProcessNumber(num);

    query.ForAll(x => { /*Do Something*/ });

Although ForAll() already consumes each item as soon as it returns from its thread, I saw a noticeable difference when running the query with a ParallelMergeOptions value.
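Merge options come into play most visibly when a single thread consumes the results, as in a foreach loop. My reading of the MSDN merge options page is that ForAll() is already unbuffered, so treat the sketch below as a way to experiment rather than a benchmark. The Consume helper, the class name, and the doubling ProcessNumber are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeOptionsSketch
{
    // Hypothetical stand-in for ProcessNumber.
    public static int ProcessNumber(int num)
    {
        return num * 2;
    }

    public static List<int> Consume(ParallelMergeOptions options)
    {
        var query = from num in Enumerable.Range(1, 30).AsParallel().WithMergeOptions(options)
                    where num % 3 == 0
                    select ProcessNumber(num);

        var seen = new List<int>();
        // foreach consumes on the calling thread, so PLINQ must merge the workers' output here.
        // NotBuffered hands over each item as soon as it is ready;
        // FullyBuffered waits until the whole result set has been produced.
        foreach (int x in query)
        {
            seen.Add(x);
        }
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(Consume(ParallelMergeOptions.NotBuffered).Count);   // prints "10"
        Console.WriteLine(Consume(ParallelMergeOptions.FullyBuffered).Count); // prints "10"
    }
}
```

Both calls return the same items; what changes is how soon the first item reaches the loop, which matters when each item is slow to produce.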
