Monday, July 26, 2010

Parallel LINQ (PLINQ) - Intro

.Net 4.0 supports Parallel LINQ (PLINQ), a parallel implementation of LINQ.
PLINQ has the same characteristics as LINQ, in that it executes queries in a deferred manner.
However, the main difference is that with PLINQ your data source gets partitioned and each chunk is processed by a different worker thread (taking into account the number of processor cores you have), which can make your query execute much faster in certain cases.

Running a query in parallel is just a matter of calling the AsParallel() extension method on the data source; this returns a ParallelQuery<T>, and your query will execute in parallel.

Let's look at a code sample...

var query = from num in source.AsParallel()
where num % 3 == 0
select ProcessNumber(num);

Now, when this query is iterated over in a foreach loop, or when you call ToList() etc., the query will run on different worker threads.
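To see the deferred execution end to end, here is a minimal runnable sketch of the query above. ProcessNumber is not defined in the post, so the version here is a hypothetical stand-in that just squares its input, and the source is assumed to be the integers 1..n.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlinqIntro
{
    // Hypothetical stand-in for the post's ProcessNumber: squares the input.
    static int ProcessNumber(int num)
    {
        return num * num;
    }

    public static List<int> Run(int n)
    {
        IEnumerable<int> source = Enumerable.Range(1, n);

        // Building the query does not execute anything yet (deferred execution).
        var query = from num in source.AsParallel()
                    where num % 3 == 0
                    select ProcessNumber(num);

        // ToList() executes the query on the worker threads.
        List<int> results = query.ToList();

        // PLINQ does not preserve source order by default, so sort before returning.
        results.Sort();
        return results;
    }
}
```

For example, Run(10) returns 9, 36 and 81. Calling AsOrdered() after AsParallel() is the alternative to sorting when the original order matters.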

Although you have parallelized the query execution, if you then do something with each result inside a foreach loop, that processing happens serially on the consuming thread, even though the query itself executed in parallel.

To parallelize that processing as well, you can run the loop using Parallel.ForEach(), or you can use the ForAll method like this...

var query = from num in source.AsParallel()
            where num % 3 == 0
            select ProcessNumber(num);

query.ForAll(x => { /* Do something */ });

In the code above, the query runs in parallel and its results are also processed in parallel.
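A minimal sketch comparing the two options above, assuming a simple query over 1..n (the doubling projection is a hypothetical placeholder for real work). Because the delegate passed to ForAll or Parallel.ForEach runs on several threads at once, any shared state must be thread safe, so a ConcurrentBag<T> collects the results here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class ForAllDemo
{
    static ParallelQuery<int> BuildQuery(int n)
    {
        return from num in Enumerable.Range(1, n).AsParallel()
               where num % 3 == 0
               select num * 2;   // placeholder for real per-item work
    }

    // ForAll: the delegate runs directly on the PLINQ worker threads.
    public static int SumWithForAll(int n)
    {
        var bag = new ConcurrentBag<int>();   // thread-safe, shared by all workers
        BuildQuery(n).ForAll(x => bag.Add(x));
        return bag.Sum();
    }

    // Parallel.ForEach: consumes the query's output and processes it in parallel.
    public static int SumWithParallelForEach(int n)
    {
        var bag = new ConcurrentBag<int>();
        Parallel.ForEach(BuildQuery(n), x => bag.Add(x));
        return bag.Sum();
    }
}
```

ForAll runs the delegate on the worker threads that produced each element, while Parallel.ForEach first consumes the query's merged output and then re-partitions it, so ForAll is usually the lighter choice for PLINQ results.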

Running LINQ queries in parallel does not always give you the best performance, mainly because the overhead of partitioning the data and coordinating the worker threads can outweigh the gain from running the query in parallel.
Hence, you need to measure which option is best for your query: LINQ or PLINQ.
According to MSDN, PLINQ first checks whether the query can be run in parallel and then weighs the cost of running it in parallel against running it sequentially; if running it in parallel is likely to cost more, the runtime runs the query sequentially.
I tried it out, but could not actually see the difference :)
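Since the only reliable way to choose is to measure, here is a rough sketch of such a comparison using Stopwatch; the query and the input size are arbitrary placeholders. It also shows WithExecutionMode(ParallelExecutionMode.ForceParallelism), which tells PLINQ to skip its conservative analysis and run the query in parallel anyway.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class LinqVsPlinq
{
    // Times the same query with LINQ and PLINQ and returns true if the results agree.
    public static bool CompareAndReport(int n)
    {
        int[] source = Enumerable.Range(0, n).ToArray();

        Stopwatch sw = Stopwatch.StartNew();
        long sequential = source.Where(x => x % 3 == 0).Select(x => (long)x * x).Sum();
        sw.Stop();
        Console.WriteLine("LINQ:  " + sw.ElapsedMilliseconds + " ms");

        sw = Stopwatch.StartNew();
        long parallel = source.AsParallel()
            // Opt out of PLINQ's analysis and always run in parallel.
            .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
            .Where(x => x % 3 == 0)
            .Select(x => (long)x * x)
            .Sum();
        sw.Stop();
        Console.WriteLine("PLINQ: " + sw.ElapsedMilliseconds + " ms");

        return sequential == parallel;
    }
}
```

The first run of each query also pays JIT and thread-pool warm-up costs, so a fair comparison should repeat each measurement several times.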

Another option you might want to try when running a parallel query is to specify a ParallelMergeOptions value.
By default, although the query executes in parallel, when you consume it with a foreach loop or call ToList(), the runtime has to merge the results from the different worker threads into one single output, and this merge sometimes buffers part of the results before handing them to the consumer.

With ParallelMergeOptions.NotBuffered, each result is handed to the consumer as soon as a worker thread produces it, without buffering. (ForAll() itself does not perform this final merge; it runs your delegate directly on the worker threads.) Here is a code sample showing how to do this...

var query = from num in source.AsParallel().WithMergeOptions(ParallelMergeOptions.NotBuffered)
            where num % 3 == 0
            select ProcessNumber(num);

query.ForAll(x => { /* Do something */ });

Although ForAll() already consumes each item as soon as a worker thread produces it, I saw a noticeable difference when running the query with a ParallelMergeOptions value.
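For the foreach case described above, here is a minimal sketch assuming the same kind of query over 1..n. The merge option is passed in, so the same query can be run with NotBuffered or FullyBuffered and compared. The loop body runs only on the consuming thread, so a plain List<T> is safe here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeOptionsDemo
{
    public static List<int> Consume(int n, ParallelMergeOptions options)
    {
        var query = from num in Enumerable.Range(1, n).AsParallel().WithMergeOptions(options)
                    where num % 3 == 0
                    select num;

        // With NotBuffered, each element reaches this loop as soon as a worker produces it.
        var received = new List<int>();   // touched only by this thread, so no lock needed
        foreach (int x in query)
        {
            received.Add(x);
        }

        received.Sort();   // arrival order is not guaranteed
        return received;
    }
}
```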

Implementing Asynchronous Callbacks with Task Parallel Library

Bored again...so I thought of posting how you can implement callbacks with the Task Parallel Library (TPL).

So what am I talking about here? Basically, I start a task on one thread, and I want it to call another method once it completes (an asynchronous callback).

Here is some sample code....

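Task<int> parent = new Task<int>(() =>
{
    Console.WriteLine("In parent");
    return 100;
});

// Register the callback; it receives the parent task as its argument.
Task<int> child = parent.ContinueWith(a =>
{
    Console.WriteLine("In Child");
    return 19 + a.Result;   // the parent's result (100) is available here
});

parent.Start();

Console.WriteLine(child.Result);   // prints 119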

The code explains it all: all I have to do is create the task and then call its ContinueWith method to register the callback. It's important to note that the parent task is passed as the input to the continuation callback, so the callback can access the parent's result.

The callback is again another Task, so it does not block the calling thread.
The callbacks in TPL give you more flexibility in how you want the callback to be invoked; for example, I can specify that I want the callback to be invoked only if the parent did not run successfully to the end.

I can rewrite the above code to do exactly that by passing a TaskContinuationOptions value as the 2nd parameter of the ContinueWith method.

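Task<int> parent = new Task<int>(() =>
{
    Console.WriteLine("In parent");
    return 100;
});

// Runs only if the parent did NOT run to completion (it faulted or was canceled),
// so a.Result is not available here; inspect a.Status or a.Exception instead.
Task<int> child = parent.ContinueWith(a =>
{
    Console.WriteLine("In Child, parent status: " + a.Status);
    return -1;
},
TaskContinuationOptions.NotOnRanToCompletion);

parent.Start();

try
{
    Console.WriteLine(child.Result);
}
catch (AggregateException)
{
    // Here the parent ran to completion, so the continuation was canceled
    // and reading child.Result throws.
    Console.WriteLine("Continuation was canceled because the parent succeeded.");
}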

TaskContinuationOptions is a flags enum, so you can combine several options with the bitwise OR operator (the pipe, |). A few important options are NotOnCanceled, OnlyOnRanToCompletion, OnlyOnFaulted, etc.
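As a sketch of combining options, assuming a parent task that deliberately throws (the exception message is made up for illustration), the following attaches one continuation that runs only on success and another that runs only on failure. The second continuation combines OnlyOnFaulted with ExecuteSynchronously using |.

```csharp
using System;
using System.Threading.Tasks;

public static class ContinuationOptionsDemo
{
    public static Tuple<string, bool> Run()
    {
        Task<int> parent = new Task<int>(() =>
        {
            throw new InvalidOperationException("parent failed");
        });

        // Runs only if the parent finishes successfully.
        Task<string> onSuccess = parent.ContinueWith(
            a => "result: " + a.Result,
            TaskContinuationOptions.OnlyOnRanToCompletion);

        // Runs only if the parent faults; reading a.Exception also observes the error.
        Task<string> onFailure = parent.ContinueWith(
            a => "error: " + a.Exception.InnerException.Message,
            TaskContinuationOptions.OnlyOnFaulted | TaskContinuationOptions.ExecuteSynchronously);

        parent.Start();

        string message = onFailure.Result;   // blocks until the failure handler has run

        try
        {
            onSuccess.Wait();                 // the success handler never runs...
        }
        catch (AggregateException)
        {
            // ...it is canceled instead, so Wait() throws.
        }

        return Tuple.Create(message, onSuccess.IsCanceled);
    }
}
```

Run() returns "error: parent failed" together with true, because the success-only continuation is canceled when the parent faults.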

Running Parallel Tasks with The Task Parallel Library

Down at home with conjunctivitis and bored, I was listening to some old classics and then thought of writing a post on how you can run tasks with the Task Parallel Library (TPL).

Going forward, Microsoft encourages developers to use TPL for concurrent programming. In my previous post I talked about data parallelism, where I showed how blocks of work running inside a loop can be scheduled to run on different threads.

In previous versions of .Net, if I wanted to execute a task on another thread, I had to do this.
Thread thread = new Thread(() =>
{
    // Do some work
    Console.WriteLine("Starting thread");
});
thread.Start();

With TPL, I only need to do this...

Parallel.Invoke(() =>
{
    // Do some work
    Console.WriteLine("Starting thread");
});
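The static Invoke method of the Parallel class has 2 overloads; the one we use takes a variable number (a params array) of parameterless delegates that return void (Action). The other overload also takes a ParallelOptions argument. Note that, unlike Thread.Start(), Parallel.Invoke blocks the calling thread until all of the delegates have completed.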
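A small sketch of Parallel.Invoke with several delegates, assuming a hypothetical SumTo helper as the unit of work. Each delegate writes to its own array slot, so no locking is required, and the results can be read as soon as Invoke returns.

```csharp
using System;
using System.Threading.Tasks;

public static class InvokeDemo
{
    public static int[] Run()
    {
        int[] results = new int[3];

        // Each delegate may run on a different thread; Invoke returns when all are done.
        Parallel.Invoke(
            () => results[0] = SumTo(100),
            () => results[1] = SumTo(1000),
            () => results[2] = SumTo(10000));

        return results;
    }

    // Hypothetical unit of work: 1 + 2 + ... + n.
    static int SumTo(int n)
    {
        int total = 0;
        for (int i = 1; i <= n; i++) total += i;
        return total;
    }
}
```

To limit concurrency, the other overload accepts a ParallelOptions instance, for example new ParallelOptions { MaxDegreeOfParallelism = 2 }, as its first argument.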

If you want more control over what you pass into the thread, and you also need a return value, you can use the Task class in the System.Threading.Tasks namespace.

.Net 2.0 added another Thread constructor that takes a ParameterizedThreadStart delegate; this delegate takes an object as its parameter but has no return type.
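For comparison with the Task version below, here is a minimal sketch of the ParameterizedThreadStart approach. Since the delegate cannot return a value, the result has to be written to a captured variable and read after Join(); AddTenOnThread is a hypothetical name chosen to mirror the +10 example in this post.

```csharp
using System;
using System.Threading;

public static class ParameterizedThreadDemo
{
    public static int AddTenOnThread(int input)
    {
        int result = 0;

        // A one-parameter lambda binds to ParameterizedThreadStart (object in, no return).
        Thread thread = new Thread(obj =>
        {
            result = (int)obj + 10;   // write the "return value" to a captured variable
        });

        thread.Start(input);   // the int is boxed and passed as the object parameter
        thread.Join();         // wait for the thread before reading the result
        return result;
    }
}
```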

With TPL this can be achieved much more easily with the Task class, which takes care of scheduling the work on another thread.
Let's take a look at some code...

Task<int> task = new Task<int>(obj =>
{
    return (int)obj + 10;
},
14);

task.Start();
Console.WriteLine(task.Result);

In the first statement, we create a task object; the generic argument int specifies that the value returned from the task is an integer.

The first parameter of the Task constructor is a delegate that takes an object and returns a value of the type specified as the generic argument, in our case an integer.

The second parameter is the state object: the value that is passed into the delegate as obj, i.e. the input to the task. In this case I am passing 14.

Inside the lambda, I just add 10 to the input value and return it; now I can access the Task.Result property and would see 24.

Accessing the Result property before the task has finished running blocks the calling (main) thread until the result is available.

Another efficient way of running this task in TPL is to create and start it in one step with Task.Factory.StartNew, like this....

Task<int> t = Task.Factory.StartNew<int>(obj =>
{
    return (int)obj + 10;
},
14);

Console.WriteLine(t.Result);

I like the above if I don't need the flexibility of creating the Task separately and then starting it separately.
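Here is a sketch of the same StartNew call applied to several inputs at once, assuming an int array as input (AddTenToAll is a hypothetical name). Each value is passed as the state object rather than captured in the lambda, and Task.WaitAll waits for all the tasks before the results are read.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class StartNewDemo
{
    public static int[] AddTenToAll(int[] inputs)
    {
        // One task per input; each input travels to its task as the state object.
        Task<int>[] tasks = inputs
            .Select(n => Task.Factory.StartNew<int>(obj => (int)obj + 10, n))
            .ToArray();

        Task.WaitAll(tasks);   // blocks until every task has finished
        return tasks.Select(t => t.Result).ToArray();
    }
}
```

For example, AddTenToAll(new[] { 1, 2, 14 }) returns 11, 12 and 24.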

Sunday, July 25, 2010

ConcurrentBag<T> - Thread Safe Collections

.Net 4.0 introduces a new namespace, System.Collections.Concurrent, which contains a set of collections that are very useful in multithreaded programming.
The Add and Remove methods of List<T> are not thread safe, meaning that if you add or remove items from a list that is accessed from multiple threads, you can end up overwriting some of the items (or corrupting the list).
So, in multithreaded programming, you need to lock the list before adding or removing items.
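To make the locking requirement concrete, here is a minimal sketch of the List<T> approach: many parallel iterations add to one list, and each Add is guarded by a lock on a private object. FillWithLock is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LockedListDemo
{
    public static int FillWithLock(int n)
    {
        var list = new List<int>();
        object sync = new object();   // private lock object guarding the list

        Parallel.For(0, n, i =>
        {
            // List<T> is not safe for concurrent writes, so serialize every Add.
            lock (sync)
            {
                list.Add(i);
            }
        });

        return list.Count;
    }
}
```

FillWithLock(100000) returns 100000; without the lock the count becomes unreliable and Add may even throw. ConcurrentBag<T> removes the need for the explicit lock.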

The ConcurrentBag<T> in System.Collections.Concurrent is a thread-safe collection, i.e. all you have to do is call Add (or TryTake to remove an item) in the usual way, and the collection takes care of adding the items without overwriting them.

Here is an example using ConcurrentBag<T> with the Task Parallel Library...

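using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

ConcurrentBag<int> bag = new ConcurrentBag<int>();
try
{
    Parallel.For(0, 100000, i =>
    {
        // Do some work here
        bag.Add(i);   // safe to call from many threads at once
    });

    Console.WriteLine(bag.Count);   // 100000: no items are lost
}
catch (AggregateException e)
{
    // Exceptions thrown inside the loop body are collected into an AggregateException
    Console.WriteLine("\n\n\n\n");
    foreach (Exception ex in e.InnerExceptions)
    {
        Console.WriteLine(ex.Message);
    }
}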

The ConcurrentBag resembles a List in that it can contain duplicate values and accepts null as a valid value for reference types; unlike a List, however, it is unordered.
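A small sketch of these semantics, assuming string items: duplicates and null are accepted, and items are taken back out with TryTake, in no particular order.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class BagSemanticsDemo
{
    public static List<string> AddThenDrain()
    {
        var bag = new ConcurrentBag<string>();
        bag.Add("a");
        bag.Add("a");     // duplicates are allowed
        bag.Add(null);    // null is a valid value for a reference type

        // Remove items until the bag is empty; the order is not guaranteed.
        var taken = new List<string>();
        string item;
        while (bag.TryTake(out item))
        {
            taken.Add(item);
        }
        return taken;     // "a", "a" and null, in some order
    }
}
```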

The System.Collections.Concurrent namespace also contains other thread-safe collections, to name a few: ConcurrentStack<T>, ConcurrentDictionary<TKey, TValue>, ConcurrentQueue<T>...
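As an example of one of these, here is a sketch of counting words in parallel with ConcurrentDictionary<TKey, TValue>.AddOrUpdate, assuming the words are already split into an array. AddOrUpdate either adds the key with a count of 1 or applies the update delegate to the current count.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class WordCountDemo
{
    public static ConcurrentDictionary<string, int> Count(string[] words)
    {
        var counts = new ConcurrentDictionary<string, int>();

        Parallel.ForEach(words, w =>
        {
            // Add 1 for a new word, otherwise increment the existing count atomically.
            counts.AddOrUpdate(w, 1, (key, current) => current + 1);
        });

        return counts;
    }
}
```

Under contention the update delegate may be called more than once for the same key, so it should be free of side effects.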

Task Parallel Library - Refresher - Stop an Iteration

I talked about the Task Parallel Library (TPL) 2 years back, when it was in CTP. I was taking a class on High Performance Computing yesterday, and I just realized that I had forgotten all about this library :).

This library has now been officially released with .Net 4.0, and Microsoft recommends that you use it where possible when writing concurrent programs, so that your program can take maximum advantage of the number of processors you have.

I thought of posting some sample code as a refresher.
Here is the example: I have a list of Customer objects and I need to get the object that matches a specific criterion; let's say the Name property should be "F".

Here is what my Customer class looks like.

public class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}

Let's assume that the Name property is unique.

If I were to write the algorithm for this in .Net 1.1 or .Net 2.0, my logic would look like this.

Customer result = null;
foreach (Customer c in dataSource)
{
    if (c.Name == "F")
    {
        result = c;
        break;   // stop at the first (and only) match
    }
}

This code runs on a single thread, unless you write your own partitioning algorithm and hand chunks of the data source to different threads.

This is where TPL comes into play; using TPL, I would write this code like this.

Customer result = null;
IList<Customer> dataSource = GetMockDataSource(); // Get the data

Parallel.For(0, dataSource.Count, (i, state) =>
{
    if (!state.IsStopped)
    {
        Customer c = dataSource[i];
        if (c.Name == "F")
        {
            result = c;
            state.Stop();
        }
        Console.WriteLine(c.Name);
    }
});

This is what happens under the covers: the TPL runtime partitions the index range (in our case we are iterating over plain integers and accessing each Customer object through its index) and hands chunks of the indexes to worker threads from the thread pool. The runtime can spread these threads across different cores according to resource availability. Compared with the sequential loop, this can increase performance, as the Customer list is divided into chunks and each chunk is processed by a different thread managed by the runtime.

Let's examine the code.
The first two statements are self-explanatory; the third statement, the call to Parallel.For, is where we use the TPL.
The Parallel class is in the System.Threading.Tasks namespace, and its static For method has many overloads. In the one we used, the first parameter specifies the index the loop starts from (inclusive) and the second parameter specifies where the loop ends (exclusive).

The 3rd parameter takes an Action delegate that receives the current index and a ParallelLoopState object; for simplicity I have implemented it as a lambda function.
Within the lambda, I check whether the current Customer object satisfies our criterion; if so, I call ParallelLoopState.Stop() to signal to the runtime that all iterations should now stop, as we have found what we were looking for.

When you call the Stop method on the ParallelLoopState object, the runtime will not start any more iterations; however, it cannot stop the iterations that have already started, so we explicitly check whether some other iteration has signaled a stop by checking the IsStopped property of the ParallelLoopState object.
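Here is a self-contained sketch of the search above that runs without the post's GetMockDataSource. The mock data source is a hypothetical stand-in that creates customers named "A" to "Z" (so the names are unique, as assumed above), and the name to look for is a parameter.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class ParallelSearchDemo
{
    // Hypothetical mock data: count <= 26 customers named "A", "B", "C", ...
    public static IList<Customer> GetMockDataSource(int count)
    {
        var list = new List<Customer>();
        for (int i = 0; i < count; i++)
        {
            list.Add(new Customer { ID = i, Name = ((char)('A' + i)).ToString(), Age = 20 + i });
        }
        return list;
    }

    public static Customer FindByName(IList<Customer> dataSource, string name)
    {
        Customer result = null;
        Parallel.For(0, dataSource.Count, (i, state) =>
        {
            if (state.IsStopped) return;   // another iteration already found the match

            Customer c = dataSource[i];
            if (c.Name == name)
            {
                result = c;     // names are unique, so only one iteration writes here
                state.Stop();   // ask the runtime not to start further iterations
            }
        });
        return result;
    }
}
```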

Although this example could have been done more efficiently using PLINQ, I chose the task library to show the underlying basics.
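For comparison, here is a minimal sketch of the PLINQ version, assuming the same Customer class; FindByName is a hypothetical name. The runtime takes care of partitioning the list and of ending the search early.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class PlinqSearchDemo
{
    // Names are assumed unique, so any match is the match; returns null if none.
    public static Customer FindByName(IEnumerable<Customer> dataSource, string name)
    {
        return dataSource.AsParallel().FirstOrDefault(c => c.Name == name);
    }
}
```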

Saturday, July 17, 2010

HashSet<T> vs .Net 4.0 SortedSet<T>

HashSet<T> has been around since .Net 3.5. It stores objects in such a way that adding an item to the set, removing it, or searching for it takes O(1), constant time, on average.
It uses a hash-based implementation to achieve this constant time for these operations. However, iterating over the collection in sorted order is expensive: the values within a HashSet are not sorted, so you need to create a sorted collection and then iterate over that.
Because the values are stored according to their hashes, you pay for a sort operation, and a new collection has to be created, every time you need the sorted order.

This is where SortedSet<T> comes into play. This collection type was introduced in .Net 4.0; when you add an item to it, the item is placed in the collection according to the sort criteria, so iterating over the collection in sorted order is much faster than with a HashSet.

SortedSet has its own cons: because items have to be placed in the correct position according to the sort order, the Add() and Remove() operations no longer take constant time; they take logarithmic time, O(log n).

Searching for an element means a binary search through the sorted structure, which also takes logarithmic time.
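A short sketch of the difference, assuming int values: the HashSet<T> needs an explicit sort (OrderBy) to produce ordered output, while the SortedSet<T> enumerates in order directly. SortedSet<T>.GetViewBetween is included as one more thing the ordering makes cheap: a range query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedIterationDemo
{
    // HashSet<T>: no ordering, so sorted output needs an extra sort every time.
    public static int[] SortedFromHashSet(IEnumerable<int> values)
    {
        var set = new HashSet<int>(values);
        return set.OrderBy(x => x).ToArray();
    }

    // SortedSet<T>: items are kept in order as they are added.
    public static int[] SortedFromSortedSet(IEnumerable<int> values)
    {
        var set = new SortedSet<int>(values);
        return set.ToArray();
    }

    // Range query on the sorted set: all items between lower and upper, inclusive.
    public static int[] Between(IEnumerable<int> values, int lower, int upper)
    {
        return new SortedSet<int>(values).GetViewBetween(lower, upper).ToArray();
    }
}
```

Both SortedFromHashSet and SortedFromSortedSet turn { 5, 1, 4, 1, 3 } into 1, 3, 4, 5 (duplicates are dropped), and Between with 2 and 4 returns 3 and 4.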

The conclusion is that you can choose between these sets according to your requirements, rather than being forced to use one collection over the other: if you need to iterate over the items in sorted order, a SortedSet is the best choice; if you mostly add, remove and look up items, a HashSet is faster.

.Net 4.0 also introduces the ISet<T> interface, which both HashSet and SortedSet implement, so you can always program to the interface and change the concrete type in the middle of the implementation if you feel HashSet would do better than SortedSet.

Btw, why use sets anyway? 'Cos they can be manipulated with set operations like union and intersection (UnionWith, IntersectWith, etc.), and they can only contain unique elements.
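A minimal sketch of programming against ISet<T> (the method names are hypothetical): the same code works whether the caller passes a HashSet<T> or a SortedSet<T>. UnionWith and IntersectWith modify the set in place, and Add returns false when the element is already present.

```csharp
using System;
using System.Collections.Generic;

public static class SetOperationsDemo
{
    // Works for any ISet<int>: HashSet<int>, SortedSet<int>, ...
    public static ISet<int> UnionThenIntersect(ISet<int> set, IEnumerable<int> toUnion, IEnumerable<int> toIntersect)
    {
        set.UnionWith(toUnion);          // set = set ∪ toUnion
        set.IntersectWith(toIntersect);  // set = set ∩ toIntersect
        return set;
    }

    // Returns the result of the second Add: false, because sets keep unique elements.
    public static bool AddTwice(ISet<string> set, string value)
    {
        set.Add(value);
        return set.Add(value);
    }
}
```

For example, starting from { 1, 2 }, a union with { 3, 4 } followed by an intersection with { 2, 3, 5 } leaves { 2, 3 }, whichever set type is passed in.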

Saturday, July 10, 2010

ASP.NET Localization - Implicit Resource Assignment


In one of my previous posts I talked about ASP.NET localization basics and discussed 2 ways in which resources can be assigned to ASP.NET pages and controls.
In this post I will talk about a powerful ASP.NET feature that can be used to assign resources implicitly.
This feature can be used to assign multiple resources to a control in one shot, without assigning each resource individually.

Let's take an example to demonstrate this. I am going to place a simple button control on a page.
Now I want to assign 3 resources: the text, the background color and the tooltip of the button.
This is my resource file.

I have assigned values for all the fields I require. Note how each key is named: a prefix, followed by a period, followed by the name of the control property that the resource should be assigned to. For example, "button" is my prefix, followed by a period and then the "Text" property.

Now, in my ASPX page, all I have to do is this.
Note that, instead of assigning resources to individual properties, I add a meta:resourcekey attribute to the button and pass the prefix as its value (meta:resourcekey="button").

Behind the scenes, ASP.NET collects all the resource keys for this page, keeps only the ones that start with the prefix "button", and then, for each of these keys, checks whether the suffix matches a property of the control; if it does, ASP.NET assigns the resource value to that property.
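To illustrate the matching rule just described, here is a simplified model in C#. It is not ASP.NET's actual implementation: FakeButton and ImplicitResourceSketch are hypothetical, FakeButton uses string properties (a real Button's BackColor is a Color, which ASP.NET converts for you), and the resources are a plain dictionary rather than a .resx file.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for a Button, only used to illustrate the matching.
public class FakeButton
{
    public string Text { get; set; }
    public string ToolTip { get; set; }
    public string BackColor { get; set; }
}

public static class ImplicitResourceSketch
{
    // Keep keys that start with "prefix.", then assign each value to the public
    // string property named by the key's suffix, if such a property exists.
    public static int Apply(object control, string prefix, IDictionary<string, string> resources)
    {
        int assigned = 0;
        string start = prefix + ".";
        foreach (KeyValuePair<string, string> entry in resources)
        {
            if (!entry.Key.StartsWith(start, StringComparison.Ordinal)) continue;

            string propertyName = entry.Key.Substring(start.Length);
            PropertyInfo property = control.GetType().GetProperty(propertyName);
            if (property != null && property.CanWrite && property.PropertyType == typeof(string))
            {
                property.SetValue(control, entry.Value, null);
                assigned++;
            }
        }
        return assigned;   // number of properties that received a value
    }
}
```

Keys with another prefix (such as "label.Text") and suffixes that do not name a property are simply skipped.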

In our example, the initial filtered list will contain 3 keys: "button.Text", "button.Tooltip" and "button.BackColor". Since we have added this meta attribute to the button control, ASP.NET checks whether each suffix matches a property of the button. The suffix of the first key is Text, so the resource value for the key "button.Text" is assigned to the button's Text property. Next, Tooltip is also a property of the button, so the resource value for the key "button.Tooltip" is assigned to the button's tooltip, and likewise "button.BackColor" is assigned to its BackColor.

The advantage of using meta:resourcekey, as you can see, is that you can assign multiple resources to properties in one shot, implicitly.

Sunday, July 4, 2010

Satellite Assemblies and Strong Names

A good friend of mine from another project told me that a satellite assembly he had created for a given culture, for an ASP.NET server control project, was not picked up for that culture; the default resources were used instead.

I was curious why this happened, and today it hit me: their project is signed with a strong name key, and he did not sign the satellite assembly with the same key.

In other words, if your main assembly is signed, all of your satellite assemblies need to be signed with the same key; otherwise loading the satellite assembly will fail, and the runtime will fall back to the default resources contained within the main assembly.
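If you suspect this problem, one way to check is to compare the public key tokens of the main assembly and the satellite assembly. Here is a minimal sketch using AssemblyName.GetPublicKeyToken; the helper names and the paths in the usage example are hypothetical.

```csharp
using System;
using System.Reflection;

public static class StrongNameCheck
{
    // Public key token as lowercase hex, or "" for an unsigned assembly.
    public static string TokenOf(AssemblyName name)
    {
        byte[] token = name.GetPublicKeyToken();
        if (token == null || token.Length == 0) return "";
        return BitConverter.ToString(token).Replace("-", "").ToLowerInvariant();
    }

    // True only when both assemblies are signed and carry the same token.
    public static bool SignedWithSameKey(AssemblyName main, AssemblyName satellite)
    {
        string a = TokenOf(main);
        string b = TokenOf(satellite);
        return a.Length > 0 && a == b;
    }
}
```

For example: StrongNameCheck.SignedWithSameKey(AssemblyName.GetAssemblyName(@"bin\MyControls.dll"), AssemblyName.GetAssemblyName(@"bin\fr-FR\MyControls.resources.dll")).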

Friday, July 2, 2010

ASP.NET Localization


I have written some blog posts on globalization before, but they were all about desktop applications or general best practices. This week I was interested in localizing ASP.NET applications (well, I was forced to :))

So, let me start by giving an introduction to the ASP.NET resource assignment features. Basically, it's the same .NET concept: you create resource files, compile them into satellite assemblies and link them to the main DLL.
Resources are then looked up through the resource manager.

Although you can compile your resources before deploying, you can also take advantage of the special ASP.NET folder App_LocalResources: just place your resource files in this folder and ASP.NET will automatically compile them and create the satellite assemblies.

The screenshot on the left shows my solution.

Here I have added the resource files for Local.aspx to the App_LocalResources folder: 3 different resource files, one for the default (English), one for French (fr-FR) and one for Spanish (es-ES).

The Local.aspx file contains a button with the ID Button1. If I need to assign resource text to the button, I can do it in 3 ways; I will examine the first 2 in this post and the last one in the next post (hopefully!).

The first way is to use an explicit resource expression on the button's property, of the form Text="<%$ Resources:button.Text %>".
With this declaration, ASP.NET fetches the correct resource string from the local resource file (i.e. the resource file in the App_LocalResources folder whose name matches the ASPX file name plus the culture suffix, e.g. Local.aspx.fr-FR.resx).
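When no resource file exists for the exact culture, the resource manager falls back along the culture's parent chain until it reaches the default resources. Here is a small sketch that lists that chain via CultureInfo.Parent (FallbackChain is a hypothetical helper name).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CultureFallbackDemo
{
    // Lists the cultures probed from the specific culture up to the invariant culture.
    public static List<string> FallbackChain(string cultureName)
    {
        var chain = new List<string>();
        CultureInfo culture = new CultureInfo(cultureName);
        while (true)
        {
            chain.Add(culture.Name);
            if (culture.Name.Length == 0) break;   // reached InvariantCulture (default resources)
            culture = culture.Parent;
        }
        return chain;
    }
}
```

For fr-FR the chain is fr-FR, fr and then the invariant culture (empty name), which corresponds to the default Local.aspx.resx.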

The second way is to look up the resource from the code-behind, like this.

Button1.Text = GetLocalResourceObject("button.Text").ToString();

GetLocalResourceObject is a method that the Page class inherits from TemplateControl.

You can also access global resources (those in App_GlobalResources) using GetGlobalResourceObject().

The third, cleaner and faster way of assigning resources to controls is implicit assignment, which I will cover in my next blog post.

On a final note, although the default implementation of the resource manager queries resource files, you can always extend it with your own resource provider (by deriving from ResourceProviderFactory) that queries a text file, a database or any other data store; this is a great link where you can start doing that.