Saturday, July 17, 2010

HashSet<T> vs .Net 4.0 SortedSet<T>

HashSet<T> has been there for a while, it can store objects in such a way that the time to add an item to the set or remove it or search for an item is O(1), constant time.
It uses hash based implementation to achieve this constant time for these operations, however when you want to iterate this collection in a sorted way, the operation is intensive, as the values within the HashSet is not sorted, you need to create a sorted collection and then iterate this collection.
As the values are stored within the collection indirectly based on hash, the sort operation is expensive and also a new collection has to be created.

This is where SortedSet<T> comes into play, this collection type was introduced in .Net 4.0, when you add item to this collection, the item is placed in the collection according to the sort criteria, thus when you need to iterate this sorted collection, it is much faster then the HashSet.

SortedSet has it's own cons, now that the items has to be placed in the correct position in the collection according to the sort order the Add() operation and Remove() operation don't take constant time anymore.

Searching for an element, would mean that a binary search has to be done on the collection which is logarithmic time.

The conclusion is that these Sets can be used accordingly to the requirements that you have and you are not forced to choose one collection over the other, if you need to iterate over a sorted collection, then a SortedSet would be the best choice.

.Net 4.0 also introduces the ISet<T> interface, both HashSet and SortedSet implements this interface, so you can always program to an interface and change your type in the middle of the implementation if you feel HashSet would do better then SortedSet.

Btw, why do you need to use sets anyways ?, cos' then they can be manipulated with set operation like Union, Intersect etc.. and that they can contain only unique elements.

No comments:

Post a Comment