Why should I use List and not ArrayList

I came upon a questing at StackOverflow:

The system I work on here was written before .net 2.0 and didn't have the benefit of generics. It was eventually updated to 2.0, but none of the code was refactored due to time constraints. There are a number of places where the code uses ArraysLists etc. that store things as objects.

From performance perspective, how important change the code to using generics? I know from a perfomance perspective, boxing and unboxing etc., it is inefficient, but how much of a performance gain will there really be from changing it? Are generics something to use on a go forward basis, or it there enough of a performance change that a conscience effort should be made to update old code?

Other then the obvious benefit of using strongly typed collection there is a performance increase as well.

Performance benchmarking

Checking the difference between using List & ArrayList is easy to check by writing a simple C# code:

class Program
{
static Stopwatch stopper = new Stopwatch();
const int ITEM_NUM = 100000;
const int NUM_OF_ITERATIONS = 1000;

static void Main(string[] args)
{
ArrayList arrayList = new ArrayList();
List<int> listOfInt = new List<int>();

Console.WriteLine("Inserting to ArrayList took: {0}ms", InsertItemsToArrayList(arrayList));
Console.WriteLine("Inserting to List<T> took: {0}ms", InsertItemsToList(listOfInt));
Console.WriteLine("Enumerating ArrayList took: {0}ms", IterateArrayList(arrayList));
Console.WriteLine("Enumerating List<T> took: {0}ms", IterateList(listOfInt));
}

private static double InsertItemsToArrayList(ArrayList arrayList)
{
stopper.Reset();

for (int i = 0; i < NUM_OF_ITERATIONS; i++)
{
stopper.Start();

for (int j = 0; j < ITEM_NUM; j++)
{
arrayList.Add(i);
}
stopper.Stop();

arrayList.Clear();
}

return (double)stopper.ElapsedMilliseconds / (double)NUM_OF_ITERATIONS;
}

private static double InsertItemsToList(List<int> listOfInt)
{
stopper.Reset();
stopper.Start();

for (int i = 0; i < NUM_OF_ITERATIONS; i++)
{
for (int j = 0; j < ITEM_NUM; j++)
{
listOfInt.Add(i);
}

stopper.Stop();
listOfInt.Clear();
}

return (double)stopper.ElapsedMilliseconds / (double)NUM_OF_ITERATIONS;
}

private static double IterateArrayList(ArrayList arrayList)
{
stopper.Reset();
for (int i = 0; i < ITEM_NUM; i++)
{
arrayList.Add(i);
}

for (int i = 0; i < NUM_OF_ITERATIONS; i++)
{
stopper.Start();
for (int j = 0; j < arrayList.Count; j++)
{
int item = (int)arrayList[i];
}
stopper.Stop();

}

return (double)stopper.ElapsedMilliseconds / (double)NUM_OF_ITERATIONS;
}

private static double IterateList(List<int> listOfInt)
{
stopper.Reset();

for (int i = 0; i < ITEM_NUM; i++)
{
listOfInt.Add(i);
}

for (int i = 0; i < NUM_OF_ITERATIONS; i++)
{
stopper.Start();

for (int j = 0; j < listOfInt.Count ; j++)
{
int item = listOfInt[i];
}
stopper.Stop();
}

return (double)stopper.ElapsedMilliseconds / (double)NUM_OF_ITERATIONS;
}
}


The results



image



There is a performance hit when using ArrayList but we could not attribute it entirely to boxing/unboxing.



Boxing And Unboxing Impact



In a blog post “Why boxing doesn't keep me awake at nights” Jon Skeet shows that although boxing an integer causes performance decrease it is relatively minor. You can see in his benchmark that using object instead of int in various function calls doesn’t cause 4-100% decrease in performance like we saw in our tests. So the big performance hit in the ArrayList vs. List<int> tests must be due to some other factor.



Other reasons



Searching the net I found another post - Performance Guideline: Use Generics To Eliminate the Cost Of Boxing, Casting and Virtual calls.



In this post J.D. Meier shows that there is a performance difference even between ArrayList and List<object>:




List<Object> class gives better performance over ArrayList.  An example benchmark of a quick-sort of an array of one million integers may show the generic method is 3 times faster than the non-generic equivalent. This is because boxing of the values is avoided completely. In another example, the quick-sort of an array of one million string references with the generic method was 20 percent faster due to the absence of a need to perform type checking at run time.   You results will depend on your scenario.




And the last reason I can think of is Memory Caching:



From Wikipedia:




A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations. As long as most memory accesses are to cached memory locations, the average latency of memory accesses will be closer to the cache latency than to the latency of main memory.




The CPU Cache is of limited size, when a collection is being read some of it copied into the cache but due to the limited size of the cache when we iterate past the elements in the cache there is a need of another copy.



The size of int is 4 byte while the size of an object is four times that size this means that only a quarter of the elements would be copied to the cache memory when the collection is enumerated or read (in order to insert objects into it). When enumerating the entire collection the cache is re-written less times when using ints then when using objects.



Summery



Object based collections are a relic from .NET 1.x times before we had Generics. But since version 2 of .NET got out and we can use strongly typed collections it seems that we do not need to using ArrayList and such - Other then the performance benefits that were shown in here we gain type-safety and make our code much more readable.

Labels: , ,