Dynamic Arrays: Facts and figures

The one problem we face with arrays is that their size is fixed: once the array fills up, we run out of space. This is pretty bad in production, as in many situations we cannot really estimate the required size of the array beforehand. Of course, we could allocate a huge array, but we’d be wasting resources at the end of the day if it’s never fully populated. Sounds like a chicken-and-egg problem. How would we want to solve it?

The answer is dynamic array allocation. The concept is pretty simple. When instantiating a dynamic array, we initially allocate a reasonable size N (for example, 10, as Java does). Then, once we run out of space, we double its size from N to 2 \cdot N. To delve a bit deeper: we first allocate an array of size 2 \cdot N, then copy the N existing elements over to the newly allocated array. Finally, we make the array reference point to the new array and give the older array back to the memory management system. The figure below illustrates the state of the array before, during and after re-sizing.
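The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a production container; the class name, the initial capacity of 10 (borrowed from the Java example above) and the private field names are my own choices:

```python
class DynamicArray:
    """Minimal sketch of a dynamic array that doubles its capacity when full."""

    def __init__(self):
        self._capacity = 10              # initial allocation, as in Java
        self._size = 0
        self._data = [None] * self._capacity

    def append(self, item):
        if self._size == self._capacity:
            self._resize(2 * self._capacity)   # double from N to 2*N
        self._data[self._size] = item
        self._size += 1

    def _resize(self, new_capacity):
        new_data = [None] * new_capacity       # allocate the 2*N array
        for i in range(self._size):            # copy the N elements over
            new_data[i] = self._data[i]
        self._data = new_data                  # old array is reclaimed by GC
        self._capacity = new_capacity
```

After 25 appends, for instance, the capacity has gone 10 → 20 → 40 while the logical size is still 25.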

Figure: Dynamic Array Re-sizing

It can be noted that the same procedure works in reverse when we have too much spare space: once the array is only half full, it can be re-sized down to N/2 slots instead of holding on to space for all N elements. (In practice, implementations often wait until the array is only a quarter full before halving, so that alternating inserts and deletes near the boundary don’t trigger repeated re-sizes.)

Now let’s get into the interesting part, algorithmic complexity. For simplicity, let’s assume that the initial size of the array is 1.

The first question to ask here is: how many times do we have to double the array to reach a final capacity of N? Before thinking in abstract terms, let us talk some numbers. Suppose the capacity of my dynamic array is 16 right now; how many times have I doubled it? Four times (1 to 2, 2 to 4, 4 to 8, 8 to 16). Thus we can conclude that to reach a capacity of N we must have doubled \log_2 N times.
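The count can be checked directly. The snippet below (helper name is my own) grows a capacity of 1 by doubling and counts the steps:

```python
import math

def doublings_to_reach(n):
    """Count doublings needed to grow a capacity of 1 up to at least n."""
    capacity, count = 1, 0
    while capacity < n:
        capacity *= 2
        count += 1
    return count

# 16 takes 4 doublings (1 -> 2 -> 4 -> 8 -> 16), which is log2(16)
```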

The question to be asked next is: how many times have we re-copied elements into a new array, given that the original array is currently at capacity N and we’ve just re-sized it to a capacity of 2 \cdot N? Half of the elements of the array have moved once, a quarter have moved twice, and so on. When we formulate this as a summation S we get:

S = \sum_{i=1}^{\log N}{i \cdot N / 2^i} = N \cdot \sum_{i=1}^{\log N}{i / 2^i} < N \cdot \sum_{i=1}^{\infty}{i / 2^i} = 2 \cdot N

Notice that we’re talking about the total number of movements (re-copies) of elements rather than how many times a single element moves. This is useful because, when we look at N insertions as a whole, we’ve spent at most 2 \cdot N units of copying work. This is really cool because growing a dynamic array over N insertions is still guaranteed to be O\left(N\right) in total, i.e., O\left(1\right) per insertion on average. Such a guarantee/cost analysis is formally known as an amortized guarantee/analysis, depending on the context.
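The bound can also be verified empirically. This sketch (function name my own) simulates N appends with capacity doubling, starting from capacity 1 as assumed above, and counts every element copy performed during re-sizing:

```python
def total_copies(n):
    """Simulate n appends with capacity doubling (initial capacity 1)
    and return the total number of element copies caused by re-sizing."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size      # re-copy all current elements into the new array
            capacity *= 2
        size += 1
    return copies

# The total stays below 2*n for any n, so n appends cost O(n) overall.
for n in (10, 1000, 10**6):
    assert total_copies(n) < 2 * n
```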


The intuition behind computing combinations recursively (N choose K)

Note: I have rewritten this article to make it more straightforward and have corrected a number of grammatical errors.

With this article, I plan to explain the intuition behind the following “recursive” formula.

{N \choose k}={N-1 \choose k-1}+{N-1 \choose k}

Let’s start by building a mental model: imagine that you have two baskets. The first, B_1, has a capacity of N and is full. The second, B_2, has a capacity of k (for clarity, k \leq N) and is empty. For simplicity, let’s assume that all items are distinct and distinguishable.

The idea of choosing k items from N is nothing but counting the number of ways you can pick k items from B_1 and put them in B_2.

Another critical thing to note is that, while computing combinations, the order in which you choose those k items is unimportant; the only thing that matters is which items you choose. Choosing k items from N is mathematically denoted as {N \choose k}, and I shall be using this notation from now on.

One way to approach a problem recursively is to see if it makes sense to decrease the problem size by one and delegate the problem to someone else. Let’s do just that, and pick an item out of B_1. Now you can either put it in B_2 or keep it aside. Each choice leads to interesting consequences:

Case 1: The item was placed in B_2, so now B_1 has N-1 items in it, and you can only put k-1 items in B_2. So now you have to choose k-1 items from N-1 items or in other words {N-1 \choose k-1}.

Case 2: The item was placed aside (not in B_2), consequently B_1 has N-1 items in it, but B_2 is still empty and you can still put k items in it. This translates to you having to choose k items from N-1 items or in other words {N-1 \choose k}.

We can conclude that the number of ways of choosing k items from N is the sum of:

  • Number of ways of choosing k-1 items from N-1
  • Number of ways of choosing k items from N-1.

Thus we have derived the recursive equation, {N \choose k}={N-1 \choose k-1}+{N-1 \choose k}.
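The derived identity (Pascal’s rule) is easy to spot-check numerically using Python’s standard-library `math.comb`:

```python
from math import comb

# Spot-check Pascal's rule: C(N, k) = C(N-1, k-1) + C(N-1, k)
for n in range(1, 12):
    for k in range(1, n):
        assert comb(n, k) == comb(n - 1, k - 1) + comb(n - 1, k)
```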

Are we done? No, not yet. We still haven’t discussed another crucial part of recursion: the base cases. Base cases are what allow the recursive formula to terminate and yield a concrete outcome.

Q1) What is {N \choose N}?

A: We have only one way of choosing all the items from B_1 and putting them into B_2. Hence, we can also conclude that {1 \choose 1} is 1.

Q2) What is {N \choose 1}?

A: We have N choices, hence N ways of choosing one item from N items.
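With the recurrence and these two base cases in hand, the whole computation can be sketched directly (function name my own; it assumes 1 \leq k \leq N, which is all the two branches of the recursion ever produce):

```python
def choose(n, k):
    """Recursive N-choose-k using only the recurrence and the two
    base cases above. Assumes 1 <= k <= n."""
    if k == n:      # Q1: only one way to take every item
        return 1
    if k == 1:      # Q2: n ways to take a single item
        return n
    # Pascal's rule: place the picked item in B_2, or set it aside
    return choose(n - 1, k - 1) + choose(n - 1, k)
```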

Let us make our stopping condition a bit smarter by considering the following case:

Q3) What is {N \choose N-k}?

A: Choosing k items out of N is the same as not choosing N-k items out of N. Hence, you have {N \choose k} = {N \choose N-k}. You can derive some interesting results from this; here are a few:

  • {N \choose 0} = {N \choose N} = 1
  • {N \choose N-1} = {N \choose 1} = N
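Folding the symmetry into the sketch makes the recursion stop sooner whenever k is close to N, because we can replace k by the smaller of k and N-k at every step (again, the function name is my own):

```python
def choose(n, k):
    """N-choose-k that first applies the symmetry C(n, k) = C(n, n - k),
    so the recursion bottoms out faster when k is close to n."""
    k = min(k, n - k)   # choosing k is the same as not choosing n - k
    if k == 0:          # C(n, 0) = C(n, n) = 1
        return 1
    if k == 1:          # C(n, 1) = C(n, n - 1) = n
        return n
    return choose(n - 1, k - 1) + choose(n - 1, k)
```

For example, `choose(20, 18)` immediately becomes a computation over k = 2 rather than recursing 18 levels deep on the k side.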