Dive deep into Kadane’s algorithm​

Disclaimer: If you are looking for a quick read before an interview, this article is not for you. However, please feel free to bookmark this page and come back later.


Recently, I interviewed a candidate and asked him a problem which finally boiled down to a variant of maximum subarray problem. He was able to recognize it as such and also able to walk me through his implementation of Kadane’s algorithm explaining how it works. When I asked him if he could tell why this algorithm works, he could not give me a definitive answer. After the interview, while I was walking him out, he did confess that he’d just blindly accepted whatever Wikipedia had told him. As soon as I was back in my seat, I searched for this on Wikipedia, and here is what I found.

Kadane’s algorithm begins with a simple inductive question: if we know the maximum subarray sum ending at position i (call this B_i), what is the maximum subarray sum ending at position $i+1$ (equivalently, what is B_{i+1})? The answer turns out to be relatively straightforward: either the maximum subarray sum ending at position $i+1$ includes the maximum subarray sum ending at position $i$ as a prefix, or it doesn’t (equivalently B_{i+1}=max(A_{i+1}, A_{i+1} + B_i), where A_{i+1} is the element at index i+1

Their explanation was perfect and to the point. So much so, if I’d glanced over a single word without reasoning about it, I would have missed the point on why the algorithm works the way it does. I did search around on the Internet for decent explanations, the only person whose explanation was helpful was the one by CS Dojo, but even his attempt of proving the algorithm just touched the tip of the iceberg.

My aim in this article is to not only expand upon these brief explanations but take you on a small journey where you experience the problem in different lights and finally figure out the algorithm yourself.

Continue reading “Dive deep into Kadane’s algorithm​”

Proof for the divisibility test of 3

This post is a bit mathematical, and talks about the divisibility test of 3 we all learnt in high school. As I’m not aiming for brevity but for completeness and ease of explanation let me remind you all of the theorem involved to test the divisibility of a number by 3 and then move onto its proof.

To test the divisibility of a number N by 3, recursively sum the digits that make up the number till you reach a manageable number whose divisibility of 3 can be inferred trivially.

For example if we’re testing the divisibility of 3423 by 3, we’ll add up 3+4+2+3 and check if that sum is divisible by 3. We know that 12 is divisible by 3, hence the number 3423 is divisible by 3 as well . If we did not know that 12 was divisible by 3 we would have summed up the digits of 12 and and try to see if that sum is divisible by 3 and so on.

Proof for theorem:

The main crux of the proof is representing the number as a specific set of additions and multiplications and your work is almost done. If we have a number N with n digits a_{1}a_{2} \ldots a_{n}, then the number N can also be represented as follows

N = a_1 \cdot 10^{n-1} + a_2 \cdot 10^{n-2} + ... + a_n \cdot 1

With out proof in mind let’s separate the sum of all the digits of N i.e, a_1 + a_2 + \ldots + a_n and see what we get.

N = \left(\underbrace{99\ldots9}_{n\text{-times}} \cdot a_1 + \underbrace{99\ldots9}_{n-1\text{-times}} \cdot a_2 + \ldots 9 \cdot a_{n-1} \right) + \left( a_1 + a_2 + \ldots + a_n \right)

Now if we can prove that the right glob we have here is divisible by 3 then our job is done (We’ll know that the if the sum of its digits is divisible by 3 then N is divisible by 3).

Corollary #1:

A number with just 9 as its digits is divisble by 3

Proof for Corollary #1:

We know that any number which is divisible by 9 is divisible by 3 as well \left(3 * 3 = 9\right). So a number M which has m digits in it and all m digits are 9 can be represented as

\underbrace{99\ldots9}_{m\text{-times}} = 9 \cdot 10^{m-1} + 9 \cdot 10^{m-2} + ... + 9

On further simplification we get

\underbrace{99\ldots9}_{m\text{-times}} = 9 \left(10^{m-1} + 10^{m-2} + ... + 1\right)

The above proves the corollary. With the corollary proven we can concur that the sum of the digits of a number should also be divisible by 3 if a number is divisible by 3 and vice versa.

Acknowledgement: I first saw this proof on the One Mathematical Cat website, but the instructor just proved it for a specific case of 3 digits. This was my effort in generalizing the proof for any arbitrary number of digits.

Deploy Meteor apps on Bluemix in 5 easy steps

All IBMers have free access to Bluemix, unfortunately Bluemix does support the Meteor buildpack by default (Node.js app). Here’s how you can work around that limitation in 5 simple steps

I’m assuming that you’ve logged into Bluemix via the CloudFoundry command line interface

Step #1: Go to the Meteor application directory on your local machine

cd path/to/meteor/app/dir

Step #2: Add .cfignore file to ignore downloaded packages associated with this app (This will be downloaded again while deploying Meteor, so no need to upload them).

echo ".meteor/local" >> ./.cfignore

Step #3: Create a MongoLab service (specifically go for MongoLab and not any other MongoDB provider). At the time of this writing Bluemix did not have a MongoLab instance in Europe (eu-gb) so you will probably be able to deploy your Meteor app only in North America (US-South).

cf create-service mongolab sandbox

Step #4: Bind the created Mongolab service to this application

cf bind-service

Step #5: Push the app with Ben Cox‘s Meteor buildpack for Bluemix.

cf push  -b https://github.com/ind1go/bluemix-buildpack-meteor.git

That’s it. Wait for the push to succeed, you should see your Meteor application up an running on Bluemix!

Dynamic Arrays: Facts and figures

The one problem we face with arrays is that its size is fixed. We can run out of space once the array has been filled up. This is pretty bad in production as we cannot really estimate the size of the array beforehand in many situations. Of course, we could allocate a huge array but we’d be wasting resources at the end of the day if they’re not populated. Sounds like a chicken and an egg problem.. How would we want to solve it?

The answer is dynamic array allocation. The concept is pretty simple. While instantiating a dynamic array, initially allocate a considerable size N (for example, 10 as done in Java). Then once we run out of space we double its size from N to 2 \cdot N. To delve a bit deeper into it, we first allocate an array of size 2 \cdot N, then copy the N elements over to the newly allocated array. Then we give up the older array’s to the memory management system after making the array reference point to the new one. The figure below illustrates the state of the array before, during and after re-sizing the array.

Figure: Dynamic Array Re-sizing (click image to enlarge)

It can be noted that the same procedure holds good while we have too much space with us, once the array is only filled to N/2, it gets re-sized to this number, instead of holding up space for all the N elements.

Now let’s get into the interesting part, algorithmic complexity. For simplicity, let’s assume that the initial size of the array is 1.

The first question to be asked here is how many times do we have to double the array to reach a final capacity of N. Before thinking in abstract terms, let us talk some numbers. Suppose if the capacity of my dynamic array is 16 right now, how many times have I doubled the capacity of my array? We’ve doubled 4 times (1 to 2, 2 to 4, 4 to 8, 8 to 16). Thus we can conlude that to reach the capacity of N we must have doubled log_2 N times.

The question to be asked next is how many times have we re-copied elements into a new array the original array is currently at capacity of N and we’ve just re-sized to a capacity of 2 \cdot N? Half of the elements of the original array has moved once, the quarter of the original array have moved twice, and so on. When we formulate this as a summation S we get:

S = \sum_{i=1}^{\log N}{i \cdot n / 2^i} = n \cdot \sum_{i=1}^{\log N}{i / 2^i} < N \cdot \sum_{i=1}^{\log \infty}{i / 2^i} = 2 \cdot N

Notice that the we’re talking about the total number of movements (re-copies) of elements rather than how many times an element moves. This is useful because, when we look at N insertions as a whole, we’ve just spent 2 \cdot N amount of work. This is really cool because, creating and managing a dynamic array framework is still guarenteed to be O\left(N\right). Such a guarantee/cost analysis is formally known as an amortized guarantee/analysis depending on the context.

The intuition behind computing combinations recursively (N choose K)

We all deal with combinations on a regular basis and we’re also familiar with the recursive formula

{N \choose k}={N-1 \choose k-1}+{N-1 \choose k}
C(N,k) = C(N-1,k-1) + C(N-1,k)

How did this formula come into existence? What is the intuition behind this recursive formula?

Here’s my shot at explaining this equation:

Suppose you have a big basket with N items in it and another smaller (empty) basket of size k in which you are supposed to put k items from the big basket. C(N,k) simply means the number of ways you can choose k items from the big basket and put it in your smaller basket.

Before we delve into approaching the problem let’s look at the trivial case of choosing things.

How many ways can you choose one item if you have only one item? You guessed it, it’s just one.

Similarly how many ways can you choose N items from N items? You guessed it again, there’s just one way of choosing them.

So now we can say C(z,z) = 1, also implying C(1,1) = 1

The other smaller piece of the puzzle you need to think about is not choosing any items when I have m items with me. So how many ways can I do this? The answer is obvious you can only do this one way (just sit idle)

So we can say C(m,0) = 1, also implying C(1,0) = 1

So now we’ll look at another piece of the puzzle: By looking at the two intuitions we developed above. What can you say about you smartly stopping to pick items a particular way?

You stop picking items in the case where there are Z items to pick and there are only Z items left. This includes leaving only one item and picking that one.

Now armed with this background we can go about solving the complete puzzle.

We humans are brilliant, we always try to break up problems into manageable pieces and try to solve them. This is how recursion works. We divide the problem into a readily solvable part (for which we know the answer) and a part which needs further computation

So what I do is approach the problem solving for one item at a time.

I pick an item from the N items in the big basket and then I ask myself what can I do with this item in my hand. We can do a lot of things like throw it, eat it.. etc. but I can do one of the two things which are of interest to me in this problem:

  • I can place it my smaller basket
  • I can not place it my smaller basket, just place it beside my big basket

In the first case as I have placed that item in the smaller basket, I just have to choose k-1 items of the N-1 items present. So now I should worry about computing the number of ways I can do that.

In the the second case as I have not placed that item in the smaller basket I still have to choose k items from N-1 items in the big basket. I have to worry about the number of ways I can go about doing this.

So solving this big puzzle now becomes: Summing up these two ways. I note this down in a book. I have two problems to solve now. I go about solving them one after another and go deeper and deeper into trying to solve a smaller problem

I keep track of all the things I did till now and I keep going deep over and over again until I reach one of trivial cases:

  • Compute the number of ways of choosing Z items of Z items in the big basket
  • Compute the number of ways of choosing no items of the Z items in the big basket

Both these are one and I propagate this result to the previous equation which had a dependency on this.

Finally when I finish solving all these problems and finish adding up all the way from bottom to top I’ve solved the complete puzzle.

A trivial (but a very inefficient) C code snippet shows the $N$ choose $K$ recursion in action:

int C(int n, int k)
    if(n == k || k == 0) return(1);
    else return(C(n-1,k-1) + C(n-1,k));

Hope this cleared up things for you. Leave comments if you’ve got any questions or doubts or if you spot any errors.

Google App Engine: Memcaching for dummies example

Here’s a very simple code written in Python 2.7 and webapp2 on Google App Engine that does memcaching. It is possibly the easiest example I could think of, that I could implement memcaching on Google App Engine.

The main confusion which arises when one starts to look into memcaching is that Google fails to show that memecaching has to be done after instantiating a memcache client object.

The source code for this application is available here and is self explanatory.

Questions, comments, doubts are welcome.

Encoding Special Characters with URLLib Quote Function in Python

I was actually testing my final year project today and there was so much noise in the data, I was really frustrated by the end of it.

One of the most troublesome and difficult to figure out was urllib.quote(movie) function.

You should see movie titles people update, here are a few.

I ♥ Bollywood, Funniest movies ever

That really seemed a challenge to be sent over for an API call. We thought of stripping them off in the sentence, but there are a few French and Italian movies which always have some or the other odd character in them. I used quote() function and was getting KeyError exception.

Finally I figured it out, you have to encode it into UTF-8 so that they can be sent across. So while calling a URL, if it has any special characters in it, better encode it and sent it accross.

Example: (Google App Engine)

import urllib
from google.appengine.api import urlfetch

data = u'♥+ツ'
url = 'http://www.google.com?search='

response = urlfetch.fetch(url + urllib.quote(data.encode(encoding = 'UTF-8')))

if response.status_code == 200:
    output = response.content

I wasted almost an hour on figuring out what to do. If you ever get a KeyError when you are using URLLib.Quote, then this is the solution.