Holiday Projects

rookie mistake – an important lesson

I love moocs! Right now I am a huge fan of Coursera, and am working through Stanford’s Design and Analysis of Algorithms (here). I have just completed Programming Qeustion #1, and it was one of those real “ah-ha” moments that was so fundamentally simple, I had to blog it! The question requires us to read in a list of integers (100, 000 of them), sort them and count the number of inversions required to sort it (i.e. the number of swaps it takes to get from the unsorted list, to the sorted list).

The process
Bubble sort is my algorithm of choice for creating a “mental picture” of the scenario. With the list (1, 3, 5, 2, 4, 6) as an example, I could think through bubble sort and get the number of inversions, which is 3. Great 🙂 It took me a bit longer to wrap my head around merge sort and how on earth I was going to implement this. Finally I realised that all of the inversions are exposed during the ‘merge’ stage of merge sort and I was away.

Testing the theory
I re-implemented merge-sort to include a counter function and tested it on 5 simple test cases:


(1, 3, 5, 2, 4, 6)        =>  3 inversions

(1, 3, 5, 7, 2, 4, 6)     =>  6 inversions

(1, 3, 2, 5, 4)           =>  2 inversions

(3, 1, 2, 5, 4)           =>  3 inversions

(7, 3, 1, 2, 5, 4, 6)     =>  9 inversions

All seemed good, so I ran it against the question input and submitted my result.

WRONG!

oh-no? For the life of me, I couldn’t find out what happened. I spent an entire hour re-thinking my algorithm, re-thinking my counter function, re-assessing my key assumption: “that all inversions are exposed in the merge routine”, all with no success. I couldn’t figure out where I had gone wrong.

The Solution:
So, I started trawling through Coursera’s forums and found all sorts of things that had no bearing (e.g. issues with integer types, which isn’t an issue in Python) before I finally found someone who had the same issue…

with open("IntegerArray.txt") as f:
    input_integers = [i for i in f.readlines()]

This was the issue, it was returning a list of 100, 000 STRINGS! Oops! So a simple change fixed it all:

with open("IntegerArray.txt") as f:
    input_integers = [int(i) for i in f.readlines()]

Such a simple solution, but an important one. I am working through a Haskell tutorial at the moment as well, and it is really interesting that the tutorial goes to great lengths to emphasise the importance of establishing your types in the very first design phase. If I had of stopped and looked at my types, I probably would have figured this out for myself, instead of boxing on with a silly error.

Never mind, a good lesson learned today 🙂

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s