I love moocs! Right now I am a huge fan of Coursera, and am working through Stanford’s Design and Analysis of Algorithms (here). I have just completed Programming Qeustion #1, and it was one of those real “ah-ha” moments that was so fundamentally simple, I had to blog it! The question requires us to read in a list of integers (100, 000 of them), sort them and count the number of inversions required to sort it (i.e. the number of swaps it takes to get from the unsorted list, to the sorted list).
Bubble sort is my algorithm of choice for creating a “mental picture” of the scenario. With the list (1, 3, 5, 2, 4, 6) as an example, I could think through bubble sort and get the number of inversions, which is 3. Great 🙂 It took me a bit longer to wrap my head around merge sort and how on earth I was going to implement this. Finally I realised that all of the inversions are exposed during the ‘merge’ stage of merge sort and I was away.
Testing the theory
I re-implemented merge-sort to include a counter function and tested it on 5 simple test cases:
(1, 3, 5, 2, 4, 6) => 3 inversions (1, 3, 5, 7, 2, 4, 6) => 6 inversions (1, 3, 2, 5, 4) => 2 inversions (3, 1, 2, 5, 4) => 3 inversions (7, 3, 1, 2, 5, 4, 6) => 9 inversions
All seemed good, so I ran it against the question input and submitted my result.
oh-no? For the life of me, I couldn’t find out what happened. I spent an entire hour re-thinking my algorithm, re-thinking my counter function, re-assessing my key assumption: “that all inversions are exposed in the merge routine”, all with no success. I couldn’t figure out where I had gone wrong.
So, I started trawling through Coursera’s forums and found all sorts of things that had no bearing (e.g. issues with integer types, which isn’t an issue in Python) before I finally found someone who had the same issue…
with open("IntegerArray.txt") as f: input_integers = [i for i in f.readlines()]
This was the issue, it was returning a list of 100, 000 STRINGS! Oops! So a simple change fixed it all:
with open("IntegerArray.txt") as f: input_integers = [int(i) for i in f.readlines()]
Such a simple solution, but an important one. I am working through a Haskell tutorial at the moment as well, and it is really interesting that the tutorial goes to great lengths to emphasise the importance of establishing your types in the very first design phase. If I had of stopped and looked at my types, I probably would have figured this out for myself, instead of boxing on with a silly error.
Never mind, a good lesson learned today 🙂