python

Working with Lists in Python – a “gotcha” to watch for

Working with lists in Python is fantastically simple. Like any C-like language, lists are mutable in Python which allows you to do all sorts of brilliant things, like sorting in place, or modifying values, all the while reducing the potential for memory overhead.

Getting a slice from a list is also very simple in Python:

my_list = [1, 2, 3, 4, 5]
my_greeting = "Hello WordPress!"

my_list[2:4] => [3, 4]
my_greeting[6:] => "Wordpress!"

However, I have been coding up a quicksort algorithm, and have discovered a simple little gotcha that it is worth being aware of. The beauty of quicksort, is that it sorts the list in-place. But you have to be careful in Python that you implement the recursion correctly. At first, I passed a subset (slice) of the list to be sorted into each recursive call:

def partition(p, pivot):
    """
        Divide the list into two parts a lesser and greater part
    """

    # walk the unpartitioned sub-array (between left and right boundaries)
    greater_start = 1
    for current in range(1, len(p)):
        if p[current] < pivot:
            p[greater_start], p[current] = p[current], p[greater_start]
            greater_start += 1

    # finally put the pivot into the correct spot
    p[0], p[greater_start - 1] = p[greater_start - 1], p[0]

    # identify partition boundaries
    lesser = (0, greater_start - 1)
    greater = (greater_start, len(p))

    return lesser, greater

def quicksort(p):
    if (len(p)) <= 1:
        return
    # arbitrarily use the first element as pivot
    pivot = p[0]

    lesser, greater = partition(p, pivot)

    # recurse on each partition
    quicksort(p[lesser[0]:lesser[1]])
    quicksort(p[greater[0]:greater[1]])

This appeared to work well, when you stepped through with a debugger (for example pdb) it was clear that the list was being properly sorted at each recursive call. BUT… if I printed the final list I got:

# list at start
my_list = [3, 4, 1, 2, 5]
quicksort(my_list)

# list after sorting:
print(my_list)
=> [2, 1, 3, 4, 5]

I couldn’t figure out what was going wrong! The problem was all to do with the slice I was passing to quicksort at each recursive call. I thought that I was passing a portion of ‘my_list’ in each recursive call, but really I was passing a COPY of the slice. This is an important distinction, it meant that each recursive partition was partitioning a COPY of ‘my_list’ and leaving the original list untouched! Ultimately, this meant the only time that ‘my_list’ was being partitioned was in the very first partition.

To get around this, I had to think a little more C-like. To do quicksort in C, you would have to pass memory address pointers to each recursive call. This way, C modifies the value at a certain memory address, NOT the value referenced by some place-holder variable. And this is the trick for Python here as well, you must pass ‘my_list’ into each partition call, and use values to denote left (l) and right (r) to tell partition which sub-set of ‘my_list’ to work on.

For example:

partition(my_list, pivot, l=4, r=5)
...
# --> will act on my_list between index 4 & 5 inclusive

The full, working quicksort is below:

def partition(p, pivot, l, r):
    """
        Divide the list into two parts a lesser and greater part
    """

    # walk the unpartitioned sub-array (between left and right boundaries)
    greater_start = l + 1
    for current in range(l, r):
        if p[current] < pivot:
            p[greater_start], p[current] = p[current], p[greater_start]
            greater_start += 1

    # finally put the pivot into the correct spot
    p[l], p[greater_start - 1] = p[greater_start - 1], p[l]

    # identify partition boundaries
    lesser = (l, greater_start - 1)
    greater = (greater_start, r)

    return lesser, greater

def quicksort(p, l, r):
    if (r - l) <= 1:
        return
    # arbitrarily use the first element as pivot
    pivot = p[l]

    lesser, greater = partition(p, pivot, l, r)

    # recurse on each partition
    quicksort(p, lesser[0], lesser[1])
    quicksort(p, greater[0], greater[1])
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s