By Ibrahim Rahimi

2019-03-11 10:01:43 8 Comments

I am doing a code challenge of which ask for following things: Given an array of integers, sort its elements by the difference of their largest and smallest digits. In the case of a tie, that with the larger index in the array should come first.


For a = [152, 23, 7, 887, 243], the output should be
digitDifferenceSort(a) = [7, 887, 23, 243, 152].

Here are the differences of all the numbers:

152: difference = 5 - 1 = 4;
23: difference = 3 - 2 = 1;
7: difference = 7 - 7 = 0;
887: difference = 8 - 7 = 1;
243: difference = 4 - 2 = 2.

23 and 887 have the same difference, but 887 goes after 23 in a, so in the sorted array it comes first. I wrote following code using python3 and it passes all normal tests but it cannot pass execution time tests. How can I improve my code to decrease it's execution time for list with considerable amount of elements?

def digitDifferenceSort(a):
    diff = []
    for i in a:
        i = list(str(i))

    for i in range(len(diff)):
        for j in range(i+1, len(diff)):
            if int(max(diff[i])) - int(min(diff[i])) > int(max(diff[j])) - int(min(diff[j])):
                diff[j], diff[i] = diff[i], diff[j]
            elif int(max(diff[i])) - int(min(diff[i])) == int(max(diff[j])) - int(min(diff[j])):
                diff[i], diff[j] = diff[j], diff[i]           
    new_list = []          
    for i in diff:
        b = ''
        for j in i:
            b = b + j


    return new_list


@Graipher 2019-03-11 10:30:07

Python has a built-in sorted function, you should use it. What it needs to sort according to some special criteria is a key function:

def max_digit_diff(n):
    n_str = str(n)
    return int(max(n_str)) - int(min(n_str))

This uses the fact that "0" < "1" < ... < "9".

However, the sorted function uses a stable sorting algorithm, so if two elements compare equal, the original order is preserved. But here we want the opposite order (later elements come first), so we just reverse the list first:

def digit_difference_sort(a):
    return sorted(reversed(a), key=max_digit_diff)

This should be vastly easier to read than your convoluted function. Note that the function names also follow Python's official style-guide, PEP8.

Like all (good) sorting functions, this is \$\mathcal{O}(n \log n)\$. Here is a timing comparison to your function with arrays up to length 10k (at which point your function takes more than a minute...).

enter image description here

Here is an implementation of the radix sort suggested by @JollyJoker in their answer:

from itertools import chain

def radix_sort(a):
    sub_a = [[] for _ in range(10)]
    for x in a:
    return list(chain.from_iterable(reversed(x) for x in sub_a))

This seems to have the same complexity as my approach, probably the implementation of max_digit_diff actually dominates this:

enter image description here

@Ibrahim Rahimi 2019-03-11 10:57:58

Thanks a lot. Would you please give more clarification about passing max_digit_diff method as a parameter to sorted function and how this part works?

@Graipher 2019-03-11 11:00:30

@IbrahimRahimi: sorted calls the function you specify as a key exactly once for each input and sorts according to that. Have a look at the official Sorting HOW TO for more information.

@gustavovelascoh 2019-03-11 12:30:53

Nice answer. Just curious, How did you generate the comparison graph? is there a package for doing it?

@Graipher 2019-03-11 13:56:46

@gustavovelascoh: It is basically done with the code in my question here, with some input from the answers. One of these days I will finally make it look pretty and upload it to github...

@JollyJoker 2019-03-11 14:11:43

You can do better than \$\mathcal{O}(n \log n)\$ using a Radix sort.

The differences can only have values 0-9, so you can sort the original array into a list of 10 lists while just going through the array once. Then, for each list 0-9, pop() the values into an output list until the list is empty.

@Graipher 2019-03-11 14:30:48

Added an implementation of this to my answer and included it in the timings. Interestingly it is exactly the same as using the built-in sorted. Probably due to the fact that both use max_digit_diff.

@JollyJoker 2019-03-11 15:02:46

@Graipher The scaling seems odd. Could you add some timing check before the return row just to check there's nothing slow in the last line? Then again, maybe sorted just is that good.

@Graipher 2019-03-11 15:18:38

Weirdly, it looks exactly the same when directly returning sub_a. Also, max_digit_diff is basically constant time, obviously, since even the longest numbers have only a few digits (less than a hundred).

@Graipher 2019-03-11 15:27:11

I also tried arrays with up to 10k elements, still the same (without the OPs algorithm, obviously).

@Graipher 2019-03-11 15:28:28

Since there are only ten distinct possible values for the digit difference, organizing the array into runs is very efficient (, so I think Python's timsort basically performs partial radix sort as a first step.

@JollyJoker 2019-03-11 15:48:51

@Graipher You're probably right on Timsort. BTW, good job on turning my answer into actual code :) I wasn't at all certain my text was clear enough.

Related Questions

Sponsored Content

1 Answered Questions

1 Answered Questions

Find max in-order difference in array

2 Answered Questions

2 Answered Questions

[SOLVED] Solving this Python implementation for Stock Gains

1 Answered Questions

[SOLVED] "Max Min" in HackerRank

0 Answered Questions

2 Answered Questions

[SOLVED] Finding longest common prefix

2 Answered Questions

[SOLVED] Improving excution time for list and sampling in Python

2 Answered Questions

[SOLVED] Find Missing Numbers in an int list

Sponsored Content