By Alex


2019-12-02 22:12:02 8 Comments

Currently have a dataframe that is countries by series, with values ranging from 0-25

I want to sort the df so that the highest values appear in the top left (first), while the lowest appear in the bottom right (last).

FROM

        A   B   C   D  ...

USA     4   0   10  16
CHN     2   3   13  22
UK      2   1   8   14
...

TO

      D   C   A   B   ...

CHN   22  13  2   3
USA   16  10  4   0
UK    14  8   2   1

...

In this, the column with the highest values is now first, and the same is true with the index.

I have considered reindexing, but this loses the 'Countries' Index.

    D   C   A   B   ...

0   22  13  2   3
1   16  10  4   0
2   14  8   2   1
...

I have thought about creating a new column and row that has the Mean or Sum of values for that respective column/row, but is this the most efficient way?

How would I then sort the DF after I have the new rows/columns??

Is there a way to reindex using...

df_mv.reindex(df_mv.mean(or sum)().sort_values(ascending = False).index, axis=1)

... that would allow me to keep the country index, and simply sort it accordingly?

Thanks for any and all advice or assistance.

EDIT

Intended result organizes columns AND rows from largest to smallest.

Regarding the first row of the A and B columns in the intended output, these are supposed to be 2, 3 respectively. This is because the intended result interprets the A column as greater than the B column in both sum and mean (even though either sum or mean can be considered for the 'value' of a row/column).

By saying the higher numbers would be in the top left, while the lower ones would be in the bottom right, I simply meant this as a general trend for the resulting df. It is the columns and rows as whole however, that are the intended focus. I apologize for the confusion.

4 comments

@Mykola Zotko 2019-12-02 22:58:56

Using numpy:

arr = df.to_numpy()
arr = arr[np.max(arr, axis=1).argsort()[::-1], :]
arr = np.sort(arr, axis=1)[:, ::-1]
df1 = pd.DataFrame(arr, index=df.index, columns=df.columns)

print(df1)

Output:

      A   B  C  D
USA  22  13  3  2
CHN  16  10  4  0
UK   14   8  2  1

@jorijnsmit 2019-12-02 22:42:53

Here's another way, this time without transposing but using axis=1 as an argument:

df = df.sort_values(df.max().idxmax(), ascending=False)
df = df.sort_values(df.index[0], axis=1, ascending=False)

@ansev 2019-12-02 22:37:59

You could use:

rows_index=df.max(axis=1).sort_values(ascending=False).index
col_index=df.max().sort_values(ascending=False).index
new_df=df.loc[rows_index,col_index]
print(new_df)

      D   C  A  B
CHN  22  13  2  3
USA  16  10  4  0
UK   14   8  2  1

@jorijnsmit 2019-12-02 22:43:51

This does not match the expected result.

@Erfan 2019-12-02 22:53:31

His actually does, yours does not. @jorijnsmit

@ansev 2019-12-02 22:54:17

Look well at the expected result, now that I notice it is your answer that does not match

@Mykola Zotko 2019-12-02 23:00:06

I don't understand why you have 2, 3 and not 3, 2 in the first row at the end.

@jorijnsmit 2019-12-02 23:04:26

Hmm question is ambiguous actually. OP asks for highest number in the top left (22) and lowest in the bottom right; which is 0!

@Alex 2019-12-03 00:31:03

Using df.max unfortunately did not yield the results I was looking for- however, keeping all else the same and changing that df.max to df.mean, worked out great! thank you!

@ansev 2019-12-03 00:31:39

I am glad to help you:)

@jorijnsmit 2019-12-02 22:18:31

Use .T to transpose rows to columns and vice versa:

df = df.sort_values(df.max().idxmax(), ascending=False)
df = df.T
df = df.sort_values(df.columns[0], ascending=False).T

Result:

>>> df
      D   C  B  A
CHN  22  13  3  2
USA  16  10  0  4
UK   14   8  1  2

@ansev 2019-12-02 22:27:17

Why do you know that you must order by D? I think this is not a general solution.

@jorijnsmit 2019-12-02 22:35:45

You are right! Edited my answer to sort by df.max().idxmax() instead of by arbitrarily selecting a column with a highest value.

@Mykola Zotko 2019-12-02 23:04:09

Shouldn't it be 4, 0 in the second row and 2, 1 in the last row at the end?

@jorijnsmit 2019-12-02 23:05:48

¯_(ツ)_/¯ question can be interpreted in multiple ways.

@ansev 2019-12-02 23:06:52

I think the expected output speaks for itself

@Mykola Zotko 2019-12-02 23:08:05

@ansev Ok. If you take the expected output of OP you get 2, 1 in the last row and not 1, 2.

@ansev 2019-12-02 23:10:37

which on the other hand is the output that OP expects

Related Questions

Sponsored Content

28 Answered Questions

[SOLVED] How to change the order of DataFrame columns?

19 Answered Questions

[SOLVED] How to iterate over rows in a DataFrame in Pandas?

23 Answered Questions

[SOLVED] Adding new column to existing DataFrame in Python pandas

13 Answered Questions

[SOLVED] Delete column from pandas DataFrame

11 Answered Questions

[SOLVED] How to drop rows of Pandas DataFrame whose value in a certain column is NaN

10 Answered Questions

[SOLVED] How to select rows from a DataFrame based on column values?

19 Answered Questions

[SOLVED] Add one row to pandas DataFrame

18 Answered Questions

[SOLVED] Get list from pandas DataFrame column headers

19 Answered Questions

[SOLVED] How to sort a dataframe by multiple column(s)

15 Answered Questions

[SOLVED] Selecting multiple columns in a pandas dataframe

Sponsored Content