By natsuki_2002


2013-10-20 21:18:37 8 Comments

I want to get a list of the column headers from a pandas DataFrame. The DataFrame will come from user input so I won't know how many columns there will be or what they will be called.

For example, if I'm given a DataFrame like this:

>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7

I would want to get a list like this:

>>> header_list
['y', 'gdp', 'cap']

18 comments

@rohit singh 2019-04-16 06:32:43

%%timeit
final_df.columns.values.tolist()
948 ns ± 19.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%%timeit
list(final_df.columns)
14.2 µs ± 79.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
list(final_df.columns.values)
1.88 µs ± 11.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%%timeit
final_df.columns.tolist()
12.3 µs ± 27.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
list(final_df.head(1).columns)
163 µs ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
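
For anyone without IPython at hand, the comparison above can be approximated with the standard timeit module. This is only a rough sketch: final_df is not shown in the answer, so the three-column frame below is a made-up stand-in, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Made-up stand-in for final_df, which the answer does not show.
final_df = pd.DataFrame({"y": [1, 2, 8], "gdp": [2, 3, 7], "cap": [5, 9, 2]})

# The five expressions timed above, wrapped so timeit can call them.
candidates = {
    "final_df.columns.values.tolist()": lambda: final_df.columns.values.tolist(),
    "list(final_df.columns)": lambda: list(final_df.columns),
    "list(final_df.columns.values)": lambda: list(final_df.columns.values),
    "final_df.columns.tolist()": lambda: final_df.columns.tolist(),
    "list(final_df.head(1).columns)": lambda: list(final_df.head(1).columns),
}

# Each expression should yield the same list of headers.
results = {name: make_list() for name, make_list in candidates.items()}

runs = 2_000
for name, make_list in candidates.items():
    seconds = timeit.timeit(make_list, number=runs)
    print(f"{name:34s} {seconds / runs * 1e6:9.2f} us per call")
```

The results dictionary is there to confirm that all five spellings return the same headers before their speed is compared.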

@cs95 2019-04-03 09:18:29

Surprised I haven't seen this posted so far, so I'll just leave this here.

Extended Iterable Unpacking (python3.5+): [*df] and Friends

Unpacking generalizations (PEP 448) have been introduced with Python 3.5. So, the following operations are all possible.

df = pd.DataFrame('x', columns=['A', 'B', 'C'], index=range(5))
df

   A  B  C
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x 

If you want a list....

[*df]
# ['A', 'B', 'C']

Or, if you want a set,

{*df}
# {'A', 'B', 'C'}

Or, if you want a tuple,

*df,  # Please note the trailing comma
# ('A', 'B', 'C')

Or, if you want to store the result somewhere,

*cols, = df  # A wild comma appears, again
cols
# ['A', 'B', 'C']

... if you're the kind of person who converts coffee to typing sounds, well, this is going to consume your coffee more efficiently ;)

P.S.: if performance is important, you will want to ditch the solutions above in favour of

df.columns.to_numpy().tolist()
# ['A', 'B', 'C']

This is similar to Ed Chum's answer, but updated for v0.24 where .to_numpy() is preferred to the use of .values. See this answer (by me) for more information.

Visual Check
Since I've seen this discussed in other answers, you can utilise iterable unpacking (no need for explicit loops).

print(*df)
A B C

print(*df, sep='\n')
A
B
C

Critique of Other Methods

Don't use an explicit for loop for an operation that can be done in a single line (List comprehensions are okay).

Next, using sorted(df) does not preserve the original order of the columns. For that, you should use list(df) instead.

Next, list(df.columns) and list(df.columns.values) are poor suggestions (as of the current version, v0.24). Both Index (returned from df.columns) and NumPy arrays (returned by df.columns.values) define a .tolist() method, which is faster and more idiomatic.

Lastly, listification i.e., list(df) should only be used as a concise alternative to the aforementioned methods.
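
To make the ordering point concrete, here is a small sketch on a frame modelled on the one in the question (the values are shortened and purely illustrative). It compares the order-preserving spellings with sorted(df).

```python
import pandas as pd

# Sample frame modelled on the one in the question (values shortened).
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

in_order = df.columns.tolist()  # Index.tolist(): keeps the column order
listified = list(df)            # iterating a DataFrame yields its column labels
unpacked = [*df]                # the same iteration, via unpacking
alphabetical = sorted(df)       # a sorted copy: the original order is lost

print(in_order)      # ['y', 'gdp', 'cap']
print(alphabetical)  # ['cap', 'gdp', 'y']
```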

@Simeon Visser 2013-10-20 21:23:07

You can get the values as a list by doing:

list(my_dataframe.columns.values)

Also you can simply use: (as shown in Ed Chum's answer):

list(my_dataframe)

@Tjorriemorrie 2014-11-21 08:30:27

Why does this doc not have columns as an attribute?

@Simeon Visser 2014-11-21 10:18:48

@Tjorriemorrie: I'm not sure, it may have to do with the way they automatically generate their documentation. It is mentioned in other places though: pandas.pydata.org/pandas-docs/stable/…

@alvas 2016-01-13 06:48:03

I would have expected something like df.column_names(). Is this answer still right or is it outdated?

@Simeon Visser 2016-01-13 09:30:26

@alvas there are various other ways to do it (see other answers on this page) but as far as I know there isn't a method on the dataframe directly to produce the list.

@WindChimes 2016-01-25 13:07:44

Importantly, this preserves the column order.

@Davos 2018-05-02 07:20:02

I tried using this with unittest assertListEqual to check the headers in a df matched an expected list, and it tells me it's not a list, but rather a sequence, it looks like array(['colBoolean','colTinyint', 'colSmallnt', ...], dtype=object)

@StefanK 2018-05-09 08:22:12

df.keys().tolist() is more universal, because it also works for pandas versions older than 0.16.0

@Igor Jakovljevic 2018-11-23 09:53:16

The solution provided above is nice, but I would also expect something like frame.column_names() to be a function in pandas. Since it is not, maybe it would be nice to use the following syntax. It somehow preserves the feeling that you are using pandas in a proper way by calling the "tolist" function: frame.columns.tolist()

@Timbus Calin 2019-03-16 07:35:07

Note that dataframe[column_name].to_numpy() is the suggested method to get the values of a column as of pandas 0.24.1

@cs95 2019-04-03 09:50:39

This first option is terrible (as of the current version of pandas - v0.24) because it is mixing idioms. If you are going through the trouble to access the numpy array, please use the .tolist() method instead, it is faster and more idiomatic.

@EdChum - Reinstate Monica 2013-10-20 22:25:15

There is a built in method which is the most performant:

my_dataframe.columns.values.tolist()

.columns returns an Index, .columns.values returns an array and this has a helper function .tolist to return a list.
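
The three steps of that chain can be inspected one at a time. This is a sketch on a made-up frame shaped like the one in the question; the exact array type printed for the middle step may depend on the pandas version, so no output is shown.

```python
import pandas as pd

# Made-up frame shaped like the one in the question.
my_dataframe = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

columns_index = my_dataframe.columns  # a pandas Index
columns_array = columns_index.values  # the array underlying the Index
header_list = columns_array.tolist()  # a plain Python list

# Print the type name at each step of the chain.
for step in (columns_index, columns_array, header_list):
    print(type(step).__name__)
```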

If performance is not as important to you, Index objects define a .tolist() method that you can call directly:

my_dataframe.columns.tolist()

The difference in performance is obvious:

%timeit df.columns.tolist()
16.7 µs ± 317 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit df.columns.values.tolist()
1.24 µs ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For those who hate typing, you can just call list on df, as so:

list(df)

@Sascha Gottfried 2018-05-08 09:19:21

Did not down vote, but want to explain: do not rely on implementation details, use "public interface" of DataFrame. Think about the beauty of df.keys()

@EdChum - Reinstate Monica 2018-05-08 09:27:02

@SaschaGottfried the implementation of the DataFrame iterable has not changed since day one: pandas.pydata.org/pandas-docs/stable/basics.html#iteration. The iterable returned from a DataFrame has always been the columns, so doing for col in df: should always behave the same unless the developers have a meltdown, and list(df) is and should still be a valid method. Note that df.keys() is calling into the internal implementation of the dict-like structure, returning the keys, which are the columns. Inexplicable downvotes are the collateral damage to be expected on SO, so don't worry

@Sascha Gottfried 2018-05-08 11:25:35

I was referring to the implementation details of the columns attribute. An hour ago I read about the Law of Demeter, which promotes that the caller should not depend on navigating the internal object model. list(df) does explicit type conversion. Notable side effect: execution time and memory consumption increase with dataframe size. The df.keys() method is part of the dict-like nature of a DataFrame. Notable fact: execution time for df.keys() is rather constant regardless of dataframe size - part of the responsibility of pandas developers.

@EdChum - Reinstate Monica 2018-05-08 12:16:15

@SaschaGottfried I can add this to my answer and credit you seeing as no one else has included this

@Sascha Gottfried 2018-05-08 12:54:12

I can see value in given answer as well as in comments - no need to change anything.

@EdChum - Reinstate Monica 2019-04-03 09:16:00

@coldspeed is df.columns.tolist() the same as df.columns.values.tolist()? calling .values decays to numpy which was why it was faster originally

@Igor Jakovljevic 2019-02-14 10:58:42

The solution provided above is nice, but I would also expect something like frame.column_names() to be a function in pandas. Since it is not, maybe it would be nice to use the following syntax. It somehow preserves the feeling that you are using pandas in a proper way by calling the "tolist" function: frame.columns.tolist()

frame.columns.tolist() 

@cs95 2019-04-03 09:40:18

Already covered (and beaten to death) in other answers.

@Harikrishna 2018-08-22 20:23:17

This gives us the names of columns in a list:

list(my_dataframe.columns)

Another function called tolist() can be used too:

my_dataframe.columns.tolist()

@cs95 2019-04-03 09:43:12

This has already been covered in other answers. Your first solution also mixes idioms, which is not a great idea. See my comment under another answer.

@Joseph True 2018-08-22 16:17:27

For a quick, neat, visual check, try this:

for col in df.columns:
    print(col)

@Sunitha G 2018-06-11 06:26:59

This solution lists all the columns of your object my_dataframe:

print(list(my_dataframe))

@cs95 2019-04-03 09:40:23

Already covered in other answers.

@Sascha Gottfried 2014-01-23 17:23:40

A DataFrame follows the dict-like convention of iterating over the “keys” of the objects.

my_dataframe.keys()

Create a list of keys/columns - object method to_list() and pythonic way

my_dataframe.keys().to_list()
list(my_dataframe.keys())

Basic iteration on a DataFrame returns column labels

[column for column in my_dataframe]

Do not convert a DataFrame into a list, just to get the column labels. Do not stop thinking while looking for convenient code samples.

xlarge = pd.DataFrame(np.arange(100000000).reshape(10000,10000))
list(xlarge) #compute time and memory consumption depend on dataframe size - O(N)
list(xlarge.keys()) #constant time operation - O(1)
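
As a small check of the dict-like behaviour described above, this sketch (on a made-up three-column frame, not the large one) compares keys() with the columns attribute and with plain iteration.

```python
import pandas as pd

# Made-up frame shaped like the one in the question.
my_dataframe = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

keys_index = my_dataframe.keys()                       # an Index of column labels
same_labels = keys_index.equals(my_dataframe.columns)  # compare with .columns
keys_list = list(keys_index)                           # plain Python list
iterated = [column for column in my_dataframe]         # iteration yields labels

print(keys_list)  # ['y', 'gdp', 'cap']
print(iterated)   # ['y', 'gdp', 'cap']
```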

@cs95 2019-04-03 09:45:11

My tests show df.columns is a lot faster than df.keys(). Not sure why they have both a function and attribute for the same thing (well, it isn't the first time I've seen 10 different ways to do something in pandas).

@Sascha Gottfried 2019-04-09 10:05:42

The intention of my answer was to show a couple of ways to query column labels from a DataFrame and highlight a performance anti-pattern. Nevertheless I like your comments and upvoted your recent answer - since they provide value from a software engineering point of view.

@Vivek 2018-02-16 18:36:08

as answered by Simeon Visser...you could do

list(my_dataframe.columns.values) 

or

list(my_dataframe) # for less typing.

But I think the sweet spot is:

list(my_dataframe.columns)

It is explicit, at the same time not unnecessarily long.

@cs95 2019-04-03 09:42:22

"It is explicit, at the same time not unnecessarily long." I disagree. Calling list has no merit unless you are calling it on df directly (for, example, conciseness). Accessing the .columns attribute returns an Index object that has a tolist() method defined on it, and calling that is more idiomatic than listifying the Index. Mixing idioms just for the sake of completeness is not a great idea. Same goes for listifying the array you get from .values.

@Alexander 2015-05-28 15:58:05

>>> list(my_dataframe)
['y', 'gdp', 'cap']

To list the columns of a dataframe while in debugger mode, use a list comprehension:

>>> [c for c in my_dataframe]
['y', 'gdp', 'cap']

By the way, you can get a sorted list simply by using sorted:

>>> sorted(my_dataframe)
['cap', 'gdp', 'y']

@alvas 2016-01-13 06:49:43

Would that list(df) work only with autoincrement dataframes? Or does it work for all dataframes?

@Alexander 2016-01-13 07:28:26

Should work for all. When you are in the debugger, however, you need to use a list comprehension [c for c in df].

@StefanK 2017-12-13 14:47:36

I feel this question deserves additional explanation.

As @fixxxer noted, the answer depends on the pandas version you are using in your project, which you can get with pd.__version__.

If, like me, you are for some reason using a pandas version older than 0.16.0 (on Debian jessie I use 0.14.1), then you need to use:

df.keys().tolist() because there is no df.columns method implemented yet.

The advantage of this keys method is that it also works in newer versions of pandas, so it's more universal.
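
A minimal sketch of that approach, assuming a made-up frame shaped like the one in the question. It prints the installed version and then builds the list through keys():

```python
import pandas as pd

# Made-up frame shaped like the one in the question.
my_dataframe = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

# The installed pandas version, as a string.
print(pd.__version__)

# keys() returns the column labels; tolist() turns them into a plain list.
header_list = my_dataframe.keys().tolist()
print(header_list)  # ['y', 'gdp', 'cap']
```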

@cs95 2019-04-04 21:00:03

The con of keys() is that it is a function call rather than an attribute lookup, so it's always going to be slower. Of course, with constant time accesses, no one really cares about differences like these, but I think it's worth mentioning anyway; df.columns is now a more universally accepted idiom for accessing headers.

@firelynx 2016-03-30 07:19:35

In the Notebook

For data exploration in the IPython notebook, my preferred way is this:

sorted(df)

Which will produce an easy to read alphabetically ordered list.

In a code repository

In code I find it more explicit to do

df.columns

Because it tells others reading your code what you are doing.

@cs95 2019-04-03 09:45:53

sorted(df) changes order. Use with caution.

@firelynx 2019-04-03 11:48:39

@coldspeed I do mention this though "Which will produce an easy to read alphabetically ordered list."

@Anton Protopopov 2015-12-04 21:41:53

It's interesting that df.columns.values.tolist() is almost 3 times faster than df.columns.tolist(); I thought that they were the same:

In [97]: %timeit df.columns.values.tolist()
100000 loops, best of 3: 2.97 µs per loop

In [98]: %timeit df.columns.tolist()
10000 loops, best of 3: 9.67 µs per loop

@cs95 2019-04-03 09:48:34

Timings have already been covered in this answer. The reason for the discrepancy is that .values returns the underlying numpy array, and doing something with numpy is almost always faster than doing the same thing with pandas directly.

@fixxxer 2015-04-07 14:50:33

It gets even simpler (as of pandas 0.16.0):

df.columns.tolist()

will give you the column names in a nice list.

@tegan 2014-12-01 20:31:56

Did some quick tests, and perhaps unsurprisingly the built-in version using dataframe.columns.values.tolist() is the fastest:

In [1]: %timeit [column for column in df]
1000 loops, best of 3: 81.6 µs per loop

In [2]: %timeit df.columns.values.tolist()
10000 loops, best of 3: 16.1 µs per loop

In [3]: %timeit list(df)
10000 loops, best of 3: 44.9 µs per loop

In [4]: %timeit list(df.columns.values)
10000 loops, best of 3: 38.4 µs per loop

(I still really like the list(dataframe) though, so thanks EdChum!)

@BrenBarn 2013-10-20 21:20:06

That's available as my_dataframe.columns.

@yeliabsalohcin 2017-09-05 12:59:47

And explicitly as a list by header_list = list(my_dataframe.columns)

@cs95 2019-04-03 09:52:24

^ Or better still: df.columns.tolist().

@user21988 2013-10-20 21:43:30

n = []
for i in my_dataframe.columns:
    n.append(i)
print(n)

@Sascha Gottfried 2014-01-23 16:22:46

please replace it with a list comprehension.

@Anton Protopopov 2015-12-04 21:31:21

change your first 3 lines to [n for n in dataframe.columns]

@cs95 2019-04-03 09:36:49

Why would you want to go through all this trouble for an operation you can easily do in a single line?
