By user2938093


2016-02-18 20:01:07 8 Comments

I have a pandas DataFrame with one column that looks like the following:

`
In [207]:df2.teams
Out[207]: 
0         [SF, NYG]
1         [SF, NYG]
2         [SF, NYG]
3         [SF, NYG]
4         [SF, NYG]
5         [SF, NYG]
6         [SF, NYG]
7         [SF, NYG]
`

I need to split this column of lists into two columns named team1 and team2 using pandas.

3 comments

@Joseph Davison 2018-06-15 17:03:07

Much simpler solution:

pd.DataFrame(df2.teams.tolist(), columns=['team1', 'team2'])

This yields:

  team1 team2
-------------
0    SF   NYG
1    SF   NYG
2    SF   NYG
3    SF   NYG
4    SF   NYG
5    SF   NYG
6    SF   NYG
7    SF   NYG

If you wanted to split a column of delimited strings rather than lists, you could similarly do:

pd.DataFrame(df.teams.str.split('<delim>', expand=True).values,
             columns=['team1', 'team2'])
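As a minimal sketch of the delimited-string case, assuming a hypothetical '-' delimiter and made-up sample rows (neither is from the original post): str.split with expand=True already returns a DataFrame, so wrapping it in pd.DataFrame is optional and the columns can simply be renamed.

```python
import pandas as pd

# Hypothetical sample data: team pairs joined by an assumed '-' delimiter
df = pd.DataFrame({'teams': ['SF-NYG', 'DAL-PHI']})

# expand=True returns one column per split piece
split = df['teams'].str.split('-', expand=True)
split.columns = ['team1', 'team2']

print(split)
```

Because the result keeps the index of df, it can be joined back onto the original frame without resetting anything.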

@mikkokotila 2018-01-09 11:53:39

There seems to be a syntactically simpler, and therefore easier to remember, alternative to the proposed solutions. I'm assuming that the column is called 'meta' in a DataFrame df:

df2 = pd.DataFrame(df['meta'].str.split().values.tolist())

@otteheng 2018-01-11 16:29:36

I got an error, but I resolved it by removing the str.split(). This was much simpler, and it has the advantage of working when you don't know the number of items in your list.
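A small sketch of the "unknown number of items" case, using made-up rows of different lengths: the DataFrame constructor pads shorter lists with missing values, and the column names can be generated from the resulting width. The team{i} naming is only an assumption for illustration.

```python
import pandas as pd

# Hypothetical sample data: lists of varying length
df = pd.DataFrame({'meta': [['SF', 'NYG'], ['DAL'], ['GB', 'CHI', 'MIN']]})

# Shorter rows are padded with missing values
wide = pd.DataFrame(df['meta'].tolist(), index=df.index)

# Name the columns after the fact, since the width is not known in advance
wide.columns = [f'team{i + 1}' for i in range(wide.shape[1])]

print(wide)
```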

@jezrael 2016-02-18 20:06:49

You can use the DataFrame constructor with lists created by converting the column to a NumPy array with values and then calling tolist:

import pandas as pd

d1 = {'teams': [['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],
                ['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG']]}
df2 = pd.DataFrame(d1)
print (df2)
       teams
0  [SF, NYG]
1  [SF, NYG]
2  [SF, NYG]
3  [SF, NYG]
4  [SF, NYG]
5  [SF, NYG]
6  [SF, NYG]

df2[['team1','team2']] = pd.DataFrame(df2.teams.values.tolist(), index= df2.index)
print (df2)
       teams team1 team2
0  [SF, NYG]    SF   NYG
1  [SF, NYG]    SF   NYG
2  [SF, NYG]    SF   NYG
3  [SF, NYG]    SF   NYG
4  [SF, NYG]    SF   NYG
5  [SF, NYG]    SF   NYG
6  [SF, NYG]    SF   NYG

And for new DataFrame:

df3 = pd.DataFrame(df2['teams'].values.tolist(), columns=['team1','team2'])
print (df3)
  team1 team2
0    SF   NYG
1    SF   NYG
2    SF   NYG
3    SF   NYG
4    SF   NYG
5    SF   NYG
6    SF   NYG

The solution with apply(pd.Series) is very slow:

#7k rows
df2 = pd.concat([df2]*1000).reset_index(drop=True)

In [89]: %timeit df2['teams'].apply(pd.Series)
1 loop, best of 3: 1.15 s per loop

In [90]: %timeit pd.DataFrame(df2['teams'].values.tolist(), columns=['team1','team2'])
1000 loops, best of 3: 820 µs per loop
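Outside IPython, a comparison along these lines can be sketched with the standard timeit module. The row count and the number of repetitions are arbitrary assumptions, and the absolute timings will differ by machine and pandas version, so no numbers are claimed here.

```python
import timeit
import pandas as pd

# Hypothetical sample data: 700 rows of two-item lists
df2 = pd.DataFrame({'teams': [['SF', 'NYG']] * 700})

via_apply = df2['teams'].apply(pd.Series)
via_tolist = pd.DataFrame(df2['teams'].tolist())

# Both approaches should produce the same values
same_values = via_apply.values.tolist() == via_tolist.values.tolist()

# Time each approach over a few runs
t_apply = timeit.timeit(lambda: df2['teams'].apply(pd.Series), number=3)
t_tolist = timeit.timeit(lambda: pd.DataFrame(df2['teams'].tolist()), number=3)

print(f'apply(pd.Series): {t_apply:.4f} s, tolist: {t_tolist:.4f} s')
```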

@Sherlock 2017-07-09 16:21:17

What if the column name has a space, like "team 1"? I tried to access the column this way: df2['team 1'], but it does not work.

@jezrael 2017-07-09 16:21:58

I think it works fine too.

@user1700890 2017-11-06 15:16:42

Minor caveat: if you are using it on an existing DataFrame, make sure to reset the index, otherwise it will not assign correctly.

@jezrael 2017-11-06 15:18:53

@user1700890 - yes, or specify the index in the DataFrame constructor: df2[['team1','team2']] = pd.DataFrame(df2.teams.values.tolist(), index= df2.index)
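To illustrate the index caveat, here is a sketch with a made-up frame whose index is not 0..n-1. It uses join rather than column assignment, on the assumption that the same index alignment is what causes the problem: a new frame built without index= gets a default RangeIndex and no longer lines up with the original rows.

```python
import pandas as pd

# Hypothetical sample data with a non-default index
df2 = pd.DataFrame({'teams': [['SF', 'NYG'], ['DAL', 'PHI']]}, index=[10, 20])

# Without index=, the new frame is indexed 0, 1 and matches no original row
no_index = pd.DataFrame(df2['teams'].tolist(), columns=['team1', 'team2'])
misaligned = df2.join(no_index)

# Passing index=df2.index keeps the rows lined up
with_index = pd.DataFrame(df2['teams'].tolist(), index=df2.index,
                          columns=['team1', 'team2'])
aligned = df2.join(with_index)

print(misaligned)
print(aligned)
```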

@mikkokotila 2018-01-09 11:49:54

@Sherlock it's generally good practice to rename that kind of column in any case ;)

@dondapati 2018-02-14 13:43:46

@jezrael Why should we use values? Could you provide an explanation?

@dondapati 2018-02-14 13:45:02

Because pd.DataFrame(df2.teams.tolist(), columns=['team1','team2']) gives the same answer. Is there any reason to use values?

@jezrael 2018-02-14 13:47:16

@user7462639 - Because of better performance. Check this - generally, converting NumPy arrays should be faster than converting a Series; pandas functions are generally slower than NumPy ones.

@jezrael 2018-02-14 13:48:18

@user7462639 No problem, happy coding!

@kit 2018-05-10 12:03:14

@jezrael - Can you please look at this? I am facing a lot of performance issues: stackoverflow.com/questions/50246976/…

@Ramya 2018-06-20 15:00:55

Thanks for this. I am using this method: pd.DataFrame(df2.teams.values.tolist(), index= df2.index), but if my data has missing values, it errors out with "object of type 'NoneType' has no len()". Is there a built-in fix for it?
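One possible workaround (a sketch, not a built-in pandas fix) is to swap each missing entry for a placeholder list before building the frame. The sample data and the [None, None] placeholder are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data with one missing entry
df2 = pd.DataFrame({'teams': [['SF', 'NYG'], None, ['DAL', 'PHI']]})

# Replace anything that is not a list with a two-item placeholder
filled = [t if isinstance(t, list) else [None, None] for t in df2['teams']]

teams = pd.DataFrame(filled, index=df2.index, columns=['team1', 'team2'])
result = df2.join(teams)

print(result)
```

This uses a list comprehension, so it is not vectorized, but it only touches the column once before the constructor does the rest.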

@Catbuilts 2018-11-20 11:08:23

It seems to me that apply() is likely to be slow for the most part. Should I avoid using the function if I have alternatives?

@jezrael 2018-11-20 11:08:59

@Catbuilts - yes, if a vectorized solution exists, it is best to avoid it.

@Catbuilts 2018-11-20 11:19:39

@jezrael: Thanks for the advice. By the way, what do you mean by a vectorized solution? NumPy is one way to vectorize, right? What else can be considered a vectorized solution? Thanks

@jezrael 2018-11-20 11:21:24

@Catbuilts - yes, obviously. Vectorized generally means no loops, so no apply, no for, no list comprehensions. But it depends on what exactly you need. Maybe this helps too.

@CheTesta 2019-02-11 09:31:10

@Catbuilts Indeed, apply() might be slower, but it is the go-to method when the input strings and values are not uniform across rows of the original Series!
