By jiggy357

2018-06-13 14:14:00 8 Comments

I have two columns home and away. So one row will be England vs Brazil and the next row will be Brazil England. How can I count occurrences of when Brazil faces England or England vs Brazil in one count?

Based on previous solutions, I have tried

results.groupby(["home_team", "away_team"]).size()
results.groupby(["away_team", "home_team"]).size()

however this does not give me the outcome that I am looking for.

Undesired output: home_team away_team
England Brazil 1

away_team home_team
Brazil England 1

I would like to see: England Brazil 2


@HerrIvan 2018-06-13 14:41:36

Option 1

You can use numpy.sort to sort the values of the dataframe However, as that sorts in place, maybe it is better to create a copy of the dataframe.

dfTeams = pd.DataFrame(data=df.values.copy(), columns=['team1','team2'])

(I changed the column names, because with the sorting you are changing their meaning)

After having done this, you can use your groupby.

results.groupby(['team1', 'team2']).size()

Option 2

Since a more general title for your question would be something like how can I count combination of values in multiple columns on a dataframe, independently of their order, you could use a set.

A set object is an unordered collection of distinct hashable objects.

More precisely, create a Series of frozen sets, and then count values.

pd.Series(map(lambda home, away: frozenset({home, away}), 

Note: I use the dataframe in @Harv Ipan's answer.

@jezrael 2018-06-13 14:33:45

You can sort values by numpy.sort, create DataFrame and use your original solution:

df1 = (pd.DataFrame(np.sort(df[['home','away']], axis=1), columns=['home','away'])
        .groupby(["home", "away"])

@harvpan 2018-06-13 14:22:29

May be you need below:

df = pd.DataFrame({
    'home':['England', 'Brazil', 'Spain'],
    'away':['Brazil', 'England', 'Germany']

pd.Series('-'.join(sorted(tup)) for tup in zip(df['home'], df['away'])).value_counts()


Brazil-England    2
Germany-Spain     1
dtype: int64

PS: If you do not like the - between team names, you can use:

pd.Series(' '.join(sorted(tup)) for tup in zip(df['home'], df['away'])).value_counts()

@jiggy357 2018-06-13 14:37:04

This is exactly what I was looking for. Thanks for the help

@harvpan 2018-06-13 14:40:06

@jiggy357, glad I could help.

Related Questions

Sponsored Content

25 Answered Questions

[SOLVED] How can I safely create a nested directory in Python?

16 Answered Questions

[SOLVED] How can I make a time delay in Python?

23 Answered Questions

[SOLVED] How can you profile a Python script?

26 Answered Questions

[SOLVED] How can I remove a trailing newline in Python?

  • 2008-11-08 18:25:24
  • RidingThisToTheTop
  • 1409012 View
  • 1430 Score
  • 26 Answer
  • Tags:   python newline trailing

4 Answered Questions

[SOLVED] pandas create new column based on values from other columns

  • 2014-11-12 12:08:12
  • Dave
  • 235025 View
  • 170 Score
  • 4 Answer
  • Tags:   python numpy pandas

13 Answered Questions

[SOLVED] "Large data" work flows using pandas

4 Answered Questions

[SOLVED] Apply multiple functions to multiple groupby columns

43 Answered Questions

[SOLVED] How can I represent an 'Enum' in Python?

2 Answered Questions

[SOLVED] Getting the size of a groupby operation in Pandas

  • 2017-07-06 14:40:22
  • GreenGodot
  • 118 View
  • 2 Score
  • 2 Answer
  • Tags:   python pandas

2 Answered Questions

[SOLVED] Subtracting Two Columns with a Groupby in Pandas

Sponsored Content