By Harry


2020-02-14 08:27:25 8 Comments

I have a bunch of columns that I want to concatenate, and then I want to find which rows share the same value in the concatenated column. I have written some code, but my DataFrame is too big and it takes too long to finish.

This is what I have done.

import pandas as pd

l = [[1,'a','b','c','d'],[2,'a','c','c','d'],[3,'a','c','c','d'],[4,'a','b','b','d'],[5,'a','c','c','d']]
df = pd.DataFrame(l,columns = ['Serial No','one','two','three','four'])


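# Build the comparison key from three of the columns.
df["Conc"] = df["one"] + "____" + df["three"] + "____" + df["four"]
df['Yes/No'] = ""

# Compare every row with every other row: O(n^2) comparisons.
n_rows, n_cols = df.shape
for i in range(n_rows):
    for j in range(n_rows):
        # "Conc" is the second-to-last column, "Yes/No" is the last.
        if i != j and df.iloc[i, n_cols - 2] == df.iloc[j, n_cols - 2]:
            df.iloc[i, n_cols - 1] = "yes"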

This works on a small DataFrame, but on a large one it takes too long. Is there a more efficient way to produce the same result?

2 comments

@Mo7art 2020-02-14 08:49:39

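I think this is a faster way to solve this. DataFrame.duplicated with keep=False flags every row whose "Conc" value occurs more than once, so no explicit loop is needed.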

import pandas as pd

l = [[1, 'a', 'b', 'c', 'd'], [2, 'a', 'c', 'c', 'd'], [3, 'a', 'c', 'c', 'd'], [4, 'a', 'b', 'b', 'd'],
     [5, 'a', 'c', 'c', 'd']]
df = pd.DataFrame(l, columns=['Serial No', 'one', 'two', 'three', 'four'])

df["Conc"] = df["one"] + "____" + df["three"] + "____" + df["four"]

# keep=False marks every member of a duplicated group, not only the repeats.
df['Yes/No'] = df.duplicated(["Conc"], keep=False).map({True: "Yes", False: "No"})
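
If the "Conc" column is only needed for the comparison, it may be possible to skip the string concatenation entirely by passing the source columns to duplicated through its subset argument. The following is a minimal sketch under that assumption; the names key_cols and is_dup are illustrative, and lowercase "yes"/"no" are used to match the question.

```python
import numpy as np
import pandas as pd

rows = [[1, 'a', 'b', 'c', 'd'], [2, 'a', 'c', 'c', 'd'], [3, 'a', 'c', 'c', 'd'],
        [4, 'a', 'b', 'b', 'd'], [5, 'a', 'c', 'c', 'd']]
df = pd.DataFrame(rows, columns=['Serial No', 'one', 'two', 'three', 'four'])

# Columns that made up "Conc" in the question (illustrative name).
key_cols = ['one', 'three', 'four']

# True for every row whose combination of key columns occurs more than once.
is_dup = df.duplicated(subset=key_cols, keep=False)

# Turn the boolean mask into the labels used in the question.
df['Yes/No'] = np.where(is_dup, 'yes', 'no')
print(df)
```

Comparing the columns directly also avoids false matches that could arise if the separator string happened to appear inside one of the values.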

@Guillem 2020-02-14 08:42:08

You can use broadcasting to get rid of the inner loop: compare each row's value against the whole column at once.

import pandas as pd

l = [[1,'a','b','c','d'],[2,'a','c','c','d'],[3,'a','c','c','d'],[4,'a','b','b','d'],[5,'a','c','c','d']]
df = pd.DataFrame(l,columns = ['Serial No','one','two','three','four'])


df["Conc"] = df["one"] + "____" + df["three"] + "____" + df["four"]
df['Yes/No'] = ""

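for i in range(df.shape[0]):
    # Compare row i's key against the whole column at once (broadcasting).
    matches = df.iloc[i, -2] == df.Conc
    # Row i always matches itself, so require more than one match.
    df.iloc[i, -1] = 'yes' if matches.sum() > 1 else 'no'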
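
The remaining loop can probably be dropped as well. Since the question also asks how many rows share the same value, one option is to group by "Conc" and broadcast each group's size back to its rows. This is only a sketch, not part of the answer above; the "Count" column and the group_size name are assumptions added for illustration.

```python
import pandas as pd

rows = [[1, 'a', 'b', 'c', 'd'], [2, 'a', 'c', 'c', 'd'], [3, 'a', 'c', 'c', 'd'],
        [4, 'a', 'b', 'b', 'd'], [5, 'a', 'c', 'c', 'd']]
df = pd.DataFrame(rows, columns=['Serial No', 'one', 'two', 'three', 'four'])
df["Conc"] = df["one"] + "____" + df["three"] + "____" + df["four"]

# Size of the group each row belongs to, aligned with the original index.
group_size = df.groupby("Conc")["Conc"].transform("size")

# "Count" is a hypothetical extra column: how many rows share this key.
df["Count"] = group_size

# A row has a match when its group contains more than just itself.
df["Yes/No"] = group_size.gt(1).map({True: "yes", False: "no"})
print(df)
```

Grouping visits each row once, so the cost grows roughly linearly with the number of rows instead of quadratically.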
