By ABusy_developerT
2019-12-27 13:13:09

My Input dataframe looks like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
    'col1': [20, np.nan, np.nan, np.nan, 25, np.nan, np.nan,np.nan, 20, np.nan,np.nan,np.nan,25, np.nan,np.nan, np.nan],
    'col2': [np.nan, 'a',np.nan, np.nan, np.nan, 'b', np.nan, np.nan,np.nan, 'c', np.nan, np.nan,np.nan, 'd', np.nan, np.nan],
    'col3': [np.nan, np.nan, 'aa', np.nan, np.nan,np.nan,'bb', np.nan,np.nan, np.nan,'cc', np.nan,np.nan, np.nan,'dd', np.nan],
    'col4': [np.nan, np.nan, np.nan, 50, np.nan, np.nan, np.nan, 200,np.nan, np.nan, np.nan, 100,np.nan, np.nan, np.nan, 300]
})


I need to roll up the missing values in all columns within each grouping of 'key' and 'col1', i.e. fill the NaN cells with the single non-null value available in that column for that group.

I tried fillna with the bfill and ffill methods, but it doesn't fill the values correctly: it picks values from the group above or below (especially in the cells at the group borders). The output should look like this:

   key  col1 col2 col3   col4
0    1  20.0    a   aa   50.0
1    1  25.0    b   bb  200.0
2    2  20.0    c   cc  100.0
3    2  25.0    d   dd  300.0
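The leak is easy to demonstrate: a plain ffill with no grouping drags key 1's last value into the key 2 block, for example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
    'col1': [20, np.nan, np.nan, np.nan, 25, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan, 25, np.nan, np.nan, np.nan],
    'col2': [np.nan, 'a', np.nan, np.nan, np.nan, 'b', np.nan, np.nan, np.nan, 'c', np.nan, np.nan, np.nan, 'd', np.nan, np.nan],
    'col3': [np.nan, np.nan, 'aa', np.nan, np.nan, np.nan, 'bb', np.nan, np.nan, np.nan, 'cc', np.nan, np.nan, np.nan, 'dd', np.nan],
    'col4': [np.nan, np.nan, np.nan, 50, np.nan, np.nan, np.nan, 200, np.nan, np.nan, np.nan, 100, np.nan, np.nan, np.nan, 300]
})

filled = df.ffill()
# row 8 belongs to key 2, yet its col2 is filled with 'b' from the key 1 block
print(filled.loc[8, 'col2'])
```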


@jezrael 2019-12-27 13:20:48

The first idea is to use GroupBy.apply, compacting each column separately with Series.dropna and the Series constructor:

f = lambda x : x.apply(lambda y: pd.Series(y.dropna().to_numpy()))
df1 = (df.groupby('key').apply(f)
         .drop(columns='key')
         .dropna(how='all')
         .reset_index(level=1, drop=True)
         .reset_index())
print (df1)
   key  col1 col2 col3   col4
0    1  20.0    a   aa   50.0
1    1  25.0    b   bb  200.0
2    2  20.0    c   cc  100.0
3    2  25.0    d   dd  300.0
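To see what the helper lambda produces, you can run it on a single group by hand; each column is compacted to its non-null values and reindexed from 0 (a small sketch of the intermediate step):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
    'col1': [20, np.nan, np.nan, np.nan, 25, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan, 25, np.nan, np.nan, np.nan],
    'col2': [np.nan, 'a', np.nan, np.nan, np.nan, 'b', np.nan, np.nan, np.nan, 'c', np.nan, np.nan, np.nan, 'd', np.nan, np.nan],
    'col3': [np.nan, np.nan, 'aa', np.nan, np.nan, np.nan, 'bb', np.nan, np.nan, np.nan, 'cc', np.nan, np.nan, np.nan, 'dd', np.nan],
    'col4': [np.nan, np.nan, np.nan, 50, np.nan, np.nan, np.nan, 200, np.nan, np.nan, np.nan, 100, np.nan, np.nan, np.nan, 300]
})

# same helper as above: compact every column to its non-null values
f = lambda x: x.apply(lambda y: pd.Series(y.dropna().to_numpy()))

g = df[df['key'] == 1].drop(columns='key')
res = f(g)
# two rows remain for this key: (20.0, 'a', 'aa', 50.0) and (25.0, 'b', 'bb', 200.0)
print(res)
```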

Or reshape with DataFrame.stack, add a counter level with GroupBy.cumcount, and reshape back with Series.unstack:

df1 = df.set_index('key').stack().to_frame('val')
df1 = (df1.set_index(df1.groupby(level=[0,1]).cumcount(), append=True)['val']
         .unstack(1)
         .reset_index(level=1, drop=True)
         .reset_index())
print (df1)
   key col1 col2 col3 col4
0    1   20    a   aa   50
1    1   25    b   bb  200
2    2   20    c   cc  100
3    2   25    d   dd  300
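The same counter idea can be spelled out in long format with an explicit helper column and DataFrame.pivot; a sketch, assuming pandas >= 1.1 for the list-valued index argument:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
    'col1': [20, np.nan, np.nan, np.nan, 25, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan, 25, np.nan, np.nan, np.nan],
    'col2': [np.nan, 'a', np.nan, np.nan, np.nan, 'b', np.nan, np.nan, np.nan, 'c', np.nan, np.nan, np.nan, 'd', np.nan, np.nan],
    'col3': [np.nan, np.nan, 'aa', np.nan, np.nan, np.nan, 'bb', np.nan, np.nan, np.nan, 'cc', np.nan, np.nan, np.nan, 'dd', np.nan],
    'col4': [np.nan, np.nan, np.nan, 50, np.nan, np.nan, np.nan, 200, np.nan, np.nan, np.nan, 100, np.nan, np.nan, np.nan, 300]
})

# long format: one row per non-null (key, column, value) triple
long = df.set_index('key').stack().rename('val').reset_index()
# occurrence counter per (key, column) pair -> target row within each key
long['n'] = long.groupby(['key', 'level_1']).cumcount()
out = (long.pivot(index=['key', 'n'], columns='level_1', values='val')
           .reset_index(level=1, drop=True)
           .reset_index())
print(out)
```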

Or create lists per group first and then unpack them into DataFrames:

df1 = df.groupby('key').agg(lambda x: x.dropna().tolist())
comp =[pd.DataFrame(df1[x].tolist(), index=df1.index) for x in df1.columns]
df1 = (pd.concat(comp, axis=1, keys=df1.columns).stack()
        .reset_index(level=1, drop=True)
        .reset_index())
print (df1)
   key  col1 col2 col3   col4
0    1  20.0    a   aa   50.0
1    1  25.0    b   bb  200.0
2    2  20.0    c   cc  100.0
3    2  25.0    d   dd  300.0
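If pandas >= 1.3 is available, the list-building step can be finished with a multi-column DataFrame.explode instead of the concat; a shorter sketch of the same idea:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
    'col1': [20, np.nan, np.nan, np.nan, 25, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan, 25, np.nan, np.nan, np.nan],
    'col2': [np.nan, 'a', np.nan, np.nan, np.nan, 'b', np.nan, np.nan, np.nan, 'c', np.nan, np.nan, np.nan, 'd', np.nan, np.nan],
    'col3': [np.nan, np.nan, 'aa', np.nan, np.nan, np.nan, 'bb', np.nan, np.nan, np.nan, 'cc', np.nan, np.nan, np.nan, 'dd', np.nan],
    'col4': [np.nan, np.nan, np.nan, 50, np.nan, np.nan, np.nan, 200, np.nan, np.nan, np.nan, 100, np.nan, np.nan, np.nan, 300]
})

# one list of non-null values per (key, column) cell
lists = df.groupby('key').agg(lambda s: s.dropna().tolist())
# explode all list columns at once; per-row lengths must match (pandas >= 1.3)
out = lists.explode(list(lists.columns)).reset_index()
print(out)
```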

@ABusy_developerT 2019-12-27 14:24:23

Thanks. Let me try this one out
