By Krzysztof Słowiński

2018-02-01 12:45:06

Given a Pandas DataFrame df with a sorted, numeric index (representing e.g. time or distance) that may contain duplicate values:

     a    b
  0  4.0  1.0
1.5  5.5  2.5
1.5  5.5  2.5
  2  6.0  3.0
4.5  8.5  5.5

I would like to create a column c from the values of column a, shifted along the index by a given amount. Shifted index values that coincide with an original index value supply c directly. Shifted index values that do not coincide with the original index should still be taken into account when filling in the original index values that did not get a value assigned, e.g. by linear interpolation.


Taking 0.5 as an example index shift, column c would be constructed from column a at the index values 0, 0.5, 1.5, 2, 2.5, 4.5 and 5. This gives the following intermediate result, with the values that are missing after the shift marked as (i):

  0  NaN(i)
0.5  4.0
1.5  4.75(i)
  2  5.5
2.5  6.0
4.5  7.25(i)
  5  8.5
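
The intermediate result above can be reproduced with a short sketch. It assumes that only the first value is kept for a duplicated shifted index value and that pandas' method="linear" is used; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Example data from the question (the index contains the duplicate value 1.5).
df = pd.DataFrame(
    {"a": [4.0, 5.5, 5.5, 6.0, 8.5], "b": [1.0, 2.5, 2.5, 3.0, 5.5]},
    index=[0, 1.5, 1.5, 2, 4.5],
)
shift = 0.5

# Column a placed on the shifted index; keep one value per duplicated label.
shifted = pd.Series(df["a"].to_numpy(), index=df.index + shift)
shifted = shifted[~shifted.index.duplicated(keep="first")]

# Sorted union of the shifted labels and the (unique) original labels.
combined_index = shifted.index.union(df.index.unique())

# Labels without a shifted value become NaN and are then filled linearly.
intermediate = shifted.reindex(combined_index).interpolate(method="linear")
print(intermediate)
```

The value at index 0 stays NaN because there is no earlier value to interpolate from.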

The final result should use the original index of df:

     a    b    c
  0  4.0  1.0  NaN(i)
1.5  5.5  2.5  4.75(i)
1.5  5.5  2.5  4.75(i)
  2  6.0  3.0  5.5
4.5  8.5  5.5  7.25(i)

It is an open question which value to take for a duplicate index value. In this example one of the values was chosen, but the mean might be a better approach.
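
As a rough sketch of the mean variant: grouping the shifted series by its index and averaging leaves one value per index value. The values below are hypothetical and chosen so that the two entries at 2.0 differ, which is not the case in the example above.

```python
import pandas as pd

# Hypothetical shifted values; the label 2.0 occurs twice with different values.
shifted = pd.Series([4.0, 5.0, 6.0, 6.0, 8.5], index=[0.5, 2.0, 2.0, 2.5, 5.0])

# Average all values that share an index label, leaving one row per label.
shifted_mean = shifted.groupby(level=0).mean()

# For comparison: keeping only the first value per label.
shifted_first = shifted[~shifted.index.duplicated(keep="first")]

print(shifted_mean)
print(shifted_first)
```

Here the mean gives 5.5 at 2.0, while keeping the first value gives 5.0.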


@Krzysztof Słowiński 2018-02-01 15:10:49

This is my current approach. It keeps only the first value for each duplicated index value when constructing the new column.

import pandas as pd
import numpy as np

def create_shift(df, column, shift_value, method, name):
    """
    Create a new column based on an existing column with a given shift value.
    The shifted column is indexed based on an existing index with the
    missing values interpolated using the given method.

    :param df:          DataFrame to create the shift in.
    :param column:      The column name.
    :param shift_value: The value to shift the existing column by.
    :param method:      The interpolation method.
    :param name:        The name used for the newly created column.
    """
    if column in df.columns:
        current_index = df.index
        # creating the shifted index with the 2 decimal point precision
        shift_index = [round(i + shift_value, 2) for i in current_index.values]
        shift_data = pd.Series(data=df[column].tolist(), index=shift_index)
        # removing possible duplicates, keeping the first value for each index value
        shift_data = shift_data[~shift_data.index.duplicated(keep='first')]
        shift_index = shift_data.index
        # original index values that have no shifted value yet
        missing_index = current_index.difference(shift_index)
        combined_index = pd.Index(np.append(shift_index, missing_index)).sort_values()
        combined_data = shift_data.reindex(combined_index)
        combined_data = combined_data.interpolate(method=method)
        # the assignment aligns on the index, so only the original index values are kept
        df[name] = combined_data
    else:
        print("[Warning] Cannot create shift {} for missing {} column...".format(name, column))

d1 = {'a': [4.0, 5.5, 5.5, 6.0, 8.5], 'b': [1.0, 2.5, 2.5, 3.0, 5.5]}
df1 = pd.DataFrame(data=d1, index=[0, 1.5, 1.5, 2, 4.5])
create_shift(df1, 'a', 0.5, 'linear', 'c')
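
A note on the method argument, which is passed on to Series.interpolate: 'linear' treats the rows as equally spaced and ignores the index values, which is how 4.75 and 7.25 arise in the example. If the interpolation should instead be weighted by the distance along the index (an assumption, the question does not require it), 'index' is an alternative. A minimal comparison on the intermediate series:

```python
import numpy as np
import pandas as pd

# Intermediate series from the example, before interpolation.
s = pd.Series(
    [np.nan, 4.0, np.nan, 5.5, 6.0, np.nan, 8.5],
    index=[0, 0.5, 1.5, 2, 2.5, 4.5, 5],
)

# 'linear' ignores the index and treats the rows as equally spaced.
by_position = s.interpolate(method="linear")

# 'index' uses the numeric index values as the x coordinates.
by_index = s.interpolate(method="index")

print(pd.DataFrame({"linear": by_position, "index": by_index}))
```

With 'index', the values at 1.5 and 4.5 come out as 5.0 and 8.0 instead of 4.75 and 7.25.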

@Mr. T 2018-02-01 14:33:50

I think this is what you are trying to achieve:

#define the shift value
index_shift = 0.5
#find values common to both indices before and after the shift
ind_intersect = df.index.intersection(df.index + index_shift)
#create new column
df["c"] = np.nan
#transfer values from column a to column c
df["c"][ind_intersect] = df["a"][ind_intersect - index_shift]

You can of course fill the new column with values other than NumPy NaN.

@Krzysztof Słowiński 2018-02-13 09:00:40

It is a good solution, but not all values are used for filling in the missing values, only those that land on the original index after the shift. The original index is not evenly spaced and any shift value is possible, so in the worst case no values will be present.
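
To illustrate the worst case with the example index: a shift of 0.5 leaves a single common index value, while a shift of 0.3 (a hypothetical value picked for illustration) leaves none.

```python
import pandas as pd

# Index from the example in the question.
index = pd.Index([0, 1.5, 1.5, 2, 4.5])

# Index values shared by the original and the shifted index, per shift value.
common = {}
for shift in (0.5, 0.3):
    common[shift] = index.intersection(index + shift)
    print(shift, list(common[shift]))
```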

@Krzysztof Słowiński 2018-02-13 09:04:33

Please check the description with the intermediate step for calculating the result.

@Krzysztof Słowiński 2018-02-13 09:12:33

I've updated the description to better capture the idea of the problem.
