By bigbug


2013-12-17 03:48:02 8 Comments

Background

I just upgraded my Pandas from 0.11 to 0.13.0rc1. Now, the application is popping out many new warnings. One of them like this:

E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE

I want to know what exactly it means? Do I need to change something?

How should I suspend the warning if I insist to use quote_df['TVol'] = quote_df['TVol']/TVOL_SCALE?

The function that gives errors

def _decode_stock_quote(list_of_150_stk_str):
    """decode the webpage and return dataframe"""

    from cStringIO import StringIO

    str_of_all = "".join(list_of_150_stk_str)

    quote_df = pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg')) #dtype={'A': object, 'B': object, 'C': np.float64}
    quote_df.rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'}, inplace=True)
    quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]
    quote_df['TClose'] = quote_df['TPrice']
    quote_df['RT']     = 100 * (quote_df['TPrice']/quote_df['TPCLOSE'] - 1)
    quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE
    quote_df['TAmt']   = quote_df['TAmt']/TAMT_SCALE
    quote_df['STK_ID'] = quote_df['STK'].str.slice(13,19)
    quote_df['STK_Name'] = quote_df['STK'].str.slice(21,30)#.decode('gb2312')
    quote_df['TDate']  = quote_df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10])

    return quote_df

More error messages

E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE
E:\FinReporter\FM_EXT.py:450: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TAmt']   = quote_df['TAmt']/TAMT_SCALE
E:\FinReporter\FM_EXT.py:453: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TDate']  = quote_df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10])

13 comments

@Mikulas 2019-06-08 16:04:04

This topic is really confusing with Pandas. Luckily, it has a relatively simple solution.

The problem is that it is not always clear whether data filtering operations (e.g. loc) return a copy or a view of the DataFrame. Further use of such filtered DataFrame could therefore be confusing.

The simple solution is (unless you need to work with very large sets of data):

Whenever you need to update any values, always make sure that you implicitely copy the DataFrame before the assignment.

df  # Some DataFrame
df = df.loc[:, 0:2]  # Some filtering (unsure whether a view or copy is returned)
df = df.copy()  # Ensuring a copy is made
df[df["Name"] == "John"] = "Johny"  # Assignment can be done now (no warning)

@user443854 2019-02-27 21:26:02

Here I answer the question directly. How to deal with it?

Make a .copy(deep=False) after you slice. See pandas.DataFrame.copy.

Wait, doesn't a slice return a copy? After all, this is what the warning message is attempting to say? Read the long answer:

import pandas as pd
df = pd.DataFrame({'x':[1,2,3]})

This gives a warning:

df0 = df[df.x>2]
df0['foo'] = 'bar'

This does not:

df1 = df[df.x>2].copy(deep=False)
df1['foo'] = 'bar'

Both df0 and df1 are DataFrame objects, but something about them is different that enables pandas to print the warning. Let's find out what it is.

import inspect
slice= df[df.x>2]
slice_copy = df[df.x>2].copy(deep=False)
inspect.getmembers(slice)
inspect.getmembers(slice_copy)

Using your diff tool of choice, you will see that beyond a couple of addresses, the only material difference is this:

|          | slice   | slice_copy |
| _is_copy | weakref | None       |

The method that decides whether to warn is DataFrame._check_setitem_copy which checks _is_copy. So here you go. Make a copy so that your DataFrame is not _is_copy.

The warning is suggesting to use .loc, but if you use .loc on a frame that _is_copy, you will still get the same warning. Misleading? Yes. Annoying? You bet. Helpful? Potentially, when chained assignment is used. But it cannot correctly detect chain assignment and prints the warning indiscriminately.

@delica 2019-05-17 09:47:34

Some may want to simply suppress the warning:

class SupressSettingWithCopyWarning:
    def __enter__(self):
        pd.options.mode.chained_assignment = None

    def __exit__(self, *args):
        pd.options.mode.chained_assignment = 'warn'

with SupressSettingWithCopyWarning():
    #code that produces warning

@cs95 2018-12-28 07:18:55

How to deal with SettingWithCopyWarning in Pandas?

This post is meant for readers who,

  1. Would like to understand what this warning means
  2. Would like to understand different ways of suppressing this warning
  3. Would like to understand how to improve their code and follow good practices to avoid this warning in the future.

Setup

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (3, 5)), columns=list('ABCDE'))
df
   A  B  C  D  E
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1

What is the SettingWithCopyWarning?

To know how to deal with this warning, it is important to understand what it means and why it is raised in the first place.

When filtering DataFrames, it is possible slice/index a frame to return either a view, or a copy, depending on the internal layout and various implementation details. A "view" is, as the term suggests, a view into the original data, so modifying the view may modify the original object. On the other hand, a "copy" is a replication of data from the original, and modifying the copy has no effect on the original.

As mentioned by other answers, the SettingWithCopyWarning was created to flag "chained assignment" operations. Consider df in the setup above. Suppose you would like to select all values in column "B" where values in column "A" is > 5. Pandas allows you to do this in different ways, some more correct than others. For example,

df[df.A > 5]['B']

1    3
2    6
Name: B, dtype: int64

And,

df.loc[df.A > 5, 'B']

1    3
2    6
Name: B, dtype: int64

These return the same result, so if you are only reading these values, it makes no difference. So, what is the issue? The problem with chained assignment, is that it is generally difficult to predict whether a view or a copy is returned, so this largely becomes an issue when you are attempting to assign values back. To build on the earlier example, consider how this code is executed by the interpreter:

df.loc[df.A > 5, 'B'] = 4
# becomes
df.__setitem__((df.A > 5, 'B'), 4)

With a single __setitem__ call to df. OTOH, consider this code:

df[df.A > 5]['B'] = 4
# becomes
df.__getitem__(df.A > 5).__setitem__('B", 4)

Now, depending on whether __getitem__ returned a view or a copy, the __setitem__ operation may not work.

In general, you should use loc for label-based assignment, and iloc for integer/positional based assignment, as the spec guarantees that they always operate on the original. Additionally, for setting a single cell, you should use at and iat.

More can be found in the documentation.

Note
All boolean indexing operations done with loc can also be done with iloc. The only difference is that iloc expects either integers/positions for index or a numpy array of boolean values, and integer/position indexes for the columns.

For example,

df.loc[df.A > 5, 'B'] = 4

Can be written nas

df.iloc[(df.A > 5).values, 1] = 4

And,

df.loc[1, 'A'] = 100

Can be written as

df.iloc[1, 0] = 100

And so on.


Just tell me how to suppress the warning!

Consider a simple operation on the "A" column of df. Selecting "A" and dividing by 2 will raise the warning, but the operation will work.

df2 = df[['A']]
df2['A'] /= 2
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

df2
     A
0  2.5
1  4.5
2  3.5

There are a couple ways of directly silencing this warning:

  1. Make a deepcopy

    df2 = df[['A']].copy(deep=True)
    df2['A'] /= 2
    
  2. Change pd.options.mode.chained_assignment
    Can be set to None, "warn", or "raise". "warn" is the default. None will suppress the warning entirely, and "raise" will throw a SettingWithCopyError, preventing the operation from going through.

    pd.options.mode.chained_assignment = None
    df2['A'] /= 2
    

@Peter Cotton in the comments, came up with a nice way of non-intrusively changing the mode (modified from this gist) using a context manager, to set the mode only as long as it is required, and the reset it back to the original state when finished.

class ChainedAssignent:
    def __init__(self, chained=None):
        acceptable = [None, 'warn', 'raise']
        assert chained in acceptable, "chained must be in " + str(acceptable)
        self.swcw = chained

    def __enter__(self):
        self.saved_swcw = pd.options.mode.chained_assignment
        pd.options.mode.chained_assignment = self.swcw
        return self

    def __exit__(self, *args):
        pd.options.mode.chained_assignment = self.saved_swcw

The usage is as follows:

# some code here
with ChainedAssignent():
    df2['A'] /= 2
# more code follows

Or, to raise the exception

with ChainedAssignent(chained='raise'):
    df2['A'] /= 2

SettingWithCopyError: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

The "XY Problem": What am I doing wrong?

A lot of the time, users attempt to look for ways of suppressing this exception without fully understanding why it was raised in the first place. This is a good example of an XY problem, where users attempt to solve a problem "Y" that is actually a symptom of a deeper rooted problem "X". Questions will be raised based on common problems that encounter this warning, and solutions will then be presented.

Question 1
I have a DataFrame

df
       A  B  C  D  E
    0  5  0  3  3  7
    1  9  3  5  2  4
    2  7  6  8  8  1

I want to assign values in col "A" > 5 to 1000. My expected output is

      A  B  C  D  E
0     5  0  3  3  7
1  1000  3  5  2  4
2  1000  6  8  8  1

Wrong way to do this:

df.A[df.A > 5] = 1000         # works, because df.A returns a view
df[df.A > 5]['A'] = 1000      # does not work
df.loc[df.A  5]['A'] = 1000   # does not work

Right way using loc:

df.loc[df.A > 5, 'A'] = 1000


Question 21
I am trying to set the value in cell (1, 'D') to 12345. My expected output is

   A  B  C      D  E
0  5  0  3      3  7
1  9  3  5  12345  4
2  7  6  8      8  1

I have tried different ways of accessing this cell, such as df['D'][1]. What is the best way to do this?

1. This question isn't specifically related to the warning, but it is good to understand how to do this particular operation correctly so as to avoid situations where the warning could potentially arise in future.

You can use any of the following methods to do this.

df.loc[1, 'D'] = 12345
df.iloc[1, 3] = 12345
df.at[1, 'D'] = 12345
df.iat[1, 3] = 12345


Question 3
I am trying to subset values based on some condition. I have a DataFrame

   A  B  C  D  E
1  9  3  5  2  4
2  7  6  8  8  1

I would like to assign values in "D" to 123 such that "C" == 5. I tried

df2.loc[df2.C == 5, 'D'] = 123

Which seems fine but I am still getting the SettingWithCopyWarning! How do I fix this?

This is actually probably because of code higher up in your pipeline. Did you create df2 from something larger, like

df2 = df[df.A > 5]

? In this case, boolean indexing will return a view, so df2 will reference the original. What you'd need to do is assign df2 to a copy:

df2 = df[df.A > 5].copy()
# Or,
# df2 = df.loc[df.A > 5, :]


Question 4
I'm trying to drop column "C" in-place from

   A  B  C  D  E
1  9  3  5  2  4
2  7  6  8  8  1

But using

df2.drop('C', axis=1, inplace=True)

Throws SettingWithCopyWarning. Why is this happening?

This is because df2 must have been created as a view from some other slicing operation, such as

df2 = df[df.A > 5]

The solution here is to either make a copy() of df, or use loc, as before.

@cs95 2019-04-03 22:12:22

P.S.: Let me know if your situation is not covered under section 3's list of questions. I will amend my post.

@Garrett 2013-12-17 06:20:23

The SettingWithCopyWarning was created to flag potentially confusing "chained" assignments, such as the following, which don't always work as expected, particularly when the first selection returns a copy. [see GH5390 and GH5597 for background discussion.]

df[df['A'] > 2]['B'] = new_val  # new_val not set in df

The warning offers a suggestion to rewrite as follows:

df.loc[df['A'] > 2, 'B'] = new_val

However, this doesn't fit your usage, which is equivalent to:

df = df[df['A'] > 2]
df['B'] = new_val

While it's clear that you don't care about writes making it back to the original frame (since you overwrote the reference to it), unfortunately this pattern can not be differentiated from the first chained assignment example, hence the (false positive) warning. The potential for false positives is addressed in the docs on indexing, if you'd like to read further. You can safely disable this new warning with the following assignment.

pd.options.mode.chained_assignment = None  # default='warn'

@ely 2013-12-17 20:58:45

I think I would mostly be in favor of not warning about this at all. If you work with the chained assignment syntax, you can definitely figure out the order of indexing that needs to happen in order for it to work as expected in any given situation. I think it's overly paranoid that there are these exhaustive precautions about it. In the same spirit as "letting everyone be grown-ups" about 'private' class methods or attributes, I think it's better for pandas to let users be grownups about chained assignments. Only use them if you know what you're doing.

@ely 2013-12-17 21:00:46

It's a bit unPythonic to try to warn folks when they go hacking around for alternatives. The newer-style Pandas methods for access (improved .ix, improved .iloc, etc.) can definitely be viewed as "the primary way" without warning everyone incessantly about other ways. Instead let them be grown-ups and if they want to do chained assignment, so be it. My two cents anyway. One sees disgruntled comments from Pandas devs on here often when chained assignments will work to solve a problem, but wouldn't be considered the "primary" way to do so.

@Jeff Tratner 2013-12-18 22:14:13

@EMS the problem is that it's not always clear from the code where a copy vs. a view is being made and a number of bugs / confusion arises from this issue. We were considering putting in an rc file / options to automatically do configuration, which might be more useful given how the setting with copy warning is working.

@ely 2013-12-18 22:18:42

Whether it is clear from the code or not seems like a matter of opinion to me. Working through enough cases makes it as clear as any other complicated third party API. I don't see what the big deal is. If this is patched over with warnings, then surely tons and tons of other stuff should be too, and you get a warnings free for all.

@Jeff Tratner 2013-12-19 15:56:09

@EMS you could make the same argument for allowing bool(arr) on numpy ndarrays. The goal is to get people to use the right thing and not be surprised when they try to do something and it fails because this time they have an NaN value in one column that they didn't have earlier.

@ely 2013-12-19 16:03:24

I would make that argument for allowing bool(arr) to have some implementation and not raise an error. If you're using ndarray you should know what bool does on that input, rather than the library preventing it or warning because some people might not. I would say that Python's principle of least astonishment is not satisfied by the numpy choice to make that produce an error. It would be less astonishing if it returned True. That is, bool([1,2,3]) returns True, but bool(np.asarray([1,2,3])) raises an error. That's worse, IMO, but it's such a longstanding convention now.

@Garrett 2013-12-19 17:48:00

my two cents: I also hit this warning with the same pattern as the OP: filtering and storing result back into the same variable name. first questions were: what changed and/or what's going to change? Since Jeff Tratner pointed out elsewhere that nothing has changed and, from what I can tell, there are no "view versus copy" changes on the horizon that I need to prepare for, I will disable this warning for now. I support the effort to reduce confusion, but wouldn't say the OP is doing anything wrong.

@Thomas Andrews 2016-11-01 18:34:48

The reason to warn is for people upgrading old code, of course. And I definitely need a warning, because I am dealing with some very ugly old code.

@Muon 2018-07-13 01:08:40

On a side note, I found that disabling the chained_assignment warning: pd.options.mode.chained_assignment = None has resulted in my code running approximately 6 times faster. Anyone else experienced similar results?

@musbur 2019-02-13 07:39:54

Followup beginner question / remark

Maybe a clarification for other beginners like me (I come from R which seems to work a bit differently under the hood). The following harmless-looking and functional code kept producing the SettingWithCopy warning, and I couldn't figure out why. I had both read and understood the issued with "chained indexing", but my code doesn't contain any:

def plot(pdb, df, title, **kw):
    df['target'] = (df['ogg'] + df['ugg']) / 2
    # ...

But then, later, much too late, I looked at where the plot() function is called:

    df = data[data['anz_emw'] > 0]
    pixbuf = plot(pdb, df, title)

So "df" isn't a data frame but an object that somehow remembers that it was created by indexing a data frame (so is that a view?) which would make the line in plot()

 df['target'] = ...

equivalent to

 data[data['anz_emw'] > 0]['target'] = ...

which is a chained indexing. Did I get that right?

Anyway,

def plot(pdb, df, title, **kw):
    df.loc[:,'target'] = (df['ogg'] + df['ugg']) / 2

fixed it.

@firelynx 2016-10-24 09:01:35

Pandas dataframe copy warning

When you go and do something like this:

quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]

pandas.ix in this case returns a new, stand alone dataframe.

Any values you decide to change in this dataframe, will not change the original dataframe.

This is what pandas tries to warn you about.


Why .ix is a bad idea

The .ix object tries to do more than one thing, and for anyone who has read anything about clean code, this is a strong smell.

Given this dataframe:

df = pd.DataFrame({"a": [1,2,3,4], "b": [1,1,2,2]})

Two behaviors:

dfcopy = df.ix[:,["a"]]
dfcopy.a.ix[0] = 2

Behavior one: dfcopy is now a stand alone dataframe. Changing it will not change df

df.ix[0, "a"] = 3

Behavior two: This changes the original dataframe.


Use .loc instead

The pandas developers recognized that the .ix object was quite smelly[speculatively] and thus created two new objects which helps in the accession and assignment of data. (The other being .iloc)

.loc is faster, because it does not try to create a copy of the data.

.loc is meant to modify your existing dataframe inplace, which is more memory efficient.

.loc is predictable, it has one behavior.


The solution

What you are doing in your code example is loading a big file with lots of columns, then modifying it to be smaller.

The pd.read_csv function can help you out with a lot of this and also make the loading of the file a lot faster.

So instead of doing this

quote_df = pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg')) #dtype={'A': object, 'B': object, 'C': np.float64}
quote_df.rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'}, inplace=True)
quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]

Do this

columns = ['STK', 'TPrice', 'TPCLOSE', 'TOpen', 'THigh', 'TLow', 'TVol', 'TAmt', 'TDate', 'TTime']
df = pd.read_csv(StringIO(str_of_all), sep=',', usecols=[0,3,2,1,4,5,8,9,30,31])
df.columns = columns

This will only read the columns you are interested in, and name them properly. No need for using the evil .ix object to do magical stuff.

@jf328 2016-11-07 15:12:53

"The pandas developers recognized that the .ix object was quite smelly[speculatively] and thus created two new objects" -- what's the other one?

@Brian Bien 2016-11-14 04:07:53

@jf328 .iloc I think

@Ninjakannon 2017-05-13 15:02:59

Yes, it is .iloc. These are the two primary methods for indexing pandas data structures. Read more in the documentation.

@boldnik 2017-05-26 20:31:03

How do one should replace a DataFrame column with timestamps into column with date time object or string ?

@firelynx 2017-05-27 06:38:59

@boldnik Check this answer stackoverflow.com/a/37453925/3730397

@Jeff 2013-12-17 20:49:10

In general the point of the SettingWithCopyWarning is to show users (and especially new users) that they may be operating on a copy and not the original as they think. There are false positives (IOW if you know what you are doing it could be ok). One possibility is simply to turn off the (by default warn) warning as @Garrett suggest.

Here is another option:

In [1]: df = DataFrame(np.random.randn(5, 2), columns=list('AB'))

In [2]: dfa = df.ix[:, [1, 0]]

In [3]: dfa.is_copy
Out[3]: True

In [4]: dfa['A'] /= 2
/usr/local/bin/ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  #!/usr/local/bin/python

You can set the is_copy flag to False, which will effectively turn off the check, for that object:

In [5]: dfa.is_copy = False

In [6]: dfa['A'] /= 2

If you explicitly copy then no further warning will happen:

In [7]: dfa = df.ix[:, [1, 0]].copy()

In [8]: dfa['A'] /= 2

The code the OP is showing above, while legitimate, and probably something I do as well, is technically a case for this warning, and not a false positive. Another way to not have the warning would be to do the selection operation via reindex, e.g.

quote_df = quote_df.reindex(columns=['STK', ...])

Or,

quote_df = quote_df.reindex(['STK', ...], axis=1)  # v.0.21

@bigbug 2013-12-18 04:38:26

Thanks for the information & discussion, I just turn off the warning to let the console silent on this. It sounds like the view & table in SQL database. I need to know more about the benefit of introduction of 'copy' concept, but IMHO it is somewhat a burden to take care of the subtle semantic、syntax difference ..

@Jeff Tratner 2013-12-18 21:04:42

@bigbug the copy behavior isn't new, it's always been going on, instead it's that people were accidentally creating copies rather than views and encountering subtle bugs.

@Jeff Tratner 2013-12-18 21:05:38

@Jeff do you mean False rather than Flag?

@sedeh 2014-12-26 23:30:00

@Jeff There is a similar question here. Any thoughts?

@rdchambers 2015-04-25 14:27:12

I agree with the copy() ; it's clear and it fixed my issue (which was a false positive).

@dashesy 2015-07-06 21:58:03

after update to 0.16 I see many more false positives, problem with false positives is one learns to ignore it, even though sometimes it is legit.

@Jeff 2015-07-06 22:13:40

@dashesy if you think that you actually have a false positive, pls report an issue. The point is that these are NOT false positives. You can ignore at your own risk.

@dashesy 2015-07-07 16:00:11

@Jeff the reason I think I see some false positives is that they do the expected behavior (they change the original dataframe and not the copy), so are they false positives? I would understand a deprecation warning to disallow this in the api but throwing a warning that sometimes is legit and some times it is not, makes them like deprecated but not quite. It is like numpy throwing a warning any time two arrays are divided, about the possibility of having a 0 in the denominator. That sort of stuff (corner cases) should be just documented (unless they can be warned without false positive).

@Jeff 2015-07-07 16:09:38

@dashesy you are missing the point. sometimes maybe even most of the time it might work. But it can happen for instance if the frame is bigger/smaller or you add a column say of a different dtype that it doesn't work. That's the point. You are doing something that may work but is not guaranteed. This is very different from deprecation warnings. If you want to continue using it and it works, great. But be forewarned.

@dashesy 2015-07-07 18:50:12

@Jeff makes sense now, so it is an undefined behavior. If anything it should throw an error then (to avoid pitfalls seen in C), since api is frozen the current behavior of warning makes sense for backward compatibility. And I will make them throw to catch them as errors in my production code (warnings.filterwarnings('error', r'SettingWithCopyWarning). Also the suggestion to use .loc sometimes does not help either (if it is in a group).

@jrouquie 2018-03-09 09:48:49

This should work:

quote_df.loc[:,'TVol'] = quote_df['TVol']/TVOL_SCALE

@hughdbrown 2017-10-13 14:45:16

You could avoid the whole problem like this, I believe:

return (
    pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg')) #dtype={'A': object, 'B': object, 'C': np.float64}
    .rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'}, inplace=True)
    .ix[:,[0,3,2,1,4,5,8,9,30,31]]
    .assign(
        TClose=lambda df: df['TPrice'],
        RT=lambda df: 100 * (df['TPrice']/quote_df['TPCLOSE'] - 1),
        TVol=lambda df: df['TVol']/TVOL_SCALE,
        TAmt=lambda df: df['TAmt']/TAMT_SCALE,
        STK_ID=lambda df: df['STK'].str.slice(13,19),
        STK_Name=lambda df: df['STK'].str.slice(21,30)#.decode('gb2312'),
        TDate=lambda df: df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10]),
    )
)

Using Assign. From the documentation: Assign new columns to a DataFrame, returning a new object (a copy) with all the original columns in addition to the new ones.

See Tom Augspurger's article on method chaining in pandas: https://tomaugspurger.github.io/method-chaining

@Petr Szturc 2017-11-27 09:39:57

For me this issue occured in a following >simplified< example. And I was also able to solve it (hopefully with a correct solution):

old code with warning:

def update_old_dataframe(old_dataframe, new_dataframe):
    for new_index, new_row in new_dataframe.iterrorws():
        old_dataframe.loc[new_index] = update_row(old_dataframe.loc[new_index], new_row)

def update_row(old_row, new_row):
    for field in [list_of_columns]:
        # line with warning because of chain indexing old_dataframe[new_index][field]
        old_row[field] = new_row[field]  
    return old_row

This printed the warning for the line old_row[field] = new_row[field]

Since the rows in update_row method are actually type Series, I replaced the line with:

old_row.at[field] = new_row.at[field]

i.e. method for accessing/lookups for a Series. Eventhough both works just fine and the result is same, this way I don't have to disable the warnings (=keep them for other chain indexing issues somewhere else).

I hope this may help someone.

@Raphvanns 2017-07-27 22:19:01

To remove any doubt, my solution was to make a deep copy of the slice instead of a regular copy. This may not be applicable depending on your context (Memory constraints / size of the slice, potential for performance degradation - especially if the copy occurs in a loop like it did for me, etc...)

To be clear, here is the warning I received:

/opt/anaconda3/lib/python3.6/site-packages/ipykernel/__main__.py:54:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Illustration

I had doubts that the warning was thrown because of a column I was dropping on a copy of the slice. While not technically trying to set a value in the copy of the slice, that was still a modification of the copy of the slice. Below are the (simplified) steps I have taken to confirm the suspicion, I hope it will help those of us who are trying to understand the warning.

Example 1: dropping a column on the original affects the copy

We knew that already but this is a healthy reminder. This is NOT what the warning is about.

>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1

    A   B
0   111 121
1   112 122
2   113 123


>> df2 = df1
>> df2

A   B
0   111 121
1   112 122
2   113 123

# Dropping a column on df1 affects df2
>> df1.drop('A', axis=1, inplace=True)
>> df2
    B
0   121
1   122
2   123

It is possible to avoid changes made on df1 to affect df2

>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1

A   B
0   111 121
1   112 122
2   113 123

>> import copy
>> df2 = copy.deepcopy(df1)
>> df2
A   B
0   111 121
1   112 122
2   113 123

# Dropping a column on df1 does not affect df2
>> df1.drop('A', axis=1, inplace=True)
>> df2
    A   B
0   111 121
1   112 122
2   113 123

Example 2: dropping a column on the copy may affect the original

This actually illustrates the warning.

>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1

    A   B
0   111 121
1   112 122
2   113 123

>> df2 = df1
>> df2

    A   B
0   111 121
1   112 122
2   113 123

# Dropping a column on df2 can affect df1
# No slice involved here, but I believe the principle remains the same?
# Let me know if not
>> df2.drop('A', axis=1, inplace=True)
>> df1

B
0   121
1   122
2   123

It is possible to avoid changes made on df2 to affect df1

>> data1 = {'A': [111, 112, 113], 'B':[121, 122, 123]}
>> df1 = pd.DataFrame(data1)
>> df1

    A   B
0   111 121
1   112 122
2   113 123

>> import copy
>> df2 = copy.deepcopy(df1)
>> df2

A   B
0   111 121
1   112 122
2   113 123

>> df2.drop('A', axis=1, inplace=True)
>> df1

A   B
0   111 121
1   112 122
2   113 123

Cheers!

@Steohan 2017-06-24 01:30:48

If you have assigned the slice to a variable and want to set using the variable as in the following:

df2 = df[df['A'] > 2]
df2['B'] = value

And you do not want to use Jeffs solution because your condition computing df2 is to long or for some other reason, then you can use the following:

df.loc[df2.index.tolist(), 'B'] = value

df2.index.tolist() returns the indices from all entries in df2, which will then be used to set column B in the original dataframe.

@Claudiu Creanga 2018-06-14 14:08:08

this is 9 time more expensive then df["B"] = value

@gies0r 2018-09-23 19:26:20

Can you explain this more deeply @ClaudiuCreanga?

Related Questions

Sponsored Content

9 Answered Questions

[SOLVED] Select rows from a DataFrame based on values in a column in pandas

22 Answered Questions

[SOLVED] Renaming columns in pandas

23 Answered Questions

[SOLVED] Adding new column to existing DataFrame in Python pandas

17 Answered Questions

[SOLVED] How to iterate over rows in a DataFrame in Pandas?

22 Answered Questions

[SOLVED] How do I list all files of a directory?

  • 2010-07-08 19:31:22
  • duhhunjonn
  • 3374588 View
  • 3474 Score
  • 22 Answer
  • Tags:   python directory

40 Answered Questions

[SOLVED] How to merge two dictionaries in a single expression?

38 Answered Questions

[SOLVED] How do I check whether a file exists without exceptions?

14 Answered Questions

[SOLVED] Delete column from pandas DataFrame

2 Answered Questions

[SOLVED] Replace letter in string pandas

  • 2018-08-31 11:12:37
  • BERA
  • 46 View
  • 3 Score
  • 2 Answer
  • Tags:   pandas

3 Answered Questions

[SOLVED] Dataframe Warning : SettingWithCopyWarning in python

Sponsored Content