2014-01-28 20:04:04 8 Comments

I'm working with boolean indexing in Pandas. The question is why the statement:

```
a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]
```

works fine whereas

```
a[(a['some_column']==some_number) and (a['some_other_column']==some_other_number)]
```

exits with an error?

Example:

```
import pandas as pd
a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})
In: a[(a['x']==1)&(a['y']==10)]
Out:
   x   y
0  1  10
In: a[(a['x']==1) and (a['y']==10)]
Out: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

## 3 comments

## @cs95 2019-01-25 02:53:59

## TLDR;

Logical operators in Pandas are `&`, `|` and `~`, and parentheses `(...)` are important!

Python's `and`, `or` and `not` logical operators are designed to work with scalars. So Pandas had to do one better and override the bitwise operators to achieve a vectorized (element-wise) version of this functionality.

So the Python expressions `exp1 and exp2`, `exp1 or exp2` and `not exp1` (where `exp1` and `exp2` are expressions which evaluate to a boolean result) translate to `exp1 & exp2`, `exp1 | exp2` and `~exp1` for pandas.

If you get a `ValueError` while performing a logical operation, then you need to use parentheses for grouping, for example `(a['x'] == 1) & (a['y'] == 10)`, and so on.
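As a rough illustration of the three operators, here is a minimal sketch. The column name `col` and its values are made up for the example and do not come from the answer.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
df = pd.DataFrame({"col": [-0.5, 0.1, 0.3]})

# Each comparison is wrapped in parentheses before being combined.
both = (df["col"] > -0.25) & (df["col"] < 0.25)        # element-wise AND
either = (df["col"] < -0.25) | (df["col"] > 0.25)      # element-wise OR
neither = ~((df["col"] < -0.25) | (df["col"] > 0.25))  # element-wise NOT

print(both.tolist())     # [False, True, False]
print(either.tolist())   # [True, False, True]
print(neither.tolist())  # [False, True, False]
```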

Boolean indexing: A common operation is to compute boolean masks through logical conditions to filter the data. Pandas provides three operators: `&` for logical AND, `|` for logical OR, and `~` for logical NOT.

The examples that follow assume a DataFrame `df` with numeric columns `A` and `B`.

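### Logical AND

For `df` above, say you'd like to return all rows where A < 5 and B > 5. This is done by computing masks for each condition separately, and ANDing them.

#### Overloaded Bitwise `&` Operator

Before continuing, please take note of this particular excerpt of the docs, which states: "Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not." The docs add that these conditions must be grouped using parentheses.

So, with this in mind, element-wise logical AND can be implemented with the bitwise operator `&`, as in `(df['A'] < 5) & (df['B'] > 5)`, and the subsequent filtering step is simply `df[(df['A'] < 5) & (df['B'] > 5)]`.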
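A minimal sketch of the mask and the filtering step. The values in `df` are an assumption standing in for the setup the answer refers to.

```python
import pandas as pd

# Stand-in data; the actual values used in the answer are not shown here.
df = pd.DataFrame({"A": [5, 3, 3, 4, 8], "B": [0, 7, 5, 7, 8]})

# Build each condition in parentheses, then AND them element-wise.
mask = (df["A"] < 5) & (df["B"] > 5)

# Use the boolean mask to keep only the matching rows.
filtered = df[mask]

print(mask.tolist())            # [False, True, False, True, False]
print(filtered.index.tolist())  # [1, 3]
```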

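The parentheses are used to override the default precedence order of bitwise operators, which have higher precedence than the comparison operators `<` and `>`. See the section on Operator Precedence in the Python docs.

If you do not use parentheses, the expression is evaluated incorrectly. For example, if you accidentally attempt something such as `df['A'] < 5 & df['B'] > 5`, it is parsed as `df['A'] < (5 & df['B']) > 5`. Writing `tmp` for the result of `5 & df['B']`, that becomes `df['A'] < tmp > 5`, which becomes (see the Python docs on chained comparisons) `(df['A'] < tmp) and (tmp > 5)`. That is an expression of the form `Series and Series`, which throws `ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().`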
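To see the failure in practice, here is a small sketch that triggers and catches the error. The data is made up for illustration, and the exact wording of the message may vary between pandas versions.

```python
import pandas as pd

# Stand-in data, assumed for illustration only.
df = pd.DataFrame({"A": [5, 3, 3, 4, 8], "B": [0, 7, 5, 7, 8]})

message = None
try:
    # Missing parentheses: parsed as df["A"] < (5 & df["B"]) > 5,
    # a chained comparison that needs the truth value of a Series.
    df["A"] < 5 & df["B"] > 5
except ValueError as exc:
    message = str(exc)

print(message)
```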

So, don't make that mistake! [1]

#### Avoiding Parentheses Grouping

The fix is actually quite simple. Most operators have a corresponding bound method for DataFrames. If the individual masks are built up using functions instead of conditional operators, you will no longer need to group by parens to specify evaluation order, as in `df['A'].lt(5) & df['B'].gt(5)`.

See the section on Flexible Comparisons. To summarise, the operators `<`, `>`, `<=`, `>=`, `==` and `!=` correspond to the methods `lt`, `gt`, `le`, `ge`, `eq` and `ne`.
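A short sketch comparing the method form with the parenthesised operator form, again on made-up data:

```python
import pandas as pd

# Stand-in data, assumed for illustration only.
df = pd.DataFrame({"A": [5, 3, 3, 4, 8], "B": [0, 7, 5, 7, 8]})

# Comparison methods bind like ordinary calls, so no extra parentheses are needed.
with_methods = df["A"].lt(5) & df["B"].gt(5)

# The operator form needs parentheses around each comparison.
with_operators = (df["A"] < 5) & (df["B"] > 5)

print(with_methods.equals(with_operators))  # True
print(df[with_methods])
```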

Another option for avoiding parentheses is to use `DataFrame.query` (or `eval`), for example `df.query('A < 5 and B > 5')`. I have extensively documented `query` and `eval` in Dynamic Expression Evaluation in pandas using pd.eval().

#### `operator.and_`

Allows you to perform this operation in a functional manner, for example `operator.and_(df['A'] < 5, df['B'] > 5)`. Internally calls `Series.__and__`, which corresponds to the bitwise operator. You won't usually need this, but it is useful to know.
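Both alternatives can be sketched side by side. The data is assumed for illustration; the two results should select the same rows.

```python
import operator

import pandas as pd

# Stand-in data, assumed for illustration only.
df = pd.DataFrame({"A": [5, 3, 3, 4, 8], "B": [0, 7, 5, 7, 8]})

# query() parses the string itself, so `and` works and no parentheses are needed.
by_query = df.query("A < 5 and B > 5")

# operator.and_ is the functional spelling of `&`.
by_operator = df[operator.and_(df["A"] < 5, df["B"] > 5)]

print(by_query.equals(by_operator))  # True
```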

#### Generalizing: `np.logical_and` (and `logical_and.reduce`)

Another alternative is using `np.logical_and`, which also does not need parentheses grouping, as in `np.logical_and(df['A'] < 5, df['B'] > 5)`.

`np.logical_and` is a ufunc (Universal Function), and most ufuncs have a `reduce` method. This means it is easier to generalise with `logical_and` if you have multiple masks to AND. For example, to AND masks `m1`, `m2` and `m3` with `&`, you would have to do `m1 & m2 & m3`. However, an easier option is `np.logical_and.reduce([m1, m2, m3])`.

This is powerful, because it lets you build on top of this with more complex logic (for example, dynamically generating masks in a list comprehension and ANDing all of them).
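One way this could look, as a sketch: the column `C`, the threshold of 2 and the data are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Stand-in data, assumed for illustration only.
df = pd.DataFrame({
    "A": [5, 3, 3, 4, 8],
    "B": [0, 7, 5, 7, 8],
    "C": [3, 9, 2, 6, 1],
})

# Generate one mask per column in a list comprehension.
masks = [df[col] > 2 for col in ["A", "B", "C"]]

# AND all of them in one call; the result is a NumPy boolean array.
combined = np.logical_and.reduce(masks)

# The equivalent chained form with `&`.
chained = masks[0] & masks[1] & masks[2]

print(combined.tolist())  # [False, True, False, True, False]
print(df[combined])
```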

[1] I know I'm harping on this point, but please bear with me. This is a very, very common beginner's mistake, and must be explained very thoroughly.

### Logical OR

For the `df` above, say you'd like to return all rows where A == 3 or B == 7.

#### Overloaded Bitwise `|`

This is written `df[(df['A'] == 3) | (df['B'] == 7)]`. If you haven't yet, please also read the section on Logical AND above; all caveats apply here.

Alternatively, this operation can be specified with `df[df['A'].eq(3) | df['B'].eq(7)]`.

#### `operator.or_`

Calls `Series.__or__` under the hood, as in `operator.or_(df['A'] == 3, df['B'] == 7)`.

#### `np.logical_or`

For two conditions, use `logical_or`, as in `np.logical_or(df['A'] == 3, df['B'] == 7)`. For multiple masks, use `logical_or.reduce`, as in `np.logical_or.reduce([df['A'] == 3, df['B'] == 7])`.
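The OR variants can be checked against each other in a small sketch; the data is assumed for illustration.

```python
import operator

import numpy as np
import pandas as pd

# Stand-in data, assumed for illustration only.
df = pd.DataFrame({"A": [5, 3, 3, 4, 8], "B": [0, 7, 5, 7, 8]})

by_pipe = (df["A"] == 3) | (df["B"] == 7)
by_methods = df["A"].eq(3) | df["B"].eq(7)
by_operator = operator.or_(df["A"] == 3, df["B"] == 7)
by_numpy = np.logical_or(df["A"] == 3, df["B"] == 7)
by_reduce = np.logical_or.reduce([df["A"] == 3, df["B"] == 7])

print(by_pipe.tolist())  # [False, True, True, True, False]
print(df[by_pipe])
```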

### Logical NOT

Given a mask such as `mask = pd.Series([True, True, False])`, if you need to invert every boolean value (so that the end result is `[False, False, True]`), then you can use any of the methods below.

#### Bitwise `~`

Write `~mask`. Again, expressions need to be parenthesised, as in `~(df['A'] == 3)`. This internally calls `mask.__invert__()`, but don't use it directly.

#### `operator.inv`

Internally calls `__invert__` on the Series, as in `operator.inv(mask)`.

#### `np.logical_not`

This is the numpy variant, `np.logical_not(mask)`.
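A quick sketch showing that the three spellings of NOT agree on a boolean mask:

```python
import operator

import numpy as np
import pandas as pd

mask = pd.Series([True, True, False])

by_tilde = ~mask                  # bitwise operator
by_operator = operator.inv(mask)  # functional form of `~`
by_numpy = np.logical_not(mask)   # NumPy variant

print(by_tilde.tolist())  # [False, False, True]
```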

Note that `np.logical_and` can be substituted with `np.bitwise_and`, `logical_or` with `bitwise_or`, and `logical_not` with `invert`.

## @flow2k 2019-06-13 21:40:47

@cs95 in the TLDR, for element-wise boolean OR, you advocate using `|`, which is equivalent to `numpy.bitwise_or`, instead of `numpy.logical_or`. May I ask why? Isn't `numpy.logical_or` designed for this task specifically? Why add the burden of doing it bitwise for each pair of elements?

## @cs95 2019-06-13 21:50:21

@flow2k can you quote the relevant text please? I cannot find what you're referring to. FWIW I maintain that logical_* is the correct functional equivalent of the operators.

## @flow2k 2019-06-13 21:59:08

@cs95 I am referring to the first line of the Answer: "TLDR; Logical Operators in Pandas are &, | and ~".

## @cs95 2019-06-13 22:05:05

@flow2k It is literally in the documentation: "Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not."

## @flow2k 2019-06-14 00:11:48

@cs95, ok, I just read this section, and it does use `|` for element-wise boolean operation. But to me, that documentation is more of a "tutorial", and in contrast, I feel these API references are closer to the source of truth: numpy.bitwise_or and numpy.logical_or - so I'm trying to make sense of what is described here.

## @flow2k 2019-06-14 00:11:52

What's unclear to me is this: In the first numpy doc, it is mentioned that `numpy.bitwise_or` is equivalent to `|`. But they don't say `numpy.bitwise_or` is functionally equivalent to `numpy.logical_or`. So how can we be sure they are? The former is a bitwise operation so doesn't it depend on NumPy's binary representation of the Boolean values?

## @cs95 2019-06-14 01:33:40

@flow2k the main difference between logical and bitwise operations is the short circuiting property (bitwise operators do not short circuit). So in that respect, they are not equivalent. However they do produce the same output for boolean masks.

## @flow2k 2019-06-14 19:05:06

@cs95 but a boolean value (one element in the boolean mask) is not encoded/represented as 1 bit, or is it? If it's not 1 bit, then a bitwise operation may produce different output than a logical operation.

## @cs95 2019-06-14 19:09:11

@flow2k A bool is represented by an 8 bit number but uses only 1 bit. This is well understood.

## @flow2k 2019-06-15 03:02:59

@cs95 And the bitwise boolean operations only operate on the 1 bit of the 8? Would you have a reference on this fact?

## @flow2k 2019-06-15 03:17:30

For example, doing a bitwise not operation would normally invert all the bits, including the other 7.

## @cs95 2019-06-15 03:50:44

@flow2k it won't if the array is dtype `bool`. Otherwise, you're right. Try it out: `np.bitwise_not([False])` versus `np.bitwise_not(np.array([False], dtype=object))`

## @flow2k 2019-06-15 20:07:25

@cs95 that's interesting. I also tested with `bitwise_xor`, and it seems these bitwise operators do not blindly work on all bits - they check the type, as you mentioned; if it's `np.bool_`, they are "smart" enough to operate only on the meaningful bit. So back to my original point: I now see that for Boolean element-wise operations, `|` and `numpy.bitwise_or` are equivalent to `numpy.logical_or`, and `|` is probably preferred due to succinctness.

## @MSeifert 2019-01-25 21:48:36

It's important to realize that you cannot use any of the Python logical operators (`and`, `or` or `not`) on `pandas.Series` or `pandas.DataFrame`s (similarly you cannot use them on `numpy.array`s with more than one element). The reason is that they implicitly call `bool` on their operands, which throws an exception because these data structures decided that the boolean of an array is ambiguous.

I did cover this more extensively in my answer to the "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" Q+A.

## NumPy's logical functions

However NumPy provides element-wise operating equivalents to these operators as functions that can be used on `numpy.array`, `pandas.Series`, `pandas.DataFrame`, or any other (conforming) `numpy.array` subclass:

- `and` has `np.logical_and`
- `or` has `np.logical_or`
- `not` has `np.logical_not`
- `numpy.logical_xor`, which has no Python equivalent but is a logical "exclusive or" operation

So, essentially, one should use (assuming `df1` and `df2` are pandas DataFrames) `np.logical_and(df1, df2)`, `np.logical_or(df1, df2)`, `np.logical_not(df1)` and `np.logical_xor(df1, df2)`.

## Bitwise functions and bitwise operators for booleans

However in case you have boolean NumPy arrays, pandas Series, or pandas DataFrames you could also use the element-wise bitwise functions (for booleans they are - or at least should be - indistinguishable from the logical functions):

- `np.bitwise_and` or the `&` operator
- `np.bitwise_or` or the `|` operator
- `np.invert` (or the alias `np.bitwise_not`) or the `~` operator
- `np.bitwise_xor` or the `^` operator

Typically the operators are used. However when combined with comparison operators one has to remember to wrap the comparison in parentheses because the bitwise operators have a higher precedence than the comparison operators, as in `(df1 < 10) | (df2 > 10)`.

This may be irritating because the Python logical operators have a lower precedence than the comparison operators, so you normally write `a < 10 and b > 10` (where `a` and `b` are for example simple integers) and don't need the parentheses.

## Differences between logical and bitwise operations (on non-booleans)

It is really important to stress that bit and logical operations are only equivalent for boolean NumPy arrays (and boolean Series & DataFrames). If these don't contain booleans then the operations will give different results. I'll include examples using NumPy arrays but the results will be similar for the pandas data structures:

And since NumPy (and similarly pandas) does different things for boolean (Boolean or "mask" index arrays) and integer (Index arrays) indices, the results of indexing will also be different.
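A small sketch of both differences, using integer arrays picked for illustration:

```python
import numpy as np

a = np.array([1, 2, 4, 8])
b = np.array([0, 0, 1, 1])

# Logical AND treats every non-zero value as True.
logical = np.logical_and(a, b)

# Bitwise AND combines the binary representations: 4 & 1 == 0 and 8 & 1 == 0.
bitwise = np.bitwise_and(a, b)

print(logical.tolist())  # [False, False, True, True]
print(bitwise.tolist())  # [0, 0, 0, 0]

# A boolean result acts as a mask; an integer result acts as a list of positions.
print(a[logical].tolist())  # [4, 8]
print(a[bitwise].tolist())  # [1, 1, 1, 1]
```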

## Summary table

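| Logical operator | NumPy logical function | NumPy bitwise function | Bitwise operator |
|---|---|---|---|
| `and` | `np.logical_and` | `np.bitwise_and` | `&` |
| `or` | `np.logical_or` | `np.bitwise_or` | `\|` |
| | `np.logical_xor` | `np.bitwise_xor` | `^` |
| `not` | `np.logical_not` | `np.invert` | `~` |

The logical operators do not work for NumPy arrays, pandas Series, and pandas DataFrames. The others work on these data structures (and plain Python objects) and work element-wise. However be careful with the bitwise invert on plain Python `bool`s because the bool will be interpreted as an integer in this context (for example `~False` returns `-1` and `~True` returns `-2`).

## @unutbu 2014-01-28 20:22:56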

When you say `(a['x']==1) and (a['y']==10)`, you are implicitly asking Python to convert `(a['x']==1)` and `(a['y']==10)` to boolean values.

NumPy arrays (of length greater than 1) and Pandas objects such as Series do not have a boolean value. In other words, they raise a `ValueError` saying that the truth value is ambiguous when used as a boolean value. That's because it's unclear when it should be True or False. Some users might assume they are True if they have non-zero length, like a Python list. Others might desire for it to be True only if all its elements are True. Others might want it to be True if any of its elements are True.

Because there are so many conflicting expectations, the designers of NumPy and Pandas refuse to guess, and instead raise a ValueError.

Instead, you must be explicit, by using the `empty` attribute or calling the `all()` or `any()` method to indicate which behavior you desire.

In this case, however, it looks like you do not want boolean evaluation, you want element-wise logical-and. That is what the `&` binary operator performs: `(a['x']==1) & (a['y']==10)` returns a boolean array.
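Using the DataFrame from the question, a brief sketch of the explicit reductions next to the element-wise `&`:

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 1], "y": [10, 20]})

x_is_one = a["x"] == 1
y_is_ten = a["y"] == 10

# Explicit ways to collapse a boolean Series to a single value.
print(y_is_ten.any())   # True: at least one element matches
print(y_is_ten.all())   # False: not every element matches
print(y_is_ten.empty)   # False: the Series has elements

# Element-wise AND keeps one boolean per row.
elementwise = x_is_one & y_is_ten
print(elementwise.tolist())  # [True, False]
print(a[elementwise])
```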

By the way, as alexpmil notes, the parentheses are mandatory since `&` has a higher operator precedence than `==`. Without the parentheses, `a['x']==1 & a['y']==10` would be evaluated as `a['x'] == (1 & a['y']) == 10`, which would in turn be equivalent to the chained comparison `(a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10)`. That is an expression of the form `Series and Series`. The use of `and` with two Series would again trigger the same `ValueError` as above. That's why the parentheses are mandatory.

## @Andy Hayden 2014-01-28 20:37:23

numpy arrays do have this property if they are length one. Only pandas devs (stubbornly) refuse to guess :p

## @Andy Hayden 2014-01-28 21:47:35

Discussion here: groups.google.com/forum/#!topic/pydata/XzSHSLlTSZ8

## @Indominus 2016-04-15 21:17:15

Doesn't '&' carry the same ambiguous curve as 'and'? How come when it comes to '&', suddenly all users all agree it should be element-wise, while when they see 'and', their expectations vary?

## @unutbu 2016-04-15 22:58:13

@Indominus: The Python language itself requires that the expression `x and y` triggers the evaluation of `bool(x)` and `bool(y)`. Python "first evaluates `x`; if `x` is false, its value is returned; otherwise, `y` is evaluated and the resulting value is returned." So the syntax `x and y` cannot be used for element-wise logical-and since only `x` or `y` can be returned. In contrast, `x & y` triggers `x.__and__(y)` and the `__and__` method can be defined to return anything we like.

## @Alex P. Miller 2017-07-18 18:41:36

Important to note: the parentheses around the `==` clause are mandatory. `a['x']==1 & a['y']==10` returns the same error as in the question.

## @Euler_Salter 2018-01-24 09:02:23

What is " | " for?

## @Kyle C 2019-01-25 22:36:48

@Euler_Salter `|` is the bitwise `or` operator. Python operator docs found here.