Selecting Rows Based on a Condition

Modified

Mar 18, 2025

Selecting rows of a pandas DataFrame which meet a condition is logically a two-step process.

  1. Generate a Boolean array by placing a condition on a column.
  2. Select rows corresponding to the index positions where the Boolean array is True.

See the following example.

# Import the pandas library.
import pandas as pd

# Create a DataFrame.
df = pd.DataFrame({"A":[2,4,6,8,10], "B":[1,3,1,3,1]})
df
A B
0 2 1
1 4 3
2 6 1
3 8 3
4 10 1

Suppose we want to select rows where column B has 3.

Step 1: Place a condition on column B. To place a condition on a column, select the column and compare it to a scalar or an array of the same length. We will save the result, so that we can use it in Step 2.

cond_B = df.loc[:,'B'] == 3
cond_B
0    False
1     True
2    False
3     True
4    False
Name: B, dtype: bool

Step 2: Select rows corresponding to the index positions wher ethe Boolean array has True values. Note that only the .loc[] attribute accepts a Boolean array; we cannot use iloc[].

df.loc[cond_B]
A B
1 4 3
3 8 3

Multiple Conditions

We can select rows which meet more than one condition.

We will use the logical operators: & for ‘and’, | for ‘or’ and ~ for ‘not’.

To combine conditions:

  1. Place a condition on a column to generate a Boolean array. Repeat this for every condition.
  2. Combine all the Boolean arrays using the appropriate logical operators; use parentheses if necessary.
  3. Select rows where the result of the combined Boolean array is True.

In the following example, we will select rows where column A is greater than 5 and B is 3.

Step 1: Generate a Boolean array from each condition.

# We already have cond_B.

cond_A = df.loc[:,"A"] > 5
cond_A
0    False
1    False
2     True
3     True
4     True
Name: A, dtype: bool

Step 2: Combine multiple Boolean arrays into one Boolean array.

Note that we need to use the operators & for ‘and’, | for ‘or’ and ~ for ‘not’ when combining Boolean arrays. The operators and, or and not expect scalar arguments.

cond_combined = cond_B & cond_A
cond_combined
0    False
1    False
2    False
3     True
4    False
dtype: bool

Step 3: Now, select the required rows.

df.loc[cond_combined]
A B
3 8 3

Combining Steps

We can avoid explicitly creating a Boolean array and write code with both (all) steps combined in one line.

However, this makes the code less readable, AND, importantly, parentheses are required.

# Same as the first example.
df.loc[df.loc[:,"B"]==3]
A B
1 4 3
3 8 3
# Same as second example.
# BUT PARENTHESES ARE REQUIRED.
df.loc[(df.loc[:,"B"]==3) & (df.loc[:,"A"]>5)]
A B
3 8 3

Additional Examples for Errors

If parentheses are not used when combining two conditions, Python’s operator precedence dictates how the code is evaluated. It leads to errors.

s1 = pd.Series([2,2,2])
s1
0    2
1    2
2    2
dtype: int64
s2 = pd.Series([1,2,3])
s2
0    1
1    2
2    3
dtype: int64
# No parentheses, error, even though the & operator is used.
s1 == 1 & s2 == 1
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_228782/473611134.py in ?()
----> 2 # No parentheses, error, even though the & operator is used.
      3 s1 == 1 & s2 == 1

~/miniconda3/lib/python3.9/site-packages/pandas/core/generic.py in ?(self)
   1575     @final
   1576     def __nonzero__(self) -> NoReturn:
-> 1577         raise ValueError(
   1578             f"The truth value of a {type(self).__name__} is ambiguous. "
   1579             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1580         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
# One parentheses, UNEXPECTED result!
s1 == 2 & (s2 == 1)
0    False
1    False
2    False
dtype: bool

In the above, s1 is compared to 2 & (s2==1), element-by-element. 1. 2 and True is 1; hence not 2; hence False. 2. 2 and False is 0; hence not 2; hence False. 3. Same as previous row.

Hence the result.

2 & False
0
# Use parentheses around each condition to first generate the Boolean arrays before combining them.
(s1==2) & (s2==1)
0     True
1    False
2    False
dtype: bool
# Pandas version used in this article.
pd.__version__
'2.2.2'

References

Pandas documentation: why parentheses are required.