Mar 18, 2025
Selecting rows of a pandas DataFrame that meet a condition is, logically, a two-step process: first build a Boolean array from the condition, then use it to select rows.
See the following example.
```python
# Import the pandas library.
import pandas as pd

# Create a DataFrame.
df = pd.DataFrame({"A": [2, 4, 6, 8, 10], "B": [1, 3, 1, 3, 1]})
df
```

|   | A | B |
|---|---|---|
| 0 | 2 | 1 |
| 1 | 4 | 3 |
| 2 | 6 | 1 |
| 3 | 8 | 3 |
| 4 | 10 | 1 |
Suppose we want to select the rows where column B equals 3.
Step 1: Place a condition on column B. To place a condition on a column, select the column and compare it to a scalar or to an array of the same length. The result is a Boolean array with one True or False value per row; we save it so that we can use it in Step 2.
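A minimal sketch of Step 1, assuming the DataFrame above; the names is_b_three and matches_list are illustrative choices, not from the original. Comparing the column to a scalar gives a Boolean Series that shares the DataFrame's index, and comparing it to a list of the same length works element by element.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8, 10], "B": [1, 3, 1, 3, 1]})

# Compare column B to a scalar: one True/False per row, same index as df.
is_b_three = df["B"] == 3
print(is_b_three)

# Comparing to a list of the same length also works element by element.
matches_list = df["B"] == [1, 3, 3, 3, 1]
print(matches_list.tolist())  # [True, True, False, True, True]
```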
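Step 2: Select the rows where the Boolean array has True values by passing it to the .loc[] attribute. Note that .iloc[] does not accept the Boolean Series itself (it raises an error); it only accepts a plain Boolean array, such as the NumPy array returned by the Series' .to_numpy() method.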
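A sketch of Step 2 under the same assumptions, rebuilding the Boolean Series from Step 1 so the block runs on its own. It also shows the .iloc[] workaround: converting the mask to a NumPy array first, which selects the same rows by position.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8, 10], "B": [1, 3, 1, 3, 1]})
is_b_three = df["B"] == 3  # Step 1: Boolean Series

# Step 2: .loc[] keeps the rows whose index label maps to True.
selected = df.loc[is_b_three]
print(selected)

# .iloc[] will not take the Series itself, but it does take the
# underlying Boolean NumPy array, which selects rows by position.
selected_by_position = df.iloc[is_b_three.to_numpy()]
print(selected_by_position.equals(selected))  # True
```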
We can select rows which meet more than one condition.
We will use the logical operators: & for ‘and’, | for ‘or’ and ~ for ‘not’.
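To combine conditions, we create a Boolean array for each condition, combine the arrays with these operators, and then select rows.
In the following example, we will select rows where column A is greater than 5 and column B equals 3.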
Step 1: Generate a Boolean array from each condition. For the condition on column A (greater than 5), the result is:

```
0    False
1    False
2     True
3     True
4     True
Name: A, dtype: bool
```
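A sketch of this step for both conditions, assuming the same DataFrame; a_gt_5 and b_eq_3 are hypothetical names chosen for readability.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8, 10], "B": [1, 3, 1, 3, 1]})

# One Boolean Series per condition.
a_gt_5 = df["A"] > 5   # False, False, True, True, True
b_eq_3 = df["B"] == 3  # False, True, False, True, False
print(a_gt_5)
print(b_eq_3)
```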
Step 2: Combine multiple Boolean arrays into one Boolean array.
Note that we need to use the operators & for ‘and’, | for ‘or’ and ~ for ‘not’ when combining Boolean arrays. Python's keywords and, or and not ask each operand for a single truth value; a Series cannot provide one, so using them on a Series raises a ValueError.
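A sketch of Step 2, reusing the hypothetical names from Step 1. The & operator combines the two Series element by element, while the keyword and fails because it needs the truth value of the whole first Series.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8, 10], "B": [1, 3, 1, 3, 1]})
a_gt_5 = df["A"] > 5
b_eq_3 = df["B"] == 3

# Element-wise 'and': True only where both conditions hold.
both = a_gt_5 & b_eq_3
print(both.tolist())  # [False, False, False, True, False]

# The keyword 'and' calls bool() on a Series, which raises ValueError.
keyword_and_failed = False
try:
    a_gt_5 and b_eq_3
except ValueError as err:
    keyword_and_failed = True
    print("ValueError:", err)
```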
Step 3: Now, select the required rows.
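A sketch of Step 3 that keeps all three steps explicit, again with hypothetical variable names. Only the row with A = 8 and B = 3 satisfies both conditions.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8, 10], "B": [1, 3, 1, 3, 1]})

a_gt_5 = df["A"] > 5      # Step 1: one Boolean Series per condition
b_eq_3 = df["B"] == 3
both = a_gt_5 & b_eq_3    # Step 2: combine them

# Step 3: pass the combined Boolean Series to .loc[].
result = df.loc[both]
print(result)
```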
We can skip creating the Boolean arrays explicitly and combine all the steps in one line.
However, this makes the code less readable and, importantly, each condition must be wrapped in parentheses.
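A sketch of the one-line form on the same DataFrame, with every condition in its own parentheses. The | and ~ lines are extra illustrations of the other two operators, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 6, 8, 10], "B": [1, 3, 1, 3, 1]})

# All steps in one line; each condition needs its own parentheses.
and_rows = df.loc[(df["A"] > 5) & (df["B"] == 3)]

# The same pattern with | ('or') and ~ ('not').
or_rows = df.loc[(df["A"] > 5) | (df["B"] == 3)]
not_rows = df.loc[~(df["B"] == 3)]

print(and_rows.index.tolist())  # [3]
print(or_rows.index.tolist())   # [1, 2, 3, 4]
print(not_rows.index.tolist())  # [0, 2, 4]
```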
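If the conditions are not wrapped in parentheses, Python's operator precedence decides how the expression is grouped, and & binds more tightly than comparison operators such as == and >. For example, s1 == 1 & s2 == 1 is grouped as s1 == (1 & s2) == 1, a chained comparison whose implicit and asks for the truth value of a whole Series. This leads to errors: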
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_228782/473611134.py in ?()
----> 2 # No parentheses, error, even though the & operator is used.
      3 s1 == 1 & s2 == 1

~/miniconda3/lib/python3.9/site-packages/pandas/core/generic.py in ?(self)
   1575     @final
   1576     def __nonzero__(self) -> NoReturn:
-> 1577         raise ValueError(
   1578             f"The truth value of a {type(self).__name__} is ambiguous. "
   1579             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1580         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```
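The error can be reproduced with small Series. The values of s1 and s2 below are assumptions, since the original ones are not shown. Without parentheses the line is a chained comparison, which raises ValueError; parenthesizing each comparison gives the intended element-wise 'and'.

```python
import pandas as pd

# Hypothetical inputs; the original s1 and s2 are not shown.
s1 = pd.Series([1, 0, 1])
s2 = pd.Series([1, 1, 0])

chained_failed = False
try:
    s1 == 1 & s2 == 1  # grouped as s1 == (1 & s2) == 1
except ValueError as err:
    chained_failed = True
    print("ValueError:", err)

# Parenthesizing each comparison gives the intended element-wise 'and'.
fixed = (s1 == 1) & (s2 == 1)
print(fixed.tolist())  # [True, False, False]
```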
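Missing parentheses can also produce a wrong result without any error. In an expression such as s1 == 2 & (s2 == 1), s1 is compared, element by element, to 2 & (s2 == 1):

1. 2 & True is 0 (2 is binary 10 and True is 1); 0 is not 2; hence False.
2. 2 & False is 0; 0 is not 2; hence False.
3. Same as the previous row.

Hence the result is False in every row.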
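A sketch of this silent case with hypothetical values for s1 and s2 (chosen so that s1 holds 2 in every row, matching the reasoning above; they are not the original data). The unparenthesized version runs without error but returns all False, while the parenthesized version gives the intended answer.

```python
import pandas as pd

# Hypothetical Series, chosen only to illustrate operator precedence.
s1 = pd.Series([2, 2, 2])
s2 = pd.Series([1, 0, 0])

# Without parentheses, & binds first: s1 == (2 & (s2 == 1)).
unparenthesized = s1 == 2 & (s2 == 1)

# With parentheses, each comparison is evaluated first, then combined.
parenthesized = (s1 == 2) & (s2 == 1)

print(unparenthesized.tolist())  # [False, False, False]
print(parenthesized.tolist())    # [True, False, False]
```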