What is iloc [] in python?

Pandas tips and tricks to help you get started with data analysis

When it comes to selecting data from a DataFrame, Pandas loc and iloc are two of the most popular tools. They are fast, easy to read, and sometimes interchangeable.

In this article, we’ll explore the differences between loc and iloc, take a look at their similarities, and see how to perform data selection with them. We will go over the following topics:

  1. Differences between loc and iloc
  2. Selecting via a single value
  3. Selecting via a list of values
  4. Selecting a range of data via slice
  5. Selecting via conditions and callable
  6. loc and iloc are interchangeable when labels are 0-based integers

Please check out the notebook for the source code.

1. Differences between loc and iloc

The main distinction between loc and iloc is:

  • loc is label-based, which means that you have to specify rows and columns based on their row and column labels.
  • iloc is integer position-based, so you have to specify rows and columns by their integer position values (0-based integer position).
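
To make the distinction concrete, here is a minimal sketch on a hypothetical toy DataFrame (the scores name and its values are made up for illustration and are not part of this article's dataset). The row labels are deliberately out of order so that a label cannot be mistaken for a position:

```python
import pandas as pd

# Hypothetical toy data, used only to contrast labels with positions.
scores = pd.DataFrame(
    {"score": [90, 80, 70]},
    index=["c", "a", "b"],  # row labels, deliberately not sorted
)

by_label = scores.loc["a", "score"]  # the row labelled "a"
by_position = scores.iloc[0, 0]      # the first row, first column

print(by_label)     # 80
print(by_position)  # 90
```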

Here are some differences and similarities between loc and iloc:

[Figure: Differences and similarities between loc and iloc (image by author)]

For demonstration, we create a DataFrame and load it with the Day column as the index.

import pandas as pd

df = pd.read_csv('data/data.csv', index_col=['Day'])
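
If you do not have data/data.csv at hand, the following sketch builds a stand-in from an in-memory string. It is an approximation, not the original file: the Temperature column, Friday's row, and Tuesday's weather match the outputs printed in this article, while every other Weather, Wind, and Humidity value is an invented placeholder.

```python
import io

import pandas as pd

# Stand-in for data/data.csv. Only the Temperature column, Friday's row and
# Tuesday's weather come from the article; the rest are invented placeholders.
csv_text = """Day,Weather,Temperature,Wind,Humidity
Mon,Sunny,12.79,13,30
Tue,Sunny,19.67,28,96
Wed,Cloudy,17.51,16,20
Thu,Shower,14.44,11,22
Fri,Shower,10.51,26,79
Sat,Shower,11.07,27,62
Sun,Sunny,17.50,20,10
"""

# Use the Day column as the row labels, as in the article.
df = pd.read_csv(io.StringIO(csv_text), index_col="Day")

print(df.shape)          # (7, 4)
print(list(df.columns))  # ['Weather', 'Temperature', 'Wind', 'Humidity']
```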

2. Selecting via a single value

Both loc and iloc allow input to be a single value. We can use the following syntax for data selection:

  • loc[row_label, column_label]
  • iloc[row_position, column_position]

For example, let’s say we would like to retrieve Friday’s temperature value.

With loc, we can pass the row label 'Fri' and the column label 'Temperature'.

# To get Friday's temperature
>>> df.loc['Fri', 'Temperature']
10.51

The equivalent iloc statement takes the row position 4 and the column position 1.

# The equivalent `iloc` statement
>>> df.iloc[4, 1]
10.51
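
If you know a label but need the matching position, Index.get_loc can translate one into the other. The sketch below assumes a cut-down, two-column stand-in for the article's DataFrame: the Temperature values are the ones printed in this article, while the Weather values other than Tuesday's and Friday's are invented placeholders.

```python
import pandas as pd

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
df = pd.DataFrame(
    {
        # Placeholder values, except Tue and Fri which appear in the article.
        "Weather": ["Sunny", "Sunny", "Cloudy", "Shower", "Shower", "Shower", "Sunny"],
        "Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50],
    },
    index=pd.Index(days, name="Day"),
)

row_pos = df.index.get_loc("Fri")            # row label -> row position
col_pos = df.columns.get_loc("Temperature")  # column label -> column position

print(row_pos, col_pos)           # 4 1
print(df.iloc[row_pos, col_pos])  # 10.51
```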

We can also use : to return all data. For example, to get all rows:

# To get all rows
>>> df.loc[:, 'Temperature']
Day
Mon 12.79
Tue 19.67
Wed 17.51
Thu 14.44
Fri 10.51
Sat 11.07
Sun 17.50
Name: Temperature, dtype: float64
# The equivalent `iloc` statement
>>> df.iloc[:, 1]

And to get all columns:

# To get all columns
>>> df.loc['Fri', :]
Weather Shower
Temperature 10.51
Wind 26
Humidity 79
Name: Fri, dtype: object
# The equivalent `iloc` statement
>>> df.iloc[4, :]

Note that the above 2 outputs are Series. loc and iloc return a Series when the result is 1-dimensional data.

3. Selecting via a list of values

We can pass a list of labels to loc to select multiple rows or columns:

# Multiple rows
>>> df.loc[['Thu', 'Fri'], 'Temperature']
Day
Thu 14.44
Fri 10.51
Name: Temperature, dtype: float64
# Multiple columns
>>> df.loc['Fri', ['Temperature', 'Wind']]
Temperature 10.51
Wind 26
Name: Fri, dtype: object

Similarly, a list of integer values can be passed to iloc to select multiple rows or columns. Here are the equivalent statements using iloc:

>>> df.iloc[[3, 4], 1]
Day
Thu 14.44
Fri 10.51
Name: Temperature, dtype: float64
>>> df.iloc[4, [1, 2]]
Temperature 10.51
Wind 26
Name: Fri, dtype: object

All the above outputs are Series because their results are 1-dimensional data.

The output will be a DataFrame when the result is 2-dimensional data, for example, when accessing multiple rows and columns:

# Multiple rows and columns
rows = ['Thu', 'Fri']
cols = ['Temperature', 'Wind']
df.loc[rows, cols]

The equivalent iloc statement is:

rows = [3, 4]
cols = [1, 2]
df.iloc[rows, cols]
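
A quick way to see which type comes back is to check it directly. This is a minimal sketch on a one-column stand-in that uses only the Temperature values printed earlier. Note that wrapping a single label in a list keeps that dimension, so the result stays a DataFrame:

```python
import pandas as pd

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
df = pd.DataFrame(
    {"Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50]},
    index=pd.Index(days, name="Day"),
)

scalar = df.loc["Fri", "Temperature"]            # single value
series = df.loc[["Thu", "Fri"], "Temperature"]   # 1-dimensional: Series
frame = df.loc[["Thu", "Fri"], ["Temperature"]]  # 2-dimensional: DataFrame

print(type(series).__name__)  # Series
print(type(frame).__name__)   # DataFrame
print(frame.shape)            # (2, 1)
```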

4. Selecting a range of data via slice

Slice (written as start:stop:step) is a powerful technique that allows selecting a range of data. It is very useful when we want to select everything in between two items.

loc with slice

With loc, we can use the syntax A:B to select data from label A to label B (both A and B are included):

# Slicing column labels
rows = ['Thu', 'Fri']
df.loc[rows, 'Temperature':'Humidity']

# Slicing row labels
cols = ['Temperature', 'Wind']
df.loc['Mon':'Thu', cols]

We can use the syntax A:B:S to select data from label A to label B with step size S (both A and B are included):

# Slicing with step
df.loc['Mon':'Fri':2, :]

iloc with slice

With iloc, we can also use the syntax n:m to select data from position n (included) to position m (excluded). The main difference from loc is that the endpoint m is excluded from the result.

For example, selecting columns from position 0 up to 3 (excluded):

df.iloc[[1, 2], 0:3]

Similarly, we can use the syntax n:m:s to select data from position n (included) to position m (excluded) with step size s. Note that the endpoint m is excluded.

df.iloc[0:4:2, :]
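
The endpoint rule is easy to check side by side. The sketch below assumes a one-column stand-in that uses only the Temperature values printed earlier, and compares the row labels returned by each kind of slice:

```python
import pandas as pd

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
df = pd.DataFrame(
    {"Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50]},
    index=pd.Index(days, name="Day"),
)

# loc: both endpoints are included
loc_rows = list(df.loc["Mon":"Thu"].index)
# iloc: the stop position is excluded
iloc_rows = list(df.iloc[0:3].index)
# The same rule applies when a step is given
loc_step = list(df.loc["Mon":"Fri":2].index)
iloc_step = list(df.iloc[0:4:2].index)

print(loc_rows)   # ['Mon', 'Tue', 'Wed', 'Thu']
print(iloc_rows)  # ['Mon', 'Tue', 'Wed']
print(loc_step)   # ['Mon', 'Wed', 'Fri']
print(iloc_step)  # ['Mon', 'Wed']
```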

5. Selecting via conditions and callable

Conditions

loc with conditions

Often we would like to filter the data based on conditions. For example, we may need to find the rows where humidity is greater than 50.

With loc, we just need to pass the condition to the loc statement.

# One condition
df.loc[df.Humidity > 50, :]

Sometimes, we may need to use multiple conditions to filter our data. For example, find all the rows where humidity is more than 50 and the weather is Shower:

# Multiple conditions
df.loc[
    (df.Humidity > 50) & (df.Weather == 'Shower'),
    ['Temperature', 'Wind'],
]

iloc with conditions

For iloc, we will get a ValueError if we pass the condition straight into the statement:

# Getting ValueError
df.iloc[df.Humidity > 50, :]

We get the error because iloc cannot accept a boolean Series. It only accepts a boolean list or array, so we can use the list() function to convert the Series into a boolean list.

# Single condition
df.iloc[list(df.Humidity > 50)]

Similarly, we can use list() to convert the output of multiple conditions into a boolean list:

# Multiple conditions
df.iloc[
    list((df.Humidity > 50) & (df.Weather == 'Shower')),
    :,
]
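
As an alternative to list(), Series.to_numpy() also yields a plain boolean array that iloc accepts. The sketch below is an illustration under assumptions: it filters on the Temperature column (whose values are printed in full earlier) rather than Humidity, and the threshold of 15 is arbitrary.

```python
import pandas as pd

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
df = pd.DataFrame(
    {"Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50]},
    index=pd.Index(days, name="Day"),
)

mask = df["Temperature"] > 15        # boolean Series indexed by Day

via_loc = df.loc[mask]               # loc accepts the Series as-is
via_iloc = df.iloc[mask.to_numpy()]  # iloc needs a plain boolean array

# Passing the Series straight to iloc is rejected (a ValueError in the
# article's example); catch it here so the script keeps running.
try:
    df.iloc[mask]
    raised = False
except (ValueError, NotImplementedError):
    raised = True

print(list(via_loc.index))       # ['Tue', 'Wed', 'Sun']
print(via_loc.equals(via_iloc))  # True
```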

Callable function

loc with callable

loc accepts a callable as an indexer. The callable must be a function with one argument that returns valid output for indexing.

For example, to select columns:

# Selecting columns
df.loc[:, lambda df: ['Humidity', 'Wind']]

And to filter data with a callable:

# With condition
df.loc[lambda df: df.Humidity > 50, :]

iloc with callable

iloc can also take a callable as an indexer.

df.iloc[lambda df: [0,1], :]

To filter data with a callable, iloc requires list() to convert the output of the condition into a boolean list:

df.iloc[lambda df: list(df.Humidity > 50), :]
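
Callables are most useful in a method chain, where the intermediate DataFrame has no variable name to refer to. Here is a minimal sketch on a one-column stand-in that uses the Temperature values printed earlier; the Delta column name is made up for illustration:

```python
import pandas as pd

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
temps = pd.DataFrame(
    {"Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50]},
    index=pd.Index(days, name="Day"),
)

above_average = (
    temps
    # Delta is a hypothetical helper column: distance from the mean.
    .assign(Delta=lambda d: d["Temperature"] - d["Temperature"].mean())
    # The callable receives the DataFrame produced by assign().
    .loc[lambda d: d["Delta"] > 0, :]
)

print(list(above_average.index))  # ['Tue', 'Wed', 'Sun']
```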

6. loc and iloc are interchangeable when labels are 0-based integers

For demonstration, let’s create a DataFrame with 0-based integers as headers and index labels.

df = pd.read_csv(
    'data/data.csv',
    header=None,
    skiprows=[0],
)

With header=None, Pandas generates 0-based integer values as headers. With skiprows=[0], the original header row (Weather, Temperature, etc.) that we have been using is skipped.

Now, loc, a label-based data selector, can accept a single integer and a list of integer values. For example:

>>> df.loc[1, 2]
19.67

>>> df.loc[1, [1, 2]]
1 Sunny
2 19.67
Name: 1, dtype: object

These statements work because the integer values (1 and 2) are interpreted as labels of the index, not as integer positions along it, which can be a bit confusing.

In this case, loc and iloc are interchangeable when selecting via a single value or a list of values.

>>> df.loc[1, 2] == df.iloc[1, 2]
True
>>> df.loc[1, [1, 2]] == df.iloc[1, [1, 2]]
1 True
2 True
Name: 1, dtype: bool

Note that loc and iloc still behave differently when selecting via slice and conditions:

  • slice: the endpoint is excluded from the iloc result, but included in the loc result
  • conditions: loc accepts a boolean Series, but iloc only accepts a boolean list or array.
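
The slice difference shows up even on a default 0-based index, and the two selectors drift apart as soon as rows are filtered out. This is a minimal sketch on a small hypothetical DataFrame that reuses the first five Temperature values from this article:

```python
import pandas as pd

# Default RangeIndex: the row labels are 0, 1, 2, 3, 4.
df = pd.DataFrame({"Temperature": [12.79, 19.67, 17.51, 14.44, 10.51]})

n_loc = len(df.loc[1:3])    # labels 1, 2 and 3
n_iloc = len(df.iloc[1:3])  # positions 1 and 2

# After filtering, the surviving labels are 1 and 2, at positions 0 and 1.
warm = df[df["Temperature"] > 15]
by_label = warm.loc[1, "Temperature"]  # label 1
by_position = warm.iloc[1, 0]          # position 1

print(n_loc, n_iloc)  # 3 2
print(by_label)       # 19.67
print(by_position)    # 17.51
```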

Conclusion

Finally, here is a summary.

loc is label-based, and the allowed inputs are:

  • A single label 'A' or 2 (Note that 2 is interpreted as a label of the index.)
  • A list of labels ['A', 'B', 'C'] or [1, 2, 3] (Note that 1, 2, 3 are interpreted as labels of the index.)
  • A slice with labels 'A':'C' (Both are included)
  • Conditions, a boolean Series or a boolean array
  • A callable function with one argument

iloc is integer position-based, and the allowed inputs are:

  • An integer, e.g. 2.
  • A list or array of integers [1, 2, 3].
  • A slice with integers 1:7 (the endpoint 7 is excluded)
  • Conditions, but only as a boolean list or array
  • A callable function with one argument

loc and iloc are interchangeable for single values and lists of values when the labels of a Pandas DataFrame are 0-based integers.

I hope this article helps you save time in learning Pandas data selection. I recommend checking out the documentation to learn about other things you can do.

Thanks for reading. Please check out the notebook for the source code and stay tuned if you are interested in the practical aspect of machine learning.

You may be interested in some of my other Pandas articles:

  • Pandas cut() function for transforming numerical data into categorical data
  • Using Pandas method chaining to improve code readability
  • How to do a Custom Sort on Pandas DataFrame
  • All the Pandas shift() you should know for data analysis
  • When to use Pandas transform() function
  • Pandas concat() tricks you should know
  • Difference between apply() and transform() in Pandas
  • All the Pandas merge() you should know
  • Working with datetime in Pandas DataFrame
  • Pandas read_csv() tricks you should know
  • 4 tricks you should know to parse date columns with Pandas read_csv()

More tutorials can be found on my GitHub.

What is the difference between loc[] and iloc[]?

The main difference between pandas loc[] and iloc[] is that loc[] selects DataFrame rows and columns by label, while iloc[] selects them by integer position. With loc[], a label that is not present raises a KeyError. With iloc[], a position that is out of bounds raises an IndexError.
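
The two error types can be observed directly. The sketch below uses a hypothetical two-row DataFrame and also shows that an out-of-range slice is simply truncated rather than raising:

```python
import pandas as pd

df = pd.DataFrame({"Temperature": [12.79, 19.67]}, index=["Mon", "Tue"])

try:
    df.loc["Fri"]   # no such label
except KeyError as err:
    loc_error = type(err).__name__

try:
    df.iloc[5]      # no such position
except IndexError as err:
    iloc_error = type(err).__name__

n_rows = len(df.iloc[0:10])  # slices past the end do not raise

print(loc_error)   # KeyError
print(iloc_error)  # IndexError
print(n_rows)      # 2
```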

What is iloc[] in Python?

iloc[] is an indexer defined on Pandas DataFrame and Series objects that selects specific rows or columns from a data set. It is used with square brackets rather than called like a function. With iloc[], we can retrieve any particular value from a row or column by its integer position.

What does iloc mean?

iloc[] is an integer position-based selection technique, which means that we pass integer positions to select a specific row or column. Inputs you can use for .iloc include an integer and a list of integers.

What is the iloc property used for?

iloc[] is a property used to select rows and columns by integer position. If the position does not exist, it raises an IndexError.