Pandas tips and tricks to help you get started with data analysis

Photo by Clay Banks on Unsplash

When it comes to selecting data from a DataFrame, Pandas provides two indexers, loc and iloc. In this article, we'll explore the differences between loc and iloc.
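1. Differences between loc and iloc
The main distinction between loc and iloc is that loc is label-based, so rows and columns are specified by their labels, while iloc is integer position-based, so rows and columns are specified by their 0-based integer positions.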
The following sections walk through the differences and similarities between loc and iloc. For demonstration, we create a DataFrame and load it with the Day column as the index.

import pandas as pd
df = pd.read_csv('data/data.csv', index_col=['Day'])

2. Selecting via a single value
Both loc and iloc accept a single value as input.
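Since data/data.csv is not reproduced here, the snippet below is a minimal sketch that builds a small stand-in DataFrame inline. The day abbreviations, the column order, and every value are assumptions made up for illustration; only the Day index and the Weather, Temperature, and Humidity column names come from the article. It then retrieves a single value with loc and with iloc.

```python
import pandas as pd

# Hypothetical stand-in for data/data.csv: labels, column order and values are made up.
df = pd.DataFrame(
    {
        "Weather": ["Sunny", "Shower", "Cloudy", "Sunny", "Shower", "Shower", "Sunny"],
        "Temperature": [12.8, 19.7, 17.6, 14.7, 10.9, 13.2, 15.1],
        "Humidity": [30, 60, 70, 40, 55, 50, 45],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

# loc takes labels: row label "Fri", column label "Temperature".
friday_by_label = df.loc["Fri", "Temperature"]

# iloc takes positions: "Fri" is row 4 and "Temperature" is column 1 in this stand-in.
friday_by_position = df.iloc[4, 1]

# A bare colon selects everything along that axis; both results are Series.
all_rows = df.loc[:, "Temperature"]
all_columns = df.loc["Fri", :]

print(friday_by_label, friday_by_position)
```

With this stand-in data, both lookups return the same value, and the two colon selections each come back as a 1-dimensional Series.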
For example, let's say we would like to retrieve Friday's temperature value. With loc, we pass the row label and the column label. The equivalent iloc statement takes the row position and the column position instead. We can also use : to return all data along an axis, for example to get all rows of one column, or all columns of one row. Note that those two outputs are Series.

3. Selecting via a list of values
We can pass a list of labels to loc to select multiple rows or columns. Similarly, a list of integer values can be passed to iloc:

>>> df.iloc[[3, 4], 1]

All the above outputs are Series because their results are 1-dimensional data. The output will be a DataFrame when the result is 2-dimensional data, for example, when we access multiple rows and multiple columns. The equivalent iloc statement takes a list of row positions, such as rows = [3, 4], together with a list of column positions.

4. Selecting a range of data via slice
Slice (written as start:stop:step) is a powerful technique that allows selecting a range of data.

loc with slice
With loc, we can use the syntax A:B to select data from label A to label B, with both A and B included. This works for slicing column labels as well as row labels. We can use the syntax A:B:S to select data from label A to label B with step size S.

iloc with slice
With iloc, we can use the syntax n:m to select data from position n (included) to position m (excluded). For example, selecting columns from position 0 up to 3 (excluded):

df.iloc[[1, 2], 0 : 3]

Similarly, we can use the syntax n:m:s to select data from position n (included) to position m (excluded) with step size s:

df.iloc[0:4:2, :]

5. Selecting via conditions and callable
Conditions
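Before moving on to conditions, here is a hedged sketch of the list and slice selections from sections 3 and 4. The DataFrame is a made-up stand-in for data/data.csv (the day abbreviations, column order, and values are assumptions), so only the shapes of the results matter, not the numbers.

```python
import pandas as pd

# Hypothetical stand-in for data/data.csv: labels, column order and values are made up.
df = pd.DataFrame(
    {
        "Weather": ["Sunny", "Shower", "Cloudy", "Sunny", "Shower", "Shower", "Sunny"],
        "Temperature": [12.8, 19.7, 17.6, 14.7, 10.9, 13.2, 15.1],
        "Humidity": [30, 60, 70, 40, 55, 50, 45],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

# Lists: multiple rows, one column -> Series.
temps = df.loc[["Thu", "Fri"], "Temperature"]
temps_by_pos = df.iloc[[3, 4], 1]

# Lists: multiple rows and multiple columns -> DataFrame.
block = df.loc[["Thu", "Fri"], ["Temperature", "Humidity"]]
block_by_pos = df.iloc[[3, 4], [1, 2]]

# Slices: loc includes the stop label, iloc excludes the stop position.
label_slice = df.loc["Mon":"Thu"]    # Mon, Tue, Wed, Thu
pos_slice = df.iloc[0:3]             # Mon, Tue, Wed

# Slices with a step.
stepped = df.loc["Mon":"Fri":2]      # Mon, Wed, Fri
stepped_pos = df.iloc[0:4:2, :]      # Mon, Wed

print(list(label_slice.index), list(pos_slice.index))
```

The point to take away is the endpoint behavior: the loc slice keeps the stop label, while the iloc slice stops one position short of it.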
Often we would like to filter the data based on conditions. For example, we may need to find the rows where humidity is greater than 50. With loc, we just need to pass the condition to the loc statement. Sometimes, we may need to use multiple conditions to filter our data. For example, find all the rows where humidity is more than 50 and the weather is Shower. In that case, each condition is wrapped in parentheses and the conditions are combined with &.
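The filters described above can be sketched as follows. This is an illustration on a made-up stand-in DataFrame (labels, column order, and values are assumptions, not the article's data). It also shows the iloc counterpart, which in this sketch needs the boolean Series converted to a plain list first.

```python
import pandas as pd

# Hypothetical stand-in for data/data.csv: labels, column order and values are made up.
df = pd.DataFrame(
    {
        "Weather": ["Sunny", "Shower", "Cloudy", "Sunny", "Shower", "Shower", "Sunny"],
        "Temperature": [12.8, 19.7, 17.6, 14.7, 10.9, 13.2, 15.1],
        "Humidity": [30, 60, 70, 40, 55, 50, 45],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

# One condition: loc accepts the boolean Series directly.
humid = df.loc[df.Humidity > 50]

# Multiple conditions: parenthesize each one and combine with &.
humid_shower = df.loc[
    (df.Humidity > 50) & (df.Weather == "Shower"),
    ["Temperature", "Humidity"],
]

# iloc rejects a boolean Series; record which error type is raised.
try:
    df.iloc[df.Humidity > 50, :]
    iloc_error = None
except (ValueError, NotImplementedError) as exc:
    iloc_error = type(exc).__name__

# Converting the Series to a plain boolean list makes it acceptable to iloc.
humid_by_pos = df.iloc[list(df.Humidity > 50), :]

print(list(humid.index), list(humid_shower.index), iloc_error)
```

With the stand-in values, the single condition keeps Tue, Wed, and Fri, and adding the Shower condition narrows that to Tue and Fri.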
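For iloc, we get a ValueError if we pass the condition straight into the statement. We get the error because iloc cannot accept a boolean Series. It only accepts a boolean list, so we can use the list() function to convert the Series into a boolean list. This works for a single condition and, similarly, for multiple conditions.

Callable function
loc accepts a callable as an indexer. The callable must be a function with one argument that returns valid output for indexing.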
For example, to select columns with a callable, we pass a function that returns a list of column labels. And to filter data with a callable, we pass a function that returns a boolean condition.
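A minimal sketch of both callable uses with loc is shown here. The lambda parameter name d and the stand-in DataFrame (labels, column order, and values) are assumptions chosen for illustration. In each case the callable receives the DataFrame and returns something loc already understands.

```python
import pandas as pd

# Hypothetical stand-in for data/data.csv: labels, column order and values are made up.
df = pd.DataFrame(
    {
        "Weather": ["Sunny", "Shower", "Cloudy", "Sunny", "Shower", "Shower", "Sunny"],
        "Temperature": [12.8, 19.7, 17.6, 14.7, 10.9, 13.2, 15.1],
        "Humidity": [30, 60, 70, 40, 55, 50, 45],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

# Selecting columns: the callable returns a list of column labels.
selected = df.loc[:, lambda d: ["Humidity", "Temperature"]]

# Filtering rows: the callable returns a boolean Series.
filtered = df.loc[lambda d: d.Humidity > 50, :]

print(list(selected.columns), list(filtered.index))
```

The callable form gives the same result as passing the list or the condition directly; it is mainly convenient when the DataFrame being indexed has no variable name yet.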
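iloc also accepts a callable:

df.iloc[lambda df: [0,1], :]

To filter data with a callable, iloc requires list() to convert the output of the condition into a boolean list:

df.iloc[lambda df: list(df.Humidity > 50), :]

6. loc and iloc are interchangeable when labels are 0-based integers
For demonstration, let's create a DataFrame with 0-based integers as headers and index labels, again using pd.read_csv. Now loc, a label-based selector, accepts integer values:

>>> df.loc[1, 2]

The reason this works is that those integer values (1 and 2) are interpreted as labels of the index and the columns, not as positions. In this case, the labels and the positions are the same, so loc and iloc return the same value:

>>> df.loc[1, 2] == df.iloc[1, 2]

Note that loc and iloc still return different results when selecting via slice, because the stop value is included by loc but excluded by iloc.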
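To make the 0-based integer case concrete, here is a small sketch. It builds a DataFrame without explicit labels, so pandas assigns the default 0-based integer index and column labels; the name df_int and the values are made up for illustration. It checks where loc and iloc agree and where slicing makes them differ.

```python
import pandas as pd

# No index or columns given, so both default to 0-based integers. Values are made up.
df_int = pd.DataFrame(
    [
        ["Sunny", 12.8, 30],
        ["Shower", 19.7, 60],
        ["Cloudy", 17.6, 70],
        ["Sunny", 14.7, 40],
    ]
)

# Single value: label 1 / label 2 happens to coincide with position 1 / position 2.
same_value = df_int.loc[1, 2] == df_int.iloc[1, 2]

# List of values: also identical.
same_block = df_int.loc[[1, 2], [1, 2]].equals(df_int.iloc[[1, 2], [1, 2]])

# Slice: loc treats 0:2 as labels (stop included), iloc as positions (stop excluded).
rows_loc = len(df_int.loc[0:2])
rows_iloc = len(df_int.iloc[0:2])

print(same_value, same_block, rows_loc, rows_iloc)
```

So the two indexers agree for single values and lists here, but the same slice returns three rows with loc and two rows with iloc.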
Conclusion
Finally, here is a summary: loc selects rows and columns by label, while iloc selects them by integer position.
I hope this article helps you save time in learning Pandas data selection. I recommend checking out the documentation to learn about other things you can do. Thanks for reading. Please check out the notebook for the source code and stay tuned if you are interested in the practical aspect of machine learning.
More tutorials can be found on my Github.

What is the difference between loc[] and iloc[]?
The main difference between pandas loc[] and iloc[] is that loc[] gets DataFrame rows and columns by labels/names, while iloc[] gets them by integer index/position. For loc[], if the label is not present, it raises a KeyError. For iloc[], if the position is not present, it raises an IndexError.
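The two error cases can be reproduced with a short sketch. The DataFrame, the missing label "Holiday", and the out-of-range position 10 are all made up for illustration.

```python
import pandas as pd

# Small made-up DataFrame with three labeled rows.
df = pd.DataFrame(
    {"Temperature": [12.8, 19.7, 17.6]},
    index=pd.Index(["Mon", "Tue", "Wed"], name="Day"),
)

# loc with a label that does not exist raises KeyError.
try:
    df.loc["Holiday"]
    loc_error = None
except KeyError:
    loc_error = "KeyError"

# iloc with a position past the end raises IndexError.
try:
    df.iloc[10]
    iloc_error = None
except IndexError:
    iloc_error = "IndexError"

print(loc_error, iloc_error)
```

Catching the specific exception type, rather than a bare except, keeps a missing label distinguishable from an out-of-range position.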
What is iloc in Python?
iloc is an indexer defined in the Pandas library that selects specific rows or columns from a data set. It is used with square brackets rather than called like a function. Using iloc, we can retrieve any particular value from a row or column by its integer position.
What does iloc mean?
iloc is an index-based selection technique, which means that we pass integer positions to select a specific row or column. Inputs you can use for .iloc include an integer and a list of integers.
What is the iloc property used for?
iloc[] is a property used to select rows and columns by position/index. If the position/index does not exist, it raises an IndexError.