How to put the results of a loop into a DataFrame in Python

If you iterate a pandas.DataFrame directly in a for loop, the column names are returned. To iterate over columns or rows, use the items(), iterrows(), and itertuples() methods. (items() replaces iteritems(), which was deprecated in pandas 1.5 and removed in 2.0.)

This article describes the following contents.

Iterate pandas.DataFrame in for loop as it is
Iterate columns of pandas.DataFrame
- DataFrame.items() (formerly iteritems())
Iterate rows of pandas.DataFrame
- DataFrame.iterrows()
- DataFrame.itertuples()
Iterate only specific columns
Update values in for loop
Speed comparison

For more information on the for statement in Python, see the following article.

for loop in Python (with range, enumerate, zip, etc.)

Use the following pandas.DataFrame as an example.

import pandas as pd
import numpy as np

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

print(df)
#        age state  point
# Alice   24    NY     64
# Bob     42    CA     92

Iterate pandas.DataFrame in for loop as it is

If you iterate pandas.DataFrame in a for loop as is, the column names are returned in order.

for column_name in df:
    print(column_name)
# age
# state
# point

Iterate columns of pandas.DataFrame

DataFrame.items() (formerly iteritems())

The items() method iterates over columns and returns (column name, Series), a tuple with the column name and the content as pandas.Series. The equivalent iteritems() method was deprecated in pandas 1.5 and removed in 2.0, so use items() on current versions.

pandas.DataFrame.iteritems — pandas 1.4.2 documentation

for column_name, item in df.items():
    print(column_name)
    print('------')
    print(type(item))
    print(item)
    print('------')
    # .iloc[] selects by position; [] and attribute access select by label
    print(item.iloc[0], item['Alice'], item.Alice)
    print(item.iloc[1], item['Bob'], item.Bob)
    print('======\n')
# age
# ------
# <class 'pandas.core.series.Series'>
# Alice    24
# Bob      42
# Name: age, dtype: int64
# ------
# 24 24 24
# 42 42 42
# ======
# 
# state
# ------
# <class 'pandas.core.series.Series'>
# Alice    NY
# Bob      CA
# Name: state, dtype: object
# ------
# NY NY NY
# CA CA CA
# ======
# 
# point
# ------
# <class 'pandas.core.series.Series'>
# Alice    64
# Bob      92
# Name: point, dtype: int64
# ------
# 64 64 64
# 92 92 92
# ======
#
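
Looping over columns is also a convenient way to collect one result per column and put those results into a new DataFrame. The following is a minimal sketch, not part of the original article: the summary fields 'n_unique' and 'first' are illustrative choices, and it uses items(), the method name available on current pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# Collect one dict per column, then build the DataFrame once after the loop
summary = []
for column_name, item in df.items():
    summary.append({
        'column': column_name,
        'n_unique': item.nunique(),  # number of distinct values in the column
        'first': item.iloc[0],       # first value, selected by position
    })

summary_df = pd.DataFrame(summary)
print(summary_df)
```

Building the DataFrame once from a list is usually cheaper than growing a DataFrame inside the loop.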

Iterate rows of pandas.DataFrame

The iterrows() and itertuples() methods iterate over rows. The itertuples() method is faster.

If you only need the values of a particular column, it is even faster to iterate over the elements of that column directly, as explained in "Iterate only specific columns" below. The results of the speed comparison are shown at the end.

DataFrame.iterrows()

The iterrows() method iterates over rows and returns (index, Series), a tuple with the index and the content as pandas.Series.

pandas.DataFrame.iterrows — pandas 1.4.2 documentation

for index, row in df.iterrows():
    print(index)
    print('------')
    print(type(row))
    print(row)
    print('------')
    # .iloc[] selects by position; [] and attribute access select by label
    print(row.iloc[0], row['age'], row.age)
    print(row.iloc[1], row['state'], row.state)
    print(row.iloc[2], row['point'], row.point)
    print('======\n')
# Alice
# ------
# <class 'pandas.core.series.Series'>
# age      24
# state    NY
# point    64
# Name: Alice, dtype: object
# ------
# 24 24 24
# NY NY NY
# 64 64 64
# ======
# 
# Bob
# ------
# <class 'pandas.core.series.Series'>
# age      42
# state    CA
# point    92
# Name: Bob, dtype: object
# ------
# 42 42 42
# CA CA CA
# 92 92 92
# ======
#
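
The "dtype: object" in the output above hints at a side effect: because iterrows() packs each row into a single Series, the values of a row are coerced to one common dtype. The sketch below illustrates this with a small all-numeric DataFrame; the names df_num, int_col, and float_col are made up for the example.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column
df_num = pd.DataFrame({'int_col': [1, 2], 'float_col': [1.5, 2.5]})

# Take only the first (index, Series) pair produced by iterrows()
index, row = next(df_num.iterrows())

print(row.dtype)       # float64
print(row['int_col'])  # 1.0 (the integer was upcast to float)
```

If the original dtypes matter, itertuples() or iterating over the columns directly is usually the safer choice.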

DataFrame.itertuples()

The itertuples() method iterates over rows and returns a tuple of the index and the content. The first element of the tuple is the index.

pandas.DataFrame.itertuples — pandas 1.4.2 documentation

By default, it returns a namedtuple named Pandas. Because it is a namedtuple, you can access each element by attribute name (.) as well as by position ([]).

for row in df.itertuples():
    print(type(row))
    print(row)
    print('------')
    print(row[0], row.Index)
    print(row[1], row.age)
    print(row[2], row.state)
    print(row[3], row.point)
    print('======\n')
# <class 'pandas.core.frame.Pandas'>
# Pandas(Index='Alice', age=24, state='NY', point=64)
# ------
# Alice Alice
# 24 24
# NY NY
# 64 64
# ======
# 
# <class 'pandas.core.frame.Pandas'>
# Pandas(Index='Bob', age=42, state='CA', point=92)
# ------
# Bob Bob
# 42 42
# CA CA
# 92 92
# ======
#

A normal tuple is returned if the name parameter is set to None.

for row in df.itertuples(name=None):
    print(type(row))
    print(row)
    print(row[0], row[1], row[2], row[3])
    print('======\n')
# <class 'tuple'>
# ('Alice', 24, 'NY', 64)
# Alice 24 NY 64
# ======
# 
# <class 'tuple'>
# ('Bob', 42, 'CA', 92)
# Bob 42 CA 92
# ======
#
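
itertuples() also accepts an index parameter, and name can be any string. The following sketch drops the index from each tuple and names the namedtuple 'Row'; the name 'Row' and the list rows are arbitrary choices for this example.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

rows = []
# index=False leaves the index out, so position 0 is the first column
for row in df.itertuples(index=False, name='Row'):
    rows.append(row)
    print(row.age, row[0], row.state)
# 24 24 NY
# 42 42 CA
```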

Iterate only specific columns

If you only need the elements of a particular column, you can iterate over that column directly. Each column of a pandas.DataFrame is a pandas.Series.

print(df['age'])
# Alice    24
# Bob      42
# Name: age, dtype: int64

print(type(df['age']))
# <class 'pandas.core.series.Series'>

Iterating over a pandas.Series in a for loop yields its values in order, so selecting a column of a pandas.DataFrame and looping over it gives that column's values.

for age in df['age']:
    print(age)
# 24
# 42

You can also get the values of multiple columns with the built-in zip() function.

zip() in Python: Get elements from multiple lists

for age, point in zip(df['age'], df['point']):
    print(age, point)
# 24 64
# 42 92

Use the index attribute if you want to get the index. As in the example above, you can get it together with other columns by zip().

print(df.index)
# Index(['Alice', 'Bob'], dtype='object')

print(type(df.index))
# <class 'pandas.core.indexes.base.Index'>

for index in df.index:
    print(index)
# Alice
# Bob

for index, state in zip(df.index, df['state']):
    print(index, state)
# Alice NY
# Bob CA
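
The same zip() pattern can be used to put the results of a loop back into the DataFrame: collect one result per row in a list and assign the list as a new column. This is a minimal sketch; the column name 'point_per_age' and the calculation itself are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

results = []
for age, point in zip(df['age'], df['point']):
    # Any per-row computation can go here
    results.append(point / age)

# The list has one entry per row, in row order, so it can become a column
df['point_per_age'] = results
print(df)
```

For simple arithmetic like this, df['point'] / df['age'] gives the same column without a loop; the loop form is mainly useful when the per-row logic is hard to express with column operations.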

Update values in for loop

The pandas.Series returned by the iterrows() method is a copy, not a view, so changing it will not update the original data.

for index, row in df.iterrows():
    row['point'] += row['age']

print(df)
#        age state  point
# Alice   24    NY     64
# Bob     42    CA     92

You can update it by selecting elements of the original DataFrame with at[].

for index, row in df.iterrows():
    df.at[index, 'point'] += row['age']

print(df)
#        age state  point
# Alice   24    NY     88
# Bob     42    CA    134

See the following article on at[].

pandas: Get/Set element values with at, iat, loc, iloc

However, in many cases, it is not necessary to use a for loop to update an element or to add a new column based on an existing column. It is simpler and faster to write without a for loop.

Same process without a for loop:

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])
df['point'] += df['age']
print(df)
#        age state  point
# Alice   24    NY     88
# Bob     42    CA    134

You can add a new column.

df['new'] = df['point'] + df['age'] * 2
print(df)
#        age state  point  new
# Alice   24    NY     88  136
# Bob     42    CA    134  218

You can also apply NumPy functions to each element of a column.

df['age_sqrt'] = np.sqrt(df['age'])
print(df)
#        age state  point  new  age_sqrt
# Alice   24    NY     88  136  4.898979
# Bob     42    CA    134  218  6.480741

For strings, various methods are provided to process the columns directly. The following is an example of converting to lower case and selecting the first character.

pandas: Handle strings (replace, strip, case conversion, etc.)
pandas: Slice substrings from each element in columns

df['state_0'] = df['state'].str.lower().str[0]
print(df)
#        age state  point  new  age_sqrt state_0
# Alice   24    NY     88  136  4.898979       n
# Bob     42    CA    134  218  6.480741       c

Speed comparison

Compare the speed of iterrows(), itertuples(), and the method of specifying columns.

Use a pandas.DataFrame with 100 rows and 10 columns as an example. It is a simple example with only numeric elements, and both the index (row names) and the columns (column names) are the default sequential numbers.

numpy.arange(), linspace(): Generate ndarray with evenly spaced values
pandas: Get first/last n rows of DataFrame with head(), tail(), slice

import numpy as np
import pandas as pd

# pd.np was removed in pandas 2.0, so call NumPy directly
df = pd.DataFrame(np.arange(1000).reshape(100, 10))
print(df.shape)
# (100, 10)

print(df.head())
#     0   1   2   3   4   5   6   7   8   9
# 0   0   1   2   3   4   5   6   7   8   9
# 1  10  11  12  13  14  15  16  17  18  19
# 2  20  21  22  23  24  25  26  27  28  29
# 3  30  31  32  33  34  35  36  37  38  39
# 4  40  41  42  43  44  45  46  47  48  49

print(df.tail())
#       0    1    2    3    4    5    6    7    8    9
# 95  950  951  952  953  954  955  956  957  958  959
# 96  960  961  962  963  964  965  966  967  968  969
# 97  970  971  972  973  974  975  976  977  978  979
# 98  980  981  982  983  984  985  986  987  988  989
# 99  990  991  992  993  994  995  996  997  998  999

Note that the code below uses the Jupyter Notebook magic command %%timeit and does not work when run as a Python script.

Measure execution time with timeit in Python

%%timeit
for i, row in df.iterrows():
    pass
# 4.53 ms ± 325 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
for t in df.itertuples():
    pass
# 981 µs ± 43.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
for t in df.itertuples(name=None):
    pass
# 718 µs ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
for i in df[0]:
    pass
# 15.6 µs ± 446 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
for i, j, k in zip(df[0], df[4], df[9]):
    pass
# 46.1 µs ± 588 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
for t in zip(df[0], df[1], df[2], df[3], df[4], df[5], df[6], df[7], df[8], df[9]):
    pass
# 147 µs ± 3.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
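
Outside Jupyter Notebook, a similar comparison can be run as a plain script with the standard library's timeit module. The following is a rough sketch: the helper function names and the repeat count of 100 are arbitrary, and the absolute numbers will differ from those above depending on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1000).reshape(100, 10))


def loop_iterrows():
    for i, row in df.iterrows():
        pass


def loop_itertuples():
    for t in df.itertuples():
        pass


def loop_column():
    for v in df[0]:
        pass


timings = {}
for func in (loop_iterrows, loop_itertuples, loop_column):
    # Total time in seconds for 100 calls of each function
    timings[func.__name__] = timeit.timeit(func, number=100)
    print(f'{func.__name__}: {timings[func.__name__]:.4f} s for 100 runs')
```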

iterrows() is slow because it converts each row to pandas.Series.

itertuples() is faster than iterrows(), but the method of specifying columns is the fastest. In the example environment, it is faster than itertuples() even if all columns are specified.

As the number of rows increases, iterrows() becomes even slower. You should try using itertuples() or column specification in such a case.

Of course, as mentioned above, it is best not to use the for loop if it is not necessary.

How do I put the results of a loop into a DataFrame in Python?

Step 1 - Import the library: import pandas as pd ...

Step 2 - Set up the data: df = pd.DataFrame({'Table of 9': [9, 18, 27], 'Table of 10': [10, 20, 30]}) ...

Step 3 - Append to the DataFrame in a for loop ...

Step 4 - Print the results ...

Step 5 - Look at the dataset.
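
The steps above can be sketched as follows. This is not the original recipe, whose details are elided: instead of appending to a DataFrame inside the loop, the sketch collects one dict per iteration in a list and builds the DataFrame once at the end. The column 'n' and the range 1 to 3 are assumptions made for illustration.

```python
import pandas as pd

rows = []
for n in range(1, 4):
    # One dict per loop iteration; keys become column names
    rows.append({'n': n, 'Table of 9': 9 * n, 'Table of 10': 10 * n})

# Build the DataFrame once, after the loop has finished
result = pd.DataFrame(rows)
print(result)
```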

How do you add data to a DataFrame using a FOR loop?

How to append rows to a pandas DataFrame using a for loop in....

Combine the column names as keys with the column data as values using zip(keys, values).

Create a dictionary with the zipped iterator using dict(zipped).

Store the created dictionary in a list.
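
The zip() and dict() steps above can be sketched as follows. The names keys, data, and records and the sample values are hypothetical, and the final pd.DataFrame(records) call is an assumed last step that turns the list of dictionaries into a DataFrame.

```python
import pandas as pd

keys = ['name', 'age', 'state']                   # hypothetical column names
data = [('Alice', 24, 'NY'), ('Bob', 42, 'CA')]   # hypothetical row values

records = []
for values in data:
    zipped = zip(keys, values)      # pair each column name with its value
    records.append(dict(zipped))    # one dictionary per row

# Assumed final step: each dictionary becomes one row
df_new = pd.DataFrame(records)
print(df_new)
```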

How do I add results to a DataFrame?

Older tutorials use the DataFrame.append() method: you call append() on the first DataFrame and pass the second DataFrame, whose rows are then added to the end of the first. This method was deprecated in pandas 1.4 and removed in 2.0. On current versions, pass both DataFrames in a list to pd.concat() instead.
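
A minimal sketch of adding the rows of one DataFrame to the end of another with pd.concat(), which works on pandas versions where append() is not available. The variable names first, second, and combined and the sample values are illustrative.

```python
import pandas as pd

first = pd.DataFrame({'Table of 9': [9, 18, 27], 'Table of 10': [10, 20, 30]})
second = pd.DataFrame({'Table of 9': [36], 'Table of 10': [40]})

# ignore_index=True renumbers the rows 0..n-1 in the combined result
combined = pd.concat([first, second], ignore_index=True)
print(combined)
```

When the pieces are produced in a loop, collecting them in a list and calling pd.concat() once after the loop avoids re-copying the growing DataFrame on every iteration.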