How to put the results of a loop into a DataFrame in Python

Iterating over a pandas.DataFrame directly in a for loop returns its column names. To iterate over columns or rows, use the items(), iterrows(), and itertuples() methods. Older code may call iteritems() instead of items(); that alias was deprecated in pandas 1.5 and removed in pandas 2.0.

This article describes the following contents.

  • Iterate pandas.DataFrame in for loop as it is
  • Iterate columns of pandas.DataFrame
    • DataFrame.items()
  • Iterate rows of pandas.DataFrame
    • DataFrame.iterrows()
    • DataFrame.itertuples()
  • Iterate only specific columns
  • Update values in for loop
  • Speed comparison

For more information on the for statement in Python, see the following article.

  • for loop in Python (with range, enumerate, zip, etc.)

Use the following pandas.DataFrame as an example.

import pandas as pd
import numpy as np

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

print(df)
#        age state  point
# Alice   24    NY     64
# Bob     42    CA     92

Iterate pandas.DataFrame in for loop as it is

If you iterate pandas.DataFrame in a for loop as is, the column names are returned in order.

for column_name in df:
    print(column_name)
# age
# state
# point

Iterate columns of pandas.DataFrame

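DataFrame.items()

The items() method iterates over columns and returns (column name, Series) tuples, where the Series holds the content of the column. Before pandas 2.0 the same method was also available under the name iteritems(), which is what the documentation link below describes; iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0.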

  • pandas.DataFrame.iteritems — pandas 1.4.2 documentation

for column_name, item in df.items():
    print(column_name)
    print('------')
    print(type(item))
    print(item)
    print('------')
    # Positional access uses .iloc; item[0] on a labelled Series is deprecated in recent pandas.
    print(item.iloc[0], item['Alice'], item.Alice)
    print(item.iloc[1], item['Bob'], item.Bob)
    print('======\n')
# age
# ------
# <class 'pandas.core.series.Series'>
# Alice    24
# Bob      42
# Name: age, dtype: int64
# ------
# 24 24 24
# 42 42 42
# ======
#
# state
# ------
# <class 'pandas.core.series.Series'>
# Alice    NY
# Bob      CA
# Name: state, dtype: object
# ------
# NY NY NY
# CA CA CA
# ======
#
# point
# ------
# <class 'pandas.core.series.Series'>
# Alice    64
# Bob      92
# Name: point, dtype: int64
# ------
# 64 64 64
# 92 92 92
# ======
#

Iterate rows of pandas.DataFrame

The iterrows() and itertuples() methods iterate over rows. The itertuples() method is faster.

If you only need the values of particular columns, it is faster still to iterate over those columns directly, as explained in "Iterate only specific columns" below. The results of the speed comparison are shown at the end.

DataFrame.iterrows()

The iterrows() method iterates over rows and returns (index, Series), a tuple with the index and the content as pandas.Series.

  • pandas.DataFrame.iterrows — pandas 1.4.2 documentation

for index, row in df.iterrows():
    print(index)
    print('------')
    print(type(row))
    print(row)
    print('------')
    # Positional access uses .iloc; row[0] on a labelled Series is deprecated in recent pandas.
    print(row.iloc[0], row['age'], row.age)
    print(row.iloc[1], row['state'], row.state)
    print(row.iloc[2], row['point'], row.point)
    print('======\n')
# Alice
# ------
# <class 'pandas.core.series.Series'>
# age      24
# state    NY
# point    64
# Name: Alice, dtype: object
# ------
# 24 24 24
# NY NY NY
# 64 64 64
# ======
#
# Bob
# ------
# <class 'pandas.core.series.Series'>
# age      42
# state    CA
# point    92
# Name: Bob, dtype: object
# ------
# 42 42 42
# CA CA CA
# 92 92 92
# ======
#
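The dtype: object shown for each row is a side effect of packing a whole row into a single Series: values from different columns are converted to a common type. The following is a minimal sketch of that effect, using a made-up two-column frame (df_mixed and its column names are illustrative and not part of the example above).

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column.
df_mixed = pd.DataFrame({'qty': [1, 2], 'ratio': [0.5, 1.5]})
print(df_mixed.dtypes)

# iterrows() packs each row into one Series, so both values share one dtype.
index, row = next(df_mixed.iterrows())
print(row.dtype)         # float64: the integer 1 was upcast to 1.0
print(type(row['qty']))

# itertuples() does not build a Series per row, so the integer stays an integer.
first = next(df_mixed.itertuples())
print(type(first.qty))
```

If the original column types matter inside the loop, itertuples() or iterating over the columns directly is usually the safer choice.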

DataFrame.itertuples()

The itertuples() method iterates over rows and returns a tuple of the index and the content. The first element of the tuple is the index.

  • pandas.DataFrame.itertuples — pandas 1.4.2 documentation

By default, it returns a namedtuple named Pandas. Because it is a namedtuple, you can access each element by attribute name (with .) as well as by position (with []).

for row in df.itertuples():
    print(type(row))
    print(row)
    print('------')
    print(row[0], row.Index)
    print(row[1], row.age)
    print(row[2], row.state)
    print(row[3], row.point)
    print('======\n')
# <class 'pandas.core.frame.Pandas'>
# Pandas(Index='Alice', age=24, state='NY', point=64)
# ------
# Alice Alice
# 24 24
# NY NY
# 64 64
# ======
#
# <class 'pandas.core.frame.Pandas'>
# Pandas(Index='Bob', age=42, state='CA', point=92)
# ------
# Bob Bob
# 42 42
# CA CA
# 92 92
# ======
#

A normal tuple is returned if the name parameter is set to None.

for row in df.itertuples(name=None):
    print(type(row))
    print(row)
    print(row[0], row[1], row[2], row[3])
    print('======\n')
# <class 'tuple'>
# ('Alice', 24, 'NY', 64)
# Alice 24 NY 64
# ======
#
# <class 'tuple'>
# ('Bob', 42, 'CA', 92)
# Bob 42 CA 92
# ======
#

Iterate only specific columns

If you only need the elements of a particular column, you can iterate over that column directly.

Each column of a pandas.DataFrame is a pandas.Series.

print(df['age'])
# Alice    24
# Bob      42
# Name: age, dtype: int64

print(type(df['age']))
# <class 'pandas.core.series.Series'>

Iterating over a pandas.Series in a for loop yields its values in order, so selecting a column of the DataFrame and looping over it gives that column's values.

for age in df['age']:
    print(age)
# 24
# 42

You can also get the values of multiple columns with the built-in zip() function.

  • zip() in Python: Get elements from multiple lists

for age, point in zip(df['age'], df['point']):
    print(age, point)
# 24 64
# 42 92

Use the index attribute if you want to get the index. As in the example above, you can get it together with other columns by zip().

print(df.index)
# Index(['Alice', 'Bob'], dtype='object')

print(type(df.index))
# <class 'pandas.core.indexes.base.Index'>

for index in df.index:
    print(index)
# Alice
# Bob

for index, state in zip(df.index, df['state']):
    print(index, state)
# Alice NY
# Bob CA

Update values in for loop

The pandas.Series returned by the iterrows() method is a copy, not a view, so changing it will not update the original data.

for index, row in df.iterrows():
    row['point'] += row['age']

print(df)
#        age state  point
# Alice   24    NY     64
# Bob     42    CA     92

You can update it by selecting elements of the original DataFrame with at[].

for index, row in df.iterrows():
    df.at[index, 'point'] += row['age']

print(df)
#        age state  point
# Alice   24    NY     88
# Bob     42    CA    134

See the following article on at[].

  • pandas: Get/Set element values with at, iat, loc, iloc

However, in many cases, it is not necessary to use a for loop to update an element or to add a new column based on an existing column. It is simpler and faster to write without a for loop.

Same process without a for loop:

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

df['point'] += df['age']

print(df)
#        age state  point
# Alice   24    NY     88
# Bob     42    CA    134

You can add a new column.

df['new'] = df['point'] + df['age'] * 2

print(df)
#        age state  point  new
# Alice   24    NY     88  136
# Bob     42    CA    134  218

You can also apply NumPy functions to each element of a column.

df['age_sqrt'] = np.sqrt(df['age'])

print(df)
#        age state  point  new  age_sqrt
# Alice   24    NY     88  136  4.898979
# Bob     42    CA    134  218  6.480741

For strings, various methods are provided to process the columns directly. The following is an example of converting to lower case and selecting the first character.

  • pandas: Handle strings (replace, strip, case conversion, etc.)
  • pandas: Slice substrings from each element in columns

df['state_0'] = df['state'].str.lower().str[0]

print(df)
#        age state  point  new  age_sqrt state_0
# Alice   24    NY     88  136  4.898979       n
# Bob     42    CA    134  218  6.480741       c

Speed comparison

Compare the speed of iterrows(), itertuples(), and the method of specifying columns.

Use a pandas.DataFrame with 100 rows and 10 columns as an example. It contains only numeric elements, and both the row index and the column names are the default sequential integers.

  • numpy.arange(), linspace(): Generate ndarray with evenly spaced values
  • pandas: Get first/last n rows of DataFrame with head(), tail(), slice

import numpy as np
import pandas as pd

# pd.np was removed in pandas 2.0, so NumPy is imported directly.
df = pd.DataFrame(np.arange(1000).reshape(100, 10))

print(df.shape)
# (100, 10)

print(df.head())
#     0   1   2   3   4   5   6   7   8   9
# 0   0   1   2   3   4   5   6   7   8   9
# 1  10  11  12  13  14  15  16  17  18  19
# 2  20  21  22  23  24  25  26  27  28  29
# 3  30  31  32  33  34  35  36  37  38  39
# 4  40  41  42  43  44  45  46  47  48  49

print(df.tail())
#       0    1    2    3    4    5    6    7    8    9
# 95  950  951  952  953  954  955  956  957  958  959
# 96  960  961  962  963  964  965  966  967  968  969
# 97  970  971  972  973  974  975  976  977  978  979
# 98  980  981  982  983  984  985  986  987  988  989
# 99  990  991  992  993  994  995  996  997  998  999

Note that the code below uses the Jupyter Notebook magic command %%timeit and does not work when run as a Python script.

  • Measure execution time with timeit in Python

%%timeit
for i, row in df.iterrows():
    pass
# 4.53 ms ± 325 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
for t in df.itertuples():
    pass
# 981 µs ± 43.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
for t in df.itertuples(name=None):
    pass
# 718 µs ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
for i in df[0]:
    pass
# 15.6 µs ± 446 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
for i, j, k in zip(df[0], df[4], df[9]):
    pass
# 46.1 µs ± 588 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
for t in zip(df[0], df[1], df[2], df[3], df[4], df[5], df[6], df[7], df[8], df[9]):
    pass
# 147 µs ± 3.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
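Outside Jupyter Notebook, a similar comparison can be run as a plain script with the standard library's timeit module. This is only a sketch: the helper function names and the repeat counts are arbitrary choices, and the measured times depend on the machine and the pandas version, so no expected output is shown.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1000).reshape(100, 10))


def loop_iterrows():
    for _index, _row in df.iterrows():
        pass


def loop_itertuples():
    for _t in df.itertuples():
        pass


def loop_column():
    for _value in df[0]:
        pass


timings = {}
for func in (loop_iterrows, loop_itertuples, loop_column):
    # Best of 3 repeats, each calling the function 20 times; store seconds per call.
    best = min(timeit.repeat(func, number=20, repeat=3)) / 20
    timings[func.__name__] = best
    print(f'{func.__name__}: {best * 1e6:.1f} µs per loop')
```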

iterrows() is slow because it converts each row to pandas.Series.

itertuples() is faster than iterrows(), but the method of specifying columns is the fastest. In the example environment, it is faster than itertuples() even if all columns are specified.

As the number of rows increases, iterrows() becomes even slower. You should try using itertuples() or column specification in such a case.

Of course, as mentioned above, it is best not to use the for loop if it is not necessary.

How do I put the results of a loop into a DataFrame in Python?

Step 1 - Import the library: import pandas as pd ...
Step 2 - Set up the data: df = pd.DataFrame({'Table of 9': [9, 18, 27], 'Table of 10': [10, 20, 30]}) ...
Step 3 - Append to the DataFrame in a for loop ...
Step 4 - Print the results ...
Step 5 - Look at the resulting dataset.
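The steps above are truncated in the source, so the following is only a sketch of one common way to carry them out, not necessarily the original author's code. It assumes the goal is to extend the multiplication tables from Step 2; the loop range and the names rows and result are illustrative. The new rows are collected in a list and combined once with pd.concat(), which avoids copying the whole frame on every iteration.

```python
import pandas as pd

# Step 2: the starting data from the steps above.
df = pd.DataFrame({'Table of 9': [9, 18, 27], 'Table of 10': [10, 20, 30]})

# Step 3: collect one dict per new row inside the loop (range chosen for illustration).
rows = []
for i in range(4, 7):
    rows.append({'Table of 9': 9 * i, 'Table of 10': 10 * i})

# Combine the original frame and the new rows in a single operation.
result = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)

# Steps 4 and 5: inspect the result.
print(result)
```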

How do you add data to a DataFrame using a FOR loop?

How to append rows to a pandas DataFrame using a for loop in ...
Combine the column names as keys with the column data as values using zip(keys, values).
Create a dictionary with the zipped iterator using dict(zipped).
Store the created dictionary in a list.
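The three steps above might look like the sketch below. The column names in keys and the tuples in records are made-up inputs, and the final step of passing the list of dictionaries to pd.DataFrame() is an assumption about how the truncated instructions continue.

```python
import pandas as pd

keys = ['name', 'age', 'state']                      # hypothetical column names
records = [('Alice', 24, 'NY'), ('Bob', 42, 'CA')]   # hypothetical per-iteration values

rows = []
for values in records:
    zipped = zip(keys, values)   # pair each column name with its value
    row = dict(zipped)           # e.g. {'name': 'Alice', 'age': 24, 'state': 'NY'}
    rows.append(row)             # store the dictionary in a list

# Assumed final step: build the DataFrame once, after the loop.
df_rows = pd.DataFrame(rows)
print(df_rows)
```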

How do I add results to a DataFrame?

In older versions of pandas, you could call append() on the first DataFrame and pass the second DataFrame as the argument, which returned a new DataFrame with the second one's rows added to the end of the first. DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; pass the DataFrames to pandas.concat() instead.
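On pandas 2.0 and later, where DataFrame.append() is not available, the same result can be obtained with pd.concat(). A minimal sketch, using two made-up frames named first and second:

```python
import pandas as pd

first = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA']})
second = pd.DataFrame({'age': [18], 'state': ['TX']})

# concat() returns a new DataFrame and leaves both inputs unchanged.
# ignore_index=True renumbers the rows 0..n-1 instead of keeping 0, 1, 0.
combined = pd.concat([first, second], ignore_index=True)
print(combined)
```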

How do you add values to a column in pandas DataFrame using for loop?

  • Create a Pandas DataFrame from a Numpy array and specify the index column and column headers
  • Create a pandas column using for loop
  • Get column index from column name of a given Pandas DataFrame ...
  • Convert given Pandas series into a dataframe with its index as another column on the dataframe
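As a sketch of adding a column with a for loop, one option is to compute the values into a list and assign the list as a new column after the loop. The column name total and the calculation are chosen only for illustration; the vectorized line at the end gives the same values without a loop.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'point': [64, 92]}, index=['Alice', 'Bob'])

# Compute one value per row in the loop, then assign the whole list at once.
totals = []
for age, point in zip(df['age'], df['point']):
    totals.append(age + point)
df['total'] = totals

# Vectorized equivalent, usually simpler and faster.
df['total_vectorized'] = df['age'] + df['point']

print(df)
```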
