Iterating over the rows of a Pandas DataFrame as dictionaries

Question

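I need to iterate over a Pandas DataFrame so that each row can be passed as **kwargs to a function (actually a class constructor). This means each row should behave like a dictionary whose keys are the column names and whose values are that row's values in the corresponding columns.

This works, but it performs very badly: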

import pandas as pd


def myfunc(**kwargs):
    try:
        area = kwargs.get('length', 0)* kwargs.get('width', 0)
        return area
    except TypeError:
        return 'Error : length and width should be int or float'


df = pd.DataFrame({'length':[1,2,3], 'width':[10, 20, 30]})

for i in range(len(df)):
    print(myfunc(**df.iloc[i]))  # parenthesized form runs on Python 2.7 and 3

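Any suggestions on how to improve the performance? I tried iterating with df.iterrows(), but got the following error:

TypeError: myfunc() argument after ** must be a mapping, not tuple

I also tried df.itertuples() and df.values, but either I am missing something, or I would have to convert each tuple / np.array into a pd.Series or dict, which would also be slow. My constraint is that the script has to work with Python 2.7 and pandas 0.14.1.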

Answer 1

avl*_*oss 27

A clean option is this:

for row_dict in df.to_dict(orient="records"):
    print(row_dict['column_name'])

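This is the best answer. (2 upvotes)
According to the latest documentation, it is now `orient='records'`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html (2 upvotes)
This is also the best way to iterate over rows without the problems of **1)** coercing data types, as `.iterrows()` does, or **2)** renaming columns whose names are not valid Python identifiers, as `itertuples()` does. (2 upvotes)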
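
Applied to the function from the question, this approach could look like the minimal sketch below. It assumes a pandas version whose `to_dict` accepts the `orient` keyword; `myfunc` is a simplified stand-in for the question's function, and the `results` list is a name introduced here only for illustration.

```python
import pandas as pd


def myfunc(**kwargs):
    # Simplified stand-in for the function in the question.
    return kwargs.get('length', 0) * kwargs.get('width', 0)


df = pd.DataFrame({'length': [1, 2, 3], 'width': [10, 20, 30]})

# to_dict(orient="records") returns a list with one plain dict per row,
# keyed by column name, so each record can be unpacked with ** directly.
results = [myfunc(**row_dict) for row_dict in df.to_dict(orient="records")]
print(results)
```

Because the whole list of dictionaries is built up front, this trades some memory for simpler per-row code.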

Answer 2

ste*_*sia 16

You can try:

for k, row in df.iterrows():
    myfunc(**row)

Here k is the DataFrame index label and row is a dict-like pandas Series, so you can access any column with: row["my_column_name"]
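
The sketch below illustrates why unpacking the pair matters. Each item yielded by df.iterrows() is an (index, row) tuple, and passing that tuple itself after ** is what produces the "must be a mapping, not tuple" error from the question. The column values, the simplified `myfunc`, and the names `areas` and `first_row` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd


def myfunc(**kwargs):
    # Simplified stand-in for the function in the question.
    return kwargs.get('length', 0) * kwargs.get('width', 0)


# One integer column and one float column, to show the dtype caveat below.
df = pd.DataFrame({'length': [1, 2, 3], 'width': [1.5, 2.5, 3.5]})

areas = []
for item in df.iterrows():
    # item is a tuple, so myfunc(**item) would raise TypeError.
    index, row = item
    # row is a Series: it has keys() and label-based lookup, so ** accepts it.
    areas.append(myfunc(**row))

# Each row is a single Series, so mixed int/float columns share one dtype.
first_row = next(df.iterrows())[1]
print(type(first_row).__name__, first_row.dtype)
print(areas)
```

The integer values in `length` arrive in `myfunc` as floats, which is the dtype coercion that iterrows() is known for.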

Answer 3

jpp*_*jpp 1

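Defining a separate function for this is inefficient, because you are applying a row-by-row calculation. A more efficient approach is to compute a new Series with vectorized operations and then iterate over that Series: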

df = pd.DataFrame({'length':[1,2,3,'test'], 'width':[10, 20, 30,'hello']})

df2 = df.iloc[:].apply(pd.to_numeric, errors='coerce')

error_str = 'Error : length and width should be int or float'
print(*(df2['length'] * df2['width']).fillna(error_str), sep='\n')

10.0
40.0
90.0
Error : length and width should be int or float

Archived: 7 years ago
Views: 20,052
Last recorded: 4 years, 9 months ago