FYI, performance/speed doesn't matter for this question.
I have an existing pandas DataFrame named cost_table...
+----------+---------+------+-------------------------+-----------------+
| material | percent | qty | price_control_indicator | acct_assign_cat |
+----------+---------+------+-------------------------+-----------------+
| abc111 | 1.00 | 50 | v | # |
| abc222 | 0.25 | 2000 | s | # |
| xyz789 | 0.45 | 0 | v | m |
| def456 | 0.9 | 0 | v | # |
| 123xyz | 0.2 | 0 | v | m |
| lmo888 | 0.6 | 0 | v | m |
+----------+---------+------+-------------------------+-----------------+
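I need to add a cost_source field based on the values in several other fields.
Most of the answers that come up on Google involve list comprehensions or ternary operators, but they only cover logic based on the value in a single column. For example,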
cost_table['cost_source'] = ['map' if qty > 0 else None for qty in cost_table['qty']]
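This works based on the value in one column, but I don't know how to extend it to include logic across multiple columns (or whether that is even possible). It also doesn't seem like a very readable or maintainable solution.
I tried a for ... in loop with if / elif statements, but the values in cost_table['cost_source'] stay unchanged, with None for every row. However, if I print each row inside the loop, row['cost_source'] has the desired value.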
import pandas as pd

d = {
    'material': ['abc111', 'abc222', 'xyz789', 'def456', '123xyz', 'lmo888'],
    'percent': [1, .25, .45, .9, .2, .6],
    'qty': [50, 2000, 0, 0, 0, 0],
    'price_control_indicator': ['v', 's', 'v', 'v', 'v', 'v'],
    'acct_assign_cat': ['#', '#', 'm', '#', 'm', 'm']
}
cost_table = pd.DataFrame(data=d)
cost_table['cost_source'] = None
for index, row in cost_table.iterrows():
    if (row['qty'] > 0) or (row['price_control_indicator'] == "s") or (row['acct_assign_cat'] == "#"):
        row['cost_source'] = "map"
    elif (row['percent'] >= 40) and (row['acct_assign_cat'] == "m"):
        row['cost_source'] = "vendor"
    else:
        row['cost_source'] = None
    print(row['cost_source'])  # outputs map, vendor, or None as expected
print(cost_table)
Which outputs...
+----------+---------+------+-------------------------+-----------------+-------------+
| material | percent | qty | price_control_indicator | acct_assign_cat | cost_source |
+----------+---------+------+-------------------------+-----------------+-------------+
| abc111 | 1.00 | 50 | v | # | None |
| abc222 | 0.25 | 2000 | s | # | None |
| xyz789 | 0.45 | 0 | v | m | None |
| def456 | 0.9 | 0 | v | # | None |
| 123xyz | 0.2 | 0 | v | m | None |
| lmo888 | 0.6 | 0 | v | m | None |
+----------+---------+------+-------------------------+-----------------+-------------+
This is the result I want...
+----------+---------+------+-------------------------+-----------------+-------------+
| material | percent | qty | price_control_indicator | acct_assign_cat | cost_source |
+----------+---------+------+-------------------------+-----------------+-------------+
| abc111 | 1.00 | 50 | v | # | map |
| abc222 | 0.25 | 2000 | s | # | map |
| xyz789 | 0.45 | 0 | v | m | vendor |
| def456 | 0.9 | 0 | v | # | map |
| 123xyz | 0.2 | 0 | v | m | None |
| lmo888 | 0.6 | 0 | v | m | vendor |
+----------+---------+------+-------------------------+-----------------+-------------+
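A note on why the loop above leaves the table unchanged: the row yielded by iterrows() is generally a copy of the data, so assigning to row['cost_source'] does not write back to cost_table. The following is only a minimal sketch of one way to make the loop persist its results, by writing through cost_table.at[index, ...] instead. It assumes percent is expressed on a 0-100 scale, so that the >= 40 threshold is consistent with the desired result shown above.

```python
import pandas as pd

# Assumption: percent is on a 0-100 scale so that ">= 40" is meaningful.
cost_table = pd.DataFrame({
    'material': ['abc111', 'abc222', 'xyz789', 'def456', '123xyz', 'lmo888'],
    'percent': [100, 25, 45, 90, 20, 60],
    'qty': [50, 2000, 0, 0, 0, 0],
    'price_control_indicator': ['v', 's', 'v', 'v', 'v', 'v'],
    'acct_assign_cat': ['#', '#', 'm', '#', 'm', 'm'],
})
cost_table['cost_source'] = None

for index, row in cost_table.iterrows():
    # Read from the (copied) row, but write to the DataFrame itself.
    if row['qty'] > 0 or row['price_control_indicator'] == 's' or row['acct_assign_cat'] == '#':
        cost_table.at[index, 'cost_source'] = 'map'
    elif row['percent'] >= 40 and row['acct_assign_cat'] == 'm':
        cost_table.at[index, 'cost_source'] = 'vendor'
    # Otherwise the value stays None.

print(cost_table)
```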
As @bazinga mentioned, use df.apply(lambda x: fun(x)), but with the argument axis=1, so that the lambda function is applied row by row (the default is column by column).
import pandas as pd

d = {
    'material': ['abc111', 'abc222', 'xyz789', 'def456', '123xyz', 'lmo888'],
    'percent': [100, 25, 45, 90, 20, 60],
    'qty': [50, 2000, 0, 0, 0, 0],
    'price_control_indicator': ['v', 's', 'v', 'v', 'v', 'v'],
    'acct_assign_cat': ['#', '#', 'm', '#', 'm', 'm']
}
cost_table = pd.DataFrame(data=d)

def process_row(row):
    # Return the cost source for a single row of cost_table.
    if (row['qty'] > 0) or (row['price_control_indicator'] == "s") or (row['acct_assign_cat'] == "#"):
        return "map"
    elif (row['percent'] >= 40) and (row['acct_assign_cat'] == "m"):
        return "vendor"
    else:
        return None

# axis=1 passes each row (rather than each column) to the function.
cost_table['cost_source'] = cost_table.apply(lambda row: process_row(row), axis=1)
print(cost_table)
(I also corrected an inconsistency: the percent values in the data should presumably be multiplied by 100.)
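As a possible alternative to apply, the same rules can be expressed with boolean masks over whole columns and assigned with .loc. This is only a sketch under the same assumption as above (percent on a 0-100 scale); the names is_map and is_vendor are illustrative and do not come from the original question.

```python
import pandas as pd

cost_table = pd.DataFrame({
    'material': ['abc111', 'abc222', 'xyz789', 'def456', '123xyz', 'lmo888'],
    'percent': [100, 25, 45, 90, 20, 60],
    'qty': [50, 2000, 0, 0, 0, 0],
    'price_control_indicator': ['v', 's', 'v', 'v', 'v', 'v'],
    'acct_assign_cat': ['#', '#', 'm', '#', 'm', 'm'],
})

# Rows matching the first branch of the if/elif chain.
is_map = (
    (cost_table['qty'] > 0)
    | (cost_table['price_control_indicator'] == 's')
    | (cost_table['acct_assign_cat'] == '#')
)
# Rows matching the elif branch; "~is_map" mirrors the elif precedence.
is_vendor = ~is_map & (cost_table['percent'] >= 40) & (cost_table['acct_assign_cat'] == 'm')

cost_table['cost_source'] = None                 # default for rows matching neither rule
cost_table.loc[is_map, 'cost_source'] = 'map'
cost_table.loc[is_vendor, 'cost_source'] = 'vendor'

print(cost_table)
```

Each condition stays on its own line, so adding another rule means adding one more mask and one more .loc assignment.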