python iteration loops if-statement pandas
So I have this df:
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP
SUP1 P1 STR1 50 5 18
SUP1 P1 STR2 6 7 18
SUP1 P1 STR3 74 4 18
SUP2 P4 STR1 35 3 500
SUP2 P4 STR2 5 4 500
SUP2 P4 STR3 54 7 500
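The data is always grouped by SUPPLIER and PRODUCTID, and TO_SHIP holds a single value per group. For example, I have 18 items to ship for SUP1 and P1. Then I add new columns: Wk_Bal (BALANCE / AVG_SALES) and SEND_PKGS (the number of packages sent to each store).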
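A minimal sketch of this setup in pandas, using the values from the first table (the variable names are only for illustration). It checks that TO_SHIP really is constant within each (SUPPLIER, PRODUCTID) group and adds the two new columns, taking Wk_Bal as BALANCE / AVG_SALES as in the tables below:

```python
import pandas as pd

# Data copied from the table above
df = pd.DataFrame({
    "SUPPLIER": ["SUP1", "SUP1", "SUP1", "SUP2", "SUP2", "SUP2"],
    "PRODUCTID": ["P1", "P1", "P1", "P4", "P4", "P4"],
    "STOREID": ["STR1", "STR2", "STR3", "STR1", "STR2", "STR3"],
    "BALANCE": [50, 6, 74, 35, 5, 54],
    "AVG_SALES": [5, 7, 4, 3, 4, 7],
    "TO_SHIP": [18, 18, 18, 500, 500, 500],
})
keys = ["SUPPLIER", "PRODUCTID"]

# TO_SHIP must hold a single value per (SUPPLIER, PRODUCTID) group
to_ship_is_constant = bool(df.groupby(keys)["TO_SHIP"].nunique().eq(1).all())

# New columns: balance divided by average sales, and a package counter
df["Wk_Bal"] = df["BALANCE"] / df["AVG_SALES"]
df["SEND_PKGS"] = 0
print(to_ship_is_constant)  # True
```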
Walking through the run:
First output (compute Wk_Bal, then send 1 package to the store with the lowest Wk_Bal):
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP Wk_Bal SEND_PKGS
SUP1 P1 STR1 50 5 18 10 0
SUP1 P1 STR2 6 4 18 1.5 1
SUP1 P1 STR3 8 4 18 2 0
SUP2 P4 STR1 35 3 500 11.67 0
SUP2 P4 STR2 5 4 500 1.25 1
SUP2 P4 STR3 54 7 500 7.71 0
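The first round can be reproduced with `groupby(...).idxmin()`, which returns the row label of the smallest Wk_Bal in each group. This sketch assumes the starting values printed in the table above (BALANCE 8 for STR3 and AVG_SALES 4 for STR2 of SUP1, which differ from the first table):

```python
import pandas as pd

# Starting values as printed in the first output table
df = pd.DataFrame({
    "SUPPLIER": ["SUP1"] * 3 + ["SUP2"] * 3,
    "PRODUCTID": ["P1"] * 3 + ["P4"] * 3,
    "STOREID": ["STR1", "STR2", "STR3"] * 2,
    "BALANCE": [50, 6, 8, 35, 5, 54],
    "AVG_SALES": [5, 4, 4, 3, 4, 7],
    "TO_SHIP": [18] * 3 + [500] * 3,
})
df["SEND_PKGS"] = 0

# Round 1: compute Wk_Bal, then give one package to the lowest row of each group
df["Wk_Bal"] = df["BALANCE"] / df["AVG_SALES"]
lowest = df.groupby(["SUPPLIER", "PRODUCTID"])["Wk_Bal"].idxmin()
df.loc[lowest.to_numpy(), "SEND_PKGS"] += 1
print(df["SEND_PKGS"].tolist())  # [0, 1, 0, 0, 1, 0]
```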
Second output (compute the updated Wk_Bal, then send 1 package to the lowest):
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP Wk_Bal SEND_PKGS
SUP1 P1 STR1 50 5 17 10 0
SUP1 P1 STR2 8 4 17 1.75 2
SUP1 P1 STR3 8 4 17 2 0
SUP2 P4 STR1 35 3 499 11.67 0
SUP2 P4 STR2 7 4 499 1.5 2
SUP2 P4 STR3 54 7 499 7.71 0
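Repeating that step gives the second round. The sketch below wraps one round in a hypothetical helper, `ship_one_round`, which counts packages already sent as part of the balance and decrements TO_SHIP right after each package (so its TO_SHIP values run one step ahead of the table above):

```python
import pandas as pd

KEYS = ["SUPPLIER", "PRODUCTID"]


def ship_one_round(df: pd.DataFrame) -> None:
    """Hypothetical helper: run one allocation round on df in place."""
    # Packages already sent count as stock when ranking the stores
    df["Wk_Bal"] = (df["BALANCE"] + df["SEND_PKGS"]) / df["AVG_SALES"]
    active = df[df["TO_SHIP"] > 0]
    lowest = active.groupby(KEYS)["Wk_Bal"].idxmin()
    df.loc[lowest.to_numpy(), "SEND_PKGS"] += 1
    # Each group that was still shipping has one package fewer to send
    df.loc[df["TO_SHIP"] > 0, "TO_SHIP"] -= 1


# Starting values as printed in the first output table
df = pd.DataFrame({
    "SUPPLIER": ["SUP1"] * 3 + ["SUP2"] * 3,
    "PRODUCTID": ["P1"] * 3 + ["P4"] * 3,
    "STOREID": ["STR1", "STR2", "STR3"] * 2,
    "BALANCE": [50, 6, 8, 35, 5, 54],
    "AVG_SALES": [5, 4, 4, 3, 4, 7],
    "TO_SHIP": [18] * 3 + [500] * 3,
})
df["SEND_PKGS"] = 0

for _ in range(2):
    ship_one_round(df)
print(df["SEND_PKGS"].tolist())  # [0, 2, 0, 0, 2, 0]
```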
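And so on: while there is still something left in TO_SHIP, recompute the ranking and hand out one package. The reason for this process is that I want to make sure the store with the lowest Wk_Bal gets packages first (there are many other reasons as well).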
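The process described above, for a single group, can also be sketched without pandas. This version swaps in a min-heap (Python's `heapq`) so that each step only re-ranks the store that just received a package instead of recomputing every Wk_Bal; `allocate_heap` is a hypothetical name, and ties go to the earlier store:

```python
import heapq


def allocate_heap(balances, avg_sales, to_ship):
    """Hand out to_ship packages one by one to the store with the lowest Wk_Bal."""
    sent = [0] * len(balances)
    # Heap entries are (current Wk_Bal, store position); ties go to the lower position
    heap = [(b / a, i) for i, (b, a) in enumerate(zip(balances, avg_sales))]
    heapq.heapify(heap)
    for _ in range(to_ship):
        _, i = heapq.heappop(heap)
        sent[i] += 1
        # Re-insert the store with its balance after this package
        heapq.heappush(heap, ((balances[i] + sent[i]) / avg_sales[i], i))
    return sent


# SUP1 / P1 with the values from the first output table
print(allocate_heap([50, 6, 8], [5, 4, 4], 18))  # [0, 10, 8]
```

With these values the 18 packages are split between STR2 and STR3 once their Wk_Bal values catch up with each other; STR1 (Wk_Bal 10) never becomes the lowest.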
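I originally built this in SQL, but moved to Python because of the complexity. Unfortunately, I am not very good at writing loops with multiple conditions in Python, especially on a pandas DataFrame. So far I have tried this (without success):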
df['Wk_Bal'] = 0
df['TO_SHIP'] = 0
for i in df.groupby(["SUPPLIER", "PRODUCTID"])['TO_SHIP']:
    if i > 0:
        df['Wk_Bal'] = df['BALANCE'] / df['AVG_SALES']
        df['TO_SHIP'] = df.groupby(["SUPPLIER", "PRODUCTID"])['TO_SHIP']-1
        df['SEND_PKGS'] = + 1
        df['BALANCE'] = + 1
    else:
        df['TO_SHIP'] = 0
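One reason the loop above cannot work: iterating over a grouped column yields `(group key, sub-Series)` pairs rather than numbers, so `if i > 0` compares a tuple with an integer, which Python 3 rejects with a `TypeError`. A small demonstration on a cut-down version of the data:

```python
import pandas as pd

# Cut-down version of the question's data
df = pd.DataFrame({
    "SUPPLIER": ["SUP1", "SUP1", "SUP2"],
    "PRODUCTID": ["P1", "P1", "P4"],
    "TO_SHIP": [18, 18, 500],
})

# Iterating a grouped column yields (group key, sub-Series) pairs
pairs = list(df.groupby(["SUPPLIER", "PRODUCTID"])["TO_SHIP"])
for key, group in pairs:
    print(key, group.tolist())

# So "i > 0" in the loop compares a tuple with an int
try:
    pairs[0] > 0
    comparison_failed = False
except TypeError:
    comparison_failed = True
print(comparison_failed)  # True
```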
How can I do this better?
Hopefully I've understood all your requirements. Here is your original data:
import pandas as pd

df = pd.DataFrame({'SUPPLIER': ['SUP1', 'SUP1', 'SUP1', 'SUP2', 'SUP2', 'SUP2'],
                   'PRODUCTID': ['P1', 'P1', 'P1', 'P4', 'P4', 'P4'],
                   'STOREID': ['STR1', 'STR2', 'STR3', 'STR1', 'STR2', 'STR3'],
                   'BALANCE': [50, 6, 74, 35, 5, 54],
                   'AVG_SALES': [5, 4, 4, 3, 4, 7],
                   'TO_SHIP': [18, 18, 18, 500, 500, 500]})
Here is my approach:
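df['SEND_PKGS'] = 0
df['Wk_bal'] = df['BALANCE'] / df['AVG_SALES']
while (df['TO_SHIP'] != 0).any():
    # Row label of the lowest Wk_bal in every group that still has packages to ship
    lowest_idx = df[df['TO_SHIP'] > 0].groupby(["SUPPLIER", "PRODUCTID"])['Wk_bal'].idxmin()
    # Send one package to each of those stores
    df.loc[lowest_idx, 'SEND_PKGS'] += 1
    # Recompute Wk_bal, counting the packages sent so far as stock
    df['Wk_bal'] = (df['BALANCE'] + df['SEND_PKGS']) / df['AVG_SALES']
    # One package has left each group that was still shipping
    df.loc[df['TO_SHIP'] > 0, 'TO_SHIP'] -= 1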
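I keep updating df until the TO_SHIP column is all zeros. In each pass I increment SEND_PKGS for the row with the lowest Wk_bal in each group, then recompute Wk_bal and decrement every non-zero TO_SHIP value.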
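The same loop can be packaged as a function that works on a copy of the data, which makes it easy to check against the output shown next. `allocate_packages` is a hypothetical name; the loop condition uses `> 0` instead of `!= 0` so that a negative TO_SHIP cannot make it spin forever:

```python
import pandas as pd


def allocate_packages(df: pd.DataFrame, keys=("SUPPLIER", "PRODUCTID")) -> pd.DataFrame:
    """Hypothetical wrapper around the greedy loop; works on a copy of df."""
    out = df.copy()
    keys = list(keys)
    out["SEND_PKGS"] = 0
    out["Wk_bal"] = out["BALANCE"] / out["AVG_SALES"]
    while (out["TO_SHIP"] > 0).any():
        active = out[out["TO_SHIP"] > 0]
        # Row label of the lowest Wk_bal in each group that still ships
        lowest_idx = active.groupby(keys)["Wk_bal"].idxmin()
        out.loc[lowest_idx.to_numpy(), "SEND_PKGS"] += 1
        out["Wk_bal"] = (out["BALANCE"] + out["SEND_PKGS"]) / out["AVG_SALES"]
        out.loc[out["TO_SHIP"] > 0, "TO_SHIP"] -= 1
    return out


df = pd.DataFrame({
    "SUPPLIER": ["SUP1", "SUP1", "SUP1", "SUP2", "SUP2", "SUP2"],
    "PRODUCTID": ["P1", "P1", "P1", "P4", "P4", "P4"],
    "STOREID": ["STR1", "STR2", "STR3", "STR1", "STR2", "STR3"],
    "BALANCE": [50, 6, 74, 35, 5, 54],
    "AVG_SALES": [5, 4, 4, 3, 4, 7],
    "TO_SHIP": [18, 18, 18, 500, 500, 500],
})
result = allocate_packages(df)
print(result["SEND_PKGS"].tolist())  # [0, 18, 0, 92, 165, 243]
```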
I end up with:
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP SEND_PKGS Wk_bal
0 SUP1 P1 STR1 50 5 0 0 10.000000
1 SUP1 P1 STR2 6 4 0 18 6.000000
2 SUP1 P1 STR3 74 4 0 0 18.500000
3 SUP2 P4 STR1 35 3 0 92 42.333333
4 SUP2 P4 STR2 5 4 0 165 42.500000
5 SUP2 P4 STR3 54 7 0 243 42.428571
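A quick sanity check on this result, with the values copied from the table: every group should have shipped exactly its original TO_SHIP, and Wk_bal should equal (BALANCE + SEND_PKGS) / AVG_SALES:

```python
import pandas as pd

# Final table from the answer (values copied from the output above)
result = pd.DataFrame({
    "SUPPLIER": ["SUP1"] * 3 + ["SUP2"] * 3,
    "PRODUCTID": ["P1"] * 3 + ["P4"] * 3,
    "BALANCE": [50, 6, 74, 35, 5, 54],
    "AVG_SALES": [5, 4, 4, 3, 4, 7],
    "SEND_PKGS": [0, 18, 0, 92, 165, 243],
})
original_to_ship = {("SUP1", "P1"): 18, ("SUP2", "P4"): 500}

# Every package of each group must have been allocated
shipped = result.groupby(["SUPPLIER", "PRODUCTID"])["SEND_PKGS"].sum().to_dict()
print(shipped == original_to_ship)  # True

# Wk_bal is recomputed from the balance after shipping
result["Wk_bal"] = (result["BALANCE"] + result["SEND_PKGS"]) / result["AVG_SALES"]
print(result["Wk_bal"].round(6).tolist())
```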
Edit: if several rows share the minimum Wk_bal, we can break the tie by picking the one with the lowest AVG_SALES:
def find_min(x):
    # Rows of this group that share the minimum Wk_bal
    min_df = x.loc[x["Wk_bal"] == x["Wk_bal"].min()]
    if len(min_df) == 1:
        return x["Wk_bal"].idxmin()
    # Several rows tie: pick the one with the lowest AVG_SALES
    return min_df["AVG_SALES"].idxmin()
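The same tie-break can be expressed by sorting a group on Wk_bal and then AVG_SALES and taking the first row label. `find_min_sorted` is a hypothetical alternative to find_min, shown here on a made-up group with a tie; apart from rows tied on both columns it should pick the same row:

```python
import pandas as pd


def find_min_sorted(x: pd.DataFrame):
    """Row label with the lowest Wk_bal, ties broken by the lowest AVG_SALES."""
    return x.sort_values(["Wk_bal", "AVG_SALES"]).index[0]


# Made-up group: rows 10 and 11 tie on Wk_bal (8/4 == 6/3 == 2.0)
g = pd.DataFrame({"Wk_bal": [2.0, 2.0, 5.0], "AVG_SALES": [4, 3, 2]}, index=[10, 11, 12])
print(find_min_sorted(g))  # 11
```

Like find_min, it can be passed to `groupby(...)[['Wk_bal', 'AVG_SALES']].apply(...)`.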
Then, more or less the same as before:
df['SEND_PKGS'] = 0
df['Wk_bal'] = df['BALANCE'] / df['AVG_SALES']
while (df['TO_SHIP'] != 0).any():
    # find_min replaces idxmin so that ties on Wk_bal are broken by AVG_SALES
    lowest_idx = df[df['TO_SHIP'] > 0].groupby(["SUPPLIER", "PRODUCTID"])[['Wk_bal', 'AVG_SALES']].apply(find_min)
    df.loc[lowest_idx, 'SEND_PKGS'] += 1
    df['Wk_bal'] = (df['BALANCE'] + df['SEND_PKGS']) / df['AVG_SALES']
    df.loc[df['TO_SHIP'] > 0, 'TO_SHIP'] -= 1
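A self-contained version of this loop, as a hedged sketch: instead of `groupby(...).apply(find_min)` it sorts the active rows by Wk_bal and AVG_SALES and keeps the first row of each group with `groupby(...).head(1)`, which applies the same tie-break without calling a Python function per group. `allocate_with_tiebreak` and the tiny tie example are hypothetical:

```python
import pandas as pd


def allocate_with_tiebreak(df: pd.DataFrame, keys=("SUPPLIER", "PRODUCTID")) -> pd.DataFrame:
    """Hypothetical sketch: greedy allocation, ties on Wk_bal go to the lowest AVG_SALES."""
    out = df.copy()
    keys = list(keys)
    out["SEND_PKGS"] = 0
    out["Wk_bal"] = out["BALANCE"] / out["AVG_SALES"]
    while (out["TO_SHIP"] > 0).any():
        active = out[out["TO_SHIP"] > 0]
        # Lowest Wk_bal first, then lowest AVG_SALES; keep the first row of each group
        lowest_idx = (active.sort_values(["Wk_bal", "AVG_SALES"])
                      .groupby(keys).head(1).index)
        out.loc[lowest_idx, "SEND_PKGS"] += 1
        out["Wk_bal"] = (out["BALANCE"] + out["SEND_PKGS"]) / out["AVG_SALES"]
        out.loc[out["TO_SHIP"] > 0, "TO_SHIP"] -= 1
    return out


# Made-up group with a tie: both stores start at Wk_bal 2.0, one package to send
tie_df = pd.DataFrame({
    "SUPPLIER": ["SUP9", "SUP9"],
    "PRODUCTID": ["P9", "P9"],
    "BALANCE": [8, 6],
    "AVG_SALES": [4, 3],
    "TO_SHIP": [1, 1],
})
print(allocate_with_tiebreak(tie_df)["SEND_PKGS"].tolist())  # [0, 1]
```

On the original data this gives the same SEND_PKGS as before, since the tie-break only changes the order in which tied stores are served, not the final totals there.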