Pre*_*sto · 5 · Tags: python, r, pandas, dplyr, pandas-groupby
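I'm trying to recreate an R script, but I keep getting stuck on how to reproduce this pipeline in Python. I'm analyzing cumulative production at different factories and need to normalize their cumulative production hours so the factories can be compared.
The pipeline looks like this: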
library(dplyr)
library(tidyr)  # complete() comes from tidyr

Norm_hrs <- Cum_df %>%
  group_by(Name) %>%
  complete(Cum_Hrs = seq(0, max(Cum_Hrs), 730.5))
It needs to take data like this:
Name Cum_Hrs A B C
Factory 1 1 0 1.887861 3.775722
Factory 1 251 0 2104.335728 21932.57871
Factory 1 611 0 2324.586178 37498.99722
Factory 1 1208 0 4361.588197 65235.05541
Factory 2 48 0 1517.840244 6604.770432
Factory 2 163 0 3370.461172 17252.70972
Factory 2 822 0 13284.87786 71918.78308
Factory 2 1541 0 21476.93602 134569.0388
Factory 2 2285 0 32053.99192 225895.1477
Factory 2 3028 0 42299.41357 340798.6151
Factory 2 3699 0 50125.85599 462145.5438
Factory 2 4436 0 56715.74945 584474.9989
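To experiment with a pandas translation, the sample input above can be entered as a DataFrame. This is only a transcription of the table; the name df is the one the pandas code in the answer assumes.

```python
import pandas as pd

# Sample input from the question (column A is all zeros in the sample)
df = pd.DataFrame({
    "Name": ["Factory 1"] * 4 + ["Factory 2"] * 8,
    "Cum_Hrs": [1, 251, 611, 1208, 48, 163, 822, 1541, 2285, 3028, 3699, 4436],
    "A": [0] * 12,
    "B": [1.887861, 2104.335728, 2324.586178, 4361.588197,
          1517.840244, 3370.461172, 13284.87786, 21476.93602,
          32053.99192, 42299.41357, 50125.85599, 56715.74945],
    "C": [3.775722, 21932.57871, 37498.99722, 65235.05541,
          6604.770432, 17252.70972, 71918.78308, 134569.0388,
          225895.1477, 340798.6151, 462145.5438, 584474.9989],
})
print(df)
```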
The result should look like this:
Name Cum_Hrs A B C
Factory 1 1 0 1.887861 3.775722
Factory 1 251 0 2104.335728 21932.57871
Factory 1 611 0 2324.586178 37498.99722
Factory 1 730.5 NA NA NA
Factory 1 1208 0 4361.588197 65235.05541
Factory 2 48 0 1517.840244 6604.770432
Factory 2 163 0 3370.461172 17252.70972
Factory 2 730.5 NA NA NA
Factory 2 822 0 13284.87786 71918.78308
Factory 2 1461 NA NA NA
Factory 2 1541 0 21476.93602 134569.0388
Factory 2 2191.5 NA NA NA
Factory 2 2285 0 32053.99192 225895.1477
Factory 2 2922 NA NA NA
Factory 2 3028 0 42299.41357 340798.6151
This in turn lets me interpolate values for the NA rows at the standardized time steps in the DataFrame.
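The question does not say which interpolation is intended. A minimal sketch, assuming linear interpolation against Cum_Hrs (pandas `interpolate(method="index")`) rather than against row position, using a few Factory 2 rows copied from the question with NaN rows at 730.5 and 1461 (only column B, and the name g is illustrative):

```python
import numpy as np
import pandas as pd

# Factory 2 rows from the question plus two NaN grid rows (730.5 and 1461)
g = pd.DataFrame({
    "Cum_Hrs": [163.0, 730.5, 822.0, 1461.0, 1541.0],
    "B": [3370.461172, np.nan, 13284.87786, np.nan, 21476.93602],
})

# method="index" weights by the distance in Cum_Hrs, not by row count,
# so unevenly spaced readings are handled correctly
filled = g.set_index("Cum_Hrs").interpolate(method="index").reset_index()
print(filled)
```

In the full frame this would be applied per Name so values never interpolate across factories. Rows before a factory's first observation (such as Cum_Hrs = 0) have nothing on the left to interpolate from, so how to treat them is a separate decision.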
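Build one frame of incremental Cum_Hrs values (0, 730.5, 1461, ...) per unique Name, concatenate those frames with the original data, and sort: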
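import numpy as np
import pandas as pd

# df holds the question's data (columns Name, Cum_Hrs, A, B, C).
# For each factory, build Cum_Hrs = 0, 730.5, 1461, ... below that factory's maximum.
# Group by 'Name' (not ['Name']): in pandas 2.x a one-element list makes the group keys tuples.
seq_df = pd.concat([pd.DataFrame({'Name': name, 'Cum_Hrs': np.arange(0, g['Cum_Hrs'].max(), 730.5)})
                    for name, g in df.groupby('Name')])

# Stack the grid rows under the observed rows and sort them into place
final_df = (pd.concat([df, seq_df], sort=True)
              .sort_values(['Name', 'Cum_Hrs'])
              .reset_index(drop=True)
              .reindex(columns=df.columns)  # concat(sort=True) reorders columns; restore the original order
           )
print(final_df)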
# Name Cum_Hrs A B C
# 0 Factory 1 0.0 NaN NaN NaN
# 1 Factory 1 1.0 0.0 1.887861 3.775722
# 2 Factory 1 251.0 0.0 2104.335728 21932.578710
# 3 Factory 1 611.0 0.0 2324.586178 37498.997220
# 4 Factory 1 730.5 NaN NaN NaN
# 5 Factory 1 1208.0 0.0 4361.588197 65235.055410
# 6 Factory 2 0.0 NaN NaN NaN
# 7 Factory 2 48.0 0.0 1517.840244 6604.770432
# 8 Factory 2 163.0 0.0 3370.461172 17252.709720
# 9 Factory 2 730.5 NaN NaN NaN
# 10 Factory 2 822.0 0.0 13284.877860 71918.783080
# 11 Factory 2 1461.0 NaN NaN NaN
# 12 Factory 2 1541.0 0.0 21476.936020 134569.038800
# 13 Factory 2 2191.5 NaN NaN NaN
# 14 Factory 2 2285.0 0.0 32053.991920 225895.147700
# 15 Factory 2 2922.0 NaN NaN NaN
# 16 Factory 2 3028.0 0.0 42299.413570 340798.615100
# 17 Factory 2 3652.5 NaN NaN NaN
# 18 Factory 2 3699.0 0.0 50125.855990 462145.543800
# 19 Factory 2 4383.0 NaN NaN NaN
# 20 Factory 2 4436.0 0.0 56715.749450 584474.998900
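One difference from tidyr's complete(): complete() only adds combinations that are missing, whereas concatenating a grid always appends it, so a factory with a reading exactly on a grid point (for example Cum_Hrs = 0) would end up with an extra all-NaN row at that point. A sketch that stays closer to complete(), assuming Cum_Hrs is unique within each factory; the helper complete_grid and the three-row frame are hypothetical:

```python
import numpy as np
import pandas as pd

STEP = 730.5  # normalized time step from the question

def complete_grid(df: pd.DataFrame, step: float = STEP) -> pd.DataFrame:
    """Hypothetical helper: per Name, add the grid points 0, step, 2*step, ...
    up to max(Cum_Hrs) that are missing, keeping existing rows only once.
    Assumes Cum_Hrs values are unique within each Name."""
    parts = []
    for name, g in df.groupby("Name"):
        g = g.astype({"Cum_Hrs": float})
        n_steps = int(np.floor(g["Cum_Hrs"].max() / step))
        grid = np.arange(n_steps + 1) * step             # endpoint included, like R's seq()
        hrs = np.union1d(g["Cum_Hrs"].to_numpy(), grid)  # sorted, duplicates dropped
        part = g.set_index("Cum_Hrs").reindex(hrs)       # new grid points become NaN rows
        part["Name"] = name                              # fill Name on the new rows
        parts.append(part.rename_axis("Cum_Hrs").reset_index())
    return pd.concat(parts, ignore_index=True)[df.columns]

# Hypothetical data: one reading falls exactly on a grid point (0 h)
df = pd.DataFrame({
    "Name": ["Factory 1"] * 3,
    "Cum_Hrs": [0, 611, 1208],
    "B": [0.0, 2324.586178, 4361.588197],
})
out = complete_grid(df)
print(out)
```

Here the grid is 0 and 730.5; only 730.5 is new, so the result has four rows instead of the five the concatenation approach would produce.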
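A similar process can be written in base R. Translating base R (rather than tidyverse) code to pandas is often easier, because most base functions have a direct pandas counterpart:
seq ==> np.arange
by ==> pd.DataFrame.groupby
data.frame ==> pd.DataFrame
do.call + rbind ==> pd.concat
order ==> pd.DataFrame.sort_values
row.names = NULL ==> pd.DataFrame.reset_index(drop=True)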
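The seq ==> np.arange mapping has one catch: np.arange excludes its stop value, while R's seq(0, 1461, 730.5) returns 0, 730.5 and 1461. In the sample data neither factory's maximum is a multiple of 730.5, so both give the same grid there, but a maximum that lands exactly on the grid would lose its last step. A short illustration (the variable names are just for the example):

```python
import numpy as np

step = 730.5
stop = 2 * step  # a maximum that falls exactly on the grid (1461 h)

# np.arange stops *before* stop, unlike R's seq(0, 1461, 730.5)
half_open = np.arange(0, stop, step).tolist()      # [0.0, 730.5]

# To include the endpoint, count whole steps first, then scale
n_steps = int(np.floor(stop / step))
closed = (np.arange(n_steps + 1) * step).tolist()  # [0.0, 730.5, 1461.0]
print(half_open, closed)
```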