Divide grouped data by a selected column value?

Jac*_*ack 5 python pandas

df

        ts_code  type     close
0     861001.TI     1   648.399
1     861001.TI    20   588.574
2     861001.TI    30   621.926
3     861001.TI    60   760.623
4     861001.TI    90   682.313
...         ...   ...       ...
8328  885933.TI     5  1083.141
8329  885934.TI     1   951.493
8330  885934.TI     5  1011.346
8331  885935.TI     1  1086.558
8332  885935.TI     5  1028.449

Goal

ts_code     l5d_close  l20d_close  ……  l90d_close
861001.TI   NaN        1.10            0.95
……          ……         ……              ……

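I want to group by ts_code and compute the close of type 1 divided by the close of type N (N: 5, 20, 30, ……). Take 861001.TI as an example: l5d_close is NaN because there is no value for type 5, l20d_close equals 648.399/588.574 = 1.10, and l90d_close equals 648.399/682.313 = 0.95. The results are rounded.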


Attempt

df.groupby('ts_code')\
  .pipe(lambda x: x[x.type==1].close/x[x.type==10].close)

Got: KeyError: 'Column not found: False'

The type values are: 1, 5, 20, 30, 60, 90, 180, 200


Note: every ts_code has a value in the type column.


tdy*_*tdy 5

Use sort_values to make sure the type == 1 row is the first row of each group, then extract its close with groupby.transform('first'):

df = df.sort_values(['ts_code', 'type'])
close1 = df.groupby('ts_code')['close'].transform('first')
df['close'] = close1 / df['close']

#         ts_code  type     close
# 0     861001.TI     1  1.000000
# 1     861001.TI    20  1.101644
# 2     861001.TI    30  1.042566
# 3     861001.TI    60  0.852458
# ...         ...   ...       ...
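The step above relies on the sort order to put type == 1 first in each group. As a hedged alternative (not part of the original answer), the baseline can be looked up explicitly, which does not depend on row order. This is a minimal sketch, assuming each ts_code has exactly one type == 1 row; the sample rows are taken from the question and deliberately left unsorted, and the names base and ratio are illustrative.

```python
import pandas as pd

# Sample rows copied from the question, deliberately left unsorted.
df = pd.DataFrame({
    "ts_code": ["861001.TI", "861001.TI", "861001.TI", "885934.TI", "885934.TI"],
    "type": [20, 1, 90, 5, 1],
    "close": [588.574, 648.399, 682.313, 1011.346, 951.493],
})

# Close of the type == 1 row, indexed by ts_code.
# Assumption: exactly one type == 1 row per ts_code, so the index is unique.
base = df.loc[df["type"] == 1].set_index("ts_code")["close"]

# Look up each row's baseline by its ts_code, then divide by the row's own close.
ratio = df["ts_code"].map(base) / df["close"]
```

If a ts_code has no type == 1 row, the lookup yields NaN for that group instead of silently using some other row as the baseline.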

Then pivot the type values into the column headers:

out = (df.pivot(index='ts_code', columns='type', values='close')
         .drop(columns=1)
         .add_prefix('l')
         .add_suffix('d_close'))

# type       l5d_close  l20d_close  l30d_close  l60d_close  l90d_close
# ts_code
# 861001.TI        NaN    1.101644    1.042566    0.852458    0.950296
# ...              ...         ...         ...         ...         ...
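pivot only creates columns for type values that actually occur in the data. If a fixed set of columns is wanted regardless, one option is to reindex the columns before renaming them. The sketch below is an assumption-laden illustration rather than part of the original answer: the long-form ratios frame reuses three ratio values shown above, and expected_types is the list of type values from the question minus the baseline type 1.

```python
import pandas as pd

# Long-form ratios for one ts_code, using values shown in the output above.
ratios = pd.DataFrame({
    "ts_code": ["861001.TI"] * 3,
    "type": [20, 30, 60],
    "ratio": [1.101644, 1.042566, 0.852458],
})

# Type values listed in the question, without the baseline type 1.
expected_types = [5, 20, 30, 60, 90, 180, 200]

wide = (ratios.pivot(index="ts_code", columns="type", values="ratio")
              .reindex(columns=expected_types)  # absent types become all-NaN columns
              .add_prefix("l")
              .add_suffix("d_close"))
```

Because the baseline type 1 is not in expected_types, the reindex also drops it, so a separate drop(columns=1) is not needed in this variant.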

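To chain everything together, assign a ratio column before the pivot (this assumes df is still sorted by ts_code and type as above, with the original close values):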

(df.assign(ratio=df.groupby('ts_code').close.transform('first').div(df.close))
   .pivot(index='ts_code', columns='type', values='ratio')
   .drop(columns=1)
   .add_prefix('l')
   .add_suffix('d_close'))

# type       l5d_close  l20d_close  l30d_close  l60d_close  l90d_close
# ts_code
# 861001.TI        NaN    1.101644    1.042566    0.852458    0.950296
# ...              ...         ...         ...         ...         ...
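The question also asks for rounded results, which the snippets above leave unrounded. As a rough end-to-end sketch, the chain can be wrapped in a function with a final round call. The function name close_ratios and the decimals parameter are hypothetical, and the sample rows are copied from the question.

```python
import pandas as pd

def close_ratios(df: pd.DataFrame, decimals: int = 2) -> pd.DataFrame:
    """Hypothetical helper: close of type 1 divided by close of every other type."""
    # Sort so that type == 1 is the first row of each ts_code group.
    df = df.sort_values(["ts_code", "type"])
    first = df.groupby("ts_code")["close"].transform("first")
    return (df.assign(ratio=first / df["close"])
              .pivot(index="ts_code", columns="type", values="ratio")
              .drop(columns=1)          # type 1 divided by itself is always 1.0
              .add_prefix("l")
              .add_suffix("d_close")
              .round(decimals))

# Sample rows copied from the question.
df = pd.DataFrame({
    "ts_code": ["861001.TI"] * 5 + ["885934.TI"] * 2,
    "type": [1, 20, 30, 60, 90, 1, 5],
    "close": [648.399, 588.574, 621.926, 760.623, 682.313, 951.493, 1011.346],
})

out = close_ratios(df)
```

For 861001.TI this gives NaN for l5d_close, 1.10 for l20d_close and 0.95 for l90d_close, matching the values in the question.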