How do I convert a "long" table into a "wide and sparse" table in Pandas?

Question


use*_*764 5 python numpy sparse-matrix dataframe pandas

My terminology is terrible, so this deserves some explanation. Imagine I have a DataFrame like this (I'll call it a "long" table):

time       stock     price
---------------------------
13:03:00   AAPL      100.00
13:03:00   SPY       200.00
13:03:01   AAPL      100.01
13:03:02   SPY       200.01
13:03:03   SPY       200.02
.
.
.


I want to convert it into a DataFrame like this (I'll call it a "wide and sparse" table):

time       AAPL      SPY
---------------------------
13:03:00   100.00    200.00
13:03:01   100.01    NaN
13:03:02   NaN       200.01
13:03:03   NaN       200.02


This is obviously a big transformation. Is there a built-in function that does it? It seems like it would be a fairly common thing to do.

Thanks!

Answer 1

jez*_*ael 6

You can use pivot:

# One row per time, one column per stock; missing pairs become NaN
wide = df.pivot(index='time', columns='stock', values='price')
print(wide)
stock       AAPL     SPY
time                    
13:03:00  100.00  200.00
13:03:01  100.01     NaN
13:03:02     NaN  200.01
13:03:03     NaN  200.02


Another solution is unstack:

# Move time and stock into a MultiIndex, then unstack stock into columns
wide = df.set_index(['time', 'stock'])['price'].unstack()
print(wide)
stock       AAPL     SPY
time                    
13:03:00  100.00  200.00
13:03:01  100.01     NaN
13:03:02     NaN  200.01
13:03:03     NaN  200.02

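For anyone who wants to try both approaches end to end, here is a minimal, self-contained sketch. It rebuilds the sample data from the question by hand; the variable names (`long_df`, `wide_pivot`, `wide_unstack`) are only illustrative and do not come from the original post.

```python
import pandas as pd

# Sample "long" table copied from the question.
long_df = pd.DataFrame({
    "time": ["13:03:00", "13:03:00", "13:03:01", "13:03:02", "13:03:03"],
    "stock": ["AAPL", "SPY", "AAPL", "SPY", "SPY"],
    "price": [100.00, 200.00, 100.01, 200.01, 200.02],
})

# Reshape with pivot: one row per time, one column per stock.
wide_pivot = long_df.pivot(index="time", columns="stock", values="price")

# Same reshape via a MultiIndex followed by unstack.
wide_unstack = long_df.set_index(["time", "stock"])["price"].unstack()

# (time, stock) pairs that are absent from the long table show up as NaN.
print(wide_pivot)

# Compare the two results; they are expected to match.
print(wide_pivot.equals(wide_unstack))
```

Keeping the long table in its own variable, rather than reassigning `df`, makes it possible to run both variants on the same input.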

But if you get:

ValueError: Index contains duplicate entries, cannot reshape

you need to use pivot_table with an aggregation function; the default is np.mean.

print (df)
       time stock   price
0  13:03:00  AAPL  100.00
1  13:03:00   SPY  200.00
2  13:03:01  AAPL  100.01
3  13:03:02   SPY  200.01
4  13:03:03   SPY  200.02
5  13:03:03   SPY  500.02 <- duplicates for same time and stock 


# Duplicate (time, stock) pairs are aggregated; the default aggfunc is the mean
wide = df.pivot_table(index='time', columns='stock', values='price')
print(wide)
stock       AAPL     SPY
time                    
13:03:00  100.00  200.00
13:03:01  100.01     NaN
13:03:02     NaN  200.01
13:03:03     NaN  350.02

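Averaging is not always what you want for prices. As a hedged sketch, the `aggfunc` parameter of `pivot_table` can be set to a different aggregation; the choice of "last" and "min" below is only an illustration, not something the original answer recommends, and "last" depends on the row order of the input.

```python
import pandas as pd

# Sample data from the answer, including the duplicated (time, stock) pair.
long_df = pd.DataFrame({
    "time": ["13:03:00", "13:03:00", "13:03:01", "13:03:02", "13:03:03", "13:03:03"],
    "stock": ["AAPL", "SPY", "AAPL", "SPY", "SPY", "SPY"],
    "price": [100.00, 200.00, 100.01, 200.01, 200.02, 500.02],
})

# Keep the last listed price for each (time, stock) pair instead of the mean.
wide_last = long_df.pivot_table(
    index="time", columns="stock", values="price", aggfunc="last"
)

# Keep the lowest price for each pair.
wide_min = long_df.pivot_table(
    index="time", columns="stock", values="price", aggfunc="min"
)

print(wide_last)
print(wide_min)
```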

Another possible solution for duplicates in time and stock:

# Average duplicate (time, stock) pairs explicitly, then unstack stock into columns
wide = df.groupby(['time', 'stock'])['price'].mean().unstack()
print(wide)
stock       AAPL     SPY
time                    
13:03:00  100.00  200.00
13:03:01  100.01     NaN
13:03:02     NaN  200.01
13:03:03     NaN  350.02

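All of the results above are ordinary dense DataFrames in which the gaps are stored as NaN. If "sparse" is meant literally, one possible follow-up step (an assumption on my part, not part of the original answer) is to convert the wide frame to pandas' sparse dtype. Note that this sketch still builds the dense frame first, so it reduces the memory held afterwards rather than the peak during the reshape.

```python
import numpy as np
import pandas as pd

# Sample "long" table copied from the question.
long_df = pd.DataFrame({
    "time": ["13:03:00", "13:03:00", "13:03:01", "13:03:02", "13:03:03"],
    "stock": ["AAPL", "SPY", "AAPL", "SPY", "SPY"],
    "price": [100.00, 200.00, 100.01, 200.01, 200.02],
})

wide = long_df.pivot(index="time", columns="stock", values="price")

# Store each column as a sparse array with NaN as the fill value,
# so the missing entries are not kept explicitly.
sparse_wide = wide.astype(pd.SparseDtype("float64", np.nan))

print(sparse_wide.dtypes)

# Fraction of entries that are actually stored (non-NaN).
print(sparse_wide.sparse.density)
```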

Archived:	8 years, 11 months ago
Viewed:	5379 times
Last activity:	8 years, 3 months ago