Iterating over a list vs. using pandas

Question


I have a large list named reassembly, organized like this:

['HYDR', 30472.0, 'B'], ['HYDR', 30470.0, 'S'], ['HYDR', 30474.0, 'B'].....


Part of my code:

sum_buys = 0
sum_sells = 0
for deal in reassembly:
    ticker, vol, oper = deal[0], deal[1], deal[2]
    if oper == "B":
        sum_buys = sum_buys + vol
    elif oper == "S":
        sum_sells = sum_sells + vol


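The list is very large. Running the loop from start to finish takes about 5 minutes, which is too long.

Can the pandas library help me do this faster? I have never worked with it.

How do I:

Convert the list named reassembly to a pandas DataFrame
Use pandas methods to compute the two values: sum_buys and sum_sells

Please help!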

Answer 1

Qua*_*ang 5

Yes, you can and should convert the list to a pandas DataFrame and use groupby():

import pandas as pd

df = pd.DataFrame(reassembly, columns=['tickers', 'vol', 'operation'])

df.groupby('operation')['vol'].sum()


Output for the sample data:

operation
B    60946.0
S    30470.0
Name: vol, dtype: float64

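The question asks for two plain values, sum_buys and sum_sells, so here is a minimal sketch of pulling them out of the grouped result. It assumes the three sample rows from the question stand in for the full list; the variable name totals is only illustrative.

```python
import pandas as pd

# Sample rows taken from the question; the real list is much longer.
reassembly = [
    ['HYDR', 30472.0, 'B'],
    ['HYDR', 30470.0, 'S'],
    ['HYDR', 30474.0, 'B'],
]

df = pd.DataFrame(reassembly, columns=['tickers', 'vol', 'operation'])

# One pass over the data: total volume per operation code.
totals = df.groupby('operation')['vol'].sum()

# Series.get returns the default when an operation code is absent,
# so this also works if the list happens to contain no buys or no sells.
sum_buys = totals.get('B', 0.0)
sum_sells = totals.get('S', 0.0)

print(sum_buys, sum_sells)
```

Using .get() with a default rather than totals['B'] avoids a KeyError when one of the two operation codes never occurs in the data.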

Also, if you are interested in the total bought/sold volume per ticker, you can do:

df.groupby(['tickers','operation'])['vol'].sum()


which gives something like this:

tickers  operation
HYDR     B            60946.0
         S            30470.0
Name: vol, dtype: float64

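If a table with one row per ticker and one column per operation is easier to read than the two-level index, the grouped result can be reshaped with unstack(). This is a sketch on the question's sample rows; the name per_ticker is an assumption for illustration.

```python
import pandas as pd

# Sample rows taken from the question.
reassembly = [
    ['HYDR', 30472.0, 'B'],
    ['HYDR', 30470.0, 'S'],
    ['HYDR', 30474.0, 'B'],
]

df = pd.DataFrame(reassembly, columns=['tickers', 'vol', 'operation'])

# Sum per (ticker, operation), then move the operation level into columns.
# fill_value=0.0 covers tickers that have only buys or only sells.
per_ticker = (
    df.groupby(['tickers', 'operation'])['vol']
      .sum()
      .unstack(fill_value=0.0)
)

print(per_ticker)
```

A single ticker's totals can then be read with, for example, per_ticker.loc['HYDR', 'B'].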

To ignore trades with vol < 100000, we can keep only the rows with vol >= 100000:

df = df[df['vol']>=100000]

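The filter and the aggregation can be combined as sketched below. The helper totals_above and its min_vol parameter are hypothetical names, and the last two rows are made up for illustration, since every sample row in the question is below 100000 and would be filtered out.

```python
import pandas as pd


def totals_above(df, min_vol):
    """Sum vol per operation, keeping only rows with vol >= min_vol."""
    kept = df[df['vol'] >= min_vol]  # boolean mask; df itself is not modified
    return kept.groupby('operation')['vol'].sum()


reassembly = [
    ['HYDR', 30472.0, 'B'],
    ['HYDR', 30470.0, 'S'],
    ['HYDR', 30474.0, 'B'],
    ['HYDR', 150000.0, 'B'],  # hypothetical row, not from the question
    ['HYDR', 120000.0, 'S'],  # hypothetical row, not from the question
]

df = pd.DataFrame(reassembly, columns=['tickers', 'vol', 'operation'])

result = totals_above(df, 100000)
print(result)
```

Unlike df = df[df['vol'] >= 100000], this version leaves the original DataFrame intact, so the unfiltered totals can still be computed afterwards.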
