Pandas: How can I write a faster loop that checks one column and then flips the sign of another column based on the first column's value?

Question

My DataFrame has about 10 million rows. Here is a 2-row sample.

Index	Amount	debit/credit
0	1000	1
1	2000	2

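I want to write a function that checks whether the value in the "debit/credit" column is 1 (debit) or 2 (credit). If it is 2, the number in the "Amount" column should be replaced with its negative. So, for example, the table would change to: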

Index	Amount	debit/credit
0	1000	1
1	-2000	2

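This is the function I wrote, but it is really slow on 9 million rows. Can anyone tell me how to refactor this code, or whether there is a more efficient way to do this task? (Using Python or SQL, preferably Python.)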

def change_credits_to_negative(df):
    for num in range(len(df)):
        if df['debit/credit'].loc[num] == 2: # 1 is for debit & 2 is for credit
            df['Amount'].loc[num] = -df['Amount'].loc[num]


Answer 1

Ark*_*usz 6

You can use .loc, but without a loop:

df.loc[df['debit/credit'].eq(2), 'Amount'] *= -1


Output:

    Amount  debit/credit
0     1000             1
1    -2000             2

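To make the one-liner easier to follow, here is a minimal sketch on the two-row sample from the question. The helper name `flip_credits` and its `credit_code` parameter are hypothetical (they are not part of the original answer). The sketch builds the boolean mask explicitly, then negates only the rows the mask selects.

```python
import pandas as pd


def flip_credits(df, credit_code=2):
    """Negate 'Amount' in place wherever 'debit/credit' equals credit_code.

    Hypothetical helper for illustration; column names follow the question.
    """
    is_credit = df['debit/credit'].eq(credit_code)  # boolean Series, one entry per row
    df.loc[is_credit, 'Amount'] *= -1               # only the masked rows are touched
    return df


sample = pd.DataFrame({'Amount': [1000, 2000], 'debit/credit': [1, 2]})
flip_credits(sample)
print(sample)
```

Because the mask is computed for the whole column at once, the per-row Python overhead of the original loop disappears.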

Or

With np.where():

import numpy as np

df['Amount'] = np.where(df['debit/credit'].eq(2), df['Amount']*-1, df['Amount'])

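A closely related variant, sketched here as an assumption rather than as part of the original answer, uses np.where() to build a column of signs (+1 or -1) and multiplies it into 'Amount'. With `DataFrame.assign` the result goes into a new frame, so the original data stays untouched. The names `sign` and `signed` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Amount': [1000, 2000, 500], 'debit/credit': [1, 2, 2]})

# -1 for credit rows (code 2), +1 for every other row
sign = np.where(df['debit/credit'].to_numpy() == 2, -1, 1)

# assign() returns a new DataFrame; df itself is not modified
signed = df.assign(Amount=df['Amount'] * sign)
print(signed)
```

This form can be handy when the unsigned amounts are still needed later in the same pipeline.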

Performance test:

Let's create a sample DataFrame with 2 columns and 10 million rows:

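import time

import numpy as np
import pandas as pd

# randint(1, 3) draws 1 (debit) or 2 (credit); the upper bound is exclusive
df = pd.DataFrame({'Amount': np.random.randint(1000, 10000, size=10000000),
                   'debit/credit': np.random.randint(1, 3, size=10000000)})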


1) Loop:

start = time.perf_counter()

change_credits_to_negative(df)

stop = time.perf_counter()
print(stop - start)

97.34215749999998


2) .loc:

start = time.perf_counter()

df.loc[df['debit/credit'].eq(2), 'Amount'] *= -1

stop = time.perf_counter()
print(stop - start)

0.03006110000001172


That gives us about 97 seconds with the loop and about 0.03 seconds without it.
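For anyone who wants to compare the two vectorized approaches on their own machine, the following is a rough sketch using the standard-library timeit module. It assumes 1 million rows instead of 10 million so that it finishes quickly, and the function names `with_loc` and `with_where` are made up for this example. Each function returns a new Series, so repeated runs start from the same data. Timings will vary by hardware and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

n = 1_000_000  # assumption: smaller than the 10 million rows above, to keep the run short
rng = np.random.default_rng(0)
base = pd.DataFrame({'Amount': rng.integers(1000, 10000, size=n),
                     'debit/credit': rng.integers(1, 3, size=n)})  # values are 1 or 2


def with_loc(df):
    """Boolean-mask assignment on a copy, so the input frame is left unchanged."""
    out = df.copy()
    out.loc[out['debit/credit'].eq(2), 'Amount'] *= -1
    return out['Amount']


def with_where(df):
    """np.where() picks the negated amount for credit rows, the original otherwise."""
    values = np.where(df['debit/credit'].eq(2), -df['Amount'], df['Amount'])
    return pd.Series(values, index=df.index, name='Amount')


# Sanity check: both approaches must produce identical results
assert with_loc(base).equals(with_where(base))

for fn in (with_loc, with_where):
    seconds = min(timeit.repeat(lambda: fn(base), number=1, repeat=3))
    print(f'{fn.__name__}: {seconds:.4f} s')
```

Taking the minimum over a few repeats reduces noise from other processes; note that `with_loc` also pays for the copy it makes.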

Asked:	4 years, 5 months ago
Viewed:	57 times
Active:	4 years, 5 months ago