Pairwise maximum of a pandas Series

use*_*195 5 python vectorization pandas

I want to find the pairwise maximum between each element of a pandas Series and 0. My crude solution is as follows:

import numpy as np
import pandas as pd
np.random.seed(1)

series = pd.Series(np.random.randn(100))
pmax = pd.Series([])
for i in range(len(series)):
    pmax[i] = max(series[i],0)

I need to run this on a large number of Series, and this solution is far too slow. Is there a vectorized way to get the same result?

neb*_*oth 8

I was looking for a Python equivalent of R's pmax() and stumbled upon NumPy's maximum() function, which does exactly what pmax() does:

pmax(5,c(1,2,6))
[1] 5 5 6

and:

>>> import numpy as np
>>> np.maximum(5, [1,2,6])
array([5, 5, 6])
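Applied to the question's data, the same function can take the Series and the scalar 0 directly. The following is a minimal sketch, assuming the `series` defined in the question; the variable name `pmax` simply mirrors the question's loop.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
series = pd.Series(np.random.randn(100))

# Element-wise maximum of each value and 0. The scalar is broadcast
# across the Series, and the result is a Series with the same index.
pmax = np.maximum(series, 0)
```

Because `np.maximum` is a NumPy ufunc, no Python-level loop is involved, which is where the speedup over the original approach should come from.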


use*_*203 3

Setup

s = pd.Series([1,2,3,-1,-2,3,4,-5])

Use mask with 0 as the fill value:

s.mask(s<0, 0)

0    1
1    2
2    3
3    0
4    0
5    3
6    4
7    0
dtype: int64

Use np.clip with no upper bound:

np.clip(s, 0, None)

@Coldspeed suggested using pd.Series.clip_lower:

s.clip_lower(0)
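Note that clip_lower was deprecated in pandas 0.24 and removed in pandas 1.0. On current versions, the `lower` keyword of `Series.clip` should give the same result. A minimal sketch, reusing the `s` from the setup (the name `clipped` is only illustrative):

```python
import pandas as pd

s = pd.Series([1, 2, 3, -1, -2, 3, 4, -5])

# Raise every value below 0 up to 0; no upper bound is applied.
clipped = s.clip(lower=0)
print(clipped.tolist())  # [1, 2, 3, 0, 0, 3, 4, 0]
```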

Timings

In [204]: %%timeit
     ...: pmax = pd.Series([])
     ...: for i in range(len(series)):
     ...:     pmax[i] = max(series[i],0)
     ...:
81.2 ms ± 4.06 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [205]: %timeit series.mask(series<0, 0)
626 µs ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [206]: %timeit np.clip(series, 0, None)
124 µs ± 3.44 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [209]: %timeit series.clip_lower(0)
97.2 µs ± 3.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
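To repeat a comparison like this outside IPython, something along the following lines may work. It is only a sketch, not the benchmark used above: it relies on the standard-library `timeit` module, substitutes `series.clip(lower=0)` for the removed `clip_lower`, and the names `candidates`, `reference`, and `timings` are illustrative. Absolute numbers will differ by machine and by pandas/NumPy version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(1)
series = pd.Series(np.random.randn(100))

# Vectorized approaches from this thread, keyed by a label for printing.
candidates = {
    "mask": lambda: series.mask(series < 0, 0),
    "np.clip": lambda: np.clip(series, 0, None),
    "np.maximum": lambda: np.maximum(series, 0),
    "Series.clip": lambda: series.clip(lower=0),
}

# Reference values computed on the raw NumPy array.
reference = np.maximum(series.to_numpy(), 0)

timings = {}
for name, func in candidates.items():
    # Check that each approach agrees with the reference before timing it.
    assert np.allclose(func().to_numpy(), reference)
    # Average wall-clock seconds per call over 200 calls.
    timings[name] = timeit.timeit(func, number=200) / 200
    print(f"{name}: {timings[name] * 1e6:.1f} µs per call")
```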