I am trying to sum all the values in a DataFrame into a single number.
For example, given the DataFrame
            BBG.XAMS.FUR.S_pnl_pos_cost  BBG.XAMS.MT.S_pnl_pos_cost
date
2015-03-23                    -0.674996                   -0.674997
2015-03-24                    82.704951                   11.868748
2015-03-25                   -11.027327                   84.160210
2015-03-26                   228.426675                 -131.901556
2015-03-27                   -99.744986                  214.579858
I want to get back the value 377.71658.
I tried df.sum(), but that only sums each column separately.
Any help would be much appreciated.
I would do
>>> df.values.sum()
377.71658000000002
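If the frame is entirely numeric, this drops down to the underlying NumPy array and is probably the fastest option. There are plenty of alternatives, though: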
>>> %timeit df.values.sum()
100000 loops, best of 3: 6.27 µs per loop
>>> %timeit df.sum().sum()
10000 loops, best of 3: 109 µs per loop
>>> %timeit df.unstack().sum()
1000 loops, best of 3: 233 µs per loop
>>> %timeit df.stack().sum()
1000 loops, best of 3: 190 µs per loop
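To check that these approaches agree, here is a minimal sketch that rebuilds the frame from the question and compares the totals. It uses `to_numpy()` in place of `.values` (both return the underlying array for an all-numeric frame), and the variable names are mine, not from the original posts.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    {
        "BBG.XAMS.FUR.S_pnl_pos_cost": [-0.674996, 82.704951, -11.027327, 228.426675, -99.744986],
        "BBG.XAMS.MT.S_pnl_pos_cost": [-0.674997, 11.868748, 84.160210, -131.901556, 214.579858],
    },
    index=pd.to_datetime(
        ["2015-03-23", "2015-03-24", "2015-03-25", "2015-03-26", "2015-03-27"]
    ),
)
df.index.name = "date"

# Three ways to collapse the whole frame into one number.
total_numpy = df.to_numpy().sum()   # sum over the underlying NumPy array
total_columns = df.sum().sum()      # per-column sums, then sum those
total_stacked = df.stack().sum()    # reshape to one long Series, then sum

print(total_numpy, total_columns, total_stacked)
```

All three should match up to floating-point rounding, so the choice mostly comes down to speed and readability.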
Just add up the column sums:
df.sum().sum()
Or, for better performance:
np.nansum(df)
Note that you need nansum, which treats NaN values as zero, to sum a frame that contains them; a plain NumPy sum returns nan as soon as any value is NaN.
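A small sketch of the NaN behaviour, using a made-up three-row frame (`df_nan` and its values are purely illustrative). It also shows that `df.sum().sum()` gives the same answer as `nansum`, because pandas skips NaN by default (`skipna=True`).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two missing values.
df_nan = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# A plain NumPy sum propagates NaN.
plain_total = df_nan.to_numpy().sum()

# nansum treats NaN as zero.
nan_total = np.nansum(df_nan.to_numpy())

# pandas skips NaN by default (skipna=True), so this matches nansum.
pandas_total = df_nan.sum().sum()

print(plain_total, nan_total, pandas_total)
```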
Timings:
import numpy as np
import pandas as pd

# Create a DataFrame with 1 million rows and 100 columns.
np.random.seed(0)
rows = 1_000_000
cols = 100
df = pd.DataFrame(np.random.randn(rows, cols))
# Set one thousand randomly chosen cells to NaN (positions may repeat).
for row, col in zip(np.random.randint(0, rows, 1000),
                    np.random.randint(0, cols, 1000)):
    df.iat[row, col] = np.nan

%timeit np.nansum(df)
# 274 ms ± 3.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.fillna(0).to_numpy().sum()
# 974 ms ± 3.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.sum().sum()
# 1.04 s ± 3.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> df.to_numpy().sum()
nan

>>> np.nansum(df)
5965.87530314851