Weighted random selection in pandas

Question

Lea*_*ava 5 python numpy python-2.7 pandas

I want to randomly select a value using weights, with pandas.

df:

    0  1   2   3   4   5
0  40  5  20  10  35  25
1  24  3  12   6  21  15
2  72  9  36  18  63  45
3   8  1   4   2   7   5
4  16  2   8   4  14  10
5  48  6  24  12  42  30

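I know how to use np.random.choice, for example:

import numpy as np

x = np.random.choice(
    ['0-0', '0-1', ...],  # cell labels typed in by hand, etc.
    1,
    p=[0.4, 0.24, ...],   # matching probabilities typed in by hand, etc.
)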

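So I would like to get output from df in a style similar to np.random.choice (or an alternative to it), but using pandas. I would like to do this more efficiently than typing in the values by hand as above.

With np.random.choice I know that all the probabilities must add up to 1. I am not sure how to handle that, nor how to randomly select a value based on weights using pandas.

As for the output: if the randomly selected weight is, for example, 40, then the output would be 0-0, because that value sits in column 0, row 0, and so on.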

Answer 1

ayh*_*han 5

Stack the DataFrame:

stacked = df.stack()

Normalize the weights (so that they add up to 1):

weights = stacked / stacked.sum()
# As GeoMatt22 pointed out, this part is not necessary. See the other comment.
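
As the comment above says, the explicit normalization can be skipped: the pandas documentation for sample states that weights which do not sum to 1 are normalized to sum to 1. A minimal sketch to check this, assuming the df from the question (rebuilt here by hand) and an arbitrary random_state of 0 that is only there to make the two draws comparable:

```python
import pandas as pd

# The DataFrame from the question, rebuilt by hand so the example is self-contained.
df = pd.DataFrame([
    [40, 5, 20, 10, 35, 25],
    [24, 3, 12, 6, 21, 15],
    [72, 9, 36, 18, 63, 45],
    [8, 1, 4, 2, 7, 5],
    [16, 2, 8, 4, 14, 10],
    [48, 6, 24, 12, 42, 30],
])

stacked = df.stack()
weights = stacked / stacked.sum()  # explicit normalization

# Same seed twice: once with normalized weights, once with the raw values as weights.
with_norm = stacked.sample(1, weights=weights, random_state=0)
without_norm = stacked.sample(1, weights=stacked, random_state=0)

# Compare which cell (row, column) each draw landed on.
print(with_norm.index[0] == without_norm.index[0])
```

If both draws land on the same cell, the raw values are behaving exactly like the normalized ones.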

Then use sample:

stacked.sample(1, weights=weights)
Out: 
1  2    12
dtype: int64

# Or without normalization, stacked.sample(1, weights=stacked)
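
To get the '0-0' style label the question asks for, the steps above can be put together roughly as follows. This is only a sketch: the df is rebuilt by hand, and the row-then-column order of the label is my assumption about what the asker wants.

```python
import pandas as pd

# The DataFrame from the question, rebuilt by hand so the example is self-contained.
df = pd.DataFrame([
    [40, 5, 20, 10, 35, 25],
    [24, 3, 12, 6, 21, 15],
    [72, 9, 36, 18, 63, 45],
    [8, 1, 4, 2, 7, 5],
    [16, 2, 8, 4, 14, 10],
    [48, 6, 24, 12, 42, 30],
])

stacked = df.stack()                         # 2D -> 1D, indexed by (row, column)
picked = stacked.sample(1, weights=stacked)  # the cell values act as the weights

# The single sampled entry carries its (row, column) position in the index.
(row, col), value = next(iter(picked.items()))
label = "{}-{}".format(row, col)             # assumed label order: row-column
print(label, value)
```

Larger cells are picked more often, so 2-0 (value 72) should come up most frequently over many runs.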

The DataFrame.sample method lets you sample from either the rows or the columns. Consider this:

df.sample(1, weights=[0.4, 0.3, 0.1, 0.1, 0.05, 0.05])
Out: 
    0  1   2  3   4   5
1  24  3  12  6  21  15

It selects one row (the first row with a 40% chance, the second row with a 30% chance, and so on).
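
One way to convince yourself of those percentages is to draw many rows with replacement and look at the observed frequencies. A rough sketch, with the df rebuilt by hand; the 10000 draws and random_state=0 are arbitrary choices of mine:

```python
import pandas as pd

# The DataFrame from the question, rebuilt by hand so the example is self-contained.
df = pd.DataFrame([
    [40, 5, 20, 10, 35, 25],
    [24, 3, 12, 6, 21, 15],
    [72, 9, 36, 18, 63, 45],
    [8, 1, 4, 2, 7, 5],
    [16, 2, 8, 4, 14, 10],
    [48, 6, 24, 12, 42, 30],
])

row_weights = [0.4, 0.3, 0.1, 0.1, 0.05, 0.05]

# Draw many rows with replacement so each row label can appear repeatedly.
draws = df.sample(10000, replace=True, weights=row_weights, random_state=0)

# Share of draws that landed on each row label, ordered by label.
freq = draws.index.value_counts(normalize=True).sort_index()
print(freq)
```

The printed shares should sit close to 0.4, 0.3, 0.1, 0.1, 0.05 and 0.05, up to sampling noise.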

This is also possible:

df.sample(1, weights=[0.4, 0.3, 0.1, 0.1, 0.05, 0.05], axis=1)
Out: 
   1
0  5
1  3
2  9
3  1
4  2
5  6

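It is the same process, but now the 40% chance is attached to the first column and we are selecting from the columns. However, your question seems to imply that you don't want to select rows or columns; you want to select the cells inside them. That is why I changed the dimension from 2D to 1D.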

df.stack()

Out: 
0  0    40
   1     5
   2    20
   3    10
   4    35
   5    25
1  0    24
   1     3
   2    12
   3     6
   4    21
   5    15
2  0    72
   1     9
   2    36
   3    18
   4    63
   5    45
3  0     8
   1     1
   2     4
   3     2
   4     7
   5     5
4  0    16
   1     2
   2     8
   3     4
   4    14
   5    10
5  0    48
   1     6
   2    24
   3    12
   4    42
   5    30
dtype: int64

So if I now sample from this, I am sampling a row and a column at the same time. For example:

df.stack().sample()
Out: 
1  0    24
dtype: int64

This selects row 1 and column 0.
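
If you would rather stay with np.random.choice as in the question, the stacked Series gives you the labels and probabilities without typing them in by hand. A sketch of that alternative (it is not what the answer above uses); the df is rebuilt by hand and the variable names are mine:

```python
import numpy as np
import pandas as pd

# The DataFrame from the question, rebuilt by hand so the example is self-contained.
df = pd.DataFrame([
    [40, 5, 20, 10, 35, 25],
    [24, 3, 12, 6, 21, 15],
    [72, 9, 36, 18, 63, 45],
    [8, 1, 4, 2, 7, 5],
    [16, 2, 8, 4, 14, 10],
    [48, 6, 24, 12, 42, 30],
])

stacked = df.stack()

# np.random.choice requires probabilities that sum to 1, so normalize explicitly here.
p = stacked.values / float(stacked.sum())

# Pick a position in the stacked Series, then map it back to (row, column).
pos = np.random.choice(len(stacked), p=p)
row, col = stacked.index[pos]
value = stacked.iloc[pos]
print("{}-{}".format(row, col), value)
```

Unlike sample, np.random.choice does not normalize for you, which is why the division by the total is needed in this version.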
