Tag: vectorization

How to vectorize a random walk simulation in MATLAB

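I am rewriting a Monte Carlo simulation model in MATLAB with an emphasis on readability. The model involves many particles, represented as (x,y,z), following a random walk over a small set of states with certain termination probabilities. The information relevant for output is the number of particles that terminate in a given state.

The simulation requires so many particles that running it for each particle individually is cost prohibitive. Vectorization seems to be the way to get performance out of MATLAB, but is there an idiomatic way to create a vectorized version of this simulation in MATLAB?

I'm beating my head against the wall trying to accomplish this - I even tried creating a (nStates x nParticles) matrix representing each particle-state combination, but this approach quickly gets out of hand in terms of readability, since particles bounce from state to state independently of one another. Should I just bite the bullet and switch to a language better suited to this?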
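For illustration, here is a minimal sketch of one vectorized formulation, written in Python with NumPy rather than MATLAB. It assumes the walk can be described by a row-stochastic transition matrix `P` whose terminal states are absorbing, and it keeps one state index per particle instead of an (nStates x nParticles) matrix. The names `simulate_absorbing_walk`, `P`, `absorbing`, and `max_steps` are hypothetical and do not come from the question.

```python
import numpy as np

def simulate_absorbing_walk(P, start_state, n_particles, absorbing,
                            max_steps=10_000, seed=0):
    """Return how many particles sit in each state when the walk stops."""
    rng = np.random.default_rng(seed)
    n_states = P.shape[0]
    cdf = np.cumsum(P, axis=1)   # row-wise cumulative transition probabilities
    cdf[:, -1] = 1.0             # guard against floating-point rounding
    is_absorbing = np.zeros(n_states, dtype=bool)
    is_absorbing[list(absorbing)] = True
    state = np.full(n_particles, start_state, dtype=np.intp)
    for _ in range(max_steps):
        # Only particles that have not terminated yet take a step.
        active = np.flatnonzero(~is_absorbing[state])
        if active.size == 0:
            break
        u = rng.random(active.size)
        # Next state = index of the first cumulative probability exceeding u.
        state[active] = (u[:, None] >= cdf[state[active]]).sum(axis=1)
    return np.bincount(state, minlength=n_states)

# Assumed toy model: state 0 is transient, states 1 and 2 are terminal.
P = np.array([[0.5, 0.25, 0.25],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
counts = simulate_absorbing_walk(P, start_state=0, n_particles=100_000,
                                 absorbing=[1, 2])
```

The loop runs over time steps, not over particles, so each iteration advances every surviving particle at once. The same idea should carry over to MATLAB using `cumsum`, `rand`, logical indexing, and `accumarray` for the final counts.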

simulation matlab markov-chains vectorization montecarlo

9
votes

1
answer

4014
views

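pdist2 equivalent in MATLAB version 7

I need to calculate the Euclidean distance between 2 matrices in MATLAB. Currently I am using bsxfun and calculating the distance as follows (I am attaching a snippet of the code):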

for i=1:4754
test_data=fea_test(i,:);
d=sqrt(sum(bsxfun(@minus, test_data, fea_train).^2, 2));
end

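The size of fea_test is 4754x1024 and fea_train is 6800x1024. With this for loop, execution takes approximately 12 minutes, which I think is too long. Is there a faster way to calculate the Euclidean distance between the two matrices?

I was told that I can reduce the execution time by removing unnecessary for loops. I also know that pdist2 can help reduce the computation time, but since I am using MATLAB version 7, I do not have the pdist2 function. Upgrading is not an option.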

Any help would be appreciated.

Regards,

Bhavya
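A common way to drop the loop is the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, which turns all pairwise distances into a single matrix product. The sketch below illustrates that identity in Python with NumPy, not in MATLAB; `pairwise_euclidean` is a hypothetical name, and the small random matrices stand in for fea_test and fea_train.

```python
import numpy as np

def pairwise_euclidean(A, B):
    """Distances between every row of A (m x d) and every row of B (n x d)."""
    a2 = np.einsum("ij,ij->i", A, A)[:, None]   # squared row norms of A, shape (m, 1)
    b2 = np.einsum("ij,ij->i", B, B)[None, :]   # squared row norms of B, shape (1, n)
    d2 = a2 + b2 - 2.0 * (A @ B.T)              # squared distances, shape (m, n)
    np.maximum(d2, 0.0, out=d2)                 # clip tiny negatives caused by rounding
    return np.sqrt(d2)

rng = np.random.default_rng(0)
test = rng.random((5, 7))    # stand-in for fea_test
train = rng.random((6, 7))   # stand-in for fea_train
D = pairwise_euclidean(test, train)  # D[i, j] = distance from test row i to train row j
```

In MATLAB the same identity would presumably be written with `sum(X.^2, 2)`, the product `fea_test * fea_train'`, and `bsxfun(@plus, ...)`. Note that the full result for the sizes in the question is a 4754 x 6800 matrix, roughly 259 MB in double precision.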

matlab vectorization euclidean-distance bsxfun

9
votes

2
answers

4202
views

Fast popcount on Intel Xeon Phi

I'm implementing an ultra fast popcount on Intel Xeon® Phi®, as it's a performance hotspot of various bioinformatics software.

I've implemented five pieces of code,

#if defined(__MIC__)
#include <zmmintrin.h>
__attribute__((align(64))) static const uint32_t POPCOUNT_4bit[16] = {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};
__attribute__((align(64))) static const uint32_t MASK_4bit[16] = {0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF};
inline uint64_t vpu_popcount1(uint64_t* buf, size_t n)  { …

c vectorization hammingweight intel-mic xeon-phi

9
votes

1
answer

1749
views

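Efficient implementation of `im2col` and `col2im`

MATLAB's im2col and col2im are very important functions for vectorizing code in MATLAB when working with images.
However, they require MATLAB's Image Processing Toolbox.

My question is: is there an efficient (vectorized) way to implement them using only MATLAB's built-in functions (no toolbox)?
I need both the sliding and distinct modes.

I don't need any padding.

Thank you.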

matlab image-processing vectorization

9
votes

1
answer

6422
views

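Vectorizing a Haversine distance calculation in Python

I am trying to compute a distance matrix for a long list of locations identified by latitude and longitude, using the Haversine formula, which takes two tuples of coordinate pairs and produces the distance: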

def haversine(point1, point2, miles=False):
    """ Calculate the great-circle distance between two points on the Earth surface.

    :input: two 2-tuples, containing the latitude and longitude of each point
    in decimal degrees.

    Example: haversine((45.7597, 4.8422), (48.8567, 2.3508))

    :output: Returns the distance between the two points.
    The default unit is kilometers. Miles can be returned
    if the ``miles`` parameter is set to True.

    """

I can calculate the distance between all points using a nested for loop, as follows:

data.head()

   id                      coordinates
0   1   (16.3457688674, 6.30354512503)
1   2    (12.494749307, 28.6263955635)
2   3    (27.794615136, 60.0324947881) …
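As an illustration of what a vectorized version might look like, the sketch below computes the full distance matrix with NumPy broadcasting instead of nested loops. It assumes the coordinate tuples are (latitude, longitude) in decimal degrees; the function name `haversine_matrix` and the Earth radius constant are assumptions, not taken from the question.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius

def haversine_matrix(lat_deg, lon_deg):
    """Great-circle distance in km between every pair of points."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting a column against a row gives all pairwise differences.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    h = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Clip to [0, 1] so rounding cannot push arcsin out of its domain.
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))

# The two points from the docstring example above.
coords = [(45.7597, 4.8422), (48.8567, 2.3508)]
lat, lon = np.array(coords).T
dist_km = haversine_matrix(lat, lon)  # dist_km[i, j] is the distance between points i and j
```

The result is an n x n array, so memory grows quadratically with the number of locations; for very long lists it may be necessary to process the rows in chunks.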

python performance numpy vectorization pandas

9
votes

2
answers

2085
views

Weird uint32_t to float array conversion

I have the following code snippet:

#include <cstdio>
#include <cstdint>

static const size_t ARR_SIZE = 129;

int main()
{
  uint32_t value = 2570980487;

  uint32_t arr[ARR_SIZE];
  for (int x = 0; x < ARR_SIZE; ++x)
    arr[x] = value;

  float arr_dst[ARR_SIZE];
  for (int x = 0; x < ARR_SIZE; ++x)
  {
    arr_dst[x] = static_cast<float>(arr[x]);
  }

  printf("%s\n", arr_dst[ARR_SIZE - 1] == arr_dst[ARR_SIZE - 2] ? "OK" : "WTF??!!");

  printf("magic = %0.10f\n", arr_dst[ARR_SIZE - 2]);
  printf("magic = %0.10f\n", arr_dst[ARR_SIZE - 1]);
  return 0;
}

If I compile it under MS Visual Studio 2015, I can see that the output is:

WTF??!! …

c++ sse vectorization visual-studio

9
votes

3
answers

554
views

Vectorizing NumPy covariance for a 3D array

I have a 3D numpy array of shape (t, n1, n2):

x = np.random.rand(10, 2, 4)

I need to compute another 3D array y of shape (t, n1, n1), such that:

y[0] = np.cov(x[0,:,:])

...and so on for all slices along the first axis.

So, a loop-based implementation would be:

y = np.zeros((10,2,2))
for i in np.arange(x.shape[0]):
    y[i] = np.cov(x[i, :, :])

Is there any way to vectorize this so that all the covariance matrices are computed in one go? I tried:

x1 = x.swapaxes(1, 2)
y = np.dot(x, x1)

But it didn't work.
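Two things likely stand in the way of the attempt above: `np.dot` on two 3D arrays contracts the last axis of the first with the second-to-last axis of the second, giving shape (10, 2, 10, 2) instead of one product per slice, and the data is never centered. A minimal sketch of a batched version follows, assuming the `np.cov` defaults (rows are variables, columns are observations, normalization by n2 - 1); `batched_cov` is a hypothetical name.

```python
import numpy as np

def batched_cov(x):
    """Covariance matrix of each x[i], matching np.cov(x[i]) with default arguments."""
    # Subtract each variable's mean over the observation axis.
    xc = x - x.mean(axis=2, keepdims=True)
    # For every slice t: xc[t] @ xc[t].T, normalized by (observations - 1).
    return np.einsum("tij,tkj->tik", xc, xc) / (x.shape[2] - 1)

x = np.random.rand(10, 2, 4)
y = batched_cov(x)  # shape (10, 2, 2)
```

The `einsum` call could equally be written as `xc @ xc.swapaxes(1, 2)`, since `@` treats the leading axis as a batch dimension.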

python numpy vectorization covariance multidimensional-array

9
votes

1
answer

3964
views

Fast modulo 12 algorithm for 4 uint16_t packed in a uint64_t

Consider the following union:

union Uint16Vect {
    uint16_t _comps[4];
    uint64_t _all;
};

Is there a fast algorithm to determine whether each component is equal to 1 modulo 12?

A naive code sequence is:

Uint16Vect F(const Uint16Vect a) {
    Uint16Vect r;
    for (int8_t k = 0; k < 4; k++) {
        r._comps[k] = (a._comps[k] % 12 == 1) ? 1 : 0;
    }
    return r;
}

c algorithm vectorization modulo avx2

9
推荐指数

2
解决办法

316
查看次数

Maximum of each diagonal in a 2D array

I have an array and need the maximum rolling difference with a dynamic window.

a = np.array([8, 18, 5,15,12])
print (a)
[ 8 18  5 15 12]

So first I create the differences myself:

b = a - a[:, None]
print (b)
[[  0  10  -3   7   4]
 [-10   0 -13  -3  -6]
 [  3  13   0  10   7]
 [ -7   3 -10   0  -3]
 [ -4   6  -7   3   0]]

Then I replace the upper triangle of the matrix with 0:

c = np.tril(b)
print (c)
[[  0   0   0   0   0]
 [-10   0   0   0   0]
 [  3  13   0   0   0]
 [ -7   3 -10 …
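The excerpt is cut off here, so the exact desired output is not shown. Going only by the title, the sketch below assumes the goal is the maximum along each diagonal of the lower triangle of `b` (main diagonal first, then each sub-diagonal). It avoids a Python loop by scattering values into an output array with `np.maximum.at`; `max_per_lower_diagonal` is a hypothetical name, and an integer dtype is assumed.

```python
import numpy as np

def max_per_lower_diagonal(b):
    """Maximum of the main diagonal and of each sub-diagonal of a square integer array."""
    n = b.shape[0]
    i, j = np.tril_indices(n)      # row and column indices of the lower triangle
    # Start from the smallest representable integer so any value replaces it.
    out = np.full(n, np.iinfo(b.dtype).min, dtype=b.dtype)
    # i - j is the diagonal offset (0 = main diagonal); take an unbuffered running maximum.
    np.maximum.at(out, i - j, b[i, j])
    return out

a = np.array([8, 18, 5, 15, 12])
b = a - a[:, None]
diag_max = max_per_lower_diagonal(b)  # one value per diagonal offset 0..n-1
```

Because only lower-triangle indices are read, the `np.tril` step is not needed for this computation, and the zeros it introduces cannot mask negative maxima.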

python numpy max vectorization diagonal

9
votes

1
answer

741
views

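OpenJDK Panama Vector API jdk.incubator.vector not giving improved results for vector dot product

I am testing the OpenJDK Panama Vector API jdk.incubator.vector and ran the tests on an Amazon c5.4xlarge instance. But in every case, the simple unrolled vector dot product outperforms the Vector API code.

My question is: why am I not able to get the performance boost shown in Richard Startin's blog? The same performance improvement was also discussed by the Intel people at this conference meetup. What is missing?

JMH benchmark results: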

Benchmark                                              (size)   Mode  Cnt      Score    Error  Units

FloatVector256DotProduct.unrolled                       1048576  thrpt   25   2440.726 ± 21.372  ops/s
FloatVector256DotProduct.vanilla                        1048576  thrpt   25    807.721 ±  0.084  ops/s
FloatVector256DotProduct.vector                         1048576  thrpt   25    909.977 ±  6.542  ops/s
FloatVector256DotProduct.vectorUnrolled                 1048576  thrpt   25    887.422 ±  5.557  ops/s
FloatVector256DotProduct.vectorfma                      1048576  thrpt   25    916.955 ±  4.652  ops/s
FloatVector256DotProduct.vectorfmaUnrolled              1048576  thrpt   25    877.569 ±  1.451  ops/s

JavaDocExample.simpleMultiply                           1048576  thrpt …

java vectorization dot-product project-panama

9
votes

1
answer

1018
views

Tag statistics

vectorization ×10

c ×2

avx2 ×1

c++ ×1

dot-product ×1

euclidean-distance ×1

hammingweight ×1

image-processing ×1

java ×1

markov-chains ×1

max ×1

multidimensional-array ×1

performance ×1

project-panama ×1

sse ×1

visual-studio ×1
