相关疑难解决方法(0)

private static long ForSumArray(long[] array)
{
    var result = 0L;
    for (var i = 0L; i < array.LongLength; i++)
    {
        result += array[i];
    }
    return result;
}

private static long ForSumArray2(long[] array)
{
    var length = array.LongLength;
    var result = 0L;
    for (var i = 0L; i < length; i++)
    {
        result += array[i];
    }
    return result;
}

private static long IterSumArray(long[] array)
{
    var result = 0L;
    foreach (var entry in array)
    {
        result += …

Run Code Online (Sandbox Code Playgroud)

c# performance

Nic*_*ell

2009 12-22

21
推荐指数

2
解决办法

5109
查看次数

很长一段时间我的印象是,使用嵌套std::vector<std::vector...>来模拟N维数组通常很糟糕,因为内存不保证是连续的,并且可能有缓存未命中.我认为最好使用平面矢量并从多个维度映射到1D,反之亦然.所以,我决定测试它(最后列出的代码).这非常简单,我定时读取/写入嵌套的3D矢量与我自己的1D矢量3D包装器.我用两者编译了代码,g++并且打开clang++了-O3优化.对于每次运行,我都改变了尺寸,因此我可以很好地了解这种行为.令我惊讶的是,这些是我在我的机器MacBook Pro(Retina,13英寸,2012年末),2.5GHz i5,8GB RAM,OS X 10.10.5上获得的结果:

g ++ 5.2

dimensions       nested   flat
X   Y   Z        (ms)     (ms) 

100 100 100  ->  16       24
150 150 150  ->  58       98
200 200 200  ->  136     308
250 250 250  ->  264     746
300 300 300  ->  440    1537

Run Code Online (Sandbox Code Playgroud)

clang ++(LLVM 7.0.0)

dimensions       nested   flat
X   Y   Z        (ms)     (ms) 

100 100 100  ->  16       18
150 150 150  ->  53       61
200 …

Run Code Online (Sandbox Code Playgroud)

c++ performance caching vector performance-testing

vso*_*tco

2017 08-05

16
推荐指数

2
解决办法

994
查看次数

从硬件架构的角度来看,为什么非规范化浮点数比其他浮点数慢得多？

与法线相比,已知非正规性表现不佳,大约100倍左右.这经常导致意外的软件问题.

我很好奇,从CPU架构角度来看,为什么非正规数必须是多慢？缺乏表现是否是他们不幸的表现所固有的？或者CPU架构师可能会忽略它们以减少硬件成本(错误的)假设非正规无关紧要？

在前一种情况下,如果非正规本质上是硬件不友好的,那么是否已知非IEEE-754浮点表示在零附近也是无间隙的,但是对于硬件实现更方便？

floating-point cpu-architecture

Mic*_*ael

2017 05-23

15
推荐指数

1
解决办法

911
查看次数

C＃和SIMD：高和低加速。怎么了？

问题介绍

我正在尝试加快我正在编写的（2d）光线跟踪器的交集代码。我正在使用C＃和System.Numerics库来提高SIMD指令的速度。

问题是我得到了奇怪的结果，屋顶加速和加速都很慢。我的问题是，为什么一个人过高而另一个人过低？

内容：

RayPack结构是一系列（不同的）射线，包装在System.Numerics的Vector中。
BoundingBoxPack和CirclePack结构是单个bb /圆圈，包装在System.Numerics的向量中。
使用的CPU是i7-4710HQ（Haswell），带有SSE 4.2，AVX（2）和FMA（3）指令。
在发布模式（64位）下运行。该项目在.Net Framework 472中运行。未设置其他选项。

尝试次数

我已经尝试查找某些操作是否受到正确支持（请注意：这些操作适用于c ++。https://fgiesen.wordpress.com/2016/04/03/sse-mind-the-gap/或http://sci.tuomastonteri.fi/programming/sse），但似乎并非如此，因为我使用的笔记本电脑支持SSE 4.2。

在当前代码中，应用了以下更改：

使用更正确的说明（例如，最小包装）。
不使用float *向量指令（导致大量其他操作，请参见原始程序集）。

代码...摘要？

对于大量代码，我们深表歉意，但是我不确定如果没有大量代码，我们如何才能具体讨论这一点。

雷代码-> BoundingBox

public bool CollidesWith(Ray ray, out float t)
{
    // https://gamedev.stackexchange.com/questions/18436/most-efficient-aabb-vs-ray-collision-algorithms
    // r.dir is unit direction vector of ray
    float dirfracx = 1.0f / ray.direction.X;
    float dirfracy = 1.0f / ray.direction.Y;
    // lb is the corner of AABB with minimal coordinates - left bottom, rt is maximal corner
    // r.org is origin of …

Run Code Online (Sandbox Code Playgroud)

c# performance x86-64 simd avx

Wil*_*124

2019 07-10

12
推荐指数

1
解决办法

217
查看次数

x86和x64之间的浮点算术的差异

我偶然发现了在x86和x64的MS VS 2010版本之间完成浮点算术的方式不同(两者都在同一台64位机器上执行).

这是一个简化的代码示例:

float a = 50.0f;
float b = 65.0f;
float c =  1.3f;
float d = a*c;
bool bLarger1 = d<b;
bool bLarger2 = (a*c)<b;

Run Code Online (Sandbox Code Playgroud)

布尔bLarger1始终为false(在两个版本中d都设置为65.0).变量bLarger2对于x64为false,但对于x86为true!

我很清楚浮点算术和圆角效应正在发生.我也知道32位有时使用不同的指令进行浮动操作而不是64位构建.但在这种情况下,我错过了一些信息.

为什么bLarger1和bLarger2之间首先存在差异？为什么它只出现在32位版本上？

左:x86,右:x64

c c++ floating-point 64-bit visual-studio-2010

Oli*_*del

lucky-day

10
推荐指数

1
解决办法

6635
查看次数

使用空构造函数会使数组未初始化，从而导致计算速度变慢

我对一件事感到非常困惑...如果我将构造函数添加到 struct A 中，那么 for 循环中的计算会变得慢很多倍。为什么？我不知道。

在我的计算机上，输出中的代码片段的时间为：

有构造函数：1351

没有构造函数：220

这是一个代码：

#include <iostream>
#include <chrono>
#include <cmath>

using namespace std;
using namespace std::chrono;

const int SIZE = 1024 * 1024 * 32;

using type = int;

struct A {
    type a1[SIZE];
    type a2[SIZE];
    type a3[SIZE];
    type a4[SIZE];
    type a5[SIZE];
    type a6[SIZE];

    A() {} // comment this line and iteration will be twice faster
};

int main() {
    A* a = new A();
    int r;
    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    for (int i …

Run Code Online (Sandbox Code Playgroud)

c++ optimization default-constructor compiler-optimization visual-studio-2013

dar*_*uki

2022 02-21

6
推荐指数

1
解决办法

273
查看次数

绩效评估的惯用方法？

我正在评估我的项目的网络+渲染工作负载。

程序连续运行一个主循环：

while (true) {
   doSomething()
   drawSomething()
   doSomething2()
   sendSomething()
}

Run Code Online (Sandbox Code Playgroud)

主循环每秒运行 60 多次。

我想查看性能故障，每个程序需要多少时间。

我担心的是，如果我打印每个程序的每个入口和出口的时间间隔，

这会导致巨大的性能开销。

我很好奇什么是衡量性能的惯用方法。

日志打印是否足够好？

benchmarking microbenchmark

shp*_*ark

lucky-day

1
推荐指数

1
解决办法

1322
查看次数

标签统计

c++ ×3

performance ×3

c# ×2

floating-point ×2

64-bit ×1

assembly ×1

avx ×1

benchmarking ×1

c ×1

caching ×1

compiler-optimization ×1

cpu ×1

cpu-architecture ×1

default-constructor ×1

iaca ×1

intel ×1

microbenchmark ×1

optimization ×1

performance-testing ×1

simd ×1

vector ×1

visual-studio-2010 ×1

visual-studio-2013 ×1

x86 ×1

x86-64 ×1

标签 统计

标签统计