Wil*_*124 12 c# performance x86-64 simd avx
问题介绍
我正在尝试加快我正在编写的(2d)光线跟踪器的交集代码。我正在使用C#和System.Numerics库来提高SIMD指令的速度。
问题是我得到了奇怪的结果,屋顶加速和加速都很慢。我的问题是,为什么一个人过高而另一个人过低?
内容:
尝试次数
我已经尝试查找某些操作是否受到正确支持(请注意:这些操作适用于c ++。https://fgiesen.wordpress.com/2016/04/03/sse-mind-the-gap/或http://sci.tuomastonteri.fi/programming/sse),但似乎并非如此,因为我使用的笔记本电脑支持SSE 4.2。
在当前代码中,应用了以下更改:
代码...摘要?
对于大量代码,我们深表歉意,但是我不确定如果没有大量代码,我们如何才能具体讨论这一点。
雷代码-> BoundingBox
public bool CollidesWith(Ray ray, out float t)
{
// https://gamedev.stackexchange.com/questions/18436/most-efficient-aabb-vs-ray-collision-algorithms
// r.dir is unit direction vector of ray
float dirfracx = 1.0f / ray.direction.X;
float dirfracy = 1.0f / ray.direction.Y;
// lb is the corner of AABB with minimal coordinates - left bottom, rt is maximal corner
// r.org is origin of ray
float t1 = (this.rx.min - ray.origin.X) * dirfracx;
float t2 = (this.rx.max - ray.origin.X) * dirfracx;
float t3 = (this.ry.min - ray.origin.Y) * dirfracy;
float t4 = (this.ry.max - ray.origin.Y) * dirfracy;
float tmin = Math.Max(Math.Min(t1, t2), Math.Min(t3, t4));
float tmax = Math.Min(Math.Max(t1, t2), Math.Max(t3, t4));
// if tmax < 0, ray (line) is intersecting AABB, but the whole AABB is behind us
if (tmax < 0)
{
t = tmax;
return false;
}
// if tmin > tmax, ray doesn't intersect AABB
if (tmin > tmax)
{
t = tmax;
return false;
}
t = tmin;
return true;
}
Run Code Online (Sandbox Code Playgroud)
RayPack的代码-> BoundingBoxPack
public Vector<int> CollidesWith(ref RayPack ray, out Vector<float> t)
{
// ------------------------------------------------------- \\
// compute the collision. \\
Vector<float> dirfracx = Constants.ones / ray.direction.x;
Vector<float> dirfracy = Constants.ones / ray.direction.y;
Vector<float> t1 = (this.rx.min - ray.origin.x) * dirfracx;
Vector<float> t2 = (this.rx.max - ray.origin.x) * dirfracx;
Vector<float> t3 = (this.ry.min - ray.origin.y) * dirfracy;
Vector<float> t4 = (this.ry.max - ray.origin.y) * dirfracy;
Vector<float> tmin = Vector.Max(Vector.Min(t1, t2), Vector.Min(t3, t4));
Vector<float> tmax = Vector.Min(Vector.Max(t1, t2), Vector.Max(t3, t4));
Vector<int> lessThanZeroMask = Vector.GreaterThan(tmax, Constants.zeros);
Vector<int> greaterMask = Vector.LessThan(tmin, tmax);
Vector<int> combinedMask = Vector.BitwiseOr(lessThanZeroMask, greaterMask);
// ------------------------------------------------------- \\
// Keep track of the t's that collided. \\
t = Vector.ConditionalSelect(combinedMask, tmin, Constants.floatMax);
return combinedMask;
}
Run Code Online (Sandbox Code Playgroud)
雷代码->圈
public bool Intersect(Circle other)
{
// Step 0: Work everything out on paper!
// Step 1: Gather all the relevant data.
float ox = this.origin.X;
float dx = this.direction.X;
float oy = this.origin.Y;
float dy = this.direction.Y;
float x0 = other.origin.X;
float y0 = other.origin.Y;
float cr = other.radius;
// Step 2: compute the substitutions.
float p = ox - x0;
float q = oy - y0;
float r = 2 * p * dx;
float s = 2 * q * dy;
// Step 3: compute the substitutions, check if there is a collision.
float a = dx * dx + dy * dy;
float b = r + s;
float c = p * p + q * q - cr * cr;
float DSqrt = b * b - 4 * a * c;
// no collision possible! Commented out to make the benchmark more fair
//if (DSqrt < 0)
//{ return false; }
// Step 4: compute the substitutions.
float D = (float)Math.Sqrt(DSqrt);
float t0 = (-b + D) / (2 * a);
float t1 = (-b - D) / (2 * a);
float ti = Math.Min(t0, t1);
if(ti > 0 && ti < t)
{
t = ti;
return true;
}
return false;
}
Run Code Online (Sandbox Code Playgroud)
RayPack的代码-> CirclePack原始代码,未经编辑,可以在以下位置找到:https ://pastebin.com/87nYgZrv
public Vector<int> Intersect(CirclePack other)
{
// ------------------------------------------------------- \\
// Put all the data on the stack. \\
Vector<float> zeros = Constants.zeros;
Vector<float> twos = Constants.twos;
Vector<float> fours = Constants.fours;
// ------------------------------------------------------- \\
// Compute whether the ray collides with the circle. This \\
// is stored in the 'mask' vector. \\
Vector<float> p = this.origin.x - other.origin.x; ;
Vector<float> q = this.origin.y - other.origin.y;
Vector<float> r = twos * p * this.direction.x;
Vector<float> s = twos * q * this.direction.y; ;
Vector<float> a = this.direction.x * this.direction.x + this.direction.y * this.direction.y;
Vector<float> b = r + s;
Vector<float> c = p * p + q * q - other.radius * other.radius;
Vector<float> DSqrt = b * b - fours * a * c;
Vector<int> maskCollision = Vector.GreaterThan(DSqrt, zeros);
// Commented out to make the benchmark more fair.
//if (Vector.Dot(maskCollision, maskCollision) == 0)
//{ return maskCollision; }
// ------------------------------------------------------- \\
// Update t if and only if there is a collision. Take \\
// note of the conditional where we compute t. \\
Vector<float> D = Vector.SquareRoot(DSqrt);
Vector<float> bMinus = Vector.Negate(b);
Vector<float> twoA = twos * a;
Vector<float> t0 = (bMinus + D) / twoA;
Vector<float> t1 = (bMinus - D) / twoA;
Vector<float> tm = Vector.ConditionalSelect(Vector.LessThan(t1, t0), t1, t0);
Vector<int> maskBiggerThanZero = Vector.GreaterThan(tm, zeros);
Vector<int> maskSmallerThanT = Vector.LessThan(tm, this.t);
Vector<int> mask = Vector.BitwiseAnd(
maskCollision,
Vector.BitwiseAnd(
maskBiggerThanZero,
maskSmallerThanT)
);
this.t = Vector.ConditionalSelect(
mask, // the bit mask that allows us to choose.
tm, // the smallest of the t's.
t); // if the bit mask is false (0), then we get our original t.
return mask;
}
Run Code Online (Sandbox Code Playgroud)
汇编代码
这些可以在pastebin上找到。请注意,基准工具中有一些样板组件。您需要查看函数调用。
标杆管理
我一直在用BenchmarkDotNet基准测试情况。
Circle / CirclePack的结果(已更新):
| Method | Mean | Error | StdDev |
|------------------- |---------:|----------:|----------:|
| Intersection | 9.710 ms | 0.0540 ms | 0.0505 ms |
| IntersectionPacked | 3.296 ms | 0.0055 ms | 0.0051 ms |
Run Code Online (Sandbox Code Playgroud)
BoundingBox / BoundingBoxPacked的结果:
| Method | Mean | Error | StdDev |
|------------------- |----------:|----------:|----------:|
| Intersection | 24.269 ms | 0.2663 ms | 0.2491 ms |
| IntersectionPacked | 1.152 ms | 0.0229 ms | 0.0264 ms |x
Run Code Online (Sandbox Code Playgroud)
由于采用了AVX,因此预计可以提高6到8倍的速度。边界框的加速非常明显,而圆的加速却很低。
在最上方再问一个问题:为什么一个加速是过高的,而另一种却是较低的?两者中的较低者(CirclePack)如何变得更快?
关于Peter Cordes的编辑(评论)
我不会尝试回答有关 SIMD 加速的问题,而是提供一些关于标量版本中的不良编码的详细评论,这些编码会延续到矢量版本中,以一种不适合 SO 评论的方式。
这段代码Intersect(Circle)简直是荒谬的:
// Step 3: compute the substitutions, check if there is a collision.
float a = dx * dx + dy * dy;
Run Code Online (Sandbox Code Playgroud)
平方和 ->a保证非负
float b = r + s;
float c = p * p + q * q - cr * cr;
float DSqrt = b * b - 4 * a * c;
Run Code Online (Sandbox Code Playgroud)
这不是 D 的平方根,而是 D 的平方。
// no collision possible! Commented out to make the benchmark more fair
//if (DSqrt < 0)
//{ return false; }
// Step 4: compute the substitutions.
float D = (float)Math.Sqrt(DSqrt);
Run Code Online (Sandbox Code Playgroud)
sqrt具有有限的域。避免对负输入的调用不仅可以节省平方根的平均成本,还可以防止您必须处理非常非常慢的 NaN。
而且,D是非负数,因为Math.Sqrt返回正分支或 NaN。
float t0 = (-b + D) / (2 * a);
float t1 = (-b - D) / (2 * a);
Run Code Online (Sandbox Code Playgroud)
这两者之间的区别是t0 - t1 = D / a。两个非负变量的比率也是非负的。因此t1永远不会更大。
float ti = Math.Min(t0, t1);
Run Code Online (Sandbox Code Playgroud)
此调用始终选择t1. t0更大的计算和测试是一种浪费。
if(ti > 0 && ti < t)
{
t = ti;
return true;
}
Run Code Online (Sandbox Code Playgroud)
回想一下ti总是t1,并且a是非负的,第一个测试相当于-b - D > 0或b < -D。
在 SIMD 版本中,Vector.SquareRoot文档没有描述输入为负时的行为。并且Vector.LessThan没有描述输入为 NaN 时的行为。