How to convert a byte array of image pixel data to grayscale using vector SSE operations

Question

Cof*_*ght 7 c# sse image-processing simd vectorization

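I'm having trouble converting image data stored in a byte[] array to grayscale. I want to use vector SIMD operations, because later I will need to write ASM and C++ DLL versions to measure execution time.

When I read about SIMD, I found that SSE instructions operate on 128-bit registers. That is the problem: I need to convert my byte[] array into several Vector<T> values stored in a List<T>.

The image is a four-channel RGBA JPEG, so I also need to know how to build vectors of the R, G, B data from a single 128-bit Vector<T>. After that, I can apply the grayscale formula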

fY(R, G, B) = R x 0.29891 + G x 0.58661 + B x 0.11448

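To summarize, the questions are:

How to load chunks of a byte[] array into 128-bit registers as Vector<T>.
How to separate the R, G, B values within a Vector<T> so they can be multiplied and copied back into the source vector.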
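
As a baseline for checking any SIMD version, the formula above can be written as a plain scalar loop. This is only a sketch: the names `GrayscaleScalar`, `ToGray` and `ConvertBuffer` are hypothetical, and it assumes tightly packed RGBA bytes with red first and no line padding.

```csharp
using System;

static class GrayscaleScalar
{
    // Weighted sum from the question, rounded to the nearest integer.
    public static byte ToGray(byte r, byte g, byte b)
    {
        double y = r * 0.29891 + g * 0.58661 + b * 0.11448;
        return (byte)Math.Round(y);
    }

    // Converts as many whole RGBA pixels as fit in both buffers.
    // Assumption: 4 bytes per pixel in R, G, B, A order; alpha is ignored.
    public static void ConvertBuffer(ReadOnlySpan<byte> rgba, Span<byte> gray)
    {
        int count = Math.Min(rgba.Length / 4, gray.Length);
        for (int i = 0; i < count; i++)
        {
            int p = i * 4;
            gray[i] = ToGray(rgba[p], rgba[p + 1], rgba[p + 2]);
        }
    }
}
```

A loop like this is also a natural way to handle the last few pixels that do not fill a complete SIMD block.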

Answer 1

Soo*_*nts 9

It requires System.Runtime.Intrinsics.Experimental.dll and it's unsafe, but it is relatively simple, and probably fast enough for many practical applications.

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

/// <summary>Load 4 RGBA pixels (16 bytes)</summary>
static unsafe Vector128<int> load4( byte* src )
{
    return Sse2.LoadVector128( (int*)src );
}

/// <summary>Pack red channel of 8 pixels into ushort values in [ 0 .. 0xFF00 ] interval</summary>
static Vector128<ushort> packRed( Vector128<int> a, Vector128<int> b )
{
    // Red is the lowest byte of each little-endian RGBA pixel
    Vector128<int> mask = Vector128.Create( 0xFF );
    a = Sse2.And( a, mask );
    b = Sse2.And( b, mask );
    // Pack to 0x00RR lanes, then shift the whole register left by one byte to get 0xRR00
    return Sse2.ShiftLeftLogical128BitLane( Sse41.PackUnsignedSaturate( a, b ), 1 );
}

/// <summary>Pack green channel of 8 pixels into ushort values in [ 0 .. 0xFF00 ] interval</summary>
static Vector128<ushort> packGreen( Vector128<int> a, Vector128<int> b )
{
    Vector128<int> mask = Vector128.Create( 0xFF00 );
    a = Sse2.And( a, mask );
    b = Sse2.And( b, mask );
    return Sse41.PackUnsignedSaturate( a, b );
}

/// <summary>Pack blue channel of 8 pixels into ushort values in [ 0 .. 0xFF00 ] interval</summary>
static Vector128<ushort> packBlue( Vector128<int> a, Vector128<int> b )
{
    // Shift the whole register right by one byte so blue lands in bits 8..15 of each lane
    a = Sse2.ShiftRightLogical128BitLane( a, 1 );
    b = Sse2.ShiftRightLogical128BitLane( b, 1 );
    Vector128<int> mask = Vector128.Create( 0xFF00 );
    a = Sse2.And( a, mask );
    b = Sse2.And( b, mask );
    return Sse41.PackUnsignedSaturate( a, b );
}

/// <summary>Load 8 pixels, split into RGB channels.</summary>
static unsafe void loadRgb( byte* src, out Vector128<ushort> red, out Vector128<ushort> green, out Vector128<ushort> blue )
{
    var a = load4( src );
    var b = load4( src + 16 );
    red = packRed( a, b );
    green = packGreen( a, b );
    blue = packBlue( a, b );
}

const ushort mulRed = (ushort)( 0.29891 * 0x10000 );
const ushort mulGreen = (ushort)( 0.58661 * 0x10000 );
const ushort mulBlue = (ushort)( 0.11448 * 0x10000 );

/// <summary>Compute brightness of 8 pixels</summary>
static Vector128<short> brightness( Vector128<ushort> r, Vector128<ushort> g, Vector128<ushort> b )
{
    // Inputs are 0xNN00 and the weights are scaled by 0x10000, so the high 16 bits
    // of each product hold channel * weight in 8.8 fixed point
    r = Sse2.MultiplyHigh( r, Vector128.Create( mulRed ) );
    g = Sse2.MultiplyHigh( g, Vector128.Create( mulGreen ) );
    b = Sse2.MultiplyHigh( b, Vector128.Create( mulBlue ) );
    var result = Sse2.AddSaturate( Sse2.AddSaturate( r, g ), b );
    // Drop the 8 fractional bits; reinterpret as short for Sse2.PackUnsignedSaturate
    return Vector128.AsInt16( Sse2.ShiftRightLogical( result, 8 ) );
}

/// <summary>Convert buffer from RGBA to grayscale.</summary>
/// <remarks>
/// <para>If your image has line paddings, you'll want to call this once per line, not for the complete image.</para>
/// <para>If width of the image is not multiple of 16 pixels, you'll need to do more work to handle the last few pixels of every line.</para>
/// </remarks>
static unsafe void convertToGrayscale( byte* src, byte* dst, int count )
{
    byte* srcEnd = src + count * 4;
    while( src < srcEnd )
    {
        loadRgb( src, out var r, out var g, out var b );
        var low = brightness( r, g, b );
        loadRgb( src + 32, out r, out g, out b );
        var hi = brightness( r, g, b );

        var bytes = Sse2.PackUnsignedSaturate( low, hi );
        Sse2.Store( dst, bytes );

        src += 64;
        dst += 16;
    }
}

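
To make the fixed-point arithmetic in brightness easier to follow, here is a scalar emulation of what happens in a single 16-bit lane. It is a sketch, not part of the answer's code: `FixedPointLane`, `MulHigh`, `AddSat` and `Brightness` are hypothetical names, and the helpers only mimic the per-lane behavior of Sse2.MultiplyHigh and Sse2.AddSaturate on ushort values.

```csharp
using System;

static class FixedPointLane
{
    // Same weights as the answer, scaled by 0x10000
    public const ushort MulRed = (ushort)(0.29891 * 0x10000);
    public const ushort MulGreen = (ushort)(0.58661 * 0x10000);
    public const ushort MulBlue = (ushort)(0.11448 * 0x10000);

    // High 16 bits of an unsigned 16x16-bit product
    static ushort MulHigh(ushort a, ushort b) => (ushort)(((uint)a * b) >> 16);

    // Unsigned addition that clamps at 0xFFFF instead of wrapping
    static ushort AddSat(ushort a, ushort b) => (ushort)Math.Min((uint)a + b, 0xFFFFu);

    public static byte Brightness(byte r, byte g, byte b)
    {
        // The pack* helpers deliver each channel in the high byte: 0xNN00
        ushort r16 = (ushort)(r << 8);
        ushort g16 = (ushort)(g << 8);
        ushort b16 = (ushort)(b << 8);

        // Each product is channel * weight in 8.8 fixed point
        ushort sum = AddSat(AddSat(MulHigh(r16, MulRed), MulHigh(g16, MulGreen)),
                            MulHigh(b16, MulBlue));

        // Drop the 8 fractional bits
        return (byte)(sum >> 8);
    }
}
```

Because every step truncates instead of rounding, this scheme maps pure white (255, 255, 255) to 254 rather than 255. If that matters for your use case, compare the output against a floating-point reference.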

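However, an equivalent C++ implementation would be faster. C# does a decent job of inlining these functions, i.e. convertToGrayscale contains no function calls. But the code of that function is far from optimal. .NET failed to propagate the constants; inside the loop it emitted code like this: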

mov         r8d,962Ch
vmovd       xmm1,r8d
vpbroadcastw xmm1,xmm1


The generated code uses only 6 of the 16 registers. There are enough free registers to hold all the magic numbers involved.
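
One workaround worth trying is to create the weight vectors once, before the loop, and reuse them in every iteration. The sketch below shows only the brightness step and uses hypothetical names (`HoistedBrightness`, `Compute`). It assumes the channels are already split into separate ushort buffers in the 0xNN00 layout, and it falls back to scalar code where SSE2 is unavailable. Whether the JIT then keeps the weights in registers is not guaranteed and should be checked in the disassembly for your runtime version.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class HoistedBrightness
{
    const ushort MulRed = (ushort)(0.29891 * 0x10000);
    const ushort MulGreen = (ushort)(0.58661 * 0x10000);
    const ushort MulBlue = (ushort)(0.11448 * 0x10000);

    // r, g, b: one value per pixel, pre-shifted to 0xNN00 (assumption).
    // dst receives one brightness value per pixel.
    public static void Compute(ReadOnlySpan<ushort> r, ReadOnlySpan<ushort> g,
                               ReadOnlySpan<ushort> b, Span<ushort> dst)
    {
        if (r.Length != dst.Length || g.Length != dst.Length || b.Length != dst.Length)
            throw new ArgumentException("All spans must have the same length.");

        int done = 0;
        if (Sse2.IsSupported)
        {
            // Broadcast the weights once, outside the loop
            Vector128<ushort> wr = Vector128.Create(MulRed);
            Vector128<ushort> wg = Vector128.Create(MulGreen);
            Vector128<ushort> wb = Vector128.Create(MulBlue);

            // View the buffers as groups of 8 ushort lanes
            ReadOnlySpan<Vector128<ushort>> vr = MemoryMarshal.Cast<ushort, Vector128<ushort>>(r);
            ReadOnlySpan<Vector128<ushort>> vg = MemoryMarshal.Cast<ushort, Vector128<ushort>>(g);
            ReadOnlySpan<Vector128<ushort>> vb = MemoryMarshal.Cast<ushort, Vector128<ushort>>(b);
            Span<Vector128<ushort>> vd = MemoryMarshal.Cast<ushort, Vector128<ushort>>(dst);

            for (int k = 0; k < vd.Length; k++)
            {
                Vector128<ushort> sum = Sse2.AddSaturate(
                    Sse2.AddSaturate(Sse2.MultiplyHigh(vr[k], wr), Sse2.MultiplyHigh(vg[k], wg)),
                    Sse2.MultiplyHigh(vb[k], wb));
                vd[k] = Sse2.ShiftRightLogical(sum, 8);
            }
            done = vd.Length * 8;
        }

        // Scalar path for the remaining elements (or for everything without SSE2)
        for (int i = done; i < dst.Length; i++)
        {
            uint sum = (((uint)r[i] * MulRed) >> 16)
                     + (((uint)g[i] * MulGreen) >> 16)
                     + (((uint)b[i] * MulBlue) >> 16);
            dst[i] = (ushort)(Math.Min(sum, 0xFFFFu) >> 8);
        }
    }
}
```

The same idea applies to the masks in the pack helpers: they could be passed in as parameters instead of being created on every call.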

In addition, .NET emits many redundant instructions that merely shuffle data around:

vmovaps xmm2, xmm0
vmovaps xmm3, xmm1

