二进制矩阵乘法比特错误的黑客

Lu4*_*Lu4 7 algorithm bit-manipulation pseudocode matrix matrix-multiplication

抽象

嗨,假设你有两个不同的独立64位二进制矩阵AT(它T是自身的转置版本,使用矩阵的转置版本允许在乘法运算T的行而不是列,这对二进制算术来说非常酷)和你想要将这些矩阵相乘,唯一的事情是矩阵乘法结果被截断为64位,如果你得到的值大于1某个特定矩阵单元格中的值,则生成的矩阵单元格将包含1否则0

   A        T
00000001 01111101 
01010100 01100101 
10010111 00010100 
10110000 00011000 <-- This matrix is transposed
11000100 00111110 
10000011 10101111 
11110101 11000100 
10100000 01100010 
Run Code Online (Sandbox Code Playgroud)

二元和传统乘法结果:

 Binary  Traditional
11000100  11000100
11111111  32212121
11111111  32213421
11111111  21112211
11101111  22101231
11001111  11001311
11111111  54213432
11001111  11001211
Run Code Online (Sandbox Code Playgroud)

如何以最有效的方式以上述方式将这些矩阵相乘?

PS

我试图利用二进制and(即&运算符)而不是在单独的位上执行乘法,在这种情况下,我必须为乘法准备数据:

ulong u;

u = T & 0xFF;
u = (u << 00) + (u << 08) + (u << 16) + (u << 24)
  + (u << 32) + (u << 40) + (u << 48) + (u << 56);
Run Code Online (Sandbox Code Playgroud)

现在通过执行二进制and超过两个整数A,u它将屈服于以下:

   A        u        R        C
00000001 01111101 00000001    1
01010100 01111101 01010100    3
10010111 01111101 00010101    3
10110000 01111101 00110000    2
11000100 01111101 01000100    2
10000011 01111101 00000001    1
11110101 01111101 01110101    5
10100000 01111101 00100000    1
Run Code Online (Sandbox Code Playgroud)

在上面的示例中,R包含A位与u位的乘法结果,并且为了获得最终值,我们必须sum在一行中的所有位.请注意,该列C包含的值等于Traditional上面结果矩阵乘法的第一列中找到的值.问题是,在这个步骤中我必须操作一个单独的位,我认为这是次优的方法,我通过http://graphics.stanford.edu/~seander/bithacks.html阅读寻找方法在并行但没有运气的情况下这样做,如果有人知道如何"压扁"并将R列中的值"合并" 到最终的64位矩阵中,我将不胜感激,如果你给我几行,

谢谢,

编辑

非常感谢David Eisenstat,最终算法将如下所示:

var A = ...;
var T = ...; // T == transpose(t), t is original matrix, algorithm works with transposed matrix

var D = 0x8040201008040201UL;

U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D);
Run Code Online (Sandbox Code Playgroud)

以下代码:

    public static void Main (string[] args){
        ulong U;
        var Random = new Xor128 ();

        var timer = DateTime.Now;

        var A = Random.As<IUniformRandom<UInt64>>().Evaluate();
        var T = Random.As<IUniformRandom<UInt64>>().Evaluate();

        var steps = 10000000;

        for (var i = 0; i < steps; i++) {
            ulong r = 0;

            var d = 0x8040201008040201UL;

            U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
            U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
            U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
            U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
            U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
            U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
            U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
            U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d);
        }

        Console.WriteLine (DateTime.Now - timer);


        var m1 = new Int32[8,8];
        var m2 = new Int32[8,8];
        var m3 = new Int32[8,8];

        for (int row = 0; row < 8; row++) {
            for (int col = 0; col < 8; col++) {
                m1 [row, col] = Random.As<IUniformRandom<Int32>> ().Evaluate(0, 1);
                m2 [row, col] = Random.As<IUniformRandom<Int32>> ().Evaluate(0, 1);
                m3 [row, col] = Random.As<IUniformRandom<Int32>> ().Evaluate(0, 1);
            }
        }

        timer = DateTime.Now;

        for (int i = 0; i < steps; i++) {
            for (int row = 0; row < 8; row++) {
                for (int col = 0; col < 8; col++) {
                    var sum = 0;

                    for (int temp = 0; temp < 8; temp++) {
                        sum += m1 [row, temp] * m2 [temp, row];
                    }

                    m3 [row, col] = sum;
                }
            }
        }

        Console.WriteLine (DateTime.Now - timer);
    }
Run Code Online (Sandbox Code Playgroud)

向我展示以下结果:

00:00:02.4035870
00:00:57.5147150
Run Code Online (Sandbox Code Playgroud)

感谢大家,Mac OS X/Mono的性能提升了23倍

Dav*_*tat 6

我不知道有效的,但这里的一些尝试.以下指令序列计算乘积A*T'的主对角线.将T和D旋转8位并重复7次迭代.

// uint64_t A, T;
uint64_t D = UINT64_C(0x8040201008040201);
uint64_t P = A & T;
// test whether each byte is nonzero
P |= P >> 1;
P |= P >> 2;
P |= P >> 4;
P &= UINT64_C(0x0101010101010101);
// fill each nonzero byte with ones
P *= 255;  // or P = (P << 8) - P;
// leave only the current diagonal
P &= D;
Run Code Online (Sandbox Code Playgroud)