Efficient way to loop through the pixels of a 16-bit Mat in OpenCV

Question

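I'm trying to do a very simple (LUT-like) operation on a 16-bit grayscale OpenCV Mat, in a way that is efficient and does not slow down the debugger.

Although the documentation has a very detailed page devoted to exactly this problem, it does not point out that most of those methods only work on 8-bit images (including the perfect, optimized LUT function).

I tried the following methods: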

uchar* p = mat_depth.data;
for (unsigned int i = 0; i < depth_width * depth_height * sizeof(unsigned short); ++i)
{
    *p = ...;
    ++p;  // advance to the next byte (the dereference in *p++ was unused)
}

Very fast, but unfortunately it only supports uchar (just like LUT).

int i = 0;
for (int row = 0; row < depth_height; row++)
{
    for (int col = 0; col < depth_width; col++)
    {
        i = mat_depth.at<short>(row, col);
        i = ..
        mat_depth.at<short>(row, col) = i;
    }
}

Adapted from this answer: https://stackoverflow.com/a/27225293/518169. It did not work for me, and it was very slow.

cv::MatIterator_<ushort> it, end;
for (it = mat_depth.begin<ushort>(), end = mat_depth.end<ushort>(); it != end; ++it)
{
    *it = ...;
}

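This works well, but it uses a lot of CPU and makes the debugger extremely slow.

This answer (/sf/answers/1896978821/) points to the source code of the built-in LUT function, but it only mentions advanced optimization techniques such as IPP and OpenCL.

What I'm looking for is a very simple loop like the first snippet, but for ushorts.

Which method would you recommend for this problem? I'm not looking for extreme optimization, just something on par with the performance of the single loop over .data.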

Answer 1

hyp*_*not 5

I implemented the suggestions from Michael and Kornel and benchmarked them in both release and debug mode.

Code:

cv::Mat LUT_16(cv::Mat &mat, ushort table[])
{
    int limit = mat.rows * mat.cols;

    ushort* p = mat.ptr<ushort>(0);
    for (int i = 0; i < limit; ++i)
    {
        p[i] = table[p[i]];
    }
    return mat;
}

cv::Mat LUT_16_reinterpret_cast(cv::Mat &mat, ushort table[])
{
    int limit = mat.rows * mat.cols;

    ushort* ptr = reinterpret_cast<ushort*>(mat.data);
    for (int i = 0; i < limit; i++, ptr++)
    {
        *ptr = table[*ptr];
    }
    return mat;
}

cv::Mat LUT_16_if(cv::Mat &mat)
{
    int limit = mat.rows * mat.cols;

    ushort* ptr = reinterpret_cast<ushort*>(mat.data);
    for (int i = 0; i < limit; i++, ptr++)
    {
        if (*ptr == 0){
            *ptr = 65535;
        }
        else{
            *ptr *= 100;
        }
    }
    return mat;
}

ushort* tablegen_zero()
{
    static ushort table[65536];
    for (int i = 0; i < 65536; ++i)
    {
        if (i == 0)
        {
            table[i] = 65535;
        }
        else
        {
            table[i] = i;
        }
    }
    return table;
}
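
One caveat on the functions above: a flat loop over `mat.rows * mat.cols` elements starting at `mat.ptr<ushort>(0)` is only valid when the Mat is single-channel and stored without gaps between rows, which OpenCV reports through `mat.isContinuous()`. For a Mat that is not continuous (for example a region-of-interest view), each row would have to be fetched separately, e.g. with `mat.ptr<ushort>(row)`. The sketch below shows that row-wise pattern on a plain buffer so it runs without OpenCV; `lut16_strided` and its parameters are hypothetical names for illustration, not part of OpenCV or of the benchmarked code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an OpenCV function): applies a 65536-entry table
// to a 16-bit image stored row by row, where consecutive rows start `stride`
// elements apart. Elements between `width` and `stride` are padding and are
// left untouched.
void lut16_strided(std::uint16_t* data, std::size_t rows, std::size_t width,
                   std::size_t stride, const std::vector<std::uint16_t>& table)
{
    assert(table.size() == 65536);  // one entry per possible 16-bit value
    assert(stride >= width);
    for (std::size_t r = 0; r < rows; ++r)
    {
        std::uint16_t* row = data + r * stride;  // start of row r
        for (std::size_t c = 0; c < width; ++c)
        {
            row[c] = table[row[c]];
        }
    }
}
```

When `stride == width` this reduces to the single flat loop used in `LUT_16`.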

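Results (release / debug):

LUT_16: 0.202 ms / 0.773 ms
LUT_16_reinterpret_cast: 0.184 ms / 0.801 ms
LUT_16_if: 0.249 ms / 0.860 ms

So the conclusion is that the reinterpret_cast version is 9% faster in release mode, while the ptr<ushort> version (LUT_16) is 4% faster in debug mode.

Interestingly, calling the if function directly instead of applying a LUT only makes it 0.065 ms slower in release mode (0.249 ms versus 0.184 ms).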
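
Note that `tablegen_zero` as posted maps every non-zero value to itself, whereas `LUT_16_if` multiplies non-zero values by 100, so the two do not compute exactly the same result. As a rough sketch, a table that reproduces the branch could be generated by evaluating the per-pixel function once for every possible 16-bit input. The names `if_op` and `make_table16` are hypothetical and were not part of the benchmark.

```cpp
#include <cstdint>
#include <vector>

// The branch from LUT_16_if, written as a pure per-pixel function.
// The product wraps modulo 65536, like `*ptr *= 100` on a ushort.
std::uint16_t if_op(std::uint16_t v)
{
    return v == 0 ? static_cast<std::uint16_t>(65535)
                  : static_cast<std::uint16_t>(v * 100u);
}

// Hypothetical helper: evaluates a per-pixel function for every possible
// 16-bit input and stores the results in a lookup table.
template <typename F>
std::vector<std::uint16_t> make_table16(F f)
{
    std::vector<std::uint16_t> table(65536);
    for (std::uint32_t i = 0; i < 65536; ++i)
    {
        table[i] = f(static_cast<std::uint16_t>(i));
    }
    return table;
}
```

A table built with `make_table16(if_op)` could then be passed as `table.data()` to `LUT_16` or `LUT_16_reinterpret_cast`, which would make the LUT and the branch directly comparable.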

Specs: streaming 640x480 16-bit grayscale images, Visual Studio 2013, i7 4750HQ.
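
The timing code itself is not shown here. For anyone who wants to repeat the measurement, one possible way to time a per-frame operation is sketched below with `std::chrono`. This is an assumption about how such a benchmark could be set up, not the harness that produced the numbers above, and `mean_ms` is a hypothetical name.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical harness: runs `op(pointer, count)` on a fresh copy of `frame`
// `runs` times and returns the mean wall-clock time per run in milliseconds.
template <typename Op>
double mean_ms(const std::vector<std::uint16_t>& frame, int runs, Op op)
{
    if (frame.empty() || runs <= 0)
    {
        return 0.0;
    }
    using clock = std::chrono::steady_clock;
    std::chrono::duration<double, std::milli> total(0.0);
    volatile std::uint16_t sink = 0;  // keeps the result observable
    for (int i = 0; i < runs; ++i)
    {
        std::vector<std::uint16_t> copy = frame;  // the copy is not timed
        const auto start = clock::now();
        op(copy.data(), copy.size());
        total += clock::now() - start;
        sink = copy[0];
    }
    (void)sink;
    return total.count() / runs;
}
```

For a frame of the size mentioned above, `frame` would hold 640 * 480 elements, and `op` could be a lambda that wraps one of the loops being compared.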
