Posts by mat*_*tch

Effective D: best practices and design patterns

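A very interesting conference session was about D-specific design patterns, and some people in the D community think it could be the starting point for a book on effective coding techniques. Others think it is too early, because not many people have much experience yet and the author of such a book would bring some bias or personal taste to the notion of effectiveness. SO is a more interactive medium (with its limitations). So, while waiting for 'Effective D' to come out, it would be nice if we could share some (killer) advice, techniques, or patterns that make D code look better. I think it would be clearer if answers:

expose one unique technique
are essentially a piece of commented code
(if it is too big) are just a link to the code (a public gist...)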

d phobos

mat*_*tch

2013 06-16

12
votes

1
answer

1154
views

Fast transposition of an image and Sobel filter optimization in C (SIMD)

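I want to write a really (really) fast Sobel operator for a ray tracer that a friend and I wrote (the sources can be found here). Here is what I have figured out so far...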

First, assume the image is a grayscale picture stored row by row in an array of 8-bit unsigned integers.

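To write a real Sobel filter, I need to compute Gx and Gy for each pixel. Each of these numbers is computed from 6 pixels next to the origin. But SIMD instructions let me process 16 or even 32 (AVX) pixels at once. Fortunately, the kernel of the operator has some nice properties, so I can compute Gy by:

subtracting rows i and i + 2 and storing the result in row i + 1 of some other picture (array)
adding column i, twice column i + 1, and column i + 2 to get column i + 1 of the final picture

I would do the same (but transposed) to compute Gx, then add the two pictures.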

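Some notes:

I don't care about memory allocation, since everything will be allocated at the beginning.
I can deal with overflow and sign problems by dividing values by 4 (thanks to _mm_srli_epi8):
    (uint8_t >> 2 - uint8_t >> 2) = int7_t //really store as int8_t
    int7_t + uint8_t << 1 >> 2 + int7_t = uint8_t
    //some precision is lost but I don't care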

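The real problem I'm facing is going from rows to columns, since I can't otherwise load the picture into a SIMD register. I have to transpose the image at least three times, don't I?

Once for the original picture. Then I can compute the first step for Gx and Gy, and then transpose the resulting pictures to compute the second step.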

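So, here are my questions:

Is this kind of implementation a good idea?
Is there a way to transpose an array faster than the dumb algorithm? (I don't think so)
Where will the bottlenecks be? (Any guess? :P)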

c optimization sse simd

mat*_*tch

2013 08-14

8
votes

1
answer

3898
views

C intrinsics, SSE2 dot product and the assembly generated by gcc -O3

I need to write a dot product using SSE2 (without _mm_dp_ps or _mm_hadd_ps):

#include <xmmintrin.h>

inline __m128 sse_dot4(__m128 a, __m128 b)
{
    const __m128 mult = _mm_mul_ps(a, b);
    const __m128 shuf1 = _mm_shuffle_ps(mult, mult, _MM_SHUFFLE(0, 3, 2, 1));
    const __m128 shuf2 = _mm_shuffle_ps(mult, mult, _MM_SHUFFLE(1, 0, 3, 2));
    const __m128 shuf3 = _mm_shuffle_ps(mult, mult, _MM_SHUFFLE(2, 1, 0, 3));

    return _mm_add_ss(_mm_add_ss(_mm_add_ss(mult, shuf1), shuf2), shuf3);
}


But when I look at the assembly generated by gcc 4.9 (experimental) with -O3, I get:

    mulps   %xmm1, %xmm0
    movaps  %xmm0, %xmm3         //These lines
    movaps  %xmm0, %xmm2         //have no use
    movaps  %xmm0, %xmm1         //isn't it ?
    shufps  $57, %xmm0, %xmm3
    shufps  $78, %xmm0, %xmm2 …

c assembly sse

mat*_*tch

2016 01-29

4
votes

1
answer

3745
views

A "hack" to get float template parameters working compiles, but segfaults with both g++ and clang

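I know why I can't use a float as a template parameter, and how to set a static const float member of a template class through a numerator/denominator pair. But I was trying another "hack" based on reinterpret_cast to "emulate" float template parameters from their IEEE754 hexadecimal representation.

Here is the small piece of code: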

#include <iostream>
#include <cstdint>

template <uint32_t T>
struct MyStruct
{
    static const float value;
};

template <uint32_t T>
const float MyStruct<T>::value = *reinterpret_cast<float*>(T);

int main()
{
    typedef MyStruct<0x40490fdb> Test;
    std::cout << Test::value << std::endl;
    return 0;
}

I compile it...

g++ -Wall -pedantic main.cpp -std=c++0x -g

Without any warnings.

And it segfaults...

brugelca@artemis:~/workspace/draft$ ./a.out 
Segmentation fault (core dumped)

Here is the valgrind output:

brugelca@artemis:~/workspace/draft$ valgrind ./a.out
==10871== Memcheck, a memory error detector
==10871== Copyright (C) 2002-2012, and GNU GPL'd, by Julian …


c++ templates reinterpret-cast c++11

mat*_*tch

2017 05-23

4
votes

1
answer

446
views

Why is `std::this_thread::yield()` 10x slower than `std::this_thread::sleep_for(0s)`?

I just tested two small programs,

#include <thread>

int main()
{
    for (int i = 0; i < 10000000; i++)
    {
        std::this_thread::yield();
    }

    return 0;
}

and:

#include <thread>
#include <chrono>

int main()
{
    using namespace std::literals;

    for (int i = 0; i < 10000000; i++)
    {
        std::this_thread::sleep_for(0s);
    }

    return 0;
}

I get these respective timings on my system (Ubuntu 22.04 LTS, kernel version 5.19.0-43-generic),

./a.out  0,33s user 1,36s system 99% cpu 1,687 total

and:

./a.out  0,14s user 0,00s system 99% cpu 0,148 total

Why is std::this_thread::yield() 10x slower than std::this_thread::sleep_for(0s)?

N.B. g++ …

c++ linux multithreading scheduler

mat*_*tch

2023 06-16

4
votes

1
answer

138
views

How to get a move-only type out of an STL container?

Let's take a std::unordered_set of std::unique_ptr<T> as an example. Can I move an element of the set somewhere else?

#include <unordered_set>
#include <iostream>
#include <memory>
#include <vector>

int main()
{
    std::unordered_set<std::unique_ptr<int>> mySet;

    mySet.insert(std::make_unique<int>(1));
    mySet.insert(std::make_unique<int>(2));
    mySet.insert(std::make_unique<int>(3));

    std::vector<std::unique_ptr<int>> myVector;

    for (auto&& element : mySet)
    {
        std::cout << *element << std::endl;
        //myVector.push_back(element); won't compile as you can only get a const ref to the key
    }
}

I have a very practical code example where I would like to do this, but I am reduced to using a std::shared_ptr. Do you know of another (better?) alternative?

c++ memory containers move c++11

mat*_*tch

2016 10-02

3
votes

1
answer

320
views

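How to make a "factory function" return a non-copyable object?

Context

While trying to create some gzip archives with a different file name inside, I wrote the following snippet of code.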

#include <iostream>
#include <utility>

#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/filter/gzip.hpp>

boost::iostreams::filtering_ostream&& makeGZipStream(const std::string& archiveName,
                                                     const std::string& fileName)
{
    boost::iostreams::filtering_ostream theGzipStream;

    boost::iostreams::gzip_params theGzipParams;

    theGzipParams.file_name = fileName;

    theGzipStream.push(boost::iostreams::gzip_compressor{theGzipParams});

    theGzipStream.push(boost::iostreams::file_sink{archiveName});

    return std::move(theGzipStream);
}

int main()
{
    boost::iostreams::filtering_ostream&& theGzipStream = makeGZipStream("archive.gz", "file");

    theGzipStream << "This is a test..." << std::endl;

    return 0;
}
