Modern approach to making std::vector allocate aligned memory

Question

Pru*_*ica 13 c++ simd memory-alignment stdvector c++17

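The following question is related, but its answers are old, and a comment from user Marc Glisse suggests that C++17 introduced new approaches to this problem that may not have been adequately discussed.

I'm trying to get aligned memory working properly for SIMD, while still having access to all of the data.

On Intel, if I create a vector of type __m256 and reduce my size by a factor of 8, it gives me aligned memory.

E.g. std::vector<__m256> mvec_a((N*M)/8);

In a slightly hacky way, I can cast pointers to vector elements to float*, which lets me access individual float values.

Instead, I would prefer to have a std::vector<float> that is correctly aligned, so that it can be loaded into __m256 and other SIMD types without segfaulting.

I've been looking into aligned_alloc.

This can give me a C-style array that is correctly aligned: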

auto align_sz = static_cast<std::size_t> (32);
float* marr_a = (float*)aligned_alloc(align_sz, N*M*sizeof(float));

However, I'm unsure how to do this for std::vector<float>. Giving the std::vector<float> ownership of marr_a doesn't seem to be possible.
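For comparison, here is a minimal sketch of how a buffer from aligned_alloc can at least get an owner, even though that owner cannot be a std::vector. The names FreeDeleter, AlignedFloats, and make_aligned_floats are made up for illustration, and the 32-byte default is an assumption matching AVX. The result is a fixed-size array wrapper, not a resizable container.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Deleter that releases memory obtained from std::aligned_alloc.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Hypothetical alias: an owning pointer to an aligned float array.
using AlignedFloats = std::unique_ptr<float[], FreeDeleter>;

// Hypothetical helper: allocate `count` uninitialized floats on an
// `alignment`-byte boundary (alignment must be a power of two).
AlignedFloats make_aligned_floats(std::size_t count, std::size_t alignment = 32) {
    // aligned_alloc expects the size to be a multiple of the alignment,
    // so round the byte count up to the next multiple.
    std::size_t bytes = count * sizeof(float);
    bytes = (bytes + alignment - 1) / alignment * alignment;

    void* raw = std::aligned_alloc(alignment, bytes);
    if (raw == nullptr) {
        throw std::bad_alloc();
    }
    return AlignedFloats(static_cast<float*>(raw));
}
```

The memory is released with free() when the unique_ptr goes out of scope, but there is no push_back, no size(), and no way to hand the buffer over to a std::vector afterwards.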

I've seen some suggestions that I should write a custom allocator, but that seems like a lot of work. Perhaps modern C++ has a better way?

Answer 1

mxm*_*nkn 8

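STL containers take an allocator template argument which can be used to align their internal buffers. The specified allocator type has to implement at least allocate, deallocate, and value_type.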

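In contrast to these answers, this implementation of such an allocator avoids platform-dependent aligned malloc calls. Instead, it uses the C++17 aligned new operator.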
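In isolation, the aligned new operator mentioned above looks roughly like the following sketch. The helper names and the 32-byte constant are assumptions chosen for illustration; the only standard pieces are the std::align_val_t overloads of ::operator new[] and ::operator delete[].

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Assumed alignment for this sketch: 32 bytes, as wanted by AVX loads.
constexpr std::align_val_t kAvxAlignment{32};

// Hypothetical helper: raw, uninitialized storage for `count` floats.
float* allocate_avx_floats(std::size_t count) {
    return static_cast<float*>(
        ::operator new[](count * sizeof(float), kAvxAlignment));
}

// Must pass the same alignment that was used for the allocation.
void free_avx_floats(float* p) noexcept {
    ::operator delete[](p, kAvxAlignment);
}

// True if `p` sits on an `alignment`-byte boundary.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Because these overloads are part of the language since C++17, the same code should build on any conforming toolchain without posix_memalign or _aligned_malloc.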

Here is the full example on godbolt.

#include <cstddef>
#include <limits>
#include <new>

/**
 * Returns aligned pointers when allocations are requested. Default alignment
 * is 64B = 512b, sufficient for AVX-512 and most cache line sizes.
 *
 * @tparam ALIGNMENT_IN_BYTES Must be a positive power of 2.
 */
template<typename    ElementType,
         std::size_t ALIGNMENT_IN_BYTES = 64>
class AlignedAllocator
{
private:
    static_assert(
        ALIGNMENT_IN_BYTES >= alignof( ElementType ),
        "Beware that types like int have minimum alignment requirements "
        "or access will result in crashes."
    );

public:
    using value_type = ElementType;
    static std::align_val_t constexpr ALIGNMENT{ ALIGNMENT_IN_BYTES };

    /**
     * This is only necessary because AlignedAllocator has a second template
     * argument for the alignment that will make the default
     * std::allocator_traits implementation fail during compilation.
     * @see https://stackoverflow.com/a/48062758/2191065
     */
    template<class OtherElementType>
    struct rebind
    {
        using other = AlignedAllocator<OtherElementType, ALIGNMENT_IN_BYTES>;
    };

public:
    constexpr AlignedAllocator() noexcept = default;

    constexpr AlignedAllocator( const AlignedAllocator& ) noexcept = default;

    template<typename U>
    constexpr AlignedAllocator( AlignedAllocator<U, ALIGNMENT_IN_BYTES> const& ) noexcept
    {}

    [[nodiscard]] ElementType*
    allocate( std::size_t nElementsToAllocate )
    {
        if ( nElementsToAllocate
             > std::numeric_limits<std::size_t>::max() / sizeof( ElementType ) ) {
            throw std::bad_array_new_length();
        }

        auto const nBytesToAllocate = nElementsToAllocate * sizeof( ElementType );
        return reinterpret_cast<ElementType*>(
            ::operator new[]( nBytesToAllocate, ALIGNMENT ) );
    }

    void
    deallocate(                  ElementType* allocatedPointer,
                [[maybe_unused]] std::size_t  nElementsAllocated )
    {
        /* According to the C++20 draft n4868 § 17.6.3.3, the delete operator
         * must be called with the same alignment argument as the new expression.
         * The size argument can be omitted but if present must also be equal to
         * the one used in new. */
        ::operator delete[]( allocatedPointer, ALIGNMENT );
    }
};

This allocator can then be used like this:

#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <vector>

template<typename T, std::size_t ALIGNMENT_IN_BYTES = 64>
using AlignedVector = std::vector<T, AlignedAllocator<T, ALIGNMENT_IN_BYTES> >;

int
main()
{
    AlignedVector<int, 1024> buffer( 3333 );
    if ( reinterpret_cast<std::uintptr_t>( buffer.data() ) % 1024 != 0 ) {
        std::cerr << "Vector buffer is not aligned!\n";
        throw std::logic_error( "Faulty implementation!" );
    }

    std::cout << "Successfully allocated an aligned std::vector.\n";
    return 0;
}

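C++17 supports over-aligned dynamic allocation, so e.g. `std::vector<__m256i>` should just work. Is there a way to take advantage of that, instead of using ugly hacks that over-allocate and then leave part of the allocation unused? (2 upvotes)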
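The comment's point about over-aligned types can be tried without any intrinsics header. In this sketch, Float8 is a hypothetical stand-in for __m256 (eight floats, 32-byte alignment), and the vector uses the default std::allocator. It assumes a C++17 compiler and standard library where aligned new is enabled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for __m256: eight floats with 32-byte alignment.
struct alignas(32) Float8 {
    float lane[8];
};

// True if the vector's buffer honours alignof(Float8).
bool data_is_aligned(const std::vector<Float8>& v) {
    return reinterpret_cast<std::uintptr_t>(v.data()) % alignof(Float8) == 0;
}

// Access the i-th float overall by indexing block and lane separately.
float& element(std::vector<Float8>& v, std::size_t i) {
    return v[i / 8].lane[i % 8];
}
```

This keeps the alignment without a custom allocator, but the element type is still a block of eight floats, so it does not give the plain std::vector<float> interface the question asks for.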

Answer 2

Sam*_*hik 0

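All containers in the standard C++ library, including vectors, have an optional template parameter that specifies the container's allocator, and it is not really a lot of work to implement your own: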

class my_awesome_allocator {
    // Placeholder: value_type, allocate() and deallocate() go here.
};

std::vector<float, my_awesome_allocator> awesomely_allocated_vector;

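You will have to write a little code to implement your allocator, but it won't be much more than what you have already written. If you don't need pre-C++17 support, you only need to implement the allocate() and deallocate() methods (plus the value_type member type).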
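To give a sense of how little code that is, here is a minimal sketch of a C++17 allocator with a fixed 32-byte alignment. The name Aligned32Allocator and the hard-coded alignment are assumptions for illustration. Besides allocate() and deallocate(), it also provides value_type, a converting constructor, and equality operators, since the standard allocator requirements call for them; everything else is left to the std::allocator_traits defaults.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Hypothetical minimal allocator: every buffer starts on a 32-byte boundary.
template <typename T>
struct Aligned32Allocator {
    using value_type = T;
    static constexpr std::align_val_t alignment{32};
    static_assert(alignof(T) <= 32, "T needs more than 32-byte alignment");

    Aligned32Allocator() noexcept = default;

    // Converting constructor so containers can rebind to other element types.
    template <typename U>
    Aligned32Allocator(const Aligned32Allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
            throw std::bad_array_new_length();
        }
        return static_cast<T*>(::operator new(n * sizeof(T), alignment));
    }

    void deallocate(T* p, std::size_t) noexcept {
        // The alignment passed here must match the one used in allocate().
        ::operator delete(p, alignment);
    }
};

// The allocator is stateless, so all instances are interchangeable.
template <typename T, typename U>
bool operator==(const Aligned32Allocator<T>&, const Aligned32Allocator<U>&) noexcept {
    return true;
}

template <typename T, typename U>
bool operator!=(const Aligned32Allocator<T>&, const Aligned32Allocator<U>&) noexcept {
    return false;
}
```

With this in place, `std::vector<float, Aligned32Allocator<float>>` behaves like an ordinary vector of floats whose buffer remains 32-byte aligned after each reallocation. Because the alignment is not a template parameter, no explicit rebind member is needed here.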

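This might be a good place to provide a canonical answer with an example that people can copy/paste to jump through C++'s annoying hoops. (Bonus points if there's a way to let std::vector try to realloc in place instead of the usual braindead C++ always alloc+copy.) Also note, of course, that this `vector<float, MAA>` is not type-compatible with `vector<float>` (and can't be, because anything that does `.push_back` on a plain `std::vector<float>`, compiled without this allocator, could do a new allocation and copy into minimally-aligned memory. And new/delete isn't compatible with aligned_alloc/free.) (2 upvotes)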
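I don't think there is any guarantee that the pointer returned from the allocator is used directly as the base address of the `std::vector`'s array. For example, I could imagine an implementation of `std::vector` that keeps just one pointer to the allocated memory and stores end/capacity/allocator in memory before the range of values. That could easily break the alignment done by the allocator. (2 upvotes)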
Except that `std::vector` does guarantee it. That's what it uses the allocator for. Perhaps you should review what the C++ standard specifies here. (2 upvotes)
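The disagreement in the last two comments can at least be checked empirically on a given toolchain. The sketch below uses a hypothetical RecordingAllocator that remembers the most recent pointer it handed out, then compares it with data(). A passing check only shows what the standard library in use does; it is an observation, not a proof of what the standard requires.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical allocator that records the last pointer it returned.
template <typename T>
struct RecordingAllocator {
    using value_type = T;
    static inline void* last_allocation = nullptr;

    RecordingAllocator() noexcept = default;

    template <typename U>
    RecordingAllocator(const RecordingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        T* p = static_cast<T*>(::operator new(n * sizeof(T)));
        last_allocation = p;
        return p;
    }

    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const RecordingAllocator<T>&, const RecordingAllocator<U>&) noexcept {
    return true;
}

template <typename T, typename U>
bool operator!=(const RecordingAllocator<T>&, const RecordingAllocator<U>&) noexcept {
    return false;
}

// True if the vector's first element lives exactly where allocate() pointed.
bool data_matches_last_allocation(std::size_t count) {
    std::vector<int, RecordingAllocator<int>> v(count);
    return static_cast<void*>(v.data()) == RecordingAllocator<int>::last_allocation;
}
```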

Archived:	5 years, 9 months ago
Views:	729
Last activity:	5 years, 9 months ago