How can I efficiently sort multiple columns represented as a byte vector?

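I have a collection called Dataframe that is essentially a std::vector<char>. It contains records of a fixed length, record_size, each of which can consist of one or more fields of different types. The definition of Dataframe is non-negotiable because it is tightly coupled to the rest of the code base. In general, the structure can hold a large number of records, on the order of millions, and a large number of columns, on the order of hundreds.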

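As described above, there may be a single column or several. The single-column case is what I call the simple case: the char vector can be converted to the actual type, and sorting is very efficient.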
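The simple case could be sketched roughly as follows. This is only an illustration, not the code from the question: the helper name sort_single_column is hypothetical, and it assumes every record is exactly one native-endian value of type T (record_size == sizeof(T)). It copies the bytes into a typed vector with std::memcpy rather than casting the pointer, which sidesteps alignment and aliasing concerns at the cost of one extra copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of the original Dataframe class).
// Assumes each record is a single native-endian value of type T.
template <typename T>
void sort_single_column(std::vector<char>& bytes) {
    assert(bytes.size() % sizeof(T) == 0);
    if (bytes.empty()) return;

    // Copy the raw bytes into a properly typed, properly aligned vector.
    std::vector<T> typed(bytes.size() / sizeof(T));
    std::memcpy(typed.data(), bytes.data(), bytes.size());

    // Sort contiguous values directly; no indirection is involved.
    std::sort(typed.begin(), typed.end());

    // Write the sorted values back into the byte buffer.
    std::memcpy(bytes.data(), typed.data(), bytes.size());
}
```

For a column of 32-bit signed integers this would be called as sort_single_column<std::int32_t>(buffer).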

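The problem I have is with multiple columns. There I need a vector of pointers to the records, a comparator function that jumps around in memory to access the actual data, and swaps of the pointers; once sorting has finished, the permutation identified by the pointers is applied to the real data. This is considerably slower, especially when the Dataframe contains real-world data.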
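A minimal sketch of the indirect approach just described, under simplifying assumptions: it sorts record indices rather than pointers, every key field is taken to be a native-endian 32-bit signed integer, and the names read_i32, sort_records and key_offsets are hypothetical and do not come from the original class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <numeric>
#include <vector>

// Reads a native-endian int32 key at a byte offset inside one record.
inline std::int32_t read_i32(const char* record, std::size_t offset) {
    std::int32_t value;
    std::memcpy(&value, record + offset, sizeof value);
    return value;
}

// Hypothetical helper: sorts fixed-size records lexicographically by the
// int32 fields found at key_offsets (assumed layout, for illustration only).
inline void sort_records(std::vector<char>& bytes, std::size_t record_size,
                         const std::vector<std::size_t>& key_offsets) {
    assert(record_size > 0 && bytes.size() % record_size == 0);
    const std::size_t n = bytes.size() / record_size;

    // Step 1: sort a permutation of record indices, not the records.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        const char* ra = bytes.data() + a * record_size;
        const char* rb = bytes.data() + b * record_size;
        for (std::size_t off : key_offsets) {
            const std::int32_t x = read_i32(ra, off);
            const std::int32_t y = read_i32(rb, off);
            if (x != y) return x < y;
        }
        return false;  // all keys equal
    });

    // Step 2: apply the permutation by gathering into a fresh buffer.
    std::vector<char> sorted(bytes.size());
    for (std::size_t i = 0; i < n; ++i) {
        std::memcpy(sorted.data() + i * record_size,
                    bytes.data() + order[i] * record_size, record_size);
    }
    bytes.swap(sorted);
}
```

Each comparison dereferences two records at arbitrary positions in the buffer, which is the source of the scattered memory access mentioned above. The final gather pass uses a second buffer of the same size; an in-place cycle-following permutation would avoid that at the cost of more bookkeeping.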

The code excerpt below illustrates the situation better. Is there any other way to approach this problem?

#include <iostream>
#include <vector>
#include <cstring>
#include <algorithm>
#include <numeric>
#include <iomanip>
#include <functional>

class Dataframe {

public:
    enum class Base : char
    {
        SIGNED = 'S',
        UNSIGNED = 'U',
        CHAR = 'A',
        // and other types like floats, date, timestamp, etc.
    };

    class Dtype
    {
    public:
        Dtype(Base base_dtype, std::size_t size) : m_base_dtype(base_dtype), m_size(size) {}
        auto base_dtype() const { return m_base_dtype; }
        auto …

c++ sorting algorithm performance

reu*_*man

2023-04-28

1
vote

1
answer

342
views
