unordered_set (const char*) is much slower than unordered_set (string)

Question

Mat*_*ert 5 performance char unordered-set

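I am loading a very long list from disk into an unordered_set. With a set of strings it is very fast: a test list of about 7 MB loads in about 1 second. With a set of char pointers, however, the same load takes about 2.1 minutes!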

Here is the code for the string version:

unordered_set<string> Set;
string key;
while (getline(fin, key))
{
    Set.insert(key);
}

Here is the code for the char* version:

struct unordered_eqstr
{
    bool operator()(const char* s1, const char* s2) const
    {
        return strcmp(s1, s2) == 0;
    }
};

struct unordered_deref
{
    template <typename T>
    size_t operator()(const T* p) const
    {
        return hash<T>()(*p);
    }
};

unordered_set<const char*, unordered_deref, unordered_eqstr> Set;
string key;

while (getline(fin, key))
{
    char* str = new(mem) char[key.size()+1];
    strcpy(str, key.c_str());
    Set.insert(str);
}

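The "new(mem)" is there because I am using a custom memory manager, so I can allocate large blocks of memory and hand them out to small objects such as C strings. However, I have also tested with regular "new" and the results are the same. I have used my memory manager in other tools without problems as well.

The two structs are needed so that insertion and lookup hash and compare the actual C string rather than its address. I actually found unordered_deref here on Stack Overflow.

Ultimately I need to load files that are several gigabytes in size. That is why I am using a custom memory manager, but it is also why this terrible slowdown is unacceptable. Any ideas?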

Answer 1

Mat*_*ert 4

Here we go.

struct unordered_deref
{
    size_t operator()(const char* p) const
    {
        return hash<string>()(p);
    }
};
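Why this helps, as far as I can tell: in the question's version, T is deduced as char, so hash<T>()(*p) hashes only the first character of each string. Every key that starts with the same letter gets the same hash value, and the set ends up comparing long chains of entries with strcmp. The sketch below is only an illustration of that effect. The names first_char_hash, cstr_hash, cstr_equal, distinct_hashes and make_keys are made up for this example, and the sample keys are invented. It also shows a possible C++17 variant that hashes through std::string_view, which avoids building a temporary std::string on every hash call.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <set>
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

// The hash from the question with T deduced as char: only *p, the first
// character of the string, is ever hashed.
struct first_char_hash {
    std::size_t operator()(const char* p) const {
        return std::hash<char>()(*p);
    }
};

// Hypothetical allocation-free variant (assumes C++17): hash the whole
// C string through std::string_view instead of a temporary std::string.
struct cstr_hash {
    std::size_t operator()(const char* p) const {
        return std::hash<std::string_view>()(std::string_view(p));
    }
};

// Equality on string contents, same idea as unordered_eqstr in the question.
struct cstr_equal {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

// Count how many distinct hash values a hasher produces for the given keys.
template <typename Hash>
std::size_t distinct_hashes(const std::vector<std::string>& keys) {
    std::set<std::size_t> seen;
    Hash h;
    for (const std::string& k : keys) {
        seen.insert(h(k.c_str()));
    }
    return seen.size();
}

// Invented sample data: 1000 keys that all start with the same letter.
std::vector<std::string> make_keys() {
    std::vector<std::string> keys;
    for (int i = 0; i < 1000; ++i) {
        keys.push_back("key" + std::to_string(i));
    }
    return keys;
}
```

With these sample keys, distinct_hashes<first_char_hash> can only ever return 1, because every key starts with 'k'. The set itself would then be declared as unordered_set<const char*, cstr_hash, cstr_equal>. Whether the string_view variant is measurably faster than hash<string> on multi-gigabyte input is something I have not measured, so treat it as an option to benchmark rather than a guaranteed win.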
