在实践中,unordered_map真的比地图快吗?

Ric*_*ges 14 c++ performance dictionary unordered-map

当然,unordered_map的查找性能平均是常量,并且映射的查找性能是O(logN).

但是当然为了在unordered_map中找到一个对象,我们必须:

  1. 哈希我们想要找到的密钥.
  2. equality_compare键与同一个桶中的每个键.

而在地图中,我们只需要将所搜索的密钥与log2(N)密钥进行比较,其中N是地图中的项目数.

我想知道真正的性能差异是什么,假设散列函数增加了开销并且equality_compare不比less_than比较便宜.

我写了一个测试,而不是用一个我可以自己回答的问题打扰社区.

我已经分享了下面的结果,以防其他人发现这个有趣或有用.

如果有人能够并且愿意添加更多信息,当然会邀请更多答案.

Ric*_*ges 18

在回答有关错过搜索次数的性能问题时,我重构了测试以对此进行参数化.

示例结果:

searches=1000000 set_size=      0 miss=    100% ordered=   4384 unordered=  12901 flat_map=    681
searches=1000000 set_size=     99 miss=  99.99% ordered=  89127 unordered=  42615 flat_map=  86091
searches=1000000 set_size=    172 miss=  99.98% ordered= 101283 unordered=  53468 flat_map=  96008
searches=1000000 set_size=    303 miss=  99.97% ordered= 112747 unordered=  53211 flat_map= 107343
searches=1000000 set_size=    396 miss=  99.96% ordered= 124179 unordered=  59655 flat_map= 112687
searches=1000000 set_size=    523 miss=  99.95% ordered= 132180 unordered=  51133 flat_map= 121669
searches=1000000 set_size=    599 miss=  99.94% ordered= 135850 unordered=  55078 flat_map= 121072
searches=1000000 set_size=    695 miss=  99.93% ordered= 140204 unordered=  60087 flat_map= 124961
searches=1000000 set_size=    795 miss=  99.92% ordered= 146071 unordered=  64790 flat_map= 127873
searches=1000000 set_size=    916 miss=  99.91% ordered= 154461 unordered=  50944 flat_map= 133194
searches=1000000 set_size=    988 miss=   99.9% ordered= 156327 unordered=  54094 flat_map= 134288
Run Code Online (Sandbox Code Playgroud)

键:

searches = number of searches performed against each map
set_size = how big each map is (and therefore how many of the searches will result in a hit)
miss = the probability of generating a missed search. Used for generating searches and set_size.
ordered = the time spent searching the ordered map
unordered = the time spent searching the unordered_map
flat_map = the time spent searching the flat map

note: time is measured in std::system_clock::duration ticks.
Run Code Online (Sandbox Code Playgroud)

TL; DR

结果:只要地图中有数据,unordered_map就会显示其优越性.唯一一次表现出比有序地图更差的表现是地图是空的.

这是新代码:

#include <iostream>
#include <iomanip>
#include <random>
#include <algorithm>
#include <string>
#include <vector>
#include <map>
#include <unordered_map>
#include <unordered_set>
#include <chrono>
#include <tuple>
#include <future>
#include <stdexcept>
#include <sstream>

using namespace std;

// this sets the length of the string we will be using as a key.
// modify this to test whether key complexity changes the performance ratios
// of the various maps
static const size_t key_length = 20;

// the number of keys we will generate (the size of the test)
const size_t nkeys = 1000000;



// use a virtual method to prevent the optimiser from detecting that
// our sink function actually does nothing. otherwise it might skew the test
struct string_user
{
    virtual void sink(const std::string&) = 0;
    virtual ~string_user() = default;
};

struct real_string_user : string_user
{
    virtual void sink(const std::string&) override
    {

    }
};

struct real_string_user_print : string_user
{
    virtual void sink(const std::string& s) override
    {
        cout << s << endl;
    }
};

// generate a sink from a string - this is a runtime operation and therefore
// prevents the optimiser from realising that the sink does nothing
std::unique_ptr<string_user> make_sink(const std::string& name)
{
    if (name == "print")
    {
        return make_unique<real_string_user_print>();
    }
    if (name == "noprint")
    {
        return make_unique<real_string_user>();
    }
    throw logic_error(name);
}

// generate a random key, given a random engine and a distribution
auto gen_string = [](auto& engine, auto& dist)
{
    std::string result(key_length, ' ');
    generate(begin(result), end(result), [&] {
        return dist(engine);
    });
    return result;
};

// comparison predicate for our flat map.
struct pair_less
{
    bool operator()(const pair<string, string>& l, const string& r) const {
        return l.first < r;
    }

    bool operator()(const string& l, const pair<string, string>& r) const {
        return l < r.first;
    }
};

template<class F>
auto time_test(F&& f, const vector<string> keys)
{
    auto start_time = chrono::system_clock::now();

    for (auto const& key : keys)
    {
        f(key);
    }

    auto stop_time = chrono::system_clock::now();
    auto diff =  stop_time - start_time;
    return diff;
}

struct report_key
{
    size_t nkeys;
    int miss_chance;
};

std::ostream& operator<<(std::ostream& os, const report_key& key)
{
    return os << "miss=" << setw(2) << key.miss_chance << "%";
}

void run_test(string_user& sink, size_t nkeys, double miss_prob)
{
    // the types of map we will test
    unordered_map<string, string> unordered;
    map<string, string> ordered;
    vector<pair<string, string>> flat_map;

    // a vector of all keys, which we can shuffle in order to randomise
    // access order of all our maps consistently
    vector<string> keys;
    unordered_set<string> keys_record;

    // generate keys
    auto eng = std::default_random_engine(std::random_device()());
    auto alpha_dist = std::uniform_int_distribution<char>('A', 'Z');
    auto prob_dist = std::uniform_real_distribution<double>(0, 1.0 - std::numeric_limits<double>::epsilon());

    auto generate_new_key = [&] {
        while(true) {
            // generate a key
            auto key = gen_string(eng, alpha_dist);
            // try to store it in the unordered map
            // if it already exists, force a regeneration
            // otherwise also store it in the ordered map and the flat map
            if(keys_record.insert(key).second) {
                return key;
            }
        }
    };

    for (size_t i = 0 ; i < nkeys ; ++i)
    {
        bool inserted = false;
        auto value = to_string(i);

        auto key = generate_new_key();
        if (prob_dist(eng) >= miss_prob) {
            unordered.emplace(key, value);
            flat_map.emplace_back(key, value);
            ordered.emplace(key, std::move(value));
        }
        // record the key for later use
        keys.emplace_back(std::move(key));
    }
    // turn our vector 'flat map' into an actual flat map by sorting it by pair.first. This is the key.
    sort(begin(flat_map), end(flat_map),
         [](const auto& l, const auto& r) { return l.first < r.first; });

    // shuffle the keys to randomise access order
    shuffle(begin(keys), end(keys), eng);

    auto unordered_lookup = [&](auto& key) {
        auto i = unordered.find(key);
        if (i != end(unordered)) {
            sink.sink(i->second);
        }
    };

    auto ordered_lookup = [&](auto& key) {
        auto i = ordered.find(key);
        if (i != end(ordered)) {
            sink.sink(i->second);
        }
    };

    auto flat_map_lookup = [&](auto& key) {
        auto i = lower_bound(begin(flat_map),
                             end(flat_map),
                             key,
                             pair_less());
        if (i != end(flat_map) && i->first == key) {
            sink.sink(i->second);
        }
    };

    // spawn a thread to time access to the unordered map
    auto unordered_future = async(launch::async,
                                  [&]()
                                  {
                                      return time_test(unordered_lookup, keys);
                                  });

    // spawn a thread to time access to the ordered map
    auto ordered_future = async(launch::async, [&]
                                {
                                    return time_test(ordered_lookup, keys);
                                });

    // spawn a thread to time access to the flat map
    auto flat_future = async(launch::async, [&]
                             {
                                 return time_test(flat_map_lookup, keys);
                             });

    // synchronise all the threads and get the timings
    auto ordered_time = ordered_future.get();
    auto unordered_time = unordered_future.get();
    auto flat_time = flat_future.get();

    cout << "searches=" << setw(7) << nkeys;
    cout << " set_size=" << setw(7) << unordered.size();
    cout << " miss=" << setw(7) << setprecision(6) << miss_prob * 100.0 << "%";
    cout << " ordered=" << setw(7) << ordered_time.count();
    cout << " unordered=" << setw(7) << unordered_time.count();
    cout << " flat_map=" << setw(7) << flat_time.count() << endl;

}

int main()
{
    // generate the sink, preventing the optimiser from realising what it
    // does.
    stringstream ss;
    ss << "noprint";
    string arg;
    ss >> arg;
    auto puser = make_sink(arg);

    for (double chance = 1.0 ; chance >= 0.0 ; chance -= 0.0001)
    {
        run_test(*puser, 1000000, chance);
    }


    return 0;
}
Run Code Online (Sandbox Code Playgroud)

  • 您的基准测试计划对我非常有帮助。我确实调整了它来检查映射和带有 int 键的无序映射之间的性能差异。这里是对 100 个元素容器进行 1,000,000 次查找而没有遗漏的时间。我使用 for 循环来查找 i%100 。这是我得到的结果:ordered=259130usec unordered=125470usec。iow,100 个整数的 unordered_map 大约比 map 快 2 倍!这已经用 c++20 模式编译的 gcc 11.2 进行了测试 (2认同)

Ric*_*ges 5

在下面的测试中,我使用-O3在Apple clang上进行了编译,我采取了一些措施来确保测试是公平的,例如:

  1. 通过vtable调用每次接收结果的接收器函数,以防止优化程序内联整个搜索!

  2. 在3种不同的包含相同数据的地图上,以相同的顺序并行运行测试。这意味着,如果一个测试开始“领先”,它将开始为搜索集输入缓存缺失区域(请参见代码)。这意味着没有人会遇到遇到“热”缓存的不公平优势。

  3. 参数化密钥大小(以及复杂度)

  4. 参数化地图大小

  5. 测试了三种不同类型的地图(包含相同的数据)-unordered_map,地图和键/值对的排序向量。

  6. 检查汇编器输出,以确保优化器由于死代码分析而无法优化掉整个逻辑块。

这是代码:

#include <iostream>
#include <random>
#include <algorithm>
#include <string>
#include <vector>
#include <map>
#include <unordered_map>
#include <chrono>
#include <tuple>
#include <future>
#include <stdexcept>
#include <sstream>

using namespace std;

// this sets the length of the string we will be using as a key.
// modify this to test whether key complexity changes the performance ratios
// of the various maps
static const size_t key_length = 20;

// the number of keys we will generate (the size of the test)
const size_t nkeys = 1000000;


// the types of map we will test
unordered_map<string, string> unordered;
map<string, string> ordered;
vector<pair<string, string>> flat_map;

// a vector of all keys, which we can shuffle in order to randomise
// access order of all our maps consistently
vector<string> keys;

// use a virtual method to prevent the optimiser from detecting that
// our sink function actually does nothing. otherwise it might skew the test
struct string_user
{
    virtual void sink(const std::string&) = 0;
    virtual ~string_user() = default;
};

struct real_string_user : string_user
{
    virtual void sink(const std::string&) override
    {

    }
};

struct real_string_user_print : string_user
{
    virtual void sink(const std::string& s) override
    {
        cout << s << endl;
    }
};

// generate a sink from a string - this is a runtime operation and therefore
// prevents the optimiser from realising that the sink does nothing
std::unique_ptr<string_user> make_sink(const std::string& name)
{
    if (name == "print")
    {
        return make_unique<real_string_user_print>();
    }
    if (name == "noprint")
    {
        return make_unique<real_string_user>();
    }
    throw logic_error(name);
}

// generate a random key, given a random engine and a distribution
auto gen_string = [](auto& engine, auto& dist)
{
    std::string result(key_length, ' ');
    generate(begin(result), end(result), [&] {
        return dist(engine);
    });
    return result;
};

// comparison predicate for our flat map.
struct pair_less
{
    bool operator()(const pair<string, string>& l, const string& r) const {
        return l.first < r;
    }

    bool operator()(const string& l, const pair<string, string>& r) const {
        return l < r.first;
    }
};

int main()
{
    // generate the sink, preventing the optimiser from realising what it
    // does.
    stringstream ss;
    ss << "noprint";
    string arg;
    ss >> arg;
    auto puser = make_sink(arg);

    // generate keys
    auto eng = std::default_random_engine(std::random_device()());
    auto alpha_dist = std::uniform_int_distribution<char>('A', 'Z');

    for (size_t i = 0 ; i < nkeys ; ++i)
    {
        bool inserted = false;
        auto value = to_string(i);
        while(!inserted) {
            // generate a key
            auto key = gen_string(eng, alpha_dist);
            // try to store it in the unordered map
            // if it already exists, force a regeneration
            // otherwise also store it in the ordered map and the flat map
            tie(ignore, inserted) = unordered.emplace(key, value);
            if (inserted) {
                flat_map.emplace_back(key, value);
                ordered.emplace(key, std::move(value));
                // record the key for later use
                keys.emplace_back(std::move(key));
            }
        }
    }
    // turn our vector 'flat map' into an actual flat map by sorting it by pair.first. This is the key.
    sort(begin(flat_map), end(flat_map),
         [](const auto& l, const auto& r) { return l.first < r.first; });

    // shuffle the keys to randomise access order
    shuffle(begin(keys), end(keys), eng);

    // spawn a thread to time access to the unordered map
    auto unordered_future = async(launch::async, [&]()
                                  {
                                      auto start_time = chrono::system_clock::now();

                                      for (auto const& key : keys)
                                      {
                                          puser->sink(unordered.at(key));
                                      }

                                      auto stop_time = chrono::system_clock::now();
                                      auto diff =  stop_time - start_time;
                                      return diff;
                                  });

    // spawn a thread to time access to the ordered map
    auto ordered_future = async(launch::async, [&]
                                {

                                    auto start_time = chrono::system_clock::now();

                                    for (auto const& key : keys)
                                    {
                                        puser->sink(ordered.at(key));
                                    }

                                    auto stop_time = chrono::system_clock::now();
                                    auto diff =  stop_time - start_time;
                                    return diff;
                                });

    // spawn a thread to time access to the flat map
    auto flat_future = async(launch::async, [&]
                                {

                                    auto start_time = chrono::system_clock::now();

                                    for (auto const& key : keys)
                                    {
                                        auto i = lower_bound(begin(flat_map),
                                                               end(flat_map),
                                                               key,
                                                               pair_less());
                                        if (i != end(flat_map) && i->first == key)
                                            puser->sink(i->second);
                                        else
                                            throw invalid_argument(key);
                                    }

                                    auto stop_time = chrono::system_clock::now();
                                    auto diff =  stop_time - start_time;
                                    return diff;
                                });

    // synchronise all the threads and get the timings
    auto ordered_time = ordered_future.get();
    auto unordered_time = unordered_future.get();
    auto flat_time = flat_future.get();

    // print
    cout << "  ordered time: " << ordered_time.count() << endl;
    cout << "unordered time: " << unordered_time.count() << endl;
    cout << " flat map time: " << flat_time.count() << endl;

    return 0;
}
Run Code Online (Sandbox Code Playgroud)

结果:

  ordered time: 972711
unordered time: 335821
 flat map time: 559768
Run Code Online (Sandbox Code Playgroud)

如您所见,unordered_map令人信服地击败了地图和排序对向量。对向量的速度是映射解的两倍。这很有趣,因为lower_bound和map :: at具有几乎相等的复杂性。

TL; DR

在此测试中,无序图的速度(查找)大约是有序图的3倍,排序后的向量令人信服地击败了图。

实际上,我对它的速度感到震惊。

  • 看起来您的“平面地图”测试实际上在搜索排序后的向量和有序地图。所以我感到奇怪的是它的时机相同。实际上-这可能与同时运行测试有关。如果不同时运行测试以消除争用作为一个因素,我个人会感觉更好。此外,平面地图测试不应对“有序”对象进行任何操作(除非我误解了)。 (3认同)
  • @Richard re:“测试是如此之快,以至于无法衡量”检查复杂性顺序的全部意义在于了解不同 N 值的性能影响。特别是,较低复杂性的好处是无论有多大可能是恒定的开销,会有一个阈值 N,从该阈值开始,较低的复杂度开始表现得更好。_为了用较低的 N 值进行测试,您需要多次重复测试以获得可测量的结果_。 (2认同)
  • 我刚刚在 GCC 和 MSVC 上运行了基准测试。地图中约有 20 个项目,使用 GCC 时无序的速度是原来的 2 倍,但使用 MSVC 时速度已经接近 3 倍。MSVC 确实抱怨了 `std::uniform_int_distribution&lt;char&gt;` 但它可以与 `&lt;int&gt;` 一起使用。 (2认同)