Why can't a hash table/dictionary be used in the counting sort algorithm?

Question


Wil*_*itt 6 python sorting algorithm hash python-3.x

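When you use the counting sort algorithm, you create a list and use its indices as keys, storing the number of occurrences of each integer as the values. Why is this different from simply creating a dictionary with the integers as keys and the counts as values? For example: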

hash_table = collections.Counter(numList)


or

hash_table = {x:numList.count(x) for x in numList}


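Once the hash table is built, you essentially just copy each integer into another list as many times as it occurs. Hash tables/dictionaries have O(1) lookup time, so if you are only referencing key/value pairs, why would this not be preferable?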

I've included the counting sort algorithm below for reference:

def counting_sort(the_list, max_value):
    # List of 0's at indices 0...max_value
    num_counts = [0] * (max_value + 1)

    # Populate num_counts
    for item in the_list:
        num_counts[item] += 1

    # Populate the final sorted list
    sorted_list = []

    # For each item in num_counts
    for item, count in enumerate(num_counts):

        # For the number of times the item occurs
        for _ in range(count):

            # Add it to the sorted list
            sorted_list.append(item)

    return sorted_list


Answer 1

tem*_*def 9

You certainly can do something like this. The question is whether it's worth doing.

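Counting sort has a running time of O(n + U), where n is the number of elements in the array and U is the maximum value. Notice that as U gets larger, the running time of the algorithm starts to degrade noticeably. For example, if U > n and I add one more digit to U (say, changing it from 1,000,000 to 10,000,000), the running time can increase by a factor of ten. This means counting sort becomes impractical as U grows, so you typically run it only when U is fairly small. If you're going to run counting sort with a small value of U, then a hash table isn't necessarily worth the overhead. Hashing an item costs more CPU cycles than a plain array lookup, and for small arrays the potential memory savings may not be worth the extra time. If you use very large values of U, ...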
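To make the O(n + U) cost concrete, here is a minimal sketch that instruments the question's counting sort with a loop-iteration counter. The `counting_sort_steps` name and the `steps` counter are illustrative additions, not part of the original code.

```python
def counting_sort_steps(the_list, max_value):
    """Counting sort that also returns how many loop iterations it ran."""
    num_counts = [0] * (max_value + 1)
    steps = 0

    # One iteration per input element: n steps
    for item in the_list:
        num_counts[item] += 1
        steps += 1

    # One iteration per slot in num_counts: max_value + 1 steps
    sorted_list = []
    for item, count in enumerate(num_counts):
        steps += 1
        sorted_list.extend([item] * count)

    return sorted_list, steps


data = [3, 1, 2, 3, 0]
for max_value in (10, 100, 1000):
    _, steps = counting_sort_steps(data, max_value)
    print(max_value, steps)  # prints 10 16, then 100 106, then 1000 1006
```

With the input fixed at five elements, the step count is dominated by the scan over `num_counts`, so each tenfold increase in `max_value` gives roughly a tenfold increase in work.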

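Another issue is that the reassembly step of counting sort has excellent locality of reference: you just scan over the counts array and the input array in parallel, filling in values. If you use a hash table, you lose some of that locality, because the elements of a hash table aren't necessarily stored contiguously.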

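But these are implementation arguments more than anything else. Fundamentally, counting sort is less about "using an array" and more about "building a frequency histogram." It just so happens that a regular old array is usually preferable to a hash table when building that histogram.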
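For comparison, here is a rough sketch of the histogram idea built on `collections.Counter`. The `counting_sort_with_dict` name is hypothetical, and the `sorted(counts)` step is my own assumption about how to recover key order; the answer does not prescribe it. A dictionary does not iterate in numeric key order, so this version pays an extra O(k log k) to sort the k distinct keys. Iterating over `range(min, max + 1)` instead would keep the O(n + U) bound, but would also bring back the dependence on U.

```python
from collections import Counter


def counting_sort_with_dict(the_list):
    """Counting-sort variant that stores the histogram in a dictionary."""
    # Build the frequency histogram in O(n) expected time
    counts = Counter(the_list)

    sorted_list = []
    # Visit the distinct values in ascending order (O(k log k) for k keys)
    for item in sorted(counts):
        # Emit each value as many times as it occurred
        sorted_list.extend([item] * counts[item])

    return sorted_list


print(counting_sort_with_dict([5, -2, 5, 0, 1000000]))
# prints [-2, 0, 5, 5, 1000000]
```

One side effect of the dictionary is that negative values and a very large maximum cost no extra memory, which is the potential saving the answer weighs against the hashing overhead.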
