如何在小于 O(N*M) 的时间内连接两个列表？

Question

如何在小于 O(N*M) 的时间内连接两个列表？

假设我们有两个表（就像 SQL 表一样），其中一个表中的主键是另一个表中的外键。我应该编写一个简单的算法来模拟这两个表的连接。我考虑过迭代第一个表中主键列中的每个元素，并进行第二个循环，检查外键是否匹配，然后将其存储在外部数组或列表中。然而，这需要 O(N*M) 并且我需要找到更好的东西。教科书上有一个提示，它涉及到哈希，但是，我不确定这里如何实现哈希，或者如何让它变得更好？

编辑以添加示例：

Table Employees
|ID (Primary Key)| Name | Department|
|:---------------|:----:|----------:|
|       1        | Jack |    IT     |   
|       2        | Amy  |  Finance  |

Table Transactions
|Sold Product| Sold By(Foreign Key)| Sold To|
|:-----------|:-------------------:|-------:|
|    TV      |          1          |  Mary  |
|   Radio    |          1          |  Bob   |
|   Mobile   |          2          |  Lisa  |

Run Code Online (Sandbox Code Playgroud)

我想做的是编写一个使用散列连接这两个表的算法，而不是任何复杂的东西，只是一个关于如何做到这一点的简单想法。

Answer 1

Boh*_*ian 6

使用主表元素填充 HashMap，使用它们的键作为映射键，使用整个对象作为值。这仅需要对主表进行 1 次传递，并且添加到 HashMap 的每个操作都在恒定时间 O(1) 内发生。这类似于创建数据库索引。

\n

通过从映射中获取父表，以恒定时间 O(1) 迭代子表 \xe2\x80\x9cjoining\xe2\x80\x9d 到父行。

\n

总运行时间为每个 \xe2\x80\x9ctable\xe2\x80\x9d 1 次，因此 O(n + m)。

\n

代码可能类似于：

\n

class Parent {\n    int id;\n    // other fields, getters etc\n}\n\nclass Child {\n    int parentId;\n    // other fields, getters etc\n}\n\nvoid join(List<Parent> parents, List<Child> children) {\n    Map<Integer, Parent> parentMap = parents.stream()\n      .collect(toMap(Parent::getKey, p -> p)); // FYI toMap collects to a HashMap\n    for (Child child : children) {\n        Parent parent = parentMap.get(child.getParentId());\n        // not sure what \xe2\x80\x9cjoin\xe2\x80\x9d means, but you now have child and its parent\n    }\n}\n

Run Code Online (Sandbox Code Playgroud)\n

归档时间：	3 年，11 月前
查看次数：	472 次
最近记录：	3 年，11 月前