How to filter two huge lists containing millions of items with the same IDs

Question

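These are my two lists, each containing more than a million items. Both contain the same items with the same IDs. The ID is a String. I only need the items whose IDs differ. This is how I did it, but I am sure there must be a better and more durable solution: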

    List<Transaction> differentList = new ArrayList<>();

    for(Transaction tx : foundTransactions ){
        for(Transaction aTx : ArchivedTransactions) 
        {
            if(!tx.getId().equalsIgnoreCase(aTx.getId()) ){
                differentList .add(tx);
            }
        }
    }

I tried using streams, but I could not get it to work. I think the Stream API should handle this better. Please suggest any improvements.

Answer 1

dre*_*ash 5

You can try collecting the archived IDs into a HashSet first, for example:

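    // Lower-case the archived IDs once so that lookups are case-insensitive.
    Set<String> collect = ArchivedTransactions.stream()
                                              .map(i -> i.getId().toLowerCase())
                                              .collect(Collectors.toSet());

    for (Transaction tx : foundTransactions)
        // The ID being looked up must be lower-cased as well, or mixed-case IDs never match.
        if (!collect.contains(tx.getId().toLowerCase()))
            differentList.add(tx);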

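In current OpenJDK implementations, Collectors.toSet() returns a HashSet, although its documentation makes no guarantee about the type of the returned Set (use Collectors.toCollection(HashSet::new) if you need that guarantee). You can simplify the code to: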

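    // Lower-case the archived IDs once so that lookups are case-insensitive.
    Set<String> collect = ArchivedTransactions.stream()
                                              .map(i -> i.getId().toLowerCase())
                                              .collect(Collectors.toSet());

    // Keep only the transactions whose lower-cased ID is not in the archived set.
    List<Transaction> differentList = foundTransactions.stream()
                                                       .filter(tx -> !collect.contains(tx.getId().toLowerCase()))
                                                       .collect(Collectors.toList());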
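
For readers who want to run the idea end to end, here is a minimal, self-contained sketch. The `Transaction` class, the `normalize` helper, and the `notArchived` method are hypothetical stand-ins that are not part of the question; only `getId()` is assumed to exist. Lower-casing with `Locale.ROOT` behaves like `equalsIgnoreCase` for plain ASCII IDs, but the two can differ for some Unicode characters.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class TransactionFilter {

    // Hypothetical stand-in for the question's Transaction class; only getId() is needed.
    static final class Transaction {
        private final String id;
        Transaction(String id) { this.id = id; }
        String getId() { return id; }
    }

    // Normalize IDs the same way on both sides so the lookup ignores case.
    static String normalize(String id) {
        return id.toLowerCase(Locale.ROOT);
    }

    // Returns the found transactions whose ID does not occur in the archived list.
    static List<Transaction> notArchived(List<Transaction> found, List<Transaction> archived) {
        Set<String> archivedIds = archived.stream()
                .map(tx -> normalize(tx.getId()))
                .collect(Collectors.toCollection(HashSet::new)); // request a HashSet explicitly

        return found.stream()
                .filter(tx -> !archivedIds.contains(normalize(tx.getId())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Transaction> found = Arrays.asList(
                new Transaction("A1"), new Transaction("b2"), new Transaction("C3"));
        List<Transaction> archived = Arrays.asList(
                new Transaction("a1"), new Transaction("B2"));

        for (Transaction tx : notArchived(found, archived)) {
            System.out.println(tx.getId()); // prints C3
        }
    }
}
```

The set is built once from the archived list, so each found transaction costs a single hash lookup instead of a scan over the whole archive.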

Adding the IDs to a HashSet first, as an intermediate step, gives you a much better overall time complexity, because (source):

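Time complexity of HashSet operations: the underlying data structure of HashSet is a hash table. So the amortized (average or usual case) time complexity of the add, remove, and look-up (contains method) operations of HashSet is O(1).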

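Therefore, the overall time complexity of the "HashSet" solution is O(N + M), where N and M are the numbers of elements in the lists ArchivedTransactions and foundTransactions, respectively. Space-wise, however, you pay the price of keeping that extra structure.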

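Your solution is better space-wise but has a worse time complexity. If N = M, your solution runs in O(N^2), whereas the HashSet solution runs in O(2N), that is, O(N). That is a huge difference: with N = M = 1,000,000, it is on the order of 10^12 ID comparisons versus roughly 2 × 10^6 hash operations.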
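
To make the gap concrete, the following rough sketch counts operations instead of measuring time. The class `ComplexityDemo`, its method names, and the generated IDs are all illustrative assumptions. The nested-loop method mirrors the shape of the question's two loops; the other counts one `add` per archived ID plus one `contains` per found ID.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ComplexityDemo {

    // Counts the ID comparisons made by two nested loops like those in the question: always N * M.
    static long nestedLoopComparisons(List<String> foundIds, List<String> archivedIds) {
        long comparisons = 0;
        for (String f : foundIds) {
            for (String a : archivedIds) {
                f.equalsIgnoreCase(a);
                comparisons++;
            }
        }
        return comparisons;
    }

    // Counts hash-set operations: one add per archived ID plus one contains per found ID (N + M).
    static long hashSetOperations(List<String> foundIds, List<String> archivedIds) {
        long operations = 0;
        Set<String> seen = new HashSet<>();
        for (String a : archivedIds) {
            seen.add(a.toLowerCase(Locale.ROOT));
            operations++;
        }
        for (String f : foundIds) {
            seen.contains(f.toLowerCase(Locale.ROOT));
            operations++;
        }
        return operations;
    }

    public static void main(String[] args) {
        int n = 2000; // kept small so the quadratic version finishes quickly
        List<String> found = new ArrayList<>();
        List<String> archived = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            found.add("F" + i);    // illustrative IDs only
            archived.add("A" + i);
        }
        System.out.println(nestedLoopComparisons(found, archived)); // prints 4000000
        System.out.println(hashSetOperations(found, archived));     // prints 4000
    }
}
```

These counts ignore constant factors such as hashing cost and memory traffic, so they illustrate the growth rate rather than predict wall-clock time.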

Just doing

    Set<Transaction> result = new LinkedHashSet<>();
    result.addAll(foundTransactions);
    result.addAll(ArchivedTransactions);

alone will not work, because you explicitly require:

    !tx.getId().equalsIgnoreCase(aTx.getId())
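
A small sketch of why merging into a LinkedHashSet falls short. To stay short it uses plain String IDs instead of Transaction objects, and the `UnionDemo` class and its `union` method are hypothetical names. The merge produces the union of both lists rather than the difference, and the set compares elements with case-sensitive `equals`.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UnionDemo {

    // Merges both ID lists into one LinkedHashSet, following the addAll approach.
    static Set<String> union(List<String> foundIds, List<String> archivedIds) {
        Set<String> result = new LinkedHashSet<>();
        result.addAll(foundIds);
        result.addAll(archivedIds);
        return result;
    }

    public static void main(String[] args) {
        List<String> found = Arrays.asList("A1", "C3");
        List<String> archived = Arrays.asList("a1");

        // "A1" and "a1" are both kept, and nothing is filtered out.
        System.out.println(union(found, archived)); // prints [A1, C3, a1]
    }
}
```

The desired result here would contain only C3, which is what the HashSet lookup on lower-cased IDs yields.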
