Hash Join vs Hash Semi Join

Question


PostgreSQL 9.2

I'm trying to understand the difference between a Hash Semi Join and a plain Hash Join.

Here are two queries:

I

EXPLAIN ANALYZE SELECT * FROM orders WHERE customerid IN (SELECT
customerid FROM customers WHERE state='MD');

Hash Semi Join  (cost=740.34..994.61 rows=249 width=30) (actual time=2.684..4.520 rows=120 loops=1)
  Hash Cond: (orders.customerid = customers.customerid)
  ->  Seq Scan on orders  (cost=0.00..220.00 rows=12000 width=30) (actual time=0.004..0.743 rows=12000 loops=1)
  ->  Hash  (cost=738.00..738.00 rows=187 width=4) (actual time=2.664..2.664 rows=187 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 7kB
        ->  Seq Scan on customers  (cost=0.00..738.00 rows=187 width=4) (actual time=0.018..2.638 rows=187 loops=1)
              Filter: ((state)::text = 'MD'::text)
              Rows Removed by Filter: 19813


II

EXPLAIN ANALYZE SELECT * FROM orders o JOIN customers c ON o.customerid = c.customerid WHERE c.state = 'MD'

Hash Join  (cost=740.34..1006.46 rows=112 width=298) (actual time=2.831..4.762 rows=120 loops=1)
  Hash Cond: (o.customerid = c.customerid)
  ->  Seq Scan on orders o  (cost=0.00..220.00 rows=12000 width=30) (actual time=0.004..0.768 rows=12000 loops=1)
  ->  Hash  (cost=738.00..738.00 rows=187 width=268) (actual time=2.807..2.807 rows=187 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 37kB
        ->  Seq Scan on customers c  (cost=0.00..738.00 rows=187 width=268) (actual time=0.018..2.777 rows=187 loops=1)
              Filter: ((state)::text = 'MD'::text)
              Rows Removed by Filter: 19813


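As you can see, the only differences between the plans are that the hash table uses 7kB in the first case and 37kB in the second, and that the first join node is a Hash Semi Join.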

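But I don't understand the difference in hash table size. In both plans the Hash node is fed by exactly the same Seq Scan with the same Filter. Why is there a difference?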

Answer 1

jja*_*nes 6

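In the first query, only the customerid from customers needs to be stored in the hash table, because that is the only data needed to implement the semi join.

In the second query, all of the columns need to be stored in the hash table, because you are selecting every column from the table (using *) rather than just testing whether a matching customerid exists. You can see this in the plans: the row width of the Hash node is 4 bytes in the first query and 268 bytes in the second.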
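To make the point concrete, here is a simplified in-memory model of the two strategies. It is only a sketch and not how PostgreSQL's executor is actually written. The sample rows, the `firstname` column, and the function names `hash_semi_join` and `hash_join` are all hypothetical. The semi join's build side keeps nothing but the join keys, while the ordinary join must keep each whole inner row so that its columns can be returned.

```python
from collections import defaultdict

# Hypothetical sample data; only customerid and state mirror the question's tables.
customers = [
    {"customerid": 1, "state": "MD", "firstname": "Ann"},
    {"customerid": 2, "state": "VA", "firstname": "Bob"},
    {"customerid": 3, "state": "MD", "firstname": "Cy"},
]
orders = [
    {"orderid": 10, "customerid": 1},
    {"orderid": 11, "customerid": 2},
    {"orderid": 12, "customerid": 3},
    {"orderid": 13, "customerid": 1},
]


def hash_semi_join(outer, inner, key, pred):
    """WHERE key IN (SELECT key FROM inner WHERE pred): build side keeps keys only."""
    keys = {row[key] for row in inner if pred(row)}
    # Probe: each outer row is emitted at most once, with no inner columns attached.
    return [row for row in outer if row[key] in keys], keys


def hash_join(outer, inner, key, pred):
    """JOIN ... SELECT *: build side must keep entire inner rows."""
    table = defaultdict(list)
    for row in inner:
        if pred(row):
            table[row[key]].append(row)  # whole row stored, not just the key
    # Probe: one combined output row per matching inner row.
    result = [{**o, **i} for o in outer for i in table.get(o[key], [])]
    return result, table


def is_md(row):
    return row["state"] == "MD"


semi_rows, semi_table = hash_semi_join(orders, customers, "customerid", is_md)
join_rows, join_table = hash_join(orders, customers, "customerid", is_md)
```

Both calls return the same three orders (10, 12, 13), but `semi_table` holds only the integers `{1, 3}`, whereas `join_table` holds full customer rows. This mirrors the 4-byte versus 268-byte Hash widths in the plans above.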

Archived:	10 years, 6 months ago
Views:	3355
Last activity:	10 years, 5 months ago