select user_id,
       prod_and_ts.product_id as product_id,
       prod_and_ts.timestamps as timestamps
from testingtable2
LATERAL VIEW explode(purchased_item) exploded_table as prod_and_ts;
Using the query above, I get the following output.
USER_ID | PRODUCT_ID | TIMESTAMPS
------------+------------------+-------------
1015826235 220003038067 1004841621
1015826235 300003861266 1005268799
1015826235 140002997245 1061569397
1015826235 *200002448035* 1005542471
If you compare the output of the query above with the Table2 data below, the product_id in the last line of the output does not match the ITEM_ID in the last row of the Table2 data.
BUYER_ID | ITEM_ID | CREATED_TIME
-------------+-------------------+------------------------
1015826235 220003038067 2001-11-03 19:40:21
1015826235 300003861266 2001-11-08 18:19:59
1015826235 140002997245 2003-08-22 09:23:17
1015826235 *210002448035* 2001-11-11 22:21:11
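The TIMESTAMPS values in the first output are epoch seconds, while CREATED_TIME is a formatted date-time, so the two are easier to compare once they share one representation. A minimal sketch, assuming created_time is stored as a 'yyyy-MM-dd HH:mm:ss' string (the column alias created_epoch is made up for illustration):

```sql
-- Sketch only: table and column names come from the question.
-- unix_timestamp(string) interprets the string in the session's default
-- time zone, so the resulting numbers depend on the cluster configuration.
SELECT buyer_id,
       item_id,
       created_time,
       unix_timestamp(created_time) AS created_epoch  -- date-time string -> epoch seconds
FROM table2
WHERE buyer_id = 1015826235;
```

If created_epoch differs from TIMESTAMPS by a constant offset on every row, a time zone difference is a likely cause and is worth checking before treating those rows as mismatches.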
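So my problem statement is:
Find all the PRODUCT_ID (ITEM_ID) and TIMESTAMPS (CREATED_TIME) values that do not match the Table2 data for a given BUYER_ID or USER_ID.
So for the example above I need to show a result like this: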
BUYER_ID | ITEM_ID | CREATED_TIME | USER_ID | PRODUCT_ID | TIMESTAMPS
-----------+-------------------+-------------------------+---------------+------------------+------------------
1015826235 *210002448035* 2001-11-11 22:21:11 1015826235 *200002448035* 1005542471
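I need to join the query I wrote above with table2 to get this result, so the query above has to be used inside the join. That is what confuses me. Any suggestions would be appreciated.
Update:
I wrote the query below, but somehow I cannot get the output I am after. Can anyone help me with this?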
SELECT table2.buyer_id, table2.item_id, table2.created_time from
(select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as
timestamps from testingtable2 LATERAL VIEW explode(purchased_item) exploded_table
as prod_and_ts) prod_and_ts JOIN table2 where
prod_and_ts.user_id = table2.buyer_id
and (product_id <> table2.item_id or
timestamps <> UNIX_TIMESTAMP(table2.created_time));
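I think you can do what you want with two queries, but I'm not 100% sure. Usually in this situation it is enough to find the rows in the first table that have no match in the second. You are also trying to get the "closest" match, which is why this is challenging.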
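The simpler check mentioned above, finding exploded rows with no exact counterpart in table2, could be sketched as an anti-join. The aliases p and t are introduced here for readability, and the comparison assumes timestamps holds epoch seconds and created_time is a 'yyyy-MM-dd HH:mm:ss' string:

```sql
-- Sketch: exploded rows with no exact (user, product, time) match in table2.
SELECT p.user_id, p.product_id, p.timestamps
FROM (
  SELECT user_id,
         prod_and_ts.product_id AS product_id,
         prod_and_ts.timestamps AS timestamps
  FROM testingtable2
  LATERAL VIEW explode(purchased_item) exploded_table AS prod_and_ts
) p
LEFT OUTER JOIN table2 t
  ON  p.user_id    = t.buyer_id
  AND p.product_id = t.item_id
  AND p.timestamps = unix_timestamp(t.created_time)
WHERE t.buyer_id IS NULL;  -- keep only rows that found no partner
```

This reports which rows are unmatched but cannot say which table2 row each one was "supposed" to pair with. That pairing is the harder part.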
The following query looks for rows that match on the user ID and on exactly one of the other two fields, then combines the two sets:
SELECT table2.buyer_id, table2.item_id, table2.created_time, prod_and_ts.*
from (select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps
from testingtable2 LATERAL VIEW
explode(purchased_item) exploded_table as prod_and_ts
) prod_and_ts JOIN
table2
on prod_and_ts.user_id = table2.buyer_id and
prod_and_ts.product_id = table2.item_id and
prod_and_ts.timestamps <> UNIX_TIMESTAMP(table2.created_time)
union all
SELECT table2.buyer_id, table2.item_id, table2.created_time, prod_and_ts.*
from (select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps
from testingtable2 LATERAL VIEW
explode(purchased_item) exploded_table as prod_and_ts
) prod_and_ts JOIN
table2
on prod_and_ts.user_id = table2.buyer_id and
prod_and_ts.product_id <> table2.item_id and
prod_and_ts.timestamps = UNIX_TIMESTAMP(table2.created_time)
This will not find cases where neither field matches.
Also, I wrote this using the "on" syntax rather than "where". I think Hive supports this.
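One caveat on the "on" syntax: as far as I know, Hive releases before 2.2.0 accept only equality conditions in the ON clause, so the <> comparisons may be rejected there. A hedged alternative is to keep the equality on the user ID in ON and move the rest into WHERE. The aliases p and t are made up for this sketch, and because the two OR branches are mutually exclusive it should return the same rows as the UNION ALL version:

```sql
-- Sketch: same logic as the UNION ALL query, with non-equality tests in WHERE.
SELECT t.buyer_id, t.item_id, t.created_time,
       p.user_id, p.product_id, p.timestamps
FROM (
  SELECT user_id,
         prod_and_ts.product_id AS product_id,
         prod_and_ts.timestamps AS timestamps
  FROM testingtable2
  LATERAL VIEW explode(purchased_item) exploded_table AS prod_and_ts
) p
JOIN table2 t
  ON p.user_id = t.buyer_id
WHERE (p.product_id =  t.item_id AND p.timestamps <> unix_timestamp(t.created_time))  -- same product, different time
   OR (p.product_id <> t.item_id AND p.timestamps =  unix_timestamp(t.created_time)); -- same time, different product
```

Joining only on the user ID pairs every purchase of a user with every table2 row for that user before filtering, so this can be slow for buyers with many rows.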