Removing duplicates from a table in BigQuery

3 sql google-bigquery

I found duplicates in my table by running the following query.

SELECT name, id, count(1) as count
  FROM [myproject:dev.sample] 
  group by name, id 
  having count(1) > 1

Now I want to delete these duplicates based on id and name using a DML statement, but it reports "0 rows affected". Am I missing something?

DELETE FROM PRD.GPBP WHERE
    id not in(select id from [myproject:dev.sample] GROUP BY id) and 
    name not in (select name from [myproject:dev.sample] GROUP BY name) 

May*_*wal 7

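I suggest creating a new table without the duplicates, then dropping the original table and renaming the new table to the original name.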

You can build the deduplicated table like this, keeping one row per (name, id) pair:

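-- CREATE TABLE ... AS SELECT needs standard SQL, so the table is referenced
-- with backticks rather than the legacy [project:dataset.table] form.
CREATE TABLE new_table AS
SELECT name, id, ...... -- list your remaining 10 columns here
FROM (
  SELECT *,
    -- Number the rows within each (name, id) group; duplicates get 2, 3, ...
    ROW_NUMBER() OVER (PARTITION BY name, id ORDER BY id) AS rnk
  FROM `myproject.dev.sample`
) a
WHERE rnk = 1; -- keep only the first row of each group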

Then drop the old table and rename new_table to the old table's name.
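The drop-and-rename step could look roughly like the sketch below. It assumes standard SQL and that new_table was created in the same dataset as the original (`myproject.dev`, taken from the question); adjust the names to your own project. The first query is only a sanity check and is not part of the original answer.

```sql
-- 1. Sanity check: this should return no rows before anything is dropped.
SELECT name, id, COUNT(*) AS cnt
FROM `myproject.dev.new_table`
GROUP BY name, id
HAVING COUNT(*) > 1;

-- 2. Drop the original table that still contains the duplicates.
DROP TABLE `myproject.dev.sample`;

-- 3. Give the deduplicated table the original name.
--    The new name is a bare table name; the table stays in the same dataset.
ALTER TABLE `myproject.dev.new_table` RENAME TO sample;
```

Run the statements one at a time, and only continue to step 2 once the check in step 1 comes back empty.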