我正在对 Google BigQuery 上的 Reddit 数据进行数据挖掘,我想根据整个 201704 数据的每个 subreddit 的分数排名前 1000 个帖子。我尝试了不同的技术,但由于 BigQuery 的限制,结果太大而无法返回。
select body, score, subreddit from
(
select body, score, subreddit,row_number() over
(
partition by subreddit order by score desc
) mm
from [fh-bigquery:reddit_comments.2017_04]
)
where mm <= 1000 AND subreddit in
(
select subreddit from
(
select Count(subreddit) as counts, subreddit from
[fh-bigquery:reddit_comments.2017_04] GROUP BY subreddit ORDER BY counts DESC
LIMIT 10000
)
)
LIMIT 10000000
Run Code Online (Sandbox Code Playgroud)
有没有办法分治这个问题,因为启用大查询结果意味着不能做任何复杂的查询。Google 是否为大型查询资源提供付款选项?