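How do I find the distribution of credit cards by year opened and by number of completed transactions, grouping the cards into three buckets: fewer than 10 transactions, 10 to 30 transactions, and more than 30 transactions?
The first approach I tried was the width_bucket function in PostgreSQL, but the documentation says it only creates equally spaced buckets, which is not what I want. So I turned to CASE statements. However, I am not sure how to use a CASE statement together with GROUP BY.
This is the data I am working with: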
table 1 - credit_cards table
credit_card_id
year_opened
table 2 - transactions table
transaction_id
credit_card_id - matches credit_cards.credit_card_id
transaction_status ("complete" or "incomplete")
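For anyone who wants to reproduce the problem, the two tables described above could be sketched roughly as follows. Only the table and column names come from the description; the column types, the constraints, and the sample rows are assumptions made up for illustration.

```sql
-- Hypothetical DDL: types and constraints are assumed, only the names are from the post.
create temp table credit_cards (
    credit_card_id integer primary key,
    year_opened    integer not null
);

create temp table transactions (
    transaction_id     serial primary key,
    credit_card_id     integer not null references credit_cards (credit_card_id),
    transaction_status text not null
        check (transaction_status in ('complete', 'incomplete'))
);

-- Made-up sample cards: two opened in 2002, two in 2003.
insert into credit_cards (credit_card_id, year_opened) values
    (1, 2002), (2, 2002), (3, 2003), (4, 2003);

-- Made-up sample transactions: generate_series is used only to repeat each row N times.
insert into transactions (credit_card_id, transaction_status)
select 1, 'complete'   from generate_series(1, 5)
union all
select 2, 'complete'   from generate_series(1, 12)
union all
select 3, 'complete'   from generate_series(1, 40)
union all
select 4, 'complete'   from generate_series(1, 8)
union all
select 4, 'incomplete' from generate_series(1, 5);
```

With this data, counting only complete transactions gives 5, 12, 40 and 8 for cards 1 to 4. A correct query should therefore report, for 2002, one card under 10 and one card between 10 and 29, and, for 2003, one card under 10 and one card at 30 or more. Card 4 is there to catch a missing status filter: with its incomplete rows included it would have 13 transactions and land in the wrong bucket.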
This is what I have so far:
SELECT
CASE WHEN transaction_count < 10 THEN “Less than 10”
WHEN transaction_count >= 10 and transaction_count < 30 THEN “10 <= transaction count < 30”
ELSE transaction_count>=30 THEN “Greater than or equal to 30”
END as buckets
count(*) as ct.transaction_count
FROM credit_cards c
INNER JOIN transactions t
ON c.credit_card_id = t.credit_card_id
WHERE t.status = “completed”
GROUP BY v.year_opened
GROUP BY buckets
ORDER BY buckets
Expected output:
credit card count | year opened | transaction count bucket
23421 | 2002 | Less than 10
etc
You can specify the bin sizes in width_bucket by specifying a sorted array of the lower bound of each bin.
In your case, that would be array[10,30]: anything less than 10 gets bin 0, 10 through 29 gets bin 1, and 30 or more gets bin 2.
WITH a AS (select generate_series(5,35) cnt)
SELECT cnt, width_bucket(cnt, array[10,30])
FROM a;
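Applied to the tables from the question, the same idea might look like the sketch below. It assumes the credit_cards and transactions tables as described in the question; the CTE name per_card and the column aliases are made up for illustration. The thresholds are cast to bigint[] because count(*) returns bigint, and older PostgreSQL versions require the operand and the array elements to have the same type.

```sql
-- Step 1: count completed transactions per card (per_card is a hypothetical name).
with per_card as (
    select c.credit_card_id
         , c.year_opened
         , count(*) as transaction_count
      from credit_cards c
      join transactions t
        on t.credit_card_id = c.credit_card_id
     where t.transaction_status = 'complete'
     group by c.credit_card_id
            , c.year_opened
)
-- Step 2: turn each count into a bin number and count cards per year and bin.
-- Bin 0: fewer than 10, bin 1: 10 through 29, bin 2: 30 or more.
select count(*) as credit_card_count
     , year_opened
     , width_bucket(transaction_count, array[10, 30]::bigint[]) as bucket
  from per_card
 group by year_opened
        , bucket
 order by year_opened
        , bucket;
```

This returns bin numbers rather than labels. If readable labels are needed, the bin number can be mapped to text in the outer select with a CASE expression.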
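To work this out, you need to count the transactions for each credit card in order to find the right bucket, and then count the credit cards in each bucket for each year. There are a few different ways to get the final result. One way is to first join all the data and compute the first level of aggregation, then compute the final level of aggregation: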
with t1 as (
  select year_opened
       , c.credit_card_id
       , case when count(*) < 10 then 'Less than 10'
              when count(*) < 30 then 'Between [10 and 30)'
              else 'Greater than or equal to 30'
         end buckets
    from credit_cards c
    join transactions t
      on t.credit_card_id = c.credit_card_id
   where t.transaction_status = 'complete'
   group by year_opened
          , c.credit_card_id
)
select count(*) credit_card_count
     , year_opened
     , buckets
  from t1
 group by year_opened
        , buckets;
However, it may be more efficient to compute the first level of aggregation on the transactions table before joining it to the credit_cards table:
select count(*) credit_card_count
     , year_opened
     , buckets
  from credit_cards c
  join (select credit_card_id
             , case when count(*) < 10 then 'Less than 10'
                    when count(*) < 30 then 'Between [10 and 30)'
                    else 'Greater than or equal to 30'
               end buckets
          from transactions
         -- count only completed transactions, as in the first query
         where transaction_status = 'complete'
         group by credit_card_id) t
    on t.credit_card_id = c.credit_card_id
 group by year_opened
        , buckets;
If you prefer to unroll the query above and use a common table expression, you can do that as well (I find this easier to read and follow):
with bkt as (
  select credit_card_id
       , case when count(*) < 10 then 'Less than 10'
              when count(*) < 30 then 'Between [10 and 30)'
              else 'Greater than or equal to 30'
         end buckets
    from transactions
   -- count only completed transactions, as in the first query
   where transaction_status = 'complete'
   group by credit_card_id
)
select count(*) credit_card_count
     , year_opened
     , buckets
  from credit_cards c
  join bkt t
    on t.credit_card_id = c.credit_card_id
 group by year_opened
        , buckets;
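One caveat applies to every query above: an inner join drops cards that have no matching transactions, so a card with zero completed transactions never shows up in the 'Less than 10' bucket. Whether such cards should be counted is not stated in the question, so the following is only a sketch under the assumption that they should. The CTE name per_card and the alias transaction_count are made up for illustration.

```sql
-- Keep every card by using a left join; the status filter moves into the ON clause
-- so that cards without completed transactions are kept with a count of 0.
with per_card as (
  select c.credit_card_id
       , c.year_opened
       , count(t.transaction_id) as transaction_count  -- counts matched rows only
    from credit_cards c
    left join transactions t
      on t.credit_card_id = c.credit_card_id
     and t.transaction_status = 'complete'
   group by c.credit_card_id
          , c.year_opened
)
select count(*) as credit_card_count
     , year_opened
     , case when transaction_count < 10 then 'Less than 10'
            when transaction_count < 30 then 'Between [10 and 30)'
            else 'Greater than or equal to 30'
       end as buckets
  from per_card
 group by year_opened
        , buckets
 order by year_opened
        , buckets;
```

Putting the status condition in a WHERE clause instead would discard the unmatched rows again and turn the left join back into an inner join, which is why it sits in the ON clause here.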