How to create buckets and group by those buckets in PostgreSQL

pr3*_*338 5 postgresql

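How can I find the distribution of credit cards by year opened and by number of completed transactions? The cards should fall into three buckets: fewer than 10 transactions, 10 to 30 transactions, and more than 30 transactions.

The first approach I tried was PostgreSQL's width_bucket function, but the documentation says it only creates equal-width buckets, which is not what I want. So I turned to a CASE expression instead. However, I'm not sure how to use a CASE expression together with GROUP BY.

This is the data I am working with: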

table 1 - credit_cards table
credit_card_id
year_opened


table 2 - transactions table
transaction_id
credit_card_id - matches credit_cards.credit_card_id
transaction_status ("complete" or "incomplete")

This is what I have so far:

SELECT 

CASE WHEN transaction_count < 10 THEN “Less than 10”
WHEN transaction_count >= 10 and transaction_count < 30 THEN “10 <= transaction count < 30”
ELSE transaction_count>=30 THEN “Greater than or equal to 30”
END as buckets

count(*) as ct.transaction_count
FROM credit_cards c
INNER JOIN transactions t
ON c.credit_card_id = t.credit_card_id
WHERE t.status = “completed”
GROUP BY v.year_opened

GROUP BY buckets
ORDER BY buckets

Expected output

credit card count | year opened | transaction count bucket
23421             | 2002        | Less than 10
etc

JGH*_*JGH 8

You can specify the bin sizes in width_bucket by specifying a sorted array of the lower bound of each bin.

In your case, it would be array[10,30]: anything less than 10 gets bin 0, anything from 10 to 29 gets bin 1, and 30 or more gets bin 2.

WITH a AS (select generate_series(5,35) cnt)
SELECT  cnt, width_bucket(cnt, array[10,30]) 
FROM a;
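
As a rough sketch of how this could be applied to the tables in the question (not tested against real data): count the completed transactions per card first, pass that count to width_bucket, then count cards per year and bin. The names per_card, bucket_no and transaction_count_bucket are made up for illustration, and the cast to int is only there so that the operand type matches the integer thresholds on older PostgreSQL versions.

```sql
-- Step 1: one row per card, with the bin number for its completed-transaction count.
with per_card as (
  select c.credit_card_id
       , c.year_opened
       , width_bucket(count(*)::int, array[10, 30]) as bucket_no  -- 0, 1 or 2
    from credit_cards c
    join transactions t
      on t.credit_card_id = c.credit_card_id
   where t.transaction_status = 'complete'
   group by c.credit_card_id
       , c.year_opened
)
-- Step 2: count cards per year and bin, turning the bin number into a label.
select count(*) as credit_card_count
     , year_opened
     , case bucket_no
         when 0 then 'Less than 10'
         when 1 then 'Between [10 and 30)'
         else 'Greater than or equal to 30'
       end as transaction_count_bucket
  from per_card
 group by year_opened
     , bucket_no
 order by year_opened
     , bucket_no;
```

Grouping and ordering by the numeric bin rather than the label keeps the buckets in ascending order within each year.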


Sen*_*nel 2

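To work this out, you need to count the transactions for each credit card to determine its bucket, and then count the credit cards in each bucket for each year. There are a few different ways to get the final result. One is to join all the data first and compute the first level of aggregation, then compute the final level of aggregation: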

with t1 as (
  select year_opened
     , c.credit_card_id
     , case when count(*) < 10 then 'Less than 10'
            when count(*) < 30 then 'Between [10 and 30)'
            else 'Greater than or equal to 30'
       end buckets
  from credit_cards c
  join transactions t
    on t.credit_card_id = c.credit_card_id
 where t.transaction_status = 'complete'
 group by year_opened
     , c.credit_card_id
)
select count(*) credit_card_count
     , year_opened
     , buckets
  from t1
 group by year_opened
     , buckets;

However, it may be more efficient to compute the first-level aggregate on the transactions table before joining it to the credit_cards table:

select count(*) credit_card_count
     , year_opened
     , buckets
  from credit_cards c
  join (select credit_card_id
             , case when count(*) < 10 then 'Less than 10'
                    when count(*) < 30 then 'Between [10 and 30)'
                    else 'Greater than or equal to 30'
               end buckets
          from transactions
         where transaction_status = 'complete'
         group by credit_card_id) t
    on t.credit_card_id = c.credit_card_id
 group by year_opened
     , buckets;

If you prefer to unroll the query above and use a common table expression, you can do that too (I find this easier to read and follow):

with bkt as (
  select credit_card_id
       , case when count(*) < 10 then 'Less than 10'
              when count(*) < 30 then 'Between [10 and 30)'
              else 'Greater than or equal to 30'
          end buckets
    from transactions
   where transaction_status = 'complete'
   group by credit_card_id
)
select count(*) credit_card_count
     , year_opened
     , buckets
  from credit_cards c
  join bkt t
    on t.credit_card_id = c.credit_card_id
 group by year_opened
     , buckets;
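
One thing to be aware of: because the queries above use an inner join, a card with no matching transactions at all does not appear in any bucket. If such cards should be counted under 'Less than 10', a variant along the following lines might work. This is only a sketch under that assumption; the column name transaction_count is made up for illustration.

```sql
with bkt as (
  select c.credit_card_id
       , c.year_opened
         -- count(t.transaction_id) ignores the NULLs produced by the left join,
         -- so a card with no completed transactions gets a count of 0.
       , count(t.transaction_id) as transaction_count
    from credit_cards c
    left join transactions t
      on t.credit_card_id = c.credit_card_id
     and t.transaction_status = 'complete'  -- filter in ON, not WHERE, to keep unmatched cards
   group by c.credit_card_id
       , c.year_opened
)
select count(*) as credit_card_count
     , year_opened
     , case when transaction_count < 10 then 'Less than 10'
            when transaction_count < 30 then 'Between [10 and 30)'
            else 'Greater than or equal to 30'
       end as buckets
  from bkt
 group by year_opened
     , buckets;
```

Putting the status condition in a WHERE clause instead would discard the unmatched rows again and turn the left join back into an inner join.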