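I'm looking for transactions (consecutive?) in a shop dataset that follow this pattern: despite some earlier cancellations during the day, the shop eventually completes the transaction.
A valid batch of transactions must meet a set of conditions:
X cancelled but 1 completed. The query should keep all batches that meet this condition. The final table should have a column batch containing a running count of batches, to tell them apart.
Initial table: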
shop amount status date
------------------------------
A 1234 Cancelled 20101010
A 1234 Cancelled 20101010
A 1234 Completed 20101010
A 1234 Cancelled 20101010
A 1234 Completed 20101011
A 1000 Completed 20101011
B 100 Cancelled 20101011
B 100 Cancelled 20101011
B 4321 Cancelled 20101011
B 4321 Cancelled 20101011
C 333 Cancelled 20101012
C 333 Completed 20101012
C 333 Completed 20101012
D 111 Cancelled 20101013
D 155 Cancelled 20101013
D 111 Completed 20101013
D 155 Completed 20101013
Split by day:
shop amount status date
------------------------------
A 1234 Cancelled 20101010
A 1234 Cancelled 20101010
A 1234 Completed 20101010
A 1234 Cancelled 20101010
------------------------------
A 1234 Completed 20101011
A 1000 Completed 20101011
B 100 Cancelled 20101011
B 100 Cancelled 20101011
B 4321 Cancelled 20101011
B 4321 Cancelled 20101011
------------------------------
C 333 Cancelled 20101012
C 333 Completed 20101012
C 333 Completed 20101012
------------------------------
D 111 Cancelled 20101013
D 155 Cancelled 20101013
D 111 Completed 20101013
D 155 Completed 20101013
Result table:
shop amount status date batch
-------------------------------------
A 1234 Cancelled 20101010 1
A 1234 Cancelled 20101010 1
A 1234 Completed 20101010 1
-------------------------------------
A 1234 Completed 20101011 2
A 1000 Completed 20101011 3
-------------------------------------
C 333 Cancelled 20101012 4
C 333 Completed 20101012 4
C 333 Completed 20101012 5
-------------------------------------
D 111 Cancelled 20101013 6
D 155 Cancelled 20101013 7
D 111 Completed 20101013 6
D 155 Completed 20101013 7
Query to build the table:
([] shop:`A`A`A`A`A`A`B`B`B`B`C`C`C`D`D`D`D; amount: 1234 1234 1234 1234 1234 1000 100 100 4321 4321 333 333 333 111 155 111 155; status:`Cancelled`Cancelled`Completed`Cancelled`Completed`Completed`Cancelled`Cancelled`Cancelled`Cancelled`Cancelled`Completed`Completed`Cancelled`Cancelled`Completed`Completed; date: `20101010`20101010`20101010`20101010`20101011`20101011`20101011`20101011`20101011`20101011`20101012`20101012`20101012`20101013`20101013`20101013`20101013)
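Explanation:
On the first day, A makes 4 transactions. The first three are batched together because they have the same amount (Cancelled -> Cancelled -> Completed). The last transaction of the day is ignored because it is a cancellation and the day ends without a completion following it.
On the second day, A makes a transaction of the same amount, 1234, but the previous day's transactions are not part of its batch. A also completes another transaction of 1000. B makes four transactions, but they are not tracked because they are a) cancelled or b) a power of ten.
On the third day, C makes three transactions of the same amount. These count as two batches: the first cancellation and the first completion form the initial batch, and the final completed transaction is a batch of its own.
On the fourth day, D makes four transactions that form two batches. Note that the transactions here are not consecutive: there are two cancelled transactions with different amounts, and both are completed later.
The table is ordered by timestamp and date, i.e. 23:59:59 to 00:00:00. The query does not have to be a one-liner; it can be a multi-line query that writes to temporary tables, variables and so on.
It would also be helpful if there were a way to get the number of cancelled transactions per batch.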
小智 7
So first, count the Completed rows, which gives the number of batches (tab here is the table from the question).
q)n:count select from tab where status=`Completed
Then use the following query to assign a batch number to each Completed row:
q)btab:update batch:1+til n from tab where status=`Completed
q)btab
shop amount status date batch
------------------------------------
A 1234 Cancelled 20101010
A 1234 Cancelled 20101010
A 1234 Completed 20101010 1
A 1234 Cancelled 20101010
A 1234 Completed 20101011 2
A 1000 Completed 20101011 3
B 100 Cancelled 20101011
B 100 Cancelled 20101011
B 4321 Cancelled 20101011
B 4321 Cancelled 20101011
C 333 Cancelled 20101012
C 333 Completed 20101012 4
C 333 Completed 20101012 5
D 111 Cancelled 20101013
D 155 Cancelled 20101013
D 111 Completed 20101013 6
D 155 Completed 20101013 7
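Then reverse the table and fill the null batch values by date, shop and amount, leaving out every cancellation whose amount is a power of 10 (the same logic as Terry Lynch uses), and reverse it back: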
q)ftab:reverse update fills batch by date,shop,amount from reverse btab where not (status=`Cancelled)&{x=`int$x}10 xlog amount
q)ftab
shop amount status date batch
------------------------------------
A 1234 Cancelled 20101010 1
A 1234 Cancelled 20101010 1
A 1234 Completed 20101010 1
A 1234 Cancelled 20101010
A 1234 Completed 20101011 2
A 1000 Completed 20101011 3
B 100 Cancelled 20101011
B 100 Cancelled 20101011
B 4321 Cancelled 20101011
B 4321 Cancelled 20101011
C 333 Cancelled 20101012 4
C 333 Completed 20101012 4
C 333 Completed 20101012 5
D 111 Cancelled 20101013 6
D 155 Cancelled 20101013 7
D 111 Completed 20101013 6
D 155 Completed 20101013 7
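The two building blocks of this step can be tried on their own. This is only a minimal sketch: isPow10 is a hypothetical helper name (it is not in the answer) that wraps the same 10 xlog test used in the where clause, and the short integer list stands in for the batch column.

```q
/ isPow10 is a hypothetical name; it wraps the same test the update uses:
/ an amount is a power of 10 when its base-10 log is a whole number
isPow10:{l:10 xlog x; l=`int$l}
isPow10 100 1000 1234 4321             / 1100b

/ fills carries the last non-null value forward, so reverse-fills-reverse
/ carries each batch number backwards onto the nulls that precede it
reverse fills reverse 0N 0N 1 0N 2     / 1 1 1 2 2
```

In the plain list the null sitting between 1 and 2 picks up 2. Grouping by date, shop and amount is what stops A's last cancellation on 20101010 from being pulled into the next day's batch in the same way.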
Then select only the rows that have a batch number:
q)stab:select from ftab where batch<>0N
q)stab
shop amount status date batch
------------------------------------
A 1234 Cancelled 20101010 1
A 1234 Cancelled 20101010 1
A 1234 Completed 20101010 1
A 1234 Completed 20101011 2
A 1000 Completed 20101011 3
C 333 Cancelled 20101012 4
C 333 Completed 20101012 4
C 333 Completed 20101012 5
D 111 Cancelled 20101013 6
D 155 Cancelled 20101013 7
D 111 Completed 20101013 6
D 155 Completed 20101013 7
q)
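If the steps are needed more than once, they could be collected into a single function. The following is only a sketch under assumptions: getBatches is a hypothetical name, the argument t is assumed to have the same shop, amount, status and date columns as the question's table, and the multi-line definition is meant to be loaded from a script file rather than typed at the q) prompt.

```q
/ getBatches is a hypothetical wrapper around the steps shown above
getBatches:{[t]
  / number of Completed rows = number of batches
  n:count select from t where status=`Completed;
  / number the Completed rows 1..n, other rows get a null batch
  b:update batch:1+til n from t where status=`Completed;
  / back-fill batch numbers within date,shop,amount, skipping power-of-10 cancellations
  f:reverse update fills batch by date,shop,amount from reverse b where not (status=`Cancelled)&{x=`int$x}10 xlog amount;
  / keep only rows that ended up in a batch
  select from f where not null batch}
```

Calling getBatches tab on the question's table should return the same rows as stab above.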
Finally, here is a query to get the number of cancellations per batch:
q)select numberOfCancellations:-1+count batch by batch from stab
batch| numberOfCancellations
-----| ---------------------
1 | 2
2 | 0
3 | 0
4 | 1
5 | 0
6 | 1
7 | 1
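The -1+count form relies on every batch containing exactly one Completed row. As a hedged alternative, the cancellations can be counted directly, which does not depend on that assumption; stab is the table built in the previous step.

```q
/ count the Cancelled rows in each batch directly
select numberOfCancellations:sum status=`Cancelled by batch from stab
```

For the data in the question this should give the same counts as the query above.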