yjh*_*ody 4 database pivot clickhouse dolphindb
I want to perform a pivot operation on some data, like the following.
>>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
... 'two'],
... 'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
... 'baz': [1, 2, 3, 4, 5, 6],
... 'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
>>> df
   foo bar  baz zoo
0  one   A    1   x
1  one   B    2   y
2  one   C    3   z
3  two   A    4   q
4  two   B    5   w
5  two   C    6   t
>>> df.pivot(index='foo', columns='bar', values='baz')
bar  A  B  C
foo
one  1  2  3
two  4  5  6
I know DolphinDB can do a pivot in SQL.
dateValue=2007.08.01
num=500
syms = (exec count(*) from taq
where
date = dateValue,
time between 09:30:00 : 15:59:59,
0<bid, bid<ofr, ofr<bid*1.2
group by symbol order by count desc).symbol[0:num]
priceMatrix = exec avg(bid + ofr)/2.0 as price from taq
where
date = dateValue, Symbol in syms,
0<bid, bid<ofr, ofr<bid*1.2,
time between 09:30:00 : 15:59:59
pivot by time.minute() as minute, Symbol
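But how do I do a pivot in ClickHouse? Should I fetch the data through a client API? There are too many rows, though, and handling that many rows is difficult. And if I can't use pandas, how can I implement a pivot operation easily?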
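Here is a preliminary implementation that can help you get started.
Notes:
"Holes" in rows are not supported (every column must contain a value for each row).
All columns are cast to a common type (String).
The field orderNum is introduced. It is the ordinal position of the source column in the result (for example, the "bar" column is the 2nd column).
The result is represented as rows with a single column of type Array. The order of the array items is defined by orderNum.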
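To make the notes above concrete, here is a minimal Python sketch of the same idea, using only the standard library: each source column is held as an array of strings tagged with an order number, and rows are rebuilt by position. The names `source_columns` and `columns_to_rows` are hypothetical and exist only for this illustration; the data is taken from the DataFrame in the question.

```python
from collections import defaultdict

# Hypothetical layout: (orderNum, column name, values cast to strings).
source_columns = [
    (1, "foo", ["one", "one", "one", "two", "two", "two"]),
    (3, "baz", ["1", "2", "3", "4", "5", "6"]),
    (4, "zoo", ["x", "y", "z", "q", "w", "t"]),
    (2, "bar", ["A", "B", "C", "A", "B", "C"]),
]


def columns_to_rows(columns):
    """Rebuild rows from per-column arrays, ordering the items by orderNum."""
    lengths = {len(values) for _, _, values in columns}
    if len(lengths) > 1:
        # "Holes" are not supported: every column needs a value for every row.
        raise ValueError("all columns must have the same number of values")

    grouped = defaultdict(list)
    for order_num, _name, values in columns:
        # Explode each array together with the 1-based position of each item.
        for value_index, value in enumerate(values, start=1):
            grouped[value_index].append((order_num, str(value)))

    # Within each position, sort by orderNum and keep only the values.
    return [
        [value for _, value in sorted(grouped[i], key=lambda pair: pair[0])]
        for i in sorted(grouped)
    ]


rows = columns_to_rows(source_columns)
for row in rows:
    print(row)
```

Grouping by position and then sorting by the order number is what restores the original column order regardless of how the column arrays were stored.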
Prepare the test data:
CREATE TABLE test.pivot_test
(
orderNum Int,
s String,
values Array(String)
) ENGINE = Memory;
INSERT INTO test.pivot_test
VALUES
(1, 'foo', ['one', 'one', 'one', 'two', 'two', 'two']),
(3, 'baz', ['1', '2', '3', '4', '5', '6']),
(4, 'zoo', ['x', 'y', 'z', 'q', 'w', 't']),
(2, 'bar', ['A', 'B', 'C', 'A', 'B', 'C']);
/*
The content of table test.pivot_test:
┌─orderNum─┬─s───┬─values────────────────────────────────┐
│        1 │ foo │ ['one','one','one','two','two','two'] │
│        3 │ baz │ ['1','2','3','4','5','6']             │
│        4 │ zoo │ ['x','y','z','q','w','t']             │
│        2 │ bar │ ['A','B','C','A','B','C']             │
└──────────┴─────┴───────────────────────────────────────┘
*/
Pivot emulation:
SELECT arrayMap(x -> x.1, arraySort(x -> x.2, groupArray(value_ordernum))) as row
FROM
(
SELECT
(value, orderNum) AS value_ordernum,
value_index
FROM test.pivot_test
ARRAY JOIN
values AS value,
arrayEnumerate(values) AS value_index
/*
The result of executing the nested query:
┌─value_ordernum─┬─value_index─┐
│ ('one',1)      │           1 │
│ ('one',1)      │           2 │
│ ('one',1)      │           3 │
│ ('two',1)      │           4 │
│ ('two',1)      │           5 │
│ ('two',1)      │           6 │
│ ('1',3)        │           1 │
│ ('2',3)        │           2 │
│ ('3',3)        │           3 │
│ ('4',3)        │           4 │
│ ('5',3)        │           5 │
│ ('6',3)        │           6 │
│ ('x',4)        │           1 │
│ ('y',4)        │           2 │
│ ('z',4)        │           3 │
│ ('q',4)        │           4 │
│ ('w',4)        │           5 │
│ ('t',4)        │           6 │
│ ('A',2)        │           1 │
│ ('B',2)        │           2 │
│ ('C',2)        │           3 │
│ ('A',2)        │           4 │
│ ('B',2)        │           5 │
│ ('C',2)        │           6 │
└────────────────┴─────────────┘
*/
)
GROUP BY value_index;
/*
The final result:
┌─row─────────────────┐
│ ['two','A','4','q'] │
│ ['one','C','3','z'] │
│ ['one','B','2','y'] │
│ ['two','B','5','w'] │
│ ['one','A','1','x'] │
│ ['two','C','6','t'] │
└─────────────────────┘
*/
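The rows returned by the query above are not yet in the shape of the `df.pivot` result from the question. Assuming the result set is small enough to fetch to the client, a plain-Python sketch (no pandas) of that last step could look like the following. The function `pivot` and its positional parameters are hypothetical names for this illustration, and the row layout [foo, bar, baz, zoo] is taken from the final result shown above.

```python
# Rows in the shape returned by the query: [foo, bar, baz, zoo], all strings.
rows = [
    ["two", "A", "4", "q"],
    ["one", "C", "3", "z"],
    ["one", "B", "2", "y"],
    ["two", "B", "5", "w"],
    ["one", "A", "1", "x"],
    ["two", "C", "6", "t"],
]


def pivot(rows, index_pos, column_pos, value_pos):
    """Return (column_labels, {index_label: [values...]}) for the given rows."""
    column_labels = sorted({row[column_pos] for row in rows})
    table = {}
    for row in rows:
        cells = table.setdefault(row[index_pos], {})
        if row[column_pos] in cells:
            # Mirrors pandas, which refuses duplicate (index, column) pairs.
            raise ValueError("duplicate (index, column) pair")
        cells[row[column_pos]] = row[value_pos]
    # Missing combinations become None instead of raising.
    return column_labels, {
        index: [table[index].get(label) for label in column_labels]
        for index in sorted(table)
    }


columns, pivoted = pivot(rows, index_pos=0, column_pos=1, value_pos=2)
print("foo", *columns)
for index, values in pivoted.items():
    print(index, *values)
```

Because the query casts everything to String, the pivoted values are strings as well; convert them on the client if numeric values are needed.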