Onk*_*tem 6 sql json group-by aggregate-functions jq
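Similar questions have been asked here before:
Counting items for a single key: jq count the number of items in json by a specific key
Summing object values: How do I sum the values in an array of maps in jq?
How can the COUNT aggregate function be emulated so that it behaves like its SQL original? Let's extend the question to cover other common SQL aggregate functions as well; the expected outputs below illustrate counting, summing, taking the maximum, and array aggregation.
The last one, array aggregation (array_agg), is not a standard SQL function. It comes from PostgreSQL, but it is very useful.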
The input is a stream of valid JSON objects. For demonstration, let's pick a simple story about owners and their pets.
Base relation: Owner
id  name   age
1   Adams  25
2   Baker  55
3   Clark  40
4   Davis  31
Base relation: Pet
id  name   litter  owner_id
10  Bella  4       1
20  Lucy   2       1
30  Daisy  3       2
40  Molly  4       3
50  Lola   2       4
60  Sadie  4       4
70  Luna   3       4
From the above we get the derived relation Owner_Pet (the result of an SQL JOIN of the two relations), presented as JSON. This is the source data for our jq queries:
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 10, "pet": "Bella", "litter": 4 }
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 20, "pet": "Lucy", "litter": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pet_id": 30, "pet": "Daisy", "litter": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pet_id": 40, "pet": "Molly", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 50, "pet": "Lola", "litter": 2 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 60, "pet": "Sadie", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 70, "pet": "Luna", "litter": 3 }
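For readers who want to reproduce the source data, here is a minimal sketch of how the join itself could be done in jq. The file names owners.json and pets.json are assumptions made for this illustration (the question only shows the two relations as tables), and the lookup is a simple scan over the owners rather than an indexed join.

```shell
# Hypothetical input files: one JSON object per line, mirroring the two base relations.
cat > owners.json <<'EOF'
{"id": 1, "name": "Adams", "age": 25}
{"id": 2, "name": "Baker", "age": 55}
{"id": 3, "name": "Clark", "age": 40}
{"id": 4, "name": "Davis", "age": 31}
EOF
cat > pets.json <<'EOF'
{"id": 10, "name": "Bella", "litter": 4, "owner_id": 1}
{"id": 20, "name": "Lucy", "litter": 2, "owner_id": 1}
{"id": 30, "name": "Daisy", "litter": 3, "owner_id": 2}
{"id": 40, "name": "Molly", "litter": 4, "owner_id": 3}
{"id": 50, "name": "Lola", "litter": 2, "owner_id": 4}
{"id": 60, "name": "Sadie", "litter": 4, "owner_id": 4}
{"id": 70, "name": "Luna", "litter": 3, "owner_id": 4}
EOF

# For every pet, scan the owners (loaded into $owners by --slurpfile)
# and keep the one whose id matches: an inner join on owner_id = id.
joined=$(jq -c --slurpfile owners owners.json '
  . as $pet
  | $owners[]
  | select(.id == $pet.owner_id)
  | {owner_id: .id, owner: .name, age,
     pet_id: $pet.id, pet: $pet.name, litter: $pet.litter}
' pets.json)
printf '%s\n' "$joined"
```

Each pet produces one output row, so the result has seven objects in the same key order as the source data shown above.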
Here are sample requests and their expected output:
{ "owner_id": 1, "owner": "Adams", "age": 25, "pets_count": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pets_count": 1 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pets_count": 1 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pets_count": 3 }
{ "owner_id": 1, "owner": "Adams", "age": 25, "litter_total": 6, "litter_max": 4 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "litter_total": 3, "litter_max": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "litter_total": 4, "litter_max": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "litter_total": 9, "litter_max": 4 }
{ "owner_id": 1, "owner": "Adams", "age": 25, "pets": [ "Bella", "Lucy" ] }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pets": [ "Daisy" ] }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pets": [ "Molly" ] }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pets": [ "Lola", "Sadie", "Luna" ] }
Here is an alternative that uses plain jq without any custom functions. (I took the liberty of dropping the redundant parts of the question.)
Count
In> jq -s 'group_by(.owner_id) | map({owner_id: .[0].owner_id, count: (map(.pet) | length)})'
Out> [{"owner_id": 1,"count": 2}, ...]
Sum
In> jq -s 'group_by(.owner_id) | map({owner_id: .[0].owner_id, sum: (map(.litter) | add)})'
Out> [{"owner_id": 1,"sum": 6}, ...]
Max
In> jq -s 'group_by(.owner_id) | map({owner_id: .[0].owner_id, max: (map(.litter) | max)})'
Out> [{"owner_id": 1,"max": 4}, ...]
Aggregate
In> jq -s 'group_by(.owner_id) | map({owner_id: .[0].owner_id, agg: map(.pet)})'
Out> [{"owner_id": 1,"agg": ["Bella","Lucy"]}, ...]
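Of course, these may not be the most efficient implementations, but they show nicely how to build such aggregates yourself. The only thing that changes between the variants is the inner map and the function that follows the pipe | (length, add, max).
The outer map iterates over the groups and takes the grouping key (owner_id) from the first item of each group; the inner map then iterates over the items of that same group. Not as pretty as SQL, but not much more complicated either.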
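To make the pattern concrete, here is a sketch that computes several aggregates in a single group_by pass. The output key names and the extra min and avg columns are choices made for this illustration, not part of the original answer; the file name source.data follows the one used elsewhere on this page.

```shell
# Sample data: the Owner_Pet relation from the question, one object per line.
cat > source.data <<'EOF'
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 10, "pet": "Bella", "litter": 4 }
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 20, "pet": "Lucy", "litter": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pet_id": 30, "pet": "Daisy", "litter": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pet_id": 40, "pet": "Molly", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 50, "pet": "Lola", "litter": 2 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 60, "pet": "Sadie", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 70, "pet": "Luna", "litter": 3 }
EOF

# One outer map per group; each aggregate is an inner map followed by a reducer.
summary=$(jq -sc '
  group_by(.owner_id)
  | map({
      owner_id:     .[0].owner_id,
      pets_count:   length,                        # COUNT(*)
      litter_total: (map(.litter) | add),          # SUM(litter)
      litter_min:   (map(.litter) | min),          # MIN(litter)
      litter_max:   (map(.litter) | max),          # MAX(litter)
      litter_avg:   (map(.litter) | add / length), # AVG(litter)
      pets:         map(.pet)                      # ARRAY_AGG(pet)
    })
  | .[]
' source.data)
printf '%s\n' "$summary"
```

Swapping the reducer after the pipe is all it takes to move from one aggregate to another, which is why the count, sum, and max variants above look nearly identical.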
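I only started learning jq today and already managed to get this working, so this should be encouraging for anyone just getting started. jq is neither like sed nor like SQL, but it is not very hard either.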
Extended jq solution:
Custom count() function:
jq -sc 'def count($k): group_by(.[$k])[] | length as $l | .[0]
| .pets_count = $l
| del(.pet_id, .pet, .litter);
count("owner_id")' source.data
Output:
{"owner_id":1,"owner":"Adams","age":25,"pets_count":2}
{"owner_id":2,"owner":"Baker","age":55,"pets_count":1}
{"owner_id":3,"owner":"Clark","age":40,"pets_count":1}
{"owner_id":4,"owner":"Davis","age":31,"pets_count":3}
Custom sum() function:
jq -sc 'def sum($k): group_by(.[$k])[] | map(.litter) as $litters | .[0]
| . + {litter_total: $litters | add, litter_max: $litters | max}
| del(.pet_id, .pet, .litter);
sum("owner_id")' source.data
Output:
{"owner_id":1,"owner":"Adams","age":25,"litter_total":6,"litter_max":4}
{"owner_id":2,"owner":"Baker","age":55,"litter_total":3,"litter_max":3}
{"owner_id":3,"owner":"Clark","age":40,"litter_total":4,"litter_max":4}
{"owner_id":4,"owner":"Davis","age":31,"litter_total":9,"litter_max":4}
Custom array_agg() function:
jq -sc 'def array_agg($k): group_by(.[$k])[] | map(.pet) as $pets | .[0]
| .pets = $pets | del(.pet_id, .pet, .litter);
array_agg("owner_id")' source.data
Output:
{"owner_id":1,"owner":"Adams","age":25,"pets":["Bella","Lucy"]}
{"owner_id":2,"owner":"Baker","age":55,"pets":["Daisy"]}
{"owner_id":3,"owner":"Clark","age":40,"pets":["Molly"]}
{"owner_id":4,"owner":"Davis","age":31,"pets":["Lola","Sadie","Luna"]}
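The three functions above share the same skeleton (group, take the first row, attach the aggregate), so they could be folded into one helper. The following is only a sketch: group_agg is a hypothetical name, not a jq builtin, and its three filter arguments (grouping key, columns to keep, aggregates to compute) are a design choice made for this example.

```shell
# Sample data: the Owner_Pet relation from the question, one object per line.
cat > source.data <<'EOF'
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 10, "pet": "Bella", "litter": 4 }
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 20, "pet": "Lucy", "litter": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pet_id": 30, "pet": "Daisy", "litter": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pet_id": 40, "pet": "Molly", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 50, "pet": "Lola", "litter": 2 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 60, "pet": "Sadie", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 70, "pet": "Luna", "litter": 3 }
EOF

grouped=$(jq -sc '
  # group_agg(key; keep; agg): hypothetical helper, not part of jq.
  #   key  - filter that yields the grouping key of a row
  #   keep - filter applied to the first row of a group (the GROUP BY columns)
  #   agg  - filter applied to the whole group array (the aggregate columns)
  def group_agg(key; keep; agg):
    group_by(key)[]
    | (.[0] | keep) + agg;

  group_agg(.owner_id;
            {owner_id, owner, age};
            {pets_count: length,
             litter_total: (map(.litter) | add),
             pets: map(.pet)})
' source.data)
printf '%s\n' "$grouped"
```

Because the kept columns and the aggregates are passed in as filters, the helper does not need a del(...) step to strip per-pet fields: only the columns named in keep survive.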