Merge JSON arrays with duplicate keys

ron*_*k22 6 arrays bash json inner-join jq

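I want to merge two JSON arrays with the help of jq. Every object in both arrays contains a name field, which lets me group the objects of the two arrays and merge them into a single array.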

Labels

[
  {
    "name": "power_branch",
    "description": "master"
  },
  {
    "name": "test_branch",
    "description": "main"
  }
]

Runners

[
  {
    "name": "power_branch",
    "runner": "power",
    "runner_tag": "macos"
  },
  {
    "name": "power_branch",
    "runner": "power",
    "runner_tag": "ubuntu"
  },
  {
    "name": "test_branch",
    "runner": "tester",
    "runner_tag": ""
  },
  {
    "name": "development",
    "runner": "dev",
    "runner_tag": "ubuntu"
  }
]

Desired output

[
  {
    "name": "power_branch",
    "description": "master",
    "runner": "power",
    "runner_tag": "macos"
  },
  {
    "name": "power_branch",
    "description": "master",
    "runner": "power",
    "runner_tag": "ubuntu"
  },
  {
    "name": "test_branch",
    "description": "main",
    "runner": "tester",
    "runner_tag": ""
  }
]

I tried the following script, but the power_branch entry gets overwritten, whereas I want a separate entry for each different runner_tag.

#!/usr/bin/bash

LABELS='[{"name": "power_branch","description": "master"},{"name": "test_branch","description": "main"}]'
RUNNERS='
[
  { "name": "power_branch", "runner": "power", "runner_tag": "macos" },
  { "name": "power_branch", "runner": "power", "runner_tag": "ubuntu" },
  { "name": "test_branch", "runner": "tester", "runner_tag": "" },
  { "name": "development", "runner": "dev", "runner_tag": "ubuntu" }
]
'

FINAL=$(jq -s '[ .[0] + .[1] | group_by(.name)[] | select(length > 1) | add]' <(echo "$LABELS") <(echo "$RUNNERS"))
echo "$FINAL"

Output

[
  {
    "name": "power_branch",
    "description": "master",
    "runner": "power",
    "runner_tag": "ubuntu"
  },
  {
    "name": "test_branch",
    "description": "main",
    "runner": "tester",
    "runner_tag": ""
  }
]
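
For reference, the overwrite can be reproduced in isolation. The sketch below assumes that group_by(.name) hands add one group holding the label and both runners for power_branch; the variable names group, merged and tag are made up for illustration. Since add combines the objects with +, a key that appears more than once keeps only its last value.

```shell
# One group as group_by(.name) would produce it: the label plus both runners.
group='[
  {"name": "power_branch", "description": "master"},
  {"name": "power_branch", "runner": "power", "runner_tag": "macos"},
  {"name": "power_branch", "runner": "power", "runner_tag": "ubuntu"}
]'

# add folds the objects together with +, so for a repeated key
# the value from the rightmost object replaces the earlier ones.
merged=$(printf '%s' "$group" | jq -c 'add')
printf '%s\n' "$merged"

# Only one runner_tag survives the merge.
tag=$(printf '%s' "$merged" | jq -r '.runner_tag')
printf '%s\n' "$tag"
```

This is why a single add per group cannot yield two power_branch entries: the merge has to happen once per runner, not once per name.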

pmf*_*pmf 5

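If you have two files, labels.json and runners.json, you can read the latter (the runners) into a variable with --argjson, and use map to append to each element of the input array (the labels) the matching fields, picked out with select.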

jq --argjson runners "$(cat runners.json)" '
  map(.name as $name | . + ($runners[] | select(.name == $name)))
' labels.json
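
To try this without preparing files by hand, here is a minimal self-contained sketch. It writes the question's data into a scratch directory; the directory and the variable names dir, result, count and tags are assumptions made for illustration only.

```shell
# Hypothetical scratch directory so the sketch does not touch real files.
dir=$(mktemp -d)

cat > "$dir/labels.json" <<'EOF'
[
  {"name": "power_branch", "description": "master"},
  {"name": "test_branch", "description": "main"}
]
EOF

cat > "$dir/runners.json" <<'EOF'
[
  {"name": "power_branch", "runner": "power", "runner_tag": "macos"},
  {"name": "power_branch", "runner": "power", "runner_tag": "ubuntu"},
  {"name": "test_branch", "runner": "tester", "runner_tag": ""},
  {"name": "development", "runner": "dev", "runner_tag": "ubuntu"}
]
EOF

# For each label, emit one merged object per runner with the same name.
result=$(jq -c --argjson runners "$(cat "$dir/runners.json")" '
  map(.name as $name | . + ($runners[] | select(.name == $name)))
' "$dir/labels.json")
printf '%s\n' "$result"

# Summaries used to check the shape of the result.
count=$(printf '%s' "$result" | jq 'length')
tags=$(printf '%s' "$result" | jq -c 'map(.runner_tag)')

rm -r "$dir"
```

Because the parenthesised expression can produce several results for one label, map emits one merged object per matching runner, and a runner without a label (development) never shows up.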

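However, this reads the entire runners array into the shell's command-line space (--argjson takes two strings: a name and a value), which can easily overflow if the runners array grows large enough.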


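So, instead of using the command substitution "$(…)", you can read the runners file directly with --slurpfile, at the cost of another level of iteration ([][]), or (although the manual says not to; read more about that in the comments) with --argfile, which needs only a single level of iteration, as before: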

jq --slurpfile runners runners.json '
  map(.name as $name | . + ($runners[][] | select(.name == $name)))
' labels.json

jq --argfile runners runners.json '
  map(.name as $name | . + ($runners[] | select(.name == $name)))
' labels.json
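
The extra [] needed with --slurpfile comes from the option collecting every JSON value found in the file into an array, so a file that holds a single array arrives wrapped in another one. A small sketch with a made-up two-element file (the names dir, outer, inner and names are illustrative assumptions):

```shell
dir=$(mktemp -d)
printf '%s\n' '[{"name":"a"},{"name":"b"}]' > "$dir/runners.json"

# The variable holds one element: the whole array read from the file.
outer=$(jq -n --slurpfile runners "$dir/runners.json" '$runners | length')

# That single element is the original two-element array.
inner=$(jq -n --slurpfile runners "$dir/runners.json" '$runners[0] | length')

# Hence two levels of iteration are needed to reach the objects.
names=$(jq -n -c --slurpfile runners "$dir/runners.json" '[$runners[][] | .name]')

printf '%s %s %s\n' "$outer" "$inner" "$names"
rm -r "$dir"
```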

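To avoid all of these issues, @peak suggested using input for each file together with the -n option. Note that this requires the two files to be given in a strict order, because they are read sequentially.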

jq -n 'input as $runners | input |
  map(.name as $name | . + ($runners[] | select(.name == $name)))
' runners.json labels.json
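
To see why the order matters, the following sketch replaces the real data with two marker strings. The file names first.json and second.json and the variables in_order and swapped are assumptions made for illustration. With -n, nothing is read automatically, and every call to input consumes the next value from the files in the order they are listed.

```shell
dir=$(mktemp -d)
printf '%s\n' '"runners"' > "$dir/first.json"
printf '%s\n' '"labels"'  > "$dir/second.json"

# The first input is stored in the variable, the second becomes the context.
in_order=$(jq -n -c 'input as $stored | input | [$stored, .]' \
  "$dir/first.json" "$dir/second.json")

# Same filter, files swapped: the roles swap along with them.
swapped=$(jq -n -c 'input as $stored | input | [$stored, .]' \
  "$dir/second.json" "$dir/first.json")

printf '%s\n%s\n' "$in_order" "$swapped"
rm -r "$dir"
```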

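Since the second input (the labels) is passed on directly as the filter's main input (as opposed to the runners, which are stored in a variable for later use), this can be simplified further by dropping the -n option again (the order of the files still matters):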

# labels.json must come first here: without -n the first file becomes the main input, and input then reads runners.json
jq 'input as $runners |
  map(.name as $name | . + ($runners[] | select(.name == $name)))
' labels.json runners.json
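
How the roles are assigned without -n can be checked with two marker strings. In this sketch the file names a.json and b.json and the variable roles are illustrative assumptions. jq reads the first file as the filter's main input on its own, and input then picks up whatever comes next.

```shell
dir=$(mktemp -d)
printf '%s\n' '"first file"'  > "$dir/a.json"
printf '%s\n' '"second file"' > "$dir/b.json"

# . is the automatically read first file; input fetches the second one.
roles=$(jq -c 'input as $stored | [., $stored]' "$dir/a.json" "$dir/b.json")

printf '%s\n' "$roles"
rm -r "$dir"
```

So the file that should end up in the variable goes last, and the file that map iterates over goes first.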

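Finally, here is another approach using the SQL-style operators INDEX and JOIN, which were introduced in jq v1.6. It also uses the technique of calling input just once, and the order of the files still matters, because we need the runners array as the filter's main input.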

jq '
  JOIN(INDEX(input[]; .name); .name) | map(select(.[1]) | add)
' runners.json labels.json
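
The intermediate values may make this easier to follow. The sketch below uses a shortened, made-up data set and passes it with --argjson purely for brevity; the variable names labels, runners, index, pairs, matched and unmatched are assumptions made for illustration. It needs jq 1.6 or later.

```shell
labels='[{"name":"power_branch","description":"master"}]'
runners='[{"name":"power_branch","runner_tag":"macos"},
          {"name":"development","runner_tag":"ubuntu"}]'

# INDEX builds a lookup object keyed by the name of each label.
index=$(printf '%s' "$labels" | jq -c 'INDEX(.[]; .name)')
printf '%s\n' "$index"

# JOIN pairs every runner with the label found under its name,
# or with null when the lookup object has no such key.
pairs=$(jq -n -c --argjson labels "$labels" --argjson runners "$runners" '
  $runners | JOIN(INDEX($labels[]; .name); .name)
')
printf '%s\n' "$pairs"

# select(.[1]) keeps the pairs that found a label; add merges each pair.
matched=$(printf '%s' "$pairs" | jq 'map(select(.[1]) | add) | length')
unmatched=$(printf '%s' "$pairs" | jq 'map(select(.[1] == null)) | length')
```

Because every runner produces its own pair, two runners sharing a name yield two merged objects, while a runner without a label is dropped by select.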