I created more than 500,000 JSON documents with a script that talks to some APIs. I want to import these documents into RethinkDB, but it seems RethinkDB cannot bulk-import that many individual files, so I am thinking of merging them all into one big JSON file (say bigfile.json). This is their structure:
file1.json:
{
"key_1": "value_1.1",
"key_2": "value_1.2",
"key_3": "value_1.3",
...
"key_n": "value_1.n"
}
file2.json:
{
"key_1": "value_2.1",
"key_2": "value_2.2",
"key_3": "value_2.3",
...
"key_n": "value_2.n"
}
...
file n.json:
{
"key_1": "value_n.1",
"key_2": "value_n.2",
"key_3": "value_n.3",
...
"key_n": "value_n.n"
}
I would like to know what the best structure would be for creating one big JSON file (a complete one, where each file gets a specific name made of 3 variables, the first being a timestamp (YYYYMMDDHHMMSS)), and which command or script would let me produce the merge (so far I have only written bash scripts...).
You mentioned bash, so I assume you are on *nix; you can achieve what you want with echo, cat and sed.
$ ls
file1.json file2.json merge_files.sh output
$ cat file1.json
{
"key_1": "value_1.1",
"key_2": "value_1.2",
"key_3": "value_1.3",
"key_n": "value_1.n"
}
$ ./merge_files.sh
$ cat output/out.json
{
"file1":
{
"key_1": "value_1.1",
"key_2": "value_1.2",
"key_3": "value_1.3",
"key_n": "value_1.n"
},
"file2":
{
"key_1": "value_2.1",
"key_2": "value_2.2",
"key_3": "value_2.3",
"key_n": "value_2.n"
}
}
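Once merged, you can quickly check that out.json is valid JSON, for example with Python's json.tool module (assuming Python is installed):
$ python -m json.tool output/out.json > /dev/null && echo valid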
The script below reads all the JSON files in the current folder and concatenates them into one "big" file, using the file names as keys.
#!/bin/bash
# create the output directory (if it does not exist)
mkdir -p output
# remove results from previous runs (-f so it does not fail when there are none)
rm -f output/*.json
# add the first opening bracket
echo { > output/tmp.json
# use all json files in the current folder
for i in *.json
do
    # first create the key; it is the file name without the extension
    echo \"$i\": | sed 's/\.json$//' >> output/tmp.json
    # dump the file's content
    cat "$i" >> output/tmp.json
    # add a comma afterwards
    echo , >> output/tmp.json
done
# remove the trailing comma; otherwise the result is not valid json
sed '$ s/.$//' output/tmp.json >> output/out.json
# remove the temp file
rm output/tmp.json
# add the closing bracket
echo } >> output/out.json
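Since you want the name of the big file to start with a timestamp (YYYYMMDDHHMMSS), you can build that name with date. A minimal sketch; part2 and part3 are hypothetical placeholders for your other two naming variables:
# first component: timestamp in YYYYMMDDHHMMSS format
ts=$(date +%Y%m%d%H%M%S)
# part2 and part3 are hypothetical placeholders; substitute your own values
part2="api"
part3="dump"
mv output/out.json "output/${ts}_${part2}_${part3}.json"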
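If you can install jq, it takes care of quoting and the trailing comma for you. A sketch of the same merge, assuming each input file contains exactly one JSON object:
# build one object whose keys are the file names (without .json)
jq -n 'reduce inputs as $doc ({}; . + {(input_filename | sub("\\.json$"; "")): $doc})' \
    *.json > output/out.json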