Python YAML to JSON to YAML

Tom*_* B. 6 python json yaml

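I am new to Python, so I am building a simple program to parse YAML to JSON and JSON to YAML.

The yaml2json step converts the YAML to JSON on a single line, but a JSON validator says it is correct.

This is my code so far: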

import json

import yaml


def parseyaml(inFileType, outFileType):
   infile = input('Please enter a {} filename to parse: '.format(inFileType))
   outfile = input('Please enter a {} filename to output: '.format(outFileType))

   with open(infile, 'r') as stream:
       try:
           datamap = yaml.safe_load(stream)
           with open(outfile, 'w') as output:
               json.dump(datamap, output)
       except yaml.YAMLError as exc:
           print(exc)

   print('Your file has been parsed.\n\n')


def parsejson(inFileType, outFileType):
   infile = input('Please enter a {} filename to parse: '.format(inFileType))
   outfile = input('Please enter a {} filename to output: '.format(outFileType))

   with open(infile, 'r') as stream:
       try:
           datamap = json.load(stream)
           with open(outfile, 'w') as output:
               yaml.dump(datamap, output)
       except yaml.YAMLError as exc:
           print(exc)

   print('Your file has been parsed.\n\n')

Example of the original YAML versus the new YAML

Original:

inputs:
  webTierCpu:
    type: integer
    minimum: 2
    default: 2
    maximum: 5
    title: Web Server CPU Count
    description: The number of CPUs for the Web nodes

New:

inputs:
  dbTierCpu: {default: 2, description: The number of CPUs for the DB node, maximum: 5,
    minimum: 2, title: DB Server CPU Count, type: integer}

It does not look like all of the JSON is being decoded, so I am not sure where to go next...

Ant*_*hon 4

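Your file is losing its formatting because the original dump routine by default writes all leaf nodes in YAML flow style, whereas your input is block style all the way down.

You are also losing the order of the keys, first because the JSON parser uses dict, and second because dump sorts the output.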
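
Both effects can be seen in isolation with a minimal sketch. It assumes PyYAML 5.1 or later (where dump accepts sort_keys) and passes default_flow_style=None explicitly to reproduce the flow-style leaf nodes, since newer PyYAML releases default to block style. The sample data is a shortened copy of the mapping from the question. On Python 3.7+ a plain dict keeps insertion order, so here the reordering comes from dump alone.

```python
import yaml  # PyYAML; assumed to be version 5.1+ for the sort_keys argument

data = {
    "inputs": {
        "webTierCpu": {"type": "integer", "minimum": 2, "default": 2},
    }
}

# default_flow_style=None writes leaf mappings in flow style ({...}),
# and sort_keys defaults to True, so the keys come out alphabetically.
flow_text = yaml.dump(data, default_flow_style=None)

# Forcing block style and disabling sorting keeps the original layout.
block_text = yaml.dump(data, default_flow_style=False, sort_keys=False)

print(flow_text)
print(block_text)
```

This only addresses layout and key order for plain PyYAML; it does not preserve comments or scalar styles.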

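If you look at the intermediate JSON, you will see that the key order is already gone at that point. To preserve it, load the YAML with the new API and, as a replacement for dump, use a special JSON encoder that can handle the subclasses of Mapping into which the YAML is loaded, similar to this example in the standard Python documentation.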
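
The encoder idea can be sketched on its own with only the standard library. ReadOnlyMap and MappingEncoder are hypothetical names made up for this illustration; ReadOnlyMap stands in for a loader's Mapping type that json cannot serialize natively.

```python
import json
from collections import OrderedDict
from collections.abc import Mapping


class ReadOnlyMap(Mapping):
    """Hypothetical stand-in for a loader's Mapping type that is not a dict."""

    def __init__(self, pairs):
        self._data = OrderedDict(pairs)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class MappingEncoder(json.JSONEncoder):
    def default(self, o):
        # default() is only called for objects json cannot serialize itself.
        if isinstance(o, Mapping):
            return OrderedDict(o)  # keeps the iteration order of o
        return super().default(o)


doc = ReadOnlyMap([("type", "integer"), ("minimum", 2), ("default", 2)])
text = json.dumps(doc, cls=MappingEncoder)
print(text)
```

Because default() returns an OrderedDict, the encoder writes the keys in the order the mapping yields them.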

Assuming your YAML is stored in input.yaml:

import sys
import json
from collections.abc import Mapping, Sequence
from collections import OrderedDict
import ruamel.yaml

# if you instantiate a YAML instance as yaml, you have to explicitly import the error
from ruamel.yaml.error import YAMLError


yaml = ruamel.yaml.YAML()  # this uses the new API
# if you have standard indentation, no need to use the following
yaml.indent(sequence=4, offset=2)

input_file = 'input.yaml'
intermediate_file = 'intermediate.json'
output_file = 'output.yaml'


class OrderlyJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, Mapping):
            return OrderedDict(o)
        elif isinstance(o, Sequence):
            return list(o)
        return json.JSONEncoder.default(self, o)


def yaml_2_json(in_file, out_file):
    with open(in_file, 'r') as stream:
        try:
            datamap = yaml.load(stream)
            with open(out_file, 'w') as output:
                output.write(OrderlyJSONEncoder(indent=2).encode(datamap))
        except YAMLError as exc:
            print(exc)
            return False
    return True


yaml_2_json(input_file, intermediate_file)
with open(intermediate_file) as fp:
    sys.stdout.write(fp.read())

which gives:

{
  "inputs": {
    "webTierCpu": {
      "type": "integer",
      "minimum": 2,
      "default": 2,
      "maximum": 5,
      "title": "Web Server CPU Count",
      "description": "The number of CPUs for the Web nodes"
    }
  }
}

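You can see that the JSON has the appropriate key order, which also needs to be preserved on loading. You can do that by providing object_pairs_hook, so that JSON objects are loaded into the subclass of Mapping that the YAML parser uses internally: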

from ruamel.yaml.comments import CommentedMap


def json_2_yaml(in_file, out_file):
    with open(in_file, 'r') as stream:
        try:
            datamap = json.load(stream, object_pairs_hook=CommentedMap)
            # if you need to "restore" literal style scalars, etc.
            # walk_tree(datamap)
            with open(out_file, 'w') as output:
                yaml.dump(datamap, output)
        except YAMLError as exc:  # yaml is a YAML() instance here, so use the imported class
            print(exc)
            return False
    return True


json_2_yaml(intermediate_file, output_file)
with open(output_file) as fp:
    sys.stdout.write(fp.read())

which outputs:

inputs:
  webTierCpu:
    type: integer
    minimum: 2
    default: 2
    maximum: 5
    title: Web Server CPU Count
    description: The number of CPUs for the Web nodes

I hope this is similar enough to your original input to be acceptable.
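
The effect of object_pairs_hook can be checked without ruamel.yaml. In this minimal sketch OrderedDict is a stand-in for CommentedMap (an assumption made only so the snippet runs with the standard library), and the JSON text is a shortened sample.

```python
import json
from collections import OrderedDict

text = '{"type": "integer", "minimum": 2, "default": 2, "maximum": 5}'

# object_pairs_hook receives each JSON object as a list of (key, value)
# pairs in document order and returns the container to use for it.
data = json.loads(text, object_pairs_hook=OrderedDict)

print(type(data).__name__, list(data))
```

The hook is applied to every JSON object, including nested ones, so the whole tree ends up in the chosen mapping type.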

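Notes:

  • When using the new API, I tend to use yaml as the name for the ruamel.yaml.YAML() instance, instead of doing from ruamel import yaml. That, however, masks the use of yaml.YAMLError, because the error class is not an attribute of YAML().

  • If you are developing this kind of thing, I recommend at least removing the user input from the actual functionality. It should be trivial to write your parseyaml and parsejson so that they call yaml_2_json and json_2_yaml respectively.

  • Any comments in your original YAML file will be lost, although ruamel.yaml can load them. JSON originally did allow comments, but they are not in the specification, and I know of no parser that can output comments.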
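
One way to separate the prompting from the conversion, as suggested in the second note, is sketched below. prompt_and_convert, its ask parameter, and fake_convert are hypothetical names introduced for this illustration; the only assumption about the conversion function is that it takes (in_file, out_file) and returns True on success, as yaml_2_json and json_2_yaml do.

```python
def prompt_and_convert(convert, in_type, out_type, ask=input):
    """Ask for the two filenames, then delegate to a conversion function.

    convert is expected to take (in_file, out_file) and return True on
    success. ask is injectable so the function can be tested without a
    terminal.
    """
    in_file = ask('Please enter a {} filename to parse: '.format(in_type))
    out_file = ask('Please enter a {} filename to output: '.format(out_type))
    if convert(in_file, out_file):
        print('Your file has been parsed.\n')
        return True
    return False


# Demonstration with stand-ins, so the sketch runs without any files.
answers = iter(['input.yaml', 'intermediate.json'])
calls = []


def fake_convert(in_file, out_file):  # hypothetical stub for yaml_2_json
    calls.append((in_file, out_file))
    return True


ok = prompt_and_convert(fake_convert, 'YAML', 'JSON',
                        ask=lambda prompt: next(answers))
```

With this in place, parseyaml reduces to prompt_and_convert(yaml_2_json, 'YAML', 'JSON') and parsejson to prompt_and_convert(json_2_yaml, 'JSON', 'YAML').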


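Since your real file has literal block scalars, you have to use some magic to get those back.

Include the following functions, which walk a tree, recursing into dict values and list elements, and convert any string with an embedded newline into a type that is output to YAML as a literal block style scalar. The conversion happens in place (hence no return value):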

from ruamel.yaml.scalarstring import PreservedScalarString, SingleQuotedScalarString
from collections.abc import MutableMapping, MutableSequence

# ruamel.yaml.compat.string_types was a Python 2/3 shim; on Python 3 it is just str
string_types = (str,)

def preserve_literal(s):
    return PreservedScalarString(s.replace('\r\n', '\n').replace('\r', '\n'))

def walk_tree(base):
    if isinstance(base, MutableMapping):
        for k in base:
            v = base[k]  # type: Text
            if isinstance(v, string_types):
                if '\n' in v:
                    base[k] = preserve_literal(v)
                elif '${' in v or ':' in v:
                    base[k] = SingleQuotedScalarString(v)
            else:
                walk_tree(v)
    elif isinstance(base, MutableSequence):
        for idx, elem in enumerate(base):
            if isinstance(elem, string_types):
                if '\n' in elem:
                    base[idx] = preserve_literal(elem)
                elif '${' in elem or ':' in elem:
                    base[idx] = SingleQuotedScalarString(elem)
            else:
                walk_tree(elem)

and then do

    walk_tree(datamap)

after you load the data from JSON.

With all of the above, only one line in your Wordpress.yaml file should differ.