How can I control what scalar form PyYAML uses for my data?

Ned*_*der 21 python yaml pyyaml

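I have an object with a short string attribute and a long multi-line string attribute. I want to write the short string as a YAML quoted scalar and the multi-line string as a literal scalar: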

my_obj.short = "Hello"
my_obj.long = "Line1\nLine2\nLine3"

I'd like the YAML to look like this:

short: "Hello"
long: |
  Line1
  Line2
  Line3

How can I instruct PyYAML to do this? If I call yaml.dump(my_obj), it produces a dict-like output:

{long: 'line1

    line2

    line3

    ', short: Hello}

(Not sure why long is double-spaced like that...)

Can I dictate to PyYAML how to treat my attributes? I'd like to affect both the order and the style.

jfs*_*jfs 20

Based on "Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?":

import yaml
from collections import OrderedDict

class quoted(str):
    pass

def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')
yaml.add_representer(quoted, quoted_presenter)

class literal(str):
    pass

def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
yaml.add_representer(literal, literal_presenter)

def ordered_dict_presenter(dumper, data):
    return dumper.represent_dict(data.items())
yaml.add_representer(OrderedDict, ordered_dict_presenter)

d = OrderedDict(short=quoted("Hello"), long=literal("Line1\nLine2\nLine3\n"))

print(yaml.dump(d))

Output

short: "Hello"
long: |
  Line1
  Line2
  Line3

  • Is there any way to do this so that it doesn't affect global yaml state, but does affect a single call to `dump()`? (2 upvotes)
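
One possible way to address the comment above is to register the representers on a Dumper subclass instead of globally. This is only a sketch: the names `ScopedDumper`, `Quoted`, and `Literal` are made up for illustration, and the `sort_keys` argument assumes PyYAML 5.1 or newer.

```python
import yaml

class Quoted(str):
    """Marker type: dump as a double-quoted scalar."""

class Literal(str):
    """Marker type: dump as a literal block scalar."""

class ScopedDumper(yaml.SafeDumper):
    """Hypothetical Dumper subclass; representers added here stay local to it."""

def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')

def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')

# add_representer is a classmethod; calling it on the subclass copies the
# parent's representer table first, so yaml.SafeDumper itself is not modified.
ScopedDumper.add_representer(Quoted, quoted_presenter)
ScopedDumper.add_representer(Literal, literal_presenter)

data = {'short': Quoted("Hello"), 'long': Literal("Line1\nLine2\nLine3\n")}

# sort_keys=False keeps the insertion order of the dict (PyYAML >= 5.1).
scoped_output = yaml.dump(data, Dumper=ScopedDumper, sort_keys=False)
print(scoped_output)
```

Only calls that pass `Dumper=ScopedDumper` pick up the custom styles; a plain `yaml.dump(...)` elsewhere in the program behaves as before.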

xen*_*soz 19

Having fallen in love with @lbt's approach, I ended up with this code:

import yaml

def str_presenter(dumper, data):
  if len(data.splitlines()) > 1:  # check for multiline string
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
  return dumper.represent_scalar('tag:yaml.org,2002:str', data)

yaml.add_representer(str, str_presenter)

It makes every multiline string a block literal.

I was trying to avoid the monkey-patching part. Full credit to @lbt and @JFSebastian.

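  • Simply testing `if '\n' in data` instead of using `splitlines` is cheaper and does the same thing. (4 upvotes)
  • A nice approach that avoids explicitly marking the input strings. You can use `is_multiline = lambda s: len(s.splitlines()) > 1` to recognize Unicode newlines automatically and to not return true for a single line. (2 upvotes)
  • Hmm, `style='|'` doesn't seem to have any effect in pyyaml (2 upvotes)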
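
The two tests suggested in the comments are close but not identical. A small sketch showing where they disagree (the function names `has_newline` and `is_multiline` are just illustrative):

```python
def has_newline(s):
    # Cheap test from the first comment: only looks for "\n".
    return '\n' in s

def is_multiline(s):
    # splitlines() also splits on other line boundaries such as "\r" and "\u2028",
    # and a single line with a trailing newline still counts as one line.
    return len(s.splitlines()) > 1

samples = ["one line", "one line\n", "two\nlines", "two\u2028lines"]
for s in samples:
    print(repr(s), has_newline(s), is_multiline(s))
```

Which one fits depends on whether a single line ending in a newline should become a block literal, and whether line separators other than `\n` matter for the data being dumped.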

lbt*_*lbt 12

I wanted any input containing a \n to be a block literal. Using the code in yaml/representer.py as a base, I got:

# -*- coding: utf-8 -*-
import yaml

def should_use_block(value):
    for c in u"\u000a\u000d\u001c\u001d\u001e\u0085\u2028\u2029":
        if c in value:
            return True
    return False

def my_represent_scalar(self, tag, value, style=None):
    if style is None:
        if should_use_block(value):
            style = '|'
        else:
            style = self.default_style

    node = yaml.representer.ScalarNode(tag, value, style=style)
    if self.alias_key is not None:
        self.represented_objects[self.alias_key] = node
    return node


a={'short': "Hello", 'multiline': """Line1
Line2
Line3
""", 'multiline-unicode': u"""Lêne1
Lêne2
Lêne3
"""}

print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))
yaml.representer.BaseRepresenter.represent_scalar = my_represent_scalar
print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))

Output

{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n", short: Hello}

{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: 'Lêne1

    Lêne2

    Lêne3

    ', short: Hello}

After override

multiline: |
  Line1
  Line2
  Line3
multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n"
short: Hello

multiline: |
  Line1
  Line2
  Line3
multiline-unicode: |
  Lêne1
  Lêne2
  Lêne3
short: Hello


Ant*_*hon 5

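You can use ruamel.yaml and its RoundTripLoader/Dumper (disclaimer: I am the author of that package). Apart from doing what you want, it supports the YAML 1.2 specification (from 2009) and has several other improvements: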

import sys
from ruamel.yaml import YAML

yaml_str = """\
short: "Hello"  # does keep the quotes, but need to tell the loader
long: |
  Line1
  Line2
  Line3
folded: >
  some like
  explicit folding
  of scalars
  for readability
"""

yaml = YAML()
yaml.preserve_quotes = True
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)

gives:

short: "Hello"  # does keep the quotes, but need to tell the loader
long: |
  Line1
  Line2
  Line3
folded: >
  some like
  explicit folding
  of scalars
  for readability

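(including the comment, starting in the same column as before)

You can also create this output starting from scratch, but then you do need to provide the extra information, e.g. the explicit positions where the text should be folded.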


bri*_*ian 5

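It is worth noting that pyyaml does not allow trailing whitespace in block scalars and will force such content into double-quoted format. It seems a lot of people have run into this. If you don't care about being able to round-trip your data, this will strip out that trailing whitespace: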

def str_presenter(dumper, data):
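    if len(data.splitlines()) > 1 or '\n' in data:
        # Strip trailing whitespace from every line so block style is allowed.
        text_list = [line.rstrip() for line in data.splitlines()]
        # Note: joining this way also drops a final newline, if there was one.
        fixed_data = "\n".join(text_list)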
        return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

yaml.add_representer(str, str_presenter)
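
To see the fallback described above, here is a minimal sketch that dumps the same text with and without a trailing space on one line. The names `BlockDumper` and `block_presenter` are hypothetical, and the representer is attached to a Dumper subclass only to keep the experiment from touching global state.

```python
import yaml

class BlockDumper(yaml.SafeDumper):
    """Hypothetical Dumper subclass so the representer stays local."""

def block_presenter(dumper, data):
    # Ask for literal block style whenever the string contains a newline.
    if '\n' in data:
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

BlockDumper.add_representer(str, block_presenter)

with_trailing = "Line1 \nLine2\n"   # note the space before the first newline
without_trailing = "Line1\nLine2\n"

dumped_trailing = yaml.dump({'text': with_trailing}, Dumper=BlockDumper)
dumped_clean = yaml.dump({'text': without_trailing}, Dumper=BlockDumper)
print(dumped_trailing)   # falls back to a double-quoted scalar
print(dumped_clean)      # uses the requested literal block
```

Both outputs load back to the original strings; only the presentation differs, which may also explain the comment reporting that `style='|'` seemed to have no effect.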