Do any YAML libraries in Python support dumping long strings as block literals or folded blocks?

gui*_*ism 19 python yaml pyyaml

I'd like to be able to dump a dictionary containing long strings that I want in block style for readability. For example:

foo: |
  this is a
  block literal
bar: >
  this is a
  folded block

PyYAML supports loading documents in this style, but I can't seem to find a way to dump documents this way. Am I missing something?
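To make the loading half of that statement concrete, here is a minimal sketch, assuming Python 3 and an installed PyYAML (the variable names `document` and `loaded` are only illustrative). It loads the document above and shows how the two block styles differ once parsed: the literal block keeps its line break, while the folded block joins its lines with a space.

```python
import yaml  # PyYAML, assumed to be installed

# The example document from the question.
document = """\
foo: |
  this is a
  block literal
bar: >
  this is a
  folded block
"""

loaded = yaml.safe_load(document)

# Literal style preserves the inner newline; folded style replaces it with a space.
print(repr(loaded["foo"]))
print(repr(loaded["bar"]))
```

Both values end in a single newline because neither block header carries a chomping indicator.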

Gar*_*rwe 25

import yaml

class folded_unicode(unicode): pass
class literal_unicode(unicode): pass

def folded_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
def literal_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')

yaml.add_representer(folded_unicode, folded_unicode_representer)
yaml.add_representer(literal_unicode, literal_unicode_representer)

data = {
    'literal':literal_unicode(
        u'by hjw              ___\n'
         '   __              /.-.\\\n'
         '  /  )_____________\\\\  Y\n'
         ' /_ /=== == === === =\\ _\\_\n'
         '( /)=== == === === == Y   \\\n'
         ' `-------------------(  o  )\n'
         '                      \\___/\n'),
    'folded': folded_unicode(
        u'It removes all ordinary curses from all equipped items. '
        'Heavy or permanent curses are unaffected.\n')}

print yaml.dump(data)

Result:

folded: >
  It removes all ordinary curses from all equipped items. Heavy or permanent curses
  are unaffected.
literal: |
  by hjw              ___
     __              /.-.\
    /  )_____________\\  Y
   /_ /=== == === === =\ _\_
  ( /)=== == === === == Y   \
   `-------------------(  o  )
                        \___/

For completeness, there should also be str implementations, but I'm going to be lazy :-)
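For readers on Python 3, where `str` is already Unicode and the `unicode` type no longer exists, the same idea might look roughly like the sketch below. The class names `folded_str` and `literal_str` and the sample strings are illustrative choices, not part of the code above.

```python
import yaml  # PyYAML, assumed to be installed

# Marker subclasses: they behave like str but let us pick a YAML style per value.
class folded_str(str): pass
class literal_str(str): pass

def folded_str_representer(dumper, data):
    # Emit as a folded block scalar (">").
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='>')

def literal_str_representer(dumper, data):
    # Emit as a literal block scalar ("|").
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')

yaml.add_representer(folded_str, folded_str_representer)
yaml.add_representer(literal_str, literal_str_representer)

data = {
    'literal': literal_str('first line\nsecond line\n'),
    'folded': folded_str('a short folded value\n'),
}

text = yaml.dump(data)
print(text)
```

Loading `text` back with `yaml.safe_load` should give plain `str` values with the same content, since both scalars still carry the standard string tag.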

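  • At first glance I thought this was a penis. What exactly am I looking at? (10 upvotes)
  • It's a paper scroll, copied and pasted from the PyYAML documentation. (2 upvotes)
  • Thank you! One note for Python 3: "unicode" in the class definitions should be replaced with "str". (2 upvotes)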

dno*_*zay 18

PyYAML does support dumping literal or folded blocks.

Using Representer.add_representer

Define the types:

class folded_str(str): pass

class literal_str(str): pass

class folded_unicode(unicode): pass

class literal_unicode(unicode): pass

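Then you can define representers for those types. Note that while Gary's solution works well for unicode, you may need some extra work to get byte strings to behave correctly (see the representer implementation).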

def change_style(style, representer):
    def new_representer(dumper, data):
        scalar = representer(dumper, data)
        scalar.style = style
        return scalar
    return new_representer

import yaml
from yaml.representer import SafeRepresenter

# represent_str does handle some corner cases, so use that
# instead of calling represent_scalar directly
represent_folded_str = change_style('>', SafeRepresenter.represent_str)
represent_literal_str = change_style('|', SafeRepresenter.represent_str)
represent_folded_unicode = change_style('>', SafeRepresenter.represent_unicode)
represent_literal_unicode = change_style('|', SafeRepresenter.represent_unicode)

Then you can add those representers to the default dumper:

yaml.add_representer(folded_str, represent_folded_str)
yaml.add_representer(literal_str, represent_literal_str)
yaml.add_representer(folded_unicode, represent_folded_unicode)
yaml.add_representer(literal_unicode, represent_literal_unicode)

... and test it:

data = {
    'foo': literal_str('this is a\nblock literal'),
    'bar': folded_unicode('this is a folded block'),
}

print yaml.dump(data)

Result:

bar: >-
  this is a folded block
foo: |-
  this is a
  block literal

Using default_style

If you want all your strings to follow one default style, you can also use the default_style keyword argument, for example:

>>> data = { 'foo': 'line1\nline2\nline3' }
>>> print yaml.dump(data, default_style='|')
"foo": |-
  line1
  line2
  line3

Or folded blocks:

>>> print yaml.dump(data, default_style='>')
"foo": >-
  line1

  line2

  line3

Or double-quoted scalars:

>>> print yaml.dump(data, default_style='"')
"foo": "line1\nline2\nline3"

Caveats:

Here are some examples of behavior you might not expect:

data = {
    'foo': literal_str('this is a\nblock literal'),
    'bar': folded_unicode('this is a folded block'),
    'non-printable': literal_unicode('this has a \t tab in it'),
    'leading': literal_unicode('   with leading white spaces'),
    'trailing': literal_unicode('with trailing white spaces  '),
}
print yaml.dump(data)

This results in:

bar: >-
  this is a folded block
foo: |-
  this is a
  block literal
leading: |2-
     with leading white spaces
non-printable: "this has a \t tab in it"
trailing: "with trailing white spaces  "

1) Non-printable characters

For escaped characters, see the YAML spec (Section 5.7):

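Note that escape sequences are only interpreted in double-quoted scalars. In all other scalar styles, the "\" character has no special meaning and non-printable characters are not available.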

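If you want to preserve non-printable characters (e.g. TAB), you need to use double-quoted scalars. If you are able to dump a scalar in literal style while it contains a non-printable character (e.g. TAB), your YAML dumper is non-compliant.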

For example, pyyaml detects the non-printable character \t and uses the double-quoted style even when a default style is specified:

>>> data = { 'foo': 'line1\nline2\n\tline3' }
>>> print yaml.dump(data, default_style='"')
"foo": "line1\nline2\n\tline3"

>>> print yaml.dump(data, default_style='>')
"foo": "line1\nline2\n\tline3"

>>> print yaml.dump(data, default_style='|')
"foo": "line1\nline2\n\tline3"
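The same fallback can be checked on Python 3 with a small sketch like the following, assuming an installed PyYAML. The names `plain`, `with_tab`, and `outputs` are illustrative. It dumps one value without a tab and one with a tab under both block styles, so the two behaviors can be compared side by side.

```python
import yaml  # PyYAML, assumed to be installed

plain = {'foo': 'line1\nline2\nline3'}
with_tab = {'foo': 'line1\nline2\n\tline3'}

# Without a tab, the requested block style is honored for the value.
block_output = yaml.dump(plain, default_style='|')
print(block_output, end='')

# With a tab, the emitter falls back to double quotes for both block styles.
outputs = {}
for style in ('|', '>'):
    outputs[style] = yaml.dump(with_tab, default_style=style)
    print(outputs[style], end='')
```

In either case the dumped text should load back to the original dictionary, so the fallback changes only the presentation, not the data.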

2) Leading and trailing white space

Another useful bit of information from the spec is:

All leading and trailing white space characters are excluded from the content.

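This means that if your string does have leading or trailing white space, it would not be preserved in any scalar style other than double-quoted. As a consequence, pyyaml tries to detect what is in your scalar and may force the double-quoted style.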

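  • You get the strip chomping indicator (the dash after "|" and ">") because your strings do not end in a newline, whereas the OP's YAML has a single trailing newline on both scalars. Simply adding a newline solves that. Your folded scalar is also not folded where the OP expected it to be, so everything ends up on one line (because the line width is large enough). That difference is not so easy to resolve in PyYAML. (3 upvotes)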
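The chomping point from the comment above can be illustrated with a short sketch, assuming Python 3 and an installed PyYAML. It reuses the `change_style` helper and `literal_str` class from the answer; the variable names `without_newline` and `with_newline` are illustrative.

```python
import yaml  # PyYAML, assumed to be installed
from yaml.representer import SafeRepresenter

class literal_str(str): pass

def change_style(style, representer):
    # Wrap an existing representer and override the style of the node it returns.
    def new_representer(dumper, data):
        scalar = representer(dumper, data)
        scalar.style = style
        return scalar
    return new_representer

yaml.add_representer(literal_str, change_style('|', SafeRepresenter.represent_str))

# No trailing newline: the emitter adds the strip indicator ("|-").
without_newline = yaml.dump({'foo': literal_str('this is a\nblock literal')})

# One trailing newline: the plain "|" header is enough.
with_newline = yaml.dump({'foo': literal_str('this is a\nblock literal\n')})

print(without_newline)
print(with_newline)
```

The indicator is not cosmetic: it records whether the final newline belongs to the value, so both forms load back to exactly the string that was dumped.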