nom*_*ype · 5 · python, indentation, literals, python-3.x, python-unicode
When I use triple-quoted multiline strings in Python, I tend to use textwrap.dedent to keep the code readable and nicely indented:
import textwrap

some_string = textwrap.dedent("""
    First line
    Second line
    ...
    """).strip()
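However, in Python 3.x textwrap.dedent does not seem to work with byte strings. I ran into this while writing a unit test for a method that returns a long multiline byte string, for example: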
# The function to be tested
def some_function():
    return b'Lorem ipsum dolor sit amet\n consectetuer adipiscing elit'

# Unit test
import unittest
import textwrap

class SomeTest(unittest.TestCase):
    def test_some_function(self):
        self.assertEqual(some_function(), textwrap.dedent(b"""
            Lorem ipsum dolor sit amet
             consectetuer adipiscing elit
            """).strip())

if __name__ == '__main__':
    unittest.main()
This code works fine in Python 2.7.10, but in Python 3.4.3 it fails with a TypeError raised by the regular expressions textwrap.dedent uses internally: they are str patterns, and Python 3 refuses to apply them to a bytes object.
So: is there an alternative to textwrap.dedent that works with byte strings?
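Answer 2: textwrap is mainly about the TextWrapper class and its convenience functions. In the module source, dedent is listed under the comment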
# -- Loosely related functionality --------------------
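As far as I can tell, the only thing that makes it specific to text (unicode str) is its string literals. I prefixed all 6 of them with b, and voilà! (I didn't edit anything else, but the function's docstring should be adjusted.)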
import re

_whitespace_only_re = re.compile(b'^[ \t]+$', re.MULTILINE)
_leading_whitespace_re = re.compile(b'(^[ \t]*)(?:[^ \t\n])', re.MULTILINE)

def dedent_bytes(text):
    """Remove any common leading whitespace from every line in `text`.

    This can be used to make triple-quoted strings line up with the left
    edge of the display, while still presenting them in the source code
    in indented form.

    Note that tabs and spaces are both treated as whitespace, but they
    are not equal: the lines " hello" and "\\thello" are
    considered to have no common leading whitespace.  (This behaviour is
    new in Python 2.5; older versions of this module incorrectly
    expanded tabs before searching for common leading whitespace.)
    """
    # Look for the longest leading string of spaces and tabs common to
    # all lines.
    margin = None
    text = _whitespace_only_re.sub(b'', text)
    indents = _leading_whitespace_re.findall(text)
    for indent in indents:
        if margin is None:
            margin = indent

        # Current line more deeply indented than previous winner:
        # no change (previous winner is still on top).
        elif indent.startswith(margin):
            pass

        # Current line consistent with and no deeper than previous winner:
        # it's the new winner.
        elif margin.startswith(indent):
            margin = indent

        # Find the largest common whitespace between current line
        # and previous winner.
        else:
            for i, (x, y) in enumerate(zip(margin, indent)):
                if x != y:
                    margin = margin[:i]
                    break
            else:
                margin = margin[:len(indent)]

    # sanity check (testing/debugging only)
    if 0 and margin:
        for line in text.split(b"\n"):
            assert not line or line.startswith(margin), \
                   "line = %r, margin = %r" % (line, margin)

    if margin:
        text = re.sub(rb'(?m)^' + margin, b'', text)
    return text

print(dedent_bytes(b"""
            Lorem ipsum dolor sit amet
             consectetuer adipiscing elit
            """)
      )
# prints
b'\nLorem ipsum dolor sit amet\n consectetuer adipiscing elit\n'
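The claim that the string literals are the only text-specific part rests on how Python 3's re module treats types: a str pattern cannot be applied to a bytes subject, while the same pattern written as a bytes literal (with a bytes replacement) works. A minimal check of that behaviour; the sample input is made up for illustration:

```python
import re

# A str pattern, like the ones Answer 2 converted in textwrap's source
str_pattern = re.compile('^[ \t]+$', re.MULTILINE)
# The same pattern as a bytes literal, as in dedent_bytes above
bytes_pattern = re.compile(b'^[ \t]+$', re.MULTILINE)

# Illustrative input: the middle line contains only spaces
data = b"abc\n    \ndef"

str_pattern_failed = False
try:
    str_pattern.sub('', data)
except TypeError as exc:
    # re refuses to apply a str pattern to a bytes subject
    str_pattern_failed = True
    print("str pattern failed:", exc)

# With a bytes pattern and a bytes replacement the substitution works
cleaned = bytes_pattern.sub(b'', data)
print(cleaned)  # b'abc\n\ndef'
```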
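Unfortunately, it seems dedent does not support byte strings. However, if you want code that works on both Python 2 and Python 3, I suggest using the six library: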
import sys, unittest
from textwrap import dedent

import six


def some_function():
    return b'Lorem ipsum dolor sit amet\n consectetuer adipiscing elit'


class SomeTest(unittest.TestCase):
    def test_some_function(self):
        actual = some_function()
        expected = six.b(dedent("""
            Lorem ipsum dolor sit amet
             consectetuer adipiscing elit
            """)).strip()
        self.assertEqual(actual, expected)


if __name__ == '__main__':
    unittest.main()
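If only Python 3 matters, the six dependency can be dropped: on Python 3, six.b(s) simply returns s.encode("latin-1"), so the expected value can be built by dedenting a str and encoding it directly. A sketch of the same expected value without six (some_function is the one from the question):

```python
import textwrap


def some_function():
    return b'Lorem ipsum dolor sit amet\n consectetuer adipiscing elit'


# Dedent as str first, then encode; latin-1 is what six.b() uses on Python 3
expected = textwrap.dedent("""
    Lorem ipsum dolor sit amet
     consectetuer adipiscing elit
    """).encode('latin-1').strip()

print(some_function() == expected)  # True
```

Inside the test method this becomes self.assertEqual(some_function(), expected).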
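This is similar to what you suggest in your question:
"I could convert to unicode, use textwrap.dedent, and then convert back to bytes. But this is only feasible if the byte string conforms to some Unicode encoding."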
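But you are misunderstanding something about encodings here. If you can write the string literal in your test like this in the first place, and Python parses the file successfully (i.e. the module carries the correct encoding declaration), then there is no "convert to unicode" step. The file is decoded using the declared encoding (or the default source encoding, UTF-8 on Python 3 and ASCII on Python 2, if none is declared), so by the time the string is a Python variable it has already been decoded.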
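For bytes that do not come from a source literal, the concern quoted above (that the round trip needs bytes valid in some Unicode encoding) can be sidestepped by decoding as latin-1: it maps every byte 0x00 to 0xFF to the code point with the same value, so decoding and re-encoding is lossless for arbitrary bytes, and the margin dedent removes (spaces and tabs) is plain ASCII. A sketch under that assumption; dedent_bytes_via_latin1 is a hypothetical name, not part of textwrap:

```python
import textwrap


def dedent_bytes_via_latin1(data: bytes) -> bytes:
    """Hypothetical helper: dedent bytes by round-tripping through latin-1.

    latin-1 maps each byte to the code point with the same value, so
    decode/encode preserves any byte sequence, valid UTF-8 or not.
    """
    text = data.decode('latin-1')
    return textwrap.dedent(text).encode('latin-1')


# Non-UTF-8 bytes (\xe9, \xff, \xfe) survive unchanged; only the margin is removed
raw = b"""
    caf\xe9
     \xff\xfe binary-ish
    """
print(dedent_bytes_via_latin1(raw))  # b'\ncaf\xe9\n \xff\xfe binary-ish\n'
```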
Views: 1956