kal*_*sin 7 python newline line-breaks
Given a text string of unknown origin, what is the best way to rewrite it so that it uses a known line-ending convention?
I usually do this:
lines = text.splitlines()
text = '\n'.join(lines)
...but this doesn't handle "mixed" text files with utterly confused conventions (yes, they still exist!).
What I'm doing can of course be written as:
'\n'.join(text.splitlines())
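...that's not what I'm asking about.
The total number of lines should be the same afterwards, so empty lines must not be stripped.
Splitting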
'a\nb\n\nc\nd'
'a\r\nb\r\n\r\nc\r\nd'
'a\rb\r\rc\rd'
'a\rb\n\rc\rd'
'a\rb\r\nc\nd'
'a\nb\r\nc\rd'
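...should all result in 5 lines. In a mixed context, splitlines() assumes that '\r\n' is a single logical newline, which leaves the last two test cases with 4 lines.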
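To make the discrepancy concrete, here is a small check of how many lines splitlines() reports for each of the six test strings. The names `cases` and `line_counts` are only illustrative.

```python
# The six test strings from the question.
cases = [
    'a\nb\n\nc\nd',
    'a\r\nb\r\n\r\nc\r\nd',
    'a\rb\r\rc\rd',
    'a\rb\n\rc\rd',
    'a\rb\r\nc\nd',
    'a\nb\r\nc\rd',
]

# splitlines() treats '\r\n' as one line break, so the last two
# strings come out one line short of the expected five.
line_counts = [len(case.splitlines()) for case in cases]
print(line_counts)  # [5, 5, 5, 5, 4, 4]
```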
Hm, a mixed context that contains '\r\n' can be detected by comparing the result of splitlines() with that of split('\n') and/or split('\r')...
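One possible way to detect a mixed file is sketched below. Note that it swaps in a different technique from the one suggested above: instead of comparing split results, it counts each terminator directly, which avoids the edge case where splitlines() drops a trailing empty string. The helper names `count_line_endings` and `is_mixed` are hypothetical, and '\r\n' is counted as a single terminator here.

```python
def count_line_endings(text):
    """Count CRLF pairs, lone CRs and lone LFs in text."""
    crlf = text.count('\r\n')
    # Every CRLF pair contributes one '\r' and one '\n',
    # so subtract the pairs to get the lone terminators.
    cr = text.count('\r') - crlf
    lf = text.count('\n') - crlf
    return {'CRLF': crlf, 'CR': cr, 'LF': lf}


def is_mixed(text):
    """Return True if more than one line-ending style occurs."""
    counts = count_line_endings(text)
    return sum(1 for n in counts.values() if n) > 1


print(count_line_endings('a\nb\r\nc\rd'))
print(is_mixed('a\r\nb\r\n\r\nc\r\nd'))  # False: CRLF only
print(is_mixed('a\nb\r\nc\rd'))          # True: LF, CRLF and CR
```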
dot*_*mag 14
mixed.replace('\r\n', '\n').replace('\r', '\n')
This should handle all possible variants.
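As a minimal sketch, the same replace chain can be wrapped in a helper that also allows a target convention other than '\n'. The function name `normalize_newlines` and its `newline` parameter are assumptions made for illustration, not part of the answer.

```python
def normalize_newlines(text, newline='\n'):
    """Rewrite every CRLF, CR and LF in text as `newline`."""
    # Collapse CRLF first so that its '\r' is not converted a second time.
    unified = text.replace('\r\n', '\n').replace('\r', '\n')
    if newline == '\n':
        return unified
    return unified.replace('\n', newline)


print(repr(normalize_newlines('a\rb\n\rc\rd')))      # 'a\nb\n\nc\nd'
print(repr(normalize_newlines('a\rb\r\nc\nd')))      # 'a\nb\nc\nd'
print(repr(normalize_newlines('a\nb\rc', '\r\n')))   # 'a\r\nb\r\nc'
```

Like splitlines(), this treats '\r\n' as one line break, so the last two test strings from the question come out as 4 lines rather than 5. If every '\r' and every '\n' should count as its own line break, a single text.replace('\r', '\n') gives 5 lines for all six strings.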
...but this doesn't handle "mixed" text files with utterly confused conventions (yes, they still exist!)
Actually, it should work fine:
>>> s = 'hello world\nline 1\r\nline 2'
>>> s.splitlines()
['hello world', 'line 1', 'line 2']
>>> '\n'.join(s.splitlines())
'hello world\nline 1\nline 2'
Which version of Python are you using?
EDIT: I still don't see how splitlines() isn't working for you:
>>> s = '''\
... First line, with LF\n\
... Second line, with CR\r\
... Third line, with CRLF\r\n\
... Two blank lines with LFs\n\
... \n\
... \n\
... Two blank lines with CRs\r\
... \r\
... \r\
... Two blank lines with CRLFs\r\n\
... \r\n\
... \r\n\
... Three blank lines with a jumble of things:\r\n\
... \r\
... \r\n\
... \n\
... End without a newline.'''
>>> s.splitlines()
['First line, with LF', 'Second line, with CR', 'Third line, with CRLF', 'Two blank lines with LFs', '', '', 'Two blank lines with CRs', '', '', 'Two blank lines with CRLFs', '', '', 'Three blank lines with a jumble of things:', '', '', '', 'End without a newline.']
>>> print('\n'.join(s.splitlines()))
First line, with LF
Second line, with CR
Third line, with CRLF
Two blank lines with LFs


Two blank lines with CRs


Two blank lines with CRLFs


Three blank lines with a jumble of things:



End without a newline.
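As far as I can tell, splitlines() doesn't split the list twice or anything like that.
Can you paste a sample of the kind of input that is giving you trouble?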