What is the most Pythonic way to normalize line endings in a string?

kal*_*sin 7 python newline line-breaks

Given a text string of unknown origin, what is the best way to rewrite it so that it uses a known line-ending convention?

I usually do:

lines = text.splitlines()
text = '\n'.join(lines)

...but this doesn't handle "mixed" text files with thoroughly confused conventions (yes, they still exist!).

Edit

What I'm actually doing is, of course:

'\n'.join(text.splitlines())

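...but that isn't what I'm asking about.

The total number of lines should be the same afterwards, so blank lines must not be stripped.

Test cases

Splitting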

'a\nb\n\nc\nd'
'a\r\nb\r\n\r\nc\r\nd'
'a\rb\r\rc\rd'
'a\rb\n\rc\rd'
'a\rb\r\nc\nd'
'a\nb\r\nc\rd'

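...should all produce 5 lines. In a mixed context, splitlines() assumes that '\r\n' is a single logical newline, which yields 4 lines for the last two test cases.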
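To make the behaviour described above concrete, here is a small sketch that counts the lines `str.splitlines()` reports for each of the six test strings. The names `cases` and `counts` are only illustrative.

```python
# The six test strings from the question.
cases = [
    'a\nb\n\nc\nd',
    'a\r\nb\r\n\r\nc\r\nd',
    'a\rb\r\rc\rd',
    'a\rb\n\rc\rd',
    'a\rb\r\nc\nd',
    'a\nb\r\nc\rd',
]

# Number of lines str.splitlines() reports for each case.
counts = [len(case.splitlines()) for case in cases]
print(counts)  # [5, 5, 5, 5, 4, 4]
```

The last two strings contain only three line breaks once '\r\n' is read as a single terminator, which is why they come out as 4 lines rather than 5.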

Hm, a mixed context that contains '\r\n' could be detected by comparing the results of splitlines() with those of split('\n') and/or split('\r')...
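One possible way to detect a mixed context is sketched below. Note that it swaps in a regular expression instead of comparing `splitlines()` against `split()`, and the function name `has_mixed_line_endings` is hypothetical, not something from the question.

```python
import re


def has_mixed_line_endings(text):
    """Return True if text uses more than one of CRLF, lone CR, lone LF."""
    # '\r\n' is listed first so a CRLF pair is matched as one terminator
    # instead of being counted as a CR followed by an LF.
    kinds = set(re.findall(r'\r\n|\r|\n', text))
    return len(kinds) > 1
```

For example, `has_mixed_line_endings('a\rb\r\nc\nd')` is true, while a string that uses only '\r\n' (or contains no line breaks at all) gives false.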

dot*_*mag 14

mixed.replace('\r\n', '\n').replace('\r', '\n')

This should handle all possible variants.
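As a minimal sketch, the replace chain can be wrapped in a helper. The name `normalize_newlines` and its `newline` parameter are assumptions made for illustration, not part of the answer.

```python
def normalize_newlines(text, newline='\n'):
    """Rewrite CRLF, lone CR and lone LF line endings as `newline`."""
    # Convert CRLF first so its CR is not turned into a second line break.
    unified = text.replace('\r\n', '\n').replace('\r', '\n')
    if newline == '\n':
        return unified
    return unified.replace('\n', newline)
```

Like `splitlines()`, this treats '\r\n' as a single line break, so a string such as 'a\rb\r\nc\nd' still becomes 4 lines. Unlike `'\n'.join(text.splitlines())`, it keeps a trailing line ending: `normalize_newlines('a\r\n')` returns `'a\n'`, whereas the join-based version returns `'a'`.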


Ste*_*osh 7

...but this doesn't handle "mixed" text files with thoroughly confused conventions (yes, they still exist!)

Actually, it should work fine:

>>> s = 'hello world\nline 1\r\nline 2'

>>> s.splitlines()
['hello world', 'line 1', 'line 2']

>>> '\n'.join(s.splitlines())
'hello world\nline 1\nline 2'

Which version of Python are you using?

Edit: I still don't see how splitlines() isn't working for you:

>>> s = '''\
... First line, with LF\n\
... Second line, with CR\r\
... Third line, with CRLF\r\n\
... Two blank lines with LFs\n\
... \n\
... \n\
... Two blank lines with CRs\r\
... \r\
... \r\
... Two blank lines with CRLFs\r\n\
... \r\n\
... \r\n\
... Three blank lines with a jumble of things:\r\n\
... \r\
... \r\n\
... \n\
... End without a newline.'''

>>> s.splitlines()
['First line, with LF', 'Second line, with CR', 'Third line, with CRLF', 'Two blank lines with LFs', '', '', 'Two blank lines with CRs', '', '', 'Two blank lines with CRLFs', '', '', 'Three blank lines with a jumble of things:', '', '', '', 'End without a newline.']

>>> print('\n'.join(s.splitlines()))
First line, with LF
Second line, with CR
Third line, with CRLF
Two blank lines with LFs


Two blank lines with CRs


Two blank lines with CRLFs


Three blank lines with a jumble of things:



End without a newline.

As far as I can tell, splitlines() doesn't split the list twice or anything like that.

Could you paste a sample of the kind of input that is giving you trouble?