Remove whitespace from a txt file with Python

ays*_*sha 10 python regex whitespace python-2.7 shlex

I have a .txt file (scraped from a website as preformatted text) in which the data looks like this:

B, NICKOLAS                       CT144531X       D1026    JUDGE ANNIE WHITE JOHNSON  
ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS        

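I want to remove all the extra whitespace between the columns (these are actually varying numbers of spaces, not tabs). I also want to replace it with some delimiter (tab or pipe, since the data contains commas), like this: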

ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS

Looking around, the best options seem to be splitting with a regex or with shlex. Two similar scenarios:

tim*_*geb 7

You can apply the regex '\s{2,}' (two or more whitespace characters) to each line and replace each match with a single '|' character.

>>> import re
>>> line = 'ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS        '
>>> re.sub(r'\s{2,}', '|', line.strip())
'ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS'

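Stripping any leading and trailing whitespace from the line before applying re.sub ensures that you won't get '|' characters at the start and end of the line.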
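A small sketch of the difference, using a made-up sample line padded with spaces on both sides (the variable names are illustrative and not from the answer):

```python
import re

# Hypothetical sample line with leading and trailing padding
line = '  ANDREWS VS BALL      JA-15-0050    D0015  '

# Without strip(), the padding at both ends is also turned into '|'
without_strip = re.sub(r'\s{2,}', '|', line)
# With strip(), only the gaps between columns are replaced
with_strip = re.sub(r'\s{2,}', '|', line.strip())

print(repr(without_strip))  # '|ANDREWS VS BALL|JA-15-0050|D0015|'
print(repr(with_strip))     # 'ANDREWS VS BALL|JA-15-0050|D0015'
```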

Your actual code should look something like this:

import re
with open(filename) as f:
    for line in f:
        subbed = re.sub(r'\s{2,}', '|', line.strip())
        # do something here
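As one possible way to fill in the "do something here" step, the loop could write each converted line to a second file. This is only a sketch: the function name `convert_file`, its parameters, and the choice to skip blank lines are assumptions, not part of the answer.

```python
import re

# Two or more consecutive whitespace characters mark a column boundary
COLUMN_GAP = re.compile(r'\s{2,}')


def convert_file(src_path, dst_path, delimiter='|'):
    """Copy src_path to dst_path, replacing each column gap with delimiter."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            stripped = line.strip()
            if not stripped:
                continue  # assumption: blank lines are not wanted in the output
            dst.write(COLUMN_GAP.sub(delimiter, stripped) + '\n')
```

Calling `convert_file('input.txt', 'output.txt', delimiter='\t')` (both file names are placeholders) would produce tab-separated output instead of pipes.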


Ahs*_*que 6

How about this?

import re

your_string = 'ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS'
# Replace every run of two or more whitespace characters with a pipe
print(re.sub(r'\s{2,}', '|', your_string.strip()))

Output:

ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS

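Explanation:

I used re.sub() with 3 arguments: a pattern, the replacement string, and the string you want to process.

What I did was match runs of at least two whitespace characters, replace each run with a |, and apply that to your string.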
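If the individual columns are needed rather than one delimited string, the same pattern can be passed to re.split instead. A minimal sketch (the variable names are illustrative), which also shows joining with a tab, the other delimiter mentioned in the question:

```python
import re

line = 'ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS'

# Split on runs of two or more whitespace characters to get the columns as a list
fields = re.split(r'\s{2,}', line.strip())
print(fields)
# ['ANDREWS VS BALL', 'JA-15-0050', 'D0015', 'JUDGE EDWARD A ROBERTS']

# Join with whichever delimiter suits the data, here a tab
tab_separated = '\t'.join(fields)
print(tab_separated)
```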


Vel*_*uha 5

s = """B, NICKOLAS                       CT144531X       D1026    JUDGE ANNIE WHITE JOHNSON  
ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS
"""

import re

# Update: only replace runs of two or more spaces that sit between non-space characters
print(re.sub(r"(\S)\ {2,}(\S)(\n?)", r"\1|\2\3", s))

Output:

B, NICKOLAS|CT144531X|D1026|JUDGE ANNIE WHITE JOHNSON  
ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS
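One caveat with this pattern: it consumes the non-space character on each side of the gap, so a one-character column sitting between two gaps is only separated on one side. A possible variant, which is not part of the original answer, uses lookarounds so the neighbouring characters are checked but not consumed. The sample string below is made up for illustration:

```python
import re

# Hypothetical input: the first line has a single-character middle column
s = "A  B  C\nANDREWS VS BALL      JA-15-0050    D0015  \n"

# Original pattern: the 'B' is consumed by the first match, so the second gap is missed
consuming = re.sub(r"(\S)\ {2,}(\S)(\n?)", r"\1|\2\3", s)

# Lookaround variant: neighbours are asserted, not consumed
lookaround = re.sub(r"(?<=\S) {2,}(?=\S)", "|", s)

print(repr(consuming))   # 'A|B  C\nANDREWS VS BALL|JA-15-0050|D0015  \n'
print(repr(lookaround))  # 'A|B|C\nANDREWS VS BALL|JA-15-0050|D0015  \n'
```

Both versions leave trailing spaces at the end of a line untouched, as in the output above.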