I have a text file that looks like this:
“Distance 1: Distance XY” 1 2 4 5 9 “Distance 2: Distance XY” 3 6 8 10 5 “Distance 3: Distance XY” 88 45 36 12 4
It is all on one long line. How can I split it at each distance measurement so that the lines look more like this:
“Distance 1: Distance XY” 1 2 4 5 9
“Distance 2: Distance XY” 3 6 8 10 5
“Distance 3: Distance XY” 88 45 36 12 4
I want to do this so that I can build a dictionary for each distance measurement.
You can split the string on a regular expression with re.split:
import re
s = '\"Distance 1: Distance XY\" 1 2 4 5 9 \"Distance 2: Distance XY\" 3 6 8 10 5 \"Distance 3: Distance XY\" 88 45 36 12 4'
re.split(r'(?<=\d)\s+(?=\")', s)
# ['"Distance 1: Distance XY" 1 2 4 5 9',
# '"Distance 2: Distance XY" 3 6 8 10 5',
# '"Distance 3: Distance XY" 88 45 36 12 4']
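The pattern (?<=\d)\s+(?=\") restricts the delimiter to whitespace that sits between a digit and a quotation mark.
If the text file contains smart (curly) quotes rather than straight quotes, put the smart quote in the pattern instead (Option + [ on a Mac; the key sequence differs on Windows):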
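import re

# encoding="utf-8" is an assumption about the file; without a matching
# encoding the smart quotes may not decode to the characters in the pattern.
with open("test.txt", 'r', encoding="utf-8") as f:
    for line in f:
        print(re.split(r'(?<=\d)\s+(?=“)', line.rstrip("\n")))
# ['“Distance 1: Distance XY” 1 2 4 5 9', '“Distance 2: Distance XY” 3 6 8 10 5', '“Distance 3: Distance XY” 88 45 36 12 4']
Alternatively, use the Unicode escape \u201C for the left smart quote: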
import re

# encoding="utf-8" is an assumption about the file, as above.
with open("test.csv", 'r', encoding="utf-8") as f:
    for line in f:
        print(re.split(r'(?<=\d)\s+(?=\u201C)', line.rstrip("\n")))
# ['“Distance 1: Distance XY” 1 2 4 5 9', '“Distance 2: Distance XY” 3 6 8 10 5', '“Distance 3: Distance XY” 88 45 36 12 4']
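If it is not known in advance which kind of quote the file uses, one option is to accept both in the lookahead with a character class. This is only a sketch: the constant name SPLIT_BEFORE_QUOTE and the shortened sample strings are made up for illustration.

```python
import re

# Whitespace that follows a digit and precedes either a straight quote (")
# or a left smart quote (U+201C).
SPLIT_BEFORE_QUOTE = re.compile(r'(?<=\d)\s+(?=["\u201C])')

# Shortened sample inputs, one per quote style.
straight = '"Distance 1: Distance XY" 1 2 "Distance 2: Distance XY" 3 6'
smart = '\u201CDistance 1: Distance XY\u201D 1 2 \u201CDistance 2: Distance XY\u201D 3 6'

straight_parts = SPLIT_BEFORE_QUOTE.split(straight)
smart_parts = SPLIT_BEFORE_QUOTE.split(smart)
print(len(straight_parts), len(smart_parts))  # 2 2
```

Compiling the pattern once also avoids repeating it for every line when looping over a file.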