Sar*_*rya 1 python text-processing
I have a text file in this format:
B2100 Door Driver Key Cylinder Switch Failure B2101 Head Rest Switch Circuit Failure B2102 Antenna Circuit Short to Ground, plus 1000 lines more.
This is how I want it to look:
B2100*Door Driver Key Cylinder Switch Failure
B2101*Head Rest Switch Circuit Failure
B2102*Antenna Circuit Short to Ground
B2103*Antenna Not Connected
B2104*Door Passenger Key Cylinder Switch Failure
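That way I can copy the data into LibreOffice Calc, and it will format it into two columns: the code and its meaning.
My thought process:
Apply a regular expression to Bxxxx, put an asterisk next to it (to act as the delimiter) and a \n before the meaning (I don't know whether that would work?), and remove the whitespace up to the next character encountered.
I have tried to isolate B2100 and have failed so far. My naive attempt: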
import re
text = """B2100 Door Driver Key Cylinder Switch Failure B2101 Head Rest Switch Circuit Failure B2102 Antenna Circuit Short to Ground B2103 Antenna Not Connected B2104 Door Passenger Key Cylinder Switch Failure B2105 Throttle Position Input Out of Range Low B2106 Throttle Position Input Out of Range High B2107 Front Wiper Motor Relay Circuit Short to Vbatt B2108 Trunk Key Cylinder Switch Failure"""
# text_arr = text.split("\^B[0-9][0-9][0-9][0-9]$\gi");
l = re.compile('\^B[0-9][0-9][0-9][0-9]$\gi').split(text)
print(l)
This outputs:
['B2100\tDoor Driver Key Cylinder Switch Failure B2101\tHead Rest Switch Circuit Failure B2102\tAntenna Circuit Short to Ground B2103\tAntenna Not Connected B2104\tDoor Passenger Key Cylinder Switch Failure B2105\tThrottle Position Input Out of Range Low B2106\tThrottle Position Input Out of Range High B2107\tFront Wiper Motor Relay Circuit Short to Vbatt B2108\tTrunk Key Cylinder Switch Failure']
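How do I achieve the desired result?
To break it down further, what I want to do is:
Split everything into an array of codes (B1001) and meanings (the text that follows each code), and then apply each operation (the \n thing) separately. If you have a better idea of how to do the whole thing, so much the better. I'd love to hear it.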
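Basically, you want to find each B#### code, replace the whitespace before it with a newline, and replace the whitespace after it with "*". This can all be done with a single re.sub():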
re.sub(r'\s*(B\d{4})\s*', r'\n\1*', text).strip()
The matching pattern:
\s* # Any amount of whitespace
(B\d{4}) # "B" followed by exactly 4 digits
\s* # Any amount of whitespace
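As a quick check, the annotated pattern above can be compiled almost verbatim with the re.VERBOSE flag, which ignores whitespace and # comments inside the pattern. This is only a sketch: the name CODE_PATTERN and the shortened sample string are assumptions made for illustration, not part of the original answer.

```python
import re

# Hypothetical name; the pattern itself is the one used in the re.sub() call.
CODE_PATTERN = re.compile(r"""
    \s*        # Any amount of whitespace
    (B\d{4})   # "B" followed by exactly 4 digits
    \s*        # Any amount of whitespace
""", re.VERBOSE)

# Shortened sample input (assumed for illustration).
sample = ("B2100 Door Driver Key Cylinder Switch Failure "
          "B2101 Head Rest Switch Circuit Failure")

# With a single capture group, findall() returns only the captured codes.
codes = CODE_PATTERN.findall(sample)
print(codes)  # ['B2100', 'B2101']
```

Listing the matches this way is a cheap way to confirm that the pattern picks up only the codes before using it in a substitution.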
The replacement pattern:
\n # Newline
\1 # The first parenthesized sequence from the matching pattern (B####)
* # Literal "*"
The purpose of strip() is to trim any leading or trailing whitespace, including the newline produced by the substitution of the first B#### sequence.
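Putting it together, here is a minimal end-to-end sketch on a shortened version of the sample text. The variable names formatted and pairs are assumptions for illustration, and the final split into code/meaning pairs is an optional extra that addresses the "array of codes and meanings" idea from the question.

```python
import re

# Shortened sample input taken from the question.
text = ("B2100 Door Driver Key Cylinder Switch Failure "
        "B2101 Head Rest Switch Circuit Failure "
        "B2102 Antenna Circuit Short to Ground")

# Put each code on its own line and follow it with "*";
# strip() drops the newline inserted before the first code.
formatted = re.sub(r'\s*(B\d{4})\s*', r'\n\1*', text).strip()
print(formatted)

# Optional: split every line at the first "*" into [code, meaning].
pairs = [line.split('*', 1) for line in formatted.splitlines()]
print(pairs[0])  # ['B2100', 'Door Driver Key Cylinder Switch Failure']
```

Note that this approach assumes no meaning contains text that itself looks like a code (a "B" followed by four digits); if one did, it would be split onto a new line as well.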