Parsing escaped characters

Mar*_*iet 2 python parsing

Using Python, I'm trying to parse a string like this:

"hello" "I am an example" "the man said:\"hello!\""

into these tokens:

1) hello
2) I am an example
3) the man said: "hello!"

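Something like re.findall(r'"[^"]*"', str) comes close, but it can't handle the escape character (\). I'm curious what Pythonic ways there are to handle escaped characters without resorting to a for loop or a large parser package.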
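
To make the failure concrete, here is a small check of the simple pattern against the example input. The variable names `s` and `naive` are only for illustration.

```python
import re

s = r'"hello" "I am an example" "the man said:\"hello!\""'

# [^"]* stops at the first double quote it meets, whether or not
# that quote is preceded by a backslash.
naive = re.findall(r'"[^"]*"', s)
for match in naive:
    print(match)
```

The third string is cut off at the first escaped quote, and the remainder produces a stray empty match, so four matches come back instead of three.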

Tim*_*ker 5

This is a good fit for a regular expression:

re.findall(r'"(?:\\.|[^"\\])*"', str)

Explanation:

"        # Match a "
(?:      # Match either...
 \\.     # an escaped character (\\, \" etc.)
|        # or
 [^"\\]  # any character except " or \
)*       # any number of times
"        # Match a "

This handles escaped backslashes correctly:

>>> import re
>>> test = r'"hello" "Hello\\" "I am an example" "the man said:\"hello!\\\""'
>>> for match in re.findall(r'"(?:\\.|[^"\\])*"', test):
...     print(match)
...
"hello"
"Hello\\"
"I am an example"
"the man said:\"hello!\\\""
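
The matches above still include the surrounding quotes and the backslashes, while the question asks for the unquoted, unescaped tokens. A minimal sketch of one way to get there, assuming a backslash simply means "take the next character literally" (so `\n` would become `n`, not a newline). The names `TOKEN` and `parse_quoted` are hypothetical helpers, not part of the answer above.

```python
import re

# Same pattern as above, with a capture group around the string body
# so that findall returns the text between the quotes.
TOKEN = re.compile(r'"((?:\\.|[^"\\])*)"')

def parse_quoted(text):
    """Return the bodies of double-quoted strings with escapes removed."""
    # \\(.) matches a backslash plus the escaped character;
    # the replacement keeps only the escaped character.
    return [re.sub(r'\\(.)', r'\1', body) for body in TOKEN.findall(text)]

for token in parse_quoted(r'"hello" "I am an example" "the man said:\"hello!\""'):
    print(token)
```

If the input uses a richer escape syntax (for example `\n` or `\t` with their usual meanings), the substitution step would need to be replaced with something that knows those rules.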


Ned*_*der 5

You can use the Python tokenizer:

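import io
import tokenize

s = r'"hello" "I am an example" "the man said:\"hello!\""'
# generate_tokens expects a readline callable that returns str lines
for tok in tokenize.generate_tokens(io.StringIO(s).readline):
    # skip the NEWLINE and ENDMARKER tokens; keep only string literals
    if tok.type == tokenize.STRING:
        print(tok.string)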

Prints:

"hello"
"I am an example"
"the man said:\"hello!\""

This assumes you actually want Python's syntax for strings.
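
If Python string syntax is indeed what you want, the tokens can also be turned into their actual values. A sketch, assuming Python 3 and that the input contains only plain string literals; `python_string_tokens` is a hypothetical helper name, and `ast.literal_eval` is used here to evaluate each literal safely.

```python
import ast
import io
import tokenize

def python_string_tokens(text):
    """Return the values of the Python string literals found in text."""
    tokens = tokenize.generate_tokens(io.StringIO(text).readline)
    # literal_eval strips the quotes and applies Python's own escape rules.
    return [ast.literal_eval(tok.string)
            for tok in tokens
            if tok.type == tokenize.STRING]

for value in python_string_tokens(r'"hello" "I am an example" "the man said:\"hello!\""'):
    print(value)
```

Unlike a hand-written regular expression, this follows every Python escape rule, so `\n` becomes a newline and `\x41` becomes `A`.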