在问这个问题时,我意识到我对原始字符串知之甚少.对于那些自称是Django训练师的人来说,这很糟糕.
我知道编码是什么,而且我知道u''自从我得到什么是Unicode以来我们独自做了什么.
但到底r''做了什么呢?它会产生什么样的字符串?
And above all, what the heck does ur'' do?
Finally, is there any reliable way to go back from a Unicode string to a simple raw string?
Ah, and by the way, if your system and your text editor charset are set to UTF-8, does u'' actually do anything?
任何人都可以解释为什么下面的示例1有效,何时r不使用前缀?我认为r只要使用转义序列,就必须使用前缀.示例2和示例3证明了这一点.
# example 1
import re
print (re.sub('\s+', ' ', 'hello there there'))
# prints 'hello there there' - not expected as r prefix is not used
# example 2
import re
print (re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', 'hello there there'))
# prints 'hello there' - as expected as r prefix is used
# example 3
import re
print (re.sub('(\b\w+)(\s+\1\b)+', '\1', 'hello there there'))
# prints 'hello there there' - as expected as r prefix is not used
Run Code Online (Sandbox Code Playgroud)