替换多个(特殊)字符-最有效的方法?

Poe*_*dit 3 python string python-3.x

在我收到的文本中,我想将以下特殊字符替换为一个空格:

symbols = ["`", "~", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "-", "+", "=", "{", "[", "]", "}", "|", "\\", ":", ";", "\"", "<", ",", ">", ".", "?", "/"]
Run Code Online (Sandbox Code Playgroud)

什么是最有效的方法(就代码执行时间而言)?

例如,我想要这样:

(Hello World)] *!
Run Code Online (Sandbox Code Playgroud)

成为这个:

Hello World
Run Code Online (Sandbox Code Playgroud)

候选方法似乎如下:

  1. 清单理解
  2. .replace()
  3. .translate()
  4. 常用表达

yat*_*atu 5

对于有效的解决方案,您可以使用str.maketrans此方法。请注意,一旦定义了转换表,就只需要映射字符串中的字符即可。您可以按照以下方式进行操作:

symbols = ["`", "~", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "-", "+",
           "=", "{", "[", "]", "}", "|", "\\", ":", ";", "\"", "<", ",", ">", ".", "?", "/"]
Run Code Online (Sandbox Code Playgroud)

首先,通过dict.fromkeys为每个条目设置一个空格作为值从符号创建字典,然后从字典创建翻译表:

d = dict.fromkeys(''.join(symbols), ' ')
# {'`': ' ', ',': ' ', '~': ' ', '!': ' ', '@': ' '...
t = str.maketrans(d)
Run Code Online (Sandbox Code Playgroud)

然后调用string translate方法以将上述字典中的字符映射为空白:

s = '~this@is!a^test@'
s.translate(t)
# ' this is a test '
Run Code Online (Sandbox Code Playgroud)


Olv*_*ght 5

在进行了一些测试之后,我可以说这str.translate()是最好的变体。

输入数据:

symbols = {"`", "~", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "-", "+", "=", "{", "[", "]", "}", "|", "\\", ":", ";", "\"", "<", ",", ">", ".", "?", "/"}
translate_table = {126: None, 93: None, 91: None, 125: None, 92: None, 42: None, 45: None, 94: None, 62: None, 47: None, 35: None, 59: None, 44: None, 58: None, 60: None, 124: None, 61: None, 36: None, 95: None, 43: None, 96: None, 123: None, 64: None, 33: None, 38: None, 63: None, 46: None, 34: None, 41: None, 37: None, 40: None}
regular_expression = "[`~!@#$%^&*()_\-+={[\]}|\\:;\"<,>.?/]"
small_document = "Some**r@an]]\"dom t##xt"
normal_document = "TbsX^Kt$FZ%haZe+sLxu:Al\"xNAL\\Kix[mHp_gn]PrG`DqGd~GdNc;BoEq.SYD?Rp>ukq,UfO<XdTc=RUH}oifc&oP!CB*me@Qv{Qf-Li)gmXL/IQH#mne(Khaj|"
big_document = "QOfY+dymyoGBAxTAoIeM+jEWlaECUZEUXuMvprJOqFtQR*OiHtTFZkUNbYipSTTDPOVkIdGTcjWrQmbmthKBHBSEOZ)lQAIJOrVgmGGFdtqbuFfj<Dls<JWtKczAFMPYMemiJBJHdPeeul\\x>lGIBvUsxBokagvVovrrdxdKMtAKx>MEexYv>DGqPUXYaBQKwiSIUobrPQYjilhHMQunE;RiqOZPTnyOEgRrpxcuobvvmGkFpTqgMxYYhrmRRnauiqgvCmZ\"UauceaXsgAMSakxewzPrlIrYkVCVZaEGh]qiizYyzbkcHPF@qQsQMfHPDEbEnWtrCFoARUYAloOcctqmL@hegZbfhsHaJOxOxzQhZAVjVDgokosATfhKMT!WYyPWKcKAHKCzQGGJOCglYGZbftsuyntXZUKNqgGlsLJqgN,pUcOoA/tStXFXgpoSErgvw/OUMPWjJwt=bhMAIDayOZXJm=ifYYUuAvSIZjwnBfktNvEvZmvQso%HiNZEVqoDR%nQBtCkhjSfVfDuRSRsvp-sCunjDDUYSEVLICQdisxhEfqkUTkiPlLiUNNwrvO#WTDmweZyMeIbgNXkIsvaJeHYXV(HvRcGNZM(PPRIAyyLWivGiqMVBtwObqLfEEISyyjGNEdUU:ys`dXcVawkIEAjFXky`RUXNTm`LDM}mwTOcmsSo}haJXPnkwOhKLYwve}SWifzKq}grw}fMSQXXWguUQtlWpPZQymR^wBKEyolFlZnzEEmehSNenOqDOHWRit[Npm?R?DIPXAmQYYBbmJofxUzzWBsVCoPI?VmpXhoMxCfXyHEHowXzIJvExThiffLhBTtma_jk_NrbkPCGGypXvOuBqBxDYfC{bwIHoaqnJSKytxwWXBNnKG~PKuQklGblEwH~rJoGpKZmm~tTEFnPLdmzfrqJibMYIykzL$RZLPmsZjB$AAbZwFnByOydEOIfFvTaEQaSjbpeBZuUGY&ZfPQgLihmPYrhZxSwMzLrNF.WjFiDCLyXksdkLeMHVCfrdgCAotElQ|"
no_match_document = "XOtasggWqhtSLJpHEGoCmMRepFBlRfAGKTLPcEtKonFVsPgvWgAbvJVeMWILPgLapwAmTgXWVbxOJtUFmMygzIqYPqyAxzwElTFyYcGdtnNa"
Run Code Online (Sandbox Code Playgroud)

代码:

def func1(doc):
    for c in symbols:
        doc = doc.replace(c, "")
    return doc


def func2(doc):
    return doc.translate(translate_table)


def func3(doc):
    return re.sub(regular_expression, "", doc)


def func4(doc):
    return "".join(c for c in doc if c not in symbols)
Run Code Online (Sandbox Code Playgroud)

检测结果:

func1(small_document):      0.701037002
func1(normal_document):     1.1260866900000002
func1(big_document):        3.4234831459999997
func1(no_match_document):   0.7740780450000004

func2(small_document):      0.14135037500000003
func2(normal_document):     0.5368806810000004
func2(big_document):        0.8128472860000002
func2(no_match_document):   0.394245089

func3(small_document):      0.3157141610000007
func3(normal_document):     0.927359323000001
func3(big_document):        1.9310377590000005
func3(no_match_document):   0.18656399199999996

func4(small_document):      0.3034549070000008
func4(normal_document):     1.3695875739999988
func4(big_document):        10.115730064
func4(no_match_document):   1.2086623230000022
Run Code Online (Sandbox Code Playgroud)

UPD。

我提供的输入数据是专门为纯方法测试“准备”的。

为了生成translate_table我使用了下一个字典理解:

translate_table = {ord(s): None for s in symbols}
Run Code Online (Sandbox Code Playgroud)

是正则表达式验证网站的链接(可能会有所帮助)。


如果您想自己重新计算测试,这里是代码:

    if __name__ == '__main__':
    import timeit
    print("func1(small_document)", timeit.timeit("func1(small_document)", setup="from __main__ import func1, small_document", number=100000))
    print("func1(normal_document): ", timeit.timeit("func1(normal_document)", setup="from __main__ import func1, normal_document", number=100000))
    print("func1(big_document): ", timeit.timeit("func1(big_document)", setup="from __main__ import func1, big_document", number=100000))
    print("func1(no_match_document): ", timeit.timeit("func1(no_match_document)", setup="from __main__ import func1, no_match_document", number=100000))

    print("func2(small_document): ", timeit.timeit("func2(small_document)", setup="from __main__ import func2, small_document", number=100000))
    print("func2(normal_document): ", timeit.timeit("func2(normal_document)", setup="from __main__ import func2, normal_document", number=100000))
    print("func2(big_document): ", timeit.timeit("func2(big_document)", setup="from __main__ import func2, big_document", number=100000))
    print("func2(no_match_document): ", timeit.timeit("func2(no_match_document)", setup="from __main__ import func2, no_match_document", number=100000))

    print("func3(small_document): ", timeit.timeit("func3(small_document)", setup="from __main__ import func3, small_document", number=100000))
    print("func3(normal_document): ", timeit.timeit("func3(normal_document)", setup="from __main__ import func3, normal_document", number=100000))
    print("func3(big_document): ", timeit.timeit("func3(big_document)", setup="from __main__ import func3, big_document", number=100000))
    print("func3(no_match_document): ", timeit.timeit("func3(no_match_document)", setup="from __main__ import func3, no_match_document", number=100000))

    print("func4(small_document): ", timeit.timeit("func4(small_document)", setup="from __main__ import func4, small_document", number=100000))
    print("func4(normal_document): ", timeit.timeit("func4(normal_document)", setup="from __main__ import func4, normal_document", number=100000))
    print("func4(big_document): ", timeit.timeit("func4(big_document)", setup="from __main__ import func4, big_document", number=100000))
    print("func4(no_match_document): ", timeit.timeit("func4(no_match_document)", setup="from __main__ import func4, no_match_document", number=100000))
Run Code Online (Sandbox Code Playgroud)