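I need to write a Python script that removes every word containing non-alphabetic characters from a text file, in order to test Zipf's law. For example: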
asdf@gmail.com said: I've taken 2 reports to the boss
should become:
taken reports to the boss
How can I do this?
Use a regular expression that matches only letters (and underscores). You can do it like this:
import re
s = "asdf@gmail.com said: I've taken 2 reports to the boss"
# s = open('text.txt').read()
tokens = s.strip().split()
clean_tokens = [t for t in tokens if re.match(r'[^\W\d]*$', t)]
# ['taken', 'reports', 'to', 'the', 'boss']
clean_s = ' '.join(clean_tokens)
# 'taken reports to the boss'
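Note that `[^\W\d]` also accepts underscores, so a token like `foo_bar` would be kept. If only letters should pass, a minimal sketch could look like the following. It assumes Python 3 (`re.fullmatch` is not available in Python 2.7), and the helper name `keep_letter_words` is made up for illustration:

```python
import re

# Hypothetical helper, not part of the original answer.
# [^\W\d_] means "word characters minus digits and underscore", i.e. letters only.
LETTERS_ONLY = re.compile(r'[^\W\d_]+')

def keep_letter_words(text):
    """Return the whitespace-separated tokens of text that consist of letters only."""
    return [t for t in text.split() if LETTERS_ONLY.fullmatch(t)]

s = "asdf@gmail.com said: I've taken 2 reports to the boss foo_bar"
print(' '.join(keep_letter_words(s)))
```

Using `fullmatch` removes the need for the trailing `$` anchor, and `+` instead of `*` makes it explicit that an empty token never matches.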
Try this:
sentence = "asdf@gmail.com said: I've taken 2 reports to the boss"
words = [word for word in sentence.split() if word.isalpha()]
# ['taken', 'reports', 'to', 'the', 'boss']
result = ' '.join(words)
# taken reports to the boss
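Since the goal is to test Zipf's law, the filtered words can then be counted and ranked by frequency. Here is a rough sketch, assuming the text is already loaded into a string. The function name `rank_frequencies` and the choice to lowercase the words are my own assumptions, not something from the question:

```python
from collections import Counter

def rank_frequencies(text):
    """Return (rank, word, count) tuples for purely alphabetic words, most frequent first."""
    # Assumption: case is folded so that "The" and "the" count as the same word.
    words = [w.lower() for w in text.split() if w.isalpha()]
    counts = Counter(words)
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]

sample = "the boss said the reports went to the boss"
for rank, word, count in rank_frequencies(sample):
    print(rank, word, count)
```

If the text follows Zipf's law, `count` should be roughly proportional to `1 / rank`, so `rank * count` stays in the same ballpark across the most frequent words. A real test would of course need a much larger text than this sample.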