How to remove every word that contains non-alphabetic characters

Nor*_*her 5 python grammar python-2.7 python-3.x

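I need to write a Python script that removes every word containing non-alphabetic characters from a text file, so that I can test Zipf's law. For example: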

asdf@gmail.com said: I've taken 2 reports to the boss
should become:

taken reports to the boss

How can I do this?
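Since the stated goal is to test Zipf's law (a word's frequency is roughly inversely proportional to its frequency rank, so rank × frequency stays roughly constant), here is a minimal sketch of the counting step. It assumes the words have already been filtered, for example with either answer below; `rank_frequency` is a hypothetical helper name, and a real test needs a much larger corpus than the toy sample here.

```python
from collections import Counter


def rank_frequency(words):
    """Return (rank, word, count, rank * count) tuples, most frequent first.

    Under Zipf's law the last column should stay roughly constant
    for a large enough corpus.
    """
    counts = Counter(words)
    return [(rank, word, count, rank * count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]


# Hypothetical input: words already cleaned of non-alphabetic tokens.
words = "the boss said the reports to the boss".split()
for row in rank_frequency(words):
    print(row)
```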

sch*_*ggl 5

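Using a regular expression that matches only letters (and the underscore), you can do this. The character class [^\W\d] means "a word character that is not a digit":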

import re

s = "asdf@gmail.com said: I've taken 2 reports to the boss"
# s = open('text.txt').read()

tokens = s.strip().split()
clean_tokens = [t for t in tokens if re.match(r'[^\W\d]*$', t)]
# ['taken', 'reports', 'to', 'the', 'boss']
clean_s = ' '.join(clean_tokens)
# 'taken reports to the boss'
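For a whole text file, a minimal sketch of the same regex filter that works line by line instead of reading everything into memory. The generator name `clean_words` is hypothetical, and `text.txt` is just the file name from the commented-out line above; the sample list stands in for an open file object, since both yield lines.

```python
import re

# Same pattern as above: the whole token must be letters or underscores
# (word characters minus digits).
WORD_RE = re.compile(r'[^\W\d]*$')


def clean_words(lines):
    """Yield every whitespace-separated token in `lines` that matches WORD_RE."""
    for line in lines:
        for token in line.split():
            if WORD_RE.match(token):
                yield token


# Hypothetical usage with a real file:
# with open('text.txt') as f:
#     print(' '.join(clean_words(f)))

sample = ["asdf@gmail.com said: I've taken 2 reports to the boss\n",
          "the boss said ok\n"]
print(' '.join(clean_words(sample)))
# taken reports to the boss the boss said ok
```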


Cth*_*Sky 5

Try this:

sentence = "asdf@gmail.com said: I've taken 2 reports to the boss"
words = [word for word in sentence.split() if word.isalpha()]
# ['taken', 'reports', 'to', 'the', 'boss']

result = ' '.join(words)
# taken reports to the boss
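The two answers give the same result on the example sentence, but they are not identical filters: the regex [^\W\d] also accepts the underscore, while str.isalpha() does not. A small sketch comparing them token by token (`compare_filters` is a hypothetical helper name; the tokens are made-up examples):

```python
import re

# Pattern from the first answer: letters (and underscore) only.
WORD_RE = re.compile(r'[^\W\d]*$')


def compare_filters(tokens):
    """Return (token, kept_by_regex, kept_by_isalpha) for each token."""
    return [(t, bool(WORD_RE.match(t)), t.isalpha()) for t in tokens]


for row in compare_filters(['taken', 'snake_case', 'reports2', "I've"]):
    print(row)
# ('taken', True, True)
# ('snake_case', True, False)
# ('reports2', False, False)
# ("I've", False, False)
```

For counting natural-language words the difference rarely matters, but if identifiers like snake_case can appear in the text, isalpha() is the stricter choice.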