Replace punctuation with spaces

oce*_*o22 17 python string python-3.x

I have a problem with my code and can't figure out how to move forward.

tweet = "I am tired! I like fruit...and milk"
clean_words = tweet.translate(None, ",.;@#?!&$")
words = clean_words.split()

print tweet
print words

Output:

['I', 'am', 'tired', 'I', 'like', 'fruitand', 'milk']

What I want is to replace the punctuation with spaces, but I don't know which function or loop to use. Can anyone help me?

Yua*_*iKe 22

This is easy to achieve by changing your "maketrans" like this:

import string
tweet = "I am tired! I like fruit...and milk"
translator = str.maketrans(string.punctuation, ' ' * len(string.punctuation))  # map each punctuation character to a space
print(tweet.translate(translator))

It works on my machine running Python 3.5.2. Hopefully it works for you too.

  • For Python 3, use str.maketrans instead of string.maketrans (8 upvotes)
  • Not sure about Python 3, but for Python 2.7.x change `str.maketrans(...)` to `string.maketrans(...)` (2 upvotes)
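
Since the question ultimately wants a list of words, here is a minimal Python 3 sketch that combines the translation table with `split()`. The function name `split_words` is hypothetical and only used for illustration; `str.maketrans` with two arguments requires both strings to have the same length.

```python
import string


def split_words(text):
    """Replace every ASCII punctuation character with a space, then split on whitespace."""
    # Two-argument str.maketrans maps the i-th character of the first string
    # to the i-th character of the second, so the lengths must match.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    # split() with no arguments also discards the runs of spaces left behind.
    return text.translate(table).split()


tweet = "I am tired! I like fruit...and milk"
print(split_words(tweet))
# ['I', 'am', 'tired', 'I', 'like', 'fruit', 'and', 'milk']
```

Note that `string.punctuation` covers more characters than the nine listed in the question (for example apostrophes and hyphens), so "don't" would become two words with this table.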

Jon*_*han 7

Here is a regex-based solution, tested under Python 3.5.1. I think it is both simple and concise.

import re

tweet = "I am tired! I like fruit...and milk"
clean = re.sub(r"""
               [,.;@#?!&$]+  # Accept one or more copies of punctuation
               \ *           # plus zero or more copies of a space,
               """,
               " ",          # and replace it with a single space
               tweet, flags=re.VERBOSE)
print(tweet + "\n" + clean)

Result:

I am tired! I like fruit...and milk
I am tired I like fruit and milk

Compact version:

import re

tweet = "I am tired! I like fruit...and milk"
clean = re.sub(r"[,.;@#?!&$]+\ *", " ", tweet)
print(tweet + "\n" + clean)
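
If the goal is the word list from the question rather than a cleaned string, one possible variation is to split on the same character class instead of substituting. This is only a sketch: the names `SEPARATORS` and `tokenize` are hypothetical, and whitespace (`\s`) has been added to the class so that spaces act as separators too.

```python
import re

# Hypothetical name: the punctuation set from the answer above, plus whitespace.
SEPARATORS = re.compile(r"[\s,.;@#?!&$]+")


def tokenize(text):
    # re.split produces empty strings when the text starts or ends
    # with a separator, so those are filtered out.
    return [word for word in SEPARATORS.split(text) if word]


print(tokenize("I am tired! I like fruit...and milk"))
# ['I', 'am', 'tired', 'I', 'like', 'fruit', 'and', 'milk']
```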


小智 5

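There are several ways to approach this. I have one that works, but I believe it is suboptimal. Hopefully someone who knows regex better will come along and improve this answer or offer a better one.

Your question is tagged python-3.x, but your code is Python 2.x, so my code is 2.x as well. I have included a version that works in 3.x.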

#!/usr/bin/env python

import re

tweet = "I am tired! I like fruit...and milk"
# print tweet

clean_words = tweet.translate(None, ",.;@#?!&$")  # Python 2
# clean_words = tweet.translate(str.maketrans("", "", ",.;@#?!&$"))  # Python 3
print(clean_words)  # Does not handle fruit...and

regex_sub = re.sub(r"[,.;@#?!&$]+", ' ', tweet)  # + means match one or more
print(regex_sub)  # extra space between tired and I

regex_sub = re.sub(r"\s+", ' ', regex_sub)  # Replaces any number of spaces with one space
print(regex_sub)  # looks good
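
The two substitutions above could probably be folded into one pass by putting whitespace into the same character class. A minimal Python 3 sketch, assuming it is acceptable for tabs and newlines to be collapsed to a single space as well (the function name `punctuation_to_space` is hypothetical):

```python
import re


def punctuation_to_space(text):
    # One character class covers both the punctuation and any whitespace
    # next to it, so each mixed run collapses to exactly one space.
    # strip() removes the space left by leading or trailing punctuation.
    return re.sub(r"[\s,.;@#?!&$]+", " ", text).strip()


tweet = "I am tired! I like fruit...and milk"
print(punctuation_to_space(tweet))
# I am tired I like fruit and milk
```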