Counting punctuation in text with Python and regular expressions

Question

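I'm trying to count how many times punctuation marks appear in a novel. For example, I want to find the occurrences of question marks and periods, as well as all other non-alphanumeric characters. Then I want to insert the counts into a CSV file. I'm not sure how to write the regex because I don't have much Python experience. Can anyone help me?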

import csv
import re
import string
from collections import Counter

texts = string.punctuation
counts = dict(Counter(w.lower() for w in re.findall(r"\w+", open(cwd + "/" + book).read())))
writer = csv.writer(open("author.csv", 'a'))
writer.writerow([counts.get(fieldname, 0) for fieldname in texts])

Answer 1

Sea*_*ett 6

In [1]: from string import punctuation

In [2]: from collections import Counter

In [3]: counts = Counter(open('novel.txt').read())

In [4]: punctuation_counts = {k: v for k, v in counts.items() if k in punctuation}
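
Since the question asks about a regular expression and a CSV file, here is a minimal sketch of how the two might fit together. The helper names `count_punctuation` and `append_counts_row` are made up for illustration, and the sketch assumes Python 3 and that only the ASCII marks in `string.punctuation` matter; curly quotes and other non-ASCII punctuation would not be counted.

```python
import csv
import re
from collections import Counter
from string import punctuation

# Character class matching any single ASCII punctuation mark.
# re.escape guards the characters that are special inside a class.
PUNCT_RE = re.compile("[" + re.escape(punctuation) + "]")


def count_punctuation(text):
    """Return a Counter of the ASCII punctuation marks found in text."""
    return Counter(PUNCT_RE.findall(text))


def append_counts_row(counts, csv_path):
    """Append one row holding a count per mark, in string.punctuation order."""
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        # Marks that never occur get 0, so every row has the same columns.
        writer.writerow([counts.get(mark, 0) for mark in punctuation])


counts = count_punctuation("Really? Yes. Well... maybe!")
```

Calling `append_counts_row(counts, "author.csv")` would then add one row per book, with the columns always in the same order as `string.punctuation`.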

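My only real issue is that you load the entire novel into memory at once with `open('novel.txt').read()`! I'd imagine a novel of any average size would make this a memory-intensive operation. (2 upvotes)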
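
One way to address the memory concern raised in this comment is to read the file line by line and update a single `Counter` as you go. This is only a sketch: the function name `count_punctuation_streaming` and the UTF-8 default are assumptions, and whether the saving matters depends on how large the file actually is.

```python
from collections import Counter
from string import punctuation

# A frozenset makes the per-character membership test cheap.
PUNCTUATION = frozenset(punctuation)


def count_punctuation_streaming(path, encoding="utf-8"):
    """Count ASCII punctuation in a file while holding one line at a time."""
    counts = Counter()
    with open(path, encoding=encoding) as f:
        for line in f:
            # Only punctuation characters are fed into the counter.
            counts.update(ch for ch in line if ch in PUNCTUATION)
    return counts
```

The result has the same shape as `punctuation_counts` in the answer, so it can be passed to the same CSV-writing code.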
