Posts by Tim*_*fey

Plotting the number of occurrences in a Pandas DataFrame

I have a DataFrame with two columns. One contains timestamps and the other the id of some action. Something like this:

2000-12-29 00:10:00     action1
2000-12-29 00:20:00     action2
2000-12-29 00:30:00     action2
2000-12-29 00:40:00     action1
2000-12-29 00:50:00     action1
...
2000-12-31 00:10:00     action1
2000-12-31 00:20:00     action2
2000-12-31 00:30:00     action2


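I want to know how many actions of a certain type were performed on a given day. That is, for every day I need to count the occurrences of actionX and plot this data for each date, with the date on the X axis and the number of actionX occurrences on the Y axis.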

Of course, I could naively count the actions for each day by iterating over my dataset. But what is the "right way" to do this with pandas/matplotlib?
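One common pandas approach, sketched here under assumptions: the two columns are named `timestamp` and `action` (the question does not give names), and `daily_action_counts` / `plot_action` are hypothetical helper names. The idea is to truncate each timestamp to its calendar day with `dt.normalize()`, count rows per (day, action) pair with `groupby(...).size()`, and pivot the action ids into columns with `unstack`, so each column is a daily series that pandas can plot directly.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd


def daily_action_counts(df: pd.DataFrame, time_col: str = "timestamp",
                        action_col: str = "action") -> pd.DataFrame:
    """One row per calendar day, one column per action id, cells = occurrence counts."""
    days = pd.to_datetime(df[time_col]).dt.normalize()  # drop the time of day
    return (
        df.groupby([days, df[action_col]])
        .size()                 # number of rows per (day, action) pair
        .unstack(fill_value=0)  # action ids become columns; missing pairs count as 0
    )


def plot_action(counts: pd.DataFrame, action: str):
    """Bar chart with the date on the X axis and occurrences of `action` on the Y axis."""
    ax = counts[action].plot(kind="bar")
    ax.set_xlabel("date")
    ax.set_ylabel(f"occurrences of {action}")
    plt.tight_layout()
    return ax


if __name__ == "__main__":
    # Only the rows visible in the question (the "..." rows are omitted).
    df = pd.DataFrame({
        "timestamp": ["2000-12-29 00:10:00", "2000-12-29 00:20:00", "2000-12-29 00:30:00",
                      "2000-12-29 00:40:00", "2000-12-29 00:50:00", "2000-12-31 00:10:00",
                      "2000-12-31 00:20:00", "2000-12-31 00:30:00"],
        "action": ["action1", "action2", "action2", "action1", "action1",
                   "action1", "action2", "action2"],
    })
    counts = daily_action_counts(df)
    print(counts)
    plot_action(counts, "action1")  # then plt.show() or plt.savefig(...) as needed
```

`counts.plot()` draws every action at once. Days with no rows at all do not appear in `counts`; if they should show up as zero, `counts.asfreq("D", fill_value=0)` is one way to fill in the missing calendar days.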

python matplotlib pandas

Tim*_*fey

lucky-day

Votes: 9 · Answers: 2 · Views: 30k

Why does "cowardly" become "cowardli" after stemming?

I noticed that after applying the Porter stemmer (from the NLTK library) I get strange stems such as "cowardli" or "contrari". To me they don't look like stems at all.

Is that okay? Or have I made a mistake somewhere?

Here is my code:

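import nltk

# `string` holds the raw input text; the stopword list needs nltk.download("stopwords") once
string = string.lower()
tokenized = nltk.tokenize.regexp_tokenize(string, "[a-z]+")
stop_words = set(nltk.corpus.stopwords.words("english"))  # build once instead of per token
filtered = [w for w in tokenized if w not in stop_words]

stemmer = nltk.stem.porter.PorterStemmer()
stemmed = [stemmer.stem(w) for w in filtered]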


This is the text I used for processing: http://pastebin.com/XUMNCYAU (the beginning of Dostoevsky's book "Crime and Punishment").
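For context, stems like these are expected: the Porter algorithm only tries to map related word forms to a common key, not to produce dictionary words. A minimal sketch, assuming NLTK is installed (the stemmer itself needs no corpus download); the word list is illustrative and not from the question. It shows the final "y" being rewritten as "i" and related forms still collapsing to one stem.

```python
from nltk.stem.porter import PorterStemmer

# Default mode applies NLTK's own extensions on top of the original Porter rules.
stemmer = PorterStemmer()

words = ["cowardly", "contrary", "happy", "happiness"]
# The final "y" becomes "i" (e.g. "cowardly" -> "cowardli"), and none of the later
# suffix rules match these endings, so the "i" stays in the stem.
stems = {w: stemmer.stem(w) for w in words}

for w, s in stems.items():
    print(f"{w:>10} -> {s}")
```

If readable dictionary forms are needed, a lemmatizer such as NLTK's `WordNetLemmatizer` is the usual alternative, at the cost of requiring the WordNet corpus.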

nlp stemming nltk

Tim*_*fey

lucky-day

Votes: 1 · Answers: 1 · Views: 1536