gou*_*tam 9 python string dataframe
I want to count how many times a word occurs in a review string.
I am reading a CSV file and storing it in a pandas DataFrame using the line below:
reviews = pd.read_csv("amazon_baby.csv")
The code below works fine when I apply it to a single review.
print reviews["review"][1]
a = reviews["review"][1].split("disappointed")
print a
b = len(a)
print b
The output of the lines above is:
it came early and was not disappointed. i love planet wise bags and now my wipe holder. it keps my osocozy wipes moist and does not leak. highly recommend it.
['it came early and was not ', '. i love planet wise bags and now my wipe holder. it keps my osocozy wipes moist and does not leak. highly recommend it.']
2
When I apply the same logic to the entire DataFrame using the line below, I get an error message.
reviews['disappointed'] = len(reviews["review"].split("disappointed"))-1
Error message:
Traceback (most recent call last):
File "C:/Users/gouta/PycharmProjects/MLCourse1/Classifier.py", line 12, in <module>
reviews['disappointed'] = len(reviews["review"].split("disappointed"))-1
File "C:\Users\gouta\Anaconda2\lib\site-packages\pandas\core\generic.py", line 2360, in __getattr__
(type(self).__name__, name))
AttributeError: 'Series' object has no attribute 'split'
hoy*_*and 11
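You are trying to split the entire review column of the DataFrame, which is the Series mentioned in the error message. A Series has no `split` method of its own. What you want is to apply a function to each row of the DataFrame, which you can do by calling `apply` on the DataFrame: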
f = lambda x: len(x["review"].split("disappointed")) -1
reviews["disappointed"] = reviews.apply(f, axis=1)
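A small self-contained sketch of the row-wise `apply` approach above. The three sample reviews stand in for `amazon_baby.csv`, and the `count_word` helper name is made up for illustration; neither comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for the data loaded from amazon_baby.csv
reviews = pd.DataFrame({
    "review": [
        "it came early and was not disappointed.",
        "disappointed at first, then disappointed again.",
        "highly recommend it.",
    ]
})

def count_word(row, word="disappointed"):
    # split() returns one more piece than there are occurrences of the word
    return len(row["review"].split(word)) - 1

# axis=1 passes each row to the function, one at a time
reviews["disappointed"] = reviews.apply(count_word, axis=1)
print(reviews["disappointed"].tolist())  # [1, 2, 0]
```

If the real column contains missing values, `row["review"]` would be a float `NaN` for those rows and `.split` would fail; filling them first with something like `reviews["review"].fillna("")` is one way around that.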
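pandas 0.20.3 has `pandas.Series.str.split()`, which acts on each string in the Series and splits it. So you can simply split and then count the pieces in each row. Note that the count has to go through `.str.len()`: wrapping the result in the built-in `len()` would return the number of rows in the Series, not the per-row counts.
reviews['disappointed'] = reviews['review'].str.split('disappointed').str.len() - 1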
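As a sketch of the vectorized route, the split-based count can be compared against `Series.str.count`, which counts matches directly. The sample data is made up, and `re.escape` is used here because `str.count` treats its argument as a regular expression.

```python
import re
import pandas as pd

# Hypothetical sample data, including one missing review
reviews = pd.DataFrame({
    "review": [
        "it came early and was not disappointed.",
        "disappointed at first, then disappointed again.",
        None,
    ]
})

word = "disappointed"
# Replace missing reviews with empty strings so they count as zero
text = reviews["review"].fillna("")

# Number of pieces per row, minus one, equals occurrences per row
via_split = text.str.split(word).str.len() - 1

# str.count takes a regex pattern, so escape the literal word
via_count = text.str.count(re.escape(word))

reviews["disappointed"] = via_count
print(via_split.tolist())  # [1, 2, 0]
print(via_count.tolist())  # [1, 2, 0]
```

Both give the same per-row counts here; `str.count` skips building the intermediate lists, so it is probably the simpler choice when only the count is needed.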