Column containing a list of strings in Python

Question

ano*_*428 12 python slice dataframe pandas

I have a pandas DataFrame like the following:

                                          categories  review_count
0                  [Burgers, Fast Food, Restaurants]           137
1                         [Steakhouses, Restaurants]           176
2  [Food, Coffee & Tea, American (New), Restaurants]           390
...                                          ....              ...
...                                          ....              ...
...                                          ....              ...


From this DataFrame, I want to extract only those rows where the list in the 'categories' column contains the category 'Restaurants'. So far I have tried: df[[df.categories.isin('Restaurants'),review_count]]

Since the DataFrame has other columns as well, I specified the two columns I want to extract. But I get the error:

TypeError: unhashable type: 'list'


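I'm not sure what this error means, as I'm very new to pandas. Please tell me how I can achieve my goal, i.e. extract only those rows of the DataFrame where the 'categories' column has the string 'Restaurants' as part of its list of categories. Any help would be much appreciated.

Thanks in advance!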

Answer 1

Mar*_*ius 12

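I think you may have to use a lambda function here. With isin you can test whether the values in a column are in some sequence, but pandas doesn't seem to provide a function for testing whether the sequences in a column contain some value: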

import pandas as pd
categories = [['fast_food', 'restaurant'], ['coffee', 'cafe'], ['burger', 'restaurant']]
counts = [137, 176, 390]
df = pd.DataFrame({'categories': categories, 'review_count': counts})
# Show which rows contain 'restaurant'
df.categories.map(lambda x: 'restaurant' in x)
# Subset the dataframe using this:
df[df.categories.map(lambda x: 'restaurant' in x)]


Output:

Out[11]: 
                categories  review_count
0  [fast_food, restaurant]           137
2     [burger, restaurant]           390
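
Since the question also asks for only two of the columns, here is a minimal sketch of how the same boolean mask could be combined with a column selection through .loc. The `city` column is a hypothetical stand-in for the "other columns" mentioned in the question, and the `explode` variant is an alternative that assumes pandas 0.25 or later; neither is part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "categories": [["fast_food", "restaurant"], ["coffee", "cafe"], ["burger", "restaurant"]],
    "review_count": [137, 176, 390],
    "city": ["A", "B", "C"],  # hypothetical extra column, for illustration only
})

# Boolean mask: True where the row's list contains 'restaurant'
mask = df["categories"].map(lambda cats: "restaurant" in cats)

# Filter rows with the mask and keep only the two columns of interest
result = df.loc[mask, ["categories", "review_count"]]

# Alternative mask without a lambda (assumes pandas >= 0.25 for Series.explode):
# put each list element on its own row, compare, then collapse back per original row
mask_alt = df["categories"].explode().eq("restaurant").groupby(level=0).any()
result_alt = df.loc[mask_alt, ["categories", "review_count"]]
```

Both masks should select the same rows here. The lambda version is probably the easier one to read, while the `explode` version may be handy if you need further per-category processing anyway.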

