python pandas: filter out records where a given field is null or an empty string

Eda*_*ame 5 python dataframe pandas

I am trying to filter out the records in a DataFrame where field_A is null or an empty string, like this:

my_df[my_df.editions is not None]
my_df.shape

This gives me the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-40-e1969e0af259> in <module>()
      1 my_df['editions'] = my['editions'].astype(str)
----> 2 my_df = my_df[my_df.editions is not None]
      3 my_df.shape

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   1995             return self._getitem_multilevel(key)
   1996         else:
-> 1997             return self._getitem_column(key)
   1998 
   1999     def _getitem_column(self, key):

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
   2002         # get column
   2003         if self.columns.is_unique:
-> 2004             return self._get_item_cache(key)
   2005 
   2006         # duplicate columns & possible reduce dimensionality

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
   1348         res = cache.get(item)
   1349         if res is None:
-> 1350             values = self._data.get(item)
   1351             res = self._box_item_values(item, values)
   1352             cache[item] = res

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
   3288 
   3289             if not isnull(item):
-> 3290                 loc = self.items.get_loc(item)
   3291             else:
   3292                 indexer = np.arange(len(self.items))[isnull(self.items)]

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
   1945                 return self._engine.get_loc(key)
   1946             except KeyError:
-> 1947                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   1948 
   1949         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)()

KeyError: True

Or:

my_df[my_df.editions != None]
my_df.shape

This one raises no error, but it does not filter out any of the None values.

I also tried:

my_df = my_df[my_df.editions.notnull()]

This raises no error either, but it also does not filter out any of the None values.

Can anyone suggest how to solve this? Thanks!

小智 13

You can negate a condition with ~ when filtering.

So in your case you should do:

my_df = my_df[~my_df.editions.isnull()]
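For context on the error in the question: `my_df.editions is not None` is an ordinary Python identity test on the Series object as a whole, so it evaluates to a single `True`, and `my_df[True]` is then treated as a lookup of a column named `True`, which is where `KeyError: True` comes from. A minimal sketch of an element-wise mask that covers both the null and the empty-string case, assuming a small made-up frame (the `title` column and the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical sample data: one None and one empty string in 'editions'.
my_df = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "editions": ["1st", None, "", "2nd"],
})

# `is not None` tests the Series object itself, so it yields one Python bool.
# my_df[True] would then be treated as a lookup of a column named True.
whole_series_check = my_df.editions is not None

# Element-wise mask: not missing AND not the empty string.
mask = my_df["editions"].notnull() & (my_df["editions"] != "")
filtered = my_df[mask]
print(filtered)
```

The two conditions are joined with `&` rather than `and`, because `and` would again try to reduce each Series to a single bool.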


Sta*_*ckG 8

You can filter the empty strings out of a DataFrame like this:

df = df[df['str_field'].str.len() > 0]
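A side effect worth knowing: `.str.len()` gives a missing result for missing values, and a missing value compared with `> 0` counts as False, so the same filter also drops null rows. A small sketch with made-up data (the values in `str_field` are hypothetical):

```python
import pandas as pd

# Hypothetical data: one empty string and one missing value.
df = pd.DataFrame({"str_field": ["x", "", None, "yz"]})

# Length per row; the missing value has no length.
lengths = df["str_field"].str.len()

# Rows with length 0 and rows with no length are both excluded.
df = df[lengths > 0]
print(df)
```

This assumes the column holds only strings and missing values; for a column with mixed types, casting or checking types first may be needed.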


Mat*_*ttR 5

Could you create a new DataFrame by filtering?

DataFrame before:

a     b
1     9
2    10
3    11
4    12
5    13
6    14
7    15
8  null

Example:

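import pandas

# Column "b" holds the literal string "null" in its last row.
my_df = pandas.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8], "b": [9, 10, 11, 12, 13, 14, 15, "null"]})

# Keep only the rows where "b" is not the string "null".
my_df2 = my_df[my_df['b'] != "null"]
print(my_df2)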

DataFrame after:

a   b
1   9
2  10
3  11
4  12
5  13
6  14
7  15

All it does is look for "null" and exclude it. You can do the same thing with an empty string.
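The same idea extends to several placeholder values at once. The traceback in the question shows the column being cast with `astype(str)` first; in pandas versions of that era this turns None into the string "None", which is probably why `notnull()` removed nothing there. A rough sketch that handles real missing values and placeholder strings together, where the sample data and the list of placeholders are assumptions to adjust:

```python
import pandas as pd

# Hypothetical data mixing real values with several "empty" markers.
my_df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": ["9", "", "null", None, "13"],
})

# Placeholder strings to treat as missing; this list is an assumption,
# adjust it to whatever markers actually occur in the data.
placeholders = ["", "null", "None", "nan"]

# Compare on a stripped text version so "  " style blanks also match "".
as_text = my_df["b"].astype(str).str.strip()

# Keep rows that are not missing and not one of the placeholder strings.
mask = my_df["b"].notnull() & ~as_text.isin(placeholders)
my_df2 = my_df[mask]
print(my_df2)
```

Note that a row whose genuine value is the text "None" or "null" would be dropped as well, so this only fits data where those strings never carry meaning.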