Ril*_*n42 6 python dataframe pandas
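I am trying to unstack() data in a Pandas DataFrame, but I keep getting this error and I don't know why. Below is my code so far, along with a sample of my data. My attempted fix was to remove every row where voteId is not a number, but that did not work on my actual dataset. The error occurs both in an Anaconda notebook (where I develop) and in my production environment when I deploy the code.
I could not work out how to reproduce the error with my sample code... possibly because of a type-conversion issue that does not arise when the DataFrame is instantiated directly, as I do in the sample?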
#dataset simulate likely input
# d = {'vote': [100, 50,1,23,55,67,89,44],
# 'vote2': [10, 2,18,26,77,99,9,40],
# 'ballot1': ['a','b','a','a','b','a','c','c'],
# 'voteId':[1,2,3,4,5,'aaa',7,'NaN']}
# df1=pd.DataFrame(d)
#########################################################
df1=df1.drop_duplicates(['voteId','ballot1'],keep='last')
s=df1[:10].set_index(['voteId','ballot1'],verify_integrity=True).unstack()
s.columns=s.columns.map('(ballot1={0[1]}){0[0]}'.format)
dflw=pd.DataFrame(s)
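For reference, here is a minimal runnable sketch of the attempted fix described above (dropping rows whose voteId is not numeric) applied to the commented-out sample data. The use of `pd.to_numeric(..., errors="coerce")` and the names `numeric_id` and `clean` are assumptions made for illustration, not necessarily how the filtering was done originally. On this sample the unstack succeeds, which matches the report that the sample does not reproduce the error.

```python
import pandas as pd

# Sample data taken from the commented-out block in the question.
d = {
    "vote": [100, 50, 1, 23, 55, 67, 89, 44],
    "vote2": [10, 2, 18, 26, 77, 99, 9, 40],
    "ballot1": ["a", "b", "a", "a", "b", "a", "c", "c"],
    "voteId": [1, 2, 3, 4, 5, "aaa", 7, "NaN"],
}
df1 = pd.DataFrame(d)

# Assumed filtering approach: coerce voteId to numbers, turning anything
# that cannot be parsed (such as 'aaa') into NaN, then drop those rows.
numeric_id = pd.to_numeric(df1["voteId"], errors="coerce")
clean = df1[numeric_id.notna()].copy()
clean["voteId"] = numeric_id[numeric_id.notna()].astype(int)

# Same pipeline as in the question, run on the filtered frame.
clean = clean.drop_duplicates(["voteId", "ballot1"], keep="last")
s = clean.set_index(["voteId", "ballot1"], verify_integrity=True).unstack()
s.columns = s.columns.map("(ballot1={0[1]}){0[0]}".format)
print(s)
```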
Full error message / stack trace:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-10-0a520180a8d9> in <module>()
22 df1=df1.drop_duplicates(['voteId','ballot1'],keep='last')
23
---> 24 s=df1[:10].set_index(['voteId','ballot1'],verify_integrity=True).unstack()
25 s.columns=s.columns.map('(ballot1={0[1]}){0[0]}'.format)
26 dflw=pd.DataFrame(s)
~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in unstack(self, level, fill_value)
4567 """
4568 from pandas.core.reshape.reshape import unstack
-> 4569 return unstack(self, level, fill_value)
4570
4571 _shared_docs['melt'] = ("""
~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py in unstack(obj, level, fill_value)
467 if isinstance(obj, DataFrame):
468 if isinstance(obj.index, MultiIndex):
--> 469 return _unstack_frame(obj, level, fill_value=fill_value)
470 else:
471 return obj.T.stack(dropna=False)
~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py in _unstack_frame(obj, level, fill_value)
480 unstacker = partial(_Unstacker, index=obj.index,
481 level=level, fill_value=fill_value)
--> 482 blocks = obj._data.unstack(unstacker)
483 klass = type(obj)
484 return klass(blocks)
~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in unstack(self, unstacker_func)
4349 new_columns = new_columns[columns_mask]
4350
-> 4351 bm = BlockManager(new_blocks, [new_columns, new_index])
4352 return bm
4353
~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
3035 self._consolidate_check()
3036
-> 3037 self._rebuild_blknos_and_blklocs()
3038
3039 def make_empty(self, axes=None):
~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in _rebuild_blknos_and_blklocs(self)
3127
3128 if (new_blknos == -1).any():
-> 3129 raise AssertionError("Gaps in blk ref_locs")
3130
3131 self._blknos = new_blknos
AssertionError: Gaps in blk ref_locs
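To capture the real data that triggers the exception, add some extra debugging output.
Edit
~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py
and add these lines to BlockManager.__init__, after self.blocks and self.axes have been assigned:
from pprint import pprint
print('BlockManager blocks')
pprint(self.blocks)
print('BlockManager axes')
pprint(self.axes)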
You will get this data:
_unstack_frame level -1 fill_value None
                 vote  vote2
ballot1 voteId
NaN     xx      100.0   10.0
False   aaa      50.1    2.0
-1      \n        1.0   18.0
True    NaN      23.0   26.0
b       False    55.0   77.0
a       \        67.0   99.0
                 89.0    9.0
        8        44.0    NaN
Edit
~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py
and add these lines at the top of _unstack_frame(obj, level, fill_value):
from pprint import pprint
print('_unstack_frame level {} fill_value {} {}'.format(level, fill_value, type(obj)))
pprint(obj)
You will see this data:
BlockManager blocks
(FloatBlock: slice(0, 16, 1), 16 x 8, dtype: float64,)
BlockManager axes
[MultiIndex(levels=[[u'vote', u'vote2'], [False, 8, u'\n', u' ', u'\', u'aaa', u'xx']],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1], [-1, 0, 1, 2, 3, 4, 5, 6, -1, 0, 1, 2, 3, 4, 5, 6]],
names=[None, u'voteId']),
Index([nan, -1, False, True, u'', u'a', u'b', u'c'], dtype='object', name=u'ballot1')]
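The axes dump above shows key values of several different Python types side by side (nan, integers, booleans and strings). A quick way to check whether your own key columns are mixed like this, before calling unstack(), might look like the sketch below. The helper name `key_type_counts` and the small frame are hypothetical and only illustrate the idea; whether mixed key types are actually what causes the assertion is not established here.

```python
import pandas as pd

def key_type_counts(df, key_cols):
    """Count the Python types found in each candidate index column."""
    return {
        col: df[col].map(lambda v: type(v).__name__).value_counts().to_dict()
        for col in key_cols
    }

# Hypothetical frame loosely modelled on the debug output above.
df = pd.DataFrame({
    "vote": [100.0, 50.1, 1.0, 23.0],
    "ballot1": [float("nan"), False, -1, "a"],
    "voteId": ["xx", "aaa", "\n", None],
})

report = key_type_counts(df, ["ballot1", "voteId"])
for col, counts in report.items():
    print(col, counts)
```

A column that reports more than one type name is a candidate for cleaning or explicit casting before it is used as an index level.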
I did manage to trigger the exception with another example:
File "/usr/lib64/python2.7/site-packages/pandas/core/internals.py", line 2902, in _rebuild_blknos_and_blklocs
raise AssertionError("Gaps in blk ref_locs")
AssertionError: Gaps in blk ref_locs
with this debugging information:
BlockManager blocks
(FloatBlock: [-1, -1, -1], 3 x 2, dtype: float64,)
BlockManager axes
[Index([aaa, bbb, ccc], dtype='object'), Int64Index([0, 1], dtype='int64')]
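If editing files under site-packages is not an option (for example in production), a rough alternative is to catch the AssertionError in your own code and print the index keys together with their types at that point. This is only a sketch: the function name `unstack_with_diagnostics` is hypothetical, and it shows less than the BlockManager dump above because it only sees the index, not the internal blocks.

```python
import pandas as pd

def unstack_with_diagnostics(df, key_cols):
    """Unstack on key_cols; if pandas raises AssertionError, dump the keys."""
    indexed = df.set_index(key_cols, verify_integrity=True)
    try:
        return indexed.unstack()
    except AssertionError:
        # Print every index key with the Python type of each part,
        # then re-raise so the original traceback is preserved.
        for key in indexed.index.tolist():
            print(key, [type(part).__name__ for part in key])
        raise

# Small well-behaved frame, so this call succeeds and prints nothing extra.
sample = pd.DataFrame({
    "vote": [100, 50, 1],
    "ballot1": ["a", "b", "a"],
    "voteId": [1, 2, 3],
})
wide = unstack_with_diagnostics(sample, ["voteId", "ballot1"])
print(wide)
```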