Difference between `header = None` and `header = 0` in pandas

Question


Sar*_*pta 4 csv dataframe python-3.x pandas

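I am writing code that reads a csv file with pandas, and I have run into some odd behaviour in the package. My file has column names that I want to ignore, so I use header = 0 or 'infer' instead of None. But I am seeing something strange.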

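When I use None and want to get a particular column, I only need to do df[column_index], but when I use 0 or 'infer', I have to do df.ix[:, column_index] or something similar to get the column, because df[column_index] gives me the following error: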

Traceback (most recent call last):
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2525, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: column_index

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/internals.py", line 3843, in get
    loc = self.items.get_loc(item)
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2527, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: column_index

Can anyone help? Why does this happen?
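
The situation described in the question can be reproduced with a few lines. This is only a minimal sketch: the in-memory CSV, its column names `a`, `b`, `c`, and the variable names are made up for illustration and are not taken from the original file.

```python
import io

import pandas as pd

# Hypothetical three-column file whose first line holds the column names.
CSV_TEXT = "a,b,c\n1,2,3\n4,5,6\n"

# header=None: the columns are labelled 0, 1, 2, so an integer key works,
# but the line with the names is read as the first row of data.
df_none = pd.read_csv(io.StringIO(CSV_TEXT), header=None)
col_none = df_none[1].tolist()

# header=0: the columns are labelled "a", "b", "c", so the integer 1
# is not a valid column label and the lookup raises KeyError.
df_zero = pd.read_csv(io.StringIO(CSV_TEXT), header=0)
try:
    df_zero[1]
    raised = False
except KeyError:
    raised = True

print(col_none)  # ['b', '2', '5']
print(raised)    # True
```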

Answer 1

小智 7

The difference shows up when the data you load has a header row, so let's say the file behind your DataFrame df has one!

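header=None: pandas treats the first line of the file (which holds the actual column names) as the first row of data, so your columns no longer have names and are labelled with the default integers 0, 1, 2, ... instead.
header=0: pandas uses the first line of the file as the column names and does not read it as data. If you also pass names=[........] when loading the file, pandas drops that header line and assigns the new names instead: read_csv(filepath, header=0, names=['....', '....', ...]). This is why df[column_index] works with an integer in the first case but raises KeyError in the second, where the column labels are the strings from the header.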
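
The two cases above, plus the names=[...] variant, can be compared side by side. The following is a sketch under stated assumptions: the CSV content, the helper `load`, and the replacement names `x`, `y`, `z` are hypothetical, and `.iloc` is shown as one way to select a column by position regardless of how the header was read.

```python
import io

import pandas as pd

# Hypothetical data: one header line followed by two data rows.
CSV_TEXT = "a,b,c\n1,2,3\n4,5,6\n"


def load(**kwargs):
    """Read the in-memory CSV with the given read_csv keyword arguments."""
    return pd.read_csv(io.StringIO(CSV_TEXT), **kwargs)


df_none = load(header=None)                       # first line kept as data
df_zero = load(header=0)                          # first line becomes the labels
df_named = load(header=0, names=["x", "y", "z"])  # first line dropped, new labels

print(list(df_none.columns))   # [0, 1, 2]
print(list(df_zero.columns))   # ['a', 'b', 'c']
print(list(df_named.columns))  # ['x', 'y', 'z']
print(len(df_none), len(df_zero), len(df_named))  # 3 2 2

# Positional access works no matter what the column labels are.
second_by_position = df_zero.iloc[:, 1]
# Label access needs the actual label, here the string from the header.
second_by_label = df_zero["b"]
print(second_by_position.tolist())  # [2, 5]
```

Note that `.ix`, as used in the question, was deprecated and then removed in pandas 1.0; `.iloc` (by position) and `.loc` (by label) are the current replacements.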

Hope this helps!

Archived:	7 years, 5 months ago
Views:	37041
Last recorded:	3 years, 9 months ago