Pandas vs. Numpy indexing: why is the index order fundamentally different?

Question

inq*_*One 4 python numpy dataframe pandas

Numpy:

import numpy as np
nparr = np.array([[1, 5],[2,6], [3, 7]])
print(nparr)
print(nparr[0])    #first choose the row 
print(nparr[0][1]) #second choose the column


gives the expected output:

[[1 5]
 [2 6]
 [3 7]]

[1 5]

5


Pandas:

import pandas as pd
df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [5, 6, 7]
})
print(df)
print(df['a'])  #first choose the column !!!
print(df['a'][1])  #second choose the row !!!

gives the following output:

   a  b
0  1  5
1  2  6
2  3  7

0    1
1    2
2    3
Name: a, dtype: int64

2

What is the fundamental reason for changing the default order of "indexing" in a Pandas dataframe to be column-first? What do we gain from this loss of consistency/intuitiveness?

Of course, if I use the iloc function, we can write it much like Numpy array indexing:

df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [5, 6, 7]
})
print(df)
print(df.iloc[0])          #first choose the row
print(df.iloc[0].iloc[1])  #second choose the column

   a  b
0  1  5
1  2  6
2  3  7

a    1
b    5
Name: 0, dtype: int64

5


Answer 1

Fat*_*ici 5

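Because the intuition behind Numpy is mathematics (more specifically matrices, similar to MATLAB), while the intuition behind Pandas is databases (similar to SQL). Numpy works by rows and columns (rows first, because the (i, j) element of a matrix denotes the i-th row and j-th column), whereas Pandas works on database-style columns, within which you select elements, i.e. rows. Of course, as you mentioned, you can use iloc to work with positional indices directly.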
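To make the two views concrete, here is a minimal sketch that reuses the question's data. The variable names (by_position_np, by_label, and so on) are only illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

nparr = np.array([[1, 5], [2, 6], [3, 7]])
df = pd.DataFrame({"a": [1, 2, 3], "b": [5, 6, 7]})

# Matrix view: position (i, j) means row i, column j.
by_position_np = nparr[0, 1]

# Table view: plain [] on a DataFrame selects a column by name.
column_a = df["a"]

# Label-based access still takes (row label, column label).
by_label = df.loc[0, "b"]

# Positional access follows the same (row, column) order as Numpy.
by_position_df = df.iloc[0, 1]

# A DataFrame behaves like a dict of columns: iterating it yields column names.
column_names = list(df)

print(by_position_np, by_label, by_position_df, column_names)
```

So only the bare df[...] form is column-first. Both loc and iloc take the row first, which keeps them in line with the matrix convention.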

Hopefully the difference in paradigm/philosophy between the two makes sense.
