Why does assigning with [:] versus iloc[:] yield different results in pandas?

Question

I am confused by the different indexing methods that use iloc in pandas.

Say I am trying to convert a 1-d DataFrame into a 2-d DataFrame. First, I have the following 1-d DataFrame:

import pandas as pd
a_array = [1, 2, 3, 4, 5, 6, 7, 8]
a_df = pd.DataFrame(a_array).T

I am going to convert it into a 2-d DataFrame of size 2x4. I start by presetting the 2-d DataFrame as follows:

b_df = pd.DataFrame(columns=range(4),index=range(2))

Then I use a for loop to convert a_df (1-d) to b_df (2-d) with the following code:

for i in range(2):
    b_df.iloc[i,:] = a_df.iloc[0,i*4:(i+1)*4]

It only gives me the following result:

     0    1    2    3
0    1    2    3    4
1  NaN  NaN  NaN  NaN

However, when I change b_df.iloc[i,:] to b_df.iloc[i][:], the result is correct, as shown below, which is what I want:

   0  1  2  3
0  1  2  3  4
1  5  6  7  8

Could anyone explain to me what the difference between .iloc[i,:] and .iloc[i][:] is, and why .iloc[i][:] worked in my example above but .iloc[i,:] did not?

Answer 1

cs9*_*s95 3

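There is a very, very big difference between series.iloc[:] and series[:] when assigning back. (i)loc always checks that whatever you are assigning from matches the index of the assignee. Meanwhile, the [:] syntax assigns to the underlying NumPy array, bypassing index alignment.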

s = pd.Series(index=[0, 1, 2, 3], dtype='float')  
s                                                                          

0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

# Let's get a reference to the underlying array with `copy=False`
arr = s.to_numpy(copy=False) 
arr 
# array([nan, nan, nan, nan])

# Reassign using slicing syntax
s[:] = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])                 
s                                                                          

0    1
1    2
2    3
3    4
dtype: int64

arr 
# array([1., 2., 3., 4.]) # underlying array has changed

# Now, reassign again with `iloc`
s.iloc[:] = pd.Series([5, 6, 7, 8], index=[3, 4, 5, 6]) 
s                                                                          

0    NaN
1    NaN
2    NaN
3    5.0
dtype: float64

arr 
# array([1., 2., 3., 4.])  # `iloc` created a new array for the series
                           # during reassignment leaving this unchanged

s.to_numpy(copy=False)     # the new underlying array, for reference                                                   
# array([nan, nan, nan,  5.])

Now that you understand the difference, let's look at what happens in your code. Just print out the RHS of your loop to see what you are assigning:

for i in range(2): 
    print(a_df.iloc[0, i*4:(i+1)*4]) 

# output - first row                                                                   
0    1
1    2
2    3
3    4
Name: 0, dtype: int64
# second row. Notice the index is different
4    5
5    6
6    7
7    8
Name: 0, dtype: int64
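
The label mismatch in the output above can also be checked programmatically. The following is a minimal sketch rather than part of the original answer: the variable names (first, second, first_aligned, second_aligned, second_values) are made up for illustration, and reindex is used only as a stand-in to show what looking values up by label does. It is not claimed to be the exact code path that iloc assignment takes internally.

```python
import pandas as pd

a_df = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8]).T
b_df = pd.DataFrame(columns=range(4), index=range(2))

first = a_df.iloc[0, 0:4]    # labels 0..3, the same as b_df.columns
second = a_df.iloc[0, 4:8]   # labels 4..7, none of which are in b_df.columns

first_labels = first.index.tolist()
second_labels = second.index.tolist()

# reindex looks values up by label; labels missing from the source become NaN
first_aligned = first.reindex(b_df.columns)
second_aligned = second.reindex(b_df.columns)

# to_numpy() drops the labels and keeps only the positional values
second_values = second.to_numpy()

print(first_labels, second_labels)
print(second_aligned.isna().all())
print(second_values)
```

The second slice shares no labels with the columns of b_df, so a label-based lookup finds nothing to place, while the bare NumPy values carry no labels to match at all.
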

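When assigning to b_df.iloc[i, :] in the second iteration, the indexes are different, so nothing is assigned and you only see NaNs. However, changing b_df.iloc[i, :] to b_df.iloc[i][:] means that you assign to the underlying NumPy array, so index alignment is bypassed. This operation is better expressed as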

for i in range(2):
    b_df.iloc[i, :] = a_df.iloc[0, i*4:(i+1)*4].to_numpy()

b_df                                                                       

   0  1  2  3
0  1  2  3  4
1  5  6  7  8

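It is also worth mentioning that b_df.iloc[i][:] is a form of chained assignment, which is not a good thing, and it also makes your code harder to read and understand.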
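
If the goal is only to reshape the eight values into a 2x4 frame, one possible loop-free alternative (a sketch that is not part of the original answer) is to reshape the raw NumPy values and build a new DataFrame from them. This sidesteps both index alignment and chained assignment, assuming the default integer row and column labels are acceptable.

```python
import pandas as pd

a_df = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8]).T

# Reshape the raw values positionally; no index alignment is involved
b_df = pd.DataFrame(a_df.to_numpy().reshape(2, 4))
print(b_df)
```

This should give the same 2x4 frame as the corrected loop, with rows 1 to 4 and 5 to 8.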
