dtypes muck things up when shifting on axis one (columns)

piR*_*red 7 python pandas

Consider the dataframe df

import pandas as pd

df = pd.DataFrame(dict(A=[1, 2], B=['X', 'Y']))

df

   A  B
0  1  X
1  2  Y

If I shift along axis=0 (the default)

df.shift()

     A    B
0  NaN  NaN
1  1.0    X

It pushes all rows downwards one row as expected.

But when I shift along axis=1

df.shift(axis=1)

    A    B
0 NaN  NaN
1 NaN  NaN

Everything is null when I expected

     A  B
0  NaN  1
1  NaN  2

I understand why this happened. For axis=0, Pandas operates column by column, where each column has a single dtype, and when shifting there is a clear protocol for dealing with the NaN value introduced at the beginning or end. But when shifting along axis=1 we introduce potential ambiguity of dtype from one column to the next. In this case, I'm trying to force int64 into an object column and Pandas decides to just null the values.
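To make the axis=0 case concrete, here is a minimal sketch of the per-column protocol described above: the int64 column is upcast to float64 so it can hold the introduced NaN, while the other column keeps its values. The name `shifted` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2], B=['X', 'Y']))

# Shift along axis=0 (the default): each column is handled on its own.
shifted = df.shift()

# Compare dtypes before and after; column A has to hold a NaN now,
# so its integer values are upcast to float64.
print(df.dtypes)
print(shifted.dtypes)
```

Because each column only ever receives values from itself, there is no question about which dtype the result should have, which is exactly what is lost when shifting along axis=1.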

This becomes more problematic when the dtypes are int64 and float64

df = pd.DataFrame(dict(A=[1, 2], B=[1., 2.]))

df

   A    B
0  1  1.0
1  2  2.0

And the same thing happens

df.shift(axis=1)

    A   B
0 NaN NaN
1 NaN NaN

My Question

What are good options for creating a dataframe that is shifted along axis=1 in which the result has shifted values and dtypes?

For the int64/float64 case the result would look like:

df_shifted

     A  B
0  NaN  1
1  NaN  2

and

df_shifted.dtypes

A    object
B     int64
dtype: object

A more comprehensive example

df = pd.DataFrame(dict(A=[1, 2], B=[1., 2.], C=['X', 'Y'], D=[4., 5.], E=[4, 5]))

df

   A    B  C    D  E
0  1  1.0  X  4.0  4
1  2  2.0  Y  5.0  5

Should look like this

df_shifted

     A  B    C  D    E
0  NaN  1  1.0  X  4.0
1  NaN  2  2.0  Y  5.0

df_shifted.dtypes

A     object
B      int64
C    float64
D     object
E    float64
dtype: object

piR*_*red 7

It turns out that Pandas is shifting over blocks of similar dtypes

Define df

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))

df

#  i    f  o    f  i  o
#  n    l  b    l  n  b
#  t    t  j    t  t  j
#
   A    B  C    D  E  F
0  1  3.0  X  5.0  7  W
1  2  4.0  Y  6.0  8  Z

It shifts the integers to the next integer column, the floats to the next float column, and the objects to the next object column

df.shift(axis=1)

    A   B    C    D    E  F
0 NaN NaN  NaN  3.0  1.0  X
1 NaN NaN  NaN  4.0  2.0  Y

I don't know if that's a good idea, but that is what is happening.
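The block-wise behaviour can be sketched by hand. The function below is a rough emulation, not how pandas implements it internally, and `blockwise_shift` is a hypothetical name rather than pandas API. It groups the columns by dtype and moves each column's values into the next column of the same dtype. It reproduces only the values (not the resulting dtypes) of the output shown above, and the real `df.shift(axis=1)` may behave differently depending on the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))

def blockwise_shift(frame):
    """Hypothetical helper: shift each column into the next column of the same dtype."""
    # Collect column names per dtype, keeping their left-to-right order.
    groups = {}
    for col, dtype in frame.dtypes.items():
        groups.setdefault(str(dtype), []).append(col)

    # Start with every column nulled out.
    result = {col: [np.nan] * len(frame) for col in frame.columns}

    # Within each dtype group, the values of one column land in the next one.
    for cols in groups.values():
        for src, dst in zip(cols[:-1], cols[1:]):
            result[dst] = frame[src].tolist()

    return pd.DataFrame(result, index=frame.index, columns=frame.columns)

emulated = blockwise_shift(df)
print(emulated)
```

Here A, B and C are the first columns of their respective dtype groups, so they end up all NaN, while D, E and F receive the values of B, A and C.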


Approaches

astype(object) first

dtypes = df.dtypes.shift(fill_value=object)
df_shifted = df.astype(object).shift(1, axis=1).astype(dtypes)

df_shifted

     A  B    C  D    E  F
0  NaN  1  3.0  X  5.0  7
1  NaN  2  4.0  Y  6.0  8
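If this is needed in more than one place, the same idea can be wrapped in a small function. This is only a sketch: `shift_columns` and its `periods` parameter are names made up for illustration, and it assumes the shifted values can always be cast back to the dtype of the column they came from.

```python
import pandas as pd

def shift_columns(frame, periods=1):
    """Hypothetical helper: shift values and dtypes together along axis=1."""
    # Shift the dtypes the same way the values will move;
    # columns that are vacated fall back to object.
    dtypes = frame.dtypes.shift(periods, fill_value=object)
    # Casting to object first makes every column the same dtype,
    # so the values move purely by position.
    return frame.astype(object).shift(periods, axis=1).astype(dtypes)

df = pd.DataFrame(dict(A=[1, 2], B=[3., 4.], C=['X', 'Y']))
shifted = shift_columns(df)
print(shifted)
print(shifted.dtypes)
```

Passing a larger or negative `periods` shifts further or in the opposite direction, with the dtypes following the values.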

transpose

Will also make it object

dtypes = df.dtypes.shift(fill_value=object)
df_shifted = df.T.shift().T.astype(dtypes)

df_shifted

     A  B    C  D    E  F
0  NaN  1  3.0  X  5.0  7
1  NaN  2  4.0  Y  6.0  8

itertuples

pd.DataFrame([(np.nan, *t[1:-1]) for t in df.itertuples()], columns=[*df])

     A  B    C  D    E  F
0  NaN  1  3.0  X  5.0  7
1  NaN  2  4.0  Y  6.0  8

Though I'd probably do this instead

pd.DataFrame([
    (np.nan, *t[:-1]) for t in
    df.itertuples(index=False, name=None)
], columns=[*df])
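One detail worth checking with the itertuples route is what dtypes the rebuilt frame ends up with, since the constructor infers them per column rather than carrying them over. The sketch below rebuilds the frame and prints the dtypes. The `index=df.index` argument is an addition of mine so the original index is preserved, and `rebuilt` is just an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))

# Prepend NaN to each row and drop the last value, then rebuild the frame.
rebuilt = pd.DataFrame([
    (np.nan, *t[:-1]) for t in
    df.itertuples(index=False, name=None)
], columns=[*df], index=df.index)

# dtypes are inferred from the shifted values column by column.
print(rebuilt.dtypes)
```

Note that the all-NaN column A is inferred as float64 here rather than object, so this differs slightly from the dtypes asked for in the question.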

  • To me this is definitely a bug; it defeats the whole point of having keyed columns and shifting N positions column-wise. (4 upvotes)