Consider the dataframe df:
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2], B=['X', 'Y']))
df
A B
0 1 X
1 2 Y
If I shift along axis=0 (the default)
df.shift()
A B
0 NaN NaN
1 1.0 X
It pushes all rows downwards one row as expected.
But when I shift along axis=1
df.shift(axis=1)
A B
0 NaN NaN
1 NaN NaN
Everything is null when I expected
A B
0 NaN 1
1 NaN 2
I understand why this happened. For axis=0, Pandas operates column by column, where each column has a single dtype, and when shifting there is a clear protocol for dealing with the NaN value introduced at the beginning or end. But when shifting along axis=1, we introduce potential ambiguity of dtype from one column to the next. In this case, I'm trying to force int64 values into an object column, and Pandas decides to just null the values.
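To make the axis=0 protocol concrete, here is a minimal sketch using the same two-column frame. It only illustrates the upcast that a row shift triggers on an integer column; the variable name shifted is mine.

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2], B=['X', 'Y']))

# Shifting down one row leaves a hole at the top of every column.
shifted = df.shift()

# An int64 column cannot hold NaN, so pandas upcasts it to float64.
print(df['A'].dtype)       # int64
print(shifted['A'].dtype)  # float64
```

Each column resolves the new NaN within its own dtype, which is why no cross-column question arises for axis=0.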
This becomes more problematic when the dtypes are int64 and float64
df = pd.DataFrame(dict(A=[1, 2], B=[1., 2.]))
df
A B
0 1 1.0
1 2 2.0
And the same thing happens
df.shift(axis=1)
A B
0 NaN NaN
1 NaN NaN
What are good options for creating a dataframe that is shifted along axis=1 in which the result has shifted values and dtypes?
For the int64/float64 case the result would look like:
df_shifted
A B
0 NaN 1
1 NaN 2
and
df_shifted.dtypes
A object
B int64
dtype: object
A more comprehensive example
df = pd.DataFrame(dict(A=[1, 2], B=[1., 2.], C=['X', 'Y'], D=[4., 5.], E=[4, 5]))
df
A B C D E
0 1 1.0 X 4.0 4
1 2 2.0 Y 5.0 5
Should look like this
df_shifted
A B C D E
0 NaN 1 1.0 X 4.0
1 NaN 2 2.0 Y 5.0
df_shifted.dtypes
A object
B int64
C float64
D object
E float64
dtype: object
It turns out that Pandas is shifting over blocks of similar dtypes
Define df as
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
A=[1, 2], B=[3., 4.], C=['X', 'Y'],
D=[5., 6.], E=[7, 8], F=['W', 'Z']
))
df
#  i    f  o    f  i  o
#  n    l  b    l  n  b
#  t    t  j    t  t  j
#
   A    B  C    D  E  F
0  1  3.0  X  5.0  7  W
1  2  4.0  Y  6.0  8  Z
It will shift the integers to the next integer column, the floats to the next float column, and the objects to the next object column
df.shift(axis=1)
A B C D E F
0 NaN NaN NaN 3.0 1.0 X
1 NaN NaN NaN 4.0 2.0 Y
I don't know if that's a good idea, but that is what is happening.
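As a way to read the output above, the sketch below groups the column names by dtype. It only looks at df.dtypes and does not inspect pandas' internal block layout; the name columns_by_dtype is mine, and the shift(axis=1) behaviour itself may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))

# Collect column names under the dtype they share.
columns_by_dtype = {}
for name, dtype in df.dtypes.items():
    columns_by_dtype.setdefault(str(dtype), []).append(name)

for dtype, names in columns_by_dtype.items():
    print(dtype, names)
```

A and E share one group, B and D another, and the two string columns C and F a third, which matches the pairs of columns that exchanged values in the shifted output.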
astype(object) first
dtypes = df.dtypes.shift(fill_value=object)
df_shifted = df.astype(object).shift(1, axis=1).astype(dtypes)
df_shifted
A B C D E F
0 NaN 1 3.0 X 5.0 7
1 NaN 2 4.0 Y 6.0 8
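The same idea can be wrapped in a small helper. This is a sketch, not a pandas API: shift_right_keep_dtypes is a hypothetical name, it builds the target dtypes as a plain dict instead of shifting df.dtypes, and it assumes at least one column and unique column names.

```python
import pandas as pd

def shift_right_keep_dtypes(df):
    """Shift values one column to the right and carry each dtype along."""
    cols = list(df.columns)
    # The first column has no left neighbour, so it becomes all-NaN object.
    target = {cols[0]: object}
    # Every other column takes the dtype of the column to its left.
    for left, right in zip(cols, cols[1:]):
        target[right] = df[left].dtype
    # Cast everything to object so the shift moves values across all
    # columns, then restore the per-column dtypes.
    return df.astype(object).shift(1, axis=1).astype(target)

df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))
df_shifted = shift_right_keep_dtypes(df)
print(df_shifted)
print(df_shifted.dtypes)
```

Casting back works here because only the first column contains NaN after the shift, and that column stays object.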
transpose
This will make it object
dtypes = df.dtypes.shift(fill_value=object)
df_shifted = df.T.shift().T.astype(dtypes)
df_shifted
A B C D E F
0 NaN 1 3.0 X 5.0 7
1 NaN 2 4.0 Y 6.0 8
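A quick check of why the transpose route behaves this way, as a sketch with names of my own choosing: each column of the transposed frame mixes integers, floats and strings, so the ordinary row shift runs on object columns only.

```python
import pandas as pd

df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))

transposed = df.T
# Every column of the transpose holds mixed values, hence object dtype.
print(transposed.dtypes)

# A plain row shift on the transpose, flipped back, moves values one
# column to the right in the original orientation.
shifted = transposed.shift().T
print(shifted)
```

The result is still all object at this point, which is why the answer finishes with astype(dtypes).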
itertuples
pd.DataFrame([(np.nan, *t[1:-1]) for t in df.itertuples()], columns=[*df])
A B C D E F
0 NaN 1 3.0 X 5.0 7
1 NaN 2 4.0 Y 6.0 8
Though I'd probably do this
pd.DataFrame([
(np.nan, *t[:-1]) for t in
df.itertuples(index=False, name=None)
], columns=[*df])
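One detail worth checking with this approach is what dtypes the constructor infers. The sketch below reruns the same expression and prints them; the variable name rebuilt is mine.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))

# Prepend NaN to each row and drop the last value, then let the
# DataFrame constructor infer a dtype for every column.
rebuilt = pd.DataFrame([
    (np.nan, *t[:-1]) for t in
    df.itertuples(index=False, name=None)
], columns=[*df])

print(rebuilt.dtypes)
```

The shifted columns get the dtypes of their left neighbours through inference, but the all-NaN column A comes out as float64 rather than object. The row index is also rebuilt as a default range, so a custom index would need to be passed back in.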