Pandas: astype(int) applied to a float column returns negative numbers

Question


My task is to read data from Excel into a DataFrame. The data is a bit messy, so to clean it up I did the following:

df_1 = pd.read_excel(offers[0])
df_1 = df_1.rename(columns={'???????????? [???? ?????: 29.05.2019 ?????: 10:29:42 ]':'good_name', 
                     '????????':'barcode', 
                     '???? ??. ???.':'price',
                     '???????': 'balance'
                    })
df_1 = df_1[new_columns]
# I don't know why, but the code doesn't work unless NaN is first replaced with another character
df_1.barcode = df_1.barcode.fillna('_')
# remove all non-numeric characters
df_1.barcode = df_1.barcode.apply(lambda row: re.sub('[^0-9]', '', row))
# convert str to numeric
df_1.barcode = pd.to_numeric(df_1.barcode, downcast='integer').fillna(0)
df_1.head()

It returns the barcode column with dtype float64 (why is that?):

0    0.000000e+00
1    7.613037e+12
2    7.613037e+12
3    7.613034e+12
4    7.613035e+12
Name: barcode, dtype: float64

Then I tried to convert the column to integer.

df_1.barcode = df_1.barcode.astype(int)

But I keep getting silly negative numbers.

df_1.barcode[0:5]
0             0
1   -2147483648
2   -2147483648
3   -2147483648
4   -2147483648

Name: barcode, dtype: int32

Thanks to @Will and @micric, I finally have a solution.

df_1 = pd.read_excel(offers[0])
df_1 = df_1[new_columns]
# replacing NaN with 0, it'll help to convert the column explicitly to dtype integer
df_1.barcode = df_1.barcode.fillna('0')
# remove all non-numeric characters
df_1.barcode = df_1.barcode.apply(lambda row: re.sub('[^0-9]', '', row))
# convert str to integer
df_1.barcode = pd.to_numeric(df_1.barcode, downcast='integer')

To sum up:

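pd.to_numeric returns float64 when the column contains NaN, so for a column with both NaN and non-NaN values we should expect dtype float64.
Check the magnitude of the numbers you are working with. int32 has its limits: it holds values from -2**31 to 2**31 - 1 (2147483647), and 13-digit barcodes do not fit. Thank you very much for your help, guys!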
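For anyone who wants to try the cleaned-up pipeline without the original Excel file, here is a minimal sketch on made-up data. The barcode strings are hypothetical sample values, not the OP's data, and the Series is built by hand instead of coming from pd.read_excel.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical messy barcodes: stray separators and one missing value.
raw = pd.Series(
    ["7613037-000011", "7613034 000022", np.nan],
    dtype=object,
    name="barcode",
)

# Replace NaN with the string '0' first, so re.sub only ever sees strings.
digits = raw.fillna("0").apply(lambda s: re.sub(r"[^0-9]", "", s))

# No NaN is left, so to_numeric can produce an integer dtype that is
# wide enough for 13-digit values.
barcode = pd.to_numeric(digits, downcast="integer")

print(barcode.dtype)
print(barcode.tolist())
```

Because the missing value becomes '0' before the regex runs, no empty string or NaN reaches pd.to_numeric, which is what lets the result stay an integer column.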

Answer 1

Jas*_*oal 7

I ran into the same problem as the OP. Using

astype(np.int64)

solved it for me; see the link here.

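I like this solution because it matches the way I usually change column types in pandas. Perhaps someone could check how these solutions compare in performance.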
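A minimal sketch of this approach, assuming a float64 column like the one in the question (the values below are made up). The explicit np.int64 avoids depending on what plain int maps to on a given platform; in the OP's output it evidently mapped to int32.

```python
import numpy as np
import pandas as pd

# Hypothetical float64 column resembling the OP's barcode column.
barcode = pd.Series([0.0, 7613037000011.0, 7613034000022.0], name="barcode")

# Ask for 64-bit integers explicitly instead of the platform default.
as_int64 = barcode.astype(np.int64)

print(as_int64.dtype)
print(as_int64.tolist())
```

Note that this only works once the column has no NaN left, since NaN cannot be cast to an integer.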

Answer 2

mic*_*ric 5

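That number is the lower bound of a 32-bit integer. Your values are outside the range of the int32 type you are trying to use, so the conversion returns the limit (note that 2**32 = 4294967296, which divided by 2 is 2147483648, i.e. your number with the sign dropped).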

You should use astype(int64) instead.

It should actually be "astype(np.int64)", /sf/ask/3076943481/ (6 upvotes)
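The range argument can be checked directly with np.iinfo. This is only a sketch of the arithmetic; the barcode value is a made-up 13-digit number of the same magnitude as those in the question.

```python
import numpy as np

# Bounds of the 32-bit signed integer type.
info32 = np.iinfo(np.int32)
lo, hi = int(info32.min), int(info32.max)

# Hypothetical 13-digit barcode, same order of magnitude as the OP's data.
barcode = 7613037000011

fits32 = lo <= barcode <= hi
fits64 = int(np.iinfo(np.int64).min) <= barcode <= int(np.iinfo(np.int64).max)

print(lo, hi)
print(fits32, fits64)
```

The lower bound printed here is the same -2147483648 that shows up in the question's output.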

Answer 3

Wil*_*ill 4

Several questions rolled into one.

So, about the dtype you expect from...

pd.to_numeric(df_1.barcode, downcast='integer').fillna(0)

pd.to_numeric with downcast='integer' would normally give you an integer dtype. However, your data contains NaN, and pandas needs the float64 dtype to represent NaN.
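To see the difference the order of operations makes, here is a small sketch with made-up values: filling NaN after pd.to_numeric leaves the column as float64, while filling before the conversion lets it come out as an integer dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical digit strings with one missing value.
digits = pd.Series(["7613037000011", np.nan, "7613034000022"], dtype=object)

# NaN is still present during conversion, so the result is float64,
# and fillna(0) afterwards does not change the dtype.
late = pd.to_numeric(digits, downcast="integer").fillna(0)

# NaN is replaced before conversion, so an integer dtype is possible.
early = pd.to_numeric(digits.fillna("0"), downcast="integer")

print(late.dtype)
print(early.dtype)
```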
