Pandas: fillna only numeric (int or float) columns

Question

Pandas: fillna only numeric (int or float) columns

6 python pandas

I would like to apply fillna only to numeric columns. Is that possible?

Right now, I am applying it to all columns:

df = df.replace(r"^\s*$", np.nan, regex=True)

Answer 1

Pra*_*iel 9

You can select the numeric columns and then fill only those, for example:

import pandas as pd

df = pd.DataFrame({'a': [1, None] * 3,
                   'b': [True, None] * 3,
                   'c': [1.0, None] * 3})

# select numeric columns
numeric_columns = df.select_dtypes(include=['number']).columns

# fill all NaN in the numeric columns with -1
df[numeric_columns] = df[numeric_columns].fillna(-1)

# print
print(df)

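As a variation on the same idea, `fillna` also accepts a dict that maps column names to fill values, so the selected columns can be filled without assigning back into a slice. The sketch below reuses the example frame from this answer; the name `filled` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, None] * 3,
                   'b': [True, None] * 3,
                   'c': [1.0, None] * 3})

# Column 'b' holds True/None, so pandas stores it as object dtype and
# select_dtypes(include='number') does not pick it up.
numeric_columns = df.select_dtypes(include='number').columns

# fillna accepts a dict of {column name: fill value}; columns that are
# not keys of the dict are left untouched.
filled = df.fillna({col: -1 for col in numeric_columns})

print(filled)
```

This returns a new DataFrame and leaves `df` unchanged, which may be convenient when the original values are still needed.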

Answer 2

sam*_*mmy 6

This is an old question, but I found that filling the columns individually is faster than the currently selected answer:

def func(df, value):
    df = df.copy()
    for col in df:
        # select only integer or float dtypes
        if df[col].dtype in ("int", "float"):
            df[col] = df[col].fillna(value)
    return df

func(df, value=-1) # or df.pipe(func, value=-1)

      a      b        c
0    1.0    True     1.0
1   -1.0    None    -1.0
2    1.0    True     1.0
3   -1.0    None    -1.0
4    1.0    True     1.0
5   -1.0    None    -1.0
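
One thing to be aware of: comparing a dtype against the strings "int" and "float" matches only the platform default integer type and float64, so a float32 or int32 column would be skipped. A minimal sketch of a broader check, assuming the helpers in `pandas.api.types` are acceptable here, could look like this (the name `fill_numeric` and the sample frame are made up for illustration):

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def fill_numeric(df, value):
    """Fill NaN with `value` in integer and float columns of any width."""
    df = df.copy()
    for col in df:
        dtype = df[col].dtype
        # Both helpers return False for bool and object columns,
        # so those columns are left as they are.
        if is_integer_dtype(dtype) or is_float_dtype(dtype):
            df[col] = df[col].fillna(value)
    return df


# Hypothetical frame with a float32 column, which the string
# comparison against "float" would not match.
df = pd.DataFrame({'a': np.array([1.0, np.nan], dtype='float32'),
                   'b': [True, None],
                   'c': [1.0, None]})

out = fill_numeric(df, value=-1)
print(out)
```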

Comparing speed, the loop returns 470 µs ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each), while the accepted answer returns 1.57 ms ± 26.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each).

If the DataFrame is enlarged to 60,000 rows with pd.concat([df]*10_000, ignore_index=True), the loop returns 1.48 ms ± 79.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each), while the selected answer returns 2.47 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each).

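In both cases the loop is considerably faster than the selected answer, although your mileage may vary. Just some food for thought, especially when trying to squeeze out more performance.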
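
To check these numbers on your own machine, a rough timing sketch with the standard-library `timeit` module might look like the following. The wrapper names `fill_select_dtypes` and `fill_loop` are made up for this comparison, and the `copy()` in the first wrapper is an assumption added so that both functions do the same amount of work. It may not match how the timings above were taken.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'a': [1, None] * 3,
                   'b': [True, None] * 3,
                   'c': [1.0, None] * 3})


def fill_select_dtypes(df, value):
    # Approach from the selected answer, wrapped in a function.
    df = df.copy()
    cols = df.select_dtypes(include='number').columns
    df[cols] = df[cols].fillna(value)
    return df


def fill_loop(df, value):
    # Column-by-column approach from this answer.
    df = df.copy()
    for col in df:
        if df[col].dtype in ("int", "float"):
            df[col] = df[col].fillna(value)
    return df


# 6 rows repeated 10,000 times gives the 60,000-row frame.
big = pd.concat([df] * 10_000, ignore_index=True)

runs = 100
for label, frame in [("6 rows", df), ("60,000 rows", big)]:
    for fn in (fill_select_dtypes, fill_loop):
        seconds = timeit.timeit(lambda: fn(frame, -1), number=runs)
        print(f"{label:>12}  {fn.__name__:<20} {seconds / runs * 1e6:10.1f} us per call")
```

Absolute timings depend on the pandas version and hardware, so only the relative difference between the two functions is meaningful.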

Archived:	5 years, 9 months ago
Views:	2120
Last activity:	4 years, 10 months ago