How to efficiently fillna(0) if a Series is all NaN, or the remaining non-NaN entries are zero?

Lie*_*ien 8 python multiple-conditions conditional-statements pandas fillna

Given a pandas Series, I want to fill the NaNs with zero if either all the values are NaN, or all the values are either zero or NaN.

For example, I would want to fill the NaNs in the following Series with zeros.

0       0
1       0
2       NaN
3       NaN
4       NaN
5       NaN
6       NaN
7       NaN
8       NaN

However, I would not want to fillna(0) the following Series, because it contains a non-zero value:

0       0
1       0
2       2
3       0
4       NaN
5       NaN
6       NaN
7       NaN
8       NaN

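I was looking at the documentation, and it seems I could use pandas.Series.value_counts to make sure the values are only 0 and NaN, and then simply call fillna(0). In other words, I want to check whether set(s.unique().astype(str)).issubset(['0.0','nan']), THEN fillna(0), otherwise not.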

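Considering how powerful pandas is, it seems there may be a better way to do this. Does anyone have suggestions for doing this cleanly and efficiently?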

Potential solution, thanks to cᴏʟᴅsᴘᴇᴇᴅ:

if s.dropna().eq(0).all():
    s = s.fillna(0)
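A minimal sketch that wraps the check above in a helper and runs it on the two example Series from the question, plus an all-NaN Series. The helper name `fill_zeros_if_only_zeros_or_nan` and the variable names are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def fill_zeros_if_only_zeros_or_nan(s: pd.Series) -> pd.Series:
    """Return s with NaN replaced by 0 when every non-NaN value is 0.

    Hypothetical helper name; the check itself is s.dropna().eq(0).all().
    """
    # .all() on an empty Series is True, so an all-NaN Series is filled too.
    if s.dropna().eq(0).all():
        return s.fillna(0)
    return s


only_zeros = pd.Series([0, 0] + [np.nan] * 7)        # first example
has_a_two = pd.Series([0, 0, 2, 0] + [np.nan] * 5)   # second example
all_nan = pd.Series([np.nan] * 3)

filled = fill_zeros_if_only_zeros_or_nan(only_zeros)
untouched = fill_zeros_if_only_zeros_or_nan(has_a_two)
filled_all_nan = fill_zeros_if_only_zeros_or_nan(all_nan)

print(filled.isna().sum())     # NaNs left in the first example
print(untouched.isna().sum())  # NaNs left in the second example
```

The all-NaN case needs no separate branch, since `dropna()` leaves an empty Series and `all()` of an empty Series is True.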

jez*_*ael 8

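You can check whether the Series contains only NaNs and 0 by combining a comparison with 0 and isna, and then replace it with zeros: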

if ((s == 0) | (s.isna())).all():
    s = pd.Series(0, index=s.index)
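One difference worth knowing between the two ways of producing the result: `s.fillna(0)` keeps the dtype and name of the original Series, while `pd.Series(0, index=s.index)` builds a fresh Series from the integer scalar. A small sketch, where the Series name `x` and the variable names are made up for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 0, np.nan, np.nan], name="x")

via_fillna = s.fillna(0)               # keeps the float dtype and the name of s
via_new = pd.Series(0, index=s.index)  # new integer Series, no name

print(via_fillna.dtype, via_fillna.name)
print(via_new.dtype, via_new.name)

# The values agree even though dtype and name differ.
same_values = via_fillna.eq(via_new).all()
```

If downstream code depends on the dtype or on the Series name, `fillna(0)` is probably the safer choice.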

Or compare the unique values:

if pd.Series(s.unique()).fillna(0).eq(0).all():
    s = pd.Series(0, index=s.index)

@cᴏʟᴅsᴘᴇᴇᴅ's solution, thank you: compare the Series with the NaNs removed by dropna:

if s.dropna().eq(0).all():
    s = pd.Series(0, index=s.index)

Solution from the question: the values need to be converted to strings, because NaNs do not compare equal to each other:

if set(s.unique().astype(str)).issubset(['0.0','nan']):
    s = pd.Series(0, index=s.index)
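To illustrate why the string conversion is needed, here is a small sketch comparing the set check on raw floats with the set check on strings. The variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, np.nan, 0, np.nan])

# NaN never compares equal to anything, including another NaN.
nan_equals_nan = np.nan == np.nan

# So the NaN coming out of s.unique() is not found in a set of floats.
float_check = set(s.unique()).issubset({0.0, float("nan")})

# After converting to strings, 'nan' == 'nan' compares as ordinary text.
string_check = set(s.unique().astype(str)).issubset(["0.0", "nan"])

print(nan_equals_nan, float_check, string_check)
```

The string check relies on the Series being float, so that zero is rendered as '0.0'. A Series of another dtype would need different strings.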

Timings:

s = pd.Series(np.random.choice([0,np.nan], size=10000))

In [68]: %timeit ((s == 0) | (s.isna())).all()
The slowest run took 4.85 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 574 µs per loop

In [69]: %timeit pd.Series(s.unique()).fillna(0).eq(0).all()
1000 loops, best of 3: 587 µs per loop

In [70]: %timeit s.dropna().eq(0).all()
The slowest run took 4.65 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 774 µs per loop

In [71]: %timeit set(s.unique().astype(str)).issubset(['0.0','nan'])
The slowest run took 5.78 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 157 µs per loop
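The `%timeit` lines above only work in IPython. As a rough sketch, the same four checks can be timed from a plain script with the standard `timeit` module. The seed, the labels, and the repeat counts are assumptions added here, and the numbers will vary with hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same kind of test data as in the timings above; the seed is an added assumption.
rng = np.random.default_rng(0)
s = pd.Series(rng.choice([0, np.nan], size=10000))

candidates = {
    "eq | isna": lambda: ((s == 0) | s.isna()).all(),
    "unique + fillna": lambda: pd.Series(s.unique()).fillna(0).eq(0).all(),
    "dropna": lambda: s.dropna().eq(0).all(),
    "set of strings": lambda: set(s.unique().astype(str)).issubset(["0.0", "nan"]),
}

results = {}
for label, func in candidates.items():
    # Best of 5 repeats, 100 calls each, reported in microseconds per call.
    best = min(timeit.repeat(func, number=100, repeat=5)) / 100
    results[label] = best * 1e6
    print(f"{label:>16}: {results[label]:.0f} µs per call")
```

The test Series contains only 0 and NaN, so every check returns True here. Timings on data with many distinct values may rank the approaches differently, especially for the ones based on `unique()`.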