How to use dropna to drop columns on a subset of columns in Pandas

Gau*_*sal 1 python pandas

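I want to use Pandas' dropna function with axis=1 to drop columns, but only on a subset of columns, with some thresh set. More specifically, I want to pass an argument specifying which columns to ignore in the dropna operation. How can I do this? Below is an example of what I've tried.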

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'building': ['bul2', 'bul2', 'cap1', 'cap1'],
    'date': ['2019-01-01', '2019-02-01', '2019-01-01', '2019-02-01'],
    'rate1': [301, np.nan, 250, 276],
    'rate2': [250, 300, np.nan, np.nan],
    'rate3': [230, np.nan, np.nan, np.nan], 
    'rate4': [230, np.nan, 245, np.nan], 
})

# Only retain columns with at least 3 non-missing values
df.dropna(axis=1, thresh=3)
    building    date    rate1
0   bul2    2019-01-01  301.0
1   bul2    2019-02-01  NaN
2   cap1    2019-01-01  250.0
3   cap1    2019-02-01  276.0

# Try to do the same but only apply dropna to the subset of [building, date, rate1, and rate2],
# (meaning do NOT drop rate3 and rate4)
df.dropna(axis=1, thresh=3, subset=['building', 'date', 'rate1', 'rate2'])
KeyError: ['building', 'date', 'rate1', 'rate2']

Ale*_*der 5
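
As background on the KeyError in the question: `subset` names labels along the other axis, so with `axis=1` pandas looks the given names up in the row index instead of the columns. The sketch below illustrates this on a small made-up frame (the two-column `df` here is an assumption for illustration, not the question's full data).

```python
import numpy as np
import pandas as pd

# Small illustrative frame with the default integer row index (0..3).
df = pd.DataFrame({
    'rate1': [301, np.nan, 250, 276],
    'rate3': [230, np.nan, np.nan, np.nan],
})

# With axis=1, `subset` is matched against row labels, so a column name fails.
try:
    df.dropna(axis=1, thresh=3, subset=['rate1'])
    raised = False
except KeyError:
    raised = True

# Row labels are accepted: only rows 0, 2 and 3 are counted toward `thresh`.
kept = df.dropna(axis=1, thresh=3, subset=[0, 2, 3])
```

So `subset` cannot be used directly to restrict a column-wise `dropna` to certain columns; the columns have to be selected first.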

# Desired subset of columns against which to apply `dropna`.
cols = ['building', 'date', 'rate1', 'rate2']

# Apply `dropna` and see which columns remain.
filtered_cols = df.loc[:, cols].dropna(axis=1, thresh=3).columns

# Use a conditional list comprehension to determine which columns were dropped.
dropped_cols = [col for col in cols if col not in filtered_cols]

# Use a conditional list comprehension to display all columns other than those that were dropped.
new_cols = [col for col in df if col not in dropped_cols]
>>> df[new_cols]
  building        date  rate1  rate3  rate4
0     bul2  2019-01-01  301.0  230.0  230.0
1     bul2  2019-02-01    NaN    NaN    NaN
2     cap1  2019-01-01  250.0    NaN  245.0
3     cap1  2019-02-01  276.0    NaN    NaN
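
The same idea can be wrapped in a small helper. This is only a sketch: the function name `dropna_columns_subset` is hypothetical, and it counts non-missing values directly instead of calling `dropna`, which should give the same result for a `thresh`-style rule.

```python
import numpy as np
import pandas as pd


def dropna_columns_subset(df, cols, thresh):
    """Drop the columns in `cols` that have fewer than `thresh` non-missing
    values; columns not listed in `cols` are always kept."""
    counts = df[cols].notna().sum()           # non-missing count per candidate column
    to_drop = counts[counts < thresh].index   # candidates that fall below the threshold
    return df.drop(columns=to_drop)


df = pd.DataFrame({
    'building': ['bul2', 'bul2', 'cap1', 'cap1'],
    'date': ['2019-01-01', '2019-02-01', '2019-01-01', '2019-02-01'],
    'rate1': [301, np.nan, 250, 276],
    'rate2': [250, 300, np.nan, np.nan],
    'rate3': [230, np.nan, np.nan, np.nan],
    'rate4': [230, np.nan, 245, np.nan],
})

# rate2 has only 2 non-missing values, so it is dropped;
# rate3 and rate4 are outside `cols` and are left alone.
result = dropna_columns_subset(df, ['building', 'date', 'rate1', 'rate2'], thresh=3)
```

Because the helper returns a new frame, the original `df` is left unchanged and the column order of the remaining columns is preserved.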