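I want to use pandas' dropna with axis=1 to drop columns, but I only want a given thresh to be applied to a subset of the columns. More specifically, I want to pass an argument that tells dropna which columns to ignore. How can I do this? Below is an example of what I have tried.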
import numpy as np  # needed for np.nan below
import pandas as pd
df = pd.DataFrame({
    'building': ['bul2', 'bul2', 'cap1', 'cap1'],
    'date': ['2019-01-01', '2019-02-01', '2019-01-01', '2019-02-01'],
    'rate1': [301, np.nan, 250, 276],
    'rate2': [250, 300, np.nan, np.nan],
    'rate3': [230, np.nan, np.nan, np.nan],
    'rate4': [230, np.nan, 245, np.nan],
})
# Only retain columns with at least 3 non-missing values
df.dropna(axis=1, thresh=3)
  building        date  rate1
0     bul2  2019-01-01  301.0
1     bul2  2019-02-01    NaN
2     cap1  2019-01-01  250.0
3     cap1  2019-02-01  276.0
# Try to do the same but only apply dropna to the subset of [building, date, rate1, and rate2],
# (meaning do NOT drop rate3 and rate4)
df.dropna(axis=1, thresh=3, subset=['building', 'date', 'rate1', 'rate2'])
KeyError: ['building', 'date', 'rate1', 'rate2']
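The KeyError appears to come from how `subset` is interpreted: it names labels along the other axis, so with axis=1 it is looked up in the row index rather than in the columns. The sketch below rebuilds the same frame and shows both cases; the names `raised`, `by_rows`, and `kept` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'building': ['bul2', 'bul2', 'cap1', 'cap1'],
    'date': ['2019-01-01', '2019-02-01', '2019-01-01', '2019-02-01'],
    'rate1': [301, np.nan, 250, 276],
    'rate2': [250, 300, np.nan, np.nan],
    'rate3': [230, np.nan, np.nan, np.nan],
    'rate4': [230, np.nan, 245, np.nan],
})

# With axis=1, `subset` is matched against the row index (0..3 here),
# so passing column names raises KeyError.
try:
    df.dropna(axis=1, thresh=3, subset=['building', 'date', 'rate1', 'rate2'])
    raised = False
except KeyError:
    raised = True

# Row labels are accepted: only rows 0 and 2 count toward `thresh`.
by_rows = df.dropna(axis=1, thresh=2, subset=[0, 2])
kept = list(by_rows.columns)
```

So `subset` can restrict which rows are counted when dropping columns, but it cannot exempt particular columns from being dropped.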
# Desired subset of columns against which to apply `dropna`.
cols = ['building', 'date', 'rate1', 'rate2']
# Apply `dropna` and see which columns remain.
filtered_cols = df.loc[:, cols].dropna(axis=1, thresh=3).columns
# Use a conditional list comprehension to determine which columns were dropped.
dropped_cols = [col for col in cols if col not in filtered_cols]
# Use a conditional list comprehension to display all columns other than those that were dropped.
new_cols = [col for col in df if col not in dropped_cols]
>>> df[new_cols]
  building        date  rate1  rate3  rate4
0     bul2  2019-01-01  301.0  230.0  230.0
1     bul2  2019-02-01    NaN    NaN    NaN
2     cap1  2019-01-01  250.0    NaN  245.0
3     cap1  2019-02-01  276.0    NaN    NaN
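A shorter variant of the same idea, as a sketch rather than the only way to do it: count the non-missing values in the candidate columns directly, then drop the ones that fall short. The names `min_count`, `non_missing`, `to_drop`, and `result` are hypothetical and chosen only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'building': ['bul2', 'bul2', 'cap1', 'cap1'],
    'date': ['2019-01-01', '2019-02-01', '2019-01-01', '2019-02-01'],
    'rate1': [301, np.nan, 250, 276],
    'rate2': [250, 300, np.nan, np.nan],
    'rate3': [230, np.nan, np.nan, np.nan],
    'rate4': [230, np.nan, 245, np.nan],
})

# Columns that are candidates for dropping; every other column is left alone.
cols = ['building', 'date', 'rate1', 'rate2']
min_count = 3  # plays the role of `thresh`

# Number of non-missing values per candidate column.
non_missing = df[cols].notna().sum()

# Candidate columns that have fewer than `min_count` non-missing values.
to_drop = non_missing[non_missing < min_count].index

# Drop only those columns; rate3 and rate4 are never considered.
result = df.drop(columns=to_drop)
```

Here only rate2 falls below the threshold, so `result` keeps building, date, rate1, rate3, and rate4 in their original order.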