我有一个数据框df1,看起来像:
c k l
0 A 1 a
1 A 2 b
2 B 2 a
3 C 2 a
4 C 2 d
Run Code Online (Sandbox Code Playgroud)
另一个叫做df2:
c l
0 A b
1 C a
Run Code Online (Sandbox Code Playgroud)
我想过滤df1只保留不在的值df2.要过滤的值应为as (A,b)和(C,a)tuples.到目前为止,我尝试应用该isin方法:
d = df[~(df['l'].isin(dfc['l']) & df['c'].isin(dfc['c']))]
Run Code Online (Sandbox Code Playgroud)
除了在我看来太复杂,它返回:
c k l
2 B 2 a
4 C 2 d
Run Code Online (Sandbox Code Playgroud)
但我期待:
c k l
0 A 1 a
2 B 2 a
4 C 2 d
Run Code Online (Sandbox Code Playgroud)
jak*_*vdp 41
您可以使用isin从所需列构造的多索引有效地执行此操作:
df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
'k': [1, 2, 2, 2, 2],
'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
'l': ['b', 'a']})
keys = list(df2.columns.values)
i1 = df1.set_index(keys).index
i2 = df2.set_index(keys).index
df1[~i1.isin(i2)]
Run Code Online (Sandbox Code Playgroud)
我认为这改进了@ IanS的类似解决方案,因为它不假设任何列类型(即它将使用数字和字符串).
(以上答案是编辑.以下是我的初步答案)
有趣!这是我之前没有遇到的......我可能会通过合并两个数组来解决它,然后删除df2定义的行.这是一个使用临时数组的示例:
df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
'k': [1, 2, 2, 2, 2],
'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
'l': ['b', 'a']})
# create a column marking df2 values
df2['marker'] = 1
# join the two, keeping all of df1's indices
joined = pd.merge(df1, df2, on=['c', 'l'], how='left')
joined
Run Code Online (Sandbox Code Playgroud)
# extract desired columns where marker is NaN
joined[pd.isnull(joined['marker'])][df1.columns]
Run Code Online (Sandbox Code Playgroud)
可能有一种方法可以在不使用临时数组的情况下执行此操作,但我想不到一个.只要您的数据不是很大,上述方法应该是一个快速而充分的答案.
小智 12
这非常简洁,效果很好:
df1 = df1[~df1.index.isin(df2.index)]
Run Code Online (Sandbox Code Playgroud)
DataFrame.merge & DataFrame.query:A more elegant method would be to do left join with the argument indicator=True, then filter all the rows which are left_only with query:
d = (
df1.merge(df2,
on=['c', 'l'],
how='left',
indicator=True)
.query('_merge == "left_only"')
.drop(columns='_merge')
)
print(d)
c k l
0 A 1 a
2 B 2 a
4 C 2 d
Run Code Online (Sandbox Code Playgroud)
indicator=True returns a dataframe with an extra column _merge which marks each row left_only, both, right_only:
df1.merge(df2, on=['c', 'l'], how='left', indicator=True)
c k l _merge
0 A 1 a left_only
1 A 2 b both
2 B 2 a left_only
3 C 2 a both
4 C 2 d left_only
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
33831 次 |
| 最近记录: |