This seems simple, but I can't find any information about it on the internet.
I have a DataFrame like the following:
City State Zip Date Description
Earlham IA 50072-1036 2014-10-10 Postmarket Assurance: Devices
Earlham IA 50072-1036 2014-10-10 Compliance: Devices
Madrid IA 50156-1748 2014-09-10 Drug Quality Assurance
How do I eliminate duplicates that match on 4 of the 5 columns? The column that does not match is Description.
The result would be:
City State Zip Date Description
Earlham IA 50072-1036 2014-10-10 Postmarket Assurance: Devices
Madrid IA 50156-1748 2014-09-10 Drug Quality Assurance
What I found online is that drop_duplicates with the subset parameter could work, but I'm not sure how to apply it to multiple columns.
ayh*_*han 26
You've actually found the solution. For multiple columns, subset takes a list of column names.
df.drop_duplicates(subset=['City', 'State', 'Zip', 'Date'])
Alternatively, just state the column to ignore:
df.drop_duplicates(subset=df.columns.difference(['Description']))
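For anyone who wants to try both variants end to end, here is a minimal sketch that rebuilds the sample data from the question. How the frame is constructed is an assumption (in particular, Zip and Date are stored as plain strings), and the variable names `by_list`, `by_difference`, and `last_kept` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question; Zip and Date kept as strings (assumption).
df = pd.DataFrame({
    "City": ["Earlham", "Earlham", "Madrid"],
    "State": ["IA", "IA", "IA"],
    "Zip": ["50072-1036", "50072-1036", "50156-1748"],
    "Date": ["2014-10-10", "2014-10-10", "2014-09-10"],
    "Description": [
        "Postmarket Assurance: Devices",
        "Compliance: Devices",
        "Drug Quality Assurance",
    ],
})

# Variant 1: list the columns that must match for rows to count as duplicates.
by_list = df.drop_duplicates(subset=["City", "State", "Zip", "Date"])

# Variant 2: name only the column to ignore and compare on all the others.
by_difference = df.drop_duplicates(subset=df.columns.difference(["Description"]))

# keep="first" is the default; keep="last" retains the last row of each group instead.
last_kept = df.drop_duplicates(subset=["City", "State", "Zip", "Date"], keep="last")

print(by_list)
```

With the default `keep="first"`, the Earlham row that survives is the first one in the frame, which matches the desired result in the question. Note that drop_duplicates returns a new DataFrame and keeps the original index labels, so assign the result back (and call `reset_index(drop=True)` if a fresh index is wanted).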