Nab*_*zir 5 python duplicates dataframe pandas
I have a dataset that contains paired duplicates. Here is my data:
Id antecedent descendant
1 one two
2 two one
3 two three
4 one three
5 three two
This is what I need: since one, two is equal to two, one, I want to remove the duplicated pairs.
Id antecedent descendant
1 one two
3 two three
4 one three
Use numpy.sort to sort each row, then build a boolean mask with duplicated:
# assumes: import numpy as np; import pandas as pd
df1 = pd.DataFrame(np.sort(df[['antecedent','descendant']], axis=1), index=df.index)
Alternatively:
# slower solution
#df1 = df[['antecedent','descendant']].apply(frozenset, 1)
# keep only the first occurrence of each unordered pair
df = df[~df1.duplicated()]
print(df)
Id antecedent descendant
0 1 one two
2 3 two three
3 4 one three
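For anyone who wants to run the whole thing end to end, here is a minimal self-contained sketch of the sort-then-duplicated approach above. The sample frame is rebuilt from the question's data; the names `pair_keys` and `result` are only illustrative and are not from the original answer. Passing `index=df.index` is an extra precaution so the mask still lines up if the frame does not have a default index.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "antecedent": ["one", "two", "two", "one", "three"],
    "descendant": ["two", "one", "three", "three", "two"],
})

# Sort the two columns within each row, so that (two, one) and (one, two)
# end up with the same key. `pair_keys` is an illustrative name.
pair_keys = pd.DataFrame(
    np.sort(df[["antecedent", "descendant"]].to_numpy(), axis=1),
    index=df.index,
)

# duplicated() flags every repeat after the first occurrence of a key,
# so negating it keeps the first row of each unordered pair.
result = df[~pair_keys.duplicated()]
print(result)
```

Note that the original column order of the kept rows is untouched, because the sorted copy is only used to build the mask.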