Pandas: merge two string columns in Python, removing duplicate strings and unwanted strings unless only the unwanted string remains

Question

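I am trying to merge two string columns and want to drop 'others' whenever the value it is paired with is not 'others'. For example, 'apple' + 'others' = 'apple', but 'others' + 'others' = 'others'. I managed the second condition, but how can I satisfy both conditions when merging?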

import pandas as pd

data = {'fruit1': ["organge", "apple", "organge", "organge", "others"],
        'fruit2': ["apple", "others", "organge", "watermelon", "others"]}
df = pd.DataFrame(data)

# Join the two columns, then drop repeated words within each row
df["together"] = df["fruit1"] + ' ' + df["fruit2"]
df["together"] = df["together"].apply(lambda x: ' '.join(pd.unique(x.split())))

    fruit1      fruit2            together
0  organge       apple       organge apple
1    apple      others        apple others
2  organge     organge             organge
3  organge  watermelon  organge watermelon
4   others      others              others

Expected output:

    fruit1      fruit2            together
0  organge       apple       organge apple
1    apple      others               apple
2  organge     organge             organge
3  organge  watermelon  organge watermelon
4   others      others              others

Answer 1

Dan*_*ejo 5

You only want to replace one "others", so simply join the columns and then apply str.replace once:

df["together"] = (df["fruit1"] + " " + df["fruit2"]).str.replace("others", "", n=1).str.strip()
print(df)

    fruit1      fruit2            together
0  organge       apple       organge apple
1    apple      others               apple
2  organge     organge     organge organge
3  organge  watermelon  organge watermelon
4   others      others              others

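The n parameter specifies how many replacements to make. From the documentation:

n : int, default -1 (all)
Number of replacements to make from start.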
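
As a minimal sketch of what n changes (the sample values are made up for illustration, and regex=False is passed only to make the literal match explicit):

```python
import pandas as pd

# Hypothetical sample values, chosen to show the effect of n
s = pd.Series(["others others", "apple others", "others apple others"])

# Default n=-1 replaces every occurrence, so "others others" ends up empty
all_replaced = s.str.replace("others", "", regex=False).str.strip()

# n=1 replaces only the first occurrence, counting from the start
first_only = s.str.replace("others", "", n=1, regex=False).str.strip()

print(all_replaced.tolist())
print(first_only.tolist())
```

With n=1, a row made of two "others" values keeps one of them, which is the behaviour the question asks for.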

Update

To also remove duplicates, use the following regular expression:

df["together"] = df["together"].str.replace(r"\b(\w+)\s+\1\b", r"\1", n=1, regex=True).str.strip()
print(df)

Output

    fruit1      fruit2            together
0  organge       apple       organge apple
1    apple      others               apple
2  organge     organge             organge
3  organge  watermelon  organge watermelon
4   others      others              others

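The table above can be cross-checked with a plain-Python version of the same rule. This is a sketch of an alternative, not the answer's method: the helper name combine and its filler parameter are hypothetical, and each cell is assumed to hold a single value. Comparing whole values rather than substrings also leaves words that merely contain "others" (such as "mothers") untouched.

```python
import pandas as pd

def combine(a: str, b: str, filler: str = "others") -> str:
    """Join two values, dropping duplicates and the filler where possible."""
    # dict.fromkeys keeps the first occurrence of each value, in order
    tokens = list(dict.fromkeys([a, b]))
    # Drop the filler value, but fall back to it if nothing else remains
    kept = [t for t in tokens if t != filler]
    return " ".join(kept or tokens)

data = {'fruit1': ["organge", "apple", "organge", "organge", "others"],
        'fruit2': ["apple", "others", "organge", "watermelon", "others"]}
df = pd.DataFrame(data)

# Apply the rule row by row over the two columns
df["together"] = [combine(a, b) for a, b in zip(df["fruit1"], df["fruit2"])]
print(df)
```

The row-wise loop is slower than the vectorised str.replace calls on large frames, so this is mainly useful for checking results or when the values may contain the filler as a substring.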

See here for an explanation of the regular expression used in the update.
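
To see how the pattern behaves on its own, here is a small sketch using Python's re module. The names REPEAT and collapse_first_repeat and the sample strings are illustrative only; count=1 plays the role of n=1 in Series.str.replace.

```python
import re

# \b(\w+)  captures a whole word
# \s+      then one or more whitespace characters
# \1\b     then the same word again, ending at a word boundary
REPEAT = re.compile(r"\b(\w+)\s+\1\b")

def collapse_first_repeat(text: str) -> str:
    # Replace the first "word word" pair with a single copy of the word
    return REPEAT.sub(r"\1", text, count=1)

for sample in ["organge organge", "organge apple", "an anchor", "apple apple apple"]:
    print(repr(sample), "->", repr(collapse_first_repeat(sample)))
```

The trailing \b is what stops "an anchor" from being treated as a repeat, and because only one replacement is made, a word repeated three times is reduced to two copies rather than one.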

Archived:	4 years, 3 months ago
Views:	737
Last recorded:	4 years, 3 months ago