Remove words/characters from strings in DataFrame cells?

Question

I have a DataFrame with a column of street intersections:

|          Locations           |
--------------------------------
|W Madison Ave & S Randall Blvd|
|N Clemson St & E Tower Ave    |
|E Thompson St & S Garfield Ln |

I want to remove the direction characters (N, S, E, W) as well as the street suffixes (Blvd, St, Ave, etc.) so that my output looks like this:

|     Locations     |
---------------------
|Madison & Randall  |
|Clemson & Tower    |
|Thompson & Garfield|

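I can't do this with str.replace() because it would also remove characters from the words I need to keep. I tried lstrip() and rstrip(), but they don't handle the characters I want to remove from the middle of the string.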

I have also tried Series.apply():

banned = ['N', 'S', 'E', 'W', 'Ave', 'Blvd', 'St', 'Ln']
df["Locations"].apply(lambda x: [item for item in x if item not in banned])

But this essentially does the same thing as str.replace() and puts everything into a list inside the DataFrame.

Answer 1

jez*_*ael 7

You're close: split each value into words first, then join what remains:

f = lambda x: ' '.join([item for item in x.split() if item not in banned])
df["Locations"] = df["Locations"].apply(f)
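
For reference, here is a minimal self-contained sketch of the same split-and-join idea, run against the sample data from the question. The helper name drop_banned and the use of a set for banned are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Locations": [
    "W Madison Ave & S Randall Blvd",
    "N Clemson St & E Tower Ave",
    "E Thompson St & S Garfield Ln",
]})

# A set gives fast membership tests; the contents match the question's list.
banned = {"N", "S", "E", "W", "Ave", "Blvd", "St", "Ln"}

def drop_banned(text):
    """Split on whitespace, drop whole words found in banned, rejoin."""
    return " ".join(word for word in text.split() if word not in banned)

cleaned = df["Locations"].apply(drop_banned)
print(cleaned.tolist())
# ['Madison & Randall', 'Clemson & Tower', 'Thompson & Garfield']
```

Because the comparison is made on whole words after split(), the "S" inside "Madison" is never touched, which is what the character-by-character attempt in the question got wrong.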

Or use a list comprehension:

df["Locations"] = [' '.join([item for item in x.split() 
                  if item not in banned]) 
                  for x in df["Locations"]]


print (df)
             Locations
0    Madison & Randall
1      Clemson & Tower
2  Thompson & Garfield
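
If a str.replace-style solution is preferred, one possible alternative (not from the answer above, just a sketch) is a regular expression with \b word boundaries, so that only whole words are removed. The pattern construction and the whitespace cleanup steps below are assumptions about how one might do this; the sample frame is rebuilt here so the snippet runs on its own.

```python
import re
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Locations": [
    "W Madison Ave & S Randall Blvd",
    "N Clemson St & E Tower Ave",
    "E Thompson St & S Garfield Ln",
]})

banned = ["N", "S", "E", "W", "Ave", "Blvd", "St", "Ln"]

# \b anchors each alternative to word boundaries, so "S" matches only as a
# standalone word and never inside "Madison" or "Thompson".
pattern = r"\b(?:" + "|".join(re.escape(word) for word in banned) + r")\b"

regex_cleaned = (
    df["Locations"]
    .str.replace(pattern, "", regex=True)   # remove whole banned words
    .str.replace(r"\s+", " ", regex=True)   # collapse leftover runs of spaces
    .str.strip()                            # trim leading/trailing spaces
)
print(regex_cleaned.tolist())
# ['Madison & Randall', 'Clemson & Tower', 'Thompson & Garfield']
```

Unlike the split-based version, this one would also match a banned word that is followed by punctuation (for example "St,"), which may or may not be desirable depending on the data.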
