Pandas: split a column into 2 using apply

Question

I have a DataFrame with a column ('location') that holds city and state information separated by a comma. Some of the values are None.

I wrote a function to split the data into city and state and clean it up a little:

def split_data(x):
    if x:
        s = x.split(',')
        city = s[0].lstrip().rstrip()
        state = s[1].lstrip().rstrip()
    else:
        city = None
        state = None
    return city, state

I am having trouble figuring out how to create 2 separate columns from this function. If I use the following:

df['location_info'] = df['location'].apply(split_data)

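it creates a single 'location_info' column whose values are tuples.

What is the best way to create 2 new columns in the DataFrame, one called 'city' and the other called 'state'?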

Answer 1

jez*_*ael 6

I think you can use the vectorized string methods str.split and str.strip:

df[['city','state']]=df['location'].str.split(',',expand=True).apply(lambda x: x.str.strip())

Or:

df[['city','state']] = df['location'].str.split(',', expand=True)
df['city'] = df['city'].str.strip()
df['state'] = df['state'].str.strip()

Sample:

import pandas as pd

df = pd.DataFrame({'location':[' a,h ',' t ,u', None]})
print (df)
  location
0     a,h 
1     t ,u
2     None

df[['city','state']]=df['location'].str.split(',',expand=True).apply(lambda x: x.str.strip())
print (df)
  location  city state
0     a,h      a     h
1     t ,u     t     u
2     None  None  None
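
One caveat with the split above: with expand=True, str.split produces one column per piece, so a value containing a second comma would yield three columns and the two-column assignment would fail. A minimal sketch of one way around that, assuming everything after the first comma should count as the state, is to pass n=1. The sample values here are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data: the second value contains two commas.
df = pd.DataFrame({'location': [' Austin , TX ', 'Portland, OR, USA', None]})

# n=1 splits on the first comma only, so at most two columns are produced.
parts = df['location'].str.split(',', n=1, expand=True)

# Strip surrounding whitespace from each resulting column.
df[['city', 'state']] = parts.apply(lambda col: col.str.strip())
print(df)
```

This still assumes at least one row contains a comma. If none do, the split returns a single column and the assignment would need separate handling.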

But if you really need to use your own function (for example, because the logic is more complex), have it return a Series:

def split_data(x):
    if x:
        s = x.split(',')
        city = s[0].strip()
        state = s[1].strip()
    else:
        city = None
        state = None
    return pd.Series([city, state], index=['city','state'])

df[['city','state']] = df['location'].apply(split_data)
print (df)
  location  city state
0     a,h      a     h
1     t ,u     t     u
2     None  None  None
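
If the function has to keep returning a tuple, as in the question, another option is to build the two columns from the list of tuples afterwards. This is only a sketch. The isinstance check is an addition that is not in the original function; it treats NaN as missing in the same way as None. Building a Series for every row is typically slower than this on large frames, though that is worth measuring on your own data.

```python
import pandas as pd

def split_data(x):
    # Mirrors the question's function and returns a (city, state) tuple.
    # The isinstance check (an assumption, not in the original) also
    # treats NaN as missing, not only None.
    if isinstance(x, str) and x:
        s = x.split(',')
        return s[0].strip(), s[1].strip()
    return None, None

df = pd.DataFrame({'location': [' a,h ', ' t ,u', None]})

# apply() yields a Series of tuples; tolist() turns it into a list of pairs,
# which the DataFrame constructor spreads across two columns.
pairs = df['location'].apply(split_data).tolist()
df[['city', 'state']] = pd.DataFrame(pairs, index=df.index, columns=['city', 'state'])
print(df)
```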
