How do I create a new column in a DataFrame from an existing column based on a condition?

Question


Tsa*_*tsa 16 python series dataframe pandas

I have a column whose data all looks like this (the values that need to be separated out carry a marker such as (c)):

```
UK (c)
London
Wales
Liverpool
US (c)
Chicago
New York
San Francisco
Seattle
Australia (c)
Sydney
Perth
```


I want to split it into two columns, like this:

```
London          UK
Wales           UK
Liverpool       UK
Chicago         US
New York        US
San Francisco   US
Seattle         US
Sydney          Australia
Perth           Australia
```


Question 2: What if the countries don't have a marker like (c)?
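To try the answers, the sample column from the question can be built as a one-column DataFrame. This is a minimal sketch: the column name `city` is an assumption (the answers below variously call it `city`, `data` or `places`), so rename the column to match whichever answer you run.

```python
import pandas as pd

# Sample values copied from the question; the column name 'city' is an assumption
values = [
    'UK (c)', 'London', 'Wales', 'Liverpool',
    'US (c)', 'Chicago', 'New York', 'San Francisco', 'Seattle',
    'Australia (c)', 'Sydney', 'Perth',
]
df = pd.DataFrame({'city': values})

# Answers that use a different column name need a renamed copy, e.g. 'data'
df_data = df.rename(columns={'city': 'data'})
```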

Answer 1

WeN*_*Ben 10

Step by step, using `endswith`, `ffill` and `str.strip`:

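```python
import pandas as pd

df['country'] = df.loc[df.city.str.endswith('(c)'), 'city']  # NaN on the city rows
df['country'] = df['country'].ffill()
df = df[df.city.ne(df.country)].copy()  # drop the country rows themselves
# remove the literal "(c)" suffix; str.strip('(c)') alone would leave "UK "
df['country'] = df['country'].str.replace('(c)', '', regex=False).str.strip()
```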
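`str.strip('(c)')` treats its argument as a *set of characters* to trim from both ends, not as a suffix, so the space in front of `(c)` survives. A small check, using sample values from the question, comparing it with removing the literal suffix:

```python
import pandas as pd

s = pd.Series(['UK (c)', 'US (c)', 'Australia (c)'])

# strip() removes any of '(', 'c', ')' from both ends and stops at the space
stripped = s.str.strip('(c)')

# Removing the literal "(c)" and then trimming whitespace gives clean names
cleaned = s.str.replace('(c)', '', regex=False).str.strip()

print(stripped.tolist())  # ['UK ', 'US ', 'Australia ']
print(cleaned.tolist())   # ['UK', 'US', 'Australia']
```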


Answer 2

cs9*_*s95 7

`extract` and `ffill`

Start with `extract` and `ffill`, then drop the extra rows.

```python
import pandas as pd

df['country'] = (
    df['data'].str.extract(r'(.*)\s+\(c\)', expand=False).ffill())
df[~df['data'].str.contains('(c)', regex=False)].reset_index(drop=True)
```

```
            data    country
0         London         UK
1          Wales         UK
2      Liverpool         UK
3        Chicago         US
4       New York         US
5  San Francisco         US
6        Seattle         US
7         Sydney  Australia
8          Perth  Australia
```


Where

```python
df['data'].str.extract(r'(.*)\s+\(c\)', expand=False).ffill()
```

```
0            UK
1            UK
2            UK
3            UK
4            US
5            US
6            US
7            US
8            US
9     Australia
10    Australia
11    Australia
Name: country, dtype: object
```


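The pattern `'(.*)\s+\(c\)'` matches strings of the form "country (c)" and extracts the country name. Everything that does not match the pattern becomes NaN, so you can conveniently forward-fill the rows.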
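To see what the pattern captures, it can be checked on a few values from the question with Python's standard `re` module. This is a sketch: the helper `country_of` is hypothetical, and it uses `re.search` on the assumption that this mirrors how `str.extract` takes the first match in each string.

```python
import re

# The pattern from the answer; search() looks for the first match in each string
PATTERN = re.compile(r'(.*)\s+\(c\)')


def country_of(value):
    """Return the captured country name, or None when there is no "(c)" marker."""
    m = PATTERN.search(value)
    return m.group(1) if m else None


for v in ['UK (c)', 'Australia (c)', 'London', 'San Francisco']:
    print(repr(v), '->', country_of(v))
```

The `None` results play the role of the NaN values that `ffill` later overwrites.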

`split` with `np.where` and `ffill`

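This splits on "(c)": rows that contain the marker split into two pieces (`u.str.len() == 2`), and only those keep a country name; everything else stays as a single piece.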

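```python
import numpy as np
import pandas as pd

u = df['data'].str.split(r'\s+\(c\)')
# index=df.index keeps the new Series aligned with df even if df lacks a default RangeIndex
df['country'] = pd.Series(np.where(u.str.len() == 2, u.str[0], np.nan), index=df.index).ffill()

df[~df['data'].str.contains('(c)', regex=False)].reset_index(drop=True)
```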

```
            data    country
0         London         UK
1          Wales         UK
2      Liverpool         UK
3        Chicago         US
4       New York         US
5  San Francisco         US
6        Seattle         US
7         Sydney  Australia
8          Perth  Australia
```
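The length test works because splitting on the marker yields two pieces only when the marker is present. A quick illustration with the standard library's `re.split`, assumed here to behave like pandas' regex split for this pattern; the helper `split_country` is hypothetical and the sample values come from the question:

```python
import re

MARKER = r'\s+\(c\)'  # same pattern passed to str.split in the answer


def split_country(value):
    """Country name if the value carries the "(c)" marker, else None."""
    parts = re.split(MARKER, value)
    # 'UK (c)' -> ['UK', ''] (two pieces); 'Chicago' -> ['Chicago'] (one piece)
    return parts[0] if len(parts) == 2 else None


for v in ['UK (c)', 'Chicago', 'New York']:
    print(repr(v), re.split(MARKER, v), split_country(v))
```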


Answer 3

yat*_*atu 6

You can first use `str.extract` to find the entries ending in `(c)` and extract the country name, then `ffill` to populate the new `country` column.

The same extracted matches can be used to locate the rows to drop, i.e. those where the match is `notna`:

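```python
# \s* keeps the space before "(c)" out of the captured name; expand=False returns a Series
m = df.city.str.extract(r'^(.*?)\s*(?=\(c\)$)', expand=False)
ix = m[m.notna()].index  # the country rows
df['country'] = m.ffill()
df.drop(ix)
```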

```
            city     country
1          London         UK
2           Wales         UK
3       Liverpool         UK
5         Chicago         US
6        New York         US
7   San Francisco         US
8         Seattle         US
10         Sydney  Australia
11          Perth  Australia
```
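Because `.*?` is lazy and the lookahead only asserts that `(c)` follows, the group in `'^(.*?)(?=\(c\)$)'` stops right before `(c)` and therefore includes the preceding space. A check with Python's `re` module; the `\s*` variant is a suggested tweak, not part of the original answer, and the sample values come from the question:

```python
import re

original = re.compile(r'^(.*?)(?=\(c\)$)')
trimmed = re.compile(r'^(.*?)\s*(?=\(c\)$)')  # \s* consumes the space outside the group

for v in ['UK (c)', 'Australia (c)', 'Sydney']:
    m1, m2 = original.search(v), trimmed.search(v)
    # None for values without the marker; otherwise the captured name
    print(repr(v), m1 and repr(m1.group(1)), m2 and repr(m2.group(1)))
```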


Answer 4

Moh*_*ani 5

You can also use `np.where` with `str.contains`:

```python
import numpy as np

mask = df['places'].str.contains('(c)', regex=False)
df['country'] = np.where(mask, df['places'], np.nan)
# regex=True must be explicit (pandas 2.0 changed the default to False); \s* also drops the space
df['country'] = df['country'].str.replace(r'\s*\(c\)', '', regex=True).ffill()
df = df[~mask]
df
```

```
            places     country
1          London         UK
2           Wales         UK
3       Liverpool         UK
5         Chicago         US
6        New York         US
7   San Francisco         US
8         Seattle         US
10         Sydney  Australia
11          Perth  Australia
```


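`str.contains` looks for `(c)` and returns True for each row where it is present. Where this condition is True, `np.where` copies the value into the `country` column (NaN elsewhere); the `(c)` suffix is then removed and the country names are forward-filled.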
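The same mask-then-fill idea on a short Series, as a minimal sketch (sample values from the question; the variable names `places`, `country` and `result` are illustrative). `regex=True` is passed explicitly to `str.replace` because its default changed to `False` in pandas 2.0:

```python
import numpy as np
import pandas as pd

places = pd.Series(['UK (c)', 'London', 'Wales', 'US (c)', 'Chicago'])

mask = places.str.contains('(c)', regex=False)  # True on the country rows
# Keep the value on country rows, NaN elsewhere, aligned to the original index
country = pd.Series(np.where(mask, places, np.nan), index=places.index)
# Strip the " (c)" suffix, then carry each country down to its cities
country = country.str.replace(r'\s*\(c\)$', '', regex=True).ffill()

result = pd.DataFrame({'places': places, 'country': country})[~mask]
print(result)
```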

Archived:	6 years, 8 months ago
Views:	309
Last recorded:	6 years, 8 months ago
