Pandas - conditionally select the source column for a new column based on row values

aen*_*nsm 15 python pandas

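Is there a pandas function that allows selecting from different columns based on a condition? This is analogous to a CASE statement in a SQL SELECT clause. For example, say I have the following DataFrame: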

import pandas as pd

foo = pd.DataFrame(
    [['USA', 1, 2],
     ['Canada', 3, 4],
     ['Canada', 5, 6]],
    columns=('Country', 'x', 'y')
)

I want to select from column 'x' when Country == 'USA' and from column 'y' when Country == 'Canada', producing something like the following:

  Country  x  y  z
0     USA  1  2  1
1  Canada  3  4  4
2  Canada  5  6  6

[3 rows x 4 columns]

fal*_*tru 11

Use the other argument of DataFrame.where (the code below calls the equivalent Series.where) together with pandas.concat:

>>> import pandas as pd
>>>
>>> foo = pd.DataFrame([
...     ['USA',1,2],
...     ['Canada',3,4],
...     ['Canada',5,6]
... ], columns=('Country', 'x', 'y'))
>>>
>>> z = foo['x'].where(foo['Country'] == 'USA', foo['y'])
>>> pd.concat([foo['Country'], z], axis=1)
  Country  x
0     USA  1
1  Canada  4
2  Canada  6

If you want z as the column name, specify keys:

>>> pd.concat([foo['Country'], z], keys=['Country', 'z'], axis=1)
  Country  z
0     USA  1
1  Canada  4
2  Canada  6
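To get the frame the question asks for (all original columns plus z), the same where result can also be assigned straight back as a new column instead of going through concat. A minimal sketch, assuming the same foo frame as above:

```python
import pandas as pd

foo = pd.DataFrame(
    [['USA', 1, 2],
     ['Canada', 3, 4],
     ['Canada', 5, 6]],
    columns=['Country', 'x', 'y'],
)

# Keep x where Country is 'USA'; everywhere else fall back to y.
foo['z'] = foo['x'].where(foo['Country'] == 'USA', foo['y'])
print(foo)
```

Note that this treats every non-USA row as "take y", so it only matches the question exactly when USA and Canada are the only countries present.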


EdC*_*ica 5

This works:

In [84]:

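import numpy as np

def func(row):
    # Pick the source column based on this row's Country.
    if row['Country'] == 'USA':
        return row['x']
    if row['Country'] == 'Canada':
        return row['y']
    return np.nan  # no matching country

# Pass the function itself; with axis=1, apply calls it once per row.
foo['z'] = foo.apply(func, axis=1)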
foo
Out[84]:
  Country  x  y  z
0     USA  1  2  1
1  Canada  3  4  4
2  Canada  5  6  6

[3 rows x 4 columns]

You can also use loc:

In [137]:

foo.loc[foo['Country']=='Canada','z'] = foo['y']
foo.loc[foo['Country']=='USA','z'] = foo['x']
foo
Out[137]:
  Country  x  y  z
0     USA  1  2  1
1  Canada  3  4  4
2  Canada  5  6  6

[3 rows x 4 columns]

Edit

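Although it is unwieldy, using loc will scale better with larger DataFrames, because apply here is called once for every row, whereas boolean indexing is vectorised.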
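Another vectorised option that reads closer to a SQL CASE is numpy.select. It is not part of the answers above; the following is only a sketch under the assumption that the same foo frame is used and that rows matching no condition should get NaN:

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame(
    [['USA', 1, 2],
     ['Canada', 3, 4],
     ['Canada', 5, 6]],
    columns=['Country', 'x', 'y'],
)

# One boolean mask per CASE branch, evaluated in order.
conditions = [foo['Country'] == 'USA', foo['Country'] == 'Canada']
# The column to read from for each branch, in the same order.
choices = [foo['x'], foo['y']]

# Rows that match no condition receive the default (NaN here).
foo['z'] = np.select(conditions, choices, default=np.nan)
print(foo)
```

Because the default is NaN, z comes back as a float column; if every row is guaranteed to match a branch, an integer default would keep the integer dtype.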