Chu*_*uck 7 python numpy pandas
I have a dataframe:
df = pd.DataFrame(
    {'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})
df =
   number condition
0      10         A
1      20         B
2      30         A
3      40         B
I want to apply a function to every element of the number column, like this:
df['number'] = df['number'].apply(lambda x: func(x))
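However, even though I am applying the function to the number column, I also want the function to refer to the condition column, i.e. in pseudocode:
func(n):
    # if the value in the corresponding condition column is equal to some set of values:
    #     do some stuff to n using the value in condition
    # return new value for n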
For a single number and an example function, I would write:
number = 10
condition = 'A'
def func(num, condition):
    if condition == 'A':
        return num * 3
    if condition == 'B':
        return num * 4
func(number, condition)  # returns 30
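How can I incorporate the same functionality into the apply statement written above, i.e. refer to the value in the condition column while acting on the value in the number column?
Note: I have read through the documentation for np.where(), pandas.loc() and pandas.index(), but I just can't work out how to put it into practice.
I am struggling with the syntax for referring to another column from within the function, because I need access to the values in both the number and condition columns.
So my expected output is: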
df =
   number condition
0      30         A
1      80         B
2      90         A
3     160         B
Update: the above was too vague. Please see the following:
df1 = pd.DataFrame({'Entries':['man','guy','boy','girl'],'Conflict':['Yes','Yes','Yes','No']})
  Entries Conflict
0   "man"    "Yes"
1   "guy"    "Yes"
2   "boy"    "Yes"
3  "girl"     "No"
def funcA(d):
    d = d + 'aaa'
    return d
def funcB(d):
    d = d + 'bbb'
    return d
df1['Entries'] = np.where(df1['Conflict'] == 'Yes', funcA, funcB)
Output:
{'Conflict': ['Yes', 'Yes', 'Yes', 'No'],
'Entries': array(<function funcB at 0x7f4acbc5a500>, dtype=object)}
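How can I apply the np.where statement above so that it takes the pandas Series mentioned in the comments and produces the desired output shown below:
Desired output: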
     Entries Conflict
0   "manaaa"    "Yes"
1   "guyaaa"    "Yes"
2   "boyaaa"    "Yes"
3  "girlbbb"     "No"
Since the question is about applying a function to dataframe columns of the same row, it seems more accurate to use the pandas apply function combined with a lambda:
import pandas as pd
df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})
def func(number, condition):
    multiplier = {'A': 2, 'B': 4}
    return number * multiplier[condition]
df['new_number'] = df.apply(lambda x: func(x['number'], x['condition']), axis=1)
In this example, apply with axis=1 calls the lambda once per row of df; the lambda reads that row's 'number' and 'condition' values and passes them to func.
This returns the following result:
df
Out[10]:
  condition  number  new_number
0         A      10          20
1         B      20          80
2         A      30          60
3         B      40         160
For the UPDATE case it's also possible to use the pandas apply function:
df1 = pd.DataFrame({'Entries':['man','guy','boy','girl'],'Conflict':['Yes','Yes','Yes','No']})
def funcA(d):
    d = d + 'aaa'
    return d
def funcB(d):
    d = d + 'bbb'
    return d
df1['Entries'] = df1.apply(lambda x: funcA(x['Entries']) if x['Conflict'] == 'Yes' else funcB(x['Entries']), axis=1)
In this example, the lambda receives each row of df1, reads its 'Entries' and 'Conflict' values, and passes 'Entries' to either funcA or funcB. The choice between funcA and funcB is made by the if-else expression inside the lambda.
This returns the following result:
df1
Out[12]:
  Conflict  Entries
0      Yes   manaaa
1      Yes   guyaaa
2      Yes   boyaaa
3       No  girlbbb
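As for the np.where attempt in the question's update: funcA and funcB are passed there as function objects and never called, so np.where can only choose between the functions themselves. A minimal sketch of one way around this, assuming both functions can operate on a whole Series (which holds here, since + on a Series of strings concatenates element by element), is to evaluate both branches first and let np.where pick between the results:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Entries': ['man', 'guy', 'boy', 'girl'],
                    'Conflict': ['Yes', 'Yes', 'Yes', 'No']})

def funcA(d):
    return d + 'aaa'

def funcB(d):
    return d + 'bbb'

# Call each function on the whole column, then choose per row by the condition.
df1['Entries'] = np.where(df1['Conflict'] == 'Yes',
                          funcA(df1['Entries']),
                          funcB(df1['Entries']))
```

Note that both branches are computed for every row, so this only suits functions that are cheap and safe to run on rows where their result is then discarded.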
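I don't know about using pandas.DataFrame.apply, but you can define a specific condition:multiplier key-value mapping (multiplier below) and pass it to your function. You can then use a list comprehension to compute the new number output based on those conditions: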
import pandas as pd
df = pd.DataFrame({'number': [10, 20 , 30, 40], 'condition': ['A', 'B', 'A', 'B']})
multiplier = {'A': 3, 'B': 4}
def func(num, condition, multiplier):
    return num * multiplier[condition]
df['new_number'] = [func(df.loc[idx, 'number'], df.loc[idx, 'condition'],
                         multiplier) for idx in range(len(df))]
The result is:
df
Out[24]:
  condition  number  new_number
0         A      10          30
1         B      20          80
2         A      30          90
3         B      40         160
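The list comprehension above looks each row up with df.loc[idx, ...] for idx in range(len(df)), which relies on the dataframe having the default 0..n-1 index. A small variation, offered as a sketch rather than as part of the original answer, walks the two columns together with zip so that no index lookup is needed. The string index below is made up purely to illustrate the point, and the multipliers 3 and 4 are the ones from the question:

```python
import pandas as pd

df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']},
                  index=['w', 'x', 'y', 'z'])  # hypothetical non-default index
multiplier = {'A': 3, 'B': 4}

def func(num, condition, multiplier):
    return num * multiplier[condition]

# zip pairs the two columns up by position, independent of the index labels.
df['new_number'] = [func(n, c, multiplier)
                    for n, c in zip(df['number'], df['condition'])]
```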
A vectorized, pure-pandas solution would probably be more "ideal", but this works in a pinch too.
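One possible shape for such a vectorized version, assuming the per-condition logic reduces to a dictionary lookup followed by arithmetic (as in the multiplier example), is to translate the condition column with Series.map and multiply the two columns directly. This is a sketch under that assumption, not a general replacement for an arbitrary func:

```python
import pandas as pd

df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})
multiplier = {'A': 3, 'B': 4}

# map() turns each condition into its multiplier; the product is element-wise.
df['new_number'] = df['number'] * df['condition'].map(multiplier)
```

Any condition missing from the dictionary maps to NaN, so the result would contain NaN for those rows instead of raising a KeyError as the function-based versions do.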