I ran the following command in Python pandas:
q1_fisher_r[(q1_fisher_r['TP53']==1) & q1_fisher_r[(q1_fisher_r['TumorST'].str.contains(':1:'))]]
I got the following error:
TypeError: Cannot perform 'rand_' with a dtyped [float64] array and scalar of type [bool]
The solution I tried: error link.
I changed the code accordingly to:
q1_fisher_r[(q1_fisher_r['TumorST'].str.contains(':1:')) & (q1_fisher_r[(q1_fisher_r['TP53']==1)])]
But I still get the same error: TypeError: Cannot perform 'rand_' with a dtyped [float64] array and scalar of type [bool]
jez*_*ael 26
To filter by multiple conditions, chain the conditions with & and pass the result to boolean indexing:
q1_fisher_r[(q1_fisher_r['TP53']==1) & q1_fisher_r['TumorST'].str.contains(':1:')]
            ^^^^                       ^^^^
            first condition            second condition
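The problem is that this code returns the filtered data (a DataFrame rather than a boolean mask), so it cannot be chained as a condition: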
q1_fisher_r[(q1_fisher_r['TumorST'].str.contains(':1:'))]
The same problem applies to:
q1_fisher_r[(q1_fisher_r['TP53']==1)]
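To make the distinction concrete, here is a minimal sketch that reuses the sample data from this answer. The variable names `mask` and `filtered` are illustrative only. It shows that a comparison produces a boolean Series, while indexing with that Series produces a filtered DataFrame, which is not a valid operand for & in a filter:

```python
import pandas as pd

q1_fisher_r = pd.DataFrame({'TP53': [1, 1, 2, 1],
                            'TumorST': ['5:1:', '9:1:', '5:1:', '6:1']})

# A comparison yields a boolean Series (a mask) with one value per row.
mask = q1_fisher_r['TP53'] == 1

# Indexing with the mask yields a filtered DataFrame, not a mask.
filtered = q1_fisher_r[mask]

print(type(mask).__name__, mask.dtype)          # Series bool
print(type(filtered).__name__, filtered.shape)  # DataFrame (3, 2)
```

Only the mask belongs inside the & expression; the filtered frame is what you get after the final indexing step.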
Sample:
import pandas as pd

q1_fisher_r = pd.DataFrame({'TP53':[1,1,2,1], 'TumorST':['5:1:','9:1:','5:1:','6:1']})
print(q1_fisher_r)
   TP53 TumorST
0     1    5:1:
1     1    9:1:
2     2    5:1:
3     1     6:1
df = q1_fisher_r[(q1_fisher_r['TP53']==1) & q1_fisher_r['TumorST'].str.contains(':1:')]
print(df)
   TP53 TumorST
0     1    5:1:
1     1    9:1:
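One way to avoid the nesting mistake is to give each mask its own name before combining them. The following is only a sketch on the same sample data. The names `is_tp53` and `has_marker` are hypothetical, and `regex=False` is an optional choice here so that ':1:' is matched as a literal substring:

```python
import pandas as pd

q1_fisher_r = pd.DataFrame({'TP53': [1, 1, 2, 1],
                            'TumorST': ['5:1:', '9:1:', '5:1:', '6:1']})

# Build each condition as a separate boolean Series.
is_tp53 = q1_fisher_r['TP53'] == 1
has_marker = q1_fisher_r['TumorST'].str.contains(':1:', regex=False)

# Combine the masks with & and index the DataFrame once.
df = q1_fisher_r[is_tp53 & has_marker]
print(df.index.tolist())  # [0, 1]
```

Because each mask is built separately, there is no place to accidentally wrap a condition in a second q1_fisher_r[...].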
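The setup below had a similar problem and produced the same error message. The very simple fix for me was to put each individual condition in parentheses, because & binds more tightly than == in Python. I should have known this, but I want to highlight it in case someone else runs into the same problem.
Incorrect code: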
conditions = [
(df['A'] == '15min' & df['B'].dt.minute == 15), # Note brackets only surrounding both conditions together, not each individual condition
df['A'] == '30min' & df['B'].dt.minute == 30, # Note no brackets at all
]
output = [
df['Time'] + dt.timedelta(minutes = 45),
df['Time'] + dt.timedelta(minutes = 30),
]
df['TimeAdjusted'] = np.select(conditions, output, default = np.datetime64('NaT'))
Correct code:
import datetime as dt
import numpy as np

# Assumes df is an existing DataFrame with a string column 'A' and datetime columns 'B' and 'Time'
conditions = [
    (df['A'] == '15min') & (df['B'].dt.minute == 15), # Note brackets surrounding each condition
    (df['A'] == '30min') & (df['B'].dt.minute == 30), # Note brackets surrounding each condition
]
output = [
    df['Time'] + dt.timedelta(minutes = 45),
    df['Time'] + dt.timedelta(minutes = 30),
]
df['TimeAdjusted'] = np.select(conditions, output, default = np.datetime64('NaT'))
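The reason the parentheses matter can be sketched as follows. The small DataFrame is made up for illustration. Without parentheses, Python evaluates '15min' & df['B'].dt.minute first, because & has higher precedence than ==, and that is the step that fails:

```python
import pandas as pd

# Hypothetical data with the same column names as the example above.
df = pd.DataFrame({
    'A': ['15min', '30min', '15min'],
    'B': pd.to_datetime(['2021-01-01 10:15',
                         '2021-01-01 10:30',
                         '2021-01-01 10:45']),
})

# Unparenthesized: parsed as df['A'] == ('15min' & df['B'].dt.minute) == 15
raised = False
try:
    bad = df['A'] == '15min' & df['B'].dt.minute == 15
except (TypeError, ValueError) as exc:
    raised = True
    print(type(exc).__name__)

# Parenthesized: each comparison is evaluated first, then combined with &.
good = (df['A'] == '15min') & (df['B'].dt.minute == 15)
print(good.tolist())  # [True, False, False]
```

The exact exception text may vary between pandas versions, so the sketch only checks that an exception is raised.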