ana*_*ine 3 python dataframe python-3.x pandas
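I'm currently using Pandas and NumPy. I have a dataframe called "df". Given the data below, how can I assign values to a third column based on a "between" condition? If possible, I'd like a vectorized approach so that I keep the speed I already have.
I tried a lambda function, but frankly I don't understand what I'm doing, and I get errors such as "object has no attribute 'between'".
General approach (non-vectorized):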
NOTE: I am looking for a way to make this vectorised.
If df['Col2'] is between 0 and 10:
    df['Col3'] = 1
Elif df['Col2'] is between 10.01 and 20:
    df['Col3'] = 2
Elif df['Col2'] is between 20.01 and 30:
    df['Col3'] = 3
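A minimal vectorized sketch of the pseudocode above, assuming the column names `Col2`/`Col3` and the values from the sample set. `Series.between` is called on the whole column (calling it on a single scalar is what produces the "has no attribute 'between'" error), and `numpy.select` picks the label of the first condition that is true for each row. Because `between` is inclusive on both ends by default, the second and third ranges use explicit comparisons so that 10 and 20 each fall into exactly one bin.

```python
import numpy as np
import pandas as pd

# Hypothetical frame built from the question's sample values
df = pd.DataFrame({
    "Col1": list("abcdefghij"),
    "Col2": [5, 10, 15, 20, 25, 30, 1, 11, 21, 7],
})

col = df["Col2"]
# Each condition is a boolean Series evaluated over the whole column at once
conditions = [
    col.between(0, 10),          # 0 <= x <= 10
    (col > 10) & (col <= 20),    # 10 < x <= 20
    (col > 20) & (col <= 30),    # 20 < x <= 30
]
choices = [1, 2, 3]

# np.select uses the first True condition per row; rows matching none get 0
df["Col3"] = np.select(conditions, choices, default=0)
```

The `default=0` value is an arbitrary placeholder for values outside 0..30; pick whatever marker suits your data.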
Sample set:
+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| a | 5 | 1 |
| b | 10 | 1 |
| c | 15 | 2 |
| d | 20 | 2 |
| e | 25 | 3 |
| f | 30 | 3 |
| g | 1 | 1 |
| h | 11 | 2 |
| i | 21 | 3 |
| j | 7 | 1 |
+------+------+------+
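Another vectorized option (a swapped-in technique, not the asker's) is `numpy.searchsorted` over the interior bin edges. The sketch below rebuilds the sample set and checks that the computed labels match its `Col3` column. It assumes every `Col2` value lies in [0, 30]; values outside that range are not flagged.

```python
import numpy as np
import pandas as pd

# Sample set from the table above (Col3 holds the expected labels)
df = pd.DataFrame({
    "Col1": list("abcdefghij"),
    "Col2": [5, 10, 15, 20, 25, 30, 1, 11, 21, 7],
    "Col3": [1, 1, 2, 2, 3, 3, 1, 2, 3, 1],
})

# Interior bin edges; assumes every Col2 value lies in [0, 30]
edges = np.array([10, 20])

# side="left" returns i such that edges[i-1] < x <= edges[i] (right-closed bins),
# so 10 maps to 0 and 10.5 to 1; adding 1 turns indices 0..2 into labels 1..3
computed = np.searchsorted(edges, df["Col2"].to_numpy(), side="left") + 1

print((computed == df["Col3"].to_numpy()).all())  # True for this sample
```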
Many thanks.
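import numpy as np
import pandas as pd

def cust_func(row):
    # Row-wise (non-vectorized) classification of Col2 into labels 1..3
    r = row['Col2']
    if 0 <= r <= 10:
        val = 1
    elif 10 < r <= 20:
        val = 2
    elif 20 < r <= 30:
        val = 3
    else:
        val = np.nan  # outside 0..30: no label
    return val

# apply with axis=1 calls cust_func once per row, which is slow on large frames
df['Col3'] = df.apply(cust_func, axis=1)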
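The "object has no attribute 'between'" error from the question most likely comes from calling `.between` inside a lambda: `Series.apply` and `Series.map` pass each element as a plain scalar, and scalars have no `between` method. A minimal sketch, assuming a hypothetical helper named `label`, that reproduces the error and then maps a scalar function over the single column. This avoids building a row Series per call, but it is still a Python-level loop, not a vectorized operation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col2": [5, 10, 15, 20, 25, 30, 1, 11, 21, 7]})

# Reproduces the question's error: each x is a scalar, not a Series
try:
    df["Col2"].apply(lambda x: x.between(0, 10))
except AttributeError as err:
    print("AttributeError:", err)

def label(x):
    # x is a scalar here, so use ordinary chained comparisons
    if 0 <= x <= 10:
        return 1
    if 10 < x <= 20:
        return 2
    if 20 < x <= 30:
        return 3
    return np.nan  # outside 0..30

# Series.map calls label once per value of the column
df["Col3"] = df["Col2"].map(label)
```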
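import pandas as pd

cut_labels = [1, 2, 3]
cut_bins = [0, 10, 20, 30]
# Bins are right-closed by default: (0, 10], (10, 20], (20, 30];
# include_lowest=True also puts a value of exactly 0 into the first bin
df['Col3'] = pd.cut(df['Col2'], bins=cut_bins, labels=cut_labels, include_lowest=True)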
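Two details of `pd.cut` are easy to miss: with the default `right=True` the first bin excludes its left edge, so a value of exactly 0 becomes NaN unless `include_lowest=True` is passed, and the result has a categorical dtype rather than integers. A small sketch, assuming sample values plus an added 0 to show the edge case:

```python
import pandas as pd

df = pd.DataFrame({"Col2": [0, 5, 10, 15, 20, 25, 30, 1, 11, 21, 7]})

cut_bins = [0, 10, 20, 30]
cut_labels = [1, 2, 3]

# Without include_lowest the first bin is (0, 10], so exactly 0 -> NaN
without = pd.cut(df["Col2"], bins=cut_bins, labels=cut_labels)
# include_lowest=True widens the first bin to catch 0
with_lowest = pd.cut(df["Col2"], bins=cut_bins, labels=cut_labels,
                     include_lowest=True)

print(without.isna().sum())   # 1 (the 0 row)
print(with_lowest.dtype)      # category

# Convert the categorical labels to plain integers if arithmetic is needed
df["Col3"] = with_lowest.astype(int)
```

The integer conversion only works when every value falls into some bin; with NaNs present, keep the categorical column or use a nullable integer dtype instead.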