Dom*_*m B 7 python numpy dataframe pandas
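I have a pandas DataFrame whose values I read in from a CSV file. It has a column labeled 'SleepQuality' with float values ranging from 0.0 to 100.0. I want to create a new column labeled 'SleepQualityGroup', where values between 0 and 49 in the original column get the value 0 in the new column, 50 - 59 = 1, 60 - 69 = 2, 70 - 79 = 3, 80 - 89 = 4, and 90 - 100 = 5.
What is the best recipe for doing this? I am stuck on the logic needed to identify all the values in each range and assign them to the new value.
Below is an example of the desired result in the new 'SleepQualityGroup' column.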
SleepQuality SleepQualityGroup
80.4 4
90.1 5
66.4 2
50.3 1
86.2 4
75.4 3
45.7 0
91.5 5
61.3 2
54 1
58.2 1
Flo*_*oor 12
Use pd.cut, i.e.
df['new'] = pd.cut(df['SleepQuality'], bins=[0, 50, 60, 70, 80, 90, 100], labels=[0, 1, 2, 3, 4, 5])
Output:
SleepQuality SleepQualityGroup new
0 80.4 4 4
1 90.1 5 5
2 66.4 2 2
3 50.3 1 1
4 86.2 4 4
5 75.4 3 3
6 45.7 0 0
7 91.5 5 5
8 61.3 2 2
9 54.0 1 1
10 58.2 1 1
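One caveat worth checking against your own data: with its default right=True, pd.cut treats every bin as open on the left and closed on the right, so a value of exactly 50.0 falls in group 0 and a value of exactly 0.0 becomes NaN. None of the sample values sits on an edge, so the output shown here is unaffected. Below is a minimal sketch of a left-closed variant. The edge-case values and the open-ended last bin (np.inf) are assumptions made for illustration and are not part of the original question or answer.

```python
import numpy as np
import pandas as pd

# Edge-case values chosen for illustration; not taken from the original data
s = pd.Series([0.0, 49.9, 50.0, 59.9, 90.0, 100.0])

# Default right=True: bins are (0, 50], (50, 60], ..., (90, 100]
default_groups = pd.cut(s, bins=[0, 50, 60, 70, 80, 90, 100],
                        labels=[0, 1, 2, 3, 4, 5])

# right=False: bins are [0, 50), [50, 60), ..., [90, inf)
left_closed = pd.cut(s, bins=[0, 50, 60, 70, 80, 90, np.inf],
                     labels=[0, 1, 2, 3, 4, 5], right=False)

print(default_groups.tolist())           # 0.0 is NaN, 50.0 is in group 0
print(left_closed.astype(int).tolist())  # 0.0 is in group 0, 50.0 in group 1
```

pd.cut returns a categorical column, which is why the sketch calls astype(int) when plain integer labels are wanted. That conversion only works when no value fell outside the bins.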
This is basically a binning operation, so two tools can be used here: searchsorted and np.digitize. Using searchsorted:
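import numpy as np
bins = np.arange(50, 100, 10)  # bin edges: 50, 60, 70, 80, 90
# side='right' puts a value equal to an edge (e.g. 50.0) in the higher group
df['SleepQualityGroup'] = bins.searchsorted(df.SleepQuality, side='right')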
Using np.digitize:
df['SleepQualityGroup'] = np.digitize(df.SleepQuality, bins)
Sample output:
In [866]: df
Out[866]:
SleepQuality SleepQualityGroup
0 80.4 4
1 90.1 5
2 66.4 2
3 50.3 1
4 86.2 4
5 75.4 3
6 45.7 0
7 91.5 5
8 61.3 2
9 54.0 1
10 58.2 1
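Both NumPy calls give the same groups for the sample data, but they can differ for values that sit exactly on a bin edge. searchsorted with its default side='left' places such a value in the lower group, while np.digitize with its default right=False places it in the higher group. A small sketch of the difference, using illustrative edge values that are not part of the original data:

```python
import numpy as np

bins = np.arange(50, 100, 10)  # array([50, 60, 70, 80, 90])
edges = np.array([49.9, 50.0, 60.0, 90.0, 100.0])  # illustrative values

left = bins.searchsorted(edges)                 # default side='left'
right = bins.searchsorted(edges, side='right')  # edge goes to the higher group
digitized = np.digitize(edges, bins)            # default right=False

print(left)       # [0 0 1 4 5]
print(right)      # [0 1 2 5 5]
print(digitized)  # [0 1 2 5 5]
```

For the grouping asked for in the question (50 - 59 = 1), the side='right' and np.digitize results are the ones that put exactly 50.0 in group 1.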
Runtime test:
In [921]: df
Out[921]:
SleepQuality SleepQualityGroup
0 80.4 4
1 90.1 5
2 66.4 2
3 50.3 1
4 86.2 4
5 75.4 3
6 45.7 0
7 91.5 5
8 61.3 2
9 54.0 1
10 58.2 1
In [922]: df = pd.concat([df]*10000,axis=0)
# @Dark's soln using pd.cut
In [923]: %timeit df['new'] = pd.cut(df['SleepQuality'],bins=[0,50 , 60, 70 , 80 , 90,100], labels=[0,1,2,3,4,5])
1000 loops, best of 3: 1.04 ms per loop
In [926]: %timeit df['SleepQualityGroup'] = bins.searchsorted(df.SleepQuality)
1000 loops, best of 3: 591 µs per loop
In [927]: %timeit df['SleepQualityGroup'] = np.digitize(df.SleepQuality, bins)
1000 loops, best of 3: 538 µs per loop
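To rerun the comparison outside IPython, here is a rough sketch using the standard timeit module. The function names (with_cut, with_searchsorted, with_digitize) are hypothetical helpers introduced for this example, and the absolute numbers will vary with hardware and with pandas/NumPy versions, so treat the figures above as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

# Rebuild the sample column and repeat it 10000 times, as in the session above
sample = [80.4, 90.1, 66.4, 50.3, 86.2, 75.4, 45.7, 91.5, 61.3, 54.0, 58.2]
df = pd.DataFrame({'SleepQuality': sample * 10000})
bins = np.arange(50, 100, 10)


def with_cut():
    return pd.cut(df['SleepQuality'], bins=[0, 50, 60, 70, 80, 90, 100],
                  labels=[0, 1, 2, 3, 4, 5])


def with_searchsorted():
    return bins.searchsorted(df['SleepQuality'])


def with_digitize():
    return np.digitize(df['SleepQuality'], bins)


for fn in (with_cut, with_searchsorted, with_digitize):
    # Best of 3 repeats of 100 calls each, reported per call
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f"{fn.__name__}: {best * 1e6:.0f} µs per call")
```

The helpers only compute the groups and do not assign them back to the DataFrame, so this measures the binning step alone.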