How to use pd.cut() in pandas

Che*_*eng 19 python pandas

Here is the snippet:

import pandas as pd

test = pd.DataFrame({'days': [0,31,45]})
test['range'] = pd.cut(test.days, [0,30,60])

Output:

    days    range
0   0       NaN
1   31      (30, 60]
2   45      (30, 60]

I am surprised that 0 is not in (0, 30]. What should I do to have 0 categorized into the first bin?

jez*_*ael 30

test['range'] = pd.cut(test.days, [0,30,60], include_lowest=True)
print (test)
   days           range
0     0  (-0.001, 30.0]
1    31    (30.0, 60.0]
2    45    (30.0, 60.0]
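The NaN in the question comes from the bins being right-closed by default, so the left edge of the first bin is excluded. A minimal sketch of that behaviour on a single interval, using `pd.Interval` purely for illustration (the variable names are made up for this example):

```python
import pandas as pd

# By default pd.cut builds right-closed bins: (0, 30] and (30, 60].
first_bin = pd.Interval(0, 30, closed="right")
print(0 in first_bin)    # False: the left edge is excluded
print(30 in first_bin)   # True: the right edge is included

days = pd.Series([0, 31, 45])

# Default call: 0 falls outside every bin and becomes NaN.
default_cut = pd.cut(days, [0, 30, 60])
print(default_cut.isna().tolist())   # [True, False, False]

# include_lowest=True makes the first bin cover its left edge as well.
fixed_cut = pd.cut(days, [0, 30, 60], include_lowest=True)
print(fixed_cut.isna().tolist())     # [False, False, False]
```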

See the difference:

test = pd.DataFrame({'days': [0,20,30,31,45,60]})

test['range1'] = pd.cut(test.days, [0,30,60], include_lowest=True)
#30 value is in [30, 60) group
test['range2'] = pd.cut(test.days, [0,30,60], right=False)
#30 value is in (0, 30] group
test['range3'] = pd.cut(test.days, [0,30,60])
print (test)
   days          range1    range2    range3
0     0  (-0.001, 30.0]   [0, 30)       NaN
1    20  (-0.001, 30.0]   [0, 30)   (0, 30]
2    30  (-0.001, 30.0]  [30, 60)   (0, 30]
3    31    (30.0, 60.0]  [30, 60)  (30, 60]
4    45    (30.0, 60.0]  [30, 60)  (30, 60]
5    60    (30.0, 60.0]       NaN  (30, 60]

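Or use numpy.searchsorted, which returns integer bin positions instead of interval labels. Note that the array of bin edges has to be sorted: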

import numpy as np

arr = np.array([0,30,60])
test['range1'] = arr.searchsorted(test.days)
test['range2'] = arr.searchsorted(test.days, side='right') - 1
print (test)
   days  range1  range2
0     0       0       0
1    20       1       0
2    30       1       1
3    31       2       1
4    45       2       1
5    60       2       2
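To relate the two approaches, here is a small sketch under the assumption that every value lies within the outer bin edges: subtracting 1 from the default (left-sided) `searchsorted` result gives the same integer codes as the default right-closed `pd.cut`, with -1 standing in for NaN. Values above the last edge would break the match, so this is an illustration rather than a drop-in replacement.

```python
import numpy as np
import pandas as pd

days = pd.Series([0, 20, 30, 31, 45, 60])
edges = np.array([0, 30, 60])  # must be sorted for searchsorted

# Position of each value among the edges, shifted so the first bin is 0.
codes_searchsorted = edges.searchsorted(days.to_numpy()) - 1

# Integer codes of the default right-closed bins; NaN is encoded as -1.
codes_cut = pd.cut(days, edges).cat.codes.to_numpy()

print(codes_searchsorted.tolist())
print(codes_cut.tolist())
```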

  • @pyd, hmm, try `bins = [0, 30, 60, np.inf]` and `labels = ['0-30', '30-60', '60+']` with `pd.cut(df['col'], bins=bins, labels=labels)` (4 upvotes)
  • How do I find the range for the highest value? I only see "include_lowest", not "highest". (2 upvotes)
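The suggestion in the first comment can be sketched as follows. The DataFrame `df` and its column `col` are hypothetical stand-ins taken from the comment, and `include_lowest=True` is added here so that 0 is kept. On the second comment: with the default `right=True` the top edge of the last bin is already included, and an open-ended last bin such as `(60, inf]` catches anything larger.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"col": [0, 20, 30, 31, 60, 75]})

bins = [0, 30, 60, np.inf]          # the last bin is open-ended
labels = ["0-30", "30-60", "60+"]   # one label per bin

# include_lowest=True keeps the value 0 inside the first bin.
df["range"] = pd.cut(df["col"], bins=bins, labels=labels, include_lowest=True)
print(df)
```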

piR*_*red 15

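The pd.cut documentation
describes the parameter right=False, which makes the bins closed on the left and open on the right: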

test = pd.DataFrame({'days': [0,31,45]})
test['range'] = pd.cut(test.days, [0,30,60], right=False)

test

   days     range
0     0   [0, 30)
1    31  [30, 60)
2    45  [30, 60)


Anonymous 10

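You can also use labels with pd.cut(). The following example uses student grades in the range 0-10. We add a new column named "grade_cat" to categorize the grades.

bins gives the interval edges: (0, 4] is one interval, (4, 6] is the next, and so on. The corresponding labels are "poor", "normal", etc. With the default settings a grade of exactly 0 falls outside the first interval unless include_lowest=True is passed.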

bins = [0, 4, 6, 10]
labels = ["poor","normal","excellent"]
student['grade_cat'] = pd.cut(student['grade'], bins=bins, labels=labels)
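The snippet above assumes a DataFrame named `student` already exists. A self-contained sketch with made-up grades could look like this; the data is hypothetical, and `include_lowest=True` is an addition to the answer so that a grade of 0 is labelled "poor" instead of becoming NaN.

```python
import pandas as pd

# Hypothetical grades on a 0-10 scale.
student = pd.DataFrame({"grade": [0, 3, 4, 5, 6, 9, 10]})

bins = [0, 4, 6, 10]                       # edges of (0, 4], (4, 6], (6, 10]
labels = ["poor", "normal", "excellent"]   # one label per interval

student["grade_cat"] = pd.cut(
    student["grade"], bins=bins, labels=labels, include_lowest=True
)
print(student)
```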


Anonymous 5

An example of how .cut works:

s = pd.Series([168,180,174,190,170,185,179,181,175,169,182,177,180,171])
pd.cut(s, 3)
# To add labels to bins
pd.cut(s, 3, labels=["Small","Medium","Large"])

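This can be applied directly to the range of the data: when bins is an integer, pd.cut splits the range between the minimum and maximum into that many equal-width bins.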
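To see which edges pd.cut picked for the three bins, one option is the `retbins=True` argument, which returns the computed edges alongside the binned values. A minimal sketch on the same series (the names `binned`, `edges` and `counts` are chosen for this example):

```python
import pandas as pd

s = pd.Series([168, 180, 174, 190, 170, 185, 179, 181, 175, 169, 182, 177, 180, 171])

# retbins=True also returns the array of computed bin edges.
binned, edges = pd.cut(s, 3, labels=["Small", "Medium", "Large"], retbins=True)

print(edges)    # four edges delimiting three equal-width bins
counts = binned.value_counts(sort=False)
print(counts)   # number of values per label
```

The first edge is placed slightly below the minimum so that the smallest value is still included in the first bin.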