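I have a series of 633 values, about 50% of which are 0. Ideally I would like to run qcut() on my values (for choropleth mapping purposes), but this raises an error because of non-unique bin edges. What is the best way to split the data, bin the non-zero values, and then recombine everything into one column, so that zeros get the value 0 and the quantiled values get categorical.label + 1?
I have this dataset: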
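The split-bin-recombine idea described in this question can be sketched roughly as follows. This is a minimal illustration rather than a definitive answer: the random `values` series and the helper name `qcut_nonzero` are hypothetical stand-ins, and it assumes the non-zero values are varied enough that their own quantile edges are unique.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 633-value series: roughly half zeros,
# the rest continuous positive values.
rng = np.random.default_rng(0)
values = pd.Series(
    np.where(rng.random(633) < 0.5, 0.0, rng.gamma(2.0, 10.0, 633))
)


def qcut_nonzero(s, q):
    """Return 0 for zero entries and 1..q for quantile bins of the rest."""
    out = pd.Series(0, index=s.index, dtype="int64")
    nonzero = s != 0
    # labels=False yields integer codes 0..q-1; shift them to 1..q
    out.loc[nonzero] = pd.qcut(s[nonzero], q, labels=False) + 1
    return out


binned = qcut_nonzero(values, 5)
print(binned.value_counts().sort_index())
```

Keeping the zeros out of the quantile computation means they no longer pile up on the lower bin edges, which is what makes the edges collide in the first place.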
recency;frequency;monetary
21;156;41879955
13;88;16850284
8;74;79150488
2;74;26733719
9;55;16162365
...;...;...
Detailed raw data: http://pastebin.com/beiEeS80
I load it into a DataFrame; here is my full code:
df = pd.DataFrame(datas, columns=['userid', 'recency', 'frequency', 'monetary'])
df['recency'] = df['recency'].astype(float)
df['frequency'] = df['frequency'].astype(float)
df['monetary'] = df['monetary'].astype(float)
df['recency'] = pd.qcut(df['recency'].values, 5).codes + 1
df['frequency'] = pd.qcut(df['frequency'].values, 5).codes + 1
df['monetary'] = pd.qcut(df['monetary'].values, 5).codes + 1
But it returns an error:
df['frequency'] = pd.qcut(df['frequency'].values, 5).codes + 1
ValueError: Bin edges must be unique: array([ 1., 1., 2., 4., 9., 156.])
How can I solve this?
I am using Pandas' qcut to discretize data into equal-sized buckets. I want price buckets. Here is my DataFrame:
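One commonly used workaround for the error above is to rank the values first, so that every row gets a distinct position, and then cut the ranks instead of the raw values. The sketch below is only an illustration: it reuses the five sample `frequency` values shown in the question and pads them with hypothetical tied rows, and the `frequency_score` column name is made up for the example.

```python
import pandas as pd

# Five sample frequencies from the question plus hypothetical tied rows
df = pd.DataFrame(
    {"frequency": [156, 88, 74, 74, 55, 1, 1, 1, 1, 1, 2, 2, 4, 9]},
    dtype=float,
)

# rank(method="first") breaks ties by order of appearance, so all ranks
# are distinct and the quantile edges computed on them are unique.
ranks = df["frequency"].rank(method="first")
df["frequency_score"] = pd.qcut(ranks, 5, labels=False) + 1

print(df.sort_values("frequency"))
```

The trade-off is that rows with the same frequency can end up in different buckets, since ties are split arbitrarily to keep the buckets equal in size.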
productId sell_prix categ popularity
11997 16758760.0 28.75 50 524137.0
11998 16758760.0 28.75 50 166795.0
13154 16782105.0 24.60 50 126890.5
13761 16790082.0 65.00 50 245437.0
13762 16790082.0 65.00 50 245242.0
15355 16792720.0 29.00 50 360219.0
15356 16792720.0 29.00 50 360100.0
15357 16792720.0 29.00 50 360027.0
15358 16792720.0 29.00 50 462850.0
15367 16792728.0 29.00 50 193030.5
Here is my code:
df['PriceBucket'] = pd.qcut(df['sell_prix'], 3)
I get this error message:
**ValueError: Bin edges must be unique: array([ 24.6, 29. , 29. , 65. ])**
Actually, I have one that contains …
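For the price-bucket case, `pd.qcut` accepts a `duplicates="drop"` argument (available in reasonably recent pandas versions) that merges repeated edges instead of raising. A small sketch using only the ten `sell_prix` values from the table above; the variable names are illustrative:

```python
import pandas as pd

# The ten sell_prix values shown in the question's table
prices = pd.Series(
    [28.75, 28.75, 24.60, 65.00, 65.00, 29.00, 29.00, 29.00, 29.00, 29.00],
    name="sell_prix",
)

# duplicates="drop" collapses identical bin edges rather than raising
# "Bin edges must be unique"
buckets = pd.qcut(prices, 3, duplicates="drop")

print(buckets.cat.categories)
print(buckets.value_counts())
```

Note that with this option the result can contain fewer buckets than requested, and the buckets are no longer guaranteed to be equal in size.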
I am trying to compute percentiles of two columns using the pandas qcut method, like below:
my_df['float_col_quantile'] = pd.qcut(my_df['float_col'], 100, labels=False)
my_df['int_col_quantile'] = pd.qcut(my_df['int_col'].astype(float), 100, labels=False)
The column float_col_quantile works fine, but the column int_col_quantile raises the following error. Any idea what I did wrong here, and how can I fix it? Thanks!
ValueError Traceback (most recent call last)
<ipython-input-19-b955e0b00953> in <module>()
1 my_df['float_col_quantile'] = pd.qcut(my_df['float_col'], 100, labels=False)
----> 2 my_df['int_col_quantile'] = pd.qcut(my_df['int_col'].astype(float), 100, labels=False)
/usr/local/lib/python3.4/dist-packages/pandas/tools/tile.py in qcut(x, q, …
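An integer column usually contains many repeated values, so 100 quantile edges are likely to coincide. If tied values should share a percentile bucket rather than be split apart, one possible alternative is to skip qcut and derive the bucket from ranks. The sketch below is an assumption-laden illustration: the sample `int_col` data is invented, and the 0..99 labelling convention is just one reasonable choice.

```python
import pandas as pd

# Hypothetical integer column with heavy ties (100 rows)
my_df = pd.DataFrame({"int_col": [0] * 60 + [1] * 25 + list(range(2, 17))})

n = len(my_df)
# method="max": every tied value receives the highest rank in its group
ranks = my_df["int_col"].rank(method="max")

# Map ranks 1..n onto integer buckets 0..99; ties stay in one bucket
my_df["int_col_quantile"] = ((ranks - 1) * 100 // n).astype(int)

print(my_df.drop_duplicates("int_col"))
```

Unlike qcut, this does not produce equal-sized buckets: a value that covers 60% of the rows occupies a single bucket and the labels below it stay unused.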