Pre*_*cks 17 python-2.7 pandas
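I have a csv file showing the parts on an order. The columns include days late, quantity and commodity.
I need to group the data by days late and commodity, with a sum of the quantity. However, the days late need to be grouped into ranges.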
>56
>35 and <= 56
>14 and <= 35
>0 and <=14
I was hoping I could use a dict somehow. Something like this
{'Red':'>56','Amber':'>35 and <= 56','Yellow':'>14 and <= 35','White':'>0 and <=14'}
I am looking for a result like this
        Red  Amber  Yellow  White
STRSUB   56     60      74     40
BOTDWG   20     67      87     34
I am new to pandas, so I don't know if this is possible at all. Could anyone provide some advice?
Thanks
unu*_*tbu 26
Suppose you start with this data:
df = pd.DataFrame({'ID': ('STRSUB BOTDWG'.split())*4,
                   'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
                   'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})
# Days Late ID quantity
# 0 60 STRSUB 56
# 1 60 BOTDWG 20
# 2 50 STRSUB 60
# 3 50 BOTDWG 67
# 4 20 STRSUB 74
# 5 20 BOTDWG 87
# 6 10 STRSUB 40
# 7 10 BOTDWG 34
Then you can find the status category using pd.cut. Note that by default, pd.cut splits the Series df['Days Late'] into categories which are half-open intervals, (-1, 14], (14, 35], (35, 56], (56, 365]:
df['status'] = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365], labels=False)
labels = np.array('White Yellow Amber Red'.split())
df['status'] = labels[df['status']]
del df['Days Late']
print(df)
# ID quantity status
# 0 STRSUB 56 Red
# 1 BOTDWG 20 Red
# 2 STRSUB 60 Amber
# 3 BOTDWG 67 Amber
# 4 STRSUB 74 Yellow
# 5 BOTDWG 87 Yellow
# 6 STRSUB 40 White
# 7 BOTDWG 34 White
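As a variation on the step above, pd.cut can also attach the names directly through its labels argument, which skips the NumPy lookup. This is only a sketch: starting the bins at 0 (so that 0 days late gets no status, matching the question's ">0") and using an infinite top edge instead of 365 are my assumptions, not part of the code above.

```python
import pandas as pd

df = pd.DataFrame({'ID': 'STRSUB BOTDWG'.split() * 4,
                   'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
                   'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})

# Assumed edges: (0, 14], (14, 35], (35, 56], (56, inf)
bins = [0, 14, 35, 56, float('inf')]
names = ['White', 'Yellow', 'Amber', 'Red']

# right=True is the default, so each interval is closed on the right
df['status'] = pd.cut(df['Days Late'], bins=bins, labels=names)
print(df)

# Values outside every interval (here 0) come back as NaN
edge_cases = pd.cut(pd.Series([0, 14, 15]), bins=bins, labels=names)
print(edge_cases)
```

The resulting status column is categorical, so rows that fall outside all bins show up as NaN rather than raising an error.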
Now use pivot to get the DataFrame in the desired form:
df = df.pivot(index='ID', columns='status', values='quantity')
And use reindex to obtain the desired order for the rows and columns:
df = df.reindex(columns=labels[::-1], index=df.index[::-1])
Thus,
import numpy as np
import pandas as pd
df = pd.DataFrame({'ID': ('STRSUB BOTDWG'.split())*4,
                   'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
                   'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})
df['status'] = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365], labels=False)
labels = np.array('White Yellow Amber Red'.split())
df['status'] = labels[df['status']]
del df['Days Late']
df = df.pivot(index='ID', columns='status', values='quantity')
df = df.reindex(columns=labels[::-1], index=df.index[::-1])
print(df)
yields
        Red  Amber  Yellow  White
ID
STRSUB   56     60      74     40
BOTDWG   20     67      87     34
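One caveat worth sketching: pivot raises a ValueError when the same ID/status pair occurs more than once, which seems likely with real order data. Assuming the goal is the sum of quantities per cell, as the question describes, pivot_table with aggfunc='sum' is one way to handle that. The sample rows below are made up for illustration.

```python
import pandas as pd

# Made-up data: STRSUB has two 'Red' rows and BOTDWG has two 'White' rows
df = pd.DataFrame({'ID': ['STRSUB', 'STRSUB', 'BOTDWG', 'BOTDWG', 'BOTDWG'],
                   'Days Late': [60, 70, 20, 10, 5],
                   'quantity': [56, 4, 87, 30, 4]})

order = ['Red', 'Amber', 'Yellow', 'White']
# Convert the categorical result to plain strings before pivoting
df['status'] = pd.cut(df['Days Late'],
                      bins=[0, 14, 35, 56, float('inf')],
                      labels=order[::-1]).astype(str)

# Sum quantities for duplicate ID/status pairs; empty cells become 0
table = df.pivot_table(index='ID', columns='status', values='quantity',
                       aggfunc='sum', fill_value=0)

# Put the columns in the requested order, adding any status with no rows
table = table.reindex(columns=order, fill_value=0)
print(table)
```

The final reindex also adds a column of zeros for any status that has no rows at all (Amber in this sample), so the table always has the same four columns.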
You can create a column in your DataFrame based on your Days Late column by using the map or apply functions as follows. Let's first create some sample data.
import numpy
import pandas
df = pandas.DataFrame({'ID': 'foo,bar,foo,bar,foo,bar,foo,foo'.split(','),
                       'Days Late': numpy.random.randn(8)*20+30})
Days Late ID
0 30.746244 foo
1 16.234267 bar
2 14.771567 foo
3 33.211626 bar
4 3.497118 foo
5 52.482879 bar
6 11.695231 foo
7 47.350269 foo
Create a helper function to transform the data of the Days Late column and add a column called Code.
def days_late_xform(dl):
    if dl > 56: return 'Red'
    elif 35 < dl <= 56: return 'Amber'
    elif 14 < dl <= 35: return 'Yellow'
    elif 0 < dl <= 14: return 'White'
    else: return 'None'

df["Code"] = df['Days Late'].map(days_late_xform)
Days Late ID Code
0 30.746244 foo Yellow
1 16.234267 bar Yellow
2 14.771567 foo Yellow
3 33.211626 bar Yellow
4 3.497118 foo White
5 52.482879 bar Amber
6 11.695231 foo White
7 47.350269 foo Amber
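The thresholds could also live in a dictionary, which is closer to what the question asked for. A minimal sketch, assuming a mapping from each label to its inclusive upper bound; the names upper_bounds and classify are hypothetical, and pd.cut is swapped in for the if/elif chain:

```python
import pandas as pd

# Hypothetical mapping: label -> inclusive upper bound of its range
upper_bounds = {'White': 14, 'Yellow': 35, 'Amber': 56, 'Red': float('inf')}

def classify(days_late, bounds=upper_bounds, lowest=0):
    """Label values using the intervals (lowest, b1], (b1, b2], ..."""
    # Sort by bound so the result does not depend on dict ordering
    items = sorted(bounds.items(), key=lambda kv: kv[1])
    labels = [name for name, _ in items]
    edges = [lowest] + [bound for _, bound in items]
    return pd.cut(days_late, bins=edges, labels=labels)

days = pd.Series([30.7, 16.2, 3.5, 52.5, 60.0, -2.0])
codes = classify(days)
print(codes)
```

Values at or below lowest (the -2.0 here) come back as NaN instead of the string 'None' that the helper function returns.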
Lastly, you can use groupby to aggregate by the ID and Code columns, and get the counts of the groups as follows:
g = df.groupby(["ID","Code"]).size()
print(g)
ID Code
bar Amber 1
Yellow 2
foo Amber 1
White 2
Yellow 2
df2 = g.unstack()
print(df2)
Code Amber White Yellow
ID
bar 1 NaN 2
foo 1 2 2
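Note that size() counts rows, while the question asks for the sum of quantities. Assuming the frame also carries a quantity column (the sample data in this answer does not, so the values below are made up), the same groupby could be sketched with sum() instead:

```python
import pandas as pd

# Made-up rows; 'Code' stands in for the column produced by the helper function
df = pd.DataFrame({'ID': ['foo', 'bar', 'foo', 'bar', 'foo'],
                   'Code': ['Yellow', 'Yellow', 'White', 'Amber', 'Yellow'],
                   'quantity': [5, 3, 2, 7, 1]})

# Sum quantity per (ID, Code) pair, then move Code into the columns;
# fill_value=0 replaces the NaN that unstack leaves for missing pairs
totals = df.groupby(['ID', 'Code'])['quantity'].sum().unstack(fill_value=0)
print(totals)
```

Passing fill_value=0 to unstack gives zeros where the counting version shows NaN.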