Stu*_*nce 3 python dataframe cumulative-sum pandas
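I'm trying to find a way to compute a running count (cumulative sum) in pandas that accounts for ties.
Let's take hypothetical data from a track meet, where I have Person, Race, Heat and Time.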
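Each person's place is determined as follows: within a given race/heat combination, the person with the lowest time gets place 1, the person with the next-lowest time gets place 2, and so on.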
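This would be fairly simple to code, except for one wrinkle:
If two people have the same time, they both get the same place, and the next time greater than theirs gets that place + 1.
In the table below, for the 100 Yard Dash, heat 1, RUNNER1 finishes first, RUNNER2/RUNNER3 get second, and RUNNER4 gets third (the next time after RUNNER2/RUNNER3).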
So basically, the logic is as follows:
If race <> race.shift() or heat <> heat.shift(), then place = 1
If race = race.shift() and heat = heat.shift() and time > time.shift(), then place = place.shift() + 1
If race = race.shift() and heat = heat.shift() and time = time.shift(), then place = place.shift()
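The part that confuses me is how to handle the ties. Otherwise I could do something like
import numpy as np
# Naive attempt that ignores ties. It assumes a 'Place' column already
# exists, and shift() reads that column's old values rather than the ones
# being computed, so it does not build a running count.
df['Place'] = np.where(
    (df['Race'] == df['Race'].shift()) & (df['Heat'] == df['Heat'].shift()),
    df['Place'].shift() + 1,
    1)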
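For reference, the three shift-based rules above can be vectorised without a self-referencing place.shift(): flag the rows where a new place starts, then take a cumulative sum inside each block. This is a minimal sketch of that swapped-in technique, assuming rows are already sorted by Time within each race/heat block; the small DataFrame and the names new_block, new_place and block_id are hypothetical, for illustration only.

```python
import pandas as pd

# Hypothetical miniature of the sample data; rows are assumed to be
# sorted by Time within each Race/Heat block.
df = pd.DataFrame({
    'Race': ['100 Yard Dash'] * 5 + ['200 Yard Dash'] * 3,
    'Heat': [1, 1, 1, 1, 2, 1, 1, 1],
    'Time': [9.87, 9.92, 9.92, 9.96, 9.88, 19.63, 19.72, 19.72],
})

# Rule 1: a new race/heat block starts wherever Race or Heat differs
# from the previous row (the first row compares against NaN, so it is True).
new_block = (df['Race'] != df['Race'].shift()) | (df['Heat'] != df['Heat'].shift())

# Rules 2 and 3: the place goes up by one only when the time differs from
# the previous row, so a tie keeps the same place. Forcing True at block
# starts guarantees each block begins at place 1.
new_place = new_block | (df['Time'] != df['Time'].shift())

# Number consecutive blocks, then count the place increments inside each one.
block_id = new_block.cumsum()
df['Place'] = new_place.astype(int).groupby(block_id).cumsum()
print(df)
```

Because blocks are numbered by consecutive runs of rows, a race/heat combination that reappears later starts again at place 1.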
Thanks!
Sample data is as follows:
Person,Race,Heat,Time
RUNNER1,100 Yard Dash,1,9.87
RUNNER2,100 Yard Dash,1,9.92
RUNNER3,100 Yard Dash,1,9.92
RUNNER4,100 Yard Dash,1,9.96
RUNNER5,100 Yard Dash,1,9.97
RUNNER6,100 Yard Dash,1,10.01
RUNNER7,100 Yard Dash,2,9.88
RUNNER8,100 Yard Dash,2,9.93
RUNNER9,100 Yard Dash,2,9.93
RUNNER10,100 Yard Dash,2,10.03
RUNNER11,100 Yard Dash,2,10.26
RUNNER7,200 Yard Dash,1,19.63
RUNNER8,200 Yard Dash,1,19.67
RUNNER9,200 Yard Dash,1,19.72
RUNNER10,200 Yard Dash,1,19.72
RUNNER11,200 Yard Dash,1,19.86
RUNNER12,200 Yard Dash,1,19.92
What I ultimately want is:
Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,1,9.96,3
RUNNER5,100 Yard Dash,1,9.97,4
RUNNER6,100 Yard Dash,1,10.01,5
RUNNER7,100 Yard Dash,2,9.88,1
RUNNER8,100 Yard Dash,2,9.93,2
RUNNER9,100 Yard Dash,2,9.93,2
RUNNER10,100 Yard Dash,2,10.03,3
RUNNER11,100 Yard Dash,2,10.26,4
RUNNER7,200 Yard Dash,1,19.63,1
RUNNER8,200 Yard Dash,1,19.67,2
RUNNER9,200 Yard Dash,1,19.72,3
RUNNER10,200 Yard Dash,1,19.72,3
RUNNER11,200 Yard Dash,1,19.86,4
RUNNER12,200 Yard Dash,1,19.92,5
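[Edit] Now, taking it a step further:
Suppose that once I leave a set of unique values (a race/heat combination), the place resets to 1 the next time that set appears.
For example (note that it goes from "Heat 1" to "Heat 2" and back to "Heat 1"), I don't want the ranking to continue from the earlier "Heat 1"; instead I want it to reset.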
Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,2,9.96,1
RUNNER5,100 Yard Dash,2,9.97,2
RUNNER6,100 Yard Dash,2,10.01,3
RUNNER7,100 Yard Dash,1,9.88,1
RUNNER8,100 Yard Dash,1,9.93,2
RUNNER9,100 Yard Dash,1,9.93,2
You could use:
grouped = df.groupby(['Race','Heat'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0]+1)
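Why this works: pd.factorize with sort=True assigns each distinct value its position among the sorted distinct values, and equal values share a code, which is exactly the tie rule in the question once you add 1. A minimal sketch on one group's times (the Series below is taken from heat 1 of the sample data):

```python
import pandas as pd

# Times for one race/heat group, taken from the sample data.
times = pd.Series([9.87, 9.92, 9.92, 9.96, 9.97, 10.01])

# factorize(sort=True) maps each distinct value to its index among the
# sorted distinct values (0, 1, 2, ...); equal times share one code.
codes, uniques = pd.factorize(times, sort=True)
place = codes + 1  # shift from 0-based codes to 1-based places

print(place.tolist())    # [1, 2, 2, 3, 4, 5]
print(uniques.tolist())  # [9.87, 9.92, 9.96, 9.97, 10.01]
```

Without sort=True, codes are assigned in order of first appearance, so the result would only be correct if each group were already sorted by time; sort=True removes that dependency on row order.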
import pandas as pd
df = pd.DataFrame({'Heat': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1],
                   'Person': ['RUNNER1', 'RUNNER2', 'RUNNER3', 'RUNNER4', 'RUNNER5', 'RUNNER6',
                              'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11',
                              'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER12'],
                   'Race': ['100 Yard Dash'] * 11 + ['200 Yard Dash'] * 6,
                   'Time': [9.87, 9.92, 9.92, 9.96, 9.97, 10.01, 9.88, 9.93, 9.93, 10.03, 10.26,
                            19.63, 19.67, 19.72, 19.72, 19.86, 19.92]})
grouped = df.groupby(['Race','Heat'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0]+1)
df['Rank'] = grouped['Time'].rank(method='min')
print(df)
yields
Heat Person Race Time Place Rank
0 1 RUNNER1 100 Yard Dash 9.87 1.0 1.0
1 1 RUNNER2 100 Yard Dash 9.92 2.0 2.0
2 1 RUNNER3 100 Yard Dash 9.92 2.0 2.0
3 1 RUNNER4 100 Yard Dash 9.96 3.0 4.0
4 1 RUNNER5 100 Yard Dash 9.97 4.0 5.0
5 1 RUNNER6 100 Yard Dash 10.01 5.0 6.0
6 2 RUNNER7 100 Yard Dash 9.88 1.0 1.0
7 2 RUNNER8 100 Yard Dash 9.93 2.0 2.0
8 2 RUNNER9 100 Yard Dash 9.93 2.0 2.0
9 2 RUNNER10 100 Yard Dash 10.03 3.0 4.0
10 2 RUNNER11 100 Yard Dash 10.26 4.0 5.0
11 1 RUNNER7 200 Yard Dash 19.63 1.0 1.0
12 1 RUNNER8 200 Yard Dash 19.67 2.0 2.0
13 1 RUNNER9 200 Yard Dash 19.72 3.0 3.0
14 1 RUNNER10 200 Yard Dash 19.72 3.0 3.0
15 1 RUNNER11 200 Yard Dash 19.86 4.0 5.0
16 1 RUNNER12 200 Yard Dash 19.92 5.0 6.0
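Note that pandas has a Groupby.rank method that computes many common forms of ranking; the method='min' option used for the Rank column above is not the kind you described. For example, on row 3, Rank is 4 after the tie between the second and third runners, whereas Place is 3.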
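For the specific tie rule in the question, current pandas versions of rank also accept method='dense': tied rows share a rank and the next distinct value is exactly one higher. I am not certain which pandas release first offered it, so check the documentation for your version. A minimal sketch comparing 'min' and 'dense' on a few rows of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Race': ['100 Yard Dash'] * 6,
    'Heat': [1, 1, 1, 1, 2, 2],
    'Time': [9.87, 9.92, 9.92, 9.96, 9.88, 9.93],
})

grouped = df.groupby(['Race', 'Heat'])['Time']
# 'min': tied rows share the lowest rank, and the next distinct time skips ahead.
df['Rank'] = grouped.rank(method='min')
# 'dense': tied rows share a rank, and the next distinct time is one higher.
df['Place'] = grouped.rank(method='dense').astype(int)
print(df)
```

Here Place gives 1, 2, 2, 3 for heat 1 while Rank gives 1, 2, 2, 4, which mirrors the difference described above.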
Regarding the edit, use
(df['Heat'] != df['Heat'].shift()).cumsum()
to disambiguate repeated heats (each consecutive run of the same heat gets its own group number):
import pandas as pd
df = pd.DataFrame({'Heat': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1],
                   'Person': ['RUNNER1', 'RUNNER2', 'RUNNER3', 'RUNNER4', 'RUNNER5', 'RUNNER6',
                              'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11',
                              'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER12'],
                   'Race': ['100 Yard Dash'] * 17,
                   'Time': [9.87, 9.92, 9.92, 9.96, 9.97, 10.01, 9.88, 9.93, 9.93, 10.03, 10.26,
                            19.63, 19.67, 19.72, 19.72, 19.86, 19.92]})
df['HeatGroup'] = (df['Heat'] != df['Heat'].shift()).cumsum()
grouped = df.groupby(['Race','HeatGroup'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0]+1)
df['Rank'] = grouped['Time'].rank(method='min')
print(df)
yields
Heat Person Race Time HeatGroup Place Rank
0 1 RUNNER1 100 Yard Dash 9.87 1 1.0 1.0
1 1 RUNNER2 100 Yard Dash 9.92 1 2.0 2.0
2 1 RUNNER3 100 Yard Dash 9.92 1 2.0 2.0
3 1 RUNNER4 100 Yard Dash 9.96 1 3.0 4.0
4 1 RUNNER5 100 Yard Dash 9.97 1 4.0 5.0
5 1 RUNNER6 100 Yard Dash 10.01 1 5.0 6.0
6 2 RUNNER7 100 Yard Dash 9.88 2 1.0 1.0
7 2 RUNNER8 100 Yard Dash 9.93 2 2.0 2.0
8 2 RUNNER9 100 Yard Dash 9.93 2 2.0 2.0
9 2 RUNNER10 100 Yard Dash 10.03 2 3.0 4.0
10 2 RUNNER11 100 Yard Dash 10.26 2 4.0 5.0
11 1 RUNNER7 100 Yard Dash 19.63 3 1.0 1.0
12 1 RUNNER8 100 Yard Dash 19.67 3 2.0 2.0
13 1 RUNNER9 100 Yard Dash 19.72 3 3.0 3.0
14 1 RUNNER10 100 Yard Dash 19.72 3 3.0 3.0
15 1 RUNNER11 100 Yard Dash 19.86 3 4.0 5.0
16 1 RUNNER12 100 Yard Dash 19.92 3 5.0 6.0
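One edge case to keep in mind: HeatGroup above only looks at Heat. If the same race and heat number reappear after a different race that uses the same heat number (for example 100 Yard heat 1, 200 Yard heat 1, 100 Yard heat 1), Heat never changes, so both 100 Yard runs share one HeatGroup and are ranked together. A minimal sketch, assuming you want a reset whenever either column changes, builds the block number from Race and Heat together and, as a swapped-in alternative to factorize, uses dense ranking; the Block column name is hypothetical:

```python
import pandas as pd

# The question's edit example: heat 1, then heat 2, then heat 1 again.
df = pd.DataFrame({
    'Person': ['RUNNER%d' % i for i in range(1, 10)],
    'Race': ['100 Yard Dash'] * 9,
    'Heat': [1, 1, 1, 2, 2, 2, 1, 1, 1],
    'Time': [9.87, 9.92, 9.92, 9.96, 9.97, 10.01, 9.88, 9.93, 9.93],
})

# Start a new block whenever Race OR Heat changes from the previous row,
# so a change of race alone (same heat number) also resets places.
changed = (df['Race'] != df['Race'].shift()) | (df['Heat'] != df['Heat'].shift())
df['Block'] = changed.cumsum()

# Dense ranking within each consecutive block: ties share a place,
# and the next distinct time gets place + 1.
df['Place'] = df.groupby('Block')['Time'].rank(method='dense').astype(int)
print(df)
```

On this data the Place column reproduces the edit's expected output: 1, 2, 2 for the first heat 1, then 1, 2, 3, then 1, 2, 2.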