17 python sorting grouping dataframe pandas
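I have a script that assigns a value based on two columns in a pandas df. The code below implements the first step, but I'm struggling with the second.
The script should initially:
1) Assign a Person to each individual string in [Area], using the first 3 unique values in [Place]
2) Look to reassign People who have fewer than 3 unique values
Example: the df below has 6 unique combinations of [Area] and [Place], but 3 People are assigned. Ideally, 2 People would each have 3 unique values.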
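import numpy as np
import pandas as pd

d = {
    'Time': ['8:03:00', '8:17:00', '8:20:00', '10:15:00', '10:15:00', '11:48:00', '12:00:00', '12:10:00'],
    'Place': ['House 1', 'House 2', 'House 1', 'House 3', 'House 4', 'House 5', 'House 1', 'House 1'],
    'Area': ['X', 'X', 'Y', 'X', 'X', 'X', 'X', 'X'],
}
df = pd.DataFrame(data=d)

def g(places):
    # Number the unique Places of one Area in order of first appearance,
    # three Places per block: 1, 1, 1, 2, 2, 2, ...
    s = places.unique()
    d = dict(zip(s, np.arange(len(s)) // 3 + 1))
    return places.map(d)

# transform keeps the original row order and index on every pandas version
df['Person'] = df.groupby('Area', sort=False)['Place'].transform(g)
# Combine block number and Area, then relabel in order of first appearance
s = df['Person'].astype(str) + df['Area']
df['Person'] = pd.Series(pd.factorize(s)[0] + 1, index=df.index).map(str).radd('Person ')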
Output:
       Time    Place Area    Person
0   8:03:00  House 1    X  Person 1
1   8:17:00  House 2    X  Person 1
2   8:20:00  House 1    Y  Person 2
3  10:15:00  House 3    X  Person 1
4  10:15:00  House 4    X  Person 3
5  11:48:00  House 5    X  Person 3
6  12:00:00  House 1    X  Person 1
7  12:10:00  House 1    X  Person 1
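As you can see, the first step works fine: for each individual string in [Area], the first 3 unique values in [Place] are assigned to a Person. This leaves Person 1 with 3 values, Person 2 with 1 value and Person 3 with 2 values.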
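The per-Person counts described above can be checked directly. This is only a minimal sketch: the frame is rebuilt by hand from the "Output:" table, and `counts` is a name chosen here for illustration. It counts distinct (Area, Place) pairs per Person, so repeated visits to the same Place are ignored.

```python
import pandas as pd

# The "Output:" frame from the question, rebuilt by hand (Time omitted, it is not needed here)
df = pd.DataFrame({
    'Place': ['House 1', 'House 2', 'House 1', 'House 3', 'House 4', 'House 5', 'House 1', 'House 1'],
    'Area': ['X', 'X', 'Y', 'X', 'X', 'X', 'X', 'X'],
    'Person': ['Person 1', 'Person 1', 'Person 2', 'Person 1',
               'Person 3', 'Person 3', 'Person 1', 'Person 1'],
})

# Keep one row per distinct (Person, Area, Place), then count rows per Person
counts = (
    df.drop_duplicates(['Person', 'Area', 'Place'])
      .groupby('Person')
      .size()
)
print(counts)
```

Any Person whose count is below 3 is a candidate for reassignment in the second step.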
The second step is where I'm struggling.
If a Person has fewer than 3 unique values assigned to them, alter this so that each Person has 3 unique values.
Intended output:
       Time    Place Area    Person
0   8:03:00  House 1    X  Person 1
1   8:17:00  House 2    X  Person 1
2   8:20:00  House 1    Y  Person 2
3  10:15:00  House 3    X  Person 1
4  10:15:00  House 4    X  Person 2
5  11:48:00  House 5    X  Person 2
6  12:00:00  House 1    X  Person 1
7  12:10:00  House 1    X  Person 1
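Description:
Person 1 already has 3 unique values assigned, so all good. Person 2 and Person 3 had fewer, so we should combine these. All duplicate values should remain the same.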
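As I understand it, you are happy with everything up to the Person assignment. So here is a plug-and-play solution that "merges" Persons with fewer than 3 unique values, so that each Person ends up with 3 unique values (except, obviously, the last one). It is based on the second-to-last df you posted ("Output:"); it leaves the Persons that already have 3 unique values untouched and merges only the others.
EDIT: Greatly simplified the code. Again taking your df as input: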
import numpy as np

n = 3  # target number of rows per Person
# Flag Persons that already have exactly n rows. This counts rows, not unique
# Place values, so repeated Places (rows 6 and 7 in the example) are counted again.
df['complete'] = (df.groupby('Person')['Person'].transform('size') == n).astype(int)
df['num'] = df.Person.str.replace('Person ', '').astype(int)  # numeric, so 10 sorts after 2
# Get all Persons that are complete to the top, then order by Person number
df.sort_values(by=['complete', 'num'], ascending=[False, True], inplace=True)
# Create the numbering [1,1,1,2,2,2,3,3,3,...] with n defining how often a Person is repeated
df['Person_new'] = np.arange(len(df)) // n + 1
df.Person = 'Person ' + df.Person_new.astype(str)  # fill the Person column with the new numbering
df.drop(['complete', 'Person_new', 'num'], axis=1, inplace=True)
df.sort_index(inplace=True)  # restore the original row order
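Since the question asks that duplicate values keep their assignment, one possible variation is to do the renumbering on distinct (Person, Area, Place) combinations and then map the result back to every row. The following is a sketch under that assumption, not the answer's original code: `merge_incomplete` is a hypothetical helper name, and it assumes Person labels of the form 'Person <number>' and a default integer index (the merge resets the index).

```python
import numpy as np
import pandas as pd

def merge_incomplete(df, n=3):
    """Renumber Persons so each holds n distinct (Area, Place) pairs where possible."""
    keys = ['Person', 'Area', 'Place']
    # One row per distinct (Person, Area, Place) combination
    u = df[keys].drop_duplicates().copy()
    # A Person is incomplete when they do not hold exactly n distinct pairs
    u['incomplete'] = u.groupby('Person')['Place'].transform('size') != n
    u['num'] = u['Person'].str.replace('Person ', '', regex=False).astype(int)
    # Complete Persons first, then by Person number
    u = u.sort_values(['incomplete', 'num'])
    # Blocks of n consecutive pairs share one new label: 1, 1, 1, 2, 2, 2, ...
    u['Person_new'] = np.arange(len(u)) // n + 1
    u['Person_new'] = 'Person ' + u['Person_new'].astype(str)
    # Map the new label back onto every row, duplicates included
    out = df.merge(u[keys + ['Person_new']], on=keys, how='left')
    out['Person'] = out.pop('Person_new')
    return out

# The "Output:" frame from the question, rebuilt by hand
df = pd.DataFrame({
    'Time': ['8:03:00', '8:17:00', '8:20:00', '10:15:00', '10:15:00', '11:48:00', '12:00:00', '12:10:00'],
    'Place': ['House 1', 'House 2', 'House 1', 'House 3', 'House 4', 'House 5', 'House 1', 'House 1'],
    'Area': ['X', 'X', 'Y', 'X', 'X', 'X', 'X', 'X'],
    'Person': ['Person 1', 'Person 1', 'Person 2', 'Person 1',
               'Person 3', 'Person 3', 'Person 1', 'Person 1'],
})
result = merge_incomplete(df)
print(result)
```

On the sample data this should leave rows 6 and 7 with Person 1 and fold the old Person 2 and Person 3 into a single Person 2, which is the intended output from the question. Whether this grouping of leftover pairs is the right one for other data (for example, merging across different Areas) is a design choice the question leaves open.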