Regroup column values in a pandas df

Votes: 17 · Tags: python, sorting, grouping, dataframe, pandas

I have a script that assigns values based on two columns in a pandas df. The code below implements the first step, but I'm struggling with the second.

So the script should:

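1) Assign a Person to each individual string in [Area] for the first 3 unique values in [Place].

2) Reassign People that have fewer than 3 unique values. For example, the df below has 6 unique values across [Area] and [Place], but 3 People are assigned. Ideally, 2 people would each have 3 unique values.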
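The arithmetic behind step 2 can be checked directly. A minimal sketch, assuming the sample data from the question and counting unique (Area, Place) pairs, since House 1 appears in both Area X and Area Y and is treated as two different values:

```python
import math

import pandas as pd

# Sample data from the question (Time omitted; it plays no role in the grouping)
df = pd.DataFrame({
    'Place': ['House 1', 'House 2', 'House 1', 'House 3',
              'House 4', 'House 5', 'House 1', 'House 1'],
    'Area': ['X', 'X', 'Y', 'X', 'X', 'X', 'X', 'X'],
})

n = 3  # target number of unique values per Person

# House 1 in Area X and House 1 in Area Y count as two different values
n_unique = len(df[['Area', 'Place']].drop_duplicates())
min_people = math.ceil(n_unique / n)

print(n_unique, min_people)  # 6 2
```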

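import numpy as np
import pandas as pd

d = {
    'Time':  ['8:03:00', '8:17:00', '8:20:00', '10:15:00', '10:15:00', '11:48:00', '12:00:00', '12:10:00'],
    'Place': ['House 1', 'House 2', 'House 1', 'House 3', 'House 4', 'House 5', 'House 1', 'House 1'],
    'Area':  ['X', 'X', 'Y', 'X', 'X', 'X', 'X', 'X'],
}

df = pd.DataFrame(data=d)

def g(places):
    # Number the unique Places of one Area in order of first appearance
    # and bucket them in threes: first 3 -> 1, next 3 -> 2, ...
    s = places.unique()
    mapping = dict(zip(s, np.arange(len(s)) // 3 + 1))
    return places.map(mapping)

# transform keeps the original index; groupby(...).apply(g) prepends the Area key
# to the index in pandas >= 2.0, so the Series built below would no longer line up
df['Person'] = df.groupby('Area', sort=False)['Place'].transform(g)

# Combine bucket number and Area into one key ("1X", "1Y", "2X", ...)
# and number the keys in order of first appearance
s = df['Person'].astype(str) + df['Area']
df['Person'] = pd.Series(pd.factorize(s)[0] + 1, index=df.index).map(str).radd('Person ')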

Output:

       Time    Place Area    Person
0   8:03:00  House 1    X  Person 1
1   8:17:00  House 2    X  Person 1
2   8:20:00  House 1    Y  Person 2
3  10:15:00  House 3    X  Person 1
4  10:15:00  House 4    X  Person 3
5  11:48:00  House 5    X  Person 3
6  12:00:00  House 1    X  Person 1
7  12:10:00  House 1    X  Person 1

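As you can see, the first step works fine: for each individual string in [Area], the first 3 unique values in [Place] are assigned to a Person. This leaves Person 1 with 3 values, Person 2 with 1 value and Person 3 with 2 values.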
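The per-Person counts above can be reproduced from the step-1 output. A sketch, assuming that table is rebuilt by hand as `out` (a name introduced here, not from the question):

```python
import pandas as pd

# Step-1 output from the table above (Time omitted)
out = pd.DataFrame({
    'Place': ['House 1', 'House 2', 'House 1', 'House 3',
              'House 4', 'House 5', 'House 1', 'House 1'],
    'Area': ['X', 'X', 'Y', 'X', 'X', 'X', 'X', 'X'],
    'Person': ['Person 1', 'Person 1', 'Person 2', 'Person 1',
               'Person 3', 'Person 3', 'Person 1', 'Person 1'],
})

# Drop repeated visits first so each (Area, Place) is counted once per Person
per_person = out.drop_duplicates(['Person', 'Area', 'Place']).groupby('Person').size()
print(per_person)  # Person 1: 3, Person 2: 1, Person 3: 2
```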

The second step is where I'm struggling.

If a Person has fewer than 3 unique values assigned to them, change the assignment so that each Person has 3 unique values.
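The target condition can be written as a check. A minimal sketch; `meets_target` is a hypothetical helper name, it treats House 1 in Area X and House 1 in Area Y as different values, and the expected result is copied from the question:

```python
import pandas as pd


def meets_target(frame: pd.DataFrame, n: int = 3) -> bool:
    """True if every Person has exactly n unique (Area, Place) values."""
    counts = frame.drop_duplicates(['Person', 'Area', 'Place']).groupby('Person').size()
    return bool(counts.eq(n).all())


places = ['House 1', 'House 2', 'House 1', 'House 3',
          'House 4', 'House 5', 'House 1', 'House 1']
areas = ['X', 'X', 'Y', 'X', 'X', 'X', 'X', 'X']

step1 = pd.DataFrame({'Place': places, 'Area': areas,
                      'Person': ['Person 1', 'Person 1', 'Person 2', 'Person 1',
                                 'Person 3', 'Person 3', 'Person 1', 'Person 1']})
expected = pd.DataFrame({'Place': places, 'Area': areas,
                         'Person': ['Person 1', 'Person 1', 'Person 2', 'Person 1',
                                    'Person 2', 'Person 2', 'Person 1', 'Person 1']})

print(meets_target(step1), meets_target(expected))  # False True
```

When the total number of unique values is not a multiple of 3, one Person necessarily ends up with fewer, so a stricter version of this check would have to allow one exception.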

Expected output:

       Time    Place Area    Person
0   8:03:00  House 1    X  Person 1
1   8:17:00  House 2    X  Person 1
2   8:20:00  House 1    Y  Person 2
3  10:15:00  House 3    X  Person 1
4  10:15:00  House 4    X  Person 2
5  11:48:00  House 5    X  Person 2
6  12:00:00  House 1    X  Person 1
7  12:10:00  House 1    X  Person 1

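Description:

Person 1 already has 3 unique values assigned, so that one is fine. Person 2 and Person 3 have fewer, so they should be combined. All duplicate values should remain unchanged.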
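One reading of "all duplicate values should remain unchanged" is that every row with the same (Area, Place) keeps the same Person. A sketch of that check, assuming this interpretation; `duplicates_consistent` is a hypothetical name:

```python
import pandas as pd


def duplicates_consistent(frame: pd.DataFrame) -> bool:
    """True if all rows sharing an (Area, Place) pair carry the same Person."""
    return bool(frame.groupby(['Area', 'Place'])['Person'].nunique().eq(1).all())


expected = pd.DataFrame({
    'Place': ['House 1', 'House 2', 'House 1', 'House 3',
              'House 4', 'House 5', 'House 1', 'House 1'],
    'Area': ['X', 'X', 'Y', 'X', 'X', 'X', 'X', 'X'],
    'Person': ['Person 1', 'Person 1', 'Person 2', 'Person 1',
               'Person 2', 'Person 2', 'Person 1', 'Person 1'],
})
print(duplicates_consistent(expected))  # True

# Moving one repeated House 1 visit in Area X to another Person breaks the rule
broken = expected.copy()
broken.loc[6, 'Person'] = 'Person 2'
print(duplicates_consistent(broken))  # False
```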


Answer (Dav*_*vid, score 3):

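As I understand it, you are happy with everything up to the person assignment. So here is a plug-and-play solution that "merges" persons with fewer than 3 unique values, so that each person ends up with 3 unique values (except, obviously, the last one). It starts from the second-to-last df you posted ("Output:"), leaves the persons that already have 3 unique values untouched, and merges the others.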

Edit: greatly simplified the code. Again, it takes df as input:

import pandas as pd

n = 3  # number of unique Places each Person should end up with

# One row per unique (Person, Area, Place), in order of first appearance,
# so repeated visits (e.g. House 1 in rows 0, 6 and 7) are counted once
pairs = df[['Person', 'Area', 'Place']].drop_duplicates()

# A Person is "complete" if it already has n unique values
complete = pairs['Person'].map(pairs['Person'].value_counts()).eq(n)

# Complete Persons keep their own label; the pairs of all other Persons
# are pooled and re-cut into consecutive chunks of n
rank = (~complete).cumsum() - 1
pairs = pairs.assign(key=pairs['Person'].where(complete, 'merged ' + (rank // n).astype(str)))

# Every row, including duplicates, inherits the key of its pair;
# the keys are then renumbered Person 1, Person 2, ... by first appearance
df = df.merge(pairs, on=['Person', 'Area', 'Place'], how='left')
df['Person'] = ['Person ' + str(c) for c in pd.factorize(df['key'])[0] + 1]
df = df.drop(columns='key')
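Both steps can also be combined into one function that starts from the raw df and reproduces the expected output. A minimal sketch, assuming the rules described in the question (buckets of 3 unique (Area, Place) values per Area, buckets that already hold 3 values stay as they are, and the leftover values are pooled in order of first appearance); `assign_people` is a hypothetical name:

```python
import pandas as pd


def assign_people(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Hypothetical helper: step 1 and step 2 from the question in one pass."""
    # Step 1: inside each Area, number the unique Places by first appearance
    # and cut them into buckets of n (0, 0, 0, 1, 1, 1, ...)
    bucket = df.groupby('Area', sort=False)['Place'].transform(
        lambda s: pd.factorize(s)[0] // n)
    keyed = df.assign(key=df['Area'] + '-' + bucket.astype(str))

    # One row per unique (Area, Place), in order of first appearance
    pairs = keyed.drop_duplicates(['Area', 'Place'])

    # Step 2: buckets that already hold n unique values are kept;
    # the values of all other buckets are pooled and re-cut into chunks of n
    complete = pairs['key'].map(pairs['key'].value_counts()).eq(n)
    rank = (~complete).cumsum() - 1
    new_key = ('keep|' + pairs['key']).where(
        complete, 'pool|' + (rank // n).astype(str))
    pairs = pairs.assign(new_key=new_key)

    # Repeated visits inherit the key of their (Area, Place) pair, and the
    # keys are renumbered Person 1, Person 2, ... by first appearance
    out = df.merge(pairs[['Area', 'Place', 'new_key']],
                   on=['Area', 'Place'], how='left')
    codes = pd.factorize(out['new_key'])[0] + 1
    out['Person'] = ['Person ' + str(c) for c in codes]
    return out.drop(columns='new_key')


df = pd.DataFrame({
    'Time': ['8:03:00', '8:17:00', '8:20:00', '10:15:00',
             '10:15:00', '11:48:00', '12:00:00', '12:10:00'],
    'Place': ['House 1', 'House 2', 'House 1', 'House 3',
              'House 4', 'House 5', 'House 1', 'House 1'],
    'Area': ['X', 'X', 'Y', 'X', 'X', 'X', 'X', 'X'],
})

result = assign_people(df)
print(result)
```

The "keep|" and "pool|" prefixes keep the per-Area keys and the pooled keys apart, so an Area name can never collide with a pooled group.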