17 python numpy dataframe assign pandas
这个问题与排班或人员配备有关.我正在尝试为个人(员工)分配各种工作.使用df下面的,
`[Person]` = Individuals (employees)
`[Area]` and `[Place]` = unique jobs
`[On]` = How many unique jobs are occurring at each point in time
Run Code Online (Sandbox Code Playgroud)
因此[Area],[Place]在一起将构成unique不同工作的价值观.这些值将分配给个人,总体目标是尽可能少地使用个人.assigned任何一个人最独特的值是3. [On]显示正在和正在发生的当前unique值的数量.因此,这为我需要的人数提供了具体的指导.例如,[Place][Area]
1-3 unique values occurring = 1 individual
4-6 unique values occurring = 2 individuals
7-9 unique values occurring = 3 individuals etc
Run Code Online (Sandbox Code Playgroud)
问:
在哪里量unique的值[Area]和[Place]大于3是造成我的麻烦.我不能做groupby,我assign第一个3 unique values到individual 1,并在未来3个unique值individual 2等我想组唯一值[Area]和[Place]通过[Area].因此,assign在[Area]个人(最多3个)中查找相同的值.然后,如果有剩余的值(<3),则应尽可能将它们组合成3个组.
我设想这种工作的方式是:通过一个展望未来hour.对于每个新row值,script应该看到有多少个值[On](这表明需要多少个人).当unique值> 3,他们应该是assigned通过grouping在相同的值[Area].如果有剩余的价值,无论如何它们应该组合成一组3.
将其分为一步一步的过程:
1)使用[On] Column,以确定通过查看有多少人需要到未来的一个hour
2)如果出现的unique值超过3 ,则[Area]首先分配相同的值.
3)如果有任何剩余的值,那么无论如何都要结合起来.
对于df下面,有9个unique发生对值[Place]和[Area]用hour.所以我们应该有3个人assigned.当unique值> 3时,应分配[Area]并查看是否出现相同的值.在剩余的值应该与具有小于3倍其他个人相结合unique的值.
import pandas as pd
import numpy as np
d = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],
'Place' : ['House 1','House 2','House 3','House 4','House 5','House 1','House 2','House 3','House 2'],
'Area' : ['A','B','C','D','E','D','E','F','G'],
'On' : ['1','2','3','4','5','6','7','8','9'],
'Person' : ['Person 1','Person 2','Person 3','Person 4','Person 5','Person 4','Person 5','Person 6','Person 7'],
})
df = pd.DataFrame(data=d)
Run Code Online (Sandbox Code Playgroud)
这是我的尝试:
def reduce_df(df):
values = df['Area'] + df['Place']
df1 = df.loc[~values.duplicated(),:] # ignore duplicate values for this part..
person_count = df1.groupby('Person')['Person'].agg('count')
leftover_count = person_count[person_count < 3] # the 'leftovers'
# try merging pairs together
nleft = leftover_count.shape[0]
to_try = np.arange(nleft - 1)
to_merge = (leftover_count.values[to_try] +
leftover_count.values[to_try + 1]) <= 3
to_merge[1:] = to_merge[1:] & ~to_merge[:-1]
to_merge = to_try[to_merge]
merge_dict = dict(zip(leftover_count.index.values[to_merge+1],
leftover_count.index.values[to_merge]))
def change_person(p):
if p in merge_dict.keys():
return merge_dict[p]
return p
reduced_df = df.copy()
# update df with the merges you found
reduced_df['Person'] = reduced_df['Person'].apply(change_person)
return reduced_df
df1 = (reduce_df(reduce_df(df)))
Run Code Online (Sandbox Code Playgroud)
这是输出:
Time Place Area On Person
0 8:03:00 House 1 A 1 Person 1
1 8:17:00 House 2 B 2 Person 1
2 8:20:00 House 3 C 3 Person 1
3 8:28:00 House 4 D 4 Person 4
4 8:35:00 House 5 E 5 Person 5
5 8:40:00 House 1 D 6 Person 4
6 8:42:00 House 2 E 7 Person 5
7 8:45:00 House 3 F 8 Person 5
8 8:50:00 House 2 G 9 Person 7
Run Code Online (Sandbox Code Playgroud)
这是我的预期输出:
Time Place Area On Person
0 8:03:00 House 1 A 1 Person 1
1 8:17:00 House 2 B 2 Person 1
2 8:20:00 House 3 C 3 Person 1
3 8:28:00 House 4 D 4 Person 2
4 8:35:00 House 5 E 5 Person 3
5 8:40:00 House 6 D 6 Person 2
6 8:42:00 House 2 E 7 Person 3
7 8:45:00 House 3 F 8 Person 2
8 8:50:00 House 2 G 9 Person 3
Run Code Online (Sandbox Code Playgroud)
我想如何获得此输出的说明:
Index 0: One `unique` value occurring. So `assign` to individual 1
Index 1: Two `unique` values occurring. So `assign` to individual 1
Index 2: Three `unique` values occurring. So `assign` to individual 1
Index 3: Four `unique` values on. So `assign` to individual 2
Index 4: Five `unique` values on. This one is a bit tricky and hard to conceptualise. But there is another `E` within an `hour`. So `assign` to a new individual so it can be combined with the other `E`
Index 5: Six `unique` values on. Should be `assigned` with the other `D`. So individual 2
Index 6: Seven `unique` values on. Should be `assigned` with other `E`. So individual 3
Index 7: Eight `unique` values on. New value in `[Area]`, which is a _leftover_. `Assign` to either individual 2 or 3
Index 8: Nine `unique` values on. New value in `[Area]`, which is a _leftover_. `Assign` to either individual 2 or 3
Run Code Online (Sandbox Code Playgroud)
示例No2:
d = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','8:40:00','8:42:00','8:45:00','8:50:00'],
'Place' : ['House 1','House 2','House 3','House 1','House 2','House 3','House 1','House 2','House 3'],
'Area' : ['X','X','X','X','X','X','X','X','X'],
'On' : ['1','2','3','3','3','3','3','3','3'],
'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1'],
})
df = pd.DataFrame(data=d)
Run Code Online (Sandbox Code Playgroud)
我收到一个错误:
IndexError: index 1 is out of bounds for axis 1 with size 1
Run Code Online (Sandbox Code Playgroud)
在这一行:
df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
Run Code Online (Sandbox Code Playgroud)
但是,如果我将Person更改为1,2,3重复,则返回以下内容:
'Person' : ['Person 1','Person 2','Person 3','Person 1','Person 2','Person 3','Person 1','Person 2','Person 3'],
Time Place Area On Person
0 8:03:00 House 1 X 1 Person 1
1 8:17:00 House 2 X 2 Person 1
2 8:20:00 House 3 X 3 Person 1
3 8:28:00 House 1 X 3 Person 2
4 8:35:00 House 2 X 3 Person 2
5 8:40:00 House 3 X 3 Person 2
6 8:42:00 House 1 X 3 Person 3
7 8:45:00 House 2 X 3 Person 3
8 8:50:00 House 3 X 3 Person 3
Run Code Online (Sandbox Code Playgroud)
预期产出:
Time Place Area On Person
0 8:03:00 House 1 X 1 Person 1
1 8:17:00 House 2 X 2 Person 1
2 8:20:00 House 3 X 3 Person 1
3 8:28:00 House 1 X 3 Person 1
4 8:35:00 House 2 X 3 Person 1
5 8:40:00 House 3 X 3 Person 1
6 8:42:00 House 1 X 3 Person 1
7 8:45:00 House 2 X 3 Person 1
8 8:50:00 House 3 X 3 Person 1
Run Code Online (Sandbox Code Playgroud)
示例2的主要内容是:
1) There are <3 unique values on so assign to individual 1
Run Code Online (Sandbox Code Playgroud)
这是函数形式的答案allocatePeople。它基于预先计算一小时内重复区域的所有索引:
from collections import Counter
import numpy as np
import pandas as pd
def getAssignedPeople(df, areasPerPerson):
areas = df['Area'].values
places = df['Place'].values
times = pd.to_datetime(df['Time']).values
maxPerson = np.ceil(areas.size / float(areasPerPerson)) - 1
assignmentCount = Counter()
assignedPeople = []
assignedPlaces = {}
heldPeople = {}
heldAreas = {}
holdAvailable = True
person = 0
# search for repeated areas. Mark them if the next repeat occurs within an hour
ixrep = np.argmax(np.triu(areas.reshape(-1, 1)==areas, k=1), axis=1)
holds = np.zeros(areas.size, dtype=bool)
holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')
for area,place,hold in zip(areas, places, holds):
if (area, place) in assignedPlaces:
# this unique (area, place) has already been assigned to someone
assignedPeople.append(assignedPlaces[(area, place)])
continue
if assignmentCount[person] >= areasPerPerson:
# the current person is already assigned to enough areas, move on to the next
a = heldPeople.pop(person, None)
heldAreas.pop(a, None)
person += 1
if area in heldAreas:
# assign to the person held in this area
p = heldAreas.pop(area)
heldPeople.pop(p)
else:
# get the first non-held person. If we need to hold in this area,
# also make sure the person has at least 2 free assignment slots,
# though if it's the last person assign to them anyway
p = person
while p in heldPeople or (hold and holdAvailable and (areasPerPerson - assignmentCount[p] < 2)) and not p==maxPerson:
p += 1
assignmentCount.update([p])
assignedPlaces[(area, place)] = p
assignedPeople.append(p)
if hold:
if p==maxPerson:
# mark that there are no more people available to perform holds
holdAvailable = False
# this area recurrs in an hour, mark that the person should be held here
heldPeople[p] = area
heldAreas[area] = p
return assignedPeople
def allocatePeople(df, areasPerPerson=3):
assignedPeople = getAssignedPeople(df, areasPerPerson=areasPerPerson)
df = df.copy()
df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
return df
Run Code Online (Sandbox Code Playgroud)
df['Person'].unique()注意在 中的使用allocatePeople。这可以处理人们在输入中重复的情况。假设输入中人员的顺序是分配这些人员所需的顺序。
我allocatePeople针对OP的示例输入(example1和example2)以及我想出的几个边缘情况进行了测试,我认为(?)与OP所需的算法相匹配:
ds = dict(
example1 = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],
'Place' : ['House 1','House 2','House 3','House 4','House 5','House 1','House 2','House 3','House 2'],
'Area' : ['A','B','C','D','E','D','E','F','G'],
'On' : ['1','2','3','4','5','6','7','8','9'],
'Person' : ['Person 1','Person 2','Person 3','Person 4','Person 5','Person 4','Person 5','Person 6','Person 7'],
}),
example2 = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','8:40:00','8:42:00','8:45:00','8:50:00'],
'Place' : ['House 1','House 2','House 3','House 1','House 2','House 3','House 1','House 2','House 3'],
'Area' : ['X','X','X','X','X','X','X','X','X'],
'On' : ['1','2','3','3','3','3','3','3','3'],
'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1'],
}),
long_repeats = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:25:00','8:30:00','8:31:00','8:35:00','8:45:00','8:50:00'],
'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 3','House 2'],
'Area' : ['A','A','A','A','B','C','C','C','B'],
'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 4','Person 4','Person 3'],
'On' : ['1','2','3','4','5','6','7','8','9'],
}),
many_repeats = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],
'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 1','House 2'],
'Area' : ['A', 'B', 'C', 'D', 'D', 'E', 'E', 'F', 'F'],
'On' : ['1','2','3','4','5','6','7','8','9'],
'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 3','Person 5','Person 6'],
}),
large_gap = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],
'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 1','House 3'],
'Area' : ['A', 'B', 'C', 'D', 'E', 'F', 'D', 'D', 'D'],
'On' : ['1','2','3','4','5','6','7','8','9'],
'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 3','Person 5','Person 6'],
}),
different_times = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','09:42:00','09:45:00','09:50:00'],
'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 1','House 1'],
'Area' : ['A', 'B', 'C', 'D', 'D', 'E', 'E', 'F', 'G'],
'On' : ['1','2','3','4','5','6','7','8','9'],
'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 3','Person 5','Person 6'],
})
)
expectedPeoples = dict(
example1 = [1,1,1,2,3,2,3,2,3],
example2 = [1,1,1,1,1,1,1,1,1],
long_repeats = [1,1,1,2,2,3,3,3,2],
many_repeats = [1,1,1,2,2,3,3,2,3],
large_gap = [1,1,1,2,3,3,2,2,3],
different_times = [1,1,1,2,2,2,3,3,3],
)
for name,d in ds.items():
df = pd.DataFrame(d)
expected = ['Person %d' % i for i in expectedPeoples[name]]
ap = allocatePeople(df)
print(name, ap, sep='\n', end='\n\n')
np.testing.assert_array_equal(ap['Person'], expected)
Run Code Online (Sandbox Code Playgroud)
语句assert_array_equal通过,并且输出与 OP 的预期输出匹配:
example1
Time Place Area On Person
0 8:03:00 House 1 A 1 Person 1
1 8:17:00 House 2 B 2 Person 1
2 8:20:00 House 3 C 3 Person 1
3 8:28:00 House 4 D 4 Person 2
4 8:35:00 House 5 E 5 Person 3
5 08:40:00 House 1 D 6 Person 2
6 08:42:00 House 2 E 7 Person 3
7 08:45:00 House 3 F 8 Person 2
8 08:50:00 House 2 G 9 Person 3
example2
Time Place Area On Person
0 8:03:00 House 1 X 1 Person 1
1 8:17:00 House 2 X 2 Person 1
2 8:20:00 House 3 X 3 Person 1
3 8:28:00 House 1 X 3 Person 1
4 8:35:00 House 2 X 3 Person 1
5 8:40:00 House 3 X 3 Person 1
6 8:42:00 House 1 X 3 Person 1
7 8:45:00 House 2 X 3 Person 1
8 8:50:00 House 3 X 3 Person 1
Run Code Online (Sandbox Code Playgroud)
我的测试用例的输出也符合我的期望:
long_repeats
Time Place Area Person On
0 8:03:00 House 1 A Person 1 1
1 8:17:00 House 2 A Person 1 2
2 8:20:00 House 3 A Person 1 3
3 8:25:00 House 4 A Person 2 4
4 8:30:00 House 1 B Person 2 5
5 8:31:00 House 1 C Person 3 6
6 8:35:00 House 2 C Person 3 7
7 8:45:00 House 3 C Person 3 8
8 8:50:00 House 2 B Person 2 9
many_repeats
Time Place Area On Person
0 8:03:00 House 1 A 1 Person 1
1 8:17:00 House 2 B 2 Person 1
2 8:20:00 House 3 C 3 Person 1
3 8:28:00 House 4 D 4 Person 2
4 8:35:00 House 1 D 5 Person 2
5 08:40:00 House 1 E 6 Person 3
6 08:42:00 House 2 E 7 Person 3
7 08:45:00 House 1 F 8 Person 2
8 08:50:00 House 2 F 9 Person 3
large_gap
Time Place Area On Person
0 8:03:00 House 1 A 1 Person 1
1 8:17:00 House 2 B 2 Person 1
2 8:20:00 House 3 C 3 Person 1
3 8:28:00 House 4 D 4 Person 2
4 8:35:00 House 1 E 5 Person 3
5 08:40:00 House 1 F 6 Person 3
6 08:42:00 House 2 D 7 Person 2
7 08:45:00 House 1 D 8 Person 2
8 08:50:00 House 3 D 9 Person 3
different_times
Time Place Area On Person
0 8:03:00 House 1 A 1 Person 1
1 8:17:00 House 2 B 2 Person 1
2 8:20:00 House 3 C 3 Person 1
3 8:28:00 House 4 D 4 Person 2
4 8:35:00 House 1 D 5 Person 2
5 08:40:00 House 1 E 6 Person 2
6 09:42:00 House 2 E 7 Person 3
7 09:45:00 House 1 F 8 Person 3
8 09:50:00 House 1 G 9 Person 3
Run Code Online (Sandbox Code Playgroud)
让我知道它是否满足您想要的一切,或者是否仍需要一些调整。我想每个人都渴望看到你实现你的愿景。
| 归档时间: |
|
| 查看次数: |
1310 次 |
| 最近记录: |