将函数应用于数据帧行，使用结果作为下一行输入

Question

将函数应用于数据帧行，使用结果作为下一行输入

我正在尝试创建一个基本的调度系统。这是我到目前为止所拥有的：

我有一个 pandas 数据框job_data，如下所示：

厕所	工作	开始	期间
1	J1	2022-08-16 07:30:00	17 号
1	J2	2022-08-16 07:30:00	5
2	J3	2022-08-16 07:30:00	21
2	J4	2022-08-16 07:30:00	12

它包含 wc（工作中心）、作业、作业的开始日期和持续时间（以小时为单位）。

我创建了一个函数add_hours，它接受以下参数：start（日期时间）、hours（int）。

它根据开始时间和持续时间计算作业何时完成。

add_hours的代码是：

def is_in_open_hours(dt):
    return (
        dt.weekday() in business_hours["weekdays"]
        and dt.date() not in holidays
        and business_hours["from"].hour <= dt.time().hour < business_hours["to"].hour
    )


def get_next_open_datetime(dt):
    while True:
        dt = dt + timedelta(days=1)
        if dt.weekday() in business_hours["weekdays"] and dt.date() not in holidays:
            dt = datetime.combine(dt.date(), business_hours["from"])
            return dt


def add_hours(dt, hours):
    while hours != 0:
        if is_in_open_hours(dt):
            dt = dt + timedelta(hours=1)
            hours = hours - 1
        else:
            dt = get_next_open_datetime(dt)
    return dt

Run Code Online (Sandbox Code Playgroud)

计算结束列的代码是：

df["end"] = df.apply(lambda x: add_hours(x.start, x.duration), axis=1)

Run Code Online (Sandbox Code Playgroud)

函数的结果是最后一列：

厕所	工作	开始	期间	结尾
1	J1	2022-08-16 07:30:00	17 号	2022-08-17 14:00:00
1	J2	2022-08-16 07:30:00	5	2022-08-17 10:00:00
2	J3	2022-08-16 07:30:00	21	2022-08-18 08:00:00
2	J4	2022-08-16 07:30:00	12	2022-08-18 08:00:00

问题是，我需要第二行中的开始日期时间是前一行的结束日期时间，而不是它们全部使用相同的开始日期。我还需要为每个厕所重新开始这个过程。

所以期望的输出是：

厕所	工作	开始	期间	结尾
1	J1	2022-08-16 07:30:00	17 号	2022-08-17 14:00:00
1	J2	2022-08-17 14:00:00	5	2022-08-17 19:00:00
2	J3	2022-08-16 07:30:00	21	2022-08-18 08:00:00
2	J4	2022-08-18 08:00:00	10	2022-08-18 18:00:00

Answer 1

D.L*_*D.L 1

我展示了一种替代方法，您只需要first start date然后根据作业持续时间引导列表。


# import required modules
import io
import pandas as pd
from datetime import datetime
from datetime import timedelta

# make a dataframe
# note: only the first start date is required
x = '''
wc  job start   duration    end
1   J1  2022-08-16 07:30:00 17  2022-08-17 14:00:00
1   J2  2022-08-16 07:30:00 5   2022-08-17 10:00:00
2   J3  2022-08-16 07:30:00 21  2022-08-18 08:00:00
2   J4  2022-08-16 07:30:00 12  2022-08-18 08:00:00
'''
data = io.StringIO(x)

df = pd.read_csv(data, sep='\t')

# construct start and end lists
start = datetime.strptime(df['start'][0], '%Y-%m-%d %H:%M:%S')
start_list = [start]
end_list = []
for x in df['duration']:
    time_change = timedelta(hours=float(x))
    new_time = start_list[-1] + time_change
    start_list.append(new_time)
    end_list.append(new_time)

start_list.pop(-1)

# add to dataframe
df['start'] = start_list
df['end'] = end_list

# finished
df

Run Code Online (Sandbox Code Playgroud)

结果是这样的：

归档时间：	3 年，5 月前
查看次数：	594 次
最近记录：	3 年，5 月前