Bra*_*rad 6 python dataframe pandas
I have a pandas DataFrame that represents a shift schedule for the whole year, given as:
January 2019 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Shift A 1 1 1 0 0 0 2 2 0 0 1 1 1 1 0 2 2 0 0 0 0 0 0 0 2 2 2 0 1 1 1
Shift B 0 2 2 0 0 0 0 0 0 0 2 2 2 0 1 1 1 0 0 0 2 2 0 0 1 1 1 1 0 2 2
Shift C 0 0 0 2 2 2 0 1 1 1 0 0 0 2 2 0 0 1 1 1 1 0 2 2 0 0 0 0 0 0 0
Shift D 2 0 0 1 1 1 1 0 2 2 0 0 0 0 0 0 0 2 2 2 0 1 1 1 0 0 0 2 2 0 0
February 2019 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 nan nan nan
Shift A 0 0 0 2 2 0 0 1 1 1 1 0 2 2 0 0 0 0 0 0 0 2 2 2 0 1 1 1 nan nan nan
Shift B 0 0 0 0 0 0 0 2 2 2 0 1 1 1 0 0 0 2 2 0 0 1 1 1 1 0 2 2 nan nan nan
Shift C 2 2 2 0 1 1 1 0 0 0 2 2 0 0 1 1 1 1 0 2 2 0 0 0 0 0 0 0 nan nan nan
Shift D 1 1 1 1 0 2 2 0 0 0 0 0 0 0 2 2 2 0 1 1 1 0 0 0 2 2 0 0 nan nan nan
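where 1 denotes the day shift (06:00-18:00), 2 denotes the night shift (18:00-06:00), and 0 can be ignored. Only one shift team is on duty in any given period.
I need a format in which this data is indexed by DateTime timestamps together with the shift working at that time, for example: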
DateTime Shift
0 2019-01-01 06:00:00 A
1 2019-01-01 18:00:00 D
2 2019-01-02 06:00:00 A
3 2019-01-02 18:00:00 B
4 2019-01-03 06:00:00 A
5 2019-01-03 18:00:00 B
.
.
.
What is the most efficient pandas way to reindex the data to achieve this, i.e. without for loops?
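To make the target concrete, the encoding described above boils down to date arithmetic for a single cell: month start, plus (day - 1) days, plus the shift's start time. The following is only a minimal sketch; `shift_start` and `SHIFT_START` are hypothetical names, and the `"%B %Y"` format assumes month headers written like "January 2019".

```python
import pandas as pd

# Start-time offset for each shift code (1 = day shift, 2 = night shift).
SHIFT_START = {1: pd.Timedelta(hours=6), 2: pd.Timedelta(hours=18)}

def shift_start(month_label, day, code):
    """Return the timestamp at which the shift with this code starts."""
    # Assumes month headers look like "January 2019".
    month_begin = pd.to_datetime(month_label, format="%B %Y")
    return month_begin + pd.Timedelta(days=day - 1) + SHIFT_START[code]

# Shift B has code 2 on day 3 of January 2019 in the table above.
print(shift_start("January 2019", 3, 2))
```

This corresponds to the row `2019-01-03 18:00:00 B` in the desired output; a vectorized solution has to apply the same arithmetic to every non-zero cell at once.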
Use:
import pandas as pd

# get the first column by position (it holds month headers and shift names)
first = df.iloc[:, 0]
# parse it as datetimes; shift-name rows cannot be parsed and become NaT
dates = pd.to_datetime(first, errors='coerce')
# True for data (shift) rows, False for month-header rows
mask = dates.isna()
# forward-fill each month onto its rows; the leading NaT rows belong to the
# month stored in the first column's name
df.index = dates.ffill().fillna(pd.to_datetime(first.name))
# keep only the data rows and append the shift name to the index
df = df[mask.values].set_index(first.name, append=True)
# convert the day-number column labels to day offsets (day 1 -> 0 days)
df.columns = pd.to_timedelta(df.columns.astype(int) - 1, unit='D')
# start-time offset for each shift code
mapp = {1: pd.Timedelta('06:00:00'), 2: pd.Timedelta('18:00:00')}
# replace 0 with NaN via mask, reshape to long form with stack, map the codes
# to offsets, drop any cells that are still missing (stack may keep NaN,
# depending on the pandas version) and convert the MultiIndex to columns
df = (df.mask(df == 0)
        .stack()
        .map(mapp)
        .dropna()
        .rename_axis(('Datetime', 'Shift', 'day'))
        .reset_index(name='td')
     )
# add the day offset and the shift start time to the month start
df['Datetime'] += (df.pop('td') + df.pop('day'))
# sort and create a default index
df = df.sort_values(['Datetime', 'Shift']).reset_index(drop=True)
print (df)
Datetime Shift
0 2019-01-01 06:00:00 Shift A
1 2019-01-01 18:00:00 Shift D
2 2019-01-02 06:00:00 Shift A
3 2019-01-02 18:00:00 Shift B
4 2019-01-03 06:00:00 Shift A
.. ... ...
113 2019-02-26 18:00:00 Shift D
114 2019-02-27 06:00:00 Shift A
115 2019-02-27 18:00:00 Shift B
116 2019-02-28 06:00:00 Shift A
117 2019-02-28 18:00:00 Shift B
[118 rows x 2 columns]
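For anyone who wants to check the steps on a small input, here is a self-contained sketch run on a reduced sample. Several things in it are assumptions rather than part of the original post: the frame layout (first month in the first column's name, later months as ordinary rows, integer day labels) is inferred from the printed table, February is artificially cut to two days so that the NaN padding is exercised, `schedule_to_long` is a hypothetical helper name, and the explicit `format="%B %Y"` and `dropna()` are added to make the parsing and the NaN handling explicit.

```python
import numpy as np
import pandas as pd

# Reduced sample: days 1-3 of January, days 1-2 of February (NaN padding).
raw = pd.DataFrame(
    [
        ["Shift A", 1, 1, 1],
        ["Shift B", 0, 2, 2],
        ["Shift C", 0, 0, 0],
        ["Shift D", 2, 0, 0],
        ["February 2019", 1, 2, np.nan],  # month-header row stored as data
        ["Shift A", 0, 0, np.nan],
        ["Shift B", 0, 0, np.nan],
        ["Shift C", 2, 2, np.nan],
        ["Shift D", 1, 1, np.nan],
    ],
    columns=["January 2019", 1, 2, 3],
)

def schedule_to_long(raw):
    """Reshape the wide schedule into one row per shift start."""
    first = raw.iloc[:, 0]
    # Month headers parse to timestamps; shift names become NaT.
    months = pd.to_datetime(first, format="%B %Y", errors="coerce")
    is_shift_row = months.isna()
    # Rows before the first header row belong to the month in the column name.
    month_start = months.ffill().fillna(
        pd.to_datetime(first.name, format="%B %Y")
    )

    wide = raw.loc[is_shift_row].drop(columns=first.name)
    wide.index = pd.MultiIndex.from_arrays(
        [month_start[is_shift_row], first[is_shift_row]],
        names=["Datetime", "Shift"],
    )
    # Day labels 1, 2, 3, ... become offsets of 0, 1, 2, ... days.
    wide.columns = pd.to_timedelta(wide.columns.astype(int) - 1, unit="D")

    offsets = {1: pd.Timedelta(hours=6), 2: pd.Timedelta(hours=18)}
    long = (
        wide.mask(wide == 0)   # code 0 means "not working": treat as missing
        .stack()
        .map(offsets)
        .dropna()              # remove zeros and padding days
        .rename_axis(["Datetime", "Shift", "day"])
        .reset_index(name="td")
    )
    long["Datetime"] = long["Datetime"] + long.pop("day") + long.pop("td")
    return long.sort_values(["Datetime", "Shift"]).reset_index(drop=True)

result = schedule_to_long(raw)
print(result)
```

On this sample the result should contain ten rows, and the first six should match the January rows printed above.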