qua*_*ant 5 python python-3.x pandas
This is a follow-up to this question.
I have two dataframes:
print(df_1)
timestamp A B
0 2016-05-15 0.020228 0.026572
1 2016-05-15 0.057780 0.175499
2 2016-05-15 0.098808 0.620986
3 2016-05-17 0.158789 1.014819
4 2016-05-17 0.038129 2.384590
5 2018-05-17 0.011111 9.999999
print(df_2)
start end event
0 2016-05-14 2016-05-16 E1
1 2016-05-14 2016-05-16 E2
2 2016-05-17 2016-05-18 E3
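I want to merge df_1 and df_2 and get the event column wherever the df_1 timestamp falls between start and end.
The problems, and the differences from that question, are:
1) Events E1 and E2 have the same start and end.
2) In addition, the 6th row of df_1 (index 5) does not fall within any interval.
In the end, I want to keep both events, and have NA for rows that match no event.
So I want my resulting dataframe to look like this: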
timestamp A B event
0 2016-05-15 0.020228 0.026572 E1
1 2016-05-15 0.057780 0.175499 E1
2 2016-05-15 0.098808 0.620986 E1
3 2016-05-15 0.020228 0.026572 E2
4 2016-05-15 0.057780 0.175499 E2
5 2016-05-15 0.098808 0.620986 E2
6 2016-05-17 0.158789 1.014819 E3
7 2016-05-17 0.038129 2.384590 E3
8 2018-05-17 0.011111 9.999999 NA
import pandas as pd
df_1 = pd.DataFrame({'timestamp':['2016-05-15','2016-05-15','2016-05-15','2016-05-17','2016-05-17','2018-05-17'],
'A':[1,1,1,1,1,1]})
df_2 = pd.DataFrame({'start':['2016-05-14','2016-05-14','2016-05-17'],
'end':['2016-05-16','2016-05-16','2016-05-18'],
'event':['E1','E2','E3']})
df_1.timestamp = pd.to_datetime(df_1.timestamp, format='%Y-%m-%d')
df_2.start = pd.to_datetime(df_2.start, format='%Y-%m-%d')
df_2.end = pd.to_datetime(df_2.end, format='%Y-%m-%d')
# convert df_2 to long format with one row per day between start and end, then left merge on timestamp
df_2_2 = pd.melt(df_2, id_vars='event', value_name='timestamp')
df_2_2.timestamp = pd.to_datetime(df_2_2.timestamp)
df_2_2.set_index('timestamp', inplace=True)
df_2_2.drop('variable', axis=1, inplace=True)
df_2_3 = df_2_2.groupby('event').resample('D').ffill().reset_index(level=0, drop=True).reset_index()
df_2 = pd.merge(df_2, df_2_3)
df_2 = df_2.drop(columns=['start', 'end'])
df_1 = df_1.merge(df_2,on='timestamp', how='left')
print(df_1)
timestamp A event
0 2016-05-15 1 E1
1 2016-05-15 1 E2
2 2016-05-15 1 E1
3 2016-05-15 1 E2
4 2016-05-15 1 E1
5 2016-05-15 1 E2
6 2016-05-17 1 E3
7 2016-05-17 1 E3
8 2018-05-17 1 NaN
Credit goes to this answer.
There is also the following solution, but it does not produce the NA for the last row:
import pandas as pd
df_1 = pd.DataFrame({'timestamp':['2016-05-15','2016-05-15','2016-05-15','2016-05-17','2016-05-17','2018-05-17'],
'A':[1,1,1,1,1,1]})
df_2 = pd.DataFrame({'start':['2016-05-14','2016-05-14','2016-05-17'],
'end':['2016-05-16','2016-05-16','2016-05-18'],
'event':['E1','E2','E3']})
df_try2 = pd.merge(df_1.assign(key=1), df_2.assign(key=1), on='key').query('timestamp >= start and timestamp <= end')
print(df_try2)
timestamp A key start end event
0 2016-05-15 1 1 2016-05-14 2016-05-16 E1
1 2016-05-15 1 1 2016-05-14 2016-05-16 E2
3 2016-05-15 1 1 2016-05-14 2016-05-16 E1
4 2016-05-15 1 1 2016-05-14 2016-05-16 E2
6 2016-05-15 1 1 2016-05-14 2016-05-16 E1
7 2016-05-15 1 1 2016-05-14 2016-05-16 E2
11 2016-05-17 1 1 2016-05-17 2016-05-18 E3
14 2016-05-17 1 1 2016-05-17 2016-05-18 E3
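One possible way to keep the unmatched row with the cross-join approach is to filter the pairs first and then left merge the matches back onto df_1. This is only a minimal sketch under the sample data from the question; the helper column `row_id` and the names `left`, `pairs`, `matched` and `result` are introduced here for illustration and are not part of the original solution.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'timestamp': pd.to_datetime(['2016-05-15', '2016-05-15', '2016-05-15',
                                 '2016-05-17', '2016-05-17', '2018-05-17']),
    'A': [1, 1, 1, 1, 1, 1],
})
df_2 = pd.DataFrame({
    'start': pd.to_datetime(['2016-05-14', '2016-05-14', '2016-05-17']),
    'end': pd.to_datetime(['2016-05-16', '2016-05-16', '2016-05-18']),
    'event': ['E1', 'E2', 'E3'],
})

# Keep the original row label (hypothetical helper column `row_id`)
# so that matches can be joined back to the right df_1 row.
left = df_1.reset_index().rename(columns={'index': 'row_id'})

# Cross join via a constant key, then keep only pairs inside [start, end].
pairs = pd.merge(left.assign(key=1), df_2.assign(key=1), on='key')
in_range = (pairs['timestamp'] >= pairs['start']) & (pairs['timestamp'] <= pairs['end'])
matched = pairs.loc[in_range, ['row_id', 'event']]

# Left merge back: df_1 rows without any matching interval get NaN in `event`.
result = left.merge(matched, on='row_id', how='left').drop(columns='row_id')
print(result)
```

With the sample data this should give nine rows: each 2016-05-15 row appears once for E1 and once for E2, the 2016-05-17 rows get E3, and the 2018-05-17 row is kept with a missing event. Note that the cross join builds len(df_1) * len(df_2) intermediate rows, so it may not scale well to large inputs.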