Adding a new column to a DataFrame based on a time calculation

5 python python-2.7 pandas

I have a DataFrame like this:

          Name           first_seen       last_seen
   0      Random guy 1   5/22/2016 18:12  5/22/2016 18:15 
   1      Random guy 2   5/22/2016 12:03  5/22/2016 12:03 
   2      Random guy 3   5/22/2016 21:06  5/22/2016 21:06
   3      Random guy 4   5/22/2016 16:20  5/22/2016 16:20 
   4      Random guy 5   5/22/2016 14:46  5/22/2016 14:46 
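To make the example reproducible, here is a minimal sketch that rebuilds the sample frame and parses the timestamp strings into datetimes. It assumes the dates are month/day/year, which matches "5/22/2016".

```python
import pandas as pd

# Rebuild the sample frame from the question; timestamps start out as strings
df = pd.DataFrame({
    'Name': ['Random guy 1', 'Random guy 2', 'Random guy 3',
             'Random guy 4', 'Random guy 5'],
    'first_seen': ['5/22/2016 18:12', '5/22/2016 12:03', '5/22/2016 21:06',
                   '5/22/2016 16:20', '5/22/2016 14:46'],
    'last_seen': ['5/22/2016 18:15', '5/22/2016 12:03', '5/22/2016 21:06',
                  '5/22/2016 16:20', '5/22/2016 14:46'],
})

# Parse "month/day/year hour:minute" strings into datetime64 columns
for col in ['first_seen', 'last_seen']:
    df[col] = pd.to_datetime(df[col], format='%m/%d/%Y %H:%M')
```

After this, df['first_seen'].dt.hour and .dt.minute give the clock time of each visit.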

Now I need to add a column named Visit_period that takes one of four values, [morning, afternoon, evening, night], depending on the period in which the person (row) spent the most time:

 - morning: 08:00 to 12:00 hrs
 - afternoon: 12:00 to 16:00 hrs
 - evening: 16:00 to 20:00 hrs
 - night: 20:00 to 24:00 hrs
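The four slots are consecutive 4-hour windows starting at 08:00, so a single hour maps to a label with integer division. This sketch assumes each boundary hour belongs to the period that starts there (for example 12:00 is afternoon), and label_for_hour is a hypothetical helper name.

```python
# Assumption: half-open windows, morning = [08:00, 12:00), afternoon = [12:00, 16:00), ...
LABELS = ['morning', 'afternoon', 'evening', 'night']

def label_for_hour(hour):
    """Return the period label for an hour of the day, or None before 08:00."""
    if not 8 <= hour < 24:
        return None  # 00:00-08:00 is not covered by any of the four periods
    return LABELS[(hour - 8) // 4]  # 8-11 -> 0, 12-15 -> 1, 16-19 -> 2, 20-23 -> 3

print(label_for_hour(12))  # afternoon
```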

So for the five rows above, the output would look like this:

   visit_period
        evening
      afternoon
          night
        evening
      afternoon  

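I said "the most time spent" because some people may have first_seen at 14:30 and last_seen at 16:21. In that case I want to assign afternoon, because the visit spent 90 minutes in the afternoon slot (14:30 to 16:00) and only 21 minutes in the evening slot (16:00 to 16:21). I'm using Python 2.7.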
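The "most time spent" rule can be checked by hand with minute arithmetic: convert both times to minutes since midnight, intersect the visit with each slot, and take the slot with the largest overlap. This is a sketch assuming same-day visits; minutes_per_period and PERIODS are hypothetical names, not part of the question or answer.

```python
# Hypothetical table of (label, start hour, end hour) for each slot
PERIODS = [('morning', 8, 12), ('afternoon', 12, 16),
           ('evening', 16, 20), ('night', 20, 24)]

def minutes_per_period(start_h, start_m, end_h, end_m):
    """Minutes of a same-day visit that fall into each period."""
    start = start_h * 60 + start_m  # visit start, minutes since midnight
    end = end_h * 60 + end_m        # visit end, minutes since midnight
    result = {}
    for label, first_hour, last_hour in PERIODS:
        # Overlap of [start, end) with [first_hour:00, last_hour:00), floored at 0
        overlap = min(end, last_hour * 60) - max(start, first_hour * 60)
        result[label] = max(overlap, 0)
    return result

spent = minutes_per_period(14, 30, 16, 21)   # afternoon gets 90, evening 21
print(max(spent, key=spent.get))             # afternoon
```

Note that a zero-length visit (first_seen equal to last_seen, as in most sample rows) overlaps no slot at all, so it needs a fallback such as the slot containing the start time.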

Ste*_*fan 1

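You can apply the period function below to each row to assign the visit period according to the conditions you outlined. First, map the start hour of each period to its label: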

# start hour of each 4-hour period: [8, 12, 16, 20]
times = list(range(8, 21, 4))
labels = ['morning', 'afternoon', 'evening', 'night']
periods = dict(zip(times, labels))  # start hour -> label

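This gives the following mapping (the printed key order can differ between Python versions; only the key/value pairs matter):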

{8: 'morning', 16: 'evening', 12: 'afternoon', 20: 'night'}

Now the function that assigns the period:

def period(row):
    # hour and minute at which the visit starts and ends
    visit_start = {'hour': row.first_seen.hour, 'min': row.first_seen.minute}
    visit_end = {'hour': row.last_seen.hour, 'min': row.last_seen.minute}
    for period_start, label in periods.items():
        period_end = period_start + 4
        # only the period in which the visit starts is considered
        if period_start <= visit_start['hour'] < period_end:
            # minutes spent before / after the boundary at period_end
            minutes_before = (period_end - visit_start['hour']) * 60 - visit_start['min']
            minutes_after = (visit_end['hour'] - period_end) * 60 + visit_end['min']
            if period_start <= visit_end['hour'] < period_end or minutes_before > minutes_after:
                return label
            else:
                return periods[period_end]  # most of the visit is in the following period
    return None  # visit starts before 08:00, outside every period
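To see what the long condition in period compares, here are its two formulas pulled out and evaluated for the 14:30 to 16:21 visit from the question, where the visit starts in the afternoon so period_end is 16. The helper name minutes_around_boundary is hypothetical; the arithmetic is the same as in period.

```python
def minutes_around_boundary(start_h, start_m, end_h, end_m, period_end):
    """Split a visit at the hour period_end, using the formulas from period()."""
    # Minutes from the visit start up to the boundary
    before = (period_end - start_h) * 60 - start_m
    # Minutes from the boundary up to the visit end
    after = (end_h - period_end) * 60 + end_m
    return before, after

before, after = minutes_around_boundary(14, 30, 16, 21, period_end=16)
print('%d %d' % (before, after))  # 90 21: before > after, so period() keeps 'afternoon'
```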

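Finally, .apply() the function row by row (axis=1). Because period reads .hour and .minute, first_seen and last_seen must be datetimes, so parse them first (with pandas imported as pd) if they were loaded as strings: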

df[['first_seen', 'last_seen']] = df[['first_seen', 'last_seen']].apply(pd.to_datetime)
df['period'] = df.apply(period, axis=1)
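A possible vectorized alternative to the row-wise apply, assuming same-day visits and datetime columns: compute each row's minute overlap with every period at once and take the column with the largest overlap. This is a different technique from the answer's (it compares all four periods, not just the start period and the next one). visit_period_vectorized is a hypothetical name, and counting last_seen's own minute as time spent is an assumption made so that zero-length visits still get a label.

```python
import pandas as pd

LABELS = ['morning', 'afternoon', 'evening', 'night']
STARTS = [8, 12, 16, 20]  # start hour of each 4-hour period

def visit_period_vectorized(df):
    """Label each row with the period holding most of its minutes (same-day visits)."""
    start = df['first_seen'].dt.hour * 60 + df['first_seen'].dt.minute
    # Assumption: last_seen's minute counts as spent, so a zero-length visit
    # (first_seen == last_seen) still overlaps the period it falls in
    end = df['last_seen'].dt.hour * 60 + df['last_seen'].dt.minute + 1
    overlap = pd.DataFrame({
        label: (end.clip(upper=(h + 4) * 60) - start.clip(lower=h * 60)).clip(lower=0)
        for label, h in zip(LABELS, STARTS)
    })
    # Column with the largest overlap; visits entirely before 08:00 get NaN
    return overlap.idxmax(axis=1).where(overlap.max(axis=1) > 0)
```

Usage would be df['visit_period'] = visit_period_vectorized(df); on the five sample rows it yields the same labels as the period column.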

The result:

           Name          first_seen           last_seen     period
0  Random guy 1 2016-05-22 18:12:00 2016-05-22 18:15:00    evening
1  Random guy 2 2016-05-22 12:03:00 2016-05-22 12:03:00  afternoon
2  Random guy 3 2016-05-22 21:06:00 2016-05-22 21:06:00      night
3  Random guy 4 2016-05-22 16:20:00 2016-05-22 16:20:00    evening
4  Random guy 5 2016-05-22 14:46:00 2016-05-22 14:46:00  afternoon