Load a list with date values into a pandas DataFrame and plot activity over time

Cur*_*tLH 3 python time-series pandas

I have some Twitter data, and I want to plot activity over time, broken down by tweet type (tweet/mention/retweet).

The data is currently loaded into a list of tuples, each containing a date and a type:

time = [('2014-04-13', 'tweet'),
        ('2014-04-13', 'tweet'),
        ('2014-04-13', 'mention'),
        ('2014-04-13', 'retweet'),
        ('2014-04-13', 'mention'),
        ('2014-04-13', 'tweet'),
        ('2014-04-13', 'retweet'),
        ('2014-04-13', 'mention'),
        ('2014-04-13', 'tweet'),
        ('2014-04-13', 'retweet'),
        ('2014-04-13', 'retweet'),
        ('2014-04-13', 'mention'),
        ('2014-04-13', 'tweet'),
        ('2014-04-13', 'tweet'),
        ('2014-04-13', 'tweet'),
        ('2014-04-13', 'tweet'),
        ('2014-04-13', 'mention'),
        ('2014-04-13', 'retweet'),
        ('2014-04-13', 'mention'),
        ('2014-04-13', 'tweet')]

I have loaded the data into a pandas DataFrame:

import pandas as pd

time_df = pd.DataFrame(time, columns=['date','time'])

The data now looks like this:

         date     time
0  2014-04-13    tweet
1  2014-04-13    tweet
2  2014-04-13  mention
3  2014-04-13  retweet
4  2014-04-13  mention
...
...
...

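However, I'm now lost on how to plot this data. I'd also like to break each type (tweet/mention/retweet) out into its own differently colored line. I should also note that I may sometimes need to aggregate the data by day, week, or month.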

Ideally, I'd like my plot to look similar to the one linked below, except with Tweet, Mention, and Retweet as the series:

http://pandas.pydata.org/pandas-docs/stable/visualization.html

Pau*_*l H 5

So, I think I understand what you need to do, even though your question doesn't state it explicitly.

Allow me to mock up some data:

import numpy as np
import pandas
import random

tweet_types = ['tweet', 'retweet', 'mention']
# date_range replaces the removed DatetimeIndex(start=..., end=..., freq=...) constructor form
index = pandas.date_range(start='2014-04-13', end='2014-05-13', freq='5min')
tweets = [random.choice(tweet_types) for _ in range(len(index))]
time_df = pandas.DataFrame(index=index, data=tweets, columns=['tweet type'])
time_df['day'] = time_df.index.date
time_df['count'] = 1
print(time_df.head())

So the first few rows now look like this:

                     tweet type         day  count
2014-04-13 00:00:00     mention  2014-04-13      1
2014-04-13 00:05:00     mention  2014-04-13      1
2014-04-13 00:10:00       tweet  2014-04-13      1
2014-04-13 00:15:00       tweet  2014-04-13      1
2014-04-13 00:20:00     retweet  2014-04-13      1
Run Code Online (Sandbox Code Playgroud)

I added the count column because we need something to tally in our daily aggregation, which is done here:

daily_counts = time_df.groupby(by=['tweet type', 'day']).count()
daily_counts_xtab = daily_counts.unstack(level='tweet type')['count']
print(daily_counts_xtab.head())

Which gives us...

tweet type  mention  retweet  tweet
day                                
2014-04-13       89      101     98
2014-04-14       98      113     77
2014-04-15       87      103     98
2014-04-16       81      107    100
2014-04-17       96       92    100
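
As a side note, the helper count column is not strictly required: `groupby(...).size()` counts the rows in each group directly. A minimal sketch of that variant, assuming a small hand-made frame (the sample values and the names `sample_df` and `size_xtab` are made up for illustration and are not part of the data above):

```python
import pandas as pd

# Hypothetical sample data: one row per tweet, indexed by timestamp.
sample_df = pd.DataFrame(
    {"tweet type": ["tweet", "mention", "tweet", "retweet", "tweet"]},
    index=pd.to_datetime([
        "2014-04-13 00:00", "2014-04-13 00:05", "2014-04-13 00:10",
        "2014-04-14 00:00", "2014-04-14 00:05",
    ]),
)
sample_df["day"] = sample_df.index.date

# size() counts the rows in each (day, tweet type) group;
# unstack() pivots the tweet types into columns and fills missing pairs with 0.
size_xtab = (
    sample_df.groupby(["day", "tweet type"])
    .size()
    .unstack(level="tweet type", fill_value=0)
)
print(size_xtab)
```

The result has the same shape as the cross-tab above (days as rows, tweet types as columns), so it can be plotted the same way.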

Then

daily_counts_xtab.plot()

gives me:

[image: line plot of the daily counts, one line per tweet type]
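
The question also mentions weekly or monthly aggregation, and it starts from a list of (date, type) tuples rather than a timestamp index. Here is a sketch of one way to handle both, assuming the dates are ISO-formatted strings as in the question. The shortened `time` list, the column name `type`, and the names `counts`, `weekly`, and `monthly` are illustrative choices, not taken from the original code:

```python
import pandas as pd

# Shortened stand-in for the question's list of (date, type) tuples.
time = [
    ("2014-04-13", "tweet"),
    ("2014-04-13", "mention"),
    ("2014-04-14", "tweet"),
    ("2014-04-21", "retweet"),
    ("2014-05-02", "tweet"),
]

time_df = pd.DataFrame(time, columns=["date", "type"])
time_df["date"] = pd.to_datetime(time_df["date"])  # strings -> datetime64

# One row per day, one column per tweet type; the values are counts.
counts = pd.crosstab(time_df["date"], time_df["type"])

# With a DatetimeIndex in place, resample() re-aggregates to coarser periods.
weekly = counts.resample("W").sum()     # weeks ending on Sunday
monthly = counts.resample("MS").sum()   # "MS" labels each bin by month start
print(weekly)
print(monthly)
```

Calling `weekly.plot()` or `monthly.plot()` should then draw one line per tweet type, just as `daily_counts_xtab.plot()` does for the daily table.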