Density plot based on time of day

use*_*413 1 r ggplot2 lubridate density-plot

I have the following dataset:

https://app.box.com/s/au58xaw60r1hyeek5cua6q20byumgvmj

I want to create a density plot based on the time of day. This is what I have done so far:

library("ggplot2")
library("scales")
library("lubridate")

timestamp_df$timestamp_time <- format(ymd_hms(hn_tweets$timestamp), "%H:%M:%S")

ggplot(timestamp_df, aes(timestamp_time)) + 
       geom_density(aes(fill = ..count..)) +
       scale_x_datetime(breaks = date_breaks("2 hours"),labels=date_format("%H:%M"))

It gives the following error: Error: Invalid input: time_trans works with objects of class POSIXct only

If I convert it to POSIXct, it adds a date to the data.
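A common workaround, sketched here under assumptions and not taken from the original post, is to attach the same arbitrary date to every time, so that the values are POSIXct but differ only in clock time. The sample timestamps and the name `same_day` are hypothetical, and 1970-01-01 is only a placeholder date.

```r
# Hypothetical sample values; the real data would come from timestamp_df$timestamp.
timestamps <- c("2016-03-01 00:15:30", "2016-03-02 13:45:00", "2016-03-05 23:59:59")

# Keep only the clock time, as in the question.
time_of_day <- format(as.POSIXct(timestamps, tz = "UTC"), "%H:%M:%S")

# Re-attach one fixed, arbitrary date so every value is POSIXct on the same day.
same_day <- as.POSIXct(paste("1970-01-01", time_of_day),
                       format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

print(same_day)
```

A column built this way should be accepted by scale_x_datetime, and labels = date_format("%H:%M") keeps the placeholder date off the axis.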

Update 1

The following converts the data to "NA":

timestamp_df$timestamp_time <- as.POSIXct(timestamp_df$timestamp_time, format = "%H:%M%:%S", tz = "UTC")
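A likely cause, assuming the call was run essentially as shown, is the stray `%` before `:%S` in the format string. The sketch below compares that format with the corrected one on hypothetical sample values; the names `with_typo` and `fixed` are made up for illustration.

```r
# Hypothetical sample values standing in for timestamp_df$timestamp_time.
times <- c("00:15:30", "13:45:00", "23:59:59")

# Format string from the update above: note the extra "%" before ":%S".
with_typo <- as.POSIXct(times, format = "%H:%M%:%S", tz = "UTC")

# Same call with the stray "%" removed.
fixed <- as.POSIXct(times, format = "%H:%M:%S", tz = "UTC")

print(with_typo)
print(fixed)  # the strings carry no date, so R fills in the current one
```

With the corrected format the values parse, but R fills in the current date, which is the behaviour described in the question.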

Update 2

Here is what I am trying to achieve: [image]

rei*_*ner 5

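One problem with the solutions posted here is that they ignore the fact that this data is circular/polar (i.e. 00h == 24h). You can see in the plot from the other answer that the two ends of the curve do not match each other. That does not make much difference for this particular dataset, but for events that happen around midnight it can be a heavily biased density estimator. Here is my solution, which takes the cyclical nature of time data into account: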

# modified code from https://freakonometrics.hypotheses.org/2239

library(dplyr)
library(ggplot2)
library(lubridate)
library(circular)

df = read.csv("data.csv")
# Parse month/day/year hour:minute timestamps ("%H" is the 24-hour hour field)
datetimes = df$timestamp %>%
  lubridate::parse_date_time(orders = "%m/%d/%Y %H:%M")
times_in_decimal = lubridate::hour(datetimes) + lubridate::minute(datetimes) / 60
times_in_radians = 2 * pi * (times_in_decimal / 24)

# Doing this just for bandwidth estimation:
basic_dens = density(times_in_radians, from = 0, to = 2 * pi)

res = circular::density.circular(circular::circular(times_in_radians,
                                                    type = "angle",
                                                    units = "radians",
                                                    rotation = "clock"),
                                 kernel = "wrappednormal",
                                 bw = basic_dens$bw)

time_pdf = data.frame(time = as.numeric(24 * (2 * pi + res$x) / (2 * pi)), # Convert from radians back to 24h clock
                      likelihood = res$y)

p = ggplot(time_pdf) +
  geom_area(aes(x = time, y = likelihood), fill = "#619CFF") +
  scale_x_continuous("Hour of Day", labels = 0:24, breaks = 0:24) +
  scale_y_continuous("Likelihood of Data") +
  theme_classic()

print(p)  # display the plot

Figure: density plot that accounts for the circular nature of the data

Note that the value and the slope of the density curve match at the 00h and 24h points.
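The edge-matching property can also be checked in base R, without the circular package. The sketch below uses a different technique from the wrapped-normal kernel above: it copies the data one day earlier and one day later before calling density(). The simulated hours, the hand-picked bandwidth, and all variable names are assumptions made for illustration.

```r
set.seed(1)
# Simulated hours of day clustered around midnight (hypothetical data).
hours <- rnorm(500, mean = 23.5, sd = 2) %% 24

bw <- 0.75  # bandwidth in hours, chosen by hand for this illustration

# Naive estimate: treats 0h and 24h as unrelated ends of a line.
naive <- density(hours, bw = bw, from = 0, to = 24, n = 512)

# Wrapped estimate: replicate the sample one day earlier and one day later,
# estimate on the tripled sample, then rescale by 3 so the curve integrates
# to roughly 1 over [0, 24].
wrapped <- density(c(hours - 24, hours, hours + 24),
                   bw = bw, from = 0, to = 24, n = 512)
wrapped_y <- 3 * wrapped$y

# Compare the value at 0h with the value at 24h for each curve.
cat("naive:  ", naive$y[1], naive$y[512], "\n")
cat("wrapped:", wrapped_y[1], wrapped_y[512], "\n")
```

For data concentrated near midnight, the naive curve drops at both ends because it only sees points on one side of the boundary, while the wrapped curve should take nearly the same value at 0h and 24h.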