How to draw a histogram of this row-wise table in R ggplot?

Léo*_* 준영 8 csv statistics r ggplot2

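I am trying to plot the descriptive variables in the first row using the procedure below. I also tried quoting the column/row names, without success.

  1. Rotate the rows and columns of the CSV data to get the tall-table data structure needed, as in the thread "A very simple histogram with R? ggplot"
  2. Plot a histogram of the events for the variable Absolute XOR (Average, Min, Max)

    • If there is only an absolute value, just plot the absolute value in the histogram.
    • If there are (Average, Min and Max), plot them in the histogram with whiskers (= whisker plot), where the whisker limits are given by Min and Max.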

Data

  1. Original, data.csv

    "Vars"    , "Sleep", "Awake", "REM", "Deep"
    "Absolute",        ,       , 5     , 7
    "Average" , 7      , 12    ,       ,
    "Min"     , 4      , 5     ,       , 
    "Max"     , 10     , 15    ,       ,
    
  2. Data after reshaping, shown visually

                V1       V2       V3       V4
    Vars  Absolute Average  Min      Max     
    Sleep     <NA>        7        4       10
    Awake     <NA>       12        5       15
    REM          5     <NA>     <NA>     <NA>
    Deep         7     <NA>     <NA>     <NA>
    
  3. Data after reshaping, in R

     data <- structure(list(V1 = structure(c(3L, NA, NA, 1L, 2L), .Names = c("Vars", 
     "Sleep", "Awake", "REM", "Deep"), .Label = c(" 5", " 7", "Absolute"
     ), class = "factor"), V2 = structure(c(3L, 2L, 1L, NA, NA), .Names = c("Vars", 
     "Sleep", "Awake", "REM", "Deep"), .Label = c("12", " 7", "Average "
     ), class = "factor"), V3 = structure(c(3L, 1L, 2L, NA, NA), .Names = c("Vars", 
    "Sleep", "Awake", "REM", "Deep"), .Label = c(" 4", " 5", "Min     "
     ), class = "factor"), V4 = structure(c(3L, 1L, 2L, NA, NA), .Names = c("Vars", 
    "Sleep", "Awake", "REM", "Deep"), .Label = c("10", "15", "Max     "
     ), class = "factor")), .Names = c("V1", "V2", "V3", "V4"), row.names = c("Vars", 
    "Sleep", "Awake", "REM", "Deep"), class = "data.frame")
    

R code, with debugging code

dat.m <- read.csv("data.csv")

# rotate rows and columns
dat.m <- as.data.frame(t(dat.m)) # https://stackoverflow.com/a/7342329/54964 Comment 42-

library("reshape2")
dat.m <- melt(dat.m, id.vars="Vars")

## Just plot values existing there correspondingly    
library("ggplot2")
# https://stackoverflow.com/a/25584792/54964
# TODO following
#ggplot(dat.m, aes(x = "Vars", y = value,fill=variable)) 

Error

Error: id variables not found in data: Vars
Execution halted

R: 3.3.3, 3.4.0 (backports)
OS: Debian 8.7
R packages: reshape2, ggplot2, ...; sessionInfo() after loading the two packages:

Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_2.1.0  reshape2_1.4.2

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2 scales_0.4.1     magrittr_1.5     plyr_1.8.4      
 [5] tools_3.3.3      gtable_0.2.0     Rcpp_0.12.10     stringi_1.1.5   
 [9] grid_3.3.3       stringr_1.2.0    munsell_0.4.3    

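Testing HaberdashPI's proposal

The output is shown in Fig. 1, where absolute values wrongly appear for Sleep and Awake. If a value is NA, just set it to zero.

Fig. 1 Output of HaberdashPI's proposal, which is not as expected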


Data structure of dat.m before transposing

'data.frame':   4 obs. of  5 variables:
 $ Absolute: Factor w/ 2 levels " 5"," 7": NA NA 1 2
  ..- attr(*, "names")= chr  "Sleep" "Awake" "REM" "Deep"
 $ Average : Factor w/ 2 levels "12"," 7": 2 1 NA NA
  ..- attr(*, "names")= chr  "Sleep" "Awake" "REM" "Deep"
 $ Min     : Factor w/ 2 levels " 4"," 5": 1 2 NA NA
  ..- attr(*, "names")= chr  "Sleep" "Awake" "REM" "Deep"
 $ Max     : Factor w/ 2 levels "10","15": 1 2 NA NA
  ..- attr(*, "names")= chr  "Sleep" "Awake" "REM" "Deep"
 $ Vars    : chr  "Sleep" "Awake" "REM" "Deep"
      Absolute Average  Min      Max       Vars
Sleep     <NA>        7        4       10 Sleep
Awake     <NA>       12        5       15 Awake
REM          5     <NA>     <NA>     <NA>   REM
Deep         7     <NA>     <NA>     <NA>  Deep

Data structure of dat.m after transposing

'data.frame':   16 obs. of  3 variables:
 $ Vars    : chr  "Sleep" "Awake" "REM" "Deep" ...
 $ variable: Factor w/ 4 levels "Absolute","Average ",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ value   : chr  NA NA " 5" " 7" ...

    Vars variable value
1  Sleep Absolute  <NA>
2  Awake Absolute  <NA>
3    REM Absolute     5
4   Deep Absolute     7
5  Sleep Average      7
6  Awake Average     12
7    REM Average   <NA>
8   Deep Average   <NA>
9  Sleep Min          4
10 Awake Min          5
11   REM Min       <NA>
12  Deep Min       <NA>
13 Sleep Max         10
14 Awake Max         15
15   REM Max       <NA>
16  Deep Max       <NA>

Testing akash87's proposal

ds <- dat.m
str(ds)
ds
ds$variable
ds$variable %in% c("Min","Max")

The output is wrong, because everything ends up FALSE

 $ Vars    : chr  "Sleep" "Awake" "REM" "Deep" ...
 $ variable: Factor w/ 4 levels "Absolute","Average ",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ value   : chr  NA NA " 5" " 7" ...
    Vars variable value
1  Sleep Absolute  <NA>
2  Awake Absolute  <NA>
3    REM Absolute     5
4   Deep Absolute     7
5  Sleep Average      7
6  Awake Average     12
7    REM Average   <NA>
8   Deep Average   <NA>
9  Sleep Min          4
10 Awake Min          5
11   REM Min       <NA>
12  Deep Min       <NA>
13 Sleep Max         10
14 Awake Max         15
15   REM Max       <NA>
16  Deep Max       <NA>
[1] "hello 3"
 [1] Absolute Absolute Absolute Absolute Average  Average  Average  Average 
 [9] Min      Min      Min      Min      Max      Max      Max      Max     
Levels: Absolute Average  Min      Max     
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE

Consequently, ds[ds$variable %in% c("Min","Max"), ] also gives wrong output, because the all-FALSE result carries over.
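
One possible cause, judging from the str() output above: the levels of variable are padded with trailing spaces ("Average ", "Min     ", "Max     "), so none of them is exactly equal to "Min" or "Max". A minimal sketch of that effect and of trimming before comparing; the factor below is a hypothetical stand-in for ds$variable, and the exact amount of padding is an assumption:

```r
# Hypothetical stand-in for ds$variable, with padded levels as in the str() output
variable <- factor(c("Absolute", "Average ", "Min     ", "Max     "),
                   levels = c("Absolute", "Average ", "Min     ", "Max     "))

# Exact matching fails, because "Min     " is not the same string as "Min"
padded_match <- variable %in% c("Min", "Max")
print(padded_match)

# Strip leading and trailing whitespace before comparing
clean <- trimws(as.character(variable))
clean_match <- clean %in% c("Min", "Max")
print(clean_match)
```

With the trimmed values, ds[trimws(as.character(ds$variable)) %in% c("Min", "Max"), ] would select the Min and Max rows, assuming padding is indeed the only difference.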

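Testing Uwe's proposal

The code uses an explicit data.table::dcast and two calls to data.table::melt. sessionInfo() is printed just before the line molten <- .... Note that library(ggplot2) is not loaded yet, because the error comes from the line molten <- ....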

$ Rscript test111.r 
    Vars "Average" "Max" "Min" Absolute
1: Sleep         7    10     4       NA
2: Awake        12    15     5       NA
3:   REM        NA    NA    NA        5
4:  Deep        NA    NA    NA        7
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.12.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  base     

other attached packages:
[1] data.table_1.10.4

loaded via a namespace (and not attached):
[1] compiler_3.4.0 methods_3.4.0 
Error in melt.data.table(transposed, measure.vars = c("Absolute", "Average")) : 
  One or more values in 'measure.vars' is invalid.
Calls: <Anonymous> -> melt.data.table
Execution halted
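
The printed header above shows the columns as "Average", "Max" and "Min" with the quote characters included, which would explain why measure.vars = c("Absolute", "Average") is rejected. A minimal base R sketch of cleaning such names, assuming the quotes are literally part of the column names; the data frame here is a hypothetical stand-in for transposed:

```r
# Hypothetical stand-in for `transposed`; one name keeps its literal quote characters
transposed <- data.frame(Vars = c("Sleep", "Awake", "REM", "Deep"),
                         a = c(7, 12, NA, NA),
                         b = c(NA, NA, 5, 7))
names(transposed) <- c("Vars", "\"Average\"", "Absolute")

# The plain name is not found, because the stored name includes the quotes
found_before <- "Average" %in% names(transposed)

# Drop literal double quotes and surrounding whitespace from every name
names(transposed) <- trimws(gsub("\"", "", names(transposed), fixed = TRUE))

found_after <- "Average" %in% names(transposed)
print(c(before = found_before, after = found_after))
```

After such a cleanup, a melt call that refers to Absolute and Average by their plain names should at least find the columns.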

Testing Uwe's proposal with test code 2

molten <- structure(list(Vars = structure(c(1L, 2L, 1L, 2L, 1L, 2L), class = "factor", .Label = c("V1", "V2")), variable = structure(c(1L, 1L, 2L, 2L, 3L, 3L), class = "factor", .Label = c("ave", "ave_max", "lepo")), value = c(7L, 8L, 10L, 10L, 4L, 4L)), .Names = c("Vars", "variable", "value"), row.names = c(NA, -6L), class = c("data.table", "data.frame"))

print(molten)

library(ggplot2)
ggplot(molten, aes(x = Vars, y = value, fill = variable, ymin = lepo, ymax = ave_max)) + 
  geom_col() + geom_errorbar(width = 0.2)

Output

  Vars variable value
1   V1      ave     7
2   V2      ave     8
3   V1  ave_max    10
4   V2  ave_max    10
5   V1     lepo     4
6   V2     lepo     4
Error in FUN(X[[i]], ...) : object 'lepo' not found
Calls: <Anonymous> ... by_layer -> f -> <Anonymous> -> f -> lapply -> FUN -> FUN
Execution halted

use*_*478 4

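The problem with your code is that you use the quoted "Vars" in the ggplot aes function instead of plain Vars. Also, the headers of your dataset are mixed up: Absolute, Average, ... should be the column names of the dataset, not values in it. That is why the melt function fails.

Given your dataset, here is my attempt: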
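
To reach that layout from the original data.csv rather than retyping the values, something like the following base R sketch may work. It assumes the file looks exactly like the listing in the question (inlined here as text so the sketch is self-contained), reads every cell as raw text, and strips quotes and padding by hand; the names csv_text, raw, m, tm and dat are introduced only for illustration.

```r
# The CSV from the question, inlined so the sketch is self-contained
csv_text <- '"Vars"    , "Sleep", "Awake", "REM", "Deep"
"Absolute",        ,       , 5     , 7
"Average" , 7      , 12    ,       ,
"Min"     , 4      , 5     ,       ,
"Max"     , 10     , 15    ,       ,'

# Read every cell as plain text; quote = "" turns off quote handling entirely
raw <- read.csv(text = csv_text, header = FALSE, quote = "",
                colClasses = "character", stringsAsFactors = FALSE)

# Remove literal quote characters and padding from every cell
m <- as.matrix(raw)
m[] <- trimws(gsub("\"", "", m, fixed = TRUE))

# Transpose: the first row now holds the statistic names, the first column the states
tm <- t(m)
dat <- data.frame(Vars = tm[-1, 1], row.names = NULL, stringsAsFactors = FALSE)
for (j in 2:ncol(tm)) {
  dat[[tm[1, j]]] <- as.numeric(tm[-1, j])   # empty cells become NA
}

print(dat)
```

The resulting dat has Vars as a real column and Absolute, Average, Min and Max as numeric columns, which is the shape that melt(dat, id.vars = "Vars") expects.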

# Packages used below
library(reshape2)
library(ggplot2)

# Data: one row per state, one column per statistic
data = cbind.data.frame(c("Sleep", "Awake", "REM", "Deep"),
                        c(NA, NA, 5, 7),
                        c(7, 12, NA, NA),
                        c(4, 5, NA, NA),
                        c(10, 15, NA, NA))
colnames(data) = c("Vars", "Absolute", "Average", "Min", "Max")

# Reshape to long format: columns Vars, variable, value
dat.m <- melt(data, id.vars="Vars")
# Stacked plot
ggplot(dat.m, aes(x = Vars, y = value)) + geom_bar(aes(fill=variable), stat = "identity")

This produces:

Stacked bars

#Or multiple bars
ggplot(dat.m, aes(x = Vars, y = value)) + 
  geom_bar(aes(fill=variable), stat = "identity", position="dodge") 

Unstacked bars

#Or separated by Vars
ggplot(dat.m, aes(x = Vars, y = value)) + geom_bar(aes(fill=variable), stat = "identity", position="dodge") + facet_wrap( ~ Vars, scales="free")

Separated by Vars

I am adding another chart to the answer. This was done in collaboration with @Uwe's answer.

#data
data <- structure(list(Vars = structure(1:2, class = "factor", .Label = c("V1", "V2")), ave = c(7L, 8L), ave_max = c(10L, 10L), lepo = c(4L, 4L)), .Names = c("Vars", "ave", "ave_max", "lepo"), row.names = c(NA, -2L), class = c("data.table", "data.frame"), sorted = "Vars")
# Melt only ave, so that lepo and ave_max stay as columns for ymin and ymax
library(data.table)
library(ggplot2)
mo = data.table::melt(data, measure.vars = c("ave"))
ggplot(mo, aes(x = Vars, y = value, fill = variable, ymin = lepo, ymax = ave_max)) +
  geom_col() + geom_errorbar(width = 0.2)

This produces:

[image: bars of ave with error bars from lepo to ave_max]
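
The same idea can be sketched for the sleep data from the question: keep Min and Max as their own columns and map them to ymin and ymax. This assumes the goal is one bar per state (Absolute if present, otherwise Average), with whiskers only where Min and Max exist. The columns value and kind are names introduced here for illustration, and geom_col() needs ggplot2 2.2.0 or later.

```r
library(ggplot2)

# Wide data: one row per state, one column per statistic
dat <- data.frame(
  Vars     = c("Sleep", "Awake", "REM", "Deep"),
  Absolute = c(NA, NA, 5, 7),
  Average  = c(7, 12, NA, NA),
  Min      = c(4, 5, NA, NA),
  Max      = c(10, 15, NA, NA),
  stringsAsFactors = FALSE
)

# One bar height per row: Absolute where present, otherwise Average
dat$value <- ifelse(is.na(dat$Absolute), dat$Average, dat$Absolute)
dat$kind  <- ifelse(is.na(dat$Absolute), "Average", "Absolute")

# Whiskers only for rows that actually have Min and Max
with_limits <- dat[!is.na(dat$Min) & !is.na(dat$Max), ]

p <- ggplot(dat, aes(x = Vars, y = value, fill = kind)) +
  geom_col() +
  geom_errorbar(data = with_limits, aes(ymin = Min, ymax = Max), width = 0.2)
print(p)
```

Passing the subset with_limits to geom_errorbar() avoids mapping NA limits for REM and Deep, so those two bars are drawn without whiskers.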