When I compute a mean by group in data.table, I get different results depending on how I write it:
    library(data.table)

    qty  <- c(1:6)
    name <- c("a", "b", "a", "a", "c", "b")
    type <- c("i", "i", "i", "f", "f", "f")

    DT <- data.table(qty, name, type)

    # Group mean computed two ways; both columns should agree
    DT[, avg_mean  := mean(qty),    by = .(name, type)]
    DT[, avg_sum_N := sum(qty)/.N,  by = .(name, type)]

    > DT
         qty   name   type avg_mean avg_sum_N
       <int> <char> <char>    <num>     <num>
    1:     1      a      i        2         2
    2:     2      b      i        4         2
    3:     3      a      i        2         2
    4:     4      a      f        2         4
    5:     5      c      f        6         5
    6:     6      b      f        5         6

I expected `avg_mean` and `avg_sum_N` to be identical, with both showing the values that appear in `avg_sum_N`.

Why do they differ? Thanks.
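As a sanity check on the expectation above, here is a minimal base R sketch that computes the per-group mean without data.table. It assumes the same `qty`, `name`, and `type` vectors as in the question; the name `expected` is introduced here purely for illustration. It only confirms which column holds the correct group means and says nothing about why `avg_mean` differs.

```r
# Same input vectors as in the question
qty  <- 1:6
name <- c("a", "b", "a", "a", "c", "b")
type <- c("i", "i", "i", "f", "f", "f")

# ave() applies FUN within each (name, type) combination and returns
# a vector aligned with the original rows. as.numeric() keeps the
# result as doubles rather than relying on integer coercion.
expected <- ave(as.numeric(qty), name, type, FUN = mean)

print(expected)
# [1] 2 2 2 4 5 6
```

These values match the `avg_sum_N` column row for row, so `avg_sum_N` is the column showing the correct group means.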
My session information is below.
    > packageVersion('data.table')
    [1] ‘1.14.3’
    > sessionInfo()
    R version 4.1.0 (2021-05-18)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 10 x64 (build 19044)

    Matrix products: default

    locale:
    [1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252 LC_MONETARY=Portuguese_Brazil.1252
    [4] LC_NUMERIC=C LC_TIME=Portuguese_Brazil.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] zoo_1.8-10 lubridate_1.8.0 RPostgres_1.4.3 DBI_1.1.2 stringi_1.7.6 readxl_1.4.0
     [7] gsubfn_0.7 proto_1.0.0 stringr_1.4.0 magrittr_2.0.3 stringdist_0.9.8 fuzzyjoin_0.1.6
    [13] data.table_1.14.3

    loaded via a namespace (and not attached):
     [1] Rcpp_1.0.8.3 pillar_1.7.0 compiler_4.1.0 cellranger_1.1.0 tools_4.1.0 bit_4.0.4
     [7] lattice_0.20-44 lifecycle_1.0.1 tibble_3.1.6 pkgconfig_2.0.3 rlang_1.0.2 cli_3.2.0
    [13] rstudioapi_0.13 writexl_1.4.0 parallel_4.1.0 dplyr_1.0.8 hms_1.1.1 generics_0.1.2
    [19] vctrs_0.4.1 grid_4.1.0 bit64_4.0.5 tidyselect_1.1.2 glue_1.6.2 R6_2.5.1
    [25] fansi_1.0.3 tcltk_4.1.0 blob_1.2.3 purrr_0.3.4 ellipsis_0.3.2 assertthat_0.2.1
    [31] utf8_1.2.2 crayon_1.5.1