After reading the convincing book R for Data Science, I was excited about all the tidyverse functions, especially the data transformation and wrangling packages dplyr and tidyr. It seemed that coding with them saves a lot of time and gives better readability than base R. But the more I use dplyr, the more I run into situations where the opposite seems to be the case. In one of my recent questions I asked how to replace rows with NAs if one of the variables exceeds some threshold. In base R I would simply do
df[df$age > 90, ] <- NA
The two answers suggested using
df %>% select(x, y, age) %>% mutate_all(~ replace(.x, age > 90, NA))
# or
df %>% mutate_all(function(i) replace(i, .$age > 90, NA))
Both answers are great and I am thankful to get them. Still, the code in base R seems so much simpler to me. Now I am facing another situation where my code with dplyr is much more complicated, too. I am aware that it is a subjective impression whether some code is complicated, but putting it in a more objective way I would say that nchar(dplyr_code) > nchar(base_code) in many situations.
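For anyone who wants to try the comparison, here is a minimal sketch on a toy data frame. The column names x, y and age follow the code above, but the values are made up, and the dplyr part uses across() with a precomputed logical vector (too_old, a name introduced here for illustration) rather than the mutate_all() calls from the answers. It only runs the dplyr part if dplyr is installed.

```r
# Toy data: column names follow the question, the values are invented.
df <- data.frame(
  x   = c(1, 2, 3, 4),
  y   = c("a", "b", "c", "d"),
  age = c(25, 95, 40, 101)
)

# Base R: set every column to NA in rows where age exceeds 90.
base_result <- df
base_result[base_result$age > 90, ] <- NA

# dplyr variant (assumption: dplyr >= 1.0.0 is installed, for across()).
if (requireNamespace("dplyr", quietly = TRUE)) {
  # Compute the condition once, before any column is overwritten,
  # so the result does not depend on the position of `age`.
  too_old <- df$age > 90
  dplyr_result <- dplyr::mutate(
    df,
    dplyr::across(dplyr::everything(), ~ replace(.x, too_old, NA))
  )
  # Compare the two results.
  print(all.equal(base_result, dplyr_result))
}
```

Computing the condition up front is a design choice for this sketch: it avoids having to reorder columns with select() so that age is processed last.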
Further, I noticed that I run into this more often when the code I need to write operates on rows rather than on columns. It can be argued that one can use tidyr from the tidyverse to transpose the data and turn rows into columns. But even that seems much more complicated in the tidyverse than in base R (see here).
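My question is whether I face this problem because I am new to the tidyverse, or whether coding in base R really is more efficient in some situations. If it is the latter: is there a resource that summarizes, at an abstract level, when coding in base R versus the tidyverse is more efficient, or can you name some such situations? I ask because I sometimes spend quite a lot of time figuring out how to solve something with the tidyverse, only to find in the end that base R is the more convenient way to code it. Knowing when to use the tidyverse and when to use base R for data wrangling and transformation would save me a lot of time.
If this question is too broad, please let me know and I will try to rephrase or delete it.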
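If you have a clean, readable, and fully functional solution in base R that seems more appropriate, why would you opt for an additional layer? Perhaps to keep the same interface (the pipe) throughout a script for the sake of readability? But as you said, the tidyverse does not always guarantee that compared with base R.
The main difference is:
Base R is highly focused on stability, whereas the tidyverse does not guarantee it. From their own documentation: "the tidyverse will make breaking changes in the search for better interfaces" (https://tidyverse.tidyverse.org/articles/paper.html).
This makes base R a better partner for production environments in some cases, since you may find tidyverse functions being deprecated or changed over time. I myself prefer as few dependencies as possible in a package.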