What is the difference between the "data.frame", "tribble" and "tibble" functions?

Jor*_*rea 4 r dataframe tidyverse tibble

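What is the difference between the data.frame, tribble and tibble functions? Which one is easier, and which one is more useful for analyzing a large amount of data? I am creating a data frame and I do not know which one to choose.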

bra*_*zzi 6

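Data frames

A data frame is a table in which each column can hold values of a different type. It is used much like a spreadsheet or an SQL table. An example makes this clearer.

Example

Suppose you have data about people: their name, their age, and whether they are employed. We could store these data in vectors, for example: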

    names <- c('John', 'Sylvia', 'Arthemis')
    age <- c(32, 16, 21)
    employed <- c(TRUE, FALSE, TRUE)

A data frame lets us keep all the data about one person in a single row. To create it, we just pass the vectors as arguments to data.frame():

    > df <- data.frame(Name=names, Age=age, Working=employed)
    > df
          Name Age Working
    1     John  32    TRUE
    2   Sylvia  16   FALSE
    3 Arthemis  21    TRUE
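
To check that each column keeps its own type, you can inspect the data frame after creating it. This is a minimal sketch that assumes R 4.0 or later, where data.frame() leaves character vectors as character by default; on older versions the Name column would come out as a factor unless stringsAsFactors = FALSE is passed.

```r
# Rebuild the example data frame so the snippet is self-contained
names <- c("John", "Sylvia", "Arthemis")
age <- c(32, 16, 21)
employed <- c(TRUE, FALSE, TRUE)
df <- data.frame(Name = names, Age = age, Working = employed)

# One class per column: character, numeric, logical (assuming R >= 4.0)
col_types <- sapply(df, class)
print(col_types)

# Dimensions: 3 rows (people) by 3 columns (attributes)
print(dim(df))
```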

Notice how much clearer the data looks now. With a data frame, many operations become easier. For example, filtering:

    > df[df$Age>20,]
          Name Age Working
    1     John  32    TRUE
    3 Arthemis  21    TRUE

This is just one example of many. Filtering, aggregating, plotting, and so on all become much more straightforward with data frames.
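
As a rough illustration of the aggregating case, here is a sketch that uses only base R. The variable names over_20 and mean_age are made up for this example, and subset() is used as an alternative spelling of the bracket filter; it selects the same rows here because the data has no missing values.

```r
df <- data.frame(
  Name = c("John", "Sylvia", "Arthemis"),
  Age = c(32, 16, 21),
  Working = c(TRUE, FALSE, TRUE)
)

# Filtering with subset(); selects the same rows as df[df$Age > 20, ] here
over_20 <- subset(df, Age > 20)
print(over_20)

# Aggregating: mean age for each employment status
mean_age <- aggregate(Age ~ Working, data = df, FUN = mean)
print(mean_age)
```

The result of aggregate() is itself a data frame, with one row per value of Working.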

Tibbles

A tibble is just a newer kind of data frame. It comes from the tibble package, which is part of the very popular tidyverse set of packages, and it differs subtly from a base data frame in a few respects.
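
Because a tibble is still a data frame underneath, you can convert in both directions. The sketch below assumes the tibble package is installed; the object names tb and df_again are only illustrative.

```r
library(tibble)  # assumes the tibble package is installed

df <- data.frame(Name = c("John", "Sylvia"), Age = c(32, 16))

# Convert a base data frame to a tibble and back
tb <- as_tibble(df)
df_again <- as.data.frame(tb)

# A tibble keeps "data.frame" among its classes
print(class(tb))
print(is.data.frame(tb))
```

This is why most functions that accept a data frame also accept a tibble.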

Differences from data frames

One notable difference is that printing a tibble shows more information, such as its dimensions and the type of each column:

    > library(tibble)
    > t <- tibble(Name=names, Age=age, Working=employed)
    > t
    # A tibble: 3 × 3
      Name       Age Working
      <chr>    <dbl> <lgl>
    1 John        32 TRUE
    2 Sylvia      16 FALSE
    3 Arthemis    21 TRUE

More importantly, though, tibbles drop some confusing behaviours that data frames have.

For example, the $ operator on a data frame does partial matching, so you can get a column by giving only the beginning of its name:

    > df$N
    [1] "John"     "Sylvia"   "Arthemis"

It may look practical, but if you find this line in your source code, it can be hard to understand. It can also lead to bugs if multiple columns start with the same prefix.

If you do that to tibbles, it will return NULL and print a warning:

    > t$N
    NULL
    Warning message:
    Unknown or uninitialised column: `N`.
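
If you stay with base data frames, the prefix pitfall can still be avoided. The following is a small sketch using only base R; the people data frame and its Age_months column are made up for illustration.

```r
# Hypothetical data frame: there is no column named exactly "Age"
people <- data.frame(Name = c("John", "Sylvia"), Age_months = c(384, 192))

# `$` partially matches, so this silently returns the Age_months column
partial <- people$Age

# `[[` matches exactly by default, so the missing column yields NULL
exact <- people[["Age"]]

print(partial)
print(is.null(exact))
```

Base R also has options(warnPartialMatchDollar = TRUE), which makes $ emit a warning whenever it falls back to partial matching.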

This is just one example. More differences can be found on this page, although most of them are more relevant to older, more experienced coders.

The tribble() function

So far we have created tibble objects with the function tibble(). tribble() is just another way of creating tibble objects. The difference is that, while tibble() receives vectors very much like data.frame(), tribble() expects as arguments:

  • the names of the columns in the so-called "tilde syntax"; and then
  • the values of each row, one row after another

without having to create any vector.

How to use tribble()

An example shows what this means and why it is useful:

    > t2 <- tribble(
    +   ~Name,       ~Age, ~`Employment status`,
    +   "John",      32,   TRUE,
    +   "Sylvia",    16,   FALSE,
    +   "Arthemis",  21,   TRUE
    + )

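Note that you can see the table layout while you type in the data. This is very handy for examples in code! But make no mistake: the returned object is equivalent to the one you would create with tibble():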

    > t2
    # A tibble: 3 × 3
      Name       Age `Employment status`
      <chr>    <dbl> <lgl>
    1 John        32 TRUE
    2 Sylvia      16 FALSE
    3 Arthemis    21 TRUE
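
One way to convince yourself of the equivalence is to build the same data both ways and compare the results. This is only a sketch, assuming the tibble package is installed; by_row, by_col and same are illustrative names, and the comparison is done on plain data frames to keep it independent of any tibble-specific comparison method.

```r
library(tibble)  # assumes the tibble package is installed

# Row-by-row definition
by_row <- tribble(
  ~Name,      ~Age, ~`Employment status`,
  "John",     32,   TRUE,
  "Sylvia",   16,   FALSE,
  "Arthemis", 21,   TRUE
)

# Column-by-column definition of the same data
by_col <- tibble(
  Name = c("John", "Sylvia", "Arthemis"),
  Age = c(32, 16, 21),
  `Employment status` = c(TRUE, FALSE, TRUE)
)

# Compare contents after dropping down to plain data frames
same <- isTRUE(all.equal(as.data.frame(by_row), as.data.frame(by_col)))
print(same)
```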

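Which one to use?

You can use whichever you prefer! They all work well. Still, each may suit one situation better than another.

  • If you do not use the tidyverse, you will probably use traditional data frames.
  • If you do use the tidyverse, you will probably prefer tibbles, since they are a cornerstone of those packages. You may also prefer tibbles to avoid the confusing data frame behaviours.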

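Assuming you are going to create tibbles, which function should you use?

  • If you are reading the data from a file or from vectors, you will probably prefer tibble().
  • If you are adding hard-coded values to the tibble, then the tribble() function may be more practical.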
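
For the file case, one possible approach (an assumption, not something the answer prescribes) is to read the data with base R and convert the result with as_tibble() from the tibble package. In this sketch an inline string stands in for a file on disk, and csv_text and people are made-up names.

```r
library(tibble)  # assumes the tibble package is installed

# Stand-in for a file on disk: read.csv() can parse a string via `text`
csv_text <- "Name,Age,Working
John,32,TRUE
Sylvia,16,FALSE
Arthemis,21,TRUE"

# Read into a base data frame, then convert it to a tibble
people <- as_tibble(read.csv(text = csv_text))
print(people)
```

With a real file you would pass its path to read.csv() instead of text. If you use the readr package, its read_csv() function returns a tibble directly.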

Appendix: mixing tibble() and tribble() up

tibble() and tribble() return the same kind of object, but they have very different signatures. Yet their names are really similar, so people often confuse them. Pay attention to that!

If you call tibble() with tribble() arguments, you'll get an error similar to this:

    # ❌ WRONG!
    > tibble(
    + ~Name, ~Age, ~`Employment status`,
    + "John", 32, TRUE
    + )
    Error:
    ! All columns in a tibble must be vectors.
    ✖ Column `~Name` is a `formula` object.
    Run `rlang::last_error()` to see where the error occurred.

If you call tribble() with tibble() arguments, this is the error you will get:

    # ❌ WRONG!
    > t <- tribble(Name=names, Age=age, Working=employed)
    Error:
    ! Must specify at least one column using the `~name` syntax.
    Run `rlang::last_error()` to see where the error occurred.

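If you run into error messages like these, check that you are using the right signature in your call.

(I am posting this appendix so that people googling these errors can find this Q&A. I spent an hour trying to understand why I was getting this error. It is a surprisingly ungoogleable topic!)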