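A data frame is a table in which each column can hold values of a different type. It serves much the same purpose as a spreadsheet or an SQL table. An example should make this clearer.

Suppose you have data about people: their names, their ages, and whether they are employed. We could store this data in vectors, like this: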

    names <- c('John', 'Sylvia', 'Arthemis')
    age <- c(32, 16, 21)
    employed <- c(TRUE, FALSE, TRUE)

A data frame lets us put all the data related to one person in a single row. To create one, we just pass the vectors as arguments to data.frame():

    > df <- data.frame(Name=names, Age=age, Working=employed)
    > df
          Name Age Working
    1     John  32    TRUE
    2   Sylvia  16   FALSE
    3 Arthemis  21    TRUE

Note how much clearer the data is in this form. With data frames, many operations become easier. For example, filtering:

    > df[df$Age>20,]
          Name Age Working
    1     John  32    TRUE
    3 Arthemis  21    TRUE

This is just one example of many. Filtering, aggregating, plotting, and so on all become much more straightforward with data frames.
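
As a rough illustration of the other operations mentioned above, here is a minimal base-R sketch. It rebuilds the same data frame so that it runs on its own; the result names (avg_age, by_age, working_adults) are made up for this example.

```r
# Rebuild the data frame from the answer so the snippet is self-contained.
df <- data.frame(
  Name    = c("John", "Sylvia", "Arthemis"),
  Age     = c(32, 16, 21),
  Working = c(TRUE, FALSE, TRUE)
)

# Aggregating: mean age for each employment status.
avg_age <- aggregate(Age ~ Working, data = df, FUN = mean)

# Sorting: rows ordered from youngest to oldest.
by_age <- df[order(df$Age), ]

# Filtering on two conditions: employed people older than 20.
working_adults <- df[df$Working & df$Age > 20, ]
```

Each result is itself a data frame, so these steps can be chained or passed on to plotting functions.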

Tibbles are a newer kind of data frame. They are part of the very popular tidyverse set of packages and differ subtly from base data frames in a few ways.

One notable difference is that a printed tibble contains more information, such as its dimensions and the type of each column:

    > library(tibble)
    > t <- tibble(Name=names, Age=age, Working=employed)
    > t
    # A tibble: 3 × 3
      Name       Age Working
      <chr>    <dbl> <lgl>
    1 John        32 TRUE
    2 Sylvia      16 FALSE
    3 Arthemis    21 TRUE

More important, though, is that tibbles lack some confusing features that data frames have.

For example, you can get a column from a data frame by giving only the beginning of the column name:

    > df$N
    [1] "John"     "Sylvia"   "Arthemis"

It may look practical, but if you find this line in your source code, it can be hard to understand. It can also lead to bugs if multiple columns start with the same prefix.
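
To make the prefix problem concrete, here is a small base-R sketch. The Nickname column is hypothetical, added only so that two column names share the prefix "N".

```r
# A data frame where two columns start with "N".
# Nickname is a made-up column for illustration only.
people <- data.frame(
  Name     = c("John", "Sylvia", "Arthemis"),
  Nickname = c("Johnny", "Syl", "Art"),
  Age      = c(32, 16, 21)
)

# "Nam" matches only Name, so partial matching silently succeeds.
unique_prefix <- people$Nam

# "N" matches both Name and Nickname, so $ gives up and returns NULL.
ambiguous_prefix <- people$N

# [[ ]] matches names exactly by default, so a prefix is never expanded.
exact_lookup <- people[["N"]]

# Base R can warn whenever $ falls back to partial matching.
options(warnPartialMatchDollar = TRUE)
```

With that option set, a line such as people$Nam still returns the Name column but also emits a warning, which makes accidental prefix lookups easier to spot.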

If you do that with a tibble, it returns NULL and prints a warning:

    > t$N
    NULL
    Warning message:
    Unknown or uninitialised column: `N`.

This is just one example. More differences can be found on this page, although most of them are more relevant to older, more experienced coders.

The tribble() function

So far we have created tibble objects with the function tibble(). tribble() is just another way of creating tibble objects. The difference is that, while tibble() receives vectors very much like data.frame(), tribble() expects the column names first, each written with the `~name` syntax, followed by the values themselves, listed row by row, without having to create any vector.

An example makes it clear what this means and why it is useful:

    > t2 <- tribble(
    +   ~Name, ~Age, ~`Employment status`,
    +   "John", 32, TRUE,
    +   "Sylvia", 16, FALSE,
    +   "Arthemis", 21, TRUE
    + )

Note that you can see the tabular layout as you type in the data. That is very useful for examples in code! But make no mistake: the returned object is equivalent to one created with tibble():

    > t2
    # A tibble: 3 × 3
      Name       Age `Employment status`
      <chr>    <dbl> <lgl>
    1 John        32 TRUE
    2 Sylvia      16 FALSE
    3 Arthemis    21 TRUE

You can use whichever you like! They all work well. However, some may suit one situation better than another.
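
Suppose you want to create tibbles: which function should you use?

- If your data is already stored in vectors, use tibble().
- If you are typing the values directly into your code, the tribble() function may be more practical.

Appendix: mixing tibble() and tribble() up

tibble() and tribble() return the same kind of object, but they have very different signatures. Yet their names are really similar, so people often confuse them. Pay attention to that!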
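
The two signatures can be put side by side in a short sketch. It assumes the tibble package is installed; the variable names from_vectors and from_rows are made up for illustration.

```r
library(tibble)

# tibble(): one named vector per column, like data.frame().
from_vectors <- tibble(
  Name    = c("John", "Sylvia", "Arthemis"),
  Age     = c(32, 16, 21),
  Working = c(TRUE, FALSE, TRUE)
)

# tribble(): column names as ~name first, then the values row by row.
from_rows <- tribble(
  ~Name,      ~Age, ~Working,
  "John",       32, TRUE,
  "Sylvia",     16, FALSE,
  "Arthemis",   21, TRUE
)

# Compare the two objects; all.equal() returns TRUE when they match.
same_object <- isTRUE(all.equal(from_vectors, from_rows))
```

Only the way the data is written down differs. Named arguments belong to tibble(), and the `~name` header row belongs to tribble().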

If you call tibble() with tribble()-style arguments, you'll get an error similar to this:

    # ❌ WRONG!
    > tibble(
    +   ~Name, ~Age, ~`Employment status`,
    +   "John", 32, TRUE
    + )
    Error:
    ! All columns in a tibble must be vectors.
    ✖ Column `~Name` is a `formula` object.
    Run `rlang::last_error()` to see where the error occurred.

If you call tribble() with tibble()-style arguments, this is the error you will get:

    # ❌ WRONG!
    > t <- tribble(Name=names, Age=age, Working=employed)
    Error:
    ! Must specify at least one column using the `~name` syntax.
    Run `rlang::last_error()` to see where the error occurred.

If you run into error messages similar to these, check that you are using the right signature in your call.

(I am posting this appendix so that people can find this Q&A when they google these errors. I spent an hour trying to understand why I was getting this error. It is a surprisingly un-googleable topic!)