I recently built an R package for my online course. However, when I run the travis-ci build, it stops with the following error:
https://github.com/AnoushiravanR/fars
ERROR: configuration failed for package ‘gert’
* removing ‘/home/travis/R/Library/gert’
Error in i.p(...) :
(converted from warning) installation of package ‘gert’ had non-zero exit status
Calls: <Anonymous> ... with_rprofile_user -> with_envvar -> force -> force -> i.p
Execution halted
The command "Rscript -e 'deps <- remotes::dev_package_deps(dependencies = NA);remotes::install_deps(dependencies = TRUE);if (!all(deps$package %in% installed.packages())) { message("missing: ", paste(setdiff(deps$package, installed.packages()), collapse=", ")); q(status = 1, save = "no")}'" failed and exited with 1 during .
Your …

Following a problem I ran into today, I would like to know how to use the bind_rows function in a pipe while avoiding duplicates and NA values. Suppose I have the following simple tibble:
library(dplyr)
df <- tibble(
  col1 = c(3, 4, 5),
  col2 = c(5, 3, 1),
  col3 = c(6, 4, 9),
  col4 = c(9, 6, 5)
)
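I want to bind col1 & col2 row-wise with col3 & col4, so that I end up with a tibble of 2 columns and 6 observations, and finally rename the columns to colnew1 and colnew2. But when I use bind_rows I get the following output, which contains a lot of duplicates and NA values.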
df %>%
bind_rows(
select(., 1:2),
select(., 3:4)
)
# A tibble: 9 x 4
   col1  col2  col3  col4
  <dbl> <dbl> <dbl> <dbl>
1     3     5     6     9
2     4 …

I have a data frame like this:
df <- data.frame(x = 1:100, y = runif(100))
I split it into 5 parts:
z <- split(df, rep(1:5, length.out = nrow(df), each = ceiling(nrow(df)/5)))
Now I am trying to get descriptive statistics for each part of z, but I receive the error below. (What I am actually interested in is descriptive statistics for the df$y column within each of these 5 parts.)
psych::describe(z,na.rm = TRUE)
Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
is.atomic(x) is not TRUE
In addition: Warning message:
In mean.default(x, na.rm = na.rm) :
argument is not numeric or logical: returning NA
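A likely cause of the error above is that psych::describe() is being handed z, which is a list of data frames rather than a single data frame or numeric vector. The following is a minimal sketch of one workaround, assuming the psych package is installed: it applies describe() to the y column of each part with lapply(). The name stats_by_part and the set.seed() call are illustrative additions, not part of the original question.

```r
set.seed(1)  # only so the illustration is reproducible

df <- data.frame(x = 1:100, y = runif(100))
z <- split(df, rep(1:5, length.out = nrow(df), each = ceiling(nrow(df) / 5)))

# Describe the y column of each part separately; the result is a list
# with one describe table per part, named "1" to "5"
stats_by_part <- lapply(z, function(part) psych::describe(part$y, na.rm = TRUE))

stats_by_part[["1"]]  # descriptive statistics for the first 20 rows
```

Each element of stats_by_part is a one-row table of statistics for that part. psych::describeBy() may be worth checking as an alternative for grouped summaries.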
I am trying to get something like this (it may not look exactly like this for z[1]$y, but assume this is what I want to find):
vars n mean sd median trimmed mad min max range skew kurtosis …

I have a data frame with 34 columns and 12,964 rows, two of which are Gene.Name and Mutation_Frequency. For example:
| Gene.Name | Mutation_Frequency |
|---|---|
| CTLA4 | 0 |
| TP53 | 4 |
| CTLA4 | 2 |
| CTLA4 | 2 |
| TP53 | 4 |
| TP53 | 6 |
I now want to create a column named "Highest_Mutation_Frequency" that gives the highest mutation frequency for each Gene.Name and puts it in a new column, like this:
| Gene.Name | Mutation_Frequency | Highest_Mutation_Frequency |
|---|---|---|
| CTLA4 | 0 | 2 |
| TP53 | 4 | 6 |
| CTLA4 | 2 | 2 |
| CTLA4 | 2 | 2 |
| TP53 | 4 | 6 |
| TP53 | 6 | 6 |
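I realise I can probably use the max() function, but I am not sure how to implement it. As always, any help is appreciated!
Edit: although this is very similar to another question, "Select the row with the maximum value in each group", this question also involves producing unique rows and placing them in another data frame.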
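One way to get a per-gene maximum repeated on every row is a grouped mutate(). This is a minimal sketch assuming dplyr is available; the data frame name genes is made up for illustration, and only the two columns shown in the example are reproduced.

```r
library(dplyr)

# Example data copied from the first table in the question
genes <- data.frame(
  Gene.Name          = c("CTLA4", "TP53", "CTLA4", "CTLA4", "TP53", "TP53"),
  Mutation_Frequency = c(0, 4, 2, 2, 4, 6)
)

# max() is evaluated once per Gene.Name group and recycled to every row of it
result <- genes %>%
  group_by(Gene.Name) %>%
  mutate(Highest_Mutation_Frequency = max(Mutation_Frequency)) %>%
  ungroup()
```

Without dplyr, ave(genes$Mutation_Frequency, genes$Gene.Name, FUN = max) should produce the same column in base R.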
I want to calculate the cumulative sum within 5 days for each group.
library(lubridate)
df <- data.frame(
  date = ymd(c("2022-01-02","2022-01-03","2022-01-05","2022-01-07","2022-01-11","2022-01-14","2022-01-17","2022-01-18","2022-01-24","2022-01-27",
               "2022-01-01","2022-01-04","2022-01-04","2022-01-08","2022-01-12","2022-01-14","2022-01-19","2022-01-24","2022-01-25","2022-01-28")),
  group = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B"),
  number = c(10,30,20,50,30,50,40,50,30,50,55,10,30,20,50,30,40,30,40,30))
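One possible reading of "cumulative sum within 5 days" is: for each row, add up number over the rows of the same group that come at or before it and whose date is no more than 5 days earlier. The sketch below implements that reading with dplyr. It assumes the rows are already sorted by date within each group, uses as.Date() instead of ymd() to avoid an extra dependency, and the column name cumsum_5d is made up for illustration.

```r
library(dplyr)

df <- data.frame(
  date = as.Date(c("2022-01-02", "2022-01-03", "2022-01-05", "2022-01-07", "2022-01-11",
                   "2022-01-14", "2022-01-17", "2022-01-18", "2022-01-24", "2022-01-27",
                   "2022-01-01", "2022-01-04", "2022-01-04", "2022-01-08", "2022-01-12",
                   "2022-01-14", "2022-01-19", "2022-01-24", "2022-01-25", "2022-01-28")),
  group  = rep(c("A", "B"), each = 10),
  number = c(10, 30, 20, 50, 30, 50, 40, 50, 30, 50,
             55, 10, 30, 20, 50, 30, 40, 30, 40, 30)
)

result <- df %>%
  group_by(group) %>%
  mutate(cumsum_5d = vapply(seq_along(date), function(i) {
    # rows up to and including row i that fall within the previous 5 days
    in_window <- seq_along(date) <= i & date >= date[i] - 5
    sum(number[in_window])
  }, numeric(1))) %>%
  ungroup()
```

This loops over the rows of each group, which is fine for small data; for large tables a dedicated rolling-window package would likely be faster.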
Below is a small sample of my data frame, including what the cumulative sum column should return. Any help would be appreciated. Thanks.
date group number cumsum(s)
2022-01-02 A 10 10
2022-01-03 A 30 40
2022-01-05 A 20 60
2022-01-07 A 50 110
2022-01-11 A 30 80
2022-01-14 A 50 80
2022-01-17 A 40 90
2022-01-18 A 50 140
2022-01-24 A 30 30
2022-01-27 A 50 80
2022-01-01 B 55 55
2022-01-04 B 10 65
2022-01-04 B 30 95
2022-01-08 B 20 60
2022-01-12 B 50 70 …

I have this df:
library(dplyr)
df <- data.frame(colA = c("A", "B", "C"),
                 colB = c("Stringn", "Stringc", "Stringb"),
                 x2008 = c(2.71472, 1.62307, 1.62269),
                 x2009 = c(NA, 1.68250, 1.66570))
df %>%
  select(`x2008`, `x2009`) %>%
  colMeans(na.rm = TRUE)
This returns:
| x2008 | x2009 |
|---|---|
| 1.986827 | 1.674100 |
Expected output:
| colA | colB | x2008 | x2009 |
|---|---|---|---|
| A | Stringn | 2.71472 | NA |
| B | Stringc | 1.62307 | 1.6825 |
| C | Stringb | 1.62269 | 1.6657 |
| Average | result | 1.986827 | 1.674100 |
This is what I am doing:
df%>%
select(`x2008`,`x2009`)%>%
colMeans (na.rm = T)%>%
mutate(`ColA` = "Average", `ColB` = "result")
But it gives an error. Any idea how to fix this?
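The error most likely arises because colMeans() returns a named numeric vector, so mutate() no longer has a data frame to work on. A minimal sketch of one alternative, assuming dplyr 1.0 or later for across(): build the average as a one-row data frame with summarise(), label it, and append it with bind_rows(). The names avg_row and result are illustrative.

```r
library(dplyr)

df <- data.frame(colA = c("A", "B", "C"),
                 colB = c("Stringn", "Stringc", "Stringb"),
                 x2008 = c(2.71472, 1.62307, 1.62269),
                 x2009 = c(NA, 1.68250, 1.66570),
                 stringsAsFactors = FALSE)

# One-row data frame holding the column means, plus labels for the text columns
avg_row <- df %>%
  summarise(across(c(x2008, x2009), ~ mean(.x, na.rm = TRUE))) %>%
  mutate(colA = "Average", colB = "result")

# bind_rows() matches columns by name, so the column order of df is kept
result <- bind_rows(df, avg_row)
```

Note that the labels use the existing column names colA and colB; different capitalisation would create new columns instead of filling the existing ones.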
I have this data frame:
id a1 a2 b1 b2 c1 c2
<int> <int> <int> <int> <int> <int> <int>
1 1 83 33 55 33 85 86
2 2 37 0 60 98 51 0
3 3 97 71 85 8 44 40
4 4 51 6 43 15 55 57
5 5 28 53 62 73 70 9
df <- structure(list(id = 1:5, a1 = c(83L, 37L, 97L, 51L, 28L), a2 = c(33L,
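0L, 71L, 6L, 53L), b1 = c(55L, 60L, 85L, 43L, …

I have a data frame, games_h. This is just a snippet of the table; the full table has many teams and is sorted by date, team, and game number. I am trying to create a weighted rolling average grouped by team. I want the most recent game to be weighted twice as much as the one from two games ago. So the weighting would be (Game_1 * 1 + Game_2 * 2) / 3, or weights that sum to 1 in the same ratio, i.e. weights = c(1 - .667, .667).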
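The weighting described above can be sketched with a grouped lag(). This is only an illustration under stated assumptions: the data frame games and the column Score are hypothetical stand-ins (the value column of games_h is not visible in the snippet), and the window is assumed to cover the previous game and the current one.

```r
library(dplyr)

# Hypothetical stand-in data; `Score` is an assumed column name
games <- data.frame(
  Team     = c("A", "A", "A", "B", "B", "B"),
  GameDate = as.Date(c("2019-01-09", "2019-01-20", "2019-01-30",
                       "2019-01-09", "2019-01-20", "2019-01-30")),
  Score    = c(10, 20, 40, 30, 30, 60)
)

weights <- c(1, 2) / 3  # older game, most recent game

rolled <- games %>%
  arrange(Team, GameDate) %>%
  group_by(Team) %>%
  # first game of each team has no predecessor, so its average is NA
  mutate(weighted_avg = weights[1] * lag(Score) + weights[2] * Score) %>%
  ungroup()
```

If the average should only use games before the current one, replacing Score with lag(Score) and lag(Score) with lag(Score, 2) would shift the window back by one game.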
dput(games_h)
structure(list(GameId = c(16, 16, 37, 37, 57, 57), GameDate = structure(c(17905,
17905, 17916, 17916, 17926, 17926), class = "Date"), NeutralSite = c(0,
0, 0, 0, 0), AwayTeam = c("Virginia Cavaliers", "Virginia Cavaliers",
"Florida State Seminoles", "Florida State Seminoles", "Syracuse Orange",
"Syracuse Orange"), HomeTeam = c("Boston College Eagles", "Boston College Eagles",
"Boston College Eagles", "Boston College Eagles", "Boston College Eagles",
"Boston College Eagles"), Team = c("Virginia Cavaliers", …