Posts by Tho*_*ing

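In dplyr >= 1.0, why must a column name be an ensym when using dynamic column names in mutate?

It took me a while to figure out that a column name passed to a function must be converted with ensym when it is used inside mutate together with a glue-syntax column name. What is the rationale? Why do I have to use ensym? Why is it not enough to use {{}} together with {}?

This works: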

library(dplyr)  # mutate(), if_else(), tibble() and %>%

a <- 1:5
data_set <- tibble(a,x=a*2)

test_function <- function(data,var_x){
  var_x <- ensym(var_x)
  data %>% mutate("is_four_in_{var_x}":=if_else({{var_x}}==4,{{var_x}},NA_integer_)) %>% 
    return()
}


data_set %>% test_function(x)


But if the line var_x <- ensym(var_x) is removed, I get:

Error in eval(parse(text = text, keep.source = FALSE), envir) :
  object 'x' not found

syntax r dplyr

Pon*_*lis

2023 03-29

5
votes

1
answer

84
views

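Splitting a vector into n "similar" segments in R/Python

If I have a vector of m real numbers, how can I split it into n segments so that each segment contains "similar" values and all values in a segment occupy adjacent positions in the original vector?

"Similar" here could mean minimizing the variability of the numbers within each segment. For example, given the vector [4, 4.2, 4, 18, 1, 2, 0.98, 15, 17] and a request for 4 segments (a number picked arbitrarily for the example), I would end up with the segments {[4, 4.2, 4], [18], [1, 2, 0.98], [15, 17]}.

Note that similarity does not have to be defined as minimal variability; that is simply the definition that makes sense to me.

So my questions are:

Is there an algorithm that, given a vector of size m and a number n (where n ≤ m), finds the best n segments such that each segment contains "similar" numbers? The objective could be to minimize the sum of the variances of the segments.
Is there an algorithm that does the above without taking the number of segments as a parameter, and instead finds the optimal number of segments together with their positions? (As I see it, the optimal number of segments would trivially be m, since every segment would then have zero variability, so there has to be some cost function associated with adding a new segment.)

Ideally I would like an answer in R or Python, but I am mainly interested in the logic or the names of such algorithms.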
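The first question can be read as a contiguous partitioning problem, which dynamic programming solves exactly. The sketch below is not from the original post. It assumes that "similar" means minimizing the total within-segment sum of squared deviations from the segment mean (a stand-in for the sum of variances mentioned above, and not the same objective), and the function name `segment` is made up for illustration.

```python
def segment(values, k):
    """Split `values` into k contiguous segments minimizing total squared deviation."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums of values and squared values give any segment's cost in O(1).
    total = [0.0]
    total_sq = [0.0]
    for v in values:
        total.append(total[-1] + v)
        total_sq.append(total_sq[-1] + v * v)

    def cost(i, j):
        """Sum of squared deviations from the mean for values[i:j]."""
        seg_sum = total[j] - total[i]
        return (total_sq[j] - total_sq[i]) - seg_sum * seg_sum / (j - i)

    inf = float("inf")
    # best[parts][j]: lowest cost of splitting values[:j] into `parts` segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # start[parts][j]: index where the last of those segments begins.
    start = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0

    for parts in range(1, k + 1):
        for j in range(parts, n + 1):
            for i in range(parts - 1, j):
                candidate = best[parts - 1][i] + cost(i, j)
                if candidate < best[parts][j]:
                    best[parts][j] = candidate
                    start[parts][j] = i

    # Walk the stored start indices backwards to recover the segments.
    segments = []
    j = n
    for parts in range(k, 0, -1):
        i = start[parts][j]
        segments.append(values[i:j])
        j = i
    segments.reverse()
    return segments


print(segment([4, 4.2, 4, 18, 1, 2, 0.98, 15, 17], 4))
# [[4, 4.2, 4], [18], [1, 2, 0.98], [15, 17]]
```

The three nested loops make this O(n · m²) for m values and n segments. For the second question, the same recurrence could presumably be run with a fixed penalty added per segment, so that more segments are not automatically better; that variant is not shown here.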

python algorithm r cluster-analysis

osk*_*ska

2023 08-30

5
votes

1
answer

180
views

How do I check for a pattern that starts with 1 and ends with 2?

The sequence looks like this:

"4122222222222281222222211111212"

The result I want is:

"1222222222222"

"12222222"

"12"

"12"


As you can see, the pattern can contain any number of "2"s.

Is there a way to find such a pattern in R?
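The wanted matches are a single 1 followed by one or more 2s, which the regular expression `12+` describes. The sketch below is only an illustration: it is written in Python rather than R, and the helper name `find_runs` is made up. In base R the same pattern could be passed to `regmatches(x, gregexpr("12+", x))`.

```python
import re

# One "1" followed by one or more "2"s.
PATTERN = re.compile(r"12+")


def find_runs(text):
    """Return every non-overlapping substring made of a 1 followed by 2s."""
    return PATTERN.findall(text)


sequence = "4122222222222281222222211111212"
for run in find_runs(sequence):
    print(run)
```

Because `+` is greedy, each match extends over the whole run of 2s, and a 1 that is followed by another 1 is skipped.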

regex string r

dor*_*mon

2023 10-04

5
votes

1
answer

144
views

Extracting all subgraphs without duplicate vertex attributes in R

I have a graph object like this:

library(igraph)

# Create an empty graph
gss <- make_empty_graph(n = 12, directed = FALSE)

# Define vertex attributes
vertex_attr(gss) <- list(
  name = c("1", "2", "3", "4", "6", "7", "8", "10", "11", "17", "21", "23"),
  label = c("st_con_rt=main-room", "st_con_rt=sub-room", "st_con_tr=direct", "st_con_tr=terrace", "st_th=tsuma", "st_adsb=add", "st_adsb=sub", "tr_adsb=sub", "st_sub_main_th=hira", "roo_com=1a+7", "roo_com=2a+7", "roo_com=4a"),
  index = c(1, 2, 3, 4, 6, 7, 8, 10, 11, 17, 21, 23),
  element = c("st_con_rt", "st_con_rt", "st_con_tr", "st_con_tr", "st_th", "st_adsb", "st_adsb", "tr_adsb", "st_sub_main_th", "roo_com", "roo_com", "roo_com")
)

# Define …


algorithm r igraph

Hid*_*o.S

2023 12-23

5
votes

1
answer

152
views

Find the minimum date between two maximum dates based off unique values in a column

Data example.

date1 = seq(as.Date("2019/01/01"), by = "month", length.out = 29)
date2= seq(as.Date("2019/05/01"), by = "month", length.out = 29)

subproducts1=rep("1",29)
subproducts2=rep("2",29)

b1 <- c(rnorm(29,5))
b2 <- c(rnorm(29,5))

dfone <- data.frame("date"= c(date1,date2),
                "subproduct"= 
                  c(subproducts1,subproducts2),
                "actuals"= c(b1,b2))


Max Date for Subproduct 1 is May 2021 and max date for Subproduct 2 is Sept 2021.

Question: Is there a way to:

Find the max date for each unique subproduct, and
Find the minimum date from the two max dates all in one step? …
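The two steps can be combined by taking the latest date per subproduct and then the earliest of those. The sketch below only illustrates that logic: it is plain Python rather than dplyr, it rebuilds the example dates by hand, and the helper names `month_seq` and `min_of_group_max` are made up.

```python
from datetime import date


def month_seq(start, n):
    """Return n first-of-month dates, one month apart, beginning at `start`."""
    months = start.year * 12 + (start.month - 1)
    return [date((months + i) // 12, (months + i) % 12 + 1, 1) for i in range(n)]


def min_of_group_max(rows):
    """rows: iterable of (group, date). Return the earliest of the per-group latest dates."""
    latest = {}
    for group, day in rows:
        if group not in latest or day > latest[group]:
            latest[group] = day
    return min(latest.values())


# Same shape as the example data: 29 monthly dates per subproduct.
rows = [("1", d) for d in month_seq(date(2019, 1, 1), 29)]
rows += [("2", d) for d in month_seq(date(2019, 5, 1), 29)]

print(min_of_group_max(rows))  # 2021-05-01
```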

r data-manipulation dataframe dplyr

chr*_*456

2021 07-15

4
votes

3
answers

95
views

Replacing multiple values across columns in a loop

I want to recode multiple values in different columns.

For example:

df <- data.frame(wave = c(1,1,1,1,1,1,2,2,2,2,2,2),
                 party = rep(c("A", "A", "A", "B", "B", "B"), 2),
                 s_item = rep(c(3,4,5,1,2,6), 2), 
                 s_item2 = rep(c(1,2,3,4,5,6), 2),
                 s_item3 = rep(c(6,2,3,1,5,4), 2))

Data:

   wave party s_item s_item2 s_item3
1     1     A      3       1       6
2     1     A      4       2       2
3     1     A      5       3       3
4     1     B      1       4       1
5     1     B      2       5       5
6     1     B      6       6       4
7     2     A      3       1       6
8     2     A      4 …


replace for-loop r dataframe recode

Joo*_*xen

2021 08-02

4
votes

1
answer

47
views

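Changing edge thickness based on a column of a data frame containing the relations between actors

This code plots a graph from data frames of actors and relations.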

library(igraph)
actors <- data.frame(name=c("Alice", "Bob", "Cecil", "David",
                            "Esmeralda"))
relations <- data.frame(from=c("Bob", "Cecil", "Cecil", "David",
                               "David", "Esmeralda"),
                        to=c("Alice", "Bob", "Alice", "Alice", "Bob", "Alice"),
                        friendship=c(4,15,5,2,11,1))
g <- graph_from_data_frame(relations, directed=TRUE, vertices=actors)

plot(g)

The result is a plot of the graph (image not preserved).

I want to change the thickness (not the length) of the arcs according to the values of relations$friendship.

plot r igraph

Mar*_*ark

2021 11-05

4
votes

1
answer

969
views

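From a pairwise matrix, find the largest group of individuals whose pairwise values all equal a given value

I have a 39x39 pairwise relatedness matrix containing the relatedness values for all pairwise combinations of 39 individuals. I want to find the largest group of individuals that are completely unrelated, i.e. a group in which all pairwise relatedness values equal 0.

Is there a simple way to do this in R?

A simpler example: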

set.seed(420)

#Create the matrix
relatedness.matrix <- matrix(data = sample(x = c(0.5, 1, 0,0), size = 25, replace = TRUE), nrow = 5, ncol = 5)

# Matrix has the same upper and lower triangles
relatedness.matrix[upper.tri(relatedness.matrix)] <- relatedness.matrix[lower.tri(relatedness.matrix)]

# Add names for simplicity of reference
colnames(relatedness.matrix) <- letters[1:5]
rownames(relatedness.matrix) <- letters[1:5]

# Relatedness between the same individual does not count
diag(relatedness.matrix) <- NA


In this case there are three possible solutions: the 2x2 matrix containing only … and e, the one containing only … and …
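Finding the largest group in which every pair has relatedness 0 amounts to finding a maximum clique in the graph that links two individuals whenever their relatedness is exactly 0. The sketch below is not from the original post. It is in Python rather than R, it uses a small made-up symmetric matrix (not the one generated by the R code above), and the function name is hypothetical. It is a simple branch-and-bound search, which can be slow in the worst case.

```python
def largest_unrelated_group(names, rel):
    """Return the largest set of names whose pairwise values in `rel` are all 0.

    Assumes `rel` is a symmetric square matrix; diagonal entries are ignored.
    """
    n = len(names)
    # compatible[i]: individuals that may share a group with individual i.
    compatible = [
        {j for j in range(n) if j != i and rel[i][j] == 0} for i in range(n)
    ]
    best = []

    def expand(group, candidates):
        nonlocal best
        if len(group) > len(best):
            best = list(group)
        # Prune: even adding every candidate cannot beat the best group so far.
        if len(group) + len(candidates) <= len(best):
            return
        for v in sorted(candidates):
            candidates = candidates - {v}
            expand(group + [v], candidates & compatible[v])

    expand([], set(range(n)))
    return [names[i] for i in best]


# Made-up example: None on the diagonal plays the role of NA.
names = ["a", "b", "c", "d"]
rel = [
    [None, 0.5, 0, 0],
    [0.5, None, 0, 1],
    [0, 0, None, 0],
    [0, 1, 0, None],
]
print(largest_unrelated_group(names, rel))  # ['a', 'c', 'd']
```

When several groups tie for the largest size, this sketch returns only the first one it finds.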

algorithm r matrix igraph submatrix

Ale*_*ohn

2023 09-16

4
votes

1
answer

121
views

Maximum clique problem in a non-square matrix

I have many non-square matrices, like the following example:

1    1    0
1    1    0
1    1    0
1    0    1


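I would like a general solution for finding the largest densely connected region in these matrices. For my example, the solution would return rows=c(1, 2, 3), columns=c(1,2). That said, I can accept a non-optimal solution, i.e. a local optimum is fine.

I think this is similar to the max-clique problem. However, my matrices are not square and do not represent graphs, so I am having trouble using functions like igraph::cliques(). How can I find the densely connected regions of a non-square matrix?

To clarify, by "dense region" I mean any rectangular block of the matrix that contains only 1s and that can be obtained by reordering rows and columns. The order of rows and columns in the original matrix therefore does not matter, and I want to consider all permutations of that order. I am really looking for regions that are similar or equivalent to cliques in an adjacency matrix but, again, these matrices are not square.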
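An all-ones block that can be obtained by reordering rows and columns is a set of rows and a set of columns whose every intersection is 1 (a biclique, if rows and columns are viewed as the two sides of a bipartite graph). The brute-force sketch below is not from the original post and is written in Python rather than R. It tries every subset of columns and keeps the block with the largest area. Measuring size by area, the function name, and the 1-based output are all assumptions, and the search is exponential in the number of columns, so it only suits narrow matrices.

```python
from itertools import combinations


def largest_all_ones_block(matrix):
    """Return (rows, columns), 1-based, of the all-ones block with the largest area."""
    n_cols = len(matrix[0]) if matrix else 0
    best_rows, best_cols = [], []
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # Rows that contain a 1 in every chosen column.
            rows = [
                r for r, row in enumerate(matrix)
                if all(row[c] == 1 for c in cols)
            ]
            if len(rows) * len(cols) > len(best_rows) * len(best_cols):
                best_rows, best_cols = rows, list(cols)
    return [r + 1 for r in best_rows], [c + 1 for c in best_cols]


example = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
]
print(largest_all_ones_block(example))  # ([1, 2, 3], [1, 2])
```

Since a non-optimal answer is acceptable, a greedy variant that adds one column at a time would be a cheaper alternative; that is not shown here.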

r graph-theory matrix igraph

R G*_*cey

2023 09-26

4
votes

1
answer

179
views

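How do I sum values according to the categories of other variables in R?

I have a dataset showing the religion of Party A and Party B in country X, along with the percentage of adherents of each religion in each country.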

df <- data.frame(
  PartyA = c("Christian","Muslim","Muslim","Jewish","Sikh"),
  PartyB = c("Jewish","Muslim","Christian","Muslim","Buddhist"),
  ChristianPop = c(12,1,74,14,17),
  MuslimPop = c(71,93,5,86,13),
  JewishPop = c(9,2,12,0,4),
  SikhPop = c(0,0,1,0,10),
  BuddhistPop = c(1,0,2,0,45)
)
#      PartyA    PartyB ChristianPop MuslimPop JewishPop SikhPop BuddhistPop
# 1 Christian    Jewish           12        71         9       0           1
# 2    Muslim    Muslim            1        93         2       0           0
# 3    Muslim Christian           74         5        12       1           2
# 4    Jewish    Muslim           14        86         0       0           0
# 5      Sikh  Buddhist           17        13         4      10          45

With this, I want to add together the totals of the adherents of the religions that are "involved". So the first row would get a value equal to …

r dataframe

san*_*j00

2024 02-27

4
votes

1
answer

139
views