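This is basically an extension of this question, because I noticed that once you subset a second time, it is no longer possible to change the column's values.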
library(data.table)
random.length <- sample(x = 15:30, size = 1)
dt <- data.table(
  city = sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"), size = random.length, replace = TRUE),
  score = sample(x = 1:10, size = random.length, replace = TRUE)
)
set.seed(1)
dt[sample(.N,3), score :=9999]
set.seed(1)
dt[sample(.N,3),]
This works as expected and changes the score of three randomly selected rows to 9999. However, if you subset in a first step, then sample and try to assign a new score value, the change does not show up in dt.
set.seed(1)
dt[city == "New York",][sample(.N,1), score := 55555]
set.seed(1)
dt[city == "New York",][sample(.N,1)]
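What I want to achieve is to change the value of a column for a row that belongs to a certain subset and is chosen at random from that subset.

dt[city == "New York"] returns an entirely new object, which you can then update by reference. However, that update does not affect dt. That is,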
dt[expr, col := val] != dt[expr][, col := val]
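The first expression updates dt in the rows where expr evaluates to TRUE. The second updates the subset returned by dt[expr]. Unless you assign that result to a variable, you cannot get it back.

In addition to all the suggestions above, you can also sample the row indices (which can be computed with the which function):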
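The difference between the two forms can be checked on a small table. This is only a minimal sketch: the three-row table and the names sub and dt_untouched are made up for illustration and are not part of the original question.

```r
library(data.table)

# Small illustrative table (values are made up for this sketch).
dt <- data.table(city  = c("New York", "Tel Aviv", "New York"),
                 score = c(1L, 2L, 3L))

# Chained form: dt[city == "New York"] builds a new data.table,
# and := then updates that new object, not dt.
sub <- dt[city == "New York"][, score := 55555L]

# TRUE here: dt still holds its original scores.
dt_untouched <- identical(dt$score, c(1L, 2L, 3L))
print(dt_untouched)
print(sub)

# Single-call form: := updates dt itself in the rows matched by i.
dt[city == "New York", score := 55555L]
print(dt)
```

Note that sub holds only the New York rows, so assigning the chained result back to dt would drop every other city; that is why the single-call form is the one to use when dt itself should change.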
dt[sample(which(city == "New York"), 1), score:=555L]
dt
#           city score
#  1:   Tel Aviv     8
#  2:  Amsterdam     3
#  3:  Cape Town    10
#  4:   New York     1
#  5:  Cape Town    10
#  6: Pittsburgh     2
#  7: Pittsburgh     8
#  8:  Amsterdam    10
#  9:  Amsterdam     8
# 10:  Amsterdam     4
# 11:   Tel Aviv     7
# 12:  Amsterdam     2
# 13: Pittsburgh     1
# 14:  Amsterdam     3
# 15: Pittsburgh     2
# 16:   New York     7
# 17:   Tel Aviv    10
# 18:   New York    10
# 19:  Cape Town     1
# 20:  Amsterdam     7
# 21:  Amsterdam     3
# 22:   New York   555
# 23:  Cape Town     6
# 24:   New York     1
# 25:   Tel Aviv    10
#           city score
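One caveat with the index-sampling approach: when its first argument is a single number >= 1, sample() draws from 1:x rather than from that one value, so a city that matches only one row could end up updating the wrong row. A possible variant, sketched below under the assumption that at least one row may or may not match, samples a position within the index vector instead. The four-row table and the names idx and pick are hypothetical.

```r
library(data.table)

set.seed(1)
# Small illustrative table (values are made up for this sketch).
dt <- data.table(city  = c("New York", "Tel Aviv", "New York", "Amsterdam"),
                 score = 1:4)

# Row numbers of dt that satisfy the condition.
idx <- which(dt$city == "New York")

if (length(idx) > 0) {
  # Sample a position in idx, then map it back to a row number of dt.
  # This stays correct even when idx has length one.
  pick <- idx[sample.int(length(idx), 1)]

  # i is a plain row number here, so := updates dt itself by reference.
  dt[pick, score := 555L]
}
print(dt)
```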