use*_*533 22 random r sampling
I have a data frame in the following format:
head(subset)
# ants 0 1 1 0 1
# age 1 2 2 1 3
# lc 1 1 0 1 0
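I need to create a new data frame with random samples drawn according to age and lc. For example, I want 30 samples with age 1 and lc 1, 30 samples with age 1 and lc 0, and so on.
I did look at the random sampling approach: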
newdata <- function(subset, age, 30)
but that isn't what I need.
A5C*_*2T1 43
I would suggest using either `stratified` from my "splitstackshape" package or `sample_n` from the "dplyr" package:
## Sample data
library(data.table)
set.seed(1)
n <- 1e4
d <- data.table(age = sample(1:5, n, TRUE),
                lc = rbinom(n, 1, .5),
                ants = rbinom(n, 1, .7))
# table(d$age, d$lc)
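For `stratified`, you basically specify the dataset, the stratifying columns, and either an integer giving the size you want from each group or a decimal giving the fraction to return (for example, `.1` for 10% of each group).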
library(splitstackshape)
set.seed(1)
out <- stratified(d, c("age", "lc"), 30)
head(out)
# age lc ants
# 1: 1 0 1
# 2: 1 0 0
# 3: 1 0 1
# 4: 1 0 1
# 5: 1 0 0
# 6: 1 0 1
table(out$age, out$lc)
#
# 0 1
# 1 30 30
# 2 30 30
# 3 30 30
# 4 30 30
# 5 30 30
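To make the integer-versus-fraction behaviour of the size argument concrete, here is a minimal base-R sketch. The helper `sample_strata` is hypothetical and is not how splitstackshape implements `stratified`; it simply assumes that a `size` below 1 means "this fraction of each group, rounded to whole rows" and anything else means "this many rows per group".

```r
# Hypothetical helper (not part of splitstackshape): stratified sampling in base R.
# size >= 1: that many rows per group; size < 1: that fraction of each group, rounded.
sample_strata <- function(df, by, size) {
  # Row numbers of df, split by every observed combination of the grouping columns
  groups <- split(seq_len(nrow(df)), df[by], drop = TRUE)
  picked <- lapply(groups, function(idx) {
    k <- if (size < 1) round(size * length(idx)) else size
    idx[sample.int(length(idx), k)]   # k distinct row numbers from this group
  })
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

set.seed(1)
n <- 1e4
d <- data.frame(age = sample(1:5, n, TRUE),
                lc = rbinom(n, 1, .5),
                ants = rbinom(n, 1, .7))

out_n <- sample_strata(d, c("age", "lc"), 30)  # 30 rows per age/lc group
out_p <- sample_strata(d, c("age", "lc"), .1)  # about 10% of each age/lc group
table(out_n$age, out_n$lc)
```

The rounding rule for fractional sizes is an assumption made for this sketch; check the `stratified` documentation if you need to match its exact group sizes.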
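For `sample_n`, first create a grouped table (with `group_by`), then specify the number of observations you want. If you want to sample a proportion of each group instead, use `sample_frac`.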
library(dplyr)
set.seed(1)
out2 <- d %>%
group_by(age, lc) %>%
sample_n(30)
# table(out2$age, out2$lc)
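In dplyr 1.0.0 and later, `sample_n()` and `sample_frac()` are superseded by `slice_sample()`, which takes either `n` (rows per group) or `prop` (fraction per group). A sketch of the same grouped sampling, assuming a recent dplyr is installed:

```r
library(dplyr)

set.seed(1)
n <- 1e4
d <- data.frame(age = sample(1:5, n, TRUE),
                lc = rbinom(n, 1, .5),
                ants = rbinom(n, 1, .7))

# 30 rows from every age/lc combination
out3 <- d %>%
  group_by(age, lc) %>%
  slice_sample(n = 30) %>%
  ungroup()

# Roughly 10% of the rows in every age/lc combination
out4 <- d %>%
  group_by(age, lc) %>%
  slice_sample(prop = 0.1) %>%
  ungroup()

table(out3$age, out3$lc)
```

One practical difference: `sample_n()` stops with an error when a group has fewer rows than requested, whereas `slice_sample()` silently returns the whole group in that case.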
Tho*_*mas 16
Here's some data:
set.seed(1)
n <- 1e4
d <- data.frame(age = sample(1:5,n,TRUE),
lc = rbinom(n,1,.5),
ants = rbinom(n,1,.7))
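You need a split-apply-combine strategy: `split` the data.frame (`d` in this example) into one subsample per stratum, sample rows/observations within each subsample, and then combine them back together with `rbind`. Here's how it works: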
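# Split d into one data frame per age/lc combination
sp <- split(d, list(d$age, d$lc))
# Draw 30 rows without replacement from each subsample
samples <- lapply(sp, function(x) x[sample(seq_len(nrow(x)), 30, replace = FALSE), ])
# Stack the per-stratum samples back into a single data frame
out <- do.call(rbind, samples)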
Result:
> str(out)
'data.frame': 300 obs. of 3 variables:
$ age : int 1 1 1 1 1 1 1 1 1 1 ...
$ lc : int 0 0 0 0 0 0 0 0 0 0 ...
$ ants: int 1 1 0 1 1 1 1 1 1 1 ...
> head(out)
age lc ants
1.0.2242 1 0 1
1.0.4417 1 0 1
1.0.389 1 0 0
1.0.4578 1 0 1
1.0.8170 1 0 1
1.0.5606 1 0 1
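One caveat with the split-apply-combine code: `sample()` stops with an error when a stratum has fewer than 30 rows and `replace = FALSE`. A minimal variation, assuming you would rather keep every row of such a small stratum than fail, caps each draw at the stratum size. The tiny extra stratum below is made up purely to demonstrate the cap.

```r
set.seed(1)
n <- 1e4
d <- data.frame(age = sample(1:5, n, TRUE),
                lc = rbinom(n, 1, .5),
                ants = rbinom(n, 1, .7))

# Illustrative only: add a small stratum (age 6, lc 1) with just 5 rows
small <- data.frame(age = 6L, lc = 1L, ants = rbinom(5, 1, .7))
d2 <- rbind(d, small)

# drop = TRUE skips age/lc combinations that have no rows at all
sp <- split(d2, list(d2$age, d2$lc), drop = TRUE)
samples <- lapply(sp, function(x) {
  k <- min(30, nrow(x))                 # never ask for more rows than exist
  x[sample(seq_len(nrow(x)), k), ]
})
out <- do.call(rbind, samples)
table(out$age, out$lc)
```

Strata with at least 30 rows still contribute exactly 30 rows; the small stratum contributes all 5 of its rows.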
djh*_*rio 15
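See the `strata` function in the `sampling` package. It selects a stratified simple random sample and returns the sample as its result. Besides an `ID_unit` column identifying the selected rows, two extra columns are added: the inclusion probabilities (`Prob`) and the stratum indicator (`Stratum`). Here's an example: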
library(data.table)
library(sampling)
set.seed(1)
n <- 1e4
d <- data.table(age = sample(1:5, n, TRUE),
                lc = rbinom(n, 1, .5),
                ants = rbinom(n, 1, .7))
# Sort by the stratification variables (strata() expects sorted input)
setkey(d, age, lc)
# Population size by strata
d[, .N, keyby = list(age, lc)]
# age lc N
# 1: 1 0 1010
# 2: 1 1 1002
# 3: 2 0 993
# 4: 2 1 1026
# 5: 3 0 1021
# 6: 3 1 982
# 7: 4 0 958
# 8: 4 1 940
# 9: 5 0 1012
# 10: 5 1 1056
# Select sample
set.seed(2)
s <- data.table(strata(d, c("age", "lc"), rep(30, 10), "srswor"))
# Sample size by strata
s[, .N, keyby = list(age, lc)]
# age lc N
# 1: 1 0 30
# 2: 1 1 30
# 3: 2 0 30
# 4: 2 1 30
# 5: 3 0 30
# 6: 3 1 30
# 7: 4 0 30
# 8: 4 1 30
# 9: 5 0 30
# 10: 5 1 30
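Because `"srswor"` draws a simple random sample without replacement within each stratum, the `Prob` value for every selected unit in stratum $h$ should be that stratum's sample size $n_h$ divided by its population size $N_h$. Using the population counts printed above for strata (age 1, lc 0) and (age 5, lc 1):

```latex
\pi_{hk} = \frac{n_h}{N_h}, \qquad
\pi_{(1,0)} = \frac{30}{1010} \approx 0.0297, \qquad
\pi_{(5,1)} = \frac{30}{1056} \approx 0.0284
```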
Here's a one-liner using data.table:
library(data.table)
set.seed(1)
n <- 1e4
d <- data.table(age = sample(1:5, n, TRUE),
                lc = rbinom(n, 1, .5),
                ants = rbinom(n, 1, .7))
# Within each age/lc group, keep 30 randomly chosen rows of .SD
out <- d[, .SD[sample(1:.N, 30)], by = .(age, lc)]
# Check
out[, table(age, lc)]
## lc
## age 0 1
## 1 30 30
## 2 30 30
## 3 30 30
## 4 30 30
## 5 30 30
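The one-liner works by sampling row positions (`1:.N`) inside each group. The same idea carries over to base R when data.table isn't available: sample row numbers per stratum, then subset `d` once. Subsetting with the sorted row numbers also keeps the selected rows in their original order. A minimal sketch:

```r
set.seed(1)
n <- 1e4
d <- data.frame(age = sample(1:5, n, TRUE),
                lc = rbinom(n, 1, .5),
                ants = rbinom(n, 1, .7))

# Row numbers of d, grouped by every observed age/lc combination
rows_by_stratum <- split(seq_len(nrow(d)), list(d$age, d$lc), drop = TRUE)

# Pick 30 row numbers per stratum. Indexing with sample.int() avoids the
# sample(x) surprise where a single number n is treated as 1:n.
picked <- unlist(lapply(rows_by_stratum,
                        function(i) i[sample.int(length(i), 30)]),
                 use.names = FALSE)

out <- d[sort(picked), ]
table(out$age, out$lc)
```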