I have two data frames, x and y.
x<-data.frame(id=c(1,2,3,4,5), g=c(21,52,43,94,35))
y<-data.frame(id=c(3,4,7), u=c(55, 77, 99))
I want to subset x so that it includes only the observations whose id also appears in y.
What is the best way to do this?
Thanks!
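One common base R approach, sketched here as an illustration rather than the definitive answer, is to filter x with the %in% operator. The name x_sub is an assumption introduced for this example.

```r
x <- data.frame(id = c(1, 2, 3, 4, 5), g = c(21, 52, 43, 94, 35))
y <- data.frame(id = c(3, 4, 7), u = c(55, 77, 99))

# Keep only the rows of x whose id also occurs in y$id.
# x_sub is a hypothetical name used for illustration.
x_sub <- x[x$id %in% y$id, ]
x_sub
```

Unlike merge(x, y, by = "id"), this keeps only the columns of x. With dplyr loaded, semi_join(x, y, by = "id") should give the same rows.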
My data looks like this:
DF<- data.frame( id=c("A1","A2","A3","A1"), submission=c(1,1,1,2))
What is the best way to keep only the last submission for each ID? That is:
DF<- data.frame( id=c("A2","A3","A1"), submission=c(1,1,2))
Thanks!
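A minimal base R sketch, assuming that a larger submission number always means a later submission: sort by submission, then drop every row that is followed by another row with the same id. The names DF_sorted and DF_last are hypothetical.

```r
DF <- data.frame(id = c("A1", "A2", "A3", "A1"), submission = c(1, 1, 1, 2))

# Sort so that, within each id, the latest submission comes last
# (assumption: larger submission number = later submission).
DF_sorted <- DF[order(DF$submission), ]

# duplicated(..., fromLast = TRUE) flags every row except the
# last occurrence of each id, so negating it keeps the last one.
DF_last <- DF_sorted[!duplicated(DF_sorted$id, fromLast = TRUE), ]
DF_last
```

If the rows are already in submission order, the sorting step can be skipped.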
Suppose I have a data frame that looks like this:
df1=structure(list(Name = structure(1:6, .Label = c("N1", "N2", "N3",
"N4", "N5", "N6", "N7"), class = "factor"), sector = structure(c(4L,
4L, 4L, 3L, 3L, 2L), .Label = c("other stuff", "Private for-profit, 4-year or above",
"Private not-for-profit, 4-year or above", "Public, 4-year or above"
), class = "factor"), flagship = c(1, 0, 0, 0, 0, 0)), .Names = c("Name",
"sector", "flagship"), row.names = c(NA, 6L), class = "data.frame")
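I want to create a new factor variable, "Sector". I can do it the long way with many lines of code, but I am sure there is a more efficient approach.
This is what I am doing right now: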
df1$PublicFlag=0
df1$PublicFlag[df1$sector=="Public, 4-year or above" & df1$flagship==1]=1
df1$Public=0
df1$Public[df1$sector=="Public, 4-year or …
In R, I can use weighted.mean to compute a weighted arithmetic mean. For example:
wt <- c(5, 5, 4, 1)/15
x <- c(3.7,3.3,3.5,2.8)
xm <- weighted.mean(x, wt)
xm
[1] 3.453333
I can compute this "by hand": wt[1]*x[1]+wt[2]*x[2]+wt[3]*x[3]+wt[4]*x[4]
I want to write a loop that does the same thing. I wrote this code:
xm <-0
for(j in length(wt)){
xm <- xm + wt[j]*x[j]
}
xm
[1] 0.1866667
What am I doing wrong?
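The printed value 0.1866667 equals wt[4]*x[4], which suggests the loop body ran only once: length(wt) is the single number 4, so j takes only the value 4. A sketch of one possible fix is to iterate over seq_along(wt) instead:

```r
wt <- c(5, 5, 4, 1) / 15
x <- c(3.7, 3.3, 3.5, 2.8)

xm <- 0
# seq_along(wt) yields 1, 2, 3, 4; length(wt) alone is just the number 4
for (j in seq_along(wt)) {
  xm <- xm + wt[j] * x[j]
}
xm

# Vectorised equivalent; it matches weighted.mean here because
# the weights in wt already sum to 1
sum(wt * x)
```

weighted.mean divides by sum(wt), so for weights that do not sum to 1 the loop result would also need to be divided by sum(wt).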
I have a data frame that looks like this:
df1 <- data.frame(V1=rnorm(n = 100, mean=0, sd=1),
Edu=sample(x = c(-999,12,13,14,16,1), size = 100,
replace = T, prob = c(0.05,0.2,.2,0.2,0.2,0.15)))
I want to convert the variable Edu into an ordered factor. I can convert it to a character variable with the following code:
lutedu <- c('-999' = NA, '12' = "High School", '13' = "Associate's",
'14' = "Associate's", '16' = "Bachelor's",
'18' = "Master's, Graduate/professional", '21' = "PhD")
df1$Edu <- lutedu[as.character(df1$Edu)]
From there I can convert the character variable to an ordered factor with ordered():
df1$Edu <-
ordered(
x = df1$Edu, levels = c(
"High School", "Associate's", "Bachelor's",
"Master's, Graduate/professional", "PhD"
)
)
Is there a better way to do this?
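One possible shortcut, sketched under the assumption of R 3.4.0 or later (where factor() accepts repeated labels and merges them into one level), is to pass the numeric codes as levels and the names as labels in a single call. The vectors edu and edu_f are hypothetical stand-ins for df1$Edu.

```r
# Small deterministic stand-in for df1$Edu (values taken from the question)
edu <- c(-999, 12, 13, 14, 16, 1)

# Codes that are not listed in levels (-999 and 1 here) become NA.
# Repeating a label maps several codes to the same level (R >= 3.4.0).
edu_f <- factor(
  edu,
  levels = c(12, 13, 14, 16, 18, 21),
  labels = c("High School", "Associate's", "Associate's", "Bachelor's",
             "Master's, Graduate/professional", "PhD"),
  ordered = TRUE
)
edu_f
```

The order of the levels follows the order in which the labels first appear, so the lookup vector and the separate ordered() call are no longer needed.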
Suppose I have a data frame like the following:
df <- data.frame(v1 = sample(1:10, 100, replace = T), v2 = sample(LETTERS, 100, replace = T),
V3 = sample(letters, 100, replace = T), v4 = sample(1:15, 100, replace = T))
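I want to create a new data frame df2 that contains only the columns with more than 10 distinct values. In this example that would be v2, V3 and v4. How can I do this? In practice, my data frame has thousands of columns.
I tried this: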
df2 <- df %>% select(which(length(unique(.))>10))
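A likely reason the attempt selects nothing: unique() applied to the whole data frame returns its unique rows, and length() of a data frame is its number of columns (4 here), so the condition is a single FALSE. A base R sketch that tests each column separately follows; the seed and the name n_distinct_vals are assumptions made for this illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
df <- data.frame(v1 = sample(1:10, 100, replace = TRUE),
                 v2 = sample(LETTERS, 100, replace = TRUE),
                 V3 = sample(letters, 100, replace = TRUE),
                 v4 = sample(1:15, 100, replace = TRUE))

# Count distinct values column by column (hypothetical helper vector)
n_distinct_vals <- vapply(df, function(col) length(unique(col)), integer(1))

# Keep the columns with more than 10 distinct values;
# drop = FALSE keeps a data frame even if only one column qualifies
df2 <- df[, n_distinct_vals > 10, drop = FALSE]
names(df2)
```

With dplyr 1.0.0 or later, df %>% select(where(~ n_distinct(.x) > 10)) should express the same per-column test.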
I wrote the following simple example with Rcpp and OpenMP, which works fine when I source the cpp file from RStudio:
#include <Rcpp.h>
#include <omp.h>
// [[Rcpp::plugins(openmp)]]
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix my_matrix(int I, int J, int nthreads) {
NumericMatrix A(I,J);
int i,j,tid;
omp_set_num_threads(nthreads);
#pragma omp parallel for private(i, j, tid)
for(int i = 0; i < I; i++) {
for(int j = 0; j < J; j++) {
tid = omp_get_thread_num();
A(i,j) = tid ;
}
}
return A;
}
/*** R
set.seed(42)
my_matrix(10,10,5)
*/
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
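[1,] 0 0 …
Suppose I have two data frames, one for 2015 and one for 2016. I want to run a regression on each data frame and plot one of the coefficients from each regression together with its confidence interval. For example: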
set.seed(1020022316)
library(dplyr)
library(stargazer)
df16 <- data.frame(
x1 = rnorm(1000, 0, 2),
t = sample(c(0, 1), 1000, T),
e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.5 * x1 + 2 * t + e) %>%
select(-e)
df15 <- data.frame(
x1 = rnorm(1000, 0, 2),
t = sample(c(0, 1), 1000, T),
e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.75 * x1 + 2.5 * t + e) %>%
select(-e)
lm16 <- lm(y ~ x1 + t, data …
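The rest of the question is cut off, so the following is only a base R sketch of the stated goal under assumptions: it simulates its own data in the same shape, fits one model per year, and plots the coefficient on t with its 95% confidence interval. The helper make_year, the objects fits and est, and the seed are hypothetical.

```r
set.seed(1)  # arbitrary seed for this illustration

# Hypothetical helper that simulates one year's data in the shape used above
make_year <- function(b_x1, b_t, n = 1000) {
  x1 <- rnorm(n, 0, 2)
  t <- sample(c(0, 1), n, replace = TRUE)
  data.frame(x1 = x1, t = t, y = b_x1 * x1 + b_t * t + rnorm(n, 0, 10))
}

fits <- list("2015" = lm(y ~ x1 + t, data = make_year(0.75, 2.5)),
             "2016" = lm(y ~ x1 + t, data = make_year(0.5, 2)))

# Collect the estimate and 95% confidence interval of the coefficient on t
est <- do.call(rbind, lapply(names(fits), function(yr) {
  ci <- confint(fits[[yr]], "t", level = 0.95)
  data.frame(year = yr, estimate = unname(coef(fits[[yr]])["t"]),
             lower = ci[1, 1], upper = ci[1, 2])
}))

# Point estimates with one vertical interval bar per year
pos <- seq_len(nrow(est))
plot(pos, est$estimate, pch = 19, xaxt = "n",
     xlim = c(0.5, nrow(est) + 0.5), ylim = range(est$lower, est$upper),
     xlab = "Year", ylab = "Coefficient on t")
axis(1, at = pos, labels = est$year)
segments(pos, est$lower, pos, est$upper)
```

Collecting the estimates in one data frame first keeps the plotting step independent of how many years are fitted.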