Posts by use*_*689

Grouping consecutive ranges

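I have a data table with many rows, and I want to conditionally group on two columns, Begin and End. These columns give the months during which the person in question was doing something. Here is some sample data (read it in with R, or, if you do not use R, find the plain table below):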

# base:
test <- read.table(
text = "
1   A   mnb USA prim    4   12
2   A   mnb USA x   13  15
3   A   mnb USA un  16  25
4   A   mnb USA fdfds   1   2
5   B   ghf CAN sdg 3   27
6   B   ghf CAN hgh 28  29
7   B   ghf CAN y   24  31
8   B   ghf CAN ghf 38  42
",header=F)
library(data.table)
setDT(test)
# Assign descriptive column names by reference
setnames(test, c("row","Person","Name","Country","add info","Begin","End"))
out <- read.table(
text = "
1   A …
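The desired output is cut off above, so the exact grouping rule is not visible here. As a minimal sketch, assuming the goal is to merge, per Person, rows whose Begin/End month ranges overlap or follow each other directly (the next Begin is at most one month after the running maximum End), something like the following could work. The names `grp` and `merged` are illustrative, and only the Person, Begin and End columns of the sample are rebuilt.

```r
library(data.table)

# Person, Begin and End columns of the sample data above
test <- data.table(
  Person = rep(c("A", "B"), each = 4),
  Begin  = c(4, 13, 16, 1, 3, 28, 24, 38),
  End    = c(12, 15, 25, 2, 27, 29, 31, 42)
)

# Sort so that ranges are visited in order of their start month
setorder(test, Person, Begin)

# Assumed rule: start a new group when Begin lies more than one month
# after the largest End seen so far for that person
test[, grp := {
  prev_max <- c(-Inf, head(cummax(End), -1L))
  cumsum(Begin > prev_max + 1)
}, by = Person]

# Collapse each group into a single range
merged <- test[, .(Begin = min(Begin), End = max(End)), by = .(Person, grp)]
print(merged)
```

Under this assumed rule the sample collapses to four ranges: A 1-2, A 4-25, B 3-31 and B 38-42. If only directly adjacent ranges should be merged (and not overlapping ones), the comparison inside `cumsum()` would need to change.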

sql r range plyr data.table

Score: 7 | Answers: 1 | Views: 375

Generating a LaTeX table with significance stars (***) using xtable

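I am currently using xtable to generate LaTeX tables from R. It works fine, but in one of the tables some of the numbers carry significance stars. For example, a data frame X like this: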

1 2 3 4 5 Test1 Test2 Test3    
a  "1.34" "0.43" "-0.26" "0.13" "0.05" "3.35^{.}"     "343^{***}" "3244^{***}"
b "2.02" "2.17" "-3.19" "4.43" "1.43" "390.1^{***}"  "31.23^{***}"  "24^{***}"
c    "23.07" "32.1"  "24.3"   "3.89" "0.4"  "429.38^{***}" "17.04^{***}"  "2424^{***}" 
d    "21.48" "14.45" "14.19"  "22.04" "0.15" "385.17^{***}" "2424^{***}"  "2424^{***}"

I put '^' before the stars because they look better in LaTeX in that format. The alternative would be:

a  "1.34" "0.43" "-0.26" "0.13" "0.05" "3.35."     "343***" "3244***"
b "2.02" "2.17" "-3.19" "4.43" "1.43" "390.1***"  "31.23***"  "24***"
# etc.

If I call xtable like this:

  print(xtable(X, label="X"),
  size="normalsize", 
  include.rownames=FALSE, 
  include.colnames=TRUE, 
  caption.placement="top",
  hline.after=NULL
  )

I get output like the following:

 \begin{table}[ht]
 \centering
{\normalsize …
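The output above is cut off, so the exact problem is not visible here. One common cause of trouble with cells like these is that print.xtable escapes special characters such as ^, { and } by default. A minimal sketch, assuming that is the issue: pass an identity function as `sanitize.text.function` so the cell text reaches LaTeX unchanged. The `$...$` wrappers are an addition not present in the original data; superscripts need math mode to compile. Only two of the test columns are rebuilt for brevity.

```r
library(xtable)

# Two of the test columns, with cells wrapped in math mode (assumption)
X <- data.frame(
  Test1 = c("$3.35^{.}$", "$390.1^{***}$"),
  Test2 = c("$343^{***}$", "$31.23^{***}$"),
  stringsAsFactors = FALSE
)

# Capture the LaTeX so it can be inspected; the identity function
# switches off xtable's default escaping of ^, { and }
tex <- capture.output(
  print(xtable(X, label = "X"),
        include.rownames = FALSE,
        caption.placement = "top",
        sanitize.text.function = function(x) x)
)
cat(tex, sep = "\n")
```

With sanitizing switched off, any other special characters in the cells (for example % or &) would have to be escaped by hand.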

latex r xtable output

Score: 6 | Answers: 1 | Views: 1944

Correlation matrix in a data.table

If I have the following data table:

library(data.table)
set.seed(1)
TDT <- data.table(Group = c(rep("A",40),rep("B",60)),
                      Id = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)),
                      Time = rep(seq(as.Date("2010-01-03"), length=20, by="1 month") - 1,5),
                      norm = round(runif(100)/10,2),
                      x1 = sample(100,100),
                      x2 = round(rnorm(100,0.75,0.3),2),
                      x3 = round(rnorm(100,0.75,0.3),2),
                      x4 = round(rnorm(100,0.75,0.3),2),
                      x5 = round(rnorm(100,0.75,0.3),2))

How can I compute the correlations among x1, x2, x3, x4 and x5 by Time?

This:

TDT[,x:= list(cor(TDT[,5:9])), by = Time]

does not work.

How can this be done the data.table way?
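One likely reason the attempt fails is that `TDT[,5:9]` inside j refers to the whole table rather than to the rows of the current Time group. A minimal sketch of one possible approach, assuming one correlation matrix per Time value is wanted: restrict `.SD` to the x columns with `.SDcols` and store each matrix in a list column. The names `cols`, `res` and `cors` are illustrative, and only the columns needed for the example are rebuilt.

```r
library(data.table)

set.seed(1)
TDT <- data.table(
  Id   = rep(1:5, each = 20),
  Time = rep(seq(as.Date("2010-01-03"), length = 20, by = "1 month") - 1, 5),
  x1   = sample(100, 100),
  x2   = round(rnorm(100, 0.75, 0.3), 2),
  x3   = round(rnorm(100, 0.75, 0.3), 2),
  x4   = round(rnorm(100, 0.75, 0.3), 2),
  x5   = round(rnorm(100, 0.75, 0.3), 2)
)

cols <- paste0("x", 1:5)

# .SD holds only the current group's rows of the columns in .SDcols;
# wrapping the matrix in list() stores one 5 x 5 matrix per Time value
res <- TDT[, .(cors = list(cor(.SD))), by = Time, .SDcols = cols]

# Correlation matrix for the first Time value
res$cors[[1]]
```

Each Time value has only five observations here (one per Id), so the individual correlations are very noisy.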

r correlation data.table

Score: 5 | Answers: 1 | Views: 1537

Shortening a nested ifelse

Given the following data table, suppose we want to compare x1 with x2 through x5. This can be done as follows:

set.seed(1)
library(data.table)
TDT <- data.table(x1 = round(rnorm(100,0.75,0.3),2),
                  x2 = round(rnorm(100,0.75,0.3),2),
                  x3 = round(rnorm(100,0.75,0.3),2),
                  x4 = round(rnorm(100,0.75,0.3),2),
                  x5 = round(rnorm(100,0.75,0.3),2))

TDT[,compare := ifelse(x1 < x2,1,ifelse(x1 < x3,2,ifelse(x1 < x4,3,ifelse(x1 < x5,4,5))))]

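So if x1 < x2, then compare == 1, and so on.

In my real case I have many more columns to compare x1 against. Is there a more concise way to write this, i.e. without nested ifelse?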
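The nested ifelse returns the position of the first column among x2 to x5 that exceeds x1, or 5 if none does. A minimal sketch of one way to write that without nesting: compare all columns at once as a matrix and take the first TRUE in each row. The names `cols`, `less` and `compare2` are illustrative.

```r
library(data.table)

set.seed(1)
TDT <- data.table(x1 = round(rnorm(100, 0.75, 0.3), 2),
                  x2 = round(rnorm(100, 0.75, 0.3), 2),
                  x3 = round(rnorm(100, 0.75, 0.3), 2),
                  x4 = round(rnorm(100, 0.75, 0.3), 2),
                  x5 = round(rnorm(100, 0.75, 0.3), 2))

# Original nested version, kept for comparison
TDT[, compare := ifelse(x1 < x2, 1, ifelse(x1 < x3, 2,
                 ifelse(x1 < x4, 3, ifelse(x1 < x5, 4, 5))))]

# Columns to compare x1 against; extend this vector for more columns
cols <- paste0("x", 2:5)

# Logical matrix: TRUE where x1 is smaller than the column's value
# (x1 is recycled down each column, so rows line up)
less <- as.matrix(TDT[, cols, with = FALSE]) > TDT$x1

# Append an all-TRUE column as the fallback, then take the position
# of the first TRUE in each row
TDT[, compare2 := as.integer(apply(cbind(less, TRUE), 1, which.max))]

# Both versions agree on this data
stopifnot(all(TDT$compare == TDT$compare2))
```

The fallback value is always `length(cols) + 1`, which matches the 5 in the original only because four columns are compared. Rows containing NA would need separate handling.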

if-statement r data.table

Score: 3 | Answers: 2 | Views: 250

Converting to balanced panel data

I have an unbalanced panel, as in the following example:

test <- read.table(
text = "
A   2010-01-01  1   rdm
A   2010-01-10  2   dfg
A   2010-01-14  3   fdgfd
A   2010-02-15  4   fdgfd
A   2010-08-17  5   dg
A   2010-12-19  6   dfg
B   2009-01-01  1   dfg
B   2010-01-01  2   ydg
B   2010-01-10  3   fdgfd
B   2010-01-14  4   dfg
B   2010-02-15  5   dfg
",header=F)
library(data.table)
setDT(test)
# Assign descriptive column names by reference
setnames(test, c("ID", "date", "nr", "namecol"))

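I want to balance the panel with respect to dates, i.e. every individual (A, B, etc.) should have an NA row for each date on which it has no data. I do not know the minimum date per group or across groups. The same holds for the maximum, although it may be faster to choose a maximum equal to a specific date (rather than computing it across groups). The desired output is: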

out <- read.table(
text = "
A   2009-01-01  NA  NA
A   2010-01-01  1   rdm
A   2010-01-10  2   dfg
A …
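The desired output is cut off above, but its first row (A on 2009-01-01 with NA values) suggests that every ID should get a row for every date observed for any ID. A minimal sketch under that assumption: build the full ID-by-date grid with `CJ()` and join the data onto it. The names `grid` and `balanced` are illustrative, and the dates are kept as ISO strings, which sort chronologically.

```r
library(data.table)

# Sample data from above
test <- data.table(
  ID = c(rep("A", 6), rep("B", 5)),
  date = c("2010-01-01", "2010-01-10", "2010-01-14", "2010-02-15",
           "2010-08-17", "2010-12-19",
           "2009-01-01", "2010-01-01", "2010-01-10", "2010-01-14",
           "2010-02-15"),
  nr = c(1:6, 1:5),
  namecol = c("rdm", "dfg", "fdgfd", "fdgfd", "dg", "dfg",
              "dfg", "ydg", "fdgfd", "dfg", "dfg")
)

# All combinations of observed IDs and observed dates, sorted
grid <- CJ(ID = unique(test$ID), date = unique(test$date))

# Join the data onto the grid; combinations without data get NA
balanced <- test[grid, on = c("ID", "date")]
print(balanced)
```

This uses only dates that actually occur in the data. To pad up to a fixed end date instead, the `date` argument of `CJ()` could be replaced by an explicit sequence of dates.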

r panel reshape2 data.table

Score: 2 | Answers: 1 | Views: 293

Tag statistics

r ×5

data.table ×4

correlation ×1

if-statement ×1

latex ×1

output ×1

panel ×1

plyr ×1

range ×1

reshape2 ×1

sql ×1

xtable ×1