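I have a data table with many rows, and I want to conditionally group on two columns, Begin and End. These columns give the months during which the person in question was doing something. Here is some sample data (you can read it in with R, or, if you do not use R, find the plain table below):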
# base:
test <- read.table(
text = "
1 A mnb USA prim 4 12
2 A mnb USA x 13 15
3 A mnb USA un 16 25
4 A mnb USA fdfds 1 2
5 B ghf CAN sdg 3 27
6 B ghf CAN hgh 28 29
7 B ghf CAN y 24 31
8 B ghf CAN ghf 38 42
", header = FALSE)
library(data.table)
setDT(test)
names(test) <- c("row","Person","Name","Country","add info","Begin","End")
out <- read.table(
text = "
1 A …
I am currently using xtable to generate LaTeX tables from R. It works fine, but in one of the tables I have significance stars attached to some of the numbers. Something like this data frame X:
1 2 3 4 5 Test1 Test2 Test3
a "1.34" "0.43" "-0.26" "0.13" "0.05" "3.35^{.}" "343^{***}" "3244^{***}"
b "2.02" "2.17" "-3.19" "4.43" "1.43" "390.1^{***}" "31.23^{***}" "24^{***}"
c "23.07" "32.1" "24.3" "3.89" "0.4" "429.38^{***}" "17.04^{***}" "2424^{***}"
d "21.48" "14.45" "14.19" "22.04" "0.15" "385.17^{***}" "2424^{***}" "2424^{***}"
I put '^' before the stars because in LaTeX the stars look better in that format. The alternative would be:
a "1.34" "0.43" "-0.26" "0.13" "0.05" "3.35." "343***" "3244***"
b "2.02" "2.17" "-3.19" "4.43" "1.43" "390.1***" "31.23^***" "24***"
# etc.
If I use xtable via:
print(xtable(X, label="X"),
size="normalsize",
include.rownames=FALSE,
include.colnames=TRUE,
caption.placement="top",
hline.after=NULL
)
I get output like this:
\begin{table}[ht]
\centering
{\normalsize …
If I have the following data table:
library(data.table)
set.seed(1)
TDT <- data.table(Group = c(rep("A",40),rep("B",60)),
Id = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)),
Time = rep(seq(as.Date("2010-01-03"), length=20, by="1 month") - 1,5),
norm = round(runif(100)/10,2),
x1 = sample(100,100),
x2 = round(rnorm(100,0.75,0.3),2),
x3 = round(rnorm(100,0.75,0.3),2),
x4 = round(rnorm(100,0.75,0.3),2),
x5 = round(rnorm(100,0.75,0.3),2))
How can I compute the correlations among x1, x2, x3, x4 and x5 by Time?
This:
TDT[,x:= list(cor(TDT[,5:9])), by = Time]
does not work.
How can this be done with data.table?
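One possible approach, offered as a sketch rather than a definitive answer: the attempt above likely fails because `TDT[, 5:9]` inside `j` refers to the whole table instead of the current `Time` group, and `:=` cannot place a 5 x 5 matrix into a group's rows. Using `.SD` with `.SDcols` and wrapping each matrix in `list()` gives one correlation matrix per `Time`. The names `cols`, `cors` and `cor_mat` are made up for this illustration.

```r
library(data.table)

set.seed(1)
TDT <- data.table(Group = c(rep("A", 40), rep("B", 60)),
                  Id    = rep(1:5, each = 20),
                  Time  = rep(seq(as.Date("2010-01-03"), length = 20, by = "1 month") - 1, 5),
                  norm  = round(runif(100) / 10, 2),
                  x1    = sample(100, 100),
                  x2    = round(rnorm(100, 0.75, 0.3), 2),
                  x3    = round(rnorm(100, 0.75, 0.3), 2),
                  x4    = round(rnorm(100, 0.75, 0.3), 2),
                  x5    = round(rnorm(100, 0.75, 0.3), 2))

# columns to correlate (illustrative helper name)
cols <- paste0("x", 1:5)

# .SD holds only the rows of the current Time group, restricted to `cols`;
# list() stores each 5 x 5 correlation matrix as one cell of a list column
cors <- TDT[, .(cor_mat = list(cor(.SD))), by = Time, .SDcols = cols]

# correlation matrix for the first Time value
cors$cor_mat[[1]]
```

Each `Time` value has only five rows here (one per `Id`), so every matrix is estimated from five observations.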
Given the following data table, if we want to compare x1 with x2 through x5 in turn, the following can be used:
set.seed(1)
library(data.table)
TDT <- data.table(x1 = round(rnorm(100,0.75,0.3),2),
x2 = round(rnorm(100,0.75,0.3),2),
x3 = round(rnorm(100,0.75,0.3),2),
x4 = round(rnorm(100,0.75,0.3),2),
x5 = round(rnorm(100,0.75,0.3),2))
TDT[,compare := ifelse(x1 < x2,1,ifelse(x1 < x3,2,ifelse(x1 < x4,3,ifelse(x1 < x5,4,5))))]
So if x1 < x2, then compare == 1, and so on.
In my real case I have many more columns to compare x1 against. Is there a more concise way to write this, i.e. without nested ifelse?
I have an unbalanced panel, as in the following example:
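A minimal sketch of one way to avoid the nesting, assuming the goal is "position of the first column that x1 is smaller than, or one past the last column if there is none". It builds a logical matrix of the comparisons and uses base R's `max.col()` to find the first TRUE in each row. The names `others`, `lt` and `compare2` are invented for this example.

```r
library(data.table)

set.seed(1)
TDT <- data.table(x1 = round(rnorm(100, 0.75, 0.3), 2),
                  x2 = round(rnorm(100, 0.75, 0.3), 2),
                  x3 = round(rnorm(100, 0.75, 0.3), 2),
                  x4 = round(rnorm(100, 0.75, 0.3), 2),
                  x5 = round(rnorm(100, 0.75, 0.3), 2))

# columns to compare x1 against, in order
others <- paste0("x", 2:5)

# logical matrix with one column per comparison: x1 < x2, x1 < x3, ...
lt <- sapply(others, function(nm) TDT$x1 < TDT[[nm]])

# append an all-TRUE column so rows with no match get the last code,
# then take the position of the first TRUE in each row
TDT[, compare2 := max.col(1 * cbind(lt, TRUE), ties.method = "first")]
```

Adding more columns only means extending `others`; the rest stays the same.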
test <- read.table(
text = "
A 2010-01-01 1 rdm
A 2010-01-10 2 dfg
A 2010-01-14 3 fdgfd
A 2010-02-15 4 fdgfd
A 2010-08-17 5 dg
A 2010-12-19 6 dfg
B 2009-01-01 1 dfg
B 2010-01-01 2 ydg
B 2010-01-10 3 fdgfd
B 2010-01-14 4 dfg
B 2010-02-15 5 dfg
", header = FALSE)
library(data.table)
setDT(test)
names(test) <- c("ID", "date", "nr", "namecol")
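I would like to balance it with respect to dates, i.e. every individual (A, B, etc.) should have an NA row for each date on which it has no data. I do not know the minimum date per group or the minimum date across groups. The same goes for the maximum, but perhaps choosing a maximum equal to a specific date (rather than computing it across groups) is faster. The desired output is: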
out <- read.table(
text = "
A 2009-01-01 NA NA
A 2010-01-01 1 rdm
A 2010-01-10 2 dfg
A …
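A hedged sketch of one way to do this with data.table, assuming "balanced" means every ID gets a row for every date observed in any group (which is what the first row of the truncated desired output suggests). `CJ()` builds the full ID-by-date grid and a join fills the gaps with NA. The names `grid` and `balanced` are made up for this illustration.

```r
library(data.table)

test <- read.table(text = "
A 2010-01-01 1 rdm
A 2010-01-10 2 dfg
A 2010-01-14 3 fdgfd
A 2010-02-15 4 fdgfd
A 2010-08-17 5 dg
A 2010-12-19 6 dfg
B 2009-01-01 1 dfg
B 2010-01-01 2 ydg
B 2010-01-10 3 fdgfd
B 2010-01-14 4 dfg
B 2010-02-15 5 dfg
", header = FALSE, stringsAsFactors = FALSE)
setDT(test)
setnames(test, c("ID", "date", "nr", "namecol"))
test[, date := as.Date(date)]

# every combination of the observed IDs and the observed dates
grid <- CJ(ID = test$ID, date = test$date, unique = TRUE)

# join the data onto the grid; combinations without data get NA
balanced <- test[grid, on = .(ID, date)]
balanced
```

With this sample, the result has 14 rows (2 IDs times 7 distinct dates), and the rows without original data have NA in `nr` and `namecol`.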