1. Create an empty data frame
y <- data.frame()
2. Assign x, a character vector, to y as its column names
x <- c("name", "age", "gender")
colnames(y) <- x
Result:
Error in `colnames<-`(`*tmp*`, value = c("name", "age", "gender")) :
'names' attribute [3] must be the same length as the vector [0]
In practice, the length of x is dynamic, so
y <- data.frame(name=character(), age=numeric(), gender=logical())
is not a workable way to name the columns. How can I solve this? Thanks.
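One possible approach, sketched under the assumption that the column types do not matter yet: build a zero-row data frame from an empty matrix with length(x) columns, then assign the names. The columns start out as logical, which is a side effect of this sketch and not something the question specifies.

```r
# Column names whose number is only known at run time
x <- c("name", "age", "gender")

# A 0-row matrix with one column per name, converted to a data frame
y <- data.frame(matrix(nrow = 0, ncol = length(x)))

# The lengths now agree, so the assignment succeeds
colnames(y) <- x

str(y)
```

Because the data frame already has length(x) columns, `colnames<-` no longer complains about a length mismatch.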
I have a data.table dt:
library(data.table)
dt = data.table(a=LETTERS[c(1,1:3)],b=4:7)
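   a b
1: A 4
2: A 5
3: B 6
4: C 7
The result of dt[, .N, by=a] is
   a N
1: A 2
2: B 1
3: C 1
I understand that by=a or by="a" groups by column a, and that the N column holds the number of times each value of a occurs. However, I never called nrow(), yet I still got the counts. Is .N more than just a column name? Searching with ??".N" in R finds no documentation. I tried .K, but it does not work. What does .N mean?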
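For context, .N is not a column name but one of data.table's special symbols: inside dt[...] it holds the number of rows in the current group, or in the whole table when there is no by. Its help page should be reachable with ?.N or ?`special-symbols` once data.table is loaded, and there is no corresponding .K symbol. The sketch below compares .N with an explicit per-group count; the names `counts`, `manual` and `total` are illustrative only.

```r
library(data.table)

dt <- data.table(a = LETTERS[c(1, 1:3)], b = 4:7)

# .N is evaluated once per group and gives that group's row count
counts <- dt[, .N, by = a]

# The same count written out by hand with length()
manual <- dt[, .(N = length(b)), by = a]

# Without by, .N is the number of rows in the whole table
total <- dt[, .N]

print(counts)
print(manual)
print(total)
```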
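I have a string variable containing letters [a-z], spaces [ ] and apostrophes ['], e.g. x <- "a'b c"
I want to replace each apostrophe ['] with nothing [] and each space [ ] with an underscore [_].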
x <- gsub("'", "", x)
x <- gsub(" ", "_", x)
It certainly works, but the code gets ugly when there are many conditions. So I would like to use chartr(), but chartr() cannot replace a character with nothing, e.g.
x <- chartr("' ", "_", x)
#Error in chartr("' ", "_", "a'b c") : 'old' is longer than 'new'
Is there a way around this? Thanks!
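chartr() maps characters one to one, so it cannot delete anything. A possible workaround, sketched here with a hypothetical lookup vector named `replacements`, is to keep all pattern/replacement pairs in one named character vector and apply them in a loop with fixed (non-regex) matching.

```r
x <- "a'b c"

# Names are the literal strings to find, values are their replacements.
# An empty string as a value deletes the match.
replacements <- c("'" = "", " " = "_")

# Apply each pair in turn; fixed = TRUE treats the pattern as plain text
for (pat in names(replacements)) {
  x <- gsub(pat, replacements[[pat]], x, fixed = TRUE)
}

print(x)  # "ab_c"
```

Adding a further condition then only means adding an entry to `replacements`. The pairs are applied in order, so an earlier replacement can affect what a later pattern matches.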
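I have a 5 GB .dat file (more than 10 million lines). Each line has a format like aaaa bb cccc0123 xxx kkkkkkkkkkkkkk or aaaaabbbcccc01234xxxkkkkkkkkkkkkkk, for example. Because readLines performs poorly on large files, I chose fread() to read it, but an error occurred: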
library("data.table")
x <- fread("test.DAT")
Error in fread("test.DAT") :
Expecting 5 cols, but line 5 contains text after processing all cols. It is very likely that this is due to one or more fields having embedded sep=' ' and/or (unescaped) '\n' characters within unbalanced unescaped quotes. fread cannot handle such ambiguous cases and those lines may not have been read in as expected. Please read the …
I have a main table (a) with the columns id, age and sex. For example:
a <- data.frame(id=letters[1:4], age=c(18,NA,9,NA), sex=c("M","F","F","M"))
  id age sex
1  a  18   M
2  b  NA   F
3  c   9   F
4  d  NA   M
I have a supplementary table (b) that contains only values that are missing from table (a) or that duplicate values already in table (a). For example:
b <- data.frame(id=c("a","b","d"), age=c(18,32,20))
  id age
1  a  18
2  b  32
3  d  20
Now I want to combine the two tables like this:
  id age sex
1  a  18   M
2  b  32   F
3  c   9   F
4  d  20   M
However, I tried merge(a,b,by="id",all=T), and the result is not what I want. Is there a way to solve this? Thanks!
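merge() keeps both age columns side by side (as age.x and age.y) and does not fill one from the other. One way to get the desired table, sketched in base R with the illustrative helper names `idx` and `fill`, is to look up each id of a in b with match() and overwrite only the missing ages.

```r
a <- data.frame(id = letters[1:4], age = c(18, NA, 9, NA),
                sex = c("M", "F", "F", "M"))
b <- data.frame(id = c("a", "b", "d"), age = c(18, 32, 20))

# Row of b that corresponds to each row of a (NA when the id is absent from b)
idx <- match(a$id, b$id)

# Rows of a whose age is missing and for which b has an entry
fill <- is.na(a$age) & !is.na(idx)

# Copy the ages across for exactly those rows
a$age[fill] <- b$age[idx[fill]]

print(a)
```

Rows whose age is already present in a are left untouched, so the duplicated entry for id "a" in b is ignored.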
I want to replace:
(1) ", " (comma + space) with "_" (underscore)
(2) "'" (apostrophe) with "'s" (apostrophe + s)
library(gsubfn)
x <- c("Mary' car is red.", "A, B, C")
gsubfn(".", list(", " = "_", "'" = "'s"), x)
I expected "Mary's car is red." and "A_B_C", but the result was "Mary's car is red." and "A, B, C". Why?
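A likely explanation: the pattern "." matches exactly one character at a time, so the two-character string ", " is never handed to the list lookup, while the single character "'" is. A pattern that matches each key as a whole, such as ", |'", should behave as intended. The sketch below uses gsubfn when it is installed and otherwise falls back to two base-R gsub() calls; the fallback is an assumption added for illustration and is not part of the original question.

```r
x <- c("Mary' car is red.", "A, B, C")

# Match either the two-character ", " or a single apostrophe
pattern <- ", |'"

# Show which substrings the pattern actually matches
print(regmatches(x, gregexpr(pattern, x)))

if (requireNamespace("gsubfn", quietly = TRUE)) {
  # Each match is looked up by name in the list and replaced by its value
  result <- gsubfn::gsubfn(pattern, list(", " = "_", "'" = "'s"), x)
} else {
  # Base-R fallback: two literal replacements applied in sequence
  result <- gsub("'", "'s", gsub(", ", "_", x, fixed = TRUE), fixed = TRUE)
}

print(result)
```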