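For reasons specific to my R program, I want to assign column names and row names from an existing row and column of a data frame. That is, the first row must become the column names and the first column must become the row names.
My first idea was simple: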
colnames(myDataFrame) <- myDataFrame[1,]
rownames(myDataFrame) <- myDataFrame[,1]
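as is also suggested in this thread.
But the first row and first column of my data frame can take many forms: text only, text containing digits, text or numbers... which is why this sometimes fails. Here is an example where the first row contains only text:
I first load my data frame without any header: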
> tab <- read.table(file, header = FALSE, sep = "\t")
> tab
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 TEST this is only text hoping it will work
2 I 4 0 0 0 0 0 0 1
3 really 7 6 6 3 10 6 10 10
4 hope 187 141 140 129 130 157 138 168
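This is my data frame, with no row names or column names yet. I want "TEST this is only text hoping it will work" to become my column names. This does not work: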
> colnames(tab) <- tab[1,]
> tab
2 10 9 9 10 8 9 8 9
1 TEST this is only text hoping it will work
2 I 4 0 0 0 0 0 0 1
3 really 7 6 6 3 10 6 10 10
4 hope 187 141 140 129 130 157 138 168
Whereas this works:
> colnames(tab) <- as.character(unlist(tab[1,]))
> tab
TEST this is only text hoping it will work
1 TEST this is only text hoping it will work
2 I 4 0 0 0 0 0 0 1
3 really 7 6 6 3 10 6 10 10
4 hope 187 141 140 129 130 157 138 168
I thought the problem was that R sometimes treats the first column or the first row as a factor. But as you can see:
> is.factor(tab[1,])
FALSE
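It fails even though R has not converted the row to a factor.
I tried using as.character(unlist()) in my program, but in some of the other cases I may encounter it no longer works. See this example with text and numbers in the first row: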
> otherTab <- read.table(otherFile, header = FALSE, sep = "\t")
> otherTab
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 TEST this45 is 486text 725 with ca257 some numbers
2 number45 4 0 0 0 0 0 0 1
3 254every 7 6 6 3 10 6 10 10
4 where 187 141 140 129 130 157 138 168
> colnames(otherTab) <- as.character(unlist(otherTab[1,]))
> otherTab
6 10 9 7 725 8 9 8 9
1 TEST this45 is 486text 725 with ca257 some numbers
2 number45 4 0 0 0 0 0 0 1
3 254every 7 6 6 3 10 6 10 10
4 where 187 141 140 129 130 157 138 168
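So how can I handle these different cases in a simple way (since this seems like such a simple problem)? Thanks in advance.
This happens because, in your initial data frame (called df below), V5 is a column of type "int" rather than a factor, so your first row contains two different types: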
#> str(df)
#'data.frame': 4 obs. of 9 variables:
# $ V1: Factor w/ 4 levels "254every","TEST",..: 2 3 1 4
# $ V2: Factor w/ 4 levels "187","4","7",..: 4 2 3 1
# $ V3: Factor w/ 4 levels "0","141","6",..: 4 1 3 2
# $ V4: Factor w/ 4 levels "0","140","486text",..: 3 1 4 2
# $ V5: int 725 0 3 129
# $ V6: Factor w/ 4 levels "0","10","130",..: 4 1 2 3
# $ V7: Factor w/ 4 levels "0","157","6",..: 4 1 3 2
# $ V8: Factor w/ 4 levels "0","10","138",..: 4 1 2 3
# $ V9: Factor w/ 4 levels "1","10","168",..: 4 1 2 3
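The same inspection can be done per column, which also shows why `is.factor()` on a whole row returns FALSE. This is only a minimal sketch: `demo` is a hypothetical two-column stand-in for the data frame above, not the asker's data.

```r
# Hypothetical two-column stand-in for the data frame above.
demo <- data.frame(
  V4 = factor(c("486text", "0", "6", "140")),
  V5 = c(725L, 0L, 3L, 129L)
)

# Class of each column: one factor, one integer.
col_classes <- vapply(demo, function(col) class(col)[1], character(1))
print(col_classes)

# The first row is a one-row data frame, so is.factor() is FALSE for it
# even though the V4 column is a factor.
print(is.factor(demo[1, ]))
print(is.factor(demo$V4))
```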
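All elements of a vector must be of the same type. When you unlist() the first row to build the vector you pass to colnames(), you actually pass an "int" vector, because R coerces the elements to a common type and the factors are reduced to their integer codes: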
#> str(unlist(df[1,]))
# Named int [1:9] 2 4 4 3 725 4 4 4 4
# - attr(*, "names")= chr [1:9] "V1" "V2" "V3" "V4" ...
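The coercion can be reproduced in isolation. The sketch below assumes a hypothetical one-row data frame `row1` with one factor column and one integer column; the level order is copied from V4 above.

```r
# Hypothetical one-row example: one factor column and one integer column.
row1 <- data.frame(
  V4 = factor("486text", levels = c("0", "140", "486text", "6")),
  V5 = 725L
)

# unlist() needs a single type, so the factor is reduced to its integer
# code (the position of "486text" among the levels) and the label is lost.
codes <- unlist(row1)
print(codes)

# Converting each element to character first keeps the labels.
row_labels <- vapply(row1, as.character, character(1))
print(row_labels)
```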
If you change the structure of your data frame so that column V5 is a factor as well, your initial approach will work:
df[,5] <- as.factor(df[,5])
colnames(df) <- unlist(df[1,])
You get:
#> df
# TEST this45 is 486text 725 with ca257 some numbers
#1 TEST this45 is 486text 725 with ca257 some numbers
#2 number45 4 0 0 0 0 0 0 1
#3 254every 7 6 6 3 10 6 10 10
#4 where 187 141 140 129 130 157 138 168
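If you don't want to modify the column types, you can apply as.character() to each element of the first row before the values are coerced to a vector and passed to colnames():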
colnames(df) <- lapply(df[1,], as.character)
The result is the same:
#> df
# TEST this45 is 486text 725 with ca257 some numbers
#1 TEST this45 is 486text 725 with ca257 some numbers
#2 number45 4 0 0 0 0 0 0 1
#3 254every 7 6 6 3 10 6 10 10
#4 where 187 141 140 129 130 157 138 168
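The question also asks for the first column to become the row names. One possible way to wrap both steps is sketched below. The helper name `promote_names` and the `demo` data are hypothetical, and dropping the promoted row and column and re-typing the remaining values with `type.convert()` are assumptions about what is wanted, not part of the approach above.

```r
# Hypothetical helper: use the first row as column names and the first
# column as row names, then drop both from the data.
promote_names <- function(d) {
  # Convert element by element so factor labels, not codes, are used.
  header <- vapply(d[1, ], as.character, character(1), USE.NAMES = FALSE)
  row_ids <- as.character(d[[1]])

  out <- d[-1, -1, drop = FALSE]
  # Assumption: the remaining values should be re-typed once the header
  # text no longer shares a column with them.
  out[] <- lapply(out, function(col) {
    type.convert(as.character(col), as.is = TRUE)
  })
  colnames(out) <- header[-1]
  rownames(out) <- row_ids[-1]
  out
}

demo <- data.frame(
  V1 = c("TEST", "number45", "254every"),
  V2 = c("this45", "4", "7"),
  V3 = c(725L, 0L, 3L),
  stringsAsFactors = TRUE
)

result <- promote_names(demo)
print(result)
```

Row names must be unique, so this only works when the first column has no repeated values.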
Data
structure(list(V1 = structure(c(2L, 3L, 1L, 4L), .Label = c("254every",
"TEST", "number45", "where"), class = "factor"), V2 = structure(c(4L,
2L, 3L, 1L), .Label = c("187", "4", "7", "this45"), class = "factor"),
V3 = structure(c(4L, 1L, 3L, 2L), .Label = c("0", "141",
"6", "is"), class = "factor"), V4 = structure(c(3L, 1L, 4L,
2L), .Label = c("0", "140", "486text", "6"), class = "factor"),
V5 = c(725L, 0L, 3L, 129L), V6 = structure(c(4L, 1L, 2L,
3L), .Label = c("0", "10", "130", "with"), class = "factor"),
V7 = structure(c(4L, 1L, 3L, 2L), .Label = c("0", "157",
"6", "ca257"), class = "factor"), V8 = structure(c(4L, 1L,
2L, 3L), .Label = c("0", "10", "138", "some"), class = "factor"),
V9 = structure(c(4L, 1L, 2L, 3L), .Label = c("1", "10", "168",
"numbers"), class = "factor")), .Names = c("V1", "V2", "V3",
"V4", "V5", "V6", "V7", "V8", "V9"), class = "data.frame", row.names = c("1",
"2", "3", "4"))
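If the file can be read again, it may be simpler to declare the header and row names when calling read.table(), rather than renaming afterwards. This is a sketch that assumes the tab-separated layout shown in the question. The inline `raw` string is a shortened, hypothetical stand-in for the real file.

```r
# Inline stand-in for the tab-separated file from the question.
raw <- paste(
  "TEST\tthis45\tis\t486text\t725",
  "number45\t4\t0\t0\t0",
  "254every\t7\t6\t6\t3",
  "where\t187\t141\t140\t129",
  sep = "\n"
)

tab <- read.table(
  text = raw,
  sep = "\t",
  header = TRUE,        # first line supplies the column names
  row.names = 1,        # first column supplies the row names
  check.names = FALSE   # keep names such as "486text" and "725" unchanged
)

print(tab)
```

Because the header line is no longer part of the data, the remaining columns are read as integers instead of factors.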